Readability Formula Scores: Flesch-Kincaid, Gunning Fog, SMOG, and More
Reference for 7 readability formulas used in website content analysis: Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, SMOG, ARI, Coleman-Liau, and Dale-Chall. What each measures, how scores are interpreted, and where each is most applicable.
Last updated 02/08/2026
Readability formulas estimate text complexity by analyzing surface features of writing: sentence length, syllable count, word familiarity, and character density. Seven formulas are commonly applied to web content. Each uses different inputs and produces different scales. No single formula captures comprehension difficulty completely, because comprehension depends on domain knowledge, context, and reader intent, none of which these formulas measure.
What readability formulas measure
Readability formulas produce a numerical estimate of how difficult a text is to read. They operate on structural features of writing, not meaning. Sentence length, word length, syllable count, and vocabulary familiarity are the primary inputs. The output is typically a grade level or a score on a fixed scale. These formulas were developed for print media and educational materials. They remain useful as comparative signals when applied consistently across pages, but they measure text surface, not reader experience.
Flesch-Kincaid Grade Level
Flesch-Kincaid Grade Level maps text complexity to a US school grade. It uses average sentence length and average syllables per word as inputs. A score of 8.0 means the text is estimated to require an eighth-grade reading level. The formula combines both inputs, with the heavier coefficient falling on syllable density. It is the most widely cited readability metric in content tooling and compliance standards, which makes it useful as a baseline for comparison across pages and over time. Its limitation is that it penalizes technical vocabulary regardless of whether the audience already knows those terms.
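The published formula is compact. A minimal sketch in Python, assuming word, sentence, and syllable counts are computed elsewhere (the function name is illustrative):

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts.

    Published coefficients: 0.39 on average sentence length,
    11.8 on average syllables per word, constant -15.59.
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# 100 words in 5 sentences with 150 syllables lands near tenth grade
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Note how the same syllable-per-word ratio moves the score far more than the same change in sentence length, which is why dense vocabulary dominates the result.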
Flesch Reading Ease
Flesch Reading Ease uses the same inputs as Flesch-Kincaid Grade Level but inverts the scale. Scores range from 0 to 100, where higher values indicate easier text. A score between 60 and 70 is considered broadly accessible. Below 30 indicates academic or specialist-level density. The inversion makes it more intuitive for non-technical stakeholders: higher is easier. It remains one of the few readability metrics with a direct interpretation that does not require mapping to a grade system.
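Because it shares inputs with the grade-level variant, the calculation differs only in coefficients and direction. A sketch with the published constants (function name is illustrative):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: 0-100 scale, higher means easier text.

    Published coefficients: base 206.835, minus 1.015 per word of
    average sentence length, minus 84.6 per syllable per word.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Same sample counts as a grade-level check: 100 words, 5 sentences,
# 150 syllables scores just below the broadly accessible 60-70 band
print(round(flesch_reading_ease(100, 5, 150), 1))
```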
Gunning Fog Index
The Gunning Fog Index emphasizes complex words, defined as words with three or more syllables. It combines average sentence length with the percentage of complex words to produce a grade-level estimate. Fog tends to produce higher scores than Flesch-Kincaid for the same text because it treats polysyllabic words as a stronger difficulty signal. This makes it particularly sensitive to jargon-heavy writing. The tradeoff is that it can overpenalize text that uses long but common words, and it does not distinguish between unfamiliar terminology and ordinary multisyllabic vocabulary.
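The Fog calculation is a single weighted sum. A sketch assuming the complex-word count (three or more syllables) is already available; the function name is illustrative:

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: average sentence length plus the percentage
    of complex words (3+ syllables), scaled by 0.4."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# 100 words, 5 sentences, 10 complex words: each complex word adds
# 0.4 grade levels, which is why jargon moves this score so quickly
print(gunning_fog(100, 5, 10))  # 12.0
```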
SMOG Grade
SMOG (Simple Measure of Gobbledygook) estimates reading grade level using only the count of polysyllabic words across a sample of sentences. It requires a minimum of 30 sentences for reliable output. SMOG was developed specifically for health literacy assessment and tends to correlate more closely with actual comprehension test results than other formulas. It is the preferred metric in medical, pharmaceutical, and public health content standards. For general web content, SMOG often produces scores 1 to 2 grade levels higher than Flesch-Kincaid on the same text.
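The published formula normalizes the polysyllable count to a 30-sentence sample before taking a square root. A sketch with the standard constants (function name is illustrative, and the 30-sentence floor is enforced here as a guard):

```python
import math

def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the polysyllabic-word count over a sentence sample.

    Published constants: 1.0430 on the square root of the count
    normalized to 30 sentences, plus 3.1291.
    """
    if sentences < 30:
        raise ValueError("SMOG needs at least 30 sentences for reliable output")
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

# 25 polysyllabic words across exactly 30 sentences
print(round(smog_grade(25, 30), 4))  # 8.3441
```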
Automated Readability Index
The Automated Readability Index (ARI) uses character count per word and words per sentence to produce a grade-level estimate. It avoids syllable counting entirely, which eliminates ambiguity around syllabification rules in English. This makes ARI computationally simpler and more consistent across implementations. The tradeoff is reduced sensitivity: a word like "through" and a word like "approximately" may score similarly despite different recognition difficulty. ARI is useful when consistency of measurement matters more than precision of difficulty estimation.
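Because ARI needs only character, word, and sentence counts, implementations rarely disagree. A sketch with the published coefficients (function name is illustrative):

```python
def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    """ARI grade-level estimate from characters per word and words
    per sentence; no syllable counting required."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43

# 450 letters across 100 words in 5 sentences: 4.5 characters per
# word and 20 words per sentence land near tenth grade
print(round(automated_readability_index(450, 100, 5), 2))
```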
Coleman-Liau Index
The Coleman-Liau Index also avoids syllable counting. It uses average number of characters per 100 words and average number of sentences per 100 words. Like ARI, this approach removes syllabification as a variable. Coleman-Liau was designed to be calculable by machine without phonetic analysis, which made it practical for early text processing systems. It tends to produce results close to Flesch-Kincaid for most web content, with divergence appearing in text that uses many short, unfamiliar words or many long, common ones.
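The per-100-word averages (L and S in the published formula) feed a single linear expression. A sketch with the standard coefficients (function name is illustrative):

```python
def coleman_liau_index(letters_per_100_words: float,
                       sentences_per_100_words: float) -> float:
    """Coleman-Liau Index from the per-100-word averages L and S.

    Published coefficients: 0.0588 * L - 0.296 * S - 15.8.
    """
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8

# L = 450 letters per 100 words, S = 5 sentences per 100 words
print(round(coleman_liau_index(450, 5), 2))  # 9.18
```

The same counts that produced an ARI near tenth grade land slightly lower here, a typical scale of divergence between character-based formulas.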
Dale-Chall Readability Score
Dale-Chall takes a fundamentally different approach. Instead of measuring word length or syllable count, it checks words against a list of approximately 3,000 words familiar to most fourth-grade readers. Words not on the list are counted as unfamiliar. The score is then derived from the percentage of unfamiliar words combined with average sentence length. This method better captures vocabulary difficulty than syllable-based formulas, because short words can be obscure and long words can be common. The limitation is that the familiarity list reflects general English, not domain-specific language. Technical content will always score as difficult under Dale-Chall, even when the audience routinely uses that vocabulary.
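The scoring step is a weighted sum over the unfamiliar-word percentage and sentence length, with a published adjustment of 3.6365 added when unfamiliar words exceed 5 percent. A sketch assuming the list lookup has already been done (function name is illustrative):

```python
def dale_chall_score(pct_unfamiliar: float, avg_sentence_length: float) -> float:
    """Dale-Chall score from the percentage of words not on the
    familiar-word list and the average sentence length.

    Published coefficients: 0.1579 and 0.0496, with a 3.6365
    adjustment when more than 5% of words are unfamiliar.
    """
    raw = 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length
    if pct_unfamiliar > 5:
        raw += 3.6365
    return raw

# 10% unfamiliar words in 20-word sentences triggers the adjustment
print(round(dale_chall_score(10, 20), 4))  # 6.2075
```

The step change at the 5 percent threshold means small vocabulary shifts near that boundary can move the score disproportionately.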
Where teams encounter readability scores
Readability scores surface during content audits, accessibility reviews, and SEO analysis. They appear as flags or metrics across multiple tools, often without context about which formula was used or why a particular threshold was chosen. Teams frequently see one tool report content as acceptable while another flags the same content as too complex. This is usually a formula difference, not a content problem.
Why these measurements exist
Readability formulas provide a repeatable, page-level signal for text complexity. They are most useful for detecting drift: content that has grown more complex over time without deliberate intent, or pages within the same section that vary widely in reading level without reason. They are least useful as targets. A readability score is a measurement, not a goal. The appropriate complexity for a page depends on its audience, purpose, and the precision required by its subject matter.
Scope
Readability scores apply at the page level. They are calculated against the visible text content of a page, excluding markup, navigation, and boilerplate. Scores are most informative when compared across pages within the same section or content type, not when evaluated in isolation.
How to verify
Run the page content through multiple formulas and compare results. Consistent signals across formulas indicate a genuine complexity characteristic. A score that appears extreme under one formula but normal under others usually reflects a formula-specific sensitivity rather than a content problem. When scores change over time, check whether the content changed intentionally.
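As a sketch of that comparison, the hypothetical `naive_counts` helper below uses crude regex-based counting (real tools segment sentences and estimate syllables far more carefully) and feeds the same counts into two formulas side by side:

```python
import re

def naive_counts(text: str):
    """Crude counts for illustration only: sentence-final punctuation
    runs as sentence boundaries, vowel groups as syllables."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    # Vowel-group heuristic undercounts some words but is cheap and consistent
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return len(words), sentences, chars, syllables

def fk_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def ari(chars, words, sentences):
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43

sample = "The cat sat on the mat. It was a sunny day."
w, s, c, syl = naive_counts(sample)
# Both formulas agree this is very simple text; a large gap between
# them on real content points at a formula-specific sensitivity
print(f"FK {fk_grade(w, s, syl):.1f}  ARI {ari(c, w, s):.1f}")
```

When the two grade estimates track each other across a page set, the complexity signal is likely genuine; when one diverges sharply, inspect which input (syllables versus characters) is driving it.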
What becomes visible with readability scoring
- Which pages have drifted toward unnecessary complexity
- Where reading level varies significantly within a content section
- Whether content rewrites changed difficulty intentionally or accidentally
- How different formulas respond to the same content, revealing which text features are driving the score