English phonology is the study of the sound system of the English language. Like many languages, English has wide variation in pronunciation, both historically and from dialect to dialect. In general, however, the major regional dialects of English are mutually intelligible.

Although there are many dialects of English, the following are usually used as prestige or standard accents: Received Pronunciation for the United Kingdom, General American for the United States, and General Australian for Australia.



See IPA chart for English dialects for concise charts of the English phonemes.

A phoneme is a sound, or a group of different sounds, that speakers of the language or dialect in question perceive as having the same function. For example, the word "sound" has four phonemes: the "s", the vowel diphthong "ou", the "n", and the "d". Note that a phoneme is a feature of pronunciation, not of spelling, which in English does not always relate directly to the phonemes present: for example, "cough" has three phonemes (the initial consonant sound, the monophthong vowel sound "aw", and the final consonant sound "f").

The number of speech sounds in English varies from dialect to dialect, and any actual tally depends greatly on the interpretation of the researcher doing the counting. The Longman Pronunciation Dictionary by John C. Wells, for example, denotes 24 consonants and 23 vowels used in Received Pronunciation, plus two additional consonants and four additional vowels used in foreign words only. For General American, it provides for 25 consonants and 19 vowels, with one additional consonant and three additional vowels for foreign words. The American Heritage Dictionary, on the other hand, suggests 25 consonants and 18 vowels (including r-colored vowels) for American English, plus one consonant and five vowels for non-English terms [1].


The following table shows the consonant phonemes found in most dialects of English. When consonants appear in pairs, fortis consonants (i.e., aspirated or voiceless) appear on the left and lenis consonants (i.e., lightly voiced or voiced) appear on the right:

Consonant phonemes of English
             Bilabial  Labiodental  Dental  Alveolar  Postalveolar  Palatal  Velar   Glottal
Nasal¹       m                              n                                ŋ
Plosive      p  b                           t  d                             k  ɡ
Affricate                                             tʃ  dʒ
Fricative              f  v         θ  ð    s  z      ʃ  ʒ                   (x)³    h
Approximant                                 ɹ¹ ² ⁵                  j        w⁴
Lateral                                     l¹ ⁶
  1. Some phonologists identify syllabic nasals and liquids in unstressed syllables, while others analyse these phonemically as /əC/.
  2. Postalveolar consonants are usually labialized (e.g., [ʃʷ]), as is word-initial or pre-tonic /r/ (i.e., [ɹʷ]), though this is rarely transcribed.
  3. The voiceless velar fricative /x/ is dialectal, occurring largely in Scottish English. In other dialects, words with these sounds are pronounced with /k/. It may appear in recently-domiciled words such as chutzpah.
  4. The sequence /hw/, a voiceless labiovelar approximant [hw̥], is sometimes considered an additional phoneme. For most speakers, words that historically used to have these sounds are now pronounced with /w/; the phoneme /hw/ is retained, for example, in much of the American South, Scotland, and Ireland.
  5. Depending on dialect, /r/ may be an alveolar approximant [ɹ], postalveolar approximant, retroflex approximant [ɻ], or labiodental approximant [ʋ], along with other possibilities.
  6. Many dialects have two allophones of /l/—the "clear" L and the "dark" or velarized L. In some dialects, /l/ may be always clear (e.g. Wales, Ireland, the Caribbean) or always dark (e.g. Scotland, most of North America, Australia, New Zealand).
/p/ pit /b/ bit
/t/ tin /d/ din
/k/ cut /ɡ/ gut
/tʃ/ cheap /dʒ/ jeep
/f/ fat /v/ vat
/θ/ thin /ð/ then
/s/ sap /z/ zap
/ʃ/ she /ʒ/ measure
/x/ loch
/w/ we /m/ map
/l/ left /n/ nap
/ɹ/ run (also ⟨r⟩, ⟨ɻ⟩) /j/ yes
/h/ ham /ŋ/ bang


An allophone is one of a set of multiple possible spoken sounds (or phones) used to pronounce a single phoneme. For example, the phoneme /t/ is pronounced differently in "tonsils" than in "button", and still differently in "cat". All of these "t" sounds are allophones of the same phoneme, since no two words can be distinguished from each other solely on the basis of which of these pronunciations is used.

Although regional variation is very great across English dialects, some generalizations can be made about pronunciation in all (or at least the vast majority) of English accents:

  • The voiceless stops /p t k/ are aspirated [pʰ tʰ kʰ] at the beginnings of words (for example tomato) and at the beginnings of word-internal stressed syllables (for example potato). They are unaspirated [p t k] after /s/ (stan, span, scan) and at the ends of syllables.
  • For many people, /r/ is somewhat labialized in some environments, as in reed [ɹʷiːd] and tree [tʰɹ̥ʷiː]. In the latter case, the [t] may be slightly labialized as well.[1]
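The aspiration generalization above can be sketched as a rule over a word that has already been divided into syllables. This is only an illustration: the data layout (a list of (phonemes, stressed) pairs) and the function name aspirate are assumptions, and dialect-specific detail is ignored.

```python
# Voiceless stops that are subject to aspiration.
VOICELESS_STOPS = {"p", "t", "k"}


def aspirate(syllables):
    """Return the syllables with aspiration marked on voiceless stops.

    `syllables` is a list of (phonemes, stressed) pairs, where `phonemes`
    is a list of phoneme strings. This layout is hypothetical.
    """
    result = []
    for index, (phonemes, stressed) in enumerate(syllables):
        phones = list(phonemes)
        # Aspirate only a syllable-initial stop that is word-initial or that
        # begins a stressed syllable. A stop after /s/ is never
        # syllable-initial here, so it stays unaspirated.
        starts_with_stop = bool(phones) and phones[0] in VOICELESS_STOPS
        if starts_with_stop and (index == 0 or stressed):
            phones[0] += "ʰ"
        result.append(phones)
    return result


potato = [(["p", "ə"], False), (["t", "eɪ"], True), (["t", "oʊ"], False)]
stan = [(["s", "t", "æ", "n"], True)]
print(aspirate(potato))
print(aspirate(stan))
```

In potato, the initial /p/ and the /t/ that begins the stressed syllable are aspirated, while the final /t/ is not; in stan, the /t/ follows /s/ and is left unaspirated.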

The phoneme /t/ has six different allophones,[2]:pp.62-67 differing somewhat between British and American English. As noted above, /t/ is aspirated as [tʰ] at the beginning of a word or stressed syllable, but unaspirated as [t] after /s/. After a stressed syllable and at the beginning of an unstressed syllable, after a vowel or /r/ and before a vowel or a syllabic /l/, as in water or bottle, in American English it is pronounced as a voiced flap [ɾ] that is indistinguishable from /d/ (so that, for example, petal and peddle sound alike); this flap may even appear at word boundaries, as in put it on. British English, by contrast, does not use the flap, instead de-aspirating [tʰ] somewhat. When /t/ follows /n/ and precedes an unstressed vowel, as in winter, the /t/ is pronounced by some speakers of American English as a nasalized flap that is identical to the /n/ flap and hence becomes essentially silent, so that for example /nt/ is indistinguishable from /n/ in winter / winner. Before /n/, as in catnip and button, British and American English pronounce /t/ as a glottal stop [ʔ], allowing a distinction in pronunciation between, for example, Sutton and sudden or bitten and bidden. Finally, final /t/ as in cat is not released, and may be glottalized in British English. However, in speech with careful enunciation, in all situations /t/ may be pronounced as [t] or [tʰ].

The phoneme /n/ is usually pronounced as [n], but before /k/ the allophone [ŋ] usually appears (mandatorily in stressed syllables and optionally in unstressed syllables). For example, sink is pronounced as [sɪŋk], never as [sɪnk]. This allophonic change can even occur across syllable boundaries: synchrony is pronounced as ['sɪŋkɹəni] whereas synchronic may be pronounced either as [sɪŋ'kɹɑnɪk] or as [sɪn'kɹɑnɪk]. Note that when not followed by /k/, /ŋ/ serves as an English phoneme in its own right, as for example in sing [sɪŋ]; but there is no phonemic distinction between [ŋk] and [nk].
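The /n/ rule just described can be sketched as a function that lists the possible surface forms of a syllabified word. The input layout (a list of (phonemes, stressed) pairs) and the name nasal_variants are assumptions made for illustration. The sketch treats a word-internal /n/ before /k/ as obligatorily [ŋ] when the /n/ sits in a stressed syllable and as optionally [ŋ] otherwise.

```python
from itertools import product


def nasal_variants(syllables):
    """Return the set of possible pronunciations as plain strings.

    `syllables` is a list of (phonemes, stressed) pairs; this layout
    is hypothetical.
    """
    # Flatten the word, remembering whether each phoneme is in a stressed syllable.
    flat = [(p, stressed) for phonemes, stressed in syllables for p in phonemes]
    options = []
    for i, (phoneme, stressed) in enumerate(flat):
        before_k = i + 1 < len(flat) and flat[i + 1][0] == "k"
        if phoneme == "n" and before_k:
            # Obligatory [ŋ] in a stressed syllable, optional otherwise.
            options.append(["ŋ"] if stressed else ["ŋ", "n"])
        else:
            options.append([phoneme])
    return {"".join(choice) for choice in product(*options)}


sink = [(["s", "ɪ", "n", "k"], True)]
synchronic = [(["s", "ɪ", "n"], False), (["k", "ɹ", "ɑ", "n", "ɪ", "k"], True)]
print(nasal_variants(sink))
print(nasal_variants(synchronic))
```

For sink only the form with [ŋ] is produced, while synchronic yields both variants, because its first syllable is unstressed.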


The vowels of English differ considerably between dialects. Because of this, corresponding vowels may be transcribed with various symbols depending on the dialect under consideration. When considering English as a whole, no specific phonemic symbols are chosen over others; instead, lexical sets are used, each named by a word containing the vowel in question. For example, the vowel of the LOT set ("short o") is transcribed /ɒ/ in Received Pronunciation, /ɔ/ in Australian English, and /ɑ/ in General American. For an overview of these diaphonemic correspondences, see IPA chart for English dialects.

Monophthongs of Received Pronunciation[3]
Front Central Back
long short long short long short
Close ɪ ʊ
Mid ɛ ɜː ə ɔː
Open æ ʌ* ɑː ɒ
Monophthongs of Australian English
Front Central Back
long short long short long short
Close ɪ ʉː ʊ
Mid e ɜː ə ɔ
Open æː æ a

* The vowel of strut is closer to a near-open central vowel ([ɐ]) in RP, though ⟨ʌ⟩ is still used for tradition (it was historically a back vowel) and because it is still back in other varieties.[4]

The monophthong phonemes of General American differ in a number of ways from Received Pronunciation:

  1. The central vowel of nurse is rhotic [ɝ] (also transcribed as a syllabic [ɹ̩]).
  2. Speakers make a phonemic distinction between rhotic /ɚ/ and non-rhotic /ə/.
  3. No distinction is made between /ɒ/ and /ɑː/, nor for some speakers between these vowels and /ɔː/.

Reduced vowels occur in some unstressed syllables. (Other unstressed syllables may have full vowels, which some dictionaries mark as secondary stress.) The number of distinctions made among reduced vowels varies by dialect. In some dialects vowels are centralized but otherwise kept mostly distinct, while in Australia, New Zealand and some US dialects[citation needed] all reduced vowels collapse to a schwa [ə]. In Received Pronunciation, there is a distinct high reduced vowel, which the OED writes ɪ.

  • [ɪ]: roses (merged with [ə] in Australian and New Zealand English)
  • [ə]: Rosa’s, runner
  • [l̩]: bottle
  • [n̩]: button
  • [m̩]: rhythm
English diphthongs
RP Australian American
low /əʊ/ /əʉ/ /oʊ/
loud /aʊ/ /æɔ/ /aʊ/
lied /aɪ/ /ɑe/ /aɪ/
lane /eɪ/ /æɪ/ /eɪ/
loin /ɔɪ/ /oɪ/ /ɔɪ/
leer /ɪə/ /ɪə/ /ɪɚ/[d 1]
lair /ɛə/[d 2] /eː/ /ɛɚ/[d 1]
lure /ʊə/[d 2] (/ʊə/)[d 3] /ʊɚ/[d 1]
  1. In rhotic dialects, words like pair, poor, and peer can be analyzed as diphthongs, although other descriptions analyze them as vowels with /r/ in the coda.[5]
  2. In Received Pronunciation, the vowels in lair and lure may be monophthongized to [ɛː] and [oː] respectively.[6]
  3. In Australian English, the vowel /ʊə/ is often omitted from descriptions as for most speakers it has split into the long monophthong /oː/ (e.g. poor, sure) or the sequence /ʉː.ə/ (e.g. cure, lure).[7]

Reduced vowels

Vowel reduction refers to the weakening of a vowel sound in certain situations. In English this typically involves decreasing its volume, decreasing its duration, and pronouncing it more like a schwa, as in the vowel sound in the second syllable of "typically".

Linguists such as Ladefoged[8] and Bolinger[9] argue that vowel reduction is phonemic in English (that is, that it allows otherwise identical words to be distinguished from each other), and that there are two "tiers" of vowels in English, full and reduced; traditionally many English dictionaries have attempted to mark the distinction by transcribing unstressed full vowels as having "secondary" stress, though this was later abandoned by the Oxford English Dictionary. Though full unstressed vowels may derive historically from stressed vowels, either because stress shifted over time (such as stress shifting away from the final syllable of French loan words in British English) or because of loss or shift of stress in compound words or phrases (óverseas vóyage from overséas or óverséas plus vóyage), the distinction is not one of stress but of vowel quality (Bolinger 1989:351), and over time, if the word is frequent enough, the vowel tends to reduce.

English has up to five reduced vowels, though this varies with dialect and speaker. Schwa /ə/ is found in all dialects, and a rhotic schwa ("schwer") /ɚ/ is found in rhotic dialects. Less common is a high reduced vowel ("schwi") /ɪ̈/ (also "/ɪ/"); the two are distinguished by many people in Rosa's /ˈroʊzəz/ vs roses /ˈroʊzɪ̈z/. More unstable is a rounded schwa, /ö/ (also /ɵ/); this contrasts for some speakers in a mission /əˈmɪʃən/, emission /ɪ̈ˈmɪʃən/, and omission /ɵˈmɪʃən/. In words like following, the following vowel is preceded by a [w] even in dialects that otherwise don't have a rounded schwa: [ˈfɒlɵwɪŋ, ˈfɒləwɪŋ]. A high rounded schwa /ʊ̈/ (also "/ʊ/") may be found in words such as into /ˈɪntʊ̈/, though in many dialects this is not distinguished from /ɵ/.

Though speakers vary, full and reduced unstressed vowels may contrast in pairs of words like Shogun /ˈʃoʊɡʌn/ and slogan /ˈsloʊɡən/, chickaree /ˈtʃɪkəriː/ and chicory /ˈtʃɪkərɪ̈/, Pharaoh /ˈfɛəroʊ/ and farrow /ˈfæroʊ/ (Bolinger 1989:348), Bantu /ˈbæntuː/ and into /ˈɪntʊ̈/ (OED).


  • A distinction is made between tense and lax vowels in pairs like beet/bit and bait/bet, although the exact phonetic implementation of the distinction varies from accent to accent. However, this distinction collapses before [ŋ].
  • Wherever /r/ originally followed a tense vowel or diphthong (in Early Modern English) a schwa offglide was inserted, resulting in centering diphthongs like [iə] in beer [biəɹ], [uə] in poor [puəɹ], [aɪə] in fire [faɪəɹ], [aʊə] in sour [saʊəɹ], and so forth. This phenomenon is known as breaking. The subsequent history depends on whether the accent in question is rhotic or not: In non-rhotic accents like RP the postvocalic [ɹ] was dropped, leaving [biə, puə, faɪə, saʊə] and the like (now usually transcribed [bɪə, pʊə] and so forth). In rhotic accents like General American, on the other hand, the [əɹ] sequence was coalesced into a single sound, a non-syllabic [ɚ], giving [biɚ, puɚ, faɪɚ, saʊɚ] and the like (now usually transcribed [bɪɹ, pʊɹ, faɪɹ, saʊɹ] and so forth). As a result, originally monosyllabic words like those just mentioned came to rhyme with originally disyllabic words like seer, doer, higher, power.
  • In many (but not all) accents of English, a similar breaking happens to tense vowels before /l/, resulting in pronunciations like [piəɫ] for peel, [puəɫ] for pool, [peəɫ] for pail, and [poəɫ] for pole.
  • /h/ becomes [ç˕] before [j] and [i], as in human [ˈç˕juːmən] or [ˈç˕uːmən] where it is not dropped.
  • The quality of the vowel /aɪ/ is influenced by a following unvoiced stop, fricative, or affricate, which makes the vowel less open.[2]:p.66 Thus, for example, writer is distinguished from rider even though the /t/ and /d/ are pronounced essentially identically in this environment; and the vowel quality in fife differs from that in five.

Transcription variants

The choice of which symbols to use for phonemic transcriptions may reveal theoretical assumptions or claims on the part of the transcriber. English "lax" and "tense" vowels are distinguished by a synergy of features, such as height, length, and contour (monophthong vs. diphthong); different traditions in the linguistic literature emphasize different features. For example, if the primary feature is thought to be vowel height, then the non-reduced vowels of General American English may be represented according to the table below and to the left. If, on the other hand, vowel length is considered to be the deciding factor, the symbols in the table below and in the center may be chosen (this convention has sometimes been used because the publisher did not have IPA fonts available, though that is seldom an issue any longer). The rightmost table lists the corresponding lexical sets.

General American full vowels,
vowel height distinctive
i u
ɪ ʊ
e ɚ o
ɛ ʌ ɔ
æ ɑ
General American full vowels,
vowel length distinctive
i u
e ʌ o
Lexical sets representing
General American full vowels

If vowel transition is taken to be paramount, then the chart may look like one of these:

General American full vowels,
vowel contour distinctive
ij uw
i u
ej ər ow
e ə o
æ ɑ
General American full vowels,
vowel contour distinctive
ɪi̯ ʊu̯
ɪ ʊ
ɛɪ̯ ɚɹ ɔʊ̯
ɛ ʌ ɔ
æ ɑ

(The transcriber at left assumes that there is no phonemic distinction between semivowels and approximants, so that /ej/ is equivalent to /eɪ̯/.)

Many linguists combine more than one of these features in their transcriptions, suggesting they consider the phonemic differences to be more complex than a single feature.

General American full vowels,
height & length distinctive
ɪ ʊ
ɛ ʌ ɔ
æ ɑː
General American full vowels,
height & contour distinctive
ij uw
ɪ ʊ
ej ɜr ow
ɛ ʌ ɔ
æ ɑː


Prosody consists of stress, rhythm, and intonation, which occur in English as follows.


Stress is phonemic in English. For example, the words desert and dessert are distinguished in part by stress (and in part by vowel reduction in unstressed syllables), as are the noun a record and the verb to record. Stressed syllables in English are louder than non-stressed syllables, as well as being longer and having a higher pitch. They also tend to have a fuller realization[clarification needed] than unstressed syllables.

Examples of stress in English words, using capital letters to represent stressed syllables, are HOLiday, aLONE, ADmiRAtion, CONfiDENtial, deGREE, and WEAKer. Ordinarily, grammatical words (auxiliary verbs, prepositions, pronouns, and the like) do not receive stress, whereas lexical words (nouns, verbs, adjectives, etc.) must have at least one stressed syllable.

Traditional approaches describe English as having three degrees of stress: primary, secondary, and unstressed. However, if stress is defined as relative respiratory force (that is, it involves greater pressure from the lungs than unstressed syllables), as most phoneticians argue, and is inherent in the word rather than the sentence (that is, it is lexical rather than prosodic), then these traditional approaches conflate two distinct processes: stress and vowel reduction. In this case, primary stress is actually prosodic stress, whereas secondary stress is simple stress in some positions, and an unstressed but not reduced vowel in others. Either way, there is a three-way phonemic distinction: either three degrees of stress, or else stressed, unstressed, and reduced. The two approaches are sometimes conflated into a four-way 'stress' classification: primary (tonic stress), secondary (lexical stress), tertiary (unstressed full vowel), and quaternary (reduced vowel). See secondary stress for details.

Initial-stress-derived nouns are nouns that are derived from verbs by changing the position of their stress. For example, a rebel [ˈɹɛb.ɫ̩] (stress on the first syllable) is inclined to rebel [ɹɨ.ˈbɛɫ] (stress on the second syllable) against the powers that be. The number of words using this pattern as opposed to only stressing the second syllable in all circumstances doubles every century or so, and includes words such as object, convict, and addict.

Prosodic stress is extra stress given to words when they appear in certain positions in an utterance, or when they receive special emphasis. It normally appears on the final stressed syllable in an intonation unit. So, for example, when the word admiration is said in isolation, or at the end of a sentence, the syllable ra is pronounced with greater force than the syllable ad. (This is traditionally transcribed as /ˌædmɨˈreɪʃən/.) This is the origin of the primary stress-secondary stress distinction. However, the difference disappears when the word is not pronounced with this final intonation.

Prosodic stress can shift for various pragmatic functions, such as focus or contrast. For instance, consider the dialogue

"Is it brunch tomorrow?"
"No, it's dinner tomorrow."

In this case, the extra stress shifts from the last stressed syllable of the sentence, tomorrow, to the last stressed syllable of the emphasized word, dinner. Compare

"I'm going tomorrow." /aɪm ˈɡoʊɪŋ təˈˈmɒroʊ/


"It's dinner tomorrow." /ɪts ˈˈdɪnɚ təˈmɒroʊ/

Although grammatical words generally do not have lexical stress, they do acquire prosodic stress when emphasized. Compare ordinary

"Come in!" /ˈˈkʌm ɪn/

with more emphatic

"Oh, do come in!" /oʊ ˈˈduː kʌm ˈɪn/


English is a stress-timed language. That is, stressed syllables appear at a roughly steady tempo, and non-stressed syllables are shortened to accommodate this.


English declarative sentences generally have a pattern of rising pitch on the final stressed syllable followed by falling pitch on the subsequent unstressed syllables (or on the last part of the final stressed syllable itself, if it is also the last syllable of the sentence). But if something is left unsaid, the final fall in pitch occurs only to a lesser extent. Wh-questions, and tag questions with declarative intent, follow the same pattern as do declarative sentences.

In contrast, yes-no questions show pitch rising on the last stressed syllable, and remaining high on any subsequent syllables.


Most languages of the world syllabify CVCV and CVCCV sequences as /CV.CV/ and /CVC.CV/ or /CV.CCV/, with consonants preferentially acting as the onset of a syllable containing the following vowel. According to one view, English is unusual in this regard, in that stressed syllables attract following consonants, so that ˈCVCV and ˈCVCCV syllabify as /ˈCVC.V/ and /ˈCVCC.V/, as long as the consonant cluster CC is a possible syllable coda.[10] In addition, according to this view, /r/ preferentially syllabifies with the preceding vowel even when both syllables are unstressed, so that CVrV occurs as /CVr.V/.[10] However, many scholars do not agree with this view.[10]

Syllable structure

The syllable structure in English is (C)³V(C)⁵, with a near-maximal example being strengths (/ˈstrɛŋkθs/, although it can be pronounced /ˈstrɛŋθs/).[11] Because of an extensive pattern of articulatory overlap, English speakers rarely produce an audible release in consonant clusters.[12] This can lead to cross-articulations that seem very much like deletions or complete assimilations.

For example, hundred pounds may sound like [hʌndɹɛb pʰaʊndz], but X-ray[13] and electropalatographic[14][15] studies demonstrate that inaudible and possibly weakened contacts may still be made, so that the second /d/ in hundred pounds does not entirely assimilate to a labial place of articulation; rather, the labial articulation co-occurs with the alveolar one.
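Returning to the syllable template given above (up to three onset consonants, a vowel, and up to five coda consonants), the shape check can be sketched with a regular expression over a consonant/vowel skeleton. The helper names and the idea of passing in the vowel inventory as a set are assumptions made for illustration. The sketch checks only the overall shape, not which particular clusters are permitted.

```python
import re

# Up to three onset consonants, one vowel nucleus, up to five coda consonants.
SYLLABLE_TEMPLATE = re.compile(r"C{0,3}VC{0,5}")


def skeleton(phonemes, vowels):
    """Map each phoneme to 'V' if it is in `vowels`, otherwise to 'C'."""
    return "".join("V" if p in vowels else "C" for p in phonemes)


def fits_template(phonemes, vowels):
    """Return True if the whole phoneme sequence matches the template."""
    return SYLLABLE_TEMPLATE.fullmatch(skeleton(phonemes, vowels)) is not None


strengths = ["s", "t", "r", "ɛ", "ŋ", "k", "θ", "s"]
print(skeleton(strengths, {"ɛ"}))
print(fits_template(strengths, {"ɛ"}))
```

Here strengths has the skeleton CCCVCCCC, which fits the template with one coda slot to spare.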

When a stressed syllable contains a pure vowel (rather than a diphthong), followed by a single consonant and then another vowel, as in holiday, many native speakers feel that the consonant belongs to the preceding stressed syllable, /ˈhɒl.ɨ.deɪ/. However, when the stressed vowel is a long vowel or diphthong, as in admiration or pekoe, speakers agree that the consonant belongs to the following syllable: /ˈæd.mɨ.ˈreɪ.ʃən/, /ˈpiː.koʊ/. Wells (1990)[10] notes that consonants syllabify with the preceding rather than following vowel when the preceding vowel is the nucleus of a more salient syllable, with stressed syllables being the most salient, reduced syllables the least, and secondary stress / full unstressed vowels intermediate. But there are lexical differences as well, frequently with compound words but not exclusively.

For example, in dolphin and selfish, he argues that the stressed syllable ends in /lf/, but in shellfish, the /f/ belongs with the following syllable: /ˈdɒlf.ɪn/, /ˈsɛlf.ɪʃ/[ˈdɒlfɨn], [ˈsɛlfɨʃ] vs /ˈʃɛl.fɪʃ/[ˈʃɛlˑfɪʃ], where the /l/ is a little longer and the /ɪ/ not reduced.

Similarly, in toe-strap the /t/ is a full plosive, as usual in syllable onset, whereas in toast-rack the /t/ is in many dialects reduced to the unreleased allophone it takes in syllable codas, or even elided: /ˈtoʊ.stræp/, /ˈtoʊst.ræk/[ˈtʰoˑʊstɹæp], [ˈtoʊs(t̚)ɹʷæk]; likewise nitrate /ˈnaɪ.treɪt/[ˈnʌɪtɹ̥ʷeɪt] with a voiceless /r/, vs night-rate /ˈnaɪt.reɪt/[ˈnʌɪt̚ɹʷeɪt] with a voiced /r/. Cues of syllable boundaries include aspiration of syllable onsets and (in the US) flapping of coda /t, d/ (a tease /ə.ˈtiːz/[əˈtʰiːz] vs. at ease /æt.ˈiːz/[æɾˈiːz]), epenthetic plosives like [t] in syllable codas (fence /ˈfɛns/[ˈfɛnts] but inside /ɪn.ˈsaɪd/[ɪnˈsaɪd]), and r-colored vowels when the /r/ is in the coda vs. labialization when it is in the onset (key-ring /ˈkiː.rɪŋ/[ˈkʰiːɹʷɪŋ] but fearing /ˈfiːr.ɪŋ/[ˈfɪəɹɪŋ]).


The following can occur as the onset:

All single consonant phonemes except /ŋ/  
Plosive plus approximant other than /j/:

/pl/, /bl/, /kl/, /ɡl/, /pr/, /br/, /tr/,[1] /dr/,[1] /kr/, /ɡr/, /tw/, /dw/, /ɡw/, /kw/, /pw/

play, blood, clean, glove, prize, bring, tree,[1] dream,[1] crowd, green, twin, dwarf, language, quick, puissance
Voiceless fricative plus approximant other than /j/:[2]

/fl/, /sl/, /θl/,[3] /fr/, /θr/, /ʃr/, /hw/,[4] /sw/, /θw/, /vw/

floor, sleep, thlipsis,[3] friend, three, shrimp, what,[4] swing, thwart, reservoir
Consonant plus /j/ (before /uː/ or /ʊr/):

/pj/, /bj/, /tj/,[5] /dj/,[5] /kj/, /ɡj/, /mj/, /nj/,[5] /fj/, /vj/, /θj/,[5] /sj/,[5] /zj/,[5] /hj/, /lj/[5]

pure, beautiful, tube,[5] during,[5] cute, argue, music, new,[5] few, view, thew,[5] suit,[5] Zeus,[5] huge, lurid[5]
/s/ plus voiceless plosive:[6]

/sp/, /st/, /sk/

speak, stop, skill
/s/ plus nasal other than /ŋ/:[6]

/sm/, /sn/

smile, snow
/s/ plus voiceless fricative:[3]

/sf/, /sθ/

sphere, sthenic
/s/ plus voiceless plosive plus approximant:[6]

/spl/, /skl/,[3] /spr/, /str/, /skr/, /skw/, /smj/, /spj/, /stj/,[5] /skj/

split, sclera, spring, street, scream, square, smew, spew, student,[5] skewer
/s/ plus voiceless fricative plus approximant:[3]

/sfr/

sphragistics
  1. In many dialects, /tr/ and /dr/ tend to affricate, so that tree resembles "chree", and dream resembles "jream".[16][17][18] This is sometimes transcribed as [tʃr] and [dʒr] respectively, but the pronunciation varies and may, for example, be closer to [tʂ] and [dʐ][19] or with a fricative release similar in quality to the rhotic, i.e. [tɹ̝̊ɹ̥], [dɹ̝ɹ], or [tʂɻ], [dʐɻ].
  2. In some dialects[which?], /wr/ (rather than /r/) occurs in words beginning in wr- (write, wrong, wren, etc.).[citation needed]
  3. Words beginning in unusual consonant clusters that originated in Latinized Greek loanwords tend to drop the first phoneme, as in */bd/, */fθ/, */ɡn/, */hr/, */kn/, */ks/, */kt/, */kθ/, */mn/, */pn/, */ps/, */pt/, */tm/, and */θm/, which have become /d/ (bdellium), /θ/ (phthisis), /n/ (gnome), /r/ (rhythm), /n/ (cnidoblast), /z/ (xylophone), /t/ (ctenophore), /θ/ (chthonic), /n/ (mnemonic), /n/ (pneumonia), /s/ (psychology), /t/ (pterodactyl), /m/ (tmesis), and /m/ (asthma). However, the onsets /sf/, /sfr/, /skl/, /sθ/, and /θl/ have remained intact.
  4. The onset /hw/ is simplified to /w/ in many dialects (wine–whine merger).
  5. There is an on-going sound change (yod dropping) by which /j/ as the final consonant in a cluster is being lost. In RP, words with /sj/ and /lj/ can usually be pronounced with or without this sound, e.g., [suːt] or [sjuːt]. For some speakers of English, including some British speakers, the sound change is more advanced and so, for example, General American does not contain the onsets /tj/, /dj/, /nj/, /θj/, /sj/, /stj/, /zj/, or /lj/. Words that would otherwise begin in these onsets drop the /j/: e.g., tube (/tuːb/), during (/ˈdʊrɪŋ/), new (/nuː/), Thule (/ˈθuːliː/), suit (/suːt/), student (/ˈstuːdənt/), Zeus (/zuːs/), lurid (/ˈlʊrɪd/). In some dialects, such as Welsh English, /j/ may occur in more combinations; for example in /tʃj/ (chew), /dʒj/ (Jew), /ʃj/ (sure), and /slj/ (slew).
  6. Many clusters beginning with /ʃ/ and paralleling native clusters beginning with /s/ are found initially in German and Yiddish loanwords, such as /ʃl/, /ʃp/, /ʃt/, /ʃm/, /ʃn/, /ʃpr/, /ʃtr/ (in words such as schlep, spiel, shtick, schmuck, schnapps, Shprintzen's, strudel). /ʃw/ is found initially in the Hebrew loanword schwa. Before /r/ however, the native cluster is /ʃr/. The opposite cluster /sr/ is found in loanwords such as Sri Lanka, but this can be nativized by changing it to /ʃr/.
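The yod dropping described in note 5 can be sketched as a lookup over onset clusters. The set below simply lists the onsets that the note says are absent from General American; the function name drop_yod and the list-of-phonemes representation are assumptions made for illustration.

```python
# Onsets that, according to note 5, do not occur in General American.
GA_YOD_DROPPING_ONSETS = {
    ("t", "j"), ("d", "j"), ("n", "j"), ("θ", "j"),
    ("s", "j"), ("s", "t", "j"), ("z", "j"), ("l", "j"),
}


def drop_yod(onset):
    """Return the onset with its final /j/ removed where yod dropping applies."""
    cluster = tuple(onset)
    if cluster in GA_YOD_DROPPING_ONSETS:
        return list(cluster[:-1])
    # Other onsets, including /kj/ as in cute, are left unchanged.
    return list(cluster)


print(drop_yod(["t", "j"]))       # onset of tube
print(drop_yod(["s", "t", "j"]))  # onset of student
print(drop_yod(["k", "j"]))       # onset of cute
```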
Other onsets

Certain English onsets appear only in contractions: e.g., /zbl/ ('sblood), and /zw/ or /dzw/ ('swounds or 'dswounds). Some, such as /pʃ/ (pshaw), /fw/ (fwoosh), or /vr/ (vroom), can occur in interjections. An archaic voiceless fricative plus nasal exists, /fn/ (fnese), as does an archaic /snj/ (snew).

A few other onsets occur in further (anglicized) loan words, including /bw/ (bwana), /mw/ (moiré), /nw/ (noire), /zw/ (zwieback), /kv/ (kvetch), /ʃv/ (schvartze), /tv/ (Tver), /vl/ (Vladimir), and /zl/ (zloty).

Some clusters of this type can be converted to regular English phonotactics by simplifying the cluster: e.g. /(d)z/ (dziggetai), /(h)r/ (Hrolf), /kr(w)/ (croissant), /(p)f/ (pfennig), /(f)θ/ (phthalic), and /(t)s/ (tsunami).

Others can be substituted by native clusters differing only in voice: /zb ~ sp/ (sbirro), and /zɡr ~ skr/ (sgraffito).


The following can occur as the nucleus:


Most (in theory, all) of the following except those that end with /s/, /z/, /ʃ/, /ʒ/, /tʃ/ or /dʒ/ can be extended with /s/ or /z/ representing the morpheme -s/z-. Similarly, most (in theory, all) of the following except those that end with /t/ or /d/ can be extended with /t/ or /d/ representing the morpheme -t/d-.
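The two extension rules above can be sketched as follows. The choice between /s/ and /z/ (and between /t/ and /d/) is made here by voicing agreement with the preceding sound, which is the usual description but is not spelled out in the text, so it should be read as an assumption of the sketch. The function names are hypothetical, and codas that cannot be extended in this way simply return None.

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def _is_voiceless(last):
    # An empty coda means the preceding sound is a vowel, which is voiced.
    return last in VOICELESS


def extend_with_s(coda):
    """Append the -s/z- morpheme to a coda, or return None after a sibilant."""
    last = coda[-1] if coda else None
    if last in SIBILANTS:
        return None
    return list(coda) + ["s" if _is_voiceless(last) else "z"]


def extend_with_t(coda):
    """Append the -t/d- morpheme to a coda, or return None after /t/ or /d/."""
    last = coda[-1] if coda else None
    if last in {"t", "d"}:
        return None
    return list(coda) + ["t" if _is_voiceless(last) else "d"]


print(extend_with_s(["l", "p"]))   # coda of helps
print(extend_with_s(["n", "d"]))   # coda of ends
print(extend_with_t(["s", "k"]))   # coda of asked
print(extend_with_s(["n", "tʃ"]))  # lunch: not extendable with bare /s/ or /z/
```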

Wells (1990) argues that a variety of syllable codas are possible in English, even /ntr, ndr/ in words like entry /ˈɛntr.ɪ/ and sundry /ˈsʌndr.ɪ/, with /tr, dr/ being treated as affricates along the lines of /tʃ, dʒ/. He argues that the traditional assumption that pre-vocalic consonants form a syllable with the following vowel is due to the influence of languages like French and Latin, where syllable structure is CVC.CVC regardless of stress placement. Disregarding such contentious cases, which do not occur at the ends of words, the following sequences can occur as the coda:

The single consonant phonemes except /h/, /w/, /j/ and, in non-rhotic varieties, /r/
Lateral approximant + plosive or affricate: /lp/, /lb/, /lt/, /ld/, /ltʃ/, /ldʒ/, /lk/ (help, bulb, belt, hold, belch, indulge, milk)
In rhotic varieties, /r/ + plosive or affricate: /rp/, /rb/, /rt/, /rd/, /rtʃ/, /rdʒ/, /rk/, /rɡ/ (harp, orb, fort, beard, arch, large, mark, morgue)
Lateral approximant + fricative: /lf/, /lv/, /lθ/, /ls/, /lʃ/ (golf, solve, wealth, else, Welsh)
In rhotic varieties, /r/ + fricative: /rf/, /rv/, /rθ/, /rs/, /rz/, /rʃ/ (dwarf, carve, north, force, Mars, marsh)
Lateral approximant + nasal: /lm/, /ln/ (film, kiln)
In rhotic varieties, /r/ + nasal or lateral: /rm/, /rn/, /rl/ (arm, born, snarl)
Nasal + homorganic plosive or affricate: /mp/, /nt/, /nd/, /ntʃ/, /ndʒ/, /ŋk/ (jump, tent, end, lunch, lounge, pink)
Nasal + fricative: /mf/, /mθ/, /nθ/, /ns/, /nz/, and in some varieties /ŋθ/ (triumph, gloomth, month, prince, bronze, length)
Voiceless fricative + voiceless plosive: /ft/, /sp/, /st/, /sk/ (left, crisp, lost, ask)
Two voiceless fricatives: /fθ/ (fifth)
Two voiceless plosives: /pt/, /kt/ (opt, act)
Plosive + voiceless fricative: /pθ/, /ps/, /tθ/, /ts/, /dθ/, /dz/, /ks/ (depth, lapse, eighth, klutz, width, adze, box)
Lateral approximant + two consonants: /lpt/, /lfθ/, /lts/, /lst/, /lkt/, /lks/ (sculpt, twelfth, waltz, whilst, mulct, calx)
In rhotic varieties, /r/ + two consonants: /rmθ/, /rpt/, /rps/, /rts/, /rst/, /rkt/ (warmth, excerpt, corpse, quartz, horst, infarct)
Nasal + homorganic plosive + plosive or fricative: /mpt/, /mps/, /ndθ/, /ŋkt/, /ŋks/, and in some varieties /ŋkθ/ (prompt, glimpse, thousandth, distinct, jinx, length)
Three obstruents: /ksθ/, /kst/ (sixth, next)

Note: For some speakers, a fricative before /θ/ is elided so that these never appear phonetically: /ˈfɪfθ/ becomes [ˈfɪθ], /ˈsɪksθ/ becomes [ˈsɪkθ], /ˈtwɛlfθ/ becomes [ˈtwɛlθ].

Syllable-level rules

  • Both the onset and the coda are optional
  • /j/ at the end of an onset cluster (/pj/, /bj/, /tj/, /dj/, /kj/, /fj/, /vj/, /θj/, /sj/, /zj/, /hj/, /mj/, /nj/, /lj/, /spj/, /stj/, /skj/) must be followed by /uː/ or /ʊə/
  • Long vowels and diphthongs are not found before /ŋ/ except for the mimetic word boing![20]
  • /ʊ/ is rare in syllable-initial position[21]
  • Stop + /w/ before /uː, ʊ, ʌ, aʊ/ (all presently or historically /u(ː)/) are excluded[22]
  • Sequences of /s/ + C1 + V̆ + C1, where C1 is a consonant other than /t/ and V̆ is a short vowel, are virtually nonexistent[22]
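One of the syllable-level rules above, the restriction on /j/ at the end of an onset cluster, can be sketched as a simple check. The function name and the representation of a syllable as an onset list plus a nucleus string are assumptions made for illustration, and the other rules in the list are not covered.

```python
# Nuclei that may follow an onset cluster ending in /j/, per the rule above.
YOD_NUCLEI = {"uː", "ʊə"}


def yod_onset_allowed(onset, nucleus):
    """Return False only when a cluster ending in /j/ precedes another vowel."""
    ends_in_yod_cluster = len(onset) >= 2 and onset[-1] == "j"
    return (not ends_in_yod_cluster) or nucleus in YOD_NUCLEI


print(yod_onset_allowed(["k", "j"], "uː"))  # as in cute
print(yod_onset_allowed(["k", "j"], "æ"))   # excluded by the rule
print(yod_onset_allowed(["j"], "ɛ"))        # single /j/ as in yes is not a cluster
```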

Word-level rules

  • /ə/ does not occur in stressed syllables
  • /ʒ/ does not occur in word-initial position in native English words although it can occur syllable-initial, e.g., luxurious /lʌɡˈʒʊəriəs/
  • /m/, /n/, /l/ and, in rhotic varieties, /r/ can be the syllable nucleus (i.e. a syllabic consonant) in an unstressed syllable following another consonant, especially /t/, /d/, /s/ or /z/
  • Certain short vowel sounds, called checked vowels, cannot occur without a coda in a single-syllable word. In RP, the following short vowel sounds are checked: /ɪ/, /ɛ/, /æ/, /ɒ/, /ʌ/, and /ʊ/.
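The checked-vowel restriction above can be sketched as a check on a single-syllable word. The set reproduces the RP checked vowels listed in the rule; the function name and the onset/nucleus/coda representation are assumptions made for illustration.

```python
# Checked vowels of RP, as listed in the rule above.
CHECKED_VOWELS_RP = {"ɪ", "ɛ", "æ", "ɒ", "ʌ", "ʊ"}


def is_possible_monosyllable(onset, nucleus, coda):
    """Return False if a checked vowel would end a single-syllable word."""
    return not (nucleus in CHECKED_VOWELS_RP and len(coda) == 0)


print(is_possible_monosyllable(["b"], "ɪ", ["t"]))  # bit
print(is_possible_monosyllable(["b"], "ɪ", []))     # no coda after a checked vowel
print(is_possible_monosyllable(["b"], "iː", []))    # bee
```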

History of English pronunciation

English consonants have been remarkably stable over time, and have undergone few changes in the last 1500 years. On the other hand, English vowels have been quite unstable. Not surprisingly, then, the main differences between modern dialects almost always involve vowels.

Around the late 14th century, English began to undergo the Great Vowel Shift, in which

  • The high long vowels [iː] and [uː] in words like price and mouth became diphthongized, first to [əɪ] and [əʊ] (where they remain today in some environments in some accents such as Canadian English) and later to their modern values [aɪ] and [aʊ]. This is not unique to English, as this also happened in Dutch (first shift only) and German (both shifts).
  • The other long vowels became higher:
    • [eː] became [iː] (for example meet).
    • [aː] became [eː] (later diphthongized to [eɪ], for example name).
    • [oː] became [uː] (for example goose).
    • [ɔː] became [oː] (later diphthongized to [oʊ], for example bone).
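The shift listed above can be sketched as a lookup table from the earlier long vowels to their modern values. The intermediate stages mentioned in the list ([əɪ], [əʊ], [eː], [oː]) are skipped, and the names are assumptions made for illustration. The point of the sketch is that the mapping is applied once to each vowel, so that earlier [eː] ends up as [iː] and is not shifted again.

```python
# Earlier long vowel -> modern value, following the list above.
GREAT_VOWEL_SHIFT = {
    "iː": "aɪ",  # price
    "uː": "aʊ",  # mouth
    "eː": "iː",  # meet
    "aː": "eɪ",  # name
    "oː": "uː",  # goose
    "ɔː": "oʊ",  # bone
}


def shift(vowels):
    """Apply the shift once to each vowel; other sounds pass through unchanged."""
    return [GREAT_VOWEL_SHIFT.get(v, v) for v in vowels]


print(shift(["eː", "iː", "oː"]))
```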

Later developments complicate the picture: whereas in Geoffrey Chaucer's time food, good, and blood all had the vowel [oː] and in William Shakespeare's time they all had the vowel [uː], in modern pronunciation good has shortened its vowel to [ʊ] and blood has shortened and lowered its vowel to [ʌ] in most accents. In Shakespeare's day (late 16th-early 17th century),[23] many rhymes were possible that no longer hold today.[24] For example, in his play The Taming of the Shrew, shrew rhymed with woe.[25]

Dialectal differences


æ-tensing is a phenomenon found in many varieties of American English by which the vowel /æ/ has a longer, higher, and usually diphthongal pronunciation in some environments, usually to something like [eə]. Some American accents, for example those of New York City, Philadelphia, and Baltimore, make a marginal phonemic distinction between /æ/ and /eə/ although the two occur largely in mutually exclusive environments.

Bad–lad split

The bad–lad split refers to the situation in some varieties of southern British English and Australian English, where a long phoneme /æː/ in words like bad contrasts with a short /æ/ in words like lad.

Cot–caught merger

The cot–caught merger is a sound change by which the vowel of words like caught, talk, and tall (/ɔ/) is pronounced the same as the vowel of words like cot, rock, and doll (/ɒ/ in New England, /ɑː/ elsewhere). This merger is widespread in North American English, being found in approximately 40% of American speakers and virtually all Canadian speakers.

Father–bother merger

The father–bother merger is the pronunciation of the short O /ɒ/ in words such as "bother" identically to the broad A /ɑː/ of words such as "father", nearly universal in all of the United States and Canada save New England and the Maritime provinces; many American dictionaries use the same symbol for these vowels in pronunciation guides.

