Standard Chinese phonology

The phonology of Standard Chinese is reproduced below. Actual production varies widely among speakers, as people inadvertently introduce elements of their native dialects. By contrast, television and radio announcers are chosen for their pronunciation accuracy and standard accent.

Consonants

The following is the consonant inventory of Standard Chinese, transcribed in the International Phonetic Alphabet (IPA):

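Manner | Bilabial | Labiodental | Alveolar | Retroflex | Alveolo-palatal | Palatal | Velar
Nasal | m | | n | | | | ŋ
Plosive | p pʰ | | t tʰ | | | | k kʰ
Affricate | | | t͡s t͡sʰ | t͡ʂ t͡ʂʰ | t͡ɕ t͡ɕʰ ² | |
Fricative | | f | s | ʂ ɻ~ʐ ¹ | ɕ ² | | x
Approximant | | | l | | | (j) (ɥ) ³ | (w) ³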

All of these except /ŋ/ occur in syllable onsets (as "initials"), whereas only /n/, /ŋ/, and /ɻ/ occur as syllable codas. In rapid speech, [m] may occur as an allophone of /n/ before [p], [pʰ], and [m].

  1. /ɻ/ is often transcribed as [ʐ] (a voiced retroflex fricative). This represents a variation in pronunciation among different speakers, rather than two different phonemes.
  2. These are not always considered independent phonemes. See below.
  3. These are commonly viewed not as independent phonemes but as either (1) consonantal allophones of "medial" high vowels (i.e. when another vowel follows); or (2) epenthetic (automatically inserted) glides before "main" high vowels (i.e. not followed by another vowel).

The retroflex consonants are flat apical postalveolar (Ladefoged & Wu 1984; Ladefoged & Maddieson 1996:150-154). See retroflex consonants.

The alveolo-palatal consonants [t͡ɕ t͡ɕʰ ɕ] are in complementary distribution with the alveolar consonants [t͡s t͡sʰ s], the retroflex consonants [t͡ʂ t͡ʂʰ ʂ], and the velar consonants [k kʰ x], from which they historically derive: the alveolo-palatals occur only before the high front vowels [i] and [y] (or the glides [j] and [ɥ]), where none of the other three series occurs. As a result, linguists often prefer to classify [t͡ɕ t͡ɕʰ ɕ] as allophones of one of the other three series. The Yale and Wade-Giles systems mostly treat the palatals as allophones of the retroflex consonants; Tongyong Pinyin mostly treats them as allophones of the alveolars; and Chinese braille treats them as allophones of the velars. Hanyu Pinyin, however, treats them as a separate series.
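As an illustration only (it is not taken from any of the systems named above), the Python sketch below encodes that complementary distribution under one chosen analysis: an underlying onset from a "host" series surfaces as an alveolo-palatal before the high front vowels [i], [y] or the glides [j], [ɥ], and is unchanged elsewhere. The names `SERIES`, `PALATAL`, `FRONT_HIGH` and `surface_onset` are hypothetical, and segments are plain IPA strings.

```python
# Each series is listed as (unaspirated, aspirated, fricative) so that
# position i lines up with position i in the alveolo-palatal series.
SERIES = {
    "velar": ["k", "kʰ", "x"],
    "alveolar": ["t͡s", "t͡sʰ", "s"],
    "retroflex": ["t͡ʂ", "t͡ʂʰ", "ʂ"],
}
PALATAL = ["t͡ɕ", "t͡ɕʰ", "ɕ"]

# High front vowels and their glides: the only environment for [t͡ɕ t͡ɕʰ ɕ].
FRONT_HIGH = {"i", "y", "j", "ɥ"}


def surface_onset(onset, next_segment, host="velar"):
    """Return the surface onset for an underlying onset.

    Onsets belonging to the host series are realized as alveolo-palatals
    before FRONT_HIGH segments; every other onset is returned unchanged.
    """
    series = SERIES[host]
    if onset in series and next_segment in FRONT_HIGH:
        return PALATAL[series.index(onset)]
    return onset


# The same surface syllable [ɕi] derived under two different analyses.
print(surface_onset("x", "i", host="velar"))     # ɕ
print(surface_onset("s", "i", host="alveolar"))  # ɕ
```

Choosing `host="velar"` mirrors the Chinese braille treatment, `"alveolar"` the Tongyong Pinyin one, and `"retroflex"` the Yale and Wade-Giles one; the surface forms come out identical in every case, which is why the choice is a matter of analysis rather than pronunciation.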

The collapse of the velar and alveolar sibilant series into the alveolo-palatal series in palatalizing environments happened only a few centuries ago. Before then, some instances of modern [t͡ɕ(ʰ)i] were instead [k(ʰ)i], and others were [t͡s(ʰ)i]. The change took place in the last two or three centuries, at different times in different areas, but not in the dialect used at the imperial court of the Manchu dynasty. This explains why some European transcriptions of Chinese names (especially in postal map spelling) contain "ki-", "hi-", "tsi-" or "si-". Examples are "Peking" for Beijing, "Chungking" for Chongqing, "Fukien" for Fujian (a province), "Tientsin" for Tianjin, "Sinkiang" for Xinjiang, and "Sian" for Xi'an. The complementary distribution with the retroflex series arose when syllables consisting of a retroflex consonant followed by a medial glide lost the glide.

[t͡ɕ t͡ɕʰ ɕ] may be pronounced [t͡sj t͡sʰj sj],[citation needed] which is characteristic of the speech of young women and also of some men.[citation needed] This is considered rather effeminate[citation needed] and may also be substandard.[citation needed]

The null initial, written as an apostrophe in pinyin word-medially, is most commonly realized as [ɰ][citation needed], though [n], [ŋ], [ɣ], and [ʔ] are common in nonstandard Mandarin dialects; some of these correspond to null in Standard Chinese but contrast with it in their dialect.[citation needed]

Vowels

Standard Chinese has approximately half a dozen vowels. Phonetically, the following phones may be distinguished:

  • [a], in the sequences [an], [wan]
  • [ä], in [ä], [jä], [wä], [äɪ̯], [wäɪ̯], [äʊ̯], [jäʊ̯] (depending on whether the following sound is front or back, some speakers pronounce it closer to [a] or [ɑ], respectively)
  • [ɑ], in [ɑŋ], [jɑŋ], [wɑŋ]
  • [e], in [eɪ̯], [weɪ̯] (some speakers pronounce it closer to [e̽])
  • [ɛ], in [jɛ], [jɛn] (and an interjection [ɛ])
  • [œ̜], in [ɥœ̜], [ɥœ̜n]
  • [ɤ], in [ɤ], [ɤŋ], [wɤŋ]
  • [o], in [oʊ̯], [joʊ̯] (some speakers pronounce it closer to [ɤ̹])
  • [ɔ], in [wɔ] (and an interjection [ɔ]) (some speakers pronounce it with a closer quality)
  • [ə], in [ən], [wən]
  • [ʌ], as the bare syllabic nucleus [ɰʌ] (rare)
  • [z̩], as the bare syllabic nucleus [z̩] after the alveolar sibilants /t͡s t͡sʰ s/. Despite the transcription, it is not actually a syllabic fricative: its actual pronunciation is usually neither sibilant nor fricative, though it is still alveolar.
  • [ʐ̩], as the bare syllabic nucleus [ʐ̩] after the retroflex sibilants /t͡ʂ t͡ʂʰ ʂ ʐ/. Its actual pronunciation is usually neither sibilant nor fricative, though it is still retroflex.
  • [i], in [i], [in], [iŋ] (some speakers insert a vowel between [i] and the following nasal in [in] and [iŋ])
  • [ʊ], in [ʊŋ], [jʊŋ]
  • [u], in [u]
  • [y], in [y], [yn] (some speakers insert a vowel between [y] and [n])

At first glance, these would appear to constitute a system of eight phonemes: /a/ ([a ~ ä ~ ɑ]), /e/ ([e ~ ɛ ~ œ]), /o/ ([o ~ ɔ]), /ə/ ([ə ~ ɤ ~ ʌ]), /ɨ/ ([z̩ ~ ʐ̩]), /i/ ([i]), /u/ ([ʊ ~ u]), and /y/ ([y]). However, the mid vowels /e/, /o/, /ə/ are in complementary distribution and can therefore be treated as a single phoneme /ə/. Exceptions include exclamations, which can be treated as outside the core system (much as "hmm", "unh-unh", "shhh!" and other English exclamations violate the usual syllable constraints): [ɛ] – [ɔ] (e.g. the interjections 喔, 哦 and 噢) – [ɰʌ] (e.g. 饿 "hungry", 鹅 "goose"), [jɛ] (e.g. 夜 "night", 爷 "grandfather") – [jɔ] (e.g. the interjection 哟), [lə] (e.g. 乐 "glad") – [lo] (e.g. the interjection 咯). Disregarding these exceptions, the mid vowels merge and the result is a six-vowel system.

It would also be possible to merge /ɨ/ and /i/, which are historically related, since they are also in complementary distribution, provided that the alveolo-palatal series is either left unmerged, or merged with the velars rather than the retroflex or alveolar series. (That is, [t͡ɕi], [t͡sɨ] and [t͡ʂɨ] all exist, but there is neither *[ki] nor *[kɨ], so there is no problem merging both [i]~[ɨ] and [k]~[t͡ɕ] at the same time.) The result is a five-vowel system of /a/, /ə/, /i/, /u/, and /y/.
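The complementary-distribution argument can be checked mechanically. The Python sketch below, a hypothetical helper rather than a standard tool, lists the onsets that precede [ɨ] and [i] and tests whether the two sets overlap; it also shows why the merger only works if the alveolo-palatals are not reassigned to the alveolar series. The onset lists are a simplification that considers only the immediately preceding consonant.

```python
# Onsets that occur before each phone (Ø = null initial). Only the
# immediately preceding consonant is considered.
ENVIRONMENTS = {
    "ɨ": {"t͡s", "t͡sʰ", "s", "t͡ʂ", "t͡ʂʰ", "ʂ", "ɻ"},
    "i": {"p", "pʰ", "m", "t", "tʰ", "n", "l", "t͡ɕ", "t͡ɕʰ", "ɕ", "Ø"},
}

# Two hypothetical ways of reassigning the alveolo-palatals to another series.
TO_ALVEOLAR = {"t͡ɕ": "t͡s", "t͡ɕʰ": "t͡sʰ", "ɕ": "s"}
TO_VELAR = {"t͡ɕ": "k", "t͡ɕʰ": "kʰ", "ɕ": "x"}


def relabel(onsets, merger):
    """Rewrite a set of onsets according to a merger mapping."""
    return {merger.get(c, c) for c in onsets}


def complementary(env_a, env_b):
    """Two phones are in complementary distribution if no onset admits both."""
    return env_a.isdisjoint(env_b)


print(complementary(ENVIRONMENTS["ɨ"], ENVIRONMENTS["i"]))                        # True
print(complementary(ENVIRONMENTS["ɨ"], relabel(ENVIRONMENTS["i"], TO_ALVEOLAR)))  # False
print(complementary(ENVIRONMENTS["ɨ"], relabel(ENVIRONMENTS["i"], TO_VELAR)))     # True
```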

The medials /j, w, ɥ/ can also be merged to the high vowels /i, u, y/ — there is no ambiguity in interpreting a sequence like [jɑʊ̯] as /iau/, and potentially problematic sequences such as */iu/ never occur. This results in a minimal system with 19 consonants and 5 vowels.
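A minimal sketch of this glide-to-vowel reinterpretation in Python, assuming the surface form is supplied as a list of phones so that diacritics such as the non-syllabic mark stay attached to their base letter. The mapping table `PHONEME` and the function `phonemicize` are illustrative assumptions that cover only the medials, the offglides and the allophones of /a/; mid-vowel allophony is left out because it depends on context (compare the /jan/ analysis of [jɛn] discussed with the finals table).

```python
# Surface phone -> phoneme in the five-vowel, glide-free analysis.
# Phones not listed (consonants, mid vowels) pass through unchanged.
PHONEME = {
    "j": "i", "w": "u", "ɥ": "y",   # medial glides are high vowels
    "ɪ̯": "i", "ʊ̯": "u",             # non-syllabic offglides likewise
    "a": "a", "ä": "a", "ɑ": "a",   # allophones of /a/
}


def phonemicize(phones):
    """Map a list of surface phones to a phonemic string between slashes."""
    return "/" + "".join(PHONEME.get(p, p) for p in phones) + "/"


print(phonemicize(["j", "ɑ", "ʊ̯"]))  # /iau/
print(phonemicize(["w", "a", "n"]))   # /uan/
```

The first call reproduces the [jɑʊ̯] → /iau/ example from the text.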

An alternative and potentially more abstract system that sometimes appears in the linguistic literature (e.g. in Mantaro Hashimoto and Edwin Pulleyblank)[1] takes the opposite approach, analyzing the vowels /i/, /u/ and /y/ as the surface form of the glides /j, w, ɥ/ combined with a null meta-phoneme Ø. In this system, shown below, there are just two vowel nuclei, /a/ and /ə/; the various allophones result from a preceding glide /j, w, ɥ/ (or null) and a coda /i~j, u~w, n, ŋ/ (or null; see erhua for the additional sequences afforded by the rhotic coda /ɻ/). (The minimal vowel /ɨ/ is the surface realization of a syllable whose medial, nucleus and coda are all null; e.g. an underlying syllable /s/ is pronounced [sɨ].)

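Nucleus | Coda | Medial Ø | Medial j | Medial w | Medial ɥ
a | Ø | ä | jä | wä |
a | i | aɪ̯ | | waɪ̯ |
a | u | äʊ̯ | jäʊ̯ | |
a | n | an | jɛn | wan | ɥœ̜n
a | ŋ | ɑŋ | jɑŋ | wɑŋ |
ə | Ø | ɤ | jɛ | wo ¹ | ɥœ̜ ²
ə | i | eɪ̯ | | weɪ̯ |
ə | u | oʊ̯ | joʊ̯ | |
ə | n | ən | in | wən | yn
ə | ŋ | ɤŋ | iŋ | wɤŋ ~ ʊŋ ³ | jʊŋ
Ø | Ø | z̩~ʐ̩ | i | u | y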

¹ Both pinyin and zhuyin have an additional "o", used after "b p m f", which is distinguished from "uo", used after everything else. "o" is generally put into the first column instead of the third. However, in Beijing pronunciation, these are identical.
² Another way to represent the four finals of this line is: [ɰʌ jɛ wɔ ɥœ], which reflects Beijing pronunciation.
³ /wɤŋ/ is pronounced [ʊŋ] when it follows an initial.
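For readers who want to experiment with the two-nucleus analysis, the table can be treated as a lookup keyed by nucleus, coda and medial. The Python sketch below is hypothetical and deliberately covers only some rows (the /a/ rows with a coda, the /ə/ rows with codas i, u and n, and the null row); empty cells raise `KeyError`, reflecting gaps in the system.

```python
# Surface finals keyed by (nucleus, coda), then by medial ("Ø" = null).
FINALS = {
    ("a", "i"): {"Ø": "aɪ̯", "w": "waɪ̯"},
    ("a", "u"): {"Ø": "äʊ̯", "j": "jäʊ̯"},
    ("a", "n"): {"Ø": "an", "j": "jɛn", "w": "wan", "ɥ": "ɥœ̜n"},
    ("a", "ŋ"): {"Ø": "ɑŋ", "j": "jɑŋ", "w": "wɑŋ"},
    ("ə", "i"): {"Ø": "eɪ̯", "w": "weɪ̯"},
    ("ə", "u"): {"Ø": "oʊ̯", "j": "joʊ̯"},
    ("ə", "n"): {"Ø": "ən", "j": "in", "w": "wən", "ɥ": "yn"},
    ("Ø", "Ø"): {"Ø": "z̩~ʐ̩", "j": "i", "w": "u", "ɥ": "y"},
}


def realize(nucleus, coda, medial="Ø"):
    """Return the surface final; raises KeyError for an empty cell."""
    return FINALS[(nucleus, coda)][medial]


print(realize("ə", "n", "j"))  # in   (glide j + nucleus ə + coda n)
print(realize("Ø", "Ø", "ɥ"))  # y    (the glide alone surfaces as a vowel)
```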

The sequence [jɛn] can be considered to be phonemically either /jən/ or /jan/; likewise [ɥɛn] could be either /ɥən/ or /ɥan/. Since [jɛn] and [ɥɛn] become [jɐɻ] and [ɥɐɻ] with the addition of a suffix /ɻ/, the latter interpretation is generally preferred.

Syllables

Syllables in Standard Chinese have the maximal form CGVCT, where the first C is the initial consonant; G is one of the glides /j w ɥ/; V is a vowel (or diphthong); the second C is a coda, /n ŋ ɻ/ (if diphthongs like ou, ai are analyzed as V) or /n ŋ ɻ j w/ (if not); and T is the tone. In traditional Chinese phonology, C is called the "initial", G the "medial", and VCT the "final" or "rime"; sometimes the medial is considered part of the rime.
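The CGVCT template can be approximated on pinyin spelling. The following Python sketch is an assumption-laden illustration, not a complete pinyin parser: the hypothetical `parse_syllable` takes numbered pinyin (e.g. `zhuang4`), restores the full finals hidden by the spellings y-, w-, iu, ui, un and by the dropped umlaut after j, q, x, and then treats a leading i, u or ü followed by another vowel letter as the medial G. The coda is not split from the rime, and the analysis follows the spelling, so for example iong is returned with medial i.

```python
# Pinyin initials; two-letter initials come first so "zh" is not read as "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def parse_syllable(syllable):
    """Split numbered pinyin into (initial, medial, rime, tone).

    An unmarked syllable is given tone 5 (neutral).
    """
    tone = 5
    if syllable[-1].isdigit():
        syllable, tone = syllable[:-1], int(syllable[-1])
    s = syllable.lower().replace("v", "ü")  # "v" is a keyboard stand-in for "ü"

    initial = next((i for i in INITIALS if s.startswith(i)), "")
    s = s[len(initial):]

    if not initial:
        # y- and w- only spell a null initial; restore the full final.
        for spelled, final in (("yu", "ü"), ("yi", "i"), ("y", "i"),
                               ("wu", "u"), ("w", "u")):
            if s.startswith(spelled):
                s = final + s[len(spelled):]
                break
    else:
        if initial in ("j", "q", "x") and s.startswith("u"):
            s = "ü" + s[1:]  # ju, xue, quan, jun are written without the umlaut
        s = {"iu": "iou", "ui": "uei", "un": "uen"}.get(s, s)  # undo abbreviations

    # A high vowel followed by another vowel letter is the medial (G).
    if len(s) > 1 and s[0] in "iuü" and s[1] in "aeo":
        return initial, s[0], s[1:], tone
    return initial, "", s, tone


print(parse_syllable("zhuang4"))  # ('zh', 'u', 'ang', 4)
print(parse_syllable("gui4"))     # ('g', 'u', 'ei', 4)
```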

Not counting tone distinctions or the rhotic coda, there are some 35 finals in Standard Chinese.

The rhotic coda

Standard Chinese also uses a rhotic consonant, /ɻ/. This usage is a unique feature of Mandarin dialects, especially the Beijing dialect; other dialects lack this sound.[dubious] In Chinese, this feature is known as erhua. It is used in two cases:

  1. In a small number of words, such as 二 èr "two", 耳 ěr "ear", etc. All of these words are pronounced [ɑɻ] with no initial consonant.
  2. As a noun suffix -兒/-儿 -r. The suffix combines with the final, and regular but complex changes occur as a result.

The "r" final must be distinguished from the retroflex consonant written <ri> in pinyin and [ʐ] in IPA. "The star rode a donkey" in some rhotic English accents and 我女兒入醫院/我女儿入医院 Wǒ nǚ'ér rù yīyuàn "My daughter entered/enters the hospital" in Standard Chinese both have a first r pronounced with a relatively lax tongue and a second r involving active retraction of the tongue and contact with the roof of the mouth.

In other Mandarin dialects, the rhotic consonant is sometimes replaced by another syllable, such as li, in words that indicate locations. For example, 這兒/这儿 zhèr "here" and 那兒/那儿 nàr "there" become 這裡/这里 zhèli and 那裡/那里 nàli, respectively.


Tones

Figure: Relative pitch changes of the four tones

Standard Chinese, like all Chinese dialects, is a tonal language. This means that tones, just like consonants and vowels, are used to distinguish words from each other. Many non-native learners have difficulty mastering the tone of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that differ only by tone (i.e. are minimal pairs with respect to tone). Measured by functional load, tones are as important as vowels in distinguishing words in Standard Chinese.[2] The following are the four tones of Standard Chinese:

Tone chart of Standard Chinese
Tone name | Yin Ping | Yang Ping | Shang | Qu
Tone number | 1 | 2 | 3 | 4
Pinyin diacritic | ā | á | ǎ | à
Tone letter | ˥ (55) | ˧˥ (35) | ˨˩, ˨˩˦ (21, 214) | ˥˩ (51)
IPA diacritic | á | ǎ | à, a᷉ | â
  1. First tone, or high-level tone (陰平/阴平 yīnpíng, literal meaning: "dark level"):
    a steady high pitch, as if the syllable were being sung instead of spoken.
  2. Second tone, or rising tone (陽平/阳平 yángpíng, literal meaning: "light level"), more specifically high-rising:
    rises from mid to high pitch, like the intonation of the English question "What?!"
  3. Third tone, low or dipping tone (上 shǎng,[3][4] literal meaning: "rising"):
    descends from mid-low to low; at the end of a sentence or before a pause, it is then followed by a rise. Between other tones it may simply be low.
  4. Fourth tone, falling tone, or high-falling (去 qù, literal meaning: "departing"):
    falls sharply from high to low and is shorter than the other tones, like the intonation of a curt command such as "Stop!"
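The tone letters in the chart encode pitch on Chao's five-level scale (1 lowest, 5 highest). A small hypothetical helper, sketched in Python below, converts the numeric contours into tone letters; it collapses repeated levels so that 55 is written ˥, as in the chart, although some sources write a level tone with a doubled letter instead.

```python
from itertools import groupby

# Chao tone letters for pitch levels 1 (lowest) to 5 (highest).
TONE_LETTERS = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}

# Citation contours of the four tones, as given in the tone chart.
CONTOURS = {1: "55", 2: "35", 3: "214", 4: "51"}


def chao_letters(digits):
    """Convert a Chao pitch number such as '214' into tone letters."""
    # Collapse repeated levels so a level tone like 55 gets a single letter.
    levels = [d for d, _ in groupby(digits)]
    return "".join(TONE_LETTERS[d] for d in levels)


for tone, digits in CONTOURS.items():
    print(tone, digits, chao_letters(digits))
```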

Neutral tone

Also called fifth tone or zeroth tone (in Chinese: 輕聲/轻声 qīng shēng, literal meaning: "light tone"), neutral tone is sometimes thought of as a lack of tone. It usually comes at the end of a word or phrase, and is pronounced in a light and short manner. Because of this characteristic, and because there is no standard rule for whether a syllable has a neutral tone, it is considered analogous to an unstressed syllable. The neutral tone has a large number of allophones: its pitch depends almost entirely on the tone of the preceding syllable. The situation is further complicated by the amount of dialectal variation associated with it; in some regions, notably Taiwan, the neutral tone is relatively uncommon.

Despite many examples of minimal pairs (for example, 要是 yàoshì "if" and 钥匙 yàoshi "key"), it is sometimes described as something other than a full-fledged tone for technical reasons: some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is intuitively appealing because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory incompletely characterizes the neutral tone, especially in sequences where two or more neutral-tone syllables occur in a row.[5]

The following are from Beijing dialect.[6] Other dialects may be slightly different.

Realization of neutral tones
Tone of first syllable | Pitch of neutral tone | Example | Pinyin | English meaning
1 ˥ | ˨ (2) | 玻璃 (˥.˨) | bōli | glass
2 ˧˥ | ˧ (3) | 伯伯 (˧˥.˧) | bóbo | uncle
3 ˨˩ | ˦ (4) | 喇叭 (˨˩.˦) | lǎba | horn
4 ˥˩ | ˩ (1) | 兔子 (˥˩.˩) | tùzi | rabbit
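Because the pitch of the neutral tone depends almost entirely on the preceding tone, the table amounts to a four-entry lookup. The Python sketch below (hypothetical names, Beijing values only) reproduces the pitch strings shown in the examples, giving the first syllable the contour it has in those examples, so a third tone surfaces as the half-third 21.

```python
# Pitch (Chao 1-5) of a neutral-tone syllable, keyed by the preceding tone.
NEUTRAL_PITCH = {1: 2, 2: 3, 3: 4, 4: 1}

# Contour of the preceding syllable in these words; tone 3 surfaces as 21.
FIRST_CONTOUR = {1: "55", 2: "35", 3: "21", 4: "51"}


def neutral_word_pitch(first_tone):
    """Chao pitch string for a two-syllable word whose second syllable is neutral."""
    return f"{FIRST_CONTOUR[first_tone]}.{NEUTRAL_PITCH[first_tone]}"


print(neutral_word_pitch(3))  # 21.4, as in 喇叭 lǎba
```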

Most romanizations represent the tones as diacritics on the vowels (e.g., Hanyu Pinyin, MPS II and Tongyong Pinyin). Zhuyin uses diacritics as well. Others, like Wade-Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks. Gwoyeu Romatzyh is a rare example in which tones are represented not by special symbols but by ordinary letters of the alphabet (although without a one-to-one correspondence between letters and tones).
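As a sketch of how numbered input is commonly turned into Hanyu Pinyin diacritics, the Python function below applies the usual placement convention: mark a or e if present, o in ou, and otherwise the last vowel. That convention is standard Hanyu Pinyin practice rather than something stated in this article; the function name `add_tone_mark` is hypothetical, "v" is accepted as a stand-in for ü, and syllabic nasals such as ḿ are returned unmarked.

```python
# Precomposed vowels for tones 1-4, in order.
MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def add_tone_mark(syllable):
    """Convert numbered pinyin such as 'hao3' into diacritic pinyin."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    body = syllable[:-1].replace("v", "ü")
    tone = int(syllable[-1])
    if tone not in (1, 2, 3, 4):  # 5 (or 0) = neutral tone, left unmarked
        return body
    if "a" in body:
        i = body.index("a")
    elif "e" in body:
        i = body.index("e")
    elif "ou" in body:
        i = body.index("o")
    else:
        i = max(body.rfind(v) for v in MARKS)  # last vowel, or -1 if none
    if i < 0:
        return body
    return body[:i] + MARKS[body[i]][tone - 1] + body[i + 1:]


print(" ".join(add_tone_mark(s) for s in ["ma1", "ma2", "ma3", "ma4", "ma5"]))
```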


Tone sandhi

Pronunciation also varies with context according to the rules of tone sandhi. The most prominent phenomenon of this kind is when there are two third tones in immediate sequence, in which case the first of them changes to a rising tone, the second tone. In the literature, this contour is often called two-thirds tone or half-third tone, though generally, in Standard Chinese, the "two-thirds tone" is the same as the second tone. If there are three third tones in series, the tone sandhi rules become more complex, and depend on word boundaries, stress, and dialectal variations.

Tone sandhi rules at a glance

  1. When there are two 3rd tones (˨˩˦) in a row, the first syllable becomes 2nd tone (˧˥), and the second syllable becomes a half-3rd tone (˨˩). The half-3rd tone is a tone that only falls but does not rise.
    ex: 老鼠 (lǎoshǔ) becomes [lɑʊ̯˧˥ʂu˨˩]
  2. When there are three 3rd tones in a row, things get more complicated.
    If the first word is two syllables, and the second word is one syllable, the first two syllables become 2nd tones, and the last syllable stays 3rd tone:
    ex: 保管好 (bǎoguǎn hǎo) becomes [pɑʊ̯˧˥ku̯an˧˥xɑʊ̯˨˩˦]
    If the first word is one syllable, and the second word is two syllables, the first syllable becomes half-3rd tone (˨˩), the second syllable becomes 2nd tone, and the last syllable stays 3rd tone:
    ex: 老保管 (lǎo bǎoguǎn) becomes [lɑʊ̯˨˩pɑʊ̯˧˥ku̯an˨˩˦]
  3. When a 3rd tone is followed by a first, second or fourth tone, or most neutral tone syllables, it usually becomes a half-3rd tone.
    ex: 美妙 (měimiào) becomes [mei̯˨˩mi̯ɑʊ̯˥˩]
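The rules above can be sketched in Python for phrases made of one- and two-syllable words. The procedure is an assumption chosen to reproduce the examples, not a complete model (it ignores stress and dialectal variation): 3-3 pairs are resolved inside each word first (right to left), then across word boundaries (left to right), and any remaining third tone that is not phrase-final is labeled half-third. The phrase-final syllable is simply labeled "3", since the examples transcribe it as ˨˩ in one case and ˨˩˦ in others. The function name `third_tone_sandhi` is hypothetical.

```python
def third_tone_sandhi(words):
    """Apply the third-tone rules to a phrase.

    words: list of words, each a list of citation tones (1-4, 5 = neutral),
    e.g. [[3, 3], [3]] for 保管好. Returns one label per syllable:
    "1", "2", "4", "5", "3" (phrase-final third tone) or "half-3".
    """
    words = [list(w) for w in words]
    # Rule 1 inside each word: 3-3 -> 2-3, scanning right to left.
    for w in words:
        for i in range(len(w) - 2, -1, -1):
            if w[i] == 3 and w[i + 1] == 3:
                w[i] = 2
    # Then across word boundaries, left to right.
    for left, right in zip(words, words[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    flat = [t for w in words for t in w]
    # Rule 3: a non-final third tone is now followed by some other tone.
    return ["half-3" if t == 3 and i < len(flat) - 1 else str(t)
            for i, t in enumerate(flat)]


print(third_tone_sandhi([[3, 3], [3]]))  # 保管好: ['2', '2', '3']
print(third_tone_sandhi([[3], [3, 3]]))  # 老保管: ['half-3', '2', '3']
```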

Rules for "一" and "不"

"一" (yī) and "不" (bù) have special rules which do not apply to other Chinese characters:

  1. When in front of a 4th tone syllable, "一" becomes 2nd tone.
    ex: 一定 (yīdìng becomes yídìng [i˧˥tiŋ˥˩])
  2. When in front of a non-4th tone syllable, "一" becomes 4th tone.
    ex. (1st tone):一天 (yītiān → yìtiān [i˥˩tʰi̯ɛn˥])
    ex. (2nd tone): 一年 (yīnián → yìnián [i˥˩ni̯ɛn˧˥])
    ex. (3rd tone): 一起 (yīqǐ → yìqǐ [i˥˩t͡ɕʰi˨˩˦])
  3. When "一" falls between two identical words, it becomes neutral tone.
    ex: 看一看 (kànyīkàn) becomes kànyikàn
  4. When counting sequentially, and in all other situations, "一" retains its root tone value of 1st tone. This includes when 一 is used at the end of a multi-syllable word (regardless of the tone of the next word), and when 一 is immediately followed by any digit, including another 一; hence 一 also retains its 1st tone in both syllables of the word "一一". However, this does not include situations where 一一 is part of a longer word like 一一对应 or 一一如命 (these are pronounced yìyíduìyìng and yíyìrúmìng, although written yīyīduìyìng and yīyīrúmìng). The word 不一一 (meaning "I won't go into details") is pronounced differently depending on whether or not speakers interpret it as containing 一一 as a component word.
  5. When 一 is part of a cardinal number, it is pronounced as 4th tone when before or , but in an ordinal number it is pronounced as 1st tone in these contexts.
  6. "不" becomes 2nd tone only when followed by a 4th tone syllable.
    ex: 不是 (bùshì) becomes [pu˧˥ʂɻ̩˥˩]
  7. When "不" comes between two repetitions of a word in a yes-no question, it loses its tone (becomes neutral in tone).[citation needed]
    ex: 是不是 (shìbùshì) becomes shìbushì[citation needed]
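A minimal Python sketch of the core rules for 一 and 不 (rules 1, 2, 3, 6 and 7), with hypothetical names: the caller supplies the citation tone of the following syllable, or None when nothing follows (as in counting or word-final 一), plus a flag for the reduplicated and yes-no question patterns. For 一个, pass 4 for 个 even though it is usually pronounced in the neutral tone. Rules 4 and 5 (digits, 一一 compounds, cardinal versus ordinal numbers) are not modeled.

```python
def yi_bu_tone(char, next_tone=None, between_repeated=False):
    """Surface tone of 一 or 不 (5 = neutral tone).

    next_tone: citation tone (1-4) of the following syllable, or None.
    between_repeated: True for patterns like 看一看 or 是不是.
    """
    if char not in ("一", "不"):
        raise ValueError("only 一 and 不 follow these special rules")
    if between_repeated:
        return 5            # rules 3 and 7: neutral tone
    if next_tone == 4:
        return 2            # rules 1 and 6: 2nd tone before a 4th tone
    if char == "一":
        # rule 2: 4th tone before tones 1-3; otherwise its root 1st tone
        return 4 if next_tone in (1, 2, 3) else 1
    return 4                # 不 otherwise keeps its 4th tone


print(yi_bu_tone("一", 4), yi_bu_tone("一", 1), yi_bu_tone("不", 4))  # 2 4 2
```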

Relationship between Middle Chinese and modern tones

V- = unvoiced initial consonant
L = sonorant initial consonant
V+ = voiced initial consonant (not sonorant)

Standard Chinese tone (name, number, contour) by Middle Chinese tone and initial:
Middle Chinese tone | V- | L | V+
Ping (平) | Yin Ping (陰平, 1), 55 | Yang Ping (陽平, 2), 35 | Yang Ping (陽平, 2), 35
Shang (上) | Shang (上, 3), 214 | Shang (上, 3), 214 | Qu (去, 4), 51
Qu (去) | Qu (去, 4), 51 | Qu (去, 4), 51 | Qu (去, 4), 51
Ru (入) | with no pattern | to Qu (去, 4), 51 | to Yang Ping (陽平, 2), 35
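The table can also be read as a lookup from Middle Chinese tone and initial class to the Standard Chinese tone number. The Python sketch below is hypothetical; it uses the labels V-, L and V+ from the legend and returns None for the voiceless Ru cell, which has no regular reflex.

```python
# (Middle Chinese tone, initial class) -> Standard Chinese tone number.
# None marks voiceless Ru, whose modern reflexes show no regular pattern.
REFLEX = {
    ("ping", "V-"): 1, ("ping", "L"): 2, ("ping", "V+"): 2,
    ("shang", "V-"): 3, ("shang", "L"): 3, ("shang", "V+"): 4,
    ("qu", "V-"): 4, ("qu", "L"): 4, ("qu", "V+"): 4,
    ("ru", "V-"): None, ("ru", "L"): 4, ("ru", "V+"): 2,
}


def modern_tone(mc_tone, initial):
    """Return the regular Standard Chinese reflex, or None if irregular."""
    return REFLEX[(mc_tone, initial)]


print(modern_tone("shang", "V+"))  # 4: voiced-obstruent Shang moved to Qu
```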

It has been observed[citation needed] that when the two morphemes of a compound word cannot be ordered by grammar, their order is usually determined by tone, in the sequence Yin Ping (1), Yang Ping (2), Shang (3), Qu (4), and Ru, the stop-final tone that has since disappeared. Below are some compound words that follow this rule. Tones are shown in parentheses, and R indicates Ru.

  • 左右 (34)
  • 南北 (2R)
  • 輕重 (14)
  • 貧富 (24)
  • 凹凸 (1R)
  • 喜怒 (34)
  • 哀樂 (1R)
  • 生死 (13)
  • 死活 (3R)
  • 陰陽 (12)
  • 明暗 (24)
  • 毀譽 (34)
  • 褒貶 (13)
  • 離合 (2R)
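The examples can be checked against the proposed order with a short Python sketch (hypothetical names; the tone codes are those given in the list). It only tests whether a pair follows the order 1 < 2 < 3 < 4 < Ru; it does not predict which order a given compound takes.

```python
# Rank of each tone in the ordering 1 < 2 < 3 < 4 < Ru.
RANK = {"1": 1, "2": 2, "3": 3, "4": 4, "R": 5}

COMPOUNDS = {
    "左右": "34", "南北": "2R", "輕重": "14", "貧富": "24", "凹凸": "1R",
    "喜怒": "34", "哀樂": "1R", "生死": "13", "死活": "3R", "陰陽": "12",
    "明暗": "24", "毀譽": "34", "褒貶": "13", "離合": "2R",
}


def follows_tone_order(tones):
    """True if the first morpheme's tone ranks before the second's."""
    first, second = tones
    return RANK[first] < RANK[second]


print(all(follows_tone_order(t) for t in COMPOUNDS.values()))  # True
```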

Word stress

Standard Chinese distinguishes three degrees of stress: primary stress, normal stress, and unstressed. Three stress patterns commonly occur in two-syllable compound words:[7]

  1. Normal Stress + Primary Stress (\ + /)
    • 字画儿 zìhuàr
    • 音乐 yīnyuè
    • 学校 xuéxiào
    • 汽车 qìchē
  2. Primary Stress + Unstressed (/ + o)
    • 父亲 fùqin
    • 喜欢 xǐhuan
    • 东西 dōngxi
  3. Primary Stress + Normal Stress (/ + \)
    • 农村 nóngcūn
    • 社会 shèhuì
    • 热情 rèqíng


  1. Hashimoto, Mantaro (1970), "Notes on Mandarin Phonology", in Jakobson, Roman; Kawamoto, Shigeo, Studies in General and Oriental Linguistics, Tokyo: TEC, pp. 207–220.
  2. Surendran, Dinoj; Levow, Gina-Anne (2004), "The functional load of tone in Mandarin is as high as that of vowels", Proceedings of the International Conference on Speech Prosody 2004, Nara, Japan, pp. 99–102.
  3. "上聲", 教育部重編國語辭典修訂本 [Revised Mandarin Chinese Dictionary], 中華民國教育部 [Ministry of Education, Republic of China], 1994. Retrieved 2010-05-15.
  4. 古代汉语大词典大字本 [Dictionary of Ancient Chinese, large-print edition], 北京 [Beijing]: 商务印书馆 [The Commercial Press], 2002, p. 1369. ISBN 9787100035156.
  5. Chen, Yiya; Xu, Yi, "Pitch Target of Mandarin Neutral Tone" (abstract), presented at the 8th Conference on Laboratory Phonology.
  6. Wang, Jialing (2004), The Neutral Tone in Trisyllabic Sequences in Chinese Dialects, Tianjin Normal University.
  7. Tiee, Henry Hung-Yeh (1986), A Reference Grammar of Chinese Sentences with Exercises, University of Arizona Press, p. XXVI. ISBN 978-0-8165-1166-2.

Further reading

  • Duanmu, San (2007). The Phonology of Standard Chinese (2nd ed.). Oxford University Press. ISBN 978-0-19-921579-9.

