Chomsky normal form

In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:

$A \rightarrow BC$ or
$A \rightarrow \alpha$ or
$S \rightarrow \varepsilon$

where A, B and C are nonterminal symbols, α is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as (Hopcroft and Ullman, 1979). As pointed out by Lange and Leiß, the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. Using | G | to denote the size of the original grammar G, the size blow-up in the worst case may range from | G | 2 to 22 | G | , depending on the transformation algorithm used (Lange and Leiß, 2009).

## Alternative definition

Another way to define Chomsky normal form (e.g., Hopcroft and Ullman 1979, and Hopcroft et al. 2006) is:

A formal grammar is in Chomsky reduced form if all of its production rules are of the form:

$A \rightarrow\, BC$ or
$A \rightarrow\, \alpha$

where A, B and C are nonterminal symbols, and α is a terminal symbol. When using this definition, B or C may be the start symbol.

## Converting a grammar to Chomsky Normal Form

1. Introduce S0
Introduce a new start variable, S0 and a new rule $S_0 \rightarrow S$ where S is the previous start variable.
2. Eliminate all $\epsilon$ rules
$\epsilon$ rules are rules of the form $A \rightarrow \epsilon$ where $A \not= S_0$ and $A \in V$ where V is the CFG's variable alphabet.
Remove every rule with $\epsilon$ on its right hand side (RHS). For each rule with A in its RHS, add a set of new rules consisting of the different possible combinations of A replaced or not replaced with $\epsilon$. If a rule has A as a singleton on its RHS, add a new rule $R = A \rightarrow \epsilon$ unless R has already been removed through this process. For example, examine the following grammar G:
$S \rightarrow AbA | B$
$B \rightarrow b | c$
$A \rightarrow \epsilon$
G has one epsilon rule. When the $A \rightarrow \epsilon$ is removed, we get the following:
$S \rightarrow AbA | Ab | bA | b | B$
$B \rightarrow b | c$
Notice that we have to account for all possibilities of $A \rightarrow \epsilon$ and so we actually end up adding 3 rules.
3. Eliminate all unit rules
$A \rightarrow B \ni A,B \in V$
After all the $\epsilon$ rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF.)
To remove $A \rightarrow B$
$\forall B \rightarrow U$ add rule $A \rightarrow U$ unless this is a unit rule which has already been removed.
4. Clean up remaining rules that are not in Chomsky normal form.
Replace $A \rightarrow u_1,u_2, ... u_k, k \ge 3, u_1 \in V \cup \Sigma$ with $A \rightarrow u_1 A_1 , A_1 \rightarrow u_2 A_2 , ... , A_{k-2} \rightarrow u_{k-1} u_k$ where Ai are new variables.
If $u_i \in \Sigma$, replace ui in above rules with some new variable vi and add rule $v_i \rightarrow u_i$.

## References

• John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation, 3rd Edition, Addison-Wesley, 2006. ISBN 0-321-45536-3. (See subsection 7.1.5, page 272.)
• John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-02988-X. (See chapter 4.)
• Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X.  (Pages 98–101 of section 2.1: context-free grammars. Page 156.)
• John Martin (2003). Introduction to Languages and the Theory of Computation. McGraw Hill. ISBN 0-07-232200-4.  (Pages 237–240 of section 6.6: simplified forms and normal forms.)
• Michael A. Harrison (1978). Introduction to Formal Language Theory. Addison-Wesley. ISBN 978-0201029550.  (Pages 103–106.)
• Lange, Martin and Leiß, Hans. To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm. Informatica Didactica 8, 2009. ((pdf)
• Cole, Richard. Converting CFGs to CNF (Chomsky Normal Form), October 17, 2007. (pdf)
• Sipser, Michael. Introduction to the Theory of Computation, 2nd edition.[[Category:Noam Chomsky|Normal Form, Chomsky}}

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Normal form — may refer to: Normal form (abstract rewriting) Normal form (databases) Normal form (game theory) Normal form (mathematics) In formal language theory: Beta normal form Chomsky normal form Greibach normal form Kuroda normal form Normal form… …   Wikipedia

• Greibach normal form — In computer science, to say that a context free grammar is in Greibach normal form (GNF) means that all production rules are of the form::A o alpha X or:S o lambdawhere A is a nonterminal symbol, α is a terminal symbol, X is a (possibly empty)… …   Wikipedia

• Kuroda normal form — In formal language theory, a grammar is in Kuroda normal form iff all production rules are of the form:: AB rarr; CD or: A rarr; BC or: A rarr; B or: A rarr; α where A, B, C and D are nonterminal symbols and α is a terminal symbol.Every grammar… …   Wikipedia

• Chomsky (disambiguation) — Chomsky may refer to: Noam Chomsky (born 1928), American linguist, philosopher, cognitive scientist, political activist, author, lecturer, professor emeritus at MIT, known for early work in transformational grammar and A.I. Chomsky (surname),… …   Wikipedia

• Noam Chomsky — Chomsky redirects here. For other topics with the same name, see Chomsky (disambiguation). Noam Chomsky Noam Chomsky visiting Vancouver, Canada in 2004 …   Wikipedia

• Backus–Naur Form — In computer science, Backus–Naur Form (BNF) is a metasyntax used to express context free grammars: that is, a formal way to describe formal languages. John Backus and Peter Naur developed a context free grammar to define the syntax of a… …   Wikipedia

• Laws of Form — (hereinafter LoF ) is a book by G. Spencer Brown, published in 1969, that straddles the boundary between mathematics and of philosophy. LoF describes three distinct logical systems: * The primary arithmetic (described in Chapter 4), whose models… …   Wikipedia

• Backus-Naur Form — Die Backus Naur Form oder Backus Normalform, kurz BNF, ist eine kompakte formale Metasprache zur Darstellung kontextfreier Grammatiken (Typ 2 Grammatiken in der Chomsky Hierarchie). Hierzu zählt die Syntax gängiger höherer Programmiersprachen.… …   Deutsch Wikipedia

• Backus-Naur-Form — Die Backus Naur Form oder Backus Normalform, kurz BNF, ist eine kompakte formale Metasprache zur Darstellung kontextfreier Grammatiken (Typ 2 Grammatiken in der Chomsky Hierarchie). Hierzu zählt die Syntax gängiger höherer Programmiersprachen.… …   Deutsch Wikipedia

• Noam Chomsky's political views — Noam Chomsky at an antiwar rally in Vancouver, 2004 Noam Chomsky is a widely known intellectual, political activist, and …   Wikipedia