 Chomsky normal form

In computer science, a contextfree grammar is said to be in Chomsky normal form if all of its production rules are of the form:
 or
 or
where A, B and C are nonterminal symbols, α is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.
Every grammar in Chomsky normal form is contextfree, and conversely, every contextfree grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as (Hopcroft and Ullman, 1979). As pointed out by Lange and Leiß, the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. Using  G  to denote the size of the original grammar G, the size blowup in the worst case may range from  G  ^{2} to 2^{2  G } , depending on the transformation algorithm used (Lange and Leiß, 2009).
Contents
Alternative definition
Another way to define Chomsky normal form (e.g., Hopcroft and Ullman 1979, and Hopcroft et al. 2006) is:
A formal grammar is in Chomsky reduced form if all of its production rules are of the form:
 or
where A, B and C are nonterminal symbols, and α is a terminal symbol. When using this definition, B or C may be the start symbol.
Converting a grammar to Chomsky Normal Form
 Introduce S_{0}
 Introduce a new start variable, S_{0} and a new rule where S is the previous start variable.
 Eliminate all rules
 rules are rules of the form where and where V is the CFG's variable alphabet.
 Remove every rule with on its right hand side (RHS). For each rule with A in its RHS, add a set of new rules consisting of the different possible combinations of A replaced or not replaced with . If a rule has A as a singleton on its RHS, add a new rule unless R has already been removed through this process. For example, examine the following grammar G:
 G has one epsilon rule. When the is removed, we get the following:
 Notice that we have to account for all possibilities of and so we actually end up adding 3 rules.
 Eliminate all unit rules
 After all the rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF.)
 To remove
 add rule unless this is a unit rule which has already been removed.
 After all the rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF.)
 Clean up remaining rules that are not in Chomsky normal form.
 Replace with where A_{i} are new variables.
 If , replace u_{i} in above rules with some new variable v_{i} and add rule .
See also
 CYK algorithm
 BackusNaur form
 Greibach normal form
 Kuroda normal form
References
 John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation, 3rd Edition, AddisonWesley, 2006. ISBN 0321455363. (See subsection 7.1.5, page 272.)
 John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, AddisonWesley Publishing, Reading Massachusetts, 1979. ISBN 020102988X. (See chapter 4.)
 Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 053494728X. (Pages 98–101 of section 2.1: contextfree grammars. Page 156.)
 John Martin (2003). Introduction to Languages and the Theory of Computation. McGraw Hill. ISBN 0072322004. (Pages 237–240 of section 6.6: simplified forms and normal forms.)
 Michael A. Harrison (1978). Introduction to Formal Language Theory. AddisonWesley. ISBN 9780201029550. (Pages 103–106.)
 Lange, Martin and Leiß, Hans. To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm. Informatica Didactica 8, 2009. ((pdf)
 Cole, Richard. Converting CFGs to CNF (Chomsky Normal Form), October 17, 2007. (pdf)
 Sipser, Michael. Introduction to the Theory of Computation, 2nd edition.[[Category:Noam ChomskyNormal Form, Chomsky}}
Categories: Formal languages
Wikimedia Foundation. 2010.