LL parser

An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser). The class of grammars which are parsable in this way is known as the "LL grammars".

The remainder of this article describes the table-based kind of parser. The alternative is a recursive descent parser, which is usually coded by hand (although not always; see e.g. ANTLR for an LL(*) recursive-descent parser generator).

An LL parser is called an LL("k") parser if it uses "k" tokens of look-ahead when parsing a sentence. A grammar for which such a parser exists, and whose sentences that parser can parse without backtracking, is called an LL("k") grammar. Of these grammars, LL(1) grammars, although fairly restrictive, are very popular because the corresponding LL parsers only need to look at the next token to make their parsing decisions. Languages based on grammars with a high value of "k" require considerable effort to parse.

There is contention between the "European school" of language design, which prefers LL-based grammars, and the "US school", which predominantly prefers LR-based grammars. This is largely due to teaching traditions and the detailed description of specific methods and tools in certain textbooks; another influence may be Niklaus Wirth at ETH Zürich in Switzerland, whose research has described a number of ways of optimising LL(1) languages and compilers.

General case

The parser works on strings from a particular grammar.

The parser consists of
* an "input buffer", holding the input string (built from the grammar)
* a "stack" on which to store the terminals and non-terminals from the grammar yet to be parsed
* a "parsing table" which tells it what (if any) grammar rule to apply given the symbols on top of its stack and the next input token

The parser applies the rule found in the table by matching the top-most symbol on the stack (row) with the current symbol in the input stream (column).

When the parser starts the stack already contains two symbols:

[ S, $ ]

where '$' is a special terminal to indicate the bottom of the stack and the end of the input stream, and 'S' is the start symbol of the grammar. The parser will attempt to rewrite the contents of this stack to what it sees on the input stream. However, it only keeps on the stack what still needs to be rewritten.

Concrete example

Set up

To explain its workings we will consider the following small grammar:

# S → F
# S → ( S + F )
# F → 1

and parse the following input:

:( 1 + 1 )

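The parsing table for this grammar looks as follows. Each entry gives the number of the rule to apply, and "-" marks an error entry:

:   | ( | ) | 1 | + | $
: S | 2 | - | 1 | - | -
: F | - | - | 3 | - | -

:(Note that there is also a column for the special terminal, represented here as $, that is used to indicate the end of the input stream.)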

Parsing procedure

The parser reads the first '(' from the input stream, and the 'S' from the stack. From the table it knows that it has to apply rule 2; it has to rewrite 'S' to '( S + F )' on the stack and write the number of this rule to the output. The stack then becomes:

[ (, S, +, F, ), $ ]

In the next step it removes the '(' from its input stream and from its stack:

[ S, +, F, ), $ ]

Now the parser sees a '1' on its input stream so it knows that it has to apply rule (1) and then rule (3) from the grammar and write their number to the output stream. This results in the following stacks:

[ F, +, F, ), $ ]
[ 1, +, F, ), $ ]

In the next two steps the parser reads the '1' and '+' from the input stream and, since they match the next two items on the stack, also removes them from the stack. This results in:

[ F, ), $ ]

In the next three steps the 'F' will be replaced on the stack with '1', the number 3 will be written to the output stream and then the '1' and ')' will be removed from the stack and the input stream. So the parser ends with '$' on both its stack and its input stream.

In this case it will report that it has accepted the input string and write to the output stream the list of numbers

: [ 2, 1, 3, 3 ]

which are indeed the rules of a leftmost derivation of the input string, listed in the order in which they are applied. That derivation is:

: S → ( S + F ) → ( F + F ) → ( 1 + F ) → ( 1 + 1 )
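The correspondence between the rule list and the derivation can be checked mechanically. The following is a minimal Python sketch, not part of the original description: the names RULES, NONTERMINALS and replay_leftmost are illustrative, and the grammar is the three-rule example above written out by hand.

```python
# Example grammar, keyed by rule number: (left-hand side, right-hand side).
RULES = {
    1: ("S", ["F"]),
    2: ("S", ["(", "S", "+", "F", ")"]),
    3: ("F", ["1"]),
}
NONTERMINALS = {"S", "F"}


def replay_leftmost(rule_numbers, start="S"):
    """Apply each rule to the leftmost nonterminal; return all sentential forms."""
    form = [start]
    steps = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Find the leftmost nonterminal in the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[index]}")
        form[index:index + 1] = rhs
        steps.append(" ".join(form))
    return steps


derivation = replay_leftmost([2, 1, 3, 3])
print(" → ".join(derivation))
# prints: S → ( S + F ) → ( F + F ) → ( 1 + F ) → ( 1 + 1 )
```

Because every rule is applied to the leftmost nonterminal, the parser's output list is enough to rebuild the whole derivation.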

Remarks

As can be seen from the example the parser performs three types of steps depending on whether the top of the stack is a nonterminal, a terminal or the special symbol $:
* If the top is a nonterminal then it looks up in the parsing table on the basis of this nonterminal and the symbol on the input stream which rule of the grammar it should use to replace it with on the stack. The number of the rule is written to the output stream. If the parsing table indicates that there is no such rule then it reports an error and stops.
* If the top is a terminal then it compares it to the symbol on the input stream and if they are equal they are both removed. If they are not equal the parser reports an error and stops.
* If the top is $ and on the input stream there is also a $ then the parser reports that it has successfully parsed the input, otherwise it reports an error. In both cases the parser will stop.

These steps are repeated until the parser stops, and then it will have either completely parsed the input and written a leftmost derivation to the output stream or it will have reported an error.
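The three kinds of step can be sketched as a short driver loop. This is a minimal Python illustration under stated assumptions: the table is written out by hand for the example grammar, and the names RULES, TABLE and ll1_parse are hypothetical, not taken from any particular tool.

```python
# Example grammar and its hand-written parsing table (assumptions of this sketch).
RULES = {
    1: ("S", ["F"]),
    2: ("S", ["(", "S", "+", "F", ")"]),
    3: ("F", ["1"]),
}
TABLE = {("S", "("): 2, ("S", "1"): 1, ("F", "1"): 3}
NONTERMINALS = {"S", "F"}
END = "$"


def ll1_parse(tokens, start="S"):
    """Return the list of applied rule numbers, or raise SyntaxError."""
    stream = list(tokens) + [END]
    stack = [END, start]            # the top of the stack is the end of the list
    position = 0
    output = []
    while True:
        top = stack[-1]
        lookahead = stream[position]
        if top == END:
            # Bottom of the stack: accept only if the input is exhausted too.
            if lookahead == END:
                return output
            raise SyntaxError(f"unexpected {lookahead!r} after end of sentence")
        if top in NONTERMINALS:
            # Nonterminal: look up the rule and replace it by the right-hand side.
            rule = TABLE.get((top, lookahead))
            if rule is None:
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            stack.pop()
            stack.extend(reversed(RULES[rule][1]))
            output.append(rule)
        else:
            # Terminal: it must match the current input symbol.
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            position += 1


print(ll1_parse(["(", "1", "+", "1", ")"]))  # prints [2, 1, 3, 3]
```

The right-hand side is pushed in reverse so that its first symbol ends up on top of the stack, which is what makes the derivation leftmost.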

Constructing an LL(1) parsing table

In order to fill the parsing table, we have to establish what grammar rule the parser should choose if it sees a nonterminal "A" on the top of its stack and a symbol "a" on its input stream. It is easy to see that such a rule should be of the form "A" → "w" and that the language corresponding to "w" should have at least one string starting with "a". For this purpose we define the "First-set" of "w", written here as Fi("w"), as the set of terminals that can be found at the start of some string derivable from "w", plus ε if the empty string can also be derived from "w". Given a grammar with the rules "A_1" → "w_1", ..., "A_n" → "w_n", we can compute the Fi("w_i") and Fi("A_i") for every rule as follows:
# initialize every Fi("w_i") and Fi("A_i") with the empty set
# add "Fi"("w_i") to Fi("A_i") for every rule "A_i" → "w_i", where "Fi" is defined as follows:
#* "Fi"("a" "w' ") = { "a" } for every terminal "a"
#* "Fi"("A" "w' ") = Fi("A") for every nonterminal "A" with ε not in Fi("A")
#* "Fi"("A" "w' ") = (Fi("A") \ { ε }) ∪ "Fi"("w' ") for every nonterminal "A" with ε in Fi("A")
#* "Fi"(ε) = { ε }
# add Fi("w_i") to Fi("A_i") for every rule "A_i" → "w_i"
# do steps 2 and 3 until all Fi sets stay the same.
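The fixed-point iteration above can be sketched in Python as follows. This is one possible reading of the steps, not a definitive implementation: it keeps only the sets Fi("A") for nonterminals and recomputes the set of each right-hand side on demand, and the names first_of_sequence and compute_first are illustrative. An empty right-hand side stands for ε.

```python
EPSILON = "ε"


def first_of_sequence(symbols, first, nonterminals):
    """First-set of a string of symbols, using the current sets of the nonterminals."""
    result = set()
    for symbol in symbols:
        if symbol not in nonterminals:
            result.add(symbol)            # a leading terminal ends the search
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result                 # this nonterminal cannot vanish
    result.add(EPSILON)                   # every symbol (or none at all) can vanish
    return result


def compute_first(rules):
    """rules: list of (lhs, rhs) pairs. Returns {nonterminal: First-set}."""
    nonterminals = {lhs for lhs, _ in rules}
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:                        # repeat until all sets stay the same
        changed = False
        for lhs, rhs in rules:
            addition = first_of_sequence(rhs, first, nonterminals)
            if not addition <= first[lhs]:
                first[lhs] |= addition
                changed = True
    return first


example = [("S", ["F"]), ("S", ["(", "S", "+", "F", ")"]), ("F", ["1"])]
first = compute_first(example)
print({nt: sorted(symbols) for nt, symbols in sorted(first.items())})
```

For the example grammar this gives Fi(S) = { (, 1 } and Fi(F) = { 1 }.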

Unfortunately, the First-sets are not sufficient to compute the parsing table. This is because a right-hand side "w" of a rule might ultimately be rewritten to the empty string. So the parser should also use the rule "A" → "w" if ε is in Fi("w") and it sees on the input stream a symbol that could follow "A". Therefore we also need the "Follow-set" of "A", written as Fo("A") here, which is defined as the set of terminals "a" such that there is a string of symbols "αAaβ" that can be derived from the start symbol. Computing the Follow-sets for the nonterminals in a grammar can be done as follows:
# initialize every Fo("A_i") with the empty set
# if there is a rule of the form "A_j" → "w A_i w' ", then
#* if the terminal "a" is in "Fi"("w' "), then add "a" to Fo("A_i")
#* if ε is in "Fi"("w' "), then add Fo("A_j") to Fo("A_i")
# repeat step 2 until all "Fo" sets stay the same.
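A minimal Python sketch of this computation follows. It takes the First-sets of the nonterminals as input (written out by hand here for the example grammar). One detail is an assumption of the sketch rather than part of the steps above: the Follow-set of the start symbol is seeded with the end marker $, so that the table can later have a $ column. The names first_of_sequence and compute_follow are illustrative.

```python
EPSILON, END = "ε", "$"


def first_of_sequence(symbols, first):
    """First-set of a string of symbols; terminals are those without an entry in `first`."""
    result = set()
    for symbol in symbols:
        if symbol not in first:
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)
    return result


def compute_follow(rules, first, start):
    """rules: list of (lhs, rhs) pairs. Returns {nonterminal: Follow-set}."""
    follow = {nt: set() for nt in first}
    follow[start].add(END)        # assumption of this sketch: $ may follow the start symbol
    changed = True
    while changed:                # repeat until all sets stay the same
        changed = False
        for lhs, rhs in rules:
            for i, symbol in enumerate(rhs):
                if symbol not in first:
                    continue      # only nonterminals have Follow-sets
                tail = first_of_sequence(rhs[i + 1:], first)
                addition = tail - {EPSILON}
                if EPSILON in tail:
                    addition |= follow[lhs]   # the rest of the rule can vanish
                if not addition <= follow[symbol]:
                    follow[symbol] |= addition
                    changed = True
    return follow


example = [("S", ["F"]), ("S", ["(", "S", "+", "F", ")"]), ("F", ["1"])]
first = {"S": {"(", "1"}, "F": {"1"}}
follow = compute_follow(example, first, "S")
print({nt: sorted(symbols) for nt, symbols in sorted(follow.items())})
```

With the seeding assumption, the example grammar yields Fo(S) = { +, $ } and Fo(F) = { +, ), $ }.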

Now we can define exactly which rules will be contained where in the parsing table. If "T"["A", "a"] denotes the entry in the table for nonterminal "A" and terminal "a", then "T"["A", "a"] contains the rule "A" → "w" if and only if
: "a" is in Fi("w"), or
: ε is in Fi("w") and "a" is in Fo("A").

If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. It is in precisely this case that the grammar is called an "LL(1) grammar".
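The table definition and the LL(1) test can be sketched together in Python. This is an illustration under stated assumptions: the First-set of each right-hand side and the Follow-sets are supplied by hand, rules are numbered by their position in the list, and build_table and is_ll1 are hypothetical names.

```python
EPSILON = "ε"


def build_table(rules, rhs_first, follow):
    """Map (nonterminal, terminal) to the list of rule numbers that apply.

    rules:     list of (lhs, rhs) pairs, numbered from 1 by position
    rhs_first: rhs_first[i] is the First-set of the right-hand side of rule i + 1
    follow:    {nonterminal: Follow-set}
    """
    table = {}
    for number, (lhs, _rhs) in enumerate(rules, start=1):
        lookaheads = rhs_first[number - 1] - {EPSILON}
        if EPSILON in rhs_first[number - 1]:
            lookaheads |= follow[lhs]     # the rule may produce the empty string
        for terminal in sorted(lookaheads):
            table.setdefault((lhs, terminal), []).append(number)
    return table


def is_ll1(table):
    """A grammar is LL(1) exactly when no cell holds more than one rule."""
    return all(len(entries) <= 1 for entries in table.values())


example = [("S", ["F"]), ("S", ["(", "S", "+", "F", ")"]), ("F", ["1"])]
table = build_table(
    example,
    rhs_first=[{"1"}, {"("}, {"1"}],
    follow={"S": {"+", "$"}, "F": {"+", ")", "$"}},
)
print(table, is_ll1(table))

# A hypothetical grammar that is not LL(1): both rules for A start with "a".
conflict = build_table(
    [("A", ["a"]), ("A", ["a", "b"])],
    rhs_first=[{"a"}, {"a"}],
    follow={"A": {"$"}},
)
print(conflict, is_ll1(conflict))
```

Storing a list per cell, rather than a single rule number, is a deliberate choice here: it keeps conflicting entries visible instead of silently overwriting them.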

Constructing an LL("k") parsing table

Until the mid 1990s, it was widely believed that LL("k") parsing (for "k" > 1) was impractical, since the size of the parse table would, in the worst case, grow exponentially in "k". This perception changed gradually after the release of the Purdue Compiler Construction Tool Set (PCCTS, now known as ANTLR) around 1992, when it was demonstrated that many programming languages can be parsed efficiently by an LL("k") parser without triggering the worst-case behavior of the parser. Moreover, in certain cases LL parsing is feasible even with unlimited lookahead. By contrast, traditional parser generators such as yacc and GNU bison use LALR(1) parse tables to construct a restricted LR parser with a fixed one-token lookahead.

LL("k") parser generators

Modern parser generators that generate LL parsers with multi-token lookahead include:
* ANTLR : [http://www.antlr.org/ Home page]
* Coco/R : [http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/ Home page]
* JavaCC : [https://javacc.dev.java.net/ Home page]
* PCCTS is now ANTLR, there is an archived site at http://www.polhode.com/pccts.html
* SLK : Strong LL(k) parsing, [http://home.earthlink.net/~slkpg/ Home page] has an in-depth discussion of LL(k) parsing
* Spirit Parser Framework : [http://spirit.sourceforge.net/ Home page] is a flexible LL(∞) parser generation framework in which the grammars themselves are written inline in a C++ program.
* JetPAG: Jet Parser Auto-Generator. An optimizing LL(k) parser generator.
* Parsec : [http://www.cs.uu.nl/~daan/parsec.html Home page] is a monadic parser combinator library for Haskell, which can parse LL(∞), context-sensitive grammars, but performs best when the grammar is LL(1).
* Ocaml Genlex Module : [http://caml.inria.fr/pub/docs/manual-ocaml/libref/Genlex.html Home page]
* JFLAP : [http://www.jflap.org Home page] an educational tool to learn LL(1) parsing
* Grammatica : [http://grammatica.percederberg.net/ Home page] - An LL(k) parser generator for C# and Java.

Other Resources

* [http://www.jambe.co.nz/UNI/FirstAndFollowSets.html An easy explanation of First and Follow Sets] (an attempt to explain the process of creating first and follow sets in a more straight forward way)
* [http://www.itu.dk/people/kfl/parsernotes.pdf] A tutorial on implementing LL(1) parsers in C#
* [http://pdos.csail.mit.edu/papers/packrat-parsing:icfp02.pdf] LL/LR parsers, various other topics, by Bryan Ford, MIT

See also

Parse tree - A tree that is generated by a parser.

Top-down parsing - The order in which an LL parse tree is generated.

Bottom-up parsing - The order in which an LR parse tree is generated.


Wikimedia Foundation. 2010.
