Bottom-up parsing

Bottom-up parsing (also known as shift-reduce parsing) is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It attempts to build trees upward toward the start symbol. It occurs in the analysis of both natural languages and computer languages.

Linguistics

In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence. This means that rather than beginning with the starting symbol and generating an input string, we shall examine the string and attempt to work our way back to the starting symbol. We can gain some power by starting at the bottom and working our way up.

Computer Science

In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.

Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.

It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generators are YACC and GNU bison.

An example using a parse tree

A trivial example illustrates the difference. Here is a trivial grammar:

S → AxbrA → abrA → b

For the input sentence ax, the leftmost derivation is

S → Ax → ax

which also happens to be the rightmost derivation as there is only one nonterminal ever to replace in a sentential form.

An LL(1) parser starts with S and asks "which production should I attempt?" Naturally, it predicts the only alternative of S. From there it tries to match A by calling method A (in a recursive-descent parser). Lookahead a predicts production

A → a

The parser matches a, returns to S and matches x. Done. The derivation tree is:

S / A x
a

A bottom up parser is trying to go backwards, performing the following reverse derivation sequence:

ax → Ax → S

Intuitively, a top-down parser tries to expand nonterminals into right-hand-sides and a bottom-up parser tries to replace (reduce) right-hand-sides with nonterminals.

The first action of the bottom-up parser would be to replace a with A yielding Ax. Then it would replace Ax with S. Once it arrives at a sentential form with exactly S, it has reached the goal and stops, indicating success.

Just as with top-down parsing, a brute-force approach will work. Try every replacement until you run out of right-hand-sides to replace or you reach a sentential form consisting of exactly S. While not obvious here, not every replacement is valid and this approach may try all the invalid ones before attempting the correct reduction. Backtracking is extremely inefficient, but as you would expect lookahead proves useful in reducing the number of "wrong turns."

Type of bottom-up parsers

The common classes of bottom-up parsing are:
* LR parser
** LR(0) - No lookahead symbol
** SLR(1) - Simple with one lookahead symbol
** LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC uses this language.
** LR(1) - Most general language, but most complex to implement.
** LR(n) - (where n is a positive integer) indicates an LR parser with n lookahead symbols; while languages can be designed that require more than 1 lookahead, practical languages try to avoid this because increasing n can theoretically require exponentially more code and data space (in practice, this may not be as bad).
*Precedence parsers
**Simple precedence parser
**Operator-precedence parser
**Extended precedence parser

hift-reduce parsers

The most common bottom-up parsers are the shift-reduce parsers. These parsers examine the input tokens and either shift (push) them onto a stack or reduce elements at the top of the stack, replacing a right-hand side by a left-hand side.

Action table

Often an action (or parse) table is constructed which helps the parser determine what to do next. The following is a description of what can be held in an action table.

Actions
* Shift - push token onto stack
* Reduce - remove handle from stack and push on corresponding nonterminal
* Accept - recognize sentence when stack contains only the distinguished symbol and input is empty
* Error - happens when none of the above is possible; means original input was not a sentence!

hift and reduce

A shift-reduce parser uses a stack to hold the grammar symbols while awaiting reduction. During the operation of the parser, symbols from the input are shifted onto the stack. If a prefix of the symbols on top of the stack matches the RHS of a grammar rule which is the correct rule to use within the current context, then the parser reduces the RHS of the rule to its LHS, replacing the RHS symbols on top of the stack with the nonterminal occurring on the LHS of the rule. This shift-reduce process continues until the parser terminates, reporting either success or failure. It terminates with success when the input is legal and is accepted by the parser. It terminates with failure if an error is detected in the input.

The parser is a stack automaton which is in one of several discrete states. In reality, the parse stack contains states, rather than grammar symbols. However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the grammar symbol stack mentioned earlier.

An example of shift-reduce parsing

# Start with the sentence to be parsed as the initial sentential form
# Until the sentential form is the start symbol do:
## Scan through the input until we recognise something that corresponds to the RHS of one of the production rules (this is called a handle)
## Apply a production rule in reverse; i.e., replace the RHS of the rule which appears in the sentential form with the LHS of the rule (an action known as a reduction)

In step 2.1 above we are "shifting" the input symbols to one side as we move through them; hence a parser which operates by repeatedly applying steps 2.1 and 2.2 above is known as a shift-reduce parser.

A shift-reduce parser is most commonly implemented using a stack, where we proceed as follows:

* start with an empty stack
* a "shift" action corresponds to pushing the current input symbol onto the stack
* a "reduce" action occurs when we have a handle on top of the stack. To perform the reduction, we pop the handle off the stack and replace it with the terminal on the LHS of the corresponding rule.

Figure 1.

Take the language: Sentence --> NounPhrase VerbPhrase NounPhrase --> Art Noun VerbPhrase --> Verb | Adverb Verb Art --> the | a | ... Verb --> jumps | sings | ... Noun --> dog | cat | ...
And the input: the dog jumps
Then the bottom up parsing is:Stack Input Sequence() (the dog jumps)(the) (dog jumps) SHIFT word onto stack(Art) (dog jumps) REDUCE using grammar rule(Art dog) (jumps) SHIFT..(Art Noun) (jumps) REDUCE..(NounPhrase) (jumps) REDUCE(NounPhrase jumps) () SHIFT(NounPhrase Verb) () REDUCE(NounPhrase VerbPhrase)() REDUCE(Sentence) () SUCCESS
Given the language: --> | + --> | * --> [ ] | 0...9
() (2 * [ 1 + 3 ] ) SHIFT(2) (* [ 1 + 3 ] ) REDUCE() (* [ 1 + 3] ) SHIFT( *) ( [ 1 + 3] ) SHIFT( * [) (1 + 3] ) SHIFT( * [ 1) (+ 3] ) REDUCE (twice)( * [ ) (+ 3 ] ) SHIFT (twice)( * [ + 3) ( ] ) REDUCE (thrice)( * [ + ) ( ] ) REDUCE( * [ ) ( ] ) SHIFT( * [ ] ) () REDUCE( * ) () REDUCE( * ) () REDUCE() () REDUCE() () SUCCESS

ee also

* Parsing
* Top-down parsing

External links

* [http://lambda.uta.edu/cse5317/notes/node18.html An example of shift-reduce parsing] (which is a type of bottom up parsing), with a small grammar, state diagram, and C language code to implement the parser
* [http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Outlines/outline.20.html Course notes on shift reduce parsing]
* [http://nltk.sourceforge.net/tutorial/parsing/section-approaches.html A good non-technical tutorial in the context of natural (human) languages]
* [http://www.gobosoft.com/eiffel/gobo/geyacc/algorithm.html A discussion of shift-reduce conflicts in bottom up parsers] . A knowledgeable but technical article.
* [http://www.cs.uky.edu/~lewis/essays/compilers/bu-parse.html Yet another bottom-up parsing illustration]

Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать реферат

Look at other dictionaries:

Bottom-up — may refer to:* In business development, a bottom up approach means that the adviser takes the needs and wishes of the would be entrepreneur as the starting point, rather than a market opportunity (which would be a top down approach). * Top down… … Wikipedia
Parsing — In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given (more or less) formal grammar.Parsing is also… … Wikipedia
Bottom-Up Parser — Der Begriff Bottom up Parser bzw. Aufwärtsparser bezeichnet ein Analyse Werkzeug für natürliche und formale Sprachen. Im Regelfall wird ein Parser als Teil eines Übersetzungsprogramms von einer Sprache in eine andere eingesetzt. Bei… … Deutsch Wikipedia
Parsing — Ein Parser [ˈpɑːɹsɚ] (engl. to parse „analysieren“ bzw. von lateinisch pars „Teil“; im Deutschen gelegentlich auch Zerteiler) ist ein Computerprogramm, das in der Computertechnik für die Zerlegung und Umwandlung einer beliebigen Eingabe in ein… … Deutsch Wikipedia
Bottom-Up-Parser — Der Begriff Bottom up Parser bzw. Aufwärtsparser bezeichnet ein Analyse Werkzeug für natürliche und formale Sprachen. Im Regelfall wird ein Parser als Teil eines Übersetzungsprogramms von einer Sprache in eine andere eingesetzt. Bei… … Deutsch Wikipedia
Parsing table — A parsing table is the part of a parser that makes decisions about how to treat input tokens in compiler development.: This page is about LR parsing tables, there are also LL parsing tables which are substantially different. Overview A parsing… … Wikipedia
Top-down parsing — is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural… … Wikipedia
Top-down and bottom-up design — Top down and bottom up are strategies of information processing and knowledge ordering, mostly involving software, but also other humanistic and scientific theories (see systemics). In practice, they can be seen as a style of thinking and… … Wikipedia
LL parser — An LL parser is a top down parser for a subset of the context free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser). The class of grammars which are… … Wikipedia
Aufwärtsparser — Der Begriff Bottom up Parser bzw. Aufwärtsparser bezeichnet ein Analyse Werkzeug für natürliche und formale Sprachen. Im Regelfall wird ein Parser als Teil eines Übersetzungsprogramms von einer Sprache in eine andere eingesetzt. Bei… … Deutsch Wikipedia

Academic Dictionaries and Encyclopedias

Bottom-up parsing

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Bottom-up parsing

Look at other dictionaries:

Share the article and excerpts

Direct link