# Earley parser

The Earley parser is a type of chart parser mainly used for parsing in computational linguistics, named after its inventor, Jay Earley.

Earley parsers are appealing because they can parse all context-free languages. The Earley parser executes in cubic time in the general case, and quadratic time for unambiguous grammars. It performs particularly well when the rules are written left-recursively.

## Performing the Algorithm

To understand how Earley's algorithm executes, you have to understand dot notation. Given a production A -> BCD (where B, C, and D are symbols in the grammar, terminals or nonterminals), the notation A -> B • C D represents a condition in which B has already been parsed and the sequence C D is expected.

For every input position (which represents a position between tokens), the parser generates a state set. Each state is the cartesian product (that is, just the combination) of:

• A dot condition for a particular production.
• The position at which the matching of this production began: the origin state.

The state at input position k is called S(k). The parser is seeded with S(0) being the top-level rule. The parser then iteratively operates in three stages: prediction, scanning, and completion. In the following descriptions, α, β, and γ represent any sequence of terminals/nonterminals (including the null sequence), X, Y, and Z represent single nonterminals, and a represents a terminal symbol.

• Prediction: For every state in S(k) of the form (X -> α • Y β, j) (where j is the origin state as above), add (Y -> • γ, k) to S(k) for every production with Y on the left-hand side.
• Scanning: If a is the next symbol in the input stream, for every state in S(k) of the form (X -> α • a β, j), add (X -> α a • β, j) to S(k+1).
• Completion: For every state in S(k) of the form (X -> γ •, j), find states in S(j) of the form (Y -> α • X β, i) and add (Y -> α X • β, i) to S(k).

These steps are repeated until no more states can be added to the set. This is generally realized by making a queue of states to process, and performing the corresponding operation depending on what kind of state it is.

## Example

The algorithm is hard to see from the abstract description above. It becomes much clearer how it operates once you see it in action. The output is a little verbose, but you should be able to follow it.

Let's say you have the following simple arithmetic grammar:

``` P -> S      # the start rule
S -> S + M
| M
M -> M * T
| T
T -> number
```

And you have the input:

``` 2 + 3 * 4
```

Here are the generated state sets:

``` (state no.) Production          (Origin) # Comment
---------------------------------
== S(0): • 2 + 3 * 4 ==
(1)  P -> • S         (0)    # start rule
(2)  S -> • S + M     (0)    # predict from (1)
(3)  S -> • M         (0)    # predict from (1)
(4)  M -> • M * T     (0)    # predict from (3)
(5)  M -> • T         (0)    # predict from (3)
(6)  T -> • number    (0)    # predict from (5)

== S(1): 2 • + 3 * 4 ==
(1)  T -> number •    (0)    # scan from S(0)(6)
(2)  M -> T •         (0)    # complete from S(0)(5)
(3)  M -> M • * T     (0)    # complete from S(0)(4)
(4)  S -> M •         (0)    # complete from S(0)(3)
(5)  S -> S • + M     (0)    # complete from S(0)(2)
(6)  P -> S •         (0)    # complete from S(0)(1)

== S(2): 2 + • 3 * 4 ==
(1)  S -> S + • M     (0)    # scan from S(1)(5)
(2)  M -> • M * T     (2)    # predict from (1)
(3)  M -> • T         (2)    # predict from (1)
(4)  T -> • number    (2)    # predict from (3)

== S(3): 2 + 3 • * 4 ==
(1)  T -> number •    (2)    # scan from S(2)(4)
(2)  M -> T •         (2)    # complete from S(2)(3)
(3)  M -> M • * T     (2)    # complete from S(2)(2)
(4)  S -> S + M •     (0)    # complete from S(2)(1)
(5)  S -> S • + M     (0)    # complete from S(0)(2)
(6)  P -> S •         (0)    # complete from S(0)(1)

== S(4): 2 + 3 * • 4 ==
(1)  M -> M * • T     (2)    # scan from S(3)(3)
(2)  T -> • number    (4)    # predict from (1)

== S(5): 2 + 3 * 4 • ==
(1)  T -> number •    (4)    # scan from S(4)(2)
(2)  M -> M * T •     (2)    # complete from S(4)(1)
(3)  M -> M • * T     (2)    # complete from S(2)(2)
(4)  S -> S + M •     (0)    # complete from S(2)(1)
(5)  S -> S • + M     (0)    # complete from S(0)(2)
(6)  P -> S •         (0)    # complete from S(0)(1)
```

And now the input is parsed, since we have the state (P -> S •, 0) (note that we also had it in S(3) and S(1); those were complete sentences).

• Art and Cultures
• Countries of the World (http://www.academickids.com/encyclopedia/index.php/Countries)
• Space and Astronomy