Constituency Parsing with CKY
Constituency parsing analyzes a sentence as nested phrases. Jurafsky and Martin introduce context-free grammars, treebanks, Chomsky normal form, ambiguity, CKY dynamic programming, span-based neural parsing, and parser evaluation. Eisenstein gives a complementary formal treatment of context-free languages, weighted context-free grammars, probabilistic CFGs, semiring parsing, grammar refinement, and links to transition-based parsing.
The main idea is that natural language has hierarchical structure. A noun phrase can contain a determiner and nominal; a verb phrase can contain a verb and noun phrase; larger phrases can be built recursively. CKY shows how to search the enormous space of possible trees efficiently when the grammar is in a suitable binary form.
Definitions
A context-free grammar is a tuple G = (N, Σ, R, S), where N is a set of nonterminals, Σ is a set of terminals, R is a set of productions, and S ∈ N is the start symbol. Productions have a single nonterminal on the left: A -> β, with A ∈ N and β a string of nonterminals and terminals.
A constituency parse tree shows how a sentence is derived from grammar rules. Internal nodes are nonterminals such as S, NP, and VP; leaves are words.
A grammar is in Chomsky normal form when every rule has one of these forms: A -> B C, where B and C are nonterminals, or A -> w, where w is a terminal.
CKY requires this binary branching structure. Any CFG can be converted into a weakly equivalent CNF grammar, meaning it recognizes the same strings, though the tree shape may change.
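As an illustration, the binarization step of that conversion can be sketched as follows; the `(lhs, rhs)` rule representation and the `binarize` helper are our own choices here, and unit-rule and empty-production elimination are omitted:

```python
def binarize(rules):
    """Split right-hand sides longer than two symbols by introducing
    fresh intermediate nonterminals (one step of CNF conversion)."""
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            # Fold the first two symbols into a fresh nonterminal.
            fresh = f"{lhs}|{rhs[0]}_{rhs[1]}"
            out.append((fresh, rhs[:2]))
            rhs = (fresh,) + rhs[2:]
        out.append((lhs, rhs))
    return out
```

For example, `binarize([("VP", ("V", "NP", "PP"))])` yields two binary rules; the fresh `VP|V_NP` symbol can be removed when converting parse trees back to the original grammar's shape.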
The CKY chart is a triangular table. Cell [i, j] stores nonterminals that can derive the span from word position i up to but not including j. For a binary rule A -> B C, CKY adds A to [i, j] if there is a split k such that B is in [i, k] and C is in [k, j].
A probabilistic CFG assigns a probability to each rule, with the probabilities of rules sharing a left-hand side summing to one. The probability of a tree is the product of its rule probabilities, often computed in log space to avoid underflow.
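The log-space computation looks like this; the toy probabilities and the nested-tuple tree encoding are illustrative choices, not from a treebank:

```python
import math

# Toy PCFG rule probabilities (illustrative numbers).
rule_prob = {
    ("S", ("V", "NP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("V", ("book",)): 0.2,
    ("Det", ("the",)): 0.6,
    ("N", ("flight",)): 0.1,
}

def tree_logprob(tree):
    """Sum log rule probabilities over a tree given as nested tuples,
    e.g. ("S", ("V", "book"), ("NP", ...))."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):   # preterminal over a word
        return math.log(rule_prob[(label, tuple(children))])
    rhs = tuple(c[0] for c in children)
    return math.log(rule_prob[(label, rhs)]) + sum(tree_logprob(c) for c in children)

tree = ("S", ("V", "book"), ("NP", ("Det", "the"), ("N", "flight")))
```

Summing logs rather than multiplying probabilities keeps long derivations numerically stable.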
Key results
CKY is dynamic programming over spans. It avoids repeated work because once a constituent is found for a span, that result can be reused in every larger span. For a sentence of length n and a grammar in CNF, there are O(n^2) spans and O(n) split points per span, giving an O(n^3) parsing structure, multiplied by grammar lookup costs.
Ambiguity is central. A sentence can have many parses due to prepositional phrase attachment, coordination, noun compound structure, and other phenomena. The basic CKY recognizer tells whether a sentence is in the language. A parser stores backpointers to recover one or all trees. A weighted or probabilistic parser stores scores to recover the best tree.
The CKY recurrence for a best-scoring parse is

`best[i, j, A] = max over rules A -> B C and splits i < k < j of score(A -> B C) + best[i, k, B] + best[k, j, C]`

Lexical rules initialize length-one spans:

`best[i, i+1, A] = score(A -> w_{i+1})` for each rule whose right-hand side is the word at position i.
Neural span-based parsers keep the chart idea but replace hand-written grammar scores with learned span scores. A transformer or BiLSTM encodes the sentence; an MLP scores each span and label; a CKY-style algorithm finds the highest-scoring tree. This preserves dynamic programming while using modern representations.
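A minimal sketch of that chart search, assuming label scoring is folded into a single `span_score` function; here a toy dictionary stands in for the MLP over encoder states, and the helper names are our own:

```python
def best_tree_score(n, span_score):
    """CKY-style search over binary trees: a tree's score is the sum
    of its span scores; splits are chosen by dynamic programming."""
    best, back = {}, {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[(i, j)] = span_score(i, j)
            else:
                k = max(range(i + 1, j),
                        key=lambda k: best[(i, k)] + best[(k, j)])
                best[(i, j)] = span_score(i, j) + best[(i, k)] + best[(k, j)]
                back[(i, j)] = k   # backpointer: chosen split
    return best[(0, n)], back

# Toy span scores standing in for a learned scorer.
scores = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
          (0, 2): -2.0, (1, 3): 3.0, (0, 3): 0.5}
total, back = best_tree_score(3, lambda i, j: scores[(i, j)])
```

Note that no grammar rules appear: the learned scores alone decide the splits, yet the search is the same span-and-split dynamic program as classical CKY.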
Parser evaluation commonly uses labeled precision, labeled recall, and F1 over constituents. A predicted constituent is correct if its span and label match the gold tree, usually after standard preprocessing conventions.
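A sketch of that metric over sets of `(label, start, end)` triples; real scoring tools such as EVALB add preprocessing conventions (punctuation, root handling) not shown here:

```python
def labeled_prf(pred, gold):
    """Labeled precision, recall, and F1 over constituent sets."""
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("S", 0, 3), ("NP", 1, 3)}
pred = {("S", 0, 3), ("NP", 0, 2)}   # right label, wrong span: not correct
p, r, f1 = labeled_prf(pred, gold)
```

The mislabeled-span case above shows why both the label and the exact span must match for credit.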
CKY is also a template for thinking about chart algorithms. The chart stores completed subanalyses, not partial guesses that must be recomputed. This is why dynamic programming works for ambiguous grammars: two larger parses can share the same noun phrase analysis for a span. The same idea appears in semiring parsing, where changing the operations changes the computation. A max-plus semiring finds the best parse; a sum-product semiring computes total probability; a boolean semiring performs recognition.
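The semiring view can be made concrete in a few lines; the `combine` helper and the score lists below are illustrative, showing only the aggregation step for one span-and-split combination:

```python
# One chart routine, three semirings: only (plus, times, zero) change.
SEMIRINGS = {
    "boolean": (lambda a, b: a or b, lambda a, b: a and b, False),
    "max_plus": (max, lambda a, b: a + b, float("-inf")),
    "sum_product": (lambda a, b: a + b, lambda a, b: a * b, 0.0),
}

def combine(left_items, right_items, semiring):
    """Aggregate scores for the candidate combinations of one split."""
    plus, times, zero = SEMIRINGS[semiring]
    total = zero
    for l in left_items:
        for r in right_items:
            total = plus(total, times(l, r))
    return total
```

With boolean scores this is recognition, with log scores and max-plus it is Viterbi parsing, and with probabilities and sum-product it is the inside computation; the chart loop itself never changes.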
Weighted grammars make the independence assumptions visible. A PCFG assumes that the probability of expanding a nonterminal depends only on that nonterminal, not on its parent, lexical head, or wider context. This is often too weak for natural language. Grammar refinements, parent annotation, lexicalization, and neural span scores all add context back into the scoring function while trying to preserve tractable search.
Constituency parsing remains useful even when dependency parsing is more compact. Constituents identify spans that behave as units, which is valuable for question answering, summarization, semantic role labeling, and grammar checking. A dependency tree says which word is head of a phrase; a constituency tree says where the phrase begins and ends. Many applications need both views.
Treebanks make parsing measurable but also define what counts as syntax. The Penn Treebank, for example, uses conventions for flat noun phrases, traces, function tags, and punctuation that affect both training and evaluation. A parser trained on one treebank learns those annotation decisions, not a universal truth about grammar. When applying a parser to another domain or language, check whether the target annotation scheme expects the same structures.
For hand calculations, it is useful to separate the recognizer from the scorer. First verify that every span can be built by grammar rules; then add probabilities or neural span scores. This prevents arithmetic mistakes from hiding grammar mistakes.
Visual
Sentence: book the flight
CKY span chart:
          [0,3] S
         /-------\
   [0,1] V     [1,3] NP
      |        /------\
    book  [1,2] Det  [2,3] N
               |         |
              the     flight
| Concept | In CKY | Why it matters |
|---|---|---|
| Span `[i, j]` | contiguous substring | unit of dynamic programming |
| Split | point `k` with `i < k < j` | combines two smaller constituents |
| CNF rule | `A -> B C` or `A -> w` | guarantees binary combination |
| Backpointer | chosen rule and split | reconstructs parse tree |
| Weight | log rule or span score | chooses among ambiguous parses |
Worked example 1: CKY recognition
Problem: parse book the flight with this CNF grammar:
S -> V NP
NP -> Det N
V -> book
Det -> the
N -> flight
Use positions 0 book 1 the 2 flight 3.
- Initialize length-one spans: `[0,1]` contains `V` because `V -> book`; `[1,2]` contains `Det` because `Det -> the`; `[2,3]` contains `N` because `N -> flight`.
- Fill length-two span `[0,2]`: the only split is `k = 1`, giving left `[0,1] = {V}` and right `[1,2] = {Det}`. No rule expands to `V Det`, so add nothing.
- Fill length-two span `[1,3]`: the only split is `k = 2`, giving left `[1,2] = {Det}` and right `[2,3] = {N}`. Rule `NP -> Det N` applies, so add `NP` to `[1,3]`.
- Fill length-three span `[0,3]`: split at `k = 1` gives left `[0,1] = {V}` and right `[1,3] = {NP}`. Rule `S -> V NP` applies, so add `S`. Split at `k = 2`: `[0,2]` is empty, so nothing else.
Checked answer: because S is in [0,3], the sentence is recognized and has parse (S (V book) (NP (Det the) (N flight))).
Worked example 2: choosing a weighted parse
Problem: a span `[0,3]` can be built as `VP -> V NP` at two split points:

- Split `k = 1`: the candidate total is the log score of `V` over `[0,1]`, plus the log score of `NP` over `[1,3]`, plus the rule score.
- Split `k = 2`: the candidate total is the log score of `V` over `[0,2]`, plus the log score of `NP` over `[2,3]`, plus the rule score.

Use log scores and choose the maximum.

- Compute the total for split `k = 1`.
- Compute the total for split `k = 2`.
- Compare the two totals and keep the larger.

Checked answer: choose the split with the larger total log score and store its backpointer. In probability space, this is the higher-probability derivation because log probabilities closer to zero are larger.
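This comparison is exactly what a weighted CKY inner loop performs. A minimal sketch in log space; the helper name and the numeric scores are illustrative, not from the text:

```python
import math

def best_split(i, j, rule_score, left_scores, right_scores):
    """Pick the split k maximizing rule + left + right log scores.
    left_scores[k] is the left child's score over [i, k];
    right_scores[k] is the right child's score over [k, j]."""
    best_k, best_total = None, float("-inf")
    for k in range(i + 1, j):
        total = rule_score + left_scores[k] + right_scores[k]
        if total > best_total:
            best_k, best_total = k, total
    return best_k, best_total

# Illustrative log scores for the two candidate splits of [0, 3].
left = {1: math.log(0.2), 2: math.log(0.05)}
right = {1: math.log(0.3), 2: math.log(0.4)}
k, total = best_split(0, 3, math.log(0.5), left, right)
```

Here split `k = 1` wins because 0.5 * 0.2 * 0.3 exceeds 0.5 * 0.05 * 0.4, matching the rule that the log total closer to zero corresponds to the higher probability.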
Code
from collections import defaultdict
words = "book the flight".split()
binary = {
("V", "NP"): ["S"],
("Det", "N"): ["NP"],
}
lexicon = {
"book": ["V"],
"the": ["Det"],
"flight": ["N"],
}
n = len(words)
chart = [[set() for _ in range(n + 1)] for _ in range(n)]
back = {}
for i, word in enumerate(words):
for lhs in lexicon.get(word, []):
chart[i][i + 1].add(lhs)
back[(i, i + 1, lhs)] = word
for span in range(2, n + 1):
for i in range(n - span + 1):
j = i + span
for k in range(i + 1, j):
for b in chart[i][k]:
for c in chart[k][j]:
for a in binary.get((b, c), []):
chart[i][j].add(a)
back[(i, j, a)] = (k, b, c)
print("S" in chart[0][n])
print(chart[0][n])
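To go from recognition to parsing, follow the backpointers. A sketch using the entries the code above produces for this sentence; the `back` dictionary is reproduced inline so the snippet is self-contained:

```python
def build_tree(i, j, label):
    """Rebuild a tree by following backpointers: strings are words,
    tuples record (split, left_label, right_label)."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):          # lexical backpointer
        return (label, entry)
    k, b, c = entry
    return (label, build_tree(i, k, b), build_tree(k, j, c))

# Backpointers as the chart code fills them for "book the flight".
back = {
    (0, 1, "V"): "book", (1, 2, "Det"): "the", (2, 3, "N"): "flight",
    (1, 3, "NP"): (2, "Det", "N"), (0, 3, "S"): (1, "V", "NP"),
}
```

Calling `build_tree(0, 3, "S")` reconstructs the nested-tuple parse, which matches the bracketed tree from worked example 1.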
Common pitfalls
- Running CKY on a grammar that has not been converted to CNF or otherwise adapted.
- Confusing recognition with parsing; recognition only says whether a parse exists.
- Keeping chart entries but no backpointers, then being unable to reconstruct trees.
- Forgetting that CNF conversion can alter tree shape, affecting evaluation unless reversed or normalized.
- Assuming one parse means one meaning; syntactic ambiguity often remains semantically important.
- Evaluating by exact whole-tree match only; constituent precision and recall give more informative diagnostics.
- Treating neural span parsers as unrelated to CKY when many still use chart-style dynamic programming.