
A Brief Introduction to Statistical Parsing

by Authoring, March 7th, 2025

Too Long; Didn't Read

Statistical parsing uses probabilistic context-free grammars (PCFGs) to analyze sentence structures. Each rule in a grammar has a probability based on corpus data, allowing NLP models to disambiguate competing parse trees. This method plays a crucial role in text classification, including authorship attribution.


Authors:

(1) Todd K. Moon, Electrical and Computer Engineering Department, Utah State University, Logan, Utah;

(2) Jacob H. Gunther, Electrical and Computer Engineering Department, Utah State University, Logan, Utah.

Abstract and 1 Introduction and Background

2 Statistical Parsing and Extracted Features

3 Parse Tree Features

4 Classifier

5 Dimension Reduction

6 The Federalist Papers

6.1 Sanditon

7 Conclusions, Discussion, and Future Work

A. A Brief Introduction to Statistical Parsing

B. Dimension Reduction: Some Mathematical Details

References

A. A Brief Introduction to Statistical Parsing

At the suggestion of an anonymous reviewer, this appendix was written to provide a brief discussion of statistical parsing, drawing very closely from [26, Chapter 14, Statistical Parsing]. More detailed discussions are provided in [21, 24]. The probabilistic grammar employed is a probabilistic context-free grammar (PCFG). Such a grammar contains rules for transforming nonterminal symbols into strings of symbols, which may themselves be nonterminal or terminal symbols. In a PCFG, each rule is accompanied by a probability. As an example, Table 17 shows a grammar for a toy language (used for airline reservations). Each rule in the table of the form A → B [p] means that p is the probability that the nonterminal A will be expanded to the sequence B. This can alternatively be represented as P(A → B), or as P(A → B | A), or as P(RHS | LHS), where LHS and RHS mean “left hand side” and “right hand side,” respectively.
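To make the rule-probability notation concrete, the following is a minimal sketch (not taken from [26] or from the paper itself) of how such a rule table might be represented in code; the two S rules and their probabilities come from the discussion above, and the remaining entries are placeholders.

```python
# Minimal sketch: PCFG rules stored as (LHS, RHS) pairs mapped to P(RHS | LHS).
# The two S rules and their probabilities are the ones quoted in the text;
# the rest of the toy grammar would be filled in the same way.
pcfg_rules = {
    ("S", ("NP", "VP")): 0.80,          # S -> NP VP      [0.80]
    ("S", ("Aux", "NP", "VP")): 0.15,   # S -> Aux NP VP  [0.15]
    # ... remaining rules of the toy grammar ...
}

def rule_probability(lhs, rhs):
    """Return P(LHS -> RHS), i.e., P(RHS | LHS); 0.0 if the rule is not in the grammar."""
    return pcfg_rules.get((lhs, tuple(rhs)), 0.0)

print(rule_probability("S", ["NP", "VP"]))  # 0.8
```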


In Table 17, S denotes the start symbol (for a sentence). The grammar’s first rule says that a sentence may consist of an NP (noun phrase) followed by a VP (verb phrase), and that such a rule occurs with probability 0.8. The second rule says that a sentence may consist of an auxiliary (such as does or can) followed by an NP and then a VP, with probability 0.15. The next rule says that a sentence can be just a verb phrase, VP. The symbols obtained by applying a rule can themselves be recursively expanded using their own rules, also shown in the table. Thus, an NP may consist of a pronoun, a proper noun, etc. The probabilities are estimated from a large corpus of data parsed by a linguist.


There is also a lexicon (or dictionary) of terms, each of which has a probability within its lexical category. The right side of Table 17 illustrates a lexicon for this application. For example, a determiner Det can be that (with probability 0.1), a (with probability 0.3), or the (with probability 0.6). A noun (in this application context) can be book, flight, meal, money, flights, or dinner, each with its respective probability. The probabilities are (again) estimated from a corpus.
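As an illustration of how such a probabilistic lexicon might look in code, here is a small sketch (again not drawn from the paper); the Det probabilities are the ones given above, while the Noun probabilities are hypothetical placeholders.

```python
import random

# Minimal sketch: a probabilistic lexicon mapping each lexical category to a
# distribution over words. Det probabilities are from the text; the Noun
# probabilities are hypothetical placeholders only.
lexicon = {
    "Det":  {"that": 0.1, "a": 0.3, "the": 0.6},
    "Noun": {"book": 0.1, "flight": 0.3, "meal": 0.1,
             "money": 0.1, "flights": 0.3, "dinner": 0.1},
}

# Each category's probabilities should sum to 1 (up to rounding),
# since they are estimated as relative frequencies from a corpus.
for category, dist in lexicon.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

def sample_word(category):
    """Draw a word from the given category according to its probabilities."""
    words = list(lexicon[category])
    weights = [lexicon[category][w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print(sample_word("Det"))  # e.g., "the"
```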


A PCFG can be used to estimate the probability of a parse tree, which can be used to disambiguate multiple parsings, or it can be used to determine the probability of a sentence in a language modeling setting.
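In the language-modeling use, the probability of a sentence S is typically obtained by summing P(T, S) over all parse trees of S:

$$P(S) = \sum_{T \in \tau(S)} P(T, S),$$

where τ(S) denotes the set of parse trees of S.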


“The probability of a particular parse tree T is defined as the product of the probabilities of all the rules used to expand each of the n non-terminal nodes in the parse tree T, where each rule i can be expressed as $\mathrm{LHS}_i \rightarrow \mathrm{RHS}_i$:

$$P(T, S) = \prod_{i=1}^{n} P(\mathrm{RHS}_i \mid \mathrm{LHS}_i).$$

The resulting probability P(T, S) is . . . the joint probability of the parse and the sentence and the probability of the parse P(T).” [26, p. 462]. In computing the probability of a tree, there is a factor for every rule applied, and each such factor corresponds to an edge of the tree.
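As a brief illustration of this product-of-rules computation, here is a minimal Python sketch (not from the paper); the rule sequence and the probabilities in the example are hypothetical placeholders.

```python
from math import prod

def tree_probability(applied_rules):
    """applied_rules: list of (lhs, rhs, probability) triples, one per expansion.
    Returns the product of P(RHS_i | LHS_i) over all expansions in the tree."""
    return prod(p for _lhs, _rhs, p in applied_rules)

# Hypothetical rule sequence for a small tree (probabilities are placeholders).
example_rules = [
    ("S",  ("VP",),            0.05),
    ("VP", ("Verb", "NP"),     0.20),
    ("NP", ("Det", "Nominal"), 0.20),
]
print(tree_probability(example_rules))  # 0.05 * 0.20 * 0.20 = 0.002
```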


As an example, consider two different ways of parsing the sentence: “Book the dinner flight.” This can be parsed (understood) either as


  1. Book a flight that serves dinner


  2. Book a flight for [on behalf of] the dinner.


The parse tree, rules, and corresponding probability for parsing 1 are shown here:



The parse tree, rules, and corresponding probability for parsing 2 are shown here:



The probabilities computed for these two parse structures show that parsing 1 has a much higher probability than parsing 2, which accords with a common understanding of the sense of the sentence.


The parser works through the text, probabilistically associating each word with its grammatical element in the context of the tree being built. When competing trees are constructed, the tree with the highest probability is accepted.
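The selection step can be sketched as follows (a hypothetical example, not the paper's implementation): each competing tree is reduced to the list of rule probabilities it used, and the tree whose product is largest wins.

```python
from math import prod

# Hypothetical candidate parses for one sentence, each represented only by the
# probabilities of the rules it applied. The numbers are placeholders, not the
# values from the "Book the dinner flight" example.
candidates = {
    "parse 1 (dinner modifies flight)": [0.05, 0.20, 0.20, 0.75, 0.30, 0.60, 0.10, 0.40],
    "parse 2 (flight for the dinner)":  [0.05, 0.10, 0.20, 0.15, 0.75, 0.30, 0.60, 0.10, 0.40],
}

def best_parse(cands):
    """Return the candidate parse whose rule probabilities have the largest product."""
    return max(cands, key=lambda name: prod(cands[name]))

print(best_parse(candidates))  # parse 1 (dinner modifies flight)
```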


This paper is available on arXiv under a CC BY 4.0 DEED license.