Final Exam Study Guide | ENLP Fall 2016
This page provides a list of concepts you should be familiar with and questions you should be able to answer if you are thoroughly familiar with the material in the course. It is safe to assume that if you have a good grasp of everything listed here and in the midterm study guide, you will do well on the exam. However, we cannot guarantee that only the topics mentioned here will appear on the exam.
How to review
You should review the lecture slides, quizzes, and homework assignments. The readings should be helpful as well. If there are topics that you are not clear on from these resources, please ask on the discussion board or in office hours.
Assume the instructors will not be available to answer questions in the 48 hours preceding the exam.
Exam procedures
The exam will be completed without use of a laptop, calculator, or textbook/reference materials.
Scope of the final
Everything in the course is fair game.
In addition to this study guide, it is therefore recommended that you review the midterm topics.
The wrap-up slides from the last lecture summarize several major themes of the course.
Style of questions
The final will have a variety of question types.
Be prepared for a greater number of short answer questions
than in the midterm/quizzes.
These may be broadly worded to allow flexibility in which
specific representations/models/algorithms you use in your answer.
Some parts of the exam may give you a choice of questions to answer.
Structured prediction algorithms
You should be able to explain why some models require structured prediction,
and the difference between
- full enumeration/exhaustive search
- greedy decoding
- dynamic programming
for taggers and parsers.
You should understand the Viterbi and CKY algorithms well enough to
illustrate them by hand and discuss their asymptotic complexity.
Recall that Viterbi is used for sequence taggers (the HMM and structured perceptron),
while CKY is used for parsing with a CFG or PCFG.
You should also be familiar with the intuition behind algorithms for dependency parsing,
including theoretical complexity and practical considerations
(e.g., why it might be desirable to model dependency parses
directly vs. constituency-parse and then convert to dependencies).
(This year, we did not really talk about beam search, so you will not be asked about this technique.)
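To make the dynamic program concrete, here is a minimal Viterbi sketch for an HMM tagger. The function, parameter names, and toy setup are assumptions for illustration, not the course's exact notation.

```python
import numpy as np

# Assumed conventions: init[i] = P(tag_i at position 1),
# trans[i, j] = P(tag_j | tag_i), emit[i, w] = P(word_w | tag_i);
# obs is a list of word indices.
def viterbi(obs, init, trans, emit):
    K, T = len(init), len(obs)
    delta = np.full((T, K), -np.inf)        # best log-prob ending in each tag
    backptr = np.zeros((T, K), dtype=int)   # best predecessor tag

    delta[0] = np.log(init) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        for j in range(K):
            scores = delta[t - 1] + np.log(trans[:, j]) + np.log(emit[j, obs[t]])
            backptr[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[backptr[t, j]]

    # Trace back-pointers from the best final tag.
    tags = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]

# Each cell maximizes over all K predecessor tags, so the run time is
# O(T * K^2) for a sentence of length T with K tags.
```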
Discriminative sequence tagging
You should be able to explain the relationship between the structured perceptron and (i) the regular perceptron, (ii) HMMs.
Annotation
You should be able to answer questions about annotation concepts like
- Crowdsourcing: for what types of annotation can it work, and what methods are needed to ensure high quality?
- What factors can contribute to the challenges or costs of linguistic annotation?
- Inter-annotator agreement: what is it and what is it used for?
- Given a confusion matrix, you should be able to calculate raw agreement and Cohen's kappa (but you do not have to memorize the formula for kappa).
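As a concrete reference for the agreement calculations, here is a small sketch using a hypothetical 2x2 confusion matrix (the counts are made up):

```python
import numpy as np

# Hypothetical confusion matrix: rows = annotator A's labels,
# columns = annotator B's labels, cells = item counts.
conf = np.array([[20,  5],
                 [10, 15]])

total = conf.sum()

# Raw (observed) agreement: fraction of items on the diagonal.
p_o = np.trace(conf) / total

# Expected agreement by chance: for each label, multiply the two
# annotators' marginal probabilities, then sum over labels.
p_e = (conf.sum(axis=1) / total * (conf.sum(axis=0) / total)).sum()

# Cohen's kappa: agreement above chance, normalized by the maximum possible.
kappa = (p_o - p_e) / (1 - p_e)

print(p_o, p_e, kappa)  # here: p_o = 0.70, p_e = 0.50, kappa = 0.40
```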
Grammars and syntax
We covered Context-Free Grammars (CFGs) and Probabilistic Context-Free Grammars (PCFGs).
- You should be able to explain the linguistic and computational pros and cons of modeling language with n-grams/sequences vs. hierarchical syntactic structure.
- You should be familiar with the terms language, grammar, rule/production,
parse, nonterminal, terminal, and yield in formal language theory.
- With respect to natural language syntax, you should understand
the difference between a constituency/phrase structure tree and a dependency tree.
- You should understand the relationship between grammar rules and derivations of the grammar (trees). You should be able to give examples of trees that are licensed by a grammar.
You should be able to determine whether the language expressed by a grammar is finite and whether it is recursive. Given a sentence or tree, you should be able to determine whether or not it is licensed by the grammar, and if so, which rules it uses.
- You should understand the purpose of binarization and be able to binarize a CFG without altering the language (set of sentences it licenses).
- You should be able to label a simple English sentence with its constituency or dependency tree (including labels like S, NP, VP, PP for constituents and subject, object, etc. for dependencies).
You should understand the concept of syntactic head-modifier dependencies,
including the difference between functional and content heads;
lexicalization, by which heads are incorporated in a phrase structure grammar;
and conversion from a phrase structure tree to a dependency tree. You should be able to tell whether a dependency parse is projective or nonprojective.
- Parse evaluation: You should be able to compute unlabeled attachment score or labeled attachment score for a dependency parse.
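To make the attachment scores concrete, here is a tiny sketch with hypothetical gold and predicted parses for a 3-token sentence (0 denotes the root):

```python
# Each token is represented as (head index, relation label).
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]

# UAS: fraction of tokens whose predicted head matches the gold head.
uas = sum(p[0] == g[0] for p, g in zip(pred, gold)) / len(gold)
# LAS: fraction whose head AND label both match.
las = sum(p == g for p, g in zip(pred, gold)) / len(gold)
print(uas, las)  # ~0.667 and ~0.667 for this toy example
```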
A PCFG is a generative model over trees (nonterminals and terminals).
As with the other generative models in this course (see midterm topics),
you should be able to describe the independence assumptions and generative process,
compute probabilities, etc.
(You will not be tested extensively on the Chomsky Hierarchy, but you should
be aware that CFGs are strictly more expressive than regular grammars
(equivalently, regular expressions), and computationally more expensive to parse with.
Both are classes of formal grammars.)
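One way to see the PCFG independence assumption at work: the probability of a tree is the product of the probabilities of the rules used in its derivation. The toy grammar and tree encoding below are made up for illustration.

```python
from math import prod

# Toy PCFG: maps (lhs, rhs) to a rule probability; for each lhs, the
# probabilities of its rules sum to 1.
rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("D", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 1.0,
}

def tree_prob(tree):
    """tree = (label, [children]) for nonterminals, or a plain string for a word."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    # Each nonterminal rewrites independently of the rest of the tree.
    return rule_probs[(label, rhs)] * prod(tree_prob(c) for c in children)

t = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
           ("VP", [("V", ["chased"]), ("NP", [("N", ["cat"])])])])
print(tree_prob(t))  # 1.0 * 0.6 * 1.0 * 0.5 * 1.0 * 1.0 * 0.4 * 0.5 = 0.06
```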
Semantic roles
- You should be able to argue why syntactic relationships in a sentence are not the same as semantic relationships, and why some tasks could benefit from semantic relationships. E.g., give examples of syntactic structures that are ambiguous in their semantic roles, or syntactically similar sentences with different semantic roles.
- You should be able to explain the key differences between PropBank and FrameNet as semantic role resources/representations.
- Given a sentence and an inventory of roles, you should be able to label phrases with roles.
Similarity and distributional representations
For example, you should be able to:
- State and explain the distributional hypothesis.
- Give examples of how similarity between word types or documents can be useful.
- Given a small corpus, construct a distributional word vector using neighboring word counts or PMI scores.
- Compute the cosine similarity between two vectors (this and the previous item are illustrated in the sketch after this list).
- Map between Brown cluster bit strings and a hierarchy, and determine which pairs of clusters are most similar.
- Explain the effect of window size on the nature of the distributional word representations (vectors/clusters) that are induced.
- Explain the value of dimensionality reduction techniques for obtaining word or document representations.
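Here is a small sketch tying the PMI and cosine items together on a made-up word-context co-occurrence matrix. Clipping negative values to zero gives positive PMI (PPMI), a common variant; check the slides for the exact weighting used in class.

```python
import numpy as np

# Toy counts: rows = target words, columns = context words.
counts = np.array([[4., 1., 0.],
                   [3., 2., 1.],
                   [0., 1., 5.]])

total = counts.sum()
p_wc = counts / total                       # joint probabilities P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)       # marginal P(w)
p_c = p_wc.sum(axis=0, keepdims=True)       # marginal P(c)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))        # PMI(w, c) = log P(w,c) / (P(w) P(c))
pmi = np.maximum(pmi, 0)                    # clip to get PPMI (avoids -inf for zero counts)

def cosine(u, v):
    # Dot product divided by the product of the vector lengths.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(pmi[0], pmi[1]))   # similarity of the first two target words
```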
Applications and other topics
- Machine translation, especially
- how the Noisy Channel model can be used for statistical MT
- subtasks: word alignment, translation model training, language model training, decoding
- for word alignment, the (generative) IBM Models 1 & 2: what independence assumptions they make, etc.
- the challenge of evaluation
- The EM algorithm: how it iterates between (hard or soft) prediction and parameter estimation; examples of models it can be used for (see the IBM Model 1 sketch after this list).
- Unsupervised learning: You should be able to give examples of unsupervised learning tasks/algorithms we have discussed in class.
- Salient aspects of information retrieval, question answering, and neural networks.
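To illustrate the EM iteration and Model 1's independence assumptions, here is a compact sketch on a toy parallel corpus. The data, initialization, and number of iterations are made up, and real setups also include a NULL source token.

```python
from collections import defaultdict

# Toy parallel corpus of (English, French) sentence pairs.
corpus = [("the house".split(), "la maison".split()),
          ("the book".split(),  "le livre".split()),
          ("a house".split(),   "une maison".split())]

# Initialize t(f | e) uniformly over co-occurring word pairs.
t = defaultdict(float)
for e_sent, f_sent in corpus:
    for e in e_sent:
        for f in f_sent:
            t[(f, e)] = 1.0 / len(f_sent)

for _ in range(10):
    count = defaultdict(float)    # expected counts c(f, e)
    total_e = defaultdict(float)  # expected counts c(e)
    for e_sent, f_sent in corpus:
        for f in f_sent:
            # E-step: distribute one count for f over the English words in
            # proportion to the current t(f | e); Model 1 assumes all
            # alignment positions are equally likely a priori.
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total_e[e] += p
    # M-step: renormalize expected counts to get new translation probabilities.
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total_e[e]

print(round(t[("maison", "house")], 3))  # climbs well above the uniform start
```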
In addition to models and formulas discussed above, you should know the formulas for the following concepts, what they may be used for, and be able to apply them appropriately. Where relevant you should be able to discuss strengths and weaknesses of the associated method, and alternatives.
- Cosine similarity
- Pointwise mutual information
- Raw agreement rate
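For reference, the standard definitions of these three quantities (double-check the lecture slides for the exact notation used in class):

```latex
\cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \, \lVert \vec{v} \rVert}
\qquad
\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}
\qquad
\text{raw agreement} = \frac{\#\,\text{items labeled identically by both annotators}}{\#\,\text{items}}
```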