Final Exam Study Guide | ENLP Spring 2019
This page lists concepts you should be familiar with and questions you should be able to answer if you have thoroughly mastered the material in the course. It is safe to assume that if you have a good grasp of everything listed here and in the midterm study guide, you will do well on the exam. However, we cannot guarantee that nothing beyond the topics mentioned here will appear on the exam.
How to review
You should review the lecture slides, quizzes, and homework assignments. The readings should be helpful as well. If there are topics that you are not clear on from these resources, please ask on the discussion board or in office hours.
Assume the instructors will not be available to answer questions in the 48 hours preceding the exam.
Exam procedures
The exam will be completed without use of a laptop, calculator, or textbook/reference materials.
Scope of the final
Everything in the course is fair game.
In addition to this study guide, it is therefore recommended that you review the midterm topics.
The wrap-up slides from the last lecture summarize several major themes of the course.
Style of questions
The final will have a variety of question types.
Be prepared for a greater number of short-answer questions than in the midterm/quizzes. These may be broadly worded to allow flexibility in which specific representations/models/algorithms you use in your answer.
Some parts of the exam may give you a choice of questions to answer.
Structured prediction algorithms
You should understand the Viterbi, CKY, and transition-based parsing algorithms well enough to
illustrate them by hand and discuss their asymptotic complexity.
Recall that Viterbi is used for sequence taggers (the HMM and structured perceptron),
CKY is used for parsing with a CFG or PCFG,
and transition-based parsing is most typically used for dependency parsing.
(This year, we did not really talk about beam search or graph-based dependency parsing, so you will not be asked about these techniques.)
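As a refresher, here is a minimal Viterbi sketch for a bigram HMM tagger; the tag set, probability tables, and function interface below are invented for illustration, not taken from the course materials.

```python
from collections import defaultdict

def viterbi(words, tags, trans, emit, start):
    """Most probable tag sequence for `words` under a bigram HMM.
    trans[s][t] = P(t | s), emit[t][w] = P(w | t), start[t] = P(t | <s>).
    Runs in O(n * |tags|^2) time for n words."""
    n = len(words)
    best = [defaultdict(float) for _ in range(n)]   # best[i][t]: best path prob ending in t
    back = [dict() for _ in range(n)]               # backpointers

    for t in tags:                                  # initialization
        best[0][t] = start[t] * emit[t].get(words[0], 0.0)

    for i in range(1, n):                           # recursion over the trellis
        for t in tags:
            prob, prev = max(((best[i - 1][s] * trans[s][t], s) for s in tags),
                             key=lambda x: x[0])
            best[i][t] = prob * emit[t].get(words[i], 0.0)
            back[i][t] = prev

    last = max(tags, key=lambda t: best[n - 1][t])  # termination
    path = [last]
    for i in range(n - 1, 0, -1):                   # follow backpointers
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy usage (all numbers invented): determiner vs. noun tagging.
tags = ["D", "N"]
start = {"D": 0.7, "N": 0.3}
trans = {"D": {"D": 0.1, "N": 0.9}, "N": {"D": 0.4, "N": 0.6}}
emit = {"D": {"the": 0.8, "a": 0.2}, "N": {"the": 0.1, "dog": 0.5, "barks": 0.4}}
print(viterbi(["the", "dog", "barks"], tags, trans, emit, start))  # ['D', 'N', 'N']
```

For the exam, the key skills are filling in such a trellis by hand and stating the asymptotic costs: O(n·T²) for Viterbi with T tags, and O(n³·|G|) for CKY with grammar size |G|.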
Annotation
You should be able to answer questions about annotation concepts like
- Crowdsourcing: for what types of annotation can it work, and what methods are needed to ensure high quality?
- What factors can contribute to the challenges or costs of linguistic annotation?
- Inter-annotator agreement: what is it and what is it used for?
- Given a confusion matrix, you should be able to calculate raw agreement and Cohen's kappa (but you do not have to memorize the formula for kappa).
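For example, given a made-up 2×2 confusion matrix between two annotators, raw agreement and Cohen's kappa could be computed as follows:

```python
# Hypothetical confusion matrix over 100 items:
# rows = annotator A's labels, columns = annotator B's labels.
confusion = [[40, 10],   # A "yes": B said "yes" 40 times, "no" 10 times
             [ 5, 45]]   # A "no":  B said "yes"  5 times, "no" 45 times
total = sum(sum(row) for row in confusion)

# Raw (observed) agreement: fraction of items on the diagonal.
observed = sum(confusion[i][i] for i in range(len(confusion))) / total

# Chance agreement: for each label, the product of the two annotators' marginals.
expected = sum((sum(confusion[i]) / total) *                # A's marginal for label i
               (sum(row[i] for row in confusion) / total)   # B's marginal for label i
               for i in range(len(confusion)))

kappa = (observed - expected) / (1 - expected)
print(observed, expected, kappa)   # 0.85 0.5 0.7
```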
Grammars and syntax
We covered Hidden Markov Models (HMMs), Context-Free Grammars (CFGs), and Probabilistic Context-Free Grammars (PCFGs).
- You should be able to explain the linguistic and computational pros and cons of modeling language with n-grams/sequences vs. hierarchical syntactic structure.
- You should be familiar with the terms language, grammar, rule/production,
parse, nonterminal, terminal, and yield in formal language theory.
- With respect to natural language syntax, you should understand
the difference between a constituency/phrase structure tree and a dependency tree.
- You should understand the relationship between grammar rules and derivations of the grammar (trees). You should be able to give examples of trees that are licensed by a grammar.
You should be able to determine whether the language expressed by a grammar is finite and whether it is recursive. Given a sentence or tree, you should be able to determine whether or not it is licensed by the grammar, and if so, which rules it uses.
- You should understand the purpose of binarization and be able to binarize a CFG without altering the language (set of sentences it licenses).
- You should be able to label a simple English sentence with its parts of speech and its constituency or dependency tree (including labels like S, NP, VP, PP for constituents and subject, object, etc. for dependencies).
- You should understand the concept of syntactic head-modifier dependencies, including the difference between functional and content heads; lexicalization, by which heads are incorporated in a phrase structure grammar; and conversion from a phrase structure tree to a dependency tree. You should be able to tell whether a dependency parse is projective or nonprojective.
An HMM is a generative model over tagged words; a PCFG is a generative model over trees (nonterminals and terminals).
As with the other generative models in this course (see midterm topics),
you should be able to describe the independence assumptions and generative process,
compute probabilities, etc.
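As a small illustration of the PCFG case, the probability of a tree is the product of the probabilities of the rules used in its derivation; the toy grammar and tree below are invented for this example.

```python
# Toy PCFG: probabilities are conditioned on the left-hand side,
# so the rules for each nonterminal sum to 1.
pcfg = {
    ('S',  ('NP', 'VP')): 1.0,
    ('NP', ('she',)):     0.6,
    ('NP', ('fish',)):    0.4,
    ('VP', ('V', 'NP')):  0.7,
    ('VP', ('V',)):       0.3,
    ('V',  ('eats',)):    1.0,
}

# The tree (S (NP she) (VP (V eats) (NP fish))) as nested tuples.
tree = ('S', ('NP', 'she'), ('VP', ('V', 'eats'), ('NP', 'fish')))

def tree_prob(t):
    """P(tree) = product of the probabilities of the rules in its derivation."""
    if isinstance(t, str):        # a terminal (word) adds no further rules
        return 1.0
    label, children = t[0], t[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

print(tree_prob(tree))   # 1.0 * 0.6 * 0.7 * 1.0 * 0.4 = 0.168
```

The HMM case is analogous: the joint probability of a tag sequence and a sentence is the product of its transition and emission probabilities.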
(You will not be probed extensively on the Chomsky Hierarchy, but you should be aware that CFGs are strictly more expressive than regular expressions/regular grammars, and computationally more expensive to parse with. Both are classes of formal grammars.)
Semantic roles
- You should be able to argue why syntactic relationships in a sentence are not the same as semantic relationships, and why some tasks could benefit from semantic relationships. E.g., give examples of syntactic structures that are ambiguous in their semantic roles, or syntactically similar sentences with different semantic roles.
- You should be able to explain the key differences between PropBank and FrameNet as semantic role resources/representations.
- Given a sentence and an inventory of roles, you should be able to label phrases with roles.
Similarity and distributional representations
For example, you should be able to:
- State and explain the distributional hypothesis.
- Give examples of how similarity between word types or documents can be useful.
- Given a small corpus, construct a distributional word vector using neighboring word counts or PMI scores (a small sketch appears at the end of this section).
- Compute the cosine similarity between two vectors.
- Explain the effect of window size on the nature of the distributional word representations (vectors/clusters) that are induced.
- Explain the value of dimensionality reduction techniques for obtaining word or document representations.
(This year we did not cover Brown clustering, so this will not be on the test.)
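The following sketch ties the corpus-to-vector and cosine items together; the two-sentence corpus, the window size of 1, and the positive-PMI weighting are assumptions chosen purely for illustration.

```python
import math
from collections import Counter

# A made-up two-sentence corpus and a context window of 1 (immediate neighbors).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 1

word_counts = Counter()
pair_counts = Counter()   # (target word, context word) co-occurrence counts
for sent in corpus:
    toks = sent.split()
    word_counts.update(toks)
    for i, w in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                pair_counts[(w, toks[j])] += 1

total_pairs = sum(pair_counts.values())
vocab = sorted(word_counts)

def ppmi_vector(word):
    """Context vector of positive PMI values, estimated from the co-occurrence counts."""
    w_total = sum(n for (w, _), n in pair_counts.items() if w == word)
    vec = []
    for c in vocab:
        joint = pair_counts[(word, c)]
        if joint == 0:
            vec.append(0.0)
            continue
        c_total = sum(n for (_, ctx), n in pair_counts.items() if ctx == c)
        vec.append(max(0.0, math.log(joint * total_pairs / (w_total * c_total))))
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# "cat" and "dog" occur in identical contexts here ("the __ sat"), so cosine = 1.0.
print(cosine(ppmi_vector("cat"), ppmi_vector("dog")))
```

With a window of 1 the vectors mainly capture local, more syntactic behavior; widening the window makes them reflect broader topical similarity, which is the point of the window-size item above.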
Neural networks
You should be familiar with
- Properties that distinguish neural networks from other supervised learning techniques in the class (tradeoffs such as representational power, training time and data required, interpretability)
- Perceptron units (neurons)
- Feed-forward networks
- The concepts of loss function, activation function, and the backpropagation algorithm (we will not ask you to implement it; a minimal illustrative sketch appears at the end of this section)
- RNNs, especially LSTMs and BiLSTMs: what they are used for in NLP and why they can be more powerful than linear sequence models
- Sequence-to-sequence (seq2seq a.k.a. encoder-decoder) models and attention
- The Transformer
- How pretrained word embeddings can be learned and plugged into neural models, possibly with fine-tuning
- The main differences between type-based embeddings (such as word2vec, GloVe) and contextualized embeddings (such as ELMo, BERT)
- Hyperparameters/training regimes you encountered in your LSTM homework, such as dropout, batch size, early stopping
- Vector space terminology like 1-hot, dense, tensor
- Why GPUs/TPUs are useful when working with neural networks
You will not be asked about
- specific NN software libraries
- the inner workings of complicated neural architectures like an LSTM or Transformer
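Purely for intuition (nothing here needs to be reproduced on the exam), here is a minimal feed-forward binary classifier in plain NumPy showing an activation function, a loss function, hand-written backpropagation, and a gradient-descent update; the toy data, layer sizes, and learning rate are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 toy examples, 3 features each
y = np.array([0., 1., 1., 0.])       # binary labels

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # hidden layer of 5 units
W2, b2 = rng.normal(size=(5,)), 0.0              # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for step in range(100):
    # Forward pass: affine -> activation (tanh) -> affine -> sigmoid.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Cross-entropy loss: the quantity training tries to minimize.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if step % 25 == 0:
        print("step", step, "loss", float(loss))   # loss should decrease

    # Backward pass: backpropagation is just the chain rule, done here by hand.
    dlogit = (p - y) / len(y)
    dW2 = h.T @ dlogit
    db2 = dlogit.sum()
    dh = np.outer(dlogit, W2) * (1 - h ** 2)       # gradient through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```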
Applications and other topics
- Machine translation, especially
- how the Noisy Channel model can be used for statistical MT (a small scoring sketch appears after this list)
- Statistical MT subtasks: word alignment, translation model training, language model training, decoding
- for word alignment, the (generative) IBM Models 1 & 2: what independence assumptions they make, etc.
- the challenge of evaluation; BLEU score
- Unsupervised learning: You should be able to give examples of unsupervised learning tasks/algorithms we have discussed in class.
- The EM algorithm: how it iterates between (hard or soft) prediction and parameter estimation; examples of models it can be used for.
- Examples of how different forms of context matter in full-fledged language understanding and NLP,
including coreference resolution, pragmatics, multimodality, and social/societal impact.
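To make the noisy channel item above concrete: decoding chooses the target sentence e that maximizes P(e) · P(f | e), a language-model score times a translation-model score. The candidate sentences and probabilities below are invented for the example.

```python
import math

# Noisy channel for MT: choose e maximizing P(e) * P(f | e).
# All numbers below are made up for illustration.
language_model = {           # P(e): fluency of the English candidate
    "the house is small": 0.020,
    "the home is small":  0.008,
    "small the house is": 0.001,
}
translation_model = {        # P(f | e): adequacy w.r.t. the source sentence f
    "the house is small": 0.30,
    "the home is small":  0.40,
    "small the house is": 0.35,
}

def noisy_channel_score(e):
    # Work in log space to avoid underflow on longer sentences.
    return math.log(language_model[e]) + math.log(translation_model[e])

candidates = list(language_model)
print(max(candidates, key=noisy_channel_score))   # "the house is small"
```

The language model rewards fluency and the translation model rewards adequacy; the decoder's real job is searching the enormous space of candidate translations, which the toy candidate list above sidesteps.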
In addition to the models and formulas discussed above, you should know the formulas for the following concepts and what they may be used for, and you should be able to apply them appropriately. Where relevant you should be able to discuss strengths and weaknesses of the associated method, and alternatives.
- Cosine similarity
- Raw agreement rate
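For reference, the standard definitions (u and v are vectors; n_ij is the confusion-matrix count of items labeled i by one annotator and j by the other, out of N items in total):

$$\mathrm{cosine}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} \qquad\qquad \text{raw agreement} = \frac{\sum_i n_{ii}}{N}$$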