Midterm Exam Study Guide | ANLP Fall 2017

This page provides a list of concepts you should be familiar with and questions you should be able to answer if you are thoroughly familiar with the material in the course. It is safe to assume that if you have a good grasp of everything listed here, you will do well on the exam. However, we cannot guarantee that only the topics mentioned here, and nothing else, will appear on the exam.

How to review

You should review the lecture slides, quizzes, and homework assignments. The readings should be helpful as well. If there are topics that you are not clear on from these resources, please ask on the discussion board, in office hours, or in the review session.

Exam procedures

The exam will be completed without use of a laptop, calculator, or textbook/reference materials.

Scope of the midterm

Everything in the course up through the perceptron is fair game. Part-of-speech tagging and HMMs will not be covered in the midterm.

Text Processing

You should be able to write and interpret Python-style regular expressions with the following components:

string delimiters: ^ $
optionality/repetition operators: | ? * + {3,5}
the . operator (any character)
character classes: e.g. [xyz], [^a-z0-9] and the abbreviations \w \W \s \S \d \D
groups: e.g. ([a-z][0-9])+
backslash-escaping for metacharacters

You should be familiar with basic Python functionality, esp. involving strings and data structures of the types: list, tuple, dict, Counter.

You will not be asked to write Python code from scratch, but you may be asked to choose which of several commands performs the desired function, for example.

You should be familiar with the file formats: TSV, JSON

You should be familiar with the concept of version control and its benefits. We will not test you on specific version control systems or commands.

Finite-state methods

We have discussed:

Finite-state automaton
Finite-state transducer

For these, you should be able to:

Given an English description of a string language, express it in an FSA or vice versa
Write a regular expression for the same string language as an FSA, or vice versa
Describe the relationship between regular expressions/regular languages and finite-state automata
Given a simple example of a spelling or morphological phenomenon in a natural language, express valid strings in an FSA
Given a diagram of an FSA, determine what strings it matches and what strings it excludes
Given a diagram of an FST and an input string, determine the output(s)
Explain the acceptor/recognizer and generator views of the FSA, and the recognizer, translator, generator, and set relator views of the FST
Explain the operations and perform them manually on small examples: concatenation, composition, intersection
Define: finite language, infinite language; finite-state; deterministic, nondeterministic; epsilon
State advantages and disadvantages for NLP of representing a language's vocabulary as: a lexicon; an FSA; an FST
Explain how (unweighted) FSAs/FSTs handle ambiguity

The midterm will not cover FOMA notation, nor will it cover weighted or probabilistic FSAs/FSTs.

Similarity & distance

Describe an application where semantic similarity between words is useful.
Describe an application where string similarity between words is useful.
Explain the intuitions behind (i) WordNet path similarity/distance, and (ii) Wu-Palmer similarity (you do not need to memorize the formulas).
Understand and be able to apply the edit distance algorithm between strings, including
- the 3 standard cost parameters
- Levenshtein distance
- recursive definition of minimum cost
- filling out the cost matrix (chart)
- sequence of edit operations/alignment implied by minimum cost path
- why the algorithm is an example of dynamic programming (and why a brute force solution is problematic)
- worst case space and time complexity

Generative probabilistic models

We have discussed the following generative probabilistic models:

Naïve Bayes classifier
N-gram Language Model

For each of these, you should be able to

describe the imagined process by which data is generated, and say what independence assumptions are made (includes familiarity with terms like “Markov assumption” and “trigram model”).
write down the associated formula for the joint probability of hidden and observed variables (or just the observed variables if there are no hidden variables).
compute the probability of (say) a word sequence, document, or whatever the model describes (assuming you know the model parameters).
for the naïve Bayes model, compute the most probable class for a particular input, hand-simulating any algorithms that might be needed (again assuming you know the model parameters).
explain how the model is trained.
give examples of tasks the model could be applied to, and how it would be applied.
say what the model can and cannot capture about natural language, ideally giving examples of its failure modes.

Discriminative classification models

We have covered:

binary & multiclass Perceptron

For this model, you should be able to

give examples of tasks the model could be applied to, and how it would apply (e.g., what features might be useful).
explain at a high level what training the model aims to achieve, and how it differs from training a generative model.
identify and explain techniques to avoid overfitting.
discuss the pros and cons of discriminative classifiers vs. Naïve Bayes.
explain why optimization is required for learning, unlike with generative models.
write and explain the decoding (classification) rule for the perceptron given a vector of weights and a feature function.
walk through the Perceptron learning algorithm for an example.
explain how, for the Perceptron, decoding is embedded as a step within learning.

Other formulas

In addition to the equations for the generative and discriminative models listed above, you should know the formulas for the following concepts, what they may be used for, and be able to apply them appropriately. Where relevant you should be able to discuss strengths and weaknesses of the associated method, and alternatives.

Bayes' Rule (also: definition of Condition Probability, law of Total Probability aka Sum Rule, Chain Rule, relationship to Joint Probability/marginalization, and all other relevant formulas in the Basic Probability Theory reading)
Add-One / Add-Alpha Smoothing
Dot product
Precision, recall, and F1-score

Additional Mathematical and Computational Concepts

Overarching concepts:

Zipf's Law and sparse data: What is Zipf's law and what are its implications? What does "sparse data" refer to? Be able to discuss these with respect to specific tasks.
Probability estimation and smoothing: What are different methods for estimating probabilities from corpus data, and what are the pros and cons of each, and the characteristic errors? Under what circumstances might you find simpler methods acceptable, or unacceptable? You should be familiar at a high level at least with:
- Maximum Likelihood Estimation
- Add-One / Add-Alpha Smoothing
- Entropy and Cross-Entropy/Perplexity
Except as noted under "Formulas" above, you do not need to memorize the formulas, but should understand the conceptual differences and motivation behind each method, and should be able to use the formulas if they are given to you.
Training, development, and test sets: How are these used and for what reason? Be able to explain their application to particular problems.
Cross-validation
The distinction between parameters and hyperparameters
The distinction between models and algorithms

Linguistic and Representational Concepts

You should be able to explain each of these concepts, give one or two examples where appropriate, and be able to identify examples if given to you. You should be able to say what NLP tasks these are relevant to and why.

Ambiguity (of many varieties, w.r.t. all tasks we've discussed)
Part-of-Speech
- especially, the terms: common noun, proper noun, pronoun, adjective, adverb, auxiliary, main verb, copula, preposition, conjunction, punctuation
- open-class vs. closed-class
Word Senses and relations between them (synonym, antonym, hypernym, hyponym, similarity; homonymy vs. polysemy)
Word order typology: SVO, VSO, etc.
Dialect vs. language
Phonetics, phonology, lexicon, morphology, syntax, semantics, pragmatics, orthography
Synthetic vs. analytic language
Inflectional vs. derivational morphology
consonant, vowel, tone, syllable, prosody, stress
morpheme, affix, prefix, suffix, compound
International Phonetic Alphabet (what it is—not how to use it), Unicode
Language families, e.g. Indo-European, Romance, Germanic, Slavic, Sino-Tibetan, Semitic
Salient aspects of languages presented thus far (questions will be about groups of languages, so if you missed one or two presentations that shouldn’t put you at a disadvantage)

Also, you should be able to give an analysis of a phrase or sentence using the following formalisms. Assume that either the example will be very simple and/or some set of labels is provided for you to use. (i.e. you should know some standard categories for English but you don't need to memorize details of specific tagsets etc.)

label parts of speech
label word senses given dictionary definitions
label named entities given a list of classes
label the sentiment of a document as well as some positive and negative cue words

Tasks

You should be able to explain each of these tasks, give one or two examples where appropriate, and discuss cases of ambiguity or what makes the task difficult. In most cases you should be able to say what algorithm(s) or general method(s) can be used to solve the task, and what evaluation method(s) are typically used.

Tokenization
Spelling correction
Language modeling
Text categorization
Word sense disambiguation
Named entity recognition
Sentiment analysis

Corpora, Resources, and Evaluation

You should be able to describe what linguistic information is captured in each of the following resources, and how it might be used in an NLP system.

Penn Treebank
WordNet, synset
SemCor

For each of the following evaluation measures, you should be able to explain what it measures, what tasks it would be appropriate for, and why.

Cross-entropy/Perplexity
Accuracy
Precision, recall, and F1-score

In addition:

Intrinsic vs. extrinsic evaluation: be able to explain the difference and give examples of each for particular tasks.
Gold standard: what is it and what is it used for?
Confusion matrix: what is it and what is it used for?