Midterm Exam Study Guide | ANLP Fall 2017

This page provides a list of concepts you should be familiar with and questions you should be able to answer if you are thoroughly familiar with the material in the course. It is safe to assume that if you have a good grasp of everything listed here, you will do well on the exam. However, we cannot guarantee that only the topics mentioned here, and nothing else, will appear on the exam.

How to review

You should review the lecture slides, quizzes, and homework assignments. The readings should be helpful as well. If there are topics that you are not clear on from these resources, please ask on the discussion board, in office hours, or in the review session.

Exam procedures

The exam is closed-book: no laptops, calculators, textbooks, or other reference materials may be used.

Scope of the midterm

Everything in the course up through the perceptron is fair game. Part-of-speech tagging and HMMs will not be covered on the midterm.

Text Processing

You should be able to write and interpret Python-style regular expressions with the following components:

You should be familiar with basic Python functionality, especially string operations and the data structures list, tuple, dict, and Counter.
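As a quick refresher on these types, here is a minimal sketch. The sentence and variable names are made up for illustration and do not come from the course materials.

```python
from collections import Counter

text = "the cat sat on the mat"

# str.split() with no arguments splits on whitespace and returns a list.
tokens = text.split()

# A tuple is an immutable sequence; here, the first bigram of the sentence.
first_bigram = (tokens[0], tokens[1])

# A Counter is a dict subclass that maps each item to its frequency.
counts = Counter(tokens)

# A plain dict, built with a comprehension: word type -> length in characters.
lengths = {word: len(word) for word in counts}

# Unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError.
unseen = counts["dog"]

# most_common(n) returns the n most frequent items as (item, count) tuples.
top = counts.most_common(1)
```

The kind of distinction worth knowing here is that counts["dog"] evaluates to 0, while lengths["dog"] raises a KeyError.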

You will not be asked to write Python code from scratch, but you may be asked to choose which of several commands performs the desired function, for example.

You should be familiar with the TSV and JSON file formats.
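To make the two formats concrete, the sketch below parses a small TSV string and round-trips a record through JSON using only the standard library. The data, the column names, and the "DT" tag value are hypothetical examples, not course data.

```python
import csv
import io
import json

# TSV: one record per line, fields separated by tab characters.
# The first line here is a header row naming the columns.
tsv_text = "word\tcount\nthe\t2\ncat\t1\n"

# csv.DictReader with delimiter="\t" reads each row as a mapping keyed by the header.
# io.StringIO lets us treat the string as if it were an open file.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Every TSV field is read as a string, so numbers must be converted explicitly.
total = sum(int(row["count"]) for row in rows)

# JSON supports nesting (objects and arrays) and keeps numbers distinct from strings.
record = {"word": "the", "count": 2, "tags": ["DT"]}
json_text = json.dumps(record)
restored = json.loads(json_text)
```

The contrast to notice is that TSV is flat and untyped (every field comes back as a string), whereas JSON preserves nested structure and basic types across a dump/load round trip.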

You should be familiar with the concept of version control and its benefits. We will not test you on specific version control systems or commands.

Finite-state methods

We have discussed:

For these, you should be able to:

The midterm will not cover FOMA notation, nor will it cover weighted or probabilistic FSAs/FSTs.

Similarity & distance

Generative probabilistic models

We have discussed the following generative probabilistic models:

For each of these, you should be able to

Discriminative classification models

We have covered:

For this model, you should be able to

Other formulas

In addition to the equations for the generative and discriminative models listed above, you should know the formulas for the following concepts, what they may be used for, and be able to apply them appropriately. Where relevant you should be able to discuss strengths and weaknesses of the associated method, and alternatives.

Additional Mathematical and Computational Concepts

Overarching concepts:

Linguistic and Representational Concepts

You should be able to explain each of these concepts, give one or two examples where appropriate, and be able to identify examples if given to you. You should be able to say what NLP tasks these are relevant to and why.

Also, you should be able to give an analysis of a phrase or sentence using the following formalisms. Assume that the example will be very simple, that a set of labels will be provided for you to use, or both. (That is, you should know some standard categories for English, but you don't need to memorize the details of specific tagsets.)

Tasks

You should be able to explain each of these tasks, give one or two examples where appropriate, and discuss cases of ambiguity or what makes the task difficult. In most cases you should be able to say what algorithm(s) or general method(s) can be used to solve the task, and what evaluation method(s) are typically used.

Corpora, Resources, and Evaluation

You should be able to describe what linguistic information is captured in each of the following resources, and how it might be used in an NLP system.

For each of the following evaluation measures, you should be able to explain what it measures, what tasks it would be appropriate for, and why.

In addition: