- Summer 2017: Corpus Linguistics
(2017 Linguistic Institute, Lexington, KY, July 5–August 1).
With Amir Zeldes.
Corpus data is essential to many approaches to linguistics, including usage-based approaches to grammar, variationist sociolinguistics, and historical linguistics. Corpus building and evaluation have advanced tremendously over the past two decades, but the barriers to constructing one’s own corpus can be daunting: annotation interfaces are difficult to learn, Natural Language Processing tools can be highly complex to work with, and handling data requires more than basic computer skills. In this hands-on course we will learn to apply corpus methods to a dataset created during the course itself, focusing on the growing and challenging domain of social media. We will learn practical annotation schemes and consider how design choices impact our subsequent evaluation as we build and explore a small example corpus together.
- Fall 2017: LING-272/COSC-272: Algorithms for Natural Language Processing.
An introduction to NLP for undergraduates who are experienced programmers.
Human language technologies increasingly help us to communicate with computers and with each other. But every human language is extraordinarily complex, and the diversity seen in languages of the world is massive. Natural language processing (NLP) seeks to formalize and unpack different aspects of a language so computers can approximate human-like language abilities. In this course, we will examine the building blocks that underlie a human language such as English (or Japanese, Arabic, Tamil, or Navajo), and fundamental algorithms for analyzing those building blocks in text data, with an emphasis on the structure and meaning of words and sentences. Students will implement a variety of core algorithms for both rule-based and machine learning methods, and learn how to use computational linguistic datasets such as lexicons and treebanks. Text processing applications such as machine translation, information retrieval, and dialogue systems will be introduced as well.
This course is designed for undergraduates who are comfortable with the basics of discrete probability and possess solid programming skills, including the ability to use basic data structures and familiarity with regular expressions. COSC-160: Data Structures is the prerequisite for CS students, and LING-001 is the prerequisite for Linguistics students. Students who are new to programming or need a refresher are directed to LING-362: Introduction to NLP. The languages of instruction will be English and Python.
- Spring 2017: LING-672/COSC-672: Advanced Semantic Representation.
In-depth graduate-level exploration of representations, data, and algorithms
for sentence semantics, with a focus on AMR.
This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? In Spring 2017 the focus will be on the Abstract Meaning Representation (AMR); its relationship to other representations in the literature will also be considered. Term projects will consist of (i) innovating on the representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.
- Fall 2016: LING-572/COSC-572: Empirical Methods in Natural Language Processing.
Graduate-level survey of NLP.
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
- Spring 2016: INFR09028: Foundations of Natural Language Processing (University of Edinburgh School of Informatics). With Sharon Goldwater.
These are listed on the publications page.