Current Courses

  • Fall 2018: LING-672/COSC-672: Advanced Semantic Representation. In-depth graduate-level exploration of representations, data, and algorithms for sentence semantics, with a focus on UCCA.

    This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? In Fall 2018 the focus will be on Universal Conceptual Cognitive Annotation (UCCA); its relationship to other representations in the literature will also be considered. Term projects will consist of (i) innovating on the representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.

Upcoming Courses

  • Spring 2019: LING-572/COSC-572: Empirical Methods in Natural Language Processing. Graduate-level survey of NLP.

    Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

    This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

  • Spring 2019: COSC-872: Doctoral Seminar on Natural Language Processing. Reading seminar on current NLP research.

    Details TBA.

Past Courses

  • Spring 2018: LING-572/COSC-572: Empirical Methods in Natural Language Processing. Graduate-level survey of NLP.

    Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

    This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

  • Spring 2018: LING-485: Cognitive Grammar. Cognitive-functionalist accounts of how grammar structures meaning. With Lourdes Ortega.

    The work of cognitive linguists falls under the general category of functional approaches to linguistic structure. Theories of grammar in Cognitive Linguistics emphasize the centrality of meaning and communicative context, and the role of cognitive processes such as memory and attention. This course will elucidate the major themes and core concepts of these theories, including Langacker’s Cognitive Grammar; categorization and prototype theory; frame semantics; Construction Grammar; metaphor and metonymy; and mental spaces and blending. Of particular emphasis will be the application of these theories to analyze sentences in terms of both form and meaning. Implications for linguistic typology, first and second language acquisition, computational linguistics, language pedagogy, and discourse and ideology will also be explored, depending on the particular interests represented in students taking the class.

  • Fall 2017: LING-272/COSC-272: Algorithms for Natural Language Processing. An introduction to NLP for undergraduates who are experienced programmers.

    Human language technologies increasingly help us to communicate with computers and with each other. But every human language is extraordinarily complex, and the diversity seen in languages of the world is massive. Natural language processing (NLP) seeks to formalize and unpack different aspects of a language so computers can approximate human-like language abilities. In this course, we will examine the building blocks that underlie a human language such as English (or Japanese, Arabic, Tamil, or Navajo), and fundamental algorithms for analyzing those building blocks in text data, with an emphasis on the structure and meaning of words and sentences. Students will implement a variety of core algorithms for both rule-based and machine learning methods, and learn how to use computational linguistic datasets such as lexicons and treebanks. Text processing applications such as machine translation, information retrieval, and dialogue systems will be introduced as well.

    This course is designed for undergraduates who are comfortable with the basics of discrete probability and possess solid programming skills, including the ability to use basic data structures and familiarity with regular expressions. COSC-160: Data Structures is the prerequisite for CS students, and LING-001 is the prerequisite for Linguistics students. Students who are new to programming or need a refresher are directed to LING-362: Introduction to NLP. The languages of instruction will be English and Python.

  • Summer 2017: Corpus Linguistics (2017 Linguistic Institute, Lexington, KY, July 5–August 1). With Amir Zeldes.

    Corpus data is essential to many approaches to linguistics, including usage-based approaches to grammar, variationist sociolinguistics, and historical linguistics. Corpus building and evaluation have advanced tremendously over the past two decades, but the barriers to constructing one’s own corpus can be daunting: annotation interfaces are difficult to learn, natural language processing tools can be highly complex to work with, and handling data requires more than basic computer skills. In this hands-on course we will learn to apply corpus methods to a dataset created during the course itself, focusing on the growing and challenging domain of social media. We will learn practical annotation schemes and consider how design choices impact our subsequent evaluation as we build and explore a small example corpus together.

  • Spring 2017: LING-672/COSC-672: Advanced Semantic Representation. In-depth graduate-level exploration of representations, data, and algorithms for sentence semantics, with a focus on AMR.

    This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? In Spring 2017 the focus will be on the Abstract Meaning Representation (AMR); its relationship to other representations in the literature will also be considered. Term projects will consist of (i) innovating on the representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.

  • Fall 2016: LING-572/COSC-572: Empirical Methods in Natural Language Processing. Graduate-level survey of NLP.

    Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

    This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

  • Spring 2016: INFR09028: Foundations of Natural Language Processing (University of Edinburgh School of Informatics). With Sharon Goldwater.

Scientific Tutorials

These are listed on the publications page.