We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more, with participation from both the Linguistics and Computer Science departments.

GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab

Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative

Related academic groups in the DC/Baltimore region: Howard NLP, JHU CLSP, UMD CLIP, George Mason NLP

News & Media

10/29/25: Georgetown Grad Student Team Wins International Natural Language Processing Challenge (Corpling lab)
6/12/25: How Georgetown Linguists, Legal Expert Scored a Win in Supreme Court ‘Ghost Guns’ Case (Kevin Tobia, Nathan Schneider, Brandon Waldon)
9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
9/10/18: #MeToo Movement on Twitter (Lisa Singh)
8/29/18: Cliches in baseball (Nathan Schneider)
1/20/18: The Coptic Scriptorium project (Amir Zeldes)
Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award for their EMNLP 2017 paper!

Mailing list: Contact Nathan Schneider to subscribe!

Upcoming talks/events

Maciej Ogrodniczuk (IPI PAN Warsaw): Linguistics, 9/6/24, 3:30 in Poulton 230
Alexis Palmer (Colorado Boulder): Linguistics, 9/20/24, 3:30 in Poulton 230
Barbara Plank (LMU Munich): CS, Thurs. 10/10/24, 1:00 in STM 414
Eugene Yang (JHU): CS, 11/1/24, 12:15 in STM 107
Kyle Mahowald (UT Austin): Linguistics, 11/1/24, 3:30 in Poulton 230
Linguistics Career Mixer, 3/19/25, 5:30 in Poulton Hall
William Schuler (OSU): Linguistics, 3/21/25, 3:30 in Poulton 230
Emily Pace: Linguistics Career Talk, 3/26/25, 3:30 in Poulton 230
Sorelle Friedler (Haverford College): CS, 4/4/25, 11:00 in room TBA
Ellie Pavlick (Brown): Linguistics, 4/4/25, ~~3:30~~ 1:30 in Poulton 230
Ziyu Yao (GMU): CS, 4/11/25, 11:30 in room TBA
Ethan Wilcox (Georgetown): Cognitive Science, 4/25/25, 1:00 in Leavey Conference Center Salon B
Sarah Mess, MD (JHU Hospital): Linguistics, 4/25/25, 3:30 in Poulton 230
Tom McCoy (Yale): Linguistics, 6/13/25, 2:00 in Poulton 230
Shane Steinert-Threlkeld (UW): Linguistics, 10/31/25, 3:30 in Poulton 230
Hal Daumé (UMD): CS Distinguished AI Talk, 1/16/26, 2:00 in Riggs Library (3rd floor Healy)
Yanjun (Jane) Qi (UVA): CS Distinguished AI Talk, ~~1/30/26~~ 2/6/26, 1:00 in Thomson Athletic Center—Nolan Hall
Idan Blank (UCLA): Linguistics, 2/13/26, 3:30 in Poulton 230
Dawson Petersen (GU): Linguistics, 2/27/26, 3:30 in Poulton 230
Mohit Bansal (UNC): CS Distinguished AI Talk, 3/13/26, 2:00 in Riggs Library (3rd floor Healy)
Paul Bennett (Spotify): CS Distinguished AI Talk, 3/20/26, 2:00 in Thomson Athletic Center—Nolan Hall
Vered Shwartz (UBC): Linguistics, 3/20/26, 3:30 in Poulton 230
Sean Trott (Rutgers–Newark): Linguistics, 3/27/26, 3:30 in Poulton 230
Nitesh Chawla (Notre Dame): CS Distinguished AI Talk, Tues. 3/31/26, 3:00 in Fisher Colloquium (4th floor Hariri)
Michael Littman (Brown): CS Distinguished AI Talk, Tues. 4/7/26, 2:00 in Fisher Colloquium (4th floor Hariri)
Sejin Paik (GU): Linguistics, 4/24/26, 3:30 in Poulton 230
Previous talks

Courses

Overview of CL course offerings
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics. Includes estimates of when each course will be offered.

COSC-285 | Data Mining

Nazli Goharian Upperclass Undergraduate

This course covers concepts and techniques in the field of data mining. This includes both supervised and unsupervised algorithms, such as naive Bayes, neural networks, decision trees, rule-based classifiers, distance-based learners, clustering, and association rule mining. Various issues in the pre-processing of the data are addressed. Text classification, social media mining, and recommender systems will also be addressed. Students learn the material by building various data mining models, using various data pre-processing techniques, performing experiments, and analyzing the results.

COSC-488 | Information Retrieval

Nazli Goharian Upperclass Undergraduate & Graduate

Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or viewpoint. The course teaches the practical yet theoretically grounded foundational and advanced algorithms needed to identify such relevant components.

The course covers information-retrieval techniques and theory, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user request and to find them fast. The course covers the architecture and components of search engines, such as the parser, index builder, and query processor. In doing this, various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations will be covered. Students learn the material by building a prototype of such a search engine. These approaches are in daily use by search and social media companies.

COSC-689 | Deep Reinforcement Learning

Grace Hui Yang Graduate

Deep reinforcement learning is an area of machine learning concerned with learning how to make optimal decisions by interacting with an environment. From the environment, an agent observes the consequences of its actions and alters its behavior to maximize the reward received in the long term. Reinforcement learning has developed strong mathematical foundations and impressive applications in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. An example is the victory of AlphaGo, developed using Monte Carlo tree search and deep neural networks, over world-class human Go players. The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. In this course, we study fundamentals, algorithms, and applications in deep reinforcement learning. Topics include Markov Decision Processes, Multi-armed Bandits, Monte Carlo Methods, Temporal Difference Learning, Function Approximation, Deep Neural Networks, Actor-Critic, Deep Q-Learning, Policy Gradient Methods, and connections to Psychology and to Neuroscience. The course has lectures, mathematical and programming assignments, and exams.

COSC-872 | Seminar in NLP

Nathan Schneider Graduate: Doctoral [2 credits]

This course will expose students to current research in natural language processing and computational linguistics. Class meetings will consist primarily of student-led reading discussions, supplemented occasionally by lectures or hands-on activities. The subtopics and reading list will be determined at the start of the semester; readings will consist of research papers, advanced tutorials, and/or dissertations.

Requirements: Familiarity with NLP using machine learning methods (for example satisfied by COSC-572, Empirical Methods in NLP)

LING-362 | Introduction to Natural Language Processing

Amir Zeldes Upperclass Undergraduate & Graduate

This course will introduce students to the basics of Natural Language Processing (NLP), a field which combines insights from linguistics and computer science to produce applications such as machine translation, information retrieval, and spell checking. We will cover a range of topics that will help students understand how current NLP technology works and will provide students with a platform for future study and research. We will learn to implement simple representations such as finite-state techniques, n-gram models and basic parsing in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the course of the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.

LING-367 | Computational Corpus Linguistics

Amir Zeldes Upperclass Undergraduate & Graduate

Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/

LING-461 | Signal Processing

Corey Miller Upperclass Undergraduate & Graduate

How do things like Amazon Echo and Siri work? What kinds of linguistics went into them and how could they be made better? In order to explore these questions, this course will survey speech technology from a computational linguistic perspective. Both speech recognition, also known as speech-to-text (STT), and speech synthesis, also known as text-to-speech (TTS), will be investigated along with related technologies like speaker/dialect/accent/language identification. While communicating the basic algorithms employed by these technologies, the course will emphasize hands-on and project work to allow you to work with web-based and open source tools to build your own components, evaluate existing products and explore linguistic questions. Students from a variety of backgrounds are encouraged to take this course. Helpful background includes: natural language processing, phonetics, phonology and sociolinguistics. While not required, helpful technical background includes familiarity with speech analysis software such as PRAAT, Linux, shell scripting and coding/scripting in languages like Python, Java, C++, etc.

ANLY-580 | NLP for Data Analytics

Chris Larson Graduate

This course will cover the major techniques for mining and analyzing textual data to extract interesting patterns, discover knowledge, and support decision-making. In this course, the students will learn the main concepts and algorithms in Natural Language Processing and their applications in data science. These include search and information retrieval, document clustering and classification, topic modeling, sentiment analysis, and deriving meaning from unstructured narratives. In addition to traditional techniques in machine learning such as regression, decision trees, and Naive Bayes algorithms, the course will also examine the latest approaches in Deep Learning. The students will be given the opportunity to develop hands-on experience in building foundational tools and machine learning algorithms that can be applied to real analytics problems. The data obtained from textual content can be used to augment numerical data for the purposes of building predictive models, identifying emerging issues, detecting opinion, and determining important relationships.

ANLY-590 | Neural Nets and Deep Learning

James Hickman Graduate

This course will explore the fundamentals of artificial neural networks (ANNs) and deep learning. The following topics will be covered: feed-forward ANNs, activation functions, output transfer functions for regression and classification, cost functions and related likelihood functions, backpropagation and optimization (including stochastic gradient descent and conjugate gradient), auto-encoders for manifold learning and dimensionality reduction, convolutional neural networks, and recurrent neural networks. Overfitting and regularization will be discussed from both theoretical and practical viewpoints. Concepts and techniques will be applied to several domains including image processing, time series analysis, natural language processing, and more. Students will gain mastery of popular deep learning frameworks in the Python ecosystem including TensorFlow and Keras.

COSC-270 | Artificial Intelligence

Mark Maloof Undergraduate

Artificial Intelligence (AI) is the branch of computer science that studies how to program computers to reason, learn, see, and understand. The lecture portion of this class surveys basic and advanced concepts and techniques of artificial intelligence, including search, knowledge representation, automated reasoning, uncertain reasoning, and machine learning. Additional topics include the Lisp programming language, theorem proving, game playing, rule-based systems, and philosophical issues. Applications of artificial intelligence will also be discussed and will include domains such as medicine, computer security, and face detection. Students must complete midterm and final exams, and five projects using the Lisp programming language.

COSC/LING-572 | Empirical Methods in Natural Language Processing

Nathan Schneider Graduate

Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

COSC-586 | Text Mining & Analysis

Nazli Goharian Graduate

This course covers various aspects and research areas in text mining and analysis. Text may be a document, query, blog, tag description, etc. The course combines lectures and student presentations. The lectures will cover text/web/query classification, information extraction, word sense disambiguation, opinion mining and sentiment analysis, query log analysis, ontology extraction and integration, and more. Each student is assigned a related topic in the field for further study and presentation in class.

COSC/LING-672 | Advanced Semantic Representation

Nathan Schneider Graduate

Natural language is an imperfect vehicle for meaning. On the one hand, some expressions can be interpreted in multiple ways; on the other hand, there are often many superficially divergent ways to express very similar meanings. Semantic representations attempt to disentangle these two effects by exposing similarities and differences in how a word or sentence is interpreted. Such representations, and algorithms for working with them, constitute a major research area in natural language processing.

This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? Representations covered in depth will include FrameNet (http://framenet.icsi.berkeley.edu), Universal Cognitive Conceptual Annotation (http://www.cs.huji.ac.il/~oabend/ucca.html), and Abstract Meaning Representation (http://amr.isi.edu/). Term projects will consist of (i) innovating on a representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.

COSC-578 | Statistical Machine Learning

Grace Hui Yang Graduate

Statistical machine learning brings together statistics and computational sciences such as computer science, system science, and optimization. Recent developments in bioinformatics, signal processing, information management, finance, and artificial intelligence have been largely influenced by statistical machine learning. With a focus on mathematical and algorithmic theories, this class covers the basics of statistical methodology for dealing with applied problems in science and technology. Topics covered in the class include probability, mathematical statistics, inference, sampling, optimization, and their applications in machine learning. The class will have lectures, mathematical homework, exams, and a programming-based project.

COSC-688 | Experimental Artificial Intelligence (AI)

Grace Hui Yang Graduate

This course offers students an in-depth understanding of, and hands-on experience with, practical AI systems for state-of-the-art evaluation campaigns. It includes seminar-style classroom presentations and a significant project component. Students will be guided through the design and implementation of AI systems in different domains. The course will review recent AI and Machine Learning publications and lead students to work in small groups to build systems. Students are expected to have strong programming skills and previous experience in machine learning, deep learning, and/or AI.

LING-462/COSC-482 | Machine Translation

Achim Ruopp Upperclass Undergraduate & Graduate

More than 60 years ago, Georgetown was one of the founding institutions of machine translation (MT) research – using computers to translate from one human language to another. Since then we have seen an evolution from rule-based approaches and statistical approaches to the now dominant deep learning approach, with a slow but steady increase in translation quality.

In this course we explore data-driven approaches to MT. We lay the foundations of the course with corpus preparation, statistical MT (SMT), and evaluating system output with automatic and human evaluation methods. Since the mid-2010s, deep learning-based neural machine translation (NMT) has become the dominant technique, with superior translation quality. We learn about Transformers, the current state of the art in NMT, and discuss NMT’s strengths and weaknesses. We explore how to adapt systems to different domains and use cases and how to use linguistic information to improve systems.

All of this is accompanied by practical assignments in which students can demonstrate their learning.

Requirements: Basic Python programming skills are required (for example satisfied by LING-362, Intro to NLP); knowledge of a second language is advantageous, but not required.

LING-472/ANLY-521 | Computational Linguistics with Advanced Python

Elizabeth Merkhofer Upperclass Undergraduate & Graduate

This course teaches advanced topics in programming for linguistic data analysis and processing using the Python language. A series of assignments will give students hands-on practice implementing core algorithms for linguistic tasks. By the end of the course, students will be able to transform pseudocode into well-written code for algorithms that make sense of textual data, and to evaluate the algorithms quantitatively and qualitatively. Linguistic tasks will include edit distance, semantic similarity, authorship detection, and named entity recognition. Python topics will include the appropriate use of data structures; mathematical objects in numpy; exception handling; object-oriented programming; and software development practices such as code documentation and version control.

Requirements: Basic Python programming skills are required (for example satisfied by LING-362, Intro to NLP)

LING-504 | Machine Learning for Linguistics

Amir Zeldes Graduate

In the past few years, the advent of abundant computing power and data has catapulted machine learning to the forefront of a number of fields of research, including Linguistics and especially Natural Language Processing. At the same time, general machine learning toolkits and tutorials make handling ‘default cases’ relatively easy, but are much less useful in handling non-standard data, less studied languages, low-resource scenarios and the need for interpretability that is essential for drawing robust inferences from data. This course gives a broad overview of the machine learning techniques most used for text processing and linguistic research. The course is taught in Python, covering both general statistical ML algorithms, such as linear models, SVMs, decision trees and ensembles, and current deep learning models, such as deep neural net classifiers, recurrent networks and contextualized continuous meaning representations. The course assumes good command of Python (ability to implement a program from pseudo-code) but does not require previous experience with machine learning.

Requirements: Intermediate Python (courses such as LING-472: Computational Linguistics with Advanced Python provide a good preparation)

LING-765 | Computational Discourse Models

Amir Zeldes Graduate

Recent years have seen an explosion of computational work on higher level discourse representations, such as entity recognition, mention and coreference resolution and shallow discourse parsing. At the same time, the theoretical status of the underlying categories is not well understood, and despite progress, these tasks remain very much unsolved in practice. This graduate level seminar will concentrate on theoretical and practical models representing how referring expressions, such as mentions of people, things and events, are coded during language processing. We will begin by exploring the literature on human discourse processing in terms of information structure, discourse coherence and theories about anaphora, such as Centering Theory and Alternative Semantics. We will then look at computational linguistics implementations of systems for entity recognition and coreference resolution and explore their relationship with linguistic theory. Over the course of the semester, participants will implement their own coding project exploring some phenomenon within the domain of entity recognition, coreference, discourse modeling or a related area.

GUCL: Computation and Language @ Georgetown

Courses

Fall 2021

COSC-285 | Data Mining

Nazli Goharian Upperclass Undergraduate

COSC-488 | Information Retrieval

Nazli Goharian Upperclass Undergraduate & Graduate

COSC-689 | Deep Reinforcement Learning

Grace Hui Yang Graduate

COSC-872 | Seminar in NLP

Nathan Schneider Graduate: Doctoral [2 credits]

LING-362 | Introduction to Natural Language Processing

Amir Zeldes Upperclass Undergraduate & Graduate

LING-367 | Computational Corpus Linguistics

Amir Zeldes Upperclass Undergraduate & Graduate

LING-461 | Signal Processing

Corey Miller Upperclass Undergraduate & Graduate

ANLY-580 | NLP for Data Analytics

Chris Larson Graduate

ANLY-590 | Neural Nets and Deep Learning

James Hickman Graduate

Spring 2022

COSC-270 | Artificial Intelligence

Mark Maloof Undergraduate

COSC/LING-572 | Empirical Methods in Natural Language Processing

Nathan Schneider Graduate

COSC-586 | Text Mining & Analysis

Nazli Goharian Graduate

COSC/LING-672 | Advanced Semantic Representation

Nathan Schneider Graduate

COSC-578 | Statistical Machine Learning

Grace Hui Yang Graduate

COSC-688 | Experimental Artificial Intelligence (AI)

Grace Hui Yang Graduate

LING-462/COSC-482 | Machine Translation

Achim Ruopp Upperclass Undergraduate & Graduate

LING-472/ANLY-521 | Computational Linguistics with Advanced Python

Elizabeth Merkhofer Upperclass Undergraduate & Graduate

LING-504 | Machine Learning for Linguistics

Amir Zeldes Graduate

LING-765 | Computational Discourse Models

Amir Zeldes Graduate