We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more, with participation from both the Linguistics and Computer Science departments.

GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab

Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative

Related academic groups in the DC/Baltimore region: Howard NLP, JHU CLSP, UMD CLIP, George Mason NLP

News & Media

10/29/25: Georgetown Grad Student Team Wins International Natural Language Processing Challenge (Corpling lab)
6/12/25: How Georgetown Linguists, Legal Expert Scored a Win in Supreme Court ‘Ghost Guns’ Case (Kevin Tobia, Nathan Schneider, Brandon Waldon)
9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
9/10/18: #MeToo Movement on Twitter (Lisa Singh)
8/29/18: Cliches in baseball (Nathan Schneider)
1/20/18: The Coptic Scriptorium project (Amir Zeldes)
Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award for their EMNLP 2017 paper!

Mailing list: Contact Nathan Schneider to subscribe!

upcoming talks/events

Maciej Ogrodniczuk (IPI PAN Warsaw): Linguistics, 9/6/24, 3:30 in Poulton 230
Alexis Palmer (Colorado Boulder): Linguistics, 9/20/24, 3:30 in Poulton 230
Barbara Plank (LMU Munich): CS, Thurs. 10/10/24, 1:00 in STM 414
Eugene Yang (JHU): CS, 11/1/24, 12:15 in STM 107
Kyle Mahowald (UT Austin): Linguistics, 11/1/24, 3:30 in Poulton 230
Linguistics Career Mixer, 3/19/25, 5:30 in Poulton Hall
William Schuler (OSU): Linguistics, 3/21/25, 3:30 in Poulton 230
Emily Pace: Linguistics Career Talk, 3/26/25, 3:30 in Poulton 230
Sorelle Friedler (Haverford College): CS, 4/4/25, 11:00 in room TBA
Ellie Pavlick (Brown): Linguistics, 4/4/25, ~~3:30~~ 1:30 in Poulton 230
Ziyu Yao (GMU): CS, 4/11/25, 11:30 in room TBA
Ethan Wilcox (Georgetown): Cognitive Science, 4/25/25, 1:00 in Leavey Conference Center Salon B
Sarah Mess, MD (JHU Hospital): Linguistics, 4/25/25, 3:30 in Poulton 230
Tom McCoy (Yale): Linguistics, 6/13/25, 2:00 in Poulton 230
Shane Steinert-Threlkeld (UW): Linguistics, 10/31/25, 3:30 in Poulton 230
Hal Daumé (UMD): CS Distinguished AI Talk, 1/16/26, 2:00 in Riggs Library (3rd floor Healy)
Yanjun (Jane) Qi (UVA): CS Distinguished AI Talk, ~~1/30/26~~ 2/6/26, 1:00 in Thomson Athletic Center—Nolan Hall
Idan Blank (UCLA): Linguistics, 2/13/26, 3:30 in Poulton 230
Nitesh Chawla (Notre Dame): CS Distinguished AI Talk, ~~2/20/26~~ to be rescheduled, TBD
Dawson Petersen (GU): Linguistics, 2/27/26, 3:30 in Poulton 230
Mohit Bansal (UNC): CS Distinguished AI Talk, 3/13/26, 2:00 in Riggs Library (3rd floor Healy)
Paul Bennett (Spotify): CS Distinguished AI Talk, 3/20/26, time and room TBA
Vered Shwartz (UBC): Linguistics, 3/20/26, 3:30 in Poulton 230
Sean Trott (Rutgers–Newark): Linguistics, 3/27/26, 3:30 in Poulton 230
Michael Littman (Brown): CS Distinguished AI Talk, Tues. 4/7/26, time and room TBA
Sejin Paik (GU): Linguistics, 4/24/26, 3:30 in Poulton 230
Previous talks

Courses

Overview of CL course offerings
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics. Includes estimates of when each course will be offered.

COSC-270 | Artificial Intelligence

Mark Maloof Undergraduate

Artificial Intelligence (AI) is the branch of computer science that studies how to program computers to reason, learn, see, and understand. The lecture portion of this class surveys basic and advanced concepts and techniques of artificial intelligence, including search, knowledge representation, automated reasoning, uncertain reasoning, and machine learning. Additional topics include the Lisp programming language, theorem proving, game playing, rule-based systems, and philosophical issues. Applications of artificial intelligence will also be discussed and will include domains such as medicine, computer security, and face detection. Students must complete midterm and final exams, and five projects using the Lisp programming language.

COSC-488 | Information Retrieval

Nazli Goharian Upperclass Undergraduate & Graduate

Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or viewpoint. Practical yet theoretically grounded foundational and advanced algorithms needed to identify such relevant components are taught.

Information-retrieval techniques and theory are covered, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user request and to find them fast. The course covers the architecture and components of search engines, such as the parser, index builder, and query processor. In doing this, various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations will be covered. The students learn the material by building a prototype of such a search engine. These approaches are in daily use by all search and social media companies.

COSC-574 | Automated Reasoning

Mark Maloof Graduate

This graduate lecture surveys methods of automated deductive reasoning. Through traditional lectures, programming projects, paper presentations, and research projects, students learn (1) to understand the foundations of logical and probabilistic methods of automated reasoning, (2) to implement algorithms for logical and probabilistic reasoning, (3) to comprehend, analyze, and critique papers from the primary literature, (4) to replicate studies described in the primary literature, and (5) to design, conduct, and present their own studies. Topics include propositional logic, predicate logic, resolution proof, production systems, Prolog, uncertain reasoning, certainty factors, Bayesian decision theory, Bayesian networks, exact inference, approximate inference, first-order probabilistic models, probabilistic programming languages, and applications.

COSC-586 | Text Mining & Analysis

Nazli Goharian Graduate

This course covers various aspects and research areas in text mining and analysis. Text may be a document, query, blog, tag description, etc. The course combines lectures and student presentations. The lectures will cover text/Web/query classification, information extraction, word sense disambiguation, opinion mining and sentiment analysis, query log analysis, ontology extraction and integration, and more. Each student is assigned a related topic in the field for further study and presentation in class.

LING-362 | Introduction to Natural Language Processing

Amir Zeldes Upperclass Undergraduate & Graduate

This course will introduce students to the basics of Natural Language Processing (NLP), a field which combines insights from linguistics and computer science to produce applications such as machine translation, information retrieval, and spell checking. We will cover a range of topics that will help students understand how current NLP technology works and will provide students with a platform for future study and research. We will learn to implement simple representations such as finite-state techniques, n-gram models and basic parsing in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the course of the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.

LING-367 | Computational Corpus Linguistics

Amir Zeldes Upperclass Undergraduate & Graduate

Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/

LING-461 | Topics in Computational Linguistics: Signal Processing

Corey Miller Upperclass Undergraduate & Graduate

This course will survey speech processing technology from a computational linguistic perspective. Speech processing technology is a component of human language technology that focuses on the processing of audio data. The audio data can be either the input or output of speech processing. When speech serves as the input, the technology is known as speech recognition or speech-to-text (STT). When speech serves as the output, the technology is known as speech synthesis or text-to-speech (TTS). Additional technologies to be examined include spoken language identification (SLID), speaker verification and identification, and speech diarization, which is the parsing of audio data into individual speaker segments.

Particular attention will be paid to the linguistic components of speech technology. Phonetics and phonology play an important role in both TTS and STT. In addition, morphology, syntax and pragmatics are important both in authentic modeling of TTS and in constraining possible STT output. Semantics plays a role in the interpretation of STT output, which can feed into text-based natural language processing (NLP).

The algorithms underlying contemporary speech technology approaches will be discussed. Despite the focus on the linguistic aspects of the technology, it is important for students to have sufficient understanding of the algorithms used in order to grasp both where linguistics fits in and the possible constraints on its incorporation into larger systems.

The course will examine freely available TTS and STT packages so that students can build their own engines and experiment with the construction of the components. For assignments and projects, students will be encouraged to pick a language or dialect of their choice in order to build a synthesizer or recognizer for that variety. It would be most interesting to focus on languages or varieties that do not generally receive attention in commercial applications, such as African American or accented varieties of English.

Students from a variety of backgrounds are encouraged to take this course. Helpful background includes: natural language processing, phonetics, phonology and sociolinguistics. While not required, helpful technical background includes familiarity with speech analysis software such as PRAAT, Linux, shell scripting and coding/scripting in languages like Python, Java, C++, etc.

COSC-285 | Data Mining

Nazli Goharian Upperclass Undergraduate

This course covers concepts and techniques in the field of data mining. This includes both supervised and unsupervised algorithms, such as naive Bayes, neural networks, decision trees, rule-based classifiers, distance-based learners, clustering, and association rule mining. Various issues in the pre-processing of the data are addressed. Text classification, social media mining, and recommender systems will also be addressed. The students learn the material by building various data mining models, using various data pre-processing techniques, performing experiments, and analyzing the results.

COSC/LING-572 | Empirical Methods in Natural Language Processing

Nathan Schneider Graduate

Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

COSC-576 | Introduction to Deep Learning with Neural Nets

Joe Garman Graduate

Recent advances in hardware have made deep learning with neural networks practical for real-world problems. Neural networks are a powerful tool that has shown benefit in a wide range of fields. Deep learning involves creating artificial neural networks with greater layer depth, known as deep neural nets (DNNs) for short. These DNNs can find patterns in complex data, and are useful in a wide variety of situations. In numerous fields, state-of-the-art solutions have been achieved with DNNs, and DNN systems dominate head-to-head competitions. This course will introduce the student to neural networks, explain different neural network architectures, and then demonstrate the use of these neural networks on a wide array of tasks.

COSC-883 | Search and Mining of Textual Data

Nazli Goharian Graduate: Doctoral [2 credits]

In this doctoral seminar, students read, present, and discuss research papers on search and mining methodologies to process textual data of any form: short or long, general or domain specific, formal scientific text or informal social media text. Student groups are assigned projects with the aim of developing research insights.

LING-261 | Language and Computers

Emma Manning Undergraduate

Science fiction has promised us intelligent robots like C-3PO and HAL, but instead we're stuck with Siri. What happened? Why has getting computers to understand language proven so difficult?

In this course, we'll look at this question through the history of computational linguistics and natural language processing: What approaches have researchers taken over the last 60 years of computational linguistics? In what ways did those approaches succeed? In what ways did they fail?

Topics will include:

The Goals and Applications of Computational Linguistics
Pre-statistical Approaches to Natural Language Processing (NLP)
Modern, Statistical Approaches to NLP

Students will also learn:

Basic Theoretical Linguistics and Sociolinguistics
Intuitions of what's possible with NLP and computers
To critically address over-hyped claims of machine intelligence

The class will focus on the concepts behind these topics rather than implementing them, so no programming experience is required.

No prerequisites, though many concepts will overlap with those in Introduction to Language (LING-001).

LING-462/COSC-482 | Statistical Machine Translation

Achim Ruopp Upperclass Undergraduate & Graduate

More than 60 years after Machine Translation (MT) research started at Georgetown, this area of Natural Language Processing (NLP) research is more active than ever. In this course we explore the data-driven approaches to translating human language with computers that supplanted rule-based approaches in the past quarter century. First, we lay foundations for the course with statistical NLP relevant to MT and corpus preparation. Next, we start exploring statistical MT (SMT), from word-based models to phrase-based models to tree-based models. We will then cover domain adaptation, incremental learning, and how to integrate linguistic information. We will learn how to evaluate system output with automatic and human evaluation methods.

Recently, deep learning-based approaches have proven to produce superior translation quality compared to SMT. We will investigate the current state of the art in neural MT (NMT) and contrast its strengths and weaknesses with those of SMT.

Machine translation does not exist in a vacuum; it is now used to provide draft translations for human translators and is embedded in other NLP systems. With better quality, raw MT is increasingly used in written and spoken human communication. We study the adaptation of MT for the most common applications.

Requirements: Basic Python programming skills are required (satisfied, for example, by LING-362, Intro to NLP)

LING-469 | Analyzing language data with R

Amir Zeldes Upperclass Undergraduate & Graduate

This course will teach statistical analysis of language data with a focus on corpus materials, using the freely available statistics software 'R'. The course will begin with foundational notions and methods for statistical evaluation, hypothesis testing and visualization of linguistic data which are necessary for both the practice and the understanding of current quantitative research. As we progress we will learn exploratory methods to chart out meaningful structures in language data, such as agglomerative clustering, principal component analysis and multifactorial regression analysis. The course assumes basic mathematical skills and familiarity with linguistic methodology, but does not require a background in statistics or R.

LING-472/ANLY-521 | Computational Linguistics with Advanced Python

Elizabeth Merkhofer Upperclass Undergraduate & Graduate

This course teaches advanced topics in programming for linguistic data analysis and processing using the Python language. A series of assignments will give students hands-on practice implementing core algorithms for linguistic tasks. By the end of the course, students will be able to transform pseudocode into well-written code for algorithms that make sense of textual data, and to evaluate the algorithms quantitatively and qualitatively. Linguistic tasks will include edit distance, semantic similarity, authorship detection, and named entity recognition. Python topics will include the appropriate use of data structures; mathematical objects in numpy; exception handling; object-oriented programming; and software development practices such as code documentation and version control.

Requirements: Basic Python programming skills are required (satisfied, for example, by LING-362, Intro to NLP)

LING-504 | Machine Learning for Linguistics

Amir Zeldes Graduate

In the past few years, the advent of abundant computing power and data has catapulted machine learning to the forefront of a number of fields of research, including Linguistics and especially Natural Language Processing. At the same time, general machine learning toolkits and tutorials make handling ‘default cases’ relatively easy, but are much less useful in handling non-standard data, less studied languages, low-resource scenarios and the need for interpretability that is essential for drawing robust inferences from data. This course gives a broad overview of the machine learning techniques most used for text processing and linguistic research. The course is taught in Python, covering both general statistical ML algorithms, such as linear models, SVMs, decision trees and ensembles, and current deep learning models, such as deep neural net classifiers, recurrent networks and contextualized continuous meaning representations. The course assumes good command of Python (ability to implement a program from pseudo-code) but does not require previous experience with machine learning.

Requirements: Intermediate Python (courses such as LING-472: Computational Linguistics with Advanced Python provide good preparation)

GUCL: Computation and Language @ Georgetown

Courses

Fall 2019

COSC-270 | Artificial Intelligence

Mark Maloof Undergraduate

COSC-488 | Information Retrieval

Nazli Goharian Upperclass Undergraduate & Graduate

COSC-574 | Automated Reasoning

Mark Maloof Graduate

COSC-586 | Text Mining & Analysis

Nazli Goharian Graduate

LING-362 | Introduction to Natural Language Processing

Amir Zeldes Upperclass Undergraduate & Graduate

LING-367 | Computational Corpus Linguistics

Amir Zeldes Upperclass Undergraduate & Graduate

LING-461 | Topics in Computational Linguistics: Signal Processing

Corey Miller Upperclass Undergraduate & Graduate

Spring 2020

COSC-285 | Data Mining

Nazli Goharian Upperclass Undergraduate

COSC/LING-572 | Empirical Methods in Natural Language Processing

Nathan Schneider Graduate

COSC-576 | Introduction to Deep Learning with Neural Nets

Joe Garman Graduate

COSC-883 | Search and Mining of Textual Data

Nazli Goharian Graduate: Doctoral [2 credits]

LING-261 | Language and Computers

Emma Manning Undergraduate

LING-462/COSC-482 | Statistical Machine Translation

Achim Ruopp Upperclass Undergraduate & Graduate

LING-469 | Analyzing language data with R

Amir Zeldes Upperclass Undergraduate & Graduate

LING-472/ANLY-521 | Computational Linguistics with Advanced Python

Elizabeth Merkhofer Upperclass Undergraduate & Graduate

LING-504 | Machine Learning for Linguistics

Amir Zeldes Graduate