We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more, with participation from both the Linguistics and Computer Science departments.

GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab

Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative

Related academic groups in the DC/Baltimore region: Howard NLP, JHU CLSP, UMD CLIP, George Mason NLP

News & Media

6/12/25: How Georgetown Linguists, Legal Expert Scored a Win in Supreme Court ‘Ghost Guns’ Case (Kevin Tobia, Nathan Schneider, Brandon Waldon)
9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
9/10/18: #MeToo Movement on Twitter (Lisa Singh)
8/29/18: Cliches in baseball (Nathan Schneider)
1/20/18: The Coptic Scriptorium project (Amir Zeldes)
Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award at EMNLP 2017!
Congratulations to Ophir Frieder, who has been named to the European Academy of Sciences and Arts (EASA)!
9/19/16: "Email" Dominates What Americans Have Heard About Clinton (Lisa Singh)
7/12/16: Searching Harsh Environments (Ophir Frieder)

Mailing list: Contact Nathan Schneider to subscribe!

upcoming talks/events

Maciej Ogrodniczuk (IPI PAN Warsaw): Linguistics, 9/6/24, 3:30 in Poulton 230
Alexis Palmer (Colorado Boulder): Linguistics, 9/20/24, 3:30 in Poulton 230
Barbara Plank (LMU Munich): CS, Thurs. 10/10/24, 1:00 in STM 414
Eugene Yang (JHU): CS, 11/1/24, 12:15 in STM 107
Kyle Mahowald (UT Austin): Linguistics, 11/1/24, 3:30 in Poulton 230
Linguistics Career Mixer, 3/19/25, 5:30 in Poulton Hall
William Schuler (OSU): Linguistics, 3/21/25, 3:30 in Poulton 230
Emily Pace: Linguistics Career Talk, 3/26/25, 3:30 in Poulton 230
Sorelle Friedler (Haverford College): CS, 4/4/25, 11:00 in room TBA
Ellie Pavlick (Brown): Linguistics, 4/4/25, 1:30 (changed from 3:30) in Poulton 230
Ziyu Yao (GMU): CS, 4/11/25, 11:30 in room TBA
Ethan Wilcox (Georgetown): Cognitive Science, 4/25/25, 1:00 in Leavey Conference Center Salon B
Sarah Mess, MD (JHU Hospital): Linguistics, 4/25/25, 3:30 in Poulton 230
Tom McCoy (Yale): Linguistics, 6/13/25, 2:00 in Poulton 230
Previous talks

Courses

Overview of CL course offerings (April 2025)
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics.

Fall 2025

COSC-3470 | Deep Learning

Sarah Bargal Upperclass Undergraduate

This course will focus on building state-of-the-art systems at the intersection of deep learning and computer vision. Students will be introduced to deep architectures and learning algorithms for various discriminative and generative computer vision tasks. The course will demonstrate how such tasks are the main building blocks in processing images and videos for applications such as self-driving cars, healthcare, surveillance, and human-computer interfaces.

COSC/LING-4467 | Speech & Audio Processing with Deep Neural Networks

Joe Garman Upperclass Undergraduate & Graduate

This course covers modern deep learning approaches for speech recognition, synthesis, and audio processing. Students learn PyTorch implementation of neural architectures, from foundational networks to state-of-the-art transformer models. Topics include basic text processing, audio feature extraction, automatic speech recognition, text-to-speech synthesis, and audio/music generation. The course emphasizes hands-on experience through weekly programming assignments using PyTorch. Prior programming experience in Python required; no previous signal processing or deep learning experience assumed. Designed for computational linguistics and computer science graduate students or advanced undergraduates.

COSC-5480 | Large Language Models

Grace Hui Yang Graduate

This course delves deep into the intricacies of Large Language Models (LLMs), offering students an understanding of their design, implementation, and applications. Beginning with the foundational architectures such as transformers and attention mechanisms, students will journey through the evolution from the fundamental models to contemporary marvels like GPT-3, ChatGPT, and GPT-4. The course aims to provide a comprehensive overview of the historical and current state of LLMs, equipping students with the knowledge to design, train, and fine-tune LLMs for custom applications. It will also encourage critical discussions on the ethical, societal, and technical challenges associated with LLMs.

Key topics covered in the course include:
(1) Foundations: Review of RNNs, LSTMs, Attention Mechanisms, and Transformers.
(2) Architectural Deep Dive: Behind the design of GPT-3, BERT, and other leading models.
(3) Training Paradigms: Techniques and challenges in training massive models.
(4) Applications: chatbots, content generation, recommendation systems, and beyond.
(5) Societal Impact: Ethical considerations, fairness, and bias in LLMs.
(6) Technical Challenges: Model explainability, controllability, and safety concerns.
(7) Future Directions: Where LLMs are headed and emerging research areas.

The course assessments consist of monthly assignments involving practical implementations and model evaluations, exams covering theoretical and applied concepts, and one optional final project focusing on designing a custom application utilizing LLMs. Class participation and critical discussion sessions are also important components in student assessments.

COSC-5540 | Text Mining & Analysis

Nazli Goharian Graduate

This course covers various concepts and research areas in text search and mining. The course combines lectures with student presentations. The lectures will cover various search technologies, classification, text summarization, and opinion and sentiment mining, with applications across varying domains and formats, including scientific, health, and social media. Each student is assigned a related topic in the field for further study, implementation, experimentation, and presentation in class.

COSC-8405 | Seminar in NLP

Nathan Schneider Graduate: Doctoral [2 credits]

This course will expose students to current research in natural language processing and computational linguistics. Class meetings will consist primarily of student-led reading discussions, supplemented occasionally by lectures or hands-on activities. The subtopics and reading list will be determined at the start of the semester; readings will consist of research papers, advanced tutorials, and/or dissertations.

Requirements: Familiarity with NLP using machine learning methods (for example satisfied by COSC-5402, Empirical Methods in NLP)

LING-2040/4400 | Computational Language Processing (a.k.a. Introduction to Natural Language Processing)

Amir Zeldes Undergraduate & Graduate

This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines linguistics and computer science to produce applications, such as generative AI, that are profoundly impacting our society. We will cover a range of topics that form the basis of these exciting technological advances and will provide students with a platform for future study and research in this area. We will learn to implement simple representations such as finite-state techniques, n-gram models, and topic models in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.

LING-4401/DSAN-5400 | Computational Linguistics with Advanced Python

Trevor Adriaanse Upperclass Undergraduate & Graduate

This course presents topics in Natural Language Processing (NLP) and Python programming for both text processing and analysis. The goal of this class is to explore both classical and modern techniques in NLP, with emphasis on hands-on application. We will examine topics such as text classification, model evaluation, nearest neighbors, and distributed representations. Applications include authorship identification, structured prediction, and semantic textual similarity, to name a few.

Programming topics include Python best practices, scientific computing libraries (e.g., NumPy, sklearn, etc.), exception handling, object-oriented programming, and more. By the end of this course, students will be able to program proficiently in Python, with enough comfort to reference software documentation and pseudocode to write sophisticated programs from scratch.

Requirements: Basic Python programming skills are required (for example satisfied by LING-4400, Computational Language Processing/Intro to NLP)

LING-4424 | All About Prepositions

Nathan Schneider Upperclass Undergraduate & Graduate

This course will take on the grammatical category of prepositions, which are hands-down some of the most intriguing and beguiling words once you get to know them. (How many prepositions are there in the previous sentence? The answer may surprise you!) We will look at their syntactic and semantic versatility in English and how they vary across languages. We will explore how they denote relations in space and time, as well as many other kinds of meanings. We will see why they are so hard to learn in a second language, and why they are difficult to define in dictionaries and teach to computers. The course will be project-based, including a significant project on a language other than English.

Prerequisites: Some background in syntactic description, e.g. satisfied by LING-2020, LING-4427, or LING-5127

LING-4427 | Computational Corpus Linguistics

Amir Zeldes Upperclass Undergraduate & Graduate

Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/

LING-4480 | Computational Linguistics Research Methods

Ethan Wilcox Upperclass Undergraduate & Graduate

Computational Linguistics is a fast-growing and fast-moving field. This course is intended to give advanced undergraduate and graduate students practice conducting original research in computational linguistics and to enhance their research and communication skills. It will serve as a platform for students to pursue an independent research project with guidance and oversight from faculty and peers. Students will be expected to bring their own pre-existing research topics/questions to the class. Over the course of the semester, they will select and present key research papers pertinent to their topic, and develop the project with the goal of writing an ACL-style conference proceedings paper. In addition to hands-on research, this class will provide a venue for students to learn CL-related skills that often fall through the cracks of other, content-focused courses. Possible workshop topics include data annotation, LaTeX, and working with pretrained language models, as well as communication skills such as poster and slide design. As a hands-on course whose content changes based on the instructor and students, this course can be repeated for credit.

DSAN-5800 | Advanced NLP

Chris Larson Graduate

This course provides a formalism for understanding the statistical machine learning methods that have come to dominate natural language processing. Divided into three core modules, the course explores (i) how language understanding is framed as a tractable statistical inference problem, (ii) a formal yet practical treatment of the DNN architectures and learning algorithms used in NLP, and (iii) how these components are leveraged in modern AI systems such as information retrieval, recommender systems, and conversational agents. In exploring these topics, the course exposes students to the foundational math, practical applications, current research directions, and software design that are critical to gaining proficiency as an NLP/ML practitioner. The course culminates in a capstone project, conducted over its final six weeks, in which students apply NLP to an interesting problem of their choosing. In past semesters, students have built chatbots, code completion tools, and stock trading algorithms, to name a few. This course assumes a basic understanding of linear algebra, probability theory, and first-order optimization methods, as well as proficiency in Python.

This is an advanced course. Suggested prerequisites are DSAN 5000, DSAN 5100 and DSAN 5400. However, first-year students with the necessary math, statistics, and deep learning background will be considered.

ICOS-7710 | Cognitive Science Core Course

Abigail Marsh & Elissa Newport Graduate

A seminar in which important topics in cognitive science are taught by participating Georgetown faculty from the main and medical campuses. Required for the Cognitive Science concentration, available for Ph.D. students in other programs with instructor permission. (Can be taken more than once for credit.)

Spring 2026

Additional courses TBA.

LING-2040/4400 | Computational Language Processing (a.k.a. Introduction to Natural Language Processing)

Ethan Wilcox Undergraduate & Graduate

COSC/LING-5402 | Empirical Methods in Natural Language Processing

Nathan Schneider Graduate

Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.

This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.

ICOS-7712 | Cognitive Science Seminar

Abigail Marsh & Elissa Newport Graduate

A seminar in which graduate students and faculty interested in the cognitive sciences will read and discuss prominent articles across our fields. Can be repeated for credit.

GUCL: Computational Linguistics @ Georgetown
