GUCL: Computation and Language @ Georgetown
We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more, with participation from both the Linguistics and Computer Science departments.
GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab
Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative
Related academic groups in the DC/Baltimore region: Howard NLP, JHU CLSP, UMD CLIP, George Mason NLP
- 10/29/25: Georgetown Grad Student Team Wins International Natural Language Processing Challenge (Corpling lab)
- 6/12/25: How Georgetown Linguists, Legal Expert Scored a Win in Supreme Court ‘Ghost Guns’ Case (Kevin Tobia, Nathan Schneider, Brandon Waldon)
- 9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
- 8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
- 9/10/18: #MeToo Movement on Twitter (Lisa Singh)
- 8/29/18: Cliches in baseball (Nathan Schneider)
- 1/20/18: The Coptic Scriptorium project (Amir Zeldes)
- Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award for their EMNLP 2017 paper!
Mailing list: Contact Nathan Schneider to subscribe! 
Upcoming talks/events
- Mohit Bansal (UNC): CS Distinguished AI Talk, 3/13/26, 2:00 in Riggs Library (3rd floor Healy)
- Paul Bennett (Spotify): CS Distinguished AI Talk, 3/20/26, 2:00 in Thomson Athletic Center—Nolan Hall
- Vered Shwartz (UBC): Linguistics, 3/20/26, 3:30 in Poulton 230
- Sean Trott (Rutgers–Newark): Linguistics, 3/27/26, 3:30 in Poulton 230
- Nitesh Chawla (Notre Dame): CS Distinguished AI Talk, Tues. 3/31/26, time and room TBA
- Michael Littman (Brown): CS Distinguished AI Talk, Tues. 4/7/26, time and room TBA
- Sejin Paik (GU): Linguistics, 4/24/26, 3:30 in Poulton 230
- Previous talks
Courses
Overview of CL course offerings (April 2025)
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics.
Fall 2026
COSC-4463/LING-4463 | AI, Language, and Interaction (was: Dialogue Systems)
Claire Bonial | Upperclass Undergraduate & Graduate
AI, Language, and Interaction examines how intelligent systems learn, represent, and use language, with a focus on the technical foundations of chatbots, task-oriented dialogue systems, and emerging LLM-based interactive agents. Bridging computer science and linguistics, the course explores parallels between usage-based theories such as Construction Grammar and the probabilistic linguistic patterns large language models acquire from text, while also addressing the limits of current systems in reasoning, grounding, and embodiment. Students will study how dialogue systems interpret context, manage interaction, and support real-world communication, with attention to future directions including multimodal and human-robot dialogue. The course is project-based, and final projects may take either a computational or a linguistic orientation, from building dialogue systems to conducting data-driven analyses of language and interaction. No formal prerequisites are required, but students should have prior coursework or a substantive interest in computer science, linguistics, or both.
COSC-4550 | Information Retrieval
Nazli Goharian | Upperclass Undergraduate & Graduate
Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big-data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or viewpoint. Practical, yet theoretically grounded, foundational and advanced algorithms needed to identify such relevant components are taught.
Information-retrieval techniques and theory are covered, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user's request, and to find them fast. The course covers the architecture and components of search engines, such as the parser, index builder, and query processor. Along the way, various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations are covered. Students learn the material by building a prototype of such a search engine. These approaches are in daily use by all search and social media companies.
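To give a flavor of one component named above, here is a minimal sketch of an inverted index answering a boolean query. The toy corpus and the helper `boolean_and` are illustrative inventions, not course material:

```python
from collections import defaultdict

# Toy corpus: document id -> text
docs = {
    1: "the cat sat on the mat",
    2: "the dog sat on the log",
    3: "cats and dogs",
}

# Inverted index: term -> set of ids of documents containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Return the ids of documents containing every query term."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(boolean_and("sat", "on")))  # [1, 2]
```

A real engine adds a parser, compressed postings lists, and ranked (rather than boolean) retrieval, which is exactly the architecture the course builds up.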
LING-2040/4400 | Computational Language Processing
Amir Zeldes (section 01), Ethan Wilcox (section 02) | Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines linguistics and computer science to produce applications, such as generative AI, that are profoundly impacting our society. We will cover a range of topics that form the basis of these exciting technological advances and will provide students with a platform for future study and research in this area. We will learn to implement simple representations such as finite-state techniques, n-gram models, and topic models in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
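One of the simple representations mentioned above, the n-gram model, fits in a few lines of Python. This is a toy maximum-likelihood bigram estimate over an invented corpus, purely for illustration:

```python
from collections import Counter

# Tiny whitespace-tokenized training corpus
corpus = "the cat sat on the mat . the cat ran .".split()

# Count unigrams and adjacent word pairs (bigrams)
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_bigram(w2, w1):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("cat", "the"))  # 2 of the 3 occurrences of "the" precede "cat"
```

Real models add smoothing for unseen pairs and train on far larger corpora, themes the course develops.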
LING-4401/DSAN-5400 | Computational Linguistics with Advanced Python
Trevor Adriaanse | Upperclass Undergraduate & Graduate
This course presents topics in Natural Language Processing (NLP) and Python programming for both text processing and analysis. The goal of this class is to explore both classical and modern techniques in NLP, with emphasis on hands-on application. We will examine topics such as text classification, model evaluation, nearest neighbors, and distributed representations. Applications include authorship identification, structured prediction, and semantic textual similarity, to name a few.
Programming topics include Python best practices, scientific computing libraries (e.g., NumPy, sklearn, etc.), exception handling, object-oriented programming, and more. By the end of this course, students will be able to program proficiently in Python, with enough comfort to reference software documentation and pseudocode to write sophisticated programs from scratch.
Requirements: Basic Python programming skills are required (satisfied, for example, by LING-4400, Computational Language Processing/Intro to NLP).
LING-4427 | Computational Corpus Linguistics
Amir Zeldes | Upperclass Undergraduate & Graduate
Digital linguistic corpora, i.e. electronic collections of written, spoken, or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is a theoretically grounded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora, and historical corpora. We will discuss issues of corpus design, annotation, and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from part-of-speech tagging through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/
LING-4480 | Computational Linguistics Research Methods
Nathan Schneider | Upperclass Undergraduate & Graduate
Computational Linguistics is a fast-growing and fast-moving field. This course is intended to give advanced undergraduate and graduate students practice conducting original research in computational linguistics and to enhance their research and communication skills. It will serve as a platform for students to pursue an independent research project with guidance and oversight from faculty and peers. Students will be expected to bring their own pre-existing research topics/questions to the class. Over the course of the semester, they will select and present key research papers pertinent to their topic, and develop the project with the goal of writing an ACL-style conference proceedings paper. In addition to hands-on research, this class will provide a venue for students to learn CL-related skills that often fall through the cracks of other, content-focused courses. Possible workshop topics include data annotation, LaTeX, and working with pretrained language models, as well as communication skills such as poster and slide design. As a hands-on course whose content changes based on the instructor and students, this course can be repeated for credit.
LING-8430 | Information, Structure and Language
Ethan Wilcox | Upperclass Undergraduate & Graduate
This seminar brings together two divergent perspectives on human language. On one hand, linguistics research seeks to describe the structures that underlie human communication systems, often using formal tools such as grammars and logics. On the other hand, research in computer science, in particular information theory, seeks to discover the optimal way to package and transmit information over a channel. This seminar will focus on the intersection between these two programs: To what extent are human languages optimized for efficient communication? Can structural features of human language, or human linguistic behaviors, be analyzed using the toolkit developed for efficient information exchange? Topics covered will include the structure of the lexicon, the relationship between syntactic and statistical dependencies, pragmatic inferences, as well as various language-processing phenomena. Students will gain experience reading and presenting research papers in this area and implementing concepts from information theory in code.
Prerequisite knowledge: Students should be proficient in at least one programming language (Python or R), and familiar with basic concepts of probability theory and/or machine learning.
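Two of the information-theoretic concepts central to this area, surprisal and entropy, can be implemented directly from their definitions. A minimal sketch (the example distributions are invented for illustration):

```python
import math

def surprisal(p):
    """Surprisal of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def entropy(dist):
    """Entropy of a discrete distribution: the expected surprisal."""
    return sum(p * surprisal(p) for p in dist if p > 0)

# A uniform choice among 4 words carries 2 bits of information;
# a skewed distribution over the same 4 words carries less.
print(entropy([0.25] * 4))            # 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]))  # < 2.0
```

In psycholinguistic work of the kind the seminar reads, the probabilities would come from a language model rather than being stipulated by hand.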
ICOS-7710 | Cognitive Science Core Course
Abigail Marsh & Elissa Newport | Graduate
A seminar in which important topics in cognitive science are taught by participating Georgetown faculty from the main and medical campuses. Required for the Cognitive Science concentration, available for Ph.D. students in other programs with instructor permission. (Can be taken more than once for credit.)
Spring 2027
Additional courses TBA.
LING-2040/4400 | Computational Language Processing
Ethan Wilcox | Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines linguistics and computer science to produce applications, such as generative AI, that are profoundly impacting our society. We will cover a range of topics that form the basis of these exciting technological advances and will provide students with a platform for future study and research in this area. We will learn to implement simple representations such as finite-state techniques, n-gram models, and topic models in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
COSC/LING-5402 | Empirical Methods in Natural Language Processing
Nathan Schneider | Graduate
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
COSC-6422/LING-8422 | Advanced Semantic Representation
Nathan Schneider | Graduate
Natural language is an imperfect vehicle for meaning. On the one hand, some expressions can be interpreted in multiple ways; on the other hand, there are often many superficially divergent ways to express very similar meanings. Semantic representations attempt to disentangle these two effects by exposing similarities and differences in how a word or sentence is interpreted. Such representations, and algorithms for working with them, constitute a major research area in natural language processing.
This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? Representations covered in depth will include FrameNet (http://framenet.icsi.berkeley.edu), Universal Conceptual Cognitive Annotation (http://www.cs.huji.ac.il/~oabend/ucca.html), and Abstract Meaning Representation (http://amr.isi.edu/). Term projects will consist of (i) innovating on a representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.
DSAN-5400 | Computational Linguistics with Advanced Python
Trevor Adriaanse | Graduate
This course presents topics in Natural Language Processing (NLP) and Python programming. The goal of this class is to explore techniques in NLP, with a strong emphasis on hands-on instruction that progressively matures basic Python users into expert Python developers. We will examine topics such as text classification, model evaluation, machine translation, and distributed representations. Throughout the semester, students will select and read a book on AI ethics to motivate discussions on the social impact of modern NLP technologies. Applications include authorship identification, retrieval, and textual similarity, to name a few.
About half of the total class time is devoted to addressing an essential but often neglected piece in software development education: moving from typical data science programming workflows (such as writing basic scripts) to developing sophisticated Python projects. In other words, students will learn to design professional-grade software that they and others will be proud to contribute to together. Programming topics are explored in great depth, including Python best practices, object-oriented design, project structuring, and more. This class will give students the skills they need to contribute to the professional software repositories they work with already and even develop their own.
ICOS-7712 | Cognitive Science Seminar
Abigail Marsh | Graduate
A seminar in which graduate students and faculty interested in the cognitive sciences will read and discuss prominent articles across our fields. Can be repeated for credit.