GUCL: Computational Linguistics @ Georgetown
We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more, with participation from both the Linguistics and Computer Science departments.
GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab
Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative
Related academic groups in the DC/Baltimore region: Howard NLP, JHU CLSP, UMD CLIP, George Mason NLP
- 10/24/24: Op-ed and brief in Supreme Court ghost gun case (Kevin Tobia, Nathan Schneider, Brandon Waldon)
- 9/15/23: [Job] The Linguistics department will be hiring an Assistant Professor of Computational Linguistics.
- 9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
- 8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
- 9/10/18: #MeToo Movement on Twitter (Lisa Singh)
- 8/29/18: Cliches in baseball (Nathan Schneider)
- 1/20/18: The Coptic Scriptorium project (Amir Zeldes)
- Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award at EMNLP 2017!
- Congratulations to Ophir Frieder, who has been named to the European Academy of Sciences and Arts (EASA)!
- 9/19/16: "Email" Dominates What Americans Have Heard About Clinton (Lisa Singh)
- 7/12/16: Searching Harsh Environments (Ophir Frieder)
Mailing list: Contact Nathan Schneider to subscribe!
upcoming talks/events
- Maciej Ogrodniczuk (IPI PAN Warsaw): Linguistics, 9/6/24, 3:30 in Poulton 230
- Alexis Palmer (Colorado Boulder): Linguistics, 9/20/24, 3:30 in Poulton 230
- Barbara Plank (LMU Munich): CS, Thurs. 10/10/24, 1:00 in STM 414
- Eugene Yang (JHU): CS, 11/1/24, 12:15 in STM 107
- Kyle Mahowald (UT Austin): Linguistics, 11/1/24, 3:30 in Poulton 230
- William Schuler (OSU): Linguistics, 3/21/25, 3:30 in Poulton 230
- Ellie Pavlick (Brown): Linguistics, 4/4/25, 3:30 in Poulton 230
- Previous talks
Courses
Overview of CL course offerings (note: old numbering system)
Document listing courses in CS, Linguistics, and other departments
that are most relevant to students interested in computational linguistics.
Includes estimates of when each course will be offered.
Fall 2024
COSC-4463/LING-4463 (was 463) | Dialogue Systems
Claire Bonial Upperclass Undergraduate & Graduate
Nearly all of us interact with dialogue systems -- from calling up banks and hotels, to talking with intelligent assistants like Siri, Alexa, or Cortana, dialogue systems enable people to get tasks done with software agents using language. Since the interaction is bi-directional, we must consider the fundamentals of how people engage in conversation so as to manage users’ expectations and track how information is exchanged in dialogue. Dialogue systems require an array of technologies to come together for them to work well, including speech recognition, natural language understanding, dialogue management, natural language generation, and speech synthesis. This course will explore what makes dialogue systems effective in commercial and research applications (ranging from personal assistants and chatbots to embodied conversational agents and language-directed robots) and how this contrasts with everyday human-human dialogue.
This course will introduce students to the fundamentals of dialogue systems, expanding on technologies and algorithms that are used in today's dialogue systems and chatbots. There will also be an emphasis on the psycholinguistic properties of human conversation (turn-taking, grounding) so as to prepare students for designing effective, user-friendly dialogue systems. The course will also include examining datasets and dialogue annotations used to train dialogue systems with machine learning algorithms. Coursework will consist of lectures, writing and programming assignments, and student-led presentations on special topics in dialogue. A final project will give students a chance to build their own dialogue system using open source and freely available software. This course is intended for students who are already comfortable with a limited amount of programming (in Python).
COSC-4550 (was 488) | Information Retrieval
Nazli Goharian Upperclass Undergraduate & Graduate
Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or viewpoint. The course teaches practical yet theoretically grounded foundational and advanced algorithms for identifying such relevant components.
The course covers information-retrieval techniques and theory, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user request, and to find them fast. The course covers the architecture and components of search engines, such as the parser, index builder, and query processor. Along the way, it treats various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations. Students learn the material by building a prototype of such a search engine. These approaches are in daily use by all search and social media companies.
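As an illustrative sketch only (not part of the course materials), the components named above can be miniaturized in a few lines of Python: an index builder produces an inverted index mapping terms to documents, and a query processor intersects the postings for a boolean-AND query. All names and the toy documents here are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Index builder: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # trivial "parser": whitespace tokens
            index[term].add(doc_id)
    return index

def search(index, query):
    """Query processor: return documents containing every query term (boolean AND)."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = build_index(docs)
print(search(index, "quick dog"))  # → {3}
```

A real engine adds relevance ranking (e.g. TF-IDF or BM25 scores) and compressed postings lists on top of this same skeleton.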
COSC-5450 | Foundations of Machine Learning
Grace Hui Yang Graduate
This course provides a comprehensive introduction to the core principles and methodologies of machine learning. It covers essential topics such as probability theory, common distributions, point estimation, sampling, model selection, gradient optimization, and evaluation, building a solid grasp of the theoretical and algorithmic foundations of machine learning and preparing students for further study in supervised learning, unsupervised learning, and reinforcement learning. It also covers cutting-edge topics such as shallow learning vs. deep learning, self-supervised learning, and high-dimensional learning, with a focus on the theories and principles behind the latest advances in the field. By the end of this course, students will not only grasp the fundamental concepts of machine learning but also cultivate a mindset and skill set adaptable to the field's rapidly evolving technological landscape. The class will have lectures, mathematical homework, and exams.
COSC-5455 (was 576) | Introduction to Deep Learning
Sarah Bargal Graduate
Recent advances in hardware have made deep learning with neural networks practical for real-world problems. Neural networks are powerful tools that have shown benefit in a wide range of fields. Deep learning involves creating artificial neural networks with greater layer depth, or deep neural nets (DNNs) for short. These DNNs can find patterns in complex data and are useful in a wide variety of situations. In numerous fields, state-of-the-art solutions have been achieved with DNNs, and DNN systems dominate head-to-head competitions. This course will introduce students to neural networks, explain different neural network architectures, and demonstrate the use of these networks on a wide array of tasks.
LING-4400 (was 362) | Introduction to Natural Language Processing
Ethan Wilcox Upperclass Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines linguistics and computer science to produce applications, such as generative AI, that are profoundly impacting our society. We will cover a range of topics that form the basis of these exciting technological advances and will provide students with a platform for future study and research in this area. We will learn to implement simple representations such as finite-state techniques, n-gram models, and topic models in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
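As a hedged illustration of one of the representations named above, a maximum-likelihood bigram model fits in a handful of lines of Python. This sketch is for orientation only, not course material, and the function names and toy corpus are hypothetical.

```python
from collections import Counter

def train_bigrams(tokens):
    """Count bigram and context (unigram) frequencies for MLE estimates."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])  # counts of each word as a context
    return bigrams, unigrams

def bigram_prob(bigrams, unigrams, w1, w2):
    """P(w2 | w1) by maximum likelihood (no smoothing)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

tokens = "the cat sat on the mat".split()
b, u = train_bigrams(tokens)
print(bigram_prob(b, u, "the", "cat"))  # → 0.5 ("the" is followed by "cat" 1 of 2 times)
```

Real n-gram models add smoothing (e.g. add-one or Kneser-Ney) so that unseen word pairs do not receive zero probability.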
LING-4427 (was 367) | Computational Corpus Linguistics
Amir Zeldes Upperclass Undergraduate & Graduate
Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/
LING-8430 | Information, Structure and Language
Ethan Wilcox Upperclass Undergraduate & Graduate
This seminar brings together two divergent perspectives on human language. On the one hand, linguistics research seeks to describe the structures that underlie human communication systems, often using formal tools such as grammars and logics. On the other hand, research in computer science, in particular information theory, seeks to discover the optimal way to package and transmit information over a channel. This seminar will focus on the intersection between these two programs: To what extent are human languages optimized for efficient communication? Can structural features of human language, or human linguistic behaviors, be analyzed using the toolkit developed for efficient information exchange? Topics covered will include the structure of the lexicon, the relationship between syntactic and statistical dependencies, pragmatic inferences, and various language-processing phenomena. Students will gain experience reading and presenting research papers in this area and implementing concepts from information theory in code.
Prerequisite knowledge: Students should be proficient in at least one programming language (Python or R), and familiar with basic concepts of probability theory and/or machine learning.
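For orientation, two quantities central to this literature, surprisal (the information content of an event) and Shannon entropy (expected surprisal), are straightforward to implement. This is an illustrative sketch, not course material, and the toy word distribution is invented.

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 p. Rarer events carry more information."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy: expected surprisal over a probability distribution."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Hypothetical next-word distribution, for illustration only.
dist = {"the": 0.5, "cat": 0.25, "sat": 0.25}
print(surprisal(dist["cat"]))  # → 2.0 bits
print(entropy(dist))           # → 1.5 bits
```

In psycholinguistic work, per-word surprisal under a language model is a standard predictor of human reading times, which is one concrete bridge between the two research programs the seminar describes.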
DSAN-5800 (was ANLY-580) | Advanced NLP
Chris Larson Graduate
This course provides a formalism for understanding the statistical machine learning methods that have come to dominate natural language processing. Divided into three core modules, the course explores (i) how language understanding is framed as a tractable statistical inference problem, (ii) a formal yet practical treatment of the DNN architectures and learning algorithms used in NLP, and (iii) how these components are leveraged in modern AI systems such as information retrieval, recommender systems, and conversational agents. In exploring these topics, the course exposes students to the foundational math, practical applications, current research directions, and software design that are critical to gaining proficiency as an NLP/ML practitioner. The course culminates in a capstone project, conducted over its final six weeks, in which students apply NLP to an interesting problem of their choosing. In past semesters students have built chatbots, code completion tools, and stock trading algorithms, to name a few. This course assumes a basic understanding of linear algebra, probability theory, first-order optimization methods, and proficiency in Python.
This is an advanced course. Suggested prerequisites are DSAN 5000, DSAN 5100 and DSAN 5400. However, first-year students with the necessary math, statistics, and deep learning background will be considered.
ICOS-7710 (was 710) | Cognitive Science Core Course
Abigail Marsh & Elissa Newport Graduate
A seminar in which important topics in cognitive science are taught by participating Georgetown faculty from the main and medical campuses. Required for the Cognitive Science concentration, available for Ph.D. students in other programs with instructor permission. (Can be taken more than once for credit.)
Spring 2025
COSC-3470 | Deep Learning
Sarah Bargal Upperclass Undergraduate
This course will focus on building state-of-the-art systems at the intersection of deep learning and computer vision. Students will be introduced to deep architectures and learning algorithms for various discriminative and generative computer vision tasks. The course will demonstrate how such tasks are main building blocks in processing images and videos for applications such as self-driving cars, healthcare, surveillance, and human-computer interfaces.
COSC/LING-5402 (was 572) | Empirical Methods in Natural Language Processing
Nathan Schneider Graduate
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
COSC-6422/LING-8422 (was COSC-672/LING-672) | Advanced Semantic Representation
Nathan Schneider Graduate
Natural language is an imperfect vehicle for meaning. On the one hand, some expressions can be interpreted in multiple ways; on the other hand, there are often many superficially divergent ways to express very similar meanings. Semantic representations attempt to disentangle these two effects by exposing similarities and differences in how a word or sentence is interpreted. Such representations, and algorithms for working with them, constitute a major research area in natural language processing.
This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? Representations covered in depth will include FrameNet (http://framenet.icsi.berkeley.edu), Universal Conceptual Cognitive Annotation (http://www.cs.huji.ac.il/~oabend/ucca.html), and Abstract Meaning Representation (http://amr.isi.edu/). Term projects will consist of (i) innovating on a representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.
COSC-6440 (was 689) | Deep Reinforcement Learning
Grace Hui Yang Graduate
Deep reinforcement learning is an area of machine learning that learns how to make optimal decisions from interacting with an environment. The agent observes the consequences of its actions and alters its behavior to maximize the amount of reward received in the long term. Reinforcement learning has developed strong mathematical foundations and impressive applications in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. One example is the victory of AlphaGo, developed using Monte Carlo tree search and deep neural networks, over world-class human Go players. The overall problem of learning from interaction to achieve goals is still far from solved, but our understanding of it has improved significantly. In this course, we study fundamentals, algorithms, and applications in deep reinforcement learning. Topics include Markov Decision Processes, Multi-armed Bandits, Monte Carlo Methods, Temporal Difference Learning, Function Approximation, Deep Neural Networks, Actor-Critic, Deep Q-Learning, Policy Gradient Methods, and connections to psychology and neuroscience. The course has lectures, mathematical and programming assignments, and exams.
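As an illustrative sketch of one of the listed topics (multi-armed bandits), the epsilon-greedy strategy balances exploration and exploitation in plain Python. This is not course material; the function name, reward model, and parameter values are hypothetical choices for the example.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=10000, epsilon=0.1, seed=0):
    """Estimate each arm's value online, exploring with probability epsilon."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running-average reward estimates per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit: best so far
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy reward around the true mean
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values

# Arm 2 has the highest true mean reward; its estimate should come out on top.
estimates = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print(estimates.index(max(estimates)))
```

The incremental-mean update here is the simplest instance of the value-estimation idea that temporal-difference methods and deep Q-learning generalize to sequential decision problems.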
LING-4400 (was 362) | Introduction to Natural Language Processing
Ethan Wilcox Upperclass Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines linguistics and computer science to produce applications, such as generative AI, that are profoundly impacting our society. We will cover a range of topics that form the basis of these exciting technological advances and will provide students with a platform for future study and research in this area. We will learn to implement simple representations such as finite-state techniques, n-gram models, and topic models in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
LING-4401/DSAN-5400 (was LING-472/ANLY-521) | Computational Linguistics with Advanced Python
Trevor Adriaanse Upperclass Undergraduate & Graduate
This course presents topics in Natural Language Processing (NLP) and Python programming for both text processing and analysis. The goal of this class is to explore both classical and modern techniques in NLP, with emphasis on hands-on application. We will examine topics such as text classification, model evaluation, nearest neighbors, and distributed representations. Applications include authorship identification, structured prediction, and semantic textual similarity, to name a few.
Programming topics include Python best practices, scientific computing libraries (e.g., NumPy, sklearn, etc.), exception handling, object-oriented programming, and more. By the end of this course, students will be able to program proficiently in Python, with enough comfort to reference software documentation and pseudocode to write sophisticated programs from scratch.
Requirements: Basic Python programming skills are required (satisfied, for example, by LING-4400, Intro to NLP)
LING-4428 | Corpus Approaches to Historical Linguistics
Amir Zeldes Upperclass Undergraduate & Graduate
This course provides an in-depth introduction to Comparative Historical Linguistics while focusing on the use of corpus data and corpus-based methods in the study of language change. Using samples from a broad range of Indo-European, Afro-Asiatic, and Sino-Tibetan languages, students will explore patterns of diachronic language change, from sound change laws, through morphological change, to syntactic and semantic change across language stages. The course emphasizes the integration of quantitative methods with traditional linguistic analysis, offering a comprehensive understanding of how languages evolve through the lens of actual examples. By working on projects on phonological, morphosyntactic or semantic change, students will gain practical experience in applying corpus techniques to reconstruct historical developments and uncover the dynamics of language history.
LING-4464 (was 464) | Social Factors in Computational Linguistics and AI
Shabnam Tafreshi Upperclass Undergraduate & Graduate
Advances in technologies for processing human languages have increasingly brought computational linguistics into contact with people. As such, what language reveals about people—and how AI algorithms make decisions affecting people based on their language—is of paramount concern. At the same time, contemporary algorithms for processing language offer powerful new tools for studying people and society on a large scale. Designed for students with grounding in computational linguistics, this course will examine the intersection of people, language, and algorithms with technical precision as well as an appreciation for human context. Topics will include: computational models of conversational interaction and power dynamics; emotions, sentiment, subjectivity, and politeness; toxic language; sociolinguistic variation; detection of attributes such as race and gender; issues of privacy, ethics, bias, and fairness, with special attention to minoritized speakers, languages, and dialects; and the use of large-scale language data for studying political framing and social movements like #MeToo and Black Lives Matter.
LING-4449 (was 469) | Analyzing language data with R
Amir Zeldes Upperclass Undergraduate & Graduate
This course will teach statistical analysis of language data with a focus on corpus materials, using the freely available statistics software 'R'. The course will begin with foundational notions and methods for statistical evaluation, hypothesis testing and visualization of linguistic data which are necessary for both the practice and the understanding of current quantitative research. As we progress we will learn exploratory methods to chart out meaningful structures in language data, such as agglomerative clustering, principal component analysis and multifactorial regression analysis. The course assumes basic mathematical skills and familiarity with linguistic methodology, but does not require a background in statistics or R.
ICOS-7712 (was 712) | Cognitive Science Seminar
Abigail Marsh & Elissa Newport Graduate
A seminar in which graduate students and faculty interested in the cognitive sciences will read and discuss prominent articles across our fields. Can be repeated for credit.