GUCL: Computational Linguistics @ Georgetown
We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more. Members belong to the Linguistics and/or Computer Science departments.
GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab
Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative
Related academic groups in the DC/Baltimore region: JHU CLSP, UMD CLIP, George Mason NLP
- 9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
- 8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
- 9/10/18: #MeToo Movement on Twitter (Lisa Singh)
- 8/29/18: Cliches in baseball (Nathan Schneider)
- 1/20/18: The Coptic Scriptorium project (Amir Zeldes)
- Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award at EMNLP 2017! The paper is entitled "Depression and Self-Harm Risk Assessment in Online Forums."
- Congratulations to Ophir Frieder, who has been named to the European Academy of Sciences and Arts (EASA)!
- 9/19/16: "Email" Dominates What Americans Have Heard About Clinton (Lisa Singh)
- 7/12/16: Searching Harsh Environments (Ophir Frieder)
Mailing list: Contact Nathan Schneider to subscribe!
- Nianwen Xue (Brandeis): Linguistics, Thurs. 12/1/22, 3:30 in Poulton 230
- James Mayfield (JHU): CS, 2/10/23, 11:15 in STM 326/hybrid
- GURT 2023: Computational and Corpus Linguistics (conference on campus), 3/9-12/23
- Gabriella Pasi (University of Milano-Bicocca): CS, 3/24/23, 11:00 via Zoom
- Luca Soldaini (AI2): CS, 4/14/23,
11:002:00 in STM 414/hybrid
- Carolyn Penstein Rosé (CMU): Linguistics, 4/14/23, 3:30 in Poulton 230
- Aixin Sun (NTU Singapore): InfoSense, 4/27/23, 9:00pm via Zoom
- Rada Mihalcea (UMich): CS, 5/4/23, 11:00 in room TBA
- Previous talks
Overview of CL course offerings
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics. Includes estimates of when each course will be offered.
COSC-288 | Introduction to Machine Learning
Mark Maloof Undergraduate
This undergraduate course surveys the major research areas of machine learning. Through traditional lectures and programming projects, students learn (1) to understand the foundations of machine learning, (2) to implement methods of machine learning in a high-level programming language, (3) to comprehend papers from the primary literature, and (4) to design and conduct their own studies. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include instance-based approaches, naive Bayes, decision trees, rule induction, linear classifiers, neural networks, support vector machines, ensemble methods, evaluation, and applications.
COSC-483/LING-463 | Dialogue Systems
Matthew Marge Upperclass Undergraduate & Graduate
Nearly all of us interact with dialogue systems -- from calling up banks and hotels, to talking with intelligent assistants like Siri, Alexa, or Cortana, dialogue systems enable people to get tasks done with software agents using language. Since the interaction is bi-directional, we must consider the fundamentals of how people engage in conversation so as to manage users’ expectations and track how information is exchanged in dialogue. Dialogue systems require an array of technologies to come together for them to work well, including speech recognition, natural language understanding, dialogue management, natural language generation, and speech synthesis. This course will explore what makes dialogue systems effective in commercial and research applications (ranging from personal assistants and chatbots to embodied conversational agents and language-directed robots) and how this contrasts with everyday human-human dialogue.
This course will introduce students to the fundamentals of dialogue systems, expanding on technologies and algorithms that are used in today’s dialogue systems and chatbots. There will also be emphasis on the psycholinguistic properties of human conversation (turn-taking, grounding) so as to prepare students for designing effective, user-friendly dialogue systems. The course will also include examining datasets and dialogue annotations used to train dialogue systems with machine learning algorithms. Coursework will consist of lectures, writing and programming assignments, and student-led presentations on special topics in dialogue. A final project will give students a chance to build their own dialogue system using open source and freely available software. This course is intended for students that are already comfortable with limited amounts of programming (in Python).
COSC-575 | Machine Learning
Mark Maloof Graduate
This course surveys the major research areas of machine learning, concentrating on inductive learning. The course will also compare and contrast machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics will include rule induction, decision trees, Bayesian methods, density estimation, linear classifiers, neural networks, instance-based approaches, genetic algorithms, evaluation, and applications. In addition to programming projects and homework, students will complete a semester project.
COSC/LING-672 | Advanced Semantic Representation
Nathan Schneider Graduate
Natural language is an imperfect vehicle for meaning. On the one hand, some expressions can be interpreted in multiple ways; on the other hand, there are often many superficially divergent ways to express very similar meanings. Semantic representations attempt to disentangle these two effects by exposing similarities and differences in how a word or sentence is interpreted. Such representations, and algorithms for working with them, constitute a major research area in natural language processing.
This course will examine semantic representations for natural language from a computational/NLP perspective. Through readings, presentations, discussions, and hands-on exercises, we will put a semantic representation under the microscope to assess its strengths and weaknesses. For each representation we will confront questions such as: What aspects of meaning are and are not captured? How well does the representation scale to the large vocabulary of a language? What assumptions does it make about grammar? How language-specific is it? In what ways does it facilitate manual annotation and automatic analysis? What datasets and algorithms have been developed for the representation? What has it been used for? In Fall 2018 the focus will be on Universal Cognitive Conceptual Annotation (http://www.cs.huji.ac.il/~oabend/ucca.html); its relationship to other representations in the literature will also be considered. Term projects will consist of (i) innovating on the representation's design, datasets, or analysis algorithms, or (ii) applying it to questions in linguistics or downstream NLP tasks.
COSC-586 | Text Mining & Analysis
Nazli Goharian Graduate
This course covers various aspects and research areas in text mining and analysis. Text may be a document, query, blog, tag description, etc. The structure of the course is a combination of lectures & students' presentations. The lectures will cover Text/Web/query classification, information extraction, word sense disambiguation, opinion mining & sentiment analysis, query log analysis, ontology extraction and integration, and more. The students are assigned a related topic in the field for further study and presentation in the class.
COSC-878 | Seminar: Large-Scale Statistical Machine Learning
Grace Hui Yang Graduate: Ph.D.
This doctoral seminar studies topics in statistical machine learning in the age of big data and artificial intelligence. In the seminar, we will read both classical and recent work in supervised learning, nonparametric models, optimization, and deep reinforcement learning. In the class, we will read textbooks and survey milestone papers. Students are expected to submit questions for the readings before each class and give presentations when it is their turns. To have first-hand experience, students are also expected to do a few programming exercises in the textbooks.
LING-362 | Introduction to Natural Language Processing
Amir Zeldes Upperclass Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field which combines insights from linguistics and computer science to produce applications such as machine translation, information retrieval, and spell checking. We will cover a range of topics that will help students understand how current NLP technology works and will provide students with a platform for future study and research. We will learn to implement simple representations such as finite-state techniques, n-gram models and basic parsing in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the course of the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
LING-367 | Computational Corpus Linguistics
Amir Zeldes Upperclass Undergraduate & Graduate
Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/
LING-429 | Grammar Formalisms for Computational Research
Paul Portner Upperclass Undergraduate & Graduate
Linguists have developed a large number of formally precise syntactic theories, and many of them have been important tools for computational research. In this course, we will study five such systems with the goal of understanding both their perspective on syntax and its relation to parsing, production, and semantics, and will work to gain sufficient skill in using the formal systems to make them useful for computational work. The five systems we will discuss, along with classic early references, are the following:
- HPSG (Head-driven Phrase-structure Grammar: Pollard and Sag 1994; Sag, Wasow, and Bender 1999)
- CCG (Combinatory Categorial Grammar: Steedman 2000)
- LFG (Lexical Functional Grammar: Kaplan and Bresnan 1982, Dalrymple 2001)
- TAG (Tree Adjoining Grammar (Joshi 1987)
- Minimalist Grammars (Stabler 2001)
We will spend most of our time on HPSG (with its semantic theory Minimal Recursion Semantics, MRS) and CCG. HPSG is is both widely used in computational research and influential as a framework for studying syntax. CCG is an important modern version of the classical framework of categorial grammar and supports a direct syntax-semantics interface. We will also do brief one-week overviews of LFG and TAG, and will take a look at Minimalist Grammars because they represent a formalization of the Minimalist syntax familiar to many linguists.
ANLY-590 | Neural Nets and Deep Learning
Joshuah Touyz, Keegan Hines (2 sections) Graduate
This course will explore the fundamentals of artificial neural networks (ANNs) and deep learning. The following topics will be covered: feed-forward ANNs, activation functions, output transfer functions for regression and classification, cost functions and related likelihood functions, backpropagation and optimization (including stochastic gradient descent and conjugate gradient), auto-encoders for manifold learning and dimensionality reduction, convolutional neural networks, and recurrent neural networks. Overfitting and regularization will be discussed from both theoretical and practical viewpoints. Concepts and techniques will be applied to several domains including image processing, time series analysis, natural language processing, and more. Students will gain mastery of popular deep learning frameworks in the Python ecosystem including Tensorflow and Keras.
COSC-285 | Introduction to Data Mining
Nazli Goharian Upperclass Undergraduate
This course covers concepts and techniques in the field of data mining. This includes both supervised and unsupervised algorithms, such as naive Bayes, neural network, decision tree, rule based classifiers, distance based learners, clustering, and association rule mining. Various issues in the pre-processing of the data are addressed. Text classification, social media mining, and recommender systems will be addressed. The students learn the material by building various data mining models and using various data pre-processing techniques, performing experimentation and provide analysis of the results.
COSC-488 | Information Retrieval
Nazli Goharian Upperclass Undergraduate & Graduate
Information retrieval is the identification of textual components, be them web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or view point. Practical, but yet theoretically grounded, foundational and advanced algorithms needed to identify such relevant components are taught.
The Information-retrieval techniques and theory, covering both effectiveness and run-time performance of information-retrieval systems are covered. The focus is on algorithms and heuristics used to find textual components relevant to the user request and to find them fast. The course covers the architecture and components of the search engines such as parser, index builder, and query processor. In doing this, various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations will be covered. The students learn the material by building a prototype of such a search engine. These approaches are in daily use by all search and social media companies.
COSC/LING-572 | Empirical Methods in Natural Language Processing
Nathan Schneider Graduate
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
Requirements: Intermediate Python programming skills are required (for example satisfied by LING-472, Comp Ling with Advanced Python)
COSC-689 | Deep Reinforcement Learning
Grace Hui Yang Graduate
Deep Reinforcement learning is an area of machine learning that learns how to make optimal decisions from interacting with an environment. From the environment, an agent observes the consequence of its action and alters its behavior to maximize the amount of rewards received in the long term. Reinforcement learning has developed strong mathematical foundations and impressive applications in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. An example is the winning of AlphaGo, developed using Monte Carlo tree search and deep neural networks, over world-class human Go players. The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. In this course, we study fundamentals, algorithms, and applications in deep reinforcement learning. Topics include Markov Decision Processes, Multi-armed Bandits, Monte Carlo Methods, Temporal Difference Learning, Function Approximation, Deep Neural Networks, Actor-Critic, Deep Q-Learning, Policy Gradient Methods, and connections to Psychology and to Neuroscience. The course has lectures, mathematical and programming assignments, and exams.
COSC-872 | Seminar in NLP
Nathan Schneider Graduate: Doctoral [2 credits]
This course will expose students to current research in natural language processing and computational linguistics. Class meetings will consist primarily of student-led reading discussions, supplemented occasionally by lectures or hands-on activities. The subtopics and reading list will be determined at the start of the semester; readings will consist of research papers, advanced tutorials, and/or dissertations.
Requirements: Familiarity with NLP using machine learning methods (for example satisfied by COSC-572, Empirical Methods in NLP)
LING-469 | Analyzing language data with R
Amir Zeldes Upperclass Undergraduate & Graduate
This course will teach statistical analysis of language data with a focus on corpus materials, using the freely available statistics software 'R'. The course will begin with foundational notions and methods for statistical evaluation, hypothesis testing and visualization of linguistic data which are necessary for both the practice and the understanding of current quantitative research. As we progress we will learn exploratory methods to chart out meaningful structures in language data, such as agglomerative clustering, principal component analysis and multifactorial regression analysis. The course assumes basic mathematical skills and familiarity with linguistic methodology, but does not require a background in statistics or R.
LING-472/ANLY-521 | Computational Linguistics with Advanced Python
Elizabeth Merkhofer Upperclass Undergraduate & Graduate
This course teaches advanced topics in programming for linguistic data analysis and processing using the Python language. A series of assignments will give students hands-on practice implementing core algorithms for linguistic tasks. By the end of the course, students will be able to transform pseudocode into well-written code for algorithms that make sense of textual data, and to evaluate the algorithms quantitatively and qualitatively. Linguistic tasks will include edit distance, semantic similarity, authorship detection, and named entity recognition. Python topics will include the appropriate use of data structures; mathematical objects in numpy; exception handling; object-oriented programming; and software development practices such as code documentation and version control.
Requirements: Basic Python programming skills are required (for example satisfied by LING-362, Intro to NLP)
LING-765 | Computational Discourse Models
Amir Zeldes Graduate
Recent years have seen an explosion of computational work on higher level discourse representations, such as entity recognition, mention and coreference resolution and shallow discourse parsing. At the same time, the theoretical status of the underlying categories is not well understood, and despite progress, these tasks remain very much unsolved in practice. This graduate level seminar will concentrate on theoretical and practical models representing how referring expressions, such as mentions of people, things and events, are coded during language processing. We will begin by exploring the literature on human discourse processing in terms of information structure, discourse coherence and theories about anaphora, such as Centering Theory and Alternative Semantics. We will then look at computational linguistics implementations of systems for entity recognition and coreference resolution and explore their relationship with linguistic theory. Over the course of the semester, participants will implement their own coding project exploring some phenomenon within the domain of entity recognition, coreference, discourse modeling or a related area.