GUCL: Computational Linguistics @ Georgetown
We are a group of Georgetown University faculty, student, and staff researchers at the intersection of language and computation. Our areas of expertise include natural language processing, corpus linguistics, information retrieval, text mining, and more. Members belong to the Linguistics and/or Computer Science departments.
GU research groups: Corpling, NERT, IRLab, InfoSense, Singh lab
Other GU groups: GU-HLT Group, GU Women Coders, Massive Data Institute, Tech & Society Initiative
Related academic groups in the DC/Baltimore region: JHU CLSP, UMD CLIP, George Mason NLP
- 9/8/21: Congratulations to the Corpling lab on winning the DISRPT 2021 shared task on discourse processing!
- 8/27/20: First-Year Student Presented Paper at Prestigious Computational Linguistics Conference (Aryaman Arora)
- 9/10/18: #MeToo Movement on Twitter (Lisa Singh)
- 8/29/18: Cliches in baseball (Nathan Schneider)
- 1/20/18: The Coptic Scriptorium project (Amir Zeldes)
- Congratulations to Arman Cohan, Nazli Goharian, and Georgetown alum Andrew Yates for winning a Best Long Paper award at EMNLP 2017! The paper is entitled "Depression and Self-Harm Risk Assessment in Online Forums."
- Congratulations to Ophir Frieder, who has been named to the European Academy of Sciences and Arts (EASA)!
- 9/19/16: "Email" Dominates What Americans Have Heard About Clinton (Lisa Singh)
- 7/12/16: Searching Harsh Environments (Ophir Frieder)
Mailing list: Contact Nathan Schneider to subscribe!
- Nianwen Xue (Brandeis): Linguistics, Thurs. 12/1/22, 3:30 in Poulton 230
- James Mayfield (JHU): CS, 2/10/23, 11:15 in STM 326/hybrid
- GURT 2023: Computational and Corpus Linguistics (conference on campus), 3/9-12/23
- Gabriella Pasi (University of Milano-Bicocca): CS, 3/24/23, 11:00 via Zoom
- Luca Soldaini (AI2): CS, 4/14/23, 2:00 (rescheduled from 11:00) in STM 414/hybrid
- Carolyn Penstein Rosé (CMU): Linguistics, 4/14/23, 3:30 in Poulton 230
- Aixin Sun (NTU Singapore): InfoSense, 4/27/23, 9:00pm via Zoom
- Rada Mihalcea (UMich): CS, 5/4/23, 11:00 in room TBA
- Previous talks
Overview of CL course offerings
Document listing courses in CS, Linguistics, and other departments that are most relevant to students interested in computational linguistics. Includes estimates of when each course will be offered.
COSC-288 | Introduction to Machine Learning
Mark Maloof | Undergraduate
This undergraduate course surveys the major research areas of machine learning focusing on classification. Through traditional lectures and programming projects, students learn (1) to understand the foundations of machine learning, (2) to design and implement methods of machine learning, (3) to evaluate methods of machine learning, and (4) to conduct empirical evaluations of multiple methods of machine learning. The course compares and contrasts machine learning with related endeavors, such as statistical learning, pattern classification, data mining, and information retrieval. Topics include instance-based approaches, naive Bayes, decision trees, rule induction, linear classifiers, support vector machines, neural networks, ensemble methods, evaluation, and applications. Students complete five programming projects using Java. There are midterm and final exams.
COSC-483/LING-463 | Dialogue Systems
Matthew Marge | Upperclass Undergraduate & Graduate
Nearly all of us interact with dialogue systems, from calling banks and hotels to talking with intelligent assistants like Siri, Alexa, or Cortana. Dialogue systems enable people to get tasks done with software agents using language. Since the interaction is bidirectional, we must consider the fundamentals of how people engage in conversation in order to manage users’ expectations and track how information is exchanged in dialogue. Dialogue systems require an array of technologies working together, including speech recognition, natural language understanding, dialogue management, natural language generation, and speech synthesis. This course will explore what makes dialogue systems effective in commercial and research applications (ranging from personal assistants and chatbots to embodied conversational agents and language-directed robots) and how this contrasts with everyday human-human dialogue.
This course will introduce students to the fundamentals of dialogue systems, expanding on the technologies and algorithms used in today’s dialogue systems and chatbots. There will also be an emphasis on the psycholinguistic properties of human conversation (turn-taking, grounding) to prepare students for designing effective, user-friendly dialogue systems. The course will also examine datasets and dialogue annotations used to train dialogue systems with machine learning algorithms. Coursework will consist of lectures, writing and programming assignments, and student-led presentations on special topics in dialogue. A final project will give students a chance to build their own dialogue system using open-source and freely available software. This course is intended for students who are already comfortable with basic programming in Python.
COSC-488 | Information Retrieval
Nazli Goharian | Upperclass Undergraduate & Graduate
Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, that are relevant to the needs of the user. Relevance is determined either as a global absolute or within a given context or viewpoint. The course teaches practical yet theoretically grounded foundational and advanced algorithms for identifying such relevant components.
The course covers information-retrieval techniques and theory, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user request, and to find them fast. Students study the architecture and components of search engines, such as the parser, index builder, and query processor. Along the way, various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations will be covered. Students learn the material by building a prototype of such a search engine. These approaches are in daily use at search and social media companies.
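To make the parser, index builder, and query processor pipeline concrete, here is a minimal Python sketch of a toy in-memory search engine. It is not the course's prototype: the function names (`parse`, `build_index`, `search`), the regex tokenizer, and the use of summed TF-IDF as the ranking function are illustrative assumptions standing in for the richer retrieval models and efficiency techniques the course covers.

```python
import math
import re
from collections import Counter, defaultdict


def parse(text):
    """Parser (hypothetical): lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Index builder (hypothetical): inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(parse(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, k=3):
    """Query processor (hypothetical): rank documents by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in parse(query):
        postings = index.get(term, {})  # .get does not insert new keys into the defaultdict
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))  # rarer terms get a larger weight
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; break ties by document id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


docs = {
    "d1": "Search engines build an inverted index.",
    "d2": "An index maps terms to documents.",
    "d3": "Dialogue systems talk with users.",
}
index = build_index(docs)
results = search(index, len(docs), "inverted index")
print([doc_id for doc_id, _ in results])  # ['d1', 'd2']
```

A production engine would keep postings on disk, compress them, and normalize scores by document length; this sketch keeps everything in dictionaries so that the three components stay easy to see.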
COSC-576 | Introduction to Deep Learning with Neural Nets
Joe Garman | Graduate
Recent advances in hardware have made deep learning with neural networks practical for real-world problems. Neural networks are a powerful tool that has shown benefit in a wide range of fields. Deep learning involves creating artificial neural networks with many layers, known as deep neural networks (DNNs). These DNNs can find patterns in complex data and are useful in a wide variety of situations. In numerous fields, state-of-the-art results have been achieved with DNNs, and DNN systems dominate head-to-head competitions. This course will introduce students to neural networks, explain different neural network architectures, and then demonstrate the use of these networks on a wide array of tasks.
COSC-689 | Deep Reinforcement Learning
Grace Hui Yang | Graduate
Deep reinforcement learning is an area of machine learning concerned with learning to make optimal decisions by interacting with an environment. An agent observes the consequences of its actions in the environment and alters its behavior to maximize the total reward it receives over the long term. Reinforcement learning has strong mathematical foundations and impressive applications in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. A well-known example is AlphaGo, developed using Monte Carlo tree search and deep neural networks, which defeated world-class human Go players. The overall problem of learning from interaction to achieve goals is still far from solved, but our understanding of it has improved significantly. In this course, we study the fundamentals, algorithms, and applications of deep reinforcement learning. Topics include Markov Decision Processes, Multi-armed Bandits, Monte Carlo Methods, Temporal Difference Learning, Function Approximation, Deep Neural Networks, Actor-Critic, Deep Q-Learning, Policy Gradient Methods, and connections to Psychology and to Neuroscience. The course has lectures, mathematical and programming assignments, and exams.
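As a small illustration of the agent-environment loop described above, the sketch below runs an epsilon-greedy agent on a multi-armed bandit, one of the listed topics. It is a tabular toy with no neural network, so it shows only the basic trial-and-error idea rather than deep reinforcement learning itself; the Bernoulli reward probabilities, `epsilon=0.1`, and the function name `epsilon_greedy_bandit` are assumptions chosen for illustration, not course material.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical helper: run an epsilon-greedy agent on Bernoulli arms.

    Returns the agent's estimated value of each arm and how often each arm was pulled.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment pays reward 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (R - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print(best_arm, [round(q, 2) for q in estimates])
```

With enough steps the agent should pull the arm with the highest mean reward most often, while occasional random exploration keeps its estimates of the other arms from going stale. Deep RL methods such as Deep Q-Learning replace the table of estimates with a neural network so the same idea scales to large state spaces.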
COSC-878 | Seminar: Large-Scale Statistical Machine Learning
Grace Hui Yang | Graduate: Ph.D.
This doctoral seminar studies topics in statistical machine learning in the age of big data and artificial intelligence. We will read both classical and recent work in supervised learning, nonparametric models, optimization, and deep reinforcement learning, working through textbooks and surveying milestone papers. Students are expected to submit questions on the readings before each class and to give presentations when it is their turn. For first-hand experience, students are also expected to complete a few programming exercises from the textbooks.
LING-362 | Introduction to Natural Language Processing
Austin Blodgett | Upperclass Undergraduate & Graduate
This course will introduce students to the basics of Natural Language Processing (NLP), a field that combines insights from linguistics and computer science to produce applications such as machine translation, information retrieval, and spell checking. We will cover a range of topics that will help students understand how current NLP technology works and will provide students with a platform for future study and research. We will learn to implement simple techniques such as finite-state methods, n-gram models, and basic parsing in the Python programming language. Previous knowledge of Python is not required, but students should be prepared to invest the necessary time and effort to become proficient over the course of the semester. Students who take this course will gain a thorough understanding of the fundamental methods used in natural language understanding, along with an ability to assess the strengths and weaknesses of natural language technologies based on these methods.
LING-367 | Computational Corpus Linguistics
Amir Zeldes | Upperclass Undergraduate & Graduate
Digital linguistic corpora, i.e. electronic collections of written, spoken or multimodal language data, have become an increasingly important source of empirical information for theoretical and applied linguistics in recent years. This course is meant as a theoretically founded, practical introduction to corpus work with a broad selection of data, including non-standardized varieties such as language on the Internet, learner corpora and historical corpora. We will discuss issues of corpus design, annotation and evaluation using quantitative methods and both manual and automatic annotation tools for different levels of linguistic analysis, from parts-of-speech, through syntax to discourse annotation. Students in this course participate in building the corpus described here: https://corpling.uis.georgetown.edu/gum/
LING-424 | All About Prepositions
Nathan Schneider | Upperclass Undergraduate & Graduate
This course will take on the grammatical category of prepositions, which are hands-down some of the most intriguing and beguiling words once you get to know them. (How many prepositions are there in the previous sentence? The answer may surprise you!) We will look at their syntactic and semantic versatility in English and how they vary across languages. We will explore how they denote relations in space and time, as well as many other kinds of meanings. We will see why they are so hard to learn in a second language, and why they are difficult to define in dictionaries and teach to computers. The course will be project-based, including a significant project on a language other than English.
Prerequisites: Some background in syntactic description, e.g. satisfied by LING-224, LING-427, or LING-367
LING-429 | Grammar Formalisms for Computational Research
Paul Portner | Upperclass Undergraduate & Graduate
Linguists have developed a large number of formally precise syntactic theories, and many of them have been important tools for computational research. In this course, we will study five such systems with the goal of understanding both their perspective on syntax and its relation to parsing, production, and semantics, and will work to gain sufficient skill in using the formal systems to make them useful for computational work. The five systems we will discuss, along with classic early references, are the following:
- HPSG (Head-driven Phrase Structure Grammar: Pollard and Sag 1994; Sag, Wasow, and Bender 1999)
- CCG (Combinatory Categorial Grammar: Steedman 2000)
- LFG (Lexical Functional Grammar: Kaplan and Bresnan 1982; Dalrymple 2001)
- TAG (Tree Adjoining Grammar: Joshi 1987)
- Minimalist Grammars (Stabler 2001)
We will spend most of our time on HPSG (with its semantic theory Minimal Recursion Semantics, MRS) and CCG. HPSG is both widely used in computational research and influential as a framework for studying syntax. CCG is an important modern version of the classical framework of categorial grammar and supports a direct syntax-semantics interface. We will also do brief one-week overviews of LFG and TAG, and will take a look at Minimalist Grammars because they represent a formalization of the Minimalist syntax familiar to many linguists.
ANLY-580 | NLP for Data Analytics
Chris Larson | Graduate
This course will cover the major techniques for mining and analyzing textual data to extract interesting patterns, discover knowledge, and support decision-making. Students will learn the main concepts and algorithms in Natural Language Processing and their applications in data science. These include search and information retrieval, document clustering and classification, topic modeling, sentiment analysis, and deriving meaning from unstructured narratives. In addition to traditional machine learning techniques such as regression, decision trees, and Naive Bayes, the course will also examine the latest approaches in Deep Learning. Students will gain hands-on experience building foundational tools and machine learning algorithms that can be applied to real analytics problems. The data obtained from textual content can be used to augment numerical data for the purposes of building predictive models, identifying emerging issues, detecting opinion, and determining important relationships.
COSC-285 | Data Mining
Nazli Goharian | Upperclass Undergraduate
This course covers concepts and techniques in the field of data mining. These include both supervised and unsupervised algorithms, such as naive Bayes, neural networks, decision trees, rule-based classifiers, distance-based learners, clustering, and association rule mining. Various issues in data pre-processing are addressed, and text classification, social media mining, and recommender systems are also covered. Students learn the material by building various data mining models, applying data pre-processing techniques, performing experiments, and analyzing the results.
COSC/LING-572 | Empirical Methods in Natural Language Processing
Nathan Schneider | Graduate
Systems of communication that come naturally to humans are thoroughly unnatural for computers. For truly robust information technologies, we need to teach computers to unpack our language. Natural language processing (NLP) technologies facilitate semi-intelligent artificial processing of human language text. In particular, techniques for analyzing the grammar and meaning of words and sentences can be used as components within applications such as web search, question answering, and machine translation.
This course introduces fundamental NLP concepts and algorithms, emphasizing the marriage of linguistic corpus resources with statistical and machine learning methods. As such, the course combines elements of linguistics, computer science, and data science. Coursework will consist of lectures, programming assignments (in Python), and a final team project. The course is intended for students who are already comfortable with programming and have some familiarity with probability theory.
COSC-578 | Statistical Machine Learning
Grace Hui Yang | Graduate
Statistical machine learning brings together statistics and computational sciences such as computer science, system science, and optimization. Recent developments in bioinformatics, signal processing, information management, finance, and artificial intelligence have been largely influenced by statistical machine learning. With a focus on mathematical and algorithmic theory, this class offers the basics of statistical methodology for dealing with applied problems in science and technology. Topics covered in the class include probability, mathematical statistics, inference, sampling, optimization, and their applications in machine learning. The class will have lectures, mathematical homework, exams, and a programming-based project.
COSC-586 | Text Mining & Analysis
Nazli Goharian | Graduate
This course covers various aspects and research areas in text mining and analysis. Text may be a document, query, blog, tag description, etc. The structure of the course is a combination of lectures & students' presentations. The lectures will cover Text/Web/query classification, information extraction, word sense disambiguation, opinion mining & sentiment analysis, query log analysis, ontology extraction and integration, and more. The students are assigned a related topic in the field for further study and presentation in the class.
LING-452 | Construction Grammar
Claire Bonial | Upperclass Undergraduate & Graduate
Many theories of compositionality posit strict separation between the lexicon on the one hand, and abstract procedural rules for grammatical combination on the other. In contrast, Construction Grammar cohesively accounts for the forms and meanings of morphemes, words, sentence patterns, and even “fringe” phenomena such as the Caused-Motion construction: She blinked the snow off of her eyelashes.
This course provides an overview of several variant approaches to Construction Grammar. The course will begin with a survey of readings representing differing Construction Grammar approaches, including work from Fillmore, Lambrecht, Michaelis, Goldberg, Croft, and Bergen. As we examine the different flavors of Construction Grammar, we will explore the validity, advantages, and disadvantages of Construction Grammar. In this exploration, you will be asked to consider and debate questions at the crux of grammar, such as: How does syntax fit into the big picture of language in general? How are generalizations over utterances represented? How are an indefinite variety of utterances producible from a finite system of grammatical knowledge? Finally, you will select and apply a construction grammar approach of your choice to a theoretical problem (e.g., how are light verb constructions extended, and what makes a novel combination acceptable or unacceptable?) or application area (e.g., language learning or multi-word expression detection) of interest to you in any language you are studying, comparing with other approaches to this problem.
LING-464 | Social Factors in Computational Linguistics and AI
Shabnam Tafreshi | Upperclass Undergraduate & Graduate
Advances in technologies for processing human languages have increasingly brought computational linguistics into contact with people. As such, what language reveals about people—and how AI algorithms make decisions affecting people based on their language—is of paramount concern. At the same time, contemporary algorithms for processing language offer powerful new tools for studying people and society on a large scale. Designed for students with grounding in computational linguistics, this course will examine the intersection of people, language, and algorithms with technical precision as well as an appreciation for human context. Topics will include: computational models of conversational interaction and power dynamics; emotions, sentiment, subjectivity, and politeness; toxic language; sociolinguistic variation; detection of attributes such as race and gender; issues of privacy, ethics, bias, and fairness, with special attention to minoritized speakers, languages, and dialects; and the use of large-scale language data for studying political framing and social movements like #MeToo and Black Lives Matter.
LING-469 | Analyzing language data with R
Amir Zeldes | Upperclass Undergraduate & Graduate
This course will teach statistical analysis of language data with a focus on corpus materials, using the freely available statistics software 'R'. The course will begin with foundational notions and methods for statistical evaluation, hypothesis testing and visualization of linguistic data which are necessary for both the practice and the understanding of current quantitative research. As we progress we will learn exploratory methods to chart out meaningful structures in language data, such as agglomerative clustering, principal component analysis and multifactorial regression analysis. The course assumes basic mathematical skills and familiarity with linguistic methodology, but does not require a background in statistics or R.
LING-472/ANLY-521 | Computational Linguistics with Advanced Python
Trevor Adriaanse | Upperclass Undergraduate & Graduate
This course teaches advanced topics in programming for linguistic data analysis and processing using the Python language. A series of assignments will give students hands-on practice implementing core algorithms for linguistic tasks. By the end of the course, students will be able to transform pseudocode into well-written code for algorithms that make sense of textual data, and to evaluate the algorithms quantitatively and qualitatively. Linguistic tasks will include edit distance, semantic similarity, authorship detection, and named entity recognition. Python topics will include the appropriate use of data structures; mathematical objects in numpy; exception handling; object-oriented programming; and software development practices such as code documentation and version control.
Requirements: Basic Python programming skills are required (for example satisfied by LING-362, Intro to NLP)