Ph.D. Dissertation
Lexical Semantic Analysis in Natural Language Text
Nathan Schneider
Carnegie Mellon University, 2014
Abstract
Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural language processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains.
The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multiword expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open-source NLP system that automates the analysis by statistical sequence tagging. Finally, we motivate the applicability of lexical semantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry.
Chapters
- Setting the Stage
- General Background: Computational Lexical Semantics
- Multiword Expressions
- Noun and Verb Supersenses
- Preposition Supersenses
- Multiword Expression Identification
- Full Supersense Tagging
- Conclusion
BibTeX
To cite the thesis:
@phdthesis{schneider-thesis,
  address = {Pittsburgh, Pennsylvania, {USA}},
  type = {{Ph.D.} dissertation},
  title = {Lexical Semantic Analysis in Natural Language Text},
  url = {http://www.cs.cmu.edu/~nschneid/thesis/thesis-print.pdf},
  school = {Carnegie Mellon University},
  author = {Schneider, Nathan},
  month = sep,
  year = {2014}
}