Ph.D. Dissertation

Lexical Semantic Analysis in Natural Language Text

Nathan Schneider
Carnegie Mellon University, 2014

Abstract

Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural language processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains.

The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multiword expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open-source NLP system that automates the analysis by statistical sequence tagging. Finally, we motivate the applicability of lexical semantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry.

Chapters

Setting the Stage
General Background: Computational Lexical Semantics
Multiword Expressions
Noun and Verb Supersenses
Preposition Supersenses
Multiword Expression Identification
Full Supersense Tagging
Conclusion

BibTeX

To cite the thesis:

@phdthesis{schneider-thesis,
	address = {Pittsburgh, Pennsylvania, {USA}},
	type = {{Ph.D.} dissertation},
	title = {Lexical Semantic Analysis in Natural Language Text},
	url = {https://www.cs.cmu.edu/~nschneid/thesis/thesis-print.pdf},
	school = {Carnegie Mellon University},
	author = {Schneider, Nathan},
	month = sep,
	year = {2014}
}

Works cited in the thesis