Ph.D. Dissertation

Lexical Semantic Analysis in Natural Language Text


Nathan Schneider
Carnegie Mellon University, 2014


Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with the engineering of statistical natural language processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains.

The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multiword expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open-source NLP system that automates the analysis by statistical sequence tagging. Finally, we motivate the applicability of lexical semantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry.

[extended abstract]


  1. Setting the Stage
  2. General Background: Computational Lexical Semantics
  3. Multiword Expressions
  4. Noun and Verb Supersenses
  5. Preposition Supersenses
  6. Multiword Expression Identification
  7. Full Supersense Tagging
  8. Conclusion


To cite the thesis:

	address = {Pittsburgh, Pennsylvania, {USA}},
	type = {{Ph.D.} dissertation},
	title = {Lexical Semantic Analysis in Natural Language Text},
	url = {},
	school = {Carnegie Mellon University},
	author = {Schneider, Nathan},
	month = sep,
	year = {2014}

Works cited in the thesis