Extracted tokens and POS tags from the English Web Treebank via the Universal Dependencies Project (http://universaldependencies.org/).

POS tags come in two varieties: Universal (v2), in .upos files, and new-style Penn Treebank tags, in .ppos files.

Tagset overviews:
* Universal: http://people.cs.georgetown.edu/nschneid/p/UPOS-English.pdf
* Penn (full new-style tagset): https://spacy.io/docs/usage/pos-tagging#pos-tagging-english
* Penn (examples): http://surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankTagset.html

Data files:
en-ud-train.upos.tsv
en-ud-dev.upos.tsv
en-ud-test.upos.tsv
en-ud-train.ppos.tsv
en-ud-dev.ppos.tsv
en-ud-test.ppos.tsv

Version information:
UD_English corpus: https://github.com/UniversalDependencies/UD_English
commit 5afdab7bf1a3add314b7a5b51d020a8a9dcd6379
Sun Oct 29 12:17:41 2017 -0700
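
Loading the data (sketch):
This README does not describe the layout of the .tsv files, so the following is only a minimal sketch under an assumption: one token per line, with the token and its tag separated by a tab, and a blank line between sentences. The function name read_tagged_sentences is hypothetical and not part of this distribution. If the files use a different layout, the parsing needs to be adjusted.

```python
from typing import Iterable, Iterator, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_tagged_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from lines of the ASSUMED form 'token<TAB>tag'.

    Assumption (not confirmed by the README): a blank line marks a
    sentence boundary. Lines without a tab raise ValueError.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Use only the first two tab-separated columns.
        token, tag = line.split("\t")[:2]
        sentence.append((token, tag))
    # Emit the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence
```

To try it on one of the files listed above, pass an open file handle, for example open("en-ud-dev.upos.tsv", encoding="utf-8"), as the lines argument. The same function would apply to the .ppos files if they share the assumed layout.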