Java Stanford NLP: Part of Speech Labels

Java Stanford NLP: Part of Speech labels?

The tags come from the Penn Treebank Project. See its "Part-of-speech tagging" guidelines (a PostScript file).

JJ is adjective. NNS is noun, plural. VBP is verb, non-3rd person singular present. RB is adverb.

That's for English. For Chinese, it's the Penn Chinese Treebank. And for German, it's the NEGRA corpus.

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NNP Proper noun, singular
  15. NNPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PRP Personal pronoun
  19. PRP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb
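If you want these descriptions available in code, a small lookup table is enough. The following is only a sketch: the class name PennTagDescriptions and the describe method are made up for illustration and are not part of Stanford NLP. It covers just the 36 word-level tags listed above, so punctuation tags fall through to "Unknown tag".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Stanford NLP): looks up the description
// of a Penn Treebank word-level tag.
class PennTagDescriptions {
    // Tag/description pairs copied from the 36-item list.
    private static final String[][] TABLE = {
        {"CC", "Coordinating conjunction"}, {"CD", "Cardinal number"},
        {"DT", "Determiner"}, {"EX", "Existential there"},
        {"FW", "Foreign word"}, {"IN", "Preposition or subordinating conjunction"},
        {"JJ", "Adjective"}, {"JJR", "Adjective, comparative"},
        {"JJS", "Adjective, superlative"}, {"LS", "List item marker"},
        {"MD", "Modal"}, {"NN", "Noun, singular or mass"},
        {"NNS", "Noun, plural"}, {"NNP", "Proper noun, singular"},
        {"NNPS", "Proper noun, plural"}, {"PDT", "Predeterminer"},
        {"POS", "Possessive ending"}, {"PRP", "Personal pronoun"},
        {"PRP$", "Possessive pronoun"}, {"RB", "Adverb"},
        {"RBR", "Adverb, comparative"}, {"RBS", "Adverb, superlative"},
        {"RP", "Particle"}, {"SYM", "Symbol"},
        {"TO", "to"}, {"UH", "Interjection"},
        {"VB", "Verb, base form"}, {"VBD", "Verb, past tense"},
        {"VBG", "Verb, gerund or present participle"}, {"VBN", "Verb, past participle"},
        {"VBP", "Verb, non-3rd person singular present"},
        {"VBZ", "Verb, 3rd person singular present"},
        {"WDT", "Wh-determiner"}, {"WP", "Wh-pronoun"},
        {"WP$", "Possessive wh-pronoun"}, {"WRB", "Wh-adverb"}
    };

    private static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        for (String[] row : TABLE) {
            DESCRIPTIONS.put(row[0], row[1]);
        }
    }

    // Returns the description, or "Unknown tag" for anything not in the table
    // (for example punctuation tags).
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "Unknown tag");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"JJ", "NNS", "VBP", "RB"}) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```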

Specific Part of Speech labels for Java Stanford NLP

It is the Penn Treebank POS set, but many descriptions of this tag set seem to omit punctuation marks. Here is a complete list of tags:

https://www.eecis.udel.edu/~vijay/cis889/ie/pos-set.pdf

(Note that parentheses are tagged as -LRB- and -RRB-; I'm not sure why the documentation doesn't mention this.)
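If you need the literal characters back when post-processing output, a small conversion helper can do it. This is a sketch under assumptions: PtbBrackets and unescape are hypothetical names, and the square and curly bracket forms (-LSB-, -RSB-, -LCB-, -RCB-) follow the usual Penn Treebank escaping convention as far as I know; the answer above only mentions -LRB- and -RRB-.

```java
import java.util.Map;

// Hypothetical helper: turns Penn Treebank bracket escapes back into the
// literal characters. Anything that is not an escape is returned unchanged.
class PtbBrackets {
    private static final Map<String, String> UNESCAPE = Map.of(
        "-LRB-", "(", "-RRB-", ")",   // round brackets (mentioned above)
        "-LSB-", "[", "-RSB-", "]",   // square brackets (assumed convention)
        "-LCB-", "{", "-RCB-", "}");  // curly brackets (assumed convention)

    static String unescape(String token) {
        return UNESCAPE.getOrDefault(token, token);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"-LRB-", "see", "above", "-RRB-"}) {
            System.out.print(unescape(token) + " ");
        }
        System.out.println();
    }
}
```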

Stanford NLP: Arabic Part of Speech labels?

First, I am not sure of this answer, but I hope it helps.
What you asked for should be at the following link: POS tag set does the parser use? (unfortunately, many of its links are broken).

As they mention, you can find the tag set (what you call labels) in the file atb1-v4.1-taglist-conversion-to-PennPOS-forrelease.lisp. As I understand it, and as the file name suggests, each entry maps a sequence of one or more Arabic Treebank (ATB) tags to one or more Penn-style POS tags. For example, the mapping:

(DET+ADJ_COMP+NSUFF_MASC_DU_GEN DT+JJR)

means that the Arabic Treebank tag sequence (DET+ADJ_COMP+NSUFF_MASC_DU_GEN) is mapped to the Penn-style tags (DT+JJR).
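To make the shape of such an entry concrete, here is a rough sketch that splits one entry into its left-hand and right-hand tag sequences. It assumes every entry looks like the single sample above (parentheses, one space-separated pair, tags joined by "+"); I have not checked this against the whole file, and TagMappingEntry is a made-up name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one mapping entry of the form "(A+B+C X+Y)".
class TagMappingEntry {
    final List<String> sourceTags; // left-hand sequence, e.g. DET, ADJ_COMP, ...
    final List<String> targetTags; // right-hand sequence, e.g. DT, JJR

    TagMappingEntry(List<String> sourceTags, List<String> targetTags) {
        this.sourceTags = sourceTags;
        this.targetTags = targetTags;
    }

    static TagMappingEntry parse(String line) {
        String body = line.trim();
        if (!body.startsWith("(") || !body.endsWith(")")) {
            throw new IllegalArgumentException("Expected (SOURCE TARGET): " + line);
        }
        // Drop the surrounding parentheses, then split on whitespace.
        String[] halves = body.substring(1, body.length() - 1).trim().split("\\s+");
        if (halves.length != 2) {
            throw new IllegalArgumentException("Expected exactly two fields: " + line);
        }
        return new TagMappingEntry(
            Arrays.asList(halves[0].split("\\+")),
            Arrays.asList(halves[1].split("\\+")));
    }

    public static void main(String[] args) {
        TagMappingEntry e = parse("(DET+ADJ_COMP+NSUFF_MASC_DU_GEN DT+JJR)");
        System.out.println(e.sourceTags + " -> " + e.targetTags);
    }
}
```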

Regarding your question (المدرسة/DTNN): they mention that the tag consists of two tags (DT + NN), where DT = (الــ) (pronounced 'Al') and NN = noun, singular or mass (see the Penn Treebank tag list).
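If you want to handle such combined tags in code, one simple approach is to peel off the leading DT. This is only a sketch: splitDeterminer is a hypothetical helper, and it assumes that every combined tag of this kind is just "DT" followed by an ordinary Penn-style tag, which I have only seen confirmed for DTNN.

```java
import java.util.List;

// Hypothetical helper: splits a combined tag such as DTNN into DT + NN.
class ArabicTagSplit {
    static List<String> splitDeterminer(String tag) {
        // Only split when something follows the DT prefix; plain "DT" stays whole.
        if (tag.startsWith("DT") && tag.length() > 2) {
            return List.of("DT", tag.substring(2));
        }
        return List.of(tag);
    }

    public static void main(String[] args) {
        System.out.println(splitDeterminer("DTNN")); // two parts: DT and NN
        System.out.println(splitDeterminer("NN"));   // unchanged
    }
}
```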

Stanford NLP: Chinese Part of Speech labels?

We use the tag set of the (Penn/LDC/Brandeis/UC Boulder) Chinese Treebank.

See here for details on the tag set: http://www.cis.upenn.edu/~chinese/

This was documented in the parser FAQ, but I'll add it to the tagger FAQ.

Number of possible parts of speech of a word

Stanford CoreNLP doesn't seem to have an interface to WordNet, but it's pretty easy to do this with one of the other small Java WordNet libraries. For this example, I used JWI 2.3.3.

Besides JWI, you'll need to download a copy of the WordNet database. For example, you can download WordNet-3.0.tar.gz from Princeton. Untar the dictionary.

The following code includes a function that returns a list of the possible parts of speech for a word:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.morph.WordnetStemmer;

public class WNDemo {

    /**
     * Given a dictionary and a word, find all the parts of speech the
     * word can be.
     */
    public static Collection<POS> getPartsOfSpeech(Dictionary dict, String word) {
        ArrayList<POS> parts = new ArrayList<POS>();
        WordnetStemmer stemmer = new WordnetStemmer(dict);
        // Check every part of speech.
        for (POS pos : POS.values()) {
            // Check every stem, because WordNet doesn't have every surface
            // form in its database.
            for (String stem : stemmer.findStems(word, pos)) {
                IIndexWord iw = dict.getIndexWord(stem, pos);
                if (iw != null) {
                    parts.add(pos);
                    // One matching stem is enough; don't list a POS twice.
                    break;
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        try {
            // Path to the untarred WordNet dictionary directory.
            Dictionary dict = new Dictionary(new File("WordNet-3.0/dict"));
            dict.open();
            System.out.println("'like' is a " + getPartsOfSpeech(dict, "like"));
            dict.close();
        } catch (IOException e) {
            System.err.println("Error: " + e);
        }
    }
}

And the output:

'like' is a [noun, verb, adjective]
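WordNet only distinguishes four coarse categories (noun, verb, adjective, adverb), while a tagger gives you fine-grained Penn tags. If you want to compare the two, a prefix check is a common trick. The sketch below is neither JWI nor CoreNLP API: CoarsePos and its Category enum are made up for illustration, and sending everything else (including WRB and MD) to OTHER is my own choice.

```java
// Hypothetical helper: collapses a Penn Treebank tag into one of the four
// coarse categories that WordNet covers, or OTHER.
class CoarsePos {
    enum Category { NOUN, VERB, ADJECTIVE, ADVERB, OTHER }

    static Category fromPennTag(String tag) {
        if (tag.startsWith("NN")) return Category.NOUN;      // NN, NNS, NNP, NNPS
        if (tag.startsWith("VB")) return Category.VERB;      // VB, VBD, VBG, VBN, VBP, VBZ
        if (tag.startsWith("JJ")) return Category.ADJECTIVE; // JJ, JJR, JJS
        if (tag.startsWith("RB")) return Category.ADVERB;    // RB, RBR, RBS
        return Category.OTHER;                               // DT, IN, MD, WRB, ...
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNS", "VBP", "JJ", "RB", "DT"}) {
            System.out.println(tag + " -> " + fromPennTag(tag));
        }
    }
}
```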

What is the meaning of labels on the arrows of dependency parser graph?

Here are the definitions of the dependency relations and their labels as used for English. Clicking a relation label gives you a full explanation with examples.

https://universaldependencies.org/en/dep/index.html

Stanford NLP POS tagging

Of course Stanford CoreNLP can do tagging directly. The following lines of code tag your example, and give you the desired output.

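import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Build a pipeline that tokenizes, splits sentences, and POS-tags.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation annotation = new Annotation("I'm so happy about my marks");
pipeline.annotate(annotation);

List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(CoreAnnotations.TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
        System.out.println(word + "/" + pos);
    }
}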
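If you later need to read word/TAG strings like the ones printed above back into their parts, splitting on the last slash is the safe choice, because the word itself may contain a slash. The following is a sketch only: TaggedToken is a hypothetical helper, not CoreNLP API, and the sample inputs are made up rather than real tagger output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper: splits a "word/TAG" string into (word, tag).
class TaggedToken {
    static Map.Entry<String, String> parse(String s) {
        // Use the LAST slash so that words such as "1/2" stay intact.
        int slash = s.lastIndexOf('/');
        if (slash <= 0 || slash == s.length() - 1) {
            throw new IllegalArgumentException("Expected word/TAG: " + s);
        }
        return new SimpleEntry<>(s.substring(0, slash), s.substring(slash + 1));
    }

    public static void main(String[] args) {
        // Made-up sample inputs, not actual tagger output.
        for (String s : new String[] {"dogs/NNS", "1/2/CD"}) {
            Map.Entry<String, String> t = parse(s);
            System.out.println("word=" + t.getKey() + " tag=" + t.getValue());
        }
    }
}
```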

