Java Stanford NLP: Part of Speech Labels

Java Stanford NLP: Part of Speech labels?

The tags come from the Penn Treebank Project. Look at its part-of-speech tagging guide (the "Part-of-speech tagging" .ps file).

JJ is adjective. NNS is plural noun. VBP is a present-tense verb (non-3rd person singular). RB is adverb.

That's for English. For Chinese, it's the Penn Chinese Treebank, and for German it's the NEGRA corpus.

CC Coordinating conjunction

CD Cardinal number

DT Determiner

EX Existential there

FW Foreign word

IN Preposition or subordinating conjunction

JJ Adjective

JJR Adjective, comparative

JJS Adjective, superlative

LS List item marker

MD Modal

NN Noun, singular or mass

NNS Noun, plural

NNP Proper noun, singular

NNPS Proper noun, plural

PDT Predeterminer

POS Possessive ending

PRP Personal pronoun

PRP$ Possessive pronoun

RB Adverb

RBR Adverb, comparative

RBS Adverb, superlative

RP Particle

SYM Symbol

TO to

UH Interjection

VB Verb, base form

VBD Verb, past tense

VBG Verb, gerund or present participle

VBN Verb, past participle

VBP Verb, non-3rd person singular present

VBZ Verb, 3rd person singular present

WDT Wh-determiner

WP Wh-pronoun

WP$ Possessive wh-pronoun

WRB Wh-adverb
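
In Java code it is often enough to collapse these tags into a few coarse classes by prefix. The following is a minimal sketch, assuming a hypothetical helper class PennTagGroups that is not part of Stanford NLP; the grouping itself is a convenience for illustration, not an official scheme:

```java
final class PennTagGroups {

    /** Collapses a Penn Treebank tag into a coarse class by prefix (illustrative grouping). */
    static String coarse(String tag) {
        if (tag.startsWith("NN")) return "noun";       // NN, NNS, NNP, NNPS
        if (tag.startsWith("VB")) return "verb";       // VB, VBD, VBG, VBN, VBP, VBZ
        if (tag.startsWith("JJ")) return "adjective";  // JJ, JJR, JJS
        if (tag.startsWith("RB")) return "adverb";     // RB, RBR, RBS
        return "other";                                // everything else, e.g. DT, IN, MD
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNS", "VBP", "JJ", "RB", "DT"}) {
            System.out.println(tag + " -> " + coarse(tag));
        }
    }
}
```

Prefix matching works here because the tag names share stems; note that WRB starts with "W", so it falls into "other" rather than "adverb".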

Specific Part of Speech labels for Java Stanford NLP

It is the Penn Treebank POS set, but many descriptions of this tag set seem to omit punctuation marks. Here is a complete list of tags:

https://www.eecis.udel.edu/~vijay/cis889/ie/pos-set.pdf

(Note that parentheses are tagged as -LRB- and -RRB-; I'm not sure why the documentation doesn't mention this.)
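
If you post-process tagger output, you may want to recognize these two parenthesis tags explicitly. A small sketch, assuming a hypothetical helper class BracketTags (not a Stanford NLP class) and covering only the two tags named above:

```java
final class BracketTags {

    /** True for the two parenthesis tags, -LRB- and -RRB-. */
    static boolean isBracketTag(String tag) {
        return "-LRB-".equals(tag) || "-RRB-".equals(tag);
    }

    /** Maps a parenthesis tag back to the character it stands for; other tags pass through unchanged. */
    static String display(String tag) {
        switch (tag) {
            case "-LRB-": return "(";
            case "-RRB-": return ")";
            default: return tag;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"-LRB-", "NN", "-RRB-"}) {
            System.out.println(tag + " -> " + display(tag));
        }
    }
}
```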

Stanford NLP: Arabic Part of Speech labels?

First, I am not sure of this answer, but I hope it helps.
What you asked for should be at the following link: "POS tag set does the parser use?" (unfortunately, many of the links there are broken).

As mentioned there, you can find the tag set (what you called labels) in the file atb1-v4.1-taglist-conversion-to-PennPOS-forrelease.lisp. As I understand it, and as the file name suggests, each entry maps a sequence of one or more Arabic Treebank (ATB) tags to a sequence of one or more Penn-style tags. For example, the mapping:

(DET+ADJ_COMP+NSUFF_MASC_DU_GEN DT+JJR)

means that the Arabic Treebank tag sequence (DET+ADJ_COMP+NSUFF_MASC_DU_GEN) is mapped to the Penn-style tags (DT+JJR).

Regarding your question (المدرسة/DTNN): they mention that it consists of two tags (DT + NN), where DT is the determiner (الــ, pronounced 'Al') and NN is "Noun, singular or mass"; see the Penn Treebank tag set.
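
To make the shape of such a mapping entry concrete, here is a minimal Java sketch that splits one entry into its two sides and then into "+"-separated components. The class name TagMappingLine is hypothetical, and the "(SOURCE TARGET)" line format is an assumption based only on the single example above, not on the full .lisp file:

```java
import java.util.Arrays;
import java.util.List;

final class TagMappingLine {
    final List<String> source;  // components of the first tag sequence
    final List<String> target;  // components of the second tag sequence

    private TagMappingLine(List<String> source, List<String> target) {
        this.source = source;
        this.target = target;
    }

    /** Parses an entry of the assumed form "(A+B+C X+Y)". */
    static TagMappingLine parse(String line) {
        String body = line.trim();
        if (!body.startsWith("(") || !body.endsWith(")")) {
            throw new IllegalArgumentException("expected (SOURCE TARGET): " + line);
        }
        // Drop the parentheses, then split the two sides on whitespace.
        String[] sides = body.substring(1, body.length() - 1).trim().split("\\s+");
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected exactly two sides: " + line);
        }
        return new TagMappingLine(
                Arrays.asList(sides[0].split("\\+")),
                Arrays.asList(sides[1].split("\\+")));
    }

    public static void main(String[] args) {
        TagMappingLine m = parse("(DET+ADJ_COMP+NSUFF_MASC_DU_GEN DT+JJR)");
        System.out.println(m.source + " -> " + m.target);
    }
}
```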

Stanford NLP: Chinese Part of Speech labels?

We use the tag set of the (Penn/LDC/Brandeis/UC Boulder) Chinese Treebank.

See here for details on the tag set: http://www.cis.upenn.edu/~chinese/

This was documented in the parser FAQ, but I'll add it to the tagger FAQ.

Number of possible parts of speech of a word

Stanford CoreNLP doesn't seem to have an interface to WordNet, but it's pretty easy to do this with one of the other small Java WordNet libraries. For this example, I used JWI 2.3.3.

Besides JWI, you'll need to download a copy of the WordNet database. For example, you can download WordNet-3.0.tar.gz from Princeton. Untar the dictionary.

The following code includes a function that returns a list of the possible parts of speech for a word:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.morph.WordnetStemmer;

public class WNDemo {

  /**
   * Given a dictionary and a word, find all the parts of speech the
   * word can be.
   */
  public static Collection<POS> getPartsOfSpeech(Dictionary dict, String word) {
    ArrayList<POS> parts = new ArrayList<POS>();
    WordnetStemmer stemmer = new WordnetStemmer(dict);
    // Check every part of speech.
    for (POS pos : POS.values()) {
      // Check every stem, because WordNet doesn't have every surface
      // form in its database.
      for (String stem : stemmer.findStems(word, pos)) {
        IIndexWord iw = dict.getIndexWord(stem, pos);
        // Add each part of speech at most once, even if several stems match.
        if (iw != null && !parts.contains(pos)) {
          parts.add(pos);
        }
      }
    }
    return parts;
  }

  public static void main(String[] args) {
    try {
      Dictionary dict = new Dictionary(new File("WordNet-3.0/dict"));
      dict.open();
      System.out.println("'like' is a " + getPartsOfSpeech(dict, "like"));
    } catch (IOException e) {
      System.err.println("Error: " + e);
    }
  }
}

And the output:

'like' is a [noun, verb, adjective]

What is the meaning of labels on the arrows of dependency parser graph?

Here are the definitions of the dependency relations and their labels as used for English. When you click on a relation label, you'll get a full explanation with examples.

https://universaldependencies.org/en/dep/index.html

Stanford NLP POS tagging

Of course Stanford CoreNLP can do tagging directly. The following lines of code tag your example, and give you the desired output.

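import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Run only the annotators needed for POS tagging.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos");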

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation("I'm so happy about my marks");
pipeline.annotate(annotation);
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        String word = token.get(CoreAnnotations.TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
        System.out.println(word + "/" + pos);
    }
}
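
The loop above prints each token as word/TAG. If you later need to read such strings back, splitting on the last slash keeps words that themselves contain a slash intact. A minimal sketch, assuming a hypothetical helper class TaggedToken that is not part of CoreNLP:

```java
final class TaggedToken {
    final String word;
    final String tag;

    private TaggedToken(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }

    /** Parses "word/TAG", splitting on the last '/' so the word may contain slashes. */
    static TaggedToken parse(String s) {
        int slash = s.lastIndexOf('/');
        if (slash < 1 || slash == s.length() - 1) {
            throw new IllegalArgumentException("expected word/TAG: " + s);
        }
        return new TaggedToken(s.substring(0, slash), s.substring(slash + 1));
    }

    public static void main(String[] args) {
        TaggedToken t = parse("happy/JJ");
        System.out.println(t.word + " has tag " + t.tag);
    }
}
```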
