Java Stanford NLP: Part of Speech labels?
The tags come from the Penn Treebank Project. Look at the Part-of-speech tagging ps file there.
JJ is adjective. NNS is noun, plural. VBP is verb, non-3rd person singular present. RB is adverb.
That's for English. For Chinese, it's the Penn Chinese Treebank. And for German, it's the NEGRA corpus.
- CC Coordinating conjunction
- CD Cardinal number
- DT Determiner
- EX Existential there
- FW Foreign word
- IN Preposition or subordinating conjunction
- JJ Adjective
- JJR Adjective, comparative
- JJS Adjective, superlative
- LS List item marker
- MD Modal
- NN Noun, singular or mass
- NNS Noun, plural
- NNP Proper noun, singular
- NNPS Proper noun, plural
- PDT Predeterminer
- POS Possessive ending
- PRP Personal pronoun
- PRP$ Possessive pronoun
- RB Adverb
- RBR Adverb, comparative
- RBS Adverb, superlative
- RP Particle
- SYM Symbol
- TO to
- UH Interjection
- VB Verb, base form
- VBD Verb, past tense
- VBG Verb, gerund or present participle
- VBN Verb, past participle
- VBP Verb, non-3rd person singular present
- VBZ Verb, 3rd person singular present
- WDT Wh-determiner
- WP Wh-pronoun
- WP$ Possessive wh-pronoun
- WRB Wh-adverb
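Many of the tags in this list share a prefix with their base category (NN, NNS, NNP and NNPS are all nouns, for instance), so a tag can be collapsed to a coarse category by looking at its first two letters. The following is a minimal sketch of that idea, assuming only the four open-class groups matter; the class and method names are hypothetical and not part of Stanford NLP.

```java
// Hypothetical helper; not part of Stanford NLP or the Penn Treebank tools.
class CoarsePos {
    /** Collapses a Penn Treebank tag to a coarse category by its prefix. */
    static String coarse(String tag) {
        if (tag.startsWith("NN")) return "noun";      // NN, NNS, NNP, NNPS
        if (tag.startsWith("VB")) return "verb";      // VB, VBD, VBG, VBN, VBP, VBZ
        if (tag.startsWith("JJ")) return "adjective"; // JJ, JJR, JJS
        if (tag.startsWith("RB")) return "adverb";    // RB, RBR, RBS
        return "other";                               // everything else, e.g. DT, IN, MD
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"JJ", "NNS", "VBP", "RB", "DT"}) {
            System.out.println(tag + " -> " + coarse(tag));
        }
    }
}
```

Modal verbs (MD) fall under "other" in this sketch, since the tag set keeps them apart from the VB tags.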
Specific Part of Speech labels for Java Stanford NLP
It is the Penn Treebank POS set, but many descriptions of this tag set seem to omit punctuation marks. Here is a complete list of tags:
https://www.eecis.udel.edu/~vijay/cis889/ie/pos-set.pdf
(Note that parentheses are tagged as -LRB- and -RRB-; I'm not sure why the documentation doesn't mention this.)
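If you want to display the original characters, the bracket symbols can be mapped back to literal parentheses. This is only a sketch covering the two symbols mentioned here; the class and method names are hypothetical, and any other token is passed through unchanged.

```java
// Hypothetical helper; not part of Stanford NLP.
class PtbBrackets {
    /** Returns the literal parenthesis for -LRB- / -RRB-, or the token unchanged. */
    static String toLiteral(String token) {
        switch (token) {
            case "-LRB-": return "("; // left round bracket
            case "-RRB-": return ")"; // right round bracket
            default:      return token;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"-LRB-", "see", "above", "-RRB-"};
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(toLiteral(t)).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```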
Stanford NLP: Arabic Part of Speech labels?
First, I am not sure of this answer, but I hope it helps.
What you are asking for should be at the following link: "POS tag set does the parser use?" (unfortunately, many of the links there are broken).
As mentioned there, you can find the tag set (what you call labels) in the file atb1-v4.1-taglist-conversion-to-PennPOS-forrelease.lisp. As I understand it, the file maps each sequence of one or more Arabic Treebank (ATB) tags to a sequence of one or more Penn-style POS tags. For example, the mapping:
(DET+ADJ_COMP+NSUFF_MASC_DU_GEN DT+JJR)
means that the Arabic Treebank tag sequence (DET+ADJ_COMP+NSUFF_MASC_DU_GEN) is mapped to the Penn-style tags (DT+JJR).
Regarding your example (المدرسة/DTNN): the tag consists of two tags (DT + NN), where DT is the determiner (الــ, pronounced 'Al') and NN is "Noun, singular or mass"; see the Penn Treebank tag set.
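To read a fused tag such as DTNN as its two parts, you can peel off the leading DT. The sketch below assumes that every fused tag is simply "DT" followed by an ordinary Penn tag, which is my reading of the example and not something I have checked against the full tag list; the class and method names are hypothetical.

```java
// Hypothetical helper; not part of Stanford NLP.
class FusedTagSplitter {
    /**
     * Splits a fused tag such as "DTNN" into {"DT", "NN"}.
     * A tag that does not start with "DT", or is exactly "DT",
     * is returned as a single-element array.
     */
    static String[] split(String tag) {
        if (tag.startsWith("DT") && tag.length() > 2) {
            return new String[] {"DT", tag.substring(2)};
        }
        return new String[] {tag};
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"DTNN", "NN", "DT"}) {
            System.out.println(tag + " -> " + String.join(" + ", split(tag)));
        }
    }
}
```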
Stanford NLP: Chinese Part of Speech labels?
We use the tag set of the (Penn/LDC/Brandeis/UC Boulder) Chinese Treebank.
See here for details on the tag set: http://www.cis.upenn.edu/~chinese/
This was documented in the parser FAQ, but I'll add it to the tagger FAQ.
Number of possible parts of speech of a word
Stanford CoreNLP doesn't seem to have an interface to WordNet, but it's pretty easy to do this with one of the other small Java WordNet libraries. For this example, I used JWI 2.3.3.
Besides JWI, you'll need to download a copy of the WordNet database. For example, you can download WordNet-3.0.tar.gz from Princeton. Untar the dictionary.
The following code includes a function that returns a list of the possible parts of speech for a word:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.morph.WordnetStemmer;

public class WNDemo {
    /**
     * Given a dictionary and a word, find all the parts of speech the
     * word can be.
     */
    public static Collection<POS> getPartsOfSpeech(Dictionary dict, String word) {
        ArrayList<POS> parts = new ArrayList<POS>();
        WordnetStemmer stemmer = new WordnetStemmer(dict);
        // Check every part of speech.
        for (POS pos : POS.values()) {
            // Check every stem, because WordNet doesn't have every surface
            // form in its database.
            for (String stem : stemmer.findStems(word, pos)) {
                IIndexWord iw = dict.getIndexWord(stem, pos);
                if (iw != null) {
                    parts.add(pos);
                    // One matching stem is enough; stop so that the same
                    // part of speech is not added twice.
                    break;
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        try {
            // Path to the untarred WordNet dictionary directory.
            Dictionary dict = new Dictionary(new File("WordNet-3.0/dict"));
            dict.open();
            System.out.println("'like' is a " + getPartsOfSpeech(dict, "like"));
        } catch (IOException e) {
            System.err.println("Error: " + e);
        }
    }
}
And the output:
'like' is a [noun, verb, adjective]
What is the meaning of labels on the arrows of dependency parser graph?
Here are the definitions of the dependency relations and their labels as used for English. When you click on a relation label, you'll get a full explanation with examples.
https://universaldependencies.org/en/dep/index.html
Stanford NLP POS tagging
Of course Stanford CoreNLP can do tagging directly. The following lines of code tag your example, and give you the desired output.
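import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// The class name is arbitrary; the CoreNLP jar and its English models must be on the classpath.
public class PosTaggingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation annotation = new Annotation("I'm so happy about my marks");
        pipeline.annotate(annotation);
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                // this is the POS tag of the token
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                System.out.println(word + "/" + pos);
            }
        }
    }
}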
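Once you have output in the word/TAG form printed above, a small amount of plain Java is enough to group the words by tag. This is only a sketch: the class name, the method name and the sample lines are made up for illustration and are not real tagger output. It splits each line at the last slash so that a word containing a slash is kept whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Stanford CoreNLP.
class TaggedOutputReader {
    /** Groups words by tag, reading lines of the form word/TAG. */
    static Map<String, List<String>> groupByTag(List<String> lines) {
        Map<String, List<String>> byTag = new LinkedHashMap<>();
        for (String line : lines) {
            int slash = line.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip lines with no slash or an empty word
            }
            String word = line.substring(0, slash);
            String tag = line.substring(slash + 1);
            byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(word);
        }
        return byTag;
    }

    public static void main(String[] args) {
        // Illustrative input only, written by hand.
        List<String> lines = Arrays.asList("happy/JJ", "so/RB", "very/RB", "marks/NNS");
        for (Map.Entry<String, List<String>> e : groupByTag(lines).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```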