Killing Stanford CoreNLP process

You can always CTRL-C in the terminal window to stop the server.

You could also run ps aux | grep StanfordCoreNLPServer to find the PID and then kill the process manually.

When the server is started, it should create a shutdown key, and you can send that key to the server to shut it down. This isn't working on my MacBook Pro (maybe a permissions issue?), but I've seen it work on other machines.

Here is the command:

wget "localhost:9000/shutdown?key=`cat /tmp/corenlp.shutdown`" -O -

Note that the shutdown key is stored at /tmp/corenlp.shutdown.

If you use the -server_id server0 option, the shutdown key will instead be stored at /tmp/corenlp.shutdown.server0.

Standard way to start and stop StanfordCoreNLP server in python?

You should send a shutdown message with the shutdown key. Here is an example command line call:

wget "localhost:9000/shutdown?key=`cat /tmp/corenlp.shutdown`" -O -

You could execute such a command with subprocess or os.system, etc.

Note that the shutdown key is located at /tmp/corenlp.shutdown unless you specify a different name.

If you want to be nicer, you could also use the requests library:

import requests

# Read the shutdown key that the server wrote when it started
with open("/tmp/corenlp.shutdown") as f:
    shutdown_key = f.read().strip()

r = requests.post("http://localhost:9000/shutdown", params={"key": shutdown_key})

That will transmit the shutdown message to the server as well.

Mute Stanford coreNLP logging

Om’s answer is good, but two other possibly useful approaches:

  • If it is just these warnings from the tokenizer that are annoying you, you can (in code or in StanfordCoreNLP.properties) set a property so they disappear: props.setProperty("tokenize.options", "untokenizable=noneKeep"); (see the sketch after this list).
  • If slf4j is on the classpath, then, by default, our own Redwood logger will indeed log through slf4j. So, you can also set the logging level using slf4j.
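
For the first approach, here is a minimal sketch of setting that property in code when building a pipeline (the class name and annotator list are just illustrative assumptions):

    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class QuietTokenizer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        // Keep untokenizable characters in the output, but don't warn about them
        props.setProperty("tokenize.options", "untokenizable=noneKeep");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
      }
    }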

Stanford coreNLP - split words ignoring apostrophe

Currently, no. The subsequent Stanford CoreNLP processing tools all use Penn Treebank tokenization, which splits contractions into two tokens (regarding "I'm" as a reduced form of "I am" by making it the two "words" [I] ['m]). It sounds like you want a different type of tokenization.

While there are some tokenization options, there isn't one to change this, and subsequent tools (like the POS tagger or parser) would work badly without contractions being split. You could add such an option to the tokenizer, changing (deleting) the treatment of REDAUX and SREDAUX trailing contexts.

You can also join contractions via post-processing, as @dhg suggests, but you'd want to do the "if" a little more carefully so that it doesn't join on quotes; a sketch follows.
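
Here is a minimal sketch of such post-processing (plain Java, no CoreNLP dependency; the class and method names are inventions for illustration). The check on the character after the apostrophe is what keeps it from joining on PTB quote tokens such as '':

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class JoinContractions {
      static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
          // A clitic starts with an apostrophe followed by a letter ('m, 's, 're, ...);
          // a quote token like '' fails that test and is left alone.
          boolean clitic = (tok.startsWith("'") && tok.length() > 1
              && Character.isLetter(tok.charAt(1))) || tok.equals("n't");
          if (clitic && !out.isEmpty()) {
            out.set(out.size() - 1, out.get(out.size() - 1) + tok);
          } else {
            out.add(tok);
          }
        }
        return out;
      }

      public static void main(String[] args) {
        System.out.println(join(Arrays.asList("I", "'m", "here", ",", "are", "n't", "you", "?")));
        // -> [I'm, here, ,, aren't, you, ?]
      }
    }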

Stanford coreNLP process many files with a script

By far the best way to process a lot of files with Stanford CoreNLP is to arrange to load the system once - since loading all the various models takes 15 seconds or more, depending on your computer, before any actual document processing is done - and then to process a bunch of files with it. What you have in your update doesn't do that, because running CoreNLP is inside the for loop. A good solution is to use the for loop to make a file list, and then to run CoreNLP once on that file list. The file list is just a text file with one filename per line, so you can make it any way you want (using a script, an editor macro, typing it in yourself), and you can and should check that its contents look correct before running CoreNLP. Based on your update example, the following should work:

dir=/Users/matthew/Workbench
for f in "$dir"/Data/NYTimes/NYTimesCorpus_3/*/*/*/*.txt; do
    echo "$f" >> filelist.txt
done
# You can check here that filelist.txt lists the files you want
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -filelist filelist.txt
# By default, output files are written to the current directory, so you don't need to specify -outputDirectory .

Other notes on earlier tries:

  • -mx600m isn't a reasonable way to run the full CoreNLP pipeline (right through parsing and coref). The sum of all its models is just too large. -mx2g is fine.
  • The best way above doesn't fully extend to the NER case. Stanford NER doesn't take a -filelist option, and if you use -textFiles then the files are concatenated and become one output file, which you may well not want. At present, for NER, you may well need to run it inside the for loop, as in your script for that.
  • I haven't quite decoded how you're getting the error Could not find or load main class .Users.matthew.Workbench.Code.CoreNLP.Stanford-corenlp-full-2015-01-29.edu.stanford.nlp.pipeline.StanfordCoreNLP, but this is happening because you're putting a String (filename?) like that (perhaps with slashes rather than periods) where the java command expects a class name. In that place, there should only be edu.stanford.nlp.pipeline.StanfordCoreNLP as in your updated script or mine.
  • You can't have a dynamic outputDirectory in one call to CoreNLP. You could get the effect that I think you want reasonably efficiently by making one call to CoreNLP per directory using two nested for loops. The outer for loop would iterate over directories, the inner one make a file list from all the files in that directory, which would then be processed in one call to CoreNLP and written to an appropriate output directory based on the input directory in the outer for loop. Someone with more time or bash-fu than me could try to write that....
  • You can certainly also write your own code to call CoreNLP, but then you're responsible for scanning input directories and writing to appropriate output files yourself. What you have looks basically okay, except that the line System.out.println( doc ); won't do anything useful - it just prints out the text you began with. You need something like:

    PrintWriter xmlOut = new PrintWriter("outputFileName.xml");
    pipeline.xmlPrint(doc, xmlOut);
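
Putting that together, here is a minimal sketch of the do-it-yourself route (the class name and the use of command-line arguments for the input paths are assumptions; the pipeline calls are the standard CoreNLP API):

    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    public class ProcessFiles {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);  // load the models once
        for (String path : args) {  // pass the input .txt files on the command line
          Annotation doc = new Annotation(new String(Files.readAllBytes(Paths.get(path))));
          pipeline.annotate(doc);
          try (PrintWriter xmlOut = new PrintWriter(path + ".xml")) {
            pipeline.xmlPrint(doc, xmlOut);
          }
        }
      }
    }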

Using Stanford Core NLP in PHP?

In a past project, we tried searching for a PHP library that would work well with CoreNLP, to no avail. We ended up writing our own wrapper, creating methods that would just run an exec command and give us some of the base functionality we needed.


