Killing Stanford core nlp process
You can always press CTRL-C in the terminal window to stop the server.
You could also run ps aux | grep StanfordCoreNLPServer to find the PID and then kill the process manually.
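That ps-and-kill route can also be scripted, e.g. in Python (a sketch: the helper name is mine, and it assumes pgrep is available and that SIGTERM is enough to stop the server):

```python
import os
import signal
import subprocess

def kill_corenlp_server(pattern="StanfordCoreNLPServer"):
    """Find server processes whose command line matches `pattern`
    and send each one SIGTERM. Returns the list of signalled PIDs."""
    # pgrep -f matches against the full command line, which is where
    # the StanfordCoreNLPServer class name appears.
    result = subprocess.run(["pgrep", "-f", pattern],
                            capture_output=True, text=True)
    pids = [int(p) for p in result.stdout.split()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```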
When the server is started, it should create a shutdown key, and you can send that key to the server to make it shut down. This isn't working on my MacBook Pro (maybe a permission issue?), but I've seen it work on other machines.
Here is the command:
wget "localhost:9000/shutdown?key=`cat /tmp/corenlp.shutdown`" -O -
Note the shutdown key is stored at /tmp/corenlp.shutdown
If you use the -server_id server0 option, the shutdown key will instead be stored at /tmp/corenlp.shutdown.server0
Standard way to start and stop StanfordCoreNLP server in python?
You should send a shutdown message with the shutdown key. Here is an example command line call:
wget "localhost:9000/shutdown?key=`cat /tmp/corenlp.shutdown`" -O -
You could execute such a command with subprocess or os.system, etc.
Note that the shutdown key is located at /tmp/corenlp.shutdown unless you specify a different name.
If you want to be nicer you could also use the requests library:
import requests

url = "http://localhost:9000/shutdown"
# the server writes its shutdown key to this file at startup
with open("/tmp/corenlp.shutdown") as f:
    shutdown_key = f.read().strip()
r = requests.post(url, params={"key": shutdown_key})
That will transmit the shutdown message to the server as well.
Mute Stanford coreNLP logging
Om’s answer is good, but two other possibly useful approaches:
- If it is just these warnings from the tokenizer that are annoying you, you can (in code or in StanfordCoreNLP.properties) set a property so they disappear:
props.setProperty("tokenize.options", "untokenizable=NoneKeep");
- If slf4j is on the classpath, then, by default, our own Redwood logger will indeed log through slf4j. So, you can also set the logging level using slf4j.
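The exact knob for the slf4j route depends on which slf4j binding is on your classpath; as one illustrative example (not from the original answer), with the slf4j-simple binding a simplelogger.properties file on the classpath raises the threshold:

```properties
# simplelogger.properties (slf4j-simple binding; other bindings such as
# logback or log4j use their own configuration files)
org.slf4j.simpleLogger.defaultLogLevel=error
```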
Stanford coreNLP - split words ignoring apostrophe
Currently, no. The subsequent Stanford CoreNLP processing tools all use Penn Treebank tokenization, which splits contractions into two tokens (regarding "I'm" as a reduced form of "I am" by making it the two "words" [I] ['m]). It sounds like you want a different type of tokenization.
While there are some tokenization options, there isn't one to change this, and subsequent tools (like the POS tagger or parser) would work badly without contractions being split. You could add such an option to the tokenizer, changing (deleting) the treatment of REDAUX and SREDAUX trailing contexts.
You can also join contractions via post processing as @dhg suggests, but you'd want to do it a little more carefully in the "if" so it didn't join on quotes.
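Such a post-processing pass could be sketched like this (a sketch, not part of CoreNLP; the suffix list is illustrative, and only known contraction suffixes are joined, so standalone quote tokens are left alone):

```python
def join_contractions(tokens):
    """Re-attach PTB-style contraction tokens ('m, 're, n't, ...) to the
    preceding word. Plain quote tokens are not in the suffix set, so
    quoted material is not accidentally glued together."""
    suffixes = {"'m", "'re", "'s", "'ve", "'ll", "'d", "n't"}
    out = []
    for tok in tokens:
        if out and tok.lower() in suffixes:
            out[-1] += tok  # e.g. ["I", "'m"] -> ["I'm"]
        else:
            out.append(tok)
    return out
```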
stanford coreNLP process many files with a script
By far the best way to process a lot of files with Stanford CoreNLP is to arrange to load the system once - since loading all the various models takes 15 seconds or more, depending on your computer, before any actual document processing is done - and then to process a bunch of files with it. What you have in your update doesn't do that, because running CoreNLP is inside the for loop. A good solution is to use the for loop to make a file list and then to run CoreNLP once on the file list. The file list is just a text file with one filename per line, so you can make it any way you want (using a script, an editor macro, typing it in yourself), and you can and should check that its contents look correct before running CoreNLP. For your example, based on your update, the following should work:
dir=/Users/matthew/Workbench
for f in $dir/Data/NYTimes/NYTimesCorpus_3/*/*/*/*.txt; do
echo "$f" >> filelist.txt
done
# You can here check that filelist.txt has in it the files you want
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -filelist filelist.txt
# By default output files are written to the current directory, so you don't need to specify -outputDirectory .
Other notes on earlier tries:
- -mx600m isn't a reasonable way to run the full CoreNLP pipeline (right through parsing and coref). The sum of all its models is just too large. -mx2g is fine.
- The best way above doesn't fully extend to the NER case. Stanford NER doesn't take a -filelist option, and if you use -textFiles then the files are concatenated and become one output file, which you may well not want. At present, for NER, you may well need to run it inside the for loop, as in your script for that.
- I haven't quite decoded how you're getting the error Could not find or load main class .Users.matthew.Workbench.Code.CoreNLP.Stanford-corenlp-full-2015-01-29.edu.stanford.nlp.pipeline.StanfordCoreNLP, but this is happening because you're putting a String (a filename?) like that (perhaps with slashes rather than periods) where the java command expects a class name. In that place, there should only be edu.stanford.nlp.pipeline.StanfordCoreNLP, as in your updated script or mine.
- You can't have a dynamic outputDirectory in one call to CoreNLP. You could get the effect that I think you want reasonably efficiently by making one call to CoreNLP per directory, using two nested for loops. The outer for loop would iterate over directories, the inner one would make a file list from all the files in that directory, which would then be processed in one call to CoreNLP and written to an appropriate output directory based on the input directory in the outer for loop. Someone with more time or bash-fu than me could try to write that....
- You can certainly also write your own code to call CoreNLP, but then you're responsible for scanning input directories and writing to appropriate output files yourself. What you have looks basically okay, except the line System.out.println(doc); won't do anything useful - it just prints out the text you began with. You need something like:
PrintWriter xmlOut = new PrintWriter("outputFileName.xml");
pipeline.xmlPrint(doc, xmlOut);
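The per-directory idea in the notes above could also be sketched in Python rather than bash (a sketch under assumptions: the helper names and output layout are mine, and the java invocation mirrors the command shown earlier):

```python
import os
import subprocess

def build_filelists(input_root, output_root):
    """Walk input_root; for each directory containing .txt files, write a
    file list and create a mirrored output directory. Returns a list of
    (filelist_path, output_dir) pairs, one per CoreNLP call to make."""
    jobs = []
    for dirpath, _dirs, filenames in os.walk(input_root):
        txt = sorted(os.path.join(dirpath, f)
                     for f in filenames if f.endswith(".txt"))
        if not txt:
            continue
        out_dir = os.path.join(output_root,
                               os.path.relpath(dirpath, input_root))
        os.makedirs(out_dir, exist_ok=True)
        filelist = os.path.join(out_dir, "filelist.txt")
        with open(filelist, "w") as fl:
            fl.write("\n".join(txt) + "\n")
        jobs.append((filelist, out_dir))
    return jobs

def run_corenlp_per_directory(input_root, output_root):
    # One CoreNLP call per directory, so the models load once per
    # directory instead of once per file.
    for filelist, out_dir in build_filelists(input_root, output_root):
        subprocess.run(
            ["java", "-cp", "*", "-Xmx2g",
             "edu.stanford.nlp.pipeline.StanfordCoreNLP",
             "-annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref",
             "-filelist", filelist,
             "-outputDirectory", out_dir],
            check=True)
```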
Using Stanford Core NLP in PHP?
In a past project, we tried searching for a PHP library that would work well with Core NLP to no avail. We ended up writing our own wrapper by creating methods that would just run an exec command and get us some of the base functionality we needed.