Maltparser giving error in NLTK
The MaltParser API in NLTK was recently patched to fix and stabilize the problems reported in these questions:
- How to use malt parser in python nltk
- Malt Parser throwing class not found exception
- MaltParser Not Working in Python NLTK
Here's an example of how to use the MaltParser API in NLTK:
# Upgrade your NLTK.
alvas@ubi:~$ cd ~
alvas@ubi:~$ pip install -U nltk
# Get the latest MaltParser and model
alvas@ubi:~$ wget http://maltparser.org/dist/maltparser-1.8.1.zip
alvas@ubi:~$ unzip maltparser-1.8.1.zip
alvas@ubi:~$ wget http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco
# In python, now you can do this:
alvas@ubi:~$ python
>>> from nltk.parse.malt import MaltParser
>>> mp = MaltParser('/home/alvas/maltparser-1.8.1', '/home/alvas/engmalt.poly-1.7.mco')
>>> sent1 = 'I shot an elephant in my pajamas .'.split()
>>> print(mp.parse_one(sent1).tree())
(shot I (elephant an (in (pajamas my))) .)
(See here for more demo code, or here for a more elaborate demo.)
Note that you can also export environment variables to avoid passing full paths when initializing the MaltParser
object. You still have to tell the object the name of the parser directory and the model filename to look for, e.g.
alvas@ubi:~$ export MALT_PARSER="$HOME/maltparser-1.8.1/"
alvas@ubi:~$ export MALT_MODEL="$HOME/engmalt.poly-1.7.mco"
alvas@ubi:~$ python
>>> from nltk.parse.malt import MaltParser
>>> mp = MaltParser('maltparser-1.8.1', 'engmalt.poly-1.7.mco')
>>> sent1 = 'I shot an elephant in my pajamas .'.split()
>>> print(mp.parse_one(sent1).tree())
(shot I (elephant an (in (pajamas my))) .)
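Before constructing the parser with short names, it can help to confirm that both variables are set and point at real paths. The following is a minimal sketch; check_malt_env is a hypothetical helper, not part of NLTK, and it only assumes the MALT_PARSER and MALT_MODEL names used in the export commands above.

```python
import os


def check_malt_env(environ=None):
    """Return a dict describing problems with the MaltParser variables.

    Hypothetical helper, not part of NLTK. The variable names MALT_PARSER
    and MALT_MODEL are taken from the export commands above.
    """
    environ = os.environ if environ is None else environ
    problems = {}

    # MALT_PARSER should point at the unzipped MaltParser directory.
    parser_dir = environ.get("MALT_PARSER")
    if not parser_dir:
        problems["MALT_PARSER"] = "not set"
    elif not os.path.isdir(parser_dir):
        problems["MALT_PARSER"] = "not a directory: %s" % parser_dir

    # MALT_MODEL should point at the downloaded .mco model file.
    model_file = environ.get("MALT_MODEL")
    if not model_file:
        problems["MALT_MODEL"] = "not set"
    elif not os.path.isfile(model_file):
        problems["MALT_MODEL"] = "not a file: %s" % model_file

    return problems


if __name__ == "__main__":
    for name, problem in check_malt_env().items():
        print("%s: %s" % (name, problem))
```

An empty result means both variables are set and the paths exist. It does not check that the directory actually contains a MaltParser jar.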
NLTK MaltParser won't parse
I am not sure whether the problem is still unsolved (I think it has already been solved),
but as I had the same problem a while ago, I would like to share what I learned.
First of all, the MaltParser jar does not accept a .conll file name with a directory path in front of it, as seen above.
Why that is, I do not know.
But you can easily fix it by changing the command line to something like this:
cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % self.working_dir, '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']
Here the directory of the .conll file is set with the -w parameter, so you can load a .conll file from any folder.
I also changed tempfile.gettempdir() to self.working_dir, because the original NLTK version always sets /tmp/ as the working directory, even if you initialise MaltParser with a different working directory.
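The same command can be sketched as a standalone function. build_malt_command is a hypothetical helper, not part of NLTK. It uses the flags from the command line above (-w, -c, -i, -o, -m) but, unlike that snippet, puts every flag and value in its own list element, on the assumption that the list is passed straight to subprocess instead of being joined into a shell string.

```python
def build_malt_command(malt_jar, working_dir, model, input_file, output_file):
    """Build an argument list for running MaltParser in parse mode.

    Hypothetical helper, not part of NLTK. The flags are the ones used in
    the command line above; the parameter names are illustrative.
    """
    return [
        "java", "-jar", malt_jar,
        "-w", working_dir,   # working directory, instead of always using /tmp/
        "-c", model,         # model (.mco) name
        "-i", input_file,    # input .conll file
        "-o", output_file,   # output .conll file
        "-m", "parse",       # run in parse mode
    ]
```

The resulting list could then be run with subprocess.run(cmd, check=True), which raises an error if Java exits with a non-zero status.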
I hope this information helps someone.
One more thing:
if you want to parse many sentences at once, but each one individually and independently of the others, you have to separate the sentences with a blank line in the input.conll file and restart the token numbering at 1 for each sentence.
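The blank-line and renumbering rule can be sketched as follows. to_conll is a hypothetical helper, and the ten-column layout (token index, word, then the POS tag repeated in columns four and five, with underscores elsewhere) is an assumption based on the common CoNLL-X format; adjust the columns to whatever your model expects.

```python
def to_conll(tagged_sentences):
    """Serialise POS-tagged sentences as CoNLL-style text.

    Hypothetical helper. Each sentence is a list of (word, tag) pairs.
    Token numbering restarts at 1 for every sentence, and a blank line
    is written after each sentence.
    """
    lines = []
    for sentence in tagged_sentences:
        for index, (word, tag) in enumerate(sentence, start=1):
            # Assumed 10-column layout; unused columns are filled with "_".
            columns = [str(index), word, "_", tag, tag, "_", "_", "_", "_", "_"]
            lines.append("\t".join(columns))
        lines.append("")  # blank line marks the end of the sentence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentences = [
        [("I", "PRP"), ("ran", "VBD")],
        [("Stop", "VB")],
    ]
    with open("input.conll", "w") as handle:
        handle.write(to_conll(sentences))
```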
How to use malt parser in python nltk
Edited
Note that this answer no longer works, because the MaltParser API in NLTK was updated in August 2015. It is kept for legacy's sake.
Please see this answer to get MaltParser working with NLTK:
- Step by step to getting malt parser in NLTK to work?
Disclaimer: This is not a permanent solution. The answer in the above link (posted in Feb 2016) works for now, but when the MaltParser or NLTK API changes, the syntax for using MaltParser in NLTK may change as well.
A couple of problems with your setup:
- The input to train_from_file must be a file in CoNLL format, not a pre-trained model. For an mco file, you pass it to the MaltParser constructor using the mco and working_directory parameters.
- The default java heap allocation is not large enough to load that particular mco file, so you'll have to tell java to use more heap space with the -Xmx parameter. Unfortunately this wasn't possible with the existing code, so I just checked in a change that adds a constructor parameter for additional java args. See here.
So here's what you need to do:
First, get the latest NLTK revision:
git clone https://github.com/nltk/nltk.git
(NOTE: If you can't use the git version of NLTK, then you'll have to update the file malt.py manually or copy it from here to have your own version.)
Second, rename the jar file to malt.jar, which is what NLTK expects:
cd /usr/lib/
ln -s maltparser-1.7.2.jar malt.jar
Then add an environment variable pointing to malt parser:
export MALTPARSERHOME="/Users/dhg/Downloads/maltparser-1.7.2"
Finally, load and use malt parser in python:
>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/home/rohith/malt-1.7.2",
... mco="engmalt.linear-1.7",
... additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
'(This (sentence is a test))'