Change nltk.download() path directory from default ~/nltk_data
This can be configured either from code (nltk.download(..., download_dir=...))
or through the GUI. Bizarrely, nltk seems to totally ignore its own environment variable NLTK_DATA
and defaults its download directories to a standard set of five paths, regardless of whether NLTK_DATA
is defined and where it points, and regardless of whether nltk's five default dirs even exist on the machine or architecture(!). Some of that is documented in Installing NLTK Data, although it's incomplete and kinda buried; reproduced below with much clearer formatting:
Command line installation
The downloader will search for an existing nltk_data directory to
install NLTK data. If one does not exist it will attempt to create one
in a central location (when using an administrator account) or
otherwise in the user’s filespace. If necessary, run the download
command from an administrator account, or using sudo. The recommended
system location is: C:\nltk_data (Windows); /usr/local/share/nltk_data (Mac); and /usr/share/nltk_data (Unix).
You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).
Run the command
python -m nltk.downloader all
To ensure central installation, run the command:
sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
But really they should say:
sudo python -m nltk.downloader -d $NLTK_DATA all
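The same substitution can be made from Python by passing download_dir explicitly. What follows is only a minimal sketch: it assumes NLTK_DATA may hold several directories separated by os.pathsep and that the first one is the intended target. The names resolve_download_dir, download_to_nltk_data and FALLBACK_DIR are made up for illustration and are not part of nltk.

```python
import os

# Hypothetical fallback: the central location quoted in the text above.
FALLBACK_DIR = "/usr/local/share/nltk_data"


def resolve_download_dir(environ=None, fallback=FALLBACK_DIR):
    """Return the first directory listed in NLTK_DATA, or the fallback."""
    environ = os.environ if environ is None else environ
    entries = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    return entries[0] if entries else fallback


def download_to_nltk_data(package, environ=None):
    """Download one package into the directory NLTK_DATA points at."""
    import nltk  # imported lazily so the helper above works without nltk

    target = resolve_download_dir(environ)
    # download_dir overrides the downloader's own choice of directory.
    return nltk.download(package, download_dir=target)
```

Calling download_to_nltk_data("wordnet") would then place the package under $NLTK_DATA, provided the process has write permission there.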
Now as to what recommended path NLTK_DATA should use, nltk doesn't really give any proper guidance, but it should be a generic standalone path not under any install tree (so not under <python-install-directory>/lib/site-packages) or any user dir. Hence /usr/local/share, /opt/share or similar. On MacOS 10.7+, /usr and thus /usr/local/ are hidden by default these days, so /opt/share may well be a better choice. Or do chflags nohidden /usr/local/share.
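The "not under any install tree or user dir" rule can be checked mechanically. This is a rough sketch, assuming POSIX-style absolute paths; is_standalone and excluded_roots are hypothetical helper names, and treating the interpreter's site-packages directory as "the install tree" is my own simplification.

```python
import os
import sysconfig


def excluded_roots():
    """Directories NLTK_DATA should stay out of: site-packages and the user dir."""
    return [sysconfig.get_paths()["purelib"], os.path.expanduser("~")]


def is_standalone(candidate, roots=None):
    """Return True if candidate does not live under any excluded root."""
    roots = excluded_roots() if roots is None else roots
    candidate = os.path.abspath(candidate)
    for root in roots:
        root = os.path.abspath(root)
        # commonpath equals root exactly when candidate is root or below it.
        if os.path.commonpath([candidate, root]) == root:
            return False
    return True
```

With the roots passed explicitly, is_standalone("/opt/share/nltk_data", ["/home/user"]) accepts the path, while a candidate such as /home/user/nltk_data is rejected.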
Is there a way to explicitly specify an alternative location for NLTK's corpora/wordnet?
Have you tried adding the following line to your script?
nltk.data.path.append('/home/user/some_directory/nltk_data/')
Regards,
Grzegorz
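A slightly fuller sketch of that suggestion, for scripts that may run the setup more than once. It assumes nltk.data.path is the plain list of search directories; add_nltk_data_dir and use_custom_dir are illustrative names, and the directory is the placeholder from the answer above. The new entry is put at the front of the list so it is searched before the default locations.

```python
import os


def add_nltk_data_dir(search_path, directory):
    """Put directory at the front of an NLTK-style search list, once."""
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory not in search_path:
        search_path.insert(0, directory)
    return search_path


def use_custom_dir(directory="/home/user/some_directory/nltk_data/"):
    """Register directory with nltk before any corpus is loaded."""
    import nltk  # imported lazily; requires nltk to be installed

    return add_nltk_data_dir(nltk.data.path, directory)
```

Call use_custom_dir() near the top of the script, before the first access to nltk.corpus.wordnet or any other corpus.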
Paths in AWS lambda with Python NLTK
So I've found the answer to this question. After a couple of days messing around I've finally figured it out. The data.py file in the nltk folder needs to be modified as follows. Basically, remove the /usr/... paths, add the folder that Lambda executes from (/var/task/), and ensure that your nltk_data folder is in the root of your execution zip.
Not sure why, but using the inline nltk.data.path.append() method did not work for me with AWS Lambda, so the data.py file had to be modified directly.
else:
    # Common locations on UNIX & OS X:
    path += [
        str('/var/task/nltk_data'),
        #str('/usr/share/nltk_data'),
        #str('/usr/local/share/nltk_data'),
        #str('/usr/lib/nltk_data'),
        #str('/usr/local/lib/nltk_data')
    ]
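An alternative that avoids patching data.py might be to set NLTK_DATA before nltk is imported, since nltk.data builds its search path from that variable at import time. This sketch has not been verified against the Lambda setup described above; point_nltk_at and LAMBDA_DATA_DIR are hypothetical names used only for illustration.

```python
import os

# Assumption: nltk_data sits in the root of the deployment zip, as described above.
LAMBDA_DATA_DIR = "/var/task/nltk_data"


def point_nltk_at(directory, environ=None):
    """Put directory first in NLTK_DATA; must run before `import nltk`."""
    environ = os.environ if environ is None else environ
    existing = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    if directory not in existing:
        existing.insert(0, directory)
    environ["NLTK_DATA"] = os.pathsep.join(existing)
    return environ["NLTK_DATA"]
```

In a handler module this would mean calling point_nltk_at(LAMBDA_DATA_DIR) on the first lines, with import nltk placed after the call.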