How to install NLTK data on Windows (Anaconda)
After installing nltk using pip, run the following code in IPython:
import nltk
nltk.download()
This opens a GUI where you can download all the data.
If you only want specific packages, you can select them individually in the same GUI.
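If you'd rather not click through the GUI (for example on a server), nltk.download() also accepts a package id. The sketch below assumes you know which resource paths your code loads; the helper name ensure_nltk_resources and the RESOURCES mapping are hypothetical. It checks each resource with nltk.data.find, which raises LookupError when the data is missing, and downloads only the missing packages. The optional find/download parameters exist only so the logic can be tried without nltk or a network connection.

```python
def ensure_nltk_resources(resources, find=None, download=None):
    """Download the NLTK packages whose resources cannot be found.

    resources maps a resource path (as passed to nltk.data.find) to the
    package id to download when that path is missing. Returns the list of
    package ids that were downloaded.
    """
    if find is None or download is None:
        import nltk  # only needed when the real finder/downloader are used

        find = find or nltk.data.find
        download = download or nltk.download

    missing = []
    for resource_path, package_id in resources.items():
        try:
            find(resource_path)
        except LookupError:  # nltk.data.find raises LookupError for missing data
            missing.append(package_id)

    for package_id in missing:
        # nltk.download returns a false value when the download fails;
        # quiet=True suppresses its progress output.
        if not download(package_id, quiet=True):
            raise RuntimeError(f"could not download NLTK package {package_id!r}")
    return missing


# Hypothetical example mapping; adjust it to the resources your code loads.
RESOURCES = {
    "corpora/stopwords": "stopwords",
    "tokenizers/punkt": "punkt",
}
```

In a real session you would simply call ensure_nltk_resources(RESOURCES), which then uses nltk.data.find and nltk.download.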
Change the nltk.download() path directory from the default ~/nltk_data
This can be configured either in code (nltk.download(..., download_dir=...)) or in the GUI.
Bizarrely, nltk seems to totally ignore its own environment variable NLTK_DATA and default its download directories to a standard set of five paths, regardless of whether NLTK_DATA is defined and where it points, and regardless of whether nltk's five default dirs even exist on the machine or architecture(!).
Some of that is documented in Installing NLTK Data, although it's incomplete and kinda buried; it's reproduced below with much clearer formatting:
Command line installation
The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is:
C:\nltk_data (Windows);
/usr/local/share/nltk_data (Mac); and
/usr/share/nltk_data (Unix).
You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).
Run the command
python -m nltk.downloader all
To ensure central installation, run the command:
sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
But really they should say:
sudo python -m nltk.downloader -d $NLTK_DATA all
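From Python, the same idea can be sketched by reading NLTK_DATA yourself and passing it as download_dir. This is a minimal sketch, not nltk's own lookup logic: the helper names are hypothetical, and it assumes NLTK_DATA may list several directories separated by os.pathsep (as PATH does), downloading into the first one. It also appends that directory to nltk.data.path so the current process can find what it just downloaded.

```python
import os


def nltk_data_dir(environ=os.environ):
    """Return the first directory listed in NLTK_DATA.

    Entries are assumed to be separated by os.pathsep
    (';' on Windows, ':' elsewhere), like PATH.
    """
    entries = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    if not entries:
        raise RuntimeError("NLTK_DATA is not set")
    return entries[0]


def download_into_nltk_data(package_id, environ=os.environ):
    """Download package_id into $NLTK_DATA, like `-d $NLTK_DATA` on the CLI."""
    import nltk  # imported here so nltk_data_dir() works without nltk installed

    target = nltk_data_dir(environ)
    os.makedirs(target, exist_ok=True)
    # Make sure this process also searches the directory we download into.
    if target not in nltk.data.path:
        nltk.data.path.append(target)
    return nltk.download(package_id, download_dir=target)
```

For example, with NLTK_DATA=/opt/share/nltk_data exported, download_into_nltk_data('stopwords') passes /opt/share/nltk_data as download_dir.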
Now as to what path NLTK_DATA should use, nltk doesn't really give any proper guidance, but it should be a generic standalone path not under any install tree (so not under <python-install-directory>/lib/site-packages) or any user dir. Hence /usr/local/share, /opt/share or similar.
On macOS 10.7+, /usr and thus /usr/local/ are hidden by default these days, so /opt/share may well be a better choice. Or run chflags nohidden /usr/local/share.
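The "not under an install tree or user dir" rule can be turned into a quick sanity check before exporting NLTK_DATA. This is only a heuristic sketch with a hypothetical helper name: it treats the current interpreter's site-packages directories (as reported by sysconfig) and the user's home directory as off-limits, and says nothing about permissions or whether the directory is hidden.

```python
import os
import sysconfig


def _is_inside(path, root):
    """True if path equals root or lies somewhere below it."""
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def _norm(path):
    # Resolve symlinks and normalise case so comparisons are consistent.
    return os.path.normcase(os.path.realpath(path))


def is_standalone_data_dir(path, home=None):
    """Heuristic check for a shared NLTK_DATA location.

    Rejects paths inside the user's home directory or inside the current
    interpreter's site-packages directories (purelib/platlib).
    """
    candidate = _norm(path)
    paths = sysconfig.get_paths()
    roots = [
        _norm(home or os.path.expanduser("~")),
        _norm(paths["purelib"]),
        _norm(paths["platlib"]),
    ]
    return not any(_is_inside(candidate, root) for root in roots)
```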
Download NLTK corpus as cmdclass in setup.py files not working
Pass the class, not an instance of it:
cmdclass={'download_nltk': DownloadNLTK}
(no () after DownloadNLTK, since that would instantiate the class)
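For context, a complete command class might look like the sketch below. The class body is an assumption (the question's own DownloadNLTK isn't shown here), as is the hypothetical nltk_packages list; the point is that cmdclass maps the command name to the class object, and setuptools creates the instance itself when the command runs.

```python
from setuptools import Command


class DownloadNLTK(Command):
    """Custom `python setup.py download_nltk` command (a sketch)."""

    description = "download the NLTK data packages this project needs"
    user_options = []  # this command takes no command-line options

    # Hypothetical package list; replace with the ones your code uses.
    nltk_packages = ["stopwords", "punkt"]

    def initialize_options(self):
        pass  # required by Command; nothing to initialise

    def finalize_options(self):
        pass  # required by Command; nothing to validate

    def run(self):
        # Import lazily: nltk may not be importable when setup.py is parsed.
        import nltk

        for package_id in self.nltk_packages:
            nltk.download(package_id)


# Pass the class itself, not DownloadNLTK(); setuptools instantiates it.
CMDCLASS = {"download_nltk": DownloadNLTK}
# In setup.py: setup(name=..., cmdclass=CMDCLASS)
```

With cmdclass=CMDCLASS passed to setup(), the command would be invoked as python setup.py download_nltk.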
How to install nltk_data as package with pip?
The bottom of the NLTK data documentation explains this:
Run the command
python -m nltk.downloader all
To ensure central installation, run the command
sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
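If you'd rather trigger the command-line downloader from a Python build or install script, one option is to run it through subprocess. A minimal sketch with hypothetical helper names; using sys.executable makes the downloader run under the same interpreter (and so the same Anaconda or virtual environment) as the calling script. Writing to /usr/local/share still needs administrator rights, as the commands above imply.

```python
import subprocess
import sys


def downloader_command(target_dir, package_ids=("all",), python=sys.executable):
    """Build the argv for `python -m nltk.downloader -d <dir> <ids...>`."""
    return [python, "-m", "nltk.downloader", "-d", target_dir, *package_ids]


def run_downloader(target_dir, package_ids=("all",)):
    # check=True raises CalledProcessError if the process exits non-zero.
    subprocess.run(downloader_command(target_dir, package_ids), check=True)
```

For example, run_downloader("/usr/local/share/nltk_data") mirrors the second command above, minus the sudo.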
If you want to distribute your program, you might want to consider writing a setuptools setup.py file to simplify installation (see "What is setup.py?" and the official packaging docs).
NLTK - Download all nltk data except corpora from command line without Downloader UI
List all corpora ids and set _status_cache[pkg.id] = 'installed'.
This marks every corpus as 'installed' on that Downloader instance, so corpus packages are skipped when we call its download() method (the module-level nltk.download() uses a separate Downloader and is not affected).
If you're unsure which corpora/packages you need, use nltk.download('popular') instead of downloading all corpora and models.
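import nltk

dwlr = nltk.downloader.Downloader()
# Mark every corpus as already installed so this downloader skips it.
for pkg in dwlr.corpora():
    dwlr._status_cache[pkg.id] = 'installed'
# Download the 'popular' collection; the corpora marked above are skipped.
dwlr.download('popular')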
To download all packages in a specific folder:
import nltk

dwlr = nltk.downloader.Downloader()
# Folders (pkg.subdir): chunkers, corpora, grammars, help, misc,
# models, sentiment, stemmers, taggers, tokenizers
for pkg in dwlr.packages():
    if pkg.subdir == 'taggers':
        dwlr.download(pkg.id)
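The two snippets can also be combined into a variant that relies only on the calls already used in the second one (packages(), pkg.subdir, download()) rather than the private _status_cache attribute. A sketch with hypothetical helper names; note that it downloads every non-excluded package in the index, which is more than the 'popular' collection.

```python
def non_corpus_package_ids(packages, excluded_subdirs=("corpora",)):
    """Ids of packages whose subdir (e.g. 'corpora', 'taggers') is not excluded."""
    return [pkg.id for pkg in packages if pkg.subdir not in excluded_subdirs]


def download_all_except(excluded_subdirs=("corpora",)):
    import nltk  # imported here so the filter above can be used on its own

    downloader = nltk.downloader.Downloader()
    for package_id in non_corpus_package_ids(downloader.packages(), excluded_subdirs):
        downloader.download(package_id)
```

Calling download_all_except() skips corpora; download_all_except(("corpora", "models")) would skip both folders.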