How to include package data with setuptools/distutils?
I realize that this is an old question, but for people finding their way here via Google: package_data is a low-down, dirty lie. It is only used when building binary packages (python setup.py bdist ...) but not when building source packages (python setup.py sdist ...). This is, of course, ridiculous: one would expect that building a source distribution would result in a collection of files that could be sent to someone else to build the binary distribution.
In any case, using MANIFEST.in will work both for binary and for source distributions.
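Whichever mechanism is used, it helps to check what actually ended up in the source distribution. The following is a minimal sketch, not part of the original answer: the helper name non_python_members is hypothetical, and it assumes the sdist is a gzip-compressed tarball as produced by default on most setups.

```python
import tarfile


def non_python_members(sdist_path):
    """Return the sorted names of regular files in the archive that are not .py files."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and not member.name.endswith(".py")
        )
```

Calling it on a file under dist/ after building lists the data files that were picked up, so a missing entry shows up before the package is uploaded.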
How to add platform-specific package data in setup.py?
This is the solution I am currently using for pypdfium2:
- Create a class of supported platforms whose values correspond to the data directory names:
class PlatformNames:
    darwin_x64 = "darwin_x64"
    linux_x64 = "linux_x64"
    windows_x64 = "windows_x64"
    # ...
    sourcebuild = "sourcebuild"
- Wrap setuptools.setup() with a function that takes the platform name as an argument and copies platform-dependent files into the source tree as required:
# A list of non-Python file names to consider for inclusion in the installation, e.g.
Libnames = (
    "somelib.so",
    "somelib.dll",
    "somelib.dylib",
)
# _clean() removes possible old binaries/bindings
# _copy_bindings() copies the new stuff into the source tree
# _get_bdist() returns a custom `wheel.bdist_wheel` subclass with the `get_tag()` and `finalize_options()` functions overridden so as to tag the wheels according to their target platform.
def mkwheel(pl_name):
    _clean()
    _copy_bindings(pl_name)
    setuptools.setup(
        package_data = {"": Libnames},
        cmdclass = {"bdist_wheel": _get_bdist(pl_name)},
        # ...
    )
    # not cleaning up afterwards so that editable installs work (`pip3 install -e .`)
- In setup.py, query for a custom environment variable defining the target platform (e.g. $PYP_TARGET_PLATFORM).
  - If set to a value that indicates the need for a source distribution (e.g. sdist), run the raw setuptools.setup() function without copying in any build artifacts.
  - If set to a platform name, build for the requested platform. This makes packaging platform-independent and avoids the need for native hosts to craft the wheels.
  - If not set, detect the host platform using sysconfig.get_platform() and call mkwheel() with the corresponding PlatformNames member.
    - In case the detected platform is not supported, trigger code that performs a source build, moves the created files into data/sourcebuild/ and runs mkwheel(PlatformNames.sourcebuild).
- Write a script that iterates through the platform names, sets your environment variable and runs python3 -m build --no-isolation --skip-dependency-check --wheel for each. Also invoke build once with --sdist instead of --wheel and the environment variable set to the value for source distribution.
→ If all goes well, the platform-specific wheels and a source distribution will be written into dist/.
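The decision logic that setup.py performs in the steps above can be sketched roughly as follows. This is an illustration only and not pypdfium2's actual code: the function choose_build, the SDIST_TARGET sentinel and the HOST_TO_PLATFORM mapping are hypothetical names, and the host strings are assumed to follow the usual sysconfig.get_platform() shapes such as "linux-x86_64" or "win-amd64".

```python
import os
import sysconfig

SDIST_TARGET = "sdist"  # hypothetical sentinel requesting a source distribution
SUPPORTED = {"darwin_x64", "linux_x64", "windows_x64"}

# Hypothetical mapping from sysconfig.get_platform() values to data directory names.
HOST_TO_PLATFORM = {
    "linux-x86_64": "linux_x64",
    "win-amd64": "windows_x64",
}


def choose_build(target, host):
    """Return (action, platform_name) for the given env value and host platform."""
    if target == SDIST_TARGET:
        # Plain setuptools.setup(), no binaries copied in.
        return ("sdist", None)
    if target:
        # Explicit target: build for it regardless of the host.
        if target not in SUPPORTED:
            raise ValueError(f"unsupported target platform: {target}")
        return ("wheel", target)
    # Nothing requested: fall back to host detection.
    if host.startswith("macosx") and host.endswith("x86_64"):
        return ("wheel", "darwin_x64")
    name = HOST_TO_PLATFORM.get(host)
    if name is not None:
        return ("wheel", name)
    # Unknown host: build from source, then package the result.
    return ("sourcebuild", "sourcebuild")


if __name__ == "__main__":
    print(choose_build(os.environ.get("PYP_TARGET_PLATFORM"), sysconfig.get_platform()))
```

Keeping the decision in a pure function like this makes it easy to test without running a build; the real setup.py would then call mkwheel() or setuptools.setup() depending on the returned action.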
Perhaps this is a lot easier to understand just by looking at pypdfium2's code (especially setup.py, setup_base.py and craft_packages.py).
Disclaimer: I am not experienced with the setup infrastructure of Python and merely wrote this code out of personal need. I acknowledge that the approach is a bit "hacky". If there is a possibility to achieve the same goal while using the setuptools API in a more official sort of way, I'd be interested to hear about it.
Update 1: A negative implication of this concept is that the content wrongly ends up in a purelib folder, although it should be platlib as per PEP 427. I'm not sure how to instruct wheel/setuptools differently. Luckily, this is merely a cosmetic problem.
Update 2: Found a fix to the purelib problem:
class BinaryDistribution(setuptools.Distribution):
    def has_ext_modules(self):
        return True
setuptools.setup(
    # ...
    distclass = BinaryDistribution,
)
Setup.py - Add data files inside package in setuptools
Moving both config and doc dirs under mypackage (the one that is actually a package, containing an __init__.py) should fix the issue. The changed directory structure from the question:
mypackage/
├── mypackage/
│   ├── __init__.py
│   ├── config/
│   │   └── config.json
│   ├── docs/
│   │   ├── __init__.py
│   │   └── doc_folder/
│   │       └── text_file.txt
│   └── main.py
├── setup.py
└── MANIFEST.in
setuptools: adding additional files outside package
There is also data_files:
data_files=[("yourdir",
             ["additionalstuff/moredata.txt", "INFO.txt"])],
Have a think about where you want to put those files. More info in the docs.
How include static files to setuptools - python package
As pointed out in the comments, there are 2 ways to add the static files:
1 - include_package_data=True + MANIFEST.in
A MANIFEST.in file in the same directory as setup.py that looks like this:
include src/static/*
include src/Potato/*.txt
With include_package_data = True in setup.py.
2 - package_data in setup.py
package_data = {
    'static': ['*'],
    'Potato': ['*.txt']
}
Specify the files inside the setup.py.
Do not use both include_package_data and package_data in setup.py. include_package_data will nullify the package_data information.
Official docs:
https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html
How to add package data recursively in Python setup.py?
- Use Setuptools instead of distutils.
- Use data files instead of package data. These do not require __init__.py. Generate the lists of files and directories using standard Python code, instead of writing it literally:
import glob
data_files = []
directories = glob.glob('data/subfolder?/subfolder??/')
for directory in directories:
    files = glob.glob(directory + '*')
    data_files.append((directory, files))
# then pass data_files to setup()
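If the directory depth is not known in advance, the glob patterns above can be swapped for a walk over the tree. This is a minimal sketch using os.walk rather than glob; the helper name collect_data_files is hypothetical, and it assumes the root is given relative to setup.py so that the resulting paths stay relative.

```python
import os


def collect_data_files(root):
    """Return a data_files list with one (directory, [files]) entry per non-empty directory."""
    data_files = []
    for directory, _subdirs, filenames in os.walk(root):
        if filenames:
            paths = [os.path.join(directory, name) for name in sorted(filenames)]
            data_files.append((directory, paths))
    return sorted(data_files)
```

The result can be passed to setup() as data_files=collect_data_files('data'). Directories that contain only subdirectories are skipped, because data_files entries need at least one file to be useful.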
Add custom action to setup.py
It is not a good idea to do any customization at install time. It is good practice to do customization at run time, usually at the start of the first run.
At the start of your program, you should check if login and pass are somehow available. If login and pass are not available, then ask the user to enter them and save the values in a file. Usually such files should be saved in user configuration directory. Typically you would use the platformdirs library to get the right location for such a file.
Something like this:
import pathlib
import platformdirs
user_config_dir = platformdirs.user_config_dir('MyApp', 'tibhar940')
user_config_path = pathlib.Path(user_config_dir, 'config.cfg')
if user_config_path.is_file():
    ...  # read
else:
    ...  # prompt the user and save in file
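To make the two branches concrete, here is one possible way to fill them in. It is a sketch under assumptions: the function name load_or_create_credentials, the "auth" section and its keys are hypothetical, the file format is chosen as INI via configparser, and the path is passed in as a parameter (for example the user_config_path computed above) so the snippet itself only needs the standard library. Note that it stores the password in plain text, which may not be acceptable for your application.

```python
import configparser
import pathlib


def load_or_create_credentials(config_path, ask):
    """Read login/password from config_path, or ask for them and save the file.

    `ask` is any callable taking a prompt string and returning the answer,
    for example the built-in input().
    """
    config_path = pathlib.Path(config_path)
    # interpolation=None so that '%' characters in a password are stored as-is
    parser = configparser.ConfigParser(interpolation=None)
    if config_path.is_file():
        parser.read(config_path, encoding="utf-8")
    else:
        parser["auth"] = {"login": ask("Login: "), "password": ask("Password: ")}
        config_path.parent.mkdir(parents=True, exist_ok=True)
        with config_path.open("w", encoding="utf-8") as handle:
            parser.write(handle)
    return parser["auth"]["login"], parser["auth"]["password"]
```

On the first run the user is prompted and the file is created; later runs read the saved values without prompting.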
Related:
- How to setup application to personalize it?
- How to install writable shared and user specific data files with setuptools?
Including non-Python files with setup.py
Probably the best way to do this is to use the setuptools package_data directive. This does mean using setuptools (or distribute) instead of distutils, but this is a very seamless "upgrade".
Here's a full (but untested) example:
from setuptools import setup, find_packages
setup(
    name='your_project_name',
    version='0.1',
    description='A description.',
    packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
    package_data={'': ['license.txt']},
    include_package_data=True,
    install_requires=[],
)
Note the specific lines that are critical here:
package_data={'': ['license.txt']},
include_package_data=True,
package_data is a dict of package names (empty = all packages) to a list of patterns (can include globs). For example, if you want to only specify files within your package, you can do that too:
package_data={'yourpackage': ['*.txt', 'path/to/resources/*.txt']}
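Since the patterns are resolved relative to the package directory, it can be handy to preview what a given list would match before building. The following is only an approximation written with the standard glob module, not setuptools' own implementation; the helper name preview_package_data is hypothetical.

```python
import glob
import os


def preview_package_data(package_dir, patterns):
    """Return the files under package_dir matched by the patterns, relative to package_dir."""
    matches = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):  # directories are not data files
                matches.add(os.path.relpath(path, package_dir))
    return sorted(matches)
```

For instance, preview_package_data('yourpackage', ['*.txt', 'path/to/resources/*.txt']) lists the text files directly in the package and those in the resources subdirectory, which makes an overly narrow pattern easy to spot.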
The solution here is definitely not to rename your non-py files with a .py extension.
See Ian Bicking's presentation for more info.
UPDATE: Another [Better] Approach
Another approach that works well if you just want to control the contents of the source distribution (sdist) and have files outside of the package (e.g. top-level directory) is to add a MANIFEST.in file. See the Python documentation for the format of this file.
Since writing this response, I have found that using MANIFEST.in is typically a less frustrating approach to just make sure your source distribution (tar.gz) has the files you need.
For example, if you wanted to include the requirements.txt from the top level and recursively include the top-level "data" directory:
include requirements.txt
recursive-include data *
Nevertheless, in order for these files to be copied at install time to the package's folder inside site-packages, you'll need to supply include_package_data=True to the setup() function. See Adding Non-Code Files for more information.
Accessing data files before and after distutils/setuptools
I've used a utility method called data_file:
import os
def data_file(fname):
    """Return the path to a data file of ours."""
    return os.path.join(os.path.split(__file__)[0], fname)
I put this in the __init__.py file in my project, and then call it from anywhere in my package to get a file relative to the package.
Setuptools offers a similar function, but this doesn't need setuptools.