Package Only Binary Compiled .So Files of a Python Library Compiled With Cython

Unfortunately, the answer that suggests setting packages=[] is wrong and can break a lot of things, as can be seen, for example, in this question. Don't use it. Instead of excluding all packages from the distribution, exclude only the Python files that will be cythonized and compiled to shared objects.

Below is a working example; it uses my recipe from the question Exclude single source file from python bdist_egg or bdist_wheel. The example project contains package spam with two modules, spam.eggs and spam.bacon, and a subpackage spam.fizz with one module spam.fizz.buzz:

root
├── setup.py
└── spam
    ├── __init__.py
    ├── bacon.py
    ├── eggs.py
    └── fizz
        ├── __init__.py
        └── buzz.py

The module lookup is being done in the build_py command, so it is the one you need to subclass with custom behaviour.

Simple case: compile all source code, make no exceptions

If you are going to compile every .py file (including the __init__.py files), it is sufficient to override the build_py.build_packages method and make it a no-op. Because build_packages doesn't do anything, no .py file will be collected at all, and the dist will include only the cythonized extensions:

from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize


extensions = [
    # example of an extension defined with a glob pattern
    Extension('spam.*', ['spam/*.py']),
    # example of an extension with a single source file
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]


class build_py(build_py_orig):
    def build_packages(self):
        # no-op: no .py files are collected, only the extensions are packaged
        pass


setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions),
    cmdclass={'build_py': build_py},
)
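
To check that a built wheel really contains no Python sources, you can list its contents; a wheel is an ordinary zip archive. The following is only a minimal sketch: the helper name python_sources_in_wheel is made up for illustration, and you pass it the path of whatever wheel ended up in your dist directory.

```python
import zipfile


def python_sources_in_wheel(wheel_path):
    """Return the names of all .py files stored in the given wheel.

    Hypothetical helper for illustration; a wheel is a zip archive,
    so the standard zipfile module is enough to inspect it.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith('.py'))
```

For the simple case above, the returned list should be empty; any entry in it is a source file that slipped into the dist.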

Complex case: mix cythonized extensions with source modules

If you want to compile only selected modules and leave the rest untouched, you will need slightly more complex logic: in this case, you have to override the module lookup. In the example below, I still compile spam.bacon, spam.eggs and spam.fizz.buzz to shared objects, but leave the __init__.py files untouched, so they will be included as source modules:

import fnmatch
from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize


extensions = [
    Extension('spam.*', ['spam/*.py']),
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]
# files matching these patterns are never cythonized
cython_excludes = ['**/__init__.py']


def not_cythonized(tup):
    # keep a module as source if it is explicitly excluded
    # or if no extension lists it among its sources
    (package, module, filepath) = tup
    return any(
        fnmatch.fnmatchcase(filepath, pat=pattern) for pattern in cython_excludes
    ) or not any(
        fnmatch.fnmatchcase(filepath, pat=pattern)
        for ext in extensions
        for pattern in ext.sources
    )


class build_py(build_py_orig):
    def find_modules(self):
        modules = super().find_modules()
        return list(filter(not_cythonized, modules))

    def find_package_modules(self, package, package_dir):
        modules = super().find_package_modules(package, package_dir)
        return list(filter(not_cythonized, modules))


setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions, exclude=cython_excludes),
    cmdclass={'build_py': build_py},
)
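
To see what the filter keeps, here is a standalone sketch of the same logic applied to the example project. It assumes the (package, module, filepath) tuples look like the ones below and uses a plain list of source patterns (source_patterns, a name introduced here) instead of real Extension objects, so it runs without setuptools or Cython.

```python
import fnmatch

# Assumed stand-ins for the Extension sources and excludes used above.
source_patterns = ['spam/*.py', 'spam/fizz/buzz.py']
cython_excludes = ['**/__init__.py']


def not_cythonized(tup):
    """True if the module should stay in the dist as a .py source file."""
    _package, _module, filepath = tup
    excluded = any(fnmatch.fnmatchcase(filepath, p) for p in cython_excludes)
    compiled = any(fnmatch.fnmatchcase(filepath, p) for p in source_patterns)
    return excluded or not compiled


# Tuples in the assumed (package, module, filepath) shape.
modules = [
    ('spam', '__init__', 'spam/__init__.py'),
    ('spam', 'bacon', 'spam/bacon.py'),
    ('spam', 'eggs', 'spam/eggs.py'),
    ('spam.fizz', '__init__', 'spam/fizz/__init__.py'),
    ('spam.fizz', 'buzz', 'spam/fizz/buzz.py'),
]

kept = [path for (_, _, path) in filter(not_cythonized, modules)]
print(kept)  # ['spam/__init__.py', 'spam/fizz/__init__.py']
```

One thing to keep in mind: fnmatch does not treat path separators specially, so 'spam/*.py' also matches files in subpackages such as 'spam/fizz/other.py'. If you have such files and do not compile them, use more specific patterns.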

Cython binary package compile issues

What you probably want to do is create a package containing an extension module for each .py file.

setup.py would contain:

    ext_modules=cythonize([
        Extension("hello.hello1", ["hello/hello1.py"]),
        Extension("hello.hello2", ["hello/hello2.py"]),
        Extension("bye.bye1", ["bye/bye1.py"]),
        Extension("bye.bye2", ["bye/bye2.py"]),
    ], language_level=3),

I've skipped the __init__.py files because there's usually little value in compiling them with Cython, and compiling them means working around a setuptools bug on Windows.

The upshot is that Python imports extension modules by searching for the module name first, then seeing if it has a PyInit_<module_name> function to call. When Cython compiles a file (e.g. hello1.py) it'll therefore create a PyInit_hello1 function.

Therefore your supposed combined "hello" module ends up containing PyInit_hello1, PyInit_hello2, but no PyInit_hello and so doesn't import.
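
As a rough illustration of the naming rule, the init function is named after the last component of the dotted module name. The helper below is hypothetical and only covers ASCII module names (non-ASCII names use a different, encoded prefix):

```python
def expected_init_symbol(module_name):
    """Return the init function name looked up for an extension module.

    Illustrative helper, not part of any library. Only ASCII module
    names are handled here.
    """
    short_name = module_name.rpartition('.')[2]
    if not short_name.isascii():
        raise ValueError('non-ASCII module names are not covered by this sketch')
    return 'PyInit_' + short_name


print(expected_init_symbol('hello.hello1'))  # PyInit_hello1
print(expected_init_symbol('hello'))         # PyInit_hello
```

So a single hello.so built from hello1.py and hello2.py exports the first kind of symbol twice over, but never the PyInit_hello that importing "hello" would need.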

If you really want to bundle multiple modules together into a single .so file, then you can follow the instructions in Collapse multiple submodules to one Cython extension. You'll note that it isn't a built-in feature and that it involves a lot of fine details of the Python import mechanism. Alternatively, you could use a third-party tool designed to automate this (e.g. https://github.com/smok-serwis/snakehouse). I don't recommend this because it's complicated and likely a bit fragile.

Package only cythonized binary Python files and resource data, ignoring the .py source files

Based on this answer by @hoefling, I was able to package my resource_folder together with the obfuscated binary a.so file.

The recipe for setup.py

from Cython.Distutils import build_ext
from Cython.Build import cythonize
from setuptools.extension import Extension
from setuptools.command.build_py import build_py as build_py_orig
from pathlib import Path
from setuptools import find_packages, setup, Command
import os
import shutil

here = os.path.abspath(os.path.dirname(__file__))
packages = find_packages(exclude=('tests',))


def get_package_files_in_directory(directory):
    # collect every file below the given directory
    paths = []
    for (path, directories, filenames) in os.walk(directory):
        for filename in filenames:
            paths.append(os.path.join('..', path, filename))
    return paths


# Copies __init__.py, as specified in the references linked above.
# Note: as written, this class is not registered in cmdclass below, so it
# only takes effect if 'build_ext': MyBuildExt is added there.
class MyBuildExt(build_ext):
    def run(self):
        build_ext.run(self)

        build_dir = Path(self.build_lib)
        root_dir = Path(__file__).parent

        target_dir = build_dir if not self.inplace else root_dir

        self.copy_file(Path('main_folder') / '__init__.py', root_dir, target_dir)

    def copy_file(self, path, source_dir, destination_dir):
        if not (source_dir / path).exists():
            return

        shutil.copyfile(str(source_dir / path), str(destination_dir / path))


# As specified by @hoefling: ignore the .py files, but not resource_folder.
class build_py(build_py_orig):
    def build_packages(self):
        pass


setup(
    packages=find_packages(),  # needed for obfuscation
    ext_modules=cythonize(
        [
            Extension("main_folder.*", ["main_folder/*.py"])
        ],
        build_dir="build",
        compiler_directives=dict(
            always_allow_keywords=True
        )),
    # package_data as found in another reference
    package_data={p: get_package_files_in_directory(os.path.join(here, p, 'resource_folder')) for p in packages},
    cmdclass={
        'build_py': build_py
    },
    entry_points={
    },
)
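
A side note on the helper above: setuptools interprets package_data entries as paths or glob patterns relative to the package directory. If the '..' prefix combined with absolute paths gives you trouble, a variant that returns package-relative paths may be worth trying. This is only a sketch under that assumption; the function name package_files_relative is invented for illustration and is not part of the original recipe.

```python
import os


def package_files_relative(package_dir, subdir):
    """List files under package_dir/subdir as paths relative to package_dir.

    Hypothetical alternative to get_package_files_in_directory; the
    result is meant to be used as a package_data value for that package.
    """
    root = os.path.join(package_dir, subdir)
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full_path, package_dir))
    return sorted(paths)
```

With the layout from the recipe, this would be called as package_files_relative(os.path.join(here, p), 'resource_folder') for each package p.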

Commands to create the obfuscated *.whl package:

python setup.py build_ext    # creates the a.so
python setup.py build_py     # copies the resource_folder, excluding .py files
python setup.py bdist_wheel  # then generates the .whl

Cythonize installs .so files to wrong location

I'm going to self answer here. The issue stemmed from an improperly specified Extension.

    Extension("kmerdb.distance", ["kmerdb/distance.pyx"], include_dirs=[np.get_include()])

All I had to do was give the Extension its fully qualified module name, including the whole package hierarchy (here kmerdb.distance). Fixed it!
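
The reason the name matters is that the build derives the output location from the dotted extension name: each dot becomes a directory level. A small sketch of that mapping, assuming the platform suffix comes from sysconfig's EXT_SUFFIX (the function name ext_relative_path is made up for this example):

```python
import os
import sysconfig


def ext_relative_path(fullname, suffix=None):
    """Map a dotted extension name to the relative path of its binary.

    Illustrative helper: the dots in the name become directories and
    the platform's extension suffix is appended.
    """
    if suffix is None:
        # Fall back to '.so' if the interpreter does not report a suffix.
        suffix = sysconfig.get_config_var('EXT_SUFFIX') or '.so'
    return os.path.join(*fullname.split('.')) + suffix


print(ext_relative_path('kmerdb.distance'))  # lands inside the kmerdb/ directory
print(ext_relative_path('distance'))         # lands at the top level
```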

Does Cython compile imported modules as part of the binary?

The "interface" of a Cython module remains at the Python level. When you import a module in Cython, the module becomes available only at the Python level of the code and uses the regular Python import mechanism.

So:

  1. Cython does not "compile in" the dependencies.
  2. You need to install the dependencies on the target machine.
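
Since the imports still go through the regular import mechanism, you can check on the target machine whether the dependencies are importable before loading the compiled module. A minimal sketch, with a hypothetical helper name (missing_modules) and whatever module names your own code imports:

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by the import system.

    Illustrative helper; it only locates modules and does not verify
    that they work correctly once imported.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package or a module without a spec
            # is treated as "not available".
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

For example, missing_modules(['numpy']) returns ['numpy'] on a machine where NumPy is not installed and an empty list otherwise.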

For "Cython level" code, including the question of "cimporting" a module, Cython uses the equivalent of C headers (the .pxd declaration files) and dynamically loaded libraries to access external code. These shared libraries (.so files on Linux, DLLs on Windows, and dylibs on macOS) need to be present on the target machine.


