What Is a Python Egg

What is a Python egg?

Note: Egg packaging has been superseded by Wheel packaging.

An egg is the same concept as a .jar file in Java: it is a .zip file with some metadata files, renamed to .egg, for distributing code as a bundle.

More specifically, from The Internal Structure of Python Eggs:

A "Python egg" is a logical structure embodying the release of a
specific version of a Python project, comprising its code, resources,
and metadata. There are multiple formats that can be used to
physically encode a Python egg, and others can be developed. However,
a key principle of Python eggs is that they should be discoverable and
importable. That is, it should be possible for a Python application to
easily and efficiently find out what eggs are present on a system, and
to ensure that the desired eggs' contents are importable.

The .egg format is well-suited to distribution and the easy
uninstallation or upgrades of code, since the project is essentially
self-contained within a single directory or file, unmingled with any
other projects' code or resources. It also makes it possible to have
multiple versions of a project simultaneously installed, such that
individual programs can select the versions they wish to use.

What is the Python egg cache (PYTHON_EGG_CACHE)?

From my investigations it turns out that some eggs are packaged as zip files, and are saved as such in Python's site-packages directory.

These zipped eggs need to be unzipped before they can be executed, so they are expanded into the PYTHON_EGG_CACHE directory, which by default is ~/.python-eggs (located in the user's home directory). If this directory doesn't exist or isn't writable, it causes problems when trying to run applications.

There are a number of fixes:

  1. Create a .python-eggs directory in the user's home directory and make it writable for the user.
  2. Create a global directory for unzipping (e.g. /tmp/python-eggs) and set the environment variable PYTHON_EGG_CACHE to this directory.
  3. Use the -Z switch when using easy_install to unzip the package when installing.
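The second fix can also be applied from inside a script. What follows is a minimal sketch, assuming the variable is set before any zipped egg has to be extracted. The helper name ensure_egg_cache and the default location under the system temporary directory are illustrative choices, not part of setuptools.

```python
import os
import tempfile


def ensure_egg_cache(path=None):
    """Create a writable egg cache directory and point PYTHON_EGG_CACHE at it.

    `path` is a hypothetical parameter; when omitted, a "python-eggs"
    directory under the system temporary directory is used.
    """
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "python-eggs")
    # exist_ok=True makes the call safe when the directory is already there.
    os.makedirs(path, exist_ok=True)
    os.environ["PYTHON_EGG_CACHE"] = path
    return path


cache_dir = ensure_egg_cache()
print("egg cache:", cache_dir)
```

Setting the variable in the process environment only affects the current process and its children; for a system-wide fix the variable still has to be exported in the shell or service configuration.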

Why do we need Python packaging (e.g. egg)?

The Python Packaging User Guide has to say the following on this topic:

Wheel and Egg are both packaging formats that aim to support the use case of needing an install artifact that doesn’t require building or compilation, which can be costly in testing and production workflows.

These formats can be used to distribute packages that contain binary extension modules. These would otherwise require compilation during installation.

If no compilation is involved, a source distribution is in principle sufficient, but the user guide still recommends creating a wheel for performance reasons:

Minimally, you should create a Source Distribution:

python setup.py sdist

A “source distribution” is unbuilt (i.e., it’s not a Built Distribution), and requires a build step when installed by pip. Even if the distribution is pure Python (i.e. contains no extensions), it still involves a build step to build out the installation metadata from setup.py.

[...]

You should also create a wheel for your project. A wheel is a built package that can be installed without needing to go through the “build” process. Installing wheels is substantially faster for the end user than installing from a source distribution.

In short, packages are a convenience thing - mostly for the user.

Wheel packages unify the process of distributing and installing projects that contain pure python, platform dependent code, or compiled extensions. The user does not need to worry if the package is written in Python or in C - it just works.

How to access files inside a Python egg file?

Zipped egg files are zip files, so you must access "stuff" inside them with the zipfile module of the Python standard library, not with the built-in open function!
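As a rough illustration of reading a member with zipfile, here is a small sketch. The egg built at the top is a throwaway stand-in so the example runs on its own; the file names and the helper read_from_egg are made up for the demonstration.

```python
import os
import tempfile
import zipfile


def read_from_egg(egg_path, member):
    """Return the raw bytes of one file stored inside a zipped egg."""
    with zipfile.ZipFile(egg_path) as egg:
        return egg.read(member)


# Build a tiny stand-in egg (hypothetical contents) in a temporary directory.
demo_egg = os.path.join(tempfile.mkdtemp(), "demo-0.1-py3.egg")
with zipfile.ZipFile(demo_egg, "w") as egg:
    egg.writestr("demo/data.txt", "hello from inside the egg\n")
    egg.writestr("EGG-INFO/PKG-INFO", "Name: demo\nVersion: 0.1\n")

content = read_from_egg(demo_egg, "demo/data.txt").decode("utf-8")
print(content)
```

Member names inside the archive always use forward slashes, regardless of the operating system. An egg installed as an unzipped directory is an ordinary folder, and there the built-in open function works as usual.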

How to debug a python source file within a .egg file

An egg file is a specialized zip file that packages together Python modules and other data. The modules can be any of .py Python source, .pyc or .pyo compiled bytecode, and compiled C(++) extension modules (.pyd on Windows, .so elsewhere). Python can import source and bytecode modules from within a zip file. AFAIK, IDLE cannot read text files from within an egg.

According to the answer by 'kmario23' to the SO question referenced above, one can change .egg to .zip and then explore the contents. You could then tell if any of the modules are .py files.
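Renaming is not strictly required just to look inside, since the zipfile module does not care about the extension. A minimal sketch, using a throwaway egg with made-up contents and a hypothetical helper list_python_sources:

```python
import os
import tempfile
import zipfile


def list_python_sources(egg_path):
    """Return the names of all .py files stored in a zipped egg."""
    with zipfile.ZipFile(egg_path) as egg:
        return sorted(name for name in egg.namelist() if name.endswith(".py"))


# Stand-in egg for the demonstration; real eggs live in site-packages.
sample_egg = os.path.join(tempfile.mkdtemp(), "sample-1.0-py3.egg")
with zipfile.ZipFile(sample_egg, "w") as egg:
    egg.writestr("sample/__init__.py", "")
    egg.writestr("sample/core.py", "def run():\n    return 'ok'\n")
    egg.writestr("sample/core.pyc", b"not real bytecode")
    egg.writestr("EGG-INFO/PKG-INFO", "Name: sample\nVersion: 1.0\n")

sources = list_python_sources(sample_egg)
print(sources)
```

If the returned list is empty, the egg ships only compiled modules and there is no source to step through.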

I suspect that you could extract the modules into a normal directory with the same name (minus .egg) and have python import the extracted modules. IDLE could then read any .py files. You might need to rename the .egg to, say, .eggback, to prevent its use. I am guessing this from my experience with normal zip files, as I have never manipulated an egg file. There is probably some detail that I have omitted.
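The extraction idea can be sketched as follows. This is an untested-on-real-eggs illustration under the same assumption as the paragraph above, namely that an egg behaves like a normal zip file; the egg, the module name eggdemo_mod and the helper extract_egg are invented for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def extract_egg(egg_path, dest_dir):
    """Unpack a zipped egg into dest_dir and make dest_dir importable."""
    with zipfile.ZipFile(egg_path) as egg:
        egg.extractall(dest_dir)
    # Put the extracted copy first so it wins over the zipped egg.
    if dest_dir not in sys.path:
        sys.path.insert(0, dest_dir)
    importlib.invalidate_caches()
    return dest_dir


# Throwaway egg containing a single pure-Python module.
work = tempfile.mkdtemp()
egg_path = os.path.join(work, "eggdemo-0.1-py3.egg")
with zipfile.ZipFile(egg_path, "w") as egg:
    egg.writestr("eggdemo_mod.py", "VALUE = 42\n")

extract_egg(egg_path, os.path.join(work, "eggdemo-0.1-py3"))
mod = importlib.import_module("eggdemo_mod")
print(mod.__file__)
```

Because the module now comes from a plain .py file on disk, an editor or debugger such as IDLE can open it and set breakpoints in it.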

What are the benefits and downsides of using the Python packaging *.egg format over a simple directory with setup.py?

Eggs are tied to a specific architecture and Python version and, if the egg contains C extensions, until Python 3.3 even to the internal Unicode representation size (UCS2 vs. UCS4).

Unfortunately, the latter difference is not captured in the egg metadata; an egg filename contains the architecture and the Python version (major.minor, e.g. 2.4 or 3.1), but the Unicode byte size is omitted.
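To make the filename point concrete, here is a simplified sketch that splits an egg filename of the form name-version-pyX.Y-platform.egg into its parts. The regular expression is an approximation written for this example and the function parse_egg_filename is hypothetical; it is not the parser setuptools itself uses.

```python
import re

# Simplified pattern: name, then optional version, Python version, platform.
EGG_NAME = re.compile(
    r"(?P<name>[^-]+)"
    r"(-(?P<version>[^-]+)"
    r"(-py(?P<pyver>[^-]+)"
    r"(-(?P<platform>.+))?)?)?"
    r"\.egg$"
)


def parse_egg_filename(filename):
    """Return a dict with name, version, pyver and platform (None if absent)."""
    match = EGG_NAME.match(filename)
    if match is None:
        raise ValueError("not an egg filename: %r" % filename)
    return match.groupdict()


pure = parse_egg_filename("SQLObject-0.7.0-py2.4.egg")
binary = parse_egg_filename("foo-1.0-py2.7-linux-x86_64.egg")
print(pure)
print(binary)
```

Nothing in either result says whether the interpreter was a UCS2 or UCS4 build, which is the gap described above.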

Because of this, eggs are not very portable. A .tgz or .zip distribution on the other hand, is (hopefully) platform agnostic. Your installation tool, be it easy_install, pip, buildout or whatever, knows how to compile a python package distribution into an egg for you, so you generally avoid distributing the .egg files altogether.

The only exception is Windows, where most people will be lacking the toolchain to compile C extensions. As Windows distributions of Python default to UCS2, you are usually safe to distribute Windows .egg builds of packages with C extensions, to facilitate installation by automated tools.

If you use the setup.py script to create the distribution, it's trivial to create a source-only package for upload to PyPI. I can recommend the Python Packaging User Guide for more information.

How is a python egg different from a regular package?

Yes, absolutely, but an egg is a bit more than that.
Read the section "All about eggs" at http://www.ibm.com/developerworks/library/l-cppeak3.html.
Copied from that page:

However, this sort of manipulation of the PYTHONPATH (or of sys.path within a script or Python shell session) is a bit fragile. Discovery of eggs is probably best handled within some newish magic .pth files. Any .pth files found in site-packages/ or on the PYTHONPATH are parsed for additional imports to perform, in a very similar manner to the way directories in those locations that might contain packages are examined. If you handle package management with setuptools, a file called easy-install.pth is modified when packages are installed, upgraded, removed, etc. But you may call your .pth files whatever you like (as long as they have the .pth extension). For example, here is my easy-install.pth:

Listing 11. .pth files as configuration of egg locations


% cat /sw/lib/python2.4/site-packages/easy-install.pth
import sys; sys.__plen = len(sys.path)
setuptools-0.6b1-py2.4.egg
SQLObject-0.7.0-py2.4.egg
FormEncode-0.5.1-py2.4.egg
Gnosis_Utils-1.2.1-py2.4.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

The format is a bit peculiar: it is almost, but not quite, a Python script. Suffice it to say that you may add additional listed eggs in there; or better yet, easy_install will do it for you when it runs. You may also create as many other .pth files as you like under site-packages/, and each may simply list which eggs to make available.
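The .pth mechanism can be tried out without touching the real site-packages. The sketch below uses a temporary directory as a stand-in site directory and asks the standard site module to process it; the egg name, the module pth_demo_mod and the file name demo-eggs.pth are all invented for the illustration.

```python
import importlib
import os
import site
import sys
import tempfile
import zipfile

# A temporary directory plays the role of site-packages/.
site_dir = tempfile.mkdtemp()
egg_name = "pthdemo-0.1-py3.egg"

# A zipped egg holding one pure-Python module.
with zipfile.ZipFile(os.path.join(site_dir, egg_name), "w") as egg:
    egg.writestr("pth_demo_mod.py", "GREETING = 'hello from a zipped egg'\n")

# The .pth file simply lists the egg, relative to the site directory.
with open(os.path.join(site_dir, "demo-eggs.pth"), "w") as pth:
    pth.write(egg_name + "\n")

# addsitedir adds the directory itself and every existing path
# listed in the .pth files it finds there.
site.addsitedir(site_dir)
importlib.invalidate_caches()

mod = importlib.import_module("pth_demo_mod")
print(mod.GREETING)
```

At interpreter startup the same processing happens automatically for the real site-packages directories, which is how the eggs listed in easy-install.pth end up on sys.path.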


