Where Is Python's sys.path Initialized From

Where is Python's sys.path initialized from?

"Initialized from the environment variable PYTHONPATH, plus an installation-dependent default"

-- http://docs.python.org/library/sys.html#sys.path

What sets up sys.path with Python, and when?

Most of it is set up in Python's site.py, which is imported automatically when the interpreter starts (unless you start it with the -S option). A few paths are set up by the interpreter itself during initialization; you can find out which ones by starting Python with -S and printing sys.path.

Additionally, some frameworks (Django, I think) modify sys.path on startup to meet their requirements.

The site module has pretty good documentation and commented source code, and it prints out some information if you run it via python -m site.
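As a rough way to see which entries come from site.py, one could compare the sys.path of a fresh interpreter started normally with that of one started with -S. This is only a sketch: the helper name startup_path is made up for illustration, and the exact entries (and how they are normalized) depend on the installation.

```python
import json
import subprocess
import sys


def startup_path(*flags):
    """Return sys.path as seen by a fresh interpreter started with the given flags."""
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


with_site = startup_path()          # normal startup, site.py is imported
without_site = startup_path("-S")   # site.py is skipped

# Entries that only show up when site.py runs (typically site-packages)
added_by_site = [p for p in with_site if p not in without_site]

print("set by the interpreter itself:", without_site)
print("added by site.py:", added_by_site)
```

Running python -m site gives a similar overview without any extra code.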

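How does the Python interpreter know where to find the sys module when modules are imported?

The sys module is a way to access Python internals, such as object sizes and the module search path. It is compiled into the interpreter itself, so importing it does not involve a search along sys.path.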

sys.path

A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

On startup, Python reads the PYTHONPATH environment variable, adds some built-in default paths, and runs your module. When you use import, it looks the module up along this internal path list.

When you import sys and change sys.path, the change is reflected in the path Python uses internally. But it is just an API. Had they created add_path (and remove_path, get_path, ...) functions instead, things would have been less magical, but also less natural.

The underlying mechanism is active even when you are not using it. sys.path is a Python-level API so that everyone understands how to change the configuration, but Python does not need your code to import the sys module in order to operate.
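A small sketch of that behaviour: a module in a directory that is not on sys.path cannot be imported until the directory is appended. The module name path_demo_module and the temporary directory are assumptions made up for this demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "path_demo_module"  # hypothetical module name used only for this demo

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, MODULE_NAME + ".py").write_text("VALUE = 42\n")

    try:
        importlib.import_module(MODULE_NAME)
        found_before = True
    except ImportError:
        found_before = False  # its directory is not on sys.path yet

    sys.path.append(tmp)            # change the search path through the sys API
    importlib.invalidate_caches()   # make sure the finders notice the new file
    module = importlib.import_module(MODULE_NAME)
    value_after = module.VALUE

    sys.path.remove(tmp)            # leave sys.path as we found it

print(found_before, value_after)
```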

How exactly is Python2's sys.path set in Windows?

Praveen Gollakota seems to have good info at "Troubleshooting python sys.path" (repasted here):

  • The first entry added is C:\WINNT\system32\python27.zip (more details in PEP 273).

  • The next entries come from the Windows registry. The entries C:\Python27\DLLs; C:\Python27\lib; C:\Python27\lib\plat-win; C:\Python27\lib\lib-tk come from the key SOFTWARE\Python\PythonCore\2.7\PythonPath (under HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER) in the registry. More details are in the Python source code comments at http://svn.python.org/projects/python/trunk/PC/getpathp.c (these entries were the trickiest for me to understand until I found that link).

  • Next, as explained in the site module documentation, further entries are built from sys.prefix and sys.exec_prefix. On my computer both of them point to C:\Python27. By default it also searches lib/site-packages under them. So now the entries C:\Python27; C:\Python27\lib\site-packages are appended to the list above.

  • Next it processes each of the .pth files in alphabetical order. I have easy_install.pth, pywin32.pth and setuptools.pth in my site-packages. This is where things start getting weird. It would be straightforward if the entries in the .pth files were just directory locations: they would simply be appended to sys.path line by line. However, easy_install.pth contains some Python code that causes the entries listed in easy_install.pth to be inserted at the beginning of the sys.path list.

  • After this, the directory entries in pywin32.pth and setuptools.pth are added at the end of the sys.path list, as expected.

Note: While the above discussion pertains to Windows, it is similar even on Mac etc. On Mac it just adds different OS defaults like darwin etc. before it starts looking at site-packages directory for .pth files.
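The .pth step described above can be tried out in isolation with site.addsitedir, which treats a directory the way site.py treats site-packages. This is a minimal sketch: the directory name extra_pkgs and the file name demo.pth are hypothetical, and the .pth file contains only a plain directory line (no embedded Python code like easy_install.pth has).

```python
import site
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as site_dir:
    extra = Path(site_dir, "extra_pkgs")   # hypothetical directory to expose
    extra.mkdir()
    # One directory per line; relative entries are resolved against site_dir
    Path(site_dir, "demo.pth").write_text("extra_pkgs\n")

    before = list(sys.path)
    site.addsitedir(site_dir)              # adds site_dir and processes its .pth files
    added = [p for p in sys.path if p not in before]
    sys.path[:] = before                   # undo the change

print(added)
```

Entries from a plain .pth file like this one are appended, which matches the "added at the end" behaviour described for pywin32.pth and setuptools.pth.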

Difference between $PATH, sys.path and os.environ

This is actually more complicated than it would seem. It's unclear from the question whether you understand the Linux/macOS $PATH environment variable, so let's start there. $PATH (in Python you can access the system environment variables through os.environ) is the current user's search path for executables, as defined in various shell profile and environment files. It typically contains directories like "/usr/bin" and other places where programs are installed. For example, when you type "ls" into the system shell, the system searches the directories in $PATH for a program named "ls", so what actually gets executed is probably something like "/usr/bin/ls". I've included additional reading below.

sys.path, on the other hand, is constructed by Python when the interpreter is started, based on a number of things. The first sentence in the help page is as follows: "A list of strings that specifies the search path for modules. Initialized from the environment variable $PYTHONPATH, plus an installation-dependent default." The installation-dependent portion typically includes the location of Python's site-packages. $PYTHONPATH is another environment variable (like $PATH) that adds directories to the module search path, and it can be set the same way as the system $PATH.

If you have non-installed sources (i.e. Python files that you want to run from outside the site-packages directory), you typically need to manipulate sys.path, either directly in your scripts or by adding the location to the $PYTHONPATH environment variable, so the interpreter knows where to find your modules. Alternatively, you could use .pth files to manipulate the module search path.

This is just a basic overview; I hope you read the docs for a better understanding.
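To make the distinction concrete, here is a small sketch that reads $PATH through os.environ, resolves an executable the way the shell would, and then starts a child interpreter with $PYTHONPATH set to check that the directory ends up in the child's sys.path. The temporary directory is just a stand-in for a folder holding non-installed sources.

```python
import json
import os
import shutil
import subprocess
import sys
import tempfile

# $PATH: where the shell looks for executables
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
ls_location = shutil.which("ls")   # None if no "ls" is found on $PATH

# $PYTHONPATH: extra directories for the *module* search path of a new interpreter
with tempfile.TemporaryDirectory() as extra_dir:
    child_env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=child_env, capture_output=True, text=True, check=True,
    )
    child_sys_path = json.loads(result.stdout)
    pythonpath_entry_seen = any(
        os.path.isdir(p) and os.path.samefile(p, extra_dir) for p in child_sys_path
    )

print("executables are searched in:", path_dirs[:3], "...")
print("ls resolves to:", ls_location)
print("PYTHONPATH entry present in child sys.path:", pythonpath_entry_seen)
```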

Sources

  • Linux $PATH variable information
  • Python sys.path
  • Python site.py

Troubleshooting python sys.path

I had some issues recently with sys.path and here is how I went about trying to figure out where the entries are coming from. I was able to track all the entries and where they were coming from. Hopefully this will help you too.

  • The first entry added is C:\WINNT\system32\python27.zip (more details in PEP 273).

  • The next entries come from the Windows registry. The entries C:\Python27\DLLs; C:\Python27\lib; C:\Python27\lib\plat-win; C:\Python27\lib\lib-tk come from the key SOFTWARE\Python\PythonCore\2.7\PythonPath (under HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER) in the registry. More details are in the Python source code comments at http://svn.python.org/projects/python/trunk/PC/getpathp.c (these entries were the trickiest for me to understand until I found that link).

  • Next, as explained in the site module documentation, further entries are built from sys.prefix and sys.exec_prefix. On my computer both of them point to C:\Python27. By default it also searches lib/site-packages under them. So now the entries C:\Python27; C:\Python27\lib\site-packages are appended to the list above.

  • Next it processes each of the .pth files in alphabetical order. I have easy_install.pth, pywin32.pth and setuptools.pth in my site-packages. This is where things start getting weird. It would be straightforward if the entries in the .pth files were just directory locations: they would simply be appended to sys.path line by line. However, easy_install.pth contains some Python code that causes the entries listed in easy_install.pth to be inserted at the beginning of the sys.path list.

  • After this, the directory entries in pywin32.pth and setuptools.pth are added at the end of the sys.path list, as expected.

Note: While the above discussion pertains to Windows, it is similar even on Mac etc. On Mac it just adds different OS defaults like darwin etc. before it starts looking at site-packages directory for .pth files.

In your case, you can start by starting a python shell and checking where sys.prefix and sys.exec_prefix point to and then drilling down from there.

Note 2: If you are using an IDE such as Aptana/PyDev it will add more configurations of its own. So you need to be wary of that.
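The drilling down suggested above could start from something like the following sketch. The classify function and its labels are purely illustrative heuristics (they are not how Python itself categorizes entries), so treat the output as a hint about where to look rather than a definitive answer.

```python
import os
import site
import sys


def classify(entry):
    """Give a rough, heuristic label for where a sys.path entry comes from."""
    if entry == "":
        return "current directory (interactive or -c)"
    if entry.endswith(".zip"):
        return "zipped standard library (PEP 273)"
    if "site-packages" in entry or "dist-packages" in entry:
        return "site-packages (site.py or a .pth file)"
    prefixes = {
        os.path.normcase(p)
        for p in (sys.prefix, sys.exec_prefix, getattr(sys, "base_prefix", sys.prefix))
    }
    if any(os.path.normcase(entry).startswith(p) for p in prefixes):
        return "installation default (under a prefix directory)"
    return "other (script directory, PYTHONPATH, IDE, ...)"


print("sys.prefix      =", sys.prefix)
print("sys.exec_prefix =", sys.exec_prefix)
# getsitepackages() may be missing in some older virtualenv-created environments
print("site-packages   =", getattr(site, "getsitepackages", lambda: [])())

report = [(entry, classify(entry)) for entry in sys.path]
for entry, label in report:
    print(f"{entry!r:60} {label}")
```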

Is PYTHONPATH consistent across multiple import statements even if some of them manipulate the sys.path?

If your module y changes sys.path, the change is also visible in your A.py script, and it stays in effect even if you execute importlib.reload(sys).

So imagine module 'y' executes

from sys import path
path.clear()

In your A.py script:

import sys, importlib
import x, y

importlib.reload(sys)
print(sys.path) # is []

import z

The module z will not be found.

To fix this you can restore your script sys.path variable to the same value assigned at the beginning by the interpreter.
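One possible way to do that restoring, sketched here as a context manager: take a snapshot of sys.path before the risky import and write it back afterwards. The name preserved_sys_path is hypothetical, and sys.path.clear() stands in for importing a module like y.

```python
import sys
from contextlib import contextmanager


@contextmanager
def preserved_sys_path():
    """Snapshot sys.path on entry and put the same entries back on exit."""
    saved = list(sys.path)
    try:
        yield
    finally:
        # Slice assignment keeps the same list object, so references obtained
        # with "from sys import path" stay valid
        sys.path[:] = saved


original = list(sys.path)
with preserved_sys_path():
    sys.path.clear()          # stand-in for "import y", which empties the path
    emptied = list(sys.path)

restored = sys.path == original
print(emptied, restored)
```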


From the documentation:

A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

And...

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter

Let's assume that the interpreter is not running in interactive mode or reading from stdin (it is executing a script file) and that the script is located in the current working directory.

Our A.py could look like:

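import importlib

import x, y  # y empties sys.path

# We can still import modules that are already loaded (sys, os, site, ...)
import os
import site
import sys

print(sys.path)  # []

sys.path.append(os.getcwd())  # Add the directory where the script is executed
# Add PYTHONPATH: it may be unset or hold several entries separated by os.pathsep
sys.path.extend(p for p in os.environ.get('PYTHONPATH', '').split(os.pathsep) if p)
site.main()  # Add site-packages

import z  # Now this does not fail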

Note: Even after removing all sys.path items, importlib is still able to locate the modules os, site, sys, ...

This is because importlib looks such modules up in sys.modules first:

From the importlib.find_loader documentation (this function was deprecated in Python 3.4 and removed in 3.12):

If the module is in sys.modules, then sys.modules[name].__loader__ is returned

And from the sys.modules documentation:

This is a dictionary that maps module names to modules which have already been loaded.
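A small sketch of that effect, under the assumption that os has already been imported somewhere during startup (it normally has): with sys.path emptied, a module that is already in sys.modules still imports fine, while a module that only exists on disk can no longer be found. The module name uncached_demo_module is made up for the demo.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "uncached_demo_module"  # hypothetical module, never imported here

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, MODULE_NAME + ".py").write_text("VALUE = 1\n")
    saved = list(sys.path)
    try:
        sys.path.append(tmp)
        importlib.invalidate_caches()
        findable_with_path = importlib.util.find_spec(MODULE_NAME) is not None

        sys.path.clear()                        # what module 'y' does
        import os as os_again                   # already in sys.modules
        served_from_cache = os_again is sys.modules["os"]
        sys_is_builtin = "sys" in sys.builtin_module_names
        findable_without_path = importlib.util.find_spec(MODULE_NAME) is not None
    finally:
        sys.path[:] = saved                     # always restore the real path

print(findable_with_path, served_from_cache, sys_is_builtin, findable_without_path)
```

The sys module is a special case on top of the cache: it is listed in sys.builtin_module_names, so it never depends on sys.path at all.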


EDIT:

This is a tricky solution that you can use to solve this problem: you can create a function that is invoked every time a module is imported. The function checks whether sys.path has changed after the module is loaded. If it has, the function sets it back to its original value.

import builtins
import sys
import warnings
from copy import copy

sys.path = list(sys.path)
_original_path = copy(sys.path)  # snapshot taken before any other import
_base_import = builtins.__import__

def _import(*args, **kwargs):
    try:
        return _base_import(*args, **kwargs)
    finally:
        if not isinstance(sys.path, list) or sys.path != _original_path:
            # Restore the path first, so imports triggered by the warning still work
            sys.path = copy(_original_path)
            warnings.warn('System path was modified', Warning)

# Assign through the builtins module: in imported modules __builtins__ may be a dict
builtins.__import__ = _import

And now execute this code:

import sys
from copy import copy

before = copy(sys.path)
import y  # 'y' tries to change sys.path
after = copy(sys.path)

print(before == after)  # True

It will also display a warning message on stderr.


EDIT #2 (Another solution):

This works only on Python >= 3.7 because it relies on PEP 562.

Here I basically replace the module 'sys' so that external modules cannot change the actual sys.path.

First create a script (proxy.py) with the following code:

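import sys as _real_sys  # keep a reference to the real module before it is replaced
from copy import copy

# Private copies that other modules are free to modify
path = copy(_real_sys.path)
modules = copy(_real_sys.modules)

def __getattr__(name):
    # PEP 562: only called for names that are not defined in this module.
    # Looking up sys.modules['sys'] here would return this proxy and recurse.
    return getattr(_real_sys, name)

def __dir__():
    return dir(_real_sys)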

Now, in your A.py, put the following code:

import proxy
import sys
sys.modules['sys'] = proxy

import y  # y runs 'import sys', which now returns the 'proxy' module
# 'y' thinks it is changing sys.path, but it only modifies proxy.path

print(proxy.path)  # []
print(sys.path)  # Unchanged

Code in module y:

import sys
sys.path.clear()  # in effect: proxy.path.clear()

# All other attributes of the sys module are still accessible
print(dir(sys))  # attribute names of the real sys module

