Where is Python's sys.path initialized from?
"Initialized from the environment variable PYTHONPATH, plus an installation-dependent default"
-- http://docs.python.org/library/sys.html#sys.path
What sets up sys.path with Python, and when?
Most of it is set up in Python's site.py, which is automatically imported when the interpreter starts (unless you start it with the -S option). A few paths are set up by the interpreter itself during initialization (you can find out which ones by starting python with -S).
Additionally, some frameworks (like Django, I think) modify sys.path on startup to meet their requirements.
The site module has pretty good documentation and commented source code, and it prints out some information if you run it via python -m site.
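To see which entries come from site.py and which from the interpreter itself, a small sketch can start two fresh interpreters, one normally and one with -S, and compare what they report. It assumes sys.executable points to a working interpreter; PROBE and probe() are my own illustrative names.

```python
import json
import subprocess
import sys

# Child program: report sys.path and whether the site module was imported.
PROBE = "import json, sys; print(json.dumps({'path': sys.path, 'site': 'site' in sys.modules}))"

def probe(*flags):
    """Run a fresh interpreter with the given flags and return its report."""
    out = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

with_site = probe()
without_site = probe("-S")

# Entries that only appear when site.py runs (typically site-packages dirs).
added_by_site = [p for p in with_site["path"] if p not in without_site["path"]]

print("site imported normally:", with_site["site"], "/ with -S:", without_site["site"])
for p in added_by_site:
    print("added by site:", p)
```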
How does the Python interpreter know where to find the sys module when modules are imported?
The sys module is a way to access Python internals, such as object sizes and module load paths.
sys.path: "A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default."
On startup, Python reads the PYTHONPATH environment variable, adds some other built-in paths, and runs your module. When you use import, it just looks the module up in this internal path list.
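A minimal sketch of that lookup, using importlib.util.find_spec to ask the import system where a module would be loaded from. The module name demo_mod_xyz and its temporary directory are hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Create a throwaway directory containing a (hypothetical) module.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "demo_mod_xyz.py"), "w") as f:
    f.write("VALUE = 42\n")

print(importlib.util.find_spec("demo_mod_xyz"))  # None while no sys.path entry provides it

sys.path.append(tmp)           # make the directory searchable
importlib.invalidate_caches()  # path finders may cache directory listings
spec = importlib.util.find_spec("demo_mod_xyz")
print(spec.origin)             # the .py file inside tmp

import demo_mod_xyz
print(demo_mod_xyz.VALUE)      # 42
```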
When you import sys and change sys.path, the change is reflected in Python's internal search path. But it's just an API: had they created add_path (and remove_path, get_path, ...) methods instead, things would have been less magical, but also less natural.
The underlying mechanism is active even when you're not using it. sys.path is a Python-level API, so everyone understands how to change the configuration, but Python doesn't need the sys module to operate.
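Because sys.path is a plain list that the import system scans front to back, the first directory containing a matching module wins. A minimal sketch with two hypothetical temporary directories that both define a module named shadow_demo:

```python
import os
import sys
import tempfile

# Two hypothetical directories that both define a module named 'shadow_demo'.
first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
for directory, label in ((first, "first"), (second, "second")):
    with open(os.path.join(directory, "shadow_demo.py"), "w") as f:
        f.write(f"WHERE = {label!r}\n")

# Ordinary list operations change where imports come from.
sys.path[:0] = [first, second]
import shadow_demo
print(shadow_demo.WHERE)  # first: the earlier entry wins
```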
How exactly is Python2's sys.path set in Windows?
It seems Praveen Gollakota has good info at Troubleshooting python sys.path (reposted here):
The first entry added is C:\WINNT\system32\python27.zip (more details in PEP 273).
The next entries come from the Windows registry. The entries
C:\Python27\DLLs; C:\Python27\lib; C:\Python27\lib\plat-win; C:\Python27\lib\lib-tk
come from HKEY_LOCAL_MACHINE/SOFTWARE/Python/PythonCore/2.7/PythonPath in the registry. More details are in the Python source code comments here: http://svn.python.org/projects/python/trunk/PC/getpathp.c (these entries were the trickiest for me to understand until I found that link).
Next, as explained in the site module documentation, the site-specific entries of sys.path are built from sys.prefix and sys.exec_prefix. On my computer both of them point to C:\Python27, and by default it searches lib/site-packages as well. So the entries C:\Python27; C:\Python27\lib\site-packages are appended to the list above.
Next, it processes each of the .pth files in alphabetical order. I have easy_install.pth, pywin32.pth and setuptools.pth in my site-packages. This is where things start getting weird. It would be straightforward if the entries in the .pth files were just directory locations: they would simply be appended to sys.path line by line. However, easy_install.pth contains some Python code that causes the packages listed in easy_install.pth to be inserted at the beginning of the sys.path list.
After this, the directory entries in pywin32.pth and setuptools.pth are appended to the end of the sys.path list, as expected.
Note: While the above discussion pertains to Windows, it is similar on Mac and other platforms. On Mac it just adds different OS defaults (like darwin, etc.) before it starts looking in the site-packages directory for .pth files.
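To see the two kinds of .pth lines side by side, here is a minimal sketch using site.addsitedir(), which appends a directory to sys.path and then processes its .pth files in sorted order. The directory names and the a.pth/b.pth files are hypothetical; b.pth imitates the easy_install.pth trick with a line that starts with import and is therefore executed instead of being treated as a path.

```python
import os
import site
import sys
import tempfile

# A throwaway "site" directory (hypothetical layout).
sitedir = tempfile.mkdtemp()
extra_a = os.path.join(sitedir, "extra_a")
extra_b = os.path.join(sitedir, "extra_b")
os.mkdir(extra_a)
os.mkdir(extra_b)

# a.pth: a plain directory line, resolved relative to sitedir and appended.
with open(os.path.join(sitedir, "a.pth"), "w") as f:
    f.write("extra_a\n")

# b.pth: a line starting with "import" is executed, easy_install.pth style.
with open(os.path.join(sitedir, "b.pth"), "w") as f:
    f.write(f"import sys; sys.path.insert(0, {extra_b!r})\n")

site.addsitedir(sitedir)  # appends sitedir, then processes *.pth in sorted order

print(sys.path[0] == extra_b)               # the executed line put extra_b first
print(sys.path[-2:] == [sitedir, extra_a])  # sitedir, then the plain entry
```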
Difference between $PATH, sys.path and os.environ
This is actually more complicated than it might seem. It's unclear from the question whether you understand the Linux/macOS $PATH environment variable, so let's start there. The $PATH variable (in Python you can access the system environment variables through os.environ) holds the current user's $PATH as defined in various shell profile and environment files. It typically contains entries like "/usr/bin" and other places where programs are installed. For example, when you type "ls" into the system shell, the shell searches the directories in $PATH for a program named "ls", so what actually gets executed is probably something like "/usr/bin/ls". I've included additional reading below.
sys.path, on the other hand, is constructed by Python when the interpreter starts, based on a number of things. The first sentence of its documentation reads: "A list of strings that specifies the search path for modules. Initialized from the environment variable $PYTHONPATH, plus an installation-dependent default." The installation-dependent portion typically includes the location of the installation's site-packages. $PYTHONPATH is another environment variable (like $PATH) that you can set to add locations to the module search path, and it is set the same way as the system $PATH.
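A sketch contrasting the two: $PATH is just a string in os.environ, split on os.pathsep, while sys.path is a list the interpreter builds at startup. Because PYTHONPATH is only read at startup, the example sets it for a child interpreter started with subprocess; the temporary directory stands in for a hypothetical extra module location.

```python
import os
import subprocess
import sys
import tempfile

# $PATH: where the shell looks for *programs*; a plain string in os.environ.
for d in os.environ.get("PATH", "").split(os.pathsep)[:3]:
    print("PATH entry:", d)

# sys.path: where the *interpreter* looks for modules; a Python list.
print("first sys.path entries:", sys.path[:3])

# PYTHONPATH only affects interpreters started after it is set,
# so set it in the environment of a child process.
extra = tempfile.mkdtemp()
env = dict(os.environ, PYTHONPATH=extra)
child_path = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env, capture_output=True, text=True, check=True,
).stdout.splitlines()

# The directory should appear in the child's sys.path.
print(extra in child_path)
```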
If you have non-installed sources (i.e. Python files that you want to run from outside the site-packages directory), you typically need to manipulate sys.path, either directly in your scripts or by adding the location to the $PYTHONPATH environment variable, so that the interpreter knows where to find your modules. Alternatively, you can use .pth files to manipulate the module search path.
This is just a basic overview; I hope you read the docs for a better understanding.
Sources
- Linux $PATH variable information
- Python sys.path
- Python site.py
Troubleshooting python sys.path
I had some issues recently with sys.path, and here is how I went about figuring out where its entries were coming from. I was able to track down every entry and its source. Hopefully this will help you too.
The first entry added is C:\WINNT\system32\python27.zip (more details in PEP 273).
The next entries come from the Windows registry. The entries
C:\Python27\DLLs; C:\Python27\lib; C:\Python27\lib\plat-win; C:\Python27\lib\lib-tk
come from HKEY_LOCAL_MACHINE/SOFTWARE/Python/PythonCore/2.7/PythonPath in the registry. More details are in the Python source code comments here: http://svn.python.org/projects/python/trunk/PC/getpathp.c (these entries were the trickiest for me to understand until I found that link).
Next, as explained in the site module documentation, the site-specific entries of sys.path are built from sys.prefix and sys.exec_prefix. On my computer both of them point to C:\Python27, and by default it searches lib/site-packages as well. So the entries C:\Python27; C:\Python27\lib\site-packages are appended to the list above.
Next, it processes each of the .pth files in alphabetical order. I have easy_install.pth, pywin32.pth and setuptools.pth in my site-packages. This is where things start getting weird. It would be straightforward if the entries in the .pth files were just directory locations: they would simply be appended to sys.path line by line. However, easy_install.pth contains some Python code that causes the packages listed in easy_install.pth to be inserted at the beginning of the sys.path list.
After this, the directory entries in pywin32.pth and setuptools.pth are appended to the end of the sys.path list, as expected.
Note: While the above discussion pertains to Windows, it is similar on Mac and other platforms. On Mac it just adds different OS defaults (like darwin, etc.) before it starts looking in the site-packages directory for .pth files.
In your case, you can start a Python shell, check where sys.prefix and sys.exec_prefix point, and drill down from there.
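A small diagnostic sketch for that drill-down, using only documented attributes (sys.prefix, sys.exec_prefix, site.getsitepackages()). Note that site.getsitepackages() may be missing in environments created by old virtualenv versions, which shipped their own site.py.

```python
import glob
import os
import site
import sys

print("sys.prefix      :", sys.prefix)
print("sys.exec_prefix :", sys.exec_prefix)

# Site-packages directories derived from the prefixes.
for sp in site.getsitepackages():
    print("site-packages   :", sp)
    # .pth files are processed in sorted (alphabetical) order.
    for pth in sorted(glob.glob(os.path.join(sp, "*.pth"))):
        print("    .pth file   :", os.path.basename(pth))

# Finally, the resulting search path, one numbered entry per line.
for i, entry in enumerate(sys.path):
    print(f"{i:2d} {entry!r}")
```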
Note 2: If you are using an IDE such as Aptana/PyDev, it will add configuration of its own, so you need to be wary of that.
Is PYTHONPATH consistent across multiple import statements even if some of them manipulate the sys.path?
If your module y changes sys.path, your A.py script sees the same (modified) value, even if you execute importlib.reload(sys).
So imagine module 'y' executes
from sys import path
path.clear()
In your A.py script:
import sys, importlib
import x, y
importlib.reload(sys)
print(sys.path) # is []
import z
The module z will not be found.
To fix this, you can restore your script's sys.path to the value the interpreter assigned at startup.
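If you can run code before y is imported, the simplest version of this is to snapshot sys.path and put it back afterwards. A minimal sketch; y_demo and z_demo are hypothetical stand-ins for y and z, written to a temporary directory. Slice assignment (sys.path[:] = saved) is used instead of rebinding, so anything that holds a reference to the list, such as a `from sys import path` alias, sees the restored entries too.

```python
import os
import sys
import tempfile

# Hypothetical modules: y_demo empties sys.path in place, z_demo is harmless.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "y_demo.py"), "w") as f:
    f.write("from sys import path\npath.clear()\n")
with open(os.path.join(tmp, "z_demo.py"), "w") as f:
    f.write("OK = True\n")
sys.path.insert(0, tmp)

saved = sys.path[:]   # snapshot taken before the misbehaving import
import y_demo         # clears the single list that sys.path refers to

try:
    import z_demo
except ModuleNotFoundError:
    print("z_demo not found while sys.path is empty")

sys.path[:] = saved   # restore in place, so every alias of the list sees it
import z_demo         # found again
print(z_demo.OK)
```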
From the documentation:
A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
And...
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.
Let's assume that the interpreter is not running in interactive mode or reading from stdin (it is executing a script file), and that the script is located in the current working directory.
Our A.py could look like:
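import importlib
import x, y
# We can still import sys, os, site, ... because they are already in sys.modules
import os
import site
import sys
from sys import path
from os import getcwd
print(sys.path) # []
path.append(getcwd()) # Add directory where the script is executed
pythonpath = os.environ.get('PYTHONPATH')
if pythonpath: # Add PYTHONPATH entries, if the variable is set
    path.extend(pythonpath.split(os.pathsep))
site.main() # Add site-packages (this does not re-add the standard library directories)
import z # Now this doesn't fail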
Note: Even after removing all sys.path items, importlib is able to locate the packages os, site, sys, ... This is because importlib uses sys.modules to access such packages:
From the importlib.find_loader documentation (deprecated since Python 3.4 in favor of importlib.util.find_spec):
If the module is in sys.modules, then sys.modules[name].loader is returned
And from the sys.modules documentation:
This is a dictionary that maps module names to modules which have already been loaded.
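The sys.modules shortcut can be checked directly. A minimal sketch, with hypothetical modules cached_mod_demo and uncached_mod_demo written to a temporary directory: the module imported before sys.path is wiped is still importable afterwards, while the one that was never imported is not.

```python
import os
import sys
import tempfile

# Two hypothetical modules in a temporary directory.
tmp = tempfile.mkdtemp()
for name in ("cached_mod_demo", "uncached_mod_demo"):
    with open(os.path.join(tmp, name + ".py"), "w") as f:
        f.write("LOADED = True\n")

sys.path.insert(0, tmp)
import cached_mod_demo            # found via sys.path, then stored in sys.modules

saved_path = sys.path[:]
sys.path.clear()                  # simulate a module wiping the search path
try:
    import cached_mod_demo        # still works: served from sys.modules
    print("cached import ok")
    try:
        import uncached_mod_demo  # never loaded, so it needs sys.path
    except ModuleNotFoundError:
        print("uncached import fails")
finally:
    sys.path[:] = saved_path      # restore the search path in place
```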
EDIT:
This is a tricky solution that you can use to solve this problem: you can create a function that is invoked every time you load a module. The function checks whether sys.path was changed after the module was loaded. If so, it resets sys.path to its original value.
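import builtins
import sys
import warnings
from copy import copy
sys.path = list(sys.path)
_original_path = copy(sys.path)
_base_import = builtins.__import__
def _import(*args, **kwargs):
    try:
        return _base_import(*args, **kwargs)
    finally:
        # After every import, check whether sys.path was replaced or modified
        if type(sys.path) is not list or sys.path != _original_path:
            # Restore path first, so imports done by warnings itself don't re-trigger this
            sys.path = copy(_original_path)
            warnings.warn('System path was modified', Warning)
# Use the builtins module: __builtins__ is the module only in __main__ (it may be a dict elsewhere)
builtins.__import__ = _import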
And now execute this code (in the same script, after the code above):
import sys
from copy import copy
before = copy(sys.path)
import y # 'y' tries to change sys.path
after = copy(sys.path)
print(before == after) # True
It will also display a warning message on stderr.
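A less global variant, which is my own suggestion rather than part of the answer: a context manager that snapshots sys.path and restores it on exit, covering both in-place mutation (path.clear()) and rebinding (sys.path = [...]). The name preserved_sys_path is hypothetical.

```python
import contextlib
import sys

@contextlib.contextmanager
def preserved_sys_path():
    """Restore sys.path on exit, whether it was mutated in place or rebound."""
    original = sys.path         # the list object in use on entry
    snapshot = original[:]      # its contents on entry
    try:
        yield
    finally:
        original[:] = snapshot  # undo in-place edits (clear, append, ...)
        sys.path = original     # undo rebinding (sys.path = [...])

before = list(sys.path)
before_id = id(sys.path)
with preserved_sys_path():
    sys.path.clear()            # stand-in for a module that wipes the path
    sys.path = ["/nowhere"]     # ...or one that replaces the list

print(sys.path == before, id(sys.path) == before_id)
```

In A.py you would wrap only the risky import, e.g. `with preserved_sys_path(): import y`, instead of overriding builtins.__import__ for every import in the process.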
EDIT #2 (Another solution):
This works only on Python >= 3.7 because it relies on PEP 562.
Here I basically replace the 'sys' module so that external modules can't change the actual sys.path.
First, create a script with the following code (proxy.py):
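import sys as _real_sys
from copy import copy
# Private copies: code that gets this module as 'sys' mutates these instead
path = copy(_real_sys.path)
modules = copy(_real_sys.modules)
def __getattr__(name):
    # Only called for names not defined above (PEP 562): forward to the real sys.
    # (importlib.import_module('sys') here would return this proxy once it is
    # installed in sys.modules, and recurse forever.)
    return getattr(_real_sys, name)
def __dir__():
    return dir(_real_sys)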
Now, in your A.py, put the following code:
import proxy
import sys
sys.modules['sys'] = proxy
import y # y imports 'sys', but 'import sys' now returns the 'proxy' module
# 'y' thinks it changes sys.path, but it only modifies proxy.path
print(proxy.path) # []
print(sys.path) # Unchanged
Code in module y:
import sys
sys.path.clear() # i.e. proxy.path.clear()
# You can still access all attributes of the sys module
print(dir(sys)) # same names as dir() of the real sys module
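The same PEP 562 idea can be tried without separate files. A self-contained sketch, assuming an in-memory module built with types.ModuleType stands in for proxy.py and an exec()'d string stands in for module y; sys_proxy_demo is a hypothetical name. The finally block puts the real sys back so that later imports are unaffected.

```python
import sys
import types

# Build a stand-in for proxy.py in memory.
proxy = types.ModuleType("sys_proxy_demo")
proxy.path = list(sys.path)                          # private copy of the search path
proxy.__getattr__ = lambda name: getattr(sys, name)  # PEP 562 fallback to the real sys

real_sys = sys.modules["sys"]
sys.modules["sys"] = proxy
ns = {}
try:
    # Stand-in for module 'y': its "import sys" now yields the proxy.
    exec("import sys\nsys.path.clear()\nplatform = sys.platform", ns)
finally:
    sys.modules["sys"] = real_sys                    # always put the real module back

print(proxy.path)      # [] : only the copy was cleared
print(len(sys.path))   # the real search path is untouched
```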