PYTHONPATH vs. sys.path

If the only reason to modify the path is to let developers work from their working tree, use an installation tool to set up the environment instead. virtualenv is very popular, and if you are using setuptools, you can simply run python setup.py develop to semi-install the working tree in your current Python installation.

sys.path vs. $PATH

You can read environment variables through the os.environ dictionary:

import os

my_path = os.environ['PATH']

As for finding where a package is installed, that depends on whether it is installed in a directory on PATH.

Difference between $PATH, sys.path and os.environ

This is actually more complicated than it seems. It's unclear from the question whether you understand the Linux/macOS $PATH environment variable, so let's start there. $PATH (which, like every system environment variable, is accessible in Python through os.environ) holds the current user's executable search path, as defined in various shell profile and environment files. It typically contains directories like "/usr/bin" and other places where programs are installed. For example, when you type "ls" into the system shell, the underlying system searches the directories in $PATH for a program named "ls", so what actually gets executed is probably something like "/usr/bin/ls". I've included additional reading below.
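To make that lookup concrete, here is a rough sketch of a shell-style search over $PATH written in Python. The helper name find_on_path is made up for this illustration, and it ignores platform details such as PATHEXT on Windows; the standard library's shutil.which is the real tool for this job.

```python
import os
import shutil
import sys

def find_on_path(program, path=None):
    """Return the first executable named `program` found on PATH, or None.

    Hypothetical helper for illustration; ignores Windows PATHEXT handling.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # A match must be a regular file that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# Search only the directory that holds the running interpreter
interp_dir, interp_name = os.path.split(sys.executable)
print(find_on_path(interp_name, path=interp_dir))
print(shutil.which(interp_name, path=interp_dir))
```

Calling find_on_path("ls") with no path argument would walk the real $PATH in the same way.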

sys.path, on the other hand, is constructed by Python when the interpreter starts, based on a number of things. The first sentence of its help page reads: "A list of strings that specifies the search path for modules. Initialized from the environment variable $PYTHONPATH, plus an installation-dependent default." The installation-dependent portion typically includes the location of Python's site-packages. $PYTHONPATH is another environment variable (like $PATH) that can be set to add module search locations, and it is set the same way as the system $PATH.
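One way to see the link between the two is to start a fresh interpreter with $PYTHONPATH set and inspect its sys.path. The sketch below does that with a temporary directory; the helper name child_sys_path is hypothetical, and it assumes the child interpreter is not told to ignore environment variables.

```python
import json
import os
import subprocess
import sys
import tempfile

def child_sys_path(pythonpath):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

with tempfile.TemporaryDirectory() as extra_dir:
    paths = child_sys_path(extra_dir)
    # Compare resolved paths so symlinked temp directories still match
    found = any(
        os.path.realpath(p) == os.path.realpath(extra_dir) for p in paths if p
    )
    print(found)
```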

If you have non-installed sources (i.e. Python files that you want to run from outside the site-packages directory), you typically need to manipulate sys.path directly in your scripts or add the location to the $PYTHONPATH environment variable so the interpreter knows where to find your modules. Alternatively, you could use .pth files to manipulate the module search path.
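As a small sketch of the .pth route: the site module's addsitedir function adds a directory to sys.path and processes any .pth files inside it, where each line names one more directory to add. The directory layout and the module name demo_uninstalled_mod below are invented for the example.

```python
import importlib
import os
import site
import sys
import tempfile

# Hypothetical layout: src_dir holds the uninstalled module,
# pth_dir holds the .pth file that points at it
src_dir = tempfile.mkdtemp()
pth_dir = tempfile.mkdtemp()

with open(os.path.join(src_dir, "demo_uninstalled_mod.py"), "w") as f:
    f.write("GREETING = 'hello from the working tree'\n")

# Each line of a .pth file names a directory to add to sys.path
with open(os.path.join(pth_dir, "demo_extra.pth"), "w") as f:
    f.write(src_dir + "\n")

site.addsitedir(pth_dir)        # adds pth_dir and processes its .pth files
importlib.invalidate_caches()   # make sure the new files are noticed

import demo_uninstalled_mod
print(demo_uninstalled_mod.GREETING)
```

In a real setup the .pth file would normally live in site-packages, where it is processed automatically at startup.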

This is just a basic overview; I recommend reading the docs for a better understanding.

Sources

  • Linux $PATH variable information
  • Python sys.path
  • Python site.py

PYTHONPATH vs. sys.path (RELOADED)

The better way of doing this now is to use pip install with the -e option.

pip install -e .

It takes the directory that contains the setup.py file; the "." means the current directory. This works the same way as the setuptools develop command.

I believe that develop creates an egg link in your site-packages folder which points to the folder of the library. http://pythonhosted.org/setuptools/setuptools.html#develop-deploy-the-project-source-in-development-mode

python setup.py develop

I believe this is why you get the absolute path. There may be a conflict with a develop link and an install. Things could have also been moved.

For double-clicking, just add something that checks sys.argv: if there is no value for sys.argv[1], append build, install, or develop.
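A minimal sketch of that check, assuming you want develop as the fallback; the helper name with_default_command is made up for this example.

```python
import sys

def with_default_command(argv, default="develop"):
    """Return argv unchanged if a command was given, else append `default`.

    Hypothetical helper; `default` would be build, install, or develop.
    """
    if len(argv) < 2:
        return list(argv) + [default]
    return list(argv)

print(with_default_command(["setup.py"]))
print(with_default_command(["setup.py", "install"]))
```

In a setup.py you would assign the result back with sys.argv = with_default_command(sys.argv) before calling setup().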

In addition, I've always heard that you should import the module and then call its functions through it (from package import lib, then lib.foo()), so that you know where each function came from. I believe the import does the same thing either way, but this may clean up your imports. Python pathing and packaging can be a pain.

from package import lib
lib.foo()

PYTHONPATH sys.path difference

There are several reasons why a path may not show up in sys.path. Make sure you don't hit one of these:

  • The path must exist; non-existent paths are ignored. From the PYTHONPATH documentation:

    Non-existent directories are silently ignored.

  • Duplicates are removed (the first entry is kept); paths are made absolute (relative to the current working directory) and compared case-insensitively on platforms where this matters.

    So if you have a relative path that comes down to the same absolute path in your sys.path, only the first entry is kept.

  • After normalization and cleanup, the site module tries to import the sitecustomize and usercustomize modules. These could manipulate sys.path too.
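The duplicate-removal rule can be checked with a quick experiment: list the same directory twice in PYTHONPATH, start a fresh interpreter, and count how often it appears in sys.path. This is only a sketch, and it assumes the site module is not disabled (for example with -S).

```python
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # The same directory appears twice in PYTHONPATH
    env = dict(os.environ, PYTHONPATH=os.pathsep.join([extra_dir, extra_dir]))
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, check=True, capture_output=True, text=True,
    ).stdout
    target = os.path.normcase(os.path.abspath(extra_dir))
    # Count the sys.path entries that refer to the temporary directory
    occurrences = sum(
        1 for p in json.loads(out)
        if p and os.path.normcase(os.path.abspath(p)) == target
    )
    print(occurrences)
```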

To take a closer look at your sys.path right after this cleanup, and to see whether a usercustomize module will be imported, run the site module as a command line tool:

python -m site

It'll print out your sys.path in a readable one-line-per-entry format.

which python vs PYTHONPATH

You're mixing up two environment variables:

  • PATH is where which looks up executables when they're accessed by name only. This variable is a list of directories containing executables (colon- or semicolon-separated, depending on the platform). It is not Python-specific. which python just searches this variable and prints the full path.
  • PYTHONPATH is a Python-specific list of directories (colon- or semicolon-separated, like PATH) where Python looks for packages that aren't installed directly in the Python distribution. The name and format are very close to the system/shell PATH variable on purpose, but it isn't used by the operating system at all, just by Python.

Why use sys.path.append(path) instead of sys.path.insert(1, path)?

If you have multiple versions of a package / module, you need to be using virtualenv (emphasis mine):

virtualenv is a tool to create isolated Python environments.

The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.

Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.

Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.

In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).

That's why people consider sys.path.insert(0, path) to be wrong: it's an incomplete, stopgap solution to the problem of managing multiple environments.

Is PYTHONPATH consistent across multiple import statements even if some of them manipulate the sys.path?

If your module y changes sys.path, the change is also visible in your A.py script, even if you execute importlib.reload(sys).

So imagine module 'y' executes

from sys import path
path.clear()

In your A.py script:

import sys, importlib
import x, y

importlib.reload(sys)
print(sys.path) # is []

import z

The module z will not be found.

To fix this, you can restore sys.path in your script to the value the interpreter assigned at startup.
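One possible shape for such a restore step is a small context manager that snapshots sys.path and puts it back afterwards. This is only a sketch; the name preserved_sys_path is hypothetical, and the sys.path.clear() call stands in for importing a module like y.

```python
import contextlib
import sys

@contextlib.contextmanager
def preserved_sys_path():
    """Restore sys.path to its entry value when the block exits (hypothetical helper)."""
    saved = list(sys.path)
    try:
        yield
    finally:
        # Restore in place so existing references to the list see the old value
        sys.path[:] = saved

with preserved_sys_path():
    sys.path.clear()    # stands in for `import y`, which clears the path
    print(sys.path)     # the path is empty inside the block

print(len(sys.path) > 0)
```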


From the documentation:

A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

And...

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter

Let's assume that the interpreter is not running in interactive mode or reading from stdin (it's executing a script file) and that the script is located in the current working directory.

Our A.py could look like:

import importlib

import x, y  # 'y' clears sys.path

# We can still load already-imported modules (sys, os, site, ...)
import os
import site
import sys

print(sys.path)  # []

sys.path.append(os.getcwd())  # Add the directory where the script is executed
# Add the PYTHONPATH entries, if the variable is set
sys.path.extend(p for p in os.environ.get('PYTHONPATH', '').split(os.pathsep) if p)
site.main()  # Add site-packages

import z  # Now this does not fail

Note: Even after all sys.path items are removed, importlib is still able to locate the packages os, site, sys, ...

This is because importlib uses sys.modules to access such packages:

From importlib.find_loader documentation:

If the module is in sys.modules, then sys.modules[name].loader is returned

And from the sys.modules documentation:

This is a dictionary that maps module names to modules which have already been loaded.
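The effect of that cache can be seen with a short experiment. The sketch below assumes a standard CPython install in which colorsys is an ordinary .py file found through sys.path; it empties sys.path temporarily and then tries one cached and one uncached import.

```python
import json  # make sure 'json' is cached in sys.modules
import sys

saved_path = list(sys.path)
sys.modules.pop("colorsys", None)   # make sure this module is not cached
sys.path.clear()
try:
    import json as json_again       # served from sys.modules, no path search
    cached_ok = json_again is sys.modules["json"]
    try:
        import colorsys             # not cached, so it needs sys.path
        uncached_ok = True
    except ImportError:
        uncached_ok = False
finally:
    sys.path[:] = saved_path        # always put the search path back

print(cached_ok, uncached_ok)
```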


EDIT:

Here is a tricky solution to this problem: you can create a function that is invoked every time a module is loaded. The function checks whether sys.path changed while the module was loading and, if so, sets it back to its original value.

from copy import copy
import builtins
import warnings
import sys

sys.path = list(sys.path)
_original_path = copy(sys.path)
_base_import = builtins.__import__

def _import(*args, **kwargs):
    try:
        module = _base_import(*args, **kwargs)
        return module
    finally:
        if not isinstance(sys.path, list) or sys.path != _original_path:
            warnings.warn('System path was modified', Warning)
            # Restore path
            sys.path = copy(_original_path)

# Install the wrapper through the builtins module, which works in any module
builtins.__import__ = _import

And now execute this code:

import sys

before = copy(sys.path)
import y # 'y' tries to change sys.path
after = copy(sys.path)

print(before == after) # True

It will also display a warning message on stderr.


EDIT #2 (Another solution):

This works only on Python >= 3.7 because it relies on PEP 562.

Here I basically replace the 'sys' module so that external modules cannot change the actual sys.path.

First create a script with the following code (proxy.py):

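import sys as _real_sys  # keep a reference to the real module before it is replaced
from copy import copy

path = copy(_real_sys.path)
modules = copy(_real_sys.modules)

def __getattr__(name):
    # Called only for names not defined in this module (PEP 562);
    # forward them to the real sys module. Looking 'sys' up again through
    # importlib here would return this proxy and recurse forever.
    return getattr(_real_sys, name)

def __dir__():
    return dir(_real_sys)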

Now, in your A.py, put the following code:

import proxy
import sys
sys.modules['sys'] = proxy

import y # y imports 'sys', but import sys returns the 'proxy' module
# 'y' thinks it changes sys.path, but it only modifies proxy.path

print(proxy.path) # []
print(sys.path) # Unchanged

Code on y module:

import sys
sys.path.clear() # a.k.a. proxy.path.clear()

# You can still access all attributes of the sys module
print(dir(sys)) # ['ps1', 'ps2', 'platform', ...]

Effect of using sys.path.insert(0, path) and sys.path.append(path) when loading modules

Because Python checks the directories in sequential order, starting at the first directory in the sys.path list, until it finds the .py file it was looking for.

Ideally, the current directory or the directory of the script is always the first element in the list, unless you modify it, like you did. From the documentation:

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.

So, most probably, you had a .py file with the same name as the module you were trying to import from, in the current directory (where the script was being run from).
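The search order is easy to reproduce. In this sketch two temporary directories each contain a module with the same name (demo_shadowed, a name invented for the example); one directory is appended to sys.path and the other is inserted at position 0.

```python
import importlib
import os
import sys
import tempfile

def make_dir_with_module(value):
    """Create a temp dir containing demo_shadowed.py that sets WHO to `value`."""
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "demo_shadowed.py"), "w") as f:
        f.write("WHO = %r\n" % value)
    return d

appended = make_dir_with_module("appended")
inserted = make_dir_with_module("inserted")

sys.path.append(appended)       # searched last
sys.path.insert(0, inserted)    # searched first
importlib.invalidate_caches()

import demo_shadowed
print(demo_shadowed.WHO)        # the copy from the inserted directory wins
```

The same mechanism explains the problem above: a file earlier in sys.path hides every later module with the same name.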

Also, a thing to note about ImportErrors: let's say the import error says
ImportError: No module named main. That doesn't mean main.py was overwritten; if it had been overwritten, we would not have any trouble reading it. It's some module above it that got overwritten with a .py or some other file.

Example -

My directory structure looks like this:

 - test
   - shared
     - __init__.py
     - phtest.py
   - testmain.py

Now, from testmain.py, I call from shared import phtest, and it works fine.

Now let's say I introduce a shared.py in the test directory, for example:

 - test
   - shared
     - __init__.py
     - phtest.py
   - testmain.py
   - shared.py

Now when I try to do from shared import phtest from testmain.py, I get the error:

ImportError: cannot import name 'phtest'

As you can see above, the file that is causing the issue is shared.py, not phtest.py.


