Finding the source code for built-in Python functions?
Since Python is open source, you can read the source code.
To find out which file a particular module or function is implemented in, you can usually print the module's __file__ attribute. Alternatively, you can use the inspect module; see the section "Retrieving Source Code" in the documentation of inspect.
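As a minimal sketch of both approaches, the snippet below uses the standard-library textwrap module as an arbitrary pure-Python example (the choice of module is mine, not part of the original answer):

```python
import inspect
import textwrap

# A module written in Python exposes the path of its source file.
print(textwrap.__file__)

# inspect can map a pure-Python function back to its file and source text.
source_file = inspect.getsourcefile(textwrap.dedent)
source_text = inspect.getsource(textwrap.dedent)
print(source_file)
print(source_text.splitlines()[0])
```

The printed paths depend on where Python is installed on your machine.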
For built-in classes and methods this is not so straightforward, since inspect.getfile and inspect.getsource raise a TypeError stating that the object is built-in. However, many of the built-in types can be found in the Objects sub-directory of the Python source tree. For example, the enumerate class is implemented in Objects/enumobject.c and the list type in Objects/listobject.c.
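To see the difference for yourself, something like the following should work; describe_source is a hypothetical helper name used only for this illustration:

```python
import inspect

def describe_source(obj):
    """Return the source file of obj, or the TypeError message for built-ins."""
    try:
        return inspect.getfile(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(describe_source(inspect.getsource))  # pure-Python function: a file path
print(describe_source(len))                # built-in function: TypeError
print(describe_source(list))               # built-in class: TypeError
```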
Where to find the source code of built-in map() function in Python
Built-in functions are not written in Python, but in C. What you see in your IDE is only a placeholder, not the real implementation.
The CPython implementation of map is in Python/bltinmodule.c.
You will notice that since map is a class, its methods are defined first. For example, if you are interested in map.__next__, look for the definition of map_next. After those method definitions, you can find the definition of the map type object (PyMap_Type).
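From the Python side you can confirm that map is a class whose instances are iterators. This is only an illustrative sketch of the behaviour, not a view into the C code:

```python
# map is a type (class), so calling it builds an iterator object.
print(type(map))
squares = map(lambda x: x * x, [1, 2, 3])
print(type(squares))

# __next__ is the Python-level name of the slot that map_next fills in C.
first = squares.__next__()
rest = list(squares)
print(first, rest)
```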
Where to find python source code for built in open()
With Python on GitHub, have a look at:
- https://github.com/python/cpython/blob/master/Lib/io.py#L59
- https://github.com/python/cpython/blob/master/Lib/_pyio.py#L40
- https://github.com/python/cpython/blob/master/Python/fileutils.c#L989
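If you have a local Python 3 installation, a quick check along these lines shows how the pieces relate: the built-in open is the same object as io.open, and _pyio is a pure-Python module whose file you can open and read.

```python
import io
import _pyio

# The built-in open() and io.open are the same object (implemented in C).
same_object = open is io.open
print(same_object)

# _pyio is the pure-Python counterpart of the io machinery; its source
# file sits in the standard library directory and can be read directly.
pyio_path = _pyio.__file__
print(pyio_path)
```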
How can I find the location of the source code of a built-in Python method?
You can usually find the source files for core Python modules in the Python installation folder itself. For instance, on Linux, I can find the source code for the os module, which is a quite popular Python module, at this location:
/usr/lib/python2.7/os.py
On Windows, this is generally C:\python27\lib, but you can check where your interpreter is installed by running which python on Linux or where python on Windows.
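You can also ask the interpreter itself rather than the shell. A small sketch using only standard-library calls (the printed paths will differ per installation):

```python
import os
import sys
import sysconfig

executable = sys.executable                 # the interpreter running this code
stdlib_dir = sysconfig.get_path("stdlib")   # directory holding the standard library
os_source = os.__file__                     # file the os module was loaded from

print(executable)
print(stdlib_dir)
print(os_source)
```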
How len() function is implemented in python 3 and how to find the source code of built in functions in python?
Question 0
I got curious if I write len(nums) in while will my program do more computations.
One aspect of this is documented in the Python wiki's TimeComplexity page for all built-in data structures: len() for a list is O(1).
If you mean something along the lines of "will my program be faster if I do n = len(nums), then manually subtract 1 from n each time I remove from the list", then that's a whole other question, and the answer is likely (measure it!) to be (perhaps somewhat unintuitively) "no", since len() is implemented in C (fast!) and interpreting Python code (n -= 1) and executing it is slower.
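One way to measure it is with timeit. The two functions below are hypothetical stand-ins for the two loop styles being compared; the list size and repeat count are arbitrary, and the timings will vary by machine and Python version:

```python
import timeit

def drain_with_len(n):
    """Empty a list, calling len() on every iteration."""
    nums = list(range(n))
    while len(nums) > 0:
        nums.pop()

def drain_with_counter(n):
    """Empty a list, tracking the remaining length by hand."""
    nums = list(range(n))
    remaining = len(nums)
    while remaining > 0:
        nums.pop()
        remaining -= 1

t_len = timeit.timeit(lambda: drain_with_len(10_000), number=20)
t_counter = timeit.timeit(lambda: drain_with_counter(10_000), number=20)
print(f"len() each iteration: {t_len:.4f}s")
print(f"manual counter:       {t_counter:.4f}s")
```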
Question 1
How to look for the source code of builtin functions without manually searching the source code on GitHub?
As prerequisites, you will need to
- know how to read C and understand the control flow
- be able to keep track of the call graph (in your head, in a text file, on a notepad)
- have an intuition of where to start looking in the source code
GitHub's source code search is, well, passable, but you'll have a better time downloading the source and using a better IDE to jump around in the code.
For built-in functions in modules, I'd start searching for e.g. mathmodule.c for the math functions, etc.
For implementations of objects, there's e.g. listobject.c. It's fairly logical (most of the time).
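If you have downloaded the source and do not have an IDE at hand, a rough grep-like sketch in Python can stand in for the search. Both find_in_c_sources and the "cpython" checkout path are hypothetical names chosen for this example:

```python
import os
import re

def find_in_c_sources(root, name):
    """Yield (path, line_number, line) for lines in .c/.h files mentioning name."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if pattern.search(line):
                        yield path, lineno, line.rstrip()

if __name__ == "__main__":
    # "cpython" is an assumed path to a local checkout of the source tree.
    for path, lineno, line in find_in_c_sources("cpython", "builtin_len"):
        print(f"{path}:{lineno}: {line}")
```

If the directory does not exist, the loop simply finds nothing.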
Question 2
- You already found builtin_len.
- You can see it calls PyObject_Size. That's defined in Objects/abstract.c.
- It does PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;, i.e. grabs a pointer to the type header of the object, and the "slot" (not to be confused with the Python userland slots) of the sequence-related methods for the object.
- If that method collection contains a valid sq_length() function, it is called: Py_ssize_t len = m->sq_length(o); If that length is valid, it is returned, and len() wraps the bare Py_ssize_t into a Python long object and passes it to you.
- If that fails, PyMapping_Size gets called.
- It does a similar thing as the sq_length stuff, only using mapping methods, tp_as_mapping and mp_length.
- If all that fails, a TypeError is raised using the type_error() helper.
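The three outcomes of the steps above can be observed from Python. This is only a behavioural sketch; try_len is a hypothetical helper, and the exact TypeError wording may differ between versions:

```python
def try_len(obj):
    """Return len(obj), or the TypeError message if obj has no length."""
    try:
        return len(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(try_len([1, 2, 3]))   # list: handled via the sequence slot
print(try_len({"a": 1}))    # dict: handled via the mapping slot
print(try_len(42))          # int: neither slot is filled, so TypeError
```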
Here in listobject.c, you can see how list_length() is hooked up to be sq_length for list objects.
That function only calls Py_SIZE() [https://docs.python.org/3/c-api/structures.html#c.Py_SIZE], which is a macro to access the ob_size field which all PyVarObjects have.
The documentation on how Python's list objects use ob_size is here.
As for how a custom type with __len__ hooks up into all of this, my recollection is that objects with __len__ will have their sq_length() call the Python callable, if one exists, and that value is then "trampolined" back through the C code to your Python code.
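A small sketch of that round trip, using two made-up classes (Basket and NegativeLength are illustration names only). It also shows that the C side validates what __len__ returns before handing it back:

```python
class Basket:
    """A custom container whose length comes from a Python-level __len__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

class NegativeLength:
    """Returns an invalid length to show the check done on the C side."""
    def __len__(self):
        return -1

basket_size = len(Basket(["apple", "pear"]))
print(basket_size)

try:
    len(NegativeLength())
    negative_rejected = False
except ValueError as exc:
    negative_rejected = True
    print(f"ValueError: {exc}")
```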
Finding the source code of methods implemented in C?
No, there is not. There is no metadata accessible from Python that will let you find the original source file. Such metadata would have to be created explicitly by the Python developers, without a clear benefit as to what that would achieve.
First and foremost, the vast majority of Python installations do not include the C source code. Next, while you could conceivably expect users of the Python language to be able to read Python source code, Python's userbase is very broad, and a large number either do not know C or are not interested in how the C code works. Finally, even developers who know C can't be expected to read the Python C API documentation, something that quickly becomes a requirement if you want to understand the Python codebase.
C files do not directly map to a specific output file, unlike Python bytecode cache files and scripts. Unless you create a debug build with a symbol table, the compiler doesn't retain the source filename in the generated object file (.o) it outputs, nor will the linker record what .o files went into the result it produces. Nor do all C files end up contributing to the same executable or dynamic shared object file; some become part of the Python binary, others become loadable extensions, and the mix is configurable and dependent on what external libraries are available at the time of compilation.
And between makefiles, setup.py and C pre-processor macros, the combination of input files and what lines of source code are actually used to create each of the output files also varies. Last but not least, because the C source files are no longer consulted at runtime, they can't be expected to still be available in the same original location, so even if there was some metadata stored you still couldn't map that back to the original.
So, it's easier to just remember a few base rules about how the Python C-API works, then map that back to the C code with a few informed code searches.
Alternatively, download the Python source code and create a debug build, and use a good IDE to help you map symbols and such back to source files. Different compilers, platforms and IDEs have different methods of supporting symbol tables for debugging.
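What Python can tell you is where the compiled code lives, not where its C source was. A minimal sketch, assuming a standard CPython build; compiled_origin is a hypothetical helper name:

```python
import importlib.util
import sys

def compiled_origin(module_name):
    """Return the import system's recorded origin for a module, if any."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None

# Modules compiled into the interpreter binary are listed here.
sys_is_builtin = "sys" in sys.builtin_module_names
print(sys_is_builtin)

# For those, the origin is just the marker string 'built-in'.
sys_origin = compiled_origin("sys")
print(sys_origin)

# Depending on the build, math is either built in or a loadable extension,
# in which case the origin is the path of the shared library.
print(compiled_origin("math"))
```

Either way, the result points at a binary or a marker string, never at a .c file.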