Finding the Source Code For Built-In Python Functions

Finding the source code for built-in Python functions?

Since Python is open source you can read the source code.

To find out which file a particular module is implemented in, you can usually print the module's __file__ attribute. Alternatively, you can use the inspect module; see the section Retrieving Source Code in the documentation of inspect.
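As a minimal sketch of both approaches, the snippet below uses the standard-library json module as a stand-in for any module written in Python (the choice of json is only for illustration):

```python
import inspect
import json

# A module written in Python records the path it was loaded from.
module_path = json.__file__

# inspect can report the defining file and the source text of a Python function.
source_file = inspect.getsourcefile(json.dumps)
source_text = inspect.getsource(json.dumps)

print(module_path)
print(source_file)
print(source_text.splitlines()[0])  # first line of the function definition
```

The printed paths depend on where Python is installed on your machine.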

For built-in classes and methods this is not so straightforward, since inspect.getfile and inspect.getsource raise a TypeError stating that the object is built-in. However, many of the built-in types can be found in the Objects sub-directory of the Python source tree. For example, enumerate is implemented in Objects/enumobject.c and the list type in Objects/listobject.c.
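The failure is easy to reproduce. In this small sketch, describe_source is a hypothetical helper (not part of inspect) that returns either the defining file or the error message:

```python
import inspect

def describe_source(obj):
    """Return the defining file, or the TypeError message for built-ins."""
    try:
        return inspect.getfile(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(describe_source(inspect.getfile))  # a Python function: prints a file path
print(describe_source(len))              # a built-in function: prints the error
print(describe_source(list))             # a built-in class: prints the error
```

The exact wording of the error message can differ between Python versions.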

Where to find the source code of built-in map() function in Python

Built-in functions are not written in Python but in C. What you see in your IDE is only a placeholder (a stub), not the real implementation.

The CPython implementation of map lives in Python/bltinmodule.c.

You will notice that, since map is a class, its methods are defined first. For example, if you are interested in map.__next__, look for the definition of map_next. After those method definitions, you can find the definition of the map type itself.
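You can confirm from Python that map is a class whose iterator protocol is what the C code implements. A short sketch, with arbitrary sample data:

```python
# map is a class; calling it builds an iterator object.
numbers = map(str.upper, ["a", "b"])
is_class = isinstance(map, type)

# next() dispatches to map.__next__, which CPython implements in C as map_next.
first = next(numbers)
rest = list(numbers)

print(is_class, first, rest)
```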

Where to find python source code for built in open()

With Python on GitHub, have a look at:

  • https://github.com/python/cpython/blob/master/Lib/io.py#L59
  • https://github.com/python/cpython/blob/master/Lib/_pyio.py#L40
  • https://github.com/python/cpython/blob/master/Python/fileutils.c#L989
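To relate those files to what you have installed: on Python 3 the built-in open is the same object as io.open, and _pyio is the pure-Python counterpart of the C implementation. A minimal check, assuming a standard CPython 3 installation that ships _pyio:

```python
import io
import _pyio

# The built-in open and io.open are one and the same object on Python 3.
same_object = open is io.open

# _pyio is ordinary Python, so it has a readable source file on disk.
pure_python_file = _pyio.__file__

print(same_object)
print(pure_python_file)
```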

How can I find the location of the source code of a built-in Python method?

You can usually find the source files for the pure-Python standard library modules in the Python installation folder itself. For instance, on Linux, the source code for the widely used os module is at this location:

/usr/lib/python2.7/os.py

On Windows this is generally C:\Python27\Lib. You can locate your installation by running which python on Linux or where python on Windows; both report the interpreter's location, and the standard library sits under the same installation prefix.
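Rather than guessing the path, you can ask the running interpreter. A minimal sketch using the standard sysconfig module:

```python
import os
import sysconfig

# Directory that holds the pure-Python standard library for this interpreter.
stdlib_dir = sysconfig.get_paths()["stdlib"]

# The os module itself reports the file it was loaded from.
os_module_file = os.__file__

print(stdlib_dir)
print(os_module_file)
```

The output varies by platform, Python version and whether you are inside a virtual environment.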

How len() function is implemented in python 3 and how to find the source code of built in functions in python?

Question 0

"I got curious: if I write len(nums) in a while loop, will my program do more computation?"

One aspect of this is documented in the Python wiki's TimeComplexity page for all built-in data structures. len() for a list is O(1).

If you mean something along the lines of

"will my program be faster if I do n = len(nums), then manually subtract 1 from n each time I remove from the list"

then that's a whole other question, and the answer is likely (measure it!) to be (perhaps somewhat unintuitively) "no", since len() is implemented in C (fast!) and interpreting Python code (n -= 1) and executing it is slower.
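To measure it yourself, something like the following sketch works. The function names, list size and repetition count are arbitrary choices for illustration, and the timings will differ between machines and Python versions:

```python
import timeit

def drain_with_len(nums):
    """Empty a copy of nums, calling len() on every iteration."""
    nums = list(nums)
    steps = 0
    while len(nums) > 0:
        nums.pop()
        steps += 1
    return steps

def drain_with_counter(nums):
    """Empty a copy of nums, tracking the remaining size by hand."""
    nums = list(nums)
    n = len(nums)
    steps = 0
    while n > 0:
        nums.pop()
        n -= 1
        steps += 1
    return steps

data = list(range(1000))
t_len = timeit.timeit(lambda: drain_with_len(data), number=200)
t_counter = timeit.timeit(lambda: drain_with_counter(data), number=200)

print(f"len() each iteration: {t_len:.4f}s")
print(f"manual counter:       {t_counter:.4f}s")
```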

Question 1

How to look for the source code of builtin functions without manually searching the source code on GitHub?

As prerequisites, you will need to

  • know how to read C and understand the control flow
  • be able to keep track of the call graph (in your head, in a text file, on a notepad)
  • have an intuition of where you start looking in the source code

GitHub's source code search is, well, passable, but you'll have a better time downloading the source and using a better IDE to jump around in the code.

For built-in functions in modules, I'd start searching for e.g. mathmodule.c for the math functions, etc.

For implementations of objects, there's e.g. listobject.c. It's fairly logical (most of the time).
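Before opening the C sources, it helps to know whether a module is written in C at all. The sketch below uses importlib to classify a module; where_implemented is a hypothetical helper, and the result for a given module can vary between builds and platforms:

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES

def where_implemented(module_name):
    """Roughly classify how a top-level module is implemented."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin == "built-in":
        return "C, compiled into the interpreter"
    if origin == "frozen":
        return "Python, frozen into the interpreter"
    if origin.endswith(tuple(EXTENSION_SUFFIXES)):
        return "C, extension module: " + origin
    return "Python source: " + origin

for name in ("sys", "math", "json"):
    print(name, "->", where_implemented(name))
```

For the modules reported as C, the file-name conventions mentioned above (mathmodule.c and so on) are the place to start searching.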

Question 2

  1. You already found builtin_len.
  2. You can see it calls PyObject_Size. That's defined here.
  3. It does PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;, i.e. grabs a pointer to the type header of the object, and the "slot" (not to be confused with the Python userland slots) of the sequence-related methods for the object.
  4. If that method collection contains a valid sq_length() function, it is called: Py_ssize_t len = m->sq_length(o); If that length is valid, it is returned, and len() wraps the bare Py_ssize_t into a Python int object and passes it to you.
  5. If that fails, PyMapping_Size gets called.
  6. It does a similar thing as the sq_length stuff, only using mapping methods, tp_as_mapping and mp_length.
  7. If all that fails, a TypeError is raised using the type_error() helper.
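The steps above can be modelled in a few lines of Python. This is only a sketch of the control flow, not CPython's code: ModelType, model_object_size and model_mapping_size are hypothetical names standing in for the C type object, PyObject_Size and PyMapping_Size.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class ModelType:
    """Hypothetical stand-in for a C type object with two length slots."""
    name: str
    sq_length: Optional[Callable[[Any], int]] = None
    mp_length: Optional[Callable[[Any], int]] = None

def model_mapping_size(tp, obj):
    # Steps 5 and 6: fall back to the mapping slot.
    if tp.mp_length is not None:
        return tp.mp_length(obj)
    # Step 7: neither slot exists, so the object has no length.
    raise TypeError(f"object of type '{tp.name}' has no len()")

def model_object_size(tp, obj):
    # Steps 3 and 4: prefer the sequence slot when it is filled in.
    if tp.sq_length is not None:
        return tp.sq_length(obj)
    return model_mapping_size(tp, obj)

list_like = ModelType("list", sq_length=list.__len__)
dict_like = ModelType("dict", mp_length=dict.__len__)
no_length = ModelType("int")

print(model_object_size(list_like, [1, 2, 3]))
print(model_object_size(dict_like, {"a": 1}))
```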

Here in listobject.c, you can see how list_length() is hooked up to be sq_length for list objects.

That function only calls Py_SIZE() (https://docs.python.org/3/c-api/structures.html#c.Py_SIZE), a macro that accesses the ob_size field which all PyVarObjects have.

The documentation on how Python's list objects use ob_size is here.

As for how a custom type with __len__ hooks into all of this, my recollection is that objects with __len__ will have their sq_length() call the Python callable, if one exists, and that value is then "trampolined" through the C code back to your Python code.
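From the Python side, the round trip looks like this. Playlist and Broken are made-up classes for illustration; the second one shows that the C code validates the value your __len__ returns before handing it back:

```python
class Playlist:
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # Called by len() via the C-level length slot.
        return len(self._tracks)

class Broken:
    def __len__(self):
        # A negative length is rejected by len().
        return -1

playlist_length = len(Playlist(["intro", "verse", "outro"]))

try:
    len(Broken())
    broken_result = "no error"
except ValueError as exc:
    broken_result = f"ValueError: {exc}"

print(playlist_length)
print(broken_result)
```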

Finding the source code of methods implemented in C?

No, there is not. There is no metadata accessible from Python that will let you find the original source file. Such metadata would have to be created explicitly by the Python developers, without a clear benefit as to what that would achieve.
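You can see the difference by comparing what a Python function and a C function expose. In this sketch, python_level_metadata is a hypothetical helper and json.dumps is just a convenient example of a function written in Python:

```python
import json

def python_level_metadata(func):
    """Collect the source-location details a callable exposes, if any."""
    code = getattr(func, "__code__", None)
    return {
        "module": getattr(func, "__module__", None),
        "filename": code.co_filename if code else None,
        "first_line": code.co_firstlineno if code else None,
    }

print(python_level_metadata(json.dumps))  # file name and line number available
print(python_level_metadata(len))         # only the module name is known
```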

First and foremost, the vast majority of Python installations do not include the C source code. Next, while you could conceivably expect users of the Python language to be able to read Python source code, Python's userbase is very broad, and a large number of users neither know C nor are interested in how the C code works. Finally, even developers who know C can't be expected to read the Python C API documentation, something that quickly becomes a requirement if you want to understand the Python codebase.

C files do not directly map to a specific output file, unlike Python bytecode cache files and scripts. Unless you create a debug build with a symbol table, the compiler doesn't retain the source filename in the generated object file (.o) it outputs, nor will the linker record what .o files went into the result it produces. Nor do all C files end up contributing to the same executable or dynamic shared object file; some become part of the Python binary, others become loadable extensions, and the mix is configurable and dependent on what external libraries are available at the time of compilation.

And between makefiles, setup.py and C preprocessor macros, the combination of input files and the lines of source code actually used to create each of the output files also varies. Last but not least, because the C source files are no longer consulted at runtime, they can't be expected to still be available in the same original location, so even if some metadata were stored you still couldn't map it back to the original.

So it's easier to remember a few basic rules about how the Python C-API works, then map that back to the C code with a few informed code searches.

Alternatively, download the Python source code and create a debug build, and use a good IDE to help you map symbols and such back to source files. Different compilers, platforms and IDEs have different methods of supporting symbol tables for debugging.


