What does --enable-optimizations do while compiling Python?

what does --enable-optimizations do while compiling python?

This flag enables profile-guided optimization (PGO) and link-time optimization (LTO).

Both are expensive optimizations that slow down the build process but yield a significant speed boost (around 10-20% from what I remember reading).

A full discussion of what these exactly do is beyond my knowledge and probably too broad for a single question. Either way, you can read a bit about LTO in the GCC docs (GCC has an implementation of it) and get a start on PGO by reading its Wikipedia page.

Also, see the relevant issues opened on the Python Bug Tracker that added these:

  • Issue 24915: Profile Guided Optimization improvements (better training, llvm support, etc) (Added PGO.)
  • Issue 25702: Link Time Optimizations support for GCC and CLANG (Added LTO.)
  • Issue 26359: CPython build options for out-of-the box performance (Adds the --enable-optimizations flag to the configure script which enables the aforementioned optimizations.)

As pointed out by @Shuo in a comment and stated in Issue 28032, LTO isn't always enabled with the --enable-optimizations flag. Some platforms (depending on the supported version of gcc) will disable it in the configuration script.

Future versions of this flag will probably always have it enabled though, so it's pretty safe to talk about them both here.

What flags to use for ./configure when building Python from source

Welcome to the world of Python build configuration! I'll go through the command line options to ./configure one by one.

--with-pydebug is for core Python developers, not for developers (like you and me) who just use Python. It creates debugging symbols and slows down execution. You don't need it.

--enable-optimizations is good for performance in the long run, at the expense of lengthening the compiling process, possibly by 3-fold (or more), depending on your system. However, it results in faster execution, so I would use it in your situation.

--with-ensurepip=install is good. You want the most up-to-date version of pip.

--enable-shared is maybe not a good idea in your case, so I'd recommend not using it here. Read Difference between static and shared libraries? to understand the difference. Basically, since you'll possibly be installing to a non-system path (/opt/local, see below) that almost certainly isn't on your system's search path for shared libraries, you'll very likely run into problems down the road. A static build has all the pieces in one place, so you can install and run it from wherever. This is at the expense of size - the python binary will be rather large - but is great for non-sys admins. Even if you end up installing to /usr/local, I would argue that static is better/easier than shared.
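A quick way to see whether an existing python3 was built with --enable-shared is to check whether the binary links against a shared libpython (a sketch; assumes a Linux system with ldd and a python3 on PATH):

```shell
# Count the shared libpython entries linked into the python3 on PATH.
# A default build (static libpython) prints 0; an --enable-shared build prints 1.
ldd "$(command -v python3)" | grep -c 'libpython' || true
```

If this prints 1, that interpreter depends on finding libpython on the system's shared-library search path at startup, which is exactly the problem described above for non-system prefixes.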

--enable-unicode=ucs4 is optional, and may not be compatible with your system. You don't need it. ./configure is smart enough to figure out what Unicode settings are best. This option is left over from build instructions that are quite a few versions out of date.

--prefix I would suggest you use --prefix=/opt/local if that directory already exists and is in your $PATH, or if you know how to edit your $PATH in ~/.bashrc. Otherwise, use /usr/local or $HOME. /usr/local is the designated system-wide location for local software installs (i.e., stuff that doesn't come with Ubuntu), and is likely already on your $PATH. $HOME is always an option that doesn't require the use of sudo, which is great from a security perspective. You'll need to add /home/your_username/bin to your $PATH if it isn't already present.
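Whichever prefix you pick, the key point is that its bin directory ends up first on your $PATH. A minimal sketch of what that looks like (directory and script names here are made up for illustration):

```shell
# Simulate installing a binary under a custom prefix and putting it on PATH.
PREFIX=$(mktemp -d)               # stand-in for /opt/local, /usr/local, or $HOME
mkdir -p "$PREFIX/bin"
printf '#!/bin/sh\necho hello from custom prefix\n' > "$PREFIX/bin/mypython3"
chmod +x "$PREFIX/bin/mypython3"
export PATH="$PREFIX/bin:$PATH"   # in practice, add a line like this to ~/.bashrc
command -v mypython3              # the shell now finds the custom install first
```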

Failed tests when compiling Python 3.7.4 on Ubuntu 18.04

It turned out to be an issue with readline. I resolved this issue by uninstalling readline and installing gnureadline instead, as noted here.

How to make the Python3 interpreter faster or the fastest possible?

I think I found a way, but it only makes it slightly faster, judging by my benchmarks.

I did it by doing the following:

  1. Extending the profiling for PGO to all 425 regression tests that
    come with the Python3 source. Configuring with
    "--enable-optimizations" alone only runs a small subset of those 425
    regression tests.
  2. Adding CFLAGS="-march=native -O3 -pipe", with LTO enabled via the
    "--with-lto" configure option
  3. Adding "-fprofile-update=prefer-atomic" to the profiling stage
  4. Adding "-fprofile-partial-training" to the final Feedback Directed
    Optimisation (FDO) stage.

How to do the above and what are the consequences?

First, the results...

[Benchmark graph: pyperformance results for the red and green builds]

A picture paints a thousand words, as they say!

  • The red python has all points 1-4 above done.
  • While the green python only has point 2, with the stock
    "--enable-optimizations" configuration, which runs the limited PGO
    subset.

Lower is better. You can see the majority of wins go to the red python, with several wins for the green python.

Pyperformance was used for the benchmarks; it focuses on real-world rather than synthetic benchmarks, using whole applications when possible.

https://pyperformance.readthedocs.io/index.html

And it was graphed using pyperfplot.

https://github.com/stefantalpalaru/pyperfplot

The endeavour whetted my appetite, so I did several more benchmarks, which took a full day to do....
[Benchmark graph: four interpreter builds compared]

  • Red and yellow pythons are the same as the red and green from the
    previous graph.
  • Green python is Python3.9 from Ubuntu's repository compiled by them
    using gcc9.3.
  • Light Blue python is Clang12 with point 2 above and the stock
    "--enable-optimizations" configuration, which runs the limited PGO
    subset. It is the worst performer of the lot in the benchmarks!
    Surprising, really: I started this endeavour thinking Clang-12 would
    win out, given all the recent publications and advertising around
    Linux now being fully LTO'able and Clang-12 taking first-place wins
    in many Phoronix benchmark articles over the last couple of months.
  • Dark Blue is the default Ubuntu Python3.8 that comes from the
    repositories. Added here just to show whether there's been progress
    from 3.8 to 3.9, and to compare with my custom builds.

So how do you do the above 4 points, and what are the consequences?

  1. Get the python3 version you want to build, I got 3.9.6...
wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tar.xz

  2. Decompress...
tar -xf ./Python-3.9.6.tar.xz

  3. Go to the directory and configure it.
cd ./Python-3.9.6

For gcc


time CFLAGS="-march=native -O3 -pipe" ./configure --enable-optimizations --with-lto

For clang

time CC="clang" CFLAGS="-march=native -O3 -pipe -Wno-unused-value -Wno-empty-body -Qunused-arguments -Wno-parentheses-equality" ./configure --enable-optimizations --with-lto

The extra options for clang just follow the official advice from the Python devs here... https://devguide.python.org/setup/#clang


  4. At this point you would traditionally start building/compiling. However, we want to further customise the build, with extra options during profiling and during the final release build.
nano Makefile

Search for "PGO_PROF_GEN_FLAG" (Ctrl+W)
and append " -fprofile-update=prefer-atomic" (without the quotes) after a space. It should look something like...

PGO_PROF_GEN_FLAG=-fprofile-generate -fprofile-update=prefer-atomic

  5. The next line underneath should say "PGO_PROF_USE_FLAG"; it affects the final release build/compile. Append "-fprofile-partial-training" after a space at the end, without the quotes. It should look something like...
PGO_PROF_USE_FLAG=-fprofile-use -fprofile-correction -fprofile-partial-training

Note that this step is only compatible with gcc; "-fprofile-partial-training" is not available in clang-12 at the time of this writing. Without this setting, gcc will 'optimise for size' any code paths that were not exercised during profiling. With it, gcc aggressively 'optimises for speed' even the unprofiled code paths, which can lead to better performance, but at the cost of larger code size.

see here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


  6. Finally, extend the list of regression tests to run from the stock subset to the full set.
    While still in "nano Makefile", search for "PROFILE_TASK= -m test --pgo"
    and replace it with:
PROFILE_TASK=   -m test --pgo-extended
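If you'd rather not edit the Makefile by hand in nano, the three tweaks above can be scripted (a sketch using GNU sed's -i; demonstrated here on a stand-in Makefile fragment rather than the real generated file):

```shell
# Stand-in for the relevant lines of the generated Makefile (not the real file).
cat > Makefile.demo <<'EOF'
PGO_PROF_GEN_FLAG=-fprofile-generate
PGO_PROF_USE_FLAG=-fprofile-use -fprofile-correction
PROFILE_TASK= -m test --pgo
EOF
# Append the profiling-stage flag, the release-stage flag, and extend the tests.
sed -i -e 's/^\(PGO_PROF_GEN_FLAG=.*\)/\1 -fprofile-update=prefer-atomic/' \
       -e 's/^\(PGO_PROF_USE_FLAG=.*\)/\1 -fprofile-partial-training/' \
       -e 's/--pgo$/--pgo-extended/' Makefile.demo
cat Makefile.demo
```

Running the same three sed expressions against the real Makefile (after ./configure) applies steps 4-6 in one go.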

  7. Now you can start the build. Note, however, that enabling the full suite of tests for profiling will massively increase the time needed to build Python3 to completion.
time make -j$(( $(nproc) + 1 ))

The -j formula in the command above just takes the number of CPUs you have and adds 1, to parallelise the build/compile/link steps and speed them up.
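Broken out on its own (assumes GNU coreutils' nproc is available):

```shell
# Compute the make parallelism: number of CPU cores plus one.
JOBS=$(( $(nproc) + 1 ))
echo "make -j$JOBS"
```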

Unfortunately, the regression tests will be executed sequentially, with no easy way of switching to running them concurrently.
It will run all 425 tests, profiling every one of them!

On my i7-3770 it took this long...

real    49m26.882s
user    55m1.160s
sys     2m1.106s

But I did have a few other programs and applications and a VM running at the same time.


  8. Once done, use "altinstall" so you do not mess up the default python3 that comes with your distribution, which can cause problems.
sudo make altinstall

  9. If you have multiple custom-built Python versions, use update-alternatives to manage them.
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.7 374 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.7-config
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.8 382 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.8-config
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.9 396 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.9-config

Use the following command to configure which is the default "python3":

sudo update-alternatives --config python3

This is mine...

There are 4 choices for the alternative python3 (providing /usr/local/bin/python3).

  Selection    Path                      Priority   Status
------------------------------------------------------------
* 0            /usr/local/bin/python3.9  396        auto mode
  1            /usr/bin/pypy3            369        manual mode
  2            /usr/local/bin/python3.7  374        manual mode
  3            /usr/local/bin/python3.8  382        manual mode
  4            /usr/local/bin/python3.9  396        manual mode

Press <enter> to keep the current choice[*], or type selection number:

Lastly, something to note: any python3 in "/usr/bin" belongs to your Linux distribution. Try not to mess with it, as that can cause problems later. All your altinstalls will go to "/usr/local/bin".

Some Conclusions...

  • Clang, an awesome compiler and project, is bad for Python3, at least with my setup. Perhaps if their devs are reading this, they can do something about it.
  • GCC rules Python3. I haven't got Intel's compiler (ICC), so I don't know about it, but I hear it is even better for building Python3.
  • The tweaks outlined above have made my default python3 faster and snappier overall; however, it took a LOT of time to build! It is worth it, in my opinion.

UPDATE:
Python 3.10.0
vs
ubuntu 20.04 stock python 3.8.10

Python 3.10.0 with Full PGO, Partial Training, Prefer-atomic, march=native (zen3 R7-5800X), O3 optimisation

Clear wins --> 33

Python 3.8.10 Ubuntu stock from repos

Clear wins --> 22

However, looking at the graph, you can see that where 3.10 lost out to stock 3.8.10 from the Ubuntu repos, some of the margins are quite big.

[Graph: Python3.10 (Zen3-native, full PGO, partial training, prefer-atomic) vs. stock Python3.8.10]

How to fix the errors while compiling Python 3.2.0

I want to start by saying that what you're trying to do is an exercise in futility. Check:

  • [Python]: PEP 373 -- Python 2.7 Release Schedule
  • [Python]: PEP 392 -- Python 3.2 Release Schedule

So:

  1. You're trying to "upgrade" from a version that's going out of support at the end of the year (your particular flavour, v2.7.15, was released last year) to a version that has been dead for several years

  2. More: you're attempting v3.2.0, which is the very first release of that series

Quickly searching for your error revealed:

  • [Python.Bugs]: _dbm not building on Fedora 17
  • [Python.Bugs]: Failure to build _dbm with ndbm on Arch Linux

Now, this may or may not be the cause in your case. If it is, there is a fix, but you won't benefit from it because of #2.

A couple of ideas:

  • Generally, the first version of a series is likely to have more bugs, because it hasn't been tested much "in the real world" (as it hadn't been released yet). The chances of something going wrong increase if other software is built on top of it (in this case, other 3rd-party Python modules). As an example (although not related to the current scenario), check [SO]: PyWin32 and Python 3.8.0
  • You should use a (Python) version that's supported and maintained (e.g. v3.8 or v3.7), so that you have a real chance of getting help if you run into problems
  • If for some reason (that I fail to find logical) you need to stick to v3.2, at least try using the latest one ([Python]: Python-3.2.6.tgz)

How to successfully compile Python 3.x

It seems --enable-optimizations was the problem;

jeremyr@b88:~/Python-3.7.3$ ./configure   
jeremyr@b88:~/Python-3.7.3$ make clean

takes care of it in my case.

Make (install from source) python without running tests

The configure option --enable-optimizations enables running test suites to generate data for profiling Python. The resulting python binary has better performance in executing python code. Improvements noted here

From configure help:
--enable-optimizations Enable expensive optimizations (PGO, etc). Disabled by default.

From wikipedia

 profile-guided optimisation uses the results of profiling test runs of the instrumented program to optimize the final generated code.

In short, you should not skip the tests when using --enable-optimizations, as the data required for profiling is generated by running them.
You can run make -j8 build_all followed by make -j8 install to skip the tests once (they would still run with the install target), but that would defeat the purpose.
You can instead drop the configure flag for better build times.


