Why the Global Interpreter Lock

What is the global interpreter lock (GIL) in CPython?

Python's GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can't effectively make use of multiple cores. (If the GIL didn't lead to this problem, most people wouldn't care about the GIL - it's only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides. It might be too much information, but then you did ask for details :-)

Note that Python's GIL is only really an issue for CPython, the reference implementation. Jython and IronPython don't have a GIL. As a Python developer, you don't generally come across the GIL unless you're writing a C extension. C extension writers need to release the GIL when their extensions do blocking I/O, so that other threads in the Python process get a chance to run.

Python multithreading and Global Interpreter Lock

The Global Interpreter Lock ensures that only one thread is executing bytecode at once. That execution can be interrupted at any time.

Consider this simple function, which might be intended to atomically store related values to attributes on an instance x:

def f(x, a, b):
    x.a, x.b = a, b

Here is its disassembly into bytecode

  0 LOAD_FAST                1 (a)
  3 LOAD_FAST                2 (b)
  6 ROT_TWO
  7 LOAD_FAST                0 (x)
 10 STORE_ATTR               0 (a)
 13 LOAD_FAST                0 (x)
 16 STORE_ATTR               1 (b)
 19 LOAD_CONST               0 (None)
 22 RETURN_VALUE

Suppose x is not protected by a mutex. Then any thread executing f(x, 1, 2) can easily be interrupted between storing a (at offset 10) and storing b (at offset 16). Another thread running during that gap will see x in an inconsistent state.
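One way to make the update atomic in practice is to guard both stores (and any readers) with a mutex. A minimal sketch - the Point class and read_pair helper are hypothetical names, not from the original question:

```python
import threading

class Point:
    """Holds a related pair (a, b) that must stay consistent."""
    def __init__(self):
        self.a = 0
        self.b = 0
        self.lock = threading.Lock()

def f(x, a, b):
    # Without the lock, a thread switch between the two STORE_ATTR
    # bytecodes could expose (new a, old b) to another thread.
    with x.lock:
        x.a, x.b = a, b

def read_pair(x):
    # Readers take the same lock, so they never see a half-updated pair.
    with x.lock:
        return x.a, x.b

x = Point()
f(x, 1, 2)
print(read_pair(x))  # (1, 2)
```

The key point is that readers must take the lock too; locking only the writer still lets a reader observe the intermediate state.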

Why do GIL alternatives have an impact on performance?

Simply put, locking and unlocking many locks is more expensive than locking and unlocking a single lock. This shouldn't be surprising: doing anything N times instead of once obviously takes more time (all other things being equal). And for this kind of thing, economies of scale don't really apply; there's no big one-time cost to amortize over all locking operations.

Edit: In principle, Java has the same problem, but due to the different focus of everyone involved, its history, and perhaps other factors, Java gets by rather well with fine-grained locks. In short, single-threaded performance is not regarded as that important, and multi-threaded performance is probably better than that of a hypothetical free-threaded CPython.

Historically, I don't think there ever was a JVM with a GIL (though Java started out with green threads running on a single OS thread - but that was long ago), so there are no historical reasons for keeping a GIL and no baseline single-threaded performance that makes people loathe locks. Instead, a lot of effort was put into making Java good at multi-threading, and this ability is widely used. In contrast, even if you solved the GIL issue with no performance cost for single-threaded Python or Ruby programs, most code out there wouldn't benefit from it, and the libraries are... not awful, but not exactly on par with java.util.concurrent either.

Because Java now has a memory model which explicitly doesn't give a lot of guarantees, many common operations in Java programs don't need any kind of lock in general. The downside is, of course, that Java programmers have to add locks or other synchronization manually when it is needed.
In addition, Java's locks have seen a lot of optimization (some of it original research first introduced in the JVM) - thin locks, lock elision, etc. - which makes uncontended locks very cheap.

Another factor may be that a Java program runs almost entirely Java code (which, as I've described above, only needs very little synchronization if it's not explicitly requested), with only few calls into a runtime library. As a consequence, a free-threaded JVM could even have a global lock (or only a few coarse locks) for the JIT, the classloader, etc. without affecting most Java programs too much. In contrast, a Python program will spend a large part of its time in C code, either of the built-in modules or in third-party extension modules.

Is my Python multithreading code affected by the Global Interpreter Lock?

If your do_stuff_function is CPU-bound, then running it in multiple threads will not help, because the GIL only allows one thread to execute Python bytecode at a time.

The way around this in Python is to use multiple processes; just replace

from multiprocessing.dummy import Pool

with

from multiprocessing import Pool
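The rest of the code can stay unchanged, since multiprocessing.Pool and multiprocessing.dummy.Pool share the same API. A minimal sketch, where do_stuff is a hypothetical stand-in for the question's CPU-bound do_stuff_function:

```python
from multiprocessing import Pool  # was: from multiprocessing.dummy import Pool

def do_stuff(n):
    # Hypothetical CPU-bound workload: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter
    # (and its own GIL), so CPU-bound work can run in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(do_stuff, [100_000] * 8)
    print(len(results))  # 8
```

Note the if __name__ == "__main__" guard, which multiprocessing requires on platforms that spawn workers by re-importing the main module.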

Global Interpreter lock: Jython vs CPython

Yes, Jython uses Java threads (even when you use Python's threading module), and so it has no GIL. But that alone isn't the answer (otherwise it would have to be 42, because the question is unclear :^) ).
The better question is what your criteria are, and whether CPython or Jython meets them better.

If you want real multithreading, Jython is your thing.
If you want to combine Java and Python, use it.
If you want fast execution times... then other languages may be better (you can try to measure the time of a threaded task in Python and the same code in Jython, but I'd guess that even with the GIL, CPython would be faster).

Greets,
Zonk

Why are numpy calculations not affected by the global interpreter lock?

Many numpy calculations are unaffected by the GIL, but not all.

In code that does not require the Python interpreter (e.g. C libraries) it is possible to explicitly release the GIL, allowing other code that depends on the interpreter to continue running. In the NumPy C codebase the macros NPY_BEGIN_THREADS and NPY_END_THREADS are used to delimit blocks of code that permit GIL release. You can see these in this search of the NumPy source.

The NumPy C API documentation has more information on threading support. Note the additional macros NPY_BEGIN_THREADS_DESCR, NPY_END_THREADS_DESCR and NPY_BEGIN_THREADS_THRESHOLDED which handle conditional GIL release, dependent on array dtypes and the size of loops.

Most core functions release the GIL - for example, Universal Functions (ufunc) do so as documented:

as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions.

With regard to your own code, the source code for NumPy is available. Check the functions you use (and the functions they call) for the above macros. Note also that the performance benefit is heavily dependent on how long the GIL is released - if your code is constantly dropping in/out of Python you won't see much of an improvement.

The other option is to just test it. However, bear in mind that functions using the conditional GIL macros may exhibit different behaviour with small and large arrays. A test with a small dataset may therefore not be an accurate representation of performance for a larger task.
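You don't even need NumPy to see the mechanism at work: the stdlib's hashlib is also a C extension that releases the GIL for large inputs, so a timing sketch like the following can show threads overlapping inside GIL-free C code. The sizes and thread count here are arbitrary choices for illustration:

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

# Large enough that hashlib releases the GIL while hashing.
data = b"x" * (16 * 1024 * 1024)  # 16 MiB

def digest(_):
    return hashlib.sha256(data).hexdigest()

# Hash the buffer four times sequentially.
start = time.perf_counter()
for i in range(4):
    digest(i)
serial = time.perf_counter() - start

# Hash it four times across four threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    list(ex.map(digest, range(4)))
threaded = time.perf_counter() - start

print(f"serial {serial:.2f}s  threaded {threaded:.2f}s")
```

On a multi-core machine the threaded time is typically noticeably lower; if you replace digest with a pure-Python loop, the threaded version is no faster, because the GIL is never released.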

There is some additional information on parallel processing with numpy available on the official wiki and a useful post about the Python GIL in general over on Programmers.SE.

Why is there no GIL in the Java Virtual Machine? Why does Python need one so bad?

Python (the language) doesn't need a GIL (which is why it can perfectly well be implemented on the JVM [Jython] and .NET [IronPython], and those implementations multithread freely). CPython (the popular implementation) has always used a GIL for ease of coding (especially the coding of the garbage collection mechanisms) and of integration of non-thread-safe C-coded libraries (there used to be a ton of those around ;-)).

The Unladen Swallow project, among other ambitious goals, does plan a GIL-free virtual machine for Python -- to quote that site, "In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC system, something like IBM's Recycler (Bacon et al, 2001)."

Python thread and global interpreter lock when access same object

What did you expect? That one thread adds one item, then another thread, and so on? Then why use so many threads, if they only work one at a time? The threads are all trying to work on the same object simultaneously, but since the GIL only interleaves them rather than running them in parallel, the result looks this ugly.

To get a better understanding of how the GIL interleaves the threads, you can add logging:

import logging

logging.basicConfig(format="%(levelname)-8s [%(asctime)s] %(threadName)-12s %(message)s",
                    level=logging.DEBUG, filename='log.log')

mylist = [[0, 0]]  # the shared list from the question; starting value assumed

def listTo300Elem(id):
    list_len = len(mylist)
    while list_len < 300:
        item = mylist[-1][1] + 1
        mylist.append([id, item])
        logging.debug('Len = {}, item {} added'.format(list_len, item))
        list_len = len(mylist)
    logging.debug('Len = {}, exit'.format(list_len))
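A self-contained sketch of driving such a function from several threads (simplified, without the logging; the starting value of the shared list is assumed for illustration):

```python
import threading

mylist = [[0, 0]]  # shared list; starting value assumed

def list_to_300(id):
    # Simplified version of the answer's function, without logging.
    while len(mylist) < 300:
        mylist.append([id, mylist[-1][1] + 1])

threads = [threading.Thread(target=list_to_300, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads raced to append: the list reaches at least 300 elements,
# but which thread added which item is interleaved unpredictably.
print(len(mylist) >= 300)  # True
```

Because each thread checks the length and then appends in separate steps, the list can also overshoot 300 slightly; that's the same kind of non-atomicity the GIL does not protect you from.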

So, threading in Python is not suitable for all cases.
