CPython Is a Bytecode Interpreter

Is CPython a bytecode interpreter?

CPython is the implementation of Python in C (not C++). It's the first implementation, and still the one most people mean when they talk about Python. It compiles Python source code to bytecode, which it caches in .pyc files, and it also interprets that bytecode.

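The compilation from .py to .pyc happens transparently as needed. When you import a module, it is first compiled to a .pyc file if no up-to-date one exists, and then the bytecode is interpreted. A script you run directly is compiled the same way, but its bytecode is kept in memory rather than written to a .pyc file.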
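
The two steps (compile, then interpret) can be tried by hand. The following is a minimal sketch using the built-in compile() and exec() functions and importlib.util.cache_from_source(); the source string, the "<example>" filename and the example.py path are made up for illustration.

```python
import importlib.util

# Hypothetical one-line module, used only for illustration.
source = "answer = 6 * 7\n"

# Step 1: compile the source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# Step 2: let the interpreter execute that bytecode in a fresh namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["answer"])

# The raw bytecode lives on the code object as a bytes object.
print(type(code_obj.co_code))

# Path where the bytecode of an imported module named example.py would be cached.
print(importlib.util.cache_from_source("example.py"))
```

The exact cache path depends on the interpreter version and configuration, so no output is shown for the last line.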

Jython is different because (in addition to being implemented in Java instead of C) it compiles .py files into .class files so they can be executed in the JVM.

How exactly is Python Bytecode Run in CPython?

Yes, your understanding is correct. There is basically (very basically) a giant switch statement inside the CPython interpreter that says "if the current opcode is so and so, do this and that".

http://hg.python.org/cpython/file/3.3/Python/ceval.c#l790

Other implementations, like PyPy, have JIT compilation, i.e. they translate Python to machine code on the fly.

Does python bytecode (.pyc) need the C Python interpreter to run?

Yes, a Python interpreter is required to run .pyc files.

You can use tools such as PyInstaller to make an executable of your scripts, so that all required packages and Python libraries, including the interpreter, are contained in a single executable. It runs like any other program, so nothing else needs to be done except double-click and run.

Also, .pyc files only run on the specific version of Python they were compiled with, so this is really not a recommended way of distributing Python code, if that's what you are planning.

This answer has more details: https://stackoverflow.com/a/36027342/4289062
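
The version dependence of .pyc files is visible in their header. As a rough sketch (the hello.py module and the helper function name are invented here), the snippet below compiles a throwaway file with py_compile and compares the first four bytes of the resulting .pyc with importlib.util.MAGIC_NUMBER, the marker of the running interpreter's bytecode format.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def pyc_magic_matches_running_interpreter() -> bool:
    """Compile a throwaway module and compare its .pyc header to this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.py"  # hypothetical module name
        src.write_text("print('hello')\n")
        # py_compile.compile returns the path of the .pyc file it wrote.
        pyc_path = py_compile.compile(str(src), doraise=True)
        header = Path(pyc_path).read_bytes()[:4]
    # The first four bytes of a .pyc identify the bytecode format version.
    return header == importlib.util.MAGIC_NUMBER


print(importlib.util.MAGIC_NUMBER)
print(pyc_magic_matches_running_interpreter())
```

A .pyc produced by a Python version with a different magic number is rejected by the import system, which is why shipping only .pyc files ties you to one interpreter version.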

How is CPython implemented?

  1. Yes, CPython compiles Python code to bytecode, which is then executed by the virtual machine.
  2. The virtual machine executes instructions one by one. It's written in C (though it could be written in another language) and looks like a huge if/else statement: "if the current instruction is this, do this; if it is that, do another thing", and so on. Instructions aren't translated to machine code, which is why it's called an interpreter.
    1. You can find the list of instructions here: https://docs.python.org/3.10/library/dis.html#python-bytecode-instructions
    2. The implementation of the VM is available here: https://github.com/python/cpython/blob/f71a69aa9209cf67cc1060051b147d6afa379bba/Python/ceval.c#L1718
  3. Bytecode doesn't have a concept of "line": it's just a stream of bytes. The interpreter can read one byte at a time and use another if/else statement to decide what instruction it's looking at. For example:
    curr_byte = read_byte()
    if curr_byte == 0x00:
        # Parse instruction with no arguments
        curr_instruction = DO_THING_A
        args = None
    elif curr_byte == 0x01:
        another_byte = read_byte()
        if another_byte == 0x00:
            # Parse a two-byte instruction
            curr_instruction = DO_THING_B
            args = None
        else:
            # Parse a one-byte instruction
            # with one argument
            curr_instruction = DO_THING_C
            args = another_byte >> 1  # or whatever
    elif curr_byte == ...:
        ...  # go on and on and on
  4. The entire point of bytecode is that it can be executed by another program (the interpreter, or virtual machine) on almost any hardware. For example, in order to get CPython running on new hardware, you'll need a C toolchain (compiler, linker, assembler etc) for this hardware and a bunch of functions that Python can call to do low-level stuff (allocate memory, output text, do networking etc). Once you have that, write C code that can execute the bytecode - and that's it.
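
The "stream of bytes" from point 3 can be inspected on a real function. This is only a sketch: add_one and decode are hypothetical names, and it assumes CPython 3.6 or later, where every instruction occupies two bytes (an opcode followed by one argument byte). On recent versions some of the decoded units are inline cache entries rather than real instructions.

```python
import dis


def add_one(x):  # hypothetical function, used only as sample input
    return x + 1


raw = add_one.__code__.co_code  # the bytecode as a plain bytes object


def decode(raw_bytes):
    """Return (offset, opcode name, argument byte) for each two-byte unit."""
    decoded = []
    for offset in range(0, len(raw_bytes), 2):
        opcode, arg = raw_bytes[offset], raw_bytes[offset + 1]
        # dis.opname maps an opcode number to its symbolic name.
        decoded.append((offset, dis.opname[opcode], arg))
    return decoded


for offset, name, arg in decode(raw):
    print(offset, name, arg)
```

The names and numbers printed differ between CPython versions, so no expected output is given.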

Python vs CPython

So what is CPython?

CPython is the original Python implementation. It is the implementation you download from Python.org. People call it CPython to distinguish it from other, later, Python implementations, and to distinguish the implementation of the language engine from the Python programming language itself.

The latter part is where your confusion comes from; you need to keep Python-the-language separate from whatever runs the Python code.

CPython happens to be implemented in C. That is just an implementation detail, really. CPython compiles your Python code into bytecode (transparently) and interprets that bytecode in an evaluation loop.

CPython is also the first to implement new features; Python-the-language development uses CPython as the base; other implementations follow.

What about Jython, etc.?

Jython, IronPython and PyPy are the current "other" implementations of the Python programming language; these are implemented in Java, C# and RPython (a subset of Python), respectively. Jython compiles your Python code to Java bytecode, so your Python code can run on the JVM. IronPython lets you run Python on the Microsoft CLR. And PyPy, being implemented in (a subset of) Python, lets you run Python code faster than CPython, which rightly should blow your mind. :-)
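
If you are unsure which implementation is running your code, the standard library can tell you. A small sketch (describe_runtime is a name made up for this example):

```python
import platform
import sys


def describe_runtime():
    """Collect the implementation name and language version of this interpreter."""
    return {
        "implementation": platform.python_implementation(),  # e.g. 'CPython' or 'PyPy'
        "implementation_id": sys.implementation.name,        # lowercase identifier
        "language_version": sys.version_info[:3],            # (major, minor, micro)
    }


print(describe_runtime())
```

The same script can report different implementations depending on what you run it with, which is the Python-the-language versus implementation distinction in practice.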

Actually compiling to C

So CPython does not translate your Python code to C by itself. Instead, it runs an interpreter loop. There is a project that does translate Python-ish code to C, and that is called Cython. Cython adds a few extensions to the Python language, and lets you compile your code to C extensions, code that plugs into the CPython interpreter.

Does Python Virtual Machine require a CPU to execute the bytecode?

In order to run an application on any computer, its code must always be somehow converted to machine code and then be executed by the CPU. The question is rather when and how this happens.

Let me try and show you how Python effectively executes bytecode.

Compiler vs Interpreter

Imagine the CPU in your computer understands nothing but Latin. You want to send it a letter with detailed instructions or a request, but you do not speak Latin. So, you will engage a translator: someone who translates your "English" letter (or whatever language you use) to Latin for you.

Compiled languages like C or Rust take your entire letter, translate all of it to Latin up front, and really polish it. The outcome is a translated letter that is highly poetic and uses sophisticated language. An interpreter like Python, on the other hand, translates one word or one sentence at a time; it is more like the interpreters you see on the news, who translate what someone says in a foreign language as they speak.

Bytecode

The full translation process from languages like C, Rust, or Python to machine code is quite complex and requires careful analysis of the original program code. To avoid having to analyse your program code over and over again, the Python interpreter does it just once and then generates bytecode, which is a very close representation of your Python code, but split up into its basic elements.

Let's take a look at a very simple Python function:

    def f(x):
        y = (x + 1)*(x - 1)
        return y

The computation in this function comprises several calculations, which all have to be performed in the correct order. The bytecode reflects this:

    LOAD_VAR    x    # x+1
    LOAD_CONST  1
    ADD
    LOAD_VAR    x    # x-1
    LOAD_CONST  1
    SUBTRACT
    MULTIPLY         # ()*()
    STORE_VAR   y    # y = ...
    LOAD_VAR    y
    RETURN

Indeed, the bytecode in Python is usually a very close representation of the Python code itself, just broken up into pieces of 'atomic' simple operations.

Internally, each bytecode instruction has a numeric value (that fits into a byte, hence the name). For instance, LOAD_VAR = 124, LOAD_CONST = 100, ADD = 23, etc. (the exact numbers differ between CPython versions). The local variables and constant values are also expressed through numbers. Thus, if we assign x = 01 and y = 02, the above code becomes:

    124, 01, 100, 01, 23, 124, 01, 100, 01,
    24, 20, 125, 02, 124, 02, 83
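
You can look up the numbers your own interpreter uses through the dis module. A brief sketch: the listing above uses simplified names, whereas CPython's own names for loading and storing local variables are LOAD_FAST and STORE_FAST. The snippet assumes these four names exist in your CPython version, and the numbers printed will vary between versions.

```python
import dis

# Real CPython opcode names corresponding to the simplified ones used above.
names = ("LOAD_FAST", "LOAD_CONST", "STORE_FAST", "RETURN_VALUE")

for name in names:
    number = dis.opmap[name]           # name -> numeric opcode
    assert dis.opname[number] == name  # numeric opcode -> name
    print(f"{name} = {number}")
```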

Executing Bytecode

Below you will find a simple and minimalistic interpreter for 'Python bytecode' that is capable of executing the function we have defined above. The actual bytecode interpreter of Python is written in C and thus compiled to highly efficient machine code. But the principle is exactly the same.

It uses a stack to hold intermediate values. That is, the result of each operation is appended to a list. An operation that further processes these results takes them off the end of the list, does something (like adding them together), and then appends the result back to the list (but you have to be careful to keep the operands in the right order when doing things like subtraction or division).

It is convenient to arrange the bytecode into pairs of instructions and arguments. Some instructions (like ADD) do not have an argument, so we just use 0 in that case. But the code used here is still the bytecode presented above.

    def execute(bytecode, consts, vars):
        stack = []
        for (instr, arg) in bytecode:
            if instr == 20:      # MULTIPLY
                stack.append(stack.pop() * stack.pop())
            elif instr == 23:    # ADD
                stack.append(stack.pop() + stack.pop())
            elif instr == 24:    # SUBTRACT (operand order matters)
                second = stack.pop()
                first = stack.pop()
                stack.append(first - second)
            elif instr == 83:    # RETURN
                return stack.pop()
            elif instr == 100:   # LOAD_CONST
                stack.append(consts[arg])
            elif instr == 124:   # LOAD_VAR
                stack.append(vars[arg])
            elif instr == 125:   # STORE_VAR
                vars[arg] = stack.pop()

    my_bytecode = [
        (124, 1), (100, 1), (23, 0), (124, 1), (100, 1),
        (24, 0), (20, 0), (125, 2), (124, 2), (83, 0)
    ]
    my_consts = [None, 1]
    # Index 0 is an unused placeholder so that x sits at index 1 and y at index 2.
    my_vars = [None, 5, 0]   # x = 5 (example value), y = 0
    print(execute(my_bytecode, my_consts, my_vars))   # (5 + 1)*(5 - 1) = 24

You can actually look at the lists of constant values (although they are actually tuples, not lists), or in what order the local variables are defined using:

    print(f.__code__.co_code)      # prints the bytecode
    print(f.__code__.co_consts)    # prints (None, 1)
    print(f.__code__.co_varnames)  # prints ('x', 'y')

A tad more convenient is to use the inspect and dis modules, of course.
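
For example, the dis module can print a readable listing of the function from earlier. This is a small sketch; the exact instruction names in the listing depend on your CPython version and will not match the simplified names used above.

```python
import dis


def f(x):
    y = (x + 1) * (x - 1)
    return y


# Print a human-readable disassembly of f to standard output.
dis.dis(f)

# The same listing is available as a string through dis.Bytecode.
listing = dis.Bytecode(f).dis()
print(f.__code__.co_varnames)
```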

python bytecode, the interpreter and virtual machine

You would first need to write a Python compiler (not interpreter), in any language, preferably Python. The first run of the compiler would need to go through the interpreter.

You would then compile your compiler with itself, leading to a native compiler that needs no interpreter.

You could then use the compiler to compile any Python to native code.

This process is called bootstrapping, and is used by many, if not most, major compilers for many languages.

You can read more about this process here: http://en.wikipedia.org/wiki/Bootstrapping_(compilers)

As for creating an operating system, you would need to implement, as a bare minimum, a Python interpreter, if you want to avoid compiled code. If you write a Python interpreter as a microkernel, you could write the rest of the operating system in Python. (Edit: I just inadvertently described Cleese, which Jiaaro mentioned :))

Is there any difference between CPython and Python

Python is a language.

CPython is the default byte-code interpreter of Python, which is written in C.

There are also other implementations of Python, such as IronPython (for .NET), Jython (for Java), etc.


