How should I understand the output of dis.dis?
You are trying to disassemble a string containing source code, but that's not supported by dis.dis in Python 2. With a string argument, it treats the string as if it contained byte code (see the function disassemble_string in dis.py). So you are seeing nonsensical output based on misinterpreting source code as byte code.
Things are different in Python 3, where dis.dis compiles a string argument before disassembling it:
Python 3.2.3 (default, Aug 13 2012, 22:28:10)
>>> import dis
>>> dis.dis('heapq.nlargest(d,3)')
1 0 LOAD_NAME 0 (heapq)
3 LOAD_ATTR 1 (nlargest)
6 LOAD_NAME 2 (d)
9 LOAD_CONST 0 (3)
12 CALL_FUNCTION 2
15 RETURN_VALUE
In Python 2 you need to compile the code yourself before passing it to dis.dis:
Python 2.7.3 (default, Aug 13 2012, 18:25:43)
>>> import dis
>>> dis.dis(compile('heapq.nlargest(d,3)', '<none>', 'eval'))
1 0 LOAD_NAME 0 (heapq)
3 LOAD_ATTR 1 (nlargest)
6 LOAD_NAME 2 (d)
9 LOAD_CONST 0 (3)
12 CALL_FUNCTION 2
15 RETURN_VALUE
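For comparison, here is a minimal sketch (Python 3.4+, since it relies on dis.get_instructions) suggesting that passing the source string and passing a pre-compiled code object lead to the same instruction sequence. The helper name opnames is made up for illustration.

```python
import dis

SOURCE = 'heapq.nlargest(d,3)'

def opnames(target):
    """Return the opcode names dis finds in a source string or code object."""
    return [ins.opname for ins in dis.get_instructions(target)]

# In Python 3, dis compiles a string argument for us ...
from_string = opnames(SOURCE)
# ... while compiling by hand mirrors what Python 2 requires.
from_code = opnames(compile(SOURCE, '<none>', 'eval'))

print(from_string == from_code)
```

The exact opcode names printed by either route depend on the CPython version, so the sketch compares the two lists instead of hard-coding them.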
What do the numbers mean? The number 1 on the far left is the line number in the source code from which this byte code was compiled. The numbers in the next column are the offsets of the instructions within the bytecode, and the numbers on the right are the opargs. Let's look at the actual byte code:
>>> co = compile('heapq.nlargest(d,3)', '<none>', 'eval')
>>> co.co_code.encode('hex')
'6500006a010065020064000083020053'
At offset 0 in the byte code we find 65, the opcode for LOAD_NAME, with the oparg 0000; then (at offset 3) 6a is the opcode LOAD_ATTR, with 0100 the oparg, and so on. Note that the opargs are in little-endian order, so that 0100 is the number 1. The undocumented opcode module contains tables opname giving you the name for each opcode, and opmap giving you the opcode for each name:
>>> import opcode
>>> opcode.opname[0x65]
'LOAD_NAME'
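As a small sketch of the two tables in Python 3 syntax, a name can be mapped to its opcode and back again. The numeric opcode values differ between CPython versions, so none are hard-coded here, and the first opcode of a compiled expression is not necessarily LOAD_NAME on newer interpreters.

```python
import opcode

# opmap: name -> numeric opcode; opname: numeric opcode -> name.
load_name = opcode.opmap['LOAD_NAME']
round_trip = opcode.opname[load_name]
print(load_name, round_trip)

# In Python 3, co_code is a bytes object, so indexing it yields an int
# that can be looked up directly; .hex() replaces .encode('hex').
co = compile('heapq.nlargest(d,3)', '<none>', 'eval')
first_opcode_name = opcode.opname[co.co_code[0]]
print(co.co_code.hex(), first_opcode_name)
```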
The meaning of the oparg depends on the opcode, and for the full story you need to read the implementation of the CPython virtual machine in ceval.c. For LOAD_NAME and LOAD_ATTR the oparg is an index into the co_names property of the code object:
>>> co.co_names
('heapq', 'nlargest', 'd')
For LOAD_CONST it is an index into the co_consts property of the code object:
>>> co.co_consts
(3,)
For CALL_FUNCTION, it is the number of arguments to pass to the function, encoded in 16 bits with the number of ordinary arguments in the low byte, and the number of keyword arguments in the high byte.
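To illustrate the co_names lookup, here is a minimal sketch using dis.get_instructions (Python 3.4+). It only checks LOAD_NAME, because the encoding of other opargs, including those of LOAD_ATTR and CALL_FUNCTION, has changed between CPython versions. The helper name name_loads is hypothetical.

```python
import dis

co = compile('heapq.nlargest(d,3)', '<none>', 'eval')

def name_loads(code):
    """Return (offset, oparg, resolved name) for each LOAD_NAME instruction."""
    return [(ins.offset, ins.arg, ins.argval)
            for ins in dis.get_instructions(code)
            if ins.opname == 'LOAD_NAME']

loads = name_loads(co)
for offset, arg, name in loads:
    # For LOAD_NAME the oparg indexes directly into co_names.
    print(offset, arg, name, co.co_names[arg] == name)
```

The argval field is the value dis has already resolved for you, so in most scripts there is no need to index co_names or co_consts by hand.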
capturing dis.dis results
Unfortunately, in Python versions before 3.4 the dis module uses print statements to stdout, so it won't return anything directly useful. Either you have to re-implement the dis, disassemble and disassemble_string functions, or you temporarily replace sys.stdout with an alternative to capture the output:
import dis
import sys
from cStringIO import StringIO
out = StringIO()
stdout = sys.stdout
sys.stdout = out
try:
    dis.dis()
finally:
    sys.stdout = stdout
out = out.getvalue()
This is actually best done using a context manager:
import dis
import sys
from contextlib import contextmanager
from cStringIO import StringIO
@contextmanager
def captureStdOut(output):
    stdout = sys.stdout
    sys.stdout = output
    try:
        yield
    finally:
        sys.stdout = stdout
out = StringIO()
with captureStdOut(out):
    dis.dis()
print out.getvalue()
That way you are guaranteed to have stdout restored even if something goes wrong with dis. A little demonstration:
>>> out = StringIO()
>>> with captureStdOut(out):
...     dis.dis(captureStdOut)
...
>>> print out.getvalue()
83 0 LOAD_GLOBAL 0 (GeneratorContextManager)
3 LOAD_DEREF 0 (func)
6 LOAD_FAST 0 (args)
9 LOAD_FAST 1 (kwds)
12 CALL_FUNCTION_VAR_KW 0
15 CALL_FUNCTION 1
18 RETURN_VALUE
In Python 3.4 and up, the relevant functions take a file parameter to redirect output to:
import dis
from io import StringIO
with StringIO() as out:
    dis.dis(file=out)
    print(out.getvalue())
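As a runnable sketch of the Python 3.4+ route, the two helpers below capture the listing either through the file parameter or through contextlib.redirect_stdout. The function add and both helper names are made up for illustration.

```python
import dis
import io
from contextlib import redirect_stdout

def add(a, b):
    return a + b

def disassemble_with_file(obj):
    """Capture the listing via the file= parameter (Python 3.4+)."""
    with io.StringIO() as out:
        dis.dis(obj, file=out)
        return out.getvalue()

def disassemble_with_redirect(obj):
    """Capture the listing by temporarily redirecting sys.stdout."""
    out = io.StringIO()
    with redirect_stdout(out):
        dis.dis(obj)
    return out.getvalue()

text = disassemble_with_file(add)
print(text)
```

Passing file= is probably the cleaner choice, since it leaves sys.stdout alone; redirect_stdout does the same job as the hand-written context manager shown earlier.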
Get the results of dis.dis() in a string
Uses StringIO to redirect stdout to a string-like object (Python 2.7 solution):
import sys
import StringIO
import dis
def a():
    print "Hello World"
stdout = sys.stdout # Hold onto the stdout handle
f = StringIO.StringIO()
sys.stdout = f # Assign new stdout
dis.dis(a) # Run dis.dis()
sys.stdout = stdout # Reattach stdout
print f.getvalue() # print contents
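On Python 3.4 and later there is also a way to get the string without touching stdout at all. The following is a minimal sketch using dis.Bytecode, with the same toy function a rewritten in Python 3 syntax.

```python
import dis

def a():
    print("Hello World")

# Bytecode wraps the code object; .dis() returns the formatted listing
# as a string instead of printing it.
listing = dis.Bytecode(a).dis()
print(listing)
```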
How does one use `dis.dis` to analyze performance?
You (or at least most people) can't look at two disassembly listings and tell which one is faster.
Try the %%timeit magic function from IPython.
It automatically runs the piece of code several times and gives you an objective answer.
I recently found this blog post that teaches how to measure this kind of thing in Python: not only time, but memory usage too. The highlight of the post (for me, at least) is where it teaches you to use the %lprun magic function.
Using it, you can see your function line by line and know exactly how much each line contributes to the total time spent.
I've been using it for a few weeks now, and it's great.
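Outside IPython, the standard library timeit module offers similar repeated timing. The sketch below compares two arbitrary example statements; the statements, repeat counts and variable names are assumptions chosen only for illustration.

```python
import timeit

# Two ways of building the same list; the statements are arbitrary examples.
loop_stmt = """
result = []
for i in range(1000):
    result.append(i * 2)
"""
comp_stmt = "result = [i * 2 for i in range(1000)]"

# repeat() returns one total time per run; the minimum is the least noisy.
loop_time = min(timeit.repeat(loop_stmt, number=200, repeat=3))
comp_time = min(timeit.repeat(comp_stmt, number=200, repeat=3))
print("loop: %.6fs  comprehension: %.6fs" % (loop_time, comp_time))
```

The absolute numbers depend on the machine and the interpreter, so only the comparison between the two is meaningful.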
Use output of dis module?
I found out that Python 3.4 comes with dis.get_instructions(). This is what I was looking for:
import dis
def get_assigned_name(frame):
    ''' Checks the bytecode of *frame* to find the name of the variable
    a result is being assigned to and returns that name. Returns the full
    left operand of the assignment. Raises a `ValueError` if the variable
    name could not be retrieved from the bytecode (eg. if an unpack sequence
    is on the left side of the assignment).
        >>> var = get_assigned_name(sys._getframe())
        >>> assert var == 'var'
    '''
    SEARCHING, MATCHED = 1, 2
    state = SEARCHING
    result = ''
    for op in dis.get_instructions(frame.f_code):
        if state == SEARCHING and op.offset == frame.f_lasti:
            # Found the instruction the frame is currently executing.
            state = MATCHED
        elif state == MATCHED:
            if result:
                # Already inside a dotted name such as "obj.attr".
                if op.opname == 'LOAD_ATTR':
                    result += op.argval + '.'
                elif op.opname == 'STORE_ATTR':
                    result += op.argval
                    break
                else:
                    raise ValueError('expected {LOAD_ATTR, STORE_ATTR}', op.opname)
            else:
                if op.opname in ('LOAD_NAME', 'LOAD_FAST'):
                    result += op.argval + '.'
                elif op.opname in ('STORE_NAME', 'STORE_FAST'):
                    result = op.argval
                    break
                else:
                    message = 'expected {LOAD_NAME, LOAD_FAST, STORE_NAME, STORE_FAST}'
                    raise ValueError(message, op.opname)
    if not result:
        raise RuntimeError('last frame instruction not found')
    return result
what's in python dis output?
If you look up the byte offsets, you will see that each of the lines tagged with >> is the target of a jump or other branching operation. The marker is there to help you identify loop scopes and the like more easily.
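To see which offsets are jump targets programmatically, a minimal sketch can combine dis.findlabels with dis.get_instructions (Python 3.4+). The function total_up_to is a made-up example, and the way dis.dis itself renders jump targets can vary between CPython versions.

```python
import dis

def total_up_to(n):
    total = 0
    for i in range(n):
        total += i
    return total

code = total_up_to.__code__
# findlabels returns the byte offsets that some jump instruction targets.
targets = set(dis.findlabels(code.co_code))

for ins in dis.get_instructions(code):
    marker = '>>' if ins.offset in targets else '  '
    print(marker, ins.offset, ins.opname)
```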
Why does dis.dis(None) return output?
From the documentation:
dis.dis([bytesource])
Disassemble the bytesource object. bytesource can denote either a module, a class, a method, a function, or a code object. For a module, it disassembles all functions. For a class, it disassembles all methods. For a single code sequence, it prints one line per bytecode instruction. If no object is provided, it disassembles the last traceback.
Emphasis mine (on the last sentence).
If you try it in a new interpreter there is no last traceback, so you get an error:
>>> import dis
>>> dis.dis(None)
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
dis.dis(None)
File "C:\Python27\lib\dis.py", line 23, in dis
distb()
File "C:\Python27\lib\dis.py", line 57, in distb
raise RuntimeError, "no last traceback to disassemble"
RuntimeError: no last traceback to disassemble
But now if you try the same command it disassembles the traceback of the RuntimeError you just got, that is, the code of distb itself, with --> marking the instruction that raised:
>>> dis.dis(None)
53 0 LOAD_FAST 0 (tb)
3 LOAD_CONST 2 (None)
6 COMPARE_OP 8 (is)
9 POP_JUMP_IF_FALSE 82
54 12 SETUP_EXCEPT 13 (to 28)
55 15 LOAD_GLOBAL 1 (sys)
18 LOAD_ATTR 2 (last_traceback)
21 STORE_FAST 0 (tb)
24 POP_BLOCK
25 JUMP_FORWARD 26 (to 54)
56 >> 28 DUP_TOP
29 LOAD_GLOBAL 3 (AttributeError)
32 COMPARE_OP 10 (exception match)
35 POP_JUMP_IF_FALSE 53
38 POP_TOP
39 POP_TOP
40 POP_TOP
57 41 LOAD_GLOBAL 4 (RuntimeError)
44 LOAD_CONST 1 ('no last traceback to disassemble')
--> 47 RAISE_VARARGS 2
50 JUMP_FORWARD 1 (to 54)
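The same traceback disassembly can be reproduced in a script, where no "last traceback" is recorded, by passing a traceback to dis.distb explicitly. This is a minimal Python 3.4+ sketch; the function divide is a made-up example.

```python
import dis
import io

def divide(a, b):
    return a / b

try:
    divide(1, 0)
except ZeroDivisionError as exc:
    out = io.StringIO()
    # distb walks to the innermost frame of the traceback (divide here)
    # and disassembles its code object.
    dis.distb(exc.__traceback__, file=out)
    listing = out.getvalue()

print(listing)
```

As in the listing above, the instruction that was executing when the exception was raised is flagged with -->.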
How to get function parameters from dis
xdis isn't going to give you much useful functionality here. It just gives you more object-oriented output that you could theoretically work with a bit more easily than that of the regular dis module. But the stock module tells us all we need to know:
>>> from dis import dis
>>> def f(a, b):
...     return 1
...
>>> dis(f)
2 0 LOAD_CONST 1 (1)
2 RETURN_VALUE
Note how the disassembly includes only two opcodes. LOAD_CONST pushes a 1 onto the stack (the CPython runtime is stack based) and RETURN_VALUE returns from the function with the value on the top of the stack. There is no mention of a or b here. And this makes sense: they aren't used! But the byte code doesn't concern itself with function arguments. It will emit the necessary ops to put them on the stack (where needed):
>>> def f(a, b):
...     return a + b
...
>>> dis(f)
2 0 LOAD_FAST 0 (a)
2 LOAD_FAST 1 (b)
4 BINARY_ADD
6 RETURN_VALUE
Note here that LOAD_FAST gets a and b and pushes them onto the stack for BINARY_ADD (which adds the top two values on the stack and pushes the result).
You can get at what you want by using __code__, specifically:
params_and_locals = f.__code__.co_varnames
num_args = f.__code__.co_argcount + f.__code__.co_kwonlyargcount
params = params_and_locals[:num_args]
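Here is a self-contained sketch of that slice on a made-up function with a keyword-only parameter and a local variable, cross-checked against inspect.signature. The function f and the variable signature_names are assumptions for illustration only.

```python
import inspect

def f(a, b, *, key=None):
    total = a + b  # a local variable, also stored in co_varnames
    return total

code = f.__code__
# co_varnames lists the parameters first, followed by other locals.
num_args = code.co_argcount + code.co_kwonlyargcount
params = code.co_varnames[:num_args]
print(params)

# inspect.signature reports the same names, plus defaults and kinds.
signature_names = tuple(inspect.signature(f).parameters)
print(signature_names)
```

Note that *args and **kwargs are not counted in co_argcount, so for functions that use them inspect.signature is likely the more convenient tool.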