2

I'd like to develop a small debugging tool for Python programs. For the "Dynamic Slicing" feature, I need to find the variables that are accessed in a statement, and find the type of access (read or write) for those variables.

But the only disassembly feature that's built into Python is dis.disassemble, and that just prints the disassembly to standard output:

>>> dis.disassemble(compile('x = a + b', '', 'single'))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD          
              7 STORE_NAME               2 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

I'd like to be able to transform the disassembly into a dictionary of sets describing which variables are used by each instruction, like this:

>>> my_disassemble('x = a + b')
{'LOAD_NAME': set(['a', 'b']), 'STORE_NAME': set(['x'])}

How can I do this?

4
  • What should this dictionary look like? Commented Jan 14, 2013 at 18:29
  • You can get the result as a string by temporarily redirecting stdout to a StringIO object. From there you need to parse it into a dictionary, but I don't know what the dict should look like, so that's kind of hard to say ... Commented Jan 14, 2013 at 18:29
  • I'd like to develop a small debugging tool for Python programs. For the dynamic slicing part, I need to find the variables that are accessed in a statement and the type of access (read or write) for each of them. Commented Jan 14, 2013 at 18:31
  • For example, for the line x = a + b: LOAD_NAME = {a, b}, STORE_NAME = {x}. Commented Jan 14, 2013 at 18:43

2 Answers

3

Read the source code for the dis module and you'll see that it's easy to do your own disassembly and generate whatever output format you like. Here's some Python 2 code that generates the sequence of instructions in a code object, together with their arguments:

from opcode import *

def disassemble(co):
    """
    Disassemble a code object and generate its instructions.
    """
    code = co.co_code
    n = len(code)
    extended_arg = 0
    i = 0
    free = None
    while i < n:
        c = code[i]
        op = ord(c)
        i = i+1
        if op < HAVE_ARGUMENT:
            yield opname[op],
        else:
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536L
            if op in hasconst:
                arg = co.co_consts[oparg]
            elif op in hasname:
                arg = co.co_names[oparg]
            elif op in hasjrel:
                arg = repr(i + oparg)
            elif op in haslocal:
                arg = co.co_varnames[oparg]
            elif op in hascompare:
                arg = cmp_op[oparg]
            elif op in hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                arg = free[oparg]
            else:
                arg = oparg
            yield opname[op], arg

And here's an example disassembly.

>>> def f(x):
...     return x + 1
... 
>>> list(disassemble(f.func_code))
[('LOAD_FAST', 'x'), ('LOAD_CONST', 1), ('BINARY_ADD',), ('RETURN_VALUE',)]

You can easily transform this into the dictionary-of-sets data structure you want:

>>> from collections import defaultdict
>>> d = defaultdict(set)
>>> for op in disassemble(f.func_code):
...     if len(op) == 2:
...         d[op[0]].add(op[1])
... 
>>> d
defaultdict(<type 'set'>, {'LOAD_FAST': set(['x']), 'LOAD_CONST': set([1])})

(Or you could generate the dictionary-of-sets data structure directly.)

Note that in your application you probably don't actually need to look up the name for each opcode. Instead, you could look up the opcodes you need in the opcode.opmap dictionary and create named constants, perhaps like this:

LOAD_FAST = opmap['LOAD_FAST'] # actual value is 124
...
for var in disassembly[LOAD_FAST]:
    ...
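
As a rough sketch of how such named constants might drive the read/write classification the question asks about: the version below assumes Python 3.4 or later, where dis.get_instructions is available. The function name reads_and_writes and the choice of opcodes are illustrative assumptions, not part of the dis API, and the set of relevant opcodes varies between Python versions.

```python
import dis
from opcode import opmap

# Opcode numbers differ between Python versions, so look them up by name.
# These opcode lists are an illustrative assumption, not an exhaustive set.
READ_OPS = {opmap[n] for n in ('LOAD_NAME', 'LOAD_GLOBAL', 'LOAD_FAST') if n in opmap}
WRITE_OPS = {opmap[n] for n in ('STORE_NAME', 'STORE_GLOBAL', 'STORE_FAST') if n in opmap}

def reads_and_writes(code):
    """Return (reads, writes): the variable names loaded and stored by `code`."""
    reads, writes = set(), set()
    for instr in dis.get_instructions(code):
        if instr.opcode in READ_OPS:
            reads.add(instr.argval)
        elif instr.opcode in WRITE_OPS:
            writes.add(instr.argval)
    return reads, writes

# Classify the variable accesses in the statement from the question.
reads, writes = reads_and_writes(compile('x = a + b', '<string>', 'exec'))
```

Comparing integer opcodes avoids a string lookup per instruction, which is the point of the named constants above.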

Update: Python 3.4 added dis.get_instructions, which returns the instructions as objects instead of printing them:

>>> def f(x):
...     return x + 1
...
>>> import dis
>>> list(dis.get_instructions(f))
[Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x',
             argrepr='x', offset=0, starts_line=1, is_jump_target=False),
 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1,
             argrepr='1', offset=3, starts_line=None, is_jump_target=False),
 Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None,
             argrepr='', offset=6, starts_line=None, is_jump_target=False),
 Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None,
             argrepr='', offset=7, starts_line=None, is_jump_target=False)]
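
Building on dis.get_instructions, here is a minimal sketch of the dictionary of sets requested in the question. The name my_disassemble comes from the question. Restricting the result to opcodes listed in dis.hasname is an assumption that fits statements compiled at module level, and set reprs differ from the Python 2 output shown in the question.

```python
import dis
from collections import defaultdict

def my_disassemble(source):
    """Map opcode names to the set of names they touch in one statement."""
    code = compile(source, '<string>', 'single')
    accesses = defaultdict(set)
    for instr in dis.get_instructions(code):
        # Keep only instructions whose argument indexes co_names
        # (LOAD_NAME, STORE_NAME, ...); constants and jumps are skipped.
        if instr.opcode in dis.hasname:
            accesses[instr.opname].add(instr.argval)
    return dict(accesses)

result = my_disassemble('x = a + b')
```

Inside a function body, variables are accessed through opcodes such as LOAD_FAST and STORE_FAST, so the opcodes in dis.haslocal would need the same treatment there.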

-1

I think the real challenge here is capturing the output of dis, not parsing that output into a dictionary. I won't cover the parsing in detail, because the question doesn't specify the format or the fields (keys and values) of the dictionary, and that part is trivial.

As I mentioned, capturing the output of dis is a challenge because it is printed rather than returned, but you can capture it with a context manager:

def foo(co):
    import sys
    from contextlib import contextmanager
    from cStringIO import StringIO
    @contextmanager
    def captureStdOut(output):
        # Temporarily redirect sys.stdout; restore it even if the body raises.
        stdout = sys.stdout
        sys.stdout = output
        try:
            yield
        finally:
            sys.stdout = stdout
    out = StringIO()
    with captureStdOut(out):
        dis.disassemble(co.func_code)
    return out.getvalue()

import dis
import re
# Note: dict() keeps only the last argument seen for each opcode name.
dict(re.findall(r"^.*?([A-Z_]+)\s+(.*)$", line)[0] for line in foo(foo).splitlines()
                                                   if line.strip())
{'LOAD_CONST': '0 (None)', 'WITH_CLEANUP': '', 'SETUP_WITH': '21 (to 107)', 'STORE_DEREF': '0 (sys)', 'POP_TOP': '', 'LOAD_FAST': '4 (out)', 'MAKE_CLOSURE': '0', 'STORE_FAST': '4 (out)', 'IMPORT_FROM': '4 (StringIO)', 'LOAD_GLOBAL': '5 (dis)', 'END_FINALLY': '', 'RETURN_VALUE': '', 'LOAD_CLOSURE': '0 (sys)', 'BUILD_TUPLE': '1', 'CALL_FUNCTION': '0', 'LOAD_ATTR': '8 (getvalue)', 'IMPORT_NAME': '3 (cStringIO)', 'POP_BLOCK': ''}
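
For comparison, a minimal sketch of the same capture idea on Python 3.4 or later, where the standard library provides contextlib.redirect_stdout. The helper name capture_dis is a hypothetical choice for illustration.

```python
import dis
import io
from contextlib import redirect_stdout

def capture_dis(code):
    """Return the text that dis.dis would print for `code`."""
    buffer = io.StringIO()
    # redirect_stdout restores sys.stdout on exit, even if dis.dis raises.
    with redirect_stdout(buffer):
        dis.dis(code)
    return buffer.getvalue()

text = capture_dis(compile('x = a + b', '<string>', 'exec'))
```

Since Python 3.4, dis.dis also accepts a file keyword argument, so passing the StringIO object directly avoids touching sys.stdout at all.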
