I've built a function called foo that alters its caller's code at the bytecode level and executes the altered code before returning to the caller's regular execution flow.

import sys
from types import CodeType


def foo():
    frame = sys._getframe(1) # get main's frame

    main_code: CodeType = do_something(frame.f_code) # modify function code

    # copy globals & locals
    main_globals: dict = frame.f_globals.copy()
    main_locals: dict = frame.f_locals.copy()

    # execute altered bytecode before returning to regular code
    exec(main_code, main_globals, main_locals)

    return

def main():
    bar: list = []

    # run altered code
    foo()

    # return to regular code
    bar.append(0)

    return bar

if __name__ == '__main__':
    main()

However, the local variable bar cannot be resolved during exec:

Traceback (most recent call last):
  File "C:\Users\Pedro\main.py", line 31, in <module>
    main()
  File "C:\Users\Pedro\main.py", line 23, in main
    foo()
  File "C:\Users\Pedro\main.py", line 15, in foo
    exec(main_code, main_globals, main_locals)
  File "C:\Users\Pedro\main.py", line 26, in main
    bar.append(0)
UnboundLocalError: local variable 'bar' referenced before assignment

If I print main_locals before the call to exec, it shows exactly the same contents as main's locals before foo is called. I wonder whether the problem has to do with any of the frame.f_code.co_* arguments passed to the CodeType constructor. They are the same as in the original code object, except for the actual bytecode frame.f_code.co_code, which I modified.

I need help understanding why evaluating the code under these globals and locals fails to resolve main's local variables.
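
A minimal sketch of one way to narrow this down, independent of do_something. It assumes CPython, and the names uses_local_first and run are hypothetical helpers introduced only for this illustration. The idea is that code compiled from a function body appears to keep its local variables in the frame's fast slots and does not look them up in the mapping passed to exec, whereas module-style code does use that mapping:

```python
def uses_local_first():
    # 'bar' is assigned further down, so the compiler treats it as a
    # function-local (fast) variable everywhere in this body.
    bar.append(0)
    bar = None


# The same statement compiled as module-level code looks names up by name.
module_style = compile("bar.append(0)", "<sketch>", "exec")


def run(code, local_values):
    """Execute `code` with an empty globals dict and the given locals mapping."""
    try:
        exec(code, {}, local_values)
    except UnboundLocalError:
        return "UnboundLocalError"
    return "ok"


shared = {"bar": []}
module_result = run(module_style, shared)
function_result = run(uses_local_first.__code__, {"bar": []})

print(module_result, shared["bar"])
print(function_result)
```

If this behaves the same way on your interpreter, the locals dictionary handed to exec is not what a function code object reads its local variables from, regardless of what the dictionary contains.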

Note: I'm pretty sure that the changes made to main's bytecode prevent the process from going into unwanted recursion.

Edit: As asked in the comments, the basic behaviour of do_something can be summarized as removing all of main's code before the call to foo. Additional steps would apply changes to local variables, e.g. bar.

import copy
import dis

## dump opcodes into global scope
globals().update(dis.opmap)

NULL = 0

def do_something(f_code) -> CodeType:
    bytecode = f_code.co_code
    f_consts = copy.deepcopy(f_code.co_consts)

    for i in range(0, len(bytecode), 2):
        cmd, arg = bytecode[i], bytecode[i+1]
        # watch for the first occurrence of calling 'foo'
        if cmd == LOAD_GLOBAL and f_code.co_names[arg] == 'foo':
            break # use 'i' variable later
    else:
        raise NameError('foo is not defined.')

    f_bytelist = list(bytecode)

    f_bytelist[i:i+4] = [
        NOP, NULL,                ## LOAD
        LOAD_CONST, len(f_consts) ## CALL
        # Constant 'None' will be added to 'f_consts'
        ]

    f_bytelist[-2:] = [NOP, NULL] # 'main' function RETURN

    # This piece of code removes all code before
    # calling 'foo' (except for JUMP_ABSOLUTE) so
    # it can be used inside while loops.
    null_code = [True] * i
    j = i + 2
    while j < len(f_bytelist):
        if j >= i:
            cmd, arg = f_bytelist[j], f_bytelist[j+1]
            if cmd == JUMP_ABSOLUTE and arg < i and null_code[arg]:
                j = arg
            else:
                j += 2
        else:
            null_code[j] = False
            j += 2
    else:
        for j in range(0, i, 2):
            if null_code[j]:
                f_bytelist[j:j+2] = [NOP, NULL] # skip instruction
            else:
                continue    
    
    f_bytecode = bytes(f_bytelist)
    f_consts = f_consts + (None,) ## Add constant to return

    return CodeType(
            f_code.co_argcount,
            f_code.co_posonlyargcount, # Remove this if Python < 3.8
            f_code.co_kwonlyargcount,
            f_code.co_nlocals,
            f_code.co_stacksize,
            f_code.co_flags,
            f_bytecode,
            f_consts,
            f_code.co_names,
            f_code.co_varnames,
            f_code.co_filename,
            f_code.co_name,
            f_code.co_firstlineno,
            f_code.co_lnotab,
            f_code.co_freevars,
            f_code.co_cellvars
            )
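
As a hedged alternative to stepping through co_code two bytes at a time, the search for the first load of foo can be sketched with dis.get_instructions, which decodes opcodes and arguments for whichever interpreter version is running. The helper name find_global_load and the function sample are assumptions made for this example and are not part of the code above:

```python
import dis


def find_global_load(code, name):
    """Return the byte offset of the first LOAD_GLOBAL of `name`, or None."""
    for ins in dis.get_instructions(code):
        if ins.opname == "LOAD_GLOBAL" and ins.argval == name:
            return ins.offset
    return None


def sample():
    bar = []
    foo()  # only compiled here, never executed
    return bar


offset = find_global_load(sample.__code__, "foo")
missing = find_global_load(sample.__code__, "not_there")

print(offset, missing)
```

The returned offset plays the role of the variable i in do_something. Relying on argval avoids indexing co_names by hand, which matters because the raw argument encoding of LOAD_GLOBAL is not the same in every CPython release.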

  • I think we are going to have to see do_something(frame.f_code). Commented Mar 13, 2021 at 4:46
  • Yes, the issue is almost certainly in the function you didn't show; that is by far the most important part of how your program actually works, surely? Please edit to include a minimal reproducible example. Commented Mar 13, 2021 at 8:05
  • I just added it below. I did not include it before because of its length and because I was thinking that variable evaluation depended only on the scope defined by exec's parameters. Commented Mar 14, 2021 at 5:50
  • Can you add the necessary import statements? The code still can't be run as shown (we're missing the copy module and something that gives lots of bytecode constants). Anyway, my minimally informed guess is that this has to do with LOAD_FAST bytecodes not using dictionaries for locals and globals, but using a separate sequential bit of memory that gives much better performance. I think when you run exec with two arguments, you get the equivalent of a code run in a class body, not the equivalent of a function body, like you want. Commented Mar 14, 2021 at 6:01
  • The LOAD_FAST hypothesis is very interesting. I will dive further into this. Maybe it has something to do with the f_code.co_lnotab byte sequence. I haven't understood it yet, but it's some sort of lookup table. I've included additional code. Commented Mar 14, 2021 at 21:30
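
The LOAD_FAST guess from the comments can be checked with a small sketch, assuming CPython. It compares the opcodes generated for the same statements inside a function body and at module level; the source string and variable names below are made up for the illustration:

```python
import dis


def main():
    bar = []
    bar.append(0)
    return bar


# The same statements compiled as module-level code.
module_code = compile("bar = []\nbar.append(0)\n", "<sketch>", "exec")

function_ops = {ins.opname for ins in dis.get_instructions(main)}
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}

# Newer CPython releases use variants such as LOAD_FAST_CHECK, so match by prefix.
function_uses_fast = any(op.startswith("LOAD_FAST") for op in function_ops)
module_uses_name = "LOAD_NAME" in module_ops
module_uses_fast = any(op.startswith("LOAD_FAST") for op in module_ops)

print(function_uses_fast, module_uses_name, module_uses_fast)
```

Only the LOAD_NAME family consults the mappings passed to exec, which would be consistent with main's function code ignoring main_locals.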
