
First of all I want to mention that I know this is a horrible idea and it shouldn't be done. My intention is mainly curiosity and learning the innards of Python, and how to 'hack' them.

I was wondering whether it is at all possible to change what happens when we use, for instance, [] to create a list. Is there a way to modify how the parser behaves so that, say, ["hello world"] calls print("hello world") instead of creating a list with one element?

I've searched for documentation or posts about this but couldn't find any.

Below is an example of replacing the built-in dict to instead use a custom class:

from __future__ import annotations
from typing import List, Any
import builtins


class Dict(dict):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        return Dict({key: self[key] for key in keys})


builtins.dict = Dict

When this module is imported, it replaces the dict built-in with the Dict class. However, this only works when we call dict() directly. If we use {} instead, it falls back to the base dict built-in implementation:

import new_dict

a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}

print(type(a))
print(type(b))

Yields:

<class 'py_extensions.new_dict.Dict'>
<class 'dict'>
  • 2
    You can't override built-ins from within Python. You will have to modify the actual interpreter implementation (such as CPython). Commented Mar 28, 2022 at 4:02
  • 2
    @JackAvante True, you can shadow built-in names. I should have been more specific. Commented Mar 28, 2022 at 4:06
  • 1
    @SeanXie Edited my code to show what I'm doing as well Commented Mar 28, 2022 at 4:10
  • 1
    One possibility is to do this via transpiling. Then, when you "import" a .py_better file, it transpiles it into python and imports that. Commented Mar 28, 2022 at 4:31
  • 1
    Very related: stackoverflow.com/q/19083160/476 Commented Mar 31, 2022 at 12:33

2 Answers

3
+25

[] and {} are compiled to dedicated opcodes that always build a list or a dict, respectively. list() and dict(), on the other hand, compile to bytecode that looks up the names list and dict (in the module globals, then in builtins) and calls whatever it finds:

import dis

dis.dis(lambda:[])
dis.dis(lambda:{})
dis.dis(lambda:list())
dis.dis(lambda:dict())

prints the following on CPython 3.10 and earlier (newer versions use different call opcodes; some additional newlines added for clarity):

  3           0 BUILD_LIST               0
              2 RETURN_VALUE

  5           0 BUILD_MAP                0
              2 RETURN_VALUE

  7           0 LOAD_GLOBAL              0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

  9           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

Thus you can override what dict() returns simply by rebinding the name dict, but you can't change what {} returns.
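To see the difference without touching the real built-ins, here is a minimal sketch that rebinds dict only inside a throwaway namespace. MyDict, opnames and namespace are illustrative names, not part of the question's code.

```python
import dis


def opnames(func):
    """Return the opcode names the compiler emitted for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


class MyDict(dict):
    """Stand-in for a custom dict class (hypothetical name)."""


# Run a snippet in a namespace where the name `dict` is rebound.
namespace = {"dict": MyDict}
exec("called = dict(a=1)\nliteral = {'a': 1}", namespace)

print(type(namespace["called"]))   # the name lookup finds MyDict
print(type(namespace["literal"]))  # the literal ignores the rebinding

# The literal compiles to a map-building opcode; the call is a name lookup.
print("BUILD_MAP" in opnames(lambda: {}))
print("LOAD_GLOBAL" in opnames(lambda: dict()))
```

The call goes through a name lookup, so it picks up MyDict. The literal never looks at the name dict at all.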

These opcodes are documented in the dis module documentation. If the BUILD_MAP opcode runs, you get a dict; there is no way around it. For example, the implementation of BUILD_MAP in CPython calls the function _PyDict_FromItems. It doesn't consult any user-defined classes; it directly builds the C struct that represents a Python dict.

It is possible, in at least some cases, to manipulate Python bytecode at runtime. If you really wanted {} to return a custom class, I suppose you could write some code that searches for the BUILD_MAP opcode and replaces it with the appropriate opcodes. Those opcode sequences aren't the same size, though, so there are probably quite a few additional changes you'd have to make.
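As a rough sketch of only the "search" half of that idea, the snippet below lists where a function builds dicts. It does not attempt the rewrite. One caveat: on many CPython 3 versions a literal whose keys are all constants compiles to BUILD_CONST_KEY_MAP rather than BUILD_MAP, so both names are checked. find_map_builders and sample are hypothetical names.

```python
import dis

# Opcode names that build a dict; which ones appear depends on the
# CPython version and on whether the literal's keys are constants.
MAP_OPNAMES = {"BUILD_MAP", "BUILD_CONST_KEY_MAP"}


def find_map_builders(code):
    """Return (function name, byte offset, opname) for each dict-building op."""
    hits = []
    for ins in dis.get_instructions(code):
        if ins.opname in MAP_OPNAMES:
            hits.append((code.co_name, ins.offset, ins.opname))
    # Nested functions, lambdas and comprehensions live in co_consts.
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            hits.extend(find_map_builders(const))
    return hits


def sample():
    empty = {}
    filled = {"a": 5, "b": 8}
    return empty, filled


hits = find_map_builders(sample.__code__)
for hit in hits:
    print(hit)
```

Replacing those instructions would additionally mean fixing up jump targets, constants and stack depth, which is the hard part alluded to above.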


1

The ast module is an interface to Python's abstract syntax tree (AST), which is built when Python code is parsed.
You can replace a dict literal ({}) with a dict() call by modifying the AST of the code before compiling it.

import ast
import new_dict

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

src = """

a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}

print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))

"""

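class RewriteDict(ast.NodeTransformer):
    def visit_Dict(self, node):
        # transform dict literals nested inside this one first
        self.generic_visit(node)
        # don't replace `dict({"a": 1})`
        parent = getattr(node, "parent", None)
        if (
            isinstance(parent, ast.Call)
            and isinstance(parent.func, ast.Name)  # e.g. `obj.method({...})` has no `.id`
            and parent.func.id == "dict"
        ):
            return node
        # replace `{"a": 1}` with `dict({"a": 1})`
        new_node = ast.Call(
            func=ast.Name(id="dict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        # reuse the literal's source position for the new call
        return ast.fix_missing_locations(ast.copy_location(new_node, node))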


tree = ast.parse(src)

# set parent to every node
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        child.parent = node

RewriteDict().visit(tree)
exec(compile(tree, "ast", "exec"))

Output:

<class 'new_dict.Dict'>
<class 'dict'>
<class 'dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
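The same technique can be pointed at the list example from the question. The following is a minimal sketch, not a recommendation: ListToPrint and run_transformed are made-up names, and stdout is captured only so the result can be inspected.

```python
import ast
import contextlib
import io


class ListToPrint(ast.NodeTransformer):
    """Rewrite list displays such as ["hello world"] into print(...) calls."""

    def visit_List(self, node):
        self.generic_visit(node)
        if not isinstance(node.ctx, ast.Load):
            return node  # leave assignment targets like `[a, b] = pair` alone
        call = ast.Call(
            func=ast.Name(id="print", ctx=ast.Load()),
            args=node.elts,
            keywords=[],
        )
        return ast.fix_missing_locations(ast.copy_location(call, node))


def run_transformed(source):
    """Parse, rewrite, compile and run source; return what it printed."""
    tree = ListToPrint().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<transformed>", "exec"), {})
    return buffer.getvalue()


captured = run_transformed('["hello world"]')
print(repr(captured))
```

This only affects source text that is passed through run_transformed. Code compiled the normal way still gets ordinary lists.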

1 Comment

That's awesome! Thanks for the demonstration! I might make use of this together with MacroPy.
