2

I'm writing a C extension for Python and going through the documentation, I'm having a hard time understanding member assignment in the __init__ function.

So there, in section 2.2, member assignment is done as follow:

if (first) {
    tmp = self->first;
    Py_INCREF(first);
    self->first = first;
    Py_XDECREF(tmp);
}

The explanation says later:

Our type doesn’t restrict the type of the first member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the first member; or that destructor could release the Global interpreter Lock and let arbitrary code run in other threads that accesses and modifies our object.

If I Py_XDECREF it, self->first will become invalid.

I understand the case where, if the destructor releases the Global interpreter Lock, there is a dangling pointer and my object (self) could be modified, etc... Everything goes out of control.

But why is it an issue the destructor accesses it ? Why this part:

It could have a destructor that causes code to be executed that tries to access the first member;

is an issue ? If the destructor of self->first accesses itself, fine. I don't care, it's its problem.

Hope I'm clear enough. Thanks for your replies.

2
  • not quite sure where the confusion is, but maybe here: the object referred to by first could have a reference back to the object being initialised, this other object's destructor could itself change first (it's exposed via a PyMemberDef) and cause an issue, if the code wasn't being as careful. can write up a better description as an answer if that's about right... Commented Sep 3, 2019 at 11:12
  • Did not think about the back reference problem. Thanks for pointing it out @SamMason However, in such a situation, why on hell would the destructor of first do something with the back reference (the self of self->first) ? Commented Sep 3, 2019 at 13:17

1 Answer 1

2

Whenever you allow arbitrary Python code to run you need to make sure your object is in a valid state.

When you DECREF an object it's possible that arbitrary Python code is run, for example if the object being deleted has a __del__ method (or if it's a C extension method a tp_dealloc). That code could do anything, for example (like mentioned in the quoted text) access the first attribute of the instance.

For example:

c = Custom("Firstname", "Lastname", 10)

class T:
    def __del__(self):
        print("T", c.first)  # access the first attribute of the "c" variable

c.first = T()
c.__init__(T())
c.__init__(T())
c.__init__(T())

Now if your C code would look like this:

Py_XDECREF(self->first);
Py_INCREF(first);
self->first = first;

At the time the T.__del__ would run (triggered by Py_XDECREF) it would access the c.first which at that moment is an invalid object (because it has a reference count of 0).

In this example it accidentally doesn't break (on my computer) because the memory isn't re-used yet. However if it's slightly more complex it often kills the Python process (on my computer):

c = Custom("Firstname", "Lastname", 10)

class T:
    def __del__(self):
        print("T", c.first)

class U:
    def __init__(self, t):
        self.t = t

    def __del__(self):
        print("U")

c.first = U(T())
c.__init__(U(T()))  # repeated multiple times to make the crash more likely
c.__init__(U(T()))
c.__init__(U(T()))

All that can be avoided (and should be avoided) by making sure the object is in a valid state before calling DECREF (not just during __init__ but everywhere!), either by setting it to null:

Py_CLEAR(self->first);  // Sets the field to null before XDECREF is used
Py_INCREF(first);
self->first = first;

Or by replacing it immediately with first:

tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_XDECREF(tmp);

Or if you don't want to repeatedly use this kind of idiom, you can also create a function like this:

static void replace_field(PyObject **field, PyObject *val)
{
    PyObject *tmp = *field;
    *field = val;
    Py_XDECREF(tmp);
}

Which simplifies the code to:

Py_INCREF(first);
replace_field(&self->first, first);
Sign up to request clarification or add additional context in comments.

1 Comment

As I always pay attention to what I'm doing in __del__, I did not think about such (evil) possibilities. Thanks for the Py_CLEAR macro, did not see it at first in the doc. ;)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.