
Many iterator "functions" in the __builtin__ module are actually implemented as types, even though the documentation talks about them as being "functions". Take enumerate, for instance. The documentation says that it is equivalent to:

def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1

Which is exactly how I would have implemented it, of course. However, I ran the following test with that definition and got this:

>>> x = enumerate(range(10))
>>> x
<generator object enumerate at 0x01ED9F08>

Which is what I expect. However, when using the __builtin__ version, I get this:

>>> x = enumerate(range(10))
>>> x
<enumerate object at 0x01EE9EE0>

From this I infer that it is defined as

class enumerate(object):
    def __init__(self, sequence, start=0):
        pass  # ...

    def __iter__(self):
        pass  # ...

Rather than in the standard form the documentation shows. Now, I can understand how this works and how it is equivalent to the standard form; what I want to know is the reason for doing it this way. Is it more efficient? Does it have something to do with these functions being implemented in C (I don't know if they are, but I suspect so)?

I'm using Python 2.7.2, just in case the difference is important.

Thanks in advance.

  • Is that a problem for you? Functions and classes are just callable objects... Commented Feb 13, 2013 at 19:38
  • @JBernardo It's not a problem in almost all circumstances (and when it is, you should probably just fix the hack that breaks). But it's still interesting. Commented Feb 13, 2013 at 19:39
  • No, of course not. It's just an academic question. I want to know the rationale behind their implementation, when implementing generators is so easy. And maybe it will give me some insight into the question: should I do it this way for my own generators? Commented Feb 13, 2013 at 19:41

3 Answers

Yes, it has to do with the fact that built-ins are generally implemented in C. Very often C code will introduce new types instead of plain functions, as in the case of enumerate. Writing them in C provides finer control over them and often some performance improvements, and since there is no real downside it's a natural choice.

Take into account that to write the equivalent of:

def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1

in C, i.e. as a function returning a new generator instance, you would have to create a code object that contains the actual bytecode. This is not impossible, but it is not nearly as easy as writing a new type that simply implements __iter__ and __next__ (next in Python 2) using the Python C API, which also brings the other advantages of having a distinct type.
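For comparison, here is roughly what the "new type" approach looks like when written in Python instead of C: an object with __iter__ and __next__ methods (the Python-level names of the tp_iter and tp_iternext slots). This is a minimal sketch, not CPython's actual code; the class name EnumerateIter is hypothetical.

```python
class EnumerateIter(object):
    """Hypothetical class-based equivalent of enumerate (illustration only)."""

    def __init__(self, sequence, start=0):
        self._it = iter(sequence)  # underlying iterator
        self._n = start            # index to pair with the next element

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        elem = next(self._it)  # raises StopIteration when the input is exhausted
        result = (self._n, elem)
        self._n += 1
        return result

    next = __next__  # Python 2 spells the iterator method "next"


print(list(EnumerateIter("abc", start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Unlike a generator, this object does not suspend a frame between items; its whole state is two plain attributes, roughly like the index and source iterator that the C type keeps in its struct.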

So, in the case of enumerate and reversed it's simply because it provides better performance, and it's more maintainable.

Other advantages include:

  • You can add methods to the type (e.g. chain.from_iterable). This could be done with functions too, but you'd have to define them first and then set the attributes manually, which doesn't look as clean.
  • You can use isinstance on the iterators. This could allow some optimizations (e.g. if you know that isinstance(iterable, itertools.repeat) is true, you may be able to optimize the code, since you know which values will be yielded).
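Both points can be seen with the real itertools types below. The last part shows the function-attribute workaround for comparison; my_chain and _my_chain_from_iterable are hypothetical names, not library functions.

```python
import itertools

# chain is a type, so from_iterable is simply a class method defined on it.
print(list(itertools.chain.from_iterable([[1, 2], [3]])))  # [1, 2, 3]

# repeat is a type too, so isinstance can identify this kind of iterator.
r = itertools.repeat("x", 3)
print(isinstance(r, itertools.repeat))  # True


# Hypothetical generator-function version of chain: the alternative
# constructor has to be defined separately and attached by hand.
def my_chain(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item


def _my_chain_from_iterable(iterables):
    # Unlike the real chain.from_iterable, this unpacks the outer iterable eagerly.
    return my_chain(*iterables)


my_chain.from_iterable = _my_chain_from_iterable  # attribute set manually

print(list(my_chain.from_iterable([[1, 2], [3]])))  # [1, 2, 3]
```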

Edit: Just to clarify what I mean by:

in C, i.e. a new instance of a generator, you should create a code object that contains the actual bytecode.

Looking at Objects/genobject.c the only function to create a PyGen_Type instance is PyGen_New whose signature is:

PyObject *
PyGen_New(PyFrameObject *f)

Now, looking at Objects/frameobject.c we can see that to create a PyFrameObject you must call PyFrame_New, which has this signature:

PyFrameObject *
PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals,
            PyObject *locals)

As you can see, it requires a PyCodeObject instance. PyCodeObjects are how the Python interpreter represents bytecode internally (e.g. a PyCodeObject can represent the bytecode of a function), so yes: to create a PyGen_Type instance from C you must manually create the bytecode, and creating PyCodeObjects is not easy, since PyCode_New has this signature:

PyCodeObject *
PyCode_New(int argcount, int kwonlyargcount,
           int nlocals, int stacksize, int flags,
           PyObject *code, PyObject *consts, PyObject *names,
           PyObject *varnames, PyObject *freevars, PyObject *cellvars,
           PyObject *filename, PyObject *name, int firstlineno,
           PyObject *lnotab)

Note how it takes arguments such as firstlineno and filename, which are obviously meant to be obtained from Python source and not from other C code. You can certainly create it in C, but I'm not at all sure it would take fewer characters than writing a simple new type.
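The dependency is visible from Python as well: a generator object keeps a reference to its frame and to the code object compiled from the source, including the filename and first line number mentioned above. A small sketch, using the pure-Python definition from the question under the hypothetical name my_enumerate:

```python
import types


def my_enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1


g = my_enumerate("ab")
print(isinstance(g, types.GeneratorType))  # True
print(g.gi_code == my_enumerate.__code__)  # True: built from the function's code object
print(g.gi_frame.f_code == g.gi_code)      # True: the frame executes that code
# Source-derived metadata of the kind PyCode_New takes as arguments:
print(g.gi_code.co_filename, g.gi_code.co_firstlineno)
```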


Comments

Do you have to? A large number of functions are written in C, and I really really doubt one can't emulate generators in C. Whether it's pretty or useful is another question entirely ;-)
Why do functions written in C have to be new types?
@martineau I probably generalized too much. What I wanted to say is that creating a new generator in C means manually creating the bytecode for the function, which is overkill and not so "comfortable". Writing a new type with __iter__ and __next__ methods is pretty easy and provides more benefits.
@forivall You probably skipped the beginning of my answer: "Really often C code will introduce new types instead of plain functions, as in the case of enumerate".
@forivall The OP mentioned that it probably had to do with them being implemented in C. I showed why it's not a good idea to write enumerate in C as a function instead of implementing it as a new type, so it is related to the OP's doubts.

Yes, they're implemented in C. They use the C API for iterators (PEP 234), in which iterators are defined by creating new types that have the tp_iternext slot.

The functions that are created by the generator function syntax (yield) are 'magical' functions that return a special generator object. These are instances of types.GeneratorType, which you cannot manually create. If a different library that uses the C API defines its own iterator type, it won't be an instance of GeneratorType, but it'll still implement the C API iterator protocol.

Therefore, the enumerate type is a distinct type that is different from GeneratorType, and you can use it like any other type, with isinstance and such (although you shouldn't).
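A quick way to see the distinction (Python 3 sketch; gen_enumerate is a hypothetical name for the generator version from the question):

```python
import types
from collections.abc import Iterator  # Python 3.3+; Python 2 used collections.Iterator


def gen_enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1


builtin = enumerate("ab")
generated = gen_enumerate("ab")

print(type(builtin).__name__)                      # enumerate
print(isinstance(builtin, types.GeneratorType))    # False
print(isinstance(generated, types.GeneratorType))  # True
# Both satisfy the iterator protocol, which is all callers normally rely on.
print(isinstance(builtin, Iterator), isinstance(generated, Iterator))  # True True
print(list(builtin) == list(generated))            # True
```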


Unlike what Bakuriu's answer describes, enumerate isn't a generator, so no bytecode or frames are involved:

$ grep -i 'frame\|gen' Objects/enumobject.c
    PyObject_GenericGetAttr,        /* tp_getattro */
    PyType_GenericAlloc,            /* tp_alloc */
    PyObject_GenericGetAttr,        /* tp_getattro */
    PyType_GenericAlloc,            /* tp_alloc */

Instead, a new enumobject is created by the function enum_new, whose signature doesn't involve a frame:

static PyObject *
enum_new(PyTypeObject *type, PyObject *args, PyObject *kwds)

This function is placed within the tp_new slot of the PyEnum_Type struct (of type PyTypeObject). Here, we also see that the tp_iternext slot is occupied by the enum_next function, which contains straightforward C code that gets the next item of the iterator it's enumerating over, and then returns a PyObject (a tuple).
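At the Python level (Python 3 naming), the tp_new and tp_iternext slots surface as enumerate.__new__ and enumerate.__next__; in Python 2 the latter is spelled enumerate.next. The sketch below calls them explicitly only to show that mapping; normal code would just call enumerate() and next().

```python
# tp_new -> __new__: builds the enumerate object (enum_new in C).
e = enumerate.__new__(enumerate, "ab")

# tp_iternext -> __next__: each call returns an (index, item) tuple (enum_next in C).
print(enumerate.__next__(e))  # (0, 'a')
print(next(e))                # (1, 'b'); next() dispatches to the same slot

# Once the wrapped iterator is exhausted, the slot signals StopIteration.
try:
    next(e)
except StopIteration:
    print("exhausted")
```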

Moving on, PyEnum_Type is then placed into the builtin module (Python/bltinmodule.c) with the name enumerate, so that it is publicly accessible.

No bytecode needed. Pure C. Much more efficient than any pure Python or GeneratorType implementation.

Comments

I never stated that enumerate requires "bytecode" or frame objects. I stated that to create a new instance of GeneratorType requires that, and so would enumerate if it were implemented as a function returning a GeneratorType instance.
@Bakuriu And I never accused you of saying such a thing. Your answer is devoted to defining a generator in C. But nobody does that; we define custom iterator types in C.
If I wanted to turn enumerate into a function in C, I would use PyCFunction_New* and make it return a custom object, say, a PyEnum_Type. Hooray for duck typing: we don't need to care that it's not an instance of 'GeneratorType'.
My answer is devoted to answering the question, which is: 1) why enumerate is not a simple generator but a new type, and 2) whether it has to do with being written in C. The OP never asked how enumerate is actually implemented.
"The OP never asked how enumerate is actually implemented." Yup. "Why enumerate is not a simple generator but a new type" Nope. He asked why they're not "functions". I explained that generators are constructed by special functions that aren't manually created (they only make sense when written in Python with a 'yield' statement). And then you explained how to write a generator in C. How silly. So I explained how 'enumerate' is written in C, and how it doesn't need to be a generator.

The enumerate call needs to return an iterator. An iterator is an object with a specific API. The easiest way of implementing an object with a specific API is generally to, well, implement it as a class.

The reason it says "type" instead of "class" is Python 2 specific: built-in classes were called "types" in Python 2, a remnant of Python having separate types and classes before Python 2.2, where types and classes were unified. In Python 3 it therefore says class:

>>> enumerate
<class 'enumerate'>

This makes it clearer that your question "Why are some builtins types instead of functions?" has very little to do with them being implemented in C. They are types/classes because that was the best way to implement the functionality. It's that easy.

Now if we instead interpret your question as "Why is enumerate a type/class instead of a generator?" (which is a very different question), then the answer is naturally different too. The answer there is that generators are a Python shortcut for creating iterators from Python functions; they are not intended for use from C. The shortcut also pays off more for class methods than for plain functions: if you turn a method into an iterator object by hand, the iterator also has to be given the object it belongs to, whereas a plain function has no such context to carry. So the main benefit there is that you write less "scaffolding" code.
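One possible reading of that last point, sketched under the assumption that "object context" means the instance a method belongs to: a generator method gets self for free, whereas a hand-written iterator class has to be given the container and track its own position. Deck and DeckIterator are hypothetical names.

```python
class Deck(object):
    """Hypothetical container whose iteration is written as a generator method."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __iter__(self):
        # The generator's frame keeps 'self' and the loop position automatically.
        for card in self._cards:
            yield card


class DeckIterator(object):
    """The same iteration without a generator: a separate class that must be
    handed the container and keep track of its own position."""

    def __init__(self, deck):
        self._deck = deck
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._deck._cards):
            raise StopIteration
        card = self._deck._cards[self._pos]
        self._pos += 1
        return card

    next = __next__  # Python 2 spelling


deck = Deck(["A", "K", "Q"])
print(list(deck))                # ['A', 'K', 'Q']
print(list(DeckIterator(deck)))  # ['A', 'K', 'Q']
```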

Comments

I fail to see how the Python 3/Python 2 difference has anything to do with the OP's question (since he only mentioned Python 2.7). Also, "They are types/classes because that was the best way to implement the functionality" is kind of obvious; otherwise it would mean that Python devs like to waste time doing things the hard way without any advantage. The OP's question is more specific.
@Bakuriu: The point is that Python 2 calling it "types" has led people to think this has something to do with them being implemented in C, as evident from the two other answers. This is wrong. It has nothing to do with them being implemented in C. This is evident from Python 3, where they are no longer types, but classes.
@Bakuriu I clarified the question.
I believe the OP was well aware of that. In fact, note the "From this I infer that it is defined as class enumerate: ...". No C code here. Also, the OP himself admits he does not know whether they are implemented in C or not. He is asking: "Why is enumerate(sequence) an instance of enumerate and not an instance of generator? Might it be related to the fact that it is perhaps implemented in C?" At least this is what I read in the question when I answered, and I believe being implemented in C is a factor, as I've explained.
@Bakuriu What the OP is well aware of is less relevant than answering the question the OP actually posts, as SO is supposed to be generically useful.