6

I am reading Joel Grus's book Data Science from Scratch and found something a bit mysterious. In some sample code, he wrote

a = [1, 2, 3, 4]
xs = [i for i,_ in enumerate(a)]

Why would he prefer to do it this way, instead of

xs = range(len(a))
2
  • 1
    Honestly, I don't know. Range is more readable than enumerate and avoids the unnecessary generated index... Commented Apr 15, 2016 at 12:44
  • 1
    This just looks like he doesn't know what he is doing, TBH: an extra throwaway variable, and throwing away the only extra thing enumerate gets you? Commented Apr 15, 2016 at 12:59

4 Answers

18

Answer: personal preference of the author. I find

[i for i, _ in enumerate(xs)]

clearer and more readable than

list(range(len(xs)))

which feels clunky to me. (I don't like reading the nested functions.) Your mileage may vary (and apparently does!).

That said, I am pretty sure I didn't say not to do the second, I just happen to prefer the first.

Source: I am the author.

P.S. If you're the commenter who had no intention of reading anything I write about Python, I apologize if you read this answer by accident.


2 Comments

I guess the list comp with enumerate is a little clearer to read than the triply nested function call... but I still don't like it. ;)
Perhaps my saying that I have no intention of reading anything you write about Python was a bit extreme. And although I don't like your list comp, I guess it is Pythonic, since "Flat is better than nested". OTOH, "There should be one-- and preferably only one --obvious way to do it". :)
8

I looked at the code available on github and frankly, I do not see any other reason for this except the personal preference of the author.

However, the result needs to be a list in places like this:

indexes = [i for i, _ in enumerate(data)]  # create a list of indexes
random.shuffle(indexes)                    # shuffle them
for i in indexes:                          # return the data in that order
    yield data[i]

Using bare range(len(data)) in that part on Python 3 would be wrong, because random.shuffle() requires a mutable sequence as the argument, and the range objects in Python 3 are immutable sequences.
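
The point about random.shuffle() can be checked with a minimal sketch on Python 3. The sample data list and the names shuffle_error and shuffled are illustrative assumptions, not taken from the book's code:

```python
import random

data = ["a", "b", "c", "d"]  # hypothetical sample data

# A bare range is an immutable sequence, so shuffling it in place fails.
shuffle_error = None
try:
    random.shuffle(range(len(data)))
except TypeError as exc:
    shuffle_error = exc

# Materializing the indexes as a list gives shuffle something it can mutate.
indexes = list(range(len(data)))
random.shuffle(indexes)
shuffled = [data[i] for i in indexes]

print(type(shuffle_error).__name__)
print(shuffled)
```

The list comprehension over enumerate(data) would work here just as well, since it also produces a real list.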


I personally would use list(range(len(data))) on Python 3 in the case that I linked to, as it is guaranteed to be more efficient and would fail if a generator/iterator was passed in by accident, instead of a sequence.
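
A small sketch of the "fails if a generator is passed in by accident" behaviour, assuming Python 3. The helper names indexes_via_enumerate and indexes_via_range are hypothetical and only exist for this illustration:

```python
def indexes_via_enumerate(data):
    # Accepts any iterable, including one-shot generators.
    return [i for i, _ in enumerate(data)]


def indexes_via_range(data):
    # Needs len(), so it only accepts sized containers.
    return list(range(len(data)))


gen = (x * x for x in range(3))
indexes = indexes_via_enumerate(gen)  # succeeds, but silently consumes gen
leftover = list(gen)                  # nothing is left to iterate over

range_error = None
try:
    indexes_via_range(x * x for x in range(3))
except TypeError as exc:
    # Generators have no len(), so the mistake surfaces immediately.
    range_error = exc

print(indexes, leftover, type(range_error).__name__)
```

With the enumerate version, a later data[i] lookup would be the first place the mistake shows up; with the range version, it fails at the point where the indexes are built.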

1 Comment

Nice point about len raising an error (TypeError) if data isn't a valid arg for it.
2

Without being the author, I would have to guess, but my guess is that it's for Python 2 and 3 compatibility.

In Python 2:

>>> a = [1,2,3,4]
>>> xs = range(len(a))
>>> xs
[0, 1, 2, 3]
>>> type(xs)
<type 'list'>

In Python 3:

>>> a = [1,2,3,4]
>>> xs = range(len(a))
>>> xs
range(0, 4)
>>> type(xs)
<class 'range'>

Now, that doesn't make a difference when you're directly iterating over the range, but if you're planning to use the index list for something else later on, the author may feel that the enumerate version is simpler to understand than list(range(len(a))).

2 Comments

If the author feels that the enumerate is simpler to understand than list(range(len(a))) I have no intention of reading anything he writes about Python! Sure, list(range(len(a))) is slightly inefficient in Python 2, but both those calls run at C speed so it's still pretty fast, and for large len(a) it will be much faster than the Python speed loop in a list comp using enumerate (or range or xrange).
I've not read the book, but it may also be a poor choice of syllabus ordering, too - if the book hasn't introduced the range function, but has introduced enumerate, that might be a reason. Mind you, even if that's true, it's better to introduce range.
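
The speed claim in the comments above can be checked with a rough timing sketch. The list size and repeat count are arbitrary assumptions, and the numbers will vary by machine and Python version, so no expected output is shown:

```python
import timeit

a = list(range(100_000))  # arbitrary test size


def via_enumerate():
    # Python-level loop: one iteration of the comprehension per element.
    return [i for i, _ in enumerate(a)]


def via_range():
    # Both calls are implemented in C in CPython.
    return list(range(len(a)))


t_enumerate = timeit.timeit(via_enumerate, number=10)
t_range = timeit.timeit(via_range, number=10)

print(f"enumerate comprehension: {t_enumerate:.4f} s")
print(f"list(range(len(a))):     {t_range:.4f} s")
```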
-2

Both are OK. When I started coding in Python I leaned more toward list(range(len(a))). Now I write it in a more Pythonic way. Both are readable.

7 Comments

Sure, range(len(a)) can often be a symptom of un-Pythonic code. And one should often use enumerate instead. But in this case, list(range(len(a))) is more Pythonic than that list comp in the OP.
I don't think range(len(a)) is unpythonic. And I really like the list comprehension solution.
I didn't say that range(len(a)) is unpythonic, I said it can often be a symptom of un-Pythonic code. That's because it's often used to indirectly iterate over a list, IOW, to iterate via the index, rather than to iterate directly over the list items. Generally, it's better to iterate directly, and if you also need the index then you should use enumerate. But to use enumerate simply to get the index when you don't want the list items is just plain weird, IMHO.
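
The three cases described in the comment above can be sketched side by side. The items list and the variable names are made up for illustration:

```python
items = ["x", "y", "z"]  # hypothetical sample list

# Only the items are needed: iterate directly over the list.
upper = [item.upper() for item in items]

# Both index and item are needed: enumerate supplies the pair.
labelled = [f"{i}:{item}" for i, item in enumerate(items)]

# Only the indexes are needed: range(len(...)) says so directly.
indexes = list(range(len(items)))

print(upper, labelled, indexes)
```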
(cont) Antti Happala's answer explains why list(range(len(data))) is better here (or just range(len(data)) on Python 2 if you don't care about Python 3 compatibility).
"I personally would use list(range(len(data)))" is not an explanation, but a personal choice. I personally find [index for index, _ in enumerate(lst)] really easy to read, and I don't claim it is the only way to do it. reason (still personal preference):