The order of nested list comprehension and nested generator expression in Python

Question

I'm new to Python and am confused by a piece of code in Python's official documentation.

unique_words = set(word  for line in page  for word in line.split())

To me, it looks equivalent to:

unique_words=set()
for word in line.split():
    for line in page:
        unique_words.add(word)

How can line be used in the first loop before it's defined in the nested loop? Yet it actually works. I think this suggests that the for clauses in a nested list comprehension or generator expression are read from left to right, which contradicts my previous understanding.

Can anyone clarify the correct order for me?

You've got the loops backwards. The for line in page part should be the outer loop.
– APerson, Commented Nov 5, 2014 at 14:15
If you think your nested loop is equivalent, you need to explain where line in the outer loop is defined. The order in a nested generator expression is the same as any nested loop.
– chepner, Commented Nov 5, 2014 at 14:21

ni8mr · Accepted Answer · 2014-11-05 15:40:43Z

8

word for line in page for word in line.split()

This part works like this:

for line in page:
    for word in line.split():
        print(word)

The surrounding parentheses make it a generator expression, so the overall statement works like this generator function:

def solve():
    for line in page:
        for word in line.split():
            yield word

set() is then used to eliminate repeated words, since the code is meant to collect 'unique words'.
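The equivalence described above can be checked with a small sketch. The contents of `page` are made up for illustration (the question only implies that it is an iterable of text lines), `iter_words` is a hypothetical name for the generator function, and Python 3 syntax is assumed.

```python
# Hypothetical sample data: `page` is assumed to be an iterable of text lines.
page = ["the quick brown fox", "the lazy dog", "quick quick fox"]

# Generator expression from the question. The for clauses are read left to
# right, so `for line in page` is the outer loop.
unique_from_genexp = set(word for line in page for word in line.split())


def iter_words(lines):
    """Generator function that unrolls the generator expression."""
    for line in lines:               # outer loop (leftmost for clause)
        for word in line.split():    # inner loop (next for clause)
            yield word


unique_from_function = set(iter_words(page))

print(unique_from_genexp == unique_from_function)  # True
print(sorted(unique_from_genexp))
# ['brown', 'dog', 'fox', 'lazy', 'quick', 'the']
```

Both forms visit the words in the same order; wrapping either one in set() only removes the repeats.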

edited Nov 5, 2014 at 15:40

ni8mr

answered Nov 5, 2014 at 14:20

Vishnu Upadhyay

4 Comments

user3378649 Over a year ago

Great answer; I'd add that a set is used to remove duplicates.

BreakBadSP Over a year ago

(word for line in page for word in line.split())

AlMa1r Over a year ago

What do the characters ` and - in `generator function hence overall statement work lie this:- mean?

Parviz Oct 23 at 13:38

Well answered! Clear and concise! Thank you.

Vincent Beltman · Answer · 2014-11-05 14:19:28Z

2

You got the loops wrong. Use this:

unique_words = set(word for line in page for word in line.split())
print unique_words

l = []
for line in page:
    for word in line.split():
        l.append(word)
print set(l)

output:

C:\...>python test.py
set(['sdaf', 'sadfa', 'sfsf', 'fsdf', 'fa', 'sdf', 'asd', 'asdf'])
set(['sdaf', 'sadfa', 'sfsf', 'fsdf', 'fa', 'sdf', 'asd', 'asdf'])

answered Nov 5, 2014 at 14:19

Vincent Beltman

2 Comments

user3378649 Over a year ago

He's right! l should be a set, not a list. It's a way to remove duplicate values.

Vincent Beltman Over a year ago

Please explain the downvote, so I can enhance my answer

John Y · Answer · 2014-11-05 14:38:05Z

2

From the tutorial in the official documentation:

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
and it’s equivalent to:
>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Note how the order of the for and if statements is the same in both these snippets.

See the last sentence quoted above.

Also note that the construct you're describing is not (officially) called a "nested list comprehension". A nested list comprehension entails a list comprehension which is within another list comprehension, such as (again from the tutorial):

[[row[i] for row in matrix] for i in range(4)]

The thing you're asking about is simply a list comprehension with multiple for clauses.
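The distinction can be seen side by side in a short sketch. The sample `matrix` below is a small 3x4 list of lists chosen for illustration, and the names `transposed` and `flattened` are hypothetical.

```python
# Illustrative 3x4 matrix (three rows, four columns).
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Nested list comprehension: the inner comprehension is the *expression* of
# the outer one, so the result is a list of lists (here, the transpose).
transposed = [[row[i] for row in matrix] for i in range(4)]

# One comprehension with two for clauses: a single flat list, with the
# leftmost for clause acting as the outer loop.
flattened = [value for row in matrix for value in row]

print(transposed)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
print(flattened)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```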

edited Nov 5, 2014 at 14:38

answered Nov 5, 2014 at 14:32

John Y

4 Comments

Eric Duminil Over a year ago

Note that the first example doesn't answer the question: x and y are independent and can be swapped, which isn't the case in the OP's example.

John Y Over a year ago

@EricDuminil - it does answer the question. OP wanted to know the correct order for parsing the multiple for clauses in a comprehension. Actually, OP already deduced the correct order from observing the behavior, but wanted confirmation. What better confirmation than official documentation? Whether x and y are independent is irrelevant. The relevant part is unrolling the comprehension to its equivalent nested-loop form, which is incidentally precisely what the accepted and top-voted answer does (except that answer doesn't cite any references to justify it).

Eric Duminil Over a year ago

I still think it's a poor choice of an example (in the documentation, not your answer) because x and y could be swapped. It doesn't really cover the OP's case where word is in line and line is in page.

John Y Over a year ago

@EricDuminil - I understand what you are saying, but the point is that it does cover the OP's case, because what matters is the order of the loops. Notice that while x and y are independent, they are not equal. So if you swap them, you get different results. You seem to be saying that in OP's example, getting the order wrong breaks the program. Sure, but so does getting it wrong in the tutorial example if it happens to be in a program which can't handle tuples whose leftmost element is 4. If you understand the tutorial example, you understand how to parse OP's code snippet.
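Both sides of this exchange can be illustrated with one sketch, assuming Python 3. The `page` contents and the function name `swapped_dependent` are made up for the example. With independent clauses, swapping the for clauses only changes the order of the results; with dependent clauses, as in the question, the swapped form fails because the leftmost clause uses a name that has not been bound yet.

```python
# Independent clauses (the tutorial example): swapping changes only the order.
pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
swapped_pairs = [(x, y) for y in [3, 1, 4] for x in [1, 2, 3] if x != y]

print(pairs)
# [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
print(swapped_pairs)
# [(1, 3), (2, 3), (2, 1), (3, 1), (1, 4), (2, 4), (3, 4)]

# Dependent clauses (the question's example): `word` depends on `line`.
page = ["a b", "b c"]  # hypothetical sample data


def swapped_dependent():
    # The leftmost clause needs `line`, but no clause to its left binds it.
    return set(word for word in line.split() for line in page)


try:
    swapped_dependent()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(error_name)  # NameError
```

This assumes no variable named `line` already exists in the enclosing scope; if one did, the swapped form would run silently against that stale value instead of raising.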

Yair Daon · Answer · 2014-11-05 14:20:46Z

0

You have the nested loops mixed up. What the code does is:

unique_words = set()
for line in page:
    for word in line.split():
        unique_words.add(word)

answered Nov 5, 2014 at 14:20

Yair Daon

Ahmad Shapiro · Answer · 2023-05-23 16:24:22Z

0

for outer_val in outer_loop :
    for inner_val in inner_loop:
        do_something()

Translates to [do_something() for outer_val in outer_loop for inner_val in inner_loop]

[ op <outer_loop> <inner_loop> ]
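One way to check the clause order is to record the visit order of both forms and compare them. The sketch below assumes Python 3; the values in `outer_loop` and `inner_loop` are placeholders, and a tuple stands in for do_something().

```python
# Placeholder iterables, chosen only to make the visit order visible.
outer_loop = ["a", "b"]
inner_loop = [1, 2, 3]

# Explicit nested loops.
visited_by_loops = []
for outer_val in outer_loop:
    for inner_val in inner_loop:
        visited_by_loops.append((outer_val, inner_val))

# Comprehension with the for clauses in the same top-to-bottom order.
visited_by_comprehension = [
    (outer_val, inner_val)
    for outer_val in outer_loop
    for inner_val in inner_loop
]

print(visited_by_loops == visited_by_comprehension)  # True
print(visited_by_comprehension)
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
```

The outer loop's clause comes first in the comprehension, and the inner value changes fastest in both forms.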

answered May 23, 2023 at 16:24

Ahmad Shapiro

Community · Answer · 2017-05-23 12:33:51Z

-2

In addition to the correct answers that stress the loop order, I would add that set is used to remove duplicate words across the lines, which is what makes them "unique words". Check this and this thread.

unique_words = set(word for line in page for word in line.split())
print unique_words

l = {}
for line in page:
    for word in line.split():
        l.add(word)
print l

edited May 23, 2017 at 12:33

CommunityBot

answered Nov 5, 2014 at 14:26

user3378649

1 Comment

Mad Physicist Over a year ago

{} does not create an empty set.
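A short sketch of what this comment means, assuming Python 3 (the variable names are illustrative): empty braces produce a dict, which has no add() method, so a loop that starts from {} and calls add() fails on the first word.

```python
empty_braces = {}    # an empty dict, not a set
empty_set = set()    # the way to spell an empty set

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# dict has no add() method, so this raises AttributeError.
try:
    empty_braces.add("word")
    add_failed = False
except AttributeError:
    add_failed = True

print(add_failed)  # True

# The same call works on a real set.
empty_set.add("word")
print(empty_set)  # {'word'}
```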
