
I am new to PySpark and I am trying to understand how we can write multiple nested for loops in PySpark; a rough, high-level example is below. Any help will be appreciated.

for (i = 0; i < 10; i++)
    for (j = 0; j < 10; j++)
        for (k = 0; k < 10; k++) {
            print "i"."j"."k"
        }

1 Answer


In a non-distributed setting, for-loops are rewritten using the foreach combinator, but because of Spark's distributed nature, map and flatMap are a better choice:

from __future__ import print_function  # lets print be used inside a lambda (Python 2)

# expand each element x into the pairs (x, 0) .. (x, 9)
a_loop = lambda x: ((x, y) for y in xrange(10))
# tuple parameter unpacking in the signature -- Python 2 only
print_me = lambda ((x, y), z): print("{0}.{1}.{2}".format(x, y, z))

(sc.
    parallelize(xrange(10)).
    flatMap(a_loop).
    flatMap(a_loop).
    foreach(print_me))

Or using itertools.product:

from itertools import product

# build all (i, j, k) triples on the driver, then distribute them
sc.parallelize(product(xrange(10), repeat=3)).foreach(print)
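The same Cartesian product can also be expressed with Spark's RDD.cartesian method. A minimal sketch, assuming the same SparkContext sc and the print_me helper defined above:

rng = sc.parallelize(xrange(10))

# cartesian pairs up elements of two RDDs, so chaining it twice yields ((x, y), z)
rng.cartesian(rng).cartesian(rng).foreach(print_me)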

1 Comment

@Ali print in a lambda works just fine. Tuple parameter unpacking doesn't work in Python 3, but the OP explicitly uses Python 2.
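For what it's worth, on Python 3 (where xrange is gone and tuple parameter unpacking was removed) the same pipeline could be sketched roughly as follows, unpacking the nested tuple inside an ordinary function rather than in the lambda signature:

a_loop = lambda x: ((x, y) for y in range(10))

def print_me(pair):
    # manual unpacking replaces the Python 2 lambda ((x, y), z) signature
    (x, y), z = pair
    print("{0}.{1}.{2}".format(x, y, z))

(sc.
    parallelize(range(10)).
    flatMap(a_loop).
    flatMap(a_loop).
    foreach(print_me))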
