from pyspark.sql import HiveContext

hive_context = HiveContext(sc)
test = hive_context.table("dbname.tablename")
iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
for ite in iterate.collect():
    v = ite.map(lambda p: p.metric_id)
    print(v)

The above code gives an error in the for loop. How can I print a single column without changing the mapping above? Further on, I would like to write code like:

for ite in iterate.collect():
    for ite11 in secondtable.collect():
        if ite.metric_id.find(ite11.column1):
            result.append((ite, ite11))

Could anyone help with this?

2 Answers


Reason for the error when running:

for ite in iterate.collect():
    v = ite.map(lambda p: p.metric_id)

The result of iterate.collect() is not an RDD; it is a plain Python list of the collected rows.

map can be executed on an RDD, but not on a Python list.
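
As a minimal sketch of the distinction (reusing the tuple mapping from the question), a plain list comprehension works on the collected list where map does not:

# iterate is an RDD; collect() turns it into an ordinary Python list
rows = iterate.collect()

# list comprehensions work on lists; each row is the
# (survey_date, pro_catg, metric_id) tuple built by the map above
metric_ids = [row[2] for row in rows]
print(metric_ids)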

In general, collect() is not recommended in Spark: it pulls the whole dataset back to the driver.

The following should perform a similar operation without the error:

iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
v = iterate.map(lambda row: row[2])  # row[2] is metric_id; Python 3 removed tuple-unpacking lambdas
print(v.collect())
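
Alternatively, since hive_context.table() returns a DataFrame, a single column can be selected directly and only that column collected. This is a sketch assuming metric_id is a column of dbname.tablename, as the question's mapping implies:

# select just the metric_id column and collect it as Row objects
metric_rows = test.select("metric_id").collect()
print([row.metric_id for row in metric_rows])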


Finally, I found one more solution for mapping a single column value in a for loop:

result = []
for ite in iterate.collect():
    for itp in prod.collect():
        if itp[0] in ite[1]:
            result.append((ite, itp))
print(result)

It works fine. Instead of in we can use find(), but note that find() returns -1 (not False) when no match is found, so the result has to be compared explicitly:

if ite[1].find(itp[0]) != -1:
    result.append((ite, itp))
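
Note that nested collect() loops pull both tables to the driver and compare them row by row there. A Spark-side sketch of the same substring match is a join on a contains condition (the column names are guesses from the question; prod's first column is assumed to be named column1):

# join condition: prod.column1 appears as a substring of test.pro_catg,
# mirroring "itp[0] in ite[1]" from the loop above
joined = test.join(prod, test["pro_catg"].contains(prod["column1"]))
result = joined.collect()
print(result)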

