from pyspark.sql import HiveContext

hive_context = HiveContext(sc)
test = hive_context.table("dbname.tablename")
iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
for ite in iterate.collect():
    v = ite.map(lambda p: p.metric_id)
    print(v)

The above code gives an error in the for loop. How can I print a single column without changing the mapping above? Further on, I would like to write code like:

for ite in iterate.collect():
    for ite11 in secondtable.collect():
        if ite.metric_id.find(ite11.column1):
            result.append((ite, ite11))

Could anyone help with this?

2 Answers


Reason for the error when running:

for ite in iterate.collect():
    v = ite.map(lambda p: p.metric_id)

The result of iterate.collect() is not an RDD; it is a plain Python list of the collected rows.

map can be executed on an RDD, but not on a Python list.
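
As a minimal sketch of the distinction (reusing the tuple mapping from the question), a plain list comprehension works on the collected list where map does not:

# iterate is an RDD; collect() turns it into an ordinary Python list
rows = iterate.collect()

# list comprehensions work on lists; each row is the
# (survey_date, pro_catg, metric_id) tuple built by the map above
metric_ids = [row[2] for row in rows]
print(metric_ids)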

In general, collect() is not recommended in Spark: it pulls the whole dataset back to the driver.

The following should perform a similar operation without the error:

iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
v = iterate.map(lambda row: row[2])  # row[2] is metric_id; Python 3 removed tuple-unpacking lambdas
print(v.collect())
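
Alternatively, since hive_context.table() returns a DataFrame, a single column can be selected directly and only that column collected. This is a sketch assuming metric_id is a column of dbname.tablename, as the question's mapping implies:

# select just the metric_id column and collect it as Row objects
metric_rows = test.select("metric_id").collect()
print([row.metric_id for row in metric_rows])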


Finally, I found one more solution for mapping a single column value in a for loop:

result = []
for ite in iterate.collect():
    for itp in prod.collect():
        if itp[0] in ite[1]:
            result.append((ite, itp))
print(result)

It works fine. Instead of in we can use find(), but note that find() returns -1 (not False) when no match is found, so the result has to be compared explicitly:

if ite[1].find(itp[0]) != -1:
    result.append((ite, itp))
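
Note that nested collect() loops pull both tables to the driver and compare them row by row there. A Spark-side sketch of the same substring match is a join on a contains condition (the column names are guesses from the question; prod's first column is assumed to be named column1):

# join condition: prod.column1 appears as a substring of test.pro_catg,
# mirroring "itp[0] in ite[1]" from the loop above
joined = test.join(prod, test["pro_catg"].contains(prod["column1"]))
result = joined.collect()
print(result)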

