How can I access a specific column from a Spark DataFrame in Python?

Question

My DataFrame looks like this:

+------+-------+
|cat_id|counter|
+------+-------+
|    12|  61060|
|     1| 542118|
|    13| 164700|
|     3| 406622|
|     5|  54902|
|    10| 118281|
|    11|  13658|
|    14|  72229|
|     2| 131206|
+------+-------+

The query that produces this DataFrame is:

from pyspark.sql.functions import count
grouped_data = dataframe.groupBy("cat_id").agg(count("*").alias("counter"))

Now I need to read the counter for each cat_id so that I can save it in another database.

One way I can do this is with a for loop over my IDs:

for cat_id in cat_ids_map:
    # Runs one Spark job per cat_id: keep the matching row, then read its counter
    statsCount = grouped_data.filter(f"cat_id = {cat_id}").select("counter").collect()[0].counter

But I think there must be a better way to read the counters without a for loop. Any suggestions would be helpful!
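One way to avoid the per-ID lookups, as a minimal sketch: collect the aggregated result once and index it by cat_id on the driver. This assumes the grouped result is small enough to fit in driver memory (one row per category, as in the table above). The helper name counters_by_cat_id is hypothetical, not part of the original code.

```python
def counters_by_cat_id(grouped_data):
    """Collect the aggregated DataFrame once and index it by cat_id.

    Assumes grouped_data has the columns "cat_id" and "counter" and is
    small enough to bring back to the driver in a single collect().
    """
    return {row["cat_id"]: row["counter"] for row in grouped_data.collect()}


# Usage, with the names from the question:
#   counters = counters_by_cat_id(grouped_data)
#   for cat_id in cat_ids_map:
#       statsCount = counters.get(cat_id)  # None if the ID has no rows
```

The dictionary keys keep the type of the cat_id column, so if the IDs in cat_ids_map are strings while the column is numeric, they would need converting before the lookup.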

Thanks

What is your target database? You can write your DataFrame almost anywhere.
– Steven, commented Aug 11, 2020 at 22:23
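As a rough sketch of what writing the DataFrame directly could look like, assuming the target database is reachable over JDBC (the question does not say which database it is): the helper write_counters and every connection value shown in the usage comment are hypothetical placeholders.

```python
def write_counters(grouped_data, jdbc_url, table, properties):
    """Append the aggregated rows to a table through Spark's JDBC writer.

    Assumption: the matching JDBC driver jar is available to Spark.
    """
    grouped_data.write.jdbc(
        url=jdbc_url,
        table=table,
        mode="append",          # add rows without replacing existing ones
        properties=properties,  # e.g. user, password, driver class
    )


# Usage (all values below are made-up placeholders):
#   write_counters(
#       grouped_data,
#       "jdbc:postgresql://db-host:5432/stats",
#       "cat_counters",
#       {"user": "...", "password": "...", "driver": "org.postgresql.Driver"},
#   )
```

With this approach Spark writes the rows from the executors, so the counters never have to be read one by one on the driver.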

Matej Murin · Accepted Answer · 2020-08-11 22:26:59Z

2

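If you need to iterate over the entire DataFrame, the usual way to do it is with the .foreach method.

So you would do:

grouped_data.foreach(f)

where f is your function that does whatever you want with each row of the DataFrame. Note that foreach runs f on the executors and discards its return value, so f has to do its work through side effects, such as writing to your database.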
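A minimal sketch of what such an f might look like for the DataFrame in the question. The helper to_record is hypothetical, and print is only a stand-in for a real write to the target database, which the question does not specify.

```python
def to_record(row):
    """Turn one row of grouped_data into a plain dict ready to be saved."""
    return {"cat_id": row["cat_id"], "counter": row["counter"]}


def f(row):
    """Handle a single row; Spark calls this once per row on an executor."""
    record = to_record(row)
    # Stand-in for the real save: replace print with a write to your database.
    print(record)


# Usage, with the DataFrame from the question:
#   grouped_data.foreach(f)
```

Because f runs on the executors, anything it prints ends up in the executor output rather than on the driver when running on a cluster, and any database connection has to be opened inside the function rather than on the driver.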

