My DataFrame looks like this:

+------+-------+
|cat_id|counter|
+------+-------+
|    12|  61060|
|     1| 542118|
|    13| 164700|
|     3| 406622|
|     5|  54902|
|    10| 118281|
|    11|  13658|
|    14|  72229|
|     2| 131206|
+------+-------+

The query that produces the above DataFrame is:

from pyspark.sql.functions import count
grouped_data = dataframe.groupBy("cat_id").agg(count("*").alias("counter"))

Now I need to read the counter for each cat_id so that I can save it in another database.

The way I can get it done is by using a for loop over my IDs:

for cat_id in cat_ids_map:
    # filter first, then project; str() keeps the concatenation valid for integer ids
    statsCount = grouped_data.filter("cat_id = " + str(cat_id)).select("counter").collect()[0].counter

But I think there must be a better way to read the counter without a for loop. Any suggestions would be helpful!

Thanks

  • What is your target database? You can write your DataFrame almost anywhere. Commented Aug 11, 2020 at 22:23
  • I need to pass it to InfluxDB for logging purposes. Commented Aug 12, 2020 at 2:02

1 Answer

If you need to iterate over every row of the DataFrame, the usual way to do it is with the .foreach method.

So you would do:

grouped_data.foreach(f)

where f is your function; it is called once per Row and can do whatever you want with it. Note that f runs on the Spark executors, not on the driver, and that foreach itself returns nothing.
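
If the values are needed on the driver instead (for example, to hand them to a logging client there), one alternative is to collect the grouped result once and index it by cat_id, which avoids running a separate Spark job per ID. This is only a sketch: it assumes the grouped result is small (one row per category, as in the question), and the helper name counts_by_cat_id and the namedtuple stand-in for collected rows are hypothetical, introduced here for illustration.

```python
from collections import namedtuple


def counts_by_cat_id(rows):
    """Build a {cat_id: counter} dict from an iterable of row-like objects.

    Each row only needs .cat_id and .counter attributes, which is what the
    Row objects returned by grouped_data.collect() provide for these columns.
    """
    return {row.cat_id: row.counter for row in rows}


# Stand-in for grouped_data.collect() so the sketch runs without a Spark session.
Row = namedtuple("Row", ["cat_id", "counter"])
sample_rows = [Row(12, 61060), Row(1, 542118), Row(13, 164700)]

counts = counts_by_cat_id(sample_rows)

for cat_id in (1, 12, 99):
    # .get() returns a default for ids that have no rows, whereas
    # collect()[0] on an empty filter result raises IndexError.
    print(cat_id, counts.get(cat_id, 0))
```

With a real DataFrame the call would be counts = counts_by_cat_id(grouped_data.collect()). This pulls the whole result to the driver, so it is only reasonable when the number of categories is small.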
