
I have a dataset and was asked to write PySpark code for the following question.

List the winners of each World Champions Trophy.
Hint: a player's score/result is the total of that player's results across all rounds of the tournament.
Result attributes: winner, tournament_name

I wrote this code:

game_info = spark.read.load("/content/chess/chess_wc_history_game_info.csv",
                     format="csv", sep=",", inferSchema="true", header="true")

game_info.groupBy('winner').show()

But when I ran it, I got this error:

AttributeError: 'GroupedData' object has no attribute 'show'
  • The error is self-explanatory: the object returned by df.groupBy() has no show() method. An aggregation such as agg() has to follow it. Commented Aug 9, 2022 at 10:17
  • What do you want to achieve? What are you trying to compute? Commented Aug 9, 2022 at 10:19
  • This is what I have to find: the list of winners of each World Champions Trophy. Hint: a player's score/result is the total of that player's results across all rounds of the tournament. Result attributes: winner, tournament_name. Commented Aug 10, 2022 at 17:50

2 Answers


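This error occurs because groupBy() returns a GroupedData object rather than a DataFrame, and GroupedData has no show() method. It provides aggregation methods such as the following: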

  • count() - Returns the number of rows in each group.
  • mean() - Returns the mean of the values in each group.
  • max() - Returns the maximum value in each group.
  • min() - Returns the minimum value in each group.
  • sum() - Returns the sum of the values in each group.
  • avg() - Returns the average of the values in each group (same as mean()).
  • agg() - Computes one or more aggregations at a time.
  • pivot() - Pivots a column of the DataFrame; it must be followed by an aggregation.

1 Comment

So for this question, do I need to apply the max() function?

I want to add another useful function to @numb's list:

collect_list (used inside agg()) - Collects all the values of a specific column for each group.

I think this helps you "see" the groups.

Side note: passing truncate=False to show() prints the table without truncating long text, so you can actually see all the values.

from pyspark.sql.functions import collect_list

game_info.groupBy('winner').agg(collect_list("<column you want to fetch>").alias('group_values')).show(truncate=False)
