0

Can someone Please help me how should i Implemnet below sql in Pyspark Dataframe.

    (SUM(Cash) /SUM(cash + credit)) * 100 AS Percentage,
        
    df1=df.withColumn("cash_credit",sf.col("cash") + sf.col("credit")) 
    df1.show(5)

    -------------+---------------+ +--------+-------+------+------| 
    Credit        |Cash   |       MTH|YR           |  cash_credit | 
    -------------+---------------+ -------+--------|--------------|
     100.00|       400.00|         10|       2019  |  500.00      | 
     0.00  |       500.00|         6 |       2019  |  500.00      |  
     200.00|       600.00|         12|       2018  |  800.00      | 
     0.00  |       0.00  |         10|       2019  |  0.00        | 
     300.00|       700.00|          7|       2019  |  1000.00     | 
    -------------+---------------+----------+--------+-------+--- | 

I have tried below Pyspark Code.

    df2 = df1.groupBy('MTH', 'YR').agg(sf.sum("Cash").alias("sum_Cash"))\
             .withColumn("final_column",sf.col("sum_Cash") + sf.col("cash_credit"))\
             .withColumn("div",sf.col("sum_Cash")/sf.col("final_column"))\
             .withColumn("Percentage",sf.col("div")*100)

But not able to execute it. It's showing below error.

    cannot resolve '`cash_credit`' given input columns: [MTH, YR, sum_Cash];;
3
  • 1
    Your LOGIC is wrong. When you groupBy('MTH', 'YR').agg(sf.sum("Cash").alias("sum_Cash")) you will have only MTH, YR, sum_Cash . Commented Oct 25, 2020 at 8:25
  • @Sanket9394 Thanks a lot.could you Please help me how should i Rewrite this. Commented Oct 25, 2020 at 8:42
  • 1
    You need to pass this - cash_credit Inside agg function else it will not available to access subsequent steps..:) Commented Oct 25, 2020 at 9:58

1 Answer 1

1

You can modify it like this to bring cash_credit out of groupby-aggregation:

df2 = df1.groupBy('MTH', 'YR').agg(sf.sum("Cash").alias("sum_Cash"),sf.sum("cash_credit").alias("cash_credit"))\
         .withColumn("final_column",sf.col("sum_Cash") + sf.col("cash_credit"))\
         .withColumn("div",sf.col("sum_Cash")/sf.col("final_column"))\
         .withColumn("Percentage",sf.col("div")*100)

I uses sum aggregation for 'cash_credit' but you can use other aggregation functions.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.