I have a PySpark DataFrame with this structure:

+----+----+----+----+----+
|user| A/B|   C| A/B|   C|
+----+----+----+----+----+
|  1 |   0|   1|   1|   2|
|  2 |   0|   2|   4|   0|
+----+----+----+----+----+

I originally had two dataframes, which I outer joined using user as the key, so there can also be null values. I can't find a way to sum the columns that share the same name in order to get a dataframe like this:

+----+----+----+
|user| A/B|   C| 
+----+----+----+
|  1 |   1|   3| 
|  2 |   4|   2| 
+----+----+----+

Also note that there could be many such duplicated columns, so listing each column by hand is not an option. In pandas this was possible by using "user" as the index and then adding both dataframes. How can I do this in Spark?
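
For reference, the pandas behaviour mentioned above might look like the following minimal sketch. The frames `df1` and `df2` are hypothetical reconstructions of the two original dataframes, filled with the sample values from the tables; `fill_value=0` is an assumption about how missing entries should be treated.

```python
import pandas as pd

# Hypothetical inputs: the two frames before the join, using the sample values
df1 = pd.DataFrame({"user": [1, 2], "A/B": [0, 0], "C": [1, 2]})
df2 = pd.DataFrame({"user": [1, 2], "A/B": [1, 4], "C": [2, 0]})

# Index both frames by "user" so the addition aligns rows by user and
# columns by name; fill_value=0 (an assumption) treats a missing entry as 0
summed = (
    df1.set_index("user")
    .add(df2.set_index("user"), fill_value=0)
    .reset_index()
)
print(summed)
```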

1 Answer

I have a workaround for this:

// Suffix every column of df1 except the join key "user" with "_1"
val dataFrameOneColumns = df1.columns.map(a => if (a == "user") a else a + "_1")
val updatedDF = df1.toDF(dataFrameOneColumns: _*)

Now do the join; the output will then contain the value columns under different names.

Then build the list of tuples (pairs of column names) to be combined:

// Pair each original column name with its "_1" counterpart, skipping the key "user"
val newlist = df1.columns.filter(_ != "user").zip(dataFrameOneColumns.filter(_ != "user"))

Then combine the values of the two columns within each tuple to get the desired output.

PS: I am guessing you can write the combining logic yourself, so I am not spoon-feeding it!

2 Comments

I got the first step, renaming the first dataframe's columns with _1 as a suffix, but you lost me a little in the second step. Could you please rewrite it in Python (PySpark)? You wrote it in Scala.
Sorry, I am not that familiar with Python, but I can explain the concept: I am just making tuples of the matching column names and then applying a sum to the two values of each tuple to get the output.
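
The pairing-and-summing idea from the answer could be sketched in PySpark roughly as follows. This is not the answerer's code: the helper names `pair_columns` and `sum_pairs` are hypothetical, both input frames are assumed to share the same non-key columns, and nulls produced by the outer join are assumed to count as 0.

```python
def pair_columns(columns, key="user", suffix="_1"):
    """Return (original, suffixed) name pairs for every non-key column."""
    return [(c, c + suffix) for c in columns if c != key]


def sum_pairs(df1, df2, key="user", suffix="_1"):
    """Outer join df1 and df2 on `key` and sum the columns with equal names."""
    # Imported here so pair_columns stays usable without Spark installed
    from pyspark.sql import functions as F

    pairs = pair_columns(df1.columns, key, suffix)

    # Step 1: rename every non-key column of df1 with the suffix
    renamed = df1.toDF(*[c if c == key else c + suffix for c in df1.columns])

    # Step 2: outer join on the key; joining by name keeps one key column
    joined = renamed.join(df2, on=key, how="outer")

    # Step 3: add each pair, replacing nulls with 0 (an assumption)
    summed = [
        (F.coalesce(F.col(new), F.lit(0)) + F.coalesce(F.col(orig), F.lit(0))).alias(orig)
        for orig, new in pairs
    ]
    return joined.select(F.col(key), *summed)


# The pairing step alone, on the column names from the question
print(pair_columns(["user", "A/B", "C"]))
```

Calling `sum_pairs(df1, df2)` on the two dataframes before they are joined should give one summed column per shared name, however many such columns there are. If a null should stay null when one side is missing, the `coalesce` calls can be dropped.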
