3

I have two dataframes, one simply with some unique ids with associated names like so:

Id name
0 name_a
1 name_b
2 name_c

Second dataframe contains the ids from the first dataframe stored in an array, in each row:

Row_1 row_2
0 [0,2]
1 [1,0]

My question is it possible to replace the arrays from the second dataframe so it checks the names from the first df based on the ids, so:

Row_1 row_2
0 [name_a, name_c]
1 [name_b, name_a]

It seems too time consuming to create a map of the first df and just add it to the second df with an udf. Any help is much appreciated on how to approach this.

2 Answers 2

2

Join using array_contains function + groupby and collect_list:

from pyspark.sql import functions as F

df1 = spark.createDataFrame([(0, "name_a"), (1, "name_b"), (2, "name_c")], ["Id", "name"])
df2 = spark.createDataFrame([(0, [0, 2]), (1, [1, 0])], ["Row_1", "Row_2"])

result = df2.join(
    df1, on=F.array_contains("Row_2", F.col("Id")), how="left"
).groupBy("Row_1").agg(
    F.collect_list("name").alias("Row_2")
)

result.show()
#+-----+----------------+
#|Row_1|           Row_2|
#+-----+----------------+
#|    0|[name_a, name_c]|
#|    1|[name_a, name_b]|
#+-----+----------------+
Sign up to request clarification or add additional context in comments.

Comments

0

You can try using explode function to convert array into rows, then join the data with the initial data frame, in the last step do a group by & .agg(collect_list())

from pyspark.sql.functions import explode

df3 = df2.select(df2.row_1,explode(df2.row_2))
df4 = df3.join(df1,df3.row_1==df1.Id).select(df3.row_1,df1.name)
df5 = df4.groupBy('row_1').agg(collect_list('name').alias('name'))

Reference links:

  1. https://sparkbyexamples.com/pyspark/pyspark-explode-array-and-map-columns-to-rows/#:~:text=explode%20%E2%80%93%20PySpark%20explode%20array%20or,it%20contains%20all%20array%20elements.
  2. https://www.owenrumney.co.uk/pyspark-opposite-of-explode/

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.