I have the following PySpark dataframe (first_df):

id  cat                   dog           bird
0   ["persan", "sphynx"]  []            ["strisores"]
1   []                    ["bulldog"]   ["columbaves", "gruiformes"]
2   ["ragdoll"]           ["labrador"]  []

And I would like to explode multiple columns at once, keeping the old column names in a new column, such as:

id  animal      animal_type
0   persan      cat
0   sphynx      cat
0   strisores   bird
1   bulldog     dog
1   columbaves  bird
1   gruiformes  bird
2   ragdoll     cat
2   labrador    dog

So far, my current solution is the following:

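from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

animal_types = ['cat', 'dog', 'bird']

# Start from an empty dataframe with the target schema
df = spark.createDataFrame([], schema=StructType([
    StructField('id', StringType()),
    StructField('animal', StringType()),
    StructField('animal_type', StringType())
]))

# Explode one column at a time and union the partial results
for animal_type in animal_types:
  df = first_df \
    .select('id', animal_type) \
    .withColumn('animal', F.explode(animal_type)) \
    .drop(animal_type) \
    .withColumn('animal_type', F.lit(animal_type)) \
    .union(df)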

But I find it quite inefficient, particularly when running on a cluster.

Is there a better Spark way to accomplish this?

1 Answer

You can unpivot and explode the array:

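from pyspark.sql import functions as F

# Columns to unpivot: everything except 'id'
cols = first_df.columns[1:]

# stack(n, col1, 'col1', ...) emits one (array, column name) row per column;
# explode then emits one row per array element and drops empty arrays
df2 = first_df.selectExpr(
    'id',
    'stack(' + str(len(cols)) + ', ' + ', '.join(["%s, '%s'" % (col, col) for col in cols]) + ') as (animal, animal_type)'
).withColumn(
    'animal',
    F.explode('animal')
)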

df2.show()
+---+----------+-----------+
| id|    animal|animal_type|
+---+----------+-----------+
|  0| strisores|       bird|
|  0|    persan|        cat|
|  0|    sphynx|        cat|
|  1|columbaves|       bird|
|  1|gruiformes|       bird|
|  1|   bulldog|        dog|
|  2|   ragdoll|        cat|
|  2|  labrador|        dog|
+---+----------+-----------+
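
The string passed to selectExpr is easier to follow once you see what it expands to. Below is a minimal sketch in plain Python (no Spark session needed) that builds the same stack(...) expression for the three columns in the question. The helper name build_stack_expr and its parameters value_name and key_name are hypothetical, introduced only for this illustration.

```python
def build_stack_expr(columns, value_name="animal", key_name="animal_type"):
    """Build a Spark SQL stack() expression that unpivots `columns`.

    Hypothetical helper for illustration; it only assembles a string.
    """
    # Each column contributes a pair: the column itself (the array value)
    # and its name as a quoted string literal (the label).
    pairs = ", ".join(f"{c}, '{c}'" for c in columns)
    # The first argument of stack() is the number of rows to emit,
    # here one row per unpivoted column.
    return f"stack({len(columns)}, {pairs}) as ({value_name}, {key_name})"


expr = build_stack_expr(["cat", "dog", "bird"])
print(expr)
# stack(3, cat, 'cat', dog, 'dog', bird, 'bird') as (animal, animal_type)
```

The resulting string can be passed to selectExpr alongside 'id', exactly as in the snippet above. Note that the column names are interpolated unquoted, so this sketch assumes they are plain identifiers without spaces or special characters.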