
I have the following DataFrame in Spark:

s    s_type    o    o_type
--------------------------
s1   ss1       o1   oo1
s2   ss2       o2   oo2

I want to swap the columns so it looks like this:

s    s_type    o    o_type
--------------------------
o1   oo1       s1   ss1
o2   oo2       s2   ss2

One way is to copy columns [o, o_type] into temporary columns [o_temp, o_type_temp], then copy the values of [s, s_type] into [o, o_type], and finally copy [o_temp, o_type_temp] back into [s, s_type], as in the sketch below.
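For reference, that temporary-column approach might look like this (a sketch using withColumn; the temp names are the ones from the description above):

from pyspark.sql.functions import col

# stash [o, o_type] in temporary columns, overwrite [o, o_type]
# with [s, s_type], then move the stashed values into [s, s_type]
df = (df.withColumn("o_temp", col("o"))
        .withColumn("o_type_temp", col("o_type"))
        .withColumn("o", col("s"))
        .withColumn("o_type", col("s_type"))
        .withColumn("s", col("o_temp"))
        .withColumn("s_type", col("o_type_temp"))
        .drop("o_temp", "o_type_temp"))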

I was wondering if there is a better/more efficient way to do this.

1 Answer

You can just use select with pyspark.sql.Column.alias:

from pyspark.sql.functions import col

# rename each column with alias; col("o") etc. still refer to the
# original columns, so this swaps both pairs in a single select
df = df.select(
    col("o").alias("s"),
    col("o_type").alias("s_type"),
    col("s").alias("o"),
    col("s_type").alias("o_type")
)
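For example, with a small DataFrame matching the one in the question (the SparkSession setup and sample rows here are just for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("s1", "ss1", "o1", "oo1"), ("s2", "ss2", "o2", "oo2")],
    ["s", "s_type", "o", "o_type"]
)

df.select(
    col("o").alias("s"),
    col("o_type").alias("s_type"),
    col("s").alias("o"),
    col("s_type").alias("o_type")
).show()
# +---+------+---+------+
# |  s|s_type|  o|o_type|
# +---+------+---+------+
# | o1|   oo1| s1|   ss1|
# | o2|   oo2| s2|   ss2|
# +---+------+---+------+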

For a more generalized solution, you can create a mapping of old name to new name and loop over this in a list comprehension:

# key = old column, value = new column
mapping = {
    "o": "s",
    "o_type": "s_type",
    "s": "o",
    "s_type": "o_type"
}

df = df.select(*[col(old).alias(new) for old, new in mapping.items()])
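Note that this select keeps only the columns named in the mapping. If the DataFrame had other columns you wanted to carry through unchanged, one option (a sketch, reusing the same mapping) is to iterate over df.columns and fall back to the original name:

# keep unmapped columns as-is by falling back to the original name
df = df.select(*[col(c).alias(mapping.get(c, c)) for c in df.columns])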