
I have the following DataFrame in Spark:

s    s_type    o    o_type
--------------------------
s1   ss1       o1   oo1
s2   ss2       o2   oo2

I want to swap the columns so it looks like this:

s    s_type    o    o_type
--------------------------
o1   oo1       s1   ss1
o2   oo2       s2   ss2

One way is to copy columns [o, o_type] into temporary columns [o_temp, o_type_temp], then copy the values of [s, s_type] into [o, o_type], and finally copy [o_temp, o_type_temp] back into [s, s_type], as in the sketch below.
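For reference, that temporary-column approach might look like this (a sketch using withColumn; the temp names are the ones from the description above):

from pyspark.sql.functions import col

# stash [o, o_type] in temporary columns, overwrite [o, o_type]
# with [s, s_type], then move the stashed values into [s, s_type]
df = (df.withColumn("o_temp", col("o"))
        .withColumn("o_type_temp", col("o_type"))
        .withColumn("o", col("s"))
        .withColumn("o_type", col("s_type"))
        .withColumn("s", col("o_temp"))
        .withColumn("s_type", col("o_type_temp"))
        .drop("o_temp", "o_type_temp"))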

I was wondering if there is a better/more efficient way to do this.

1 Answer

You can just use select with pyspark.sql.Column.alias:

from pyspark.sql.functions import col

# rename each column with alias; col("o") etc. still refer to the
# original columns, so this swaps both pairs in a single select
df = df.select(
    col("o").alias("s"),
    col("o_type").alias("s_type"),
    col("s").alias("o"),
    col("s_type").alias("o_type")
)
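For example, with a small DataFrame matching the one in the question (the SparkSession setup and sample rows here are just for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("s1", "ss1", "o1", "oo1"), ("s2", "ss2", "o2", "oo2")],
    ["s", "s_type", "o", "o_type"]
)

df.select(
    col("o").alias("s"),
    col("o_type").alias("s_type"),
    col("s").alias("o"),
    col("s_type").alias("o_type")
).show()
# +---+------+---+------+
# |  s|s_type|  o|o_type|
# +---+------+---+------+
# | o1|   oo1| s1|   ss1|
# | o2|   oo2| s2|   ss2|
# +---+------+---+------+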

For a more generalized solution, you can create a mapping of old name to new name and loop over this in a list comprehension:

# key = old column, value = new column
mapping = {
    "o": "s",
    "o_type": "s_type",
    "s": "o",
    "s_type": "o_type"
}

df = df.select(*[col(old).alias(new) for old, new in mapping.items()])
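Note that this select keeps only the columns named in the mapping. If the DataFrame had other columns you wanted to carry through unchanged, one option (a sketch, reusing the same mapping) is to iterate over df.columns and fall back to the original name:

# keep unmapped columns as-is by falling back to the original name
df = df.select(*[col(c).alias(mapping.get(c, c)) for c in df.columns])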