I have an existing PySpark dataframe that has around 200 columns. I have a list of the column names (in the correct order and length).
How can I apply the list to the dataframe without using StructType?
Assuming the list of column names is in the right order and has a matching length, you can use toDF.
Preparing an example dataframe
import numpy as np
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(np.random.randint(1,10,(5,4)).tolist(), list('ABCD'))
df.show()
Output
+---+---+---+---+
| A| B| C| D|
+---+---+---+---+
| 6| 9| 4| 7|
| 6| 4| 7| 9|
| 2| 5| 2| 2|
| 3| 7| 4| 5|
| 8| 9| 6| 8|
+---+---+---+---+
Changing the column names
newcolumns = ['new_A','new_B','new_C','new_D']
df.toDF(*newcolumns).show()
Output
+-----+-----+-----+-----+
|new_A|new_B|new_C|new_D|
+-----+-----+-----+-----+
| 6| 9| 4| 7|
| 6| 4| 7| 9|
| 2| 5| 2| 2|
| 3| 7| 4| 5|
| 8| 9| 6| 8|
+-----+-----+-----+-----+
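Since toDF matches names to columns purely by position, it is worth checking the list length up front for a wide frame like yours; a mismatch raises an error only at rename time. A small sketch of that check — the four-name list here is a stand-in for your real 200-name list:

```python
# Stand-ins for the real lists; replace with df.columns and your 200 names
old_names = ['A', 'B', 'C', 'D']
new_names = ['new_' + name for name in old_names]

# toDF assigns names positionally, so the lengths must match exactly
assert len(new_names) == len(old_names), "name list does not match column count"

# Then apply it in one call:
# renamed_df = df.toDF(*new_names)
```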
If you have a pre-existing list of column names, it works the same way:
df_list = ["newName_1", "newName_2", "newName_3", "newName_4"]
renamed_df = df.toDF(*df_list)
renamed_df.show()
But if you want to build the new names dynamically instead of maintaining a hand-written list, you can derive them from df.columns, for example by adding a prefix:
from pyspark.sql.functions import col
df.select([col(c).alias('new_' + c) for c in df.columns])
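For completeness: when only a handful of the 200 columns actually need new names, withColumnRenamed lets you leave the rest untouched. A sketch using a hypothetical old-name to new-name mapping; each call returns a new DataFrame, so the loop reassigns:

```python
# Hypothetical mapping: only the columns listed here are renamed
rename_map = {'A': 'id', 'B': 'score'}

# Applied to a DataFrame like the example above:
# for old, new in rename_map.items():
#     df = df.withColumnRenamed(old, new)

# Columns not present in the mapping keep their original names.
```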