3

I have an existing PySpark DataFrame with around 200 columns, and a list of column names (in the correct order and of matching length).

How can I apply the list to the DataFrame without using StructType?

1
  • Has the list of column names the correct order and a matching length? Commented Sep 2, 2021 at 22:08

2 Answers

7

Assuming the list of column names is in the right order and has a matching length, you can use toDF.

Preparing an example dataframe

import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(np.random.randint(1,10,(5,4)).tolist(), list('ABCD'))
df.show()

Output

+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  6|  9|  4|  7|
|  6|  4|  7|  9|
|  2|  5|  2|  2|
|  3|  7|  4|  5|
|  8|  9|  6|  8|
+---+---+---+---+

Changing the column names

newcolumns = ['new_A','new_B','new_C','new_D']
df.toDF(*newcolumns).show()

Output

+-----+-----+-----+-----+
|new_A|new_B|new_C|new_D|
+-----+-----+-----+-----+
|    6|    9|    4|    7|
|    6|    4|    7|    9|
|    2|    5|    2|    2|
|    3|    7|    4|    5|
|    8|    9|    6|    8|
+-----+-----+-----+-----+
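
Since toDF needs one name per existing column, a quick check before renaming can make a mismatch easier to spot when there are around 200 columns. The following is only a minimal sketch: `rename_all` is a hypothetical helper name, not part of PySpark, and it assumes `df` is a PySpark DataFrame (it uses nothing beyond `df.columns` and `df.toDF`).

```python
def rename_all(df, new_names):
    """Return df with its columns renamed positionally to new_names.

    Hypothetical helper: relies only on df.columns and df.toDF.
    """
    new_names = list(new_names)

    # toDF renames by position, so the counts have to match.
    if len(new_names) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} names, got {len(new_names)}"
        )

    # Duplicate names make later column references ambiguous, so reject them early.
    duplicates = {name for name in new_names if new_names.count(name) > 1}
    if duplicates:
        raise ValueError(f"duplicate column names: {sorted(duplicates)}")

    return df.toDF(*new_names)
```

With the example above, `rename_all(df, newcolumns).show()` would be used in place of `df.toDF(*newcolumns).show()`.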

2

If you already have a list of column names, toDF works fine:

df_list = ["newName_1", "newName_2", "newName_3", "newName_4"]
renamed_df = df.toDF(*df_list)
renamed_df.show()

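If you want to derive the names dynamically instead of relying on a prepared list, here is an alternative. Iterate over df.columns (the list of column-name strings), not over df itself, which fails with TypeError: Column is not iterable:

from pyspark.sql.functions import col
df.select([col(col_name).alias(col_name) for col_name in df.columns])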
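
The select/alias form is mainly useful when the new names are computed from the old ones. A minimal sketch of that idea, where `clean_name` is a hypothetical normaliser invented for illustration and the sample names are made up; the name transformation itself is plain Python and needs no Spark session:

```python
import re


def clean_name(name):
    """Hypothetical normaliser: trim, lowercase, and collapse
    runs of non-alphanumeric characters into a single underscore."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


# Made-up column names standing in for df.columns.
old_names = ["First Name", " Total-Amount ", "ID"]
new_names = [clean_name(name) for name in old_names]
print(new_names)  # ['first_name', 'total_amount', 'id']

# With a PySpark DataFrame `df`, either form applies the computed names:
#   df.toDF(*[clean_name(c) for c in df.columns])
#   df.select([col(c).alias(clean_name(c)) for c in df.columns])
```

If two original names normalise to the same string, the result has duplicate column names, so it may be worth checking `len(set(new_names)) == len(new_names)` before applying them.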

2 Comments

Thanks, second part is useful to do some string manipulation on all the columns.
Second part gives runtime error: TypeError: Column is not iterable
