6

I have a dataframe which has a lot of columns (more than 50 columns) and want to select all the columns as they are with few column names renamed by maintaining the below order. I tried the following ,

cols = list(set(df.columns) - {'id','starttime','endtime'})
df.select(col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime"),*cols,lit(proceessing_time).alias("processingtime"))

and got the error , SyntaxError: only named arguments may follow *expression

Also, instead of *cols, i tried to pass a list of column type

df.select(col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime"),([col(x) for x in cols]),lit(proceessing_time).alias("processingtime"))

which gives the following error,

`TypeError: 'Column' object is not callable`

Any help is highly appreciated.

1 Answer 1

7

We could append the columns together and select from df,

df.select([col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime")]+cols+[lit(proceessing_time).alias("processingtime")])
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.