
I have a Spark dataframe with a very large number of columns. I want to remove two columns from it to get a new dataframe.

Had there been fewer columns, I could have used the select method in the API like this:

    pcomments = pcomments.select(
        pcomments.col("post_id"),
        pcomments.col("comment_id"),
        pcomments.col("comment_message"),
        pcomments.col("user_name"),
        pcomments.col("comment_createdtime"));

But since picking columns from a long list is a tedious task, is there a workaround?

2 Answers


Use the drop and withColumnRenamed methods.

Example:

    val initialDf = ...

    val dfAfterDrop = initialDf.drop("column1").drop("column2")

    val dfAfterColRename = dfAfterDrop.withColumnRenamed("oldColumnName", "newColumnName")

2 Comments

Why are you renaming columns?
If I want to first check whether the column is present in the DataFrame, and drop it only if it exists, how can I achieve that?
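A minimal sketch of the existence check the second commenter asks about. The column names here are hypothetical stand-ins; with a real DataFrame, the list would come from df.columns (and note that Spark's drop is already a no-op for columns that don't exist, so the guard is only needed if you want to branch explicitly):

```scala
// Hypothetical stand-in for df.columns on a live DataFrame
val allColumns = Array("post_id", "comment_id", "comment_message")

// Check before dropping; with a real DataFrame this would be:
//   val result = if (df.columns.contains("comment_id")) df.drop("comment_id") else df
val hasCommentId = allColumns.contains("comment_id")
val hasUserName = allColumns.contains("user_name")
```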

Try this:

val initialDf = ...

val dfAfterDropCols = initialDf.drop("column1", "column2")
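Since drop takes varargs, the tedious select from the question can also be inverted: list only the columns to remove and keep the complement. A sketch on plain column-name sequences (the names are hypothetical; the actual DataFrame calls are shown in comments, since they need a live Spark session):

```scala
// Hypothetical columns to remove
val toDrop = Set("column1", "column2")

// Hypothetical stand-in for df.columns on a live DataFrame
val allColumns = Seq("post_id", "comment_id", "column1", "column2")

// Complement: the columns to keep
val kept = allColumns.filterNot(toDrop.contains)

// With a real DataFrame, either form works:
//   df.drop(toDrop.toSeq: _*)
//   df.select(kept.map(org.apache.spark.sql.functions.col): _*)
```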

2 Comments

Yes, but I have provided different syntax.
Agreed, this is a better answer.
