
I have a Spark dataframe with a very large number of columns. I want to remove two columns from it to get a new dataframe.

Had there been fewer columns, I could have used the select method in the API like this:

    pcomments = pcomments.select(
        pcomments.col("post_id"),
        pcomments.col("comment_id"),
        pcomments.col("comment_message"),
        pcomments.col("user_name"),
        pcomments.col("comment_createdtime"));

But since picking columns from a long list is a tedious task, is there a workaround?

2 Answers


Use the drop and withColumnRenamed methods.

Example:

    val initialDf = ...

    val dfAfterDrop = initialDf.drop("column1").drop("column2")

    val dfAfterColRename = dfAfterDrop.withColumnRenamed("oldColumnName", "newColumnName")

2 Comments

Why are you renaming columns?
If I want to first check whether the column is present in the DataFrame, and drop it only if it exists, how can I achieve that?
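A minimal sketch of the existence check the second commenter asks about. The column names here are hypothetical stand-ins; with a real DataFrame, the list would come from df.columns (and note that Spark's drop is already a no-op for columns that don't exist, so the guard is only needed if you want to branch explicitly):

```scala
// Hypothetical stand-in for df.columns on a live DataFrame
val allColumns = Array("post_id", "comment_id", "comment_message")

// Check before dropping; with a real DataFrame this would be:
//   val result = if (df.columns.contains("comment_id")) df.drop("comment_id") else df
val hasCommentId = allColumns.contains("comment_id")
val hasUserName = allColumns.contains("user_name")
```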

Try this:

val initialDf = ...

val dfAfterDropCols = initialDf.drop("column1", "column2")
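Since drop takes varargs, the tedious select from the question can also be inverted: list only the columns to remove and keep the complement. A sketch on plain column-name sequences (the names are hypothetical; the actual DataFrame calls are shown in comments, since they need a live Spark session):

```scala
// Hypothetical columns to remove
val toDrop = Set("column1", "column2")

// Hypothetical stand-in for df.columns on a live DataFrame
val allColumns = Seq("post_id", "comment_id", "column1", "column2")

// Complement: the columns to keep
val kept = allColumns.filterNot(toDrop.contains)

// With a real DataFrame, either form works:
//   df.drop(toDrop.toSeq: _*)
//   df.select(kept.map(org.apache.spark.sql.functions.col): _*)
```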

2 Comments

Yes, but I have provided different syntax.
Agreed, this is a better answer.
