2

I'm trying to drop multiple columns from a DataFrame. I've followed the example at https://stackoverflow.com/a/39818645/10836338, but it doesn't work for me. I am using Spark 2.4.0.

Here's what I'm doing right now. It does work, but I was wondering if there is a better way of doing this.

val colsToRemove = Seq("colA", "colB", "colC") // etc.

var filterdDF = df
for (i <- 0 until colsToRemove.size) {
  filterdDF = filterdDF.drop(colsToRemove(i))
}
  • drop also accepts a list of column name strings which can be used here. No need to loop. Commented Jan 3, 2019 at 19:26
  • I've already tried that, but it returns an error: overloaded method value drop with alternatives: (col: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame <and> (colNames: String*)org.apache.spark.sql.DataFrame <and> (colName: String)org.apache.spark.sql.DataFrame Commented Jan 3, 2019 at 19:29

2 Answers

8
val filterdDF = df.drop(colsToRemove:_*)
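
The `: _*` ascription is what lets a Seq be passed where a method declares a repeated parameter such as `colNames: String*`. Passing the Seq itself matches none of the three overloads listed in the error above. Here is a minimal plain-Scala sketch of the same idea, without Spark; `dropCols` is a hypothetical stand-in with the same parameter shape, not Spark's implementation:

```scala
// Hypothetical helper: mimics the shape of a varargs method like
// drop(colNames: String*), but works on plain column-name sequences.
def dropCols(all: Seq[String], colNames: String*): Seq[String] =
  all.filterNot(name => colNames.contains(name))

val columns = Seq("colA", "colB", "colC", "colD")
val colsToRemove = Seq("colA", "colB", "colC")

// dropCols(columns, colsToRemove) would not compile:
// a Seq[String] is not a String.
// `: _*` passes each element of the Seq as a separate argument.
val remaining = dropCols(columns, colsToRemove: _*)

println(remaining) // List(colD)
```

The same ascription works for any Seq or Array passed to a `String*` parameter, so no loop is needed.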

-2

You can try this:

import org.apache.spark.sql.Column

val colsToRemove = Seq("emp_id", "emp_city")

// Select every column whose name is not in colsToRemove
val filteredDF = ds1.select(ds1.columns.filter(colName => !colsToRemove.contains(colName)).map(colName => new Column(colName)): _*)