df1.printSchema() prints the column names along with their data types.

df1.drop($"colName") will drop columns by their name.

Is there a way to adapt this command to drop columns by their data type instead?

2 Answers

If you are looking to drop specific columns in a dataframe based on their types, the snippet below should help. In this example, I have a dataframe with two columns, of type String and Int respectively. I drop the String field from the schema based on its type (all fields of type String would be dropped).

import sqlContext.implicits._

// Build a toy dataframe: c1 is a String column, c2 is an Int column.
val df = sc.parallelize(('a' to 'j').map(_.toString) zip (1 to 10)).toDF("c1", "c2")

// Collect the names of all string-typed columns, then drop each of them from df.
val newDf = df.schema.fields
  .collect({ case x if x.dataType.typeName == "string" => x.name })
  .foldLeft(df)({ case (dframe, field) => dframe.drop(field) })

The resulting newDf is org.apache.spark.sql.DataFrame = [c2: int]; only the Int column remains.
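To confirm, the question's own printSchema can be run on the result; it should show just the one column:

newDf.printSchema()
// root
//  |-- c2: integer (nullable = false)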

2 Comments

How can I apply this approach to nested columns? It does not work on columns of Struct or Array type.
You will have to unpack the struct and re-create it with all of the fields except the ones you want to drop; see the sketch after these comments. Something similar applies to Array-typed columns as well.
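For the Struct case, here is a minimal sketch of that rebuild. It assumes Spark 2.x and a hypothetical dataframe df with a struct column named s, and drops the string fields nested inside s:

import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{StringType, StructType}

// "s" is a hypothetical struct column; keep only its non-string inner fields.
val innerFields = df.schema("s").dataType.asInstanceOf[StructType].fields
val kept = innerFields.collect { case f if f.dataType != StringType => col(s"s.${f.name}") }

// Re-create the struct from the surviving fields and overwrite the original column.
// (If every inner field were a string, struct() would receive no columns and fail,
// so guard against that in real code.)
val dfCleaned = df.withColumn("s", struct(kept: _*))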

Here is a fancy way to do it in Scala:

// Collect the names of all string-typed ("categorical") columns.
val categoricalFeatColNames = df.schema.fields
  .filter(_.dataType.isInstanceOf[org.apache.spark.sql.types.StringType])
  .map(_.name)
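To actually drop those columns rather than just list them, the collected names can be passed back to drop, which accepts multiple column names in Spark 2.x and later:

val dfWithoutCategoricals = df.drop(categoricalFeatColNames: _*)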
