Select columns from a dataframe into another dataframe based on column datatype in Apache Spark Scala

Question

I have a Spark dataframe:

inputDF: org.apache.spark.sql.DataFrame = [_id: string, Frequency: double, Monterary: double, Recency: double, CustID: string]
root
 |-- _id: string (nullable = false)
 |-- Frequency: double (nullable = false)
 |-- Monterary: double (nullable = false)
 |-- Recency: double (nullable = false)
 |-- CustID: string (nullable = false)

I want to create a new dataframe by dropping the string columns from this one. The specific condition is not to iterate over the column names. Does anyone have an idea?

zero323 · Accepted Answer · 2016-01-15 01:37:45Z


If the schema is flat and contains only simple types, you can filter over the schema fields, but unless you have a crystal ball you cannot really avoid iteration:

import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.col

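// Walk the schema once: string fields map to Nil (dropped),
// every other field maps to a one-element list holding its column.
inputDF.select(inputDF.schema.fields.flatMap(f => f.dataType match {
  case StringType => Nil
  case _ => col(f.name) :: Nil
}): _*)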
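
The same idea can be wrapped in a small reusable helper. This is only a sketch: the names DropStringColumns and dropColumnsOfType are made up for illustration, the sample row is invented, and it assumes a flat schema with plain column names (no dots and no nested structs), as in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StringType}

object DropStringColumns {

  // Hypothetical helper: keep every top-level column whose type differs from `excluded`.
  // It still iterates over the schema fields; only the schema is inspected, not the data.
  def dropColumnsOfType(df: DataFrame, excluded: DataType): DataFrame = {
    val kept = df.schema.fields
      .filter(_.dataType != excluded)
      .map(f => col(f.name))
    df.select(kept: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-string-columns")
      .getOrCreate()
    import spark.implicits._

    // Invented sample row with the same column names and types as the question.
    val inputDF = Seq(("a1", 3.0, 120.5, 7.0, "c1"))
      .toDF("_id", "Frequency", "Monterary", "Recency", "CustID")

    val numericDF = dropColumnsOfType(inputDF, StringType)
    numericDF.printSchema() // only the three double columns remain

    spark.stop()
  }
}
```

Passing the type as a parameter means the same helper could drop, say, DoubleType columns instead. Selecting by column object rather than by hard-coded name keeps the call independent of the actual column names.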

