I am trying to programmatically convert the datatypes of columns and am running into a type-mismatch error.

I modified the code used here for this.

Data >> all numeric columns are being read in as strings.

Code >>

import org.apache.spark.sql
// for each string-typed column, attempt to cast it to IntegerType
raw_data.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

Error >>

<console>:75: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
           .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

1 Answer

The problem is that the result of dframe(field).cast(sql.types.IntegerType) inside the foldLeft is a Column; however, to continue the iteration a DataFrame is expected. In the link the code originally comes from, dframe.drop(field) is used, which does return a DataFrame and hence works.
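
This mismatch follows directly from foldLeft's signature in the Scala standard library, where the combining function must return the accumulator type:

def foldLeft[B](z: B)(op: (B, A) => B): B

Here B is DataFrame (seeded with raw_data), so every step of op must produce a DataFrame; a Column does not satisfy that.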

To fix this, simply use withColumn, which replaces the given column and returns the whole DataFrame:

foldLeft(raw_data)({case(dframe, field) => dframe.withColumn(field, dframe(field).cast(sql.types.IntegerType))})
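
For reference, here is a minimal end-to-end sketch of the whole transformation. The SparkSession setup and the sample column names id and label are illustrative assumptions, not from the original post:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Sample data: numbers arriving as strings (hypothetical columns).
val raw_data: DataFrame = Seq(("1", "a"), ("2", "b")).toDF("id", "label")

// Cast every string-typed column to IntegerType.
val converted = raw_data.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(raw_data)({case(dframe, field) =>
        dframe.withColumn(field, dframe(field).cast(IntegerType))})

converted.printSchema()   // both columns are now int; non-numeric values (like "a") become null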