I am trying to programmatically convert the datatypes of columns and am running into a type-mismatch error.

I modified the code used here for this.

Data >> all numeric columns are being read in as strings.

Code >>

import org.apache.spark.sql
// for each string-typed column, attempt to cast it to IntegerType
raw_data.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

Error >>

<console>:75: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
           .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

1 Answer

The problem is that the result of dframe(field).cast(sql.types.IntegerType) inside the foldLeft is a Column; however, to continue the iteration a DataFrame is expected. In the link the code originally comes from, dframe.drop(field) is used, which does return a DataFrame and hence works.
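
This mismatch follows directly from foldLeft's signature in the Scala standard library, where the combining function must return the accumulator type:

def foldLeft[B](z: B)(op: (B, A) => B): B

Here B is DataFrame (seeded with raw_data), so every step of op must produce a DataFrame; a Column does not satisfy that.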

To fix this, simply use withColumn, which replaces the given column and returns the whole DataFrame:

foldLeft(raw_data)({case(dframe, field) => dframe.withColumn(field, dframe(field).cast(sql.types.IntegerType))})
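
For reference, here is a minimal end-to-end sketch of the whole transformation. The SparkSession setup and the sample column names id and label are illustrative assumptions, not from the original post:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Sample data: numbers arriving as strings (hypothetical columns).
val raw_data: DataFrame = Seq(("1", "a"), ("2", "b")).toDF("id", "label")

// Cast every string-typed column to IntegerType.
val converted = raw_data.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(raw_data)({case(dframe, field) =>
        dframe.withColumn(field, dframe(field).cast(IntegerType))})

converted.printSchema()   // both columns are now int; non-numeric values (like "a") become null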