I loaded a CSV file as a DataFrame. I would like to cast all columns to float, but the file has too many columns to write out every column name:

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header",true).option("inferSchema", "true").csv("C:/Users/mhattabi/Desktop/dataTest2.csv")

1 Answer

Given this DataFrame as example:

val df = sqlContext.createDataFrame(Seq(("0", 0),("1", 1),("2", 0))).toDF("id", "c0")

with schema:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,IntegerType,false))

You can loop over the DataFrame's columns using its .columns method (col comes from org.apache.spark.sql.functions):

val castedDF = df.columns.foldLeft(df)((current, c) => current.withColumn(c, col(c).cast("float")))

So the new DF schema looks like:

StructType(
    StructField(id,FloatType,true), 
    StructField(c0,FloatType,false))
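
As an alternative to chaining withColumn once per column, the same cast can be expressed as a single select over all columns. The following is only a minimal sketch under assumptions: the object name CastAllToFloat and the helper castAllToFloat are made up for illustration, and the session settings are placeholders for a local test run.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.FloatType

object CastAllToFloat {
  // Hypothetical helper: build one cast expression per column and
  // apply them all in a single projection.
  def castAllToFloat(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => col(c).cast(FloatType).as(c)): _*)

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("cast-all-to-float")
      .getOrCreate()
    import spark.implicits._

    // Same toy data as the example DataFrame above.
    val df = Seq(("0", 0), ("1", 1), ("2", 0)).toDF("id", "c0")

    val castedDF = castAllToFloat(df)
    castedDF.printSchema()

    spark.stop()
  }
}
```

With many columns, a single select keeps the query plan flat, whereas each withColumn call adds another projection on top of the previous one.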

EDIT:

If you want to exclude some columns from casting, you could do something like this (supposing we want to exclude the column id):

val exclude = Array("id")

val someCastedDF = (df.columns.toBuffer --= exclude).foldLeft(df)((current, c) =>
                                              current.withColumn(c, col(c).cast("float")))

where exclude is an Array of all columns we want to exclude from casting.

So the schema of this new DF is:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,FloatType,false))
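
For the case raised in the comments, where one known column should become a timestamp while the rest become float, one possible sketch is to look up a per-column target type and fall back to float. Everything named here is an assumption for illustration: the helper castColumns, its overrides and keep parameters, and the extra ts column are not part of the original example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastWithOverrides {
  // Hypothetical helper: columns in `keep` are left untouched, columns in
  // `overrides` are cast to the given type name, all others become float.
  def castColumns(df: DataFrame,
                  overrides: Map[String, String],
                  keep: Set[String]): DataFrame = {
    val exprs = df.columns.map { c =>
      if (keep.contains(c)) col(c)
      else col(c).cast(overrides.getOrElse(c, "float")).as(c)
    }
    df.select(exprs: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("cast-with-overrides")
      .getOrCreate()
    import spark.implicits._

    // Toy data with an assumed extra string column "ts" holding timestamps.
    val df = Seq(
      ("0", 0, "2017-01-01 10:00:00"),
      ("1", 1, "2017-01-02 11:30:00")
    ).toDF("id", "c0", "ts")

    // Keep id as it is, cast ts to timestamp, cast everything else to float.
    val result = castColumns(df, Map("ts" -> "timestamp"), Set("id"))
    result.printSchema()

    spark.stop()
  }
}
```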

Note that this may not be the best way to do it, but it can serve as a starting point.

4 Comments

You can use foldLeft on the columns to refrain from using a mutable var: df.columns.foldLeft(df)((current, c) => current.withColumn(c, col(c).cast("float")))
One last question before I accept your answer: what if I want to exclude a specific column from the float cast, for example one that I want to cast to timestamp instead, while looping over the other columns? Could you please update the solution?
@user7394882 Do you already know the names of the columns you want to exclude from casting?
@pheeleeppoo Yes, I know the name of the column I need to exclude. What should I do? Thanks.
