
I have a StructType as follows:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

Both fields in dataframe_1 are StringType, so I created the StructType above to typecast the fields in dataframe_1.

I am able to do it in Scala:

val df2 = dataframe_1.selectExpr(to_Schema.map(
  col => s"CAST ( ${col.name} As ${col.dataType.sql}) ${col.name}"
): _*)

I am not able to use the same map call in Python, since PySpark's StructType has no map method.

I've tried using a for loop, but it doesn't work as expected.

I am looking for a PySpark equivalent of the above Scala code.

2 Answers


The code below achieves the same thing in Python:

# StructType is iterable, so loop over its StructFields and cast each column
for s in to_Schema:
    df = df.withColumn(s.name, df[s.name].cast(s.dataType))
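For illustration, here is a minimal end-to-end sketch of this loop; the SparkSession setup and sample rows are assumptions added for the example, not part of the original question:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

to_Schema = StructType([StructField('name', StringType(), True),
                        StructField('sales', IntegerType(), True)])

# Both columns start out as strings, mirroring dataframe_1 in the question
df = spark.createDataFrame([('alice', '10'), ('bob', '20')], ['name', 'sales'])

# Cast each column to the type declared in to_Schema
for s in to_Schema:
    df = df.withColumn(s.name, df[s.name].cast(s.dataType))

df.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- sales: integer (nullable = true)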

You can also create a new dataframe from the old one using the new schema:

df2 = spark.createDataFrame(dataframe_1.rdd, to_Schema)


This would be the direct translation:

df2 = dataframe_1.selectExpr(*[f"CAST ({c.name} AS {c.dataType.simpleString()}) {c.name}"
                               for c in to_Schema])

It could be simplified:

from pyspark.sql.functions import col

df2 = dataframe_1.select([col(c.name).cast(c.dataType).alias(c.name) for c in to_Schema])
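Either way, a quick printSchema() should confirm the cast (expected output shown as comments):

df2.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- sales: integer (nullable = true)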

However, I like this answer more ;)

