I have a CSV file stored in an S3 location with data like this:
| column1 | column2 |
|---------+---------|
| adsf    | 2000.0  |
| fff     | 232.34  |
I have an AWS Glue job in Scala that reads this file into a DataFrame:
val srcDF = glueContext.getCatalogSource(
    database = "",
    tableName = "",
    redshiftTmpDir = "",
    transformationContext = ""
  ).getDynamicFrame().toDF()
When I print the schema, it has been inferred like this:
srcDF.printSchema()
 |-- column1: string
 |-- column2: struct (double, string)
And the DataFrame looks like this:
| column1 | column2   |
|---------+-----------|
| adsf    | [2000.0,] |
| fff     | [232.34,] |
When I try to save the DataFrame to CSV, it fails with:

org.apache.spark.sql.AnalysisException: CSV data source does not support struct<double:double,string:string> data type.
How do I convert the DataFrame so that only the columns of struct type (if any exist) are converted to a decimal type? The output should look like this:
| column1 | column2 |
|---------+---------|
| adsf    | 2000.0  |
| fff     | 232.34  |
Edit:
Thanks for the response. I tried the following code:
df.select($"column2._1".alias("column2")).show()
But I got the same error in both cases:

org.apache.spark.sql.AnalysisException: No such struct field _1 in double, string;
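The error message hints at the real field names, so here is a small sketch (using only standard Spark APIs) of how I checked what the struct's fields are actually called; `column2` is just the column from my example:

import org.apache.spark.sql.types.StructType

// Print the field names inside the struct column so the right one can be selected.
srcDF.schema("column2").dataType match {
  case s: StructType => println(s.fieldNames.mkString(", ")) // prints: double, string
  case other         => println(s"column2 is not a struct: $other")
}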
Edit 2:
It seems that in Spark the struct's fields were named "double" and "string" (after the underlying types), so this solution worked for me:
df.select($"column2.double").show()