I am new to Spark and Scala and am stuck on this exception. I am trying to add extra fields (a StructField) to an existing StructType retrieved from a DataFrame column using Spark SQL, and I get the exception below.

code snippet:

val dfStruct:StructType=parquetDf.select("columnname").schema
dfStruct.add("newField","IntegerType",true)

Exception in thread "main" org.apache.spark.sql.types.DataTypeException: Unsupported dataType: IntegerType. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
    at org.apache.spark.sql.types.DataTypeParser$class.toDataType(DataTypeParser.scala:95)
    at org.apache.spark.sql.types.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:107)
    at org.apache.spark.sql.types.DataTypeParser$.parse(DataTypeParser.scala:111)

I can see some open JIRA issues related to this exception, but I could not understand much from them. I am using Spark 1.5.1.

https://mail-archives.apache.org/mod_mbox/spark-issues/201508.mbox/%3CJIRA.12852533.1438855066000.143133.1440397426473@Atlassian.JIRA%3E

https://issues.apache.org/jira/browse/SPARK-9685

1 Answer

When you use StructType.add with the following signature:

add(name: String, dataType: String, nullable: Boolean)

the dataType string should correspond to either the .simpleString or the .typeName of the target type. For IntegerType it is either int:

import org.apache.spark.sql.types._

IntegerType.simpleString
// String = int

or integer:

IntegerType.typeName
// String = integer

So what you need is something like this:

val schema = StructType(Nil)

schema.add("foo", "int", true)
// org.apache.spark.sql.types.StructType = 
//   StructType(StructField(foo,IntegerType,true))

or

schema.add("foo", "integer", true)
// org.apache.spark.sql.types.StructType = 
//   StructType(StructField(foo,IntegerType,true))

If you want to pass IntegerType itself, it has to be a DataType, not a String:

schema.add("foo", IntegerType, true)
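
One more thing that may be worth checking in the question's snippet: StructType is immutable, so add returns a new StructType rather than changing the one it is called on. Below is a minimal sketch of keeping the result, assuming spark-sql is on the classpath (for example in spark-shell). The single-field schema is a hypothetical stand-in for parquetDf.select("columnname").schema, and the name extended is only for illustration.

```scala
import org.apache.spark.sql.types._

// Hypothetical stand-in for parquetDf.select("columnname").schema
val dfStruct: StructType =
  StructType(StructField("columnname", StringType, nullable = true) :: Nil)

// add does not modify dfStruct; it returns a new StructType with the extra field
val extended: StructType = dfStruct.add("newField", IntegerType, nullable = true)

println(dfStruct.fieldNames.mkString(", "))  // columnname
println(extended.fieldNames.mkString(", "))  // columnname, newField
```

If the return value is discarded, as in dfStruct.add(...) on a line by itself, the original schema is all that remains.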

1 Comment

Thanks for the quick response, zero323. I am now able to append a StructField to the existing StructType. The cause was that my IDE had not added import org.apache.spark.sql.types._, so this code did not compile: StructType(StructField("column1", DoubleType, false) :: StructField("column2", DoubleType, false) :: StructField("column3", StringType, false) :: Nil). That is why I was trying the API call dfStruct.add("newField","IntegerType",true). After writing the import statement manually, the problem is fixed.