0

I want to create a dataframe with this schema:

 |-- Col1 : string (nullable = true)
 |-- Col2 : string (nullable = true)
 |-- Col3 : struct (nullable = true)
 |    |-- 513: long (nullable = true)
 |    |-- 549: long (nullable = true)

Code:

val someData = Seq(
      Row("AAAAAAAAA", "BBBBB", Seq(513, 549))
    )

val col3Fields = Seq[StructField](StructField.apply("513",IntegerType, true), StructField.apply("549",IntegerType, true))

val someSchema = List(
  StructField("Col1", StringType, true),
  StructField("Col2", StringType, true),
  StructField("Col3", StructType.apply(col3Fields), true)
)

val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)

someDF.show

But someDF.show throws:

ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.immutable.$colon$colon is not a valid external type for schema of struct<513:int,549:int> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, Col1), StringType), true, false) AS Col1#0 if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, Col2), StringType), true, false) AS Col2#1 if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else named_struct(513, if (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)).isNullAt) null else validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)), 0, 513), IntegerType), 549, if (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)).isNullAt) null else validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)), 1, 549), IntegerType)) AS Col3#2 at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:291)

Edit:

513 and 549 should be the subcolumn names rather than the values. Here is an example of an output I expect:

someDF.select("Col1","Col2","Col3.*").show

+-----------+--------+------+------+
|       Col1|    Col1|   513|   549|
+-----------+--------+------+------+
| AAAAAAAAA |  BBBBB |    39|    38|
+-----------+--------+------+------+

1 Answer 1

2

The data you have and the Schema you have are not same, The schema you want to create is here how you create

val schema = StructType(
  Seq(
    StructField("col1", StringType, true),
    StructField("col2", StringType, true),
    StructField("col3", StructType(
      Seq(
        StructField("513", LongType, true),
        StructField("549", LongType, true)
      ))
    )
  )
)

Schema:

root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: struct (nullable = true)
 |    |-- 513: long (nullable = true)
 |    |-- 549: long (nullable = true)

This gives you the schema you want

You can get the data as below and apply the schema

val someData = Seq(
  Row("AAAAAAAAA", "  BBBBB", Row(39l, 38l))
)

val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData), schema
)

df.select("Col1","Col2","Col3.*").show 

Output:

+---------+-------+---+---+
|     Col1|   Col2|513|549|
+---------+-------+---+---+
|AAAAAAAAA|  BBBBB| 39| 38|
+---------+-------+---+---+
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much for your help. Although, this is not accurately what I need. I've edited my question to exaplain my case better. Please have a look.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.