A JSON property can hold either a single value or an array of values, e.g. "A": "ae" or "A": ["ab", "cd"]. When I read newline-delimited JSON files (each line is a JSON object), my target property may contain either a single value or an array of values of the same type.

I need to use explode() to get at the data, but explode() fails on rows where the property holds a single value.

So my question is: how can I make every value of a given property an array in a Spark DataFrame?

1 Answer

If all values are scalars, you can use the array function:

import org.apache.spark.sql.functions._

array(col("A")).alias("A")

spark.read.json(Seq("""{"A": "ae"}""").toDS)
  .select(array(col("A")).alias("A"))
  .show
// +----+
// |   A|
// +----+
// |[ae]|
// +----+

If the values are mixed, you have to parse them manually:

import org.apache.spark.sql.types._

Seq("""{"A": "ae"}""", """{"A": ["ab", "cd"]}""").toDS.select(coalesce(
    // Attempt to parse value as array<string>
    from_json($"value", StructType.fromDDL("A array<string>"))("A"),
    // If the first one fails, try to extract it as string and enclose with array
    array(get_json_object($"value", "$.A"))
).alias("A")).show

// +--------+
// |       A|
// +--------+
// |    [ae]|
// |[ab, cd]|
// +--------+

If you use an older Spark version, replace:

StructType.fromDDL("A array<string>")

with

StructType(Seq(StructField("A", ArrayType(StringType))))
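
Since the end goal in the question is explode(), here is a minimal sketch of the whole flow: normalize "A" to an array as above, then explode it. It assumes a Spark version that provides from_json; the names lines, normalized and exploded are illustrative only. For real files, spark.read.textFile(path) should give the same single "value" column of raw lines used here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// In spark-shell a session named `spark` already exists; building one is only
// needed in a standalone application.
val spark = SparkSession.builder.master("local[*]").appName("normalize-A").getOrCreate()
import spark.implicits._

// Each element stands for one line of the newline-delimited JSON input.
val lines = Seq("""{"A": "ae"}""", """{"A": ["ab", "cd"]}""").toDS

// Schema used for the first parsing attempt: A as array<string>.
val schema = StructType(Seq(StructField("A", ArrayType(StringType))))

val normalized = lines.select(coalesce(
  // Yields the array when A really is an array, null otherwise.
  from_json($"value", schema)("A"),
  // Fallback: read A as a plain string and wrap it in a one-element array.
  array(get_json_object($"value", "$.A"))
).alias("A"))

// Every row now holds an array, so explode() can be applied uniformly.
val exploded = normalized.select(explode($"A").alias("A"))
exploded.show()
```

The second argument of get_json_object is a JSONPath-style expression: "$" stands for the root of the JSON document and ".A" selects its field named A, so "$.A" reads the top-level property A as a string. Note that when A is missing altogether, the fallback branch produces an array containing a single null rather than a null array.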

4 Comments

It looks like you are using an older version of Spark.
I got an error: "error: not found: value from_json", pointing at from_json($"value", StructType(Seq(StructField("A", ArrayType(StringType)))))("A"). Is there an equivalent of from_json() for older versions?
There isn't one. You could create a UDF, but if you are using a Spark version that old (2.0 or earlier), it is really time to upgrade.
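
For reference, a UDF along those lines might look like the sketch below. It is only one possible approach, not something taken from the answer: it assumes the json4s library that ships with Spark is available on the classpath, and toStringSeq and toStringSeqUdf are hypothetical names.

```scala
import org.apache.spark.sql.functions.udf
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import scala.util.Try

// Hypothetical helper: returns property "A" of one JSON line as a sequence of
// strings, whether it was stored as a single string or as an array of strings.
def toStringSeq(json: String): Seq[String] =
  Try(parse(json) \ "A").toOption match {
    case Some(JString(s)) => Seq(s)                                 // scalar value
    case Some(JArray(xs)) => xs.collect { case JString(s) => s }    // array value
    case _                => Seq.empty                              // missing, other type, or malformed line
  }

// Wrap the plain function so it can be applied to a DataFrame column.
val toStringSeqUdf = udf(toStringSeq _)
```

It could then be applied to the raw lines with something like select(toStringSeqUdf($"value").alias("A")). Unlike the coalesce approach, this sketch returns an empty array for lines where A is missing or the JSON is malformed.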
I've just checked in my spark-shell and your first solution works fine. Thank you. Just one more question: what is the meaning of "$.A"?
