
I'm trying to read a JSON config file into my Scala project. The format of the JSON is as follows:

{
  "parameters": [
    {
      "name": "testInteger",
      "type": "Integer",
      "value": "10"
    },
    {
      "name": "testString",
      "type": "String",
      "value": "yeah"
    }
  ]
}

I have been using Spark to generate a DataFrame:

val df = spark.read.option("multiline","true").json(path)

I need the data from the JSON file to be read into a Map keyed on "name", with each value converted to the specified type.

Expected Output:

Map: "testInteger" -> 10
     "testString" -> "yeah"

I am new to Scala and unsure where to start; any advice would be appreciated.

(Note: using Java 8 and IntelliJ to write.)

  • Can you add your expected output? Commented May 22, 2020 at 4:29
  • What do you want to do with it after converting to a map? Commented May 22, 2020 at 5:05
  • Ideally it would be "testInteger" -> 10, "testString" -> "yeah" Commented May 22, 2020 at 5:15
  • I should clarify, there is potentially going to be more than one Map object. In this case the map would be called Parameters and would map a string (name) to the value of the specified type. I need to have the functionality to expand to include more maps Commented May 22, 2020 at 5:18

1 Answer


So, this is what you should do:

  1. Create a SparkSession:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{ArrayType, StructType}

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._
  2. Create the schema:
val schema = new StructType().add(
  "parameters", ArrayType.apply(
    new StructType()
      .add("name", "string")
      .add("type", "string")
      .add("value", "string")
  ))
  3. Read the data set:
val df = spark.read
  .option("multiline", "true")
  .schema(schema)
  .json("/path/to/json")
  .select(explode(col("parameters")).alias("params"))

This will give you a struct column called 'params' with fields name, type, and value. The schema will look like this:

root
 |-- params: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- type: string (nullable = true)
 |    |-- value: string (nullable = true)
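
If it helps to inspect the data, the struct fields can be selected as plain columns (this reuses df from step 3; the rows shown are what the sample JSON from the question would produce):

val flat = df.select("params.name", "params.type", "params.value")
flat.show()
// With the sample JSON this prints two rows, roughly:
//   testInteger | Integer | 10
//   testString  | String  | yeah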

Note: struct and map type columns impose type safety, so the schema cannot hold values of different types in the same column. That is why everything in the value field is read as a string. For your use case, you can cast values at runtime based on the type field, for example with a UDF.
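
If the map is ultimately needed on the driver (outside Spark), a minimal sketch of that last step, assuming the df from step 3 and the imports above, is to collect the rows and cast each value based on its declared type:

val parameters: Map[String, Any] = df
  .select(col("params.name"), col("params.type"), col("params.value"))
  .collect()
  .map { row =>
    val name  = row.getString(0)
    val tpe   = row.getString(1)
    val value = row.getString(2)
    // convert the string value according to the declared type; extend as needed
    val typed: Any = tpe match {
      case "Integer" => value.toInt
      case "Boolean" => value.toBoolean
      case _         => value // fall back to the raw string
    }
    name -> typed
  }
  .toMap

// parameters("testInteger") == 10 and parameters("testString") == "yeah"

This keeps the Spark-side schema uniform (all strings) and performs the per-type conversion only when building the map; if the conversion needs to stay inside Spark, a UDF driven by the type field works the same way.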
