0

I have this particular JSON file which is a log generated by PostgreSQL. I turned this JSON file from multi-line format to single line format. In one of the fields in the Dataframe that I parse there is a String Column. This String Column is by itself in JSON format with this example:

"query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "671.64"
    },
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "100.00"
      },
      "table": {
        "table_name": "test1",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "id"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 100,
        "rows_produced_per_join": 100,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "21.64",
          "eval_cost": "20.00",
          "prefix_cost": "41.64",
          "data_read_per_join": "18K"
        },
        "used_columns": [
          "id",
          "c"
        ],
        "attached_condition": "(`proxydemo`.`test1`.`id` between 501747 and <cache>((504767 + 99)))"
      }
    }
  }
}

I know that in Spark 2.0+ I can use

from_json(e: Column, schema: StructType): Column

function from SparkSQL functions. But I am not sure what should be the schema for this String. I have done many schema and StructType definitions but this one is kinda hierarchical! and I do not understand how this schema should be defined! `

1
  • I just wanted to add that I do not ask for someone to complete the schema for me, only to know what is the technique for explaining this hierarchical data's schema? Commented Apr 16, 2017 at 16:36

1 Answer 1

0

I found out how nested schemas work.

In this particular example schemas go as this:

For the root of the object:

  val query_block_schema = (new StructType)
      .add("select_id", LongType)
      .add("cost_info", StringType)
      .add("ordering_operation", StringType)

For the second layer:

  val query_plan_schema = (new StructType)
    .add("query_block", StringType)

and so on...

So I consider this problem as solved. Later on, I merge all these together in case they are not null and basically flat the whole nested object.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.