0

I'm attempting to create a Power Query data model with some unpleasantly-formatted JSON data, and would like to present it more logically to its consumers.

This is a heavily simplified abstraction of my data. My main data set is a list of support cases, structured like this:

[
  {
    "id": 1,
    "subject": "Spill in hallway",
    "custom_data_fields": [
      {
        "id": 234512,
        "value": "Building A"
      },
      {
        "id": 4680123,
        "value": "Maintenance"
      }
    ]
  },
  {
    "id": 2,
    "subject": "Disable user's key access",
    "custom_data_fields": [
      {
        "id": 987123,
        "value": "John Smith"
      },
      {
        "id": 4680123,
        "value": "Security"
      }
    ]
  },
  ...
]

Each id in custom_data_fields refers to a different field that could be in each record. I have another JSON file with its own list of what these custom_data_fields id values refer to, akin to:

[
  {
    "id": 234512,
    "title": "Location",
    "description": "Physical location of this issue"
  },
  {
    "id": 4680123,
    "title": "Department",
    "description": "Department responsible for this issue"
  },
  {
    "id": 987123,
    "title": "Affected User",
    "description": "User affected by this issue"
  },
  ...
]

The list of possible custom_data_fields is mutable—when these data sources are refreshed, there may be more fields, so this data model needs to avoid static renames and definitions where possible. Null values are fine in the data model (they would have to be, since not every case has every field in its nested data set.)

I need to set up a relationship between each case record and its list of custom_data_field values and the "reference list" such that this data is clear to report creators and consumers. The goal is for users working with the resulting data in Power BI to not need to know what any particular custom_data_field id refers to—they should just see "Location", "Department", "Affected User", etc. as fields to build reports from. If it makes more sense to have those custom fields in a separate but linked table/query/view, that's fine; I already have something similar (but simpler) for the list of users, for example. I understand that intermediate tables/queries from subsets of this data may be necessary, but I'm not certain exactly what the logical structure should be—that's really what I need help with. (I'm not a data scientist, so if I'm using incorrect terminology or something is unclear, I'll do my best to fix that.)

1
  • What would you expect for a result from the data you have presented in the two tables? Commented Jul 3 at 19:49

2 Answers 2

1

Try the following three queries as a demo. The following was performed all via the UI and without hard-coding any field names. In essence, unpack the first JSON, and do the same for the second JSON. Then merge the two, and pivot on the title column. Which will give a resulting table in the following format (a fat table):

id subject Location Department Affected User
1 Spill in hallway Building A Maintenance
2 Disable user's key access Security John Smith
// Query1
let
    Source = "[#(cr)#(lf)  {#(cr)#(lf)    ""id"": 1,#(cr)#(lf)    ""subject"": ""Spill in hallway"",#(cr)#(lf)    ""custom_data_fields"": [#(cr)#(lf)      {#(cr)#(lf)        ""id"": 234512,#(cr)#(lf)        ""value"": ""Building A""#(cr)#(lf)      },#(cr)#(lf)      {#(cr)#(lf)        ""id"": 4680123,#(cr)#(lf)        ""value"": ""Maintenance""#(cr)#(lf)      }#(cr)#(lf)    ]#(cr)#(lf)  },#(cr)#(lf)  {#(cr)#(lf)    ""id"": 2,#(cr)#(lf)    ""subject"": ""Disable user's key access"",#(cr)#(lf)    ""custom_data_fields"": [#(cr)#(lf)      {#(cr)#(lf)        ""id"": 987123,#(cr)#(lf)        ""value"": ""John Smith""#(cr)#(lf)      },#(cr)#(lf)      {#(cr)#(lf)        ""id"": 4680123,#(cr)#(lf)        ""value"": ""Security""#(cr)#(lf)      }#(cr)#(lf)    ]#(cr)#(lf)  }#(cr)#(lf)]",
    #"Parsed JSON" = Json.Document(Source),
    #"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "subject", "custom_data_fields"}, {"id", "subject", "custom_data_fields"}),
    #"Expanded custom_data_fields" = Table.ExpandListColumn(#"Expanded Column1", "custom_data_fields"),
    #"Expanded custom_data_fields1" = Table.ExpandRecordColumn(#"Expanded custom_data_fields", "custom_data_fields", {"id", "value"}, {"id.1", "value"})
in
    #"Expanded custom_data_fields1"

// Query2
let
    Source = "[#(cr)#(lf)  {#(cr)#(lf)    ""id"": 234512,#(cr)#(lf)    ""title"": ""Location"",#(cr)#(lf)    ""description"": ""Physical location of this issue""#(cr)#(lf)  },#(cr)#(lf)  {#(cr)#(lf)    ""id"": 4680123,#(cr)#(lf)    ""title"": ""Department"",#(cr)#(lf)    ""description"": ""Department responsible for this issue""#(cr)#(lf)  },#(cr)#(lf)  {#(cr)#(lf)    ""id"": 987123,#(cr)#(lf)    ""title"": ""Affected User"",#(cr)#(lf)    ""description"": ""User affected by this issue""#(cr)#(lf)  }]",
    #"Parsed JSON" = Json.Document(Source),
    #"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "title", "description"}, {"id", "title", "description"})
in
    #"Expanded Column1"

// Merge1
let
    Source = Table.NestedJoin(Query1, {"id.1"}, Query2, {"id"}, "Query2", JoinKind.LeftOuter),
    #"Expanded Query2" = Table.ExpandTableColumn(Source, "Query2", {"title"}, {"title"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Query2",{"id.1"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[title]), "title", "value")
in
    #"Pivoted Column"
Sign up to request clarification or add additional context in comments.

Comments

1

Depending on the relation of your real data to your sample, and what exactly you want for output, the following PQ should produce your desired results:

  • Read in the two json documents as separate queries.
    • I named them Json Data and Json Map
  • Expand them to their relevant columns
  • Combine the two tables using the id fields as the key
  • Recover the title column from the joined table
  • Group by the id column
  • Extract the department from the title column and the value from the custom_data_fields.value column.

Custom Function to set Data Types in the record
Enter into a blank query in the Advanced Editor and rename as per the comment in the code

//Rename Query "fnRecordTypes"

(fieldNames as list, fieldTypes as list)=>
   
let
    rowColumnTypes = List.Transform(fieldTypes, (t) => [Type = t, Optional = false]),
    rowType = Type.ForRecord(Record.FromList(rowColumnTypes, fieldNames),false)
in
   rowType

Json Data

let
    Source = Json.Document(File.Contents("C:\Users\ron\Desktop\json data.json")),
    
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "subject", "custom_data_fields"}, {"id", "subject", "custom_data_fields"}),
    #"Expanded custom_data_fields" = Table.ExpandListColumn(#"Expanded Column1", "custom_data_fields"),
    #"Expanded custom_data_fields1" = Table.ExpandRecordColumn(#"Expanded custom_data_fields", "custom_data_fields", {"id", "value"}, {"custom_data_fields.id", "custom_data_fields.value"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded custom_data_fields1",{{"id", Int64.Type}, {"subject", type text}, {"custom_data_fields.id", Int64.Type}, {"custom_data_fields.value", type text}})
in
    #"Changed Type"

Json Map

let
    Source = Json.Document(File.Contents("C:\Users\ron\Desktop\JsonMap.json")),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "title", "description"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{
        {"id", Int64.Type}, {"title", type text}, {"description", type text}})
in
    #"Changed Type"

Join the two Json's

let
    Source = Table.NestedJoin(#"Json Data", {"custom_data_fields.id"}, #"Json Map", {"id"}, "json data", JoinKind.LeftOuter),
    #"Removed Columns" = Table.RemoveColumns(Source,{"custom_data_fields.id"}),

    #"Get Titles" = Table.ExpandTableColumn(#"Removed Columns", "json data", {"title"}),
    
    #"Column Headers" = {"subject"} & List.Distinct(#"Get Titles"[title]),

    #"Grouped Rows" = Table.Group(#"Get Titles", {"id"}, {
        {"x", each Record.FromList({[subject]{0}} & [custom_data_fields.value], {"subject"} & [title]), 
            fnRecordTypes(#"Column Headers", List.Repeat({type text}, List.Count(#"Column Headers")))
        }}),
        
    #"Expanded x" = Table.ExpandRecordColumn(#"Grouped Rows", "x", #"Column Headers")
in
    #"Expanded x"

Results
enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.