I'm trying to build a Power Query data model from some unpleasantly formatted JSON data, and I'd like to present it more logically to its consumers.
This is a heavily simplified abstraction of my data. My main data set is a list of support cases, structured like this:
[
  {
    "id": 1,
    "subject": "Spill in hallway",
    "custom_data_fields": [
      {
        "id": 234512,
        "value": "Building A"
      },
      {
        "id": 4680123,
        "value": "Maintenance"
      }
    ]
  },
  {
    "id": 2,
    "subject": "Disable user's key access",
    "custom_data_fields": [
      {
        "id": 987123,
        "value": "John Smith"
      },
      {
        "id": 4680123,
        "value": "Security"
      }
    ]
  },
  ...
]
Each id in custom_data_fields identifies a custom field that may or may not be present on a given case. A separate JSON file lists what these custom_data_fields id values refer to, akin to:
[
  {
    "id": 234512,
    "title": "Location",
    "description": "Physical location of this issue"
  },
  {
    "id": 4680123,
    "title": "Department",
    "description": "Department responsible for this issue"
  },
  {
    "id": 987123,
    "title": "Affected User",
    "description": "User affected by this issue"
  },
  ...
]
The list of possible custom_data_fields is mutable: when these data sources are refreshed, there may be more fields, so this data model needs to avoid static renames and definitions where possible. Null values are fine in the data model (they would have to be, since not every case has every field in its nested data set).
I need to relate each case record, its list of custom_data_fields values, and the "reference list" so that this data is clear to report creators and consumers. Users working with the resulting data in Power BI should not need to know what any particular custom_data_fields id refers to; they should just see "Location", "Department", "Affected User", etc. as fields to build reports from. If it makes more sense to have those custom fields in a separate but linked table/query/view, that's fine; I already have something similar (but simpler) for the list of users, for example. I understand that intermediate tables/queries built from subsets of this data may be necessary, but I'm not certain what the logical structure should be, and that's really what I need help with. (I'm not a data scientist, so if I'm using incorrect terminology or something is unclear, I'll do my best to fix that.)
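To make the target shape concrete, here is a minimal sketch in Python rather than Power Query M, using only the sample records above. It illustrates the intended result and is not a proposed implementation: the function name `flatten_cases` is hypothetical, and skipping ids that are missing from the reference list is an assumption. The column names come from the reference list at run time, so nothing is renamed statically.

```python
def flatten_cases(cases, field_defs):
    """Return one flat row per case, with one column per custom field title.

    Hypothetical helper for illustration only. Column names are taken from
    field_defs, so new fields appear automatically after a refresh.
    """
    # Lookup from custom field id to its human-readable title.
    titles = {f["id"]: f["title"] for f in field_defs}

    rows = []
    for case in cases:
        row = {"id": case["id"], "subject": case["subject"]}
        # Start every known field as None so all rows share the same columns.
        for title in titles.values():
            row[title] = None
        for field in case.get("custom_data_fields", []):
            title = titles.get(field["id"])
            # Assumption: ids missing from the reference list are skipped.
            if title is not None:
                row[title] = field["value"]
        rows.append(row)
    return rows


# Sample data copied from the two JSON excerpts above (elided parts omitted).
cases = [
    {"id": 1, "subject": "Spill in hallway", "custom_data_fields": [
        {"id": 234512, "value": "Building A"},
        {"id": 4680123, "value": "Maintenance"},
    ]},
    {"id": 2, "subject": "Disable user's key access", "custom_data_fields": [
        {"id": 987123, "value": "John Smith"},
        {"id": 4680123, "value": "Security"},
    ]},
]
field_defs = [
    {"id": 234512, "title": "Location"},
    {"id": 4680123, "title": "Department"},
    {"id": 987123, "title": "Affected User"},
]

rows = flatten_cases(cases, field_defs)
for row in rows:
    print(row)
```

With this sample, case 1 would have Location and Department filled in and a null Affected User, while case 2 would have Department and Affected User filled in and a null Location. Whether this belongs in one wide table or a separate linked table is exactly the open question.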
