I have a list in which some elements are empty lists and others contain nested JSON objects. The data looks like this:

 [[],
 [],
 [{'id': 32,
   'globalId': 'a73dec29-9431-4806-a4f7-0667872746ce',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_9774.jpeg',
   'contentType': 'image/jpeg',
   'size': 157893,
   'keywords': '',
   'exifInfo': None},
  {'id': 33,
   'globalId': '0455db91-946e-4fae-8aab-0a4729219527',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_9766.jpeg',
   'contentType': 'image/jpeg',
   'size': 160480,
   'keywords': '',
   'exifInfo': None},
  {'id': 34,
   'globalId': '4c036305-a1c5-4689-8640-1dc79aaf0358',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_3870.jpeg',
   'contentType': 'image/jpeg',
   'size': 757939,
   'keywords': '',
   'exifInfo': None},
  {'id': 35,
   'globalId': '1868ac95-1830-45fb-8f15-975ef0e14338',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_2357.jpeg',
   'contentType': 'image/jpeg',
   'size': 4500893,
   'keywords': '',
   'exifInfo': None}],
 []]

Using a simple json_normalize():

import pandas as pd

test = pd.json_normalize(attach)
test

I get the following result:

    0   1   2   3   4
0   None    None    None    None    None
1   None    None    None    None    None
2   None    None    None    None    None
3   None    None    None    None    None
4   None    None    None    None    None
... ... ... ... ... ...
83  None    None    None    None    None
84  None    None    None    None    None
85  None    None    None    None    None
86  {'id': 32, 'globalId': 'a73dec29-9431-4806-a4f...   {'id': 33, 'globalId': '0455db91-946e-4fae-8aa...   {'id': 34, 'globalId': '4c036305-a1c5-4689-864...   {'id': 35, 'globalId': '1868ac95-1830-45fb-8f1...   None
87  None    None    None    None    None

I would ideally have a dataframe with each key in the JSON object as a column name, something like:

id  globalId                parentGlobalId              name        contentType size    keywords    exifInfo
None    None                    None                    None        None        None    None        None
None    None                    None                    None        None        None    None        None
32  a73dec29-9431-4806-a4f7-0667872746ce    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_9774.jpeg   image/jpeg  157893  None        None
33  0455db91-946e-4fae-8aab-0a4729219527    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_9766.jpeg   image/jpeg  160480  None        None
34  4c036305-a1c5-4689-8640-1dc79aaf0358    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_3870.jpeg   image/jpeg  757939  None        None
None    None                    None                    None        None        None    None        None

I've experimented a bunch with the parameters in the json_normalize() method with no luck.

1 Answer

If lst is your list from the question, you can do:

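import pandas as pd
# Flatten the nested list; each empty sub-list becomes one empty dict (an all-NaN row).
df = pd.DataFrame([d for l in lst for d in (l or [{}])])
print(df)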

Prints:

     id                              globalId                        parentGlobalId           name contentType       size keywords  exifInfo
0   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
1   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
2  32.0  a73dec29-9431-4806-a4f7-0667872746ce  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_9774.jpeg  image/jpeg   157893.0                NaN
3  33.0  0455db91-946e-4fae-8aab-0a4729219527  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_9766.jpeg  image/jpeg   160480.0                NaN
4  34.0  4c036305-a1c5-4689-8640-1dc79aaf0358  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_3870.jpeg  image/jpeg   757939.0                NaN
5  35.0  1868ac95-1830-45fb-8f15-975ef0e14338  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_2357.jpeg  image/jpeg  4500893.0                NaN
6   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
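
Note that id and size print as floats (32.0, 157893.0) because the placeholder rows introduce NaN, which a plain integer column cannot hold. If whole numbers matter, one option is pandas' nullable integer dtype. The following is a minimal sketch, assuming a shortened, made-up sample rather than the full data from the question:

```python
import pandas as pd

# Hypothetical, shortened sample with the same nesting as the question's data.
lst = [[], [{"id": 32, "size": 157893}, {"id": 33, "size": 160480}], []]

# Same flattening as in the answer: empty sub-lists become empty dicts.
df = pd.DataFrame([d for l in lst for d in (l or [{}])])

# Cast to the nullable integer dtype so missing values show as <NA>
# and the real values stay whole numbers.
df = df.astype({"id": "Int64", "size": "Int64"})
print(df)
```

The cast only works here because every non-missing value is already integer-valued; columns holding genuine fractions would need to stay float.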

3 Comments

Thank you! Can you please explain this loop for me?
@ZacStanley I've just flattened the list so that it is a single list of dictionaries, substituting each empty list ([]) with an empty dictionary ({}). That way pandas constructs the correct dataframe.
Ah! @Andrej Kesely. The empty list wasn't allowing the flattening. Very helpful. Thank you so much.
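
For readers who prefer an explicit loop, the comprehension from the answer can be unrolled roughly as follows. This is only a sketch, assuming a small made-up lst with the same shape as the question's data; the names rows and sublist are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample with the same shape as the question's data (fields shortened).
lst = [
    [],
    [{"id": 32, "name": "IMG_9774.jpeg"}, {"id": 33, "name": "IMG_9766.jpeg"}],
    [],
]

rows = []
for sublist in lst:
    if sublist:
        # Non-empty sub-list: each dictionary becomes its own row.
        rows.extend(sublist)
    else:
        # Empty sub-list: add a placeholder so it still yields one all-NaN row.
        rows.append({})

df = pd.DataFrame(rows)
print(df)
```

The `l or [{}]` expression in the one-liner does the same job as the if/else here, since an empty list is falsy in Python.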
