Adding lists from nested JSON file to pandas dataframe?

Question

I have a Json file that looks like the following. I want to grab the strings of names in the "actors" list and add them to a dataframe (which is empty now, the first item added to the dataframe would be the strings of actor names as rows).


{
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley"
        ]
    },
...

I have tried using the following python code to do this but I am unsuccesful, I beleive because I am using a function wrong or not using the right function at all. Any suggestions as to what function/method to use?

# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')

# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)

I strongly beleive that my code is very poor, but have had a hard time finding how to do this when googling.

Tried googling all around, as well as different methods from pandas to add items to dataframes

Andrej Kesely · Accepted Answer · 2022-11-07 23:36:05Z

2

You can use list-comprehension to get actors from the dictionary and then construct a dataframe. For example:

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}

df = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)

Prints:

             Actors
0    Christian Bale
1     Joel Edgerton
2     John Turturro
3        Aaron Paul
4    Ben Mendelsohn
5  Sigourney Weaver
6      Ben Kingsley

answered Nov 7, 2022 at 23:36

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

nic.o Over a year ago

My JSON file does not have the data attribute that encompasses it all, do you suggest that I just add it manually or is there something else I can try? I attempted using df_json instead of data, which is the dataframe with the JSON info, but get TypeError: 'numpy.ndarray' object is not callable

Andrej Kesely Over a year ago

@NicoO Try to use json module to load the json. For example data = json.load(open("movies_metadata.json"))

nic.o Over a year ago

do you have an idea of how I could add another column which is the "title" item from the JSON list to the current dataframe, so each actor is in the same row as their movie? I tried using your respective code that you used for actors but was unsuccesful.

Naveed · Accepted Answer · 2022-11-08 21:00:49Z

1

# read in the json file
df =pd.read_json('txt.json')


#if you have multiple json records, each will be its own columns
# filter the actor rows and then explode 
df.loc['actors',:].explode()

1       Christian Bale
1        Joel Edgerton
1        John Turturro
1           Aaron Paul
1       Ben Mendelsohn
1     Sigourney Weaver
1         Ben Kingsley
2      2Christian Bale
2       2Joel Edgerton
2       2John Turturro
2          2Aaron Paul
2      2Ben Mendelsohn
2    2Sigourney Weaver
2        2Ben Kingsley
Name: actors, dtype: object

Resetting the index

df.loc['actors',:].explode().reset_index()

    index   actors
0   1   Christian Bale
1   1   Joel Edgerton
2   1   John Turturro
3   1   Aaron Paul
4   1   Ben Mendelsohn
5   1   Sigourney Weaver
6   1   Ben Kingsley
7   2   2Christian Bale
8   2   2Joel Edgerton
9   2   2John Turturro
10  2   2Aaron Paul
11  2   2Ben Mendelsohn
12  2   2Sigourney Weaver
13  2   2Ben Kingsley

Alternate Solution

(df[df.index.isin( ['actors','title'])]
 .T
 .explode('actors')
 .reset_index())


    index   title               actors
0   1   Exodus: Gods and Kings  Christian Bale
1   1   Exodus: Gods and Kings  Joel Edgerton
2   1   Exodus: Gods and Kings  John Turturro
3   1   Exodus: Gods and Kings  Aaron Paul
4   1   Exodus: Gods and Kings  Ben Mendelsohn
5   1   Exodus: Gods and Kings  Sigourney Weaver
6   1   Exodus: Gods and Kings  Ben Kingsley
7   2   Exodus: Gods and Kings  2Christian Bale
8   2   Exodus: Gods and Kings  2Joel Edgerton
9   2   Exodus: Gods and Kings  2John Turturro
10  2   Exodus: Gods and Kings  2Aaron Paul
11  2   Exodus: Gods and Kings  2Ben Mendelsohn
12  2   Exodus: Gods and Kings  2Sigourney Weaver
13  2   Exodus: Gods and Kings  2Ben Kingsley

PS: I expanded your JSON file to hae two records in it

edited Nov 8, 2022 at 21:00

answered Nov 7, 2022 at 23:38

Naveed

11.7k2 gold badges16 silver badges21 bronze badges

3 Comments

nic.o Over a year ago

This works perfectly, thank you. How would I go about in adding an index? I attempted df_actor.columns = ['movie_id', 'actor_name'] which didnt work, and If im not wrong I cant use set_index unless I have column names

Naveed Over a year ago

do a reset_index() @NicoO

Naveed Over a year ago

@NicoO, added alternate solution with title added as a column

Collectives™ on Stack Overflow

Adding lists from nested JSON file to pandas dataframe?

2 Answers 2

3 Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related