I have a JSON file that looks like the following. I want to grab the actor names in the "actors" list and add them to a dataframe (which is currently empty; the actor names would be the first thing added, one name per row).


{
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley"
        ]
    },
...

I have tried the following Python code, but without success, I believe because I am using a function incorrectly or not using the right function at all. Any suggestions as to which function or method to use?

# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')

# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)

I strongly believe that my code is very poor, but I have had a hard time finding how to do this when googling.

I tried googling all around, as well as different pandas methods for adding items to dataframes.

2 Answers

You can use a list comprehension to get the actors from the dictionary and then construct a dataframe. For example:

import pandas as pd

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}

df = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)

Prints:

             Actors
0    Christian Bale
1     Joel Edgerton
2     John Turturro
3        Aaron Paul
4    Ben Mendelsohn
5  Sigourney Weaver
6      Ben Kingsley
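
If the records live in a file rather than in a dict literal, the standard json module can produce the same kind of dict. The following is a minimal sketch, assuming the file has the structure shown in the question; the trimmed sample record and the temporary file are stand-ins so the snippet runs on its own, and in practice the path would point at movies_metadata.json.

```python
import json
import os
import tempfile

import pandas as pd

# Trimmed sample mirroring the structure in the question (assumption:
# the real file has the same top-level keys and an "actors" list).
sample = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "year": "2014",
        "actors": ["Christian Bale", "Joel Edgerton", "John Turturro"],
    }
}

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as f:
    json.dump(sample, f)
    path = f.name

# json.load returns a plain dict, so the list comprehension works unchanged.
with open(path, encoding="utf-8") as f:
    data = json.load(f)
os.remove(path)

df_actors = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df_actors)
```

Loading with json.load keeps the nested lists as Python lists, which is why the comprehension can iterate over v["actors"] directly.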

3 Comments

My JSON file does not have the data attribute that encompasses it all. Do you suggest that I just add it manually, or is there something else I can try? I attempted using df_json instead of data, which is the dataframe with the JSON info, but I get TypeError: 'numpy.ndarray' object is not callable
@NicoO Try using the json module to load the JSON, for example data = json.load(open("movies_metadata.json"))
Do you have an idea how I could add another column holding the "title" item from the JSON, so that each actor is in the same row as their movie? I tried adapting the code you used for actors but was unsuccessful.
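
For the follow-up about pairing each actor with a title, one possible approach is to extend the same comprehension so it emits one dict per (movie, actor) pair. This is only a sketch: the second record and the column names movie_id, title and actor are made up for illustration and are not part of the original data.

```python
import pandas as pd

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "actors": ["Christian Bale", "Joel Edgerton"],
    },
    # Hypothetical second record, added only to show several movies.
    "2": {
        "title": "Placeholder Title",
        "actors": ["Placeholder Actor A", "Placeholder Actor B"],
    },
}

# One row per (movie, actor) pair; the title is repeated for each actor.
rows = [
    {"movie_id": movie_id, "title": movie["title"], "actor": actor}
    for movie_id, movie in data.items()
    for actor in movie["actors"]
]
df_pairs = pd.DataFrame(rows, columns=["movie_id", "title", "actor"])
print(df_pairs)
```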
1
import pandas as pd

# Read in the JSON file
df = pd.read_json('txt.json')

# With multiple JSON records, each record becomes its own column.
# Select the 'actors' row, then explode the lists into one row per actor.
df.loc['actors', :].explode()

1       Christian Bale
1        Joel Edgerton
1        John Turturro
1           Aaron Paul
1       Ben Mendelsohn
1     Sigourney Weaver
1         Ben Kingsley
2      2Christian Bale
2       2Joel Edgerton
2       2John Turturro
2          2Aaron Paul
2      2Ben Mendelsohn
2    2Sigourney Weaver
2        2Ben Kingsley
Name: actors, dtype: object

Resetting the index

df.loc['actors',:].explode().reset_index()

    index   actors
0   1   Christian Bale
1   1   Joel Edgerton
2   1   John Turturro
3   1   Aaron Paul
4   1   Ben Mendelsohn
5   1   Sigourney Weaver
6   1   Ben Kingsley
7   2   2Christian Bale
8   2   2Joel Edgerton
9   2   2John Turturro
10  2   2Aaron Paul
11  2   2Ben Mendelsohn
12  2   2Sigourney Weaver
13  2   2Ben Kingsley

Alternate Solution

(df[df.index.isin( ['actors','title'])]
 .T
 .explode('actors')
 .reset_index())

    index   title               actors
0   1   Exodus: Gods and Kings  Christian Bale
1   1   Exodus: Gods and Kings  Joel Edgerton
2   1   Exodus: Gods and Kings  John Turturro
3   1   Exodus: Gods and Kings  Aaron Paul
4   1   Exodus: Gods and Kings  Ben Mendelsohn
5   1   Exodus: Gods and Kings  Sigourney Weaver
6   1   Exodus: Gods and Kings  Ben Kingsley
7   2   Exodus: Gods and Kings  2Christian Bale
8   2   Exodus: Gods and Kings  2Joel Edgerton
9   2   Exodus: Gods and Kings  2John Turturro
10  2   Exodus: Gods and Kings  2Aaron Paul
11  2   Exodus: Gods and Kings  2Ben Mendelsohn
12  2   Exodus: Gods and Kings  2Sigourney Weaver
13  2   Exodus: Gods and Kings  2Ben Kingsley

PS: I expanded your JSON file to have two records in it
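
A possible variation that avoids the transpose is to build the frame with each record as a row from the start. The sketch below assumes the JSON has already been loaded into a dict (for instance with the json module) and uses DataFrame.from_dict with orient="index"; the second record is a placeholder, not real data.

```python
import pandas as pd

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "year": "2014",
        "actors": ["Christian Bale", "Joel Edgerton"],
    },
    # Placeholder second record, for illustration only.
    "2": {
        "title": "Placeholder Title",
        "year": "2000",
        "actors": ["Placeholder Actor"],
    },
}

# orient="index" turns each top-level key into a row, so no .T is needed.
movies = pd.DataFrame.from_dict(data, orient="index")

# One row per (movie, actor); the movie key stays in the index.
title_actors = movies[["title", "actors"]].explode("actors")
print(title_actors)
```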

3 Comments

This works perfectly, thank you. How would I go about adding an index? I attempted df_actor.columns = ['movie_id', 'actor_name'], which didn't work, and if I'm not wrong I can't use set_index unless I have column names.
Do a reset_index(), @NicoO
@NicoO, added alternate solution with title added as a column
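
On the column-naming question from the comments: the exploded result is a Series, so there is only a name and an index to label, not a list of columns. A hedged sketch of one way to end up with the movie_id and actor_name columns mentioned above is to name the index and the Series before calling reset_index(). The small frame here is built inline to imitate what read_json returns for this layout and is an assumption, not the original file.

```python
import pandas as pd

# Stand-in for the frame read_json produces: one column per record,
# one row per field (assumed layout, with made-up second record).
df = pd.DataFrame(
    {
        1: {"title": "Exodus: Gods and Kings", "actors": ["Christian Bale", "Joel Edgerton"]},
        2: {"title": "Placeholder Title", "actors": ["Placeholder Actor"]},
    }
)

df_actor = (
    df.loc["actors", :]
    .explode()                # one entry per actor, index = record key
    .rename_axis("movie_id")  # name the index so reset_index labels it
    .rename("actor_name")     # name the Series, which becomes the column name
    .reset_index()
)
print(df_actor)
```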
