5

Here is a JSON string that contains a list of objects with each having another list embedded.

[
  {
    "name": "Alice",
    "hobbies": [
      "volleyball",
      "shopping",
      "movies"
    ]
  },
  {
    "name": "Bob",
    "hobbies": [
      "fishing",
      "movies"
    ]
  }
]

Using pandas.read_json() this turns into a DataFrame like this:

  name      hobbies
  --------------------------------------
1 Alice     [volleyball, shopping, movies]
2 Bob       [fishing, movies]

However, I would like to flatten the lists into Boolean columns like this:

  name      volleyball  shopping    movies  fishing 
  ----------------------------------------------------
1 Alice     True        True        True    False
2 Bob       False       False       True    True

I.e. when the list contains a value, the field in the corresponding column is filled with a Boolean True, otherwise with False.

I have also looked into pandas.io.json.json_normalize(), but that does not seem support this idea either. Is there any built-in way (either Python3, or pandas) to do this?

(PS. I realize that you can cook up your own code to 'normalize' the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel with that and probably in a very inefficient way).

2 Answers 2

2

you can do the following:

In [56]: data = [
   ....:   {
   ....:     "name": "Alice",
   ....:     "hobbies": [
   ....:       "volleyball",
   ....:       "shopping",
   ....:       "movies"
   ....:     ]
   ....:   },
   ....:   {
   ....:     "name": "Bob",
   ....:     "hobbies": [
   ....:       "fishing",
   ....:       "movies"
   ....:     ]
   ....:   }
   ....: ]

 In [57]: df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})

In [59]: df['count'] = 1

In [60]: df
Out[60]:
        hobby   name  count
0  volleyball  Alice      1
1    shopping  Alice      1
2      movies  Alice      1
3     fishing    Bob      1
4      movies    Bob      1

In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

Or even better:

In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)

In [89]: r
Out[89]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

let's generate list of 'boolean' columns dynamically

In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']

In [91]: r = r[cols_boolean].astype(bool)

In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice   False   True     True       True
Bob      True   True    False      False
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, in the mean time I have actually solved this with iterating over the dictionaries in the list. I will study your code.
1

You can use crosstab with cast to bool by astype:

df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
print df
        hobby   name
0  volleyball  Alice
1    shopping  Alice
2      movies  Alice
3     fishing    Bob
4      movies    Bob

print pd.crosstab(df.name, df.hobby).astype(bool)

hobby fishing movies shopping volleyball
name                                    
Alice   False   True     True       True
Bob      True   True    False      False

1 Comment

df = pd.json_normalize (data, 'hobbies', ['name']).rename(columns={0:'hobby'}) as <string>:20: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.