pandas | Read JSON file with list/array-like fields to Boolean columns

Question

Here is a JSON string that contains a list of objects with each having another list embedded.

[
  {
    "name": "Alice",
    "hobbies": [
      "volleyball",
      "shopping",
      "movies"
    ]
  },
  {
    "name": "Bob",
    "hobbies": [
      "fishing",
      "movies"
    ]
  }
]

Using pandas.read_json() this turns into a DataFrame like this:

  name      hobbies
  --------------------------------------
1 Alice     [volleyball, shopping, movies]
2 Bob       [fishing, movies]

However, I would like to flatten the lists into Boolean columns like this:

  name      volleyball  shopping    movies  fishing 
  ----------------------------------------------------
1 Alice     True        True        True    False
2 Bob       False       False       True    True

I.e. when the list contains a value, the field in the corresponding column is filled with a Boolean True, otherwise with False.

I have also looked into pandas.io.json.json_normalize(), but that does not seem to support this idea either. Is there any built-in way (either in Python 3 or in pandas) to do this?

(PS. I realize that you can cook up your own code to 'normalize' the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel with that and probably in a very inefficient way).
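For reference, a minimal sketch of the hand-rolled approach mentioned in the PS: flatten each dictionary into a name plus one True/False flag per hobby before building the DataFrame. The names `all_hobbies`, `records` and `flat` are illustrative only and not part of any library.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

# Collect every distinct hobby, preserving first-seen order.
all_hobbies = list(dict.fromkeys(h for person in data for h in person["hobbies"]))

# Build one flat record per person: the name plus a True/False flag per hobby.
records = [
    {"name": person["name"], **{h: h in person["hobbies"] for h in all_hobbies}}
    for person in data
]

flat = pd.DataFrame(records)
print(flat)
```

This keeps the columns in the order the hobbies first appear, which matches the layout asked for above, but it does the work in plain Python loops rather than in pandas.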

MaxU - stand with Ukraine · Accepted Answer · 2016-03-16 10:59:57Z

2

You can do the following:

In [55]: import pandas as pd

In [56]: data = [
   ....:   {
   ....:     "name": "Alice",
   ....:     "hobbies": [
   ....:       "volleyball",
   ....:       "shopping",
   ....:       "movies"
   ....:     ]
   ....:   },
   ....:   {
   ....:     "name": "Bob",
   ....:     "hobbies": [
   ....:       "fishing",
   ....:       "movies"
   ....:     ]
   ....:   }
   ....: ]

In [57]: df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})

In [59]: df['count'] = 1

In [60]: df
Out[60]:
        hobby   name  count
0  volleyball  Alice      1
1    shopping  Alice      1
2      movies  Alice      1
3     fishing    Bob      1
4      movies    Bob      1

In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

Or even better:

In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)

In [89]: r
Out[89]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0

Let's generate the list of Boolean columns dynamically and cast them to bool:

In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']

In [91]: r = r[cols_boolean].astype(bool)

In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice   False   True     True       True
Bob      True   True    False      False
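The steps above can be collected into one self-contained script. This is only a sketch under a few assumptions: it uses the top-level pd.json_normalize spelling (available in pandas 1.0 and later), and it adds a reset_index() step, not part of the session above, to turn the name index back into an ordinary column as in the layout the question asks for. The variable name `result` is illustrative.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

# One row per (name, hobby) pair; plain string records land in a column labelled 0.
df = pd.json_normalize(data, "hobbies", ["name"]).rename(columns={0: "hobby"})
df["count"] = 1

# Pivot to one column per hobby; missing pairs become NaN, then 0, then False.
r = (
    df.pivot_table(index="name", columns="hobby", values="count")
      .fillna(0)
      .astype(bool)
)

# Turn the index back into a regular 'name' column and drop the 'hobby' axis label.
result = r.reset_index()
result.columns.name = None
print(result)
```

Note that the pivot sorts the hobby columns alphabetically, so the column order differs from the order in which the hobbies appear in the JSON.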

edited Mar 16, 2016 at 10:59

answered Mar 16, 2016 at 10:24


1 Comment

imrek Over a year ago

Thank you. In the meantime I have actually solved this by iterating over the dictionaries in the list. I will study your code.

jezrael · Accepted Answer · 2021-03-22 09:09:02Z

1

You can use crosstab and cast the result to bool with astype:

df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
print(df)
        hobby   name
0  volleyball  Alice
1    shopping  Alice
2      movies  Alice
3     fishing    Bob
4      movies    Bob

print(pd.crosstab(df.name, df.hobby).astype(bool))

hobby fishing movies shopping volleyball
name                                    
Alice   False   True     True       True
Bob      True   True    False      False
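If the data has already been loaded with pandas.read_json(), as in the question, the same crosstab idea can be applied without going back to the raw list. The sketch below assumes pandas 0.25 or later for DataFrame.explode, and wraps the JSON text in io.StringIO so that read_json receives a file-like object. The names `json_text`, `people`, `long` and `flags` are illustrative.

```python
import io

import pandas as pd

json_text = """
[
  {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
  {"name": "Bob", "hobbies": ["fishing", "movies"]}
]
"""

# Load the records; the 'hobbies' column holds one Python list per row.
people = pd.read_json(io.StringIO(json_text))

# explode() produces one row per list element, repeating the name.
long = people.explode("hobbies")

# Count (name, hobby) pairs and cast the counts to True/False.
flags = pd.crosstab(long["name"], long["hobbies"]).astype(bool)
print(flags)
```

Here explode takes the place of json_normalize for producing the long (name, hobby) table, and the crosstab step is the same as above.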

edited Mar 22, 2021 at 9:09

answered Mar 16, 2016 at 11:09


1 Comment

Subham Over a year ago

Use df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'}), because pd.io.json.json_normalize now raises "<string>:20: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead".
