How to read attributes of an object column, using Python Pandas

Question

I have a dataframe where the 'location' column contains an object:

import pandas as pd

item1 = {
     'project': 'A',
     'location': {'country': 'united states', 'city': 'new york'},
     'raised_usd': 1.0}

item2 =  {
    'project': 'B',
    'location': {'country': 'united kingdom', 'city': 'cambridge'},
    'raised_usd': 5.0}

item3 =  {
    'project': 'C',
    'raised_usd': 10.0}

data = [item1, item2, item3]

df = pd.DataFrame(list(data))
df

I'd like to create an extra column, 'project_country', which contains just the country information, if available. I've tried the following:

def get_country(location):
    try:
        return location['country']
    except Exception:
        return 'n/a'

df['project_country'] = get_country(df['location'])
df

But this doesn't work: every row of 'project_country' comes out as 'n/a'.

How should I go about importing this field?

Strictly speaking, in Python those are items (of the dict), not attributes. Back in the original JSON they were attributes.
– smci, Commented May 29, 2018 at 11:53

EdChum · Accepted Answer · 2015-05-21 11:19:19Z

Use apply and pass your func to it:

In [62]:

def get_country(location):
    try:
        return location['country']
    except Exception:
        return 'n/a'

df['project_country'] = df['location'].apply(get_country)
df
Out[62]:
                                            location project  raised_usd  \
0   {'country': 'united states', 'city': 'new york'}       A           1   
1  {'country': 'united kingdom', 'city': 'cambrid...       B           5   
2                                                NaN       C          10   

  project_country  
0   united states  
1  united kingdom  
2             n/a

Your original code failed because the function receives the entire column (a pandas Series) in a single call, rather than each dict in turn:

In [64]:

def get_country(location):
    print(location)
    try:
        print(location['country'])
    except Exception:
        print('n/a')

get_country(df['location'])
0     {'country': 'united states', 'city': 'new york'}
1    {'country': 'united kingdom', 'city': 'cambrid...
2                                                  NaN
Name: location, dtype: object
n/a

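Looking up 'country' on the Series is therefore a label lookup against its index (0, 1, 2), which raises a KeyError. The except clause catches it, so the function returns a single 'n/a' for the whole column.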
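To make the difference concrete, here is a minimal sketch. The Series is rebuilt by hand from the question's data and the variable names are only illustrative. Indexing the Series with 'country' searches its index labels, whereas indexing a single element reaches into the dict:

```python
import pandas as pd

# Hand-built stand-in for df['location'] from the question.
locations = pd.Series(
    [
        {'country': 'united states', 'city': 'new york'},
        {'country': 'united kingdom', 'city': 'cambridge'},
        None,
    ],
    name='location',
)

# Indexing the Series looks for an index label called 'country'.
try:
    locations['country']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Indexing one element works, because that element is a dict.
first_country = locations.iloc[0]['country']

print(lookup_failed)   # True
print(first_country)   # united states
```

This is why apply is needed: it calls the function once per element instead of once for the whole Series.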

Vladyslav Suprunov · Answer · 2020-02-27 13:46:14Z

Another way to do it is to use .str[<key>]. It implicitly calls __getitem__ with the key argument on each item:

In [17]: df['location'].str['country']
Out[17]: 
0     united states
1    united kingdom
2               NaN
Name: location, dtype: object

It returns the value when the lookup succeeds and NaN otherwise (for example, for the missing location in row 2).
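A small sketch of the same idea, assuming a pandas version where the .str accessor accepts a column of dicts (one comment below reports a case where it did not). It uses .str.get('country'), the method form of the same lookup, and adds fillna to mimic the 'n/a' default from the question. The Series is a hand-built stand-in for df['location']:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df['location'] from the question.
locations = pd.Series(
    [
        {'country': 'united states', 'city': 'new york'},
        {'country': 'united kingdom', 'city': 'cambridge'},
        np.nan,
    ],
    name='location',
)

# .str.get looks up the key in each dict; the NaN row stays NaN,
# and fillna then replaces it with the default used in the question.
countries = locations.str.get('country').fillna('n/a')

print(countries.tolist())
```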

3 Comments

timgeb Over a year ago

Unfortunately that does not seem to work anymore with the recent version of pandas. I'm getting "AttributeError: Can only use .str accessor with string values!".

Vladyslav Suprunov Over a year ago

@timgeb Which version do you use? I am testing with 1.2.1 right now and it works correctly.

johnnyasd12 Over a year ago

Same as @VladyslavSuprunov: I'm using 1.3.4 and it also works correctly.

fixxxer · Answer · 2015-05-21 11:38:36Z

The correct way, as EdChum pointed out, is to use apply on the 'location' column. You can compress that code into one line:

In [15]: df['location'].apply(lambda v: v.get('country') if isinstance(v, dict) else '')
Out[15]: 
0     united states
1    united kingdom
2                  
Name: location, dtype: object

Then assign it to a column:

In [16]: df['country'] = df['location'].apply(lambda v: v.get('country') if isinstance(v, dict) else '')

In [17]: df
Out[17]: 
                                            location project  raised_usd  \
0  {u'country': u'united states', u'city': u'new ...       A           1   
1  {u'country': u'united kingdom', u'city': u'cam...       B           5   
2                                                NaN       C          10   

          country  
0   united states  
1  united kingdom  
2
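One detail of the one-liner worth noting: it returns '' for non-dict values, but dict.get returns None when a dict exists and simply lacks the 'country' key. The sketch below is one way to give both cases the same default. The helper name country_or_default and the 'n/a' default are illustrative choices, not part of the answer above:

```python
import numpy as np
import pandas as pd

locations = pd.Series([
    {'country': 'united states', 'city': 'new york'},
    {'city': 'cambridge'},  # a dict without a 'country' key
    np.nan,                 # a missing location
])

def country_or_default(value, default='n/a'):
    """Return value['country'] for dicts, else the default (hypothetical helper)."""
    if isinstance(value, dict):
        return value.get('country', default)
    return default

result = locations.apply(country_or_default)
print(result.tolist())  # ['united states', 'n/a', 'n/a']
```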

smci · Answer · 2018-05-29 11:52:41Z

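With apply, you can also use operator.itemgetter. Because your column contains NaN, drop the missing values first with dropna(); calling itemgetter on a NaN raises a TypeError:

from operator import itemgetter

# Fails on the NaN row: TypeError, 'float' object is not subscriptable
df['location'].apply(itemgetter('country'))

# Works: drop the missing values first
df['location'].dropna().apply(itemgetter('country'))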
0     united states
1    united kingdom
Name: location, dtype: object
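Note that the result above has only two rows. A minimal sketch of assigning it back to the frame, assuming the default integer index from the question: pandas aligns on the index, so the dropped row becomes NaN, which can then be filled with a default of your choice ('n/a' here, borrowed from the question):

```python
from operator import itemgetter

import numpy as np
import pandas as pd

# Hand-built stand-in for the question's dataframe.
df = pd.DataFrame({
    'project': ['A', 'B', 'C'],
    'location': [
        {'country': 'united states', 'city': 'new york'},
        {'country': 'united kingdom', 'city': 'cambridge'},
        np.nan,
    ],
})

# Only rows 0 and 1 survive dropna(), so the result has two entries.
countries = df['location'].dropna().apply(itemgetter('country'))

# Assignment aligns on the index: row 2 has no match and becomes NaN.
df['project_country'] = countries
df['project_country'] = df['project_country'].fillna('n/a')

print(df['project_country'].tolist())
```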

Nadeeshani William · Answer · 2021-09-21 06:50:11Z

When reading from a CSV file, you can use the converters option of read_csv to parse the column as it is loaded:

import json
import pandas as pd
from pandas import json_normalize

def string_to_dict(dict_string):
    try:
        return json.loads(dict_string)
    except Exception:
        return "N/A"

df = pd.read_csv('../data/data.csv', converters={'locations': string_to_dict})

Then flatten the parsed dicts with json_normalize and pick out the country:

normalized_locations = json_normalize(df['locations'])
df['country'] = normalized_locations['country']
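For a version that runs without a file on disk, here is a sketch that reads a small in-memory CSV instead. The CSV text, the column name 'location', and the choice to fall back to an empty dict (so that json_normalize only ever sees dicts) are assumptions made for this example, not part of the answer above:

```python
import io
import json

import pandas as pd

# Made-up CSV: the location column holds JSON text, and row C has none.
csv_text = '''project,location,raised_usd
A,"{""country"": ""united states"", ""city"": ""new york""}",1.0
B,"{""country"": ""united kingdom"", ""city"": ""cambridge""}",5.0
C,,10.0
'''

def string_to_dict(text):
    """Parse a JSON string; fall back to an empty dict if it cannot be parsed."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        return {}

df = pd.read_csv(io.StringIO(csv_text), converters={'location': string_to_dict})

# Flatten the dicts into columns; the empty dict yields a row of NaN.
locations = pd.json_normalize(df['location'].tolist())
df['country'] = locations['country']

print(df[['project', 'country']])
```

Note that json.loads expects valid JSON (double-quoted keys and strings), so a column written out as a Python dict repr with single quotes would need a different parser.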
