4

I have a dataframe where the 'location' column contains an object:

import pandas as pd

item1 = {
     'project': 'A',
     'location': {'country': 'united states', 'city': 'new york'},
     'raised_usd': 1.0}

item2 =  {
    'project': 'B',
    'location': {'country': 'united kingdom', 'city': 'cambridge'},
    'raised_usd': 5.0}

item3 =  {
    'project': 'C',
    'raised_usd': 10.0}

data = [item1, item2, item3]

df = pd.DataFrame(list(data))
df


I'd like to create an extra column, 'project_country', which contains just the country information, if available. I've tried the following:

def get_country(location):
    try:
        return location['country']
    except Exception:
        return 'n/a'

df['project_country'] = get_country(df['location'])
df

But this doesn't work: every row of 'project_country' ends up as 'n/a'.

How should I go about importing this field?

Strictly in Python those are items (of the dict), not attributes. Back in the original JSON they were attributes. Commented May 29, 2018 at 11:53

5 Answers

4

Use apply and pass your func to it:

In [62]:

def get_country(location):
    try:
        return location['country']
    except Exception:
        return 'n/a'

df['project_country'] = df['location'].apply(get_country)
df
Out[62]:
                                            location project  raised_usd  \
0   {'country': 'united states', 'city': 'new york'}       A           1   
1  {'country': 'united kingdom', 'city': 'cambrid...       B           5   
2                                                NaN       C          10   

  project_country  
0   united states  
1  united kingdom  
2             n/a 

Your original code failed because the function receives the entire column (a pandas Series) rather than one value at a time:

In [64]:

def get_country(location):
    print(location)
    try:
        print(location['country'])
    except Exception:
        print('n/a')

get_country(df['location'])
0     {'country': 'united states', 'city': 'new york'}
1    {'country': 'united kingdom', 'city': 'cambrid...
2                                                  NaN
Name: location, dtype: object
n/a

As such, looking up the key 'country' on the entire Series raises a KeyError, so the function returns 'n/a' once and that single value is assigned to every row.
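The difference between indexing the Series and applying a function per cell can be checked directly. This is a minimal sketch on a two-row frame shaped like the one in the question; the names `lookup_failed` and `cell_types` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {"project": "A", "location": {"country": "united states", "city": "new york"}},
    {"project": "C"},  # no location, so pandas stores NaN here
])

# Indexing the Series with 'country' looks for an index label, not a dict key.
try:
    df["location"]["country"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# apply() instead hands the function one cell at a time.
cell_types = df["location"].apply(lambda v: type(v).__name__).tolist()
print(lookup_failed, cell_types)
```

The second row shows up as a float because the missing value is stored as NaN, which is why the try/except (or an isinstance check) is still needed inside the applied function.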


2

Another way to do it is to use .str[<key>], which implicitly calls __getitem__ with the key for each item:

In [17]: df['location'].str['country']
Out[17]: 
0     united states
1    united kingdom
2               NaN
Name: location, dtype: object

It returns NaN where the lookup fails and the value otherwise.
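To get the 'n/a' placeholder from the question instead of NaN, the result can be passed through fillna. The sketch below uses .str.get, which should behave the same as .str[...] for dict cells; the column name 'project_country' is taken from the question, and the behaviour is assumed to match the pandas versions mentioned in the comments.

```python
import pandas as pd

df = pd.DataFrame([
    {"project": "A", "location": {"country": "united states", "city": "new york"}},
    {"project": "B", "location": {"country": "united kingdom", "city": "cambridge"}},
    {"project": "C"},  # missing location becomes NaN
])

# .str.get(key) does a dict lookup per element; NaN rows stay NaN.
country = df["location"].str.get("country")

# Replace the missing values with the placeholder used in the question.
df["project_country"] = country.fillna("n/a")
print(df[["project", "project_country"]])
```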

3 Comments

Unfortunately that does not seem to work anymore with recent versions of pandas. I'm getting "AttributeError: Can only use .str accessor with string values!".
@timgeb Which version do you use? I am testing with 1.2.1 right now and it works correctly.
Same as @VladyslavSuprunov I'm using 1.3.4 and it also works correctly.
1

The correct way, as EdChum pointed out, is to use apply on the 'location' column. You could compress that code into one line:

In [15]: df['location'].apply(lambda v: v.get('country') if isinstance(v, dict) else '')
Out[15]: 
0     united states
1    united kingdom
2                  
Name: location, dtype: object

Then assign it to a column:

In [16]: df['country'] = df['location'].apply(lambda v: v.get('country') if isinstance(v, dict) else '')

In [17]: df
Out[17]: 
                                            location project  raised_usd  \
0  {u'country': u'united states', u'city': u'new ...       A           1   
1  {u'country': u'united kingdom', u'city': u'cam...       B           5   
2                                                NaN       C          10   

          country  
0   united states  
1  united kingdom  
2 


0

With apply, you can use operator.itemgetter. Note that dropna() is needed first because the column contains NaN:

from operator import itemgetter

# Without dropna(), the NaN row raises
# TypeError: 'float' object is not subscriptable
# df['location'].apply(itemgetter('country'))

df['location'].dropna().apply(itemgetter('country'))
0     united states
1    united kingdom
Name: location, dtype: object
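The shortened result can still be assigned back to the original frame, because pandas aligns on the index and leaves NaN in the rows that were dropped. A small sketch, assuming the same data as the question and reusing its 'project_country' column name:

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame([
    {"project": "A", "location": {"country": "united states", "city": "new york"}},
    {"project": "B", "location": {"country": "united kingdom", "city": "cambridge"}},
    {"project": "C"},  # missing location becomes NaN
])

# Only rows 0 and 1 survive dropna(), so the result has two entries.
countries = df["location"].dropna().apply(itemgetter("country"))

# Assignment aligns on the index: row 2 has no match and becomes NaN.
df["project_country"] = countries

# Optionally swap NaN for the placeholder used in the question.
df["project_country"] = df["project_country"].fillna("n/a")
print(df[["project", "project_country"]])
```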


0

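When reading a CSV file, you can use the converters option of read_csv to parse the column as it is loaded:

import json

def string_to_dict(dict_string):
    try:
        return json.loads(dict_string)
    except Exception:
        # Fall back to an empty dict so every row still holds a dict
        return {}

df = pd.read_csv('../data/data.csv', converters={'locations': string_to_dict})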

Then flatten the parsed dicts with json_normalize and pick out the country column:

from pandas import json_normalize

normalized_locations = json_normalize(df['locations'])
df['country'] = normalized_locations['country']
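The whole round trip can be tried without a file on disk. The sketch below is an assumption-laden stand-in: it builds a small CSV in memory (the 'locations' column holds JSON text, which is what json.loads expects), and the converter returns an empty dict for cells that cannot be parsed so that json_normalize only ever sees dicts.

```python
import io
import json

import pandas as pd

# Build a small CSV in memory; to_csv handles the quoting of the JSON text.
rows = [
    {"project": "A", "raised_usd": 1.0,
     "locations": json.dumps({"country": "united states", "city": "new york"})},
    {"project": "B", "raised_usd": 5.0,
     "locations": json.dumps({"country": "united kingdom", "city": "cambridge"})},
    {"project": "C", "raised_usd": 10.0, "locations": ""},  # no location
]
csv_text = pd.DataFrame(rows).to_csv(index=False)

def string_to_dict(cell):
    """Parse a JSON object from a CSV cell, or return {} if that fails."""
    try:
        parsed = json.loads(cell)
    except (TypeError, ValueError):
        return {}
    return parsed if isinstance(parsed, dict) else {}

df = pd.read_csv(io.StringIO(csv_text), converters={"locations": string_to_dict})

# One column per key; rows with an empty dict get NaN in every column.
normalized = pd.json_normalize(df["locations"].tolist())
df["country"] = normalized["country"]
print(df[["project", "country"]])
```

Note that this only works when the file stores valid JSON (double-quoted keys and strings); a column written out as a Python dict repr would need a different parser.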
