Exploding rows with identical nested keys in pandas

Question

I have data in a column that looks like this in a pandas dataframe:

'Column name':    
    [{'Name': Dan Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Bob Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}], 
    
    [{'Name': Shelly Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Sam Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}], 
    
    {'Name': Jane Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6},
     
    [{'Name': Chris Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Darryl Smith, 'Attribute1': 4, 'Attribute2': 15, 'Attribute3': 6}],

Companies are separated by [] except in instances where the company only has 1 observation (such as the 3rd observation in this example with Jane Smith). My problem is trying to parse the nested keys when the nested keys are identical. My goal is to grab the attribute with the highest value for each company.

I've tried:

 df = df.explode('Column Name')

However, that does not do anything. The observations appear the same as before. After some research I tried the following

from ast import literal_eval
df['Column name'] = df['Column name'].apply(literal_eval)
df = df.explode('Column Name')

However, when I do this, I get a "KeyError:0" return. I found that this error occurs due to instances like the 3rd row where there is only 1 observation for that company. I can explode small samples of my data and grab the highest attribute and proceed according to plan. However, I have 1.62 million rows so splitting the sample into small batches is not wise.

Is there a way to pass the 'KeyError:0' exceptions? Or is there a better way to get where I'm trying to go? I'm new to Python/Pandas.

Keto · Accepted Answer · 2021-05-12 19:05:09Z

1

def tolist(x):
    if isinstance(x, dict):
        return [x]
    else:
        return x

df['Column name'] = df['Column name'].apply(literal_eval).apply(tolist)
df = df.explode('Column name')

Explanation

To use explode, every row must be a sequence type (list for this case). The first thing you need to do is clean up any rows where it's a single element and convert it a list of one element

    [{'Name': Jane Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}],

edited May 12, 2021 at 19:05

answered May 12, 2021 at 18:59

Keto

2,3481 gold badge20 silver badges29 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

sk7785 Over a year ago

Thank you so much, you have made the day for a very, very stressed out PhD student!

Collectives™ on Stack Overflow

Exploding rows with identical nested keys in pandas

1 Answer 1

Explanation

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Explanation

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related