0

I have a large data set with over 350,000 rows. Now one of the columns contains a single value dictionary as its row values, and I was to assign each unique key as a new column in the data frame and the value as the row value in the right [row, column] position.

Here's the code I was hoping to use but due to a large number of rows it's taking too long.

idx = 0
 for row in df['value']:
    for key in row:
        if key not in df.columns.tolist():
            df[key] = 0
            df.loc[idx,key] = row[key]
        else:
            df.loc[idx,key] = row[key]
    idx += 1

Here's the sample data

import pandas as pd

df = pd.DataFrame({'time': [12,342,786],
                   'event': ['offer received', 'transaction', 'offer viewed'],
                   'value': [{'offer id': '2906b810c7d4411798c6938adc9daaa5'}, {'amount': 0.35000000000000003},
                            {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}]
                   })

Here the expected output:

df2 = pd.DataFrame({'time': [12,342,786],
               'event': ['offer received', 'transaction', 'offer viewed'],
               'value': [{'offer id': '2906b810c7d4411798c6938adc9daaa5'}, {'amount': 0.35000000000000003},
                        {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}],
               'offer id': ['2906b810c7d4411798c6938adc9daaa5', 0, '0b1e1539f2cc45b7b9fa7c272da2e1d7' ],
               'amount': [0, 0.35000000000000003, 0]
               })
5
  • Hi @Emi, kindly post a sample dataset(just a couple of rows) with ur expected output.Read this guide on how to create a minimal reproducible example Commented Apr 29, 2020 at 0:31
  • 1
    @sammywemmy i just did. What do you think? Commented Apr 29, 2020 at 0:52
  • good job so far, what's left is ur expected output in dataframe format. alternatively, u can check Matthew's answer and see if that is what u r after Commented Apr 29, 2020 at 1:11
  • Can you provide some more context? Where does that data come from, how does it end up in this format? There must be a better/simpler way of solving this issue. Commented Apr 29, 2020 at 1:32
  • @sammywemmy done. Commented Apr 29, 2020 at 1:42

1 Answer 1

1
df["offer id"] = df["value"].apply(lambda d: d["offer id"] if "offer id" in d else 0)
df["amount"] = df["value"].apply(lambda d: d["amount"] if "amount" in d else 0)

Output:

 time           event                                             value  \
0    12  offer received  {'offer id': '2906b810c7d4411798c6938adc9daaa5'}   
1   342     transaction                   {'amount': 0.35000000000000003}   
2   786    offer viewed  {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}   

                           offer id  amount  
0  2906b810c7d4411798c6938adc9daaa5    0.00  
1                                 0    0.35  
2  0b1e1539f2cc45b7b9fa7c272da2e1d7    0.00  
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.