What is the right way to read csv with dictionary in one column in Python?

Question

I have a CSV file named test.csv with the following data:

,ts,snapshot
0,1686321823605,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"
1,1686321823705,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"

I am attempting to read this file using pandas and get the following result:

import pandas as pd
df = pd.read_csv(test.csv)

Unnamed: 0 ts           snapshot    
0           0.0  1686321823605  {'asks': [['26566.1', '1.451456']], 'bids': [[...
1           1.0  1686321823705  {'asks': [['26566.1', '1.451456']], 'bids': [[...

So the snapshot column contains a dictionary in each row, but I would like to split it into four columns titled asks0, asks1, bids0, and bids1, each of which contains the respective values from the rows. So the result I am looking for should be something like this:

ts,asks0, asks1, bids0, bids1
1686321823605 , 26566.1 , 1.451456 , 26566 , 4.17579913
1686321823705 , 26566.1 , 1.451456 , 26566 , 4.17579913

So far, I could not find a way of obtaining this result.

I attempted to use pd.json_normalize(), as from other discussions here it looked as if that should do the job.

df1 = pd.json_normalize(df.snapshot)
df1

but the output has n rows x zero columns.

What am I doing wrong here?

Thank you for your feedback and sorry for the incorrect description then. I have modified the title of the question. Nevertheless, I am still puzzled as to how to solve the issue and would really appreciate any help. — djg
– djg, Commented Jun 16, 2023 at 13:48
I attempted to use pd.json_normalize(df) If you don't show us the actual code you tried, and the results, we can't point out your mistake... — John Gordon
– John Gordon, Commented Jun 16, 2023 at 23:46

justhecuke · Accepted Answer · 2023-06-17 07:28:57Z

json_normalize() is likely failing to do what you want because it doesn't know how to handle df['snapshot'] which is a malformed JSON string. ' is not a legal JSON string delimiter. Also, I'm not sure if the function in general knows to parse a string as JSON. It may require that you already have a JSON-like object to work with.

Your data is also oddly formatted in snapshot where asks and bids are nested lists for no discernable reason. If you don't need the nesting I'd suggest removing it just to get rid of some confusion.

What you can try to do is make the new columns explicitly.

# Your string is malformed JSON. `'` is not a legal string delimiter.
df['snapshot'] = df['snapshot'].str.replace("'", '"')
# We need to translate your raw string into actual objects.
df['snapshot'] = df['snapshot'].apply(json.loads)

# From here you can do `new_df = pd.json_normalize(df['snapshot'])`,
# but I don't know how to use that to do what you want.

# Explicitly create the extra columns you want
df['asks0'] = df['snapshot'].str['asks'].str[0][0]
df['asks1'] = df['snapshot'].str['asks'].str[0][1]
df['bids0'] = df['snapshot'].str['bids'].str[0][0]
df['bids1'] = df['snapshot'].str['bids'].str[0][1]

# Get rid of the old data
df.drop('snapshot', axis=1, inplace=True)

Collectives™ on Stack Overflow

What is the right way to read csv with dictionary in one column in Python?

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related