0

I have a CSV file named test.csv with the following data:

,ts,snapshot
0,1686321823605,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"
1,1686321823705,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"

I am attempting to read this file using pandas and get the following result:

import pandas as pd
df = pd.read_csv(test.csv)

Unnamed: 0 ts           snapshot    
0           0.0  1686321823605  {'asks': [['26566.1', '1.451456']], 'bids': [[...
1           1.0  1686321823705  {'asks': [['26566.1', '1.451456']], 'bids': [[...

So the snapshot column contains a dictionary in each row, but I would like to split it into four columns titled asks0, asks1, bids0, and bids1, each of which contains the respective values from the rows. So the result I am looking for should be something like this:

ts,asks0, asks1, bids0, bids1
1686321823605 , 26566.1 , 1.451456 , 26566 , 4.17579913
1686321823705 , 26566.1 , 1.451456 , 26566 , 4.17579913

So far, I could not find a way of obtaining this result.

I attempted to use pd.json_normalize(), as from other discussions here it looked as if that should do the job.

df1 = pd.json_normalize(df.snapshot)
df1

but the output has n rows x zero columns.

What am I doing wrong here?

4
  • 2
    That's not JSON. That's a Python literal at best. Commented Jun 16, 2023 at 13:03
  • 1
    Thank you for your feedback and sorry for the incorrect description then. I have modified the title of the question. Nevertheless, I am still puzzled as to how to solve the issue and would really appreciate any help. Commented Jun 16, 2023 at 13:48
  • I attempted to use pd.json_normalize(df) If you don't show us the actual code you tried, and the results, we can't point out your mistake... Commented Jun 16, 2023 at 23:46
  • Thank you John for your reply. I edited accordingly. Commented Jun 17, 2023 at 6:21

1 Answer 1

0

json_normalize() is likely failing to do what you want because it doesn't know how to handle df['snapshot'] which is a malformed JSON string. ' is not a legal JSON string delimiter. Also, I'm not sure if the function in general knows to parse a string as JSON. It may require that you already have a JSON-like object to work with.

Your data is also oddly formatted in snapshot where asks and bids are nested lists for no discernable reason. If you don't need the nesting I'd suggest removing it just to get rid of some confusion.

What you can try to do is make the new columns explicitly.

# Your string is malformed JSON. `'` is not a legal string delimiter.
df['snapshot'] = df['snapshot'].str.replace("'", '"')
# We need to translate your raw string into actual objects.
df['snapshot'] = df['snapshot'].apply(json.loads)

# From here you can do `new_df = pd.json_normalize(df['snapshot'])`,
# but I don't know how to use that to do what you want.

# Explicitly create the extra columns you want
df['asks0'] = df['snapshot'].str['asks'].str[0][0]
df['asks1'] = df['snapshot'].str['asks'].str[0][1]
df['bids0'] = df['snapshot'].str['bids'].str[0][0]
df['bids1'] = df['snapshot'].str['bids'].str[0][1]

# Get rid of the old data
df.drop('snapshot', axis=1, inplace=True)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.