0

I have a relatively large (9 MB) JSON file. It's a list of dicts (I don't know if that's the convention for JSON). Anyway, I've been able to read it in and turn it into a data frame.

The data is a backtest for a predictive model and has the following format:

[{"assetname":"xxx", "return":0.9, "timestamp":1451080800},{"assetname":"xxx", "return":0.9, "timestamp":1451080800}...{"assetname":"yyy", "return":0.9, "timestamp":1451080800},{"assetname":"yyy", "return":0.9, "timestamp":1451080800}]

I would like to separate all the assets into their own data frames. Can anyone help?

Here's the data, by the way: http://www.mediafire.com/view/957et8za5wv56ba/test_predictions.json

1
  • What is your expected output? You could do it with pandas.Series, using [pd.Series(x) for x in l], where l is your list of dicts. Commented Jan 26, 2016 at 11:10

3 Answers

1

Just put your data into a DataFrame:

import pandas as pd

df = pd.DataFrame([{"assetname":"xxx", 'return':0.9, "timestamp":1451080800},
                   {"assetname":"xxx", 'return':0.9, "timestamp":1451080800}, 
                   {"assetname":"yyy", 'return':0.9, "timestamp":1451080800},
                   {"assetname":"yyy", 'return':0.9, "timestamp":1451080800}])
print(df)

Output:

  assetname  return   timestamp
0       xxx     0.9  1451080800
1       xxx     0.9  1451080800
2       yyy     0.9  1451080800
3       yyy     0.9  1451080800 
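
If the records start out as JSON text rather than a Python literal, the same step might look roughly like the sketch below. The inline `raw` string is a made-up stand-in for the file contents, and parsing it with `json.loads` from the standard library is an assumption, not necessarily how the asker loaded the data:

```python
import json

import pandas as pd

# Hypothetical sample text standing in for the contents of the JSON file.
raw = """[
  {"assetname": "xxx", "return": 0.9, "timestamp": 1451080800},
  {"assetname": "xxx", "return": 0.9, "timestamp": 1451080800},
  {"assetname": "yyy", "return": 0.9, "timestamp": 1451080800},
  {"assetname": "yyy", "return": 0.9, "timestamp": 1451080800}
]"""

# json.loads turns the JSON array of objects into a Python list of dicts.
records = json.loads(raw)

# A list of dicts maps directly onto rows; the dict keys become columns.
df = pd.DataFrame(records)
print(df)
```

For a file on disk, `json.load(f)` on an open file handle should give the same list of dicts.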

1

You can load a dataframe from a json file like this:

In [9]: from pandas.io.json import read_json

In [10]: d = read_json('Descargas/test_predictions.json')

In [11]: d.head()
Out[11]: 
  market_trading_pair  next_future_timestep_return  ohlcv_start_date  \
0    Poloniex_ETH_BTC                     0.003013        1450753200   
1    Poloniex_ETH_BTC                    -0.006521        1450756800   
2    Poloniex_ETH_BTC                     0.003171        1450760400   
3    Poloniex_ETH_BTC                    -0.003083        1450764000   
4    Poloniex_ETH_BTC                    -0.001382        1450767600   

   prediction_at_ohlcv_end_date  
0                     -0.157053  
1                     -0.920074  
2                      0.999806  
3                      0.627140  
4                      0.999857  

You may split it like this:

Poloniex_ETH_BTC = d[d['market_trading_pair'] == 'Poloniex_ETH_BTC']
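
The line above filters with a boolean mask. A small self-contained sketch of the same idea is shown here; the miniature frame and the second pair name `Poloniex_XMR_BTC` are made up for illustration and are not taken from the real file:

```python
import pandas as pd

# Hypothetical miniature stand-in for the frame loaded from the JSON file.
d = pd.DataFrame({
    "market_trading_pair": ["Poloniex_ETH_BTC", "Poloniex_ETH_BTC", "Poloniex_XMR_BTC"],
    "next_future_timestep_return": [0.003013, -0.006521, 0.001000],
    "ohlcv_start_date": [1450753200, 1450756800, 1450753200],
})

# The comparison yields a boolean Series: True where the pair matches.
mask = d["market_trading_pair"] == "Poloniex_ETH_BTC"

# Indexing with the mask keeps only the matching rows;
# reset_index(drop=True) renumbers them from 0 (optional).
eth_btc = d[mask].reset_index(drop=True)
print(eth_btc)
```

This works well for one or two assets; for every asset at once, a groupby is usually less repetitive.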

0

Extending rapto's answer, you can split the whole dataframe by the value of one column like this:

# Map each trading pair name to the sub-frame holding only its rows.
df_dict = dict()
for name, df in d.groupby('market_trading_pair'):
    df_dict[name] = df
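
The same split can also be written as a dict comprehension. This is only a sketch on a made-up miniature frame (the pair name `Poloniex_XMR_BTC` and the variable `frames` are hypothetical), showing how the resulting dict could be used afterwards:

```python
import pandas as pd

# Hypothetical miniature stand-in for the frame loaded from the JSON file.
d = pd.DataFrame({
    "market_trading_pair": ["Poloniex_ETH_BTC", "Poloniex_XMR_BTC", "Poloniex_ETH_BTC"],
    "next_future_timestep_return": [0.003013, 0.001000, -0.006521],
})

# groupby yields (group key, sub-frame) pairs, one per distinct pair name.
frames = {
    name: group.reset_index(drop=True)
    for name, group in d.groupby("market_trading_pair")
}

# Each asset's frame is then available by name.
for name, frame in frames.items():
    print(name, len(frame))
```

Keeping the frames in a dict rather than in separately named variables means the code does not need to know the asset names in advance.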
