0

I am trying to retrieve the val1 and val2 values from the following nested JSON file to build a pandas DataFrame with two columns, val1 and val2:

{
 'start': '2015-10-01 00:00',
 'end': '2015-10-01 01:00',
 'records': 
     {
        'val1': 
             [
                1,
                2,
                3,
                4,
                5
             ],
         'val2':
             [
                0.1,
                0.5,
                0.2,
                0.1,
                0.0
             ],
         'val3': 'abc'
      }
}

This is what I do:

import json
from pandas.io.json import json_normalize

with open(json_file) as data_file:    
    data = json.load(data_file)  

df = json_normalize(data, 'records', ['val1', 'val2'], record_prefix='records_', errors='ignore')

However, I get this output:

    records_0   val1  val2
0   val1        NaN   NaN
1   val2        NaN   NaN
2   val3        NaN   NaN

The expected output:

val1   val2
1      0.1
2      0.5
3      0.2
4      0.1
5      0.0

5 Answers

3

Load your JSON into a variable (for example with json.load), then pass it to json_normalize:

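import pandas as pd

# Named `data` rather than `json` so the standard-library json module is not shadowed.
data = {'start': '2015-10-01 00:00','end': '2015-10-01 01:00','records': {'val1': [1,2,3,4,5],'val2':[0.1,0.5,0.2,0.1,0.0],'val3': 'abc'}}

# Flatten the nested dict; nested keys become dotted names such as 'records.val1'.
df = pd.json_normalize(data)

# Keep only the last component of each column name: 'records.val1' -> 'val1'.
df.columns = df.columns.map(lambda x: x.split(".")[-1])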


If you want to keep only two columns, drop every column other than the ones you want:

# Drop every column except val1 and val2.
for column in df.columns:
    if column != 'val1' and column != 'val2':
        df = df.drop([column], axis=1)
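Note that normalizing the whole document gives a single row whose val1 and val2 cells each hold an entire list, so one more step is needed to get one row per value as in the question. A minimal sketch of that step, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns) and lists of equal length; the names flat and result are only illustrative:

```python
import pandas as pd

data = {
    "start": "2015-10-01 00:00",
    "end": "2015-10-01 01:00",
    "records": {
        "val1": [1, 2, 3, 4, 5],
        "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
        "val3": "abc",
    },
}

# One row; the val1 and val2 cells each contain a whole list.
flat = pd.json_normalize(data)
flat.columns = flat.columns.map(lambda name: name.split(".")[-1])

# Expand both list columns in parallel, one row per list position.
result = flat[["val1", "val2"]].explode(["val1", "val2"], ignore_index=True)

# explode leaves object dtype; infer_objects() recovers numeric dtypes.
result = result.infer_objects()
print(result)
```

If the two lists could differ in length, explode raises an error rather than padding, so this sketch only fits data shaped like the sample.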


2

For your example, json_normalize is not a good fit, because it expects the records to be an array (a list of objects), whereas 'records' here is a single dict of lists.

You can build the DataFrame directly from that dict instead:

import json
import pandas

with open(json_file) as data_file:
    data = json.load(data_file)

df = pandas.DataFrame.from_dict(data["records"])
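Because 'records' also holds the scalar val3, a DataFrame built this way gets a third column in which 'abc' is repeated on every row. A small sketch of keeping just the two wanted columns, with the sample data written inline instead of read from a file (the names full and df are illustrative):

```python
import pandas as pd

records = {
    "val1": [1, 2, 3, 4, 5],
    "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
    "val3": "abc",
}

# The scalar 'abc' is broadcast to match the length of the lists.
full = pd.DataFrame.from_dict(records)

# Select only the columns of interest.
df = full[["val1", "val2"]]
print(df)
```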
 


2

You can systematically pull out what you want.

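import pandas as pd

js = {'start': '2015-10-01 00:00',
 'end': '2015-10-01 01:00',
 'records': {'val1': [1, 2, 3, 4, 5],
  'val2': [0.1, 0.5, 0.2, 0.1, 0.0],
  'val3': 'abc'}}

# Normalize each list separately (each yields a single column named 0),
# rename that column, then join the two frames on their shared index.
(pd.json_normalize(js["records"],"val1")
 .rename(columns={0:"val1"})
 .join(pd.json_normalize(js["records"],"val2"))
 .rename(columns={0:"val2"})
)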

   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0


1

You can define the list ['val1', 'val2'], initialize an empty DataFrame with those columns, and populate each column in a for loop:

import json
import pandas as pd

l=['val1','val2']
df = pd.DataFrame(columns=l)
with open('myfile.json') as data_file:    
    data = json.load(data_file) 

for i in l:
    df[i]=data['records'][i]

df

   val1  val2
0     1   0.1
1     2   0.5
2     3   0.2
3     4   0.1
4     5   0.0


1

The function expects a list of dictionaries.

Just twist the data a bit 😉.

rows = [{"val1": val1, "val2": val2} for val1, val2 in zip(data['records']['val1'], data['records']['val2'])]
df = pd.json_normalize(rows)
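One caveat with zip is that it stops at the shorter input, so lists of different lengths would silently lose values. A hedged sketch that checks the lengths before building the rows; the helper name records_to_frame is made up for illustration and is not part of pandas:

```python
import pandas as pd


def records_to_frame(records, columns):
    """Build a DataFrame with one row per position of the lists under `columns`."""
    lengths = {len(records[name]) for name in columns}
    if len(lengths) != 1:
        # Fail loudly instead of letting zip truncate to the shortest list.
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    # Turn the parallel lists into a list of row dictionaries.
    rows = [
        dict(zip(columns, values))
        for values in zip(*(records[name] for name in columns))
    ]
    return pd.json_normalize(rows)


records = {
    "val1": [1, 2, 3, 4, 5],
    "val2": [0.1, 0.5, 0.2, 0.1, 0.0],
    "val3": "abc",
}
df = records_to_frame(records, ["val1", "val2"])
print(df)
```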
