How to convert python JSON list to dataframe columns without looping

Question

I'm using python and trying to figure out how to do the following without using a loop.

I have a dataframe with several columns, one of which holds a list of JSON objects. I want to expand that column into separate columns within the dataframe. For example, I have the following dataframe:

name	age	group
John	35	[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann	20	[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma	25	[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]

I want to get the marks for testid = 001 and testid = 002 as follows:

name	age	test_id1	test_id2
John	35	67	70
Ann	20	75	80
Emma	25	90	99

Here is my dataset

[
   {
      "name":"John",
      "age":35,
      "group":[
         {
            "testid":"001",
            "marks":67
         },
         {
            "testid":"002",
            "marks":70
         }
      ]
   },
   {
      "name":"Ann",
      "age":20,
      "group":[
         {
            "testid":"001",
            "marks":75
         },
         {
            "testid":"002",
            "marks":80
         },
         {
            "testid":"003",
            "marks":87
         }
      ]
   },
   {
      "name":"Emma",
      "age":25,
      "group":[
         {
            "testid":"001",
            "marks":90
         },
         {
            "testid":"002",
            "marks":99
         }
      ]
   }
]

Any idea is highly appreciated. Thank you.

Kindly share the dataframe as code: df.to_dict('records')
– sammywemmy, Commented May 14, 2021 at 22:56
@sammywemmy Thank you for your comment. I have added the dataset.
– Natasha Perera, Commented May 15, 2021 at 3:25

sammywemmy · Accepted Answer · 2021-05-15 04:36:55Z

2

A list comprehension is handy here for pulling the data out. As a side note, if you can, do the extraction before loading the dict-like data into a dataframe, since that is more efficient:

outcome = [[item['marks']
            for item in entry
            if item['testid'] in ('001', '002')]
           for entry in df.group]

print(outcome)
[[67, 70], [75, 80], [90, 99]]

Zip the data, and assign to new column names in the dataframe:

test_id1, test_id2 = zip(*outcome)

df.filter(['name', 'age']).assign(test_id1 = test_id1, test_id2 = test_id2)

   name  age  test_id1  test_id2
0  John   35        67        70
1   Ann   20        75        80
2  Emma   25        90        99
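
The side note about extracting before building the dataframe can be sketched roughly as follows. This assumes the dataset from the question is already parsed into a Python list of dicts; the names `records`, `wanted`, and `flat` are hypothetical and not part of the answer above. Marks are looked up by testid rather than by list position, so a missing test would show up as NaN instead of shifting columns.

```python
import pandas as pd

# Dataset from the question, already parsed from JSON into Python objects.
records = [
    {"name": "John", "age": 35,
     "group": [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]},
    {"name": "Ann", "age": 20,
     "group": [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80},
               {"testid": "003", "marks": 87}]},
    {"name": "Emma", "age": 25,
     "group": [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]},
]

# Hypothetical mapping from testid to the desired output column name.
wanted = {"001": "test_id1", "002": "test_id2"}

# Flatten each record into one row before any dataframe exists.
rows = [
    {"name": rec["name"], "age": rec["age"],
     **{wanted[t["testid"]]: t["marks"]
        for t in rec["group"] if t["testid"] in wanted}}
    for rec in records
]

flat = pd.DataFrame(rows)
print(flat)
```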


2 Comments

Natasha Perera Over a year ago

Thank you. BTW, it throws the following error: Traceback (most recent call last): File "//HOME-SVR-001/User Directory Folder/Administrator/Desktop/testtest.py", line 27, in <module> for entry in df.group] AttributeError: 'list' object has no attribute 'group'. Any idea?

sammywemmy Over a year ago

No idea. df is a dataframe, right? It seems your code is treating df as a list.
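
One possible cause of that error, sketched under the assumption that the JSON was parsed with json.loads and never wrapped in a dataframe: the parsed result is a plain list, which has no `group` attribute. The variable names `raw` and `data` and the shortened one-record input are hypothetical.

```python
import json

import pandas as pd

# Shortened, hypothetical stand-in for the dataset in the question.
raw = '[{"name": "John", "age": 35, "group": [{"testid": "001", "marks": 67}]}]'

# json.loads returns a plain Python list of dicts, not a dataframe,
# so data.group would raise AttributeError.
data = json.loads(raw)
print(type(data).__name__)

# Wrapping the list in pd.DataFrame gives an object with a "group" column.
df = pd.DataFrame(data)
print(df.group.iloc[0])
```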

Jonathan Leon · Accepted Answer · 2021-05-15 03:34:17Z

1

See comments inline. Using apply() does the iterating for you. You just need to write the function.

import io
import json

import pandas as pd

data='''name|age|group
John|35|[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann|20|[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma|25|[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]'''
df = pd.read_csv(io.StringIO(data), sep='|', engine='python')

# Function for apply(): parse one row's JSON string and write each mark
# into the main df as test_id1, test_id2, ...
def expand_json(xname, x):
    for i, j in enumerate(json.loads(x), 1):
        col = 'test_id' + str(i)
        # Rows are matched by name, so this assumes names are unique.
        df.loc[df.name == xname, col] = j['marks']

# dftemp is a throwaway so nothing prints to the screen.
# The function writes to the main df as a side effect.
dftemp = df.apply(lambda x: expand_json(x['name'], x['group']), axis=1)
print(df)

   name  age                                                                                             group  test_id1  test_id2  test_id3
0  John   35                                  [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]    67.000    70.000       NaN
1   Ann   20  [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]    75.000    80.000    87.000
2  Emma   25                                  [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]    90.000    99.000       NaN


3 Comments

Natasha Perera Over a year ago

Thank you very much. It works. Can I get only the name, age, test_id1 and test_id2 columns?

Natasha Perera Over a year ago

Thanks! I can use the df.filter() function.

Jonathan Leon Over a year ago

In the for loop in the function, you can also limit it by slicing with [0:2]. That should limit it to just the first two ids. Or filter, as you mention.
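
The two suggestions in these comments, slicing with [0:2] and df.filter(), might be combined as in the sketch below. It is a variant of the answer, not the answer's own code: the helper `first_two_marks` is hypothetical, it returns a Series instead of writing into df, and the dataframe is built directly from JSON strings rather than read with read_csv.

```python
import json

import pandas as pd

# Same data as in the answer, with the group column held as JSON strings.
df = pd.DataFrame({
    "name": ["John", "Ann", "Emma"],
    "age": [35, 20, 25],
    "group": [
        '[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]',
        '[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, '
        '{"testid": "003", "marks": 87}]',
        '[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]',
    ],
})

def first_two_marks(cell):
    """Parse one JSON string and return the first two marks as a Series."""
    tests = json.loads(cell)[0:2]  # the [0:2] slice keeps only two entries
    return pd.Series({f"test_id{i}": t["marks"]
                      for i, t in enumerate(tests, 1)})

# Applying a function that returns a Series yields one column per label.
marks = df["group"].apply(first_two_marks)

# filter() keeps only name and age; join() adds the new columns by index.
result = df.filter(["name", "age"]).join(marks)
print(result)
```

Because nothing is written back into df inside the function, the original dataframe stays unchanged and rows are aligned by index rather than by name.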
