
I'm working on something quite simple: I query a database, and it returns me a [huge] dictionary. That's fine, I love dictionaries, but I'm not a pro with them in Python.

My problem is that I want to convert this dictionary into a DataFrame. Fine, I googled it and it works. But inside my dictionary, I have other dictionaries (yeah, I know...).

I want to extract, from those nested dictionaries (which end up inside my DataFrame), the values of the "value" key.

Here's a sample and what I tried. Thanks in advance.

(res is my huge dictionary, the result of the query)

res :

{'head': {'vars': ['id', 'marque', 'modele']},
 'results': {'bindings': [{'id': {'type': 'literal', 'value': '1362'},
    'marque': {'type': 'literal', 'value': 'PEUGEOT'},
    'modele': {'type': 'literal', 'value': '206'}},....

pd.DataFrame(res['results']['bindings'], columns=res['head']['vars']) gives a DataFrame where every cell is still a dictionary.

As you can see, there's another dictionary inside each cell of my DataFrame! What I want is to take the values from the "value" key, in an efficient way (I know how to do that with a big for loop, but please, not in Python).

I tried things like res['results']['bindings']['values'], or res['results']['bindings'].values() (or .values), and other things on the DataFrame like df.values()['value'] = df.values(), but none of it works.

2 Answers


IIUC, you can use applymap and extract the value associated with the value key from every dictionary.

import operator
import pandas as pd

df = pd.DataFrame(res['results']['bindings'], columns=res['head']['vars'])
# Pull the 'value' entry out of the dict in every cell
df = df.applymap(operator.itemgetter('value'))

This operates under the assumption that each cell value is a dictionary.


It's possible that some of your dictionaries do not contain value as a key. In that case, a slight modification is required, using dict.get:

import numpy as np

df = df.applymap(lambda x: x.get('value', np.nan) if isinstance(x, dict) else np.nan)

This will also handle the potential problems that arise when x is not a dict.
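Putting it together with the sample bindings from the question (trimmed to one row), a quick sanity check might look like this:

```python
import numpy as np
import pandas as pd

res = {'head': {'vars': ['id', 'marque', 'modele']},
       'results': {'bindings': [{'id': {'type': 'literal', 'value': '1362'},
                                 'marque': {'type': 'literal', 'value': 'PEUGEOT'},
                                 'modele': {'type': 'literal', 'value': '206'}}]}}

df = pd.DataFrame(res['results']['bindings'], columns=res['head']['vars'])
# Extract the 'value' entry from each cell dict, falling back to NaN
df = df.applymap(lambda x: x.get('value', np.nan) if isinstance(x, dict) else np.nan)
print(df)
#      id   marque modele
# 0  1362  PEUGEOT    206
```

Note that on pandas 2.1+, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same function.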


2 Comments

It works, and it's very efficient. Thank you! :) Edit: thanks for your advice on the second part of your answer, it's useful and yes, it can happen. – Clément
@ClementB Glad it helped.

You can use json_normalize, which handles missing keys nicely by filling in NaN:

d = {'head': {'vars': ['id', 'marque', 'modele']},
     'results': {'bindings': [{'id': {'type': 'literal', 'value': '1362'},
                               'marque': {'type': 'literal', 'value': 'PEUGEOT'},
                               'modele': {'type': 'literal', 'value': '206'}},
                              {'id': {'type': 'literal', 'value': '1362'},
                               'marque': {'type': 'literal', 'value': 'PEUGEOT'}}]}}

from pandas.io.json import json_normalize

df = json_normalize(d['results']['bindings']).filter(like='value')
# regex=False so the leading dot is matched literally
df.columns = df.columns.str.replace('.value', '', regex=False)
print(df)
     id   marque modele
0  1362  PEUGEOT    206
1  1362  PEUGEOT    NaN
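A side note: since pandas 1.0, json_normalize is exposed at the top level as pd.json_normalize, and the pandas.io.json import path is deprecated. The same answer in the modern spelling:

```python
import pandas as pd

d = {'results': {'bindings': [{'id': {'type': 'literal', 'value': '1362'},
                               'marque': {'type': 'literal', 'value': 'PEUGEOT'},
                               'modele': {'type': 'literal', 'value': '206'}},
                              {'id': {'type': 'literal', 'value': '1362'},
                               'marque': {'type': 'literal', 'value': 'PEUGEOT'}}]}}

# Flattens nested dicts into dotted column names like 'id.value',
# then keeps only the '.value' columns
df = pd.json_normalize(d['results']['bindings']).filter(like='value')
df.columns = df.columns.str.replace('.value', '', regex=False)
print(df)
#      id   marque modele
# 0  1362  PEUGEOT    206
# 1  1362  PEUGEOT    NaN
```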

