
Let's say my dataframe looks like this.

date        app_id  country  val1  val2  val3  val4
2016-01-01  123     US       50    70    80    90
2016-01-02  123     US       60    80    90    100
2016-01-03  123     US       70    88    99    11

I want to dump this into a nested dictionary or even a JSON object as follows:

{
    country:
    {
        app_id:
        {
            date: [val1, val2, val3, val4]
        }
    }
}

That way, calling my_dict['US'][123]['2016-01-01'] would return the list [50, 70, 80, 90].

Is there an elegant way to go about doing this? I'm aware of Pandas's to_dict() function but I can't seem to get around nesting dictionaries.


2 Answers


First, build the dataframe you need, then pass it to the recur_dictify helper from DSM.

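# Select the value columns with a list; tuple-style selection fails on recent pandas.
# The group keys stay in the index, and reset_index() below turns them into columns.
dd = df.groupby(['country', 'app_id', 'date'])[['val1', 'val2', 'val3', 'val4']].apply(lambda x: x.values.tolist()[0]).to_frame()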

def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
    return d


recur_dictify(dd.reset_index())
Out[711]: 
{'US': {123: {'2016-01-01': [50, 70, 80, 90],
   '2016-01-02': [60, 80, 90, 100],
   '2016-01-03': [70, 88, 99, 11]}}}
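
The question also asks about JSON. A minimal sketch of one way to get there, assuming a nested result shaped like the output above: JSON object keys must be strings, and values coming out of pandas can be NumPy scalars or arrays, so the hypothetical helper `to_jsonable` (not part of pandas or of this answer) normalizes both before calling `json.dumps`. The sample dictionary is typed in by hand here to keep the block self-contained.

```python
import json


def to_jsonable(obj):
    """Recursively convert a nested dict into JSON-friendly Python types."""
    if isinstance(obj, dict):
        # JSON object keys must be strings, so 123 becomes "123".
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if hasattr(obj, "tolist"):
        # NumPy arrays and NumPy scalars both expose tolist().
        return obj.tolist()
    return obj


# Hand-written stand-in for the nested dictionary shown above.
nested = {
    "US": {
        123: {
            "2016-01-01": [50, 70, 80, 90],
            "2016-01-02": [60, 80, 90, 100],
            "2016-01-03": [70, 88, 99, 11],
        }
    }
}

json_text = json.dumps(to_jsonable(nested), indent=2)
round_trip = json.loads(json_text)
```

Note that after the round trip the app_id key is the string "123", not the integer 123, so lookups become round_trip['US']['123']['2016-01-01'].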

update

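Actually, this might work with a simple nested dictionary. The loops below index each row by position, so they assume the column order app_id, country, date, val1 to val4 (the order of the sample dataframe at the end of this answer):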

import pandas as pd
from collections import defaultdict

nested_dict = lambda: defaultdict(nested_dict)
output = nested_dict()

for lst in df.values:
    output[lst[1]][lst[0]][lst[2]] = lst[3:].tolist()

Or:

output = defaultdict(dict)

for lst in df.values:
    try:
        # Works once the inner dict for this (country, app_id) pair exists
        output[lst[1]][lst[0]].update({lst[2]: lst[3:].tolist()})
    except KeyError:
        # First row for this pair: create the inner dict with its first entry
        output[lst[1]][lst[0]] = {lst[2]: lst[3:].tolist()}

Or:

output = defaultdict(dict)

for lst in df.values:
    if output.get(lst[1], {}).get(lst[0]) is None:
        output[lst[1]][lst[0]] = {}
    output[lst[1]][lst[0]].update({lst[2]: lst[3:].tolist()})

output
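
The loops above pick fields by position, which silently produces the wrong nesting if the columns come in a different order (the question lists date first, for example). A minimal sketch of a name-based variant, assuming the column names from the question; the `value_cols` list and the `nested_dict` function are illustrative names, not part of the original answer:

```python
from collections import defaultdict

import pandas as pd

# Sample data in the question's column order (date first).
df = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02", "2016-01-03"],
    "app_id": [123, 123, 123],
    "country": ["US", "US", "US"],
    "val1": [50, 60, 70],
    "val2": [70, 80, 88],
    "val3": [80, 90, 99],
    "val4": [90, 100, 11],
})

value_cols = ["val1", "val2", "val3", "val4"]


def nested_dict():
    """A defaultdict whose missing keys create further nested defaultdicts."""
    return defaultdict(nested_dict)


output = nested_dict()

# itertuples yields one namedtuple per row, so fields are read by name.
for row in df.itertuples(index=False):
    values = [int(getattr(row, col)) for col in value_cols]
    output[row.country][int(row.app_id)][row.date] = values
```

Because every field is looked up by name, output['US'][123]['2016-01-01'] gives the expected list regardless of how the columns are ordered in df.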

Here is my old solution. We use df.groupby to group the dataframe by country and app_id. For each group we collect the remaining columns (everything except country and app_id) and use defaultdict(dict) to add the data to the output dictionary in nested form.

import pandas as pd
from collections import defaultdict

output = defaultdict(dict)

groups = ["country","app_id"]
cols = [i for i in df.columns if i not in groups]

for i,subdf in df.groupby(groups):
    data = subdf[cols].set_index('date').to_dict("split") #filter away unwanted cols
    d = dict(zip(data['index'],data['data'])) 
    output[i[0]][i[1]] = d # assign country=level1, app_id=level2

output

This returns:

{'FR': {123: {'2016-01-01': [10, 20, 30, 40]}},
 'US': {123: {'2016-01-01': [50, 70, 80, 90],
   '2016-01-02': [60, 80, 90, 100],
   '2016-01-03': [70, 88, 99, 11]},
  124: {'2016-01-01': [10, 20, 30, 40]}}}

and output['US'][123]['2016-01-01'] returns:

[50, 70, 80, 90]

given this dataframe:

df = pd.DataFrame.from_dict({'app_id': {0: 123, 1: 123, 2: 123, 3: 123, 4: 124},
 'country': {0: 'US', 1: 'US', 2: 'US', 3: 'FR', 4: 'US'},
 'date': {0: '2016-01-01',
  1: '2016-01-02',
  2: '2016-01-03',
  3: '2016-01-01',
  4: '2016-01-01'},
 'val1': {0: 50, 1: 60, 2: 70, 3: 10, 4: 10},
 'val2': {0: 70, 1: 80, 2: 88, 3: 20, 4: 20},
 'val3': {0: 80, 1: 90, 2: 99, 3: 30, 4: 30},
 'val4': {0: 90, 1: 100, 2: 11, 3: 40, 4: 40}})
