1

i have the following dataframe (i have simplified it)

                  Column0 Column1             Type
Asset Code
R0083TX3P3PATX999    0.00    0.00  variable_name_1
R0084TX3P3WTXNM99   55.74   55.74  variable_name_1
R0087KY2P2KY99999  265.35  265.35  variable_name_1
T7001OK2P2OK99999    0.00    0.00  variable_name_2
T7029LA3P3SLA9999    0.00    0.00  variable_name_2
T7032CA5P5SW99999    0.00    0.00  variable_name_2
T7001OK2P2OK99999    0.00    0.00  variable_name_3
T7029LA3P3SLA9999    9.00    9.00  variable_name_3
T7032CA5P5SW99999   14.00   14.00  variable_name_3

it is actually 3 different dataframes that i have joined together after adding the 'type' column so i could tell where it came from

my end goal is to create a nested json that at the top has the "ColumnName" as the key, and then second has the "Asset Code" as the next key, which has a third nested object which is just the associated variables and values.

the intent of this would be that the json could be accessed by data['Column0']['Asset-Code'] and the list of values would be present

the output would be a json object with this format:

{
  "Column1": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 0,
      "variable_name_3": 0
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 0,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  },
  "Column2": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 2,
      "variable_name_3": 3
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 2,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  }
}

I am unsure how to do this, does this mean my new master dataframe that contains all of them needs to be reindexed (or multi indexed?) i was looking at groupby functions as well but was unsure how to apply them, specifically because of that third nested object. initially before that it was very easy because i was just exporting the to_json with orient=columns and it worked perfectly, but only for the two levels of data.

1 Answer 1

2

One way you can do it is by loop through the columns and transform to dict as an index:

res = {col: df.pivot('Asset Code','Type', col)\
                              .replace({np.nan:None})\
                              .to_dict(orient='index') for col in 'Column1','Column2']}

Output

{'Column1': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}},
 'Column2': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}}}

Below the code I used to create the df:

df = pd.DataFrame([
    ['R0083TX3P3PATX999',    0.00,    0.00,  'variable_name_1'],
    ['R0084TX3P3WTXNM99',   55.74 ,  55.74,  'variable_name_1'],
    ['R0087KY2P2KY99999',  265.35 , 265.35,  'variable_name_1'],
    ['T7001OK2P2OK99999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7029LA3P3SLA9999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7032CA5P5SW99999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7001OK2P2OK99999',    0.00 ,   0.00,  'variable_name_3'],
    ['T7029LA3P3SLA9999',    9.00 ,   9.00,  'variable_name_3'],
    ['T7032CA5P5SW99999',   14.00 ,  14.00,  'variable_name_3'],
], columns = ['Asset Code','Column1','Column2', 'Type'])
Sign up to request clarification or add additional context in comments.

1 Comment

sorry, it took so long to reply, just wanted to say that i have implemented this answer and it works fantastically for accomplishing what i need. was also able to loop thru the dictionary to clear out the None values with for k in res.keys(): for k2 in res[k].keys(): filtered = {k2: v for k2, v in res[k][k2].items() if v is not None} res[k][k2].clear() res[k][k2].update(filtered)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.