0

I first try to normalize the data:

df = pd.json_normalize(balance_sheet_data_qt)

then I have tried flattening it using this answer, How to flatten a pandas dataframe with some columns as json? but doesnt seem to do anything.

json_struct = json.loads(df .to_json(orient="records"))    
#df_flat = pd.json_normalize(json_struct) 

Also tried this How to read and normalize following json in pandas?, but had an issue on what columns to use assign on.

sample data before normalizing aka balanace_sheet_data_qt

   {'balanceSheetHistoryQuarterly': {'AAPL': [{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 'commonStock': 51744000000, 'otherCurrentAssets': 13687000000, 'retainedEarnings': 14301000000, 'otherLiab': 56042000000, 'treasuryStock': 179000000, 'otherAssets': 43270000000, 'cash': 36010000000, 'totalCurrentLiabilities': 132507000000, 'shortLongTermDebt': 7762000000, 'otherStockholderEquity': 179000000, 'propertyPlantEquipment': 37933000000, 'totalCurrentAssets': 154106000000, 'longTermInvestments': 118745000000, 'netTangibleAssets': 66224000000, 'shortTermInvestments': 40816000000, 'netReceivables': 58620000000, 'longTermDebt': 99281000000, 'inventory': 4973000000, 'accountsPayable': 63846000000}}, {'2020-09-26': {'totalLiab': 258549000000, 'totalStockholderEquity': 65339000000, 'otherCurrentLiab': 47867000000, 'totalAssets': 323888000000, 'commonStock': 50779000000, 'otherCurrentAssets': 11264000000, 'retainedEarnings': 14966000000, 'otherLiab': 46108000000, 'treasuryStock': -406000000, 'otherAssets': 33952000000, 'cash': 38016000000, 'totalCurrentLiabilities': 105392000000, 'shortLongTermDebt': 8773000000, 'otherStockholderEquity': -406000000, 'propertyPlantEquipment': 45336000000, 'totalCurrentAssets': 143713000000, 'longTermInvestments': 100887000000, 'netTangibleAssets': 65339000000, 'shortTermInvestments': 52927000000, 'netReceivables': 37445000000, 'longTermDebt': 98667000000, 'inventory': 4061000000, 'accountsPayable': 42296000000}}, {'2020-06-27': {'totalLiab': 245062000000, 'totalStockholderEquity': 72282000000, 'otherCurrentLiab': 39945000000, 'totalAssets': 317344000000, 'commonStock': 48696000000, 'otherCurrentAssets': 10987000000, 'retainedEarnings': 24136000000, 'otherLiab': 47606000000, 'treasuryStock': -550000000, 'otherAssets': 32836000000, 'cash': 33383000000, 'totalCurrentLiabilities': 95318000000, 'shortLongTermDebt': 7509000000, 'otherStockholderEquity': -550000000, 'propertyPlantEquipment': 43851000000, 'totalCurrentAssets': 140065000000, 'longTermInvestments': 100592000000, 'netTangibleAssets': 72282000000, 'shortTermInvestments': 59642000000, 'netReceivables': 32075000000, 'longTermDebt': 94048000000, 'inventory': 3978000000, 'accountsPayable': 35325000000}}, {'2020-03-28': {'totalLiab': 241975000000, 'totalStockholderEquity': 78425000000, 'otherCurrentLiab': 42048000000, 'totalAssets': 320400000000, 'commonStock': 48032000000, 'otherCurrentAssets': 15691000000, 'retainedEarnings': 33182000000, 'otherLiab': 48745000000, 'treasuryStock': -2789000000, 'otherAssets': 33868000000, 'cash': 40174000000, 'totalCurrentLiabilities': 96094000000, 'shortLongTermDebt': 10392000000, 'otherStockholderEquity': -2789000000, 'propertyPlantEquipment': 43986000000, 'totalCurrentAssets': 143753000000, 'longTermInvestments': 98793000000, 'netTangibleAssets': 78425000000, 'shortTermInvestments': 53877000000, 'netReceivables': 30677000000, 'longTermDebt': 89086000000, 'inventory': 3334000000, 'accountsPayable': 32421000000}}]}}

sample data after normalizaing.

,balanceSheetHistoryQuarterly.AAPL
0,"[{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 

list of columns I want:

'totalLiab'
 'totalStockholderEquity'
 'otherCurrentLiab'
 'totalAssets'
 'commonStock'
 'otherCurrentAssets'
 'retainedEarnings'
 'otherLiab'
 'treasuryStock'
 'otherAssets'
 'cash'
 'totalCurrentLiabilities'
 'shortLongTermDebt'
 'otherStockholderEquity'
 'propertyPlantEquipment'
 'totalCurrentAssets'
 'propertyPlantEquipment'
 'totalCurrentAssets'
 'longTermInvestments'
 'netTangibleAssets'
 'shortTermInvestments'
 'netReceivables'
 'longTermDebt'
 'inventory'
 'accountsPayable'

I am trying to put this into dataframe/tabular format. I think the first row balanceSheetHistoryQuarterly.AAPL or date column may be throwing it off.

Any help is appreciated.

4
  • can you please post sample data before normalizing? Commented Feb 26, 2021 at 2:43
  • okay ive added that Commented Feb 26, 2021 at 2:50
  • I think your sample data in this question is broken, it's not a valid json. Can you provide balance_sheet_data_qt and columns you want? Commented Feb 26, 2021 at 9:02
  • okay ive added the full balance_sheet_data_qt and the columns I want. Commented Feb 26, 2021 at 14:15

1 Answer 1

1

As the structure of data is like below

{
    'balanceSheetHistoryQuarterly': {
        'AAPL': [
            {
                '2020-12-26': {
                    'totalLiab': 287830000000,
                    'totalStockholderEquity': 66224000000,
                    'otherCurrentLiab': 55899000000
                }
            }, {
                '2020-06-27':
                {
                    'totalLiab': 245062000000,
                    'totalStockholderEquity': 72282000000,
                    'otherCurrentLiab': 39945000000
                }
            }
        ]
    }
}

You have to create a new array and iterate through AAPL and add its values to your list

listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    value = js.values()
    listInput += value
df = pd.json_normalize(listInput)

If you want to pass the date to dataframe, you have to create a new json each times you iterate through js.items() and js.keys()

listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    for key,value in zip(js.keys(), js.values()):
        new_json = value.copy()
        new_json["date"] = key
        listInput.append(new_json)
df = pd.json_normalize(listInput)
Sign up to request clarification or add additional context in comments.

2 Comments

thanks, your awesome! how would I pass the date through into a new col?
@excelguy I edited my answer, you can check it again

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.