
I have a big JSON file and I want to convert it to tabular form. I am trying to flatten the data into a DataFrame using json_normalize. So far I have this:

code so far

I want to further flatten the submissions and products data into columns. I tried this:

submission_data = pd.json_normalize(data=rawData['results'], record_path=rawData['results']['submissions'], meta=['application_number', 'sponsor_name'], errors='ignore')
submission_data.head(3)

But I am getting this error: TypeError: list indices must be integers or slices, not str

Any input on this would be helpful.

1 Answer

As submissions and products are lists (and not objects with a regular structure), json_normalize will leave them untouched. Also, given that they are lists, can you make sure that they always have the same number of elements in every record? If not, distributing them across columns makes no sense. If submissions and products are pairs (i.e. if every submission corresponds to one product), you can consider distributing them across rows instead (a melting-style strategy).
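One way to check this before deciding on a layout is to compare the list lengths per record. This is only a sketch: the `records` below are made up for illustration, and the field names inside each submission and product are assumptions, not taken from your file.

```python
import pandas as pd

# Hypothetical records shaped like the data in the question (values are invented)
records = [
    {"application_number": "A1",
     "submissions": [{"type": "ORIG"}, {"type": "SUPPL"}],
     "products": [{"name": "x"}, {"name": "y"}]},
    {"application_number": "A2",
     "submissions": [{"type": "ORIG"}],
     "products": [{"name": "z"}, {"name": "w"}]},
]
df = pd.DataFrame(records)

# Number of elements in each nested list, per record
n_sub = df["submissions"].map(len)
n_prod = df["products"].map(len)

# Spreading across columns only makes sense if every record has the same count
same_count_everywhere = bool(n_sub.nunique() == 1)

# Pairing submissions with products needs equal counts within each record
pairs_possible = bool((n_sub == n_prod).all())

# Records where the counts differ
mismatched = df.loc[n_sub != n_prod, "application_number"].tolist()
```

If `mismatched` is not empty, a row-per-item layout with separate tables for submissions and products is probably the safer choice.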

Finally, regarding the error: raw_data seems to be a list of objects that each contain a 'results' field. That means you cannot retrieve raw_data['results'] directly, only raw_data[0]['results'], which gets the results from the first object.
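Separately, note that `record_path` in `pd.json_normalize` expects a key name (or a list of key names), not the nested value itself, so indexing into the list to build it would raise the same TypeError. A minimal sketch, assuming raw_data has the list shape described above and using invented values (if your raw_data turns out to be a dict, use raw_data['results'] directly):

```python
import pandas as pd

# Hypothetical data: a list whose first object holds the 'results' field
raw_data = [
    {"results": [
        {"application_number": "A1", "sponsor_name": "Acme",
         "submissions": [{"submission_type": "ORIG"}, {"submission_type": "SUPPL"}]},
        {"application_number": "A2", "sponsor_name": "Beta",
         "submissions": [{"submission_type": "ORIG"}]},
    ]}
]

results = raw_data[0]["results"]

# record_path names the key that holds the list to expand into rows;
# meta lists the parent fields to repeat on every expanded row
submission_data = pd.json_normalize(
    results,
    record_path="submissions",
    meta=["application_number", "sponsor_name"],
    record_prefix="submissions.",
    errors="ignore",
)
```

This produces one row per submission, with the application number and sponsor name repeated on each row.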

Adding a solution proposition

Given your data structure, what I would do is the following:

  1. normalize the raw_data as you do in the notebook.
  2. For each line of the resulting dataframe:
     a. normalize the JSON in the 'submissions' field
     b. change the column names of that resulting dataframe to 'submissions.<column_name>'
     c. add a column with value equal to the application number of the line you are evaluating
     d. add that resulting dataframe to a list, collecting all such dataframes
  3. concatenate those dataframes
  4. merge the original dataframe with the concatenated one using 'application_number' as the key, and drop the submissions column.

Repeat the process for the 'products'; however, unless you know the relationship between submissions and products, there is no clear way of merging the dataframes you get:

  • If they have no relationship except for being under the same application number, you basically get separate datasets for each.
  • If there is a one-to-one relationship, you can just merge them by index (concatenate each line)

In code:

import pandas as pd

# raw_data: the records you already normalize in your notebook
df = pd.json_normalize(raw_data)

submissions = []
products = []

for _, line in df.iterrows():
    # flatten this record's submissions and prefix the new column names
    temp_df_sub = pd.json_normalize(line['submissions']).add_prefix('submissions.')
    temp_df_sub['application_number'] = line['application_number']
    submissions.append(temp_df_sub)

    # same for this record's products
    temp_df_prod = pd.json_normalize(line['products']).add_prefix('products.')
    temp_df_prod['application_number'] = line['application_number']
    products.append(temp_df_prod)

# ignore_index gives both frames a clean 0..n-1 index
submissions_df = pd.concat(submissions, ignore_index=True)
products_df = pd.concat(products, ignore_index=True)

# the nested lists are now flattened, so drop them from the original frame
base_df = df.drop(columns=['submissions', 'products'])


# if one-to-one relationship between submissions and products
# (rows are paired by position; drop the duplicate key before joining)
sub_prod_df = pd.concat([submissions_df, products_df.drop(columns='application_number')], axis=1)
final_df = base_df.merge(sub_prod_df, on='application_number')


# if no relationship
final_sub_df = submissions_df.merge(base_df, on='application_number')
final_prod_df = products_df.merge(base_df, on='application_number')


3 Comments

Thanks. Yes, the lists always have the same number of elements. These lists in turn contain dictionaries, which I want to extract into table columns alongside the other data. The structure of the submissions is as follows:
And are submissions and products related in a one-to-one fashion?
But anyway, you would then have the same problem with application_docs, which is also a list of objects; is that one also guaranteed to have the same number of elements in every instance?
