I have multiple newline-delimited JSON files (they are nested) and I need to join all of them into one big CSV. They all have the same schema (field names).

I have read several solutions on flattening the nested structure and then appending the results, but I need to append them with a new column saying which file (table) each row comes from.

Table A has

Column 1    Column 2
apple       orange
Cell 3      Cell 4

Table B has

Column 1    Column 2
walmart     target
Cell 3      Cell 4

Then the CSV would say

Column A    Column B    Column C
TABLE 1     apple       orange
TABLE 2     walmart     target

I'm thinking of creating the CSV file with headers like ID, Date, Store, Product and then using insert, but I'm not sure how to do this because most of the tutorials I found only convert JSON into a pandas DataFrame.

I have tried to use pd.DataFrame and json_normalize to unnest my JSON files and put them in a DataFrame, but I keep running into problems. I don't know what to do next. I think it might be because my JSON files are not in the right JSON format? My JSON file is like this:

{
    "idA": {
        "property 1": "...",
        "property 2": "...",
        "property 3": [
            {
                "A": "B",
                "C": "D"
            }
        ]
    },
    "idB": {
        .....
    }
}

Think of idA and idB as being like the id part of a URL, really long. I'm very new to this and quite overwhelmed, please help :((

  • Can you share a JSON sample in the same format, if possible? Commented Feb 8, 2023 at 22:06
  • Welcome to StackOverflow. Please provide a minimal reproducible example, including a small example input data and the corresponding expected result. Please make the input (code, data) easy to copy and paste, so we can help you more easily. Commented Feb 12, 2023 at 1:06

1 Answer

I think you need to be a bit more specific with your question (i.e., what error or undesired result you are getting). That way we can help you sort out that specific problem.

That said, I notice that your JSON file is not a list of dictionaries but a single dictionary whose values are the objects holding the information. pd.json_normalize produces one row per dictionary when it is given a list of dictionaries; given one outer dictionary it flattens everything into a single wide row, which is perhaps why it is not working as you expect. (You can refer to the function's docs to understand this behaviour further.)
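To illustrate the point above, here is a minimal sketch that reshapes a dictionary of objects into a list of dictionaries before calling pd.json_normalize. The sample values and the "id" column name are made up for illustration; only the overall shape follows the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the file in the question.
outer_dict = {
    "idA": {"property 1": "x", "property 2": "y", "property 3": [{"A": "B", "C": "D"}]},
    "idB": {"property 1": "p", "property 2": "q", "property 3": [{"A": "E", "C": "F"}]},
}

# Turn the dictionary of objects into a list of dictionaries,
# keeping each outer key in an "id" field (an assumed column name).
records = [{"id": key, **value} for key, value in outer_dict.items()]

# One row per record; the list under "property 3" is kept as-is in its cell.
df = pd.json_normalize(records)
print(df)
```

With this shape each outer key becomes its own row instead of being folded into the column names.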

If your document is indeed one big dictionary containing dictionaries with the needed information, you could loop over the outer dictionary, turn each inner dictionary into a DataFrame with pd.DataFrame.from_dict(), and add the identifying column on each iteration. Append each new DataFrame to a list and use pd.concat() to create the final df, like so:

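import pandas as pd

# outer_dict is the parsed JSON document (a dictionary of dictionaries)
df_list = []
for key in outer_dict:
    inside_dict = outer_dict[key]
    df = pd.DataFrame.from_dict(inside_dict)
    df['doc_name'] = key  # record which outer key these rows came from
    df_list.append(df)
# stack the frames row-wise (the default axis=0) so each document adds rows
final_df = pd.concat(df_list, ignore_index=True)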

As for the list containing a dictionary in one of the dictionary's properties, you could flatten that into the df by using json_normalize(), or access that property directly and create a df that you could join onto the main df.
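As a rough sketch of the json_normalize() route, the record_path and meta parameters can expand the nested list into rows while carrying the parent fields along. The records below are invented sample data, and the "id" field is an assumed name for the outer key.

```python
import pandas as pd

# Hypothetical records: each one has a nested list under "property 3".
records = [
    {"id": "idA", "property 1": "x", "property 2": "y",
     "property 3": [{"A": "B", "C": "D"}]},
    {"id": "idB", "property 1": "p", "property 2": "q",
     "property 3": [{"A": "E", "C": "F"}, {"A": "G", "C": "H"}]},
]

# record_path names the nested list to expand into rows;
# meta lists the parent fields to repeat on every expanded row.
flat = pd.json_normalize(
    records,
    record_path="property 3",
    meta=["id", "property 1", "property 2"],
)
print(flat)
```

Each element of the nested list becomes its own row, so a record with two list entries yields two rows sharing the same id.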

You can also save the contents of the dict without the keys into a list and use pd.read_json() for reading the resulting string:

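import json
from io import StringIO
import pandas as pd

dict_list = []
for key in outer_dict:
    inside_dict = outer_dict[key]
    dict_list.append(inside_dict)
# json.dumps gives valid JSON; str() would give a Python repr with single quotes
json_string = json.dumps(dict_list)
pd.read_json(StringIO(json_string))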
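For the original goal of combining several files into one CSV with a column naming the source, the ideas above could be put together roughly as follows. This is only a sketch under assumptions: each file is taken to hold one JSON dictionary of the shape shown in the question, and the file names, the "source" and "id" column names, and the helper functions load_table and combine are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Read one JSON file and return a flat DataFrame tagged with its file name."""
    with open(path, encoding="utf-8") as handle:
        outer_dict = json.load(handle)
    records = [{"id": key, **value} for key, value in outer_dict.items()]
    df = pd.json_normalize(records)
    df.insert(0, "source", Path(path).stem)  # first column: which file the row came from
    return df


def combine(paths, csv_path):
    """Stack the per-file frames row-wise and write them to a single CSV."""
    frames = [load_table(p) for p in paths]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(csv_path, index=False)
    return combined


# Demo with two small made-up files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "table_a.json").write_text(
        json.dumps({"idA": {"Column 1": "apple", "Column 2": "orange"}}), encoding="utf-8")
    (tmp / "table_b.json").write_text(
        json.dumps({"idB": {"Column 1": "walmart", "Column 2": "target"}}), encoding="utf-8")
    combined = combine(sorted(tmp.glob("*.json")), tmp / "combined.csv")
    csv_text = (tmp / "combined.csv").read_text(encoding="utf-8")

print(combined)
```

If the files really are newline-delimited (one JSON object per line), pd.read_json(path, lines=True) may be a simpler way to load each one before adding the source column.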

Please let me know if this helps.
