1

I have this dataframe:

Month_Year  City_Name   Chain_Name Product_Name Product_Price
11-2021     London      Aldi       Pasta        2.33
11-2021     Bristol     Spar       Bananas      1.45
10-2021     London      Tesco      Olives       4.12
10-2021     Cardiff     Spar       Pasta        2.25

This dataframe will be displayed in nested collapsible which will expand in the following order:

Date_month:
    City_Name:
         Shop_name:
             Product_name : Price

Theefore, I want to transform the dataframe to the following structure:

{10-2021:
    {London:
        {Aldi:
            {Pasta:2.33}}},
    {Bristol:{
        {Spar:
            {Bananas:1.45}}}}

Essentially I want to groupby every element recursively, starting by purchase month. The closest thing I've found is the solution in this answer, but as dictionary doesn't take duplicate values as I might have duplicate values in every column. Other than that, I am not sure if using a dictionary is the optimal answer for this problem.

My best guess is I have to use other DS type such as some type of tree, but couldn't find any way to do that.

Any advises?

2
  • Could you explain how you'd want to handle duplicate values in your final data structure? One workaround could be to tag a number to the duplicates. For example: { ... "London": { "Aldi 1", "Aldi 2"} ...} Commented Nov 30, 2021 at 12:13
  • I would like to keep the values unchanged; One way could be override built-in dict class to allow duplicate values, but not sure how efficient that solution would be. A tree like structure would be best probably; where each node represent one element of nested collapsible object in the front-end. Commented Nov 30, 2021 at 12:35

1 Answer 1

2

You can group your dataframe by all columns except price, then create your dictionaries in a loop:

# if more than one price for one product in a chain, then calculate mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()

for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            if not key in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            nested_dict[key] = value
        else:
            if not key in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))

Changing your df to show nested dict and mean calculation to:

  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25

You get the output:

{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}
Sign up to request clarification or add additional context in comments.

3 Comments

That worked perfectly. Thanks!
@Tranbi this works perfectly but I'm trying to understand the code as a newbie. How is it that the result dictionary gets updated during each loop in the second for loop when result is only present in the i==0 condition?
@MikeLee Since dictionaries are mutables nested_dict = result[key] assigns nested_dict to the reference of result[key], ie. changing one will change both (think pointer in C++). I recommend you read about how python handles mutables/immutables.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.