Transforming List of dicts to pandas dataframe with dynamic header columns

Question

Suppose I have a list of dicts:

    my_lists = [
        {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
        {'rank': 18, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'},
        {'rank': 21, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'},
        {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
        {'rank': 128, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'},
        {'rank': 136, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'},
        {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
        {'rank': 10, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B07FPVR858'},
        {'rank': 16, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B0735C9RDZ'},
    ]

I want to group by asin and keyword_name since they appear more than once in the list of dicts, so my goal is to have a dataframe that looks like this:

    keyword_name      volume   B01MG0ORBL   B0735C9RDZ   B07FPVR858

    mens wallet       456677   2 126        18 128       21 136
    leather wallets   23                    19
    wallets for men   566                   16           10

My initial idea was:

    import pandas as pd

    d = [{d['asin']: d['rank'] for d in l} for l in my_lists]
    df = pd.DataFrame(d)

    # save as xlsx file
    writer = pd.ExcelWriter(f"{path}/sheet.xlsx", engine="xlsxwriter")
    df.to_excel(
        writer, sheet_name="Organic", startrow=0, header=True, index=False
    )
    writer.close()

But this does not work: it fails with TypeError: string indices must be integers.
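A minimal sketch of why the comprehension above raises that error, using a shortened, hypothetical `records` list in place of `my_lists`: each element of the outer list is already a dict, so iterating over it in the inner loop yields its keys (strings), and indexing a string with `'asin'` fails.

```python
# Shortened stand-in for my_lists (hypothetical sample, same shape).
records = [
    {'rank': 2, 'asin': 'B01MG0ORBL'},
    {'rank': 18, 'asin': 'B0735C9RDZ'},
]

# Iterating a dict yields its keys, so the inner loop variable is a string.
inner_items = [item for item in records[0]]
print(inner_items)  # ['rank', 'asin']

# Reproduce the failing comprehension and capture the error message.
try:
    [{d['asin']: d['rank'] for d in l} for l in records]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but the cause is the same: the nested loop treats each dict as if it were a list of dicts.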

jezrael · Accepted Answer · 2022-05-30 13:36:01Z

2

You can create a DataFrame and then pivot it, aggregating the ranks into lists:

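import pandas as pd

df = pd.DataFrame(my_lists)

# one row per (keyword_name, volume), one column per asin, ranks collected into lists
df = df.pivot_table(index=['keyword_name','volume'], 
                    columns='asin', 
                    values='rank', 
                    aggfunc=list)
print(df)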
asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
keyword_name    volume                                 
leather wallets 23            NaN       [19]        NaN
mens wallet     456677   [2, 126]  [18, 128]  [21, 136]
wallets for men 566           NaN       [16]       [10]

Or convert the ranks to strings and join them:

df = pd.DataFrame(my_lists)

df = (df.assign(rank=df['rank'].astype(str))
        .pivot_table(index=['keyword_name','volume'], 
                     columns='asin', 
                     values='rank', 
                     aggfunc=' '.join, 
                     fill_value=''))
print(df)
asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
keyword_name    volume                                 
leather wallets 23                        19           
mens wallet     456677      2 126     18 128     21 136
wallets for men 566                       16         10


2 Comments

ira Over a year ago

I updated my question. Do you know why printing shows the desired output, but when I save it as an xlsx file the keyword_name and volume columns are missing?

jezrael Over a year ago

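@ira Remove index=False from to_excel. After pivoting, keyword_name and volume are in the index, so index=False drops them from the file.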
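As a rough sketch of the point in this comment: after `pivot_table`, `keyword_name` and `volume` are index levels rather than columns. Either write the frame with the index kept, or move the index levels into columns with `reset_index()` first. The `records` sample and the `save_flat` helper below are hypothetical names used only for illustration, and writing the file assumes an Excel engine such as xlsxwriter or openpyxl is installed.

```python
import pandas as pd

# Shortened, hypothetical sample with the same shape as my_lists.
records = [
    {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 18, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'},
    {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
]

# Same pivot as in the answer: ranks converted to strings and joined per cell.
pivoted = (
    pd.DataFrame(records)
    .assign(rank=lambda frame: frame['rank'].astype(str))
    .pivot_table(index=['keyword_name', 'volume'],
                 columns='asin',
                 values='rank',
                 aggfunc=' '.join,
                 fill_value='')
)

# keyword_name and volume live in the index here, not in the columns.
print(pivoted.index.names)

# Move the index levels into ordinary columns and drop the 'asin' axis label.
flat = pivoted.reset_index()
flat.columns.name = None
print(flat)


def save_flat(frame, path):
    """Hypothetical helper: write a flattened frame, where index=False is safe."""
    frame.to_excel(path, sheet_name="Organic", index=False)
```

With the un-flattened `pivoted` frame, calling `to_excel` without `index=False` keeps both index levels in the sheet, which is what the comment suggests.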
