
Suppose I have a list of dicts:

    my_lists = [
        {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
        {'rank': 18, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'},
        {'rank': 21, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'},
        {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
        {'rank': 128, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'},
        {'rank': 136, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'},
        {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
        {'rank': 10, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B07FPVR858'},
        {'rank': 16, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B0735C9RDZ'},
    ]

I want to group by asin and keyword_name, since both appear more than once in the list of dicts. My goal is a DataFrame that looks like this:

    keyword_name      volume   B01MG0ORBL   B0735C9RDZ   B07FPVR858
    mens wallet       456677   2 126        18 128       21 136
    leather wallets   23                    19
    wallets for men   566                   16           10
     

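My initial attempt was:

    import pandas as pd

    # fails: iterating over a dict yields its keys (strings), so d['asin'] indexes a string
    d = [{d['asin']: d['rank'] for d in l} for l in my_lists]
    df = pd.DataFrame(d)

    # save as xlsx file (the with-block closes the writer so the file is actually written)
    with pd.ExcelWriter(f"{path}/sheet.xlsx", engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Organic", startrow=0, header=True, index=False)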

But this doesn't work: it raises TypeError: string indices must be integers.
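A minimal sketch of why the comprehension fails, using a hypothetical two-row subset of my_lists: iterating over a dict yields its keys, so inside the inner loop d is a string such as 'rank', and 'rank'['asin'] raises the TypeError. Indexing each dict directly avoids the error.

```python
# Hypothetical two-row subset of the question's my_lists, for illustration only
my_lists = [
    {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
]

# Iterating over a dict yields its keys, which are strings
keys = [k for k in my_lists[0]]

# In the original comprehension, d is therefore a key such as 'rank',
# and 'rank'['asin'] tries to index a string with another string
error_message = None
try:
    [{d['asin']: d['rank'] for d in l} for l in my_lists]
except TypeError as exc:
    error_message = str(exc)

# Indexing each dict directly gives one {asin: rank} mapping per record
pairs = [{row['asin']: row['rank']} for row in my_lists]
print(error_message)
print(pairs)
```

This only fixes the indexing; collecting the repeated asin/keyword_name combinations into one row still needs a pivot.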

1 Answer


You can create a DataFrame and then pivot it, aggregating the ranks of each group into a list:

    import pandas as pd

    df = pd.DataFrame(my_lists)

    df = df.pivot_table(index=['keyword_name', 'volume'],
                        columns='asin',
                        values='rank',
                        aggfunc=list)
    print(df)

    asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
    keyword_name    volume
    leather wallets 23            NaN       [19]        NaN
    mens wallet     456677   [2, 126]  [18, 128]  [21, 136]
    wallets for men 566           NaN       [16]       [10]
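The same list-valued table can also be sketched with groupby plus unstack instead of pivot_table. This is an alternative formulation, not the answer's code, and it uses a hypothetical three-row subset of the question's data: ranks are collected into a list per (keyword_name, volume, asin) group, and the asin level is then moved into the columns.

```python
import pandas as pd

# Hypothetical three-row subset of the question's data
my_lists = [
    {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
]

df = pd.DataFrame(my_lists)

# Collect the ranks of each (keyword_name, volume, asin) group into a list,
# keeping their original order, then pivot the asin level out into columns.
# Combinations that never occur become NaN.
result = (df.groupby(['keyword_name', 'volume', 'asin'])['rank']
            .agg(list)
            .unstack('asin'))
print(result)
```

pivot_table does essentially this grouping and unstacking internally; the explicit version just makes each step visible.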

Or convert the ranks to strings and join them with spaces:

    df = pd.DataFrame(my_lists)

    df = (df.assign(rank=df['rank'].astype(str))
            .pivot_table(index=['keyword_name', 'volume'],
                         columns='asin',
                         values='rank',
                         aggfunc=' '.join,
                         fill_value=''))
    print(df)

    asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
    keyword_name    volume
    leather wallets 23                        19
    mens wallet     456677      2 126     18 128     21 136
    wallets for men 566                       16         10

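Comments

I updated my question. Do you know why printing shows the desired output, but when I save it as an xlsx file, the keyword_name and volume columns are missing?

@ira Remove index=False from to_excel. After the pivot, keyword_name and volume are the DataFrame's index, and index=False tells to_excel not to write the index.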
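Another option, sketched here under the assumption that you want to keep index=False, is to turn the index levels back into ordinary columns with reset_index() before writing. The data is a hypothetical subset of the question's list, and save_organic_sheet is a hypothetical helper that assumes the xlsxwriter package is installed.

```python
import pandas as pd

# Hypothetical subset of the question's data
my_lists = [
    {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'},
    {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'},
]

df = pd.DataFrame(my_lists)
df = (df.assign(rank=df['rank'].astype(str))
        .pivot_table(index=['keyword_name', 'volume'],
                     columns='asin',
                     values='rank',
                     aggfunc=' '.join,
                     fill_value=''))

# keyword_name and volume live in the row index, which index=False discards;
# reset_index() moves them back into regular columns
flat = df.reset_index()
flat.columns.name = None  # drop the leftover 'asin' label on the column axis


def save_organic_sheet(frame, filename):
    """Hypothetical helper: write frame to .xlsx (requires xlsxwriter)."""
    with pd.ExcelWriter(filename, engine="xlsxwriter") as writer:
        frame.to_excel(writer, sheet_name="Organic", startrow=0,
                       header=True, index=False)
```

Usage would be save_organic_sheet(flat, f"{path}/sheet.xlsx"). Keeping the MultiIndex and writing with index=True also works, but Excel then shows keyword_name and volume as merged index cells rather than plain columns.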
