I have the following data in multiple columns:
col1 col2 col3
123456 ['mary','ralph', ''bob'] ['bob','sam']
456789 ['george','fred', susie'] ['ralph','mary', 'bob']
789123 ['mary', bob'] ['bob']
I eventually need a value_counts on each column. To get everything out of the lists, I am trying explode. I can get the values into their columns post-explode, no problem. But, those of you who know about explode, know that my value_counts would then be inflated because of the repetition of values that explode causes when applying it to multiple columns
Explode yields this for example:
col1 col2 col3
123456 mary bob
123456 mary sam
123456 mary george
123456 ralph bob
123456 ralph sam
123456 ralph george...etc.
Obviously, this throws off an accurate value_counts per column which is what I need. I have tried looping the explode over each column and then after each column explode matching the first col and the exploded column and remove duplicates, doesn't work. Always enjoying not being the smartest guy in the room (more to learn) so I sent this question to you pandas guru exploding with ideas. (see what I did there?). Thanks.
Expected output so that I can value_counts all columns except the col1 would be this:
123456 mary bob
123456 ralph sam
123456 bob george
456789 george ralph
456789 fred mary
456789 susie bob
789123 mary bob
789123 bob george