Pandas get count of value stored in an array in a column

Question

I have a pandas DataFrame in which one of the columns holds a list of keywords. One row in the DataFrame looks like this:

id, jobtitle, company, url, keywords

1, Software Engineer, Facebook, http://xx.xx, [javascript, java, python]

However, the number of keywords per row can range from 1 to 40.

I would like to do some data analysis, such as finding:

what keyword appears most often across the whole dataset
what keywords appear most often for each job title/company

Apart from giving each keyword its own column and dealing with lots of NaN values, is there an easy way to answer these questions with Python (preferably pandas, as it's a DataFrame)?

GeorgesAA · Accepted Answer · 2021-05-09 02:34:00Z


You can map a counting function over the column and tally each keyword in a dictionary, something like this:

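import pandas as pd

# Running tally of how many times each keyword has been seen
keyword_dict = {}

def count_keywords(keywords):
    # Increment the tally for every keyword in one row's list
    for item in keywords:
        if item in keyword_dict:
            keyword_dict[item] += 1
        else:
            keyword_dict[item] = 1

def new_function():
    data = {'keywords':
            [['hello', 'test'], ['test', 'other'], ['test', 'hello']]
            }
    df = pd.DataFrame(data)
    # map is used here only for its side effect of updating keyword_dict
    df.keywords.map(count_keywords)

    print(keyword_dict)

if __name__ == '__main__':
    new_function()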

Output:

{'hello': 2, 'test': 3, 'other': 1}
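
The dictionary approach above covers the overall count. As a possible alternative that also covers the per-job-title and per-company part of the question, here is a minimal sketch using `DataFrame.explode` (available in pandas 0.25 and later) followed by `value_counts`. The sample rows are made up for illustration and only mirror the column names from the question; it assumes the `keywords` column really holds Python lists.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "jobtitle": ["Software Engineer", "Software Engineer", "Data Analyst"],
    "company": ["Facebook", "Google", "Facebook"],
    "keywords": [
        ["javascript", "java", "python"],
        ["python", "java"],
        ["python", "sql"],
    ],
})

# Turn each list element into its own row; the other columns are repeated
exploded = df.explode("keywords")

# Keyword frequency across the whole dataset
overall_counts = exploded["keywords"].value_counts()

# Keyword frequency within each job title and within each company
per_title_counts = exploded.groupby("jobtitle")["keywords"].value_counts()
per_company_counts = exploded.groupby("company")["keywords"].value_counts()

print(overall_counts)
print(per_title_counts)
print(per_company_counts)
```

`overall_counts.idxmax()` then gives the most frequent keyword overall, and the grouped results are Series indexed by (group, keyword), so `per_title_counts["Software Engineer"]` would give the counts for that title alone.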

edited May 9, 2021 at 2:34

answered May 9, 2021 at 1:49


5 Comments

Marc-9 Over a year ago

I changed it from job title to keywords, as that is the column I want to expand, but it always returns an empty set. Here is a screenshot of one of the rows from the dataframe: snipboard.io/mQLDjr.jpg

GeorgesAA Over a year ago

Are you storing a list in the keywords column?

Marc-9 Over a year ago

I believe so; type(df.loc[0]['keywords']) returns list

GeorgesAA Over a year ago

OK, what I wrote above likely won't work with a list; it was made more for a string of text. One thing I have done in the past is map that column, add the words to a dictionary, and increment the value for each occurrence. I'll update my answer.

Marc-9 Over a year ago

I appreciate the help. This worked just as I had hoped.
