I have a pandas DataFrame in which one of the columns holds a list of keywords. One row in the DataFrame looks like this:

id, jobtitle, company, url, keywords

1, Software Engineer, Facebook, http://xx.xx, [javascript, java, python]

However, the number of keywords in a row can range from 1 to 40.

I would like to do some data analysis, such as:

  1. Which keyword appears most often across the whole dataset?
  2. Which keywords appear most often for each job title/company?

Apart from giving each keyword its own column and dealing with lots of NaN values, is there an easy way to answer these questions with Python (preferably pandas, as it's a DataFrame)?

1 Answer

You can do something like this:

import pandas as pd

# Running tally of how many times each keyword has been seen.
keyword_dict = {}

def count_keywords(keyword):
    # `keyword` is the list of keywords stored in a single row.
    for item in keyword:
        if item in keyword_dict:
            keyword_dict[item] += 1
        else:
            keyword_dict[item] = 1

def new_function():
    data = {'keywords':
            [['hello', 'test'], ['test', 'other'], ['test', 'hello']]
            }
    df = pd.DataFrame(data)
    # map() is used only for its side effect of updating keyword_dict.
    df.keywords.map(count_keywords)

    print(keyword_dict)

if __name__ == '__main__':
    new_function()

Output:

{'hello': 2, 'test': 3, 'other': 1}
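
The dictionary approach above covers the first question only. As a rough sketch of how both questions might be handled inside pandas, one option is `DataFrame.explode` (available from pandas 0.25) followed by `value_counts`. The sample rows and the names `exploded`, `overall_counts` and `per_title_counts` are made up for illustration, and the sketch assumes the `keywords` column really holds Python lists:

```python
import pandas as pd

# Hypothetical sample data mirroring the columns described in the question.
df = pd.DataFrame({
    "jobtitle": ["Software Engineer", "Software Engineer", "Data Analyst"],
    "company": ["Facebook", "Google", "Facebook"],
    "keywords": [
        ["javascript", "java", "python"],
        ["python", "java"],
        ["python", "sql"],
    ],
})

# One row per (original row, keyword); the other columns are repeated.
exploded = df.explode("keywords")

# 1. Keyword frequency across the whole dataset.
overall_counts = exploded["keywords"].value_counts()

# 2. Keyword frequency within each job title.
per_title_counts = exploded.groupby("jobtitle")["keywords"].value_counts()

print(overall_counts)
print(per_title_counts)
```

Grouping by `"company"` instead of `"jobtitle"` should give the per-company counts in the same way, and `overall_counts.idxmax()` returns the single most frequent keyword.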

5 Comments

I changed it from job title to keywords, as that is the column I want to expand, but it always returns an empty set. Here is a screenshot of one of the rows from the dataframe: snipboard.io/mQLDjr.jpg
Are you storing a list in the keyword column?
I believe so, type(df.loc[0]['keywords']) returns a list
OK, what I wrote above likely won't work with a list; it was meant more for a string of text. One thing I have done in the past is map that column, add the words to a dictionary, and increment the value for each occurrence. I'll update my answer.
I appreciate the help, this worked just as I had hoped
