
I have a pandas dataframe like this:

name           salary  skills
Web-master     2000     ['django', 'html', 'java']
Engineer       2700     ['python', 'java', 'sql']
Programmer     2400     ['python', 'css', 'sql']
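To reproduce the example, the sample frame can be built like this. This is a sketch that assumes the skills cells hold real Python lists (not strings) and reuses the variable name data from the loop attempt in the question:

```python
import pandas as pd

# Sample data from the question; each "skills" cell is a Python list
data = pd.DataFrame({
    "name": ["Web-master", "Engineer", "Programmer"],
    "salary": [2000, 2700, 2400],
    "skills": [["django", "html", "java"],
               ["python", "java", "sql"],
               ["python", "css", "sql"]],
})
print(data)
```

If the frame was read from a CSV file, the skills cells will usually be strings such as "['django', 'html', 'java']" and would need converting first, for example with data["skills"].apply(ast.literal_eval).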

I want to create a dataframe like this:

name    count  meansalary
django   1       2000
python   2       2550
java     2       2350

I tried this:

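import pandas as pd

totals = {}  # skill -> [count, salary total]

for salary, skills in zip(data['salary'], data['skills']):
    for skill in skills:
        if skill in totals:          # skill seen before: accumulate
            totals[skill][0] += 1
            totals[skill][1] += salary
        else:                        # first occurrence: start the tally
            totals[skill] = [1, salary]

skildf = pd.DataFrame.from_dict(totals, orient='index', columns=['count', 'salary'])
skildf['meansalary'] = skildf['salary'] / skildf['count']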
  • Please share the code you tried. Commented May 6, 2021 at 9:23

1 Answer

out = (df.explode("skills")
         .groupby("skills")
         .agg(count=("skills", "size"), meansalary=("salary", "mean")))

Explode the lists so that each skill gets its own row, then group by the individual skills. Aggregating each group's size gives the count column, and the mean of salary gives meansalary.

This gives:

>>> out

        count  meansalary
skills
css         1      2400.0
django      1      2000.0
html        1      2000.0
java        2      2350.0
python      2      2550.0
sql         2      2550.0
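A self-contained sketch of the answer, assuming the sample data from the question. It prints the intermediate explode result so the one-row-per-skill step is visible, and then, as an optional extra that the answer itself does not do, uses reset_index and rename to turn the skills index into a name column matching the layout requested in the question:

```python
import pandas as pd

# Sample data from the question (the answer calls the frame df)
df = pd.DataFrame({
    "name": ["Web-master", "Engineer", "Programmer"],
    "salary": [2000, 2700, 2400],
    "skills": [["django", "html", "java"],
               ["python", "java", "sql"],
               ["python", "css", "sql"]],
})

# Step 1: one row per (person, skill) pair; the original index label
# is repeated for every skill that came from the same list
exploded = df.explode("skills")
print(exploded[["skills", "salary"]])

# Step 2: group on the individual skill and aggregate (same call as the answer)
out = (exploded
       .groupby("skills")
       .agg(count=("skills", "size"), meansalary=("salary", "mean")))

# Step 3 (optional): move the skill labels back into a column named "name"
result = out.reset_index().rename(columns={"skills": "name"})
print(result)
```

Note that meansalary comes out as a float column because mean always returns floating-point values for an integer column.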