2

I want to create two new columns in job_transitions_sample.csv and add the wage data from wage_data_sample.csv for both Title 1 and Title 2:

job_transitions_sample.csv:

                     Title 1                    Title 2  Count
0   administrative assistant             office manager     20
1                 accountant                    cashier      1
2                 accountant          financial analyst     22
4                 accountant          senior accountant     23
6           accounting clerk                 bookkeeper     11
7     accounts payable clerk  accounts receivable clerk      8
8   administrative assistant           accounting clerk      8
9   administrative assistant       administrative clerk     12
...

wage_data_sample.csv

                      title   wage
0                   cashier  17.00
1           sandwich artist  18.50
2                dishwasher  20.00
3                babysitter  20.00
4                   barista  21.50
5               housekeeper  21.50
6    retail sales associate  23.00
7                 bartender  23.50
8                   cleaner  23.50
9                 line cook  23.50
10               pizza cook  23.50
...

I want the end result to look like this:

                      Title 1             Title 2  Count  Wage of Title 1  Wage of Title 2
0    administrative assistant      office manager     20              NaN              NaN
1                  accountant             cashier      1              NaN              NaN
2                  accountant   financial analyst     22              NaN              NaN
...

I'm thinking of building a dictionary and then iterating over each column, but is there a more elegant built-in solution? This is my code so far:

wage_data = pd.read_csv('wage_data_sample.csv')
dict = dict(zip(wage_data.title, wage_data.wage))

2 Answers

1

Use Series.map with a dictionary d. Don't use dict as the variable name, because that shadows Python's built-in dict:

import pandas as pd

df = pd.read_csv('job_transitions_sample.csv')
wage_data = pd.read_csv('wage_data_sample.csv')

# Map each title to its wage; titles missing from d become NaN
d = dict(zip(wage_data.title, wage_data.wage))
df['Wage of Title 1'] = df['Title 1'].map(d)
df['Wage of Title 2'] = df['Title 2'].map(d)
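
As a self-contained illustration of the Series.map approach above, here is a minimal sketch that swaps the two CSV files for small in-memory frames. The rows are copied from the samples in the question; the loop over column names is just one way to avoid repeating the assignment, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-ins for job_transitions_sample.csv and wage_data_sample.csv
df = pd.DataFrame({
    "Title 1": ["administrative assistant", "accountant", "accountant"],
    "Title 2": ["office manager", "cashier", "financial analyst"],
    "Count": [20, 1, 22],
})
wage_data = pd.DataFrame({
    "title": ["cashier", "sandwich artist", "dishwasher"],
    "wage": [17.00, 18.50, 20.00],
})

# title -> wage lookup (named d, not dict, to avoid shadowing the built-in)
d = dict(zip(wage_data["title"], wage_data["wage"]))

# map() looks up every value of the column in d; titles not in d become NaN
for col in ["Title 1", "Title 2"]:
    df[f"Wage of {col}"] = df[col].map(d)

print(df)
```

With this sample, only "cashier" has a wage entry, so every other cell in the two new columns is NaN, which matches the shape of the expected result in the question.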

5 Comments

Why are there warnings? The line df['Wage of Title 1'] = df['Title 1'].map(d) gives: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
@pellypen - Is there any code between the two pd.read_csv calls and the lines that build d and assign the two new columns?
Yes, a lot of data cleaning code for job_transitions_sample.csv.
@pellypen - Does it include any filtering? If so, add .copy(), e.g. df = df[df['col'] > 10].copy()
A bit unrelated, but any idea how this dataset can be used to provide job transition suggestions? I want to perform quantitative and qualitative analysis.
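
The .copy() suggestion from the comments can be sketched as follows. The frame raw, the threshold 10, and the one-entry dictionary are hypothetical stand-ins for the asker's cleaning code, which is not shown in the thread.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame before the cleaning step
raw = pd.DataFrame({
    "Title 1": ["administrative assistant", "accountant", "accountant"],
    "Title 2": ["office manager", "cashier", "financial analyst"],
    "Count": [20, 1, 22],
})
d = {"cashier": 17.00}

# Filtering with a boolean mask yields an object pandas may treat as a slice
# of raw; .copy() makes df an independent DataFrame, so assigning new columns
# to it is unambiguous and does not trigger SettingWithCopyWarning.
df = raw[raw["Count"] > 10].copy()

df["Wage of Title 1"] = df["Title 1"].map(d)
df["Wage of Title 2"] = df["Title 2"].map(d)

print(df)
```

The original raw frame is left untouched; only the filtered copy gains the two wage columns.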
0

You can try two successive merges, one for each title column.

For example, let:

  • df1 : job_transitions_sample.csv

  • df2 : wage_data_sample.csv

    df1.merge(df2.rename(columns={'title': 'Title 1', 'wage': 'Wage of Title 1'}), on='Title 1', how='left').merge(df2.rename(columns={'title': 'Title 2', 'wage': 'Wage of Title 2'}), on='Title 2', how='left')
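
For a runnable version of the two-merge idea, here is a minimal sketch on in-memory sample rows taken from the question. It assumes how='left', so that rows without a wage match are kept with NaN as in the expected result, and it renames the wage columns before each merge so the output gets the requested column names. It also assumes each title appears only once in df2; duplicate titles would multiply rows.

```python
import pandas as pd

# Hypothetical stand-ins for job_transitions_sample.csv (df1)
# and wage_data_sample.csv (df2)
df1 = pd.DataFrame({
    "Title 1": ["administrative assistant", "accountant", "accountant"],
    "Title 2": ["office manager", "cashier", "financial analyst"],
    "Count": [20, 1, 22],
})
df2 = pd.DataFrame({
    "title": ["cashier", "sandwich artist", "dishwasher"],
    "wage": [17.00, 18.50, 20.00],
})

result = df1
for col in ["Title 1", "Title 2"]:
    # Rename so the join key matches and the wage column has its final name
    wages = df2.rename(columns={"title": col, "wage": f"Wage of {col}"})
    # how="left" keeps every row of the left frame, filling NaN when unmatched
    result = result.merge(wages, on=col, how="left")

print(result)
```

A default (inner) merge would instead drop every transition whose title is missing from the wage table, which is why the left join is used here.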
