I'd like to learn how to add a data frame column holding a code mapped from multiple columns.
In the partial example below I tried what may be a clumsy approach: get the unique values as a temporary data frame, concatenate a prefix string to each temp row number as a new column, and then join the two data frames.
import pandas as pd
df = pd.DataFrame({'col1' : ['A1', 'A2', 'A1', 'A3'],
                   'col2' : ['B1', 'B2', 'B1', 'B1'],
                   'value' : [100, 200, 300, 400],
                   })
tmp = df[['col1','col2']].drop_duplicates(['col1', 'col2'])
# col1 col2
# 0 A1 B1
# 1 A2 B2
# 3 A3 B1
The first question is: how do I get the row number of each row in tmp (and its value) into a new column of tmp?
And what is the clever, Pythonic way to achieve the result below from df?
dfnew = pd.DataFrame({'col1' : ['A1', 'A2', 'A1', 'A3'],
                      'col2' : ['B1', 'B2', 'B1', 'B1'],
                      'code' : ['CODE0','CODE1', 'CODE0', 'CODE3'],
                      'value' : [100, 200, 300, 400],
                      })
    code col1 col2  value
0  CODE0   A1   B1    100
1  CODE1   A2   B2    200
2  CODE0   A1   B1    300
3  CODE3   A3   B1    400
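For reference, here is a minimal sketch of one way to get that result, assuming the number in each code is meant to be the original index label of the first row in which the pair appears (which is why A3/B1 gets CODE3). The use of `assign` and a left merge is my own choice for illustration, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A1', 'A2', 'A1', 'A3'],
    'col2': ['B1', 'B2', 'B1', 'B1'],
    'value': [100, 200, 300, 400],
})

# Unique (col1, col2) pairs; the index keeps the original row labels 0, 1, 3.
tmp = df[['col1', 'col2']].drop_duplicates()

# Build the code from each surviving index label (assumption: this is the
# "row number" the question refers to).
tmp = tmp.assign(code=['CODE' + str(i) for i in tmp.index])

# A left merge attaches the code to every row and keeps df's row order.
dfnew = df.merge(tmp, on=['col1', 'col2'], how='left')
print(dfnew)
```

The column order of the printed frame (col1, col2, value, code) differs from the listing above, but the values are the same.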
Thanks.
After the answers, and just as an exercise, I kept working on the non-Pythonic version I had in mind, using insights from the great answers, and reached this:
tmp = df[['col1','col2']].drop_duplicates(['col1', 'col2'])
tmp.reset_index(inplace=True)
tmp.drop('index', axis=1, inplace=True)
tmp['code'] = tmp.index.to_series().apply(lambda x: 'code' + format(x, '04d'))
dfnew = pd.merge(df, tmp, on=['col1', 'col2'])
At the time of posting this question, I did not realize that it would be nicer to reset the index and get a fresh sequence instead of the original index numbers.
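A fresh sequence can also be produced without the temporary frame. This is a minimal sketch, assuming a pandas version that provides `GroupBy.ngroup`; the 'CODE' prefix without zero padding is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A1', 'A2', 'A1', 'A3'],
    'col2': ['B1', 'B2', 'B1', 'B1'],
    'value': [100, 200, 300, 400],
})

# ngroup() numbers each distinct (col1, col2) pair from 0 upward;
# sort=False numbers the pairs in order of first appearance.
group_number = df.groupby(['col1', 'col2'], sort=False).ngroup()

# Prefix the group number to form the code column.
dfnew = df.assign(code='CODE' + group_number.astype(str))
print(dfnew)
```

Note that the codes here run consecutively (A3/B1 becomes CODE2), unlike the index-based numbering in the expected result above.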
I tried some variations, but I could not work out how to chain 'reset_index' and 'drop' in a single command.
I'm starting to enjoy Python. Thank you all.
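For what it is worth, `reset_index` accepts a `drop=True` argument that discards the old index instead of turning it into an 'index' column, so the separate `drop` call is not needed. A minimal sketch of the same exercise written that way (the list comprehension and the left merge are my substitutions, not part of the original attempt):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A1', 'A2', 'A1', 'A3'],
    'col2': ['B1', 'B2', 'B1', 'B1'],
    'value': [100, 200, 300, 400],
})

tmp = (
    df[['col1', 'col2']]
    .drop_duplicates()
    .reset_index(drop=True)  # fresh 0..n-1 index, old index discarded
)

# Zero-padded code built from the fresh row number.
tmp = tmp.assign(code=['code' + format(i, '04d') for i in tmp.index])

# A left merge keeps the rows of df in their original order.
dfnew = df.merge(tmp, on=['col1', 'col2'], how='left')
print(dfnew)
```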
