I have a DataFrame with many binary variables, and I would like to create a new variable with categorical values based on several of these binary variables.

My DataFrame looks like this:

gov_winner    corp_winner    in part
        1              0           0
        0              1           0
        0              0           1

The variable I would like to create is called winning_party and would look like this:

gov_winner    corp_winner    in part    winning_party
        1              0           0             gov
        0              1           0            corp
        0              0           1         in part

I started trying the following code but haven't had success yet:

 harrington_citations = harrington_citations.assign(winning_party=lambda x: x['gov_winner'] 
 == 1 then x = 'gov' else x == 0)

Using anky_91's answer I get the following error:

TypeError: can't multiply sequence by non-int of type 'str'

  • Are the columns filled only with 1 and 0? Commented Jan 13, 2020 at 15:00
  • [email protected] works? Commented Jan 13, 2020 at 15:08

3 Answers

You can use a dot product:

df.assign(Winner_Party=df.dot(df.columns))
#df.assign(Winner_Party=df @ df.columns)

   gov_winner  corp_winner  in_part Winner_Party
0           1            0        0   gov_winner
1           0            1        0  corp_winner
2           0            0        1      in_part
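
To get from the dot product to the short labels asked for in the question (gov, corp, in part), one option is to run the dot product on the indicator columns only and then map the column names to labels. Limiting the columns also avoids the TypeError from the question, which is what you would expect when a non-numeric column takes part in the multiplication. This is only a sketch: the `year` column, the `indicator_cols` list and the `labels` dict are made up for illustration, and it assumes exactly one 1 per row.

```python
import pandas as pd

# Sample data from the question plus a hypothetical unrelated column.
df = pd.DataFrame({
    "gov_winner": [1, 0, 0],
    "corp_winner": [0, 1, 0],
    "in part": [0, 0, 1],
    "year": [2001, 2005, 2010],  # hypothetical, not part of the new variable
})

# Assumption: these are the only columns that feed the new variable.
indicator_cols = ["gov_winner", "corp_winner", "in part"]

# Assumption: the labels wanted for each indicator column.
labels = {"gov_winner": "gov", "corp_winner": "corp", "in part": "in part"}

# Dot product on the indicator columns only: 1 * name keeps the name,
# 0 * name gives an empty string, and the pieces are concatenated.
indicators = df[indicator_cols].astype(int)
winner_column = indicators.dot(indicators.columns)

# Translate the column names into the short labels.
df["winning_party"] = winner_column.map(labels)
print(df)
```

If a row could contain no 1 at all, the dot product gives an empty string there and `map` turns it into NaN, so it may be worth checking for missing values afterwards.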

3 Comments

I updated my answer with the error I got. A problem may be that the actual data frame I am working with has many variables not involved in this new variable I am working to create. Thanks.
I can make a df with just the variables I am using to create this new variable and see if your answer works...
@GrahamStreich Maybe you have columns that don't contain only 1 and 0; filter those columns out and try again.

How about idxmax? Note that this only selects the first max, so if you have multiple cells equal to 1 per row, you may want to try Jez's solution:

df['Winner_Party']=df.eq(1).idxmax(1)
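
To make the "first max" caveat concrete, here is a small sketch with made-up rows: one row with two 1s and one row with no 1 at all. In the all-zero case idxmax still returns the first column label, so the sketch blanks those rows out with `where`; that masking step is my own addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: row 2 has two 1s, row 3 has none.
df = pd.DataFrame({
    "gov_winner": [1, 0, 1, 0],
    "corp_winner": [0, 1, 1, 0],
    "in part": [0, 0, 0, 0],
})

is_one = df.eq(1)

# idxmax returns the label of the first True in each row.
# For a row with no True it falls back to the first column.
first_match = is_one.idxmax(axis=1)

# Keep the label only where the row really contains a 1.
winner = first_match.where(is_one.any(axis=1))
print(winner)
```

Row 2 comes back as gov_winner only, which is the behaviour the answer warns about.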


If there is always exactly one 1 per row, use DataFrame.dot. You can also keep only the columns that contain nothing but 1 and 0 beforehand:

df1 = df.loc[:, df.isin([0,1,'0','1']).all()].astype(int)
df['Winner_Party'] = df1.dot(df1.columns)

But if there are multiple 1s per row and you need all matched values, add a separator to the column names and then strip the trailing one:

df['Winner_Party'] = df1.dot(df1.columns + ',').str.rstrip(',')

print (df)
   gov_winner  corp_winner  in part Winner_Party
0           1            0        0   gov_winner
1           0            1        0  corp_winner
2           0            0        1      in part
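
For reference, a self-contained sketch of both steps on made-up data: one indicator column stored as strings, a hypothetical `case_name` column that should be ignored, and a row with two 1s. The data and the `binary_mask` name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: gov_winner is stored as strings, case_name is unrelated.
df = pd.DataFrame({
    "gov_winner": ["1", "0", "1"],
    "corp_winner": [0, 1, 1],
    "in part": [0, 0, 0],
    "case_name": ["a", "b", "c"],
})

# True for columns whose values are all 0/1 (as numbers or as strings).
binary_mask = df.isin([0, 1, "0", "1"]).all()

# Keep only those columns and make them integer so the dot product works.
df1 = df.loc[:, binary_mask].astype(int)

# Append a separator to every name, concatenate, then drop the trailing comma.
df["Winner_Party"] = df1.dot(df1.columns + ",").str.rstrip(",")
print(df)
```

The last row ends up with both names joined by a comma, while `case_name` never enters the multiplication.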
