import pandas as pd

test = {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
test = pd.DataFrame(test)
dummy = pd.get_dummies(test['ngrp'], drop_first=True)

This gives me:

   Brooklyn  Manhattan  Queens  Staten Island
0         0          1       0              0
1         1          0       0              0
2         0          0       1              0
3         0          0       0              1
4         0          0       0              0

Bronx becomes my reference level, because it is the column that gets dropped. How do I specify that Manhattan should be the reference level instead? My expected output is:

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1
    What do you mean by "reference level", and what is the output expected? Commented Nov 15, 2019 at 1:24

1 Answer

get_dummies sorts your values (lexicographically) and then creates the dummies. That's why you don't see "Bronx" in your initial result: it was the first value in sorted order, so drop_first=True dropped it.
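A quick way to see the sorting is to compare the column order with and without drop_first. This is only a small sketch that reuses the frame from the question; the variable names `full` and `dropped` are made up for illustration.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

# Without drop_first, the columns come out in sorted order,
# not in the order the values appear in the data.
full = pd.get_dummies(test['ngrp'])
print(list(full.columns))

# With drop_first=True, the first level in that sorted order is removed.
dropped = pd.get_dummies(test['ngrp'], drop_first=True)
print(list(dropped.columns))
```

The first list starts with 'Bronx' even though 'Manhattan' appears first in the data, and the second list is the same minus 'Bronx'.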

To avoid the behavior you see, enforce the ordering to be on a "first-seen" basis (i.e., convert it to an ordered categorical).

pd.get_dummies(
    pd.Categorical(test['ngrp'], categories=test['ngrp'].unique(), ordered=True), 
    drop_first=True)                                       

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1

Of course, this has the side effect that the columns of the result are a CategoricalIndex rather than a plain Index of strings, but that's almost never an issue.


2 Comments

What if I would like to pick a specific category, for example Staten Island? Then it won't be on a 'first-seen' basis anymore.
@leecolin Your question doesn't indicate that this is a possible case. The same approach would still work; you would just need to change the categories argument so that the level you want as the reference comes first.
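As a rough sketch of what that comment describes, the categories can be reordered so that an arbitrary level comes first and is therefore the one dropped. The helper name `dummies_with_reference` is hypothetical and not part of pandas. It also leaves out ordered=True, on the assumption that the order of `categories` alone decides which level drop_first removes.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

def dummies_with_reference(values, reference):
    """Hypothetical helper: dummy-encode `values`, dropping `reference`."""
    levels = list(pd.unique(values))  # levels in first-seen order
    if reference not in levels:
        raise ValueError(f"{reference!r} is not a level of the input")
    # Put the reference level first so that drop_first removes it.
    ordered = [reference] + [lvl for lvl in levels if lvl != reference]
    cat = pd.Categorical(values, categories=ordered)
    out = pd.get_dummies(cat, drop_first=True)
    # Replace the categorical column labels with plain strings.
    out.columns = [str(col) for col in out.columns]
    return out

dummy = dummies_with_reference(test['ngrp'], 'Staten Island')
print(dummy)
```

With 'Staten Island' as the reference, its row (index 3) is all zeros and the remaining columns follow first-seen order. Another option is to build all the dummies and then drop the reference column by name, which avoids the categorical step entirely.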
