I have a DataFrame with both categorical and non-categorical columns that I would like to dummy encode, but not every category that I know to be possible is present in the data.

For example, take the following DataFrame:

>>> df = pd.DataFrame({"a": [1,2,3], "b": ["x", "y", "x"], "c": ["h", "h", "i"]})
>>> df
   a  b  c
0  1  x  h
1  2  y  h
2  3  x  i

Column a holds non-categorical values, but columns b and c are both categorical.

Now let's say column b can contain the categories x, y and z, and column c the categories h, i, j and k:

>>> dummy_map = {"b": ["x", "y", "z"], "c": ["h", "i", "j", "k"]}

I want to encode it so that the resulting dataframe is as follows:

>>> df_encoded
    a  b_x   b_y   b_z  c_h   c_i   c_j   c_k
0   1   1     0     0    1     0     0     0
1   2   0     1     0    1     0     0     0
2   3   1     0     0    0     1     0     0

My current solution is as follows:

df_encoded = pd.get_dummies(df)
for k, v in dummy_map.items():
  for cat in v:
    name = k + "_" + cat
    # add an all-zero column for every category missing from the data
    if name not in df_encoded:
      df_encoded[name] = 0

But this seems a bit inefficient and inelegant. Is there a better solution?

1 Answer

Use Index.union with the expected column names, generated by a list comprehension with f-strings, and then DataFrame.reindex:

c = [f'{k}_{x}' for k, v in dummy_map.items() for x in v]
print (c)
['b_x', 'b_y', 'b_z', 'c_h', 'c_i', 'c_j', 'c_k']

df_encoded = pd.get_dummies(df)

vals = df_encoded.columns.union(c, sort=False)
df_encoded = df_encoded.reindex(vals, axis=1, fill_value=0)
print (df_encoded)
   a  b_x  b_y  c_h  c_i  b_z  c_j  c_k
0  1    1    0    1    0    0    0    0
1  2    0    1    1    0    0    0    0
2  3    1    0    0    1    0    0    0
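
With sort=False the missing columns are simply appended at the end. If the columns should instead follow the order given in dummy_map, one option is to build the full column list yourself and pass it to reindex. This is a minimal sketch, not part of the original answer; the names expected and passthrough are made up for illustration, and dtype=int is passed on the assumption that 0/1 output is wanted (recent pandas versions return booleans by default):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "x"], "c": ["h", "h", "i"]})
dummy_map = {"b": ["x", "y", "z"], "c": ["h", "i", "j", "k"]}

# every dummy column we expect, in the order declared in dummy_map
expected = [f"{k}_{x}" for k, v in dummy_map.items() for x in v]

# columns that are not encoded and should be kept as they are
passthrough = [col for col in df.columns if col not in dummy_map]

# reindex creates the missing columns filled with 0 and fixes the order;
# note that it also drops any dummy column that is not listed in dummy_map
df_encoded = pd.get_dummies(df, dtype=int).reindex(
    columns=passthrough + expected, fill_value=0
)
print(df_encoded)
```

Unlike the union, this keeps only the listed columns, so a category that appears in the data but not in dummy_map is silently dropped.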

If the columns should be sorted, drop sort=False so that union sorts them:

df_encoded = pd.get_dummies(df)

vals = df_encoded.columns.union(c)
df_encoded = df_encoded.reindex(vals, axis=1, fill_value=0)
print (df_encoded)
   a  b_x  b_y  b_z  c_h  c_i  c_j  c_k
0  1    1    0    0    1    0    0    0
1  2    0    1    0    1    0    0    0
2  3    1    0    0    0    1    0    0
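
An alternative worth considering, again only as a sketch and not part of the original answer: declare the possible categories up front by converting the columns to a categorical dtype. get_dummies then emits one column per declared category, including those that never occur in the data, so no reindexing is needed. The name typed is hypothetical, and dtype=int is passed on the assumption that 0/1 output is wanted:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "x"], "c": ["h", "h", "i"]})
dummy_map = {"b": ["x", "y", "z"], "c": ["h", "i", "j", "k"]}

# tell pandas which categories each column may contain
typed = df.astype(
    {col: pd.CategoricalDtype(cats) for col, cats in dummy_map.items()}
)

# unobserved categories (z, j, k) still get an all-zero column
df_encoded = pd.get_dummies(typed, dtype=int)
print(df_encoded)
```

Be aware that a value outside the declared categories becomes NaN during the conversion and ends up with zeros in all of its dummy columns.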