If I have a Pandas dataframe like so:

colA colB
 A    A1
 B    C1
 A    B1
 B    A1

colA has 2 unique values (A, B) and colB has 3 unique values (A1, B1 and C1).

I would like to create a new dataframe that contains every combination of the colA and colB values, plus another column colC that is 1 if the combination is present in the original df and 0 otherwise.

expected result:

colA colB colC
 A    A1   1
 A    B1   1
 A    C1   0
 B    A1   1
 B    B1   0
 B    C1   1

1 Answer

First add a new column filled with 1 using DataFrame.assign. Then build all combinations with MultiIndex.from_product from the Series.unique values of both columns. After DataFrame.set_index, call DataFrame.reindex with that MultiIndex; its fill_value parameter sets the colC value for the newly added rows (here 0):

mux = pd.MultiIndex.from_product([df['colA'].unique(),
                                  df['colB'].unique()], names=['colA','colB'])
df1 = df.assign(colC = 1).set_index(['colA','colB']).reindex(mux, fill_value=0).reset_index()
print(df1)
  colA colB  colC
0    A   A1     1
1    A   C1     0
2    A   B1     1
3    B   A1     1
4    B   C1     1
5    B   B1     0
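
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the reindex approach. It assumes each (colA, colB) pair occurs at most once in df, as in the question's data.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'colA': ['A', 'B', 'A', 'B'],
                   'colB': ['A1', 'C1', 'B1', 'A1']})

# every (colA, colB) pair, in order of first appearance
mux = pd.MultiIndex.from_product([df['colA'].unique(),
                                  df['colB'].unique()],
                                 names=['colA', 'colB'])

df1 = (df.assign(colC=1)                    # mark existing pairs with 1
         .set_index(['colA', 'colB'])
         .reindex(mux, fill_value=0)        # missing pairs get 0
         .reset_index())
print(df1)
```

If the real data can contain repeated pairs, reindex raises an error on the duplicated index, so calling df.drop_duplicates(['colA','colB']) first is probably needed.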

An alternative is to reshape with DataFrame.set_index, Series.unstack and DataFrame.stack. Because unstack sorts the labels, the result is already ordered by both columns:

df1 = (df.assign(colC = 1)
         .set_index(['colA','colB'])['colC']
         .unstack(fill_value=0)
         .stack()
         .reset_index(name='colC'))

print(df1)
  colA colB  colC
0    A   A1     1
1    A   B1     1
2    A   C1     0
3    B   A1     1
4    B   B1     0
5    B   C1     1
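
To see why the reshape works, it may help to look at the intermediate wide table before it is stacked back. This is only an illustrative sketch on the question's sample data; the names wide and tall are made up for the example.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'colA': ['A', 'B', 'A', 'B'],
                   'colB': ['A1', 'C1', 'B1', 'A1']})

# one row per colA value, one column per colB value;
# cells for pairs that never occurred are filled with 0
wide = (df.assign(colC=1)
          .set_index(['colA', 'colB'])['colC']
          .unstack(fill_value=0))
print(wide)

# stacking turns every cell of the wide table back into a row,
# so all combinations are present, including the filled zeros
tall = wide.stack().reset_index(name='colC')
print(tall)
```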

Another solution is to build a new DataFrame of all combinations with itertools.product, left-merge it with the original using DataFrame.merge and indicator=True, rename the indicator column to colC, then compare it with 'both' and cast to integer to map True/False to 1/0:

from itertools import product

# all combinations of the unique values
combos = pd.DataFrame(product(df['colA'].unique(), df['colB'].unique()), columns=['colA','colB'])
# keep the result in df1 so the original df is not overwritten
df1 = combos.merge(df, how='left', indicator=True).rename(columns={'_merge':'colC'})
df1['colC'] = df1['colC'].eq('both').astype(int)
print(df1)
  colA colB  colC
0    A   A1     1
1    A   C1     0
2    A   B1     1
3    B   A1     1
4    B   C1     1
5    B   B1     0
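
The indicator column is what carries the information here, so a small sketch that inspects it before the conversion may make the last step clearer. The names combos and merged are illustrative only, and the data is the sample from the question.

```python
import pandas as pd
from itertools import product

# sample data from the question
df = pd.DataFrame({'colA': ['A', 'B', 'A', 'B'],
                   'colB': ['A1', 'C1', 'B1', 'A1']})

combos = pd.DataFrame(list(product(df['colA'].unique(),
                                   df['colB'].unique())),
                      columns=['colA', 'colB'])

# indicator=True adds a '_merge' column: 'both' for pairs found in df,
# 'left_only' for pairs that exist only in combos
merged = combos.merge(df, how='left', indicator=True)
print(merged['_merge'].tolist())

# True/False -> 1/0
merged['colC'] = merged['_merge'].eq('both').astype(int)
```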

Finally, if the rows need to be ordered, sort by both columns with DataFrame.sort_values:

df1 = df1.sort_values(['colA','colB'])

2 Comments

Could you add an explanation for what you're doing, especially for the assign() line?
@FedericoS - Done :)
