How to create new DataFrame based on conditions from another DataFrame

Question

Suppose I have the following DataFrame:

A   B   C   D   E   F    Cost 
0   1   1   0   0   0    10
0   0   1   0   1   0    3
1   0   0   0   0   1    5
0   1   0   1   0   0    7

I want to construct a new DataFrame based on the values above.

Specifically, for each row, I want to concatenate the names of the columns whose value is 1 into a new column name, and assign that new column the value from the Cost column.

So the expected output would be something like:

BC  CE  AF  BD    
10   3   5   7

How can I achieve this?

Please do not use images, as it makes it difficult to copy the data and also takes more bandwidth.
– quest, Commented Aug 3, 2021 at 15:17
@quest sorry about that, I edited the question to avoid using images.
– Emma, Commented Aug 3, 2021 at 16:46

Henry Ecker · Accepted Answer · 2021-08-03 15:51:14Z

2

We can take the dot product of the binary columns with the column names to build the key string from the 1s and 0s, then add the Cost column back:

cols = df.columns.difference(['Cost'])
new_df = df[cols].dot(cols).to_frame(name='key')
new_df['Cost'] = df['Cost']

new_df:

  key  Cost
0  BC    10
1  CE     3
2  AF     5
3  BD     7
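
To see why the dot product produces the key strings, it may help to spell out the arithmetic in plain Python. The sketch below only illustrates the idea and is not how pandas implements `dot`; the names `names`, `row_flags`, and `key` are made up for this example. Multiplying a string by 1 keeps it, multiplying it by 0 gives an empty string, and adding strings concatenates them.

```python
# Illustration only: mimics what the dot product computes for one row.
names = ["A", "B", "C", "D", "E", "F"]   # column names (without Cost)
row_flags = [0, 1, 1, 0, 0, 0]           # first row of the question's data

# flag * name gives the name when the flag is 1 and "" when it is 0;
# concatenating the products plays the role of the sum in a dot product.
key = "".join(flag * name for flag, name in zip(row_flags, names))
print(key)  # BC
```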

The DataFrame can be transposed if needed:

cols = df.columns.difference(['Cost'])
new_df = df[cols].dot(cols).to_frame(name='key')
new_df['Cost'] = df['Cost']
new_df = new_df.set_index('key').T.rename_axis(columns=None)

new_df:

      BC  CE  AF  BD
Cost  10   3   5   7
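
One detail worth knowing: `Index.difference` returns its result sorted by default, so the letters in each key follow sorted column order rather than the original column order. That makes no difference for the question's data, where the columns are already alphabetical. If the original order matters, `Index.drop` could be used instead. A small sketch with a made-up frame (`demo`, `sorted_cols`, and `ordered_cols` are names invented for this illustration):

```python
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order
demo = pd.DataFrame({"B": [1], "A": [1], "Cost": [10]})

sorted_cols = demo.columns.difference(["Cost"])  # sorted: A, B
ordered_cols = demo.columns.drop(["Cost"])       # original order: B, A

print(list(sorted_cols))
print(list(ordered_cols))
```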

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 0],
    "B": [1, 0, 0, 1],
    "C": [1, 1, 0, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 1, 0, 0],
    "F": [0, 0, 1, 0],
    "Cost": [10, 3, 5, 7],
})

Panwen Wang · Accepted Answer · 2021-08-03 17:33:15Z

You don't need a loop to do it. With datar, you can achieve it with dplyr-like syntax:

>>> from datar.all import *
>>> 
>>> # Create the df
>>> df = tribble(
...     f.A,   f.B,   f.C,   f.D,   f.E,   f.F,    f.Cost,
...     0,     1,     1,     0,     0,     0,      10,
...     0,     0,     1,     0,     1,     0,      3,
...     1,     0,     0,     0,     0,     1,      5,
...     0,     1,     0,     1,     0,     0,      7,
... )
>>> df
        A       B       C       D       E       F    Cost
  <int64> <int64> <int64> <int64> <int64> <int64> <int64>
0       0       1       1       0       0       0      10
1       0       0       1       0       1       0       3
2       1       0       0       0       0       1       5
3       0       1       0       1       0       0       7
>>> # replace value with column names
>>> df = df >> mutate(across(f[1:6], lambda x: if_else(x, x.name, "")))  
>>> df
         A        B        C        D        E        F    Cost
  <object> <object> <object> <object> <object> <object> <int64>
0                 B        C                                 10
1                          C                 E                3
2        A                                            F       5
3                 B                 D                         7
>>> # unite the columns
>>> df = df >> unite('col', f[1:6], sep="") 
>>> df
       col    Cost
  <object> <int64>
0       BC      10
1       CE       3
2       AF       5
3       BD       7
>>> # reshape the result
>>> df >> column_to_rownames(f.col) >> t()
          BC      CE      AF      BD
     <int64> <int64> <int64> <int64>
Cost      10       3       5       7

Disclaimer: I am the author of the datar package.

quest · Accepted Answer · 2021-08-03 15:35:52Z

1

Here is how I would proceed:

Create the df

import pandas as pd

data = {
    "A": [0, 0, 1, 0],
    "B": [1, 0, 0, 1],
    "C": [1, 1, 0, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 1, 0, 0],
    "F": [0, 0, 1, 0],
    "Cost": [10, 3, 5, 7],
}
df = pd.DataFrame(data)

Get the column names

def make_df(row):
    # Join the names of all non-Cost columns whose value is truthy (1)
    return "".join(k for k, v in row.items() if v and k != "Cost")

df_ind = df.apply(make_df, axis=1)

Create the desired data frame

pd.DataFrame(df.Cost.values, index=df_ind.values).T

This will give you:

BC  CE  AF  BD
10   3   5   7


Just Honza · Accepted Answer · 2021-08-03 16:15:50Z

1

Not as nice as previous answers, but straightforward & step-by-step:

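outp_dict = {}

for index, row in df.iterrows():
    new_col = ""
    # Concatenate the names of all flag columns set to 1 in this row
    for col_name, value in row.items():
        # Compare strings with !=; "is not" tests identity, not equality
        if value and col_name != "Cost":
            new_col += str(col_name)
    # Note: rows that produce the same key overwrite each other here
    outp_dict[new_col] = row["Cost"]

outp_df = pd.DataFrame(outp_dict, index=[0])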


Thanks! It works on this specific example, but it does not handle duplicate entries, where the values of A-F are the same but the cost is different.
– Emma, Commented over a year ago
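
As the comment points out, using the key as a dict key or column name means rows with identical A-F flags collapse into one entry. One possible way to keep every cost, sketched here under the assumption that collecting the costs per key in a list is acceptable, is to build the keys first and then group by them. The sample data (including the repeated BC row) and the names `flag_cols`, `keys`, `long_df`, and `costs_by_key` are invented for this illustration.

```python
import pandas as pd

# Made-up data: rows 0 and 2 have the same flags (B and C) but different costs
df = pd.DataFrame({
    "A": [0, 0, 0],
    "B": [1, 0, 1],
    "C": [1, 1, 1],
    "E": [0, 1, 0],
    "Cost": [10, 3, 4],
})

flag_cols = [c for c in df.columns if c != "Cost"]

# Build one key per row by joining the names of the columns set to 1
keys = [
    "".join(name for name, flag in zip(flag_cols, row) if flag)
    for row in df[flag_cols].to_numpy()
]

# Long format keeps one row per original row, so nothing is overwritten
long_df = pd.DataFrame({"key": keys, "Cost": df["Cost"].to_numpy()})

# Collect every cost that shares a key, in order of first appearance
costs_by_key = long_df.groupby("key", sort=False)["Cost"].agg(list)
print(costs_by_key.to_dict())
```

Whether a list per key, a sum, or simply the long format is the right shape depends on what the duplicates mean in the real data.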
