Suppose I have the following DataFrame:

A   B   C   D   E   F    Cost 
0   1   1   0   0   0    10
0   0   1   0   1   0    3
1   0   0   0   0   1    5
0   1   0   1   0   0    7

I want to construct a new DataFrame based on the values above.

Specifically, for each row, the names of the columns whose value is 1 should be combined into one new column name, and that new column should take its value from the Cost column of the same row.

So the expected output would be something like:

BC  CE  AF  BD    
10   3   5   7   

How can I achieve this?

  • Please do not use images, as it makes it difficult to copy the data and also takes more bandwidth. Commented Aug 3, 2021 at 15:17
  • @quest sorry about that, I edited the question to avoid using images. Commented Aug 3, 2021 at 16:46
  • No apologies needed :), I have learned the same way. Commented Aug 3, 2021 at 17:58

4 Answers

2

We can take the dot product of the binary columns with the column names to get the key string for each row (a 1 keeps the column name, a 0 drops it), then add the Cost column back:

cols = df.columns.difference(['Cost'])
new_df = df[cols].dot(cols).to_frame(name='key')
new_df['Cost'] = df['Cost']

new_df:

  key  Cost
0  BC    10
1  CE     3
2  AF     5
3  BD     7
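As a rough illustration of why the dot product produces strings here: in Python, multiplying a string by 1 returns the string and multiplying it by 0 returns an empty string, and the products are then concatenated per row. The sketch below reproduces that idea in plain Python without pandas; the names `names`, `rows`, and `keys` are only for illustration and are not part of the answer above.

```python
# Column names and the 0/1 rows from the question (Cost left out)
names = ["A", "B", "C", "D", "E", "F"]
rows = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
]

# 1 * "B" is "B" and 0 * "B" is "", so joining the products keeps only
# the names whose flag is 1, mirroring what the dot product does per row
keys = ["".join(flag * name for flag, name in zip(row, names)) for row in rows]
print(keys)
```

This only works as intended when the flag columns hold 0 and 1; a value of 2 would repeat the column name twice.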

The DataFrame can be transposed if needed:

cols = df.columns.difference(['Cost'])
new_df = df[cols].dot(cols).to_frame(name='key')
new_df['Cost'] = df['Cost']
new_df = new_df.set_index('key').T.rename_axis(columns=None)

new_df:

      BC  CE  AF  BD
Cost  10   3   5   7

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 0],
    "B": [1, 0, 0, 1],
    "C": [1, 1, 0, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 1, 0, 0],
    "F": [0, 0, 1, 0],
    "Cost": [10, 3, 5, 7],
})

2

You don't need a loop to do it. With datar, you can achieve it with dplyr-like syntax:

>>> from datar.all import *
>>> 
>>> # Create the df
>>> df = tribble(
...     f.A,   f.B,   f.C,   f.D,   f.E,   f.F,    f.Cost,
...     0,     1,     1,     0,     0,     0,      10,
...     0,     0,     1,     0,     1,     0,      3,
...     1,     0,     0,     0,     0,     1,      5,
...     0,     1,     0,     1,     0,     0,      7,
... )
>>> df
        A       B       C       D       E       F    Cost
  <int64> <int64> <int64> <int64> <int64> <int64> <int64>
0       0       1       1       0       0       0      10
1       0       0       1       0       1       0       3
2       1       0       0       0       0       1       5
3       0       1       0       1       0       0       7
>>> # replace value with column names
>>> df = df >> mutate(across(f[1:6], lambda x: if_else(x, x.name, "")))  
>>> df
         A        B        C        D        E        F    Cost
  <object> <object> <object> <object> <object> <object> <int64>
0                 B        C                                 10
1                          C                 E                3
2        A                                            F       5
3                 B                 D                         7
>>> # unite the columns
>>> df = df >> unite('col', f[1:6], sep="") 
>>> df
       col    Cost
  <object> <int64>
0       BC      10
1       CE       3
2       AF       5
3       BD       7
>>> # reshape the result
>>> df >> column_to_rownames(f.col) >> t()
          BC      CE      AF      BD
     <int64> <int64> <int64> <int64>
Cost      10       3       5       7

Disclaimer: I am the author of the datar package.

1 Comment

Interesting! I did not know about datar. Thanks!

1

Here is how I would proceed:

Create the df

import pandas as pd

data = {
    "A": [0, 0, 1, 0],
    "B": [1, 0, 0, 1],
    "C": [1, 1, 0, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 1, 0, 0],
    "F": [0, 0, 1, 0],
    "Cost": [10, 3, 5, 7],
}
df = pd.DataFrame(data)

Build the combined column names

def make_key(row):
    # Join the names of all flag columns whose value is 1, skipping "Cost"
    return "".join(k for k, v in row.items() if v and k != "Cost")

df_ind = df.apply(make_key, axis=1)

Create the desired data frame

pd.DataFrame(df.Cost.values, index=df_ind.values).T

This will give you:

 BC CE  AF  BD
 10 3   5   7


1

Not as nice as previous answers, but straightforward & step-by-step:

outp_dict = {}

for index, row in df.iterrows():
    new_col = ""
    # Append the name of every flag column set to 1, skipping "Cost"
    for col_name, value in row.items():
        if value and col_name != "Cost":
            new_col += str(col_name)
    # Note: rows that produce the same key overwrite each other here
    outp_dict[new_col] = row["Cost"]

outp_df = pd.DataFrame(outp_dict, index=[0])

1 Comment

Thanks! It works on this specific example, but it does not show all entries when there are duplicates, i.e. when the values of A-F are the same but the cost is different.
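One possible way to handle the duplicate case raised in this comment is to keep the result in long form (one key per row) and group the costs by key, since a dict or a set of column names can hold each key only once. This is a minimal sketch, not the answerer's method: the fifth row (a second "BC" with cost 12) is made-up data added to demonstrate a duplicate, and `flag_cols`, `pairs`, and `costs_by_key` are illustrative names.

```python
import pandas as pd

# Data from the question plus one hypothetical duplicate row (B and C set, Cost 12)
df = pd.DataFrame({
    "A": [0, 0, 1, 0, 0],
    "B": [1, 0, 0, 1, 1],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 0, 1, 0],
    "E": [0, 1, 0, 0, 0],
    "F": [0, 0, 1, 0, 0],
    "Cost": [10, 3, 5, 7, 12],
})

flag_cols = [c for c in df.columns if c != "Cost"]

# Build one key per row from the names of the columns set to 1
keys = df[flag_cols].apply(
    lambda row: "".join(name for name, flag in row.items() if flag),
    axis=1,
)

# Long form keeps every row, so duplicate keys are not overwritten
pairs = pd.DataFrame({"key": keys, "Cost": df["Cost"]})

# Collect all costs per key, preserving the order of first appearance
costs_by_key = pairs.groupby("key", sort=False)["Cost"].agg(list)
print(costs_by_key)
```

Whether to keep a list of costs per key, sum them, or keep one row per occurrence depends on what the duplicates are supposed to mean, which the question does not specify.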
