Create a column in pandas dataframe

Question

I have a dataframe as below:

df = pd.DataFrame({'ORDER':["A", "A", "A", "B", "B","B"], 'GROUP': ["A1C", "A1", "B1", "B1C", "M1", "M1C"]})
df['_A1_XYZ'] = 1
df['_A1C_XYZ'] = 2
df['_B1_XYZ'] = 3
df['_B1C_XYZ'] = 4
df['_M1_XYZ'] = 5
df

    ORDER   GROUP   _A1_XYZ   _A1C_XYZ   _B1_XYZ      _B1C_XYZ  _M1_XYZ
0   A       A1C      1          2             3       4          5     
1   A       A1       1          2             3       4          5     
2   A       B1       1          2             3       4          5     
3   B       B1C      1          2             3       4          5     
4   B       M1       1          2             3       4          5     
5   B       M1C      1          2             3       4          5

I want to create a column "NEW" based on column "GROUP" and all the columns that end with XYZ, as follows: for each row, based on the value of GROUP, df["NEW"] = df["_<GROUP>_XYZ"].

For example, for the 1st row, GROUP = A1C, so "NEW" = 2 (_A1C_XYZ). Similarly, for the 2nd row, "NEW" = 1 (_A1_XYZ).

My expected output

    ORDER   GROUP   _A1_XYZ   _A1C_XYZ   _B1_XYZ      _B1C_XYZ  _M1_XYZ      NEW
0   A       A1C      1          2             3       4          5           2
1   A       A1       1          2             3       4          5           1
2   A       B1       1          2             3       4          5           3
3   B       B1C      1          2             3       4          5           4
4   B       M1       1          2             3       4          5           5
5   B       M1C      1          2             3       4          5           NaN

Is the B1C_XYZ column correct, or should it be _B1C_XYZ?

– David Erickson, Commented Jul 9, 2020 at 18:26
@DavidErickson It should be _B1C_XYZ. Updated the question

– Shanoo, Commented Jul 9, 2020 at 18:33

Scott Boston · Accepted Answer · 2020-07-09 19:06:50Z


Use pd.DataFrame.lookup:

df['NEW'] = df.lookup(df.index, '_'+df['GROUP']+'_XYZ')
df

Output:

  ORDER GROUP  _A1_XYZ  _A1C_XYZ  _B1_XYZ  _B1C_XYZ  _M1_XYZ  _M1C_XYZ  NEW
0     A   A1C        1         2        3         4        5         6    2
1     A    A1        1         2        3         4        5         6    1
2     A    B1        1         2        3         4        5         6    3
3     B   B1C        1         2        3         4        5         6    4
4     B    M1        1         2        3         4        5         6    5
5     B   M1C        1         2        3         4        5         6    6

Updated after question edited.
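Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the one-liner above only runs on older versions. A minimal sketch of the same idea for newer versions, assuming the column labels are unique and that a GROUP value without a matching column should give NaN (the names wanted, col_pos and found are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "A", "B", "B", "B"],
                   'GROUP': ["A1C", "A1", "B1", "B1C", "M1", "M1C"]})
df['_A1_XYZ'] = 1
df['_A1C_XYZ'] = 2
df['_B1_XYZ'] = 3
df['_B1C_XYZ'] = 4
df['_M1_XYZ'] = 5

# Column label each row should read from, e.g. "A1C" -> "_A1C_XYZ".
wanted = '_' + df['GROUP'] + '_XYZ'

# Position of each wanted label in df.columns; -1 means there is no such column.
col_pos = df.columns.get_indexer(wanted)
found = col_pos != -1

# Pick one cell per row by (row position, column position).
# Rows whose label was not found keep the NaN they were initialised with.
values = df.to_numpy()
new = np.full(len(df), np.nan)
new[found] = values[np.arange(len(df))[found], col_pos[found]].astype(float)
df['NEW'] = new
```

Because missing labels are filtered out before indexing, the last row (GROUP = M1C, no _M1C_XYZ column) ends up as NaN instead of raising an error.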

Or use stack and reindex,

df['New'] = (df.stack().reindex(list(zip(df.index, '_'+df['GROUP']+'_XYZ')))
               .rename('NEW').reset_index(level=1, drop=True))

df

Output:

  ORDER GROUP  _A1_XYZ  _A1C_XYZ  _B1_XYZ  _B1C_XYZ  _M1_XYZ  New
0     A   A1C        1         2        3         4        5    2
1     A    A1        1         2        3         4        5    1
2     A    B1        1         2        3         4        5    3
3     B   B1C        1         2        3         4        5    4
4     B    M1        1         2        3         4        5    5
5     B   M1C        1         2        3         4        5  NaN

edited Jul 9, 2020 at 19:06

answered Jul 9, 2020 at 18:40

Scott Boston


1 Comment

Shanoo Over a year ago

This solution will not work if any of the values in GROUP does not have a corresponding column name. For example, for the last row, the GROUP value is M1C but we don't have "_M1C_XYZ", so it should return NaN.

David Erickson · Accepted Answer · 2020-07-09 19:54:45Z

@ScottBoston's answer is better if all of the values in the rows are also columns, but I thought I'd share mine! Essentially, I create a new dataframe with the relevant columns, drop the duplicates, change the column names, transpose the dataframe and merge the column back in...

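# Keep only the *_XYZ columns; every row is identical, so one row remains
a = df.iloc[:,2:].drop_duplicates()
# Rename '_A1_XYZ' -> 'A1' etc. so the labels match the values in GROUP
a.columns = [col.split('_')[1] for col in df.columns if '_' in col]
# Transpose so the GROUP labels become the index and the values a 'NEW' column
a = a.T.rename({0:'NEW'}, axis=1)
# Left merge keeps every row of df; a GROUP with no matching label gets NaN
df = pd.merge(df, a, how='left', left_on='GROUP', right_index=True)
df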

output:

  ORDER GROUP  _A1_XYZ  _A1C_XYZ  _B1_XYZ  _B1C_XYZ  _M1_XYZ  NEW
0     A   A1C        1         2        3         4        5  2.0
1     A    A1        1         2        3         4        5  1.0
2     A    B1        1         2        3         4        5  3.0
3     B   B1C        1         2        3         4        5  4.0
4     B    M1        1         2        3         4        5  5.0
5     B   M1C        1         2        3         4        5  NaN
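One caveat: the drop_duplicates step relies on every row holding the same values in the _XYZ columns, as in the question. If the values differed from row to row, a per-row variant would be needed. A rough sketch of one way to do that with melt and a merge on (row, GROUP); the varying numbers below are hypothetical, not from the question, and the names xyz_cols, long, keys and merged are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "A", "B", "B", "B"],
                   'GROUP': ["A1C", "A1", "B1", "B1C", "M1", "M1C"]})
# Hypothetical values that differ per row (the question uses constants).
df['_A1_XYZ'] = [1, 11, 21, 31, 41, 51]
df['_A1C_XYZ'] = [2, 12, 22, 32, 42, 52]
df['_B1_XYZ'] = [3, 13, 23, 33, 43, 53]
df['_B1C_XYZ'] = [4, 14, 24, 34, 44, 54]
df['_M1_XYZ'] = [5, 15, 25, 35, 45, 55]

xyz_cols = [c for c in df.columns if c.endswith('_XYZ')]

# Long format: one record per (original row, XYZ column) pair.
long = (df[xyz_cols]
        .rename_axis('row')
        .reset_index()
        .melt(id_vars='row', var_name='col', value_name='NEW'))
# Strip the leading "_" and trailing "_XYZ" so the label matches GROUP.
long['GROUP'] = long['col'].str[1:-len('_XYZ')]

# Left merge on (row, GROUP): a GROUP with no matching column gets NaN.
keys = df[['GROUP']].rename_axis('row').reset_index()
merged = keys.merge(long[['row', 'GROUP', 'NEW']],
                    on=['row', 'GROUP'], how='left')
df['NEW'] = merged['NEW'].to_numpy()
```

With these made-up numbers each row picks its own value (row 1 reads 11 from _A1_XYZ, for instance), and the M1C row is still NaN.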
