Python - Add rows based on information in columns

Question

I want to add rows in Python based on the information in some of the columns. For example, let's say this is my data:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'E Test': ['Y', 'Y', 'N'],
    'M Test': ['Y', 'Y', 'Y'],
})

For the row with ID equal to 1, I'd like to add a new column, "Test Date", that equals "April 1" if the column labeled "E Test" equals "Y". I'd like to do the same for "M Test" but with a different date, adding a completely new row for ID 1. There would therefore be two rows with ID equal to 1, each with a different "Test Date" value.

Here is what it would look like ideally:

Hi and welcome! Thank you for providing an example of your input data. I've edited your question so that it's easier for people to create a pandas dataframe from your input. Can you please add a screenshot of what you expect your output to be?
– mitoRibo, Commented Jun 6, 2022 at 22:44
If you are happy with one of the answers below, please accept it with the green checkmark button.
– mitoRibo, Commented Jun 7, 2022 at 1:41

BeRT2me · Accepted Answer · 2022-06-06 23:37:39Z

I use two different dates to show how they can be individually changed.

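import numpy as np

# Reshape to long format: one row per (ID, test column) pair.
df2 = df.melt('ID', var_name='Test', value_name='Test Date')
# Keep only the first letter of the test name ('E Test' -> 'E').
df2['Test'] = df2['Test'].str[0]
# Mark 'N' as missing so those rows can be dropped.
df2.replace({'Y': True, 'N': np.nan}, inplace=True)
df2.dropna(inplace=True)
# Give each test its own date.
df2.loc[df2['Test'].eq('E'), 'Test Date'] = '1-Apr'
df2.loc[df2['Test'].eq('M'), 'Test Date'] = '2-Apr'
df2 = df2.sort_values('ID').reset_index(drop=True)
print(df2)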

Output:

   ID Test Test Date
0   1    E     1-Apr
1   1    M     2-Apr
2   2    E     1-Apr
3   2    M     2-Apr
4   3    M     2-Apr

Filtered down to just ID == 1:

print(df2[df2['ID'].eq(1)])

...

   ID Test Test Date
0   1    E     1-Apr
1   1    M     2-Apr

edited Jun 6, 2022 at 23:37

answered Jun 6, 2022 at 23:06


6 Comments

Marc Williams Over a year ago

Thank you! Quick question on this, if my data set were bigger could I scale this?

BeRT2me Over a year ago

Depends on what you mean by 'bigger'.

Marc Williams Over a year ago

A dataframe with 2600 original rows

BeRT2me Over a year ago

Okay, that tells me nothing more about the problem. Are you going to have tons of different Tests/Date combinations? Will there be more than just Y and N? Do you want to keep rows with N?

Marc Williams Over a year ago

No more Tests/Date combinations. There will only be Y and N. Do not want to keep rows with N.
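Given those constraints (only Y and N, rows with N dropped), a rough way to check the approach at that size is to run it on synthetic data. This is a minimal sketch, not a benchmark: the frame `big`, the random values, and the `test_dates` mapping are all made up for illustration, and the N rows are filtered out directly rather than via `replace`/`dropna`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a 2600-row frame; the values are random, not real data.
rng = np.random.default_rng(0)
n_rows = 2600
big = pd.DataFrame({
    'ID': np.arange(1, n_rows + 1),
    'E Test': rng.choice(['Y', 'N'], size=n_rows),
    'M Test': rng.choice(['Y', 'N'], size=n_rows),
})

# Assumed mapping from test letter to date (same dates as in the answer above).
test_dates = {'E': '1-Apr', 'M': '2-Apr'}

# Long format: one row per (ID, test column) pair.
long = big.melt('ID', var_name='Test', value_name='Result')
# Keep only the tests that were taken.
long = long[long['Result'].eq('Y')].copy()
# 'E Test' -> 'E', then look up the date for each test.
long['Test'] = long['Test'].str[0]
long['Test Date'] = long['Test'].map(test_dates)
long = (long.drop(columns='Result')
            .sort_values(['ID', 'Test'])
            .reset_index(drop=True))

# Every 'Y' in the wide frame should become exactly one row in the long frame.
expected_rows = int((big[['E Test', 'M Test']] == 'Y').to_numpy().sum())
print(len(long), expected_rows)
```

All steps are vectorized pandas operations, so the amount of work grows with the number of rows times the number of test columns.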

sitting_duck · Accepted Answer · 2022-06-07 00:41:39Z

Generated test data:

   ID E TEST M TEST
0   1      Y      Y
1   3      Y      N
2   4      Y      N
3   5      N      Y
4   6      N      Y
5   7      Y      N
6   8      Y      Y
7   9      Y      Y
8  10      N      Y
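For anyone who wants to reproduce this, the table above can be rebuilt with something like the following. The values are copied from the table; how the data was originally generated is not shown, so this is only a sketch of one way to construct it.

```python
import pandas as pd

# Test data copied row by row from the table above.
df = pd.DataFrame({
    'ID': [1, 3, 4, 5, 6, 7, 8, 9, 10],
    'E TEST': ['Y', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N'],
    'M TEST': ['Y', 'N', 'N', 'Y', 'Y', 'N', 'Y', 'Y', 'Y'],
})
print(df)
```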

Then:

# Map each test letter to its date.
test_dates = {'E': '1-Apr', 'M': '1-May'}

df = (
    df.melt(id_vars='ID')  # long format: one row per (ID, test column)
    .sort_values(['ID', 'variable'])
    .assign(Test=lambda x: x['variable'].str.slice(0, 1))  # 'E TEST' -> 'E'
    .assign(**{'Test Date': lambda x: x['Test'].map(test_dates)})
    .loc[lambda x: x['value'] == 'Y', ['ID', 'Test', 'Test Date']]  # keep only 'Y'
    .reset_index(drop=True)
)

print(df)

Which results in:

    ID Test Test Date
0    1    E     1-Apr
1    1    M     1-May
2    3    E     1-Apr
3    4    E     1-Apr
4    5    M     1-May
5    6    M     1-May
6    7    E     1-Apr
7    8    E     1-Apr
8    8    M     1-May
9    9    E     1-Apr
10   9    M     1-May
11  10    M     1-May
