Create a DataFrame from an existing DataFrame with multiple conditions

Question

I have the DataFrame below:

import pandas as pd

data = {'Participant': ['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
        'Total test Result': [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
        'result': ['negative', 'negative', 'negative', 'negative', 'negative',
                   'negative', 'negative', 'negative', 'negative', 'negative'],
        'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01',
                 '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20']}
pres_df = pd.DataFrame(data)
pres_df

Note: the 'time' column is in datetime format, if that helps.
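If 'time' is still stored as strings, as in the snippet above, a minimal way to get a real datetime column is `pd.to_datetime`. The sketch below rebuilds a shortened, two-row copy of the question's data purely for illustration:

```python
import pandas as pd

# Shortened copy of the question's data (two rows, for brevity)
pres_df = pd.DataFrame({
    "Participant": ["A", "B"],
    "time": ["2021-06-14", "2021-06-21"],
})

# Parse the ISO date strings into a datetime64 column
pres_df["time"] = pd.to_datetime(pres_df["time"])
print(pres_df.dtypes)
```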

I want to create a new DataFrame in which the multiple rows for each 'Participant' are consolidated into a single row, with a separate result column and date column for each of that participant's tests (test1, test2, and so on).
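Both answers below hinge on numbering each participant's tests in order, so that test 1, test 2, ... can become separate column pairs (explicitly via `cumcount` in the first answer, implicitly via list positions in the second). A minimal sketch of that numbering step with `groupby().cumcount()`, using only the 'Participant' column from the question:

```python
import pandas as pd

pres_df = pd.DataFrame({
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
})

# Number each participant's rows 1, 2, 3, ... in their original order;
# this number becomes the "test" index used to build the wide columns
pres_df["test_no"] = pres_df.groupby("Participant").cumcount() + 1
print(pres_df["test_no"].tolist())
```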

Any help is greatly appreciated. Thanks.

Pygirl · Accepted Answer · 2021-07-17 05:08:30Z


You can use pd.pivot_table:

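df = pres_df.rename(columns={'time': 'date'})
# Number each participant's tests: Test1, Test2, ...
df = df.assign(test_res='Test' + df.groupby('Participant').cumcount().add(1).astype(str))
# One row per participant; one (value, test) column per date/result
df1 = df.pivot_table(index=['Participant', 'Total test Result'],
                     columns=['test_res'],
                     values=['date', 'result'],
                     aggfunc='first')
# Flatten ('date', 'Test1') -> 'Test1_date', etc.
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if 'Test' in x[1] else x[0])
# Reorder the columns (select, don't reassign, so the data moves with the labels)
df1 = df1[sorted(df1.columns)].reset_index()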
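The column-flattening step is easiest to see on its own. The sketch below uses a hand-built MultiIndex standing in for the pivot table's columns (the labels are illustrative) and shows how sorting the flattened names groups each test's date and result together:

```python
import pandas as pd

# Hypothetical two-level column labels of the kind pivot_table produces:
# level 0 = value name ('date'/'result'), level 1 = test label ('Test1', ...)
cols = pd.MultiIndex.from_tuples(
    [("date", "Test1"), ("date", "Test2"), ("result", "Test1"), ("result", "Test2")]
)

# Join the levels as "<test>_<value>", the same pattern used in the answer
flat = cols.map(lambda x: f"{x[1]}_{x[0]}")
print(sorted(flat))
```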


edited Jul 17, 2021 at 5:08

answered Jul 16, 2021 at 18:14

Pygirl



3 Comments

shiva kumar Over a year ago

This doesn't work: for Participant 'A', the result of the first test is 'negative', but here it shows 'NaN'. I think it's taking the values randomly, but I'm not sure.

Pygirl Over a year ago

@shivakumar: Oops, sorry, the issue was with df1.columns = sorted(df1.columns).

Pygirl Over a year ago

@shivakumar: Actually, I made a blunder. I was trying to sort the columns but renamed them instead. That's why you saw NaN: it came from the date column, not the result column.
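The mix-up described in this comment can be reproduced on a toy frame (the column names here are made up): assigning a sorted list to `.columns` relabels the existing positions without moving any data, whereas indexing with the sorted list reorders the data along with its labels.

```python
import pandas as pd

# Toy frame whose columns are deliberately not in sorted order
df = pd.DataFrame({"b_result": ["negative"], "a_date": ["2021-06-14"]})

# Wrong: assigning sorted labels only renames positions, so the data stays put
relabelled = df.copy()
relabelled.columns = sorted(relabelled.columns)

# Right: selecting columns in sorted order moves the data with the labels
reordered = df[sorted(df.columns)]

print(relabelled)
print(reordered)
```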

Andrej Kesely · Accepted Answer · 2021-07-16 18:11:20Z


Try:

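import pandas as pd

# One row per participant; collect every result and time into a list
x = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)

# Spread each list across columns test1_Result, test2_Result, ...
a = x.pop("result").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_Result" for v in range(1, len(vals) + 1)]
    )
)
# ... and likewise test1_date, test2_date, ...
b = x.pop("time").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_date" for v in range(1, len(vals) + 1)]
    )
)

# Glue the pieces back together and sort the columns alphabetically
out = pd.concat([x, a, b], axis=1).sort_index(axis=1)
print(out)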

Prints:

  Participant  Total test Result test1_Result  test1_date test2_Result  test2_date test3_Result  test3_date test4_Result  test4_date
0           A                  1     negative  2021-06-14          NaN         NaN          NaN         NaN          NaN         NaN
1           B                  4     negative  2021-06-21     negative  2021-06-24     negative  2021-06-28     negative  2021-07-01
2           C                  2     negative  2021-07-05     negative  2021-07-08          NaN         NaN          NaN         NaN
3           D                  3     negative  2021-06-17     negative  2021-06-17     negative  2021-06-20          NaN         NaN
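A commonly used alternative to `.apply(lambda ...: pd.Series(...))` for spreading lists across columns is to pass the lists straight to the `DataFrame` constructor, which pads shorter lists with missing values. This is a swapped-in technique, not part of the answer above; the sketch uses hypothetical toy lists:

```python
import pandas as pd

# Toy data: one list of results per participant, of uneven length
# (hypothetical values mirroring the question's layout)
lists = pd.Series(
    [["negative"], ["negative", "negative", "negative"]],
    index=["A", "D"],
)

# Build one column per list position; shorter lists are padded with missing values
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename positional columns 0, 1, 2 ... to test1_Result, test2_Result, ...
wide.columns = [f"test{i + 1}_Result" for i in wide.columns]
print(wide)
```

One caveat that applies to both this sketch and the answer's `sort_index(axis=1)`: column names sort as strings, so with ten or more tests "test10_..." would sort before "test2_...".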

answered Jul 16, 2021 at 18:11

Andrej Kesely


1 Comment

shiva kumar Over a year ago

Works perfectly. Thanks a lot! This is a clever use of pandas, pop, and the Series function.
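For readers unfamiliar with `pop`: `DataFrame.pop` removes a column in place and returns it as a Series, which is why `x` no longer contains 'result' and 'time' by the time it is concatenated with `a` and `b`. A minimal illustration on toy data:

```python
import pandas as pd

x = pd.DataFrame({
    "Participant": ["A", "B"],
    "result": [["negative"], ["negative", "negative"]],
})

# pop removes the column from x and returns it as a Series
results = x.pop("result")

print(list(x.columns))   # only 'Participant' is left
print(results.tolist())
```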
