
I have a dataframe as given below.

import pandas as pd

data = {'Participant': ['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
        'Total test Result': [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
        'result': ['negative', 'negative', 'negative', 'negative', 'negative',
                   'negative', 'negative', 'negative', 'negative', 'negative'],
        'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01',
                 '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20']}
pres_df = pd.DataFrame(data)
pres_df

Note: 'time' column is in DateTime format if it helps.
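As written, the dictionary above stores 'time' as strings (object dtype). If the datetime dtype mentioned in the note is wanted, here is a minimal sketch of the conversion, using only the first three rows for brevity:

```python
import pandas as pd

# First three rows of the question's data; 'time' starts out as plain strings
pres_df = pd.DataFrame({
    "Participant": ["A", "B", "B"],
    "time": ["2021-06-14", "2021-06-21", "2021-06-24"],
})

# Parse the ISO-formatted date strings into datetime64 values
pres_df["time"] = pd.to_datetime(pres_df["time"], format="%Y-%m-%d")
print(pres_df.dtypes)
```

Neither answer below depends on the dtype, so this step is optional.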


I want to create a new dataframe in which each participant's multiple rows are consolidated into a single row, with the time and result values spread across multiple columns (one pair of columns per test).

Any help is greatly appreciated. Thanks.
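A reshape like this usually starts by giving every row a running test number within its participant, which later becomes the "test1", "test2", ... part of the column names. A minimal sketch with `groupby().cumcount()`; the column name `test_no` is hypothetical:

```python
import pandas as pd

pres_df = pd.DataFrame({
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "result": ["negative"] * 10,
})

# cumcount() numbers each participant's rows 0, 1, 2, ... in their existing order;
# adding 1 gives 1-based test numbers
pres_df["test_no"] = pres_df.groupby("Participant").cumcount() + 1
print(pres_df["test_no"].tolist())  # [1, 1, 2, 3, 4, 1, 2, 1, 2, 3]
```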

2 Answers


You can use pd.pivot_table:

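import pandas as pd

# Work on a renamed copy of the question's frame ('time' -> 'date')
df = pres_df.rename(columns={'time': 'date'})
# Label each participant's rows Test1, Test2, ... in their existing order
df = df.assign(test_res='Test' + df.groupby('Participant').cumcount().add(1).astype(str))
df1 = df.pivot_table(index=['Participant', 'Total test Result'],
                     columns=['test_res'],
                     values=['date', 'result'],
                     aggfunc='first')
# Flatten the (value, test) column MultiIndex into names such as 'Test1_date'
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if 'Test' in x[1] else x[0])
# Reorder (not rename) the columns, then move the index back into columns
df1 = df1[sorted(df1.columns)].reset_index()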
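Because the running test number makes every (Participant, test number) pair unique, no aggregation is strictly required. The sketch below swaps in `set_index`/`unstack` for `pivot_table` to perform the same reshape. The column names `test1_result`, `test1_time`, ... are this sketch's own choice, and the data is re-created from the question:

```python
import pandas as pd

pres_df = pd.DataFrame({
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "Total test Result": [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
    "result": ["negative"] * 10,
    "time": ["2021-06-14", "2021-06-21", "2021-06-24", "2021-06-28", "2021-07-01",
             "2021-07-05", "2021-07-08", "2021-06-17", "2021-06-17", "2021-06-20"],
})

# Per-participant test number (1, 2, 3, ...) makes every index entry unique
n = pres_df.groupby("Participant").cumcount() + 1

wide = (
    pres_df.assign(n=n)
    .set_index(["Participant", "Total test Result", "n"])[["result", "time"]]
    .unstack("n")  # columns become a MultiIndex of (value, n); gaps are NaN
)
# Group the columns by test number, then flatten ("result", 1) -> "test1_result"
wide = wide.sort_index(axis=1, level=1)
wide.columns = [f"test{num}_{name}" for name, num in wide.columns]
wide = wide.reset_index()
print(wide)
```

Using `unstack` here also means a duplicate (participant, test number) pair would raise an error instead of being silently reduced by `'first'`.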



3 Comments

This doesn't work: for participant 'A', the first test result should be 'negative', but here it is NaN. It seems to be picking values at random, but I'm not sure.
@shivakumar: Oops, sorry; the issue was caused by the line df1.columns = sorted(df1.columns).
@shivakumar: Actually, I made a blunder. I was trying to sort the columns but renamed them instead. That's why you saw NaN: it came from a date column, not the result column.
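The mix-up described in these comments can be reproduced on a one-row frame. The column names mimic the answer's output and the values are participant A's:

```python
import pandas as pd

df1 = pd.DataFrame({"Test1_result": ["negative"], "Test1_date": ["2021-06-14"]})

# Wrong: assigning to .columns only relabels; the data stays where it was,
# so each value ends up under another column's name
relabelled = df1.copy()
relabelled.columns = sorted(relabelled.columns)

# Right: indexing with the sorted list moves whole columns, data included
reordered = df1[sorted(df1.columns)]

print(relabelled)  # Test1_date holds "negative", Test1_result holds the date
print(reordered)   # both columns keep their own values
```

In the full frame, a participant's NaN date therefore appeared under a result label.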

Try:

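import pandas as pd

# One row per participant; collect each participant's results and times into lists
x = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)

# pop() removes a list column from x; each list is expanded into
# test1_..., test2_..., ... columns (shorter lists are padded with NaN)
a = x.pop("result").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_Result" for v in range(1, len(vals) + 1)]
    )
)
b = x.pop("time").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_date" for v in range(1, len(vals) + 1)]
    )
)

# Put the pieces side by side and sort the column labels
out = pd.concat([x, a, b], axis=1).sort_index(axis=1)
print(out)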

Prints:

  Participant  Total test Result test1_Result  test1_date test2_Result  test2_date test3_Result  test3_date test4_Result  test4_date
0           A                  1     negative  2021-06-14          NaN         NaN          NaN         NaN          NaN         NaN
1           B                  4     negative  2021-06-21     negative  2021-06-24     negative  2021-06-28     negative  2021-07-01
2           C                  2     negative  2021-07-05     negative  2021-07-08          NaN         NaN          NaN         NaN
3           D                  3     negative  2021-06-17     negative  2021-06-17     negative  2021-06-20          NaN         NaN
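The pop-and-expand step can be seen in isolation on two participants. This sketch swaps in the `pd.DataFrame(list_of_lists)` idiom for the answer's `Series.apply` to turn lists into columns; shorter lists are padded with missing values either way:

```python
import pandas as pd

x = pd.DataFrame({
    "Participant": ["A", "C"],
    "result": [["negative"], ["negative", "negative"]],
})

# pop() removes the "result" column from x and returns it as a Series of lists
lists = x.pop("result")

# One column per list position; the shorter list is padded with a missing value
wide = pd.DataFrame(lists.tolist(), index=lists.index)
wide.columns = [f"test{i}_Result" for i in range(1, wide.shape[1] + 1)]

out = pd.concat([x, wide], axis=1)
print(out)
```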

1 Comment

Works perfectly. Thanks a lot! That's a neat use of pandas' pop and Series functions.
