Combine multiple rows based on one column value and add extra columns based on another column value in Pandas

Question

I have the following dataframe:

     Name   rollNumber   external_roll_number    testDate      marks
0    John      34             234               2021-04-28      15
1    John      34             234               2021-03-28      25

I would like to convert it like this:

     Name   rollNumber   external_roll_number    testMonth      marks    testMonth      marks
0    John      34             234                  April          15       March         25

If the above is not possible, then I would at least want it to look like this:

     Name   rollNumber   external_roll_number    testDate      marks    testDate      marks
0    John      34             234                2021-04-28      15     2021-03-28       25

How can I convert my dataframe to the desired output? This change will be based on the Name column of the rows.

EDIT 1

I tried using pivot_table like this but I did not get the desired result.

merged_df_pivot = pd.pivot_table(merged_df, index=["name", "testDate"], aggfunc="first", dropna=False).fillna("")

When I try to iterate through the merged_df_pivot like this:

for index, details in merged_df_pivot.iterrows():

I still get two rows, and I was not able to add the new testMonth column with this method.
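
A minimal sketch of why this attempt still yields two rows, assuming the sample data from the question rebuilt by hand (the name `pivoted` is made up for illustration): because testDate is part of the index, each distinct date keeps its own row.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    "Name": ["John", "John"],
    "rollNumber": [34, 34],
    "external_roll_number": [234, 234],
    "testDate": ["2021-04-28", "2021-03-28"],
    "marks": [15, 25],
})

# testDate is an index level here, so the two dates are never merged
# into a single row; the frame is only re-indexed, not widened
pivoted = pd.pivot_table(df, index=["Name", "testDate"], aggfunc="first")
print(pivoted)
```

To end up with one row per student, the date (or month) has to move into the columns rather than stay in the index.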

I have added the code that I had tried before.

– user001, Commented Apr 27, 2021 at 17:38

Rob Raymond · Accepted Answer · 2021-04-27 18:35:38Z

The core step is unstack(), which turns the month values into columns.
The remaining detail is restructuring the month-by-month marks columns into the required layout.
Duplicate column names are generally considered bad practice, so I have suffixed them.
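
To see the first point in isolation, here is a minimal sketch on a toy frame. The frame and the name `wide` are made up for illustration and are not part of the answer's code; the full solution follows.

```python
import pandas as pd

# Toy frame mirroring the question's sample, with the month already extracted
df = pd.DataFrame({
    "Name": ["John", "John"],
    "testMonth": ["April", "March"],
    "marks": [15, 25],
})

# Everything except "marks" goes into the index; unstack then moves the
# "testMonth" level out of the row index and into the columns
wide = df.set_index(["Name", "testMonth"])["marks"].unstack("testMonth")
print(wide)
```

The result has one row per Name and one column per month, which is the shape the rest of the answer then renames.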

import io

import pandas as pd

df = pd.read_csv(io.StringIO("""     Name   rollNumber   external_roll_number    testDate      marks
0    John      34             234               2021-04-28      15
1    John      34             234               2021-03-28      25
"""), sep=r"\s+")

# parse the dates, then keep only the month name
df["testDate"] = pd.to_datetime(df["testDate"])
df = df.assign(testMonth=df["testDate"].dt.strftime("%B")).drop(columns="testDate")


dft = (
    df.set_index([c for c in df.columns if c != "marks"])
    .unstack("testMonth")  # make each month a column
    .droplevel(0, axis=1)  # remove the unneeded "marks" level in the columns
    # add a testMonth_N column per month, then rename the month columns to marks_N
    .pipe(
        lambda d: d.assign(
            **{f"testMonth_{i+1}": c for i, c in enumerate(d.columns)}
        ).rename(columns={c: f"marks_{i+1}" for i, c in enumerate(d.columns)})
    )
    .reset_index()
)

output

   Name  rollNumber  external_roll_number  marks_1  marks_2 testMonth_1 testMonth_2
0  John          34                   234       15       25       April       March
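
For the fallback layout in the question (keeping testDate instead of the month name), or when a student can have two tests in the same month so that unstacking on the month would hit duplicate index entries, one possible variation is to number each student's tests and unstack on that number. This is a sketch under those assumptions, not the answer's method; `id_cols`, `n` and `wide` are hypothetical names.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    "Name": ["John", "John"],
    "rollNumber": [34, 34],
    "external_roll_number": [234, 234],
    "testDate": ["2021-04-28", "2021-03-28"],
    "marks": [15, 25],
})

id_cols = ["Name", "rollNumber", "external_roll_number"]

# Number each student's tests 1, 2, ... in row order (hypothetical helper column)
df["n"] = df.groupby(id_cols).cumcount() + 1

# Unstack both value columns on the running number instead of the month name
wide = df.set_index(id_cols + ["n"])[["testDate", "marks"]].unstack("n")

# Flatten the (column, n) MultiIndex into names such as testDate_1, marks_1
wide.columns = [f"{col}_{n}" for col, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the running number is unique within each student, the unstack cannot collide even if dates or months repeat.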

edited Apr 27, 2021 at 18:35

answered Apr 27, 2021 at 17:58


4 Comments

user001 Over a year ago

Is there any way to pass all column names at once to set_index instead of specifying each one explicitly? I have included only some columns for the purpose of this question. In reality there are more than 20 columns in the dataframe.

Rob Raymond Over a year ago

Yep, updated. Use a list comprehension that excludes the column that should not go into the index.
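
A minimal sketch of that list comprehension on its own, assuming a one-row toy frame; `index_cols` and `indexed` are hypothetical names used only for illustration:

```python
import pandas as pd

# One-row toy frame with the same columns as the answer's intermediate df
df = pd.DataFrame({
    "Name": ["John"],
    "rollNumber": [34],
    "external_roll_number": [234],
    "testMonth": ["April"],
    "marks": [15],
})

# Every column except the value column becomes part of the index,
# however many columns the real dataframe has
index_cols = [c for c in df.columns if c != "marks"]
indexed = df.set_index(index_cols)
print(indexed)
```

The comprehension keeps the original column order, which matters for the order of the columns after reset_index().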

Rob Raymond Over a year ago

On the actual data or on the sample data in the question? If it is the actual data, I'll need to see the actual columns and a few rows to find the reason.

user001 Over a year ago

Hey. Thanks. I resolved it. It was my mistake.
