
I have the following dataframe:

     Name   rollNumber   external_roll_number    testDate      marks
0    John      34             234               2021-04-28      15
1    John      34             234               2021-03-28      25

I would like to convert it like this:

     Name   rollNumber   external_roll_number    testMonth      marks    testMonth      marks
0    John      34             234                  April          15       March         25

If that is not possible, I would at least like it to look like this:

     Name   rollNumber   external_roll_number    testDate      marks    testDate      marks
0    John      34             234                2021-04-28      15     2021-03-28       25

How can I convert my dataframe into the desired output? Rows should be combined based on their Name column.
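For the second (testDate) layout, one possible sketch numbers each student's tests with groupby().cumcount() and then pivots on that number. This is a different technique from the unstack() answer below; it assumes Name, rollNumber and external_roll_number are constant for a given student, and the helper column `n` is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "Name": ["John", "John"],
    "rollNumber": [34, 34],
    "external_roll_number": [234, 234],
    "testDate": pd.to_datetime(["2021-04-28", "2021-03-28"]),
    "marks": [15, 25],
})

# Assumption: these columns are identical on every row of the same student.
id_cols = ["Name", "rollNumber", "external_roll_number"]

# Number each student's tests 1, 2, ... in their original row order (hypothetical column "n").
tests = df.assign(n=df.groupby(id_cols).cumcount() + 1)

# One row per student; columns become a (value, n) MultiIndex.
wide = tests.pivot(index=id_cols, columns="n", values=["testDate", "marks"])

# Put the columns in testDate_1, marks_1, testDate_2, marks_2, ... order.
order = [(v, n) for n in sorted(tests["n"].unique()) for v in ["testDate", "marks"]]
wide = wide[order]

# Flatten the MultiIndex into suffixed names instead of repeating "testDate"/"marks".
wide.columns = [f"{v}_{n}" for v, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the numbering restarts for every student, students with different numbers of tests simply get NaN in the trailing columns.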

EDIT 1

I tried using pivot_table like this, but I did not get the desired result:

merged_df_pivot = pd.pivot_table(merged_df, index=["name", "testDate"], aggfunc="first", dropna=False).fillna("")

When I iterate through merged_df_pivot like this:

for index, details in merged_df_pivot.iterrows():

I still get two rows, and I was not able to add the new testMonth column with this approach.
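In the pivot_table attempt above, testDate is part of `index`, so each date stays on its own row. A minimal sketch, assuming the column names from the sample data, moves the month into `columns` instead, which collapses each student to a single row with one marks column per month:

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "Name": ["John", "John"],
    "rollNumber": [34, 34],
    "external_roll_number": [234, 234],
    "testDate": pd.to_datetime(["2021-04-28", "2021-03-28"]),
    "marks": [15, 25],
})

# Derive the month name shown in the desired output, e.g. "April".
df["testMonth"] = df["testDate"].dt.strftime("%B")

# testMonth goes into `columns`, not `index`, so the two rows become one.
pivot = pd.pivot_table(
    df,
    index=["Name", "rollNumber", "external_roll_number"],
    columns="testMonth",
    values="marks",
    aggfunc="first",
).reset_index()
print(pivot)
```

The month names end up as column labels (April, March); turning them into separate testMonth_i/marks_i pairs is the restructuring step the answer below performs.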

  • I have added the code that I had tried before. Commented Apr 27, 2021 at 17:38

1 Answer

  • The core step is unstack(), which turns the month into columns
  • The rest is detail: restructuring the month-by-month marks columns into the required layout
  • Duplicate column names are generally considered bad practice, so I have suffixed them
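To illustrate the last point, a small sketch of why duplicate names are awkward: selecting a duplicated label returns a whole DataFrame rather than a single Series, so code that expects one column breaks. The frames here are illustrative only.

```python
import pandas as pd

# Two columns sharing the label "marks", as in the layout the question asks for.
dup = pd.DataFrame([[15, 25]], columns=["marks", "marks"])

# Selecting a duplicated label returns a DataFrame with both columns, not a Series.
print(type(dup["marks"]))

# Suffixed names keep each column individually addressable.
suffixed = pd.DataFrame([[15, 25]], columns=["marks_1", "marks_2"])
print(type(suffixed["marks_1"]))
```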
import io

import pandas as pd

# Sample data from the question (raw string so "\s" is not treated as an escape)
df = pd.read_csv(io.StringIO("""     Name   rollNumber   external_roll_number    testDate      marks 
0    John      34             234               2021-04-28      15 
1    John      34             234               2021-03-28      25
"""), sep=r"\s+")

# Replace testDate with the month name, e.g. "April"
df["testDate"] = pd.to_datetime(df["testDate"])
df = df.assign(testMonth=df["testDate"].dt.strftime("%B")).drop(columns="testDate")

dft = (df.set_index([c for c in df.columns if c != "marks"])  # every column except marks
 .unstack("testMonth") # make month a column
 .droplevel(0, axis=1) # remove unneeded level in columns
 # create columns for months from column names and rename marks columns
 .pipe(lambda d: d.assign(**{f"testMonth_{i+1}": c
                             for i, c in enumerate(d.columns)}).rename(columns={c: f"marks_{i+1}"
                                                                               for i, c in enumerate(d.columns)}))
 .reset_index()
)
print(dft)

Output:

   Name  rollNumber  external_roll_number  marks_1  marks_2 testMonth_1 testMonth_2
0  John          34                   234       15       25       April       March
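One caveat with the output above: unstack() orders the new columns by sorting the level values, so month-name strings come out alphabetically (April before March), not chronologically. A minimal sketch, assuming chronological order is wanted, unstacks on a monthly Period instead and converts it to a month name afterwards. The August row and the helper column testPeriod are hypothetical, added only to make the ordering visible.

```python
import pandas as pd

# Sample rows from the question plus a hypothetical third test in August.
df = pd.DataFrame({
    "Name": ["John", "John", "John"],
    "rollNumber": [34, 34, 34],
    "external_roll_number": [234, 234, 234],
    "testDate": pd.to_datetime(["2021-04-28", "2021-03-28", "2021-08-02"]),
    "marks": [15, 25, 30],
})

id_cols = ["Name", "rollNumber", "external_roll_number"]

# Unstack on a monthly Period: Periods sort chronologically,
# whereas month-name strings would sort alphabetically (April, August, March).
wide = (
    df.assign(testPeriod=df["testDate"].dt.to_period("M"))
    .set_index(id_cols + ["testPeriod"])["marks"]
    .unstack("testPeriod")
)

# Rebuild suffixed testMonth_i / marks_i pairs in chronological order.
pieces = {}
for i, period in enumerate(wide.columns, start=1):
    pieces[f"testMonth_{i}"] = period.to_timestamp().strftime("%B")
    pieces[f"marks_{i}"] = wide[period]

out = pd.DataFrame(pieces, index=wide.index).reset_index()
print(out)
```

Like the answer's version, this assumes each student has at most one test per month; a duplicate (student, month) pair makes unstack() raise.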

4 Comments

Is there any way to pass all the column names to set_index at once instead of specifying each one explicitly? I included only some of the columns for this question; the real dataframe has more than 20.
Yes, updated: use a list comprehension that excludes the column(s) that should not go into the index.
On the actual data or on the sample data in the question? If it is the actual data, I'll need to see the actual columns and a few rows to check the reason.
Hey. Thanks. I resolved it. It was my mistake.
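As an alternative to the list comprehension mentioned in the comments, a short sketch using Index.drop, which removes the named labels and keeps the remaining columns in their original order (unlike Index.difference, which sorts them). The frame is a reduced version of the sample data, already holding testMonth.

```python
import pandas as pd

# Reduced sample data, with testMonth already derived from testDate.
df = pd.DataFrame({
    "Name": ["John", "John"],
    "rollNumber": [34, 34],
    "external_roll_number": [234, 234],
    "testMonth": ["April", "March"],
    "marks": [15, 25],
})

# Every column except the value column goes into the index, however many there are.
# Same result as [c for c in df.columns if c != "marks"]; drop() also accepts a list.
index_cols = df.columns.drop("marks").tolist()

wide = df.set_index(index_cols)["marks"].unstack("testMonth")
print(wide)
```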
