0

I have got three different .csv files which contain the grades for students in three different assignment. I would like to read them with pandas and calculate the average for each student. The template for each file is:

Student id, Mark, extra fields, ...
4358975489, 9,  ... ...
2345234523, 10,  ... ...
7634565323, 7,  ... ...
7653563366, 7,  ... ...
...         ...,  ... ...

For the second assignment:

Student id, Mark, extra fields, ...
4358975489, 6,  ... ...
2345234523, 8,  ... ...
7634565323, 4,  ... ...
7653563366, 5,  ... ...
...         ...,  ... ...

Desired output for the two doc for instance:

Student id, average, extra fields, ...
4358975489, 7.5,  ... ...
2345234523, 9,  ... ...
7634565323, 5.5,  ... ...
7653563366, 6,  ... ...
...         ...,  ... ...

the same for the last doc. I want to read these docs separately and for each student id to average the Mark.

Now, my code for reading one file is the following:

i_df1 = pandas.read_csv('first.csv')
i_df2 = pandas.read_csv('second.csv')
i_df3 = pandas.read_csv('third.csv')

print (o_df.keys())
for i, row in i_df1.iterrows():
    pdb.set_trace()

How can I process all three files simultaneously and extract the average grade?

2
  • pd.concat([i_df1, i_df2, i_df3]).groupby('Student id').mean(). Commented Nov 9, 2020 at 14:30
  • Would be great if you can show some sample data with expected output. Commented Nov 9, 2020 at 14:30

4 Answers 4

1

If you want to process all dfs together, you can do this:

df = df1.append([df2, df3]).groupby('Student id', as_index=False).mean()

OR:

If you want to do it simultaneously, you can use list comprehension with df.append and mean:

Below is the list of your dfs:

In [1220]: df_list = [i_df1, i_df2, i_df3]

You can simultaneously find average of each student in a file and store the output in another list:

In [1223]: df = [i.groupby('Student_id', as_index=False).mean() for i in df_list]
Sign up to request clarification or add additional context in comments.

4 Comments

I want to average the Mark though and not the student id.
Just to avoid confusions, can you please add some sample dataframes with final expected output. It would really help, otherwise we will keep going back and forth.
Hey I just did so.
@JoseRamon Great. My first answer is exactly what you need. Try it with just 2 data frames. Like this df = df1.append(df2).groupby('Student_id').mean()
1

Use pd.concat to concat 3 df:

 i_df1 = pandas.read_csv('first.csv')
 i_df2 = pandas.read_csv('second.csv')
 i_df3 = pandas.read_csv('third.csv')

 df = pd.concat([i_df1, i_df2, i_df3])
 df.groupby('Student id').agg({'Mark':'mean'})

4 Comments

I don't think .agg({'Mark','mean'}) would work. Did you mean .agg({'Mark':'mean'})
df = pd.concat([i_df1, i_df2, i_df3]) returns for df just i_df1
Still, after the groupby and the averaging, I am getting just the i_df1 result.
Are u sure? I've already tested it and it works fine
1

You could also use the filenames and directly concatenate the dataframes together:

fnames = ["first.csv", "second.csv", "third.csv"]
df = pd.concat(pd.read_csv(fname) for fname in fnames)

df.groupby("Student id")["Mark"].mean()

Probably practically irrelevant, but perhaps nice to know anyway: This approach doesn't load the data into your memory twice, but only once.

Comments

1

using the data you gave assuming they get same marks three times

import pandas as pd
import numpy as np


data = [
    [4358975489, 9],
     [2345234523, 10],
    [7634565323, 7]]



data = np.array(data)
data = pd.DataFrame(data, columns=["student", "mark"])
data1 = pd.DataFrame(data, columns=["student", "mark"])
data2 = pd.DataFrame(data, columns=["student", "mark"])

std_maks = pd.concat([data, data1, data2]).groupby('student')
print(std_maks['mark'].mean())

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.