
I'm trying to create a Python script using pandas that imports a .txt file and calculates the average of each subject.

I'm trying to turn this "file.txt":

code name subject1 subject2 subject3
1234 Ali 6 0 8
1235 Carl 4 7 7
1236 Jason 3 5 0

and turn it into this:

subject1 average is: 4.3
subject2 average is: 6
subject3 average is: 7.5
  • subject1 is calculated like this: (6 + 4 + 3) / 3
  • subject2 is calculated like this: (7 + 5) / 2 <-- because one person has a 0, which means he/she didn't participate, so their 0 doesn't count toward the average
  • subject3 is calculated like this: (8 + 7) / 2 <-- same as above
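A plain-Python sketch (no pandas) that reproduces the arithmetic above can help confirm the expected numbers. It assumes, as described, that a 0 always means "did not participate"; the names `scores` and `average_ignoring_zeros` are made up for illustration.

```python
# Scores per subject, copied from the sample file.txt; a 0 is read as "did not participate".
scores = {
    "subject1": [6, 4, 3],
    "subject2": [0, 7, 5],
    "subject3": [8, 7, 0],
}


def average_ignoring_zeros(values):
    """Mean of the non-zero values; NaN if nobody took part."""
    taken = [v for v in values if v != 0]
    return sum(taken) / len(taken) if taken else float("nan")


averages = {subject: average_ignoring_zeros(vals) for subject, vals in scores.items()}

for subject, avg in averages.items():
    # round to one decimal; the :g format drops a trailing ".0", so 6.0 prints as 6
    print(f"{subject} average is: {round(avg, 1):g}")
```

With the sample scores this prints the same three lines as the desired output (4.3, 6 and 7.5).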

I'm also trying to figure out a way to make the script flexible, so that it can handle more subjects and more people (e.g. 5 instead of 3).

This is my code so far:

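import numpy as np
import pandas as pd

# read input file (whitespace-separated, so the default sep=',' would not split it)
df = pd.read_csv('file.txt', sep=r'\s+')

# calculate each person's mean across the subject columns, ignoring 0 values
df['mean'] = df.iloc[:, 2:].astype(float).replace(0, np.nan).mean(1)

# iterate rows and print results
for name, mean in df.set_index('name')['mean'].items():
    print(f'{name} has average of {mean:.2f}')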
  • It calculates the average for each person (horizontally),
  • but I can't figure out a way to do it vertically, for each subject.
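One likely stumbling block before any averaging: file.txt as shown is separated by spaces, while `pd.read_csv` splits on commas by default. A minimal sketch, assuming the real file uses whitespace between columns exactly like the sample; the sample is inlined through `io.StringIO` (the name `SAMPLE` is just for illustration) so it runs without the file on disk.

```python
import io

import pandas as pd

# The sample from the question, inlined so this runs without file.txt on disk.
# With the real file, pass 'file.txt' instead of io.StringIO(SAMPLE).
SAMPLE = """code name subject1 subject2 subject3
1234 Ali 6 0 8
1235 Carl 4 7 7
1236 Jason 3 5 0
"""

# sep=r"\s+" splits on any run of whitespace; with the default sep=","
# every line would end up in a single column.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

print(df.columns.tolist())
print(df.dtypes)
```

Once the columns are split correctly, the subject columns are read as integers, so positional selection such as `df.iloc[:, 2:]` picks up exactly the score columns.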

thanks for the help guys ^_^

  • What kind of help do you expect? Do you want us to write code for you? If so, we don't do that: we only help with specific issues in concrete code. Otherwise, please post the code you've written to solve this and explain what the issue is. Commented Oct 7, 2018 at 13:47
  • @ForceBru, I added more information; I already have some code. I hope it helps, thanks! Commented Oct 7, 2018 at 13:58

2 Answers


The argument 1 that you pass to pd.DataFrame.mean is the axis along which the mean is calculated. The default, axis=0, averages down each column (one mean per subject); by passing 1 you are explicitly asking for the row-wise mean (one mean per person). Remove that argument and you should be good.

In [155]: df.iloc[:, 2:].astype(float).replace(0, np.nan).mean()
Out[155]:
subject1    4.333333
subject2    6.000000
subject3    7.500000
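The axis behaviour can be seen on a tiny made-up frame (the name `demo` and its numbers are purely illustrative). It also shows why replacing 0 with `np.nan` works here: `DataFrame.mean` skips NaN by default (`skipna=True`).

```python
import numpy as np
import pandas as pd

# A toy frame (made-up numbers): 3 rows, 2 columns, one missing value.
demo = pd.DataFrame({"a": [1, np.nan, 5], "b": [2, 4, 6]})

col_means = demo.mean()        # axis=0 (default): one value per column
row_means = demo.mean(axis=1)  # axis=1: one value per row

# NaN is skipped, so column "a" averages only 1 and 5
print(col_means.tolist())  # [3.0, 4.0]
print(row_means.tolist())  # [1.5, 4.0, 5.5]
```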

3 Comments

Is it possible to calculate both the horizontal and the vertical averages and print both?
Well, that's what you're doing. If you let df_nan = df.iloc[:, 2:].astype(float).replace(0, np.nan), then you could print df_nan.mean() first, then df_nan.mean(1) afterwards.
Great, you're welcome. If you found the answer helpful, you can accept it. Besides giving us mostly useless internet points, this helps to indicate which questions on StackOverflow are still in need of attention.
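The df_nan suggestion from the comments can be sketched as one script that prints both directions in the format the question asked for. This assumes every column after `code` and `name` is a subject, which is also what makes it handle extra subjects or extra people without code changes; the sample is inlined via `io.StringIO`, and `subject_means` / `person_means` are illustrative names.

```python
import io

import numpy as np
import pandas as pd

# Sample from the question; with the real file use pd.read_csv("file.txt", sep=r"\s+").
SAMPLE = """code name subject1 subject2 subject3
1234 Ali 6 0 8
1235 Carl 4 7 7
1236 Jason 3 5 0
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Every column after "code" and "name" is treated as a subject; 0 becomes NaN
df_nan = df.iloc[:, 2:].astype(float).replace(0, np.nan)

# Vertical: one average per subject
subject_means = df_nan.mean()
for subject, mean in subject_means.items():
    print(f"{subject} average is: {mean:.1f}")

# Horizontal: one average per person (same as df_nan.mean(1))
person_means = df_nan.mean(axis=1)
for name, mean in zip(df["name"], person_means):
    print(f"{name} has average of {mean:.2f}")
```

With the sample, the subject lines come out as 4.3, 6.0 and 7.5, and the people as Ali 7.00, Carl 6.00 and Jason 4.00.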

If I understand you correctly, you want to do this:

import pandas as pd

data = pd.read_csv('data.csv', sep=' ')
# You can change the range for the number of subjects
for i in range(1, 4):
    # Print the average for each subject (note: 0 scores are included here)
    print(data['subject' + str(i)].mean())
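A variant of this loop that avoids the hard-coded `range(1, 4)` and leaves out the zero scores is sketched below. It assumes every subject column name starts with `"subject"`, as in the sample; the sample is inlined via `io.StringIO`, and `subject_columns` / `averages` are illustrative names.

```python
import io

import pandas as pd

# Sample from the question; with a real file use pd.read_csv("file.txt", sep=r"\s+").
SAMPLE = """code name subject1 subject2 subject3
1234 Ali 6 0 8
1235 Carl 4 7 7
1236 Jason 3 5 0
"""
data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Pick up every column whose name starts with "subject" instead of a fixed range
subject_columns = [c for c in data.columns if c.startswith("subject")]

averages = {}
for column in subject_columns:
    scores = data[column]
    # Keep only non-zero scores, since a 0 means the person did not participate
    averages[column] = scores[scores != 0].mean()
    print(f"{column} average is: {averages[column]:.1f}")
```

Adding a `subject4` column or more rows to the file needs no change to the loop.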
