I have a question about conditional looping over a pandas DataFrame. The DataFrame of interest is huge. It holds student names and their test scores at different times in columns (see below). A student is considered to have failed if his/her score is less than 75 in any of the tests, and to have passed otherwise. I'm not able to do this efficiently. DataFrame:

import numpy as np
import pandas as pd

score = {'student_name': ['Jiten', 'Jac', 'Ali', 'Steve', 'Dave', 'James'],
         'test_quiz_1': [74, 81, 84, 67, 59, 96],
         'test_quiz_2': [76, np.nan, 99, 77, 53, 69],
         'test_mid_term': [76, 88, 84, 67, 58, np.nan],
         'test_final_term': [76, 78, 89, 67, 58, 96]}

df = pd.DataFrame(score, columns=['student_name', 'test_quiz_1', 'test_quiz_2', 'test_mid_term', 'test_final_term'])

My approach (modified based on Jacques Kvam's answer):

df.test_quiz_1 < 75

This (^) gives me the rows where a student failed that particular test. The same can be repeated for the other tests (df.test_quiz_2, ...). Finally, I need to combine all of these into one final column where a student is marked as failed if he/she failed any test.

Edited: I have very little knowledge of Python and pandas. Below is pseudocode for how I would have implemented this in C/C++.

for student in student_list:
    value=0
    for i in range (no_of_test):
        if (score<75):
             value=value+1
        else:
             continue
    if(value>0):
         student[status]=fail
    else:
         student[status]=pass

The above is just pseudocode. I'm not creating any additional column to mark whether a student failed a given test. Is it possible to implement something similar in Python using pandas?

Please advise.

3 Answers

2

Instead of looping, you should use the vectorized operations that pandas inherits from NumPy. For example, to mark the students who passed test_quiz_1 (a score of at least 75):

df.test_quiz_1 >= 75

Giving:

0    False
1     True
2     True
3    False
4    False
5     True
Name: test_quiz_1, dtype: bool

Edit: Continuing, let's say you have 3 tests and 5 students, and represent the pass/fail results as a boolean DataFrame (rows are students, columns are tests):

      0      1      2
0  True   True  False
1  True   True   True
2  True  False  False
3  True  False   True
4  True  False  False

A student passes only if they pass all the tests, so we can run df.all(axis=1) on this boolean DataFrame to reduce each row to a single value, which gives:

0    False
1     True
2    False
3    False
4    False
dtype: bool

Only student 1 passed in this case.
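
Applied to the data in the question, the same idea might look like the sketch below. The names `scores`, `passed_each` and `passed_all` are only illustrative. One thing to be aware of: a comparison against NaN evaluates to False, so in this version a missing score counts as "not passed". Whether that is the desired treatment is an assumption the question does not settle.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Jiten', 'Jac', 'Ali', 'Steve', 'Dave', 'James'],
    'test_quiz_1': [74, 81, 84, 67, 59, 96],
    'test_quiz_2': [76, np.nan, 99, 77, 53, 69],
    'test_mid_term': [76, 88, 84, 67, 58, np.nan],
    'test_final_term': [76, 78, 89, 67, 58, 96],
})

# Use the names as the index so that only score columns remain.
scores = df.set_index('student_name')

# Boolean DataFrame: True where a test was passed (score >= 75).
# NaN >= 75 is False, so a missing score is treated as not passed here.
passed_each = scores.ge(75)

# A student passes overall only if every test was passed.
passed_all = passed_each.all(axis=1)
print(passed_all)
```

With this data only Ali ends up as True, because Jac has a missing score in test_quiz_2.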

2 Comments

Thanks. I have modified the question based on your answer. Please advise.
Thanks again. Can't we do away with creating additional columns in the data frame (ref. columns 0, 1, 2 in the boolean DataFrame above)?
1
df.set_index('student_name').lt(75).any(axis=1)
# `lt` is the method version of `<`
# this identifies students that received
# a score less than 75 on any of the tests.

student_name
Jiten     True
Jac      False
Ali      False
Steve     True
Dave      True
James     True
dtype: bool
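
If a 'pass'/'fail' label per student is wanted, as in the question's pseudocode, the boolean result can be mapped to strings without adding any helper columns to the DataFrame. This is a minimal sketch; `failed_any` and `status` are illustrative names, and missing scores are not counted as failures here because `lt` returns False for NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Jiten', 'Jac', 'Ali', 'Steve', 'Dave', 'James'],
    'test_quiz_1': [74, 81, 84, 67, 59, 96],
    'test_quiz_2': [76, np.nan, 99, 77, 53, 69],
    'test_mid_term': [76, 88, 84, 67, 58, np.nan],
    'test_final_term': [76, 78, 89, 67, 58, 96],
})

# True for every student with at least one score below 75.
failed_any = df.set_index('student_name').lt(75).any(axis=1)

# Map the boolean mask to labels; np.where picks 'fail' where the mask is True.
status = pd.Series(np.where(failed_any, 'fail', 'pass'), index=failed_any.index)
print(status)
```

The original `df` is left unchanged; `status` is a separate Series indexed by student name.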

0

I think this suits your needs:

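# All score columns (everything except the name column).
cols = df.columns.drop("student_name").tolist()
# True if any score is below 75; missing scores (NaN) are replaced by 0 first.
df["PassOrFail"] = df[cols].fillna(0).lt(75).any(axis=1)

# One extra boolean column per test: True where that test's score is below 75.
for i in cols:
    df[i+"_"] = df[i].fillna(0).lt(75)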

Explanation

First we create a list with the relevant columns:

['test_quiz_1', 'test_quiz_2', 'test_mid_term', 'test_final_term']

We then create a new column, "PassOrFail", which is True if any of the relevant columns (with np.nan replaced by 0) is lower than 75, so True here means the student failed.

Lastly, we create a new boolean column for every relevant column, holding True where that test's score is below 75.

Update

Let's say we are only interested in getting True or False, then the following code should be sufficient:

cols = df.columns.drop("student_name").tolist()
results = df[cols].fillna(0).lt(75).any(axis=1).tolist()
(~pd.Series(results,index=df["student_name"])).to_dict()

Outputs (because of the ~, True now means the student passed every test):

{'Ali': True,
'Dave': False,
'Jac': False,
'James': False,
'Jiten': False,
'Steve': False}
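
Because of `fillna(0)`, a missing score counts as a failure, which is why Jac shows as False in the output above. Whether that is intended depends on the data. The sketch below compares both treatments on the question's DataFrame; the names `passed_strict`, `passed_lenient` and `comparison` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Jiten', 'Jac', 'Ali', 'Steve', 'Dave', 'James'],
    'test_quiz_1': [74, 81, 84, 67, 59, 96],
    'test_quiz_2': [76, np.nan, 99, 77, 53, 69],
    'test_mid_term': [76, 88, 84, 67, 58, np.nan],
    'test_final_term': [76, 78, 89, 67, 58, 96],
})

cols = df.columns.drop('student_name')

# Missing scores replaced by 0, so a missing test counts as a failure.
passed_strict = ~df[cols].fillna(0).lt(75).any(axis=1)

# Missing scores left as NaN: NaN < 75 evaluates to False, so they are ignored.
passed_lenient = ~df[cols].lt(75).any(axis=1)

comparison = pd.DataFrame(
    {'nan_as_fail': passed_strict.values, 'nan_ignored': passed_lenient.values},
    index=df['student_name'],
)
print(comparison)
```

The two columns differ only for Jac. James also has a missing score, but fails either way because of the 69 in test_quiz_2.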

2 Comments

Thanks. Please check edited section of the question.
@XingfangLee Updated my answer. Is this what you are asking for?
