Conditional looping: Pandas Python

Question

I have a question about conditional looping over a pandas DataFrame. The DataFrame of interest is huge. It holds student names and their test scores at different times, one test per column (see below). A student is considered to have failed if his/her score is below 75 on any of the tests, and to have passed otherwise. I have not been able to do this efficiently. DataFrame:

import numpy as np
import pandas as pd

score = {'student_name': ['Jiten', 'Jac', 'Ali', 'Steve', 'Dave', 'James'],
         'test_quiz_1': [74, 81, 84, 67, 59, 96],
         'test_quiz_2': [76, np.nan, 99, 77, 53, 69],
         'test_mid_term': [76, 88, 84, 67, 58, np.nan],
         'test_final_term': [76, 78, 89, 67, 58, 96]}

df = pd.DataFrame(score, columns=['student_name', 'test_quiz_1', 'test_quiz_2', 'test_mid_term', 'test_final_term'])

My approach (modified based on Jacques Kvam's answer):

df.test_quiz_1 < 75

This gives True at the positions where a student failed that test. The same can be repeated for the other tests (df.test_quiz_2, ...). Finally, I need to combine all of these into one final column in which a student is marked as failed if he/she failed any test.

Edited: I know very little about Python and pandas. Below is pseudocode showing how I would have implemented this in C/C++.

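for student in student_list:
    failed_tests = 0
    # count the tests on which this student scored below 75
    for i in range(no_of_tests):
        if score[student][i] < 75:
            failed_tests += 1
    # failing any single test means the student fails overall
    if failed_tests > 0:
        status[student] = "fail"
    else:
        status[student] = "pass"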

The above is only pseudocode. Note that it does not create any additional columns to record whether a student failed a test. Is it possible to implement something similar in Python using pandas?

Please advise.

Jacques Kvam · Accepted Answer · 2017-07-26 05:18:57Z

2

Instead of looping, you should use the vectorized operations that pandas inherits from NumPy. For example, to mark the students who passed test_quiz_1:

df.test_quiz_1 >= 75

Giving:

0    False
1     True
2     True
3    False
4    False
5     True
Name: test_quiz_1, dtype: bool
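To make the contrast with looping concrete, here is a minimal sketch that runs a row-by-row loop (in the spirit of the pseudocode in the question) next to the vectorized form and checks that both agree. It uses only the first three students and two tests from the question's data; the names test_cols, loop_failed and vec_failed are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: three students, two tests.
df = pd.DataFrame({
    "student_name": ["Jiten", "Jac", "Ali"],
    "test_quiz_1": [74, 81, 84],
    "test_quiz_2": [76, np.nan, 99],
})
test_cols = ["test_quiz_1", "test_quiz_2"]

# Explicit loop: count failed tests per student, as in the pseudocode.
loop_failed = []
for _, row in df.iterrows():
    n_failed = 0
    for col in test_cols:
        if row[col] < 75:  # a comparison with NaN is False
            n_failed += 1
    loop_failed.append(n_failed > 0)

# Vectorized form: one comparison over the whole block, then a row-wise any.
vec_failed = (df[test_cols] < 75).any(axis=1)

print(loop_failed)
print(vec_failed.tolist())
```

Both forms treat a missing score as "not below 75", because any comparison with NaN evaluates to False. On a large DataFrame the vectorized form is usually much faster, since the comparison runs in compiled NumPy code rather than in the Python interpreter.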

Edit: Continuing, let's say you have 3 tests and 5 students, and represent the results as a boolean DataFrame:

      0      1      2
0  True   True  False
1  True   True   True
2  True  False  False
3  True  False   True
4  True  False  False

A student passes only if they pass every test, so we can call df.all(axis=1) on this boolean DataFrame to check each row, which gives:

0    False
1     True
2    False
3    False
4    False
dtype: bool

Only student 1 passed in this case.
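Applied to the data in the question, the same idea might look like the sketch below. The boolean frame is only a temporary value and is never stored in df, so no extra columns are created. The names test_cols, passed_each and passed_all are illustrative, and selecting columns by the "test_" prefix is an assumption based on the column names shown in the question.

```python
import numpy as np
import pandas as pd

score = {
    "student_name": ["Jiten", "Jac", "Ali", "Steve", "Dave", "James"],
    "test_quiz_1": [74, 81, 84, 67, 59, 96],
    "test_quiz_2": [76, np.nan, 99, 77, 53, 69],
    "test_mid_term": [76, 88, 84, 67, 58, np.nan],
    "test_final_term": [76, 78, 89, 67, 58, 96],
}
df = pd.DataFrame(score)

# Assumption: every score column starts with "test_".
test_cols = [c for c in df.columns if c.startswith("test_")]

# Temporary boolean frame: True where the score is at least 75.
# NaN >= 75 is False, so a missing score counts as "not passed" here.
passed_each = df[test_cols].ge(75)

# A student passes only if every test was passed.
passed_all = passed_each.all(axis=1)
print(passed_all.tolist())
```

With this treatment of missing scores, only Ali passes. If a missing score should not count against a student, the comparison would need to be written the other way round (mark failures with lt(75) and combine with any).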

edited Jul 26, 2017 at 5:18

answered Jul 26, 2017 at 4:48

Jacques Kvam

2 Comments

Xingfang Lee Over a year ago

Thanks. I have modified the question based on your answer. Please advise.

Xingfang Lee Over a year ago

Thanks again. Can't we avoid creating additional columns in the DataFrame (see columns 0, 1, 2 in the boolean DataFrame above)?

piRSquared · Accepted Answer · 2017-07-26 04:53:07Z

1

df.set_index('student_name').lt(75).any(axis=1)
# `lt` is the method version of `<`
# this identifies students that received
# a score less than 75 on any of the tests.

student_name
Jiten     True
Jac      False
Ali      False
Steve     True
Dave      True
James     True
dtype: bool
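One detail worth checking in this approach is how missing scores behave. lt(75) returns False for NaN, so a missing score is not counted as a failure, which is why Jac comes out as False above. The sketch below shows that behaviour next to one possible alternative that treats a missing score as a failure; which rule is right depends on the grading policy, which the question does not state. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "student_name": ["Jiten", "Jac", "Ali", "Steve", "Dave", "James"],
    "test_quiz_1": [74, 81, 84, 67, 59, 96],
    "test_quiz_2": [76, np.nan, 99, 77, 53, 69],
    "test_mid_term": [76, 88, 84, 67, 58, np.nan],
    "test_final_term": [76, 78, 89, 67, 58, 96],
})
scores = df.set_index("student_name")

# Missing scores are ignored: NaN < 75 evaluates to False.
failed_ignoring_nan = scores.lt(75).any(axis=1)

# Alternative (an assumption about policy): a missing score counts as a failure.
failed_nan_as_fail = (scores.lt(75) | scores.isna()).any(axis=1)

print(failed_ignoring_nan.to_dict())
print(failed_nan_as_fail.to_dict())
```

The two results differ only for Jac, whose sole sub-75 entry is the missing test_quiz_2 score. James fails under both rules because of the 69 on test_quiz_2.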

answered Jul 26, 2017 at 4:53

piRSquared

Anton vBR · Accepted Answer · 2017-07-26 08:55:30Z

0

I think this suits your needs:

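cols = df.columns.drop("student_name").tolist()
# True means the student failed at least one test (a missing score counts as 0)
df["PassOrFail"] = df[cols].fillna(0).lt(75).any(axis=1)

# one extra boolean column per test: True where that test was failed
for i in cols:
    df[i+"_"] = df[i].fillna(0).lt(75)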

Explanation

First we create a list with the relevant columns:

['test_quiz_1', 'test_quiz_2', 'test_mid_term', 'test_final_term']

We then create a new column, "PassOrFail", which is True if any of the relevant columns (with NaN replaced by 0) is below 75. Note that True here means the student failed.

Lastly, we create a new column for every relevant column, holding True where that test was failed and False otherwise.

Update

Let's say we are only interested in getting True or False, then the following code should be sufficient:

cols = df.columns.drop("student_name").tolist()
results = df[cols].fillna(0).lt(75).any(axis=1).tolist()
# ~ inverts the flags, so True now means the student passed
(~pd.Series(results,index=df["student_name"])).to_dict()

Outputs:

{'Ali': True,
'Dave': False,
'Jac': False,
'James': False,
'Jiten': False,
'Steve': False}
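If "pass"/"fail" labels are preferred over booleans, as in the status assignment in the question's pseudocode, one possible sketch uses np.where on the same mask. It keeps this answer's convention that a missing score counts as 0 and therefore as a failure; the names failed and status are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "student_name": ["Jiten", "Jac", "Ali", "Steve", "Dave", "James"],
    "test_quiz_1": [74, 81, 84, 67, 59, 96],
    "test_quiz_2": [76, np.nan, 99, 77, 53, 69],
    "test_mid_term": [76, 88, 84, 67, 58, np.nan],
    "test_final_term": [76, 78, 89, 67, 58, 96],
})
cols = df.columns.drop("student_name").tolist()

# True where the student scored below 75 on any test (NaN treated as 0).
failed = df[cols].fillna(0).lt(75).any(axis=1)

# Map the boolean mask to labels without adding columns to df.
status = pd.Series(np.where(failed, "fail", "pass"),
                   index=df["student_name"], name="status")
print(status.to_dict())
```

The result is a separate Series keyed by student name, so df itself is left unchanged. Assigning it back with df["status"] = status.values would store it as a single column if that is wanted.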

edited Jul 26, 2017 at 8:55

answered Jul 26, 2017 at 6:34

Anton vBR

2 Comments

Xingfang Lee Over a year ago

Thanks. Please check edited section of the question.

Anton vBR Over a year ago

@XingfangLee Updated my answer. Is this what you are asking for?
