How to loop over a dataframe and create list

Question

So, I have the following data, and I want to loop through the DataFrame, perform some calculations, and at the end save the results in a list. I am having trouble creating the list: I only get a single value in it, not the two means I intend to get. If anybody has a more effective way to solve this problem, please share.


import numpy as np
import pandas as pd

data = {'PassengerId' : [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
        'Survived' : [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
        'Pclass' : [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
        'Age' : [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
        'SibSp' : [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
        'Parch' : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
        'Fare' : [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]}

dicts = pd.DataFrame(data)

# Goal: the mean PassengerId of each of the two Age/Fare groups, collected in a list
def Mean():
    list_mean = []
    list_all = []
    for i, row in dicts.iterrows():
        if (row['Age'] > 0.2) & (row['Fare'] < 0.1):
            list_all.append(row['PassengerId'])
        elif (row['Age'] > 0.2) & (row['Fare'] > 0.1):
            list_all.clear()
            list_all.append(row['PassengerId'])
    return list_mean.append(np.mean(list_all))

Mean()

Help Please!!
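A quick check of one issue in the code above: list.append() modifies the list in place and returns None, so return list_mean.append(np.mean(list_all)) hands back None rather than the list. A minimal demonstration (0.5 is an arbitrary example value):

```python
# list.append() mutates the list in place and returns None
list_mean = []
result = list_mean.append(0.5)  # 0.5 is just an arbitrary example value

print(result)     # None
print(list_mean)  # [0.5]
```

Appending first and then returning the list itself (return list_mean) avoids this.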

If I understand the question correctly, you are getting only one item in the list, and that is because you return as soon as your if condition is satisfied for the first value in the DataFrame. You should return the final value only after the for loop has completed.
– Somu Sinhhaa, Commented Apr 22, 2021 at 11:57
@SomuSinhhaa Thank you for your reply. I was able to solve that problem, but I now have a new challenge. Could you help me check it out? I have modified the code.
– deep learning Engineer, Commented Apr 22, 2021 at 14:02
Sorry, I still see your old code, where you return inside the if block. As mentioned in one of the answers, you should return only after you have stored all the elements in list_mean, i.e. after the for loop completes. Also, if you have a different question, I would suggest opening a new thread.
– Somu Sinhhaa, Commented Apr 22, 2021 at 14:19
Could you elaborate on this line a bit more? It's not very clear: "I only get a single value in the list and not the two means which I intend to get". My guess (I'm not sure) is that you want to append to the list if either of your conditions matches; in that case, combine the two conditions with a logical OR rather than using if/elif.
– Somu Sinhhaa, Commented Apr 22, 2021 at 14:30
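If the goal is a single list of rows that match either condition (the reading suggested in the comment above), the two tests can be joined with or inside one if in a loop, or with | on boolean masks in pandas. A sketch assuming only the PassengerId, Age and Fare columns copied from the question; df, selected, mask and selected_vec are illustrative names:

```python
import numpy as np
import pandas as pd

# PassengerId, Age and Fare columns copied from the question
data = {
    'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
    'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
    'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059],
}
df = pd.DataFrame(data)

# Loop version: one if, with the two conditions joined by `or`
selected = []
for _, row in df.iterrows():
    if (row['Age'] > 0.2 and row['Fare'] < 0.1) or (row['Age'] > 0.2 and row['Fare'] > 0.1):
        selected.append(row['PassengerId'])

# Vectorized version: the same rule with element-wise & and |
mask = ((df['Age'] > 0.2) & (df['Fare'] < 0.1)) | ((df['Age'] > 0.2) & (df['Fare'] > 0.1))
selected_vec = df.loc[mask, 'PassengerId'].tolist()

print(len(selected), len(selected_vec))  # 7 7
```

Note that the NaN age never passes Age > 0.2, so that row is excluded in both versions.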

Davinder Singh · Accepted Answer · 2021-04-22 17:15:35Z

1

You need to make a few changes to your solution to resolve this issue. For a vectorized answer, check out my CODE section.

1.

The return statement return list_mean should be placed at the function level, after the for loop, not inside the if block.

Change:

. . .
if (row['Age'] > self.age) & (row['Fare'] < self.fare):
    list_mean.append(row['PassengerId'])
    return list_mean
. . .

To:

. . .
list_mean = []
for i, row in dicts.iterrows():
    if (row['Age'] > self.age) & (row['Fare'] < self.fare):
        list_mean.append(row['PassengerId'])
return list_mean
. . .
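To make point 1 concrete, here is a minimal sketch in plain Python using a few (PassengerId, Age, Fare) rows copied from the question's data and the same 0.2 / 0.1 thresholds; the function names are illustrative. A return inside the if block leaves the function at the first matching row, while a return after the loop runs only once every row has been checked:

```python
def collect_returning_early(rows):
    result = []
    for passenger_id, age, fare in rows:
        if age > 0.2 and fare < 0.1:
            result.append(passenger_id)
            return result  # exits the function at the first match
    return result


def collect_all(rows):
    result = []
    for passenger_id, age, fare in rows:
        if age > 0.2 and fare < 0.1:
            result.append(passenger_id)
    return result  # reached only after the loop has visited every row


# (PassengerId, Age, Fare) tuples taken from the question's data
rows = [(0.0, 0.271, 0.014), (0.001, 0.472, 0.139), (0.002, 0.321, 0.015), (0.004, 0.435, 0.016)]
print(collect_returning_early(rows))  # [0.0]
print(collect_all(rows))              # [0.0, 0.002, 0.004]
```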

CODE (vectorized solution): there is no need to define an explicit class to perform this action.

import numpy as np
import pandas as pd

dict_ = {
    'PassengerId':
    [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
    'Survived': [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    'Pclass': [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
    'Age':
    [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
    'SibSp': [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
    'Parch': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
    'Fare':
    [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]
}

dicts = pd.DataFrame(dict_)

# Boolean masks select the PassengerId values of each Age/Fare group
l1 = dicts['PassengerId'][np.logical_and(dicts['Age'] > 0.2, dicts['Fare'] < 0.1)]
l2 = dicts['PassengerId'][np.logical_and(dicts['Age'] > 0.2, dicts['Fare'] > 0.1)]

# Mean of each group: sum of the selected IDs divided by their count
print( (sum(list(l1))/len(l1), sum(list(l2))/len(l2)) )

OUTPUT :

(0.00375, 0.0036666666666666666)
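A related vectorized option: label every row with its group once using np.select, then let groupby compute the mean of every group in a single call. This is a different technique from the two separate masks above, offered as a sketch; the labels 'fare<0.1', 'fare>0.1' and 'other' are illustrative names, not anything from the question:

```python
import numpy as np
import pandas as pd

# PassengerId, Age and Fare columns copied from the question
data = {
    'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
    'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
    'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059],
}
df = pd.DataFrame(data)

age_ok = df['Age'] > 0.2  # NaN ages compare as False

# Label each row with its group; rows matching neither rule get 'other'
group = np.select(
    [age_ok & (df['Fare'] < 0.1), age_ok & (df['Fare'] > 0.1)],
    ['fare<0.1', 'fare>0.1'],
    default='other',
)

# One groupby call computes the mean PassengerId of every group
group_means = df.groupby(group)['PassengerId'].mean()
list_mean = group_means[['fare<0.1', 'fare>0.1']].tolist()
print(list_mean)
```

The two values should match the OUTPUT above up to floating-point rounding.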

edited Apr 22, 2021 at 17:15

answered Apr 22, 2021 at 11:54

Davinder Singh



3 Comments

deep learning Engineer Over a year ago

I edited the question. Can you please help me find the most effective way to solve it?

Davinder Singh Over a year ago

@deeplearningEngineer Please check the solution and let me know of any issues.

deep learning Engineer Over a year ago

Thanks for the reply, it looks alright. However, I was hoping there was a way to do it using a loop. The code is part of a larger program, and it would be more efficient to loop than to have chunky code. @Exploore X
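One way to keep a loop without iterating row by row is to loop over the conditions instead: each pass computes one group's mean with a vectorized mask and appends it to the list, so adding a group means adding one dictionary entry. A sketch; conditions and its keys are illustrative names:

```python
import numpy as np
import pandas as pd

# PassengerId, Age and Fare columns copied from the question
data = {
    'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
    'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
    'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059],
}
df = pd.DataFrame(data)

# Hypothetical group names; each value is a boolean mask over the rows
conditions = {
    'age>0.2, fare<0.1': (df['Age'] > 0.2) & (df['Fare'] < 0.1),
    'age>0.2, fare>0.1': (df['Age'] > 0.2) & (df['Fare'] > 0.1),
}

list_mean = []
for mask in conditions.values():
    # one mean per condition, appended inside the loop
    list_mean.append(float(df.loc[mask, 'PassengerId'].mean()))

print(list_mean)
```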

Somu Sinhhaa · Accepted Answer · 2021-04-22 15:14:57Z

import pandas as pd
import numpy as np

data = {'PassengerId' : [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
        'Survived' : [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
        'Pclass' : [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5],
        'Age' : [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
        'SibSp' : [0.125, 0.125, 0.0, 0.125, 0.0, 0.0, 0.0, 0.375, 0.0, 0.125],
        'Parch' : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.167, 0.333, 0.0],
        'Fare' : [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059]}

df = pd.DataFrame(data)

def calculate_mean():
    # l1: Age > 0.2 and Fare < 0.1; l2: Age > 0.2 and Fare > 0.1
    l1, l2 = [], []
    for i, row in df.iterrows():
        if row['Age'] > 0.2 and row['Fare'] < 0.1:
            l1.append(row['PassengerId'])
        elif row['Age'] > 0.2 and row['Fare'] > 0.1:
            l2.append(row['PassengerId'])
    # return only after the loop has visited every row
    return float(np.mean(l1)), float(np.mean(l2))


print(calculate_mean()) # (0.00375, 0.0036666666666666666)
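When a row loop is really needed, DataFrame.itertuples() is generally faster than iterrows() because it does not build a Series for every row, and it preserves each column's dtype. The same loop with itertuples, taking the DataFrame as a parameter instead of reading a global (a sketch; frame, low_fare and high_fare are illustrative names):

```python
import numpy as np
import pandas as pd

# PassengerId, Age and Fare columns copied from the question
data = {
    'PassengerId': [0.0, 0.001, 0.002, 0.003, 0.004, 0.006, 0.007, 0.008, 0.009, 0.01],
    'Age': [0.271, 0.472, 0.321, 0.435, 0.435, np.nan, 0.673, 0.02, 0.334, 0.171],
    'Fare': [0.014, 0.139, 0.015, 0.104, 0.016, 0.017, 0.101, 0.041, 0.022, 0.059],
}
df = pd.DataFrame(data)


def calculate_mean(frame):
    low_fare, high_fare = [], []
    # itertuples yields namedtuples; columns are read as attributes
    for row in frame.itertuples(index=False):
        if row.Age > 0.2 and row.Fare < 0.1:
            low_fare.append(row.PassengerId)
        elif row.Age > 0.2 and row.Fare > 0.1:
            high_fare.append(row.PassengerId)
    return float(np.mean(low_fare)), float(np.mean(high_fare))


print(calculate_mean(df))
```

Attribute access works here because PassengerId, Age and Fare are valid Python identifiers.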
