I have been trying to code this for a while. Here is a sample dataframe:

capacity = 500
s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])

df = pd.DataFrame(np.c_[s,p,d],columns = ['School Name','Population', 'Distance'])

What I want to do is write a loop that keeps subtracting each school's 'Population' from 'capacity' for as long as the capacity is not exceeded. The schools should be checked in order of 'Distance'.

Example: since 'School 1' is the nearest, it subtracts 132 from 500, which leaves 368. 'School 2' is the next nearest, but its population exceeds 368 (458 > 368), so the loop stops here and does not go on to check the next nearest school, 'School 3'.

After this, it should assign the school name to another column.

The end result would be:

s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])
sn = pd.Series(['School 1', 0, 0 ,0 ,0])
df2 = pd.DataFrame(np.c_[s,p,d,sn],columns = ['School Name','Population', 'Distance','Included'])

I have been working on this since yesterday and still have no clue how to do it except manually. I am still a beginner Python user.

Thanks for the help!

  • "Python loop, how to loop this?" isn't a very good title, though. Commented Nov 6, 2018 at 3:33
  • @U9-Forward, I edited the title. That was supposed to be a placeholder title, but I forgot to change it. Thanks. Commented Nov 6, 2018 at 3:38
  • I think you meant (458 > 368) in your example. There is a typo (368 is written as 468). Commented Nov 6, 2018 at 4:22

1 Answer

Based on your question, I am assuming that you want just one school name right before the capacity is exceeded. That could be achieved like this:

import pandas as pd
import numpy as np

capacity = 500

s = pd.Series(['School 1','School 2', 'School 3','School 4', 'School 5'])
p = pd.Series(['132', '458', '333', '300', '258'])
d = pd.Series(['1', '2', '3', '4', '5'])
df = pd.DataFrame(np.c_[s,p,d],columns = ['School Name','Population', 'Distance'])

# converting population to integer values
p = p.astype('int')

# placeholder to store school name
school_name = None

# rows are visited in their existing order, which is already the Distance order here
for idx, val in enumerate(p):
    # keep assigning school name until capacity is exceeded
    capacity -= val
    if capacity < 0:
        break
    school_name = s[idx]

# add included column
df['included'] = np.where(df['School Name'] == school_name, df['School Name'], 0)

You can then print df to confirm that it works:

>>> df
  School Name Population Distance    included
0    School 1        132        1    School 1
1    School 2        458        2           0
2    School 3        333        3           0
3    School 4        300        4           0
4    School 5        258        5           0
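
One caveat: the loop above visits the rows in their existing order, which matches the Distance order only because the sample data is already sorted. Here is a minimal sketch that sorts by Distance first, assuming both columns hold integer-like strings as in the sample. The helper name `last_school_within` and the shuffled frame are made up for illustration:

```python
import pandas as pd

def last_school_within(df, capacity):
    """Return the last school (nearest first) that still fits, or None."""
    # Population and Distance are stored as strings in the sample, so cast first
    ordered = df.assign(
        Population=df['Population'].astype(int),
        Distance=df['Distance'].astype(int),
    ).sort_values('Distance')

    remaining = capacity  # local copy, the caller's value is left untouched
    chosen = None
    for name, pop in zip(ordered['School Name'], ordered['Population']):
        if pop > remaining:
            break  # stop at the first school that does not fit
        remaining -= pop
        chosen = name
    return chosen

# a subset of the sample data with the rows deliberately out of order
shuffled = pd.DataFrame({
    'School Name': ['School 3', 'School 1', 'School 2'],
    'Population': ['333', '132', '458'],
    'Distance': ['3', '1', '2'],
})
result = last_school_within(shuffled, 500)
```

Keeping the running value in a local variable also means the original capacity is still 500 afterwards, which matters if the check is run more than once.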

However, if you want to keep all the schools up to the point where the capacity is exceeded, it is very simple to modify the above program: just replace the placeholder and the loop like this:

capacity = 500       # start again from the full capacity (the first loop modified it)
school_names = []    # placeholder will be a list now
for idx, val in enumerate(p):
    capacity -= val
    if capacity < 0:
        break
    school_names.append(s[idx])    # keep adding schools that do not exceed capacity to the list

# Instead of equality, check if school name is in your list
df['included'] = np.where(df['School Name'].isin(school_names), df['School Name'], 0)

Now, if capacity = 500 and you change the second population so that p = pd.Series(['132', '128', '333', '300', '258']), then both School 1 and School 2 would be included.
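
As an alternative to the explicit loop, the same check can be sketched with a running total. This is not the approach used above, just a vectorized variant that assumes the populations are non-negative numbers. It uses the modified populations from the last example, already converted to integers:

```python
import numpy as np
import pandas as pd

capacity = 500
df = pd.DataFrame({
    'School Name': ['School 1', 'School 2', 'School 3', 'School 4', 'School 5'],
    'Population': [132, 128, 333, 300, 258],
    'Distance': [1, 2, 3, 4, 5],
})

# visit schools nearest first, then take the running total of population
ordered = df.sort_values('Distance')
running_total = ordered['Population'].cumsum()

# populations are non-negative, so the running total never decreases and the
# mask is True only for a leading run of rows (the same effect as the break)
fits = running_total <= capacity

# keep the school name where it still fits, 0 elsewhere
ordered['included'] = np.where(fits, ordered['School Name'], 0)
included = ordered['included'].tolist()
```

Because `capacity` is only compared and never modified, it can be reused afterwards without resetting it.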
