How to fill columns based on other column values?

Question

I have a df in which I want to look up each postal code to find the matching address and city.

Postalcodestring
1181
1055
8547

I'm using nomi.query_postal_code(n) for this, where n is a postal code string. For example, querying 1181 returns:

postal_code                1181
country_code                 NL
place_name           Amstelveen
state_name        Noord-Holland
state_code                    7
county_name          Amstelveen
county_code                 362
community_name              NaN
community_code              NaN
latitude                  52.31
longitude                4.8631
accuracy                      6
Name: 0, dtype: object
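The object printed above is a pandas Series whose index holds the field names, so fields can be selected by label instead of by position such as [2]. A minimal sketch, using a Series rebuilt by hand from some of the values shown above (no pgeocode call is made), contrasts the two forms:

```python
import pandas as pd

# Hand-built copy of part of the result shown above; in real code this
# Series would come from nomi.query_postal_code('1181').
result = pd.Series({
    "postal_code": "1181",
    "country_code": "NL",
    "place_name": "Amstelveen",
    "state_name": "Noord-Holland",
})

city = result["place_name"]         # by label: independent of field order
country = result["country_code"]
city_by_position = result.iloc[2]   # explicit positional access, if ever needed

print(city, country)  # Amstelveen NL
```

Plain result[2] still works on a Series with a string index, but recent pandas versions warn that integer keys will eventually be treated as labels, so result["place_name"] or result.iloc[2] is the safer spelling.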

I want to fill the City1 and Country1 columns with the city and country for each postal code. When a postal code is N/A, I want City1 and Country1 in that row to be N/A too.

I have tried the following code:

#NL
for i, row in df.iterrows():
    df.loc[i, 'City1'] = nomi.query_postal_code(df['Postalcodestring'][i])[2]    
#DE
for i, row in df.iterrows():
    df.loc[i,'City2'] = nomi2.query_postal_code(df['Postalcodestring'][i])[2]

#NLCountry
for i, row in df.iterrows():
    df.loc[i,['Country1']] = nomi.query_postal_code(df['Postalcodestring'][i])[1]    
#DECountry
for i, row in df.iterrows():
    df.loc[i,'Country2'] = nomi2.query_postal_code(df['Postalcodestring'][i])[1]

However, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-80-d0d96a6ea61b> in <module>
     67 #NL
     68 for i, row in df.iterrows():
---> 69     df.loc[i, 'City1'] = nomi.query_postal_code(df['Postalcodestring'][i])[2]
     70 #DE
     71 for i, row in df.iterrows():
ValueError: DataFrame constructor not properly called!
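One likely trigger for this error is a missing value (NaN) in Postalcodestring: pandas raises exactly this message when pd.DataFrame receives a bare scalar instead of a list. The sketch below demonstrates only that pandas behavior; that query_postal_code hands a non-string input such as NaN straight to pd.DataFrame is an assumption about pgeocode's internals.

```python
import numpy as np
import pandas as pd

# A list of codes builds a one-column frame without complaint.
ok = pd.DataFrame(["1181"], columns=["postal_code"])

# A bare scalar such as NaN (what an empty cell holds) comes with no index,
# so the constructor rejects it.
try:
    pd.DataFrame(np.nan, columns=["postal_code"])
    message = None
except ValueError as exc:
    message = str(exc)

print(message)  # DataFrame constructor not properly called!
```

Checking pd.isna(code) before calling query_postal_code avoids this path and also produces the N/A rows the question asks for.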

Desired output:

Postalcodestring   City1 
1181               Amstelveen
1055               Amsterdam
8547               NaN

Please help!

sunnytown · Accepted Answer · 2021-03-22 13:38:53Z

You should use the Series.apply method (df[COL].apply) instead:

import pandas as pd
import pgeocode

df = pd.DataFrame({'Postalcodestring': ['1181', '1055', '8547']})
nomi = pgeocode.Nominatim('nl')

df['City1'] = df['Postalcodestring'].apply(lambda code: nomi.query_postal_code(code)['place_name'])

There is really no need to loop over the rows individually when you can use df[COL].apply to apply a function to every value of a column. As you can see in my code, you pass the function as the argument to the apply method. Here I use a lambda to define the function in the same expression, but you could just as well define it explicitly outside:

def get_city(code):
    return nomi.query_postal_code(code)['place_name']

df['City1'] = df['Postalcodestring'].apply(get_city)
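The question also asks for Country1 and for N/A wherever the postal code is missing. A minimal sketch extending the same apply approach: get_city_country and the FakeNominatim stub are hypothetical names, the stub standing in for pgeocode.Nominatim('nl') so the example runs offline, and the field names place_name and country_code are taken from the output shown in the question.

```python
import numpy as np
import pandas as pd


class FakeNominatim:
    """Hypothetical offline stand-in for pgeocode.Nominatim('nl').

    It only mimics the two fields used here; use the real object
    (nomi = pgeocode.Nominatim('nl')) in actual code.
    """

    _DATA = {"1181": ("Amstelveen", "NL"), "1055": ("Amsterdam", "NL")}

    def query_postal_code(self, code):
        # Assumption: unknown codes come back with NaN fields.
        city, country = self._DATA.get(code, (np.nan, np.nan))
        return pd.Series({"country_code": country, "place_name": city})


nomi = FakeNominatim()


def get_city_country(code):
    """Return (city, country) for one postal code, or NaNs if it is missing."""
    if pd.isna(code):
        return np.nan, np.nan
    result = nomi.query_postal_code(code)  # one lookup per row
    return result["place_name"], result["country_code"]


df = pd.DataFrame({"Postalcodestring": ["1181", "1055", np.nan]})

# apply yields a Series of (city, country) tuples; unpack it into two
# columns that share df's index, then attach them.
pairs = df["Postalcodestring"].apply(get_city_country)
df = df.join(pd.DataFrame(pairs.tolist(), index=df.index, columns=["City1", "Country1"]))

print(df)
```

Returning a tuple and unpacking it once means each code is looked up a single time, rather than once per output column as in the separate loops from the question. The same function with a second Nominatim object would cover the City2/Country2 columns.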

As a side note: don't be confused because my code doesn't contain an explicit loop. Of course a loop is still needed to perform such an operation on multiple rows; apply just does the looping internally, so you don't need to write it yourself.
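To make that concrete, here is a minimal sketch comparing apply with an explicit list comprehension over the same column. The get_city function here is a hypothetical dictionary lookup standing in for the pgeocode call. Both call the function once per value and produce the same result, so apply with a Python function is a convenience, not a vectorized speed-up.

```python
import pandas as pd

# Hypothetical lookup table standing in for
# nomi.query_postal_code(code)['place_name'].
CITIES = {"1181": "Amstelveen", "1055": "Amsterdam"}


def get_city(code):
    return CITIES[code]


codes = pd.Series(["1181", "1055"])

via_apply = codes.apply(get_city)                    # pandas runs the loop
via_loop = pd.Series([get_city(c) for c in codes])   # the same loop, written out

print(via_apply.tolist() == via_loop.tolist())  # True
```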
