Construct a DataFrame from pandas apply

Question

I have a function that computes latitude and longitude information: the latitude and longitude of a north-west (NW) point and of a south-east (SE) point. I want to create columns for these four variables in a data frame.

Here is my code:

import geocoder
import pandas as pd
import geolib
from geolib import geohash

df = pd.read_csv('New_DP2.csv')

key = [redacted]


fields = ['NWLat', 'NWLong', 'SELat', 'SELong']
def getData(address, key):
    g = geocoder.mapquest(address, key=key)
    lat = g.lat
    lng = g.lng
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)
    nwlat = NW.lat
    nwlon = NW.lon
    selat = SE.lat
    selon = SE.lon

I want to add four columns to the data frame, one each for 'nwlat', 'nwlon', 'selat' and 'selon'.

Normally I would simply return nwlat and then apply the function with a lambda:

df['NWLat'] = df.apply(lambda row: getData(row['a'], key), axis=1)

Then I would do this for each case of the other 3 variables I want returned. But then I am running this a total of 4 times instead of just once.

Ok, so normally you'd use df.apply. What problem are you encountering in this case?
– Peter Leimbigler, Commented Feb 28, 2019 at 1:07
@PeterLeimbigler Please see edit, apologies if I was not clear.
– Wolfy, Commented Feb 28, 2019 at 2:51
Are you trying to construct bounding boxes with this data? You might want to consider using geopandas or a spatial database (just a suggestion, unrelated to the question here).
– cs95, Commented May 14, 2019 at 20:37

cs95 · Accepted Answer · 2019-05-14 23:33:09Z

You were quite close. All you needed to do was to figure out how to return the result appropriately. Your function will need to look like this:

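def getData(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    # Kept as in the question; if the south-east neighbour is intended,
    # this should presumably read hashes.se.
    SE = geohash.decode(hashes.ne)

    # Label the four values with the names in `fields` so that apply
    # can turn them into columns.
    return pd.Series(dict(zip(fields, [NW.lat, NW.lon, SE.lat, SE.lon])))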

You can then use Series.apply:

df = pd.DataFrame({'address': ['Los Angeles, CA']})  # for example
df['address'].apply(getData, key=key)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375

This works by having getData return a Series object (with the fields as the index). apply will then automatically construct a DataFrame and return the result.
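
The behaviour described above can be checked without a geocoding key. The following is only a minimal sketch: `fake_bounds` and its `pad` argument are hypothetical stand-ins for `getData` and `key`, and the numbers are derived from the string length rather than from any real geocoder.

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

def fake_bounds(address, pad):
    # Hypothetical stand-in for getData: no network call, the values
    # depend only on the length of the address string.
    n = float(len(address))
    values = [n + pad, -n - pad, n - pad, -n + pad]
    # Returning a Series indexed by `fields` is what makes apply
    # produce one column per field.
    return pd.Series(dict(zip(fields, values)))

df = pd.DataFrame({'address': ['Los Angeles, CA', 'Boston, MA']})

# Extra keyword arguments to Series.apply are forwarded to the function.
out = df['address'].apply(fake_bounds, pad=0.5)

print(type(out).__name__)  # DataFrame
print(list(out.columns))   # ['NWLat', 'NWLong', 'SELat', 'SELong']
```

The function runs once per row, and all four values come back from that single call.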

Side note: To concatenate these columns to the existing df, call pd.concat:

res = pd.concat([df, df['address'].apply(getData, key=key)], axis=1)
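
Because `pd.concat(..., axis=1)` aligns on the index, the new columns line up with the original rows even when the index is not 0..n-1. A small sketch under the same kind of assumption (`fake_bounds` is a hypothetical placeholder for `getData`); `df.join` is shown as an equivalent alternative here, not as something the original answer used:

```python
import pandas as pd

def fake_bounds(address):
    # Hypothetical placeholder for getData: values depend only on the
    # string length, so the example runs offline.
    n = float(len(address))
    return pd.Series({'NWLat': n, 'NWLong': -n})

# A non-default index, to show that alignment is by label.
df = pd.DataFrame({'address': ['Los Angeles, CA', 'Boston, MA']},
                  index=[10, 20])

extra = df['address'].apply(fake_bounds)  # keeps the index of df

res = pd.concat([df, extra], axis=1)      # column-wise, aligned on index
res_join = df.join(extra)                 # same result via join

print(res)
```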

Another option is to use a list comprehension, provided there are no NaNs in the column you iterate over. This is a performance (and memory) micro-optimization.

def getData2(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)

    return [NW.lat, NW.lon, SE.lat, SE.lon]

pd.DataFrame([getData2(a, key) for a in df['address']], columns=fields)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375
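
One thing to watch with the list-comprehension route: a frame built from a plain list gets a fresh 0..n-1 index, so if `df` has a different index it is safer to pass `index=df.index` before joining. The sketch below assumes a hypothetical offline stand-in, `fake_bounds_list`, in place of `getData2`:

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

def fake_bounds_list(address, pad):
    # Hypothetical stand-in for getData2: returns a plain list of four
    # numbers derived from the string length (no geocoding involved).
    n = float(len(address))
    return [n + pad, -n - pad, n - pad, -n + pad]

df = pd.DataFrame({'address': ['Los Angeles, CA', 'Boston, MA']},
                  index=[10, 20])

coords = pd.DataFrame(
    [fake_bounds_list(a, 0.5) for a in df['address']],
    columns=fields,
    index=df.index,  # reuse the original index so the rows line up
)

res = df.join(coords)
print(res)
```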

More information on list comprehensions and their benefits is detailed in my post: For loops with pandas - When should I care?
