I have a function that computes four latitude/longitude values for an address. I want to create a column for each of these four values in a data frame.

Here is my code:

import geocoder
import pandas as pd
import geolib
from geolib import geohash

df = pd.read_csv('New_DP2.csv')

key = [redacted]


fields = ['NWLat', 'NWLong', 'SELat', 'SELong']
def getData(address, key):
    g = geocoder.mapquest(address, key=key)
    lat = g.lat
    lng = g.lng
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)
    nwlat = NW.lat
    nwlon = NW.lon
    selat = SE.lat
    selon = SE.lon

I want to add four columns to the data frame, one each for 'nwlat', 'nwlon', 'selat', and 'selon'.

Normally I would simply return nwlat and then apply the function with a lambda:

df['NWLat'] = df.apply(lambda row: getData(row['a'], key), axis=1)

Then I would repeat this for each of the other three variables. But that runs the function four times per row instead of just once.

  • Ok, so normally you'd use df.apply. What problem are you encountering in this case? Commented Feb 28, 2019 at 1:07
  • @PeterLeimbigler Please see edit, apologies if I was not clear. Commented Feb 28, 2019 at 2:51
  • Are you trying to construct bounding boxes with this data? You might want to consider using geopandas or a spatial database (just a suggestion, unrelated to the question here). Commented May 14, 2019 at 20:37

1 Answer

You were quite close. All you needed to do was to figure out how to return the result appropriately. Your function will need to look like this:

def getData(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)

    return pd.Series(dict(zip(fields, [NW.lat, NW.lon, SE.lat, SE.lon])))

You can then use Series.apply:

df = pd.DataFrame({'address': ['Los Angeles, CA']})  # for example
df['address'].apply(getData, key=key)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375

This works because getData returns a Series whose index is the list of field names. When the applied function returns a Series, apply assembles the per-row results into a DataFrame, using each Series index as the column labels.
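To see the mechanism without an API key or network access, here is a minimal sketch that swaps the geocoder and geohash calls for a made-up lookup table. `FAKE_COORDS`, `fake_bounds`, and `pad` are hypothetical names used only for illustration, and the numbers are not real coordinates.

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

# Hypothetical lookup table standing in for the geocoder/geohash calls.
FAKE_COORDS = {
    'Los Angeles, CA': (34.0, -118.0),
    'Denver, CO': (39.5, -105.0),
}

def fake_bounds(address, pad):
    """Return four made-up corner values for an address as a labelled Series."""
    lat, lng = FAKE_COORDS[address]
    values = [lat + pad, lng - pad, lat - pad, lng + pad]
    return pd.Series(dict(zip(fields, values)))

df = pd.DataFrame({'address': list(FAKE_COORDS)})

# Extra keyword arguments given to Series.apply are forwarded to the function.
expanded = df['address'].apply(fake_bounds, pad=0.5)
print(expanded)
```

The function runs once per row, and the four labels in each returned Series become the four columns of `expanded`.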

Side note: To concatenate these columns to the existing df, call pd.concat:

res = pd.concat([df, df['address'].apply(getData, key=key)], axis=1)
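Note that pd.concat with axis=1 lines rows up by index label, not by position. A small sketch with made-up values (the `coords` frame is a hypothetical stand-in for the output of apply) shows this, along with `DataFrame.join` as an equivalent spelling:

```python
import pandas as pd

# A frame with a non-default index, to show that alignment is by label.
df = pd.DataFrame({'address': ['A St', 'B Ave']}, index=[10, 20])

# Hypothetical values standing in for df['address'].apply(getData, key=key).
coords = pd.DataFrame({'NWLat': [1.0, 2.0], 'NWLong': [3.0, 4.0]}, index=[10, 20])

# Both calls attach the new columns to the matching index labels.
res = pd.concat([df, coords], axis=1)
joined = df.join(coords)
print(res)
```

Because the result of apply keeps the index of df['address'], the new columns attach to the right rows even if df has been filtered or re-sorted beforehand.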

Another option, if the address column contains no NaNs, is a list comprehension. This is a micro-optimization for both speed and memory.

def getData2(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)

    return [NW.lat, NW.lon, SE.lat, SE.lon]

pd.DataFrame([getData2(a, key) for a in df['address']], columns=fields)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375
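One thing to watch with this approach: a DataFrame built from a list of lists gets a fresh 0..n-1 index unless you pass one. The sketch below uses a hypothetical `fake_bounds_list` function, which derives made-up numbers from the string length instead of geocoding, and passes `index=df.index` so the result still lines up with the original rows.

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

def fake_bounds_list(address, pad):
    """Hypothetical stand-in for getData2: returns four made-up values as a list."""
    base = float(len(address))
    return [base + pad, -base - pad, base - pad, -base + pad]

df = pd.DataFrame({'address': ['Los Angeles, CA', 'Denver, CO']}, index=['x', 'y'])

# One function call per row; index=df.index keeps the original row labels.
out = pd.DataFrame(
    [fake_bounds_list(a, 0.5) for a in df['address']],
    columns=fields,
    index=df.index,
)
print(out)
```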

More information on list comprehensions and their benefits is available in my post: For loops with pandas - When should I care?
