
I am currently performing a reverse-geocoding operation as follows:

import json
import time

from shapely.geometry import shape, Point

# District boundaries as GeoJSON; the file is also kept at
# https://raw.githubusercontent.com/Thevesh/Display/master/districts.json
with open('districts.json') as f:
    districts = json.load(f)

def reverse_geocode(lon, lat):
    point = Point(lon, lat)  # shapely uses (lon, lat) order
    # Linear scan: test the point against every district polygon
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [feature['properties']['ADM1_EN'], feature['properties']['ADM2_EN']]
    return ['', '']

start_time = time.time()
for i in range(1000):
    test = reverse_geocode(103, 3)
print('----- Code ran in ' + "{:.3f}".format(time.time() - start_time) + ' seconds -----')

This takes about 13 seconds to reverse geocode 1000 points, which is fine.

However, I'm going to need to reverse-geocode 10 million coordinate pairs for a task, which means it would take about 130,000 seconds (roughly 1.5 days) assuming linear scaling. Not fine.

The obvious inefficiency in this algorithm is that it iterates through the entire set of polygons each and every time it classifies a point, which is a giant waste of time.

How can I improve this code? To get through 10 million pairs in an acceptable time, I need to handle roughly 1,000 pairs per second.
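For reference, one direction I'm considering (a rough, untested sketch, not a drop-in replacement) is to load the same GeoJSON into GeoPandas and let a spatial join do the candidate filtering through its R-tree index. ADM1_EN/ADM2_EN are the same property names as above, and the predicate= keyword assumes GeoPandas 0.10 or newer (older releases call it op=):

import geopandas as gpd

# Load the districts once; sjoin builds an R-tree spatial index, so each
# point is only tested against the few polygons whose bounding boxes
# contain it, rather than the whole feature list.
districts_gdf = gpd.read_file('districts.json')

def reverse_geocode_many(lons, lats):
    points = gpd.GeoDataFrame(
        geometry=gpd.points_from_xy(lons, lats),
        crs=districts_gdf.crs,
    )
    joined = gpd.sjoin(
        points,
        districts_gdf[['ADM1_EN', 'ADM2_EN', 'geometry']],
        how='left',
        predicate='within',
    )
    return joined[['ADM1_EN', 'ADM2_EN']].fillna('')

# e.g. reverse_geocode_many([103], [3]) should give the same district as above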

  • The first obvious solution is to run it with multiprocessing; did you try that? (Commented Mar 3, 2021 at 16:54)

1 Answer


I arrived at this algorithm using parallelism.

If possible, let me know whether it was useful for your purpose. Keep in mind that it is an amateur algorithm and will need adjustments.

import concurrent.futures
import json
import time

from shapely.geometry import shape, Point

with open('districts.json') as f:
    districts = json.load(f)

def reverse_geocode(lon: float, lat: float) -> list:
    point = Point(lon, lat)  # shapely uses (lon, lat) order
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [feature['properties']['ADM1_EN'], feature['properties']['ADM2_EN']]
    return ['', '']

if __name__ == '__main__':
    time_start = time.time()

    # Spread the 1000 lookups across worker processes
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for _ in range(1000):
            executor.submit(reverse_geocode, 103, 3)

    time_end = time.time()
    print(f'\nfinished in {round(time_end - time_start, 2)} seconds')
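One note on the sketch above: submit() fires off the work, but the results are never read back. If the district names are actually needed, a variant using executor.map (same reverse_geocode as above; coords is a hypothetical list of (lon, lat) pairs) collects them in input order:

if __name__ == '__main__':
    coords = [(103, 3)] * 1000  # hypothetical list of (lon, lat) pairs

    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map() returns results in input order; chunksize batches the many
        # small tasks so inter-process overhead doesn't dominate.
        results = list(executor.map(
            reverse_geocode,
            [lon for lon, _ in coords],
            [lat for _, lat in coords],
            chunksize=100,
        ))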

1 Comment

This cuts the time in half, which is awesome, but I think the fundamental problem is still that we're iterating through the whole list. I'm going to try an approach that sorts both the list of points to search and the features to search through.
