I am currently performing a reverse-geocoding operation as follows:
import json
import time
from shapely.geometry import shape, Point

# file also kept at https://raw.githubusercontent.com/Thevesh/Display/master/districts.json
with open('districts.json') as f:
    districts = json.load(f)

def reverse_geocode(lon, lat):
    point = Point(lon, lat)  # note: lon/lat order, matching the GeoJSON
    # linear scan: test the point against every district polygon
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            props = feature['properties']
            return [props['ADM1_EN'], props['ADM2_EN']]
    return ['', '']

start_time = time.time()
for i in range(1000):
    test = reverse_geocode(103, 3)
print(f'----- Code ran in {time.time() - start_time:.3f} seconds -----')
This takes about 13 seconds to reverse geocode 1000 points, which is fine.
However, I'm going to need to reverse geocode 10 million coordinate pairs for this task, which means it would take about 130,000 seconds (roughly 1.5 days) assuming linear scaling. Not fine.
The obvious inefficiency in this algorithm is that it iterates through the entire set of polygons for every single point it classifies, which is a giant waste of time.
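One direction I've been looking at is building a spatial index over the polygons up front, so that each lookup only tests the handful of districts whose bounding boxes hit the point, instead of all of them. Here is a minimal sketch of the idea, assuming Shapely 2.x (where STRtree.query returns integer indices into the geometry list); I haven't verified it:

from shapely import STRtree
from shapely.geometry import shape, Point

# build the index once, up front
geoms = [shape(f['geometry']) for f in districts['features']]
props = [f['properties'] for f in districts['features']]
tree = STRtree(geoms)

def reverse_geocode_indexed(lon, lat):
    point = Point(lon, lat)
    # predicate='within' narrows the bounding-box candidates to
    # polygons that actually contain the point
    for i in tree.query(point, predicate='within'):
        return [props[i]['ADM1_EN'], props[i]['ADM2_EN']]
    return ['', '']

If that's the right approach, the per-point cost should drop from a full scan of every polygon to a tree lookup plus a few exact containment tests.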
How can I improve this code? To get through 10 million pairs in a time acceptable for the task, I need to process roughly 1,000 pairs per second.
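I'd also be open to a fully vectorized approach if that's the better route. For instance, something along these lines with GeoPandas (a sketch, assuming a recent GeoPandas where sjoin takes predicate=; the DataFrame df with 'lon'/'lat' columns is a hypothetical stand-in for my 10 million points):

import geopandas as gpd
import pandas as pd

# hypothetical stand-in for the real 10M-point input
df = pd.DataFrame({'lon': [103.0], 'lat': [3.0]})

districts_gdf = gpd.read_file('districts.json')
points_gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['lon'], df['lat']),
    crs=districts_gdf.crs,
)

# left join: points outside every district keep NaN in the district columns
joined = gpd.sjoin(
    points_gdf,
    districts_gdf[['ADM1_EN', 'ADM2_EN', 'geometry']],
    how='left',
    predicate='within',
)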