MySQL query optimization with complex index

Question

I have a database used for simple reverse geocoding. The database rely on a table containing latitude, longitude and place name. Everytime a couple latitude,longitude is not present or, better, everytime the searched latitude,longitude differs too much from an existing latitude, longitude, I add a new row using GoogleMaps reverse geocoding service. Below the code to generate the address table:

CREATE TABLE `data_addresses` (
    `ID` int(11) NOT NULL COMMENT 'Primary Key',
    `LAT` int(11) NOT NULL COMMENT 'Latitude x 10000',
    `LNG` int(11) NOT NULL COMMENT 'Longitude x 10000',
    `ADDRESS` varchar(128) NOT NULL COMMENT 'Reverse Geocoded Street Address'
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 
ALTER TABLE `data_addresses`
    ADD PRIMARY KEY (`ID`),
    ADD UNIQUE KEY `IDX_ADDRESS_UNIQUE_LATLNG` (`LAT`,`LNG`),
    ADD KEY `IDX_ADDRESS_LAT` (`LAT`),
    ADD KEY `IDX_ADDRESS_LNG` (`LNG`);
ALTER TABLE `data_addresses`
    MODIFY `ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary Key';

As you can see the trick is to use place two indexes on Latitude and Longitude. As normally latitude and longitude are float we use their value multiplied by 10000, so each couple latitude/longitude is unique. This implies a resolution of about 50m that is satisfying for my needs.

Now the problem: everytime I need to know if a given latitude/longitude (MyLat,MyLon) is already present or not I execute the following query:

SELECT `id`, ROUND(SQRT(POW(ABS(`LAT`-ROUND(MyLat*10000)),2)+POW(ABS(`LNG`-ROUND(MyLon*10000)),2))) AS R FROM splc_smarttrk.`data_addresses` ORDER BY R ASC LIMIT 1

This query will return to me the closest point and will give me also R (the rating): smaller R means closest approximation, so let say that everytime I find an R that is above 10 I need to add a new row to address table. Address table at present contains about 615k rows.

The problem is that despite indexes that I have placed this query is too slow (takes about 2 seconds on a 2x Xeon server). Below the results of Explain:

I've tried to upload a snapshot image of the explain results as picture but the editor of stackoverflow is terrifying when you need to upload an image...don't ask me why is on top of the page... — Power Engineering
– Power Engineering, Commented Mar 21, 2018 at 13:36
You ORDER BY on a "calculated" column this is never been optimized with index there for MySQL needs to use quicksort sorting algorithm on a estimate on 615088 rows... "using filesort" in extra column off the explain output indicates that.. — Raymond Nijland
– Raymond Nijland, Commented Mar 21, 2018 at 13:51
@RaymondNijland, your're right this is the core of the problem. So the real question is How can I check the closest couple lat/lon among the vailable lat/lon in address table? I just need to find the closest one, no matter how do I check that. Is it possible to create a "calulated index" maybe based on lat and lon pair? — Power Engineering
– Power Engineering, Commented Mar 21, 2018 at 14:05

Raymond Nijland · Accepted Answer · 2018-03-21 14:15:03Z

2

Can't you optimize this by retriving a fixed dataset of nearby latitude(s) and longitude(s) and calculate the Rating (R) and pick the smallest Rating on this fixed dataset.

p.s not tested might contain errors in the sorting. but it might help you on your way.

SELECT 
   id 
 , ROUND(SQRT(POW(ABS(`LAT`-ROUND([LAT]*10000)),2)+POW(ABS(`LNG`- ROUND([LNG]*10000)),2))) AS R

FROM ( 

  SELECT 
   LAT 
  FROM  
   data_addresses
  WHERE 
   LAT <= [LAT]  
  ORDER BY
   LAT DESC
  LIMIT 100

  UNION ALL

  SELECT 
   LAT   
  FROM 
   data_addresses
  WHERE 
   LAT >= [LAT]
  ORDER BY
   LAT ASC
  LIMIT 100

  SELECT 
   LNG 
  FROM 
   data_addresses
  WHERE 
   LNG <= [LNG]
  ORDER BY
   LNG DESC
  LIMIT 100

  UNION ALL

  SELECT 
   LNG
  FROM 
   data_addresses
  WHERE 
   LNG >= [LNG]
  ORDER BY
   LNG ASC
  LIMIT 100
) 
 AS data_addresses_range
ORDER BY 
 R ASC
LIMIT 1

edited Mar 21, 2018 at 14:15

answered Mar 21, 2018 at 14:06

Raymond Nijland

11.6k2 gold badges26 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Power Engineering Over a year ago

your suggestion was right even if not 100% adequate, but you put me on the right way to solve the problem that's why I gave you a well deserved +1. The problem was that I really don't need the distance, just need to know if the distance is above certain level. So I modified the query as described below in my answer.

Rick James · Accepted Answer · 2018-03-23 02:01:35Z

1

Instead of computing the distance (or in addition to), provide a "bounding box". This will be much faster.

Still faster would be the complex code here: mysql.rjweb.org/doc.php/latlng

Once you have UNIQUE KEY IDX_ADDRESS_UNIQUE_LATLNG (LAT, LNG), there is no need for KEY IDX_ADDRESS_LAT (LAT)

*10000 can fit in MEDIUMINT. And it is good to about 16 meters or 52 feet.

answered Mar 23, 2018 at 2:01

Rick James

144k15 gold badges144 silver badges255 bronze badges

1 Comment

Power Engineering Over a year ago

congratulations! Your link at mysql.rjweb.org/doc.php/latlng is really focused on my issue! In fact my solution (developed before your suggestion) goes in the same direction implementing a "bounding box" but without partitioning. Really useful and very well done webpage! Many Thanks!

Rick James · Accepted Answer · 2018-03-26 17:37:14Z

0

Following the suggestion of Raymond Nijland I modified the query as follows:

SELECT  `id` AS ID,
ROUND(SQRT(POW(ABS(`LAT`-ROUND(NLat*10000)), 2) +
           POW(ABS(`LNG`-ROUND(NLon*10000)), 2))
     ) AS RT INTO  ADDR_ID, RATING
    FROM  splc_smarttrk.`data_addresses`
    WHERE  (`LAT` BETWEEN (ROUND(NLat*10000)-R) AND (ROUND(NLat*10000)+R))
      AND  (`LNG` BETWEEN (ROUND(NLon*10000)-R) AND (ROUND(NLon*10000)+R))
    ORDER BY  RT ASC
    LIMIT  1;

this trick reduces the dataset to 10 records in the worst case scenario, hence the speed is fair good despite the ORDER BY clause. In fact I don't really need to know the Distance from existing point, I just need to know if that distance is above a givel limit (here if is within a 10x10 rectangle that means R=5).

edited Mar 26, 2018 at 17:37

Rick James

144k15 gold badges144 silver badges255 bronze badges

answered Mar 21, 2018 at 19:56

Power Engineering

7202 gold badges14 silver badges27 bronze badges

1 Comment

Rick James Over a year ago

aka "bounding box". Need a cosine of the latitude to account for longitude lines being closer together than latitude.

Collectives™ on Stack Overflow

MySQL query optimization with complex index

3 Answers 3

1 Comment

1 Comment

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related