Geo-Search (Distance) in PHP/MySQL (Performance)

Geo-Search (Distance) in PHP/MySQL (Performance)

Calculate a bounding box to select a subset of the rows in the WHERE clause of your SQL query, so that you're only executing the expensive distance calculation on that subset of rows rather than against the entire 200k records in your table. The method is described in this article on Movable Type (with PHP code examples). Then you can include the Haversine calculation in your query against that subset to calculate the actual distances, and factor in the HAVING clause at that point.

It's the bounding box that helps your performance, because it means you're only doing the expensive distance calculation on a small subset of your data. This is effectively the same method that Patrick has suggested, but the Movable Type link has extensive explanations of the method, as well as PHP code that you can use to build the bounding box and your SQL query.

EDIT

If you don't think haversine is accurate enough, then there's also the Vincenty formula.

//  Vincenty formula to calculate great circle distance between 2 locations expressed as Lat/Long in KM

function VincentyDistance($lat1,$lat2,$lon1,$lon2){
$a = 6378137 - 21 * sin($lat1);
$b = 6356752.3142;
$f = 1/298.257223563;

$p1_lat = $lat1/57.29577951;
$p2_lat = $lat2/57.29577951;
$p1_lon = $lon1/57.29577951;
$p2_lon = $lon2/57.29577951;

$L = $p2_lon - $p1_lon;

$U1 = atan((1-$f) * tan($p1_lat));
$U2 = atan((1-$f) * tan($p2_lat));

$sinU1 = sin($U1);
$cosU1 = cos($U1);
$sinU2 = sin($U2);
$cosU2 = cos($U2);

$lambda = $L;
$lambdaP = 2*M_PI;
$iterLimit = 20;

while(abs($lambda-$lambdaP) > 1e-12 && $iterLimit>0) {
$sinLambda = sin($lambda);
$cosLambda = cos($lambda);
$sinSigma = sqrt(($cosU2*$sinLambda) * ($cosU2*$sinLambda) + ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda) * ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda));

//if ($sinSigma==0){return 0;} // co-incident points
$cosSigma = $sinU1*$sinU2 + $cosU1*$cosU2*$cosLambda;
$sigma = atan2($sinSigma, $cosSigma);
$alpha = asin($cosU1 * $cosU2 * $sinLambda / $sinSigma);
$cosSqAlpha = cos($alpha) * cos($alpha);
$cos2SigmaM = $cosSigma - 2*$sinU1*$sinU2/$cosSqAlpha;
$C = $f/16*$cosSqAlpha*(4+$f*(4-3*$cosSqAlpha));
$lambdaP = $lambda;
$lambda = $L + (1-$C) * $f * sin($alpha) * ($sigma + $C*$sinSigma*($cos2SigmaM+$C*$cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)));
}

$uSq = $cosSqAlpha*($a*$a-$b*$b)/($b*$b);
$A = 1 + $uSq/16384*(4096+$uSq*(-768+$uSq*(320-175*$uSq)));
$B = $uSq/1024 * (256+$uSq*(-128+$uSq*(74-47*$uSq)));

$deltaSigma = $B*$sinSigma*($cos2SigmaM+$B/4*($cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)- $B/6*$cos2SigmaM*(-3+4*$sinSigma*$sinSigma)*(-3+4*$cos2SigmaM*$cos2SigmaM)));

$s = $b*$A*($sigma-$deltaSigma);
return $s/1000;
}

echo VincentyDistance($lat1,$lat2,$lon1,$lon2);

Geo Location Radius Search Using PHP and MySQL

I have gone with below -

SELECT id,
name,
lat,
lng,
ROUND((6371 * acos(
cos(radians($lat)) * cos(radians(lat)) * cos(radians(lng) - radians($lng)) +
sin(radians($lat)) * sin(radians(lat)))), (2)) AS distance
FROM jobs
HAVING distance < 50
ORDER BY distance;

I have done benchmarking with data and found this slightly faster than Mukesh's answer and 2x better than @jision's answer.

PHP calculate lan/lon distance performance

I was able to reduce the calculation time from 13 seconds to 1 second!

I did this by filtering out with mysql the locations that were not within a 10 km bounding box of my lat/long coordinates.

I used this code:

  $rad = 10;  // radius of bounding circle in kilometers

$R = 6371; // earth's mean radius, km

// first-cut bounding box (in degrees)
$maxLat = $_GET['lat'] + rad2deg($rad/$R);
$minLat = $_GET['lat'] - rad2deg($rad/$R);
// compensate for degrees longitude getting smaller with increasing latitude
$maxLon = $_GET['long'] + rad2deg($rad/$R/cos(deg2rad($_GET['lat'])));
$minLon = $_GET['long'] - rad2deg($rad/$R/cos(deg2rad($_GET['lat'])));

and I changed my mysql to this:

$sql2 = "SELECT ....WHERE LAT BETWEEN '".$minLat."' AND '".$maxLat."'";
$sql3 = "SELECT ....WHERE LON BETWEEN '".$minLon."' AND '".$maxLon."'";

The rest of my code and calculation is exact the same, but instead of doing 3000 calculations, mysql sweeps out the majority.

I don't know if this approach is 100% mathematically correct, but as far as I see it works very fast with minor changes to my initial coding so for my project it's great.

And of course, the source: http://www.movable-type.co.uk/scripts/latlong-db.html

Elastic Search geo distance query with MySQL relationships

I didn't find any good solutions to do this, but I solved it by simply adding a location field on both posts and comments and then getting the coordinates of the related location when I push it into the Elasticsearch index.

Maybe not an optimal solution, but it works good and it is very fast as the index is kept completely flat. As long as I make sure the changes in the location coordinates propagate to the related posts and comments I don't see any real issues with this approach except that it's not so clean to store duplicated data in the index.



Related Topics



Leave a reply



Submit