﻿ Geographic/Geospatial Distance Between 2 Lists of Lat/Lon Points (Coordinates) - ITCodar

# Geographic/Geospatial Distance Between 2 Lists of Lat/Lon Points (Coordinates)

## Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package `geosphere` has the `distCosine`, `distHaversine`, `distVincentySphere` and `distVincentyEllipsoid` for calculating the distance. Of these, the `distVincentyEllipsoid` is considered the most accurate one, but is computationally more intensive than the other ones.

With one of these functions, you can make a distance matrix. Based on that matrix you can then assign `locality` names based on shortest distance with `which.min` and the corresponding distance with `min` (see for this the last part of the answer) like this:

``library(geosphere)# create distance matrixmat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)# assign the name to the point in list1 based on shortest distance in the matrixlist1\$locality <- list2\$locality[max.col(-mat)]``

this gives:

``> list1   longitude latitude locality1   80.15998 12.90524        D2   72.89125 19.08120        A3   77.65032 12.97238        C4   77.60599 12.90927        D5   72.88120 19.08225        A6   76.65460 12.81447        E7   72.88232 19.08241        A8   77.49186 13.00984        D9   72.82228 18.99347        A10  72.88871 19.07990        A``

Another possibility is to assign the `locality` based on the average longitude and latitude values of the `locality`s in `list2`:

``library(dplyr)list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)list1 <- list1 %>% mutate(locality2 = list2a\$locality[max.col(-mat2)])``

or with `data.table`:

``library(data.table)list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)list1[, locality2 := list2a\$locality[max.col(-mat2)] ]``

this gives:

``> list1   longitude latitude locality locality21   80.15998 12.90524        D         D2   72.89125 19.08120        A         B3   77.65032 12.97238        C         C4   77.60599 12.90927        D         C5   72.88120 19.08225        A         B6   76.65460 12.81447        E         E7   72.88232 19.08241        A         B8   77.49186 13.00984        D         C9   72.82228 18.99347        A         B10  72.88871 19.07990        A         B``

As you can see, this leads in most (7 out of 10) occasions to another assigned `locality`.

You can add the distance with:

``list1\$near_dist <- apply(mat2, 1, min)``

or another approach with `max.col` (which is highly probable faster):

``list1\$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]# or using dplyrlist1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]``

the result:

``> list1    longitude latitude locality locality2   near_dist 1:  80.15998 12.90524        D         D 269966.8970 2:  72.89125 19.08120        A         B  65820.2047 3:  77.65032 12.97238        C         C    739.1885 4:  77.60599 12.90927        D         C   9209.8165 5:  72.88120 19.08225        A         B  66832.7223 6:  76.65460 12.81447        E         E      0.0000 7:  72.88232 19.08241        A         B  66732.3127 8:  77.49186 13.00984        D         C  17855.3083 9:  72.82228 18.99347        A         B  69456.338210:  72.88871 19.07990        A         B  66004.9900``

## How to convert the geospatial distance of two points (lat, lon) to a similarity measuremnt

A simple answer is to define the extremes, and then interpolate linearly between them. So you would have

• if `dv == 0`, `similarity = 1` (same point)
• if `dv == MAX`, `similarity = 0` (antipodal points; you can take 20 000 000 m as an approximation for MAX, or use the exact value, whichever you prefer)
• otherwise, `similarity = (MAX - dv) / MAX`

You can use other interpolations besides linear - but the basic idea holds: this is easier than you make it seem. Any monotonous and continous function `f` such that `f(0) == 0 && f(1) == 1` will make `f((MAX - dv) / MAX)` a similarity that matches your constraints.

## Function to calculate geospatial distance between two points (lat,long) using R

Loading the geosphere package you can use a number of different functions

``library(geosphere)distm(c(lon1, lat1), c(lon2, lat2), fun = distHaversine)``

Also:

``distHaversine()distMeeus()distRhumb()distVincentyEllipsoid()distVincentySphere()``

...

## Calculating geographic distance between a list of coordinates (lat, lng)

Here's the solution I ended up using. It's called the Haversine (distance) function if you want to look up what it does for yourself.

I changed my approach a little as well. My input (`positions`) is a list of tuple coordinates:

``def calculate_distance(positions):    results = []    for i in range(1, len(positions)):        loc1 = positions[i - 1]        loc2 = positions[i]        lat1 = loc1        lng1 = loc1        lat2 = loc2        lng2 = loc2        degreesToRadians = (math.pi / 180)        latrad1 = lat1 * degreesToRadians        latrad2 = lat2 * degreesToRadians        dlat = (lat2 - lat1) * degreesToRadians        dlng = (lng2 - lng1) * degreesToRadians        a = math.sin(dlat / 2) * math.sin(dlat / 2) + math.cos(latrad1) * \        math.cos(latrad2) * math.sin(dlng / 2) * math.sin(dlng / 2)        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))        r = 6371000        results.append(r * c)    return (sum(results) / 1000)  # Converting from m to km``

## Getting distance between two points based on latitude/longitude

Edit: Just as a note, if you just need a quick and easy way of finding the distance between two points, I strongly recommend using the approach described in Kurt's answer below instead of re-implementing Haversine -- see his post for rationale.

This answer focuses just on answering the specific bug OP ran into.

It's because in Python, all the trig functions use radians, not degrees.

You can either convert the numbers manually to radians, or use the `radians` function from the math module:

``from math import sin, cos, sqrt, atan2, radians# approximate radius of earth in kmR = 6373.0lat1 = radians(52.2296756)lon1 = radians(21.0122287)lat2 = radians(52.406374)lon2 = radians(16.9251681)dlon = lon2 - lon1dlat = lat2 - lat1a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2c = 2 * atan2(sqrt(a), sqrt(1 - a))distance = R * cprint("Result:", distance)print("Should be:", 278.546, "km")``

The distance is now returning the correct value of `278.545589351` km.

## Evaluating the closest distance from one point between multiple options?

If you only want the nearest building to each person, and they're relatively close:

``library(sf)## load data here from @dcarlson's dputperson_location <- person_location %>%  st_as_sf(coords = c('longitude', 'latitude')) %>%  st_set_crs(4326)building_location <- building_location %>%  st_as_sf(coords = c('longitude', 'latitude')) %>%  st_set_crs(4326)st_nearest_feature(person_location, building_location)#although coordinates are longitude/latitude, st_nearest_feature assumes that they #are planar# 6 2 6 6 8``

So person 1,3 & 4 are closest to building #6. Person 2 -> building #2 ...

All distances can be calculated with `st_distance(person_location, building_location)`.

You can use the `nngeo` library to easily find the shortest distance for each person.

``library(nngeo)st_connect(person_location, building_location) %>% st_length()Calculating nearest IDs  |===============================================================================================================| 100%Calculating lines  |===============================================================================================================| 100%Done.Units: [m] 5054.381 5856.388 1923.254 1796.608 1976.786``

Things are easier to understand with a graph:

``st_connect(person_location, building_location) %>%   ggplot() +     geom_sf() +     geom_sf(data = person_location, color = 'green') +     geom_sf(data = building_location, color = 'red')`` And even easier on a map:

``st_connect(person_location, building_location) %>%   mapview::mapview() +  mapview::mapview(person_location, color = 'green', col.regions = 'green') +   mapview::mapview(building_location, color = 'black', col.regions = 'black')`` geosphere is probably more accurate, but if you're dealing with relatively small areas these tools are probably good enough. I find it easier to work with, and don't often need extreme precision.

## Distm function for calculate distance between coordinates in R

This should work:

``library(geosphere)distm(df[,c('Longitude','Latitude')],      df1[,c('Longitude','Latitude')],             fun=distVincentyEllipsoid)         [,1]     [,2]     [,3]     [,4][1,] 45461.49 23203.37 44300.99 10190.84[2,] 60243.58 15053.19 53852.61 40763.35[3,] 63272.26 22151.07 59016.34 32505.87[4,] 56308.59 46393.08 59016.34 15048.01``

The first row indicates the distance between property 1 and industries 1, 2, 3 and 4.

Function to calculate geospatial distance between two points (lat,long) using R

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

## Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package `geosphere` has the `distCosine`, `distHaversine`, `distVincentySphere` and `distVincentyEllipsoid` for calculating the distance. Of these, the `distVincentyEllipsoid` is considered the most accurate one, but is computationally more intensive than the other ones.

With one of these functions, you can make a distance matrix. Based on that matrix you can then assign `locality` names based on shortest distance with `which.min` and the corresponding distance with `min` (see for this the last part of the answer) like this:

``library(geosphere)# create distance matrixmat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)# assign the name to the point in list1 based on shortest distance in the matrixlist1\$locality <- list2\$locality[max.col(-mat)]``

this gives:

``> list1   longitude latitude locality1   80.15998 12.90524        D2   72.89125 19.08120        A3   77.65032 12.97238        C4   77.60599 12.90927        D5   72.88120 19.08225        A6   76.65460 12.81447        E7   72.88232 19.08241        A8   77.49186 13.00984        D9   72.82228 18.99347        A10  72.88871 19.07990        A``

Another possibility is to assign the `locality` based on the average longitude and latitude values of the `locality`s in `list2`:

``library(dplyr)list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)list1 <- list1 %>% mutate(locality2 = list2a\$locality[max.col(-mat2)])``

or with `data.table`:

``library(data.table)list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)list1[, locality2 := list2a\$locality[max.col(-mat2)] ]``

this gives:

``> list1   longitude latitude locality locality21   80.15998 12.90524        D         D2   72.89125 19.08120        A         B3   77.65032 12.97238        C         C4   77.60599 12.90927        D         C5   72.88120 19.08225        A         B6   76.65460 12.81447        E         E7   72.88232 19.08241        A         B8   77.49186 13.00984        D         C9   72.82228 18.99347        A         B10  72.88871 19.07990        A         B``

As you can see, this leads in most (7 out of 10) occasions to another assigned `locality`.

You can add the distance with:

``list1\$near_dist <- apply(mat2, 1, min)``

or another approach with `max.col` (which is highly probable faster):

``list1\$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]# or using dplyrlist1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]``

the result:

``> list1    longitude latitude locality locality2   near_dist 1:  80.15998 12.90524        D         D 269966.8970 2:  72.89125 19.08120        A         B  65820.2047 3:  77.65032 12.97238        C         C    739.1885 4:  77.60599 12.90927        D         C   9209.8165 5:  72.88120 19.08225        A         B  66832.7223 6:  76.65460 12.81447        E         E      0.0000 7:  72.88232 19.08241        A         B  66732.3127 8:  77.49186 13.00984        D         C  17855.3083 9:  72.82228 18.99347        A         B  69456.338210:  72.88871 19.07990        A         B  66004.9900``