Geographic/Geospatial Distance Between 2 Lists of Lat/Lon Points (Coordinates)

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package geosphere has the distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid for calculating the distance. Of these, the distVincentyEllipsoid is considered the most accurate one, but is computationally more intensive than the other ones.

With one of these functions, you can make a distance matrix. Based on that matrix you can then assign locality names based on shortest distance with which.min and the corresponding distance with min (see for this the last part of the answer) like this:

library(geosphere)

# create distance matrix
mat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)

# assign the name to the point in list1 based on shortest distance in the matrix
list1$locality <- list2$locality[max.col(-mat)]

this gives:

> list1
longitude latitude locality
1 80.15998 12.90524 D
2 72.89125 19.08120 A
3 77.65032 12.97238 C
4 77.60599 12.90927 D
5 72.88120 19.08225 A
6 76.65460 12.81447 E
7 72.88232 19.08241 A
8 77.49186 13.00984 D
9 72.82228 18.99347 A
10 72.88871 19.07990 A

Another possibility is to assign the locality based on the average longitude and latitude values of the localitys in list2:

library(dplyr)
list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()
mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)
list1 <- list1 %>% mutate(locality2 = list2a$locality[max.col(-mat2)])

or with data.table:

library(data.table)
list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)
list1[, locality2 := list2a$locality[max.col(-mat2)] ]

this gives:

> list1
longitude latitude locality locality2
1 80.15998 12.90524 D D
2 72.89125 19.08120 A B
3 77.65032 12.97238 C C
4 77.60599 12.90927 D C
5 72.88120 19.08225 A B
6 76.65460 12.81447 E E
7 72.88232 19.08241 A B
8 77.49186 13.00984 D C
9 72.82228 18.99347 A B
10 72.88871 19.07990 A B

As you can see, this leads in most (7 out of 10) occasions to another assigned locality.


You can add the distance with:

list1$near_dist <- apply(mat2, 1, min)

or another approach with max.col (which is highly probable faster):

list1$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]

# or using dplyr
list1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])
# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )
list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]

the result:

> list1
longitude latitude locality locality2 near_dist
1: 80.15998 12.90524 D D 269966.8970
2: 72.89125 19.08120 A B 65820.2047
3: 77.65032 12.97238 C C 739.1885
4: 77.60599 12.90927 D C 9209.8165
5: 72.88120 19.08225 A B 66832.7223
6: 76.65460 12.81447 E E 0.0000
7: 72.88232 19.08241 A B 66732.3127
8: 77.49186 13.00984 D C 17855.3083
9: 72.82228 18.99347 A B 69456.3382
10: 72.88871 19.07990 A B 66004.9900

How to convert the geospatial distance of two points (lat, lon) to a similarity measuremnt

A simple answer is to define the extremes, and then interpolate linearly between them. So you would have

  • if dv == 0, similarity = 1 (same point)
  • if dv == MAX, similarity = 0 (antipodal points; you can take 20 000 000 m as an approximation for MAX, or use the exact value, whichever you prefer)
  • otherwise, similarity = (MAX - dv) / MAX

You can use other interpolations besides linear - but the basic idea holds: this is easier than you make it seem. Any monotonous and continous function f such that f(0) == 0 && f(1) == 1 will make f((MAX - dv) / MAX) a similarity that matches your constraints.

Function to calculate geospatial distance between two points (lat,long) using R

Loading the geosphere package you can use a number of different functions

library(geosphere)
distm(c(lon1, lat1), c(lon2, lat2), fun = distHaversine)

Also:

distHaversine()
distMeeus()
distRhumb()
distVincentyEllipsoid()
distVincentySphere()

...

Calculating geographic distance between a list of coordinates (lat, lng)

Here's the solution I ended up using. It's called the Haversine (distance) function if you want to look up what it does for yourself.

I changed my approach a little as well. My input (positions) is a list of tuple coordinates:

def calculate_distance(positions):
results = []
for i in range(1, len(positions)):
loc1 = positions[i - 1]
loc2 = positions[i]

lat1 = loc1[0]
lng1 = loc1[1]

lat2 = loc2[0]
lng2 = loc2[1]

degreesToRadians = (math.pi / 180)
latrad1 = lat1 * degreesToRadians
latrad2 = lat2 * degreesToRadians
dlat = (lat2 - lat1) * degreesToRadians
dlng = (lng2 - lng1) * degreesToRadians

a = math.sin(dlat / 2) * math.sin(dlat / 2) + math.cos(latrad1) * \
math.cos(latrad2) * math.sin(dlng / 2) * math.sin(dlng / 2)
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
r = 6371000

results.append(r * c)

return (sum(results) / 1000) # Converting from m to km

Getting distance between two points based on latitude/longitude

Edit: Just as a note, if you just need a quick and easy way of finding the distance between two points, I strongly recommend using the approach described in Kurt's answer below instead of re-implementing Haversine -- see his post for rationale.

This answer focuses just on answering the specific bug OP ran into.


It's because in Python, all the trig functions use radians, not degrees.

You can either convert the numbers manually to radians, or use the radians function from the math module:

from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

lat1 = radians(52.2296756)
lon1 = radians(21.0122287)
lat2 = radians(52.406374)
lon2 = radians(16.9251681)

dlon = lon2 - lon1
dlat = lat2 - lat1

a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

distance = R * c

print("Result:", distance)
print("Should be:", 278.546, "km")

The distance is now returning the correct value of 278.545589351 km.

Evaluating the closest distance from one point between multiple options?

If you only want the nearest building to each person, and they're relatively close:

library(sf)

## load data here from @dcarlson's dput

person_location <- person_location %>%
st_as_sf(coords = c('longitude', 'latitude')) %>%
st_set_crs(4326)

building_location <- building_location %>%
st_as_sf(coords = c('longitude', 'latitude')) %>%
st_set_crs(4326)

st_nearest_feature(person_location, building_location)

#although coordinates are longitude/latitude, st_nearest_feature assumes that they #are planar
#[1] 6 2 6 6 8

So person 1,3 & 4 are closest to building #6. Person 2 -> building #2 ...

All distances can be calculated with st_distance(person_location, building_location).

You can use the nngeo library to easily find the shortest distance for each person.

library(nngeo)

st_connect(person_location, building_location) %>% st_length()
Calculating nearest IDs
|===============================================================================================================| 100%
Calculating lines
|===============================================================================================================| 100%
Done.
Units: [m]
[1] 5054.381 5856.388 1923.254 1796.608 1976.786

Things are easier to understand with a graph:

st_connect(person_location, building_location) %>% 
ggplot() +
geom_sf() +
geom_sf(data = person_location, color = 'green') +
geom_sf(data = building_location, color = 'red')

ggplot people & bldgs

And even easier on a map:

st_connect(person_location, building_location) %>% 
mapview::mapview() +
mapview::mapview(person_location, color = 'green', col.regions = 'green') +
mapview::mapview(building_location, color = 'black', col.regions = 'black')

mapview

geosphere is probably more accurate, but if you're dealing with relatively small areas these tools are probably good enough. I find it easier to work with, and don't often need extreme precision.

Distm function for calculate distance between coordinates in R

This should work:

library(geosphere)

distm(df[,c('Longitude','Latitude')],
df1[,c('Longitude','Latitude')],
fun=distVincentyEllipsoid)

[,1] [,2] [,3] [,4]
[1,] 45461.49 23203.37 44300.99 10190.84
[2,] 60243.58 15053.19 53852.61 40763.35
[3,] 63272.26 22151.07 59016.34 32505.87
[4,] 56308.59 46393.08 59016.34 15048.01

The first row indicates the distance between property 1 and industries 1, 2, 3 and 4.

See also here:

Function to calculate geospatial distance between two points (lat,long) using R

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package geosphere has the distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid for calculating the distance. Of these, the distVincentyEllipsoid is considered the most accurate one, but is computationally more intensive than the other ones.

With one of these functions, you can make a distance matrix. Based on that matrix you can then assign locality names based on shortest distance with which.min and the corresponding distance with min (see for this the last part of the answer) like this:

library(geosphere)

# create distance matrix
mat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)

# assign the name to the point in list1 based on shortest distance in the matrix
list1$locality <- list2$locality[max.col(-mat)]

this gives:

> list1
longitude latitude locality
1 80.15998 12.90524 D
2 72.89125 19.08120 A
3 77.65032 12.97238 C
4 77.60599 12.90927 D
5 72.88120 19.08225 A
6 76.65460 12.81447 E
7 72.88232 19.08241 A
8 77.49186 13.00984 D
9 72.82228 18.99347 A
10 72.88871 19.07990 A

Another possibility is to assign the locality based on the average longitude and latitude values of the localitys in list2:

library(dplyr)
list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()
mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)
list1 <- list1 %>% mutate(locality2 = list2a$locality[max.col(-mat2)])

or with data.table:

library(data.table)
list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)
list1[, locality2 := list2a$locality[max.col(-mat2)] ]

this gives:

> list1
longitude latitude locality locality2
1 80.15998 12.90524 D D
2 72.89125 19.08120 A B
3 77.65032 12.97238 C C
4 77.60599 12.90927 D C
5 72.88120 19.08225 A B
6 76.65460 12.81447 E E
7 72.88232 19.08241 A B
8 77.49186 13.00984 D C
9 72.82228 18.99347 A B
10 72.88871 19.07990 A B

As you can see, this leads in most (7 out of 10) occasions to another assigned locality.


You can add the distance with:

list1$near_dist <- apply(mat2, 1, min)

or another approach with max.col (which is highly probable faster):

list1$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]

# or using dplyr
list1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])
# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )
list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]

the result:

> list1
longitude latitude locality locality2 near_dist
1: 80.15998 12.90524 D D 269966.8970
2: 72.89125 19.08120 A B 65820.2047
3: 77.65032 12.97238 C C 739.1885
4: 77.60599 12.90927 D C 9209.8165
5: 72.88120 19.08225 A B 66832.7223
6: 76.65460 12.81447 E E 0.0000
7: 72.88232 19.08241 A B 66732.3127
8: 77.49186 13.00984 D C 17855.3083
9: 72.82228 18.99347 A B 69456.3382
10: 72.88871 19.07990 A B 66004.9900


Related Topics



Leave a reply



Submit