Function to Calculate Geospatial Distance Between Two Points (Lat,Long) Using R

After loading the geosphere package, you can choose from a number of different distance functions:

library(geosphere)
distm(c(lon1, lat1), c(lon2, lat2), fun = distHaversine)

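The package also provides these distance functions, which can be called directly or passed to distm via fun: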

distHaversine()
distMeeus()
distRhumb()
distVincentyEllipsoid()
distVincentySphere()
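
To make clear what the simplest of these computes, here is a minimal base-R sketch of the haversine formula. The function name haversine_m is hypothetical (it is not part of geosphere), and the default radius of 6378137 m is chosen to match the default used by distHaversine, so results should agree closely but this is not the package's own code.

```r
# Great-circle distance in metres via the haversine formula.
# Inputs are decimal degrees; r is the sphere radius in metres.
# haversine_m is an illustrative helper, not a geosphere function.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian, roughly 111.3 km
haversine_m(8, 7, 8, 8)
```

Because every operation is vectorised, the same function also accepts whole columns of coordinates.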

...

Distm function for calculate distance between coordinates in R

This should work:

library(geosphere)

distm(df[,c('Longitude','Latitude')],
      df1[,c('Longitude','Latitude')],
      fun=distVincentyEllipsoid)

         [,1]     [,2]     [,3]     [,4]
[1,] 45461.49 23203.37 44300.99 10190.84
[2,] 60243.58 15053.19 53852.61 40763.35
[3,] 63272.26 22151.07 59016.34 32505.87
[4,] 56308.59 46393.08 59016.34 15048.01

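Each row corresponds to a point in df and each column to a point in df1, with distances in metres. The first row therefore gives the distances between property 1 and industries 1, 2, 3 and 4.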
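
A small sketch of that row/column layout, assuming geosphere is installed. The data frames properties and industries and their coordinates are made up for illustration; only the column order (longitude first, then latitude) matters.

```r
library(geosphere)

# Hypothetical sample points: 2 properties and 3 industries
properties <- data.frame(Longitude = c(4.90, 5.12),
                         Latitude  = c(52.37, 52.09))
industries <- data.frame(Longitude = c(4.48, 5.47, 6.57),
                         Latitude  = c(51.92, 51.44, 53.22))

# Rows follow the first argument, columns follow the second
m <- distm(properties[, c("Longitude", "Latitude")],
           industries[, c("Longitude", "Latitude")],
           fun = distHaversine)

dim(m)    # 2 rows (properties) by 3 columns (industries)
m[1, 2]   # distance in metres from property 1 to industry 2
```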

See also here:

Function to calculate geospatial distance between two points (lat,long) using R

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

R calculate distance in miles using 2 latitude & 2 longitude vectors in a data frame for 18k rows

geodist is a good and fast package for calculating distances, and its geodist_vec() function is vectorised to work on 'columns' of data.

library(geodist)

## calculate distance in metres using Haversine formula
df$dist_m <- geodist::geodist_vec(
x1 = df$df1_Longitude
, y1 = df$df1_Latitude
, x2 = df$df2_Longitude
, y2 = df$df2_Latitude
, paired = TRUE
, measure = "haversine"
)

## convert to miles
df$dist_miles <- df$dist_m / 1609

#   df1_location_number df1_Latitude df1_Longitude df2_location_number df2_Latitude df2_Longitude    dist_m dist_miles
# 1                5051     34.71714     -118.9107                3051     34.71714    -118.91073       0.0     0.0000
# 2                5051     34.71714     -118.9107                3085     39.53404     -93.29237 2327593.8  1446.6089
# 3                5051     34.71714     -118.9107                3022     31.62679     -88.33012 2859098.6  1776.9413
# 4                5051     34.71714     -118.9107                3041     35.24798     -84.80412 3095858.6  1924.0886
# 5                5051     34.71714     -118.9107                3104     39.33425    -123.71306  667849.7   415.0713
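
Dividing by 1609 is a rounded conversion. If you need the exact figure, one international mile is 1609.344 metres, so the results will differ slightly from the dist_miles column above. A minimal sketch, where metres_to_miles is a hypothetical helper and not part of geodist:

```r
# Exact length of one international mile in metres
metres_per_mile <- 1609.344

# Illustrative helper (not a geodist function)
metres_to_miles <- function(m) m / metres_per_mile

# Sample dist_m values taken from the output above
dist_m <- c(0, 2327593.8, 667849.7)
dist_miles <- metres_to_miles(dist_m)
dist_miles
```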


Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)

To calculate the geographic distance between two points given as latitude/longitude coordinates, you can use several formulas. The geosphere package provides distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid for this. Of these, distVincentyEllipsoid is considered the most accurate, but it is computationally more intensive than the others.

With one of these functions you can build a distance matrix. Based on that matrix you can then assign locality names by shortest distance with max.col(-mat), which returns for each row the column holding the smallest distance, and get the corresponding distance with min (see the last part of the answer), like this:

library(geosphere)

# create distance matrix
mat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)

# assign the name to the point in list1 based on shortest distance in the matrix
list1$locality <- list2$locality[max.col(-mat)]

This gives:

> list1
   longitude latitude locality
1   80.15998 12.90524        D
2   72.89125 19.08120        A
3   77.65032 12.97238        C
4   77.60599 12.90927        D
5   72.88120 19.08225        A
6   76.65460 12.81447        E
7   72.88232 19.08241        A
8   77.49186 13.00984        D
9   72.82228 18.99347        A
10  72.88871 19.07990        A
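
To see why max.col(-mat) finds the nearest point, here is a small base-R sketch on a made-up distance matrix. Negating the matrix turns each row minimum into a row maximum. One thing to be aware of: max.col breaks ties at random by default, so ties.method = "first" is set here to match what which.min does.

```r
# Hypothetical 3 x 3 distance matrix; row 3 contains a tie (4 and 4)
mat <- matrix(c(5, 2, 9,
                1, 7, 3,
                4, 4, 8), nrow = 3, byrow = TRUE)

# Row-by-row: index of the first minimum in each row
nearest_which <- apply(mat, 1, which.min)

# Vectorised: column of the maximum of the negated matrix
nearest_maxcol <- max.col(-mat, ties.method = "first")

nearest_which
nearest_maxcol
```

Both vectors hold the column index of the closest point for each row, so either can be used to index list2$locality.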

Another possibility is to assign the locality based on the average longitude and latitude values of the localities in list2:

library(dplyr)
# mean longitude/latitude per locality (across() requires dplyr >= 1.0.0)
list2a <- list2 %>% group_by(locality) %>% summarise(across(everything(), mean)) %>% ungroup()
mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)
list1 <- list1 %>% mutate(locality2 = list2a$locality[max.col(-mat2)])

or with data.table:

library(data.table)
list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)
list1[, locality2 := list2a$locality[max.col(-mat2)] ]

This gives:

> list1
   longitude latitude locality locality2
1   80.15998 12.90524        D         D
2   72.89125 19.08120        A         B
3   77.65032 12.97238        C         C
4   77.60599 12.90927        D         C
5   72.88120 19.08225        A         B
6   76.65460 12.81447        E         E
7   72.88232 19.08241        A         B
8   77.49186 13.00984        D         C
9   72.82228 18.99347        A         B
10  72.88871 19.07990        A         B

As you can see, in most cases (7 out of 10) this assigns a different locality than the first approach.


You can add the distance with:

list1$near_dist <- apply(mat2, 1, min)

or another approach with max.col (which is most likely faster):

list1$near_dist <- mat2[cbind(seq_len(nrow(mat2)), max.col(-mat2))]

# or using dplyr
list1 <- list1 %>% mutate(near_dist = mat2[cbind(seq_len(nrow(mat2)), max.col(-mat2))])
# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )
list1[, near_dist := mat2[cbind(seq_len(nrow(mat2)), max.col(-mat2))] ]

The result:

> list1
    longitude latitude locality locality2   near_dist
 1:  80.15998 12.90524        D         D 269966.8970
 2:  72.89125 19.08120        A         B  65820.2047
 3:  77.65032 12.97238        C         C    739.1885
 4:  77.60599 12.90927        D         C   9209.8165
 5:  72.88120 19.08225        A         B  66832.7223
 6:  76.65460 12.81447        E         E      0.0000
 7:  72.88232 19.08241        A         B  66732.3127
 8:  77.49186 13.00984        D         C  17855.3083
 9:  72.82228 18.99347        A         B  69456.3382
10:  72.88871 19.07990        A         B  66004.9900
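
The max.col variant relies on indexing a matrix with a two-column matrix of (row, column) pairs, which picks one element per pair. A minimal base-R sketch on a made-up matrix, checking that it gives the same values as the apply/min version:

```r
# Hypothetical distance matrix with no ties
mat <- matrix(c(5, 2, 9,
                1, 7, 3,
                4, 6, 8), nrow = 3, byrow = TRUE)

# One (row, column) pair per row: the column of the row minimum
idx <- cbind(seq_len(nrow(mat)), max.col(-mat, ties.method = "first"))

near_dist_idx   <- mat[idx]             # picks mat[1, 2], mat[2, 1], mat[3, 1]
near_dist_apply <- apply(mat, 1, min)   # row-by-row minimum

near_dist_idx
near_dist_apply
```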

Calculate distance longitude latitude of multiple in dataframe R

Instead of distm you can use the distHaversine function, which is vectorised over rows. Inside your mutate call you should not repeat the data frame name with the $ operator, because mutate already knows where to look for the columns. The error occurs because you need cbind instead of c: c stacks the columns into one long vector, whereas cbind creates a two-column matrix (which is what you want in this case).

library(geosphere)
library(dplyr)

mutate(mydata,
Distance = distHaversine(cbind(Longitude, Latitude),
cbind(lag(Longitude), lag(Latitude))))

#   Callsign Altitude Speed Direction           Date_Time Latitude Longitude Distance
# 1     A118    18000   110       340 2017-11-06T22:28:09  70.6086   58.2959       NA
# 2     A118    18500   120       339 2017-11-06T22:29:09  72.1508   58.7894 172569.2
# 3     B222    18500   150       350 2017-11-08T07:28:09  71.1689   59.1234 109928.5
# 4     D123    19000   150       110 2018-05-29T15:13:27  69.4523   68.1235 387356.2
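
The difference between c and cbind is easy to check in base R. A short sketch using the first three coordinates from the data (the variable names stacked and paired are just for illustration):

```r
lon <- c(58.2959, 58.7894, 59.1234)
lat <- c(70.6086, 72.1508, 71.1689)

# c() concatenates: one long vector, longitudes followed by latitudes
stacked <- c(lon, lat)
length(stacked)   # 6

# cbind() binds as columns: one (lon, lat) pair per row
paired <- cbind(lon, lat)
dim(paired)       # 3 2
```

The distance functions expect the second shape, with one point per row.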

With distCosine it is a little trickier, as it doesn't return NA if one of the input latitudes or longitudes is missing. I therefore modified the function slightly, which solves the problem:

modified_distCosine <- function(Longitude1, Latitude1, Longitude2, Latitude2) {
  if (any(is.na(c(Longitude1, Latitude1, Longitude2, Latitude2)))) {
    NA
  } else {
    distCosine(c(Longitude1, Latitude1), c(Longitude2, Latitude2))
  }
}

mutate(mydata,
Distance = mapply(modified_distCosine,
Longitude, Latitude, lag(Longitude), lag(Latitude)))

#   Callsign Altitude Speed Direction           Date_Time Latitude Longitude Distance
# 1     A118    18000   110       340 2017-11-06T22:28:09  70.6086   58.2959       NA
# 2     A118    18500   120       339 2017-11-06T22:29:09  72.1508   58.7894 172569.2
# 3     B222    18500   150       350 2017-11-08T07:28:09  71.1689   59.1234 109928.5
# 4     D123    19000   150       110 2018-05-29T15:13:27  69.4523   68.1235 387356.2

Here I use mapply to apply the modified function with the arguments Longitude, Latitude, lag(Longitude), lag(Latitude).

I'm quite sure there has to be a more elegant way, but at least this works.

Data

mydata <- structure(list(Callsign = c("A118", "A118", "B222", "D123"), 
Altitude = c(18000L, 18500L, 18500L, 19000L),
Speed = c(110L, 120L, 150L, 150L),
Direction = c(340L, 339L, 350L, 110L),
Date_Time = c("2017-11-06T22:28:09", "2017-11-06T22:29:09", "2017-11-08T07:28:09", "2018-05-29T15:13:27"),
Latitude = c(70.6086, 72.1508, 71.1689, 69.4523),
Longitude = c(58.2959, 58.7894, 59.1234, 68.1235)),
.Names = c("Callsign", "Altitude", "Speed", "Direction", "Date_Time", "Latitude", "Longitude"),
class = "data.frame", row.names = c(NA, -4L))

Distance between coordinates in dataframe sequentially?

df["distance"] <- c(NA,
                    sapply(seq.int(2, nrow(df)), function(i) {
                      distm(c(df$Longitude[i-1], df$Latitude[i-1]),
                            c(df$Longitude[i], df$Latitude[i]),
                            fun = distHaversine)
                    })
)

This builds a vector that starts with NA for the first row. It then iterates from the second row to the last, computing the distance from the previous point to the current one each time and adding it to the vector.
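
If the loop turns out to be slow on a large data frame, a vectorised variant is possible because distHaversine accepts two matrices of points and works row by row. A sketch, assuming geosphere is installed and using the four sample coordinates from the flight data earlier on this page as a stand-in for df:

```r
library(geosphere)

# Stand-in data; the column names match the loop version above
df <- data.frame(Longitude = c(58.2959, 58.7894, 59.1234, 68.1235),
                 Latitude  = c(70.6086, 72.1508, 71.1689, 69.4523))

n   <- nrow(df)
pts <- as.matrix(df[, c("Longitude", "Latitude")])

# Rows 1..n-1 are the "previous" points, rows 2..n the "current" ones
df$distance <- c(NA, distHaversine(pts[-n, , drop = FALSE],
                                   pts[-1, , drop = FALSE]))
df
```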

Calculating distance between two points using the distm function inside mutate

Perhaps we can use pmap

library(purrr)
pmap_dbl(DF, ~ distm(x = c(..1, ..2), y = c(..3, ..4),
fun = distHaversine) %>% c)

When combined with mutate

library(dplyr)
DF %>%
mutate(Dist = pmap_dbl(., ~
distm(x = c(..1, ..2), y = c(..3, ..4), fun = distHaversine)))
# A tibble: 10 x 5
#   Long1  Lat1 Long2  Lat2     Dist
#   <int> <int> <int> <int>    <dbl>
# 1     3     3    10     5  808552.
# 2     4     2     2     6  497573.
# 3     5     6     6     4  248726.
# 4     7    10     1     2 1110668.
# 5     2     5     9    10  951974.
# 6     8     7     8     8  111319.
# 7     9     8     7     9  246730.
# 8     6     4     5     1  351986.
# 9    10     1     3     7 1024599.
#10     1     9     4     3  745867.
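
As an alternative to pmap, it may be simpler to skip distm here: distHaversine is already vectorised over rows, so it can be called directly inside mutate. A sketch, assuming dplyr and geosphere are installed, using two rows from the output above as hypothetical sample data:

```r
library(dplyr)
library(geosphere)

# Two sample rows; column names follow the output above
DF <- data.frame(Long1 = c(3L, 8L), Lat1 = c(3L, 7L),
                 Long2 = c(10L, 8L), Lat2 = c(5L, 8L))

# cbind() pairs each longitude with its latitude, one point per row
DF_dist <- DF %>%
  mutate(Dist = distHaversine(cbind(Long1, Lat1), cbind(Long2, Lat2)))

DF_dist
```

This avoids building a 1 x 1 matrix per row, which is what distm returns for a single pair of points.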

