Function to calculate geospatial distance between two points (lat,long) using R
After loading the geosphere package, you can use a number of different distance functions:
library(geosphere)
distm(c(lon1, lat1), c(lon2, lat2), fun = distHaversine)
Also:
distHaversine()
distMeeus()
distRhumb()
distVincentyEllipsoid()
distVincentySphere()
...
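To make clear what fun = distHaversine computes, here is a minimal base-R sketch of the haversine formula. The function name haversine_m and the sample coordinates are illustrative assumptions, not part of geosphere; the radius 6378137 m is the default that geosphere uses for its spherical functions.

```r
# Haversine great-circle distance in metres; inputs are decimal degrees.
# haversine_m is a hypothetical helper, not a geosphere function.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Illustrative coordinates (approximately London and Paris)
d <- haversine_m(-0.1278, 51.5074, 2.3522, 48.8566)
d / 1000  # distance in kilometres
```

Because the function only uses vectorised arithmetic, it also accepts whole columns of coordinates, returning one distance per pair.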
Distm function for calculate distance between coordinates in R
This should work:
library(geosphere)
distm(df[,c('Longitude','Latitude')],
df1[,c('Longitude','Latitude')],
fun=distVincentyEllipsoid)
[,1] [,2] [,3] [,4]
[1,] 45461.49 23203.37 44300.99 10190.84
[2,] 60243.58 15053.19 53852.61 40763.35
[3,] 63272.26 22151.07 59016.34 32505.87
[4,] 56308.59 46393.08 59016.34 15048.01
The first row gives the distances (in metres) between property 1 and industries 1, 2, 3 and 4.
See also here:
Function to calculate geospatial distance between two points (lat,long) using R
Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)
R calculate distance in miles using 2 latitude & 2 longitude vectors in a data frame for 18k rows
geodist is a good, fast library for calculating distances, and its geodist_vec() function is vectorised to work on 'columns' of data:
library(geodist)
## calculate distance in metres using Haversine formula
df$dist_m <- geodist::geodist_vec(
x1 = df$df1_Longitude
, y1 = df$df1_Latitude
, x2 = df$df2_Longitude
, y2 = df$df2_Latitude
, paired = TRUE
, measure = "haversine"
)
## convert to miles (1609 is an approximation; 1 mile = 1609.344 m)
df$dist_miles <- df$dist_m / 1609
# df1_location_number df1_Latitude df1_Longitude df2_location_number df2_Latitude df2_Longitude dist_m dist_miles
# 1 5051 34.71714 -118.9107 3051 34.71714 -118.91073 0.0 0.0000
# 2 5051 34.71714 -118.9107 3085 39.53404 -93.29237 2327593.8 1446.6089
# 3 5051 34.71714 -118.9107 3022 31.62679 -88.33012 2859098.6 1776.9413
# 4 5051 34.71714 -118.9107 3041 35.24798 -84.80412 3095858.6 1924.0886
# 5 5051 34.71714 -118.9107 3104 39.33425 -123.71306 667849.7 415.0713
Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)
To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formulas. The geosphere package has the distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid functions for calculating the distance. Of these, distVincentyEllipsoid is considered the most accurate, but it is computationally more intensive than the others.
With one of these functions, you can make a distance matrix. Based on that matrix you can then assign locality names by shortest distance (with which.min, or with max.col on the negated matrix as done here) and get the corresponding distance with min (see the last part of the answer), like this:
library(geosphere)
# create distance matrix
mat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)
# assign the name to the point in list1 based on shortest distance in the matrix
list1$locality <- list2$locality[max.col(-mat)]
This gives:
> list1
longitude latitude locality
1 80.15998 12.90524 D
2 72.89125 19.08120 A
3 77.65032 12.97238 C
4 77.60599 12.90927 D
5 72.88120 19.08225 A
6 76.65460 12.81447 E
7 72.88232 19.08241 A
8 77.49186 13.00984 D
9 72.82228 18.99347 A
10 72.88871 19.07990 A
Another possibility is to assign the locality based on the average longitude and latitude values of the localities in list2:
library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() replaces them
list2a <- list2 %>% group_by(locality) %>% summarise(across(everything(), mean)) %>% ungroup()
mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)
list1 <- list1 %>% mutate(locality2 = list2a$locality[max.col(-mat2)])
Or with data.table:
library(data.table)
list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)
list1[, locality2 := list2a$locality[max.col(-mat2)] ]
This gives:
> list1
longitude latitude locality locality2
1 80.15998 12.90524 D D
2 72.89125 19.08120 A B
3 77.65032 12.97238 C C
4 77.60599 12.90927 D C
5 72.88120 19.08225 A B
6 76.65460 12.81447 E E
7 72.88232 19.08241 A B
8 77.49186 13.00984 D C
9 72.82228 18.99347 A B
10 72.88871 19.07990 A B
As you can see, in most cases (7 out of 10) this leads to a different assigned locality.
You can add the distance with:
list1$near_dist <- apply(mat2, 1, min)
Or use another approach with max.col (which is very probably faster):
# index each row at its nearest column; seq_len(nrow(mat2)) avoids hard-coding the row count
list1$near_dist <- mat2[matrix(c(seq_len(nrow(mat2)), max.col(-mat2)), ncol = 2)]
# or using dplyr
list1 <- list1 %>% mutate(near_dist = mat2[matrix(c(seq_len(nrow(mat2)), max.col(-mat2)), ncol = 2)])
# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )
list1[, near_dist := mat2[matrix(c(seq_len(nrow(mat2)), max.col(-mat2)), ncol = 2)] ]
The result:
> list1
longitude latitude locality locality2 near_dist
1: 80.15998 12.90524 D D 269966.8970
2: 72.89125 19.08120 A B 65820.2047
3: 77.65032 12.97238 C C 739.1885
4: 77.60599 12.90927 D C 9209.8165
5: 72.88120 19.08225 A B 66832.7223
6: 76.65460 12.81447 E E 0.0000
7: 72.88232 19.08241 A B 66732.3127
8: 77.49186 13.00984 D C 17855.3083
9: 72.82228 18.99347 A B 69456.3382
10: 72.88871 19.07990 A B 66004.9900
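The nearest-neighbour lookup above can be tried in isolation on a toy matrix. The sketch below uses a made-up 3 x 3 distance matrix and hypothetical candidate names; it also passes ties.method = "first", because max.col breaks ties at random by default, which can make results differ between runs when two candidates are equally close.

```r
# Hypothetical distance matrix (metres): rows = points, columns = candidates
mat <- matrix(c(500, 120, 900,
                 30, 450, 610,
                700, 800,  15),
              nrow = 3, byrow = TRUE)
candidates <- c("A", "B", "C")

# Column index of the smallest distance in each row (deterministic on ties)
nearest <- max.col(-mat, ties.method = "first")

# Same result with apply() and which.min()
nearest_apply <- apply(mat, 1, which.min)

# Pick mat[i, nearest[i]] for every row via two-column matrix indexing
near_dist <- mat[cbind(seq_len(nrow(mat)), nearest)]

data.frame(nearest = candidates[nearest], near_dist = near_dist)
```

Computing the index once and reusing it for both the name and the distance keeps the two columns consistent with each other.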
Calculate distance longitude latitude of multiple in dataframe R
Instead of distm you can use the distHaversine function. Also, in your mutate call you should not repeat the data frame name or use the $ operator; mutate already knows where to look for the columns. The error occurs because you need to use cbind instead of c: c creates one long vector, simply stacking the columns together, whereas cbind creates a two-column matrix (which is what you want in this case).
library(geosphere)
library(dplyr)
mutate(mydata,
Distance = distHaversine(cbind(Longitude, Latitude),
cbind(lag(Longitude), lag(Latitude))))
# Callsign Altitude Speed Direction Date_Time Latitude Longitude Distance
# 1 A118 18000 110 340 2017-11-06T22:28:09 70.6086 58.2959 NA
# 2 A118 18500 120 339 2017-11-06T22:29:09 72.1508 58.7894 172569.2
# 3 B222 18500 150 350 2017-11-08T07:28:09 71.1689 59.1234 109928.5
# 4 D123 19000 150 110 2018-05-29T15:13:27 69.4523 68.1235 387356.2
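The difference between c and cbind is easy to see on a few values. This small sketch reuses three coordinate pairs from the data at the end of this answer; the variable names stacked and paired are only illustrative.

```r
lon <- c(58.2959, 58.7894, 59.1234)
lat <- c(70.6086, 72.1508, 71.1689)

stacked <- c(lon, lat)      # one vector of length 6: all longitudes, then all latitudes
paired  <- cbind(lon, lat)  # 3 x 2 matrix: one (lon, lat) point per row

length(stacked)
dim(paired)
```

geosphere reads a two-column matrix as one point per row, which is why the cbind form yields one distance per row.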
With distCosine it is a bit trickier, as it doesn't return NA if one of the input latitudes or longitudes is missing. So I modified the function slightly, which solves the problem:
modified_distCosine <- function(Longitude1, Latitude1, Longitude2, Latitude2) {
if (any(is.na(c(Longitude1, Latitude1, Longitude2, Latitude2)))) {
NA
} else {
distCosine(c(Longitude1, Latitude1), c(Longitude2, Latitude2))
}
}
mutate(mydata,
Distance = mapply(modified_distCosine,
Longitude, Latitude, lag(Longitude), lag(Latitude)))
# Callsign Altitude Speed Direction Date_Time Latitude Longitude Distance
# 1 A118 18000 110 340 2017-11-06T22:28:09 70.6086 58.2959 NA
# 2 A118 18500 120 339 2017-11-06T22:29:09 72.1508 58.7894 172569.2
# 3 B222 18500 150 350 2017-11-08T07:28:09 71.1689 59.1234 109928.5
# 4 D123 19000 150 110 2018-05-29T15:13:27 69.4523 68.1235 387356.2
Here I use mapply to apply the modified function to the arguments Longitude, Latitude, lag(Longitude) and lag(Latitude).
I'm quite sure there has to be a more elegant way, but at least this works.
Data
mydata <- structure(list(Callsign = c("A118", "A118", "B222", "D123"),
Altitude = c(18000L, 18500L, 18500L, 19000L),
Speed = c(110L, 120L, 150L, 150L),
Direction = c(340L, 339L, 350L, 110L),
Date_Time = c("2017-11-06T22:28:09", "2017-11-06T22:29:09", "2017-11-08T07:28:09", "2018-05-29T15:13:27"),
Latitude = c(70.6086, 72.1508, 71.1689, 69.4523),
Longitude = c(58.2959, 58.7894, 59.1234, 68.1235)),
.Names = c("Callsign", "Altitude", "Speed", "Direction", "Date_Time", "Latitude", "Longitude"),
class = "data.frame", row.names = c(NA, -4L))
Distance between coordinates in dataframe sequentially?
df["distance"] <- c(NA,
sapply(seq.int(2,nrow(df)), function(i){
distm(c(df$Longitude[i-1],df$Latitude[i-1]),
c(df$Longitude[i], df$Latitude[i]),
fun = distHaversine)
})
)
This generates a vector beginning with NA for the first row. It then iterates from the second row to the last, calculating the distance between each row and the previous one and appending it to the vector.
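As a possible alternative to the loop, the same consecutive distances can be computed in one vectorised call by pairing every row with the one before it. This is a sketch that assumes the column names Longitude and Latitude; the three coordinate pairs are borrowed from the mydata example earlier on this page.

```r
library(geosphere)

# Sample coordinates (first three rows of the mydata example)
df <- data.frame(Longitude = c(58.2959, 58.7894, 59.1234),
                 Latitude  = c(70.6086, 72.1508, 71.1689))

pts <- as.matrix(df[, c("Longitude", "Latitude")])
n <- nrow(pts)

# Rows 1..n-1 paired with rows 2..n; NA for the first row, which has no predecessor
df$distance <- c(NA, distHaversine(pts[-n, , drop = FALSE],
                                   pts[-1, , drop = FALSE]))
df
```

This sketch assumes at least two rows; a single-row data frame would need separate handling.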
Calculating distance between two points using the distm function inside mutate
Perhaps we can use pmap:
library(geosphere)
library(purrr)
pmap_dbl(DF, ~ distm(x = c(..1, ..2), y = c(..3, ..4),
fun = distHaversine) %>% c)
When combined with mutate:
library(dplyr)
DF %>%
mutate(Dist = pmap_dbl(., ~
distm(x = c(..1, ..2), y = c(..3, ..4), fun = distHaversine)))
# A tibble: 10 x 5
# Long1 Lat1 Long2 Lat2 Dist
# <int> <int> <int> <int> <dbl>
# 1 3 3 10 5 808552.
# 2 4 2 2 6 497573.
# 3 5 6 6 4 248726.
# 4 7 10 1 2 1110668.
# 5 2 5 9 10 951974.
# 6 8 7 8 8 111319.
# 7 9 8 7 9 246730.
# 8 6 4 5 1 351986.
# 9 10 1 3 7 1024599.
#10 1 9 4 3 745867.
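If only the row-by-row distance between (Long1, Lat1) and (Long2, Lat2) is needed, a row-wise distm call may not be necessary, since distHaversine accepts two-column matrices and returns one distance per row. A minimal sketch, using three rows copied from the output above (rows 1, 2 and 6) in a plain data frame:

```r
library(geosphere)

# Rows 1, 2 and 6 of the example data shown above
DF <- data.frame(Long1 = c(3, 4, 8), Lat1 = c(3, 2, 7),
                 Long2 = c(10, 2, 8), Lat2 = c(5, 6, 8))

# One distance per row: point (Long1, Lat1) to point (Long2, Lat2)
DF$Dist <- distHaversine(cbind(DF$Long1, DF$Lat1),
                         cbind(DF$Long2, DF$Lat2))
DF
```

The same expression should also work inside mutate, with the bare column names in place of DF$.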