Plotting interpolated data on map with irregular boundaries in R
The problem was solved using the package ‘GADMTools’.
joinmap <- gadm_sp_loadCountries(c("RUS","LTU","LVA", "EST", "FIN", "POL", "BLR"), basefile = "./")
studyarea <- gadm_crop(joinmap, xmin=20.375, ymin=52.375, xmax=31.375, ymax=61.375)
gadm_exportToShapefile(studyarea, "path for shapefile")
Interpolated heat map plot from discrete data points
I adapted an example of scipy.interpolate.griddata, combined with plt.contourf(), as suggested by Matt Pitkin:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata
# data is assumed to be an (N, 3) array: columns are x, y and the observed value
x, y, vals = data[:, 0], data[:, 1], data[:, 2]
X, Y = np.meshgrid(
np.linspace(np.min(x), np.max(x), 100),
np.linspace(np.min(y), np.max(y), 100)
)
interpolated_vals = griddata((x, y), vals, (X, Y), method='cubic')
plt.contourf(X, Y, interpolated_vals)
plt.show()
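One wrinkle worth noting (not part of the original answer): griddata with method='cubic' returns NaN outside the convex hull of the data points, which shows up as blank areas near irregular boundaries. A sketch of filling those gaps with a nearest-neighbour fallback, using made-up synthetic data:

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic scattered observations (stand-in for real data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = rng.uniform(0, 1, 50)
vals = np.sin(3 * x) + np.cos(3 * y)

# Grid that deliberately extends beyond the data's convex hull
X, Y = np.meshgrid(np.linspace(-0.2, 1.2, 50), np.linspace(-0.2, 1.2, 50))

cubic = griddata((x, y), vals, (X, Y), method='cubic')      # NaN outside the hull
nearest = griddata((x, y), vals, (X, Y), method='nearest')  # defined everywhere
filled = np.where(np.isnan(cubic), nearest, cubic)          # cubic inside, nearest outside
```

Whether filling is appropriate depends on your application; for publication figures it may be better to mask the outside area instead.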
Plotting interpolated data on map
I have a number of remarks on your post:
Using kriging
I see that you are using geostatistics to construct your heatmap. You could also consider other interpolation techniques, such as splines (e.g. thin plate splines in the fields package). These make fewer assumptions about the data (e.g. stationarity), and can visualize your data just fine. Fewer assumptions also means less to explain to reviewers if you submit to a journal. You can also compare a few interpolation techniques if you want; see a report I wrote for some tips.
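For readers working in Python rather than R, a rough analogue of fields' thin plate splines is scipy's RBFInterpolator with the thin-plate-spline kernel (my assumption; the R Tps function has extra features such as GCV-chosen smoothing that this sketch does not reproduce):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic observations at scattered locations
rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(80, 2))
vals = np.sin(2 * np.pi * pts[:, 0]) * np.cos(2 * np.pi * pts[:, 1])

# Thin plate spline interpolant; smoothing=0 reproduces the data exactly
tps = RBFInterpolator(pts, vals, kernel='thin_plate_spline', smoothing=0)

# Evaluate on a regular grid for plotting (e.g. with plt.contourf)
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid_vals = tps(np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
```

Unlike kriging, this gives no prediction variance, so use it for visualization rather than uncertainty statements.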
Data projection
I see that you are using lat/lon coordinates for kriging. Edzer Pebesma (author of gstat) remarked that there are no variogram models suitable for lat/lon coordinates. This is because in lat/lon the distances are not straight lines (Euclidean) but run over a sphere (great-circle distances), and there are no covariance functions (or variogram models) that are valid for spherical coordinates. I recommend projecting your data using spTransform from the rgdal package before using automap.
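To see why treating degrees as Euclidean coordinates goes wrong, compare a naive "flat" distance with the great-circle (haversine) distance at the latitudes of this study area. This is a pure-Python illustration; the 6371 km mean Earth radius and ~111.2 km per degree are the usual rough approximations:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Two points at 60 degrees north, 10 degrees of longitude apart
great_circle = haversine_km(60, 20, 60, 30)   # about 555 km

# Naive: treat degrees as planar coordinates at ~111.2 km per degree
naive = math.hypot(60 - 60, 30 - 20) * 111.2  # about 1112 km
```

At this latitude the naive distance is roughly double the true one, so a variogram fitted on raw degree coordinates would badly distort spatial correlation.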
The rgdal package uses the proj4 projection library to perform the calculations. To project your data you first need to define its projection:
proj4string(df) = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
The proj4 string on the right-hand side of the expression above defines the type of projection (+proj), the ellipsoid that was used (+ellps) and the datum (+datum). To understand what these terms mean, you have to imagine the Earth as a potato. The Earth is not perfectly spherical; its overall shape is described by the ellipsoid. Neither is the Earth a perfect ellipsoid: the surface is more irregular, and this irregularity is captured by the datum. See also this article on Wikipedia.
Once you have the projection defined, you can use spTransform:
project_df = spTransform(df, CRS("+proj= etcetc"))
where CRS("+proj etc") defines the target projection. Which projection is appropriate depends on your geographical location and the size of your study area.
Plotting with ggplot2
For adding polygons or polylines to ggplot, please take a look at the documentation of coord_map. It includes an example of using the maps package to plot country boundaries. If you need to load, for example, shapefiles for your study area, you can do so using rgdal. Do remember that ggplot2 works with data.frame's, not SpatialPolygons. You can transform a SpatialPolygons object to a data.frame using:
poly_df = fortify(poly_Spatial)
See also this function I created to plot spatial grids. It works directly on SpatialGrids/Pixels. Note that you need to source one or two additional files from that repository (continuousToDiscrete).
Creating interpolation grid
I created automap to generate an output grid when none is specified. This is done by creating a convex hull around the data points and sampling 5000 points inside it. The boundaries of the prediction area, and the number of points sampled in it (and thus the resolution), are quite arbitrary. For a specific application, the shape of the prediction area can be derived from a polygon, using spsample to sample points inside that polygon. How many points to sample, and thus the resolution, depends on two things:
- the kind of data you have. For example, if your data is very smooth, there is not much point in raising the resolution far beyond that smoothness. Alternatively, if your data has many small-scale structures, you need a high resolution. This is of course only possible if you have the observations to support it.
- the density of the data. If your data is denser, you can raise the resolution.
If you use your interpolated map for subsequent analyses, getting the resolution right is important. If you use the map purely for visualisation purposes, it matters less. Note, however, that in both cases too high a resolution can be misleading as to the accuracy of your predictions, while too low a resolution does not do justice to the data.
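The core idea behind spsample (keep only prediction points that fall inside the study-area polygon, rather than its bounding box) can be sketched in Python with matplotlib's Path.contains_points; the polygon here is a made-up unit square standing in for a real study area:

```python
import numpy as np
from matplotlib.path import Path

# A hypothetical study-area polygon (closed ring of vertices)
polygon = Path([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])

# Regular candidate grid over a larger bounding box
gx, gy = np.meshgrid(np.linspace(0.05, 1.95, 20), np.linspace(0.05, 1.95, 20))
candidates = np.column_stack([gx.ravel(), gy.ravel()])

# Keep only the candidate points that fall inside the polygon
inside = polygon.contains_points(candidates)
grid_points = candidates[inside]
```

Raising the grid density here plays the same role as raising the number of points requested from spsample: finer prediction resolution at higher cost.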
Plotting contour map of interpolated function: unmatching results for different sections of data
Your data are already on a grid. Creating a new grid based on that grid generates something very messy.
Now the array res contains 600 (x, y, z) values. Using reshape(20, 30) tells numpy that these 600 entries are in reality 20 rows of 30 columns. The 'purest' way to show the data is a scatter plot, showing just the data points.
With imshow the data can be shown as an image. Using interpolation='nearest' 'blows up' each z-value to fill a rectangular pixel, while interpolation='bicubic' smoothly smears out these pixels.
contourf creates contours of equal z. Depending on the data, many levels (such as 100) help to get a smooth image, but add little over simply displaying a smoothed image. A limited number of levels (e.g. 20) can help to show the general underlying shape.
Here is some code to compare and experiment with the different approaches:
import numpy as np
import matplotlib.pyplot as plt
# Full axis:
res = np.load("stackoverflow-example-data.npy")
X = res[:, 0].reshape(20, 30)
Y = res[:, 1].reshape(20, 30)
Z = res[:, 2].reshape(20, 30)
fig, axes = plt.subplots(ncols=3, figsize=(12, 4))
axes[0].scatter(X, Y, c=Z, s=10)
axes[1].imshow(Z, origin='lower', interpolation='bicubic', aspect='auto',
extent=[X.min(), X.max(), Y.min(), Y.max()])
contour = axes[2].contourf(X, Y, Z, levels=20)
axes[0].set_title('scatter plot')
axes[1].set_title('imshow, bicubic interpolation')
axes[2].set_title('contour plot, 20 levels')
plt.show()
Experimenting with different colormaps can accentuate different properties of the function you are showing. You could also transform the Z-values (such as np.log(Z) or np.exp(Z)) so that different regions get more or less detail.
# Compare colormaps side by side (assumes X, Y, Z from the block above)
fig, axes = plt.subplots(ncols=3, figsize=(12, 4))
for ax, cmap in zip(axes, ('viridis', 'inferno_r', 'Purples')):
    ax.imshow(Z, origin='lower', interpolation='bicubic', aspect='auto', cmap=cmap,
              extent=[X.min(), X.max(), Y.min(), Y.max()])
    ax.set_title(cmap)
plt.show()
How to plot interpolating data on a projected map using ggplot2 in R
After digging a bit more, I guess you may want this:
Krig = autoKrige(APPT~1,sp_mydata)$krige_output
Krig = Krig[!is.na(over(Krig,as(g,"SpatialPolygons"))),] # take only the points falling in polygons
Krig_df = as.data.frame(Krig)
names(Krig_df) = c("APPT_pred","APPT_var","APPT_stdev","longitude","latitude")
g_fort = fortify(g)
Borders = ggplot() +
geom_raster(data=Krig_df, aes(x=longitude, y=latitude,fill=APPT_pred))+
geom_polygon(data=g_fort,aes(x=long,y=lat,group=group),
fill='transparent',color = "black")+
theme_bw()
Borders
which gives:
Only problem is that you still have "missing" interpolated areas in the resulting map (e.g., on the western part).
This is due to the fact that, as the autoKrige help states:
new_data: A sp object containing the prediction locations. new_data can be a points set, a grid or a polygon. Must not contain NA’s. If this object is not provided a default is calculated. This is done by taking the convex hull of input_data and placing around 5000 gridcells in that convex hull
Therefore, if you do not provide a suitable new_data argument, the interpolated area is limited by the convex hull of the points of the input dataset (i.e. no extrapolation).
This can be solved using spsample in the sp package:
library(sp)
ptsreg <- spsample(g, 4000, type = "regular") # Define the output grid: 4000 points in the polygons' extent
Krig = autoKrige(APPT~1,sp_mydata, new_data = ptsreg)$krige_output
Krig = Krig[!is.na(over(Krig,as(g,"SpatialPolygons"))),] # take only the points falling in polygons
Krig_df = as.data.frame(Krig)
names(Krig_df) = c("longitude","latitude", "APPT_pred","APPT_var","APPT_stdev")
g_fort = fortify(g)
Borders = ggplot() +
geom_raster(data=Krig_df, aes(x=longitude, y=latitude,fill=APPT_pred))+
geom_polygon(data=g_fort,aes(x=long,y=lat,group=group),
fill='transparent',color = "black")+
theme_bw()
Borders
which gives:
Notice that the small "holes" still present near the polygon boundaries can be removed by increasing the number of interpolation points in the call to spsample (since it is a slow operation, I only asked for 4000 here).
A simpler, quicker alternative could be to use the mapview package:
library(mapview)
m1 <- mapview(Krig)
m2 <- mapview(g)
m2+m1
(you may want to use a less detailed polygon boundaries shapefiles, since this is slow)
HTH!