Maps, Ggplot2, Fill by State Is Missing Certain Areas on the Map

I experimented with your code, and the problem appears when you use merge. Drawing the state outlines with geom_path showed a couple of stray lines that do not exist in the original map data. I then compared merge with inner_join. Both return the same rows here, but with merge the row order changed, so the order column was no longer in sequence; this was not the case with inner_join. Polygons are drawn by connecting the vertices in row order, so shuffled rows produce stray lines and missing areas. The California rows below show the difference. Your approach was right, but merge does not promise to keep the rows within each state in their original order, and that worked against you here.

library(ggplot2) # provides map_data() and ggplot()
library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)

### Check if both files have identical state names: The answer is NO
### states$region does not have Alaska, Hawaii, and Washington D.C.
### fbi$state does not have District of Columbia.

setdiff(fbi$state, states$region)
#[1] "alaska" "hawaii" "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
  filter(Year == 2012) %>%
  select(state, Robbery)

Now I created two data frames with merge and inner_join.

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))

ana %>%
filter(state == "california") %>%
slice(1:5)

# state Robbery long lat group order subregion
#1 california 56521 -119.8685 38.90956 4 676 <NA>
#2 california 56521 -119.5706 38.69757 4 677 <NA>
#3 california 56521 -119.3299 38.53141 4 678 <NA>
#4 california 56521 -120.0060 42.00927 4 667 <NA>
#5 california 56521 -120.0060 41.20139 4 668 <NA>

bob %>%
filter(state == "california") %>%
slice(1:5)

# state Robbery long lat group order subregion
#1 california 56521 -120.0060 42.00927 4 667 <NA>
#2 california 56521 -120.0060 41.20139 4 668 <NA>
#3 california 56521 -120.0060 39.70024 4 669 <NA>
#4 california 56521 -119.9946 39.44241 4 670 <NA>
#5 california 56521 -120.0060 39.31636 4 671 <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
geom_polygon()
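
If you would rather keep merge(), a common workaround is to sort the merged rows by the order column before plotting. The sketch below uses made-up toy data frames (shape and values are hypothetical names standing in for map_data("state") and the crime table), so treat it as an illustration rather than a drop-in fix:

```r
# Hypothetical toy polygon data standing in for map_data("state")
shape <- data.frame(
  region = c("b", "b", "b", "a", "a", "a"),
  long   = c(0, 1, 1, 5, 6, 6),
  lat    = c(0, 0, 1, 5, 5, 6),
  order  = 1:6
)

# Hypothetical per-region values standing in for the crime data
values <- data.frame(region = c("a", "b"), value = c(10, 20))

merged <- merge(shape, values, by = "region")

# merge() does not promise to keep the original row order,
# so restore the vertex sequence explicitly before plotting
merged <- merged[order(merged$order), ]

print(merged$order)
```

After the re-sort, the order column runs from 1 to 6 again, which is what geom_polygon needs to connect the vertices correctly.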

Region missing parts of map in ggplot2 in R

There is no need to join the data or to use a horrible projection: geom_map matches each row of your data to the map polygons through map_id. Note that your data is not really continuous, and you should decide how to normalize it (probably by the estimated number of Twitter users per state, or per 100,000 population).

library(ggplot2)
library(ggthemes)
library(viridis)

my_data <- structure(list(region = c("alabama", "alaska", "arkansas", "arizona",
"california", "colorado", "connecticut", "delaware", "florida",
"georgia", "hawaii", "iowa", "idaho", "illinois", "indiana",
"kansas", "kentucky", "louisiana", "massachusetts", "maryland",
"maine", "michigan", "minnesota", "missouri", "mississippi",
"montana", "north carolina", "north dakota", "nebraska", "new hampshire",
"new jersey", "new mexico", "nevada", "new york", "ohio", "oklahoma",
"oregon", "pennsylvania", "rhode island", "south carolina", "south dakota",
"tennessee", "texas", "utah", "virginia", "vermont", "washington",
"wisconsin", "wyoming", "west virginia"), number_of_tweets = c(10929L,
0L, 5107L, 452L, 26299L, 265L, 1459L, 2418L, 9666L, 7306L, 2486L,
29229L, 7607L, 10221L, 20700L, 32252L, 11098L, 938L, 10764L,
4091L, 5770L, 47335L, 1079L, 1079L, 1273L, 11606L, 22354L, 6294L,
7319L, 7185L, 26850L, 0L, 7918L, 16007L, 8284L, 63551L, 1120L,
908L, 10240L, 6296L, 3559L, 4765L, 30235L, 15019L, 5541L, 16444L,
7506L, 7817L, 10496L, 0L)), .Names = c("region", "number_of_tweets"
), class = "data.frame", row.names = c(NA, -50L))

states <- map_data("state")

my_data$cut <- as.character(cut(my_data$number_of_tweets,
breaks=pretty(x=my_data$number_of_tweets, n=7),
labels=pretty(x=my_data$number_of_tweets, n=7)[-1]))

my_data$cut <- ifelse(is.na(my_data$cut), 0, my_data$cut)

gg <- ggplot()
gg <- gg + geom_map(data=states, map=states,
                    aes(x=long, y=lat, map_id=region),
                    color="white", size=0.1, fill=NA)
gg <- gg + geom_map(data=my_data, map=states,
                    aes(fill=cut, map_id=region),
                    color="white", size=0.1)
gg <- gg + scale_fill_viridis(name="# Tweets", discrete=TRUE, begin=0.1, end=0.9)
gg <- gg + coord_map("polyconic")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg
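
The normalization suggested above could look roughly like this. It is only a sketch: the region names and the population column are placeholders invented for illustration, not real census figures, and you would substitute your own per-state populations:

```r
# Placeholder data: two made-up regions with the same raw tweet count
tweets <- data.frame(
  region           = c("state_a", "state_b"),
  number_of_tweets = c(5000, 5000),
  population       = c(1e6, 4e6)  # invented values, not real populations
)

# Tweets per 100,000 residents
tweets$tweets_per_100k <- tweets$number_of_tweets / tweets$population * 1e5

print(tweets)
```

The two regions have identical raw counts but very different rates, which is why the normalized column is usually the better thing to map to fill.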

How to fill out white/missing parts of the map in R?

The solution was to use left_join from dplyr instead of merge, because left_join keeps the polygon rows in their original order:

rm(list=ls())
library(tidyverse)
library(maptools)
library(raster)
library(plotrix)
library(ggrepel)

df2016 <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
name value amount
LD 1 3
ZE 1 2
WS 0.79 19
ML 0.75 12
HS 0.75 4
TQ 0.74 38
WN 0.73 15
CA 0.71 28
HU 0.7 33
FY 0.69 16
HG 0.69 16
IV 0.68 19
DL 0.68 25
CB 0.68 115
TS 0.67 46
IP 0.67 87
AB 0.67 66
NP 0.67 45
FK 0.67 18
IM 0.67 9
SM 0.66 50
HD 0.66 32
EN 0.66 61
CO 0.65 52
ME 0.65 54
PE 0.64 266
EX 0.64 81
WV 0.63 49
JE 0.63 24
NE 0.62 148
YO 0.62 47
DE 0.62 78
LN 0.61 36
SN 0.61 109
IG 0.6 63
NR 0.6 90
SP 0.59 37
BA 0.59 93
UB 0.59 127
TN 0.59 95
BT 0.59 180
BD 0.59 51
HP 0.59 126
TA 0.59 46
PO 0.58 113
DH 0.58 55
WD 0.58 102
BH 0.57 96
DG 0.57 14
CV 0.57 225
RG 0.57 255
BN 0.56 158
DY 0.56 48
HA 0.56 148
W 0.56 359
WA 0.56 77
DA 0.55 38
CT 0.55 62
GU 0.55 231
RH 0.55 132
BL 0.55 33
HX 0.55 11
BS 0.54 184
SS 0.54 46
EH 0.54 185
DT 0.54 37
G 0.54 137
B 0.54 283
LU 0.54 41
NG 0.54 97
OX 0.53 208
S 0.53 179
CM 0.53 100
DD 0.53 17
GL 0.53 87
AL 0.53 89
HR 0.53 38
LS 0.52 122
TF 0.52 21
RM 0.52 44
SL 0.52 155
MK 0.52 136
SY 0.52 46
DN 0.52 81
N 0.52 191
M 0.52 226
SR 0.52 29
SK 0.52 64
BB 0.51 140
KY 0.51 41
WF 0.51 51
PR 0.51 63
L 0.51 81
KT 0.5 185
CF 0.5 118
ST 0.5 84
TR 0.5 46
CW 0.5 44
TD 0.5 12
P 0.5 2
SW 0.5 317
LL 0.49 49
CH 0.49 43
E 0.49 275
EC 0.48 364
PA 0.48 27
SO 0.48 157
CR 0.48 84
PL 0.48 61
SG 0.47 59
KA 0.47 15
LA 0.47 43
SA 0.46 78
LE 0.46 194
TW 0.45 125
OL 0.44 41
SE 0.44 297
NN 0.43 143
NW 0.42 236
WC 0.41 138
WR 0.38 73
BR 0.37 62
GY 0.26 35
PH 0.23 13
")

# Download a shapefile of postal codes into your working directory
download.file(
"http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile",
mode = "wb" # binary mode so the zip file is not corrupted on Windows
)

# Unzip the shapefile
unzip("postal_shapefile")

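# Read the shapefile (readShapeSpatial comes from maptools, which has since been archived on CRAN;
# sf::st_read() is a possible replacement, but the fortify() step below would then need changing)
postal <- readShapeSpatial("./Distribution/Areas")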

postal.df <- fortify(postal, region = "name")

# Join your data to the shapefile
colnames(postal.df)[colnames(postal.df) == "id"] <- "name"

library(dplyr)
test <- left_join(postal.df, df2016, by = "name", copy = FALSE)

#postal.df <- raster::merge(postal.df, df2016, by = "name")

test$value[is.na(test$value)] <- 0.50

# for use in plotting area names.

postal.centroids.df <- data.frame(long = coordinates(postal)[, 1],
lat = coordinates(postal)[, 2],
id=postal$name)

p <- ggplot(test, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = cut(value, 5))) +
  # check_overlap is a geom_text() argument, not a geom_text_repel() one, so it is dropped here
  geom_text_repel(data = postal.centroids.df, aes(label = id, x = long, y = lat, group = id), size = 3) +
  labs(x = " ", y = " ") +
  theme_bw() +
  scale_fill_brewer('Success Rate 2016', palette = 15) +
  coord_map() +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank()) +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank()) +
  theme(panel.border = element_blank())
p

R ggplot2 mapping issue, automate missing State info

So this turns out to be a common problem. Generally choropleth maps require some sort of merge of the map data with the dataset containing the information used to set the polygon fill colors. In OP's case this is done as follows:

states <- map_data("state")
states <- merge(states,statelookup,by="region")
penetration_levels <- merge(states,penetration_levels,by="State")

The problem is that, if penetration_levels has any missing States, these rows will be excluded from the merge (in database terminology, this is an inner join). So in rendering the map, those polygons will be missing. The solution is to use:

penetration_levels <- merge(states,penetration_levels,by="State",all.x=T)

This returns all rows of the first argument (the "x" argument), merged with any data from matching states in the second argument (a left join). Missing values are set to NA.

The fill color of polygons (states) with NA values is set by default to grey50, but can be changed by adding the following call to the plot definition:

scale_fill_gradient(na.value="red")
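
To see the difference between the two merges, here is a small sketch on made-up data (shape and levels_df are hypothetical stand-ins for the map polygons and for penetration_levels). It also re-sorts by order afterwards, since merge may change the row order:

```r
# Hypothetical polygon rows for two states
shape <- data.frame(
  State = c("A", "A", "A", "B", "B", "B"),
  long  = c(0, 1, 1, 2, 3, 3),
  lat   = c(0, 0, 1, 0, 0, 1),
  order = 1:6
)

# Hypothetical data set that has no row for state "B"
levels_df <- data.frame(State = "A", penetration = 0.4)

# Inner join: the polygon rows for "B" disappear
inner <- merge(shape, levels_df, by = "State")

# Left join: all polygon rows are kept, "B" gets NA
left <- merge(shape, levels_df, by = "State", all.x = TRUE)
left <- left[order(left$order), ]  # restore the vertex sequence

nrow(inner)                   # 3
nrow(left)                    # 6
sum(is.na(left$penetration))  # 3
```

With the left join, state "B" is still drawn and simply takes the na.value colour.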

How do I fix this missing variable problem in ggplot when tmap works fine?

Without a reproducible example of your dataset, it's hard to be sure of the solution to your question, but you may be able to plot your data with ggplot2 by converting it to an sf object and then using geom_sf:

library(sf)
library(sp)
library(ggplot2)
# Convert to sf, then plot the converted object (not the original Spatial object)
SF_Obj <- st_as_sf(SouthIslandTAs, fill = TRUE, plot = FALSE)
ggplot() + geom_sf(data = SF_Obj, aes(fill = TA2013_label))

Here is an example using US maps from the raster package:

States <- raster::getData("GADM", country = "United States", level = 1)  
ggplot() + geom_polygon(data = States, aes(x=long, y = lat, group = group, fill = NAME_1))

I get the same error as you:

Regions defined for each Polygons
Error in FUN(X[[i]], ...) : object 'NAME_1' not found

But when I do:

library(sf)
library(sp)
library(ggplot2)
library(dplyr)
sf_states <- sf::st_as_sf(States, plot = FALSE, fill = TRUE)
sf_states %>% filter(!(NAME_1 %in% c("Alaska","Hawaii"))) %>%
ggplot() + geom_sf(aes(fill = NAME_1), show.legend = FALSE)

This draws the contiguous states, each filled by NAME_1, without the error.

Assigning specific filling color

To assign specific colors starting from the sf object, you can create a new column with some color names specified and then use scale_fill_identity:

library(sf)
library(sp)
library(ggplot2)
library(dplyr)
sf_states %>% filter(!(NAME_1 %in% c("Alaska","Hawaii"))) %>%
mutate(COLOR = ifelse(NAME_1 %in% c("Oregon","Florida"),"green","red")) %>%
ggplot() + geom_sf(aes(fill = COLOR), show.legend = FALSE)+
scale_fill_identity()

If you prefer filling with 0 and 1 depending on the state, you can get the same plot by doing:

sf_states %>% filter(!(NAME_1 %in% c("Alaska","Hawaii"))) %>% 
mutate(COLOR = ifelse(NAME_1 %in% c("Oregon","Florida"),1,0)) %>%
ggplot() + geom_sf(aes(fill = as.factor(COLOR)), show.legend = FALSE)+
scale_fill_manual(values = c("red","green"))

Does that answer your question? If not, please consider providing a reproducible example of your dataset.

Corrupted color fill of the world map - using geom_map

It's best to give us something as code that replicates the problem you are having. I was able to replicate your code without using the link you provided. My suggestion would be to use left_join() instead of merge() and replace_na() instead of the for loop.

library(maps)
library(tidyverse)
library(mapdata)
library(ggthemes)
library(mapproj)

m <-
map_data("world")

mapka <-
m %>%
distinct(region) %>%
slice(1:100) %>%
mutate(a = c(1:100)*40)

# replicates your issue
choro <- merge(m, mapka, by = "region", all.x = TRUE)

for (i in 1:nrow(choro)) {
if (is.na(choro$a[i]) == TRUE) {
choro$a[i] <- 0
}
}

ggplot() +
geom_map(
data = choro, map = choro,
aes(long, lat, map_id = region, fill = a)
)

# using left_join and replace_na
choro <-
  m %>%
  left_join(mapka, by = "region") %>%
  mutate(a = replace_na(a, 0))

ggplot() +
geom_map(
data = choro, map = choro,
aes(long, lat, map_id = region, fill = a)
)
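
If you want to stay in base R, a lookup with match() is another way to attach the values without touching the row order. This is a swapped-in alternative to left_join, not what the code above does, and the toy data frames (shape, values) are invented for illustration:

```r
# Hypothetical polygon rows; region "c" has no value in the lookup table
shape <- data.frame(
  region = c("b", "b", "a", "a", "c", "c"),
  order  = 1:6
)
values <- data.frame(region = c("a", "b"), a = c(40, 80))

# Look up each polygon row's value; rows stay exactly where they are
shape$a <- values$a[match(shape$region, values$region)]

# Replace the unmatched (NA) values with 0, as replace_na() does above
shape$a[is.na(shape$a)] <- 0

print(shape)
```

Because match() only adds a column, the order column is untouched and no re-sorting is needed.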

ggplot centered names on a map

Since you are creating two layers (one for the polygons and the second for the labels), you need to specify the data source and mapping correctly for each layer:

ggplot(ny, aes(long, lat)) +  
geom_polygon(aes(group=group), colour='black', fill=NA) +
geom_text(data=cnames, aes(long, lat, label = subregion), size=2)

Note:

  • Since long and lat occur in both data frames, you can use aes(long, lat) in the first call to ggplot. Any mapping you declare here is available to all layers.
  • The group column exists only in ny, so you need to declare aes(group=group) inside the polygon layer rather than in ggplot().
  • In the text layer, you need to move the data source outside the aes.

Once you've done that and the map plots, you'll see that the midpoint of each county is better approximated by the mean of the range of its coordinates, and that you should use a map coordinate system that respects the aspect ratio and projection:

cnames <- aggregate(cbind(long, lat) ~ subregion, data=ny, 
FUN=function(x)mean(range(x)))

ggplot(ny, aes(long, lat)) +
geom_polygon(aes(group=group), colour='black', fill=NA) +
geom_text(data=cnames, aes(long, lat, label = subregion), size=2) +
coord_map()
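
To see why the mean of the range can be a better label position than a plain mean, consider a made-up outline whose vertices are bunched up in one corner. The data frame pts is hypothetical and only illustrates the aggregate() call used above:

```r
# Hypothetical outline: three vertices clustered near the origin, one far away
pts <- data.frame(
  subregion = "x",
  long      = c(0, 0.1, 0.2, 10),
  lat       = c(0, 0, 0, 10)
)

# Plain mean: pulled towards the cluster of vertices
avg <- aggregate(cbind(long, lat) ~ subregion, data = pts, FUN = mean)

# Mean of range: the centre of the bounding box
mid <- aggregate(cbind(long, lat) ~ subregion, data = pts,
                 FUN = function(x) mean(range(x)))

print(avg)
print(mid)  # long = 5, lat = 5
```

The bounding-box centre ignores how densely the boundary happens to be sampled, which matters for map outlines where detailed coastlines contribute many more points than straight borders.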

How can I highlight some specific municipalities in a map with ggplot?

You are almost there. As with a normal ggplot graph, you need a column that can serve as a grouping variable for the fill colours. If you have a list of coastal municipalities, you can use it to create such a column, as I do below with coastal_cmun. Use this column for the fill value and use scale_fill_manual (or any other discrete fill scale) to adjust the colours. Hide the legend if needed.

# coastal municipalities
coast <- c("005", "025", "041")
malaga$coastal_cmun <- ifelse(malaga$cmun %in% coast, "coast", "inland")

ggplot(malaga) +
geom_sf(aes(fill = coastal_cmun)) +
scale_fill_manual(values = c("red2", "grey50")) + # order by factor of coastal_cmun
labs(
fill = "",
title = "Municipalities",
subtitle = "Málaga"
) +
theme_void() +
theme(
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(face = "italic")
)
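
The colours in scale_fill_manual are assigned in the order of the factor levels, which for a character column is alphabetical. If you would rather not depend on that order, you can pass a named vector instead. A small sketch, using made-up municipality codes (cmun here is a hypothetical vector, not the malaga data):

```r
coast <- c("005", "025", "041")
cmun  <- c("001", "005", "025", "099")  # hypothetical municipality codes

coastal_cmun <- ifelse(cmun %in% coast, "coast", "inland")

# Levels are alphabetical, so "coast" comes before "inland"
levels(factor(coastal_cmun))

# A named vector ties each colour to its category explicitly
fill_values <- c(coast = "red2", inland = "grey50")

# Colour each municipality would receive
unname(fill_values[coastal_cmun])
```

You would then use scale_fill_manual(values = fill_values) in the plot above, and the mapping stays correct even if the category labels change.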

geom_map all subregions the same color

This is documented behavior of geom_map: it always matches map_id against the region column (or, alternatively, id) of the map data frame, here states_map. You can confirm this as follows. Running:

ny$region = ny$subregion

puts the subregion names into the region column. Plotting now produces the correct map, with the subregions filled individually.

So, geom_map uses the region or id.
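
A minimal sketch of the same idea on a made-up two-square "map" (tiny_map and vals are hypothetical, and ggplot2 is assumed to be installed):

```r
library(ggplot2)

# Hypothetical map: two unit squares that share one region name
tiny_map <- data.frame(
  long      = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat       = c(0, 0, 1, 1, 0, 0, 1, 1),
  group     = rep(1:2, each = 4),
  order     = 1:8,
  region    = "new york",
  subregion = rep(c("east", "west"), each = 4)
)

# Hypothetical values keyed by subregion
vals <- data.frame(subregion = c("east", "west"), value = c(1, 2))

# geom_map looks up map_id in the region (or id) column of the map,
# so the names to match have to live there
tiny_map$region <- tiny_map$subregion

p <- ggplot(vals) +
  geom_map(aes(map_id = subregion, fill = value), map = tiny_map) +
  expand_limits(x = tiny_map$long, y = tiny_map$lat)
```

Printing p should draw the two squares in different colours; without the assignment to tiny_map$region, no map_id value would match the map.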
