How to drop factor levels while scraping data off US Census HTML site
It's actually R-FAQ 7.10:
You should be able to see the FAQ through R's help system (for example via help.start()). On my machine it is served as HTML:
http://127.0.0.1:23603/doc/manual/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
7.10 How do I convert factors to numeric?
It may happen that when reading numeric data into R (usually, when reading in a file), they come in as factors. If f is such a factor object, you can use
as.numeric(as.character(f))
to get the numbers back. More efficient, but harder to remember, is
as.numeric(levels(f))[as.integer(f)]
In any case, do not call as.numeric() or their likes directly for the task at hand (as as.numeric() or unclass() give the internal codes).
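A small illustration of the pitfall the FAQ describes, using a made-up factor (not the Census data from the question). Note that the levels sort as character strings, so the internal codes bear no relation to the numbers:

```r
# Made-up factor whose levels are numbers stored as text;
# levels(f) is "10" "20" "5" because levels sort as strings
f <- factor(c("10", "5", "10", "20"))

codes   <- as.numeric(f)                        # internal codes, NOT the numbers: 1 3 1 2
values  <- as.numeric(as.character(f))          # the easy-to-remember fix: 10 5 10 20
values2 <- as.numeric(levels(f))[as.integer(f)] # faster: converts each level only once

print(codes)
print(values)
print(values2)
```

The second form is quicker on long vectors because as.numeric() runs only over the (usually few) levels, not over every element.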
How to get the levels number in R?
The as.data.frame method for objects of class "table" returns the first column (along with any other "marginal label" columns) as a factor, and only the last column as the numeric counts. See the help page ?table and look at the Value section. Tyler's recommendation to use the R-FAQ's as.numeric(as.character(.)) conversion strategy is "standard R".
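A quick sketch of that behaviour with a made-up numeric vector: the tabulated values come back as a factor column, the counts as the last column (named Freq by default), and the R-FAQ idiom recovers the numbers:

```r
# Made-up numeric vector standing in for scraped data
x <- c(3, 1, 3, 2, 3)

# Columns: x (a factor with levels "1" "2" "3") and Freq (the counts)
tab <- as.data.frame(table(x))
str(tab)

# Recover the tabulated values as numbers via the R-FAQ idiom
tab$value <- as.numeric(as.character(tab$x))
print(tab)
```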
Merge data frame with SpatialPolygonsDataFrame
Caveat: I've never done this before, so I'm "feeling my way around". First look at the object states:
Note: this was with rgdal_0.9-3 and sp_1.1-1 loaded under R 3.2.1 (and with GDAL installed on my OS X system from KyngChaos, IIRC):
> str(states)
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 52 obs. of 9 variables:
.. ..$ STATEFP : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ STATENS : Factor w/ 52 levels "00068085","00294478",..: 22 17 2 18 27 28 29 30 16 19 ...
.. ..$ AFFGEOID: Factor w/ 52 levels "0400000US01",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ GEOID : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ STUSPS : Factor w/ 52 levels "AK","AL","AR",..: 5 8 10 11 14 15 13 18 19 21 ...
.. ..$ NAME : Factor w/ 52 levels "Alabama","Alaska",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ LSAD : Factor w/ 1 level "00": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ ALAND : num [1:52] 4.03e+11 1.58e+08 1.39e+11 1.49e+11 2.14e+11 ...
.. ..$ AWATER : num [1:52] 2.05e+10 1.86e+07 3.14e+10 4.95e+09 2.40e+09 ...
..@ polygons :List of 52
.. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
.. .. .. ..@ Polygons :List of 6
.. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
.. .. .. .. .. .. ..@ labpt : num [1:2] -118.4 33.4
.. .. .. .. .. .. ..@ area : num 0.0259
.. .. .. .. .. .. ..@ hole : logi FALSE
##### Snipped rest of output ............................
So after looking for help on merge and reading:
?merge # and choosing the option for:
Merge a Spatial* object having attributes with a data.frame
(in package sp in library /Library/Frameworks/R.framework/Versions/3.2/Resources/library)
I decided to try the following (and appear to have succeeded):
> newobj <- merge(states, my_counts, by.x="STUSPS", by.y="State")
Warning message:
In .local(x, y, ...) : 8 records in y cannot be matched to x
> names(newobj@data)
[1] "STUSPS" "STATEFP" "STATENS" "AFFGEOID" "GEOID" "NAME"
[7] "LSAD" "ALAND" "AWATER" "count"
The warning makes sense. You seem to have some extra "States" not anticipated by the authors of that "states" shapefile:
> length( table(my_counts$State))
[1] 60
> length( unique(states@data$STUSPS) )
[1] 52
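To see exactly which codes trigger the warning, compare the key columns directly. This is a sketch with small made-up stand-ins for states@data and my_counts (the real objects aren't reproduced here). Base merge() with all.x = TRUE is used to mimic what I understand to be the Spatial* method's default of keeping every polygon even when it has no match; base merge() also reorders rows by the key, so this only illustrates the matching:

```r
# Hypothetical stand-ins for states@data and my_counts
state_attrs <- data.frame(STUSPS = c("AL", "AK", "AZ"),
                          NAME   = c("Alabama", "Alaska", "Arizona"),
                          stringsAsFactors = FALSE)
my_counts <- data.frame(State = c("AL", "AZ", "GU", "AE"),
                        count = c(4, 7, 1, 2),
                        stringsAsFactors = FALSE)

# Keys in my_counts with no matching polygon: these are what the
# "records in y cannot be matched to x" warning is counting
unmatched <- setdiff(my_counts$State, state_attrs$STUSPS)
print(unmatched)

# Keep every state row; states with no count get NA
merged <- merge(state_attrs, my_counts, by.x = "STUSPS", by.y = "State",
                all.x = TRUE)
print(merged)
```

Running the same setdiff() on the real objects (using as.character() if the columns are factors) should list the 8 unmatched codes.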
The moral
You should look at the names() values in the two objects when you are merging:
> names(states)
[1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME" "LSAD"
[8] "ALAND" "AWATER"
> names(my_counts)
[1] "State" "count"
R convert zipcode or lat/long to county
I ended up using the suggestion from Josh O'Brien mentioned above and found here. I took his code and changed state to county, as shown here:
library(sp)
library(maps)
library(maptools)  # note: maptools was archived from CRAN in 2023; install it from the CRAN archive if needed
# The single argument to this function, pointsDF, is a data.frame in which:
#   - column 1 contains the longitude in degrees (negative in the US)
#   - column 2 contains the latitude in degrees
latlong2county <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per county
    counties <- map('county', fill=TRUE, col="transparent", plot=FALSE)
    # Names look like "state,county"; multi-part counties carry a ":part"
    # suffix, which is stripped so all parts share one ID
    IDs <- sapply(strsplit(counties$names, ":"), function(x) x[1])
    counties_sp <- map2SpatialPolygons(counties, IDs=IDs,
                       proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                    proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, counties_sp)
    # Return the county names of the Polygons object containing each point
    countyNames <- sapply(counties_sp@polygons, function(x) x@ID)
    countyNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
latlong2county(testPoints)
[1] "wisconsin,juneau" "oregon,crook" # IT WORKS
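The key step is over(): for each point it returns the position of the polygon containing it (NA if none), and countyNames[indices] turns those positions into names. Below is a toy, base-R-only sketch of that idea using an even-odd (ray-casting) test on two made-up unit squares. The helpers point_in_poly() and which_polygon() are hypothetical names for illustration; this is not how sp implements over(), and it ignores holes, multi-part polygons and points lying exactly on a boundary:

```r
# Hypothetical helper: even-odd (ray casting) test for whether the point
# (px, py) lies inside the polygon with vertex coordinates xs, ys
point_in_poly <- function(px, py, xs, ys) {
  inside <- FALSE
  j <- length(xs)
  for (i in seq_along(xs)) {
    # Toggle when the edge from vertex j to vertex i straddles y = py
    # and crosses that line to the right of px
    if (((ys[i] > py) != (ys[j] > py)) &&
        (px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical helper mimicking the idea of over(): for each point, the
# index of the first polygon that contains it, or NA if none does
which_polygon <- function(points, polys) {
  vapply(seq_len(nrow(points)), function(k) {
    hits <- which(vapply(polys, function(p)
      point_in_poly(points$x[k], points$y[k], p$x, p$y), logical(1)))
    if (length(hits) > 0) hits[[1]] else NA_integer_
  }, integer(1))
}

# Two made-up unit squares standing in for counties
polys <- list(
  west = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  east = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
pts <- data.frame(x = c(0.5, 1.5, 3), y = c(0.5, 0.5, 0.5))

idx <- which_polygon(pts, polys)   # 1 2 NA
names(polys)[idx]                  # "west" "east" NA
```

As in latlong2county(), indexing the name vector with the returned positions gives NA for any point that falls outside every polygon.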