How to Drop Factor Levels While Scraping Data Off US Census HTML Site

It's actually R-FAQ 7.10:

You should be able to find the FAQ through R's help system. On my machine it is served as HTML:

http://127.0.0.1:23603/doc/manual/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

7.10 How do I convert factors to numeric?

It may happen that when reading numeric data into R (usually, when reading in a file), they come in as factors. If f is such a factor object, you can use

as.numeric(as.character(f))

to get the numbers back. More efficient, but harder to remember, is

as.numeric(levels(f))[as.integer(f)]

In any case, do not call as.numeric() or their likes directly for the task at hand (as as.numeric() or unclass() give the internal codes).
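
To make the difference concrete, here is a small sketch on a made-up factor (the values are invented for illustration and are not from the Census data). It shows the internal codes next to the two conversions the FAQ recommends:

```r
# Toy factor holding numbers stored as text (illustrative values only)
f <- factor(c("10", "20", "10", "35"))

# Calling as.numeric() directly returns the internal integer codes,
# i.e. positions in levels(f), not the numbers themselves
codes <- as.numeric(f)

# FAQ 7.10, readable version: go through character first
vals <- as.numeric(as.character(f))

# FAQ 7.10, more efficient version: convert the levels once, then index
vals2 <- as.numeric(levels(f))[as.integer(f)]

print(codes)  # 1 2 1 3
print(vals)   # 10 20 10 35
print(vals2)  # 10 20 10 35
```

The second form converts each distinct level only once, which is why the FAQ calls it more efficient for long factors with few levels.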

How to get the levels number in R?

The as.data.frame method for objects of class "table" returns the first column (along with any other "marginal labels" columns) as a factor, and only the last column as the numeric counts. See the help page ?table and look at the Value section. Tyler's recommendation to use the as.numeric(as.character(.)) conversion strategy from the R-FAQ is "standard R".
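
A minimal sketch of that behaviour, using an invented numeric vector x rather than the asker's data:

```r
# Invented input: numeric values with repeats
x <- c(3, 1, 3, 7, 3, 1)

# as.data.frame() on a table: the label column is a factor, Freq holds counts
counts_df <- as.data.frame(table(x))
str(counts_df)

# Recover the original numeric values from the factor labels
counts_df$value <- as.numeric(as.character(counts_df$x))

# For comparison: direct conversion only yields the level codes 1, 2, 3
counts_df$code <- as.numeric(counts_df$x)

print(counts_df)
```

Here value comes back as 1, 3, 7 while code is just 1, 2, 3, which is the trap the FAQ warns about.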

Merge data frame with SpatialPolygonsDataFrame

Caveat: I've never done this before so I'm "feeling my way around". First look at the states object:

Note: this was with rgdal_0.9-3 and sp_1.1-1 loaded under R 3.2.1 (and with GDAL installed on my OSX system, from kyngchaos, IIRC):

> str(states)
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 52 obs. of 9 variables:
.. ..$ STATEFP : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ STATENS : Factor w/ 52 levels "00068085","00294478",..: 22 17 2 18 27 28 29 30 16 19 ...
.. ..$ AFFGEOID: Factor w/ 52 levels "0400000US01",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ GEOID : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ STUSPS : Factor w/ 52 levels "AK","AL","AR",..: 5 8 10 11 14 15 13 18 19 21 ...
.. ..$ NAME : Factor w/ 52 levels "Alabama","Alaska",..: 5 9 10 11 13 14 16 18 19 21 ...
.. ..$ LSAD : Factor w/ 1 level "00": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ ALAND : num [1:52] 4.03e+11 1.58e+08 1.39e+11 1.49e+11 2.14e+11 ...
.. ..$ AWATER : num [1:52] 2.05e+10 1.86e+07 3.14e+10 4.95e+09 2.40e+09 ...
..@ polygons :List of 52
.. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
.. .. .. ..@ Polygons :List of 6
.. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
.. .. .. .. .. .. ..@ labpt : num [1:2] -118.4 33.4
.. .. .. .. .. .. ..@ area : num 0.0259
.. .. .. .. .. .. ..@ hole : logi FALSE
##### Snipped rest of output ............................

So after looking for help on merge and reading:

 ?merge   # and choosing the option for:

Merge a Spatial* object having attributes with a data.frame
(in package sp in library /Library/Frameworks/R.framework/Versions/3.2/Resources/library)

I decided to try (and appear to have succeeded):

> newobj <- merge(states, my_counts, by.x="STUSPS", by.y="State")
Warning message:
In .local(x, y, ...) : 8 records in y cannot be matched to x

> names(newobj@data)
[1] "STUSPS" "STATEFP" "STATENS" "AFFGEOID" "GEOID" "NAME"
[7] "LSAD" "ALAND" "AWATER" "count"

The warning makes sense. You seem to have some extra "States" not anticipated by the authors of that "states" shp-file:

> length( table(my_counts$State))
[1] 60
> length( unique(states@data$STUSPS) )
[1] 52

The moral

You should look at the names and values in the two objects when you are merging:

> names(states)
[1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME" "LSAD"
[8] "ALAND" "AWATER"

> names(my_counts)
[1] "State" "count"
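
Beyond the column names, it can help to compare the key values themselves before merging. The following is only a sketch on plain data frames with invented rows (states_df stands in for states@data, and the codes "GU" and "XX" are made up), using base merge() rather than the sp method:

```r
# Stand-in for states@data (hypothetical rows)
states_df <- data.frame(STUSPS = c("CA", "OR", "WA"),
                        NAME   = c("California", "Oregon", "Washington"))

# Stand-in for my_counts, with two codes the shapefile does not know
my_counts <- data.frame(State = c("CA", "OR", "GU", "XX"),
                        count = c(10L, 4L, 2L, 1L))

# Compare keys as character so factor levels cannot get in the way
key_x <- as.character(states_df$STUSPS)
key_y <- as.character(my_counts$State)

unmatched_y <- setdiff(key_y, key_x)  # in my_counts but not in states
unmatched_x <- setdiff(key_x, key_y)  # in states but without a count

print(unmatched_y)  # "GU" "XX"
print(unmatched_x)  # "WA"

# all.x = TRUE is a choice made here to keep every state, with NA counts
merged <- merge(states_df, my_counts,
                by.x = "STUSPS", by.y = "State", all.x = TRUE)
print(merged)
```

Listing the unmatched keys up front tells you exactly which records the "cannot be matched" warning is about.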

R convert zipcode or lat/long to county

I ended up using the suggestion from Josh O'Brien mentioned above and found here.

I took his code and changed state to county as shown here:

library(sp)
library(maps)
library(maptools)

# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees

latlong2county <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per county
    counties <- map('county', fill=TRUE, col="transparent", plot=FALSE)
    IDs <- sapply(strsplit(counties$names, ":"), function(x) x[1])
    counties_sp <- map2SpatialPolygons(counties, IDs=IDs,
                     proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                    proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, counties_sp)

    # Return the county names of the Polygons object containing each point
    countyNames <- sapply(counties_sp@polygons, function(x) x@ID)
    countyNames[indices]
}

# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))

latlong2county(testPoints)
[1] "wisconsin,juneau" "oregon,crook" # IT WORKS
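
The function returns "state,county" strings, so a natural follow-up is to split them into separate columns. This is a sketch in base R that starts from the output shown above; the trailing NA is added here on the assumption that a point falling outside every county polygon comes back as NA:

```r
# Output copied from the test above, plus an assumed NA for an unmatched point
county_strings <- c("wisconsin,juneau", "oregon,crook", NA)

# Split each "state,county" string on the comma
parts <- strsplit(county_strings, ",", fixed = TRUE)

# Build a two-column data frame; NA inputs stay NA in both columns
result <- data.frame(
    state  = vapply(parts, function(p) p[1], character(1)),
    county = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
)

print(result)
```

Keeping stringsAsFactors = FALSE avoids turning the names back into factors, which is the problem the first part of this page deals with.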

