How to concatenate factors, without them being converted to integer level?
From the R Mailing list:
unlist(list(facs[1 : 3], facs[4 : 5]))
To 'cbind' factors, do
data.frame(facs[1 : 3], facs[4 : 5])
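As a minimal sketch of the unlist approach, assume two made-up factors, f1 and f2, with partly different level sets (they are not from the original thread). When every element of the list is a factor, unlist returns a factor whose levels are the union of the inputs' levels:

```r
# Hypothetical example factors with partly different levels
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "a"))

# unlist() on a list of factors yields a factor, not integer codes
combined <- unlist(list(f1, f2))

is.factor(combined)      # TRUE
as.character(combined)   # "a" "b" "c" "a"
levels(combined)         # "a" "b" "c"
```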
How to combine two columns of factors into one column without changing the factor levels into numbers
Factors are integer codes that happen to have labels. When you combine factors with c(), you are generally combining those integer codes rather than the labels (this was the behaviour of c() before R 4.1.0; later versions return a factor). This often trips people up.
If you want their labels, you must coerce them to strings, using as.character:
student.list <- c( as.character(dataset1[,2]) ,
as.character(dataset2[,2]) )
If you want to get that back to a factor, wrap it all in as.factor
(this can be done all in one line, or split into two lines for easier reading):
student.list <- c(as.character(dataset1[,2]),as.character(dataset2[,2]))
student.list <- as.factor(student.list)
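To see why the detour through as.character matters, here is a small sketch with two hypothetical data frames standing in for dataset1 and dataset2 (their contents are invented for illustration). The integer codes of the two columns refer to different level sets, so combining the codes directly mixes up the students:

```r
# Hypothetical stand-ins for dataset1 and dataset2; column 2 is a factor
dataset1 <- data.frame(id = 1:2, student = factor(c("Ann", "Bob")))
dataset2 <- data.frame(id = 3:4, student = factor(c("Cy", "Ann")))

# Combining the raw integer codes loses the meaning:
# "Bob" (code 2 in dataset1) and "Cy" (code 2 in dataset2) collide
codes_only <- c(as.integer(dataset1[, 2]), as.integer(dataset2[, 2]))
codes_only               # 1 2 2 1

# Combining the labels, then rebuilding the factor, keeps them distinct
student.list <- factor(c(as.character(dataset1[, 2]),
                         as.character(dataset2[, 2])))
as.character(student.list)   # "Ann" "Bob" "Cy" "Ann"
```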
How to convert factor levels to integer in R
We can use match with the unique elements of each column:
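library(dplyr)
# funs() is deprecated since dplyr 0.8.0; across() (dplyr >= 1.0.0) is the current idiom
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))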
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
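Note that match with unique numbers the values in order of first appearance, which can differ from the factor's own integer codes (these follow the level order). A base R sketch on a made-up season vector, assuming the default alphabetical levels:

```r
# Hypothetical factor; default levels are sorted: "Summer", "Winter"
season <- factor(c("Winter", "Winter", "Summer", "Summer"))

# Numbering by first appearance in the data
by_appearance <- match(season, unique(season))
by_appearance            # 1 1 2 2

# Numbering by position in levels(season)
by_level <- as.integer(season)
by_level                 # 2 2 1 1
```

Which numbering is appropriate depends on whether the level order carries meaning in your data.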
Combine multiple factor columns into a single numeric column
Here is another base R method, where we replace each non-blank value in a column with the numeric part of that column's name, extracted using sub:
df[] <- t(as.integer(sub(".*?(\\d+)", "\\1", names(df))) * t(df != ""))
df
# q.82 q.77 q.72
#1 0 77 0
#2 82 0 0
#3 82 0 0
#4 0 0 72
#5 0 0 72
Then, if you want to sum the values row-wise, you can use rowSums:
df$q <- rowSums(df)
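The one-liner above can be unpacked step by step. The sketch below uses an invented input data frame (the original question's data is not shown here) and, as an assumption, a simpler pattern, "\\D+", to strip the non-digit prefix from the column names:

```r
# Hypothetical input: one non-blank answer per row, blanks elsewhere
df <- data.frame(q.82 = c("", "x", "x", "", ""),
                 q.77 = c("x", "", "", "", ""),
                 q.72 = c("", "", "", "x", "x"),
                 stringsAsFactors = FALSE)

# Numeric part of each column name: 82 77 72
codes <- as.integer(sub("\\D+", "", names(df)))

# Logical matrix marking the non-blank cells
filled <- df != ""

# Transpose so each code lines up with its column, multiply, transpose back
num <- t(codes * t(filled))

# One numeric value per row
q <- unname(rowSums(num))
q                        # 77 82 82 72 72
```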
Concatenating two vectors in R
We need to convert the factor class to the character class:
c(as.character(a), as.character(b))
The reason we get numbers instead of the character values is the storage mode of a factor, i.e. integer. So when we do the concatenation, the result is coerced to integer mode.
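The integer storage can be inspected directly. A short sketch with two made-up factors, a and b (the names follow the answer above; the values are invented):

```r
# Hypothetical factors; levels of a are sorted: "high", "low"
a <- factor(c("low", "high"))
b <- factor("mid")

# A factor is an integer vector with a levels attribute
storage <- typeof(a)
storage                  # "integer"
as.integer(a)            # 2 1
levels(a)                # "high" "low"

# Converting to character first concatenates the labels, not the codes
labels_combined <- c(as.character(a), as.character(b))
labels_combined          # "low" "high" "mid"
```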
Combining factor levels in R 3.2.1
I've always found it easiest (less typing and less headache) to convert to character and back for these sorts of operations. Keeping with your as.data.frame.table and using replace to do the replacement of the low-frequency levels:
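whittle <- function(data, cutoff_val) {
  # Frequency table as a data frame with columns `data` and `Freq`
  tab <- as.data.frame.table(table(data))
  # Relabel every level that occurs fewer than cutoff_val times as "Other"
  factor(replace(as.character(data), data %in% tab$data[tab$Freq < cutoff_val], "Other"))
}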
Testing on some sample data:
state <- factor(c("MD", "MD", "MD", "VA", "TX"))
whittle(state, 2)
# [1] MD MD MD Other Other
# Levels: MD Other
How to convert a factor variable to numeric while preserving the numbers in R
dv$ICPSR <- as.numeric(as.character(dv$ICPSR))
Transform your factor to a character vector before transforming it into a numeric vector.
Convert factor to integer
You can combine the two functions; coerce to characters thence to numerics:
> fac <- factor(c("1","2","1","2"))
> as.numeric(as.character(fac))
[1] 1 2 1 2
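The example above happens to give the same result with or without as.character, because the labels "1" and "2" coincide with the integer codes. A sketch with made-up values shows the difference, along with the as.numeric(levels(f))[f] form mentioned on the ?factor help page:

```r
# Hypothetical factor whose labels differ from its codes;
# levels are sorted as strings: "10", "20", "5"
fac <- factor(c("10", "20", "10", "5"))

# Direct conversion returns the internal codes, not the numbers
codes <- as.numeric(fac)
codes                    # 1 2 1 3

# Going through character recovers the original numbers
via_character <- as.numeric(as.character(fac))
via_character            # 10 20 10 5

# Equivalent: convert the levels once, then index by the factor
via_levels <- as.numeric(levels(fac))[fac]
via_levels               # 10 20 10 5
```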
Converting factor variable to numeric, and from numeric back to factor
Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.
library(dplyr)
data(warpbreaks)
original <- warpbreaks
value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()
warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))
warpbreaks <- left_join(warpbreaks, value_label_map,
                        by = c("wool" = "wool_num", "tension" = "tension_num"))
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)
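For a single column, a lighter-weight alternative to the join is to keep the level labels in a vector and rebuild the factor from the codes. This is a base R sketch of that idea, not part of the answer above; the variable names are illustrative:

```r
# Work on a copy of one factor column from the built-in warpbreaks data
tension <- datasets::warpbreaks$tension

# Save the labels before converting to integer codes
lv <- levels(tension)
codes <- as.integer(tension)

# ... numeric-only processing of `codes` would go here ...

# Rebuild the factor: code i maps back to label lv[i]
restored <- factor(codes, levels = seq_along(lv), labels = lv)

identical(as.character(restored), as.character(tension))   # TRUE
```

This only works if the codes themselves are left unchanged between the conversion and the rebuild.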