Add Extra Level to Factors in Dataframe

Add extra level to factors in dataframe

The levels function has a replacement form, levels(x) <- value, so it is easy to append new levels:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1),"No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
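
The level has to exist before the assignment. As a minimal sketch (the names f_demo, no_level and with_level are made up for illustration), assigning a value that is not among the levels yields NA with a warning instead of adding the level:

```r
f_demo <- factor(c("a", NA, "b"))

# Without the level: R warns about an invalid factor level and keeps NA.
no_level <- suppressWarnings(replace(f_demo, is.na(f_demo), "No Answer"))
anyNA(no_level)    # TRUE

# With the level appended first, the replacement sticks.
with_level <- f_demo
levels(with_level) <- c(levels(with_level), "No Answer")
with_level[is.na(with_level)] <- "No Answer"
anyNA(with_level)  # FALSE
```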

You can then loop it around all variables in a data.frame:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
df1 <- data.frame(f1,n1=1:11,f2,f3)

str(df1)
'data.frame': 11 obs. of 4 variables:
$ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
$ n1: int 1 2 3 4 5 6 7 8 9 10 ...
$ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
$ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...

for(i in 1:ncol(df1)) if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]),"No Answer")
df1[is.na(df1)] <- "No Answer"

str(df1)
'data.frame': 11 obs. of 4 variables:
$ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
$ n1: int 1 2 3 4 5 6 7 8 9 10 ...
$ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
$ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
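
Note that df1[is.na(df1)] <- "No Answer" also overwrites missing values in non-factor columns, which the example above avoids only because n1 has no NAs. A possible variant that touches factor columns only is sketched below; add_na_level and df_demo are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: append a level to every factor column and use it
# in place of NA, leaving all other columns untouched.
add_na_level <- function(data, label = "No Answer") {
  is_fac <- vapply(data, is.factor, logical(1))
  data[is_fac] <- lapply(data[is_fac], function(f) {
    levels(f) <- union(levels(f), label)  # union avoids a duplicated level
    f[is.na(f)] <- label
    f
  })
  data
}

df_demo <- data.frame(
  f = factor(c("a", NA, "b")),
  n = c(1, NA, 3)
)
out <- add_na_level(df_demo)
levels(out$f)  # "a" "b" "No Answer"
is.na(out$n)   # FALSE TRUE FALSE: the numeric NA is preserved
```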

Add a new level to a factor and substitute existing one

If you want all the entries to be unique then a factor does not gain you much over just using a character variable.

Probably the simplest way to do what you want is to coerce the column to a character vector, use the duplicated function to find the repeats, and paste a suffix onto them. If you still want a factor, wrap the result in factor() to convert it back. Possibly something like:

df$col_foo <- factor(ifelse(duplicated(df$col_foo),
                            paste(df$col_foo, '_x', sep=''),
                            as.character(df$col_foo)))
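
As a small worked sketch of that idea (df_dup and its values are invented for illustration), here is what the suffixing does to a column with repeats. If a value can occur more than twice, base R's make.unique is one alternative, since a fixed suffix would produce the same label again.

```r
df_dup <- data.frame(col_foo = factor(c("a", "b", "a", "c", "b")))

vals <- as.character(df_dup$col_foo)
dup  <- duplicated(vals)  # TRUE for the second and later occurrences
df_dup$col_foo <- factor(ifelse(dup, paste0(vals, "_x"), vals))
as.character(df_dup$col_foo)
# "a" "b" "a_x" "c" "b_x"

# Alternative when a value may repeat more than once:
make.unique(c("a", "b", "a", "a"), sep = "_")
# "a" "b" "a_1" "a_2"
```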

Add level column to a dataframe with factor column

We extract the levels of the column and index them by the factor itself, so that each element is replaced by its level label:

df$a_lev <- levels(df$a)[df$a]

The result is of class character, so it is simpler to use as.character directly:

df$a_lev <- as.character(df$a)
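
A quick check that the two approaches agree, using a made-up data frame (df_lev is a hypothetical name):

```r
df_lev <- data.frame(a = factor(c("lo", "hi", "lo", NA)))

via_levels <- levels(df_lev$a)[df_lev$a]  # index levels by the integer codes
via_char   <- as.character(df_lev$a)

identical(via_levels, via_char)  # TRUE
class(via_char)                  # "character"
```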

How do I create a factor with three levels in a dataframe in R?

Is something like the following what the question is asking for?

forcats::fct_collapse(y,
  DirImp = grep("DirImp", y, ignore.case = TRUE, value = TRUE),
  Distances = grep("km", y, ignore.case = TRUE, value = TRUE),
  Control = grep("control", y, ignore.case = TRUE, value = TRUE)
)
# [1] Distances Distances Distances Distances Distances Distances
# [7] Distances Distances Distances Distances Distances Distances
#[13] Distances Distances Distances Distances Distances Distances
#[19] Distances Distances Distances Distances Distances Distances
#[25] Distances Distances Distances Distances Control Distances
#Levels: DirImp Distances Control

Or, maybe more readable,

grep_tmp <- function(pattern, x){
  grep(pattern, x, ignore.case = TRUE, value = TRUE)
}

forcats::fct_collapse(y,
  DirImp = grep_tmp("DirImp", y),
  Distances = grep_tmp("^\\d+km", y),
  Control = grep_tmp("control", y)
)

Data

With the levels posted in the question, here is sample data.

set.seed(1234)
x <- scan(text = '"DirImp" "10km" "10km_" "1km_" "20km" "2km_" "30km"
"3km_" "40km" "4km_" "50km" "5km_" "60km" "6km_"
"70km" "7km_" "8km_" "9km_" "control"', what = character())

y <- factor(sample(x, 30, TRUE), levels = x)
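
If forcats is not available, a similar collapse can probably be done in base R, because levels(x) <- value also accepts a named list that maps new level names to old ones. The sketch below uses a small made-up factor (y_demo) rather than the sampled data above.

```r
y_demo <- factor(c("DirImp", "10km", "3km_", "control", "20km"))
lv <- levels(y_demo)

# Each list element names a new level and lists the old levels it absorbs.
levels(y_demo) <- list(
  DirImp    = grep("DirImp",  lv, ignore.case = TRUE, value = TRUE),
  Distances = grep("km",      lv, ignore.case = TRUE, value = TRUE),
  Control   = grep("control", lv, ignore.case = TRUE, value = TRUE)
)

levels(y_demo)
# "DirImp" "Distances" "Control"
as.character(y_demo)
# "DirImp" "Distances" "Distances" "Control" "Distances"
```

Old levels that are not listed in any element become NA, so the patterns need to cover every level that should be kept.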

Changing levels of dataframe column changes value in dataframe


x_value <- factor("yes", levels = c("no", "yes"))
df <- data.frame(
x = x_value
)

df

x
1 yes

Why the example in the question is showing this "weird" behaviour:

The data frame created in the question contains a factor with a single level. That level has the integer code 1, so when you assign new levels with levels(), the existing values are attached to whichever new level comes first.

Here is a quick example:

If we create a dataframe like this

x_value <- c("somethingElse", "more", "more")
df <- data.frame(
  x = x_value,
  stringsAsFactors = TRUE  # needed since R 4.0.0, where the default became FALSE
)

df$x

shows us that the levels are

[1] somethingElse more          more         
Levels: more somethingElse

Note that the first level is "more" even though "somethingElse" occurs first. This is because levels are sorted by default, and "more" sorts before "somethingElse".
So, if we now assign

levels(df$x) <- c("yes", "somethingElse", "more")

the first factor level becomes "yes" and the second becomes "somethingElse", which gives the (perhaps unintuitive) result

              x
1 somethingElse
2 yes
3 yes
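
To avoid the positional surprise, one option is to rename a level by matching its name, or to add levels by re-creating the factor with an explicit levels argument. This is only a sketch with invented object names (x_demo, x_more), not the code from the question:

```r
x_demo <- factor(c("somethingElse", "more", "more"))

# Rename one level by name instead of by position.
levels(x_demo)[levels(x_demo) == "more"] <- "yes"
as.character(x_demo)
# "somethingElse" "yes" "yes"

# Add an unused level without changing any existing value.
x_more <- factor(x_demo, levels = c(levels(x_demo), "no"))
levels(x_more)
# "yes" "somethingElse" "no"
```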

Add factor levels that are not in use

Matrices can't hold factors. When you put a factor in a matrix, it gets coerced to character, and the unused levels are lost. as.data.frame(matrix(...)) is a bad habit for this reason, and for other class-conversion reasons.
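
A minimal illustration of that loss, with made-up objects (f_demo, m_demo, back):

```r
f_demo <- factor("Work", levels = c("Free", "Work"))
m_demo <- as.matrix(data.frame(A = rep(f_demo, 2), B = rep(f_demo, 2)))

typeof(m_demo)  # "character": the factor structure is gone

# Converting back only sees the strings that are present,
# so the unused level "Free" does not come back.
back <- as.data.frame(m_demo, stringsAsFactors = TRUE)
levels(back$A)  # "Work"
```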

Here's a way to replicate your data transformations as near as I can follow them without losing factor levels:

f <- factor(c("Free", "Work"))
x <- rep(f[2], 4)
mon <- data.frame(A = x, B = x)
str(mon)
# 'data.frame': 4 obs. of 2 variables:
# $ A: Factor w/ 2 levels "Free","Work": 2 2 2 2
# $ B: Factor w/ 2 levels "Free","Work": 2 2 2 2
## looks good

# What is y? What's the point?
#mt <- t(as.matrix(rev(data.frame(as.matrix(mon))))) # change order of y

mon$id = 1:nrow(mon)
m <- reshape2::melt(mon, id.vars = "id", factorsAsStrings = FALSE)

levels(m$value)
# [1] "Free" "Work"
## looks good

Now, when we get to plotting, specify drop = FALSE in the scale to include unused levels in the legend. (Use the default drop = TRUE if you don't want the unused levels showing up.) Since the levels are already there, we don't need to customize the labels.

library(ggplot2)

col <- c("azure", "orange")

ggplot(m, aes(x = id, y = variable, fill = value)) +
  geom_tile(colour = "grey10") +
  scale_fill_manual(values = col, name = NULL, drop = FALSE) +
  theme(panel.background = element_rect(fill = "white"), axis.ticks = element_blank()) +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank())


If you want to be extra safe with the color scale, you can add names to the values vector before putting it in the scale:

names(col) = levels(f)

Another way to get the data would be to not worry about the levels during transformation, and re-factor with appropriate levels at the end:

# your original code:
f <- factor(c("Free", "Work"))
mon <- as.data.frame(matrix(as.factor(rep(f[2], times = 8)), nrow = 4))
colnames(mon) <- c("A", "B")

mt <- t(as.matrix(rev(data.frame(as.matrix(mon))))) # change order of y
m <- reshape2::melt(mt)

# add this at the end
m$value = factor(m$value, levels = levels(f))

# check that it looks good:
str(m$value)
# Factor w/ 2 levels "Free","Work": 2 2 2 2 2 2 2 2

Adding a new factor level to a variable in R

I'm still not sure what you want, but this might be getting closer.

I think the easiest way to do this is with plyr's rbind.fill() function, which will automatically unify factor levels. You could also do it by hand by converting the factor variable back to a character variable before putting the pieces together.

Data:

dat2 <- structure(list(Category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("X",
"Y", "Z"), class = "factor"), Mode = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 2L, 1L, 3L, 1L, 1L, 3L, 2L, 3L, 3L, 2L, 3L), .Label = c("K",
"L", "M"), class = "factor")), .Names = c("Category", "Mode"), row.names = c(NA,
18L), class = "data.frame")

Get row totals:

tab <- with(dat2, addmargins(table(Category, Mode), 2))

Convert row totals to a data frame:

dat3 <- data.frame(Category=rownames(tab),Mode=paste("Total:",tab[,"Sum"]))

Concatenate:

plyr::rbind.fill(dat2,dat3)
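
The by-hand route mentioned above (converting to character before combining) might look roughly like this in base R. The data here is a reduced, made-up version (dat_demo), and the intermediate names are hypothetical.

```r
dat_demo <- data.frame(
  Category = factor(c("X", "X", "Y")),
  Mode     = factor(c("K", "L", "K"))
)

# Row totals per Category, as in the steps above.
tab_demo <- addmargins(table(dat_demo$Category, dat_demo$Mode), 2)
totals <- data.frame(
  Category = rownames(tab_demo),
  Mode     = paste("Total:", tab_demo[, "Sum"]),
  stringsAsFactors = FALSE
)

# Convert the factor columns to character, bind, then re-factor at the end.
dat_chr <- dat_demo
dat_chr[] <- lapply(dat_chr, as.character)
combined <- rbind(dat_chr, totals)
combined$Mode <- factor(combined$Mode)
levels(combined$Mode)
```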

How to automate adding factors to variables in large data frame in R

A data frame is not really an appropriate data structure for storing the
factor level definitions: there’s no reason to expect all factors to have
the same number of levels. Instead, I’d use a plain list and store the
level information more compactly as named vectors, along these lines:

df <- data.frame(
  Gender = c("2", "2", "1", "2"),
  AgeG = c("3", "1", "4", "2")
)

value_labels <- list(
  Gender = c("Male" = 1, "Female" = 2),
  AgeG = c("<25" = 1, "25-60" = 2, "65-80" = 3, ">80" = 4)
)

Then you can make a function that uses that data structure to make factors
in a data frame:

make_factors <- function(data, value_labels) {
  for (var in names(value_labels)) {
    if (var %in% colnames(data)) {
      vl <- value_labels[[var]]
      data[[var]] <- factor(
        data[[var]],
        levels = unname(vl),
        labels = names(vl)
      )
    }
  }
  data
}

make_factors(df, value_labels)
#> Gender AgeG
#> 1 Female 65-80
#> 2 Female <25
#> 3 Male >80
#> 4 Female 25-60
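
One thing worth checking with this approach is what happens to codes that have no label. As a small sketch with invented values (codes, labs), such codes become NA, so counting NAs before and after the conversion is a cheap sanity check:

```r
codes <- c("2", "9", "1")            # "9" has no entry in the labels
labs  <- c("Male" = 1, "Female" = 2)

f_codes <- factor(codes, levels = unname(labs), labels = names(labs))
as.character(f_codes)
# "Female" NA "Male"

sum(is.na(f_codes)) - sum(is.na(codes))  # number of codes that were not matched
# 1
```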

