Coding Variable Values into Classes Using R


The cut method as outlined by @Greg is probably what you want here. One thing to note is that cut returns a factor by default, which you can suppress by supplying labels = FALSE to return the integer values:

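# right = FALSE closes the intervals on the left, [179, 200), [200, 300), [300, Inf), matching the ifelse() version below
cut(data$wt, c(179, 200, 300, Inf), right = FALSE, labels = FALSE)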
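To see the difference between the default factor output and labels = FALSE, here is a small sketch on a hypothetical weight vector (the values are made up for illustration). right = FALSE makes each interval closed on the left, so a weight of exactly 200 falls into the second group.

```r
# Hypothetical weights, not taken from the original question
wt <- c(179, 199, 200, 250, 300, 450)

# Default: cut() returns a factor whose levels describe the intervals
f <- cut(wt, c(179, 200, 300, Inf), right = FALSE)
print(levels(f))  # "[179,200)" "[200,300)" "[300,Inf)"

# labels = FALSE: plain integer group codes instead of a factor
g <- cut(wt, c(179, 200, 300, Inf), right = FALSE, labels = FALSE)
print(g)  # 1 1 2 2 3 3
```

Values below the first break (here, anything under 179) come back as NA rather than being assigned a group.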

Alternatively, if your groups do not lend themselves to natural breaks, you can use ifelse(). You can nest ifelse() calls, much like nested IF formulas in Excel. I use with() to cut down on the typing needed:

data$group2 <- with(data, ifelse(wt >= 179 & wt < 200, 1,
                          ifelse(wt >= 200 & wt < 300, 2, 3)))
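As a quick sanity check, the nested ifelse() can be compared with cut() on a hypothetical data frame. This sketch assumes weights of at least 179; below that, the innermost "else" assigns group 3 while cut() returns NA, so the two approaches only agree on this range.

```r
# Hypothetical data frame with a numeric weight column `wt`
data <- data.frame(wt = c(179, 199, 200, 250, 300, 450))

# Nested ifelse(): the innermost "else" catches everything not matched above
data$group2 <- with(data, ifelse(wt >= 179 & wt < 200, 1,
                          ifelse(wt >= 200 & wt < 300, 2, 3)))

# cut() with intervals closed on the left gives the same groups here
data$group1 <- cut(data$wt, c(179, 200, 300, Inf), right = FALSE, labels = FALSE)

# cut() returns integers and ifelse() returns doubles, so compare as numeric
print(identical(as.numeric(data$group1), data$group2))  # TRUE
```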

Recoding a dataset with variables of different classes

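I've spotted at least one small problem with your custom function: if you're using ifelse(), you need to test the is.na() condition first. Any comparison with NA returns NA, and ifelse() passes that NA straight through instead of picking either branch. See this example: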

x <- c(1, 2, NA)
ifelse(x == 1, "foo", "bar")
# > [1] "foo" "bar" NA
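Putting the is.na() test first gives missing values their own result. A minimal sketch; the "missing" label is just a hypothetical placeholder:

```r
x <- c(1, 2, NA)

# Test is.na() first, so the NA element never reaches the x == 1 comparison
y <- ifelse(is.na(x), "missing", ifelse(x == 1, "foo", "bar"))
print(y)  # "foo" "bar" "missing"
```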

Here's an alternative I've made. The coalesce function comes from the dplyr package.

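library(dplyr)  # provides coalesce()

recode.var <- function(x) {
  # character: "Yes" -> 1, any other text -> 0, NA -> 0
  if (is.character(x)) {
    return(coalesce(as.numeric(x == "Yes"), 0))
  }

  # numeric: values kept as-is, NA -> 0
  if (is.numeric(x)) {
    return(coalesce(x, 0))
  }

  # logical: TRUE -> 1, FALSE -> 0, NA -> 0
  if (is.logical(x)) {
    return(coalesce(as.numeric(x), 0))
  }

  # any other class (e.g. factor, Date) is returned unchanged
  x
}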

My version does not deal with values outside the options you've mentioned. I'm assuming they don't exist in your dataset, so they don't need to be accounted for, but do tell me if that's a problem.

The final step is applying the function to every column of the data frame. With dplyr you can use the following:

tmp2 <- mutate_all(tmp, recode.var)
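mutate_all() is superseded as of dplyr 1.0.0; across() inside mutate() does the same job. The sketch below assumes dplyr 1.0.0 or later and uses a hypothetical tmp with one character, one numeric and one logical column, with recode.var() repeated so the block runs on its own.

```r
library(dplyr)

# Same logic as recode.var() above
recode.var <- function(x) {
  if (is.character(x)) return(coalesce(as.numeric(x == "Yes"), 0))
  if (is.numeric(x))   return(coalesce(x, 0))
  if (is.logical(x))   return(coalesce(as.numeric(x), 0))
  x
}

# Hypothetical data frame with one column of each class
tmp <- data.frame(answer = c("Yes", "No", NA),
                  score  = c(3, NA, 5),
                  flag   = c(TRUE, NA, FALSE),
                  stringsAsFactors = FALSE)

# Apply recode.var() to every column
tmp2 <- mutate(tmp, across(everything(), recode.var))
print(tmp2)
```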

How to get the class of a variable in R using a loop?

Since data frames are really just lists of columns, I often do this with lapply():

lapply(df, class)
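For illustration, here is the same idea on the built-in iris data set (used here only as a stand-in for df). sapply() simplifies the result to a named character vector, which is often easier to read; it stays a list if any column has more than one class, as POSIXct columns do.

```r
# iris ships with base R: four numeric columns and one factor
classes <- lapply(iris, class)
str(classes)

# sapply() collapses the one-class-per-column result into a named vector
v <- sapply(iris, class)
print(v)
```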

As for the for loop in your example: df$name does not evaluate the variable name, so R looks for a column literally called "name". Instead, you want df[, name], which uses the value stored in name:

for (name in names(df)) {
  print(name)
  print(class(df[, name]))
}
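The difference between $ and [ ] is easiest to see when the data frame happens to contain a column called "name". A minimal sketch with a hypothetical df; df[[name]] is shown as well, since it also evaluates the variable.

```r
# Hypothetical data frame that happens to have a column called "name"
df <- data.frame(name = c("a", "b"), value = c(1.5, 2.5),
                 stringsAsFactors = FALSE)
name <- "value"

# `$` ignores the variable and returns the column literally called "name"
print(class(df$name))     # "character"

# `[, ]` and `[[` use the value of `name`, i.e. the "value" column
print(class(df[, name]))  # "numeric"
print(class(df[[name]]))  # "numeric"
```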

Changing Class and Mode from Character to Numeric

The lines

as.factor(df$StudyAreaVisitNote)
as.numeric(df$Year)
as.numeric(df$Session)

do not permanently change the values in df. They return transformed vectors that are printed to the console, and because you do not save them anywhere, they disappear as soon as that line is done running. Objects in R are generally not updated by reference; you must always re-assign the returned result to wherever you want to store it. So try

df$StudyAreaVisitNote <- as.factor(df$StudyAreaVisitNote)
df$Year <- as.numeric(df$Year)
df$Session <- as.numeric(df$Session)

instead.
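The effect of re-assignment can be checked directly. The sketch below uses a hypothetical df with numbers stored as text; the lapply() line is one base R way to convert several columns in a single assignment.

```r
# Hypothetical data frame with numbers stored as text
df <- data.frame(Year = c("2019", "2020"), Session = c("1", "2"),
                 stringsAsFactors = FALSE)

as.numeric(df$Year)        # computes a numeric vector, then discards it
before <- class(df$Year)   # still "character"

# Convert several columns at once and assign the result back into df
cols <- c("Year", "Session")
df[cols] <- lapply(df[cols], as.numeric)
after <- sapply(df, class) # both columns are now "numeric"
print(after)
```

Note that as.numeric() turns any text that is not a valid number into NA, with a warning.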

Reading the class of each variable in a DF based on a DF list of variables in R

I think what you want is the following:

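# as.character() guards against DF2[, 1] being a factor (indexing by a factor uses its integer codes)
sapply(DF1[as.character(DF2[, 1])], class)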

This first subsets DF1 to only those columns named in DF2, then applies class() to each column; sapply() simplifies the result to a vector. To get the class of each column in a dataset you need a mapping function such as lapply(), or a for loop. For instance, lapply(mtcars, class) gives you the class of each column.
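Here is a small runnable sketch of that approach. DF1 and DF2 are hypothetical stand-ins for the question's data, with the variable names held in the first column of DF2. Indexing with DF1[...] (no comma) keeps a data frame even when only one column is selected.

```r
# Hypothetical data: DF1 holds the data, DF2 lists the variables of interest
DF1 <- data.frame(id = 1:3, wt = c(180.5, 210, 305),
                  grp = factor(c("a", "b", "a")))
DF2 <- data.frame(variable = c("wt", "grp"), stringsAsFactors = FALSE)

# Subset DF1 to the listed columns, then apply class() to each one
res <- sapply(DF1[as.character(DF2[, 1])], class)
print(res)  # wt "numeric", grp "factor"
```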

Change class of variables in a data frame using another reference data frame

You could try it like this:

Make sure both tables are in the same order:

variable_info <- variable_info[match(names(df), variable_info$variable_name), ]

Look up the matching conversion functions (as.numeric, as.factor, ...) by name:

funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)

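Then map them over the columns, converting to character first so that factor columns are converted by their labels rather than their integer codes: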

df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)
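Putting the three steps together on hypothetical data (df and variable_info below are made up, with the reference rows deliberately in a different order from the columns of df):

```r
# Hypothetical data: every column read in as character
df <- data.frame(a = c("1", "2"), b = c("TRUE", "FALSE"), c = c("x", "y"),
                 stringsAsFactors = FALSE)

# Hypothetical reference table, in a different order from df's columns
variable_info <- data.frame(variable_name  = c("c", "a", "b"),
                            variable_class = c("factor", "numeric", "logical"),
                            stringsAsFactors = FALSE)

# Reorder the reference rows so row i describes column i of df
variable_info <- variable_info[match(names(df), variable_info$variable_name), ]

# Look up as.factor, as.numeric, as.logical by name
funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)

# Convert each column with its function; df[] keeps the data frame structure
df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)
print(sapply(df, class))
```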

With data.table you could do it almost the same way, except you replace the last line by:

library(data.table)
dt <- as.data.table(df) # or use setDT(df)
dt[, names(dt) := Map(function(dd, f) f(as.character(dd)), dt, funs)]
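A self-contained version of the data.table variant, assuming the data.table package is installed; df and funs are hypothetical, with funs already in the same order as the columns. Using a character vector such as names(dt) on the left of := updates those columns in place.

```r
library(data.table)

# Hypothetical data and one conversion function per column, in column order
df <- data.frame(a = c("1", "2"), b = c("TRUE", "FALSE"),
                 stringsAsFactors = FALSE)
funs <- list(as.numeric, as.logical)

dt <- as.data.table(df)
# Replace every column by reference with its converted version
dt[, names(dt) := Map(function(dd, f) f(as.character(dd)), dt, funs)]
print(sapply(dt, class))
```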

