Error in Running Factor() on a Column of a Data Frame


Your data is a tbl_df. I don't have your data, but we can look at an example using mtcars.

library(dplyr)

# tbl_df() is deprecated since dplyr 1.0.0; as_tibble(mtcars) is the current equivalent
tbl_df(mtcars)[, "mpg"]
# Source: local data frame [32 x 1]
# 
#      mpg
#    (dbl)
# 1   21.0
# 2   21.0
# 3   22.8
# 4   21.4
# 5   18.7
# 6   18.1
# 7   14.3
# 8   24.4
# 9   22.8
# 10  19.2
# ..   ...

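It's still a data frame (a one-column tibble), whereas base R's [.data.frame would have dropped it to an atomic vector. The `[.tbl_df` method (dplyr:::`[.tbl_df` at the time; it now lives in the tibble package) does not drop single columns, so factor() receives a data frame instead of a vector. That is why we can't run factor() on it.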

factor(tbl_df(mtcars)[, "mpg"])
# Error in sort.list(y) : 'x' must be atomic for 'sort.list'
# Have you called 'sort' on a list?

So you'll need to use [[, as in df[["my_col"]], or just use $.

df[["my_col"]] <- factor(df[["my_col"]])

Note: with the $ operator you can omit the quotes around the column name.

df$my_col <- factor(df$my_col)
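The drop behaviour can be reproduced in plain base R, which makes it easy to see what factor() actually receives. A minimal sketch, using a hypothetical data frame df with a column my_col (names chosen to match the snippet above); drop = FALSE mimics what a tibble's [ always does:

```r
# Hypothetical one-column data frame standing in for your data
df <- data.frame(my_col = c("b", "a", "b"))

# Base R's [ drops a single column to an atomic vector by default...
is.atomic(df[, "my_col"])                # TRUE
# ...but with drop = FALSE it stays a data frame, as a tibble's [ always does
is.atomic(df[, "my_col", drop = FALSE])  # FALSE

# [[ (and $) return the column vector itself, for data frames and tibbles alike
df[["my_col"]] <- factor(df[["my_col"]])
levels(df$my_col)                        # "a" "b"
```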

Why do I get this error while running factor() on a column of a data.frame

The column was created with dplyr's mutate() by assigning a list(), so it was stored as a list rather than an atomic vector.

To solve it, flatten the column with unlist() before calling factor():

mydata$finding <- unlist(mydata$finding)
factor(mydata$finding)

Now it works. Credit to @User20650 for the solution.
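A base R sketch of the same situation, assuming a hypothetical mydata with a list-column finding (built here with $<- rather than mutate() so it runs without dplyr). Note that unlist() only lines up with the rows when every list element has length 1:

```r
# Hypothetical data: a list-column, like one created by assigning list() values
mydata <- data.frame(id = 1:3)
mydata$finding <- list("yes", "no", "yes")

is.list(mydata$finding)  # TRUE: a list, not an atomic vector

# unlist() flattens it to a character vector; this assumes each element has
# length 1, otherwise the result would no longer match the number of rows
mydata$finding <- unlist(mydata$finding)
mydata$finding <- factor(mydata$finding)
levels(mydata$finding)   # "no" "yes"
```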

R: In a data frame, I get an error using a factor variable's level

It doesn't work because you are comparing a tibble with a tibble. I suggest reading Hadley Wickham's R book, which says:

Subsetting a tibble with [ always returns a tibble.

We can try an example:

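library(readr)  # read_csv(), cols(), col_integer(), col_factor()

sizes <- factor(c(1, 2, 3, 7, 9, 2, 1, 3, 7, 3, 9, 2, 3), levels = c(1, 3, 2, 7, 9), ordered = TRUE)
write.csv(data.frame(A = seq_along(sizes), sizes = sizes), "test.csv", row.names = FALSE)

A_Dataset <- read_csv("test.csv",
                      col_types = cols(A = col_integer(),
                                       sizes = col_factor(levels = c("1", "3", "2", "7", "9"))))
# col_factor() also accepts ordered = TRUE, which would make this next step unnecessary
A_Dataset$sizes <- factor(A_Dataset$sizes, levels = c(1, 3, 2, 7, 9), ordered = TRUE)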

If you look at the class:

class(A_Dataset[1,2])
[1] "tbl_df"     "tbl"        "data.frame"

You can't compare the two tibbles directly. Instead, extract the column with $ and compare the factor elements:

class(A_Dataset$sizes[2])
[1] "ordered" "factor"

A_Dataset$sizes[2] > A_Dataset$sizes[1]
[1] TRUE

Converting both tibbles to plain data frames first also works:

as.data.frame(A_Dataset[2, 2]) > as.data.frame(A_Dataset[1, 2])
     sizes
[1,]  TRUE
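The comparison above depends on the level order given to factor(), not on the numeric values of the labels. A base R sketch (no readr or tibble needed) that reuses the same sizes vector:

```r
sizes <- factor(c(1, 2, 3, 7, 9, 2, 1, 3, 7, 3, 9, 2, 3),
                levels = c(1, 3, 2, 7, 9), ordered = TRUE)

# Comparisons on an ordered factor follow the order of `levels`
sizes[2] > sizes[1]  # TRUE: "2" comes after "1"
sizes[3] < sizes[2]  # TRUE: "3" comes before "2" in this ordering
```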

Converting DF columns to factor is less than straightforward

To change multiple columns to factor, use:

DF[,1:3] <- lapply(DF[,1:3], factor)
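A minimal sketch with a hypothetical DF (three character columns plus one integer column) to check that only the selected columns change class. lapply() is used because it returns a list that [<- assigns back column by column, whereas apply() would first coerce the data frame to a matrix:

```r
# Hypothetical data frame: three character columns and one integer column
DF <- data.frame(a = c("x", "y"), b = c("u", "v"), c = c("p", "q"), n = 1:2,
                 stringsAsFactors = FALSE)

DF[, 1:3] <- lapply(DF[, 1:3], factor)

# Columns 1 to 3 are now factors; column n is untouched
sapply(DF, class)
```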

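To change from factor to numeric, remember to use as.numeric(as.character(x)), because as.numeric() applied directly to a factor returns its internal level codes rather than the values. Like this: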

DF[,1:3] <- lapply(DF[,1:3], function(x) as.numeric(as.character(x)))
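The reason for the as.character() step shows up in a few lines of base R. A sketch with a hypothetical factor f; the last line uses the as.numeric(levels(f))[f] idiom that ?factor describes as a slightly more efficient equivalent:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

levels(f)                    # "10" "20" "5": levels sort as text, not as numbers
as.numeric(f)                # 1 3 2: the internal level codes, not the values
as.numeric(as.character(f))  # 10 5 20: the values themselves
as.numeric(levels(f))[f]     # 10 5 20: same result, converting each level only once
```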

Only certain values of column as levels in factor

Yes. Use the labels argument:

x <- c("a", "a", "b", "b", "happy", "sad", "angry")
levels <- c("a", "b", "happy", "sad", "angry")
labels <- c("letter", "letter", "happy", "sad", "angry")

y <- factor(x, levels, labels = labels)

y

https://rdrr.io/r/base/factor.html

"Duplicated values in labels can be used to map different values of x to the same factor level."

EDIT: The mistake in your original code example is the nested vector.
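To confirm that the two letter values really collapse into a single level, here is a short sketch that rebuilds y with named arguments (same x, levels and labels as above) and counts the result:

```r
x <- c("a", "a", "b", "b", "happy", "sad", "angry")
y <- factor(x,
            levels = c("a", "b", "happy", "sad", "angry"),
            labels = c("letter", "letter", "happy", "sad", "angry"))

levels(y)  # "letter" "happy" "sad" "angry": a and b now share one level
table(y)   # letter has 4 observations, the others 1 each
```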

Error when mutating a dataframe in R to add a column with an if condition

We just need to convert the 'date' column to the Date class, and it should work:

data.cur$date <- as.Date(data.cur$date)  # add format = "..." if dates are not %Y-%m-%d or %Y/%m/%d

The error occurs because the column is a factor, and the comparison in your condition needs a Date-class column to be meaningful.
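A base R sketch of the failure and the fix, using a hypothetical data.cur whose date column was read in as a factor. Base ifelse() stands in for the mutate() call here, so no dplyr is needed:

```r
# Hypothetical data: dates that were read in as a factor
data.cur <- data.frame(date = factor(c("2020-01-15", "2020-06-30", "2021-03-01")))

# ">" is not meaningful for an unordered factor: R warns and returns NA
before <- suppressWarnings(data.cur$date > "2020-06-01")
before  # NA NA NA

# as.Date() on a factor converts via its labels (default formats %Y-%m-%d or %Y/%m/%d)
data.cur$date <- as.Date(data.cur$date)

# Now the comparison works, so the conditional column can be built
data.cur$late <- ifelse(data.cur$date > as.Date("2020-06-01"), "yes", "no")
data.cur$late  # "no" "yes" "yes"
```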

R: unused argument in levels

The help page ?as.factor shows that as.factor() takes only one argument (in your case filtered_table$column), so the error message means there is no parameter to match the second argument you supplied in the call. To specify the levels explicitly, use factor() instead, which has a levels argument.
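A quick base R check of both calls, with a hypothetical character vector x standing in for filtered_table$column:

```r
x <- c("low", "high", "medium", "low")

# as.factor() accepts only x, so an extra argument is rejected
res <- tryCatch(as.factor(x, levels = c("low", "medium", "high")),
                error = function(e) conditionMessage(e))
res  # an "unused argument (levels = ...)" error message

# factor() has a levels argument, so this sets the level order explicitly
f <- factor(x, levels = c("low", "medium", "high"))
levels(f)  # "low" "medium" "high"
```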

Running into R error with matching data frame columns

Consider dropping the for loop and using base R's merge() on both data frames instead. A little data management is needed first: 1) temporarily convert the factor columns to character (or use stringsAsFactors = FALSE in read.csv() or read.table(); this is the default since R 4.0.0), and 2) add suffixes to tell apart the column names that appear in both data frames. Once the new MAF is calculated with ifelse(), split the merged data frame and reset the column names and data types to the original structure:

# CONVERT FACTORS TO CHARACTER
gwas.data[c("A1", "A2")] <- lapply(gwas.data[c("A1", "A2")], as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(gwas.data) <- paste0(names(gwas.data), "_A")

# CONVERT FACTORS TO CHARACTER
correct.orientation[c("A1", "A2")] <- lapply(correct.orientation[c("A1", "A2")], as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(correct.orientation) <- paste0(names(correct.orientation), "_B")

# MERGE DATA FRAMES (ASSUMING SNP IS UNIQUE IDENTIFIER)
# all.x = TRUE keeps every gwas.data row without adding reference-only rows
comparedf <- merge(gwas.data, correct.orientation, by.x = "SNP_A", by.y = "SNP_B", all.x = TRUE)

# FLAG SNPs WHOSE A1/A2 ALLELES ARE SWAPPED RELATIVE TO correct.orientation
# (!is.na() makes unmatched SNPs FALSE instead of NA, so they keep their values)
flipped <- !is.na(comparedf$A1_B) &
           comparedf$A1_A == comparedf$A2_B &
           comparedf$A2_A == comparedf$A1_B

# CALCULATE NEW MAF AND FLIP THE Z-SCORE SIGN FOR SWAPPED SNPs
comparedf$MAF_A <- ifelse(flipped, 1 - comparedf$MAF_A, comparedf$MAF_A)
comparedf$zscore_A <- ifelse(flipped, -1 * comparedf$zscore_A, comparedf$zscore_A)

# SPLIT MERGE BACK TO ORIGINAL STRUCTURE
newgwas.data <- comparedf[, names(gwas.data)]
# REMOVE SUFFIX (anchored so only the trailing "_A" is dropped)
names(newgwas.data) <- sub("_A$", "", names(newgwas.data))
# RESET FACTORS
newgwas.data$A1 <- as.factor(newgwas.data$A1)
newgwas.data$A2 <- as.factor(newgwas.data$A2)
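The first step (converting A1/A2 to character) matters because == on two factors requires matching level sets, which allele columns coming from two different data frames rarely have. A small sketch with hypothetical allele vectors:

```r
# Hypothetical allele columns from two different data frames
a <- factor(c("A", "C", "G"))
b <- factor(c("A", "T", "G"))

# Comparing factors whose level sets differ raises an error
msg <- tryCatch(a == b, error = function(e) conditionMessage(e))
msg  # "level sets of factors are different"

# As character vectors the comparison is simply element-wise
as.character(a) == as.character(b)  # TRUE FALSE TRUE
```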
