How to Convert All Column Data Types to Numeric and Character Dynamically

How to convert all column data types to numeric and character dynamically?

If you don't know which columns need to be converted beforehand, you can extract that info from your dataframe as follows:

vec <- sapply(dat, is.factor)

which gives:

> vec
particles humidity timestamp  date 
     TRUE     TRUE     FALSE FALSE 

You can then use this vector to do the conversion on the subset with lapply:

# notation option one:
dat[, vec] <- lapply(dat[, vec], function(x) as.numeric(as.character(x)))
# notation option two:
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

If you want to detect both factor and character columns, you can use:

sapply(dat, function(x) is.factor(x) | is.character(x))
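
Putting the two steps together, here is a minimal end-to-end sketch (with made-up example data, since the original `dat` isn't shown):

```r
# made-up example data: numbers stored as factors, plus a character column
dat <- data.frame(
  particles = factor(c("1.5", "2.5")),
  humidity  = factor(c("40", "60")),
  date      = c("2020-01-01", "2020-01-02"),
  stringsAsFactors = FALSE
)

vec <- sapply(dat, is.factor)   # which columns are factors?
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

sapply(dat, class)
#> particles  humidity      date
#> "numeric" "numeric" "character"
```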

R - convert datatype of all columns in a dataframe from character to numeric dynamically

df[] <- lapply(df, function(x) {
    if (is.character(x)) {
        # map each distinct string to an integer code via factor
        x <- as.integer(as.factor(x))
    }
    x
})
## I believe that this should work

Change the class from factor to numeric of many columns in a data frame

Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.
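
A quick illustration of the difference, using a throwaway factor:

```r
f <- factor(c("10", "20", "30"))
as.numeric(f)                # 1 2 3   -- the internal level codes
as.numeric(as.character(f))  # 10 20 30 -- the values you actually want
```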

Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.

HTH

How to convert data.frame column from Factor to numeric

breast$class <- as.numeric(as.character(breast$class))

If you have many columns to convert to numeric

indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))

Another option is to pass stringsAsFactors = FALSE when reading the file with read.table or read.csv (this is the default since R 4.0.0), so character columns are never turned into factors in the first place.
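
For example, writing a tiny file and reading it back (the temporary file here is purely for illustration):

```r
# stringsAsFactors = FALSE keeps text columns as character on input
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, lab = c("a", "b")), tmp, row.names = FALSE)

breast <- read.csv(tmp, stringsAsFactors = FALSE)
sapply(breast, class)   # lab comes back as "character", not "factor"
```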

Just in case, other options to create/change columns

 breast[,'class'] <- as.numeric(as.character(breast[,'class']))

or

 breast <- transform(breast, class = as.numeric(as.character(class)))

How to convert all columns where entries have length ≤1 to numeric?

The best function for these situations is type_convert(), from readr:

"[type_convert() re-converts character columns in a data frame], which is useful if you need to do some manual munging - you can read the columns in as character, clean it up with (e.g.) regular expressions and other transformations, and then let readr take another stab at parsing it."

So, all you need to do is add it at the end of your pipe:

df %>% ... %>% type_convert() 

Alternatively, we can use type.convert from base R, which would automatically detect the column type based on the value and change it

df[] <- type.convert(df, as.is = TRUE)

If the constraint is to look for columns that have only one character

i1 <- !colSums(nchar(as.matrix(df)) > 1)
df[i1] <- type.convert(df[i1], as.is = TRUE)

If we want to use tidyverse, there is parse_guess from readr

library(tidyverse)
df %>%
  mutate_if(~ all(nchar(.x) == 1), parse_guess)

Dynamically type casting to numeric in dplyr and sparklyr

I didn't find an exact solution per se, but I did find a workaround.

typeCheckPartition <- function(df) {
  require(dplyr)
  require(varhandle)

  # TRUE if every value in the column parses as a number
  checkNumeric <- function(column) {
    column %>% as.data.frame %>% .[, 1] %>% varhandle::check.numeric(.) %>% all
  }

  # this works on non-spark data frames
  columns <- colnames(df)
  numericIdx <- df %>% mutate(across(all_of(columns), checkNumeric)) %>% .[1, ]

  return(numericIdx)
}

typeCastSpark <- function(df, max_partitions = 1000, undo_coalesce = TRUE) {
  # numericIdxDf will have these dimensions: num_partitions rows x num_columns,
  # so as long as num_columns is not absurd, this coalesce makes collect a safe operation
  num_partitions <- sdf_num_partitions(df)
  if (num_partitions > max_partitions) {
    df <- df %>% sdf_coalesce(max_partitions)
  } else {
    undo_coalesce <- FALSE
  }

  columns <- colnames(df)
  numericIdxDf <- df %>% spark_apply(typeCheckPartition, packages = TRUE) %>% collect
  numericIdx <- numericIdxDf %>% as.data.frame %>% apply(2, all)

  doThese <- columns[numericIdx]
  df <- df %>% mutate_at(vars(all_of(doThese)), as.numeric)

  if (undo_coalesce)
    df <- df %>% sdf_repartition(num_partitions)

  return(df)
}

Just run the typeCastSpark function against your dataframe and it will cast to numeric every column whose values can all be parsed as numbers.

How to dynamically change data type of columns in data frame

If I understand correctly, you have one data frame whose Item column lists the names of columns in a second data frame, and you want to turn those columns of the second data frame into factors.

What about keeping a vector of the column names and then mutate them all?

colnames <- t %>%
  pull(Item) %>%
  as.character()

d_with_factors <- d %>%
  mutate_at(colnames, as.factor)

Then

sapply(d_with_factors, class)

Returns

       id       age  factor1  factor2 
"integer" "integer" "factor" "factor" 

