How to convert all column data type to numeric and character dynamically?
If you don't know which columns need to be converted beforehand, you can extract that info from your dataframe as follows:
vec <- sapply(dat, is.factor)
which gives:
> vec
particles humidity timestamp date
TRUE TRUE FALSE FALSE
You can then use this vector to do the conversion on the subset with lapply:
# notation option one (if vec selects a single column, dat[, vec] drops to a vector; option two avoids that):
dat[, vec] <- lapply(dat[, vec], function(x) as.numeric(as.character(x)))
# notation option two:
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))
If you want to detect both factor and character columns, you can use:
sapply(dat, function(x) is.factor(x)|is.character(x))
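To see the whole approach in one place, here is a minimal sketch on a made-up data frame. The column names and values are assumptions for illustration only, not the data from the question.

```r
# Hypothetical example data: one factor column, one character column,
# and one column that is already an integer.
dat <- data.frame(
  particles = factor(c("10", "25", "7")),
  humidity  = c("0.41", "0.39", "0.52"),
  id        = 1:3,
  stringsAsFactors = FALSE
)

# Flag the columns that are stored as factor or character.
vec <- sapply(dat, function(x) is.factor(x) || is.character(x))

# Convert only the flagged columns. Going through as.character() keeps
# the printed values rather than the factor's internal integer codes.
dat[vec] <- lapply(dat[vec], function(x) as.numeric(as.character(x)))

sapply(dat, class)
```

Columns that were not flagged (here `id`) are left untouched.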
R - convert datatype of all columns in a dataframe from character to numeric dynamically
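# apply() would coerce the data frame to a character matrix, so loop over the columns with lapply() instead
df[] <- lapply(df, function(x){
if(is.character(x)){
# map each distinct string to an integer code (1, 2, ...)
x <- as.integer(as.factor(x))
}
# return every column, converted or not
x
})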
## I believe that this should work
Change the class from factor to numeric of many columns in a data frame
Further to Ramnath's answer, the behaviour you are seeing arises because as.numeric(x) returns the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.
Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:
stats[,i] <- as.numeric(stats[,i])
to read
stats[,i] <- as.numeric(as.character(stats[,i]))
This is FAQ 7.10 in the R FAQ.
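A small illustration of the difference, using a made-up factor (the values are assumptions for demonstration only):

```r
# A factor whose levels look like numbers.
f <- factor(c("10", "5", "10", "20"))

# Internal integer codes, i.e. positions within levels(f), not the values.
codes <- as.numeric(f)

# The values that are actually shown when the factor is printed.
values <- as.numeric(as.character(f))

# Variant described in the R FAQ: convert the levels once, then index by code.
values_faq <- as.numeric(levels(f))[as.integer(f)]

codes
values
values_faq
```

The last two give the same result; the FAQ variant converts each distinct level only once, which can matter for long vectors.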
HTH
How to convert data.frame column from Factor to numeric
breast$class <- as.numeric(as.character(breast$class))
If you have many columns to convert to numeric
indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))
Another option is to use stringsAsFactors=FALSE while reading the file using read.table or read.csv.
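As a rough sketch of that option, the snippet below reads made-up CSV content supplied inline (the data and the `csv_text` name are assumptions for illustration). In R 4.0.0 and later, stringsAsFactors = FALSE is already the default, so the argument mainly matters on older versions.

```r
# Made-up CSV content passed through the `text` argument instead of a file.
csv_text <- "id,class\n1,2\n2,4\n3,benign"

breast <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# `class` is read as a character vector rather than a factor.
class_before <- class(breast$class)

# A character column converts directly; entries that are not numbers
# become NA (the warning is suppressed here on purpose).
breast$class <- suppressWarnings(as.numeric(breast$class))

class_before
breast$class
```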
Just in case, other options to create/change columns
breast[,'class'] <- as.numeric(as.character(breast[,'class']))
or
breast <- transform(breast, class=as.numeric(as.character(class)))
How to convert all columns where entries have length ≤1 to numeric?
The best function for these situations is type_convert(), from readr:
"[type_convert() re-converts character columns in a data frame], which is useful if you need to do some manual munging - you can read the columns in as character, clean it up with (e.g.) regular expressions and other transformations, and then let readr take another stab at parsing it."
So, all you need to do is add it at the end of your pipe:
df %>% ... %>% type_convert()
Alternatively, we can use type.convert from base R, which automatically detects the column type based on the values and changes it:
df[] <- type.convert(df, as.is = TRUE)
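A minimal sketch of what this does, on a hypothetical data frame in which every column starts out as character (names and values are made up):

```r
# Every column is character to begin with.
df <- data.frame(
  a    = c("1", "2", "3"),
  b    = c("1.5", "2.5", "x"),
  flag = c("TRUE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# as.is = TRUE leaves columns that cannot be parsed as character
# instead of turning them into factors.
df[] <- type.convert(df, as.is = TRUE)

sapply(df, class)
```

Note that a single non-numeric entry (the "x" in column `b`) is enough to keep the whole column as character.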
If the constraint is to look for columns that have only one character
i1 <- !colSums(nchar(as.matrix(df)) > 1)
df[i1] <- type.convert(df[i1], as.is = TRUE)
If we want to use tidyverse, there is parse_guess from readr:
library(tidyverse)
library(readr)
df %>%
mutate_if(~ is.character(.) && all(nchar(.) <= 1), parse_guess)
Dynamically type casting to numeric in dplyr and sparklyr
I didn't find an exact solution per se, but I did find a workaround.
typeCheckPartition <- function(df)
{
require(dplyr)
require(varhandle)
checkNumeric <- function(column)
{
column %>% as.data.frame %>% .[,1] %>% varhandle::check.numeric(.) %>% all
}
# this works on non-spark data frames
columns <- colnames(df)
numericIdx <- df %>% mutate(across(all_of(columns), checkNumeric)) %>% .[1,]
return(numericIdx)
}
typeCastSpark <- function(df, max_partitions = 1000, undo_coalesce = T)
{
# numericIdxDf will have these dimensions: num_partition rows x num_columns
# so long as num_columns is not absurd, this coalesce should make collect a safe operation
num_partitions <- sdf_num_partitions(df)
if (num_partitions > max_partitions)
{
undo_coalesce <- T && undo_coalesce
df <- df %>% sdf_coalesce(max_partitions)
} else
{
undo_coalesce <- F
}
columns <- colnames(df)
numericIdxDf <- df %>% spark_apply(typeCheckPartition, packages=T) %>% collect
numericIdx <- numericIdxDf %>% as.data.frame %>% apply(2, all)
doThese <- columns[which(numericIdx==T)]
df <- df %>% mutate_at(vars(all_of(doThese)), as.numeric)
if (undo_coalesce)
df <- df %>% sdf_repartition(num_partitions)
return(df)
}
Just run the typeCastSpark function against your dataframe and it will cast every column that can be converted to numeric.
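The check-then-cast idea can also be tried locally without Spark. The sketch below is a base R approximation on an ordinary data frame: `can_be_numeric` is a hypothetical helper standing in for varhandle::check.numeric (it does not reproduce that function's exact rules), and the data are made up.

```r
# Hypothetical helper: TRUE when every non-missing value in x parses as a number.
can_be_numeric <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  # A value that was present but failed to parse shows up as a new NA.
  !any(is.na(parsed) & !is.na(x))
}

df <- data.frame(
  a = c("1", "2", NA),
  b = c("x", "3", "4"),
  stringsAsFactors = FALSE
)

# Names of the columns in which all values can be cast.
do_these <- names(df)[sapply(df, can_be_numeric)]

# Cast only those columns; the rest stay as they are.
df[do_these] <- lapply(df[do_these], function(x) as.numeric(as.character(x)))

sapply(df, class)
```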
How to dynamically change data type of columns in data frame
If I understand correctly, you have one data frame whose values name the columns of a second data frame that should be factors. You want to extract those names from the first data frame and convert the matching columns in the second one to factors.
What about keeping a vector of the column names and then mutate them all?
colnames <- t %>%
pull(Item) %>%
as.character()
d_with_factors <- d %>%
mutate_at(colnames, as.factor)
Then
sapply(d_with_factors, class)
Returns
id age factor1 factor2
"integer" "integer" "factor" "factor"