How to convert a data frame column to numeric type?
Since (still) nobody has got a check-mark, I assume that you have some practical issue in mind, mostly because you haven't specified what type of vector you want to convert to numeric. I suggest that you apply the transform function to complete your task.
Now I'm about to demonstrate a certain "conversion anomaly":
# create dummy data.frame
d <- data.frame(char = letters[1:5],
fake_char = as.character(1:5),
fac = factor(1:5),
char_fac = factor(letters[1:5]),
num = 1:5, stringsAsFactors = FALSE)
Let us take a glance at the data.frame:
> d
char fake_char fac char_fac num
1 a 1 1 a 1
2 b 2 2 b 2
3 c 3 3 c 3
4 d 4 4 d 4
5 e 5 5 e 5
and let us run:
> sapply(d, mode)
char fake_char fac char_fac num
"character" "character" "numeric" "numeric" "numeric"
> sapply(d, class)
char fake_char fac char_fac num
"character" "character" "factor" "factor" "integer"
Now you probably ask yourself "Where's an anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding thing, but it can confuse you, especially if you read this before rolling into bed.
Here goes: the first two columns are character. I've deliberately called the 2nd one fake_char. Spot the similarity of this character variable with the one that Dirk created in his reply. It's actually a numerical vector converted to character. The 3rd and 4th columns are factor, and the last one is "purely" numeric (integer, to be precise).
If you utilize the transform function, you can convert fake_char into numeric, but not the char variable itself: its letters cannot be parsed as numbers, so every element becomes NA.
> transform(d, char = as.numeric(char))
char fake_char fac char_fac num
1 NA 1 1 a 1
2 NA 2 2 b 2
3 NA 3 3 c 3
4 NA 4 4 d 4
5 NA 5 5 e 5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
But if you do the same thing on fake_char and char_fac, you'll be lucky and get away with no NAs:
> transform(d, fake_char = as.numeric(fake_char),
char_fac = as.numeric(char_fac))
char fake_char fac char_fac num
1 a 1 1 1 1
2 b 2 2 2 2
3 c 3 3 3 3
4 d 4 4 4 4
5 e 5 5 5 5
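Note that for a factor, as.numeric() returns the internal level codes, not the labels: char_fac comes out as 1 to 5 because those are the codes of its five levels. A minimal sketch with a hypothetical factor f whose labels are numbers shows why the detour through as.character() matters:

```r
# Hypothetical factor whose labels are numbers that differ from their level codes
f <- factor(c("10", "5", "20", "5"))

levels(f)                    # levels are sorted as strings: "10" "20" "5"
as.numeric(f)                # level codes:  1 3 2 3
as.numeric(as.character(f))  # labels:      10 5 20 5
as.numeric(levels(f))[f]     # same labels, converting each distinct level only once
```

The last form indexes the converted levels by the factor's codes, which can be cheaper for long factors with few levels.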
If you save the transformed data.frame and check for mode and class, you'll get:
> D <- transform(d, fake_char = as.numeric(fake_char),
char_fac = as.numeric(char_fac))
> sapply(D, mode)
char fake_char fac char_fac num
"character" "numeric" "numeric" "numeric" "numeric"
> sapply(D, class)
char fake_char fac char_fac num
"character" "numeric" "factor" "numeric" "integer"
So, the conclusion is: yes, you can convert a character vector into a numeric one, but only if its elements are "convertible" to numeric. If there's even one non-numeric character element in the vector, that element becomes NA (with an "NAs introduced by coercion" warning) when you convert the vector to numeric.
And just to prove my point:
> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion
> char
[1] 1 NA 3 4 NA
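If you would rather find out which elements failed than just get a warning, a small wrapper can compare the NAs before and after conversion. This is a sketch; to_numeric_checked is a hypothetical helper, not a base R function:

```r
# Hypothetical helper (not part of base R): convert to numeric and report
# which non-missing elements could not be parsed
to_numeric_checked <- function(x) {
  x_chr <- as.character(x)                  # as.character() first so factors give labels
  out <- suppressWarnings(as.numeric(x_chr))
  bad <- is.na(out) & !is.na(x_chr)         # NA after conversion but not before
  if (any(bad)) {
    message("Not convertible: ", paste(unique(x_chr[bad]), collapse = ", "))
  }
  out
}

err <- c(1, "b", 3, 4, "e")
to_numeric_checked(err)   # reports b and e, returns c(1, NA, 3, 4, NA)
```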
And now, just for fun (or practice), try to guess the output of these commands:
> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???
Kind regards to Patrick Burns! =)
Converting data frame column from character to numeric
If we need only one column to be numeric:
yyz$b <- as.numeric(as.character(yyz$b))
But if all the columns need to be changed to numeric, use lapply to loop over the columns and convert each to numeric by first converting it to character, as the columns were factor.
yyz[] <- lapply(yyz, function(x) as.numeric(as.character(x)))
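A minimal sketch of the lapply approach on a hypothetical yyz (the OP's real data isn't shown here), where "n/a" has turned both columns into factors:

```r
# Hypothetical stand-in for the OP's yyz: "n/a" made both columns factors
yyz <- data.frame(a = factor(c("1.5", "n/a", "10")),
                  b = factor(c("n/a", "2", "3")))

# Wrong: as.numeric() on a factor returns level codes, not the labels
as.numeric(yyz$a)

# Right: go through as.character(); "n/a" becomes NA with a coercion warning
yyz[] <- lapply(yyz, function(x) as.numeric(as.character(x)))
str(yyz)
```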
Both columns in the OP's post are factor because of the string "n/a". This could easily be avoided while reading the file by using na.strings = "n/a" in read.table/read.csv, or, if we are using data.frame, we can keep character columns with stringsAsFactors=FALSE (the default was stringsAsFactors=TRUE before R 4.0.0 and is FALSE since then).
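A sketch of the na.strings fix, using read.csv(text = ...) with hypothetical CSV content in place of the OP's file:

```r
# Hypothetical CSV content standing in for the OP's file
csv_text <- "a,b
1.5,n/a
n/a,2
10,3"

# Without na.strings, "n/a" is just text, so both columns are read as
# character (or factor before R 4.0.0)
raw <- read.csv(text = csv_text)
sapply(raw, class)

# With na.strings = "n/a", the columns are read as numbers directly
# (a as "numeric", b as "integer")
clean <- read.csv(text = csv_text, na.strings = "n/a")
sapply(clean, class)
```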
Regarding the usage of apply: it converts the dataset to a matrix, and a matrix can hold only a single class. To check the class of each column, we need
lapply(yyz, class)
Or
sapply(yyz, class)
Or check
str(yyz)
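To see the matrix coercion in action, a small sketch with a hypothetical two-column data frame (one numeric, one character column):

```r
# Hypothetical mixed data frame: one numeric and one character column
mixed <- data.frame(x = c(1.5, 2.5), y = c("a", "b"), stringsAsFactors = FALSE)

sapply(mixed, class)      # per column: x "numeric", y "character"
apply(mixed, 2, class)    # via a matrix: both "character"
typeof(as.matrix(mixed))  # what apply() works on internally: "character"
```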
Convert data.frame column character to numeric
You can try:
mapply(function(x, y) paste(x + as.numeric(y), collapse = ','), df$C1, strsplit(df$C3, ','))
[1] "33,333,3933,433,4533,433,4233" "83,132,149,158,241,243,253,266,301" "146,149,159,275,420,424,529,627,628,642"
DATA
df <- data.frame(C1 = c(33, 83, 146),
C2 = c(1, 2, 3),
C3 = c('0,300,3900,400,4500,400,4200', '0,49,66,75,158,160,170,183,218', '0,3,13,129,274,278,383,481,482,496'),
stringsAsFactors = FALSE)
EDIT
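To make C3 numeric you will have to split it into many columns. There are a bunch of ways to do it as shown here. I like the splitstackshape approach. Note that the output below shows the summed values produced by the mapply call above (33, 333, 3933, ...), not the raw C3 values: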
library(splitstackshape)
df1 <- cSplit(df, 'C3', sep = ',')
#C1 C2 C3_01 C3_02 C3_03 C3_04 C3_05 C3_06 C3_07 C3_08 C3_09 C3_10
#1: 33 1 33 333 3933 433 4533 433 4233 NA NA NA
#2: 83 2 83 132 149 158 241 243 253 266 301 NA
#3: 146 3 146 149 159 275 420 424 529 627 628 642
str(df1)
Classes ‘data.table’ and 'data.frame': 3 obs. of 12 variables:
$ C1 : num 33 83 146
$ C2 : num 1 2 3
$ C3_01: int 33 83 146
$ C3_02: int 333 132 149
$ C3_03: int 3933 149 159
$ C3_04: int 433 158 275
$ C3_05: int 4533 241 420
$ C3_06: int 433 243 424
$ C3_07: int 4233 253 529
$ C3_08: int NA 266 627
$ C3_09: int NA 301 628
$ C3_10: int NA NA 642
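If you'd rather not depend on splitstackshape, a base R sketch (my own alternative, not part of the answer above) pads each split vector with NA and binds them into columns. It works on the raw C3 values:

```r
# Same example data as above
df <- data.frame(C1 = c(33, 83, 146),
                 C2 = c(1, 2, 3),
                 C3 = c('0,300,3900,400,4500,400,4200',
                        '0,49,66,75,158,160,170,183,218',
                        '0,3,13,129,274,278,383,481,482,496'),
                 stringsAsFactors = FALSE)

parts <- lapply(strsplit(df$C3, ","), as.numeric)  # list of numeric vectors
n <- max(lengths(parts))                           # widest row

# Pad each vector with NA to length n; vapply gives an n x nrow matrix, so transpose
wide <- t(vapply(parts, function(v) c(v, rep(NA_real_, n - length(v))), numeric(n)))
colnames(wide) <- sprintf("C3_%02d", seq_len(n))

df1 <- cbind(df[c("C1", "C2")], wide)
str(df1)
```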
converting multiple columns from character to numeric format in r
You could try
DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
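An equivalent sketch with lapply(), which hands each converted column back as a list element rather than first simplifying all of them into a matrix as sapply() does:

```r
DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)
cols.num <- c("a", "b")

# One converted vector per selected column, assigned back in place
DF[cols.num] <- lapply(DF[cols.num], as.numeric)
sapply(DF, class)
```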
For loop for converting character data to numeric in a data frame
Here are two possible ways. Both rely on getting all your files into a list of dataframes (called df_list in the example below). To achieve this you could use mget() (e.g. mget(onomata)) or list.files().
Once this is done, you can use lapply (or mapply) to go through all your dataframes.
Solution 1
To transform your data, I propose you first convert it into POSIXct format and then extract the relevant elements to make the wanted columns.
# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
# Extract elements you need from the date & time column and store them in new columns
df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
return(df)
}
# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)
Adapted from Split date data (m/d/y) into 3 separate columns (this answer)
Data:
# two dummy dataframes; the date and hour formats do not matter, since you can tell as.POSIXct what to expect using the format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata: df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))
Outputs:
> lapply(df_list, FUN = fun_split_datehour)
$df1
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33
$df2
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33
And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).
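As an alternative to the format() calls in Solution 1, the components can be read straight off a POSIXlt object. A sketch, assuming the same dummy df1 and UTC as the timezone (my assumption, so the result does not depend on your system's timezone setting):

```r
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))

# Parse date and time together; tz = "UTC" is an assumption for reproducibility
lt <- as.POSIXlt(paste(df1$date, df1$hour), format = "%d/%m/%Y %H:%M", tz = "UTC")

parts_df <- data.frame(year  = lt$year + 1900,  # POSIXlt stores years since 1900
                       month = lt$mon + 1,      # and months as 0-11
                       day   = lt$mday,
                       hour  = lt$hour,
                       min   = lt$min)
str(parts_df)
```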
Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 useful. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.
Solution 2
If you do not want to use POSIXct, you could do:
# Dummy data changed to match your situation, with the date already split
dfa <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))
# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes
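A sketch of Solution 2 with lapply() in place of apply(): apply() turns the selected columns into a matrix and only those columns are returned, whereas assigning into df[cols] keeps the rest of each data frame. cols is an assumed vector of column names, and as.character() is there in case the columns are factors (the data.frame() default before R 4.0.0):

```r
dfa <- data.frame(day = c("02", "03", "10"), hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"), hour = c("05", "08", "15"))
df_list <- list(dfa = dfa, dfb = dfb)

cols <- c("day", "hour")  # assumed names of the columns to convert

df_list <- lapply(df_list, function(df) {
  # Convert column by column; as.character() guards against factor columns
  df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))
  df
})
str(df_list)
```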
Convert dataframe numeric column to character
I think this should work:
coldata$Station <- as.character(coldata$Station)
Problems converting NA to numeric in a data frame in R
We can loop over the columns of the dataset, replace the NAs with 0 and convert them to numeric (as there are some character columns):
df[] <- lapply(df, function(x) as.numeric(replace(x, is.na(x), 0)))
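A minimal sketch on hypothetical data with character columns containing NA:

```r
# Hypothetical data: numbers stored as text, with some NAs
df <- data.frame(x = c("1", NA, "3"),
                 y = c(NA, "2.5", "4"),
                 stringsAsFactors = FALSE)

# replace() puts "0" in the NA slots of each character column,
# then as.numeric() converts the whole column
df[] <- lapply(df, function(x) as.numeric(replace(x, is.na(x), 0)))
str(df)
```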
The OP's method of replacing the NAs with 0 first should also work, but the character columns remain character unless we convert them:
df[is.na(df)] <- 0
df[] <- lapply(df, as.numeric)
Here we don't have any factor columns, so as.character is not needed. Note that as.character/as.numeric are applied to vectors/columns, not to the entire dataset.