Converting Data Frame Column from Character to Numeric

How to convert a data frame column to numeric type?

Since (still) nobody got check-mark, I assume that you have some practical issue in mind, mostly because you haven't specified what type of vector you want to convert to numeric. I suggest that you should apply transform function in order to complete your task.

Now I'm about to demonstrate certain "conversion anomaly":

# create dummy data.frame
d <- data.frame(char = letters[1:5],
fake_char = as.character(1:5),
fac = factor(1:5),
char_fac = factor(letters[1:5]),
num = 1:5, stringsAsFactors = FALSE)

Let us have a glance at data.frame

> d
char fake_char fac char_fac num
1 a 1 1 a 1
2 b 2 2 b 2
3 c 3 3 c 3
4 d 4 4 d 4
5 e 5 5 e 5

and let us run:

> sapply(d, mode)
char fake_char fac char_fac num
"character" "character" "numeric" "numeric" "numeric"
> sapply(d, class)
char fake_char fac char_fac num
"character" "character" "factor" "factor" "integer"

Now you probably ask yourself "Where's an anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding thing, but it can confuse you, especially if you read this before rolling into bed.

Here goes: first two columns are character. I've deliberately called 2nd one fake_char. Spot the similarity of this character variable with one that Dirk created in his reply. It's actually a numerical vector converted to character. 3rd and 4th column are factor, and the last one is "purely" numeric.

If you utilize transform function, you can convert the fake_char into numeric, but not the char variable itself.

> transform(d, char = as.numeric(char))
char fake_char fac char_fac num
1 NA 1 1 a 1
2 NA 2 2 b 2
3 NA 3 3 c 3
4 NA 4 4 d 4
5 NA 5 5 e 5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

but if you do same thing on fake_char and char_fac, you'll be lucky, and get away with no NA's:

> transform(d, fake_char = as.numeric(fake_char), 
char_fac = as.numeric(char_fac))

char fake_char fac char_fac num
1 a 1 1 1 1
2 b 2 2 2 2
3 c 3 3 3 3
4 d 4 4 4 4
5 e 5 5 5 5

If you save transformed data.frame and check for mode and class, you'll get:

> D <- transform(d, fake_char = as.numeric(fake_char), 
char_fac = as.numeric(char_fac))

> sapply(D, mode)
char fake_char fac char_fac num
"character" "numeric" "numeric" "numeric" "numeric"
> sapply(D, class)
char fake_char fac char_fac num
"character" "numeric" "factor" "numeric" "integer"

So, the conclusion is: Yes, you can convert character vector into a numeric one, but only if it's elements are "convertible" to numeric. If there's just one character element in vector, you'll get error when trying to convert that vector to numerical one.

And just to prove my point:

> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion
> char
[1] 1 NA 3 4 NA

And now, just for fun (or practice), try to guess the output of these commands:

> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???

Kind regards to Patrick Burns! =)

Converting data frame column from character to numeric

If we need only one column to be numeric

yyz$b <- as.numeric(as.character(yyz$b))

But, if all the columns needs to changed to numeric, use lapply to loop over the columns and convert to numeric by first converting it to character class as the columns were factor.

yyz[] <- lapply(yyz, function(x) as.numeric(as.character(x)))

Both the columns in the OP's post are factor because of the string "n/a". This could be easily avoided while reading the file using na.strings = "n/a" in the read.table/read.csv or if we are using data.frame, we can have character columns with stringsAsFactors=FALSE (the default is stringsAsFactors=TRUE)


Regarding the usage of apply, it converts the dataset to matrix and matrix can hold only a single class. To check the class, we need

lapply(yyz, class)

Or

sapply(yyz, class)

Or check

str(yyz)

Convert data. frame column character to numeric

You can try,

mapply(function(x, y)paste(x + as.numeric(y), collapse = ','),df$C1 ,strsplit(df$C3, ','))
[1] "33,333,3933,433,4533,433,4233" "83,132,149,158,241,243,253,266,301" "146,149,159,275,420,424,529,627,628,642"

DATA

df <- data.frame(C1 = c(33, 83, 146), 
C2 = c(1, 2, 3),
C3 = c('0,300,3900,400,4500,400,4200', '0,49,66,75,158,160,170,183,218', '0,3,13,129,274,278,383,481,482,496'),
stringsAsFactors = FALSE)

EDIT
To make C3 into numeric you will have to split it into many columns. There are a bunch of ways to do it as shown here. I like the splitstackshape approach, i.e.

library(splitstackshape)
df1 <- cSplit(df, 'C3', sep = ',')

#C1 C2 C3_01 C3_02 C3_03 C3_04 C3_05 C3_06 C3_07 C3_08 C3_09 C3_10
#1: 33 1 33 333 3933 433 4533 433 4233 NA NA NA
#2: 83 2 83 132 149 158 241 243 253 266 301 NA
#3: 146 3 146 149 159 275 420 424 529 627 628 642

str(df1)
Classes ‘data.table’ and 'data.frame': 3 obs. of 12 variables:
$ C1 : num 33 83 146
$ C2 : num 1 2 3
$ C3_01: int 33 83 146
$ C3_02: int 333 132 149
$ C3_03: int 3933 149 159
$ C3_04: int 433 158 275
$ C3_05: int 4533 241 420
$ C3_06: int 433 243 424
$ C3_07: int 4233 253 529
$ C3_08: int NA 266 627
$ C3_09: int NA 301 628
$ C3_10: int NA NA 642

converting multiple columns from character to numeric format in r

You could try

DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)

# Check columns classes
sapply(DF, class)

# a b c
# "character" "character" "character"

cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)

# a b c
# "numeric" "numeric" "character"

For loop for converting character data to numeric in a data frame

Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list in the example below). To acheive this you could use mget() (ex: mget(onomata) or list.files()).

Once this is done, you can use lapply (or mapply) to go through all your dataframes.

Solution 1

To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){

df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time

# Extract elements you need from the date & time column and store them in new columns
df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))

return(df)
}

# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata: df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33

$df2
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33

And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.


Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes

Convert dataframe numeric column to character

I think you can do that and that should work :

coldata$Station<-as.character(coldata$Station)

Problems converting NA to numeric in a data frame in R

We can loop over the columns of dataset, replace the NAs with 0 and convert it to numeric (as there are some character columns)

df[] <- lapply(df, function(x) as.numeric(replace(x, is.na(x), 0)))

The OP's method of replacing the NAs with 0 first should also work, but the character columns remain as character unless we change it

df[is.na(df)] <-0 
df[] <- lapply(df, as.numeric)

Here, we don't have any factor columns, so as.character is not needed. Note that as.character/as.numeric are applied on vector/columns and not on the entire dataset



Related Topics



Leave a reply



Submit