Arithmetic Operations on R Factors

If you really want the factor's underlying integer level codes to be used in arithmetic, you're either doing something very wrong or being too clever for your own good.

If what you have is a factor containing numbers stored in the levels of the factor, then you want to coerce it to numeric first using as.numeric(as.character(...)):

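# stringsAsFactors = TRUE is needed in R >= 4.0.0, where the default is FALSE
dat <- data.frame(f = as.character(runif(10)), stringsAsFactors = TRUE)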

You can see the difference between reading the factor's integer codes and recovering the numbers stored in its levels here:

> as.numeric(dat$f)
[1] 9 7 2 1 4 6 5 3 10 8
> as.numeric(as.character(dat$f))
[1] 0.6369432 0.4455214 0.1204000 0.0336245 0.2731787 0.4219241 0.2910194
[8] 0.1868443 0.9443593 0.5784658
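
The same pitfall can be sketched with fixed values, which makes the wrong result easier to spot. The factor `f` below is made up for illustration and is not part of the example above:

```r
# Hypothetical factor whose levels are numbers stored as text
f <- factor(c("10", "2.5", "10", "0.1"))

levels(f)                               # levels are sorted as strings: "0.1" "10" "2.5"
codes  <- as.numeric(f)                 # internal integer codes: 2 3 2 1
values <- as.numeric(as.character(f))   # the stored numbers: 10 2.5 10 0.1

# Arithmetic on the codes runs without any warning but gives the wrong answer
sum(codes)    # 8
sum(values)   # 22.6
```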

Timing this against an alternative that converts only the levels and then indexes them shows that the alternative is faster when many elements share the same level:

# 10 distinct values sampled 10^4 times, so the factor has at most 10 levels
dat <- data.frame(f = sample(as.character(runif(10)), 10^4, replace = TRUE),
                  stringsAsFactors = TRUE)
library(microbenchmark)
microbenchmark(
  as.numeric(as.character(dat$f)),
  as.numeric(levels(dat$f))[dat$f],
  as.numeric(levels(dat$f)[dat$f]),
  times = 50
)

                              expr     min      lq  median      uq     max
1  as.numeric(as.character(dat$f)) 7835865 7869228 7919699 7998399 9576694
2 as.numeric(levels(dat$f))[dat$f]  237814  242947  255778  270321  371263
3 as.numeric(levels(dat$f)[dat$f]) 7817045 7905156 7964610 8121583 9297819

Therefore, if length(levels(dat$f)) < length(dat$f), use as.numeric(levels(dat$f))[dat$f] for a substantial speed gain.
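
To check that the faster form returns the same values, here is a small sketch on a made-up factor (`f`, `lookup`, `slow` and `fast` are illustrative names). It relies on the fact that indexing with a factor uses its integer codes:

```r
# Hypothetical factor with few levels but many elements
set.seed(1)
f <- factor(sample(c("0.25", "1.5", "3"), 20, replace = TRUE))

# Convert every element's label, one string at a time
slow <- as.numeric(as.character(f))

# Convert the levels once, then index the result by the factor's integer codes
lookup <- as.numeric(levels(f))
fast <- lookup[f]

identical(slow, fast)   # TRUE
```

The saving comes from calling as.numeric() on a handful of level strings instead of on one string per element.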

If length(levels(dat$f)) is approximately equal to length(dat$f), there is no speed gain:

# 10^4 distinct values, so (almost) every element has its own level
dat <- data.frame(f = as.character(runif(10^4)), stringsAsFactors = TRUE)
library(microbenchmark)
microbenchmark(
  as.numeric(as.character(dat$f)),
  as.numeric(levels(dat$f))[dat$f],
  as.numeric(levels(dat$f)[dat$f]),
  times = 50
)

                              expr     min      lq  median      uq      max
1  as.numeric(as.character(dat$f)) 7986423 8036895 8101480 8202850 12522842
2 as.numeric(levels(dat$f))[dat$f] 7815335 7866661 7949640 8102764 15809456
3 as.numeric(levels(dat$f)[dat$f]) 7989845 8040316 8122012 8330312 10420161

R maths operation NA values

Here is a base R solution using rowSums() with na.rm = TRUE, so that NA differences are dropped from each row's sum instead of turning the whole sum into NA:

data$j <- rowSums(abs(replicate(ncol(data) - 2, data$a) - data[-(1:2)]), na.rm = TRUE) / 156

such that

> data
  ID a b c  d  e  f  g  h  i           j
1  1 0 0 0  1 NA NA NA NA NA 0.006410256
2  2 0 0 0  1  1 NA NA NA NA 0.012820513
3  3 0 0 0  0  0 NA NA NA NA 0.000000000
4  4 0 0 0  0  0  0 NA NA NA 0.000000000
5  5 0 0 0 NA NA NA NA NA NA 0.000000000
6  6 0 0 0  0  0 NA NA NA NA 0.000000000

DATA

data <- structure(list(ID = 1:6, a = c(0, 0, 0, 0, 0, 0), b = c(0, 0, 
0, 0, 0, 0), c = c(0, 0, 0, 0, 0, 0), d = c(1, 1, 0, 0, NA, 0
), e = c(NA, 1, 0, 0, NA, 0), f = c(NA, NA, NA, 0, NA, NA), g = c(NA,
NA, NA, NA, NA, NA), h = c(NA, NA, NA, NA, NA, NA), i = c(NA,
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = "data.frame")
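
As a side note, subtracting a vector with one entry per row from a data frame recycles it down each column, so the replicate() step can probably be dropped. The sketch below compares both forms on a small made-up data frame (`toy` and the other names are hypothetical, and the division by 156 is left out):

```r
# Hypothetical toy data: an ID, a reference column `a`, and two value columns with NAs
toy <- data.frame(ID = 1:3, a = c(0, 1, 0), b = c(1, NA, 0), c = c(NA, 3, 2))

vals <- toy[-(1:2)]   # every column after ID and a

# Form used in the answer: build a matrix holding one copy of `a` per value column
with_rep <- rowSums(abs(replicate(ncol(vals), toy$a) - vals), na.rm = TRUE)

# Shorter variant: the vector toy$a is recycled down each column of vals
direct <- rowSums(abs(vals - toy$a), na.rm = TRUE)

with_rep   # 1 2 2
direct     # same values
```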

Add or multiply a different value by factor level

Let's make use of the internal integer representation of a factor:

df$x2 <- with(df, c(1, -1)[class] + x)

I don't recommend using df and class as variable names, however, because they mask base R functions of the same name. (Avoid data for the same reason.)

Some explanation: you have coded class with the factor levels "low" and "high", in that order, so they map to the integer codes 1 and 2. Try as.integer(df$class) to see this. Your code suggests that you want to add 1 to x for "low" and subtract 1 from x for "high", so we index the increment vector c(1, -1) by the factor, which picks one increment per row according to its level, and then add the result to x.
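
A minimal sketch of that indexing on made-up data (`d`, `grp` and `increment` are hypothetical names). The levels are set explicitly here; by default R sorts levels alphabetically, which would put "high" before "low" and flip the signs:

```r
# Hypothetical data; explicit levels make "low" code 1 and "high" code 2
d <- data.frame(
  grp = factor(c("low", "high", "high", "low"), levels = c("low", "high")),
  x   = c(10, 10, 20, 20)
)

as.integer(d$grp)              # 1 2 2 1
increment <- c(1, -1)[d$grp]   # 1 -1 -1 1: one increment per row, picked by level code
d$x2 <- d$x + increment        # 11 9 19 21
```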

Arithmetic operation based on value from another column

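We can arrange by 'Year' and subtract from 'PPT' its lead, i.e. the 'PPT' value five rows further down (n = 5). With default = 0, the last five rows, which have no value five rows ahead, keep their original 'PPT':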

library(dplyr)
df %>%
  arrange(Year) %>%
  mutate(newcol = PPT - lead(PPT, n = 5, default = 0))
#   code PPT Year newcol
#1   AFG 123 1990    119
#2   AGO  42 1991     19
#3   ALB  23 1992     -2
#4   AND   5 1993     -1
#5   ARB  23 1994   -611
#6   ARE   4 1995     -1
#7   ARG  23 1996  -5540
#8   ARM  25 1997    -31
#9   ASM   6 1998    -50
#10  ATG 634 1999    -11
#...

If some years are missing, we can first expand the data with complete() so that every year has a row, then apply the same mutate() and drop the rows that were filled in:

library(tidyr)
df %>%
  arrange(Year) %>%
  complete(Year = min(Year):max(Year)) %>%
  mutate(newcol = PPT - lead(PPT, n = 5, default = 0)) %>%
  filter(!is.na(PPT))

Or using base R (this version does not sort, so df must already be ordered by Year):

df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), tail(PPT, 5)))
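
To make the shifting explicit, here is a base R sketch of what a lead with a default value does. The helper `lead_by` is hypothetical (it is not part of dplyr), and `ppt` holds only the first eight PPT values from the data:

```r
# Hypothetical helper: shift x forward by n positions and pad the end with `default`
lead_by <- function(x, n, default = NA) {
  stopifnot(n >= 0)
  if (n >= length(x)) return(rep(default, length(x)))
  c(x[seq_len(length(x) - n) + n], rep(default, n))
}

ppt <- c(123, 42, 23, 5, 23, 4, 23, 25)   # first eight PPT values
shifted <- lead_by(ppt, n = 5, default = 0)

shifted         # 4 23 25 0 0 0 0 0
ppt - shifted   # 119 19 -2 5 23 4 23 25
```

The first three differences match the newcol values shown above. The later ones differ only because this shortened vector runs out of values five rows ahead sooner than the full data does.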

data

df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND",
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"),
PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56,
56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
Year = 1990:2014), class = "data.frame", row.names = c(NA,
-25L))

How to perform mathematical operation which is stored as character in R?

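You can try the code below. The gsub() call inserts an explicit * wherever a digit is directly followed by an opening parenthesis, so 12(2) becomes 12*(2), and eval(parse(text = x)) then evaluates each string as an R expression: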

> within(df, new.dimension <- sapply(gsub("(\\d)\\(", "\\1*(", dimension), function(x) eval(parse(text = x))))
  component                      dimension new.dimension
1         a                       12*10*05           600
2         b 30*30*20+174*153*21+108*014*04        583110
3         c                    98*98*12(2)        230496
4         d           30*30*20(2)+32*34*04         40352
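
The two steps can also be written out separately, which may be easier to follow. In this sketch the data frame is reconstructed from the printed output above, and `expr` is an illustrative name. Since eval(parse()) runs whatever code the strings contain, it assumes the dimension column comes from a trusted source:

```r
# Sample data reconstructed from the printed output above
df <- data.frame(
  component = c("a", "b", "c", "d"),
  dimension = c("12*10*05", "30*30*20+174*153*21+108*014*04",
                "98*98*12(2)", "30*30*20(2)+32*34*04"),
  stringsAsFactors = FALSE
)

# Step 1: make implicit multiplication explicit, e.g. "12(2)" becomes "12*(2)"
expr <- gsub("(\\d)\\(", "\\1*(", df$dimension)

# Step 2: parse and evaluate each string as an R expression
df$new.dimension <- vapply(expr, function(s) eval(parse(text = s)),
                           numeric(1), USE.NAMES = FALSE)

df$new.dimension   # 600 583110 230496 40352
```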

How to do multiple operations, ignoring NAs, in R

If you want to change all NAs to 0 you can do:

df <- data.frame(d1=c(2,2,2,2), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
df.new <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))

or (thanks to Sotos!):

df[is.na(df)] <- 0  

But be careful: this works well for data frames in which all columns are numeric. With other column types (factors, for example) you might run into problems. Here is a solution that replaces NAs only in the numeric columns:

df <- data.frame(d1=c(2,2,2,2), dx=c("A", "bb", "C", "DD"), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
numCols <- sapply(df, is.numeric)

df[, numCols][is.na(df[, numCols])] <- 0
df

How to use apply() with arithmetic functions (R)

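Is this what you are looking for? Since min(c_i)*sign(min(c_i)) equals abs(min(c_i)), this adds 1.05 times the absolute value of each column's minimum to that column: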

apply(U, 2, function(c_i) { c_i + min(c_i)*sign(min(c_i))*1.05 })
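
A small sketch of the effect, using a made-up matrix in place of U (the question's actual U is not shown here) and abs() in place of the min-times-sign product:

```r
# Hypothetical matrix standing in for U
U <- matrix(c(-2, 1, 3,
               4, -1, 0), nrow = 3)

# Shift each column up by 1.05 times the absolute value of its own minimum
shifted <- apply(U, 2, function(c_i) c_i + abs(min(c_i)) * 1.05)

# Column 1 (minimum -2) is shifted by 2.1, column 2 (minimum -1) by 1.05,
# so a negative column minimum ends up slightly above zero
shifted
```

Because the function returns a vector as long as the column it receives, apply() reassembles the results into a matrix with the same dimensions as U.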

mathematical operations between the grouped data and a dataframe in R

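Using match() to look up, for each row of ss, the row of zz with the same year, and then dividing the value columns of ss by the matching rows of zz: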

cbind(ss[1:2], ss[-(1:2)] / zz[match(ss$year, zz$year), -(1:2)])
#    country year          x         y         z
#  1       a 1961  9.5000000  7.666667 22.000000
#  2       b 1962  4.0000000 20.000000  2.000000
#  3       c 1963  1.0000000 14.000000  7.666667
#  4       d 1961 11.5000000  2.333333 16.000000
#  5       e 1962 24.0000000  4.000000 14.500000
#  6       f 1963  5.3333333 12.500000  4.666667
#  7       g 1961 14.0000000  1.666667 11.000000
#  8       h 1962  9.0000000  8.000000  6.500000
#  9       k 1963  9.6666667  5.000000  9.000000
# 10       v 1961 10.0000000  4.333333 26.000000
# 11       a 1962 14.0000000  9.000000  2.500000
# 12       b 1963  7.0000000  0.500000  4.000000
# 13       c 1961 15.0000000  7.000000  2.000000
# 14       d 1962  1.0000000 11.000000  4.500000
# 15       e 1963  4.0000000 13.000000  3.333333
# 16       f 1961  8.5000000  5.333333 25.000000
# 17       g 1962 25.0000000 27.000000  3.500000
# 18       h 1963  8.6666667  1.000000  7.000000
# 19       k 1961  6.5000000  9.666667  6.000000
# 20       v 1962  8.0000000 24.000000 10.000000
# 21       a 1963  0.6666667  1.500000  1.000000
# 22       b 1961  3.5000000  5.000000 30.000000
# 23       c 1962 10.0000000  6.000000  9.000000
# 24       d 1963  3.6666667  9.500000  2.666667
# 25       e 1961  3.0000000  4.666667  1.000000
# 26       f 1962 22.0000000 22.000000 12.000000
# 27       g 1963  9.0000000  6.000000  5.666667
# 28       h 1961  2.5000000  6.000000 15.000000
# 29       k 1962 15.0000000 17.000000 14.000000
# 30       v 1963  6.0000000 15.000000  6.333333
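
Since ss and zz themselves are not shown, here is a self-contained sketch of the same lookup-and-divide idea on made-up inputs. Both data frames and their column layout are assumptions, and the value columns are selected by name rather than by position:

```r
# Hypothetical inputs: `ss` has one row per country and year,
# `zz` has one row of divisors per year
ss <- data.frame(country = c("a", "b", "c", "d"),
                 year    = c(1961, 1962, 1961, 1963),
                 x       = c(10, 20, 30, 40),
                 y       = c(2, 4, 6, 8))
zz <- data.frame(year = c(1961, 1962, 1963),
                 x    = c(2, 4, 8),
                 y    = c(1, 2, 4))

# For each row of ss, find the position of the same year in zz
idx <- match(ss$year, zz$year)   # 1 2 1 3

# Divide the value columns of ss by the matching rows of zz, column by column
value_cols <- c("x", "y")
res <- cbind(ss[c("country", "year")], ss[value_cols] / zz[idx, value_cols])

res$x   # 5 5 15 5
res$y   # 2 2 6 2
```

Indexing zz by `idx` repeats its rows so that it lines up with ss row for row, which is what allows the element-wise division of the two data frames.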

