Arithmetic Operations on R Factors

If you really want to do arithmetic on the factor's internal integer codes, you're either doing something very wrong or being too clever for your own good.

If what you have is a factor containing numbers stored in the levels of the factor, then you want to coerce it to numeric first using as.numeric(as.character(...)):

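# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
dat <- data.frame(f = as.character(runif(10)), stringsAsFactors = TRUE)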

You can see the difference between extracting the factor's integer codes and converting its level labels to numbers here:

> as.numeric(dat$f)
 [1]  9  7  2  1  4  6  5  3 10  8
> as.numeric(as.character(dat$f))
 [1] 0.6369432 0.4455214 0.1204000 0.0336245 0.2731787 0.4219241 0.2910194
 [8] 0.1868443 0.9443593 0.5784658
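
A small, deterministic illustration of the same pitfall may help. The factor below is made up for this sketch; its levels sort as strings, so the integer codes bear no relation to the numeric values.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("3.5", "10", "2"))

levels(f)                               # "10" "2" "3.5" (sorted as strings)
codes  <- as.numeric(f)                 # integer codes: 3 1 2
values <- as.numeric(as.character(f))   # the numbers themselves: 3.5 10.0 2.0
```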

Timings against an alternative approach that converts only the levels show that the alternative is faster when the levels are not unique to each element (that is, when values repeat):

dat <- data.frame( f = sample(as.character(runif(10)),10^4,replace=TRUE), stringsAsFactors = TRUE )
library(microbenchmark)
microbenchmark(
  as.numeric(as.character(dat$f)),
  as.numeric( levels(dat$f) )[dat$f] ,
  as.numeric( levels(dat$f)[dat$f] ),
  times=50
  )

                              expr     min      lq  median      uq     max
1  as.numeric(as.character(dat$f)) 7835865 7869228 7919699 7998399 9576694
2 as.numeric(levels(dat$f))[dat$f]  237814  242947  255778  270321  371263
3 as.numeric(levels(dat$f)[dat$f]) 7817045 7905156 7964610 8121583 9297819

Therefore, if length(levels(dat$f)) < length(dat$f), use as.numeric(levels(dat$f))[dat$f] for a substantial speed gain.
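
The likely reason for the gain is that as.numeric(levels(f)) parses each distinct label only once, and [f] then expands the result using the integer codes. A minimal sketch on made-up data, checking that both routes agree:

```r
# Hypothetical factor with many repeats of a few labels
f <- factor(rep(c("0.5", "1.25", "7"), times = c(3, 2, 4)))

slow <- as.numeric(as.character(f))   # parses all 9 strings
fast <- as.numeric(levels(f))[f]      # parses 3 strings, then indexes by code

identical(slow, fast)                 # both routes give the same numeric vector
```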

If length(levels(dat$f)) is approximately equal to length(dat$f), there is no speed gain:

dat <- data.frame( f = as.character(runif(10^4) ), stringsAsFactors = TRUE )
library(microbenchmark)
microbenchmark(
  as.numeric(as.character(dat$f)),
  as.numeric( levels(dat$f) )[dat$f] ,
  as.numeric( levels(dat$f)[dat$f] ),
  times=50
  )

                              expr     min      lq  median      uq      max
1  as.numeric(as.character(dat$f)) 7986423 8036895 8101480 8202850 12522842
2 as.numeric(levels(dat$f))[dat$f] 7815335 7866661 7949640 8102764 15809456
3 as.numeric(levels(dat$f)[dat$f]) 7989845 8040316 8122012 8330312 10420161

R maths operation NA values

Here is a base R solution using rowSums(), where the option na.rm should be set to TRUE.

You can try the code below for your objective:

data$j <- rowSums(abs(replicate(ncol(data) - 2, data$a) - data[-(1:2)]), na.rm = TRUE) / 156

which gives

> data
  ID a b c  d  e  f  g  h  i           j
1  1 0 0 0  1 NA NA NA NA NA 0.006410256
2  2 0 0 0  1  1 NA NA NA NA 0.012820513
3  3 0 0 0  0  0 NA NA NA NA 0.000000000
4  4 0 0 0  0  0  0 NA NA NA 0.000000000
5  5 0 0 0 NA NA NA NA NA NA 0.000000000
6  6 0 0 0  0  0 NA NA NA NA 0.000000000
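
To see what na.rm = TRUE does in isolation, here is a minimal sketch on a made-up data frame (the name m and its values are illustrative only). Note that a row consisting entirely of NA sums to 0 rather than NA.

```r
# Made-up data: row 2 has one NA, row 3 is all NA
m <- data.frame(p = c(1, NA, NA), q = c(2, 3, NA))

with_na    <- rowSums(m)                 # 3 NA NA
without_na <- rowSums(m, na.rm = TRUE)   # 3 3 0
```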

DATA

data <- structure(list(ID = 1:6, a = c(0, 0, 0, 0, 0, 0), b = c(0, 0, 
0, 0, 0, 0), c = c(0, 0, 0, 0, 0, 0), d = c(1, 1, 0, 0, NA, 0
), e = c(NA, 1, 0, 0, NA, 0), f = c(NA, NA, NA, 0, NA, NA), g = c(NA, 
NA, NA, NA, NA, NA), h = c(NA, NA, NA, NA, NA, NA), i = c(NA, 
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = "data.frame")

Add or multiply a different value by factor level

Let's make use of the internal integer representation of a factor:

df$x2 <- with(df, c(1, -1)[class] + x)

I don't recommend using df and class as variable names, however, because they share their names with built-in R functions. (Avoid data for the same reason.)

Some explanation: you have coded class with factor levels "low" and "high" (in that order), so they map to the integer codes 1 and 2. Try as.integer(df$class) to see this. Your code suggests you want to add 1 to x for "low" and subtract 1 from x for "high", so we index the increment vector c(1, -1) by the factor, which picks an increment for each row according to its level, and then add the result to x.
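
A minimal sketch of that indexing, with made-up data (grp and x are illustrative names). It assumes the levels are set explicitly; left to its default, factor() sorts levels alphabetically and "high" would get code 1.

```r
# Illustrative data; the level order is given explicitly
grp <- factor(c("low", "high", "low"), levels = c("low", "high"))
x   <- c(10, 20, 30)

as.integer(grp)            # 1 2 1
incr <- c(1, -1)[grp]      # 1 -1 1: one increment per element, chosen by level
x2   <- x + incr           # 11 19 31
```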

Arithmetic operation based on value from another column

We can arrange by 'Year' and take the difference between 'PPT' and the lead of 'PPT', with 'n' set to 5:

library(dplyr)
df %>%
    arrange(Year) %>% 
    mutate(newcol = PPT - lead(PPT, n = 5, default = 0))
#    code  PPT Year newcol
#1   AFG  123 1990    119
#2   AGO   42 1991     19
#3   ALB   23 1992     -2
#4   AND    5 1993     -1
#5   ARB   23 1994   -611
#6   ARE    4 1995     -1
#7   ARG   23 1996  -5540
#8   ARM   25 1997    -31
#9   ASM    6 1998    -50
#10  ATG  634 1999    -11
#...

If some years are missing, we can expand the data with complete() and then do the mutate:

library(tidyr)
df %>% 
    arrange(Year) %>% 
    complete(Year = min(Year):max(Year)) %>%
    mutate(newcol = PPT - lead(PPT, n = 5, default = 0)) %>%
    filter(!is.na(PPT))
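
To see why the gaps matter, it may help to look at what filling in missing years does. The sketch below uses base R's merge() rather than tidyr's complete(), on a made-up data frame (gappy), so it is an analogue and not the tidyr implementation:

```r
# Made-up data with 1992 missing
gappy <- data.frame(Year = c(1990, 1991, 1993), PPT = c(5, 7, 9))

# One row per year in the full range; years absent from gappy get NA
all_years <- data.frame(Year = min(gappy$Year):max(gappy$Year))
filled    <- merge(all_years, gappy, all.x = TRUE)

filled$Year   # 1990 1991 1992 1993
filled$PPT    # 5 7 NA 9
```

Once every year has a row, shifting by n rows is the same as shifting by n years, which is what the lead() call relies on.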

Or using base R

df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), tail(PPT, 5)))
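
The head()/tail() expression can also be written as a small reusable helper. This is only a sketch: lead_base is a hypothetical name, not a base R function, and it assumes n is a non-negative whole number.

```r
# Hypothetical helper: shift x forward by n positions, padding with `default`
lead_base <- function(x, n = 1, default = 0) {
  idx <- seq_along(x) + n
  out <- x[idx]                      # positions past the end come back as NA
  out[idx > length(x)] <- default    # replace only those padded positions
  out
}

x   <- c(10, 20, 30, 40)
res <- x - lead_base(x, n = 2)       # -20 -20 30 40
```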

data

df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND", 
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"), 
    PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56, 
    56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
    Year = 1990:2014), class = "data.frame", row.names = c(NA, 
-25L))

How to perform mathematical operation which is stored as character in R?

You can try the code below

> within(df, new.dimension <- sapply(gsub("(\\d)\\(","\\1*(",dimension),function(x) eval(parse(text = x)))) 
  component                      dimension new.dimension
1         a                       12*10*05           600
2         b 30*30*20+174*153*21+108*014*04        583110
3         c                    98*98*12(2)        230496
4         d           30*30*20(2)+32*34*04         40352
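
Broken into steps for a single string, the one-liner above does roughly the following. Keep in mind that eval(parse()) runs whatever code the string contains, so this sketch assumes the input is trusted.

```r
expr <- "98*98*12(2)"

# Insert "*" between a digit and an opening parenthesis
fixed <- gsub("(\\d)\\(", "\\1*(", expr)    # "98*98*12*(2)"

# Parse the string as R code and evaluate it
value <- eval(parse(text = fixed))          # 230496
```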

How to do multiple operations, ignoring NAs, in R

If you want to change all NAs to 0 you can do:

df<-data.frame(d1=c(2,2,2,2), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
df.new <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))

or (thanks to Sotos!):

df[is.na(df)] <- 0

But be careful: this works cleanly only when every column is numeric. With other column types you may run into problems (a factor column, for example, raises an invalid-level warning). Here is a solution that touches only the numeric columns:

df <- data.frame(d1=c(2,2,2,2), dx=c("A", "bb", "C", "DD"), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
numCols <- sapply(df, is.numeric)

df[, numCols][is.na(df[, numCols])] <- 0
df
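
An alternative sketch of the same idea, replacing NAs column by column with lapply() and replace(). The names df2 and num_cols are made up for this example.

```r
df2 <- data.frame(d1 = c(2, NA), dx = c("A", NA), d3 = c(NA, 1))
num_cols <- sapply(df2, is.numeric)

# Replace NA with 0 in the numeric columns only; dx keeps its NA
df2[num_cols] <- lapply(df2[num_cols], function(col) replace(col, is.na(col), 0))
```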

How to use apply() with arithmetic functions (R)

Is this what you are looking for?

apply(U, 2, function(c_i) { c_i + min(c_i)*sign(min(c_i))*1.05 })
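
Here apply() with margin 2 passes each column of U to the function in turn, and min(c_i)*sign(min(c_i)) is just abs(min(c_i)). A minimal sketch on a made-up matrix:

```r
# Made-up matrix: column 1 has a negative minimum, column 2 a positive one
U <- matrix(c(-2, 1, 3,
               4, 5, 6), ncol = 2)

# Shift every column up by 1.05 times the absolute value of its minimum
shifted <- apply(U, 2, function(c_i) c_i + abs(min(c_i)) * 1.05)
# column 1: c(-2, 1, 3) + 2.1; column 2: c(4, 5, 6) + 4.2
```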

Mathematical operations between the grouped data and a dataframe in R

Use match() to find, for each row of ss, the row of zz with the same year, then divide the value columns:

cbind(ss[1:2], ss[-(1:2)] / zz[match(ss$year, zz$year), -(1:2)])
#   country year          x         y         z
# 1        a 1961  9.5000000  7.666667 22.000000
# 2        b 1962  4.0000000 20.000000  2.000000
# 3        c 1963  1.0000000 14.000000  7.666667
# 4        d 1961 11.5000000  2.333333 16.000000
# 5        e 1962 24.0000000  4.000000 14.500000
# 6        f 1963  5.3333333 12.500000  4.666667
# 7        g 1961 14.0000000  1.666667 11.000000
# 8        h 1962  9.0000000  8.000000  6.500000
# 9        k 1963  9.6666667  5.000000  9.000000
# 10       v 1961 10.0000000  4.333333 26.000000
# 11       a 1962 14.0000000  9.000000  2.500000
# 12       b 1963  7.0000000  0.500000  4.000000
# 13       c 1961 15.0000000  7.000000  2.000000
# 14       d 1962  1.0000000 11.000000  4.500000
# 15       e 1963  4.0000000 13.000000  3.333333
# 16       f 1961  8.5000000  5.333333 25.000000
# 17       g 1962 25.0000000 27.000000  3.500000
# 18       h 1963  8.6666667  1.000000  7.000000
# 19       k 1961  6.5000000  9.666667  6.000000
# 20       v 1962  8.0000000 24.000000 10.000000
# 21       a 1963  0.6666667  1.500000  1.000000
# 22       b 1961  3.5000000  5.000000 30.000000
# 23       c 1962 10.0000000  6.000000  9.000000
# 24       d 1963  3.6666667  9.500000  2.666667
# 25       e 1961  3.0000000  4.666667  1.000000
# 26       f 1962 22.0000000 22.000000 12.000000
# 27       g 1963  9.0000000  6.000000  5.666667
# 28       h 1961  2.5000000  6.000000 15.000000
# 29       k 1962 15.0000000 17.000000 14.000000
# 30       v 1963  6.0000000 15.000000  6.333333
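
A scaled-down sketch of the same lookup. The ss and zz below are made-up stand-ins, and the sketch assumes, as the code above does, that the first two columns of each are identifiers and the rest are values.

```r
# Made-up stand-ins for ss and zz; the first two columns are identifiers
ss <- data.frame(country = c("a", "b", "c"),
                 year    = c(1961, 1962, 1961),
                 x       = c(10, 20, 30),
                 y       = c(4, 9, 8))
zz <- data.frame(grp  = c("g1", "g2"),
                 year = c(1961, 1962),
                 x    = c(2, 5),
                 y    = c(4, 3))

rows <- match(ss$year, zz$year)   # 1 2 1: the row of zz for each row of ss
res  <- cbind(ss[1:2], ss[-(1:2)] / zz[rows, -(1:2)])
res$x   # 5 4 15
res$y   # 1 3 2
```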
