Arithmetic operations on R factors
If you really want the internal integer codes of the factor to be used in arithmetic, you're either doing something very wrong or being too clever for your own good.
If what you have is a factor containing numbers stored in the levels of the factor, then you want to coerce it to numeric first using as.numeric(as.character(...)):
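# stringsAsFactors = TRUE is needed on R >= 4.0.0, where strings are no longer converted to factors by default
dat <- data.frame(f = as.character(runif(10)), stringsAsFactors = TRUE)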
You can see the difference between accessing the factor's integer codes and accessing the factor's contents here:
> as.numeric(dat$f)
[1] 9 7 2 1 4 6 5 3 10 8
> as.numeric(as.character(dat$f))
[1] 0.6369432 0.4455214 0.1204000 0.0336245 0.2731787 0.4219241 0.2910194
[8] 0.1868443 0.9443593 0.5784658
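Because the example above uses random numbers, the contrast may be easier to follow with fixed values. The following is a minimal sketch with made-up labels (the vector `f` is an assumption for illustration, not part of the original answer):

```r
# A factor whose labels happen to look like numbers (illustrative values).
f <- factor(c("10", "2.5", "10", "0.3"))

levels(f)                     # labels sorted as strings: "0.3" "10" "2.5"
as.numeric(f)                 # integer codes, i.e. positions in levels(f): 2 3 2 1
as.numeric(as.character(f))   # the numbers themselves: 10.0 2.5 10.0 0.3
```

The codes only record which level each element has, so arithmetic on them says nothing about the numbers printed in the labels.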
Timings against an alternative approach that converts only the levels show that the alternative is faster when levels are shared between elements (that is, not unique to each element):
dat <- data.frame( f = sample(as.character(runif(10)),10^4,replace=TRUE), stringsAsFactors = TRUE )
library(microbenchmark)
microbenchmark(
as.numeric(as.character(dat$f)),
as.numeric( levels(dat$f) )[dat$f] ,
as.numeric( levels(dat$f)[dat$f] ),
times=50
)
expr min lq median uq max
1 as.numeric(as.character(dat$f)) 7835865 7869228 7919699 7998399 9576694
2 as.numeric(levels(dat$f))[dat$f] 237814 242947 255778 270321 371263
3 as.numeric(levels(dat$f)[dat$f]) 7817045 7905156 7964610 8121583 9297819
Therefore, if length(levels(dat$f)) < length(dat$f), use as.numeric(levels(dat$f))[dat$f] for a substantial speed gain.
If length(levels(dat$f)) is approximately equal to length(dat$f), there is no speed gain:
dat <- data.frame( f = as.character(runif(10^4) ), stringsAsFactors = TRUE )
library(microbenchmark)
microbenchmark(
as.numeric(as.character(dat$f)),
as.numeric( levels(dat$f) )[dat$f] ,
as.numeric( levels(dat$f)[dat$f] ),
times=50
)
expr min lq median uq max
1 as.numeric(as.character(dat$f)) 7986423 8036895 8101480 8202850 12522842
2 as.numeric(levels(dat$f))[dat$f] 7815335 7866661 7949640 8102764 15809456
3 as.numeric(levels(dat$f)[dat$f]) 7989845 8040316 8122012 8330312 10420161
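The faster form works because a factor used as an index is treated as its integer codes, so only the (few) levels are parsed as numbers and the result is then expanded by position. A small sketch with made-up values (the names `lev_num`, `by_levels` and `by_chars` are illustrative):

```r
f <- factor(c("10", "2.5", "10", "0.3", "2.5"))

lev_num   <- as.numeric(levels(f))       # parses 3 strings instead of 5
by_levels <- lev_num[f]                  # indexing by a factor uses its integer codes
by_chars  <- as.numeric(as.character(f)) # parses every element

identical(by_levels, by_chars)           # TRUE
```

With ten levels and 10^4 elements, as in the first benchmark, the string-to-number conversion runs on ten values rather than ten thousand, which is consistent with the timings shown.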
R maths operation NA values
Here is a base R solution using rowSums(), where the option na.rm should be set to TRUE.
You can try the code below for your objective:
data$j <- rowSums(abs(replicate(ncol(data) - 2, data$a) - data[-(1:2)]), na.rm = TRUE) / 156
such that
> data
ID a b c d e f g h i j
1 1 0 0 0 1 NA NA NA NA NA 0.006410256
2 2 0 0 0 1 1 NA NA NA NA 0.012820513
3 3 0 0 0 0 0 NA NA NA NA 0.000000000
4 4 0 0 0 0 0 0 NA NA NA 0.000000000
5 5 0 0 0 NA NA NA NA NA NA 0.000000000
6 6 0 0 0 0 0 NA NA NA NA 0.000000000
DATA
data <- structure(list(ID = 1:6, a = c(0, 0, 0, 0, 0, 0), b = c(0, 0,
0, 0, 0, 0), c = c(0, 0, 0, 0, 0, 0), d = c(1, 1, 0, 0, NA, 0
), e = c(NA, 1, 0, 0, NA, 0), f = c(NA, NA, NA, 0, NA, NA), g = c(NA,
NA, NA, NA, NA, NA), h = c(NA, NA, NA, NA, NA, NA), i = c(NA,
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = "data.frame")
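To see what na.rm changes, here is a minimal sketch on a toy data frame (the object `toy` and its values are assumptions for illustration, not the original data):

```r
# Toy data (hypothetical values): one reference column and two comparison columns.
toy <- data.frame(a = c(0, 0, 0), d = c(1, NA, 0), e = c(NA, NA, 1))

# Absolute difference between each comparison column and the reference column.
diffs <- abs(toy[c("d", "e")] - toy$a)

rowSums(diffs)                 # NA for every row that contains a missing value
rowSums(diffs, na.rm = TRUE)   # missing values are skipped: 1 0 1
```

Note that a row consisting only of NA sums to 0 with na.rm = TRUE, which matches row 5 of the output above.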
Add or multiply a different value by factor level
Let's make use of the internal integer representation of a factor:
df$x2 <- with(df, c(1, -1)[class] + x)
I don't recommend using df and class as variable names, however, as they are also the names of existing R functions. (Don't use data for the same reason.)
Some explanation here. You have coded class with factor levels "low" and "high", in that order, so they map to 1 and 2. Try as.integer(df$class) to see this. Now, your code suggests you want to add 1 to x for "low" and subtract 1 from x for "high", so we dispatch the increment vector c(1, -1) according to factor levels, then add it to x.
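The dispatch step can be sketched on a small, made-up data frame (the names `scores` and `grp` and all values are illustrative assumptions):

```r
# Hypothetical data; level order is set explicitly so that "low" is 1 and "high" is 2.
scores <- data.frame(
  grp = factor(c("low", "high", "high", "low"), levels = c("low", "high")),
  x   = c(10, 20, 30, 40)
)

as.integer(scores$grp)              # 1 2 2 1
increment <- c(1, -1)[scores$grp]   # 1 -1 -1 1, picked by integer code
scores$x2 <- scores$x + increment
scores$x2                           # 11 19 29 41
```

If the levels are not given explicitly, factor() sorts them alphabetically, which would put "high" first and swap the increments, so it is worth checking levels() before relying on position.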
Arithmetic operation based on value from another column
We can arrange by 'Year' and take the difference between 'PPT' and the lead of 'PPT', where 'n' is specified as 5:
library(dplyr)
df %>%
arrange(Year) %>%
mutate(newcol = PPT - lead(PPT, n = 5, default = 0))
# code PPT Year newcol
#1 AFG 123 1990 119
#2 AGO 42 1991 19
#3 ALB 23 1992 -2
#4 AND 5 1993 -1
#5 ARB 23 1994 -611
#6 ARE 4 1995 -1
#7 ARG 23 1996 -5540
#8 ARM 25 1997 -31
#9 ASM 6 1998 -50
#10 ATG 634 1999 -11
#...
If some 'Year's are missing, we can expand the data with complete and then do the mutate:
library(tidyr)
df %>%
arrange(Year) %>%
complete(Year = min(Year):max(Year)) %>%
mutate(newcol = PPT - lead(PPT, n = 5, default = 0)) %>%
filter(!is.na(PPT))
Or using base R
df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), tail(PPT, 5)))
data
df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND",
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"),
PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56,
56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
Year = 1990:2014), class = "data.frame", row.names = c(NA,
-25L))
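For readers without dplyr, the shifting that lead performs can be approximated in base R. The helper below, `lead_base`, is a hypothetical function written for this sketch and is not part of base R or dplyr; it is applied to the first eight PPT values from the data above:

```r
# Hypothetical helper: shift a vector forward by n positions,
# padding the tail with `default`.
lead_base <- function(x, n = 1, default = NA) {
  stopifnot(n >= 0)
  if (n == 0) return(x)
  if (n >= length(x)) return(rep(default, length(x)))
  c(x[-seq_len(n)], rep(default, n))
}

ppt <- c(123, 42, 23, 5, 23, 4, 23, 25)    # first eight PPT values
lead_base(ppt, n = 5, default = 0)         # 4 23 25 0 0 0 0 0
ppt - lead_base(ppt, n = 5, default = 0)   # 119 19 -2 5 23 4 23 25
```

The first three differences agree with the dplyr output shown earlier; the later ones differ only because this sketch uses a shortened vector, so the padding with 0 starts sooner.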
How to perform mathematical operation which is stored as character in R?
You can try the code below
> within(df, new.dimension <- sapply(gsub("(\\d)\\(","\\1*(",dimension),function(x) eval(parse(text = x))))
component dimension new.dimension
1 a 12*10*05 600
2 b 30*30*20+174*153*21+108*014*04 583110
3 c 98*98*12(2) 230496
4 d 30*30*20(2)+32*34*04 40352
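The same idea can be wrapped in a small function. The sketch below uses a hypothetical helper, `eval_dimension`, and adds one assumption that is not in the original answer: strings containing anything other than digits, spaces, parentheses, dots and arithmetic operators are rejected before being evaluated.

```r
# Hypothetical helper: evaluate an arithmetic string,
# treating an implicit product such as "12(2)" as "12*(2)".
eval_dimension <- function(expr) {
  expr <- gsub("(\\d)\\(", "\\1*(", expr)
  # Assumption for this sketch: refuse anything that is not plain arithmetic.
  if (!grepl("^[0-9+*/(). -]+$", expr)) {
    stop("unexpected characters in: ", expr)
  }
  eval(parse(text = expr))
}

dims <- c("12*10*05", "98*98*12(2)", "30*30*20(2)+32*34*04")
vapply(dims, eval_dimension, numeric(1), USE.NAMES = FALSE)   # 600 230496 40352
```

Using vapply instead of sapply fixes the result type to one number per string, so a malformed entry fails loudly rather than silently changing the shape of the result.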
How to do multiple operations, ignoring NAs, in R
If you want to change all NAs to 0 you can do:
df<-data.frame(d1=c(2,2,2,2), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
df.new <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))
or (thanks to Sotos!):
df[is.na(df)] <- 0
But be careful: this works well only for data frames in which all columns are numeric. With other column types you might face problems. Here is a solution for data frames that also contain non-numeric columns:
df <- data.frame(d1=c(2,2,2,2), dx=c("A", "bb", "C", "DD"), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
numCols <- sapply(df, is.numeric)
df[, numCols][is.na(df[, numCols])] <- 0
df
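The numeric-columns-only replacement can also be written as a reusable function. This is a sketch; `na_to_zero` and the data frame `mixed` are hypothetical names introduced for illustration:

```r
# Hypothetical helper: replace NA with 0 in numeric columns only.
na_to_zero <- function(d) {
  num_cols <- vapply(d, is.numeric, logical(1))
  d[num_cols] <- lapply(d[num_cols], function(col) replace(col, is.na(col), 0))
  d
}

mixed   <- data.frame(d1 = c(2, 2), dx = c("A", NA), d3 = c(1, NA))
cleaned <- na_to_zero(mixed)
cleaned$d3          # 1 0
is.na(cleaned$dx)   # FALSE TRUE: the non-numeric column keeps its NA
```

Working column by column leaves non-numeric columns untouched, so a missing label is not turned into the value 0.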
How to use apply() with arithmetic functions (R)
Is this what you are looking for?
apply(U, 2, function(c_i) { c_i + min(c_i)*sign(min(c_i))*1.05 })
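Since min(c_i)*sign(min(c_i)) equals abs(min(c_i)), the function adds 1.05 times the absolute value of each column's minimum to that column. A minimal sketch on a made-up matrix (the values of `U` here are an assumption; the original `U` is not shown):

```r
# Illustrative matrix; matrix() fills column by column,
# so the columns are (-2, 0, 4) and (1, 3, 5).
U <- matrix(c(-2, 0, 4,
               1, 3, 5), nrow = 3)

# MARGIN = 2 applies the function to each column in turn.
shifted <- apply(U, 2, function(c_i) c_i + abs(min(c_i)) * 1.05)
shifted
# column 1 becomes approximately 0.1, 2.1, 6.1
# column 2 becomes approximately 2.05, 4.05, 6.05
```

For a column with a negative minimum, this shift makes every entry strictly positive.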
Mathematical operations between the grouped data and a dataframe in R
Using match:
cbind(ss[1:2], ss[-(1:2)] / zz[match(ss$year, zz$year), -(1:2)])
# country year x y z
# 1 a 1961 9.5000000 7.666667 22.000000
# 2 b 1962 4.0000000 20.000000 2.000000
# 3 c 1963 1.0000000 14.000000 7.666667
# 4 d 1961 11.5000000 2.333333 16.000000
# 5 e 1962 24.0000000 4.000000 14.500000
# 6 f 1963 5.3333333 12.500000 4.666667
# 7 g 1961 14.0000000 1.666667 11.000000
# 8 h 1962 9.0000000 8.000000 6.500000
# 9 k 1963 9.6666667 5.000000 9.000000
# 10 v 1961 10.0000000 4.333333 26.000000
# 11 a 1962 14.0000000 9.000000 2.500000
# 12 b 1963 7.0000000 0.500000 4.000000
# 13 c 1961 15.0000000 7.000000 2.000000
# 14 d 1962 1.0000000 11.000000 4.500000
# 15 e 1963 4.0000000 13.000000 3.333333
# 16 f 1961 8.5000000 5.333333 25.000000
# 17 g 1962 25.0000000 27.000000 3.500000
# 18 h 1963 8.6666667 1.000000 7.000000
# 19 k 1961 6.5000000 9.666667 6.000000
# 20 v 1962 8.0000000 24.000000 10.000000
# 21 a 1963 0.6666667 1.500000 1.000000
# 22 b 1961 3.5000000 5.000000 30.000000
# 23 c 1962 10.0000000 6.000000 9.000000
# 24 d 1963 3.6666667 9.500000 2.666667
# 25 e 1961 3.0000000 4.666667 1.000000
# 26 f 1962 22.0000000 22.000000 12.000000
# 27 g 1963 9.0000000 6.000000 5.666667
# 28 h 1961 2.5000000 6.000000 15.000000
# 29 k 1962 15.0000000 17.000000 14.000000
# 30 v 1963 6.0000000 15.000000 6.333333
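The role of match can be seen on a miniature example. The data frames `ss_toy` and `zz_toy` below are hypothetical stand-ins with made-up values and a single value column; the original `ss` and `zz` are not shown here:

```r
# Hypothetical miniature data: per-country rows and per-year divisors.
ss_toy <- data.frame(country = c("a", "b", "c"),
                     year    = c(1961, 1962, 1961),
                     x       = c(10, 20, 30))
zz_toy <- data.frame(year = c(1961, 1962), x = c(2, 5))

# For each row of ss_toy, find the row of zz_toy with the same year.
rows <- match(ss_toy$year, zz_toy$year)   # 1 2 1

ss_toy$x_ratio <- ss_toy$x / zz_toy$x[rows]
ss_toy$x_ratio                            # 5 4 15
```

match returns NA for a year that is absent from the lookup table, so the corresponding ratios become NA instead of being silently misaligned.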