R: How to Get the Percentage Change from Two Different Columns

R: How to get the percentage change from two different columns

You can also use the ddply function from plyr package:

library(plyr)
ddply(df, .(Carrier, Station), summarise,
PerentChange = (sum(TYSeats) - sum(LYSeats))/sum(LYSeats))

Carrier Station PerentChange
1 AAL BSB 0.01538462
2 AAL CNF -0.05319134

Applying percentage change between two columns, same row

The answer is below. It was a order issue with the syntax. Here is the working solution which applies a percentage change between two columns.

new.dataframe$close.prct.ema.10 <- apply(new.dataframe[,c('Close', 'ema.10')], 1, function(x) { (x[1]-x[2])/x[2] * 100 } )

(x[1]-x[2])/x[2] * 100

Note the brackets first around (x1-x2) / 2 *100

How can I calculate the percentage change within a group for multiple columns in R?

How about using
pct <- function(x) x/lag(x)? (or (x/lag(x)-1)*100, or however you wish to specify pct change exactly)
e.g.,

pct(1:3)
[1] NA 2.0 1.5

Edit: Adding Frank's suggestion

pct <- function(x) {x/lag(x)}

dt %>% group_by(ID) %>% mutate_each(funs(pct), c(V1, V2, V3))

ID Date V1 V2 V3
1 Jan NA NA NA
1 Feb 1.500000 1.333333 1.2
1 Mar 2.333333 2.000000 1.5
2 Jan NA NA NA
2 Feb 2.000000 3.000000 4.0
2 Mar 3.500000 2.666667 2.0

How to calculate the percent change of a data frame column, then the next, and so on?

Here are a few different approaches. No packages are used.

1) Divide all but the first 2 columns by all but the first and last columns, subtract 1 and multiply by 100. Combine that with the original first column and NA times the original second column.

data.frame(DF[1], NA * DF[2], 100 * (DF[-(1:2)] / DF[-c(1, ncol(DF))] - 1))

giving:

    Product Q1        Q2       Q3
1 X Product NA 1.959081 0.511280
2 Y Product NA -1.222760 2.452749
3 Z Product NA -5.249560 2.611586

1a) A variation of (1) that is even shorter is based on working in the log domain and then converting back:

data.frame(DF[1], NA * DF[2], 100 * t(exp(diff(t(log(DF[-1]))))-1))

giving:

    Product Q1        Q2       Q3
1 X Product NA 1.959081 0.511280
2 Y Product NA -1.222760 2.452749
3 Z Product NA -5.249560 2.611586

2) Define a function percent which calculates the percentages based on vector x returning a vector the same length as x filling in the first element with NA since there is no prior value for which to calculate its percent. Then apply that to each row noting that apply will return the transpose of what we want so transpose it back.

percent <- function(x) 100 * c(NA * x[1], diff(x) / head(x, -1))
data.frame(DF[1], t(apply(DF[-1], 1, percent)))

giving:

    Product Q1        Q2       Q3
1 X Product NA 1.959081 0.511280
2 Y Product NA -1.222760 2.452749
3 Z Product NA -5.249560 2.611586

Note: The input DF in reproducible form was assumed to be:

DF <- structure(list(Product = structure(1:3, .Label = c("X Product", 
"Y Product", "Z Product"), class = "factor"), Q1 = c(4.986184956,
2.86990877, 6.58413545), Q2 = c(5.083868356, 2.834816682, 6.238497279
), Q3 = c(5.109861156, 2.904347607, 6.40142101)), .Names = c("Product",
"Q1", "Q2", "Q3"), class = "data.frame", row.names = c(NA, -3L
))

Create variable for percentage change between two values based on group

If you use lag() to define your rate, you will end up with "NA" for every b == 3, and the correct value for b == 7 when you group_by "b". Also, c() is a Primitive function in R and it's best not to use "c" as the name of a variable.

Is this the outcome you're after?

library(tidyverse)
a <- c(3,7,3,7,3,7,3,3,7,3,7,3,7,7)
b <- c("a", "a", "b", "b", "c", "c", "d", "e", "e", "f","f", "g", "g", "h")
d <- c(80, 100, runif(12, min=80, max=100))
df <- data.frame(a,b,d)

df %>% group_by(b) %>%
mutate(rate = 100 * (d - lag(d, default = NA))/lag(d, default = NA))
#> # A tibble: 14 × 4
#> # Groups: b [8]
#> a b d rate
#> <dbl> <chr> <dbl> <dbl>
#> 1 3 a 80 NA
#> 2 7 a 100 25
#> 3 3 b 88.0 NA
#> 4 7 b 91.1 3.54
#> 5 3 c 95.1 NA
#> 6 7 c 82.7 -13.1
#> 7 3 d 92.6 NA
#> 8 3 e 84.1 NA
#> 9 7 e 91.8 9.20
#> 10 3 f 81.9 NA
#> 11 7 f 93.6 14.4
#> 12 3 g 88.7 NA
#> 13 7 g 80.6 -9.11
#> 14 7 h 99.2 NA

Created on 2021-12-20 by the reprex package (v2.0.1)

You can be more flexible with ifelse()'s too, e.g. if you want NA's for cases where you have a single group but zeros for cases where a == 3:

library(tidyverse)
a <- c(3,7,3,7,3,7,3,3,7,3,7,3,7,7)
b <- c("a", "a", "b", "b", "c", "c", "d", "e", "e", "f","f", "g", "g", "h")
d <- c(80, 100, runif(12, min=80, max=100))
df <- data.frame(a,b,d)

df %>% group_by(b) %>%
mutate(group_number = n()) %>%
mutate(rate = ifelse(group_number == 1, NA, ifelse(a == 7, 100 * (d - lag(d, default = NA))/lag(d, default = NA), 0))) %>%
select(-group_number) %>%
ungroup()
#> # A tibble: 14 × 4
#> a b d rate
#> <dbl> <chr> <dbl> <dbl>
#> 1 3 a 80 0
#> 2 7 a 100 25
#> 3 3 b 95.8 0
#> 4 7 b 83.9 -12.5
#> 5 3 c 87.0 0
#> 6 7 c 81.5 -6.26
#> 7 3 d 97.0 NA
#> 8 3 e 99.1 0
#> 9 7 e 82.6 -16.6
#> 10 3 f 82.3 0
#> 11 7 f 96.0 16.7
#> 12 3 g 99.5 0
#> 13 7 g 93.4 -6.09
#> 14 7 h 86.8 NA

Created on 2021-12-20 by the reprex package (v2.0.1)

Mutating across multiple columns to create percent score in R

You could do:

df %>%
mutate(across(starts_with('score'), ~ ./5 * 100, .names = 'percent_{col}'))

percentage difference between two dataframe columns (only numeric)

We can subset the numeric columns and then perform the operation

num_cols <- c("colB", "colC")
df3 <- (df2[num_cols] - df1[num_cols])/df2[num_cols] * 100
df3

# colB colC
#1 -100 20
#2 50 0

To get other non numeric columns we can use setdiff and then cbind

cbind(df1[setdiff(names(df1), num_cols)], df3)

# colA colZ colB colC
#1 mean stringA -100 20
#2 count stringB 50 0


Related Topics



Leave a reply



Submit