How to Loop Over the Length of a Dataframe in R

Proper way to loop over the length of a dataframe in R

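Use seq_along() instead of 1:length(vec). When vec is empty, 1:length(vec) evaluates to c(1, 0), so the loop body runs twice, whereas seq_along(vec) returns an empty sequence and the loop is skipped: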

vec <- numeric() 
length(vec)
#[1] 0

for(i in seq_along(vec)) print(i) # doesn't print anything

vec <- 1:5

for(i in seq_along(vec)) print(i)
#[1] 1
#[1] 2
#[1] 3
#[1] 4
#[1] 5
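
To make the difference concrete, here is a minimal sketch that compares the two index sequences on an empty vector. The names bad_index, good_index, bad_runs and good_runs are illustrative only and do not come from the original question.

```r
vec <- numeric()

# 1:length(vec) counts DOWN from 1 to 0 when vec is empty
bad_index  <- 1:length(vec)
# seq_along(vec) yields an empty integer sequence instead
good_index <- seq_along(vec)

# Count how often each loop body would run
bad_runs <- 0
for (i in bad_index) bad_runs <- bad_runs + 1

good_runs <- 0
for (i in good_index) good_runs <- good_runs + 1

print(bad_index)
print(c(bad_runs = bad_runs, good_runs = good_runs))
```

The loop over 1:length(vec) runs twice on an empty vector, while the loop over seq_along(vec) does not run at all.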

Edit after OP update: for a data frame, use seq_len(nrow(df)) in the same way:

df <- data.frame(a = numeric(), b = numeric())
df
#[1] a b
#<0 rows> (or 0-length row.names)

for(i in seq_len(nrow(df))) print(i) # doesn't print anything

df <- data.frame(a = 1:3, b = 5:7)

for(i in seq_len(nrow(df))) print(i)
#[1] 1
#[1] 2
#[1] 3
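
As a rough illustration of what goes wrong with 1:nrow(df) on an empty data frame, the sketch below counts loop iterations and shows that indexing a non-existent row silently returns a row of NA values. The variable names (empty_df, naive_iterations, safe_iterations, phantom_row) are assumptions made for this example.

```r
empty_df <- data.frame(a = numeric(), b = numeric())

# 1:nrow(empty_df) is c(1, 0), so the body runs for i = 1 and i = 0
naive_iterations <- 0
for (i in 1:nrow(empty_df)) naive_iterations <- naive_iterations + 1

# seq_len(0) is empty, so the body never runs
safe_iterations <- 0
for (i in seq_len(nrow(empty_df))) safe_iterations <- safe_iterations + 1

# Row 1 does not exist, yet indexing it returns one row filled with NA
phantom_row <- empty_df[1, ]

print(c(naive = naive_iterations, safe = safe_iterations))
print(phantom_row)
```

This is why the naive loop tends to fail later with confusing NA-related errors rather than at the loop itself.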

Loop over and count columns in a data frame until condition is met

You can use rowSums() together with is.na(); no for loop is needed here.

library(dplyr)

DF %>% mutate(HHSIZE = rowSums(across(5:19, ~ !is.na(.x))))

An example with a toy data frame:

#toy dataframe:

df <- data.frame(name = c('bob', 'mike', 'john'),
                 member1 = c(1, 1, 1),
                 member2 = c(2, 1, NA),
                 member3 = c(1, NA, NA),
                 member4 = c(NA, NA, NA))

solution:

df %>% mutate(HHSIZE = rowSums(across(2:5, ~ !is.na(.x))))

  name member1 member2 member3 member4 HHSIZE
1  bob       1       2       1      NA      3
2 mike       1       1      NA      NA      2
3 john       1      NA      NA      NA      1

It can also be done with a for loop: preallocate a numeric vector, loop over the rows, and fill each element with the number of non-NA values in the desired columns (2:5 in this example):

output <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  # count the non-NA entries in columns 2 to 5 of row i
  output[i] <- sum(!is.na(df[i, 2:5]))
}
df$HHSIZE <- output
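
For completeness, the same count can be sketched in base R without dplyr, since is.na() on a data frame returns a logical matrix that rowSums() can sum directly. This is an alternative to the answer's approach, not part of it; the toy data frame is the one defined above.

```r
df <- data.frame(name = c('bob', 'mike', 'john'),
                 member1 = c(1, 1, 1),
                 member2 = c(2, 1, NA),
                 member3 = c(1, NA, NA),
                 member4 = c(NA, NA, NA))

# is.na() gives a logical matrix; negating and summing each row
# counts the non-NA members per household
df$HHSIZE <- rowSums(!is.na(df[, 2:5]))

print(df)
```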

How to loop through numbered dataframes in R environment. I have to loop through 22 (potentially 22*6) dataframes in R

Use get() to retrieve an object by its name:

for (i in time) {
  df <- get(paste0("y_V_", i))
}

On each iteration, df holds the data frame y_V_{i}, where i is the time index.
You can loop over the letter as well:

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
  }
}

This assigns y_{l}_{i} to df, provided that all of those objects exist; get() throws an error for a name that does not.
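
If some of the numbered data frames might be missing, one option is to check with exists() before calling get(). The following is only a sketch: the object y_V_1, the vector time and the collector found are made up for the example.

```r
# Only one of the three expected data frames is defined here
y_V_1 <- data.frame(a = 1:3)
time <- 1:3

found <- character()
for (i in time) {
  object_name <- paste0("y_V_", i)
  # Skip names that do not refer to an existing object
  if (!exists(object_name)) next
  df <- get(object_name)
  found <- c(found, object_name)
}

print(found)
```

Note that exists() also searches enclosing environments and attached packages, so pick a naming scheme that is unlikely to collide with other objects.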


Edit: use assign() to write to a pasted object name:

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)
  }
}

Second edit: you can also collect the data frames in a list:

# first initialize the list
list_with_dfs <- list()

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)

    # Append the data frame to the list
    list_with_dfs[[length(list_with_dfs) + 1]] <- df

    # Equivalent alternative: fetch the assigned copy by name instead
    # (use only one of the two, otherwise each data frame is added twice)
    # list_with_dfs[[length(list_with_dfs) + 1]] <- get(paste0("df_", l, "_", i))
  }
}
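
As a possible shortcut, base R's mget() fetches several objects by name at once and returns them as a named list, which avoids the explicit loop. The sketch below assumes four toy data frames following the y_{l}_{i} naming pattern; the object contents and the helper variable df_names are invented for illustration.

```r
# Toy objects following the y_<letter>_<time> naming pattern
y_A_1 <- data.frame(v = 1)
y_B_1 <- data.frame(v = 2)
y_A_2 <- data.frame(v = 3)
y_B_2 <- data.frame(v = 4)

time <- 1:2
letter_vector <- c("A", "B")

# Build every name combination, then fetch them all in one call
df_names <- as.vector(outer(letter_vector, time,
                            function(l, i) paste0("y_", l, "_", i)))
list_with_dfs <- mget(df_names, envir = environment())

print(names(list_with_dfs))
```

Because the list is named, a single data frame can later be retrieved with, for example, list_with_dfs[["y_B_2"]].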

How do I iterate over different length lists?

The problem is that you are iterating over each combination of REPORT_TYPE and CRITERIA_NO even though some combinations, like PE-4, don't exist in the data. This results in you passing an empty data frame into ggplot() which eventually leads to the error you're seeing.

Here's an example way to fix this.

library(tidyverse)
quarter_plot = function(df, x, y) {
  df %>%
    # one sub-dataframe per REPORT_TYPE / CRITERIA_NO combination
    split(list(.$REPORT_TYPE, .$CRITERIA_NO)) %>%
    # drop combinations that do not occur in the data
    discard(~ nrow(.x) == 0) %>%
    map(function(sub_data) {
      ggplot(sub_data, aes_string(x = x, y = y)) +
        geom_bar(stat = "identity") +
        facet_wrap(~ DESCRIPTION) +
        ggtitle("Properties by Qtr")
    })
}

# Example usage
quarter_plot(Property, "QTR_END_DATE", "NUMBER_PROPERTY")

In my implementation, I split the data by each combination of REPORT_TYPE and CRITERIA_NO and then removed the cases where there is no data. Afterwards I used purrr::map() to generate a plot for each sub-dataframe.
A couple of notes regarding this implementation:

  1. Instead of passing symbols into the function I changed it to strings, as aes_string() makes the implementation cleaner. Feel free to revert this.
  2. You accidentally used Property in your function even though df is the parameter - I fixed this.
  3. I don't recommend naming your function plot() as it overrides an existing function.
  4. I recommend that instead of printing the plots, you actually return them (this was done in my version using map()). This gives you the option to save the plots to a variable, and you are still easily able to print the plots to the screen by just running the function in your console.
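
The split-then-discard step can be tried in isolation with base R only. The sketch below uses a made-up data frame toy in which the combination RE / 2 never occurs; Filter() plays the role of purrr::discard(), and split()'s drop = TRUE argument is shown as another way to skip unused combinations.

```r
toy <- data.frame(REPORT_TYPE = c("PE", "PE", "RE"),
                  CRITERIA_NO = c(1, 2, 1),
                  NUMBER_PROPERTY = c(10, 20, 30))

# Splitting on two factors creates every combination, including empty ones
groups <- split(toy, list(toy$REPORT_TYPE, toy$CRITERIA_NO))

# Keep only the groups that actually contain rows
non_empty <- Filter(function(g) nrow(g) > 0, groups)

# Alternatively, let split() drop the unused combinations itself
dropped <- split(toy, list(toy$REPORT_TYPE, toy$CRITERIA_NO), drop = TRUE)

print(sapply(groups, nrow))
print(names(non_empty))
```

Either non_empty or dropped could then be passed to lapply() or map() to build one plot per group.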

If you want to make minimal changes to your original function, you could go with something like:

plot <- function(df, x, y) {
  # create lists of the report types and criteria in the data to loop over
  rpt_list <- unique(df$REPORT_TYPE)
  crit_list <- unique(df$CRITERIA_NO)

  for (i in seq_along(rpt_list)) {
    for (j in seq_along(crit_list)) {
      data <- subset(
        df,
        REPORT_TYPE == rpt_list[[i]] & CRITERIA_NO == crit_list[[j]]
      )
      # skip combinations that do not occur in the data
      if (nrow(data) == 0) {
        next
      }
      x_var <- enquo(x)
      y_var <- enquo(y)

      blah <- ggplot(data, aes(x = !!x_var, y = !!y_var)) +
        geom_bar(stat = "identity") +
        facet_wrap(~DESCRIPTION) +
        ggtitle(expression('Properties by Qtr'))
      print(blah)
    }
  }
}

What is faster/better: Loop over each row of a dataframe or split it into a list of length `nrow` , R

I very often use the scheme tibble %>% nest %>% mutate(map) %>% unnest.
Take a look at the example below.

library(tidyverse)
n = 10000

f = function(data) sqrt(data$x^2 + data$y^2 + data$z^2)
tibble(
  x = 1:n,
  y = x + rnorm(n, mean = 50, sd = 50),
  z = x + y + rnorm(n, mean = 50, sd = 50)
) %>% nest(data = c(x:z)) %>%
  mutate(l = map(data, f)) %>%
  unnest(c(data, l))

Output:

# A tibble: 10,000 x 4
       x     y     z     l
   <int> <dbl> <dbl> <dbl>
 1     1  67.1 136.  151.
 2     2  75.4 127.  148.
 3     3 -11.1  38.9  40.6
 4     4  58.1 106.  121.
 5     5  23.5 126.  128.
 6     6  73.4 179.  193.
 7     7  44.5 121.  129.
 8     8 106.  131.  169.
 9     9  32.5 140.  144.
10    10 -27.7  82.7  87.8
# ... with 9,990 more rows

Personally, I find this very clear and elegant, though you may disagree.

Update 1

Honestly, your question also intrigued me in terms of performance. So I decided to check it out.
Here is the code:

library(tidyverse)
library(microbenchmark)

n = 1000
df = tibble(
  x = 1:n,
  y = x + rnorm(n, mean = 50, sd = 50),
  z = x + y + rnorm(n, mean = 50, sd = 50)
)

f = function(data) sqrt(data$x^2+data$y^2+data$z^2)

# f1: nest, map, unnest
f1 = function(df) {
  df %>% nest(data = c(x:z)) %>%
    mutate(l = map(data, f)) %>%
    unnest(c(data, l))
}
f1(df)

# f2: explicit loop over the rows
f2 = function(df) {
  df = df %>% mutate(l = NA)
  for (r in seq_len(nrow(df))) {
    row = df[r, ]
    df$l[r] = f(row)
  }
  df
}
f2(df)

# f3: split into a list of one-row data frames, then lapply
f3 = function(df) {
  res = df %>%
    split(., .$x) %>%
    lapply(., f)
  df %>% bind_cols(l = unlist(res))
}
f3(df)

ggplot2::autoplot(microbenchmark(f1(df), f2(df), f3(df), times=100))

Here is the result:
[Figure: microbenchmark autoplot comparing f1(df), f2(df) and f3(df)]
Do I have to add anything else and explain why the scheme tibble %>% nest %>% mutate(map) %>% unnest is so cool?
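
Independently of the benchmark, it may help to confirm that the row-by-row loop and a single vectorised call to f produce the same numbers. The following base R sketch was not part of the original answer; it uses a plain data.frame, and the names vectorised and looped are illustrative.

```r
set.seed(1)
n <- 1000
df <- data.frame(x = 1:n)
df$y <- df$x + rnorm(n, mean = 50, sd = 50)
df$z <- df$x + df$y + rnorm(n, mean = 50, sd = 50)

f <- function(data) sqrt(data$x^2 + data$y^2 + data$z^2)

# One vectorised call over all rows at once
vectorised <- f(df)

# The same computation, one row at a time
looped <- numeric(nrow(df))
for (r in seq_len(nrow(df))) {
  looped[r] <- f(df[r, ])
}

print(all.equal(vectorised, looped))
```

Since f is built from vectorised arithmetic, calling it once on the whole data frame avoids the per-row overhead entirely, which is worth keeping in mind when interpreting any timing comparison.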

How to change length of certain column of a dataframe in foreach() in R

Assuming we want to apply foreach separately to the rows of each unique 'id', we can either split the data by 'id' or wrap the foreach call in an outer for loop that iterates over the unique 'id' values. Inside the loop, subset the data for that particular 'id' ('subdat'). Initialize a list ('out') to store the results of each foreach call, whose elements are the matrices produced by the inner nested for loop:

library(doParallel)

un_id <- unique(data2$id)
out <- vector('list', length(un_id))
names(out) <- un_id
registerDoParallel(cl <- makeCluster(length(un_id)))
for (id1 in un_id) {
  subdat <- subset(data2, id == id1)
  # index by name so that numeric ids do not act as list positions
  out[[as.character(id1)]] <- foreach(i = seq_len(nrow(subdat))) %dopar% {
    a <- matrix(nrow = nrow(subdat), ncol = 2)
    for (j in seq_len(i)) {
      for (k in 1:2) {
        a[j, k] <- j * k
      }
    }
    a
  }
}

stopCluster(cl)
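
To check the expected shape of the result without starting a cluster, the same structure can be sketched sequentially with split() and lapply(). This is a stand-in for the parallel version, not a replacement for it; the toy data2 below is an assumption, as the original data was not shown.

```r
# Toy input: id 1 has two rows, id 2 has three rows
data2 <- data.frame(id = c(1, 1, 2, 2, 2), value = 1:5)

out <- lapply(split(data2, data2$id), function(subdat) {
  lapply(seq_len(nrow(subdat)), function(i) {
    a <- matrix(nrow = nrow(subdat), ncol = 2)
    # fill the first i rows with j * k, leaving the rest as NA
    a[seq_len(i), ] <- outer(seq_len(i), 1:2)
    a
  })
})

print(names(out))
print(out[["2"]][[2]])
```

Each element of out is a list with one matrix per row of the corresponding subset, which mirrors what the foreach loop returns for each 'id'.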

Looping through list of data frames in R

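# x and y were not defined in the original; 1:5 matches the output below
x <- 1:5
y <- 1:5
df1 <- data.frame("Row One" = x, "Row Two" = y)
df2 <- data.frame("Row Two" = y, "Row One" = x)
dfList <- list(df1, df2)
lapply(dfList, function(x) {
  # rename columns by pattern, whatever their position
  names(x)[grep("One", names(x))] <- "R1"
  names(x)[grep("Two", names(x))] <- "R2"
  x
})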
[[1]]
  R1 R2
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5

[[2]]
  R2 R1
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
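
Keep in mind that lapply() returns a new list and leaves dfList itself unchanged, so the result has to be assigned back if the renamed data frames are needed later. A minimal sketch, assuming x and y are both 1:5 and using a hypothetical helper name rename_cols:

```r
x <- 1:5
y <- 1:5
dfList <- list(df1 = data.frame("Row One" = x, "Row Two" = y),
               df2 = data.frame("Row Two" = y, "Row One" = x))

# Helper that renames columns by pattern (name chosen for this example)
rename_cols <- function(d) {
  names(d)[grep("One", names(d))] <- "R1"
  names(d)[grep("Two", names(d))] <- "R2"
  d
}

# Assign the result back; otherwise dfList keeps the old column names
dfList <- lapply(dfList, rename_cols)

print(lapply(dfList, names))
```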

