Proper way to loop over the length of a dataframe in R
You can use seq_along instead:
vec <- numeric()
length(vec)
#[1] 0
for(i in seq_along(vec)) print(i) # doesn't print anything
vec <- 1:5
for(i in seq_along(vec)) print(i)
#[1] 1
#[1] 2
#[1] 3
#[1] 4
#[1] 5
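For contrast, here is a minimal sketch of the pitfall that seq_along avoids. With an empty vector, 1:length(vec) counts down from 1 to 0, so a loop over it would run twice. The names bad_idx and good_idx are illustrative only.

```r
vec <- numeric()

# 1:length(vec) is 1:0, which counts down instead of being empty
bad_idx <- 1:length(vec)
print(bad_idx)
#[1] 1 0

# seq_along(vec) has length zero, so a loop over it never runs
good_idx <- seq_along(vec)
print(length(good_idx))
#[1] 0
```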
Edit after OP update
df <- data.frame(a = numeric(), b = numeric())
df
#[1] a b
#<0 rows> (or 0-length row.names)
for(i in seq_len(nrow(df))) print(i) # doesn't print anything
df <- data.frame(a = 1:3, b = 5:7)
for(i in seq_len(nrow(df))) print(i)
#[1] 1
#[1] 2
#[1] 3
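The same guard matters once the loop body actually indexes rows. A small sketch, assuming a hypothetical helper row_totals() that adds columns a and b one row at a time; with a zero-row data frame the loop is skipped and an empty vector comes back.

```r
# Hypothetical helper: sum columns a and b one row at a time
row_totals <- function(df) {
  totals <- numeric(nrow(df))      # preallocate one slot per row
  for (i in seq_len(nrow(df))) {   # empty sequence when nrow(df) == 0
    totals[i] <- df$a[i] + df$b[i]
  }
  totals
}

row_totals(data.frame(a = 1:3, b = 5:7))
#[1]  6  8 10
row_totals(data.frame(a = numeric(), b = numeric()))
#numeric(0)
```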
Loop over and count columns in a data frame until condition is met
I think you can use rowSums and is.na(). No need for for loops here.
library(dplyr)
DF %>% mutate(HHSIZE=rowSums(across(5:19, ~!is.na(.x))))
An example with a toy data.frame:
#toy dataframe:
df<-data.frame(name=c('bob', 'mike', 'john'),
member1=c(1,1,1),
member2=c(2,1,NA),
member3=c(1, NA, NA),
member4=c(NA, NA, NA))
solution:
df %>% mutate(HHSIZE=rowSums(across(2:5, ~!is.na(.x))))
name member1 member2 member3 member4 HHSIZE
1 bob 1 2 1 NA 3
2 mike 1 1 NA NA 2
3 john 1 NA NA NA 1
It can be done with a for loop too. Create an empty numeric vector, loop row-wise, and populate the vector with the count of non-NA elements in the desired columns of the data frame (2:5 in my example):
output<-numeric()
for (i in seq_len(nrow(df))){
output[i]<-sum(!is.na(df[i, 2:5]))
}
df$HHSIZE<-output
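If dplyr is not available, the same count can be sketched in base R, since is.na() works on a whole block of columns and rowSums() adds up the resulting TRUE values per row. This reuses the toy data frame from above; the variable name hhsize is illustrative.

```r
# toy data frame from the example above
df <- data.frame(name = c('bob', 'mike', 'john'),
                 member1 = c(1, 1, 1),
                 member2 = c(2, 1, NA),
                 member3 = c(1, NA, NA),
                 member4 = c(NA, NA, NA))

# count the non-NA entries in columns 2 to 5 for every row at once
hhsize <- rowSums(!is.na(df[, 2:5]))
df$HHSIZE <- hhsize
```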
How to loop through numbered dataframes in R environment. I have to loop through 22 (potentially 22*6) dataframes in R
You can use get(object_name) to get an object by name:
for (i in time) {
df <- get(paste0("y_V_", i))
}
This gets the data frame y_V_{i}, where i is the time index.
You can loop over the letter as well:
for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
}
}
This writes y_{l}_{i} to df, provided that all of those objects exist. Checking that is up to you.
Edit: use assign to write to a pasted object name
for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
assign(paste0("df_", l, "_", i), df)
}
}
Second edit. You can write the dataframes to a list:
# first initialize the list
list_with_dfs <- list()
for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
assign(paste0("df_", l, "_", i), df)
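# Then write to the list, either by fetching the assigned object by name:
# list_with_dfs[[length(list_with_dfs) + 1]] <- get(paste0("df_", l, "_", i))
# or, more simply, by appending df directly (use only one of the two,
# otherwise every data frame is stored twice):
list_with_dfs[[length(list_with_dfs) + 1]] <- df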
}
}
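As an alternative sketch, mget() fetches several objects by name in one call and returns them as a named list, which avoids growing the list inside a loop. The data frames, the values of time and letter_vector, and the environment demo_env below are made up for illustration; in an interactive session the objects would normally live in globalenv().

```r
# Made-up stand-ins for the numbered data frames
demo_env <- new.env()
time <- 1:3
letter_vector <- c("V", "W")
for (i in time) {
  for (l in letter_vector) {
    assign(paste0("y_", l, "_", i), data.frame(id = i, letter = l), envir = demo_env)
  }
}

# Build every object name, then fetch them all at once into a named list
combos <- expand.grid(l = letter_vector, i = time, stringsAsFactors = FALSE)
object_names <- paste0("y_", combos$l, "_", combos$i)
list_with_dfs <- mget(object_names, envir = demo_env)

names(list_with_dfs)
```

Because the list is named, a single data frame can then be pulled out with, for example, list_with_dfs[["y_W_2"]].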
How do I iterate over different length lists?
The problem is that you are iterating over each combination of REPORT_TYPE and CRITERIA_NO even though some combinations, like PE-4, don't exist in the data. This results in you passing an empty data frame into ggplot(), which eventually leads to the error you're seeing.
Here's an example way to fix this.
library(tidyverse)
quarter_plot = function(df, x, y) {
df %>%
split(list(.$REPORT_TYPE, .$CRITERIA_NO)) %>%
discard(~ nrow(.x) == 0) %>%
map(function(sub_data) {
ggplot(sub_data, aes_string(x = x, y = y)) +
geom_bar(stat = "identity") +
facet_wrap(~ DESCRIPTION) +
ggtitle("Properties by Qtr")
})
}
# Example usage
quarter_plot(Property, "QTR_END_DATE", "NUMBER_PROPERTY")
In my implementation, I split the data by each combination of REPORT_TYPE and CRITERIA_NO and then removed the cases where there is no data. Afterwards I used purrr::map() to generate a plot for each sub-dataframe.
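A base R sketch of the same idea, for cases where purrr is not loaded: split() accepts drop = TRUE, which leaves out combinations that do not occur, so no empty data frame reaches the plotting code. The small Property data frame here is invented for illustration and only mimics the column names from the question.

```r
# Invented sample data: the PE-4 combination is deliberately absent
Property <- data.frame(
  REPORT_TYPE = c("PE", "PE", "RE", "RE"),
  CRITERIA_NO = c(1, 2, 1, 4),
  NUMBER_PROPERTY = c(10, 20, 30, 40)
)

# Without drop = TRUE, every REPORT_TYPE x CRITERIA_NO pairing gets a slot
all_combos <- split(Property, list(Property$REPORT_TYPE, Property$CRITERIA_NO))

# With drop = TRUE, only pairings present in the data are kept
present_only <- split(Property, list(Property$REPORT_TYPE, Property$CRITERIA_NO),
                      drop = TRUE)

length(all_combos)
length(present_only)
sapply(present_only, nrow)
```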
A couple of notes regarding this implementation:
- Instead of passing symbols into the function I changed it to strings, as aes_string() makes the implementation cleaner. Feel free to revert this.
- You accidentally used Property in your function even though df is the parameter - I fixed this.
- I don't recommend naming your function plot() as it overrides an existing function.
- I recommend that instead of printing the plots, you actually return them (this was done in my version using map()). This gives you the option to save the plots to a variable, and you are still easily able to print the plots to the screen by just running the function in your console.
If you want to make minimal changes to your original function, you could go with something like:
plot <- function(df, x, y){
# create list of reports and criterias in data to loop over
rpt_list<-unique(df$REPORT_TYPE)
crit_list<-unique(df$CRITERIA_NO)
for (i in seq_along(rpt_list)) {
for(j in seq_along(crit_list)){
data=subset(
df,
REPORT_TYPE==rpt_list[[i]] & CRITERIA_NO==crit_list[[j]]
)
if (nrow(data) == 0) {
next
}
x_var <- enquo(x)
y_var <- enquo(y)
blah<-ggplot(data, aes(x=!!x_var, y=!!y_var)) +
geom_bar(stat="identity") +
facet_wrap(~DESCRIPTION) +
ggtitle(expression('Properties by Qtr'))
print(blah)
}
}
}
What is faster/better: Loop over each row of a dataframe or split it into a list of length `nrow` , R
I very often use the scheme tibble %>% nest %>% mutate(map) %>% unnest.
Take a look at the example below.
library(tidyverse)
n = 10000
f = function(data) sqrt(data$x^2+data$y^2+data$z^2)
tibble(
x = 1:n,
y = x + rnorm(n, mean=50, sd=50),
z = x + y + rnorm(n, mean=50, sd=50)
) %>% nest(data = c(x:z)) %>%
mutate(l = map(data, f)) %>%
unnest(c(data, l))
output
# A tibble: 10,000 x 4
x y z l
<int> <dbl> <dbl> <dbl>
1 1 67.1 136. 151.
2 2 75.4 127. 148.
3 3 -11.1 38.9 40.6
4 4 58.1 106. 121.
5 5 23.5 126. 128.
6 6 73.4 179. 193.
7 7 44.5 121. 129.
8 8 106. 131. 169.
9 9 32.5 140. 144.
10 10 -27.7 82.7 87.8
# ... with 9,990 more rows
For me personally, it is very clear and elegant. But you can disagree with that.
Update 1
Honestly, your question also intrigued me in terms of performance. So I decided to check it out.
Here is the code:
library(tidyverse)
library(microbenchmark)
n = 1000
df = tibble(
x = 1:n,
y = x + rnorm(n, mean=50, sd=50),
z = x + y + rnorm(n, mean=50, sd=50)
)
f = function(data) sqrt(data$x^2+data$y^2+data$z^2)
f1 = function(df){
df %>% nest(data = c(x:z)) %>%
mutate(l = map(data, f)) %>%
unnest(c(data, l))
}
f1(df)
f2 = function(df){
df = df %>% mutate(l=NA)
for(r in seq_len(nrow(df))){
row = df[r, ]
df$l[r] = f(row)
}
df
}
f2(df)
f3 = function(df){
res = df %>%
split(., .$x) %>%
lapply(., f)
df %>% bind_cols(l = unlist(res))
}
f3(df)
ggplot2::autoplot(microbenchmark(f1(df), f2(df), f3(df), times=100))
Here is the result: [figure: autoplot of the microbenchmark timings for f1(df), f2(df), and f3(df)]
Do I need to add anything else to explain why the scheme tibble %>% nest %>% mutate(map) %>% unnest is so cool?
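One more point of comparison, offered as a sketch rather than a benchmark: because f is built from vectorized arithmetic, it can also be called once on the whole data frame. The row-by-row loop below exists only to confirm that both routes give the same values. A base R data.frame stands in for the tibble so that no packages are needed, and the seed is an arbitrary choice.

```r
set.seed(1)  # arbitrary seed so the comparison is repeatable
n <- 1000
df <- data.frame(x = 1:n)
df$y <- df$x + rnorm(n, mean = 50, sd = 50)
df$z <- df$x + df$y + rnorm(n, mean = 50, sd = 50)

f <- function(data) sqrt(data$x^2 + data$y^2 + data$z^2)

# One vectorized call over all rows
l_vectorized <- f(df)

# Row-by-row loop, in the style of f2
l_loop <- numeric(nrow(df))
for (r in seq_len(nrow(df))) {
  l_loop[r] <- f(df[r, ])
}

all.equal(l_vectorized, l_loop)
```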
How to change length of certain column of a dataframe in foreach() in R
Assuming that we want to apply foreach separately to the rows of each unique 'id', one option is to either split by 'id' or wrap the call in an outer for loop iterating over the unique 'id' values. Inside the loop, subset the data for that particular 'id' ('subdat'). Initialize a list ('out') to store the results of the foreach, and assign to it the matrix output from the inner nested for loop:
library(doParallel)
un_id <- unique(data2$id)
out <- vector('list', length(un_id))
names(out) <- un_id
registerDoParallel(cl <- makeCluster(length(un_id)))
for(id1 in un_id) {
subdat <- subset(data2, id == id1)
# index by name: a numeric id1 would otherwise be treated as a position
out[[as.character(id1)]] <- foreach(i = seq_len(nrow(subdat))) %dopar% {
a <- matrix(nrow = nrow(subdat), ncol=2)
for (j in seq(i)) {
for (k in 1:2) {
a[j,k] <- j*k
}
}
a
}
}
stopCluster(cl)
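The split-by-'id' option mentioned above can be sketched sequentially in base R. This is not a parallel version; it only shows the shape of the result: one list entry per 'id', each holding one matrix per row. The data frame data2 here is a made-up stand-in with a numeric 'id' column.

```r
# Made-up stand-in for data2
data2 <- data.frame(id = c(3, 3, 7, 7, 7), value = 1:5)

# One data frame per unique id, named by the id
by_id <- split(data2, data2$id)

out <- lapply(by_id, function(subdat) {
  lapply(seq_len(nrow(subdat)), function(i) {
    a <- matrix(nrow = nrow(subdat), ncol = 2)
    # fill the first i rows with j * k, as the nested for loops do
    a[seq_len(i), ] <- outer(seq_len(i), 1:2)
    a
  })
})

names(out)
```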
Looping through list of data frames in R
> x <- 1:5  # assumed values, matching the output below
> y <- 1:5
> df1 <- data.frame("Row One"=x, "Row Two"=y)
> df2 <- data.frame("Row Two"=y,"Row One"=x)
> dfList <- list(df1,df2)
> lapply(dfList, function(x) {
names(x)[ grep("One", names(x))] <- "R1"
names(x)[ grep("Two", names(x))] <- "R2"
x} )
[[1]]
R1 R2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
R2 R1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
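Note that lapply() returns a new list and leaves dfList itself unchanged, so the result has to be assigned to keep the renamed columns. A minimal sketch, assuming x and y are both 1:5 as the printed output suggests:

```r
x <- 1:5  # assumed values, matching the output shown above
y <- 1:5
df1 <- data.frame("Row One" = x, "Row Two" = y)
df2 <- data.frame("Row Two" = y, "Row One" = x)
dfList <- list(df1, df2)

# Assign the result back so the renamed data frames are kept
dfList <- lapply(dfList, function(d) {
  names(d)[grep("One", names(d))] <- "R1"
  names(d)[grep("Two", names(d))] <- "R2"
  d
})

names(dfList[[1]])
names(dfList[[2]])
```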