Subsetting Data Based on Dynamic Column Names

Use [[-subsetting:

DFb <- DF[DF[[Column_Name]] == "ABC",]

This is not as elegant as subset(), but it works. subset() uses "non-standard evaluation", which is very convenient for interactive use, but makes things more complicated when you want to do this kind of second-order reference.

The main thing is the [[; you could use subset(DF, DF[[Column_Name]] == "ABC") instead, and the results will be (almost) equivalent. The difference is that subset() automatically drops rows where the criterion evaluates to NA, whereas [ keeps them as rows filled with NA.
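To make the NA difference concrete, here is a minimal sketch. The data frame and its column names (id, grp) are invented for illustration; only the indexing patterns come from the answer above.

```r
# Toy data: the second row has a missing value in the column we filter on.
DF <- data.frame(
  id  = 1:4,
  grp = c("ABC", NA, "XYZ", "ABC"),
  stringsAsFactors = FALSE
)
Column_Name <- "grp"

# Plain [ indexing: the NA in the logical index yields an extra all-NA row.
by_bracket <- DF[DF[[Column_Name]] == "ABC", ]

# subset() silently drops rows where the condition is NA.
by_subset <- subset(DF, DF[[Column_Name]] == "ABC")

# Wrapping the condition in which() gives [ the same NA-dropping behaviour.
by_which <- DF[which(DF[[Column_Name]] == "ABC"), ]

print(nrow(by_bracket))
print(nrow(by_subset))
print(nrow(by_which))
```

If the column can contain missing values, the which() form is probably the safest way to keep using [.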

You can do this in the dplyr package, which allows more flexibility in avoiding non-standard evaluation, but it's still a bit roundabout (there may be a better way to do this: I'm not very experienced with dplyr).

library("dplyr")    ## for filter_()
library("lazyeval") ## for interp()
colname <- "speed"
filter_(cars,interp(~ var == 4, var = as.name(colname)))
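filter_() and lazyeval::interp() have since been deprecated in dplyr in favour of tidy evaluation. A sketch of the same filter using the .data pronoun, assuming a reasonably recent dplyr is installed:

```r
library(dplyr)

colname <- "speed"

# .data[[colname]] looks up the column whose name is stored in `colname`.
slow_cars <- filter(cars, .data[[colname]] == 4)
print(slow_cars)
```

This keeps the column name as an ordinary string, so it can come from user input or a loop variable.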

Subset a dataframe based on dynamic number and names of columns

If you change your output$table to the following, it works:

output$table <- renderDT({
  req(input$p1, sapply(input$p1, function(x) input[[x]]))
  dt_part <- dt
  for (colname in input$p1) {
    if (is.factor(dt_part[[colname]])) {
      dt_part <- subset(dt_part, dt_part[[colname]] %in% input[[colname]])
    } else {
      dt_part <- subset(dt_part, (dt_part[[colname]] >= input[[colname]][[1]]) & dt_part[[colname]] <= input[[colname]][[2]])
    }
  }
  dt_part
})

The req() call on the first line makes sure the required fields are available.

The loop then filters dt one column at a time: for each column name in input$p1, it keeps only the rows that match the value of the corresponding input, input[[colname]].

Subset a data.frame using a variable for the column name in the select expression

You are looking for get():

mycol = 'col1'
subset(df1, get(mycol) == 2)
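A self-contained sketch of the get() approach next to plain [[ indexing. The contents of df1 are invented here, since the original data is not shown.

```r
# Hypothetical data standing in for the asker's df1.
df1 <- data.frame(
  col1 = c(1, 2, 2, 3),
  col2 = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
mycol <- "col1"

# get() resolves the string in `mycol` to the column of that name.
via_get <- subset(df1, get(mycol) == 2)

# The same rows with [[ indexing, which avoids non-standard evaluation.
via_brackets <- df1[df1[[mycol]] == 2, ]

print(via_get)
```

One caveat: if df1 itself had a column called mycol, subset() would find that column first, so the [[ form is the more predictable of the two.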

Incorrect subset of dataframe based on dynamic column names selection

By default, inputs where no value is selected are NULL. So you have to check whether the input is NULL and, if it is, skip the filter for that column. If you use dplyr, I've recently written a function to make this filtering easier in shiny.

Here is a working example with your code:

library(shiny)
library(DT)
library(shinyWidgets)

# `dt` is the data frame from the question; it must exist before the UI is built.

# ui object
ui <- fluidPage(
  titlePanel(p("Spatial app", style = "color:#3474A7")),
  sidebarLayout(
    sidebarPanel(
      pickerInput(
        inputId = "p1",
        label = "Select Column headers",
        choices = colnames(dt),
        multiple = TRUE,
        options = list(`actions-box` = TRUE)
      ),
      # Add the output for new pickers
      uiOutput("pickers")
    ),

    mainPanel(
      DTOutput("table")
    )
  )
)

# server()
server <- function(input, output) {

  observeEvent(input$p1, {
    # Create the new pickers
    output$pickers <- renderUI({
      div(lapply(input$p1, function(x) {
        if (is.numeric(dt[[x]])) {
          sliderInput(inputId = x, label = x, min = min(dt[[x]]), max = max(dt[[x]]),
                      value = c(min(dt[[x]]), max(dt[[x]])))
        } else if (is.factor(dt[[x]])) {
          selectInput(
            inputId = x,        # the colname of the selected column
            label = x,          # the colname of the selected column
            choices = dt[, x],  # all rows of the selected column
            multiple = TRUE
          )
        }
      }))
    })
  })

  output_table <- reactive({
    req(input$p1, sapply(input$p1, function(x) input[[x]]))
    dt_part <- dt
    for (colname in input$p1) {
      if (is.factor(dt_part[[colname]]) && !is.null(input[[colname]])) {
        # Factor column: keep rows whose level is among the selected values
        dt_part <- subset(dt_part, dt_part[[colname]] %in% input[[colname]])
      } else {
        if (!is.null(input[[colname]][[1]])) {
          # Numeric column: keep rows inside the slider range
          dt_part <- subset(dt_part, (dt_part[[colname]] >= input[[colname]][[1]]) & dt_part[[colname]] <= input[[colname]][[2]])
        }
      }
    }
    dt_part
  })

  output$table <- renderDT({
    output_table()
  })
}

# shinyApp()
shinyApp(ui = ui, server = server)
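The filtering logic can also be tried outside Shiny. The sketch below is not part of the original answer: filter_by_inputs() is a hypothetical helper, and a plain named list stands in for Shiny's input object. It skips any column whose value is NULL, as described above.

```r
# Hypothetical helper: `selected` plays the role of input$p1 and
# `values` plays the role of input (a named list of filter values).
filter_by_inputs <- function(data, selected, values) {
  for (colname in selected) {
    value <- values[[colname]]
    if (is.null(value)) next            # nothing chosen yet: do not filter
    column <- data[[colname]]
    if (is.factor(column)) {
      keep <- column %in% value         # categorical: match selected levels
    } else {
      keep <- column >= value[[1]] & column <= value[[2]]  # numeric range
    }
    data <- data[which(keep), , drop = FALSE]
  }
  data
}

res <- filter_by_inputs(
  iris,
  selected = c("Species", "Sepal.Length"),
  values = list(Species = "setosa", Sepal.Length = c(5, 5.5))
)
print(head(res))
```

Keeping the loop in a plain function like this makes it possible to check the filter on its own before wiring it into a reactive.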

Dynamic column names for subsetting

If I were in your place, I would avoid subset, and manage the issue like this.

xy <- data.frame(vals1 = runif(9), vals2 = runif(9), a = sample(1:3, 9, replace = TRUE), 
b = sample(1:3, 9, replace = TRUE), c = sample(1:3, 9, replace = TRUE),
d = sample(1:3, 9, replace = TRUE))

iterate.vals <- names(xy)[!grepl("vals", names(xy))]
sapply(iterate.vals, FUN = function(x) {
print(xy[xy[, x] == 1, ])
# Run
})

vals1 vals2 a b c d
2 0.6165867 0.3728094 1 1 2 1
3 0.2962395 0.9669952 1 3 1 2
7 0.5657228 0.7200541 1 3 2 3
8 0.7793529 0.8391430 1 1 1 1
vals1 vals2 a b c d
1 0.6028678 0.9178560 2 1 1 3
2 0.6165867 0.3728094 1 1 2 1
5 0.7234325 0.8426445 2 1 1 1
6 0.5637070 0.1895586 2 1 2 3
8 0.7793529 0.8391430 1 1 1 1
vals1 vals2 a b c d
1 0.6028678 0.9178560 2 1 1 3
3 0.2962395 0.9669952 1 3 1 2
4 0.9293780 0.3459115 2 2 1 3
5 0.7234325 0.8426445 2 1 1 1
8 0.7793529 0.8391430 1 1 1 1
vals1 vals2 a b c d
2 0.6165867 0.3728094 1 1 2 1
5 0.7234325 0.8426445 2 1 1 1
8 0.7793529 0.8391430 1 1 1 1
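If the subsets are needed later rather than just printed, a variant with lapply() returns them as a named list. This is a sketch with small fixed data (invented here) so that the result is reproducible:

```r
# Small deterministic stand-in for the random xy data frame above.
xy <- data.frame(
  vals1 = c(0.1, 0.2, 0.3, 0.4),
  a = c(1, 2, 1, 3),
  b = c(3, 1, 1, 2)
)

iterate.vals <- names(xy)[!grepl("vals", names(xy))]

# One subset per column, keeping rows where that column equals 1.
subsets <- lapply(setNames(iterate.vals, iterate.vals), function(x) {
  xy[xy[[x]] == 1, , drop = FALSE]
})

print(subsets$a)
print(subsets$b)
```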

How to add dynamic column name to subset?

eval() with parse() works:

eval(parse( text=paste0("subset(t, ", inputparam, "=='", inputvalue, "')") ))

The inputvalue has to be wrapped in an extra pair of quotes so that parse() treats it as a character string rather than as a variable name.

Alternatively, you could index the data frame directly and avoid eval(parse()) altogether:

t[ t[colnames(t)==inputparam]==inputvalue, ]
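A small sketch comparing the two approaches on invented data. The data frame is called t_df here (instead of t) only to avoid masking the base function t().

```r
# Hypothetical data and inputs for illustration.
t_df <- data.frame(
  name = c("a", "b", "a"),
  n = 1:3,
  stringsAsFactors = FALSE
)
inputparam <- "name"
inputvalue <- "a"

# Building and evaluating code from strings.
via_parse <- eval(parse(
  text = paste0("subset(t_df, ", inputparam, " == '", inputvalue, "')")
))

# Direct indexing with [[, no code generation involved.
via_index <- t_df[t_df[[inputparam]] == inputvalue, ]

print(via_index)
```

Both return the same rows here, but the pasted string breaks as soon as inputvalue contains a single quote, and it will evaluate whatever code ends up in the string, which is a good reason to prefer direct indexing.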

subsetting data tables by dynamic column names

Let's look at your expression in i:

grep(i,colnames(mm2myModuleByYear),value=TRUE)
# [1] "module1997"

Therefore the expression:

grep(i,colnames(mm2myModuleByYear),value=TRUE)==mId
# [1] FALSE

would return FALSE (of course, "module1997" != 37). What you intend here is to fetch the column named by your grep() expression. To do that, you can use get() from base R.

with(mm2myModuleByYear, get(grep(i,colnames(mm2myModuleByYear),value=TRUE)))
# [1] 1428 669 37 NA NA NA

In short, you're missing a get() in your i-expression.

mm2myModuleByYear[get(grep(i,colnames(mm2myModuleByYear),value=TRUE))==mId, authId]
# [1] 2270
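A reproducible sketch of the same idea, assuming the data.table package is installed. The table below is a made-up stand-in for mm2myModuleByYear: the module1997 values follow the output shown above, while the authId values other than 2270 are invented.

```r
library(data.table)

# Hypothetical stand-in for mm2myModuleByYear.
modules <- data.table(
  authId = c(1001L, 1002L, 2270L, 1004L),
  module1997 = c(1428L, 669L, 37L, NA)
)
year <- 1997
mId <- 37L

# Find the column name first, then use it to fetch the column.
col <- grep(as.character(year), colnames(modules), value = TRUE)

via_get <- modules[get(col) == mId, authId]
via_brackets <- modules[modules[[col]] == mId, authId]

print(via_get)
```

Computing the column name outside the brackets keeps the i-expression short and makes it easier to see which column is being compared.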

Dynamically generate subset column names for a dataframe using for loop

You can do it in base R like this with a bit of help from the lubridate package.

year_months <- c('2021-12', '2021-11', '2021-10')  
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
mapply(function(x, y) {
df[c(
"id",
format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)]
}, curr, prev, SIMPLIFY = FALSE)

Output

[[1]]
id 2021-10(actual) 2021-11(actual) 2021-12(actual) 2021-12(pred) 2021-12(error)
1 M0000607 8.9 7.3 6.1 6.113632 0.7198461
2 M0000609 15.7 14.8 14.2 14.162432 0.1544640
3 M0000612 5.3 3.1 3.5 3.288373 1.2259926

[[2]]
id 2021-09(actual) 2021-10(actual) 2021-11(actual) 2021-11(pred) 2021-11(error)
1 M0000607 10.3 8.9 7.3 8.352098 1.9981091
2 M0000609 17.3 15.7 14.8 13.973182 0.4143733
3 M0000612 6.4 5.3 3.1 3.164683 0.3420726

[[3]]
id 2021-08(actual) 2021-09(actual) 2021-10(actual) 2021-10(pred) 2021-10(error)
1 M0000607 12.6 10.3 8.9 9.619846 0.9455678
2 M0000609 19.2 17.3 15.7 15.545536 4.8832500
3 M0000612 8.3 6.4 5.3 6.525993 1.2158196

If you want to apply a plot function to the selected dataframe, then

year_months <- c('2021-12', '2021-11', '2021-10')  
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
plots <- mapply(function(x, y) {
plot_fun(df[c(
"id",
format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)])
}, curr, prev, SIMPLIFY = FALSE)

gives you a list of (gg)plots.


Update: to also select the same month of the previous year, add it to the vector of "actual" columns. Note that every column you select must exist in the dataframe; otherwise, you will get an error.

year_months <- c('2021-12', '2021-11', '2021-10')  
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
mapply(function(x, y) {
df[c(
"id",
format(c(x - lubridate::years(1L), seq.Date(y, x, by = "month")), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)]
}, curr, prev, SIMPLIFY = FALSE)
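One way to guard against missing columns is to intersect the wanted names with the names that are actually present. The helper select_existing() and the toy data below are hypothetical, not part of the original answer:

```r
# Toy data frame; check.names = FALSE keeps names such as "2021-12(pred)" intact.
df_demo <- data.frame(
  id = c("M0000607", "M0000609"),
  `2021-12(actual)` = c(6.1, 14.2),
  `2021-12(pred)` = c(6.11, 14.16),
  check.names = FALSE
)

# Hypothetical helper: select only the wanted columns that exist,
# and report the ones that were skipped.
select_existing <- function(data, wanted) {
  missing_cols <- setdiff(wanted, names(data))
  if (length(missing_cols) > 0) {
    message("Skipping missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data[intersect(wanted, names(data))]
}

wanted <- c("id", "2020-12(actual)", "2021-12(actual)", "2021-12(pred)")
picked <- select_existing(df_demo, wanted)
print(names(picked))
```

Whether silently skipping a missing month is acceptable depends on the analysis; stopping with an error may be the better choice when every column is required.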

Subset based on variable column name

This is precisely why subset is a bad tool for anything other than interactive use:

d <- data.frame(x = letters[1:5], y = runif(5))
d[d[, 'x'] == 'c', ]
#   x         y
# 3 c 0.3080524

Fundamentally, extracting things in R is built around [. Use it.
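To illustrate the point with a column name held in a variable, here is a short sketch. The function pick_rows() is a hypothetical name, and y is fixed rather than random so the result is reproducible.

```r
d <- data.frame(x = letters[1:5], y = 1:5, stringsAsFactors = FALSE)

# Hypothetical helper: the column name arrives as a string.
pick_rows <- function(data, column, value) {
  data[which(data[[column]] == value), , drop = FALSE]
}

right <- pick_rows(d, "x", "c")

# With subset(), the variable itself is compared, not the column it names:
# "x" == "c" is FALSE, so no rows are returned.
column <- "x"
wrong <- subset(d, column == "c")

print(right)
print(nrow(wrong))
```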


