Update data frame via function doesn't work
test in your function is a copy of the object from your global environment (I'm assuming that's where it is defined). Assignment happens in the current environment unless specified otherwise, so any changes that happen inside the function apply only to the copy inside the function, not the object in your global environment.
And it's good form to pass all necessary objects as arguments to the function.
Personally, I would return(test) at the end of your function and make the assignment outside of the function, but I'm not sure if you can do this in your actual situation.
test.fun <- function (x, test) {
  test[test$v1==x,"v2"] <- 10
  return(test)
}
test <- data.frame(v1=c(rep(1,3),rep(2,3)),v2=0)
(test <- test.fun(1, test))
# v1 v2
#1 1 10
#2 1 10
#3 1 10
#4 2 0
#5 2 0
#6 2 0
If it is absolutely necessary to modify an object outside your function directly, you need to tell R that you want to assign the local copy of test to the test in the .GlobalEnv.
test.fun <- function (x, test) {
  test[test$v1==x,"v2"] <- 10
  assign('test',test,envir=.GlobalEnv)
  #test <<- test # This also works, but the above is more explicit.
}
(test.fun(1, test))
# v1 v2
#1 1 10
#2 1 10
#3 1 10
#4 2 0
#5 2 0
#6 2 0
Using assign or <<- in this fashion is fairly uncommon, though, and many experienced R programmers will recommend against it.
Why does function not create new dataset?
Your function cannot modify the object in place. That's how R works. You need something like this:
library(haven)  # read_dta
library(dplyr)  # mutate_at, vars
dat <- read_dta('file.dta')
f1 <- function(x) {
  mutate_at(x, vars(a,b,c,...,z), list(~ ifelse(. == 9 | . == 99 | . == 999, NA, .)))
}
dat <- f1(dat)
Why pandas dataframe doesn't change when i used it as a input of a function with multiprocessing
Serializing df1 & df2 for multiprocessing means that you're making a copy.
Return your dataframe from the function and it'll work fine.
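import multiprocessing
def changeDF(df):
    df['Signal'] = 0
    return df
# df1 and df2 are the dataframes from the question
if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        df1, df2 = pool.map(changeDF, [df1, df2])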
I would warn you that the serialization costs here will almost certainly be higher than the benefit you get from multiprocessing.
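The copy made by serialization can be seen without pandas or worker processes. This is only a minimal sketch: it assumes a pickle round trip is a fair stand-in for what multiprocessing does when it ships an argument to a worker, and the dict named original and the function change_signal are hypothetical stand-ins for the dataframe and changeDF.

```python
import pickle

def change_signal(frame):
    # Mutate whatever object we were handed and return it.
    frame["Signal"] = [0] * len(frame["Price"])
    return frame

# Hypothetical stand-in for a dataframe: one column of prices.
original = {"Price": [10.0, 11.5, 9.8]}

# Roughly what happens to an argument on its way to a worker process:
# it is pickled in the parent and unpickled in the child.
shipped = pickle.loads(pickle.dumps(original))

result = change_signal(shipped)

print("Signal" in original)  # False: the parent's object never saw the change
print("Signal" in result)    # True: only the returned copy carries it
```

That is why the fix is to return the modified object and rebind the name in the parent, as pool.map does above.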
How do I properly call a function and return an updated dataframe?
You could use groupby with apply so that the apply call returns a dataframe, like this:
import pandas as pd
# add new column B for groupby - we need single group only to do the trick
df = pd.DataFrame(
    {'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])
def get_item(data):
    # create empty dataframe to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # append series data (or any data) to dataframe's columns
    # (pd.concat is used because Series.append was removed in pandas 2.0)
    comb['Newfield'] = pd.concat([comb['Newfield'], data['A']], ignore_index=True)
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb
# group on the constant column B so that apply passes the whole frame to get_item as one group
newdf = df.groupby('B').apply(get_item)
# after processing the dataframe newdf contains MultiIndex - simply remove the 0-level (index col B with value 1 gained from groupby operation)
newdf.droplevel(0)
Output:
Newfield AnotherNewfield
0 adam y
1 ed y
2 dra y
3 dave y
4 sed y
5 mike y
R: Using function arguments to update elements in a data frame
Sounds like a simple replace based on matching entries between a (list of) query dataframes and a subject dataframe.
Here is an example based on some simulated data.
I first simulate data for the subject dataframe:
# Sample data
giraffe <- data.frame(
  runkeys = seq(1:500),
  col1 = runif(500),
  col2 = runif(500),
  col3 = runif(500),
  col4 = runif(500));
I then simulate runkeys data for 2 query dataframes:
spine_hlfs <- data.frame(
  runkeys = c(44, 260, 478));
ir_dia <- data.frame(
  runkeys = c(10, 20, 30))
The query dataframes are stored in a list:
lst.runkeys <- list(
  spine_hlfs = spine_hlfs,
  ir_dia = ir_dia);
To flag runkeys entries present in any of the query dataframes, we can use a for loop to match runkeys entries from every query dataframe:
# This is the critical loop: it goes through the list of query dataframes
# and flags matching runkeys in giraffe (column 5) with the name of the query dataframe
for (i in seq_along(lst.runkeys)) {
  giraffe[match(lst.runkeys[[i]]$runkeys, giraffe$runkeys), 5] <- names(lst.runkeys)[i];
}
This is the output of the subject dataframe after matching runkeys entries. I'm only showing rows where entries in column 5 were replaced.
giraffe[grep("(spine_hlfs|ir_dia)", giraffe[, 5]), ];
10 10 0.7401977 0.005703928 0.6778921 ir_dia
20 20 0.7954076 0.331462567 0.7637870 ir_dia
30 30 0.5772808 0.183716142 0.6984193 ir_dia
44 44 0.9701355 0.655736489 0.4917452 spine_hlfs
260 260 0.1893012 0.600140166 0.0390346 spine_hlfs
478 478 0.7655976 0.910946623 0.9779205 spine_hlfs
my function doesn't work when multiple dataframe in space
All I did was "optimize" the code, and the function works correctly. Again, I think the issue is with using an undefined dt, as I mentioned in the comments:
library(data.table)
library(stringr)  # str_replace_all, str_split_fixed
Sun <- function(data, A) {
  dt <- data.table(data)
  dt[, V4:=str_replace_all(as.character(V4),c(" |//"="", "//"="") )][,
    str_split_fixed(V4 , ";", 100)
  ] -> splits
  data.table(substr(splits, 1,4)) -> splits
  splits[, which(sapply(.SD, function(x) all(!nzchar(x))))] -> rem
  splits[, (rem):=NULL]
  splits[,
    sapply(
      A,
      function (code) { pmin(1, rowSums(.SD == code, na.rm=TRUE)) },
      simplify=FALSE, USE.NAMES=TRUE
    )]
}
> Sun(DF1, A)
A01B A02B A03B A04B A05B G01B H02J G01R E05B
1: 0 0 0 0 0 0 0 0 1
2: 0 0 0 0 0 1 0 0 0
3: 0 0 0 0 0 0 1 1 0
> Sun(DF2, A)
A01B A02B A03B A04B A05B G01B H02J G01R E05B
1: 0 0 0 0 1 0 0 0 1
2: 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 1 1 0
DataFrame modified inside a function
import numpy as np
def test(df):
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df
If you pass a dataframe into a function, manipulate it, and return it, you get back the same dataframe in modified form. If you want to keep the old dataframe and also have a new one with your modifications, then by definition you need two dataframes: the one you pass in, which stays unmodified, and the new one, which is modified. So if you don't want to change the original dataframe, your best bet is to make a copy of it. In my example I rebound the variable df inside the function to the new copied dataframe. I used the copy method; the argument deep=True makes a copy of the dataframe and its contents. You can read more here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
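To make the difference concrete, here is a small sketch that puts the two behaviours side by side. The function names add_in_place and add_to_copy and the one-column frame are made up for illustration; only the copy(deep=True) call comes from the answer above.

```python
import numpy as np
import pandas as pd

def add_in_place(df):
    # Assigns a column on the caller's object; no copy is made.
    df["tt"] = np.nan
    return df

def add_to_copy(df):
    # Rebinds the local name to a deep copy, so the caller's object is untouched.
    df = df.copy(deep=True)
    df["tt"] = np.nan
    return df

original = pd.DataFrame({"a": [1, 2, 3]})

copied = add_to_copy(original)
print(list(original.columns))  # ['a']
print(list(copied.columns))    # ['a', 'tt']

same = add_in_place(original)
print(same is original)        # True
print(list(original.columns))  # ['a', 'tt']
```

The copy costs memory proportional to the size of the frame, so it is worth making only when the caller really needs the original left alone.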
Data frame won't update using observeEvent in Shiny R
Your code directly reacted to every change because you are using reactive.
If you want to delay the reaction you can use observeEvent along with reactiveValues or eventReactive.
Here is an example using reactiveVal and observeEvent:
library(shiny)
library(DT)
current.shiny <- data.frame(
  "Task" = as.character(c("Task 1", "Task 2", "Task 3")),
  "Completed" = as.character(c("Yes", "NO", "Yes")),
  "Date.Completed" = as.Date(c("2020-10-19", "2020-10-20", "2020-10-21"))
)
ui <- fluidPage(
  # Application title
  titlePanel("Week of 11.02.2020"),
  # Sidebar with reactive inputs
  sidebarLayout(
    sidebarPanel(
      selectInput(
        inputId = "task.choice",
        label = "Task",
        choices = c(as.list(current.shiny$Task))
      ),
      selectInput(
        inputId = "completed",
        label = "Completed?",
        choices = c("Yes" = "Yes", "No" = "No")
      ),
      dateInput(inputId = "date.completed", label = "Date Completed"),
      actionButton("update", "Update Sheet")
    ),
    mainPanel(column(
      12,
      DT::dataTableOutput("xchangeOut", width = "100%")
    ))
  ))
server <- function(input, output) {
  xchange <- reactiveVal(current.shiny)
  observeEvent(input$update, {
    test.data <- xchange()
    test.data$Completed[test.data$Task == input$task.choice] <- input$completed
    test.data$Date.Completed[test.data$Task == input$task.choice] <- input$date.completed
    xchange(test.data)
    # write.csv
  })
  # Display the most recent file, with the most recent changes
  output$xchangeOut <- renderDataTable({
    datatable(xchange(), options = list(dom = "t"))
  })
}
shinyApp(ui, server)