Update Data Frame Via Function Doesn't Work

Inside your function, test is a copy of the object from your global environment (I'm assuming that's where it is defined). Assignment happens in the current environment unless specified otherwise, so any changes made inside the function apply only to that local copy, not to the object in your global environment.

It's also good form to pass all necessary objects to the function as arguments.

Personally, I would return(test) at the end of your function and make the assignment outside of the function, but I'm not sure if you can do this in your actual situation.

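test.fun <- function (x, test) {
  # modify the local copy: set v2 to 10 in rows where v1 equals x
  test[test$v1 == x, "v2"] <- 10
  # hand the modified copy back to the caller
  return(test)
}
test <- data.frame(v1 = c(rep(1, 3), rep(2, 3)), v2 = 0)
# reassign the result outside the function; the outer parentheses print it
(test <- test.fun(1, test))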
# v1 v2
#1 1 10
#2 1 10
#3 1 10
#4 2 0
#5 2 0
#6 2 0

If it is absolutely necessary to modify an object outside your function directly, you need to tell R explicitly that you want to assign the local copy of test to the test in .GlobalEnv.

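test.fun <- function (x, test) {
  test[test$v1 == x, "v2"] <- 10
  # copy the modified local test into the global environment
  assign('test', test, envir = .GlobalEnv)
  #test <<- test # This also works, but the above is more explicit.
}
# the outer parentheses print the returned data frame
(test.fun(1, test))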
# v1 v2
#1 1 10
#2 1 10
#3 1 10
#4 2 0
#5 2 0
#6 2 0

Using assign or <<- in this fashion is fairly uncommon, though, and many experienced R programmers will recommend against it.

Why does my function not create a new dataset?

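Your function cannot modify the object in place; that's how R works, since a function only ever changes its own copy of an argument. Return the modified data and reassign it instead, with something like this: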

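library(haven)  # read_dta()
library(dplyr)  # mutate_at(), vars()

dat <- read_dta('file.dta')

# replace the missing-value codes 9, 99 and 999 with NA in the listed columns
f1 <- function(x){
  mutate_at(x, vars(a,b,c,...,z), list(~ ifelse(. == 9 | . == 99 | . == 999, NA, .)))}

# f1 returns a new data frame, so assign it back to dat
dat <- f1(dat)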

Why doesn't my pandas dataframe change when I use it as the input of a function with multiprocessing?

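To send df1 and df2 to the worker processes, multiprocessing serializes (pickles) them, so each worker modifies its own copy and the dataframes in the parent process never change.

Return your dataframe from the function and assign the results back, and it'll work fine: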

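import multiprocessing

def changeDF(df):
    df['Signal'] = 0
    return df

if __name__ == '__main__':  # required where workers are spawned (Windows, macOS)
    # df1 and df2 are assumed to be defined already
    with multiprocessing.Pool(processes=2) as pool:
        # each worker modifies a copy; collect the returned copies
        df1, df2 = pool.map(changeDF, [df1, df2])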

Be warned, though: for a cheap operation like this, the cost of serializing the dataframes will very likely outweigh any benefit you get from multiprocessing.
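The copy that serialization produces can be reproduced without a process pool. The sketch below uses a pickle round trip in a single process as a stand-in for sending a dataframe to a worker (a simplification of what Pool does); the name change_df and the sample data are illustrative.

```python
import pickle

import pandas as pd


def change_df(df):
    # Adds a column to whatever DataFrame object it receives and returns it
    df['Signal'] = 0
    return df


df1 = pd.DataFrame({'x': [1, 2, 3]})

# Stand-in for sending df1 to a worker: pickling and unpickling yields
# an independent copy, like the one a worker process would receive
worker_copy = pickle.loads(pickle.dumps(df1))
returned = change_df(worker_copy)

# The original never sees the change made to the copy
original_unchanged = 'Signal' not in df1.columns
print('Signal' in df1.columns)       # False

# Reassigning the returned DataFrame is what makes the change stick
df1 = returned
print('Signal' in df1.columns)       # True
```

This is the same reason the pool.map version has to assign its results back to df1 and df2.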

How do I properly call a function and return an updated dataframe?

You could use groupby with apply to get a dataframe back from the apply call, like this:

import pandas as pd

# add new column B for groupby - we need a single group only to do the trick
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # build the dataframe to be returned from the group's column A;
    # reset_index(drop=True) renumbers the rows 0..n-1
    # (Series.append was removed in pandas 2.0, so the column is built directly)
    comb = pd.DataFrame({'Newfield': data['A'].reset_index(drop=True)})
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group on column B so apply receives the whole frame as a single group
newdf = df.groupby('B').apply(get_item)
# newdf now has a MultiIndex; drop level 0 (the group key B = 1 added by groupby)
newdf = newdf.droplevel(0)
print(newdf)

Output:

  Newfield AnotherNewfield
0     adam               y
1       ed               y
2      dra               y
3     dave               y
4      sed               y
5     mike               y
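Keep in mind that droplevel returns a new dataframe rather than modifying the one it is called on, so its result has to be assigned (or printed) to be kept. A minimal sketch, using a hand-built MultiIndex as a hypothetical stand-in for the groupby/apply result:

```python
import pandas as pd

# Hypothetical stand-in for the groupby/apply result: level 0 is the
# group key B, level 1 is the row position within the group
index = pd.MultiIndex.from_product([[1], [0, 1, 2]], names=['B', None])
newdf = pd.DataFrame(
    {'Newfield': ['adam', 'ed', 'dra'], 'AnotherNewfield': 'y'}, index=index
)

newdf.droplevel(0)               # builds a new DataFrame, then discards it
levels_before = newdf.index.nlevels
print(levels_before)             # 2: newdf itself was not modified

newdf = newdf.droplevel(0)       # rebind the name to keep the result
print(newdf.index.nlevels)       # 1
```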

R: Using function arguments to update elements in a data frame

This sounds like a simple replacement based on matching entries between a list of query dataframes and a subject dataframe.

Here is an example based on some simulated data.

I first simulate data for the subject dataframe:

# Sample data
giraffe <- data.frame(
  runkeys = 1:500,
  col1 = runif(500),
  col2 = runif(500),
  col3 = runif(500),
  col4 = runif(500))

I then simulate runkeys data for 2 query dataframes:

spine_hlfs <- data.frame(
  runkeys = c(44, 260, 478))
ir_dia <- data.frame(
  runkeys = c(10, 20, 30))

The query dataframes are stored in a list:

lst.runkeys <- list(
  spine_hlfs = spine_hlfs,
  ir_dia = ir_dia)

To flag runkeys entries present in any of the query dataframes, we can use a for loop to match runkeys entries from every query dataframe:

# This is the critical line that loops through the query dataframes
# and flags runkeys in giraffe with the name of the query dataframe.
# Column 5 is col4, so its values are overwritten (and the column becomes character).
for (i in seq_along(lst.runkeys)) {
  giraffe[match(lst.runkeys[[i]]$runkeys, giraffe$runkeys), 5] <- names(lst.runkeys)[i]
}

This is the subject dataframe after matching runkeys entries. I'm only showing the rows whose entries in column 5 were replaced.

giraffe[grep("(spine_hlfs|ir_dia)", giraffe[, 5]), ]
    runkeys      col1        col2      col3       col4
10       10 0.7401977 0.005703928 0.6778921     ir_dia
20       20 0.7954076 0.331462567 0.7637870     ir_dia
30       30 0.5772808 0.183716142 0.6984193     ir_dia
44       44 0.9701355 0.655736489 0.4917452 spine_hlfs
260     260 0.1893012 0.600140166 0.0390346 spine_hlfs
478     478 0.7655976 0.910946623 0.9779205 spine_hlfs

My function doesn't work when multiple data frames are in the workspace

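All I did was "optimize" the code, and the function now works correctly. Again, I think the issue is that your original function used a dt that isn't defined inside it, as I mentioned in the comments; R then looks dt up in the global environment, so the result depends on which data frames happen to be there: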

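library(data.table)
library(stringr)

Sun <- function(data, A) {
  # work on a data.table created inside the function, not on a global dt
  dt <- data.table(data)
  # strip spaces and "//" from V4, then split it on ";" into up to 100 columns
  dt[, V4 := str_replace_all(as.character(V4), c(" |//" = "", "//" = ""))][,
    str_split_fixed(V4, ";", 100)
  ] -> splits
  # keep only the first four characters of each code
  data.table(substr(splits, 1, 4)) -> splits

  # drop columns that are empty in every row
  splits[, which(sapply(.SD, function(x) all(!nzchar(x))))] -> rem
  splits[, (rem) := NULL]

  # one 0/1 indicator column per code in A
  splits[,
    sapply(
      A,
      function(code) { pmin(1, rowSums(.SD == code, na.rm = TRUE)) },
      simplify = FALSE, USE.NAMES = TRUE
    )]
}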
> Sun(DF1, A)
A01B A02B A03B A04B A05B G01B H02J G01R E05B
1: 0 0 0 0 0 0 0 0 1
2: 0 0 0 0 0 1 0 0 0
3: 0 0 0 0 0 0 1 1 0
> Sun(DF2, A)
A01B A02B A03B A04B A05B G01B H02J G01R E05B
1: 0 0 0 0 1 0 0 0 1
2: 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 1 1 0

DataFrame modified inside a function

import numpy as np

def test(df):
    # rebind df to an independent copy so the caller's dataframe is untouched
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df

If you pass a dataframe into a function, modify it and return it, you get back the same dataframe object in its modified version, and the caller's dataframe is modified too, since both names refer to one object. If you want to keep your old dataframe and create a new dataframe with your modifications, then by definition you have to have two dataframes: the one you pass in, which you don't want modified, and the new one that is modified. Therefore, if you don't want to change the original dataframe, your best bet is to make a copy of it. In my example I rebound the variable "df" in the function to the new copied dataframe. I used the copy method; the argument deep=True (also the default) copies the dataframe's data and indices rather than just referencing them. You can read more here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
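To see the difference the copy makes, the sketch below contrasts the copying approach with a version that skips the copy. The function names and sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def add_column_to_copy(df):
    # Rebind df to an independent copy first; the caller's object is untouched
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df


def add_column_in_place(df):
    # No copy: this adds the column to the very object the caller passed in
    df['tt'] = np.nan
    return df


original = pd.DataFrame({'a': [1, 2]})

modified = add_column_to_copy(original)
cols_after_copy = list(original.columns)
print(cols_after_copy)           # ['a']
print(list(modified.columns))    # ['a', 'tt']

same = add_column_in_place(original)
print(same is original)          # True
print(list(original.columns))    # ['a', 'tt']
```

The second function returns the caller's own object, so there is only ever one dataframe; the first returns a genuinely separate one.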

Data frame won't update using observeEvent in Shiny R

Your code reacts to every input change immediately because you are using reactive().

If you want to delay the reaction until the button is clicked, you can use observeEvent together with reactiveValues (or reactiveVal), or use eventReactive.

Here is an example using reactiveVal and observeEvent:

library(shiny)
library(DT)

current.shiny <- data.frame(
  "Task" = c("Task 1", "Task 2", "Task 3"),
  "Completed" = c("Yes", "NO", "Yes"),
  "Date.Completed" = as.Date(c("2020-10-19", "2020-10-20", "2020-10-21")),
  # keep text columns as character (not factor) so new values can be assigned in R < 4.0
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  # Application title
  titlePanel("Week of 11.02.2020"),

  # Sidebar with reactive inputs
  sidebarLayout(
    sidebarPanel(
      selectInput(
        inputId = "task.choice",
        label = "Task",
        choices = c(as.list(current.shiny$Task))
      ),
      selectInput(
        inputId = "completed",
        label = "Completed?",
        choices = c("Yes" = "Yes", "No" = "No")
      ),
      dateInput(inputId = "date.completed", label = "Date Completed"),
      actionButton("update", "Update Sheet")
    ),
    mainPanel(column(
      12,
      DT::dataTableOutput("xchangeOut", width = "100%")
    ))
  ))

server <- function(input, output) {
  # reactiveVal holding the current table; it only changes when it is set
  xchange <- reactiveVal(current.shiny)

  # runs only when the "Update Sheet" button is clicked
  observeEvent(input$update, {
    test.data <- xchange()
    test.data$Completed[test.data$Task == input$task.choice] <- input$completed
    test.data$Date.Completed[test.data$Task == input$task.choice] <- input$date.completed
    xchange(test.data)
    # write.csv
  })

  # Display the most recent file, with the most recent changes
  output$xchangeOut <- DT::renderDataTable({
    datatable(xchange(), options = list(dom = "t"))
  })
}

shinyApp(ui, server)

