Returning a dataframe in python function
When you call create_df(), Python runs the function but doesn't save the result in any variable. That is why you got the error. Assign the result of create_df() to a new variable df, like this:
df = create_df()
df
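A minimal sketch of the whole pattern. The body of create_df() below is hypothetical, since the original function is not shown here; only the call and assignment matter:

```python
import pandas as pd

def create_df():
    # Hypothetical body: the original function is not shown, so this
    # just builds a small placeholder frame.
    return pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

create_df()        # the returned frame is discarded; no variable is created
df = create_df()   # the returned frame is now bound to the name df
print(df.shape)    # (3, 2)
```

Any name defined inside the function stays local to it, so the caller only ever sees what is returned and assigned.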
Return a data frame from function
If I understand you correctly, you are trying to create a dataframe with the number of complete cases for each id. Supposing your files are named with the id numbers as you specified (e.g. f2.csv), you can simplify your function as follows:
myfunc <- function(directory, id = 1:332) {
  y <- vector()
  for(i in 1:length(id)){
    x <- id
    y <- c(y, sum(complete.cases(
      read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))))))
  }
  df <- data.frame(x, y)
  colnames(df) <- c("id","ret2")
  return(df)
}
You can call this function like this:
myfunc("name-of-your-directory",25:87)
An explanation of the above code. You have to break down your problem into steps:
- You need a vector of the ids. That's done by x <- id.
- For each id you want the number of complete cases. To get that, you have to read the file first. That's done by read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))). To get the number of complete cases for that file, you have to wrap the read.csv code inside sum and complete.cases.
- Now you want to add that number to a vector. Therefore you need an empty vector (y <- vector()) to which you can add the number of complete cases from step 2. That's done by wrapping the code from step 2 inside y <- c(y, "code step 2"). With this you add the number of complete cases for each id to the vector y.
- The final step is to combine these two vectors into a dataframe with df <- data.frame(x, y) and assign some meaningful colnames.
By including steps 1, 2 and 3 (except the y <- vector() part) in a for-loop, you can iterate over the list of specified ids. The empty vector has to be created with y <- vector() before the for-loop, so that the for-loop can add values to y.
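For readers working in pandas rather than R, the same steps can be sketched as below. This is only an illustration: count_complete_cases is a hypothetical name, the f<id>.csv file naming is taken from the answer above, and a "complete case" is assumed to mean a row with no missing values.

```python
import os
import pandas as pd

def count_complete_cases(directory, ids):
    # Hypothetical pandas counterpart of myfunc; assumes files named f<id>.csv.
    counts = []  # plays the role of the empty vector y
    for i in ids:
        frame = pd.read_csv(os.path.join(directory, f"f{i}.csv"))
        # A complete case is a row without any missing value.
        counts.append(len(frame.dropna()))
    # Combine the ids and the counts into one frame, as in the final step.
    return pd.DataFrame({"id": list(ids), "ret2": counts})
```

It would be called in the same way as the R version, for example count_complete_cases("name-of-your-directory", range(25, 88)).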
Python - return dataframe and list from function
You can simply return two variables from the function:
import pandas as pd
# sample data
data_dict = {'unit':['a','b','c','d'],'salary':[100,200,250,300]}
# create data frame
df = pd.DataFrame(data_dict)
# Function that returns a dataframe
def test_func(df):
    # create list meeting some criteria
    mylist = list(df.loc[(df['salary']<=200),'unit'])
    # create new dataframe based on mylist
    new_df = df[df['unit'].isin(mylist)]
    return new_df, mylist
# create dataframe and list from function
new_df, mylist = test_func(df)
Apply a function returning a data frame to each row in a data frame
You haven't shown what f contains, but based on the comments it is written for dataframes, so this should work:
lapply(split(d, seq_len(nrow(d))), f)
split divides d into one-row dataframes, one per row, and lapply applies the function f to each of them.
You can also use by:
by(d, seq_len(nrow(d)), f)
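The same split-then-apply idea can be sketched in pandas. The function f and the columns x and y below are hypothetical stand-ins, since the original f is not shown:

```python
import pandas as pd

def f(row_df):
    # Hypothetical stand-in for f: takes a one-row frame, returns a frame.
    return row_df.assign(total=row_df["x"] + row_df["y"])

d = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# d.iloc[[i]] (double brackets) keeps each row as a one-row DataFrame
# instead of collapsing it to a Series.
pieces = [f(d.iloc[[i]]) for i in range(len(d))]

# Stack the per-row results back into a single frame.
result = pd.concat(pieces)
```

Here pieces corresponds to the list that lapply returns in R, and pd.concat is only needed if one combined frame is wanted.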
How do I properly call a function and return an updated dataframe?
You could use groupby with apply to get a dataframe from the apply call, like this:
import pandas as pd
# add new column B for groupby - we need single group only to do the trick
df = pd.DataFrame(
{'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
index=['a', 'b', 'c', 'd', 'e', 'f'])
def get_item(data):
    # create empty dataframe to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # add series data (or any data) to the dataframe's column;
    # Series.append was removed in pandas 2.0, so pd.concat is used instead
    comb['Newfield'] = pd.concat([comb['Newfield'], data['A']], ignore_index=True)
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb
# use column B for group to get tuple instead of dataframe
newdf = df.groupby('B').apply(get_item)
# after processing the dataframe newdf contains MultiIndex - simply remove the 0-level (index col B with value 1 gained from groupby operation)
newdf.droplevel(0)
Output:
  Newfield AnotherNewfield
0     adam               y
1       ed               y
2      dra               y
3     dave               y
4      sed               y
5     mike               y
python: having trouble returning a pandas data frame from a user defined function (probably user error)
Inside testfunction, the variable new_df_to_output is essentially a label that you are assigning to the passed-in object.
testfunction('Desired_DF_name') doesn't do what you think: it assigns the string 'Desired_DF_name' to the variable new_df_to_output; it does not create a new variable named Desired_DF_name. Basically it's the same as writing new_df_to_output = 'Desired_DF_name'.
You want to save the DataFrame that is returned from the function into a variable. So instead of
testfunction('Desired_DF_name')
you want
def testfunction():
    ...
Desired_DF_name = testfunction()
(You can change the definition of testfunction to remove the new_df_to_output parameter. The function wasn't doing anything with it anyway because you immediately reassign the variable: new_df_to_output = pd.DataFrame().)
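A small runnable sketch of this advice. The body of testfunction is hypothetical, as the original function's contents are not shown:

```python
import pandas as pd

def testfunction():
    # Hypothetical body; new_df_to_output is just a local name here.
    new_df_to_output = pd.DataFrame({"col": [1, 2]})
    return new_df_to_output

# The caller decides which name the returned DataFrame gets.
Desired_DF_name = testfunction()
print(type(Desired_DF_name).__name__)  # DataFrame
```

The name used inside the function and the name used by the caller are independent; only the returned object connects them.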
Converting returned values from a function into a data frame
There are a lot of different ways to do it, but with a single tuple entered as the data param for DataFrame we get 4 rows. So we can use .T to transpose the data and get four columns and one row. We can then rename the columns.
import pandas as pd
def targets():
    return (1014.0, 260, 176, 84)
df = pd.DataFrame(targets()).T.rename({0:'value 1', 1:'value 2', 2:'value 3', 3:'value 4'}, axis='columns')
print(df)
   value 1  value 2  value 3  value 4
0   1014.0    260.0    176.0     84.0
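As one of the other ways to do it, the tuple can be wrapped in a list so that pandas treats it as a single row, which avoids the transpose. This is a sketch of an alternative, not the approach used above; the column names are the same illustrative ones:

```python
import pandas as pd

def targets():
    return (1014.0, 260, 176, 84)

# A list containing one tuple is read as one row with four columns.
columns = ["value 1", "value 2", "value 3", "value 4"]
df = pd.DataFrame([targets()], columns=columns)
print(df.shape)  # (1, 4)
```

Because each column is built separately here, the integer values are not converted to floats the way they are in the transposed version.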
Return two data frames from a function with data frame format
How about this:
def test():
    df1 = pd.DataFrame([1,2,3], ['a','b','c'])
    df2 = pd.DataFrame([4,5,6], ['d','e','f'])
    return df1, df2
a, b = test()
display(a, b)
This prints out:
0
a 1
b 2
c 3
0
d 4
e 5
f 6
How to return iteratively updated data frame from function in R?
To somewhat expand on MrFlick's comments:
The issue here is that functions in R perform pass-by-value: df inside df_func is a copy of df (the data.frame with empty columns iter and x) passed to the function. This copy is never modified due to the usage of <<-. Instead, in each iteration of while,
df <<- rbind(df, new_df),
which is equivalent to
df <<- rbind(data.frame(iter=integer(), x=integer()), new_df),
modifies df in the global environment, resulting in
> df
iter x
1 10 100
after 10 iterations.
Return a dataframe from a function using the input variable as the name of the dataframe
I made two changes to your code:
1.) str_sub(list, 1, 4) -> str_sub(filenames, 1, 4)
list is a function and doesn't contain any content.
2.) return(data.frame(pattern)) -> return(df)
This returns the data.frame and not a string.
library(stringr)  # str_sub
library(dplyr)    # bind_rows, %>%
files_to_df <- function(pattern){
  # pattern <- "data"
  filenames <- list.files(recursive = TRUE, pattern = pattern)
  df_list <- lapply(filenames, read.csv, header = TRUE)
  # Name each dataframe with the run and filename
  names(df_list) <- str_sub(filenames, 1, 4)
  # Create combined dataframe
  df <- df_list %>%
    bind_rows(.id = 'run')
  # Assign dataframe to the name of the pattern (only inside the function)
  assign(pattern, df)
  # Return the dataframe
  return(df)
  #list2env(pattern,.GlobalEnv)
}