Return a Data Frame from Function

Returning a dataframe in python function

When you call create_df(), Python runs the function but does not store the returned dataframe in any variable. That is why you got the error.

Assign the result of create_df() to a new variable df like this:

df = create_df()
df
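The original create_df() is not shown, so the following is only a minimal sketch with a hypothetical function body. It illustrates the difference between calling the function and binding its return value to a name:

```python
import pandas as pd

def create_df():
    # Hypothetical body: the real create_df() from the question is not shown.
    return pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})

create_df()        # the returned frame is discarded; this line binds no name
df = create_df()   # the returned frame is now bound to the name df
print(df.shape)
```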

Return a data frame from function

If I understand you correctly, you are trying to create a dataframe with the number of complete cases for each id. Supposing your files are named with the id numbers as you specified (e.g. f2.csv), you can simplify your function as follows:

myfunc <- function(directory, id = 1:332) {
  y <- vector()
  for(i in 1:length(id)){
    x <- id
    y <- c(y, sum(complete.cases(
      read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))))))
  }
  df <- data.frame(x, y)
  colnames(df) <- c("id","ret2")
  return(df)
}

You can call this function like this:

myfunc("name-of-your-directory",25:87)

To explain the code above, break the problem down into steps:

  1. You need a vector of the ids; that's done by x <- id
  2. For each id you want the number of complete cases. To get that, you have to read the file first. That's done by read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))). To get the number of complete cases for that file, wrap the read.csv call inside sum and complete.cases.
  3. Now you want to add that number to a vector. Therefore you need an empty vector (y <- vector()) to which you can add the number of complete cases from step 2. That's done by wrapping the code from step 2 inside y <- c(y, "code step 2"). This adds the number of complete cases for each id to the vector y.
  4. The final step is to combine these two vectors into a dataframe with df <- data.frame(x, y) and assign some meaningful colnames.

By including steps 1, 2 and 3 (except the y <- vector() part) in a for-loop, you can iterate over the specified ids. The empty vector has to be created with y <- vector() before the for-loop, so that the loop can add values to y.
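For readers working in pandas rather than R, the same steps can be sketched roughly as follows. This is an illustrative translation, not the answer's own code: the function name complete_cases is hypothetical, and it assumes the same f<id>.csv naming inside the directory.

```python
from pathlib import Path

import pandas as pd

def complete_cases(directory, ids):
    # Assumes files are named f<id>.csv inside `directory`, as in the R answer.
    counts = []
    for i in ids:
        frame = pd.read_csv(Path(directory) / f"f{i}.csv")
        # A row is a complete case when none of its values is missing.
        counts.append(int(frame.notna().all(axis=1).sum()))
    # Combine the ids and the counts into one dataframe and return it.
    return pd.DataFrame({"id": list(ids), "ret2": counts})
```

Calling complete_cases("name-of-your-directory", range(25, 88)) would then mirror the R call shown earlier, provided those files exist.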

Python - return dataframe and list from function

You can simply return both objects from the function and unpack them at the call site:

import pandas as pd

# sample data
data_dict = {'unit': ['a', 'b', 'c', 'd'], 'salary': [100, 200, 250, 300]}
# create data frame
df = pd.DataFrame(data_dict)

# function that returns a dataframe and a list
def test_func(df):
    # create list meeting some criteria
    mylist = list(df.loc[(df['salary'] <= 200), 'unit'])
    # create new dataframe based on mylist
    new_df = df[df['unit'].isin(mylist)]
    return new_df, mylist

# create dataframe and list from function
new_df, mylist = test_func(df)

Apply a function returning a data frame to each row in a data frame

You haven't shown what f contains, but based on the comments it is written for dataframes, so this should work:

lapply(split(d, seq_len(nrow(d))), f)

split divides d into a list of one-row dataframes, and lapply then applies the function f to each of them.

You can also use by:

by(d, seq_len(nrow(d)), f)
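The same split-then-apply idea can be sketched in pandas for comparison. The function f below is a hypothetical stand-in (the original f is not shown); it is assumed to take a one-row DataFrame and return a DataFrame.

```python
import pandas as pd

d = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def f(row_df):
    # Hypothetical f: takes a one-row DataFrame and returns a DataFrame.
    return row_df.assign(total=row_df["a"] + row_df["b"])

# d.iloc[[i]] (note the double brackets) keeps each row as a one-row DataFrame
# instead of collapsing it to a Series.
pieces = [f(d.iloc[[i]]) for i in range(len(d))]

# Stack the per-row results back into a single DataFrame.
result = pd.concat(pieces)
```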

How do I properly call a function and return an updated dataframe?

You could use groupby with apply to get a dataframe back from the apply call, like this:

import pandas as pd

# add new column B for groupby - we need a single group only to do the trick
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # create empty dataframe to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # copy the series data (or any data) into the dataframe's column;
    # Series.append was removed in pandas 2.0, so assign the re-indexed series instead
    comb['Newfield'] = data['A'].reset_index(drop=True)
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group on column B so that apply passes the whole frame to get_item as one group
newdf = df.groupby('B').apply(get_item)
# the result has a MultiIndex; drop level 0 (the B value 1 added by groupby)
newdf = newdf.droplevel(0)
newdf

Output:

  Newfield AnotherNewfield
0     adam               y
1       ed               y
2      dra               y
3     dave               y
4      sed               y
5     mike               y
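If the goal is only to build and return a new frame from an existing column, the groupby step may not be needed. A minimal sketch under that assumption, reusing the names from the answer, builds the result directly and assigns the return value at the call site:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["adam", "ed", "dra", "dave", "sed", "mike"]},
    index=["a", "b", "c", "d", "e", "f"],
)

def get_item(data):
    # Build the result frame directly; the scalar "y" is broadcast to every row.
    return pd.DataFrame({
        "Newfield": data["A"].to_list(),
        "AnotherNewfield": "y",
    })

# The function returns a new frame, so the caller has to keep the result.
newdf = get_item(df)
```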

python: having trouble returning a pandas data frame from a user defined function (probably user error)

Inside testfunction, the variable new_df_to_output is essentially a label that you are assigning to the passed in object.

testfunction('Desired_DF_name') doesn't do what you think; it is assigning the value of the string 'Desired_DF_name' to the variable new_df_to_output; it is not creating a new variable named Desired_DF_name. Basically it's the same as writing new_df_to_output = 'Desired_DF_name'.

You want to save the DataFrame that is returned from the function into a variable. So instead of

testfunction('Desired_DF_name')

you want

def testfunction():
    ...
Desired_DF_name = testfunction()

(You can change the definition of testfunction to remove the new_df_to_output parameter. The function wasn't doing anything with it anyway because you immediately reassign the variable: new_df_to_output = pd.DataFrame().)
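A small sketch of the distinction, with a hypothetical function body since the original is not shown. A string is just a value; only an assignment creates a variable. If the name really has to be chosen at run time, one common option (an assumption here, not part of the original answer) is to key a dictionary by that string:

```python
import pandas as pd

def testfunction():
    # Hypothetical body: the original function's contents are not shown.
    return pd.DataFrame({"x": [1, 2, 3]})

name = "Desired_DF_name"          # just a string value, not a variable name
Desired_DF_name = testfunction()  # this assignment is what creates the variable

# Optional pattern when names are only known at run time: store frames in a dict.
frames = {name: testfunction()}
print(type(name).__name__, type(frames[name]).__name__)
```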

Converting returned values from a function into a data frame

There are many ways to do this. Passing a single tuple as the data argument to DataFrame gives one column with four rows, so we can use .T to transpose it into one row with four columns and then rename the columns.

import pandas as pd

def targets():
    return (1014.0, 260, 176, 84)

df = pd.DataFrame(targets()).T.rename({0:'value 1', 1:'value 2', 2:'value 3', 3:'value 4'}, axis='columns')

print(df)

   value 1  value 2  value 3  value 4
0   1014.0    260.0    176.0     84.0
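As one of the other possible routes, the tuple can be wrapped in a list so that it is treated as a single row, with the column names passed directly. This is a sketch, not the original answer's method. One difference worth noting is that each column keeps its own dtype, so the integer values are not converted to floats as they are after the transpose.

```python
import pandas as pd

def targets():
    return (1014.0, 260, 176, 84)

columns = ["value 1", "value 2", "value 3", "value 4"]

# Wrapping the tuple in a list makes it one row instead of one column.
df = pd.DataFrame([targets()], columns=columns)
print(df.dtypes)
```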

Return two data frames from a function with data frame format

How about this:

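import pandas as pd

def test():
    df1 = pd.DataFrame([1,2,3], ['a','b','c'])
    df2 = pd.DataFrame([4,5,6], ['d','e','f'])
    return df1, df2

a, b = test()
display(a, b)  # display is available in Jupyter/IPython; use print(a) and print(b) in a plain script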

This prints out:

   0
a  1
b  2
c  3

   0
d  4
e  5
f  6

How to return iteratively updated data frame from function in R?

To somewhat expand on MrFlick's comments:

The issue here is that functions in R perform pass-by-value: df inside df_func is a copy of df (the data.frame with empty columns iter and x) passed to the function. This copy is never modified due to the usage of <<-. Instead, in each iteration of while

df <<- rbind(df, new_df),

which is equivalent to

df <<- rbind(data.frame(iter=integer(), x=integer()), new_df),

modifies df in the global environment, resulting in

> df
  iter   x
1   10 100

after 10 iterations.

Return a dataframe from a function using the input variable as the name of the dataframe

I made two changes to your code:

1.) str_sub(list, 1, 4) -> str_sub(filenames, 1, 4)
list is a function and doesn't contain your file names.

2.) return(data.frame(pattern)) -> return(data.frame(df))
This returns the data.frame and not a string.

library(stringr)  # str_sub
library(dplyr)    # %>% and bind_rows

files_to_df <- function(pattern){

  # pattern <- "data"
  filenames <- list.files(recursive = TRUE, pattern = pattern)

  df_list <- lapply(filenames, read.csv, header = TRUE)

  # Name each dataframe with the run and filename
  names(df_list) <- str_sub(filenames, 1, 4)

  # Create combined dataframe
  df <- df_list %>%
    bind_rows(.id = 'run')

  # Assign dataframe to the name of the pattern
  # (note: this only creates a variable local to the function)
  assign(pattern, df)

  # Return the dataframe
  return(data.frame(df))
  #list2env(pattern,.GlobalEnv)
}

