Return a Data Frame from Function

Returning a dataframe in python function

When you call create_df(), Python runs the function but does not store the result in any variable. That is why you got the error.

Assign the result of create_df() to a new variable df like this:

df = create_df()
df
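
As a minimal sketch of the difference (the body of create_df is not shown in the question, so the column below is only a placeholder):

```python
import pandas as pd

def create_df():
    # Hypothetical body: the original function is not shown
    return pd.DataFrame({"col": [1, 2, 3]})

create_df()        # the returned dataframe is discarded
df = create_df()   # the returned dataframe is bound to the name df
print(df.shape)
```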

Return a data frame from function

If I understand you correctly, you are trying to create a dataframe with the number of complete cases for each id. Supposing your files are named with the id numbers as you specified (e.g. f2.csv), you can simplify your function as follows:

myfunc <- function(directory, id = 1:332) {
  y <- vector()
  for(i in 1:length(id)){
    x <- id
    y <- c(y, sum(complete.cases(
      read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))))))
  }
  df <- data.frame(x, y)
  colnames(df) <- c("id","ret2")
  return(df)
}

You can call this function like this:

myfunc("name-of-your-directory",25:87)

An explanation of the above code. You have to break down your problem into steps:

1. You need a vector of the id's; that's done by x <- id.
2. For each id you want the number of complete cases. To get that, you have to read the file first, which is done by read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))). To get the number of complete cases for that file, you wrap the read.csv call inside sum and complete.cases.
3. Now you want to add that number to a vector. For that you need an empty vector (y <- vector()) to which you can add the number of complete cases from step 2. That's done by wrapping the code from step 2 inside y <- c(y, "code step 2"). With this you add the number of complete cases for each id to the vector y.
4. The final step is to combine these two vectors into a dataframe with df <- data.frame(x, y) and assign some meaningful colnames.

By including the steps 1, 2 and 3 (except the y <- vector() part) in a for-loop, you can iterate over the list of specified id's. Creating the empty vector with y <- vector() has to be done before the for-loop, so that the for-loop can add values to y.
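
For readers working in pandas rather than R, the same steps could be sketched roughly as follows. The function name count_complete_cases and the demo files are made up for illustration; the f<id>.csv naming is taken from the answer above.

```python
import os
import tempfile
import pandas as pd

def count_complete_cases(directory, ids):
    # Rough Python counterpart of myfunc; assumes files named f<id>.csv
    counts = []
    for i in ids:
        path = os.path.join(directory, f"f{i}.csv")
        # dropna() keeps only rows without missing values (the complete cases)
        counts.append(len(pd.read_csv(path).dropna()))
    return pd.DataFrame({"id": list(ids), "ret2": counts})

# Demo with two small throwaway files (made-up data)
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, None, 3], "b": [4, 5, 6]}).to_csv(
        os.path.join(tmp, "f1.csv"), index=False)
    pd.DataFrame({"a": [1, 2], "b": [None, None]}).to_csv(
        os.path.join(tmp, "f2.csv"), index=False)
    result = count_complete_cases(tmp, [1, 2])

print(result)
```

As in the R version, the counts are collected first and the dataframe is built once at the end, then returned to the caller.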

Python - return dataframe and list from function

You can simply return both objects from the function and unpack them at the call site:

import pandas as pd
# sample data
data_dict = {'unit':['a','b','c','d'],'salary':[100,200,250,300]}
# create data frame
df = pd.DataFrame(data_dict)
# Function that returns a dataframe
def test_func(df):
    # create list meeting some criteria
    mylist = list(df.loc[(df['salary']<=200),'unit'])
    # create new dataframe based on mylist
    new_df = df[df['unit'].isin(mylist)]
    return new_df, mylist

# create dataframe and list from function
new_df, mylist = test_func(df)

Apply a function returning a data frame to each row in a data frame

You haven't shown what you have in f but based on comments it is written for dataframes, so this should work :

lapply(split(d, seq_len(nrow(d))), f)

split divides d into a list of one-row data frames, and lapply applies the function f to each of them, returning a list of results.

You can also use by :

by(d, seq_len(nrow(d)), f)
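
The same idea could be sketched in pandas as follows. The function f and the columns a and b are hypothetical stand-ins, since the original f is not shown.

```python
import pandas as pd

def f(row_df):
    # Hypothetical function: takes a one-row dataframe, returns a dataframe
    return row_df.assign(total=row_df["a"] + row_df["b"])

d = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# d.iloc[[i]] (note the double brackets) selects row i as a one-row dataframe
pieces = [f(d.iloc[[i]]) for i in range(len(d))]

# Stack the per-row results back into a single dataframe
result = pd.concat(pieces)
print(result)
```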

How do I properly call a function and return an updated dataframe?

You could use groupby with apply to get dataframe from apply call, like this:

import pandas as pd

# add new column B for groupby - we need single group only to do the trick
df = pd.DataFrame(
    {'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # create empty dataframe to be returned
    comb=pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # copy the group's column A into the new dataframe
    # (Series.append was removed in pandas 2.0, so assign the series directly)
    comb['Newfield'] = data['A'].reset_index(drop=True)
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group by column B (a single group) so that apply passes the whole dataframe to get_item and returns the dataframe it builds
newdf = df.groupby('B').apply(get_item)
# after processing the dataframe newdf contains MultiIndex - simply remove the 0-level (index col B with value 1 gained from groupby operation)
newdf.droplevel(0)

Output:

    Newfield    AnotherNewfield
0   adam        y
1   ed          y
2   dra         y
3   dave        y
4   sed         y
5   mike        y

python: having trouble returning a pandas data frame from a user defined function (probably user error)

Inside testfunction, the variable new_df_to_output is essentially a label that you are assigning to the passed in object.

testfunction('Desired_DF_name') doesn't do what you think; it is assigning the value of the string 'Desired_DF_name' to the variable new_df_to_output; it is not creating a new variable named Desired_DF_name. Basically it's the same as writing new_df_to_output = 'Desired_DF_name'.

You want to save the DataFrame that is returned from the function into a variable. So instead of

testfunction('Desired_DF_name')

you want

def testfunction():
    ...
Desired_DF_name = testfunction()

(You can change the definition of testfunction to remove the new_df_to_output parameter. The function wasn't doing anything with it anyway because you immediately reassign the variable: new_df_to_output = pd.DataFrame().)
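
A minimal sketch of the corrected pattern, assuming a placeholder body for testfunction (the original body is not shown in full):

```python
import pandas as pd

def testfunction():
    # Hypothetical body: build a dataframe and return it
    new_df_to_output = pd.DataFrame({"value": [1, 2, 3]})
    return new_df_to_output

# The name on the left-hand side is chosen by the caller, not by the function
Desired_DF_name = testfunction()
print(Desired_DF_name)
```

The local name new_df_to_output exists only while the function runs; what survives is the object that is returned and bound to Desired_DF_name.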

Converting returned values from a function into a data frame

There are a lot of different ways to do it, but with a single tuple entered as the data param for DataFrame we get 4 rows. So we can use .T to transpose the data and get four columns and one row. We can then rename the columns.

import pandas as pd

def targets():
    return (1014.0, 260, 176, 84)

df = pd.DataFrame(targets()).T.rename({0:'value 1', 1:'value 2', 2:'value 3', 3:'value 4'}, axis='columns')

print(df)

    value 1  value 2  value 3   value 4
0   1014.0   260.0    176.0     84.0
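
One of the other ways, sketched here as an alternative rather than a replacement, is to wrap the tuple in a list so that it is read as a single row. That avoids the transpose, and the integer columns should keep their integer type instead of being converted to float.

```python
import pandas as pd

def targets():
    return (1014.0, 260, 176, 84)

# Wrapping the tuple in a list makes it one row, so no transpose is needed
df = pd.DataFrame([targets()],
                  columns=['value 1', 'value 2', 'value 3', 'value 4'])
print(df)
```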

Return two data frames from a function with data frame format

How about this:

import pandas as pd

# display() is available in Jupyter/IPython; use print() in a plain script
def test():
    df1 = pd.DataFrame([1,2,3], ['a','b','c'])
    df2 = pd.DataFrame([4,5,6], ['d','e','f'])
    return df1, df2

a, b = test()
display(a, b)

This displays the two data frames:

   0
a  1
b  2
c  3

   0
d  4
e  5
f  6

How to return iteratively updated data frame from function in R?

To somewhat expand on MrFlick's comments:

The issue here is that functions in R perform pass-by-value: df inside df_func is a copy of df (the data.frame with empty columns iter and x) passed to the function. This copy is never modified due to the usage of <<-. Instead, in each iteration of while

df <<- rbind(df, new_df),

which is equivalent to

df <<- rbind(data.frame(iter=integer(), x=integer()), new_df),

modifies df in the global environment, resulting in

> df
  iter   x
1   10 100

after 10 iterations.
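
The answer above is about R. As a loosely analogous sketch in pandas (the function name add_rows and the x = 10 * iter values are made up for illustration), the same advice applies: build the updated data frame inside the function, return it, and assign it at the call site instead of relying on a global variable.

```python
import pandas as pd

def add_rows(df, n):
    # Collect the new rows first, then build the result once
    new_rows = pd.DataFrame({"iter": range(1, n + 1),
                             "x": [i * 10 for i in range(1, n + 1)]})
    # pd.concat returns a new dataframe; the caller's object is not modified
    return pd.concat([df, new_rows], ignore_index=True)

# Empty dataframe with the columns iter and x, as in the question
df = pd.DataFrame({"iter": pd.Series(dtype="int64"),
                   "x": pd.Series(dtype="int64")})

updated = add_rows(df, 10)
print(len(df), len(updated))
```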

Return a dataframe from a function using the input variable as the name of the dataframe

I made two changes to your code:

1.) str_sub(list, 1, 4) -> str_sub(filenames, 1, 4)
list is a function and doesn't contain any of your data.

2.) return(data.frame(pattern)) -> return(df)
This returns the data.frame and not a string.

library(stringr)  # str_sub
library(dplyr)    # %>% and bind_rows

files_to_df <- function(pattern){

  # pattern <- "data"
  filenames <- list.files(recursive = TRUE, pattern = pattern) 

  df_list <- lapply(filenames, read.csv, header = TRUE)

  # Name each dataframe with the run and filename
  names(df_list) <- str_sub(filenames, 1, 4)

  # Create combined dataframe  
  df <- df_list %>%
    bind_rows(.id = 'run')

  # Assign dataframe to the name of the pattern
  # (note: without an envir argument, assign() creates the variable only inside this function)
  assign(pattern, df)

  # Return the dataframe
  return(df)
  #list2env(pattern,.GlobalEnv)
}