Reading Multiple Files and Calculating Mean Based on User Input

This is how I fixed it:

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # set the path
  path <- directory

  # get the list of files in that directory
  fileList <- list.files(path)

  # extract the file names and store them as numeric for comparison
  file.names <- as.numeric(sub("\\.csv$", "", fileList))

  # select the files to import based on the user input or the default
  selected.files <- fileList[match(id, file.names)]

  # import the data
  Data <- lapply(file.path(path, selected.files), read.csv)

  # combine into a single data frame
  Data <- do.call(rbind.data.frame, Data)

  # calculate the mean, ignoring missing values
  mean(Data[, pollutant], na.rm = TRUE)
}

One last question: my function should take "specdata" (the name of the directory where all the CSV files are located) as the directory argument. Is there a directory-type object in R?

Suppose I call the function as:

pollutantmean(specdata, "nitrate", 1:10)

It should resolve the path of the specdata directory, which is in my working directory. How can I do that?

Reading data from .txt file and calculating mean in Python

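You can build a list of values by splitting the file contents on whitespace (which also drops the empty string left by a trailing newline) and converting each value to float. After that, you can calculate the mean of that list using mean from the statistics module:

from statistics import mean

with open('inputdata.txt', 'r') as fin:
    # split() with no argument splits on any whitespace and skips blank lines
    data = [float(x) for x in fin.read().split()]

average = mean(data)
print(average)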
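
If there are several such files, the same idea extends to a loop. The following is only a minimal sketch, assuming every .txt file in a directory holds whitespace-separated numbers; the function name mean_per_file and the glob pattern are hypothetical and not part of the original answer.

```python
from pathlib import Path
from statistics import mean


def mean_per_file(directory, pattern="*.txt"):
    """Return a dict mapping each matching file name to the mean of its values.

    Assumption: each file contains only whitespace-separated numbers.
    Files with no values are skipped, because mean() raises on empty input.
    """
    means = {}
    for path in sorted(Path(directory).glob(pattern)):
        values = [float(x) for x in path.read_text().split()]
        if values:
            means[path.name] = mean(values)
    return means
```

Calling mean_per_file("some_directory") would then give one mean per file, keyed by file name, which is easy to turn into a table afterwards.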

How to loop through text files, find the average of each, and store it in a dataframe in R?

Select the numeric columns, unlist them into a vector, and calculate the mean.

library(dplyr)
library(purrr)
library(vroom)

# Filenames is assumed to be a character vector of file paths
mean_values <- map_dbl(Filenames, ~ vroom(.x) %>%
  select(where(is.numeric)) %>%
  unlist() %>%
  mean(na.rm = TRUE))

mean_values

Column means over multiple files

Using the data.table library:

library(data.table)

# read each file as a data.table; bonus: fread is much faster than read.csv
m <- lapply(Files, fread, header = TRUE, comment.char = "#")

# combine into one dataset
m2 <- rbindlist(m)

# calculate the mean of each column by id
m2[, lapply(.SD, mean), by = "id"]

Building a mean across several csv files

Based on your example, e.g. 16 files for 10:25, i.e. 010.csv, 011.csv, 012.csv, etc.
Assuming your naming convention matches the order of the files in the directory, you could try:

# [10:25] is hard-coded here; in production, use your function parameter instead
csvFiles <- list.files(pattern = "\\.csv$")[10:25]
df_list <- lapply(X = csvFiles, read.csv, header = TRUE)
# optional: name the list elements so the row names after rbind show the source file
names(df_list) <- csvFiles
df <- do.call("rbind", df_list)
mean(df[, "columnName"])

You should be able to adapt these snippets and incorporate them into your routine.

Bash: Finding average of entries from multiple columns after reading a CSV text file

This fixes the OP's attempt and adds logic to print the average of averages once the whole file has been read. I wrote it on mobile and couldn't test it, but it should work if I understood the OP's description correctly.

awk -F, '
$2 ~ /[24680]$/ {
  count++
  for (i = 3; i <= 7; i++) {
    sum += $i
  }
  tot += sum / 5
  sum = 0
}
END {
  print "Average of averages is: " (count ? tot / count : "NaN")
}
' user-list.txt > superuser.txt
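
Since the script above was not tested, one way to cross-check its result is to reproduce the same logic in another language. Here is a minimal Python sketch, assuming the input is comma-separated with no quoted fields and that fields 3 to 7 are numeric on the selected rows. The function name average_of_averages is hypothetical. Unlike awk, which silently treats missing or non-numeric fields as 0, this version skips rows with fewer than 7 fields and raises on non-numeric values.

```python
def average_of_averages(lines):
    """For rows whose 2nd field ends in an even digit, average fields 3-7,
    then return the average of those per-row averages (NaN if no row matches)."""
    total = 0.0
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # skip short rows and rows whose 2nd field does not end in 0, 2, 4, 6 or 8
        if len(fields) < 7 or not fields[1] or fields[1][-1] not in "02468":
            continue
        row = [float(x) for x in fields[2:7]]
        total += sum(row) / 5
        count += 1
    return total / count if count else float("nan")


# Usage (assumes user-list.txt exists in the working directory):
#   with open("user-list.txt") as fin:
#       print("Average of averages is:", average_of_averages(fin))
```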

