Merge Many Data Frames from CSV Files, When Id Column Is Implied

Merge many data frames from csv files, when ID column is implied?

My understanding is that you need to extract the ID from the filename, and then merge the imported csv with the existing dataframe.

df1 <- read.csv(textConnection("ID, var1, var2
A, 2, 2
B, 4, 5"))

# assuming the imported csv-files are in working directory
filenames <- list.files(getwd(), pattern = "ID_[A-Z].csv")

# extract ID from filename
ids <- gsub("ID_([A-Z]).csv", "\\1", filenames)

# import csv-files and append ID
library(plyr)
import <- mdply(filenames, read.csv)
import$ID <- ids[import$Var1]
import$Var1 <- NULL

# merge imported csv-files and the existing dataframe
merge(df1, import)

Result:

ID var1 var2 time var4 var5
1 B 4 5 0 1 2
2 B 4 5 1 4 5
3 B 4 5 2 1 6

Pick up one column of varying lengths from many CSV files and merge to one matrix in R

To save a LITTLE memory/time you could modify the solution from @Ben Bolker like this:

datlist <- lapply(csvnames,read.csv, colClasses=c("character", "NULL"))
rowseq <- seq_len( max(vapply(datlist,nrow, integer(1))) )
keylist <- lapply(datlist,function(x) { x[[1]][rowseq] ) })
names(keylist) <- paste(KEYTERMS,csvnames,sep="_")
#do.call(cbind,keylist)
do.call(data.frame,keylist)

...I just changed so that only the first column is read, and simplified the NA padding by observing that selecting a sequence that extends outside a character vector pads with NA automatically...

If you kept the old way of padding, you should at least pad with NA_character_ instead of NA to avoid unnecessary coercion.

I also index the KEYTERMS column by number instead of name (since there should be only one). I also changed sapply to vapply because I like it better :) - it actually is faster too.

Finally you said you wanted a data.frame. The last line produces that instead of a matrix.

pandas three-way joining multiple dataframes on columns

Zero's answer is basically a reduce operation. If I had more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):

dfs = [df0, df1, df2, ..., dfN]

Assuming they have a common column, like name in your example, I'd do the following:

import functools as ft
df_final = ft.reduce(lambda left, right: pd.merge(left, right, on='name'), dfs)

That way, your code should work with whatever number of dataframes you want to merge.

Merge several data frames on two common columns

You could try this. Untested since your example is not reproducible. Give us some dummy data for z.sfff.oi, z.sff.oig and z.sff.ret if you want a better answer. You can use dput() to generate code for a reproducible dataset.

A <- data.frame(Group = "oi", date = as.factor(index(z.ssf.oi),) as.data.frame(z.ssf.oi)))
B <- data.frame(Group = "oig", date = as.factor(index(z.ssf.oig)), as.data.frame(z.ssf.oig)))
C <- data.frame(Group = "ret", date = as.factor(index(z.ssf.ret)), as.data.frame(z.ssf.ret)))
Long <- melt(rbind(A, B, C), id.vars = c("Group", "date")))
cast(date ~ Group, data = Long)

Merging CSV files with partly same filename

You could simply iterate a static list of property names in a FOR loop. Note that multiple word names should be enclosed within quotes:

@echo off
for %%A in (
level
pressure
volume
"multiword name"
etc.
) do copy /b %%A??.csv %%a-day.csv /y

Or you could dynamically discover the names of the properties.

Assuming every property always has a time 00 file:

@echo off
setlocal enableDelayedExpansion
for %%A in (*00.csv) do (
set "prop=%%~nA"
set "prop=!prop:~0,-2!"
copy /b "!prop!??.csv" "!prop!-day.csv"
)

If time 00 might be missing sometimes:

@echo off
setlocal enableDelayedExpansion

:: Delete any existing -day files
delete *-day??.csv 2>nul

for /f "eol=: delims=" %%A in (
'dir /b /a-d *.csv^|findstr "[^0-9][0-9][0-9]\.csv$"'
) do (
set "prop=%%~nA"
set "prop=!prop:~0,-2!"
if not exist "!prop!-day.csv" copy /b "!prop!??.csv" "!prop!-day.csv"
)

Concatenate two dataframes and replacing NA values in R and transform the result in a csv file

You may combine the two datasets using bind_rows and sort the columns putting NA's at the last.

library(dplyr)

bind_rows(x, y) %>%
mutate(ID = row_number(),
across(c(S1, S2), sort, na.last = TRUE))

# ID S1 S2
#1 1 10 21
#2 2 11 22
#3 3 12 23
#4 4 13 24
#5 5 14 25
#6 6 NA 26
#7 7 NA 27

how to perform outer merge on multiple DataFrames using pandas in python

Try this code.

filelist = glob.glob('*.csv')
merged_df = pd.read_csv(filelist[0])

for filename in filelist:
df = pd.read_csv(filename)
merged_df = pd.merge(merged_df, df, on = ['gene_id'], how = 'outer')


Related Topics



Leave a reply



Submit