How to Merge 200 CSV Files in Python

how to merge 200 csv files in Python

As ghostdog74 said, but this time with headers:

with open("out.csv", "ab") as fout:
# first file:
with open("sh1.csv", "rb") as f:
fout.writelines(f)
# now the rest:
for num in range(2, 201):
with open("sh"+str(num)+".csv", "rb") as f:
next(f) # skip the header, portably
fout.writelines(f)
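This assumes every sheet shares an identical header and the files are literally named sh1.csv through sh200.csv. If the files are very large, a variant that streams the remaining bytes with shutil.copyfileobj avoids line-by-line iteration; a minimal sketch under the same naming assumption:

import shutil

with open("out.csv", "wb") as fout:
    for num in range(1, 201):
        with open("sh" + str(num) + ".csv", "rb") as f:
            if num > 1:
                next(f)  # skip the header of every file after the first
            shutil.copyfileobj(f, fout)  # stream the rest of the file in chunks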

Python script to merge more than 200 very large CSV files into just one

You probably just need to keep a merged.csv file open whilst reading in each of the certificates.csv files. glob.glob() can be used to recursively find all suitable files:

import glob
import csv
import os

path = r'C:\path\to\folder\where\all\files\are-allocated-in-subfolders'
os.chdir(path)

with open('merged.csv', 'w', newline='') as f_merged:
    csv_merged = csv.writer(f_merged)

    for filename in glob.glob(os.path.join(path, '**/certificates.csv'), recursive=True):
        print(filename)

        try:
            with open(filename, newline='') as f_csv:
                csv_merged.writerows(csv.reader(f_csv))
        except Exception:
            print('problem with file:', filename)

Note that recursive=True only takes effect when the pattern contains a ** wildcard, hence the **/certificates.csv pattern above. An r prefix can be added to your path to avoid needing to escape each backslash, and newline='' should be passed to open() when using a csv.writer() to stop extra blank lines being written to your output file.
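The loop above copies every row verbatim, so if each certificates.csv starts with its own header row, those headers are repeated in merged.csv. A minimal variant that keeps only the first header, assuming every file shares an identical one (reusing the imports and path from above):

with open('merged.csv', 'w', newline='') as f_merged:
    csv_merged = csv.writer(f_merged)
    header_written = False

    for filename in glob.glob(os.path.join(path, '**/certificates.csv'), recursive=True):
        with open(filename, newline='') as f_csv:
            reader = csv.reader(f_csv)
            header = next(reader, None)  # take the first row as the header
            if header and not header_written:
                csv_merged.writerow(header)
                header_written = True
            csv_merged.writerows(reader)  # remaining data rows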

Loop to merge multiple csv files

You were almost there. First of all, you are not actually renaming anything because you missed the file = in front of the rename: DataFrame.rename returns a new dataframe by default rather than modifying the existing one.

Then, to add a column to a dataframe, you simply do df[col]=file[col].

Therefore:

import pandas as pd

df = pd.DataFrame()
for i, f in enumerate(files):
    file = pd.read_csv(f)
    file = file.rename(columns={'Damage': '{}sec'.format(i)})
    df['{}sec'.format(i)] = file['{}sec'.format(i)]

Don't forget to add the id column once before iterating.
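Concretely, assuming the shared column is literally named 'id' (an invented name for illustration) and that taking it from the first file is acceptable, that line would go just before the loop:

df['id'] = pd.read_csv(files[0])['id']  # populate the id column once, from the first file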

Reading multiple CSV files and merge Python Pandas

You can use pd.concat and a list comprehension:

df = pd.concat([pd.read_csv(csv_name, sep=';', header=None) for csv_name in csv_names])
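Here csv_names is assumed to be an iterable of file paths; a minimal way to build it, with 'data' standing in for your folder name:

import glob

csv_names = sorted(glob.glob('data/*.csv'))

Passing ignore_index=True to pd.concat gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.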

How to merge multiple text files into one csv file in Python

The problem is that your os.listdir gives you the list of filenames inside dirpath, not the full paths to those files. You can get the full path by prepending dirpath to each filename with the os.path.join function.

import os
import pandas as pd

dirpath = r'C:\Files\Code\Analysis\Input\qobs_RR1'
output = r'C:\Files\Code\Analysis\output\qobs_CSV.csv'
csvout_lst = []
files = [os.path.join(dirpath, fname) for fname in os.listdir(dirpath)]

for filename in sorted(files):
    data = pd.read_csv(filename, sep=':', index_col=0, header=None)
    csvout_lst.append(data)

pd.concat(csvout_lst).to_csv(output)

Edit: this can be done with a one-liner:

pd.concat(
    pd.read_csv(os.path.join(dirpath, fname), sep=':', index_col=0, header=None)
    for fname in sorted(os.listdir(dirpath))
).to_csv(output)

Edit 2: updated the answer, so the list of files is sorted alphabetically.
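One caveat with alphabetical sorting: it places e.g. qobs_10 before qobs_2. If the filenames carry numbers that should determine the order, a numeric sort key helps; a small sketch, assuming the first run of digits in each name is the relevant number:

import re

def numeric_key(fname):
    # sort by the first run of digits in the filename; names without digits sort first
    match = re.search(r'\d+', fname)
    return int(match.group()) if match else -1

files = sorted(files, key=numeric_key)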

Python csv merge multiple files with different columns

As already explained in the answer to your original question, you can easily extend the columns in Awk if you know how many to expect.

awk -F ',' -v cols=5 'BEGIN { OFS=FS }
FNR == 1 && NR > 1 { next }
NF<cols { for (i=NF+1; i<=cols; ++i) $i = "" }
1' *.csv >file.csv

I slightly refactored this to skip the unwanted lines with next rather than vice versa, which simplifies the rest of the script. I also added the missing comma separator.

You can easily print the number of columns in each file, and just note the maximum:

awk -F , 'FNR==1 { print NF, FILENAME }' *.csv

If you don't know how many fields there are going to be in files you do not yet have, or if you need to cope with complex CSV with quoted fields, maybe switch to Python for this. It's not too hard to do the field number sniffing in Awk, but coping with quoting is tricky.

import csv
import sys

# Sniff just the first line from every file
fields = 0
for filename in sys.argv[1:]:
    with open(filename, newline='') as raw:
        for row in csv.reader(raw):
            # If this header is longer than the current maximum, remember it
            if len(row) > fields:
                fields = len(row)
                titles = row
            # Break after the first line, skip to the next file
            break

# Now do the proper reading
writer = csv.writer(sys.stdout)
writer.writerow(titles)

for filename in sys.argv[1:]:
    with open(filename, newline='') as raw:
        for idx, row in enumerate(csv.reader(raw)):
            if idx == 0:
                continue  # skip each file's header row
            row.extend([''] * (fields - len(row)))  # pad short rows to the full width
            writer.writerow(row)

This simply assumes that the additional fields go at the end. If the files could have extra columns between other columns, or columns in a different order, you need a more complex solution (though not by much; the csv module's DictReader class can do most of the heavy lifting, as sketched below).

Demo: https://ideone.com/S998l4
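For instance, a minimal sketch of that DictReader approach, assuming the union of all header rows should describe the output and that new columns simply go wherever they first appear:

import csv
import sys

# First pass: collect the union of all column names, in order of first appearance
fieldnames = []
for filename in sys.argv[1:]:
    with open(filename, newline='') as raw:
        for name in csv.DictReader(raw).fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)

# Second pass: DictWriter fills any missing columns with the restval default
writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames, restval='')
writer.writeheader()
for filename in sys.argv[1:]:
    with open(filename, newline='') as raw:
        for row in csv.DictReader(raw):
            writer.writerow(row)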

If you wanted to do the same type of sniffing in Awk, you basically have to specify the names of the input files twice, or do some nontrivial processing in the BEGIN block to read all the files before starting the main script.


