Adding Columns to Dataframe Based on File Name in Python

Python Pandas add Filename Column CSV

This should work:

import os
import pandas as pd

data = []  # collect one DataFrame per file
for csv in globbed_files:  # globbed_files is your list of CSV paths
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

frame['filename'] creates a new column named filename, and os.path.basename() turns a path like /a/d/c.txt into just the filename c.txt.
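If the goal is a single table rather than a list of frames, the collected DataFrames can be concatenated afterwards. A minimal follow-up sketch, assuming data is the list built in the loop above:

import pandas as pd

# Stack the per-file DataFrames; every row keeps the filename it came from
combined = pd.concat(data, ignore_index=True)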

Adding columns to dataframe based on file name in Python

Iterate over your files with glob and do some simple splitting on the filenames.

import glob
import pandas as pd

df_list = []
for file in glob.glob('C:/file1_*_*_*.txt'):
    # Tweak this to work for your actual filepaths, if needed.
    country, typ, dur = file.split('.')[0].split('_')[1:]
    df = (pd.read_csv(file)
          .assign(Country=country, Type=typ, duration=dur))
    df_list.append(df)

df = pd.concat(df_list)
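To make the splitting concrete, here is what the expression produces for a hypothetical file name (the exact pattern is an assumption for illustration only):

# Hypothetical path, for illustration only
file = 'C:/file1_Spain_sales_30.txt'

stem = file.split('.')[0]         # 'C:/file1_Spain_sales_30'
country, typ, dur = stem.split('_')[1:]
print(country, typ, dur)          # Spain sales 30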

Adding file name in a column in pandas

Assuming that the 6th column's name is Name File, and that the file lives at csv = '/home/User/Documents/file.csv' or csv = 'file.csv', one can do this using the os.path module.

import os.path

df['Name File'] = os.path.basename(csv)

One might also do as @tdy suggests: assign the name of the file to a variable

filename = 'chr1.step1.csv'

Then, assuming the df already exists (otherwise one needs to read it first, with something like df = pd.read_csv(filename, sep='\t', header=None)), assign the file name to the cells of a new column:

df['Name File'] = filename

Extra: If one has a directory with lots of csv files

import pandas as pd
import glob
import os.path

# Create a list of all CSV files
files = glob.glob("*.csv")

# Create an empty list to collect the DataFrames
filenames = []

for csv in files:
    df = pd.read_csv(csv)
    df['Name File'] = os.path.basename(csv)
    filenames.append(df)
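If one would rather store the file name without its extension, os.path.splitext can be combined with basename; a small variation on the snippet above, shown with a hypothetical path:

import os.path

csv = 'data/sample_2021.csv'  # hypothetical path, for illustration
name_only = os.path.splitext(os.path.basename(csv))[0]
print(name_only)              # sample_2021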

Add column to pandas dataframe with partial file name while importing many files

This is one way to do it:

from pathlib import PureWindowsPath

import pandas as pd

def fn_helper(fn):
    df = pd.read_csv(fn, sep='\t')
    p = PureWindowsPath(fn)
    part = p.name.split('.')[0]  # file name without its extension
    df['col3'] = part
    return df

df_from_each_file = (fn_helper(f) for f in all_files)
...

Or as other people are showing with one-liners:

(pd.read_csv(f, sep='\t').assign(col3=PureWindowsPath(f).name.split('.')[0]) for f in all_files)
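Either way, the generator is typically materialized with pd.concat. A minimal sketch, assuming all_files is a list of tab-separated file paths as in the question:

import pandas as pd
from pathlib import PureWindowsPath

concatenated_df = pd.concat(
    (pd.read_csv(f, sep='\t').assign(col3=PureWindowsPath(f).name.split('.')[0])
     for f in all_files),
    ignore_index=True,
)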

Adding dataframe column names based on filename after merging using Glob

You could unpack the list comprehension into a for-loop and add an additional column to each data file, something like this:

import glob
import os
import pandas as pd

os.chdir("Countries/")
extension = 'xlsx'

all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

file_list = []
for f in all_filenames:
    data = pd.read_excel(f, sheet_name='Dataset2')
    data['source_file'] = f  # create a column with the name of the file
    file_list.append(data)

combined = pd.concat(file_list, ignore_index=True)

combined.to_excel("New/combined.xlsx", index=False)

How to add filename as column to every file in a directory python

Use df.to_csv, like this:

for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the csv

btw, I think this may also work and is more readable:

for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
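For clarity, the slicing in both loops just keeps the first 8 characters of the file name's stem; a tiny sketch with a hypothetical file name:

import os

fp = '20210131_sales.csv'                  # hypothetical file name
stem = os.path.basename(fp).split('.')[0]  # '20210131_sales'
print(stem[:8])                            # '20210131'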

