Extract File Name from Read_Csv - Python

There are many ways to do it. One is to keep everything before the first dot:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename.split(".")[0])

Another is to slice off the four-character ".csv" extension:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename[:-4])

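There are many more.

As @barmar pointed out, it is better to pass the full path, os.path.join(path, filename), to pd.read_csv(): os.listdir() returns bare file names, so reading them directly only works when the script runs from the folder that holds the files.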
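
A minimal sketch that combines the points above, assuming pathlib is acceptable in place of os.listdir(). The helper name list_csv_files is hypothetical. Path.stem removes only the final extension, so a name with extra dots is not cut short the way split(".")[0] would cut it, and every entry already carries its folder.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return (full_path, stem) pairs for every .csv file in folder, sorted by name."""
    pairs = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file() and entry.suffix == ".csv":
            # entry.stem drops only the last extension: "b.v2.csv" -> "b.v2"
            pairs.append((str(entry), entry.stem))
    return pairs
```

Each returned full path can be passed straight to pd.read_csv(full_path, sep="|"), and the stem can be stored alongside the resulting table.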

How to display the csv file name which is read by pandas read_csv() function?

pandas.read_csv() accepts a path or a file object (actually any file-like object with a read() method).

A file object returned by open() has a name attribute that holds the name of the opened file.

I see this code and situation as absolutely meaningless since you already know the file name beforehand, but for the sake of completeness, here you go:

import pandas as pd

# the with block closes the file once pandas has read it
with open("your_csv_filename.csv") as csv_file:
    print(csv_file.name)
    df = pd.read_csv(csv_file)
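
If the same code has to cope with both paths and open file objects, a small helper can recover the name in either case. This is only a sketch: source_name is a hypothetical helper, not part of pandas, and it returns None for in-memory buffers that have no name.

```python
import os


def source_name(source):
    """Best-effort file name for something that could be passed to pd.read_csv()."""
    if isinstance(source, (str, os.PathLike)):
        # a plain path: keep only the last component
        return os.path.basename(os.fspath(source))
    # file objects from open() expose .name; io.StringIO and similar do not
    name = getattr(source, "name", None)
    return os.path.basename(name) if isinstance(name, str) else None
```

For example, source_name("data/x.csv") gives "x.csv", while an io.StringIO buffer gives None.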

extract characters from filenames and add those as a value in csv python

You can try using the pandas package

import pandas

filename = 'CFL200_ABCD (2018-01-01).csv'

df = pandas.read_csv(filename)
df.insert(0, 'ID', filename[:6])
df.to_csv(filename, index=False)
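
Slicing with filename[:6] only works while every ID is exactly six characters long. A hedged alternative is a regular expression. The pattern below is an assumption inferred from the single sample name (ID, underscore, label, date in parentheses), and parse_name is a hypothetical helper.

```python
import re

# assumed layout: <id>_<label> (<YYYY-MM-DD>).csv
NAME_PATTERN = re.compile(
    r"^(?P<id>[^_]+)_(?P<label>.+?) \((?P<date>\d{4}-\d{2}-\d{2})\)\.csv$"
)


def parse_name(filename):
    """Return the parts of a file name as a dict, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


print(parse_name("CFL200_ABCD (2018-01-01).csv"))
# {'id': 'CFL200', 'label': 'ABCD', 'date': '2018-01-01'}
```

The 'id' value can then be used in df.insert(0, 'ID', ...) in place of the slice.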

Extracting filename and using it as label on DataFrame in Pandas

I would probably use the standard csv module in Python, https://docs.python.org/3/library/csv.html. But if you prefer to use pandas, below is a code snippet you can modify:

import os
import pandas as pd

#get your working directory and target folder that contains all your files
path = os.path.join(os.getcwd(),'folder')

files = [os.path.join(path,i) for i in os.listdir(path) if os.path.isfile(os.path.join(path,i))]

frames = []

# for every file in the folder, read it and store its file name in a 'Date' column
for file in files:
    _df = pd.read_csv(file)
    _df['Date'] = os.path.split(file)[-1]
    frames.append(_df)

# DataFrame.append() was removed in pandas 2.0, so combine the pieces with pd.concat()
df = pd.concat(frames)

The example above lists every entry in the folder, keeps only the regular files, and stores their full paths in a list. It then loops over that list, reads each file into _df, adds the file name as a column, and collects the result. The final df contains all the data rows together with the name of the file each row came from.
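
The same idea can be wrapped in a function that calls pd.concat() once at the end. This is a sketch under a few assumptions: read_with_source and the column name "source_file" are hypothetical, every file is assumed to share the same columns, and an empty input returns an empty DataFrame instead of raising an error.

```python
import os

import pandas as pd


def read_with_source(paths, column="source_file"):
    """Read each CSV in paths and tag its rows with the file's base name."""
    frames = [
        pd.read_csv(path).assign(**{column: os.path.basename(path)})
        for path in paths
    ]
    if not frames:
        # pd.concat() raises ValueError on an empty list
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

Calling read_with_source(files) with the files list built earlier gives one table whose last column says where each row came from.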

Read a list of zip files and extract the year from the filename in Pandas

Try concat():

frames = []
for file in files:
    df = pd.read_csv(file, compression='zip')
    # there are several ways to get the year; here's one
    df['YEAR'] = file.split('MERGED')[1].split('_')[0]
    frames.append(df)

pd.concat(frames)
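
One of the other ways to get the year is a regular expression, which fails loudly when a name does not fit. The sketch below assumes the year is the four digits that follow "MERGED". The helper name year_from_name and the sample file name in the usage note are hypothetical. Unlike the split() version, it returns an int.

```python
import os
import re


def year_from_name(path):
    """Extract the four-digit year that follows 'MERGED' in a file name."""
    match = re.search(r"MERGED(\d{4})", os.path.basename(path))
    if match is None:
        raise ValueError(f"no year found in file name: {path!r}")
    return int(match.group(1))
```

For a name such as "MERGED2015_16_PP.zip" this returns 2015, which can be assigned to df['YEAR'] inside the loop.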

Extracting a row value from one file, and putting that value to another row in another file (with filename corresponds to row of the previous file)

If I understand your problem correctly, your approach is more complex than it needs to be. Here is a script that creates the desired output.

First, the CSV file with the names of the other files is read directly into the first column of the data frame. Then, the file names are used to extract the longitude and latitude from each file. For this, I created a function, which you can see in the first part of the script. In the end, I add the extracted values to the data frame and store it in a file in the desired format.

import pandas as pd
import csv

# Function that takes the path of a csv file and returns its latitude and longitude
def get_lati_and_long_from_csv(csv_path):
    with open(csv_path,'rt') as file:
        # Read csv file content to list of rows
        data = list(csv.reader(file, delimiter =';'))

    # Take values from row zero and one
    latitude = data[0][1]
    longitude = data[1][1]

    return (latitude, longitude)

def main():
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read data frame from csv file and create correct column name
    CSV_file_1 = pd.read_csv(csv_file_1_path, header=None)
    CSV_file_1.columns = ['FILENAME:']

    # Create list of files to read the coordinates
    list_of_csvs = list(CSV_file_1['FILENAME:'])

    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []

    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)

    # Add coordinates to the data frame
    CSV_file_1['Latitude:'] = lat_list
    CSV_file_1['Longitude:'] = lon_list

    # Save final data frame to csv file
    CSV_file_1.to_csv(csv_file_1_path+'.out', index = False, sep='\t')

if __name__ == "__main__":
    main()

Test input file content:

1.csv
2.csv
3.csv

Test output file content:

FILENAME:   Latitude:   Longitude:
1.csv       13.63345    123.207083
2.csv       13.11111    123.22222
3.csv       13.22222    123.11111
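
To make the row and column indexing concrete, here is a small in-memory sketch of the extraction step. The sample content is an assumption: the script only reveals that the files are semicolon-delimited, with the latitude in row 0, column 1 and the longitude in row 1, column 1. The labels in column 0 and the helper name lat_lon_from_lines are hypothetical.

```python
import csv
import io


def lat_lon_from_lines(lines):
    """Return (latitude, longitude) from the first two rows, second column."""
    rows = list(csv.reader(lines, delimiter=";"))
    return rows[0][1], rows[1][1]


# hypothetical file content; only the row/column positions come from the script
sample = io.StringIO("Latitude;13.63345\nLongitude;123.207083\nother;data\n")
print(lat_lon_from_lines(sample))
# ('13.63345', '123.207083')
```

The values stay strings, which is fine for writing them back out unchanged.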

EDIT:
If your files do not contain any other data, I would suggest simplifying things and removing pandas, as it is not needed. The following main() function produces the same result but uses only the csv module.

def main():
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read file to list containing the paths of the other csv files
    with open(csv_file_1_path,'rt') as file:
        list_of_csvs = file.read().splitlines()

    print(list_of_csvs)
    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []

    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)

    # Combine the three different lists to create the rows of the new csv file
    data = list(zip(list_of_csvs, lat_list, lon_list))

    # Create the headers and combine them with the other rows
    rows = [['FILENAME:', 'Latitude:', 'Longitude:']]
    rows.extend(data)

    # Write everything to the final csv file (newline='' avoids blank rows on Windows)
    with open(csv_file_1_path + '.out','w', newline='') as file:
        csv_writer = csv.writer(file, dialect='excel', delimiter='\t')
        csv_writer.writerows(rows)

How to read a file name and append the name to a new column in a csv file using python pandas?

You can use glob.glob() to give you a list of all of the CSV files and then just extract the ID from each filename and add a new column. The file can then be updated as follows:

from glob import glob
import pandas as pd
import os.path

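for filename in glob('my/source/folder/agent_op*.csv'):
    # lstrip()/rstrip() remove sets of characters, not prefixes, so cut the name explicitly
    name = os.path.splitext(os.path.basename(filename))[0]   # drop the folder and '.csv'
    run_id = name.removeprefix('agent_op_').rstrip('-')      # removeprefix needs Python 3.9+
    df = pd.read_csv(filename)
    df['run_id'] = run_id
    df.to_csv(filename, index=False)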
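
A word of caution when trimming file names: str.lstrip() and str.rstrip() treat their argument as a set of characters, not as a prefix or suffix, so an ID that starts or ends with one of those characters gets shortened. The sketch below shows the effect and a slicing-based alternative that also runs on Python versions older than 3.9. The helper name run_id_from_path and the sample file name are hypothetical.

```python
import os


def run_id_from_path(path, prefix="agent_op_", suffix=".csv"):
    """Strip the folder, an exact prefix, and an exact suffix from a path."""
    name = os.path.basename(path)
    if name.startswith(prefix):
        name = name[len(prefix):]
    if name.endswith(suffix):
        name = name[:-len(suffix)]
    return name


# character-set stripping eats the leading "pan" of the ID
print("agent_op_pan1.csv".lstrip("agent_op_"))                  # 1.csv
print(run_id_from_path("my/source/folder/agent_op_pan1.csv"))   # pan1
```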

