Extract file name from read_csv - Python
Many ways to do it
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # os.listdir() returns bare names, so join them with path before reading
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename.split(".")[0])
One more
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        # drop the last four characters, i.e. the ".csv" extension
        new_table_list.append(filename[:-4])
and many more
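As @barmar pointed out, it is better to join path onto each filename (os.path.join(path, filename)) before passing it to read_csv. os.listdir() returns bare names, so the read otherwise fails whenever the script does not run from the folder that holds the files.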
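Both slicing variants above make assumptions about the name: split(".")[0] cuts at the first dot, and [:-4] assumes a four-character extension. A minimal sketch of an alternative using pathlib from the standard library follows; table_name is a hypothetical helper, not part of the original answers.

```python
from pathlib import Path

def table_name(filename):
    """Return the file name without its directory or final extension."""
    # Path.stem drops only the last suffix, so dots inside the name survive
    return Path(filename).stem

print(table_name("data/sales.2020.csv"))   # sales.2020
print("sales.2020.csv".split(".")[0])      # sales
```

Which behaviour is wanted depends on how the files are named; for names with a single dot the three approaches agree.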
How to display the csv file name which is read by pandas read_csv() function?
The pandas.read_csv() function accepts a file object (actually any file-like object with a read() method). A file object returned by open() has a name attribute that holds the name of the opened file.
I see this code and situation as absolutely meaningless since you already know the file name beforehand, but for the sake of completeness, here you go:
import pandas as pd
# open the file yourself so that its name is available from the file object
with open("your_csv_filename.csv") as csv_file:
    print(csv_file.name)
    df = pd.read_csv(csv_file)
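When the input may be either a path or an already opened stream, the name has to be looked up differently in each case. The sketch below assumes a hypothetical helper, source_name, and falls back to a placeholder for in-memory streams such as io.StringIO, which have no name attribute.

```python
import io
import os

import pandas as pd

def source_name(source):
    """Best-effort display name for something passed to pd.read_csv()."""
    if isinstance(source, (str, os.PathLike)):
        return os.fspath(source)
    # Files returned by open() expose .name; in-memory streams do not
    return getattr(source, "name", "<unnamed stream>")

stream = io.StringIO("a,b\n1,2\n")
print(source_name(stream))                    # <unnamed stream>
df = pd.read_csv(stream)
print(source_name("your_csv_filename.csv"))   # your_csv_filename.csv
```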
extract characters from filenames and add those as a value in csv python
You can try using the pandas package:
import pandas
filename = 'CFL200_ABCD (2018-01-01).csv'
df = pandas.read_csv(filename)
df.insert(0, 'ID', filename[:6])
df.to_csv(filename, index=False)
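Slicing with filename[:6] only works while every ID is exactly six characters long. A hedged alternative is to match the pieces with a regular expression. The pattern below is an assumption inferred from the single example name (ID, underscore, label, date in parentheses), and parse_filename is a hypothetical helper.

```python
import re

# Assumed layout: <id>_<label> (<YYYY-MM-DD>).csv
FILENAME_PATTERN = re.compile(
    r"^(?P<id>[^_]+)_(?P<label>[^ ]+) \((?P<date>\d{4}-\d{2}-\d{2})\)\.csv$"
)

def parse_filename(filename):
    """Split a file name into its id, label and date parts."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.groupdict()

print(parse_filename("CFL200_ABCD (2018-01-01).csv"))
# {'id': 'CFL200', 'label': 'ABCD', 'date': '2018-01-01'}
```

The value under 'id' can then be passed to df.insert() in place of the fixed slice.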
Extracting filename and using it as label on DataFrame in Pandas
I would probably use the standard csv module in Python, https://docs.python.org/3/library/csv.html. But if you prefer to use pandas, below is a code snippet you can modify:
import os
import pandas as pd
# get your working directory and the target folder that contains all your files
path = os.path.join(os.getcwd(), 'folder')
files = [os.path.join(path, i) for i in os.listdir(path) if os.path.isfile(os.path.join(path, i))]
frames = []
# read every file in the folder and record its file name in a 'Date' column
for file in files:
    _df = pd.read_csv(file)
    _df['Date'] = os.path.split(file)[-1]
    frames.append(_df)
# DataFrame.append() was removed in pandas 2.0, so combine the pieces with pd.concat()
df = pd.concat(frames)
The example above lists every entry in the folder, keeps only the regular files, and stores their full paths in a list. It then loops over that list, reads each file into _df, adds the file name as a 'Date' column, and combines the results into df. The final df contains all the data rows together with the name of the file each row came from.
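pd.concat() also accepts a mapping, in which case the keys label the rows that came from each frame. The following is a sketch of using that to carry the file name along, assuming a hypothetical helper combine_with_names and throwaway files created only for the demonstration.

```python
import os
import tempfile

import pandas as pd

def combine_with_names(paths):
    """Concatenate CSV files, recording each file's base name in a 'Date' column."""
    frames = {os.path.basename(p): pd.read_csv(p) for p in paths}
    # The dict keys become the outer index level, named 'Date' here
    combined = pd.concat(frames, names=["Date", "row"])
    # Move the file name out of the index and into an ordinary column
    return combined.reset_index(level="Date").reset_index(drop=True)

with tempfile.TemporaryDirectory() as folder:
    paths = []
    for name, value in [("2020-01-01.csv", 1), ("2020-01-02.csv", 2)]:
        file_path = os.path.join(folder, name)
        pd.DataFrame({"x": [value]}).to_csv(file_path, index=False)
        paths.append(file_path)
    combined = combine_with_names(paths)

print(combined)
```

Because the frames are keyed by base name, two files with the same name in different folders would overwrite each other in the dict.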
Read a list of zip files and extract the year from the filename in Pandas
Try concat():
frames = []
for file in files:
    df = pd.read_csv(file, compression='zip')
    # there are several ways to get the year; here's one
    df['YEAR'] = file.split('MERGED')[1].split('_')[0]
    frames.append(df)
# pd.concat() returns a new DataFrame, so keep the result
result = pd.concat(frames)
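The double split raises an IndexError when a name does not contain 'MERGED'. One of the other ways to get the year is a regular expression, sketched below. The helper year_from_filename and the example file names are hypothetical, and the pattern assumes the year is the four digits directly after 'MERGED'.

```python
import os
import re

def year_from_filename(path):
    """Return the four digits that follow 'MERGED' in the file name, or None."""
    match = re.search(r"MERGED(\d{4})", os.path.basename(path))
    return match.group(1) if match else None

print(year_from_filename("data/MERGED2015_16_PP.zip"))   # 2015
print(year_from_filename("data/readme.zip"))             # None
```

The year is returned as a string, as in the split-based version; wrap it in int() if a numeric column is preferred.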
Extracting a row value from one file, and putting that value to another row in another file (with filename corresponds to row of the previous file)
If I understand your problem correctly, I think your approach is more complex than it needs to be. I implemented a script that creates the desired output.
First, the CSV file with the names of the other files is read directly into the first column of the data frame. Then, the file names are used to extract the longitude and latitude from each file. For this, I created a function, which you can see in the first part of the script. In the end, I add the extracted values to the data frame and store it in a file in the desired format.
import pandas as pd
import csv

# Function that takes the path of a csv file and returns its latitude and longitude
def get_lati_and_long_from_csv(csv_path):
    with open(csv_path, 'rt') as file:
        # Read csv file content to list of rows
        data = list(csv.reader(file, delimiter=';'))
        # Take values from row zero and one
        latitude = data[0][1]
        longitude = data[1][1]
    return (latitude, longitude)

def main():
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"
    # Read data frame from csv file and create correct column name
    CSV_file_1 = pd.read_csv(csv_file_1_path, header=None)
    CSV_file_1.columns = ['FILENAME:']
    # Create list of files to read the coordinates
    list_of_csvs = list(CSV_file_1['FILENAME:'])
    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
    # Add coordinates to the data frame
    CSV_file_1['Latitude:'] = lat_list
    CSV_file_1['Longitude:'] = lon_list
    # Save final data frame to csv file
    CSV_file_1.to_csv(csv_file_1_path + '.out', index=False, sep='\t')

if __name__ == "__main__":
    main()
Test input file content:
1.csv
2.csv
3.csv
Test output file content:
FILENAME: Latitude: Longitude:
1.csv 13.63345 123.207083
2.csv 13.11111 123.22222
3.csv 13.22222 123.11111
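The helper reads the latitude from row 0 and the longitude from row 1, taking the second semicolon-separated field in each case. Below is a minimal sketch of a file body that satisfies this. The row labels are assumptions, since the original input files are not shown, and an in-memory string stands in for the file.

```python
import csv
import io

# Hypothetical content of 1.csv; only the row order and the ';' delimiter matter
sample = "Latitude;13.63345\nLongitude;123.207083\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
latitude = rows[0][1]
longitude = rows[1][1]
print(latitude, longitude)   # 13.63345 123.207083
```

The values stay strings throughout, which is why they are written to the output file exactly as they appear in the input.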
EDIT:
If your files do not contain any other data, I would suggest simplifying things and removing pandas as it is not needed. The following main() function produces the same result but uses only the csv module.
def main():
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"
    # Read file to list containing the paths of the other csv files
    with open(csv_file_1_path, 'rt') as file:
        list_of_csvs = file.read().splitlines()
    print(list_of_csvs)
    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
    # Combine the three different lists to create the rows of the new csv file
    data = list(zip(list_of_csvs, lat_list, lon_list))
    # Create the headers and combine them with the other rows
    rows = [['FILENAME:', 'Latitude:', 'Longitude:']]
    rows.extend(data)
    # Write everything to the final csv file (newline='' lets the csv module control line endings)
    with open(csv_file_1_path + '.out', 'w', newline='') as file:
        csv_writer = csv.writer(file, dialect='excel', delimiter='\t')
        csv_writer.writerows(rows)
How to read a file name and append the name to a new column in a csv file using python pandas?
You can use glob.glob() to give you a list of all of the CSV files and then just extract the ID from each filename and add a new column. The file can then be updated as follows:
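from glob import glob
import pandas as pd
import os.path
prefix = 'agent_op_'
for filename in glob('my/source/folder/agent_op*.csv'):
    # lstrip()/rstrip() remove sets of characters, not substrings, so drop the
    # extension (and any trailing '-') first, then slice the prefix off explicitly
    stem = os.path.splitext(os.path.basename(filename))[0].rstrip('-')
    run_id = stem[len(prefix):] if stem.startswith(prefix) else stem
    df = pd.read_csv(filename)
    df['run_id'] = run_id
    df.to_csv(filename, index=False)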
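One caveat when trimming a fixed prefix or suffix from a file name: str.lstrip() and str.rstrip() treat their argument as a set of characters, not as a substring, so they can eat into the ID itself. The short comparison below uses a hypothetical file name; str.removeprefix() and str.removesuffix() require Python 3.9 or later.

```python
name = "agent_op_tango_7.csv"   # hypothetical file name

# Character-set stripping removes every leading a, g, e, n, t, _, o and p
stripped = name.lstrip("agent_op_").rstrip("-.csv")
print(stripped)   # 7

# Substring removal (Python 3.9+) keeps the ID intact
cleaned = name.removeprefix("agent_op_").removesuffix(".csv")
print(cleaned)    # tango_7
```

Purely numeric IDs happen to survive the character-set version, which is why the problem can go unnoticed for a long time.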