Find All CSV Files in a Directory Using Python

Apply Python code to all CSV files in a folder

You don't need this line:

file_name = os.path.splitext(...)

Just this:

path = "absolute/path/to/your/folder"
os.chdir(path)
all_files = glob.glob('*.csv')

for file in all_files:
df = pd.read_csv(file)
df["new_column"] = df["seq"] + df["log_id"]
df.to_csv(file)
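
If you would rather not change the working directory with os.chdir, you can glob on the full path instead; a minimal variant of the same loop:

import glob
import os
import pandas as pd

path = "absolute/path/to/your/folder"
for file in glob.glob(os.path.join(path, "*.csv")):
    df = pd.read_csv(file)
    df["new_column"] = df["seq"] + df["log_id"]
    df.to_csv(file, index=False)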

Reading all the CSV files inside a folder location with Pandas in Python

You can create variables through globals(), although this is not recommended. It is much better to use a dictionary with the file names as the keys.

Try:

import os
import pandas as pd

data = dict()
for file in os.listdir(folder_location):
    if file.endswith(".csv"):
        data[file.replace(".csv", "")] = pd.read_csv(os.path.join(folder_location, file))
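
Each dataframe is then available under its file name as the key; for example, assuming the folder contains a hypothetical report.csv:

# "report" is a hypothetical file name used purely for illustration
print(data["report"].head())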

If you absolutely must create dynamic variables from the file names, use this instead:

import os
import pandas as pd

for file in os.listdir(folder_location):
    if file.endswith(".csv"):
        globals()[file.replace(".csv", "")] = pd.read_csv(os.path.join(folder_location, file))

Read in all csv files from a directory using Python

Here is how I'd do it:

import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # Join with root, otherwise open() only finds files in the current directory
            with open(os.path.join(root, file), 'r') as f:
                # perform calculation
                pass
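
If you only need the matching paths rather than walking every directory yourself, a recursive glob is a compact alternative; a minimal sketch under the same directory assumption:

import glob
import os

directory = os.path.join("c:\\", "path")
# "**" with recursive=True descends into subdirectories, much like os.walk
for csv_path in glob.glob(os.path.join(directory, "**", "*.csv"), recursive=True):
    with open(csv_path, 'r') as f:
        # perform calculation
        pass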

Python Pandas - import all CSV files in folder, only picking up 1 file

It is processing all the CSVs; the problem is that when concatenating you are not using your base dataframe (dfData), only the new dataframe (df).

Also, the Filename column gets overwritten every time because it is assigned on the wrong frame. Set it on df to avoid this:

df['Filename'] = filename
dfData = pd.concat([dfData, df], ignore_index=True)

List method

As suggested by pyaj in the comments, you can also use a list to achieve the same thing.

It will look like this:

import glob
import os
import pandas as pd

csvPath = "blahblah"

df_list = []

for f in glob.glob(os.path.join(csvPath, "*.csv")):
    df = pd.read_csv(f)
    filename = os.path.basename(f)
    df.drop(df.columns[[0, 1, 3]], axis=1, inplace=True)
    df['ID'] = df['ID'].str.upper()
    df = df.set_index('ID').stack().reset_index()
    df['Filename'] = filename
    df_list.append(df)

dfData = pd.concat(df_list, ignore_index=True)

You can also check the list to see if each individual dataframe is correct.
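
For instance, a quick loop before the concat prints each frame's source file and shape:

for i, frame in enumerate(df_list):
    print(i, frame['Filename'].iloc[0], frame.shape)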

Import all csv files existing in a folder and group them based on their names?

First off, I want to say that my solution is only robust given that there will always be exactly four files that belong grouped together and that none are missing. To make it more robust, filename parsing should be used (see the EDIT below).

As far as I understand the question, you want the data from four CSV files with the same string prefix grouped together in a list, which is then embedded in a bigger list covering all 1000 files.
Therefore I would sort by name rather than timestamp, and then simply store the dataframes in a sublist that gets appended to the bigger list, and reset, after every four items. This is my code:

import os
import glob
import pandas as pd

import os
import glob
import pandas as pd

path1 = 'D:/folder'

# Glob inside path1 rather than the current working directory
all_files1 = glob.glob(os.path.join(path1, "*.csv"))
# Sort by name, not timestamp
all_files1.sort()

List_DATA = []
# For storing a sublist of dataframes
SubList_DATA = []

for idx, filename in enumerate(all_files1):
    data = pd.read_csv(filename, index_col=None)
    SubList_DATA.append(data)
    # Every 4th time the sublist gets stored in the main list and reset
    if idx % 4 == 3:
        List_DATA.append(SubList_DATA)
        SubList_DATA = []

EDIT:
I just hacked together a version that makes use of the filenames and will work even if there are more or fewer files in a group:

import os
import glob
import pandas as pd

path1 = 'D:/folder'

all_files1 = glob.glob(os.path.join(path1, "*.csv"))
# Sort by name, not timestamp
all_files1.sort()

List_DATA = []
# For storing a sublist of dataframes
SubList_DATA = []
# For keeping track of which sublist is being generated
currentprefix = ""

for idx, filename in enumerate(all_files1):
    # Parse the prefix string from the file name (the part before the first "_")
    prefix = os.path.basename(filename).split("_")[0]
    # Since the list is sorted, the prefix changes only once and never reappears
    if currentprefix != prefix:
        # Skip this at the first step
        if idx != 0:
            # Add the sublist to the major one and reset it
            List_DATA.append(SubList_DATA)
            SubList_DATA = []
        # Set the current prefix to the current block of files
        currentprefix = prefix
    # Add data to the sublist
    data = pd.read_csv(filename, index_col=None)
    SubList_DATA.append(data)

# Finally, add the last sublist
List_DATA.append(SubList_DATA)
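
Since sorting already puts each prefix in one contiguous run, the same grouping can also be expressed more compactly with itertools.groupby; a minimal sketch of that variant:

import glob
import os
from itertools import groupby

import pandas as pd

path1 = 'D:/folder'
all_files1 = sorted(glob.glob(os.path.join(path1, "*.csv")))

# groupby collapses consecutive files that share the same prefix into one group
List_DATA = [
    [pd.read_csv(f, index_col=None) for f in group]
    for _, group in groupby(all_files1, key=lambda f: os.path.basename(f).split("_")[0])
]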

Upload a folder of CSVs to a Google Sheet using Python

While I am not entirely familiar with the gspread package, I do know that the os package is very helpful for iterating through files in a folder. You do not need to install os, as it already comes with Python. You can use it like so:

import gspread
import os

gc = gspread.oauth(credentials_filename='/users/krzysztofpaszta/credentials.json')

os.chdir(FOLDER_PATH)

files = os.listdir()

for filename in files:
    if filename.endswith(".csv"):
        content = open(filename, 'r').read().encode('utf-8')
        gc.import_csv('1gv-bKe-flo5FwIbt_xgCp1vNn0L0KBnpiu', content)

You would need to replace FOLDER_PATH with the path to the folder you are storing the CSVs in, relative to the directory you are running your Python script from. The filename.endswith(".csv") check is there to ensure that the script only tries to upload CSV files and ignores any other type of file. I am not entirely familiar with what the first argument of gc.import_csv() is doing, so that might cause an issue if it is specific to the particular CSV you were trying to upload before. Hope this helps!

EDIT: Upon looking into the import_csv() function, it seems that my current code would just continuously overwrite the same spreadsheet over and over. You would need to create a new spreadsheet for every file and then pass its file ID as the first argument to import_csv() each time.

EDIT2: Try adding sh = gc.create(filename.split(".")[0]) right after the if statement, and then replacing the long string that is currently the first argument of import_csv() with sh.id. I hope this works!
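
Putting both edits together, a minimal sketch of the full loop might look like this (same credentials file and FOLDER_PATH placeholder as above; gc.create() makes a fresh spreadsheet per file, and its id is passed to import_csv()):

import gspread
import os

gc = gspread.oauth(credentials_filename='/users/krzysztofpaszta/credentials.json')

os.chdir(FOLDER_PATH)

for filename in os.listdir():
    if filename.endswith(".csv"):
        with open(filename, 'r') as f:
            content = f.read().encode('utf-8')
        # Create a new spreadsheet named after the file, then import the CSV into it
        sh = gc.create(filename.rsplit(".", 1)[0])
        gc.import_csv(sh.id, content)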


