Apply python code to all csv files in a folder
You don't need the line file_name = os.path.splitext(...).
Just this:
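import glob
import os
import pandas as pd

path = "absolute/path/to/your/folder"
os.chdir(path)
all_files = glob.glob('*.csv')
for file in all_files:
    df = pd.read_csv(file)
    df["new_column"] = df["seq"] + df["log_id"]
    # index=False avoids writing the row index as an extra column
    df.to_csv(file, index=False)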
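If you would rather not change the working directory, the same loop can be written with pathlib. This is only a sketch: it assumes the seq and log_id columns from the snippet above, and the function name add_column_to_csvs is made up for illustration.

```python
from pathlib import Path

import pandas as pd


def add_column_to_csvs(folder):
    """Add new_column (seq + log_id) to every CSV in folder, in place."""
    processed = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        df["new_column"] = df["seq"] + df["log_id"]
        # index=False keeps the row index out of the rewritten file
        df.to_csv(csv_path, index=False)
        processed.append(csv_path.name)
    return processed
```

Calling add_column_to_csvs("absolute/path/to/your/folder") returns the names of the files it rewrote, which makes it easy to see what was touched.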
Reading all the csv files with Pandas inside a folder Location Python
You can create variables through globals(), although this is not recommended. It is much better to use a dictionary with the file names as keys.
Try:
import os
import pandas as pd

# folder_location is the path to the folder that holds your CSV files
data = dict()
for file in os.listdir(folder_location):
    if file.endswith(".csv"):
        data[file.replace(".csv","")] = pd.read_csv(os.path.join(folder_location, file))
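The dictionary approach can also be wrapped in a small function. The sketch below is one possible variant, not the only way to do it: load_csv_folder is a hypothetical name, and it uses pathlib's .stem for the key instead of str.replace.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder_location):
    """Return {file name without extension: DataFrame} for each CSV."""
    data = {}
    for csv_path in Path(folder_location).glob("*.csv"):
        # .stem strips only the final extension, unlike str.replace
        data[csv_path.stem] = pd.read_csv(csv_path)
    return data
```

After data = load_csv_folder(folder_location), a file called sales.csv (a made-up name) would be available as data["sales"].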
If you absolutely must create dynamic variables with the file names, use the below instead:
import os
import pandas as pd

for file in os.listdir(folder_location):
    if file.endswith(".csv"):
        globals()[file.replace(".csv","")] = pd.read_csv(os.path.join(folder_location, file))
Read in all csv files from a directory using Python
That's how I'd do it:
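import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # os.walk yields bare file names, so join with root to get the full path
            with open(os.path.join(root, file), 'r') as f:
                # perform calculation
                pass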
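To make the "perform calculation" step concrete, here is a rough sketch that counts the data rows of every CSV found by os.walk. The row count is just a stand-in for whatever calculation you need, the function name count_rows_in_csvs is hypothetical, and the code assumes the first row of each file is a header.

```python
import csv
import os


def count_rows_in_csvs(directory):
    """Walk directory recursively and count data rows in each CSV file."""
    counts = {}
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".csv"):
                # os.walk yields bare names, so rebuild the full path
                full_path = os.path.join(root, name)
                with open(full_path, "r", newline="") as f:
                    reader = csv.reader(f)
                    next(reader, None)  # skip the header row, if any
                    counts[full_path] = sum(1 for _ in reader)
    return counts
```

Because the path is built from root, this also works for CSV files in subfolders, which opening the bare file name would not.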
Python Pandas - import all CSV files in folder, only picking up 1 file
It is processing all the CSVs, but when concatenating you are not using your base dataframe (dfData), only the new dataframe (df).
Also, the Filename column will be overwritten every time. Set it on df to avoid this:
df['Filename'] = filename
dfData = pd.concat([dfData, df], ignore_index=True)
List method
As suggested by pyaj in the comments, you can also use lists to achieve the same thing.
It will look like this:
import glob
import os
import pandas as pd

csvPath = "blahblah"
df_list = []
# os.path.join avoids the invalid "\*" escape sequence
for f in glob.glob(os.path.join(csvPath, "*.csv")):
    df = pd.read_csv(f)
    filename = os.path.basename(f)
    df.drop(df.columns[[0, 1, 3]], axis=1, inplace=True)
    df['ID'] = df['ID'].str.upper()
    df = df.set_index('ID').stack().reset_index()
    df['Filename'] = filename
    df_list.append(df)
# Concatenate once, after the loop has collected every file
dfData = pd.concat(df_list, ignore_index=True)
You can also check the list to see if each individual dataframe is correct.
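One way to do that check is to print a short summary of each dataframe before concatenating. This is only a sketch: summarize_frames is a hypothetical helper, and it assumes every dataframe in the list carries the Filename column added in the loop above.

```python
import pandas as pd


def summarize_frames(df_list):
    """Return one (filename, row count, column names) tuple per DataFrame."""
    summary = []
    for df in df_list:
        # Each frame carries its source file name in the 'Filename' column
        filename = df["Filename"].iloc[0] if len(df) else None
        summary.append((filename, len(df), list(df.columns)))
    return summary
```

Looping over summarize_frames(df_list) and printing each tuple quickly shows a file that came out empty or with unexpected columns.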
Import all csv files existing in a folder and group them based on their names?
First off, my solution is only robust if there are always exactly 4 files that belong together in a group and none are missing. To make it more robust, file name parsing should be used.
As far as I understand the question, you want the data from four CSV files with the same string prefix grouped together in a list. That list is then embedded in a bigger list holding all the data from the 1000 files.
Therefore I would sort not by timestamp but by name, and then simply store the files in lists that get added to a bigger one after four items were added and are subsequently reset. This is my code:
import os
import glob
import pandas as pd

# Raw string so that "\f" is not read as a form-feed character
path1 = r'D:\folder'
all_files1 = glob.glob(os.path.join(path1, "*.csv"))
# Sort by name not timestamp
all_files1.sort()
List_DATA = []
# For storing sub list of data frames
SubList_DATA = []
for idx, filename in enumerate(all_files1):
    data = pd.read_csv(filename, index_col=None)
    SubList_DATA.append(data)
    # Every 4th time the sublist gets stored in main list and reset.
    if idx % 4 == 3:
        List_DATA.append(SubList_DATA)
        SubList_DATA = []
EDIT:
I just hacked a version together that makes use of the file names and will work even if there are more or fewer files in a group:
import os
import glob
import pandas as pd

# Raw string so that "\f" is not read as a form-feed character
path1 = r'D:\folder'
all_files1 = glob.glob(os.path.join(path1, "*.csv"))
# Sort by name not timestamp
all_files1.sort()
List_DATA = []
# For storing sub list of data frames
SubList_DATA = []
# For keeping track which sublist is generated.
currentprefix = ""
for idx, filename in enumerate(all_files1):
    # Parse prefix string from the file name (not the full path);
    # split only at the first underscore
    prefix, suffix = os.path.basename(filename).split("_", 1)
    # Since sorted, the prefix should change only once and never reappear
    if currentprefix != prefix:
        # Skip this at the first step
        if idx != 0:
            # Add sublist to major one and reset it
            List_DATA.append(SubList_DATA)
            SubList_DATA = []
        # Set current prefix to the current block of read in files
        currentprefix = prefix
    # Add data to sublist
    data = pd.read_csv(filename, index_col=None)
    SubList_DATA.append(data)
# Finally add last sublist
List_DATA.append(SubList_DATA)
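The same grouping can also be done with a dictionary keyed by prefix, which removes the need to track the current prefix by hand. This is a sketch of that alternative, not the code from the answer: load_grouped_by_prefix and its sep parameter are hypothetical, and it assumes the group prefix is everything before the first underscore in the file name.

```python
import glob
import os
from collections import defaultdict

import pandas as pd


def load_grouped_by_prefix(folder, sep="_"):
    """Return a list of DataFrame lists, one inner list per file-name prefix."""
    groups = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # Everything before the first separator is treated as the group key
        prefix = os.path.basename(path).split(sep, 1)[0]
        groups[prefix].append(pd.read_csv(path, index_col=None))
    # Dicts preserve insertion order, so groups follow the sorted file order
    return list(groups.values())
```

The result has the same shape as List_DATA above: a list whose items are lists of dataframes, however many files each group contains.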
Upload folder of CSV'S to Google Sheet using Python
While I am not entirely familiar with the gspread package, I do know that the os package would be very helpful for iterating through files in a folder. You would not need to install os, as it should already come with Python. You can use it like so:
import gspread
import os

gc = gspread.oauth(credentials_filename='/users/krzysztofpaszta/credentials.json')
os.chdir(FOLDER_PATH)
files = os.listdir()
for filename in files:
    # endswith() is safe for names with no dot or with several dots
    if filename.endswith(".csv"):
        with open(filename, 'r') as f:
            content = f.read().encode('utf-8')
        gc.import_csv('1gv-bKe-flo5FwIbt_xgCp1vNn0L0KBnpiu', content)
You would need to replace FOLDER_PATH with the path to the folder where you store the CSVs, relative to the directory you run your Python script from. The if filename.endswith(".csv"): line ensures that the script only tries to upload CSV files and ignores any other type of file. I am not entirely familiar with what the first argument in the gc.import_csv() command is doing, so that might cause an issue if it is specific to the particular CSV you were trying to upload before. Hope this helps!
EDIT: Upon looking into the import_csv() function, it seems that my current code would just overwrite the same spreadsheet over and over. It seems you would need to create a new spreadsheet for every file and then pass its file_id as the argument to import_csv() each time.
EDIT2: Try adding this line after the if statement: sh = gc.create(filename.split(".")[0]) and then replacing the long string that is currently the first argument of import_csv() with sh.id. I hope this works!
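Putting the two edits together, the loop might look roughly like the sketch below. I have not run it against Google Sheets, so treat it as an assumption-laden outline: upload_csv_folder is a hypothetical helper, and client is assumed to behave like the gc object above, with create(title) returning an object that has an id attribute and import_csv(file_id, data) uploading the content.

```python
import os


def upload_csv_folder(client, folder_path):
    """Create one spreadsheet per CSV in folder_path and import the file into it.

    client is assumed to offer create(title) and import_csv(file_id, data),
    as the gspread client in the answer above does.
    """
    created = {}
    for filename in sorted(os.listdir(folder_path)):
        title, ext = os.path.splitext(filename)
        if ext.lower() != ".csv":
            continue
        full_path = os.path.join(folder_path, filename)
        with open(full_path, "r", encoding="utf-8") as f:
            content = f.read().encode("utf-8")
        sh = client.create(title)  # new spreadsheet named after the file
        client.import_csv(sh.id, content)
        created[filename] = sh.id
    return created
```

You would call it as upload_csv_folder(gc, FOLDER_PATH). Passing the client in as an argument also means the loop can be tried out with a stand-in object before touching a real account.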