Extract file name from path, no matter what the os/path format
Using os.path.split or os.path.basename, as others suggest, won't work in all cases: if you're running the script on Linux and attempt to process a classic Windows-style path, it will fail, because os.path on Linux does not treat the backslash as a separator.
Windows paths can use either a backslash or a forward slash as the path separator. Therefore, the ntpath module (which is equivalent to os.path when running on Windows) will work for all(1) paths on all platforms.
import ntpath
ntpath.basename("a/b/c")
Of course, if the path ends with a slash, the basename will be empty, so write your own function to deal with it:
def path_leaf(path):
    # Split off the last component; if it is empty (trailing slash), take the basename of the head
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)
Verification:
>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c',
... 'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']
(1) There's one caveat: Linux filenames may contain backslashes. So on Linux, r'a/b\c' always refers to the file b\c in the a folder, while on Windows, it always refers to the c file in the b subfolder of the a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to interpret it correctly. In practice it's usually safe to assume it's a Windows path, since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
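To make the caveat concrete, here is a small sketch that parses the same mixed-separator string with both rule sets. It relies only on the standard-library posixpath and ntpath modules, which can be imported on any OS regardless of what os.path points to.

```python
import ntpath
import posixpath

path = r"a/b\c"  # mixes a forward slash and a backslash

# POSIX rules: the backslash is an ordinary filename character
print(posixpath.basename(path))  # b\c

# Windows rules: the backslash is a separator too
print(ntpath.basename(path))     # c
```

Which answer is "right" depends entirely on the platform the path came from, not the one the script runs on.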
How to extract the file name from a file path?
If all you want to do is truncate the file paths to just the filename, you can use os.path.basename:
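import os
import pandas as pd
dict_ = {}  # file name -> DataFrame
for file in files:  # `files` is the list of CSV paths from the question
    fname = os.path.basename(file)
    dict_[fname] = (pd.read_csv(file, header=0, dtype=str, encoding='cp1252')
                    .fillna(''))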
Example:
os.path.basename('Desktop/test.txt')
# 'test.txt'
Extracting the name of a full file path
You should use pathlib (Python >= 3.4) for this.
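from pathlib import Path
# Use a raw string (r'...') so sequences like '\f' in '\face' are not treated as escape characters
# Note: Path only treats backslashes as separators on Windows
p = Path(r'D:\AI\Deep learning\face generator\images\chris evans 1.jpg')
filename = p.name  # 'chris evans 1.jpg' on Windows
print(filename)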
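If the script may run on Linux or macOS while processing Windows-style paths, a minimal alternative (assuming no filesystem access is needed) is pathlib.PureWindowsPath, which applies Windows parsing rules on any OS. It also drops trailing separators, so it covers the trailing-slash case that path_leaf handles above.

```python
from pathlib import PureWindowsPath

# Windows parsing rules regardless of the host OS; purely string-based, no disk access
p = PureWindowsPath(r"D:\AI\Deep learning\face generator\images\chris evans 1.jpg")
print(p.name)  # chris evans 1.jpg

# Forward slashes, backslashes and trailing separators are all handled
print(PureWindowsPath("a/b/c/").name)        # c
print(PureWindowsPath("\\a\\b\\c\\").name)   # c
```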
Extracting extension from filename in Python
Use os.path.splitext:
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
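For comparison, here is a short sketch of the pathlib attributes that cover the same ground (stem, suffix, suffixes). Note two differences: splitext keeps the directory in its first element while stem does not, and splitext only splits off the last extension, whereas suffixes lists all of them.

```python
import os.path
from pathlib import PurePosixPath

def compare(path: str):
    """Return (os.path.splitext result, stem, suffix, suffixes) for a POSIX-style path."""
    p = PurePosixPath(path)
    return os.path.splitext(path), p.stem, p.suffix, p.suffixes

print(compare("/path/to/somefile.ext"))
# (('/path/to/somefile', '.ext'), 'somefile', '.ext', ['.ext'])
print(compare(".bashrc"))
# (('.bashrc', ''), '.bashrc', '', [])
print(compare("archive.tar.gz"))
# (('archive.tar', '.gz'), 'archive.tar', '.gz', ['.tar', '.gz'])
```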
extract features for similar file name
Here is an example of grouping the files in a dictionary, with each key being the part of the name before the underscore (_) and the values being the paths to the files that use that name.
from pathlib import Path
from itertools import groupby
path = Path("path/to/dir")
def group_files(directory: Path, ext="jpg"):  # We must use a pathlib object here
    # We can glob the pathlib object (recursively) with our preferred extension;
    # keep the full Path objects so the dict values are usable file paths
    list_of_files = list(directory.glob(f"**/*.{ext}"))
    file_dict = {}  # File storage in a dict: prefix -> list of paths
    keyfunc = lambda p: p.name.split("_")[0]  # Split the file name on '_' and get the first word
    data = sorted(list_of_files, key=keyfunc)  # groupby needs the input sorted on the same key
    for k, g in groupby(data, keyfunc):
        file_dict[k] = list(g)
    return file_dict
dict_of_files = group_files(path)
# Now we have something like
# {"shghgssd" : ["shghgssd_1212.jpg", "shghgssd_ewewe.jpg", "shghgssd_opopo.jpg"]}
# But with full paths, not printed for brevity
# This means that you can iterate over the keys and values and get some operation going
sums_of_vecs = []
for key, value in dict_of_files.items():  # key is a string, value is a list
    print(f"Treating the files with the {key}_... prefix")
    for filepath in value:
        # DO YOUR COMPUTATION HERE
        # APPEND TO RESULTS
        pass  # nothing happens here...
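As an alternative to sort-then-groupby, a collections.defaultdict can build the same mapping in a single pass without sorting. This is a sketch under the assumption that you already have the paths as a list; group_by_prefix and the sample file names are hypothetical, and plain strings are used so it runs without touching the disk.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_by_prefix(paths, sep="_"):
    """Hypothetical helper: group paths by the part of the file name before the first `sep`."""
    groups = defaultdict(list)
    for p in paths:
        prefix = PurePosixPath(p).name.split(sep, 1)[0]
        groups[prefix].append(p)  # insertion order of the input is preserved
    return dict(groups)

files = ["imgs/shghgssd_1212.jpg", "imgs/abc_1.jpg", "imgs/shghgssd_ewewe.jpg"]
print(group_by_prefix(files))
# {'shghgssd': ['imgs/shghgssd_1212.jpg', 'imgs/shghgssd_ewewe.jpg'], 'abc': ['imgs/abc_1.jpg']}
```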
Note that if you're controlling how the files are created, it might be better to put each batch you're planning to treat further in its own directory and then load them with image_dataset_from_directory.
Get just the filename from a file path stored as a string
For Python 3.4+, use pathlib:
from pathlib import Path
name = Path(Filename).name  # Filename is the path string from the question