Extract File Name from Path, No Matter What the Os/Path Format

Using os.path.split or os.path.basename, as others suggest, won't work in all cases: if you run the script on Linux and try to process a classic Windows-style path, the backslashes are not treated as separators, so the whole path comes back as the file name.

Windows paths can use either a backslash or a forward slash as the path separator. Therefore, the ntpath module (which is what os.path resolves to when running on Windows) will work for all(1) paths on all platforms.

import ntpath
ntpath.basename("a/b/c")
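
To make the difference concrete, here is a minimal sketch comparing posixpath (what os.path resolves to on Linux and macOS) with ntpath on the same string. The path itself is a made-up example, not one from the question.

```python
import ntpath
import posixpath

# Hypothetical Windows-style path, used only for illustration
windows_path = r"C:\Users\demo\report.txt"

# posixpath only splits on '/', so it finds no separator here
posix_result = posixpath.basename(windows_path)

# ntpath splits on both '\' and '/'
nt_result = ntpath.basename(windows_path)

print(posix_result)  # C:\Users\demo\report.txt
print(nt_result)     # report.txt
```

Importing ntpath explicitly gives the same result on every operating system, which is why the answer prefers it over os.path.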

Of course, if the path ends with a slash, the basename will be empty, so write your own function to deal with it:

def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

Verification:

>>> paths = ['a/b/c/', 'a/b/c', '\\a\\b\\c', '\\a\\b\\c\\', 'a\\b\\c', 
... 'a/b/../../a/b/c/', 'a/b/../../a/b/c']
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']


(1) There's one caveat: Linux filenames may contain backslashes. So on Linux, r'a/b\c' always refers to the file b\c in the a folder, while on Windows, it always refers to the c file in the b subfolder of the a folder. So when both forward slashes and backslashes are used in a path, you need to know the associated platform to interpret it correctly. In practice it's usually safe to assume it's a Windows path, since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
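
If the originating platform is known, one way to handle this caveat is to choose the path module explicitly. The sketch below is only an illustration: the function name leaf_for_platform and its windows flag are hypothetical and not part of the answer above.

```python
import ntpath
import posixpath


def leaf_for_platform(path, windows=True):
    """Return the last path component using the rules of the given platform.

    Hypothetical helper: 'windows' selects ntpath rules, otherwise posixpath.
    """
    mod = ntpath if windows else posixpath
    head, tail = mod.split(path)
    # A trailing separator leaves tail empty, so fall back to the head's basename
    return tail or mod.basename(head)


mixed = r"a/b\c"
print(leaf_for_platform(mixed, windows=True))   # c
print(leaf_for_platform(mixed, windows=False))  # b\c
```

Defaulting to Windows rules mirrors the "usually safe" assumption above. Passing windows=False is the stricter choice when the paths are known to come from a POSIX system.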

How to extract the file name from a file path?

If all you want to do is truncate the file paths to just the filename, you can use os.path.basename:

for file in files:
    fname = os.path.basename(file)
    dict_[fname] = (pd.read_csv(file, header=0, dtype=str, encoding='cp1252')
                    .fillna(''))

Example:

os.path.basename('Desktop/test.txt')
# 'test.txt'

Extracting the name of a full file path

You should use pathlib (Python >= 3.4) for this.

from pathlib import Path

# Raw string: otherwise '\f' in '\face' would be read as a form-feed character
p = Path(r'D:\AI\Deep learning\face generator\images\chris evans 1.jpg')
filename = p.name
print(filename)
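
One thing to watch for: on Linux or macOS, Path creates a PosixPath, which does not treat backslashes as separators, so .name would return the whole string for a path like this one. A minimal sketch of a platform-independent variant uses pathlib.PureWindowsPath, which applies Windows parsing rules on any OS and never touches the file system:

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r'D:\AI\Deep learning\face generator\images\chris evans 1.jpg'

# Parsed with Windows rules regardless of the OS running the script
win = PureWindowsPath(raw)
print(win.name)    # chris evans 1.jpg
print(win.stem)    # chris evans 1
print(win.suffix)  # .jpg

# Parsed with POSIX rules: there is no '/' in the string, so it is one component
posix = PurePosixPath(raw)
print(posix.name == raw)  # True
```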

Extracting extension from filename in Python

Use os.path.splitext:

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
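
Note that os.path.splitext only splits off the last extension. For names with several extensions, pathlib's suffixes attribute lists all of them. A short sketch, using a made-up file name:

```python
import os.path
from pathlib import PurePosixPath

name = "backup.tar.gz"  # hypothetical file name for illustration

# splitext separates only the final extension
root, ext = os.path.splitext(name)
print(root, ext)  # backup.tar .gz

# pathlib exposes every suffix as a list
suffixes = PurePosixPath(name).suffixes
print(suffixes)  # ['.tar', '.gz']

# Like splitext, pathlib treats a leading-dot name as having no extension
dot_suffix = PurePosixPath(".bashrc").suffix
print(repr(dot_suffix))  # ''
```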

Extract features for similar file names

Here is an example of grouping the files in a dictionary, where each key is the part of the file name before the underscore (_) and each value is the list of paths to the files that share that prefix.

from pathlib import Path
from itertools import groupby
from collections import defaultdict

path = Path("path/to/dir")


def group_files(directory: Path, ext="jpg"):  # We must use a pathlib object here
    # We can glob the pathlib object with our preferred extension;
    # keep the full Path objects so the dict values are usable paths
    list_of_files = list(directory.glob(f"**/*.{ext}"))
    file_dict = defaultdict(list)  # File storage in a dict
    keyfunc = lambda p: p.name.split("_")[0]  # Split the file name on '_' and get the first word
    data = sorted(list_of_files, key=keyfunc)  # groupby needs input sorted on the same key
    for k, g in groupby(data, keyfunc):
        file_dict[k] = list(g)
    return file_dict


dict_of_files = group_files(path)

# Now we have something like
# {"shghgssd" : ["shghgssd_1212.jpg", "shghgssd_ewewe.jpg", "shghgssd_opopo.jpg"]}
# But with full paths, not printed for brevity
# This means that you can iterate over the keys and values and run some operation on each group

sums_of_vecs = []
for key, value in dict_of_files.items():  # key is a string, value is a list
    print(f"Treating the files with the {key}_... prefix")
    for filepath in value:
        # DO YOUR COMPUTATION HERE
        # APPEND TO RESULTS
        pass  # nothing happens here...

Note that if you're controlling how the files are created, it might be better to put each batch you're planning to treat further in its own directory and then load them with image_dataset_from_directory.
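
The grouping step described above can also be sketched without sorting or groupby, by appending to a defaultdict directly. This is an alternative illustration, not the answer's own code: the function name group_by_prefix and the file names are hypothetical, and a temporary directory is used so the snippet runs on its own.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_prefix(directory, ext="jpg"):
    """Map each prefix (text before the first '_') to the matching file paths."""
    groups = defaultdict(list)
    # sorted() only makes the order deterministic; grouping does not need it
    for p in sorted(directory.glob(f"**/*.{ext}")):
        groups[p.name.split("_")[0]].append(p)
    return dict(groups)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create a few empty, made-up files to group
    for name in ("cat_1.jpg", "cat_2.jpg", "dog_1.jpg"):
        (root / name).touch()
    grouped = group_by_prefix(root)
    # Reduce the paths to names for compact printing
    summary = {k: [p.name for p in v] for k, v in grouped.items()}

print(summary)  # {'cat': ['cat_1.jpg', 'cat_2.jpg'], 'dog': ['dog_1.jpg']}
```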

Get just the filename from a file path stored as a string

For Python 3.4+, use pathlib:

from pathlib import Path

name = Path(Filename).name

