Extracting extension from filename in Python
Use os.path.splitext:
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
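For comparison, the pathlib module (standard library since Python 3.4) exposes the same rule through the suffix attribute. This is a minimal sketch: the helper name extension is hypothetical, and PurePosixPath is used only so the forward-slash examples behave the same on every OS.

```python
from pathlib import PurePosixPath

def extension(path):
    """Return the final extension (including its dot), or '' if there is none."""
    return PurePosixPath(path).suffix

print(extension("/path/to/somefile.ext"))  # .ext
print(repr(extension("/a/b.c/d")))         # ''
print(repr(extension(".bashrc")))          # ''
```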
How to get file extension correctly?
The role of a file extension is to tell the viewer (and sometimes the computer) which application to use to handle the file.
Taking your worst-case example in your comments (a.ppt.tar.gz), this is a PowerPoint file that has been tar-balled and then gzipped. So you need to use a gzip-handling program to open it. Using PowerPoint or a tarball-handling program wouldn't work. OK, a clever program that knew how to handle both .tar and .gz files could understand both operations and work with a .tar.gz file, but note that it would do that even if the extension was simply .gz.
The fact that both tar and gzip add their extensions to the original filename, rather than replace them (as zip does), is a convenience. But the base name of the gzip file is still a.ppt.tar.
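To see how the extensions stack up, a small sketch using the a.ppt.tar.gz example: pathlib's suffixes lists every dot-separated extension in the order they were added, while os.path.splitext removes only the last one, leaving a.ppt.tar as the base name.

```python
import os
from pathlib import PurePosixPath

name = "a.ppt.tar.gz"

# Every extension, in the order it was added (PowerPoint, then tar, then gzip)
print(PurePosixPath(name).suffixes)  # ['.ppt', '.tar', '.gz']

# splitext only strips the outermost layer, the gzip one
print(os.path.splitext(name))        # ('a.ppt.tar', '.gz')
```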
How do I get the filename without the extension from a path in Python?
Getting the name of the file without the extension:
import os
print(os.path.splitext("/path/to/some/file.txt")[0])
Prints:
/path/to/some/file
See the documentation for os.path.splitext.
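The printed result still contains the directory. If you only want the bare file name, one option (a sketch with a hypothetical helper name, not the only way) is to apply os.path.basename first and then split off the extension:

```python
import os

def name_without_extension(path):
    # basename drops the directories; splitext then drops the last extension
    return os.path.splitext(os.path.basename(path))[0]

print(name_without_extension("/path/to/some/file.txt"))  # file
```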
Important Note: If the filename has multiple dots, only the extension after the last one is removed. For example:
import os
print(os.path.splitext("/path/to/some/file.txt.zip.asc")[0])
Prints:
/path/to/some/file.txt.zip
See other answers below if you need to handle that case.
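One way to handle that case is to keep applying os.path.splitext until it reports no further extension. This is a sketch with a hypothetical helper name; be aware that it also strips dots that are really part of the name (report.v2.txt becomes report).

```python
import os

def strip_all_extensions(path):
    """Repeatedly apply os.path.splitext until no extension is left."""
    root, ext = os.path.splitext(path)
    while ext:  # splitext returns '' once nothing is left to remove
        path = root
        root, ext = os.path.splitext(path)
    return path

print(strip_all_extensions("/path/to/some/file.txt.zip.asc"))  # /path/to/some/file
```

Because it relies on splitext, dot-files such as .bashrc and directories such as /a/b.c/d are still left alone.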
How can I replace (or strip) an extension from a filename in Python?
Try os.path.splitext; it should do what you want.
import os
print(os.path.splitext('/home/user/somefile.txt')[0] + '.jpg')  # /home/user/somefile.jpg
os.path.splitext('/home/user/somefile.txt')  # returns ('/home/user/somefile', '.txt')
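With pathlib, with_suffix does the replacement in one step, and passing an empty string strips the extension. A sketch (PurePosixPath keeps the example OS-independent; use Path when working with real files). Note that with_suffix only touches the last extension and raises ValueError if the new suffix does not start with a dot.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/somefile.txt")

print(p.with_suffix(".jpg"))  # /home/user/somefile.jpg
print(p.with_suffix(""))      # /home/user/somefile
```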
Extracting the file extensions from file names in pandas
Option 1: use apply
df['FileType'] = df.FileName.apply(lambda x: x.split('.')[-1])
Option 2: use str twice
df['FileType'] = df.FileName.str.split('.').str[-1]
Option 2b: use rsplit (thanks @cᴏʟᴅsᴘᴇᴇᴅ), passing n as a keyword since pandas 2.x no longer accepts it positionally
df['FileType'] = df.FileName.str.rsplit('.', n=1).str[-1]
All result in:
      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe
Python 3.6.4, Pandas 0.22.0
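Keep in mind that all three options return the whole name when a file has no dot (README gives README), and a dot-file such as .bashrc gives bashrc. If you want os.path.splitext's behaviour instead, one sketch is to map it over the column; the extra file names below are made up for illustration.

```python
import os
import pandas as pd

df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe", "README", ".bashrc"]})

# splitext returns '' when there is no extension; lstrip removes the leading dot
df["FileType"] = df["FileName"].map(lambda name: os.path.splitext(name)[1].lstrip("."))
print(df)
```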
Get file name along with extension from path
The path you are using contains backslashes, which Python treats as escape characters in an ordinary string literal. Write the path as a raw string (r'...') first, and then split it on the backslash, written '\\':
>>> r'\home\lancaster\Downloads\a.ppt'.split('\\')[-1]
'a.ppt'
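If you would rather not split on the separator yourself, the standard library's ntpath module (the Windows flavour of os.path, importable on any OS) and pathlib's PureWindowsPath both understand backslash-separated paths. A sketch:

```python
import ntpath
from pathlib import PureWindowsPath

path = r'\home\lancaster\Downloads\a.ppt'  # raw string keeps the backslashes literal

print(ntpath.basename(path))       # a.ppt
print(PureWindowsPath(path).name)  # a.ppt
```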
What's the way to extract file extension from file name in Python?
import os
def splitext(path):
    # Check known multi-part extensions before falling back to os.path.splitext
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)
assert splitext('20090209.02s1.1_sequence.txt')[1] == '.txt'
assert splitext('SRR002321.fastq.bz2')[1] == '.bz2'
assert splitext('hello.tar.gz')[1] == '.tar.gz'
assert splitext('ok.txt')[1] == '.txt'
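To return the extension without the leading dot:
import os
def splitext(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            path, ext = path[:-len(ext)], path[-len(ext):]
            break
    else:
        # The loop found no multi-part extension, so use the standard split
        path, ext = os.path.splitext(path)
    return path, ext[1:]  # drop the leading '.'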
assert splitext('20090209.02s1.1_sequence.txt')[1] == 'txt'
assert splitext('SRR002321.fastq.bz2')[1] == 'bz2'
assert splitext('hello.tar.gz')[1] == 'tar.gz'
assert splitext('ok.txt')[1] == 'txt'
Specific file extension in os.walk()
os.walk() requires a single directory argument, so you can't use wildcards. You could filter the contents of filenames, but it's probably easier to just do this:
import os
path = "/path/to/directory/"
for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
        if not filename.endswith(".mp4"):
            continue  # skip anything that isn't an .mp4 file
        file = os.path.join(dirpath, filename)
        folder = os.path.basename(dirpath)
        filesize = os.path.getsize(file)
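Filtering filenames with a wildcard is also possible: fnmatch.filter applies a shell-style pattern such as *.mp4 to a list of names. A sketch with a hypothetical find_files helper; note that fnmatch normalises case with os.path.normcase, so matching is case-insensitive on Windows but case-sensitive on POSIX.

```python
import fnmatch
import os

def find_files(root, pattern):
    """Yield the path of every file under root whose name matches pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Keep only the names that match the wildcard, e.g. "*.mp4"
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)
```

Usage would look like for file in find_files("/path/to/directory/", "*.mp4"): followed by whatever processing you need for each path.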
Alternatively, you could use the more modern and preferred pathlib; this will find all .mp4 files recursively:
from pathlib import Path
path = "/path/to/directory/"
for file in Path(path).rglob("*.mp4"):
    ...  # process each .mp4 file here
Each file object will have attributes and methods you can use to obtain information about the file, e.g. file.name, file.stat(), etc.
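As a sketch of how those attributes map onto the folder and filesize values from the os.walk version, here is a hypothetical helper that collects the parent folder name and size of each match. The is_file() check guards against a directory whose name happens to end in .mp4, since rglob can match directories too.

```python
from pathlib import Path

def describe_files(root, pattern="*.mp4"):
    """Return (path, parent folder name, size in bytes) for each matching file."""
    results = []
    for file in Path(root).rglob(pattern):
        if file.is_file():
            results.append((file, file.parent.name, file.stat().st_size))
    return results
```

Calling describe_files("/path/to/directory/") would then return one tuple per .mp4 file found.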