Extracting Extension from Filename in Python


Use os.path.splitext:

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
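On Python 3.4+, pathlib exposes the same split as properties. A minimal sketch, using PurePosixPath so the paths are parsed as POSIX strings and the results do not depend on the operating system:

```python
from pathlib import PurePosixPath

# PurePosixPath parses the string only; it never touches the filesystem.
p = PurePosixPath("/path/to/somefile.ext")
suffix = p.suffix   # '.ext'
stem = p.stem       # 'somefile' (final component only, directory dropped)

# The same edge cases that os.path.splitext gets right:
dir_dot = PurePosixPath("/a/b.c/d").suffix   # '' (the dot is in a directory name)
hidden = PurePosixPath(".bashrc").suffix     # '' (a leading dot is not a suffix)
```

Note that stem drops the directory, whereas os.path.splitext(...)[0] keeps it.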

How to get file extension correctly?

The role of a file extension is to tell the viewer (and sometimes the computer) which application to use to handle the file.

Take the worst-case example, a.ppt.tar.gz. This is a PowerPoint file that has been tarred and then gzipped, so you need a gzip-handling program to open it; neither PowerPoint nor a tar-handling program would work on it directly. A clever program that understands both .tar and .gz could unwrap both layers of a .tar.gz file, but note that it would do so even if the extension were simply .gz.

The fact that both tar and gzip add their extensions to the original filename, rather than replacing it (as zip does), is a convenience. But the base name of the gzip file is still a.ppt.tar.
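To see the layers programmatically, a small sketch: pathlib lists every dot-separated suffix at once, while os.path.splitext peels off only the outermost one per call, much as gzip and then tar would unwrap the file.

```python
import os
from pathlib import PurePosixPath

name = "a.ppt.tar.gz"

# Every suffix, innermost first, outermost last.
all_suffixes = PurePosixPath(name).suffixes   # ['.ppt', '.tar', '.gz']

# Peel one layer at a time with os.path.splitext until nothing is left.
layers = []
root, ext = os.path.splitext(name)
while ext:
    layers.append(ext)
    root, ext = os.path.splitext(root)
# layers == ['.gz', '.tar', '.ppt'] and root == 'a'
```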

How do I get the filename without the extension from a path in Python?

Getting the name of the file without the extension:

import os
print(os.path.splitext("/path/to/some/file.txt")[0])

Prints:

/path/to/some/file

Documentation for os.path.splitext.

Important Note: If the filename has multiple dots, only the extension after the last one is removed. For example:

import os
print(os.path.splitext("/path/to/some/file.txt.zip.asc")[0])

Prints:

/path/to/some/file.txt.zip

See other answers below if you need to handle that case.
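One way to remove every extension is to call os.path.splitext repeatedly until it finds nothing more. A minimal sketch; strip_all_extensions is a hypothetical helper name, not a standard library function:

```python
import os

def strip_all_extensions(path):
    """Repeatedly apply os.path.splitext until no extension is left."""
    root, ext = os.path.splitext(path)
    while ext:
        path = root
        root, ext = os.path.splitext(path)
    return path

result = strip_all_extensions("/path/to/some/file.txt.zip.asc")  # '/path/to/some/file'
```

Dotfiles such as .bashrc survive intact, but any dot counts as an extension boundary, so a name like report.v1.2.pdf is reduced to report.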

How can I replace (or strip) an extension from a filename in Python?

Try os.path.splitext; it should do what you want:

import os
print(os.path.splitext('/home/user/somefile.txt')[0] + '.jpg')  # /home/user/somefile.jpg
os.path.splitext('/home/user/somefile.txt')  # returns ('/home/user/somefile', '.txt')
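With pathlib, PurePath.with_suffix does the replacement in one call, and passing an empty string strips the extension instead. A sketch using PurePosixPath so the output is the same on every OS:

```python
from pathlib import PurePosixPath

original = PurePosixPath("/home/user/somefile.txt")
replaced = original.with_suffix(".jpg")   # PurePosixPath('/home/user/somefile.jpg')
stripped = original.with_suffix("")       # PurePosixPath('/home/user/somefile')
```

Like os.path.splitext, with_suffix only touches the last extension.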

Extracting the file extensions from file names in pandas

Option 1: apply

df['FileType'] = df.FileName.apply(lambda x: x.split('.')[-1])

Option 2: use str twice

df['FileType'] = df.FileName.str.split('.').str[-1]

Option 2b: use rsplit (thanks @cᴏʟᴅsᴘᴇᴇᴅ)

df['FileType'] = df.FileName.str.rsplit('.', n=1).str[-1]

All of them give:

      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe

Python 3.6.4, Pandas 0.22.0
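All three options take the text after the last dot, so a name with no dot comes back unchanged and a dotfile such as .bashrc yields 'bashrc'. If that matters, one option is to delegate to os.path.splitext per row. A sketch with made-up sample data (README and .bashrc are illustrative additions):

```python
import os
import pandas as pd

df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe", "README", ".bashrc"]})

# Naive split: no dot returns the whole name; a dotfile loses only its leading dot.
naive = df.FileName.str.split(".").str[-1].tolist()

# os.path.splitext returns '' when there is no extension; [1:] drops the dot.
df["FileType"] = df.FileName.map(lambda name: os.path.splitext(name)[1][1:])
```

Here naive is ['txt', 'exe', 'README', 'bashrc'], while FileType holds 'txt', 'exe', '' and ''.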

Get file name along with extension from path

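The path you are using contains backslashes ('\'), which Python treats as escape characters in ordinary string literals (for example, '\a' is the bell character). Write the path as a raw string, then split it on the backslash, which must be written '\\' inside a normal string literal: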

>>> r'\home\lancaster\Downloads\a.ppt'.split('\\')[-1]
'a.ppt'
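Instead of splitting by hand, the standard library can parse Windows paths on any OS: ntpath is the Windows flavour of os.path, and pathlib.PureWindowsPath parses a Windows path without touching the filesystem (it also accepts '/' as a separator). A short sketch:

```python
import ntpath
from pathlib import PureWindowsPath

win_path = r"\home\lancaster\Downloads\a.ppt"

# ntpath is importable on every platform, not just Windows.
name_ntpath = ntpath.basename(win_path)   # 'a.ppt'

p = PureWindowsPath(win_path)
name_pathlib = p.name                     # 'a.ppt'
suffix = p.suffix                         # '.ppt'
```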

What's the way to extract file extension from file name in Python?

import os

def splitext(path):
    # Check known compound extensions first; fall back to the last extension.
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)

assert splitext('20090209.02s1.1_sequence.txt')[1] == '.txt'
assert splitext('SRR002321.fastq.bz2')[1] == '.bz2'
assert splitext('hello.tar.gz')[1] == '.tar.gz'
assert splitext('ok.txt')[1] == '.txt'

To drop the leading dot:

import os

def splitext(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            path, ext = path[:-len(ext)], path[-len(ext):]
            break
    else:
        # No compound extension matched: use the standard single split.
        path, ext = os.path.splitext(path)
    return path, ext[1:]

assert splitext('20090209.02s1.1_sequence.txt')[1] == 'txt'
assert splitext('SRR002321.fastq.bz2')[1] == 'bz2'
assert splitext('hello.tar.gz')[1] == 'tar.gz'
assert splitext('ok.txt')[1] == 'txt'
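The endswith checks above are case-sensitive, so a name like Backup.TAR.GZ falls through to '.GZ'. A sketch of a case-insensitive variant; COMPOUND_EXTS and splitext_ci are hypothetical names, and the extension list is an assumption to extend as needed. It also shows why a fixed list is used at all: joining every pathlib suffix treats any dot as a boundary.

```python
import os
from pathlib import PurePath

# Hypothetical list of compound extensions; adjust for your data.
COMPOUND_EXTS = (".tar.gz", ".tar.bz2", ".tar.xz")

def splitext_ci(path):
    """Like splitext above, but matches compound extensions case-insensitively."""
    lowered = path.lower()
    for ext in COMPOUND_EXTS:
        if lowered.endswith(ext):
            # Slice the original string so the caller keeps its casing.
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)

# Naive alternative: every dot becomes an extension boundary.
naive = "".join(PurePath("20090209.02s1.1_sequence.txt").suffixes)  # '.02s1.1_sequence.txt'
```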

Specific file extension in os.walk()

os.walk() takes a single directory argument, so you can't pass it a wildcard. Instead, filter the filenames list yourself inside the loop:

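import os

path = "/path/to/directory/"
for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
        # Skip anything that is not an .mp4 file.
        if not filename.endswith(".mp4"):
            continue
        file = os.path.join(dirpath, filename)
        # normpath strips the trailing slash so basename isn't '' for the top directory.
        folder = os.path.basename(os.path.normpath(dirpath))
        filesize = os.path.getsize(file)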

Alternatively, you could use the more modern and preferred pathlib; this will find all .mp4 files recursively:

from pathlib import Path

path = "/path/to/directory/"
for file in Path(path).rglob("*.mp4"):
    ...  # work with each matching file here

Each file object will have attributes and methods you can use to obtain information about the file, e.g. file.name, file.stat(), etc.
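The endswith(".mp4") check in the os.walk loop matches only one lowercase extension. str.endswith also accepts a tuple, which makes a multi-extension, case-insensitive filter easy. A minimal sketch; find_videos and its default extensions are illustrative assumptions, and the temporary directory only exists to demonstrate it:

```python
import os
import tempfile

def find_videos(root, extensions=(".mp4", ".mkv")):
    """Yield paths under root whose extension matches, ignoring case."""
    wanted = tuple(ext.lower() for ext in extensions)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(wanted):
                yield os.path.join(dirpath, filename)

# Demo: build a throwaway tree and collect matches relative to its root.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["a.mp4", "b.MP4", "notes.txt", os.path.join("sub", "c.mkv")]:
        open(os.path.join(tmp, rel), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in find_videos(tmp))
```

Here found contains a.mp4, b.MP4 and sub/c.mkv, but not notes.txt.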



Related Topics



Leave a reply



Submit