How to split a dos path into its components in Python
I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.
(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)
You can get the drive and path+file like this:
drive, path_and_file = os.path.splitdrive(path)
Get the path and the file:
path, file = os.path.split(path_and_file)
Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:
folders = []
while True:
    path, folder = os.path.split(path)
    if folder != "":
        folders.append(folder)
    else:
        # Nothing left to split: keep a leading root (if any) and stop.
        if path != "":
            folders.append(path)
        break
folders.reverse()
(This leaves a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
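The three steps above can be combined into one helper. This is only a minimal sketch: the name split_dos_path is made up for illustration, and it uses ntpath (the Windows flavour of os.path) on the assumption that you want Windows rules applied even when the script runs on another platform.

```python
import ntpath  # Windows path rules, importable on any platform


def split_dos_path(path):
    """Return (drive, folders, filename) for a DOS/Windows-style path.

    Hypothetical helper: it simply chains the steps described above.
    """
    drive, path_and_file = ntpath.splitdrive(path)
    folder_path, filename = ntpath.split(path_and_file)

    folders = []
    while True:
        folder_path, folder = ntpath.split(folder_path)
        if folder != "":
            folders.append(folder)
        else:
            # Unsplittable remainder: either a root such as "\" or "".
            if folder_path != "":
                folders.append(folder_path)
            break
    folders.reverse()
    return drive, folders, filename


print(split_dos_path(r"C:\Users\john\foo.txt"))
```

For a relative path such as docs\readme.txt the drive comes back as an empty string and the folder list has no leading "\" entry.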
cross-platform splitting of path in python
The OP specified "will work with Windows paths too". There are a few wrinkles with Windows paths.
Firstly, Windows has the concept of multiple drives, each with its own current working directory, and 'c:foo' and 'c:\\foo' are often not the same. Consequently it is a very good idea to separate out any drive designator first, using os.path.splitdrive(). Reassembling the path (if required) can then be done correctly with drive + os.path.join(*other_pieces).
Secondly, Windows paths can contain slashes or backslashes or a mixture. Consequently, using os.sep when parsing an unnormalised path is not useful.
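Both wrinkles can be seen in a few lines. This is a sketch that uses ntpath so Windows rules apply on any platform; the rejoin helper and the sample paths are illustrative assumptions, not part of the original answer.

```python
import ntpath  # Windows flavour of os.path, usable on any platform

# 'c:foo' is relative to the current directory of drive c:, 'c:\\foo' is rooted.
relative_split = ntpath.splitdrive("c:foo")
absolute_split = ntpath.splitdrive("c:\\foo")
print(relative_split, absolute_split)


def rejoin(drive, pieces):
    """Hypothetical helper: reassemble as drive + join(*pieces)."""
    if not pieces:
        return drive
    return drive + ntpath.join(*pieces)


print(rejoin("c:", ["\\", "users", "john"]))
print(rejoin("c:", ["foo"]))

# Splitting an unnormalised path on the separator misses the forward slashes.
mixed = "c:/users\\john/foo.txt"
naive = mixed.split(ntpath.sep)
normalised = ntpath.normpath(mixed).split(ntpath.sep)
print(naive)
print(normalised)
```

The naive split leaves 'c:/users' and 'john/foo.txt' glued together, which is why normalising (or using os.path.split) comes first.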
More generally:
The results produced for 'foo' and 'foo/' should not be identical.
The loop termination condition seems to be best expressed as "os.path.split() treated its input as unsplittable".
Here's a suggested solution, with tests, including a comparison with @Spacedman's solution:
import os.path

# print() is called with a single pre-formatted string throughout, so the
# output is the same on Python 2 and Python 3.

def os_path_split_asunder(path, debug=False):
    parts = []
    while True:
        newpath, tail = os.path.split(path)
        if debug:
            print("%r %r" % (path, (newpath, tail)))
        if newpath == path:
            # os.path.split() treated its input as unsplittable: stop here.
            assert not tail
            if path:
                parts.append(path)
            break
        parts.append(tail)
        path = newpath
    parts.reverse()
    return parts

def spacedman_parts(path):
    components = []
    while True:
        (path, tail) = os.path.split(path)
        if not tail:
            return components
        components.insert(0, tail)

if __name__ == "__main__":
    tests = [
        '',
        'foo',
        'foo/',
        'foo\\',
        '/foo',
        '\\foo',
        'foo/bar',
        '/',
        'c:',
        'c:/',
        'c:foo',
        'c:/foo',
        'c:/users/john/foo.txt',
        '/users/john/foo.txt',
        'foo/bar/baz/loop',
        'foo/bar/baz/',
        '//hostname/foo/bar.txt',
        ]
    for i, test in enumerate(tests):
        print("\nTest %d: %r" % (i, test))
        drive, path = os.path.splitdrive(test)
        print("drive, path %r %r" % (drive, path))
        a = os_path_split_asunder(path)
        b = spacedman_parts(path)
        print("a ... %r" % a)
        print("b ... %r" % b)
        print(a == b)
and here's the output (Python 2.7.1, Windows 7 Pro):
Test 0: ''
drive, path '' ''
a ... []
b ... []
True
Test 1: 'foo'
drive, path '' 'foo'
a ... ['foo']
b ... ['foo']
True
Test 2: 'foo/'
drive, path '' 'foo/'
a ... ['foo', '']
b ... []
False
Test 3: 'foo\\'
drive, path '' 'foo\\'
a ... ['foo', '']
b ... []
False
Test 4: '/foo'
drive, path '' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False
Test 5: '\\foo'
drive, path '' '\\foo'
a ... ['\\', 'foo']
b ... ['foo']
False
Test 6: 'foo/bar'
drive, path '' 'foo/bar'
a ... ['foo', 'bar']
b ... ['foo', 'bar']
True
Test 7: '/'
drive, path '' '/'
a ... ['/']
b ... []
False
Test 8: 'c:'
drive, path 'c:' ''
a ... []
b ... []
True
Test 9: 'c:/'
drive, path 'c:' '/'
a ... ['/']
b ... []
False
Test 10: 'c:foo'
drive, path 'c:' 'foo'
a ... ['foo']
b ... ['foo']
True
Test 11: 'c:/foo'
drive, path 'c:' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False
Test 12: 'c:/users/john/foo.txt'
drive, path 'c:' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False
Test 13: '/users/john/foo.txt'
drive, path '' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False
Test 14: 'foo/bar/baz/loop'
drive, path '' 'foo/bar/baz/loop'
a ... ['foo', 'bar', 'baz', 'loop']
b ... ['foo', 'bar', 'baz', 'loop']
True
Test 15: 'foo/bar/baz/'
drive, path '' 'foo/bar/baz/'
a ... ['foo', 'bar', 'baz', '']
b ... []
False
Test 16: '//hostname/foo/bar.txt'
drive, path '' '//hostname/foo/bar.txt'
a ... ['//', 'hostname', 'foo', 'bar.txt']
b ... ['hostname', 'foo', 'bar.txt']
False
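For comparison, Python 3's standard pathlib module offers a ready-made decomposition through PureWindowsPath.parts. This is a different technique from the loop above, shown only as a sketch; the windows_parts wrapper is a made-up name. Note that pathlib folds the drive and root into the first component and drops a trailing slash, so 'foo' and 'foo/' come out identical, which goes against the recommendation made earlier.

```python
from pathlib import PureWindowsPath  # pure paths: no filesystem access, any OS


def windows_parts(raw):
    """Hypothetical wrapper: return (drive, parts) under Windows rules."""
    p = PureWindowsPath(raw)
    return p.drive, p.parts


for raw in ["foo", "foo/", "c:foo", "c:/users/john/foo.txt",
            "//hostname/foo/bar.txt"]:
    print(repr(raw), windows_parts(raw))
```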
Extract some portion of path
You can split the path on \ and join the last three pieces:
'\\'.join(path_name.split("\\")[-3:])
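As a quick worked example of that one-liner, here is a small sketch. The sample path and the tail_components name are invented for illustration, and it assumes the path uses backslashes only.

```python
def tail_components(path_name, n=3):
    """Hypothetical helper: keep the last n backslash-separated pieces."""
    return "\\".join(path_name.split("\\")[-n:])


sample = r"C:\Users\as\Desktop\Data\report.txt"  # made-up path
print(tail_components(sample))
```

With the sample above this prints Desktop\Data\report.txt.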
error with splitting file path string by / in python
You can split by the OS path separator:
import os
import glob
path = r'C://Users/Alexander/Desktop/test/*.txt'
for file in glob.glob(path):
    name = file.split(os.path.sep)[-1]
    name2 = name.split(".")[0]
    print(name2)
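An alternative sketch lets os.path do both steps: os.path.basename takes the last component and os.path.splitext removes only the final extension. The stem_of name and the sample filename are assumptions for illustration. Unlike name.split(".")[0], this keeps dots inside the name.

```python
import os.path


def stem_of(file_path):
    """Hypothetical helper: file name without directory or final extension."""
    return os.path.splitext(os.path.basename(file_path))[0]


sample = "C:/Users/Alexander/Desktop/test/report.v2.txt"  # made-up file
print(stem_of(sample))                          # report.v2
print(sample.split("/")[-1].split(".")[0])      # report
```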
How to slice the file path in Python
If you just want to split on elements, then use this.
>>> path = 'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/elements/MexicoCity-Part1/'
>>> path.split('elements')[0]
'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/'
One drawback of this approach is that it'll fail if you encounter the word elements in your path multiple times. In that case, you can do something like:
>>> '/'.join(path.split('/')[:-3]) + '/'
'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/'
Assuming you know the depth of the path you need.
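If the depth is not known in advance, one possible variation is to look for a path component that is exactly elements rather than the substring. This is a sketch with a made-up function name (cut_at_component), and it assumes forward slashes as in the example above.

```python
def cut_at_component(path, name):
    """Hypothetical helper: return everything before the first component
    that equals `name` exactly. Raises ValueError if there is none."""
    parts = path.split("/")
    idx = parts.index(name)
    return "/".join(parts[:idx]) + "/"


path = ("C:/Users/arul/Desktop/jobs/project_folder/shots/"
        "shot_folder/elements/MexicoCity-Part1/")
print(cut_at_component(path, "elements"))

# A folder that merely contains the word does not trigger the cut.
tricky = "C:/elements_old/shots/elements/x/"
print(cut_at_component(tricky, "elements"))   # C:/elements_old/shots/
print(tricky.split("elements")[0])            # C:/
```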
Python: How to get last n elements of a file path
import os
path = r"/folderA/folderB/folderC/item1"
For Windows:
os.sep.join(path.rsplit(r"/")[-2:])
Operating system independent:
os.sep.join(os.path.normpath(path).split(os.sep)[-2:])
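The operating-system-independent version can be wrapped in a small function. This is a sketch; last_components is a made-up name, and the guard for n <= 0 is an addition because a slice of [-0:] would return every component.

```python
import os


def last_components(path, n):
    """Hypothetical helper: last n components, joined with os.sep."""
    if n <= 0:
        return ""
    parts = os.path.normpath(path).split(os.sep)
    return os.sep.join(parts[-n:])


print(last_components("/folderA/folderB/folderC/item1", 2))
# normpath also collapses doubled slashes and ".." before the split.
print(last_components("/folderA//folderB/../folderC/item1", 3))
```

The separator in the result follows the platform the code runs on.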
Path manipulation in python
os.path.dirname() is the way to go. It returns the directory that contains the object the path points to:
>>> import os.path
>>> os.path.dirname('/path1/path2/path3/file')
'/path1/path2/path3'
In this case, you want the "grandparent" directory, so just use the function twice:
>>> parent = os.path.dirname('/path1/path2/path3/file')
>>> os.path.dirname(parent)
'/path1/path2'
If you want to do it an arbitrary number of times, a function can be helpful here:
def go_up(path, n):
    for i in range(n):
        path = os.path.dirname(path)
    return path
Here are some examples:
>>> go_up('/path1/path2/path3/file', 1)
'/path1/path2/path3'
>>> go_up('/path1/path2/path3/file', 2)
'/path1/path2'
>>> go_up('/path1/path2/path3/file', 3)
'/path1'
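On Python 3, pathlib exposes the same idea through the parents sequence. This is an alternative to the dirname loop, not what the answer above uses; go_up_pathlib is a made-up name and PurePosixPath is chosen on the assumption that the paths are POSIX-style like the examples.

```python
from pathlib import PurePosixPath


def go_up_pathlib(path, n):
    """Hypothetical helper: ancestor n levels above `path`, as a string."""
    p = PurePosixPath(path)
    if n <= 0:
        return str(p)
    # parents[0] is the containing directory, parents[1] the grandparent, ...
    return str(p.parents[n - 1])


for n in (1, 2, 3):
    print(go_up_pathlib("/path1/path2/path3/file", n))
```

One difference in behaviour: asking for more levels than exist raises IndexError here, whereas repeated os.path.dirname calls simply stay at the root.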
Split string by escape character
Prefix your string with r - that will turn it into a raw string, telling Python that \ is a literal \.
s = r"C:\Users\as\Desktop\Data\pdf\txt\RTX_IDS_1DYS_20170610_0000_220279611-650000624200.txt"
parts = s.split("\\")
print(parts)
Output:
['C:', 'Users', 'as', 'Desktop', 'Data', 'pdf', 'txt', 'RTX_IDS_1DYS_20170610_0000_220279611-650000624200.txt']
For more information on string prefixes see:
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
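A short sketch of what the r prefix changes, using made-up strings: the raw literal and the doubled-backslash literal are the same string, while an unescaped \t in a normal literal silently becomes a tab.

```python
escaped = "C:\\Users\\as\\Desktop"   # normal literal, backslashes doubled
raw = r"C:\Users\as\Desktop"         # raw literal, backslashes as typed
print(escaped == raw)                # True

tab_interpreted = "a\tb"             # \t is one tab character
tab_literal = r"a\tb"                # backslash and 't' are kept
print(len(tab_interpreted), len(tab_literal))   # 3 4

print(raw.split("\\"))               # ['C:', 'Users', 'as', 'Desktop']
```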
splitting a full url into parts
The urlparse function, found in urllib.parse in Python 3, is designed for this. Example adapted from the documentation:
>>> from urllib.parse import urlparse
>>> o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')
>>> o
ParseResult(scheme='https', netloc='api.somedomain.co.uk', path='/api/addresses', params='', query='postcode=XXSDF&houseNo=34', fragment='')
>>> o.scheme
'https'
>>> print(o.port)
None
>>> o.geturl()
'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'
In order to get host, path and query, the API is straightforward:
>>> print(o.hostname, o.path, o.query)
Returns:
api.somedomain.co.uk /api/addresses postcode=XXSDF&houseNo=34
In order to get the subdomain itself, the only way seems to be to split the hostname on '.'.
Note that urllib.parse.urlsplit should generally be used instead of urlparse, according to the documentation (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit):
This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
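Putting those pieces together with urlsplit might look like the sketch below. Treating the first hostname label as the subdomain is an assumption that only holds when you already know the registered domain (here somedomain.co.uk); nothing in urllib works that out for you.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34"
parts = urlsplit(url)

print(parts.scheme, parts.hostname, parts.path)

# Query string as a dict of lists (a key may appear more than once).
query = parse_qs(parts.query)
print(query)

# Naive subdomain guess: split the hostname on '.' and take the first label.
labels = parts.hostname.split(".")
print(labels[0])
```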