How to Split a DOS Path into Its Components in Python

I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)
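As a quick check of the two calls above, here is a minimal sketch. It imports ntpath (the Windows flavour of os.path) directly so that it behaves the same on any platform; on Windows itself, os.path would do the same thing. The sample path is made up for illustration.

```python
import ntpath  # the Windows implementation of os.path, importable on any OS

# Hypothetical sample path, used only for illustration.
full_path = r"C:\Users\john\foo.txt"

drive, path_and_file = ntpath.splitdrive(full_path)
folder, filename = ntpath.split(path_and_file)

print(drive)          # C:
print(path_and_file)  # \Users\john\foo.txt
print(folder)         # \Users\john
print(filename)       # foo.txt
```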

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

folders = []
while True:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    else:
        # Nothing left to split off: keep the root (if any) and stop.
        if path != "":
            folders.append(path)
        break

folders.reverse()

(This puts a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
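On Python 3.4 and later, the standard pathlib module can do the same job in one step. The sketch below is an alternative to the loop above, not part of the original answer. It uses PureWindowsPath so the result does not depend on the machine it runs on, and the sample paths are made up.

```python
from pathlib import PureWindowsPath

# Hypothetical sample paths, used only for illustration.
absolute = PureWindowsPath(r"C:\Users\john\foo.txt")
relative = PureWindowsPath(r"Users\john/foo.txt")  # mixed separators

print(absolute.parts)  # ('C:\\', 'Users', 'john', 'foo.txt')
print(relative.parts)  # ('Users', 'john', 'foo.txt')
print(absolute.drive)  # C:
```

Note that pathlib keeps the drive and the root together as the first element ('C:\\'), whereas the loop above works on the path after the drive has been split off.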

Cross-platform splitting of path in Python

The OP specified "will work with Windows paths too". There are a few wrinkles with Windows paths.

Firstly, Windows has the concept of multiple drives, each with its own current working directory, and 'c:foo' and 'c:\\foo' are often not the same. Consequently it is a very good idea to separate out any drive designator first, using os.path.splitdrive(). Then reassembling the path (if required) can be done correctly by
drive + os.path.join(*other_pieces)

Secondly, Windows paths can contain slashes or backslashes or a mixture. Consequently, using os.sep when parsing an unnormalised path is not useful.
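To make both points concrete, here is a small sketch under a few assumptions: the path is made up, and ntpath is imported directly so the example gives the same result on any platform (on Windows, os.path is the same module).

```python
import ntpath  # Windows flavour of os.path; works on any platform

# Hypothetical path that mixes both separators.
mixed = "c:/users\\john/foo.txt"

# Splitting on a single separator leaves some components glued together.
print(mixed.split("\\"))  # ['c:/users', 'john/foo.txt']

# Peel off the drive first, then normalise the rest before splitting.
drive, rest = ntpath.splitdrive(mixed)
pieces = ntpath.normpath(rest).split("\\")
print(drive, pieces)  # c: ['', 'users', 'john', 'foo.txt']

# Reassemble as drive + join(*other_pieces); the leading "\\" restores the root.
rebuilt = drive + ntpath.join("\\", *pieces[1:])
print(rebuilt)  # c:\users\john\foo.txt
```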

More generally:

The results produced for 'foo' and 'foo/' should not be identical.

The loop termination condition seems to be best expressed as "os.path.split() treated its input as unsplittable".

Here's a suggested solution, with tests, including a comparison with @Spacedman's solution:

import os.path

# The print() calls below each take a single pre-formatted string,
# so the script runs unchanged on Python 2 and Python 3.

def os_path_split_asunder(path, debug=False):
    # Keep splitting off the last component until os.path.split()
    # returns its input unchanged, i.e. treats it as unsplittable.
    parts = []
    while True:
        newpath, tail = os.path.split(path)
        if debug:
            print("%r %r" % (path, (newpath, tail)))
        if newpath == path:
            assert not tail
            if path:
                parts.append(path)
            break
        parts.append(tail)
        path = newpath
    parts.reverse()
    return parts

def spacedman_parts(path):
    # @Spacedman's version: stops as soon as the tail is empty.
    components = []
    while True:
        (path, tail) = os.path.split(path)
        if not tail:
            return components
        components.insert(0, tail)

if __name__ == "__main__":
    tests = [
        '',
        'foo',
        'foo/',
        'foo\\',
        '/foo',
        '\\foo',
        'foo/bar',
        '/',
        'c:',
        'c:/',
        'c:foo',
        'c:/foo',
        'c:/users/john/foo.txt',
        '/users/john/foo.txt',
        'foo/bar/baz/loop',
        'foo/bar/baz/',
        '//hostname/foo/bar.txt',
    ]
    for i, test in enumerate(tests):
        print("\nTest %d: %r" % (i, test))
        drive, path = os.path.splitdrive(test)
        print("drive, path %r %r" % (drive, path))
        a = os_path_split_asunder(path)
        b = spacedman_parts(path)
        print("a ... %r" % a)
        print("b ... %r" % b)
        print(a == b)

and here's the output (Python 2.7.1, Windows 7 Pro):

Test 0: ''
drive, path '' ''
a ... []
b ... []
True

Test 1: 'foo'
drive, path '' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 2: 'foo/'
drive, path '' 'foo/'
a ... ['foo', '']
b ... []
False

Test 3: 'foo\\'
drive, path '' 'foo\\'
a ... ['foo', '']
b ... []
False

Test 4: '/foo'
drive, path '' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 5: '\\foo'
drive, path '' '\\foo'
a ... ['\\', 'foo']
b ... ['foo']
False

Test 6: 'foo/bar'
drive, path '' 'foo/bar'
a ... ['foo', 'bar']
b ... ['foo', 'bar']
True

Test 7: '/'
drive, path '' '/'
a ... ['/']
b ... []
False

Test 8: 'c:'
drive, path 'c:' ''
a ... []
b ... []
True

Test 9: 'c:/'
drive, path 'c:' '/'
a ... ['/']
b ... []
False

Test 10: 'c:foo'
drive, path 'c:' 'foo'
a ... ['foo']
b ... ['foo']
True

Test 11: 'c:/foo'
drive, path 'c:' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False

Test 12: 'c:/users/john/foo.txt'
drive, path 'c:' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 13: '/users/john/foo.txt'
drive, path '' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False

Test 14: 'foo/bar/baz/loop'
drive, path '' 'foo/bar/baz/loop'
a ... ['foo', 'bar', 'baz', 'loop']
b ... ['foo', 'bar', 'baz', 'loop']
True

Test 15: 'foo/bar/baz/'
drive, path '' 'foo/bar/baz/'
a ... ['foo', 'bar', 'baz', '']
b ... []
False

Test 16: '//hostname/foo/bar.txt'
drive, path '' '//hostname/foo/bar.txt'
a ... ['//', 'hostname', 'foo', 'bar.txt']
b ... ['hostname', 'foo', 'bar.txt']
False
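The output above was recorded on Python 2.7.1. On current Python 3 releases, ntpath.splitdrive() also recognises UNC paths and treats the host and share as the drive, so Test 16 comes out differently there. A minimal sketch, calling ntpath directly so it runs on any platform:

```python
import ntpath  # Windows flavour of os.path; works on any platform

drive, path = ntpath.splitdrive("//hostname/foo/bar.txt")
print(repr(drive), repr(path))  # '//hostname/foo' '/bar.txt'
```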

Extract some portion of path

You can split the path on \ and join the parts you want, for example the last three:

'\\'.join(path_name.split("\\")[-3:])

Error with splitting file path string by / in Python

You can split by the OS path separator:

import os
import glob

path = r'C://Users/Alexander/Desktop/test/*.txt'
for file in glob.glob(path):
    name = file.split(os.path.sep)[-1]
    name2 = name.split(".")[0]
    print(name2)
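One caveat: name.split(".")[0] keeps only the text before the first dot, so a file such as report.final.txt would come out as report. A possible alternative is os.path.basename() together with os.path.splitext(). The sketch below uses ntpath (the Windows flavour of os.path) so it runs anywhere, and the file names are made up to resemble what glob.glob() might return on Windows.

```python
import ntpath  # Windows flavour of os.path; importable on any platform

# Hypothetical results of the kind glob.glob() might return on Windows.
matches = [
    "C:/Users/Alexander/Desktop/test\\notes.txt",
    "C:/Users/Alexander/Desktop/test\\report.final.txt",
]

stems = []
for match in matches:
    filename = ntpath.basename(match)            # handles both / and \
    stem, extension = ntpath.splitext(filename)  # splits at the last dot only
    stems.append(stem)

print(stems)  # ['notes', 'report.final']
```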

How to slice the file path in Python

If you just want to split on the word "elements", use this:

>>> path = 'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/elements/MexicoCity-Part1/'
>>> path.split('elements')[0]
'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/'

One drawback of this approach is that it'll fail if you encounter the word elements in your path multiple times. In that case, you can do something like:

>>> '/'.join(path.split('/')[:-3]) + '/'
'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/'

This assumes you know the depth of the path you need.
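If neither the depth nor the number of occurrences is known, another option is to look for a path component that is exactly 'elements'. The sketch below makes two assumptions that are not in the original answer: the last such component is the one you want, and the path is a Windows-style path (hence PureWindowsPath).

```python
from pathlib import PureWindowsPath

path = 'C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/elements/MexicoCity-Part1/'

parts = PureWindowsPath(path).parts
# Index of the last component that is exactly 'elements'
# (raises ValueError if there is no such component).
cut = len(parts) - 1 - parts[::-1].index('elements')
parent = PureWindowsPath(*parts[:cut])

print(parent.as_posix() + '/')
# C:/Users/arul/Desktop/jobs/project_folder/shots/shot_folder/
```

Because whole components are compared, a folder named something like my_elements would not be mistaken for the split point.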

Python: How to get last n elements of a file path

import os
path = r"/folderA/folderB/folderC/item1"

For Windows:

os.sep.join(path.rsplit("/", 2)[-2:])

Operating system independent:

os.sep.join(os.path.normpath(path).split(os.sep)[-2:])
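If you need this in more than one place, it could be wrapped in a small helper. The function name last_n is hypothetical, and the sketch uses PurePosixPath because the sample path is POSIX-style; for Windows paths, PureWindowsPath would be the counterpart.

```python
from pathlib import PurePosixPath

def last_n(path, n):
    """Return the last n components of a POSIX-style path (assumes n >= 1)."""
    parts = PurePosixPath(path).parts
    return str(PurePosixPath(*parts[-n:]))

print(last_n("/folderA/folderB/folderC/item1", 2))  # folderC/item1
```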

Path manipulation in python

os.path.dirname() is the way to go. It returns the directory that contains the object the path points to:

>>> import os.path
>>> os.path.dirname('/path1/path2/path3/file')
'/path1/path2/path3'

In this case, you want the "grandparent" directory, so just use the function twice:

>>> parent = os.path.dirname('/path1/path2/path3/file')
>>> os.path.dirname(parent)
'/path1/path2'

If you want to do it an arbitrary number of times, a function can be helpful here:

def go_up(path, n):
    for i in range(n):
        path = os.path.dirname(path)

    return path

Here are some examples:

>>> go_up('/path1/path2/path3/file', 1)
'/path1/path2/path3'
>>> go_up('/path1/path2/path3/file', 2)
'/path1/path2'
>>> go_up('/path1/path2/path3/file', 3)
'/path1'
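For comparison, pathlib exposes the same idea through the parent and parents attributes. This is a sketch of an alternative, not part of the original answer; PurePosixPath is used so it runs the same way on any platform.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/path1/path2/path3/file')

print(p.parent)      # /path1/path2/path3
print(p.parents[1])  # /path1/path2
print(p.parents[2])  # /path1
```

Here parents[n - 1] corresponds to go_up(path, n). One difference: indexing past the root raises IndexError, whereas repeated os.path.dirname() calls simply stay at '/'.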

Split string by escape character

Prefix your string with r. That turns it into a raw string, telling Python that \ is a literal \.

s = r"C:\Users\as\Desktop\Data\pdf\txt\RTX_IDS_1DYS_20170610_0000_220279611-650000624200.txt"
parts = s.split("\\")
print(parts)

Output:

['C:', 'Users', 'as', 'Desktop', 'Data', 'pdf', 'txt', 'RTX_IDS_1DYS_20170610_0000_220279611-650000624200.txt']

For more information on string prefixes see:

https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
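To see what the r prefix changes, here is a small sketch with a made-up path whose folder names happen to start with t and n, so the escapes \t and \n come into play:

```python
raw = r"C:\temp\new"        # backslashes kept exactly as written
escaped = "C:\\temp\\new"   # the same string, with each backslash escaped
unescaped = "C:\temp\new"   # \t and \n are read as a tab and a newline

print(raw == escaped)         # True
print(raw.split("\\"))        # ['C:', 'temp', 'new']
print(unescaped.split("\\"))  # ['C:\temp\new']
```

The last line prints a single element because the string no longer contains any backslashes to split on.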

Splitting a full URL into parts

The urlparse function, found in urllib.parse in Python 3, is designed for this. Example adapted from the documentation:

>>> from urllib.parse import urlparse
>>> o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')
>>> o
ParseResult(scheme='https', netloc='api.somedomain.co.uk', path='/api/addresses', params='', query='postcode=XXSDF&houseNo=34', fragment='')
>>> o.scheme
'https'
>>> print(o.port)
None
>>> o.geturl()
'https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34'

To get the host, path and query, the API is straightforward:

>>> print(o.hostname, o.path, o.query)

Returns:

api.somedomain.co.uk /api/addresses postcode=XXSDF&houseNo=34

To get the subdomain itself, the only way seems to be to split the hostname on '.'.
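A minimal sketch of that split, using the same URL as above. Treating the first label as the subdomain is an assumption that happens to fit this host. Splitting alone cannot tell a multi-level subdomain from a suffix such as co.uk.

```python
from urllib.parse import urlparse

o = urlparse('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')

labels = o.hostname.split('.')
print(labels)     # ['api', 'somedomain', 'co', 'uk']
print(labels[0])  # api
```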


Note that urllib.parse.urlsplit should be used instead of urlparse in some cases, according to the documentation (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit):

This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
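For this URL, urlsplit gives the same host, path and query as urlparse. The visible difference is that its result has no separate params field. A short sketch with the same example URL:

```python
from urllib.parse import urlsplit

parts = urlsplit('https://api.somedomain.co.uk/api/addresses?postcode=XXSDF&houseNo=34')

print(parts.scheme)    # https
print(parts.hostname)  # api.somedomain.co.uk
print(parts.path)      # /api/addresses
print(parts.query)     # postcode=XXSDF&houseNo=34
print(hasattr(parts, 'params'))  # False
```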


