Split a File Path into Folder Names Vector

You can do it with a simple recursive function, by terminating when the dirname doesn't change:

split_path <- function(x) {
  if (dirname(x) == x) x else c(basename(x), split_path(dirname(x)))
}

split_path("/home/foo/stats/index.html")
# [1] "index.html" "stats" "foo" "home" "/"
split_path("C:\\Windows\\System32")
# [1] "System32" "Windows" "C:/"
# "~" is expanded to the home directory, so this output depends on the machine
split_path("~")
# [1] "James" "Users" "C:/"
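
The stop-when-dirname-stops-changing trick is not specific to R. Here is a rough Python sketch of the same idea, assuming POSIX-style paths (it uses posixpath so the result does not depend on the host OS). The name split_path just mirrors the R function and is otherwise arbitrary:

```python
import posixpath


def split_path(path):
    """Return the components of a POSIX-style path, leaf first, root last."""
    parent = posixpath.dirname(path)
    if parent == path:
        # dirname no longer changes: this is "/" (absolute) or "" (relative)
        return [path]
    return [posixpath.basename(path)] + split_path(parent)


print(split_path("/home/foo/stats/index.html"))
# ['index.html', 'stats', 'foo', 'home', '/']
```

Note that for a relative path this sketch ends with an empty string rather than the "." that R's dirname produces.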

R Split character vector of file paths into list by parent directory?

You could combine split and dirname:

path <- c("/home/username/data/dir/GCZ98/GCZ98_1998_12_16.asc.gz",
          "/home/username/data/dir/GCZ98/GCZ98_1998_12_20.asc.gz",
          "/home/username/data/dir/GCZ99/GCZ99_1999_12_21.asc.gz",
          "/home/username/data/dir/GCZ99/GCZ99_1999_12_23.asc.gz",
          "/home/username/data/dir/GCZ99/GCZ99_1999_12_27.asc.gz",
          "/home/username/data/dir/GCZ99/GCZ99_1999_12_28.asc.gz")

## split by basedir
split(path, dirname(path))

# $`/home/username/data/dir/GCZ98`
# [1] "/home/username/data/dir/GCZ98/GCZ98_1998_12_16.asc.gz" "/home/username/data/dir/GCZ98/GCZ98_1998_12_20.asc.gz"
#
# $`/home/username/data/dir/GCZ99`
# [1] "/home/username/data/dir/GCZ99/GCZ99_1999_12_21.asc.gz" "/home/username/data/dir/GCZ99/GCZ99_1999_12_23.asc.gz" "/home/username/data/dir/GCZ99/GCZ99_1999_12_27.asc.gz"
# [4] "/home/username/data/dir/GCZ99/GCZ99_1999_12_28.asc.gz"
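
If you need the same grouping outside R, a dictionary keyed by the parent directory does roughly what split(path, dirname(path)) does. This is only a sketch: it assumes POSIX-style paths (hence posixpath), and the names paths and by_dir are made up for the example:

```python
import posixpath
from collections import defaultdict

# A few of the paths from the R example above
paths = [
    "/home/username/data/dir/GCZ98/GCZ98_1998_12_16.asc.gz",
    "/home/username/data/dir/GCZ98/GCZ98_1998_12_20.asc.gz",
    "/home/username/data/dir/GCZ99/GCZ99_1999_12_21.asc.gz",
]

# Map each parent directory to the list of paths found in it
by_dir = defaultdict(list)
for p in paths:
    by_dir[posixpath.dirname(p)].append(p)

for directory, files in by_dir.items():
    print(directory, len(files))
```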

How to split a dos path into its components in Python

I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)

You can get the drive and path+file like this:

import os
drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

path, file = os.path.split(path_and_file)
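
To see what those two calls return, here is a small sketch that uses ntpath (the Windows flavour of os.path, importable on any OS) so the behaviour is the same wherever you run it. The example path is made up for illustration:

```python
import ntpath  # Windows implementation of os.path, available on every platform

full = r"C:\Windows\System32\cmd.exe"  # illustrative path, not from the question

drive, path_and_file = ntpath.splitdrive(full)
folder, filename = ntpath.split(path_and_file)

print(drive)          # C:
print(path_and_file)  # \Windows\System32\cmd.exe
print(folder)         # \Windows\System32
print(filename)       # cmd.exe
```

On a Windows machine os.path is ntpath, so os.path.splitdrive and os.path.split behave the same way there.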

Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:

folders = []
while True:
    path, folder = os.path.split(path)

    if folder != "":
        folders.append(folder)
    else:
        # Nothing left to split off: keep the root (if any) and stop.
        if path != "":
            folders.append(path)
        break

folders.reverse()

(This puts a "\" at the start of folders if the path was originally absolute. You could lose a bit of code if you didn't want that.)
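
If you are on Python 3.4 or later, pathlib is probably the "something that actually works well": the parts attribute of a pure path gives the components directly. A minimal sketch, using PureWindowsPath so it behaves the same on any OS (the example paths are made up):

```python
from pathlib import PureWindowsPath

parts = PureWindowsPath(r"C:\Windows\System32\drivers").parts
print(parts)  # ('C:\\', 'Windows', 'System32', 'drivers')

# Mixed slashes and backslashes are treated as the same separator
mixed = PureWindowsPath("C:/Windows\\System32/drivers").parts
print(mixed == parts)  # True
```

Note that the drive and root come back together as the first element ('C:\\'), which differs slightly from the hand-rolled loop.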

How to split a path into separate strings?

Indeed, boost::filesystem::path provides an iterator (path::iterator). But if you want elegance, a range-based for loop does the job:

#include <boost/filesystem.hpp>
#include <iostream>

int main() {
    for (auto& part : boost::filesystem::path("/tmp/foo.txt"))
        std::cout << part << "\n";
}

Prints:

"/"
"tmp"
"foo.txt"

And

    for (auto& part : boost::filesystem::path("/tmp/foo.txt"))
        std::cout << part.c_str() << "\n";

prints

/
tmp
foo.txt

No need to worry about the moving parts.

What would be the equivalent to str.split(/path/to/file, os.sep) from Python in R

See strsplit

strsplit("/path/to/file", .Platform$file.sep)
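
For reference, this is what the Python call from the question returns. The sketch uses posixpath.sep (always "/") instead of os.sep so the result does not depend on the OS. Note the empty first element that an absolute path produces; strsplit in R returns a leading "" for the same reason:

```python
import posixpath

path = "/path/to/file"

parts = path.split(posixpath.sep)  # posixpath.sep is "/"
print(parts)  # ['', 'path', 'to', 'file']

# Drop the empty string that comes from the leading separator
non_empty = [p for p in parts if p]
print(non_empty)  # ['path', 'to', 'file']
```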

Get the first element from the file path

We can use cSplit from splitstackshape

splitstackshape::cSplit(df, "path", "\\\\+", fixed = FALSE)

#    path_1 path_2                    path_3     path_4              path_5                path_6
# 1: E:     My Network Places.old.dat <NA>       <NA>                <NA>                  <NA>
# 2: E:     pagefile.sys              <NA>       <NA>                <NA>                  <NA>
# 3: E:     Press_Dly_Diff_G_91.rbc   <NA>       <NA>                <NA>                  <NA>
# 4: E:     TV_Press_Dly_Diff_A       Retrospect Dantz               <NA>                  <NA>
# 5: E:     TV_Press_Dly_Diff_A       Retrospect TV_Press_Dly_Diff_A 1-TV_Press_Dly_Diff_A AA000083.rdb
# 6: E:     TV_Press_Dly_Diff_A       Retrospect TV_Press_Dly_Diff_A 1-TV_Press_Dly_Diff_A AA000561.rdb

Or, if you already know how many columns the data will expand into, we can use separate from tidyr:

tidyr::separate(df, path, into = paste0('path', 1:6), sep = "\\\\+", fill = 'right')
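
The R string "\\\\+" is the regex \\+, i.e. "one or more backslashes". The same pattern can be tried out in Python as a rough sketch. The input strings below are hypothetical, modelled on the rows shown above, and the names rows, padded and first are made up for the example:

```python
import re

# Hypothetical inputs resembling the rows in the output above
paths = [
    r"E:\pagefile.sys",
    r"E:\TV_Press_Dly_Diff_A\Retrospect\Dantz",
]

# Split on runs of one or more backslashes
rows = [re.split(r"\\+", p) for p in paths]

# Pad shorter rows with None so every row has the same number of columns
width = max(len(r) for r in rows)
padded = [r + [None] * (width - len(r)) for r in rows]

# The first element of each path
first = [r[0] for r in rows]

print(padded)
print(first)
```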

data

df <- data.frame(path = x, stringsAsFactors = FALSE)

