Splitting a File Name into Name, Extension

Extracting extension from filename in Python

Use os.path.splitext:

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
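
To see the difference concretely, here is a minimal sketch comparing a naive split at the last dot with os.path.splitext. The helper naive_splitext is hypothetical and exists only for illustration:

```python
import os.path


def naive_splitext(path):
    """Hypothetical manual split: cut at the last dot, wherever it is."""
    head, sep, tail = path.rpartition('.')
    if not sep:
        # No dot at all: nothing to split off.
        return path, ''
    return head, sep + tail


for path in ('/path/to/somefile.ext', '/a/b.c/d', '.bashrc', 'archive.tar.gz'):
    # Print the naive result next to the standard-library result.
    print(path, naive_splitext(path), os.path.splitext(path))
```

The two agree on /path/to/somefile.ext but differ on /a/b.c/d and .bashrc. For a name such as archive.tar.gz, both return only the final suffix, .gz.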

R: splitting a file name into name and extension

Use strsplit:

R> strsplit("name1.csv", "\\.")[[1]]
[1] "name1" "csv"
R>

Note that you need to a) escape the dot (as it is a metacharacter in regular expressions) and b) deal with the fact that strsplit() returns a list, of which typically only the first element is of interest.

A more general solution involves regular expressions where you can extract the matches.

For the special case of filenames you also have:

R> library(tools)   # unless already loaded, comes with base R
R> file_ext("name1.csv")
[1] "csv"
R>

and

R> file_path_sans_ext("name1.csv")
[1] "name1"
R>

as these are such common tasks (cf. basename in the shell, etc.).

Java: splitting the filename into a base and extension

I know others have mentioned String.split, but here is a variant that only yields two tokens (the base and the extension):

String[] tokens = fileName.split("\\.(?=[^\\.]+$)");

For example:

"test.cool.awesome.txt".split("\\.(?=[^\\.]+$)");

Yields:

["test.cool.awesome", "txt"]

The regular expression tells Java to split on any period that is followed by one or more non-periods and then the end of input. At most one period can satisfy this (namely, the last period).

In regex terms, this technique is called a zero-width positive lookahead.
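
The same lookahead works in other regex engines. As a minimal sketch, here is the equivalent split in Python; the names LAST_DOT and split_base_ext are made up for this example:

```python
import re

# Split on the last period only: the lookahead requires that nothing
# but non-period characters (at least one) follow, up to the end.
LAST_DOT = re.compile(r"\.(?=[^.]+$)")


def split_base_ext(file_name):
    """Return [base, extension], or [file_name] if no split point exists."""
    return LAST_DOT.split(file_name)


print(split_base_ext("test.cool.awesome.txt"))  # ['test.cool.awesome', 'txt']
print(split_base_ext("README"))                 # ['README']
print(split_base_ext("trailing."))              # ['trailing.']
```

Note that a name like .bashrc is split into an empty base and bashrc, which may or may not be what you want for hidden files.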


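BTW, if you want to split a path that uses forward slashes into its directory part and the full filename (including any extension), split at the zero-width position just after the last slash. A pattern that consumes the directory part, such as .+?/(?=[^/]+$), would discard it and leave an empty first token.

    String[] tokens = dir.split("(?<=/)(?=[^/]+$)");

For example:

    String dir = "/foo/bar/bam/boozled";
    String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
    // [ "/foo/bar/bam/", "boozled" ]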
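
For comparison, the same directory/filename split can be done without a regular expression. This is only a sketch in Python, using the example path from above:

```python
import posixpath

path = "/foo/bar/bam/boozled"

# rpartition keeps the text before and after the last slash.
directory, slash, file_name = path.rpartition("/")
print([directory + slash, file_name])  # ['/foo/bar/bam/', 'boozled']

# posixpath.split does the same job but drops the trailing slash.
print(posixpath.split(path))           # ('/foo/bar/bam', 'boozled')
```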

Extract filename and extension in Bash

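First, get the file name without the path, then split it at the last dot:

filename=$(basename -- "$fullfile")   # strip the directory part
extension="${filename##*.}"           # text after the last dot
filename="${filename%.*}"             # text before the last dot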

Alternatively, you can focus on the last '/' of the path instead of the '.' which should work even if you have unpredictable file extensions:

filename="${fullfile##*/}"

You may want to check the documentation:

  • On the web, in the Bash Reference Manual, section "3.5.3 Shell Parameter Expansion"
  • In the bash manpage, in the section called "Parameter Expansion"
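
To make the semantics of these expansions easier to reason about, here is a rough Python sketch that mimics them. The function names are hypothetical, and the sketch only covers the three expansions used above, not Bash pattern matching in general:

```python
def bash_extension(filename):
    """Mimic ${filename##*.}: remove the longest prefix matching '*.'."""
    # With no dot the pattern does not match and the name is returned unchanged.
    return filename.rpartition('.')[2]


def bash_stem(filename):
    """Mimic ${filename%.*}: remove the shortest suffix matching '.*'."""
    head, sep, _tail = filename.rpartition('.')
    return head if sep else filename


def bash_basename(fullfile):
    """Mimic ${fullfile##*/}: remove everything up to the last slash."""
    return fullfile.rpartition('/')[2]


for fullfile in ('/tmp/archive.tar.gz', '/tmp/README', '/home/me/.bashrc'):
    name = bash_basename(fullfile)
    print(name, '->', repr(bash_stem(name)), repr(bash_extension(name)))
```

The README case shows a pitfall: when the name contains no dot, the extension expansion returns the whole name, so a script may want to check for a dot first. Likewise, .bashrc ends up with an empty name and the extension bashrc.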

How to separate the file name and the extension of a file in C#

You can use Path.GetExtension:

var extension = Path.GetExtension("C:\\sample.txt"); // returns ".txt" (leading dot included)

and Path.GetFileNameWithoutExtension:

var fileNameWithoutExtension = Path.GetFileNameWithoutExtension("C:\\sample.txt"); // returns "sample"
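
For readers comparing languages, a similar pair of accessors exists in Python's pathlib. This is only an illustrative sketch; PureWindowsPath is used so that the Windows-style path is parsed the same way on any platform:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\sample.txt")

print(path.suffix)  # .txt (leading dot included)
print(path.stem)    # sample
```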

How to split a file name into base and extension in Java


Final solution:

String pat = "(?!^)\\.(?=[^.]*$)|(?<=^\\.[^.]{0,1000})$|$";

The pattern consists of 3 alternatives to split with:

  • (?!^)\\.(?=[^.]*$) - split at a dot that is not the first character in the string ((?!^)) and that has only 0+ characters other than . to the right of it, up to the string end ($)
  • (?<=^\\.[^.]{0,1000})$ - split at the end of string if the string starts with a literal . followed by 0 to 1000 characters other than . (a smaller upper bound such as 256 may be enough, but there are longer file names, so adjust accordingly)
  • $ - split at the end of string (replace with \\z if the string may end with \n and you do not want a match before that final \n)

When you pass 2 as the limit argument to the split method, the pattern is applied at most once, so the result has at most two elements. See the Java demo:

System.out.println(Arrays.toString(".MyFile".split(pat,2))); // [.MyFile, ]
System.out.println(Arrays.toString("MyFile.ext".split(pat,2))); // [MyFile, ext]
System.out.println(Arrays.toString("Another.MyFile.ext".split(pat,2))); // [Another.MyFile, ext]
System.out.println(Arrays.toString("MyFile.".split(pat,2))); // [MyFile, ]
System.out.println(Arrays.toString("MyFile".split(pat,2))); // [MyFile, ]
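
Python's re module rejects the variable-width lookbehind used in this pattern, so it cannot be ported as-is. As a minimal sketch, the same five results can be reproduced with plain string operations; the function name split_base_ext is hypothetical:

```python
def split_base_ext(file_name):
    """Return [base, extension]; the extension is '' when there is none."""
    dot = file_name.rfind('.')
    if dot <= 0:
        # No dot at all, or only a leading dot (hidden file): no extension.
        return [file_name, '']
    return [file_name[:dot], file_name[dot + 1:]]


for name in ('.MyFile', 'MyFile.ext', 'Another.MyFile.ext', 'MyFile.', 'MyFile'):
    print(repr(name), '=>', split_base_ext(name))
```

This trades the single regex for an explicit check on the position of the last dot, which also removes the need for an arbitrary length limit.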

Original answer

I believe you are looking for

(?!^)\\.(?=[^.]*$)|(?<=^\\.[^.]{0,1000})$

One note: the pattern that can be used with split uses a constrained-width lookbehind, which assumes that the file name is no longer than 1000 characters. Increase the value as needed.

See the IDEONE demo:

String pat = "(?!^)\\.(?=[^.]*$)|(?<=^\\.[^.]{0,1000})$";
String s = ".MyFile";
System.out.println(Arrays.toString(s.split(pat,-1)));
s = "MyFile.ext";
System.out.println(Arrays.toString(s.split(pat,-1)));
s = "Another.MyFile.ext";
System.out.println(Arrays.toString(s.split(pat,-1)));
s = "MyFile.";
System.out.println(Arrays.toString(s.split(pat,-1)));

Results:

".MyFile"            => [.MyFile, ]
"MyFile.ext"         => [MyFile, ext]
"Another.MyFile.ext" => [Another.MyFile, ext]
"MyFile."            => [MyFile, ]

