Splitting on last delimiter in Python string?
Use .rsplit() or .rpartition() instead:
s.rsplit(',', 1)
s.rpartition(',')
str.rsplit() lets you specify how many times to split, while str.rpartition() only splits once but always returns a fixed number of elements (prefix, delimiter and postfix) and is faster for the single-split case.
Demo:
>>> s = "a,b,c,d"
>>> s.rsplit(',', 1)
['a,b,c', 'd']
>>> s.rsplit(',', 2)
['a,b', 'c', 'd']
>>> s.rpartition(',')
('a,b,c', ',', 'd')
Both methods start splitting from the right-hand side of the string; by giving str.rsplit() a maximum as the second argument, you split only at the right-most occurrences.
If you only need the last element, but there is a chance that the delimiter is not present in the input string or is the very last character in the input, use the following expressions:
# last element, or the original if no `,` is present or is the last character
s.rsplit(',', 1)[-1] or s
s.rpartition(',')[-1] or s
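To see what these fallback expressions return in each case, here is a minimal sketch. The sample strings are made up for illustration; they cover a delimiter in the middle, no delimiter, and a trailing delimiter.

```python
# Hypothetical sample inputs: delimiter present, absent, and trailing.
samples = ["a,b,c", "abc", "a,b,"]

results = {}
for s in samples:
    via_rsplit = s.rsplit(",", 1)[-1] or s
    via_rpartition = s.rpartition(",")[-1] or s
    # Both expressions agree on every sample.
    assert via_rsplit == via_rpartition
    results[s] = via_rsplit
    print(repr(s), "->", repr(via_rsplit))
```

Note that for the trailing-delimiter input the fallback is the whole original string, delimiter included.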
If you need the delimiter gone even when it is the last character, I'd use:
def last(string, delimiter):
    """Return the last element from string, after the delimiter

    If string ends in the delimiter or the delimiter is absent,
    returns the original string without the delimiter.
    """
    prefix, delim, last = string.rpartition(delimiter)
    # Fall back to the prefix only when the delimiter was found
    # and nothing follows it.
    return prefix if (delim and not last) else last
This uses the fact that string.rpartition() returns the delimiter as the second element of the tuple only if it was present, and an empty string otherwise.
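A quick check of the three cases the docstring describes, written as a self-contained sketch. The helper name last_field and the sample strings are illustrative only; the logic relies on the documented behaviour of str.rpartition(), which returns ('', '', string) when the delimiter is absent.

```python
def last_field(string, delimiter):
    """Return the text after the last delimiter, dropping a trailing delimiter."""
    prefix, delim, tail = string.rpartition(delimiter)
    # delim is '' when the delimiter is absent; tail is '' when the
    # string ends with the delimiter.
    return prefix if (delim and not tail) else tail


print(repr(last_field("a,b,c", ",")))  # delimiter in the middle
print(repr(last_field("a,b,", ",")))   # trailing delimiter
print(repr(last_field("abc", ",")))    # no delimiter
```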
Split string on the last occurrence of some character
It might be easier to just assume that files which end with a dot followed by alphanumeric characters have extensions.
int p=filePath.lastIndexOf(".");
String e=filePath.substring(p+1);
if( p==-1 || !e.matches("\\w+") ){/* file has no extension */}
else{ /* file has extension e */ }
See the Java docs for regular expression patterns. Remember to double the backslash: the regex needs \w, and inside a Java string literal a single backslash must be written as \\.
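For comparison, the same heuristic can be sketched in Python. This is a translation under assumptions, not part of the original answer: the function name extension is hypothetical, and re.ASCII is used so that \w is limited to ASCII letters, digits and underscore, as in Java's default.

```python
import re


def extension(file_path):
    """Return the extension after the last dot, or None if there is none."""
    p = file_path.rfind(".")
    if p == -1:
        return None  # no dot at all
    e = file_path[p + 1:]
    # Only a non-empty run of word characters counts as an extension.
    return e if re.fullmatch(r"\w+", e, flags=re.ASCII) else None


print(extension("archive.tar.gz"))
print(extension("README"))
```

The raw string r"\w+" avoids the doubled backslash that the Java literal needs.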
split string last delimiter
These use no packages. They assume that each element of col2 has at least one underscore. (See the note if this restriction needs to be lifted.)
1) The first regular expression, (.*)_.*, matches everything up to the last underscore, then the underscore, then everything remaining, and the first sub replaces the entire match with the part captured in parentheses. This works because matching is greedy: the first .* takes as much as it can, leaving only the text after the last underscore for the second .*. The second regular expression, .*_, matches everything up to and including the last underscore, and the second sub replaces that with the empty string.
transform(df, col2 = sub("(.*)_.*", "\\1", col2), col3 = sub(".*_", "", col2))
2) Here is a variation that is a bit more symmetric. It uses the same regular expression for both sub calls.
pat <- "(.*)_(.*)"
transform(df, col2 = sub(pat, "\\1", col2), col3 = sub(pat, "\\2", col2))
Note: To handle strings with no underscore at all, so that "xyz" is split into "xyz" and "", use the following for the second sub. It tries to match the left-hand side of the | first; if that fails (which happens when there is no underscore), the entire string matches the right-hand side and sub replaces it with the empty string.
sub(".*_|^[^_]*$", "", col2)
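The greedy-match reasoning carries over to other regex engines. As a rough Python sketch (the function name split_last_underscore is hypothetical, and count=1 is used to mirror R's sub, which replaces only the first match):

```python
import re


def split_last_underscore(s):
    """Split s at its last underscore; the right part is '' if there is none."""
    # Greedy (.*) runs to the last underscore, so group 1 is the left part.
    left = re.sub(r"(.*)_.*", r"\1", s, count=1)
    # The first alternative strips through the last underscore; the second
    # consumes the whole string when it contains no underscore at all.
    right = re.sub(r".*_|^[^_]*$", "", s, count=1)
    return left, right


print(split_last_underscore("a_b_c"))
print(split_last_underscore("xyz"))
```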
How to split a string at the last occurrence of a sequence
The range(of:...) method of String has a .backwards option to find the last occurrence of a string. Then substring(to:) and substring(from:) can be used with the lower/upper bound of that range to extract the parts of the string preceding/following the separator:
func parseTuple(from string: String) -> (String, Int)? {
    if let theRange = string.range(of: "###", options: .backwards),
        let i = Int(string.substring(from: theRange.upperBound)) {
        return (string.substring(to: theRange.lowerBound), i)
    } else {
        return nil
    }
}
Example:
if let tuple = parseTuple(from: "Connect###Four###Player###7") {
    print(tuple)
    // ("Connect###Four###Player", 7)
}
Swift 4 update:
func parseTuple(from string: String) -> (String, Int)? {
    if let theRange = string.range(of: "###", options: .backwards),
        let i = Int(string[theRange.upperBound...]) {
        // Half-open range: stop before the separator starts.
        return (String(string[..<theRange.lowerBound]), i)
    } else {
        return nil
    }
}
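The same "find the last separator, then parse the tail" idea can be sketched in Python with str.rpartition. The name parse_tuple and the default separator are carried over from the Swift example for illustration; note that Python's int() is more permissive than Swift's Int(_:), for example it tolerates surrounding whitespace.

```python
def parse_tuple(string, separator="###"):
    """Split at the last separator and parse the tail as an int, else None."""
    head, sep, tail = string.rpartition(separator)
    if not sep:
        return None  # separator not found
    try:
        return head, int(tail)
    except ValueError:
        return None  # the tail is not an integer


print(parse_tuple("Connect###Four###Player###7"))
```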
Split Character String Using Only Last Delimiter in R
A solution based on stringi and data.table: reverse the string, split it into a fixed number of items, and then reverse back:
library(stringi)
x <- c('foo - bar', 'hey-now-man', 'say-now-girl', 'fine-now')
lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
If we want to make a data.frame with this:
y <- lapply(stri_split_regex(stri_reverse(x), pattern = '[-\\s]+', n = 2), stri_reverse)
y <- setNames(data.table::transpose(y)[2:1], c('output1', 'output2'))
df <- as.data.frame(c(list(input = x), y))
# > df
#          input output1 output2
# 1    foo - bar     foo     bar
# 2  hey-now-man hey-now     man
# 3 say-now-girl say-now    girl
# 4     fine-now    fine     now
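The reverse, split once, reverse back trick is not specific to stringi. A minimal Python sketch of the same idea, assuming the same separator pattern (runs of hyphens and whitespace); the function name split_last is hypothetical:

```python
import re


def split_last(s):
    """Split s once at its last run of hyphens/whitespace via reversal."""
    # Reverse the string, split once from the left, then undo both the
    # per-piece reversal and the reversed piece order.
    parts = re.split(r"[-\s]+", s[::-1], maxsplit=1)
    return [p[::-1] for p in reversed(parts)]


for x in ["foo - bar", "hey-now-man", "say-now-girl", "fine-now"]:
    print(split_last(x))
```

The reversal is needed because re.split, like the R function, only limits splits counted from the left.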
pandas split by last delimiter
With Series.str.rsplit, limiting the number of splits.
df.col1.str.rsplit('|', n=1, expand=True).rename(lambda x: f'col{x + 1}', axis=1)
If the above throws a SyntaxError, you're on a Python version older than 3.6 (shame on you!). Use this instead:
df.col1.str.rsplit('|', n=1, expand=True)\
    .rename(columns=lambda x: 'col{}'.format(x + 1))
          col1 col2
0      MLB|NBA  NFL
1          MLB  NBA
2  NFL|NHL|NBA  MLB
There's also the faster loopy str.rsplit equivalent.
pd.DataFrame(
    [x.rsplit('|', 1) for x in df.col1.tolist()],
    columns=['col1', 'col2']
)
          col1 col2
0      MLB|NBA  NFL
1          MLB  NBA
2  NFL|NHL|NBA  MLB
P.S., yes, the second solution is faster:
df = pd.concat([df] * 100000, ignore_index=True)
%timeit df.col1.str.rsplit('|', n=1, expand=True)
%timeit pd.DataFrame([x.rsplit('|', 1) for x in df.col1.tolist()])
473 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
128 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
How to split a string in shell and get the last field
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This removes the longest prefix matching *: from the value, that is, everything up to and including the last ':'.
${foo   <-- from variable foo
##      <-- remove the longest matching prefix (greedy)
*       <-- matches anything
:       <-- up to and including the last ':'
}
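For readers more at home in Python, the effect of this expansion can be approximated with str.rpartition. This is only an analogy, with a hypothetical helper name; it mirrors the shell behaviour in that a value without any ':' is returned unchanged.

```python
def strip_longest_prefix(value, sep=":"):
    """Approximate ${value##*sep}: keep only what follows the last sep."""
    # rpartition returns ('', '', value) when sep is absent, so the
    # third element is the unchanged value in that case.
    return value.rpartition(sep)[2]


foo = "1:2:3:4:5"
print(strip_longest_prefix(foo))
```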
Split a string at the last occurrence of the separator in golang
Since this is for path operations, and it looks like you don't want the trailing path separator, path.Dir does what you're looking for:
fmt.Println(path.Dir("a/b/c/d/e"))
// a/b/c/d
If this is specifically for filesystem paths, use the filepath package instead, so that the operating system's path separator is handled correctly.
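As a point of comparison in Python, the standard library draws a similar line: posixpath always works on forward-slash paths, while os.path follows the host platform. A small sketch, using the same sample path as above:

```python
import posixpath

# Split at the last '/' into (directory part, final component).
head, tail = posixpath.split("a/b/c/d/e")
print(head)
print(tail)
```

Edge cases differ between libraries (for example, how a path with no separator is reported), so check the behaviour of whichever one you use.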
Second-to-last occurrence of delimiter-split string
Another option is to match the field directly, using a positive lookahead assertion that requires exactly one more comma before the end of the string; the character class also excludes newlines so the match stays on one line.
\w+(?=,[^,\n]*$)
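A short sketch of this pattern in use with Python's re module. The helper name second_to_last and the sample strings are illustrative; the pattern itself is the one shown above.

```python
import re

# \w+ must be followed by one comma and then no further commas
# (or newlines) up to the end of the string.
PATTERN = re.compile(r"\w+(?=,[^,\n]*$)")


def second_to_last(s):
    """Return the second-to-last comma-separated field, or None."""
    m = PATTERN.search(s)
    return m.group(0) if m else None


print(second_to_last("a,b,c,d"))
print(second_to_last("single"))
```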
How to split a string into 2 at the last occurrence of an underscore character
You can use lastIndexOf on String, which returns the index of the last occurrence of a sequence of characters.
String thing = "132131_12313_1321_312";
int index = thing.lastIndexOf("_");
String yourCuttedString = thing.substring(0, index);
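lastIndexOf returns -1 if the substring is not found in the String. In that case substring(0, index) throws a StringIndexOutOfBoundsException, so check the index before using it.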
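The "find the index, check for -1, then slice" pattern looks like this in Python, as a minimal sketch with a hypothetical function name. The check matters even more here, because slicing with -1 does not raise; it silently drops the last character.

```python
def split_at_last(text, sep="_"):
    """Return (before, after) around the last sep, or None if sep is absent."""
    index = text.rfind(sep)
    if index == -1:
        return None  # text[:-1] would silently drop the last character
    return text[:index], text[index + len(sep):]


print(split_at_last("132131_12313_1321_312"))
```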