Splitting on First Occurrence

Splitting on first occurrence

From the docs:

str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).

s.split('mango', 1)[1]
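As a minimal sketch of how a maxsplit of 1 behaves (the sample string is made up for illustration), the following splits only at the first 'mango' and leaves later occurrences untouched in the second element:

```python
# Hypothetical sample string; any text containing the separator works.
s = "apple mango banana mango cherry"

# maxsplit=1 performs at most one split, so at most two elements come back.
parts = s.split("mango", 1)
print(parts)  # ['apple ', ' banana mango cherry']

# Everything after the first occurrence of the separator.
after_first = parts[1]
```

If the separator does not occur at all, split returns a one-element list, so indexing with [1] raises IndexError.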

Split string only on first instance of specified character

Use capturing parentheses:

'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element

The documentation describes their effect as follows:

If separator contains capturing parentheses, matched results are returned in the array.

So in this case we want to split at _.* (i.e. the split separator is a substring starting with _) but also let the result contain part of our separator (i.e. everything after _).

In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, because a simple split does not include separators in the result.

We use the s regex flag so that . also matches newline (\n) characters; without it, .* would stop at the first newline and the rest of the string would end up in the third element.
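For comparison, the same capturing-group idea can be sketched in Python, where re.split also returns captured groups in its result. This is an illustrative analogue rather than part of the JavaScript answer; re.DOTALL plays the role of the s flag here.

```python
import re

# The separator is "_" plus everything after it; the group captures the remainder.
parts = re.split(r"_(.*)", "good_luck_buddy", maxsplit=1, flags=re.DOTALL)
print(parts)  # ['good', 'luck_buddy', '']

# The trailing empty string is ignored, as in the JavaScript version.
head, tail = parts[0], parts[1]
```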

Split string at first occurrence of a number

Try splitting on the first occurrence of [ ](?=\d):

import re

text = "MARIA APARECIDA 99223-2000 / 98450-8026"
parts = re.split(r' (?=\d)', text, maxsplit=1)
print(parts)

This prints:

['MARIA APARECIDA', '99223-2000 / 98450-8026']

Note that the regex pattern used splits and consumes a single space, but does not consume the digit that follows (lookaheads do not advance the position in the input).
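To make the effect of the lookahead concrete, here is a small sketch that contrasts it with a pattern that consumes the digit. The second pattern is only there for comparison and is not part of the answer above.

```python
import re

text = "MARIA APARECIDA 99223-2000 / 98450-8026"

# Lookahead: only the space is consumed, the digit stays in the second part.
with_lookahead = re.split(r" (?=\d)", text, maxsplit=1)
print(with_lookahead)     # ['MARIA APARECIDA', '99223-2000 / 98450-8026']

# No lookahead: the space and the first digit are both consumed.
without_lookahead = re.split(r" \d", text, maxsplit=1)
print(without_lookahead)  # ['MARIA APARECIDA', '9223-2000 / 98450-8026']
```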

Is there a function to split by the FIRST instance of a delimiter in C?

The first time you call strtok, use the delimiter you want to split with.

For the second call, use an empty delimiter string (if you really want the rest of the string) or use "\n", in the case that your string might include a newline character and you don't want that in the split (or even "\r\n"):

const char* first = strtok(buf, ":");
const char* rest = strtok(NULL, "");
/* or: const char* rest = strtok(NULL, "\n"); */

Split string at separator after first occurrence

You can try to use regular expressions for this job.

Just note that this is an extremely specific (and, at the same time, generic) regular expression based on your single example.

import re

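# Raw string so the backslashes reach the regex engine unchanged.
_REGEX = re.compile(r'^(((\.\.?)?\/)*[^\/]*)((\/?(\.\.)?)*)$')

def split_path(path):
    # Group 1 is the leading part, group 4 the trailing "/.." segments.
    structure = _REGEX.match(path or '').groups()
    return structure[0], structure[3]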

Testing

>>> split_path('../../../folder.123/../..')
('../../../folder.123', '/../..')

>>> split_path('../../../folder.123')
('../../../folder.123', '')

>>> split_path('folder.123')
('folder.123', '')

>>> split_path('/')
('/', '')

>>> split_path('')
('', '')
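Because the pattern is tailored to a single example, inputs it does not cover (for instance 'a/b/c') make _REGEX.match return None, and the .groups() call then fails with AttributeError. A hedged sketch of a guarded variant follows; split_path_safe is a hypothetical name, not from the original answer.

```python
import re

# Same pattern as in the answer, written as a raw string.
_REGEX = re.compile(r'^(((\.\.?)?\/)*[^\/]*)((\/?(\.\.)?)*)$')

def split_path_safe(path):
    """Hypothetical variant: return None instead of raising when nothing matches."""
    match = _REGEX.match(path or '')
    if match is None:
        return None
    groups = match.groups()
    return groups[0], groups[3]

print(split_path_safe('../../../folder.123/../..'))  # ('../../../folder.123', '/../..')
print(split_path_safe('a/b/c'))                      # None
```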

Split string only on first instance - Java

string.split("=", 2);

As String.split(java.lang.String regex, int limit) explains:

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

The string boo:and:foo, for example, yields the following results with these parameters:

Regex  Limit  Result
:      2      { "boo", "and:foo" }
:      5      { "boo", "and", "foo" }
:      -2     { "boo", "and", "foo" }
o      5      { "b", "", ":and:f", "", "" }
o      -2     { "b", "", ":and:f", "", "" }
o      0      { "b", "", ":and:f" }
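For readers comparing with Python, a rough analogue is sketched below. It is an illustration, not part of the Java documentation: a Java limit of n corresponds to a Python maxsplit of n - 1, and Python's str.split takes a literal separator rather than a regex and keeps trailing empty strings, like a negative limit in Java.

```python
s = "boo:and:foo"

# Java limit 2 allows one split; the Python counterpart is maxsplit=1.
first_only = s.split(":", 1)
print(first_only)   # ['boo', 'and:foo']

# No maxsplit: every delimiter is used and trailing empty strings are kept,
# matching the Java result for a negative limit.
all_parts = s.split("o")
print(all_parts)    # ['b', '', ':and:f', '', '']
```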

Split String at First Occurrence of an Integer using R

You can use tidyr::extract:

library(tidyr)
df <- df %>%
  extract("name_and_address", c("name", "address"), "(\\D*)(\\d.*)")
## => df
##           name              address
## 1    Mr. Smith       12 Some street
## 2    Mr. Jones   345 Another street
## 3 Mr. Anderson 6 A different street

The (\D*)(\d.*) regex matches the following:

  • (\D*) - Group 1: any zero or more non-digit chars
  • (\d.*) - Group 2: a digit and then any zero or more chars as many as possible.

Another solution with stringr::str_split is also possible:

stringr::str_split(df$name_and_address, "(?=\\d)", n = 2)
## => [[1]]
## [1] "Mr. Smith" "12 Some street"

## [[2]]
## [1] "Mr. Jones" "345 Another street"

## [[3]]
## [1] "Mr. Anderson" "6 A different street"

The (?=\d) positive lookahead finds a location before a digit, and n=2 tells stringr::str_split to only split into 2 chunks max.

A base R approach that returns empty strings for both name and address if there is no digit in the string:

df = data.frame(name_and_address = c("Mr. Smith12 Some street", "Mr. Jones345 Another street", "Mr. Anderson6 A different street", "1 digit is at the start", "No digits, sorry."))

df$name <- sub("^(?:(\\D*)\\d.*|.+)", "\\1", df$name_and_address)
df$address <- sub("^\\D*(\\d.*)?", "\\1", df$name_and_address)
df$name
# => [1] "Mr. Smith" "Mr. Jones" "Mr. Anderson" "" ""
df$address
# => [1] "12 Some street" "345 Another street"
# [3] "6 A different street" "1 digit is at the start" ""

This also supports cases where the first digit is the first character in the string.
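The same edge cases can be sketched in Python for comparison. The helper name split_at_first_digit is hypothetical, and unlike the sub calls above it keeps the whole string as the first part when no digit is present, which is a design choice of this sketch rather than the behavior of the R code.

```python
import re

def split_at_first_digit(text):
    """Split text into (before_first_digit, from_first_digit_on)."""
    match = re.search(r"\d", text)
    if match is None:
        # No digit: keep everything in the first part (assumption of this sketch).
        return text, ""
    return text[:match.start()], text[match.start():]

print(split_at_first_digit("Mr. Smith12 Some street"))  # ('Mr. Smith', '12 Some street')
print(split_at_first_digit("1 digit is at the start"))  # ('', '1 digit is at the start')
print(split_at_first_digit("No digits, sorry."))        # ('No digits, sorry.', '')
```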

Split string using separator skipping first occurrence

You can use the str.rsplit method with a maxsplit of 1 instead:

file_path.rsplit('/', maxsplit=1)[0]
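A short sketch of what rsplit returns, assuming a made-up path for illustration:

```python
# Hypothetical path for illustration.
file_path = "a/b/c.txt"

# rsplit scans from the right, so maxsplit=1 splits at the last '/'.
pieces = file_path.rsplit("/", maxsplit=1)
print(pieces)      # ['a/b', 'c.txt']

directory = pieces[0]
print(directory)   # a/b

# With no '/' at all, the whole string comes back as element 0.
no_slash = "c.txt".rsplit("/", maxsplit=1)[0]
print(no_slash)    # c.txt
```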

How can I split a string into two on the first occurrence of a character?

str.split takes a maxsplit argument; pass 1 to split only on the first -:

print(components[i].rstrip().split('-', 1))

To store the output in two variables:

In [7]: s = "console-3.45.1-0"

In [8]: a,b = s.split("-",1)

In [9]: a
Out[9]: 'console'

In [10]: b
Out[10]: '3.45.1-0'
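Unpacking into two variables raises ValueError when the separator is missing, because split then returns a single-element list. One possible way around that is str.partition, which always returns three parts. The helper split_first below is a hypothetical name used only for this sketch.

```python
def split_first(text, sep):
    """Hypothetical helper: split at the first sep; tail is None if sep is absent."""
    head, found, tail = text.partition(sep)
    return head, (tail if found else None)

print(split_first("console-3.45.1-0", "-"))  # ('console', '3.45.1-0')
print(split_first("console", "-"))           # ('console', None)
```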

