String.Split - by Multiple Character Delimiter

string.split - by multiple character delimiter

To show both string.Split and Regex usage:

string input = "abc][rfd][5][,][.";
string[] parts1 = input.Split(new string[] { "][" }, StringSplitOptions.None);
string[] parts2 = Regex.Split(input, @"\]\[");

Java string.split - by multiple character delimiter

String.split takes a regular expression. In this case you want to split on non-word characters (regex \W, which matches anything other than letters, digits and the underscore), so it's simply:

String input = "Hi,X How-how are:any you?";
String[] parts = input.split("[\\W]");

If you wanted to be more explicit, you could use the exact characters in the expression:

String[] parts = input.split("[,\\s\\-:\\?]");
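For comparison, here is a minimal sketch of the same two patterns in Python rather than Java, using re.split from the standard library. The sample strings and variable names are made up for illustration. It also shows two things that are easy to miss: two delimiters in a row leave an empty string between them, and \W does not split on underscores.

```python
import re

text = "Hi,X How-how are:any you"

# \W matches any single character that is not a letter, digit, or underscore.
by_non_word = re.split(r"\W", text)

# The explicit class lists only the delimiters we expect to see.
by_explicit = re.split(r"[,\s\-:?]", text)

# Two delimiters in a row (comma, then space) leave an empty string between them...
with_gap = re.split(r"\W", "Hi, there")

# ...unless the pattern is allowed to consume the whole run at once.
without_gap = re.split(r"\W+", "Hi, there")

# Underscore counts as a word character, so \W does not split on it.
underscore = re.split(r"\W", "snake_case-name")

print(by_non_word)
print(with_gap, without_gap)
print(underscore)
```

If empty fields are not wanted, adding + to the pattern is usually the simplest fix.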

Split string into array using multi-character delimiter

If you can, replace the _-_ sequences with a single character that does not otherwise occur in the data, and use that character for field splitting. For example,

$ str="db2-111_-_oracle12cR1RAC_-_mariadb101"
$ str2=${str//_-_/#}
$ IFS="#" read -ra arr <<< "$str2"
$ printf '%s\n' "${arr[@]}"
db2-111
oracle12cR1RAC
mariadb101
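The replace-then-split trick only works when the placeholder character never appears in the data. A small Python sketch (not bash; the variable names and the second sample string are assumptions for illustration) shows the same two-step idea next to a direct split, and what goes wrong when the placeholder collides with real content:

```python
record = "db2-111_-_oracle12cR1RAC_-_mariadb101"

# Same two-step idea as the shell version: swap the delimiter for one
# character, then split on that character.
placeholder = "#"  # assumed not to occur anywhere in the data
via_placeholder = record.replace("_-_", placeholder).split(placeholder)

# Splitting on the multi-character delimiter directly gives the same fields.
direct = record.split("_-_")

# If the placeholder does occur in the data, the two-step version over-splits.
tricky = "c#_-_f#"
broken = tricky.replace("_-_", placeholder).split(placeholder)
intact = tricky.split("_-_")

print(via_placeholder)
print(broken, intact)
```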

Howto split a string on a multi-character delimiter in bash?

Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"

If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (backending into awk) given in BashFAQ #21 is applicable:

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

...used, in this context, as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
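As a rough illustration of the same line-at-a-time approach outside the shell, here is a Python sketch. The function name replace_literal is hypothetical and this is not the BashFAQ code; it only mirrors the idea of reading input line by line and replacing a literal string, so the whole input never has to sit in memory at once.

```python
import io

def replace_literal(lines, search, replacement):
    """Yield each input line with every literal `search` replaced."""
    if not search:
        raise ValueError("search string must not be empty")
    for line in lines:
        # str.replace treats `search` as plain text, not as a pattern.
        yield line.replace(search, replacement)

# io.StringIO stands in for an open file or sys.stdin here.
source = io.StringIO("emmbbmmaaddsb\nxmmy\n")
result = "".join(replace_literal(source, "mm", "\n"))
print(result, end="")
```

With a real file, the lines argument could simply be the open file object, since iterating over it yields one line at a time.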

string.split - by multiple character delimiter },{

You can use a regular expression:

var sample = "abc},{rfd},{5},{,},{.";
var result = Regex.Split(sample, Regex.Escape("},{"));
foreach (var item in result)
    Console.WriteLine(item);

Split string with multiple-character delimiter

Works for me. Python's str.split accepts a multi-character separator as-is:

>>> "Hello there. My name is Fr.ed. I am 25.5 years old.".split(". ")
['Hello there', 'My name is Fr.ed', 'I am 25.5 years old.']

Java String split with multicharacter delimiter

You have to escape the | character, since it is the regex metacharacter for alternation (logical OR).

So I would use

line.split("\\|\\|##")

Note that the backslash itself has to be escaped inside a Java string literal, which is why the pattern is written as

\\|

instead of

\|
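The same escaping concern can be sketched in Python (not Java; the sample line is made up). A raw string avoids the doubled backslashes that a Java string literal needs, and re.escape from the standard library builds the escaped pattern from the literal delimiter, so nothing has to be escaped by hand:

```python
import re

line = "alpha||##beta||##gamma"

# Written by hand: each | is escaped so it is a literal pipe, not alternation.
manual = re.split(r"\|\|##", line)

# re.escape produces an escaped pattern from the literal delimiter text.
pattern = re.escape("||##")
escaped = re.split(pattern, line)

# For a fixed delimiter, plain str.split needs no regex at all.
plain = line.split("||##")

print(manual)
```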

Use String.split() with multiple delimiters

I think you need to include the regex OR operator:

String[] tokens = pdfName.split("-|\\.");

What you have will match:

[DASH followed by DOT together] -.

not

[DASH or DOT any of them] - or .
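To make the difference concrete, here is a short Python sketch (not Java; the file name is a made-up example). It compares the sequence pattern, the alternation, and an equivalent character class:

```python
import re

pdf_name = "report-2020.final.pdf"  # hypothetical file name

# Alternation: split on a dash OR a dot.
either = re.split(r"-|\.", pdf_name)

# Sequence: split only where a dash is immediately followed by a dot.
sequence = re.split(r"-\.", pdf_name)

# A character class is another way to say "any one of these characters".
char_class = re.split(r"[-.]", pdf_name)

print(either)
print(sequence)
```

Since "-." never occurs in the sample name, the sequence pattern leaves the string in one piece.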

split char string with multi-character delimiter in C

Finding the point at which the desired sequence occurs is pretty easy: strstr supports that:

char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");

So, at that point, pos points to the first location of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string, and 2) stores a pointer to the "current" location in the string internally.

If we didn't mind doing roughly the same, we could do something like this:

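#include <stdio.h>   /* printf, used by the example below */
#include <string.h>  /* strstr, strlen */

char *multi_tok(char *input, char *delimiter) {
    /* remembers where the next token starts between calls */
    static char *string;
    if (input != NULL)
        string = input;

    if (string == NULL)
        return string;

    char *end = strstr(string, delimiter);
    if (end == NULL) {
        /* no more delimiters: hand back the rest and finish */
        char *temp = string;
        string = NULL;
        return temp;
    }

    char *temp = string;

    /* terminate the current token and skip past the delimiter */
    *end = '\0';
    string = end + strlen(delimiter);
    return temp;
}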

This does work. For example:

int main() {
    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
}

produces roughly the expected output:

this is
a big
input string
to split up

Nonetheless, it's clumsy, difficult to make thread-safe (you have to make its internal string variable thread-local) and generally just a crappy design. Using (for one example) an interface something like strtok_r, we can fix at least the thread-safety issue:

#include <stdio.h>   /* printf */
#include <string.h>  /* strstr, strlen */

typedef char *multi_tok_t;

char *multi_tok(char *input, multi_tok_t *string, char *delimiter) {
    /* the caller owns the state, so nothing is shared between threads */
    if (input != NULL)
        *string = input;

    if (*string == NULL)
        return *string;

    char *end = strstr(*string, delimiter);
    if (end == NULL) {
        char *temp = *string;
        *string = NULL;
        return temp;
    }

    char *temp = *string;

    *end = '\0';
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init() { return NULL; }

int main() {
    multi_tok_t s=init();

    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
}

I guess I'll leave it at that for now though--to get a really clean interface, we really want to reinvent something like coroutines, and that's probably a bit much to post here.
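To give a feel for what a coroutine-style interface could look like, here is a sketch in Python rather than C, using a generator. It is only an illustration of the idea, not a port of the code above: str.find plays the role of strstr, the generator keeps the "current position" state itself, and the input string is never modified.

```python
def multi_tok(text, delimiter):
    """Yield the pieces of `text` that lie between occurrences of `delimiter`."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    start = 0
    while True:
        end = text.find(delimiter, start)  # plays the role of strstr
        if end == -1:
            # No more delimiters: the rest of the string is the last token.
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(delimiter)

sample = "this is abc a big abc input string abc to split up"
tokens = list(multi_tok(sample, "abc"))
for token in tokens:
    print(token)
```

Each call to the generator creates its own independent state, so several strings can be tokenized at once without any shared variable.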


