string.split - by multiple character delimiter
To show both string.Split and Regex.Split usage:
string input = "abc][rfd][5][,][.";
string[] parts1 = input.Split(new string[] { "][" }, StringSplitOptions.None);
string[] parts2 = Regex.Split(input, @"\]\[");
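For comparison, the same two approaches can be sketched in Python (an illustration only; the answer above is C#). str.split takes the delimiter literally, while re.split needs the brackets escaped, which re.escape does for you:

```python
import re

text = "abc][rfd][5][,][."

# Literal split: the delimiter is plain text, so no escaping is needed.
parts_literal = text.split("][")

# Regex split: "]" and "[" are metacharacters, so escape the delimiter first.
parts_regex = re.split(re.escape("]["), text)

print(parts_literal)
print(parts_regex)
```

Both calls should give the same five fields; the literal form is the simpler choice when the delimiter is a fixed string.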
Java string.split - by multiple character delimiter
String.split takes a regular expression. In this case you want to split on non-word characters (regex \W), so it's simply:
String input = "Hi,X How-how are:any you?";
String[] parts = input.split("[\\W]");
If you wanted to be more explicit, you could use the exact characters in the expression:
String[] parts = input.split("[,\\s\\-:\\?]");
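One detail worth knowing if you port this elsewhere: Java's String.split drops trailing empty strings, so the final "?" leaves no empty token. A rough Python sketch of the same two patterns (Python is used here only for illustration) shows that re.split keeps that trailing empty string, so it has to be filtered out:

```python
import re

text = "Hi,X How-how are:any you?"

# Split on any single non-word character.
parts = re.split(r"\W", text)

# Same split with the delimiter characters listed explicitly.
parts_explicit = re.split(r"[,\s\-:?]", text)

# Python keeps the empty string after the final "?"; drop empty tokens.
words = [p for p in parts if p]

print(parts)
print(words)
```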
Split string into array using multi-character delimiter
If you can, replace the _-_ sequences with another single character that you can use for field splitting. For example,
$ str="db2-111_-_oracle12cR1RAC_-_mariadb101"
$ str2=${str//_-_/#}
$ IFS="#" read -ra arr <<< "$str2"
$ printf '%s\n' "${arr[@]}"
db2-111
oracle12cR1RAC
mariadb101
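The substitution trick only works if the placeholder character never occurs in the data. A minimal Python sketch of the same idea, with that assumption checked explicitly (the variable names and the choice of "#" are illustrative):

```python
record = "db2-111_-_oracle12cR1RAC_-_mariadb101"
delimiter = "_-_"
placeholder = "#"  # hypothetical choice; must not occur in the data

# Guard the assumption before substituting, otherwise fields would be
# split in the wrong places.
if placeholder in record:
    raise ValueError("placeholder already occurs in the data")

# Replace the multi-character delimiter, then split on the single character.
fields = record.replace(delimiter, placeholder).split(placeholder)

print(fields)
```

In a language whose split accepts multi-character delimiters directly, the substitution step is unnecessary; it is shown here only to mirror the bash approach.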
How to split a string on a multi-character delimiter in bash?
Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:
in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"
If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (a wrapper around awk) given in BashFAQ #21 is applicable:
# Taken from http://mywiki.wooledge.org/BashFAQ/021
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }
    {
      # empty the output string
      out = "";
      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;
        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }
      # append whatever is left
      out = out $0;
      print out;
    }
  '
}
...used, in this context, as:
gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
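The loop inside the awk program (find the search string, copy what precedes it plus the replacement, drop the consumed prefix, repeat) can be sketched in Python to make the steps easier to follow. This is an illustration, not part of BashFAQ #21; unlike the shell function, it raises on an empty search string instead of returning silently:

```python
def gsub_literal(line, search, replacement):
    """Replace every literal occurrence of search in line (no regex)."""
    if not search:
        raise ValueError("search string cannot be empty")
    out = []
    while True:
        i = line.find(search)
        if i == -1:
            break
        # Keep everything before the match, then the replacement.
        out.append(line[:i])
        out.append(replacement)
        # Drop everything up to and including the match.
        line = line[i + len(search):]
    # Append whatever is left.
    out.append(line)
    return "".join(out)


result = gsub_literal("emmbbmmaaddsb", "mm", "\n")
print(result)
```

In real Python code str.replace does the same job; the explicit loop is only there to mirror the awk logic.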
string.split - by multiple character delimiter },{
You can use a regular expression:
var sample = "abc},{rfd},{5},{,},{.";
var result = Regex.Split(sample, Regex.Escape("},{"));
foreach (var item in result)
    Console.WriteLine(item);
Split string with multiple-character delimiter
Works for me. str.split accepts a multi-character separator as-is:
>>> "Hello there. My name is Fr.ed. I am 25.5 years old.".split(". ")
['Hello there', 'My name is Fr.ed', 'I am 25.5 years old.']
Java String split with multicharacter delimiter
You have to escape the | character, since it's a regex metacharacter for logical OR. So I would use
line.split("\\|\\|##")
Note that you have to escape the backslash as well, which is why I use \\| instead of \| to escape that metacharacter.
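The two layers of escaping (one for the regex engine, one for the string literal) are easier to see side by side. A small Python sketch, used here only for illustration: a raw string avoids the doubled backslashes, and it denotes exactly the same pattern as the doubled form that Java requires. The sample line is hypothetical:

```python
import re

line = "a||##b||##c"  # hypothetical sample input

# Regex level: each "|" must be written as \| to be matched literally.
# String-literal level: a raw string avoids doubling the backslashes.
raw_pattern = r"\|\|##"
doubled_pattern = "\\|\\|##"  # same pattern, written the way Java needs it

same = raw_pattern == doubled_pattern
tokens = re.split(raw_pattern, line)

print(same, tokens)
```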
Use String.split() with multiple delimiters
I think you need to include the regex OR operator:
String[] tokens = pdfName.split("-|\\.");
What you have will match:
[DASH followed by DOT together] -.
not
[DASH or DOT, either of them] - or .
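The difference between the two patterns can be checked quickly. A short Python sketch (the file name is a made-up example, and Python stands in for Java here):

```python
import re

pdf_name = "report-2020.final.pdf"  # hypothetical file name

# Alternation: split on a dash OR a dot.
either = re.split(r"-|\.", pdf_name)

# Sequence: split only where a dash is immediately followed by a dot.
sequence = re.split(r"-\.", pdf_name)

print(either)
print(sequence)
```

Because the sample contains no "-." sequence, the second call returns the whole name as a single element.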
split char string with multi-character delimiter in C
Finding the point at which the desired sequence occurs is pretty easy, since strstr supports that:
char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");
So, at that point, pos points to the first location of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string, and 2) stores a pointer to the "current" location in the string internally.
If we didn't mind doing roughly the same, we could do something like this:
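#include <string.h>

char *multi_tok(char *input, char *delimiter) {
    /* Remembers where the next token starts between calls, like strtok. */
    static char *string;

    if (input != NULL)
        string = input;

    if (string == NULL)
        return string;

    char *end = strstr(string, delimiter);

    /* No further delimiter: return the remainder and finish. */
    if (end == NULL) {
        char *temp = string;
        string = NULL;
        return temp;
    }

    char *temp = string;

    /* Terminate the current token and step past the delimiter. */
    *end = '\0';
    string = end + strlen(delimiter);
    return temp;
}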
This does work. For example:
#include <stdio.h>

int main() {
    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
}
produces roughly the expected output (each token keeps the spaces that surrounded the delimiter):
this is
a big
input string
to split up
Nonetheless, it's clumsy, difficult to make thread-safe (you have to make its internal string variable thread-local) and generally just a crappy design. Using (for one example) an interface something like strtok_r, we can fix at least the thread-safety issue:
#include <stdio.h>
#include <string.h>

typedef char *multi_tok_t;

char *multi_tok(char *input, multi_tok_t *string, char *delimiter) {
    if (input != NULL)
        *string = input;

    if (*string == NULL)
        return *string;

    char *end = strstr(*string, delimiter);

    if (end == NULL) {
        char *temp = *string;
        *string = NULL;
        return temp;
    }

    char *temp = *string;

    *end = '\0';
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init() { return NULL; }

int main() {
    multi_tok_t s=init();

    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
}
I guess I'll leave it at that for now though--to get a really clean interface, we really want to reinvent something like coroutines, and that's probably a bit much to post here.
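To give a rough idea of what a coroutine-style interface looks like, here is a sketch using a Python generator (an illustration in a different language, not the C design the answer has in mind). The generator keeps its position in a local variable, so there is no shared state and the input is never modified:

```python
def multi_tok(text, delimiter):
    """Yield the pieces of text found between occurrences of delimiter."""
    if not delimiter:
        raise ValueError("delimiter cannot be empty")
    start = 0
    while True:
        end = text.find(delimiter, start)
        if end == -1:
            # No further delimiter: yield the remainder and stop.
            yield text[start:]
            return
        yield text[start:end]
        # Resume scanning just past the delimiter.
        start = end + len(delimiter)


tokens = list(multi_tok("this is abc a big abc input string abc to split up", "abc"))
for token in tokens:
    print(token)
```

Each call to multi_tok creates an independent iterator, which is the property the strtok_r-style version approximates with its explicit state argument.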