Remove Strings by a Specific Delimiter

How to remove string before and after certain delimiter positions in R?

You could use strsplit

sapply(strsplit(tt, "_"), "[[", 6)
#[1] "S50" "S50" "S62"

Explanation: We use vectorised strsplit to split tt on every "_" resulting in a list; sapply(..., "[[", 6) then extracts the 6th element from every list element.

Alternatively you could use an explicit anonymous function

sapply(strsplit(tt, "_"), function(x) x[6])
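For comparison, the same split-and-index idea can be sketched in Python. The sample values in tt below are hypothetical, since the original vector is not shown; they are only chosen so that the sixth "_"-separated field matches the output above.

```python
# Hypothetical input: the original `tt` vector is not shown in the answer.
tt = [
    "a_b_c_d_e_S50_f",
    "a_b_c_d_e_S50_g",
    "a_b_c_d_e_S62_h",
]

def sixth_field(s, sep="_"):
    """Return the 6th sep-delimited field (index 5, as Python is 0-based)."""
    return s.split(sep)[5]

result = [sixth_field(s) for s in tt]
print(result)  # ['S50', 'S50', 'S62']
```

Note the off-by-one difference: R's "[[" with 6 corresponds to index 5 in Python.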

Remove strings by a specific delimiter

Just use the POSIX-compatible split() function on $2:

awk '{split($2,temp,":"); $2=temp[2];}1' file
--- 16050075 16050075 A G
--- 16050115 16050115 G A
--- 16050213 16050213 C T
--- 16050319 16050319 C T
--- 16050527 16050527 C A

This splits column 2 on the delimiter :, sets $2 to the required element (temp[2]), and prints the record. The trailing 1 is an always-true pattern whose default action prints the line; because $2 was assigned, awk rebuilds the record from its fields, joined by OFS.

I recommend this over using multiple delimiters: adding : as a field separator changes the absolute position of the individual fields, while split() leaves the positions intact and just extracts the required value.


For your updated requirement to add a new column, just do

awk '{split($2,temp,":"); $2=temp[1] FS temp[2];}1' file
--- 22 16050075 16050075 A G
--- 22 16050115 16050115 G A
--- 22 16050213 16050213 C T
--- 22 16050319 16050319 C T
--- 22 16050527 16050527 C A

Alternatively, if you have GNU awk (gawk), you can use its gensub() function for a regex-based extraction (using the POSIX character class [[:digit:]]):

awk '{$2=gensub(/^([[:digit:]]+):([[:digit:]]+).*$/,"\\1 \\2","g",$2);}1' file
--- 22 16050075 16050075 A G
--- 22 16050115 16050115 G A
--- 22 16050213 16050213 C T
--- 22 16050319 16050319 C T
--- 22 16050527 16050527 C A

The gensub(/^([[:digit:]]+):([[:digit:]]+).*$/,"\\1 \\2","g",$2) call captures the first two :-delimited values of $2 in two capturing groups and replaces the field with them, separated by a space ("\\1 \\2"). The remaining fields are printed unchanged.
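As a rough illustration of what both awk variants do, here is a minimal Python sketch. The input line and the function name split_second_field are hypothetical (the original file is not shown); the only assumption is that the second whitespace-separated field has the form "a:b:...".

```python
def split_second_field(line, keep_prefix=False):
    """Replace field 2 ("a:b:...") with "b", or with "a b" if keep_prefix."""
    fields = line.split()             # like awk's default whitespace splitting
    parts = fields[1].split(":")      # like split($2, temp, ":")
    if keep_prefix:
        fields[1] = parts[0] + " " + parts[1]   # like temp[1] FS temp[2]
    else:
        fields[1] = parts[1]                    # like temp[2]
    return " ".join(fields)           # like awk rebuilding the record

# Hypothetical input line, assumed to resemble the asker's file.
line = "--- 22:16050075:A:G 16050075 A G"
print(split_second_field(line))                    # --- 16050075 16050075 A G
print(split_second_field(line, keep_prefix=True))  # --- 22 16050075 16050075 A G
```

As with the awk version, the other fields keep their positions because only the second field is rewritten.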

removing ' delimiters from a string

You can just do

toSplit = toSplit.Replace("'", "");

before you split

But I am not quite understanding your question. Your title says you want to remove ' from a string.

I am also unsure how your code gets 4 objects in an array by splitting by ' since there is only one in your string.

The array would look like that if you did a split with a space character.

So do this to get the output you want:

string toSplit = "hello how 'are u";
toSplit = toSplit.Replace("'", "");   // drop the apostrophe first
string[] arr = toSplit.Split(' ');    // then split on spaces
for (int i = 0; i < arr.Length; i++)
    Console.WriteLine("arr[" + i + "]=" + arr[i]);

Removing strings between two delimiters

Use regex replace with a reluctant quantifier:

str = str.replaceAll("--/--.*?--/--\\s*", "");

The expression *? is a reluctant quantifier, which means it matches as little as possible while still matching, which in turn means it will stop at the next delimiter after the first in case there are multiple delimiter pairs in the input.

I added \s* to the end to also remove trailing spaces after the closing delimiter (which your example seemed to suggest was wanted).
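To see the difference the reluctant quantifier makes, here is a small Python sketch using the same pattern (Python's re module calls .*? non-greedy). The sample text and the names are made up for illustration.

```python
import re

# Same pattern as above: opening delimiter, as little as possible,
# closing delimiter, then any trailing whitespace.
RELUCTANT = re.compile(r"--/--.*?--/--\s*")
GREEDY = re.compile(r"--/--.*--/--\s*")  # for comparison only

# Hypothetical input with two delimiter pairs.
sample = "--/--alice--/-- hello there --/--bob--/-- goodbye"

reluctant_result = RELUCTANT.sub("", sample)
greedy_result = GREEDY.sub("", sample)

print(reluctant_result)  # hello there goodbye
print(greedy_result)     # goodbye
```

The greedy version swallows everything between the first opening delimiter and the last closing one, which is why the reluctant form is needed when a line can contain several pairs.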


To use this approach, you're going to have to read the text file a line at a time, rather than a word at a time, process each line to remove the username, and then split it into words:

while (textFile.hasNextLine()) {
    // Strip the delimited username, then split the rest on whitespace
    String line = textFile.nextLine().trim().toLowerCase();
    for (String word : line.replaceAll("--/--.*?--/--\\s*", "").split("\\s+")) {
        words.add(word);
    }
}

Python pandas: remove everything after a delimiter in a string

You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::', and index the list that's created from the split method:

>>> import pandas as pd
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a

Here's a non-pandas solution:

>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new text_new1
0  vendor a::ProductA  vendor a  vendor a
1  vendor b::ProductA  vendor b  vendor b
2  vendor a::Productb  vendor a  vendor a

Edit: Here's the step-by-step explanation of what's happening in pandas above:

# Select the pandas.Series object you want
>>> df['text']
0    vendor a::ProductA
1    vendor b::ProductA
2    vendor a::Productb
Name: text, dtype: object

# using pandas.Series.str allows us to implement "normal" string methods
# (like split) on a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>

# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0    [vendor a, ProductA]
1    [vendor b, ProductA]
2    [vendor a, Productb]
Name: text, dtype: object

# using the pandas.Series.str method, again, we will be able to index through
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>

# now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0    vendor a
1    vendor b
2    vendor a
Name: text, dtype: object

I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.
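If you only need the part before the first delimiter, plain Python's str.partition is another option worth knowing. The sketch below uses a hypothetical helper name, before_delimiter, and sticks to the standard library; it could be applied to a column with a list comprehension like the one above.

```python
def before_delimiter(value, sep="::"):
    """Return everything before the first occurrence of sep.

    If sep is absent, the whole string is returned unchanged.
    """
    head, _sep, _tail = value.partition(sep)
    return head

texts = ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]
cleaned = [before_delimiter(t) for t in texts]
print(cleaned)  # ['vendor a', 'vendor b', 'vendor a']
```

Unlike a full split, partition stops at the first delimiter, so a value such as "x::y::z" still yields "x" without building the complete list of pieces.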

Remove all characters after a delimiter in a string

An improvement on the answer by Luke Joshua Park is to parse the link URL relative to the URL of the source page. This creates an absolute URL from what might be a relative URL on the page (scheme not specified, host not specified, or relative path). Another improvement is to check for and handle errors.

import "net/url"

func clean(pageURL, linkURL string) (string, error) {
    p, err := url.Parse(pageURL)
    if err != nil {
        return "", err
    }
    // Resolve the link relative to the page URL
    l, err := p.Parse(linkURL)
    if err != nil {
        return "", err
    }
    l.Fragment = "" // chop off the fragment
    return l.String(), nil
}

If you are not interested in getting an absolute URL, then chop off everything after the #. This works because the only valid use of # in a URL is the fragment separator.

import "strings"

func clean(linkURL string) string {
    i := strings.LastIndexByte(linkURL, '#')
    if i < 0 {
        return linkURL // no fragment present
    }
    return linkURL[:i]
}
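The resolve-then-strip approach can also be sketched in Python with the standard urllib.parse module. The URLs below are placeholders chosen for illustration; the function name clean simply mirrors the Go code.

```python
from urllib.parse import urljoin, urldefrag

def clean(page_url, link_url):
    """Resolve link_url against page_url and drop any #fragment."""
    absolute = urljoin(page_url, link_url)           # relative -> absolute
    without_fragment, _fragment = urldefrag(absolute)
    return without_fragment

# Placeholder URLs, not taken from the original question.
page = "https://example.com/docs/index.html"
print(clean(page, "guide.html#install"))  # https://example.com/docs/guide.html
print(clean(page, "#top"))                # https://example.com/docs/index.html
```

Here urljoin plays the role of p.Parse(linkURL), and urldefrag replaces setting l.Fragment to the empty string.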

remove delimiter in the middle of a string

Explode the string on |, pop off the last item with array_pop(), implode() the remaining items back together with ", ", and finally append the last item (the return value of array_pop()) at the end.

<?php
$str = 'item1|item2|item 3|yyyy-mm-dd';

$array = explode('|', $str);

$last = array_pop($array);

echo implode (', ', $array).' '.$last;

https://3v4l.org/ShFJX

Result:

item1, item2, item 3 yyyy-mm-dd


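If you have a string with empty fields, like item1|item2|item 3||item 4|||yyyy-mm-dd, you could use array_filter to drop the empty entries (note that without a callback it also drops other falsy values, such as "0"):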

<?php
$str = 'item1|item2|item 3||item 4|||yyyy-mm-dd';

$array = explode('|', $str);

$array = array_filter($array);

$last = array_pop($array);

echo implode (', ', $array).' '.$last;

https://3v4l.org/W0kPn

Result:

item1, item2, item 3, item 4 yyyy-mm-dd
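The same explode / filter / pop / implode sequence can be sketched in Python. The function name reformat is hypothetical, and the sketch assumes the string contains at least two non-empty fields.

```python
def reformat(record, sep="|"):
    """Join all fields but the last with ', ' and append the last after a space."""
    parts = [p for p in record.split(sep) if p]  # drop empty fields only
    *items, last = parts                         # assumes >= 1 non-empty field
    return ", ".join(items) + " " + last

print(reformat("item1|item2|item 3|yyyy-mm-dd"))
# item1, item2, item 3 yyyy-mm-dd
print(reformat("item1|item2|item 3||item 4|||yyyy-mm-dd"))
# item1, item2, item 3, item 4 yyyy-mm-dd
```

One difference from the PHP version: the comprehension removes only empty strings, so a field containing "0" is kept.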

String.Split(), empty strings and method deleting specified characters

string.Split() method:

" ".Split(); will result in an array with 2 string.Empty items as there is nothing (empty) on either side of the space character.

" something".Split(); and "something ".Split(); will each result in an array with two items, one of which is an empty string, because one side of the space character is empty.

"a  b".Split(); //double space in between

The first space has a on its left and an empty string on its right (the right side is empty because another delimiter follows immediately). The second space has that same empty string on its left and b on its right, so the empty string appears only once and the result will be:

{"a","","b"}
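Python's str.split behaves the same way when it is given an explicit separator, so the cases above can be checked quickly with a short sketch. This is only an analogy to the C# behaviour: calling split() with no argument in Python collapses runs of whitespace instead, so the explicit " " matters here.

```python
samples = [" ", " something", "something ", "a  b"]  # "a  b" has a double space

# Split each sample on a single space, keeping empty entries.
split_results = {s: s.split(" ") for s in samples}
for s, parts in split_results.items():
    print(repr(s), "->", parts)

# Dropping the empty entries afterwards, if they are not wanted:
non_empty = [part for part in "a  b".split(" ") if part]
print(non_empty)  # ['a', 'b']
```

The double-space case yields three items, with a single empty string between a and b.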

C# - Remove Beginning of String then Splitting by a delimiter

Split returns a string array (string[]) while Remove returns a string. You need different variables to store these values:

string delimiterString = numbers.Substring(2, 1);
char delimiter = delimiterString[0];
string resultSource = numbers.Remove(0, 5);
string[] result = resultSource.Split(delimiter);

Also note that you misplaced the array brackets. The sample code you posted shouldn't compile.


