How to remove string before and after certain delimiter positions in R?
You could use strsplit
sapply(strsplit(tt, "_"), "[[", 6)
#[1] "S50" "S50" "S62"
Explanation: strsplit is vectorised, so it splits every element of tt on "_" and returns a list of character vectors; sapply(..., "[[", 6) then extracts the 6th element from each list element.
Alternatively you could use an explicit anonymous function
sapply(strsplit(tt, "_"), function(x) x[6])
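For comparison, here is a minimal Python sketch of the same split-and-index idea. The contents of tt are not shown above, so the sample strings and the helper name nth_field are assumptions made only for illustration.

```python
# Hypothetical sample data; the original `tt` vector is not shown in the answer.
tt = [
    "a_b_c_d_e_S50_f",
    "g_h_i_j_k_S50_l",
    "m_n_o_p_q_S62_r",
]


def nth_field(text, n, sep="_"):
    """Return the n-th (1-based) field of text split on sep, or None if it is missing."""
    parts = text.split(sep)
    return parts[n - 1] if len(parts) >= n else None


# Split every string on "_" and keep the 6th piece of each.
sixth = [nth_field(s, 6) for s in tt]
print(sixth)  # ['S50', 'S50', 'S62']
```

Returning None for short inputs is a design choice of this sketch; whether a missing field should be an error instead depends on your data.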
Remove strings by a specific delimiter
Just use the POSIX-compatible split() function on $2:
awk '{split($2,temp,":"); $2=temp[2];}1' file
--- 16050075 16050075 A G
--- 16050115 16050115 G A
--- 16050213 16050213 C T
--- 16050319 16050319 C T
--- 16050527 16050527 C A
Split column 2 on the delimiter :, update $2 to the required element (temp[2]), and print the line. Assigning to $2 makes awk rebuild the record from its individual fields using OFS, and the trailing 1 is an always-true pattern that prints the rebuilt record.
I recommend this over setting multiple delimiters as the field separator, since that changes the absolute position of the individual fields, while split() makes it easy to retain the positions and just extract the required value.
For your updated requirement to add a new column, just do
awk '{split($2,temp,":"); $2=temp[1] FS temp[2];}1' file
--- 22 16050075 16050075 A G
--- 22 16050115 16050115 G A
--- 22 16050213 16050213 C T
--- 22 16050319 16050319 C T
--- 22 16050527 16050527 C A
Alternatively, if you have GNU awk (gawk), you can use its gensub() for a regex-based extraction (using the POSIX character class [[:digit:]]):
awk '{$2=gensub(/^([[:digit:]]+):([[:digit:]]+).*$/,"\\1 \\2","g",$2);}1' file
--- 22 16050075 16050075 A G
--- 22 16050115 16050115 G A
--- 22 16050213 16050213 C T
--- 22 16050319 16050319 C T
--- 22 16050527 16050527 C A
The gensub(/^([[:digit:]]+):([[:digit:]]+).*$/,"\\1 \\2","g",$2) call matches the first two numeric fields delimited by : with capturing groups and replaces the whole of $2 with them, referenced as \\1 and \\2 in the replacement string; the rest of the fields are printed as they are.
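The same idea (split one field, keep the other field positions intact) can be sketched in Python. The input file is not shown above, so the sample line and the function name rewrite_second_field are assumptions; the sketch also leaves a line untouched when field 2 has no delimiter, which is a choice of this example rather than what the awk commands do.

```python
def rewrite_second_field(line, keep_both=False, sep=":"):
    """Split field 2 on sep and keep its second piece (or its first two pieces)."""
    fields = line.split()              # whitespace split, like awk's default FS
    if len(fields) < 2:
        return line
    pieces = fields[1].split(sep)
    if len(pieces) < 2:
        return " ".join(fields)        # no delimiter in field 2: nothing to extract
    fields[1] = " ".join(pieces[:2]) if keep_both else pieces[1]
    return " ".join(fields)            # re-join with single spaces, like awk's default OFS


# Assumed input line; the real input file is not shown in the answer.
sample = "--- 22:16050075:A:G 16050075 A G"
print(rewrite_second_field(sample))                  # --- 16050075 16050075 A G
print(rewrite_second_field(sample, keep_both=True))  # --- 22 16050075 16050075 A G
```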
removing ' delimiters from a string
You can just do this before you split:
toSplit = toSplit.Replace("'", "");
But I am not quite understanding your question. Your title says you want to remove ' from a string.
I am also unsure how your code gets 4 objects in an array by splitting by ' since there is only one in your string.
The array would look like that if you did a split with a space character.
So do this to get the output you want:
string toSplit = "hello how 'are u";
toSplit = toSplit.Replace("'", "");
string[] arr = toSplit.Split(' ');
for (int i=0 ; i < arr.Length ; i++)
Console.Write("arr[i]="+ arr[i]);
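For readers who think in Python, the same two steps (drop the quote character, then split on spaces) look roughly like this. It is only a sketch of the approach; the variable names mirror the C# above.

```python
to_split = "hello how 'are u"

# Step 1: remove every apostrophe from the string.
cleaned = to_split.replace("'", "")

# Step 2: split on single spaces, like Split(' ') in the C# version.
arr = cleaned.split(" ")

for i, word in enumerate(arr):
    print(f"arr[{i}]={word}")
```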
Removing strings between two delimiters
Use regex replace with a reluctant quantifier:
str = str.replaceAll("--/--.*?--/--\\s*", "");
The .*? part uses a reluctant quantifier, which matches as little as possible while still matching; it therefore stops at the first closing delimiter rather than the last one when the input contains multiple delimiter pairs.
I added \s* to the end to also remove whitespace after the closing delimiter (which your example seemed to suggest was wanted).
To use this approach, you're going to have to read the text file a line at a time rather than a word at a time, process each line to remove the username, then split it into words:
while (textFile.hasNextLine()) {
    // Strip the delimited username from the whole line, then split into words
    for (String word : textFile.nextLine().trim().toLowerCase().replaceAll("--/--.*?--/--\\s*", "").split("\\s+")) {
        words.add(word);
    }
}
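To see the difference between the reluctant and the greedy quantifier in isolation, here is a small Python sketch using the same pattern. The sample line and the function names are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Same pattern as above: reluctant match between two "--/--" markers,
# plus any whitespace that follows the closing marker.
DELIMITED = re.compile(r"--/--.*?--/--\s*")


def strip_delimited(line):
    """Remove every --/--...--/-- span (and trailing whitespace) from line."""
    return DELIMITED.sub("", line)


def words_from_lines(lines):
    """Process the input line by line, then split each cleaned line into words."""
    words = []
    for line in lines:
        cleaned = strip_delimited(line).strip().lower()
        words.extend(cleaned.split())   # split() with no argument skips empty tokens
    return words


# Hypothetical input line with two delimiter pairs.
sample = "--/--alice--/-- Hello there --/--bob--/-- General Kenobi"

print(strip_delimited(sample))                   # Hello there General Kenobi
print(re.sub(r"--/--.*--/--\s*", "", sample))    # greedy version: General Kenobi
print(words_from_lines([sample]))
```

The greedy variant swallows everything between the first and the last marker, which is why the reluctant form is the one to use when a line can contain several delimiter pairs.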
Python pandas: remove everything after a delimiter in a string
You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::' and index the list that's created by the split method:
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
Here's a non-pandas solution:
>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new  text_new1
0  vendor a::ProductA  vendor a   vendor a
1  vendor b::ProductA  vendor b   vendor b
2  vendor a::Productb  vendor a   vendor a
Edit: Here's a step-by-step explanation of what's happening in the pandas code above:
# Select the pandas.Series object you want
>>> df['text']
0 vendor a::ProductA
1 vendor b::ProductA
2 vendor a::Productb
Name: text, dtype: object
# using pandas.Series.str allows us to implement "normal" string methods
# (like split) on a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>
# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0 [vendor a, ProductA]
1 [vendor b, ProductA]
2 [vendor a, Productb]
Name: text, dtype: object
# using the pandas.Series.str method, again, we will be able to index through
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>
# now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0 vendor a
1 vendor b
2 vendor a
Name: text, dtype: object
I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.
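If you only ever need the part before the first delimiter, the per-element operation can also be sketched with str.partition, which stops at the first occurrence instead of building the full list. This is plain Python with no pandas dependency; the helper name before_delimiter and the last sample row are assumptions added for illustration.

```python
def before_delimiter(text, sep="::"):
    """Return everything before the first occurrence of sep (or all of text if absent)."""
    head, _, _ = text.partition(sep)
    return head


rows = [
    "vendor a::ProductA",
    "vendor b::ProductA",
    "vendor a::Productb",
    "no delimiter here",   # hypothetical row without the delimiter
]

print([before_delimiter(r) for r in rows])
# ['vendor a', 'vendor b', 'vendor a', 'no delimiter here']
```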
Remove all characters after a delimiter in a string
An improvement on the answer by Luke Joshua Park is to parse the URL relative to the URL of the source page. This creates an absolute URL from what might be a relative URL on the page (scheme not specified, host not specified, relative path). Another improvement is to check and handle errors.
func clean(pageURL, linkURL string) (string, error) {
	p, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	// Resolve the link relative to the page URL
	l, err := p.Parse(linkURL)
	if err != nil {
		return "", err
	}
	l.Fragment = "" // chop off the fragment
	return l.String(), nil
}
If you are not interested in getting an absolute URL, then chop off everything after the #. This works because the only valid use of # in a URL is the fragment separator.
func clean(linkURL string) string {
	i := strings.LastIndexByte(linkURL, '#')
	if i < 0 {
		return linkURL
	}
	return linkURL[:i]
}
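The same two strategies can be sketched in Python with the standard urllib.parse module. The function names and the example URLs are assumptions for illustration; urljoin and urldefrag are the standard-library calls doing the work.

```python
from urllib.parse import urldefrag, urljoin


def clean(page_url, link_url):
    """Resolve link_url against page_url and drop the fragment."""
    absolute = urljoin(page_url, link_url)
    return urldefrag(absolute).url


def strip_fragment(link_url):
    """Chop off '#' and everything after it, without resolving the URL."""
    head, _, _ = link_url.partition("#")
    return head


# Hypothetical page and links.
page = "https://example.com/docs/index.html"
print(clean(page, "../about#team"))               # https://example.com/about
print(clean(page, "//cdn.example.com/x.js#v1"))   # https://cdn.example.com/x.js
print(strip_fragment("/path/page.html#section"))  # /path/page.html
```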
remove delimiter in the middle of a string
Explode the string on |, pop off the last item using array_pop(), implode() the remaining items back together with ", ", then finally append the last item (the return value of array_pop()) at the end.
<?php
$str = 'item1|item2|item 3|yyyy-mm-dd';
$array = explode('|', $str);
$last = array_pop($array);
echo implode (', ', $array).' '.$last;
https://3v4l.org/ShFJX
Result:
item1, item2, item 3 yyyy-mm-dd
If you have a string with empty items, such as item1|item2|item 3||item 4|||yyyy-mm-dd, you could use array_filter() to drop them first:
<?php
$str = 'item1|item2|item 3||item 4|||yyyy-mm-dd';
$array = explode('|', $str);
$array = array_filter($array);
$last = array_pop($array);
echo implode (', ', $array).' '.$last;
https://3v4l.org/W0kPn
Result:
item1, item2, item 3, item 4 yyyy-mm-dd
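As a cross-check of the logic, here is a rough Python sketch of the same explode / filter / pop / implode sequence. The function name reformat is an assumption; the two input strings are the ones used above.

```python
def reformat(record, sep="|"):
    """Join all but the last non-empty item with ', ' and append the last item after a space."""
    items = [part for part in record.split(sep) if part]   # drop empty pieces
    if not items:
        return ""
    *head, last = items                                     # "pop" the final item
    return ", ".join(head) + " " + last if head else last


print(reformat("item1|item2|item 3|yyyy-mm-dd"))
# item1, item2, item 3 yyyy-mm-dd
print(reformat("item1|item2|item 3||item 4|||yyyy-mm-dd"))
# item1, item2, item 3, item 4 yyyy-mm-dd
```

One difference worth knowing: this sketch only drops empty strings, whereas PHP's array_filter() without a callback also drops other falsy values such as the string "0".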
String.Split(), empty strings and method deleting specified characters
The string.Split() method:
" ".Split();
will result in an array with two string.Empty items, as there is nothing (an empty string) on either side of the space character.
" something".Split(); and "something ".Split();
will each result in an array with two items, one of which is an empty string, because one side of the space character is empty.
"a  b".Split(); //double space in between
The first space has a on its left side and an empty string on its right side (the right side is empty because another delimiter follows immediately); the second space has that same empty string on its left side and b on its right side. Two delimiters therefore give three items, so the result will be:
{"a","","b"}
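Python's str.split with an explicit separator follows the same counting rule (n separators always produce n + 1 pieces, some of which may be empty), so it makes a convenient sandbox for the cases discussed here. This is a Python illustration of the rule, not C# code.

```python
single_space = " ".split(" ")
leading = " something".split(" ")
trailing = "something ".split(" ")
double_space = "a  b".split(" ")      # two spaces between a and b

print(single_space)   # ['', '']
print(leading)        # ['', 'something']
print(trailing)       # ['something', '']
print(double_space)   # ['a', '', 'b']

# Dropping the empty pieces afterwards gives only the real tokens.
non_empty = [piece for piece in double_space if piece]
print(non_empty)      # ['a', 'b']
```

In C#, passing StringSplitOptions.RemoveEmptyEntries to Split is the usual way to get the filtered result directly.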
C# - Remove Beginning of String then Splitting by a delimiter
Split returns a string array (string[]) while Remove returns a string. You need different variables to store these values:
string delimiterString = numbers.Substring(2, 1);
char delimiter = delimiterString[0];
string resultSource = numbers.Remove(0, 5);
string[] result = resultSource.Split(delimiter);
Also note that you misplaced the array brackets. The sample code you posted shouldn't compile.
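The point about keeping the two results in differently typed variables carries over to other languages too. Below is a minimal Python sketch of the same steps; the input format is not shown above, so the sample string "//;--1;2;3" and the function name split_after_header are purely hypothetical.

```python
def split_after_header(numbers):
    """Take the delimiter from index 2, drop the first 5 characters, split the rest."""
    if len(numbers) < 5:
        raise ValueError("input is shorter than the 5-character header")
    delimiter = numbers[2]        # a single character, like numbers.Substring(2, 1)[0]
    body = numbers[5:]            # a string, like numbers.Remove(0, 5)
    return body.split(delimiter)  # a list of strings, like Split(delimiter)


# Hypothetical input: a 5-character header whose third character is the delimiter.
sample = "//;--1;2;3"
print(split_after_header(sample))  # ['1', '2', '3']
```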