Deleting a Specific Character in a String from a Pattern

How to remove anything that starts with specific character (@) in various locations in string in R?

We can change the pattern to match zero or more whitespace characters (\\s*), followed by @ and one or more non-whitespace characters (\\S+), and pass it to str_remove_all to remove those substrings:

library(stringr)
library(dplyr)
Customer %>%
mutate(Cleaned_Tweet = str_remove_all(Tweet, "\\s*@\\S+"))

-output

 ID                                                                 Tweet                                         Cleaned_Tweet
1 1 @ChipotleTweets @ChipotleTweets Becky is very nice Becky is very nice
2 2 Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets Happy Halloween! I now look forward to $3 booritos at
3 3 Considering walking to @.ChipotleTweets in my llama onesie. Considering walking to in my llama onesie.

NOTE: str_remove removes only the first match; if a string contains more than one match, the others are left in place. Use str_remove_all to remove every match of the pattern.
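
For comparison, the first-match versus all-matches distinction can be sketched in Python, where the count argument of re.sub plays roughly the role of choosing between str_remove and str_remove_all. This is only an illustration in another language, not part of the R answer; the variable names are made up and the sample tweet is the first row of the data.

```python
import re

# Same pattern as the R answer: optional whitespace, "@", then non-whitespace.
MENTION = re.compile(r"\s*@\S+")

tweet = "@ChipotleTweets @ChipotleTweets Becky is very nice"

# count=1 removes only the first match (analogous to str_remove).
first_only = MENTION.sub("", tweet, count=1)

# The default count=0 removes every match (analogous to str_remove_all).
all_removed = MENTION.sub("", tweet)

print(repr(first_only))
print(repr(all_removed))
```

Note that a leading space survives in the fully cleaned string, because \s* only consumes whitespace in front of each @.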

data

Customer <- structure(list(ID = 1:3, Tweet = c("@ChipotleTweets @ChipotleTweets Becky is very nice", 
"Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets",
"Considering walking to @.ChipotleTweets in my llama onesie."
)), class = "data.frame", row.names = c(NA, -3L))

How to remove a specific special character pattern from a string

Try:

s = s.replaceAll("<NOUN>|</NOUN>", ""); // strings are immutable, so keep the returned value

In a regex, [...] is a character class: it matches any single character listed inside the brackets, regardless of the order they appear in. Therefore, in your example, every occurrence of "<", "N", "O" etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".

The following should also work (and could be considered more DRY and elegant) since it will match the tag both with and without the forward slash:

s = s.replaceAll("</?NOUN>", "");
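
The difference between the character class and the whole-tag pattern is easy to see side by side. Here is a minimal sketch in Python (the regex syntax for these two patterns is the same as in Java); the sample sentence is invented for illustration.

```python
import re

s = "The <NOUN>dog</NOUN> chased the NOUN."

# A character class removes every single "<", "N", "O", "U" or ">" it finds.
with_class = re.sub(r"[<NOUN>]", "", s)

# An optional "/" matches the opening and the closing tag as whole units.
with_tag = re.sub(r"</?NOUN>", "", s)

print(with_class)
print(with_tag)
```

The class version also eats the plain word NOUN and leaves the stray "/" behind, which is why the tag has to be matched as a sequence.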

Deleting a specific character in a string from a pattern

Try this approach, which uses the SUBSTRING and LEN functions:

-- substring positions are 1-based: (0, 9) returns characters 1-8, and the second call resumes at character 10
Update tableName
Set columnName = substring(columnName, 0, 9) +
    substring(columnName, 10, len(columnName))
Where substring(columnName, 9, 1) = '0'
And len(columnName) > 15

Demo

Create table #Temp (Col1 varchar(20))
insert into #Temp values ('18_0231_0121_001') -- Remove the 0 in index #9
insert into #Temp values ('18_0231_0121_12') -- keep the 0 in index #9
insert into #Temp values ('18_0231_2121_001') -- there is no 0 in index #9, so keep it as it is

select * from #Temp

Result Before Update

18_0231_0121_001
18_0231_0121_12
18_0231_2121_001

Run the update as follows

update #Temp 
set Col1 = substring(Col1, 0, 9) + substring(Col1, 10, len(Col1))
where substring(Col1, 9, 1) = '0' and len(Col1) > 15

select * from #Temp

Result After Update

18_0231_121_001
18_0231_0121_12
18_0231_2121_001
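
The same rule (drop the character at position 9 only when it is a 0 and the value is longer than 15 characters) can be sketched outside the database as well. The Python below is an illustration only; the function name and its parameters are hypothetical, and the rows are the ones from the demo.

```python
def drop_zero_at(value: str, position: int = 9, min_length: int = 16) -> str:
    """Remove the character at 1-based `position` if it is '0' and the value is long enough."""
    index = position - 1  # convert the SQL-style 1-based position to a 0-based index
    # Slicing (rather than indexing) avoids an IndexError on short values.
    if len(value) >= min_length and value[index:index + 1] == "0":
        return value[:index] + value[index + 1:]
    return value


rows = ["18_0231_0121_001", "18_0231_0121_12", "18_0231_2121_001"]
cleaned = [drop_zero_at(r) for r in rows]
print(cleaned)
```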

Regex to find and remove char with specific pattern

You need to remove ^ (the start-of-string anchor) and replace the match with the contents of Group 1 using the $1 backreference:

var str = "This is mail@mail.text #1 but page is @001#";
var result = Regex.Replace(str, @"@([0-9]{1,3})#\z", "$1");

The @([0-9]{1,3})#\z pattern will find @, 1 to 3 digits (put inside a group), and then a # at the very end of string (\z).

Another variation: if the value must start with a digit and may be followed by up to two ASCII letters or digits, use

var result = Regex.Replace(str, @"@([0-9][0-9a-zA-Z]{0,2})#\z", "$1");

And if the value can be any one to three ASCII alphanumeric characters, just use

var result = Regex.Replace(str, @"@([0-9a-zA-Z]{1,3})#\z", "$1");
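
For readers who want to try the first pattern outside .NET, here is a rough Python equivalent. Two notational differences are worth stating: Python spells the end-of-string anchor \Z, and the group reference in the replacement is written \1 instead of $1. The variable names are illustrative.

```python
import re

text = "This is mail@mail.text #1 but page is @001#"

# "@", 1 to 3 digits captured in group 1, "#", then the very end of the string.
result = re.sub(r"@([0-9]{1,3})#\Z", r"\1", text)

print(result)
```

The @ inside mail@mail.text is left alone because it is not followed by digits and a # at the end of the string.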

Removing certain characters from a string

The following code should help:

String input = "Just to clarify, I will have strings of varying "
        + "lengths. I want to strip characters from it, the exact "
        + "ones to be determined at runtime, and return the "
        + "resulting string.";
String regx = ",."; // characters to strip (plain characters, not a regex)
char[] ca = regx.toCharArray();
for (char c : ca) {
    // String.replace treats its arguments literally, so no escaping is needed
    input = input.replace("" + c, "");
}
System.out.println(input);

How to remove all characters before a specific character in Java?

You can use .substring():

String s = "the text=text";
String s1 = s.substring(s.indexOf("=") + 1);
s1 = s1.trim(); // trim() returns a new string, so assign the result

s1 then contains everything after = in the original string. If the string contains no =, indexOf returns -1 and s1 is the whole string.

.trim() removes leading whitespace (everything before the first non-whitespace character) and trailing whitespace (everything after the last one). It does not modify the string it is called on; it returns the trimmed copy.
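
The same "take what follows the delimiter, then trim" idea can be sketched in Python with str.partition. This is an illustration rather than a translation of the Java above; the sample string has extra spaces added to show the trimming, and the fallback for a missing = is an assumption chosen to mirror the substring behavior.

```python
s = "the text= text "

# partition splits on the first "=" and returns (before, separator, after).
before, sep, after = s.partition("=")

# strip() returns a new string; the result has to be assigned to be kept.
# If there is no "=", sep is empty and we fall back to the whole string.
value = after.strip() if sep else s

print(repr(value))
```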

remove '$' characters from a string

Yes, $ is a special character in Lua pattern matching. You need to escape it with the % symbol.

local s = 'asdf$erer$iiuq'
print(s:gsub('%$', ''))

> asdfereriiuq 2
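
The trailing 2 in that output is the number of substitutions, which gsub returns alongside the new string. Python has a close counterpart in re.subn; the sketch below is only a comparison, and it uses re.escape so that $ is matched literally rather than as an anchor.

```python
import re

s = "asdf$erer$iiuq"

# re.escape("$") produces "\$", so the dollar sign is treated as a literal.
cleaned, count = re.subn(re.escape("$"), "", s)

print(cleaned, count)
```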

How to remove special characters from a string?

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one in the list; otherwise you'd have to escape it, or it would define a range (e.g. :-, would be read as "all characters in the range : to ,", which is not even a valid range because : sorts after ,).

So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
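
When the characters to remove are only known at runtime, escaping by hand is error-prone. A minimal Python sketch of the "escape everything" idea follows; it relies on re.escape to do the escaping, and the function name remove_chars is made up for this example.

```python
import re


def remove_chars(text: str, unwanted: str) -> str:
    """Delete every character of `unwanted` from `text` using an escaped character class."""
    if not unwanted:
        return text  # an empty class "[]" would not compile
    # re.escape backslash-escapes "-", "^", "]" and other regex metacharacters.
    pattern = "[" + re.escape(unwanted) + "]"
    return re.sub(pattern, "", text)


result = remove_chars("a-b+c.d^e:f,g", "-+.^:,")
print(result)
```

Because every metacharacter is escaped, the order of the characters inside the class no longer matters.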



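If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you have to escape backslashes: "[\\p{P}\\p{S}]"). The brackets matter: without them the pattern would only match a punctuation character immediately followed by a symbol.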

A third way could be something like this, if you can exactly define what should be left in your string:

String  result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.

Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String  result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which matches almost everything, since letters are not whitespace and vice versa.
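
Python's built-in re module has no \p{...} support, so a rough counterpart of the "keep letters and separators" idea can be sketched with unicodedata instead. The function name and sample string are invented for illustration. One caveat that shows up here: tab and newline belong to the control category (Cc), not Z, so this sketch removes them.

```python
import unicodedata


def keep_letters_and_separators(text: str) -> str:
    """Keep characters whose Unicode general category is a letter (L*) or a separator (Z*)."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("L", "Z")
    )


result = keep_letters_and_separators("Grüße, 世界! 123 ok?")
print(repr(result))
```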

Additional information on Unicode

Some Unicode characters seem to cause problems because they can be encoded in more than one way (as a single code point or as a combination of code points). Please refer to regular-expressions.info for more information.

Remove characters after specific character in string, then remove substring?

For string manipulation, if you just want to kill everything after the ?, you can do this

string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.IndexOf("?");
if (index >= 0)
    input = input.Substring(0, index);

Edit: If you want to remove everything after the last slash, do something like

string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.LastIndexOf("/");
if (index >= 0)
    input = input.Substring(0, index); // or index + 1 to keep slash

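Alternatively, since you're working with a URL, you can use the Uri class (note that string.Replace throws an ArgumentException when uri.Query is empty, so this form assumes a query string is present):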

System.Uri uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");
string fixedUri = uri.AbsoluteUri.Replace(uri.Query, string.Empty);
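
Parsing the URL rather than searching for ? by hand also works in other languages. As a sketch in Python, using urlsplit and urlunsplit from the standard library (the helper name strip_query is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


result = strip_query("http://www.somesite.com/what/test.aspx?hello=1")
print(result)
```

A URL that has no query string passes through unchanged, so no special case is needed.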

Remove specific characters from a string in Python

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.
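
A small sketch of what that means in practice (the sample string is made up): calling replace without assigning the result leaves line exactly as it was.

```python
line = "Hello, wor!ld@#$"

# replace() returns a new string; the original is untouched unless reassigned.
line.replace("!", "")
unchanged = line

for ch in "!@#$":
    line = line.replace(ch, "")  # rebinding the name keeps the result

print(unchanged)
print(line)
```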

Also, the way you are doing it is going to be relatively slow. It's also likely to be a bit confusing to experienced Pythonistas, who will see a doubly nested structure and think for a moment that something more complicated is going on.

In Python 2.6 and later Python 2.x versions*, you can instead use str.translate (see the Python 3 answer below):

line = line.translate(None, '!@#$')

or regular expression replacement with re.sub

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.

Python 3 answer

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string you would call something like

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or, as brought up by Joseph Lee, create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))

* for compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.


