Methods to Remove Specific Characters from String

Remove specific characters from a string in Python

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.
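As a minimal sketch of the difference (the sample text and the name unchanged are only illustrative, not from the question), calling replace without rebinding leaves the variable as it was:

```python
line = "Hello, World!"

# replace() returns a new string; the original object is untouched.
line.replace("!", "")
unchanged = line  # still "Hello, World!"

# Rebinding the name is what makes the change visible.
line = line.replace("!", "")
```

After the last statement, line refers to the new string without the exclamation mark.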

Also, the way you are doing it is going to be relatively slow. It's also likely to be a bit confusing to experienced pythonators, who will see a doubly-nested structure and think for a moment that something more complicated is going on.

Starting in Python 2.6 and newer Python 2.x versions*, you can instead use str.translate (see the Python 3 answer below):

line = line.translate(None, '!@#$')

or regular expression replacement with re.sub:

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.
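If the characters to remove are only known at runtime, some of them (such as ^, -, ] or a backslash) have a special meaning inside a character class. One way to handle that, sketched here with a hypothetical helper named remove_chars, is to pass them through re.escape before building the class:

```python
import re

def remove_chars(text, unwanted):
    """Return text with every character in `unwanted` removed (hypothetical helper)."""
    if not unwanted:
        return text  # an empty class "[]" would not be a valid pattern
    # re.escape neutralises characters such as ^, -, ] and \ inside the class.
    pattern = "[" + re.escape(unwanted) + "]"
    return re.sub(pattern, "", text)

cleaned = remove_chars("a-b^c]d\\e!", "-^]\\!")
```

Here cleaned ends up as 'abcde', because every one of the listed characters is treated literally.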

Python 3 answer

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters which should replace them, or—usefully to us—None to indicate that they should be deleted.

So to do the above dance with a Unicode string you would call something like

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or, as brought up by Joseph Lee, create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))
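The three ways of building the table shown above produce the same dictionary, so the choice is mostly a matter of taste. A small Python 3 sketch (the variable names and sample strings are illustrative only) that also builds the table once and reuses it:

```python
unwanted = "!@#$"

table_fromkeys = dict.fromkeys(map(ord, unwanted), None)
table_comprehension = {ord(c): None for c in unwanted}
table_maketrans = str.maketrans("", "", unwanted)

# All three are plain dicts mapping code points to None.
same_tables = table_fromkeys == table_comprehension == table_maketrans

# Build the table once, then reuse it for every string.
lines = ["he!!o", "w@r#d$", "clean"]
cleaned = [s.translate(table_maketrans) for s in lines]
```

Reusing one table avoids rebuilding the dictionary on every call when many strings are processed.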

* For compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.

How to remove special characters from a string?

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would be read as a range operator (e.g. :-, would be parsed as "the range from : to ,", which Java rejects as an illegal range because : sorts after ,).

So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");


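If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape backslashes: "[\\p{P}\\p{S}]"). The brackets matter: without them the pattern would only match a punctuation character immediately followed by a symbol.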

A third way could be something like this, if you can exactly define what should be left in your string:

String result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.

Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which matches almost everything, since letters are not whitespace and vice versa.

Additional information on Unicode

Some Unicode characters seem to cause problems because they can be encoded in different ways (as a single code point or as a combination of code points). Please refer to regular-expressions.info for more information.

Remove specific char from String

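Use:

NewString = OldString.replaceAll("char", "");

For the example in your comment, use:

NewString = OldString.replaceAll("d", "");

Note that replaceAll treats its first argument as a regular expression; for a literal character, OldString.replace("d", "") gives the same result without any regex parsing.

For removing Arabic characters, please see the following related questions: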

how could i remove arabic punctuation form a String in java

removing characters of a specific unicode range from a string

Removing certain characters from a string

The code below should help:

String input = "Just to clarify, I will have strings of varying "
        + "lengths. I want to strip characters from it, the exact "
        + "ones to be determined at runtime, and return the "
        + "resulting string.";
String regx = ",.";  // characters to strip (plain characters, not a regex)
char[] ca = regx.toCharArray();
for (char c : ca) {
    // replace() treats its argument literally, so "." needs no escaping
    input = input.replace("" + c, "");
}
System.out.println(input);

How to remove single character from a String by index

You can also use the StringBuilder class which is mutable.

StringBuilder sb = new StringBuilder(inputString);

It has the method deleteCharAt(), along with many other mutator methods.

Just delete the characters that you need to delete and then get the result as follows:

String resultString = sb.toString();

This avoids creation of unnecessary string objects.

Remove specific characters from String List - Python

A much simpler implementation reads the file directly and stores its content in a variable while filtering out the unwanted characters.

For example, here is the 'file1.txt' file with the content:

Hello how are you? Very good!

Then we can do the following:

def main():
    characters = '!?¿-.:;'

    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)

    # print(aux)  # Hello how are you Very good

As we can see, aux is the file's content without the unwanted characters, and it can easily be edited further based on the desired output format.

For example, if we want a list of words, we can do this:

def main():
    characters = '!?¿-.:;'

    with open('file1.txt') as f:
        aux = ''.join(c for c in f.read() if c not in characters)
    aux = aux.split()

    # print(aux)  # ['Hello', 'how', 'are', 'you', 'Very', 'good']
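For larger files it may be preferable not to read everything at once. The following is only a sketch under assumptions: strip_characters is a hypothetical helper, the second sample line is made up, and io.StringIO stands in for open('file1.txt') so the code runs without a real file. It filters line by line with str.translate:

```python
import io

def strip_characters(lines, unwanted='!?¿-.:;'):
    """Yield each line with the unwanted characters removed (hypothetical helper)."""
    table = str.maketrans('', '', unwanted)
    for line in lines:
        yield line.translate(table)

# io.StringIO stands in for open('file1.txt') so the sketch needs no file on disk.
fake_file = io.StringIO('Hello how are you? Very good!\nSee you soon.\n')
words = [word for line in strip_characters(fake_file) for word in line.split()]
```

Any iterable of lines works here, so an open file object can be passed in place of fake_file.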

How to remove all characters before a specific character in Java?

You can use .substring():

String s = "the text=text";
String s1 = s.substring(s.indexOf("=") + 1);
s1 = s1.trim();

Then s1 contains everything after = in the original string. If the string contains no =, indexOf returns -1 and s1 is simply the whole string.

.trim() returns a copy of the string with leading and trailing whitespace removed. Because strings are immutable, it does not change s1 in place, so its result has to be assigned back to s1.

How to remove certain characters from a string? [Python]

Since strings are immutable, call replace and assign the result back to cool:

cool = "cool°"
cool = cool.replace("°","")
cool
'cool'

