Remove All Special Characters from a String

How to remove special characters from a string?

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one on the list, otherwise you'd have to escape it or it would define a range ( e.g. :-, would mean "all characters in the range : to ,).

So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");



If you want to get rid of all punctuation and symbols, try this regex: \p{P}\p{S} (keep in mind that in Java strings you'd have to escape back slashes: "\\p{P}\\p{S}").

A third way could be something like this, if you can exactly define what should be left in your string:

String  result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.

Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String  result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.

Additional information on Unicode

Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.

Remove all special characters except space from a string using JavaScript

You should use the string replace function, with a single regex.
Assuming by special characters, you mean anything that's not letter, here is a solution:

const str = "abc's test#s";console.log(str.replace(/[^a-zA-Z ]/g, ""));

Remove all special characters, punctuation and spaces from string

This can be done without regex:

>>> string = "Special $#! characters   spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

You can use str.isalnum:

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.

If you insist on using regex, other solutions will do fine. However note that if it can be done without using a regular expression, that's the best way to go about it.

Remove all special characters from a string

This should do what you're looking for:

function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.

return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Edit:

Hey, just a quick question, how can I prevent multiple hyphens from being next to each other? and have them replaced with just 1?

function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}

Remove all special characters and case from string in bash

cat yourfile.txt | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'

The first tr deletes special characters. d means delete, c means complement (invert the character set). So, -dc means delete all characters except those specified. The \n and \r are included to preserve linux or windows style newlines, which I assume you want.

The second one translates uppercase characters to lowercase.

Most efficient way to remove special characters from string

Why do you think that your method is not efficient? It's actually one of the most efficient ways that you can do it.

You should of course read the character into a local variable or use an enumerator to reduce the number of array accesses:

public static string RemoveSpecialCharacters(this string str) {
StringBuilder sb = new StringBuilder();
foreach (char c in str) {
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == '_') {
sb.Append(c);
}
}
return sb.ToString();
}

One thing that makes a method like this efficient is that it scales well. The execution time will be relative to the length of the string. There is no nasty surprises if you would use it on a large string.

Edit:

I made a quick performance test, running each function a million times with a 24 character string. These are the results:

Original function: 54.5 ms.

My suggested change: 47.1 ms.

Mine with setting StringBuilder capacity: 43.3 ms.

Regular expression: 294.4 ms.

Edit 2:
I added the distinction between A-Z and a-z in the code above. (I reran the performance test, and there is no noticable difference.)

Edit 3:

I tested the lookup+char[] solution, and it runs in about 13 ms.

The price to pay is, of course, the initialization of the huge lookup table and keeping it in memory. Well, it's not that much data, but it's much for such a trivial function...

private static bool[] _lookup;

static Program() {
_lookup = new bool[65536];
for (char c = '0'; c <= '9'; c++) _lookup[c] = true;
for (char c = 'A'; c <= 'Z'; c++) _lookup[c] = true;
for (char c = 'a'; c <= 'z'; c++) _lookup[c] = true;
_lookup['.'] = true;
_lookup['_'] = true;
}

public static string RemoveSpecialCharacters(string str) {
char[] buffer = new char[str.Length];
int index = 0;
foreach (char c in str) {
if (_lookup[c]) {
buffer[index] = c;
index++;
}
}
return new string(buffer, 0, index);
}

Regex remove all special characters except numbers?

Use the global flag:

var name = name.replace(/[^a-zA-Z ]/g, "");
^

If you don't want to remove numbers, add it to the class:

var name = name.replace(/[^a-zA-Z0-9 ]/g, "");

How to remove all special Characters from a string except - . and space

Use either \s or simply a space character as explained in the Pattern class javadoc

\s - A whitespace character: [ \t\n\x0B\f\r]
- Literal space character

You must either escape - character as \- so it won't be interpreted as range expression or make sure it stays as the last regex character. Putting it all together:

filename.replaceAll("[^a-zA-Z0-9\\s.-]", "")
filename.replaceAll("[^a-zA-Z0-9 .-]", "")


Related Topics



Leave a reply



Submit