Remove Non-Alphanumeric Characters from String

How do I remove all non-alphanumeric characters from a string except dash?

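Replace [^a-zA-Z0-9 -] with an empty string. Note that this class also keeps spaces; drop the space from the class if only the dash should survive.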

Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");
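
For comparison, here is a minimal JavaScript sketch of the same character class. The helper name keepAlnumSpaceDash is made up for illustration. The hyphen sits last in the class so that it is read as a literal dash rather than as a range operator.

```javascript
// Hypothetical helper: keeps letters, digits, spaces, and dashes; removes everything else.
function keepAlnumSpaceDash(str) {
  // The trailing "-" inside [...] is a literal dash, not a range operator.
  return str.replace(/[^a-zA-Z0-9 -]/g, '');
}

console.log(keepAlnumSpaceDash('hello-world! (v2.0)')); // "hello-world v20"
```

The g flag is needed in JavaScript to replace every match; without it only the first offending character would be removed.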

Remove non-alphanumeric characters from string

Removing non-alphanumeric chars

The following is a correct regex to strip non-alphanumeric chars from an input string:

input.replace(/\W/g, '')

Note that \W is the equivalent of [^0-9a-zA-Z_], so it does not match the underscore and underscores are left in the string. To also remove underscores, use e.g.:

input.replace(/[^0-9a-z]/gi, '')
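
A short side-by-side sketch of the two patterns may make the difference concrete. The sample input is invented for illustration.

```javascript
const input = 'user_name-42!';

// \W leaves the underscore in place because "_" counts as a word character.
const keepUnderscore = input.replace(/\W/g, '');

// An explicit class with the i flag removes the underscore too.
const dropUnderscore = input.replace(/[^0-9a-z]/gi, '');

console.log(keepUnderscore); // "user_name42"
console.log(dropUnderscore); // "username42"
```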

The input is malformed

Since the test string contains various escape sequences (such as \r, \b, \f and \n), which are not alphanumeric, the regex removes them.

A backslash in the string needs escaping if it's to be taken literally:

"\\test\\red\\bob\\fred\\new".replace(/\W/g, '')
"testredbobfrednew" // output

Handling malformed strings

If you're not able to escape the input string correctly (why not?), or it's coming from some kind of untrusted/misconfigured source - you can do something like this:

JSON.stringify("\\test\red\bob\fred\new").replace(/\W/g, '')
"testredbobfrednew" // output

Note that the JSON representation of a string includes the quotes:

JSON.stringify("\\test\red\bob\fred\new")
""\\test\red\bob\fred\new""

But they are also removed by the replacement regex.
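
A small sketch wrapping this approach in a function (the name stripViaJson is hypothetical). One side effect worth knowing: the letter from each escape sequence survives, so a genuine tab or newline in the data turns into a literal t or n in the output.

```javascript
// Hypothetical helper: serialize first so control characters become visible
// escape sequences such as \n, then strip every non-word character.
function stripViaJson(value) {
  return JSON.stringify(value).replace(/\W/g, '');
}

// "\r", "\b", "\f" and "\n" in this literal are control characters, not letters.
console.log(stripViaJson("\\test\red\bob\fred\new")); // "testredbobfrednew"

// Caveat: a genuine tab in the data also leaves its escape letter behind.
console.log(stripViaJson("a\tb")); // "atb"
```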

Removing non-alphanumeric characters with sed

tr's -c (complement) flag, combined with -d (delete), may be an option:

echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
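
For readers working in JavaScript rather than the shell, a rough counterpart of the tr command is sketched below. It assumes ASCII letters and digits only, whereas [:alnum:] in tr depends on the locale.

```javascript
// Delete everything except ASCII letters, digits, ".", "_" and "-".
// The "-" is last in the class so it is treated as a literal character.
const cleaned = 'Â10.41.89.50-._ '.replace(/[^A-Za-z0-9._-]/g, '');

console.log(cleaned); // "10.41.89.50-._"
```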

Replace all non alphanumeric characters, new lines, and multiple white space with one space

Be aware that \W leaves the underscore. A short equivalent for [^a-zA-Z0-9] would be [\W_]:

text.replace(/[\W_]+/g," ");

\W is the negation of the shorthand \w, which stands for the word characters [A-Za-z0-9_] (including the underscore).

Example at regex101.com
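
A quick sketch of that replacement on an invented sample string, including new lines and repeated white space:

```javascript
const text = 'foo_bar,\n\n   baz!!  qux';

// Every run of non-alphanumeric characters (underscore included) becomes one space.
const normalized = text.replace(/[\W_]+/g, ' ');

console.log(normalized); // "foo bar baz qux"

// A run at the start or end also becomes a space, so trim() may be wanted.
const trimmed = ' (hello) '.replace(/[\W_]+/g, ' ').trim();
console.log(trimmed); // "hello"
```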

Regex to remove all non alpha-numeric and replace spaces with +

This is actually fairly straightforward.

Assuming str is the string you're cleaning up:

str = str.replace(/[^a-z0-9+]+/gi, '+');

The ^ means "anything not in this list of characters". The + after the [...] group means "one or more". /gi means "replace all of these that you find, without regard to case".

So any stretch of characters that are not letters, numbers, or '+' will be converted into a single '+'.

To remove parenthesized substrings (as requested in the comments), do this replacement first:

str = str.replace(/\(.+?\)/g, '');
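
Chained together, the two replacements could look like the sketch below. The function name toPlusSeparated and the "(TV)" sample are made up for illustration.

```javascript
// Hypothetical helper combining both replacements from above.
function toPlusSeparated(str) {
  return str
    .replace(/\(.+?\)/g, '')         // drop "(...)" groups first
    .replace(/[^a-z0-9+]+/gi, '+');  // then collapse everything else to "+"
}

console.log(toPlusSeparated('Durarara!!x2 Ten'));      // "Durarara+x2+Ten"
console.log(toPlusSeparated('Durarara!!x2 Ten (TV)')); // "Durarara+x2+Ten+"
```

As the second call shows, a space left over before a removed parenthesized part becomes a trailing '+'. If that is unwanted, a further .replace(/^\++|\++$/g, '') would strip it.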

function replacer() {
  var str = document.getElementById('before').value
    .replace(/\(.+?\)/g, '')
    .replace(/[^a-z0-9+]+/gi, '+');
  document.getElementById('after').value = str;
}
document.getElementById('replacem').onclick = replacer;
<p>Before:  <input id="before" value="Durarara!!x2 Ten" /></p>
<p> <input type="button" value="replace" id="replacem" /></p>
<p>After: <input id="after" value="" readonly /></p>

How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

Use str.replace.

df
strings
0 a#bc1!
1 a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following. Note that \W also keeps underscores:

df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object

How can I select all records with non-alphanumeric and remove them?

I suggest using REGEXP_REPLACE in the SELECT to remove the characters, and REGEXP_CONTAINS in the WHERE clause to pick only the rows you want.

SELECT REGEXP_REPLACE(EMPLOYER, r'[^a-zA-Z\d\s]', '') 
FROM fec.work
WHERE REGEXP_CONTAINS(EMPLOYER, r'[^a-zA-Z\d\s]')

You say you don't want to use replace because you don't know how many non-alphanumeric characters there are. But instead of listing every non-alphanumeric character, why not use ^ to match everything except alphanumerics?

EDIT:

To complement what Mikhail answered, you have several choices for your regex:

'[^a-zA-Z\\d\\s]'  // Basic regex
r'[^a-zA-Z\d\s]' // Uses r to avoid escaping
r'[^\w\s]' // \w = [a-zA-Z0-9_] (note: treats the underscore as alphanumeric)

If you don't consider underscores to be alphanumerical, you should not use \w

How can I remove leading and trailing non-alphanumeric characters

Try using a ^\W+|\W+$ pattern like this:

$string = preg_replace('/^\W+|\W+$/', '', $string); 

This removes any non-word characters that appear at either the beginning or the end of the string (note that \W does not match underscores, so those are kept). The | is an alternation, which matches either the pattern on its left or the pattern on its right. The ^ anchors the match to the beginning of the string, and the $ anchors it to the end.

If you also need to remove underscores, use a character class like this:

$string = preg_replace('/^[\W_]+|[\W_]+$/', '', $string); 
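
The same two patterns, sketched in JavaScript on an invented sample string, show the effect on underscores. One difference from preg_replace is assumed here: JavaScript's replace needs the g flag to act on both ends, since without it only the first match is replaced.

```javascript
const messy = '--__hello, world__!!';

// \W only: the underscores stop the match, so they stay at both ends.
const trimmedKeepUnderscore = messy.replace(/^\W+|\W+$/g, '');
console.log(trimmedKeepUnderscore); // "__hello, world__"

// [\W_]: underscores are stripped from the ends as well.
const trimmedAll = messy.replace(/^[\W_]+|[\W_]+$/g, '');
console.log(trimmedAll); // "hello, world"
```

Characters in the middle of the string, such as the comma and the space, are left alone because both alternatives are anchored.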

How to remove non-alphanumeric characters?

Sounds like you almost knew what you wanted to do already; you basically defined it as a regex.

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

Pandas remove non-alphanumeric characters from string column

You can use regex for this.

df['firstname'] = df['firstname'].str.replace('[^a-zA-Z0-9]', ' ', regex=True).str.strip()
df.firstname.tolist()
>>> ['joe down', 'lucash brown', 'antony', 'mary']

