How to Remove Non-Alphanumeric Characters

How to remove non-alphanumeric characters?

It sounds like you almost knew what you wanted already; you basically defined it as a regex:

$string = preg_replace("/[^A-Za-z0-9 ]/", '', $string); // keeps letters, digits and spaces
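For readers working in Python, here is a minimal sketch of the same negated character class, assuming the standard-library re module. The helper name strip_non_alnum_keep_spaces is hypothetical:

```python
import re

def strip_non_alnum_keep_spaces(text: str) -> str:
    """Hypothetical helper: delete everything except ASCII letters, digits and spaces."""
    # Same class as the PHP preg_replace call: the leading ^ negates the set.
    return re.sub(r"[^A-Za-z0-9 ]", "", text)

print(strip_non_alnum_keep_spaces("Hello, World! 123"))  # Hello World 123
```

Only the literal space is kept. Tabs and newlines are removed along with the punctuation.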

Replacing all non-alphanumeric characters with empty strings

Use [^A-Za-z0-9].

Note: the space has been removed from the character class, because a space is not normally considered alphanumeric.
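If you want a non-regex alternative in Python, filtering with str.isalnum() is one option. Be aware that it is Unicode-aware, while [^A-Za-z0-9] is ASCII-only. The small sketch below shows the difference. Both helper names and the sample string are made up:

```python
import re

def regex_alnum(text: str) -> str:
    # Hypothetical helper. This is ASCII-only: anything outside A-Z, a-z and 0-9 is dropped, including spaces.
    return re.sub(r"[^A-Za-z0-9]", "", text)

def isalnum_filter(text: str) -> str:
    # Hypothetical helper. str.isalnum() is Unicode-aware, so accented letters survive.
    return "".join(ch for ch in text if ch.isalnum())

sample = "Café No. 5!"  # made-up input
print(regex_alnum(sample))     # CafNo5
print(isalnum_filter(sample))  # CaféNo5
```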

How do I remove all non-alphanumeric characters from a string except the dash?

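Replace [^a-zA-Z0-9 -] with an empty string. This keeps letters, digits, spaces and dashes. The dash is treated literally because it is the last character in the class.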

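using System.Text.RegularExpressions;
// Keep letters, digits, spaces and dashes; delete everything else.
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");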
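The same pattern carries over to Python's re module unchanged. The sketch below uses a precompiled pattern, which plays the same role as reusing the C# Regex object. KEEP_DASH and strip_except_dash are hypothetical names:

```python
import re

# Hypothetical name. Keeps letters, digits, spaces and hyphens. The hyphen is last
# in the class, so it is read as a literal "-" rather than as a range operator.
KEEP_DASH = re.compile(r"[^a-zA-Z0-9 -]")

def strip_except_dash(text: str) -> str:
    # Hypothetical helper: delete every character outside the kept set.
    return KEEP_DASH.sub("", text)

print(strip_except_dash("well-known (v2.0) item#3"))  # well-known v20 item3
```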

How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

Use str.replace with regex=True. Since pandas 2.0, the pattern is treated as a literal string by default.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object

To keep alphanumeric characters, and not only letters (which is what your expected output suggests), use \W instead. Note that \W does not remove underscores:

df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
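In Python 3, \W in a str pattern is Unicode-aware and still keeps underscores. The sketch below uses the standard re module directly. As far as I know, pandas' str.replace with regex=True is built on Python's re, and it also accepts a flags argument. The sample string is made up:

```python
import re

text = "naïve_café #1!"  # made-up input

# \W matches anything that is not a word character. In Python 3 str patterns,
# word characters include Unicode letters, digits and the underscore.
print(re.sub(r"\W", "", text))                  # naïve_café1
# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letters are removed too.
print(re.sub(r"\W", "", text, flags=re.ASCII))  # nave_caf1
```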

Removing non-alphanumeric characters with sed

tr's -c (complement) flag, combined with -d (delete), may be an option. Together they delete every character that is not in the given set:

echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
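Here is a rough Python equivalent of that tr command. It assumes the C locale, where [:alnum:] means ASCII letters and digits. keep_only is a hypothetical helper name:

```python
import re

def keep_only(text: str) -> str:
    # Hypothetical helper mirroring tr -cd '[:alnum:]._-' (C locale assumed):
    # delete every character that is not an ASCII letter or digit, ".", "_" or "-".
    return re.sub(r"[^A-Za-z0-9._-]", "", text)

print(keep_only("Â10.41.89.50-._ "))  # 10.41.89.50-._
```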

How can I select all records with non-alphanumeric and remove them?

I suggest using REGEXP_REPLACE in the SELECT list to remove the characters, and REGEXP_CONTAINS in the WHERE clause to return only the rows that contain them:

SELECT REGEXP_REPLACE(EMPLOYER, r'[^a-zA-Z\d\s]', '') 
FROM fec.work
WHERE REGEXP_CONTAINS(EMPLOYER, r'[^a-zA-Z\d\s]')
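The same filter-then-clean logic can be sketched in plain Python. Here re.search stands in for REGEXP_CONTAINS and re.sub for REGEXP_REPLACE. The employer values and the helper name clean_dirty_rows are made up for illustration:

```python
import re

PATTERN = re.compile(r"[^a-zA-Z\d\s]")

def clean_dirty_rows(values):
    """Hypothetical helper: cleaned copies of only the values with a bad character."""
    # PATTERN.search acts as the WHERE filter (REGEXP_CONTAINS);
    # PATTERN.sub acts as the SELECT expression (REGEXP_REPLACE).
    return [PATTERN.sub("", v) for v in values if PATTERN.search(v)]

employers = ["ACME Corp.", "Self Employed", "O'Brien"]  # made-up sample rows
print(clean_dirty_rows(employers))  # ['ACME Corp', 'OBrien']
```

"Self Employed" is dropped entirely because it contains nothing to remove, just as the WHERE clause would exclude it.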

You say you don't want to use REPLACE because you don't know how many different non-alphanumeric characters there could be. Instead of listing every non-alphanumeric character, why not use ^ to match everything that is not alphanumeric?

EDIT:

To complement Mikhail's answer, there are several ways to write the regex:

'[^a-zA-Z\\d\\s]'  -- basic regex (backslashes escaped)
r'[^a-zA-Z\d\s]'   -- raw string (r prefix), so no escaping is needed
r'[^\w\s]'         -- \w = [a-zA-Z0-9_], so underscores count as alphanumeric

If you don't consider underscores to be alphanumeric, don't use \w.
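The underscore difference is easy to check. This sketch uses Python's re module rather than BigQuery's RE2, so treat it as illustrative only. For ASCII input like the made-up sample here, \w covers the same characters in both engines:

```python
import re

sample = "snake_case-name!"  # made-up input

# [^\w\s] keeps the underscore because \w includes it.
print(re.sub(r"[^\w\s]", "", sample))        # snake_casename
# Listing letters and digits explicitly removes the underscore as well.
print(re.sub(r"[^a-zA-Z\d\s]", "", sample))  # snakecasename
```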

Replace all non alphanumeric characters, new lines, and multiple white space with one space

Be aware that \W does not match the underscore, so underscores are left in place. A short equivalent of [^a-zA-Z0-9] is therefore [\W_]:

text = text.replace(/[\W_]+/g, " ");

\W is the negation of the shorthand \w, which matches the word characters [A-Za-z0-9_] (including the underscore).

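Here is a Python sketch of the same replacement, assuming the standard re module. In Python 3, \W is Unicode-aware, so accented letters are kept. JavaScript's \w, by contrast, only covers [A-Za-z0-9_]. collapse_to_single_spaces is a hypothetical name:

```python
import re

def collapse_to_single_spaces(text: str) -> str:
    # Hypothetical helper. [\W_]+ matches a run of non-word characters or underscores,
    # including newlines and repeated whitespace, and the whole run becomes one space.
    return re.sub(r"[\W_]+", " ", text)

print(repr(collapse_to_single_spaces("foo__bar!!\n\n  baz")))  # 'foo bar baz'
```

Leading or trailing runs also turn into a single space. Call .strip() afterwards if that is unwanted.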

How to trim all non-alphanumeric characters from start and end of a string in Javascript?

Modify your current RegExp to anchor it to the start or end of the string with ^ or $, and make it greedy with *. You can then combine the two alternatives with the OR operator |:

val = val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');

This can be simplified by using a-z with the i (case-insensitive) flag to cover all letters, and \d for digits:

val = val.replace(/^[^a-z\d]*|[^a-z\d]*$/gi, '');
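Here is the same anchored pattern translated to Python. re.IGNORECASE replaces the /i flag, and re.sub already replaces every match, so no g flag is needed. TRIM and trim_non_alnum are hypothetical names:

```python
import re

# Hypothetical name. Leading run OR trailing run of non-alphanumeric characters.
TRIM = re.compile(r"^[^a-z\d]*|[^a-z\d]*$", re.IGNORECASE)

def trim_non_alnum(text: str) -> str:
    # Hypothetical helper. Only the leading and trailing runs are removed; inner punctuation stays.
    return TRIM.sub("", text)

print(trim_non_alnum("--Hello, World!--"))  # Hello, World
```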

Regex to remove all non alpha-numeric and replace spaces with +

This is actually fairly straightforward.

Assuming str is the string you're cleaning up:

str = str.replace(/[^a-z0-9+]+/gi, '+');

The ^ at the start of the brackets means "any character not in this list". The + after the [...] character class means "one or more". The g and i flags mean "replace every match" and "ignore case".

So any stretch of characters that are not letters, numbers, or '+' will be converted into a single '+'.
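Here is a minimal Python version of this replacement, assuming the standard re module. re.IGNORECASE stands in for the /i flag. plusify is a hypothetical name:

```python
import re

def plusify(text: str) -> str:
    # Hypothetical helper. Each run of characters that are not letters, digits or "+"
    # is replaced by a single "+".
    return re.sub(r"[^a-z0-9+]+", "+", text, flags=re.IGNORECASE)

print(plusify("hello, world!"))  # hello+world+
```

A trailing "+" appears when the string ends in punctuation. Chain .strip("+") if that matters for your use.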

To remove parenthesized substrings (as requested in the comments), do this replacement first:

str = str.replace(/\(.+?\)/g, '');

function replacer() {
  // Remove "(...)" groups first, then turn each run of other characters into "+".
  var str = document.getElementById('before').value
    .replace(/\(.+?\)/g, '')
    .replace(/[^a-z0-9+]+/gi, '+');
  document.getElementById('after').value = str;
}
document.getElementById('replacem').onclick = replacer;

<p>Before: <input id="before" value="Durarara!!x2 Ten" /></p>
<p><input type="button" value="replace" id="replacem" /></p>
<p>After: <input id="after" value="" readonly /></p>
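The order of the two replacements matters. The minimal Python sketch below uses hypothetical helper names and a made-up title. It shows that running the "+" replacement first turns the parentheses into "+", so the parenthesis pattern never matches:

```python
import re

PARENS = re.compile(r"\(.+?\)")
NON_ALNUM_RUN = re.compile(r"[^a-z0-9+]+", re.IGNORECASE)

def to_query(text: str) -> str:
    # Hypothetical helper. Step 1: drop "(...)" groups while the parentheses still exist.
    without_parens = PARENS.sub("", text)
    # Step 2: collapse every remaining run of other characters into one "+".
    return NON_ALNUM_RUN.sub("+", without_parens)

def to_query_wrong_order(text: str) -> str:
    # Hypothetical helper. Replacing first turns "(" and ")" into "+", so PARENS finds nothing.
    return PARENS.sub("", NON_ALNUM_RUN.sub("+", text))

sample = "Durarara!!x2 Ten (TV)"  # made-up input
print(to_query(sample))              # Durarara+x2+Ten+
print(to_query_wrong_order(sample))  # Durarara+x2+Ten+TV+
```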

Pandas remove non-alphanumeric characters from string column

You can use regex for this.

df['firstname'] = df['firstname'].str.replace('[^a-zA-Z0-9]', ' ', regex=True).str.strip()
>>> df.firstname.tolist()
['joe down', 'lucash brown', 'antony', 'mary']
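Here is a pandas-free sketch over a plain list, using hypothetical input names. It swaps in [^a-zA-Z0-9]+ (one or more), so adjacent symbols collapse into a single space instead of leaving double spaces. The same pattern should work in the pandas call as .str.replace('[^a-zA-Z0-9]+', ' ', regex=True):

```python
import re

def clean_names(names):
    # Hypothetical helper. Replace each run of non-alphanumeric characters with one
    # space, then strip the spaces left at either end.
    return [re.sub(r"[^a-zA-Z0-9]+", " ", name).strip() for name in names]

print(clean_names(["joe--down", " mary!"]))  # ['joe down', 'mary']
```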

