How do I remove all non alphanumeric characters from a string except dash?
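Replace [^a-zA-Z0-9 -] with an empty string. Note that this class keeps spaces as well as dashes; drop the space from the class if only dashes should survive.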
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");
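For comparison, here is a minimal JavaScript sketch of the same replacement. The helper name keepAlnumSpaceDash is made up for illustration; the character class is the one from the answer above, so spaces survive along with dashes.

```javascript
// Hypothetical helper: strips everything except ASCII letters, digits,
// spaces and dashes.
function keepAlnumSpaceDash(str) {
  // The dash comes last in the class, so it is a literal dash, not a range.
  // The g flag makes replace() act on every match, not just the first.
  return str.replace(/[^a-zA-Z0-9 -]/g, "");
}

console.log(keepAlnumSpaceDash("foo_bar-baz (42)!")); // "foobar-baz 42"
```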
Remove not alphanumeric characters from string
Removing non-alphanumeric chars
The following is a correct regex to strip non-alphanumeric characters from an input string:
input.replace(/\W/g, '')
Note that \W is equivalent to [^0-9a-zA-Z_]: the underscore counts as a word character, so \W leaves it in place. To also remove underscores, use e.g.:
input.replace(/[^0-9a-z]/gi, '')
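A quick side-by-side check of the two patterns, as a sketch. The sample input is made up for illustration.

```javascript
const input = "user_name-42!";

// \W treats "_" as a word character, so the underscore survives.
const keepUnderscore = input.replace(/\W/g, "");

// An explicit negated class removes the underscore as well.
const dropUnderscore = input.replace(/[^0-9a-z]/gi, "");

console.log(keepUnderscore); // "user_name42"
console.log(dropUnderscore); // "username42"
```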
The input is malformed
Since the test string contains escape sequences (for example \t, \r and \n), which are not alphanumeric, the regex removes them.
A backslash in a string literal needs escaping if it is to be taken literally:
"\\test\\red\\bob\\fred\\new".replace(/\W/g, '')
"testredbobfrednew" // output
Handling malformed strings
If you're not able to escape the input string correctly (why not?), or it's coming from some kind of untrusted or misconfigured source, you can do something like this:
JSON.stringify("\\test\red\bob\fred\new").replace(/\W/g, '')
"testredbobfrednew" // output
Note that the JSON representation of a string includes the surrounding quotes:
JSON.stringify("\\test\red\bob\fred\new")
""\\test\red\bob\fred\new""
But they are also removed by the replacement regex.
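The escaping behaviour described above can be checked directly. The sketch below also shows String.raw, a standard tagged template that keeps backslashes as written. It is only an option when the string is a literal in your own source code, which is an assumption about your setup.

```javascript
// Unescaped: \t, \r, \b, \f and \n are parsed as control characters,
// so those letters are consumed by the escapes and then stripped by \W.
const unescaped = "\test\red\bob\fred\new".replace(/\W/g, "");

// Escaped: each "\\" is one literal backslash, which \W then removes.
const escaped = "\\test\\red\\bob\\fred\\new".replace(/\W/g, "");

// String.raw keeps the backslashes of a template literal as written.
const raw = String.raw`\test\red\bob\fred\new`.replace(/\W/g, "");

console.log(unescaped); // "estedobredew"
console.log(escaped);   // "testredbobfrednew"
console.log(raw);       // "testredbobfrednew"
```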
Removing non-alphanumeric characters with sed
tr's -c (complement) flag, combined with -d (delete), may be an option:
echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
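If you need the same filter outside the shell, a rough JavaScript counterpart might look like the sketch below. The function name is hypothetical, and the sketch assumes that only ASCII letters and digits count as alphanumeric, whereas tr's [:alnum:] depends on the locale and implementation.

```javascript
// Rough counterpart of: tr -cd '[:alnum:]._-'
// Assumption: only ASCII letters and digits count as alphanumeric here.
function keepAlnumDotUnderscoreDash(input) {
  // Delete every character that is NOT in the allowed set.
  return input.replace(/[^A-Za-z0-9._-]/g, "");
}

console.log(keepAlnumDotUnderscoreDash("Â10.41.89.50-._ ")); // "10.41.89.50-._"
```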
Replace all non alphanumeric characters, new lines, and multiple white space with one space
Be aware that \W leaves the underscore. A short equivalent for [^a-zA-Z0-9] would be [\W_]:
text.replace(/[\W_]+/g," ");
\W is the negation of the shorthand \w, which stands for [A-Za-z0-9_]: word characters (including the underscore).
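To see the collapsing behaviour in practice, here is a small sketch. The helper name and the sample text are made up; the regex is the one from the answer above.

```javascript
// Hypothetical helper name; the regex is the one shown above.
function collapseToSpaces(text) {
  // [\W_]+ matches a run of anything that is not an ASCII letter or digit,
  // including newlines, other whitespace and underscores.
  return text.replace(/[\W_]+/g, " ");
}

console.log(collapseToSpaces("Hello,\n  world__foo!")); // "Hello world foo "
```

A run at the very start or end of the input also becomes a single space, so you may want to call .trim() on the result.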
Regex to remove all non alpha-numeric and replace spaces with +
This is actually fairly straightforward.
Assuming str is the string you're cleaning up:
str = str.replace(/[^a-z0-9+]+/gi, '+');
The ^ means "anything not in this list of characters". The + after the [...] group means "one or more". /gi means "replace all of these that you find, without regard to case".
So any stretch of characters that are not letters, numbers, or '+' will be converted into a single '+'.
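A short sketch of that behaviour, using the sample value from the demo form plus one made-up input with leading and trailing junk:

```javascript
const pattern = /[^a-z0-9+]+/gi;

// "!!" and " " are each one stretch, so each becomes a single "+".
const title = "Durarara!!x2 Ten".replace(pattern, "+");

// Stretches at the edges are converted too, leaving "+" at both ends.
const padded = "  hi there! ".replace(pattern, "+");

console.log(title);  // "Durarara+x2+Ten"
console.log(padded); // "+hi+there+"
```

If the edge "+" characters are unwanted, trimming the input before the replacement is one option.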
To remove parenthesized substrings (as requested in the comments), do this replacement first:
str = str.replace(/\(.+?\)/g, '');
function replacer() {
  var str = document.getElementById('before').value
    .replace(/\(.+?\)/g, '')
    .replace(/[^a-z0-9+]+/gi, '+');
  document.getElementById('after').value = str;
}
document.getElementById('replacem').onclick = replacer;
<p>Before: <input id="before" value="Durarara!!x2 Ten" /></p>
<p> <input type="button" value="replace" id="replacem" /></p>
<p>After: <input id="after" value="" readonly /></p>
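The same two-step cleanup can be sketched as a plain function without the DOM. The name toPlusSeparated and the "(2010)" in the sample input are made up for illustration. The second call shows why the order of the two replacements matters.

```javascript
// Hypothetical helper combining both replacements from the answer above.
function toPlusSeparated(str) {
  return str
    .replace(/\(.+?\)/g, "")        // 1. drop "(...)" groups first
    .replace(/[^a-z0-9+]+/gi, "+"); // 2. collapse everything else to "+"
}

const sample = "Durarara!! (2010) x2 Ten";

console.log(toPlusSeparated(sample)); // "Durarara+x2+Ten"

// Wrong order: the parentheses are already gone when the second regex
// runs, so the parenthesized text stays in the result.
const wrongOrder = sample
  .replace(/[^a-z0-9+]+/gi, "+")
  .replace(/\(.+?\)/g, "");

console.log(wrongOrder); // "Durarara+2010+x2+Ten"
```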
How to remove non-alpha-numeric characters from strings within a dataframe column in Python?
Use str.replace.
df
strings
0 a#bc1!
1 a(b$c
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object
To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:
df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
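The difference between the two patterns is easy to check outside pandas as well. This is a plain JavaScript sketch over the same two sample strings, not pandas code. One caveat: JavaScript's \W is ASCII-only, while Python's \W is Unicode-aware for str patterns by default, so results can differ on non-ASCII input.

```javascript
const strings = ["a#bc1!", "a(b$c"];

// Letters only, like '[^a-zA-Z]'.
const lettersOnly = strings.map((s) => s.replace(/[^a-zA-Z]/g, ""));

// Word characters (letters, digits, underscore), like '\W'.
const wordChars = strings.map((s) => s.replace(/\W/g, ""));

console.log(lettersOnly); // ["abc", "abc"]
console.log(wordChars);   // ["abc1", "abc"]
```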
How can I select all records with non-alphanumeric and remove them?
I suggest using REGEXP_REPLACE in the SELECT to remove the characters, and REGEXP_CONTAINS in the WHERE clause to return only the rows you want.
SELECT REGEXP_REPLACE(EMPLOYER, r'[^a-zA-Z\d\s]', '')
FROM fec.work
WHERE REGEXP_CONTAINS(EMPLOYER, r'[^a-zA-Z\d\s]')
You say you don't want to use replace because you don't know how many different non-alphanumeric characters there are. But instead of listing every non-alphanumeric character, why not use ^ to match everything except alphanumerics?
EDIT: To complement Mikhail's answer, you have several choices for the regex:
'[^a-zA-Z\\d\\s]' // Basic regex
r'[^a-zA-Z\d\s]' // Uses r to avoid escaping
r'[^\w\s]' // \w = [a-zA-Z0-9_] (! underscore as alphanumerical !)
If you don't consider underscores to be alphanumeric, you should not use \w.
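The filter-then-clean logic of the query can be sketched in JavaScript to make the two roles of the pattern explicit. The employer values below are made-up sample data, not rows from the actual table.

```javascript
// Made-up sample values standing in for the EMPLOYER column.
const employers = ["ACME Inc.", "Smith & Sons", "Plain Name 42"];

// No g flag here, so the regex is stateless and safe to reuse with test().
const hasNonAlnum = /[^a-zA-Z\d\s]/;

const cleaned = employers
  .filter((e) => hasNonAlnum.test(e))            // like WHERE REGEXP_CONTAINS(...)
  .map((e) => e.replace(/[^a-zA-Z\d\s]/g, "")); // like REGEXP_REPLACE(...)

console.log(cleaned); // ["ACME Inc", "Smith  Sons"]
```

Removing "&" leaves two adjacent spaces, because \s is part of the allowed set and whitespace is never touched.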
How can I remove leading and trailing non-alphanumeric characters
Try using a ^\W+|\W+$ pattern like this:
$string = preg_replace('/^\W+|\W+$/', '', $string);
This removes any run of non-word characters at the beginning or end of the string. Note that \W does not match underscores, so leading or trailing underscores are kept. The | is an alternation: the regex matches whatever fits either the pattern on its left or the pattern on its right. The ^ anchors a match to the beginning of the string, and $ to the end.
If you also need to remove underscores, use a character class like this:
$string = preg_replace('/^[\W_]+|[\W_]+$/', '', $string);
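For readers outside PHP, here is a JavaScript sketch of both patterns on a made-up input. One difference to keep in mind: preg_replace replaces every match by default, while JavaScript's replace() needs the g flag to handle both ends of the string.

```javascript
const s = "--_hello world_!!";

// \W stops at the underscores, so they stay at both ends.
const trimmedNonWord = s.replace(/^\W+|\W+$/g, "");

// Adding "_" to the class strips the underscores as well.
const trimmedAll = s.replace(/^[\W_]+|[\W_]+$/g, "");

console.log(trimmedNonWord); // "_hello world_"
console.log(trimmedAll);     // "hello world"
```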
How to remove non-alphanumeric characters?
It sounds like you almost knew what you wanted to do already; you basically defined it as a regex.
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
Pandas remove non-alphanumeric characters from string column
You can use regex for this.
df['firstname'] = df['firstname'].str.replace('[^a-zA-Z0-9]', ' ', regex=True).str.strip()
df.firstname.tolist()
>>> ['joe down', 'lucash brown', 'antony', 'mary']