Removing punctuation from strings?
In situations like yours you have to ask yourself which is easier:
- Create a REGEXP that blocks certain characters
- Create a REGEXP that allows certain characters
The choice you opt for should depend on which is less work and more reliable.
Writing a pattern that blocks all symbols depends on you remembering every possible symbol - not just punctuation, but emoji patterns, mathematical symbols and so on.
If all you want is to allow numbers and letters only, you can do:
str.replace(/\W/g, '');
\W is shorthand for "non-alphanumeric" characters. The only caveat here is that "alphanumeric" (\w) includes underscores, so \W leaves them in place. If you want to remove those too:
str.replace(/\W|_/g, '');
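As a minimal sketch of the allow-list approach, the two helpers below strip everything except letters and digits. The function names are illustrative and not from the original answer. Note that \W is ASCII-based, so it also treats accented and non-Latin letters as "non-alphanumeric"; the second helper assumes an engine with ES2018 Unicode property escapes and keeps letters and numbers from any script.

```javascript
// Hypothetical helper names, used only for illustration.

// ASCII-only: keeps a-z, A-Z and 0-9; removes everything else, including "_".
function keepAsciiAlphanumeric(str) {
  return str.replace(/[\W_]/g, '');
}

// Unicode-aware variant (assumes ES2018+ property escapes and the u flag):
// keeps any letter or number from any script.
function keepUnicodeAlphanumeric(str) {
  return str.replace(/[^\p{L}\p{N}]/gu, '');
}

console.log(keepAsciiAlphanumeric('snake_case, 100%!'));   // "snakecase100"
console.log(keepAsciiAlphanumeric('caf\u00E9'));           // "caf" (the accented letter is dropped)
console.log(keepUnicodeAlphanumeric('caf\u00E9, 100%!'));  // "caf\u00E9100"
```

Which variant fits depends on whether the input is expected to contain non-English text.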
Javascript: Remove string punctuation and split into words?
Adapting your own solution would be tricky, but you could handle apostrophes this way:
const sentence = `"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."`;
console.log(sentence.match(/\w+(?:'\w+)*/g));
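The same pattern can be wrapped in a small function. This is only a sketch: the name toWords is hypothetical, and the fallback to an empty array is an addition, needed because match() returns null when nothing matches.

```javascript
// Hypothetical helper: returns the words of a sentence, keeping inner apostrophes.
function toWords(sentence) {
  // \w+        one or more word characters
  // (?:'\w+)*  optionally followed by apostrophe + word characters (e.g. "n't")
  // match() returns null when there are no matches, so default to [].
  return sentence.match(/\w+(?:'\w+)*/g) || [];
}

console.log(toWords(`"Didn't" 'quote' it's.`)); // [ "Didn't", "quote", "it's" ]
console.log(toWords('?!...'));                  // []
```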
Javascript regex to remove punctuation
* and + need to be escaped. The same goes for $ and ^: left unescaped inside an alternation they act as anchors, so those characters are never removed.
function regex(str) {
    return str.replace(/(~|`|!|@|#|\$|%|\^|&|\*|\(|\)|\{|\}|\[|\]|;|:|"|'|<|,|\.|>|\?|\/|\\|\||-|_|\+|=)/g, "");
}
var testStr = 'test @ . / | ) this';
document.write('<strong>before: </strong>' + testStr);
document.write('<br><strong>after: </strong>' + regex(testStr));
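A character class is a possible alternative to a long alternation, because far fewer characters need escaping inside [...]: only the backslash, the closing bracket, a hyphen between two other characters, and a leading ^. The sketch below uses a hypothetical function name, stripSymbols, and covers the same symbol list as the answer.

```javascript
// Hypothetical helper: removes the listed ASCII symbols using a character class.
function stripSymbols(str) {
  // Inside the class, $ ^ * + ( ) { } | . ? are literal.
  // Escaped here: [ ] / \ and - (the hyphen, to avoid forming a range).
  return str.replace(/[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|\-_+=]/g, '');
}

console.log(stripSymbols('a*b+c$d^e'));           // "abcde"
console.log(stripSymbols('test @ . / | ) this')); // "test", six spaces, then "this"
```

Whitespace is left untouched, so the second result still contains the spaces that surrounded the removed symbols.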
Javascript regex to remove all punctuation except . and ?
Just use [^\w\s?.] for your character class.
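For illustration, here is that character class in a replace call. The function name keepSentenceMarks is made up for this sketch.

```javascript
// Hypothetical helper: removes every character that is not a word character,
// whitespace, "?" or ".".
function keepSentenceMarks(str) {
  return str.replace(/[^\w\s?.]/g, '');
}

console.log(keepSentenceMarks('Hello, world! How are you? Fine.'));
// "Hello world How are you? Fine."
```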
Other ways to remove or ignore punctuation in JS besides regex?
Without regex:
function LongestWord(sen) {
    var wordStart = -1;
    var bestWord = null;
    var bestLength = 0;
    for (var i = 0; i < sen.length; i++) {
        var ch = sen[i];
        if ('a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z')
        {
            if (wordStart === -1)
            {
                wordStart = i;
            }
        }
        else
        {
            if (wordStart !== -1)
            {
                var word = sen.substring(wordStart, i);
                if (word.length > bestLength)
                {
                    bestLength = word.length;
                    bestWord = word;
                }
                wordStart = -1;
            }
        }
    }
    if (wordStart !== -1)
    {
        var word = sen.substring(wordStart);
        if (word.length > bestLength)
        {
            bestLength = word.length;
            bestWord = word;
        }
        wordStart = -1;
    }
    return bestWord;
}
With regex:
function LongestWord(sen) {
    var bestWord = null;
    var bestLength = 0;
    // match() returns null when nothing matches, so fall back to an empty array
    var matches = sen.match(/[a-z]+/gi) || [];
    for (var i = 0; i < matches.length; i++)
    {
        var word = matches[i];
        if (word.length > bestLength)
        {
            bestLength = word.length;
            bestWord = word;
        }
    }
    return bestWord;
}
Remove all non-latin passages from a string with regex
I understand that by "a non-latin character such as הּ" you mean any non-ASCII letter.
To match any letter other than an ASCII letter, you can use [^\P{L}a-zA-Z]. This is a negated character class that matches any char other than a non-letter char (\P{L}) and ASCII letters (a-zA-Z). So, it is basically the \p{L} pattern with the exception of ASCII letters.
This pattern is based on Unicode property escapes, so it requires the u flag, which Node.js and other ES2018-compliant JavaScript environments support.
The solution will look like
text = text.replace(/[^\P{L}a-z][^a-z]*/gui, '')
Note that the g flag makes replace act on all occurrences in the string, and i is used to shorten the ASCII letter pattern (since it makes the pattern matching case insensitive).
See the JavaScript demo:
const text = `or perhaps, a - אוֹ דִילְמָא אֵין אִשָּׁה מִתְקַדְּשֶׁת לַחֲצָאִין כְּלָל (12);time
תֵּיקוּ
person cannot be in separate halves at all, even
though both "halves” would come together simultaneously?(13)
The speaker replies:(14)`;
console.log(
text.replace(/[^\P{L}a-z][^a-z]*/gui, '')
)
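To see what the [^\P{L}a-zA-Z] class matches on its own, the sketch below lists the non-ASCII letters in a short string. The function name findNonAsciiLetters and the sample text are assumptions made for illustration; the i flag is left out here, so both ASCII ranges are spelled out.

```javascript
// Hypothetical helper: returns every letter that is not an ASCII letter.
// Requires the u flag for the \P{L} property escape (ES2018+).
function findNonAsciiLetters(str) {
  return str.match(/[^\P{L}a-zA-Z]/gu) || [];
}

// "na\u00EFve caf\u00E9 x1" plus the Hebrew letter alef (\u05D0)
const sample = 'na\u00EFve caf\u00E9 x1 \u05D0';
console.log(findNonAsciiLetters(sample)); // [ "\u00EF", "\u00E9", "\u05D0" ]
console.log(findNonAsciiLetters('plain ASCII 123')); // []
```

Digits, spaces and punctuation are non-letters, so the double negation excludes them along with a-z and A-Z.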
Remove punctuation from string with Regex
First, please read up on regular expressions. It's worth learning.
You can use this:
Regex.Replace("This is a test string, with lots of: punctuations; in it?!.", @"[^\w\s]", "");
Which means:
[   # Character class start.
^   # Negation: match anything that is not one of the following.
\w  # Word characters (letters, digits, underscore).
\s  # Whitespace characters.
]   # Character class end.
In the end it reads "replace any character that is not a word character or a space character with nothing."
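The snippet above is C# (.NET Regex.Replace). A rough JavaScript counterpart might look like the sketch below; the function name removePunctuation is hypothetical. One difference to keep in mind: in JavaScript \w is ASCII-only, whereas .NET's \w also matches Unicode letters by default, so non-English letters would be removed here.

```javascript
// Hypothetical helper: removes every character that is neither a word
// character (a-z, A-Z, 0-9, _) nor whitespace.
function removePunctuation(str) {
  return str.replace(/[^\w\s]/g, '');
}

console.log(removePunctuation('This is a test string, with lots of: punctuations; in it?!.'));
// "This is a test string with lots of punctuations in it"
```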
Regarding JavaScript RegEx - Replace all punctuation including underscore
+ repeats the previous token one or more times.
> "h....e l___l^^0".replace(/[\W_]+/g, "-")
'h-e-l-l-0'
[\W_]+ matches non-word characters or _ one or more times.
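A short comparison shows the effect of the quantifier. The variable names are illustrative only.

```javascript
const input = 'h....e l___l^^0';

// With +, each run of punctuation or underscores becomes a single "-".
const collapsed = input.replace(/[\W_]+/g, '-');

// Without +, every matching character is replaced individually.
const perChar = input.replace(/[\W_]/g, '-');

console.log(collapsed); // "h-e-l-l-0"
console.log(perChar);   // "h----e-l---l--0"
```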