How to Strip All Punctuation from a String in JavaScript Using Regex

Removing punctuation from strings?

In situations like yours you have to ask yourself which is easier:

  • Create a regex that blocks certain characters
  • Create a regex that allows certain characters

The choice you opt for should depend on which is less work and more reliable.

Writing a pattern that blocks all symbols depends on you remembering every possible symbol: not just punctuation, but emoji, mathematical symbols and so on.

If all you want is to allow numbers and letters only, you can do:

str.replace(/\W/g, '');

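\W is shorthand for "non-word" characters: anything other than letters, digits and the underscore ([A-Za-z0-9_]), so spaces and accented letters such as é are removed as well. The only caveat is that word characters include the underscore, so if you want to remove underscores too: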

str.replace(/\W|_/g, '');
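The difference between these allow-list patterns is easiest to see on one sample string. This is an illustrative sketch: the helper names are hypothetical, and the third variant assumes you want to keep letters and digits from any script plus whitespace, which requires Unicode property escapes and the u flag.

```javascript
// Hypothetical helper: keeps only [A-Za-z0-9_]; spaces and accented letters are removed too.
function stripNonWord(str) {
  return str.replace(/\W/g, '');
}

// Hypothetical helper: same as above, but underscores are removed as well.
// [\W_] is equivalent to the \W|_ alternation.
function stripNonWordAndUnderscore(str) {
  return str.replace(/[\W_]/g, '');
}

// Hypothetical helper (assumption: keep letters/digits of any script, plus whitespace).
// \p{L} (letters) and \p{N} (numbers) only work with the u flag.
function keepUnicodeLettersDigits(str) {
  return str.replace(/[^\p{L}\p{N}\s]/gu, '');
}

const sample = 'snake_case, caf\u00e9!';
console.log(stripNonWord(sample));              // → snake_casecaf
console.log(stripNonWordAndUnderscore(sample)); // → snakecasecaf
console.log(keepUnicodeLettersDigits(sample));  // → snakecase café
```

Note that \W treats é as a non-word character, which is why the first two variants drop it.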

JavaScript: Remove string punctuation and split into words?

Adapting your own solution would be tricky, but you could handle apostrophes this way:

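const sentence = `"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."`;
// \w+ matches a word; (?:'\w+)* lets it continue across internal apostrophes only
console.log(sentence.match(/\w+(?:'\w+)*/g));
// → ["Exclamation", "Question", "Quotes", "Apostrophe", "Wasn't", "Couldn't", "Didn't"]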
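If your text may also contain typographic apostrophes (U+2019, ’), the same idea extends by allowing either character between word parts. This is a hypothetical variant, not part of the original answer; \w still only covers ASCII letters, digits and underscore.

```javascript
// Hypothetical helper: extracts words, keeping internal ASCII (') or typographic (’) apostrophes.
// Quote marks at the start or end of a word are not part of a match.
function splitWords(text) {
  // match() returns null when nothing matches, so fall back to an empty array
  return text.match(/\w+(?:['\u2019]\w+)*/g) || [];
}

console.log(splitWords("Don\u2019t stop, it's 'fine'!"));
// → ["Don’t", "stop", "it's", "fine"]
```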

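JavaScript regex to remove punctuation

Regex metacharacters such as *, +, $ and ^ need to be escaped with a backslash to match them literally. An unescaped $ or ^ is an anchor (end or start of input), so those characters would never be removed.

function regex(str) {
  // Each alternative is one literal punctuation character; metacharacters are escaped
  return str.replace(/(~|`|!|@|#|\$|%|\^|&|\*|\(|\)|\{|\}|\[|\]|;|:|"|'|<|,|\.|>|\?|\/|\\|\||-|_|\+|=)/g, "");
}
var testStr = 'test @ . / | ) this';
document.write('<strong>before: </strong>' + testStr);
document.write('<br><strong>after: </strong>' + regex(testStr));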
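A character class avoids most of the escaping pitfalls of a long alternation: inside [...], characters such as $, ^ (when not first), *, +, (, ), ., ? and | are literal, and only ], \, / (in a regex literal) and a non-trailing - need attention. A minimal sketch covering the same ASCII characters as the function above; the names PUNCTUATION and stripPunctuation are hypothetical.

```javascript
// Hypothetical pattern: the same ASCII punctuation as the alternation version.
// Inside a class, $ ^ * + ( ) { } . ? | are literal; [ ] \ and / are escaped,
// and - is placed last so it is not read as a range.
const PUNCTUATION = /[~`!@#$%^&*(){}\[\]\\;:"'<,.>?\/|_+=-]/g;

function stripPunctuation(str) {
  return str.replace(PUNCTUATION, '');
}

console.log(stripPunctuation('test @ . / | ) this'));
console.log(stripPunctuation('cost: $5^2 (approx.)')); // → cost 52 approx
```

Spaces are left in place, so the first call keeps the six spaces that surrounded the removed symbols.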

JavaScript regex to remove all punctuation except . and ?

Just use [^\w\s?.] for your character class.
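Because the class is negated, everything that is not a word character, whitespace, ? or . gets removed. A minimal sketch with the g flag; the function name is hypothetical:

```javascript
// Hypothetical helper: removes punctuation except '.' and '?'.
// [^\w\s?.] matches any character that is not a word character, whitespace, '?' or '.'.
function stripPunctuationExceptDotQuestion(str) {
  return str.replace(/[^\w\s?.]/g, '');
}

console.log(stripPunctuationExceptDotQuestion('Hello, world! Is this ok? Yes.'));
// → Hello world Is this ok? Yes.
```

Since \w includes the underscore, underscores also survive; add _ handling separately if you need it removed.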

Other ways to remove or ignore punctuation in JS besides regex?

Without regex:

function LongestWord(sen) {
  let wordStart = -1;
  let bestWord = null;
  let bestLength = 0;

  for (let i = 0; i < sen.length; i++) {
    const ch = sen[i];
    // ASCII letters extend the current word; any other character ends it
    if (('a' <= ch && ch <= 'z') || ('A' <= ch && ch <= 'Z')) {
      if (wordStart === -1) {
        wordStart = i;
      }
    } else if (wordStart !== -1) {
      const word = sen.substring(wordStart, i);
      if (word.length > bestLength) {
        bestLength = word.length;
        bestWord = word;
      }
      wordStart = -1;
    }
  }
  // Handle a word that runs to the end of the string
  if (wordStart !== -1) {
    const lastWord = sen.substring(wordStart);
    if (lastWord.length > bestLength) {
      bestLength = lastWord.length;
      bestWord = lastWord;
    }
  }
  return bestWord;
}

With regex:

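function LongestWord(sen) {
  let bestWord = null;
  let bestLength = 0;

  // match() returns null when there are no letters, so fall back to an empty array
  const matches = sen.match(/[a-z]+/gi) || [];
  for (let i = 0; i < matches.length; i++) {
    const word = matches[i];
    // Strings expose a lowercase .length property (.Length is undefined)
    if (word.length > bestLength) {
      bestLength = word.length;
      bestWord = word;
    }
  }
  return bestWord;
}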
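If the goal is to strip punctuation rather than find the longest word, a regex-free alternative is to filter characters against an explicit set. This sketch assumes ASCII punctuation only; the PUNCTUATION_CHARS set and the removePunctuation name are hypothetical, and you would extend the set for other symbols.

```javascript
// Hypothetical set of the 32 ASCII punctuation characters (extend as needed).
const PUNCTUATION_CHARS = new Set('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~');

// Hypothetical helper: removes punctuation without using a regular expression.
function removePunctuation(str) {
  let result = '';
  // for...of iterates by code point, so characters outside the BMP (e.g. emoji) stay intact
  for (const ch of str) {
    if (!PUNCTUATION_CHARS.has(ch)) {
      result += ch;
    }
  }
  return result;
}

console.log(removePunctuation("Hello, world! (It's 2024.)"));
// → Hello world Its 2024
```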

Remove all non-Latin passages from a string with regex

I understand that by "a non-latin character such as הּ" you mean any non-ASCII letter.

To match any letter other than an ASCII letter, you can use [^\P{L}a-zA-Z]. This negated character class matches any character that is neither a non-letter (\P{L}) nor an ASCII letter (a-zA-Z), so it is effectively the \p{L} pattern minus the ASCII letters.

Unicode property escapes such as \P{L} require the u flag. They were added in ECMAScript 2018 and are supported in current Node.js versions and modern browsers.

The solution will look like

text = text.replace(/[^\P{L}a-z][^a-z]*/gui, '')

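After the first non-ASCII letter, [^a-z]* extends the match over everything up to the next ASCII letter (spaces, punctuation, digits), so a whole non-Latin passage is removed rather than just its letters. The g flag makes replace() replace all occurrences in the string, and i makes matching case-insensitive, which shortens the ASCII letter pattern to a-z.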

See the JavaScript demo:

const text = `or perhaps, a - אוֹ דִילְמָא אֵין אִשָּׁה מִתְקַדְּשֶׁת לַחֲצָאִין כְּלָל (12);time
תֵּיקוּ
person cannot be in separate halves at all, even
though both "halves” would come together simultaneously?(13)
The speaker replies:(14)`;
console.log(
text.replace(/[^\P{L}a-z][^a-z]*/gui, '')
)
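To check how the double-negated class behaves, you can test single characters against it. A small sketch using the same class with the i and u flags; the helper name is hypothetical.

```javascript
// Hypothetical helper: true for a single letter that is not an ASCII letter.
// [^\P{L}a-z] means "not (a non-letter or an ASCII letter)"; i makes a-z also cover A-Z,
// and u is required for \P{L}.
function isNonAsciiLetter(ch) {
  return /^[^\P{L}a-z]$/iu.test(ch);
}

for (const ch of ['\u05D0', '\u00E9', 'a', 'Z', '1', '!']) {
  console.log(ch, isNonAsciiLetter(ch));
}
// → א true, é true, a false, Z false, 1 false, ! false
```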

Remove punctuation from string with Regex

Regular expressions are well worth learning properly.

In C# (.NET) you can use this:

Regex.Replace("This is a test string, with lots of: punctuations; in it?!.", @"[^\w\s]", "");

Which means:

[   # Start of character class.
^   # Negation: match any character NOT listed below.
\w  # Word characters (letters, digits, underscore).
\s  # Whitespace characters.
]   # End of character class.

Altogether it reads "replace any character that is not a word character or a whitespace character with nothing."
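In JavaScript the same pattern works with String.prototype.replace; the g flag is needed because without it only the first match is replaced.

```javascript
// JavaScript counterpart of the C# Regex.Replace call: remove anything that is
// not a word character (\w) or whitespace (\s). The g flag replaces every match.
const input = 'This is a test string, with lots of: punctuations; in it?!.';
const cleaned = input.replace(/[^\w\s]/g, '');
console.log(cleaned);
// → This is a test string with lots of punctuations in it
```

One difference to keep in mind: .NET's \w matches Unicode letters by default, whereas JavaScript's \w only covers [A-Za-z0-9_], so accented letters would be removed in JavaScript.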

Regarding JavaScript RegEx - Replace all punctuation including underscore

+ repeats the previous token one or more times.

> "h....e l___l^^0".replace(/[\W_]+/g, "-")
'h-e-l-l-0'

[\W_]+ matches non-word characters or _ one or more times.
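A common use of [\W_]+ is building URL-style slugs. A minimal sketch, assuming ASCII-only output is acceptable (non-ASCII letters count as \W and get replaced); slugify is a hypothetical name.

```javascript
// Hypothetical helper: turns arbitrary text into a lowercase, hyphen-separated slug.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[\W_]+/g, '-')   // collapse each run of non-word chars or underscores into one hyphen
    .replace(/^-+|-+$/g, '');  // trim hyphens left at either end
}

console.log(slugify('  Hello, World__2024! '));
// → hello-world-2024
```

Because + collapses each run into a single replacement, consecutive symbols never produce doubled hyphens.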


