Remove All Special Characters, Punctuation and Spaces from String

Remove all special characters, punctuation and spaces from string

This can be done without regex:

>>> string = "Special $#! characters   spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

You can use str.isalnum:

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.

If you insist on using regex, the other solutions will do fine. Note, however, that when a task can be done without a regular expression, the plain string method is usually the simpler and more readable way to go about it.
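One detail worth knowing: str.isalnum is Unicode-aware, so accented and non-Latin letters count as alphanumeric and are kept. The sketch below shows both behaviours. The helper name strip_non_alnum and its ascii_only parameter are made up for illustration, and str.isascii assumes Python 3.7 or later.

```python
def strip_non_alnum(text, ascii_only=False):
    """Keep only alphanumeric characters (hypothetical helper).

    With ascii_only=True, non-ASCII letters and digits are dropped too.
    """
    if ascii_only:
        return "".join(ch for ch in text if ch.isascii() and ch.isalnum())
    return "".join(ch for ch in text if ch.isalnum())


sample = "Caf\u00e9 5, na\u00efve!"  # "Café 5, naïve!"
print(strip_non_alnum(sample))                   # Café5naïve
print(strip_non_alnum(sample, ascii_only=True))  # Caf5nave
```

If the goal is strictly [a-zA-Z0-9], the ASCII-only variant is probably what you want.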

How to remove special characters except space from a file in python?

You can use this pattern, too, with regex:

import re
a = '''hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by@ an#other %million^ %%like $this.'''

for k in a.split("\n"):
    print(re.sub(r"[^a-zA-Z0-9]+", ' ', k))
    # Or:
    # final = " ".join(re.findall(r"[a-zA-Z0-9]+", k))
    # print(final)

Output:

hello there A Z R T world welcome to python 
this should the next line followed by an other million like this

Edit:

Alternatively, you can store the cleaned lines in a list:

final = [re.sub(r"[^a-zA-Z0-9]+", ' ', k) for k in a.split("\n")]
print(final)

Output:

['hello there A Z R T world welcome to python ', 'this should the next line followed by an other million like this ']
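Since the question is about a file, here is a minimal sketch of the same substitution applied line by line to a file-like object. The function name clean_lines is an assumption for illustration, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io
import re

NON_ALNUM = re.compile(r"[^a-zA-Z0-9]+")


def clean_lines(lines):
    """Yield each line with runs of non-alphanumerics replaced by one space."""
    for line in lines:
        # strip() drops the space left behind by leading/trailing punctuation
        # and by the newline at the end of each line.
        yield NON_ALNUM.sub(" ", line).strip()


# io.StringIO stands in for a real file here; with a file on disk you would
# pass the object returned by open(path, encoding="utf-8") instead.
fake_file = io.StringIO("hello? there A-Z-R_T(,**), world.\nthis **should? work#\n")
for cleaned in clean_lines(fake_file):
    print(cleaned)
# hello there A Z R T world
# this should work
```

Iterating over the file object avoids loading the whole file into memory, and the strip() call removes the trailing space seen in the outputs above.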

How to remove all special characters, whitespaces, newlines, and tabs at once in Kotlin?

The question notes: "but the whitespaces between the words after removing all special characters didn't reduce."

This happens because you handle the spaces first and the special characters second: deleting a special character that sat between two spaces leaves those spaces next to each other, so you get extra spaces between words. Remove the special characters first, and only then deal with the spaces.

It might be possible to do everything in one step with a complex regex, but I would suggest doing it in smaller steps, which are easier to understand. One way to achieve what you want is:

val greet = """
hello ; good . morning , friends how are you
"""
val answer = greet.trim()
    .filter { it.isLetterOrDigit() || it.isWhitespace() } // remove special characters
    .replace(Regex("\\s+"), " ") // collapse repeated whitespace into one space
print(answer) // hello good morning friends how are you
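To see why the order of the two steps matters, here is a small Python sketch of the same idea (Python rather than Kotlin, to match the other examples added on this page). The helper names keep_words and collapse_spaces are made up for illustration.

```python
import re


def keep_words(text):
    """Drop everything except letters, digits and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def collapse_spaces(text):
    """Replace each run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)


greet = "hello ; good . morning , friends how are you"

wrong_order = keep_words(collapse_spaces(greet))   # spaces first, then specials
right_order = collapse_spaces(keep_words(greet))   # specials first, then spaces

print(repr(wrong_order))  # 'hello  good  morning  friends how are you'
print(repr(right_order))  # 'hello good morning friends how are you'
```

Collapsing first has nothing to collapse yet; the doubled spaces only appear once the punctuation between them is gone.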

How do I remove all special characters, punctuation and whitespaces from a string in Lua?

In Lua patterns, the character class %p represents all punctuation characters, the character class %c represents all control characters, and the character class %s represents all whitespace characters. So you can represent all punctuation characters, all control characters, and all whitespace characters with the set [%p%c%s].

To remove these characters from a string, you can use string.gsub. For a string str, the code would be the following:

str = str:gsub('[%p%c%s]', '')
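For comparison, a rough Python analogue of the same set is sketched below. It is an assumption-laden approximation: it covers ASCII only, whereas Lua's character classes follow the C library and may depend on the current locale. The function name strip_punct_ctrl_space is hypothetical.

```python
import string

# Characters to delete: ASCII punctuation, ASCII whitespace, and the ASCII
# control characters (code points 0-31 and 127).
_CONTROL = "".join(chr(i) for i in range(32)) + chr(127)
_DELETE_TABLE = str.maketrans("", "", string.punctuation + string.whitespace + _CONTROL)


def strip_punct_ctrl_space(text):
    """Remove ASCII punctuation, control characters and whitespace."""
    return text.translate(_DELETE_TABLE)


print(strip_punct_ctrl_space("Hello, world!\t42\n"))  # Helloworld42
```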


How can I remove extra spaces, special characters, and unwanted text from a list of country names in R?

The problem is most likely caused by Unicode whitespace characters in your input.

You can use

trimws(gsub("\\([^()]*\\)|[^[:alpha:][:space:]]", "", country.names))
# => [1] "United States" "China" "Japan"
# [4] "Great Britain" "ROC" "Australia"
# [7] "Netherlands" "France" "Germany" "Italy"

The regex matches

  • \([^()]*\) - any substrings between closest parentheses
  • | - or
  • [^[:alpha:][:space:]] - any char other than a letter or whitespace (this is not fully Unicode aware, that is why it also removes all unusual whitespace).

Since only regular ASCII whitespace is kept, trimws without any additional arguments works fine.

If the country names can contain accented letters, you will have to use PCRE Unicode-aware regex:

trimws(gsub("(*UCP)\\([^()]*\\)|[^\\p{L}\\s]", "", country.names, perl=TRUE), whitespace="[\\p{Z}\t]")

Here, [^\p{L}\s] (with the (*UCP) PCRE verb) matches any character other than a Unicode letter or whitespace, and [\p{Z}\t] matches any Unicode separator (space) character or a tab.
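For readers who want to try the Unicode-aware variant outside R, here is a hedged Python sketch of the same cleanup. Python's re module is Unicode-aware for str patterns by default, so no extra flag is needed. The pattern approximates "not a letter" as "not a word character, or a digit or underscore", and the sample names and the function name clean_names are invented for illustration.

```python
import re

# \([^()]*\)  : text between the closest pair of parentheses
# [^\w\s]     : anything that is neither a word character nor whitespace
# [\d_]       : digits and underscore, which \w would otherwise let through
_UNWANTED = re.compile(r"\([^()]*\)|[^\w\s]|[\d_]")


def clean_names(names):
    """Strip parenthesised notes, symbols and digits; trim whitespace."""
    # str.strip() with no argument also trims Unicode whitespace such as U+00A0.
    return [_UNWANTED.sub("", name).strip() for name in names]


raw = ["United States (USA)\u00a0", "Cura\u00e7ao[1]", "\u00a0Great Britain (GBR) "]
print(clean_names(raw))  # ['United States', 'Curaçao', 'Great Britain']
```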

Removing special characters and spaces from strings

We can use sub to remove the first occurrence of the string "and", then use gsub to remove everything other than (^) letters (upper and lower case), and finally convert the result to upper case (toupper):

f1 <- function(x) toupper(gsub("[^A-Za-z]", "", sub("and", "", x, fixed = TRUE)))

Testing:

> f1(name1)
[1] "ADAMEVE"
> f1(name2)
[1] "SPARTACUS"
> f1(name3)
[1] "FITNESSHEALTH"
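The same three steps can be sketched in Python as follows. The inputs are hypothetical, since the original name1, name2 and name3 values are not shown here, and the function simply reuses the name f1 for comparison.

```python
import re


def f1(name):
    """Remove the first literal "and", drop non-letters, upper-case the rest."""
    without_and = name.replace("and", "", 1)  # count=1 mirrors sub(), not gsub()
    return re.sub(r"[^A-Za-z]", "", without_and).upper()


# Hypothetical inputs; the original name1..name3 values are not shown.
print(f1("Adam and Eve"))     # ADAMEVE
print(f1("Spartacus!"))       # SPARTACUS
print(f1("Sandy and Bob"))    # SYANDBOB, the first "and" sits inside "Sandy"
```

The last call shows a limitation of matching a fixed substring: the first "and" found may be part of another word. Matching on word boundaries would avoid that if it matters for your data.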

How can I remove extra spaces, special characters, and make string lowercase?

I would argue that trying to combine both patterns into one would make it less readable. You could keep using two calls to Regex.Replace() and just append .ToLower() to the second one:

// Remove special characters except for space and /
str1 = Regex.Replace(str1, @"[^0-9a-zA-Z /]+", "");

// Remove all but one space, trim the ends, and convert to lower case.
str1 = Regex.Replace(str1.Trim(), @"\s+", " ").ToLower();
//                                             ^^^^^^^^^

That said, if you really have to use a one-liner, you could write something like this:

str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();

This matches either a run of characters outside the allowed set (the negated character class) or a run of one or more spaces. In the second case a space is captured in group 1, and each match is replaced with whatever group 1 captured (i.e., nothing or a single space character).

For the sake of completeness, if you also want to handle the trimming with regex (and make the pattern even less readable), you could use:

str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|^ +| +$|( )+", "$1").ToLower();
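To compare the two-call and single-pattern approaches side by side, here is a Python port of both as a rough sketch. The function names are made up, and the port relies on Python 3.5+ expanding \1 to an empty string when group 1 did not participate in the match.

```python
import re


def normalize_two_step(text):
    """Remove special characters, then collapse whitespace, then lower-case."""
    text = re.sub(r"[^0-9a-zA-Z /]+", "", text)
    return re.sub(r"\s+", " ", text.strip()).lower()


def normalize_one_pattern(text):
    """Single-pattern variant: group 1 holds a space only for runs of spaces."""
    # In Python 3.5+, \1 expands to "" when group 1 did not take part in the match.
    return re.sub(r"[^A-Za-z0-9 /]+|( )+", r"\1", text).strip().lower()


sample = "  Hello ,  World! / Foo  "
print(repr(normalize_two_step(sample)))     # 'hello world / foo'
print(repr(normalize_one_pattern(sample)))  # 'hello  world / foo'
```

In this port the two variants differ when a special character sits between spaces: the single pattern handles the space runs on either side separately, so two spaces survive. That is one more reason to prefer the two-step version.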

