How can non-ASCII characters be removed from a string?
This finds every non-ASCII character and replaces it with the empty string:
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
Replace non-ASCII characters with a single space
Your ''.join() expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:
return ''.join([i if ord(i) < 128 else ' ' for i in text])
This handles characters one by one and would still use one space per character replaced.
Your regular expression should just replace consecutive non-ASCII characters with a space:
re.sub(r'[^\x00-\x7F]+',' ', text)
Note the + there.
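As a minimal, self-contained sketch of the difference between the two approaches above (the function names replace_each and replace_runs and the sample string are made up for illustration):

```python
import re

def replace_each(text):
    # One space per non-ASCII character.
    return ''.join(c if ord(c) < 128 else ' ' for c in text)

def replace_runs(text):
    # One space per run of consecutive non-ASCII characters (note the +).
    return re.sub(r'[^\x00-\x7F]+', ' ', text)

# Hypothetical sample: two adjacent non-ASCII characters followed by a space.
sample = 'caf\u00e9\u00e8 au lait'
print(repr(replace_each(sample)))  # two replacement spaces plus the original one
print(repr(replace_runs(sample)))  # one replacement space plus the original one
```

If the original whitespace should also be collapsed, a second pass over the result would be needed; neither version does that.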
Remove non-ASCII characters from String in Java
I'm guessing that the source of the URL is more at fault. Perhaps you're fixing the wrong problem? Removing "strange" characters from a URI might give it an entirely different meaning.
With that said, you may be able to remove all of the non-ASCII characters (and the ASCII control characters, since the range below starts at the space character) with a simple string replacement:
String fixed = original.replaceAll("[^\\x20-\\x7e]", "");
Or, if that does not cover the "�" character, you can instead target the characters that need four bytes in UTF-8 (those outside the Basic Multilingual Plane) and leave everything else in place:
String fixed = original.replaceAll("[^\\u0000-\\uFFFF]", "");
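To see how differently the two patterns behave, here is a rough sketch in Python rather than Java. It assumes Python's re module treats these simple code-point ranges the same way Java's regex engine does; the function names and the sample string are invented for the example.

```python
import re

def keep_printable_ascii(text):
    # Removes everything outside space (0x20) through tilde (0x7E),
    # including control characters such as tab and newline.
    return re.sub(r'[^\x20-\x7E]', '', text)

def drop_supplementary(text):
    # Removes only code points above U+FFFF (four bytes in UTF-8).
    return re.sub(r'[^\u0000-\uFFFF]', '', text)

# Hypothetical sample: accented letters, an emoji (U+1F600), and U+FFFD.
sample = 'na\u00efve \U0001F600 caf\u00e9\ufffd'
print(repr(keep_printable_ascii(sample)))
print(repr(drop_supplementary(sample)))
```

The first function strips the accented letters, the emoji, and U+FFFD; the second strips only the emoji, because U+FFFD and the accented letters are all inside the Basic Multilingual Plane.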
Remove non-ASCII characters from a string in Snowflake
A regular expression should be enough, unless you have other cases in mind:
select regexp_replace('Snéowñflake', '[^\x00-\x7F]', '')
How to remove non-ASCII characters (non-keyboard special characters) from text in Hive
You can use
regexp_replace('123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt', '[^\\x{0000}-\\x7E]+', '')
Here,
[^ - start of a negated character class that matches any chars but
\x{0000}-\x7E - chars from NULL to the ~ char in the ASCII table
]+ - end of the class, match one or more times.
What if I need to remove all special characters apart from spaces and hyphens? - In this case, you need to use
regexp_replace('123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt', '[^\\w\\s-]|_', '')
Here, [^\w\s-]|_ matches any one symbol other than a letter, digit, _, whitespace and -, or an underscore (note that \w matches underscores, so the underscore must be added back via |, the alternation operator).
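The same pattern can be tried outside Hive. The sketch below uses Python and passes re.ASCII so that \w and \s match only ASCII characters; that flag is an assumption made to mirror the default behaviour of the Java regex engine that Hive uses, since Python's \w is Unicode-aware by default. The function name and sample input are hypothetical.

```python
import re

def keep_word_space_hyphen(text):
    # Drops anything that is not an ASCII letter, digit, whitespace or hyphen.
    # The |_ alternative removes underscores, which \w would otherwise keep.
    return re.sub(r'[^\w\s-]|_', '', text, flags=re.ASCII)

# Hypothetical sample with accented letters, an underscore and punctuation.
sample = 'stree\u00c1\u00c9t_name: Ma-in St. #5'
print(keep_word_space_hyphen(sample))
```

Without re.ASCII, Python would treat the accented letters as word characters and keep them, which is not what the Hive expression does.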
Remove non-ASCII non-printable characters from a String
Your requirements are not clear. All characters in a Java String are Unicode characters, so if you remove them, you'll be left with an empty string. I assume what you mean is that you want to remove any non-ASCII, non-printable characters.
String clean = str.replaceAll("\\P{Print}", "");
Here, \p{Print} represents a POSIX character class for printable ASCII characters, while \P{Print} is the complement of that class. With this expression, all characters that are not printable ASCII are replaced with the empty string. (The extra backslash is because \ starts an escape sequence in string literals.)
Apparently, all the input characters are actually ASCII characters that represent a printable encoding of non-printable or non-ASCII characters. Mongo shouldn't have any trouble with these strings, because they contain only plain printable ASCII characters.
This all sounds a little fishy to me. What I believe is happening is that the data really do contain non-printable and non-ASCII characters, and another component (like a logging framework) is replacing these with a printable representation. In your simple tests, you are failing to translate the printable representation back to the original string, so you mistakenly believe the first regular expression is not working.
That's my guess, but if I've misread the situation and you really do need to strip out literal \xHH escapes, you can do it with the following regular expression.
String clean = str.replaceAll("\\\\x\\p{XDigit}{2}", "");
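To make the distinction between a literal \xHH escape and an actual non-ASCII character concrete, here is a small Python sketch of the same idea. Python's re module has no \p{XDigit} class, so [0-9A-Fa-f] is used as a stand-in; the function name and sample are invented for illustration.

```python
import re

def strip_hex_escapes(text):
    # Matches a literal backslash, the letter x, and exactly two hex digits.
    return re.sub(r'\\x[0-9A-Fa-f]{2}', '', text)

# Raw string: the backslashes are real characters, not escape sequences.
printable = r'abc\xE2\x80\x9Cdef'
print(strip_hex_escapes(printable))

# A genuine non-ASCII character is not an escape sequence and is left alone.
print(strip_hex_escapes('caf\u00e9') == 'caf\u00e9')
```

The first call returns only the surrounding ASCII text; the second shows that real non-ASCII characters pass through untouched, which is why the two problems need different patterns.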
The API documentation for the Pattern class does a good job of listing all of the syntax supported by Java's regex library. For more elaboration on what all of the syntax means, I have found the Regular-Expressions.info site very helpful.