Remove \u{E2} Characters from String

How to remove special character \u{e2} from string

Thank you, everyone, for your help. You are so kind.

I was able to find the invalid characters using

phoneNumber?.map{$0.unicodeScalars.allSatisfy{$0.isASCII}}

and it returns

Optional<Array<Bool>>
▿ some : 12 elements
- 0 : false
- 1 : true
- 2 : true
- 3 : true
- 4 : true
- 5 : true
- 6 : true
- 7 : true
- 8 : true
- 9 : true
- 10 : false
- 11 : true

I was able to fix this issue with one line

phoneNumber?.filter{$0.unicodeScalars.allSatisfy{$0.isASCII}}

and the resulting count is 10, using

phoneNumber?.filter{$0.unicodeScalars.allSatisfy{$0.isASCII}}.count

Hope it is helpful to others :)
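
For comparison, the same diagnosis can be sketched in Python, the language used by later answers on this page. This is only an illustration under assumptions: the helper name `find_non_ascii` and the sample phone number are hypothetical, and the invisible characters shown (directional formatting marks, which often ride along when a number is copied from a contacts app) are a guess at what the original input contained.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


# Hypothetical input: 12 characters, with invisible marks at index 0 and 10.
phone = "\u202a987654321\u202c0"

hits = find_non_ascii(phone)

# Keep only the ASCII characters, mirroring the Swift filter above.
cleaned = "".join(ch for ch in phone if ord(ch) <= 0x7F)
```

Printing `hits` before filtering shows which characters are being dropped, which helps confirm that nothing meaningful is lost.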

Remove Unicode characters from a String in Java using regex

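You can apply the replacements in sequence, as below. The steps overlap, so keep only the ones you need: the first already removes every non-ASCII character, and the last also removes the \r, \n, and \t that the second preserves.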

public class RemoveUnicode {
    public static void main(final String[] args) {
        String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car. I currently drive a Hyundai Accent and I was looking for something a little bit larger and more comfortable like the Honda Civic. May I know if you have any of the models currently in stock? Thank you! Warm regards Sandra";

        // remove all non-ASCII characters
        comment = comment.replaceAll("[^\\x00-\\x7F]", "");

        // remove the ASCII control characters, except CR, LF, and tab
        comment = comment.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");

        // remove every character in Unicode category C (control, format, unassigned, ...);
        // this also strips CR, LF, and tab
        comment = comment.replaceAll("\\p{C}", "");
        System.out.println(comment);
    }
}
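
If the goal is to drop only invisible characters while keeping legitimate non-ASCII text such as the apostrophe in "I’m", a category-based filter is one alternative. The following is a minimal Python sketch, not a translation of the Java above; the function name `strip_invisible` and the shortened sample string are hypothetical. It treats U+2028 and U+2029 separately because they belong to categories Zl and Zp, not to category C.

```python
import unicodedata

# Whitespace control characters that should survive the cleanup.
KEEP = {"\t", "\n", "\r"}


def strip_invisible(text):
    """Drop control/format characters and line/paragraph separators."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch in KEEP:
            kept.append(ch)
        elif category.startswith("C") or category in ("Zl", "Zp"):
            continue  # invisible or unassigned: skip it
        else:
            kept.append(ch)
    return "".join(kept)


comment = "Good morning! \u2028\u2028I\u2019m outgrowing my current car."
result = strip_invisible(comment)
```

Unlike the ASCII-only regex, this keeps the typographic apostrophe U+2019, so "I’m" does not collapse to "Im".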

Removing unicode \u2026 like characters in a string in python2.7

Python 2.x

>>> s
'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
>>> print(s.decode('unicode_escape').encode('ascii','ignore'))
This is some  text that has to be cleaned! it's annoying!

Python 3.x

>>> s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'
>>> s.encode('ascii', 'ignore')
b"This is some  text that has to be cleaned! it's annoying!"
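
In Python 3, `encode` returns `bytes`, so a `str` result needs a decode step. A minimal sketch follows, assuming you also want to collapse the double space left where the π was removed; the variable names `ascii_only`, `tidy`, and `raw` are illustrative only. The last lines show one way to handle a string that holds literal backslash escapes, as in the Python 2 example, assuming that string is otherwise pure ASCII.

```python
s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'

# Drop non-ASCII characters, then turn the bytes back into a str.
ascii_only = s.encode('ascii', 'ignore').decode('ascii')

# Removing the pi leaves two consecutive spaces; collapse whitespace runs.
tidy = ' '.join(ascii_only.split())

# A string containing the literal six characters \u03c0 (not the pi itself).
raw = 'This is some \\u03c0 text'

# Interpret the escapes first; assumes raw contains only ASCII characters.
decoded = raw.encode('ascii').decode('unicode_escape')
```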

Remove unwanted unicode characters from string

testString = Regex.Replace(testString, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");

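That pattern strips the C0 control characters other than tab (including \r and \n) and everything from U+0100 upward, so Latin-1 characters such as é survive. Alternatively, to keep only tab, carriage return, line feed, and printable ASCII (space through ~):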

testString = Regex.Replace(testString, @"[^\t\r\n -~]", "");
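
The two patterns are not equivalent, which is easy to check side by side. Here is a rough Python sketch using the same character classes with the standard `re` module; the sample string and variable names are made up for illustration.

```python
import re

# Sample text: Latin-1 letter, NUL, tab, ellipsis, newline.
text = "caf\u00e9\x00\tok\u2026\n"

# Blacklist: drops C0 controls except tab, and everything from U+0100 up.
latin1_kept = re.sub(r"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "", text)

# Whitelist: keeps only tab, CR, LF, and printable ASCII.
ascii_printable = re.sub(r"[^\t\r\n -~]", "", text)
```

With this input the blacklist keeps the é but loses the trailing newline, while the whitelist does the opposite, so the choice depends on which characters count as unwanted.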

Eliminating Unicode Characters and Escape Characters from String

Try

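import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stg = "\u2029My Actual String\u2029 \nMy Actual String";
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
StringBuilder out = new StringBuilder();
while (mat.find()) {
    // append each matched word followed by a single space
    out.append(mat.group()).append(' ');
}
System.out.println(out.toString().trim());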

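In effect, the pattern matches runs of word characters (\w+). The U+2029 separators and the whitespace are not word characters, so they are skipped, and the loop re-joins the matched words with single spaces. Note that punctuation and non-ASCII letters are dropped as well, because \w matches only [a-zA-Z0-9_] by default in Java.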

Output:

My Actual String My Actual String

Remove zero width space unicode character from Python string

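You can encode it to ASCII and ignore errors. This drops every non-ASCII character, not only the zero-width one (here U+200C, the zero-width non-joiner):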

u'\u200cHealth & Fitness'.encode('ascii', 'ignore')

Output (Python 2; Python 3 returns the bytes object b'Health & Fitness'):

'Health & Fitness'
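
If the rest of the text may contain legitimate non-ASCII characters, a more targeted option is to delete only the zero-width code points. A minimal Python 3 sketch, assuming the set below covers the characters you care about; the name `strip_zero_width` and the exact list are illustrative choices, not a complete inventory of invisible characters.

```python
# Zero-width space, non-joiner, joiner, word joiner, and BOM / ZWNBSP.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def strip_zero_width(text):
    """Delete zero-width characters; leave everything else untouched."""
    # str.translate removes any code point that maps to None.
    return text.translate(ZERO_WIDTH)


cleaned = strip_zero_width("\u200cHealth & Fitness caf\u00e9")
```

Here the é in "café" is preserved, whereas the ASCII encode approach would remove it.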

