How to recognize if a string contains unicode chars?
If my assumption is correct, you want to know whether your string contains any "non-ANSI" characters, that is, characters with a code above 255. You can check this as follows.
using System;
using System.Linq; // needed for Any()

public void test()
{
    const string WithUnicodeCharacter = "a hebrew character:\uFB2F";
    const string WithoutUnicodeCharacter = "an ANSI character:Æ";
    bool hasUnicode;

    // true: U+FB2F is above 255
    hasUnicode = ContainsUnicodeCharacter(WithUnicodeCharacter);
    Console.WriteLine(hasUnicode);

    // false: Æ (U+00C6) is within 0..255
    hasUnicode = ContainsUnicodeCharacter(WithoutUnicodeCharacter);
    Console.WriteLine(hasUnicode);
}

public bool ContainsUnicodeCharacter(string input)
{
    const int MaxAnsiCode = 255;
    // True if any char (UTF-16 code unit) in the string is above the 8-bit range.
    return input.Any(c => c > MaxAnsiCode);
}
Update
This check treats extended ASCII (codes up to 255) as non-Unicode. If you test only against the true ASCII range (up to 127), you could get false positives for extended ASCII characters such as Æ, which do not by themselves denote Unicode. The sample above illustrates this with Æ.
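For comparison, the same threshold idea can be sketched in Python rather than C#. The function name `contains_char_above` and its `limit` parameter are made up for this illustration. It shows how the result for Æ (U+00C6, code 198) flips depending on whether the cutoff is 255 or 127:

```python
def contains_char_above(text, limit=255):
    """Return True if any character's code point is greater than limit."""
    return any(ord(ch) > limit for ch in text)


with_hebrew = "a hebrew character:\uFB2F"
with_ae = "an ANSI character:\u00C6"  # Æ, code point 198

print(contains_char_above(with_hebrew))         # True
print(contains_char_above(with_ae))             # False
print(contains_char_above(with_ae, limit=127))  # True
```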
Check if java string contains unicode character
You can invert the font selection logic:
The Font class has goodies like canDisplay and canDisplayUpTo. Javadoc:
public int canDisplayUpTo(String str)
Indicates whether or not this Font can display a specified String. For
strings with Unicode encoding, it is important to know if a particular
font can display the string. This method returns an offset into the
String str which is the first character this Font cannot display
without using the missing glyph code. If the Font can display all
characters, -1 is returned.
Check if String contains unicode character
In C#, Unicode character escape sequences are written as \u25CF, while &#x25CF; is the XML or HTML notation.
So you should write
Text.Contains("\u25CF")
How to find out that string contain unicode character in C#
You could do something like this.
string input = ... // your input.
if (input.Any(c => c > 255))
{
    // at least one character is above 255 (treated as Unicode here)
}
Check if string contains range of Unicode characters
Is this what you want?
public static bool ContainsInvalidCharacters(string name)
{
    return name.IndexOfAny(new[]
    {
        '\u0001', '\u0002', '\u0003',
    }) != -1;
}
and
bool res = ContainsInvalidCharacters("Hello\u0001");
Note the use of '\uXXXX': the single quotes (') denote a char instead of a string.
Is there a way to check if a string contains a Unicode letter?
The main point here is that MATCHES requires a full string match, and that the backslash (\) passed to the regex engine must be a literal backslash.
The regex can thus be
(?s).*\p{L}.*
Which means:
(?s) - enable DOTALL mode
.* - match 0 or more of any characters
\p{L} - match a Unicode letter
.* - match zero or more characters.
In iOS, just double the backslashes:
NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES '(?s).*\\p{L}.*'"];
If the backslashes inside the NSPredicate format string are treated specially, use:
NSPredicate * predicat = [NSPredicate predicateWithFormat:@"SELF MATCHES '(?s).*\\\\p{L}.*'"];
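Outside NSPredicate, the same "contains at least one Unicode letter" test can be sketched without a regex. The example below is in Python, whose standard library `re` module does not support `\p{L}`, so it checks each character's Unicode general category instead. The function name `contains_unicode_letter` is made up for this illustration:

```python
import unicodedata


def contains_unicode_letter(text):
    """Return True if any character is in a letter category (Lu, Ll, Lt, Lm, Lo)."""
    return any(unicodedata.category(ch).startswith("L") for ch in text)


print(contains_unicode_letter("123 \u00E9"))  # True
print(contains_unicode_letter("123 !?\n"))    # False
```

Because the check walks the characters directly, newlines need no special handling, which is what `(?s)` takes care of in the regex version.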
Checking if string contains unicode using standard Python
There is no point in testing 'if a string contains Unicode characters', because all characters in a string are Unicode characters. The Unicode standard encompasses all codepoints that Python supports, including the ASCII range (Unicode codepoints U+0000 through to U+007F).
If you want to test for Emoji code, test for specific ranges, as outlined by the Unicode Emoji class specification:
re.compile(
    u'[\u231A-\u231B\u2328\u23CF\u23E9-\u23F3...\U0001F9C0]',
    flags=re.UNICODE)
where you'll have to pick and choose what codepoints you consider to be Emoji. I personally would not include U+0023 NUMBER SIGN in that category for example, but apparently the Unicode standard does.
Note: To be explicit, the above expression is not complete. There are 209 separate entries in the Emoji category and I didn't feel like writing them all out.
Another note: the above uses a \Uhhhhhhhh wide Unicode escape sequence; its use is only supported in a regex pattern in Python 3.3 and up, or in a wide (UCS-4) build for earlier versions of Python. For a narrow Python build, you'll have to match on surrogate pairs for codepoints over U+FFFF.
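As a rough, runnable sketch for Python 3.3 or later: the pattern below contains only the code points spelled out in the expression above, so it is deliberately incomplete and would need the remaining Emoji entries added. The names `PARTIAL_EMOJI`, `contains_listed_emoji` and `contains_non_ascii` are made up for this illustration. The second helper covers the case where what you really mean by "contains Unicode" is "contains something outside the ASCII range":

```python
import re

# Deliberately incomplete: only the code points listed in the answer's pattern.
PARTIAL_EMOJI = re.compile("[\u231A-\u231B\u2328\u23CF\u23E9-\u23F3\U0001F9C0]")


def contains_listed_emoji(text):
    """Return True if text contains one of the few code points listed above."""
    return PARTIAL_EMOJI.search(text) is not None


def contains_non_ascii(text):
    """Return True if any code point lies outside U+0000..U+007F."""
    return any(ord(ch) > 0x7F for ch in text)


print(contains_listed_emoji("time \u231A"))  # True
print(contains_listed_emoji("caf\u00E9"))    # False
print(contains_non_ascii("caf\u00E9"))       # True
```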
Check if string contains only Unicode values [\u0030-\u0039] or [\u0660-\u0669]
Use \x{...} to write Unicode code points:
^([\x{0030}-\x{0039}\x{0660}-\x{0669}]+)$
If the pattern should match an empty string too, use * instead of +.
Use this if you don't want to allow mixing characters from the two sets you provided:
^([\x{0030}-\x{0039}]+|[\x{0660}-\x{0669}]+)$
https://regex101.com/r/xqWL4q/6
As Holger mentioned in the comments below, [\x{0030}-\x{0039}] is equivalent to [0-9], so it could be substituted to make the pattern more readable.
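The two patterns can be tried out with a small Python sketch. Python's `re` module does not accept the `\x{...}` syntax, so this translation writes the Arabic-Indic digits as `\u0660-\u0669` and uses `[0-9]` for the ASCII digits. It also uses `fullmatch` in place of the `^...$` anchors. The names below are made up for this illustration:

```python
import re

# Digits from either set, mixing allowed.
EITHER_DIGIT_SET = re.compile(r"[0-9\u0660-\u0669]+")
# Digits from exactly one of the two sets, no mixing.
SINGLE_DIGIT_SET = re.compile(r"[0-9]+|[\u0660-\u0669]+")


def is_digits_mixed_allowed(text):
    """Return True if text is non-empty and made only of digits from either set."""
    return EITHER_DIGIT_SET.fullmatch(text) is not None


def is_digits_single_set(text):
    """Return True if text is non-empty and made of digits from one set only."""
    return SINGLE_DIGIT_SET.fullmatch(text) is not None


print(is_digits_mixed_allowed("12\u0663"))    # True
print(is_digits_single_set("12\u0663"))       # False
print(is_digits_single_set("\u0661\u0662"))   # True
```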