How to check if a String contains only ASCII?
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString) method, which relies on the factory method ascii() rather than the now deprecated ASCII singleton.
Here ASCII includes all ASCII characters, including the non-printable characters lower than 0x20 (space) such as tab, line feed and carriage return, but also BEL with code 0x07 and DEL with code 0x7F.
This code operates on characters (UTF-16 code units) rather than code points, even though code points are mentioned in the comments of earlier versions. Fortunately, a code point with a value of U+010000 or over is encoded as two surrogate characters, both of which have values outside the ASCII range. So the method still succeeds in testing for ASCII, even for strings containing emoji.
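To see why the per-character check is still safe, the following Python sketch lists the UTF-16 code units that a Java String would hold for an emoji. Python is used only for illustration, and the helper name utf16_code_units is made up for this example. Both surrogates lie in the range 0xD800 to 0xDFFF, far above 0x7F:

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text, as Java's charAt would see them."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(raw) // 2), raw))

units = utf16_code_units("a\U0001F600")  # 'a' followed by U+1F600
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00']

# A per-code-unit ASCII test rejects any string with a supplementary character,
# because neither surrogate is below 0x80.
print(all(u < 0x80 for u in units))  # False
```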
For earlier Guava versions without the ascii() method you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);
Check that a string contains only ASCII characters?
Python 3.7 added methods that do what you want:
str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
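A short illustration of isascii() on each of the three types (requires Python 3.7 or later; the sample values are arbitrary):

```python
# str, bytes and bytearray all expose isascii() from Python 3.7 onward.
print("string".isascii())            # True
print("строка".isascii())            # False
print(b"bytes".isascii())            # True
print(bytearray(b"\xff").isascii())  # False

# The empty string (or empty bytes) is considered ASCII.
print("".isascii())                  # True
```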
Otherwise:
>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
Another version, for Python 2 (where unicode and byte str are separate types):
>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
Match if String contains only ASCII character set
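The Java code below accepts only printable ASCII (32 through 126), so control characters such as tab and newline are rejected:
public class AsciiCheck { // hypothetical wrapper class; the original class declaration was lost
    public static boolean isAsciiPrintable(String str) {
        if (str == null) {
            return false;
        }
        int sz = str.length();
        for (int i = 0; i < sz; i++) {
            if (!isAsciiPrintable(str.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Printable ASCII runs from 32 (space) through 126 (tilde).
    public static boolean isAsciiPrintable(char ch) {
        return ch >= 32 && ch < 127;
    }
}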
Ref: http://www.java2s.com/Code/Java/Data-Type/ChecksifthestringcontainsonlyASCIIprintablecharacters.htm
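The difference between "ASCII" and "printable ASCII" matters in practice. The Python sketch below contrasts the two checks on a tab-separated string; the function names are chosen for this example, and unlike the Java method it does not handle a null (None) argument:

```python
def is_ascii(text):
    # Accepts every code point from 0 through 127, control characters included.
    return all(ord(ch) < 128 for ch in text)

def is_ascii_printable(text):
    # Same bounds as the Java method: 32 (space) through 126 (tilde).
    return all(32 <= ord(ch) < 127 for ch in text)

sample = "col1\tcol2"
print(is_ascii(sample))            # True
print(is_ascii_printable(sample))  # False, because of the tab (code 9)
```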
Checking a string contains only ASCII characters
In Go, we care about performance. Therefore, we would benchmark your code:
func isASCII(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}
BenchmarkRange-4 20000000 82.0 ns/op
A faster (better, more idiomatic) version, which avoids unnecessary rune conversions:
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}
BenchmarkIndex-4 30000000 55.4 ns/op
ascii_test.go:
package main

import (
	"testing"
	"unicode"
)

func isASCIIRange(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func BenchmarkRange(b *testing.B) {
	str := ascii()
	b.ResetTimer()
	for N := 0; N < b.N; N++ {
		is := isASCIIRange(str)
		if !is {
			b.Fatal("notASCII")
		}
	}
}

func isASCIIIndex(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func BenchmarkIndex(b *testing.B) {
	str := ascii()
	b.ResetTimer()
	for N := 0; N < b.N; N++ {
		is := isASCIIIndex(str)
		if !is {
			b.Log("notASCII")
		}
	}
}

// ascii returns a string containing every ASCII byte from 0 through 127.
func ascii() string {
	byt := make([]byte, unicode.MaxASCII+1)
	for i := range byt {
		byt[i] = byte(i)
	}
	return string(byt)
}
Output:
$ go test ascii_test.go -bench=.
BenchmarkRange-4 20000000 82.0 ns/op
BenchmarkIndex-4 30000000 55.4 ns/op
$
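The byte-indexed version gives the same answer as the rune-based one because UTF-8 encodes every non-ASCII code point using only bytes with the high bit set. A minimal Python sketch of that property (Python is used for illustration; is_ascii_bytes is a name made up here):

```python
def is_ascii_bytes(data):
    """Byte-wise check, analogous to indexing s[i] in the Go version."""
    return all(b < 0x80 for b in data)

for ch in ["A", "\u00fc", "\u0441", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    # Every byte of a multi-byte UTF-8 sequence is 0x80 or above,
    # so no byte of a non-ASCII character can pass for an ASCII byte.
    print(ascii(ch), [hex(b) for b in encoded], is_ascii_bytes(encoded))
```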
Find out if a string contains only ASCII characters
I think I will go for one of these two (the REPLACE in the second one is there because ASCIISTR also escapes the backslash itself):
IF CONVERT(str, 'US7ASCII') = str THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;

IF ASCIISTR(REPLACE(str, '\', '/')) = REPLACE(str, '\', '/') THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;
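Both checks follow the same round-trip idea: convert the string to an ASCII-only form and see whether anything changed. A loosely analogous Python sketch, not a translation of the Oracle functions (the function name is made up for this example):

```python
def is_ascii_roundtrip(text):
    # Characters that cannot be encoded as ASCII are replaced with '?',
    # so the converted string differs from the input whenever one is present.
    converted = text.encode("ascii", errors="replace").decode("ascii")
    return converted == text

print(is_ascii_roundtrip("Pure ASCII?"))  # True
print(is_ascii_roundtrip("na\u00efve"))   # False
```

A literal question mark in the input is harmless, since it converts to itself.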
How to check if a string in Python is in ASCII?
def is_ascii(s):
    return all(ord(c) < 128 for c in s)
How can I tell if a string has any non-ASCII characters in it?
Try this regex. It tests for all ASCII characters that have some meaning in a string, from space (32) to tilde (126):
var ascii = /^[ -~]+$/;
if ( !ascii.test( str ) ) {
    // string is empty or has characters outside printable ASCII
}
Edit: with tabs and newlines:
/^[ -~\t\n\r]+$/;
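The same character class can be sketched in Python with re.fullmatch. This version is an illustration under one assumption that differs from the regex above: it uses * instead of +, so an empty string is accepted (the names are made up for this example):

```python
import re

# Space through tilde, plus tab, line feed and carriage return.
PRINTABLE_OR_WHITESPACE = re.compile(r"[ -~\t\n\r]*")

def is_printable_or_whitespace_ascii(text):
    # fullmatch anchors the pattern at both ends of the string.
    return PRINTABLE_OR_WHITESPACE.fullmatch(text) is not None

print(is_printable_or_whitespace_ascii("line one\r\n\tline two"))  # True
print(is_printable_or_whitespace_ascii("caf\u00e9"))               # False
print(is_printable_or_whitespace_ascii("bell\x07"))                # False
```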
How to find out if there is any non ASCII character in a string with a file path
As suggested by several comments and highlighted by @CrisLuengo's answer, we can iterate over the characters looking for any with the upper bit set:
#include <iostream>
#include <string>
#include <algorithm>

bool isASCII (const std::string& s)
{
    return !std::any_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) > 127;
    });
}

int main()
{
    std::string s1 { "C:\\Users\\myUser\\Downloads\\Hello my friend.pdf" };
    std::string s2 { "C:\\Users\\myUser\\Downloads\\ü.pdf" };
    std::cout << std::boolalpha << isASCII(s1) << "\n";
    std::cout << std::boolalpha << isASCII(s2) << "\n";
}
true
false