How do I detect non-ASCII characters in a string?
I found it more useful to detect whether any character falls outside the printable ASCII range:
if (preg_match('/[^\x20-\x7e]/', $string)) {
    // $string contains a character outside printable ASCII (0x20-0x7E)
}
Detect non-ASCII characters in a string
Another possible way is to convert your string to ASCII and then look for the non-printable control characters generated in place of anything that could not be converted:
grepl("[[:cntrl:]]", stringi::stri_enc_toascii(x))
## [1] TRUE FALSE TRUE FALSE
Though it seems stringi has a built-in function for this kind of thing too:
stringi::stri_enc_mark(x)
# [1] "latin1" "ASCII" "latin1" "ASCII"
How can I tell if a string has any non-ASCII characters in it?
Try this regex. It tests for all ASCII characters that have some meaning in a string, from space (32) to tilde (126):
var ascii = /^[ -~]+$/;
if ( !ascii.test( str ) ) {
// string has non-ascii characters
}
Edit: with tabs and newlines:
/^[ -~\t\n\r]+$/;
How to detect non-ASCII character in Python?
# -*- coding: utf-8 -*-
import re
elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
for e in elements:
    if re.sub('[ -~]', '', e) != "":
        # do something here
        print("-")
re.sub('[ -~]', '', e) strips every printable ASCII character (space through tilde) from e by replacing it with "", so only the characters outside that range remain. If the result is not empty, e contains at least one such character.
Hope this helps.
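If you also want to know which characters fall outside the range, and where, a search-based variant may be easier to read. The following is only a sketch for Python 3; find_non_printable_ascii and NON_PRINTABLE_ASCII are hypothetical names, not part of the answer above.

```python
import re

# Matches any single character outside the printable ASCII range (space to tilde).
NON_PRINTABLE_ASCII = re.compile(r'[^ -~]')

def find_non_printable_ascii(text):
    """Return a list of (index, character) pairs that fall outside ' '..'~'."""
    return [(m.start(), m.group()) for m in NON_PRINTABLE_ASCII.finditer(text)]

elements = ['2', '3', '13', '37\u201341', '43', '44', '46']
for e in elements:
    for index, ch in find_non_printable_ascii(e):
        # !a prints an ASCII-safe representation of the string
        print(f"{e!a}: index {index}, U+{ord(ch):04X}")
```

Note that tabs and newlines are also reported, since they lie below the space character.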
How to find out if there is any non ASCII character in a string with a file path
As suggested by several comments and highlighted in @CrisLuengo's answer, we can iterate over the characters looking for any with the upper bit set:
#include <iostream>
#include <string>
#include <algorithm>

bool isASCII (const std::string& s)
{
    return !std::any_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) > 127;
    });
}

int main()
{
    std::string s1 { "C:\\Users\\myUser\\Downloads\\Hello my friend.pdf" };
    std::string s2 { "C:\\Users\\myUser\\Downloads\\ü.pdf" };
    std::cout << std::boolalpha << isASCII(s1) << "\n";
    std::cout << std::boolalpha << isASCII(s2) << "\n";
}
true
false
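For comparison, the same high-bit test can be sketched in Python on the encoded bytes of a path. This is only an illustration: it assumes the path is encoded as UTF-8, and has_high_bit is a hypothetical name.

```python
def has_high_bit(data: bytes) -> bool:
    """Return True if any byte has its most significant bit set (value > 127)."""
    return any(b & 0x80 for b in data)

# Assumption: the paths are encoded as UTF-8 before being inspected.
s1 = "C:\\Users\\myUser\\Downloads\\Hello my friend.pdf".encode("utf-8")
s2 = "C:\\Users\\myUser\\Downloads\\\u00fc.pdf".encode("utf-8")

print(not has_high_bit(s1))  # True
print(not has_high_bit(s2))  # False
```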
Testing for Non ASCII character not working Java
If you already have a String, iterate over its characters and check that each one is in the printable ASCII range, from space (0x20) to tilde (0x7E).
public static boolean isAscii(String v) {
    if (v != null && !v.isEmpty()) {
        for (char c : v.toCharArray()) {
            // reject anything outside printable ASCII (space to tilde)
            if (c < 0x20 || c > 0x7E) return false;
        }
    }
    return true;
}
You may also want to review the static methods of Character, e.g. isLetter(), isISOControl(), etc.
In C#, how can I detect if a character is a non-ASCII character?
ASCII ranges from 0 to 127, so just check for that range:
char c = 'a';//or whatever char you have
bool isAscii = c < 128;
Check that a string contains only ASCII characters?
Python 3.7 added methods that do what you want:
str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only ASCII characters.
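As a quick illustration of these methods on Python 3.7 or later (the sample values are arbitrary):

```python
# Requires Python 3.7 or later.
print("string".isascii())            # True
print("строка".isascii())            # False
print(b"bytes".isascii())            # True
print(bytearray(b"\xff").isascii())  # False
print("".isascii())                  # True: an empty string counts as ASCII
```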
Otherwise:
>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
Another version (Python 2, where unicode is a separate type):
>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
How to check if a String contains only ASCII?
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This calls matchesAllOf(someString) on the matcher returned by the factory method ascii(), rather than on the now-deprecated ASCII singleton.
Here ASCII includes all ASCII characters, including the non-printable characters below 0x20 (space) such as tab, line feed and carriage return, but also BEL (code 0x07) and DEL (code 0x7F).
This code incorrectly works on characters rather than code points, even though code points are indicated in the comments of earlier versions. Fortunately, a code point with a value of U+010000 or above is represented by two surrogate characters, each with a value outside the ASCII range. So the method still succeeds in testing for ASCII, even for strings containing emoji.
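The surrogate argument can be checked outside Java. A Java char is a UTF-16 code unit, so the sketch below (in Python, with utf16_code_units as a hypothetical helper) encodes an emoji to UTF-16 and shows that both resulting code units lie far above 127:

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode('utf-16-be')
    # Each code unit is one big-endian unsigned 16-bit value.
    return list(struct.unpack('>%dH' % (len(data) // 2), data))

units = utf16_code_units('\U0001F600')  # U+1F600, a code point above U+FFFF
print([hex(u) for u in units])          # ['0xd83d', '0xde00']
print(all(u < 128 for u in units))      # False
```

Because neither surrogate falls in the range 0 to 127, a per-character ASCII test gives the same answer as a per-code-point test.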
For earlier Guava versions without the ascii() method, you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);