How to check if a string in Python is in ASCII?
def is_ascii(s):
    return all(ord(c) < 128 for c in s)
Check that a string contains only ASCII characters?
Python 3.7 added methods that do what you want:
str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only ASCII characters.
Otherwise:
>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
Another version, for Python 2 (it relies on the unicode type):
>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
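For Python 3, a rough counterpart of the function above might look like the sketch below. It assumes the input is a str, bytes, or bytearray, prefers the isascii() method where it exists (3.7+), and falls back to an encode attempt or a per-byte check on older versions. The name is_ascii is reused here purely for illustration.

```python
def is_ascii(text):
    """Return True if a str, bytes, or bytearray holds only ASCII (0-127)."""
    if hasattr(text, "isascii"):
        # Python 3.7+: built-in method on str, bytes, and bytearray
        return text.isascii()
    if isinstance(text, str):
        # Older Python 3: encoding to ASCII fails on any non-ASCII character
        try:
            text.encode("ascii")
        except UnicodeEncodeError:
            return False
        return True
    # bytes or bytearray on older Python 3: iterating yields ints
    return all(byte < 128 for byte in text)


print(is_ascii("text"))                         # True
print(is_ascii("text-строка"))                  # False
print(is_ascii("text-строка".encode("utf-8")))  # False
print(is_ascii(b"text"))                        # True
```

Anything that is neither text nor bytes is outside what this sketch handles.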
How do I check if a string is unicode or ascii?
In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.
In Python 2, a string may be of type str or of type unicode. You can tell which using code something like this:
def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"
This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist of purely characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
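To illustrate that the type and the content are separate questions, here is a minimal Python 3 sketch. The function name what_is_this and the wording of its return values are made up for this example. It reports the type first and then, independently, whether the content stays within the ASCII range.

```python
def what_is_this(value):
    """Describe the type of value and whether its content is pure ASCII."""
    if isinstance(value, str):
        kind = "text string (str)"
        # str iterates as characters, so compare code points
        ascii_only = all(ord(ch) < 128 for ch in value)
    elif isinstance(value, (bytes, bytearray)):
        kind = "byte string"
        # bytes iterate as ints in Python 3
        ascii_only = all(b < 128 for b in value)
    else:
        return "not a string"
    content = "ASCII only" if ascii_only else "contains non-ASCII"
    return "{}, {}".format(kind, content)


print(what_is_this("abc"))                    # text string (str), ASCII only
print(what_is_this("naïve"))                  # text string (str), contains non-ASCII
print(what_is_this("naïve".encode("utf-8")))  # byte string, contains non-ASCII
print(what_is_this(42))                       # not a string
```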
How to detect non-ASCII character in Python?
# -*- coding: utf-8 -*-
import re
elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
for e in elements:
    # If anything is left after removing printable ASCII, e has other characters
    if re.sub('[ -~]', '', e) != "":
        # do something here
        print("-")
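re.sub('[ -~]', '', e) strips every printable ASCII character (space through tilde) from e by replacing it with "", so only the other characters of e remain. Note that ASCII control characters such as tab and newline fall outside the [ -~] range, so they are left in place and are reported along with non-ASCII characters.
Hope this helps.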
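A variation on the same idea, sketched under the assumption that "ASCII" means the full 0-127 range (control characters included): match the non-ASCII characters directly with a negated class instead of deleting the ASCII ones. The helper name non_ascii_chars is hypothetical.

```python
import re

# Matches any single character outside the 7-bit ASCII range
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_chars(text):
    """Return a list of the non-ASCII characters found in text."""
    return NON_ASCII.findall(text)


elements = ["2", "3", "13", "37\u201341", "43", "44", "46"]
for e in elements:
    found = non_ascii_chars(e)
    if found:
        # Show the offending code points in hex, e.g. 0x2013 for an en dash
        print(repr(e), "contains", [hex(ord(ch)) for ch in found])
```

Returning the matches rather than a boolean makes it easy to report which characters caused the problem.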
How to detect ASCII characters on a string in python
It seems you actually have two questions.
How to discover whether accented characters need converting to similar ASCII characters:
#coding: utf-8
import string
text = u"Montréal, über, 12.89, Mère, Françoise, noël, 889"
allowed_letters = string.printable
# Collect every character that is not printable ASCII
name_has_accented = [letter for letter in text if letter not in allowed_letters]
if name_has_accented:
    # convert() is defined further down
    text = "".join(convert(text))
print(text)
How to convert them easily to unaccented characters? You could devise a nice generic solution, or you might handle French only, quite easily, like this:
def convert(text):
    # Map each accented letter to its closest plain ASCII letter
    replacements = {
        u"à": "a",
        u"Ö": "O",
        u"é": "e",
        u"ü": "u",
        u"ç": "c",
        u"ë": "e",
        u"è": "e",
    }
    def convert_letter(letter):
        try:
            return replacements[letter]
        except KeyError:
            # Not in the table: keep the letter unchanged
            return letter
    return [convert_letter(letter) for letter in text]
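One possible generic approach, offered as a sketch rather than a complete solution, uses the standard unicodedata module: decompose each character into a base letter plus combining marks (NFKD normalization), then drop the combining marks. The function name strip_accents is an assumption for this example.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accents after decomposing characters (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() returns a non-zero class for accent marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Montréal, über, 12.89, Mère, Françoise, noël, 889"))
# Montreal, uber, 12.89, Mere, Francoise, noel, 889
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so a small replacement table may still be needed for those.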
Check if any (all) character of a string is in a given range
You can speed up the check by using a set (O(1) contains check), especially if you are checking multiple strings for the same range, since the initial set creation requires one iteration as well. You can then use all for the early-breaking iteration pattern, which fits better than any here:
import string
ascii = set(string.ascii_uppercase)
ascii_all = set(string.ascii_uppercase + string.ascii_lowercase)
if all(x in ascii for x in my_string1):
    # every character of my_string1 is in the set
    pass
Of course, any all construct can be transformed into an any via De Morgan's law:
if not any(x not in ascii for x in my_string1):
    # every character of my_string1 is in the set
    pass
Update:
One good pure set based approach not requiring a complete iteration as pointed out by Artyer:
if ascii.issuperset(my_string1):
    # every character of my_string1 is in the set
    pass
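Put together as a runnable sketch, the "all" and "any" variants of the set-based check might look like this. The names ALLOWED, all_in_range, and any_in_range are illustrative, and the allowed set here is assumed to be the ASCII letters.

```python
import string

# Build the lookup set once and reuse it for every string checked
ALLOWED = frozenset(string.ascii_uppercase + string.ascii_lowercase)


def all_in_range(text, allowed=ALLOWED):
    """True if every character of text is in allowed."""
    return allowed.issuperset(text)


def any_in_range(text, allowed=ALLOWED):
    """True if at least one character of text is in allowed."""
    return not allowed.isdisjoint(text)


print(all_in_range("Hello"))   # True
print(all_in_range("Hello!"))  # False
print(any_in_range("123a"))    # True
print(any_in_range("123"))     # False
```

A frozenset avoids shadowing the built-in ascii() function in Python 3 and signals that the set is not meant to change.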
How to get the ASCII value of a character
From here:
The function ord() gets the int value of the char. And in case you want to convert back after playing with the number, the function chr() does the trick.
>>> ord('a')
97
>>> chr(97)
'a'
>>> chr(ord('a') + 3)
'd'
In Python 2, there was also the unichr function, returning the Unicode character whose ordinal is the unichr argument:
>>> unichr(97)
u'a'
>>> unichr(1234)
u'\u04d2'
In Python 3 you can use chr instead of unichr.
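As a small Python 3 illustration of the round trip between characters and their integer values, the sketch below converts a string to a list of code points and back. The helper names code_points and from_code_points are made up for this example.

```python
def code_points(text):
    """Return the integer code point of each character in text."""
    return [ord(ch) for ch in text]


def from_code_points(points):
    """Rebuild a string from a sequence of integer code points."""
    return "".join(chr(p) for p in points)


print(code_points("abc"))                # [97, 98, 99]
print(from_code_points([97, 98, 99]))    # abc
print(chr(1234) == "\u04d2")             # True
print(from_code_points(code_points("строка")) == "строка")  # True
```

In Python 3, chr() accepts any code point up to 0x10FFFF, so it covers what unichr did in Python 2.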
ord() - Python 3.6.5rc1 documentation
ord() - Python 2.7.14 documentation