How to Check If a String in Python Is in ASCII

How to check if a string in Python is in ASCII?

def is_ascii(s):
    return all(ord(c) < 128 for c in s)

Check that a string contains only ASCII characters?

Python 3.7 added a method that does exactly this:

str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
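As a minimal sketch of how this might be used, the helper below prefers the built-in method and falls back to a manual check where it is missing. The name is_ascii is only illustrative, and the fallback assumes the argument is a str, bytes, or bytearray.

```python
def is_ascii(value):
    """Return True if a str, bytes, or bytearray holds only ASCII."""
    try:
        # Python 3.7+: built-in method on str, bytes, and bytearray.
        return value.isascii()
    except AttributeError:
        # Fallback for older Python 3 versions.
        # Iterating a str yields characters; iterating bytes yields ints.
        return all((ord(c) if isinstance(c, str) else c) < 128 for c in value)


print(is_ascii("string"))       # True
print(is_ascii("строка"))       # False
print(is_ascii(b"raw bytes"))   # True
print(is_ascii(""))             # True: an empty string counts as ASCII
```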


On versions before Python 3.7, check each character's code point yourself:

>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False

Another version, for Python 2, where str holds bytes and unicode holds text:

>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False

How do I check if a string is unicode or ascii?

In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

In Python 2, a string may be of type str or of type unicode. You can tell which using code something like this:

def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist purely of characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
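For Python 3, the same idea might look like the sketch below, which reports the type and the content separately to show that they are independent questions. The function name describe and its output wording are made up for illustration.

```python
def describe(value):
    """Report the Python type and whether the content fits in ASCII."""
    if isinstance(value, str):
        kind = "str"
        ascii_only = all(ord(ch) < 128 for ch in value)
    elif isinstance(value, (bytes, bytearray)):
        kind = "bytes"
        # Iterating bytes in Python 3 yields integers in range(256).
        ascii_only = all(b < 128 for b in value)
    else:
        return "not a string"
    label = "ASCII only" if ascii_only else "contains non-ASCII"
    return "{} ({})".format(kind, label)


print(describe("text"))                     # str (ASCII only)
print(describe("строка"))                   # str (contains non-ASCII)
print(describe("строка".encode("utf-8")))   # bytes (contains non-ASCII)
print(describe(42))                         # not a string
```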

How to detect non-ASCII character in Python?

# -*- coding: utf-8 -*-

import re

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

for e in elements:
    # Anything left after removing printable ASCII is a non-ASCII
    # (or control) character.
    if re.sub(r'[ -~]', '', e) != "":
        # do something here
        print("-")

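re.sub('[ -~]', '', e) removes every printable ASCII character (space through tilde) from e by replacing it with "", so only the other characters remain. Note that the class [ -~] does not cover ASCII control characters such as tab or newline, so those are left in the result as well.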
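If you only need to know whether, and where, a non-ASCII character occurs, a search is enough and avoids building a new string. This is a sketch for Python 3; the pattern name NON_ASCII and the helper first_non_ascii are illustrative. Unlike [ -~], this class treats control characters such as tab and newline as ASCII.

```python
import re

# Matches any character outside the 7-bit ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def first_non_ascii(text):
    """Return the index of the first non-ASCII character, or None."""
    match = NON_ASCII.search(text)
    return match.start() if match else None


elements = ["2", "3", "13", "37\u201341", "43", "44", "46"]

for e in elements:
    index = first_non_ascii(e)
    if index is not None:
        # ascii() escapes the offending character so the output is safe
        # to print on any terminal.
        print("{}: non-ASCII character at index {}".format(ascii(e), index))
```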

Hope this helps.

How to detect ASCII characters on a string in python

It seems you actually have two questions.

  1. How to tell whether accented characters need converting to their closest ASCII equivalents:

    #coding: utf-8
    import string
    text = u"Montréal, über, 12.89, Mère, Françoise, noël, 889"
    allowed_letters = string.printable
    # Collect every character that falls outside printable ASCII.
    name_has_accented = [letter for letter in text if letter not in allowed_letters]
    if name_has_accented:
        # convert() is defined in step 2 and must exist before this runs.
        text = "".join(convert(text))
    print(text)

  2. How to convert them easily to unaccented letters? You could devise a nice generic solution, or you might handle French only, quite easily, like this:

    def convert(text):
        replacements = {
            u"à": "a",
            u"Ö": "o",
            u"é": "e",
            u"ü": "u",
            u"ç": "c",
            u"ë": "e",
            u"è": "e",
        }
        def convert_letter(letter):
            try:
                return replacements[letter]
            except KeyError:
                return letter
        return [convert_letter(letter) for letter in text]
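One possible generic approach, which is a different technique from the lookup table above, is to decompose each character with the standard unicodedata module and drop the combining marks. This is only a sketch: strip_accents is an illustrative name, and letters that have no decomposition (for example "ø" or "ß") pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accents, keeping the base letters."""
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


text = "Montréal, über, 12.89, Mère, Françoise, noël, 889"
print(strip_accents(text))
# Montreal, uber, 12.89, Mere, Francoise, noel, 889
```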

Check if any (all) character of a string is in a given range

You can speed up the check by using a set (O(1) contains check), especially if you are checking multiple strings for the same range since the initial set creation requires one iteration as well. You can then use all for the early-breaking iteration pattern which fits better than any here:

import string

ascii = set(string.ascii_uppercase)
ascii_all = set(string.ascii_uppercase + string.ascii_lowercase)

if all(x in ascii for x in my_string1):
    pass  # my_string1 contains only uppercase ASCII letters

Of course, any all construct can be transformed into an any via De Morgan's law:

if not any(x not in ascii for x in my_string1):
    pass  # my_string1 contains only uppercase ASCII letters

Update:

One good pure set-based approach that needs no explicit Python-level loop, as pointed out by Artyer:

if ascii.issuperset(my_string1):
    pass  # my_string1 contains only uppercase ASCII letters
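Put together as something runnable, the set-based checks might look like the sketch below. The names UPPERCASE, LETTERS, all_in, and any_in are illustrative, and the sets avoid the name ascii so the built-in function of that name is not shadowed.

```python
import string

# Build the lookup sets once and reuse them for every string checked.
UPPERCASE = frozenset(string.ascii_uppercase)
LETTERS = frozenset(string.ascii_letters)


def all_in(text, allowed):
    """True if every character of text is in the allowed set."""
    return allowed.issuperset(text)


def any_in(text, allowed):
    """True if at least one character of text is in the allowed set."""
    return not allowed.isdisjoint(text)


print(all_in("HELLO", UPPERCASE))   # True
print(all_in("Hello", UPPERCASE))   # False
print(all_in("Hello", LETTERS))     # True
print(any_in("1234x", LETTERS))     # True
print(any_in("1234", LETTERS))      # False
```

Both issuperset and isdisjoint accept any iterable, so a plain string can be passed directly.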

How to get the ASCII value of a character

From here:

The function ord() gets the int value
of the char. And in case you want to
convert back after playing with the
number, function chr() does the trick.

>>> ord('a')
97
>>> chr(97)
'a'
>>> chr(ord('a') + 3)
'd'

In Python 2, there was also the unichr function, returning the Unicode character whose ordinal is the unichr argument:

>>> unichr(97)
u'a'
>>> unichr(1234)
u'\u04d2'

In Python 3 you can use chr instead of unichr.
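A short Python 3 sketch of the same calls follows. Note that ord() returns a Unicode code point, which matches the ASCII value only for results below 128. The helper name ascii_value is hypothetical.

```python
def ascii_value(char):
    """Return the ASCII value of char, or None if it is not ASCII."""
    code_point = ord(char)
    return code_point if code_point < 128 else None


print(ord("a"))              # 97
print(chr(97))               # a
print(ord("é"))              # 233, a code point beyond the ASCII range
print(ascii(chr(1234)))      # '\u04d2', same character as unichr(1234)
print(ascii_value("a"))      # 97
print(ascii_value("é"))      # None
```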


ord() - Python 3.6.5rc1 documentation

ord() - Python 2.7.14 documentation


