What's the Difference Between Str.Isdigit, Isnumeric and Isdecimal in Python

What's the difference between str.isdigit(), isnumeric() and isdecimal() in Python?

It's mostly about unicode classifications. Here's some examples to show discrepancies:

>>> def spam(s):
... for attr in 'isnumeric', 'isdecimal', 'isdigit':
... print(attr, getattr(s, attr)())
...
>>> spam('½')
isnumeric True
isdecimal False
isdigit False
>>> spam('³')
isnumeric True
isdecimal False
isdigit True

Specific behaviour is in the official docs here.

Script to find all of them:

import sys
import unicodedata
from collections import defaultdict

d = defaultdict(list)
for i in range(sys.maxunicode + 1):
s = chr(i)
t = s.isnumeric(), s.isdecimal(), s.isdigit()
if len(set(t)) == 2:
try:
name = unicodedata.name(s)
except ValueError:
name = f'codepoint{i}'
print(s, name)
d[t].append(s)

str.isdecimal() and str.isdigit() difference example

There are differences, but they're somewhat rare*. It mainly crops up with various unicode characters, such as 2:

>>> c = '\u00B2'
>>> c.isdecimal()
False
>>> c.isdigit()
True

You can also go further down the careful-unicode-distinction rabbit hole with the isnumeric method:

>>> c = '\u00BD' # ½
>>> c.isdecimal()
False
>>> c.isdigit()
False
>>> c.isnumeric()
True

*At least, I've never encountered production code that needs to distinguish between strings that contain different types of these exceptional situations, but surely use cases exist somewhere.

Difference between isnumeric and isdecimal in Python

The two methods test for specific Unicode character classes. If all characters in the string are from the specified character class (have the specific Unicode property), the test is true.

isdecimal() does not test if a string is a decimal number. See the documentation:

Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category “Nd”.

The . period character is not a member of the Nd category; it is not a decimal character.

str.isdecimal() characters are a subset of str.isnumeric(); this tests for all numeric characters. Again, from the documentation:

Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

Nd is Numeric_Type=Digit here.

If you want to test if a string is a valid decimal number, just try to convert it to a float:

def is_valid_decimal(s):
try:
float(s)
except ValueError:
return False
else:
return True

str.isdigit() behaviour when handling strings

str.isdigit doesn't claim to be related to parsability as an int. It's reporting a simple Unicode property, is it a decimal character or digit of some sort:

str.isdigit()

Return True if all characters in the string are digits and there is at least one character, False otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

In short, str.isdigit is thoroughly useless for detecting valid numbers. The correct solution to checking if a given string is a legal integer is to call int on it, and catch the ValueError if it's not a legal integer. Anything else you do will be (badly) reinventing the same tests the actual parsing code in int() performs, so why not let it do the work in the first place?

Side-note: You're using the term "utf-8" incorrectly. UTF-8 is a specific way of encoding Unicode, and only applies to raw binary data. Python's str is an "idealized" Unicode text type; it has no encoding (under the hood, it's stored encoded as one of ASCII, latin-1, UCS-2, UCS-4, and possibly also UTF-8, but none of that is visible at the Python layer outside of indirect measurements like sys.getsizeof, which only hints at the underlying encoding by letting you see how much memory the string consumes). The characters you're talking about are simple Unicode characters above the ASCII range, they're not specifically UTF-8.

Difference between unicode.isdigit() and unicode.isnumeric()

The Python 3 documentation is a little clearer than the Python 2 docs:

str.isdigit()
[...] Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isnumeric()
Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

So isnumeric() tests additionally for Numeric_Type=Numeric. Quoting from a historic proposal for official numeric type definitions:

Numeric_Type=Decimal
Characters used in a positional decimal systems, which standard base-10 radix systems with contiguous digits 0..9, and are most-significant-digit first (backingstore order). These are coextensive by definition with General_Category=Decimal_Number.

Numeric_Type=Digit
Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include super/subscripts, enclosed, or decorated by the addition of characters such as parentheses, dots, or commas.

Numeric_Type=Numeric
Characters with numeric value, but that are neither Decimal nor Digit.

So any character that is numeric, but not decimal or a variation thereof. Think fractions, roman numerals, glyphs that combine digits, and any numbering system that is not decimal-based.

That includes:

>>> import unicodedata
>>> for codepoint in range(2**16):
... chr = unichr(codepoint)
... if chr.isnumeric() and not chr.isdigit():
... print u'{:04x}: {} ({})'.format(codepoint, chr, unicodedata.name(chr, 'UNNAMED'))
...
00bc: ¼ (VULGAR FRACTION ONE QUARTER)
00bd: ½ (VULGAR FRACTION ONE HALF)
00be: ¾ (VULGAR FRACTION THREE QUARTERS)
09f4: ৴ (BENGALI CURRENCY NUMERATOR ONE)
09f5: ৵ (BENGALI CURRENCY NUMERATOR TWO)
09f6: ৶ (BENGALI CURRENCY NUMERATOR THREE)
09f7: ৷ (BENGALI CURRENCY NUMERATOR FOUR)
09f8: ৸ (BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR)
09f9: ৹ (BENGALI CURRENCY DENOMINATOR SIXTEEN)
0bf0: ௰ (TAMIL NUMBER TEN)
0bf1: ௱ (TAMIL NUMBER ONE HUNDRED)
0bf2: ௲ (TAMIL NUMBER ONE THOUSAND)
0c78: ౸ (TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR)
0c79: ౹ (TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR)
0c7a: ౺ (TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR)
0c7b: ౻ (TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR)
0c7c: ౼ (TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR)
0c7d: ౽ (TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR)
0c7e: ౾ (TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR)
0d70: ൰ (MALAYALAM NUMBER TEN)
0d71: ൱ (MALAYALAM NUMBER ONE HUNDRED)
0d72: ൲ (MALAYALAM NUMBER ONE THOUSAND)
0d73: ൳ (MALAYALAM FRACTION ONE QUARTER)
0d74: ൴ (MALAYALAM FRACTION ONE HALF)
0d75: ൵ (MALAYALAM FRACTION THREE QUARTERS)
0f2a: ༪ (TIBETAN DIGIT HALF ONE)
0f2b: ༫ (TIBETAN DIGIT HALF TWO)
0f2c: ༬ (TIBETAN DIGIT HALF THREE)
0f2d: ༭ (TIBETAN DIGIT HALF FOUR)
0f2e: ༮ (TIBETAN DIGIT HALF FIVE)
0f2f: ༯ (TIBETAN DIGIT HALF SIX)
0f30: ༰ (TIBETAN DIGIT HALF SEVEN)
0f31: ༱ (TIBETAN DIGIT HALF EIGHT)
0f32: ༲ (TIBETAN DIGIT HALF NINE)
0f33: ༳ (TIBETAN DIGIT HALF ZERO)
1372: ፲ (ETHIOPIC NUMBER TEN)
1373: ፳ (ETHIOPIC NUMBER TWENTY)
1374: ፴ (ETHIOPIC NUMBER THIRTY)
1375: ፵ (ETHIOPIC NUMBER FORTY)
1376: ፶ (ETHIOPIC NUMBER FIFTY)
1377: ፷ (ETHIOPIC NUMBER SIXTY)
1378: ፸ (ETHIOPIC NUMBER SEVENTY)
1379: ፹ (ETHIOPIC NUMBER EIGHTY)
137a: ፺ (ETHIOPIC NUMBER NINETY)
137b: ፻ (ETHIOPIC NUMBER HUNDRED)
137c: ፼ (ETHIOPIC NUMBER TEN THOUSAND)
16ee: ᛮ (RUNIC ARLAUG SYMBOL)
16ef: ᛯ (RUNIC TVIMADUR SYMBOL)
16f0: ᛰ (RUNIC BELGTHOR SYMBOL)
17f0: ៰ (KHMER SYMBOL LEK ATTAK SON)
17f1: ៱ (KHMER SYMBOL LEK ATTAK MUOY)
17f2: ៲ (KHMER SYMBOL LEK ATTAK PII)
17f3: ៳ (KHMER SYMBOL LEK ATTAK BEI)
17f4: ៴ (KHMER SYMBOL LEK ATTAK BUON)
17f5: ៵ (KHMER SYMBOL LEK ATTAK PRAM)
17f6: ៶ (KHMER SYMBOL LEK ATTAK PRAM-MUOY)
17f7: ៷ (KHMER SYMBOL LEK ATTAK PRAM-PII)
17f8: ៸ (KHMER SYMBOL LEK ATTAK PRAM-BEI)
17f9: ៹ (KHMER SYMBOL LEK ATTAK PRAM-BUON)
2150: ⅐ (VULGAR FRACTION ONE SEVENTH)
2151: ⅑ (VULGAR FRACTION ONE NINTH)
2152: ⅒ (VULGAR FRACTION ONE TENTH)
2153: ⅓ (VULGAR FRACTION ONE THIRD)
2154: ⅔ (VULGAR FRACTION TWO THIRDS)
2155: ⅕ (VULGAR FRACTION ONE FIFTH)
2156: ⅖ (VULGAR FRACTION TWO FIFTHS)
2157: ⅗ (VULGAR FRACTION THREE FIFTHS)
2158: ⅘ (VULGAR FRACTION FOUR FIFTHS)
2159: ⅙ (VULGAR FRACTION ONE SIXTH)
215a: ⅚ (VULGAR FRACTION FIVE SIXTHS)
215b: ⅛ (VULGAR FRACTION ONE EIGHTH)
215c: ⅜ (VULGAR FRACTION THREE EIGHTHS)
215d: ⅝ (VULGAR FRACTION FIVE EIGHTHS)
215e: ⅞ (VULGAR FRACTION SEVEN EIGHTHS)
215f: ⅟ (FRACTION NUMERATOR ONE)
2160: Ⅰ (ROMAN NUMERAL ONE)
2161: Ⅱ (ROMAN NUMERAL TWO)
2162: Ⅲ (ROMAN NUMERAL THREE)
2163: Ⅳ (ROMAN NUMERAL FOUR)
2164: Ⅴ (ROMAN NUMERAL FIVE)
2165: Ⅵ (ROMAN NUMERAL SIX)
2166: Ⅶ (ROMAN NUMERAL SEVEN)
2167: Ⅷ (ROMAN NUMERAL EIGHT)
2168: Ⅸ (ROMAN NUMERAL NINE)
2169: Ⅹ (ROMAN NUMERAL TEN)
216a: Ⅺ (ROMAN NUMERAL ELEVEN)
216b: Ⅻ (ROMAN NUMERAL TWELVE)
216c: Ⅼ (ROMAN NUMERAL FIFTY)
216d: Ⅽ (ROMAN NUMERAL ONE HUNDRED)
216e: Ⅾ (ROMAN NUMERAL FIVE HUNDRED)
216f: Ⅿ (ROMAN NUMERAL ONE THOUSAND)
2170: ⅰ (SMALL ROMAN NUMERAL ONE)
2171: ⅱ (SMALL ROMAN NUMERAL TWO)
2172: ⅲ (SMALL ROMAN NUMERAL THREE)
2173: ⅳ (SMALL ROMAN NUMERAL FOUR)
2174: ⅴ (SMALL ROMAN NUMERAL FIVE)
2175: ⅵ (SMALL ROMAN NUMERAL SIX)
2176: ⅶ (SMALL ROMAN NUMERAL SEVEN)
2177: ⅷ (SMALL ROMAN NUMERAL EIGHT)
2178: ⅸ (SMALL ROMAN NUMERAL NINE)
2179: ⅹ (SMALL ROMAN NUMERAL TEN)
217a: ⅺ (SMALL ROMAN NUMERAL ELEVEN)
217b: ⅻ (SMALL ROMAN NUMERAL TWELVE)
217c: ⅼ (SMALL ROMAN NUMERAL FIFTY)
217d: ⅽ (SMALL ROMAN NUMERAL ONE HUNDRED)
217e: ⅾ (SMALL ROMAN NUMERAL FIVE HUNDRED)
217f: ⅿ (SMALL ROMAN NUMERAL ONE THOUSAND)
2180: ↀ (ROMAN NUMERAL ONE THOUSAND C D)
2181: ↁ (ROMAN NUMERAL FIVE THOUSAND)
2182: ↂ (ROMAN NUMERAL TEN THOUSAND)
2185: ↅ (ROMAN NUMERAL SIX LATE FORM)
2186: ↆ (ROMAN NUMERAL FIFTY EARLY FORM)
2187: ↇ (ROMAN NUMERAL FIFTY THOUSAND)
2188: ↈ (ROMAN NUMERAL ONE HUNDRED THOUSAND)
2189: ↉ (VULGAR FRACTION ZERO THIRDS)
2469: ⑩ (CIRCLED NUMBER TEN)
246a: ⑪ (CIRCLED NUMBER ELEVEN)
246b: ⑫ (CIRCLED NUMBER TWELVE)
246c: ⑬ (CIRCLED NUMBER THIRTEEN)
246d: ⑭ (CIRCLED NUMBER FOURTEEN)
246e: ⑮ (CIRCLED NUMBER FIFTEEN)
246f: ⑯ (CIRCLED NUMBER SIXTEEN)
2470: ⑰ (CIRCLED NUMBER SEVENTEEN)
2471: ⑱ (CIRCLED NUMBER EIGHTEEN)
2472: ⑲ (CIRCLED NUMBER NINETEEN)
2473: ⑳ (CIRCLED NUMBER TWENTY)
247d: ⑽ (PARENTHESIZED NUMBER TEN)
247e: ⑾ (PARENTHESIZED NUMBER ELEVEN)
247f: ⑿ (PARENTHESIZED NUMBER TWELVE)
2480: ⒀ (PARENTHESIZED NUMBER THIRTEEN)
2481: ⒁ (PARENTHESIZED NUMBER FOURTEEN)
2482: ⒂ (PARENTHESIZED NUMBER FIFTEEN)
2483: ⒃ (PARENTHESIZED NUMBER SIXTEEN)
2484: ⒄ (PARENTHESIZED NUMBER SEVENTEEN)
2485: ⒅ (PARENTHESIZED NUMBER EIGHTEEN)
2486: ⒆ (PARENTHESIZED NUMBER NINETEEN)
2487: ⒇ (PARENTHESIZED NUMBER TWENTY)
2491: ⒑ (NUMBER TEN FULL STOP)
2492: ⒒ (NUMBER ELEVEN FULL STOP)
2493: ⒓ (NUMBER TWELVE FULL STOP)
2494: ⒔ (NUMBER THIRTEEN FULL STOP)
2495: ⒕ (NUMBER FOURTEEN FULL STOP)
2496: ⒖ (NUMBER FIFTEEN FULL STOP)
2497: ⒗ (NUMBER SIXTEEN FULL STOP)
2498: ⒘ (NUMBER SEVENTEEN FULL STOP)
2499: ⒙ (NUMBER EIGHTEEN FULL STOP)
249a: ⒚ (NUMBER NINETEEN FULL STOP)
249b: ⒛ (NUMBER TWENTY FULL STOP)
24eb: ⓫ (NEGATIVE CIRCLED NUMBER ELEVEN)
24ec: ⓬ (NEGATIVE CIRCLED NUMBER TWELVE)
24ed: ⓭ (NEGATIVE CIRCLED NUMBER THIRTEEN)
24ee: ⓮ (NEGATIVE CIRCLED NUMBER FOURTEEN)
24ef: ⓯ (NEGATIVE CIRCLED NUMBER FIFTEEN)
24f0: ⓰ (NEGATIVE CIRCLED NUMBER SIXTEEN)
24f1: ⓱ (NEGATIVE CIRCLED NUMBER SEVENTEEN)
24f2: ⓲ (NEGATIVE CIRCLED NUMBER EIGHTEEN)
24f3: ⓳ (NEGATIVE CIRCLED NUMBER NINETEEN)
24f4: ⓴ (NEGATIVE CIRCLED NUMBER TWENTY)
24fe: ⓾ (DOUBLE CIRCLED NUMBER TEN)
277f: ❿ (DINGBAT NEGATIVE CIRCLED NUMBER TEN)
2789: ➉ (DINGBAT CIRCLED SANS-SERIF NUMBER TEN)
2793: ➓ (DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN)
2cfd: ⳽ (COPTIC FRACTION ONE HALF)
3007: 〇 (IDEOGRAPHIC NUMBER ZERO)
3021: 〡 (HANGZHOU NUMERAL ONE)
3022: 〢 (HANGZHOU NUMERAL TWO)
3023: 〣 (HANGZHOU NUMERAL THREE)
3024: 〤 (HANGZHOU NUMERAL FOUR)
3025: 〥 (HANGZHOU NUMERAL FIVE)
3026: 〦 (HANGZHOU NUMERAL SIX)
3027: 〧 (HANGZHOU NUMERAL SEVEN)
3028: 〨 (HANGZHOU NUMERAL EIGHT)
3029: 〩 (HANGZHOU NUMERAL NINE)
3038: 〸 (HANGZHOU NUMERAL TEN)
3039: 〹 (HANGZHOU NUMERAL TWENTY)
303a: 〺 (HANGZHOU NUMERAL THIRTY)
3192: ㆒ (IDEOGRAPHIC ANNOTATION ONE MARK)
3193: ㆓ (IDEOGRAPHIC ANNOTATION TWO MARK)
3194: ㆔ (IDEOGRAPHIC ANNOTATION THREE MARK)
3195: ㆕ (IDEOGRAPHIC ANNOTATION FOUR MARK)
3220: ㈠ (PARENTHESIZED IDEOGRAPH ONE)
3221: ㈡ (PARENTHESIZED IDEOGRAPH TWO)
3222: ㈢ (PARENTHESIZED IDEOGRAPH THREE)
3223: ㈣ (PARENTHESIZED IDEOGRAPH FOUR)
3224: ㈤ (PARENTHESIZED IDEOGRAPH FIVE)
3225: ㈥ (PARENTHESIZED IDEOGRAPH SIX)
3226: ㈦ (PARENTHESIZED IDEOGRAPH SEVEN)
3227: ㈧ (PARENTHESIZED IDEOGRAPH EIGHT)
3228: ㈨ (PARENTHESIZED IDEOGRAPH NINE)
3229: ㈩ (PARENTHESIZED IDEOGRAPH TEN)
3251: ㉑ (CIRCLED NUMBER TWENTY ONE)
3252: ㉒ (CIRCLED NUMBER TWENTY TWO)
3253: ㉓ (CIRCLED NUMBER TWENTY THREE)
3254: ㉔ (CIRCLED NUMBER TWENTY FOUR)
3255: ㉕ (CIRCLED NUMBER TWENTY FIVE)
3256: ㉖ (CIRCLED NUMBER TWENTY SIX)
3257: ㉗ (CIRCLED NUMBER TWENTY SEVEN)
3258: ㉘ (CIRCLED NUMBER TWENTY EIGHT)
3259: ㉙ (CIRCLED NUMBER TWENTY NINE)
325a: ㉚ (CIRCLED NUMBER THIRTY)
325b: ㉛ (CIRCLED NUMBER THIRTY ONE)
325c: ㉜ (CIRCLED NUMBER THIRTY TWO)
325d: ㉝ (CIRCLED NUMBER THIRTY THREE)
325e: ㉞ (CIRCLED NUMBER THIRTY FOUR)
325f: ㉟ (CIRCLED NUMBER THIRTY FIVE)
3280: ㊀ (CIRCLED IDEOGRAPH ONE)
3281: ㊁ (CIRCLED IDEOGRAPH TWO)
3282: ㊂ (CIRCLED IDEOGRAPH THREE)
3283: ㊃ (CIRCLED IDEOGRAPH FOUR)
3284: ㊄ (CIRCLED IDEOGRAPH FIVE)
3285: ㊅ (CIRCLED IDEOGRAPH SIX)
3286: ㊆ (CIRCLED IDEOGRAPH SEVEN)
3287: ㊇ (CIRCLED IDEOGRAPH EIGHT)
3288: ㊈ (CIRCLED IDEOGRAPH NINE)
3289: ㊉ (CIRCLED IDEOGRAPH TEN)
32b1: ㊱ (CIRCLED NUMBER THIRTY SIX)
32b2: ㊲ (CIRCLED NUMBER THIRTY SEVEN)
32b3: ㊳ (CIRCLED NUMBER THIRTY EIGHT)
32b4: ㊴ (CIRCLED NUMBER THIRTY NINE)
32b5: ㊵ (CIRCLED NUMBER FORTY)
32b6: ㊶ (CIRCLED NUMBER FORTY ONE)
32b7: ㊷ (CIRCLED NUMBER FORTY TWO)
32b8: ㊸ (CIRCLED NUMBER FORTY THREE)
32b9: ㊹ (CIRCLED NUMBER FORTY FOUR)
32ba: ㊺ (CIRCLED NUMBER FORTY FIVE)
32bb: ㊻ (CIRCLED NUMBER FORTY SIX)
32bc: ㊼ (CIRCLED NUMBER FORTY SEVEN)
32bd: ㊽ (CIRCLED NUMBER FORTY EIGHT)
32be: ㊾ (CIRCLED NUMBER FORTY NINE)
32bf: ㊿ (CIRCLED NUMBER FIFTY)
3405: 㐅 (CJK UNIFIED IDEOGRAPH-3405)
3483: 㒃 (CJK UNIFIED IDEOGRAPH-3483)
382a: 㠪 (CJK UNIFIED IDEOGRAPH-382A)
3b4d: 㭍 (CJK UNIFIED IDEOGRAPH-3B4D)
4e00: 一 (CJK UNIFIED IDEOGRAPH-4E00)
4e03: 七 (CJK UNIFIED IDEOGRAPH-4E03)
4e07: 万 (CJK UNIFIED IDEOGRAPH-4E07)
4e09: 三 (CJK UNIFIED IDEOGRAPH-4E09)
4e5d: 九 (CJK UNIFIED IDEOGRAPH-4E5D)
4e8c: 二 (CJK UNIFIED IDEOGRAPH-4E8C)
4e94: 五 (CJK UNIFIED IDEOGRAPH-4E94)
4e96: 亖 (CJK UNIFIED IDEOGRAPH-4E96)
4ebf: 亿 (CJK UNIFIED IDEOGRAPH-4EBF)
4ec0: 什 (CJK UNIFIED IDEOGRAPH-4EC0)
4edf: 仟 (CJK UNIFIED IDEOGRAPH-4EDF)
4ee8: 仨 (CJK UNIFIED IDEOGRAPH-4EE8)
4f0d: 伍 (CJK UNIFIED IDEOGRAPH-4F0D)
4f70: 佰 (CJK UNIFIED IDEOGRAPH-4F70)
5104: 億 (CJK UNIFIED IDEOGRAPH-5104)
5146: 兆 (CJK UNIFIED IDEOGRAPH-5146)
5169: 兩 (CJK UNIFIED IDEOGRAPH-5169)
516b: 八 (CJK UNIFIED IDEOGRAPH-516B)
516d: 六 (CJK UNIFIED IDEOGRAPH-516D)
5341: 十 (CJK UNIFIED IDEOGRAPH-5341)
5343: 千 (CJK UNIFIED IDEOGRAPH-5343)
5344: 卄 (CJK UNIFIED IDEOGRAPH-5344)
5345: 卅 (CJK UNIFIED IDEOGRAPH-5345)
534c: 卌 (CJK UNIFIED IDEOGRAPH-534C)
53c1: 叁 (CJK UNIFIED IDEOGRAPH-53C1)
53c2: 参 (CJK UNIFIED IDEOGRAPH-53C2)
53c3: 參 (CJK UNIFIED IDEOGRAPH-53C3)
53c4: 叄 (CJK UNIFIED IDEOGRAPH-53C4)
56db: 四 (CJK UNIFIED IDEOGRAPH-56DB)
58f1: 壱 (CJK UNIFIED IDEOGRAPH-58F1)
58f9: 壹 (CJK UNIFIED IDEOGRAPH-58F9)
5e7a: 幺 (CJK UNIFIED IDEOGRAPH-5E7A)
5efe: 廾 (CJK UNIFIED IDEOGRAPH-5EFE)
5eff: 廿 (CJK UNIFIED IDEOGRAPH-5EFF)
5f0c: 弌 (CJK UNIFIED IDEOGRAPH-5F0C)
5f0d: 弍 (CJK UNIFIED IDEOGRAPH-5F0D)
5f0e: 弎 (CJK UNIFIED IDEOGRAPH-5F0E)
5f10: 弐 (CJK UNIFIED IDEOGRAPH-5F10)
62fe: 拾 (CJK UNIFIED IDEOGRAPH-62FE)
634c: 捌 (CJK UNIFIED IDEOGRAPH-634C)
67d2: 柒 (CJK UNIFIED IDEOGRAPH-67D2)
6f06: 漆 (CJK UNIFIED IDEOGRAPH-6F06)
7396: 玖 (CJK UNIFIED IDEOGRAPH-7396)
767e: 百 (CJK UNIFIED IDEOGRAPH-767E)
8086: 肆 (CJK UNIFIED IDEOGRAPH-8086)
842c: 萬 (CJK UNIFIED IDEOGRAPH-842C)
8cae: 貮 (CJK UNIFIED IDEOGRAPH-8CAE)
8cb3: 貳 (CJK UNIFIED IDEOGRAPH-8CB3)
8d30: 贰 (CJK UNIFIED IDEOGRAPH-8D30)
9621: 阡 (CJK UNIFIED IDEOGRAPH-9621)
9646: 陆 (CJK UNIFIED IDEOGRAPH-9646)
964c: 陌 (CJK UNIFIED IDEOGRAPH-964C)
9678: 陸 (CJK UNIFIED IDEOGRAPH-9678)
96f6: 零 (CJK UNIFIED IDEOGRAPH-96F6)
a6e6: ꛦ (BAMUM LETTER MO)
a6e7: ꛧ (BAMUM LETTER MBAA)
a6e8: ꛨ (BAMUM LETTER TET)
a6e9: ꛩ (BAMUM LETTER KPA)
a6ea: ꛪ (BAMUM LETTER TEN)
a6eb: ꛫ (BAMUM LETTER NTUU)
a6ec: ꛬ (BAMUM LETTER SAMBA)
a6ed: ꛭ (BAMUM LETTER FAAMAE)
a6ee: ꛮ (BAMUM LETTER KOVUU)
a6ef: ꛯ (BAMUM LETTER KOGHOM)
a830: ꠰ (NORTH INDIC FRACTION ONE QUARTER)
a831: ꠱ (NORTH INDIC FRACTION ONE HALF)
a832: ꠲ (NORTH INDIC FRACTION THREE QUARTERS)
a833: ꠳ (NORTH INDIC FRACTION ONE SIXTEENTH)
a834: ꠴ (NORTH INDIC FRACTION ONE EIGHTH)
a835: ꠵ (NORTH INDIC FRACTION THREE SIXTEENTHS)
f96b: 參 (CJK COMPATIBILITY IDEOGRAPH-F96B)
f973: 拾 (CJK COMPATIBILITY IDEOGRAPH-F973)
f978: 兩 (CJK COMPATIBILITY IDEOGRAPH-F978)
f9b2: 零 (CJK COMPATIBILITY IDEOGRAPH-F9B2)
f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)

However, the distinction between Numeric_Type=Digit and Numeric_Type=Numeric is no longer considered useful, and Numeric_Type=Digit is no longer used for new characters since Unicode 6.3.0. Quoting Unicode Standard Annex #44:

Starting with Unicode 6.3.0, no newly encoded numeric characters will be given Numeric_Type=Digit, nor will existing characters with Numeric_Type=Numeric be changed to Numeric_Type=Digit. The distinction between those two types is not considered useful.

Thus, /code> (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) and other characters that once would have been assigned Numeric_Type=Digit have instead been assigned Numeric_Type=Numeric, and they report False for isdigit:

>>> ''.isdigit()
False

Weird behavoir with casting int

This is exactly as documented:

str.isdigit() Return true if all characters in the string are digits
and there is at least one character, false otherwise. Digits include
decimal characters and digits that need special handling, such as the
compatibility superscript digits. This covers digits which cannot be
used to form numbers in base 10, like the Kharosthi numbers. Formally,
a digit is a character that has the property value Numeric_Type=Digit
or Numeric_Type=Decimal.

isnumeric & isdecimal not behaving as expected

As defined in the Python documentation, isnumeric returns True if all characters within the string are numeric, otherwise False. A dot is not considered to be numeric.

How to check if input is a number, and do something about it if it isn't?

You can just check if the string guess is a sequence of decimal digits, and is more efficient than the regex solution.

if not guess.isdecimal():
print("You didn't enter a number")


Related Topics



Leave a reply



Submit