How do I check that a string only contains ASCII characters in Python? Something like Ruby's ascii_only?
I want to be able to tell whether specific string data read from a file is ASCII.
Python 3.7 added methods that do exactly what you want:
str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
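A quick illustration of isascii() on each of the three types (requires Python 3.7 or later). Note that an empty string also counts as ASCII.

```python
# Python 3.7+: str, bytes and bytearray all provide isascii().
print("string".isascii())                    # True
print("строка".isascii())                    # False: Cyrillic letters are outside U+0000..U+007F
print(b"bytes".isascii())                    # True
print(bytearray(b"\xe2\x82\xac").isascii())  # False: the UTF-8 bytes of '€' are >= 0x80
print("".isascii())                          # True: an empty string counts as ASCII
```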
On older versions, you can check each character's code point:
>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
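The same idea carries over to bytes in Python 3, with one twist: iterating over a bytes object yields integers, so ord() is not needed there (ord() expects a one-character string and raises TypeError on an int). A minimal sketch; bytes_are_ascii is a hypothetical helper name, not a standard function.

```python
def bytes_are_ascii(data: bytes) -> bool:
    # Iterating over bytes yields ints in Python 3, so compare them directly.
    return all(byte < 128 for byte in data)

print(bytes_are_ascii(b"string"))                 # True
print(bytes_are_ascii("строка".encode("utf-8")))  # False
```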
Another version:
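>>> def is_ascii(text):
...     if isinstance(text, str):
...         # Text: encodable as ASCII only if every character is below U+0080.
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         # Bytes: decodable as ASCII only if every byte is below 0x80.
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...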
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
Here all() receives a generator expression, which feeds it characters one by one, so the check stops at the first non-ASCII character.
If you have Unicode strings, you can call encode() and catch the exception:
try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")
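The exception also records where encoding failed: UnicodeEncodeError has a start attribute holding the index of the first character the codec rejected. A small sketch built on that; first_non_ascii is a hypothetical helper name.

```python
from typing import Optional, Tuple

def first_non_ascii(text: str) -> Optional[Tuple[int, str]]:
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index in `text` of the first character
        # the ASCII codec could not encode.
        return exc.start, text[exc.start]
    return None

print(first_non_ascii("plain text"))   # None
print(first_non_ascii("text-строка"))  # (5, 'с')
```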
If you have bytes, you can use the third-party chardet module to detect the encoding:
import chardet
# Get the encoding
enc = chardet.detect(mystring)['encoding']
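If all you need is a yes/no answer for data read from a file, encoding detection may be more than necessary. A stdlib-only sketch, assuming the file is small enough to read into memory at once; file_is_ascii is a hypothetical helper name. On Python 3.7+ you could call data.isascii() instead of the decode attempt.

```python
import os
import tempfile

def file_is_ascii(path: str) -> bool:
    """Return True if every byte in the file at `path` is below 0x80."""
    # Open in binary mode so nothing is decoded (or fails to decode) on read.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("ascii")  # raises UnicodeDecodeError on any byte >= 0x80
    except UnicodeDecodeError:
        return False
    return True

# Usage: write a small UTF-8 file containing non-ASCII text, then check it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("H€llø".encode("utf-8"))
try:
    print(file_is_ascii(tmp.name))  # False
finally:
    os.remove(tmp.name)
```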
A workaround to your problem would be to try to encode the string in a particular encoding.
For example:
'H€llø'.encode('ascii')
This raises the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 1: ordinal not in range(128)
Now you can catch the UnicodeEncodeError to determine that the string does not contain only ASCII characters.
try:
    'H€llø'.encode('ascii')
except UnicodeEncodeError:
    print('This string contains more than just ASCII characters.')