16

How do I check that a string only contains ASCII characters in Python? Something like Ruby's ascii_only?

I want to be able to tell whether string data read from a file is ASCII.

4 Answers

36

Python 3.7 added methods that do what you want:

str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
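As a quick sketch of how that looks in practice, assuming Python 3.7 or later: the helper name ascii_only below is purely illustrative (borrowed from the Ruby method in the question), not something from the standard library. It only forwards to the built-in isascii() method.

```python
def ascii_only(data):
    """Return True if a str, bytes, or bytearray holds only ASCII.

    ascii_only is a hypothetical helper name; it just forwards to the
    built-in isascii() method added in Python 3.7.
    """
    return data.isascii()


print(ascii_only("string"))                   # plain ASCII str
print(ascii_only("строка"))                   # non-ASCII str
print(ascii_only(b"raw bytes"))               # ASCII bytes
print(ascii_only("строка".encode("utf-8")))   # UTF-8 encoded bytes
print(ascii_only(""))                         # empty string
```

Note that isascii() returns True for an empty string, so add your own check if empty input should be rejected.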


On earlier versions, you can check each character's code point:

>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False

Another version (written for Python 2, where str is a byte string and unicode is a separate type):

>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False

5 Comments

I think this will always create an entire list? It uses more memory, and will be slower if the first character is a >0x80 character since it keeps iterating over the entire string (which doesn't matter too much in most applications, but does in some).
@Carpetsmoker "I think this will always create an entire list?" No, it won't. The expression inside all() is a generator expression, which feeds characters one by one, and all() stops at the first False.
Which is faster and has lower time complexity? Or are both the same?
@JavaSa, the time complexity should be the same. As for which one is faster, you should measure. I suspect that for bigger strings the encode/decode version is faster, since it's implemented in C.
Is there any way to leverage mypy (mypy-lang.org) to statically check type-hinted string literals against byte types to support this effort (at mypy-check time), instead of relying only on run-time methods (which I understand is what's happening in this answer; please correct me if I misunderstand)?
6

You can also use a regex to check for ASCII-only strings. [\x00-\x7F] matches a single ASCII character:

>>> import re
>>> OnlyAscii = lambda s: re.match('^[\x00-\x7F]+$', s) is not None
>>> OnlyAscii('string')
True
>>> OnlyAscii('Tannhäuser')
False
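If you call this often, a precompiled pattern with fullmatch() may read a bit cleaner. This is only a sketch: only_ascii and _ASCII_RE are hypothetical names, and it uses * instead of +, so (unlike the lambda above) it treats the empty string as ASCII. Swap + back in if you want empty input rejected.

```python
import re

# Compiled once; [\x00-\x7F] is the same ASCII range as above.
_ASCII_RE = re.compile(r"[\x00-\x7F]*")


def only_ascii(s):
    """Return True if every character of s is in the ASCII range.

    Hypothetical helper. fullmatch() requires the whole string to
    match, so no ^ or $ anchors are needed.
    """
    return _ASCII_RE.fullmatch(s) is not None


print(only_ascii("string"))
print(only_ascii("Tannhäuser"))
```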

6

If you have a unicode string (str in Python 3), you can call its encode() method and catch the exception:

try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")

If you have bytes, you can import the chardet module and check the encoding:

import chardet

# Get the encoding
enc = chardet.detect(mystring)['encoding']

1 Comment

You should catch the UnicodeDecodeError that you're expecting, not the base Exception class. Consider what would happen if, for whatever reason, the result of chardet.detect didn't have an encoding key, or if mystring were a list or an int.
0

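A workaround would be to try to encode the string and see whether that fails. Note that this relies on Python 2 behavior: calling encode() on a byte str first decodes it implicitly as ASCII, and that implicit step is what raises.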

For example:

'H€llø'.encode('utf-8')

This will throw the following error:

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Now you can catch UnicodeDecodeError to detect that the string contains more than just ASCII characters.

try:
    'H€llø'.encode('utf-8')
except UnicodeDecodeError:
    print 'This string contains more than just the ASCII characters.'
