Check that a string contains only ASCII characters?

Question

How do I check that a string contains only ASCII characters in Python? Something like Ruby's ascii_only? method.

I want to be able to tell whether specific string data read from a file is ASCII.

wjandrea · Accepted Answer · 2022-02-09 22:27:20Z

36

Python 3.7 added methods that do what you want:

str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
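A minimal sketch of these methods, assuming Python 3.7 or newer. Since the question is about data read from a file, it also shows a hypothetical helper, file_is_ascii, that opens the file in binary mode and calls bytes.isascii() directly, so no decoding step can fail before the check. It reads the whole file into memory, which is an assumption that the file is reasonably small.

```python
import os
import tempfile

# Python 3.7+: str, bytes and bytearray all have an isascii() method
print('string'.isascii())              # True
print('строка'.isascii())              # False
print(b'bytes'.isascii())              # True
print(bytearray(b'\xff').isascii())    # False
print(''.isascii())                    # True: the empty string counts as ASCII


def file_is_ascii(path):
    """Hypothetical helper: True if every byte of the file at `path` is below 0x80."""
    # Read raw bytes, so no decoding step can fail before the check runs
    with open(path, 'rb') as f:
        return f.read().isascii()


# Demo: a temporary file stands in for "data read from file"
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write('text-строка'.encode('utf-8'))
print(file_is_ascii(tmp.name))         # False
os.remove(tmp.name)
```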

For older versions of Python:

>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
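The generator above iterates over a str. In Python 3, iterating over a bytes object yields integers rather than one-character strings, so for raw data read in binary mode the ord() call has to be dropped. A small sketch of both variants; the function names str_is_ascii and bytes_is_ascii are hypothetical:

```python
def str_is_ascii(text):
    # Each item of a str is a one-character string; ord() gives its code point
    return all(ord(char) < 128 for char in text)


def bytes_is_ascii(data):
    # Each item of a bytes/bytearray object is already an int in range(256)
    return all(byte < 128 for byte in data)


print(str_is_ascii('string'))                     # True
print(bytes_is_ascii('строка'.encode('utf-8')))   # False
```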

Another version (Python 2 only, since it relies on the unicode type):

>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
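The is_ascii function above is Python 2 code: the name unicode does not exist in Python 3, where text is str and raw data is bytes. A possible Python 3 port for versions before 3.7 (which lack isascii()), keeping the same encode/decode idea. The TypeError branch for other types is an addition of this sketch, not part of the original answer:

```python
def is_ascii(text):
    """Return True if `text` (str, bytes or bytearray) contains only ASCII."""
    if isinstance(text, str):
        try:
            text.encode('ascii')   # fails on any code point >= 128
        except UnicodeEncodeError:
            return False
    elif isinstance(text, (bytes, bytearray)):
        try:
            text.decode('ascii')   # fails on any byte >= 0x80
        except UnicodeDecodeError:
            return False
    else:
        raise TypeError('expected str, bytes or bytearray, got %s'
                        % type(text).__name__)
    return True


print(is_ascii('text'))                          # True
print(is_ascii('text-строка'))                   # False
print(is_ascii('text-строка'.encode('utf-8')))   # False
```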

edited Feb 9, 2022 at 22:27

wjandrea


answered Mar 9, 2016 at 11:28

warvariuc



5 Comments

Martin Tournoij Over a year ago

I think this will always create an entire list? It uses more memory, and it will be slower if the first character is a non-ASCII (>= 0x80) character, since it keeps iterating over the entire string (which doesn't matter much in most applications, but does in some).

warvariuc Over a year ago

@Carpetsmoker "I think this will always create an entire list?" No, it won't. The expression inside all() is a generator expression, which feeds characters one by one, and all() stops at the first false value.
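The short-circuiting described in this reply can be observed directly. This sketch wraps the check in a hypothetical generator, ascii_flags, that records every character it examines:

```python
def ascii_flags(text, seen):
    """Yield ord(c) < 128 for each character, recording each one that is checked."""
    for char in text:
        seen.append(char)
        yield ord(char) < 128


seen = []
# all() stops at the first False, so only the leading non-ASCII character is examined
print(all(ascii_flags('строка', seen)))   # False
print(len(seen))                           # 1
```

By contrast, a list comprehension such as all([ord(c) < 128 for c in text]) builds the whole list before all() sees any element.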

JavaSa Over a year ago

Which is faster and has lower time complexity? Or are both the same?

warvariuc Over a year ago

@JavaSa, the time complexity should be the same (linear in the length of the string). As for which one is faster, you should measure. I suspect the encode/decode version is faster for bigger strings, since it's implemented in C.
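One way to do that measurement is with the standard timeit module. This is a sketch only: the function names, the 100,000-character all-ASCII sample (the worst case, since every character must be checked) and the repetition count are arbitrary choices, and the timings depend on the machine and Python version, so no numbers are given here.

```python
import timeit


def via_generator(text):
    # Pure-Python loop over the characters
    return all(ord(char) < 128 for char in text)


def via_encode(text):
    # The codec does the scanning in C
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


sample = 'a' * 100_000

for func in (via_generator, via_encode):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print('%s: %.4f s' % (func.__name__, seconds))
```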

Johnny Utahh Over a year ago

Is there any way to leverage mypy (mypy-lang.org) to statically type-check type-hinted string literals as byte types, to support this effort at mypy-check time instead of relying only on run-time methods? (As I understand it, run-time checks are what this answer uses; please correct me if I misunderstand.)

Quinn · Accepted Answer · 2016-03-09 15:30:26Z

6

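You can also use a regex to check for ASCII-only characters. The character class [\x00-\x7F] matches a single ASCII character; anchored with ^ and $ and repeated with *, it matches only strings made entirely of ASCII characters (using * rather than + means the empty string also counts as ASCII, as it does for str.isascii()):

>>> import re
>>> def only_ascii(s):
...     return re.match(r'^[\x00-\x7F]*$', s) is not None
...
>>> only_ascii('string')
True
>>> only_ascii('Tannhäuser')
False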
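The same character class also works on raw bytes if the pattern is a bytes literal, which avoids decoding file data first. A sketch using re.fullmatch() (Python 3.4+) in place of the ^...$ anchors; ASCII_BYTES and only_ascii_bytes are hypothetical names:

```python
import re

# A bytes pattern can only be matched against bytes-like objects
ASCII_BYTES = re.compile(rb'[\x00-\x7F]*')


def only_ascii_bytes(data):
    # fullmatch() requires the whole input to match, so no anchors are needed
    return ASCII_BYTES.fullmatch(data) is not None


print(only_ascii_bytes(b'string'))                      # True
print(only_ascii_bytes('Tannhäuser'.encode('utf-8')))   # False
```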

answered Mar 9, 2016 at 15:30

Quinn



Quentin Pradet · Accepted Answer · 2017-09-28 11:29:02Z

6

If you have Unicode strings, you can call the encode() method and catch the exception:

try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")

If you have bytes, you can use the third-party chardet module to guess the encoding:

import chardet

# Get the encoding
enc = chardet.detect(mystring)['encoding']

edited Sep 28, 2017 at 11:29

Quentin Pradet


answered Mar 9, 2016 at 11:35

rotten


1 Comment

Martin Tournoij Over a year ago

You should catch the UnicodeDecodeError that you're expecting, not the base Exception class. Consider what would happen if for whatever reason chardet.detect didn't return an encoding key, or if mystring were a list or an int.

Girish Jadhav · Accepted Answer · 2016-03-09 11:45:11Z

0

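A workaround in Python 2 is to try to encode the string: calling .encode() on a byte string first decodes it implicitly with the ASCII codec, which fails if the string contains any non-ASCII bytes.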

For example:

'H€llø'.encode('utf-8')

In Python 2, this raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

You can then catch UnicodeDecodeError to detect that the string contains non-ASCII characters:

try:
    'H€llø'.encode('utf-8')
except UnicodeDecodeError:
    print('This string contains more than just the ASCII characters.')
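That behavior is specific to Python 2, where 'H€llø' is a byte string. In Python 3 it is a str, and encoding it as UTF-8 succeeds, so no exception is raised; the check has to request the ASCII codec and catch UnicodeEncodeError instead. A minimal Python 3 sketch; contains_only_ascii is a hypothetical name:

```python
def contains_only_ascii(text):
    # Encoding to ASCII raises UnicodeEncodeError on the first code point >= 128
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


print(len('H€llø'.encode('utf-8')))   # 8: UTF-8 encoding succeeds in Python 3
print(contains_only_ascii('H€llø'))   # False
print(contains_only_ascii('Hello'))   # True
```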

answered Mar 9, 2016 at 11:45

Girish Jadhav

