To determine whether a user-agent string belongs to Safari, one must check that it contains "Safari" but not "Chrome". I am also assuming that the check needs to be case-insensitive.

I am trying to do this using regular expressions in Python without subsequently needing to traverse groups to match strings.

One way to solve this is:

import re

r1 = re.compile ("Safari", re.I)
r2 = re.compile ("Chrome", re.I)

if len(r1.findall (userAgentString)) > 0 and len(r2.findall(userAgentString)) <= 0:
    print "Found Safari"

I have also tried using:

r = re.compile ("(?P<s>Safari)|(?P<c>Chrome)", re.I)
m = r.search (userAgentString)
if (m.group('s') and not m.group('c')):
    print "Found Safari"

This does not work because search will stop after finding the first instance of one of 'Chrome' or 'Safari' (probably obvious to the Regex-Gurus..).
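
To make the failure concrete, here is a minimal sketch (written in Python 3 syntax) of what `search` returns compared with `finditer`. The string `ua` is made up for illustration and is not a real user agent; it simply places "Safari" before "Chrome".

```python
import re

r = re.compile("(?P<s>Safari)|(?P<c>Chrome)", re.I)

# Hypothetical string for illustration only: "Safari" precedes "Chrome".
ua = "Safari/537.36 Chrome/32.0"

m = r.search(ua)
# search() returns the first match only, so just one named group is filled in;
# the group for the other alternative is None.
first_s = m.group('s')
first_c = m.group('c')
print(first_s, first_c)

# finditer() yields every match, so both tokens are observed.
seen = {name
        for match in r.finditer(ua)
        for name, value in match.groupdict().items()
        if value}
print(sorted(seen))
```

With the sample user agents further down, "Chrome" happens to appear before "Safari", so the `search` version would give the right answer there, but only because of that ordering. It would also raise an AttributeError on a string containing neither token, since `search` returns None in that case.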

I can get it to work, slightly more efficiently, using the re.finditer() function as follows:

r = re.compile ("(?P<s>Safari)|(?P<c>Chrome)", re.I)
safari = chrome = False
for i in r.finditer (userAgentString):
    if i.group('s'):
        safari = True
    if i.group('c'):
        chrome = True
if safari and not chrome:
    print "Found Safari"

Is there a more efficient way to do this? (Please note that I am looking for efficiency, not convenience.) Thanks.

Sample user agents:

Safari: "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"

Chrome: "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36"

For what it's worth, I timed it, and jwodder is on the mark about the efficiency of a simple 'lower()' and an 'in'. It came out about 10 times faster than a precompiled regex, unless I did something wrong in the setup or the timeit call.

import timeit
setup = '''
import re
r = re.compile ('(?P<m>MSIE)|(?P<c>Chrome)|(?P<s>Safari)', re.I)
def strictBrowser (userAgentString):
    c=s=m=False
    for f in r.finditer(userAgentString):
        if f.group('m'):
            m = True
        if f.group('c'):
            c = True
        if f.group('s'):
            s = True
    # MSIE or (Safari but not Chrome)
    # all Chrome user agents will have Safari in them
    return m or (s and not c)
'''
print timeit.timeit(
    'strictBrowser ("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.2")',
    setup=setup, number=100000
    )
setup = '''
def strictBrowser (userAgentString):
    userAgentString = userAgentString.lower()
    if (
        'msie' in userAgentString or
        ('safari' in userAgentString and 'chrome' not in userAgentString)
        ):
        return True
    return False
'''
print timeit.timeit(
    'strictBrowser ("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.2")',
    setup=setup, number=100000
    )

Output:
0.0778814506637
0.00664118263765
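
For reference, the two figures above give a ratio of roughly 11.7 (0.0778814506637 / 0.00664118263765). The following is a sketch of the same comparison in Python 3 syntax, passing callables to `timeit.timeit` instead of setup strings. The names `strict_regex`, `strict_lower`, `UA` and `PATTERN` are made up here, and absolute timings will differ by machine and Python version.

```python
import re
import timeit

# Same Safari user agent string that is timed above.
UA = ("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 "
      "(KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.2")

# Compiled once, outside the timed function.
PATTERN = re.compile("(?P<m>MSIE)|(?P<c>Chrome)|(?P<s>Safari)", re.I)

def strict_regex(user_agent):
    """finditer version: MSIE, or Safari without Chrome."""
    m = c = s = False
    for f in PATTERN.finditer(user_agent):
        if f.group('m'):
            m = True
        if f.group('c'):
            c = True
        if f.group('s'):
            s = True
    return m or (s and not c)

def strict_lower(user_agent):
    """lower() plus 'in' version of the same check."""
    ua = user_agent.lower()
    return 'msie' in ua or ('safari' in ua and 'chrome' not in ua)

regex_time = timeit.timeit(lambda: strict_regex(UA), number=100000)
lower_time = timeit.timeit(lambda: strict_lower(UA), number=100000)
print(regex_time, lower_time, regex_time / lower_time)
```

The lambda wrappers add a small, equal call overhead to both measurements, so the ratio may come out somewhat lower than with setup strings.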
  • Can you include sample User-Agent strings, please? Commented Nov 10, 2013 at 1:18

1 Answer

Since you're testing whether certain fixed strings appear in a given string, it's probably easiest and most efficient to forgo regexes entirely:

if 'safari' in userAgentString.lower() and 'chrome' not in userAgentString.lower():
    print "Found Safari"
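
A minimal sketch of the same test with the lowercased string kept in a variable, so that `lower()` runs only once per user agent. The helper name `is_safari` is made up for illustration, and `print()` is written in Python 3 form; the two strings are the sample user agents from the question.

```python
def is_safari(user_agent):
    """Return True if the string mentions Safari but not Chrome (case-insensitive)."""
    ua = user_agent.lower()  # lowercase once, reuse for both membership tests
    return 'safari' in ua and 'chrome' not in ua

safari_ua = ("Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 "
             "(KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25")
chrome_ua = ("Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36")

if is_safari(safari_ua):
    print("Found Safari")
if not is_safari(chrome_ua):
    print("Chrome is not reported as Safari")
```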

5 Comments

I am wondering what the cost of a lower() on userAgentString is. In your case you are calling it twice. I am also wondering which is faster: a lower() on the entire user-agent string followed by two 'in' checks, or a regex. (Please note that I could compile the regex once at startup and call only the search() function each time on the user-agent strings. Sorry if I am missing something.)
@user1055761: Run some tests (say, with timeit) and find out.
The .lower() method is pretty inexpensive, but it is easy to use a variable to avoid evaluating it twice, and that is what I would recommend.
Why are you so worried about efficiency? There's not going to be a significant difference in performance whichever approach you take.
I ran tests with timeit and jwodder is on the mark! I have added the results to the main question.
