Python extract username from URL

Question

I'm scraping Reddit usernames using Python and I'm trying to extract the username from a URL. The URL looks like this:

https://www.reddit.com/user/ExampleUser

This is my code:

def extract_username(url):
    start = url.find('https://www.reddit.com/user/') + 28
    end = url.find('?', start)
    end2 = url.find("/", start)
    return url[start:end] and url[start:end2] and url[start:]

The first part works, but stripping the question mark and forward slash doesn't. Maybe I'm using the "and" keyword wrong? As a result, I sometimes get something like this:

ExampleUser/
ExampleUser/comments/
ExampleUser/submitted/
ExampleUser/gilded/
ExampleUser?sort=hot
ExampleUser?sort=new
ExampleUser?sort=top
ExampleUser?sort=controversial

I know I can use the API, but I'd like to learn how to do it without it. I've also heard about regular expressions, but aren't they pretty slow?

Avinash Raj · Accepted Answer · 2015-01-19 07:09:17Z

3

You could use the re module.

>>> s = "https://www.reddit.com/user/ExampleUser/comments/"
>>> import re
>>> re.search(r'https://www\.reddit\.com/user/([^/?]+)', s).group(1)
'ExampleUser'

[^/?]+ is a negated character class that matches one or more characters other than / or ?. The parentheses around it form a capturing group, which captures the matched characters. The captured text can then be retrieved with group(1), or referred to through a back-reference (like \1, which refers to group index 1).
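
To make the back-reference remark concrete, here is a small sketch (not part of the original answer) that uses \1 in re.sub to keep only the captured username. The trailing .* and the variable names are assumptions added for illustration, and the dots are escaped so they match literal dots.

```python
import re

s = "https://www.reddit.com/user/ExampleUser/comments/"

# Match the whole URL, capture the username, and replace the match with
# the captured group (\1). The trailing .* is an addition for this sketch
# so that everything after the username is consumed as well.
pattern = r'https://www\.reddit\.com/user/([^/?]+).*'
username = re.sub(pattern, r'\1', s)
print(username)  # ExampleUser
```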

The same pattern wrapped in a function:

>>> def extract_username(url):
...     return re.search(r'https://www\.reddit\.com/user/([^/?]+)', url).group(1)
... 
>>> extract_username('https://www.reddit.com/user/ExampleUser')
'ExampleUser'
>>> extract_username('https://www.reddit.com/user/ExampleUser/submitted/')
'ExampleUser'
>>> extract_username('https://www.reddit.com/user/ExampleUser?sort=controversial')
'ExampleUser'
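
One caveat: re.search returns None when the URL does not match, so calling .group(1) on the result raises AttributeError. A minimal sketch of a more defensive variant follows. The names USER_RE and extract_username_safe are hypothetical, introduced here for illustration only; precompiling the pattern is optional.

```python
import re

# Hypothetical module-level pattern; the dots are escaped so they match literally.
USER_RE = re.compile(r'https://www\.reddit\.com/user/([^/?]+)')

def extract_username_safe(url):
    """Return the username, or None if the URL is not a Reddit user URL."""
    match = USER_RE.search(url)
    return match.group(1) if match else None

print(extract_username_safe('https://www.reddit.com/user/ExampleUser?sort=top'))  # ExampleUser
print(extract_username_safe('https://www.reddit.com/r/python/'))                  # None
```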

edited Jan 19, 2015 at 7:09

answered Jan 19, 2015 at 7:03

John1024 · Accepted Answer · 2015-01-19 07:29:37Z

3

This drops the '?' and everything after it, then splits on '/' and takes the fifth element (index 4), which is the username:

>>> s = 'https://www.reddit.com/user/ExampleUser?sort=new'
>>> s.split('?')[0].split('/')[4]
'ExampleUser'

This also works on the other cases that you showed. For example:

>>> s = 'https://www.reddit.com/user/ExampleUser/comments/'
>>> s.split('?')[0].split('/')[4]
'ExampleUser'
>>> s = 'https://www.reddit.com/user/ExampleUser'
>>> s.split('?')[0].split('/')[4]
'ExampleUser'
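
The standard library can also separate the query string for you. The following is a sketch using urllib.parse.urlsplit (Python 3), not part of the original answer. The function name username_from_path is hypothetical, and the code assumes the path always has the form /user/<name>/...

```python
from urllib.parse import urlsplit

def username_from_path(url):
    # urlsplit separates the query string from the path, so no manual
    # split on '?' is needed.
    path = urlsplit(url).path      # e.g. '/user/ExampleUser/comments/'
    parts = path.split('/')        # ['', 'user', 'ExampleUser', 'comments', '']
    # Assumes the path starts with /user/<name>; index 2 is the username.
    return parts[2]

for url in ('https://www.reddit.com/user/ExampleUser',
            'https://www.reddit.com/user/ExampleUser/comments/',
            'https://www.reddit.com/user/ExampleUser?sort=new'):
    print(username_from_path(url))
```

Because the scheme and host are stripped first, the index no longer depends on how many slashes precede the path.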

edited Jan 19, 2015 at 7:29

answered Jan 19, 2015 at 7:15

clockwatcher · Accepted Answer · 2015-01-19 07:30:58Z

0

Just for kicks, here's an example using find. The idea is to cut at the earliest position where a delimiter is found, or at the end of the string if neither delimiter is found:

def extract_username(url):
    # Drop the fixed prefix; assumes the URL starts with it.
    username = url[len('https://www.reddit.com/user/'):]
    # find() returns -1 when a delimiter is absent, so keep only valid indices.
    end = min(i for i in (len(username),
                          username.find('/'),
                          username.find('?')) if i >= 0)
    return username[:end]

for url in ('https://www.reddit.com/user/ExampleUser',
            'https://www.reddit.com/user/ExampleUser/submitted/',
            'https://www.reddit.com/user/ExampleUser?sort=controversial'):
    print(extract_username(url))
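
As for why the attempt in the question fails: x and y and z evaluates to the first falsy operand, or to the last operand if all of them are truthy. The three slices are normally non-empty strings, so the expression just returns url[start:]. On top of that, find returns -1 when the delimiter is missing, and a slice ending at -1 silently drops the last character. A short demonstration with one of the URLs from the question (the variable names a, b and c are illustrative):

```python
url = 'https://www.reddit.com/user/ExampleUser?sort=hot'
start = len('https://www.reddit.com/user/')   # 28

end = url.find('?', start)    # index of '?'
end2 = url.find('/', start)   # -1: there is no '/' after the username

a = url[start:end]     # 'ExampleUser'
b = url[start:end2]    # 'ExampleUser?sort=ho' (slice to -1 drops the last char)
c = url[start:]        # 'ExampleUser?sort=hot'

# All three strings are non-empty (truthy), so `and` yields the last one.
print(a and b and c)   # ExampleUser?sort=hot
```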

answered Jan 19, 2015 at 7:30
