Python regex pattern to select/filter URLs

Question

links = [
    'http://www.npr.org/sections/thesalt/2017/03/10/519650091/falling-stars-negative-yelp-reviews-target-trump-restaurants-hotels',
    'https://ondemand.npr.org/anon.npr-mp3/npr/wesat/2017/03/20170311_wesat_south_korea_wrap.mp3?orgId=1&topicId=1125&d=195&p=7&story=519807707&t=progseg&e=519805215&seg=12&siteplayer=true&dl=1',
    'https://www.facebook.com/NPR',
    'https://www.twitter.com/NPR']

Objective: get the links that contain the /yyyy/mm/dd/ddddddddd/ format, e.g. /2017/03/10/519650091/

For some reason I just cannot get it right: the result always includes the facebook, twitter and 2017/03/20170311 format links.

sel_links = []
def selectedLinks(links):
    r = re.compile("^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")
    for link in links:
        if r.search(link)!="None":
            sel_links.append(link)
    return set(sel_links)
selectedLinks(links)

Next time please format your question properly, and make sure there are no typos in the posted code (I corrected one in links = ).
– janos, Commented Mar 12, 2017 at 8:03

janos · Accepted Answer · 2017-03-12 08:02:33Z


You have several problems here:

The pattern ^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$ is anchored with ^ and $, so it requires the whole string to be exactly /yyyy/mm/dd/ddddddddd. All your strings start with http, so it never matches.
The condition r.search(link)!="None" is always true, because re.search returns either None or a match object, and neither is equal to the string "None". That is why every link ends up in the result. Test the return value directly instead.
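
A quick check shows what each piece of the original code evaluates to. This is only an illustrative sketch using one link from the question; the variable names anchored and result are made up for the example.

```python
import re

link = 'https://www.facebook.com/NPR'  # one of the links from the question

# The anchored pattern from the question
anchored = re.compile("^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")
result = anchored.search(link)

print(result)              # None: the URL does not start with "/"
print(result != "None")    # True: None is not equal to the string "None"
print(result is not None)  # False: the test that was presumably intended
```

Since the second expression is True even when nothing matched, the if branch runs for every link.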

It seems you're looking for this:

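import re

def selectedLinks(links):
    # Unanchored pattern: matches /yyyy/mm/dd/ddddddddd anywhere in the URL
    r = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}")
    sel_links = []  # local list, so repeated calls do not accumulate results
    for link in links:
        if r.search(link):  # search returns None (falsy) when nothing matches
            sel_links.append(link)
    return set(sel_links)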
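
For completeness, here is a self-contained variant run against the links from the question. It is a sketch, not the only way to write it: the names DATE_ID and select_links are hypothetical, and it returns a list in input order rather than a set.

```python
import re

# Same unanchored pattern as in the answer; DATE_ID and select_links are
# hypothetical names introduced for this sketch.
DATE_ID = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}")

def select_links(links):
    """Return the links containing /yyyy/mm/dd/ddddddddd, in input order."""
    return [link for link in links if DATE_ID.search(link)]

links = [
    'http://www.npr.org/sections/thesalt/2017/03/10/519650091/falling-stars-negative-yelp-reviews-target-trump-restaurants-hotels',
    'https://ondemand.npr.org/anon.npr-mp3/npr/wesat/2017/03/20170311_wesat_south_korea_wrap.mp3?orgId=1&topicId=1125&d=195&p=7&story=519807707&t=progseg&e=519805215&seg=12&siteplayer=true&dl=1',
    'https://www.facebook.com/NPR',
    'https://www.twitter.com/NPR']

# Only the first link has a two-digit day followed by "/" and nine digits
print(select_links(links))
```

The mp3 link is rejected because 20170311 is eight digits where the pattern expects a two-digit day followed by a slash.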

