links = [
    'http://www.npr.org/sections/thesalt/2017/03/10/519650091/falling-stars-negative-yelp-reviews-target-trump-restaurants-hotels',
    'https://ondemand.npr.org/anon.npr-mp3/npr/wesat/2017/03/20170311_wesat_south_korea_wrap.mp3?orgId=1&topicId=1125&d=195&p=7&story=519807707&t=progseg&e=519805215&seg=12&siteplayer=true&dl=1',
    'https://www.facebook.com/NPR',
    'https://www.twitter.com/NPR']

Objective: select the links that contain the /yyyy/mm/dd/ddddddddd/ format, e.g. /2017/03/10/519650091/

For some reason I cannot get it right: the result always includes the Facebook and Twitter links, as well as the link with the 2017/03/20170311 format.

sel_links = []
def selectedLinks(links):
    r = re.compile("^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")
    for link in links:
        if r.search(link)!="None":
            sel_links.append(link)
    return set(sel_links)
selectedLinks(links)
  • Next time, please format your question properly and make sure there are no typos in the posted code (I corrected one in links = ). Commented Mar 12, 2017 at 8:03

1 Answer


You have several problems here:

  1. The pattern ^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$ requires the string to start with /[0-9]{4}/, but all your strings start with http.
  2. The condition r.search(link)!="None" is always true, because re.search returns either None or a match object, and neither is equal to the string "None". Every link therefore gets appended, whether or not the pattern matched.
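Both points can be checked in isolation. The snippet below is only a small sketch: it reuses the anchored pattern from the question, and `url` is a shortened sample string chosen for illustration, not one of the original links.

```python
import re

# Anchored pattern from the question: the whole string must be the date/id path.
anchored = re.compile(r"^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")

# Shortened sample URL (an assumption for illustration).
url = "http://www.npr.org/sections/thesalt/2017/03/10/519650091/falling-stars"

result = anchored.search(url)

# The URL starts with "http", so "^/" cannot match and search() returns None.
no_match = result is None

# None is not equal to the string "None", so this comparison is True
# even though nothing matched.
compared_to_string = result != "None"

print(no_match, compared_to_string)
```

Testing the result directly (`if result:`) or with `result is not None` avoids the problem, since a match object is truthy and None is falsy.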

It seems you're looking for this:

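import re

def selectedLinks(links):
    # Unanchored pattern: matches /yyyy/mm/dd/ddddddddd anywhere in the URL
    r = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}")
    sel_links = []  # local list, so repeated calls do not accumulate results
    for link in links:
        if r.search(link):  # a match object is truthy, None is falsy
            sel_links.append(link)
    return set(sel_links)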
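As a quick check against the four links from the question, here is a self-contained sketch of the same idea. The names `DATE_ID` and `select_links` are hypothetical, and the trailing slash in the pattern is an assumption taken from the /2017/03/10/519650091/ example in the question; drop it if links may end right after the nine digits.

```python
import re

links = [
    'http://www.npr.org/sections/thesalt/2017/03/10/519650091/falling-stars-negative-yelp-reviews-target-trump-restaurants-hotels',
    'https://ondemand.npr.org/anon.npr-mp3/npr/wesat/2017/03/20170311_wesat_south_korea_wrap.mp3?orgId=1&topicId=1125&d=195&p=7&story=519807707&t=progseg&e=519805215&seg=12&siteplayer=true&dl=1',
    'https://www.facebook.com/NPR',
    'https://www.twitter.com/NPR',
]

# /yyyy/mm/dd/ followed by exactly nine digits and a closing slash.
DATE_ID = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}/")

def select_links(links):
    """Return the set of links whose path contains /yyyy/mm/dd/ddddddddd/."""
    return {link for link in links if DATE_ID.search(link)}

selected = select_links(links)
print(selected)
```

Only the first link is selected. The mp3 link is rejected because "/2017/03/" is followed by "20170311_", so there is no slash after the two-digit day.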