How to extract certain pattern from a url using regex in Python?

Question

I have a bunch of URLs like the ones below:

https://data.hova.com/strap/nik/sql_output1574414532.89.zip

https://data.hova.com/strap/asr/sql_output1574414532.89.zip

https://data.hova.com/strap/olr/sql_output1574414532.89.zip

Now I want to extract just the zip file name, i.e. sql_output1574414532.89.zip, sql_output1574414532.89.zip and sql_output1574414532.89.zip respectively.

I could use a simple split to get the file names, but if you look closely, the directory name before the zip file changes (nik, asr, olr, etc.).

So I want to use a regex that matches only text that starts with sql and ends with zip.

This is what I tried:

import re

string = "https://data.hova.com/strap/nik/sql_output1574414532.89.zip"
pattern = r'^sql\.zip$'
match = re.search(pattern, string)
print(match)

But match comes out as None. What am I doing wrong?

Budagov Blues · Accepted Answer · 2019-11-22 11:30:31Z

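The pattern r'^sql\.zip$' matches only one string: "sql.zip". The ^ anchors the match to the start of the string and the $ to the end, and nothing is allowed between sql and \.zip, so a full URL can never match it.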
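
A quick check of this, as a minimal sketch using only the standard re module: the question's pattern finds nothing in the URL but does match the bare string sql.zip.

```python
import re

# The pattern from the question: ^ anchors at the start of the string,
# $ at the end, and nothing is allowed between "sql" and ".zip".
pattern = r'^sql\.zip$'

url = "https://data.hova.com/strap/nik/sql_output1574414532.89.zip"

print(re.search(pattern, url))               # None: the URL does not start with "sql"
print(re.search(pattern, "sql.zip"))         # a Match object for the literal "sql.zip"
print(re.search(pattern, "sql_output.zip"))  # None: extra characters are not allowed
```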

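For your purpose you need something like sql.+zip$. If the string sql can also appear earlier in the URL, before the file name (for example in a directory name), use sql[^/]+zip$ instead: [^/] excludes slashes, so the match cannot span a directory boundary. Call match.group() on the result to get the matched file name rather than the Match object.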
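
A sketch of both suggested patterns applied to the URLs from the question. The helper name extract_zip_name and the URL with a sqlstore directory are hypothetical, added only to show why [^/] matters when "sql" appears earlier in the path.

```python
import re

urls = [
    "https://data.hova.com/strap/nik/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/asr/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/olr/sql_output1574414532.89.zip",
]

# "sql", then one or more characters of any kind, then "zip" at the end.
LOOSE = re.compile(r'sql.+zip$')
# Same idea, but [^/] forbids slashes, so the match stays inside the file name.
STRICT = re.compile(r'sql[^/]+zip$')

def extract_zip_name(url):
    """Hypothetical helper: return the trailing 'sql...zip' name, or None."""
    match = STRICT.search(url)
    # match.group() is the matched text; printing `match` shows the Match object.
    return match.group() if match else None

for url in urls:
    print(extract_zip_name(url))

# Hypothetical URL where "sql" also appears in a directory name.
tricky = "https://data.hova.com/sqlstore/nik/sql_output1574414532.89.zip"
print(LOOSE.search(tricky).group())  # starts at "sqlstore", spans the slashes
print(extract_zip_name(tricky))      # only the file name
```

Because re.search returns the leftmost match, the loose pattern starts at the first "sql" it sees; the [^/] version fails there and moves on to the file name.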
