I have a bunch of URLs like the ones below:

https://data.hova.com/strap/nik/sql_output1574414532.89.zip

https://data.hova.com/strap/asr/sql_output1574414532.89.zip

https://data.hova.com/strap/olr/sql_output1574414532.89.zip

I want to extract just the zip file name from each URL, which is sql_output1574414532.89.zip for all three URLs above.

I could have used a simple split to get the filenames, but notice that the directory name before the zip file changes (nik, asr, olr, etc.).

So I want to use a regex that matches only the part that starts with sql and ends with zip.

This is what I tried:

import re

string = "https://data.hova.com/strap/nik/sql_output1574414532.89.zip"
pattern = r'^sql\.zip$'
match = re.search(pattern, string)
print(match)

But match comes back as None. What am I doing wrong?

1 Answer


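The pattern r'^sql\.zip$' matches only one string, "sql.zip": ^ and $ anchor the match to the start and end of the whole input, and the pattern allows nothing between sql and .zip.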

For your purpose you need something like sql.+zip$. If the string sql can also appear in the URL before the file name, change it to sql[^/]+zip$, so that the match cannot span a slash.
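
As a minimal sketch of the second pattern applied to the URLs from the question: the helper name extract_zip_name is hypothetical, and the escaped dot in \.zip is a small tightening that is not part of the patterns above (it requires a literal ".zip" rather than any string ending in "zip").

```python
import re

urls = [
    "https://data.hova.com/strap/nik/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/asr/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/olr/sql_output1574414532.89.zip",
]

# "sql", then one or more characters that are not "/", then a literal
# ".zip" at the very end of the string.
pattern = re.compile(r"sql[^/]+\.zip$")


def extract_zip_name(url):
    """Return the sql...zip file name at the end of url, or None (hypothetical helper)."""
    match = pattern.search(url)
    return match.group(0) if match else None


names = [extract_zip_name(u) for u in urls]
print(names)
```

Note that re.search is still the right call here: it scans for the pattern anywhere in the string, while the trailing $ pins the match to the end of the URL.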
