Get string between the last two slash positions in a URL in Python

Question

I am trying to slice a URL to get its unique part and use it as my filename.

Here is the challenge:

I am trying to set the filename as:

somestring-01.pdf  
anotherstring-01.pdf  
nostring-01.pdf

Since there is no way of knowing how many characters come after the last slash or between the last two slashes, I cannot hard-code slice positions such as [-10:-5].

To solve this, my pseudo-code to get the filename is as follows:

Find the index of the last slash [int_last_slash_index]
Find the index of the previous slash [int_prev_slash_index]
  Step 1: count the number of slashes in the string
  Step 2: subtract one from the count (count_slash - 1)
  Step 3: find the index of the (count_slash - 1)th slash
Set the slicing positions:
  Position 1: last slash position = len(url) - int_last_slash_index
  Position 2: previous slash position = len(url) - int_prev_slash_index
Slice the URL string with [-int_prev_slash_position + 1:-int_last_slash_position]

In Python:

url_string = "http://www.someurl.com/folder-1/somestring/01.pdf"  # example URL

# Distance of the last slash from the end of the string
last_slash_index = url_string.rfind("/")
int_last_slash_position = len(url_string) - last_slash_index

# Walk forward to the (slash_count - 1)th slash, i.e. the previous one
slash_count = url_string.count("/")
index_one_prev_slash = -1
for _ in range(slash_count - 1):
    index_one_prev_slash = url_string.find("/", index_one_prev_slash + 1)
int_one_prev_slash_position = len(url_string) - index_one_prev_slash

# Slice between the two slashes (the +1 skips the leading slash)
filename = url_string[-int_one_prev_slash_position + 1:-int_last_slash_position]

If possible, I would like to solve this with string operations rather than regexes or code tricks, because I cannot handle those yet. I am happy to learn other methods and libraries, though.

As you might guess, I am new to Python and just trying to get the hang of strings.

Thank you.

PS: The opposite question was posted before, but for Java, and it got no responses: 1

Dani Mesejo · Accepted Answer · 2018-09-29 18:49:56Z

6

You could use split with '/' as the separator. From the documentation:

Return a list of the words in the string, using sep as the delimiter string.

Code:

urls = ['http://www.someurl.com/folder-1/somestring/01.pdf',
        'http://www.someurl.com/folders1531as12/anotherstring/183.pdf',
        'http://www.someurl.com/folder-dsa990s/nostring/46798.pdf']

for url in urls:
    print('{}-{}'.format(*url.split('/')[-2:]))

Output

somestring-01.pdf
anotherstring-183.pdf
nostring-46798.pdf

Once the URL is split, you can take the last two elements of the list and combine them using the format function.
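As a minimal sketch, the same idea could be wrapped in a small helper. The name filename_from_url is hypothetical (it is not part of the code above), and the sketch assumes the URL has at least two path segments, no query string and no trailing slash.

```python
def filename_from_url(url):
    """Join the last two '/'-separated parts of a URL with a hyphen.

    Hypothetical helper for illustration only; assumes at least two
    path segments, no query string and no trailing slash.
    """
    # split('/') returns every piece between slashes; [-2:] keeps the last two
    folder, name = url.split('/')[-2:]
    return '{}-{}'.format(folder, name)


print(filename_from_url('http://www.someurl.com/folder-1/somestring/01.pdf'))
# prints: somestring-01.pdf
```

Unpacking into two names makes it explicit which part is the folder and which is the file, at the cost of raising a ValueError if the string contains no slash at all.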

edited Sep 29, 2018 at 18:49

answered Sep 29, 2018 at 18:44

Daniel · Accepted Answer · 2018-09-29 18:44:46Z

5

Use split:

urls = [
    "http://www.someurl.com/folder-1/somestring/01.pdf",
    "http://www.someurl.com/folders1531as12/anotherstring/183.pdf",
    "http://www.someurl.com/folder-dsa990s/nostring/46798.pdf",
]
for url in urls:
    print(url.split('/')[-2])
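
To see why index -2 is the one to pick, it may help to look at the list that split returns. A short sketch using the first URL from the list above:

```python
url = "http://www.someurl.com/folder-1/somestring/01.pdf"

parts = url.split('/')
print(parts)
# ['http:', '', 'www.someurl.com', 'folder-1', 'somestring', '01.pdf']

# parts[-1] is the text after the last slash,
# parts[-2] is the text between the last two slashes.
between = parts[-2]
print(between)  # somestring
```

The empty string in the list comes from the two adjacent slashes in "http://".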

SocketPlayer · Accepted Answer · 2018-09-29 18:46:55Z

0

Try this:

import urllib3

url = r"http://www.someurl.com/folder-1/somestring/01.pdf"
print("-".join(urllib3.util.parse_url(url).path.split("/")[-2:]))

This would also work for more complicated URLs, such as one with a query string. For example:

http://www.someurl.com/folder-1/somestring/01.pdf?x=1
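
If installing urllib3 is not an option, a similar result can probably be had with the standard library's urllib.parse.urlsplit, which also separates the path from the query string. This is a sketch of a swapped-in alternative, not the code from this answer, and the variable names are only illustrative.

```python
from urllib.parse import urlsplit

url = "http://www.someurl.com/folder-1/somestring/01.pdf?x=1"

# urlsplit(url).path is '/folder-1/somestring/01.pdf'; the '?x=1' is left out
path = urlsplit(url).path

# Join the last two path segments with a hyphen
filename = "-".join(path.split("/")[-2:])
print(filename)  # somestring-01.pdf
```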

tbalci · Accepted Answer · 2018-09-29 18:51:54Z

0

After days of scratching my bald head, I discovered the rsplit method. Instead of the whole algorithm above, this did everything:

filename = url_string.rsplit("/")[-2]
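
As a side note, rsplit called without a maxsplit argument splits on every slash, so here it behaves the same as split. Passing a maxsplit of 2 limits the work to the last two slashes. A minimal sketch, with an example URL assumed for illustration:

```python
url_string = "http://www.someurl.com/folder-1/somestring/01.pdf"

# Split at most twice, starting from the right-hand end
parts = url_string.rsplit("/", 2)
print(parts)
# ['http://www.someurl.com/folder-1', 'somestring', '01.pdf']

filename = parts[-2]
print(filename)  # somestring
```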

Apologies for taking up everybody's time and effort, and thanks very much for the comments.
