1

I am trying to slice a URL to extract its unique part and use that as my filename.

Here is the challenge:

I am trying to set the filename as:

somestring-01.pdf  
anotherstring-01.pdf  
nostring-01.pdf  

Since there is no way of knowing how many characters come after the last slash or sit between the last two slashes, I cannot hard-code slices such as [-5:-10].

To be able to solve this challenge, my pseudo-code to get the filename is as follows:

  • Find the index of the last slash [int_last_slash_index]
  • Find the index of the slash before it [int_prev_slash_index]
    Step 1: count no of slashes in string
    Step 2: subtract one from the count (count_slash-1)
    Step 3: find the (count_slash-1)th index position
  • Set slicing positions:
    Position 1: last slash position = len(url) - int_last_slash_index
    Position 2: previous slash position = len(url) - int_prev_slash_index
  • Slice the URL string with [-int_prev_slash_position:-int_last_slash_position]

In Python:

# Index of the last slash
last_slash_index = url_string.rfind("/")
int_last_slash_position = len(url_string) - last_slash_index
# Index of the slash before it: rfind searches only url_string[0:last_slash_index]
one_prev_slash_index = url_string.rfind("/", 0, last_slash_index)
int_one_prev_slash_position = len(url_string) - one_prev_slash_index
# Slice between the two slashes; the +1 skips the leading slash itself
filename = url_string[-int_one_prev_slash_position + 1:-int_last_slash_position]

If possible, I would like to solve this with string operations rather than regexes or code tricks, because I cannot handle those yet. I am happy to learn other methods and libraries, though.

As you might guess, I am new to Python and just trying to get the hang of strings.

Thank you.

PS: The opposite question was posted before for Java, but it received no responses.

4 Answers

6

You could use split with '/' as the separator. From the documentation:

Return a list of the words in the string, using sep as the delimiter string.

Code:

urls = ['http://www.someurl.com/folder-1/somestring/01.pdf',
'http://www.someurl.com/folders1531as12/anotherstring/183.pdf',
'http://www.someurl.com/folder-dsa990s/nostring/46798.pdf']

for url in urls:
    print('{}-{}'.format(*url.split('/')[-2:]))

Output

somestring-01.pdf
anotherstring-183.pdf
nostring-46798.pdf

Once the URL is split, you can take the last two elements of the list and combine them using the format method.
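
To see why the last two elements are the ones needed, it can help to look at the list that split returns before it is sliced. This is only a minimal sketch; build_filename is a hypothetical helper name used for illustration and does not appear in the code above.

```python
def build_filename(url):
    # Splitting on "/" gives, for the first sample URL:
    # ['http:', '', 'www.someurl.com', 'folder-1', 'somestring', '01.pdf']
    parts = url.split("/")
    # The last two items are the unique folder and the file name
    folder, name = parts[-2:]
    return "{}-{}".format(folder, name)


print(build_filename("http://www.someurl.com/folder-1/somestring/01.pdf"))
```

Because negative indices count from the end of the list, the same slice works no matter how many folders come before the last two segments.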


5

Use split:

urls = [
    "http://www.someurl.com/folder-1/somestring/01.pdf",
    "http://www.someurl.com/folders1531as12/anotherstring/183.pdf",
    "http://www.someurl.com/folder-dsa990s/nostring/46798.pdf",
]
for url in urls:
    print(url.split('/')[-2])

0

Try this:

import urllib3

url = r"http://www.someurl.com/folder-1/somestring/01.pdf"
print("-".join(urllib3.util.parse_url(url).path.split("/")[-2:]))

This would also work for more complicated URLs, such as those with a query string.

For example: http://www.someurl.com/folder-1/somestring/01.pdf?x=1
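
If installing urllib3 is not an option, the same idea can probably be sketched with the standard library instead. Note that this swaps in urllib.parse.urlsplit in place of urllib3's parse_url, and filename_from_url is a hypothetical helper name chosen for illustration.

```python
from urllib.parse import urlsplit


def filename_from_url(url):
    # urlsplit separates the path from the query string and fragment,
    # so "?x=1" never reaches the split below
    path = urlsplit(url).path
    # Join the last two path segments with a hyphen
    return "-".join(path.split("/")[-2:])


print(filename_from_url("http://www.someurl.com/folder-1/somestring/01.pdf?x=1"))
```

Splitting only the path keeps the query string out of the filename, which a plain split on the whole URL would not do.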

0

After days of scratching my bald head, I was enlightened by the rsplit method. Instead of the whole algorithm above, this did everything:

filename = url_string.rsplit("/")[-2]
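
As a side note, rsplit only behaves differently from split when a maxsplit argument is given. The sketch below assumes the first sample URL from the other answers and shows how maxsplit=2 could also recover the file name for the somestring-01.pdf format asked for in the question.

```python
url_string = "http://www.someurl.com/folder-1/somestring/01.pdf"

# Without maxsplit, rsplit returns the same list as split
assert url_string.rsplit("/") == url_string.split("/")

# With maxsplit=2, only the last two slashes are used as split points
head, folder, name = url_string.rsplit("/", 2)
filename = folder + "-" + name
print(filename)
```

Here head holds everything before the last two segments and is simply ignored.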

Apologies for taking everybody's time and efforts. And thanks very much for the comments.
