I am working to slice an HTML address to get the unique part and use it as my filename.
Here is the challenge:
- http://www.someurl.com/folder-1/somestring/01.pdf
- http://www.someurl.com/folders1531as12/anotherstring/183.pdf
- http://www.someurl.com/folder-dsa990s/nostring/46798.pdf
I am trying to set the filename as:
somestring-01.pdf anotherstring-01.pdf nostring-01.pdf
Since there is no way of knowing how many characters after the last slash and between the last two slashes, I am not able to hard-define splits such as [-5:-10].
To be able to solve this challenge, my pseudo-code to get the filename is as follows:
- Find the index of the last string [int_last_slash_index]
- Find the index of the one previous string [int_prev_slash_index]
Step 1: count no of slashes in string
Step 2: subtract one from the count (count_slash-1)
Step 3: find the (count_slash-1)th index position - Set slicing positions:
Position 1: last slash position = len(url) - int_last_slash_index
Position_2: previous slash position = len(url) - int_prev_slash_index - Slice the URL string with [-int_prev_slash_position:-int_last_slash_position]
In Python:
last_slash_index = url_string.rfind("/")
int_last_slash_index = int(last_slash_index)
int_last_slash_position = len(url_string) - int(last_slash_index)
slash_count = url_string.count("/")
one_prev_slash = slash_count -1
index_one_prev_slash = url_string.find("/",one_prev_slash)
int_one_prev_slash_index = int(index_one_prev_slash)
int_one_prev_slash_position = len(url_string) -
int(int_one_prev_slash_index)
filename = url_string[-int_last_slash_position:-int_one_prev_slash_position]
If there is such a way, I want to solve it with string operators, rather than diving into regexes, code tricks because I cannot handle them now. I am OK to learn further methods, libraries though.
As you would guess, I am new in Python and just trying to get a hold of strings.
Thank you.
PS: Just the opposite was posted before but for Java, no responses: 1