
I am currently working on a project that involves splitting a URL. I have used the urlparse module to break up the URL, so now I am working with just the path segment.

The problem is that when I try to split() the string based on the delimiter "/" to separate the directories, I end up with empty strings in my list.

For example, when I do the following:

import urlparse
url = "http://example/url/being/used/to/show/problem"
parsed = urlparse.urlparse(url)
path = parsed[2]  # this is the path element

pathlist = path.split("/")

I get the list:

['', 'url', 'being', 'used', 'to', 'show', 'problem']

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?


5 Answers

5

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?

What? There's only one empty string and it's always first, by definition.

pathlist = path.split("/")[1:] 

This is pretty common.


A trailing slash can mean an "empty" filename, in which case a default name may be implied (index.html, for example).

It may be meaningful.

"http://example/url/being/used/to/show/problem"

The filename is "problem"

"http://example/url/being/used/to/show/problem/"

The directory is "problem" and a default filename is implied by the empty string.
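A minimal sketch of that distinction, assuming Python 3 (where the urlparse module lives at urllib.parse). The helper name split_path is hypothetical; it drops only the leading empty string and keeps a trailing one, so the two URLs stay distinguishable.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def split_path(url):
    """Split the URL's path on "/", dropping only the leading empty string."""
    path = urlparse(url).path
    return path.split("/")[1:]


file_parts = split_path("http://example/url/being/used/to/show/problem")
dir_parts = split_path("http://example/url/being/used/to/show/problem/")

print(file_parts)  # last element is "problem"
print(dir_parts)   # last element is "", marking the trailing slash
```

Whether the trailing empty string should be kept depends on whether the caller cares about the file/directory distinction.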


5 Comments

If the url has a slash at the end, there is another empty string.
Then maybe use a list comp? path_list = [(p) for p in path.split('/') if len(p)]
@craigs: It's not arbitrary. It's the first position only. The last position may be meaningful. Simply suppressing path elements is wrong.
@S.Lott: I completely agree with your original response and do understand the significance of trailing slashes for most web servers, but I was responding to @chindes's later response, which indicated that for their particular situation they wanted to suppress all empty strings in the split. So, would the only safe way to decide whether or not to suppress the trailing '/' be to actually issue a HEAD request and check for a redirect? p.s. 'I almost wet myself' when I got a response from S.Lott.
@craigs: "they wanted to suppress all empty strings in the split" is a Really Bad Idea. It's an Attractive Nuisance.
3

I am not familiar with urllib and its output for path, but I think one way to form the new list is a list comprehension, like this:

[x for x in path.split("/") if x]

Or something like this, if there is only a leading '/':

path.lstrip('/').split("/")

Or, if there may be a trailing '/' too:

path.strip('/').split("/")

And if the string in path always starts with a single '/', then the easiest way is:

path[1:].split('/')
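To compare these variants, here is a small sketch that runs each one on a few sample paths. The sample paths and the variants dictionary are made up for illustration.

```python
# Hypothetical sample paths: plain, trailing slash, doubled slash, root only.
samples = ["/url/being", "/url/being/", "/url//being", "/"]

variants = {
    "comprehension": lambda p: [x for x in p.split("/") if x],
    "lstrip": lambda p: p.lstrip("/").split("/"),
    "strip": lambda p: p.strip("/").split("/"),
    "slice": lambda p: p[1:].split("/"),
}

# Apply every variant to every sample path.
results = {
    name: [split(p) for p in samples] for name, split in variants.items()
}

for name, outputs in results.items():
    print(name, outputs)
```

Only the comprehension removes empty strings in the middle of the path and returns an empty list for the root path "/". The strip-based and slice variants return [''] for "/", because splitting an empty string still yields one element.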


2
pathlist = path.strip('/').split("/")


1

Remove the empty items?

pathlist.remove('')
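One caveat worth illustrating: list.remove deletes only the first matching element and raises ValueError when there is none. A short sketch, using a made-up path with a trailing slash:

```python
path = "/url/being/used/"  # hypothetical path with a trailing slash

pathlist = path.split("/")
pathlist.remove("")  # removes only the first empty string
print(pathlist)      # the trailing empty string is still present

# Calling remove() when no empty string is left raises ValueError.
clean = ["url", "being", "used"]
try:
    clean.remove("")
    raised = False
except ValueError:
    raised = True
print(raised)
```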


1

I added this as a comment to a comment, so just in case: Couldn't you use a list comprehension to exclude the empty elements returned from the split, i.e.

path_list = [(p) for p in path.split('/') if len(p)]
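For strings, the condition can be written as `if len(p)` or simply `if p`, and `filter(None, ...)` is another spelling of the same filter. A brief sketch, assuming Python 3 (where filter returns an iterator, hence the list call) and a made-up sample path:

```python
path = "/url/being/used/to/show/problem/"  # sample path with a trailing slash

by_len = [p for p in path.split("/") if len(p)]
by_truth = [p for p in path.split("/") if p]     # empty strings are falsy
by_filter = list(filter(None, path.split("/")))  # None keeps truthy items

print(by_len)
print(by_len == by_truth == by_filter)
```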
