Splitting a URL into a list in Python

Question

I am currently working on a project that involves splitting a URL. I have used the urlparse module to break up the URL, so now I am working with just the path segment.

The problem is that when I try to split() the string based on the delimiter "/" to separate the directories, I end up with empty strings in my list.

For example, when I do the following:

import urlparse
url = "http://example/url/being/used/to/show/problem"
parsed = urlparse.urlparse(url)
path = parsed[2]  # this is the path element

pathlist = path.split("/")

I get the list:

['', 'url', 'being', 'used', 'to', 'show', 'problem']

I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?
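For readers on Python 3, where the urlparse module was moved to urllib.parse, the same reproduction might look like the sketch below. It assumes Python 3 and uses the named .path attribute in place of index 2; the behavior of split() is unchanged.

```python
from urllib.parse import urlparse

url = "http://example/url/being/used/to/show/problem"
parsed = urlparse(url)
path = parsed.path  # same value as parsed[2]

# The path starts with "/", so the first element of the split is ''.
pathlist = path.split("/")
print(pathlist)
# ['', 'url', 'being', 'used', 'to', 'show', 'problem']
```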

S.Lott · Accepted Answer · 2011-07-12 19:02:12Z

5

"I do not want these empty strings. I realize that I can remove them by making a new list without them, but that seems sloppy. Is there a better way to remove the empty strings and slashes?"

What? There's only one empty string and it's always first, by definition.

pathlist = path.split("/")[1:]

This is pretty common.

A trailing slash can mean an "empty" filename, in which case a default name may be implied (index.html, for example).

It may be meaningful.

"http://example/url/being/used/to/show/problem"

The filename is "problem".

"http://example/url/being/used/to/show/problem/"

The directory is "problem" and a default filename is implied by the empty string.
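A small sketch of the distinction above. The helper name split_path is made up for illustration. It drops only the leading empty string, so a trailing slash still shows up as a final '' in the result.

```python
def split_path(path):
    # Drop only the leading empty string produced by the initial "/".
    return path.split("/")[1:]

without_slash = split_path("/url/being/used/to/show/problem")
with_slash = split_path("/url/being/used/to/show/problem/")

print(without_slash)
# ['url', 'being', 'used', 'to', 'show', 'problem']
print(with_slash)
# ['url', 'being', 'used', 'to', 'show', 'problem', '']
```

The last element tells the two cases apart: 'problem' is a filename in the first list, while the '' in the second marks the implied default name.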

edited Jul 12, 2011 at 19:02

answered Jul 12, 2011 at 18:50


5 Comments

chindes Over a year ago

If the url has a slash at the end, there is another empty string.

craigs Over a year ago

Then maybe use a list comp? path_list = [(p) for p in path.split('/') if len(p)]

S.Lott Over a year ago

@craigs: It's not arbitrary. It's the first position only. The last position may be meaningful. Simply suppressing path elements is wrong.

craigs Over a year ago

@S.Lott: I completely agree with your original response and do understand the significance of trailing slashes for most web servers, but I was responding to @chindes's later response, which indicated that for their particular situation they wanted to suppress all empty strings in the split. So, would the only safe way to decide whether or not to suppress the trailing '/' be to actually issue a HEAD request and check for a redirect? p.s. 'I almost wet myself' when I got a response from S.Lott.

S.Lott Over a year ago

@craigs: "they wanted to suppress all empty strings in the split" is a Really Bad Idea. It's an Attractive Nuisance.

Artsiom Rudzenka · Accepted Answer · 2011-07-12 19:06:35Z

3

I am not familiar with urllib and its output for the path, but one way to build the new list is a list comprehension:

[x for x in path.split("/") if x]

Or something like this if there is only a leading '/':

path.lstrip('/').split("/")

Or, if there may be a trailing '/' too:

path.strip('/').split("/")

And if the path string always starts with a single '/', then the easiest way is:

path[1:].split('/')
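The four variants above are not interchangeable. The sketch below runs them on a few made-up sample paths, and the dictionary keys are only labels for illustration. The variants differ on trailing slashes, doubled slashes, and the bare root path "/".

```python
variants = {
    "comprehension": lambda p: [x for x in p.split("/") if x],
    "lstrip": lambda p: p.lstrip("/").split("/"),
    "strip": lambda p: p.strip("/").split("/"),
    "slice": lambda p: p[1:].split("/"),
}

samples = ["/a/b", "/a/b/", "/a//b", "/"]

for path in samples:
    print(repr(path))
    for name, func in variants.items():
        print("  %-14s %r" % (name, func(path)))

# Differences worth noting:
#   "/a/b/"  lstrip and slice keep the trailing '': ['a', 'b', '']
#   "/a//b"  only the comprehension drops the inner '': ['a', 'b']
#   "/"      only the comprehension returns []; the others return ['']
```

Only the comprehension removes every empty string. The strip-based variants still return [''] for the root path, because splitting an empty string on "/" yields a one-element list.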

edited Jul 12, 2011 at 19:06

answered Jul 12, 2011 at 18:49


Jochen Ritzel · Accepted Answer · 2011-07-12 18:56:50Z

2

pathlist = path.strip('/').split("/")

answered Jul 12, 2011 at 18:56


Ilia Choly · Accepted Answer · 2011-07-12 18:50:42Z

1

Remove the empty items?

pathlist.remove('')
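One caveat with this approach: list.remove() deletes only the first matching element and raises ValueError when there is none. A short sketch with a made-up path that has a trailing slash:

```python
pathlist = "/url/being/used/".split("/")
# pathlist is ['', 'url', 'being', 'used', '']

pathlist.remove("")  # removes only the first ''
print(pathlist)
# ['url', 'being', 'used', '']

no_empty = ["url", "being"]
try:
    no_empty.remove("")
    raised = False
except ValueError:
    # remove() raises when the value is not present
    raised = True
print(raised)
# True
```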

answered Jul 12, 2011 at 18:50


craigs · Accepted Answer · 2011-07-12 19:42:09Z

1

I added this as a comment to a comment, so just in case: Couldn't you use a list comprehension to exclude the empty elements returned from the split, i.e.

path_list = [(p) for p in path.split('/') if len(p)]

answered Jul 12, 2011 at 19:42
