How to remove query string from a url?

Question

I have the following URLs:

https://stackoverflow.com/questions/7990301?aaa=aaa
https://stackoverflow.com/questions/7990300?fr=aladdin
https://stackoverflow.com/questions/22375#6
https://stackoverflow.com/questions/22375?
https://stackoverflow.com/questions/22375#3_1

I need to turn them into URLs like these:

https://stackoverflow.com/questions/7990301
https://stackoverflow.com/questions/7990300
https://stackoverflow.com/questions/22375
https://stackoverflow.com/questions/22375
https://stackoverflow.com/questions/22375

My attempt:

url='https://stackoverflow.com/questions/7990301?aaa=aaa'
if '?' in url:
    url=url.split('?')[0]
if '#' in url:
    url = url.split('#')[0]

I think this is a clumsy way to do it.

Matthew Story · Accepted Answer · 2018-06-29 02:57:10Z

17

The very helpful library furl makes it trivial to remove both the query and fragment parts:

>>> import furl
>>> furl.furl("https://hi.com/?abc=def#ghi").remove(args=True, fragment=True).url
'https://hi.com/'

edited Jun 29, 2018 at 2:57

answered Jun 29, 2018 at 2:51


1 Comment

user3064538 Over a year ago

Why download this library when the built-in Python way is basically the same? Use from urllib.parse import urlsplit, urlunsplit, then urlunsplit(urlsplit("https://hi.com/?abc=def#ghi")._replace(query="", fragment=""))

TheDavidFactor · Accepted Answer · 2018-06-29 03:33:00Z

7

Splitting on a separator that doesn't occur in the string just returns a one-element list, so the `in` checks are unnecessary. Depending on your goal, you could simplify your existing code like this:

url = url.split('?')[0].split('#')[0]

Not saying this is the best way (furl is a great solution), but it is a way.
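
To illustrate the point about splitting on an absent separator, here is a minimal sketch run against the URLs from the question. The helper name strip_with_split is made up for this example.

```python
def strip_with_split(url):
    # str.split returns a one-element list when the separator is absent,
    # so taking index [0] is always safe and no "in" check is needed.
    return url.split('?')[0].split('#')[0]


urls = [
    "https://stackoverflow.com/questions/7990301?aaa=aaa",
    "https://stackoverflow.com/questions/7990300?fr=aladdin",
    "https://stackoverflow.com/questions/22375#6",
    "https://stackoverflow.com/questions/22375?",
    "https://stackoverflow.com/questions/22375#3_1",
]

for u in urls:
    print(strip_with_split(u))
```

This treats the URL as a plain string and does no real parsing, which is fine for inputs like the ones above.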

edited Jun 29, 2018 at 3:33

answered Jun 29, 2018 at 3:08


Community · Accepted Answer · 2021-10-07 10:55:51Z

In your example you're also removing the fragment (the thing after a #), not just the query.

You can remove both by using urllib.parse.urlsplit, then calling ._replace on the namedtuple it returns and converting back to a string URL with urllib.parse.urlunsplit:

from urllib.parse import urlsplit, urlunsplit

def remove_query_params_and_fragment(url):
    return urlunsplit(urlsplit(url)._replace(query="", fragment=""))

Output:

>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/7990301?aaa=aaa")
'https://stackoverflow.com/questions/7990301'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/7990300?fr=aladdin")
'https://stackoverflow.com/questions/7990300'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375#6")
'https://stackoverflow.com/questions/22375'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375?")
'https://stackoverflow.com/questions/22375'
>>> remove_query_params_and_fragment("https://stackoverflow.com/questions/22375#3_1")
'https://stackoverflow.com/questions/22375'
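
Since the query and the fragment are separate fields of the split result, the same approach can drop only one of them. The following is a sketch of a variant that removes the query but keeps the fragment; the function name remove_query_only and the sample URL are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_query_only(url):
    # Clear only the query field; the fragment (the part after '#') is kept.
    return urlunsplit(urlsplit(url)._replace(query=""))


print(remove_query_only("https://stackoverflow.com/questions/7990301?aaa=aaa#6"))
```

This should print the URL with ?aaa=aaa removed and #6 still attached.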

Jay Calamari · Accepted Answer · 2018-06-29 02:52:19Z

2

You could try

urls = ["https://stackoverflow.com/questions/7990301?aaa=aaa",
        "https://stackoverflow.com/questions/7990300?fr=aladdin",
        "https://stackoverflow.com/questions/22375#6",
        "https://stackoverflow.com/questions/22375?",
        "https://stackoverflow.com/questions/22375#3_1"]

urls_without_query = [url.split('?')[0] for url in urls]

For example, "https://stackoverflow.com/questions/7990301?aaa=aaa".split('?') returns the list ["https://stackoverflow.com/questions/7990301", "aaa=aaa"], so if that string is url, url.split('?')[0] gives you "https://stackoverflow.com/questions/7990301".

Edit: I didn't think about # fragments. The other answers might help you more :)

edited Jun 29, 2018 at 2:52

answered Jun 29, 2018 at 2:45


1 Comment

Matthew Story Over a year ago

This does not remove fragments, and is not better than the solution the OP is looking to improve upon.

Lücks · Accepted Answer · 2020-04-20 17:57:46Z

1

You can use w3lib. Called without a parameter list, url_query_cleaner keeps none of the query parameters:

from w3lib import url as w3_url
url_without_query = w3_url.url_query_cleaner(url)

answered Apr 20, 2020 at 17:57


Tom Anthony · Accepted Answer · 2020-10-06 15:32:45Z

0

Here is an answer using standard libraries, and which parses the URL properly:

from urllib.parse import urlparse

url = 'http://www.example.com/this/category?one=two'
parsed = urlparse(url)
print("".join([parsed.scheme,"://",parsed.netloc,parsed.path]))

Expected output:

http://www.example.com/this/category

Note: this also strips params and the fragment, but is easy to modify to include those if you want.
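
As one way to make that modification, here is a sketch that keeps the params (the ;... part of the last path segment) while still dropping the query and fragment. It swaps the manual join for urlunparse; the function name strip_keep_params and the sample URL are made up for this example.

```python
from urllib.parse import urlparse, urlunparse


def strip_keep_params(url):
    parsed = urlparse(url)
    # Blank out the query and fragment; scheme, netloc, path and params
    # are left as parsed and reassembled by urlunparse.
    return urlunparse(parsed._replace(query="", fragment=""))


print(strip_keep_params("http://www.example.com/this/category;v=1?one=two#top"))
```

Letting urlunparse rebuild the string avoids hard-coding the "://" separator.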

answered Oct 6, 2020 at 15:32
