Avoiding a trailing '&' in a Python regex when extracting a URL parameter

Question

I've been testing a regex whose goal is to extract a specific named URL parameter from a website's URL so that I can replace it.

I can almost get the parameter with this regex:

.website.com.+tag=(?P<tagvalue>.+&|.+\s)

This works fine when the tag is at the end, but when the tag is in the middle the captured value has a trailing '&', like 'value&'.

I want to capture the value without the ampersand. I tried moving the terminating characters out of the named group like this:

.website.com.+tag=(?P<tagvalue>.+)&|\s

but this regex doesn't work: it always matches until the end of the line. What I want is:

1. If there is a '&' character after the value, capture the parameter value without the '&'.
2. If 1 is not true and there is no '&' character, capture the value until the end of the line (in practice until a \s, because I'm processing text and the URL appears inside it).

You can test the regex with some test text here:

Try .website.com.+tag=(?P<tagvalue>[^&\s]+). But like Mike said, you're better off using the urlparse library.
– sshashank124, Commented Mar 19, 2018 at 13:54
@MikeScotty With a regex I can make the replacement in one line, and I'm familiar with them. Also, I don't have to import more modules.
– madtyn, Commented Mar 19, 2018 at 14:29
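
For comparison, the standard-library route suggested in the comments might look roughly like the sketch below. It assumes Python 3, where the old urlparse module lives in urllib.parse, and it assumes the URL has already been isolated from the surrounding text. The helper name replace_query_param and the sample URL are hypothetical, not from the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_query_param(url, name, new_value):
    """Return url with every occurrence of query parameter `name` set to new_value."""
    parts = urlsplit(url)
    # parse_qsl keeps the parameters in their original order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs = [(key, new_value if key == name else value) for key, value in pairs]
    # SplitResult is a namedtuple, so _replace builds a copy with a new query.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical sample URL.
print(replace_query_param("http://www.website.com/page?tag=abc&ref=xyz", "tag", "NEW"))
```

Note that urlencode re-encodes the query string, so values containing special characters may come back percent-encoded differently from the input.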

sshashank124 · Accepted Answer · 2018-03-21 08:44:08Z

1

You can accomplish that with the following regex:

.website.com.+tag=(?P<tagvalue>[^&\s]+)

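This captures the value of tag up to, but not including, the next & or whitespace character. The negated character class [^&\s]+ stops at either delimiter and also at the end of the text, so the same pattern works whether the tag is in the middle of the query string or at the end.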
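
A minimal sketch of how this pattern could be used from Python, both to read the value and to replace it in place. The helper names extract_tag and replace_tag and the sample strings are hypothetical, and the dots in the pattern are left unescaped exactly as in the regex above.

```python
import re

# Pattern from the answer, unchanged.
TAG_RE = re.compile(r".website.com.+tag=(?P<tagvalue>[^&\s]+)")


def extract_tag(text):
    """Return the value of the tag parameter, or None if there is no match."""
    match = TAG_RE.search(text)
    return match.group("tagvalue") if match else None


def replace_tag(text, new_value):
    """Return text with only the captured tag value swapped for new_value."""
    match = TAG_RE.search(text)
    if match is None:
        return text
    # span() accepts a group name and gives the value's position in text.
    start, end = match.span("tagvalue")
    return text[:start] + new_value + text[end:]


# Hypothetical sample text with the tag in the middle and at the end.
middle = "see http://www.website.com/page?tag=abc123&ref=xyz for details"
at_end = "see http://www.website.com/page?ref=xyz&tag=abc123 for details"

print(extract_tag(middle))
print(extract_tag(at_end))
print(replace_tag(middle, "NEW"))
```

Slicing by the group's span avoids having to escape backslashes in the replacement string, which re.sub would otherwise interpret.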

mrzasa · Accepted Answer · 2018-03-19 13:55:53Z

0

Try with lazy repetition:

.website.com.+tag=(?P<tagvalue>.+?)(?:\s|&)
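
A small sketch of the lazy approach in Python, with the terminator written as a non-capturing group, (?:...). The extra $ alternative is an assumption added here so that a URL sitting at the very end of the text, with no trailing whitespace, still matches. The helper name extract_tag_lazy and the sample strings are hypothetical.

```python
import re

# Lazy value followed by a terminator that stays outside the named group.
# The "|$" alternative is an addition for URLs that end the text.
LAZY_RE = re.compile(r".website.com.+tag=(?P<tagvalue>.+?)(?:\s|&|$)")


def extract_tag_lazy(text):
    """Return the value of the tag parameter, or None if there is no match."""
    match = LAZY_RE.search(text)
    return match.group("tagvalue") if match else None


# Hypothetical samples: tag in the middle, before whitespace, and at end of text.
print(extract_tag_lazy("http://www.website.com/page?tag=abc&ref=1 more text"))
print(extract_tag_lazy("http://www.website.com/page?ref=1&tag=abc more text"))
print(extract_tag_lazy("http://www.website.com/page?ref=1&tag=abc"))
```

The lazy quantifier .+? grows one character at a time until the terminator can match, so the value stops at the first & or whitespace instead of the last.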
