Make Python RegEx More Concise

Question

I'm still fairly new to Python, although I have quite a bit of experience with JavaScript, so it's mainly Python's idiosyncrasies that I need to work on. With that in mind, and knowing that there are some subtle differences between JS and Python regular expressions, I have a question about a Python regex. Is there any way to make the following pattern more concise?

The Whole Regular Expression

^https://www.indiegogo.com/explore/[a-z]+-?[a-z]+\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+&tags=&sort=trending$


Breakdown of the Whole Regular Expression

I'll break this down further for you. The URL address will always begin with ^https://www.indiegogo.com/explore/ and always end with &tags=&sort=trending$, so no need to worry about this, but...

[a-z]+-?[a-z]+\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+

...is the specific part of the regular expression that matters, which can be broken down even further.

URL Structure and Possible Formats of Dynamic Values

1. ^https://www.indiegogo.com/explore/
2. word or dash-separated or separated-by-dashes or words-separated-by-dashes
3. ?project_type=
4. word
5. &project_timing=
6. word or additional_word
7. &tags=&sort=trending$

Steps 1., 3., 5., and 7. can be ignored altogether, which leaves us with...

The Only Dynamic Values

2. word or dash-separated or separated-by-dashes or words-separated-by-dashes

6. word or additional_word

It may be my own ignorance or inexperience, but the regular expression I've devised seems clunky, so to speak. Is there any way to improve it?
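One way to make a pattern like this easier to read without loosening it is to assemble it from named pieces. The following is only a sketch under stated assumptions: the names `WORD`, `SLUG`, `TIMING`, `EXPLORE_URL`, and `is_explore_url` are hypothetical, the sample URL is invented for illustration, and it assumes that "words-separated-by-dashes" means any number of dash-separated words. Note that the original `[a-z]+-?[a-z]+` allows at most one dash and requires at least two letters, so this sketch is not an exact equivalent.

```python
import re

# Hypothetical building blocks; the names are illustrative only.
WORD = r"[a-z]+"                  # a single lowercase word
SLUG = r"[a-z]+(?:-[a-z]+)*"      # word, or words-separated-by-dashes (assumption)
TIMING = r"[a-z]+(?:_[a-z]+)?"    # word, or additional_word

# re.escape() handles the literal parts, including "." and "?".
EXPLORE_URL = re.compile(
    re.escape("https://www.indiegogo.com/explore/")
    + SLUG
    + re.escape("?project_type=")
    + WORD
    + re.escape("&project_timing=")
    + TIMING
    + re.escape("&tags=&sort=trending")
)


def is_explore_url(url: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return EXPLORE_URL.fullmatch(url) is not None


# Hypothetical URL in the shape described above (not taken from the site).
sample = (
    "https://www.indiegogo.com/explore/tech-innovation"
    "?project_type=campaign&project_timing=ending_soon&tags=&sort=trending"
)
print(is_explore_url(sample))  # True
```

Using `fullmatch` makes the `^` and `$` anchors unnecessary. If the framework only accepts a pattern string rather than a compiled object, `EXPLORE_URL.pattern` holds the assembled string, which would then need the anchors added back.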

@WiktorStribiżew i'm not trying to parse the url. i need a regex for an object in a particular framework
– oldboy, Commented Jun 25, 2018 at 20:16
You could compile each piece of the regex separately so they're more manageable, and then join them together as in this answer: stackoverflow.com/questions/22102814/…
– divibisan, Commented Jun 25, 2018 at 20:18
@davedwards i'm not looking to concatenate a regex object... that was simply somebody's suggestion
– oldboy, Commented Jun 25, 2018 at 20:28

emsimpson92 · Accepted Answer · 2018-06-25 23:17:35Z


Without having any sample URLs to test with, the simplest solution I could find is this:

^https:\/\/www.indiegogo.com\/explore\/[a-z\-?_=]+&project_timing=[a-z_]+&tags=&sort=trending$

So here's a breakdown of what I did differently:

Instead of [a-z]+-?[a-z]+\?project_type=[a-z]+, I simplified it with [a-z\-?_=]+
Instead of [a-z]+_?[a-z]+ I used [a-z_]+

The only issue I saw was that you aren't taking full advantage of your character classes. If you would like to provide a few sample URLs I'd be able to fix any issues you might run into. But as far as I can tell, it does what you need it to.
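To see how the simplification changes what is accepted, the two patterns can be run side by side. This is a rough check only: the URLs below are hypothetical (no sample URLs were given), and the labels are made up for the printout. Because a character class accepts its members in any order, the simplified pattern no longer requires `?project_type=` to be present.

```python
import re

# The question's pattern and the answer's simplified pattern, copied as written.
ORIGINAL = re.compile(
    r"^https://www.indiegogo.com/explore/[a-z]+-?[a-z]+\?project_type=[a-z]+"
    r"&project_timing=[a-z]+_?[a-z]+&tags=&sort=trending$"
)
SIMPLIFIED = re.compile(
    r"^https:\/\/www.indiegogo.com\/explore\/[a-z\-?_=]+"
    r"&project_timing=[a-z_]+&tags=&sort=trending$"
)

BASE = "https://www.indiegogo.com/explore/"
TAIL = "&tags=&sort=trending"

# Hypothetical test URLs (assumptions, not taken from the site).
urls = {
    "expected shape": BASE + "tech-innovation?project_type=campaign&project_timing=ending_soon" + TAIL,
    "no project_type": BASE + "tech&project_timing=ending_soon" + TAIL,
    "underscores only": BASE + "tech?project_type=campaign&project_timing=___" + TAIL,
}

# For each URL: (matches the original pattern, matches the simplified pattern).
results = {
    label: (bool(ORIGINAL.match(url)), bool(SIMPLIFIED.match(url)))
    for label, url in urls.items()
}

for label, (orig_ok, simple_ok) in results.items():
    print(f"{label:18} original={orig_ok} simplified={simple_ok}")
```

Both patterns accept the expected shape, but only the simplified one accepts the other two. Separately, both leave the dots in `www.indiegogo.com` unescaped, so `.` matches any character there; writing `\.` would close that gap. The `\/` escapes are harmless in Python but not required.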

You can also use ^https:\/\/www.indiegogo.com\/explore\/[\w\-]+&project_timing=[\w]+&tags=&sort=trending$ if you really want to simplify, but that might not be restrictive enough.
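A quick check of the `\w` variant, again using a hypothetical URL since none were provided. Two things show up in this sketch: as written, `[\w\-]` contains neither `?` nor `=`, so the pattern cannot get past `?project_type=`; and `\w` in Python 3 is Unicode-aware by default for `str` patterns, so it accepts considerably more than `[a-z]`.

```python
import re

# The \w variant from the answer, copied as written.
W_VARIANT = re.compile(
    r"^https:\/\/www.indiegogo.com\/explore\/[\w\-]+&project_timing=[\w]+&tags=&sort=trending$"
)

# Hypothetical URL in the shape the question describes.
url = (
    "https://www.indiegogo.com/explore/tech-innovation"
    "?project_type=campaign&project_timing=ending_soon&tags=&sort=trending"
)

# [\w\-] cannot consume "?" or "=", so the match fails at "?project_type=".
print(W_VARIANT.match(url))  # None

# \w is broader than [a-z]: uppercase, digits, "_" and non-ASCII letters match.
for text in ["tech", "Tech", "tech2018", "caf\u00e9"]:
    print(
        ascii(text),
        bool(re.fullmatch(r"\w+", text)),            # default, Unicode-aware
        bool(re.fullmatch(r"\w+", text, re.ASCII)),  # limited to [a-zA-Z0-9_]
        bool(re.fullmatch(r"[a-z]+", text)),         # lowercase ASCII only
    )
```

Adding `?` and `=` to the first class (for example `[\w\-?=]+`) would let the pattern match, at the cost of being even less restrictive.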

edited Jun 25, 2018 at 23:17

answered Jun 25, 2018 at 23:12


4 Comments

oldboy Over a year ago

god, i keep writing messages to you and then editing my question and it keeps erasing my messages. the first bit in your initial comment (i.e. using \w, etc) wouldn't have worked since it would've matched much more than just alphabet characters, but it got me thinking, and, yeah, 2. [a-z-]+ and 6. [a-z_]+ are definitely improvements

emsimpson92 Over a year ago

I'm not sure what your requirements are exactly, but ^https:\/\/www.indiegogo.com\/explore\/[a-z\-&_=]+$ matches the sample URL I used too. The only issue is it doesn't require the URL to contain &project_timing= or &tags=&sort=trending. I'm not sure if that's important to you.

emsimpson92 Over a year ago

If the beginning and end is all you care about ^https:\/\/www.indiegogo.com\/explore\/[a-z\-&_=]+&tags=&sort=trending$ works too.

oldboy Over a year ago

that format of the urls i listed in my questions is specific to the way in which the links i want are structured. when you start messing with the other stuff, there's a good chance i may end up matching stuff i don't want to :/
