I'm still fairly new to Python, although I have quite a bit of experience with JavaScript, so it's really only Python's idiosyncrasies that I need to work on. With that in mind, and knowing that there are some subtle differences between JavaScript and Python regular expressions, I have a question about a Python regex. Is there any way to make the following pattern more concise?

The Whole Regular Expression

^https://www.indiegogo.com/explore/[a-z]+-?[a-z]+\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+&tags=&sort=trending$

Breakdown of the Whole Regular Expression

I'll break this down further. The URL will always begin with ^https://www.indiegogo.com/explore/ and always end with &tags=&sort=trending$, so there's no need to worry about those parts, but...

[a-z]+-?[a-z]+\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+

...is the specific part of the regular expression that matters, which can be broken down even further.

URL Structure and Possible Formats of Dynamic Values

  1. ^https://www.indiegogo.com/explore/
  2. word or dash-separated or separated-by-dashes or words-separated-by-dashes
  3. ?project_type=
  4. word
  5. &project_timing=
  6. word or additional_word
  7. &tags=&sort=trending$

Items 1, 3, 5, and 7 are fixed and can be ignored altogether, which leaves us with...

The Only Dynamic Values

    2. word or dash-separated or separated-by-dashes or words-separated-by-dashes

    6. word or additional_word

It may be my own ignorance or inexperience, but the regular expression I've devised seems clunky, so to speak. Is there any way to improve it?
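To make the requirement concrete, here is a minimal sketch that runs the pattern above against a few made-up URLs. The slugs and parameter values (`home`, `tech-innovation`, `campaign`, `ending_soon`) are hypothetical, chosen only to exercise each optional separator; they are not taken from the site.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"^https://www.indiegogo.com/explore/[a-z]+-?[a-z]+"
    r"\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+"
    r"&tags=&sort=trending$"
)


def explore_url(slug, project_type, timing):
    """Build a URL in the format described above (all values are hypothetical)."""
    return (
        "https://www.indiegogo.com/explore/" + slug
        + "?project_type=" + project_type
        + "&project_timing=" + timing
        + "&tags=&sort=trending"
    )


one_word = explore_url("home", "campaign", "all")
one_dash = explore_url("tech-innovation", "campaign", "ending_soon")
three_dashes = explore_url("words-separated-by-dashes", "campaign", "all")

for url in (one_word, one_dash, three_dashes):
    print(bool(PATTERN.match(url)))
```

The first two URLs match and the third does not, because `[a-z]+-?[a-z]+` allows at most one dash. If slugs with several dashes really occur, that part of the pattern is stricter than the format listed in item 2. Note also that the unescaped `.` characters in the host name match any character, not only a literal dot.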

  • Parsing URLs in Python is easier with urlparse. Commented Jun 25, 2018 at 20:11
  • @WiktorStribiżew I'm not trying to parse the URL. I need a regex for an object in a particular framework. Commented Jun 25, 2018 at 20:16
  • You could compile each piece of the regex separately so they're more manageable, and then join them together as in this answer: stackoverflow.com/questions/22102814/… Commented Jun 25, 2018 at 20:18
  • @divibisan interesting! Commented Jun 25, 2018 at 20:19
  • @davedwards I'm not looking to concatenate regex objects; that was simply somebody's suggestion. Commented Jun 25, 2018 at 20:28
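
The suggestion in the comments to build the pattern from smaller pieces could look roughly like the sketch below. The piece names (`SLUG`, `WORD`, `TIMING`, and so on) and the sample URL are illustrative assumptions, and the slug and timing sub-patterns are one possible reading of the formats listed in the question, not the asker's original expressions. `re.escape` is applied to the literal parts so that characters such as `.` and `?` are matched literally.

```python
import re

# Dynamic pieces (names are hypothetical, chosen for readability).
SLUG = r"[a-z]+(?:-[a-z]+)*"    # one word, or words separated by single dashes
WORD = r"[a-z]+"                # a single lowercase word
TIMING = r"[a-z]+(?:_[a-z]+)?"  # word, or word_word

# Literal pieces are escaped so "." and "?" are not treated as metacharacters.
PREFIX = re.escape("https://www.indiegogo.com/explore/")
TYPE_KEY = re.escape("?project_type=")
TIMING_KEY = re.escape("&project_timing=")
SUFFIX = re.escape("&tags=&sort=trending")

EXPLORE_URL = re.compile(
    "^" + PREFIX + SLUG + TYPE_KEY + WORD + TIMING_KEY + TIMING + SUFFIX + "$"
)

# Made-up URL in the expected format.
sample = (
    "https://www.indiegogo.com/explore/tech-innovation"
    "?project_type=campaign&project_timing=ending_soon&tags=&sort=trending"
)
print(bool(EXPLORE_URL.match(sample)))
```

The joined pattern is no shorter than the original, but each dynamic part can be read and changed on its own.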

1 Answer

Without having any sample URLs to test with, the simplest solution I could find is this:

^https:\/\/www.indiegogo.com\/explore\/[a-z\-?_=]+&project_timing=[a-z_]+&tags=&sort=trending$

So here's a breakdown of what I did differently:

  • Instead of [a-z]+-?[a-z]+\?project_type=[a-z]+, I simplified it with [a-z\-?_=]+
  • Instead of [a-z]+_?[a-z]+ I used [a-z_]+

The only issue I saw was that you aren't taking full advantage of your character classes. If you would like to provide a few sample URLs I'd be able to fix any issues you might run into. But as far as I can tell, it does what you need it to.
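
One way to see what the broader character classes trade away is to run both patterns over the same inputs. This is a rough sketch: the URLs are invented for illustration, and the patterns are copied from above except that the `\/` escapes are dropped, since `/` needs no escaping in Python.

```python
import re

ORIGINAL = re.compile(
    r"^https://www.indiegogo.com/explore/[a-z]+-?[a-z]+"
    r"\?project_type=[a-z]+&project_timing=[a-z]+_?[a-z]+"
    r"&tags=&sort=trending$"
)
SIMPLIFIED = re.compile(
    r"^https://www.indiegogo.com/explore/[a-z\-?_=]+"
    r"&project_timing=[a-z_]+&tags=&sort=trending$"
)

BASE = "https://www.indiegogo.com/explore/"
TAIL = "&tags=&sort=trending"

candidates = [
    # Expected format, single dash in the slug.
    BASE + "tech-innovation?project_type=campaign&project_timing=ending_soon" + TAIL,
    # Several dashes in the slug.
    BASE + "words-separated-by-dashes?project_type=campaign&project_timing=all" + TAIL,
    # No project_type parameter at all.
    BASE + "tech&project_timing=all" + TAIL,
]

for url in candidates:
    # First column: original pattern; second column: simplified pattern.
    print(bool(ORIGINAL.match(url)), bool(SIMPLIFIED.match(url)))
```

Both patterns accept the first URL. Only the simplified pattern accepts the second and third, so it gains multi-dash slugs but no longer requires `?project_type=` to be present.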

You can also use ^https:\/\/www.indiegogo.com\/explore\/[\w\-?=]+&project_timing=\w+&tags=&sort=trending$ if you really want to simplify, but that might not be restrictive enough, since \w also matches digits, uppercase letters, and the underscore.
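
As a quick check of the "not restrictive enough" caveat, the sketch below compares `[a-z]` with `\w` on a few single characters. In Python 3, `\w` in a `str` pattern matches Unicode word characters by default; the sample characters are arbitrary.

```python
import re

LOWER = re.compile(r"[a-z]")
WORD_CHAR = re.compile(r"\w")

# Columns: character, matched by [a-z], matched by \w.
for ch in ["a", "Z", "7", "_", "\u00e9", "-"]:
    print(repr(ch), bool(LOWER.fullmatch(ch)), bool(WORD_CHAR.fullmatch(ch)))

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
print(bool(re.fullmatch(r"\w", "\u00e9", flags=re.ASCII)))
```

Only `"a"` is matched by `[a-z]`, while `\w` matches everything in the list except `"-"`. The final line prints `False`, because `re.ASCII` excludes the accented letter.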

4 Comments

God, I keep writing messages to you and then editing my question, and it keeps erasing my messages. The first bit in your initial comment (i.e. using \w, etc.) wouldn't have worked since it would've matched much more than just alphabet characters, but it got me thinking, and, yeah, 2. [a-z-]+ and 6. [a-z_]+ are definitely improvements.
I'm not sure what your requirements are exactly, but ^https:\/\/www.indiegogo.com\/explore\/[a-z\-?&_=]+$ matches the sample URL I used too. The only issue is that it doesn't require the URL to contain &project_timing= or &tags=&sort=trending. I'm not sure if that's important to you.
If the beginning and end are all you care about, ^https:\/\/www.indiegogo.com\/explore\/[a-z\-?&_=]+&tags=&sort=trending$ works too.
The URL format I listed in my question is specific to the way the links I want are structured. When you start messing with the other parts, there's a good chance I may end up matching stuff I don't want to :/
