I am trying to form a regular expression that will match as follows:

  • One or more characters, none of them a colon or a space, followed by a colon
  • Followed by a single space
  • Followed by one or more characters, none of them a colon or a space
  • Followed directly by \r\n

This is for parsing header lines in HTTP GET requests, so any of the following should match:

  • Host: www.stackoverflow.com\r\n
  • a-b-sads&^*@hgsdafAS&FTD: sjal;dfh9S^&D^F&(SDfsdgafs\r\n

and the following would not:

  • Host : www.stackoverflow.com
  • H:o:s:t: www.stackoverflow.com
  • Host: www.:::stackoverflow.com
  • Host: www.stackoverflow.com\n

I am currently using re.compile(r"^.{1,}: .{1,}[/r/n]$") but am not sure how to exclude colons from certain subsets of the string.

EDIT: I believe I want to start with ^ to signify the beginning of the string. Then I want one or more of any character except a colon, so .{1,}, but I am not sure how to exclude the colon. Then I want a colon and a space, so just ": ", and then one or more of any character except a colon, so .{1,} again, with the same problem of excluding colons. Finally, I want it to end with [\r\n]$. This still does not seem to work, even if I drop the no-colon requirement. So something like ^.{1,}: .{1,}\r\n$, but I still need to figure out how to exclude colons.

  • why does the first "Host : www.stackoverflow.com" not match your requirements? Commented Jan 20, 2018 at 23:10
  • @VeltzerDoron Because it goes Host space : space where the first space should not be present. Commented Jan 20, 2018 at 23:12
  • So, you want to exclude spaces from the first string as well, or just before the colon? Commented Jan 20, 2018 at 23:14
  • @VeltzerDoron Ah yes, the first string up to the colon should exclude spaces and colons. Then a colon followed by a single space, and then a second string that excludes spaces and colons and ends with \r\n. Commented Jan 20, 2018 at 23:17

1 Answer

  1. {1,} is simply +.
  2. Excluding colons is done with a negated character class: [^:]* (or [^:]+ to require at least one character).
  3. If you want to exclude both spaces and colons, use [^ :].
  4. Anchoring with $ right after \r\n seems strange to me: it means a single string that ends with a line ending and has nothing after it. (Also keep in mind that Unix and Windows use different line endings: \n versus \r\n.)
  5. Also, the line ending here is the two-character sequence \r\n. Putting characters in square brackets matches any one of them, which is not what you need.
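
As a minimal sketch of the points above (the variable names are made up for illustration), the following checks that + and {1,} behave the same, that [^ :] rejects spaces and colons, and that [\r\n] matches a single character rather than the \r\n sequence:

```python
import re

# Points 1-3: "+" is shorthand for "{1,}", and [^ :] excludes spaces and colons.
token_long = re.compile(r"[^ :]{1,}")
token_short = re.compile(r"[^ :]+")

for text in ["Host", "Ho st", "H:ost"]:
    # Both spellings give the same result for every input.
    print(repr(text), bool(token_long.fullmatch(text)), bool(token_short.fullmatch(text)))

# Point 5: [\r\n] matches ONE character (either \r or \n),
# whereas \r\n matches the two-character CRLF sequence.
either_char = re.compile(r"[\r\n]")
crlf = re.compile(r"\r\n")

print(bool(either_char.fullmatch("\r\n")))  # False: the class matches only one character
print(bool(either_char.fullmatch("\n")))    # True: a lone \n is accepted
print(bool(crlf.fullmatch("\r\n")))         # True
print(bool(crlf.fullmatch("\n")))           # False
```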

Putting it together, the following should work:

^([^ :]+): ([^ :]+)$

This captures the header name (e.g. Host) in group 1 and the URL in group 2.
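
One caveat: because [^ :] also matches \r and \n, the pattern as written accepts a line ending in a bare \n and leaves the line terminator inside group 2. If the trailing \r\n must be enforced, one possible stricter variant is sketched below. It is an assumption layered on top of the answer, not part of it, and the names HEADER_RE and parse_header are hypothetical.

```python
import re

# Hypothetical stricter variant (not the answer's original pattern):
# \r and \n are excluded from both groups, and the line must end in \r\n.
HEADER_RE = re.compile(r"([^ :\r\n]+): ([^ :\r\n]+)\r\n")

def parse_header(line):
    """Return (name, value) if the whole line matches, else None."""
    m = HEADER_RE.fullmatch(line)
    return m.groups() if m else None

# Lines from the question that should match.
print(parse_header("Host: www.stackoverflow.com\r\n"))
# -> ('Host', 'www.stackoverflow.com')
print(parse_header("a-b-sads&^*@hgsdafAS&FTD: sjal;dfh9S^&D^F&(SDfsdgafs\r\n"))

# Lines from the question that should not match: each prints None.
for bad in [
    "Host : www.stackoverflow.com",
    "H:o:s:t: www.stackoverflow.com",
    "Host: www.:::stackoverflow.com",
    "Host: www.stackoverflow.com\n",
]:
    print(parse_header(bad))
```

Using fullmatch makes the ^ and $ anchors unnecessary and avoids the special case where $ matches just before a final newline.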

1 Comment

Ok, it makes sense now what exactly I was doing wrong. Thank you for your help.
