
I have some ('stolen' :) Python code that uses a regex to parse all HTTP headers.

It is like this:

import re

parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
header_list = [(key, value) for key, value in parser.findall(http_headers)]

Normally this works great, but the following header is not found:

Access-Control-Allow-Origin: *

I think it might have something to do with the asterisk, but I'm not sure. As I understand it, this part of the regex:

(?P<value>.+\S)

matches and captures . (any character) + (one or more times), followed by \S (any non-whitespace character). Shouldn't the asterisk be matched by that?

Any ideas?

  • Your regex does not work the way you expect it to. Ideally, if you must use a regex, I would write it a different way. But at a minimum you need to change the .+ to .* in your second group. Commented Feb 20, 2015 at 0:50

1 Answer


The problem here is actually quite simple. In the value group, .+ requires at least one character, and the \S that follows requires one more non-whitespace character. tl;dr: the group only matches values that are 2 or more characters long, so a lone * after the colon is never matched.
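
To see the two-character minimum in isolation, here is a small sketch that applies only the value sub-pattern to a few strings. The helper name matches_whole and the sample values are made up for illustration; they are not part of the original code.

```python
import re

# The value sub-pattern from the question: one or more of any character,
# followed by one non-whitespace character (so at least two characters).
value_plus = re.compile(r'.+\S')

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

# Hypothetical sample values; '*' is the one from the failing header.
for sample in ['*', 'ab', 'text/html']:
    print(repr(sample), matches_whole(value_plus, sample))
```

Only the single-character value fails, which is why every other header in the question was parsed correctly.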

Use a * to look for 0 or more characters (plus the \S) instead:

\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*
#                                 ^ * instead of +
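
As a quick check, the sketch below runs the original and the corrected pattern over the same input. The sample headers in http_headers and the variable names old_parser and new_parser are assumptions made up for this example.

```python
import re

# Original pattern from the question (value needs at least two characters).
old_parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
# Corrected pattern: .* allows a value that is a single character.
new_parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*')

# Hypothetical sample headers, including the one that was not found.
http_headers = (
    "Content-Type: text/html\r\n"
    "Access-Control-Allow-Origin: *\r\n"
    "Content-Length: 42\r\n"
)

# findall returns one (key, value) tuple per match when there are two groups.
old_list = old_parser.findall(http_headers)
new_list = new_parser.findall(http_headers)

print(len(old_list), 'headers with the original pattern')
print(len(new_list), 'headers with the corrected pattern')
for key, value in new_list:
    print(key, '=>', value)
```

With this input the original pattern skips the Access-Control-Allow-Origin line, while the corrected one returns all three headers.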

1 Comment

Works great! This was my very first question, and I'm very happy and a little surprised at the fast response I got from everyone who answered. Thank you so much!
