I want to parse some information from the User-Agent HTTP header. The problem is that I'm getting two User-Agent headers in the same HTTP request:

CONNECT www.facebook.com:443 HTTP/1.1
Host: www.facebook.com
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (http://iim.com/a.jph) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.
CONNECT www.facebook.com:443 HTTP/1.1
Host: www.facebook.com
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.
CONNECT www.facebook.com:443 HTTP/1.1

I want the regex to match the non-http portion, e.g. Windows NT 6.1; WOW64. The flow analyzer software I'm using uses the Java regex engine.

My attempt:

User-Agent:\s+.*?\((.*?)\)

It's matching both; I want to skip the http portion.

1 Answer

Use a negative lookahead to prevent the match of http:

User-Agent:\s+.*?\((?!http)(.*?)\)

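Though you might want to change the .*? to negated character classes. With the lazy .*?, the engine can skip past the (http...) group and capture the next parenthesised group on the same line, (KHTML, like Gecko), instead: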

User-Agent:[^(]+\((?!http)([^)]+)\)
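
Since the question mentions the Java regex engine, here is a minimal sketch of how both patterns behave in java.util.regex. The class name UserAgentPlatform, the method extractGroups, and the constants are hypothetical names chosen for illustration, and the sample text is just the headers from the question; how the flow analyzer actually applies the pattern is not known here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentPlatform {
    // Lazy version: .*? may walk past "(http...)" to a later "(" on the same line.
    static final Pattern LAZY =
            Pattern.compile("User-Agent:\\s+.*?\\((?!http)(.*?)\\)");

    // Negated-class version: [^(]+ cannot cross the first "(", so a header whose
    // first parenthesised group starts with "http" produces no match at all.
    static final Pattern NEGATED =
            Pattern.compile("User-Agent:[^(]+\\((?!http)([^)]+)\\)");

    // Sample input copied from the question (illustrative only).
    static final String SAMPLE = String.join("\n",
            "CONNECT www.facebook.com:443 HTTP/1.1",
            "Host: www.facebook.com",
            "Proxy-Connection: keep-alive",
            "User-Agent: Mozilla/5.0 (http://iim.com/a.jph) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.",
            "CONNECT www.facebook.com:443 HTTP/1.1",
            "Host: www.facebook.com",
            "Proxy-Connection: keep-alive",
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.");

    // Collects capture group 1 from every match of the pattern in the text.
    static List<String> extractGroups(Pattern pattern, String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : extractGroups(LAZY, SAMPLE)) {
            System.out.println("lazy:    " + g);
        }
        for (String g : extractGroups(NEGATED, SAMPLE)) {
            System.out.println("negated: " + g);
        }
    }
}
```

On this sample, the lazy pattern also captures KHTML, like Gecko from the first header, while the negated-class pattern captures only Windows NT 6.1; WOW64.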

3 Comments

The \s+ becomes unnecessary if you use [^(]+, which is indeed what I would also advise rather than .*?
I'm wondering if a user agent ever has the OS written like Linux (Ubuntu) 12.10, where you'd want to catch what's after the first )...
@funkwurm If you're curious, you can try: User-Agent:[^(]+\((?!http)((?:[^)]|\((?1)\))+)\) (note that (?1) is PCRE-style recursion, which java.util.regex does not support)