7

I'm trying to split this string in PHP:

11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

How can split this into IP, date, HTTP method, domain-name and browser?

1

4 Answers 4

14

This log format seems to be the Apache’s combined log format. Try this regular expression:

/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m

The matching groups are as follows:

  1. remote IP address
  2. request date
  3. request HTTP method
  4. User-Agent value

But the domain is not listed there. The second quoted string is the Referer value.

Sign up to request clarification or add additional context in comments.

6 Comments

@streetparade: Use preg_match_all and you get all matches: preg_match_all('…', $str, $matches)
well this regex doesnt compile... theres a round bracket missing ;)
This one is wrong, the auth-user field (3rd one, %u) can contain spaces.
@JulienPalard Would you mind providing an update proposal?
|
4

You should check out a regular expression tutorial. But here is the answer:

if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
  $ip = $m[1];
  $date = $m[2];
  $method = $m[3];
  $referer = $m[4];
  $browser = $m[5];
}

Take care, it's not the domain name in the log but the HTTP referer.

Comments

4

Here is some Perl, not PHP, but the regex to use is the same. This regex works to parse everything I've seen; clients can send some bizarre requests:

my ($ip, $date, $method, $url, $protocol, $alt_url, $code, $bytes,
        $referrer, $ua) = (m/
    ^(\S+)\s                    # IP
    \S+\s+                      # remote logname
    (?:\S+\s+)+                 # remote user
    \[([^]]+)\]\s               # date
    "(\S*)\s?                   # method
    (?:((?:[^"]*(?:\\")?)*)\s   # URL
    ([^"]*)"\s|                 # protocol
    ((?:[^"]*(?:\\")?)*)"\s)    # or, possibly URL with no protocol
    (\S+)\s                     # status code
    (\S+)\s                     # bytes
    "((?:[^"]*(?:\\")?)*)"\s    # referrer
    "(.*)"$                     # user agent
/x);
die "Couldn't match $_" unless $ip;
$alt_url ||= '';
$url ||= $alt_url;

Comments

2
// # Parses the NCSA Combined Log Format lines:
$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

Usage:

if (preg_match($pattern,$yourstuff,$matches)) {

    //# puts each part of the match in a named variable

    list($whole_match, $remote_host, $logname, $user, $date_time, $method, $request, $protocol, $status, $bytes, $referer, $user_agent) = $matches;

}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.