
I should say up front that I am a penguin (Linux user), not a Windows or PowerShell guy, but that shouldn't stop me from helping our Windows team.

I need to combine two logs and sort them by date and time. I think that combining them should be simple enough, but it's sorting by date and time that seems to be throwing me off a bit.

The log I am working with doesn't have the same number of columns on every line, so I am partly normalizing it in order to sort by logline[3,4], which holds the date and time.

"SMTPD" 4416    2476943 "2018-09-11 23:53:37.410"   "1.1.1.1"   "SENT: 221 goodbye"
"TCPIP" 4308    "2018-09-11 23:59:47.255"   "TCP - 1.1.1.2 connected to 1.1.1.1:25."
"SMTPD" 4308    2476952 "2018-09-11 23:22:47.255"   "1.1.1.1"   "SENT: 220 mx9.bobdestroyer.com ESMTP"
"SMTPD" 4416    2476952 "2018-09-11 23:35:47.255"   "1.2.3.4"   "RECEIVED: EHLO smtp-cow-666"
"SMTPD" 4416    2476952 "2018-09-11 23:22:47.255"   "1.1.1.1"   "SENT: 250-mx5.bobthedestroyer.com[nl]250-SIZE 20480000[nl]250-AUTH LOGIN[nl]250 HELP"
"SMTPD" 4232    2476952 "2018-09-11 23:53:47.255"   "1.1.1.1"   "RECEIVED: MAIL FROM:<[email protected]>"
"SMTPD" 4232    2476952 "2018-09-11 23:59:47.255"   "1.1.1.1"   "SENT: 250 OK"
"SMTPD" 4416    2476952 "2018-09-11 23:11:47.270"   "1.1.1.1"   "RECEIVED: RCPT TO:<[email protected]>"
"SMTPD" 4416    2476952 "2018-09-11 23:22:47.270"   "1.1.1.1"   "SENT: 250 OK"
"SMTPD" 4308    2476952 "2018-09-11 23:55:47.270"   "1.1.1.1"   "RECEIVED: DATA"
"SMTPD" 4308    2476952 "2018-09-11 23:21:47.270"   "1.1.1.1"   "SENT: 354 OK, send."
"SMTPD" 4000    2476952 "2018-09-11 09:53:48.208"   "1.1.1.1"   "SENT: 250 Queued (0.768 seconds)"
"APPLICATION"   3100    "2018-09-11 11:53:48.208"   "SMTPDeliverer - Message 2570349: Delivering message from [email protected] to [email protected] . File: C:\Program Files (x86)\servers\toomanysecrets\{49E08D79-C4A5-43F1-9435-9999999999}.eml"
"APPLICATION"   3100    "2018-09-11 12:12:48.208"   "SMTPDeliverer - Message 2570349: Relaying to host [email protected] ."

Here is what I have written:

$Unclean_LogLines = Get-Content .\BHmailLog.txt

#$LogLines | %{"$($_.Split()[0,1,2,3,4,5,6,7,8,9,10,11,12,13 ])"}


$AppendedLogLines = [System.Collections.ArrayList]@()


# Attempts to normalise the log and even out the columns, so that I can grab $_[3,4] for each line.
#perhaps a simple foreach + regex would be better....

$Unclean_LogLines | foreach-object {

    $firstcolumn = ($_ -split '\s+',4)[0]
    if($firstcolumn -eq '"APPLICATION"'){
        $_ = '"APPLICATION" ' + $_ 
         $AppendedLogLines.Add($_ + "`n")

    }

    elseif($firstcolumn -eq '"TCPIP"'){
         $_ = '"TCPIP" ' + $_ 
         $AppendedLogLines.Add($_ + "`n") # Minor problem here: I am not 100% normalising the log. I should set $_[2] = 4248 or something.

    }
    else{
      $AppendedLogLines.Add($_ + "`n")

    }


}
"FINISHED NORMALISING!! "

   $AppendedLogLines| foreach-object {


    $timestamp,$null = %{"$($_.Split()[3,4])"}
     $timestamp = $timestamp.Replace('"','') # remove the remaining quote characters


  $_ |sort-object -property { 

    }
  • Does the date and time always begin at the same column number and occupy the same number of column positions? Commented Sep 17, 2018 at 16:28
  • It doesn't. For this reason my code adds fields to the lines that don't have enough columns. After that, each line in $AppendedLogLines has the date and time at indices 3 and 4. Commented Sep 17, 2018 at 16:31
  • Normalize the lines to match the "SMTPD" lines, then write them out as a CSV file. Then import the file with Import-Csv and process the date/time field with [DateTime]::ParseExact(). Commented Sep 17, 2018 at 17:10
  • Remember, PowerShell likes objects, not text. It provides ways of converting text to objects without you having to do heavy-duty parsing by hand, the way you do in bash or perl. Commented Sep 17, 2018 at 17:12
  • Does your sample log segment above show all of the possible line formats that you will encounter? Commented Sep 17, 2018 at 17:18

2 Answers

1

In order to achieve sorting by timestamps, you don't strictly need to normalize your logs:

Get-Content ./BHmailLog.txt | 
  Sort-Object { 
    if ($_ -match '"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"') {
      [datetime] $Matches[1]
    }
  }

Note: Should there be more than one timestamp on a line, this will sort by the first one.

The approach:

  • uses Sort-Object with a calculated property, whose script block uses the -match operator with a regex and capture group to extract the timestamp from each input line,
  • converts that string timestamp to a [datetime] instance by way of a cast,
    • Note that this cast works irrespective of what culture is in effect, because PowerShell uses the invariant culture when casting / parsing from strings, where possible - and your timestamps are in a format recognized by the invariant culture.
  • which, via a calculated property ({ ... }), makes Sort-Object perform correct chronological sorting.
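
To see the extraction and cast in isolation, here is a minimal sketch that applies the same regex to a single line copied from the question's log. The variable names $line and $timestamp are only illustrative.

```powershell
# Sample line copied from the log in the question.
$line = '"TCPIP" 4308    "2018-09-11 23:59:47.255"   "TCP - 1.1.1.2 connected to 1.1.1.1:25."'

# -match fills the automatic $Matches variable; index 1 holds the first capture group.
if ($line -match '"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"') {
    $timestamp = [datetime] $Matches[1]   # the cast parses using the invariant culture

    $timestamp.GetType().Name             # DateTime
    # Format with the invariant culture so the output does not depend on local settings.
    $timestamp.ToString('yyyy-MM-dd HH:mm:ss.fff', [cultureinfo]::InvariantCulture)   # 2018-09-11 23:59:47.255
}
```

Because Sort-Object compares the [datetime] values returned by the script block, the milliseconds take part in the ordering as well.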

Note that you can use the above approach even if you have normalized your logs first, but in case you then prefer to target the timestamps by field indices:

$AppendedLogLines |
  Sort-Object { [datetime] ((-split $_)[3,4] -join ' ' -replace '"') }

While this is perhaps conceptually cleaner and easier to understand, I'm not sure which approach performs better.
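
Since the question also asks about combining two logs, the regex approach can be wrapped in a small helper. This is only a sketch: the function name Merge-LogByTimestamp and the second file name in the usage line are hypothetical, and it assumes both files use the timestamp format shown in the question.

```powershell
# Hypothetical helper: reads all given log files and returns their lines
# sorted chronologically by the first quoted timestamp on each line.
function Merge-LogByTimestamp {
    param(
        [Parameter(Mandatory)]
        [string[]] $Path
    )

    $timestampRegex = '"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"'

    # Get-Content accepts several paths and emits the lines of each file in turn.
    Get-Content -Path $Path |
        Sort-Object { if ($_ -match $timestampRegex) { [datetime] $Matches[1] } }
}
```

Usage might look like this, with .\OtherLog.txt standing in for the second log:

```powershell
Merge-LogByTimestamp -Path .\BHmailLog.txt, .\OtherLog.txt | Set-Content .\merged.log
```

Lines without a recognizable timestamp yield $null as their sort key, so they are kept rather than dropped; filter them out beforehand if that is not what you want.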


As for normalizing your log file:

$AppendedLines = switch -Regex -File .\BHmailLog.txt {
  # Malformed line, insert missing fields and add padding.
  '^(".+?")\s+(\d+)\s+(".+?")(\s+".+?")$' { 
    $Matches[1].PadRight(13) + ' ' + $Matches[2] + '    0       ' + $Matches[3] +   '   ""       ' + $Matches[4]
  }
  # Well-formed line -> add padding to 1st field, otherwise pass through  
  default {
    $first, $rest = $_ -split '\s+', 2
    $first.PadRight(13) + ' ' + $rest
  } 
}
  • The above uses a switch statement with the -Regex and -File parameters for efficiently processing the lines of a file with regex matching, where the results of a matching operation are reflected in the automatic $Matches variable

  • Malformed lines are assumed to be missing a numeric field before the first "..." field, and missing a "..." field before the last one, which are replaced with 0 and "", respectively.

Output:

"SMTPD"       4416    2476943 "2018-09-11 23:53:37.410"   "1.1.1.1"   "SENT: 221 goodbye"
"TCPIP"       4308    0       "2018-09-11 23:59:47.255"   ""          "TCP - 1.1.1.2 connected to 1.1.1.1:25."
"SMTPD"       4308    2476952 "2018-09-11 23:22:47.255"   "1.1.1.1"   "SENT: 220 mx9.bobdestroyer.com ESMTP"
"SMTPD"       4416    2476952 "2018-09-11 23:35:47.255"   "1.2.3.4"   "RECEIVED: EHLO smtp-cow-666"
"SMTPD"       4416    2476952 "2018-09-11 23:22:47.255"   "1.1.1.1"   "SENT: 250-mx5.bobthedestroyer.com[nl]250-SIZE 20480000[nl]250-AUTH LOGIN[nl]250 HELP"
"SMTPD"       4232    2476952 "2018-09-11 23:53:47.255"   "1.1.1.1"   "RECEIVED: MAIL FROM:<[email protected]>"
"SMTPD"       4232    2476952 "2018-09-11 23:59:47.255"   "1.1.1.1"   "SENT: 250 OK"
"SMTPD"       4416    2476952 "2018-09-11 23:11:47.270"   "1.1.1.1"   "RECEIVED: RCPT TO:<[email protected]>"
"SMTPD"       4416    2476952 "2018-09-11 23:22:47.270"   "1.1.1.1"   "SENT: 250 OK"
"SMTPD"       4308    2476952 "2018-09-11 23:55:47.270"   "1.1.1.1"   "RECEIVED: DATA"
"SMTPD"       4308    2476952 "2018-09-11 23:21:47.270"   "1.1.1.1"   "SENT: 354 OK, send."
"SMTPD"       4000    2476952 "2018-09-11 09:53:48.208"   "1.1.1.1"   "SENT: 250 Queued (0.768 seconds)"
"APPLICATION" 3100    0       "2018-09-11 11:53:48.208"   ""          "SMTPDeliverer - Message 2570349: Delivering message from [email protected] to [email protected] . File: C:\Program Files (x86)\servers\toomanysecrets\{49E08D79-C4A5-43F1-9435-9999999999}.eml"
"APPLICATION" 3100    0       "2018-09-11 12:12:48.208"   ""          "SMTPDeliverer - Message 2570349: Relaying to host [email protected] ."
0

Quite a challenge. I can't write code for this, but I can offer some advice. Split each line using a space as your delimiter, then look at each element (or just the 3rd or 4th one) and see whether it matches the date/time pattern. If it does, that's your sort key: put it into a hashtable as the key, with the entire line as the value. Then sort the hashtable. That's how I would approach it.
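
A rough sketch of that idea follows. It deviates from the description in two places, both of which are assumptions on my part: it uses a regex rather than splitting on spaces, so the date and time stay together, and it stores a list of lines per key, because several lines in the sample share the same timestamp and a plain key-to-line hashtable would overwrite them. The sample lines are copied from the question.

```powershell
$lines = @(
    '"SMTPD" 4232    2476952 "2018-09-11 23:59:47.255"   "1.1.1.1"   "SENT: 250 OK"'
    '"SMTPD" 4000    2476952 "2018-09-11 09:53:48.208"   "1.1.1.1"   "SENT: 250 Queued (0.768 seconds)"'
    '"TCPIP" 4308    "2018-09-11 23:59:47.255"   "TCP - 1.1.1.2 connected to 1.1.1.1:25."'
)

# Hashtable: timestamp string -> list of lines carrying that timestamp.
$byTime = @{}
foreach ($line in $lines) {
    if ($line -match '"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"') {
        $key = $Matches[1]
        if (-not $byTime.ContainsKey($key)) {
            $byTime[$key] = [System.Collections.Generic.List[string]]::new()
        }
        $byTime[$key].Add($line)
    }
}

# The timestamps are fixed-width, year-first strings, so sorting the keys
# as text also puts them in chronological order.
$sortedLines = foreach ($key in ($byTime.Keys | Sort-Object)) { $byTime[$key] }
$sortedLines
```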

1 Comment

The downside of this is that splitting on space breaks up the date/time string, where PowerShell could take the whole string and convert it to a date/time object. I would be more inclined to do as I suggested in the comments - normalize and export as CSV, then import and process the resulting objects. It may be slightly higher in overhead, but ultimately gives you more flexibility in processing, especially since you can do math on date/time objects, and sort them without having to worry about locale-dependent input or output formats for them.
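
As an illustration of the object-based idea, here is a minimal sketch that skips the CSV round trip and builds objects directly from the lines. The property names Timestamp, Source and Line are my own choice, not something taken from the log format, and the sample lines are copied from the question.

```powershell
$format = 'yyyy-MM-dd HH:mm:ss.fff'

$lines = @(
    '"SMTPD" 4416    2476943 "2018-09-11 23:53:37.410"   "1.1.1.1"   "SENT: 221 goodbye"'
    '"TCPIP" 4308    "2018-09-11 23:59:47.255"   "TCP - 1.1.1.2 connected to 1.1.1.1:25."'
    '"SMTPD" 4000    2476952 "2018-09-11 09:53:48.208"   "1.1.1.1"   "SENT: 250 Queued (0.768 seconds)"'
)

# Turn each line into an object with a real [datetime] property.
$records = foreach ($line in $lines) {
    if ($line -match '"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"') {
        [pscustomobject]@{
            Timestamp = [datetime]::ParseExact($Matches[1], $format, [cultureinfo]::InvariantCulture)
            Source    = ($line -split '\s+', 2)[0].Trim('"')   # first field, e.g. SMTPD
            Line      = $line
        }
    }
}

$sorted = $records | Sort-Object Timestamp
$sorted | Select-Object Timestamp, Source

# Date math works directly on the objects: time between first and last entry.
$span = $sorted[-1].Timestamp - $sorted[0].Timestamp
$span
```

With objects in hand, filtering by time window or grouping by Source becomes a one-liner with Where-Object or Group-Object.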
