3

I'm getting started with PowerShell and my knowledge is still very limited. I have a .log file that looks like the following:

18.7.2017 12:59:15  Starting thread: KEYWORD1
18.7.2017 12:59:33  Thread finished; ... KEYWORD1
18.7.2017 13:32:19  Starting thread: KEYWORD2
18.7.2017 13:34:8  Thread finished;... KEYWORD2

I now want to find out whether every thread that started has also finished. If there is an unfinished thread, I want to compare its timestamp with the current time.

I thought a hashtable would do the trick, and this is what I came up with:

foreach($line in Get-Content $sourceDirectory)
{
    if($line -like "*Starting thread*")
    {
        $arrStart = $line -split ' '
        $startThreads=$arrStart[$arrStart.Length-1]
        $hashmap1 = @{$arrStart[$arrStart.Length-1] = $arrStart[1]}
    }

    if($line -like "*Thread finished*")
    {
        $arrEnd = $line -split ' '
        $hashmap2 = @{$arrEnd[$arrEnd.Length-1] = $arrEnd[1]}
        $endThreads=($arrEnd[1]+" "+$arrEnd[$arrEnd.Length-1])
    }
}

How can I compare these two hashtables now?

  • You can simply group all lines by keyword and look at the groups with an odd count. Commented Aug 28, 2018 at 13:10

3 Answers

2

JPBlanc recommends grouping the records in a comment on the question, and the Group-Object cmdlet indeed offers a conceptually elegant solution:

Note: The assumption is that if a given keyword only has one entry, it is always the starting entry.

Select-String 'Starting thread:|Thread finished;' file.log | 
  Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 }
  • The Select-String call extracts only the lines of interest (a thread starting, a thread finishing), using a regex (regular expression).

  • The Group-Object call groups the resulting lines by the last ([-1]) whitespace-separated token (-split ...) on each line ($_), i.e., the keywords.

  • Where-Object then returns only those resulting groups that have an odd number of entries, i.e., those that aren't paired, representing the started-but-not-finished threads.

This yields something like the following:

Count Name          Group
----- ----          -----
    1 KEYWORD3      {/Users/jdoe/file.log:5:28.8.2018 08:59:16  Starting thread: KEYWORD3}

This is probably not the format you want, but given that the outputs are objects, as is typical in PowerShell, you can easily process them to your liking programmatically.

Technically, the above command outputs [Microsoft.PowerShell.Commands.GroupInfo] instances whose .Group property in this case contains [Microsoft.PowerShell.Commands.MatchInfo] instances, as output by Select-String.
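
As a minimal sketch of such processing, the following turns each odd-count group into a custom object. The sample lines and the property names Keyword and LastLine are assumptions made for illustration, and the lines are piped in directly instead of being read from file.log:

```powershell
# Sample log lines (assumed data, modeled on the format in the question).
$logLines = @(
  '18.7.2017 12:59:15  Starting thread: KEYWORD1'
  '18.7.2017 12:59:33  Thread finished; ... KEYWORD1'
  '28.8.2018 08:59:16  Starting thread: KEYWORD3'
)

$unfinished = $logLines |
  Select-String 'Starting thread:|Thread finished;' |
  Group-Object { (-split $_.Line)[-1] } |       # group by the last token (the keyword)
  Where-Object { $_.Count % 2 -eq 1 } |         # keep unpaired groups only
  ForEach-Object {
    [pscustomobject]@{
      Keyword  = $_.Name                                     # the grouping key
      LastLine = ($_.Group | Select-Object -Last 1).Line     # most recent matching line
    }
  }

$unfinished
```

Each GroupInfo exposes the grouping key as .Name, and each MatchInfo exposes the matched text as .Line, so no re-parsing of the formatted table is needed.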


The following code extends the one above to produce custom output that reports how much time has elapsed since each unfinished thread started:

$now = Get-Date
Select-String 'Starting thread:|Thread finished;' file.log  | 
  Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 } | ForEach-Object {
    foreach ($matchInfo in $_.Group) { # loop over started-only lines
      $tokens = -split $matchInfo.Line # split into tokens by whitespace
      $date, $time = $tokens[0..1]     # extract date and time (first 2 tokens)
      $keyword = $tokens[-1]           # extract keyword (last token)
      # Parse date+time into a [datetime] instance.
      # Note: Depending on the current culture, [datetime]::Parse("$date $time") may do.
      # 'H:m:s' also accepts single-digit components such as the '13:34:8' in the sample log.
      $start = [datetime]::ParseExact("$date $time", 'd\.M\.yyyy H:m:s', [cultureinfo]::InvariantCulture)
      # Custom output string containing how long ago the thread was started:
      "Thread $keyword hasn't finished yet; time elapsed since it started: " +
        ($now - $start).ToString('g')
    }
  }

This yields something like the following:

Thread KEYWORD3 hasn't finished yet; time elapsed since it started: 2:03:35.347563

2:03:35.347563 (2 hours, 3 minutes, ...) is the string representation of a [TimeSpan] instance that is the result of subtracting two points in time ([datetime] instances).
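
To illustrate working with the [TimeSpan] itself rather than its string form, here is a small sketch. The fixed timestamps and the one-hour threshold are assumptions chosen for the example; in practice $now would come from Get-Date:

```powershell
# Parse a sample timestamp (assumed value) in the log's d.M.yyyy format.
$start = [datetime]::ParseExact('28.8.2018 08:59:16', 'd\.M\.yyyy H:m:s',
                                [cultureinfo]::InvariantCulture)

# Fixed "current" time so the example is reproducible; use Get-Date in real code.
$now = $start.AddHours(2).AddMinutes(3).AddSeconds(35)

$elapsed = $now - $start            # subtracting two [datetime] values yields a [timespan]

$elapsed.GetType().Name             # type name of the result
$elapsed.TotalMinutes               # whole duration expressed in minutes
$elapsed.ToString('g')              # general short format, as used above

# Time spans can be compared directly, e.g. to flag long-running threads:
$isOverdue = $elapsed -gt [timespan]::FromHours(1)
$isOverdue
```

Comparing against a [timespan] threshold avoids having to parse the formatted string again.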


1

It looks like you are trying to build two hashtables, one for started and one for finished threads, with the keyword being the important information. Since you really only need that one piece of information, an array would be a better data type than a hashtable.

# Find lines with `Starting thread` and drop everything before the final space to get the array of KEYWORDS that started.
# Named parameters are used because Select-String binds its first positional argument to -Pattern, not -Path.
$Start = (Select-String -Path $sourceDirectory -Pattern 'Starting thread') -replace '^.*Starting thread.*\s+'
# Find lines with `Thread finished` and drop everything before the final space to get the array of KEYWORDS that finished.
$Finish = (Select-String -Path $sourceDirectory -Pattern 'Thread finished') -replace '^.*Thread finished.*\s+'
# Find everything that started but hasn't finished.
$Start.where({$_ -notin $Finish})

Notes: Requires PowerShell 4+ for the .where() method (the -notin operator is available from version 3). This also assumes that a thread doesn't start and stop multiple times.
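
If a thread can start and stop more than once, one possible variation (a sketch, not part of the approach above) is a single hashtable of currently open threads: add the keyword on a start line, remove it on a finish line. The sample lines and the variable name $open are assumptions for illustration:

```powershell
# Sample log lines (assumed data); KEYWORD1 starts, finishes, and starts again.
$logLines = @(
  '18.7.2017 12:59:15  Starting thread: KEYWORD1'
  '18.7.2017 12:59:33  Thread finished; ... KEYWORD1'
  '18.7.2017 13:32:19  Starting thread: KEYWORD2'
  '18.7.2017 13:40:02  Starting thread: KEYWORD1'
)

$open = @{}   # keyword -> "date time" string of the most recent unmatched start

foreach ($line in $logLines) {
  $tokens = -split $line                    # split on runs of whitespace
  if ($tokens.Count -lt 3) { continue }     # skip blank or malformed lines
  $keyword = $tokens[-1]                    # keyword is the last token

  if ($line -like '*Starting thread*') {
    $open[$keyword] = '{0} {1}' -f $tokens[0], $tokens[1]
  }
  elseif ($line -like '*Thread finished*') {
    $open.Remove($keyword)                  # no-op if the keyword isn't present
  }
}

$open   # whatever remains has started but not finished
```

Because entries are removed as soon as a finish line is seen, no comparison of two separate collections is needed at the end.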


1

One way to do this is to use a regex to pull each line apart and then create a new object from the details. For example:

Get-Content .\data.txt |
    ForEach-Object {
        if ($_ -match "^(?<time>(\d+\.){2}\d+ (\d{2}:){2}\d{2}).*(?<state>Starting|finished).*\b(?<keyword>\w+)$")
        {
            [PsCustomObject]@{
                Keyword = $matches.keyword
                Action = $(if($matches.state -eq "Starting"){"Start"}else{"Finish"})
                Time = (Get-Date $matches.time)  # parsed according to the current culture's date format
            }
        }
    }

Assume you have a log file (data.txt) with the following content:

18.7.2017 12:59:15  Starting thread: KEYWORD1
18.7.2017 13:32:19  Starting thread: KEYWORD2
18.7.2017 12:59:15  Starting thread: KEYWORD3
18.7.2017 13:34:18  Thread finished;... KEYWORD2
18.7.2017 12:59:15  Starting thread: KEYWORD4
18.7.2017 13:34:18  Thread finished;... KEYWORD3
18.7.2017 12:59:15  Starting thread: KEYWORD5
18.7.2017 13:34:18  Thread finished;... KEYWORD5

Running the above code against it gives the following output:

Keyword  Action Time               
-------  ------ ----               
KEYWORD1 Start  18/07/2017 12:59:15
KEYWORD2 Start  18/07/2017 13:32:19
KEYWORD3 Start  18/07/2017 12:59:15
KEYWORD2 Finish 18/07/2017 13:34:18
KEYWORD4 Start  18/07/2017 12:59:15
KEYWORD3 Finish 18/07/2017 13:34:18
KEYWORD5 Start  18/07/2017 12:59:15
KEYWORD5 Finish 18/07/2017 13:34:18

This isn't much of an improvement over the raw file, but now that you have objects, you can process them more easily. For example, you can see which keywords lack a matching start/finish by appending the following after the final closing brace:

| Group-Object Keyword -NoElement | Sort-Object Count -Descending

This gives output like this:

Count Name                     
----- ----                     
    2 KEYWORD2                 
    2 KEYWORD3                 
    2 KEYWORD5                 
    1 KEYWORD1                 
    1 KEYWORD4  

It is now easier to see which keywords have a start/finish pair (i.e., 2 items in their group).

This is probably a bit overkill for your scenario, but as you said you were new to PowerShell, I thought I'd mention it, as it is often very useful to turn text into objects like this for processing.
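
Going one step further, the same objects can be filtered down to the unfinished threads along with the time elapsed since they started. This is only a sketch: the $events array stands in for the output of the parsing code above, and the fixed $now value is an assumption so that the example is reproducible (use Get-Date in practice):

```powershell
# Stand-in for the parsed objects (assumed sample data).
$events = @(
  [pscustomobject]@{ Keyword = 'KEYWORD1'; Action = 'Start';  Time = [datetime]'2017-07-18T12:59:15' }
  [pscustomobject]@{ Keyword = 'KEYWORD2'; Action = 'Start';  Time = [datetime]'2017-07-18T13:32:19' }
  [pscustomobject]@{ Keyword = 'KEYWORD2'; Action = 'Finish'; Time = [datetime]'2017-07-18T13:34:18' }
)

$now = [datetime]'2017-07-18T14:00:00'   # fixed reference time for the example

$report = $events |
  Group-Object Keyword |
  Where-Object { $_.Group.Action -notcontains 'Finish' } |   # groups without a finish entry
  ForEach-Object {
    $start = ($_.Group | Where-Object Action -eq 'Start' | Select-Object -First 1).Time
    [pscustomobject]@{
      Keyword = $_.Name
      Started = $start
      Elapsed = $now - $start            # a [timespan]
    }
  }

$report
```

Like the grouping above, this assumes each keyword starts at most once; a keyword that restarts after finishing would need a count-based check instead.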
