1

So, I have some data that looks like this:

19/10/2020 05:57:08.200, 362173.64, 4498564.26, 10.000, 10.000,   0.000, 0, 3.2,  
19/10/2020 05:57:08.270, 362173.64, 4498564.38, 10.000, 10.000,   0.000, 0, 3.2,  
19/10/2020 05:57:08.340, 362173.64, 4498564.49, 10.000, 10.000,   0.000, 0, 3.2,  
19/10/2020 05:57:08.410, 362173.64, 4498564.61, 10.000, 10.000,   0.000, 0, 3.7,  
19/10/2020 05:57:08.470, 362173.64, 4498564.72, 10.000, 10.000,   0.000, 0, 2.8,  

I need to sort the data by timestamp. Under Linux, I would run this data through awk to pick apart the date and time so that it looks like this:

2020 05 10 19 57 08.200  362173.64  4498564.26  10.000  10.000    0.000  0  3.2  
2020 05 10 19 57 08.270  362173.64  4498564.38  10.000  10.000    0.000  0  3.2  
2020 05 10 19 57 08.340  362173.64  4498564.49  10.000  10.000    0.000  0  3.2  

Using a command like this:

awk -F'[/:,]' '{print $3,$2,$1,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14}'

Then I run that through sort and back through awk to put the columns back in the correct order:

awk -F'[/:,]' '{print $3,$2,$1,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14}' | sort | 
awk -F' ' '{print $3"/"$2"/"$1,$4":"$5":"$6","$7","$8","$9","$10","$11","$12","$13","$14}'

Is there a better way to do this in PowerShell, or at least a way to do the same thing? I have a PowerShell script that very clumsily does something similar, but somewhere in the process I lose the newlines:

$nav = (gc $navfile | %{$_ -replace ", ",","} | %{$_ -replace "  ",""} | %{"$($_.Split('[/,:]')[2,1,0,3,4,5,6,7,8,9,10,11,12,13,14])"} | sort-object | %{"$($_.Split(' ')[2,1,0,3,4,5,6,7,8,9,10,11,12,13,14])"} )  

[regex]$sorted = " "


$one = $sorted.replace($nav, "/", 2)  
$two = $sorted.replace($one, "!", 1)  
$three = $sorted.replace($two, ":", 2)  
$four = $three | %{$_ -replace " ",","}  
$five = $four | %{$_ -replace "!"," "}  
$six = $five | %{$_ -replace ",,","`n"}  
echo $six  

Any help would be greatly appreciated.

4
  • "powershell" is a Windows thing, right? And if you're on Windows then you're using GNU awk, and GNU awk comes with its own sort functionality, so why not just do it all in one call to awk? If you edit your question to show the expected output then we can help you. I tried running the awk | sort | awk pipeline you provided on the data you posted but the output didn't make sense, so I assume that script or the sample input isn't actually correct. Isn't your posted sample input already sorted by timestamp? If so, that's not useful for testing a sorting function. Commented Jan 18, 2021 at 16:24
  • @EdMorton PowerShell is a general shell that's also available for Linux and macOS. Commented Jan 18, 2021 at 17:03
  • I'm making a portable tool for others to use under Windows, which is why I'm working in PowerShell. The data in the example happens to be sorted, but the full data is not. I may have missed something in explaining the awk code, but I have a functioning script under Linux. Commented Jan 18, 2021 at 17:37
  • When asking a question it's important to show a minimal reproducible example that demonstrates the problem (so if your problem is how to sort data, provide unsorted data for the sample input and the same data sorted for the expected output) along with actual code that does what you say with the example you provide. Otherwise it's like asking your mechanic to help get your car started when you actually have a horse. Commented Jan 18, 2021 at 18:52

2 Answers

3

In PowerShell, you can use the Sort-Object cmdlet, and you don't need to modify the source file at all:

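Get-Content $navfile | Sort-Object {
  # Invariant culture: the "/" in the format is then matched literally, whatever the machine's regional settings are
  [datetime]::ParseExact($_.Substring(0, 23), 'dd/MM/yyyy HH:mm:ss.fff', [cultureinfo]::InvariantCulture)
}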

The $_.Substring(0, 23) expression extracts the first 23 characters of each line, i.e. 19/10/2020 05:57:08.200. The [datetime]::ParseExact() call then parses that string as a timestamp with the format dd/MM/yyyy HH:mm:ss.fff, and the resulting [datetime] value is used as the sorting key, so the lines themselves are passed through unchanged.


1

If your input really was unsorted:

$ cat file
21/10/2020 05:57:08.470, 362173.64, 4498564.72, 10.000, 10.000,   0.000, 0, 2.8,
19/10/2020 05:57:08.340, 362173.64, 4498564.49, 10.000, 10.000,   0.000, 0, 3.2,
08/10/2020 05:57:08.270, 362173.64, 4498564.38, 10.000, 10.000,   0.000, 0, 3.2,
19/10/2020 05:57:08.410, 362173.64, 4498564.61, 10.000, 10.000,   0.000, 0, 3.7,
19/10/2020 05:57:08.200, 362173.64, 4498564.26, 10.000, 10.000,   0.000, 0, 3.2,

here's how you could sort it by date+time just using GNU awk:

$ awk -F'[/ ]' '
    { rec[$3 $2 $1 $4] = $0 }
    END { PROCINFO["sorted_in"]="@ind_str_asc"; for (ts in rec) print rec[ts] }
' file
08/10/2020 05:57:08.270, 362173.64, 4498564.38, 10.000, 10.000,   0.000, 0, 3.2,
19/10/2020 05:57:08.200, 362173.64, 4498564.26, 10.000, 10.000,   0.000, 0, 3.2,
19/10/2020 05:57:08.340, 362173.64, 4498564.49, 10.000, 10.000,   0.000, 0, 3.2,
19/10/2020 05:57:08.410, 362173.64, 4498564.61, 10.000, 10.000,   0.000, 0, 3.7,
21/10/2020 05:57:08.470, 362173.64, 4498564.72, 10.000, 10.000,   0.000, 0, 2.8,

Obviously that's being run on Unix - apply Windows magic quoting rules as/if appropriate (or just save the script in a file and run it as awk -f script file).
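If GNU awk isn't available (mawk, for example, has no sorted_in), the same idea can be sketched with any POSIX awk plus sort: prefix each line with a sortable key, sort, then strip the key again. This is only a sketch under assumptions: the function name sort_by_timestamp is made up for illustration, every line is assumed to start with dd/mm/yyyy, a single space, then HH:MM:SS.fff, and the lines themselves are assumed not to start with a tab.

```shell
# Decorate-sort-undecorate with POSIX tools only (no gawk extensions).
# sort_by_timestamp is a hypothetical helper name, not part of any tool.
sort_by_timestamp() {
    awk '{
        # With the default field separator, $1 is dd/mm/yyyy
        # and $2 is HH:MM:SS.fff followed by a comma.
        split($1, d, "/")
        # Build a fixed-width key (yyyymmdd + time), then a tab, then the untouched line.
        print d[3] d[2] d[1] $2 "\t" $0
    }' "$@" |
    LC_ALL=C sort |   # plain byte-wise sort on the prefixed key
    cut -f2-          # drop the key, keep the original line
}
```

It reads from the files given as arguments or from standard input, e.g. sort_by_timestamp file. Unlike the array-based version, lines that share an identical timestamp are all kept, because nothing is stored under a key.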

12 Comments

I don't understand what your awk code is doing, but when I run it, the output is unsorted.
@SamAlleman it's just creating an array of records (each line of input being a record) indexed by the date+time values, then printing the contents of that array in the order of its indices, see gnu.org/software/gawk/manual/…. If the output you get isn't sorted then either you aren't using GNU awk, or you're using a very old version of it that doesn't support sorted_in, or your real input doesn't look like the example you provided. Add some print statements throughout it to dump the values used if it's not obvious.
Ah, I was using mawk, not gnu awk. Let me give that another try. Thanks!
GNU awk fixed it, but it's about half the speed of what I was doing before. I guess mawk doesn't support "sorted_in"...
Right, mawk is traditionally a minimal-featured awk designed for speed of execution, whereas gawk is feature-rich and designed for enhanced functionality. Also, sort is a utility designed specifically to sort data, so you should expect it to do so faster than some other tool that just does sorting as an extra bit of functionality. Having said that, half the speed sounds like a lot. I wonder if the script you're running is doing exactly what I show or other things too that might be slowing it down, and I wonder if you did 3rd-run timing or if you're seeing the impact of caching.