1

I have a pipe delimited flat file from which I need to deduplicate the entries based on an object, to be specific a part of file is:

"001A"|"1"|"*"||"A"|"504367667"|"1"|"2005-06-10-16.57.23.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2005-10-24-16.52.29.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2007-12-13-15.48.47.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2008-12-09-17.10.39.000000"|
"001B"|"1"|"*"||"B"|"800026800"|"1"|"2005-08-08-10.48.16.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2006-01-19-12.03.08.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2009-03-12-15.08.11.000000"|

The first field is ID and last field is a timestamp, I want to deduplicate the entries such that only the latest timestamp entry is kept for each ID. So, The output that I need should be:

"001A"|"1"|"*"||"A"|"504367667"|"1"|"2008-12-09-17.10.39.000000"|
"001B"|"1"|"*"||"B"|"800026800"|"1"|"2005-08-08-10.48.16.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2009-03-12-15.08.11.000000"|

I read the file and stored the entries in an array with distinct object names, then I tried

$inputdeduped = $inputfilearray | Sort-Object Date
$inputdeduped = $inputdeduped | Select-Object ID -Unique

hoping that once the date is sorted, get-unique cmdlet used as -unique here would either pick either the first or last of the duplicated entry in the sorted array so depending on that I would sort the date in either desc or asc order, however it doesn't and randomly picks one entry.

Please help me out guys or help me understand how the get-unique cmdlet works.

1 Answer 1

2

you can try this:

  $newInputdeduped =  $inputfilearray | sort id, date -ascending | group -Property id |
     select @{n="GroupedList"; e={ $_.group | select -first 1 }} |
     select -expa list 

This is what I do with your example data after saving it as a txt file:

> $a = Import-Csv -Header "id","n1","n2","v1","n3","n4","n5","date"  -Path .\c.txt -delimiter '|'

> $a | ft -AutoSize

id   n1 n2 v1 n3 n4        n5 date
--   -- -- -- -- --        -- ----
001A 1  *     A  504367667 1  2005-06-10-16.57.23.000000
001A 1  *     A  504367667 1  2005-10-24-16.52.29.000000
001A 1  *     A  504367667 1  2007-12-13-15.48.47.000000
001A 1  *     A  504367667 1  2008-12-09-17.10.39.000000
001B 1  *     B  800026800 1  2005-08-08-10.48.16.000000
001C 1  *     C  490349139 1  2006-01-19-12.03.08.000000
001C 1  *     C  490349139 1  2009-03-12-15.08.11.000000

> $b = $a | sort id, date -ascending | group -Property id | select @{n="list";e={ $_.group | select -first 1  }} | select -expa list

    > $b | ft -AutoSize

id   n1 n2 v1 n3 n4        n5 date
--   -- -- -- -- --        -- ----
001C 1  *     C  490349139 1  2009-03-12-15.08.11.000000
001B 1  *     B  800026800 1  2005-08-08-10.48.16.000000
001A 1  *     A  504367667 1  2008-12-09-17.10.39.000000
Sign up to request clarification or add additional context in comments.

2 Comments

Dint work, It gives out a blank file. I can understand that your solution would group the sorted input by IDs and then in the third pipe you are selecting the first row of the group however I dont understand whats happening in the last 2 pipes.
group-object return an array of custom objects , with the two select-object I rebuilt an array like the one piped in. I can't figure how you have built the $InputFilearray.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.