How can I find and remove duplicate data using batch, sed or awk? The goal is to remove every duplicate "Changelist: XXXXX" entry from the data.txt file. I'm stuck; can somebody help me?

Please take a look at output.txt for the desired output.

data.txt

====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result: 
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result:  
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================

output.txt

    ====================================
     Changelist: 808298
     Date: 2015/03/19
     Developer: A
     ShortDescr: Checking in the following graphics:

     CodeReview: 
     CodeReview: Result: @result___
     ====================================
     Changelist: 808273
     Date: 2015/03/19
     Developer: B
     ShortDescr: Hello

     CodeReview: Result: 
     ====================================
     Changelist: 808271
     Date: 2015/03/19
     Developer: C
     ShortDescr: HI

     CodeReview: 
     ====================================
      Changelist: 808277
     Date: 2015/03/19
     Developer: D
     ShortDescr: HEY

     CodeReview: 
     ====================================


glenn's output.txt

 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview:
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview:
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview:
 ====================================$sep
  • Do you mean bash, not batch? Commented Mar 20, 2015 at 16:11
  • I'm sorry, yes, that's what I meant. Commented Mar 20, 2015 at 16:14
  • Do you want to keep the first one found? Or the last one? Commented Mar 20, 2015 at 16:26
  • Keep the first one found. Commented Mar 20, 2015 at 16:28
  • Are you sure the data sharing the same Changelist will be exactly the same? Commented Mar 20, 2015 at 16:29

1 Answer

This is actually a very common task with awk:

    sep='====================================\n'
    awk -F'\n' -v RS="$sep" -v ORS="$sep" '!seen[$1]++' data.txt > output.txt

Here, $sep is the awk record separator, so each block between separator lines is read as one record, and newline is the field separator, so each line of a block is one field.

!seen[$1]++ is an expression that is true only for the first record in which a particular field 1 (the Changelist line) is encountered. Since no action is given, the default action is to print the current record, with the output record separator appended.
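
The same first-occurrence logic can also be spelled out line by line, which avoids relying on a multi-character RS (POSIX leaves that behavior unspecified, though gawk supports it). This is only a minimal sketch, not the one-liner above: the function name dedupe_changelists is hypothetical, and using the Changelist number (rather than the whole first line) as the key is an assumption made for illustration.

```shell
# Hypothetical helper: print a changelist file with repeated blocks removed,
# keeping the first block seen for each Changelist number.
dedupe_changelists() {
    awk '
        # A separator is a line made only of "=" signs, possibly indented.
        /^[ \t]*=+[ \t\r]*$/ {
            if (!started) {
                print                      # opening separator of the file
                started = 1
            } else if (key == "" || !seen[key]++) {
                printf "%s", block         # first occurrence: emit the block
                print                      # followed by its closing separator
            }
            block = ""
            key = ""
            next
        }
        # Any other line is buffered until the closing separator arrives.
        { block = block $0 "\n" }
        # The first "Changelist:" line of a block supplies the key.
        key == "" && $1 == "Changelist:" { key = $2 }
        # Lines after the last separator are passed through unchanged.
        END { printf "%s", block }
    ' "$1"
}
```

It would be called as `dedupe_changelists data.txt > output.txt`. Blocks that contain no Changelist line are passed through unchanged, and a duplicate block is dropped together with its closing separator.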

6 Comments

glenn, thanks for the quick response, but it didn't remove the duplicate block.
Hmm, it did for me. What's your OS?
I'm running on Windows.
It seems your environment didn't perform the variable expansion. Are you running some kind of shell?
I think that single quotes have no special meaning on Windows, so try changing all single quotes to double quotes.
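
One way to sidestep the quoting and expansion differences discussed in these comments is to keep the awk program in its own file and run it with -f, so that no shell has to parse the quotes or expand $sep. The following is a sketch under assumptions: the file name dedupe.awk and the wrapper dedupe_with_script are hypothetical, the awk in use must accept a multi-character RS (gawk does), and the input is assumed to have Unix line endings. The here-document is only there to make the example self-contained; the file could just as well be created in an editor.

```shell
# Write the awk program to a file (hypothetical name: dedupe.awk).
cat > dedupe.awk <<'EOF'
BEGIN {
    FS = "\n"    # each line of a block is one field
    # The separator line ends a record on input and is re-added on output.
    RS = ORS = "====================================\n"
}
# Print a record only the first time its first line ($1) is seen.
!seen[$1]++
EOF

# Hypothetical wrapper; usage: dedupe_with_script data.txt > output.txt
dedupe_with_script() {
    awk -f dedupe.awk "$1"
}
```

With the program in a file, the command line reduces to `awk -f dedupe.awk data.txt > output.txt`, which contains no quotes or variables for the shell to interpret.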