I have been playing with awk and sed. I have a file with the following format:
0000098236|Q1.1|one|Q2.1|one|Q3.1|one
0000027965|Q1.5|five|Q1.1|one|Q2.1|one
0000083783|Q1.1|one|Q1.5|five|Q2.1|one
0000027965|Q1.1|one|Q1.1|one|Q1.5|five
0000083983|Q1.1|one|Q1.5|five|Q2.1|one
0000083993|Q1.3|three|Q1.4|four|Q1.2|two
I want to transform each QX.X code to a specific numerical value. I accomplished that with sed:
sed -e 's/\<Q1\.1\>/88/g' \
    -e 's/\<Q1\.2\>/89/g' \
    -e 's/\<Q1\.3\>/90/g' \
    -e 's/\<Q1\.4\>/91/g' \
    -e 's/\<Q1\.5\>/92/g'
and so on. So far so good. After I do this I get:
0000098236|88|one|88|one|88|one
0000027965|92|five|88|one|88|one
0000083783|88|one|92|five|88|one
0000027965|88|one|88|one|92|five
0000083983|88|one|92|five|88|one
0000083993|90|three|91|four|89|two
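As an aside, the same substitution could probably be done in one awk pass with a lookup table instead of one sed expression per code. The following is only a sketch: the function name `map_codes` is made up for illustration, the values for Q2.1 and Q3.1 are inferred from the output shown above, and it assumes the codes always sit in the even-numbered fields.

```shell
# map_codes is a hypothetical helper name, not something from the original post.
map_codes() {
  awk 'BEGIN {
         FS = OFS = "|"
         # Lookup table: question code -> numeric value.
         map["Q1.1"] = 88; map["Q1.2"] = 89; map["Q1.3"] = 90
         map["Q1.4"] = 91; map["Q1.5"] = 92
         # Inferred from the sample output, where Q2.1 and Q3.1 both become 88.
         map["Q2.1"] = 88; map["Q3.1"] = 88
       }
       {
         # Assumption: codes are in fields 2, 4, 6, ... and labels follow them.
         for (i = 2; i <= NF; i += 2)
           if ($i in map) $i = map[$i]
         print
       }' "$@"
}

printf '%s\n' '0000098236|Q1.1|one|Q2.1|one|Q3.1|one' | map_codes
# prints: 0000098236|88|one|88|one|88|one
```

Because only whole fields are compared against the table, a code such as Q1.1 cannot accidentally match inside a longer field.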
The delimiter is the pipe. Now I need to remove the duplicate pairs:
- I want to always keep the first value
- I want to group the rest in pairs, so in the first line above, 88|one is one pair
- I want to create a file that takes the duplicate pairs out from a single line
So the file above should look something like the following after running the transformation:
0000098236|88|one
0000027965|95|five|88|one
0000083783|88|one|92|five
0000027965|88|one|88|one
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two
I tried to use awk and arrays but cannot get it to work.
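For reference, here is a minimal sketch of the awk-with-arrays idea. It assumes that "duplicate" means the same code|label pair appearing more than once on one line, that only the first occurrence is kept, and that every line is an ID followed by complete pairs. The function name `dedupe_pairs` is hypothetical. The expected listing above does not follow this rule on every line, so the rule itself is an assumption.

```shell
# dedupe_pairs is a hypothetical helper name used only for this sketch.
dedupe_pairs() {
  awk 'BEGIN { FS = "|" }
       {
         split("", seen)            # portable way to empty the per-line array
         out = $1                   # always keep the first field (the ID)
         for (i = 2; i < NF; i += 2) {
           pair = $i FS $(i + 1)    # code and label joined back with the pipe
           if (!(pair in seen)) {   # first time this pair shows up on the line
             seen[pair] = 1
             out = out FS pair
           }
         }
         print out
       }' "$@"
}

printf '%s\n' \
  '0000098236|88|one|88|one|88|one' \
  '0000027965|88|one|88|one|92|five' \
  '0000083993|90|three|91|four|89|two' | dedupe_pairs
# prints:
# 0000098236|88|one
# 0000027965|88|one|92|five
# 0000083993|90|three|91|four|89|two
```

The `seen` array is cleared for every input line, so a pair is only treated as a duplicate within its own line, never across lines.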
92|five is removed even though it occurs once, but two occurrences of 88|one are retained. Line 2 has a 95 in the original, but 92 in the filtered.