I have a list of devices from which I need to remove duplicates (keeping only the first occurrence) while preserving order and matching a condition. In this case I'm looking for lines containing a specific string and then printing the field with the device name. Here is some example raw data from the sar application:

10:02:01 AM       sdc      0.70      0.00      8.13     11.62      0.00      1.29      0.86      0.06
10:02:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:02:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      1.31      3.73     99.44     78.46      0.02     17.92      0.92      0.12
Average:          sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:05:01 AM       sdc      2.70      0.00     39.92     14.79      0.02      5.95      0.31      0.08
10:05:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:05:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:06:01 AM       sdc      0.83      0.00     10.00     12.00      0.00      0.78      0.56      0.05
11:04:01 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:04:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      0.70      2.55      8.62     15.91      0.00      1.31      0.78      0.05
Average:          sda      0.12      0.95      0.00      7.99      0.00      0.60      0.60      0.01
Average:          sdb      0.22      1.78      0.00      8.31      0.00      0.54      0.52      0.01

The following gives me the list of devices from lines containing the word "Average", but it does not preserve the input order (the output comes out sorted):

sar -dp | awk '/Average/ {devices[$2]} END {for (device in devices) {print device}}'
sda
sdb
sdc

The following gives me exactly what I want (command from here):

sar -dp | awk '/Average/ {print $2}' | awk '!devices[$0]++'
sdc
sda
sdb

Maybe I'm missing something painfully obvious, but I can't figure out how to do the same in one awk command, that is, without piping the output of the first awk into the second awk.

2 Answers

3

You can do:

sar -dp | awk '/Average/ && !devices[$2]++ {print $2}' 
sdc
sda
sdb

The problem is the for (device in devices) loop. awk does not guarantee the order in which a for-in loop visits array keys, so the output does not follow the input order.
I once read a long, complicated explanation of why somewhere, but I don't have the link.
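
To make the combined condition easier to follow, here is a minimal sketch that spells the idiom out as an explicit if. The function name unique_devices, the array name seen, and the shortened sample lines are my own assumptions for illustration; they only mimic the shape of the sar -dp output in the question.

```shell
# Hypothetical helper: print each device (field 2) from "Average" lines once,
# in order of first appearance.
unique_devices() {
  awk '
    /Average/ {
      # seen[$2] is empty (treated as 0) the first time a device appears,
      # so the test passes; the post-increment marks it as seen afterwards.
      if (seen[$2]++ == 0)
        print $2
    }'
}

# Sample lines shaped like the sar -dp output in the question (illustrative only).
printf '%s\n' \
  '10:02:01 AM sdc 0.70' \
  'Average: sdc 1.31' \
  'Average: sda 0.00' \
  'Average: sdb 0.00' \
  'Average: sdc 0.70' \
  'Average: sda 0.12' | unique_devices
```

Because each device is printed at the moment it is first seen, no for-in loop is involved and the input order is kept.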

4 Comments

awk makes no claims about the order of keys retrieved from an array, as far as I know. Though in gawk 4 you can tell it which sorting to use when retrieving keys (but I don't know if "input order" is an option).
Awk arrays are stored as hash tables for efficiency. The in operator retrieves the elements from the array in the order they are stored in memory, i.e. in whatever order the hashing algorithm arranges them. If you need an array traversed in a specific order you need to decide which order (insertion order? alphabetical? numerical? by element? by index? something else?) and program that order somehow. With GNU awk you can assign an order by populating PROCINFO["sorted_in"], see gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array.
@EdMorton Thanks for the refresher. My memory is somewhat limited and for some reason has started to remove stuff by itself without telling me :) This is the link to sorted_in: gnu.org/software/gawk/manual/…
@Jotne tell me about it. I learned French in school and a few years ago started learning Spanish which I eventually realized was just pushing the French out of my brain to make room. The net result is that I now can speak neither of them and am just barely holding onto English....
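
As a follow-up to the point in the comments about programming the traversal order yourself, here is one possible sketch of insertion-order output in portable awk, for cases where the printing really has to happen in END. The function name ordered_devices and the arrays pos and order are hypothetical names chosen for this example.

```shell
# Hypothetical helper: collect devices from "Average" lines, then print them
# in END in the order they were first seen.
ordered_devices() {
  awk '
    /Average/ && !($2 in pos) {
      pos[$2] = ++n      # mark the device as seen and give it a position
      order[n] = $2      # record devices by position of first appearance
    }
    END {
      # A numeric loop over 1..n avoids for-in and its unspecified order.
      for (i = 1; i <= n; i++)
        print order[i]
    }'
}

# Illustrative input only; real use would be: sar -dp | ordered_devices
printf '%s\n' \
  'Average: sdc 1.31' \
  'Average: sda 0.00' \
  'Average: sdb 0.00' \
  'Average: sdc 0.70' | ordered_devices
```

The second array costs a little extra memory, but it works in any awk; with GNU awk, PROCINFO["sorted_in"] as mentioned above is another route.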
1
awk '/Average/ && !devices[$2]++ {print $2}' sar.in

You just need to combine the two tests. The only caveat is that in the original pipeline the second awk sees only field two of the original input as its entire line ($0), so in the combined command you need to replace $0 with $2.

1 Comment

This looks very much like my post :)
