Conditional Sort using Awk or sort

Question

Alright, so I asked a question a week or so ago about how I could use sed or awk to extract a block of text between two blank lines, as well as omit part of the extracted text. The answers I got pretty much satisfied my needs, but now I'm doing something extra for fun (and for OCD's sake).

I want to sort the output from awk in this round. I found this question & answer but it doesn't quite help me to solve the problem. I've also tried wrapping my head around a lot of awk documentation as well to try and figure out how I could do this, to no avail.

So here's the block of code in my script that does all the dirty work:

# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
# Then it gets piped into awk to work some more formatting magic...
# The following is a step-by-step description since I can't put comments inside the awk block:
# BEGIN:
#     Set the record separator to a blank line
#     Set the input/output field separators to newlines
# FNR == 3:
#     The third block of dig's output is the nameservers reported by the registrar
#     Also blanks the last field & strips it since it's just a useless dig comment
dig +trace +additional "$host" | \
awk -v host="$host" '
    BEGIN {
        RS = "";
        FS = "\n"
    }
    FNR == 3 {
        print "Nameservers of",host,"reported by the registrar:";
        OFS = "\n";
        $NF = ""; sub( /[[:space:]]+$/, "" );
        print
    }
'

And here's the output if I pass google.com in as the value of $host (other hostnames may produce output of differing line counts):

Nameservers of google.com reported by the registrar:
google.com.         172800  IN  NS  ns2.google.com.
google.com.         172800  IN  NS  ns1.google.com.
google.com.         172800  IN  NS  ns3.google.com.
google.com.         172800  IN  NS  ns4.google.com.
ns2.google.com.         172800  IN  A   216.239.34.10
ns1.google.com.         172800  IN  A   216.239.32.10
ns3.google.com.         172800  IN  A   216.239.36.10
ns4.google.com.         172800  IN  A   216.239.38.10

The idea is, using either the existing block of awk, or piping awk's output into a combination of more awk, sort, or whatever else, sort that block of text using a conditional algorithm:

if ( column 4 == 'NS' )
    sort by column 5
else // This will ensure that the col 1 sort includes A and AAAA records
    sort by column 1

I've pretty much got the same preferences for answers as the previous question:

Most important of all, it must be portable, since I've encountered different behaviour between OS X (my home system) and Fedora (what I use at work) when using sed (I had to replace it with gsed on OS X) and grep's -m flag (used in another script).
An explanation of how the solution works would be very much appreciated, as a learning opportunity more than anything else. I already learned quite a bit from the awk solution provided in the previous question.
If the solution can be implemented within the same block of awk, that would also be awesome.
If not, then something simple and elegant that I can pipe awk's output through would suffice.

The traditional pipeline solution would be to add, as you hinted, another step that appends an extra type column (type=1 or type=2) at the end of each row and, at the same time, reformats all record types to a common format that works with sort (in the easiest way possible, likely by duplicating columns at the front of the row). The last step in the pipeline, after sort, would restore the original format of each row based on the type column. Good luck.
– shellter, Commented Oct 6, 2013 at 12:16
In both cases you could sort by 1,5, given that field 1 is constant for type=='NS'
– wildplasser, Commented Oct 6, 2013 at 15:42

janos · Accepted Answer · 2013-10-06 15:29:08Z

Here's a solution based on @shellter's idea. Pipe the output of your nameserver records to this:

awk '$4 == "NS" {print $1, $5, $0} $4 == "A" {print $1, $1, $0}' | sort | cut -f3- -d' '

Explanation:

With awk, we take only the NS and A records and reprint each line with a two-column prefix: the primary sort key followed by the secondary sort key.
sort then orders the lines; because of how the first and second columns are set, the order should come out as you wanted.
With cut, we strip the prefix that was used for sorting.
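
As a hedged variation on the same prefix, sort, cut idea, the sketch below uses an explicit group number as the first key, so NS records always come before address records regardless of how the hostname compares to the nameserver names, and it also accepts AAAA records. The function name sort_ns_records is hypothetical, and forcing LC_ALL=C for sort is an assumption made to get the same byte-wise ordering on OS X and Fedora.

```shell
# Sketch only: sort_ns_records is a made-up name, not part of the original answer.
# Reads resource-record lines on stdin, writes them sorted on stdout.
sort_ns_records() {
    # Prefix each line with "<group> <key> ":
    #   group 1 = NS records, keyed on column 5 (the nameserver)
    #   group 2 = A/AAAA records, keyed on column 1 (the owner name)
    awk '
        $4 == "NS"                { print 1, $5, $0; next }
        $4 == "A" || $4 == "AAAA" { print 2, $1, $0 }
    ' |
    # Sort numerically on the group, then as text on the key.
    # LC_ALL=C is an assumption, chosen for locale-independent ordering.
    LC_ALL=C sort -k1,1n -k2,2 |
    # Drop the two prefix fields; the original line follows unchanged.
    cut -d' ' -f3-
}
```

It would be used as the last stage of the pipeline, for example `... | sort_ns_records`, with any heading line printed separately, since lines that are not NS, A, or AAAA records are dropped.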


2 Comments

Calyo Delphi Over a year ago

I had to modify my code slightly to put the first print statement outside of awk in a separate echo statement to make this work, but it pretty much worked right out of the box. I'm testing this at work right now, so I'll have to study it to learn how it all works during my break and tweak it as needed to make it IPv6 compatible. Thanks!

Calyo Delphi Over a year ago

Now that I've had a chance to work on this more, there is one small bug in this that I was able to resolve: hostnames that come after the nameservers alphabetically cause the NS & A groups to sort out of order. After the cut, I piped through sort -rk 4 and that fixed it perfectly. :)

Jester · Accepted Answer · 2013-10-06 15:30:07Z

I know you asked for an awk solution, but since you tagged the question with bash too, I thought I'd provide a bash version. It should also be more portable than awk ;)

# the whole line
declare -a lines
# the key to use for sorting
declare -a keys

# insert into the arrays at the appropriate position
function insert
{
    local key="$1"
    local line="$2"
    local count=${#lines[*]}
    local i
    # go from the end backwards
    for((i=count; i>0; i-=1))
    do
        # if we have the insertion point, break
        [[ "${keys[i-1]}" > "$key" ]] || break
        # shift the current item to make room for the new one
        lines[i]=${lines[i-1]}
        keys[i]=${keys[i-1]}
    done
    # insert the new item
    lines[i]=$line
    keys[i]=$key
}

# This block of stuff fetches the nameservers as reported by the registrar and DNS zone
#     The third block of dig's output is the nameservers reported by the registrar
#     Also blanks the last field & strips it since it's just a useless dig comment
block=0
dig +trace +additional "$host" |
while read -r f1 f2 f3 f4 f5
do
    # empty line begins new block
    if [ -z "$f1" ]
    then
        # increment block counter
        block=$((block+1))
        # and read next line
        continue
    fi

    # if we are not in block #3, read next line
    [[ $block == 3 ]] || continue

    # ;; ends the block
    if [[ "$f1" == ";;" ]]
    then
        echo "Nameservers of $host reported by the registrar:"
        # print the lines collected so far
        for((i=0; i<${#lines[*]}; i+=1))
        do
            echo "${lines[i]}"
        done
        # don't bother reading the rest
        break
    fi

    # figure out what key to use for sorting
    if [[ "$f4" == "NS" ]]
    then
        key=$f5
    else
        key=$f1
    fi
    # add the line to the arrays
    insert "$key" "$f1 $f2 $f3 $f4 $f5"
done
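
For comparison, the same insertion-sort idea can be sketched in plain awk, which avoids both bash arrays and gawk-only functions such as asort. This is a rough sketch rather than a tested drop-in: the function name sort_ns_block is hypothetical, it assumes one record per line on stdin (not the blank-line-separated blocks used in the question's awk), and it assumes NS records should be grouped ahead of A and AAAA records.

```shell
# Sketch only: sort_ns_block is a made-up name.
# Reads lines on stdin; NS/A/AAAA lines are sorted, anything else passes through.
sort_ns_block() {
    awk '
    {
        # Pick the sort key: a group digit, then the column to sort on.
        if ($4 == "NS")
            key = "1 " $5
        else if ($4 == "A" || $4 == "AAAA")
            key = "2 " $1
        else {
            print        # headings and other lines are printed immediately
            next
        }
        # Insertion sort: shift entries with a larger key up by one slot.
        for (i = n; i > 0 && keys[i] > key; i--) {
            keys[i + 1]  = keys[i]
            lines[i + 1] = lines[i]
        }
        keys[i + 1]  = key
        lines[i + 1] = $0
        n++
    }
    END {
        # Emit the collected records in sorted order.
        for (i = 1; i <= n; i++)
            print lines[i]
    }
    '
}
```

Because the keys are built by string concatenation, awk compares them as strings, so the group digit decides first and the hostname second.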
