2

Awk novice here; I was wondering if this is doable.

My file:

CCDDBBAA 
EFGHAC 
KJLDFU
ABBAAC

Desired output:

ABCD
ACEFGH
DFJKLU
ABC

For each string in my file, I want to sort its characters alphabetically and remove the duplicate characters within that string.

Thanks!

4 Answers

1

This might work for you (GNU sed & sort):

sed 's/\s*/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file

Replace white space with newlines and put each character on its own line. Sort the resulting lines, removing duplicates. Then remove the introduced newlines.
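For anyone without GNU sed, the same three steps can be sketched as a plain shell pipeline. This is only a sketch under a few assumptions: sort_unique_chars is a made-up helper name, the input is assumed to be single-byte (ASCII) text because fold -w 1 may split by bytes rather than characters, and LC_ALL=C is set so that sort orders by byte value.

```shell
# sort_unique_chars: hypothetical helper name, not part of the answer above.
# Reads lines on stdin and prints, for each line, its distinct
# non-whitespace characters in sorted order.
sort_unique_chars() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s' "$line" |
            tr -d '[:space:]' |   # remove white space
            fold -w 1 |           # one character per line
            LC_ALL=C sort -u |    # sort the lines, dropping duplicates
            tr -d '\n'            # remove the introduced newlines
        echo                      # restore the line ending
    done
}

# Sample input from the question, including its trailing blanks.
printf '%s\n' 'CCDDBBAA ' 'EFGHAC ' 'KJLDFU' 'ABBAAC' | sort_unique_chars
```

It starts several processes per input line, so it will likely be slower than the sed one-liner on large files.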


1 Comment

Well done, but I think you could explain a bit more -- that you are using the e GNU sed extension to execute shell commands, and the original newlines are preserved automatically by sed.
1

With gawk:

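awk -v FS="" '{
    # With an empty FS, gawk treats every character as a separate field.
    for (i = 1; i <= NF; i++) {
        if ($i !~ /[[:space:]]/)   # skip blanks such as those in the sample
            a[$i]                  # referencing the element creates the key
    }
    d = asorti(a, b)               # b[1..d] holds the keys of a, sorted
    for (x = 1; x <= d; x++)
        printf "%s", b[x]
    print ""
    delete a
    delete b
}' file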

1 Comment

Thanks. Modified the answer.
0

perl:

perl -lpe '%x = map {$_=>1} grep /\S/, split ""; $_ = join "", sort keys %x' file

or ruby:

ruby -pe '$_ = $_.gsub(/\s/, "").chars.uniq.sort.join + "\n"' file


0

With GNU awk 4.* for sorted_in and splitting a record into characters when FS is null:

$ cat tst.awk
BEGIN { FS=OFS=ORS=""; PROCINFO["sorted_in"]="@ind_str_asc" }
{
    for (i=1;i<=NF;i++) a[$i]
    for (i in a) print i
    print RS
    delete a
}

$ awk -f tst.awk file
ABCD
ACEFGH
DFJKLU
ABC
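If gawk is not available, here is a rough sketch that should run in any POSIX awk. It uses a different technique from the script above: instead of collecting characters in an array and sorting them, it walks the printable ASCII codes in order and keeps the ones that occur in the line. The function name sort_unique_chars_awk is made up for illustration, and the sketch assumes the input only contains printable ASCII characters (codes 33 to 126); anything else, including white space, is dropped.

```shell
# sort_unique_chars_awk: hypothetical helper name, not from the answer above.
# Works on the files given as arguments, or on stdin when there are none.
sort_unique_chars_awk() {
    awk '{
        out = ""
        # Visit printable ASCII codes in ascending order, so the
        # result is sorted and each character appears at most once.
        for (c = 33; c <= 126; c++) {
            ch = sprintf("%c", c)
            if (index($0, ch))      # is this character in the line?
                out = out ch
        }
        print out
    }' "$@"
}

# Sample input from the question, including its trailing blanks.
printf '%s\n' 'CCDDBBAA ' 'EFGHAC ' 'KJLDFU' 'ABBAAC' | sort_unique_chars_awk
```

Each line costs a fixed 94 index() calls regardless of its length, which is fine for short strings like these.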
