awk - Sort a string alphabetically and remove duplicates within the string

Question

awk novice here, was wondering if this is doable.

My file:

CCDDBBAA 
EFGHAC 
KJLDFU
ABBAAC

Desired output:

ABCD
ACEFGH
DFJKLU
ABC

I want to sort the strings in my file alphabetically and remove the duplicates within the string.

Thanks!

potong · Accepted Answer · 2016-04-08 07:53:06Z

1

This might work for you (GNU sed & sort):

sed 's/\s*/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file

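The first substitution replaces every run of white space, and every gap between characters, with a newline, so each character ends up on its own line. The e flag (a GNU sed extension) then runs the pattern space as a shell command, piping the characters through sort -u, which sorts them and drops duplicates. The last substitution removes the introduced newlines; sed restores the newline that ends each input line when it prints the result. Because each line is pasted into a shell command, this is only safe for input without shell-special characters such as ", $ or backticks.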
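
For readers without GNU sed, the same three steps can be traced with ordinary tools. This is only a rough sketch, not part of the answer above: it assumes a POSIX shell with tr, fold and sort, single-byte (ASCII) characters, and sort_unique_chars is a hypothetical helper name.

```shell
# Hypothetical helper: sort the characters of one string and drop duplicates.
sort_unique_chars() {
    printf '%s' "$1" |
        tr -d '[:space:]' |   # step 1a: remove white space
        fold -w 1 |           # step 1b: put each character on its own line
        LC_ALL=C sort -u |    # step 2: sort the lines, removing duplicates
        tr -d '\n'            # step 3: remove the introduced newlines
    echo                      # end the result with a single newline
}
```

To process a whole file, call it once per line, for example: while IFS= read -r line; do sort_unique_chars "$line"; done < file. It starts several processes per line, so it is likely to be slower than the awk answers on large files.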

answered Apr 8, 2016 at 7:53

potong

1 Comment

user4401178 Over a year ago

Well done, but I think you could explain a bit more -- that you are using the e GNU sed extension to execute shell commands, and the original newlines are preserved automatically by sed.

jijinp · Accepted Answer · 2016-04-08 17:47:43Z

1

With gawk:

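awk -v FS="" '{
    # use each character as an array index; duplicates collapse into one key
    for (i = 1; i <= NF; i++) {
        a[$i]
    }
    # asorti (gawk) stores the sorted indices of a in b and returns their count
    d = asorti(a, b)
    for (x = 1; x <= d; x++) {
        printf "%s", b[x]
    }
    print ""
    # reset both arrays before the next line
    delete a
    delete b
}' file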
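
If gawk is not available, the same idea can be approximated in portable awk. The following is a sketch under assumptions rather than a replacement for the answer above: it swaps FS="" for substr() and asorti() for a hand-written insertion sort, skips spaces and tabs, and wraps the program in a hypothetical shell function named sort_unique_chars_awk.

```shell
# Hypothetical wrapper: portable awk, no empty FS and no asorti.
sort_unique_chars_awk() {
    awk '{
        n = 0
        # collect each distinct character once, ignoring spaces and tabs
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == " " || c == "\t" || (c in seen)) continue
            seen[c]
            keys[++n] = c
        }
        # insertion sort over the n collected characters
        for (i = 2; i <= n; i++) {
            k = keys[i]
            for (j = i - 1; j >= 1 && keys[j] > k; j--) keys[j + 1] = keys[j]
            keys[j + 1] = k
        }
        out = ""
        for (i = 1; i <= n; i++) out = out keys[i]
        print out
        # clear the set element by element, which also works in older awks
        for (c in seen) delete seen[c]
    }' "$@"
}
```

Called as sort_unique_chars_awk file, it reads the file line by line; with no argument it reads standard input.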

edited Apr 8, 2016 at 17:47

answered Apr 8, 2016 at 8:14

jijinp

1 Comment

jijinp Over a year ago

Thanks. Modified the answer.

glenn jackman · Accepted Answer · 2016-04-08 12:54:43Z

0

perl:

perl -lpe '%x = map {$_=>1} split ""; $_ = join "", sort keys %x' file

or ruby:

ruby -lpe '$_ = $_.chars.uniq.sort.join("")' file

edited Apr 8, 2016 at 12:54

answered Apr 8, 2016 at 12:27

glenn jackman

Ed Morton · Accepted Answer · 2016-04-08 16:23:21Z

0

With GNU awk 4.0 or later, which provides PROCINFO["sorted_in"] and splits a record into characters when FS is null:

$ cat tst.awk
BEGIN { FS=OFS=ORS=""; PROCINFO["sorted_in"]="@ind_str_asc" }
{
    for (i=1;i<=NF;i++) a[$i]
    for (i in a) print i
    print RS
    delete a
}

$ awk -f tst.awk file
ABCD
ACEFGH
DFJKLU
ABC

answered Apr 8, 2016 at 16:23

Ed Morton
