perl -F\\t -lE'map$%[$_]{$F[$_]}++,keys@$ or@$=@F}{say"$$[$_]\t",join",",keys%{$%[$_]}for keys@$' input.tsv
The awk equivalent probably needs to be longer:
awk -F\\t '{for(i=n;i;--i)s[i,$i]++||v[i]=v[i]c$i}!n{n=split($0,k)}NR==2{c=","}END{for(j in k)print k[j]FS v[j]}' input.tsv
or, if using busybox awk, or if output row order is important:
awk -F\\t '{for(i=n;i;--i)s[i,$i]++||(v[i]=v[i]c$i)}!n{n=split($0,k)}NR==2{c=","}END{while(j++<NF)print k[j]FS v[j]}' input.tsv
Note that storing a script in a file, making it executable, and ensuring it is on your PATH obviates the need for one-liners in most cases. Presumably you don't expect to type out the entire source code of less on the command line each time you use it, nor to assign its source code to an alias.
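As a rough sketch of that workflow, the snippet below saves the busybox-safe awk program as an executable wrapper and calls it by name. The script name distinct-values, the temporary directory, and the sample data are all hypothetical and only there for illustration; in practice you would pick a permanent directory that is already on your PATH.

```sh
# Hypothetical location and script name; a temp dir keeps the demo self-contained.
dir=$(mktemp -d)

cat > "$dir/distinct-values" <<'EOF'
#!/bin/sh
# Wrapper around the awk program from this answer; file arguments pass through.
exec awk -F'\t' '
{
    for (i = n; i; --i)
        s[i,$i]++ || (v[i] = v[i] c $i)
}
!n { n = split($0, k) }
NR == 2 { c = "," }
END {
    while (j++ < NF)
        print k[j] FS v[j]
}
' "$@"
EOF

chmod +x "$dir/distinct-values"   # make it executable
PATH="$dir:$PATH"                 # ensure it is on the PATH

# Made-up sample input: a header row plus three data rows.
printf 'a\tb\n1\tx\n2\tx\n1\ty\n' > "$dir/input.tsv"

out=$(distinct-values "$dir/input.tsv")
printf '%s\n' "$out"
```

After that, the command is typed like any other tool, with no one-liner or alias needed.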
perl -F\\t -lE '
    map $%[$_]{$F[$_]}++, keys @$
        or @$ = @F
    }{
    say "$$[$_]\t", join ",", keys %{$%[$_]}
        for keys @$
' input.tsv
@F is the array of the current row's column values, indexed by column number
- automatically populated by splitting each input line with the -F regex
@$ is the array of the first row's columns (the headers), indexed by column number
@% is an array of hashes (the unique values in rows >1 of a column), indexed by column number
map builds a list, which is discarded but has the side effect of recording each column's unique values as they are found. Because of the or, map is evaluated in scalar context and returns the number of elements it generated; while @$ is still empty (on the first row) that count is 0, which is false, so the rhs of or initialises @$
- with the -n option (implied by -F in perl 5.20 and later), }{ ... makes ... happen after all input has been processed
- loop over the indices of @$, printing lines built from the corresponding element of @$ and the list of keys from the corresponding hash element of @%
- note: elements of the "distinct_values" column appear in apparently-random order, since the result of keys on a hash is not sorted
awk -F\\t '
    {
        for (i = n; i; --i)
            s[i,$i]++ || (v[i] = v[i] c $i)
    }
    !n { n = split($0,k) }
    NR==2 { c = "," }
    END {
        while (j++<NF)
            print k[j] FS v[j]
    }
' input.tsv
k is array of columns of first row, indexed by column number
s is array (hash) whose keys are (column number, value) pairs for every distinct value seen so far in rows >1, and whose values are the count of times seen
v[i] stores string built from unique rows (>1) of ith column
when reading first line, n is not set, so nothing is added to v and then k is generated
differences from the shorter awk version:
- the shorter version uses a||b=c, but busybox needs a||(b=c)
- uses while to ensure output row i corresponds to input column i
- in standard awk, for(j in k) returns elements of k in unspecified order
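To watch k, s and v at work, the sketch below runs the same rules on a small made-up input and replaces the END block with a trace line printed after each row. The trace rule and the sample data are additions for illustration only, and the trace assumes exactly two columns.

```sh
# Made-up sample input: header row plus three data rows, two columns.
out=$(printf 'a\tb\n1\tx\n2\tx\n1\ty\n' |
awk -F'\t' '
{
    for (i = n; i; --i)
        s[i,$i]++ || (v[i] = v[i] c $i)
}
!n { n = split($0, k) }
NR == 2 { c = "," }
# Trace rule (not part of the original program): show v after each row.
{ printf "row %d: v[1]=\"%s\" v[2]=\"%s\"\n", NR, v[1], v[2] }
')
printf '%s\n' "$out"
```

On row 1 nothing is added to v because n is still unset; from row 2 on, each value is appended only the first time its (column, value) pair shows up in s, and the separator c only becomes "," once row 2 has been handled, so no leading comma appears.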