1

Considering the following input and output:

  infile   |   outfile
1 3 5 2 4  |  1 2 3 4 5
2 4 5      |  2 4 5
4 6 2 1    |  1 2 4 6

Is there any combination of UNIX programs, not involving any programming language other than the shell itself, that sorts the entries in each line of a file faster than the following approach:

while read line; do
    tr ' ' '\n' <<< ${line} | sort | tr '\n' ' '
    echo ""
done < infile > outfile

I mean, I'm able to create a small cpp/python/awk/... program to do so, but it is just not the same as using the usual one-liners to magically solve problems.

Edit:

I must have added too much text, instead of simply asking what I wanted; straightforwardly, I wanted to confirm whether there was any UNIX program/combination of programs (using pipes, fors, whiles, ...) capable of sorting entries in a line, but without as much overhead as the one solution above.

I know I may do the nasty job in a programming language, like perl, awk, python, but I was actually looking for a composition of UNIX programs that wouldn't involve these language interpreters. From the answers, I must conclude there is no such inline sort tool(s), and I'm very thankful for the solutions I've got -- mainly the very neat Perl one-liner.

Yet, I do not really understand the reason for so much overhead in the Bash approach I posted. Is it really due to a multitude of context switches, or is it simply the overhead of translating the input back and forth and sorting it?

I can't seem to understand which of these steps is slowing down the execution so much. It takes several minutes to sort the entries in a file with ~500k lines, with ~30 values in each line.

3
  • 2
    It would appear that none of the responders fully read/comprehended your question :) Commented May 16, 2013 at 2:38
  • 1
    Your code only works for numbers below 10. If you add 11 to a line, it will not be sorted properly. Use sort -n if you need a numeric sort. Commented May 16, 2013 at 3:19
  • You might be able to save some cycles with creative use of tsort but this also depends on the type of your input data. Commented May 16, 2013 at 4:02
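
To illustrate the numeric-sort comment above, here is a minimal sketch comparing plain sort with sort -n on three sample values. The values and variable names are made up for illustration.

```shell
# Plain sort compares character by character, so "11" lands before "2".
lexical=$(printf '%s\n' 2 11 1 | sort | tr '\n' ' ')
# sort -n compares the values as numbers.
numeric=$(printf '%s\n' 2 11 1 | sort -n | tr '\n' ' ')
echo "sort:    $lexical"   # 1 11 2
echo "sort -n: $numeric"   # 1 2 11
```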

4 Answers

2

Perl can do this nicely as a one-line Unix/Linux command:

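perl -lne "print join ' ', sort { \$a <=> \$b } split ' '" < input.txt > output.txt

The dollar signs in $a and $b are escaped with backslashes because bash would otherwise expand them inside double quotes; alternatively, you can invert the single and double quotes. In a Windows shell, the same double-quoted command works with $a and $b left unescaped. The -l switch strips the newline from each input line and adds one back on print; without it the output lines run together, because split ' ' discards the trailing newline.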

Note that the distinctions you are trying to draw between commands, programming languages, and programs are pretty thin. Bash is a programming language. Perl can certainly be used as a shell. Both are commands.

The reason your script runs slowly is that it spawns 3 processes per loop iteration. Process creation is pretty expensive.
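
If the per-line process creation is the main cost, one way to avoid it while staying with the shell and sort is to tag every value with its line number, run a single sort over the whole file, and regroup afterwards. The following is only a sketch under assumptions: the function name sort_fields_per_line is made up, the fields are assumed to be plain numbers (no glob characters), and trailing blank lines are not preserved. It has not been timed against the 500k-line file from the question.

```shell
# Sketch: one sort process for the whole input instead of three per line.
# The body runs in a subshell, so the helper variables stay private.
sort_fields_per_line() (
    n=0
    while read -r line || [ -n "$line" ]; do
        # Decorate: emit "line-number value" pairs using built-ins only.
        n=$((n + 1))
        for field in $line; do
            printf '%s %s\n' "$n" "$field"
        done
    done |
    sort -k1,1n -k2,2n |   # order by line number, then numerically by value
    {
        # Undecorate: collect consecutive values that share a line number.
        prev=1
        out=
        while read -r n field; do
            while [ "$prev" -lt "$n" ]; do
                printf '%s\n' "$out"
                out=
                prev=$((prev + 1))
            done
            out=${out:+$out }$field
        done
        printf '%s\n' "$out"
    }
)
```

It would be called as sort_fields_per_line < infile > outfile. The two loops still run in the shell, which is slow in its own right, so the Perl one-liner is likely to remain faster.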

3 Comments

Instead of escaping the dollar signs, why not use single quotes around the perl script, and double quotes for the strings inside the script?
@Barmar Sure that's fine. I was trying to keep the two as similar as possible.
Or, on Unix, simply invert single and double quotes: perl -n -e 'print join " ", sort{$a <=> $b} split " "' < input.txt > output.txt.
1

The question is more subtle than it seems. You appear to be asking whether there is a quicker way to perform the sort, and you are getting a lot of (elegant!) answers with Perl and awk and so on. But your question seems to be whether you can do a quicker sort with shell built-ins, and for that, the answer is no.

Obviously, sort is not a shell built-in, and neither is tr. There isn't a built-in that does what sort does, and the built-ins that might substitute for "tr" are not likely to help you here (it would take as much work to manipulate, say, bash's IFS variable to remove the call to tr as to just live with the tr).
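
For what it is worth, here is a rough sketch of what removing the tr calls could look like. It relies on the shell's own IFS word splitting to break the line apart and on paste -s to join the sorted fields. The function name sort_line_fields is hypothetical, and the fields are assumed to be plain numbers with no glob characters. It still starts two processes per line (sort and paste), so any gain is likely to be modest.

```shell
# Sketch: let the shell split each line instead of calling tr.
sort_line_fields() {
    while read -r line || [ -n "$line" ]; do
        # Unquoted $line is split on IFS, so printf sees one field per argument.
        # paste -s joins the sorted lines back into one space-separated line.
        printf '%s\n' $line | sort -n | paste -s -d ' ' -
    done
}
```

Usage would be sort_line_fields < infile > outfile.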

Personally, I would go with Perl. Note that if your data set is large or funky, you have the option of changing Perl's default sorting algorithm using the sort pragma. I don't think you will need it for sorting a file of integers, but maybe that was just an illustration on your part.

2 Comments

I did not call sort or tr built-ins; I was simply trying to distinguish ordinary UNIX text-processing tools from language interpreters/shells (like perl, (i)python, awk, and so on). I do not actually mind the tool; I was just looking for something a bit simpler, like chaining tools with fors/pipes, or the very elegant Perl solution. And, of course, something that does not take as long as my approach -- I still haven't figured out why it is so incredibly slow.
@Rubens It is incredibly slow because it reads the entire data 4 times (read, tr, sort, tr). It also creates subshells, pipes and many processes for every line.
1
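#!awk -f
# Needs GNU awk 4.0 or later: PROCINFO["sorted_in"] is a gawk extension.
{
  baz = 0
  # Make "for (bar in foo)" visit the elements in ascending numeric order of value.
  PROCINFO["sorted_in"] = "@val_num_asc"
  split($0, foo)
  # Reassign the fields in sorted order, which rebuilds $0.
  for (bar in foo)
    $++baz = foo[bar]
}
# A bare 1 is an always-true pattern, so each rebuilt line is printed.
1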

Result

1 2 3 4 5
2 4 5
1 2 4 6

0

It's not pretty (definitely not a one-liner), but you can sort a line using only built-in shell commands. For short lines it may be faster than repeatedly calling external programs.

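#!/bin/bash
# Needs bash: the ${@:offset:length} slicing below is not available in plain sh.
sortline(){
local first= x i
for x in $@;do
    # On the first pass, clear the positional parameters; they then hold the sorted list.
    [ ! "$first" ] && first=t && set --
    i=0
    # Find the first already-sorted element that is greater than x.
    while [ $i -lt $# ];do
        [ $x -lt ${@:$((i+1)):1} ] && break || i=$((i+1))
    done
    # Insert x at position i.
    set -- ${@:1:$i}  $x   ${@:$((i+1)):$(($#-i))}
done
echo $@
}
while read LINE || [ "$LINE" ];do
    sortline $LINE
done <"$1" >"$2"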

Edit: btw this is an insertion sort algorithm, in case anyone wondered

Edit2: this is for numerical values only. For strings you would need a string comparison such as [ "$x" \< "${@:$((i+1)):1}" ] (unchecked). However, I use this C program for strings (I just call it qsort); it could be modified for numbers by using atoi on argv:

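#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* write() */

/* Compare two argv entries as strings. */
static inline int cmp(const void *a, const void *b){
    return strcmp(*(const char **)a, *(const char **)b);
}

int main(int argc, char *argv[]){
    /* Skip the program name and sort the remaining arguments in place. */
    qsort(++argv, --argc, sizeof(char *), cmp);
    while (argc){
        write(1,argv[0],strlen(argv[0]));
        /* Tab between fields, newline after the last one. */
        write(1,(--argc && argv++)?"\t":"\n",1);
    }
    return 0;
}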

2 Comments

As a tour de force, it is remarkable (+1); as a suggested answer, it is appalling (-1). Net: no vote.
@JonathanLeffler Eh, it sounded like a fun challenge, but it's only faster than calling sort for fewer than ~10 fields per line. If you think this is appalling, check out distro.ibiblio.org/amigolinux/download/AmigoProjects/BashTrix/…
