sort doesn't seem to like my key specification. Why?

~/tmp $ sort --version
sort (GNU coreutils) 8.25
Packaged by Cygwin (8.25-1)
~/tmp $ echo 'a;b;c;d;e;f;g'|sort --field-separator=';' --key=1,5,2
sort: stray character in field spec: invalid field specification '1,5,2'

From the man page:

-k, --key=KEYDEF : sort via a key; KEYDEF gives location and type

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.

Since the .C and OPTS parts of the KEYDEF are optional, a key specification F,F,F (i.e. just the field numbers) should be correct. What did I do wrong?

BTW, my environment is Cygwin, running the Z-shell.

3 Answers

The two fields in the -k argument are the START and END fields of a single key. You can specify -k any number of times to sort on multiple keys. So -k 1,1 -k 2,2 -k 3,3 will sort first on field 1, then on field 2, then on field 3.
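
As a minimal sketch of what the question seems to be after (sorting on fields 1, 5 and 2, in that order), each field gets its own --key with matching start and stop. The sample data below is made up for illustration, and LC_ALL=C is only there to make the comparison byte-wise and reproducible:

```shell
# Hypothetical sample data: five ';'-separated fields per line.
data='a;2;c;d;9
a;1;c;d;9
a;3;c;d;1
b;0;c;d;0'

# One --key per field: field 1 first, then field 5, then field 2.
# Each key starts and stops on the same field.
sorted=$(printf '%s\n' "$data" |
  LC_ALL=C sort --field-separator=';' --key=1,1 --key=5,5 --key=2,2)

printf '%s\n' "$sorted"
```

With this data, the line a;3;c;d;1 comes first (smallest field 5 among the lines starting with a), followed by a;1;c;d;9 and a;2;c;d;9 (tie on field 5, broken by field 2), and finally b;0;c;d;0.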

3 Comments

Thanks a lot, got it finally!
Or, more simply, -k1 -k2 -k3
Note that, according to the help, "the stop position defaults to the line's end", so, e.g., -k1,1 -k9,9 can behave differently than -k1 -k9. (I.e., with -k1, -k9 will effectively be ignored when there is a difference in fields 2 to 8.)
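
The effect described in the last comment can be reproduced with a small sketch. The two-line input is invented for illustration and uses field 3 instead of field 9 to keep it short; LC_ALL=C is assumed so that the comparison is byte-wise:

```shell
# Hypothetical input: field 1 is identical, fields 2 and 3 differ.
data='a 2 1
a 1 2'

# -k1 runs from field 1 to the end of the line, so the lines already
# differ in the first key and -k3 is never consulted.
open_keys=$(printf '%s\n' "$data" | LC_ALL=C sort -k1 -k3)

# -k1,1 stops at field 1; the tie is then broken by field 3 alone.
closed_keys=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k3,3)

printf 'with -k1 -k3:\n%s\n' "$open_keys"
printf 'with -k1,1 -k3,3:\n%s\n' "$closed_keys"
```

The first command puts "a 1 2" on top (field 2 decides), while the second puts "a 2 1" on top (field 3 decides).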

Oops, I should have taken the man page more literally. The definition for KEYDEF says

F[.C][OPTS][,F[.C][OPTS]]

and not

F[.C][OPTS][,F[.C][OPTS]...]

which means that only 1 or 2 fields can be supplied, not an arbitrary number. This explains the error.

As a side note, I believe there is still an error in the man page. The KEYDEF definition says that the stop position defaults to the line's end. This can't be true, can it? IMO it should say that the stop position defaults to the field's end.

UPDATE: My explanation is NOT correct. See the answer provided by @tedtoal for a correct explanation.

1 Comment

I believe the man page is correct about the line end - I stumbled across a case where a key -k9 was ignored when used with a preceding -k1 if there was a difference in the fields in between. Using -k1,1 then made the sort work as expected.

As with everything in logic, stating the "to" part of --key=from,to has a meaning, albeit a subtle one.

3 1 3 4 2
2 2 2 3 4
1 1 1 5 0
2 0 0 3 4
2 1 4 3 4
2 1 6 3 4

This input would get sorted differently with -k2 than with -k2,2. On the one hand, specifying the ending field is good for saving sort resources, so I'd use it in production. On the other hand, omitting it may give more comparable results, so I'd use it when testing on the same dataset.
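
The difference can be checked with the six lines above. This is only a sketch: it assumes a sort that follows the usual last-resort rule (lines whose keys compare equal are ordered by comparing the whole line) and uses LC_ALL=C for a byte-wise comparison:

```shell
# The sample data from this answer.
data='3 1 3 4 2
2 2 2 3 4
1 1 1 5 0
2 0 0 3 4
2 1 4 3 4
2 1 6 3 4'

# Key runs from field 2 to the end of the line.
to_line_end=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# Key is field 2 only; ties fall back to comparing whole lines.
field_only=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

printf 'sort -k2:\n%s\n\n' "$to_line_end"
printf 'sort -k2,2:\n%s\n' "$field_only"
```

The four lines with 1 in field 2 end up in a different order: -k2 orders them by fields 3 to 5, whereas -k2,2 treats them as equal and falls back to field 1 through the whole-line comparison.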
