I am using this code to parse the first argument passed to my script. It handles errors and works just the way I want it to:

if [ -z "$action" ]; then
    printf "[${c_RED}ERROR${c_RESET}] The action must be specified.\n" && exit 1
elif [[ "$action" =~ ^-{0,2}[Hh][Ee][Ll][Pp]$ ]] || [[ "$action" =~ ^-{0,2}[Hh]$ ]]; then
    printf "Usage: pocsag [ACTION] [INPUTMETHOD/REDACTION] [OUTPUTMETHOD/PATHTOFILE/SERVICEACTION]                              "
    printf "Examples:                                                                                                           "
    printf "  pocsag decode rtlsdr cli                                                                                          "
    printf "  pocsag decode netcat file                                                                                         "
    printf "  pocsag redact medical ~/media/signals/pocsag/decoded/POCSAG*                                                      "
    printf "  pocsag service rtlsdr start                                                                                       "
    printf "                                                                                                                    "
    printf "Actions:                                                                                                            "
    printf "  decode                    Invoke the usage of the input tuner, sox and multimon-ng to decode the signals.         "
    printf "  redact                    Copy file but redact regex matching lines of a file. For example: Removing medical TXs. "
    printf "  service                   Used to start/stop the systemd service in user's ~/.config. Relies on rtlsdr_pager_rx   "
    printf "                                                                                                                    "
    printf "Input Methods:                                                                                                      "
    printf "  rtlsdr                    Use an RTLSDR device plugged into the local computer.                                   "
    printf "  netcat                    Listen to localhost:7355 using netcat, then process and output locally.                 "
    printf "                                                                                                                    "
elif ! [[ "$action" =~ ^([Dd][Ee][Cc][Oo][Dd][Ee]|[Rr][Ee][Dd][Aa][Cc][Tt]|[Ss][Ee][Rr][Vv][Ii][Cc][Ee])$ ]]; then
    printf "[${c_RED}ERROR${c_RESET}] The action must be 'decode', 'redact' or 'service'.\n" && exit 3
fi

I now want to make this script POSIX compliant and cannot use the [[ ]] bash idiom. How would I go about this? Elaborate case statements? Surely there is a better method.

Thanks :)

  • non-answer and opinion: ^-{0,2}[Hh][Ee][Ll][Pp]$ is a bit unnecessarily elaborate. Most tools are happy to just accept one version of a command line arg, or -h and --help and that's it. Long options are usually all lowercase, but for short options the letter case usually matters: -h and -H are different. And usually -help is the same as -h -e -l -p (unless one of them takes an argument). Especially in the case of a "help" option, you could just recognize one form and print the help text anyway for the cases where the user gives an invalid option, e.g. the first branch here. Commented Nov 24 at 8:25
  • and then you can just do, POSIXly, if [ "$action" = "decode" ]; then ...; elif [ "$action" = "redact" ]; then ...; else print_usage; fi or case $action in decode) ...;; redact) ...;; *) print_usage;; esac Commented Nov 24 at 8:27
  • Dare I ask what "pocsag" is intended to mean? Commented Nov 25 at 3:12
  • [[ ... =~ ... ]] is enough of a special case to be worth calling out. Most of [[ can be substituted for with just test with careful quoting. Commented Nov 25 at 14:32
  • "I now want to make this script POSIX compliant". Why? Being POSIX-compliant sounds great and all, but it's 2025*. What systems will you run this on that don't have bash? Commented Nov 26 at 6:58

2 Answers

  1. Most people use case statements for option processing because they're simple and easy and they work. There are countless examples using the built-in getopts, /usr/bin/getopt, or custom hand-crafted parsing. A case statement often IS the better method, and is certainly better and more readable than a bunch of if/elif/else statements.

  2. POSIX sh does not support regex tests. For that, you need a shell like bash, ksh, or zsh. That's one of the reasons the [[ ... ]] built-in was invented: to do things that [ (aka test) can't do without breaking compatibility.

    Or you could use awk to do your matching but (like most external programs) it's not something you'd want to fork repeatedly in a shell loop - the advantage of a built-in is that it IS built-in. Or you could use perl, but in that case it would make more sense to just write the entire script in perl.

  3. If you're not going to use bash's nocasematch option, then at least use tr '[:upper:]' '[:lower:]' or similar to convert $action to all lower case (or upper, it doesn't matter) before trying to match it against anything.

    Then you wouldn't need all those ugly, unreadable and tedious-to-edit bracket expressions just to match mixed case. As it stands, it looks like you're going out of your way to make things harder for yourself when you should be trying to make them easier and simpler. Converting to all lower or all upper case would also benefit matching with a case statement as well as with a regex.

    BTW, the [:upper:] and [:lower:] classes are specified by POSIX for tr and work with at least GNU and BSD tr. As @Stephane mentions, GNU tr doesn't support multi-byte characters, so tr A-Z a-z works too. It's not unreasonable to expect that it, or some other version of tr, will one day work correctly for multi-byte unicode characters. If handling unicode options is important to you, use a unicode-capable tool like perl (which has a built-in lc function) to do the case conversion.

  4. You should use a heredoc rather than multiple printf statements for long sequences of text. It's not like you're actually using any of printf's formatting options (and if needed, you can use heredocs with printf). Also, all that padding whitespace at the end of each line will annoy users, as it will end up in the selection or clipboard if they copy the text. For example:

    cat <<__EOF__
    Usage: pocsag [ACTION] [INPUTMETHOD/REDACTION] [OUTPUTMETHOD/PATHTOFILE/SERVICEACTION]
    Examples:
      pocsag decode rtlsdr cli
      pocsag decode netcat file
      pocsag redact medical ~/media/signals/pocsag/decoded/POCSAG*
      pocsag service rtlsdr start

    Actions:
      decode                    Invoke the usage of the input tuner, sox and multimon-ng to decode the signals.
      redact                    Copy file but redact regex matching lines of a file. For example: Removing medical TXs.
      service                   Used to start/stop the systemd service in user's ~/.config. Relies on rtlsdr_pager_rx

    Input Methods:
      rtlsdr                    Use an RTLSDR device plugged into the local computer.
      netcat                    Listen to localhost:7355 using netcat, then process and output locally.

    __EOF__

    This is more readable, easier to edit, and easier to re-format if/when needed with fmt, fold, par, or similar. It's just text, with no embedded code.

  5. Put your usage message in a function, called usage for example. Do the same for any other if or case clause with multiple lines of code or long strings of text to print. You don't want to embed lots of text or code in a case statement or a long if/elif/else/fi statement: it should be readable at a glance, without needing to page forwards and backwards just to get an overview of what the entire statement is doing.
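Taken together, items 1 and 3 to 5 might look something like the following POSIX sh sketch. It is illustrative, not the asker's actual script: the function name main is an assumption, the usage text is abbreviated, the decode/redact/service bodies are placeholders, the colour variables are left out, error messages go to stderr, and the tr conversion assumes ASCII input.

```shell
# usage(): help text in a heredoc (abbreviated). The quoted delimiter
# stops any expansion inside the text.
usage() {
    cat <<'__EOF__'
Usage: pocsag [ACTION] [INPUTMETHOD/REDACTION] [OUTPUTMETHOD/PATHTOFILE/SERVICEACTION]

Actions:
  decode    Invoke the input tuner, sox and multimon-ng to decode the signals.
  redact    Copy a file, redacting lines that match a regex.
  service   Start/stop the systemd service in the user's ~/.config.
__EOF__
}

# main(): check the first argument. Exit codes 1 and 3 follow the question.
main() {
    # Lower-case once so the case patterns below stay short.
    # tr's [:upper:]/[:lower:] classes are POSIX; multi-byte support varies.
    action=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')

    case $action in
        '')
            echo '[ERROR] The action must be specified.' >&2
            exit 1 ;;
        h | -h | --h | help | -help | --help)
            usage ;;
        decode)  echo 'decoding...' ;;    # placeholder body
        redact)  echo 'redacting...' ;;   # placeholder body
        service) echo 'service...' ;;     # placeholder body
        *)
            echo "[ERROR] The action must be 'decode', 'redact' or 'service'." >&2
            exit 3 ;;
    esac
}
```

The real script would end by calling main "$@". That line is left out here so the functions can be loaded and tried on their own, for example with main --HELP.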


PS: I mention readability a lot here - that's because for any code where performance isn't critical (e.g. shell scripts - any performance-critical parts of the task should be done by external programs, not by shell itself) it's always better to choose readability over "clever tricks" or performance, especially if the trick or "optimisation" adds nothing of any real, practical value.

Code needs to be maintained, and readable code is easier to maintain and easier to understand when you need to update it in six months (or six years). As a general rule, an unreadable mess will require at least as much time and effort to understand, by future you or somebody else, as it took to write in the first place, and probably a lot more, because it's much easier to write something new than to decipher obfuscated cruft.

  • There's nothing stopping [ from doing regexp matching; the ones in zsh and yash do have a =~ operator. ksh's [[...]] and ((...)) were more about providing micro-languages that make it easier to perform tests (including arithmetic ones with ((...)), using a C-like language) to give feature parity with csh (which was the popular shell at the time). Commented Nov 24 at 6:58
  • Note POSIX doesn't guarantee that [, printf or cat are builtin. In practice, I don't know of any modern shell where [ is not builtin. There aren't many shells where cat is builtin. printf is not builtin in ksh88 or most pdksh derivatives. Commented Nov 24 at 7:04
  • yeah, but the OP wanted something that works in posix sh, not non-standard extensions that only work in specific shells. Commented Nov 24 at 7:04
  • tr '[:upper:]' '[:lower:]' in current versions of GNU tr only works for single-byte characters (so in UTF-8 locales, only on ASCII letters). Commented Nov 24 at 7:04
  • I never claimed or even suggested that cat was built-in. The only time I mentioned built-in was regarding [[ ... ]] - that is built-in to ksh, bash, zsh, etc and has a clear performance advantage over awk if used repeatedly in a loop. For a one-off execution, it wouldn't matter at all. Commented Nov 24 at 7:06

expr and awk are two POSIX utilities that can do regexp matching. expr using basic regexp¹ and awk a variant of extended regexp². expr suffers from a number of design flaws and is usually considered deprecated (even POSIX advises against using it) so is probably best avoided.
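For comparison, here is a sketch of the help test written with expr, which also shows some of its quirks. The function name expr_is_help is hypothetical. expr prints the length of the match and exits with status 1 when that length is 0. The match is anchored only at the start, so the trailing $ is needed. A value that looks like an expr operator can be misparsed, hence the common x prefix. A \(...\) group would make expr print the captured text instead, which is empty for a plain h, so two patterns are used rather than one.

```shell
# Sketch: help-option test with POSIX expr (basic regexps).
# expr_is_help is a hypothetical name.
expr_is_help() {
  # "x" prefix: keeps a value such as "(" from being read as an expr operator.
  # Trailing "$": expr anchors only at the start of the string.
  # The printed match length is discarded; only the exit status is used.
  expr "x$1" : 'x-\{0,2\}[Hh]$' >/dev/null ||
    expr "x$1" : 'x-\{0,2\}[Hh][Ee][Ll][Pp]$' >/dev/null
}
```

Even in this small case the workarounds pile up, which is a good illustration of why expr is usually avoided.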

While the [ aka test builtin of several shells (zsh and yash at least) can do regexp matching with a =~ operator, that's an extension over the POSIX standard so can't be used in a sh script.

Here, you could define match (and imatch for the case-insensitive variant) shell helper functions that invoke awk to do the regexp matching:

match() {
  awk -- 'BEGIN{exit(ARGV[1] !~ ARGV[2])}' "$@"
}
imatch() {
  awk -- 'BEGIN{exit(tolower(ARGV[1]) !~ tolower(ARGV[2]))}' "$@"
}

if imatch "$action" '^-{0,2}h(elp)?$'; then...

Note that strictly speaking, that imatch is not a proper way to do case-insensitive matching but it's good enough for ASCII only input and regexp.
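Here is a sketch of how imatch could cover all three of the question's tests. The function name classify_action and the words it prints are hypothetical. The sketch uses -?-? in place of -{0,2}, which matches the same thing, because, as footnote ² notes, not every awk supports the interval operator.

```shell
# imatch as defined above: case-insensitive ERE match via awk (ASCII-oriented).
imatch() {
  awk -- 'BEGIN{exit(tolower(ARGV[1]) !~ tolower(ARGV[2]))}' "$@"
}

# Hypothetical helper: print missing, help, action or invalid.
classify_action() {
  if [ -z "$1" ]; then
    echo missing
  elif imatch "$1" '^-?-?h(elp)?$'; then      # -?-? instead of -{0,2}
    echo help
  elif imatch "$1" '^(decode|redact|service)$'; then
    echo action
  else
    echo invalid
  fi
}
```

A caller could then branch on the printed word with a case statement. Each call runs at most two awk processes, which is negligible for a one-off argument check.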

Here, using a case construct would probably be as easy and more legible:

case $action in
  ([hH] | -[hH] | --[hH] | [Hh][Ee][Ll][Pp] | -[Hh][Ee][Ll][Pp] | --[Hh][Ee][Ll][Pp]) ...;;
esac
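Extended to the whole check from the question, a case-only version might look like the following sketch. It needs no external command, so nothing is forked, at the price of spelling out every letter. check_action is a hypothetical name. It prints help or action for the caller to act on, and returns the question's error codes 1 and 3.

```shell
# Sketch: the question's whole check using only case (no external commands).
check_action() {
  case $1 in
    ('')
      echo '[ERROR] The action must be specified.' >&2
      return 1 ;;
    ([Hh] | -[Hh] | --[Hh] | [Hh][Ee][Ll][Pp] | -[Hh][Ee][Ll][Pp] | --[Hh][Ee][Ll][Pp])
      echo help ;;
    ([Dd][Ee][Cc][Oo][Dd][Ee] | [Rr][Ee][Dd][Aa][Cc][Tt] | [Ss][Ee][Rr][Vv][Ii][Cc][Ee])
      echo action ;;
    (*)
      echo "[ERROR] The action must be 'decode', 'redact' or 'service'." >&2
      return 3 ;;
  esac
}
```

The bracket expressions are exactly the clutter that converting to lower case first avoids. The trade-off is one extra process per call.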

You could also strip up to two leading -s and convert to lowercase first:

tolower() {
  awk -- 'BEGIN{for (i = 1; i < ARGC; i++) print tolower(ARGV[i])}' "$@"
}
action=${action#-} action=${action#-}
action=$(tolower "$action")

case $action in
  (h | help) ...
esac

Here using awk's tolower() to do the case conversion. Alternatives in the POSIX tool chest include dd conv=lcase and tr '[:upper:]' '[:lower:]' but in the GNU tool chest, as of writing, only the awk one works on multi-byte characters.³
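The dash stripping and awk-based lower-casing could be wrapped in helper functions, as in the following sketch. The names normalize_action and is_help are hypothetical. Stripping at most two dashes mirrors the -{0,2} in the question's regex, so ---help is still rejected.

```shell
# awk-based tolower from above (gawk handles multi-byte text; mawk does not).
tolower() {
  awk -- 'BEGIN{for (i = 1; i < ARGC; i++) print tolower(ARGV[i])}' "$@"
}

# Hypothetical helper: strip at most two leading dashes, then lower-case.
# POSIX sh has no local variables, so _stripped is a global.
normalize_action() {
  _stripped=${1#-}
  _stripped=${_stripped#-}
  tolower "$_stripped"
}

# Hypothetical helper: succeed if the argument asks for help.
is_help() {
  case $(normalize_action "$1") in
    (h | help) return 0 ;;
    (*) return 1 ;;
  esac
}
```

If the normalised value is also used to dispatch decode, redact and service, inputs such as --decode will be accepted too, which the question's regex would reject.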

Note that the [[...]] construct is initially from ksh, not bash. It's been copied by a few shells including zsh, bash, yash, busybox ash with many variations.

Regex matching in there was first added in zsh in 2004 with the -pcre-match operator (PCREs have syntax for case insensitive matching) and then bash (with =~⁴ doing EREs) in 3.1 from 2005.

=~ was later added to more shells including zsh and ksh93. In zsh, for =~, you have the choice of either ERE or PCRE by (un)setting the rematchpcre option.

ksh93 globs were extended in ksh93r+ in 2006 to have syntax for regexp matching⁵, so you could do [[ $action = ~(Ei)^-{0,2}h(elp)?$ ]] for instance (equivalent to zsh's [[ $action -pcre-match '(?i)^-{0,2}h(elp)?$' ]] after zmodload zsh/pcre).


¹ with its : operator. Also note that expr regexp matching is implicitly anchored at the start (as if there was a hidden ^), not at the end.

² also recognises things like \n/\b/\123... on top of standard EREs, which means that except in busybox awk, you don't get back-references (there's no back-references in standard EREs anyway). Beware support for the interval operator ({x,y}) has been added only relatively recently in some awk implementations (very recently in the case of mawk).

³ though some GNU/Linux distributions have been known to maintain patches addressing that, so YMMV. Beware that mawk, the default awk implementation on Ubuntu, doesn't support multi-byte characters either.

⁴ =~ was likely inspired by the same operator in perl, itself likely inspired by awk's ~ operator.

⁵ And essentially, its =~ operator under the hood does a = with ~(E) prepended to the pattern.
