How do I print output to a directory in awk using a shell argument or command parameter?

Shell program invokes and passes arguments to awk program:

testshell.sh

shelloutputdir="./ouputdir/"
./testawk inputfile.txt ./outputdir/

Awk program:

testawk

#!/usr/bin/awk -f
{
    print FILENAME > "./outputdir/outputfile1.txt"
    fn2="outputfile2.txt"
    fn3="outputfile3.txt"
    fn4="outputfile4.txt"
    print FILENAME > ARGV[2]"/"fn2
    print FILENAME > ARGV[2]"subdir/"fn3
    print FILENAME > $shelloutputdir"subdir/"fn4
}

Note:

inputfile.txt

is only an example, as the shell and awk programs will process other arguments.

The output directories already exist.

./outputdir/
./outputdir/subdir/

The outputs:

./outputdir/outputfile1.txt
./outputdir/outputfile2.txt
./outputdir/subdir/outputfile3.txt

outputfile4.txt is not created

The error:

awk: ./testawk:9: (FILENAME=inputfile.txt FNR=1) fatal: can't redirect to `input text filesubdir/outputfile4.txt' (No such file or directory)

Summary of questions:

  1. How do I explicitly set the output directory in awk?

  2. How do I use a command line parameter to set the output directory in awk?

  3. How do I create a directory if it does not exist in awk?

  4. How do I pass a shell variable to an awk variable to set the output directory?

I'd appreciate help and any example approaches.

8
  • So to summarize, your questions are "How do I get command line parameters in an awk script?" and "How do I create a directory from within an awk script?" Commented Jun 16, 2017 at 0:11
  • Fairly close, "how do I get command line parameters in an awk script?" and both cases for the second question "output to an existing directory, by explicitly stating in the path in the awk script, and by the command line parameter", it would be good to have an example for creating the directory if it does not exist as well. Commented Jun 16, 2017 at 0:21
  • Here's a duplicate for getting command line arguments. You can already do print "foo" > "dir/subdir/file" to write to a file in subdirs that exist Commented Jun 16, 2017 at 0:31
  • It's better to write it as ARGV[2]"/"fn. If that produces a duplicate //, it's not a problem. Commented Jun 16, 2017 at 1:00
  • No, it's not a problem. Multiple / characters in a path are ignored by the system. Commented Jun 16, 2017 at 1:15

2 Answers

3

Using a shebang to execute the awk script just makes your life harder; don't do it. If you get rid of the shebang and write "testawk" as a shell script instead:

odir="$1"
shift
/usr/bin/awk -v odir="$odir" '
{
    print FILENAME > (odir "outputfile1.txt")
    fn2="outputfile2.txt"
    fn3="outputfile3.txt"
    fn4="outputfile4.txt"
    print FILENAME > (odir fn2)
    print FILENAME > (odir "subdir/" fn3)
    print FILENAME > (odir "subdir/" fn4)
}
' "$@"

then you can call it as:

shelloutputdir="./outputdir/"
./testawk "$shelloutputdir" inputfile.txt

or do whatever else you like. The point is that not using the shebang lets you separate awk from shell args and awk file names from awk variable initial values.

You can create a directory whose name is stored in the variable foo with:

system("mkdir -p \047" foo "\047")
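A minimal sketch of how the two pieces above could fit together, assuming the directory name never contains a single quote: the BEGIN block creates the directory tree once, and the main rule then writes into it. The scratch directory and file names are hypothetical and only there to make the sketch self-contained.

```shell
#!/bin/sh
# Scratch area so the sketch does not touch real files (hypothetical names).
tmp=$(mktemp -d)
printf 'line 1\nline 2\n' > "$tmp/inputfile.txt"
odir="$tmp/outputdir/"

awk -v odir="$odir" '
BEGIN {
    # Create odir and odir/subdir once, before any input is read.
    # \047 is a single quote; this breaks if odir itself contains one.
    system("mkdir -p \047" odir "subdir\047")
}
{
    # Parenthesized concatenation is the redirection target.
    print FILENAME > (odir "subdir/outputfile3.txt")
}
' "$tmp/inputfile.txt"

cat "$tmp/outputdir/subdir/outputfile3.txt"
```

Calling mkdir in BEGIN rather than in the main rule avoids spawning a shell for every input line.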

7 Comments

Thank you, this looks elegant. Implemented as above, but receiving an error "awk: cmd. line:3: (FILENAME=inputfile.txt FNR=1) fatal: can't redirect to odir=./ouputdir/outputfile1.txt (No such file or directory)"
@Gabe sounds like you used tstawk odir=./outputdir/ instead of tstawk ./outputdir/. "$@" is expanded by the shell to "$1" "$2" etc. Yes, that's exactly what you'd write.
minor note in the example above shelloutputdir="./ouputdir/" in my case is ="./outputdir/" can see the advantage of using ouputdir over outputdir
Nicely done; worth noting that system("mkdir -p \047" foo "\047") would break if foo contained a single quote and that a generic quoting-for-the-shell mechanism requires more work.
Thank you to everyone who has helped with this. I have now implemented this framework in my [xml extraction and validation pipeline]( stackoverflow.com/questions/44388628/…) and am very happy with the relative simplicity of extending this code base to scale for processing large data sets. Next is a performance measurement module.
1

Note:
* This answer addresses the question as asked, based on a stand-alone awk script that uses a shebang line (#!/usr/bin/awk -f).
* Ed Morton's helpful answer shows how to call awk from a shell script as an alternative, which has its advantages.

All operands passed to awk that come after the script operand (which is implicitly the stand-alone script itself, in this case) are by default interpreted as input files.

Given that ./outputdir/ is by definition a directory, it can't act as an input file, which is why you're getting the warning.

However, Awk offers the pseudo-filename operand syntax <var>=<value>, which, instead of passing a filename, defines an Awk variable, analogous to the pre-script -v <var>=<value> option syntax (and given that your invocation is by shebang line, -v-based variable assignment is not an option).

Note that these assignments happen as they're being encountered in the list of post-script operands, so you need to place them before actual input files whose processing relies on them:

shelloutputdir="./outputdir/"
./testawk odir="$shelloutputdir" inputfile.txt # Note the definition of variable `odir`
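The ordering rule can be checked with a small sketch. It assumes nothing beyond a POSIX awk; the file names a.txt and b.txt and the value out/ are made up for the demonstration. The variable is empty in BEGIN and for any file listed before the assignment, and set for files listed after it.

```shell
#!/bin/sh
# Work in a scratch directory with two tiny input files (hypothetical names).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'x\n' > a.txt
printf 'y\n' > b.txt

# The odir=out/ operand sits between the two files, so only b.txt sees it.
result=$(awk '
BEGIN { print "BEGIN: [" odir "]" }
      { print FILENAME ": [" odir "]" }
' a.txt odir=out/ b.txt)

printf '%s\n' "$result"
# Prints:
#   BEGIN: []
#   a.txt: []
#   b.txt: [out/]
```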

There is no limit on the number of variables you can define this way, but, at least hypothetically, you're limited by the maximum overall length of the command line, which is a value close to, but less than, what getconf ARG_MAX reports.

The above defines Awk variable odir, so your script needs to reference that:

#!/usr/bin/awk -f
{
    fn3="outputfile3.txt"
    print FILENAME > (odir "subdir/" fn3)
}

As Ed Morton points out, if the output filename is calculated from an expression, that expression should be enclosed in (...) for robustness; while it may also work without the parentheses in some Awk implementations (e.g., GNU Awk and Mawk), it will break in others (e.g., BSD/macOS Awk).
The Awk POSIX spec does not regulate the behavior in this situation.
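As a sketch of the parenthesized form, the following joins a directory and a file name inside (...). The directory is deliberately passed with a trailing slash and joined with another one, on the assumption (also made in the comments on the question) that the system treats the resulting // like a single /. The scratch directory and file names are hypothetical.

```shell
#!/bin/sh
# Scratch directory with the output tree already in place.
tmp=$(mktemp -d)
mkdir -p "$tmp/outputdir/subdir"
cd "$tmp" || exit 1
printf 'some input\n' > inputfile.txt

awk -v odir="./outputdir/" '
{
    fn3 = "outputfile3.txt"
    # Parentheses make the whole concatenation the redirection target;
    # the joined path is ./outputdir//subdir/outputfile3.txt.
    print FILENAME > (odir "/subdir/" fn3)
}
' inputfile.txt

cat outputdir/subdir/outputfile3.txt
```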


  1. How do I explicitly set the output directory in awk?

There is no Awk-internal mechanism, but you can use the shell to cd to the output directory beforehand.
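A minimal sketch of the cd approach, with hypothetical scratch paths. It assumes the input file can be named by absolute path, because relative input paths would be resolved against the new working directory.

```shell
#!/bin/sh
# Scratch layout: one input file and an existing output directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/outputdir"
printf 'some input\n' > "$tmp/inputfile.txt"

# The subshell keeps the cd from affecting the rest of the script.
(
    cd "$tmp/outputdir" || exit 1
    # A bare file name now lands in the output directory.
    awk '{ print FILENAME > "outputfile1.txt" }' "$tmp/inputfile.txt"
)

ls "$tmp/outputdir"
```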

  2. How do I use a command line parameter to set the output directory in awk?

See solution above. There is no special output-directory parameter in Awk, but you can pass the output-directory path as an Awk variable.
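A different idiom, not the one used in this answer, is to take the directory as a plain positional operand and remove it from ARGV in BEGIN so that awk does not treat it as an input file. The sketch below assumes the directory is always the first operand; the scratch paths are hypothetical.

```shell
#!/bin/sh
# Scratch directory with an existing output directory and one input file.
tmp=$(mktemp -d)
mkdir -p "$tmp/outputdir"
cd "$tmp" || exit 1
printf 'some input\n' > inputfile.txt

awk '
BEGIN {
    # Take the first operand as the output directory, then delete it
    # from ARGV so awk does not try to read it as an input file.
    odir = ARGV[1]
    delete ARGV[1]
}
{ print FILENAME > (odir "outputfile1.txt") }
' ./outputdir/ inputfile.txt

cat outputdir/outputfile1.txt
```

The same BEGIN block should also work in a stand-alone script started through a shebang line, since ARGV is populated the same way there.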

  3. How do I create a directory if it does not exist in awk?

There is no Awk-internal mechanism, but - if creating the dir. ahead of time in the shell is not an option - you can use the system() function to invoke mkdir; e.g.:

# If the dir. name never contains ' (single quotes):
awk -v odir="out-dir" 'BEGIN { system("mkdir \047" odir "\047") }'

# *From inside your stand-alone Awk script only*, you don't need \047 to represent
# ' chars - see below.
system("mkdir '" odir "'")

# Otherwise, more work is needed:
awk -v odir="out'dir" '
   function shellQuote(s) { gsub("\047", "\047\\\047\047", s); return "\047" s "\047" }
   BEGIN { system("mkdir " shellQuote(odir)) }
'

\047 is an octal escape sequence representing ', which must be used when calling awk explicitly from the shell: '...' is already being used to enclose the script as a whole, which prevents the use of embedded ' chars. altogether, because single-quoted shell strings do not support them.

This is one aspect in which a stand-alone awk script has an advantage over explicit awk invocation from the shell: you're free to use literal ' instances in the stand-alone script - no need for \047.
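To illustrate, the sketch below keeps the awk program in its own file, where literal ' characters are fine. It runs the file with awk -f rather than through a shebang line, purely so that it does not depend on where awk is installed. The file name mkdir.awk and the directory name are hypothetical, and the directory name is assumed to contain no single quote.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Write the awk program to its own file; the quoted heredoc delimiter
# keeps the shell from interpreting anything inside it.
cat > mkdir.awk <<'EOF'
BEGIN { system("mkdir '" odir "'") }
EOF

# The single quotes in the generated command protect the space in the name.
awk -v odir="out dir" -f mkdir.awk

ls -d "out dir"
```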

  4. How do I pass a shell variable to an awk variable to set the output directory?

See the answer to question #2.

1 Comment

Thank you, this is a very helpful and informative answer.
