I want to run the system command in an awk script and get its output stored in a variable. I've been trying to do this, but the command's output always goes to the shell and I'm not able to capture it. Any ideas on how this can be done?

Example:

$ date | awk --field-separator='!' '{ $1 = system("strip " $1); # more processing
}'

It should call the strip system command and, instead of sending the output to the shell, assign the output back to $1 for more processing. Right now, it sends the output to the shell and assigns the command's return code to $1.

    nit: The output isn't going to the shell, it's going to the terminal/console. The shell doesn't read any of the output of its children--they just share file descriptors that are associated with the same tty. Commented Dec 25, 2009 at 16:54

7 Answers 7

80

Note: coprocesses are specific to GNU awk. A portable alternative is to use getline:

cmd = "strip " $1
while ((cmd | getline result) > 0) {
  print result
}
close(cmd)

Calling close(cmd) prevents awk from throwing this error after a number of calls:

fatal: cannot open pipe `…' (Too many open files)
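
A minimal sketch of the same loop, collecting the command's output in a variable instead of printing it line by line. The `fold -w 1` pipeline and the "-" separator are illustrative assumptions, chosen only because the command emits several lines per input record:

```shell
# For each input word, run a shell pipeline and collect its lines in `out`.
result=$(printf 'abc\nxy\n' | awk '{
    cmd = "echo " $1 " | fold -w 1"       # stand-in command: one output line per character
    out = ""
    while ((cmd | getline line) > 0)      # 1 = got a line, 0 = EOF, -1 = error
        out = (out == "" ? line : out "-" line)
    close(cmd)                            # free the pipe before the next record
    print $1 " => " out
}')
printf '%s\n' "$result"
# prints:
# abc => a-b-c
# xy => x-y
```

Because the input field is pasted straight into a shell command, this form is only safe for input you trust.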


5 Comments

Thanks. This way, I can remove the & from my answer. Looks cooler. But I'm writing only for use on Linux, so the unavailability of gawk shouldn't be an issue?
Yes, it shouldn't be an issue. Still, you should check the documentation and see whether coprocesses are only available in certain versions of gawk. I can't remember off the top of my head.
From version 3.1. RedHat has 3.1.5. Anyway, I'll use the way you suggested, unless I want to send something to the stdin of the command, in which case a coprocess is helpful.
Awk never ceases to amaze me.
Note that if you have a for loop over the code above, then close(cmd) is necessary. I discovered the hard way that awk breaks out after 1018 iterations (this may depend on your system).
54

To run a system command in awk you can either use system() or cmd | getline.

I prefer cmd | getline because it lets you capture the output in a variable:

$ awk 'BEGIN {"date" |  getline mydate; close("date"); print "returns", mydate}'
returns Thu Jul 28 10:16:55 CEST 2016

More generally, you can store the command in a variable:

awk 'BEGIN {
       cmd = "date -j -f %s"   # -j and -f as used here are BSD/macOS date options
       cmd | getline mydate
       close(cmd)
       print "returns", mydate
     }'

Note that it is important to call close(): without it, you can get a "makes too many open files" error when the command is run many times (thanks to mateuscb for pointing this out in the comments).


With system(), the command's output is printed directly, and the value you can capture is its return code:

$ awk 'BEGIN {d=system("date"); print "returns", d}'
Thu Jul 28 10:16:12 CEST 2016
returns 0
$ awk 'BEGIN {d=system("ls -l asdfasdfasd"); print "returns", d}'
ls: cannot access asdfasdfasd: No such file or directory
returns 2
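
When only success or failure matters, the return code from system() is enough on its own. A small sketch, assuming `/` exists and the made-up path `/no/such/dir` does not:

```shell
# `test -d` prints nothing, so only the status returned by system() matters.
result=$(awk 'BEGIN {
    n = split("/ /no/such/dir", dirs, " ")
    for (i = 1; i <= n; i++) {
        status = system("test -d " dirs[i])     # 0 on success, non-zero otherwise
        print dirs[i], (status == 0 ? "exists" : "missing")
    }
}')
printf '%s\n' "$result"
# prints:
# / exists
# /no/such/dir missing
```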

5 Comments

+1 for adding close(). If you don't add it and have multiple results, you may get "makes too many open files". If you have a longer command, you can do cmd = "date -j -f %s"; cmd | getline mydate; close(cmd)
@mateuscb many thanks for your feedback. I updated the answer to include your useful comments.
Thanks for the reminder about close(). It helps a lot. Without close(), I sometimes got wrong date results when running the command multiple times. With close(), all my date results are displayed correctly.
close(cmd) was crucial for me when doing cmd | getline var in an awk function that was called several times. The second time it was called and the getline was triggered, var was no longer being populated.
close(cmd) helps a lot. First, it frees the file descriptor. Second, it also "flushes" stdout and thus makes the display better (but calling close for each operation does cost a little time; that cost should be paid, however).
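
The behaviour described in these comments can be reproduced with a short sketch. The variable names and the `echo hi` command are only illustrative:

```shell
result=$(awk 'BEGIN {
    cmd = "echo hi"
    cmd | getline a            # first read succeeds: a is "hi"
    r = (cmd | getline b)      # same pipe is already at EOF: r is 0 and b stays empty
    close(cmd)                 # closing lets the same command string run again
    cmd | getline c            # the command is started afresh: c is "hi"
    close(cmd)
    printf "a=%s b=[%s] r=%d c=%s\n", a, b, r, c
}')
printf '%s\n' "$result"   # prints: a=hi b=[] r=0 c=hi
```

awk identifies a pipe by its command string, so a second getline on an unclosed command keeps reading the old, exhausted pipe rather than running the command again.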
36

Figured it out.

We can use awk's two-way I/O:

{
  cmd = "strip " $1
  cmd |& getline $1
  close(cmd)
}

This passes $1 to strip as an argument, and getline reads strip's output back into $1.
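
The coprocess form is mainly useful when the command has to receive data on its stdin. A hedged sketch, assuming gawk is installed; `to_upper` and the `tr` filter are hypothetical names used only for illustration:

```shell
# Requires gawk: the |& operator is not available in other awks.
to_upper() {
    printf '%s\n' "$1" | gawk '{
        cmd = "tr a-z A-Z"        # stand-in filter that reads its stdin
        print $0 |& cmd           # write the record to the command
        close(cmd, "to")          # send EOF so the filter flushes its output
        cmd |& getline result     # read the reply
        close(cmd)                # fully close before the next record
        print result
    }'
}
# Run the demo only where gawk exists; it prints HELLO.
command -v gawk >/dev/null 2>&1 && to_upper hello
```

Closing the "to" end first matters here: many filters do not produce output until they see end of input, and the getline would otherwise wait forever.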

3 Comments

If you need to call the same command several times, you have to close the command (staff.science.uu.nl/~oostr102/docs/nawk/nawk_26.html#SEC29)
This is not awk but gawk-specific (GNU awk): "with gawk, it is possible to open a two-way pipe to another process"
Closing the command afterwards is important for large files (and probably for small ones as well)
6
gawk '{dt=substr($4,2,11); gsub(/\//," ",dt); cmd="date -d \""dt"\" +%s"; cmd|getline ts; close(cmd); print ts}'

2 Comments

If you post answers, you should explain the different parts (what you did and why it works) so that others can learn from your answer. For some people this line is self-explanatory, but for others it's hard to follow exactly what you did.
CAUTION: You should use close(cmd) along with getline; otherwise the results are wrong when run on bulk data.
5

You can use this when you need to process grep output:

echo "some/path/exex.c:some text" | awk -F: '{ "basename "$1"" |& getline $1; print $1 " ==> " $2}'

The option -F: tells awk to use : as the field separator.

"basename "$1"" runs the shell command basename on the first field.

|& getline $1 reads the output of that command back into $1 (|& is the gawk-specific coprocess operator; a plain | works here too).

output:
exex.c ==> some text
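
If the paths can contain spaces or other shell metacharacters, the field should be quoted before it is pasted into the command. A portable sketch using a plain pipe; `shquote` is a hypothetical helper, not a built-in:

```shell
result=$(printf '%s\n' 'some dir/ex ex.c:some text' | awk -F: '
# shquote() wraps s in single quotes so that sh treats it as one word.
function shquote(s,    q) {
    q = "\047"                      # a single quote character
    gsub(q, q "\\" q q, s)          # embedded quote becomes: quote, backslash, quote, quote
    return q s q
}
{
    cmd = "basename " shquote($1)
    cmd | getline name
    close(cmd)
    print name " ==> " $2
}')
printf '%s\n' "$result"   # prints: ex ex.c ==> some text
```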

3

I am using macOS's awk and I also needed the exit status of the command, so I extended @ghostdog74's solution to get the exit status too:

Exit if non-zero exit status:

cmd = <your command goes here>
cmd = cmd" ; printf \"\n$?\""

last_res = ""
value = ""        

while ( ( cmd | getline res ) > 0 ) {

    if (value == "") {
        value = last_res
    } else {
        value = value"\n"last_res
    }

    last_res = res
}

close(cmd)

# Now `res` has the exit status of the command
# and `value` has the complete output of command

if (res != 0) {
    exit 1
} else {
    print value
}

So basically I just changed cmd to print the exit status of the command on a new line. After the while loop above has run, res contains the exit status of the command and value contains the complete output of the command.

Honestly, this is not a very neat way, and I would like to know if there is a better one.

3 Comments

Nice trick, adding the return value as the last line. But maybe simpler: tmpfile="somename" ; cmd="thingyouwant >" tmpfile ; res=system(cmd) ; close(cmd) and then use a simple getline to parse tmpfile and get the output of thingyouwant? (And delete it afterwards with another cmd="rm " tmpfile that you system(cmd) and close(cmd) as well.)
Yes, that's much cleaner. I would suggest you add a new answer for that as well. I won't be able to test it right now for speed and correctness, but I will try that approach if it suits my code whenever I get back to it.
I believe the exit status is returned by close(cmd).
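
The temporary-file idea from the first comment can be sketched as follows. This is only an illustration: the failing command is made up, the file comes from mktemp in the calling shell, and it assumes system() returns the exit status of a command that exits normally:

```shell
tmp=$(mktemp)   # temporary file that will hold the command output
result=$(awk -v tmp="$tmp" 'BEGIN {
    cmd = "( echo partial; exit 3 )"      # hypothetical command that fails with status 3
    status = system(cmd " > " tmp)        # output goes to the file, status comes back
    out = ""
    while ((getline line < tmp) > 0)      # read the captured output back
        out = (out == "" ? line : out "\n" line)
    close(tmp)
    print "status=" status " output=" out
}')
rm -f "$tmp"
printf '%s\n' "$result"   # prints: status=3 output=partial
```

This keeps the status and the output separate, at the cost of one temporary file per command.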
0

Using GNU awk, I wanted to grab the output of the function call and store it so I could format everything with printf. You can't do that with system(), but you can with myCmd | getline myVar:

#!/usr/bin/env bash

hrbytes() {  # human readable bytes. numfmt is cool.
  local num;
  if [[ $# -lt 1 ]]; then
    read num;
  else
    num="$1"
  fi
  local from
  if [[ "$num" =~ [KMGTPEZY]i$ ]]; then
    from="--from=iec-i"
  elif [[ "$num" =~ [KMGTPEZY]$ ]]; then
    from="--from=si"
  fi
  # purposefully not quoting from to avoid empty string issues
  numfmt --to=iec-i --suffix=B --format="%.1f" $from "${num//,}"
}
export -f hrbytes

command time -l helm ls 2>&1 | 
  awk '/peak memory/ {"hrbytes " $1 | getline mem}; /[0-9.] real / {time=$1} END {printf "%ss; %s\n", time, mem}'

This calls hrbytes with the first field as its argument on lines matching that regex and stores the output in mem, which I can then reference in my printf command in the END block, after all input has been read.

This printed 1.49s; 152.1MiB, which was what I wanted to see.
