
So I have the equivalent of a list of files being output by another command, and it looks something like this:

http://somewhere.com/foo1.xml.gz
http://somewhere.com/foo2.xml.gz
...

I need to run the XML in each file through xmlstarlet, so I'm doing ... | xargs gzip -d | xmlstarlet ..., except I want xmlstarlet to be called once for each line going into gzip, not once on all of the XML documents appended to each other. Is it possible to compose 'gzip -d' and 'xmlstarlet ...' so that xargs supplies one argument at a time to their composition?
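In other words, assuming the list actually names local .xml.gz files (and using gzip -dc so the output goes to stdout), what I get now is roughly

gzip -dc foo1.xml.gz foo2.xml.gz | xmlstarlet ...

i.e. one xmlstarlet run over the concatenation, whereas what I want is the equivalent of

gzip -dc foo1.xml.gz | xmlstarlet ...
gzip -dc foo2.xml.gz | xmlstarlet ...

with one xmlstarlet run per file.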

  • You cannot do it in xargs when there is a pipe... well, you could, but in a very clumsy way... use something like shellter's answer instead. Commented Jul 23, 2011 at 14:09

4 Answers


Why not read your file and process each line separately in the shell? i.e.

fileList=/path/to/my/xmlFileList.txt
cat ${fileList} \
| while read fName ; do
   gzip -dc "${fName}" | xmlstarlet ... > "${fName}.new"
done

I hope this helps.


3 Comments

Exactly what I was going to suggest +1.
Don't abuse a cat... use while read fName; do ...; done < fList ;)
Yes, I was aware of that, but used cat because with some shells I seem to recall that ... done < ${fileList} (having a variable hold the input file) was not reliable (I think, or maybe it was something else ;-). Thanks for the reminder.
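For reference, the redirection form mentioned in the comments would look roughly like this (the xmlstarlet options being whatever the question's original pipeline uses):

fileList=/path/to/my/xmlFileList.txt
while read fName ; do
   gzip -dc "${fName}" | xmlstarlet ... > "${fName}.new"
done < "${fileList}"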

Although the right answer is the one suggested by shellter (+1), here is a one-liner "divertimento", provided that the input is the one proposed by Andrey (a command that generates the list of URLs) :-)

~$ eval $(command | awk '{a=a "wget -O - "$0" | gzip -d | xmlstarlet > $(basename "$0" .gz ).new; " } END {print a}')

It just generates a multi-command line that does wget -O - http://foo.xml.gz | gzip -d | xmlstarlet > $(basename foo.xml.gz .gz).new for each of the URLs in the input; the resulting command line is then evaluated with eval.
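For the two example URLs from the question, the string handed to eval would look roughly like this (all on one line):

wget -O - http://somewhere.com/foo1.xml.gz | gzip -d | xmlstarlet > $(basename http://somewhere.com/foo1.xml.gz .gz).new; wget -O - http://somewhere.com/foo2.xml.gz | gzip -d | xmlstarlet > $(basename http://somewhere.com/foo2.xml.gz .gz).new;

basename strips both the URL path and the .gz suffix, so the outputs end up as foo1.xml.new and foo2.xml.new in the current directory.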



Use GNU Parallel:

cat filelist | parallel 'zcat {} | xmlstarlet >{.}.out'

or if you want to include the fetching of URLs:

cat urls | parallel 'wget -O - {} | zcat | xmlstarlet >{.}.out'

It is easy to read and you get the added benefit of having one job per CPU run in parallel. Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
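Here {} is the whole input line and {.} is the line with its last extension removed, so for an input line of foo1.xml.gz the command parallel runs is roughly

zcat foo1.xml.gz | xmlstarlet >foo1.xml.out

By default parallel starts one job per CPU core; -j N changes the number of simultaneous jobs.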



If xmlstarlet can operate on stdin instead of having to pass it a filename, then:

some command | xargs -i -n1 sh -c 'zcat "{}" | xmlstarlet options ...'

The xargs option -i means you can use the "{}" placeholder to indicate where the filename should go. Use -n1 to indicate that xargs should take only one line at a time from its input.
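A variant of the same idea, assuming GNU xargs, that avoids splicing the filename into the shell string (safer if names contain spaces or shell metacharacters; -I is the non-deprecated spelling of -i and already implies one input line per invocation, so -n1 can be dropped):

some command | xargs -I{} sh -c 'zcat "$1" | xmlstarlet options ...' sh {}

Here the filename is passed to sh as a positional parameter ($1) rather than being substituted into the quoted script.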

