13

An oceanographer friend at work needs to back up many months' worth of data. She is overwhelmed, so I volunteered to do it. There are hundreds of directories to be backed up, and we want to tar/bzip each one into a file with the same name as the directory. I can do this easily enough serially, but I want to take advantage of the several hundred cores on my workstation.

Question: using find with xargs and its -n and -P args, or GNU Parallel, how do I tar/bzip the directories, using as many cores as possible, while naming the end product originalDirName.tar.bz2?

I have used find to bunzip 100 files simultaneously and it was VERY fast, so this seems the way to approach the problem, though I do not know how to name each output file after its directory.

3
  • 1
    Just tar to stdout and pipe it to pigz. (You most likely don't want to parallelize disk access, just the compression part.) Commented Jul 20, 2015 at 20:30
  • 2
    @PSkocik pigz is an answer. Could you add a one liner, in an answer. Commented Jul 20, 2015 at 20:34
  • Consider using xz compression, it is usually better than bzip2. Commented Jul 20, 2015 at 21:11

3 Answers

15

Just tar to stdout and pipe it to pigz. You most likely don't want to parallelize disk access, just the compression part. Note that pigz writes gzip output, so the result is a .tar.gz rather than a .tar.bz2:

$ tar -cf - myDirectory/ | pigz > myDirectory.tar.gz

A plain tar invocation like the one above only concatenates the files of a directory tree into a single stream, in a reversible way; it does not compress anything. The compression step can therefore be separate, as it is in this example.

pigz does multithreaded compression. The number of threads it uses can be adjusted with -p, and it defaults to the number of cores available. More detailed information can be found in the pigz GitHub repository.
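To tie the -p option to the naming requirement in the question, here is a minimal sketch of a helper that archives one directory and names the output after it. The function name archive_gz and the default of 4 threads are assumptions made for illustration; the gzip fallback is only there so the sketch still runs where pigz is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: archive_gz is a hypothetical helper, not part of pigz or tar.
# Usage: archive_gz DIRECTORY [THREADS]
archive_gz() {
    local dir="${1%/}"        # strip one trailing slash so the archive name is clean
    local threads="${2:-4}"   # assumed default; pigz itself defaults to all cores
    if command -v pigz >/dev/null 2>&1; then
        # tar writes the archive to stdout; pigz compresses it on $threads threads
        tar -cf - "$dir" | pigz -p "$threads" > "$dir.tar.gz"
    else
        # single-threaded fallback for systems without pigz
        tar -cf - "$dir" | gzip > "$dir.tar.gz"
    fi
}
```

Calling archive_gz myDirectory/ 16 would write myDirectory.tar.gz next to the directory, using 16 compression threads when pigz is available.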

1

pbzip2 works quite well. As with the answer above, tar to stdout and pipe to pbzip2:

$ tar -cf - mydir/ | pbzip2 > mydir.tar.bz2

pbzip2 accepts several options for adjusting the number of processors (-p#), the maximum memory used (-m#), the compression block size (-1 to -9), and so on.

http://compression.ca/pbzip2/

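Or, for one archive per directory (the variables are quoted so that directory names containing spaces also work). Note that this uses plain bzip2 and starts one compression job per directory, all at once:

for dir in * ; do
     [[ ! -d ${dir} ]] && continue
     tar cf - "${dir}" | bzip2 > "${dir}.tar.bz2" &
done
wait   # block until all background jobs have finished

1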

With GNU Parallel it looks like this:

parallel tar jcvf /tmp/{= s:/$:: =}.tar.bz2 {} ::: */

or:

parallel tar jcvf /tmp/{}.tar.bz2 {} ::: *

For better compression try:

parallel tar -I pxz -cvf /tmp/{= s:/$:: =}.tar.xz {} ::: */

s:/$:: is a Perl expression that removes the trailing / from each directory name.
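Where GNU Parallel is not installed, the find and xargs route mentioned in the question can be sketched roughly as follows. This is a minimal sketch, assuming a find and xargs that support -maxdepth, -print0, -0 and -P (the GNU and BSD versions do); archive_dirs_parallel and its three arguments are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: archive_dirs_parallel is a hypothetical helper.
# Usage: archive_dirs_parallel SOURCE_DIR OUTPUT_DIR [JOBS]
# Writes OUTPUT_DIR/<name>.tar.bz2 for every directory directly under SOURCE_DIR.
archive_dirs_parallel() {
    local src="$1" out="$2" jobs="${3:-4}"   # assumed default of 4 parallel jobs
    mkdir -p "$out"
    # -print0 / -0 keep names with spaces intact; -P runs up to $jobs jobs at once.
    # xargs appends one directory path per invocation, which arrives as $3.
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c '
            [ -n "$3" ] || exit 0          # nothing to do if find matched nothing
            name=$(basename "$3")
            # -C makes the paths inside the archive relative to the source directory
            tar -C "$1" -cf - "$name" | bzip2 > "$2/$name.tar.bz2"
        ' sh "$src" "$out"
}
```

For example, archive_dirs_parallel /data/cruises /backup 64 would run up to 64 tar and bzip2 pipelines at a time (both paths are placeholders). The output directory should live outside the source directory, otherwise find would try to archive it as well.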
