Yes, FileOutputCommitter can be used: when a task succeeds, it moves the contents of the temporary task directory to the final output directory and then deletes the temporary task directory.
I believe most of the built-in output formats in Hadoop that extend FileOutputFormat use an OutputCommitter, which by default is FileOutputCommitter.
This is the code from FileOutputFormat:
public synchronized
    OutputCommitter getOutputCommitter(TaskAttemptContext context
                                       ) throws IOException {
  if (committer == null) {
    Path output = getOutputPath(context);
    committer = new FileOutputCommitter(output, context);
  }
  return committer;
}
To write to multiple paths you can look into MultipleOutputs, which by default uses the OutputCommitter of the configured output format.
Or you can create your own output format by extending FileOutputFormat, override the function above, and create your own OutputCommitter implementation, using the FileOutputCommitter code as a reference.
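As a rough sketch of that approach (this assumes the org.apache.hadoop.mapreduce API is on the classpath; the class name MyOutputFormat is hypothetical, and the anonymous committer just delegates to the default behavior where you would plug in your own logic):

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical output format that plugs in a customized committer.
public class MyOutputFormat<K, V> extends TextOutputFormat<K, V> {
  private OutputCommitter committer;

  @Override
  public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
      throws IOException {
    if (committer == null) {
      Path output = getOutputPath(context);
      // Extend FileOutputCommitter and override the lifecycle hooks
      // (commitTask, abortTask, ...) with your own behavior.
      committer = new FileOutputCommitter(output, context) {
        @Override
        public void abortTask(TaskAttemptContext ctx) throws IOException {
          // Custom cleanup could go here; then fall back to the
          // default, which deletes the temporary work directory.
          super.abortTask(ctx);
        }
      };
    }
    return committer;
  }
}
```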
In the FileOutputCommitter code you will find this function, which you might be interested in:
/**
 * Delete the work directory
 */
@Override
public void abortTask(TaskAttemptContext context) {
  try {
    if (workPath != null) {
      context.progress();
      outputFileSystem.delete(workPath, true);
    }
  } catch (IOException ie) {
    LOG.warn("Error discarding output" + StringUtils.stringifyException(ie));
  }
}
If a task succeeds then commitTask() is called, which in the default
implementation moves the temporary task output directory (which has
the task attempt ID in its name to avoid conflicts between task
attempts) to the final output path, ${mapred.output.dir}. Otherwise,
the framework calls abortTask(), which deletes the temporary task
output directory.
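The commit step described above can be illustrated with plain JDK file operations (a simplified sketch, not Hadoop code; the directory and file names are made up to mirror ${mapred.output.dir} and a per-attempt temporary directory):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical layout: a final output directory and a temporary
    // directory named after the task attempt to avoid conflicts.
    Path outputDir = Files.createTempDirectory("mapred-output-dir");
    Path tempDir = outputDir.resolve("_temporary/attempt_0001_m_000000_0");
    Files.createDirectories(tempDir);

    // The task writes its output into the temporary attempt directory.
    Path part = tempDir.resolve("part-00000");
    Files.writeString(part, "key\tvalue\n");

    // commitTask(): promote the output to the final directory, then
    // remove the now-empty temporary attempt directory.
    Files.move(part, outputDir.resolve("part-00000"),
               StandardCopyOption.ATOMIC_MOVE);
    Files.delete(tempDir);

    System.out.println(Files.exists(outputDir.resolve("part-00000"))); // true
    System.out.println(Files.exists(tempDir));                         // false
  }
}
```

An abortTask() would instead delete the temporary attempt directory recursively, discarding the partial output while leaving the final directory untouched.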