Different output files

Question

I want Hadoop (0.22.0) to write its output into different files, such as:

part-r-00000
part-r-00001
part-r-00002
part-r-00003

Each reduce task should write to a different file. I know I can use the MultipleOutputs class, but that only lets me change the 'part' prefix, which is not what I want. I want to be able to say which reducer uses which output file and which number it gets at the end.

Do you really want to break this nice file naming convention? You can always change the file names when the job is done.
– www, Commented Mar 3, 2013 at 19:05
How can I change the file names when the job is done? In the cleanup of the reduce class the files do not exist yet, and after that I no longer have any control. I've read the source code, and it seems the number at the end comes from the TaskID, which is a unique identifier for the reducer. So I tried setting the number of reducers to 9 (I want part-r-00001 ... part-r-00009), but there is still only part-r-00001 :/
– user2329125, Commented Mar 3, 2013 at 19:25
Oh man, job.waitForCompletion was what I was looking for :D Big thanks to you.
– user2329125, Commented Mar 3, 2013 at 19:54

www · Accepted Answer · 2013-03-03 20:07:17Z

1

Of course you have control. Once the job has finished (e.g. after job.waitForCompletion(true) returns), you know the output path and the number of reducers that were used, so just rename the files; that's all. To run more reducers, you should write a partitioner class.
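A minimal sketch of the rename step, assuming the goal from the comments (shift part-r-00000 ... part-r-00008 to part-r-00001 ... part-r-00009). To stay runnable without Hadoop on the classpath it uses java.nio.file on a local directory; on HDFS the same loop would presumably call Hadoop's FileSystem.rename(Path, Path) instead of Files.move. The class name, method names, and the offset rule are hypothetical and only serve as an illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: renames reducer output files after the job has finished,
// i.e. after job.waitForCompletion(true) has returned in the driver.
class RenameReducerOutputs {

    // Default reducer output name for a given index, e.g. 3 -> part-r-00003.
    static String partName(int index) {
        return String.format(Locale.ROOT, "part-r-%05d", index);
    }

    // Assumed naming rule: keep the prefix and shift the number by a fixed offset.
    static String targetName(int reducerIndex, int offset) {
        return partName(reducerIndex + offset);
    }

    // Renames part-r-NNNNN for 0 <= NNNNN < numReducers and returns how many
    // files were renamed. Missing files are skipped.
    static int renameAll(Path outputDir, int numReducers, int offset) throws IOException {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        int renamed = 0;
        // Go from the highest index down so that a target name never collides
        // with a source file that has not been renamed yet.
        for (int i = numReducers - 1; i >= 0; i--) {
            Path source = outputDir.resolve(partName(i));
            if (!Files.exists(source)) {
                continue;
            }
            Files.move(source, outputDir.resolve(targetName(i, offset)));
            renamed++;
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the output directory of a job that ran with 9 reducers.
        Path dir = Files.createTempDirectory("job-output");
        for (int i = 0; i < 9; i++) {
            Files.createFile(dir.resolve(partName(i)));
        }
        int count = renameAll(dir, 9, 1);
        System.out.println("renamed " + count + " files in " + dir);
    }
}
```

In a real driver, the call to renameAll would sit right after job.waitForCompletion(true), using the output path and reducer count that were configured for the job.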

