I want Hadoop (0.22.0) to write its output into different files, like this:
part-r-00000
part-r-00001
part-r-00002
part-r-00003
Each reduce task should write to a different file. I know I can use the MultipleOutputs class, but that only lets me change the 'part' prefix, which is not what I want. I want to control which reducer writes to which output file and which number that file gets at the end.

  • Do you really want to break this nice file naming convention? You can always change the file names when the job is done. Commented Mar 3, 2013 at 19:05
  • How can I change the file names when the job is done? In the cleanup of the reducer class the files do not exist, and after that I don't have any control anymore. -- I've read the source code, and it seems the numbers at the end come from the TaskID, which is a unique identifier for the reducer. So I tried setting the number of reducers to 9 (I want part-r-00001 ... part-r-00009), but there is still only part-r-00001 :/ Commented Mar 3, 2013 at 19:25
  • Oh man, this job.waitForCompletion was what I was looking for :D Big thanks to you. Commented Mar 3, 2013 at 19:54

1 Answer

Of course you have control. Once the job has finished (e.g. after job.waitForCompletion(true) returns), you know the output path and the number of reducers that were used, so you can simply rename the files. To run more reducers, you should write a partitioner class.
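
The renaming step could look roughly like the sketch below. It is only an illustration under a few assumptions: the class ReducerOutputRenamer and its methods are hypothetical names, not part of Hadoop; it uses java.nio.file on a local directory as a stand-in for the job's output path; and it assumes the default part-r-NNNNN names with fewer than 100000 reducers. On HDFS the same loop would presumably go through Hadoop's FileSystem (listing the output directory and calling rename) after job.waitForCompletion(true) has returned.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): renumbers reducer output files
// once the job has finished.
class ReducerOutputRenamer {
    // Default reducer output names such as "part-r-00003" (assumes 5 digits).
    private static final Pattern PART = Pattern.compile("part-r-(\\d{5})");

    // Returns the reducer number encoded in the file name, or -1 if the
    // name is not a reducer output file (e.g. "_SUCCESS").
    static int reducerIndex(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Builds a file name in the same zero-padded style for the given number.
    static String partName(int index) {
        return String.format(Locale.ROOT, "part-r-%05d", index);
    }

    // Shifts every part file in outputDir up by a non-negative offset,
    // e.g. offset 1 turns part-r-00000 into part-r-00001.
    static void shiftPartFiles(Path outputDir, int offset) throws IOException {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        // Collect the part files keyed by reducer number.
        TreeMap<Integer, Path> parts = new TreeMap<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path p : stream) {
                int index = reducerIndex(p.getFileName().toString());
                if (index >= 0) {
                    parts.put(index, p);
                }
            }
        }
        // Rename from the highest number down so a target name is never
        // still occupied by a file that has not been moved yet.
        for (Map.Entry<Integer, Path> e : parts.descendingMap().entrySet()) {
            Path source = e.getValue();
            Files.move(source, source.resolveSibling(partName(e.getKey() + offset)));
        }
    }
}
```

Calling ReducerOutputRenamer.shiftPartFiles(outputDir, 1) in the driver after the job completes would then give files numbered from 00001 upwards. Any other mapping from reducer number to file name could be plugged in at the point where partName is called.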
