I want Hadoop (0.22.0) to write its output into different files, like:
part-r-00000
part-r-00001
part-r-00002
part-r-00003
Each reduce task should write to a different file.
I know I can use the MultipleOutputs class, but that only lets me change the 'part' prefix, which is not what I want. I want to be able to say which reducer uses which output file and what number it gets at the end.
Do you really want to break this nice file naming convention? You can always change the file names when the job is done. – www, Mar 3, 2013 at 19:05
How can I change the file names when the job is done? In the cleanup of the reduce class the files do not exist yet, and after that I no longer have any control. I've read the source code, and it seems the numbers at the end come from the TaskID, which is a unique identifier for the reducer. So I tried setting the number of reducers to 9 (I want part-r-00001 ... part-r-00009), but there is still only part-r-00001 :/ – user2329125, Mar 3, 2013 at 19:25
Oh man, this job.waitForCompletion was what I was looking for :D Big thanks to you. – user2329125, Mar 3, 2013 at 19:54
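For anyone landing here later, the approach from the comments (rename the part files in the driver once job.waitForCompletion has returned) might look roughly like the sketch below. It is only an illustration under some assumptions: the class PartFileRenamer and its methods are hypothetical names, and it uses java.nio on a local directory as a stand-in so it runs without a cluster. On HDFS the same loop would presumably go through org.apache.hadoop.fs.FileSystem and its rename(Path, Path) method instead of Files.move. It shifts every part-r-NNNNN by a fixed offset, so nine reducers with offset 1 would end up as part-r-00001 ... part-r-00009.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: renames reducer output files
// after the job has finished.
public class PartFileRenamer {
    // Matches the default reducer output names, e.g. "part-r-00003".
    private static final Pattern PART = Pattern.compile("part-r-(\\d{5})");

    // Returns the shifted name, or null if the name is not a reducer part file
    // (for example "_SUCCESS").
    static String targetName(String name, int offset) {
        Matcher m = PART.matcher(name);
        if (!m.matches()) {
            return null;
        }
        int number = Integer.parseInt(m.group(1)) + offset;
        return String.format(Locale.ROOT, "part-r-%05d", number);
    }

    // Renames every part file in dir. Assumes offset >= 0: files are handled
    // from the highest number down, so a shifted name never collides with a
    // file that has not been renamed yet.
    static void renameAll(Path dir, int offset) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-r-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Zero-padded names sort lexicographically in numeric order.
        Collections.sort(parts, Collections.reverseOrder());
        for (Path p : parts) {
            String target = targetName(p.getFileName().toString(), offset);
            if (target != null) {
                Files.move(p, p.resolveSibling(target));
            }
        }
    }
}
```

In the driver this would be called only after job.waitForCompletion(true) has returned true, for example PartFileRenamer.renameAll(outputDir, 1), where outputDir is whatever output directory the job was configured with.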