Generating Multiple Output files with Hadoop 0.20+

Question

I am trying to write the results of my reducer to multiple files. One file holds the full set of results, and the remaining output is split by category into respective files. I know that in 0.18 you can do this with MultipleOutputs, and that class has not been removed. However, I am trying to make my application 0.20+ compliant. The existing MultipleOutputs functionality still requires JobConf, whereas my application uses Job and Configuration. How can I generate multiple outputs based on the key?

Binary Nerd · Accepted Answer · 2010-02-01 23:41:55Z

9

Support for MultipleOutputs isn't in the new API in 0.20. You will need to use the older, JobConf-based API.

It has been added to 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this problem.



3 Comments

monksy Over a year ago

That is incredibly frustrating and stupid. That seems like a fundamental thing that is needed in the program.

Steve Severance Over a year ago

Yeah. A lot of work is going on toward having the correct API interface for 1.0

smartnut007 Over a year ago

Not if you use the CDH distribution. CDH3 is 0.20.1 plus some patches, including the MultipleOutputs class. I was initially reluctant to use CDH and was using the Apache distro, but after a couple of issues I am happier with CDH.

mrflip · Accepted Answer · 2010-02-03 01:06:27Z

2

You can do this in Hadoop 0.20; as mentioned, you just have to use the older API.

There's some very rough code to do so in http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
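To illustrate the "file named after its (sanitized) key" idea, here is a minimal sketch in plain Java, not the code from the linked repository. The class and method names (KeyFileNames, sanitize, fileNameFor) and the exact sanitizing rules are my own assumptions. With the older API, logic like this would typically go inside an override of generateFileNameForKeyValue in a subclass of org.apache.hadoop.mapred.lib.MultipleTextOutputFormat, where the last argument is the default leaf name such as part-00000.

```java
// Hypothetical helper, not part of Hadoop: turns a record key into a safe
// output path of the form <sanitized-key>/<leaf-name>.
class KeyFileNames {

    // Map an arbitrary key to a string that is safe to use as a file name.
    // The rules below are an assumption chosen for illustration.
    static String sanitize(String key) {
        if (key == null || key.isEmpty()) {
            return "_empty";
        }
        // Replace every character outside [A-Za-z0-9._-] with an underscore.
        String cleaned = key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Avoid names that begin with a dot (hidden files, "." and "..").
        if (cleaned.startsWith(".")) {
            cleaned = "_" + cleaned.substring(1);
        }
        return cleaned;
    }

    // Build the relative output path for one record. leafName stands in for
    // the default part-file name the framework would pass in.
    static String fileNameFor(String key, String leafName) {
        return sanitize(key) + "/" + leafName;
    }

    public static void main(String[] args) {
        String[] keys = {"books", "sci/fi & fantasy", "", "../etc"};
        for (String key : keys) {
            System.out.println("[" + key + "] -> " + fileNameFor(key, "part-00000"));
        }
    }
}
```

Putting the key in a directory component rather than in the file name itself keeps the per-reducer part files from colliding when several reducers see the same category.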

