
I have a MapReduce program whose output is currently all in text files. A sample of the program is below. What I do not understand is how to output the key/value pairs from the reducer in sequence file format. No, I can't use the SequenceFileFormat specifier because I'm using the Hadoop 0.20 library.

So what do I do? Below is a sample. The word count program is just one small part of my larger program; if I know how to do it with one, I can do it with the rest. Please help.

Word count reducer:

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        System.out.println("reducer.output: " + key.toString() + " " + sum);

        context.write(key, new IntWritable(sum)); // RIGHT HERE!! outputs to text
    }
}
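For intuition, the reducer above is just summing the counts that arrive for each key. The same aggregation in plain Java, stripped of the Hadoop `Text`/`IntWritable` wrappers (the `ReduceSketch` class name is hypothetical, for illustration only):

```java
import java.util.List;

public class ReduceSketch {
    // Stand-in for reduce() above: sum all values that arrived for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // e.g. the word "the" seen with partial counts 1, 1 and 2
        System.out.println(ReduceSketch.reduce(List.of(1, 1, 2))); // prints 4
    }
}
```

The output format question is entirely separate from this logic: `context.write(key, value)` hands the pair to whatever `OutputFormat` the job is configured with.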

Now here is the main program that runs this (I left out the mapper and other irrelevant details):

Configuration conf = new Configuration();

Job job = new Job(conf, "Terms");
job.setJarByClass(wordCount.class);

//Outputting key/value pairs as a dictionary (like a Python dict)
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

//Setting the mapper and reducer classes
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);


//Setting the type of input format. In this case, plain TEXT
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

I know how to convert a text file to a sequence file. I know how to do the opposite. That isn't the issue here. I couldn't find any example of actually doing this in a hadoop program which is why I am stuck.

So the output that I want is for this program to write the key/value pairs to a sequence file instead of a text file.

I also want to know how to read IN a sequence file with the Mapper

Any help would be greatly appreciated.

1 Answer
I believe it suffices to change the input and output formats. The key/value pairs should be the same once they are encoded/decoded correctly. So use:

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

and

job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);

Give it a try, as I have not done this in a while...
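Putting that together with the driver from the question, a minimal sketch might look like the following. This assumes the `wordCount`, `Map`, and `Reduce` classes from the question; the input/output paths taken from `args` are my addition, and I have not run this against 0.20:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

Configuration conf = new Configuration();
Job job = new Job(conf, "Terms");
job.setJarByClass(wordCount.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

// Read sequence files in, write sequence files out.
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

System.exit(job.waitForCompletion(true) ? 0 : 1);
```

One caveat on reading a sequence file in the Mapper: with `SequenceFileInputFormat`, the mapper's input key/value types must match the key/value classes stored in the sequence file, rather than the `LongWritable`/`Text` pairs that `TextInputFormat` produces. If your first job still reads raw text, keep `TextInputFormat` there and only switch the output format; use `SequenceFileInputFormat` in the jobs that consume its output.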
