
I have written a MapReduce job for log file analysis. My mapper outputs Text for both the key and the value, and I have explicitly set the map output classes in my driver class.

But I still get the error:

Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CompositeUserMapper extends Mapper<LongWritable, Text, Text, Text> {

    IntWritable a = new IntWritable(1);
    //Text txt = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();

        // Match an 8-digit user id; the capturing group is needed for m.group(1)
        Pattern p = Pattern.compile("\\b(\\d{8})\\b");
        Matcher m = p.matcher(line);
        String userId = "";
        String CompositeId = "";
        if (m.find()) {
            userId = m.group(1);
        }

        CompositeId = line.substring(line.indexOf("compositeId :") + 13).trim();

        context.write(new Text(CompositeId), new Text(userId));

        // TODO Auto-generated method stub
        super.map(key, value, context);
    }
}

My driver class is as below:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CompositeUserDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        CompositeUserDriver wd = new CompositeUserDriver();
        int res = ToolRunner.run(wd, args);
        System.exit(res);
    }

    public int run(String[] arg0) throws Exception {
        Job job = new Job();
        job.setJarByClass(CompositeUserDriver.class);
        job.setJobName("Composite UserId Count");

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.setMapperClass(CompositeUserMapper.class);
        job.setReducerClass(CompositeUserReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }
}

Please advise how I can sort this problem out.

1 Answer


Remove the super.map(key, value, context); line from your mapper code: it calls the map method of the parent class, which is an identity mapper that passes on the key and value it receives. In this case that key is the byte offset from the beginning of the file, a LongWritable, which conflicts with the Text key type you declared.
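For reference, a minimal sketch of the map method with that line removed (otherwise the same logic as in the question; both key and value are Text, matching the declared map output classes):

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();

    // Same parsing logic as in the question
    Pattern p = Pattern.compile("\\b(\\d{8})\\b");
    Matcher m = p.matcher(line);
    String userId = "";
    if (m.find()) {
        userId = m.group(1);
    }
    String CompositeId = line.substring(line.indexOf("compositeId :") + 13).trim();

    // Emit (Text, Text), matching setMapOutputKeyClass/setMapOutputValueClass
    context.write(new Text(CompositeId), new Text(userId));
    // No call to super.map(key, value, context) here
}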


4 Comments

Thanks, the idea seems to work. But when is that parent class's method used? Why should I remove it from my code?
It is not so much that you should "remove" it as that you should not have added it. You are extending the Mapper class and setting your new class as the mapper with the job.setMapperClass call in the driver. Without that call, the default Mapper.class would be used (the one you are extending), and its map implementation simply passes through the key-value pair it receives without any changes. It seems your code was auto-generated and super.map(key, value, context); was originally the only line in the map method; in that case it works fine, provided the right key and value types are set.
Yes, the code was auto-generated, but super.map(key, value, context); wasn't the only line in my code; it was the last line, and I had written my own logic to process the key and value. That's why I was confused about why it still preferred calling the parent class's map logic.
It has no preference; it does exactly what you have written: first it outputs a pair of Texts via your context.write(new Text(CompositeId), new Text(userId)); call, then it calls super.map(key, value, context);, which in turn calls context.write(key, value); (see the sketch just below). So both outputs are collected, but their key types differ, while the Hadoop MapReduce framework requires the key type to be the same for every record a mapper emits.
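For context, the inherited map method that the super call invokes is essentially a pass-through; roughly, the base class looks like this (simplified from org.apache.hadoop.mapreduce.Mapper):

// Default map implementation in org.apache.hadoop.mapreduce.Mapper:
// it simply forwards the incoming key/value pair, so calling it from the
// subclass emits a (LongWritable, Text) pair alongside the (Text, Text) pairs.
protected void map(KEYIN key, VALUEIN value, Context context)
        throws IOException, InterruptedException {
    context.write(key, value);
}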
