
I'm relatively new to Hadoop and I'm struggling a bit to understand the ClassNotFoundException I get when trying to run my job. I'm using the standard tutorial found here, and here is my WordCount class (running on Ubuntu 16.04, Hadoop 2.7.3, distributed cluster mode):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

To try to stay organized, I added a couple of paths to my ~/.bashrc file:

hduser@mynode:~$ cd $HADOOP_CODE
hduser@mynode:/usr/local/hadoop/code$

This is one directory down from the $HADOOP_HOME directory. To compile the WordCount.java file, I ran:

hduser@mynode:/usr/local/hadoop$ hadoop com.sun.tools.javac.Main $HADOOP_CODE/WordCount.java
hduser@mynode:/usr/local/hadoop$ jar cf wc.jar $HADOOP_CODE/WordCount*.class
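
For what it's worth, jar tf lists the entry names that actually got packed into the jar. I haven't pasted real output here, but given the absolute paths used in the jar cf command above, I'd expect the classes to show up nested under the path prefix, roughly like this:

hduser@mynode:/usr/local/hadoop$ jar tf wc.jar
META-INF/
META-INF/MANIFEST.MF
usr/local/hadoop/code/WordCount.class
usr/local/hadoop/code/WordCount$TokenizerMapper.class
usr/local/hadoop/code/WordCount$IntSumReducer.class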

I then tried:

hduser@mynode:/usr/local/hadoop$ hadoop jar $HADOOP_CODE/wc.jar $HADOOP_CODE/WordCount /home/hduser/input /home/hduser/output/wordcount

which bombed with the following error:

Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/code/WordCount
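
For reference, my understanding from the Hadoop command guide is that the second argument here gets resolved as a class name inside the jar rather than as a filesystem path, i.e. the general form is:

hadoop jar <jar file> [mainClass] args...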

EDIT: This gave me the same error:

hduser@mynode:/usr/local/hadoop/code$ hadoop jar $HADOOP_CODE/wc.jar WordCount /home/hduser/input /home/hduser/output/wordcount

To get it to run without error, I moved the WordCount.java file up one directory into the default Hadoop ($HADOOP_HOME) folder. I also know from here and here that the solution is to add a package declaration to the file.
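
For context, the package-based fix from those answers would look roughly like the sketch below; mypackage is just a placeholder name I picked for illustration, not anything prescribed by the tutorial:

package mypackage;  // placeholder package name, for illustration only

public class WordCount {
  // ... same TokenizerMapper, IntSumReducer, and main as above ...
}

The compiled classes would then need to sit under a matching mypackage/ directory inside the jar, and the job would be launched with the fully qualified class name:

hduser@mynode:/usr/local/hadoop$ hadoop jar $HADOOP_CODE/wc.jar mypackage.WordCount /home/hduser/input /home/hduser/output/wordcount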

What I'm trying to understand is why that is the solution. With no package name, where is Hadoop looking for the specified class, and why can't I pass it a full path to get it to run correctly? This may be a basic Java question (apologies - I'm from the Python world), but what does the package name do during the compile process that lets me run the job without a path name, while leaving the package name off means the class has to live in that default directory? I'd prefer not to have to add a package name to every job I run. An explanation would be greatly appreciated!
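
In case it clarifies what I'm asking, the no-package workaround I'm currently picturing is rebuilding the jar from inside $HADOOP_CODE with relative paths, so the class entry sits at the root of the jar and plain WordCount matches it. This is an untested sketch, not something I've verified:

hduser@mynode:/usr/local/hadoop/code$ hadoop com.sun.tools.javac.Main WordCount.java
hduser@mynode:/usr/local/hadoop/code$ jar cf wc.jar WordCount*.class
hduser@mynode:/usr/local/hadoop/code$ hadoop jar wc.jar WordCount /home/hduser/input /home/hduser/output/wordcount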

  • Your argument $HADOOP_CODE/WordCount needs to be a class name, not a file path. Commented Dec 2, 2016 at 3:22
  • And package names have periods, not slashes like file paths, just like Python modules. Commented Dec 2, 2016 at 3:24
  • I tried just WordCount, but without the class being compiled into the $HADOOP_HOME folder it errored out. I wasn't trying to point to a package, I was trying to pass a path to the class. Commented Dec 2, 2016 at 3:24
  • And what was the error then? Commented Dec 2, 2016 at 3:25
  • Edited above - same error. Commented Dec 2, 2016 at 3:27

