
I'm relatively new to Hadoop and I'm struggling a bit to understand the ClassNotFoundException I get when trying to run my job. I'm using the standard tutorial found here, and here is my WordCount class (running on Ubuntu 16.04, Hadoop 2.7.3, distributed cluster mode):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

To try to stay organized, I added a couple of paths to my ~/.bashrc file:

hduser@mynode:~$ cd $HADOOP_CODE
hduser@mynode:/usr/local/hadoop/code$

This is one directory down from the $HADOOP_HOME directory. To compile the WordCount.java file, I ran:

hduser@mynode:/usr/local/hadoop$ hadoop com.sun.tools.javac.Main $HADOOP_CODE/WordCount.java
hduser@mynode:/usr/local/hadoop$ jar cf wc.jar $HADOOP_CODE/WordCount*.class
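
For what it's worth, jar tf lists the entry names that actually got packed into the jar. I haven't pasted real output here, but given the absolute paths used in the jar cf command above, I'd expect the classes to show up nested under the path prefix, roughly like this:

hduser@mynode:/usr/local/hadoop$ jar tf wc.jar
META-INF/
META-INF/MANIFEST.MF
usr/local/hadoop/code/WordCount.class
usr/local/hadoop/code/WordCount$TokenizerMapper.class
usr/local/hadoop/code/WordCount$IntSumReducer.class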

I then tried:

hduser@mynode:/usr/local/hadoop$ hadoop jar $HADOOP_CODE/wc.jar $HADOOP_CODE/WordCount /home/hduser/input /home/hduser/output/wordcount

which bombed with the following error:

Exception in thread "main" java.lang.ClassNotFoundException: /usr/local/hadoop/code/WordCount
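
For reference, my understanding from the Hadoop command guide is that the second argument here gets resolved as a class name inside the jar rather than as a filesystem path, i.e. the general form is:

hadoop jar <jar file> [mainClass] args...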

EDIT: This gave me the same error:

hduser@mynode:/usr/local/hadoop/code$ hadoop jar $HADOOP_CODE/wc.jar WordCount /home/hduser/input /home/hduser/output/wordcount

To get it to run without error, I moved the WordCount.java file up one directory into the default Hadoop ($HADOOP_HOME) folder. I also know from here and here that the solution is to add a package declaration to the file.
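
For context, the package-based fix from those answers would look roughly like the sketch below; mypackage is just a placeholder name I picked for illustration, not anything prescribed by the tutorial:

package mypackage;  // placeholder package name, for illustration only

public class WordCount {
  // ... same TokenizerMapper, IntSumReducer, and main as above ...
}

The compiled classes would then need to sit under a matching mypackage/ directory inside the jar, and the job would be launched with the fully qualified class name:

hduser@mynode:/usr/local/hadoop$ hadoop jar $HADOOP_CODE/wc.jar mypackage.WordCount /home/hduser/input /home/hduser/output/wordcount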

What I'm trying to understand is why that is the solution. With no package name, where is Hadoop looking for the specified class, and why can't I pass it a full path to get it to run correctly? This may be a basic Java question (apologies - I'm from the Python world), but what does the package name do during the compile process that lets me run the job without a path name, while leaving the package name off means the class has to live in that default directory? I'd prefer not to have to add a package name to every job I run. An explanation would be greatly appreciated!
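
In case it clarifies what I'm asking, the no-package workaround I'm currently picturing is rebuilding the jar from inside $HADOOP_CODE with relative paths, so the class entry sits at the root of the jar and plain WordCount matches it. This is an untested sketch, not something I've verified:

hduser@mynode:/usr/local/hadoop/code$ hadoop com.sun.tools.javac.Main WordCount.java
hduser@mynode:/usr/local/hadoop/code$ jar cf wc.jar WordCount*.class
hduser@mynode:/usr/local/hadoop/code$ hadoop jar wc.jar WordCount /home/hduser/input /home/hduser/output/wordcount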

  • Your argument $HADOOP_CODE/WordCount needs to be a class name, not a file path. Commented Dec 2, 2016 at 3:22
  • And package names have periods, not slashes like file paths, just like Python modules. Commented Dec 2, 2016 at 3:24
  • I tried just WordCount, but without the class being compiled into the $HADOOP_HOME folder it errored out. I wasn't trying to point to a package, I was trying to pass a path to the class. Commented Dec 2, 2016 at 3:24
  • And what was the error then? Commented Dec 2, 2016 at 3:25
  • Edited above - same error. Commented Dec 2, 2016 at 3:27

