10

I am new to hadoop. I followed the maichel-noll tutorial to set up hadoop in single node.I tried running WordCount program. This is the code I used:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "WordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

This is what I get when I try running it.

hduser@aswin-HP-Pavilion-15-Notebook-PC:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt
Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/myorg/WordCount)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:788)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:447)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

Can anyone please help me. My class path :

hduser@aswin-HP-Pavilion-15-Notebook-PC:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/lib/jvm/java-7-openjdk-i386/lib/tools.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
6
  • You are missing Driver class (although you combined it with reducer which is bad practice). Btw do you have this jar in your classpath? Commented Nov 2, 2014 at 15:26
  • Have added the class path in the question. Should I place the jar file in one of the above folder. Commented Nov 2, 2014 at 15:35
  • No place it in same directory from where you are running your mapred or place it in lib directory where hadoop libs sits. Commented Nov 2, 2014 at 15:44
  • I tried placing it in lib and also share/lib/mapreduce. Still the same error. Commented Nov 2, 2014 at 15:54
  • just saw this "org/myorg" you are using package so use org.myorg.WordCount so your command would be: bin/hadoop jar wc.jar org.myorg.WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt Commented Nov 2, 2014 at 15:56

7 Answers 7

10

try this,

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);

            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                output.collect(value, new IntWritable(1));
            }

        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);

    }
}

then run command

bin/hadoop jar WordCount.jar WordCount /hdfs_Input_filename /output_filename

if your code is in particular package then you have to mention package name with class name

bin/hadoop jar WordCount.jar PakageName.WordCount /hdfs_Input_filename /output_filename
Sign up to request clarification or add additional context in comments.

5 Comments

I get the same No class def found error. Btw, this is how I compile and create jar. Is this rite : $ bin/hadoop com.sun.tools.javac.Main WordCount.java $ jar cf wc.jar WordCount*.class
What's problem with your code, your program is in particular package or in default package in eclipse???
Just look at my solution. I know its crazy :D . Thanks a lot. I actually used your code. :)
your answer is missing the explanation as to why the above didn't work OR why your solution works!
@KishoreKumarSuthar can you suggest some good tutorials for mapreduce?
1

This may sound crazy. I added package org.myorg; to my code and compiled it again. I placed the class files in org/myorg folder and created the jar file using them. Then I ran using the jar wc.jar org.myorg.WordCount command and it got executed successfully. It would be nice if someone could explain me how it actually ran :D . Any way, thanks a lot for helping me guys.

Comments

1

try explicitly including the nested classes(i.e. TokenizerMapper and IntSumReducer) in you jar file. Here is how I did it:

jar cvf WordCount.jar WordCount.class WordCount\$TokenizerMapper.class WordCount\$IntSumReducer.class

1 Comment

This did it for me. The error I was getting was "NoClassDefFoundError: TokenizerMapper"
1

there is something wrong with packaging. you should try this:

jar cf wc.jar WordCount*.class

notice there is a symbol '*'

Comments

0

You are using package in your class. So your command should be

bin/hadoop jar wc.jar org.myorg.WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt 

1 Comment

I get a class not found exception. Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.WordCount
0

I think you made a mistake here :

/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt

please change it to :

/usr/local/hadoop$ bin/hadoop jar wc.jar org.myorg.WordCount /home/hduser/gutenberg /home/hduser/gutenberg-output/sample.txt

that should work.

@Aswin Alagappan : Reason is a jar file cotains your path in it. JVM cannot find your class in the jar file becase it is in the "jar\org\myorg" path. Understand?

Comments

0

The answer of Kishore, allowed me to go in the right direction, if it’s possible i want to confirm this, reporting what I did about an experiiment with java code on moltiplication of sparse matrix :

1) Source code (downloaded from https://github.com/marufaytekin/MatrixMultiply/tree/master/src/main/java/com/lendap/hadoop), and saved in /home/hduser/playground/src/matrixMult

2) Downloaded datasets (matrix M and N from https://github.com/marufaytekin/MatrixMultiply/tree/master/input), and then saved in HDFS, with the following path : /user/hduser/inMatrix

3) Compilation with hadoop classes, with creation of java Classes in playground/classes5 : javac -classpath $HADOOP_HOME/share/hadoop/common/lib/activation-1.1.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar:/usr/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/* -d playground/classes5 playground/src/matrixMult/*

4) Creation of jar file MatrixMultiply.jar with the following command : jar -cvf playground/MatrixMultiply.jar -C playground/classes5/ .

5) hadoop mapReduce command (from the $HADOOP_HOME path, that in my case is /usr/hadoop/hadoop-2.7.1$ hadoop jar /home/hduser/playground/MatrixMultiply.jar com.lendap.hadoop.MatrixMultiply /user/hduser/inMatrix/ outputMatrix

6) Correct execution of mapreduce job on my 4 nodes cluster. Here, part of the final output :

0,375,890.0 0,376,1005.0 0,377,1377.0 0,378,604.0 0,379,924.0 0,38,476.0 0,380,621.0 0,381,730.0

990,225,542.0 990,226,639.0 990,227,466.0 990,228,406.0 990,229,343.0 990,23,397.0 990,230,794.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.