
Please help me solve this issue with rmr2 (RHadoop integration).

I am using cloudera-quickstart-vm-5.4.0-0-virtualbox

version details:

Hadoop 2.6.0-cdh5.4.0
java version "1.7.0_67"
R version 3.2.0 
rmr 2.3.0

Below are the R code and the error:

Sys.setenv("HADOOP_HOME"="/usr/lib/hadoop")
Sys.setenv("HIVE_HOME"="/usr/lib/hive")
Sys.setenv("HADOOP_CMD"="/usr/lib/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.4.0.jar")
Sys.getenv("HADOOP_CMD")
[1] "/usr/lib/hadoop/bin/hadoop"
Sys.setenv("RHIVE_FS_HOME"="/home/rhive")
Sys.setenv(JAVA_HOME="/usr/java/jdk1.7.0_67-cloudera")
library(rmr2)
library(rhdfs)
hdfs.init()

## map function
map <- function(k, lines) {
  words.list <- strsplit(lines, '\t')   # split each line on tab characters
  words <- unlist(words.list)
  return(keyval(words, 1))
}

## reduce function
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text",
  map=map, reduce=reduce)
}
## HDFS paths for the input file and the output directory
hdfs.root <- 'wordcount'
hdfs.data <- file.path(hdfs.root, 'input/d.txt')
hdfs.out <- file.path(hdfs.root, 'out')
out <- wordcount(hdfs.data, hdfs.out)

15/08/10 05:16:40 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/tmp/RtmpQRIzWy/rmr-local-env17807fd4e342, /tmp/RtmpQRIzWy/rmr-global-env17807c276b72, /tmp/RtmpQRIzWy/rmr-streaming-map17807a097c08, /tmp/RtmpQRIzWy/rmr-streaming-reduce178012121630] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.4.0.jar] /tmp/streamjob8275237671226206531.jar tmpDir=null
15/08/10 05:16:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/10 05:16:42 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/10 05:16:42 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/10 05:16:42 INFO mapreduce.JobSubmitter: number of splits:2
15/08/10 05:16:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439208338231_0002
15/08/10 05:16:43 INFO impl.YarnClientImpl: Submitted application application_1439208338231_0002
15/08/10 05:16:43 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1439208338231_0002/
15/08/10 05:16:43 INFO mapreduce.Job: Running job: job_1439208338231_0002
15/08/10 05:16:49 INFO mapreduce.Job: Job job_1439208338231_0002 running in uber mode : false
15/08/10 05:16:49 INFO mapreduce.Job:  map 0% reduce 0%
15/08/10 05:16:57 INFO mapreduce.Job:  map 50% reduce 0%
15/08/10 05:16:57 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:16:57 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:16:58 INFO mapreduce.Job:  map 0% reduce 0%
15/08/10 05:17:06 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:17:07 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:17:16 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:17:17 INFO mapreduce.Job: Task Id : attempt_1439208338231_0002_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/08/10 05:17:25 INFO mapreduce.Job:  map 100% reduce 100%
15/08/10 05:17:25 INFO mapreduce.Job: Job job_1439208338231_0002 failed with state FAILED due to: Task failed task_1439208338231_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/08/10 05:17:25 INFO mapreduce.Job: Counters: 13
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=53461
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=53461
        Total vcore-seconds taken by all map tasks=53461
        Total megabyte-seconds taken by all map tasks=54744064
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
15/08/10 05:17:25 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

1 Answer

You can display the logs for your application by running this command:

yarn logs -applicationId application_${appid}

In your case:

yarn logs -applicationId application_1439208338231_0002
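
If you prefer to stay inside R, here is a minimal sketch (assuming the yarn command is on the PATH of the user running R; show_app_errors is just a hypothetical helper name) that fetches the aggregated logs and prints only the lines that look like errors:

## Hypothetical helper: fetch the YARN logs for an application and show error-looking lines.
## Assumes the `yarn` CLI is on the PATH of the user running R.
show_app_errors <- function(app_id) {
  logs <- system(paste("yarn logs -applicationId", app_id), intern = TRUE)
  errs <- grep("Error|Exception|stderr", logs, value = TRUE)
  cat(errs, sep = "\n")
  invisible(logs)
}

show_app_errors("application_1439208338231_0002")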

Alternatively, you can find the log files in locations like:

/yarn/apps/${user_name}/logs/application_${appid}/

or

/app-logs/${user_name}/logs/application_${appid}/
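
Since you already load rhdfs, you can also look for these directories from R. A small sketch, assuming log aggregation is enabled and that one of the two candidate roots below matches your cluster's yarn.nodemanager.remote-app-log-dir setting (both are only guesses):

app_id <- "application_1439208338231_0002"
user   <- Sys.getenv("USER")                      # e.g. "cloudera" on the quickstart VM
## Candidate aggregated-log directories; adjust to your configuration.
candidates <- c(file.path("/yarn/apps", user, "logs", app_id),
                file.path("/app-logs", user, "logs", app_id))
for (p in candidates) {
  if (hdfs.exists(p)) print(hdfs.ls(p))           # list the per-container log files
}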

For the location of logs in Hadoop in general, see: https://stackoverflow.com/a/21629631/3846521

The logs contain the stderr output from your R code. If you still have problems after checking them, paste the log here and I'll try to help.
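
Separately from the logs, a common rmr2 debugging trick (a suggestion on top of the log inspection above, not a replacement for it) is to re-run the same map and reduce functions on the local backend, where any R error is printed directly in your session:

## Run a tiny version of the same job on the local backend to surface the R error directly.
rmr.options(backend = "local")
small <- to.dfs("hello\tworld\thello")            # small in-memory sample instead of the HDFS file
out.local <- from.dfs(mapreduce(input = small, map = map, reduce = reduce))
print(out.local)
rmr.options(backend = "hadoop")                   # switch back to the cluster afterwards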
