Timeout Exception in Apache-Spark during program Execution

Question

I am running a Bash script on macOS. The script calls a Spark method written in Scala a large number of times; I am currently trying to call it 100,000 times using a for loop.

The code exits with the following exception after a relatively small number of iterations, around 3,000.

org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)

Exception in thread "dag-scheduler-event-loop" 16/11/22 13:37:32 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
    at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:126)
    at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:221)
    at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:259)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:346)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Pattern.compile(Pattern.java:1047)
    at java.lang.String.replace(String.java:2180)
    at org.apache.spark.util.Utils$.getFormattedClassName(Utils.scala:1728)
    at org.apache.spark.storage.RDDInfo$$anonfun$1.apply(RDDInfo.scala:57)
    at org.apache.spark.storage.RDDInfo$$anonfun$1.apply(RDDInfo.scala:57)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:57)
    at org.apache.spark.scheduler.StageInfo$$anonfun$1.apply(StageInfo.scala:87)

Can someone please help? Is this error caused by the large number of calls to the Spark method?

It also shows a Java heap space OutOfMemoryError, so can you increase the memory and try again?
– Sandeep Purohit, Commented Nov 22, 2016 at 12:05
Are you persisting data, with cache or something like that? Are you using the DataFrame or the RDD API?
– Thiago Baldim, Commented Nov 22, 2016 at 12:10
Can you add a snippet of the code? Maybe the problem comes from your method.
– Thiago Baldim, Commented Nov 22, 2016 at 15:01
@RamPrasadG I am still running it. I have a very large dataset. Once it completes successfully without errors I will accept the answer. Last time the error occurred after a day of execution. Thanks for the help.
– Yasir Arfat, Commented Nov 23, 2016 at 17:56

Ram Ghadiyaram · Accepted Answer · 2016-11-22 19:45:32Z

22

This is an RpcTimeoutException, so spark.network.timeout (or spark.rpc.askTimeout) can be raised above its default value to handle a heavy workload. You can start with the values below and adjust them to your workload. The latest Spark configuration documentation describes the setting as follows:

spark.network.timeout (default: 120s): Default timeout for all network interactions. This config will be used in place of spark.core.connection.ack.wait.timeout, spark.storage.blockManagerSlaveTimeoutMs, spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout if they are not configured.

Also consider increasing the executor memory (spark.executor.memory). Most importantly, review your code to check whether it is a candidate for further optimization.

Solution (the value 600s is only an example; choose it based on your requirements):

set by SparkConf: conf.set("spark.network.timeout", "600s")
set by spark-defaults.conf: spark.network.timeout 600s
set when calling spark-submit: --conf spark.network.timeout=600s
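
The spark-submit variant above can be generated from a script. The following is a minimal Python sketch, not part of the original answer: the helper names `to_seconds` and `timeout_conf_args` are hypothetical, and only a few of Spark's time suffixes are handled. It also checks the rule from the Spark configuration documentation that spark.executor.heartbeatInterval should stay significantly below spark.network.timeout (the sketch only enforces "strictly below").

```python
import re
from typing import List

# Seconds per unit for the time suffixes this sketch understands.
# Spark accepts more (for example "d"); restricting the set is an
# assumption made to keep the example short.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600}


def to_seconds(value: str) -> float:
    """Convert a time string such as '600s' or '2min' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


def timeout_conf_args(network_timeout: str, heartbeat_interval: str) -> List[str]:
    """Build the --conf arguments for spark-submit (hypothetical helper).

    Raises ValueError when the heartbeat interval is not smaller than
    the network timeout.
    """
    if to_seconds(heartbeat_interval) >= to_seconds(network_timeout):
        raise ValueError(
            "spark.executor.heartbeatInterval must be smaller than "
            "spark.network.timeout"
        )
    return [
        "--conf", f"spark.network.timeout={network_timeout}",
        "--conf", f"spark.executor.heartbeatInterval={heartbeat_interval}",
    ]


if __name__ == "__main__":
    # Print the arguments so they can be pasted after `spark-submit`.
    print(" ".join(timeout_conf_args("600s", "20s")))
```

The returned list can be appended to the spark-submit command line that the Bash loop already uses; the values 600s and 20s are the ones mentioned in this thread, not recommendations.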

edited Nov 22, 2016 at 19:45

answered Nov 22, 2016 at 18:09

Ram Ghadiyaram


3 Comments

Amogh Huilgol Over a year ago

What would be the disadvantages of keeping a high value ? . Does it cause any side effects ?

Ram Ghadiyaram Over a year ago

@AmoghHuilgol: I haven't checked that, but it is clear that a large workload needs a somewhat higher value. You can try it yourself and post the result as a separate question and answer.

Ram Ghadiyaram Over a year ago

@AmoghHuilgol: Please give feedback if you like the answer here :-)

Sandeep Purohit · Answer · 2017-11-29 11:19:56Z (last edited by Ashwini)

5

The stack trace above also shows java.lang.OutOfMemoryError: Java heap space, so first try increasing the memory and running it again. As for the timeout, it is an RPC timeout, so you can set spark.network.timeout to a value that suits your needs.

edited Nov 29, 2017 at 11:19

Ashwini


answered Nov 22, 2016 at 12:09

Sandeep Purohit


Prem S · Answer · 2017-08-24 18:21:19Z (last edited by Ram Ghadiyaram)

1

Please increase the executor memory so that the OOM error goes away, or change the code so that your RDD does not have a large memory footprint.

--executor-memory 3G

edited Aug 24, 2017 at 18:21

Ram Ghadiyaram


answered May 24, 2017 at 16:49

Prem S


Luckylukee · Answer · 2017-11-29 09:46:15Z (last edited by Ram Ghadiyaram)

1

Just increase spark.executor.heartbeatInterval to 20s; the error message itself names this setting.

edited Nov 29, 2017 at 9:46

Ram Ghadiyaram


answered Oct 12, 2017 at 7:39

Luckylukee


akl · Answer · 2020-01-22 22:31:39Z

1

You are seeing this issue because of the executor memory. Try doubling the memory so the containers don't time out while waiting on the remaining containers.

answered Jan 22, 2020 at 22:31

akl


0x5453 · Answer · 2021-12-31 19:37:20Z

0

For posterity: I was getting similar errors, but changing memory/timeout settings was not helping at all.

In my case the problem was that somebody was calling socket.setdefaulttimeout in a library function that I was calling before creating the Spark session. setdefaulttimeout affected all new sockets created after that point, including the socket that Spark used to communicate with YARN, so that connection would time out unexpectedly.

Needless to say, don't do this.
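
To illustrate the effect described above, here is a minimal Python sketch that uses only the standard library. The function `library_call_with_side_effect` is a hypothetical stand-in for the offending library function, and `preserved_default_timeout` is an assumed workaround, not something taken from the original answer: it restores the process-wide default once the call is finished, so sockets created later (for example by a Spark session) are not affected.

```python
import socket
from contextlib import contextmanager


@contextmanager
def preserved_default_timeout():
    """Restore the process-wide default socket timeout on exit."""
    saved = socket.getdefaulttimeout()
    try:
        yield
    finally:
        socket.setdefaulttimeout(saved)


def library_call_with_side_effect():
    # Hypothetical stand-in for the third-party function: it changes the
    # default timeout for every socket created afterwards in this process.
    socket.setdefaulttimeout(5.0)


def timeout_of_new_socket():
    """Create a fresh, unconnected socket and report its timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.gettimeout()


def demo():
    """Return the new-socket timeout before, inside and after the guard."""
    before = timeout_of_new_socket()
    with preserved_default_timeout():
        library_call_with_side_effect()
        inside = timeout_of_new_socket()
    after = timeout_of_new_socket()
    return before, inside, after


if __name__ == "__main__":
    print(demo())
```

Inside the guarded block a new socket inherits the 5-second timeout; after the block, new sockets get the original default again. Wrapping the library call this way before creating the Spark session is one possible mitigation when the library itself cannot be changed.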

answered Dec 31, 2021 at 19:37

0x5453
