
I am running a Spark Streaming application on a cluster with 3 worker nodes. Once in a while, jobs fail with the following exception:

Job aborted due to stage failure: Task 0 in stage 4508517.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4508517.0 (TID 1376191, 172.31.47.126): io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:440)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:187)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:165)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146)
... 10 more  

I am submitting the job in client mode without any special parameters. Both the master and the workers have 15 GB of memory. The Spark version is 1.4.0.

Is this solvable by tuning configuration?
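Since the OOM comes from ByteBuffer.allocateDirect inside netty's pooled allocator, one configuration angle I could try is steering Spark's netty transport away from direct buffers and capping direct memory explicitly. The sketch below is my own illustration, not something I have verified against this failure: the config keys and the JVM flag exist, but the values (and the app name and batch interval) are placeholder assumptions.

    // Hedged sketch: prefer heap buffers in the netty transport and cap direct memory
    // on the executors. The keys exist in Spark 1.4; the values are assumptions to tune.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("my-streaming-app")                        // hypothetical app name
      .set("spark.shuffle.io.preferDirectBufs", "false")     // ask netty to prefer heap buffers
      .set("spark.executor.memory", "10g")                   // leave headroom out of the 15 GB for off-heap use
      .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=2g") // hard cap on direct buffers

    val ssc = new StreamingContext(conf, Seconds(10))        // batch interval is an assumption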

2 Comments
  • One thing worth pointing out is that we use a lot of DStream.cache in our code. Commented Oct 14, 2015 at 1:59
  • > Is this solvable by tuning configuration? You should have tried that already; see --executor-memory and --driver-memory. Don't forget to drop DStreams that are of no use to you anymore, with DStream.unpersist (a minimal sketch follows after these comments). Commented Oct 14, 2015 at 5:08

1 Answer


I'm facing the same problem and found out that it's probably caused by a memory leak in netty version 4.0.23.Final, which is used by Spark 1.4 (see https://github.com/netty/netty/issues/3837).

It is solved at least in Spark 1.5.0 (see https://issues.apache.org/jira/browse/SPARK-8101), which uses netty 4.0.29.Final.

So an upgrade to the latest Spark version should solve the problem. I will try it in the next few days.

Additionally, the current version of Spark Jobserver forces netty 4.0.23.Final, so it needs a fix too.
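Until Jobserver itself is fixed, one workaround to try is forcing the patched netty release from the build. The sketch below assumes an sbt build and that the offending artifact is io.netty:netty-all; treat it as an untested suggestion rather than a confirmed fix, especially given the edit below.

    // build.sbt sketch (untested assumption): override the transitively pulled-in
    // netty 4.0.23.Final with the release that contains the leak fix.
    dependencyOverrides += "io.netty" % "netty-all" % "4.0.29.Final"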

EDIT: I upgraded to Spark 1.6 with netty 4.0.29.Final, but am still getting a direct buffer OOM when using Spark Jobserver.


1 Comment

Using Spark 1.6.2, I still get this error. I am sure it's not about executor memory, but I have no idea how to fix it.
