CharsetDecoder with DirectByteBuffer and HeapByteBuffer performance difference

Asked 6 months ago

Modified 6 months ago

Viewed 73 times

My use case is the following: I read DirectByteBuffers from the network and decode their UTF-8 contents into strings. I’ve observed that a UTF-8 CharsetDecoder is 3–4 times slower on a DirectByteBuffer than on a HeapByteBuffer. From what I understand, this is because the array-backed (heap) path has intrinsic optimizations for ASCII input, as described here: https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231)

This brings me to a couple of questions:

1) Why can’t these optimizations be applied to DirectByteBuffer as well? Is an intrinsic implementation not possible in that case?
2) Would it be reasonable to copy the contents of a DirectByteBuffer into a HeapByteBuffer (e.g., in 8 KB chunks) before decoding? I’ve noticed this leads to lower decode latency, though I assume it comes at the cost of increased CPU usage.

Here is a small example:

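import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets
import java.util.UUID
import kotlin.time.measureTime
import kotlin.time.measureTimedValue

fun main() {
  val decoder = StandardCharsets.UTF_8.newDecoder()
  val encoder = StandardCharsets.UTF_8.newEncoder()

  repeat(10_000) {
    // ASCII-only payload: comma-separated UUIDs (about 370 KB per iteration)
    val text = (0..10_000).joinToString(",") { UUID.randomUUID().toString() }
    val direct = ByteBuffer.allocateDirect(10_000 * 100)
    val heap = ByteBuffer.allocate(10_000 * 100)
    val heapTmp = ByteBuffer.allocate(10_000 * 100)

    // Encode the same text into the direct buffer and into the heap buffer
    val directEncoderTime = measureTime {
      encoder.encode(CharBuffer.wrap(text), direct, true)
      direct.flip()
    }
    val heapEncoderTime = measureTime {
      encoder.encode(CharBuffer.wrap(text), heap, true)
      heap.flip()
    }
    println("Direct encoding: $directEncoderTime")
    println("Heap encoding: $heapEncoderTime")

    // Variant 1: copy direct -> heap first, then decode the heap copy
    val (directToHeapDecoded, directToHeapDecodeTime) = measureTimedValue {
      heapTmp.put(direct)
      heapTmp.flip()
      direct.position(0) // rewind so the direct buffer can be decoded again below
      decoder.decode(heapTmp)
    }
    // Variant 2: decode the direct buffer as-is
    val (directDecoded, directDecodeTime) = measureTimedValue {
      decoder.decode(direct)
    }
    // Variant 3: decode the buffer that was on the heap from the start
    val (heapDecoded, heapDecodeTime) = measureTimedValue {
      decoder.decode(heap)
    }
    println("DirectToHeap decoding: $directToHeapDecodeTime")
    println("Direct decoding: $directDecodeTime")
    println("Heap decoding: $heapDecodeTime")
  }
}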

edited May 14 at 21:29

Progman

asked May 14 at 20:33

Artem Golovko

Don't ever try to measure the performance of things without using JMH, or you can get massive lies that tell you the exact opposite of the truth.

– Louis Wasserman, commented 2025-05-15 01:39:16 +00:00

@LouisWasserman Yes, I agree, but in this case I think it's unnecessary, because I'm not trying to give precise performance numbers, just pointing out that the difference comes from the implementation itself. If we compare "decodeArrayLoop" and "decodeBufferLoop", we see that the array version has a fast path for ASCII input while the buffer version does not. The article I mentioned already includes JMH tests comparing performance with and without the fast-path optimization. github.com/openjdk/jdk/blob/master/src/java.base/share/classes/…

– Artem Golovko, commented 2025-05-15 07:48:50 +00:00

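To make the "array fast path" idea concrete, here is a minimal sketch. It is not the JDK's code: asciiPrefixLength is a hypothetical helper that only shows why an array-backed buffer is easier to special-case. When hasArray() is true, the scan runs over a plain byte[], whereas a direct buffer has to go through ByteBuffer.get(int).

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of the JDK): counts how many leading bytes
// of the buffer's remaining content are plain ASCII.
fun asciiPrefixLength(src: ByteBuffer): Int {
    val start = src.position()
    val end = src.limit()
    var i = start
    if (src.hasArray()) {
        // Array-backed buffer: scan the backing byte[] directly.
        val array = src.array()
        val offset = src.arrayOffset()
        // ASCII bytes are below 0x80, i.e. non-negative as signed bytes.
        while (i < end && array[offset + i] >= 0) i++
    } else {
        // Direct (or read-only) buffer: fall back to absolute get() calls.
        while (i < end && src.get(i) >= 0) i++
    }
    return i - start
}

fun main() {
    val bytes = "abc,é".toByteArray(StandardCharsets.UTF_8)
    val heap = ByteBuffer.wrap(bytes)
    val direct = ByteBuffer.allocateDirect(bytes.size)
    direct.put(bytes)
    direct.flip()
    println(asciiPrefixLength(heap))   // 4: "abc," is ASCII, "é" is not
    println(asciiPrefixLength(direct)) // 4: same answer via the get() path
}
```

Both branches return the same result; any speed difference between them would still need to be measured properly, as the comments here point out.
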
Don't use the term "UTF-8 strings". There is no such thing as a UTF-8 string. A string is a string. (Or, more precisely, a Kotlin string is always a UTF-16 string.) You mean "decoding UTF-8 byte arrays into strings". Also, to answer your question about copying to heap in 8K chunks: that's dangerous as you might get a chunk that ends with an incomplete UTF-8 sequence.

– k314159, commented 2025-05-15 14:50:36 +00:00

Both of those are likely to have weird intrinsic effects. I would not consider looking at the source code sufficient, or that JMH can be skipped here. Having extensively benchmarked UTF-8 conversion logic, I can say there's lots of mysterious, unpredictable stuff the JVM does to code like this.

– Louis Wasserman, commented 2025-05-15 15:07:16 +00:00

@k314159 1) I mean decoding a sequence of ByteBuffers into a sequence of CharBuffers; I think that's clear, since that's the contract of CharsetDecoder. 2) A String in the JVM is not always stored as UTF-16, since it's stored internally as a byte[] rather than a char[]: github.com/openjdk/jdk/blob/master/src/java.base/share/classes/… 3) It's not dangerous to decode in chunks: CharsetDecoder supports it and returns CoderResult.UNDERFLOW in such cases, after which we refill the 8 KB temporary buffer and continue.

– Artem Golovko, commented 2025-05-15 16:09:34 +00:00
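
The chunked approach discussed in these comments can be sketched roughly as follows. This is only an illustration under stated assumptions: decodeViaHeap and its chunkSize parameter are made-up names, the output buffer is sized for the worst case up front so overflow does not need handling, and the decoder is assumed to be a UTF-8 decoder with the default REPORT error actions. Bytes of an incomplete sequence at the end of a chunk are left unconsumed by decode() and carried over with compact().

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.CharsetDecoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: decodes `src` by copying it through a small heap buffer.
fun decodeViaHeap(src: ByteBuffer, decoder: CharsetDecoder, chunkSize: Int = 8 * 1024): String {
    // A UTF-8 sequence is at most 4 bytes; a smaller chunk could never make progress.
    require(chunkSize >= 4) { "chunkSize must be at least 4" }
    val staging = ByteBuffer.allocate(chunkSize)
    // Worst-case output size, so decode() never reports OVERFLOW here.
    val out = CharBuffer.allocate((src.remaining() * decoder.maxCharsPerByte()).toInt() + 1)
    decoder.reset()
    do {
        // Copy as many bytes as fit behind any carried-over bytes.
        val n = minOf(src.remaining(), staging.remaining())
        val slice = src.duplicate()
        slice.limit(slice.position() + n)
        staging.put(slice)
        src.position(src.position() + n)
        staging.flip()
        val endOfInput = !src.hasRemaining()
        val result = decoder.decode(staging, out, endOfInput)
        // Anything other than UNDERFLOW is malformed input or an unexpected overflow.
        if (!result.isUnderflow) result.throwException()
        // Keep the bytes of an incomplete sequence for the next round.
        staging.compact()
    } while (!endOfInput)
    decoder.flush(out)
    out.flip()
    return out.toString()
}

fun main() {
    val text = "héllo, wörld €".repeat(1000)
    val bytes = text.toByteArray(StandardCharsets.UTF_8)
    val direct = ByteBuffer.allocateDirect(bytes.size)
    direct.put(bytes)
    direct.flip()
    // A chunk size of 7 makes some chunk boundaries fall inside multi-byte sequences.
    val decoded = decodeViaHeap(direct, StandardCharsets.UTF_8.newDecoder(), chunkSize = 7)
    println(decoded == text) // true
}
```

Whether this is actually faster than decoding the direct buffer in place depends on the JVM version and the input, so it should be checked with a JMH benchmark rather than taken from this sketch.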
