My use case is the following: reading DirectByteBuffers from the network and decoding them into UTF-8 strings. I’ve observed that a UTF_8 CharsetDecoder is 3–4 times slower on a DirectByteBuffer than on a HeapByteBuffer. From what I understand, this is because the heap (array-backed) decoding path has an intrinsic-backed fast path for ASCII input that the direct-buffer path lacks, as described here: https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231)

This brings me to a couple of questions:

  1. Why can’t these optimizations be applied to DirectByteBuffer as well? Is an intrinsic implementation not possible in that case?

  2. Would it be a reasonable approach to copy the contents of a DirectByteBuffer into a HeapByteBuffer (e.g., in 8KB chunks) before decoding? I’ve noticed this leads to lower decode latency, though I assume it comes at the cost of increased CPU usage.

Here is a small example:

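import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets
import java.util.UUID
import kotlin.time.measureTime
import kotlin.time.measureTimedValue

fun main() {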
  val decoder = StandardCharsets.UTF_8.newDecoder()
  val encoder = StandardCharsets.UTF_8.newEncoder()

  repeat(10_000) {
    val text = (0..10_000).joinToString(",") { UUID.randomUUID().toString() }
    val direct = ByteBuffer.allocateDirect(10_000 * 100)
    val heap = ByteBuffer.allocate(10_000 * 100)
    val heapTmp = ByteBuffer.allocate(10_000 * 100)

    val directEncoderTime = measureTime {
      encoder.encode(CharBuffer.wrap(text), direct, true)
      direct.flip()
    }
    val heapEncoderTime = measureTime {
      encoder.encode(CharBuffer.wrap(text), heap, true)
      heap.flip()
    }
    println("Direct encoding: $directEncoderTime")
    println("Heap encoding: $heapEncoderTime")

    val (directToHeapDecoded, directToHeapDecodeTime) = measureTimedValue {
      heapTmp.put(direct)
      heapTmp.flip()
      direct.position(0)
      decoder.decode(heapTmp)
    }
    val (directDecoded, directDecodeTime) = measureTimedValue {
      decoder.decode(direct)
    }
    val (heapDecoded, heapDecodeTime) = measureTimedValue {
      decoder.decode(heap)
    }
    println("DirectToHeap decoding: $directToHeapDecodeTime")
    println("Direct decoding: $directDecodeTime")
    println("Heap decoding: $heapDecodeTime")
  }
}
  • Don't ever try to measure the performance of things without using JMH, or you can get massive lies that tell you the exact opposite of the truth. Commented May 15 at 1:39
  • @LouisWasserman Yes, I agree, but in this case I think it's unnecessary, because I'm not trying to give precise performance numbers, just pointing out that the difference comes from the implementation itself. If we compare "decodeArrayLoop" and "decodeBufferLoop", we'll see that the array version has a fast path for ASCII input, while the buffer version does not. The article I mentioned already includes JMH tests comparing performance with and without the fast-path optimization. github.com/openjdk/jdk/blob/master/src/java.base/share/classes/… Commented May 15 at 7:48
  • Don't use the term "UTF-8 strings". There is no such thing as a UTF-8 string. A string is a string. (Or, more precisely, a Kotlin string is always a UTF-16 string.) You mean "decoding UTF-8 byte arrays into strings". Also, to answer your question about copying to heap in 8K chunks: that's dangerous as you might get a chunk that ends with an incomplete UTF-8 sequence. Commented May 15 at 14:50
  • Both of those are likely to have weird intrinsic effects. I would not consider looking at the source code sufficient, nor would I say that JMH can be skipped here. Having extensively benchmarked UTF-8 conversion logic, I can say the JVM does lots of mysterious, unpredictable things to code like this. Commented May 15 at 15:07
  • @k314159 1) I mean decoding a sequence of ByteBuffers into a sequence of CharBuffers; I think that's clear, since that's the contract of CharsetDecoder. 2) A String in the JVM is not always stored as UTF-16, since it is backed internally by a byte[] rather than a char[]: github.com/openjdk/jdk/blob/master/src/java.base/share/classes/… 3) Decoding in chunks is not dangerous: CharsetDecoder supports it and returns CoderResult.UNDERFLOW in such cases, after which we refill the 8 KB temporary buffer. Commented May 15 at 16:09
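
The chunked copy from question 2, and the underflow handling described in the last comment, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: `decodeViaHeapChunks` and its `chunkSize` parameter are hypothetical names that do not appear in the question, and the sketch says nothing about whether this is actually faster (that would still need a JMH measurement). It only shows that bytes of an incomplete UTF-8 sequence left at the end of one chunk can be carried over to the next with `compact()`.

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper (not from the question): stages `src` through a small
// heap buffer so the decoder always sees array-backed input.
fun decodeViaHeapChunks(src: ByteBuffer, chunkSize: Int = 8 * 1024): String {
    // A UTF-8 sequence is at most 4 bytes; a smaller chunk could never hold one.
    require(chunkSize >= 4) { "chunkSize must be at least 4" }

    val decoder = StandardCharsets.UTF_8.newDecoder()
    val chunk = ByteBuffer.allocate(chunkSize)   // heap staging buffer
    // UTF-8 decoding yields at most one char per input byte, so this never overflows.
    val chars = CharBuffer.allocate(chunkSize)
    val out = StringBuilder()

    while (true) {
        // Top up the staging buffer with as many bytes as fit.
        val n = minOf(chunk.remaining(), src.remaining())
        val slice = src.duplicate()
        slice.limit(slice.position() + n)
        chunk.put(slice)
        src.position(src.position() + n)
        chunk.flip()

        val endOfInput = !src.hasRemaining()
        val result = decoder.decode(chunk, chars, endOfInput)
        if (result.isError) result.throwException()

        // Drain the decoded chars.
        chars.flip()
        out.append(chars)
        chars.clear()

        // Keep the bytes of any incomplete trailing sequence for the next round.
        chunk.compact()
        if (endOfInput && result.isUnderflow) break
    }

    // Finish the decoding operation (a no-op for UTF-8, but part of the contract).
    decoder.flush(chars)
    chars.flip()
    out.append(chars)
    return out.toString()
}

fun main() {
    val text = "héllo, wörld €😀 ".repeat(2_000)
    val bytes = text.toByteArray(StandardCharsets.UTF_8)
    val direct = ByteBuffer.allocateDirect(bytes.size)
    direct.put(bytes)
    direct.flip()

    // A deliberately tiny, odd chunk size forces multi-byte sequences
    // to straddle chunk boundaries.
    val decoded = decodeViaHeapChunks(direct, chunkSize = 7)
    println("Round trip matches: ${decoded == text}")
}
```

The decoder is created with its default error actions, so genuinely malformed input (including a sequence that is still incomplete once `endOfInput` is true) surfaces as an exception rather than being silently dropped.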
