My use case is the following: reading DirectByteBuffers from the network and decoding them into UTF-8 strings. I’ve observed that using a UTF_8 CharsetDecoder with a DirectByteBuffer is 3–4 times slower than with a HeapByteBuffer. From what I understand, this is because the heap path benefits from intrinsic optimizations for ASCII input that the direct path does not, as described here: https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231)
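If I read the linked UTF_8.java correctly, the decoder chooses its loop based on whether both the source and destination buffers are array-backed, and only the array loop reaches the ASCII intrinsic. Below is a minimal sketch of that check as I understand it. The function name `decodePathFor` and the returned labels are made up for illustration; only `hasArray()` is a real API here.

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer

// Hypothetical helper: mirrors my reading of the dispatch in UTF_8.Decoder.decodeLoop.
// The array-based loop is only eligible when both buffers expose a backing array.
fun decodePathFor(src: ByteBuffer, dst: CharBuffer): String =
    if (src.hasArray() && dst.hasArray()) "array loop" else "buffer loop"
```

With a heap destination, `decodePathFor(ByteBuffer.allocate(16), CharBuffer.allocate(16))` gives "array loop", while passing `ByteBuffer.allocateDirect(16)` as the source gives "buffer loop", since a direct buffer has no accessible backing array.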
This brings me to a couple of questions:
Why can’t these optimizations be applied to DirectByteBuffer as well? Is an intrinsic implementation not possible in that case?
Would it be a reasonable approach to copy the contents of a DirectByteBuffer into a HeapByteBuffer (e.g., in 8KB chunks) before decoding? I’ve noticed this leads to lower decode latency, though I assume it comes at the cost of increased CPU usage.
Here is a small example:
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets
import java.util.UUID
import kotlin.time.measureTime
import kotlin.time.measureTimedValue

fun main() {
    val decoder = StandardCharsets.UTF_8.newDecoder()
    val encoder = StandardCharsets.UTF_8.newEncoder()
    repeat(10_000) {
        // ASCII-only payload: comma-separated UUIDs (roughly 370 KB)
        val text = (0..10_000).joinToString(",") { UUID.randomUUID().toString() }
        val direct = ByteBuffer.allocateDirect(10_000 * 100)
        val heap = ByteBuffer.allocate(10_000 * 100)
        val heapTmp = ByteBuffer.allocate(10_000 * 100)

        // Encode the same text into the direct and the heap buffer
        val directEncoderTime = measureTime {
            encoder.encode(CharBuffer.wrap(text), direct, true)
            direct.flip()
        }
        val heapEncoderTime = measureTime {
            encoder.encode(CharBuffer.wrap(text), heap, true)
            heap.flip()
        }
        println("Direct encoding: $directEncoderTime")
        println("Heap encoding: $heapEncoderTime")

        // Copy direct -> heap first, then decode from the heap copy (copy time is included)
        val (directToHeapDecoded, directToHeapDecodeTime) = measureTimedValue {
            heapTmp.put(direct)
            heapTmp.flip()
            direct.position(0) // rewind so the direct buffer can be decoded again below
            decoder.decode(heapTmp)
        }
        // Decode straight from the direct buffer
        val (directDecoded, directDecodeTime) = measureTimedValue {
            decoder.decode(direct)
        }
        // Decode from the heap buffer
        val (heapDecoded, heapDecodeTime) = measureTimedValue {
            decoder.decode(heap)
        }
        println("DirectToHeap decoding: $directToHeapDecodeTime")
        println("Direct decoding: $directDecodeTime")
        println("Heap decoding: $heapDecodeTime")
    }
}
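
For the chunked variant from my second question, this is roughly what I have in mind. It is only a sketch under my own assumptions: `decodeViaHeapChunks` is a made-up helper, not a JDK API, and the 8 KB default is just the chunk size mentioned above. The part that needs care is a multi-byte UTF-8 sequence split across two chunks, which is why the leftover bytes are kept with `compact()` before the next copy.

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.CharsetDecoder

// Hypothetical helper, not a JDK API: decodes all remaining bytes of `src`
// by copying them through a heap scratch buffer in fixed-size chunks.
fun decodeViaHeapChunks(src: ByteBuffer, decoder: CharsetDecoder, chunkSize: Int = 8 * 1024): String {
    // A UTF-8 sequence is at most 4 bytes; a smaller chunk could stall on a split sequence.
    require(chunkSize >= 4) { "chunkSize must be at least 4" }
    val scratch = ByteBuffer.allocate(chunkSize) // heap buffer, so hasArray() is true
    val chars = CharBuffer.allocate(chunkSize)   // heap buffer as well
    val out = StringBuilder()

    // Moves whatever has been decoded so far into `out` and empties `chars`.
    fun drain() {
        chars.flip()
        out.append(chars)
        chars.clear()
    }

    decoder.reset()
    do {
        // Copy as many bytes as fit into the free space of the scratch buffer.
        val n = minOf(src.remaining(), scratch.remaining())
        src.get(scratch.array(), scratch.arrayOffset() + scratch.position(), n)
        scratch.position(scratch.position() + n)
        scratch.flip()

        val endOfInput = !src.hasRemaining()
        while (true) {
            val result = decoder.decode(scratch, chars, endOfInput)
            drain()
            if (result.isError) result.throwException() // malformed or unmappable input
            if (result.isUnderflow) break               // scratch consumed as far as possible
        }
        // Keep the bytes of a sequence that was cut off at the chunk boundary.
        scratch.compact()
    } while (!endOfInput)

    // Emit any output the decoder still holds internally.
    while (decoder.flush(chars).isOverflow) drain()
    drain()
    return out.toString()
}
```

In the benchmark above this would be called as `decodeViaHeapChunks(direct, decoder)`. The scratch and char buffers are allocated per call here to keep the sketch short; for real traffic I would presumably hoist and reuse them, since the extra copy is already the CPU cost I am asking about.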