I'm more or less answering my own question, even though I don't think this is the best solution. Not knowing Swift well, I would guess this may be the best we can do, but the conversion to UnsafeMutableRawPointer(mutating:) that memmove() needs looks like a potential memory allocation. There has to be a way around that conversion; an allocation at that point in my code would certainly kill performance.
If anyone knows how to avoid the cast to UnsafeMutableRawPointer, please let me know.
import Foundation

var src: [UInt32] = [1, 2, 3, 4]
var dest: [UInt32] = [0, 0, 0, 0]
let elemSize = MemoryLayout<UInt32>.stride
dest.withUnsafeBytes { (destBuffer: UnsafeRawBufferPointer) in
    src.withUnsafeBytes { (srcBuffer: UnsafeRawBufferPointer) in
        // memmove requires UnsafeMutableRawPointer but how do we avoid this allocation?
        // maybe Swift optimizes this somehow but it looks really bad from here
        let destPtr = UnsafeMutableRawPointer(mutating: destBuffer.baseAddress)
        // offsets are in bytes: skip one element in dest, start at the beginning of src
        let destOffset = destPtr! + elemSize
        let srcOffset = srcBuffer.baseAddress! + 0
        // copy 2 elements from src[0] to dest[1]
        memmove(destOffset, srcOffset, elemSize * 2)
    }
}
print(dest) // [0, 1, 2, 0]
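One possible way around the cast, as a minimal sketch: withUnsafeMutableBytes hands the closure an UnsafeMutableRawBufferPointer, whose baseAddress is already an UnsafeMutableRawPointer, so the destination needs no UnsafeMutableRawPointer(mutating:) conversion. The array contents are the same illustrative values as above.

```swift
import Foundation

let src: [UInt32] = [1, 2, 3, 4]
var dest: [UInt32] = [0, 0, 0, 0]
let elemSize = MemoryLayout<UInt32>.stride

dest.withUnsafeMutableBytes { destBytes in
    src.withUnsafeBytes { srcBytes in
        // destBytes.baseAddress is already mutable, so it can go straight to memmove
        let destOffset = destBytes.baseAddress! + elemSize
        let srcOffset = srcBytes.baseAddress!
        // copy 2 elements from src[0] to dest[1]
        memmove(destOffset, srcOffset, elemSize * 2)
    }
}
print(dest) // [0, 1, 2, 0]
```

The source side only needs read access, so plain withUnsafeBytes is enough there.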
EDIT 1: Getting closer now. replaceSubrange() is clearly slower than memmove(), roughly 5-7x slower in my tests (about 5.4x in the run below). The smaller the byte count, the better replaceSubrange() does by comparison. In a real example you'd only get the array's bytes once before performing all the memmove() calls, so memmove() would be even faster in practice than this test shows.
replaceSubrange: 0.750978946685791
memmove: 0.139282941818237
import Foundation

func TestMemmove() {
    var src: [UInt32] = Array(repeating: 1, count: 1000)
    var dest: [UInt32] = Array(repeating: 0, count: 1000)
    let elemSize = MemoryLayout<UInt32>.stride
    let testCycles = 100000
    let rows = 200

    // replaceSubrange: half-open ranges so exactly `rows` elements are replaced,
    // matching the byte count passed to memmove below
    var startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<testCycles {
        dest.replaceSubrange(1..<1+rows, with: src[0..<rows])
    }
    var endTime = CFAbsoluteTimeGetCurrent()
    print("replaceSubrange: \(endTime - startTime)")

    // memmove: copy `rows` elements from src[0] to dest[1]
    startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<testCycles {
        dest.withUnsafeMutableBytes { destBytes in
            src.withUnsafeMutableBytes { srcBytes in
                let destOffset = destBytes.baseAddress! + elemSize
                let srcOffset = srcBytes.baseAddress! + 0
                memmove(destOffset, srcOffset, elemSize * rows)
            }
        }
    }
    endTime = CFAbsoluteTimeGetCurrent()
    print("memmove: \(endTime - startTime)")
}
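The remark above about getting the array's bytes only once could look something like the following sketch. The helper name copyRows and its parameters are hypothetical, made up for illustration; the point is only that the withUnsafe... closures are entered once and every memmove() call happens inside them.

```swift
import Foundation

// Hypothetical helper: copies `rows` elements from src[0...] to dest[1...]
// `cycles` times, entering the unsafe closures only once.
func copyRows(from src: [UInt32], into dest: inout [UInt32], rows: Int, cycles: Int) {
    precondition(rows >= 0 && rows <= src.count && rows + 1 <= dest.count)
    let elemSize = MemoryLayout<UInt32>.stride
    dest.withUnsafeMutableBytes { destBytes in
        src.withUnsafeBytes { srcBytes in
            // baseAddress can be nil for empty storage, so bail out instead of force-unwrapping
            guard let destBase = destBytes.baseAddress,
                  let srcBase = srcBytes.baseAddress else { return }
            for _ in 0..<cycles {
                memmove(destBase + elemSize, srcBase, elemSize * rows)
            }
        }
    }
}

let source = [UInt32](repeating: 1, count: 8)
var target = [UInt32](repeating: 0, count: 8)
copyRows(from: source, into: &target, rows: 3, cycles: 10)
print(target) // [0, 1, 1, 1, 0, 0, 0, 0]
```

The pointers are only valid inside the closures, so all of the copying has to stay within them.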
EDIT 2: After all that, simply calling memmove() from C is fastest. Swift passes a pointer to the first element of each array to the C function, and the C side can use pointer arithmetic to handle the offsets, which in Swift required the .withUnsafeXXX calls (that probably allocated some class wrappers).
My conclusion is that Swift is slow here, so drop down to C for any performance-sensitive code.
BlockMove: 0.0957469940185547
replaceSubrange: 1.89903497695923
memmove: 0.136561989784241
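// from .c file bridged to Swift
#include <string.h>

// destOffset, srcOffset and count are all in bytes
void BlockMove(void *dest, int destOffset, const void *src, int srcOffset, size_t count) {
    // cast to char* first: arithmetic on void* is a compiler extension, not standard C
    memmove((char *)dest + destOffset, (const char *)src + srcOffset, count);
}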
import Foundation

func TestMemmove() {
    var src: [UInt32] = Array(repeating: 1, count: 1000)
    var dest: [UInt32] = Array(repeating: 0, count: 1000)
    let elemSize = MemoryLayout<UInt32>.stride
    let testCycles = 100000
    let rows = 500
    var startTime: CFAbsoluteTime = 0
    var endTime: CFAbsoluteTime = 0

    // BlockMove (from c); the int offsets import as Int32, the size_t count as Int
    startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<testCycles {
        BlockMove(&dest, Int32(elemSize), &src, 0, elemSize * rows)
    }
    endTime = CFAbsoluteTimeGetCurrent()
    print("BlockMove: \(endTime - startTime)")

    // replaceSubrange; half-open ranges so exactly `rows` elements are replaced
    startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<testCycles {
        dest.replaceSubrange(1..<1+rows, with: src[0..<rows])
    }
    endTime = CFAbsoluteTimeGetCurrent()
    print("replaceSubrange: \(endTime - startTime)")

    // memmove
    startTime = CFAbsoluteTimeGetCurrent()
    for _ in 0..<testCycles {
        dest.withUnsafeMutableBytes { destBytes in
            src.withUnsafeMutableBytes { srcBytes in
                let destOffset = destBytes.baseAddress! + elemSize
                let srcOffset = srcBytes.baseAddress! + 0
                memmove(destOffset, srcOffset, elemSize * rows)
            }
        }
    }
    endTime = CFAbsoluteTimeGetCurrent()
    print("memmove: \(endTime - startTime)")
}
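To illustrate the pointer passing described in EDIT 2 without the C file, here is a small sketch in which a Swift function stands in for BlockMove. The lowercase blockMove is a hypothetical stand-in, not the bridged C function, and says nothing about relative speed; it only shows that Swift implicitly converts an array argument to a pointer to its element storage when the parameter is an unsafe pointer type.

```swift
import Foundation

// Hypothetical Swift stand-in for the C BlockMove; offsets and count are in bytes.
func blockMove(_ dest: UnsafeMutableRawPointer, _ destOffset: Int,
               _ src: UnsafeRawPointer, _ srcOffset: Int, _ count: Int) {
    memmove(dest + destOffset, src + srcOffset, count)
}

let src: [UInt32] = [1, 2, 3, 4]
var dest: [UInt32] = [0, 0, 0, 0]
let elemSize = MemoryLayout<UInt32>.stride

// `&dest` and `src` are implicitly converted to pointers to the arrays' storage;
// those pointers are valid only for the duration of the call.
blockMove(&dest, elemSize, src, 0, elemSize * 2)
print(dest) // [0, 1, 2, 0]
```

Because the pointers do not outlive the call, the offset arithmetic has to happen inside the callee, which is what the C version does as well.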
[0...0] is a range too. [0...2] means from the 0th through the 2nd element, inclusive.
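A quick sketch of the difference between closed and half-open ranges when slicing an array; the values and the name rows are purely illustrative:

```swift
let values: [UInt32] = [10, 20, 30, 40]
let rows = 2

let single = values[0...0]        // closed range: just element 0
let closed = values[0...rows]     // closed range: elements 0, 1 and 2
let halfOpen = values[0..<rows]   // half-open range: elements 0 and 1

print(single.count)   // 1
print(closed.count)   // 3
print(halfOpen.count) // 2
```

So a closed range ending at rows covers rows + 1 elements, while the half-open form covers exactly rows.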