Commit f555ee4

Minor binary search optimization for field lookup slow path.
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.

Loop body before:

```
.LBB0_2:
  add w8, w12, #1
  cmp w8, w11
  b.gt .LBB0_6  // Predictable branch, ends the loop
.LBB0_3:
  add w12, w8, w11
  add w12, w12, w12, lsr #31
  asr w12, w12, #1
  smaddl x0, w12, w10, x9
  ldr w13, [x0]
  cmp w13, w1
  b.lo .LBB0_2  // Unpredictable branch here! Will be hit 50/50 in prod
  b.ls .LBB0_7  // Predictable branch - ends the loop
  sub w11, w12, #1
  cmp w8, w11
  b.le .LBB0_3  // Predictable branch - continues the loop
```

Loop body after:

```
.LBB7_1:
  cmp w9, w11
  b.hi .LBB7_4  // Predictable branch - ends the loop
  add w12, w9, w11
  lsr w12, w12, #1
  umaddl x0, w12, w8, x10
  sub w14, w12, #1
  ldr w13, [x0]
  cmp w13, w1
  csel w11, w14, w11, hs
  csinc w9, w9, w12, hs
  b.ne .LBB7_1  // Predictable branch - continues the loop
```

PiperOrigin-RevId: 703213921
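To make the two loop shapes in the listings above easier to compare at the source level, here is a minimal standalone sketch. It is not the upb code: `find_branchy`, `find_select`, and `kSampleKeys` are hypothetical names, and a plain sorted `uint32_t` array stands in for the field table. The sketch also computes `hi_mid` with an explicit cast rather than relying on the unsigned-to-signed conversion used in the commit. Whether a compiler actually emits CSEL/CSINC for the second form depends on the target and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical sample data: sorted keys, with the first 3 entries "dense".
static const uint32_t kSampleKeys[] = {1, 2, 3, 10, 20, 30, 40};

// Old shape: each comparison is its own branch with an early `continue`.
// Returns the index of `number` in keys[lo_start..count-1], or -1.
static int32_t find_branchy(const uint32_t* keys, uint32_t lo_start,
                            uint32_t count, uint32_t number) {
  int lo = (int)lo_start;
  int hi = (int)count - 1;
  while (lo <= hi) {
    int mid = (lo + hi) / 2;
    uint32_t num = keys[mid];
    if (num < number) {
      lo = mid + 1;
      continue;
    }
    if (num > number) {
      hi = mid - 1;
      continue;
    }
    return mid;
  }
  return -1;
}

// New shape: both candidate bounds are computed up front, so the
// less-than/greater-than decision can become a pair of conditional selects.
static int32_t find_select(const uint32_t* keys, uint32_t lo_start,
                           uint32_t count, uint32_t number) {
  uint32_t lo = lo_start;
  int32_t hi = (int32_t)count - 1;
  while (hi >= (int32_t)lo) {
    uint32_t mid = ((uint32_t)hi + lo) / 2;  // hi is non-negative here
    uint32_t num = keys[mid];
    int32_t hi_mid = (int32_t)mid - 1;  // may be -1 when mid == 0
    uint32_t lo_mid = mid + 1;
    if (num == number) {
      return (int32_t)mid;
    }
    if (num < number) {
      lo = lo_mid;
    } else {
      hi = hi_mid;
    }
  }
  return -1;
}
```

Both functions return the same index for any sorted input; only the control-flow shape differs.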
1 parent 671ae8f commit f555ee4

1 file changed, +21 -13 lines changed

upb/mini_table/message.c

Lines changed: 21 additions & 13 deletions
@@ -7,7 +7,6 @@

 #include "upb/mini_table/message.h"

-#include <inttypes.h>
 #include <stddef.h>
 #include <stdint.h>

@@ -27,21 +26,30 @@ const upb_MiniTableField* upb_MiniTable_FindFieldByNumber(
   }

   // Slow case: binary search
-  int lo = m->UPB_PRIVATE(dense_below);
-  int hi = m->UPB_PRIVATE(field_count) - 1;
-  while (lo <= hi) {
-    int mid = (lo + hi) / 2;
-    uint32_t num = m->UPB_PRIVATE(fields)[mid].UPB_PRIVATE(number);
-    if (num < number) {
-      lo = mid + 1;
-      continue;
+  uint32_t lo = m->UPB_PRIVATE(dense_below);
+  int32_t hi = m->UPB_PRIVATE(field_count) - 1;
+  const upb_MiniTableField* base = m->UPB_PRIVATE(fields);
+  while (hi >= (int32_t)lo) {
+    uint32_t mid = (hi + lo) / 2;
+    uint32_t num = base[mid].UPB_ONLYBITS(number);
+    // These comparison operations allow, on ARM machines, to fuse all these
+    // branches into one comparison followed by two CSELs to set the lo/hi
+    // values, followed by a BNE to continue or terminate the loop. Since binary
+    // search branches are generally unpredictable (50/50 in each direction),
+    // this is a good deal. We use signed for the high, as this decrement may
+    // underflow if mid is 0.
+    int32_t hi_mid = mid - 1;
+    uint32_t lo_mid = mid + 1;
+    if (num == number) {
+      return &base[mid];
     }
-    if (num > number) {
-      hi = mid - 1;
-      continue;
+    if (num < number) {
+      lo = lo_mid;
+    } else {
+      hi = hi_mid;
     }
-    return &m->UPB_PRIVATE(fields)[mid];
   }
+
   return NULL;
 }

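The code comment in the diff notes that `hi` is kept signed because `mid - 1` may underflow when `mid` is 0. The following small sketch isolates that one step to show how the loop condition behaves in each case. The function names are hypothetical and exist only for illustration; they are not part of upb.

```c
#include <stdint.h>

// With an unsigned upper bound, `mid - 1` at mid == 0 wraps to UINT32_MAX,
// so a `hi >= lo` check would still pass and the loop would keep running.
static int continues_with_unsigned_hi(uint32_t lo) {
  uint32_t mid = 0;
  uint32_t hi = mid - 1;  // wraps around; well-defined for unsigned types
  return hi >= lo;
}

// With a signed upper bound, the same step yields -1, so the
// `hi >= (int32_t)lo` check fails and the loop terminates.
static int continues_with_signed_hi(uint32_t lo) {
  uint32_t mid = 0;
  int32_t hi = (int32_t)mid - 1;  // -1
  return hi >= (int32_t)lo;
}
```

For `lo == 0`, the unsigned variant reports that the loop would continue, while the signed variant reports that it stops.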
