Commit f555ee4
Minor binary search optimization for field lookup slow path.
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.
Loop body before:
```
.LBB0_2:
add w8, w12, #1
cmp w8, w11
b.gt .LBB0_6 // Predictable branch, ends the loop
.LBB0_3:
add w12, w8, w11
add w12, w12, w12, lsr #31
asr w12, w12, #1
smaddl x0, w12, w10, x9
ldr w13, [x0]
cmp w13, w1
b.lo .LBB0_2 // Unpredictable branch here! Will be hit 50/50 in prod
b.ls .LBB0_7 // Predictable branch - ends the loop
sub w11, w12, #1
cmp w8, w11
b.le .LBB0_3 // Predictable branch - continues the loop
```
Loop body after:
```
.LBB7_1:
cmp w9, w11
b.hi .LBB7_4 // Predictable branch - ends the loop
add w12, w9, w11
lsr w12, w12, #1
umaddl x0, w12, w8, x10
sub w14, w12, #1
ldr w13, [x0]
cmp w13, w1
csel w11, w14, w11, hs
csinc w9, w9, w12, hs
b.ne .LBB7_1 // Predictable branch - continues the loop
```
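The changed source is not reproduced on this page, so the following is only a minimal C sketch of the kind of rewrite that can produce the difference shown above. The type `field_t` and the functions `find_branchy` and `find_select` are hypothetical names chosen for illustration, not identifiers from the commit. The first function uses a three-way compare, which tends to compile to a data-dependent branch inside the loop. The second computes both candidate bounds up front and picks between them with conditional expressions, which a compiler may lower to `csel`/`csinc`. Whether it does so depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type: a sorted array keyed by `number`. */
typedef struct {
  uint32_t number;
  uint32_t payload;
} field_t;

/* Three-way compare in the loop body. The `num < number` test is a
 * data-dependent branch that is close to 50/50 on random lookups. */
static const field_t* find_branchy(const field_t* base, int32_t count,
                                   uint32_t number) {
  assert(count == 0 || base != NULL);
  int32_t lo = 0;
  int32_t hi = count - 1;
  while (lo <= hi) {
    int32_t mid = lo + (hi - lo) / 2;
    uint32_t num = base[mid].number;
    if (num < number) {
      lo = mid + 1;
    } else if (num > number) {
      hi = mid - 1;
    } else {
      return &base[mid];
    }
  }
  return NULL;
}

/* Select-friendly form. Both candidate bounds are computed before the
 * comparison, and the updates are written as conditional expressions so
 * the only remaining branches are the loop exit and the equality test.
 * `hi` is signed because mid - 1 becomes -1 when mid is 0. */
static const field_t* find_select(const field_t* base, int32_t count,
                                  uint32_t number) {
  assert(count == 0 || base != NULL);
  int32_t lo = 0;
  int32_t hi = count - 1;
  while (lo <= hi) {
    int32_t mid = lo + (hi - lo) / 2;
    uint32_t num = base[mid].number;
    int32_t lo_next = mid + 1;
    int32_t hi_next = mid - 1;
    if (num == number) {
      return &base[mid];
    }
    lo = (num < number) ? lo_next : lo;
    hi = (num < number) ? hi : hi_next;
  }
  return NULL;
}
```

Both functions return the same result for every input. The difference is only in how the bounds update is expressed, so any speedup should be confirmed by inspecting the generated assembly and benchmarking on the target core.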
PiperOrigin-RevId: 7032139211

Parent commit: 671ae8f
1 file changed: 21 additions, 13 deletions.