
Conversation

@worr
Contributor

@worr worr commented Sep 20, 2014

In Solaris, sched_yield lives in librt, rather than libc. This patch adds a
check which will link in librt if necessary.

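A portable way to express such a check in autoconf is `AC_SEARCH_LIBS`, which tries to link the function against each candidate library in turn and appends the first one that works to `LIBS`. This is a hypothetical sketch, not the patch's actual `configure.ac` hunk:

```m4
# Hypothetical configure.ac fragment: first try sched_yield with no
# extra library (Linux, BSD), then with -lrt (older Solaris); -lrt is
# added to LIBS only when it is actually needed.
AC_SEARCH_LIBS([sched_yield], [rt],
               [], [AC_MSG_ERROR([sched_yield not found])])
```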
@cbsmith

cbsmith commented Sep 21, 2014

This looks perfect. Small/compact/goes from "not work" to "work".

@xfxyjwf
Contributor

xfxyjwf commented Sep 22, 2014

Thanks!

xfxyjwf added a commit that referenced this pull request Sep 22, 2014
Add check for sched_yield in librt
@xfxyjwf xfxyjwf merged commit a48c08a into protocolbuffers:master Sep 22, 2014
TeBoring pushed a commit to TeBoring/protobuf that referenced this pull request Jan 19, 2019
Moved DynASM to third_party to comply with Google policy.
copybara-service bot pushed a commit that referenced this pull request Nov 28, 2024
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.

Loop body before:
```
.LBB0_2:
        add     w8, w12, #1
        cmp     w8, w11
        b.gt    .LBB0_6 // Predictable branch - ends the loop
.LBB0_3:
        add     w12, w8, w11
        add     w12, w12, w12, lsr #31
        asr     w12, w12, #1
        smaddl  x0, w12, w10, x9
        ldr     w13, [x0]
        cmp     w13, w1
        b.lo    .LBB0_2 // Unpredictable branch here! Will be hit 50/50 in prod
        b.ls    .LBB0_7 // Predictable branch - ends the loop
        sub     w11, w12, #1
        cmp     w8, w11
        b.le    .LBB0_3 // Predictable branch - continues the loop
```

Loop body after:
```
.LBB7_1:
        cmp     w9, w11
        b.hi    .LBB7_4 // Predictable branch - ends the loop
        add     w12, w9, w11
        lsr     w12, w12, #1
        umaddl  x0, w12, w8, x10
        sub     w14, w12, #1
        ldr     w13, [x0]
        cmp     w13, w1
        csel    w11, w14, w11, hs
        csinc   w9, w9, w12, hs
        b.ne    .LBB7_1 // Predictable branch - continues the loop
```

PiperOrigin-RevId: 700864625
copybara-service bot pushed a commit that referenced this pull request Dec 4, 2024
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.
PiperOrigin-RevId: 700864625
copybara-service bot pushed a commit that referenced this pull request Dec 5, 2024
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.
PiperOrigin-RevId: 703213921
copybara-service bot pushed a commit that referenced this pull request Dec 5, 2024
On a Cortex-A55 this resulted in a 28.30% reduction in CPU and wall time for the binary search path.
PiperOrigin-RevId: 703214356
3 participants