  1. ax_sign ^ (!same + same * (ax_abs - bx_abs)); returns an incorrect signed result if int is not 32-bit. That is certainly a problem if int is 16-bit, and likely a problem if int is 64-bit. If unsigned/int need to be 32-bit, use (u)int32_t (a fixed-width sketch follows this list).

  2. The , (comma) operator reduces clarity here. Suggest 2 lines of code:

     // ax.f = a->x, bx.f = b->x;
     ax.f = a->x;
     bx.f = b->x;
    
  3. I ran test cases and found OP's x_cmp() functionally correct for finite float over 12,000,000,000 test cases.

  4. As a timing test, I compared OP's original x_cmp() against the code below, and the code below was at least 10% faster. Of course, that is just one platform comparison, yet aside from NaN issues, the code below is functionally similar to OP's and, as a plus, is highly portable, unlike OP's. The point is that OP's compare method needs some reference point to justify the bit magic.

     static int x_cmp_ref(const void *av, const void *bv) {
       const float a = *(const float *) av;
       const float b = *(const float *) bv;
       return (a > b) - (a < b);
     }
    
  5. OP has not stated the intended compare behavior for Not-a-Number (NaN) floats. A desirable aspect is that all NaN sort to one side, either all greater or all less than any other number, regardless of the NaN's "sign". x_cmp() considers the sign first without regard to NaN-ness; a NaN-aware variant is sketched just below.
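
A minimal sketch (not OP's code) of such a NaN-aware compare, sorting every NaN above all numbers regardless of its sign bit:

#include <math.h>

static int x_cmp_nan_high(const void *av, const void *bv) {
  float a = *(const float *) av;
  float b = *(const float *) bv;
  int a_nan = isnan(a);
  int b_nan = isnan(b);
  if (a_nan || b_nan) {
    return a_nan - b_nan;  // any NaN compares greater than any non-NaN
  }
  return (a > b) - (a < b);
}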


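And to make point 1 concrete, an illustrative fixed-width version of the declarations (the names mirror the review point, not OP's exact code):

#include <stdint.h>

typedef union {
  float f;
  uint32_t u;  /* exact-width, rather than plain unsigned */
  int32_t i;   /* exact-width, rather than plain int */
} fbits;

/* likewise declare the intermediates with fixed widths:
   uint32_t ax_abs, bx_abs;  int32_t ax_sign, same; */
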
As a reference, I used the following to generate random float values:

#include <math.h>    /* isfinite() */
#include <stdlib.h>  /* rand()     */

// Fill the float's bytes with random values; retry until the bit pattern
// is a finite float (rejects NaN and infinities).
float randf(void) {
  union {
      float f;
      unsigned char uc[sizeof (float)];
  } u;
  do {
    for (unsigned i = 0; i < sizeof u.uc; i++) {
      u.uc[i] = (unsigned char) rand();
    }
  } while (!isfinite(u.f));
  return u.f;
}
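
For completeness, a harness along these lines (not the exact one used for the counts above) drives that kind of check; it assumes randf() and x_cmp_ref() from the snippets above are in the same file, and the test count here is only illustrative:

#include <stdio.h>

// Verify a qsort-style float comparator agrees in sign with < / > over
// random finite values from randf().
static unsigned long long test_cmp(int (*cmp)(const void *, const void *),
                                   unsigned long long n) {
  unsigned long long fails = 0;
  for (unsigned long long i = 0; i < n; i++) {
    float a = randf();
    float b = randf();
    int r = cmp(&a, &b);
    int expect = (a > b) - (a < b);
    if (((r > 0) - (r < 0)) != expect) {
      fails++;
    }
  }
  return fails;
}

int main(void) {
  printf("failures: %llu\n", test_cmp(x_cmp_ref, 1000000));
  return 0;
}

Swapping in the comparator under review (adjusted for its argument type, since x_cmp() dereferences struct pointers) gives the same kind of check.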

At a later time, I may try to implement the idea in this comment.
