Revisions to Doing a qsort function to compare strings, case insensitive

ctype functions work better with unsigned char

edited Mar 5, 2021 at 15:41

As mentioned in another answer, we need to use unsigned char with tolower(), because passing it a negative char value (other than EOF) is undefined behaviour:

int ci_strcmp(const unsigned char *s1, const unsigned char *s2)
{
    for (;;) {
        int c1 = tolower(*s1++);
        int c2 = tolower(*s2++);
        int result = (c1 > c2) - (c1 < c2);
        if (result || !c1) return result;
    }
}

/* Adapter for qsort */
int scmp(const void *p1, const void *p2)
{
    const unsigned char *const *sp1 = p1;
    const unsigned char *const *sp2 = p2;
    return ci_strcmp(*sp1, *sp2);
}

Use sizeof!

edited Mar 5, 2021 at 10:00

Toby Speight

We missed a few necessary includes for the comparison function:

#include <ctype.h>
#include <stdint.h>
#include <string.h>

And for the test program:

#include <stdio.h>
#include <stdlib.h>

We're assigning char* pointers using string literals. Whilst C allows this for historical reasons, it's dangerous, and should be avoided (because writing through such pointers is undefined behaviour, and the compiler can't spot that for you). It's easy to fix that:
```
const char* strings[4] = {"Onus", "deacon", "Alex", "zebra"};
```

And one that GCC can't spot:

qsort(strings, 4, 8, scmp);
//               ^^^

We can't assume that a const char* is a particular size on the target platform. Whilst this value may be correct where you are, it's not true everywhere. Luckily, C provides the sizeof operator to help:

qsort(strings, 4, sizeof strings[0], scmp);
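As a minimal sketch of that call in context, assuming the ci_strcmp() and scmp() from the latest revision of this answer; sort_strings_ci() is a hypothetical helper name that isn't in the original program:

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive comparison, as in the latest revision */
int ci_strcmp(const unsigned char *s1, const unsigned char *s2)
{
    for (;;) {
        int c1 = tolower(*s1++);
        int c2 = tolower(*s2++);
        int result = (c1 > c2) - (c1 < c2);
        if (result || !c1) return result;
    }
}

/* Adapter for qsort: each array element is a pointer to a string */
int scmp(const void *p1, const void *p2)
{
    const unsigned char *const *sp1 = p1;
    const unsigned char *const *sp2 = p2;
    return ci_strcmp(*sp1, *sp2);
}

/* Hypothetical helper (an assumption, not the original code):
   sorts `count` string pointers without hard-coding the element size */
void sort_strings_ci(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], scmp);
}
```

Calling it as sort_strings_ci(strings, sizeof strings / sizeof strings[0]) would avoid hard-coding the element count as well, though that expression only works where strings is a real array rather than a pointer.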

added 1 character in body; added 62 characters in body; added 50 characters in body

edited Mar 5, 2021 at 9:52

Toby Speight

We might consider including some non-alphabetic characters, too (should "A-team" sort before or after "abacus"?). And we'll want some tests of equal and almost-equal strings, not just ones that differ in the first character.
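A rough sketch of what such tests could look like, assuming a byte-wise comparison like the ci_strcmp() in the latest revision of this answer, ASCII, and the default "C" locale. The names ci_cmp() and run_ci_checks() are made up for illustration:

```c
#include <ctype.h>

/* Case-insensitive comparison, as in the latest revision */
int ci_strcmp(const unsigned char *s1, const unsigned char *s2)
{
    for (;;) {
        int c1 = tolower(*s1++);
        int c2 = tolower(*s2++);
        int result = (c1 > c2) - (c1 < c2);
        if (result || !c1) return result;
    }
}

/* Hypothetical wrapper so the checks can pass plain string literals */
int ci_cmp(const char *a, const char *b)
{
    return ci_strcmp((const unsigned char *)a, (const unsigned char *)b);
}

/* Returns the number of failed checks; 0 means every check passed */
int run_ci_checks(void)
{
    int failures = 0;
    failures += ci_cmp("Alex", "alex") != 0;     /* equal apart from case */
    failures += ci_cmp("", "") != 0;             /* two empty strings */
    failures += ci_cmp("zebra", "zebras") >= 0;  /* a prefix sorts first */
    failures += ci_cmp("deacon", "Deacom") <= 0; /* differ only in last char */
    failures += ci_cmp("A-team", "abacus") >= 0; /* '-' is below 'b' in ASCII */
    return failures;
}
```

With this comparison, "A-team" sorts before "abacus" simply because '-' has a lower code than 'b'; whether that's the order we actually want is a design decision for the author.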

Looking in detail at the comparison function, it does a lot of extra work to make a copy of each input string. We have to traverse each string up to three times: once to find its length, once to copy and convert case, and once to compare.

answered Mar 5, 2021 at 8:16

Toby Speight
