Case Insensitive String Comparison in C

Question

I have two postcodes, stored as char*, that I want to compare while ignoring case. Is there a function to do this?

Or do I have to loop through each string, use the tolower function on each character, and then do the comparison?

Any idea how this function will react to numbers in the string?

Thanks

I think I phrased that badly: postcode is not a type, just the real-world value the char* will hold.
– bond425, Commented Apr 28, 2011 at 15:11
What platform are you on? Many platforms have a platform-specific function to do this.
– Random832, Commented Apr 28, 2011 at 15:11
If you are comparing a number with a letter, then you know the strings aren't equivalent, regardless of case.
– Alex Reynolds, Commented Apr 28, 2011 at 15:11
I assume you just mean ASCII string comparison? Not generic to the whole world across multiple locales?
– Doug T., Commented Apr 28, 2011 at 15:11
The comparison could result in comparing a number and a letter. I need to test whether two postcodes are equal, or whether one is greater than or less than the other. The greater-than/less-than part is confusing; I'm not sure how that's going to work out.
– bond425, Commented Apr 28, 2011 at 16:49

4 revs, 2 users 97% · Accepted Answer · 2018-08-23 18:29:59Z

71

There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own:

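#include <ctype.h>

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        // Cast to unsigned char: tolower() is only defined for unsigned char values and EOF.
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}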

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
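Since the question also asks how digits behave and how to get less-than/greater-than results, here is a small sketch built on strcicmp() above. tolower() returns digits and spaces unchanged, so they are compared by their plain character codes, and the sign of the result means the same thing as with strcmp(). strcicmp_sign() is a hypothetical helper name, and the postcodes in the usage note are made-up sample values.

```c
#include <ctype.h>

// strcicmp() from the answer above, repeated so this block compiles on its own.
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

// Hypothetical helper: collapse the result to -1, 0 or +1 so callers can test
// "less than", "equal" and "greater than" without relying on the magnitude.
int strcicmp_sign(char const *a, char const *b)
{
    int d = strcicmp(a, b);
    return (d > 0) - (d < 0);
}
```

For example, strcicmp_sign("NW3 6SG", "nw3 6sg") is 0, strcicmp_sign("NW3 6SG", "NW4 1AA") is -1 because '3' < '4', and strcicmp_sign("SW1A 1AA", "sw1 2ab") is 1 because 'a' sorts after ' ' in ASCII.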

edited Aug 23, 2018 at 18:29

community wiki

4 revs, 2 users 97%
Fred Foo


9 Comments

RobertoP Over a year ago

This implementation is not correct; it will incorrectly return 0 when b is a substring of a. For example it will return 0 for strcicmp("another", "an") but it should return 1

B. Nadolson Over a year ago

This is bad advice. There is no reason to "write your own" standard C text functions to deal with a simple name difference. Do #ifdef _WINDOWS ... #define strcasecmp stricmp ... #endif and put it in an appropriate header. The above comments where the author had to fix the function to work right is why rewriting standard C functions is counter-productive if a far simpler solution is available.

minexew Over a year ago

Neither _stricmp nor strcasecmp is available in -std=c++11. They also have different semantics with regards to locale.

YoTengoUnLCD Over a year ago

This will break awfully when a or b are NULL.

chux Over a year ago

@YoTengoUnLCD Re: break awfully when a or b are NULL. Breaking with a and/or b as NULL is commonly accepted practice as a null pointer does not point to a string. Not a bad check to add, yet what to return? Should cmp("", NULL) return 0, INT_MIN? There is not consensus on this. Note: C allows UB with strcmp(NULL, "abc");.


Georg Plaz · Accepted Answer · 2021-02-22 19:45:19Z

49

Take a look at strcasecmp() in strings.h.
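A minimal sketch, assuming a POSIX system where <strings.h> provides strcasecmp() and strncasecmp(); equal_ci() and has_prefix_ci() are hypothetical wrapper names, not library functions.

```c
#include <string.h>   // strlen()
#include <strings.h>  // POSIX: strcasecmp(), strncasecmp()

// Hypothetical helper: nonzero if the two strings match, ignoring case.
int equal_ci(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

// Hypothetical helper: nonzero if s starts with prefix, ignoring case.
// strncasecmp() stops after strlen(prefix) characters or at a null character.
int has_prefix_ci(const char *s, const char *prefix)
{
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```

On Windows, the MSVC runtime offers _stricmp() and _strnicmp() instead.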

edited Feb 22, 2021 at 19:45

Georg Plaz


answered Apr 28, 2011 at 15:11

Mihran Hovsepyan


9 Comments

Brigham Over a year ago

I think you mean int strcasecmp(const char *s1, const char *s2); in strings.h

Fred Foo Over a year ago

This function is non-standard; Microsoft calls it stricmp. @entropo: strings.h is a header for compatibility with 1980s Unix systems.

Fred Foo Over a year ago

@entropo: apologies, POSIX does seem to define strings.h. It also defines strcasecmp, to be declared in that header. ISO C doesn't have it, though.

entropo Over a year ago

See: difference-between-string-h-and-strings-h. Some C standard libraries have merged all of the non-deprecated functions into string.h; see, e.g., glibc.

Mihran Hovsepyan Over a year ago

Yes, it seems there is such a header, strings.h, and in theory strcasecmp should be declared there. But all the compilers I have used declare strcasecmp in string.h; at least the cl, g++ and Forte C++ compilers do.


chux · Accepted Answer · 2024-11-27 11:52:59Z

Additional pitfalls to watch out for when doing case insensitive compares:

Comparing as lower or as upper case? (common enough issue)

Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different sign, because '_' often lies between the upper and lower case letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions like the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.

#include <ctype.h>

int strcicmpL(char const *a, char const *b) {
  while (*b) {
    int d = tolower((unsigned char) *a) - tolower((unsigned char) *b);
    if (d) {
      return d;
    }
    a++;
    b++;
  }
  return tolower((unsigned char) *a);
}

int strcicmpU(char const *a, char const *b) {
  while (*b) {
    int d = toupper((unsigned char) *a) - toupper((unsigned char) *b);
    if (d) {
      return d;
    }
    a++;
    b++;
  }
  return toupper((unsigned char) *a);
}
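To see the effect on sorting, here is a sketch (assuming ASCII, where '_' is 95, 'A' to 'Z' are 65 to 90 and 'a' to 'z' are 97 to 122) that sorts the same words with a lower-case-folding and an upper-case-folding qsort() comparator. fold_cmp(), cmp_lower(), cmp_upper() and sort_ci() are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>

// Compare two strings after folding every byte with fold (tolower or toupper).
// Bytes go through unsigned char so negative char values never reach <ctype.h>.
static int fold_cmp(const char *a, const char *b, int (*fold)(int))
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = fold(*ua) - fold(*ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}

// qsort() passes pointers to the array elements, i.e. pointers to const char *.
static int cmp_lower(const void *p, const void *q)
{
    return fold_cmp(*(const char *const *)p, *(const char *const *)q, tolower);
}

static int cmp_upper(const void *p, const void *q)
{
    return fold_cmp(*(const char *const *)p, *(const char *const *)q, toupper);
}

// Sort words[0..n-1] ignoring case: fold to lower case if use_lower is
// nonzero, otherwise fold to upper case.
void sort_ci(const char *words[], size_t n, int use_lower)
{
    qsort(words, n, sizeof words[0], use_lower ? cmp_lower : cmp_upper);
}
```

Sorting {"b", "_", "A"} with lower-case folding gives "_", "A", "b" ('_' < 'a'), whereas upper-case folding gives "A", "b", "_" ('B' < '_').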

char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)

char can be signed and not use 2's complement. (rare)

[edit 2024]
This is no longer possible with C23, as the standard now requires 2's complement for signed integer types.

The above does not handle -0 nor other negative values properly as the bit pattern should be interpreted as unsigned char. To properly handle all integer encodings, change the pointer type first.

// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
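A sketch of a full comparison that changes the pointer type first, as recommended above; strcicmp_u() is a hypothetical name. Reading each byte through const unsigned char * makes a byte such as 0xE4 count as 228, so the result orders bytes the way strcmp() does. tolower() still depends on the current LC_CTYPE locale; in the default "C" locale only 'A' to 'Z' are folded.

```c
#include <ctype.h>

// Case-insensitive compare that reads every byte through an unsigned char
// pointer, so each byte is taken by its bit pattern (as strcmp() does) and
// is always a valid argument for tolower().
int strcicmp_u(char const *a, char const *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = tolower(*ua) - tolower(*ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}
```

In the "C" locale, strcicmp_u("\xC4", "\xE4") is non-zero because 0xC4 is not considered an upper case letter there; after setlocale(LC_CTYPE, "") the outcome depends on the system's locale.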

Locale (less common)

Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.

Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C standard library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lower case characters to one upper case character and vice versa. Further, some uppercase characters may lack a lower case equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().

int d = tolower(toupper((unsigned char) *a)) - tolower(toupper((unsigned char) *b));

Again, sorting results can differ depending on whether code uses tolower(toupper(*a)) or toupper(tolower(*a)).

Portability

@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs highly portable, equivalent functionality.

Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop rather than two by using 2 different tables that differ only at index 0 ('\0'). Your results may vary.

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but 'A', which matches no entry in low1[]
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // Compare using tables that differ slightly so no need to check for a null character.
  while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}

Pedantically: there exist systems where unsigned char has the same range as unsigned int, so the subtraction above could wrap around and yield the wrong sign. It is better to compare at the end:

unsigned char c1 = low1[*(const unsigned char *)a];
unsigned char c2 = low1[*(const unsigned char *)b];
return (c1 > c2) - (c1 < c2);
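The tables above are elided with "...". As a runnable sketch, assuming ASCII-only folding (only 'A' to 'Z' map to 'a' to 'z', every other byte maps to itself), the tables can be filled at run time instead; init_tables() is a hypothetical helper that must be called once before strcicmp_ch().

```c
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

// Fill both tables: 'A'-'Z' fold to 'a'-'z', every other byte maps to itself.
// low2 differs from low1 only at index 0, where it holds 'A'. No entry of
// low1 equals 'A' and no entry of low2 equals 0, so a null character in
// either string always ends the loop in strcicmp_ch().
static void init_tables(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        unsigned char c = (unsigned char)i;
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');
        low1[i] = c;
        low2[i] = c;
    }
    low2[0] = 'A';
}

int strcicmp_ch(char const *a, char const *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    // One comparison per character; no separate null check is needed.
    while (low1[*ua] == low2[*ub]) {
        ua++;
        ub++;
    }
    // Either the strings differ or a null character was reached:
    // compare both bytes through the same table.
    unsigned char c1 = low1[*ua];
    unsigned char c2 = low1[*ub];
    return (c1 > c2) - (c1 < c2);
}
```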

Zohar81 · Accepted Answer · 2016-01-04 11:16:54Z

8

I've found a built-in function for this in <strings.h>, which contains string functions in addition to those in the standard header <string.h>.

Here are the relevant signatures:

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);

I also found its equivalent in the xnu kernel (osfmk/device/subrs.c), implemented as shown below, so you should not expect any change in behavior for digits compared with the original strcmp function.

// ASCII-only helper as written in the kernel source; in user code, use
// tolower() from <ctype.h> instead of defining your own.
static int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
}

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const unsigned char *)s1,
                        *us2 = (const unsigned char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}

answered Jan 4, 2016 at 11:16

Zohar81


3 Comments

Mike C. Over a year ago

Kudos for mentioning the safer strncasecmp() function!

chux Over a year ago

strcasecmp() and strncasecmp() are not part of the standard C library, but common additions in *nix.

Andrew Henle Over a year ago

Note that there's no reason to implement your own tolower() function if you're compiling with a standards-compliant compiler/C implementation - tolower() is a required function per effectively every version of the C standard.

Jonathan Wood · Accepted Answer · 2011-04-28 15:17:43Z

6

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the string to lower case can be faster.
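One case where lowering first can pay off: the stored strings are kept in lower case already, so each lookup folds only the key, once, and then uses plain strcmp(). This is a sketch under that assumption; lower_copy(), find_ci() and the 64-byte key limit are hypothetical choices, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Write a lower-cased copy of src into dst, which holds dstsize bytes.
// Returns 0 on success, or -1 if src plus its terminator does not fit.
int lower_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;
    for (size_t i = 0; i <= len; i++)   // <= also copies the '\0'
        dst[i] = (char)tolower((unsigned char)src[i]);
    return 0;
}

// Return the index of key in table[0..n-1], or n if it is absent.
// The table entries are assumed to be stored in lower case already.
size_t find_ci(const char *key, const char *const table[], size_t n)
{
    char buf[64];   // illustrative limit for short keys such as postcodes
    if (lower_copy(buf, sizeof buf, key) != 0)
        return n;
    for (size_t i = 0; i < n; i++)
        if (strcmp(buf, table[i]) == 0)
            return i;
    return n;
}
```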

answered Apr 28, 2011 at 15:17

Jonathan Wood


Comments

Miljen Mikic · Accepted Answer · 2019-05-23 15:37:36Z

4

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with a simple #ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main(void) {
    printf("%d\n", strcasecmp("teSt", "TEst"));
    return 0;
}

answered May 23, 2019 at 15:37

Miljen Mikic


2 Comments

Gustavo Vargas Over a year ago

This reminds me that strings.h (with an s) is not the same as string.h... I spent some time looking for strcasecmp in the wrong one...

Miljen Mikic Over a year ago

@GustavoVargas Me too, then I decided to write it here and save time for the future myself and others :)

Gabriel Staples · Accepted Answer · 2024-07-25 22:37:45Z

4

POSIX `<strings.h>` header file replacement for `strcasecmp()` and `strncasecmp()` in C

Update 25 July 2024:

My latest work on this is now here:

The above library contains my implementations of my_strcasecmp() and my_strncasecmp() and uses Gtest to test them directly against the POSIX functions strcasecmp() and strncasecmp() which are contained in the non-C-standard POSIX header file named strings.h.

To test and run:

If in Linux, use your regular Bash terminal. If in Windows, use the MSYS2 terminal. See my instructions here: Installing & setting up MSYS2 from scratch, including adding all 7 profiles to Windows Terminal
First, install Gtest by following my instructions here: How do I build and use googletest (gtest) and googlemock (gmock) with gcc/g++ or clang?

Then, clone my repo and run the unit test file as a Bash script:

# clone it
git clone https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world.git

# cd into the directory
cd eRCaGuy_hello_world/c

# build and run the unit test
./stringslib_unittest.cpp

Example run and output:

eRCaGuy_hello_world/c$ ./stringslib_unittest.cpp 
Running main() from /home/gabriel/Downloads/Install_Files/gtest/googletest-1.14.0/googletest/src/gtest_main.cc
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from stringslib
[ RUN      ] stringslib.strncasecmp
[       OK ] stringslib.strncasecmp (0 ms)
[ RUN      ] stringslib.strcasecmp
[       OK ] stringslib.strcasecmp (0 ms)
[----------] 2 tests from stringslib (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (0 ms total)
[  PASSED  ] 2 tests.

`strncmpci()`, a direct, drop-in case-insensitive string comparison replacement for `strncmp()` and `strcmp()`

I'm not really a fan of the most-upvoted answer here (in part because it seems like it isn't correct since it should continue if it reads a null terminator in either string--but not both strings at once--and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

It is case-insensitive.
The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on unicode characters, such as unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):

#include <ctype.h>
#include <limits.h>
#include <stddef.h>

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
    // long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
    // of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
    // that string still has more characters in it.
    // Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
    // `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
    // both of these C-strings outside of their array bounds.
    while ((chars_compared < num) && (*str1 || *str2))
    {
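        // Cast to unsigned char first: passing a negative char value to tolower() is undefined behavior.
        ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);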
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Test code:

Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository here: "strncmpci.c":

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!

References:

This question & other answers here served as inspiration and gave some insight (Case Insensitive String Comparison in C)
http://www.cplusplus.com/reference/cstring/strncmp/
https://en.wikipedia.org/wiki/ASCII
https://en.cppreference.com/w/c/language/operator_precedence
Undefined Behavior research I did to fix part of my code above (see comments below):
1. Google search for "c undefined behavior reading outside array bounds"
2. Is accessing a global array outside its bound undefined behavior?
3. https://en.cppreference.com/w/cpp/language/ub - see also the many really great "External links" at the bottom!
4. 1/3: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
5. 2/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
6. 3/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
7. https://blog.regehr.org/archives/213
8. https://www.geeksforgeeks.org/accessing-array-bounds-ccpp/

Topics to further research

(Note: this is C++, not C) Lowercase of Unicode character
tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

Make a version of this code which also works on Unicode's UTF-8 implementation (character encoding)!

edited Jul 25, 2024 at 22:37

answered Mar 22, 2019 at 5:40

Gabriel Staples


14 Comments

Pavel P Over a year ago

in part because it isn't correct since ... your code isn't correct either. There is no point in using tolower; it's going to be by far the slowest part of the function. If you really want your function to be locale-aware and handle non-ASCII chars, then you have to cast your chars to unsigned first. Otherwise, your code results in UB.

GaspardP Over a year ago

Voting down this solution: it advertises itself as a drop-in, tested solution, but a simple additional test using "" shows that it does not behave like the Linux/Windows version, returning strncmpci("", "", 0) = -9999 instead of 0.

Gabriel Staples Over a year ago

Hi @GaspardP, thanks for pointing out this edge case. I've fixed my code now. The fix was simple. I initialized ret_code to 0 instead of to INT_MIN (or -9999 as it was in the code you tested), and then set it to INT_MIN only if one of the input strings is a NULL ptr. Now it works perfectly. The problem was simply that for n is 0, none of the blocks were entered (neither the if nor the while), so it simply returned what I had initialized ret_code to. Anyway, it's fixed now, & I've cleaned up my unit tests a ton and added in the test you mentioned. Hopefully you upvote now.

chux Over a year ago

Forming the address one past the range is OK. Dereferencing that address, as the code does here with *str1, is UB. The code here is "using" it in that it is attempting to read through that pointer. The UB is usually benign, yet remains UB. The whole point of a size parameter is to prevent access outside the array bounds, which this code violated. With num == 0, nothing should be read.

Gabriel Staples Over a year ago

Posted. This is my first question on that site: codereview.stackexchange.com/questions/255344/….


Andrei Suvorov · Accepted Answer · 2015-12-27 03:36:42Z

2

If your library doesn't provide one, you can get an idea of how to implement an efficient version from here.

It uses a table for all 256 chars.

In that table, every char other than an upper case letter maps to its own ASCII code;
each upper case letter maps to the code of its lower case equivalent.

Then we just need to traverse the strings and compare the table entries for each pair of chars:

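// unsigned char keeps every index in 0..UCHAR_MAX; a signed char could index negatively
const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);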
return (cm[*us1] - cm[*--us2]);
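The snippet above assumes that a charmap table and the parameters s1 and s2 already exist. A self-contained sketch, assuming the table is filled once from tolower() for whatever locale is active when charmap_init() runs (charmap_init() and strcasecmp_table() are hypothetical names):

```c
#include <ctype.h>
#include <limits.h>

// One entry per possible byte value: the byte's lower case form.
static unsigned char charmap[UCHAR_MAX + 1];

// Must be called once (and again after changing LC_CTYPE) before comparing.
void charmap_init(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        charmap[i] = (unsigned char)tolower(i);
}

int strcasecmp_table(const char *s1, const char *s2)
{
    // Unsigned pointers keep every table index within 0..UCHAR_MAX.
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;
    while (charmap[*us1] == charmap[*us2]) {
        if (*us1 == '\0')
            return 0;   // both strings ended at the same position
        us1++;
        us2++;
    }
    return charmap[*us1] - charmap[*us2];
}
```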

answered Dec 27, 2015 at 3:36

Andrei Suvorov


Comments

ericcurtin · Accepted Answer · 2019-12-14 11:09:35Z

1

Simple solution:

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}

edited Dec 14, 2019 at 11:09

answered Dec 14, 2019 at 10:33

ericcurtin


Comments

Dominik Weber · Accepted Answer · 2024-11-20 22:58:25Z

1

Aspects that need consideration:

Encoding of strings (MBCS / UTF-8)
Traits / locale for comparison (string collation): do diacritics matter? Also see: https://learn.microsoft.com/en-us/windows/win32/intl/sort-order-identifiers

Now, if it's just UK postcodes (NW3 6SG), then, well, |0x20 for A-Z and a byte compare will do. But does the space matter? You might have some special rules.

And for prefix searches, one has to check the function used (e.g. stricmp) for what happens when one string is shorter than the other.

The sad reality is that there are MANY string comparisons that can be used, but the edge cases are complex.

TL;DR: just roll your own, depending on your needs.
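For the UK-postcode case mentioned above, here is a sketch that assumes spaces should not matter and folds only 'A' to 'Z' by setting bit 0x20. Restricting the |0x20 trick to letters avoids merging pairs such as '@' and '`', or '\0' and ' '. postcode_cmp() is a hypothetical name, and an ASCII-compatible character set is assumed.

```c
// Fold only 'A'-'Z' to lower case by setting bit 0x20; other bytes are unchanged.
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 0x20) : c;
}

// Compare two postcodes, skipping spaces and ignoring the case of A-Z.
// Returns <0, 0 or >0 like strcmp().
int postcode_cmp(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;;) {
        while (*ua == ' ')
            ua++;
        while (*ub == ' ')
            ub++;
        int d = fold_ascii(*ua) - fold_ascii(*ub);
        if (d != 0 || *ua == '\0')
            return d;
        ua++;
        ub++;
    }
}
```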

answered Nov 20, 2024 at 22:58

Dominik Weber


Comments

The Oathman · Accepted Answer · 2022-02-23 20:48:35Z

0

If we have null-terminated strings:

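#include <ctype.h>
#include <stdbool.h>

bool striseq(const char *s1, const char *s2) {
    for (; *s1; ) {
        // Cast to unsigned char: tolower() is undefined for negative char values.
        if (tolower((unsigned char)*s1++) != tolower((unsigned char)*s2++))
            return false;
    }
    return *s1 == *s2;   // true only if s2 ended at the same point
}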

or with this version that uses bitwise operations:

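int striseq(const char *s1, const char *s2) {
    // Caution: |32 also merges non-letters, e.g. '@' with '`' and '\0' with ' '
    // (the latter can make the loop read past the end of s2).
    for (; *s1; )
        if ((*s1++ | 32) != (*s2++ | 32))
            return 0;
    return *s1 == *s2;
}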

I'm not sure whether this works with symbols; I haven't tested that, but it works fine with letters.

edited Feb 23, 2022 at 20:48

answered Feb 22, 2022 at 18:11

The Oathman


1 Comment

Andrew Henle Over a year ago

This sure seems like it would fail if s1 is longer than s2.

jaldk · Accepted Answer · 2016-01-22 11:40:37Z

-1

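#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Returns a newly allocated, lower-cased copy of a; the caller must free() it.
char* lowerCaseWord(const char* a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1);   // +1 for the null terminator
    if (b == NULL)
        abort();
    for (size_t i = 0; i < len; i++)
    {
        b[i] = (char)tolower((unsigned char)a[i]);
    }
    b[len] = '\0';
    return b;
}

int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    int result = strcmp(la, lb);
    free(la);
    free(lb);
    return result;
}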

good luck

Edit: the lowerCaseWord function takes a char* and returns a lower case copy of it. For example, for "AbCdE" it returns "abcde".

Basically, strcmpInsensitive converts both char* strings to lower case and then calls strcmp on the results.

For example, if we call strcmpInsensitive with "AbCdE" and "ABCDE", both values are first converted to lower case ("abcde"), and then strcmp compares them.

edited Jan 22, 2016 at 11:40

answered Jan 21, 2016 at 21:51

jaldk


5 Comments

davejal Over a year ago

some explanation could go a long way

T.S Over a year ago

It seems wholly inefficient to lower both input strings, when the function "might" return as soon as after the first character compare instead. e.g. "ABcDe" vs "BcdEF", could return very quickly, without needing to lower or upper anything other than the first character of each string.

Ruud van Gaal Over a year ago

Not to mention leaking memory twice.

sth Over a year ago

You don't null-terminate your lower case strings, so the subsequent strcmp() might crash the program.

Stefan Vorkoetter Over a year ago

You also compute strlen(a) a total of strlen(a)+1 times. That together with the loop itself and you're traversing a strlen(a)+2 times.

smamran · Accepted Answer · 2016-02-14 10:17:06Z

-1

static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {

        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}

Reference

answered Feb 14, 2016 at 10:17

smamran


2 Comments

user966939 Over a year ago

The ORing idea is kind of nifty, but the logic is flawed. For example, ignoreCaseComp("`", "@", 1) and, perhaps more importantly, ignoreCaseComp("\0", " ", 1) (i.e. where all bits other than bit 5 (decimal 32) are identical) both evaluate to 0 (match).

Andrew Henle Over a year ago

Short version of the above comment: this code is broken
