Returning the length of a char array in C

Question

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end I want to return the length of the new char array. I am coming from Java, so I apologize if I'm making mistakes that seem simple. I have the following code:

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built in c functions to analyze it */
    int i;

if(isspace(buf[0])){
    buf[0] = "";
}
if(isspace(buf[len-1])){
    buf[len-1] = "";
}

    for(i = 0;i < len;i++){
        if(isupper(buf[i])) {
            buf[i]=tolower(buf[i]);
        }
        if(isspace(buf[i])) {
            buf[i]=" ";
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            buf[i]="";
        }
    }

    return strlen(*buf);


}

How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?

EDIT: I have made some corrections to my program based on the comments. Is it correct now?

/* The normalize procedure normalizes a character array of size len 
   according to the following rules:
     1) turn all upper case letters into lower case ones
     2) turn any white-space character into a space character and, 
        shrink any n>1 consecutive whitespace characters to exactly 1 whitespace

     When the procedure returns, the character array buf contains the newly 
     normalized string and the return value is the new length of the normalized string.

*/
int
normalize(unsigned char *buf,   /* The character array contains the string to be normalized*/
                    int len     /* the size of the original character array */)
{
    /* use a for loop to cycle through each character and the built in c functions to analyze it */
    int i = 0;
    int j = 0;

    if(isspace(buf[0])){
        //buf[0] = "";
        i++;
    }
    if(isspace(buf[len-1])){
        //buf[len-1] = "";
        i++;
    }
    for(i;i < len;i++){
        if(isupper(buf[i])) {
            buf[j]=tolower(buf[i]);
            j++;
        }
        if(isspace(buf[i])) {
            buf[j]=' ';
            j++;
        }
        if(isspace(buf[i]) && isspace(buf[i+1])){
            //buf[i]="";
            i++;
        }
    }

    return strlen(buf);


}

Use ' ' instead of " ". Also, return strlen(*buf) should be return strlen(buf).
– VoidStar, Commented Feb 18, 2014 at 19:22
In C, a string ends with '\0'. Using len implies buf is an array of unsigned char, not necessarily a string. Choose one.
– chux, Commented Feb 18, 2014 at 19:43
@Lesha 1) Is buf an array of unsigned char with size len (potentially with embedded '\0'), or is it a C string, that is, a "sequence of characters terminated by and including the first null character"? 2) Is len the size of the array or the length of the string (which does not include the '\0')?
– chux, Commented Feb 18, 2014 at 19:59

cmaster - reinstate monica · Accepted Answer · 2014-02-18 21:26:02Z

1

The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:

int normalizeString(char* buf, int len) {
    int readPosition, writePosition;
    bool hadWhitespace = false;
    for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
        if(isspace(buf[readPosition])) {
            if(!hadWhitespace) buf[writePosition++] = ' ';
            hadWhitespace = true;
        } else if(...) {
            ...
        }
    }
    return writePosition;
}

Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:

int normalizeString(char* string) {    //No length is passed, it is implicit in the null byte.
    char* in = string, *out = string;
    bool hadWhitespace = false;
    for(; *in; in++) {    //loop until the zero byte is encountered
        if(isspace(*in)) {
            if(!hadWhitespace) *out++ = ' ';
            hadWhitespace = true;
        } else if(...) {
            ...
        }
    }
    *out = 0;    //add a new zero byte
    return out - string;    //use pointer arithmetic to retrieve the new length
}

In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
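For reference, here is one way the elided branch could be filled in. This is only a sketch: the name normalize_cstring is made up for illustration, and it assumes that the only other rule is lower-casing, as described in the question. The casts to unsigned char are there because the <ctype.h> functions are only defined for values representable as unsigned char (or EOF), and plain char may be signed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; normalizes a null-terminated string in place.
   Returns the new length, not counting the terminating null byte. */
size_t normalize_cstring(char *string) {
    const char *in = string;   /* read position */
    char *out = string;        /* write position, never ahead of `in` */
    bool hadWhitespace = false;

    for (; *in; in++) {
        /* <ctype.h> functions need a value representable as unsigned char */
        unsigned char c = (unsigned char)*in;
        if (isspace(c)) {
            if (!hadWhitespace) *out++ = ' ';  /* first of a run: emit one space */
            hadWhitespace = true;
        } else {
            *out++ = (char)tolower(c);         /* unchanged unless upper case */
            hadWhitespace = false;             /* the run of whitespace has ended */
        }
    }
    *out = '\0';                               /* terminate the shortened string */
    return (size_t)(out - string);
}
```

Because the write position never overtakes the read position, the string can be rewritten in place without a second buffer. The argument must be a modifiable array (for example char s[] = "Some  Text";), not a string literal.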

edited Feb 18, 2014 at 21:26

answered Feb 18, 2014 at 19:43

cmaster - reinstate monica


4 Comments

Lesha Over a year ago

I have edited my initial post to include updated code. Is it effective now? Also someone pointed out that buf is an array of unsigned char rather than a string, does this change anything?

cmaster - reinstate monica Over a year ago

No, it is not; you are still incorrectly using strlen to deduce the size of the string. Even if the len parameter includes a terminating null byte, which is then copied to the correct position by your loop, returning the result of strlen changes the size semantics because it does not include the terminating null byte. Confusion and madness follow from this. Either you forget about null termination, and consequently forget about using any standard string handling functions, or you handle your string termination with null bytes correctly and forget about passing string length parameters.

Lesha Over a year ago

It was pointed out that buf is actually a char array and not a string. From what I understand, because it is not actually a string, buf does not have this terminating null byte. Am I making the wrong assumption here? And does this mean that I cannot use strlen?

cmaster - reinstate monica Over a year ago

Yes, if there is no terminating null byte, then strlen may return virtually any garbage, because it simply searches for the null byte. The behavior will depend entirely on whatever is stored after your string, memory that you don't own. The strlen call might even crash. I don't know what you think about when you say that buf is a character array. If you mean that it lacks null termination, I gave you the answer. However, calling it a character array is imprecise, it is still technically a pointer to the first character in an array, and all C-strings are stored in such character arrays.
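To make the difference concrete, here is a small sketch. The helper name bounded_length is hypothetical. It assumes the caller knows the size of the array, and it uses memchr so that, unlike strlen, it never reads beyond the bytes it was told about.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the data in a buffer of `size` bytes,
   stopping at the first null byte if there is one. Unlike strlen, it
   never reads past buf[size - 1]. */
size_t bounded_length(const char *buf, size_t size) {
    const char *end = memchr(buf, '\0', size);
    return end ? (size_t)(end - buf) : size;
}
```

For char text[] = "abc"; the array holds four bytes (sizeof text is 4) while strlen(text) is 3, because the terminating null byte is stored but not counted. For char raw[3] = {'a', 'b', 'c'}; there is no terminator, so strlen(raw) is undefined behavior, whereas bounded_length(raw, sizeof raw) simply reports 3.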

hobbs · Accepted Answer · 2014-02-18 19:24:31Z

1

if(isspace(buf[i])) {
    buf[i]=" ";
}

This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.

if(isspace(buf[i]) && isspace(buf[i+1])){
    buf[i]="";
}

This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
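As a rough sketch of the memmove approach (the helper name remove_char_at is made up for illustration, and it assumes s is a null-terminated string with pos less than its length):

```c
#include <string.h>

/* Hypothetical helper: removes the character at index `pos` from the
   null-terminated string `s` by shifting everything after it one place
   to the left. Assumes pos < strlen(s). */
void remove_char_at(char *s, size_t pos) {
    size_t len = strlen(s);
    /* Move the tail including the terminating null byte:
       (len - pos - 1) characters plus 1 for the '\0'. */
    memmove(s + pos, s + pos + 1, len - pos);
}
```

memmove is used rather than memcpy because the source and destination regions overlap. Each call shifts the whole tail of the string, so removing many characters this way costs more than a single pass with separate read and write indices.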

return strlen(*buf);

This would be return strlen(buf). *buf is a character, not a string.

answered Feb 18, 2014 at 19:24

hobbs


1 Comment

Lesha Over a year ago

Could you elaborate on how memmove could be used in this case? I looked it up, but I don't really understand how to use it for my purpose.

Jonathan Leffler · Accepted Answer · 2014-02-18 19:29:10Z

1

The notations like:

 buf[i]=" ";
 buf[i]="";

do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.

Warning: untested code.

int i, j;
for (i = 0, j = 0; i < len; i++)
{
    if (isupper(buf[i]))
        buf[j++] = tolower(buf[i]);
    else if (isspace(buf[i]))
    {
        buf[j++] = ' ';
        while (i+1 < len && isspace(buf[i+1]))
            i++;
    }
    else
        buf[j++] = buf[i];
}
buf[j] = '\0';  // Null terminate

You replace the arbitrary white space with a plain space using:

buf[i] = ' ';

You return:

return strlen(buf);

or, with the code above:

return j;

edited Feb 18, 2014 at 19:29

answered Feb 18, 2014 at 19:21

Jonathan Leffler


3 Comments

chux Over a year ago

The buf[j] = '\0' may write 1 past the memory of buf. OP is ambiguous on this calling len the "size of the original character array".

Jonathan Leffler Over a year ago

I don't think so if the input is a null-terminated string of the given length (so buf[len] == '\0'). If it is not a null-terminated string, then null termination is wrong -- period. But this is C, not Java, and null termination is the norm and the length as described is also the norm.

chux Over a year ago

I agree, provided that "input is a null-terminated string of the given length" is valid. OP used "size", unsigned char, and "array", as well as a function whose parameters include the size/length, all of which hinted that this was a byte array and not a C string. OP also used "string", which muddied the waters. Hence I asked the OP for clarification.

barak manos · Accepted Answer · 2014-02-18 19:51:42Z

Several mistakes in your code:

1. You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
2. You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
3. In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.

Here is one optional solution for the problem at hand:

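#include <ctype.h>

int normalize(char* buf)
{
    int i = 0;  /* read index */
    int j = 0;  /* write index */
    while (buf[i] != 0)
    {
        unsigned char c = (unsigned char)buf[i++];
        if (isspace(c))
        {
            buf[j++] = ' ';  /* emit one space for the whole run */
            while (isspace((unsigned char)buf[i]))
                i++;         /* skip the rest of the run; stops at the terminating 0 */
        }
        else
            buf[j++] = (char)tolower(c);  /* tolower leaves other characters unchanged */
    }
    buf[j] = 0;
    return j;
}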

Ben Reser · Accepted Answer · 2014-02-18 19:42:40Z

0

You should write:

return strlen(buf)

instead of:

return strlen(*buf)

The reason:

buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.

*buf will dereference the pointer, resulting in a char, which is not what strlen expects.

edited Feb 18, 2014 at 19:42

Ben Reser


answered Feb 18, 2014 at 19:25

elyashiv


chux · Accepted Answer · 2014-02-18 20:28:37Z

Not much different than the others, but this assumes buf is an array of unsigned char and not a C string.

tolower() does not itself need the isupper() test.

int normalize(unsigned char *buf, int len) {
  int i = 0;
  int j = 0;
  int previous_is_space = 0;
  while (i < len) {
    if (isspace(buf[i])) {
      if (!previous_is_space) {
        buf[j++] = ' ';
      }
      previous_is_space = 1;
    } else {
      buf[j++] = tolower(buf[i]);
      previous_is_space = 0;
    }
    i++;
  }
  return j;
}

@OP:
The posted code implies that leading and trailing whitespace should either be shrunk to a single space or removed entirely.
The answer above simply shrinks leading and trailing whitespace to one ' '. To remove leading and trailing whitespace instead:

int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...
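Putting the two fragments together, a complete version might look like the sketch below. The name normalize_trimmed is hypothetical, and, as in the code above, buf is treated as an array of len bytes, so no null terminator is read or written.

```c
#include <ctype.h>

/* Sketch only: the name normalize_trimmed is hypothetical. Treats buf as
   a byte array of len bytes (no null terminator is read or written). */
int normalize_trimmed(unsigned char *buf, int len) {
    int i = 0;
    int j = 0;
    int previous_is_space = 0;

    while (len > 0 && isspace(buf[len - 1])) len--;  /* drop trailing whitespace */
    while (i < len && isspace(buf[i])) i++;          /* skip leading whitespace */

    while (i < len) {
        if (isspace(buf[i])) {
            if (!previous_is_space) buf[j++] = ' ';  /* one space per run */
            previous_is_space = 1;
        } else {
            buf[j++] = (unsigned char)tolower(buf[i]);
            previous_is_space = 0;
        }
        i++;
    }
    return j;  /* number of valid bytes now at the start of buf */
}
```

The caller has to use the return value as the new length. Bytes beyond that index still hold leftover data from the original array.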
