4

How do you remove spaces and special characters from a string?

I couldn't find a single answer while googling. There were a lot related to other languages, but not C. Most of them mentioned the use of regex, which isn't part of standard C (?).

Removing a simple space is easy:

 char str[50] = "Remove The Spaces!!";

Then a simple loop with an if-statement:

if (str[i] != ' ')

Output would be:

RemoveTheSpaces!!
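For reference, a minimal in-place version of that loop might look like the sketch below. The function name `remove_spaces` is hypothetical. It keeps a separate write index, so each kept character is copied down over the removed ones:

```c
#include <stddef.h>

/* Remove every ' ' from s in place (hypothetical helper name). */
void remove_spaces(char *s)
{
    size_t read, write = 0;

    for (read = 0; s[read] != '\0'; read++) {
        if (s[read] != ' ')          /* no ';' after the if, or it does nothing */
            s[write++] = s[read];    /* copy the kept character down */
    }
    s[write] = '\0';                 /* terminate the shortened string */
}
```

With `char str[50] = "Remove The Spaces!!";`, calling `remove_spaces(str)` leaves `RemoveTheSpaces!!` in `str`.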

What do I add to the if-statement so it would recognize special characters and remove them?

My definition of special characters:

Characters not included in this list: 
A-Z a-z 0-9
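That definition can be written directly as a predicate. In the sketch below, the names `is_plain_alnum` and `is_plain_alnum_ctype` are hypothetical. The explicit ranges assume an ASCII-compatible character set: C guarantees that '0' to '9' are contiguous, but not the letters. In the default "C" locale, `isalnum()` from <ctype.h> accepts exactly the same characters; in other locales it may also accept letters such as 'é'.

```c
#include <ctype.h>

/* 1 if c is A-Z, a-z or 0-9 (the question's "non-special" set), else 0.
   Assumes contiguous, ASCII-style letter codes. */
int is_plain_alnum(int c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* Same test via <ctype.h>; an exact match only in the "C" locale.
   The cast keeps negative char values out of isalnum(), which is undefined for them. */
int is_plain_alnum_ctype(char ch)
{
    return isalnum((unsigned char)ch) != 0;
}
```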
1
  • String handling in C isn't always fun. Think of strings as just char arrays. You can replace an a with a b, but there's no plain, simple way to remove the character at a given index from the array, so you'd still end up with a hole. However, if it's only for printing, you could just iterate over the array and, if a character is not in the ASCII ranges for a-z, A-Z or 0-9, skip it and go to the next character. IMO that's often the easiest thing to do when possible. Otherwise you need to copy to a new buffer. Commented Mar 16, 2013 at 1:40
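An array element can't be deleted, but the "hole" can be closed by shifting the rest of the string, including the terminator, one position to the left with memmove(). In the sketch below, the helper name `remove_at` is hypothetical. Calling it once for every unwanted character is quadratic in the worst case. That is one reason the answers below copy through a separate write index instead.

```c
#include <string.h>

/* Remove the character at index idx from s by shifting the tail left.
   Assumes idx < strlen(s). */
void remove_at(char *s, size_t idx)
{
    /* strlen(s + idx) bytes starting at idx + 1 include the terminating '\0' */
    memmove(s + idx, s + idx + 1, strlen(s + idx));
}
```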

7 Answers

8

This is probably not the most efficient way of achieving this, but it will get the job done fairly fast.

Note: this code does require you to include <string.h> and <ctype.h>

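#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[50] = "Remove The Spaces!!";
    char strStripped[50];

    size_t i, c = 0; /*I'm assuming you're not using C99+*/
    size_t len = strlen(str); /* compute the length once, not on every iteration */

    for (i = 0; i < len; i++)
    {
        /* the cast matters: isalnum() is undefined for negative char values */
        if (isalnum((unsigned char)str[i]))
        {
            strStripped[c] = str[i];
            c++;
        }
    }
    strStripped[c] = '\0'; /* terminate the stripped copy */

    printf("%s\n", strStripped); /* prints RemoveTheSpaces */
    return 0;
}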

7 Comments

You forgot the NUL termination of strStripped: strStripped[c]='\0'; goes after the loop.
If you are assuming pre-C99, then // style comments are not supported either.
This would indeed work, thank you! But is there a more character specific way to do this? In case I would like to save some special characters in the string.
No really, single quotes :)
Note that the use of strlen() in the loop condition like that leads to bad (quadratic) performance instead of linear performance. Use int len = strlen(str); and then test len in the loop condition.
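On keeping some special characters: one approach is to keep a character if it is alphanumeric or appears in a caller-supplied string of extra characters to preserve, checked with strchr(). The helper name `strip_except` is hypothetical; this is a sketch, not the answerer's code.

```c
#include <ctype.h>
#include <string.h>

/* Keep only alphanumerics and the characters listed in `extra`; works in place. */
void strip_except(char *s, const char *extra)
{
    size_t read, write = 0;

    for (read = 0; s[read] != '\0'; read++) {
        unsigned char ch = (unsigned char)s[read];
        /* ch is never '\0' here, so strchr() cannot match extra's terminator */
        if (isalnum(ch) || strchr(extra, ch) != NULL)
            s[write++] = s[read];
    }
    s[write] = '\0';
}
```

For example, `strip_except(str, "!")` turns "Remove The Spaces!!" into "RemoveTheSpaces!!", while `strip_except(str, "")` behaves like the answer above.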
1

Using your if statement:

if (str[i] != ' ')

With a little logic (the characters have to be in the range a-z, A-Z or 0-9):

if ( !('a' <= str[i] && 'z' >= str[i]) &&
     !('A' <= str[i] && 'Z' >= str[i]) &&
     !('0' <= str[i] && '9' >= str[i]))
    continue; /* not in a-z, A-Z or 0-9: ignore this character */

2 Comments

You know you could simplify the logic by removing the ! operators and replacing && with ||. You already negated the expression :)
That's true lol... I just wrote it in a way that's natural for me to understand it. For some reason I like and's better than or's... I'm probably just weird.
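The suggestion above is De Morgan's law: !A && !B && !C equals !(A || B || C). Dropping the outer negation therefore gives the positive "keep" test, with the branches swapped. The small sketch below checks that equivalence; both function names are hypothetical.

```c
/* Original form: true when the character should be ignored. */
int should_ignore(char ch)
{
    return !('a' <= ch && 'z' >= ch) &&
           !('A' <= ch && 'Z' >= ch) &&
           !('0' <= ch && '9' >= ch);
}

/* De Morgan form: true when the character should be kept. */
int should_keep(char ch)
{
    return ('a' <= ch && 'z' >= ch) ||
           ('A' <= ch && 'Z' >= ch) ||
           ('0' <= ch && '9' >= ch);
}
```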
1

There are millions of different ways this can be done. Here is just one example that uses no additional storage and removes the unneeded characters "in-place":

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

static void my_strip(char *data)
{
    unsigned long i = 0; /* Scanning index */
    unsigned long x = 0; /* Write back index */
    char c;

    /*
     * Store each next character in `c` and stop at '\0', because
     * '\0' marks the end of the string and reading past it would
     * be undefined behavior.
     * The "scanning" index is incremented right away, so that the
     * next iteration reads the next character.
     */
    while ((c = data[i++]) != '\0') {
        /* Check if the character is alphabetic or numeric. The cast
           matters: isalnum() is undefined for negative char values. */
        if (isalnum((unsigned char)c)) {
            /*
             * OK, this is what we need. Write it back.
             * Note that `x` will always be either the same as `i`
             * or less. After writing, increment `x` so that next
             * time we do not overwrite the previous result.
             */
            data[x++] = c;
        }
        /* Otherwise this is something we don't need, so `x` stays
           where it is while `i` moves on. */
    }
    /* After all is done, ensure we terminate the string with '\0'. */
    data[x] = '\0';
}

int main(void)
{
    /* This is the array we will be operating on. */
    char data[512];

    /* Ask your customer for a string. */
    printf("Please enter a string: ");

    if (fgets(data, sizeof(data), stdin) == NULL) {
        /* Something unexpected happened. */
        return EXIT_FAILURE;
    }

    /* Show the customer what we read (just in case :-)) */
    printf("You have entered: %s", data);

    /*
     * Call the magic function that removes everything and leaves
     * only alphabetic and numeric characters.
     */
    my_strip(data);

    /*
     * Print the end result. The newline that fgets() stored is not
     * alphanumeric, so my_strip() removed it; print one explicitly.
     */
    printf("Stripped string: %s\n", data);

    /* Our job is done! */
    return EXIT_SUCCESS;
}

I put a lot of comments in there, so hopefully the code needs no further explanation. Hope it helps. Good luck!
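If the set of characters to keep might change, the same read/write loop can take the test as a function pointer. In the sketch below, the name `strip_keep` is hypothetical. Any <ctype.h> classifier such as isalnum or isdigit fits the `int (*)(int)` parameter.

```c
#include <ctype.h>

/* Keep only the characters for which keep() returns nonzero; works in place. */
void strip_keep(char *s, int (*keep)(int))
{
    char *read = s, *write = s;

    for (; *read != '\0'; read++) {
        /* <ctype.h> functions require an unsigned char value (or EOF) */
        if (keep((unsigned char)*read))
            *write++ = *read;
    }
    *write = '\0';
}
```

`strip_keep(data, isalnum)` does the same job as my_strip() above, and `strip_keep(data, isdigit)` keeps only the digits.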


1

This is just a silly suggestion.

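#include <limits.h>   /* CHAR_MAX */

/* ordinary[c] is 1 for the characters to keep; every other entry is 0 */
static const char ordinary[CHAR_MAX] = {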
    ['A']=1,['B']=1,['C']=1,['D']=1,['E']=1,['F']=1,['G']=1,['H']=1,['I']=1,
    ['J']=1,['K']=1,['L']=1,['M']=1,['N']=1,['O']=1,['P']=1,['Q']=1,['R']=1,
    ['S']=1,['T']=1,['U']=1,['V']=1,['W']=1,['X']=1,['Y']=1,['Z']=1,

    ['a']=1,['b']=1,['c']=1,['d']=1,['e']=1,['f']=1,['g']=1,['h']=1,['i']=1,
    ['j']=1,['k']=1,['l']=1,['m']=1,['n']=1,['o']=1,['p']=1,['q']=1,['r']=1,
    ['s']=1,['t']=1,['u']=1,['v']=1,['w']=1,['x']=1,['y']=1,['z']=1,

    ['0']=1,['1']=1,['2']=1,['3']=1,['4']=1,['5']=1,['6']=1,['7']=1,['8']=1,
    ['9']=1,
};

int is_special (int c) {
    if (c < 0) return 1;
    if (c >= CHAR_MAX) return 1;
    return !ordinary[c];
}

void remove_spaces_and_specials_in_place (char *str) {
    if (str) {
        char *p = str;
        for (; *str; ++str) {
            if (!is_special(*str)) *p++ = *str;
        }
        *p = '\0';
    }
}

1 Comment

Good use of C99 designated initializers.
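A variation on the table idea, assuming a 256-entry (UCHAR_MAX + 1) table is acceptable: index the table by unsigned char, so negative char values and characters above CHAR_MAX need no special case, and fill it at run time so extra characters are easy to mark. The names `keep_table`, `init_keep_table` and `strip_with_table` are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char keep_table[UCHAR_MAX + 1];

/* Mark alphanumerics (exactly A-Z, a-z, 0-9 in the "C" locale) plus every char in extra. */
void init_keep_table(const char *extra)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++)
        keep_table[c] = isalnum(c) != 0;
    for (; *extra != '\0'; extra++)
        keep_table[(unsigned char)*extra] = 1;
}

/* Remove every character not marked in keep_table; works in place. */
void strip_with_table(char *s)
{
    char *p = s;

    for (; *s != '\0'; s++)
        if (keep_table[(unsigned char)*s])
            *p++ = *s;
    *p = '\0';
}
```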
1
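#include <stdio.h>

int main(void)
{
    int i = 0, j = 0;   /* i: write index, j: read index */
    char c;
    char buff[255] = "Remove The Spaces!!";

    /* Copy buff[j] down to buff[i]; the loop stops right after copying the '\0'. */
    for (; (c = buff[i] = buff[j]) != '\0'; j++) {
        /* Keep the character (advance i) only if it is A-Z, a-z or 0-9;
           otherwise the next copy overwrites it. */
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
            i++;
        }
    }

    printf("char buff[255] = \"%s\"\n", buff);
    return 0;
}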

5 Comments

Just a suggestion... This answer might be improved by adding comments to the code, and perhaps showing the output.
Now I see that when there is more than one special character in a row, the code leaves some of them in the result string: always the second of each pair in that sequence, because of the instruction "buff[i]=buff[++j];". That is a mistake, because it doesn't account for two or more special characters in a row. Also, the variable "i" should only be increased when the character at index "j" of the source is valid, instead of being increased always.
So, to correct the code: 1 - take out the instruction inside the else, that is, keep only the "if"; 2 - don't increase "i" at the end of each iteration (only "j"); 3 - increase "i" inside the "if", after the instruction "buff[i]=buff[j];", or replace that instruction with "buff[i++]=buff[j];". The result would be code very similar to what I wrote when editing the code written by Jonathan Leffler, except that in that version I forgot to include the source string's terminator in the condition of the "if", so that the terminator would be copied to the result string as a valid character.
I think this is the most efficient version of the code that removes special characters from a string. The only efficiency improvement that might be possible is replacing the test in the "for" loop ("c=buff[i]=buff[j];") with "c=buff[j];", to avoid unnecessary copies of special characters from index "j" to index "i", and then writing the null terminator at index "i" after the "for" is done. But I like more compact code, so I made it like that. Hope you like it.
Forget what I said just above this comment. That instruction is right as it is, because the copy of characters from index "j" to index "i" must happen anyway. The difference is that this way the code stays more compact than if I made the copies inside the "if". It can be slightly less efficient, because special characters are also copied even though "i" isn't increased in those cases. But it's also useful, because it writes the null terminator as part of the copying instead of after the "for" is done.
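For comparison, the variant described in the comments above might look like the sketch below. It tests with c = buff[j], copies only the kept characters, and writes the terminator after the loop. The function name `strip_copy_kept` is hypothetical.

```c
/* Read index j, write index i; only kept characters are copied. */
void strip_copy_kept(char *buff)
{
    int i = 0, j;
    char c;

    for (j = 0; (c = buff[j]) != '\0'; j++) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            buff[i++] = c;
    }
    buff[i] = '\0';   /* terminator written once, after the loop */
}
```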
1
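#include <stdio.h>

int main(void)
{
    char a[100];
    int i;

    printf("Enter a string: ");
    /* fgets() instead of gets(): gets() cannot limit the input length and was removed in C11 */
    if (fgets(a, sizeof a, stdin) == NULL)
        return 1;

    /* Print only a-z, A-Z and 0-9; skip everything else, including the newline fgets() keeps */
    for (i = 0; a[i] != '\0'; i++) {
        if ((a[i] >= 'a' && a[i] <= 'z') || (a[i] >= 'A' && a[i] <= 'Z')
             || (a[i] >= '0' && a[i] <= '9')) {
            printf("%c", a[i]);
        }
    }
    printf("\n");
    return 0;
}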
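Since this answer only prints the result, the array (and its 100-character limit) could be dropped altogether by filtering the input stream one character at a time. In the sketch below, the function name `filter_alnum` is hypothetical. It uses the same a-z, A-Z and 0-9 ranges as the answer.

```c
#include <stdio.h>

/* Copy from in to out, skipping everything except A-Z, a-z and 0-9.
   Reads until EOF, so there is no line-length limit. */
void filter_alnum(FILE *in, FILE *out)
{
    int c;   /* int, not char, so EOF can be told apart from real characters */

    while ((c = getc(in)) != EOF) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
            putc(c, out);
    }
}
```

Calling `filter_alnum(stdin, stdout)` from main() would replace the fgets() loop.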


0

These are the ASCII code ranges (character: decimal value):

'0': 48 to '9': 57
'A': 65 to 'Z': 90
'a': 97 to 'z': 122

Try this:

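#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[50] = "Remove The Spaces!!";
    size_t i, len = strlen(str); /* compute the length once, not on every iteration */

    for (i = 0; i < len; i++)
    {
        if ((str[i] >= 48 && str[i] <= 57) || (str[i] >= 65 && str[i] <= 90) || (str[i] >= 97 && str[i] <= 122))
            //This is equivalent to
            //if ((str[i] >= '0' && str[i] <= '9') || (str[i] >= 'A' && str[i] <= 'Z') || (str[i] >= 'a' && str[i] <= 'z'))
            printf("alphaNumeric:%c\n", str[i]);
        else
        {
            printf("special:%c\n", str[i]);
            //remove that (this branch only reports the character; nothing is removed yet)
        }
    }
    return 0;
}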
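The numeric codes in that table only hold on an ASCII-based execution character set. Note the gap from 91 to 96 ([ \ ] ^ _ `), which is why the upper-case and lower-case ranges must be tested separately. If you rely on the raw numbers, the sketch below makes that assumption explicit at compile time. It assumes a C11 compiler for _Static_assert, and the function name `is_ascii_alnum_code` is hypothetical.

```c
/* Compile-time checks that the codes used above match this platform (C11). */
_Static_assert('0' == 48 && '9' == 57, "digits are not ASCII-coded");
_Static_assert('A' == 65 && 'Z' == 90, "upper-case letters are not ASCII-coded");
_Static_assert('a' == 97 && 'z' == 122, "lower-case letters are not ASCII-coded");

/* The answer's test, written with the numeric codes. */
int is_ascii_alnum_code(int c)
{
    return (c >= 48 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122);
}
```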

