I'm currently using the following code to scan each word in a text file, put it into a variable, then do some manipulations with it before moving on to the next word. This works fine, but I'm trying to remove all characters that don't fall under A-Z / a-z, e.g. if "he5llo" was entered I want the output to be "hello". If I can't modify fscanf to do it, is there a way of doing it to the variable once scanned? Thanks.

while (fscanf(inputFile, "%s", x) == 1)
  • That fscanf has one big problem: it is a potential buffer overrun. You should always give a field width, for example fscanf(inputFile, "%99s", x) when you have char x[100]. Commented Apr 7, 2013 at 17:01

5 Answers

You can pass x to a function like this. First, a simple version for the sake of understanding:

// header needed for isalpha()
#include <ctype.h>

void condense_alpha_str(char *str) {
  int source = 0; // index of copy source
  int dest = 0; // index of copy destination

  // loop until original end of str reached
  while (str[source] != '\0') {
    if (isalpha((unsigned char)str[source])) { // cast: isalpha() needs an unsigned char value
      // keep only chars matching isalpha()
      str[dest] = str[source];
      ++dest;
    }
    ++source; // advance source always, whether char was copied or not
  }
  str[dest] = '\0'; // add new terminating 0 byte, in case string got shorter
}

It goes through the string in place, copying the chars that pass the isalpha() test and skipping, and thus removing, those that do not. To understand the code, it's important to realize that C strings are just char arrays, with byte value 0 marking the end of the string. Another important detail is that in C, arrays and pointers behave the same in many (not all!) ways, so a pointer can be indexed just like an array. Also, this simple version rewrites every kept byte in the string, even when the string doesn't actually change.
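To make the array/pointer remark concrete, here is a minimal sketch of the same loop written with pointers instead of indexes. The name condense_alpha_ptr is made up for this illustration and is not part of the answer's code; the behavior is meant to match condense_alpha_str().

```c
#include <ctype.h>

/* Hypothetical pointer-based twin of condense_alpha_str(). */
void condense_alpha_ptr(char *str) {
  const char *source = str; /* read position */
  char *dest = str;         /* write position, never ahead of source */

  while (*source != '\0') {
    /* cast: isalpha() expects an unsigned char value (or EOF) */
    if (isalpha((unsigned char)*source)) {
      *dest = *source; /* keep this char */
      ++dest;
    }
    ++source; /* always advance the read position */
  }
  *dest = '\0'; /* terminate the possibly shorter string */
}
```

Calling it on a writable array such as char word[] = "he5llo"; should leave "hello" in word. It must not be called on a string literal, because the function writes into its argument.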


Then a more full-featured version, which takes the filter function as a parameter, only writes to memory if str actually changes, and returns a pointer to str like most library string functions do:

char *condense_str(char *str, int (*filter)(int)) {

  int source = 0; // index of character to copy

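  // optimization: skip initial matching chars
  // (the '\0' check keeps a filter that accepts '\0' from running past the end)
  while (str[source] && filter((unsigned char)str[source])) {
    ++source;
  }
  // source is now index of first non-matching char or end-of-string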

  // optimization: only do condense loop if not at end of str yet
  if (str[source]) { // '\0' is same as false in C

    // start condensing the string from first non-matching char
    int dest = source; // index of copy destination
    do {
      if (filter((unsigned char)str[source])) {
        // keep only chars matching given filter function
        str[dest] = str[source];
        ++dest;
      }
      ++source; // advance source always, whether char was copied or not
    } while (str[source]);
    str[dest] = '\0'; // add terminating 0 byte to match condensed string

  }

  // follow convention of strcpy, strcat etc, and return the string
  return str;
}

Example filter function:

int isNotAlpha(int ch) { // int parameter, so it matches int (*filter)(int)
    return !isalpha(ch);
}

Example calls:

char sample[] = "1234abc";
condense_str(sample, isalpha); // use a library function from ctype.h
// note: return value ignored, it's just convenience not needed here
// sample is now "abc"
condense_str(sample, isNotAlpha); // use custom function
// sample is now "", empty

// fscanf code from question, with buffer overrun prevention
char x[100];
while (fscanf(inputFile, "%99s", x) == 1) {
  condense_str(x, isalpha); // x modified in-place
  ...
}

reference:

Read int isalpha ( int c ); manual:

Checks whether c is an alphabetic letter.
Return Value:
A value different from zero (i.e., true) if indeed c is an alphabetic letter. Zero (i.e., false) otherwise
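Two details of that manual entry are easy to miss: "different from zero" does not necessarily mean 1, and the argument must be representable as unsigned char (or be EOF). The following is a small sketch of both points; count_letters is a hypothetical helper introduced only for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the letters in a string. */
size_t count_letters(const char *s) {
  size_t n = 0;
  for (; *s != '\0'; ++s) {
    /* "!= 0" normalizes the result to 0 or 1, since isalpha() may
       return any nonzero value for a letter; the cast avoids passing
       a negative char value, which would be undefined behavior */
    n += (isalpha((unsigned char)*s) != 0);
  }
  return n;
}
```

In the default "C" locale only A-Z and a-z count as letters, so count_letters("he5llo") should give 5.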

13 Comments

@RandyHoward If you think it's wrong, suggest how one should respond instead. hyde doesn't know whether the OP is asking for homework or for self-learning purposes. hyde is just helping.
@hyde I would suggest always explaining your code, so that it helps the OP better.
Cheers for answering, although I don't fully understand the example you have given so I'll struggle to use it for my approach.
@user2254988 I modified the code to use index instead of pointer arithmetic. Is it any clearer now?
+1 - and a small set of changes makes this function much more general. Instead of hard-coding it to use isalpha(), pass it a pointer to a function (with the same prototype as isalpha() and other ctype.h character classification functions) and you can easily use this to filter on any class of characters, even a custom class of characters: compress_str( char* str, int (*filter)(int))
luser droog's answer will work, but in my opinion it is more complicated than necessary.

For your simple example you could try this:

while (fscanf(inputFile, "%99[A-Za-z]", x) == 1) {   // read a run of letters; width assumes char x[100]
   // note: each run arrives separately, so "he5llo" is read as "he", then "llo"
   fscanf(inputFile, "%*[^A-Za-z]");  // discard the non-letters that follow and continue
}
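Because each run of letters is read on its own, the pieces have to be joined again if the goal is a single cleaned-up word. Below is a rough sketch of that idea using sscanf on a string that has already been read; scanset_letters and its parameters are invented for this example, and the 99-character run width is an arbitrary choice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies only the A-Z / a-z characters of `in`
   into `out`, truncating if `out` (of size outsz) is too small. */
void scanset_letters(const char *in, char *out, size_t outsz) {
  size_t len = 0;
  if (outsz == 0) return;
  out[0] = '\0';

  while (*in != '\0') {
    char run[100];
    int used = 0; /* chars consumed by this sscanf call, set via %n */

    if (sscanf(in, "%99[A-Za-z]%n", run, &used) == 1) {
      size_t n = strlen(run);
      if (n > outsz - 1 - len) n = outsz - 1 - len; /* truncate to fit */
      memcpy(out + len, run, n);
      len += n;
      out[len] = '\0';
    } else {
      /* no letter here: skip the run of non-letters instead */
      sscanf(in, "%*[^A-Za-z]%n", &used);
    }
    in += used; /* one of the two patterns always consumes something */
  }
}
```

With char out[100]; scanset_letters("he5llo", out, sizeof out); the buffer should end up holding "hello". For this task the isalpha() loop is likely simpler; the sketch mainly shows how the two scansets alternate.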


You can use the isalpha() function to check each character contained in the string.


I'm working on a similar project so you're in good hands! Strip the word down into separate parts.

Blank spaces aren't an issue when you read each word with cin. You can use a check like

 if( !isPunct(x) )

Increase the index by 1, and add that new string to a temporary string holder. You can select characters in a string like an array, so finding those non-alpha characters and storing the new string is easy.

 string x = "hell5o";    // loop through until you find a non-alpha & mark that pos
 for( i = 0; i <= pos-1; i++ )
                                    // store the different parts of the string
 string tempLeft = ...    // make loops up to and after the position of non-alpha character
 string tempRight = ... 


The scanf family functions won't do this. You'll have to loop over the string and use isalpha to check each character. And "remove" the character with memmove by copying the end of the string forward.
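A minimal sketch of that loop-and-memmove approach, assuming the word is already in a writable buffer; the function name remove_non_alpha_memmove is made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: deletes every non-letter from str in place. */
void remove_non_alpha_memmove(char *str) {
  char *p = str;
  while (*p != '\0') {
    if (isalpha((unsigned char)*p)) {
      ++p; /* keep this char and move on */
    } else {
      /* shift the tail, including its '\0', one position to the left;
         p stays put so the char that moved in gets checked next */
      memmove(p, p + 1, strlen(p + 1) + 1);
    }
  }
}
```

memmove is used rather than memcpy because the source and destination ranges overlap. Each removal shifts the whole tail, so this is quadratic in the worst case, which should not matter for single words.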

Maybe scanf can do it after all. Under most circumstances, scanf and friends will push any non-whitespace characters back onto the input stream if they fail to match.

This example uses scanf as a regex-like filter on the stream. Using the * conversion modifier means there's no storage destination for the negated pattern; it just gets eaten.

#include <stdio.h>
#include <string.h>

int main(void){
    enum { BUF_SZ = 80 };   // buffer size in one place
    char buf[BUF_SZ] = "";
    char fmtfmt[] = "%%%d[A-Za-z]";  // format string for the format string
    char fmt[sizeof(fmtfmt) + 3];    // storage for the real format string
    char nfmt[] = "%*[^A-Za-z]";     // negated pattern

    char *p = buf;                               // initialize the pointer
    size_t room = BUF_SZ - 1;                    // chars that still fit, keeping one byte for '\0'
    while( room > 0 ){                           // a field width of 0 is not allowed
        sprintf(fmt, fmtfmt, (int)room);         // (re-)build the format string
        if( scanf(fmt,p) == EOF ) break;         // scan for format into buffer via pointer
        room -= strlen(p);                       // account for what was just stored
        p += strlen(p);                          // adjust pointer
        if( scanf(nfmt) == EOF ) break;          // scan for negated format
    }
    printf("%s\n",buf);
    return 0;
}
