How to count the number of different types of characters in a file using C

Question

The file may contain any characters: digits, letters, and symbols such as :;@ etc. One method is to use a switch statement, as shown below, but that is going to be a simple yet long process. Is there a shorter method?

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>   /* strerror */

int main(void) {
    FILE *fp;
    fp = fopen("input.txt", "r");
    int ch, count[36] = {0};
    if (fp == NULL)
    {
        fprintf(stderr,
                "Failed to open input.txt: %s\n",
                strerror(errno));
    }
    else
    {
        /* ch is an int so that EOF can be told apart from real bytes */
        while ((ch = fgetc(fp)) != EOF)
        {
            switch (ch)
            {
            case 'a':
                count[0]++;
                break;
            case 'b':
                count[1]++;
                break;
            default:
                /* everything that is neither 'a' nor 'b' */
                count[2]++;
            }
        }

        fclose(fp);
    }
    printf("count a is %d\n", count[0]);
    printf("count b is %d\n", count[1]);
    printf("count of other characters is %d\n", count[2]);
    return 0;
}

Make the array 256 elements big; then it's a simple count[ch]++ and you can count ANY byte in the file.
– Marc B, Commented Jun 14, 2013 at 21:05

ouah · Accepted Answer · 2013-06-14 21:15:08Z

5

In ASCII, printable characters have codes from 0x20 to 0x7E, so there are fewer than 128 of them. For ASCII, just use an array of 128 counters:

int count[128] = {0};

Update your count with:

count[ch]++;

and print printable characters with something like this:

for (int i = 0x20; i <= 0x7E; i++)
{
    printf("count %c is %d\n", i, count[i]);
}
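Putting the pieces of this answer together, a minimal sketch might look like the following. The helper names `count_ascii` and `print_printable` and the `ASCII_SIZE` macro are made up for illustration, and the sketch assumes that bytes above 0x7F should simply be skipped rather than counted.

```c
#include <stdio.h>

#define ASCII_SIZE 128  /* 7-bit ASCII codes 0x00..0x7F (illustrative name) */

/* Tally every 7-bit ASCII byte read from fp into count[].
 * Returns the number of bytes outside the ASCII range (0x80..0xFF),
 * which are skipped so the array is never indexed out of bounds. */
long count_ascii(FILE *fp, long count[ASCII_SIZE])
{
    long skipped = 0;
    int ch;  /* int, not char, so EOF stays distinguishable */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch < ASCII_SIZE)
            count[ch]++;
        else
            skipped++;
    }
    return skipped;
}

/* Print one line per printable ASCII character (0x20..0x7E) that occurred. */
void print_printable(const long count[ASCII_SIZE])
{
    for (int i = 0x20; i <= 0x7E; i++) {
        if (count[i] > 0)
            printf("count '%c' is %ld\n", i, count[i]);
    }
}
```

To use it, open the file with fopen, declare `long count[ASCII_SIZE] = {0};`, call `count_ascii(fp, count)`, and then `print_printable(count)`.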

edited Jun 14, 2013 at 21:15

answered Jun 14, 2013 at 21:08

ouah


3 Comments

Hut8 Over a year ago

That's dangerous because extended ASCII can cause you to walk off the end of your array. So better to make it 256 rather than 128.

ouah Over a year ago

@LaceCard: if (ch <= 0x7E) count[ch]++;

Sqeaky Over a year ago

Presuming ASCII, this would seem to work, but on a UTF-8 file the counts would be off.

user1944441 · Accepted Answer · 2013-06-14 21:05:17Z

3

Use an array of size 2^8 (256) and increment the corresponding element.

while ((ch = fgetc(fp)) != EOF)
{
    characters[ ch ] += 1 ;
....

Each index into the array characters corresponds to a character code in the ASCII table.
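As a rough, self-contained sketch of this idea: the function names `byte_histogram` and `distinct_bytes` and the `NUM_BYTE_VALUES` macro are hypothetical, and the sketch counts raw bytes, not multi-byte characters.

```c
#include <limits.h>
#include <stdio.h>

#define NUM_BYTE_VALUES (UCHAR_MAX + 1)  /* 256 on platforms with 8-bit bytes */

/* Build a histogram of every byte value in fp.
 * fgetc() returns each byte as an unsigned char converted to int,
 * so every non-EOF result is a valid index into the array.
 * Returns the total number of bytes read. */
unsigned long byte_histogram(FILE *fp, unsigned long characters[NUM_BYTE_VALUES])
{
    unsigned long total = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        characters[ch] += 1;
        total++;
    }
    return total;
}

/* Count how many distinct byte values occurred at least once. */
int distinct_bytes(const unsigned long characters[NUM_BYTE_VALUES])
{
    int distinct = 0;

    for (int i = 0; i < NUM_BYTE_VALUES; i++) {
        if (characters[i] != 0)
            distinct++;
    }
    return distinct;
}
```

Because the array covers every value fgetc can return other than EOF, no range check is needed before indexing.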

answered Jun 14, 2013 at 21:05

user1944441

4 Comments

Sqeaky Over a year ago

This works for 8-bit ASCII, but what about UTF-8? He did say types of characters, not bytes.

Will Over a year ago

I used a method similar to this when I made a toy C program to count the frequency of different byte values on a given block device. It was mainly an exercise in using Apple Objective C blocks with C but you might find gist.github.com/iwillspeak/4055319#file-bytes-c-L48 of interest.

user1944441 Over a year ago

@Sqeaky fgetc() can read UTF-8? It returns an unsigned char converted to int. You should use fgetwc() for a wide char.

Sqeaky Over a year ago

UTF-8 is variable width (as few as 1 byte and as many as 4 bytes represent a single character), so there is no direct formula relating the number of chars/bytes to the number of characters. Despite this, your code seems like it would work in practice.
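To illustrate the byte versus character distinction raised in these comments, here is one possible sketch that counts UTF-8 code points rather than bytes. It assumes the input is valid UTF-8, the function name `count_utf8_code_points` is made up for this example, and a code point is not always the same thing as a character as a user perceives it (combining marks, for instance, are separate code points).

```c
#include <stddef.h>
#include <stdio.h>

/* Count UTF-8 code points in fp, assuming the input is valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx; every other byte
 * starts a new code point, so counting those gives the code point total.
 * If bytes is non-NULL, the raw byte count is stored there as well. */
size_t count_utf8_code_points(FILE *fp, size_t *bytes)
{
    size_t code_points = 0;
    size_t total_bytes = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        total_bytes++;
        if ((ch & 0xC0) != 0x80)  /* not a continuation byte */
            code_points++;
    }
    if (bytes != NULL)
        *bytes = total_bytes;
    return code_points;
}
```

For pure ASCII input the two totals are equal; they diverge as soon as the file contains multi-byte sequences.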

Ajax · Accepted Answer · 2013-06-14 21:10:28Z

1

If you are reading ASCII characters:

frequency[ch]++;

where frequency is an integer array of size 128.

answered Jun 14, 2013 at 21:10

Ajax


Forest Kunecke · Accepted Answer · 2013-06-14 21:11:45Z

1

If you use the functions from <ctype.h> (isalpha, isdigit, ispunct, etc.) in a series of if statements inside your while loop, you can categorize the characters fairly easily.

PS: for a list of these functions, see:

http://www.cplusplus.com/reference/cctype/
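A minimal sketch of that approach could look like this. The struct `char_classes`, its field names, and the function `classify_chars` are hypothetical. The classification follows the current C locale, and the order of the tests decides which category wins when a character could match more than one.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical tally structure; the field names are illustrative. */
struct char_classes {
    long letters;
    long digits;
    long punctuation;
    long whitespace;
    long other;
};

/* Classify every byte read from fp with the <ctype.h> predicates.
 * fgetc() yields values in the unsigned char range, which is what
 * the is*() functions require as their argument. */
struct char_classes classify_chars(FILE *fp)
{
    struct char_classes c = {0};
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        if (isalpha(ch))
            c.letters++;
        else if (isdigit(ch))
            c.digits++;
        else if (ispunct(ch))
            c.punctuation++;
        else if (isspace(ch))
            c.whitespace++;
        else
            c.other++;
    }
    return c;
}
```

This gives counts per category rather than per individual character, so it complements the array-based answers instead of replacing them.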

answered Jun 14, 2013 at 21:11

Forest Kunecke
