So I'm running this code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    system("chcp 1252 > nul"); // makes system accept latin characters
    int i = 0;
    for(i = 0; i < 256; i++){
        printf("%i:\t%c\n", i, (char)i);
    }
    return 0;
}

This printed all the characters of the extended ASCII table to the console window.

I am now using Linux and hoped to achieve the same result. I am aware that Linux does not use the extended ASCII table, so I have made sure the file is encoded as UTF-8, which has worked for me before. I am using Code::Blocks, which runs console applications in GNOME Terminal, also set to UTF-8. But my output is not what I expected:

33: !
34: "
35: #
36: $
37: %
38: &
39: '
40: (
41: )
42: *
43: +
...
69: E
70: F
71: G
72: H
73: I
...
103:    g
104:    h
105:    i
106:    j
107:    k
108:    l
...
127:    
128:    �
129:    �
...    
254:    �
255:    �

What am I missing here? There has to be a way to do it. I have tried many solutions so far, one of them being:

...
#include <locale.h>

int main()
{
    setlocale(LC_ALL,"portuguese");
    ...
}

but so far this has been to no avail. Any help is appreciated.

Edit 1: OK! I got UTF-8 encoded characters to print to the terminal, but printing to a file is not working as I expected. I am using wchar.h and locale.h like this:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(){
    setlocale(LC_ALL,"");

    wint_t index = 0;

    FILE* fpinout = fopen("UTF-8.txt","w");
    for(index = 0; index < 0x200; index++){
        printf("%i:\t%lc\n", index, index); //works fine, prints utf-8 chars to terminal
        fprintf(fpinout,"%i\t%lc", index, index); //does not work, output is weird
    }
    fclose(fpinout);
}

I tried declaring index both as wint_t and as wchar_t. My UTF-8.txt file looks like this:

र㄀ĉल㌂̉ऴ㔄ԉश㜆܉स㤈उ〱ਉㄱଉ㈱ఉ㌱ഉ㐱ฉ㔱༉㘱ဉ㜱ᄉ㠱ሉ㤱ጉ〲ᐉㄲᔉ㈲ᘉ㌲ᜉ㐲᠉㔲ᤉ㘲ᨉ㜲ᬉ㠲ᰉ㤲ᴉ〳ḉㄳἉ㈳ ㌳℉㐳∉㔳⌉㘳␉㜳
┉㠳☉㤳✉〴⠉ㄴ⤉㈴⨉㌴⬉㐴Ⰹ㔴ⴉ㘴⸉㜴⼉㠴〉㤴ㄉ〵㈉ㄵ㌉㈵㐉㌵㔉㐵㘉㔵㜉㘵㠉㜵㤉㠵㨉㤵㬉〶㰉ㄶ㴉㈶㸉㌶㼉㐶䀉㔶䄉㘶䈉
㜶䌉㠶䐉㤶䔉〷䘉ㄷ䜉㈷䠉㌷䤉㐷䨉㔷䬉㘷䰉㜷䴉㠷三㤷伉〸倉ㄸ儉㈸刉㌸匉㐸吉㔸唉㘸嘉㜸圉㠸堉㤸変〹娉ㄹ嬉㈹尉㌹崉㐹帉
㔹弉㘹怉㜹愉㠹戉㤹按〱रㅤ㄰攉〱लㅦ㌰有〱ऴㅨ㔰椉〱शㅪ㜰欉〱सㅬ㤰洉ㄱरㅮㄱ漉ㄱलㅰ㌱焉ㄱऴㅲ㔱猉ㄱशㅴ㜱甉ㄱसㅶ㤱眉
㈱रㅸㄲ礉㈱लㅺ㌲笉㈱ऴㅼ㔲紉㈱शㅾ㜲缉㈱स胂㈱ह臂㌱र苂㌱ऱ菂㌱ल蓂㌱ळ藂㌱ऴ蛂㌱व蟂㌱श裂㌱ष观㌱स諂㌱ह诂㐱र賂㐱ऱ跂㐱ल軂㐱
ळ迂㐱ऴ郂㐱व釂㐱श鋂㐱ष鏂㐱स铂㐱ह闂㔱र雂㔱ऱ韂㔱ल飂㔱ळ駂㔱ऴ髂㔱व鯂㔱श鳂㔱ष鷂㔱स黂㔱ह鿂㘱रꃂ㘱ऱꇂ㘱लꋂ㘱ळꏂ㘱ऴ꓂
㘱वꗂ㘱शꛂ㘱षꟂ㘱सꣂ㘱ह꧂㜱रꫂ㜱ऱꯂ㜱ल곂㜱ळ귂㜱ऴ껂㜱व꿂㜱श냂㜱ष뇂㜱स닂㜱ह돂㠱र듂㠱ऱ뗂㠱ल뛂㠱ळ럂㠱ऴ룂㠱व맂㠱श뫂
㠱ष믂㠱स볂㠱ह뷂㤱र뻂㤱ऱ뿂㤱ल胃㤱ळ臃㤱ऴ苃㤱व菃㤱श蓃㤱ष藃㤱स蛃㤱ह蟃〲र裃〲ऱ觃〲ल諃〲ळ诃〲ऴ賃〲व跃〲श軃〲ष迃〲स郃〲ह
釃ㄲर鋃ㄲऱ鏃ㄲल铃ㄲळ闃ㄲऴ雃ㄲव韃ㄲश飃ㄲष駃ㄲस髃ㄲह鯃㈲र鳃㈲ऱ鷃㈲ल黃㈲ळ鿃㈲ऴꃃ㈲वꇃ㈲शꋃ㈲षꏃ㈲स꓃㈲हꗃ㌲रꛃ㌲ऱꟃ㌲
लꣃ㌲ळ꧃㌲ऴ꫃㌲वꯃ㌲श곃㌲ष귃㌲स껃㌲ह꿃㐲र냃㐲ऱ뇃㐲ल닃㐲ळ돃㐲ऴ듃㐲व뗃㐲श뛃㐲ष럃㐲स룃㐲ह맃㔲र뫃㔲ऱ믃㔲ल볃㔲ळ뷃㔲ऴ뻃
㔲व뿃 

Any help is appreciated.

  • You might try running the program this way from the terminal prompt: (chcp 1253; mypgm;). This sets the code page for the duration of mypgm's execution. Because both commands run inside a subshell, the code page reverts to normal when the subshell exits. Commented Oct 24, 2014 at 0:26
  • The file's encoding is being misdetected by your text editor (gedit?). It actually has the correct contents, and other text editors should show the correct content. Commented Jul 20, 2015 at 17:27

1 Answer

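printf's %c conversion writes exactly one byte, so it cannot produce UTF-8 output beyond ASCII. UTF-8 is single-byte only for the first 128 characters, which map to ASCII. Every character after that takes several bytes, so a lone byte in the range 128 to 255 is not a valid UTF-8 sequence and the terminal shows the replacement character (�) instead.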

See this answer for a method of generating non-ASCII characters using wide characters:

How to iterate through unicode characters and print them on the screen with printf in C?
