
I have a problem with umlauts when converting an NSString to const char *.

This method parses a text file of words (line by line) and saves the words as strings in NSArray *results. Each word is then converted to a const char * (tmpConstChars). In this const char *, an 'ä', for example, comes out as '√§'. How do I convert an NSString to const char *? I thought this was correct.

- (void)inputWordsByFile:(NSString *)path
{

    NSError *error = [[NSError alloc] init];
    NSString *content = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&error];
    NSArray *results = [content componentsSeparatedByString:@"\n"];

    NSMutableArray *words = [[NSMutableArray alloc] initWithArray:results];
    [words removeLastObject];
    for (int i = 0; i < [words count]; i++) {
        const char *tmpConstChars = [[words objectAtIndex:i] UTF8String];
        [self addWordToTree:tmpConstChars];
    }
}
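To see what UTF8String actually hands to addWordToTree:, it can help to dump the bytes as numbers instead of printing them with %s. The following is a minimal sketch in plain C, which also compiles as Objective-C. print_utf8_bytes is a hypothetical helper, not part of Foundation. The word is written as "z\xC3\xA4hlen" so the example does not depend on the source file's encoding. In the method above you would pass tmpConstChars instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print every byte of a NUL-terminated string as an
   unsigned decimal value, comma-separated, and return the number of bytes. */
size_t print_utf8_bytes(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        printf("%s%u", count == 0 ? "" : ", ", (unsigned)*p);
        count++;
    }
    printf("\n");
    return count;
}
```

For "zählen" this prints 122, 195, 164, 104, 108, 101, 110 and returns 7. That is seven bytes for six characters, because ä takes two bytes in UTF-8.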
  • There's no need to alloc/init the error object; NSError *error = nil; is enough, since the method creates one on failure. Commented Jun 17, 2011 at 11:08
  • The problem could be with how you are viewing the text. Are you sure you're viewing it in something that is UTF-8 aware, i.e. configured to interpret UTF-8 encoded text? Commented Jun 17, 2011 at 11:13
  • Is the Xcode console a UTF-8 interpreter? Commented Jun 17, 2011 at 11:16
  • NSLog(@"%s", tmpConstChars + 2); gives me "§hlen" for "zählen"; I expected "hlen". Commented Jun 17, 2011 at 11:22
  • What does NSLog(@"%s",tmpConstChars); give? Commented Jun 17, 2011 at 11:42
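The expectation of "hlen" assumes that tmpConstChars + 2 skips two characters, but pointer arithmetic skips bytes. Below is a minimal plain-C sketch that skips by characters instead. It assumes valid UTF-8, and utf8_skip_chars is a hypothetical helper name. It counts Unicode code points, so a decomposed ä (an a followed by a combining diaeresis) would count as two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to the n-th character (0-based) of a
   NUL-terminated UTF-8 string, or to the terminating NUL if there are fewer
   characters. UTF-8 continuation bytes look like 10xxxxxx, so every byte that
   does not match that pattern starts a new character. Assumes valid UTF-8. */
const char *utf8_skip_chars(const char *s, size_t n)
{
    assert(s != NULL);
    while (*s != '\0' && n > 0) {
        s++; /* step past the lead byte */
        while (((unsigned char)*s & 0xC0) == 0x80)
            s++; /* step past its continuation bytes */
        n--;
    }
    return s;
}
```

Given s = "z\xC3\xA4hlen" ("zählen"), utf8_skip_chars(s, 2) points at "hlen", whereas s + 2 points at the lone byte 0xA4 that printed as §.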

1 Answer


Unless I am mistaken, the UTF8String method returns the string's UTF-8 encoded bytes. For zählen, these are:

$ perl -MEncode -Mutf8 -E 'say join ", ", map ord, split //, encode("utf8", "zählen")'
122, 195, 164, 104, 108, 101, 110

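…where <195, 164> is the UTF-8 encoding sequence for ä. Thus, tmpConstChars + 2 points at byte 164 (0xA4). That byte is only the second half of ä's encoding, not a character on its own, and probably not what you want. NSLog's %s interprets C strings in the system default encoding (apparently Mac Roman here), which is why the pair 195, 164 prints as √§ and the lone 164 as §. Aren't you really after unichars? The characterAtIndex: method returns those, albeit one at a time: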

NSString *test = @"zählen";
unichar c = [test characterAtIndex:1];
NSLog(@"---> %C", c); // ---> ä
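characterAtIndex: does the UTF-8 decoding for you. For illustration only, here is roughly what that decoding involves, as a minimal plain-C sketch. utf8_next and utf8_decode are hypothetical names, and the decoder does not reject overlong or malformed sequences. For ä, the two bytes 195, 164 combine into the single value 228 (U+00E4). That is also the unichar that characterAtIndex: returns for a precomposed ä, because in the Basic Multilingual Plane a UTF-16 code unit equals the code point.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at *s into a code point and advance *s past it.
   Sketch only: handles 1- to 4-byte sequences, no validation of bad input. */
static unsigned long utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    unsigned long cp;
    size_t extra;

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; } /* ASCII */
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; } /* 110xxxxx */
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; } /* 1110xxxx */
    else                  { cp = p[0] & 0x07; extra = 3; } /* 11110xxx */

    p++;
    /* Append 6 payload bits from each continuation byte (10xxxxxx). */
    for (size_t i = 0; i < extra && (*p & 0xC0) == 0x80; i++, p++)
        cp = (cp << 6) | (*p & 0x3F);

    *s = p;
    return cp;
}

/* Decode a NUL-terminated UTF-8 string into at most max code points.
   Returns how many code points were written to out. */
size_t utf8_decode(const char *str, unsigned long *out, size_t max)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 0;
    assert(str != NULL && out != NULL);
    while (*s != '\0' && n < max)
        out[n++] = utf8_next(&s);
    return n;
}
```

Decoding "z\xC3\xA4hlen" yields six code points: 122, 228, 104, 108, 101, 110.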

3 Comments

unichar tmpConstChars = [[words objectAtIndex:i] characterAtIndex:0];
You must be doing something wrong; it works for me. I’ll update the answer with sample code.
@malteriechmann, what are you doing with tmpConstChars to get the wrong character? Remember, a unichar is 2 bytes long, so you can't read tmpConstChars[2] and get ä. %C is the correct format specifier for a unichar. If you expect ä to be the single byte 228 (0xE4), you need to encode the string with NSISOLatin1StringEncoding (for example with cStringUsingEncoding:). But then you'll be limited to the Latin-1 character set.
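Encoding to Latin-1 can be sketched in plain C. Every code point up to U+00FF becomes one byte with the same value, and anything else cannot be represented. In Objective-C you would use cStringUsingEncoding:NSISOLatin1StringEncoding rather than code like this. utf8_to_latin1 is a hypothetical helper, and it assumes valid UTF-8 input. ä is U+00E4, so it becomes the single byte 0xE4 (228). A character such as € (U+20AC) makes the conversion fail, much as cStringUsingEncoding: returns NULL when it cannot convert losslessly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to ISO-8859-1
   (Latin-1) in out, a buffer of outsize bytes. Returns 0 on success, or -1
   if a character lies above U+00FF or out is too small (out is then left
   unterminated). Assumes valid UTF-8; no validation is done. */
int utf8_to_latin1(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    assert(in != NULL && out != NULL && outsize > 0);

    while (*p != '\0') {
        unsigned long cp;
        if (*p < 0x80) {
            cp = *p++; /* ASCII maps to itself */
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* 2-byte sequence: U+0080..U+07FF */
            cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else {
            return -1; /* 3- and 4-byte sequences are all above U+00FF */
        }
        if (cp > 0xFF || n + 1 >= outsize)
            return -1; /* not representable, or no room for byte + NUL */
        out[n++] = (char)cp;
    }
    out[n] = '\0';
    return 0;
}
```

Converting "z\xC3\xA4hlen" gives the six Latin-1 bytes 122, 228, 104, 108, 101, 110.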
