I have noticed that if I print a byte array containing the UTF-8 representation of a string using the format specifier "%s", printf() gets it right but NSLog() garbles it (each byte is rendered as its own character, so for example "¥" is printed as the two characters "¬•"). This is curious, because I always thought that NSLog() was just printf(), plus:

  1. The first parameter (the 'format') is an Objective-C string, not a C string (hence the "@").
  2. The timestamp and app name prepended.
  3. The newline automatically added at the end.
  4. The ability to print Objective-C objects (using the format "%@").

My code:

NSString *string;

// (...fill string with a Unicode string...)

const char *stringBytes = [string cStringUsingEncoding:NSUTF8Encoding];

NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8Encoding];
stringByteLength += 1; // add room for the '\0' terminator

// calloc takes (count, size) and zero-fills the allocation
char *buffer = calloc(stringByteLength, sizeof(char));

// copies the UTF-8 bytes together with the '\0' terminator
memcpy(buffer, stringBytes, stringByteLength);

NSLog(@"Buffer after copy: %s", buffer);
// (garbled: each byte is rendered as a separate character)

printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. Japanese text)

Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and whether this behavior is documented anywhere? (I couldn't find anything.)

  • Of course, the practical solution would be to just NSLog a UTF-8 string as an NSString, not as a C string. But in this particular case, I wanted to check that the char buffer had been copied correctly. Commented May 15, 2014 at 7:57

1 Answer

NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example "Mac Roman" on my computer):

NSString *string = @"¥";
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);

// Output: ¥

Of course this will fail if some characters are not representable in the system encoding. I could not find official documentation for this behavior, but one can see that using %s in stringWithFormat: or NSLog() does not work reliably with arbitrary UTF-8 strings.
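To make the garbling concrete, here is a minimal sketch in plain C of what seems to happen to the bytes of "¥" when they are read as Mac Roman instead of UTF-8. It assumes the system encoding is Mac Roman (as on my computer; it may differ elsewhere). The function names are hypothetical, and the lookup is deliberately partial: it maps only the two bytes from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately partial lookup: returns the UTF-8 spelling of a
   non-ASCII Mac Roman byte. Only the two bytes from the question are mapped. */
const char *mac_roman_high_byte_to_utf8(unsigned char byte) {
    switch (byte) {
        case 0xC2: return "\xC2\xAC";      /* Mac Roman 0xC2 is U+00AC (not sign) */
        case 0xA5: return "\xE2\x80\xA2";  /* Mac Roman 0xA5 is U+2022 (bullet) */
        default:   return "?";             /* not covered by this sketch */
    }
}

/* Decodes a NUL-terminated byte string one byte at a time, as a single-byte
   encoding such as Mac Roman would, and writes the result to `out` as UTF-8. */
void decode_as_mac_roman(const char *bytes, char *out, size_t out_size) {
    assert(bytes != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)bytes; *p != '\0'; ++p) {
        char ascii[2] = { (char)*p, '\0' };  /* bytes below 0x80 map to themselves */
        const char *piece = (*p < 0x80) ? ascii : mac_roman_high_byte_to_utf8(*p);
        size_t len = strlen(piece);
        if (used + len + 1 > out_size) break;  /* stop rather than overflow */
        memcpy(out + used, piece, len + 1);
        used += len;
    }
}
```

Calling decode_as_mac_roman("\xC2\xA5", out, sizeof out) with the two UTF-8 bytes of "¥" leaves the UTF-8 text for "¬•" in out, which matches the output described in the question.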

If you want to check the contents of a char buffer containing a UTF-8 string, then this works with arbitrary characters (using the boxed expression syntax to create an NSString from a UTF-8 string):

NSLog(@"%@", @(utf8Buffer));
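If the goal is only to confirm that the bytes were copied correctly, another option that does not depend on any encoding is to dump the buffer as hex. The following is a rough sketch in plain C; hex_dump_bytes is a hypothetical helper name, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes each byte of a NUL-terminated buffer into `out`
   as two uppercase hex digits, separated by single spaces.
   Returns the number of bytes that were dumped. */
size_t hex_dump_bytes(const char *buffer, char *out, size_t out_size) {
    assert(buffer != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    size_t count = 0;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)buffer; *p != '\0'; ++p) {
        /* two hex digits, an optional leading space, and the terminator */
        size_t needed = (count > 0 ? 3 : 2) + 1;
        if (used + needed > out_size) break;  /* stop rather than overflow */
        used += (size_t)snprintf(out + used, out_size - used,
                                 count > 0 ? " %02X" : "%02X", (unsigned)*p);
        ++count;
    }
    return count;
}
```

For the two UTF-8 bytes of "¥" this produces the text C2 A5. Because the output is pure ASCII, it should be safe to pass to NSLog() with %s whatever the system encoding is.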

2 Comments

Oh, neat trick... I didn't know that syntax. Kind of like @(1), @[a, b, c], @{@"key":value}, etc.
BTW I was testing on iOS device; haven't tried on the Mac yet.
