NSLog() vs printf() when printing C string (UTF-8)

Question

I have noticed that if I print a byte array containing the UTF-8 representation of a string using the format specifier "%s", printf() gets it right but NSLog() garbles it: each byte is rendered as a separate character, so for example "¥" is printed as the two characters "¬•". This is curious, because I always thought that NSLog() is just printf(), plus:

The first parameter (the 'format') is an Objective-C string, not a C string (hence the "@").
The timestamp and app name prepended.
The newline automatically added at the end.
The ability to print Objective-C objects (using the format "%@").

My code:

NSString* string;

// (...fill string with unicode string...)

const char* stringBytes = [string cStringUsingEncoding:NSUTF8Encoding];

NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8Encoding];
stringByteLength += 1; // add room for '\0' terminator

// calloc() takes the element count first, then the element size
char* buffer = calloc(stringByteLength, sizeof(char));

// stringBytes is '\0'-terminated, so this copies the terminator as well
memcpy(buffer, stringBytes, stringByteLength);

NSLog(@"Buffer after copy: %s", buffer);
// (garbled: each byte is rendered as a separate character)

printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. Japanese text)

free(buffer);

Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and whether this behavior is documented anywhere? I couldn't find anything.

Of course, the practical solution would be to just NSLog the UTF-8 string as an NSString, not as a C string. But in this particular case, I wanted to check that the char buffer had been copied correctly.
– Nicolas Miari, Commented May 15, 2014 at 7:57

Martin R · Accepted Answer · 2014-05-15 08:30:58Z

NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example "Mac Roman" on my computer):

NSString *string = @"¥";
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);

// Output: ¥

Of course, this will fail if some characters are not representable in the system encoding. I could not find any official documentation for this behavior, but one can see that using %s in stringWithFormat: or NSLog() does not work reliably with arbitrary UTF-8 strings.
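
The "¬•" output from the question fits this explanation: the UTF-8 encoding of "¥" (U+00A5) is the two bytes 0xC2 0xA5, and read one byte at a time as Mac Roman those are "¬" and "•". The following plain C sketch (it should also compile as Objective-C) reproduces that effect. It is only an illustration, not how NSLog() is implemented: the function names mac_roman_byte_to_utf8 and decode_as_mac_roman are made up here, and the lookup table is deliberately partial, covering ASCII plus the two bytes needed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the UTF-8 text for one byte interpreted as
   Mac Roman. Only ASCII and the two bytes needed for this example are
   mapped; every other high byte falls back to "?". */
const char *mac_roman_byte_to_utf8(unsigned char byte, char ascii[2])
{
    if (byte < 0x80) {            /* Mac Roman agrees with ASCII below 0x80 */
        ascii[0] = (char)byte;
        ascii[1] = '\0';
        return ascii;
    }
    switch (byte) {
    case 0xA5: return "\xE2\x80\xA2"; /* U+2022 BULLET */
    case 0xC2: return "\xC2\xAC";     /* U+00AC NOT SIGN */
    default:   return "?";            /* not covered by this partial table */
    }
}

/* Decodes `bytes` one byte at a time as Mac Roman and writes the result to
   `out` as UTF-8. Returns 0 on success, -1 if `out` is too small. */
int decode_as_mac_roman(const char *bytes, char *out, size_t out_size)
{
    assert(bytes != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)bytes; *p != '\0'; p++) {
        char ascii[2];
        const char *piece = mac_roman_byte_to_utf8(*p, ascii);
        size_t n = strlen(piece);
        if (used + n + 1 > out_size) {
            return -1;
        }
        memcpy(out + used, piece, n + 1); /* copies the terminator too */
        used += n;
    }
    return 0;
}
```

Calling decode_as_mac_roman("\xC2\xA5", out, sizeof out) with a 16-byte `out` leaves the UTF-8 bytes for "¬•" in `out`, which matches the garbled output described in the question.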

If you want to check the contents of a char buffer containing a UTF-8 string, then this would work with arbitrary characters (using the boxed expression syntax to create an NSString from a UTF-8 string):

NSLog(@"%@", @(utf8Buffer));
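
If the goal is only to confirm that the bytes were copied correctly, a hex dump avoids any encoding interpretation at all. Below is a minimal sketch in plain C (it should also compile as Objective-C); hex_dump_cstring is a hypothetical helper name introduced for this example, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of `buf` (up to, but not including,
   the '\0' terminator) into `out` as space-separated uppercase hex pairs,
   e.g. "C2 A5". Returns the number of bytes dumped, or -1 if `out` is too
   small to hold the result. */
int hex_dump_cstring(const char *buf, char *out, size_t out_size)
{
    assert(buf != NULL && out != NULL && out_size > 0);
    size_t len = strlen(buf);
    /* Two hex digits per byte, one space between bytes, one '\0' at the end. */
    size_t needed = (len == 0) ? 1 : len * 3;
    if (out_size < needed) {
        return -1;
    }
    char *p = out;
    *p = '\0';
    for (size_t i = 0; i < len; i++) {
        if (i > 0) {
            *p++ = ' ';
        }
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        p += snprintf(p, 3, "%02X", (unsigned int)(unsigned char)buf[i]);
    }
    return (int)len;
}
```

For a buffer holding the UTF-8 bytes of "¥", this fills `out` with "C2 A5". Because that result is pure ASCII, it can be passed to NSLog() with %s regardless of the system encoding.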

edited May 15, 2014 at 8:30

answered May 15, 2014 at 8:23


2 Comments

Nicolas Miari Over a year ago

Oh, neat trick... I didn't know that syntax. Kind of like @(1), @[a, b, c], @{@"key":value}, etc.

Nicolas Miari Over a year ago

BTW, I was testing on an iOS device; I haven't tried it on the Mac yet.
