I have some text from a JSON file. I applied UTF-8 encoding to it, but the conversion does not handle non-ASCII characters such as àèìòù, or their uppercase forms, correctly. Is there a way to clean up my string?

My function:

func stringToUTF8String (stringaDaConvertire stringa: String) -> String {
    let encodedData = stringa.dataUsingEncoding(NSUTF8StringEncoding)!
    let attributedOptions = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
    let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
    //println(attributedString.string)
    return attributedString.string
}
  • What byte output is String giving you? What would you expect? Also I'm not sure your insertion of the non-standard character into StackOverflow went correctly. Commented Jan 8, 2015 at 21:46
  • Please show a (short) input string demonstrating the problem together with the actual output and the expected output. Commented Jan 8, 2015 at 21:54

1 Answer

I've found a solution.

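UTF-8 stores accented characters such as àèìòù as multi-byte sequences, and NSAttributedString does not seem to decode those bytes as UTF-8 when it is given HTML data with no declared encoding. Encoding the string as UTF-16 instead avoids the problem, so the fix is simply to change my function to: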

func stringToUTF16String (stringaDaConvertire stringa: String) -> String {
    let encodedData = stringa.dataUsingEncoding(NSUTF16StringEncoding)!
    let attributedOptions = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
    let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
    //println(attributedString.string)
    return attributedString.string
}
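
To illustrate why the UTF-8 version can garble accented characters, here is a minimal sketch in current Swift syntax (not the Swift 1.x used above). It assumes the HTML importer falls back to a single-byte encoding when none is declared; that fallback is a guess used for illustration, not documented behavior, and ISO Latin-1 merely stands in for it.

```swift
import Foundation

let original = "àèìòù"

// UTF-8 encodes each of these five accented letters as two bytes.
let utf8Data = original.data(using: .utf8)!
print(utf8Data.count) // 10

// Assumption: the importer reads the bytes with a single-byte encoding.
// Every byte then becomes its own character, which produces mojibake.
let misdecoded = String(data: utf8Data, encoding: .isoLatin1)
print(misdecoded ?? "decoding failed")

// Decoding with the encoding that was actually used restores the text.
let roundTripped = String(data: utf8Data, encoding: .utf8)
print(roundTripped == original) // true
```

The data itself is fine in both cases; only the encoding used to read it back differs.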

3 Comments

Yes, this works, but I still don't know why the characters are not handled correctly with NSUTF8StringEncoding. In my case, I verified that my file is stored as UTF-8, so encodedData should contain the right content. My guess is that NSAttributedString assumes UTF-16, which is after all NSString's native representation. The documentation is not clear about this, though.
I had the same problem and worked out that it must be due to NSAttributedString. The documentation never specifies what encoding the data parameter should have, but our results suggest that it expects NSUTF16StringEncoding by default. Internally it probably decodes with that.
NSString is represented internally using UTF-16, so that default would make sense. That said, you can add the NSCharacterEncodingDocumentAttribute key to options, with NSUTF8StringEncoding as its value, to match the incoming data.
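
The last comment's suggestion can be sketched as follows. This is written in current Swift syntax, where the option key is spelled `.characterEncoding`, and it assumes an Apple platform with UIKit or AppKit available. The helper name `plainText(fromHTML:)` is hypothetical and not part of any framework.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

#if canImport(UIKit) || canImport(AppKit)
/// Hypothetical helper: converts an HTML string to plain text,
/// declaring explicitly that the data handed to the importer is UTF-8.
func plainText(fromHTML html: String) -> String? {
    guard let data = html.data(using: .utf8) else { return nil }
    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        // Tell the importer which encoding the bytes use.
        .characterEncoding: String.Encoding.utf8.rawValue
    ]
    // The initializer throws on malformed input; nil is returned in that case.
    return try? NSAttributedString(data: data,
                                   options: options,
                                   documentAttributes: nil).string
}
#endif
```

With the encoding declared, the data can stay in UTF-8 rather than being re-encoded as UTF-16 just to satisfy the importer.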
