Well, sorry about the confusing title, but I'm having a slightly annoying problem with character encoding in C#.NET.

I have a bunch of classes generated from WSDL files. These classes have methods that take string parameters, which are then submitted to a remote web service. This remote web service expects all text input to be UTF-8 encoded. Now, as far as I can tell, there really isn't a way to make a string in C#.NET UTF-8 encoded; it's UTF-16 or nothing. If I want UTF-8 I have to make it a byte[], right?

So, my big question is: how am I supposed to put my raw UTF-8 byte[] data into a string so I can actually submit it to the web service? I mean, sure, I could probably fall back on C-style code, looping through the whole thing byte by byte, but surely Microsoft must have thought about this when designing the language and API? (Although since my Vista laptop thinks it's perfectly alright to use UTF-16 internally, cp1252 for some stuff, UTF-8 for some other stuff and cp850(!) for yet other stuff, I wouldn't be too surprised if they didn't.)

So, am I stuck doing things the ugly way or is there some hidden System.Text.EncodeStuffTherightWay.EncodeStringAsUTF8(string) method deep in the bowels of .NET?

  • Encoding is just a form of representation. It's like an implementation detail for something implementing the "unicode" interface. Is there a specific reason you need to use UTF-8? Optimization (prevent UTF-8 => UTF-16 => UTF-8)? Commented Sep 13, 2010 at 8:10
  • It depends how you are connecting to the service, but unless this is at a very low level, I'd be very surprised if this is an issue that you need concern yourself with. Commented Sep 13, 2010 at 8:11
  • Well, the external service only allows certain characters and it must be UTF-8 encoded. And since the methods I call to access this service want a string variable (generated from WSDL files which change from time to time so I don't want to mess with these classes) then I need to figure out a way to put UTF-8-encoded text into a string variable. Commented Sep 13, 2010 at 8:13
  • Are you using WCF? If so you can just set the textEncoding attribute on the binding. See: msdn.microsoft.com/en-us/library/ms731361.aspx Commented Sep 13, 2010 at 8:14
  • Greg: I'm just using a bunch of classes generated with wsdl.exe so no WCF (Also, this is .NET 2.0 and IIRC WCF isn't even available for .NET versions < 3). Commented Sep 13, 2010 at 8:17

1 Answer

Strings never contain anything utf-* or anything else encoded; that isn't their job. They are strings - groups of character/code-point data. The byte[] that you have is the encoded form.
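
To make that distinction concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows that the same string has one length in characters but different byte counts depending on which encoding is applied when it is turned into bytes:

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of .NET: compares the in-memory length of a
// string with the size of its encoded forms.
public static class EncodingSizes
{
    // Number of UTF-16 code units the string holds in memory.
    public static int CodeUnits(string text)
    {
        return text.Length;
    }

    // Number of bytes produced when the string is encoded as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of bytes produced when the string is encoded as UTF-16 (little endian).
    public static int Utf16ByteCount(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

For "h\u00E9llo" (h, e-acute, l, l, o) this gives 5 code units, 6 bytes as UTF-8 and 10 bytes as UTF-16. The string itself is the same in every case; only the byte[] has an encoding.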

In almost any scenario I can think of, the transport etc. should be doing this for you already. If it isn't, then that sounds like a bug in either the wsdl or the web-service stack itself.

Keep in mind that wsdl itself just has xs:string - if that isn't sufficient (i.e. that in combination with the handshake isn't enough), then it simply isn't a web-service string.

The alternative is to throw it around as a byte[], and encode manually via

byte[] bytes = Encoding.UTF8.GetBytes(yourString);
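
As a rough sketch of that manual route (the wrapper class and method names are hypothetical; only Encoding.UTF8.GetBytes and Encoding.UTF8.GetString are actual .NET calls), encoding and decoding would look something like this:

```csharp
using System;
using System.Text;

// Hypothetical wrapper for illustration only.
public static class Utf8RoundTrip
{
    // string -> UTF-8 encoded bytes, e.g. for a raw stream or socket.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // UTF-8 encoded bytes -> string. The result is an ordinary .NET string
    // again; it is not "UTF-8 inside a string".
    public static string Decode(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

Encoding "\u00E9" (e-acute) yields the two bytes 0xC3 0xA9, and decoding those bytes gives back the original one-character string. Copying the raw bytes into a string would not make the string UTF-8; it would just produce different characters.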