I have a language that generally contains serialised data messages in a human-readable format, but some productions within the language contain verbatim raw, binary data.

My parser uses String for its buffer, since that seemed the easiest thing to work with. However, the data is read from a network socket into an array of Byte.

Now, I'm trying to connect the dots between Byte() and String:

' data as Byte()
' count as Integer
' buffer as String

buffer += System.Text.Encoding.ASCII.GetString(data, 0, count)

But my initial assumption that an ASCII encoding would just leave my bytes alone turned out to be invalid; any byte with a value that doesn't fit into the 7-bit model was translated into '?'.

So then I thought about using a single-byte "Unicode" encoding that should leave my bytes alone but also allow values throughout the 8-bit range:

' data as Byte()
' count as Integer
' buffer as String

Dim enc = New System.Text.UTF8Encoding
buffer += enc.GetString(data, 0, count)

But my data is still mangled. I haven't actually been able to deduce yet precisely how the data is being mangled, but I do know that the length of the data is changing, indicating that the bytes are not being left verbatim.

So how can I obtain a String whose contents are just a verbatim copy of the bytes from my Byte() input?

  • How did you encode the bytes in the first place? Commented Mar 14, 2012 at 16:11
  • @JaredPar: No text encoding. The bytes in question are binary. (Though the human-readable sections of the incoming data stream are ASCII.) I want to get a String from a Byte() whilst maintaining this encoding-agnosticism. Perhaps VB.NET doesn't support this? Commented Mar 14, 2012 at 16:13
  • You need to know a bit about the encoding in order to decode properly, so it can't be truly agnostic (unless you encode the encoding into the byte stream itself). It sounds like you're possibly looking past the human-readable content and into the non-readable portion. Do you have a format set for the Byte()? Commented Mar 14, 2012 at 16:15
  • @JaredPar: I'm absolutely looking into the non-readable portion, and I want to. That's why I want to maintain this encoding-agnosticism. I just want String to stop caring about encoding and be a nice automatically-resizing array of bytes for me. Commented Mar 14, 2012 at 16:16
  • You may want to look at List(Of Byte). It's the rough equivalent of std::vector<byte> and probably closer to what you're looking for. Commented Mar 14, 2012 at 16:24

1 Answer

Based on our comment discussion, it seems you want to work with the Byte values in the absence of any encoding. If that is the case, you should consider using List(Of Byte) instead of String.
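As a rough illustration of that suggestion, here is a minimal sketch of a List(Of Byte) buffer that accumulates the bytes read from the socket untouched, plus a naive byte-pattern search to stand in for the substring search that String would have provided. The names ByteBufferSketch, AppendBytes and IndexOfBytes are hypothetical helpers made up for this example, not part of the .NET library, and the sample bytes are invented.

```vbnet
Imports System
Imports System.Collections.Generic

Module ByteBufferSketch
    ' Hypothetical helper: append the first `count` bytes of `data` verbatim.
    Sub AppendBytes(buffer As List(Of Byte), data As Byte(), count As Integer)
        For i As Integer = 0 To count - 1
            buffer.Add(data(i))
        Next
    End Sub

    ' Hypothetical helper: index of the first occurrence of `pattern`
    ' in `buffer`, or -1 if it does not occur. Naive O(n*m) scan.
    Function IndexOfBytes(buffer As List(Of Byte), pattern As Byte()) As Integer
        For start As Integer = 0 To buffer.Count - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If buffer(start + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return start
        Next
        Return -1
    End Function

    Sub Main()
        Dim buffer As New List(Of Byte)
        ' Invented sample: two ASCII bytes followed by raw binary values.
        Dim data As Byte() = {&H48, &H49, &HFF, &H80, &H0}
        AppendBytes(buffer, data, 4)   ' only the first 4 bytes were "received"

        Console.WriteLine(buffer.Count)                                   ' 4
        Console.WriteLine(IndexOfBytes(buffer, New Byte() {&HFF, &H80}))  ' 2
    End Sub
End Module
```

Because no decoding step is involved, values above 127 such as &HFF and &H80 are stored exactly as received, and the buffer length always equals the number of bytes appended.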

1 Comment

Indeed; my not realising that the .NET String type is encoding-aware was the root cause of the whole problem. Thus the best solution is to use summat else throughout the entire parser, despite losing the easy substring search operations that String provides. This is now done and is working well. Thanks!
