I am converting some C# code to JavaScript. The file uses multiple data types, and I have found matching functionality for most of them across various JavaScript libraries, but there is one particular function I cannot find an equivalent for in JS.

That function is BinaryReader.ReadString: https://learn.microsoft.com/en-us/dotnet/api/system.io.binaryreader.readstring?view=net-7.0

I have a couple of questions:

  1. First of all, what confuses me: isn't a string inherently variable-length? If so, how can this function not take a length argument?
  2. Let's assume that there is some cap on the length of the string. If so, does JS/TS have similar functionality, or is there a package I can download to mimic the C# behavior?

Thank you in advance.

  • It just looks like a readable stream and maybe a DataView together? Is this in the browser or Node.js? Commented Oct 20, 2022 at 16:29
  • "Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time." Commented Oct 20, 2022 at 16:32
  • From the link: "Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time." Commented Oct 20, 2022 at 16:33
  • @caTS it's a browser. Commented Oct 20, 2022 at 16:36
  • Looking at @OliverWeichhold's and poul's answers, I understand that variable length is not a problem as long as JS can do the job. Now the question is: can JS do this? Commented Oct 20, 2022 at 16:37

1 Answer

BinaryReader expects strings to be encoded in a specific format: the one BinaryWriter writes them in. As stated in the documentation:

Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time

So the length of the string is stored right before the string itself, encoded "as an integer seven bits at a time". The documentation of BinaryWriter.Write7BitEncodedInt describes that encoding in more detail:

The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one.

If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.

So it's a variable-length encoding: instead of always using 4 bytes for an Int32 value, it uses a variable number of bytes. That way the length of a short string takes fewer than 4 bytes (for example, for a string shorter than 128 bytes the length prefix takes just 1 byte).
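To make the encoding concrete, here is a minimal sketch of the writing side in TypeScript, following the description quoted above. The function name write7BitEncodedInt is a hypothetical helper chosen for illustration, not an existing JavaScript API, and the sketch assumes the value is a non-negative integer that fits in 32 bits.

```typescript
// Hypothetical helper (not part of any library): encodes a non-negative
// 32-bit integer seven bits at a time, least-significant bits first.
function write7BitEncodedInt(value: number): number[] {
  const bytes: number[] = [];
  let remaining = value >>> 0; // treat the value as an unsigned 32-bit integer
  while (remaining >= 0x80) {
    // emit the low 7 bits and set the high bit: "more bytes follow"
    bytes.push((remaining & 0x7f) | 0x80);
    remaining >>>= 7;
  }
  // last byte: high bit clear, so the reader knows the length ends here
  bytes.push(remaining);
  return bytes;
}

console.log(write7BitEncodedInt(12));  // [12]      fits in 7 bits, one byte
console.log(write7BitEncodedInt(128)); // [128, 1]  needs a second byte
console.log(write7BitEncodedInt(300)); // [172, 2]
```

Reading the prefix back is the inverse: take the low 7 bits of each byte, shift them into place, and stop at the first byte whose high bit is clear.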

You can reproduce this logic in JavaScript by reading one byte at a time. The lowest 7 bits of each byte hold part of the length, and the highest bit indicates whether the next byte also belongs to the length (if not, the next byte is the start of the actual string).

Once you have the length, use TextDecoder to decode that many bytes into a string in the given encoding. Here is the same function in TypeScript. It accepts a buffer (Uint8Array), an offset into that buffer, and an encoding (UTF-8 by default; check the TextDecoder docs for other available encodings):

class BinaryReader {
  getString(buffer: Uint8Array, offset: number, encoding: string = "utf-8") {
      let length = 0; // length of following string
      let cursor = 0;
      let nextByte: number;
      do {
          // just grab next byte
          nextByte = buffer[offset + cursor];          
          // grab 7 bits of current byte, then shift them according to this byte position
          // that is if that's first byte - do not shift, second byte - shift by 7, etc
          // then merge into length with or.
          length = length | ((nextByte & 0x7F) << (cursor * 7));          
          cursor++;
      }
      while (nextByte >= 0x80); // do this while most significant bit is 1

      // get a slice of the length we got
      let sliceWithString = buffer.slice(offset + cursor, offset + cursor + length);      
      let decoder = new TextDecoder(encoding);      
      return decoder.decode(sliceWithString);
  }
}

It is worth adding sanity checks to the above code if it will be used in production (for example, that we do not read too many bytes while decoding the length, and that the calculated length is actually within the bounds of the buffer).
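As a rough sketch of what such checks could look like, the variant below limits the prefix to 5 bytes (enough for any 32-bit length), verifies that the buffer is long enough, and also returns how many bytes were consumed so that consecutive strings can be read. The names readPrefixedString and StringReadResult are hypothetical and only used for illustration.

```typescript
// Hypothetical result type: the decoded string plus the number of bytes
// consumed (length prefix + string bytes).
interface StringReadResult {
  value: string;
  bytesRead: number;
}

// Hypothetical helper: a more defensive take on the function above.
function readPrefixedString(buffer: Uint8Array, offset: number, encoding = "utf-8"): StringReadResult {
  let length = 0;
  let cursor = 0;
  let nextByte: number;
  do {
    // a 32-bit length never needs more than 5 prefix bytes
    if (cursor >= 5) throw new RangeError("Length prefix is longer than 5 bytes");
    if (offset + cursor >= buffer.length) throw new RangeError("Buffer ended inside the length prefix");
    nextByte = buffer[offset + cursor];
    length |= (nextByte & 0x7f) << (cursor * 7);
    cursor++;
  } while (nextByte >= 0x80);

  const start = offset + cursor;
  // length < 0 catches a prefix that overflowed into the sign bit
  if (length < 0 || start + length > buffer.length) {
    throw new RangeError("String length exceeds the bounds of the buffer");
  }
  // subarray creates a view without copying the bytes
  const value = new TextDecoder(encoding).decode(buffer.subarray(start, start + length));
  return { value, bytesRead: cursor + length };
}

// Two strings written back to back: "Hi" and "abc"
const data = new Uint8Array([2, 72, 105, 3, 97, 98, 99]);
const first = readPrefixedString(data, 0);
const second = readPrefixedString(data, first.bytesRead);
console.log(first.value, second.value); // Hi abc
```

Returning bytesRead is a design choice for this sketch; a stateful reader that tracks its own position would work just as well.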

A small test, using the binary representation of the string "TEST STRING!" as written by BinaryWriter.Write(string) in C# (the first byte, 12, is the length prefix):

let buffer = new Uint8Array([12, 84, 69, 83, 84, 32, 83, 84, 82, 73, 78, 71, 33]);
let reader = new BinaryReader();
console.log(reader.getString(buffer, 0, "utf-8"));
// outputs TEST STRING!

Update: you mention in the comments that in your data the length of the string is represented by 4 bytes, so for example length 29 is represented by [0, 0, 0, 29]. That means your data was not written using BinaryWriter and so cannot be read using BinaryReader, so you don't actually need an analog of BinaryReader.ReadString, contrary to what your question asks.

Anyway, if you need to handle that case, you can do it like this:

class BinaryReader {
  getString(buffer: Uint8Array, offset: number, encoding: string = "utf-8") {
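      // create a view over the 4 bytes starting at offset
      // (buffer.byteOffset is added in case buffer is itself a view into a larger ArrayBuffer)
      let view = new DataView(buffer.buffer, buffer.byteOffset + offset, 4);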
      // read those 4 bytes as int 32 (big endian, since your example is like that)
      let length = view.getInt32(0);
      // get a slice of the length we got
      let sliceWithString = buffer.slice(offset + 4, offset + 4 + length);      
      let decoder = new TextDecoder(encoding);      
      return decoder.decode(sliceWithString);
  }
}
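
For illustration, here is a standalone sketch of the same idea with a small usage example. The function name readInt32PrefixedString is hypothetical, and the sketch assumes the 4-byte length is big-endian and unsigned, based on the [0, 0, 0, 29] example. It adds buffer.byteOffset when creating the DataView so that it also works when the Uint8Array is a view into a larger ArrayBuffer.

```typescript
// Hypothetical helper: reads a string prefixed with a 4-byte big-endian length.
function readInt32PrefixedString(buffer: Uint8Array, offset: number, encoding = "utf-8"): string {
  // buffer.byteOffset accounts for buffer being a view into a larger ArrayBuffer
  const view = new DataView(buffer.buffer, buffer.byteOffset + offset, 4);
  const length = view.getUint32(0, false); // false = big-endian
  const start = offset + 4;
  if (start + length > buffer.length) {
    throw new RangeError("String length exceeds the bounds of the buffer");
  }
  return new TextDecoder(encoding).decode(buffer.subarray(start, start + length));
}

// The data starts 2 bytes into the backing buffer, so byteOffset is 2 here.
const backing = new Uint8Array([255, 255, 0, 0, 0, 2, 79, 75]);
const payload = backing.subarray(2); // [0, 0, 0, 2, 79, 75]
console.log(readInt32PrefixedString(payload, 0)); // OK
```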

3 Comments

Thank you so much, this is amazing. I tried your code for a string with length 29. But when I read the first three bytes, they literally come out as 0; only the least significant byte contains any value. So going back to your code, wouldn't while (nextByte >= 0x80) exit on the very first byte it reads, because the first three bytes are 0? For a string with length 29, the 4 bytes (converted into a Uint8Array) look like [0,0,0,29]. I think that is the root of the problem. I am, just like you, expecting the MSB to be 1 within each byte.
That just means that the byte array was not created via BinaryWriter.Write(string), and so cannot be read with BinaryReader.ReadString. So in this case you are not looking for an analog of BinaryReader, as your question mentions. I've updated the answer, however, with a possible solution for this case.
Thank you, your original answer worked. I had an error in the way I was reading the bytes.
