I'm using Delphi 2009.

This works for me in all cases but one:

var
  BOMLength: integer;
  Buffer: TBytes;
  Encoding: TEncoding;
  Value: string;

SetLength(Buffer, 2048);
CurFileStream.Read(Buffer[0], 2048);

Encoding := nil;
BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);
Value := Encoding.GetString(Buffer);

In the one case where it doesn't work, the file is small and simple. It starts with a UTF-8 byte order mark (BOM), i.e. hex 'EF BB BF', and contains the following:

0 HEAD
0 @I1@ INDI
1 NAME Barthel Lee /Brenner/
2 CONT MAURICE F. WEAVER
2 CONT  When I was eleven or twelve years old, I went to Camp Marguette for a w
2 CONC eek or two in the summertime. It was operated by Catholic Charities and w
0 TRLR

After the call to CurFileStream.Read, when I inspect Buffer, it contains the BOM followed by the file contents, with zeros filling the rest of the 2048 bytes. GetBufferEncoding correctly detected the UTF-8 BOM and set BOMLength to 3.

However, after the Encoding.GetString statement, Value is the empty string: ''.

I have put a try-except block around this to try to catch any exceptions, but there are none.

The code works for 500 other files of different types, but not for this one.

Does anyone know what I can do to fix this so that the file is correctly read?

Or maybe there is something wrong with the file, but I'm not sure what's different about it, or how to identify what might be different or wrong.


Followup:

Remy's answer is correct. I have now determined that only small files, those shorter than the buffer size (2048 bytes in my case), fail when the lengths are not passed to GetString.

As I noted, the unused part of the buffer is filled with zeros. That appears to be what makes Encoding.GetString return an empty string. When it is told where the data ends, it works.

1 Answer

GetString() returns a blank string (instead of raising an exception) if the source bytes are empty, or if it fails to decode the bytes. In your case, you are not telling GetString() to ignore the BOM or the un-filled portion of the buffer. Also, make sure that Encoding is initially nil.

var
  BOMLength: integer;
  Buffer: TBytes;
  BufLength: Integer;
  Encoding: TEncoding;
  Value: string;
begin
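  SetLength(Buffer, 2048);
  // Read returns the number of bytes actually read, which may be less than 2048
  BufLength := CurFileStream.Read(Buffer[0], Length(Buffer));

  // Encoding must be nil on entry so that GetBufferEncoding detects it from the BOM
  Encoding := nil;
  BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // Skip the BOM and decode only the bytes that were read, not the whole buffer
  Value := Encoding.GetString(Buffer, BOMLength, BufLength - BOMLength);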
end;

If that still does not work then the source data most likely has an illegal byte in it.
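
For reuse, the same steps can be wrapped in a small helper. This is only a sketch: the name ReadTextPrefix and the MaxBytes parameter are made up for illustration, and it assumes Classes and SysUtils are in the uses clause. It guards against an empty read and trims the buffer to the bytes actually read, so no zero padding reaches GetBufferEncoding or GetString.

```delphi
// Hypothetical helper; requires Classes (TStream) and SysUtils (TBytes, TEncoding).
function ReadTextPrefix(Stream: TStream; MaxBytes: Integer): string;
var
  Buffer: TBytes;
  BytesRead, BOMLength: Integer;
  Encoding: TEncoding;
begin
  Result := '';
  if MaxBytes <= 0 then
    Exit;

  SetLength(Buffer, MaxBytes);
  // Read may return fewer bytes than requested (short file or end of stream)
  BytesRead := Stream.Read(Buffer[0], MaxBytes);
  if BytesRead <= 0 then
    Exit;

  // Drop the unused tail so only real file data is inspected and decoded
  SetLength(Buffer, BytesRead);

  // Must be nil so that the encoding is detected from the BOM
  Encoding := nil;
  BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);

  // Skip the BOM and decode exactly the bytes that were read
  Result := Encoding.GetString(Buffer, BOMLength, BytesRead - BOMLength);
end;
```

It would be called as Value := ReadTextPrefix(CurFileStream, 2048). One caveat: for files longer than MaxBytes, the cut-off can fall in the middle of a multi-byte UTF-8 character, and the incomplete sequence at the end may itself make decoding fail.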

3 Comments

I did have the Encoding := nil statement in my code, so I've updated my question. Now I'm going to see if setting the lengths as you suggest will work.
Thank you, Remy. Yes that worked. It's still strange why it was needed for that one file, but not needed for any of the others (and there was a wide assortment of them) that I tried.
And thank you, Remy, and StackOverflow, for helping me solve, in just 40 minutes, a problem that has been bugging me for over a week.
