
How can I add a byte order mark to a StringBuilder? (I have to pass a string to another method which will save it as a file, but I can't modify that method).

I tried this:

var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');

But when I view it with hex editor, it adds the following sequence: C3 AF C2 BB C2 BF
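A minimal sketch that reproduces the hex-editor output. It assumes the receiving method writes the string as UTF-8 without a BOM, and the class and method names are illustrative, not from the question. In C#, '\xEF' is the character U+00EF (ï), not the raw byte 0xEF. Each appended char is therefore a Latin-1 character that UTF-8 encodes as two bytes.

```csharp
using System;
using System.Text;

public static class QuestionRepro
{
    // Encodes the StringBuilder contents as UTF-8 without a BOM (assumed to match the file writer).
    public static string Utf8Hex(StringBuilder sb) =>
        BitConverter.ToString(new UTF8Encoding(false).GetBytes(sb.ToString())).Replace("-", " ");

    public static void Main()
    {
        var sb = new StringBuilder();
        sb.Append('\xEF'); // U+00EF 'ï', a character, not the byte 0xEF
        sb.Append('\xBB'); // U+00BB '»'
        sb.Append('\xBF'); // U+00BF '¿'

        Console.WriteLine(Utf8Hex(sb)); // C3 AF C2 BB C2 BF
    }
}
```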

The string is huge, so it would be good to do this without converting back and forth to a byte array.

Edit: Clarification after questions in comments. I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. I can't modify the other method.

  • Why? The byte order mark isn't needed until you write to a file... The issue you see is because those three values are the UTF-8 bytes of the byte order mark, not Unicode characters. Commented Mar 10, 2014 at 17:11
  • I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. Commented Mar 10, 2014 at 17:38
  • For anyone that's using File.WriteAllText() - it supports setting the encoding to UTF8 which will add a BOM. See: learn.microsoft.com/en-us/dotnet/api/… Commented Jan 3, 2024 at 0:30

4 Answers


Two options:

  1. Don't include the byte order mark in your text at all; instead, use an encoding that will include it automatically when the text is written out.
  2. Include it as a character in your StringBuilder:

    sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character
    

Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.
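Both options can be sketched in a few lines. The helper names are hypothetical, and option 1 is shown by prepending the encoding's preamble by hand, which is roughly what BOM-aware writers such as StreamWriter do for you. Either way, the resulting bytes start with EF BB BF.

```csharp
using System;
using System.Text;

public static class BomOptions
{
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    // Option 2: put U+FEFF into the text; a UTF-8 encoder turns it into EF BB BF.
    public static byte[] WithBomCharacter(string text)
    {
        var sb = new StringBuilder(text.Length + 1);
        sb.Append('\uFEFF');
        sb.Append(text);
        // UTF8Encoding(false) adds no preamble, so the only BOM is the one in the text.
        return new UTF8Encoding(false).GetBytes(sb.ToString());
    }

    // Option 1: keep the text clean and let the encoding supply the BOM as its preamble.
    public static byte[] WithEncodingPreamble(string text)
    {
        Encoding encoding = Encoding.UTF8;      // preamble is EF BB BF
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);  // GetBytes never includes the preamble
        var result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Hex(WithBomCharacter("A")));     // EF BB BF 41
        Console.WriteLine(Hex(WithEncodingPreamble("A"))); // EF BB BF 41
    }
}
```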

1 Comment

Thank you. Yes, you are right, I'd go for the first one normally, but I'm taking this approach because I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage.

Byte-order marks inform readers of a file that the file uses a particular encoding. As such, you should only need the byte-order mark (BOM) in the actual file. If you want to include a BOM in a text file you're writing, simply use a StreamWriter with an encoding that emits one. For example:

using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString());
}

If you don't want a BOM with UTF-8 (StreamWriter's default encoding is UTF-8 without a BOM):

using(var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}

Or, if you want a different BOM (UTF-16 little-endian, FF FE):

using(var writer = new StreamWriter(stream, System.Text.Encoding.Unicode))
{
    writer.Write(sb.ToString());
}
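To check which of these constructors actually emits a BOM, a small sketch can write the same text into a MemoryStream and dump the bytes. WriteToHex is a hypothetical helper; passing null selects the StreamWriter(Stream) overload.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamWriterBom
{
    // Writes text through a StreamWriter into memory and returns the bytes as hex.
    // encoding == null uses the StreamWriter(Stream) overload (UTF-8, no BOM).
    public static string WriteToHex(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = encoding == null
                ? new StreamWriter(stream)
                : new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            } // disposing flushes the preamble (if any), then the text
            // MemoryStream.ToArray still works after the writer closed the stream.
            return BitConverter.ToString(stream.ToArray()).Replace("-", " ");
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteToHex("A", null));                   // 41
        Console.WriteLine(WriteToHex("A", new UTF8Encoding(false))); // 41
        Console.WriteLine(WriteToHex("A", Encoding.UTF8));          // EF BB BF 41
        Console.WriteLine(WriteToHex("A", Encoding.Unicode));       // FF FE 41 00
    }
}
```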

Update:

If you wanted to be decoupled from the implementation detail of a BOM, or of the BOM for a particular encoding (which could change at runtime or after deployment), but still wanted to pass a BOM-marked string, you could do something like this (assumes .NET 4.5, which added the StreamWriter constructor with a leaveOpen parameter):

var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: make this configurable, if necessary
using(var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(sb.ToString());
}
CantModifyButMustUseThis(encoding.GetString(stream.ToArray()));
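The round trip above can be checked with a sketch; WithBom is a hypothetical name for that snippet wrapped in a method. Encoding.GetString does not strip the BOM, so for UTF-8, UTF-16 and UTF-32 the decoded string starts with U+FEFF. For well-formed text the result equals the original with '\uFEFF' prepended, which suggests that sb.Insert(0, '\uFEFF') produces the same string without the byte-array copy.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomRoundTrip
{
    // Same steps as the update: StreamWriter emits the preamble, then the bytes are
    // decoded back. GetString keeps the BOM as the character U+FEFF.
    public static string WithBom(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding, 1024, true))
        {
            writer.Write(text);
        }
        return encoding.GetString(stream.ToArray());
    }

    public static void Main()
    {
        foreach (var encoding in new Encoding[] { Encoding.UTF8, Encoding.Unicode, Encoding.UTF32 })
        {
            // Different BOM bytes per encoding, but the same single character in the string.
            Console.WriteLine(WithBom("abc", encoding) == "\uFEFFabc"); // True
        }
    }
}
```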

7 Comments

I am aware what BOM is for. However, as I mentioned in my question, I must pass it to another method (which takes a string and creates a file of it on Azure Blob Storage), that's why I am taking this approach.
This is misleading. For example, with UTF-8 and StreamWriter, if you leave out the encoding constructor argument entirely or if you use new UTF8Encoding() as the argument, then UTF-8 without the byte-order-mark is produced. On the other hand, if you specify the argument as Encoding.UTF8 or as new UTF8Encoding(true) you get UTF-8 with BOM. This is a bit of a gotcha, actually. So your first example is wrong.
@JeppeStigNielsen Yes, you are correct. I've modified my answer.
@user2270404 The stream used by StreamWriter does not need to be a file stream.
There is no Encoding.UTF16 in .NET (neither .NET Framework nor .NET Core); use Encoding.Unicode instead.

The BOM gets added when the text is written out as bytes through one of the relevant Unicode encodings. Some of UnicodeEncoding's constructors take a bool that controls whether a BOM is used. For example, calling the constructor public UnicodeEncoding (bool bigEndian, bool byteOrderMark) with the byteOrderMark argument set to true makes GetPreamble() return the BOM, and writers such as StreamWriter emit that preamble before your string. Note that Encoding.GetBytes itself never prepends the BOM.
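A short sketch of how the byteOrderMark flag behaves (the class name is illustrative): it changes what GetPreamble() returns, while the output of GetBytes() is the same either way.

```csharp
using System;
using System.Text;

public static class UnicodeEncodingBom
{
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        var withBom = new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        var withoutBom = new UnicodeEncoding(bigEndian: false, byteOrderMark: false);

        Console.WriteLine(Hex(withBom.GetPreamble()));    // FF FE
        Console.WriteLine(Hex(withoutBom.GetPreamble())); // (empty line: no preamble)
        // GetBytes never prepends the preamble, whatever the flag says.
        Console.WriteLine(Hex(withBom.GetBytes("A")));    // 41 00
    }
}
```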

2 Comments

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
@Ouroborus, I made the edit -- that should address it, I hope?

I used this code in ASP.NET Core, and it works. Note that new UTF8Encoding(true) already makes the StreamWriter write the BOM (EF BB BF) before the CSV data, so the BOM should not also be written to the MemoryStream by hand; doing both puts two BOMs at the start of the file.

[HttpGet("GetCsv")]
public async Task<IActionResult> GetCsv()
{
    var cc = new CsvConfiguration(new System.Globalization.CultureInfo("en-US"));
    var entity = await _service.AdminPanelList();

    using (var ms = new MemoryStream())
    {
        // UTF8Encoding(true): the StreamWriter emits the UTF-8 BOM as its preamble.
        using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true)))
        using (var cw = new CsvWriter(sw, cc))
        {
            cw.WriteRecords(entity);
        } // disposing the writers flushes the BOM and all CSV text into ms

        // MemoryStream.ToArray still works after the stream has been closed.
        var finalArray = ms.ToArray();
        return File(finalArray, "text/csv", "PersonExport.csv");
    }
}
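A sketch of why a StreamWriter built with new UTF8Encoding(true) should not be combined with a BOM written to the MemoryStream by hand, assuming a fresh MemoryStream (helper names are hypothetical). A StreamWriter created while the stream is at position 0 writes its own preamble on its first flush, so the manual BOM ends up duplicated.

```csharp
using System;
using System.IO;
using System.Text;

public static class DoubleBomCheck
{
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    // BOM written to the stream by hand AND a BOM-emitting StreamWriter.
    public static string ManualPlusWriter(string text)
    {
        using (var ms = new MemoryStream())
        {
            // Created while ms.Position == 0, so the writer still owes its preamble.
            using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
            {
                byte[] bom = Encoding.UTF8.GetBytes("\uFEFF"); // EF BB BF
                ms.Write(bom, 0, bom.Length); // goes straight to the stream
                sw.Write(text);               // buffered inside the writer
            } // flush: the writer's preamble, then the text, land after the manual BOM
            return Hex(ms.ToArray());
        }
    }

    // Letting the writer handle the BOM on its own.
    public static string WriterOnly(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
            {
                sw.Write(text);
            }
            return Hex(ms.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(ManualPlusWriter("A")); // EF BB BF EF BB BF 41
        Console.WriteLine(WriterOnly("A"));       // EF BB BF 41
    }
}
```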
