
I'm importing files in codepage 1252 encoding to a SQL Server 2008 database.

Some of the data contains a comma that isn't the traditional comma (character code 44) but instead character code 8218.

The column that contains this value is encrypted via an algorithm written in VB6. When I implement the same algorithm in C#, I get the value 130, which does not match 8218.

What am I missing?

EDIT: Thought I would share the solution... Thank God for Reflector. It was that simple...

3 Comments

  • Is VB6 Encoding or Encrypting? Commented Sep 19, 2009 at 12:32
  • Either way, provide some more info on that algorithm; as it stands this is not answerable. Commented Sep 19, 2009 at 12:33
  • ... and a sample (hex dump) of the data before. (Quoting the VB6 code is likely the best way to show the algorithm.) Commented Sep 19, 2009 at 13:12

3 Answers


130 is the windows-1252 encoding for the character U+201A (decimal 8218), "Single Low-9 Quotation Mark". If you decode it correctly, the resulting char will have the numeric value 8218 because .NET uses UTF-16 ("Unicode") internally.

It sounds like you decoded the windows-1252 byte sequence as ISO-8859-1, which maps 0x82 (decimal 130) to a control character with numeric value 130. If that's the case, the real solution to your problem is to go back and change the part that's decoding it wrong in the first place.
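
To see the difference in isolation, here is a minimal C# sketch (the byte value and the console output are purely illustrative): decoding byte 0x82 as windows-1252 yields 8218, while decoding it as ISO-8859-1 yields 130.

using System;
using System.Text;

class DecodingCheck
{
    static void Main()
    {
        // The single byte 0x82 (decimal 130) as it would appear in the file.
        byte[] data = { 0x82 };

        // Decoded as windows-1252, the byte maps to U+201A (decimal 8218).
        char cp1252Char = Encoding.GetEncoding(1252).GetChars(data)[0];
        Console.WriteLine((int)cp1252Char);   // 8218

        // Decoded as ISO-8859-1, the same byte stays at code point 130.
        char latin1Char = Encoding.GetEncoding("iso-8859-1").GetChars(data)[0];
        Console.WriteLine((int)latin1Char);   // 130
    }
}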


1 Comment

Yes, but I don't own that data, and even though I have a copy of it, I have a requirement to leave it in its original state. //D

As ever, the key thing is to separate out each bit of the process, and check the strings at each stage.

So first write a program which just reads the file and dumps out the details of the strings, in terms of the Unicode values. I have some code on my strings page which will help with this. When you read the file, specify the encoding explicitly.
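
A rough sketch of such a dump program, assuming a windows-1252 source file (the file path here is just a placeholder):

using System;
using System.IO;
using System.Text;

class DumpStrings
{
    static void Main()
    {
        // Placeholder path; point this at the file being imported.
        const string path = "import.txt";

        // Read with an explicit encoding rather than relying on a default.
        foreach (string line in File.ReadAllLines(path, Encoding.GetEncoding(1252)))
        {
            // Show each character alongside its Unicode code point.
            foreach (char c in line)
                Console.Write("{0} (U+{1:X4}) ", c, (int)c);
            Console.WriteLine();
        }
    }
}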

Then write a separate program with hardcoded literals (using \uxxxx where necessary) to upload into the database. Then examine the strings in the database as accurately as you can. I would expect the actual uploading bit to just work, so long as the database has the appropriate settings.
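
For the upload step, a minimal sketch along these lines might look as follows (the connection string, table name, and column name are hypothetical):

using System.Data;
using System.Data.SqlClient;

class UploadLiteral
{
    static void Main()
    {
        // Hypothetical connection string and table/column names.
        const string connectionString = "Server=.;Database=ImportTest;Integrated Security=true";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("INSERT INTO TestStrings (Value) VALUES (@value)", conn))
        {
            // U+201A written as an escaped literal so the source file's own
            // encoding cannot interfere with what gets sent to the database.
            cmd.Parameters.Add("@value", SqlDbType.NVarChar, 50).Value = "comma\u201Atest";

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}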

There's a bit more on this general process on my "debugging unicode problems" page.



After fiddling a bit I came up with this:

/// <summary>
/// Some char codes produced by Unicode character handling
/// do not map correctly to codepage 1252. This function
/// translates every char to codepage 1252, unless the char
/// takes more than one byte; in that case it is kept as Unicode.
/// </summary>
/// <param name="chars">The decoded characters to fix up.</param>
/// <returns>A string whose characters match codepage 1252 semantics.</returns>
private string GetStringAfterFixingEncoding(IEnumerable<char> chars)
{
    // _encoding is a field holding the target codepage-1252 encoding,
    // e.g. Encoding.GetEncoding(1252).
    var result = new StringBuilder();

    foreach (var c in chars)
    {
        // UTF-16 LE: two bytes per char, the second byte is the high byte.
        var unicodeBytesForChar = Encoding.Unicode.GetBytes(new[] { c });

        if (unicodeBytesForChar.Length > 1 && unicodeBytesForChar[1] != 0)
            // Char value above 0xFF: keep the char as-is (round-trip through UTF-16).
            result.Append(Encoding.Unicode.GetChars(unicodeBytesForChar)[0]);
        else
            // Char value 0x00-0xFF: reinterpret the low byte as a codepage 1252 byte,
            // e.g. 130 becomes U+201A (8218).
            result.Append(_encoding.GetChars(unicodeBytesForChar)[0]);
    }

    return result.ToString();
}

