
I'm importing files in codepage 1252 encoding to a SQL Server 2008 database.

Some of the data contains a comma that isn't the traditional comma (character code 44) but instead character code 8218.

The column that contains this value is encrypted via an algorithm written in VB6. When I implement the same algorithm in C#, I get the value 130, which does not match 8218.

What am I missing?

EDIT: Thought I would share the solution... Thank God for Reflector. It was that simple...

3 Comments

  • Is VB6 Encoding or Encrypting? Commented Sep 19, 2009 at 12:32
  • Either way, provide some more info on that algorithm; as it stands this is not answerable. Commented Sep 19, 2009 at 12:33
  • ... and a sample (hex dump) of the data before. (Quoting the VB6 code is likely the best way to show the algorithm.) Commented Sep 19, 2009 at 13:12

3 Answers


130 is the windows-1252 encoding for the character U+201A (decimal 8218), "Single Low-9 Quotation Mark". If you decode it correctly, the resulting char will have the numeric value 8218 because .NET uses UTF-16 ("Unicode") internally.

It sounds like you decoded the windows-1252 byte sequence as ISO-8859-1, which maps 0x82 (decimal 130) to a control character with numeric value 130. If that's the case, the real solution to your problem is to go back and change the part that's decoding it wrong in the first place.
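
To see the difference in isolation, here is a minimal C# sketch (the byte value and the console output are purely illustrative): decoding byte 0x82 as windows-1252 yields 8218, while decoding it as ISO-8859-1 yields 130.

using System;
using System.Text;

class DecodingCheck
{
    static void Main()
    {
        // The single byte 0x82 (decimal 130) as it would appear in the file.
        byte[] data = { 0x82 };

        // Decoded as windows-1252, the byte maps to U+201A (decimal 8218).
        char cp1252Char = Encoding.GetEncoding(1252).GetChars(data)[0];
        Console.WriteLine((int)cp1252Char);   // 8218

        // Decoded as ISO-8859-1, the same byte stays at code point 130.
        char latin1Char = Encoding.GetEncoding("iso-8859-1").GetChars(data)[0];
        Console.WriteLine((int)latin1Char);   // 130
    }
}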


1 Comment

Yes, but I don't own that data, and even though I have a copy of it, I have a requirement to leave it in its original state. //D

As ever, the key thing is to separate out each bit of the process, and check the strings at each stage.

So first write a program which just reads the file and dumps out the details of the strings, in terms of the Unicode values. I have some code on my strings page which will help with this. When you read the file, specify the encoding explicitly.
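
A rough sketch of such a dump program, assuming a windows-1252 source file (the file path here is just a placeholder):

using System;
using System.IO;
using System.Text;

class DumpStrings
{
    static void Main()
    {
        // Placeholder path; point this at the file being imported.
        const string path = "import.txt";

        // Read with an explicit encoding rather than relying on a default.
        foreach (string line in File.ReadAllLines(path, Encoding.GetEncoding(1252)))
        {
            // Show each character alongside its Unicode code point.
            foreach (char c in line)
                Console.Write("{0} (U+{1:X4}) ", c, (int)c);
            Console.WriteLine();
        }
    }
}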

Then write a separate program with hardcoded literals (using \uxxxx where necessary) to upload into the database. Then examine the strings in the database as accurately as you can. I would expect the actual uploading bit to just work, so long as the database has the appropriate settings.
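
For the upload step, a minimal sketch along these lines might look as follows (the connection string, table name, and column name are hypothetical):

using System.Data;
using System.Data.SqlClient;

class UploadLiteral
{
    static void Main()
    {
        // Hypothetical connection string and table/column names.
        const string connectionString = "Server=.;Database=ImportTest;Integrated Security=true";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("INSERT INTO TestStrings (Value) VALUES (@value)", conn))
        {
            // U+201A written as an escaped literal so the source file's own
            // encoding cannot interfere with what gets sent to the database.
            cmd.Parameters.Add("@value", SqlDbType.NVarChar, 50).Value = "comma\u201Atest";

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}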

There's a bit more on this general process on my "debugging unicode problems" page.



After fiddling a bit I came up with this:

/// <summary>
/// Some char codes produced by Unicode character handling
/// do not map correctly to codepage 1252. This function
/// translates every char to codepage 1252, unless the char
/// takes more than one byte; in that case it is kept as Unicode.
/// </summary>
/// <param name="chars">The decoded characters to fix up.</param>
/// <returns>A string whose characters match codepage 1252 semantics.</returns>
private string GetStringAfterFixingEncoding(IEnumerable<char> chars)
{
    // _encoding is a field holding the target codepage-1252 encoding,
    // e.g. Encoding.GetEncoding(1252).
    var result = new StringBuilder();

    foreach (var c in chars)
    {
        // UTF-16 LE: two bytes per char, the second byte is the high byte.
        var unicodeBytesForChar = Encoding.Unicode.GetBytes(new[] { c });

        if (unicodeBytesForChar.Length > 1 && unicodeBytesForChar[1] != 0)
            // Char value above 0xFF: keep the char as-is (round-trip through UTF-16).
            result.Append(Encoding.Unicode.GetChars(unicodeBytesForChar)[0]);
        else
            // Char value 0x00-0xFF: reinterpret the low byte as a codepage 1252 byte,
            // e.g. 130 becomes U+201A (8218).
            result.Append(_encoding.GetChars(unicodeBytesForChar)[0]);
    }

    return result.ToString();
}

