I am writing some software that takes rows from an XLS file and inserts them into a database.
In OpenOffice, one of the cells looks like this:
Brunner Straße, Parzelle
I am using the ExcelFormat library from CodeProject.
int type = cell->Type();
cout << "Cell contains " << type << endl;
const char* cellCharPtr = cell->GetString();
if (cellCharPtr != 0) {
    value.assign(cellCharPtr);
    cout << "normal string -> " << value << endl;
}
When fetched with the library, the string is returned as a char* (cell->Type() returns STRING, not WSTRING), and on the console it now looks like this:
normal string -> Brunner Stra�e, Parzelle
hex string -> 42 72 75 6e 6e 65 72 20 53 74 72 61 ffffffdf 65 2c 20 50 61 72 7a 65 6c 6c 65
I insert it into the database using the MySQL C++ connector like so:
prep_stmt = con -> prepareStatement ("INSERT INTO "
+ tablename
+ "(crdate, jobid, imprownum, impid, impname, imppostcode, impcity, impstreet, imprest, imperror, imperrorstate)"
+ " VALUES(?,?,?,?,?,?,?,?,?,?,?)");
<...snip...>
prep_stmt->setString(8,vals["street"]);
<...snip...>
prep_stmt->execute();
After inserting it into the database, which uses the utf8_general_ci collation, it looks like this:
Brunner Stra
which is annoying.
How do I make sure that, whatever encoding the file is in, the string gets converted to UTF-8 when it is retrieved from the XLS file?
This is going to run as the backend of a web service where clients upload their own Excel files, so "change the encoding of the file in LibreOffice" can't work, I am afraid.
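ffffffdf obviously is not ASCII, and it's not valid UTF-8 either. I'd bet on Latin-1, where the byte 0xDF is ß; it only shows up as ffffffdf because it was sign-extended, presumably when a signed char was widened to int for the hex dump. A lone 0xDF byte is not a complete UTF-8 sequence, which would also explain why the value stored in the utf8 column stops right before it.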
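If the bytes really are Latin-1 (that is a guess based on the single 0xDF byte, not something the library confirms), the conversion to UTF-8 is simple enough to do by hand, because every Latin-1 byte maps directly to the Unicode code point with the same value. Here is a minimal sketch; latin1ToUtf8 is a hypothetical helper name and not part of ExcelFormat or the MySQL connector:

```cpp
#include <string>

// Hypothetical helper: converts a string assumed to be ISO-8859-1 (Latin-1)
// into UTF-8. Bytes below 0x80 are plain ASCII and are copied unchanged;
// bytes 0x80..0xFF become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte expands to two
    for (char c : in) {
        // Cast to unsigned first so 0xDF stays 0xDF instead of being
        // sign-extended to 0xFFFFFFDF on platforms where char is signed.
        unsigned char ch = static_cast<unsigned char>(c);
        if (ch < 0x80) {
            out.push_back(static_cast<char>(ch));
        } else {
            out.push_back(static_cast<char>(0xC0 | (ch >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (ch & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

With that in place, the bind from the question would become something like prep_stmt->setString(8, latin1ToUtf8(vals["street"])). Note that this only works as long as the assumption holds: if clients upload files in other code pages, a general-purpose converter such as iconv or ICU would be the safer choice, and you would still need to know the source encoding.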