Strings from Excel to utf-8 mysql

Question

I am writing some software that takes rows from an XLS file and inserts them into a database.

In OpenOffice, a cell looks like this:

Brunner Straße, Parzelle

I am using the ExcelFormat library from CodeProject.

int type = cell->Type();
cout << "Cell contains " << type << endl;
const char* cellCharPtr = cell->GetString();
if (cellCharPtr != 0) {
  value.assign(cellCharPtr);
  cout << "normal string -> " << value << endl;
}

When fetched with the library, the string is returned as a char* (so cell->Type() returns STRING, not WSTRING), and on the console it now looks like this:

normal string -> Brunner Stra�e, Parzelle
hex string -> 42 72 75 6e 6e 65 72 20 53 74 72 61 ffffffdf 65 2c 20 50 61 72 7a 65 6c 6c 65

I insert it into the database using the MySQL C++ connector like so:

prep_stmt = con -> prepareStatement ("INSERT INTO "
                  + tablename 
                  + "(crdate, jobid, imprownum, impid, impname, imppostcode, impcity, impstreet, imprest, imperror, imperrorstate)"
                  + " VALUES(?,?,?,?,?,?,?,?,?,?,?)");

<...snip...>

prep_stmt->setString(8,vals["street"]);

<...snip...>

prep_stmt->execute();

After it is inserted into the database, which has a utf8_general_ci collation, it looks like this:

Brunner Stra

which is annoying.

How do I make sure that the string is converted to UTF-8, whatever encoding the file uses, when it is retrieved from the XLS file?

This is going to run as a backend for a web service where clients can upload their own Excel files, so "change the encoding of the file in LibreOffice" can't work, I am afraid.

Would you please print the hex value of the byte array of the string?
– ZhangChn, Commented Jan 23, 2013 at 9:54
ffffffdf obviously is not ASCII, and it's not UTF-8 either. I'd bet on Latin-1, but sign-extended.
– MSalters, Commented Jan 23, 2013 at 10:26
Could you also include the code that inserts the string into the DB? The hex value looks like iso-8859-1, but the utf8_general_ci collation seems to be improperly truncated by \0s.
– ZhangChn, Commented Jan 23, 2013 at 10:28
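
The ffffffdf in the hex dump is presumably what happens when a signed char holding 0xDF is promoted to int before being printed as hex. The asker's dump code is not shown, so the following is only a sketch of a dump that avoids the sign extension; hex_dump is a hypothetical helper name, not part of ExcelFormat.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of s as two lowercase hex digits,
// separated by single spaces.
std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Cast through unsigned char first, so a byte such as 0xDF is
        // printed as "df" and not sign-extended to "ffffffdf".
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) {
            out.push_back(' ');
        }
        out += buf;
    }
    return out;
}
```

With this, the ß byte shows up as a plain df, which is the Latin-1 code for that character.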

Joni · Accepted Answer · 2013-01-23 16:53:46Z

Your input seems to be encoded in latin1, so you need to set the MySQL "connection charset" to latin1. MySQL will then convert the data to the table's utf8 character set when it is inserted.

I'm not familiar with the API you are using to connect to MySQL. In other APIs you'd add charset=latin1 to the connection URL or call an API function to set the connection encoding.

Alternatively you can recode the input before feeding it to MySQL.
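
As a rough illustration of the recoding option, here is a minimal sketch that assumes the input really is ISO-8859-1 (Latin-1), where every byte maps directly to the Unicode code point of the same value. The function name latin1_to_utf8 is made up for this example. Files saved on Windows may actually use Windows-1252, which differs in the 0x80 to 0x9F range; for that or other encodings a conversion library such as iconv would be a more general choice.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumes the input is pure Latin-1; it does not handle Windows-1252
// or any other encoding.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8:
            // 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

In the question's code this would be applied to the value read from the cell before it is passed to setString, so that the byte 0xDF (ß) reaches MySQL as the two-byte sequence 0xC3 0x9F.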


Tom Macdonald Over a year ago

I recoded the input based on a configuration parameter, so I'll accept this.
