How to fix character encoding in sql query

Question

I have a Db2 database where I store names containing special characters. When I retrieve them with our internal software, I get proper results. However, when I run queries against the database or look into it directly, the characters appear to be stored strangely.

The documentation says that the encoding is utf-8 latin1. My query looks something like this:

SELECT firstn, lastn
FROM unams
WHERE unamid = 12345

The user with the given ID has some special characters in their name, é and ó, but the query returns them as Ă© and Ăł.

Is there a way to convert the characters back to their original form using some simple SQL function? I am new to databases and encoding, and I am trying to understand the latter by reading this, but I'm quite lost.

EDIT: I am currently sending queries via SPSS Modeler with a proper ODBC driver; the database runs on Windows Server 2016.

Edit your question to show which tool you use to submit the SQL, and which operating system runs that tool. You are seeing code page conversion, which is avoidable with correct configuration.
– mao, Commented Apr 24, 2019 at 9:28
Have you tried setting the Windows environment variable DB2CODEPAGE to the value 1208? You will need to stop and restart your SPSS Modeler tool for the change to take effect.
– mao, Commented Apr 24, 2019 at 11:30
Yes, I get 1208 as the result of the following query: SELECT CODEPAGE FROM SYSCAT.DATATYPES WHERE TYPENAME = 'VARCHAR'
– Newl, Commented Apr 24, 2019 at 11:56
I feed the data to an (IdentityInsight) pipeline that loads it into the DB. When I manually insert a new row with special characters, it shows correctly in the DB and the query also returns the proper names. I couldn't find out how it got messed up, which is why I was keen to find a function to recode the results.
– Newl, Commented Apr 24, 2019 at 12:02
From your comments, it is unclear whether you have set the Windows system environment variable on the workstation where SPSS Modeler runs (Control Panel > System > System Properties > Environment Variables > System Variables > New, then variable name: DB2CODEPAGE, variable value: 1208 > OK > OK > OK), and then restarted.
– mao, Commented Apr 24, 2019 at 12:10

mao · Accepted Answer · 2019-04-25 10:12:44Z

2

Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208, then restart, then drop and re-populate the tables.

If the application runs locally on the Db2 server (i.e. only one hostname is involved), then the same variable can be set there. This will affect all local applications that use the UTF-8 encoded database.

If the application runs remotely from the Db2-server (i.e. two hostnames are involved) then set the variable on the workstation and on the Windows Db2-server.

Current versions of IBM-supplied Db2 clients on Windows derive their code page from the regional settings, which might not always render Unicode characters correctly. Setting DB2CODEPAGE=1208 overrides this and forces the Db2 client CLI drivers to use a Unicode application code page.
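To illustrate the kind of code page mismatch described above, here is a minimal Python sketch (not Db2 code). It assumes the mismatched code page is Windows-1250 (`cp1250`); that is an inference from the Ă© and Ăł shown in the question, not something the thread confirms. The function name `misread_utf8` is hypothetical.

```python
def misread_utf8(text: str, assumed_codepage: str = "cp1250") -> str:
    """Simulate UTF-8 bytes being interpreted in a single-byte code page.

    ASSUMPTION: cp1250 is only a guess based on the characters in the
    question; the real client code page depends on the regional settings.
    """
    utf8_bytes = text.encode("utf-8")  # e.g. 'é' -> b'\xc3\xa9'
    # Every byte is read as its own character, so one accented letter
    # turns into two unrelated characters.
    return utf8_bytes.decode(assumed_codepage, errors="replace")


# Reproduce the symptom from the question for the two affected letters.
examples = {ch: misread_utf8(ch) for ch in "éó"}
# examples == {'é': 'Ă©', 'ó': 'Ăł'}
```

If the simulated output matches what the query returns, a code page conversion on the loading or reading side is the likely cause, which is what DB2CODEPAGE=1208 is meant to prevent.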

edited Apr 25, 2019 at 10:12

answered Apr 25, 2019 at 9:48

Mark Barinstein · Answer · 2019-04-24 12:54:56Z

0

with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hex
from 
  t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>' 
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);

C C_HEX
- -----
é C3A9
ó C3B3

The query above splits the string into a table with one row per character, showing each character (C) and its hex representation (C_HEX).
You can run it as is to check whether you get the same output. For a UTF-8 database it must match the output shown above.
Now comment out the line with values ('éó') and uncomment the select statement so that it returns a row containing these special characters.

If you see the same hex representation for these characters as stored in the firstn column, then the string is stored correctly, but your client tool (SPSS Modeler) can't display these characters correctly for some reason (a wrong font, for example).
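For comparison outside the database, the same per-character check can be sketched in Python. This is only an illustration of what the hex values should look like for correctly stored UTF-8 text; the helper name `utf8_hex_per_char` is hypothetical and not part of Db2 or SPSS Modeler.

```python
from typing import List, Tuple


def utf8_hex_per_char(text: str) -> List[Tuple[str, str]]:
    """Return (character, uppercase UTF-8 hex) pairs, one per character."""
    return [(ch, ch.encode("utf-8").hex().upper()) for ch in text]


# Correctly stored text: each accented letter is a single two-byte sequence.
expected = utf8_hex_per_char("éó")
# expected == [('é', 'C3A9'), ('ó', 'C3B3')]

# The characters the question reports seeing are themselves valid UTF-8
# and would show up as two separate rows.
garbled = utf8_hex_per_char("Ă©")
# garbled == [('Ă', 'C482'), ('©', 'C2A9')]
```

If the SQL query reports C482 and C2A9 where C3A9 is expected, the wrong characters are actually stored in the column and the problem is not just how the client displays them.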

answered Apr 24, 2019 at 12:54

Newl Over a year ago

Thanks for your answer! Your code works and returns the mentioned table. However, when I select a name that contains an é and run your code on it, it simply shows the hex values of the Ă and © characters (hex C482 and C2A9) that somehow replace the original é. The proper conversion would be é -> Ă© -> é, but it seems that the last step is missing when I try to get an entity's name with a query.

Mark Barinstein Over a year ago

If your internal software works with such data appropriately, then this may work as designed. Data may not be intended to be usable by common / third-party software. But it looks strange indeed. Back to your initial question: if you want to see correct data without your internal software, you probably must understand such an "encoding scheme", and "decode" the data with an appropriate expression/function.
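As a rough idea of what such a "decode" step could look like on the client side, here is a Python sketch. It assumes the stored text is UTF-8 that was misread as Windows-1250 (`cp1250`) and then stored again; that code page is a guess based on the Ă© and Ăł in the question and is not confirmed anywhere in this thread. `repair_mojibake` is a hypothetical helper, not a Db2 function.

```python
def repair_mojibake(text: str, wrong_codepage: str = "cp1250") -> str:
    """Try to undo one round of UTF-8 text misread in a single-byte code page.

    ASSUMPTION: wrong_codepage is a guess; adjust it to the code page that
    was actually in effect when the data was loaded.
    """
    try:
        # Turn the wrong characters back into the original bytes,
        # then decode those bytes as the UTF-8 they were meant to be.
        return text.encode(wrong_codepage).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit the assumed pattern; leave it untouched.
        return text


fixed = [repair_mojibake(s) for s in ("Ă©", "Ăł", "Bill")]
# fixed == ['é', 'ó', 'Bill']
```

Strings that are already correct, such as a properly stored é, fail the UTF-8 decode step and are returned unchanged, so the function is reasonably safe to apply to mixed data. It is still a workaround; fixing the code page and reloading the data addresses the cause.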

Newl Over a year ago

The problem is that while the internal software can display the given entities, one can't search for names that contain special characters. For example, I can't search for someone named Aétó Bill; I can only search by the ID. That's why I thought that doing the same with queries from Modeler would be handy, if the conversion can be done.

Mark Barinstein Over a year ago

If I got you right, this "internal software" actually places these "wrong" characters in the database. If so, then ask the owner of this "internal software" why it does such strange things...
