I have a Db2 database where I store names containing special characters. When I retrieve them with our internal software, I get proper results. However, when I run queries against the database or look into it directly, the characters appear garbled.

The documentation says that the encoding is utf-8 latin1. My query looks something like this:

SELECT firstn, lastn
FROM unams
WHERE unamid = 12345

The user with the given ID has some special characters in their name, é and ó, but the query returns them as Ă© and Ăł.

Is there a way to convert the characters back to their original form using a simple SQL function? I am new to databases and encoding. I am trying to understand the latter by reading this, but I'm quite lost.

EDIT: I am currently sending queries via SPSS Modeler with a proper ODBC driver. The database runs on Windows Server 2016.

  • Edit your question to show which tool you use to submit the SQL, and which operating system runs that tool. You are seeing codepage conversion, which is avoidable by correct configuration. Commented Apr 24, 2019 at 9:28
  • Have you tried setting the Windows environment variable DB2CODEPAGE to the value 1208? You will need to stop and restart your SPSS Modeler tool for the change to take effect. Commented Apr 24, 2019 at 11:30
  • Yes, I get 1208 as the result of the following query: SELECT CODEPAGE FROM SYSCAT.DATATYPES WHERE TYPENAME = 'VARCHAR' Commented Apr 24, 2019 at 11:56
  • I feed the data to an (IdentityInsight) pipeline that loads it into the DB. When I manually insert a new row with special characters, it shows correctly in the DB and the query also returns the proper names. I couldn't find where the pipeline messes it up, which is why I was keen to find a function to recode the results. Commented Apr 24, 2019 at 12:02
  • From your comments, it is unclear whether you have set the Windows system environment variable on the workstation where SPSS Modeler runs (from Control Panel > System > System Properties > Environment Variables > System Variables > New, then variable name: DB2CODEPAGE, variable value: 1208 > OK > OK > OK), then restarted. Commented Apr 24, 2019 at 12:10

2 Answers

Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208, then restart, then drop and re-populate the tables.

If the application runs locally on the Db2 server (i.e. only one hostname is involved), then the same variable can be set there. This will affect all local applications that use the UTF-8 encoded database.

If the application runs remotely from the Db2 server (i.e. two hostnames are involved), then set the variable on both the workstation and the Windows Db2 server.

Current versions of the IBM-supplied Db2 clients on Windows derive their codepage from the regional settings, which might not always render Unicode characters correctly. Setting DB2CODEPAGE=1208 overrides this and forces the Db2 client CLI drivers to use a Unicode application code page.
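To illustrate why the tables had to be re-populated, here is a minimal Python sketch of how this kind of garbling can arise at load time. It assumes the loading client's regional codepage was Windows-1250; that is inferred from the Ă© and Ăł symptoms in the question and is not confirmed anywhere in this thread. The function name `double_encode` is made up for the illustration.

```python
# Sketch only. Assumption: the client that loaded the data declared
# Windows-1250 (cp1250) as its codepage while actually sending UTF-8 bytes.

def double_encode(text, wrong_codepage="cp1250"):
    """Return what ends up stored when UTF-8 bytes are misread as wrong_codepage."""
    utf8_bytes = text.encode("utf-8")          # bytes the application really sent
    return utf8_bytes.decode(wrong_codepage)   # how the client interpreted them

for ch in "éó":
    garbled = double_encode(ch)
    # Original character, its garbled form, and the hex of the garbled form
    # once it has been converted to UTF-8 for storage in the database.
    print(ch, "->", garbled, garbled.encode("utf-8").hex().upper())
```

Under that assumption the garbled text is already what sits in the table, so changing the client setting afterwards does not repair existing rows. That is consistent with the need to drop and re-populate.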

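with t (firstn) as (
  values ('éó')
  -- To test stored data, replace the VALUES line with:
  -- SELECT firstn
  -- FROM unams
  -- WHERE unamid = 12345
)
-- Split the string into one row per character and show each character's hex bytes
select x.c, hex(x.c) c_hex
from
  t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
    passing t.firstn as "s" columns tok varchar(6) path '.') x(c);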

C C_HEX
- -----
é C3A9
ó C3B3

The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check whether you get the same output. For a UTF-8 database, the output must match the table above.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.

If you see the same hex representation for these characters stored in the firstn column, then the string is stored correctly, and your client tool (SPSS Modeler) can't display these characters correctly for some reason (wrong font, for example).
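If the hex values differ instead (for example C482 C2A9 where C3A9 is expected for é), the stored text itself is garbled and would have to be repaired after it is fetched. Below is a hedged Python sketch of such a client-side repair. It assumes the garbling came from UTF-8 bytes being misread as Windows-1250, which is a guess based on the Ă© and Ăł symptoms; `repair_mojibake` is a hypothetical helper, not a Db2 or SPSS function.

```python
# Sketch only. Assumption: values were garbled by reading UTF-8 bytes as
# Windows-1250 (cp1250). Adjust wrong_codepage if the real cause differs.

def repair_mojibake(text, wrong_codepage="cp1250"):
    """Try to undo a UTF-8-read-as-wrong_codepage garbling; return text unchanged on failure."""
    try:
        raw = text.encode(wrong_codepage)   # recover the original byte sequence
        return raw.decode("utf-8")          # interpret those bytes as UTF-8
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not garbled in the assumed way (or not reversible): leave it alone.
        return text

print(repair_mojibake("Ă©"))
print(repair_mojibake("Ăł"))
```

The round trip is not guaranteed for every character: a few byte values have no mapping in Windows-1250, so some names may come back unchanged. Fixing the load path and re-populating remains the more reliable route.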

4 Comments

Thanks for your answer! Your code works and returns the mentioned table. However, when I select a name that contains an é and run your code on it, it simply shows the hex values of the Ă and © characters (hex C482 and C2A9) that somehow replaced the original é. The proper conversion would be é -> Ă© -> é, but it seems that the last step is missing when I fetch an entity's name with a query.
If your internal software works with such data appropriately, then this may work as designed. Data may not be intended to be usable by common / third-party software. But it looks strange indeed. Back to your initial question: if you want to see correct data without your internal software, you probably must understand such an "encoding scheme", and "decode" the data with an appropriate expression/function.
The problem is that while the internal software can display the given entities, one can't search for names that contain special characters. For example, I can't search for someone named Aétó Bill; I can only search by ID. That's why I thought doing the same with queries from Modeler would be handy, if the conversion can be done.
If I understand you correctly, this "internal software" is what actually writes these "wrong" characters to the database. If so, ask the owner of this "internal software" why it does such strange things...
