DB2 SQL query to find non-ASCII characters in strings

Question

I have a table (say ELEMENTS) with a VARCHAR field named NAME, encoded in CCSID 1144. I need to find all the strings in the NAME field that contain "non-ASCII characters", that is, characters that belong to the CCSID 1144 character set but not to ASCII.

ruakh · Accepted Answer · 2012-10-26 14:11:58Z

2

I think you should be able to create a function like this:

CREATE FUNCTION CONTAINS_NON_ASCII(INSTR VARCHAR(4000))
  RETURNS CHAR(1)
  DETERMINISTIC NO EXTERNAL ACTION CONTAINS SQL
  BEGIN ATOMIC
    DECLARE POS, LEN INT;
    -- A NULL input yields NULL rather than 'Y' or 'N'.
    IF INSTR IS NULL THEN
      RETURN NULL;
    END IF;
    SET (POS, LEN) = (1, LENGTH(INSTR));
    -- Scan the string one position at a time.
    WHILE POS <= LEN DO
      -- ASCII covers 0 to 127, so any value above 127 is non-ASCII.
      IF ASCII(SUBSTR(INSTR, POS, 1)) > 127 THEN
        RETURN 'Y';
      END IF;
      SET POS = POS + 1;
    END WHILE;
    RETURN 'N';
  END

And then write:

SELECT NAME
  FROM ELEMENTS
 WHERE CONTAINS_NON_ASCII(NAME) = 'Y'
;

(Disclaimer: completely untested.)

By the way, judging by the documentation, it seems that VARCHAR is a string of bytes, not of Unicode characters. (Bytes range from 0 to 0xFF; Unicode code points range from 0 to 0x10FFFF.) If you're interested in supporting Unicode, you might want to use a different data type.
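To make the byte-versus-character distinction concrete, here is a small Python sketch (not DB2 code). It compares the number of characters in a string with the number of bytes it occupies in two encodings. The helper name `char_and_byte_lengths` is made up for this illustration, and `cp1140` is used only as a stand-in EBCDIC code page with a euro sign; treating it as comparable to CCSID 1144 for this purpose is an assumption.

```python
def char_and_byte_lengths(text: str, codec: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes when encoded with codec)."""
    return len(text), len(text.encode(codec))


sample = "10 €"

# In a single-byte EBCDIC code page every character takes one byte.
# Assumption: cp1140 stands in for CCSID 1144 here.
ebcdic_lengths = char_and_byte_lengths(sample, "cp1140")

# In UTF-8 the euro sign takes three bytes, so the byte count exceeds
# the character count.
utf8_lengths = char_and_byte_lengths(sample, "utf-8")

print(ebcdic_lengths, utf8_lengths)
```

A column whose length is declared in bytes therefore holds fewer characters once multi-byte encodings are involved.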

answered Oct 26, 2012 at 14:11

ruakh


6 Comments

Gabber Over a year ago

+1, thanks for the ASCII function. However, the DB2 manual states: "In a Unicode database, if a supplied argument is a graphic string, it is first converted to a character string before the function is executed." As I understand it, ASCII will never return a number greater than 128; in fact, the € sign is 26.

ruakh Over a year ago

@Gabber: I saw that statement, but since it seems that VARCHAR is always a character string, not a graphic string, I didn't think that was relevant. (In other words, I understood that statement to mean that no numbers greater than 255 will ever be returned by ASCII.)

Gabber Over a year ago

From the DB2 manual: VARCHAR: varying-length character strings with a maximum length of n bytes; no assumptions are made about the encoding, only about the length. I wouldn't have the problem otherwise, but I still find those damn € characters in my VARCHAR fields :)

ruakh Over a year ago

@Gabber: Is it possible that the euro sign really is being stored as 26 (since that would be an ASCII control character anyway)? Maybe instead of the greater-than comparison on ASCII(SUBSTR(INSTR, POS, 1)), try ASCII(SUBSTR(INSTR, POS, 1)) NOT BETWEEN 32 AND 126? (That will detect any characters less than SP ' ' or greater than tilde '~'.)

Gabber Over a year ago

Mm, no. I mentioned Unicode only because I needed to understand a possible approach and Unicode is far better known, but the encoding is EBCDIC 1144 and, as stated here, the character code for € is 159. Assuming it does a modulo operation is not correct either, because 159-127=32.
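The point about EBCDIC can be illustrated outside the database with a short Python sketch. In an EBCDIC code page, even plain letters have byte values above 127, so comparing raw byte values against an ASCII threshold cannot separate ASCII from non-ASCII text. One option is to decode first and then test the resulting code points. This is only a sketch: `contains_non_ascii` here is a Python helper written for this illustration, not the SQL function from the answer, and `cp1140` is used as a stand-in because it is an EBCDIC code page that also places € at 159. Other characters may sit at different positions in CCSID 1144, so the results should not be read as exact for that code page.

```python
# Assumption: cp1140 stands in for CCSID 1144 in this illustration.
CODEC = "cp1140"


def contains_non_ascii(raw: bytes, codec: str = CODEC) -> bool:
    """Decode the bytes, then report whether any character lies outside ASCII."""
    return any(ord(ch) > 127 for ch in raw.decode(codec))


plain = "ABC".encode(CODEC)         # pure ASCII text stored as EBCDIC bytes
with_euro = "10 €".encode(CODEC)    # text containing the euro sign

# The raw byte values of the ASCII-only text are all above 127,
# which is why a byte-level threshold test misfires on EBCDIC data.
print(list(plain))

# The last byte is the euro sign's EBCDIC code.
print(with_euro[-1])

# Testing the decoded characters gives the intended answer.
print(contains_non_ascii(plain), contains_non_ascii(with_euro))
```

Whether an equivalent decode-then-test step is available inside DB2 depends on the platform and on how the database converts between CCSIDs, which the thread does not settle.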
