I've got a rough table structure like this:
| ID | Value | Project | Last update |
|---|---|---|---|
| 1 | Hashtag# | testing | 123981010 |
| 2 | I like,it | admin | 123129319 |
A very complex mechanism produces a new table that needs these three columns (Value, Project, Last update); that table has about 68 million entries. For reasons I don't want to go into (lots of legacy tables), we can't use a join. My approach was to build a delimited string with a function, like this:
Value#Project#Lastupdate
It doesn't matter what the delimiter is. But since Value can contain ANY character, I can't just pick a very complicated separator that MIGHT cover 99.999999 % of all cases but fails for one row in hundreds of millions. So instead I escape all occurrences of the separator inside each field and then join the fields with the separator. For the first row of the example, the result looks like this:
Hashtag\##testing#123981010
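To make the scheme concrete, the building side could look roughly like the sketch below. The function name `escape_internal` and its three parameters are hypothetical and only illustrate the idea; this is not the production code. It assumes `#` is the separator and `\` the escape character, as in the example above.

```sql
-- Hypothetical helper (name and signature are assumptions for illustration):
-- prefixes every '#' inside a field with '\' and joins the three fields with '#'.
CREATE OR REPLACE FUNCTION escape_internal(
    i_value   IN VARCHAR2,
    i_project IN VARCHAR2,
    i_updated IN VARCHAR2
) RETURN VARCHAR2 IS
BEGIN
    RETURN REPLACE(i_value,   '#', '\#') || '#'
        || REPLACE(i_project, '#', '\#') || '#'
        || REPLACE(i_updated, '#', '\#');
END escape_internal;
/
```

Calling `escape_internal('Hashtag#', 'testing', '123981010')` would produce the string shown above. Note that this sketch does not escape the `\` character itself, so a field that ends in `\` would still be ambiguous.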
So I escape with '\'. Now this all works, but it is really slow. The select statement
Here is my code:
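
    CREATE OR REPLACE FUNCTION unescape_internal(
        i_str IN VARCHAR2,  -- delimited string, e.g. 'Hashtag\##testing#123981010'
        i_idx IN INTEGER    -- index of the field to return (1, 2 or 3)
    ) RETURN VARCHAR2 IS
        v_first     VARCHAR2(4000);
        v_second    VARCHAR2(4000);
        v_third     VARCHAR2(4000);
        v_ret       VARCHAR2(4000);
        v_curr_str  VARCHAR2(4000);
        v_curr_char VARCHAR2(1);
    BEGIN
        v_curr_str := '';
        -- Walk the string one character at a time and cut at every unescaped '#'.
        FOR i IN 1..LENGTH(i_str)
        LOOP
            v_curr_char := SUBSTR(i_str, i, 1);
            IF v_curr_char = '#' AND i = 1 THEN
                -- Leading separator: the first field is empty. Mark it with a
                -- blank, because '' is NULL in Oracle.
                IF v_first IS NULL THEN
                    v_first := ' ';
                END IF;
            ELSIF v_curr_char = '#' AND SUBSTR(i_str, i-1, 1) != '\' THEN
                -- Unescaped separator: close the first field, then the second.
                IF v_first IS NULL THEN
                    v_first := v_curr_str;
                    v_curr_str := '';
                ELSIF v_second IS NULL THEN
                    v_second := v_curr_str;
                    v_curr_str := '';
                END IF;
            ELSE
                -- Any other character (including '\' and an escaped '#')
                -- is appended to the current field unchanged.
                v_curr_str := v_curr_str || v_curr_char;
            END IF;
        END LOOP;
        -- Whatever remains after the last separator is the third field.
        IF v_third IS NULL THEN
            v_third := v_curr_str;
            v_curr_str := '';
        END IF;
        -- Return only the requested field.
        IF( i_idx = 1 ) THEN
            v_ret := TRIM(NVL(v_first,''));
        ELSIF( i_idx = 2 ) THEN
            v_ret := TRIM(NVL(v_second,''));
        ELSIF( i_idx = 3 ) THEN
            v_ret := TRIM(NVL(v_third,''));
        ELSE
            v_ret := '';
        END IF;
        RETURN v_ret;
    END unescape_internal;
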
The code I used for testing was like this:

    DECLARE
        v_varchar1 VARCHAR2(4000);
        v_varchar2 VARCHAR2(4000);
        v_varchar3 VARCHAR2(4000);
    BEGIN
        FOR i IN 1..1000000
        LOOP
            -- Three calls per row: the same string is parsed once per requested field.
            SELECT unescape_internal('123123123123#1dasdyxcsd113\##test' || i, 1),
                   unescape_internal('123123123123#1dasdyxcsd113\##test' || i, 2),
                   unescape_internal('123123123123#1dasdyxcsd113\##test' || i, 3)
              INTO v_varchar1, v_varchar2, v_varchar3
              FROM dual;
        END LOOP;
    END;

This takes about 40 seconds. That may not sound like much, but the test uses very little data and short column values, so the real data will be a much bigger problem. We tried the function on the actual dataset: before my changes, the SQL took about 300 s to return 250 entries in SQL Developer; with my changes, we had to abort after 30 minutes of waiting. Extrapolating the test alone to the 68 million entries of the real environment gives about 45 minutes (68 × 40 s ≈ 2,720 s).
The code is slow because it processes the string multiple times. Since I need every column (we have 3), I pass the index of the column I want returned, which results in 3 full passes over the string. Unfortunately, declaring the function DETERMINISTIC doesn't help, since the index changes on every call.
I tried using regex first, but unfortunately Oracle (12g) doesn't support lookahead/lookbehind regex expressions, so I am out of ideas and hoping someone has a suggestion.