0

Hello folks & fellow developers,

I am working on an Oracle DB and have some XML data stored in blob format, The XML has:

a) a formula (eg. %1 - %2 , or %2||'-'||%1)

b) a delimited list of variables that fit into the formula (eg. SALE_Q1| SALE_Q2 or YEAR| MONTH)

I have managed to pull this data into 2 different columns (can be merged into 1 column if needed too) and what I need to do is get the output column as a superimposition of the variables onto the placeholders.

(eg. SALE_Q1 - SALE_Q2 , or MONTH||'-'||YEAR)

there are also a few caveats to this like:

  1. I don't know how many variables will each formula have and,
  2. the variables will not always be used in the formula in the same order as the delimited list (see eg.2)

we can consider the data coming from a query like: SELECT formula || ', ' || columns_used FROM data_table; for the input string and the output i am currently getting is like:

%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY
0.1 * %1, WIND_RES
%1, TOTAL
CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE
%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ

I am pretty new to SQL and this requirement is going over my head, any help will be highly appreciated. I need an SQL solution as a PL/SQL solution is not feasible in my requirement (this script is going to pull data from one DB and feed it to another repo on a regular basis)

I have seen some articles on XML table, model, or recursive regexp but I am not sure how I can use these. I have also seen this question on StackOverflow but my requirement is a little different from that and is a bit trickier as well. getting the formula string and the variables into the same string for processing like that question is also a plausible route. Any solution that can be written in an SQL query will be very helpful.

some more examples for your consideration:

"%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY" => "SITE_NO||'-'||SITE_NAME||COUNTRY"

"0.1 * %1, WIND_RES" => "0.1 * WIND_RES"

"%1, TOTAL" => "TOTAL"

"CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE" => "CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END"

"%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ" => "ABC(+)=PQR and LMN(+)=XYZ"
4
  • You appear to be building something that will build dynamic SQL; this looks like a recipe for making your code vulnerable to SQL injection and that is almost certainly not something that you want to put into a production system. Also, your last example looks very much like you are using Oracle's legacy comma join syntax; please migrate to using ANSI joins. Commented Mar 24, 2021 at 9:33
  • Also, you appear to be using || as the string concatenation operator and | as your delimiter. What logic is there to ensure that the | characters retain their appropriate meaning? Is it that the | before a comma are concatenation operators and the | after the comma are always delimiters? Can you ensure that there is never going to be two commas? Commented Mar 24, 2021 at 9:36
  • @MT0 the script is not going to be exposed to any client end and the data input is not coming from a transactional DB, this is a script is part of a system to get a lot of aggregated data to a DB for analysis. The formulee are coming from a reporting tool and hence the sntax looks legacy (i can't change that). I can however change the delimiter, all suggestions are available ", " or "|" "/" or any symbol that you can suggest Commented Mar 24, 2021 at 9:46
  • @MT0, the input string and the list of variables can also be used as 2 different columns if that would the task easier Commented Mar 24, 2021 at 9:51

1 Answer 1

1

I am assuming that the last comma is the delimiter between the template string and the delimited terms. You can use a recursive sub-query factoring clause and simple string functions:

WITH terms ( value, terms, num_terms ) AS (
  SELECT SUBSTR( value, 1, INSTR( value, ', ', -1 ) - 1 ),
         SUBSTR( value, INSTR( value, ', ', -1 ) + 2 ),
         REGEXP_COUNT(
           SUBSTR( value, INSTR( value, ', ', -1 ) + 2 ),
           '.+?(\| |$)'
         )
  FROM   table_name
),
term_bounds ( rn, value, terms, start_pos, lvl ) AS (
  SELECT ROWNUM,
         REPLACE(
           value,
           '%' || num_terms,
           CASE num_terms
           WHEN 1
           THEN terms
           ELSE SUBSTR( terms, INSTR( terms, '| ', 1, num_terms - 1 ) + 2 )
           END
         ),
         terms,
         CASE
         WHEN num_terms > 1
         THEN INSTR( terms, '| ', 1, num_terms - 1 )
         ELSE 1
         END,
         num_terms
  FROM   terms
UNION ALL
  SELECT rn,
         REPLACE(
           value,
           '%' || (lvl - 1),
           CASE lvl - 1
           WHEN 1
           THEN SUBSTR( terms, 1, start_pos - 1 )
           ELSE SUBSTR(
                  terms,
                  INSTR( terms, '| ', 1, lvl - 2 ) + 2,
                  start_pos - INSTR( terms, '| ', 1, lvl - 2 ) - 2
                )
           END
         ),
         terms,
         CASE
         WHEN lvl > 2
         THEN INSTR( terms, '| ', 1, lvl - 2 )
         ELSE 1
         END,
         lvl - 1
  FROM   term_bounds
  WHERE  lvl > 1
)
SEARCH DEPTH FIRST BY rn SET rn_order
SELECT value
FROM   term_bounds
WHERE  lvl = 1;

Which, for the sample data:

CREATE TABLE table_name ( value ) AS
SELECT '%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY' FROM DUAL UNION ALL
SELECT '0.1 * %1, WIND_RES' FROM DUAL UNION ALL
SELECT '%1, TOTAL' FROM DUAL UNION ALL
SELECT 'CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE' FROM DUAL UNION ALL
SELECT '%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ' FROM DUAL UNION ALL
SELECT '%1, %2, %3, %4, %5, %6, %7, %8, %9, %10, %11, ONE| TWO| THREE| FOUR| FIVE| SIX| SEVEN| EIGHT| NINE| TEN| ELEVEN' FROM DUAL UNION ALL
SELECT '%%%%%%%7, HELLO| 1| 2| 3| 4| 5| 6' FROM DUAL

Outputs:

| VALUE                                                                                      |
| :----------------------------------------------------------------------------------------- |
| SITE_NO||'-'||SITE_NAME||COUNTRY                                                 |
| 0.1 * WIND_RES                                                                             |
| TOTAL                                                                                      |
| CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END |
| ABC(+)=PQR and LMN(+)=XYZ                                                                  |
| ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN                          |
| HELLO                                                                                      |

db<>fiddle here


Or as a PL/SQL function:

CREATE FUNCTION substitute_values(
  i_value          IN VARCHAR2,
  i_terms          IN VARCHAR2,
  i_term_delimiter IN VARCHAR2 DEFAULT '| '
) RETURN VARCHAR2 DETERMINISTIC
IS
  TYPE term_list_type IS TABLE OF VARCHAR2(200);
  v_start  PLS_INTEGER := 1;
  v_end    PLS_INTEGER;   
  v_index  PLS_INTEGER;
  v_terms  term_list_type := term_list_type();
  v_output VARCHAR2(4000) := i_value;
BEGIN
  LOOP
    v_end := INSTR(i_terms, i_term_delimiter, v_start);
    v_terms.EXTEND;
    IF v_end > 0 THEN
      v_terms(v_terms.COUNT) := SUBSTR(i_terms, v_start, v_end - v_start);
    ELSE
      v_terms(v_terms.COUNT) := SUBSTR(i_terms, v_start);
      EXIT;
    END IF;
    v_start := v_end + LENGTH(i_term_delimiter);
  END LOOP;
  
  LOOP
    v_index  := TO_NUMBER(REGEXP_SUBSTR(v_output, '^(.*?)%(\d+)', 1, 1, NULL, 2));
    IF v_index IS NULL THEN
      RETURN v_output;
    ELSIF v_index > v_terms.COUNT THEN
      RETURN NULL;
    END IF;
    v_output := REGEXP_REPLACE(v_output, '^(.*?)%(\d+)', '\1' || v_terms(v_index));
  END LOOP;
END;
/

Then:

SELECT SUBSTITUTE_VALUES(
         SUBSTR(value, 1, INSTR(value, ', ', -1) - 1),
         SUBSTR(value, INSTR(value, ', ', -1) + 2)
       ) AS value
FROM   table_name

Outputs:

VALUE
SITE_NO||'-'||SITE_NAME||COUNTRY
0.1 * WIND_RES
TOTAL
CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END
ABC(+)=PQR and LMN(+)=XYZ
ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN
HELLO

db<>fiddle here

Sign up to request clarification or add additional context in comments.

14 Comments

I ran your query and it ran like a charm but I have a corner case that I would like to ask about how can I incorporate double-digit numbers for place holders like %15 or %21
@kunalkavthekar Updated to start at the end and work backwards from the highest number to the lowest.
thanks a lot, that did the trick, and sorry for the late reply
thanks for helping me with this query, we have been using it for a while now but now they have come back to me asking if it is possible to write this query without the "union all" and recursion (i tried explaining that it is required). I have been thinking that as it is in Oracle, is it possible to write it using the oracle "connect by" structure? if so could you please help me with that?
No, you need to repeatedly apply the replacement and this can only be done by repeatedly iterating over the string. You can do that in SQL, PL/SQL (examples of both in the question) or in another language such as Java which can be accessed from the database. The only other improvement I can suggest is to calculate the replacement text when the values are inserted/updated (either using a trigger to populate another column/table or in a materialized view) so that you do not have to regenerate the text with every SELECT.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.