2

I need to replace HTML codes (character entities) with the special characters they represent. My data is affected by the HTML codes listed here:

+------------------------+
+ Html code + Display    +
+-----------+------------+
+ &Agrave;  +  À         +
+ &agrave;  +  à         +
+ &Aacute;  +  Á         +
+ &aacute;  +  á         +
+ &Egrave;  +  È         +
+ &egrave;  +  è         +
+ &Eacute;  +  É         +
+ &eacute;  +  é         +
+ &Igrave;  +  Ì         +
+ &igrave;  +  ì         +
+ &Iacute;  +  Í         +
+ &iacute;  +  í         +
+ &Ograve;  +  Ò         +
+ &ograve;  +  ò         +
+ &Oacute;  +  Ó         +
+ &oacute;  +  ó         +
+ &Ugrave;  +  Ù         +
+ &ugrave;  +  ù         +
+ &Uacute;  +  Ú         +
+ &uacute;  +  ú         +
+ &laquo;   +  «         +
+ &raquo;   +  »         +
+ &euro;    +  €         +
+ &deg;     +  °         +
+------------------------+

I found these entries in the database, where they make no sense, so I need to change them back into the original symbols (characters).

Data Setup: Also found in this SQL fiddle

The values inserted below must be updated according to the table above.

CREATE TABLE TEMP
(
  COL1  VARCHAR2(50 CHAR),
  COL2  VARCHAR2(50 CHAR),
  COL3  VARCHAR2(50 CHAR),
  COL4  VARCHAR2(10 CHAR)
);

Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA I MAGGIO', 'GI&Ugrave; PER LA STRADA', 'TOR LUPARA', '83');
Insert into TEMP (COL1, COL3, COL4)
 Values
   ('VIA D''AZEGLIO', 'MUGGI&Ograve;', '12');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA PONTE NUOVO', 'TOSCA CAF&Egrave;', 'VERONA', '8a');
Insert into TEMP (COL1, COL3, COL4)
 Values
   ('LOCALIT&Oacute; AGELLO', 'SAN SEVERINO MARCHE', '60');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA PAPA GIOVANNI XXIII', 'LOCALIT&Agrave; PREDONDO', 'BOVEGNO', '24');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA CATANIA', 'CASA DI OSPITAIT&Agrave; COLLEREALE', 'MESSINA', '26/B');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('PIAZZA DI SANTA CROCE IN GERUSALEMME', 'MINISTERO BENI E ATTIVIT&Agrave; CULTURALI', 'ROMA', '9/a');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA RONCIGLIO', 'LOCALIT&Agrave; MONTECUCCO', 'GARDONE RIVIERA', '55');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('BORGO TRINITA''', 'Borgo Trinit&agrave;, 58', 'BELLANTE', '58');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('10 PIAZZA S. LORENZO', 'ROVAR&Egrave;', 'S. BIAGIO DI GALLALTA', '10');
Insert into TEMP (COL1, COL3, COL4)
 Values
   ('LOCALIT&Agrave; MALCHINA', 'SISTIANA', '3');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA DEI CROCIFERI', 'PRESSO AUTORIT&Agrave; ENERGIA', 'ROMA', '19');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIALE STAZIONE', 'FRAZIONE SAN NICOL&Ograve; A TREBBIA', 'ROTTOFRENO', '10/B');
Insert into TEMP (COL1, COL2, COL3, COL4)
 Values
   ('VIA ADOLFO CONSOLINI', 'ALBAR&Egrave; DI COSTERMANO', 'COSTERMANO', '45 B');
COMMIT;

What we see after this setup is:

SELECT * FROM TEMP;


COL1                                   COL2                                        COL3                  COL4
-------------------------------------- ------------------------------------------- --------------------- --------
VIA I MAGGIO                           GI&Ugrave; PER LA STRADA                    TOR LUPARA            83
VIA D'AZEGLIO                                                                      MUGGI&Ograve;         12
VIA PONTE NUOVO                        TOSCA CAF&Egrave;                           VERONA                8a
LOCALIT&Oacute; AGELLO                                                             SAN SEVERINO MARCHE   60
VIA PAPA GIOVANNI XXIII                LOCALIT&Agrave; PREDONDO                    BOVEGNO               24
VIA CATANIA                            CASA DI OSPITAIT&Agrave; COLLEREALE         MESSINA               26/B
PIAZZA DI SANTA CROCE IN GERUSALEMME   MINISTERO BENI E ATTIVIT&Agrave; CULTURALI  ROMA                  9/a
VIA RONCIGLIO                          LOCALIT&Agrave; MONTECUCCO                  GARDONE RIVIERA       55
BORGO TRINITA'                         Borgo Trinit&agrave;, 58                    BELLANTE              58
10 PIAZZA S. LORENZO                   ROVAR&Egrave;                               S. BIAGIO DI GALLALTA 10
LOCALIT&Agrave; MALCHINA                                                           SISTIANA              3
VIA DEI CROCIFERI                      PRESSO AUTORIT&Agrave; ENERGIA              ROMA                  19
VIALE STAZIONE                         FRAZIONE SAN NICOL&Ograve; A TREBBIA        ROTTOFRENO            10/B
VIA ADOLFO CONSOLINI                   ALBAR&Egrave; DI COSTERMANO                 COSTERMANO            45 B

14 rows selected.

What I want to see is this:

COL1                                  COL2                                COL3                      COL4      
------------------------------------- ----------------------------------- ------------------------- ----------
VIA I MAGGIO                          GIÙ PER LA STRADA                   TOR LUPARA                83        
VIA D'AZEGLIO                                                             MUGGIÒ                    12        
VIA PONTE NUOVO                       TOSCA CAFÈ                          VERONA                    8a        
LOCALITÓ AGELLO                                                           SAN SEVERINO MARCHE       60        
VIA PAPA GIOVANNI XXIII               LOCALITÀ PREDONDO                   BOVEGNO                   24        
VIA CATANIA                           CASA DI OSPITAITÀ COLLEREALE        MESSINA                   26/B      
PIAZZA DI SANTA CROCE IN GERUSALEMME  MINISTERO BENI E ATTIVITÀ CULTURALI ROMA                      9/a       
VIA RONCIGLIO                         LOCALITÀ MONTECUCCO                 GARDONE RIVIERA           55        
BORGO TRINITA'                        Borgo TrinitÀ, 58                   BELLANTE                  58        
10 PIAZZA S. LORENZO                  ROVARÈ                              S. BIAGIO DI GALLALTA     10        
LOCALITÀ MALCHINA                                                         SISTIANA                  3         
VIA DEI CROCIFERI                     PRESSO AUTORITÀ ENERGIA             ROMA                      19        
VIALE STAZIONE                        FRAZIONE SAN NICOLÒ A TREBBIA       ROTTOFRENO                10/B      
VIA ADOLFO CONSOLINI                  ALBARÈ DI COSTERMANO                COSTERMANO                45 B      

14 rows selected.

Pitfalls:

  1. &Agrave; may also be written as &AGRAVE;, so matching of the HTML codes must be case-insensitive.
  2. More than one HTML code can affect a column, so I need to search every column for all the HTML codes in the table above.

What I have tried so far is a simple UPDATE with the REPLACE function:

UPDATE TEMP
SET COL1 = REPLACE (COL1, '&Agrave;' , 'À');

Going this way, I will spend days writing the scripts, because I need to carry out this fix in 20+ tables, each with 40+ columns. So I am looking for a simpler way to do this.

Can someone help me out of this writer's cramp?

Also, which is the best way to specify the replacement: using the character itself, or a character code conversion such as CHR(192)?

UPDATE:

What exactly I need:

  1. How to write the replacement value in the UPDATE ... SET clause: either 'À' or CHR(192).

  2. All updates in one statement for one table (maybe a combination of a CASE statement, REGEXP_LIKE and REGEXP_REPLACE will do).

3 Answers

3

You want to use UTL_I18N.unescape_reference.

To avoid writing long scripts by hand, let Oracle generate the statements for you, then run the generated script:

select
   'UPDATE ' || table_name || ' SET ' || column_name || ' = UTL_I18N.unescape_reference(' || column_name || ');'
from
   all_tab_cols
where
   owner = <MY_NAME> -- placeholder: the schema owner, as a quoted string
and 
   table_name in ('....'); -- you can use this clause too: table_name like '%my_table%'
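
Since the comments on this answer ask for one UPDATE per table rather than one per column, the same generator can be sketched with LISTAGG so that all columns of a table land in a single SET list. This is only a sketch under assumptions: it needs Oracle 11g Release 2 or later (for LISTAGG), 'MY_SCHEMA' and the table list are placeholders, and the data_type filter is an addition of mine to skip non-character columns.

```sql
-- Sketch: generate one UPDATE per table that unescapes every character column.
-- 'MY_SCHEMA' and the table list are placeholders, not values from the question.
SELECT 'UPDATE ' || table_name || ' SET '
       || LISTAGG(column_name || ' = UTL_I18N.UNESCAPE_REFERENCE(' || column_name || ')', ', ')
             WITHIN GROUP (ORDER BY column_id)
       || ';' AS update_stmt
  FROM all_tab_columns
 WHERE owner = 'MY_SCHEMA'
   AND table_name IN ('TEMP')
   AND data_type IN ('CHAR', 'VARCHAR2')
 GROUP BY table_name;
```

For the TEMP table from the question this should produce a single statement of the form UPDATE TEMP SET COL1 = UTL_I18N.UNESCAPE_REFERENCE(COL1), COL2 = ... and so on. Note that the LISTAGG result is limited to the maximum VARCHAR2 length in SQL (4000 bytes by default), so a table with many long column names may need to be split.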

4 Comments

+1 for the dynamic SQL. Alex, thanks for your help. But I am not interested in numerous single-column statements. What I am looking for is more like a regex with a CASE-based update, so that I end up with one UPDATE per table.
@realspirituals Do the column names change from table to table?
Yes. Also, my main concern is the replace itself. As of now I am writing a function to find and replace.
Thanks for unescape reference. I would give it to you
1

You can create a procedure that will spool the UPDATE statements into a file, which you can eventually execute to perform the actual updates.

The steps will involve the following:

  1. Create a temp table with two columns storing the HTML Code and the Display value mapping.
  2. Create a procedure that will perform this logic using a cursor:
    • Loop through all the tables to be updated.
    • For each table, identify the columns from either user_tab_columns or all_tab_columns.
    • For each column, loop through the HTML codes in the temp table created in #1, then create an UPDATE statement that replaces each HTML code in the column value with its corresponding Display value.
    • Output each UPDATE statement to the console.
  3. Execute the procedure and spool the results into a file
  4. Execute the spooled file as a script to run the UPDATE statements.

The actual steps may not be the same as above. But the idea is to speed up the task by creating a procedure that will automate the creation of the necessary UPDATE statements.
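
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested implementation: HTML_ENTITY_MAP is a hypothetical table name, only three of the mappings are shown, the table list is a placeholder, and an anonymous block stands in for the stored procedure. It is written for SQL*Plus, where SET DEFINE OFF keeps the & in the HTML codes from being read as a substitution variable.

```sql
SET DEFINE OFF
SET SERVEROUTPUT ON SIZE UNLIMITED

-- Step 1: mapping table (hypothetical name) holding HTML code -> display value.
CREATE TABLE HTML_ENTITY_MAP
(
  HTML_CODE  VARCHAR2(16 CHAR),
  DISPLAY    VARCHAR2(4 CHAR)
);

-- UNISTR avoids depending on the client character set for the literals.
INSERT INTO HTML_ENTITY_MAP VALUES ('&Agrave;', UNISTR('\00C0'));
INSERT INTO HTML_ENTITY_MAP VALUES ('&agrave;', UNISTR('\00E0'));
INSERT INTO HTML_ENTITY_MAP VALUES ('&euro;',   UNISTR('\20AC'));
-- ... the remaining rows follow the table in the question
COMMIT;

-- Steps 2-3: print one UPDATE per (column, HTML code) pair; spool this output.
BEGIN
   FOR C IN (  SELECT table_name, column_name
                 FROM user_tab_columns
                WHERE table_name IN ('TEMP')      -- placeholder table list
                  AND data_type = 'VARCHAR2'
             ORDER BY table_name, column_id)
   LOOP
      FOR M IN (SELECT html_code, display FROM html_entity_map)
      LOOP
         DBMS_OUTPUT.PUT_LINE (
               'UPDATE ' || C.table_name
            || ' SET ' || C.column_name
            || ' = REPLACE(' || C.column_name
            || ', ''' || M.html_code
            || ''', ''' || M.display
            || ''');');
      END LOOP;
   END LOOP;
END;
/
```

Each printed line has the shape UPDATE TEMP SET COL1 = REPLACE(COL1, '&Agrave;', 'À'); so the spooled script also needs SET DEFINE OFF when it is run, and the file encoding must be able to represent the display characters.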

1 Comment

+1 for your response, but doing this in PL/SQL is overkill.
0

OK. I have now found a solution: write a function, as described here. Both of the answers above were useful for this.

Option 1:

CREATE OR REPLACE FUNCTION STRIP_HTML ( DIRTY    IN VARCHAR2,
                                        TO_CVS   IN NUMBER DEFAULT 0 )
   RETURN VARCHAR2
IS
   OUT                   CLOB;

   TYPE ARR_STRING IS VARRAY ( 38 ) OF VARCHAR2 ( 64 );

   ENTITIES_SEARCH_FOR   ARR_STRING;
   ENTITIES_REPLACE      ARR_STRING;
   CONT                  NUMBER;
BEGIN
   -- Nothing to replace in a NULL input, so return early
   IF DIRTY IS NULL
   THEN
      RETURN DIRTY;
   END IF;                                                  

   ENTITIES_SEARCH_FOR :=
      ARR_STRING ( '&Agrave;',
                   '&AGRAVE;',
                   '&agrave;',
                   '&Aacute;',
                   '&AACUTE;',
                   '&aacute;',
                   '&Egrave;',
                   '&EGRAVE;',
                   '&egrave;',
                   '&Eacute;',
                   '&EACUTE;',
                   '&eacute;',
                   '&Igrave;',
                   '&IGRAVE;',
                   '&igrave;',
                   '&Iacute;',
                   '&IACUTE;',
                   '&iacute;',
                   '&Ograve;',
                   '&OGRAVE;',
                   '&ograve;',
                   '&Oacute;',
                   '&OACUTE;',
                   '&oacute;',
                   '&Ugrave;',
                   '&UGRAVE;',
                   '&ugrave;',
                   '&Uacute;',
                   '&UACUTE;',
                   '&uacute;',
                   '&laquo;',
                   '&LAQUO;',
                   '&raquo;',
                   '&RAQUO;',
                   '&euro;',
                   '&EURO;',
                   '&deg;',
                   '&DEG;' );

   ENTITIES_REPLACE :=
      ARR_STRING ( 'À',
                   'À',
                   'à',
                   'Á',
                   'Á',
                   'á',
                   'È',
                   'È',
                   'è',
                   'É',
                   'É',
                   'é',
                   'Ì',
                   'Ì',
                   'ì',
                   'Í',
                   'Í',
                   'í',
                   'Ò',
                   'Ò',
                   'ò',
                   'Ó',
                   'Ó',
                   'ó',
                   'Ù',
                   'Ù',
                   'ù',
                   'Ú',
                   'Ú',
                   'ú',
                   '«',
                   '«',
                   '»',
                   '»',
                   '€',
                   '€',
                   '°',
                   '°' );

   OUT         := DIRTY;

   FOR CONT IN 1 .. ENTITIES_SEARCH_FOR.COUNT
   LOOP
      OUT         :=
         REPLACE ( OUT,
                   ENTITIES_SEARCH_FOR ( CONT ),
                   ENTITIES_REPLACE ( CONT ) );
   END LOOP;

   RETURN (OUT);
END STRIP_HTML;
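
With the function in place, one UPDATE per table is enough. A minimal usage sketch for the TEMP table from the question follows; the WHERE clause is my own addition, meant only to skip rows that contain no & at all.

```sql
-- SQL*Plus only: keep '&' from being read as a substitution variable.
SET DEFINE OFF

-- One statement per table: every column goes through STRIP_HTML.
UPDATE TEMP
   SET COL1 = STRIP_HTML (COL1),
       COL2 = STRIP_HTML (COL2),
       COL3 = STRIP_HTML (COL3),
       COL4 = STRIP_HTML (COL4)
 WHERE INSTR (COL1 || COL2 || COL3 || COL4, '&') > 0;  -- rows without '&' have nothing to fix

COMMIT;
```

Because Oracle treats NULL as an empty string when concatenating, rows with an empty COL2 are still checked correctly by the INSTR filter.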

Option 2:

SELECT UTL_I18N.unescape_reference(COL1) COL1,
       UTL_I18N.unescape_reference(COL2) COL2,
       UTL_I18N.unescape_reference(COL3) COL3,
       UTL_I18N.unescape_reference(COL4) COL4
  FROM TEMP;

But this cannot handle HTML codes written entirely in uppercase (for example &AGRAVE;), so those need an additional replace.
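
One way to combine the two options might look like the sketch below: first rewrite the all-uppercase spellings to the standard ones, then hand the result to UTL_I18N.UNESCAPE_REFERENCE. UNESCAPE_HTML is a hypothetical helper name, the list covers only the codes from the question's table, and, as in Option 1, an all-uppercase code such as &AGRAVE; is assumed to mean the capital letter.

```sql
-- SQL*Plus only: keep '&' from being read as a substitution variable.
SET DEFINE OFF

-- UNESCAPE_HTML is a hypothetical helper name, not part of Oracle.
CREATE OR REPLACE FUNCTION UNESCAPE_HTML (DIRTY IN VARCHAR2)
   RETURN VARCHAR2
IS
   TYPE NAME_LIST IS TABLE OF VARCHAR2 (16);

   -- Entity names in their standard spelling, taken from the question's table.
   ENTITY_NAMES   NAME_LIST
      := NAME_LIST ('Agrave', 'Aacute', 'Egrave', 'Eacute',
                    'Igrave', 'Iacute', 'Ograve', 'Oacute',
                    'Ugrave', 'Uacute', 'laquo', 'raquo',
                    'euro', 'deg');
   CLEANED        VARCHAR2 (32767) := DIRTY;
BEGIN
   IF DIRTY IS NULL
   THEN
      RETURN DIRTY;
   END IF;

   -- Rewrite all-uppercase variants (e.g. &AGRAVE;) to the standard spelling.
   FOR I IN 1 .. ENTITY_NAMES.COUNT
   LOOP
      CLEANED :=
         REPLACE (CLEANED,
                  '&' || UPPER (ENTITY_NAMES (I)) || ';',
                  '&' || ENTITY_NAMES (I) || ';');
   END LOOP;

   -- Let Oracle decode the entities, which are now in standard spelling.
   RETURN UTL_I18N.UNESCAPE_REFERENCE (CLEANED);
END UNESCAPE_HTML;
/
```

A call such as SELECT UNESCAPE_HTML ('LOCALIT&AGRAVE; PREDONDO') FROM DUAL; should then return the same value as for the &Agrave; spelling, and the function can be used in a single UPDATE per table in place of STRIP_HTML.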
