
I have a table with a CLOB column that I need to extract data from. The column looks like this:

Column1
Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021. etc..

I would like to extract only the data between each equals sign (=) and the period (.) immediately following it, so that the final output looks like this:

Name        Branch_Number  Type_of_Event  Date
John Smith  12345          Seminar        06/22/2021

I've tried this:

Select
  regexp_substr(Column1,'\Name=([^.]+)',1,1,null,1) as Name
, regexp_substr(Column1,'\Branch Number=([^.]+)',1,1,null,1) as Branch_Number
, regexp_substr(Column1,'\Type of Event=([^.]+)',1,1,null,1) as Type_of_Event
, regexp_substr(Column1,'\Date=([^.]+)',1,1,null,1) as Date_of_Event

From table1

Where...

I suspect there are mistakes in the '1,1,null,1' arguments (I only found them in examples online), because only the first column returns anything; the other three come back blank.

Is there a way to extract each value between an equals sign and the period immediately following it into a separate column?

Any help would be great; thank you in advance. Apologies if my code makes anyone cringe: I only recently started using Oracle SQL Developer, and this is my first time using regular expressions.

Working but messy solution:

Select
 SUBSTR(Column1, INSTR(Column1, 'Name=', 1, 1) + length('Name='), INSTR(Column1, '.', INSTR(Column1, 'Name=', 1, 1), 1) - (INSTR(Column1, 'Name=', 1, 1) + length('Name='))) as Name
, SUBSTR(Column1, INSTR(Column1, 'Branch Number=', 1, 1) + length('Branch Number='), INSTR(Column1, '.', INSTR(Column1, 'Branch Number=', 1, 1), 1) - (INSTR(Column1, 'Branch Number=', 1, 1) + length('Branch Number='))) as Branch_Number
, SUBSTR(Column1, INSTR(Column1, 'Type of Event=', 1, 1) + length('Type of Event='), INSTR(Column1, '.', INSTR(Column1, 'Type of Event=', 1, 1), 1) - (INSTR(Column1, 'Type of Event=', 1, 1) + length('Type of Event='))) as Type_of_Event
, etc...
From table1
Where ...
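The SUBSTR/INSTR arithmetic is easy to get wrong because both functions use 1-based positions and SUBSTR takes a length, not an end position: the length is the position of the next period minus the position where the value starts. As a hedged illustration, here is a small Python model (not Oracle itself) of that logic; `instr`, `substr` and `extract` are hypothetical helper names that only mimic the parts of INSTR and SUBSTR used here.

```python
def instr(s, sub, start=1):
    """Model of Oracle INSTR(s, sub, start): 1-based position of the first
    occurrence of sub at or after position start, or 0 if there is none."""
    return s.find(sub, start - 1) + 1


def substr(s, start, length):
    """Model of Oracle SUBSTR(s, start, length) for start >= 1.
    Oracle returns NULL for a length below 1; None stands in for NULL."""
    if length < 1:
        return None
    return s[start - 1:start - 1 + length]


def extract(col, label):
    """Text between 'label=' and the next period, or None if the label is absent."""
    key = label + "="
    pos = instr(col, key)
    if pos == 0:
        # Guard added for this sketch; the SQL expression has no such check.
        return None
    value_start = pos + len(key)
    dot = instr(col, ".", pos)  # first period after the label, as in the SQL
    # Length = period position minus value start (not plus len(key)).
    return substr(col, value_start, dot - value_start)


row = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021."
for label in ("Name", "Branch Number", "Type of Event", "Date", "Other Speaker"):
    print(f"{label}: {extract(row, label)}")
```

Because each label is searched for by name, this style does not depend on field order, unlike approaches that count the n-th '=' sign.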

4 Answers

Answer (score 1)

This works fine:

Select regexp_substr(Column1, 'Name=([^.]+)',1,1,null,1) as Name,
       regexp_substr(Column1, 'Branch Number=([^.]+)',1,1,null,1) as Branch_Number,
       regexp_substr(Column1, 'Type of Event=([^.]+)',1,1,null,1) as Type_of_Event,
       regexp_substr(Column1, 'Date=([^.]+)',1,1,null,1) as Date_of_Event
from (select 'Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021. etc..' as column1 from dual) t;

The only modifications to your code are:

  • Removing the leading \, although that doesn't really make a difference to the results.
  • Using the correct prefix for Date.
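For reference, the trailing 1,1,null,1 arguments of REGEXP_SUBSTR are, in order: the start position (1 = first character), the occurrence (1 = first match), the match parameter (NULL = defaults), and the subexpression (1 = return only what the first parenthesized group captured, not the whole match). A rough Python analogue is sketched below; Python's re module is not Oracle's regex engine, but for a simple pattern like Label=([^.]+) they should behave the same. The helper name `regexp_value` is hypothetical.

```python
import re


def regexp_value(source, label):
    """Rough analogue of
    REGEXP_SUBSTR(source, label || '=([^.]+)', 1, 1, NULL, 1):
    first match from the start of the string, returning capture group 1,
    or None (standing in for SQL NULL) if the label does not occur."""
    # [^.]+ means "one or more characters that are not a period".
    match = re.search(re.escape(label) + r"=([^.]+)", source)
    return match.group(1) if match else None


row = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021. etc.."
for label in ("Name", "Branch Number", "Type of Event", "Date"):
    print(label, "->", regexp_value(row, label))
```

Since each field is looked up independently, field order does not matter, and an absent label gives None, just as REGEXP_SUBSTR returns NULL when the pattern does not match.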

Here is a db<>fiddle.

Answer (score 0)

I don't know about regular expressions, but SUBSTR + INSTR do this in a simple manner:

with test (col) as
  (select 'Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021.' from dual)
select substr(col, instr(col, '=', 1, 1) + 1,
                   instr(col, '.', 1, 1) - instr(col, '=', 1, 1) - 1
             ) as name,
       substr(col, instr(col, '=', 1, 2) + 1,
                   instr(col, '.', 1, 2) - instr(col, '=', 1, 2) - 1
             ) as branch_number,
       substr(col, instr(col, '=', 1, 3) + 1,
                   instr(col, '.', 1, 3) - instr(col, '=', 1, 3) - 1
             ) as event,
       substr(col, instr(col, '=', 1, 4) + 1,
                   instr(col, '.', 1, 4) - instr(col, '=', 1, 4) - 1
             ) as datum
from test;

NAME       BRANC EVENT   DATUM
---------- ----- ------- ----------
John Smith 12345 Seminar 06/22/2021
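This approach pairs the n-th '=' with the n-th '.', so it assumes every row lists the same fields in the same order and that no value contains a period or an equals sign. The simplified Python model below (the name `nth_field` is hypothetical) reproduces that positional logic and shows what happens when a value such as "J. Smith" contains a period.

```python
def nth_field(col, n):
    """Simplified model of
    SUBSTR(col, INSTR(col, '=', 1, n) + 1,
           INSTR(col, '.', 1, n) - INSTR(col, '=', 1, n) - 1)
    with n 1-based. Returns None (standing in for NULL) when there is no
    such '='/'.' pair or the resulting length is below 1."""
    equals = [i for i, ch in enumerate(col) if ch == "="]
    periods = [i for i, ch in enumerate(col) if ch == "."]
    if n > len(equals) or n > len(periods):
        return None
    start, end = equals[n - 1] + 1, periods[n - 1]
    return col[start:end] if end > start else None


row = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021."
print([nth_field(row, n) for n in range(1, 5)])
# ['John Smith', '12345', 'Seminar', '06/22/2021']

messy = "Name=J. Smith. Branch Number=12345."
print([nth_field(messy, n) for n in range(1, 3)])
# ['J', None]: the extra period shifts every later pairing
```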

Comments

Simple. However, I'm not sure what the OP means by the "etc." in "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021. etc.". Maybe there are more fields than shown? In that case your solution wouldn't be dynamic.
@jaime-drq I do have many, many more data fields in the column to extract. Another wrinkle is that not every row contains the same field names, and they aren't always in the same order. Some rows include, for example, 'Other Speaker=Tammy Wilson.', others don't. The solution from Littlefoot works only if every row has exactly the same structure. Unfortunately this table is populated by a messy user interface.
@user16292871, in my opinion you first need to normalize your data. A SQL query returns the same set of columns for every row; you can't get 4 columns in one row and 10 in the next, so I'm afraid your REGEXP_SUBSTR approach won't work either.
You know how it goes. Garbage in, garbage out.
I need the query to check each row against a large list of field names, where a row can contain essentially any combination of them. If a row does contain 'Other Speaker=Fred Flintstone.', I need to extract Fred Flintstone; if that field isn't there, I guess the result should be NULL.
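What the last comment describes is a fixed list of output columns, each filled from whichever "Field=value." pairs a row happens to contain, with NULL for absent fields. A minimal Python sketch of that idea follows, assuming (as the answers do) that periods only ever end a field and that field names never contain '='. `FIELDS` and `to_columns` are hypothetical names, and the field list and the second sample row are illustrative only.

```python
# Hypothetical list of known field names; the real list would be much longer.
FIELDS = ["Name", "Branch Number", "Type of Event", "Date", "Other Speaker"]


def to_columns(col, fields=FIELDS):
    """Split 'A=x. B=y.' into pairs, then project onto a fixed field list.
    Fields missing from the row come back as None (SQL NULL); fields not in
    the list are ignored; the order of fields within the row does not matter."""
    pairs = {}
    for piece in col.split("."):
        key, sep, value = piece.partition("=")
        if sep:  # skip fragments without '=', e.g. the empty tail after the last period
            pairs[key.strip()] = value.strip()
    return [pairs.get(field) for field in fields]


row_a = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021."
row_b = "Other Speaker=Fred Flintstone. Name=Jane Doe. Date=07/01/2021."  # hypothetical row
print(to_columns(row_a))
print(to_columns(row_b))
```

In SQL, the same projection corresponds to one REGEXP_SUBSTR per known field name, since REGEXP_SUBSTR returns NULL when its pattern does not match.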
Answer (score 0)

The only way I know to do something dynamic in pure SQL is with a recursive CTE. This kind of query doesn't perform very well, and my example returns your data as rows (it's just an example, not your expected output).

If you replace the sample record with one that has more fields, all of them are returned without changing the query.

with
d (col, val, n) as (
  -- anchor row: the source string, no value yet, counter 0
  select
    'Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021.' as col,
    cast(null as varchar2(4000)) as val,
    0 as n
  from dual
  union all
  -- recursive step: text between the (n+1)-th '=' and the (n+1)-th '.'
  select
    col,
    substr(
      col,
      instr(col, '=', 1, n + 1) + 1,
      instr(col, '.', 1, n + 1) - instr(col, '=', 1, n + 1) - 1
    ),
    n + 1
  from d
  -- stop once there is no further value (SUBSTR returns NULL)
  where
    substr(
      col,
      instr(col, '=', 1, n + 1) + 1,
      instr(col, '.', 1, n + 1) - instr(col, '=', 1, n + 1) - 1
    ) is not null
)
select val from d where val is not null order by n

You can test it on this db<>fiddle.
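The recursive CTE is effectively a loop: each step takes the value between the next '=' and the next '.', and the WHERE clause stops the recursion at the first NULL. A Python model of that loop (the name `field_values` is hypothetical) makes one consequence visible: because Oracle's SUBSTR returns NULL for a length below 1, a field with an empty value such as "Other Speaker=." ends the recursion early, and the fields after it are not returned.

```python
def field_values(col):
    """Model of the recursive CTE: for n = 1, 2, 3, ... take the text between
    the n-th '=' and the n-th '.', stopping at the first empty (NULL) result."""
    equals = [i for i, ch in enumerate(col) if ch == "="]
    periods = [i for i, ch in enumerate(col) if ch == "."]
    values = []
    for eq, dot in zip(equals, periods):
        value = col[eq + 1:dot]
        if not value:  # SUBSTR with length < 1 is NULL, so the recursion stops here
            break
        values.append(value)
    return values


row = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021."
print(field_values(row))
print(field_values("Name=John Smith. Other Speaker=. Date=06/22/2021."))
```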

Answer (score 0)

If you're okay with the data being in rows instead of columns, this can be more dynamic. You can use XMLTABLE to split the string into rows at each period (assuming the only periods are the ones that end each field, and that the data contains no double quotes, which would break the generated XQuery string). After splitting into rows, use SUBSTR and INSTR to split each string into field name and value.

with data as (
select 'Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021.' Column1 from dual)
select substr(trim(COLUMN_VALUE),1,instr(trim(COLUMN_VALUE),'=')-1) FieldName,
       substr(trim(COLUMN_VALUE),instr(trim(COLUMN_VALUE),'=')+1) Val,
       trim(COLUMN_VALUE) full_text
from data,
-- rtrim drops the final period so the last item is not an empty string
xmltable(('"'
|| replace(rtrim(data.Column1, '.'), '.', '","')
|| '"'));

Output:

+---------------+------------+-----------------------+
|   FIELDNAME   |    VAL     |       FULL_TEXT       |
+---------------+------------+-----------------------+
| Name          | John Smith | Name=John Smith       |
| Branch Number | 12345      | Branch Number=12345   |
| Type of Event | Seminar    | Type of Event=Seminar |
| Date          | 06/22/2021 | Date=06/22/2021       |
+---------------+------------+-----------------------+
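The XMLTABLE trick turns "A=x. B=y." into the XQuery sequence "A=x"," B=y","" and returns one row per item. The Python model below (the name `split_fields` is hypothetical) performs the same split and shows why the final period matters: it leaves an empty trailing item, which would presumably come back as an extra row of NULLs unless the trailing period is removed first, for example with RTRIM(Column1, '.'). The model assumes every non-empty item contains '='.

```python
def split_fields(col):
    """Model of XMLTABLE(('"' || REPLACE(col, '.', '","') || '"')) followed by
    the SUBSTR/INSTR step: one (field_name, value, full_text) tuple per item."""
    rows = []
    for item in col.split("."):  # each '.' becomes an item boundary
        text = item.strip()      # TRIM(COLUMN_VALUE)
        if not text:
            # Oracle treats '' as NULL; assumed here to surface as an all-NULL row.
            rows.append((None, None, None))
            continue
        # Assumes the item contains '=' (true for the sample data).
        field, _, value = text.partition("=")
        rows.append((field, value, text))
    return rows


row = "Name=John Smith. Branch Number=12345. Type of Event=Seminar. Date=06/22/2021."
print(len(split_fields(row)))              # four fields plus one empty trailing item
print(len(split_fields(row.rstrip("."))))  # rstrip mirrors RTRIM(Column1, '.')
```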
