
I have a string 'TICKER: IBM IBM Corporation Inc.' and I want to remove the ticker label and its value and keep just the remainder, in Oracle PL/SQL.

So I made this query but it is not working the way I intended:

SELECT REGEXP_REPLACE(
           'TICKER: IBM IBM Corporation Inc.',
           '(.*):[[:space:]](.*)[[:space:]](.*)', '\3')
      FROM dual;

I was hoping that '\3' would yield me 'IBM Corporation Inc.' but I get just 'Inc.' as the result.

REGEXP_REPLACE('TICKER:IBMIBMCORPORATIONINC.','(.*):[[:SPACE:]](.*)[[:SPACE:]](.*)','\3') 
----------------------------------------------------------------------------- 
Inc.                                                                                      

1 rows selected

Update:

SELECT REGEXP_REPLACE(
       'TICKER: IBM IBM Corporation Inc.',
       '(.*):[[:space:]](.*)[[:space:]](.*)', '\1|\2|\3')
  FROM dual;

Result:

REGEXP_REPLACE('TICKER:IBMIBMCORPORATIONINC.','(.*):[[:SPACE:]](.*)[[:SPACE:]](.*)','\1|\2|\3') 
-------------------------------------------------------------------------------- 
TICKER|IBM IBM Corporation|Inc.

What am I missing in the regular expression?

Thanks.


2 Answers

2
SELECT REGEXP_REPLACE(
       'TICKER: IBM IBM Corporation Inc.',
       '(.*):[[:space:]]([^ ]*)[[:space:]](.*)', '\3')
  FROM dual;

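Your second capturing group, (.*), is greedy: it grabbed as much as it could, including the spaces inside the company name, and only gave back enough for the final [[:space:]](.*) to match. That is why group 2 came out as 'IBM IBM Corporation' and group 3 as just 'Inc.'. Using [^ ]* for the second group stops it at the first space.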
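To see the difference, the three groups can be printed side by side with the corrected pattern, the same way the question's update does. This is only a quick check against the sample string from the question:

```sql
-- Same '\1|\2|\3' diagnostic as in the question, but with [^ ]* as the second group,
-- which cannot cross a space and therefore captures only the ticker value.
SELECT REGEXP_REPLACE(
       'TICKER: IBM IBM Corporation Inc.',
       '(.*):[[:space:]]([^ ]*)[[:space:]](.*)', '\1|\2|\3') AS groups
  FROM dual;
-- groups: TICKER|IBM|IBM Corporation Inc.
```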

I should mention that I tested this in plain Oracle SQL, not inside a PL/SQL block. I would not expect any difference, though.

PS: the following alternatives work as well:

-- using only one capturing expression
SELECT REGEXP_REPLACE(
       'TICKER: IBM IBM Corporation Inc.',
       '.*: [^ ]* (.*)', '\1')
  FROM dual;

-- using no capturing expressions
SELECT REGEXP_REPLACE(
       'TICKER: IBM IBM Corporation Inc.',
       '.*: [^ ]* ', '')
  FROM dual;

2 Comments

I was thinking along the same lines: replace the first two words with NULL, assuming they are always there and the value (symbol?) is always a single word: '\w+: \w+ '
It should probably be tightened up a little by anchoring it to the start of the string: '^\w+: \w+ '.
1
SELECT REGEXP_REPLACE(
           'TICKER: IBM IBM Corporation Inc.',
           '^(.*?):\s(\S*)\s(.*)$',
           '\3'
       )
FROM DUAL;

Alternatively, your original code needs only small changes to work: anchor it to the start of the string and make the first two wildcard matches non-greedy:

SELECT REGEXP_REPLACE(
           'TICKER: IBM IBM: Corporation Inc.',
           '^(.*?):[[:space:]](.*?)[[:space:]](.*)',
           '\3'
       )
FROM DUAL;
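
If the goal is to extract the name rather than replace the prefix, REGEXP_SUBSTR with its subexpression argument might be another option. This is a sketch, not a tested drop-in: it assumes Oracle 11g or later (where the sixth argument is available), that the label before the first colon contains no colon itself, and that the ticker value is always a single run of non-space characters.

```sql
SELECT REGEXP_SUBSTR(
           'TICKER: IBM IBM Corporation Inc.',
           '^[^:]*:\s+\S+\s+(.*)$',
           1,     -- start searching at position 1
           1,     -- first occurrence of the pattern
           NULL,  -- no match parameter
           1      -- return the first capture group only
       ) AS company_name
FROM DUAL;
```

One difference worth keeping in mind: when the pattern does not match, REGEXP_SUBSTR returns NULL, whereas REGEXP_REPLACE returns the input string unchanged.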

5 Comments

Unlikely, but one never knows. Try it with a company name containing a colon: 'TICKER: IBM :IBM Co:rporation: Inc.'. It goes to show that a query should be run to check for colons in the data first, I guess.
Good point on description containing a colon. Your query seems to be handling it perfectly. Thanks. +1
@JKK Always expect the unexpected! Depending on the source of the data and how well it is (or most likely isn't) validated, all kinds of crud can be accepted and end up in the database. Always do some sanity checking against the data before making assumptions like "the company names will never contain a colon" :-)
Yep, agreed. In my scenario, this data is always well maintained (because it is produced by another layer) and free from any non-alpha or special characters. Having said that, I'm going with the suggested approach, just in case. Thanks.
@JKK fixed the : issue and also added a simple fix for your original query.
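
The sanity check suggested in the comments above could look something like the following. The table name securities and the column name description are made up for illustration, so substitute whatever holds the strings. The idea is simply to list rows that contain more than the one expected colon.

```sql
-- Hypothetical table and column names; adjust to the real schema.
-- INSTR(str, ':', 1, 2) returns the position of the second colon, or 0 if there is none.
SELECT description
FROM   securities
WHERE  INSTR(description, ':', 1, 2) > 0;
```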
