I'm working with Oracle and JDBC to set up a database with a list of movies. The fields that load are MovieId, MovieTitle, and Genre. Example format:

MovieId    MovieTitle          Genre
1          Toy Story (1997)    Animated

Now I need to split my MovieTitle to list the year 1997 in a separate column called "Year". I got it to work earlier using this:

SELECT SUBSTR(Movies.MovieTitle, 1, INSTR(Movies.MovieTitle, '(')-1) AS MovieTitle,
   SUBSTR(Movies.MovieTitle, INSTR(Movies.MovieTitle, ')')) AS Year
 FROM MOVIES;

However, that is no good, because some of my movies have parentheses in their titles. So I believe I need to use a regex, but I can't get it to work. Here is what I have been playing around with:

 WITH TEST AS
(SELECT MovieTitle FROM Movies)
SELECT REGEXP_SUBSTR(Movies.MovieTitle, '^\(\d{4}\)$', 1, 1) MovieTitle,
   REGEXP_SUBSTR(Movies.MovieTitle, '^\(\d{4}\)$', 1, 2) Year
FROM Movies;

All that gives me is two null columns for all my movies. Am I on the right track with this or way off? Another concern is that I want this to be an update on my original Movies table, not a new query or table of its own. Thanks for any suggestions.

1 Answer

Note that ^ is the start-of-string anchor and $ is the end-of-string anchor. So ^\(\d{4}\)$ only matches a string that consists entirely of something like (1234), and there is no point in setting the start position or the occurrence number, since the whole string is required to match the pattern. A title such as Toy Story (1997) never matches, which is why you get NULL.

You may use

REGEXP_SUBSTR(Movies.MovieTitle, '\((\d{4})\)', 1, 1, NULL, 1)

It will extract the first 4-digit sequence that is enclosed in ( and ), without the parentheses themselves.

Details:

  • \( - a literal (
  • (\d{4}) - Group 1 (selected by the final argument, 1, of REGEXP_SUBSTR), capturing exactly 4 consecutive digits
  • \) - a literal ).
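
To see the difference between the anchored and unanchored patterns outside the database, here is a minimal sketch that uses Java's java.util.regex as a stand-in for Oracle's regex engine (an assumption: the two engines are different, but they should agree on a pattern this simple). The class name and the second sample title are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; Java's regex engine stands in for Oracle's here.
final class YearPatternDemo {
    // Unanchored pattern from the answer: literal "(", four digits (group 1), literal ")".
    private static final Pattern YEAR = Pattern.compile("\\((\\d{4})\\)");
    // Anchored pattern from the question: the whole string must be "(dddd)".
    private static final Pattern ANCHORED = Pattern.compile("^\\(\\d{4}\\)$");

    // Returns the first parenthesized four-digit group, or null if there is none.
    static String extractYear(String title) {
        Matcher m = YEAR.matcher(title);
        return m.find() ? m.group(1) : null;
    }

    // True only when the entire string is a parenthesized four-digit number.
    static boolean matchesAnchored(String title) {
        return ANCHORED.matcher(title).find();
    }

    public static void main(String[] args) {
        System.out.println(extractYear("Toy Story (1997)"));           // 1997
        System.out.println(extractYear("Heavy Metal (Uncut) (1981)")); // 1981
        System.out.println(matchesAnchored("Toy Story (1997)"));       // false
        System.out.println(matchesAnchored("(1997)"));                 // true
    }
}
```

The second title shows that other parenthesized text in the title is skipped, because only four digits between parentheses can match.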

See the online demo.

7 Comments

Great! That definitely helps with extracting the numbers, but how do I get a separate column in my movies table?
@Polyphase29: That is no longer a regex question. I think the column needs to exist in the table first, and then you just write the extracted value into it with an UPDATE.
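
A minimal JDBC sketch of that idea, under some assumptions that are not from the thread: the Year column does not exist yet, NUMBER(4) is an acceptable type for it, and an open java.sql.Connection is obtained elsewhere. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; assumes the Movies table from the question.
final class AddYearColumn {
    // Assumption: the column is new and NUMBER(4) suits a four-digit year.
    static final String ADD_COLUMN_SQL =
        "ALTER TABLE Movies ADD (Year NUMBER(4))";

    // Each backslash meant for Oracle is written as \\ in Java source.
    static final String FILL_YEAR_SQL =
        "UPDATE Movies SET Year = "
      + "TO_NUMBER(REGEXP_SUBSTR(MovieTitle, '\\((\\d{4})\\)', 1, 1, NULL, 1))";

    // Adds the column, fills it in place, and returns the number of rows updated.
    static int addAndFillYear(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(ADD_COLUMN_SQL);
            return stmt.executeUpdate(FILL_YEAR_SQL);
        }
    }
}
```

Rows whose title has no parenthesized year simply get NULL in the new column. If auto-commit is turned off on the connection, the UPDATE still needs an explicit commit.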
That SQL works when I run it on my database in SQL Developer. However, when I try to execute it from my Java code, Java wants to treat the backslashes as its own escape sequences, and I think that is messing up my SQL statement. After some searching, I thought changing my regex to '\\\((\\\\d{4})\\\)' would fix it, but it just results in null each time. Any ideas on what I can change to make it work?
@Polyphase29: You may avoid backslashes altogether by enclosing the literal parentheses in [...] and changing \d to [0-9]: [(]([0-9]{4})[)]
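
As a rough illustration of both options in Java source (the class and constant names are made up): in a Java string literal, \\ produces a single backslash, so the pattern that works in SQL Developer needs each of its backslashes doubled once, while the bracket form needs no escaping at all.

```java
// Hypothetical holder class for the two SQL fragments.
final class SqlRegexLiterals {
    // Doubled backslashes: Oracle receives '\((\d{4})\)'.
    static final String ESCAPED =
        "REGEXP_SUBSTR(MovieTitle, '\\((\\d{4})\\)', 1, 1, NULL, 1)";

    // Bracket form: no backslashes, so nothing for Java to interpret.
    static final String BRACKETED =
        "REGEXP_SUBSTR(MovieTitle, '[(]([0-9]{4})[)]', 1, 1, NULL, 1)";

    public static void main(String[] args) {
        // Printing the strings shows exactly what will be sent to the database.
        System.out.println(ESCAPED);
        System.out.println(BRACKETED);
    }
}
```

Printing the statement before executing it is a quick way to check that the text matches what was typed into SQL Developer.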
Why not use a REGEXP_REPLACE directly? See this online demo.
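
The linked demo is not reproduced here, so the following is only one possible reading of that suggestion: use REGEXP_REPLACE to strip a trailing "(dddd)" from the title. The SQL string, the class name, and the sample titles are assumptions, and the local helper uses Java's own regex engine as a stand-in for Oracle's.

```java
// Hypothetical sketch of removing the trailing year from a title.
final class TitleYearSplit {
    // Assumed Oracle statement: drop an optional space plus "(dddd)" at the end of the title.
    static final String STRIP_YEAR_SQL =
        "UPDATE Movies SET MovieTitle = "
      + "REGEXP_REPLACE(MovieTitle, '\\s*\\(\\d{4}\\)\\s*$', '')";

    // Local stand-in using java.util.regex, handy for checking the pattern.
    static String stripYear(String title) {
        return title.replaceAll("\\s*\\(\\d{4}\\)\\s*$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripYear("Toy Story (1997)"));           // Toy Story
        System.out.println(stripYear("Heavy Metal (Uncut) (1981)")); // Heavy Metal (Uncut)
    }
}
```

If the year is also being copied into its own column, that has to happen before or in the same UPDATE as this replacement, since afterwards the title no longer contains it.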