I make no claims about performance, and I wouldn't run this in a PRODUCTION ENVIRONMENT until it has been vetted and the load/performance impact considered.
- I use 2 CTEs to simulate your data (SearchWords and Sentences)
- I use instr() to find the position of each word in a sentence
- I use listAgg() to combine the data into one row for each sentence in which words are found.
- I only return occurrences where a word is found in a sentence
- I use CROSS JOIN so each search word is compared against every sentence. This could get UGLY in terms of memory and CPU usage, since the intermediate data set will be huge: thousands of words times hundreds of millions of sentences.
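As a small illustration of the instr() step in the list above, here is a sketch of what it returns for a hit and for a miss, which is why filtering on > 0 keeps only real matches. The column aliases and the 'plutonium mine' string are made up for this example and are not part of your data.

```sql
-- instr() returns the 1-based position of the first match, or 0 when there is none.
SELECT instr('we discover pluto and jupiter', 'pluto') AS found_pos,     -- 13
       instr('we go back to earth', 'pluto')           AS missing_pos,   -- 0
       -- instr() matches substrings, not whole words, so this also counts as a hit.
       instr('plutonium mine', 'pluto')                AS substring_pos  -- 1
FROM dual
```

Note the third column: if whole-word matching matters for your data, plain instr() will over-match and you would need some extra boundary check.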
This is likely better done with a text search, but I'm not sure how I'd get the data format you are looking for that way. *shrug* It may be fine if it's a one-time thing, you have the time to wait, and it runs in an environment where you won't bring down production.
DEMO:
https://dbfiddle.uk/?rdbms=oracle_21&fiddle=77e0b8d9373ee1abc14cf10342c45767
WITH SearchWords AS (
  -- Simulated search words
  SELECT 1 ID, 'pluto' SearchWord FROM dual UNION ALL
  SELECT 2, 'jupiter' FROM dual),
Sentences AS (
  -- Simulated sentences
  SELECT 1 ID, 'we go back to earth' Sentence FROM dual UNION ALL
  SELECT 2, 'we discover pluto and jupiter' FROM dual),
Step1 AS (
  -- One row per sentence: a list of (word ID, position) pairs
  SELECT S.ID,
         LISTAGG('(' || W.ID || ',' || instr(S.Sentence, W.SearchWord) || ')', ',')
           WITHIN GROUP (ORDER BY W.ID) Result
  FROM Sentences S
  CROSS JOIN SearchWords W
  WHERE instr(S.Sentence, W.SearchWord) > 0
  GROUP BY S.ID)
SELECT * FROM Step1
The Step1 CTE isn't really needed, but I wasn't sure it was going to work out of the gate.
Giving us:
+----+---------------+
| ID | RESULT        |
+----+---------------+
| 2 | (1,13),(2,23) |
+----+---------------+
If needed, you could subdivide the sentences into processing groups: process some, then union in more, and so on, to manage the hit. But if your environment is sufficiently large, it may be able to handle it in one go.
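A rough sketch of that batching idea, assuming the sentence IDs are numeric and you simply walk through them in ranges. The Batch CTE and the hard-coded range 2 to 2 are my own placeholders; against real tables you would likely swap in bind variables (say :lo and :hi) and rerun per range.

```sql
-- Sketch only: process one ID range per run instead of every sentence at once.
WITH SearchWords AS (
  SELECT 1 ID, 'pluto' SearchWord FROM dual UNION ALL
  SELECT 2, 'jupiter' FROM dual),
Sentences AS (
  SELECT 1 ID, 'we go back to earth' Sentence FROM dual UNION ALL
  SELECT 2, 'we discover pluto and jupiter' FROM dual),
Batch AS (
  -- Assumed batching rule: a contiguous ID range (hypothetical bounds).
  SELECT ID, Sentence FROM Sentences WHERE ID BETWEEN 2 AND 2)
SELECT B.ID,
       LISTAGG('(' || W.ID || ',' || instr(B.Sentence, W.SearchWord) || ')', ',')
         WITHIN GROUP (ORDER BY W.ID) Result
FROM Batch B
CROSS JOIN SearchWords W          -- the cross join now only spans the current batch
WHERE instr(B.Sentence, W.SearchWord) > 0
GROUP BY B.ID
```

Each run only cross joins the search words against the sentences in the current range, so the intermediate set stays bounded by the batch size. The per-batch results can then be inserted into a holding table or combined with UNION ALL.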