1

My table looks like the following:

id | value1 | count

I have a list of value1 in RAM and I want to do the following:

    (if value1 exists in table){
    count + 1}else{
    insert new row into table}

Is this possible with Oracle or do I have to take it to the code, do a for loop and execute one element of the list at a time? The list contains 5 million values. I'd have to do something like this in the code:

for(int i=0; i<list.size; i++){
  boolean exists = checkifexists(list.get(i));
  if(exists=true){
    countPlusOne(list.get(i);
  }else{
    createNewRow(list.get(i));
  }
}

So I have to do at least two queries for each value, totalling 10m+ queries. This could take a long time and may not be the most efficient way to do this. I'm trying to think of another way.

11
  • Possible duplicate of stackoverflow.com/questions/6351823/… Commented Dec 28, 2018 at 7:30
  • is it possible to get your 5 million values into an oracle table ? Commented Dec 28, 2018 at 8:07
  • @KevinBurton I think this will be another concern, which I need to answer for Commented Dec 28, 2018 at 8:12
  • 1
    Where do these five million entries appear from? They don't magically appear in RAM so there must be a process which populates the list. Also, when you say RAM which tier of the application do you mean? Local PC? App server? Database server? Another thing: what's the list's data quality like? Are those 5 million unique values or can there be duplicates? Commented Dec 28, 2018 at 8:28
  • 1
    @APC excellent point! Commented Dec 28, 2018 at 8:32

4 Answers 4

1

"I load them into RAM from the database"

You already have the source data in the database so you should do the processing in the database. Instantiating a list of 5 million strings in local memory is not a cheap operation, especially when it's unnecessary.

Oracle supports a MERGE capability which we can use to test whether a record exists in the target table and populate a new row conditionally. Being a set operation MERGE is way more performative than single row inserts in a Java loop.

The tricky bit is uniqueness. You need to have a driving query from the source table which contains unique values (otherwise MERGE will hurl). In this example I aggregate a count of each occurrence of value1 in the source table. This gives us a set of value1 plus a figure we can use to maintain the count column on the target table.

merge into you_target_table tt
using ( select value1
               , count(*) as dup_cnt
        from your_source_table
        group by value1
      ) st
on ( st.value1 = tt.value1 )
when not matched then
   insert (id, value1, cnt) 
   values (someseq.nextval, st.value1, st.dup_cnt)
when matched then
   update
   set tt.cnt = tt.cnt + st.dup_cnt;

(I'm assuming the ID column of the target table is populated by a sequence; amend that as you require).

Sign up to request clarification or add additional context in comments.

4 Comments

The idea great. However, the actual query to getting the 5m records is more complex than just "select value1 from source_table". It involves 3 tables. I need to get some id from table1. Then I need to get another type of value(value1 in my original question) from table 2 and table 3 using the id from table1. It's rather complicated, but I'll try.
You can make the query in the USING clause as complicated as you like. The only restriction is it must return a set of unique values of the column(s) evaluated in the ON clause.
and yes the ID column of the target is populated by a sequence
Great! That's what we hope to do here.
1

In Oracle, we could use a MERGE statement to check if a row exists and do insertion only if it doesn't.

First create a type that defines your list.

CREATE OR REPLACE TYPE value1_type as TABLE OF VARCHAR2(10); --use the datatype of value1

Merge statement.

MERGE INTO yourtable t 
USING (
   select distinct column_value as value1 FROM TABLE(value1_type(v1,v2,v3))
)s ON ( s.value1 =  t.value1 )
WHEN NOT MATCHED THEN INSERT
(col1,col2,col3) VALUES ( s.col1,s.col2,s.col3);

You may also use NOT EXISTS.

INSERT INTO yourtable t 
select * FROM 
(
  select distinct column_value as value1 from  TABLE(value1_type(v1,v2,v3))
) s
      WHERE NOT EXISTS
    (
      select 1 from
       yourtable t where t.value1 = s.value1 
    );

1 Comment

While I like the idea of using a TYPE we would need to consider the impact on the PGA level of populating a PL/SQL collection with five million entries.
1

You can do this by two approaches

Approach 1:

  1. Create a temp table in database and insert all your value in RAM into that Temp Table
  2. Write query for updating count on the basis of you main table and temp table join and set a flag in temp table which values are updated, the value which are not updated use insert query to insert.

Approach 2:

  1. You can create your own data type, which accepts array of values as input: CREATE OR REPLACE TYPE MyType AS VARRAY(200) OF VARCHAR2(50);
  2. You can write procedure with your logic,procedure will take value of array as input: CREATE OR REPLACE PROCEDURE testing (t_in MyType)

Comments

1

First fill your RAM list in a temporary table TMP

select * from tmp; 
VALUE1    
----------
V00000001
V00000002
V00000003
V00000004
V00000005
...

You may use a MERGE statement to handle your logik

  • if key existe increase the count by 1
  • if key doesn't exists insert it with the initial count of 1

.

merge into val
using tmp
on (val.value1 = tmp.value1)
when matched then update
   set val.count = val.count + 1
when not matched then
   insert (val.value1, val.count)
   values (tmp.value1, 1)
;

Note that I assume you have IDENTITY key in the column ID, so no key assignment is requeired.

In case there are duplicated record in the TMP table (more records with the same VALUE1 key) you get error as MERGEcan not hanlde more actions with one key.

 ORA-30926: unable to get a stable set of rows in the source tables

If you want to count each duplicated key as one - you must pre-aggregate the temporary table using GROUP BY and add the counts.

Otherwise simple ignore the duplicates using DISTINCT.

merge /*+ PARALLEL(5) */ into val
using (select value1, count(*) count from tmp group by value1) tmp
on (val.value1 = tmp.value1)
when matched then update
   set val.count = val.count + 1
when not matched then
   insert (val.value1, val.count)
   values (tmp.value1, 1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.