I am looking for an efficient way to compare data between an Oracle and SQL Server table. I have no control over the Oracle table and can only perform select queries. This table contains 30,000+ rows. Currently I create a data set from the Oracle table and then compare the data with a SQL Server table that I do maintain. In this case I am just checking for the presence of a student number in the SQL Server table. In cases where the number is not present, I insert it into the SQL Server table. This is, as you can imagine, horribly inefficient. Your suggestions and examples would be greatly appreciated.
1. SQL is a query language. Do you mean SQL Server? 2. What are you trying to do? You have described how you do it. – Ronnis, Dec 17, 2010 at 21:49
SQL Server. The end goal is to run a morning report, capture only the new students in the Oracle table, and copy them into my SQL Server table. – gnome, Dec 17, 2010 at 22:00
4 Answers
Create a Linked Server instance on SQL Server, using an account that has access on the Oracle instance.
Then, you can update the SQL Server table with the missing contents using:
    INSERT INTO [SQLServer].[dbo].[table]
    SELECT columns
      FROM [Oracle].[database].[schema].[table] x
     WHERE NOT EXISTS(SELECT NULL
                        FROM [SQLServer].[dbo].[table] y
                       WHERE y.student_number = x.student_number)
There are NOT IN and LEFT JOIN/IS NULL alternatives. NOT IN and NOT EXISTS perform better than LEFT JOIN/IS NULL when the columns compared (in this case, student_number) are not nullable (the value can never be NULL).
It's easy to script this as a SQL Server Agent Job if you need it to run periodically.
A simple SSIS package would suffice. Write a data flow task that reads from both Oracle and SQL Server, with a Lookup transformation that compares the two: rows with no match in SQL Server are inserted, and rows that do match are updated. SSIS is designed to be fast and efficient.
I am not sure this is a good answer, since it is hard for me to see what you're asking for, but I personally would offload the processing burden onto the RDBMS. Instead of comparing in your own code, you would programmatically build a single SELECT statement that returns only the relevant values.
This could be done using the SQL NOT IN operator.
So start off your string however you normally would...
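    // StudentIDValuesFromYourDB: the student IDs already in your own table.
    // Assumes the collection is not empty; otherwise the trim below breaks the SQL.
    String statement = "SELECT * FROM OracleDB WHERE StudentID NOT IN (";
    for (Object val : StudentIDValuesFromYourDB)
    {
        statement = statement.concat(val + ", ");
    }
    statement = statement.substring(0, statement.length() - 2); // remove the trailing ", "
    statement = statement.concat(")");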
Admittedly it is a HUGE statement and I am not sure if it is even allowed to be that big (nor am I sure of the syntax in Oracle, I hate adapting my work for Oracle clients), but I think the RDBMS would be better optimized for doing all this sorting work for you. If you can run this statement you will only get the values that are in the OracleDB and NOT in your own DB.
You can now use this list to generate only the relevant insert statements.
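On the size concern above: Oracle has traditionally capped a single IN list at 1000 expressions (error ORA-01795), so one giant NOT IN over tens of thousands of IDs would likely be rejected. A minimal sketch of one way around that, assuming the student numbers are integers, is to split the IDs into chunks and AND the NOT IN clauses together. `NotInQueryBuilder`, `build`, and `chunkSize` are hypothetical names made up for illustration; `OracleDB` and `StudentID` are the placeholders from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class NotInQueryBuilder {

    // Traditional Oracle limit on expressions in one IN list (ORA-01795).
    static final int MAX_IN_LIST = 1000;

    // Builds "SELECT ... WHERE StudentID NOT IN (...) AND StudentID NOT IN (...)".
    // knownIds: student numbers already present in your own table (assumed integers).
    static String build(List<Integer> knownIds, int chunkSize) {
        String base = "SELECT * FROM OracleDB";
        if (knownIds.isEmpty()) {
            return base; // nothing to exclude, so every Oracle row is new
        }
        List<String> clauses = new ArrayList<>();
        for (int start = 0; start < knownIds.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, knownIds.size());
            List<String> parts = new ArrayList<>();
            for (Integer id : knownIds.subList(start, end)) {
                parts.add(String.valueOf(id));
            }
            clauses.add("StudentID NOT IN (" + String.join(", ", parts) + ")");
        }
        // A row is new only if it is absent from every chunk, hence AND.
        return base + " WHERE " + String.join(" AND ", clauses);
    }
}
```

Calling `NotInQueryBuilder.build(ids, NotInQueryBuilder.MAX_IN_LIST)` keeps each list within the limit. With 30,000+ IDs the statement is still very long, so this is probably only worth it when no server-side option is available.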
Thank you all for your ideas. OMG Ponies, you get the check because it's the right solution for what I need. However, the servers are not linked, so I had to write a workaround using LINQ.
First I transformed the Oracle data set into a list of objects:
    List<Student> studentList = (from d in dataSet
                                 select new Student
                                 {
                                     StudentNumber = d.STUDENTNUMBER,
                                     // .... other properties
                                 }).ToList();
I then wrote a comparer class and returned the difference:
    public IEnumerable<Student> ListNewStudents(IEnumerable<Student> studentList)
    {
        List<Student> otherStudentList = (from s in _dataContext.Students
                                          select new Student
                                          {
                                              StudentNumber = s.StudentNumber
                                          }).ToList();
        return studentList.Except(otherStudentList, new StudentComparer()).ToList();
    }
Worked perfectly and I'm a happy man again.