
I have been given the task of bringing three legacy systems together into one user interface. This will be an ASP.NET MVC application.

I have a SQL Server 2005 instance on one server, a SQL Server 2008 instance on another, an Access database that holds compliance data and is populated through a custom plugin, and a Powerflex .dat file database accessed through ODBC.

For every user who accesses this new interface, all of these databases need to be queried. One of the SQL Server databases and the Powerflex database have millions of records.

My question is what is the most efficient way to handle this situation?

Do I link the SQL Server databases and write a single query with joins for those servers?

Do I use disconnected in-memory DataSets?

Do I use minimal queries with a DataReader?

Do I attempt to utilize the Entity Framework (I haven’t looked into a connector for the Powerflex database)?

I have never attempted to bring this many back ends together before and I am concerned about performance. A minimum of four round trips screams poor performance to me without ever writing a line of code. Any tips would be appreciated.

PS: Bringing them all together into a single database is out of the question at this time.

7 Answers


All the things you suggest in your question have good potential for simplifying your code, making it more readable, or easier to maintain. However, none of them will affect performance in any way, simply because you will still have 4 different physical data connections (even a linked server definition from SQL 2005 to 2008 or vice versa will not help with that).

To get any real performance benefits you will have to try and consolidate the data somehow. For example:

  • Move the SQL 2005 database onto the same physical SQL Server instance as the SQL 2008 database. You can then write cross-database joins between tables rather than cross-linked-server joins, which will be more efficient (see the sketch after this list).
  • Is the Access database kept in that format because it is being used by Access forms or reports? If so, you could use the Upsizing Wizard to move the tables into SQL Server, but keep the Access forms and reports unchanged in the MDB file.
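
For example, once both databases live on the same instance, a cross-database join might look like this (a minimal sketch; database, table, and column names are hypothetical):

    using System.Data.SqlClient;

    // Sketch of a cross-database join after consolidation onto one instance.
    // "Crm2008" and "Legacy2005" are hypothetical database names.
    const string query =
        @"SELECT c.Name, o.Total
          FROM   Crm2008.dbo.Customers AS c
          JOIN   Legacy2005.dbo.Orders AS o ON o.CustomerId = c.Id";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(query, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // consume c.Name / o.Total here
            }
        }
    }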

If you can do both of those things, you will end up with only two physical data connections to worry about (SQL 2008 and Powerflex). You can then optimise data access manually on a case-by-case basis. For example, if you are joining result sets from both connections, execute the query that is likely to return the fewest rows first, then use its results to narrow the search criteria for the other query.
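
A rough sketch of that fewest-rows-first pattern, assuming a small compliance table on SQL Server and a large Powerflex table (all table, column, and connection-string names are hypothetical):

    using System.Collections.Generic;
    using System.Data.Odbc;
    using System.Data.SqlClient;
    using System.Linq;

    // 1) Run the query expected to return the fewest rows first.
    var accountIds = new List<int>();
    using (var sql = new SqlConnection(sqlConnectionString))
    using (var cmd = new SqlCommand(
        "SELECT AccountId FROM Compliance WHERE Status = 'Open'", sql))
    {
        sql.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                accountIds.Add(reader.GetInt32(0));
    }

    // 2) Use that small result set to narrow the expensive ODBC query.
    //    (Guard against an empty list before building the IN clause.)
    //    Inlining is safe here because the values are integers we just read.
    string inList = string.Join(",",
        accountIds.Select(id => id.ToString()).ToArray());
    using (var odbc = new OdbcConnection(powerflexConnectionString))
    using (var cmd = new OdbcCommand(
        "SELECT * FROM PFX_ACCOUNTS WHERE ACCT_ID IN (" + inList + ")", odbc))
    {
        odbc.Open();
        using (var reader = cmd.ExecuteReader())
        {
            // stream or materialize the matching Powerflex rows
        }
    }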


1 Comment

The Access database cannot be moved because one of the legacy applications utilizes message queuing and writes data to this database for compliance purposes. It's nasty and, for now, unchangeable.

Is the following an option:

  1. Move the SQL Server 2005 instance onto the SQL Server 2008 machine (still in its own database, possibly even running the SQL Server 2005 version if necessary).
  2. Import the Access database onto the SQL Server 2008 machine into its own database. You can reference this from Access if it still needs to be updated by Access.

This then gives you two main data locations (with three SQL Server databases) and the Powerflex database.

Use joins between the SQL Server databases (which don't need to link to other servers, so should be relatively performant), and then merge the data from Powerflex in a middle tier.
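
That middle-tier merge can be a simple in-memory LINQ join over the two materialized result sets. A minimal sketch, where the DTO types and loader methods are hypothetical:

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical loaders: one hits SQL Server, the other Powerflex via ODBC.
    List<Account> accounts = LoadAccountsFromSqlServer();
    List<PfxRecord> pfxRecords = LoadRecordsFromPowerflex();

    // Merge in the middle tier; both lists are already in memory.
    var merged = from a in accounts
                 join p in pfxRecords on a.AccountId equals p.AccountId
                 select new
                 {
                     a.AccountId,
                     a.Name,
                     p.Balance
                 };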

1 Comment

You could also use the ODBC driver to set up the Powerflex database as a linked server on the SQL 2008 instance. Then all four databases are interfaced through the same connection.
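
A sketch of that one-time setup, run against the SQL 2008 instance ('POWERFLEX' and the DSN name are hypothetical):

    using System.Data.SqlClient;

    // One-time setup: expose the Powerflex ODBC DSN as a linked server
    // via the MSDASQL (OLE DB over ODBC) provider.
    const string setup =
        @"EXEC sp_addlinkedserver
              @server     = 'POWERFLEX',
              @srvproduct = '',
              @provider   = 'MSDASQL',
              @datasrc    = 'PowerflexDsn'";  // 'PowerflexDsn' is a hypothetical ODBC DSN

    using (var conn = new SqlConnection(sql2008ConnectionString))
    using (var cmd = new SqlCommand(setup, conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }

    // Afterwards one connection reaches all four databases, e.g.:
    // SELECT * FROM OPENQUERY(POWERFLEX, 'SELECT ACCT_ID, BALANCE FROM PFX_ACCOUNTS')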

There are a couple of ways to do this off the top of my head.

One, use DataSets. You could query information from all the different databases into one DataSet and then query against that DataSet.
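
A minimal sketch of that approach (connection strings, table, and column names are hypothetical; LINQ to DataSet needs a reference to System.Data.DataSetExtensions):

    using System.Data;
    using System.Data.Odbc;
    using System.Data.SqlClient;
    using System.Linq;

    // Fill one DataSet from several back ends...
    var ds = new DataSet();
    using (var sqlAdapter = new SqlDataAdapter(
        "SELECT Id, Name FROM Customers", sqlConnectionString))
        sqlAdapter.Fill(ds, "Customers");
    using (var odbcAdapter = new OdbcDataAdapter(
        "SELECT ACCT_ID, BALANCE FROM PFX_ACCOUNTS", powerflexConnectionString))
        odbcAdapter.Fill(ds, "Accounts");

    // ...then query the disconnected tables locally with LINQ to DataSet.
    var rows = from c in ds.Tables["Customers"].AsEnumerable()
               join a in ds.Tables["Accounts"].AsEnumerable()
                   on c.Field<int>("Id") equals a.Field<int>("ACCT_ID")
               select new
               {
                   Name = c.Field<string>("Name"),
                   Balance = a.Field<decimal>("BALANCE")
               };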

Two, use Entity Framework to get models for all of these and use LINQ to query against the different entities.
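
A sketch of what that could look like with one model per SQL Server database (context and entity names are hypothetical). EF cannot join across two contexts on the server, so the final join happens in memory:

    using System.Linq;

    using (var crm = new CrmEntities())        // model over the SQL 2008 db
    using (var legacy = new LegacyEntities())  // model over the SQL 2005 db
    {
        var customers = crm.Customers
                           .Where(c => c.IsActive)
                           .ToList();

        // Push the key list down so the second database does the filtering.
        // (Contains translates to an IN clause in EF 4 and later.)
        var ids = customers.Select(c => c.Id).ToList();
        var orders = legacy.Orders
                           .Where(o => ids.Contains(o.CustomerId))
                           .ToList();

        // Combine the two materialized sets with LINQ to Objects.
        var combined = from c in customers
                       join o in orders on c.Id equals o.CustomerId
                       select new { c.Name, o.Total };
    }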

And, I think you are right. There really is no way around poor performance if you can't combine them into a single database.

EF might be your best bet here.



Have you considered using Microsoft's Enterprise Library for this? You can query all of these databases transparently. It implements the Factory pattern; the right versions of the database drivers are loaded and used based on the specific database being accessed.

Here's the link:

http://msdn.microsoft.com/en-us/library/ff648951.aspx
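
A minimal sketch of data access through the Data Access Application Block, where "Sql2008" is a hypothetical connection-string name from web.config:

    using System.Data;
    using Microsoft.Practices.EnterpriseLibrary.Data;

    // The factory reads the provider from configuration and returns the
    // matching Database implementation (SQL Server, ODBC via the generic
    // provider, etc.), so the calling code stays the same for each back end.
    Database db = DatabaseFactory.CreateDatabase("Sql2008");

    using (IDataReader reader = db.ExecuteReader(
        CommandType.Text, "SELECT Id, Name FROM Customers"))
    {
        while (reader.Read())
        {
            // consume rows
        }
    }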



If at all possible, import all the needed data into yet another database, one that is under your control.

Establish protocols for updating the data coming in and out of the different systems (how often data needs to be transferred, which data, and how).

You will gain control over your application data and will not need to worry about the multiple other databases (so long as the imports/exports work correctly), the management of many data sources, or the need to maintain data consistency across them in your application.
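
One import step might look like the following sketch, which assumes a hypothetical staging table dbo.PfxAccounts in the consolidated database:

    using System.Data.Odbc;
    using System.Data.SqlClient;

    // Pull from the Powerflex ODBC source and bulk-load into the
    // consolidated database without materializing a DataSet in between.
    using (var source = new OdbcConnection(powerflexConnectionString))
    using (var cmd = new OdbcCommand(
        "SELECT ACCT_ID, NAME, BALANCE FROM PFX_ACCOUNTS", source))
    {
        source.Open();
        using (var reader = cmd.ExecuteReader())
        using (var bulk = new SqlBulkCopy(consolidatedConnectionString))
        {
            bulk.DestinationTableName = "dbo.PfxAccounts";
            bulk.WriteToServer(reader);  // streams rows straight through
        }
    }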



I had a similar project with lots of SQL Servers over a LAN (different versions); the purpose of the app was to view and rarely edit data. I wrote a Windows service for each server that exports/syncs data every hour to a WCF service on the app server. The repository was a SQL Server 2008 database with Entity Framework on top. If your app doesn't require instant access to real-time data, this solution could work.

1 Comment

That is one of the problems, and I should have mentioned it. Two of these databases have a requirement for real-time data. That is one of the reasons for the new interface.

There are several options open to you, depending on the workload / query structure you have.

If you have long-running queries on multiple databases, it might make sense to use some kind of asynchrony, such as BeginInvoke()/EndInvoke() where available.

If you have to receive many records from multiple databases and transmission latency becomes a problem, you can hand off data reception to worker threads and combine the results afterwards.
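
A sketch covering both ideas with delegate BeginInvoke()/EndInvoke(); LoadSqlData and LoadPowerflexData are hypothetical methods that each open their own connection:

    using System;
    using System.Data;

    Func<DataTable> sqlWork = LoadSqlData;
    Func<DataTable> pfxWork = LoadPowerflexData;

    // Kick off both queries; each runs on a thread-pool thread.
    IAsyncResult sqlPending = sqlWork.BeginInvoke(null, null);
    IAsyncResult pfxPending = pfxWork.BeginInvoke(null, null);

    // EndInvoke blocks until its query finishes, so the total wait is
    // roughly the slower of the two rather than their sum.
    DataTable sqlRows = sqlWork.EndInvoke(sqlPending);
    DataTable pfxRows = pfxWork.EndInvoke(pfxPending);

    // combine sqlRows and pfxRows here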

If the result sets are so big that you cannot hold them locally in memory, consider a streaming approach. Server-side sorting and "merge"-type algorithms can help tremendously here. A join, for example, would sort by the join key, and matching tuples would automatically be the first transmitted from both streams.
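
A sketch of such a merge over two streams already sorted by the join key (ReadSqlRows, ReadPfxRows, Row, and Emit are hypothetical; it also assumes unique keys on each side):

    using System.Collections.Generic;

    IEnumerator<Row> left = ReadSqlRows().GetEnumerator();   // ORDER BY Key
    IEnumerator<Row> right = ReadPfxRows().GetEnumerator();  // ORDER BY Key
    bool hasLeft = left.MoveNext();
    bool hasRight = right.MoveNext();

    while (hasLeft && hasRight)
    {
        int cmp = left.Current.Key.CompareTo(right.Current.Key);
        if (cmp < 0)
            hasLeft = left.MoveNext();         // no match; advance the smaller side
        else if (cmp > 0)
            hasRight = right.MoveNext();
        else
        {
            Emit(left.Current, right.Current); // matched pair; hypothetical sink
            hasLeft = left.MoveNext();
            hasRight = right.MoveNext();
        }
    }

    // Only the current row from each stream is ever held in memory.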

If you have smaller and larger sets to join, you can query the smaller one first and use its results to filter the query against the bigger database.

As always, keep in mind that manual hard-coded optimizations break in the worst way when encountering unexpected workloads and data distributions.

