Use a view
I've had to tackle exactly this before, and the most reliable way to do it is to replicate to different tables, then create a view to union it all together.
- When configuring replication, you can set each article (on the publisher side) to use a different destination name at the subscriber.
- You can redirect the tables from each location into their own schema. Depending on your existing use of schemas, you could replicate dbo.Transactions to LocationA.Transactions or LocA_dbo.Transactions.
- You can use this feature to rename articles such that dbo.Transactions becomes dbo.Transactions_LocationA.
- As an alternative to renaming, you could replicate each publisher into its own database, which entirely avoids naming conflicts, but potentially introduces some permissions headaches related to cross-database ownership chaining.
- Create a view that does a UNION ALL of all the separate tables (see the sketch after this list).
- This is really just partitioning without using the feature of the same name.
- In the view definition, you can add a constant column identifying the source location, so each row in the resulting view can be traced back to its publisher.
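As a minimal sketch, assuming the LocationA/LocationB schema layout from above and hypothetical columns TransactionID, TransactionDate, and Amount:

```sql
CREATE VIEW dbo.AllTransactions
AS
SELECT 'LocationA' AS SourceLocation,  -- constant column identifying the source
       TransactionID,
       TransactionDate,
       Amount
FROM   LocationA.Transactions
UNION ALL
SELECT 'LocationB' AS SourceLocation,
       TransactionID,
       TransactionDate,
       Amount
FROM   LocationB.Transactions;
```

Note the explicit column lists; more on that below.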
Some caution
In the above plan, I'd suggest you make sure you avoid SELECT * in the view, for all the usual reasons. If a schema change is made to different publishers at different times, the view will likely be broken from the time the first table's schema is changed until the final one is changed. Instead, explicitly list the columns and only update the view once the schema change is everywhere.
The same schema change considerations need to be made when replicating into a single table, as well. Though in that case, it's more likely to break replication delivery itself, rather than just breaking the view.
Many-to-one Replication
The way the Snapshot Agent works is that it essentially just automates using BCP to export from the publisher and import to the subscriber. The default option is to truncate and reload when you re-initialize a publication. You can also change it to use delete instead of truncate, but that will issue a single, unbatched DELETE statement, which can cause blocking and transaction log bloat.
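The pre-snapshot behavior is set per article. A sketch using sp_addarticle, with placeholder publication and article names:

```sql
EXEC sp_addarticle
     @publication      = N'PubLocationA',  -- placeholder
     @article          = N'Transactions',  -- placeholder
     @source_owner     = N'dbo',
     @source_object    = N'Transactions',
     @pre_creation_cmd = N'delete';        -- vs. N'truncate', N'drop', N'none'
```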
If your multiple publishers have overlapping PKs, then you'll need to uniquify them, much as you suggest. However, this can impact performance, potentially at significant cost. In addition to the size consideration of adding the column to every PK, if your PK is also your clustered index, the uniquifying column will also be included in every nonclustered index.
You'll also need to ensure the uniquifying column is added to the END of the PK definition, so as not to break SARGability of existing queries. However, even if you do this, you may notice changes in query plans that cause performance regressions.
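A sketch of that change on each publisher, with a hypothetical LocationID column and illustrative constraint names:

```sql
-- Add the distinguishing column; the default value differs per publisher.
ALTER TABLE dbo.Transactions
    ADD LocationID tinyint NOT NULL
        CONSTRAINT DF_Transactions_LocationID DEFAULT (1);

-- Rebuild the PK with the new column LAST, so existing seeks on
-- TransactionID alone remain SARGable.
ALTER TABLE dbo.Transactions
    DROP CONSTRAINT PK_Transactions;

ALTER TABLE dbo.Transactions
    ADD CONSTRAINT PK_Transactions
        PRIMARY KEY CLUSTERED (TransactionID, LocationID);
```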
The query optimizer knows that if ID is a single-column PK, then ID = @id will return at most a single row. The same cardinality rules are used during optimization of set-based queries and joins. Thus you may start seeing changes in query plans where a 1:1 join is now interpreted as a 1:many join. This can be further mitigated by adding a unique index on the "old" PK. You may even choose to keep the "old" PK as a unique clustered index and make the "new" PK nonclustered.
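Sketching that alternative on a publisher (where TransactionID is still unique on its own):

```sql
-- Keep the narrow key clustered and unique, so the optimizer still
-- treats TransactionID = @id as returning at most one row...
CREATE UNIQUE CLUSTERED INDEX UQ_Transactions_TransactionID
    ON dbo.Transactions (TransactionID);

-- ...and carry the widened key as a nonclustered PK.
ALTER TABLE dbo.Transactions
    ADD CONSTRAINT PK_Transactions
        PRIMARY KEY NONCLUSTERED (TransactionID, LocationID);
```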
The various challenges with replicating from many publishers to a single subscriber table make it a difficult solution to implement and operate. It requires significant changes to the publisher databases. I would recommend against this option, except in greenfield development where the schema and performance can be taken into consideration from the start.
Additionally, the inevitable need to re-snapshot a publisher means carefully deleting just the appropriate rows from the subscriber. Using partitioning with a partition per publisher can help here (see the sketch below), but introduces a different set of complications. IMHO, the pseudo-partitioned view is the easier solution to manage long term.
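For completeness, a sketch of the partition-per-publisher cleanup, assuming SQL Server 2016+ and the hypothetical LocationID column (the subscriber table would need to be created on the partition scheme):

```sql
-- One partition per publisher, keyed on LocationID.
CREATE PARTITION FUNCTION pfLocation (tinyint)
    AS RANGE LEFT FOR VALUES (1, 2, 3);

CREATE PARTITION SCHEME psLocation
    AS PARTITION pfLocation ALL TO ([PRIMARY]);

-- SQL Server 2016+ can truncate a single partition, avoiding the
-- unbatched DELETE when re-snapshotting one publisher
-- (LocationID = 1 maps to partition 1 here).
TRUNCATE TABLE dbo.Transactions WITH (PARTITIONS (1));
```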
Replicating to unique targets ensures that the publishers don't need significant changes and testing, and it eases the ongoing support burden involved in maintaining a single many-to-one replication target.