Background
Semi-automated file merging is a critical part of version control systems (all the ones I've used, anyway!). It is usually performed on textual source code without the tool understanding the underlying language – although semantic merging also exists.
I'm interested in whether merging relational data has been researched, and what the findings have been.
Specifically, I'm imagining that a single relational database is "forked" into two (or more) independent copies, that independent changes are made to these copies, and that we then want to recombine these databases into one again, reflecting the changes made in each.
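To make the shape of the problem concrete, here is a minimal sketch in Python (with invented table contents) of a three-way merge over a single table treated as a set of rows; it assumes the common base version is still available:

```python
def merge_tables(base, fork_a, fork_b):
    """Three-way merge of one table, modelled as a set of rows.

    A base row survives unless some fork deleted it, and rows
    inserted by either fork are kept. An *update* shows up as a
    delete plus an insert, so conflicting updates to the same row
    leave both versions in the result, to be reconciled separately.
    """
    deleted = (base - fork_a) | (base - fork_b)
    inserted = (fork_a - base) | (fork_b - base)
    return (base - deleted) | inserted

# Invented example data: each row is an (id, name) tuple.
base   = {(1, "alice"), (2, "bob")}
fork_a = base | {(3, "carol")}    # fork A inserts carol
fork_b = base - {(2, "bob")}      # fork B deletes bob

print(merge_tables(base, fork_a, fork_b))
# -> {(1, 'alice'), (3, 'carol')}
```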
What this question isn't
I'm not asking a question about specific technologies here, although that would be interesting. I'm definitely not asking about SQL-based technology in particular. I'm also specifically interested in relational data, not tree-structured data like XML, JSON, or ASTs.
I'm also not asking about integrating heterogeneous databases or information stores, which seems to be the subject of Data integration. For the purposes of this question, it can be assumed that the databases share the same schema and that the schema doesn't change.
Some ideas about how this might work
Relational data is, basically, sets of tuples. Sets, in principle, have (several) sensible merge semantics, even when the common base isn't known, such as those of a commutative replicated data type (CRDT) set.
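As a sketch of what such merge semantics look like, here is a toy two-phase set (2P-set), one of the simplest CRDT sets; the row values are invented:

```python
from dataclasses import dataclass, field

@dataclass
class TwoPhaseSet:
    """Toy 2P-set CRDT: adds and removes are tracked separately."""
    added: set = field(default_factory=set)    # every row ever inserted
    removed: set = field(default_factory=set)  # tombstones for deletions

    def contents(self):
        # A row is present iff it was added and never removed.
        return self.added - self.removed

    def merge(self, other):
        # Element-wise union, so merging is commutative, associative,
        # and idempotent: replicas can exchange state in any order,
        # and no common base version is needed.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

a = TwoPhaseSet(added={("alice", 1)})
b = TwoPhaseSet(added={("alice", 1), ("bob", 2)}, removed={("alice", 1)})
print(a.merge(b).contents())   # -> {("bob", 2)}
```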
There are obviously some basic issues, such as generating identities. If identities are simple incrementing numbers, then independent forks will hand out the same values, making the results extremely hard to merge. This problem seems to be solved by using globally unique identifiers.
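A quick illustration of the collision, and of the usual fix (the row contents are invented):

```python
import uuid

# Both forks were created when the highest existing id was 2, so each
# independently assigns id 3 to its next insert.
fork_a = {(3, "carol")}
fork_b = {(3, "dave")}

merged = fork_a | fork_b
ids = [row_id for row_id, _ in merged]
assert len(ids) != len(set(ids))   # two distinct rows now share id 3

# With globally unique identifiers, independent inserts can't collide.
fork_a = {(uuid.uuid4(), "carol")}
fork_b = {(uuid.uuid4(), "dave")}
merged = fork_a | fork_b           # two rows, two distinct ids
```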
Incrementing identifiers seem to be a specific case of a more general problem: they are an example of inserting a new fact into the database that has been derived from the current state of the database.
In general, I suspect that any fact added to the database that is derived from the existing facts in the database, rather than being genuinely new information collected from the outside world, is a potential source of difficulty during a merge. I wonder whether this is a huge problem, or one that can mostly be avoided by something like normalisation?
For example: if we adjust a balance by looking up the current balance and adding or subtracting an amount, then the result is hard or impossible to merge. However, if we store the individual increments and decrements, this information is trivial to merge. When we need the balance, we compute it from the facts we've got.
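A sketch of that second approach, with invented transaction facts: each increment is an immutable fact with its own unique id, so the facts form a set that merges by plain union:

```python
# Each fact: (transaction id, account, delta). Ids are invented.
fork_a = {("tx1", "acct-1", +100), ("tx2", "acct-1", -30)}
fork_b = {("tx1", "acct-1", +100), ("tx3", "acct-1", +50)}

merged = fork_a | fork_b   # plain set union; nothing can conflict

def balance(facts, account):
    # The balance is derived on demand, never stored as a fact itself.
    return sum(delta for _, acct, delta in facts if acct == account)

print(balance(merged, "acct-1"))   # -> 120
```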
Problems also arise for integrity constraints. Following naive set merging, a primary key may no longer be a primary key: both forks may have inserted different rows with the same key value. However, it seems like it might be possible to automatically determine where this could happen from the database schema, and then have the user choose a resolution scheme, or default to some sensible one if such a scheme exists. Note: it is possible to keep an audit trail, show conflicts as they arise, and have a user revert a decision later, as is done with version control systems.
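For instance, a merge tool could walk the schema's key declarations and flag violations after the union, roughly like this sketch (table layout and resolution hook are invented):

```python
from collections import defaultdict

def key_violations(rows, key_index=0):
    """Group merged rows by primary-key value; any group containing
    more than one row is a conflict needing a resolution scheme."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}

merged = {(3, "carol"), (3, "dave"), (4, "erin")}
print(key_violations(merged))   # -> {3: [(3, 'carol'), (3, 'dave')]}

# A resolution scheme could now keep one row, re-key one of them, or
# record the conflict for a later, revertable user decision.
```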