
I am trying to understand the differences between the new CockroachDB and other distributed SQL databases, as compared to a cloud-managed database like Azure SQL Database.

It seems there is no difference in the use cases between them:

  1. Like various NoSQL databases, SQL (in general) allows partitioning keys.
  2. I can add cores in Azure to increase performance as needed, and I can switch to Hyperscale if I have an elastic workload.
  3. I can have read replication across multiple nodes over multiple availability zones (geo-locations).
  4. I can configure data replication in Azure SQL Database too.

It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product?

Isn't Azure SQL Database basically a distributed database server?

Am I missing something?

  • do you mean "Azure SQL Database"?! SQL Server is something else entirely Commented Jul 14, 2022 at 12:08
  • @silent Correct, I meant that; hence my title contains Azure SQL Server and not SQL Server. Commented Jul 14, 2022 at 12:09
  • I just want to be precise here. There is "Azure SQL Database" and there is "(Microsoft) SQL Server" (and there is Azure SQL Managed Instance). All three things are different. Commented Jul 14, 2022 at 12:11
  • @silent I hope the edits make it a little bit clearer. Commented Jul 14, 2022 at 12:13
  • @Larnu How about now? I am asking about Azure SQL Database. Commented Jul 14, 2022 at 12:18

1 Answer

Is Azure SQL Server a Distributed SQL database?

No.

Like various NoSQL databases, SQL (in general) allows partitioning keys.

  • Partitioning in NoSQL databases like Cassandra (and Azure Table Storage) is about distributing partitions to physically distinct nodes, and requires rows to have an explicitly set partition-key value.

    • Cassandra nodes are physically different machines that can run independently, which gives it excellent resiliency.
  • Partitioning in SQL Server, Azure SQL, and Azure SQL Managed Instance is about dividing data up into row-groups that exist in the same server for performance, not resiliency.
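
To make the first kind of partitioning concrete, here is a minimal sketch of how a partition key can decide which physical node owns a row. The names `node_for_key` and `place_rows` and the node list are hypothetical, and the plain hash-modulo placement is a simplification chosen for illustration: Cassandra itself uses a token ring with consistent hashing, which this does not reproduce.

```python
import hashlib


def node_for_key(partition_key: str, nodes: list[str]) -> str:
    """Pick the node that owns a partition key (simplified hash-modulo placement)."""
    # A stable hash is used so the same key maps to the same node on every run.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def place_rows(rows: list[dict], nodes: list[str]) -> dict[str, list[dict]]:
    """Group rows by the node that would store them."""
    placement: dict[str, list[dict]] = {node: [] for node in nodes}
    for row in rows:
        # Every row must carry an explicit partition-key value.
        placement[node_for_key(row["partition_key"], nodes)].append(row)
    return placement


if __name__ == "__main__":
    # Hypothetical three-node cluster and sample rows.
    cluster = ["node-a", "node-b", "node-c"]
    sample = [{"partition_key": f"customer-{i}", "value": i} for i in range(10)]
    for node, owned in place_rows(sample, cluster).items():
        print(node, [r["partition_key"] for r in owned])
```

In SQL Server-style partitioning, by contrast, every group produced by a partition function would still live on the same server.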

I can add cores in Azure to increase performance as needed, and I can switch to Hyperscale if I have an elastic workload.

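This has nothing to do with distributed databases: adding cores scales a single server up (vertical scaling), whereas a distributed database scales out by adding nodes.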

I can have read replication across multiple nodes over multiple availability zones (geo-locations).
I can configure data replication in Azure SQL Database too.

  • Replication isn't the same thing as a true distributed database:
    • In Cassandra and other distributed databases, all clients can connect to all nodes and accomplish the same tasks; and you can arbitrarily add and remove nodes while the system is running.
    • In SQL Server and Azure SQL's replication feature, the replica is strictly a "secondary" that is subordinate to your primary server.
      • Clients can connect to either the secondary or the primary, but the secondary server can only perform read-only queries, whereas if a client wants to do DML (INSERT/UPDATE/DELETE/MERGE) or DDL (CREATE/ALTER) then the client must connect to the primary server.
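
The primary/secondary rule above can be sketched as a toy model. All class and method names here (`Node`, `PrimaryWithReplicas`, `ReadOnlyError`) are hypothetical and do not correspond to any Azure SQL or SQL Server API, and replication is modelled as an immediate copy purely to keep the example short.

```python
class ReadOnlyError(Exception):
    """Raised when a write is sent to a read-only secondary."""


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, object] = {}


class PrimaryWithReplicas:
    """Toy model: one writable primary, any number of read-only secondaries."""

    def __init__(self, primary: Node, secondaries: list[Node]) -> None:
        self.primary = primary
        self.secondaries = secondaries

    def write(self, node: Node, key: str, value: object) -> None:
        # DML/DDL is only accepted by the primary.
        if node is not self.primary:
            raise ReadOnlyError(f"{node.name} is a read-only secondary")
        node.data[key] = value
        # Assumption: replication is shown as an immediate copy for simplicity.
        for secondary in self.secondaries:
            secondary.data[key] = value

    def read(self, node: Node, key: str) -> object:
        # Reads may be served by the primary or by any secondary.
        return node.data.get(key)


if __name__ == "__main__":
    primary, replica = Node("primary"), Node("replica")
    db = PrimaryWithReplicas(primary, [replica])
    db.write(primary, "order-1", "shipped")
    print(db.read(replica, "order-1"))
    try:
        db.write(replica, "order-2", "pending")
    except ReadOnlyError as exc:
        print("rejected:", exc)
```

In a distributed database, the `write` check would not exist: any node could accept the write and the cluster would coordinate it.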

It seems to me that a cloud SQL database covers all the use cases the newer distributed databases cover, so why would I want to use a newer product?

It can't. Because Azure SQL is not a distributed database, it cannot allow any client to read from and write to any node or endpoint and have that change replicated to all other nodes (whether under eventual consistency, as in Cassandra, or consensus-based strong consistency, as in CockroachDB). Instead, Azure SQL requires writes to be performed by the single primary "server".

Note that an Azure SQL "server" (or logical server) is largely an abstraction that hides what Azure SQL really is: a distinct build of SQL Server's engine that runs in a high-availability Azure Service Fabric environment in a single Azure datacenter. That environment is how cores/RAM can be added and removed while it's running, and it provides some kind of local resilience against hardware failure.

3 Comments

I think most of your points are correct, but not quite in some details. Azure SQL runs on Service Fabric, so in fact it is not just a single physical server. That part is abstracted from the user and you are right, it does not behave like a NoSQL DB where you can connect to any instance. But it is still very different from a SQL Server, which is physically one node and needs replication to build HA. Azure SQL has HA (even across AZs) built in, in that sense.
@Dai So it's still a single point of failure because of the primary WRITE server?
@silent Thanks - I've improved my answer.
