
I have an application where different users may connect to different databases (these can be either MySQL or Postgres). What might be the best way to cache those connections across different databases? I have seen some connection pools, but they seem designed for one database with multiple connections rather than multiple databases with multiple connections.

PS: For more context, I am designing a multi-tenant architecture where each tenant connects to one or more databases. One option is to use a map[string]*sql.DB where the key is the URL of the database, but that hardly scales once we have a large number of databases. Or should we have a sharding layer that shards each incoming request by connection URL, so that each machine holds just the right number of database connections in the form of a map[string]*sql.DB?

An example of the kind of software I want to build is https://www.sigmacomputing.com/ where the user can connect to multiple databases to work with different tables.

4 Answers


Neither MySQL nor Postgres allows connection sharing between multiple database users; a single database user is specified in the connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.

If by "different users" you mean your application users and if they share single database user to access DB deeper in the app, then you don't need to do anything particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.

Go automatically opens, closes and reuses DB connections through a *database/sql.DB. By default it keeps up to 2 idle connections open, and under concurrency it opens an unlimited number of new connections when all open ones are busy.

If you need to fine-tune pool efficiency vs. database load, you can alter the sql.DB config with its Set* methods, for example SetMaxOpenConns.
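For illustration, a minimal sketch of such tuning (the driver, DSN and limits below are placeholders, not recommendations):

package main

import (
    "database/sql"
    "time"

    _ "github.com/lib/pq" // Postgres driver; for MySQL use github.com/go-sql-driver/mysql
)

func main() {
    // sql.Open does not dial yet; it only prepares the pool.
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(10)                  // cap concurrent connections to protect the DB
    db.SetMaxIdleConns(5)                   // keep a few warm connections (default is 2)
    db.SetConnMaxLifetime(30 * time.Minute) // recycle connections so stale ones get closed
}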


4 Comments

Thanks for answering my question! I didn't mean sharing connections between users; I'm trying to figure out a multi-tenancy architecture where each user group gets access to a separate database. I'm wondering how to do that in Go, as it seems I need to maintain an undetermined number of database connections on the fly.
I think you can maintain multiple global *sql.DB objects (one per database and user) and let the Go package handle the connection pool.
Yes, for example you can organize your pools in a map[string]*sql.DB with group identifiers as keys.
Thanks, I've thought about this before. What if I want to support an unlimited number of databases? In the multi-tenant setting, I am not expecting to keep all those *sql.DB objects in memory.

You seem to have too many unknowns. In cases like this I would apply good old agile: start with a prototype of what you want to achieve, built with tools you already know, and then benchmark its performance. I think you might be surprised how much Go can handle.

Since you understand how to use map[string]*sql.DB for that purpose, I would go with that; a sketch follows below. You reach some limits? Add another machine behind HAProxy. Solving a scaling problem doesn't necessarily mean writing a new DB pool in Go. Obviously, if you need that kind of power you can always do it - the pgx Postgres driver has its own pool implementation, so you can draw inspiration from there. However, doing this right now looks like premature optimization - solving a problem you don't have yet. Building a prototype with map[string]*sql.DB is easy; test it, benchmark it, and you will see whether you need more.
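As a minimal sketch of that prototype (driver names, DSNs and the per-pool limit are assumptions), a concurrency-safe registry that opens each *sql.DB lazily and reuses it afterwards:

package dbpool

import (
    "database/sql"
    "sync"
)

// Registry lazily opens one *sql.DB per database URL and reuses it.
// Each *sql.DB is itself a connection pool, so no extra pooling is needed.
type Registry struct {
    mu    sync.RWMutex
    pools map[string]*sql.DB
}

func NewRegistry() *Registry {
    return &Registry{pools: make(map[string]*sql.DB)}
}

// Get returns the pool for url, opening it on first use.
// driver would be "mysql" or "postgres", depending on the tenant.
func (r *Registry) Get(driver, url string) (*sql.DB, error) {
    r.mu.RLock()
    db, ok := r.pools[url]
    r.mu.RUnlock()
    if ok {
        return db, nil
    }

    r.mu.Lock()
    defer r.mu.Unlock()
    // Re-check: another goroutine may have opened it while we waited.
    if db, ok := r.pools[url]; ok {
        return db, nil
    }
    db, err := sql.Open(driver, url)
    if err != nil {
        return nil, err
    }
    db.SetMaxOpenConns(5) // keep per-database usage small; tune for your load
    r.pools[url] = db
    return db, nil
}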

P.S. BTW, you will most likely hit the file descriptor limit before you are able to exhaust memory.



Assuming you have multiple users and multiple databases in an N-to-N relation, you could have a map from database URL to database details (explained below).

Which users have access to which databases should be handled separately anyway, using a configmap or a core database. For the database details, we could have a struct like this:

type DBDetail struct {
    sync.RWMutex
    connection *sql.DB
}

The map would go from database URL to the database's details (DBDetail), and if a user wants to write, it calls this:

dbDetail.Lock()
defer dbDetail.Unlock()

and for reads, use RLock instead.

As vearutop said, the number of connections could be a pain, but using this you could keep a single connection per database, or set a limit by incrementing and decrementing a counter after taking the lock.
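For illustration, a sketch of how that lock might wrap access, building on the DBDetail struct above (the map setup and query helpers are placeholders; note that *sql.DB is already safe for concurrent use, so the lock here only serializes callers per database, as the answer proposes):

// dbDetails maps a database URL to its details.
var dbDetails = map[string]*DBDetail{}

func write(url, query string) error {
    detail := dbDetails[url]
    detail.Lock() // exclusive access for writes
    defer detail.Unlock()
    _, err := detail.connection.Exec(query)
    return err
}

func read(url, query string) (*sql.Rows, error) {
    detail := dbDetails[url]
    detail.RLock() // shared access for reads
    defer detail.RUnlock()
    return detail.connection.Query(query)
}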

2 Comments

Hi Arash, thanks for the answer! The user here refers to a company or team that contains multiple users who need to collaborate on a database, so I assume the lock approach won't really fit that scenario?
If each database has only one user, then the lock would not be handy. But if multiple users have access to a single database (since we want to optimize the connection for the maximum number of users), then the lock would help with middle-ground strategies, like keeping the connection for only one user and switching after a query ends, or logging the lock time and user ID to trace users' actions on the database over time.

There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.

One option is to use a map[string]*sql.DB where the key is the URL of the database, but that hardly scales once we have a large number of databases.

Whether this will scale sufficiently depends on how numerous the databases are expected to be. If you expect tens or hundreds of concurrent users in the near future, a map is probably sufficient. A common next step after a map is to transition to a more full-featured cache (for example https://github.com/dgraph-io/ristretto).

A factor in deciding between a map and a cache is how you imagine the lifecycle of a database connection: once a connection is opened, can it remain open for the remainder of the process's lifetime, or do connections need to be closed after minutes of disuse to free up resources?
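A sketch of the second lifecycle, assuming each entry records when it was last used and a background goroutine sweeps periodically (the type and field names are illustrative):

package dbpool

import (
    "database/sql"
    "sync"
    "time"
)

// entry pairs a pool with the last time it was handed out.
type entry struct {
    db       *sql.DB
    lastUsed time.Time
}

type cache struct {
    mu      sync.Mutex
    entries map[string]*entry
}

// sweep closes and evicts pools idle for longer than ttl; it would run
// periodically, e.g. from a time.Ticker goroutine.
func (c *cache) sweep(ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for url, e := range c.entries {
        if time.Since(e.lastUsed) > ttl {
            e.db.Close() // waits for in-flight queries before closing connections
            delete(c.entries, url)
        }
    }
}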

Or should we have a sharding layer that shards each incoming request by connection URL, so that each machine holds just the right number of database connections in the form of a map[string]*sql.DB?

The right answer here depends on how many processing nodes are expected and whether routing requests to specific machines yields additional benefits. For example, row-level caching and isolating users from each other's requests are advantages gained by sharding users across the pool of machines. But a disadvantage is that you might end up with "hot" nodes, because a single user might generate the majority of the traffic.
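A minimal sketch of such routing, assuming a fixed list of node addresses and hashing the connection URL (all names and addresses here are illustrative):

package main

import (
    "fmt"
    "hash/fnv"
)

// nodeFor picks a processing node for a database URL by hashing it,
// so requests for the same database always land on the same machine.
func nodeFor(dbURL string, nodes []string) string {
    h := fnv.New32a()
    h.Write([]byte(dbURL))
    return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
    nodes := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}
    fmt.Println(nodeFor("postgres://tenant-a.example.com/app", nodes))
}

Note that plain modulo hashing remaps most databases whenever a node is added or removed; a consistent-hashing ring would reduce that churn.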

Usually, a good strategy in situations like this is to be very explicit about the constraints of the problem. Jeff Dean coined a rule of thumb for such situations:

Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X

https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf

So, if the system needs to support tens of concurrent users in the near future, build the simplest thing that supports tens to hundreds of concurrent users (probably a map or cache with no user sharding). That design will have to change before the system can support thousands of concurrent users, but scaling a system is often a good problem to have, because it usually indicates a successful project.

