Recently we've been working with a couple of vendors to identify potential solutions for scaling backend databases horizontally. This problem has plagued Drupal.org (as the infrastructure team is painfully aware), as well as many other large-scale Drupal sites, and we're doing what we can to help.
We've looked at third-party hardware and software as well as MySQL's own technologies, and some of them show real promise.
We are dealing with two scenarios, each with its own set of challenges to overcome:
- Multi-tenant hosting: several databases coexisting on the same MySQL server instance.
- Single-tenant hosting: a single database on one MySQL instance, ideally spread across multiple servers (Drupal.org falls into this category).
The market breakdown:
- Continuent uni/cluster provides a middleware solution for MySQL/PostgreSQL that proxies requests between your application and the database servers, allowing you to scale the backend horizontally. It's a great idea, but the price is a tad prohibitive for most startups, and it suffers from write performance problems.
- Sequoia is Continuent's open source offering. It provides a subset of the features in their commercial product and should work surprisingly well for single-tenant installs that do a lot of reads (it still has issues with write speed). Other downsides include the need for a specialized ODBC driver and a less-than-stellar memory footprint of 512MB per Sequoia instance (it's a Java process, generally run on each physical MySQL backend server). All of this, of course, assumes you have the three days required to set it up properly.
- Dolphinics provides a hardware-based clustering solution that allows multiple servers to share the same memory pool with very low latency. Ideally this works in tandem with MySQL Cluster, allowing the cluster to grow simply by throwing more hardware at it. We're currently testing this solution and will post updates as we learn more.
- MySQL Cluster provides a solution that should theoretically solve most of the market's pain points, but it only addresses the availability side of the equation, not scalability. To grow your cluster, you need a hot standby cluster replicated and waiting, and then you must export and re-import all of your data once the cluster has been expanded.
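To make the write-performance tradeoff of the proxy approach concrete, here is a toy model of a fully replicating middleware layer. It assumes the proxy keeps a complete copy of the data on every backend and applies each write to all of them while sending each read to just one; the real Continuent products have their own scheduling and consistency logic, so treat this as a simplified sketch. `Backend` and `ReplicatingProxy` are hypothetical names, and the "backends" are in-memory dicts rather than real MySQL connections.

```python
import itertools


class Backend:
    """Hypothetical stand-in for one MySQL server; stores rows in a dict."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.ops = 0  # number of statements this backend has executed

    def write(self, key, value):
        self.ops += 1
        self.rows[key] = value

    def read(self, key):
        self.ops += 1
        return self.rows.get(key)


class ReplicatingProxy:
    """Toy middleware: broadcast every write, load-balance every read."""

    def __init__(self, backends):
        self.backends = backends
        self._next = itertools.cycle(backends)  # round-robin read scheduler

    def write(self, key, value):
        # Every replica must apply the write to stay consistent,
        # so adding backends does not add write capacity.
        for backend in self.backends:
            backend.write(key, value)

    def read(self, key):
        # Any replica can answer a read, so reads spread across backends.
        return next(self._next).read(key)


if __name__ == "__main__":
    backends = [Backend(f"db{i}") for i in range(3)]
    proxy = ReplicatingProxy(backends)
    proxy.write("node:1", "Hello")
    for _ in range(6):
        assert proxy.read("node:1") == "Hello"
    # Each backend ran the 1 write plus 2 of the 6 reads.
    print([b.ops for b in backends])  # [3, 3, 3]
```

Adding a fourth backend in this model would cut each server's share of the reads, but every server would still execute every write, which is why read-heavy single-tenant sites benefit most from this approach.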
The more I think about it, the more frustrated I am that MySQL hasn't reliably solved this problem itself. Not even MySQL's own site uses Cluster at this point (it still relies on master-slave replication), which makes me worry that they've shipped a solution that doesn't provide what the market is demanding.
We've recently been talking with Continuent about addressing this issue, but the problem is far more complex than "just throw more hardware at it." Our discussions with them showed us that there are different models for scaling, some built around replication and others around data partitioning, and that the key is finding the right balance of the two for a given deployment.
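As a rough illustration of how the two models can be combined, the sketch below first picks a partition (shard) by hashing a key, then picks a server inside that partition's replication group: writes go to the shard's master, reads rotate across its replicas. It assumes hash partitioning on a hypothetical key such as a user ID, and uses plain server-name strings instead of real connections; `Shard`, `PartitionedCluster`, and `route` are illustrative names, not any vendor's API.

```python
import hashlib
import itertools


class Shard:
    """Hypothetical partition: one master for writes plus read replicas."""

    def __init__(self, name, replica_count):
        self.name = name
        self.master = f"{name}-master"
        self.replicas = [f"{name}-replica{i}" for i in range(replica_count)]
        self._next = itertools.cycle(self.replicas)

    def server_for(self, is_write):
        # Writes must hit the master; reads can go to any replica.
        return self.master if is_write else next(self._next)


class PartitionedCluster:
    def __init__(self, shard_count, replicas_per_shard):
        self.shards = [
            Shard(f"shard{i}", replicas_per_shard) for i in range(shard_count)
        ]

    def shard_for(self, key):
        # Stable hash of the partition key, so a key always maps to one shard.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def route(self, key, is_write):
        return self.shard_for(key).server_for(is_write)


if __name__ == "__main__":
    cluster = PartitionedCluster(shard_count=4, replicas_per_shard=2)
    for user_id in range(1, 6):
        print(user_id, cluster.route(user_id, is_write=True),
              cluster.route(user_id, is_write=False))
```

Partitioning spreads writes across several masters, which replication alone cannot do, while replication within each shard still spreads reads. One design consequence: with simple modulo hashing, changing the number of shards remaps most keys, so growing the cluster means moving data, much like the export and re-import problem with MySQL Cluster.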
We'll continue working with both Continuent and Dolphin, since I can guarantee that finding a working solution is on everyone's mind.