Over the last couple of weeks I have been building and installing MySQL 5.1 NDB Cluster Amazon EC2 machine images (AMIs).
http://blog.dbadojo.com/2007/07/mysql-51-ndb-cluster-on-ec2.html
http://blog.dbadojo.com/2007/07/mysql-51-ndb-cluster-on-ec2-part-2.html
There have been plenty of forum questions about using MySQL 5.1 NDB Cluster as a way to provide redundancy and, specifically for EC2, a way to provide persistent storage.
The real benefit of MySQL 5.1 is the new ability to store non-indexed columns on disk, which essentially increases the size of the database that can run under NDB. In previous versions the whole database had to be stored in memory, which constrained the size of the database you could run, both physically and in terms of cost.
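For reference, here is a minimal sketch of the 5.1 disk-data syntax: you create a logfile group and a tablespace first, then declare a table with STORAGE DISK so its non-indexed columns live on disk while indexed columns stay in memory. The file names, sizes and table definition below are placeholders, not the values used on the AMIs.

-- Undo log for disk data (file name and size are placeholders)
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 128M
    ENGINE NDBCLUSTER;

-- Tablespace holding the on-disk column data
CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 256M
    ENGINE NDBCLUSTER;

-- Non-indexed column (payload) is stored on disk;
-- the indexed primary key column stays in memory.
CREATE TABLE demo_disk (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    payload VARCHAR(255)
)
    TABLESPACE ts_1 STORAGE DISK
    ENGINE NDBCLUSTER;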
Over the last couple of days I have smoothed out running multiple NDB nodes.
I have built dedicated data, management and SQL/API nodes, as well as combined nodes.
The main thing I have been testing is the ability of MySQL NDB to provide the required redundancy and data persistence.
So I used two combined nodes (nodes running the management, data and SQL/API software) as the base, plus two dedicated data nodes.
The thing to remember here is that the network bandwidth provided (100Mbps) is the bare minimum required for the cluster.
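As a rough illustration of that layout, a config.ini along the following lines describes two management nodes, four data node processes (two on the combined nodes, two dedicated) and two SQL/API nodes. The hostnames, directories and memory sizes are placeholders only; the actual files and settings will be in the follow-up article.

[ndbd default]
NoOfReplicas=4          # keep 4 copies of the data (see the results below)
DataMemory=512M         # placeholder sizes
IndexMemory=64M

[ndb_mgmd]
HostName=combined-1     # hostnames are placeholders
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]
HostName=combined-2
DataDir=/var/lib/mysql-cluster

[ndbd]
HostName=combined-1
DataDir=/var/lib/mysql-cluster

[ndbd]
HostName=combined-2
DataDir=/var/lib/mysql-cluster

[ndbd]
HostName=data-1
DataDir=/var/lib/mysql-cluster

[ndbd]
HostName=data-2
DataDir=/var/lib/mysql-cluster

[mysqld]
HostName=combined-1

[mysqld]
HostName=combined-2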
So the results:
- Having 2 management nodes works as documented.
- Losing any node causes all connections from SQL/API nodes to be dropped; however, once the cluster is OK again, the connections are restored (see the status check sketched after this list). This is similar to any other HA solution without a front-end cache.
- On two occasions the loss of a management node and a data node caused the cluster to fail to rebalance, and it had to be completely shut down. This may have been related to a slow network connection.
- Make sure that the number of replicas (the NoOfReplicas NDB cluster configuration variable) is what you require. With 4 data nodes and the number of replicas set to 4, NDB keeps 4 copies of the data. This is the maximum redundancy provided by MySQL 5.1 NDB.
- Like any scaling solution, you should seriously look at caching so that any loss of connections is a minor issue.
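To watch the cluster drop a node and then recover, the ndb_mgm client can be pointed at either management node from any machine in the cluster; the connect string below is a placeholder.

# Show node groups and which data/management/API nodes are connected
ndb_mgm --ndb-connectstring=combined-1:1186 -e show

# Show the detailed state of every data node
ndb_mgm --ndb-connectstring=combined-1:1186 -e "all status"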
The next round of testing involves (finally) MySQL Cluster replication.
I will post another article with the main configuration files and settings I chose to use.
Have Fun
Paul
You say that having to store the complete database in memory limits the size of the database, both physically and in cost. But as far as I know, MySQL is capable of storing different parts (fragments) of the data on different nodes, so physically it should be possible to have a db size larger than the memory size on each node, provided you have enough nodes, right? The cost of running the nodes, on the other hand, could be limiting.
Potentially yes. I reread the manual on this concept (http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-nodes-groups.html) and you can partition the data as required.

What I was demonstrating in that test was having 4 replicas of the data on 4 data nodes, rather than partitioning the data into 2 partitions, each with 2 data nodes, as suggested in that documentation.

There is an NDB perl script (http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-utilities-ndb-size.html) provided by MySQL AB to help determine the size of the NDB cluster.

Thanks for the comment, it is something I can follow up and test.