MySQL 5.1 NDB Cluster on EC2 – testing redundancy

Over the last couple of weeks I have been building and installing MySQL 5.1 NDB Cluster Amazon EC2 machine images (AMIs).

http://blog.dbadojo.com/2007/07/mysql-51-ndb-cluster-on-ec2.html
http://blog.dbadojo.com/2007/07/mysql-51-ndb-cluster-on-ec2-part-2.html

There have been plenty of forum questions about using MySQL 5.1 NDB Cluster as a way to provide redundancy and, specifically for EC2, a way to provide persistent storage.

The real benefit of MySQL 5.1 is the new ability to store non-indexed columns on disk, which essentially increases the size of database that can run under NDB. In previous versions the whole database had to be stored in memory, which constrained the size of database you could run, both physically and in cost.
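
To give a concrete idea of the disk-data feature, here is a minimal sketch (the logfile group, tablespace, table and file names are just placeholders) of pushing non-indexed columns to disk in MySQL 5.1; indexed columns still live in memory:

mysql -u root <<'SQL'
-- an undo logfile group and a tablespace must exist before any disk-based NDB table
CREATE LOGFILE GROUP lg1
  ADD UNDOFILE 'undo1.log' INITIAL_SIZE 16M
  ENGINE NDBCLUSTER;
CREATE TABLESPACE ts1
  ADD DATAFILE 'data1.dat' USE LOGFILE GROUP lg1 INITIAL_SIZE 64M
  ENGINE NDBCLUSTER;
-- id (indexed) stays in memory, payload (non-indexed) is stored on disk
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  payload VARCHAR(255)
) TABLESPACE ts1 STORAGE DISK ENGINE NDBCLUSTER;
SQL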

Over the last couple of days I have smoothed out running multiple NDB nodes.

I have built dedicated data, management and SQL/API nodes, as well as combined nodes.

The main thing I have been testing is redundancy: the ability of MySQL NDB to provide the required redundancy and data persistence.

So I used two combined nodes (a node which runs the management, data and SQL/API software) as the base, plus two dedicated data nodes.

The thing to remember here is that the network bandwidth provided (100Mbps) is the bare minimum required for the cluster.

So the results:

  1. Having 2 management nodes works as documented.
  2. Losing any node causes all connections from the SQL/API nodes to be dropped; however, once the cluster is healthy again, connections can be re-established. This is similar to any other HA solution without a front-end cache.
  3. On two occasions the loss of a management and data node caused the cluster not to rebalance, and it had to be completely shut down. This may have been related to a slow network connection.
  4. Make sure that the number of replicas (the NoOfReplicas cluster configuration variable) is what you require. With 4 data nodes, setting the number of replicas to 4 means NDB keeps 4 copies of the data, the maximum redundancy provided by MySQL 5.1 NDB. A minimal config.ini sketch follows this list.
  5. Like any scaling solution, you should seriously look at caching so that any loss of connections is a minor issue.
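
For reference, here is a minimal config.ini sketch for the layout above (two combined hosts running management, data and SQL/API software plus two dedicated data hosts, so four data nodes in total); the hostnames are placeholders for whatever you put in /etc/hosts:

# config.ini (sketch) on the management nodes
[ndbd default]
# with 4 data nodes: 2 gives two node groups, 4 keeps a copy on every node (maximum redundancy)
NoOfReplicas=2
DataMemory=512M
IndexMemory=64M

[ndb_mgmd]
HostName=combo1
[ndb_mgmd]
HostName=combo2

[ndbd]
HostName=combo1
[ndbd]
HostName=combo2
[ndbd]
HostName=data1
[ndbd]
HostName=data2

[mysqld]
HostName=combo1
[mysqld]
HostName=combo2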

The next round of testing involves (finally) MySQL Cluster replication.

I will post another article with the main configuration files and settings I chose to use.

Have Fun

Paul

Oracle RAC on NFS: Configure NFS server

As I mentioned in the Openfiler posts, until I can expose a block device within the Xen VM, using ASM to manage the files won’t be happening.

So rather than spin my wheels, I have been setting up an NFS server to share the 147 GB mountpoint /mnt as the shared disk.

Essentially this is the first part of the Oracle RAC on NFS guide. When I wanted a more in-depth guide to setting up the NFS server and NFS clients, I used this NFS HOWTO.

So I fired up my trusty CentOS 4 base install, got the necessary packages installed, made a hack to get a service running without the sunrpc module, and bingo, the NFS server was ready for the next stage of the Oracle RAC build.

Helpful hints:

  1. Install nmap and nfs-utils: yum install nmap nfs-utils
  2. Add all the hostnames to /etc/hosts, as it makes it easy to have all the config files use those names rather than hardcoded IP addresses.
  3. To get around the nasty missing sunrpc module when you start NFS, use this forum post: essentially, comment out the exit 1 so the line reads /sbin/modprobe sunrpc #|| exit 1 in the file /etc/init.d/rpcidmapd.
  4. Add this line to /etc/fstab on the NFS server: rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs defaults 0 0
  5. Always run exportfs -rav so you get verbose output for the directories which are going to be shared via NFS (a sample /etc/exports and the verification commands are sketched after this list).
  6. nmap `hostname` is your friend for determining which ports to open from within EC2 security groups.
  7. rpcinfo -p hostname will tell you whether everything is set up and ready to go.
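
To make hints 5 and 7 concrete, this is roughly what the server side looks like; rac1 and rac2 are placeholder client hostnames taken from /etc/hosts:

# /etc/exports on the NFS server
/mnt rac1(rw,sync,no_root_squash) rac2(rw,sync,no_root_squash)

# publish the exports and check everything is registered
service nfs restart
exportfs -rav
rpcinfo -p `hostname`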

The other benefit, of course, as with Openfiler, is that I can now spawn an NFS server node to provide extra storage for Oracle RAC and whatever else needs it, for example a shared backup location for temporary Oracle RMAN backups when practicing building standby databases with the RMAN DUPLICATE command.

Have Fun

Paul

OCFS2 on Xen on EC2

Unfortunately OCFS2 requires access to block devices to enable clustering.

This is something that Openfiler, via iSCSI, looked like it could provide; unfortunately I think it requires the ability to boot the Xen image with block devices exported into the virtual machine.
http://wiki.rpath.com/wiki/Xen_DomU_Guide

I have been trying to get Openfiler working without success; see these articles:
http://blog.dbadojo.com/2007/07/openfiler-fun.html
http://blog.dbadojo.com/2007/07/openfiler-fun-sequel.html
http://blog.dbadojo.com/2007/07/openfiler-on-ec2-much-more-work-and.html

If anyone has any ideas I would love to hear them as comments on the blog.

Enormaly is working on an S3 block device which might work.

Given that the MySQL 5.1 NDB cluster took all of 15 minutes to set up, and with enough data nodes can withstand the loss of any one machine, it is looking more favourable at the moment.

If you are talking OCFS2, you are looking at Oracle RAC, probably on ASM. Given that the speed of the interconnect is ultra-important for the shared database cache, the current bandwidth would make things interesting. Testing and sandbox, yes; production, no way.
The other issue with OCFS2 is that the box exposing or providing the block device has to be redundant as well, otherwise storage becomes a single point of failure.

Have Fun

Paul

My background is as a Database Specialist/DBA; I support Oracle, SQL Server and MySQL with my mates at Pythian.

OpenFiler on EC2 = much more work and control of Xen boot

Given that at the moment you have little control over how the EC2 kernel is booted, and I wasn’t able to get the rPath image for Xen to boot… I am stuck.

The Openfiler image I built from the tarball is fine; however, a bunch of services aren’t working and modules are missing. Given the instance runs its own flavour of Linux, it was missing yum, up2date and even rpm, so getting the missing modules in place was a pain.

The next thing that was broken was the iscsi-target; without this module I am unable to present a block device for OCFS2 to partition.

So I can get Openfiler to create logical volumes, and the NFS service is running, but that was not the whole point of this exercise.

I tried to get iscsi-target to compile and had a bunch of issues running make.

The other issue is that /dev/hda2 is already mounted when the instance starts, rather than being presented as a raw block device as suggested in the Openfiler Xen wiki and other articles.

At this point I am going to move on; I think the issue is more the EC2 environment than any problem with Openfiler itself. This is clearly better done where you have some control over how Xen boots an instance and the disks it presents.

So I am going to press on with Oracle RAC on EC2 and attempt Oracle RAC over NFS.

There has been some interest in getting Openfiler working, so this is disappointing. Various Amazon forum posts have mentioned several alternatives; however, the inability to present a block device may well end up being the biggest issue with EC2, at least for any cluster filesystem requiring one.

Have Fun

Paul

MySQL 5.1 NDB Cluster on EC2 – Part 2

Given I had already been through the procedure of setting up a MySQL 5.0 NDB Cluster on EC2, getting the MySQL 5.1 NDB Cluster installed was a reasonably straightforward task.

I will create a new HOWTO wiki from my screen dumps in the next day or so, though the documentation provided by MySQL is very thorough.

Given I want to give a large MySQL 5.1 cluster a whirl, maybe with Cluster replication as described in this documentation, I built and bundled the following AMIs (Amazon Machine Images):

  1. MySQL 5.1 NDB data node – built from the RPMs
  2. MySQL 5.1 NDB management node – built from the RPMs
  3. MySQL 5.1 NDB SQL/API node – built from the RPMs
  4. Complete MySQL 5.1 install.

I built number 4 because you can run the whole cluster, and test everything, off one box, or in my case one image. You could also run the management node and SQL node on the same box.
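
For the single-image case, the start-up order is the usual one. A rough sketch, assuming the config file lives in /var/lib/mysql-cluster (adjust paths for your image):

# start the management node first, then the data node(s), then the SQL node
ndb_mgmd -f /var/lib/mysql-cluster/config.ini
ndbd --initial        # --initial only on the very first start
mysqld_safe &
# confirm all nodes have connected
ndb_mgm -e show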

The biggest issue I found that is specific to running on EC2 is the hostname, which is allocated via DHCP. I am working on scripts to automate updating the /etc/hosts file on each box so that the config.ini required on the management node and the /etc/my.cnf settings required on the data nodes can point at a name rather than a specific IP address or DNS name.
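
The idea behind the script is simple. Here is a rough sketch, assuming the EC2 instance metadata service is used for the local IP; the alias ndb_data1 is just a placeholder role name for the node:

#!/bin/sh
# map a stable alias to this instance's DHCP-assigned address in /etc/hosts
ALIAS=ndb_data1
IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
grep -q " $ALIAS" /etc/hosts || echo "$IP $ALIAS" >> /etc/hosts
# the same alias can then be referenced from config.ini and /etc/my.cnf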

The work I do here can be replicated to handle the hostname stuff required for Openfiler, Oracle standbys and anything else requiring network connectivity.

Of particular interest to users of EC2 is the lack of persistent storage on any specific instance. Plenty of people are looking at various solutions. I guess the appeal of MySQL Cluster is that any one data node, or two if you are paranoid, could die; so running enough data nodes (and perhaps SQL and management nodes on the same boxes) would provide a solution to this. Backing up to S3 or some other form of persistent storage is the fallback if everything goes pear-shaped.

Replicating these virtual images to boxes with persistent storage alleviates some of the pain; however, EC2 remains a great place to build your knowledge of the various technologies available, even if some of the quirks of running under a virtual machine make things slightly different.

If there is interest I will release a public image of the various types of nodes.

Have Fun

Paul

Part 1: Why use MySQL 5.1 NDB Cluster?

OpenFiler fun: the sequel

As I mentioned in this post, I was having trouble getting OpenFiler to use the physical volumes or volume groups I had created from the unix command line.

I finally had time to give this another go and was stuck again until I decided to read the error.log produced by lighttpd and also the /var/log/messages file.

/var/log/messages was saying this:


Jul 9 12:00:02 domU-12-31-35-00-0A-41 crond(pam_unix)[1936]:
session closed for user openfiler
Jul 9 12:00:34 domU-12-31-35-00-0A-41 modprobe:
FATAL: Could not load /lib/modules/2.6.16-xenU/modules.dep:
No such file or directory

I had tried to use the Xen-ready image from rPath; however, the image had failed to boot under EC2. So I used the tarball instead and made the image from the extracted tarball.

I noticed the message log was complaining about a missing module in a directory which didn’t exist either. That module was device-mapper (dm-mod). No device-mapper and lvcreate fails!

So the Openfiler web admin tool was going through the motions of creating new volumes but failing without any indication in the tool as to why.

I rechecked the Openfiler tarball and it doesn’t have device-mapper for a kernel version which EC2 can use, given it boots Xen 2.6.16. So the clunky solution was to copy the files from another running AMI instance running CentOS 4.
After running modprobe dm-mod, lvcreate worked and so did the OpenFiler admin tool.
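
For anyone hitting the same thing, the check and fix boiled down to something like this (module paths assume the 2.6.16-xenU kernel the EC2 instance boots):

# is device-mapper there at all?
ls /lib/modules/`uname -r`/kernel/drivers/md/dm-mod.ko
# after copying the modules tree across from a working CentOS 4 instance
depmod -a
modprobe dm-mod
lsmod | grep dm_mod
# lvcreate (and therefore the Openfiler admin tool) now works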

Woot! Next stop: getting iSCSI running so I can get OCFS2 formatting the storage provided by Openfiler as a clustered block device for ASM to use.

Stay tuned…

Paul

Here is a dump of my screen showing how pvcreate, vgcreate and lvcreate should work.



[root@domU-12-31-36-00-31-73 ~]# umount /mnt
[root@domU-12-31-36-00-31-73 ~]# pvcreate /dev/sda2
Physical volume "/dev/sda2" successfully created
[root@domU-12-31-36-00-31-73 ~]# vgcreate /dev/sda2 vg
/dev/sda2: already exists in filesystem
New volume group name "sda2" is invalid
[root@domU-12-31-36-00-31-73 ~]# vgcreate vg /dev/sda2
Volume group "vg" successfully created
[root@domU-12-31-36-00-31-73 ~]# lvcreate -L4096M -n myvmdisk1 vg
Logical volume "myvmdisk1" created
[root@domU-12-31-36-00-31-73 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 937M 8.5G 10% /
[root@domU-12-31-36-00-31-73 ~]# mount /dev/vg/myvmdisk1 /oradata
mount: you must specify the filesystem type
[root@domU-12-31-36-00-31-73 ~]# mkfs -t ext3 /dev/vg/myvmdisk1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
524288 inodes, 1048576 blocks
52428 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1073741824
32 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@domU-12-31-36-00-31-73 ~]# mount /dev/vg/myvmdisk1 /oradata
mount: mount point /oradata does not exist
[root@domU-12-31-36-00-31-73 ~]# mkdir /oradata
[root@domU-12-31-36-00-31-73 ~]# mount /dev/vg/myvmdisk1 /oradata
[root@domU-12-31-36-00-31-73 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 937M 8.5G 10% /
/dev/mapper/vg-myvmdisk1
4.0G 41M 3.7G 2% /oradata

The Gridlayer: 3Tera virtual applications

As Barmijo mentioned in his comment, there are other virtualization providers out there, one of them being The Gridlayer from 3Tera.

The best place to check out what they are offering is to review the demo.

It looks like a good mix of GUI and CLI (command line interface), and building an application with drag and drop was also nice. Something for either Amazon or a third-party provider to aim at.

The best bit (while only a demo) was the scaling of resources. Given that it was done from the command line, either 3Tera or someone else could offer the ability to scale automatically as load reaches certain thresholds, with some caveats. Sometimes scaling indefinitely is not the solution to denial of service or poorly written code (of any flavour).

Have Fun

Paul

MySQL 5.1 NDB Cluster on EC2 – Part 1

Can you use MySQL Cluster and have persistent storage?

Yes, if you run enough cluster (data) nodes that it can survive the loss of one or more nodes.
So if reliability is required, the minimum number of nodes would be 3 data nodes.

The real solution would be to use MySQL Cluster replication. It requires more nodes but almost guarantees no data loss even if you lose a whole cluster.

Use MySQL 5.1 cluster
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html

This version gives you the ability to keep non-indexed data on disk; otherwise the whole lot has to be in memory, which on EC2 is 1-1.2 GB or so.

As to persistent storage, you will need either to

  • Run enough nodes (n+1) so that any crash of your AMI is handled ok. You can then use /mnt as storage.
  • Use an S3 filesystem; there are a couple of people providing solutions. You could still use S3 as backup storage, as you still need backups even if you are running a fault-tolerant cluster.

I would use option 1; if you are really paranoid, use a master MySQL 5.1 cluster of 3 nodes replicating to a slave MySQL 5.1 cluster of 3 nodes.

MySQL Cluster replication is documented here:
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication.html
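
As a rough idea of what the replication side needs (host names, server ids and the replication user below are placeholders; NDB writes row-based events to the binary log):

# my.cnf on a SQL node of the master cluster (sketch)
[mysqld]
server-id=1
log-bin=mysql-bin
ndbcluster
ndb-connectstring=master_mgmt

# my.cnf on a SQL node of the slave cluster (sketch)
[mysqld]
server-id=2
ndbcluster
ndb-connectstring=slave_mgmt

# then, on the slave SQL node
mysql -e "CHANGE MASTER TO MASTER_HOST='master_sql', MASTER_USER='repl', MASTER_PASSWORD='secret'; START SLAVE;"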

I have made a MySQL 5.0 NDB cluster of 4 nodes (1 management, 2 data and 1 MySQL node), so the next thing is to build a MySQL 5.1 NDB management/MySQL node and a MySQL 5.1 NDB data node as EC2 AMIs (Amazon Machine Images).

The process continues in Part 2

Have Fun

Paul

Bizgres on EC2

Amazon and others have already released images with PostgreSQL installed; however, as part of getting some data mining software built on EC2, I have run through the build of Bizgres.

Most of the time was spent getting Java installed properly. I will release a recipe soon, as mentioned in that post.

On the Openfiler front I have read some more documentation, wiki and forum posts and will give it another crack on the weekend.
I know there are a bunch of people interested in seeing Oracle RAC on EC2 become a reality.

There are several other options for getting shared disk available for OCFS2 to see and use. I will only explore those if the problems with Openfiler are intractable.

Update:
The Bizgres build was good; I ran through the IVP (Instance Verification Program) without issue. See the details here.

Have Fun
Paul

Openfiler fun

I was working on the Openfiler step over the past couple of days.
Whilst I was able to get the Openfiler image working, I was unable to create any logical volumes or get it to see any physical volumes. It was looking like the proverbial business buzzword “it’s a Show Stopper”.

Hint: whilst pvcreate and vgcreate worked, they were unusable by Openfiler.

I googled and came up with very little, apart from the realisation that I needed to learn more about Xen and xvd block devices.
I gave it another go (just before starting this post) and bingo, I found a bunch of potentially useful articles:
The Openfiler on Xen wiki (RTFM, whoops)
An rPath forum post on handling xvd block devices.
Plus it looks like someone was doing something similar with an HA cluster setup, whilst not as crazy as Openfiler on a Xen-flavoured virtual instance sitting on top of Amazon hardware a la EC2.

I have a week of leave to play with this and make Oracle RAC on EC2 a reality… in amongst some kids’ parties and a Burt Bacharach concert.

If you have any ideas or thoughts on Openfiler on Xen, feel free to contribute.

Editor’s note: slight correction: pvcreate and vgcreate were working; lvcreate never was.

Update: Success. See the sequel to this article.

Have Fun

Paul