Plenty of people have been excited by the prospect of Amazon EC2 and the ability to scale out your databases as load increases beyond your original configuration. I noticed Morgan Tocker and Carl Mercier are going to be presenting on this topic at the upcoming MySQL Conference.
However, almost immediately, people worry about the lack of persistence of data across instance terminations.
In a sense, people want dedicated hosting services instead of what EC2 really is.
You need to think of Amazon EC2 in terms of an electricity generation metaphor.
Coal, Nuclear Fission and Gas provide the base load electricity which is on 24×7.
Gas and Hydro can act as peak load generation as well. So at peak periods, when the requirement for electricity is higher, generators can switch on these extra resources to cope.
Amazon EC2 is the same: it is there to service peak load.
Either you are using Amazon EC2 as a base load server, or you are using a dedicated hosting service to provide base load and adding EC2 server resources during peak periods as required.
As a dedicated hosting service, EC2 is not actually the cheapest option out there. There are plenty of dedicated hosting providers who will give your application and database cheaper base load capacity. That said, many people choose to run both application and database servers on EC2 as base load servers, and the uptime of these instances is good.
What this means is that using EC2 as base load requires you to implement additional protections for your data to provide persistence. This may take the form of clustering technologies, replication technologies or both. So running EC2 as a base load database server adds complexity. Numerous companies have sprung up as a result of this complexity, essentially providing a way for companies to pay someone else to deal with it.
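As one illustration of the replication approach (a minimal sketch only; the server IDs, log names and option values below are hypothetical and need adjusting for your environment), MySQL's built-in asynchronous replication can be configured so that a slave on a separate instance, or outside EC2 entirely, holds a continuously updated copy of the data:

```ini
# master my.cnf -- hypothetical values
[mysqld]
server-id = 1
log-bin   = mysql-bin   # binary log the slave reads changes from

# slave my.cnf (on a different instance or host)
[mysqld]
server-id = 2
relay-log = relay-bin
read_only = 1           # protect the copy from stray writes
```

The slave is then pointed at the master with `CHANGE MASTER TO` and started with `START SLAVE`. Combined with regular backups shipped off-instance (for example to Amazon S3), this gives a recovery path when an instance terminates.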
The hidden value here is that, in adopting a more thorough attitude to data persistence and redundancy, your database becomes more robust. So if or when your dedicated hosting provider has an outage, your architectural design is already in a position to handle it.
The danger is that you see any ongoing performance issue (a demand for additional base load) as solved by throwing hardware at it, rather than reviewing whether the demand is justified or whether it can be reduced by tuning the application, database or architecture.
Update: Added Carl as co-presenter at the MySQL conference.
Paul,

Thanks for plugging my talk (to be 100% fair – it’s both mine and Carl’s talk).

I like your analogy of being an “electricity generator” – I think that’s quite accurate. While EC2 will survive restarts, and it’s possible to configure software RAID, there are still some limitations when compared to ‘traditional’ offerings. I look forward to sharing those next week 😉
Thanks mate, added Carl to the blog post. My bad.

Well, you are closer than me (5 hours vs at least 14), though I know Sheeri is going along to the conference. Maybe I can convince her to attend the presentation. I will have to wait until I download the slides or audio later.
Sorry to post it here, but will these presentations be available online? I’m in Europe.

Would be nice to talk about mysql-proxy and the newcomer… the scalr project, isn’t it?

I also heard that some tried to implement a DB cluster in the cloud, but unsuccessfully. Badly done, or simply impossible due to network latencies?
I am hoping that most presenters post their material online. Unfortunately some slides are completely pointless without audio, especially if the speaker uses the one-word-per-slide, rapid-flicking style of presentation.

The other question, DB clustering: given time, Amazon should/will improve the latencies between EC2 nodes. At the moment, much grid/cloud-capable software expects at least gigabit Ethernet or InfiniBand network latencies to perform optimally.