Karlsson on databases and stuff: Cloud

Monday, August 5, 2013

Don't let Technophobia kill innovation

What? Me? Technophobic? I have the latest iPhone, my office is jam-packed with USB gadgets and my car is a Prius. How much more techno-friendly can one get?

That is all fine, but fun technologies that we play with just for fun, and natural but cool and useful evolutions, come to most of us easily. But can you honestly say (I can't) that you always look at the promise of a new technology, at the obvious advantages it will have once it has developed into something useful, and that you have never judged a new technology only from its first, shaky implementation?

When I was in my early teens (which occurred around the time just after the Mayflower had arrived in New England) my family moved into the first house of our own. My parents were running a restaurant at the time (they ran one or the other all through my childhood) and my mother had seen most of the weirdo Heath Robinson designed (TM) commercial and domestic kitchen appliances. When we first entered our new home and mum looked in the kitchen and realized there was a dishwasher in there, her first reaction was "Well, I'm never going to use that one". One month later, the dishwasher was working more or less daily, and my mum never did any dishes by hand again.

Many years later I, her only son, having spent the better part of my life playing with SQL-based relational databases (and looking at some of the code in them, I suspect that Heath Robinson is still around, now as a software engineer), started to look at NoSQL databases, and my reaction was largely that of my mum when she saw the dishwasher: "Nah, I'm not going to use anything like that. Eventual consistency? What kind of silly idea is that?"

Yes, I was wrong, but I am still convinced that NoSQL databases (yes, I know NoSQL is a bad term, but this is Monday morning and I don't have enough energy to think up something better) will not replace SQL-based systems. What I do think is that we need both.

Just as I think my mum got it wrong twice: yes, the dishwasher really is a good idea, but some things are better handled without it. The result is that there is an abundant lack of sharp knives in my mum's house (as a dishwasher is a really effective knife-unsharpener). Myself, I use a dishwasher, but knives and beer glasses are still, to this day, washed by hand by yours truly (beer glasses because I don't want any leftover enzymes in my beer, as they are used to kill bacteria, including the really tasty bacteria that give beer its distinctive taste).

Too many words have so far been used to say this: the world needs both SQL and NoSQL databases working together, serving different purposes and applications. As for Eventual Consistency, I still think this is bogus: just say what it is, no consistency, and live with it. MongoDB, Cassandra and LevelDB are still very useful technologies, as is MySQL. In many cases you need ACID properties and atomic transactions and all that, but in many other cases this is gross overkill.

Look at something like Virtualization. In that case, I think I looked at it in the right way, looking at the potential of the new features this brought, and not ignoring, but thinking less about the issues with the first implementations (slow I/O, slow networking, complexity of use, complexity of installation etc) and looking at what it could do in terms of cost reduction, effective systems management etc.

Back then, when I was a big Virtualization supporter, many opposed me by pointing at the obvious issue with databases (which is the field where I work, if this wasn't already obvious), which was that I/O was slow and unreliable. Yes it was, but that can be fixed. This is not a flaw with the technology per se, but with the specific implementation and the limitations of the underlying technology at the time. Not everyone needs the highest of high performance, many can do with less. And some can easily scale out to more machines. All in all, many can benefit from Virtualization, maybe more than you think. These days, I think no one doubts that Virtualization is useful.

This is not to say I am always right, but I am not so technophobic that everything that is not something I already know is something that sucks. Also, we should be careful when comparing things. We often compare based on attributes of existing technologies and tend to forget that new technologies might well have virtues of their own (which we do not use for comparison as we are unfamiliar with these features as they don't exist in the technologies we currently use).

I think one technology that is now seen as inferior is the Cloud. We look at a cloud by taking something we run on some hard iron in-house, throwing it at Amazon and looking at the result. Maybe we should build our applications and infrastructure differently to support clouds, and maybe, if we do that, a Cloud might well be more cost-effective, more scalable and more performant than the stuff we run at our in-house data center.

So don't let new innovative technologies die just because they lack a 9600 baud modem or a serial port. Or because they are no good for washing beer glasses (even if that is a very important dishwasher feature).

/Karlsson

Friday, November 25, 2011

Cloud Tech Day in Stockholm Tue Nov 29

I'll be doing the keynote at Cloud Tech Day here in Stockholm on Tuesday. I'll be speaking a bit about what Recorded Future is up to, about Clouds at Amazon and what they are like, about databases, like MySQL and MongoDB, in the Clouds and about Big Data in the Cloud! Really big data in Mongo, in MySQL and the lot.

As usual, I will express my opinions in no uncertain terms. What works? What doesn't work? What really should work, but which doesn't! What is considered new and waay cool but what is really some old technology that didn't use to work and has little chance of working now. And stuff like that, you know what it's like and maybe you even know what I am like :-)

Hope to see you on Tuesday
/Karlsson

Wednesday, November 23, 2011

Nov 23: At Cloud Camp Stockholm

I am at Cloud Camp in Stockholm today. Some interesting ideas are being bounced around, pretty cool stuff.

One thing hit me today though: the lack of innovation, in IT as a whole and in databases in particular, is stunning. I have thus decided to write a few blog posts on what I think should, and probably eventually has to, change, but that no one wants to change, and that few even see as a problem.

That said, I still got a few interesting ideas today, and I will test some products I saw here, and I will write a few blog posts on some of them.

I think the good use cases for clouds are also getting clearer, and that is a good thing. Unlike the current IT trends, the IT press and many high-profile bloggers and IT influencers, I do not think that cloud computing will help resolve the conflict in the Middle East. Nor do I believe, unlike what many IT security folks seem to think, that the introduction of cloud computing will cause all the credit card info, all the personal data and everything else to suddenly be available to everyone on the net. Taking my own stand as usual, and in this case this is a really different view, I believe that Cloud computing is great for some, but not for all. And I also do not think (you are sitting down now, right? This is revolutionary, ground-breaking thinking) there is any such thing as a silver bullet. Tough!

/Karlsson

Wednesday, November 2, 2011

Clouds in Stockholm

I'll be at Cloud Camp here in Stockholm on November 23. Some familiar faces will be there, beyond yours truly then. I will discuss and present some real-life Database Cloud experiences, but as this is an unconference, don't expect slides. Rather, I will talk from my heart and give you some annoying and upsetting views on how things really are. Really!

I hope to see you there, pop by and say hello!
/Karlsson

Wednesday, February 9, 2011

EC2 - The E is for Elastic

So, you are thinking about Cloud Computing? Is it a fad, along the lines of SOA, OOP, NoSQL, ORDBMS or is it a new paradigm when it comes to infrastructure? (Not that a fad is bad, it's just that a fad, in my mind, is something that is grossly blown out of proportion. OOP is a good thing, but tell you what, OO-talibans out there, despite what you may think, OOP will not create peace in the Middle East (if it did, I'd embrace it right now).)

But all that aside, what is in the Cloud, really? From a technical standpoint, it seems simple enough: your servers running across a number of virtual machines, with virtual disks and what have you, where you pay for resource use and you share the environment with a bunch of other users. And that really is not that complicated. From a purely technical view, that is it, sort of, but there is more to it than that, because when you come to run your stuff in a cloud, you realize that things aren't as simple, and that running in a VM in a cloud really is different from just running in a VMware / Xen / Zones environment, or something like that.

This is an easy mistake to make. When I started working here at Recorded Future, where we use Amazon EC2 for everything, that is what I thought. We run Ubuntu in a virtual environment. Big deal. There are some EC2 integration tools and some GUI to administer the whole shebang, but except for that, this is no different from your server-room Linux box, just at a lower cost. And yes, I fully admit it, I was wrong.

What this is about, more than running in a Virtual environment, is the side effects of an environment shared with others, and how you set up your system to support that. Two important things I have learnt so far in some 5 months with Recorded Future:

Scalability is key! It really is. And I know, everyone wants scalability, but in an environment such as EC2, with many different configuration options, but still a shared environment, you must be able to scale. And scale horizontally. Even the very largest virtual machines at Amazon have only so much power for demanding applications in terms of disk I/O performance, network throughput and latency, and CPU performance.
It's not cheap. No, it's not, you are wrong. And that is not to say that it's bad. But it is different! If you manage that difference, you can run a very cost-efficient operation with EC2. But if you expect vertical scaling, or have monolithic setups that run on a single machine that has to scale, in a single instance, with your needs, then don't think this will save you any headaches or money.
But if you build your infrastructure in such a way that it scales nicely across servers, and ensure that performance requirements on a single server are modest, and can be distributed if needed, then EC2 is for you.

So what about the Elastic aspect then? Well, elasticity in EC2 works both ways. On one side, the performance you get, in particular in terms of network latency, will vary over time. It just will, so get over it, accept the facts and create a system that can sustain it. On the other side, you can add resources as you need them. Sounds great, doesn't it? Well, yes, but there is a limit to WHAT resources you can add, and how.
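One common way to build a system that can sustain varying latency is to retry slow calls with a growing, randomized wait. What follows is a minimal sketch of that idea and not something EC2 provides: the function name `call_with_retries`, its parameters and the choice of `TimeoutError` as the "too slow" signal are all assumptions made up for this illustration.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(); on TimeoutError, wait a bit and try again.

    Hypothetical helper: the wait ceiling doubles on each attempt (exponential
    backoff), is capped at max_delay, and a random fraction of it is slept
    (jitter) so that many clients do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller deal with it
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(ceiling * rng())
```

Passing `sleep` and `rng` in as parameters is only there to make the helper easy to test without actually waiting. Whether retrying is safe at all depends on the operation: a read can usually be repeated, a write only if it is idempotent.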

EC2 allows you to add disks to your system as you please. There is a GUI (which is not very good) and there are command-line programs (not particularly good either, to be frank, but I am not, I'm Anders) to do this. But they do the job reasonably well. What you can NOT buy is more disk I/O throughput, just like that. You can get more disks and stripe them for sure, but that's about it. The same goes for CPU: you can get more of them (up to a limit), but they are only so powerful. It's not like "Gimme a gazillion Petaflops" just like that, I'm afraid.

Above all, the network is only so fast, and regrettably that is not terribly fast. Also, the way EC2 manages DNS lookups is just plain weird. I have to assume it is so for a reason, but the reason just has to be something you smoke, but not necessarily something you inhale.

So where are we now, then? EC2 has servers with limited performance, with limited disk I/O capacity and interconnected by what sometimes seems like 2400 baud modems. How can all this be useful? And by the way, at times both disk I/O and network performance are quite acceptable, but your mileage may vary. Tell you what makes this stuff rock: there are MANY servers and many disks. As many as you want. And you can move them around, mount one disk on one machine, and then on another (like in a SAN, for example) just like that.

All in all, the perfect infrastructure for EC2 is something that scales with the number of servers. Scale I/O, scale network performance, scale user connections, whatever the bottleneck is in your system, with the number of servers, each having as much disk as is necessary.
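To make "scale whatever the bottleneck is with the number of servers" a bit more concrete, here is a back-of-the-envelope sketch. It assumes a workload that splits evenly across servers, and every number in it is made up for illustration; none of them are measured EC2 figures.

```python
import math


def servers_needed(demand, per_server_capacity):
    """Return (server_count, bottleneck) for a workload that scales horizontally.

    For each resource, divide the total demand by what one server can deliver
    and round up. The resource that needs the most servers is the bottleneck
    and decides the size of the fleet.
    """
    counts = {
        resource: math.ceil(demand[resource] / per_server_capacity[resource])
        for resource in demand
    }
    bottleneck = max(counts, key=counts.get)
    return counts[bottleneck], bottleneck


# Hypothetical totals the system must handle, and what one server can deliver.
demand = {"disk_iops": 12000, "network_mbps": 900, "connections": 4000}
capacity = {"disk_iops": 1500, "network_mbps": 250, "connections": 1000}

count, bottleneck = servers_needed(demand, capacity)
print(count, bottleneck)  # prints: 8 disk_iops
```

The point of the exercise is that the answer is a number of servers, not a bigger server. If the workload cannot be split this way, the arithmetic does not apply.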

And having said that, you may have figured out where I am going with this: as in the Letterman Show, we are playing "Will it scale?". Yes, that is a valid question, and the answer is: maybe. To make it really simple, the less state a system holds, the easier it will scale. A webserver that is just serving stuff off a disk will scale real nice, for example. Also, the less you persist, one way or the other, the better it will scale. The most common way of persisting data is of course writing it to disk, that is clear, but any kind of persisted data that is shared will put some limit on scalability.

A simple Web-server scales easily, as we have already said. A more complex Web application, say a PHP application, may not scale as well, but still a lot, as long as the state is limited to a single PHP session or similar. The same goes for App-servers I guess. The one thing in most systems that is difficult to scale is the database, and the reason is of course that the database has a lot of state. A stateless database is not much of a database, to be honest.

MySQL has a means of scaling the database that has worked quite well for a number of years in Web-style applications. Scale-out is the way to go. And scale-out is simple enough: asynchronously replicated Slaves being fed from a Master. The asynchronous nature of this replication means that database writes (which all go to the Master) are not held up by replication to the Slaves. And you can have, in theory, any number of Slaves. But there is a drawback to all this: only reads scale, not writes. Master-Master may help, to an extent, but not much. And then there is the massive replication setup itself, with N Slaves all replicating from a different point in the Master. Another issue is that for the Slave to do its job properly, the operations on the Slave are serialized (if this wasn't the case, think about what Foreign Key relationships might cause you), which means the Slaves are slower than the Master when it comes to writes. That in turn means that unless you want the Slaves to fall further and further behind, you had better keep the average write load at the level that the Slave can sustain, not the Master.

And no, by the way, I don't think MySQL Replication is something bad and awful. But I do think I know what it is good at and what it is NOT good at. And it will not help you scale your writes. Nope.
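The arithmetic behind "keep the write load at the level the Slave can sustain" is simple enough to sketch. This is a toy model that assumes steady rates; the function names and the numbers are invented for illustration and are not measurements of any MySQL setup.

```python
def replication_backlog(master_writes_per_sec, slave_applies_per_sec, seconds):
    """Writes the Slave still has to apply after `seconds` of steady load.

    If the Master accepts writes faster than the serialized Slave can apply
    them, the surplus piles up for as long as the load lasts.
    """
    surplus = master_writes_per_sec - slave_applies_per_sec
    return max(0, surplus) * seconds


def seconds_to_catch_up(backlog, slave_applies_per_sec, master_writes_per_sec):
    """Time for the Slave to drain a backlog once the write load has dropped."""
    spare = slave_applies_per_sec - master_writes_per_sec
    if spare <= 0:
        return float("inf")  # the Slave never catches up at this load
    return backlog / spare


# Hypothetical: Master takes 1200 writes/s, Slave can apply 1000 writes/s.
backlog = replication_backlog(1200, 1000, seconds=3600)
print(backlog)                                   # prints: 720000
print(seconds_to_catch_up(backlog, 1000, 800))   # prints: 3600.0
```

So in this made-up case, one hour of overload costs another full hour of reduced load before the Slave is current again, and adding more Slaves does not change that, since each one has to apply every write.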

So what would a database that could scale in an Amazon EC2 Cloud look like? Above all, it would need:

Flexible configuration - Adding a node should be just that, adding a node. No restart, no optimizations, no reorg of data, no downtime, just Here is a node: Use it. And removal of a node should be the same. And management of data in the database as well. No more monolithic database configurations, please!
Scalable performance - And once you add those nodes, they really should be able to increase the performance of things, and not just to a small extent.
Data distribution - Yes, I want my data distributed. Replicated to where it is used. Distributed and persisted to where it is best persisted.
Transparent - Yes, all this should be transparent to the application.
SQL based RDBMS - Yes, I know, I am an old fashioned guy, but this is what I want. SQL because it is ubiquitous, not because I think it's the best query language on the planet. And I want an RDBMS because I firmly believe that that is best for my data. Which is not to say that some applications might not prefer some other means of storage (if you want my full view on these matters, read this post).
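To give a feel for what "adding a node should be just that" can mean for data placement, here is a minimal sketch of consistent hashing, one well-known technique for spreading keys over nodes so that a new node only takes over a share of the keys. This is my own illustration, not how NimbusDB or any other product mentioned here works; the class `HashRing`, the node names and the key names are all made up.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns many points on a circle."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # points per node, smooths out the distribution
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # A key belongs to the first point clockwise from its hash (wrapping).
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(10000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # "Here is a node: Use it."
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved) / len(keys))  # fraction of keys that changed owner
```

With three nodes growing to four, one would expect roughly a quarter of the keys to move, and every key that moves goes to the new node; nothing is shuffled between the old ones. A real database has to do far more than this (move the data, stay consistent while doing it), which is exactly the hard part.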

So, is this just a wet dream? I hope not. One technology I am looking at, and which I am eager to try later this year, is NimbusDB. This is created by Jim Starkey, and if you think that the MySQL Falcon debacle was caused by him, and that NimbusDB is therefore not something to look at seriously, think again. Having looked at a number of different technologies in the past 5 months or so, I have to say that NimbusDB is the only one where they have at least understood what problems to solve in a Cloud environment, and that is not a bad start. And tell you what? I do think that to support such a setup as NimbusDB sets out to do, you really need to start from scratch. I do not think that Oracle 13g or a MySQL Cloud Storage Engine or something like that will be around to fulfill the requirements to run and scale properly in a Cloud. But you never know, there are interesting times ahead.

/Karlsson