What? Me? Technophobic? I have the latest iPhone, my office is jam-packed with USB gadgets and my car is a Prius. How much more techno-friendly can one get?
That is all fine, but fun technologies that we play with just for fun, or natural, but cool and useful, evolutions come to most of us easily. But can you honestly say (I can't) that you have always looked at the promise of a new technology, at the obvious new advantages once the technology has developed into something useful, and never just judged it from its first, shaky, implementation?
When I was in my early teens (which occurred around the time just after the Mayflower had arrived in New England) my family moved into our first house of our own. My parents were running a restaurant at the time (they ran one or the other all through my childhood) and my mother had seen most of the weirdo Heath Robinson designed (TM) commercial and domestic kitchen appliances. When we first entered our new home and mum looked in the kitchen and realized there was a dishwasher in there, her first reaction was "Well, I'm never going to use that one". One month later, the dishwasher was working more or less daily, and my mum never did any dishes by hand.
Many years later, I, her only son, having spent the better part of my life playing with SQL based relational databases (and looking at some of the code in them, I suspect that Heath Robinson is still around, now as a software engineer), started to look at NoSQL databases, and my reaction was largely that of my mum when she saw the dishwasher: "Nah, I'm not going to use anything like that. Eventual consistency? What kind of silly idea is that?"
Yes, I was wrong, but I am still convinced that NoSQL databases (yes, I know NoSQL is a bad term, but this is Monday morning and I don't have enough energy to think up something better) will not replace SQL based systems. What I do think is that we need both.
Just as I think my mum got it wrong twice: Yes, the dishwasher really is a good idea, but some things are better handled without it. The result is that there is an abundant lack of sharp knives in my mum's house (as a dishwasher is a really effective knife-unsharpener). Myself, I use a dishwasher, but knives and beer glasses are still, to this day, washed by hand by yours truly (beer glasses because I don't want any leftover enzymes in my beer, as they are used to kill bacteria, including the really tasty bacteria that give beer its distinctive taste).
Too many words have so far been used to say this: The world needs both SQL and NoSQL databases working together, serving different purposes and applications. As for Eventual Consistency, I still think this is bogus: just say what it is, no consistency, and live with it. MongoDB, Cassandra and LevelDB are still very useful technologies, as is MySQL. And in many cases you need ACID properties and atomic transactions and all that, but in many cases this is gross overkill.
Look at something like Virtualization. In that case, I think I looked at it in the right way, looking at the potential of the new features this brought, and not ignoring, but thinking less about, the issues with the first implementations (slow I/O, slow networking, complexity of use, complexity of installation etc.) and looking at what it could do in terms of cost reduction, effective systems management etc.
Back then, when I was a big Virtualization supporter, many were opposing me with the obvious issues with databases (which is the field where I work, if this wasn't already obvious), which were that I/O was slow and unreliable. Yes it was, but that can be fixed. This is not a flaw with the technology per se, but with the specific implementation and the limitations of the underlying technology at the time. Not everyone needs the highest of high performance, many can do with less. And some can easily scale out to more machines. All in all, many can benefit from Virtualization, maybe more than you think. These days, I think no one doubts that Virtualization is useful.
This is not to say I am
always right, but I am not so technophobic that everything that is not
something I already know is something that sucks. Also, we should be
careful when comparing things. We often compare based on attributes of
existing technologies and tend to forget that new technologies might
well have virtues of their own (which we do not use for comparison as we
are unfamiliar with these features as they don't exist in the
technologies we currently use).
I think one technology that is now in a state of being seen as inferior is Cloud technology. We look at a cloud by taking something we run on some hard iron in-house, throwing it at Amazon and looking at the result. Maybe we should build our applications and infrastructure differently to support clouds, and maybe, if we do that, a Cloud might well be more cost-effective, scalable and performant than the stuff we run at our in-house data center.
So don't let new innovative
technologies die just because they lack a 9600 baud modem or a serial
port. Or because they are no good for washing beer glasses (even if that
is a very important dishwasher feature).
/Karlsson
I am Anders Karlsson, and I have been working in the RDBMS industry for many, possibly too many, years. In this blog, I write about my thoughts on RDBMS technology, happenings and industry, and also on any wild ideas around that I might think up after a few beers.
Monday, August 5, 2013
Friday, November 25, 2011
Cloud Tech Day in Stockholm Tue Nov 29
I'll be doing the keynote at Cloud Tech Day here in Stockholm on Tuesday. I'll be speaking a bit about what Recorded Future is up to, about Clouds at Amazon and what it is like, about databases, like MySQL and MongoDB, in the Clouds and about Big Data in the Cloud! Really big data in Mongo, in MySQL and the lot.
As usual, I will express my opinions in no uncertain terms. What works? What doesn't work? What really should work, but doesn't! What is considered new and waay cool but is really some old technology that didn't use to work and has little chance of working now. And stuff like that, you know what it's like and maybe you even know what I am like :-)
Hope to see you on Tuesday
/Karlsson
Wednesday, November 23, 2011
Nov 23: At Cloud Camp Stockholm
I am at Cloud Camp in Stockholm today. Some interesting ideas are being bounced around, pretty cool stuff.
One thing hit me today though: the lack of innovation, in IT as a whole and in databases in particular, is stunning. I have thus decided to write a few blog posts on things that I think should, and probably eventually have to, change, but no one wants to change them, and few even see them as a problem.
That said, I still got a few interesting ideas today, and I will test some products I saw here, and I will write a few blog posts on some of them.
I think the good use cases for clouds are also getting clearer, and that is a good thing. In contrast to the current IT trends, IT press and many high-profile bloggers as well as IT influencers, I do not think that cloud computing will help resolve the conflict in the Middle East. Also, I do not believe that the introduction of cloud computing, in contrast to what many IT security folks seem to think, will cause all the credit card info, all the personal data and everything else suddenly to be available to everyone on the net. Taking my own stand as usual, and in this case this is a really different view, I believe that Cloud computing is great for some, but not for all. And I also think (you are sitting down now, right? This is revolutionary, ground-breaking thinking) there is no such thing as a silver bullet. Tough!
/Karlsson
Wednesday, November 2, 2011
Clouds in Stockholm
I'll be at Cloud Camp here in Stockholm on November 23. Some familiar faces will be there, beyond yours truly. I will discuss and present some real-life Database Cloud experiences, but as this is an unconference, don't expect slides. Rather, I will talk from my heart and give you some annoying and upsetting views on how things really are. Really!
I hope to see you there, pop by and say hello!
/Karlsson
Wednesday, February 9, 2011
EC2 - The E is for Elastic
So, you are thinking about Cloud Computing? Is it a fad, along the lines of SOA, OOP, NoSQL, ORDBMS, or is it a new paradigm when it comes to infrastructure? (Not that a fad is bad, it's just that a fad, in my mind, is something that is grossly overblown in proportion. OOP is a good thing, but tell you what, OO-talibans out there, despite what you may think, OOP will not create peace in the Middle East (if it did, I'd embrace it right now).)
But all that aside, what is in the Cloud, really? From a technical standpoint, it seems simple enough: Your servers running across a number of virtual machines, with virtual disks and what have you, where you pay for resource use and you share the environment with a bunch of other users. And that really is not that complicated. From a pure technical view, that is it, sort of, but there is more to it than that, because when you come to run your stuff in a cloud, you realize that things aren't as simple, and that running in a VM in a cloud really is different from just running in a VMware / Xen / Zones environment, or something like that.
This is an easy mistake to make. When I started working here at Recorded Future, where we use Amazon EC2 for everything, that is what I thought. We run Ubuntu in a virtual environment. Big deal. There are some EC2 integration tools and some GUI to administer the whole shebang, but except for that, this is no different from your server-room Linux box, but at a lower cost. And yes, I fully admit it, I was wrong.
What this is about, more than running in a Virtual environment, is the side effects of an environment shared with others, and how you set up your system to support that. Two important things I have learnt so far in some 5 months with Recorded Future:
- Scalability is key! It really is. And I know, everyone wants scalability, but in an environment such as EC2, with many different configuration options, but still a shared environment, you must be able to scale. And scale horizontally. Even the very largest virtual machines at Amazon have only limited power for demanding applications in terms of disk I/O performance, network throughput and latency, and CPU performance.
- It's not cheap. No, it's not, you are wrong. And that is not to say that it's bad. But it is different! If you manage that difference, you can run a very cost-efficient operation with EC2. But if you expect vertical scaling or have monolithic setups that run on a single machine that has to scale, in a single instance, with your needs, then don't think this will save you any headaches or money.
But if you build your infrastructure in such a way that it scales nicely across servers, and ensure that performance requirements on a single server are modest, and can be distributed if needed, then EC2 is for you.
EC2 allows you to add disks to your system as you please. There is a GUI (which is not very good) and there are command-line programs (not particularly good either, to be frank, but I am not, I'm Anders) to do this. But they do the job reasonably well. What you can NOT buy is more disk I/O throughput, just like that. You can get more disks and stripe them for sure, but that's about it. The same goes for CPU, you can get more of them (to a limit), but they are only so powerful. It's not like "Gimme a gazzillion of Petaflops" just like that, I'm afraid.
Above all, the network is only so fast, and regrettably that is not terribly fast. Also, the way EC2 manages DNS lookups is just plain weird. I have to assume it is so for a reason, but the reason just has to be something you smoke, but not necessarily something you inhale.
So where are we now, then? EC2 has servers with limited performance, with limited disk I/O capacity and interconnected by what sometimes seems like 2400 baud modems. How can all this be useful? And by the way, at times both disk I/O and network performance are quite acceptable, but your mileage may vary. Tell you what makes this stuff rock: There are MANY servers and many disks. As many as you want. And you can move them around, mount one disk on one machine, and then on another (like in a SAN, for example) just like that.
All in all, the perfect infrastructure for EC2 is something that scales with the number of servers. Scale I/O, scale network performance, scale user connections, whatever the bottleneck is in your system, with the number of servers, each having as much disk as is necessary.
And having said that, you may have figured out where I am going with this: as in the Letterman Show, we are playing "Will it scale?". Yes, that is a valid question, and the answer is: maybe. The less state a system holds, the easier it will scale, to make it really simple. A webserver that is just serving stuff off a disk will scale real nice, for example. Also, the less you persist, one way or the other, the better it will scale. The most common way of persisting data is of course writing it to disk, that is clear, but any kind of persisted data that is shared will put some limit on scalability.
A simple Web-server scales easily, as we have already said. A more complex Web application, say a PHP application, may not scale as well, but still a lot, as long as the state is limited to a single PHP session or similar. The same goes for App-servers I guess. The one thing in most systems that is difficult to scale is the database, and the reason is of course that the database has a lot of state. A stateless database is not much of a database, to be honest.
MySQL has a means of scaling the database that has worked quite well for a number of years in Web-style applications. Scale-out is the way to go. And scale-out is simple enough: Asynchronously Replicated Slaves being fed from a Master. The Asynchronous nature of this Replication means that database writes (which all go to the Master) are not held up by replication to the Slaves. And you can have, in theory, any number of Slaves. But there is a drawback to all this: Only Reads may scale, not Writes. Master-Master may help, to an extent, but not much. And then you have a massive replication setup, with N Slaves all Replicating from a different point in the Master. Another issue is that for the Slave to do its job properly, the operations on the Slave are serialized (if this wasn't the case, think about what Foreign Key relationships might cause you), which means the Slaves are slower than the Master when it comes to Writes. This in turn means that unless you want the Slaves to fall further and further behind, you had better keep the average write load at the level that the Slave can sustain, not the Master.
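To put rough numbers on that last point, here is a toy model in Python. The rates are made up for illustration and are not measurements from any real MySQL setup, and the function name is my own. The point is only that if the Master takes writes faster than a serialized Slave can apply them, the backlog keeps growing for as long as the load lasts.

```python
def slave_backlog(master_writes_per_sec, slave_applies_per_sec, seconds):
    """Return the number of unapplied writes on the Slave after `seconds`.

    Toy model: both rates are constant and the Slave starts fully caught up.
    """
    surplus = master_writes_per_sec - slave_applies_per_sec
    # A Slave that keeps up never builds a backlog; it cannot get "ahead".
    return max(0, surplus) * seconds


# Hypothetical numbers: the Master sustains 1200 writes/s,
# the Slave can only apply 1000 writes/s.
backlog = slave_backlog(1200, 1000, 3600)
print(backlog)         # 720000 unapplied writes after one hour
print(backlog / 1000)  # 720.0 seconds to catch up once the writes stop
```

Keep the write load at or below what the Slave can apply (say 800 writes/s in this example) and the backlog stays at zero.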
And no, by the way, I don't think MySQL Replication is something bad and awful. But I do think I know what it is good at and what it is NOT good at. And it will not help you scale your writes. Nope.
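A minimal sketch of why that is, assuming a simple application-side router (the class name, the server names and the crude "anything that is not a SELECT is a write" rule are all my own inventions for illustration, not part of MySQL): reads can be spread over as many Slaves as you like, but every single write still lands on the one Master.

```python
import itertools


class ReplicatedPool:
    """Routes statements in a Master/Slave setup (hypothetical helper)."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin over Slaves

    def route(self, sql):
        # Crude classification: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


pool = ReplicatedPool("master", ["slave1", "slave2", "slave3"])
statements = [
    "SELECT * FROM t",
    "INSERT INTO t VALUES (1)",
    "SELECT * FROM t",
    "UPDATE t SET a = 2",
    "SELECT * FROM t",
]
for sql in statements:
    print(pool.route(sql), "<-", sql)
```

Adding a fourth Slave to the list gives the reads one more place to go. The INSERT and the UPDATE do not care.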
So what would a database that could scale in an Amazon EC2 Cloud look like? Above all, it would need:
- Flexible configuration - Adding a node should be just that, adding a node. No restart, no optimizations, no reorg of data, no downtime, just "Here is a node: Use it". And removal of a node should be the same. And management of data in the database as well. No more monolithic database configurations, please!
- Scalable performance - And once you add those nodes, they really should be able to increase the performance of things, and not just to a small extent.
- Data distribution - Yes, I want my data distributed. Replicated to where it is used. Distributed and persisted to where it is best persisted.
- Transparent - Yes, all this should be transparent to the application.
- SQL based RDBMS - Yes, I know, I am an old fashioned guy, but this is what I want. SQL because it is ubiquitous, not because I think it's the best query language on the planet. And I want an RDBMS because I firmly believe that that is best for my data. Which is not to say that some applications might not prefer some other means of storage (if you want my full view on these matters, read this post).
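One well-known technique that gets some way towards the first and third items on that wish list is consistent hashing. I am not saying this is what such a database would have to use, so treat the following as a sketch under that assumption. The class, the node names and the key names are made up. What it shows is that when a node is added, only some of the keys move, and all of them move to the new node.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so that key placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node gets `vnodes` points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"row-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # "Here is a node: Use it"
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of the keys moved, all of them to node-d")
```

Removing node-d again puts every key back where it was. Note that this only decides where data lives. Actually shipping the rows around, and doing so transparently behind SQL, is the hard part and is not shown here.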
/Karlsson