Last week I attended MongoDB Day London. MongoDB itself is a technology that I’m fairly interested in, and I can see where it would have its uses. But the problem is the people! They all talk like this:
- Some problem that just doesn’t really exist (or hasn’t existed for a very long time) with relational databases
An example would be the first speaker, who didn’t like normalized data because it had “bad locality”. Now ignoring for a second the difference between a logical and a physical data model, and the existence of the normal forms, if you ever did find that the bottleneck on your joins specifically was seek time, you could pre-compute the join and refresh it whenever anything changed – in Oracle using a materialized view (1996!), a continuous query, the result cache… And that’s on top of the block buffer cache and the query optimizer already being very smart. Another of the same speaker’s examples overlooked the existence of nested tables, saying what you could do with them is impossible in “SQL databases”. It’s claimed that MongoDB is more flexible because it doesn’t constrain you to tables. Well that’s backwards… We don’t work the way we do because tables are a limitation of the technology, we use the relational model because it has sound mathematical underpinnings, and the technology reflects that†. Where’s the rigour in MongoDB’s model?
Another speaker claimed that it was far better for each application to have its own database, and expose all its data through web services. Sounds good, except you now need another technology, a directory to find all these things, since they aren’t just table names or stored procedure names in the one place, and manage access control and auditing. And if you need to touch data across several of them then you’ll need something to coordinate that… We could call it a transaction processing facility, since that’s what IBM called it in 1960. He handwaved over both of those. There were many similar examples.
Another recurring theme was of an organization refreshing its hardware and modifying its architecture, one component of which was introducing MongoDB, yet with all the performance gains being attributed to it. For example splitting OLTP and OLAP from one database into two, and introducing a delay of a few minutes between data coming in and being available for reporting. Well that will give you a massive performance boost in any database! If you can tolerate the delay, of course. But if you could, why build it that way in the first place (or having built it, complain that it’s slower than you’d like), and if you can’t, then you can’t do this. In the roadmap they are promising point-in-time recovery in a future release. Oracle had that in 1988, when I had just left primary school.
So anyway, since it’s free‡, there’s no reason not to evaluate MongoDB, and see if it suits your use cases. But just remember that these kids think they’re solving problems that IBM (et al) solved quite literally before they were born in some cases, and the features are probably already there in your existing database/technology stack (I have used Oracle for my rebuttals just because I am most familiar with it, but I expect the same is true for SQL Server and DB2 as well). Talk to your friendly local DBA…
† I personally predict that in a few years there will be a lot of work re-normalizing the data in MongoDB and its rivals so it can actually be useful. That’s reason enough to become an expert in it. In about 2001, the company I joined then had just completed a massive engineering effort to get off Versant and (back) into Oracle… All this object-database stuff gives me massive deja vu for the 1990s when they were all the rage.
‡ In the same way that Oracle is also free for evaluation purposes. No-one would deny that Oracle is expensive in production! But there is no such thing as cheap or expensive in business, there is only worth the money or not.
Update: Someone has posted this on Hacker News and Reddit.
” there is no such thing as cheap or expensive in business, there is only worth the money or not.” That is so true!!! Good post 🙂
I totally agree with you. NoSQL DBs are interesting, but only for a niche of problems. Unfortunately a plethora of developers are trying to use them just because they are too lazy to use an RDBMS correctly. I have shared my idea here:
TL;DR This comment became long… It is not meant as an argument with Gaius – not even a disagreement… Just some thoughts that I think people reading this need to keep in mind in this whole NoSQL vs SQL, or Document/Object vs Relational, debate.
.oO(I should get my own blog soon…)
Though I agree with you and think that much of the hype is somewhat a display of ignorance, I still feel we are going the document database way, because we need it (in addition to relational databases).
Go(lang) would probably have been a dead fish before it was even born, if it were started in the ’90s.
We have new issues to solve, now that we are building for networked services, scaled over multiple CPUs, machines and even geographical locations in some cases.
And in some cases millions of requests every single day.
MongoDB shouldn’t be viewed as a replacement for Oracle or whatever relational database – that’s where the (some) evangelists of (most) NoSQLs make their biggest mistake!
MongoDB is a new way of handling data, which in some cases can be more logical than a relational model.
ElasticSearch and Solr are good examples of the need for a speedy way of querying data in a way that is not doable without forcing MySQL or PostgreSQL into a weird looking form.
Yes… Postgres can “act” as key->value storage, but try anything more than that and it becomes ugly and hard to debug – Redis is the first step towards something better.
I am aware that Oracle, DB2, SQL Server, etc. have a lot of features, that would make many of the “problems” go away… but…
All of these servers are monsters! They are expensive and need at least 1 full-time DBA just to be able to compete with anything in the open source / startup realm (because they suck when they are configured incorrectly or don’t have enough hardware).
If you don’t know your way around, you are gonna make it slow – even more, they are (often) slow as hell, when dealing with small amounts of data.
Now I’m not saying MySQL, PostgreSQL and the likes (including NoSQLs) can in any way compete with the overall speed of the big DBs (I have seen SQL Server do statistics… Nice!), you just need to have a reasonable amount of data and A LOT of knowledge of the specific Database.
John and Jane decide to follow their dream and start a website they think has been missing for years.
They quit their jobs, though they only have enough money to be job-less for about 6 months.
The first thing they do is set up an EC2 instance and buy a $10.000 Oracle license… Wrong…
They fire up an EC2 instance and put Apache, PHP, MySQL, and just for fun MongoDB on it.
6 months down the road, they are doubling visits each month.
Now they pay $10.000 in Oracle licenses to be able to handle the requests… Wrong…
They install a couple of MySQLs more and hire this guy that can help them scale by offloading some of the stuff to MongoDB and Redis.
In the web sphere today, things move from 0-60 (or 0-0) in no time!
In no startup case is $10.000 (extremely low estimate) “worth” spending.
At first you don’t have the money and the knowledge – then it becomes a question of time.
When you get bigger, you don’t want to spend money on both licenses AND a rewrite of your software.
You want to make what you have work… NOW…
And you reeeeaaallllyyyy don’t want to be a month or two away from succeeding, because you needed a license so expensive that it could have kept you in business for those two more months instead.
I am not trying to argue with you when it comes to how powerful the big DBs have become.
I’m just saying: “Yes, they are selling NoSQLs with the wrong arguments, but both types have their place in this new era of IT everywhere”…
Banks should probably not use MongoDB… 😉
And Joe’s website selling home-knit kitty clothes will never be able to pay for an Oracle license! The developer doesn’t have the knowledge to handle it and there is no way Oracle will perform as well as SQLite on that tiny EC2 instance that Joe can afford – it will probably not even start…
And a final side note: I think some of the funded startups could actually do better if they went with one of the big RDBs… There just aren’t enough startup types that have the guts to focus solely on DBs and risk not having a job, because they only know this 1 thing…
Personally I have reasonably good knowledge of all the aspects of a big system.
I can architect them, test and choose the technologies, manage the servers, install the services, manage them, code the system and manage developers.
That is probably going to keep me in a job for a looong time.
If you like startups… Just don’t focus on a single technology – startups have a complex system way before they are able to pay 1 guy to handle only the database side.
That was quite a rant… Sorry about that… Hope somebody can use it to reflect on this weird war some people are fighting… 😉
Can’t argue with any of that 🙂
My criticism here isn’t really of MongoDB itself, it’s just a tool, one of many in my toolbox. It’s of the hype and the style-over-substance which the technical community ought to be past by now.
That was also what I got from it… I just got into that write mode where you can’t stop 😉
And as I said, it wasn’t an attempt at arguing with you, but rather a bunch of thoughts to add to the discussion.
But yeah… Totally agree on the magnitude of the hype and lack of knowledge 😉
“Banks should probably not use MongoDB… ;)”
Can you substantiate your argument? Unless of course the smiley is indicative of you being tongue in cheek. Otherwise, you need to prove your point. In the instance of MongoDB, Banks and IBs are what actually drive the features.
Gaius, I was at MongoDB days in London too, and my interpretation of it is quite different. It was, for the most part, a showcase of the new features available in MongoDB and of the things that people were doing with it. May I also remind you, that Oracle is also investing in a NoSQL technology (Oracle NoSQL Database). Are they throwing out the mathematical proof you seem to be underpinning your whole argument around? Or is it such hubris, that “kids” are “solving” problems that somebody _tackled_ (i.e. not necessarily solved the right/optimal way) 30 years ago?
Food for thought.
I have always said use the right tool for the job (trawl back through my blog and you will find Coherence, SQLite and others there). Oracle has offered BerkeleyDB for years. My issue is with the repeated assertions that “SQL databases can’t do this”, when in fact they can, and have been doing so if not all along, then for a very long time. Like the guy who split the DB from a single Oracle into a pair of MongoDBs, one for transactions and one for reporting, on newer hardware, with fewer constraints on the timeliness of availability of data, and thought that the performance boost was only because of Mongo!
I think you hit the nail on the head when you said that many of the problems being solved were already solved before the devs were born. Fresh new developers are keen and enthusiastic, but quite often a little arrogant with it, so they don’t want to listen to developers who are 30 years older than them, because what they do is old and what the new guys do is new, fantastic and better than anything that went before, and woe betide anyone who tells them otherwise. The wheel is continually re-invented to many hoots and hollers from the new guys and just as many sighs from the experienced developers thinking “here we go again”. MongoDB is good, but then again so are Oracle, MSSQL Server, Derby, Hypersonic etc. Use what suits your use case and falls within your resource constraints, that’s all.
p.s. Brilliant article, finally someone mentioned the elephant in the room 🙂
I’ve used MongoDB for years, and in my use case it beats MySQL 10x.
Pick the right tool for the job, I always say.
screaming “my db’s better than yours” kind of misses the point.
I keep reading articles like this, and have yet to meet in person someone that says “just use (nosql flavor of the week) and all your problems go POOF!” There are tons of people, rational, talented sane people, that are embracing emerging technologies. That doesn’t make them zealots or mindless fanboys.
Back in the early days (yes, I am older than you, but appreciate the “kid” reference) the push by Oracle, IBM, Informix, and Sybase was to shove as much “business logic” into the database tier as possible. And for the days’ applications – such as corporate systems that didn’t need scale – that was a great fit. As well, all the logic was contained in one place, making life easier in heterogeneous environments. Perl and python scripts in the data center could access data with the exact same API as the Visual Basic apps running in accounting, for instance. Even IBM caved in and made it much easier to access external, non-mainframe-powered databases, so you could separate your mainframe apps from your data storage. Life was good.
Then the web happened. And suddenly your applications were no longer being accessed by 500 people at a time, if you were successful you would get over a million visitors daily. However the relational systems were not trying to solve that problem, they were perfecting ACID compliance and adding features. And suddenly the extreme pain felt from extreme growth was entirely about the database, which was the hardest thing to scale to meet accelerating performance needs.
The new, nonrelational systems that emerged were specifically solving these modern problems, and these problems rarely had anything to do with a relational approach. Programming languages evolved to a point where data was treated as objects, and working with relational systems started to become a hindrance to productivity. In effect we turned a 180 and started removing logic from the database engine, leaving it to do what it should be doing, store and retrieve data (and quickly).
You now have two different mindsets of what a “database” is, for some it is a big powerful engine that can do tons of great stuff for you; and for others it is a simple engine that stores and finds your stuff and does it quickly and without any complexity.
Can’t we all just get along? *sniff* 😉
Heh, I saw a dozen of them give presentations saying that, that day, and there were probably a few hundred more “true believers” in the audience.
I’d call them on it right there on the spot, and I’ve spoken at well more than twenty events in over five countries without ever having to.
That said, I don’t know what’s worse, cramming a mysql database design into MongoDB, or trying to shove serialized data into Oracle’s “nested tables.” If it’s a boat, then my man, keep that thing in the water, don’t put wings on it! LOL
I feel like too many people are trying to make boats that fly, and planes that can float. Neither do both very well, so you’re far better off with a plane AND a boat based on your needs.
Yeah, but when a guy says to you, planes can’t fly, you should use a boat, then you might think he doesn’t know enough about planes or boats to teach you anything about getting from A to B.
If someone said that to you, and you thought he didn’t know anything about planes or boats, then you’d be right (but in all fairness this is not limited just to MongoDB). I’ve met just as many people in <>:
“We only got two kinds of music here: Country, and Western.”
“I can do all of that and more in perl, on only one line!”
“My database can do documents too, I’ll just add a conditional loop in a stored procedure, or maybe it’ll be easier to just serialize the data and use a fulltext index on it.”
“But this website took only seconds to build! Oh you want that feature too, sure lemme get a plugin… No wait, will be easier to just build a custom one, only takes two weeks.”
“Oh you’re coming from Italy, here let me make you an expresso.”
These are the same people that in the caveman years, we could have counted on the lions, tigers and bears to take care of. Can we bring back natural selection, please? 😉
Just a little quibble: That people’s arguments for NoSQL are flawed means nothing in relation to the technology. They may simply be missing important points, or expressing them badly.
A similar point, though, is that the Oracle features you mentioned are somewhat advanced. It is not surprising that someone who has not worked as much with Oracle (or SQL Server or whatever) does not know them.
That may indicate that the biggest advantage of NoSQL is a shallower learning curve. Or maybe not. YMMV.
That is exactly my point…
In 1995, IBM bought Lotus which had released Notes in 1989. So they also had a document db before some of these kids were born 😉
“Your friendly local DBA” – you mean the one who charges $1500 a day?
That’s exactly the point, really. Perhaps all of these discoveries have been made already, but that knowledge is locked up in the brains of DBAs whom startups – among which Mongo is very popular – simply can’t afford to talk to. I honestly don’t know what any of the words in your 3rd paragraph mean, which means that for me, as a young programmer, Mongo works and relational DBs don’t.
It’s true that Mongo isn’t particularly technologically innovative – it’s solving problems that DBAs know the answer to already. But it’s business-innovative by making these solutions accessible by dropping the rich feature set of relational databases in favor of a non-relational DB that programmers with very limited DBA experience can actually work with.
That’s a fair point, but this wasn’t the message on the day. And 10gen is a commercial organization that charges for support – they aren’t a charity. Whether you need it or not is up to you of course, but nothing is really free!
Incidentally you can get Oracle, SQLServer et al pretty cheaply as a startup.
Those who don’t know SQL are cursed to reinvent it forever.
The elephant in the room is that the databases that have these features are closed source and commercial. PostgreSQL is the most sophisticated open source database and they have materialized views lined up for the next release, here in 2013. It’s this cultural aversion to and practical lack of access to modern commercial databases that leads developers to believe that the state-of-the-art is what you get in MySQL, and hence anything that fixes its defects must automatically be “new.”
I am uncertain, but I sense the devil here. That is, “the devil you know is better than the devil you don’t”. As technologists we are always strongly biased toward the technologies we know well, versus those we only know something about.
No it is the exact opposite – throwing the baby out with the bathwater.
good ole Lotus beats all of the current document DB’s by far!
– Ease of use