NoSQL Solution: Evaluation and Comparison: MongoDB vs Redis, Tokyo Cabinet, and Berkeley DB [CHART]

You may think this is yet another blog on NoSQL (Not Only SQL) hype.
Yes, it is.
But if at this moment you are still struggling to find a NoSQL solution that works, read through to the end, and you may have decided what to do. (I will keep the answer to the end just for fun.) — For those of you who can't wait for the answer, you can skip to the chart below.
When I was involved in developing Perfect Market's content processing platform, I desperately tried to find an extremely fast — in terms of both latency and processing time — and scalable NoSQL database solution to support simple key-value (KV) lookup.
I had pre-determined requirements for the 'solution-to-be' before I started looking:
- Fast data insertion. Some data sets in our content processing platform may contain hundreds of millions of rows (KV pairs), although each row may be small. If data insertion is slow, populating a data set into the database may take days, which would not be acceptable.
- Extremely fast random reads on large datasets. This is key to achieving short content processing time.
- Consistent read/write speed across the whole data set. What this means is that the speed should not favor certain parts of a data set due to how data is stored or indices are organized.
- Efficient data storage. The ratio of the database size (after original data is loaded into database) to original data size should be as low as possible.
- Scale well. Our content processing nodes in EC2 may spawn a large number of concurrent threads hitting data nodes, which requires data nodes to scale well. Also, not all data sets are read-only. Some data nodes must scale well under moderate write load.
- Easy to maintain. Our content processing platform utilizes both local and EC2 resources. Packaging code, setting up data and running different types of nodes in different environments is not easy. The 'solution-to-be' must be easy to maintain to fit in the highly automated content processing system.
- Have a network interface. A library solution is not sufficient.
- Stable, of course.
I started looking without any bias in mind since I had never seriously used any of the NoSQL solutions. With some recommendations from fellow co-workers, and after reading a bunch of blogs (yes, blogs), the journey of evaluation started with Tokyo Cabinet, then Berkeley DB library, MemcacheDB, Project Voldemort, Redis, and finally MongoDB.
There are other very popular alternatives, like Cassandra, HBase, CouchDB … you name it, but we haven't needed to try them yet because the one we selected worked so well. The result turned out to be pretty amazing and this blog post shares some details of my testing.
To explain which one was picked, and why, I took a suggestion from my co-worker Jay Budzik (CTO) and compiled a comparison chart of all the solutions I evaluated (below). Although this chart was put together after the fact, it still helps show the rationale behind the choice and should be helpful to people who are still in the decision-making process.
Please note that the chart is not 100% objective and scientific. It is a combination of the testing results and my gut feelings. It was funny that I started the evaluation process without any bias, but after testing out all of them, I may be biased (especially biased based on my use cases).
Another thing to note is that disk access is by far the slowest operation in these I/O-intensive workloads. A disk access takes milliseconds, while a memory access takes nanoseconds. To handle a data set containing hundreds of millions of rows, you had better give your computer enough memory. If your computer only has 4GB of memory and you try to handle a 50GB data set and expect ultimate speed, you need to either toss your computer and use a better one, or toss out all of the following solutions because none of them will work.
Looking at this chart, you may start to guess which solution I picked. No rush, let me tell you more about each of them.
Tokyo Cabinet (TC) is a very nice piece of work and was the first one I evaluated. I still like it very much, although it was not ultimately selected for our application. The quality of the work was amazing. The hash table database is extremely fast on small data sets (below 20 million rows, I would say) and horizontal scalability is fairly good. The problem with TC is that when the data size increases, performance degradation is significant, for both reads and writes. Even worse, with large data sets performance is not consistent when accessing different parts of the data set. Accessing data inserted earlier appears to be faster than accessing data inserted later. I’m not an expert on TC, and do not have an explanation for this behavior, but the behavior made it impossible to use TC for our application. Using the TC B-Tree database option did not exhibit the same problem but overall performance was much slower.
Berkeley DB (BDB) and MemcacheDB (a remote interface to BDB) are a pretty old combination. If you are familiar with BDB and are not so demanding on speed and feature set, e.g., you are willing to wait a couple of days to load a large data set into the database and you are happy with an OK but not excellent read speed, you can still use it. For us, the fact that it took so long to load the initial data set made it a poor fit.
Project Voldemort was the only Java-based and “cloud” style solution I evaluated. I had very high expectations before I started due to all the hype, but the result turned out to be a little disappointing, and here is why:
- BDB Java Edition bloated my data too much (approximately a 1 to 4 ratio, while for TC it is 1 to 1.3). Basically the storage efficiency is very low. For large data sets, this is a disaster.
- Insertion speed drops significantly when the database gets bigger.
- Crashed sometimes with obscure exceptions while a large data set was being loaded.
Since the data was bloated so much and crashes sometimes happened, the data loading process did not even finish. With only one quarter of the data set populated, read speed was OK but not excellent. At that point I thought I had better give up on it. Otherwise, on top of the problems listed above, tuning the JVM might turn more of my hair gray, even though I worked for Sun for five years.
Redis is an excellent caching solution and we almost adopted it in our system. Redis stores the whole hash table in memory and has a background thread that saves a snapshot of the hash table onto the disk based on a preset time interval. If the system is rebooted, it can load the snapshot from disk into memory and have the cache warmed at startup. It takes a couple of minutes to restore 20GB of data depending on your disk speed. This is a great idea and Redis was a decent implementation.
But it did not fit our use cases well. The background saving process still bothered me, especially as the hash table got bigger. I feared it might negatively impact read speed. Using logging-style persistence instead of saving the whole snapshot could mitigate the impact of these big dumps, but the log will bloat if data is updated frequently, which may eventually hurt restore time. The single-threaded model does not sound that scalable either, although in my testing it scaled pretty well horizontally with a few hundred concurrent reads.
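The trade-off between snapshots and logging-style persistence can be illustrated with a toy model. This is only a sketch of the general idea, not how Redis implements either mode; `ToyStore` and its methods are hypothetical names made up for this example. It assumes a workload that keeps overwriting the same keys, which is the case where a log grows much faster than a snapshot.

```python
class ToyStore:
    """Toy key-value store that tracks two persistence strategies (illustrative only)."""

    def __init__(self):
        self.data = {}  # live in-memory hash table
        self.log = []   # append-only log: one record per write

    def set(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # every write is appended, even an overwrite

    def snapshot_records(self):
        # A snapshot holds only the latest value of each live key.
        return len(self.data)

    def log_records(self):
        # The log holds the full write history until something compacts it.
        return len(self.log)

    def replay(self):
        # Restoring from the log means re-applying every record in order.
        restored = {}
        for key, value in self.log:
            restored[key] = value
        return restored


store = ToyStore()
for round_number in range(100):  # 100 rounds of updates...
    for key in range(1000):      # ...to the same 1,000 keys
        store.set(key, round_number)

print(store.snapshot_records())  # 1000
print(store.log_records())       # 100000
```

In this toy workload the log is 100 times larger than the snapshot, and a restore has to replay all of it to arrive at the same 1,000 values.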
Another thing that bothered me with Redis was that the whole data set must fit into physical memory. It would not be easy to manage this in our diversified environment in different phases of the product lifecycle. Redis’ recent release on VM might mitigate this problem though.
MongoDB is by far the solution I love the most among all the solutions I evaluated. It was the winner of the evaluation process and is currently used in our platform.
MongoDB provides distinctly superior insertion speed, probably due to deferred writes and fast file extension with multiple files per collection structure. As long as you give enough memory to your box, hundreds of millions of rows can be inserted in hours, not days. I would post exact numbers here but they would be too specific to be useful. But trust me: MongoDB offers very fast bulk inserts.
MongoDB uses memory-mapped files, and it usually takes only nanoseconds to resolve the minor page faults that map file-system-cached pages into MongoDB’s memory space. Unlike other solutions, MongoDB does not compete with the page cache, since for read-only blocks they are the same memory. With other solutions, if you allocate too much memory to the tool itself, the box may fall short on page cache, and there is usually no easy or efficient way to fully pre-warm the tool’s cache (you definitely don’t want to read every row beforehand!).
For MongoDB, it’s very easy to do some simple tricks (copy, cat or whatever) to have all data loaded in page cache. Once in that state, MongoDB is just like Redis, which performs super well on random reads.
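The "copy, cat or whatever" trick amounts to reading the data files once from start to finish so the operating system keeps their pages in the page cache. A minimal sketch of that idea follows, assuming memory-mapped data files as described above. The function name `warm_page_cache` is made up for this example, and the demo uses a temporary directory rather than a real MongoDB data directory.

```python
import os
import tempfile


def warm_page_cache(directory, chunk_size=1 << 20):
    """Read every regular file under `directory` sequentially.

    Returns the number of bytes read. The data itself is discarded;
    the point is only to make the OS pull the pages into its cache.
    """
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


# Demo on a throwaway file; a real run would point at the data directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "datafile.0"), "wb") as f:
        f.write(b"x" * 3_000_000)
    bytes_read = warm_page_cache(demo_dir)

print(bytes_read)  # 3000000
```

Whether the pages stay cached depends on how much free memory the box has, so this only helps when the data set fits in memory.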
In one of my tests, MongoDB showed 400,000 QPS overall with 200 concurrent clients doing constant random reads on a large data set (hundreds of millions of rows). In the test, data was pre-warmed in page cache. In later tests, MongoDB also showed great random read speed under moderate write load. For a relatively big payload, we compress it and then save it in MongoDB to further reduce data size so more stuff can fit into memory.
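For readers who want to run a similar measurement, here is a rough sketch of a concurrent random-read harness. It is not the code used for the test above. The function `measure_random_read_qps` and its parameters are hypothetical, and the demo uses an in-process dictionary as a stand-in for a database client, so the number it prints reflects only the harness itself.

```python
import random
import threading
import time


def measure_random_read_qps(lookup, keys, clients=8, duration=1.0, seed=0):
    """Run `clients` threads that call lookup(key) on random keys for
    roughly `duration` seconds. Returns (total_reads, reads_per_second)."""
    counts = [0] * clients
    deadline = time.perf_counter() + duration

    def worker(index):
        rng = random.Random(seed + index)  # independent key stream per client
        done = 0
        while time.perf_counter() < deadline:
            lookup(rng.choice(keys))
            done += 1
        counts[index] = done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, total / elapsed


# Stand-in data set: a dict plays the role of the key-value store.
table = {f"key-{i}": i for i in range(10_000)}
total_reads, qps = measure_random_read_qps(
    table.__getitem__, list(table), clients=4, duration=0.2
)
print(f"{total_reads} reads, {qps:.0f} QPS")
```

To test a real store, `lookup` would wrap the client's get-by-key call. Python threads share one interpreter lock, so a serious load test would likely need multiple processes or machines to generate enough load.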
MongoDB provides a handy client (similar to MySQL’s) which is very easy to use. It also provides advanced query features, and features for handling big documents, but we don’t use any of them. MongoDB is very stable and almost zero maintenance, except you may need to monitor memory usage when data grows. MongoDB has rich client support in different languages, which makes it very easy to use. I will not go through the laundry list here but I think you get the point.
Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs. If you only need to handle small data sets, Tokyo Cabinet is pretty neat. If you need to handle huge data sets (petabytes), have a lot of machines, and are not pursuing the ultimate response time, Cassandra or HBase might be a good fit.
Lastly, if you still need to deal with transactions, don’t bother with NoSQL, use Oracle.
— Jun Xu
Principal Software Engineer
Note: For more great insights into the NoSQL solution, check out "NoSQL Comes of Age: Why Perfect Market Likes MongoDB and Why You Should, Too" where Perfect Market's Chris Germano discusses new features added to MongoDB.

Comments
Lukas says:
One sentence that is very important to highlight:
“Please note that the chart is not 100% objective and scientific.”
So why do you post such information? Where are the insights? How did you get to your 100 % subjective conclusions?
Is there any reproducible test that you can claim?
November 08, 2010
antirez says:
I’m the lead developer of the Redis project, there are a few things I don’t understand:
- Redis low performance on “big data set”. This sounds very strange, Redis performances are very deterministic being an in-RAM database. I wonder how this test was run.
- Redis community / activity is rated as two stars. This is very surprising, there are two people working at the project full time, an impressive community with the latest Redis and London meetups with many people attending. An IRC channel with more than 130 users when it is work time in the Pacific timezone, every day. A mailing list that is very active and with 1000 subscribed users. And we are near to release 2.2 after less than two months of development.
- Stability. MongoDB is a very cool project, but I’m sure that objectively Redis was, in the latest two years, a more stable deal. Ask around… Redis crashes are practically non existing in our stable releases history, and I in the past got different reports from Mongo DB users. Now I’m sure MongoDB reached stability and is stable as well, but how Redis was rated less stable than MongoDB?
May continue with many other dubious metrics, but those one may well be a subjective matter, while the one I pointed out are not I think.
Cheers,
Salvatore
November 08, 2010
Jun Xu says:
Obviously, all of the tests are reproducible and the chart was concluded from hard testing results. However, I am not sure the extent to which these results are dependent on the specific configuration and application. Your mileage may vary. When I said that it was not 100% objective and scientific, I meant that I cannot guarantee 100% accuracy on whether for certain metrics it is 3 pointy fingers, 3.5 pointy fingers or 4 pointy fingers. Sometimes “100% objective” benchmarks flooded with numbers of 10 significant digits are also misleading, right? If you want to test by yourself the scenario of 400,000 overall QPS with 200 concurrent clients on a data set of 100+ million rows, please send me an email and I can help you set that up. If there are more requests on that, I can post the steps here. I cannot post the exact code here because it is too specific to our environment.
November 08, 2010
toni says:
Nice blog post.
I am struggling to find a good nosql solution with fast aggregation semantics.
Do any of the nosql solution you tried provide such a functionality?
MongoDB is okish, but I don’t like writing my map reduce in javascript.
It also seems rather slow.
November 09, 2010
Jun Xu says:
Thanks, antirez, for the information you provided in your comment. Actually I give Redis highest ranking on random read speed exactly same as MongoDB. In terms of large data insertion, as I said in my blog, I was not happy about writing the huge data set from memory to disk in a fixed time interval. With EC2 high memory extra large instances, the in-memory data set could get really big (maybe 50GB to 60GB) and the snapshot saving process will take long time. I did mention in my blog that changing to the update log mode could mitigate this problem but it may have other problems. I agree that community support is a more subjective area and I noticed that the VM release did take a while to go out at the time I was doing evaluation. I really appreciate that you have provided enough information to clarify the situation - This is what a blog is for.
November 09, 2010
cool says:
Very nice article. Would love to see CouchDB and Cassandra added in the future.
November 09, 2010
Jun Xu says:
Hi Toni,
Aggregation semantics in NoSQL was not my focus during evaluation. I was not aware of any built-in language constructs supporting that in any of these solutions.
November 10, 2010
Edimar says:
I wonder how it’s done to refresh the cache for testing MongoDB
November 18, 2010
James says:
you missed membase…
November 23, 2010
Perfect Market says:
Hi James,
Membase looks promising. Should be included in the next round.
November 23, 2010
Luca says:
A comparison with OrientDB would also be interesting.
November 24, 2010
Gregory Burd says:
Hello, I’m a product manager for Berkeley DB at Oracle. I’m surprised at your lackluster results with Berkeley DB, especially in the bulk load (you did use the bulk load APIs, correct http://download.oracle.com/docs/cd/E17076_02/html/api_reference/C/txnbegin.html#txnbegin_DB_TXN_BULK ?) and random read. Without publishing your benchmark code it is hard for me to offer tuning advice on Berkeley DB, or even to know if you used Berkeley DB’s Java or C-based products (or even which access method, BTREE or HASH table?). I’m not discounting your results, just questioning your methods. I’ll fully admit that tuning Berkeley DB is tricky, we’ve not done a lot to help people in that regard. Our default configurations are not appropriate for today’s hardware.
Publish your code, ask questions on our forums, ping us via @berkeleydb before you publish results for any benchmark.
I look forward to helping you further improve your Berkeley DB results.
November 24, 2010
Jun Xu says:
Hi Gregory,
I did not do in-depth optimization to the products being evaluated, which, I think, was “fair” but maybe not the favored approach to all products mentioned in this blog. For Tokyo Cabinet I did change bnum to be the recommended huge number and for BDB I only increased buffer size to 1 GB. I (and other people too, I think) appreciate that you are happy to provide help on tuning BDB. I’m wondering if you can provide results on a tuned-up BDB for the following scenario on an EC2 high-memory (m2.4xlarge) instance:
Randomly generate 200 million key-value pairs (most of them should be unique). The key size should vary around 35 characters and the value size should vary around 40 characters. Basically the raw data size is about 15 GB.
Load these key-value pairs to BDB, of course, key uniqueness must be enforced during loading. But data does not need to be synced to disk for every insert as long as it is eventually on disk after the loading process is done.
For MongoDB, loading this data set took 4 hours using the primary key column (_id) to store keys and another column to store values on the instance I mentioned above. If BDB can do a better job, please post your result and settings for BDB here so everyone can benefit from the post.
Thank you!
Best,
Jun
November 24, 2010
Jonas Lejon says:
Wow, nice comparison Jun. I’ve made some research myself and have come up with almost the same results as you. Thanks
January 03, 2011
Jay says:
Thanks for the effort to compare everything. Did you post anything with a membase comparison?
August 14, 2011
ze yu says:
nice work. better to paste the param settings of different DBs.
September 19, 2011
morrel says:
Your claim about BerkeleyDB’s slow insertion speed seems rather hasty to me.
I’m not quite sure you’ve used all the right flags.
Keep in mind that BDB has been around for quite a while, and thus has lots of tried and tested tweaks and flags for about every need a developer has.
You might try
http://download.oracle.com/docs/cd/E17076_02/html/api_reference/C/txnbegin.html
October 28, 2011
morrel says:
Basically, the data load speed figures seem very very poor
We load 30Gb of data, into tables with over 10 constraints, autorefreshing materialized views, etc… into an Oracle database in less than 15 minutes using Oracle Warehouse Builder.
We use Non-relational databases like BerkeleyDB to present some data, and cache some simple, very frequently used queries. And that works well, provided you tune the BDB data load phase with the right flags suiting your needs.
Cheers
Morrel
October 28, 2011
pierre says:
Just have a look at leveldb, a nosql key/value database, very very impressive speed!
http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
November 27, 2011