Tuesday, 7 February 2012

Is RavenDB all it's cracked up to be?

It all started with a Vampire Launch.
As a seasoned technologist I’m always happy to bare old battle scars and discuss them candidly with my peers and starry-eyed groupies* over a few pints down the local. Technology, after all, always has problems, and we have taken it upon ourselves to carry that giant boulder up the hill and start afresh when it rolls back down again.

Take, for example, a recent run-in with a search implementation on a property website. We selected a search technology built on Lucene.Net - a high-performance search library that is fast becoming an industry standard. Choosing a Lucene-based search provider for a web site heavily reliant on search functionality seemed like a great idea at the time. Fast forward to go-live and we soon discovered that the search didn’t hold up under significant load. This is what I call a “Vampire Launch”, i.e. a web site having a brief moment in the sun and then suddenly erupting into flames. One stressful week later the search was back online and taking the strain of public exposure. This time, however, it was driven by trusty Microsoft SQL.

Before I cover up this grizzled scar (until the next showing that is), it's worth highlighting three important lessons harvested along the way.

Lesson 1: If you are trying out a new technology, make sure your POC includes load testing.

Lesson 2: If you are using a new technology, make sure the providers of the technology are able to support it.

Lesson 3: Make sure you build using interfaces and abstraction in case you ever need to swap out a nasty search implementation.
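Lesson 3 can be sketched in a few lines of C#. This is a minimal illustration, not the code from the project: the `IPropertySearch` interface and both implementations are hypothetical, and each one simply filters an in-memory list so that the sketch runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract: the web site depends on this, never on a concrete engine.
public interface IPropertySearch
{
    IList<string> Search(string keyword);
}

// Shared in-memory matching so the sketch runs without any search engine installed.
public abstract class InMemorySearchBase : IPropertySearch
{
    private readonly string _engine;
    private readonly string[] _titles;

    protected InMemorySearchBase(string engine, IEnumerable<string> titles)
    {
        _engine = engine;
        _titles = titles.ToArray();
    }

    public IList<string> Search(string keyword)
    {
        // Case-insensitive substring match, tagged with the engine that served it.
        return _titles
            .Where(t => t.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(t => _engine + ": " + t)
            .ToList();
    }
}

// Stand-ins for the two real implementations (assumed names).
public sealed class SqlPropertySearch : InMemorySearchBase
{
    public SqlPropertySearch(IEnumerable<string> titles) : base("sql", titles) { }
}

public sealed class RavenPropertySearch : InMemorySearchBase
{
    public RavenPropertySearch(IEnumerable<string> titles) : base("raven", titles) { }
}

public static class Program
{
    public static void Main()
    {
        var titles = new[] { "Two bed flat", "Office suite", "Garden flat" };

        // Swapping engines is a one-line change here; callers only see IPropertySearch.
        IPropertySearch search = new RavenPropertySearch(titles);

        foreach (var hit in search.Search("flat"))
        {
            Console.WriteLine(hit);
        }
    }
}
```

The point of the design is that the page code never names a concrete engine, so replacing a misbehaving implementation means touching only the line that constructs it.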

So how does this all relate to RavenDB?

When a colleague mentioned RavenDB to me I had a poke around and discovered that it was one of the more popular open source NoSQL technologies on the market. Not only that, but it was bundled with Lucene.Net search, making it a document database coupled with Lucene search capabilities. With an interest in NoSQL technology and a grudge match that hadn’t been settled with Lucene.Net, I set myself the challenge of swapping out our SQL search implementation for RavenDB and then running a like-for-like load test against the two search technologies.

These are my findings from both a programmatic and performance perspective.

Installing RavenDB

There isn't much to installing Raven; it's pretty much a case of downloading the latest build and running the Server application.

The server comes with a nice Silverlight management interface which allows you to manage all aspects of RavenDB, from databases to data to indexes. All tasks have a programmatic equivalent, but a decent GUI is an essential tool for noobs like myself.

Storing the Data

My first development task was to write an import routine which parsed the property data in SQL and then added it to a Raven database. This was fairly easy: all I needed to do was create a POCO (a plain C# class), fill it with data from SQL and save it using the C# Raven API. The POCO was serialised to JSON and saved as a new document in RavenDB.

The main challenge here was changing my thinking from relational modelling to domain-driven modelling - a paradigm shift required when moving to NoSQL - which includes concepts like aggregate roots, entities and value types. Journeying into this did get a bit metaphysical at times, but here is my understanding of this newfangled schism.

Entity - An entity is something that has a unique identity and meaning in both the business and system context. In the property web site example, a flat or a bungalow or an office match these criteria.

Value Type - Part of the entity which does not require its own identity and has no domain or system relevance on its own. For example, a bedroom or a toilet.

Aggregate Root - A master entity with special rules and access permissions that relate to a grouping of similar entities. For example, a property is an aggregate of flats, bungalows and offices. This is the best description of these terms I found.

In this example, I created one Aggregate Root Entity to store all property types.

C# Property POCO
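As a rough illustration of the shape such a POCO might take, here is a minimal sketch. The class and property names are hypothetical, not the ones used on the project, and `System.Text.Json` is used purely to show what the resulting JSON document looks like; the Raven client does its own serialisation when a document is stored.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Value type: has no identity of its own and only means something inside a property.
public class GeoPoint
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

// Aggregate root: one document per property, whatever kind of property it is.
public class Property
{
    public string Id { get; set; }
    public string Title { get; set; }
    public int Bedrooms { get; set; }
    public GeoPoint Location { get; set; }

    // Nested collections of simple values, stored inside the same document.
    public List<string> Types { get; set; } = new List<string>();
    public List<string> Sectors { get; set; } = new List<string>();
}

public static class Program
{
    public static void Main()
    {
        var property = new Property
        {
            Id = "properties/1",
            Title = "Two bed flat",
            Bedrooms = 2,
            Location = new GeoPoint { Latitude = 51.5, Longitude = -0.12 },
            Types = { "Flat" },
            Sectors = { "Residential" }
        };

        // Print the JSON document that a POCO of this shape serialises to.
        var options = new JsonSerializerOptions { WriteIndented = true };
        Console.WriteLine(JsonSerializer.Serialize(property, options));
    }
}
```

Everything hangs off the one root object, so there are no foreign keys or join tables: the location, types and sectors travel with the property as a single document.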

Indexing the Data

Once the data was stored it needed to be indexed for fast search. To achieve this I had to get to grips with map reduce functions, which I had seen around but avoided like the sad and lonely looking bloke** at a FUN party.

The documentation on the RavenDB web site is pretty spartan, but after hacking away I finally created an index that worked on documents with nested types and allowed for spatial queries.

RavenDB allows you to create indexes using Map Reduce functions written in LINQ. This lets you build a Lucene index from a large, tree-like structure of data. Map Reduce functions give you much the same capability as joins and GROUP BY statements in SQL. To create a spatial index which allowed me to search properties by type and sector (nested value types), I created an index using the following Map Reduce function.

Index created using the Raven DB Admin GUI
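To see what a map function over nested value types does to the data, here is a minimal sketch in plain LINQ to Objects. It is not the RavenDB index API and not the index from the project: the `Property`, `IndexEntry` and `PropertyIndexSketch` names are hypothetical, and the spatial part is left out. It only shows how nested `from` clauses flatten one document into several index entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document with two nested collections of values.
public class Property
{
    public string Id { get; set; }
    public List<string> Types { get; set; } = new List<string>();
    public List<string> Sectors { get; set; } = new List<string>();
}

// One flat, searchable row, which is what ends up in the index.
public class IndexEntry
{
    public string Id { get; set; }
    public string Type { get; set; }
    public string Sector { get; set; }
}

public static class PropertyIndexSketch
{
    // Same shape as a map function: each nested "from" multiplies the entries,
    // so a document yields one entry per (type, sector) pair.
    public static List<IndexEntry> Map(IEnumerable<Property> docs)
    {
        return (from doc in docs
                from type in doc.Types
                from sector in doc.Sectors
                select new IndexEntry { Id = doc.Id, Type = type, Sector = sector })
               .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var doc = new Property
        {
            Id = "properties/1",
            Types = { "Office", "Retail" },
            Sectors = { "North", "Central", "South" }
        };

        // 2 types x 3 sectors = 6 index entries for a single document.
        Console.WriteLine(PropertyIndexSketch.Map(new[] { doc }).Count);
    }
}
```

The multiplication is worth keeping in mind when modelling: every extra nested collection in the map multiplies the number of entries the index has to hold.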

Querying the data

Now that I had data that was indexed, the final development challenge was querying it. RavenDB has a basic search API and a Lucene Query API for more complex queries. Both allow you to write queries in LINQ. For the kind of complex queries you would need in a property search web site, the API was a bit lacking. To work around this I had to construct my own native Lucene queries. Fortunately the API allowed me to do so.
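Constructing a native Lucene query mostly comes down to building a string in Lucene's query syntax (`Field:"value"` clauses joined with `AND`). The helper below is a minimal sketch of that idea under stated assumptions: `LuceneQueryBuilder` and the field names are hypothetical, it only handles exact-phrase clauses, and how the string is then handed to the Raven API is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class LuceneQueryBuilder
{
    // Builds a query string such as: Type:"Flat" AND Keywords:"garden"
    public static string Build(IEnumerable<KeyValuePair<string, string>> terms)
    {
        var clauses = new List<string>();

        foreach (var term in terms)
        {
            // Filters the user left empty are skipped rather than matched against "".
            if (string.IsNullOrWhiteSpace(term.Value))
            {
                continue;
            }

            // Inside a quoted phrase, backslashes and quotes need escaping.
            var escaped = term.Value.Replace("\\", "\\\\").Replace("\"", "\\\"");
            clauses.Add(term.Key + ":\"" + escaped + "\"");
        }

        return string.Join(" AND ", clauses);
    }
}

public static class Program
{
    public static void Main()
    {
        var query = LuceneQueryBuilder.Build(new[]
        {
            new KeyValuePair<string, string>("Type", "Flat"),
            new KeyValuePair<string, string>("Sector", ""),
            new KeyValuePair<string, string>("Keywords", "garden")
        });

        Console.WriteLine(query); // Type:"Flat" AND Keywords:"garden"
    }
}
```

Building the string by hand gives full control over the query, at the cost of taking on the escaping yourself.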

Performance Testing

All the pawns were now in place for my load test.
  • The entire property SQL database was mirrored to RavenDB.
  • The search interface now had both a SQL and a RavenDB implementation.
  • I created a crude web page which allowed switching the search from SQL to RavenDB via query string parameters and output the results using paging. To ensure maximum thrashing, the load tests passed in random geo locations for proximity search and keywords for attribute search.
  • A VM was set up and ready to face the wrath of BrowserMob.
I created a ramp test scaling from 0 to 1000 concurrent users, each firing a single GET request with no think time at the web page, and ran it in isolation against the SQL implementation and then in isolation against the RavenDB implementation. The test ran for 30 minutes.

And for those of you on the edge of your seat, the results were a resounding victory for RavenDB. Some details of the load test are below, but the headline is that SQL choked at around 250 concurrent users, whereas RavenDB kept its response time below 12 seconds even with 1000 concurrent users.

SQL Load Test

Transactions: 111,014 (Transaction = Single Get Request)
Failures: 110,286 (Any 500 or timeout)

SQL Data Throughput - Flatlines at around 250 concurrent users.

RavenDB Load Test

Transactions: 145,554 (Transaction = Single Get Request)
Failures: 0 (Any 500 or timeout)

RavenDB Data Throughput - What the graph should look like
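The raw counts above are easier to compare as rates. The sketch below does the arithmetic, assuming each run lasted the full 30 minutes; the `LoadTestSummary` helper is hypothetical, and a simple average over a ramp test says nothing about where on the ramp the failures occurred.

```csharp
using System;
using System.Globalization;

public static class LoadTestSummary
{
    // Fraction of requests that ended in a 500 or a timeout.
    public static double FailureRate(int transactions, int failures)
    {
        return (double)failures / transactions;
    }

    // Average successful requests per second over the whole run.
    public static double SuccessesPerSecond(int transactions, int failures, int durationSeconds)
    {
        return (double)(transactions - failures) / durationSeconds;
    }
}

public static class Program
{
    public static void Main()
    {
        const int duration = 30 * 60; // assumption: each run lasted the full 30 minutes

        Report("SQL", 111014, 110286, duration);
        Report("RavenDB", 145554, 0, duration);
    }

    private static void Report(string name, int transactions, int failures, int seconds)
    {
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture,
            "{0}: {1:F1}% failed, {2:F1} successful requests/sec",
            name,
            100 * LoadTestSummary.FailureRate(transactions, failures),
            LoadTestSummary.SuccessesPerSecond(transactions, failures, seconds)));
    }
}
```

On these figures only 728 of the SQL requests succeeded, a failure rate of roughly 99.3%, while RavenDB averaged about 81 successful requests per second across its run.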

Final thoughts

RavenDB is a great document database with fairly powerful search capabilities. It has a lot of pluses and a few negatives, which are listed for your viewing pleasure below.

  • The documentation, although spartan, does cover the fundamentals, making it easy to get started. In some instances I did have to sniff through the source code to fathom how some things worked, but that is the beauty of open source I guess.
  • The Silverlight Admin interface is pretty sweet.
  • The Raven community (a Google Group) is very active and the couple of queries I posted were responded to almost immediately.
  • Although the API did present some challenges, it allowed you to bypass its limitations, and you can even contribute to the project yourself.
  • The commercial licence for RavenDB is pretty cheap at a $600 one-off payment.
  • The web site documentation and content could do with a facelift. (Saying that, I just checked the web site and it seems to have been revamped.)
  • I came across a bug in Lucene.Net related to granular spatial queries which has yet to be resolved. Not RavenDB's fault, but a dependence on third party libraries can cause issues.
  • I struggled to find really impressive commercial reference sites. There are some testimonials but they give little information away. 
  • Sharding scares me. 
I look forward to following the progress of RavenDB and hopefully one day using it in a commercial project. I'm not at the comfort level yet for proposing it, but with some more investigation and perhaps some good reference sites this could change very quickly.

* Starry Eyed groupies sadly didn't exist, nor have they ever.

** Not me.



  1. Thanks for this. Your post was timely. I'm looking at spatial search and this is an interesting piece of research. Thanks for sharing.

  2. Thank you for doing this awesome post! It would also be very interesting to see some of the tests (and the code) you used. I'm also curious what difference raising the number of concurrent requests made, so maybe you have some more graphs to show?

  3. This comment has been removed by the author.

  4. Thanks Ben / Daniel.

    Daniel - If you drop me your email I will send some info over.

    Thanks also to Oren Eini and Itamar Syn-Hershko for their feedback.

    -- Begin Email cut n paste --

    1. In the context of RavenDB, aggregate roots are your POCOs, and everything in it is Value Types. There's no difference between Entities and Aggregate Roots.
    2. Most of the time you can't just move from SQL to RavenDB. Modeling for a document database is _very_ different and requires a different way of thinking. Judging only by your index definition, there may be a better way of modeling the data for your application, one that will allow it to run faster and better preserve transactional boundaries.
    3. That said, RavenDB could probably perform even better with a different model. The current model produces an index with a lot more records than probably needed (the nested "from" selection in the Map function)
    4. Why do you find Sharding scary?
    On a side note, could it be that your bad experience with Lucene.NET is due to not following the use recommendation (reusing index searchers, analyzers, etc)?

    -- End Email cut n paste --

    Itamar - the bad experience we think was due to deadlock issues in the 3rd party implementation. Also, I find sharding scary mainly because I've never done it before.

  5. Your performance test indicated that RavenDB should be much faster than the old SQL Server based implementation. Then what caused it to burst into flames in production?

  6. Hi. We originally used another product (not RavenDB) which was based on Lucene.NET. The likely issue was to do with the product's implementation on top of Lucene and not Lucene itself.

  7. Mark,
    I really dig this RavenDB post. I've been tracking RavenDB discussions at DZone.com (you should check it out if you haven't already), and I was wondering if you'd give me permission to repost some of the content of your blog. I think it would be appreciated by the community of developers at DZone. Let me know what you think!

    Eric Genesky
    Content Curator
    DZone, Inc.

  8. I would really like to see some more details on the actual test being run. Can you show us the SQL Server model? The code being run in the SQL Server & RavenDB versions would be much appreciated as well.

    Without any more details of what's actually being run, we have no way to interpret the results.

  9. Eric - Sure, feel free to repost the content.

    Mark - Agreed, to get a true understanding of the performance difference between the SQL and Raven variants, code is king. In our SQL implementation the property data was stored in a Big Table and we created a stored proc to query against Table Indexes. We didn't have a normalised model / table structure to work with. This was one of the challenges we faced when moving the search from the original Lucene based implementation to Microsoft SQL, i.e. creating a high performance stored procedure which could handle complex queries for property information. I will create a follow up post with some cuts of the SQL/Raven code to give a deeper insight into the two implementations and some more graphs detailing the ramp test.

  10. I love RavenDB and I'm using it in my project, but these results are completely misleading.

    You show no information on the queries being executed. No mention of the indexes created on SQL Server, no QueryPlan or IO stats to show that the queries are efficient.

    The results indicate that you haven't done anything at all to SQL Server other than create a table and shove some data in it.

  11. What version and edition of SQL Server are you using? The Express editions have various hardware-utilization limits (http://en.wikipedia.org/wiki/SQL_Server_Express) which could skew the numbers.

  12. What about MongoDB over RavenDB?

  13. Mark - More info would be excellent. However, I don't think a benchmark can be taken seriously unless the full source and model is made available.

    Without full transparency, we have no way of knowing how good/bad the implementation is, or how manipulated the results are. If you don't want to reveal the code/model, that's fine, but in that case I wouldn't go about posting results as skewed as these.

  14. As I understand, indexes in Raven are somewhat similar to Indexed views / Materialised views in relational databases. It would be interesting to know whether you used similar indexes on both databases.

  15. I need to develop an application like http://support.skype.com which has features like help topics, metadata tags and keywords, and search is the heart of the app. One more thing: it should work both on-demand and on the desktop.
    RavenDB looks like a good fit for my requirements because of its content management, tags and searching.

    But I am facing some opposition to introducing RavenDB, as my manager wants a proven technology which is used in enterprise applications.

    Is there any commercial / enterprise application successfully using RavenDB?

  16. Hi Usma,

    I would recommend sending an email to the guys at RavenDB to get reference sites.