Tuesday, 7 February 2012

Is RavenDB all it's cracked up to be?

It all started with a Vampire Launch.

As a seasoned technologist I’m always happy to bare old battle scars and discuss them candidly with my peers and starry-eyed groupies* over a few pints down the local. Technology, after all, always has problems, and we have taken it upon ourselves to carry that giant boulder up the hill and start afresh when it rolls back down again.

Take, for example, a recent run-in with a search implementation on a property website. We selected a search technology built on Lucene.Net, a high-performance enterprise search platform that is fast becoming an industry standard. Choosing a Lucene-based search provider for a web site heavily reliant on search functionality seemed like a great idea at the time. Fast-forward to go-live and we soon discovered that the search didn’t hold up very well under significant load. This is what I call a “Vampire Launch”, i.e. a web site enjoying a brief moment in the sun before suddenly erupting into flames. One stressful week later, the search was back online and taking the strain of public exposure. This time, however, it was driven by trusty Microsoft SQL.

Before I cover up this grizzled scar (until the next showing, that is), it's worth highlighting three important lessons harvested along the way.

Lesson 1: If you are trying out a new technology, make sure your POC (proof of concept) includes load testing.

Lesson 2: If you are using a new technology, make sure its providers are able to support it.

Lesson 3: Build against interfaces and abstractions in case you ever need to swap out a nasty search implementation.
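To make Lesson 3 concrete, here is a minimal Python sketch (not the author's C#) of coding the site against a search abstraction so the implementation can be swapped without touching callers. `PropertySearch`, `make_search` and both implementation classes are hypothetical names, and in-memory lists stand in for the real SQL and search back ends.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class PropertySearch(ABC):
    """The abstraction the web site codes against (hypothetical name)."""

    @abstractmethod
    def search(self, keyword: str) -> list[str]:
        """Return the ids of properties whose description mentions the keyword."""


class SqlPropertySearch(PropertySearch):
    """Stand-in for a SQL implementation: scans every row, like LIKE '%keyword%'."""

    def __init__(self, rows: list[tuple[str, str]]):
        self._rows = rows  # (id, description) pairs standing in for a table

    def search(self, keyword: str) -> list[str]:
        needle = keyword.lower()
        return [pid for pid, desc in self._rows if needle in desc.lower()]


class IndexedPropertySearch(PropertySearch):
    """Stand-in for a search-engine implementation: looks whole words up in a prebuilt index."""

    def __init__(self, rows: list[tuple[str, str]]):
        self._index: dict[str, list[str]] = {}
        for pid, desc in rows:
            for word in set(desc.lower().split()):
                self._index.setdefault(word, []).append(pid)

    def search(self, keyword: str) -> list[str]:
        return list(self._index.get(keyword.lower(), []))


def make_search(name: str, rows: list[tuple[str, str]]) -> PropertySearch:
    """Choose an implementation by name, e.g. from a query string parameter."""
    implementations = {"sql": SqlPropertySearch, "raven": IndexedPropertySearch}
    return implementations[name](rows)


rows = [
    ("p1", "Two bed flat with garden"),
    ("p2", "Office near the station"),
    ("p3", "Bungalow with a large garden"),
]
for name in ("sql", "raven"):
    # The calling code is identical whichever implementation is plugged in.
    print(name, make_search(name, rows).search("garden"))
```

Because callers only ever see `PropertySearch`, replacing a misbehaving back end is a one-line change in the factory rather than a week of rework.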

So how does this all relate to RavenDB?

When a colleague mentioned RavenDB to me, I had a poke around and discovered that it was one of the more popular open source NoSQL technologies on the market. Not only that, but it was bundled with Lucene.Net, making it a document database coupled with Lucene search capabilities. With an interest in NoSQL technology and a grudge match with Lucene.Net that hadn’t been settled, I set myself the challenge of swapping out our SQL search implementation for RavenDB and then running a like-for-like load test against the two search technologies.

These are my findings from both a programmatic and performance perspective.

Installing RavenDB

There isn't much to installing Raven; it's pretty much a case of downloading the latest build and running the server application.

The server comes with a nice Silverlight management interface which allows you to manage all aspects of RavenDB, from databases to data to indexes. All tasks have a programmatic equivalent, but a decent GUI is an essential tool for noobs like myself.

Storing the Data

My first development task was to write an import routine which read the property data from SQL and added it to a RavenDB database. This was fairly easy: all I needed to do was create a POCO, populate it with data from SQL and save it using the C# Raven API. The POCO was serialised to JSON and saved as a new document in RavenDB.
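The import flow described above (read a row, populate an object, serialise it to JSON, save it as a document) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's C# routine: an in-memory SQLite database stands in for SQL Server, a plain dict stands in for the RavenDB document store, and the table, columns and `Property` fields are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Property:
    # Hypothetical fields; the real POCO is not reproduced in this post.
    Id: str
    Type: str
    Price: int


def import_properties(conn: sqlite3.Connection, store: dict) -> int:
    """Read rows from SQL, map each one onto a Property and save it as a JSON document.

    `store` is an in-memory stand-in for the document database, keyed by document id.
    """
    count = 0
    for row_id, prop_type, price in conn.execute(
        "SELECT id, type, price FROM property ORDER BY id"
    ):
        doc = Property(Id=f"properties/{row_id}", Type=prop_type, Price=price)
        store[doc.Id] = json.dumps(asdict(doc))  # the object becomes a JSON document
        count += 1
    return count


# Usage with an in-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property (id INTEGER PRIMARY KEY, type TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO property (id, type, price) VALUES (?, ?, ?)",
    [(1, "Flat", 250000), (2, "Office", 900000)],
)
store = {}
import_properties(conn, store)
print(store["properties/1"])
```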

The main challenge here was changing my thinking from relational modelling to domain-driven modelling (a paradigm shift required when moving to NoSQL), which includes concepts like aggregate roots, entities and value types. Journeying into this did get a bit metaphysical at times, but here is my understanding of this newfangled schism.

Entity - An entity is something that has a unique identity and meaning in both the business and the system context. In the property web site example, a flat, a bungalow or an office meets these criteria.

Value Type - Part of the entity which does not require its own identity and has no domain or system relevance on its own. For example, a bedroom or a toilet.

Aggregate Root - A master entity with special rules and access permissions that relate to a grouping of similar entities. For example, a property is an aggregate of flats, bungalows and offices. This is the best description of these terms I found.

In this example, I created one Aggregate Root Entity to store all property types.

C# Property POCO
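The original C# POCO is not reproduced here, so the following is only a rough Python analogue of the modelling ideas above, with hypothetical field names and made-up values. `Property` is the aggregate root entity (it has an `Id` and holds every property type), `Room` is a value type with no identity of its own, and the whole aggregate, rooms included, serialises to a single JSON document.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Room:
    """Value type: no identity of its own, only meaningful inside a property."""
    Kind: str        # e.g. "Bedroom", "Toilet"
    AreaSqM: float


@dataclass
class Property:
    """Aggregate root entity: has an Id and owns its rooms."""
    Id: str
    Type: str        # "Flat", "Bungalow", "Office", ... all stored in one aggregate
    Sector: str      # hypothetical, e.g. "Residential" or "Commercial"
    Latitude: float
    Longitude: float
    Rooms: list = field(default_factory=list)

    def add_room(self, kind: str, area: float) -> None:
        # Rooms are changed only through the root and are never saved on their own.
        self.Rooms.append(Room(kind, area))

    def bedrooms(self) -> int:
        return sum(1 for r in self.Rooms if r.Kind == "Bedroom")


flat = Property("properties/1", "Flat", "Residential", 51.5416, -0.1022)
flat.add_room("Bedroom", 12.5)
flat.add_room("Bedroom", 9.0)
flat.add_room("Toilet", 3.0)

# The whole aggregate, nested rooms included, becomes one JSON document.
doc = json.dumps(asdict(flat), indent=2)
print(doc)
```

Nothing outside the aggregate refers to an individual room, which is why rooms need no ids and can live inside the property's document rather than in a table of their own.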

Indexing the Data

Once the data was stored, it needed to be indexed for fast search. To achieve this I had to get to grips with map/reduce functions, which I had seen around but avoided like the sad and lonely-looking bloke** at a FUN party.

The documentation on the RavenDB web site is pretty spartan, but after hacking away I finally created an index that worked on documents with nested types and allowed for spatial queries.
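Conceptually, a proximity query asks which properties lie within some distance of a point. The Python sketch below (not the author's C#, and not how RavenDB or Lucene.Net compute it internally) answers that by brute force using the haversine great-circle formula. A spatial index exists precisely so that a real search does not have to scan every document like this. Property ids and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(properties: list, lat: float, lon: float, radius_km: float) -> list:
    """Brute-force proximity filter: ids within radius_km of (lat, lon), nearest first."""
    hits = sorted((haversine_km(lat, lon, p["lat"], p["lon"]), p["id"]) for p in properties)
    return [pid for dist, pid in hits if dist <= radius_km]


props = [
    {"id": "p1", "lat": 0.0, "lon": 0.1},   # about 11 km from the origin
    {"id": "p2", "lat": 0.0, "lon": 1.0},   # about 111 km
    {"id": "p3", "lat": 0.0, "lon": 0.05},  # about 5.6 km
]
print(within_radius(props, 0.0, 0.0, 20))
```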

RavenDB allows you to define indexes as map/reduce functions written in LINQ. This lets you build a flat Lucene index from a large, tree-like structure of data: the map step projects each document, nested parts included, into flat index entries, much as a SQL join flattens related tables, and the optional reduce step aggregates those entries, much as a GROUP BY does. To create a spatial index which allowed me to search properties by type and sector (nested value types), I created an index using the following map/reduce function.

Index created using the RavenDB Admin GUI
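The author's actual index was written in LINQ and is only shown as a screenshot, so here is a language-neutral Python sketch of the map/reduce idea itself. It is not RavenDB's API, and the document shape is hypothetical. The map step flattens each nested document into a flat entry, and the reduce step groups entries by key and sums them, the equivalent of a GROUP BY.

```python
from collections import defaultdict

# Documents with a nested value type (hypothetical shape).
docs = [
    {"Id": "properties/1", "Type": "Flat", "Sector": {"Name": "Residential"}},
    {"Id": "properties/2", "Type": "Office", "Sector": {"Name": "Commercial"}},
    {"Id": "properties/3", "Type": "Flat", "Sector": {"Name": "Residential"}},
]


def map_fn(doc):
    """Map: flatten a nested document into a flat index entry."""
    yield {"Type": doc["Type"], "Sector": doc["Sector"]["Name"], "Count": 1}


def reduce_fn(entries):
    """Reduce: group entries by (Type, Sector) and sum the counts, like GROUP BY."""
    totals = defaultdict(int)
    for e in entries:
        totals[(e["Type"], e["Sector"])] += e["Count"]
    return [{"Type": t, "Sector": s, "Count": c} for (t, s), c in sorted(totals.items())]


mapped = [entry for doc in docs for entry in map_fn(doc)]
print(reduce_fn(mapped))
```

The reduce emits entries in the same shape as the map, so it can be run again over its own output; that is what allows partial results to be combined incrementally.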

Querying the data

Now that I had indexed data, the final development challenge was querying it. RavenDB has a basic search API and a Lucene Query API for more complex queries, and both allow you to write queries in LINQ. For the kind of complex queries a property search web site requires, however, the API was a bit lacking, so I had to construct my own native Lucene queries. Fortunately, the API allowed me to do so.
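The post doesn't show the hand-built queries, so the following Python sketch only illustrates the general idea of assembling a native Lucene query string from search criteria. The field names `Type` and `Description` are hypothetical, and how the string is then passed to RavenDB is not shown. User input is backslash-escaped because characters such as `-`, `:` and `(` have meaning in Lucene query syntax.

```python
# Characters with special meaning in Lucene query syntax; each is escaped with a backslash.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape(term: str) -> str:
    """Backslash-escape Lucene special characters in a user-supplied term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def build_query(types=(), keywords=()) -> str:
    """Combine optional criteria into one Lucene query string (hypothetical fields).

    Property types are OR'd together; every keyword is required (AND).
    """
    clauses = []
    if types:
        clauses.append("(" + " OR ".join(f"Type:{escape(t)}" for t in types) + ")")
    clauses.extend(f"Description:{escape(w)}" for w in keywords)
    return " AND ".join(clauses) if clauses else "*:*"  # no criteria: match everything


print(build_query(["Flat", "Bungalow"], ["garden", "semi-detached"]))
```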

Performance Testing

All the pawns were now in place for my load test.

The entire property SQL database was mirrored to RavenDB.
The Search Interface now had both a SQL and a RavenDB implementation.
I created a crude web page which switched the search between SQL and RavenDB via query string parameters and output the results with paging. To ensure maximum thrashing, the load tests passed in random geo locations for proximity search and keywords for attribute search.
A VM was set up and ready to face the wrath of BrowserMob.
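The randomised requests described above might be generated along these lines. This is a Python sketch, not the actual BrowserMob script. The parameter names (`impl`, `lat`, `lng`, `q`, `page`), the keyword list, the test URL and the bounding box (roughly Great Britain) are all assumptions for illustration.

```python
import random
from urllib.parse import urlencode

# Hypothetical test data: attribute keywords and a bounding box for random locations.
KEYWORDS = ["garden", "parking", "balcony", "refurbished"]
LAT_RANGE = (50.0, 58.5)   # assumed region, roughly Great Britain
LNG_RANGE = (-5.5, 1.7)


def random_search_params(impl: str, rng: random.Random) -> dict:
    """One randomised search request: a proximity point, a keyword and a results page."""
    return {
        "impl": impl,  # which implementation the page should use: "sql" or "raven"
        "lat": round(rng.uniform(*LAT_RANGE), 5),
        "lng": round(rng.uniform(*LNG_RANGE), 5),
        "q": rng.choice(KEYWORDS),
        "page": rng.randint(1, 5),
    }


def search_url(base: str, params: dict) -> str:
    return f"{base}?{urlencode(params)}"


rng = random.Random(42)  # seeded so a failing request sequence can be replayed
for _ in range(3):
    print(search_url("http://test-vm/search", random_search_params("raven", rng)))
```

Randomising both the location and the keyword keeps the test from being served repeatedly from a warm cache, which would flatter either implementation.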

I created a ramp test scaling from 0 to 1,000 concurrent users, each firing a single GET request at the web page with no think time, and ran it in isolation against the SQL implementation and then in isolation against the RavenDB implementation. The test ran for 30 minutes.
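To put the ramp in perspective, here is a small Python sketch of the load profile. It assumes the ramp was linear over the 30 minutes, which the post does not state. Under that assumption the 250-user level where SQL flatlined would be reached 7.5 minutes in, so roughly three quarters of the run would be spent above it.

```python
def users_at(elapsed_min: float, peak_users: int = 1000, duration_min: float = 30.0) -> int:
    """Concurrent users at a given minute of an (assumed) linear 0-to-peak ramp."""
    if elapsed_min <= 0:
        return 0
    if elapsed_min >= duration_min:
        return peak_users
    return round(peak_users * elapsed_min / duration_min)


def minute_reaching(users: int, peak_users: int = 1000, duration_min: float = 30.0) -> float:
    """Inverse of users_at: the minute at which the ramp reaches a given user count."""
    return duration_min * users / peak_users


print(minute_reaching(250))  # 7.5
```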

And for those of you on the edge of your seat, the results were a resounding victory for RavenDB. Some details of the load tests are below, but the headline is that SQL choked at 250 concurrent users, whereas with RavenDB the response time stayed below 12 seconds even with 1,000 concurrent users.

SQL Load Test

Transactions: 111,014 (Transaction = Single Get Request)
Failures: 110,286 (Any 500 or timeout)

SQL Data Throughput - Flatlines at around 250 concurrent users.

RavenDB Load Test

Transactions: 145,554 (Transaction = Single Get Request)
Failures: 0 (Any 500 or timeout)

RavenDB Data Throughput - What the graph should look like
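The raw counts above are easier to compare as rates. The Python sketch below derives them from the published figures only. It assumes each run lasted the full 30 minutes, so the per-second values are averages over the whole run rather than peak throughput.

```python
def summarise(name: str, transactions: int, failures: int, duration_s: int = 30 * 60) -> dict:
    """Derive success count, failure rate and average successful requests per second."""
    successes = transactions - failures
    return {
        "name": name,
        "successes": successes,
        "failure_rate_pct": round(100 * failures / transactions, 2),
        "successes_per_s": round(successes / duration_s, 1),  # averaged over the run
    }


sql = summarise("SQL", 111_014, 110_286)
raven = summarise("RavenDB", 145_554, 0)
print(sql)
print(raven)
```

On these numbers only 728 SQL requests succeeded (a failure rate of about 99.3%), against roughly 81 successful RavenDB requests per second sustained over the run.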

Final thoughts

RavenDB is a great document database with fairly powerful search capabilities. It has a lot of pluses and a few negatives, which are listed below for your viewing pleasure.

Positives

The documentation, although spartan, does cover the fundamentals, making it easy to get started. On some occasions I did have to sniff through the source code to fathom how things worked, but that is the beauty of open source, I guess.
The Silverlight Admin interface is pretty sweet.
The Raven community (a Google Group) is very active, and the couple of queries I posted were answered almost immediately.
Although the API did present some challenges, it allowed you both to bypass its limitations and to contribute to the project yourself.
The commercial licence for RavenDB is pretty cheap at a one-off payment of $600.

Negatives

The web site documentation and content could do with a facelift. (That said, I just checked the web site and it seems to have been revamped.)
I came across a bug in Lucene.Net related to granular spatial queries which has yet to be resolved. It's not RavenDB's fault, but a dependence on third-party libraries can cause issues.
I struggled to find really impressive commercial reference sites. There are some testimonials but they give little information away.
Sharding scares me.

I look forward to following the progress of RavenDB and hopefully one day using it in a commercial project. I'm not yet comfortable proposing it, but with some more investigation and perhaps some good reference sites, this could change very quickly.

* Starry-eyed groupies sadly don't exist, nor have they ever.

** Not me.

http://ravendb.net