Friday, 10 February 2012

RavenDB Vs SQL - A follow up

The fight they were all waiting for
This is a following on post from RavenDB is it all it’s cracked up to be. A few have commented that to provide any meaningful interpretation of the load test they needed to know more about the SQL implementation which is fair enough. So, here we go.

The load test was executed against our SQL Implementation currently driving the search behind a leading estate agent property web site in the UK, with no performance problems. We naturally couldn't run a load test against the production environment but setup a staging environment which mirrored the production environment in terms of application (.Net 4 Web Application running on IIS 7.5), codebase, Operating System and Server Configuration.

The load test executed between SQL and RavenDB was against the same VM, Web Application and Operating System. The only difference was the data stores and the search provider. The test ultimately proved the differences in performance between our current implementation between RavenDb and SQL i.e. a real world, practical example of one search provided vs another. The load test was setup using browser mob and we ran a ramp test scaling from 0 to 1000 users. I chose a ramp test to assess where the saturation / breaking point was in the two technologies

Software / Hardware


OS

Windows Server 2008 R2 x64

Server Specifications

VM, dual core 2.40 GHz, 4 GB RAM

Web Server

IIS 7.5, .Net 4

SQL Database

SQL Server 2008 R2

RavenDB Build

RavenDB-Build-573

Front End

ASP.Net Web Form switching search providers based on query string parameters.

Load Test Configuration

Ramp test scaling up from 0 to 1000 Concurrent Users. Two scripts were created:
one to execute an Attribute Search and one to execute a Proximity/Spatial Search. The distribution between the two tests was 50/50. Each script also passed in randomised search parameters for both the attribute and proximity search. Each user executed single get request with no think time between request. A single transaction was 1 get request which didn't timeout (30 seconds) or return an error.

Load Testing

Another meaningful graph is the response time between the two tests.

RavenDB - Response Time

SQL - Response Time
These two graphs illustrate the average response time between one of the scripts for both SQL and RavenDB. What this illustrates is that the RavenDB scaled well with concurrency serving request in under 12 seconds, even with 1000 concurrent users.

So your SQL implementation is poor then? 

Well from a pragmatic perspective, no. The web site works well and the client is happy that customers can use it to find and buy properties, shelling out exorbitant fees into the hands of greedy estate agents. From a technical perspective we had to query the data from a SQL Big Table, and didn't have the luxury of designing a normalised model from scratch. This I think is interesting from the perspective of real world challenges we face in creating systems and have to adapt to challenges and obstacles in our path. We did have to work on the SQL stored procedure to krank out the performance we needed and our resident SQL guru had to step in and flex his muscles. This best expressed in an email sent to me with the subject line, "This is why your SQL is slow".

If anyone is interested to check out the SQL code I've uploaded it here. The RavenDB code was really straight forward working against a Domain Model and the defined index. The only complexity was of course limitations in the API but creating native Lucene queries worked around this. (As described in blog post)

So is RavenDB better than SQL?

In this scenario, yes. However I do realise that database design varies for every project so it would be a wild and unwise accusation to blanket this statement across every SQL Implementation Installation in the world ;-) Also there are a number of optimisation tricks suggested by colleagues (following on from the post) and the readers of the post. Thanks for all the technical insight and I will endeavour to so some more tweaks, changes and more tests time, virtual machines and browser mob credits permitting.

The aim of the previous post was to create a like for like test – keeping as many variables the same - against a real world solution, whilst investigating RavenDB.

And in this case, RavenDB put SQL doooown!



17 comments:

  1. Sorry, but RavenDB didn't put SQL down in this case. Whoever wrote that SQL code put SQL down.

    If a man shoots himself with a gun, you blame the man, not the gun.

    The SQL code is one humongous stored procedure using lots of temporary tables.
    It uses LIKE %x% predicates for text search - defeating any indexes that might've been used.
    The large amount of temporary tables makes it impossible for SQL Server to use proper estimates.
    The whole code itself is procedural instead of set-absed - the bane of set-based RDBMS performance.
    The code does lots of string manipulation - a task SQL Server is not made for - use the right tool for the job.
    Unless the code is run under the READ UNCOMMITTED isolation mode, this is going to cause heavy locking which will deadlock all over the place if you're modifying the data meanwhile.

    Sorry, but this test is useless. Basically you've got a wrong database model that's being utilized in a bad fashion. Contrast that with an excellent RavenDB model where everything is summed up into proper aggregates, and you basically have a cache at the same level of Memcached. No wonder this results in SQL being put down.

    ReplyDelete
    Replies
    1. Maybe the real point here is that you can get this kind of performance out of RavenDB without having to know all of the details of SQL you point out. It seems clear for this business problem that SQL just brings a lot of baggage with no added value.

      Delete
    2. This comment has been removed by the author.

      Delete
    3. RavenDB may be easier to start out with as a developer. However, Mark does imply they can't change the SQL model and are thus stuck with a horrendous model. With RavenDB on the other hand, it seems like they could create their own model.

      Even not knowing all of the SQL details doesn't make it fair to compare a bad model with a completely different (and excellent) model of a different system.

      While querying/modelling may be easier using RavenDB/NoSQL in general, not knowing the compromises you make can hurt you down the road as well, though not necessarily in the test being benchmarked here.

      EDIT: Posted under my correct profile this time.

      Delete
    4. I have to strongly disagree with the argument 'I dont need to know Raven well to use it well, so it must be better' - Who is building products that consist of performance tests? I have come across the argument numerous times that product X is better than product Y because you dont need to be a programmer to figure out how to use it. Well, guess what, we ARE programmers - if you're trying to avoid learning anything to build your products, you should be in a different industry.
      Note that I am not attacking Raven in any way, I am attacking the argument that 'this performance test is valid because I got great numbers without having to know what I was doing'.
      Well guess what - you dont know what you're doing.

      Delete
  2. Mark / Paul - The intention of my post was to highlight that RavenDb was a great piece of technology that was both easy to work with and performed well in this particular application. Also, that as an alternative to SQL - again in this application instance - it may have been a worthwhile architectural choice under the circumstances we were in. I.e. left with a 'horrible' model which we had to write some creative TSQL to effectively search the data. I think the essence of my post may have been misconstrued by some as a blanket statement that Microsoft SQL was an entirely inferior storage and search technology. Context is key here and perhaps focusing on those two graphs skewed the point of my post.

    On a side note: From a business perspective the client are happy with the search because it works well and does what it needs to do. From a technical perspective I still hoping that one day we will find a profitable reason to swap the SQL for something else more maintainable. And one which, of course, performs very well. Thanks to my learnings, it may very well be RavenDB.

    ReplyDelete
  3. Mark - Precisely, context was, and is, needed.

    Where the test fails is that you compare the horrible current SQL scheme with a completely new RavenDB scheme, made as you'd make it if you actually converted. The only way the test can be fair, is if you pit it against a proper SQL schema, the way you'd make it if you had the time.

    In that case you're comparing two green field systems to each others, and not a brown one with a completely synthetic alternate test. As the benchmark was made, it was oranges vs apples.

    ReplyDelete
  4. Disclaimer: I am an employee of Microsoft, but the opinion below is my own and not meant to represent or speak for Microsoft in any way. Furthermore, I am a certified Master in SQL Server.


    Okay, now tha tthe legaleze is out of the way, I want to say that I love reading these types of comparisons. I really appreciate the time and effort to make the test code available as well.

    I am a big proponent of using the right tool for the job. SQL Server will not always be the right tool, and I'm okay with that. Sometimes it will be. I care more about people having tools that they need than about which specific tool it is.

    Thanks again for the great comparison.

    ReplyDelete
  5. I'm a long term SQL advocate who had to learn to let go of sprocs in favor of the power of good O/RM tools. Looking at the stored procedure running your SQL-based search, it's easy to understand why Raven is out-performing it (along with what was mentioned before).

    All things taken into consideration (meaning implementation and ramp up time vs. performance), you have made a strong case hear for how straight-forward and natural using Raven is when using an Object-Oriented Model.

    I spent the better part of 3 years neck deep in hand-coded SQL and performance tuning to learn the knowledge necessary to see what's wrong and steps to address your SQL issues. (Create full-text search on the relevant fields, make materialized views instead of temp tables, convert cursors into outer joins). Essentially, the tuning steps that you did on Raven have their own equivalents in SQL Server.

    For what it's worth, I'd love to see what happens if you replaced the backend of your code that loaded up Raven with NHibernate or Entity Framework so you could test the more optimized model against SQL Server.

    ReplyDelete
  6. SQL Server on VM is Garbage to begin with.

    ReplyDelete
  7. excellent article. I would like to point out to all those people that are throwing stones at the SQL code to come up with a proc that is more efficient with the same functionality.
    Please go ahead and show everyone how to write a proc that consumes 12 dynamic parameters that are likely spread across several joins where comparisons need to be made with the like operator.
    I don't think even the most knowledgeable SQL guru could work magic that allows SQL server to perform comparatively and scale comparatively. The problem is the underlying relational model and an optimizer that wasn't designed to work with dynamic parameters. the only way to remodel the SQL database would be de-normalization and add full text search which is limited in comparison to map reduce.






    ReplyDelete
  8. It is such a great resource that you’re providing and you also give it away for nothing. I love seeing websites that understand the worthiness of providing a prime resource for free. I truly loved reading your post. Thanks!

    ReplyDelete
  9. Excellent stuff from you, man. I’ve read your things before and you are just too awesome. I adore what you have got right here. You make it entertaining and you still manage to keep it smart. This is truly a great blog thanks for sharing…

    ReplyDelete
  10. Wow, I love your site, big thank you to these ideas, and note in the first place that I fully agree with you! Let me emphasize, yes your article was excellent, I was thinking about all this and more the other day. This is my very first comment here and I’ll come back with pleasure on this blog!

    ReplyDelete
  11. I like it very much because it has very helpful articles of various topics like different culture and the latest news

    ReplyDelete
  12. I love the Information... Great Blog Post...

    ReplyDelete
  13. Really Nice Information,Thank You Very Much For Sharing.
    Wordpress Development Company

    ReplyDelete