Friday, 11 January 2013

Cloud recipe: Standing up an Enterprise Solr Search Server on Azure in fifteen minutes

I recently saw some Twitter activity around a new online community called VM Depot. From the website: "VM Depot is a community-driven catalogue of preconfigured operating systems, applications, and development stacks that can easily be deployed on Windows Azure."

One of the VMs which caught my eye was a Linux VM with a fully configured instance of Apache Solr 4: a lightning-fast, Lucene-based enterprise search engine built by the Apache community.

In this Cloud Recipe we will deploy a fully configured Linux VM running Apache Solr and write some C# code to talk to it.

To complete this recipe, you need the following:
  • A Windows Azure Subscription. 
  • Node.js and the Node Package Manager (npm). These can be downloaded and installed here.
You should be able to complete this recipe in under 30 minutes.

Install Windows Azure Command Line Interface

Before you start, you need to install the Windows Azure command-line tools, which allow you to do the following on Azure:
  • Import publishing settings.
  • Create and manage Windows Azure Websites.
  • Create and manage Windows Azure Virtual Machines.
The GitHub project for these tools can be found here.

You can install them using npm by entering the following command in your console of choice:

    Console> npm install azure -g

Create a Storage Account and Import Publish settings

Next, you need to log into your Azure account and create a Storage Account if you don't have one already. This Storage Account will be used to store the Community VHD. 
  • Log into http://windows.azure.com
  • Select Data Services and Create New
  • Enter your domain prefix and select your affinity group. These two fields will be used later when you deploy your VM.
Next you need to download and import your Publishing Profile.
  • Load the Publish Settings download link in your browser: https://windows.azure.com/download/publishprofile.aspx. You will be asked to log in. After you do, your publish settings file will be downloaded automatically. Save it to a path of your choice.
  • Launch your console (I use PowerShell) and import the publish settings file by entering: Console> azure account import [path to publish settings file]
You are now ready to deploy your Solr VM.

Deploy Community VM


In this step, we will deploy the community Solr VM to your Azure Subscription.
  • Navigate to the VM Depot's Solr build in your browser.
  • Click Deployment Script and agree to the terms.
  • Select the region. This must be the same region you specified when you created your Storage Account above.
  • A script snippet will be generated. Copy it and replace DNS_PREFIX with the DNS prefix you entered when you created your Storage Account, USER_NAME with your Azure Subscription User Name, and PASSWORD with your Azure Subscription Password.
  • Paste the updated script into your Console and run. 
After about 5 minutes, your Solr VM should be deployed and ready to use. You may have to add port 80 to your VM's internal / external port mappings to allow you to access Solr Admin and the REST web services.

Navigate to your Admin instance by entering http://domainprefix.cloudapp.net in your browser and enter the default credentials listed on the Community Page. Once logged in you will have access to your Solr Admin Console. 
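Besides the Admin Console, you can check the REST web services from code. The following is a minimal sketch, not part of the original recipe: it assumes Solr is served under /solr with the default /select handler, and the class and method names (SolrRestCheck, BuildSelectUrl, RunQuery) are made up for illustration. If your image protects Solr with a login, the request would also need to send those credentials.

```csharp
using System;
using System.Net.Http;

static class SolrRestCheck
{
    // Builds a select URL such as http://host/solr/select?q=id%3A123456&wt=json.
    // Assumption: the VM exposes Solr's default "/select" request handler.
    public static string BuildSelectUrl(string baseUrl, string query)
    {
        return baseUrl.TrimEnd('/') + "/select?q=" + Uri.EscapeDataString(query) + "&wt=json";
    }

    // Sends the query over HTTP and returns the raw JSON response body.
    public static string RunQuery(string baseUrl, string query)
    {
        using (var client = new HttpClient())
        {
            return client.GetStringAsync(BuildSelectUrl(baseUrl, query)).Result;
        }
    }
}
```

Calling SolrRestCheck.RunQuery("http://domainprefix.cloudapp.net/solr", "*:*") from a console app should return a JSON document if port 80 is reachable; an exception usually points to a missing port mapping or a different Solr path.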


Create console app and write some code

Now that your Solr VM is set up, you are ready to create a .NET Console App which can add new data to its index and query the data back out again. I use the .NET library SolrNet to expedite this task.

Note: I first added a reference to SolrNet using NuGet, but this added a version of SolrNet which wasn't compatible with Solr 4. To solve this I downloaded the latest build from the TeamCity community build server here.
  • Create a new .NET Console Application.
  • Add a reference to SolrNet and its dependent assemblies (as mentioned, I used the latest build from the TeamCity build artefacts).
  • Create a Product POCO with Solr mapping attributes.
  • Add this code to your solution and run it.
Product POCO
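    using System.Collections.Generic;
    using SolrNet.Attributes;

    public class Product
    {
        // Maps to the unique key field of the Solr schema
        [SolrUniqueKey("id")]
        public string Id { get; set; }

        [SolrField("manu_exact")]
        public string Manufacturer { get; set; }

        // Multi-valued field: one Solr value per category
        [SolrField("cat")]
        public ICollection<string> Categories { get; set; }

        [SolrField("price")]
        public decimal Price { get; set; }

        [SolrField("inStock")]
        public bool InStock { get; set; }
    }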
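To make the mapping attributes more concrete, here is a rough sketch of the kind of XML message Solr's update handler accepts for such a product, with one field element per mapped value. This is hand-written for illustration and is not SolrNet's own serialization code; the class and method names (SolrUpdateMessage, BuildAdd) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

static class SolrUpdateMessage
{
    // One <field name="...">value</field> element per Solr field value.
    static XElement Field(string name, string value)
    {
        return new XElement("field", new XAttribute("name", name), value);
    }

    // Builds an <add><doc>...</doc></add> message using the field names from the Product POCO.
    public static XElement BuildAdd(string id, string manufacturer,
        IEnumerable<string> categories, decimal price, bool inStock)
    {
        return new XElement("add",
            new XElement("doc",
                Field("id", id),
                Field("manu_exact", manufacturer),
                // A multi-valued field such as "cat" repeats the element once per value.
                categories.Select(c => Field("cat", c)),
                Field("price", price.ToString(CultureInfo.InvariantCulture)),
                Field("inStock", inStock ? "true" : "false")));
    }
}
```

In practice you would let SolrNet build and send this message for you; the sketch only shows how the attribute names line up with Solr field names.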

Solr Application using SolrNet
 

    using Microsoft.Practices.ServiceLocation;
    using SolrNet;

    class Program
    {
        static void Main(string[] args)
        {
            DoSolrStuff();
        }

        public static void DoSolrStuff()
        {
            // Register the Solr endpoint for the Product type (use your own domain prefix here)
            Startup.Init<Product>("http://markrodseth.cloudapp.net/solr");

            var p = new Product
            {
                Id = "123456",
                Manufacturer = "Something Awesome",
                Categories = new[] {
                    "fun",
                    "recreation",
                    },
                Price = 92,
                InStock = true,
            };

            // Resolve the SolrNet operations interface, then add and commit the product
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
            solr.Add(p);
            solr.Commit();

            // Query the product back out by its id
            var products1 = solr.Query(new SolrQuery("id:123456"));
        }
    }

When your application runs, it will connect to the Solr index, add and commit the product, and query it back out again.

Future areas to explore are:
  • Working with Facets, Highlights and Solr's other powerful features
  • A master-slave configuration
Happy Enterprise Searching!

2 comments:

  1. Nice Article Marks.

    The VM Depot community has announced that its community-driven, open-source virtual machine image catalog is now integrated into the company's cloud platform, Windows Azure.

    The new feature is available through the Windows Azure management portal and is designed to ease the handling of virtual machine images from VM Depot.

    To take advantage of the new functionality, users will have to choose the "BROWSE VMDEPOT" option within the "Virtual Machines" tab and select the needed files from the list of available images.



  2. Cool - thanks for the info Bhavana!
