Monday, 6 September 2010

EPiServer: Proximity Searching with EasySearch

A requirement which recently cropped up was for Property (as in house, flat, etc.) searching through both attributes (e.g. number of bedrooms) and proximity (e.g. within 5 miles of some location). I’ll call ‘Properties’ ‘Assets’ from now on to avoid confusion.

For the attribute – or facet – search functionality we opted for a product called EasySearch. EasySearch, developed by NetworkedPlanet, is a great piece of software built on top of Lucene, which ships natively with EPiServer. EasySearch provides tools to index EPiServer content (Page Types and Files), web and user controls to search, view and narrow search results with selected facets, and administration tools to view and control the search index.

This was great for all aspects of our search requirements apart from proximity searching.

(The latest versions of Lucene and Lucene.NET do support proximity searching. Unfortunately EasySearch depends on the older version of Lucene that ships with EPiServer, which doesn’t have this functionality out of the box. Drat.)

I didn’t want to go down the road of a mixed discipline search implementation: for example, using SQL Spatial search and Lucene together. [ Insert Ghostbusters reference here ] I wanted one search interface, so I did what any good coder does: hunted around the web and stitched together a solution from proverbial hacked off limbs.

Before I parade my Frankenstein, this was the approach:
  1. Store the Asset data in EPiServer Page Types with Longitude and Latitude fields.
  2. Tell Lucene to index the Longitude and Latitude.
  3. When a proximity search is made, calculate a rectangular region and perform a Lucene range query to find locations within that region.
  4. Once these locations have been extracted, do some maths to determine if the location falls outside of the circular perimeter and eliminate the edge cases.
  5. Sort the data from closest to furthest.
Step 1: Storing the Data
I used Joel Abrahamsson's Page Type Builder to create the Asset Page Types with Longitude and Latitude Properties. These fields can then be easily indexed and searched with EasySearch. There is a gotcha, however, when it comes to numerical range searches, for example over Longitude and Latitude in decimal format. Since Lucene is a text-based search engine, it compares values as strings, so it doesn’t play well with numbers.
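
To make the gotcha concrete, here is a small sketch (not from the original post) that mimics a text range check with ordinal string comparison, which is roughly how a term-based range search sees the data. The class and method names are made up for illustration.

```csharp
using System;

public static class RangeSearchGotcha
{
    // True when 'value' falls inside [lower, upper] using ordinal string
    // comparison, i.e. character by character rather than by magnitude.
    public static bool InTextRange(string value, string lower, string upper)
    {
        return string.CompareOrdinal(value, lower) >= 0
            && string.CompareOrdinal(value, upper) <= 0;
    }

    // The numeric check we actually want.
    public static bool InNumericRange(double value, double lower, double upper)
    {
        return value >= lower && value <= upper;
    }
}
```

With these helpers, InTextRange("100.0", "10.0", "20.0") is true even though 100 is not between 10 and 20, and InTextRange("-5.0", "-10.0", "-1.0") is false even though -5 is between -10 and -1. Both problems go away once the values are padded to a fixed width and negatives are inverted.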

This blog post explains the issue and provides some Java code which normalises coordinates so that they can be range searched.
 
Here is my C# port:
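// Requires: using System.Globalization; using System.Text;
public static string NormaliseCoord(double coord)
{
    // Zero-pad to a fixed width so that string order matches numeric order.
    // InvariantCulture guarantees "." as the decimal separator on any server.
    string formatDouble = coord.ToString("#0000.000000000000000000", CultureInfo.InvariantCulture);

    if (formatDouble.StartsWith("-"))
    {
        // Invert the digits of negative values so that -10 sorts before -5.
        formatDouble = "n" + InvertNegativeDouble(formatDouble.Substring(1));
    }
    else
    {
        formatDouble = "p" + formatDouble;
    }

    return formatDouble.Replace(".", "d");
}

public static string InvertNegativeDouble(string negDbl)
{
    StringBuilder value = new StringBuilder(negDbl.Length);
    foreach (char digit in negDbl)
    {
        if (digit >= '0' && digit <= '9')
        {
            // Map each digit d to 9 - d.
            value.Append((char)('0' + ('9' - digit)));
        }
        else
        {
            value.Append(digit);
        }
    }

    return value.ToString();
}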
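
The search step later calls GeoPoint.DeNormaliseCoord, which lives in GeoPoint.cs and is not shown in the post. As a rough idea of what the inverse might look like, here is a self-contained sketch; the CoordCodec class, its method names and the compact copy of the encoder are my own assumptions, not the code from GeoPoint.cs.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CoordCodec
{
    // Compact copy of the encoding described above, so the sketch compiles on its own.
    public static string Normalise(double coord)
    {
        string digits = Math.Abs(coord).ToString("0000.000000000000000000", CultureInfo.InvariantCulture);
        string body = coord < 0 ? "n" + InvertDigits(digits) : "p" + digits;
        return body.Replace('.', 'd');
    }

    // Hypothetical inverse: strip the sign prefix, restore the decimal point,
    // undo the digit inversion for negatives, then parse.
    public static double Denormalise(string normalised)
    {
        bool negative = normalised[0] == 'n';
        string body = normalised.Substring(1).Replace('d', '.');
        if (negative)
        {
            body = InvertDigits(body);
        }
        double value = double.Parse(body, CultureInfo.InvariantCulture);
        return negative ? -value : value;
    }

    // Maps each digit d to 9 - d and leaves other characters alone.
    // Applying it twice returns the original string.
    private static string InvertDigits(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(c >= '0' && c <= '9' ? (char)('0' + ('9' - c)) : c);
        }
        return sb.ToString();
    }
}
```

A quick sanity check is that ordinal string order now follows numeric order: Normalise(-10) sorts before Normalise(-5), which sorts before Normalise(5), which sorts before Normalise(51.5). Round-tripping a value through Normalise and Denormalise should return it to within floating-point tolerance.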
Step 1b: Storing the Data in the right way
Now that I had my code to normalise the coordinates, I added an event handler in the Asset Page Type to normalise the Longitude and Latitude Page Type Properties when the page is published.
public AssetPageType()
{
    DataFactory.Instance.PublishingPage += Instance_PublishingPage;
}

protected void Instance_PublishingPage(object sender, PageEventArgs e)
{
    // The event can fire for other page types too, so ignore anything that is not an Asset.
    AssetPageType clone = e.Page as AssetPageType;
    if (clone == null)
    {
        return;
    }

    // Set normalised Latitude and Longitude
    clone.LatitudeNormalised = GeoPoint.NormaliseCoord(Convert.ToDouble(clone.Latitude));
    clone.LongitudeNormalised = GeoPoint.NormaliseCoord(Convert.ToDouble(clone.Longitude));
}
Step 2: Indexing the Coordinates
Indexing fields with EasySearch is simple. With a bit of Web.config magic you can easily tell Lucene to index specific Page Type properties.

Step 3: The Search
The search process has two steps.

In the first step we calculate a rectangular region from the search center point and the proximity and perform a range search for Assets within that region.
private global::Lucene.Net.Search.Query GetGeoQuery(GeoPoint origin, double distanceKms, LuceneQuery queryInterface)
{
    // Convert the search radius into a spread in degrees along each axis
    double spreadOnLongitude = distanceKms / GeoPoint.CalculateKilometersPerLongitudeDegree(origin.Latitude);
    double spreadOnLatitude = distanceKms / GeoPoint.KILOMETERS_PER_DEGREE;

    // Lower corner of the bounding box (minimum longitude and latitude)
    GeoPoint lowerCorner = new GeoPoint(origin.Longitude - spreadOnLongitude, origin.Latitude - spreadOnLatitude);

    // Upper corner of the bounding box (maximum longitude and latitude)
    GeoPoint upperCorner = new GeoPoint(origin.Longitude + spreadOnLongitude, origin.Latitude + spreadOnLatitude);

    // Construct bounding box query with lat and long ranges
    BooleanQuery query = new BooleanQuery();

    // Create latitude range query and add to root query
    ConstantScoreRangeQuery latitudeQuery = new ConstantScoreRangeQuery(
        LATITUDE_FIELD_NAME,
        lowerCorner.NormalisedLatitude,
        upperCorner.NormalisedLatitude,
        true, true);
    query.Add(new BooleanClause(latitudeQuery, BooleanClause.Occur.MUST));

    // Create longitude range query and add to root query
    ConstantScoreRangeQuery longitudeQuery = new ConstantScoreRangeQuery(
        LONGITUDE_FIELD_NAME,
        lowerCorner.NormalisedLongitude,
        upperCorner.NormalisedLongitude,
        true, true);
    query.Add(new BooleanClause(longitudeQuery, BooleanClause.Occur.MUST));

    Debug.WriteLine("query:" + query);

    return query;
}
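
GetGeoQuery leans on two GeoPoint members that are not shown in the post: KILOMETERS_PER_DEGREE and CalculateKilometersPerLongitudeDegree. The sketch below shows one plausible way to compute them and the resulting box. The constant (derived from a mean Earth radius of 6371 km), the cosine scaling and all of the names are my assumptions, not the contents of GeoPoint.cs.

```csharp
using System;

// Plain holder for the four edges of the box, in degrees.
public sealed class GeoBox
{
    public double MinLat, MaxLat, MinLon, MaxLon;
}

public static class BoundingBoxSketch
{
    // Assumed: kilometres per degree of latitude on a sphere of radius 6371 km.
    public const double KilometersPerDegree = 2 * Math.PI * 6371.0 / 360.0;

    // A degree of longitude shrinks with the cosine of the latitude.
    public static double KilometersPerLongitudeDegree(double latitudeDegrees)
    {
        return KilometersPerDegree * Math.Cos(latitudeDegrees * Math.PI / 180.0);
    }

    // Box that fully contains a circle of radiusKm around (lat, lon).
    public static GeoBox Around(double lat, double lon, double radiusKm)
    {
        double latSpread = radiusKm / KilometersPerDegree;
        double lonSpread = radiusKm / KilometersPerLongitudeDegree(lat);
        return new GeoBox
        {
            MinLat = lat - latSpread,
            MaxLat = lat + latSpread,
            MinLon = lon - lonSpread,
            MaxLon = lon + lonSpread
        };
    }
}
```

At latitude 60 the cosine is about 0.5, so the box is roughly twice as wide in degrees of longitude as it is tall in degrees of latitude. This simple version does not handle searches close to the poles or boxes that cross the 180 degree meridian.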
In the next step we iterate through the results, calculate the distance from the search center point and eliminate Assets which fall outside the circular perimeter.
// Execute Lucene query
searchResults = FacetCtrl.ExecuteQuery(masterQuery);

// Iterate through the results, calculate each distance from the origin and store it as a field
for (int i = 0; i < searchResults.Count; i++)
{
    // Get search result
    Document doc = searchResults[i];

    // Get normalised lat/long from the record
    string lat = doc.GetField("latnorm_en").StringValue();
    string lon = doc.GetField("longnorm_en").StringValue();

    // Denormalise lat/long
    double latitude = GeoPoint.DeNormaliseCoord(lat);
    double longitude = GeoPoint.DeNormaliseCoord(lon);

    // Calculate distance from origin and round off the result
    double distanceKmFromOrigin = point.DistanceFrom(new GeoPoint(longitude, latitude)) * GeoPoint.KILOMETERS_PER_DEGREE;
    distanceKmFromOrigin = Math.Round(distanceKmFromOrigin, 2);

    // If outside perimeter, flag to be deleted

    // Add the distance-from-origin field to the result
    doc.Add(new Field("distanceInKm", distanceKmFromOrigin.ToString(), Field.Store.YES, Field.Index.NO_NORMS));
}

// Sort results using the distance-from-origin IComparer implementation
searchResults.Sort(new DistanceFromOriginSort());
Finally, to sort the results we can add our own implementation of IComparer.
class DistanceFromOriginSort : IComparer<Document>
{
    // Orders documents by their stored distance, closest first.
    public int Compare(Document c1, Document c2)
    {
        double distance1 = Convert.ToDouble(c1.Get("distanceInKm"));
        double distance2 = Convert.ToDouble(c2.Get("distanceInKm"));

        return distance1.CompareTo(distance2);
    }
}
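
For readers who want to try the circle filter and the sort without EPiServer or Lucene, here is a minimal stand-alone sketch of the second step. It swaps in the haversine formula for the post's DistanceFrom (whose implementation is not shown), and the Hit class, the method names and the 6371 km Earth radius are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a search result; the real code works on Lucene Documents.
public class Hit
{
    public string Name;
    public double Lat;
    public double Lon;
    public double DistanceKm;
}

public static class CircleFilterSketch
{
    private const double EarthRadiusKm = 6371.0; // assumed mean radius

    // Great-circle distance between two points given in degrees.
    public static double HaversineKm(double lat1, double lon1, double lat2, double lon2)
    {
        double toRad = Math.PI / 180.0;
        double dLat = (lat2 - lat1) * toRad;
        double dLon = (lon2 - lon1) * toRad;
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(lat1 * toRad) * Math.Cos(lat2 * toRad)
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    // Keeps only the candidates inside the circle, closest first.
    public static List<Hit> WithinRadius(IEnumerable<Hit> candidates,
                                         double originLat, double originLon, double radiusKm)
    {
        List<Hit> kept = new List<Hit>();
        foreach (Hit hit in candidates)
        {
            hit.DistanceKm = HaversineKm(originLat, originLon, hit.Lat, hit.Lon);
            if (hit.DistanceKm <= radiusKm)
            {
                kept.Add(hit); // drops the corners of the bounding box
            }
        }
        kept.Sort((x, y) => x.DistanceKm.CompareTo(y.DistanceKm));
        return kept;
    }
}
```

For a 100 km search around (0, 0), a candidate at (0.85, 0.85) sits inside the bounding box but is more than 100 km away, so it is dropped, while candidates at (0.1, 0) and (0.5, 0.5) are kept and returned in that order.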

Code:
GeoPoint.cs

Links:
NetworkedPlanet - EasySearch
PageType Builder
Proximity Searching with Java and Lucene
Proximity Range Searches with Java and Lucene

Blog:
http://tech-rash.blogspot.com/
