Determine Location Intent in a Search Engine

Author: Max Shaw, VP Product
Product: Answers
Blog Date: June 2020

One of the most common use cases for a search engine is finding something by a "location". Here are some basic examples:

  • Cardiologist near Green Bay
  • Notary near me
  • Restaurants open now

These are all pretty simple queries, but getting these to work in a search engine is much more complex than you might imagine. There are three general steps to handling location intent:

  1. Identification - Identify that the user has local intent
  2. Resolution - Turn that local intent into a latitude/longitude (lat/lng)
  3. Geosearch - Filter the result set to only places near that lat/lng

Let's walk through each of these three queries, step by step, to highlight how these steps work.

Cardiologist Near Green Bay

Identification
The first step is identifying local intent. In this case, the query "Cardiologist near Green Bay" has local intent: the user is looking for the place "Green Bay". For a human (especially someone who lives in the US and speaks English) this is easy, but for a machine, identifying potential place names is a difficult task. You will want to combine named entity recognition with a database of place names to accurately find "places" in the query. Named entity recognition (NER) is a natural language processing (NLP) task where the goal is to identify certain entities (like people, places, and things) in a sequence of text. If we can detect which words in a search query are locations, we can cross-reference them with a location database to convert words into places on a map. The same database is then typically used in the next step, resolution.
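The database half of this step can be sketched as a longest-match n-gram lookup against a set of known place names (a gazetteer). This is a minimal sketch, not a production approach: the tiny `PLACE_NAMES` set and the `find_place_mentions` helper are hypothetical, and a plain lookup cannot tell whether a word like "bay" or "york" is actually being used as a place in a given query. Resolving that ambiguity is the job of the NER model.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, tiny place-name database (gazetteer). Real ones are far larger.
PLACE_NAMES: Set[str] = {"green bay", "new york", "new york city", "york"}


def find_place_mentions(
    query: str, places: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, name) token spans of the query that match a known place.

    Longer n-grams are tried first, so "new york city" wins over "new york"
    and "york", and tokens already claimed by a match are not reused.
    """
    tokens = re.findall(r"[\w']+", query.lower())  # lowercase, drop punctuation
    used = [False] * len(tokens)
    matches: List[Tuple[int, int, str]] = []
    for n in range(min(max_len, len(tokens)), 0, -1):
        for start in range(len(tokens) - n + 1):
            end = start + n
            if any(used[start:end]):
                continue
            candidate = " ".join(tokens[start:end])
            if candidate in places:
                matches.append((start, end, candidate))
                for i in range(start, end):
                    used[i] = True
    return sorted(matches)
```

For "Cardiologist near Green Bay", this returns the span covering "green bay", which is then handed to the resolution step.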

Resolution
The next step is resolution: turning the string "Green Bay" into a lat/lng (geocoding). This task is much harder than you might first think because we don't know exactly which "Green Bay" the user is referring to.

Source: Mapbox

On Mapbox, if you search for Green Bay, you will find 5 places that exactly match "Green Bay". So which Green Bay is the best option? Here are the factors you will want to include in this part of the algorithm:

  • Population - All else being equal, users are more likely to be looking for larger, more populous places
  • Distance to the user - The user is probably looking for a place closer to them, all else being equal
  • Typos - If the place name isn't an exact match, you will want to weigh how closely it matches the query
  • Underlying Data Set - If the underlying data set only has places near Green Bay, Nova Scotia, the user probably means that Green Bay.
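One way to combine these factors is a weighted score per candidate, picking the highest. The sketch below is illustrative only: the log scaling, the weights in `DEFAULT_WEIGHTS`, the use of `difflib.SequenceMatcher` as a stand-in typo measure, and the `PlaceCandidate` fields are all assumptions, not a description of any particular engine's algorithm.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Set, Tuple

EARTH_RADIUS_MILES = 3958.8


@dataclass
class PlaceCandidate:
    name: str
    region: str  # e.g. a state or province code
    lat: float
    lng: float
    population: int


def haversine_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical weights for (population, distance, name match, data coverage).
DEFAULT_WEIGHTS: Tuple[float, float, float, float] = (1.0, 1.0, 5.0, 3.0)


def score_candidate(query: str, cand: PlaceCandidate, user_lat: float, user_lng: float,
                    regions_with_data: Set[str],
                    weights: Tuple[float, float, float, float] = DEFAULT_WEIGHTS) -> float:
    w_pop, w_dist, w_name, w_data = weights
    # Bigger places score higher, with diminishing returns.
    pop_score = math.log10(cand.population + 1)
    # Nearer places score higher.
    distance = haversine_miles(user_lat, user_lng, cand.lat, cand.lng)
    dist_score = -math.log10(distance + 1)
    # 1.0 for an exact (case-insensitive) match; typos score lower.
    name_score = SequenceMatcher(None, query.lower(), cand.name.lower()).ratio()
    # Does the underlying data set actually have entities in this region?
    data_score = 1.0 if cand.region in regions_with_data else 0.0
    return w_pop * pop_score + w_dist * dist_score + w_name * name_score + w_data * data_score


def pick_best_place(query: str, candidates: List[PlaceCandidate], user_lat: float,
                    user_lng: float, regions_with_data: Set[str]) -> PlaceCandidate:
    """Return the highest-scoring candidate."""
    return max(candidates,
               key=lambda c: score_candidate(query, c, user_lat, user_lng, regions_with_data))
```

In practice the weights would be tuned against labeled queries rather than set by hand.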

Once you've identified the right "Green Bay", you will want to geocode that term and turn it into a lat/lng. In this scenario, the user is searching from Wisconsin, so we are confident they mean Green Bay, Wisconsin and can geocode that location: 44.5133° N, 88.0133° W.

Geosearch
The final step is the easiest, but there are still a few important details. You will need to store a lat/lng associated with each object in the database and then have a system that can easily and quickly filter to a set of objects based on lat/lng. There are two general approaches to Geosearch:

  • Point + Radius - We have an exact point coordinate from the previous step, but you will need to choose a suitable radius. This could be selected by the user or could depend on the place selected: if the place has a small geographic area, you might want to use a smaller radius than for a place with a larger geographic area.
  • Polygon - In certain scenarios, you might want to use a polygon instead of a point + radius. Perhaps if someone searches for "notaries in wisconsin", you might want to use the state boundary.
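For the polygon approach, the classic building block is a point-in-polygon test. The sketch below uses the ray-casting (even-odd) rule and treats lat/lng as planar coordinates, which is an assumption that works reasonably for state-sized areas away from the antimeridian. The `filter_by_polygon` helper and the `(name, lat, lng)` entity tuples are hypothetical; real state boundaries have thousands of vertices and are usually handled by a spatial database with an index.

```python
from typing import List, Tuple

# A polygon as a list of (lat, lng) vertices, treated as planar coordinates.
Polygon = List[Tuple[float, float]]


def point_in_polygon(lat: float, lng: float, polygon: Polygon) -> bool:
    """Ray casting: send a ray toward increasing longitude and count edge crossings."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        lat_i, lng_i = polygon[i]
        lat_j, lng_j = polygon[j]
        # Only edges that straddle the point's latitude can cross the ray.
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lng = lng_j + (lat - lat_j) * (lng_i - lng_j) / (lat_i - lat_j)
            if lng < cross_lng:
                inside = not inside  # each crossing flips inside/outside
        j = i
    return inside


def filter_by_polygon(entities: List[Tuple[str, float, float]], polygon: Polygon) -> List[str]:
    """Return the names of (name, lat, lng) entities that fall inside the polygon."""
    return [name for name, lat, lng in entities if point_in_polygon(lat, lng, polygon)]
```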

In this scenario, a point + radius probably makes more sense. Green Bay has an area of 50 mi². Treating the city as a circle (A = πr²), its radius is roughly √(50/π) ≈ 4 miles. We will double that to also include locations in the nearby region, so we will run the geosearch with a radius of 8 miles.

Using the lat/lng and a radius of 8 miles, we can retrieve the set of objects in the database that fall within that circle. We will also filter the results to only show "cardiologists" and sort them by their proximity to the center of Green Bay.
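Putting the Green Bay example together, a point + radius search derives the radius from the place's area, keeps entities in the requested category within that radius, and sorts them nearest first. This is a minimal sketch under stated assumptions: the `Entity` fields, the `geosearch` and `radius_from_area` helpers, and the 2x multiplier default are illustrative, and the linear scan stands in for the spatial index (geohash, R-tree, or a database's geospatial query) a real system would use.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_MILES = 3958.8


@dataclass
class Entity:
    name: str
    category: str
    lat: float
    lng: float


def haversine_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def radius_from_area(area_sq_mi: float, multiplier: float = 2.0) -> float:
    """Approximate a place as a circle (A = pi * r^2) and widen the radius by `multiplier`."""
    return multiplier * math.sqrt(area_sq_mi / math.pi)


def geosearch(entities: List[Entity], center_lat: float, center_lng: float,
              radius_miles: float, category: str) -> List[Tuple[float, Entity]]:
    """Point + radius search: (distance, entity) pairs in `category`, nearest first."""
    results = []
    for entity in entities:
        if entity.category != category:
            continue
        distance = haversine_miles(center_lat, center_lng, entity.lat, entity.lng)
        if distance <= radius_miles:
            results.append((distance, entity))
    results.sort(key=lambda pair: pair[0])
    return results
```

For Green Bay, `radius_from_area(50)` comes out to about 7.98 miles, which rounds to the 8-mile radius used above; the search would then be `geosearch(entities, 44.5133, -88.0133, radius, "cardiologist")`.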

Notary near me

Identification
"Notary near me" is another type of search query with location intent. In this case, the user didn't explicitly specify a location (like "near Green Bay") but instead wants the search engine to find notaries near the user. Named entity recognition is the best way to identify "near me" intent. In theory, you could start with a hardcoded list of strings such as "near me" and "around me", but that only scales so far.
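For comparison, the hardcoded-list baseline looks something like this. It is a sketch with a hypothetical phrase list, useful mainly as a bootstrap or fallback: it will miss paraphrases like "in my area", which is exactly why a trained NER model scales better. Word boundaries keep "near me" from matching inside a place name such as "near Meridian".

```python
import re
from typing import Tuple

# Hypothetical hand-maintained phrase list.
NEAR_ME_PHRASES = ["near me", "around me", "close to me", "nearby", "nearest", "closest"]

# \b word boundaries stop "near me" from matching inside "near Meridian".
_NEAR_ME_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in NEAR_ME_PHRASES) + r")\b",
    re.IGNORECASE,
)


def detect_near_me(query: str) -> Tuple[bool, str]:
    """Return (has_near_me_intent, query with the matched phrases removed)."""
    has_intent = _NEAR_ME_RE.search(query) is not None
    remainder = " ".join(_NEAR_ME_RE.sub(" ", query).split())  # collapse leftover spaces
    return has_intent, remainder
```

Returning the remainder lets the rest of the query ("Notary") drive the non-location part of the search.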

Resolution
Resolution in this case looks very different from the previous search term. Instead of geocoding a place, we need to identify where the user is located. There are two ways this is generally accomplished:

  • IP Geolocation - There are many libraries that will turn an IP address into a lat/lng. This is easy and extremely quick, but not very accurate. At best, you can locate a user down to a city, and there are many scenarios where it can be quite inaccurate (e.g. mobile devices, VPNs, etc.).
  • Device Location - Modern computers and mobile devices can provide their location using a combination of GPS and cell tower triangulation. On the web, this is accomplished using the HTML5 Geolocation API. This is by far the most accurate source of location information, but the user must grant permission to be located. We do not recommend showing a permission pop-up when the page loads; instead, only ask for HTML5 geolocation when a user runs a query that has "near me" intent.

Your search engine should automatically prefer device location, but if it doesn't have it, it should fall back to IP geolocation (it's better than nothing!).
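The preference order and the "only prompt on near-me intent" rule can be sketched as two small functions. The names below are hypothetical, and the sketch assumes the browser has already sent any device coordinates (for example, obtained via `navigator.geolocation.getCurrentPosition`) and that an IP geolocation lookup has already run on the server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserLocation:
    lat: float
    lng: float
    source: str  # "device" (HTML5 Geolocation) or "ip" (IP geolocation)


def should_request_device_location(has_near_me_intent: bool, have_device_location: bool) -> bool:
    """Only trigger the browser permission prompt for queries with "near me" intent."""
    return has_near_me_intent and not have_device_location


def resolve_user_location(device: Optional[UserLocation],
                          ip: Optional[UserLocation]) -> Optional[UserLocation]:
    """Prefer the more accurate device location; fall back to IP geolocation; else None."""
    if device is not None:
        return device
    if ip is not None:
        return ip
    return None
```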

Geosearch
With a "near me" search, only Point + Radius makes sense. In this case we don't have a place (like Green Bay) to inform the radius so you might want to use a hardcoded limit or you might want to set no radius. Either way, you will want to sort objects to show the ones nearest to the user first. This step looks pretty much the same as in the previous example.
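Both options (a hardcoded radius or no radius) fit one function where the radius is optional and results are always ordered by distance. In this sketch, `nearest_entities` and the `(name, lat, lng)` tuples are hypothetical; `heapq.nsmallest` is used so that only the top `k` results need to be ordered rather than the whole candidate set.

```python
import heapq
import math
from typing import Iterable, List, Optional, Tuple

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_entities(entities: Iterable[Tuple[str, float, float]], user_lat: float,
                     user_lng: float, k: int = 10,
                     max_radius_miles: Optional[float] = None) -> List[Tuple[float, str]]:
    """Return up to k (distance_miles, name) pairs, nearest first.

    max_radius_miles=None means "no radius": every entity is a candidate and
    only the sort order reflects distance.
    """
    scored = (
        (haversine_miles(user_lat, user_lng, lat, lng), name)
        for name, lat, lng in entities
    )
    if max_radius_miles is not None:
        scored = (pair for pair in scored if pair[0] <= max_radius_miles)
    return heapq.nsmallest(k, scored)
```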

Restaurants Open Now

Identification
This type of query is interesting because there is no explicit local intent in the search term. However, if you search "Restaurants open now" on Google from the Yext office in New York City, Google returns restaurants near the office.

You'll notice that even though the user didn't signal "near me" intent, Google is treating this as a "near me" search. That's because location is so critical for restaurants that, when in doubt, you will want to filter to nearby locations. The best way to handle this is to think about the underlying data. For restaurants, physical location is critical. For something like events, physical location may not be as important (e.g. online events), so you might not want to apply this filter automatically.
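One way to "think about the underlying data" programmatically is to measure how many entities in a vertical actually have a physical location, and only bring location in implicitly when most of them do. This is a sketch under explicit assumptions: the share-of-geocoded-entities signal, the 0.8 threshold, and the function names are hypothetical, and a hand-maintained per-vertical flag is a simpler alternative.

```python
from typing import Optional, Sequence, Tuple

LatLng = Tuple[float, float]


def geocoded_share(coords: Sequence[Optional[LatLng]]) -> float:
    """Fraction of entities in a vertical that have a physical location (None = no location)."""
    if not coords:
        return 0.0
    return sum(1 for c in coords if c is not None) / len(coords)


def should_apply_location(has_explicit_place: bool, has_near_me_intent: bool,
                          vertical_coords: Sequence[Optional[LatLng]],
                          threshold: float = 0.8) -> bool:
    """Decide whether location should influence results for this query."""
    if has_explicit_place or has_near_me_intent:
        return True  # the user asked for a location, so always honor it
    # No location signal in the query: infer intent from the underlying data.
    return geocoded_share(vertical_coords) >= threshold
```

Under this rule, a restaurants vertical where every entity has an address gets implicit location, while an events vertical with many online events does not.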

Resolution
In this example, we'll assume "near me" intent since the user is looking for restaurants. In this case, the resolution looks pretty similar to the previous example - we should use device location or IP geolocation to determine a lat/lng.

Geosearch
For the geosearch, you probably don't want to apply a hard radius filter, since the user didn't explicitly ask for "near me" results. In this case, we could set an infinite radius but still rank locations by their proximity to the user. You could also explore other thresholding techniques, in which you pick a radius based on the underlying data set and then sort based on reviews or another indicator of relevance.
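Both strategies can be sketched side by side. The infinite-radius version simply sorts by distance. For the data-driven threshold, this sketch picks the radius as the distance to the k-th nearest place (so dense areas get a small radius and sparse areas a large one) and then orders places inside it by rating. That particular thresholding rule, the `Place` fields, and the default `k` are assumptions for illustration, not a recommended setting.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_MILES = 3958.8


@dataclass
class Place:
    name: str
    lat: float
    lng: float
    rating: float  # e.g. average review score


def haversine_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def rank_by_proximity(places: List[Place], user_lat: float, user_lng: float) -> List[Place]:
    """Infinite radius: keep every place, nearest first."""
    return sorted(places, key=lambda p: haversine_miles(user_lat, user_lng, p.lat, p.lng))


def rank_within_adaptive_radius(places: List[Place], user_lat: float, user_lng: float,
                                k: int = 3) -> List[Place]:
    """Radius = distance to the k-th nearest place; inside it, order by rating, then distance."""
    if not places or k < 1:
        return []
    pairs = sorted(
        ((haversine_miles(user_lat, user_lng, p.lat, p.lng), p) for p in places),
        key=lambda pair: pair[0],
    )
    radius = pairs[min(k, len(pairs)) - 1][0]
    inside = [pair for pair in pairs if pair[0] <= radius]
    inside.sort(key=lambda pair: (-pair[1].rating, pair[0]))
    return [p for _, p in inside]
```

Note the trade-off: a highly rated place just outside the adaptive radius is dropped entirely, while the infinite-radius ranking keeps it but lists it last.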

Location Intent with Yext Answers

Yext Answers handles location search right out of the box. For the search algorithm, Yext Answers implements the following important steps:

Identification
For identification, Yext Answers uses a proprietary named entity recognition (NER) model based on a deep neural network called BERT. BERT (Bidirectional Encoder Representations from Transformers) was open sourced by Google in 2018. BERT combined several recent advances in neural networks for NLP that dramatically improved performance on tasks like NER. Even better, BERT can be fine-tuned for specific applications to capture the "flavor" of their vocabulary and how certain words are used in context. This is especially important for Yext Answers, where every customer is different and word usage can vary greatly with context. So, Yext Answers fine-tunes a BERT model on labeled search terms, and the model makes predictions for every word in every search term, since any given search term may be unique and never seen before. This means that by using Yext Answers, you get state-of-the-art natural language understanding without any of the headache. To learn more about how Yext Answers leverages BERT, check out this deep dive.
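A token-classification model that "makes predictions for every word" typically outputs one label per word, which then has to be grouped into entity spans. The sketch below assumes the common BIO labeling scheme and word-level labels (subword pieces already merged back into words); the label names `LOC` and `NEAR_ME` are hypothetical and are not Yext's actual label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group per-word BIO labels into (entity_type, text) spans.

    "B-X" begins an entity of type X, "I-X" continues it, and "O" is outside
    any entity. An "I-X" that doesn't continue an open X entity starts a new one.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, label in zip(tokens, labels):
        entity_type = label[2:] if label != "O" else None
        if label.startswith("I-") and entity_type == current_type:
            current_words.append(word)  # continue the open span
            continue
        # Any other label closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        if entity_type is None:  # "O"
            current_type, current_words = None, []
        else:  # "B-X", or an "I-X" with no matching open span
            current_type, current_words = entity_type, [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans
```

The resulting location spans would then feed the resolution step, and a "near me" span would trigger user-location lookup instead.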

Resolution
Yext Answers uses the population of the place, distance to the user, the underlying customer's Knowledge Graph, and typo tolerance to determine the most relevant lat/lng. It then dynamically picks a radius depending on the place. To handle the geocoding of places, Yext works with Mapbox, which is built on top of the OpenStreetMap database.
To identify where the user is located, Yext Answers uses a combination of IP geolocation and HTML5 geolocation, depending on what's available.

Geosearch
Every entity that gets added to the Knowledge Graph is automatically geocoded. Answers sits on top of the structured Knowledge Graph, and can find entities around a lat/lng extremely quickly.
