The Virtuous Cycle of Machine Learning

Author: Michael Misiewicz, Director, Data Science
Product: Answers
Blog Date: August 2021

Building a great algorithmically-driven product requires a lot of data. You can (and almost certainly must) get some of this data via human labeling, but a great way to really make huge improvements on your algorithms is to get them out in the wild, and measure where they were wrong, so they can be retrained!

The best algorithmic product companies all do this at a large scale:

  • Google knows every search engine result page (SERP) served to every user in response to every query, and they know what position link (or widget, or card, or product, or map pin, etc.) is clicked (or not).
  • Google also knows every time they show you an ad in Gmail (or around the web, due to their DoubleClick ad network), and whether or not you clicked on it.
  • Continuing with Google, reCAPTCHA is another great example. Originally Google used billions of hours of human attention to train their optical character recognition algorithm, and have you noticed how many of the pictures now involve traffic lights and sidewalks?
  • Social media companies such as Facebook (both on the big blue app and Instagram) and Twitter can roll out new algorithms for recommending content for your newsfeed, and measure which interactions you have with it.
  • Product recommending companies like Amazon and StichFix can deploy new recommeandation models, and measure whether you purchase or return the item.
  • Netflix is well known for enormous investments in recommender systems, they can measure clicks and watch time (are you still watching?).

The more you look, the more you see these cycles in many products all around you, especially the ones that seem to have the most magical or impressive algorithmic features. Further, if you look at many of the algorithms underpinning many popular machine learning models, you'll also see this same pattern repeat everywhere. Random forests, gradient boosted trees, neural network optimizers, and reinforcement learning all use this fundamental concept.

We want to make Yext Answers the best it can be, so how do we go about implementing some of these training loops? Well, for the purposes of fitting a single model to a dataset there are incredibly powerful but also pretty well modularized methods. But how do we do it on a higher level of complexity, that is across a number of systems, functions, features and teams? There's a few objectives we need to optimize.

  1. Maximize the number of model inferences in production (i.e. serve as many models as possible, on as many searches as possible)
  2. Maximize the amount of collected feedback and human labeled data (making sure to store everything in a format that allows exploration but also rapid re-training)
  3. Minimize the time required for data scientists to retrain models and deploy them in production

It's no coincidence that the companies that are most successful in developing algorithmic products also make huge investments in optimizing the 3 points above. If you look at Google's track record of accomplishments (BigTable, Spanner, Borg, Kubernetes, TensorFlow, Tensor Processing Units, etc.), you can see the advanced features which become possible with great infrastructure.

Concretely, what are the data science, engineering, and product teams at Yext doing to speed up the virtuous cycle of machine learning?

  • Big investments in warehousing. One of the first and most important systems we built for Answers was a robust and flexible warehouse. We are able to log every request, SERP served, and user interaction event. If any end user clicks a thumbs up or down on a Direct Answer, we log it! This warehouse logs every user interaction with every page, and we use it to fill up labeling assignments for our human data labeling team. Plus, we do it all in a way that is totally anonymous and devoid of any personally identifiable information (PII) or pseudo PII.
  • A human data labeling team. Coupled with the user interaction data from the warehouse, we can rapidly find areas we're mispredicting and have our growing team of human data labellers identify and improve residuals.
  • Product features to increase the size of our labeled data corpus. This is why we built Experience Training!
  • A huge increase in the kind of content that can be served with Yext Answers. This is why we built Document Search. Building the Yext Crawler and Extractors has been essential for growing the corpus of data that can be labeled and used to train new models.
  • Big investments in serving infrastructure. We've spent a lot of time making it easier and simpler for data scientists to build and train models in the critical path of Yext Answers, and this has included a lot of research into specific hardware accelerators and configurations to make everything function within appropriate latency and quality thresholds. As of this writing, we have 37 models that serve in the critical path. That is up from two in June of 2020. Further, back in June of 2020 it took us about a month to deploy a new model, but now we retrain nightly.
  • Internationalization. We're expanding our labeling team to include more language expertise! Expanding languages supported is a great way to increase the number of inferences in production.

Thanks to this continued investment, I am confident that Yext Answers will continue to expand and improve. We're doing large amounts of R&D work to improve quality, from making it easier to add content to the Knowledge Graph, understand query intent better, and improve and expand direct answers.

All Blog Posts

Determine Location Intent in a Search Engine

Max Shaw, VP Product

One of the most common use cases for a search engine is finding something by a "location". Here are some basic examples: Cardiologist near Green Bay, Notary near me, Restaurants open now. These are all pretty simple queries, but getting these to work in a search engine is much more complex than you might imagine.

4 Methods for Increasing Site Search Clicks

Rick Swette, UX Research

We know good search drives business impact. It increases conversions and transactions, reduces search bounce rate, and boosts overall customer satisfaction. So, how do we get more people to trust and use site search? We embarked on a study to find this out.

How to Measure the Success of Your Site Search

Basil Polsonetti, Data Insights

Most brands know that site search is a feature their website should have, but unless the site is dominated by e-commerce, it’s often relegated as a check-the-box task when building a new website.

The Danger in Document-Level Sentiment Analysis

Calvin Casalino, Senior Product Manager

In order for your feedback to become an actionable item to help businesses provide a better experience, they need a way to analyze the granular content of all of their reviews, at scale.

Deep Dive into Duplicate Suppression

Dee Luo, Product Manager

Brands know the importance of having accurate information across all the apps, maps, and directories where consumers are searching for information. In a perfect world, powering that brand data and managing each of these listings would be enough to ensure that consumers consistently get the answers they're searching for.

Yext Answers Algorithm Update: Milky Way

Max Shaw, VP Product

Yext Answers is constantly improving it’s search algorithm to provide more relevant results over time. Milky Way is the first official upgrade to the Answers Algorithm and includes a series of important upgrades to provide better search precision and recall.

GMB API Update - Dedicated Food Menus

Dee Luo, Product Manager

On August 24, 2020, Google launched version 4.7 of its Google My Business (GMB) API. This update includes enhancements to how your restaurant locations can sync and display food menus on Google.

Structuring Your Knowledge Graph

Jessie Yorke, Yext Administrator

In this post we are going to discuss strategy and give you some tools to effectively think about structuring your own brand's Knowledge Graph!

Welcome to the Hitchhikers Program

Liz Frailey, VP Developer & Admin Experience

Welcome to Hitchhikers! We are so excited to have you join our mission of creating amazing search experiences for brands of all sizes.

Introducing: Yext Answers Plugin for WordPress

Alex Barbet, Product

Businesses of all sizes use both WordPress and Yext to build amazing client experiences, and as more and more brands around the world add the Yext Answers bar to their WordPress powered sites, we wanted to provide a way to drive their time-to-value even faster.

Yext’s Fall ‘20 Release is Now Live!

Nick Oropall, Senior Product Marketing Manager

For those of you who are new to Hitchhikers — Welcome to Yext's new training platform & community! Hitchhikers will be the home for all of Yext's product and release updates moving forward so we encourage you to create a free user and check out the platform!

Meet the Hitchhikers Team: Alyssa Hubbard

Alyssa Hubbard

Alyssa Hubbard began at Yext in the Upward Rotational Program. Now she is full-time on the Hitchhikers team, working to build a platform to empower our community of Yext power users.

Yext Answers Algorithm Update: Andromeda

Allie Allegra, Senior Product Marketing Manager

We are constantly making improvements to the underlying Answers algorithm. Our latest algorithm release, Andromeda, includes cutting edge improvements that optimize the overall search experience. With this release, our algorithm now has the ability to search semi-structured data. You can now opt-in to access these improvements depending on your configuration.

WCAG and Search: Developing an Accessible Search Experience

Rose Grant, Associate Product Manager

What’s WCAG? WCAG stands for the Web Content Accessibility Guidelines (WCAG). WCAG is not always black and white; its rules often have a variety of interpretations.

Now Available: Shopify Product Catalog Sync for Yext

Lilly Fast, Senior Business Development Manager

Shoppers have questions about your products, and your ability to answer will determine if they buy or if they bounce. But with rapid changes to your business, it can be hard to keep your product information consistently up-to-date.

Now Available: Yext Product Catalog Sync for Magento Commerce, an Adobe Company

Lilly Fast, Senior Business Development Manager

Shoppers have questions about your products, and your ability to answer will determine if they buy or if they bounce. But with rapid changes to your business, it can be hard to keep your product information consistently up-to-date.

A New Way to Search FAQS

Max Davish, Associate Product Manager

Semantic search is a new way of searching for FAQs. Instead of looking at keywords, it measures the similarity in meaning between two questions. Answers does this using BERT - the same revolutionary natural language processing technology that powers other Answers features like location detection and entity recognition.

How to Build Discoverable and High-Converting Landing Pages

Sonia Elavia, Senior Product Marketing Manager

Search is a massively important marketing channel. Fifty-three percent of website traffic comes from organic search, so it’s critical that businesses optimize for these experiences. But how do you actually build out a strong presence in organic search? Having search-optimized landing pages is critical.

Exporters: From Yext to Your Listings

Calvin Casalino, Senior Product Manager

Our Listings delivery pipeline ensures your data stored in the Knowledge Graph appears on Listings everywhere consumers are asking questions. How do we make sure your data is updated on all publishers as quickly as possible while still ensuring data is formatted properly for each endpoint? Yext’s Listings exporters.

2020: Hitchhikers Year in Review

Liz Frailey, VP Developer & Admin Experience

2020 has been a rollercoaster of a year for everyone for a multitude of reasons. On the Hitchhikers Team, we were able to overcome some of the year's obstacles to really transform the program.

Understanding the Answers Algorithm

Pierce Stegman, Data Scientist & Michael Misiewicz, Data Science Manager

See for yourself how the Answers algorithm works on the Understanding the Answers Algorithm site built by our Data Science team!

A Type System for Knowledge Graph Entities

Oscar Li, Software Engineering Lead

Our type system serves as the foundation of the Knowledge Graph by dictating the structure, validation, and display of fields. The type system guarantees the consistency and correctness of any field value retrieved from the Knowledge Graph. This makes it easier for our engineers to reason about code that handles particular fields, for our customers to manage their entity data, and for our publishers to trust the quality of our data.

3 Integrations That Can Help You Get More Out of Yext

Jonathan Gitlin, Content Marketing Manager, Workato

As a Yext user, you’re likely well aware of the platform’s product suite and the value each solution delivers. But did you know that you can provide even better search results for visitors, collect more reviews from customers, and analyze data more closely by integrating Yext with the apps and systems your team already uses?

Google’s Latest API Update: More Hours

Teddy Riker, Associate Product Manager

On February 25th, 2021, Google launched version 4.9 of its Google My Business (GMB) API. This update includes support for additional hours types, for options such as delivery, drive through, and more.

Spring ‘21 Release is Now Live!

Nick Oropall, Senior Manager of Platform Product Marketing

Yext's seasonal releases are always packed with new features and functionality to keep Hitchhikers on the cutting edge of search, and the Spring '21 Release is no different. Across the product suite, we have added new features that will help you to drive value and improve your user experience.