Semantic Textual Similarity: Here's How It's Changing the Game

As an e-commerce professional, you know the importance of providing a five-star search experience on your site or in your app.

In the fast-paced world of digital marketing, the user experience, from the moment someone lands on your website to the moment they leave as a satisfied customer, is everything.

But do you know anything about semantic textual similarity (or just semantic similarity for short) and how it helps create that first-rate information-retrieval experience for your shoppers?

It comes down to this: when someone comes searching for a product or content, they fully expect to be given relevant, personalized, fabulous search results.

Which is where semantic textual similarity (STS) comes in. It compares the similarity of two pieces of text by analyzing their underlying meaning and context.

With this “understanding” of context and meaning, a search engine can excel at pegging someone’s intent.

And then, like a thoughtful butler, it can suggest search results that are the most likely to resonate.

What Is Semantic Textual Similarity (STS)?

So what, exactly, is this complicated-sounding technology?

Semantic textual similarity is a key metric used to assess how alike two terms or documents are in meaning. Rather than simply matching words, it assigns a numerical score that measures the strength of the semantic relationship.

In other words, semantic similarity is the ability of a computer system to understand the meaning of a piece of text and compare it to another. Sentence similarity is a common application.

Two sentences that convey the same meaning could be phrased slightly (or significantly) differently, and STS technology would be able to identify the similarity in their meanings.

This process is rooted in natural language processing (NLP), a discipline that spans linguistics and computer science, and it relies on approaches such as word embeddings. Semantic analysis is a subfield of computational linguistics that looks at the meanings of words and how they relate to one another.
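To make word embeddings a little more concrete, here’s a minimal Python sketch. The three-dimensional vectors are invented for illustration (real embeddings are learned from large amounts of text and usually have hundreds of dimensions), and averaging word vectors, known as mean pooling, is only one simple way to turn words into a sentence-level vector; modern systems typically use trained neural encoders instead.

```python
# Toy 3-dimensional word vectors, invented for illustration only.
# Real embeddings are learned by a model, not written by hand.
TOY_WORD_VECTORS = {
    "fitness": [0.9, 0.1, 0.0],
    "tracker": [0.8, 0.3, 0.1],
    "weight": [0.2, 0.9, 0.1],
    "loss": [0.1, 0.8, 0.3],
    "losing": [0.15, 0.85, 0.25],
}


def sentence_embedding(text: str) -> list[float]:
    """Mean-pool the vectors of known words; words without a vector ("best", "for") are skipped."""
    vectors = [TOY_WORD_VECTORS[w] for w in text.lower().split() if w in TOY_WORD_VECTORS]
    if not vectors:
        raise ValueError("none of the words in the text have a vector")
    # zip(*vectors) groups the i-th coordinate of every word vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


a = sentence_embedding("Best fitness tracker for weight loss")
b = sentence_embedding("fitness tracker for losing weight")
print(a)
print(b)
```

Because “losing” and “loss” sit close together in this toy space, the two phrasings end up with nearly identical sentence vectors even though their wording differs.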

AI-aided semantic analysis examines vocabulary, grammar, structure, and context.

Just as identical twins differ from fraternal ones, semantic similarity differs from semantic relatedness.

As Wikipedia notes, semantic relatedness “includes any relation between two terms, while semantic similarity only includes ‘is a’ relations… ‘car’ is similar to ‘bus’, but is also related to ‘road’ and ‘driving’… semantic similarity, semantic distance, and semantic relatedness all mean, ‘How much does term A have to do with term B?’ The answer to this question is usually a number between -1 and 1, or between 0 and 1, where 1 signifies extremely high similarity.”

Where is semantic textual similarity used today? Natural language understanding (NLU), sentiment analysis, and machine translation (automatically converting content into another language) are a few domains.

Determining Semantic Similarity

At Algolia, we use neural network–based technology to understand search intent. We apply vector search and machine learning to determine semantic similarity as part of delivering the best search results.

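With vectors, computers make sense of terms by placing them in n-dimensional space, where terms with related meanings cluster together. Each term is located by its coordinates, much like a point (x, y, z) in three-dimensional space but with n values instead of three, and the similarity between two terms can then be assessed using the distance and angle between their vectors (our post on cosine similarity has details).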
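Here’s a minimal Python sketch of the two measurements mentioned above, assuming vectors are plain lists of floats. The car, bus, and road coordinates are made up to echo the Wikipedia example, not taken from any real model, and the to_unit_interval helper shows one common way to rescale a cosine score from the -1 to 1 range into the 0 to 1 range.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_unit_interval(cosine: float) -> float:
    """Rescale a cosine score from [-1, 1] to [0, 1]."""
    return (cosine + 1) / 2


# Made-up (x, y, z) coordinates echoing the car/bus/road example.
car = [0.9, 0.8, 0.1]
bus = [0.8, 0.9, 0.2]
road = [0.1, 0.3, 0.9]

print(cosine_similarity(car, bus))   # high: similar terms
print(cosine_similarity(car, road))  # lower: related, but not similar
print(euclidean_distance(car, bus), euclidean_distance(car, road))
```

With only three numbers per vector, these really are (x, y, z) coordinates; the same functions work unchanged for vectors with hundreds of dimensions.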

Machine-learning models can infer that words close to each other in vector space may be synonyms. Once two pieces of content are embedded as vectors, deep-learning models help determine how similar they are.
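A simple way to surface synonym candidates is to look up a word’s nearest neighbors in vector space. This Python sketch assumes a tiny, hand-made embedding table (the words and numbers are hypothetical) and ranks the other words by cosine similarity.

```python
import math

# Hypothetical toy embeddings; a real system would get these from a trained model.
EMBEDDINGS = {
    "sneakers": [0.90, 0.10, 0.05],
    "trainers": [0.88, 0.15, 0.07],
    "sandals": [0.60, 0.40, 0.10],
    "laptop": [0.05, 0.10, 0.95],
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    # Multi-argument math.hypot (Python 3.8+) computes each vector's length.
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_neighbors(word: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k other words whose vectors point most nearly in the same direction."""
    query = EMBEDDINGS[word]
    scored = [(other, cosine(query, vec)) for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for neighbor, score in nearest_neighbors("sneakers", k=3):
    print(f"{neighbor}: {score:.3f}")
```

Here “trainers” comes out closest to “sneakers,” but “sandals” also scores fairly high, which is why proximity only suggests that words could be synonyms.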

We also use a tie-breaking algorithm that uses various criteria to compare matching items.
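Tie-breaking of this kind can be sketched as a lexicographic sort: results are ordered by the first criterion, and a later criterion is consulted only when every earlier one is tied. The criteria and field names in this Python sketch (typos, matched words, proximity, popularity) are illustrative assumptions, not a description of Algolia’s actual ranking formula.

```python
from dataclasses import dataclass


@dataclass
class Match:
    object_id: str
    typos: int          # fewer is better
    matched_words: int  # more is better
    proximity: int      # distance between matched words; smaller is better
    popularity: int     # hypothetical business metric; higher is better


def tie_break_key(m: Match) -> tuple[int, int, int, int]:
    # Tuples compare element by element, so each criterion only matters
    # when all earlier criteria are tied. "Higher is better" criteria are
    # negated so that one ascending sort works for every criterion.
    return (m.typos, -m.matched_words, m.proximity, -m.popularity)


def rank(matches: list[Match]) -> list[Match]:
    return sorted(matches, key=tie_break_key)


results = rank([
    Match("a", typos=0, matched_words=3, proximity=1, popularity=10),
    Match("b", typos=0, matched_words=3, proximity=1, popularity=50),
    Match("c", typos=1, matched_words=3, proximity=1, popularity=999),
    Match("d", typos=0, matched_words=2, proximity=1, popularity=500),
])
print([m.object_id for m in results])
```

Items b and a tie on typos, matched words, and proximity, so popularity decides between them; c has the highest popularity of all but lands last because its one typo loses on the very first criterion.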

Here are the basic steps in our process:

Query understanding: NLP techniques are used to prepare and structure the search query so the search engine can analyze it.
Retrieval: Neural hashing comes next in the AI search process. The search engine retrieves the most relevant results and ranks them from most to least relevant. We measure retrieval quality using precision and recall. Precision is the percentage of retrieved documents that are relevant; recall is the percentage of all relevant documents that are retrieved. Together, these metrics show whether the search results are any good.
Semantic similarity measurement: The query and the content are embedded as vectors, and from these embeddings a semantic similarity score, representing how closely the two pieces of text are related, is calculated.
Re-ranking: Based on clicks and conversions, plus rules and personalization relevant to the particular shopper, a dynamic re-ranking process pushes the best results to the top of the list.
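The precision and recall definitions from the retrieval step translate directly into code. This Python sketch assumes results are identified by hypothetical product IDs and returns fractions between 0 and 1 (multiply by 100 for percentages).

```python
def precision_recall(retrieved: list[str], relevant: list[str]) -> tuple[float, float]:
    """Precision = relevant hits / all retrieved; recall = relevant hits / all relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against division by zero when either list is empty.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical product IDs.
retrieved = ["p1", "p2", "p3", "p4"]  # what the engine returned
relevant = ["p1", "p3", "p5"]         # what the shopper actually wanted

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four retrieved products are relevant, so precision is 0.5; two of the three relevant products were found, so recall is about 0.67.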

Adventures in Text Similarity (and Differences)

Whether people are terrible at asking for what they want or know exactly how to phrase a query to zero in on their desired item, STS has their back. Here are examples of content that might be processed in STS tasks:

“Best fitness tracker for weight loss” vs. “fitness tracker for losing weight”

Consider the English phrases “best fitness tracker for weight loss” and “fitness tracker for losing weight.” At first glance, they may appear to have virtually identical meanings.

However, with the help of semantic textual similarity, a search engine can delve deeper and identify slight variations in the intent.

Whether the searcher wants the most highly rated trackers for weight loss or simply wants to know whether wearing a fitness tracker helps when trying to lose weight, STS is key to displaying the most relevant results, which ultimately leads to a more satisfied user.

“Makeshift studio” vs. “homemade studio”

If content talks about a “makeshift studio” as opposed to a “homemade studio,” a savvy search engine, using a model fine-tuned on relevant data, can determine whether the phrases refer to the same concept.

In this case, “makeshift” could mean something more temporary, like a setup in a living room that must be torn down in order to have people over for dinner, whereas “homemade” could mean a space that’s a bit hokey — but still permanent — in a corner of the basement.

“New York Knicks” vs. “Madison Square Garden”

Sometimes a search engine must rack its digital brain to determine whether two completely different phrases point to the same information. If someone is searching for information about New York Knicks games, for instance, they might type only the venue name in their query.

But STS can draw on associated phrases like “Home of the New York Knicks” and infer that the searcher might want to know about upcoming games.

From examples like these, it’s easy to see why semantic textual similarity is a critical component of modern search-engine skill sets.

Why Is STS a No-Brainer for Search?

As someone steeped in all things online, you probably hear the phrase “game-changing” on a regular basis, and you’re aware that some of that is simply overblown marketing speak.

In this case, however, genuine game-changing is occurring: STS fundamentally improves the accuracy and relevance of search engines and recommendation systems.

That’s key because nothing matters more than knowing your users’ needs and making sure you get them the right search results.

Semantic textual similarity helps deliver relevant results, and satisfied users, with every interaction.

For the record, STS goes way beyond traditional keyword matching. It enables a search engine to understand the different ways people might express the same idea, so linguistic ambiguity and variation are far less likely to become roadblocks.

That hasn’t been the case with earlier-generation, traditional keyword search techniques.
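To see where keyword matching falls short, consider a bare-bones keyword-overlap score: Jaccard similarity over word sets, meaning shared words divided by all distinct words. This Python sketch is a simplified stand-in for traditional keyword search, not how any particular engine scores results.

```python
def keyword_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets: shared words / all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a | words_b:
        return 1.0  # two empty strings are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


print(keyword_overlap("makeshift studio", "homemade studio"))
print(keyword_overlap("New York Knicks", "Madison Square Garden"))
```

Even though Madison Square Garden is the home of the New York Knicks, the two phrases share no words and score 0, and “makeshift studio” and “homemade studio” overlap only on “studio.” Closing gaps like these is what semantic similarity is for.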

This language-understanding skill is particularly important in e-commerce, where shoppers’ intent and context vary widely, and where online retailers must practically read shoppers’ minds to stay competitive.

STS can also improve related-item recommendations by suggesting items that are semantically similar to what the person has shown interest in.
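One simple way to sketch this in Python: average the embeddings of the items a shopper has viewed into an interest profile, then recommend the unviewed items whose embeddings are closest to that profile by cosine similarity. The catalog, item names, and vectors are hypothetical; a production recommender would use learned embeddings and many more signals.

```python
import math

# Hypothetical catalog with made-up 3-dimensional embeddings.
CATALOG = {
    "running-shoes": [0.90, 0.10, 0.00],
    "trail-shoes": [0.85, 0.20, 0.05],
    "fitness-tracker": [0.60, 0.70, 0.10],
    "espresso-machine": [0.00, 0.10, 0.95],
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def recommend(catalog: dict[str, list[float]], viewed: list[str], k: int = 2) -> list[str]:
    # Interest profile: the average (centroid) of the viewed items' vectors.
    vectors = [catalog[item] for item in viewed]
    profile = [sum(column) / len(vectors) for column in zip(*vectors)]
    # Score every item the shopper hasn't seen yet against that profile.
    candidates = [(item, cosine(profile, vec)) for item, vec in catalog.items() if item not in viewed]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]


print(recommend(CATALOG, ["running-shoes"]))
```

A shopper who viewed running shoes is shown trail shoes first and a fitness tracker second, while the espresso machine, whose vector points in a very different direction, doesn’t make the cut.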

State-of-the-Art STS

Are you tasked with managing a search engine or e-commerce recommendation system?

If so, check out our NeuralSearch, which utilizes vector search in concert with neural hashes to deliver fast, accurate search results. It’s allowed us to combine the speed of traditional keyword search with the accuracy of neural search in a single API.
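This post doesn’t detail how neural hashing works, so the Python sketch below uses a different, simpler technique to convey the general idea: random-hyperplane hashing, a form of locality-sensitive hashing. Each vector is compressed into a short bit string (one bit per random hyperplane, recording which side of it the vector falls on), and the number of differing bits, the Hamming distance, serves as a cheap proxy for the angle between vectors. The 64-bit size, the seed, and the product vectors are arbitrary assumptions.

```python
import random


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Random Gaussian normal vectors, one per output bit (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_hash(vector: list[float], hyperplanes: list[list[float]]) -> int:
    """Pack one bit per hyperplane: 1 if the vector is on its positive side, else 0."""
    bits = 0
    for plane in hyperplanes:
        side = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if side >= 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits; XOR sets a 1 wherever the two hashes disagree."""
    return bin(a ^ b).count("1")


planes = make_hyperplanes(dim=3, n_bits=64)
shoes = binary_hash([0.90, 0.10, 0.00], planes)
trail = binary_hash([0.85, 0.20, 0.05], planes)
espresso = binary_hash([0.00, 0.10, 0.95], planes)
print(hamming_distance(shoes, trail), hamming_distance(shoes, espresso))
```

For random hyperplanes, the chance that two vectors disagree on any given bit equals the angle between them divided by π, so similar vectors tend to share most bits. Comparing bits with XOR is far cheaper than computing full floating-point similarities, which is the kind of speedup hashing is meant to buy.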

Our technology is rave-worthy at assessing user intent, context, and conceptual meaning to connect a query with the best content.

Then, let’s talk about your options for providing the best imaginable customer experience, with all the benefits that it can bring to your business.