As the founder of Bonsai, I get a lot of questions about how to effectively scale Elasticsearch.
Our customers often ask how to calculate shard counts, or architect indices. Or they need to know how much data is required, or what it takes to handle certain traffic loads. These are valid questions, and an easy place for engineers to get started. But there is a lot more that goes into an effective search engine than its mechanics.
In his talk at Lucene Revolution 2017, Eric Pugh compared a search engine’s needs to Maslow’s Hierarchy of Needs. I thought that was a really interesting metaphor, and it has helped expand and inspire the imaginations of the customers I speak with about search every day.
In his famous hierarchy, Abraham Maslow outlines five levels of human needs. While you’ll hopefully never have to take your search engine to a psychologist, there are a lot of parallels between what makes a human effective and what makes your app’s search effective.
The next time you architect or review your search engine’s design, be sure to ask yourself the following questions prior to shipping it to production:
The foundation of a good search experience is the ability to query your data, and do it fast.
It’s important to point out that a search engine is not a general purpose data store. It’s designed to create an index of your data, which it uses for extremely efficient querying. By optimizing for query performance, a good search engine can start to build a lot of interesting combinations and inferences when a user performs a query. This enables a kind of “fuzzy” matching, and a ranking that can be easily tuned and customized to best match what the end-user is really trying to find.
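To make the idea of “an index built for querying” a little more concrete, here is a toy sketch in Python: an inverted index that maps each term to the documents containing it, plus a crude typo-tolerant fallback using the standard library’s difflib. It’s only an illustration of the concept under simplifying assumptions (naive tokenizing, match-count scoring); real engines like Lucene use far more sophisticated data structures, fuzzy matching, and relevance scoring.

```python
import difflib
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index: each term maps to the set of doc IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The expensive work happens here, once, at index time.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query, fuzzy=True):
        # Score each doc by how many query terms (or near-misses) it matches.
        scores = defaultdict(int)
        for term in tokenize(query):
            candidates = [term] if term in self.postings else []
            if not candidates and fuzzy:
                # Fall back to the closest known terms to tolerate typos.
                candidates = difflib.get_close_matches(
                    term, self.postings.keys(), n=2, cutoff=0.8
                )
            for candidate in candidates:
                for doc_id in self.postings[candidate]:
                    scores[doc_id] += 1
        # Highest score first; ties broken by doc ID for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, after indexing “Red running shoes”, “Blue running jacket”, and “Red wool socks” as docs 1 to 3, the misspelled query “red runing shoes” still finds “running” through the fuzzy fallback and ranks doc 1 first, because it matches all three terms.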
That may start to sound complicated, and it is. Fortunately, there are a lot of great open source search engines that you can start playing around with in an afternoon. These projects represent decades of expertise and engineering specifically with search in mind, and an open source search engine will let you download it and run it right alongside your development environment.
Bottom line, a good search engine will have a ton of features and functions that are optimized for searching. Selecting and integrating one into your application is where our journey begins.
This is the Functionality stage.
So you’ve gone and integrated a search engine into your application. Congratulations!
At some point it will make sense to get some of this new functionality shipped into production, and put it in the hands of customers and end-users. That means learning about the operations of a search engine, and what it takes to ensure that once it’s up and running, it stays up and running.
If you really want to dive deep into this, you could build a career on the operations and scaling of a search engine alone. (We certainly have!) In today’s cloud-centric world, you get to learn not only about systems engineering and administration, but about challenging areas of computer science, like distributed computing.
To sample a few subjects that a capable operations engineer will cover, consider the following: configuration management, immutable architecture, application packaging, deployment lifecycle, data backups and verification, monitoring and observability, alerting, disaster recovery, capacity planning, sharding and data partitioning, replication, consensus algorithms, queueing theory, CAP theorem, and more…
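Sharding is one of those subjects that benefits from a concrete picture. Elasticsearch’s documentation describes routing a document to a shard roughly as `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document’s `_id`. The Python sketch below mimics that formula, with CRC32 standing in for the Murmur3 hash Elasticsearch actually uses (an assumption made to keep it dependency-free). It also shows why the primary shard count can’t simply be changed in place: a different divisor re-maps most documents.

```python
import zlib
from collections import Counter


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    # Stable hash of the routing key. Elasticsearch uses Murmur3 here;
    # CRC32 is a stand-in so the sketch needs only the standard library.
    h = zlib.crc32(routing_key.encode("utf-8"))
    return h % num_primary_shards


def distribution(doc_ids, num_primary_shards):
    # Count how many documents land on each shard.
    return Counter(shard_for(doc_id, num_primary_shards) for doc_id in doc_ids)


def moved_fraction(doc_ids, old_shards, new_shards):
    # Fraction of documents whose shard changes if the shard count changes.
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)
```

With uniformly distributed hashes, going from 5 to 6 primary shards would leave only about one document in six where it was; the rest would have to move.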
The goal here is to create a reliable platform that you can trust to continue doing its job as you build more and more awesome functionality into your app.
If that all seems kind of intimidating, don’t worry! The Cloud will provide! There are plenty of quality managed search vendors out there (full disclosure: we’re one of them!) who are happy to partner up with you to provide solid search reliability.
This is the Reliability stage.
As a product developer, one of the most exciting phases of a new product’s lifecycle is the initial release, when your ideas and hopes and hypotheses finally meet the real world of actual usage. Is everything working as intended? Do people like the new thing? Are they getting value out of it?
Of course, with all the work that goes into steps one and two (especially if you went the DIY route on hosting!), it’s understandable if some developers’ energy flags after the initial launch. Fear not, for end-users are ever vigilant, and even this cadre of coders can expect an eventual bug report that some obvious-seeming query didn’t quite return the expected results.
After all, what is the point of a search experience that does not actually return what a user needs to find?
This is where you’ll want to consider your strategy for logging and observing what’s actually going on with the cluster. Sure, you’ll want some basic operational metrics, so you can answer the question, is this thing on? But much more important from an engagement perspective is to measure what people are asking the search engine.
For some, that could be a little extra logging on your queries. Others might go a step further and actually save end-users’ queries for later analysis.
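As one possible starting point for that “little extra logging,” here is a minimal Python sketch that formats one JSON line per search and then summarizes a batch of those lines: the most common queries, and the queries that returned nothing. The field names and the choice to track zero-result queries are illustrative assumptions, not a standard format; in practice you’d take the hit count and timing from your search engine’s response.

```python
import json
from collections import Counter


def format_log_line(query, hit_count, took_ms):
    # One JSON object per line keeps the log easy to grep and to parse later.
    return json.dumps({"query": query, "hits": hit_count, "took_ms": took_ms})


def summarize(lines, top_n=3):
    entries = [json.loads(line) for line in lines if line.strip()]
    # Normalize so "Red Shoes" and "red shoes" count as the same query.
    normalized = [e["query"].strip().lower() for e in entries]
    zero_hits = Counter(q for q, e in zip(normalized, entries) if e["hits"] == 0)
    return {
        "total": len(entries),
        "top_queries": Counter(normalized).most_common(top_n),
        "zero_result_queries": zero_hits.most_common(top_n),
        "zero_result_rate": sum(zero_hits.values()) / len(entries) if entries else 0.0,
    }
```

Zero-result queries are often the quickest win: each one is a user who asked for something and got nothing back, whether because of a typo, a missing synonym, or a genuine gap in your data.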
Whatever your approach, consider the following questions:
So no excuses! It’s easier than ever to architect an engaging search experience. And some investment into measuring that usage and engagement in the beginning can pay major dividends in the long run.
This is the Engagement stage.
Generally there is a purpose to integrating search into an app. Some companies even base their entire business model or market differentiation on their search experience! And even the simplest internal search tool could save your colleagues hours every week.
One of my teammates has a background at a large ecommerce company that earns billions of dollars of revenue a year. He loves to tell the story of a single synonym change worth millions of dollars. Ecommerce is a great example in general: consider how filtering or tweaking the ordering of results based on inventory, or profit margins, could affect the bottom line.
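Here is a hedged sketch of what “tweaking the ordering of results based on inventory, or profit margins” might look like with Elasticsearch’s `function_score` query, built as a Python dict you could send with any client. The field names (`title`, `in_stock`, `profit_margin`) are hypothetical placeholders for your own mapping, and the weight and modifier are arbitrary starting points to be tuned and measured, not recommendations.

```python
import json


def build_product_query(user_text: str) -> dict:
    """Blend text relevance with business signals.

    Field names are hypothetical; substitute fields from your own index mapping.
    """
    return {
        "query": {
            "function_score": {
                # Ordinary full-text relevance on the product title.
                "query": {"match": {"title": user_text}},
                "functions": [
                    # Double the score of items that are in stock.
                    {"filter": {"term": {"in_stock": True}}, "weight": 2},
                    # Nudge higher-margin items up; ln2p = ln(2 + value)
                    # keeps the multiplier positive even for a zero margin.
                    {
                        "field_value_factor": {
                            "field": "profit_margin",
                            "modifier": "ln2p",
                            "missing": 0,
                        }
                    },
                ],
                # Multiply the function results together, then with the text score.
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_product_query("running shoes"), indent=2))
```

Multiplying (rather than adding) keeps text relevance in charge: an in-stock, high-margin product still needs to match the query to rank well.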
Even when the connection to revenue is less clear, search can still serve a purpose, whatever the site. A few of the at-scale social media platforms using our service like to experiment with how the search experience influences average time on site, or monthly active user sessions.
Whatever the underlying business use-case, at this stage, the engineer — or team — responsible for search has some connection or mandate to business KPIs. That could be shopping cart checkout rates, or average session duration, or just straight-up revenue booked.
A team that achieves this level of the hierarchy of search needs is incorporating all of the previous levels into a feedback loop; a virtuous cycle of continuous improvement. A business goal is set, engagement patterns are analyzed, and changes to functionality are made, shipped, and measured. Rinse, repeat.
At this level, a team has access to expertise in all of the search functionality tools available. Reliability is diligently maintained, but ultimately an afterthought; it’s the more advanced rollout patterns (such as split-testing) that are of interest now. Engagement patterns are well-understood, well-instrumented for observability, and trends are reviewed regularly. All to facilitate the setting and achieving of ever new and greater goals.
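Split-testing a search change starts with assigning users to variants consistently. A common approach, sketched below in Python under our own assumptions (the experiment name and two-variant setup are hypothetical), is to hash the user ID together with the experiment name, so each user always lands in the same bucket while different experiments bucket independently. This covers only assignment and tallying; deciding whether a difference between variants is meaningful still calls for a proper statistical test.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")) -> str:
    # Hashing experiment + user gives a stable, roughly uniform bucket per user.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(events):
    # events: iterable of (variant, converted) pairs, e.g. ("control", True).
    totals, conversions = Counter(), Counter()
    for variant, converted in events:
        totals[variant] += 1
        conversions[variant] += int(converted)
    return {v: conversions[v] / totals[v] for v in totals}
```

Because assignment is a pure function of the IDs, you don’t need to store which bucket each user is in; any service can recompute it.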
This is the stage where your search achieves its Purpose.
Whether you’re building search for a Fortune 1000 company, or at a startup that just found significant product-market fit: prepare for the hockey-stick of scale.
Scale is an interesting idea because it connotes doing something many, many times. That can mean many, many requests as a search feature is rolled out to more and more end-users. It could also mean supporting many clusters for lots of different teams and microservices and deployment environments.
Sometimes operating at scale comes with a lot more oversight and compliance requirements. Setting up a secure and scalable search engine that’s compliant with, say, HIPAA, PCI, FINRA, or SOC requirements comes with its own set of challenges and processes and, ultimately, practice.
In the long run, there is no one-size-fits-all solution to scaling search. Some apps need a high volume of queries, or very complex queries, or both. Some need to store and make searchable many years of logs and analytics data.
When it comes time to scale, it always helps to have a couple of years of experience to bring to bear.
In our experience, the first step to operating at scale is to deeply understand the needs of the business, as well as the capabilities of the search engine. Sometimes there’s a significant amount of back-and-forth as application features are designed to live within the best intersection of the two.
For example, when we design for performance at scale, we follow the mantra of doing less work. Search engines are all about shifting the complexity of a query into the generation of the index, so that the search requests themselves can run very quickly. Knowing where to keep an eye out for unnecessary extra work is essential, but that also needs to translate into practical solutions that can still accomplish business goals.
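Prefix search (think autocomplete) is an easy place to see the “do less work” trade-off. Elasticsearch offers an `edge_ngram` tokenizer and token filter that precompute prefixes at index time; the toy Python sketch below shows the same idea side by side with a scan-everything alternative. It is an illustration of the trade-off, not Elasticsearch’s implementation, and the `max_gram` default is an arbitrary assumption (prefixes longer than it are simply not stored).

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=10):
    # "shoe" -> ["s", "sh", "sho", "shoe"]
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


class PrefixIndex:
    def __init__(self):
        self.by_prefix = defaultdict(set)

    def add(self, doc_id, text):
        # Index time: store every prefix of every term (more work, more space).
        for term in text.lower().split():
            for gram in edge_ngrams(term):
                self.by_prefix[gram].add(doc_id)

    def lookup(self, prefix):
        # Query time: a single dictionary lookup, no scanning.
        return sorted(self.by_prefix.get(prefix.lower(), set()))


def scan_lookup(docs, prefix):
    # The "more work at query time" alternative: check every term of every doc.
    prefix = prefix.lower()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(term.startswith(prefix) for term in text.lower().split())
    )
```

Both return the same answers, but the scan’s cost grows with every document you add, while the prefix lookup stays a single hash-table access per keystroke. The price is paid once, at index time, in extra indexing work and storage.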
At a certain scale, it makes sense to invest in building an entire team of search engineers and operations specialists, especially when that search is mission-critical. However, there is still a lot of value that organizations can get out of working with a close partner to manage various aspects of the more commoditized cluster operations.
You have now hit the Scale stage.
When implemented effectively, search engines can almost seem psychic. And yet, completely invisible.
Consider sites like Amazon.com, Netflix, and Pinterest. Not to mention a little search company called Google, perhaps you’ve heard of them? Their search is so advanced that it seems to know what we want before we do. These companies have huge teams that dedicate themselves fully to crafting an exceptional search experience. And they’re pushing the boundaries of what’s possible with search.
You may never get to this stage. That’s okay. If you do though, search can provide a big advantage over your competition.
And if you’re at this stage, you don’t need me to tell you about it!
But because these kinds of advances help shape where the rest of the industry will go, it’s definitely interesting and educational to keep an eye on these trends.
One very interesting and relatively accessible project that’s available today is Learning to Rank, which is available for both Apache Solr and Elasticsearch. The Solr version was contributed by Bloomberg, and the Elasticsearch plugin was built by OpenSource Connections; both incorporate machine learning into the tuning of a search query’s relevance function.
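The Learning to Rank plugins have their own APIs and typically load models trained offline with other tools, but the core idea can be sketched in a few lines: learn feature weights from relevance judgments, then use them to reorder candidate results. The Python below uses a simple pairwise perceptron, a deliberately basic stand-in rather than what either plugin does. The two features (a title-match score and a popularity score) and the judgments are made up for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(judged_docs, epochs=20, lr=1.0):
    """Pairwise perceptron for a single query.

    judged_docs: list of (doc_id, features, label); a higher label means more relevant.
    """
    w = [0.0] * len(judged_docs[0][1])
    for _ in range(epochs):
        mistakes = 0
        for _, xi, yi in judged_docs:
            for _, xj, yj in judged_docs:
                if yi <= yj:
                    continue  # only learn from pairs where doc i should outrank doc j
                diff = [a - b for a, b in zip(xi, xj)]
                if dot(w, diff) <= 0:
                    # Pair ranked wrong (or tied): move the weights toward the better doc.
                    w = [wk + lr * dk for wk, dk in zip(w, diff)]
                    mistakes += 1
        if mistakes == 0:
            break  # every judged pair is now ordered correctly
    return w


def rerank(candidates, w):
    # candidates: list of (doc_id, features); highest model score first.
    return [doc_id for doc_id, x in sorted(candidates, key=lambda c: -dot(w, c[1]))]
```

Trained on judgments where a strong title match beats popularity, the learned weights then order unseen candidates the same way, which is the essence of what a learned relevance function adds over hand-tuned boosts.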
Much of the cutting-edge work being done in this area of search is incorporating greater awareness of the end-user’s context and preferences. Consider, for example, a search request that understands the user’s location and can adjust itself accordingly. Or factoring in a user’s activity and preferences in the ordering of results. Over time, this achieves more personalization and a better-tailored view of the internet.
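Location awareness is one of the more approachable pieces of this. Elasticsearch’s `function_score` query supports decay functions (`gauss`, `exp`, `linear`) on geo fields; the Python sketch below reproduces the idea of a Gaussian decay on distance outside the engine, so you can see how relevance and proximity combine. The `scale_km` and `decay` defaults are arbitrary assumptions, and the sketch ignores the `offset` parameter the real decay functions also accept.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance on a sphere with Earth's mean radius (~6371 km).
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def gauss_decay(distance_km, scale_km=10.0, decay=0.5):
    # Gaussian decay shaped so the multiplier equals `decay` at exactly
    # `scale_km` away, is 1.0 at distance 0, and approaches 0 farther out.
    sigma_sq = -(scale_km ** 2) / (2 * math.log(decay))
    return math.exp(-(distance_km ** 2) / (2 * sigma_sq))


def rerank_by_location(results, user_lat, user_lon, **decay_kwargs):
    # results: list of (doc_id, text_score, lat, lon); final score = relevance x proximity.
    def final_score(result):
        _, text_score, lat, lon = result
        distance = haversine_km(user_lat, user_lon, lat, lon)
        return text_score * gauss_decay(distance, **decay_kwargs)

    return [r[0] for r in sorted(results, key=final_score, reverse=True)]
```

With a user in New York, a moderately relevant result a couple of kilometers away outranks a perfect text match in Boston, which is usually what someone searching for “coffee near me” wants.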
So get on it! You’re now at the Innovation stage.
When your users type a query into your search engine, they expect it to just work. They should never be reminded of what’s actually under the hood, which is where we engineers all too often love to dwell.
Hopefully this Hierarchy of Search gives you the basics you need to start exploring your own search journey. What are some areas that you hope to reach? Where do you need to shore up some more fundamental attributes?
And if you’re just overwhelmed because search operations in particular are just not your thing, or you’d like to borrow our years of experience to give advice at any level, you can always give Bonsai a try. We’ve got a super convenient free Sandbox plan for developers to test their apps on, and have been supporting and scaling search for thousands of businesses since 2009.