Choosing an enterprise search solution can be daunting and time-consuming. There are many factors to consider, and the options can be confusing.
This blog post is meant to help guide that decision by providing the key criteria for evaluating the best enterprise search solution. These criteria include vision, various technology considerations, licensing model(s), frequency of updates/support, employee resource(s) needed, flexibility, and security.
Base Technology and Fit
The first step is to dig into the base, or underlying, technology of the solution. This includes the following areas:
- What technology stack is the search solution built on, and what programming languages would be used to implement and extend it? Is this the same as the technology used within your organization?
- Where is data stored? What technology is used for storing data?
- Is any part, or all, of the solution open source? Is it completely proprietary, or some mix of the two?
- Does it fit and work within the Content Management Solution or the application that will be exposing the search?
- What parts of the solution are essentially “off limits” vs. what is customizable if necessary?
- What skills are necessary to do customization?
Evaluating the base technology behind the solution is important for understanding how much effort it will take to run and support, including what would happen if the organization decides to “go it alone” and support the solution with internal resources. While open source solutions can provide licensing advantages (more on licensing below) and access to the source code if necessary, they can also bring support obligations that an organization is not ready for. For instance, choosing an entirely open source option without a real business behind it and then building a solution in-house means that the organization is signing up to be a software developer, essentially competing with the existing enterprise search software vendors. This is still possible and may be the right decision in some situations, presumably those that call for extreme customization anyway. But the constant development, testing, implementation, and support necessary to keep up with changes in the market may not make sense for organizations that just want a product that works and runs in a truly cloud-based environment (which would also be difficult to achieve in-house). Choosing a solution that fits the organization’s current technology stack is an important consideration.
Connectors are pre-built code that integrates systems. Many are built for Content Management Systems and CRM systems, but they could target any environment that the enterprise search solution provider felt was necessary or would provide a marketing advantage. Commercial applications typically have a stronger eye toward marketing and naturally provide more connectors, while open source solutions tend to give developers the tools necessary to create their own. Connectors should be evaluated against the following questions to assess their appropriateness:
- How many connectors are currently available?
- Are the necessary connectors available for the organization’s immediate needs? What about for growth?
- If a particular connector is not available, is it possible to create a custom connector? How difficult is this process?
- How deep do the connectors go? Do they provide the right level of integration to be effective, or do they just skim the surface so the vendor can check a marketing box? If a connector is incomplete in some way, how difficult is it to shore it up to get what is needed?
- Does the provider seem dedicated to continual development of additional connectors?
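To make the custom connector question more concrete, here is a minimal sketch of what such a connector could look like. The `Connector` base class, the `fetch_documents` method, and the field names are all hypothetical and not taken from any particular product; a real solution will define its own interface.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class Connector(ABC):
    """Hypothetical contract that a custom connector would fulfil."""

    @abstractmethod
    def fetch_documents(self) -> Iterator[dict]:
        """Yield one dict per source record, shaped for the search index."""


class InMemoryCmsConnector(Connector):
    """Stand-in for a CMS connector; a real one would call the CMS API."""

    def __init__(self, pages: list[dict]) -> None:
        self._pages = pages

    def fetch_documents(self) -> Iterator[dict]:
        for page in self._pages:
            # Map source-specific field names onto the fields the index expects.
            yield {
                "id": f"cms-{page['page_id']}",
                "title": page["headline"],
                "body": page["content"],
            }


connector = InMemoryCmsConnector(
    [{"page_id": 7, "headline": "Travel policy", "content": "Book flights early."}]
)
docs = list(connector.fetch_documents())
print(docs[0]["id"])
```

How much of this mapping a vendor's connector already does, and how much is left to the organization, is a fair proxy for how "deep" the connector goes.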
Vision and Architecture Philosophy
Understanding what the enterprise search solution was created for and where it is headed in the future is important. Some solutions were created and optimized for specific systems, applications, or use cases such as CRM, customer service, or knowledge bases. While it may not be important to know in detail how a particular vendor is going to handle predictive analytics or machine learning, it might matter to an organization that the provider is working on, for instance, artificial intelligence capabilities for automating taxonomy management. This could show that the vendor is thinking about the future and shares the organization’s vision of where it wants to go.
Other considerations of vision and philosophy include how data is extracted from the source systems, and then how the search engine solution processes that data and merges data from all of the sources together, commonly called federation. It is important to understand how data from different data sources is joined and normalized into a common structure, for example when one data source holds the full name in one field while another holds first and last name in separate fields. There are many ways to do this, and being on the same page with the vendor is critical. In addition, the way that a system handles taxonomy is important. Taxonomy is the categorization capability, or the methods of giving context and structure to the data, including creating filters, facets, and other user interface features. All of these areas can affect the evaluation of a search engine solution.
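The full name example can be sketched in a few lines of Python. This is only an illustration of the normalization step, and the field names (`full_name`, `first_name`, `last_name`) and record shapes are assumptions, not any vendor's schema.

```python
def normalize_person(record: dict) -> dict:
    """Return a record with a single 'full_name' field, whatever the source shape."""
    if "full_name" in record:
        full_name = record["full_name"].strip()
    else:
        # Source stores the name in two fields; join the non-empty parts.
        parts = [record.get("first_name", ""), record.get("last_name", "")]
        full_name = " ".join(p.strip() for p in parts if p.strip())
    return {"id": record["id"], "full_name": full_name}


# Two hypothetical sources describing the same person in different shapes.
crm_record = {"id": "crm-1", "full_name": "Ada Lovelace"}
hr_record = {"id": "hr-9", "first_name": "Ada", "last_name": "Lovelace"}

normalized = [normalize_person(r) for r in (crm_record, hr_record)]
print(normalized)
```

Once both sources share one structure, the same filter or facet can be applied across all of them.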
Most search engines today need to handle very large databases and index sizeable quantities of data, sometimes millions or even billions of records. The system also needs to be built to answer queries efficiently: even with that much data to process, it must respond fast enough not to frustrate users who are accustomed to Google-like response times. Areas to consider under scalability include the following:
- Number of data sources
- Number of records within each data source. What is the expected growth of data?
- Frequency of updates and how much of the data needs to be updated with each update
- How many queries will be performed? What is the expected growth?
Indexing is the method for gathering the data. It covers whether (and how) a crawler is used, how often data is captured (the time between indexing runs), how long the indexing process takes, and whether some or all fields need secondary processing to create metadata before the data can be used. All of these are important to consider because the system may be down or slow while indexing occurs, either because of how long the process takes or because of the way the system is built. If the system is unavailable or slow from the user’s perspective during this time, it is a concern. It is also an issue if the data is very old (stale) because of the time between indexing runs. Often the processing can be handled offline on a separate server, such as a staging server, and an intelligent means of data capture can be used, such as fetching only the data that has changed rather than the entire data set from every data source. All of these architecture decisions should be evaluated when selecting a search engine solution.
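The idea of capturing only changed data can be illustrated with a small sketch. It assumes each source record carries a `modified` timestamp, which is a hypothetical field, and it does not handle deleted records, which a real incremental indexer would also need to track.

```python
from datetime import datetime


def changed_since(records: list[dict], last_indexed: datetime) -> list[dict]:
    """Return only the records modified after the previous indexing run."""
    return [r for r in records if r["modified"] > last_indexed]


# Hypothetical source data; only two records changed after the last run.
source = [
    {"id": 1, "modified": datetime(2024, 1, 5)},
    {"id": 2, "modified": datetime(2024, 3, 1)},
    {"id": 3, "modified": datetime(2024, 3, 2)},
]
last_run = datetime(2024, 2, 1)

delta = changed_since(source, last_run)
delta_ids = [r["id"] for r in delta]
print(delta_ids)
```

The smaller the delta, the shorter the indexing window and the fresher the results, which is why it is worth asking whether a solution supports this style of update.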
Search Features and User Experience
The core query functionality of the system is critical to look at. At this point in the search industry, quite a few search features should be expected in a modern search solution. The features and functionality that should be in most systems include sorting, filtering, faceting, stemming, keyword searches, boolean searches, wildcards, field searches, range searches, synonyms, “did you mean” type features, auto-suggestion, and auto-completion. If any of these are missing, it should be cause for concern.
The search solution also needs to be flexible enough to support a world-class user interface and experience. In many ways, the user experience is just as important as, or more important than, the back-end functionality. The system should make it possible to build modern user interface components such as responsive (mobile) designs, filters, facets, keyword highlighting, etc. Establishing the user interface can be expensive, so care should be taken to understand how easy it is to make changes if the requirements change.
Search relevancy is the process of determining which search results end up at the top of a results list, based on how relevant the data is to the search that was performed. It is a constant process of optimizing the search algorithm to the needs of the individual system and improving the system’s ability to determine user intent. Indexing and architecture can heavily influence search relevancy and how data is processed. The search engine solution should be graded on how easily and flexibly it can be tuned to the needs of the organization, how search scoring is handled, how accurate and adjustable that scoring is, and whether relevancy can be boosted either manually by an administrator or through additional criteria added to the algorithm.
Measurement of search relevance should include aggregation and analysis of search logs, keyword information, results logs, click information, abandon statistics, and possibly even conversion statistics if they are available, particularly if they can be tracked back to the search data. All of this information helps build a clearer picture of the user’s needs and intent, with the goal of continual tuning. Eventually, it could lead to personalizing each search for the individual user performing it. To reach this ultimate goal of fully understanding users and their intent, it is appropriate to use big data techniques and tools, machine learning methodologies and technologies, and predictive analytics to improve the relevancy scores and continually improve the search results.
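As a rough illustration of scoring and boosting, the sketch below ranks documents with a deliberately simple weighted term count and then applies an administrator boost. This is a toy model, not how any production engine scores results, and the field weights and boost values are made-up assumptions.

```python
def score(doc: dict, query_terms: list[str], field_weights: dict, boosts: dict) -> float:
    """Weighted term-count score plus an optional administrator boost."""
    total = 0.0
    for field, weight in field_weights.items():
        words = doc.get(field, "").lower().split()
        # Count how often each query term appears in this field.
        total += weight * sum(words.count(term) for term in query_terms)
    return total + boosts.get(doc["id"], 0.0)


docs = [
    {"id": "a", "title": "Expense policy", "body": "How to file an expense report."},
    {"id": "b", "title": "Travel guide", "body": "Expense limits for travel. Expense forms."},
]
weights = {"title": 3.0, "body": 1.0}  # a title match counts more than a body match
query = ["expense"]


def rank(boosts: dict) -> list[str]:
    ordered = sorted(docs, key=lambda d: score(d, query, weights, boosts), reverse=True)
    return [d["id"] for d in ordered]


plain_order = rank({})            # no manual boost
boosted_order = rank({"b": 5.0})  # administrator favors document "b"
print(plain_order, boosted_order)
```

When evaluating a solution, it helps to ask which of these knobs (field weights, manual boosts, additional criteria) are exposed, and whether changing them requires a developer.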
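A few of these measurements can be computed directly from a search log. The sketch below assumes a hypothetical log format in which each entry records the query, the number of results returned, and the rank of the clicked result (or `None` if nothing was clicked); real log formats will differ.

```python
def search_metrics(log: list[dict]) -> dict:
    """Compute simple relevance indicators from search log entries."""
    total = len(log)
    if total == 0:
        return {"click_through_rate": 0.0, "abandon_rate": 0.0, "zero_result_rate": 0.0}
    clicked = sum(1 for e in log if e["clicked_rank"] is not None)
    zero_results = sum(1 for e in log if e["result_count"] == 0)
    # Abandoned here means results were shown but none was clicked.
    abandoned = sum(
        1 for e in log if e["result_count"] > 0 and e["clicked_rank"] is None
    )
    return {
        "click_through_rate": clicked / total,
        "abandon_rate": abandoned / total,
        "zero_result_rate": zero_results / total,
    }


# Hypothetical log entries for illustration only.
log = [
    {"query": "vpn setup", "result_count": 12, "clicked_rank": 1},
    {"query": "vpn setup", "result_count": 12, "clicked_rank": 4},
    {"query": "holiday calender", "result_count": 0, "clicked_rank": None},
    {"query": "expense form", "result_count": 30, "clicked_rank": None},
]
metrics = search_metrics(log)
print(metrics)
```

Queries with many results and no clicks, or with no results at all, are usually the first places to look when tuning.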
Licensing Models and Cost
How licensing and pricing work is an important criterion in deciding on the right application for an organization. Licensing can be very complex and have many components. The costs of these components may not scale linearly either, and there can be hidden costs that aren’t immediately obvious. For instance, although a purely open source solution could look inexpensive with no direct license expense, the ongoing support and additional development expense could end up being cost-prohibitive, particularly if the organization doesn’t have the skills to manage an open source solution properly. Some questions to ask on pricing include the following:
- Is the solution SaaS? On-premise? Hybrid?
- If on-premise, how would hosting be handled? Is there a flat fee or tiered pricing? Are there maintenance costs? Is the license price contingent upon the number of servers or processors?
- If SaaS or some type of hybrid, is there a base cost? Is it per month? Is there some additional volume-based expense per month (most commonly based on the number of queries)? Is there additional per-person pricing?
- How is support handled? Is this an extra expense? Are there maintenance expenses for additional years?
- How is training handled? What training expenses are necessary?
Security and Authentication
Protection of data continues to be one of the biggest challenges for modern organizations. Sensitive and proprietary documents need to be secured from individuals and systems that should not have access. Some areas to consider on security include the following:
- How is authorization provided?
- Is single sign-on available?
- Can the system provide document-level security?
- What other security capabilities does the system provide?
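Document-level security, in its simplest form, means trimming results so that users only see documents they are allowed to open. The sketch below shows that idea with group-based access lists; the `allowed_groups` field and the group names are hypothetical, and real systems usually enforce this inside the engine at query time rather than by filtering afterwards.

```python
def visible_results(results: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only documents whose access list shares a group with the user."""
    return [doc for doc in results if doc["allowed_groups"] & user_groups]


# Hypothetical search results, each tagged with the groups allowed to see it.
results = [
    {"id": "handbook", "allowed_groups": {"all-staff"}},
    {"id": "salary-bands", "allowed_groups": {"hr"}},
    {"id": "roadmap", "allowed_groups": {"product", "executives"}},
]

engineer_view = [d["id"] for d in visible_results(results, {"all-staff", "product"})]
print(engineer_view)
```

A useful evaluation question is where the access lists come from, for example whether the connectors carry permissions over from the source systems.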
Administration and Skills Necessary
Modern enterprise search solutions provide reporting and administrative capabilities so that employees can understand how the system is operating and optimize the search results. Evaluating the reporting, and understanding what options are available to tune the results, is necessary to gauge the breadth of the solution. Some considerations for administration include:
- Are there tools for synonyms? Is there an administrative interface to manage synonyms?
- How are misspellings handled? Is there an automated system to detect misspellings?
- What skills or employees are necessary to administer the system?
- Is there a way to boost favored content within the search results?
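To show what synonym management and misspelling detection amount to at the simplest level, here is a small sketch using Python's standard `difflib` module. The synonym table and vocabulary are made-up examples, and a commercial product would typically use far more sophisticated techniques than string similarity.

```python
import difflib

# Hypothetical, administrator-maintained synonym table and index vocabulary.
SYNONYMS = {"laptop": ["notebook"], "pto": ["vacation", "leave"]}
VOCABULARY = ["laptop", "vacation", "policy", "expense", "benefits"]


def expand_query(terms: list[str]) -> list[str]:
    """Add configured synonyms after each query term."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def did_you_mean(term: str):
    """Suggest the closest known word for a likely misspelling, or None."""
    if term in VOCABULARY:
        return None
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else None


expanded = expand_query(["pto", "policy"])
suggestion = did_you_mean("expence")
print(expanded, suggestion)
```

The administrative question is who maintains tables like `SYNONYMS`, and whether that can be done through an interface rather than a code change.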
While selecting an enterprise search solution can be complex and take quite a bit of time, a methodical evaluation process helps ensure that the right decisions are made. Rushing the decision for such an important portion of an enterprise network could cause problems with the customer experience for years. On the other hand, choosing the proper solution could supply the right technology for accurate access to information and a superior user experience.