What are the best self-hosted search engines for web applications?

Last updated: Oct 22, 2022
Typesense vs Appbase.io vs ElasticSearch vs Solr vs Ambar

Specs

Price

Free

Price

Pros + Cons

Pro
Flexible deployment options

Run it as a fully managed service, alongside an existing Elasticsearch cluster, or self-hosted.

Pro
Schemaless

Elasticsearch makes it easy to get started by not requiring you to define a schema before sending documents to be indexed. It automatically guesses field types for you; the guesses are not as precise as manually created mappings, but they are usually pretty accurate. Elasticsearch also lets you manually define the mappings (index structure) before creating the index. If you miss a field or add a new field without defining its mapping, Elasticsearch will try to guess the type for you.
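
To make the idea of guessed field types concrete, here is a minimal Python sketch. It is an illustration only, not Elasticsearch's actual detection logic: the function names are hypothetical, the type labels are loosely modeled on Elasticsearch's, and the date check assumes a single `YYYY-MM-DD` format.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a field type from a JSON value (illustrative rules only)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            # Assumption: only one date format is recognised in this sketch.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "text"
    return "unknown"


def guess_mapping(doc):
    """Build a field -> guessed type mapping for one document."""
    return {field: guess_field_type(value) for field, value in doc.items()}


mapping = guess_mapping(
    {"title": "Hello", "views": 3, "rating": 4.5, "published": True, "postDate": "2011-12-10"}
)
print(mapping)
```

A string such as "2011-12-10" is guessed as a date here while "Hello" stays text, which is the kind of guess that an explicit mapping would let you override.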

Pro
Designed to be distributed

The one area where Elasticsearch shines is distributed search. It was built from the ground up to be suitable for high-scale 'cloud' applications. As a result of being designed to be distributed, Elasticsearch has many features that aren't currently available in Solr, such as:
- Shards and replicas can be moved to any node in the cluster on demand.
- With a simple API call you can increase and decrease the number of replicas without shutting down nodes or creating new ones.
- Manipulate shard placement with the cluster reroute API on a live cluster.
- Search across multiple indexes.
- Change the schema without restarting the server.
- Automatic shard rebalancing.
Elasticsearch also has a module called Gateway which, if the whole cluster crashes or is taken down, lets you easily restore the latest state of the cluster when it comes back up. Services such as Bonsai further simplify scaling Elasticsearch by hosting and scaling the search servers for you, making it nearly as easy to get started as CloudSearch or Searchify. Elasticsearch was also specifically designed to run well and be relatively easy to set up on EC2.
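
As a rough sketch of the "simple API call" for changing the replica count, the Python below builds (but does not send) an update-settings request. The host, the index name `blog`, and the helper name are assumptions for illustration.

```python
import json
import urllib.request


def replica_update_request(base_url, index, replicas):
    """Build a PUT <index>/_settings request that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical local node and index name.
req = replica_update_request("http://localhost:9200", "blog", 2)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a cluster.
```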

Pro
Rapid feature development

Another thing to keep in mind when choosing a search solution is the development momentum. ElasticSearch has quickly caught up to the competition and most of the currently missing features are due to be released in upcoming versions.

Pro
Percolator (prospective search)

Essentially a reverse search. The percolator lets you register queries against an index and then send percolate requests that include a doc; you get back the queries, out of the set of registered queries, that match that doc. Not possible in Solr out of the box.
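
The reverse-search idea can be sketched with a toy in-memory version. This is not how Elasticsearch implements percolation: the class, its methods, and the "all terms must appear" matching rule are assumptions made purely for illustration.

```python
class ToyPercolator:
    """Toy prospective search: store queries first, then match documents against them."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        # A "query" here is simply a set of terms that must all appear in the doc.
        self._queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, doc_text):
        # Return the ids of every registered query that matches this document.
        words = set(doc_text.lower().split())
        return sorted(qid for qid, terms in self._queries.items() if terms <= words)


percolator = ToyPercolator()
percolator.register("es-alerts", ["elasticsearch", "release"])
percolator.register("solr-alerts", ["solr"])
matches = percolator.percolate("New Elasticsearch release announced today")
print(matches)
```

The direction is the point: queries are stored ahead of time and each incoming document is tested against them, which suits alerting and notification features.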

Pro
Easy to work with

Elasticsearch gained its popularity amongst developers by being enjoyable to use. A simple feature comparison against its competition doesn't convey the significant advantage of just how easy it is to work with. This is due to multiple design choices, such as the use of JSON for the API and queries.

Pro
Allows multiple types of documents per index

Another useful feature unique to Elasticsearch is the ability to have multiple types of documents in a single index. You can then facet, query or filter against all document types or a single type.

Pro
Handles nested documents

ElasticSearch natively handles a nested document structure. Nested documents are indexed as separate documents and stored in a way that allows quick join operations to access them. Accessing them requires a nested query, so they don't clutter results from standard queries.
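
A minimal sketch of what a nested query body can look like, built as a Python dictionary. The field names (`comments`, `comments.author`) and the helper name are hypothetical, and the field is assumed to be mapped as a nested type.

```python
import json


def nested_query(path, inner_query):
    """Wrap an inner query so it runs against the nested documents under `path`."""
    return {"query": {"nested": {"path": path, "query": inner_query}}}


# Hypothetical index where each blog post carries a nested list of comments.
body = nested_query("comments", {"match": {"comments.author": "alice"}})
print(json.dumps(body, indent=2))
```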

Pro
Aggregations

Another area where ElasticSearch shines is its aggregations feature. Similarly to facets (now deprecated), aggregations allow calculating and summarizing the data of a query as it happens. Aggregations can be nested and are broadly categorized into metrics aggregations and bucket aggregations.

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        },
        ["aggregations" : { [<sub_aggregation>]* } ]
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

Some unique ElasticSearch features are:
- Regular expressions to define which terms will be included in or excluded from results
- Combine term results from different fields automatically
- Use scripts to modify the field values before the calculation process steps in
- Awesome range searches
- Can specify a set of ranges, and it will return both document counts and aggregated data
- Modify the field and aggregated data with a script
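
To show how the template above is filled in, this sketch builds a bucket aggregation (terms) with a nested metrics aggregation (avg). The field names `category` and `price` and the helper name are assumptions for illustration.

```python
import json


def terms_with_avg(bucket_field, metric_field):
    """Bucket documents by one field and compute an average of another per bucket."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggregations": {
            f"by_{bucket_field}": {
                "terms": {"field": bucket_field},
                "aggregations": {
                    f"avg_{metric_field}": {"avg": {"field": metric_field}}
                },
            }
        },
    }


body = terms_with_avg("category", "price")
print(json.dumps(body, indent=2))
```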

Pro
RESTful JSON API for configuration/management

Elasticsearch has a REST API for management and configuration. The following are the main features of this API:
Index management:
- Create, delete, close and open indices by running a simple HTTP command.
- Increase and decrease the number of replicas without shutting down nodes or creating new ones.
- Manipulate shard placement with the cluster reroute API: move shards between nodes, cancel a shard allocation, or force a shard allocation, all on a live cluster.
- Check whether indices and types exist.
Configuration:
- Most of the configuration can be modified dynamically.
- Update mappings.
- Define, retrieve and manage warming queries.
- Shut down the entire cluster or a specific node.
- Clear caches at the index level.
This is all done over JSON, making it a lot more structured than the methods used in Solr.
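
As an illustration of the cluster reroute API mentioned above, this sketch builds the JSON body for moving one shard between nodes. The index and node names are hypothetical; the body would be sent as a POST to `_cluster/reroute`.

```python
import json


def move_shard_command(index, shard, from_node, to_node):
    """Build a reroute body that moves one shard from one node to another."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


# Hypothetical index and node names.
reroute_body = move_shard_command("blog", 0, "node1", "node2")
print(json.dumps(reroute_body, indent=2))
```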

Pro
Structured search queries with JSON

Search can be executed either using a simple, Lucene-based query string or using an extensive JSON-based search query DSL. By structuring the query as a JSON object you can be very explicit and can dictate exactly what ElasticSearch will return. A very basic example of a JSON query is:

curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{
    "query" : {
        "range" : {
            "postDate" : { "from" : "2011-12-10", "to" : "2011-12-12" }
        }
    }
}'

Con
Poor documentation

As a relatively new project, the documentation for ElasticSearch still leaves much to be desired. Documentation assumes that the user at least has familiarity with similar document stores, and is largely oriented toward those already familiar with other search solutions, such as Solr. Errors, while often quite simple to resolve, can be difficult to troubleshoot, as they are often insufficiently descriptive and missing from documentation. New users should be sure to check the tutorials section on elasticsearch.org for supplementary information lacking from the guide, such as more detailed installation instructions.

Con
Prone to 'Split Brain' Situations

The Sematext blog explains a problem with the way Elasticsearch handles its clusters, called the 'Split Brain Situation': Imagine a situation where your cluster is divided in half, so half of your nodes don't see the other half, for example because of a network failure. In such cases Elasticsearch will try to elect a new master in the cluster part that doesn't have one, and this will lead to the creation of two independent clusters running at the same time. This can be limited with a small degree of configuration, but it can still happen. Users have already run into this problem in production, and ElasticSearch host Bonsai has also had issues with it as recently as March 2012.
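
The "small degree of configuration" is not spelled out here. One common mitigation, assumed in this sketch, is to require a majority (quorum) of master-eligible nodes before a master can be elected; older Elasticsearch releases exposed this as the `discovery.zen.minimum_master_nodes` setting. The function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the majority quorum for a given number of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


quorum = minimum_master_nodes(6)
# With 6 nodes split 3/3 by a network failure, neither half reaches the quorum,
# so neither side can elect its own master.
print(quorum, 3 >= quorum)
```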

Con
Some missing features

Elasticsearch is currently missing the following features:
- Results Grouping / Field Collapsing
- Autocomplete
- Spell Checker / Did you mean (available as a third-party plugin)
- Decision Tree Faceting
- Query Elevation
- Hash-based deduplication

Pro
SpellChecker

Solr has the functionality to check and correct spelling mistakes in search queries. The three main implementations are:
- IndexBasedSpellChecker
- WordBreakSolrSpellChecker
- DirectSolrSpellChecker
This is currently not supported in ElasticSearch.
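
A hedged sketch of requesting spelling suggestions: it builds a request URL with Solr's `spellcheck` parameters. The host, the core name `products`, and the `/spell` handler path are assumptions that depend on how the Solr instance is configured.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, core, user_query):
    """Build a query URL that asks Solr for spelling suggestions and a collated rewrite."""
    params = {
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
    }
    # Assumption: a request handler with the spellcheck component is exposed at /spell.
    return f"{base_url}/{core}/spell?{urlencode(params)}"


url = spellcheck_url("http://localhost:8983/solr", "products", "serch engine")
print(url)
```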

Pro
Results Grouping

Unlike ElasticSearch, Solr allows you to group search results. Results can be grouped by:
- Field Value
- Query
- Function Query
You can also collapse multiple results with the same field value down to a single result.
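
Grouping by field value can be sketched as a request URL using Solr's `group` parameters. The host, core name, field name `brand`, and helper name are hypothetical.

```python
from urllib.parse import urlencode


def grouped_search_url(base_url, core, query, group_field, per_group=3):
    """Build a select URL that groups results by the values of one field."""
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        "group.limit": per_group,  # number of documents returned per group
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = grouped_search_url("http://localhost:8983/solr", "products", "laptop", "brand")
print(url)
```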

Pro
Distributed SQL

Solr has the ability to execute distributed SQL queries in parallel.

Pro
Customizability

A key differentiator of Solr is the level of customizability the SearchComponent feature provides. SearchComponent gives the developer astonishing flexibility in the way search queries are assembled and executed. At the time of writing, while ElasticSearch has a number of plugin points, there does not appear to be an equivalent of Solr's SearchComponent that enables you to modify the workflow of existing API endpoints.

Pro
100% open and free forever

Being an Apache Software Foundation product, Solr is guaranteed to remain 100% open source forever. No single corporation or group of people control Solr, but it is governed by a broad Project Management Committee (PMC) and maintained by 50+ committers voted in for their notable contributions and other merits.

Pro
Enterprise grade security

Solr comes with out-of-the-box, free support for encrypted traffic (SSL), password protection (Authentication), role based Authorization, and it is all pluggable so you can integrate as you wish.

Pro
Super stable

SolrCloud relies on Apache Zookeeper for its distributed features like failover, leader election, cluster wide operations etc. This prevents split brain syndrome and other typical challenges in distributed systems.

Pro
Large, active community

The community of users, programmers, consultants and downstream products is large and vibrant. The community maintains the product development, fixes bugs, provides professional support, consulting, training, 3rd party plugins etc. Whenever you have a question or problem, you will always find a helpful community member ready to answer.

Pro
Parent/Child documents

Support for powerful indexing of 1:* relationships through parent/child docs. Allows you to query children and display parents, or the other way around.

Pro
Hosting Support

The following services will host Solr for you. The great thing about these services is that they abstract away some of the difficulty of scaling Solr:
- WebSolr
- SolrHQ
- Measured Search

Pro
Stats component

Solr lets you view the average, standard deviation, maximum, minimum, and sum of squares of a particular numeric field. It also allows faceting of that numeric field based on the value(s) of other fields.

Pro
Decision tree faceting

Solr has a faceting feature called pivot facets or 'decision tree facets'. Pivot facets enable you to calculate facets inside a parent facet; for example, pivoting on 'size' then 'color' returns 'color' facet counts for each 'size' facet.
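
The 'size' then 'color' example can be mimicked with a toy in-memory count. This is only an illustration of what a pivot computes, not Solr's implementation; in Solr itself the equivalent request would use the `facet.pivot=size,color` parameter. The sample documents and function name are made up.

```python
from collections import Counter, defaultdict


def pivot_counts(docs, outer_field, inner_field):
    """Count inner-field values separately for each outer-field value."""
    counts = defaultdict(Counter)
    for doc in docs:
        counts[doc[outer_field]][doc[inner_field]] += 1
    return {outer: dict(inner) for outer, inner in counts.items()}


# Made-up sample documents.
docs = [
    {"size": "S", "color": "red"},
    {"size": "S", "color": "blue"},
    {"size": "M", "color": "red"},
    {"size": "S", "color": "red"},
]
counts = pivot_counts(docs, "size", "color")
print(counts)
```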

Pro
Local params

Solr has a great feature that enables you to use LocalParams to perform more advanced faceting. They provide a way to "localize" information about a specific argument that is being sent to Solr. In other words, LocalParams provide a way to add meta-data to certain argument types such as query strings.

From the Solr Wiki: LocalParams are expressed as prefixes to arguments to be sent to Solr. For example, assume we have the existing query parameter

q=solr rocks

We can prefix this query string with LocalParams to provide more information to the query parser, for example changing the default operator type to "AND" and the default field to "title" for the lucene query parser:

q={!q.op=AND df=title}solr rocks
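
A small sketch of assembling the LocalParams prefix programmatically and URL-encoding the result. The helper name is hypothetical, and values containing spaces or quotes would need extra quoting that this sketch does not handle.

```python
from urllib.parse import urlencode


def with_local_params(query, local_params):
    """Prefix a query string with a {!key=value ...} LocalParams block."""
    prefix = " ".join(f"{key}={value}" for key, value in local_params.items())
    return f"{{!{prefix}}}{query}"


q = with_local_params("solr rocks", {"q.op": "AND", "df": "title"})
print(q)
# The braces, '!' and spaces must be escaped when sent as a URL parameter.
print(urlencode({"q": q}))
```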

Con
Missing some useful features for cloud distribution

Solr is currently missing the following features that are useful when managing a distributed system:
- Automatic shard rebalancing
- Ability to relocate shards and replicas on demand

Con
Other Missing Features

Solr is currently missing the following general features:
- Per-doc/query analyzer chain
- Support for multiple document types per schema
- Equivalent to Elasticsearch's percolation

Pro
Instant Search

Language-analyzed full-text search including fuzzy queries, phrases and metadata search. Done in milliseconds, no matter how many documents are indexed or how complex the query is.

Pro
Supports instant search

Analyzes the language used to construct search queries for fuzzy querying, phrases and metadata search. This is all done almost instantaneously, no matter how many documents are indexed or how long the query itself is.

Pro
Very easy to use

Has a user-friendly web-based interface with real-time statistics, intuitive administration tools and a REST API.


Pros + Cons

Con Not suited for petabyte-scale data (e.g. logs)

Pro REST API

Pro Open source

Pro Instant Search Results

Pro Easy to set up and use

Pro Flexible deployment options

Pro Search relevance control plane

Pro UI components and kits to build search faster

Pro Supports geolocation, query rules, synonyms, typos

Pro Built-in access control

Pro Out-of-the-box search analytics

Pro Schemaless

Pro Designed to be distributed

Pro Rapid feature development

Pro Percolator (prospective search)

Pro Easy to work with

Pro Allows multiple types of documents per index

Pro Handles nested documents

Pro Aggregations

Pro RESTful JSON API for configuration/management

Pro Structured search queries with JSON

Con Poor documentation

Con Prone to 'Split Brain' Situations

Con Some missing features

Pro SpellChecker

Pro Results Grouping

Pro Distributed SQL

Pro Customizability

Pro 100% open and free forever

Pro Enterprise grade security

Pro Super stable

Pro Large, active community

Pro Parent/Child documents

Pro Hosting Support

Pro Stats component

Pro Decision tree faceting

Pro Local params

Con Missing some useful features for cloud distribution

Con Other Missing Features

Pro Instant Search

Pro Supports instant search

Pro Very easy to use
