Because Elasticsearch is a relatively new project, its documentation still leaves much to be desired. The documentation assumes at least some familiarity with similar document stores and is largely oriented toward users who already know other search solutions, such as Solr. Errors, while often simple to resolve, can be difficult to troubleshoot because the messages are often insufficiently descriptive and are not covered in the documentation. New users should check the tutorials section on elasticsearch.org for information missing from the guide, such as more detailed installation instructions.
The Sematext blog explains a problem with the way Elasticsearch handles its clusters, called the 'Split Brain Situation':
Imagine a situation, where you cluster is divided into half, so half of your nodes don’t see the other half, for example because of the network failure. In such cases Elasticsearch will try to elect a new master in the cluster part that doesn’t have one and this will lead to creation of two independent clusters running at the same time. This can be limited with a small degree of configuration, but it can still happen.
Users have already run into this problem in production, and the Elasticsearch hosting provider Bonsai had issues with it as recently as March 2012.
Elasticsearch gained its popularity amongst developers by being enjoyable to use. A simple feature comparison against its competition doesn't convey how much easier it is to work with. This is due to multiple design choices, such as the use of JSON for the API and queries.
Elasticsearch is currently missing the following features:
Results Grouping / Field Collapsing
Autocomplete
Spell Checker/Did you mean (Available as a third-party plugin)
Decision Tree Faceting
Query Elevation
Hash-based deduplication.
Search can be executed either using a simple, Lucene-based query string or using an extensive JSON-based search query DSL. By structuring the query as a JSON object, you can be very explicit and dictate exactly what Elasticsearch will return. A very basic example of a JSON query is:
curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{
    "query" : {
        "range" : {
            "postDate" : { "from" : "2011-12-10", "to" : "2011-12-12" }
        }
    }
}'
Another area where Elasticsearch shines is its aggregations feature. Like facets (now deprecated), aggregations calculate and summarize data for a query as it runs. Aggregations can also be nested, and they are broadly categorized as metrics aggregations and bucket aggregations. The general structure is:
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        },
        ["aggregations" : { [<sub_aggregation>]* } ]
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
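As a concrete instance of the structure above, the following minimal sketch builds a request body with a bucket aggregation (terms) that contains a nested metrics aggregation (avg). The aggregation names and the field names "tag" and "rating" are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical field names ("tag", "rating"). The nesting mirrors the
# generic structure: a bucket aggregation with a metrics sub-aggregation.
request_body = {
    "aggregations": {
        "posts_per_tag": {                      # <aggregation_name>
            "terms": {"field": "tag"},          # <aggregation_type>: bucket
            "aggregations": {
                "average_rating": {             # nested sub-aggregation
                    "avg": {"field": "rating"}  # metrics aggregation
                }
            },
        }
    }
}

# Serialize to the JSON that would be sent as the search request body.
print(json.dumps(request_body, indent=2))
```

Each bucket produced by the terms aggregation would carry its own average, which is the kind of nesting that facets did not offer.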
One area where Elasticsearch particularly shines is distributed search. It was built from the ground up to be suitable for high-scale 'cloud' applications.
There are many features Elasticsearch has as a result of being designed to be distributed that aren't currently available in Solr, such as:
Shards and replicas can be moved to any node in the cluster on demand.
With a simple API call you can increase and decrease the number of replicas without the need of shutting down nodes or creating new nodes.
Manipulate shard placement with the cluster reroute API on a live cluster.
Search across multiple indexes.
Change the schema without restarting the server.
Automatic shard rebalancing
Elasticsearch also has a module called Gateway that, if the whole cluster crashes or is taken down, lets you easily restore the latest state of the cluster when it comes back up.
Services such as Bonsai further simplify scaling Elasticsearch by hosting and scaling the search servers for you, making it nearly as easy to get started as CloudSearch or Searchify. Elasticsearch was also specifically designed to run well and be relatively easy to set up on EC2.
Elasticsearch has a REST API for management and configuration. The following are the main features of this API:
Index Management:
Create, delete, close and open indices by running a simple HTTP command.
Increase and decrease the number of replicas without the need of shutting down nodes or creating new nodes.
Manipulate shard placement with the cluster reroute API: move shards between nodes, cancel a shard allocation, or force a shard allocation, all on a live cluster.
Check whether an index or type exists.
Configuration:
The majority of the configuration can be modified dynamically.
Update Mappings
Define, retrieve and manage warming queries
Shut down the entire cluster or a specific node
Clear caches on the index level
This is all done over JSON, making it a lot more structured than the methods used in Solr.
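To illustrate what such management calls look like, here is a minimal sketch that builds (but does not send) two requests: one changing the replica count through the index settings endpoint and one moving a shard with the cluster reroute API. The base URL, the index name "blog", and the node names "node1" and "node2" are assumptions made for the example, and the helper build_request is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node


def build_request(method, path, body):
    """Build (but do not send) a JSON request for the REST API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Change the replica count of a hypothetical "blog" index on a live cluster.
set_replicas = build_request(
    "PUT", "/blog/_settings", {"index": {"number_of_replicas": 2}}
)

# Move shard 0 of "blog" between two hypothetical nodes via cluster reroute.
move_shard = build_request(
    "POST",
    "/_cluster/reroute",
    {
        "commands": [
            {
                "move": {
                    "index": "blog",
                    "shard": 0,
                    "from_node": "node1",
                    "to_node": "node2",
                }
            }
        ]
    },
)

for request in (set_replicas, move_shard):
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))

# To actually send a request: urllib.request.urlopen(set_replicas)
```

Building the requests separately from sending them makes it easy to inspect the JSON before it touches a live cluster.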
Another thing to keep in mind when choosing a search solution is the development momentum. ElasticSearch has quickly caught up to the competition and most of the currently missing features are due to be released in upcoming versions.
Elasticsearch makes it easy to get started by not requiring you to define a schema before sending documents to be indexed. Elasticsearch will automatically guess field types for you; the result is not as accurate as creating the mappings manually, but it is usually pretty accurate.
Elasticsearch also lets you manually define the mappings (index structure) before creating the index. If you miss a field or add a new field without defining its mapping, Elasticsearch will try to guess the type for you.
Another useful feature unique to Elasticsearch is the ability to have multiple types of documents in a single index. You can then facet, query or filter against all document types or a single type.
Percolation is essentially a reverse search. The percolator allows you to register queries against an index and then send percolate requests that include a document, getting back the registered queries that match that document. This is not possible in Solr out of the box.
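The reverse-search idea can be sketched in a few lines. This is a conceptual illustration in plain Python, not the Elasticsearch percolator API: registered queries are modeled as simple predicates, and the query names and document fields are hypothetical.

```python
# Conceptual illustration only: plain Python, not the Elasticsearch
# percolator API. Registered "queries" are modeled as predicates over a dict.
registered_queries = {
    "mentions_search": lambda doc: "search" in doc["body"].lower(),
    "by_alice": lambda doc: doc["author"] == "alice",
    "long_post": lambda doc: len(doc["body"].split()) > 100,
}


def percolate(doc):
    """Return the names of all registered queries that match the document."""
    return sorted(
        name for name, matches in registered_queries.items() if matches(doc)
    )


doc = {"author": "alice", "body": "Distributed search made simple"}
print(percolate(doc))  # ['by_alice', 'mentions_search']
```

A normal search holds the documents and receives a query; here the queries are held and the document arrives, which is what makes the approach useful for alerting on new content.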
Elasticsearch natively handles a nested document structure.
Elasticsearch indexes nested documents as separate documents and stores them in a way that allows quick join operations to access them. Nested documents require a nested query to access, so they don't clutter results from standard queries.
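As a minimal sketch of how this looks in practice, the snippet below builds a mapping fragment that marks a field as nested and a nested query that targets it. The field name "comments" and its sub-field "author" are hypothetical, and the exact mapping layout around this fragment may differ between Elasticsearch versions.

```python
import json

# Hypothetical "comments" field holding nested documents on a blog post.
mapping_fragment = {
    "properties": {
        "comments": {"type": "nested"}
    }
}

# A nested query names the path of the nested field and wraps an inner query
# that runs against the nested documents.
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {"match": {"comments.author": "alice"}},
        }
    }
}

print(json.dumps(mapping_fragment, indent=2))
print(json.dumps(nested_query, indent=2))
```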
Solr is currently missing the following general features:
Per-doc/query analyzer chain
Support for nested documents
Support for multiple document types per schema
Ability to modify document scores with custom scripts
Equivalent to Elasticsearch's percolation
Solr is currently missing the following features that are useful when managing a distributed system:
Automatic shard rebalancing
Ability to re-locate shards and replicas on demand
Ability to change the schema without restarting the server
Ability to search across multiple indexes.
A key differentiator of Solr is the level of customizability the SearchComponent feature provides.
SearchComponent gives the developer astonishing flexibility in the way search queries are assembled and executed. At the time of writing, there does not appear to be an Elasticsearch equivalent of SearchComponent.
Whilst Elasticsearch has a number of plugin points, there doesn't appear to be an equivalent of Solr's SearchComponent that enables you to modify the workflow of existing API endpoints.
Solr lets you view the average, standard deviation, maximum, minimum, and sum of squares of a particular numeric field. It also allows faceting of that numeric field based on the value(s) of other fields.
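A minimal sketch of such a request, assuming a local Solr instance and the hypothetical fields "price" and "category", could build the query string with the stats parameters like this:

```python
from urllib.parse import urlencode

# Assumed local Solr URL and hypothetical field names ("price", "category").
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

params = [
    ("q", "*:*"),
    ("rows", "0"),                # only the statistics are of interest
    ("stats", "true"),            # enable field statistics
    ("stats.field", "price"),     # numeric field to summarize
    ("stats.facet", "category"),  # break the statistics down by another field
]

stats_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(stats_url)
```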
Solr allows you to group search results. Results can be grouped by:
Field Value
Query
Function Query
You can also collapse multiple results with the same field value down to a single result.
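The three grouping modes map to the request parameters group.field, group.query and group.func. The sketch below only builds the query strings; the helper grouping_query, the search term, and the fields "brand" and "price" are hypothetical.

```python
from urllib.parse import urlencode


def grouping_query(q, **group_params):
    """Build a Solr query string with result grouping enabled."""
    params = [("q", q), ("group", "true")]
    params += [("group." + name, value) for name, value in group_params.items()]
    return urlencode(params)


# Hypothetical fields "brand" and "price".
by_field = grouping_query("shoes", field="brand")
by_query = grouping_query("shoes", query="price:[0 TO 50]")
by_function = grouping_query("shoes", func="rint(div(price,10))")

for query_string in (by_field, by_query, by_function):
    print(query_string)
```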
Solr has a faceting feature called pivot facets or 'decision tree facets'. Pivot facets enable you to calculate facets inside a parent facet. For example, pivoting on 'size' then 'color' returns 'color' facet counts for each 'size' facet.
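To make the result shape concrete, the following plain-Python sketch computes the same kind of counts over a few hypothetical documents. It illustrates the concept only; in Solr the equivalent request would use the parameters facet=true and facet.pivot=size,color.

```python
from collections import Counter, defaultdict

# Hypothetical documents; Solr would compute this server-side with
# facet=true&facet.pivot=size,color.
docs = [
    {"size": "S", "color": "red"},
    {"size": "S", "color": "blue"},
    {"size": "S", "color": "red"},
    {"size": "M", "color": "red"},
]


def pivot_counts(documents, parent, child):
    """Count child-field values inside each parent-field value."""
    pivot = defaultdict(Counter)
    for doc in documents:
        pivot[doc[parent]][doc[child]] += 1
    return {value: dict(counts) for value, counts in pivot.items()}


print(pivot_counts(docs, "size", "color"))
# {'S': {'red': 2, 'blue': 1}, 'M': {'red': 1}}
```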
Solr has a great feature that enables you to use LocalParams to perform more advanced faceting. They provide a way to "localize" information about a specific argument that is being sent to Solr. In other words, LocalParams provide a way to add meta-data to certain argument types such as query strings. From the Solr Wiki:
LocalParams are expressed as prefixes to arguments to be sent to Solr. For example:
Assume we have the existing query parameter
q=solr rocks
We can prefix this query string with LocalParams to provide more information to the query parser, for example changing the default operator type to "AND" and the default field to "title" for the lucene query parser:
q={!q.op=AND df=title}solr rocks
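A small sketch of how a client might assemble such a parameter is shown below. The helper with_local_params is hypothetical; it only concatenates the prefix and URL-encodes the result for use in an HTTP request.

```python
from urllib.parse import urlencode


def with_local_params(query, local_params):
    """Prefix a query string with Solr LocalParams, e.g. {!q.op=AND df=title}."""
    prefix = " ".join(f"{name}={value}" for name, value in local_params.items())
    return "{!" + prefix + "}" + query


q = with_local_params("solr rocks", {"q.op": "AND", "df": "title"})
print(q)                    # {!q.op=AND df=title}solr rocks
print(urlencode({"q": q}))  # URL-encoded form for an HTTP request
```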
The following services will host Solr for you. The great thing about these services is that they abstract away some of the difficulty of scaling Solr:
WebSolr
SolrHQ
Solr can check and correct spelling mistakes in search queries. The three main implementations are:
IndexBasedSpellChecker
WordBreakSolrSpellChecker
DirectSolrSpellChecker
Swiftype integrates with all major third-party platforms, offering a Shopify App, Magento Extension, and WordPress Plugin, with more to come. Swiftype also provides tutorials for adding Swiftype to Tumblr, Jimdo, Heroku, Weebly, CloudFlare, WebStarts and Desk.com. Their support questions also cover fixes for WooCommerce, how to add Swiftype to any CMS (such as Drupal or Jekyll), and searching across content types (like WordPress using GoDaddy Shopping Cart).
Swiftype has client libraries for Python, Ruby, Node.js, Java and PHP, separate search and autocomplete libraries for jQuery, an iOS SDK, and an Android SDK.
In addition to their 14-day free trial, Algolia supports their community with a free plan as well as discounts for non-profits, students and the open source community.
Algolia's engine has been built in such a way that you can index and search any language, or even several languages at the same time.
The engine is also typo tolerant, and will allow for up to two typos in each word of the search query.
This typo tolerance feature is also language agnostic, as it relies on optimized data structures and "fuzzy" tree traversals (implementing a Damerau-Levenshtein distance algorithm) instead of using dictionaries.
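To show what the distance measures, here is a minimal sketch of the restricted Damerau-Levenshtein variant (often called optimal string alignment). It is a textbook dynamic-programming version for illustration, not Algolia's implementation, which relies on tree traversals; the helper within_typo_budget and its default of two typos are assumptions based on the limit mentioned above.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def within_typo_budget(word, candidate, max_typos=2):
    """Hypothetical helper: accept a candidate within the typo limit."""
    return osa_distance(word, candidate) <= max_typos


print(osa_distance("serach", "search"))                      # 1
print(within_typo_budget("elasticsaerch", "elasticsearch"))  # True
```

Because the comparison works on characters rather than on a word list, the same check applies to any language, which is the property the paragraph above describes.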
Their Tie-Breaking Algorithm gives you powerful relevance from day one that you can customize as much as you want by integrating the business metrics that matter most.
Only 1000 results per search.
Workaround: "You can use our Browse method implemented in all our API clients. This method supports most of the search parameters, which allows you to retrieve results beyond this limit."
While ElasticSearch and Solr both have active open-source communities propelling the technology forward, CloudSearch is closed. This has multiple disadvantages such as:
Constrained by what Amazon allows you to modify/customize.
No transparency behind new feature development.
Potentially slower development of new features.
No way to modify/extend the search algorithms.
No existing language-specific API for calling CloudSearch and processing the response into objects.
The primary differentiator of CloudSearch is how simple it makes the lives of the developers using it. Not only does it scale automatically, but developers can also change search parameters, fine-tune search relevance, and apply new settings at any time from a simple dashboard, without having to upload the data again.
CloudSearch also automatically takes care of:
Hardware provisioning
Data partitioning
Software patches
CloudSearch dynamically scales as the amount of searchable data increases or as the query rate changes. The search system utilizes well-understood and automated sharding and replication to scale.
CloudSearch will automatically add search instances and index partitions as required as well as add and remove replicas to respond to changes in search request traffic.
CloudSearch lets you add the following features:
Faceted search
Free text search
Boolean search expressions
Customized relevance ranking
Field-based sorting and searching
Text processing options such as stopwords, synonyms, and stemming.
CloudSearch does out-of-the-box ranking of search results with simple controls to let developers tweak the ranking. You can add stopwords, perform stemming, and add synonyms.
We were able to build a custom search interface with their SDKs. This was tailored to our website, and goes well beyond what we were originally expecting.