Tags
Ranked in these QuestionsQuestion Ranking
Pros
Pro High level API that scales
Pandas-like data frame syntax for succinctly doing powerful aggregations, with no need to pollute your code with batching or scaling logic.
Pro Cheap clusters with spot instances
Using AWS spot instances or GCE preemptible VMs allows for starting cheap clusters, averaging ~10% the price of a on demand instance.
Cons
Con High learning curve
Takes time to understand memory & time complexity of transforms, which leads to a rocky start with Spark while you keep spilling memory.
