When comparing Apache Spark vs Apache Flink, the Slant community recommends Apache Spark for most people. In the question “What are the best processing tools for Big Data?” Apache Spark is ranked 1st while Apache Flink is ranked 4th. The most important reason people chose Apache Spark is:
Pandas-like data frame syntax for succinctly doing powerful aggregations, with no need to pollute your code with batching or scaling logic.
Pros
Pro High level API that scales
Pandas-like data frame syntax for succinctly doing powerful aggregations, with no need to pollute your code with batching or scaling logic.
Pro Cheap clusters with spot instances
Using AWS spot instances or GCE preemptible VMs lets you start cheap clusters that average ~10% of the price of on-demand instances.
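To put the ~10% figure in concrete terms, here is a small back-of-the-envelope sketch. The node count, runtime, and hourly price are hypothetical placeholders, not real AWS or GCE prices, and the function name is invented for the example. It also ignores the cost of work that has to be redone when a spot or preemptible node is reclaimed.

```python
def cluster_cost(nodes, hours, on_demand_hourly, spot_fraction=0.10):
    """Return (on_demand_cost, spot_cost) for a cluster run.

    spot_fraction is the spot price as a fraction of the on-demand price;
    0.10 mirrors the ~10% average quoted above and is an assumption.
    """
    on_demand_cost = nodes * hours * on_demand_hourly
    spot_cost = on_demand_cost * spot_fraction
    return on_demand_cost, spot_cost


# Hypothetical job: 10 nodes for 8 hours at $0.50 per node-hour on demand.
on_demand, spot = cluster_cost(nodes=10, hours=8, on_demand_hourly=0.50)
print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}")
```

Actual spot prices vary by region, instance type, and time, so the fraction should be treated as a parameter rather than a constant.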
Cons
Con High learning curve
It takes time to understand the memory and time complexity of transformations, which makes for a rocky start with Spark while your jobs keep spilling memory.