When comparing Apache Spark vs Pachyderm, the Slant community recommends Apache Spark for most people. In the question "What are the best processing tools for Big Data?", Apache Spark is ranked 1st while Pachyderm is ranked 3rd. The most important reason people chose Apache Spark is:
Pandas-like data frame syntax for succinctly doing powerful aggregations, with no need to pollute your code with batching or scaling logic.
Pros
Pro High level API that scales
Pandas-like data frame syntax for succinctly doing powerful aggregations, with no need to pollute your code with batching or scaling logic.
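As a sketch of what that looks like, the same groupby/aggregate pattern reads almost identically in pandas and in Spark. The example below runs on plain pandas; the commented lines show the equivalent PySpark DataFrame calls, which distribute the same aggregation across a cluster with no batching logic in user code. The column names and data are made up for illustration.

```python
import pandas as pd

# Toy event data; on Spark this could be billions of rows.
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "spend": [10.0, 20.0, 5.0, 5.0, 15.0],
})

# High-level aggregation: total spend per user, no scaling logic needed.
totals = df.groupby("user")["spend"].sum().reset_index()

# The PySpark DataFrame equivalent (not executed here) is nearly identical:
#   from pyspark.sql import functions as F
#   totals = df.groupBy("user").agg(F.sum("spend").alias("spend"))

print(totals)
```

Spark also ships a `pyspark.pandas` API that accepts the pandas-style spelling directly, so much of this code can move to a cluster unchanged.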
Pro Cheap clusters with spot instances
Using AWS spot instances or GCE preemptible VMs allows you to start cheap clusters, averaging ~10% of the price of an on-demand instance.
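A back-of-the-envelope sketch of the savings. The ~10% figure is from the point above; the hourly rate, cluster size, and runtime are hypothetical numbers chosen only to make the arithmetic concrete.

```python
# Hypothetical on-demand price per instance-hour (illustrative only).
on_demand_hourly = 0.50
spot_fraction = 0.10        # spot/preemptible averaging ~10% of on-demand
nodes, hours = 20, 8        # hypothetical 20-node cluster running 8 hours

on_demand_cost = on_demand_hourly * nodes * hours
spot_cost = on_demand_cost * spot_fraction

print(f"on-demand: ${on_demand_cost:.2f}, spot: ${spot_cost:.2f}")
```

The trade-off is that spot/preemptible instances can be reclaimed at any time, which Spark tolerates reasonably well since it can recompute lost partitions.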
Cons
Con High learning curve
It takes time to understand the memory and time complexity of transforms, which leads to a rocky start with Spark while you keep hitting memory spills.
Con Reinventing the wheel
Pachyderm recreates a "modern" Hadoop MapReduce and HDFS, except with git-like versioned data. This ties you to their system, and you cannot run tools such as Spark or Hive on top.
Con Low level MapReduce API
The only ways to express transforms are map and reduce. Developer time is spent re-expressing high-level concepts such as counting, windowing, and graph traversal in this framework, instead of having Spark or Hive provide them.
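To make the contrast concrete, here is a word count written two ways in plain Python: once constrained to only map and reduce (the shape a MapReduce-only API forces on you, per the point above), and once with a high-level counting primitive of the kind Spark and Hive provide out of the box. The sample data is made up; the point is the extra ceremony the restricted version needs for the same result.

```python
from functools import reduce
from collections import Counter

words = ["spark", "hive", "spark", "spark", "hive", "pachyderm"]

# --- Constrained to map & reduce only ---
# map: turn each word into a one-element count dict
mapped = map(lambda w: {w: 1}, words)

# reduce: merge the per-word dicts into one running total
def merge(acc, kv):
    for k, v in kv.items():
        acc[k] = acc.get(k, 0) + v
    return acc

counts_mr = reduce(merge, mapped, {})

# --- High-level primitive (what Spark/Hive give you directly) ---
counts_hl = Counter(words)

print(counts_mr, dict(counts_hl))
```

In Spark this collapses to something like `df.groupBy("word").count()` or a SQL `GROUP BY`; in a map/reduce-only system, you write and maintain the merge logic yourself for every such aggregation.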