Fault tolerance means that a system keeps operating even when one or more of its components have failed.
For Erlang systems, this means the system keeps running even if, for example, one user's phone call has to be dropped, rather than forcing everyone else's calls to drop too.
To achieve this, Erlang's VM gives you:
- Knowledge of when a process died and why that happened
- The ability to force processes that depend on each other to die together when one of them fails
- A logger that logs every uncaught exception
- Nodes that can be monitored so that you find out when they go down
- The ability to restart failed processes (or groups of them)
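The first two items in the list can be sketched in a few lines. This is a minimal illustration, not production code: the module name `fault_demo`, the function names and the exit reason `bad_input` are made up for the example. It uses only the built-in functions `spawn_monitor/1`, `spawn_link/1` and `process_flag/2`.

```erlang
-module(fault_demo).
-export([watch/0, linked/0]).

%% Monitor: learn when a process died and why.
watch() ->
    %% spawn_monitor/1 spawns the worker and monitors it in one step.
    {Pid, Ref} = spawn_monitor(fun() -> exit(bad_input) end),
    receive
        %% The VM sends a 'DOWN' message that carries the exit reason.
        {'DOWN', Ref, process, Pid, Reason} ->
            {worker_died, Reason}
    after 1000 ->
        timeout
    end.

%% Link: tie two processes together. Without trap_exit the caller
%% would die with the worker; with it, the exit arrives as a message.
linked() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(bad_input) end),
    Result =
        receive
            {'EXIT', Pid, Reason} ->
                {linked_died, Reason}
        after 1000 ->
            timeout
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

In a real application you would rarely write this by hand: OTP supervisors are built on the same links and exit signals, and add the restart logic on top.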
Pro Upgrade code without stopping the system
In a real-time system it may not be possible to stop the system in order to upgrade its code. For these cases, Erlang with OTP gives you dynamic code upgrade support for free. The mechanism is easy to understand and works as follows:
- Start the app
- Edit the code
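That's all that is needed: once the edited module is recompiled and loaded, the app switches to the new code while it's still running. With an auto-reloading development tool, the reload happens and tests are run automatically.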
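What makes this work is that a running process moves to the newest loaded version of its module whenever it makes a fully qualified call (`Module:Function(...)`). A minimal sketch, assuming a hypothetical module named `counter` and a made-up message protocol:

```erlang
-module(counter).
-export([start/0, loop/1]).

%% Start a counter process at 0.
start() ->
    spawn(?MODULE, loop, [0]).

loop(N) ->
    receive
        {incr, From} ->
            From ! {count, N + 1},
            %% Fully qualified call: continues in the newest loaded
            %% version of this module, keeping the state N + 1.
            ?MODULE:loop(N + 1);
        stop ->
            ok
    end.
```

In the Erlang shell, `c(counter).` compiles and loads the module and `Pid = counter:start().` starts the process. After editing the file, running `c(counter).` again loads the new version, and the process picks it up on its next `{incr, From}` message without losing its count. The VM keeps only two versions of a module at a time, so a process that is still in the old version when a further reload purges it gets killed.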
Pro Battle proven
Erlang has been used in production for more than 20 years now. During that time it has proven over and over again that it works well in both small startups and large-scale enterprise systems.
Erlang has been used extensively by Ericsson themselves. For example, the AXD301 ATM switch, one of Ericsson's flagship products, is probably the largest Erlang project ever, with more than 1.1 million lines of Erlang code.
Con Useful in only one niche
Erlang is not really a general-purpose language. It has a very special and well-defined niche where it towers above everything else: scalable, distributed applications. That is not necessarily a bad thing per se, but Erlang falls behind other languages when it has to do things outside its niche.