Watching his presentation reminded me of a conversation I had with an ex-colleague about messaging and distributed execution frameworks.
With Big Data, the amount of data is so large that you cannot process everything sequentially. Instead you parallelize the execution across several nodes. In practice, this means distributing the execution: the execution code (the mapper and reducer tasks) moves to the data nodes - where the data to process is stored - and runs locally.
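To make the idea concrete, here is a minimal sketch of "moving the code to the data". It is not how Hadoop or Infinispan implement it: the `NODES` dictionary, the `run_on_cluster` helper and the word-count job are all hypothetical, and threads stand in for real cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical cluster: each "node" holds its own partition of the data.
NODES = {
    "node-a": ["to be or not to be"],
    "node-b": ["to do is to be"],
}

def mapper(lines):
    """Count the words in the lines stored on one node."""
    return Counter(word for line in lines for word in line.split())

def reducer(left, right):
    """Merge two partial word counts."""
    return left + right

def run_on_cluster(mapper, reducer):
    # The mapper is handed to every node and runs against that node's data;
    # only the small partial results travel back to be reduced.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = list(pool.map(mapper, NODES.values()))
    return reduce(reducer, partials, Counter())

word_counts = run_on_cluster(mapper, reducer)
```

The point of the sketch is what travels: the functions go to the nodes, and the raw lines never leave them.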
With Messaging, you can achieve the opposite: distributing the data. Messages containing the data are sent to consumers that will process them. In many cases, the data transported inside the messages is not the data to process; it can be an event or contain the location of the data to process. Nonetheless, the result is the same: the execution code is not moved; it fetches the data and processes it on its own node.
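The messaging side can be sketched in the same spirit. This is only an illustration under assumptions: an in-process `queue.Queue` plays the role of the message broker, and the `STORAGE` dictionary, the message fields and the `consumer` function are hypothetical names. Each message carries an event and the location of the data rather than the data itself.

```python
import queue
import threading

# Hypothetical in-memory stand-in for remote storage the consumer can reach.
STORAGE = {"orders/1": [3, 4], "orders/2": [10]}

def consumer(inbox, results):
    # The processing code stays on this node; messages bring the work to it.
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more messages
            break
        data = STORAGE[message["location"]]  # fetch the data the message points to
        results[message["location"]] = sum(data)

inbox = queue.Queue()
results = {}
worker = threading.Thread(target=consumer, args=(inbox, results))
worker.start()

# The producer only sends small messages describing where the data lives.
for location in STORAGE:
    inbox.put({"event": "order-created", "location": location})
inbox.put(None)
worker.join()
```

Here the consumer code never moves; the messages tell it which data to pull onto its own node.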
- Messaging moves the data close to the execution code.
- Distributed Execution moves the execution code close to the data.
If I understood Galder correctly, Infinispan offers both approaches:
- a Map/Reduce model where the mapper and reducer tasks are sent to the cluster nodes to distribute the code execution.
- Listeners and Notifications that can be used as a kind of messaging bus.
Messaging is a model that is widely used in enterprise applications and may become mainstream on the Web once WebSockets are supported by all Web browsers.
Distributed execution frameworks like Hadoop Map/Reduce or Twitter Storm will become mainstream as the amount of generated data to process and analyze (on the Web or behind firewalls for enterprise software) continues to grow.
Infinispan seems to hit a sweet spot by providing both models and the ability to mix and match them. I am looking forward to seeing what's next for Infinispan and the best ways to leverage it.