Flint: A Distributed Computation Engine (over Named Data Networking)
Abstract
Distributed computing over big data typically relies on inter-machine communication. The vast majority of distributed computing systems, such as Spark, communicate over TCP (or UDP) and IP. However, this host-based addressing adds a layer of indirection: data cannot be located directly, because there is little correspondence between a piece of data and the name of the machine that holds it. To showcase the benefits and practicality of networking over data names in a distributed computing system, we present Flint, a distributed computation engine modeled after Spark and built on the Named Data Networking architecture. By exploiting features such as multicast data delivery over names, caching, and data security, we demonstrate the feasibility of a data-centric paradigm and its potential performance advantages for cluster computing over a dataset.