By: Adit Madan, Jasmine Wang and Bin Fan, Alluxio
This article highlights the synergy between two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.
Condition 1: Evolution of the data platform does not require changes
All data platforms evolve over time, whether through the addition of a new data store, a new compute engine, or a new team that needs to access shared data. In any of these cases, a data platform is self-serve if it accommodates the evolution without requiring changes.
Condition 2: Isolation across Teams
With a self-serve platform, business units don’t step on each other. When a new team is introduced, its data access should have no impact on the existing usage of the shared data infrastructure.
The combination of the above two offers agility, which oftentimes is more important than the cost of physical infrastructure.
Below, we introduce some considerations when designing a self-serve platform, and architectural patterns for simple solutions.
This offers the flexibility to choose the optimal service across environments.
The solution for shared data is to have an abstraction layer across heterogeneous compute. Alluxio provides such an abstraction across clouds for seamless sharing of data between Presto and other compute engines regardless of the data store.
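To make the idea of an abstraction layer concrete, here is a minimal sketch of a unified namespace: logical paths that compute engines use are resolved to whichever data store actually holds the data. This is a toy model, not Alluxio's API; the `MountTable` class, its method names, and the bucket and host names are all hypothetical and chosen only for illustration.

```python
from typing import Dict


class MountTable:
    """Toy unified namespace (hypothetical, not Alluxio's implementation).

    Maps logical path prefixes to backing-store URIs so that a reader
    never needs to know which store holds the data.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, logical_prefix: str, store_uri: str) -> None:
        # Normalize trailing slashes so prefix matching stays simple.
        self._mounts[logical_prefix.rstrip("/")] = store_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Collect every mount point that contains the path.
        candidates = [
            prefix
            for prefix in self._mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount point covers {logical_path}")
        # The longest matching prefix is the most specific mount.
        best = max(candidates, key=len)
        return self._mounts[best] + logical_path[len(best):]


if __name__ == "__main__":
    table = MountTable()
    # Hypothetical stores in two different environments.
    table.mount("/sales", "s3://example-bucket/sales")
    table.mount("/logs", "hdfs://example-namenode:8020/logs")
    print(table.resolve("/sales/2020/part-0.parquet"))
    print(table.resolve("/logs/app.log"))
```

In this sketch, moving a dataset to a different store only changes the mount entry; every consumer keeps using the same logical path.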
Although replication provides isolation, governance becomes complex because the owner of the data enforces strict policies about how it is consumed.
Copies introduce redundancy, which is error-prone and has high resource requirements.
It may seem obvious that a solution is to not make copies of data, but what about performance when we don’t move data? This calls for a single abstraction layer that takes care of governance, performance, and movement of data across ownership domains.
The architecture below shows Presto using the Alluxio layer for access to data regardless of the location.
The above design can be broken down into a few simple cases.
In all these cases, the separation of the CONSUMER from the PRODUCER of data is enabled by an abstraction layer that provides more than a simple cache. Advanced preloading and write capabilities guarantee SLAs even with the separation of data from compute.
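The difference between a simple cache and a layer with preloading can be sketched as follows. This is an illustrative model under stated assumptions, not Alluxio's implementation: `ReadThroughCache`, `fetch_remote`, and the `remote_reads` counter are hypothetical names, and the remote store is simulated with an in-memory dictionary.

```python
from typing import Callable, Dict, Iterable


class ReadThroughCache:
    """Toy cache near the consumer (hypothetical, for illustration only)."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # call into the producer's store
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts trips to the remote store

    def _load(self, path: str) -> None:
        self._cache[path] = self._fetch_remote(path)
        self.remote_reads += 1

    def preload(self, paths: Iterable[str]) -> None:
        # Pull data ahead of time so the first query does not pay
        # the remote-read latency.
        for path in paths:
            if path not in self._cache:
                self._load(path)

    def read(self, path: str) -> bytes:
        # A plain cache only fills on a miss; preloaded paths skip this.
        if path not in self._cache:
            self._load(path)
        return self._cache[path]


if __name__ == "__main__":
    # Simulated remote store owned by the data producer.
    remote = {"/sales/a": b"rows-a", "/sales/b": b"rows-b"}
    cache = ReadThroughCache(remote.__getitem__)
    cache.preload(["/sales/a"])
    cache.read("/sales/a")  # served locally, no remote read
    cache.read("/sales/b")  # cache miss, one remote read
    print("remote reads:", cache.remote_reads)
```

With preloading, the first read of a dataset behaves like every later read, which is what makes a latency target realistic when the data lives far from the compute.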
With a self-serve data architecture across clouds, we construct a solution that stands the test of time as a data platform evolves. Learn more from the whitepaper Presto with Alluxio Overview – Architecture Evolution for Interactive Queries, and see how companies including Facebook, TikTok, Electronic Arts, Walmart, Tencent, and Comcast level up their current Presto platforms with Alluxio.