ETL Latency Holds Back Applications
The first generation of predictive applications completely separated the “rear-view mirror” from the “windshield”: a complex process of data extraction, transformation, and loading (ETL) moved data from an RDBMS (the “windshield,” running operational applications) to a Data Warehouse (the “rear-view mirror,” used for analysis).
The problem with this approach is that ETLing operational data into the analytical systems takes so long that the analysis is stale. You are making decisions in the moment based on yesterday’s, or even last week’s, experiences.
The Challenges of ETL Latency
Lambda Architecture – No ETL
The modern architecture for a predictive application is the “Lambda Architecture.” This is a new term for a simple concept: using dedicated scale-out compute engines for different workloads. Scale-out engines use clusters of inexpensive computers to work on data simultaneously. As the data requirements grow, more computers can be added to handle the workload. The scale-out architecture spreads the work as linearly as possible across the cluster to achieve the highest throughput. Scale-out architectures are also designed to handle failure, because in a cluster of computers simultaneously working on something, an exception is likely. In scale-out architectures there is typically no single point of failure: if a worker fails, another worker takes its place. Different engines implement this resilience in different ways, but it is key to enabling much lower-cost solutions than the RDBMS-ETL-DW approach, which typically requires far more expensive systems to scale.
Typical Lambda Architecture
One compute engine in a Lambda Architecture, typically called the speed layer, is used to ingest data and possibly enrich it via aggregations (e.g., averages). The speed layer can accommodate many fast data feeds simultaneously; Apache Kafka is a very popular engine used to power it. Another compute engine in a Lambda Architecture is used to perform analysis. Analytics takes longer because computation runs on the whole dataset or a significant portion of it; because of this latency, this engine is typically referred to as the batch layer. The third compute engine in a Lambda Architecture is the serving layer, which serves data to the application. This layer is typically very fast to read because it is the compute engine responsible for direct user interaction. For example, it can usually look up or change individual records in milliseconds.
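To make the speed layer’s enrichment role concrete, here is a minimal, self-contained sketch in Python. It is only an illustration: there is no real Kafka here, just an in-memory feed, and the `SpeedLayer` class and event names are invented. It maintains running per-key averages over an incoming stream of events:

```python
from collections import defaultdict

class SpeedLayer:
    """Toy stand-in for a streaming engine: ingests events and
    maintains running aggregates (here, per-key averages)."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def ingest(self, key, value):
        # Enrich on the way in: update the running aggregate
        # instead of waiting for a batch job to compute it later.
        self._sums[key] += value
        self._counts[key] += 1

    def average(self, key):
        return self._sums[key] / self._counts[key]

feed = SpeedLayer()
for event in [("clicks", 3.0), ("clicks", 5.0), ("orders", 120.0)]:
    feed.ingest(*event)

print(feed.average("clicks"))  # 4.0
```

A real speed layer would do this across a cluster of brokers and consumers, but the idea is the same: aggregates are updated as data arrives rather than in a nightly ETL job.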
The Lambda Architecture avoids slow ETL because it streams new data to both the batch and serving layers in parallel, so both have immediate access to the data. The speed layer acts like a circuit breaker, buffering data for those systems to consume as their bandwidth allows. For example, if new orders or new clicks stream to both the serving layer and the batch layer of a marketing application, then the serving engine that looks up the customer profile and applies a model to decide what to show the consumer can use the most recent data. Additionally, the next time the batch layer kicks off a pipeline of transformations to build a machine learning model, it also has the most up-to-date data to create the new predictor.
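The fan-out described above can be sketched in a few lines of Python. This is a toy, assuming in-memory queues in place of real message brokers; the function and buffer names are illustrative:

```python
import queue

# Illustrative stand-ins for the two downstream layers; each gets
# its own buffer so it can consume at its own pace.
serving_buffer = queue.Queue()
batch_buffer = queue.Queue()

def speed_layer_publish(event, subscribers):
    """Fan a new event out to every subscribing layer: each sees the
    event immediately, with no ETL step in between."""
    for buf in subscribers:
        buf.put(event)

for click in [{"user": 1, "page": "home"}, {"user": 2, "page": "cart"}]:
    speed_layer_publish(click, [serving_buffer, batch_buffer])

# The serving layer can drain its buffer right away for fresh lookups...
latest = serving_buffer.get()
# ...while the batch layer drains the same events later, when a
# model-building pipeline kicks off.
assert batch_buffer.qsize() == 2
```

The buffering is what lets a slow batch layer and a fast serving layer share the same stream without either one blocking the other.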
Lambda Architecture Limitations
While this architecture has been a great start for the first generation of predictive applications, it has severe limitations:
- Complexity – This architecture is extraordinarily complex to build and maintain because separate systems, often open-source projects written in different languages, need to be integrated and maintained. Teams of people are required to keep the infrastructure operational, tuned, and frequently updated to the latest versions with the latest bug fixes. We believe it would be better if these compute engines were pre-integrated.
- Specialized Skills – These compute engines require highly skilled and sought-after developers who program in multiple programming languages and distributed-system paradigms. We believe it would be better if this architecture were exposed in a ubiquitous declarative language like SQL, which hides the implementation of the computation even when the work requires complex distributed processing under the hood.
- Loose-coupling – The engines of Lambda Architectures are still loosely coupled, meaning changes in any one layer take time and effort to propagate to another layer. For example, any change the user makes in the serving layer needs to somehow make its way to the batch layer to be considered in analysis. We believe it would be better if the architecture were based on a single, durable store to which all the layers have simultaneous access.
- Concurrency – Unlike the operational databases of the past, which were designed to handle concurrent users, Lambda Architectures are extremely limited in this respect, leaving all the requirements of handling concurrent users to the application level. When many people or systems can change the same data at the same time, ACID properties are required.
- Resilience – When many people or systems can change the same data at the same time, or when long-running computational pipelines are required, ACID properties are very useful. These properties ensure that updates to a database are made properly even in the event of errors, power failures, and simultaneous updates. So if a billion-row transformation fails in the middle of a computation, the database can be rolled back to a consistent state. In many implementations of Lambda Architectures, pipelines need to be restarted after failures, often resulting in high latency and missed deadlines. When companies can’t publish reports to decision makers on time, the organization is flying blind. For example, some companies can’t run reports in time for analysts and managers to make decisions because pipelines are still running. The other major effect is on a company’s learning cycle time: if its data pipeline is not resilient, it is forced to update machine learning models infrequently and rely on stale data. This translates into customer dissatisfaction and lost revenue opportunities.
No More Duct Taping Compute Engines
Simply put, building lambda architectures is too hard because:
- You have to constantly duct tape many moving parts together
- The skill sets of people who can do that are extremely rare, expensive, and hard to recruit
- It takes a long time to do this
We set out to design a new kind of architecture we call Online Predictive Processing (OLPP) that implements a Lambda Architecture but wraps it in an easy-to-use SQL layer that:
- Would be accessible to existing IT and developers in a language they already know
- Would vastly reduce the time it takes to build predictive applications
- Would significantly reduce the time it takes for a predictive application to complete a learning cycle
OLPP is the combination of OLTP, OLAP, Machine Learning, Streaming and Notebooks in a seamlessly integrated platform.
No duct tape.
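As a rough illustration of the single-SQL-interface idea (not the actual OLPP product, and with sqlite3 standing in for a scale-out engine), one SQL engine can serve both an OLTP-style point update and an OLAP-style aggregate over the same durable store; the `events` table and values are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, amount REAL)")
db.executemany("INSERT INTO events VALUES (?, ?, ?)",
               [(1, "order", 30.0), (1, "order", 70.0), (2, "click", 0.0)])
db.commit()

# OLTP-style: a targeted update of one user's record, in the same store.
db.execute("UPDATE events SET amount = 40.0 WHERE user_id = 1 AND amount = 30.0")

# OLAP-style: an aggregate over the whole table, same engine, same SQL.
total = db.execute(
    "SELECT SUM(amount) FROM events WHERE kind = 'order'").fetchone()[0]
print(total)  # 110.0
```

Both workloads go through one declarative language against one store, which is the property the OLPP design above aims to deliver at scale.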