Three Fast Data Application Patterns
Monday, April 13, 2015 at 8:56AM
HighScalability Team in Strategy

This is a guest post by John Piekos, VP of Engineering at VoltDB. I understand this is a little PRish, but I think the ideas are solid.

The focus of many developers and architects in the past few years has been on Big Data, specifically mining historical intelligence from the Data Lake (usually a Hadoop stack containing terabytes to petabytes of data).

Now, product architects are asking how they can use this business intelligence for competitive advantage. Application developers, in turn, have come to see the value of acting in real time on streams of fast data; by applying the reporting wisdom of OLAP to data as it arrives, they can realize the benefits of both fast data and Big Data. As a result, a new set of application patterns has emerged, designed to capture value from fast-moving streaming data before it reaches Hadoop.

At VoltDB we call this new breed of applications “fast data” applications. Their goal is to do more than just push data into Hadoop as quickly as possible; it is to capture real-time value from the data the moment it arrives.

Because traditional databases historically haven’t been fast enough, developers have had to go to great lengths to build fast data applications: complex multi-tier systems that stitch together a handful of tools and typically run on a dozen or more servers. However, a new class of database technology, especially NewSQL offerings, has changed this equation.

If you have a relational database that is fast enough, highly available, and able to scale horizontally, building fast data applications becomes less esoteric and much more manageable. Three real-time application patterns have emerged as the dataflows needed to implement these applications. The patterns, enabled by new, fast database technology, are:

  1. Real-time Analytics

  2. Real-time Decision Engine

  3. Fast Data Pipeline

Let’s take a look at the characteristics of each of these fast data application patterns and at how a NewSQL database can simplify building them.

Real-time Analytics

This application pattern processes streaming data from one or many sources and performs real-time analytic computations on that fast data. Today this pattern is often combined with Big Data analytics, producing analytics across both fast data and Big Data.


In this pattern, the value captured from the fast data stream is primarily real-time analytics: the streaming engine tracks counts and metrics derived from each message. Applications tap these stored results to drive dashboards and, possibly, real-time alerts.
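
To make this concrete, here is a minimal sketch of an ingest-and-aggregate step written in the style of a VoltDB Java stored procedure. The events and counters tables and all column names are hypothetical, not from the article; a real schema would define them up front (and VoltDB could maintain the aggregates automatically with a materialized view instead).

    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    public class IngestAndCount extends VoltProcedure {

        // Record the raw event.
        public final SQLStmt insertEvent = new SQLStmt(
            "INSERT INTO events (source_id, event_time, value) VALUES (?, ?, ?);");

        // Maintain a running per-source aggregate for dashboards
        // (assumes a counters row has been seeded for each source_id).
        public final SQLStmt bumpCounter = new SQLStmt(
            "UPDATE counters SET event_count = event_count + 1, value_sum = value_sum + ? " +
            "WHERE source_id = ?;");

        // Return the current aggregate so callers can refresh displays or fire alerts.
        public final SQLStmt readCounter = new SQLStmt(
            "SELECT event_count, value_sum FROM counters WHERE source_id = ?;");

        public VoltTable[] run(long sourceId, long eventTime, long value) {
            voltQueueSQL(insertEvent, sourceId, eventTime, value);
            voltQueueSQL(bumpCounter, value, sourceId);
            voltQueueSQL(readCounter, sourceId);
            return voltExecuteSQL(true); // one atomic transaction per event
        }
    }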

Important features in this application pattern include:

Real-time Decision Engine

This application pattern processes inbound requests from many clients, perhaps tens of thousands simultaneously, and returns a low-latency response or decision to each client. This is the classic OLTP application pattern, but run at scale against high-velocity incoming data.

Scaling to support high-velocity, per-event transactions lets applications that evaluate campaign, policy, authorization, and other business logic respond in real time, in milliseconds. In this pattern, the business value lies in providing “smart,” or calculated, responses to high-velocity requests. Applications that use this model today include digital ad-tech campaign balance processing and ad selection (based on precomputed user segmentation or other mined heuristics), smart grid electricity management, and telecom billing and policy decisioning.

In all of these cases, the database processes incoming requests at exceptionally high rates. Each incoming request runs a transaction that calculates a decision and returns a response to the calling application. For example, an incoming telecom request (a new Call Data Record) may need to answer, “Does this user have enough balance to place this call?” A digital ad platform may ask, “Which of my ads should I serve to this mobile device, given each campaign’s available balance?”
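
As a concrete illustration, here is a minimal sketch of the telecom balance check above, again in the style of a VoltDB Java stored procedure. The subscribers table, its columns, and the return codes are hypothetical; the point is that the read, the decision, and the debit all happen inside one low-latency transaction.

    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    public class AuthorizeCall extends VoltProcedure {

        public final SQLStmt getBalance = new SQLStmt(
            "SELECT balance FROM subscribers WHERE subscriber_id = ?;");
        public final SQLStmt debit = new SQLStmt(
            "UPDATE subscribers SET balance = balance - ? WHERE subscriber_id = ?;");

        public static final long DENIED = 0;
        public static final long APPROVED = 1;

        // One transaction per Call Data Record: read the balance, decide,
        // and debit atomically, then return the decision to the caller.
        public long run(long subscriberId, long callCost) {
            voltQueueSQL(getBalance, subscriberId);
            VoltTable balance = voltExecuteSQL()[0];
            if (balance.getRowCount() == 0) {
                return DENIED; // unknown subscriber
            }
            if (balance.asScalarLong() < callCost) {
                return DENIED; // insufficient balance for this call
            }
            voltQueueSQL(debit, callCost, subscriberId);
            voltExecuteSQL(true);
            return APPROVED;
        }
    }

Because the whole decision is one transaction, two simultaneous requests for the same subscriber cannot both pass the balance check and overdraw the account.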

Important features in this application pattern include:

Fast Data Pipeline

This application pattern processes streaming data from one or many sources and performs real-time ETL (Extract, Transform, Load) on the data, delivering the result to a historical archive. In a streaming data pipeline, incoming data may be sessionized, enriched, validated, de-duped, aggregated, counted, cleansed, or discarded by the database before being delivered to the Data Lake.

Important features in this application pattern include:

Applications that use this pattern typically process continuous streams of data that must be validated, transformed, and archived in some manner. One example is processing device IDs (usually in the form of cookies): the pipeline computes segmentation intelligence, producing correlation data for advanced decisioning applications, often in the digital ad-tech arena.
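
Below is a minimal sketch of one pipeline step in the same VoltDB-style Java procedure form: validate the incoming device ID, de-dupe it, enrich it with a precomputed segment, and emit it downstream. The seen_ids, segments, and events_out tables are hypothetical, and events_out stands in for whatever export stream feeds the Data Lake.

    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    public class PipelineStep extends VoltProcedure {

        // Has this device ID been processed already?
        public final SQLStmt checkSeen = new SQLStmt(
            "SELECT device_id FROM seen_ids WHERE device_id = ?;");
        public final SQLStmt markSeen = new SQLStmt(
            "INSERT INTO seen_ids (device_id, first_seen) VALUES (?, ?);");

        // Precomputed segmentation used to enrich the record.
        public final SQLStmt lookupSegment = new SQLStmt(
            "SELECT segment FROM segments WHERE device_id = ?;");

        // Stand-in for an export stream that delivers to the archive.
        public final SQLStmt emit = new SQLStmt(
            "INSERT INTO events_out (device_id, segment, event_time, payload) VALUES (?, ?, ?, ?);");

        public long run(String deviceId, long eventTime, String payload) {
            if (deviceId == null || deviceId.isEmpty()) {
                return 0; // validate: discard malformed input
            }
            voltQueueSQL(checkSeen, deviceId);
            voltQueueSQL(lookupSegment, deviceId);
            VoltTable[] r = voltExecuteSQL();
            if (r[0].getRowCount() > 0) {
                return 0; // de-dupe: already delivered downstream
            }
            // Enrich with a precomputed segment, defaulting when none is known.
            String segment = r[1].advanceRow() ? r[1].getString("segment") : "unknown";
            voltQueueSQL(markSeen, deviceId, eventTime);
            voltQueueSQL(emit, deviceId, segment, eventTime, payload);
            voltExecuteSQL(true);
            return 1;
        }
    }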

The Fast Data Pipeline

The fast data processing layer must have the following properties across all use cases:

In addition, the three patterns require a system architected to deliver:

