InfoQ

News

Event Stream Processing: Scalable Alternative to Data Warehouses?

Posted by Sadek Drobi on Oct 31, 2008 03:32 PM

On his blog, Dan Pritchett suggests an alternative to data warehousing applications. Although wary of “solutions that can only be implemented in a single address and storage space”, he acknowledges that data sometimes needs to be aggregated in order to be analyzed. This is precisely what data warehousing applications do: they make it possible to aggregate information along a variety of axes and to invert relationships in the data. Their use, however, has significant downsides according to Pritchett. Not only are data warehousing applications expensive and “often out of the reach of smaller organizations”, but the way Extract, Transform and Load (ETL) software works also carries costs in scalability and data freshness:

First, the ETL places a significant load on your production databases. If your business has nice offline windows for the ETL, that's great, but if not, managing the scale becomes a challenge. Second, the freshness of the warehouse is typically 24 hours behind or more. As your business grows this lag will grow as well.

Pritchett believes a less expensive and more scalable approach exists: processing streams of events with an Event Stream Processor (ESP).

ESP analyze streams of events using a language similar to SQL. In the same manner that databases and data warehouses use SQL to perform analysis of data tables, ESP use their query language to analyze streams of events. The simplest way to understand ESP is to think of events as rows in a table and the attributes of an event as the columns. Each event type is the equivalent of a table.

[…]

[ESP analyzes] the changes to your data as it occurs. Rather than doing batch ETL's, you stream business events as the state of your data changes. This creates a more manageable scaling model for your production system.

[…]

ESP can also be horizontally scaled, providing a more cost effective solution for your business. And since ESP is performing the analysis in real time, the business metrics can be current and remain that way as the business grows.
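To make the table analogy in the quoted passages concrete, here is a minimal sketch in plain Python rather than in any particular ESP product's query language. The `OrderPlaced` event type, its fields, and the `revenue_per_window` function are hypothetical names chosen for illustration. The sketch treats each event as a row and computes a grouped sum once per fixed time window, assuming events arrive in timestamp order.

```python
from collections import defaultdict, namedtuple

# Hypothetical event type: it plays the role of a table,
# and its fields play the role of columns.
OrderPlaced = namedtuple("OrderPlaced", ["timestamp", "category", "amount"])


def revenue_per_window(events, window_seconds):
    """Yield (window_start, {category: total}) for each tumbling window.

    Rough stream analogue of
        SELECT category, SUM(amount) FROM OrderPlaced GROUP BY category
    evaluated once per time window. Assumes timestamp-ordered input.
    """
    current_start = None
    totals = defaultdict(float)
    for event in events:
        # Align the event to the start of the window it falls into.
        start = event.timestamp - (event.timestamp % window_seconds)
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(totals)
            totals = defaultdict(float)
        current_start = start
        totals[event.category] += event.amount
    if current_start is not None:
        # Emit the last, possibly partial, window.
        yield current_start, dict(totals)


stream = [
    OrderPlaced(0, "books", 10.0),
    OrderPlaced(20, "music", 5.0),
    OrderPlaced(45, "books", 2.5),
    OrderPlaced(61, "books", 4.0),
]
results = list(revenue_per_window(stream, 60))
for window_start, totals in results:
    print(window_start, totals)
```

Each result is available as soon as its window closes, and the production database is never queried in bulk. That is the property the quoted passages contrast with batch ETL.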

Pritchett points out, however, that this approach does not support historical analysis: once events have been processed, the business activity cannot be re-examined from a perspective other than the one chosen at the time. One solution he mentions is a framework for capturing and replaying transactions, although it would be rather costly. Commenting on the post, Tahir Akhtar suggests another option: replacing ETL with ESP while continuing to use a data warehouse, which preserves the ability to do historical analysis and still benefits from the scalability and responsiveness of ESP.
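The capture-and-replay idea can be sketched as follows. This is an illustrative assumption, not a design from Pritchett's post or from Akhtar's comment: `EventLog`, `process_live`, and `revenue_by_region` are hypothetical names, and the log is held in memory where a real system would need durable storage. Events are recorded as they are processed, so a question nobody asked at the time can be answered later by replaying them.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical append-only record of every event seen (in memory here)."""

    def __init__(self):
        self._events = []

    def capture(self, event):
        self._events.append(event)

    def replay(self):
        # Iterate over a snapshot so replay is unaffected by later captures.
        return iter(tuple(self._events))


def process_live(log, events):
    """Real-time metric chosen up front: a count of orders. Also captures each event."""
    count = 0
    for event in events:
        log.capture(event)
        count += 1
    return count


def revenue_by_region(events):
    """A different perspective, computed later from replayed events."""
    totals = defaultdict(float)
    for event in events:
        totals[event["region"]] += event["amount"]
    return dict(totals)


log = EventLog()
live_count = process_live(log, [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 7.5},
    {"region": "EU", "amount": 2.5},
])
historical = revenue_by_region(log.replay())
print(live_count, historical)
```

The cost Pritchett alludes to shows up here as storage: every event must be retained in order to be replayed, which is part of what a data warehouse fed by the event stream would provide.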
