Fine-Grained Selection of Streaming Application Sub-Graphs for Different Levels of Data Processing Guarantee
Publication Date: 2016-Mar-07
The IP.com Prior Art Database
Disclosed is a method for allowing fine-grained specification and enforcement of different levels of data processing guarantees in different regions of a stream processing application, which is represented by a graph of operators connected by data streams. This method provides language-level abstractions that allow a developer to programmatically specify the tuple processing guarantee required by different parts of a stream processing application.
Stream processing applications have different requirements regarding the level of tuple processing guarantee. One level of guarantee is called at-most-once, where tuples are not duplicated but can sometimes be lost. This is common in applications that can tolerate approximate results, such as applications that compute a moving average over a time window. If a few samples are lost, the computed moving average can still be very close to the average computed over the complete stream. Another level of guarantee is at-least-once, where tuples are not lost but can be duplicated. This is appropriate when the application transforms incoming tuples and stores the results in an external system. In this case, although the output can contain duplicate results, every tuple is transformed and is present in the application output. The third level of guarantee is called exactly-once, where every tuple is processed once, with neither loss nor duplication. This is appropriate when the application must perform an exact computation over every tuple (e.g., count the exact number of tuples or compute the exact average of an attribute over a time window). Exactly-once is a stronger level of guarantee than at-least-once, and at-least-once is a stronger level of guarantee than at-most-once.
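The effect of the three levels on a result can be illustrated with a small simulation. The following is a minimal sketch, not part of the disclosed method: the names `Guarantee` and `deliver`, the single injected fault, and the sample values are all assumptions made for illustration.

```python
from enum import IntEnum
from statistics import mean


class Guarantee(IntEnum):
    """Guarantee levels, ordered from weakest to strongest (illustrative)."""
    AT_MOST_ONCE = 1
    AT_LEAST_ONCE = 2
    EXACTLY_ONCE = 3


def deliver(tuples, guarantee, faulty_index):
    """Simulate a single fault at position `faulty_index` (hypothetical model).

    AT_MOST_ONCE:  the tuple at the fault is lost.
    AT_LEAST_ONCE: the tuple at the fault is replayed, so it appears twice.
    EXACTLY_ONCE:  the tuple appears exactly once despite the fault.
    """
    out = []
    for i, t in enumerate(tuples):
        if i == faulty_index:
            if guarantee == Guarantee.AT_MOST_ONCE:
                continue  # tuple lost
            if guarantee == Guarantee.AT_LEAST_ONCE:
                out.append(t)  # extra copy from the replay
        out.append(t)
    return out


# Made-up samples for illustration only.
samples = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 13.0, 11.0]

lossy = deliver(samples, Guarantee.AT_MOST_ONCE, faulty_index=3)
dup = deliver(samples, Guarantee.AT_LEAST_ONCE, faulty_index=3)
exact = deliver(samples, Guarantee.EXACTLY_ONCE, faulty_index=3)

# The tuple count is exact only under exactly-once; the average under
# at-most-once stays close to the average over the complete stream.
print(len(lossy), len(dup), len(exact))
print(mean(samples), mean(lossy))
```

In this sketch the tuple count differs under at-most-once and at-least-once, which is why an exact count calls for exactly-once, while the average computed after one lost sample remains close to the average over all samples.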
Different levels of guarantee have different runtime costs, so it is desirable for a streaming platform to provide a configurable way to achieve each level. Commonly, a streaming platform forces an application to have a single level of tuple processing guarantee. This is very limiting for a large-scale application, because different parts of the application can be running analytics on different data sources, which can have different data rates and processing costs, and can therefore have distinct requirements. For example, one part of the application might perform an exact computation over market data feeds, while another part performs video processing, where some video frames can be lost. If a single level of processing guarantee is imposed on the whole application, the video processing pays unnecessary overhead to guarantee that every frame is processed.
The novel contribution is a method for allowing fine-grained specification and enforcement of different levels of data processing guarantees in different regions of a stream processing application, which is represented by a graph of operators connected by data streams.
The method specifies one or more regions for different levels of data processing guarantees, where each region is a sub-graph of said streaming application graph and each region has a given level of data processing guarantee. Specifying one or more regions includes:
1. Annotating an operator of said stream processing application graph as the starting point of a region
2. Identifying the region of the application graph for said level of guarantee by computing the reachabil...
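The annotation and region-identification steps can be sketched as follows. This is a minimal sketch under stated assumptions, since the text of the identification step is truncated here: it assumes a region consists of the annotated operator plus every operator reachable downstream from it. The function `region_from`, the graph encoding, the operator names, and the `annotations` mapping are hypothetical and are not taken from the disclosure.

```python
from collections import deque


def region_from(graph, start):
    """Return the operators reachable from `start`, including `start`.

    Assumption: reachability follows stream direction (operator to its
    downstream operators). Breadth-first traversal of the graph.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        op = queue.popleft()
        for downstream in graph.get(op, ()):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


# Hypothetical application graph: operator -> list of downstream operators.
app_graph = {
    "MarketFeed": ["Parse"],
    "Parse": ["ExactAverage"],
    "ExactAverage": ["MarketSink"],
    "MarketSink": [],
    "VideoFeed": ["Decode"],
    "Decode": ["Detect"],
    "Detect": ["VideoSink"],
    "VideoSink": [],
}

# Hypothetical annotations: region start operator -> level of guarantee.
annotations = {"MarketFeed": "exactly-once", "VideoFeed": "at-most-once"}

# One region per annotated start operator.
regions = {start: region_from(app_graph, start) for start in annotations}

# Level of guarantee assigned to each operator by its region.
guarantee_of = {
    op: level
    for start, level in annotations.items()
    for op in regions[start]
}

print(sorted(regions["VideoFeed"]))
print(guarantee_of["ExactAverage"])
```

The two regions in this example are disjoint by construction. How overlapping regions would be handled is not covered by this sketch.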