
Real-Time Analytics

Processing huge amounts of data requires the highest scalability of the infrastructure as well as of the applied technologies and algorithms, so that data ingest, storage, data analysis, and anomaly detection can be handled in real time.

Our research in the field of real-time analytics is driven by the need to process and analyze millions of metrics and events in real time. Why? Because processing such huge amounts of data demands high scalability not only from the infrastructure but also from the applied technologies and algorithms. Only then can data ingest and storage be handled, and large-scale data analysis and anomaly detection be performed, in real time.

Technologies: The ongoing emergence of new technologies requires continuous monitoring and evaluation of their relevance and applicability to real-time data analysis. Extensive feasibility studies, including performance tests of different technology stacks, are necessary to make the right design decisions for the next product generation. Selecting the most suitable architecture is crucial for future success.

Algorithms: The large data volumes involved make highly efficient algorithms indispensable. Sketching, sampling, and other data compaction algorithms enable significant data reduction without losing important information. Fast hashing algorithms are essential for real-time indexing. Distributed agreement and load balancing algorithms are needed to orchestrate the data flow in the cluster. Finally, algorithms for real-time analysis and anomaly detection must offer the best possible trade-off between accuracy and resource consumption.
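To make the idea of sketching concrete, here is a minimal LogLog-style distinct-count sketch in Python. This is a didactic illustration only, not the algorithm used in any Dynatrace product: production data structures such as HyperLogLog or ExaLogLog add bias correction and much denser register packing. All names in the snippet are our own.

```python
import hashlib

NUM_REGISTERS = 256   # more registers -> lower relative error
ALPHA = 0.39701       # asymptotic LogLog bias-correction constant

def _hash64(value: str) -> int:
    """Stable 64-bit hash of a string (SHA-256 truncated for illustration)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class DistinctSketch:
    def __init__(self) -> None:
        self.registers = [0] * NUM_REGISTERS

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h % NUM_REGISTERS    # low bits select a register
        rest = h // NUM_REGISTERS  # remaining bits feed the rank
        # rank = 1-based position of the lowest set bit (a geometric variable)
        rank = (rest & -rest).bit_length()
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "DistinctSketch") -> None:
        # Mergeability: the register-wise maximum combines two sketches.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        mean_rank = sum(self.registers) / NUM_REGISTERS
        return ALPHA * NUM_REGISTERS * 2.0 ** mean_rank

sketch = DistinctSketch()
for i in range(100_000):
    sketch.add(f"metric-{i}")
print(f"estimated distinct count: {sketch.estimate():.0f}")  # roughly 100,000
```

Note how the sketch compresses 100,000 distinct items into 256 small registers while still yielding a usable count estimate, and how merging via a register-wise maximum makes it commutative and idempotent, the same practical properties highlighted for ExaLogLog below.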

Related publications

A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks

Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real time. Stream processing frameworks facilitate scalable computing by distributing the application's execution across multiple machines. Despite performance being extensively studied, the measurement...

Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

| 18th ACM International Conference on Distributed and Event-Based Systems (DEBS'24) | 2024

Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are used inside microservices to continuously ...

Sören Henning, Wilhelm Hasselbring

| The Journal of Systems & Software | 2024

Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores

Parallel computing is very important to accelerate the performance of computing applications. Moreover, parallel applications are expected to continue executing in more dynamic environments and react to changing conditions. In this context, applying self-adaptation is a potential solution to achieve a higher level of autonomic abstractions and runt...

Adriano Vogel, Marco Danelutto, Massimo Torquati, Dalvan Griebler, Luiz Gustavo Fernandes

| The Journal of Supercomputing | 2024

ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale

This work introduces ExaLogLog, a new data structure for approximate distinct counting, which has the same practical properties as the popular HyperLogLog algorithm. It is commutative, idempotent, mergeable, reducible, has a constant-time insert operation, and supports distinct counts up to the exa-scale. At the same time, as theoretically derived ...

Otmar Ertl 

| Data Structures and Algorithms, arXiv:2402.13726 | 2024

High-level Stream Processing: A Complementary Analysis of Fault Recovery

Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software architectural style. Several software systems rely on stream processing to deliver scalable performance, where...

Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

| Distributed, Parallel, and Cluster Computing, arXiv:2405.07917 | 2024

JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets

The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets ch...

Otmar Ertl

| Data Structures and Algorithms, arXiv:2403.18682 | 2024
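The instability of modulo-based key-to-bucket mapping described in the JumpBackHash abstract above is easy to demonstrate. The following sketch shows only the naive modulo mapping and its problem, not the JumpBackHash algorithm itself:

```python
# Naive modulo mapping of hash values to buckets: changing the bucket
# count from 10 to 11 reassigns almost every key, although an ideal
# consistent mapping would move only ~1/11 of them.

def bucket(key_hash: int, num_buckets: int) -> int:
    return key_hash % num_buckets

hashes = range(10_000)  # stand-ins for uniformly distributed key hashes
moved = sum(1 for h in hashes if bucket(h, 10) != bucket(h, 11))
print(f"{moved / 10_000:.0%} of keys change buckets")  # prints "91% ..."
```

Almost all keys move because h mod 10 and h mod 11 agree only when h mod 110 falls below 10; consistent-hashing schemes are designed to keep that churn near the 1/11 minimum instead.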

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks

Distributed stream processing frameworks help build scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of modern stream processing frameworks. In contrast to other benchmarks, it focuses on use cases where s...

Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser

| ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering | 2024

A systematic mapping of performance in distributed stream processing systems

Several software systems are built upon stream processing architectures to process large amounts of data in near real-time. Today's distributed stream processing systems (DSPSs) spread the processing among multiple machines to provide scalable performance. However, high-performance and Quality of Service (QoS) in distributed stream processing are c...

Adriano Vogel, Sören Henning, Otmar Ertl, Rick Rabiser

| Euromicro Conference on Software Engineering and Advanced Applications | 2023

Benchmarking Stream Processing Frameworks for Large Scale Data Shuffling

Distributed stream processing frameworks help build scalable and reliable applications that perform transformations and aggregations on continuous data streams. We outline our ongoing research on designing a new benchmark for distributed stream processing frameworks. In contrast to other benchmarks, it focuses on use cases where stream processin...

Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser

| Softwaretechnik-Trends | 2023

Cardinality Estimation Adaptive Cuckoo Filters (CE-ACF): Approximate Membership Check and Distinct Query Count for High-Speed Network Monitoring

In network monitoring applications, it is often beneficial to employ a fast approximate set-membership filter to check if a given packet belongs to a monitored flow. Recent adaptive filter designs, such as the Adaptive Cuckoo Filter, are especially promising for such use cases as they adapt fingerprints to eliminate recurring false positives. In ma...

Pedro Reviriego, Jim Apple, Alvaro Alonso, Otmar Ertl, Niv Dayan

| IEEE/ACM Transactions on Networking | 2023
