Streaming, Batch and Request-Response Data Processing Modes

The Nussknacker Blog
16 min read · Jun 19, 2024


by Mateusz Słabek

Processing modes evaluated in the context of use cases

For modern organisations, a data-driven approach to decision making has become the standard. Ongoing digitalization increases the volume of information that can be analysed, providing a more complete picture to act as the basis of business decisions. Not only is the volume of data increasing — the nature of these data sources is evolving, enabling new ways of producing and consuming data. Collecting information and processing it periodically may have been a necessity some years ago, but now, thanks to advancements in streaming technologies, collecting and processing data in real time has become incredibly attractive.

In light of this, it’s important for organisations to reconsider how they should process their data. We call the ways in which data is processed processing modes, and we recognize three of them: stream processing, batch processing and request-response. When choosing a mode, you have to consider the context of a specific use case, the technologies already used in your existing architecture, the type of data sources involved and the value gained through using the chosen mode. To better understand how these modes work and how you would choose one, we’ll start with a very down-to-earth example to gain some intuition and then approach each mode in a more precise manner.

The story of 3 interns

Let’s imagine a fictional law firm. We’ll follow the jobs of 3 interns, each one working in a different way — each in a different processing mode. Before they take on legal tasks, they need to do some mundane work, and since this law firm does not believe in digitalization, this will involve manual tasks that would normally be facilitated by IT.

Request-response example

As our first intern gets into the office on the first day of the job, the boss leads him to his desk. To the side of the desk there are multiple cabinets filled with files, all neatly catalogued. “Your job is going to be serving our clients’ files. Other lawyers are going to come up to your desk and request some information about our clients’ current cases. Time is money, so you’re going to have to work quickly. The files are sorted and catalogued alphabetically based on the client’s name, so you’ll have no trouble finding what you need. At the end of each day, a guy with new files will come in and you’ll have to put them in the correct place, so that everything stays nice and sorted.”

The intern sits down and soon, lawyers come in. The job is pretty easy at first. Around ten people per hour come up to his desk, tell him what client information they need, he finds it, copies it and hands over the copy. At the end of the day new files come in and he puts them in their correct place. This is what allows him to work fast — he does this sorting ahead of time, so the next day when he’s looking for the file of Johnny Kowalski, he doesn’t have to look in every cabinet — just inside the cabinet for clients with last names beginning with ‘K’. In technical terms, we’d say this is using indexing.
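
To make the cataloguing trick concrete, here is a minimal Python sketch of the same idea (the client names and file contents are made up for illustration): instead of scanning every file, we build an index ahead of time and only search one “cabinet”.

```python
# A minimal sketch of the intern's "indexing" trick (illustrative, made-up data).
from collections import defaultdict

files = [
    {"client": "Johnny Kowalski", "case": "Contract dispute"},
    {"client": "Anna Nowak", "case": "Patent claim"},
    {"client": "Karol Kwiatkowski", "case": "Tax audit"},
]

# Full scan: look at every file until we find the right client.
def find_by_scan(client_name):
    return [f for f in files if f["client"] == client_name]

# Index: group files by the first letter of the last name, once, ahead of time.
index = defaultdict(list)
for f in files:
    last_name = f["client"].split()[-1]
    index[last_name[0]].append(f)

# Lookup: only search the one "cabinet" for that letter.
def find_by_index(client_name):
    cabinet = index[client_name.split()[-1][0]]
    return [f for f in cabinet if f["client"] == client_name]

print(find_by_index("Johnny Kowalski"))
```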

After a couple of days, an oil spill happens, and since oil companies are a big part of the firm’s clientele, the lawyers suddenly have a lot of work to do. Traffic at our intern’s desk increases tenfold and he can’t keep up. A constant queue forms near the desk and work basically grinds to a halt. Lawyers can’t really continue their work without their clients’ current information, so they are blocked. This is because this system works synchronously — technically, it’s doing synchronous processing.

The next week, management acknowledges there is a problem and finds the bottleneck — the desk. This is a problem of scalability. The system cannot cope with the increased load, so 4 more interns get transferred to desk duty. In technical terms, this is what’s called horizontal scaling, or scaling out. Vertical scaling / scaling up (increasing the performance of a single intern) would not be possible in this case.

After scaling out, the queues get shorter, but there are still some problems. The interns keep running into each other as they walk to the cabinets. The interns themselves quickly solve this by dividing the cabinets between themselves equally. Each intern sits at a separate desk and each desk has a sign indicating the range of names it serves. This is called partitioning.
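
A rough sketch of that range-based partitioning, with made-up desk boundaries, could look like this:

```python
# Each desk serves a contiguous range of last-name initials, so two interns
# never compete for the same cabinet (the ranges below are invented).
desks = {
    "Desk 1": ("A", "E"),
    "Desk 2": ("F", "K"),
    "Desk 3": ("L", "P"),
    "Desk 4": ("Q", "T"),
    "Desk 5": ("U", "Z"),
}

def desk_for(last_name: str) -> str:
    initial = last_name[0].upper()
    for desk, (low, high) in desks.items():
        if low <= initial <= high:
            return desk
    raise ValueError(f"No desk serves initial {initial!r}")

print(desk_for("Kowalski"))  # -> Desk 2
```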

Batch example

The second intern is given a different job. He gets to his desk and as he sits down, his boss slaps a stack of papers on the table with a thud. “These are the lawsuits that were filed against some of our clients. I need you to calculate the total amount of damages sought in these lawsuits per client.” The task is dull but simple. There are a lot of papers, but at least the entire task is known upfront — in technical terms you could say the input data is a batch — it’s bounded.

The intern gets paper and a calculator and gets to work. He goes through the lawsuits one by one and updates his notepad — if he encounters a client he hasn’t seen before, he writes down the client’s name with the damages sought in the current lawsuit, and if he encounters a client he has previously noted down, he updates that client’s damages amount.

Our batching intern’s need to keep track of information from the previous lawsuits he looked at is what makes this stateful processing. The intern cannot know the total sum for any client by looking at a single lawsuit — he needs to keep the amount in memory (in his notebook) across all lawsuits. This is in contrast to the request-response intern at the desk, who was processing each request independently of all other requests — that was stateless processing. Also, this specific type of operation is called aggregation, which means combining multiple values into one result.
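
If you translated the notebook into code, a minimal sketch of this stateful aggregation (with made-up clients and amounts) might look like this:

```python
# The intern's notebook as a dictionary: state kept across all lawsuits.
lawsuits = [
    {"client": "Oilcorp", "damages": 1_000_000},
    {"client": "Kowalski & Sons", "damages": 50_000},
    {"client": "Oilcorp", "damages": 250_000},
]

totals = {}  # the "notebook": client -> total damages sought so far
for lawsuit in lawsuits:
    client = lawsuit["client"]
    totals[client] = totals.get(client, 0) + lawsuit["damages"]

print(totals)  # {'Oilcorp': 1250000, 'Kowalski & Sons': 50000}
```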

Some days later, after multiple batches have been processed, management gives our intern lawsuits against the oil company clients, which come in on giant document trolleys. The volume of input data has risen and the one-intern batch system cannot sustain its performance under this load, so 4 more interns get assigned to the task. The current setup is pretty easy to scale out. The interns divide the work between themselves and at the end of each batch combine their aggregated results. Note that similar aggregation operations in tools like SQL or Nussknacker are defined declaratively — the user doesn’t specify how the aggregation should be done, just what the result should be. This allows these tools to optimise and scale the operation under the hood and enables easy changes in configuration.

Streaming example

Our third intern works in a separate department of the law firm, one which works on damage control and needs to respond to events very quickly. The intern is given a job that is basically the same operation as the batching intern’s, but with a catch. The lawsuits are coming in as a stream, tens per hour, and the damages sought need to be calculated and reported every hour. Each report has to contain all of the lawsuits that were registered at the front desk during that hour. The secretary at the front desk attaches a note to each lawsuit with the exact time it was submitted. The reports need to be delivered as soon as possible to the damage control team, so they can make decisions based on these real-time, data-driven insights. The specified time range for each report covers full hours with no overlap — in streaming terminology this is referred to as a tumbling window with a size of 1 hour.
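
A tiny sketch of how events could be assigned to 1-hour tumbling windows, assuming the submission timestamps are given in seconds, might look like this:

```python
# Assign each lawsuit to a 1-hour tumbling window based on its submission time.
WINDOW_SIZE = 3600  # 1 hour, in seconds

def window_for(submission_ts: int) -> tuple[int, int]:
    start = (submission_ts // WINDOW_SIZE) * WINDOW_SIZE
    return start, start + WINDOW_SIZE  # the window is [start, end)

print(window_for(35999))  # (32400, 36000) -> the "9:00-9:59" window
print(window_for(36000))  # (36000, 39600) -> the "10:00-10:59" window
```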

In our example a problem quickly becomes apparent. Some lawsuits arrive even 10 minutes late, because some time passes between the submission of a lawsuit and the lawsuit reaching our intern’s desk. For example, when a lawsuit gets submitted at 10.59, it will definitely not be processed within the 10.00–10.59 time window. This is a problem of lateness, and it boils down to a compromise between completeness and latency. Assuming that all lawsuits come in the order they were submitted, the intern can simply wait for late lawsuits for 5–10 minutes after the window closes.

However, soon more and more lawsuits come in every hour. To handle that, more front desks for lawsuit submission are added to distribute the increased load. While the desks manage the increased load most of the time, sometimes the load spikes and the affected desks basically halt for a while. Management decides that the completeness of data is critical, even if they have to wait a long time for it. The intern thinks up a solution. He keeps track of the latest submission time he has seen from each secretary (each partition) — technically speaking, this is something like watermarks in stream processing engines. He produces his aggregation report only after each secretary hands over a lawsuit that was submitted after the time window closed. He doesn’t produce a report for the 10.00–10.59 time window until he knows he has all the lawsuits, and he can only be sure of that by deducing it from the fact that each secretary brought at least one lawsuit submitted after 10.59.
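
A rough, simplified sketch of the intern’s watermark bookkeeping (assuming, as in the story, that each secretary hands over lawsuits in submission order; timestamps are in seconds) could look like this:

```python
# Track the latest submission time seen per secretary (partition). The
# "watermark" is the minimum of these: the time every partition has passed.
latest_seen = {}  # secretary -> latest submission timestamp seen

def observe(secretary: str, submission_ts: int) -> None:
    latest_seen[secretary] = max(latest_seen.get(secretary, 0), submission_ts)

def watermark() -> int:
    return min(latest_seen.values()) if latest_seen else 0

def can_close_window(window_end: int) -> bool:
    return watermark() >= window_end

observe("secretary A", 39_650)   # submitted after 10:59 (window end = 39_600)
observe("secretary B", 39_500)   # still inside the 10:00-10:59 window
print(can_close_window(39_600))  # False: secretary B may still bring late lawsuits
observe("secretary B", 39_700)
print(can_close_window(39_600))  # True: every partition has moved past 10:59
```

Real stream processing engines do this bookkeeping for you, but the trade-off it encodes between completeness and latency is the same.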

Does this seem complex yet? Well, stream processing at the volumes and velocity that Big-Data systems require is even harder. Some secretaries may not have any lawsuits submitted for a long time, so no watermarks come from them, essentially blocking the closing of a window (if you’re interested, this is called the idle partitions problem). We also need to scale out the aggregating intern’s work to multiple people dynamically and figure out how to divide the lawsuits between them on the fly while avoiding data skew. Additionally, this example was much simpler than actual Big-Data stream processing because we assumed the lawsuits are brought in for processing in the order they were submitted — in reality, out-of-order data is very common and it’s not easy to handle. The complexity is daunting.

Technical explanation

The problem with down-to-earth examples for advanced technical topics like processing modes is that it’s simply not possible to convey everything. Sure, you could look at the interns as CPU cores or threads and extend the analogy even further, but there are limits to this. Conveying things like the cost of operations, fault tolerance or the sheer volume and scale that makes Big Data big is not easy. Having reached the limits of the analogy, let’s get technical.

Request-response mode

Request-response is often used to describe the way systems communicate in the context of network protocols; however, in the context of data processing we’re using this term to describe a processing mode. The way it works can be boiled down to this:

  1. A user or application interacts with the processing system by sending a request and waits for a response.
  2. The request-response service processes the request and sends the results in a response.

And that’s basically it. The fact that the user is waiting for the response is why it is classified as synchronous (for a comparison of processing modes in the context of synchronous and asynchronous processing, check out Arek Burdach’s blog post about the concept of bridge application). In our example, this is why lawyers were waiting for clients’ data at the desk in a queue.
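
For a concrete, if simplified, picture: here is a minimal sketch of a synchronous request-response service built only with Python’s standard library. The client data and URL layout are made up for illustration; the point is that the caller blocks until the response arrives.

```python
# A minimal sketch of a synchronous request-response service (made-up data).
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

CLIENTS = {"kowalski": {"name": "Johnny Kowalski", "open_cases": 2}}

class ClientFileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /clients/kowalski -> the caller waits until we respond
        key = self.path.rstrip("/").split("/")[-1].lower()
        client = CLIENTS.get(key)
        body = json.dumps(client or {"error": "not found"}).encode()
        self.send_response(200 if client else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ClientFileHandler).serve_forever()
```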

In the current debate about Big-Data processing modes, request-response is usually omitted, as the debate focuses on comparing streaming and batch processing, and request-response is not suitable out-of-the-box for the high-volume stateful computations that batch and stream processing technologies provide. Due to the focus on quickly serving pre-computed results, custom request-response services can generally be implemented without specialised tooling like processing engines or message queues. Since the service itself is stateless, it can be easily scaled, as seen in the client data serving desks example. Keep in mind, though, that processing modes are not absolute categories — if the trade-off is acceptable in your use case, you could implement stateful computations or other functionalities not typical for the mode.

The request-response style of processing is the way user-facing web services are implemented. When loading a web application, the response time is vital to user experience. Of course, sometimes websites can make the request in the background and wait for the response in a way that hides this delay from the user, but in other cases the user sees what is essentially a loading screen until the response from the slowest service arrives. This is why a typical service within this mode will focus on response time above all else. In the context of Big Data, an ideal use case for request-response services is serving the results of batch and streaming computations, or acting as a convenient interface to these pipelines for other purposes.

Batch processing

Batch processing refers to processing a bounded set of data (a batch) and producing output for the entire batch. It is also a very old form of computing. This is the way data was processed even in the 1890 US Census, way before digital computers were invented, as Martin Kleppmann noted in his famous book “Designing Data-Intensive Applications”. And if you think about it intuitively, it is the most natural form of processing for stateful operations — if someone wanted you to sort some documents, you would expect them to give you the entirety of those documents up front.

The long history and relative simplicity of batch processing does not mean, however, that it can in any way be classified as legacy. To keep up with the increasing volume of data that today’s businesses require, there is a great need to parallelize the processing workload — dividing the work between multiple processing units (like CPU cores). In hardware terms, this is motivated by the fact that in the last decades CPU single-core performance has stagnated, while the number of cores per CPU increases steadily. It’s also related to advances in networking, allowing the load to be delegated to other machines. Processing engines like Apache Spark or Apache Flink allow batch workloads to scale to today’s Big-Data requirements. Using their declarative APIs, we can keep processing logic separated from operational concerns. That way, changes in these areas remain isolated and processing engines can optimise their performance under the hood without requiring any action from their users.
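
As an illustration of what a declarative API looks like, here is a hedged sketch of the damages-per-client aggregation in PySpark (assuming a local PySpark installation; the data and column names are made up). We only declare what the result should be; how the work is parallelized is left to the engine.

```python
# A sketch of the damages-per-client aggregation expressed declaratively in
# PySpark (illustrative data; requires the pyspark package).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("damages-per-client").getOrCreate()

lawsuits = spark.createDataFrame(
    [("Oilcorp", 1_000_000), ("Kowalski & Sons", 50_000), ("Oilcorp", 250_000)],
    ["client", "damages"],
)

totals = lawsuits.groupBy("client").agg(F.sum("damages").alias("total_damages"))
totals.show()
```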

Batch jobs typically run at scheduled intervals. If their input volume is high, the processing may take hours, or even days. This may seem suboptimal, but for big volumes, even after lots of optimization, scaling of the system and use of specialised technologies, processing will still take a long time, since there’s no way around physical limitations. To mitigate this, if the processing results are needed for day-to-day business operations, these workloads may be scheduled to run outside of operating hours. Some use cases are perfectly suited to batch processing because they need to be computed at longer intervals, like generating periodic reports, invoicing at the end of the month or making backups.

An important advantage of batching, especially when compared to streaming, is the cost of operations. The batch processing ecosystem is very mature, with battle-tested, scalable storage and processing technologies. Finding experienced engineers or training new ones to operate these systems should be relatively easy and cheap. From the technical side, having all input data available upfront reduces the number of complex cases to handle and makes it possible to know how many resources an operation needs. The business logic part of processing is also relatively easy to reason about. These arguments make batch processing attractive for use cases where there isn’t much value to be gained from real-time insights or from window-based intermediate results. In some situations, like operating within legacy architectures, use cases which are suitable for streaming may not be economically justifiable to stream when compared to processing them in short-interval batches.

Stream processing

Stream processing is a method of processing data where input ingestion and output production are continuous. Input data can be unbounded and potentially infinite, so a streaming pipeline does not produce final results the way a batch workload does. The pipeline “finishes” processing when it is stopped. Arguably, this resembles how business processes are often conducted in today’s world — not at scheduled intervals, but as a constant, reactive pipeline.

Streaming technologies are sometimes described as the holy grail of Big-Data processing. From a historical perspective we know that in computing there are no silver bullets, so to counterbalance the hype, let’s start with the downsides. In the end, they can all be boiled down to complexity. In the request-response and batch examples earlier in this article, much of the content described supporting topics that are not essential to those processing modes. That’s because there’s not a lot of essential complexity to them — they are pretty simple. In the streaming example, however, all of the introduced concepts were at least partly essential to streaming, and we were only scratching the surface.

Window aggregations, watermarks, lateness and idle sources are all non-trivial concepts that need to be understood and properly handled. This means an increased cost of operations and engineer training. It is also not possible to push all of this complexity onto the engineering side, because some of these concepts are business-critical. For example, deciding how to approach the compromise between completeness and latency stemming from late events is a business decision. This makes stream processing harder to reason about for data analysts.

However, the streaming ecosystem has been developing at a spectacular pace during the last decade and this complexity is becoming less and less of a problem. Streaming technologies are becoming more robust in terms of fault tolerance and scalability, and their interfaces are becoming more refined. Companies like Confluent keep delivering fantastic learning resources and actively cultivate a healthy developer community. Certain complexity may never be reduced, but the ecosystem’s actors keep making progress in simplifying what can be simplified.

Streaming comes with an incredible set of advantages. Not having to wait for a batch to finish while still having complex, stateful computations done on the fly can provide a lot of business value. This makes streaming shine in cases where the freshness of data is critical. For example, in the case of fraud detection in the telecommunications industry, the window of time from detecting an anomaly to preventing it is extremely short (for an in-depth fraud detection dive, check out Łukasz Ciolecki’s blog post series). The same can be said for alerting systems based on IoT devices (check out an example in the Nussknacker demo application). Another use case perfectly suited to streaming is real-time recommendation systems. These examples rely on low processing latency — the lower it is, the more business value the system can provide.

Latencies in request-response, batch and stream processing

The continuous nature of streams allows for gathering insights from data soon after it is produced. This is usually characterised as low latency. In the literature, when describing the performance of systems, a distinction is usually made between latency and response time. However, for our purposes of comparing different modes, we refer to latency as the amount of time between an event occurring and the results of its processing being available. For request-response, the event is the sending of a request and the result is the receiving of the response. This may not be academically “correct”, but it’s useful for making the point.

Streaming jobs can have latencies similar to request-response services, but again — this is not absolute. If a streaming pipeline aggregates events in a 10-second tumbling window with an allowed lateness of 5 seconds, you could say the latency for the results inside the window is up to 15 seconds. For a sliding window or a session window it will be a completely different story. However, stateless operations on streams impose very little latency, and a pipeline composed of them could be as low-latency as a user-facing, website-serving request-response service. On the other hand, a request-response service making multiple queries to storage systems may take a very long time.

The same can be said about batch — if the batch size is small enough, it may take a very short time to process, thus having low latency. Some Big-Data technologies, like Apache Spark, implement stream processing using micro-batch processing.
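
As a crude illustration of the micro-batch idea only (this is not Apache Spark’s actual implementation, and the callback names are invented for this sketch): collect incoming events for a short interval, then run a small batch computation over that slice, and repeat.

```python
# A toy sketch of micro-batching: turn a stream into a series of tiny batches.
import time

def micro_batch_loop(poll_events, process_batch, interval_s=1.0):
    """poll_events: returns the events available right now (possibly an empty list).
    process_batch: a batch-style function run over everything collected in one interval."""
    while True:
        deadline = time.monotonic() + interval_s
        batch = []
        while time.monotonic() < deadline:
            batch.extend(poll_events())
            time.sleep(0.05)  # avoid a tight polling loop in this toy example
        if batch:
            process_batch(batch)
```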

Cheat sheet: request-response, batch, stream processing use cases

Analysing each processing mode reveals some clear guiding principles on how to choose the mode for a particular use case:

  1. Use streaming if the value you gain from low-latency insights outweighs the complexity overhead.
  2. Use batch if you operate only on historical data or are not discouraged by having to wait for results of scheduled jobs.
  3. Use request-response if you need to serve pre-processed results or your problem can be reduced to simple, stateless operations.

This may be a stretch, but from the perspective of principles like the rule of least power or KISS, you could even say that:

  1. Use streaming if you need its features.
  2. Use batch if you don’t need the features of streaming.
  3. Use request-response if you don’t need the features of batch.

Organising your data processing infrastructure around one type of processing would be great, but it’s either impossible or would introduce a lot of unnecessary complexity. Generally, a big organisation needs to employ multiple or all processing modes, since they complement each other perfectly. The bad news is that it can be costly to integrate them all. The good news is that the processing ecosystem keeps making it simpler. Well-thought-out architecture models like the lambda or kappa architecture keep evolving, and organisations that employ these models share their experiences and help refine them. Big-Data technologies are simplifying their interfaces to allow seamless transitions between streaming and batch modes. You could see this as a renaissance of Big-Data processing, and it’s hard not to remain hopeful for the future.

Mateusz Słabek @ nussknacker.io
