Spring Batch Intro

In this article, we’ll introduce Spring Batch, a framework designed to process large amounts of data in batches.

There are three main things going on in a batch process: we read data from a source, perform some sort of processing on it, and finally write that data somewhere. This happens in what’s called a job.

Spring Batch Job

Spring Batch jobs enable us to write code that will perform a certain task. For example, we could have a job that sends newsletter emails to all users in a database or a job that reads data from a CSV file and writes it to a database.

We create jobs using the JobBuilderFactory. We define the name of the job by passing the name to the get method. We append a random number to the name so that every run uses a new job name, which lets us re-run the app from scratch if the job fails. In our example, the job has one step.
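To illustrate why a unique name helps, here is a minimal plain-Java sketch with no Spring dependency. JobRegistrySketch, launch, and uniqueName are hypothetical names, not Spring Batch API. The sketch mimics a single rule: a job that has already completed under a given identity is not run again under that identity. In Spring Batch itself, the identity also includes the job's identifying parameters, which this sketch leaves out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the bookkeeping a job repository does; not Spring Batch API.
class JobRegistrySketch {
    private final Set<String> completed = new HashSet<>();

    // Runs the task unless a job with the same name has already completed.
    boolean launch(String jobName, Runnable task) {
        if (completed.contains(jobName)) {
            return false; // same identity already finished, so nothing is run
        }
        task.run();
        completed.add(jobName);
        return true;
    }

    // Appends a random suffix so that each application start gets a fresh job name.
    static String uniqueName(String baseName) {
        return baseName + "-" + ThreadLocalRandom.current().nextInt(1_000_000);
    }
}
```

Launching the same name twice runs the task only once, while a name produced by uniqueName will almost always be new and therefore run.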

Spring Batch Step

A Spring Batch job can have one or more steps. A step is a specific phase of a job: the first step of a given job could do one thing, and the next step could do another. These steps can run in sequence, i.e. one after the other, or in parallel.

There are two main kinds of steps you’ll see: a tasklet-based step and a chunk-based step. A chunk-based step will, as the name suggests, process the data in chunks, for example 100 or 200 items at a time. A tasklet-based step, on the other hand, performs a single task within the step.
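As a rough illustration of what "in chunks" means, the plain-Java sketch below groups a list of items into chunks of a fixed size. ChunkSplitSketch is a hypothetical helper, not part of Spring Batch. Spring Batch builds each chunk by reading one item at a time rather than slicing a list, so this only shows the resulting grouping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that shows how items are grouped by chunk size.
class ChunkSplitSketch {
    // Splits items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> split(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

With 250 items and a chunk size of 100, this yields three chunks of 100, 100, and 50 items.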

In a step, we use a reader to read data from a given source. We then use a processor to do any business logic we may have. Then finally, we write that data to a destination.

We create a step using a StepBuilderFactory. Similar to creating a job, we set the step name using the get method. Since this is a chunk-based step, we need to define the chunk size and the input and output item types. A step could, for instance, read items of type Order and write them as a different type such as ShippedOrder. In our example, the data that is read and written is of the same type.

We then use a reader to read the data. That data then goes into the processor and finally makes its way to the writer. These are the main components of a step; of the three, the processor is optional.
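The flow described above can be sketched in plain Java as a loop that reads up to a chunk's worth of items, processes them, and writes them in one call. SketchReader, SketchProcessor, SketchWriter, and ChunkStepSketch are simplified stand-ins invented for this illustration, not the Spring Batch interfaces, and the real framework adds transactions, restartability, and error handling on top.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the three roles; these are not the Spring Batch interfaces.
interface SketchReader<T> { T read(); }                  // returns null when the input is exhausted
interface SketchProcessor<I, O> { O process(I item); }
interface SketchWriter<T> { void write(List<T> chunk); }

class ChunkStepSketch {
    // Runs a chunk-oriented loop and returns the total number of items written.
    static <I, O> int run(SketchReader<I> reader, SketchProcessor<I, O> processor,
                          SketchWriter<O> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int written = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.read();
                if (item == null) { exhausted = true; break; }
                inputs.add(item);
            }
            // 2. Process each item that was read.
            List<O> outputs = new ArrayList<>();
            for (I item : inputs) {
                outputs.add(processor.process(item));
            }
            // 3. Write the whole chunk in one call.
            if (!outputs.isEmpty()) {
                writer.write(outputs);
                written += outputs.size();
            }
        }
        return written;
    }
}
```

The input type I and output type O may differ, which mirrors a step that reads one type and writes another. For a step without a processor, an identity function such as item -> item can be passed in this sketch.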

Spring Batch Reader

A Spring Batch reader is what enables us to input data into our process. This data can come from different kinds of sources, and Spring Batch has readers for the majority of data sources, whether that is an SQL database, MongoDB, or even a CSV file.

Once we identify the item reader we need, we can then create a bean that returns an ItemReader.

In the example, we read the data using a JdbcCursorItemReader with an SQL query that reads all data from the SHIPPED_ORDER table.
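A cursor-style reader hands out one item per call and signals the end of the data by returning null. The plain-Java sketch below shows that contract using an in-memory list in place of the SHIPPED_ORDER query results. RowReaderSketch and ListCursorReaderSketch are hypothetical names, not the Spring Batch ItemReader or JdbcCursorItemReader.

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for an item reader; not the Spring Batch ItemReader interface.
interface RowReaderSketch<T> {
    T read(); // returns the next item, or null when there is nothing left
}

// Hypothetical reader that walks an in-memory list one item at a time,
// the way a cursor walks the rows of a query result.
class ListCursorReaderSketch<T> implements RowReaderSketch<T> {
    private final Iterator<T> cursor;

    ListCursorReaderSketch(List<T> rows) {
        this.cursor = rows.iterator();
    }

    @Override
    public T read() {
        return cursor.hasNext() ? cursor.next() : null;
    }
}
```

Calling read() repeatedly returns each row in order and then null on every call after the last row.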

Spring Batch Processor

In a processor, we can perform all sorts of business logic. A processor needs to implement the ItemProcessor interface and its process method. The process method takes an item as an argument, and this has to be the same type as the one the item reader returns.

In the example, the item processor only logs each item. We can add more logic if needed, but logging is enough for this example.
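A logging-only processor can be sketched in plain Java as follows. ItemProcessorSketch and LoggingProcessorSketch are hypothetical stand-ins, not the Spring Batch ItemProcessor interface, and the item type String is an assumption chosen to keep the sketch short. Log lines are collected in a list instead of going to a logger so that the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an item processor; not the Spring Batch ItemProcessor interface.
interface ItemProcessorSketch<I, O> {
    O process(I item);
}

// Hypothetical processor that records a log line and passes the item through unchanged.
class LoggingProcessorSketch implements ItemProcessorSketch<String, String> {
    final List<String> log = new ArrayList<>();

    @Override
    public String process(String order) {
        log.add("Processing " + order);
        return order; // input and output types are the same here
    }
}
```

The input type parameter has to match what the reader produces, and the output type has to match what the writer accepts.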

Spring Batch Writer

Finally, we have an item writer. An item writer takes in a list of items, and it's up to us where we write that data.

We can persist that data into a table, write it to a file or send it to a Kafka topic. For the sake of an example, we’re just logging.
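The plain-Java sketch below shows a writer that receives a whole list of items in one call and simply logs them. ItemWriterSketch and LoggingWriterSketch are hypothetical stand-ins, not the Spring Batch ItemWriter interface, and the log is kept in a list purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an item writer; not the Spring Batch ItemWriter interface.
interface ItemWriterSketch<T> {
    void write(List<? extends T> items);
}

// Hypothetical writer that logs the chunk it receives instead of persisting it.
class LoggingWriterSketch implements ItemWriterSketch<String> {
    final List<String> log = new ArrayList<>();

    @Override
    public void write(List<? extends String> items) {
        log.add("Writing " + items.size() + " items");
        for (String item : items) {
            log.add("  " + item);
        }
    }
}
```

Swapping the logging for a database insert, a file append, or a message send would change only the body of write.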

Step Listener

Together with a reader, processor, and writer, we can also have a step listener. The listener needs to implement StepExecutionListener and, as seen in the example, that gives us the ability to do something before and after the step.
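The before and after callbacks can be sketched in plain Java like this. StepListenerSketch, RecordingStepListenerSketch, and ListenedStepSketch are hypothetical stand-ins, not the Spring Batch StepExecutionListener. The real callbacks receive a StepExecution object rather than a step name, and the step name used in the usage note is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a step listener; not the Spring Batch StepExecutionListener interface.
interface StepListenerSketch {
    void beforeStep(String stepName);
    void afterStep(String stepName);
}

// Hypothetical listener that records when a step starts and finishes.
class RecordingStepListenerSketch implements StepListenerSketch {
    final List<String> events = new ArrayList<>();

    @Override
    public void beforeStep(String stepName) { events.add("before " + stepName); }

    @Override
    public void afterStep(String stepName) { events.add("after " + stepName); }
}

class ListenedStepSketch {
    // Calls the listener around the step body; afterStep runs even if the body throws.
    static void run(String stepName, Runnable body, StepListenerSketch listener) {
        listener.beforeStep(stepName);
        try {
            body.run();
        } finally {
            listener.afterStep(stepName);
        }
    }
}
```

Running a step named shipOrders through this sketch records "before shipOrders", then whatever the body does, then "after shipOrders".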

Job Listener

When creating our job bean, we can add a listener using the listener() method. Similar to step listeners, a job listener lets us do something before and after the job.
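A job listener wraps the whole job rather than a single step. The plain-Java sketch below runs a list of steps in sequence between the two callbacks and reports whether the job succeeded. JobListenerSketch and ListenedJobSketch are hypothetical stand-ins, not the Spring Batch JobExecutionListener, whose callbacks receive a JobExecution object instead of a name and a flag.

```java
import java.util.List;

// Simplified stand-in for a job listener; not the Spring Batch JobExecutionListener interface.
interface JobListenerSketch {
    void beforeJob(String jobName);
    void afterJob(String jobName, boolean succeeded);
}

class ListenedJobSketch {
    // Runs the steps in order between the two listener callbacks
    // and returns whether every step finished without an exception.
    static boolean run(String jobName, List<Runnable> steps, JobListenerSketch listener) {
        listener.beforeJob(jobName);
        boolean succeeded = true;
        try {
            for (Runnable step : steps) {
                step.run();
            }
        } catch (RuntimeException e) {
            succeeded = false; // the first failing step stops the remaining steps
        }
        listener.afterJob(jobName, succeeded);
        return succeeded;
    }
}
```

In this sketch afterJob is always called, so it is a natural place for summary logging or cleanup whether the job succeeded or failed.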

Conclusion

There we have it. This was an introduction to Spring Batch.