Understanding CPU and I/O bound for asynchronous operations
Ever wondered what the cool kids mean when they talk about CPU-bound and I/O-bound thread pools? Read on to learn more.
For a long time, I’ve claimed that the most complex part of software development is anything related to concurrency. Parallel operations are simply very hard to keep track of in our single-track minds. This is why we try to create abstractions in software development to deal with this challenge.
As developers writing user-facing applications, we’re taught that heavy and time-consuming operations should happen on a background thread and that we must only do short-lived and lightweight operations on the main thread. You might even have heard about thread pools (or coroutine dispatchers, or RxJava schedulers) and the concepts of I/O-bound vs CPU-bound. My experience is that even if developers have heard about these concepts, few have a full understanding of what they mean.
In this post, I’ll try to explain the differences between these two concepts and why it is important to understand them when it comes to client applications.
Different thread pools
On Android, popular libraries usually offer thread-pool-based concurrency abstractions. In Kotlin Coroutines, you specify a dispatcher which is basically a pool of Thread objects that will be used when it is time to execute. With RxJava, you pass a Scheduler to the subscribeOn() and observeOn() methods to specify where things will run.
Both of these libraries have dedicated APIs for scheduling I/O work, separate from other types of operations, but why is this the case? Why don’t we use a single thread pool for all background operations? After all, the operating system schedules all of these threads the same way.
The reason behind this is how an I/O operation behaves compared to one that is more CPU intensive.
Reading and writing data from a file on the local filesystem or from a socket over the network involves a lot of waiting, at least if you’re a CPU. The CPU will have to wait until the underlying hardware has delivered the data since it is not immediately available. Any I/O operation that reads or writes something that is not stored in RAM will cause what is commonly called IOwait. It happens when a thread makes a blocking system call: the operating system pauses the execution of that thread until the data is available or has been successfully transmitted.
Think of this as when you’re waiting for Google to announce the dates for the next Google I/O conference. You can’t proceed with booking the Airbnb and flights until that is announced. Until you get that notification you’ll be free to spend your time reading about everything that is wrong with the iPhone XS. Planning for Google I/O involves a lot of IOwait.
When the operating system encounters an IOwait, it pauses the execution of that thread and is free to pick up another thread that is ready to run. Theoretically, an operating system can have an unlimited number of threads that are sleeping due to IOwait without the user noticing any difference (in practice, there are other limits to consider, like available RAM and time spent context switching, but that is outside the scope of this post).
The important thing to remember here is that a thread dedicated to I/O work (reading or writing to a file, database, or network) will spend most of its time sleeping while waiting for other hardware systems to complete their work. Most applications read and write a lot of data, and with blocking I/O each operation in flight occupies its own thread.
If you have too few threads available for I/O work, the performance of your application will suffer: a new I/O operation can’t even get started until some other I/O operation has completed and its thread has gone back to the thread pool. This means that the upper bound for your pool of threads doing I/O operations should be fairly large.
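To make the effect of the pool size concrete, here is a minimal sketch in plain Java. It assumes that Thread.sleep() is a fair stand-in for a blocking read or write, and the class and method names (IoPoolSizeDemo, simulatedBlockingIo, elapsedMillis) are made up for this illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IoPoolSizeDemo {
    // Assumption: sleeping stands in for a thread parked in a blocking I/O call.
    static void simulatedBlockingIo(long waitMillis) {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `tasks` simulated I/O calls on a pool of `poolSize` threads and
    // returns the wall-clock time it took for all of them to finish.
    static long elapsedMillis(int poolSize, int tasks, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            long start = System.nanoTime();
            CompletableFuture<?>[] pending = new CompletableFuture<?>[tasks];
            for (int i = 0; i < tasks; i++) {
                pending[i] = CompletableFuture.runAsync(
                        () -> simulatedBlockingIo(waitMillis), pool);
            }
            // Wait until every simulated operation has completed.
            CompletableFuture.allOf(pending).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Same work, different pool sizes: the small pool forces tasks to queue.
        System.out.println("2 threads: " + elapsedMillis(2, 8, 50) + " ms");
        System.out.println("8 threads: " + elapsedMillis(8, 8, 50) + " ms");
    }
}
```

With only two threads, the eight waits have to run in four batches. With eight threads they all overlap, even though none of them uses the CPU while waiting.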
Most of the background work an application will be doing is usually related to reading or writing from a file, a database, or the network, but some operations that are not doing any I/O are still heavy enough to require a background thread so that they won’t block our main thread. This can be work that requires a significant amount of processing, like blurring an image, compressing or decompressing in-memory data, or running the AI code for a game.
Let’s take a naive (and somewhat unrealistic) example of sorting a HUGE array of text strings. All the data is ready for the CPU immediately as it is stored in RAM. There is no IOwait happening, but the sorting still takes long enough to have a significant impact if it ran on the main thread. We run this on a background thread where it can do its work and deliver a result back to the main thread once it is done.
Now imagine that your application suddenly has 4 different arrays that need to be sorted at the same time. The CPU in this case has 4 cores, which means that the hardware can execute 4 threads in parallel. What happens if all of these cores are suddenly busy sorting your arrays? There is no IOwait happening during the sorting, so the operating system has no natural point at which to pause these threads, as it does for an I/O operation. Although the operating system will still switch between threads even if they are not doing an IOwait, it will not pause them as often, which risks leaving less CPU time for the main thread.
The important thing to remember about CPU-bound operations like this is that they should have a thread pool with a size that is related to the number of cores the CPU has (and thus the number of threads it can run in parallel), or we risk clogging the CPU with work that will have a negative impact on the user experience.
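As a rough sketch of what that sizing can look like in plain Java, the pool below is sized from Runtime.availableProcessors() and used to sort several arrays. CpuPoolDemo, cpuPoolSize and sortAll are names I made up for this example, and using exactly one thread per core is an assumption, not a rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CpuPoolDemo {
    // Assumption: one worker per core is a reasonable cap for CPU-only work.
    static int cpuPoolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Sorts every array on the CPU pool and returns sorted copies in input order.
    static List<String[]> sortAll(List<String[]> inputs) {
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuPoolSize());
        try {
            List<CompletableFuture<String[]>> pending = new ArrayList<>();
            for (String[] input : inputs) {
                pending.add(CompletableFuture.supplyAsync(() -> {
                    String[] copy = input.clone(); // leave the caller's array untouched
                    Arrays.sort(copy);
                    return copy;
                }, cpuPool));
            }
            List<String[]> sorted = new ArrayList<>();
            for (CompletableFuture<String[]> result : pending) {
                sorted.add(result.join());
            }
            return sorted;
        } finally {
            cpuPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> sorted = sortAll(Arrays.asList(
                new String[] {"pear", "apple"},
                new String[] {"kiwi", "fig", "date"}));
        for (String[] array : sorted) {
            System.out.println(Arrays.toString(array));
        }
    }
}
```

If more arrays arrive than there are threads, the extra sort jobs wait in the pool’s queue instead of competing with the main thread for CPU time.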
But what if…
There is of course more to this than simply thinking about I/O vs CPU-only work. We could also talk about memory-bound and cache-bound, but for most app developers, it is enough to consider the difference between CPU and I/O.
The key takeaway from this is that you need two different thread pools for the background work that your application will do. I/O operations should have a much larger pool than the CPU-only operations.
Note that Kotlin Coroutine dispatchers share threads when possible. This makes it very efficient when switching between two contexts. The case for I/O-bound vs CPU-bound still applies though.
Also note that while the thread pool for your I/O operations should be large, it shouldn’t be unbounded (that is, allowing a potentially unlimited number of threads in the pool). There are other limitations in the OS that will prevent too many open sockets or files, but it is good practice to have a sane value for the size of your thread pools in any case and not just let them grow uncontrollably.
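For illustration, a bounded I/O pool in plain Java could be sketched like this. BoundedIoPool, create and peakConcurrency are hypothetical names, and the cap you pass in is something to choose for your own application rather than a recommended value.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedIoPool {
    // A pool that grows up to maxThreads and releases idle threads again.
    // Core and max size are equal because a ThreadPoolExecutor with an
    // unbounded queue never grows beyond its core size.
    static ThreadPoolExecutor create(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                30, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // idle threads exit after 30 seconds
        return pool;
    }

    // Submits `tasks` short simulated I/O calls and reports how many ran at once.
    static int peakConcurrency(int maxThreads, int tasks) {
        ThreadPoolExecutor pool = create(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(10); // stand-in for a blocking I/O call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(4, 40));
    }
}
```

No matter how many tasks are submitted, the number running at the same time never exceeds the cap. The rest wait in the queue.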
Let’s have a look at a slightly more complicated case to understand what could go wrong.
Say your application is fetching images over the network and displaying them in a list. Before they are displayed, you apply a blur effect on each image. You now have an operation for each item in the list that is doing heavy I/O work followed by heavy CPU-only work. Which thread pool should you use? If we only showed one image at a time, it would be acceptable to perform all the work on a thread from the I/O thread pool, but what about this case where we show multiple images in a list?
Again, it depends on your situation. You might be fine with doing everything on the I/O thread, but if you experience a performance problem you might want to consider switching to a CPU-bound thread pool after you’ve successfully retrieved the image from the network and are about to perform the blur operation. In our example above it could mean that you’re having problems scrolling while the images are blurring.
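Stripped of any particular library, the hand-off between the two pools can be sketched with CompletableFuture from java.util.concurrent. Everything named here (FetchThenBlur, fetchImage, blur, the pool sizes) is a placeholder I made up, and strings stand in for real images so that the thread names are visible in the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class FetchThenBlur {
    // Names the threads so we can see which pool ran which step.
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return task -> {
            Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
            thread.setDaemon(true); // do not keep the JVM alive for this demo
            return thread;
        };
    }

    // Assumption: 16 is an arbitrary "fairly large" size for the I/O pool.
    static final ExecutorService IO_POOL =
            Executors.newFixedThreadPool(16, named("io"));
    static final ExecutorService CPU_POOL = Executors.newFixedThreadPool(
            Math.max(1, Runtime.getRuntime().availableProcessors()), named("cpu"));

    // Hypothetical stand-in for a blocking network fetch.
    static String fetchImage(String url) {
        return "image(" + url + ") fetched on " + Thread.currentThread().getName();
    }

    // Hypothetical stand-in for the CPU-heavy blur.
    static String blur(String image) {
        return image + ", blurred on " + Thread.currentThread().getName();
    }

    // Fetch on the I/O pool, then hand the result over to the CPU pool.
    static CompletableFuture<String> loadBlurred(String url) {
        return CompletableFuture
                .supplyAsync(() -> fetchImage(url), IO_POOL)
                .thenApplyAsync(FetchThenBlur::blur, CPU_POOL);
    }

    public static void main(String[] args) {
        System.out.println(loadBlurred("https://example.com/cat.png").join());
    }
}
```

The point of the split is that the I/O thread is released as soon as the fetch completes, so a long blur never holds on to a thread that other downloads are waiting for.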
The example below illustrates how you would do this using Kotlin Coroutines. Note that this is just to show how to switch between the different thread pools. Also note that if you’re using the Coroutine adapter for Retrofit, the call to fetchImageAsync() can run on the main dispatcher as Retrofit uses its own thread pool.
As always with code you find on the Internet, protect yourself and write proper unit tests!
Let’s see how this would look with RxJava.
The code above could also work as an argument against using Rx, but that is a completely different topic that I’m staying far, far away from. The key takeaway here is that specifying what kind of threads to use is done using observeOn(), except for the initial call (api.fetchImageObservable(), in this case) which requires a subscribeOn() call.
You might have heard about something called non-blocking I/O operations. Java has an entire API for this in the java.nio package. This is a very cool API that uses certain system calls that don’t block a thread for an I/O operation. If the library you’re using uses this, the rules about I/O-bound threads don’t apply the same way. Fortunately, most libraries doing I/O (like OkHttp and Room) are not using this, so you usually don’t have to consider this. Some day I might write a new post about non-blocking I/O so you can be as confused as I am when I work with it.
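To give a small taste of what “non-blocking” means in java.nio, here is a hedged sketch that uses an in-process Pipe as a stand-in for a socket. NonBlockingReadDemo and its methods are names I made up for this illustration. In non-blocking mode, a read on a channel with no data returns immediately instead of parking the thread, and a Selector tells you when there is something to read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class NonBlockingReadDemo {
    // Reads from a pipe nobody has written to. In non-blocking mode this
    // returns right away with 0 bytes instead of waiting for data.
    static int readFromEmptyPipe() {
        try {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                return pipe.source().read(ByteBuffer.allocate(16));
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a short message, lets the selector report readiness, then reads it.
    static String readWhenReady(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                selector.select(1000); // wait (at most 1 s) until data is readable
                ByteBuffer buffer = ByteBuffer.allocate(64);
                pipe.source().read(buffer);
                buffer.flip();
                return StandardCharsets.UTF_8.decode(buffer).toString();
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes read from empty pipe: " + readFromEmptyPipe());
        System.out.println("message: " + readWhenReady("hello"));
    }
}
```

One selector can watch many channels, which is why a library built this way doesn’t need one thread per operation in flight.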
Most of the time your background work will consist of heavy I/O operations, so as long as you make sure they are done on the appropriate Scheduler or Dispatcher you should be fine. A common mistake is to try to optimize this yourself, often based on your empirical studies. Unless you have a very special case, avoid defining your own thread pools.
Since most libraries that deal with concurrency for us (like RxJava or Kotlin Coroutines) provide us with pre-configured thread-pools, all you have to think about as a developer is which one you should use based on the type of work you’ll be doing.
In the case where you do both CPU-heavy and I/O-intensive work, consider splitting the operation into two parts and switching thread pools in between. One example where this might be relevant is if you use something like Glide or Picasso to download and then transform images. Again, the need for this depends on your specific situation.
Try to identify what kind of work each operation will be doing. If unsure, use something like StrictMode on Android to detect if it is doing any file or network operations.
So to sum it up: many I/O threads are probably OK, but many CPU-only threads can be very bad.
Many thanks to Zac, Sean and Danny for reviewing this post. They are all very smart so make sure to follow them on Twitter!