
When "Quick Emulation" Is Slow

How QEMU slows down your Docker build and what to do instead.

Recently, I was working on a project where I realized that the build was painfully slow. It was taking about 3 hours to produce a small handful of Docker images. This is what happened next.

Investigation

The first thing I did was try to compile the code locally to see whether the problem was general or specific to our CI. I expected native compilation to produce all of the binaries in under 10 minutes, and this time was no exception.

Next, I tried running the version of the build that produced Docker images. This also ran in under 10 minutes. So, I started digging into what was taking so long. And that’s when it hit me. We were building twice as many images in CI because we were building multi-arch images for amd64 and arm64. But twice as many images shouldn’t take over 10 times as long. There was still something else going on.

So, I tried building an arm64 image on my amd64 machine. Bingo. After 20 minutes, it still hadn’t built one image. The next question was why.

As it turns out, builds that run under emulation are very slow. When you use docker buildx to build for a different platform, several things happen under the hood. First, Docker delegates the build to BuildKit. BuildKit, in turn, emulates the target platform using QEMU.

What is QEMU?

QEMU (short for Quick Emulator) is a powerful open-source tool that, among other things, allows BuildKit to create cross-platform images on a single machine. It does this by emulating a different processor architecture in software, translating the target’s machine instructions into ones your CPU can run. In essence, it creates another computer inside your computer. As you might imagine, this adds overhead.

First, let’s acknowledge that this is pretty cool. If you are building software and want to ship binaries that work on both PCs and Macs, for instance, it is extremely convenient to be able to do that on the same machine. It’s even cooler to be able to do it almost automagically. If the alternative is buying hardware for each platform you want to target, emulation is a no-brainer, at least until your project grows large enough that the emulation overhead becomes untenable. In our case, emulated builds took about 20 times as long as native ones.

With that kind of inefficiency, it made sense for us to explore other options.

Options for producing binaries for another platform

There are a few options worth discussing.

Virtual Machines

You can also run your builds in virtual machines that use the target architecture. Unfortunately, unless the host shares that architecture, the VM is itself emulated, so you are unlikely to see much performance benefit. It may also be more painful to combine the separate builds into your final artifacts when they are logically running on different computers.

Cloud-based ephemeral compute

Major cloud providers and CI platforms offer a range of hardware that you can use to build your application. Depending on your tooling, it could be easy or very hard to set up a build flow where part of the build actually runs on a different platform. In our case, I couldn’t readily find documentation on how to designate which hardware architecture each part of the build would run on. That’s not to say it isn’t possible, but I got distracted by the last option.

Cross-compiling

Cross-compiling is possible in most modern compiled languages, and in some it is easy. Go is one of the easy ones: for pure Go code, all you have to do is set the GOOS and GOARCH environment variables. Instead of emulating the target platform, the compiler emits machine code for that platform directly. Because nothing is emulated, the compile time is roughly the same as a native build, and you end up with a binary built for the target platform.

Results

It can be tempting not to touch things that work, in any sense of the word. As the saying goes, “If it ain’t broke, don’t fix it.” In this case, though, the fix was worth it.

The compute we saved was substantial, but in the grand scheme of things it didn’t matter much to us. The time saved did. When a single build takes hours, iterating a few times or debugging a problem with the build starts to take days. When it runs in 15 minutes, you can make the same progress in hours. With this fix, we delivered software faster. And that sounds like a win to me.
