Lessons in OpenStack - Less is More

OpenStack Version - Newton

I am a greedy programmer. I like big, beefy machines with lots of cores and hundreds of gigabytes of RAM. This is a post about how OpenStack taught me that sometimes less is indeed more. Interestingly, I was not aware that the powerful Unix utility "less" gets its name for a similar reason: it was written as a successor to "more", living up to the adage that less is more.

Background - I was trying to run a stereo-vision algorithm based on semi-global matching. The input images were quite large - 8000 x 8000 pixels. Stereo matching is quite memory intensive, but the operations can be heavily parallelized using OpenMP. So of course, I wanted to use as many cores as I could get my hands on.
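To give a feel for why more cores looked so appealing: the disparity search is an embarrassingly parallel loop over image rows. The sketch below is a simplified block-matching stand-in, not my actual semi-global matching code; the function name, parameters, and cost function are made up purely for illustration.

  // Simplified stand-in for illustration: brute-force absolute-difference
  // matching, not semi-global matching. It only shows how the row loop
  // parallelizes with OpenMP. Build with: g++ -O2 -fopenmp
  #include <cstdint>
  #include <cstdlib>
  #include <climits>
  #include <vector>

  void compute_disparity(const std::vector<uint8_t>& left,
                         const std::vector<uint8_t>& right,
                         std::vector<int>& disparity,
                         int width, int height, int max_disp)
  {
      // Each image row is independent, so the outer loop splits cleanly
      // across however many cores OpenMP is given.
      #pragma omp parallel for schedule(dynamic)
      for (int y = 0; y < height; ++y) {
          for (int x = max_disp; x < width; ++x) {
              int best_d = 0, best_cost = INT_MAX;
              for (int d = 0; d < max_disp; ++d) {
                  // per-pixel matching cost for candidate disparity d
                  int cost = std::abs(int(left[y * width + x]) -
                                      int(right[y * width + x - d]));
                  if (cost < best_cost) { best_cost = cost; best_d = d; }
              }
              disparity[y * width + x] = best_d;
          }
      }
  }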

On a physical machine with 28 physical cores (56 logical cores with hyperthreading) and 125 GB of RAM, it ran to completion in 4-5 minutes on average. My greed had paid off! The next step was to put it on our RVL Cloud so that multiple users could run the program simultaneously. I knew for a fact that our collaborators were running the same program on a VM with 60 GB of RAM and 36 VCPUs and getting similar run times.

Elated with my success, I then converted the physical node to an OpenStack compute node and launched a VM on it (running the same OS, program, etc.) with 60 GB of RAM and 40 VCPUs, similar to what my collaborators had. The run time was around 10 minutes. Still not 5 minutes, but virtualization does add some overhead, and the VM did have fewer cores. So I decided to launch a bigger VM with 125 GB of RAM and 56 VCPUs to make up for the overhead of virtualization. Imagine my consternation as I watched the run time crawl to a whopping 40 minutes!
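For reference, launching a VM of that size is just a flavor definition plus a server boot. A rough sketch with the openstack CLI follows; the flavor, image, and network names here are placeholders, not the ones I actually used.

  # flavor roughly matching the collaborators' VM: 60 GB of RAM, 40 VCPUs
  openstack flavor create --ram 61440 --vcpus 40 --disk 80 stereo.large

  # boot the VM from that flavor (image and network are placeholders)
  openstack server create --flavor stereo.large --image ubuntu-16.04 \
      --nic net-id=<network-uuid> stereo-vm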

Then followed a period where I alternated between deep introspection and hair-pulling, and wondered if I had fallen through a crack in space-time into an alternate reality. I finally realized that, in my greed for cores, I had forgotten that OpenStack runs some native processes on the host machine. Effectively, my big VM and OpenStack were competing for the host's 56 logical CPUs. More Googling indicated that this is a known issue and is exactly why CPU pinning exists. From the Red Hat guide: "The exact configuration depends on the NUMA topology of your host system; however, you must reserve some CPU cores across all the NUMA nodes for host processes and let the rest of the CPU cores handle your guest virtual machine instances".
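On the Nova side, this boils down to telling nova-compute which host CPUs guests may use and, if you want strict placement, pinning guest VCPUs to dedicated cores via the flavor. A hedged sketch of what that could look like on a 56-thread host; the exact core ranges depend on your NUMA topology, so treat the numbers as placeholders:

  # /etc/nova/nova.conf on the compute node
  [DEFAULT]
  # guests may only use host CPUs 4-55; CPUs 0-3 stay free for
  # the host OS and OpenStack's own processes
  vcpu_pin_set = 4-55
  # keep some memory back for the host as well
  reserved_host_memory_mb = 4096

  # optionally pin guest VCPUs to dedicated host cores via the flavor
  openstack flavor set stereo.large --property hw:cpu_policy=dedicated

Restart nova-compute after editing nova.conf for the change to take effect.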

In my case, mine was the only VM on the physical node, so the lack of CPU pinning was not really the issue. Rather, it was the fact that the big VM and OpenStack were competing heavily for the host's CPUs. With the smaller VM (40 VCPUs), 16 of the host's 56 logical CPUs were left free for OpenStack to use.

Lesson learnt - Sometimes, it pays to be thrifty.

~Bharath Comandur
