August 17, 2018
L1 Terminal Fault Poses Risk for Virtual Machines
The Spectre and Meltdown vulnerabilities disclosed earlier this year threw the IT service industry into an uproar. They exposed data in locations that system designers believed cyber criminals could not read. However, where there's a will, there's a way, and researchers showed that information could be derived from data held in CPU caches. That raised the question: were there more vulnerabilities of this type? The answer is yes. These vulnerabilities take advantage of something called speculative execution, and the newest one is called L1 Terminal Fault (L1TF).
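First, we need to talk about speculative execution: to save time, modern CPUs execute instructions ahead of time, before they know whether those instructions are needed or permitted, and discard the results if they turn out not to be.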
Each byte of memory on a system has both a virtual address (the memory address used by applications) and a physical address (the actual location of the data in system memory). The operating system records the mapping between the two in page tables, which the CPU walks to perform translations while checking other attributes of the page table entries. If, during a walk, the processor finds a page table entry marked with the ‘not present’ attribute, it raises a page fault; the operating system then loads the proper data into memory and marks the entry as present. By itself, this isn’t bad, as it happens regularly on a busy system. The problem is that, due to the aggressive speculative execution performed by Intel processors, a CPU that encounters a not-present entry notes that a fault must be raised but keeps executing speculatively in the meantime. It treats the address bits in that entry as a physical address, checks whether matching data is already loaded in the L1 data cache, and may forward that data to the speculatively executing instructions of an application that shouldn’t have access to it. Although the speculative results are discarded once the fault is delivered, they leave traces in the cache that can be measured. As a result, a malicious actor could take advantage of this speculative execution to obtain sensitive information.
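To make the difference between what a program is allowed to see and what the CPU may do speculatively more concrete, here is a minimal toy model in Python. It is only a sketch: the names PageTableEntry, architectural_load and speculative_load are invented for illustration, the L1 cache is reduced to a dictionary keyed by physical address, and nothing here reproduces real hardware behavior or a working attack.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages for this toy model


@dataclass
class PageTableEntry:
    present: bool  # the 'not present' attribute is present=False
    frame: int     # physical frame number stored in the entry


class PageFault(Exception):
    """Raised when a load architecturally touches a not-present page."""


def physical_address(pte, offset):
    return pte.frame * PAGE_SIZE + offset


def architectural_load(pte, offset, l1_cache):
    """What the program may officially observe: not-present always faults."""
    if not pte.present:
        raise PageFault("page not present")
    return l1_cache.get(physical_address(pte, offset))


def speculative_load(pte, offset, l1_cache, vulnerable=True):
    """What the CPU may forward to speculative instructions before the fault.

    On a vulnerable CPU (hypothetical flag), the frame bits of a not-present
    entry are still used to look up the L1 cache.
    """
    if pte.present or vulnerable:
        return l1_cache.get(physical_address(pte, offset))
    return None


# A secret byte that happens to sit in L1 at physical address 0x5010.
l1 = {0x5010: 0x42}
stale_entry = PageTableEntry(present=False, frame=5)

leaked = speculative_load(stale_entry, 0x10, l1)
blocked = speculative_load(stale_entry, 0x10, l1, vulnerable=False)
```

In this model the architectural load always faults on the stale entry, while the speculative load on the "vulnerable" path still returns the cached byte. On real hardware that value is never handed to the program directly; an attacker has to recover it indirectly through cache timing.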
This vulnerability originates in the physical hardware, specifically Intel CPUs, but it affects operating systems and even cloud infrastructure. Operating system patches and CPU microcode updates have been released, and both must be installed. The more complex problem is what to do about cloud infrastructure. Because multiple virtual machines share resources on the same physical host, patching alone is not enough: the L1 cache must also be flushed before CPU execution is handed off from one VM to another, so that its contents are not disclosed to the wrong VM. This has a performance impact that depends on the workload, but it prevents a cyber criminal from reading the memory of another VM on the same physical host.
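On a patched Linux host, the kernel reports its view of L1TF in a one-line sysfs file, which gives a quick way to confirm that the updates described above took effect. The sketch below assumes a Linux system with a kernel recent enough to expose that file; the helper names (read_l1tf_status, split_status, needs_review) are illustrative, and the sample strings in the comments only show the general shape of the report, so check your kernel's documentation for the exact wording.

```python
from pathlib import Path

# sysfs file where patched Linux kernels report L1TF status
L1TF_STATUS_FILE = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def read_l1tf_status(path=L1TF_STATUS_FILE):
    """Return the kernel's one-line L1TF report, or None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # Not Linux, or a kernel too old to report the vulnerability.
        return None


def split_status(status):
    """Split a report such as 'Mitigation: ...; VMX: ...' into its parts."""
    return [part.strip() for part in status.split(";") if part.strip()]


def needs_review(status):
    """Flag hosts whose report is missing or mentions 'vulnerable' anywhere."""
    if status is None:
        return True
    return "vulnerable" in status.lower()


if __name__ == "__main__":
    status = read_l1tf_status()
    if status is None:
        print("No L1TF report found; kernel may be unpatched or not Linux.")
    else:
        for part in split_status(status):
            print(part)
        print("Needs review:", needs_review(status))
```

The check is deliberately conservative: a missing file is treated as a reason to look closer rather than as proof that the host is safe.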
Finally, the situation worsens when multiple virtual machines share a CPU core through Intel Hyper-Threading (which allows one CPU core to effectively be treated as two). Because both VMs run on the core simultaneously and share its L1 cache, there is no handoff point at which the cache can be flushed to prevent speculative reads. If the environment runs virtual machines that you don’t fully manage and cannot guarantee are fully patched, the current suggestion is to disable Hyper-Threading or enable core isolation in the hypervisor. Either option has a negative impact on performance, but it prevents a malicious VM that shares a Hyper-Threaded core with a legitimate VM from reading the legitimate VM’s memory. The choice among the various methods of remediation needs to be evaluated on a case-by-case basis, taking risk and performance into account.
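Before deciding whether to disable Hyper-Threading, it helps to know whether it is active on a given host at all. On Linux hosts with a patched kernel, the state of simultaneous multithreading (SMT, the generic term for Hyper-Threading) is exposed under sysfs. The following is a small read-only sketch under that assumption; read_smt_state and smt_is_active are hypothetical helper names, and other hypervisor platforms expose this setting through their own tools.

```python
from pathlib import Path

# sysfs directory where patched Linux kernels expose SMT state
SMT_DIR = Path("/sys/devices/system/cpu/smt")


def read_smt_state(smt_dir=SMT_DIR):
    """Read the 'active' and 'control' files; missing files become None."""
    state = {}
    for name in ("active", "control"):
        try:
            state[name] = (Path(smt_dir) / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


def smt_is_active(state):
    """Return True or False when the kernel reports it, None when unknown."""
    active = state.get("active")
    if active is None:
        return None
    return active == "1"


if __name__ == "__main__":
    state = read_smt_state()
    print("SMT control setting:", state["control"])
    print("SMT active:", smt_is_active(state))
```

The sketch only reports the current state. Actually turning Hyper-Threading off is a separate, privileged change (firmware setting, kernel option, or hypervisor configuration) that should follow the case-by-case evaluation of risk and performance.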
All of this information is important to know and understand, and that’s where Managed Service Providers can help. An MSP can mitigate risks and stay on the bleeding edge of vulnerabilities. A remote monitoring and management tool allows the MSP to keep all servers properly patched, and a disaster recovery solution allows you to run on a segregated hypervisor in the event you detect a breach on the primary system.