Nutanix AOS 6.1: Memory Overcommit ~Hindawi

AHV now supports memory overcommit, which helps improve hardware utilization in your environment. It is an effective way to reduce hardware costs and to increase the capacity of an existing environment that can’t be expanded with new hardware right away. If you want to increase the density of your test, dev, disaster recovery, and other less performance-sensitive environments, this capability lets you reclaim memory from over-provisioned virtual machines (VMs).

One of the central benefits of virtualization is the ability to overcommit compute resources, making it possible to provision more virtual CPUs to VMs than there are physical CPUs on the host. Most workloads don’t need all of their assigned CPUs 100% of the time, so the hypervisor can dynamically allocate CPU cycles to whichever workloads need them at each point in time.

Like CPU and network resources, memory can also be overcommitted. At any given time, the VMs on a host may not use all of their allocated memory, and the hypervisor can give that unused memory to other workloads. Memory overcommit lets administrators provision more VMs per host by pooling the unused memory and allocating it to the VMs that need it.
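
To make the arithmetic concrete, here is a minimal Python sketch that compares memory provisioned to VMs with the physical memory on one host. The host size, VM sizes, and usage figures are hypothetical illustration values, and the `VM` class and `overcommit_summary` function are invented for this example; they are not part of any Nutanix tool.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical model of a VM, for illustration only.
    name: str
    assigned_gib: float  # memory configured on the VM
    used_gib: float      # memory the guest is actually using at this moment


def overcommit_summary(host_physical_gib: float, vms: list[VM]) -> dict:
    """Compare provisioned memory with physical memory on one host."""
    assigned = sum(vm.assigned_gib for vm in vms)
    used = sum(vm.used_gib for vm in vms)
    return {
        "assigned_gib": assigned,
        "used_gib": used,
        # Memory that is provisioned but idle, i.e. what overcommit can pool.
        "unused_gib": assigned - used,
        # A ratio above 1.0 means more memory is provisioned than physically exists.
        "overcommit_ratio": assigned / host_physical_gib,
        # Rough sanity check: if actual usage exceeds physical RAM,
        # some memory has to be reclaimed from the VMs.
        "usage_fits": used <= host_physical_gib,
    }


# Hypothetical host: 256 GiB of RAM, four VMs with 96 GiB each, half of it in use.
vms = [VM(f"vm{i}", 96, 48) for i in range(1, 5)]
summary = overcommit_summary(256, vms)
print(summary["overcommit_ratio"], summary["unused_gib"])
```

With these numbers, 384 GiB is provisioned on a 256 GiB host (a 1.5 overcommit ratio), yet only 192 GiB is in use, so the idle 192 GiB is the memory that overcommit can hand to other VMs.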

AOS 6.1 brings memory overcommit to AHV as an option for environments, such as test and development, where additional memory and higher VM density are required. Overcommit is disabled by default and is enabled on a per-VM basis, so memory can be shared among all VMs on a cluster or only a subset of them.

For each VM with memory overcommit enabled, AHV adjusts the memory allocated to the VM to match its actual usage. The host can then use the excess memory to satisfy the memory requirements of other VMs. This is a practical way to reduce hardware costs in large deployments or to increase the utilization of an existing environment that can’t be immediately expanded with new nodes. Only VMs on which the administrator or user has enabled memory overcommit share memory. Other VMs keep their pre-assigned memory and can coexist with overcommit-enabled VMs.

Memory overcommit lets the host use excess memory inside a VM for any existing or new VM on the same host. For example, unused memory of VM1 on Host1 can be used by another VM on Host1. The feature is also adaptive: the system identifies an appropriate size for each VM by observing metrics related to memory pressure. AHV measures these metrics and provides them to ADS (Acropolis Dynamic Scheduler). ADS consolidates the metrics and decides on an overall “memory pool” size for each host, which all overcommitted VMs on that host share. Within the boundaries of this pool, AHV then adapts VM sizes to minimize swapping and optimize performance.
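
The split between a per-host pool (decided by ADS) and per-VM sizing within that pool (done by AHV) can be sketched in Python. This is only an illustrative model under stated assumptions: the pool formula (physical memory minus a host reservation and minus the full allocations of non-overcommitted VMs) and the proportional scale-down policy are our own simplifications, not the actual ADS or AHV algorithm, and `size_pool`, `distribute_pool`, and all numbers are hypothetical.

```python
def size_pool(host_physical_gib: float, host_reserved_gib: float,
              fixed_vm_gib: list[float]) -> float:
    """Hypothetical pool size: physical memory minus the host's own reservation
    and the full allocations of VMs that do NOT have overcommit enabled."""
    return max(0.0, host_physical_gib - host_reserved_gib - sum(fixed_vm_gib))


def distribute_pool(pool_gib: float, demand_gib: dict[str, float],
                    assigned_gib: dict[str, float]) -> dict[str, float]:
    """Split the pool among overcommitted VMs (assumed proportional policy).

    Each VM asks for its observed demand, capped at its configured size.
    If the pool covers every request, each VM gets what it asked for;
    otherwise every VM is scaled down by the same factor, and the shortfall
    would have to be reclaimed from the guests."""
    wanted = {vm: min(demand_gib[vm], assigned_gib[vm]) for vm in demand_gib}
    total = sum(wanted.values())
    if total <= pool_gib:
        return wanted
    scale = pool_gib / total
    return {vm: w * scale for vm, w in wanted.items()}


# Hypothetical host: 400 GiB RAM, 20 GiB reserved, two fixed 40 GiB VMs.
pool = size_pool(400, 20, [40, 40])  # 300 GiB shared by overcommitted VMs
sizes = distribute_pool(pool,
                        demand_gib={"a": 200, "b": 100, "c": 140},
                        assigned_gib={"a": 160, "b": 200, "c": 140})
print(pool, sizes)
```

Here VM "a" is capped at its 160 GiB configured size, the three requests total 400 GiB against a 300 GiB pool, and each VM is scaled to 75% of its request. A real scheduler would weigh memory-pressure signals rather than apply a flat factor; the flat factor just keeps the sketch short.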

Nutanix uses a multi-tier approach that combines ballooning with hypervisor-level swap. Ballooning works through a driver inside the guest, so it offers insight into the VM’s memory management status that is hard to obtain from outside the VM. Hypervisor swap guarantees that we can shrink VMs even if they don’t have a balloon driver, their balloon driver isn’t working properly, or the VM is maliciously tampering with the balloon driver. Ballooning is enabled by default and is preferred, but we fall back on hypervisor swap for VMs that do not have a functional balloon driver, or whose balloon driver cannot remove the desired amount of guest memory in a timely fashion. Hypervisor swap is implemented using per-host vdisks hosted on AOS.
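
The balloon-first, swap-as-fallback decision can be sketched as a small planning function. This is a simplified model, not Nutanix code: `GuestState`, `plan_reclaim`, and the idea of a single "reclaimable before the deadline" figure are hypothetical stand-ins for whatever the balloon driver actually achieves within the time allowed.

```python
from dataclasses import dataclass


@dataclass
class GuestState:
    # Hypothetical view of one guest, for illustration only.
    balloon_functional: bool        # driver present and responding
    balloon_reclaimable_gib: float  # what the balloon can free before the deadline


def plan_reclaim(target_gib: float, guest: GuestState) -> tuple[float, float]:
    """Return (ballooned_gib, swapped_gib) for shrinking one VM by target_gib.

    Ballooning is tried first because the guest chooses which pages to give up.
    Hypervisor swap covers whatever the balloon could not reclaim in time,
    or the whole target if there is no working balloon driver."""
    if target_gib <= 0:
        return 0.0, 0.0
    if guest.balloon_functional:
        ballooned = min(target_gib, max(0.0, guest.balloon_reclaimable_gib))
    else:
        ballooned = 0.0
    swapped = target_gib - ballooned
    return ballooned, swapped


print(plan_reclaim(8, GuestState(True, 10)))   # balloon covers everything
print(plan_reclaim(8, GuestState(True, 3)))    # balloon partly, swap the rest
print(plan_reclaim(8, GuestState(False, 10)))  # no driver: swap only
```

The ordering mirrors the rationale in the paragraph above: the guest-cooperative mechanism goes first, and swap is the guarantee that the target is always met.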

Limitations of Memory Overcommit

  • You can enable or disable Memory Overcommit only while the VM is powered off.
  • Memory overcommit is not supported with VMs that use GPU passthrough or vNUMA.
  • Migrating a VM with Memory Overcommit enabled takes longer than migrating a VM without it.
  • There may be a temporary spike in the aggregate memory usage in the cluster during the migration of a VM enabled with Memory Overcommit from one node to another.
  • Heavy use of Memory Overcommit can cause a spike in disk space utilization in the cluster, because host swap consumes some of the cluster’s disk space.
  • For VMs with Memory Overcommit enabled, all DR operations except Cross Cluster Live Migration (CCLM) are supported.
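
The first two limitations amount to a preflight check before enabling the feature on a VM. The sketch below encodes only those rules; `VmConfig` and its fields are hypothetical stand-ins for illustration, not a Nutanix API object or CLI output.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    # Hypothetical fields for illustration; not a Nutanix API object.
    name: str
    powered_on: bool
    gpu_passthrough: bool
    vnuma_enabled: bool


def enable_overcommit_blockers(vm: VmConfig) -> list[str]:
    """List the reasons, taken from the limitations above, that would prevent
    enabling Memory Overcommit on this VM right now."""
    blockers = []
    if vm.powered_on:
        blockers.append("power off the VM before changing Memory Overcommit")
    if vm.gpu_passthrough:
        blockers.append("GPU passthrough is not supported with Memory Overcommit")
    if vm.vnuma_enabled:
        blockers.append("vNUMA is not supported with Memory Overcommit")
    return blockers


print(enable_overcommit_blockers(VmConfig("dev-01", False, False, False)))
print(enable_overcommit_blockers(VmConfig("gpu-01", True, True, False)))
```

Returning every blocker at once, rather than stopping at the first, lets an administrator fix all issues in one pass.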
