Quick one today. I have been administering an OpenStack environment for a while now. Before rolling with VMware Integrated OpenStack (VIO), we used its precursor, VMware OpenStack (VMOS). Whenever we pushed the environment closer to capacity, there would be scenarios where our instances would not be provisioned. I vaguely remember hitting this back then and modifying Nova directly. In VIO, modifications are done via custom.yml.
Nova would attempt to schedule the instances and an ERROR would arise. Looking closer at nova-scheduler.log on the Controller01/02 boxes, the following would appear:
2019-03-18 05:48:39.988 18668 WARNING nova.scheduler.utils [req-4619d840-70bf-4e29-9dfa-4f4725b45080 0b9e9085999540d8a3f1db6d47143275 2899fc2120994fc8a3d7b27d26056ba6 - 449203f05e4046b4a09c504c7bbc5b65 449203f05e4046b4a09c504c7bbc5b65] [instance: 52766a1e-bac8-43e5-b272-8f37e1a162c7] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found.
Now this is not a helpful error. There are ample compute resources. There are valid hosts. Everything is healthy. What’s wrong?
Well, what is wrong is the following. Nova has a default memory overcommit ratio. The scheduler looks at the entire compute cluster and weighs its physical memory against what has already been provisioned to Instances. Crucially, it counts each Instance's configured memory, not the memory currently in use.
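To make that concrete, here is a rough sketch of the per-host check the scheduler's RAM filter effectively performs, written in PowerShell since that is what the rest of this post uses. This is illustrative only; the function name and parameters are mine, not Nova's actual code.

# Rough sketch (not Nova's code) of the per-host RAM check the scheduler performs.
function Test-HostHasRam {
    param(
        [double]$PhysicalRamGB,     # physical RAM on the host
        [double]$AllocatedRamGB,    # RAM configured on Instances already placed there
        [double]$RequestedRamGB,    # RAM the new flavor asks for
        [double]$AllocationRatio    # nova_ram_allocation_ratio
    )
    # Configured (not actively used) memory is compared against
    # physical memory multiplied by the overcommit ratio.
    ($AllocatedRamGB + $RequestedRamGB) -le ($PhysicalRamGB * $AllocationRatio)
}

# Example: a 256 GB host already carrying 379 GB of configured Instances
Test-HostHasRam -PhysicalRamGB 256 -AllocatedRamGB 379 -RequestedRamGB 8 -AllocationRatio 1.5
# -> False: 387 GB requested vs a 384 GB ceiling, so this host is filtered out

If every host in the cluster fails that check, you get the NoValidHost error above.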
My environment has 8 x 256 GB servers in it, so the capacity math is simple:
256*8 = 2048
GB of RAM in my environment.
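As an aside, rather than multiplying by hand, PowerCLI can report the physical total directly (MemoryTotalGB is a standard property of the VMHost object):

# Sum physical memory across all hosts; comes back as roughly 2048 for 8 x 256 GB hosts
Get-VMHost | Measure-Object -Property MemoryTotalGB -Sum | Select -ExpandProperty Sum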
Let’s compare this to my current memory usage. Note that this is not active memory but the memory currently allocated to all Instances in my compute cluster.
Get-VMhost | Select Name,@{N="Memory used GB";E={$_ | Get-VM | %{($_.MemoryMB / 1KB) -as [int]} | Measure-Object -Sum | Select -ExpandProperty Sum}}

Name               Memory used GB
----               --------------
srv-009.mgt.sg.lab            379
srv-007.mgt.sg.lab            442
srv-010.mgt.sg.lab            328
srv-006.mgt.sg.lab            352
srv-011.mgt.sg.lab            390
srv-012.mgt.sg.lab            438
srv-008.mgt.sg.lab            401
srv-005.mgt.sg.lab            368
And with some math:
379+442+328+352+390+438+401+368 = 3098
GB of RAM allocated. So I am already sitting at roughly 1.5x overcommit. This is the memory ceiling I was running into, and why instances were not being scheduled. So where does this number come from?
Look in the /opt/vmware/vio/custom/custom.yml file and you will find the following property:
# Virtual Memory to physical Memory allocation ratio which affects all CPU filters.
nova_ram_allocation_ratio: 1.5
So with this default, the ceiling is 2048 * 1.5 = 3072 GB. The cluster, at 3098 GB of configured workloads, was just over it, and as such Nova would not schedule any more instances.
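The same comparison can be expressed as a quick PowerShell sanity check. The variable names are mine; the numbers are the ones gathered above:

# Back-of-the-envelope check against the Nova ceiling
$physicalGB  = 2048   # 8 hosts x 256 GB
$allocatedGB = 3098   # configured Instance memory from the PowerCLI query
$ratio       = 1.5    # default nova_ram_allocation_ratio

$ceilingGB = $physicalGB * $ratio   # 3072
if ($allocatedGB -ge $ceilingGB) {
    "Allocated $allocatedGB GB is over the $ceilingGB GB ceiling - expect NoValidHost errors"
}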
Modify the value of nova_ram_allocation_ratio in custom.yml to whatever value you deem appropriate. I am going with 3.5.
Then run viocli deployment configure:
[email protected]:/var/lib/vio/ansible/custom$ sudo viocli deployment configure
Using Customization file /opt/vmware/vio/custom/custom.yml
Customization data:
{nova_ram_allocation_ratio: 3.5, nsxv3_default_tier0_router: 1663d49a-59cf-41ca-9480-65ee753d9991}
Configuring deployment VIO5...
[ 2 %] [##
Once this runs, you will be able to allocate beyond the old 1.5 oversubscription ceiling!
PS /Users/aburke> Get-VMhost | Select Name,@{N="Memory used GB";E={$_ | Get-VM | %{($_.MemoryMB / 1KB) -as [int]} | Measure-Object -Sum | Select -ExpandProperty Sum}}

Name               Memory used GB
----               --------------
srv-009.mgt.sg.lab            611
srv-007.mgt.sg.lab            594
srv-010.mgt.sg.lab            514
srv-006.mgt.sg.lab            528
srv-011.mgt.sg.lab            550
srv-012.mgt.sg.lab            562
srv-008.mgt.sg.lab            487
srv-005.mgt.sg.lab            548
That’s 611+594+514+528+550+562+487+548=4394
GB!
There we go. A much larger consolidation ratio!
Please be mindful about your environment and its maximums.