Quick one today. I have been administering an OpenStack environment for a while now. Before rolling with VMware Integrated OpenStack (VIO), we used its precursor, VMware OpenStack (VMOS). When we pushed the environment closer to capacity, there were scenarios where our instances would not be provisioned. I vaguely remembered hitting this before, back when we modified Nova directly. In VIO, modifications are made via custom.yml.

Nova would attempt to schedule the instances and they would land in an ERROR state. Looking closer at nova-scheduler.log on the Controller01/02 boxes, I found the following:

  

2019-03-18 05:48:39.988 18668 WARNING nova.scheduler.utils [req-4619d840-70bf-4e29-9dfa-4f4725b45080 0b9e9085999540d8a3f1db6d47143275 2899fc2120994fc8a3d7b27d26056ba6 - 449203f05e4046b4a09c504c7bbc5b65 449203f05e4046b4a09c504c7bbc5b65] [instance: 52766a1e-bac8-43e5-b272-8f37e1a162c7] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found.

Now this is not a helpful error. There are ample compute resources. There are valid hosts. Everything is healthy. So what’s wrong?

Here is what is wrong: Nova has a default memory overcommit ratio. The scheduler looks at the total memory of the entire compute cluster, multiplies it by that ratio, and contrasts the result with the memory of the instances already provisioned. Importantly, it counts each instance’s configured memory, not the memory currently in use. In other words, a new instance is only scheduled while (configured memory of all instances) + (requested memory) stays under (total physical memory) x (overcommit ratio).

My environment has 8 x 256 GB servers in it, so let’s do some simple math:

256*8 = 2048 GB of RAM in my environment.

Let’s compare this to my current memory used. Note that this is not active memory but the memory currently allocated to all instances in my compute cluster.

 

# Per host: sum the configured memory of all VMs on that host (MemoryMB converted to GB)
Get-VMhost | Select Name,@{N="Memory used GB";E={$_ | Get-VM | %{($_.MemoryMB / 1KB) -as [int]} | Measure-Object -Sum | Select -ExpandProperty Sum}}

Name               Memory used GB
----               --------------
srv-009.mgt.sg.lab            379
srv-007.mgt.sg.lab            442
srv-010.mgt.sg.lab            328
srv-006.mgt.sg.lab            352
srv-011.mgt.sg.lab            390
srv-012.mgt.sg.lab            438
srv-008.mgt.sg.lab            401
srv-005.mgt.sg.lab            368

 

And with some math:
379 + 442 + 328 + 352 + 390 + 438 + 401 + 368 = 3098 GB of RAM allocated. That is roughly a 1.5x overcommit (3098 / 2048 ≈ 1.51). This is where I was running into the memory ceiling that stopped instances being scheduled. So where does that number come from?
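If you would rather not add those up by hand, a quick PowerCLI sketch (using the same Get-VMHost / Get-VM view of the cluster as the one-liner above) will derive the totals and the effective ratio directly:

# Total physical host memory vs. memory allocated to all VMs, plus the effective ratio
$hosts       = Get-VMHost
$physicalGB  = ($hosts | Measure-Object -Property MemoryTotalGB -Sum).Sum
$allocatedGB = ($hosts | Get-VM | Measure-Object -Property MemoryGB -Sum).Sum
"{0:N0} GB physical, {1:N0} GB allocated, effective ratio {2:N2}" -f $physicalGB, $allocatedGB, ($allocatedGB / $physicalGB)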

Look in the /opt/vmware/vio/custom/custom.yml file and you will find the following property:

  
# Virtual Memory to physical Memory allocation ratio which affects all RAM filters.
nova_ram_allocation_ratio: 1.5

With this default, the ceiling is 2048 * 1.5 = 3072 GB. The cluster’s current workloads were already at 3098 GB, just over that ceiling, and as such Nova would not schedule more instances.
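To make the check explicit, here is a minimal sketch of the ceiling comparison the scheduler is effectively making. The numbers are the ones from this environment; the variable names and the 8 GB flavor are just illustrative, not anything out of Nova’s code:

# Illustrative only: the effective RAM ceiling check using this cluster's numbers
$ramAllocationRatio = 1.5     # nova_ram_allocation_ratio
$physicalGB         = 2048    # total RAM across the compute cluster
$allocatedGB        = 3098    # RAM configured on all existing instances
$requestedGB        = 8       # memory of the flavor being scheduled (example value)

if (($allocatedGB + $requestedGB) -gt ($physicalGB * $ramAllocationRatio)) {
    "No valid host: $($allocatedGB + $requestedGB) GB would exceed the ceiling of $($physicalGB * $ramAllocationRatio) GB"
}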

Modify the value of nova_ram_allocation_ratio in the custom.yml to whatever value you deem appropriate. I am doing 3.5.
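After the edit, the property in /opt/vmware/vio/custom/custom.yml looks like this:

nova_ram_allocation_ratio: 3.5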

Then run viocli deployment configure:

  
[email protected]:/var/lib/vio/ansible/custom$ sudo viocli deployment configure
Using Customization file /opt/vmware/vio/custom/custom.yml
Customization data:

{nova_ram_allocation_ratio: 3.5, nsxv3_default_tier0_router: 1663d49a-59cf-41ca-9480-65ee753d9991}

Configuring deployment VIO5...
[  2 %] [##

Once this runs, you will be able to exceed the previous oversubscription ceiling!

  
PS /Users/aburke> Get-VMhost | Select Name,@{N="Memory used GB";E={$_ | Get-VM | %{($_.MemoryMB / 1KB) -as [int]} | Measure-Object -Sum | Select -ExpandProperty Sum}}

Name               Memory used GB
----               --------------
srv-009.mgt.sg.lab            611
srv-007.mgt.sg.lab            594
srv-010.mgt.sg.lab            514
srv-006.mgt.sg.lab            528
srv-011.mgt.sg.lab            550
srv-012.mgt.sg.lab            562
srv-008.mgt.sg.lab            487
srv-005.mgt.sg.lab            548

That’s 611 + 594 + 514 + 528 + 550 + 562 + 487 + 548 = 4394 GB, an effective overcommit of about 2.15!

There we go. A much larger consolidation ratio!

Please be mindful of your environment and its maximums.
