A Controlled Recovery is better than Chaos

VMware ESXi 5.1 has numerous features to make your virtualized enterprise a slick operation. The order in which guests boot can be very important after maintenance or an outage, because servers that come up before the services they depend on can lose roles and functions. Microsoft Exchange Server is one boot order dependency that can cause serious headaches. In most environments you will have an Active Directory server with DNS installed on it. You will also have an Exchange Server and probably SQL, SCCM, and many other servers that require DNS.

Left to its own devices, an ESXi host will not boot any guests at all by default. If you are running a single host, or only one host in the cluster holds a particular VM, this isn’t a good thing. We need the guests to boot back up with the ESXi host, in an order that suits operations. We are trying to avoid a race condition, which in turn prevents services from failing on boot. My lab below has ESXi 5.1 installed as a VM with two guests installed. I use my Thunderbolt Gigabit Ethernet adapter to plug upstream into my Juniper SRX110H-VA.

Screen Shot 2013-02-03 at 6.20.44 PM

First, log into your ESXi host using vSphere tools and navigate to Configuration > Software > Virtual Machine Startup/Shutdown on the host. Note that this ESXi host has just booted up and Customer A’s guests haven’t booted.

Screen Shot 2013-02-03 at 5.05.03 PM

Note the default options. By default, guests do not start up or shut down with the ESXi host. It is important to define how you want guests to start and shut down, as this gives you deterministic results and lets you align with best practices.

In my lab, guest tc_001_Customer_A runs Microsoft Active Directory as a Domain Controller with the additional role of DNS server. Guest tc_002_Customer_A runs Microsoft Exchange. Exchange has a critical dependency on AD and DNS. If it fails to detect AD and DNS, services fail to start and all sorts of headaches await, probably including a queue of end users if you take too long. Nobody wants or deserves that.
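
To make the dependency concrete, here is a minimal Python sketch that derives a safe boot order from a "depends on" map. The guest names come from my lab; the `depends_on` map and the `boot_order` helper are my own illustration, not something ESXi exposes, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each guest maps to the set of guests it needs up first.
# This map is a hand-written illustration; ESXi does not track it for you.
depends_on = {
    "tc_001_Customer_A": set(),                  # AD DC and DNS, needs nothing
    "tc_002_Customer_A": {"tc_001_Customer_A"},  # Exchange needs AD and DNS
}

def boot_order(deps):
    """Return guest names so every guest appears after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

print(boot_order(depends_on))
# ['tc_001_Customer_A', 'tc_002_Customer_A']
```

With only two guests the answer is obvious, but the same map scales to the SQL and SCCM servers mentioned earlier, and a circular dependency raises `graphlib.CycleError` instead of silently producing a bad order.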

Now let us enable a staggered start for our guests. Click Properties in the top right corner and let’s view what is before us.

Screen Shot 2013-02-03 at 5.23.21 PM

So by default all guests are in Manual Startup. This means that although they inherit the global default settings of 120 seconds, they are actually skipped at boot time. This isn’t great. I know an engineer or two who have missed this. Note there are three types of startup: Automatic, Any Order, and Manual. Manual and Automatic are self-explanatory. Any Order is interesting. Any servers that do not have major role dependencies, such as a backup VM, orchestration software, or auxiliary services, can be placed in here.
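
As a rough mental model, the three lists can be sketched in Python like this. The dictionary layout and the `boot_plan` helper are hypothetical names of mine and not a VMware API; the point is only that anything left in Manual is skipped.

```python
def boot_plan(lists):
    """Split guests into those started in a fixed order, those started
    in no particular order, and those skipped when the host boots."""
    return {
        "ordered": list(lists.get("automatic", [])),
        "unordered": sorted(lists.get("any_order", [])),
        "skipped": list(lists.get("manual", [])),
    }

# Default state: every guest sits in Manual, so nothing comes up with the host.
default_lists = {
    "automatic": [],
    "any_order": [],
    "manual": ["tc_001_Customer_A", "tc_002_Customer_A"],
}

# The state this walkthrough is heading towards: both guests in Automatic.
target_lists = {
    "automatic": ["tc_001_Customer_A", "tc_002_Customer_A"],
    "any_order": [],
    "manual": [],
}

print(boot_plan(default_lists)["skipped"])
print(boot_plan(target_lists)["ordered"])
```

The first line printed lists both guests as skipped, which is exactly the trap described above.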

So enable “Allow virtual machines to start and stop automatically with the system” and move the guest tc_001_Customer_A up into Automatic Startup. This guest will now boot 120 seconds after the ESXi host starts up.

Screen Shot 2013-02-03 at 5.32.53 PM

Now that our AD DC is on automatic boot, we can determine what to do with tc_002_Customer_A, which runs Exchange. Let us place this guest into Automatic Startup too. I am not comfortable with this machine racing against the AD DC and DNS server, so I want this VM to have its own startup settings. Select the guest tc_002_Customer_A and click Edit on the left side.

Screen Shot 2013-02-03 at 5.33.25 PM

As you can see, the administrator can deviate from the inherited settings for this particular guest. I want to make the Exchange server boot up after 180 seconds. Click Use Specified Settings and enter 180 seconds into the Startup Delay field.

Screen Shot 2013-02-03 at 5.33.52 PM
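
The inherit-or-override behaviour of that dialog can be sketched in Python as follows. The `StartupSettings` class and its field names are hypothetical and only mirror the two radio buttons; the 120 and 180 second values are the ones used in my lab.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_STARTUP_DELAY = 120  # seconds, the host-wide default shown earlier

@dataclass
class StartupSettings:
    guest: str
    # None means "use the default"; a number means "Use Specified Settings".
    startup_delay: Optional[int] = None

    def effective_delay(self, host_default: int = DEFAULT_STARTUP_DELAY) -> int:
        """Return the per-guest delay if one is set, else the host default."""
        if self.startup_delay is None:
            return host_default
        return self.startup_delay

ad_dc = StartupSettings("tc_001_Customer_A")                      # inherits 120
exchange = StartupSettings("tc_002_Customer_A", startup_delay=180)

for settings in (ad_dc, exchange):
    print(settings.guest, settings.effective_delay())
# tc_001_Customer_A 120
# tc_002_Customer_A 180
```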

Now that you have verified the settings, we can reboot the ESXi host and see what happens. Right-click on your ESXi host and select Reboot.

Screen Shot 2013-02-03 at 5.34.03 PM

Once you do that, you are prompted to enter a reason for the reboot. In a production environment you might align this information with a change request or incident ID.

Screen Shot 2013-02-03 at 5.34.37 PM

Enter the information and click OK. With my Retina MacBook Pro as my test machine, ESXi takes about 15 seconds to boot after the restart. It will take longer on a blade or racked server because of the boot stages for BIOS, iLO, RAID configuration managers, and more.

Screen Shot 2013-02-03 at 5.37.21 PM

As you reconnect to your host through vSphere, you will notice that the first guest, our AD DC and DNS server, has begun booting up. The green play icon against the guest name denotes that it is running. The Recent Tasks panel at the bottom also shows what has occurred. Now let us start counting and see what happens.

Screen Shot 2013-02-03 at 5.38.47 PM

Like clockwork, the Exchange server has come up. Note the addition to Recent Tasks and the green play icon against tc_002_Customer_A.

Now don’t take a green play icon as proof that everything is working. You should always log into each guest and check that the correct services are running. You may need to tweak timers as you chain different services and guest types.
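
A quick first pass before logging in is to see whether the expected TCP ports answer. The sketch below is only that: the IP addresses are placeholders from the documentation range and must be replaced with your own, the port list (53 for DNS, 389 for LDAP, 25 for SMTP) is my assumption about what matters on these two guests, and an open port does not prove the service behind it is healthy.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Placeholder addresses (192.0.2.0/24 is reserved for documentation).
CHECKS = {
    "tc_001_Customer_A": ("192.0.2.10", [53, 389]),  # DNS, LDAP
    "tc_002_Customer_A": ("192.0.2.20", [25]),       # SMTP
}

def failed_checks(checks, timeout: float = 3.0):
    """Return (guest, port) pairs that did not accept a connection."""
    failures = []
    for guest, (address, ports) in checks.items():
        for port in ports:
            if not port_open(address, port, timeout):
                failures.append((guest, port))
    return failures

# Example use once the addresses are real:
# for guest, port in failed_checks(CHECKS):
#     print(f"{guest}: port {port} is not answering")
```

If this reports a failure for the Exchange guest but not the AD DC, that is a hint the Startup Delay needs to be longer.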

This feature should be enabled at sites where only one ESXi host resides. If you are clustering and a guest can only run on a single host, then enable it for that particular guest. Ideally you should use vMotion in an HA ESXi cluster to move live machines to another ESXi host. This avoids an outage while maintenance is performed. Nevertheless, you should add this feature to your skill set as a virtualization administrator.

 

2 thoughts on “A Controlled Recovery is better than Chaos”

  1. jlgaddis says:

    … and don’t forget that after you create a new virtual machine you will also need to go back into the configuration and set the appropriate startup action for it (I’ve been bitten by that one once!).

    • Indeed. It is a PITA when they just chill in Manual. What is more annoying is that if you rely on this feature and decide to vMotion a guest across to another host in the cluster, you lose any individual settings; the guest picks up the new host’s global defaults and sits in Any Order.
