Getting probed. The ACE Cisco way!

Now with a title such as this you’d almost think I was going to start talking about Cisco’s awesome prices and comparative rates but alas, it is part three of my dive into load balancers.

Previously we discussed what a load balancer can offer and a simple one-arm configuration. That’s great! Now let’s look at how to keep your servers up, and not up and down like a merry-go-round.

Health probes are one way to ensure a server is still servicing requests correctly. A probe lets the load balancer know that the server is still there to answer requests, or conversely that the party’s over and it’s out of action. There are some important things to understand when talking about health probes, namely timers: how often, how long and what is being checked. Sounds like a horrible day at the doctor’s office. Or for some – the daily grind!

Now – back on topic Anthony! Health probes are easily set up but can be a headache. I mentioned that timers are important and I will discuss why. Health checks place a server in or out of service, which allows the ACE to make informed load balancing decisions. The two states are Passed – denoting a valid response, and Failed – indicating a server which has failed to provide a valid response after a specified number of retries.

There are over 1000 unique configurations for probes. ICMP, TCP, HTTP and more! Oh the possibilities. This allows compatibility with a raft of services and software.

Let’s get probin’

First we configure the health probe with its name, type of probe, and the attributes associated to it.

probe http ACE-PROBE-HTTP-GET
 interval 10
 faildetect 2
 passdetect interval 30
 passdetect count 2
 request method get url /index.html
 expect status 200 200

Note the following settings above. First we have defined the probe type and given it a name. Next the interval command specifies how frequently the ACE sends probes to a server that is marked as passed. This is important. The Goldilocks rule applies here. Timers that are too aggressive can load the server and cause a failure. Timers that are too slow will not be efficient in detecting a failure. Measured in seconds, the interval defaults to 15. Here we have set it to 10 seconds.

Faildetect sets a counter that lets you choose how many consecutive probes can fail before a server is marked as failed and placed out of service. The default is 3 and here we have manually defined 2.
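To get a feel for what interval and faildetect mean together, here is a rough back-of-the-envelope sketch in Python. It assumes probes fire exactly every interval seconds and that a dead server fails each probe straight away (no probe timeout is modelled), so treat the numbers as ballpark figures. The function name is mine and has nothing to do with the ACE itself.

```python
def failure_detection_window(interval: int, faildetect: int) -> tuple[int, int]:
    """Return (best_case, worst_case) seconds between a server dying and
    the probe marking it failed.

    Assumption: one probe every `interval` seconds, and a dead server
    fails every probe immediately (timeouts are ignored).
    """
    # Best case: the server dies just before a probe is due.
    best = interval * (faildetect - 1)
    # Worst case: the server dies just after a probe succeeded.
    worst = interval * faildetect
    return best, worst


print(failure_detection_window(10, 2))  # the values used in the probe above
print(failure_detection_window(15, 3))  # the defaults mentioned in this post
```

With the values from this post a dead server could keep receiving traffic for somewhere between 10 and 20 seconds, which is the trade-off to weigh against the extra probe load.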

Passdetect interval is the next interesting feature and it needs to be set correctly. When a server has failed, the ACE waits for the passdetect interval between the probes it sends to that server. By default that is 60 seconds. Too long for me. 30 seconds will suit. To put the server back in service it must pass 2 consecutive health checks, as defined with passdetect count. As far as I understand it, those recovery probes are spaced at the passdetect interval of 30 seconds rather than the regular interval of 10 seconds.
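The interplay between the four timers is easier to see as a tiny state machine. The Python below is only a toy model of my understanding, not how the ACE is implemented: the class name, field names and state strings are made up, and it assumes a failed server is probed at the passdetect interval until it has passed enough consecutive checks.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Toy pass/fail bookkeeping for one probe against one real server."""
    interval: int = 10             # seconds between probes while PASSED
    faildetect: int = 2            # consecutive failures before FAILED
    passdetect_interval: int = 30  # seconds between probes while FAILED
    passdetect_count: int = 2      # consecutive successes before PASSED
    state: str = "PASSED"
    streak: int = 0                # consecutive results contradicting the state

    def record(self, success: bool) -> int:
        """Record one probe result and return the wait before the next probe."""
        contradicts = (self.state == "PASSED") != success
        self.streak = self.streak + 1 if contradicts else 0
        threshold = self.faildetect if self.state == "PASSED" else self.passdetect_count
        if self.streak >= threshold:
            # Enough contradicting results in a row: flip the state.
            self.state = "FAILED" if self.state == "PASSED" else "PASSED"
            self.streak = 0
        return self.interval if self.state == "PASSED" else self.passdetect_interval


tracker = ProbeTracker()
for result in (True, False, False, True, True):
    wait = tracker.record(result)
    print(result, tracker.state, wait)
```

Feeding it a success, two failures and two successes walks the server from passed to failed and back again, with the wait switching between 10 and 30 seconds along the way.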

The way I am checking this server is through an HTTP GET request. This allows me to confirm that IIS/Apache is working. If I were just to ping the real server IP, IIS/Apache could fall over while the server still answers pings, so the ACE would keep sending it HTTP requests and yield unsatisfactory results.

The next important field is expect status 200 200. By default the device will not have any expected statuses set. This means no response code counts as a success and the probe will fail. Defining what to expect allows the probe to register a pass! The setting takes a low and a high value, so you can expect a whole range of HTTP response codes.
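The range check described above boils down to very little logic. Here is a minimal Python sketch of it, assuming each expect status line contributes one (low, high) pair. The function and its arguments are hypothetical names for illustration only.

```python
def status_passes(status: int, expected: list[tuple[int, int]]) -> bool:
    """Return True if `status` falls inside any configured (low, high) range.

    With no ranges configured nothing can match, which mirrors the
    behaviour described above: no expected status means the probe fails.
    """
    return any(low <= status <= high for low, high in expected)


print(status_passes(200, [(200, 200)]))  # the probe in this post
print(status_passes(404, [(200, 200)]))  # page missing
print(status_passes(200, []))            # nothing configured
```

A wider range such as (200, 299) would accept any 2xx response, which can be handy when the page you probe returns something other than a plain 200.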

My take

The importance of health probes is paramount to achieving HA. Ineffective timers can cause headaches. What I have shown here is but a fraction of the hundreds of possibilities that are out there. Just remember, if you are being aggressive with the probe you are placing on a server farm, that there could be hundreds or thousands of simultaneous probes occurring. This will place more load onto the ACE’s finite resources.
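To put a rough number on that load, the Python sketch below estimates how many probes per second the ACE has to generate. It assumes every real server is healthy and probed at the same interval. The function name and the larger example figures are invented for illustration and are not measurements from my lab.

```python
def probes_per_second(real_servers: int, probes_per_server: int, interval: int) -> float:
    """Average probe rate the load balancer must sustain.

    Assumption: every server is marked passed and every probe uses the
    same interval (in seconds).
    """
    return real_servers * probes_per_server / interval


print(probes_per_second(4, 1, 10))     # the WWW farm with the probe above
print(probes_per_second(1000, 2, 2))   # a made-up large farm with aggressive timers
```

The small farm barely registers, while the made-up large farm with a 2 second interval works out to a thousand probes every second before a single client request has been balanced.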

More resources

Interval, time, and count are very important settings to tweak to get the health check “just right”. A particular post from Tony Bourke sums up the importance of timings and intervals.

Health Checking On Load Balancers: More Art Than Science

For more information on health checks and load balancing tips and tricks, please go check out one of the best CCIE DC candidate’s blogs! Lots of ACE goodies as well as Nexus and FC resources. Oh and it’s full of memes!

http://datacenteroverlords.com

Previous ACE Articles by @pandom_

Cisco ACE : I’ve not gone green and flipped a table yet.
Cisco Application Control Engine 4710

Cisco ACE : I’ve not gone green and flipped a table yet.

I swear people have had bad luck or I am just lucky. Maybe I am not testing my ACE or using them to their full capacity. So far! Phwoar. What a device. I’ve not hulked, raged or got angry at it. I’ve seen people cuss and curse and even grow grey hairs before my eyes.  I myself have not done any of it! Medal? Maybe. Serious load? Maybe not 🙂

On the heels of my previous article regarding Cisco ACE load balancers, I am following up with a basic configuration to get your ACE servicing, now that we have established the load balancer’s role in the network and how it works to deliver increased uptime and performance.

** Disclaimer – I have worked on these in a lab environment and dealt with a handful in a production space. I do not profess to be a rock star and below explanations have been made with my best efforts and understandings. Feel free to point out any major no no’s or inconsistencies. **

Remember to allow connectivity first.

In the admin context – change this after if you want to disable http/https or other access methods.

access-list ALL line 8 extended permit ip any any
access-list RMT-MGMT-ACL line 8 extended permit ip any any
access-list RMT-MGMT-ACL line 16 extended permit icmp any any

 

Virtual Contexts

The virtualized environment is divided into objects called contexts. Each context behaves like an independent ACE appliance with its own policies, interfaces, domains, server farms, real servers, and administrators. While the server load balancing design doesn’t require multiple contexts for successful implementation, the ACE 4710 appliance is provisioned with one user context on top of the default Admin context. This approach provides better implementation flexibility in the future. One of the possible features that such setup makes available is active/active implementation with load sharing between redundant appliances. Active/active mode of operation requires multiple user contexts to be provisioned and started, therefore this option is left for potential expansion in the future. Each user context is initially defined in the Admin context, which contains the basic settings for each virtual device or context. Each context has a number of SVIs associated with it for communication.

ACE4710-01/Admin# sh context
Number of Contexts = 3
Name: Admin , Id: 0
Config count: 137
Description:
Resource-class: default
FT Auto-sync running-cfg configured state: enabled
FT Auto-sync running-cfg actual state: enabled
FT Auto-sync startup-cfg configured state: enabled
FT Auto-sync startup-cfg actual state: enabled
Name: WWW-CXT , Id: 1
Config count: 113
Description: WWW Frontend Context
Resource-class: WWW-RC
Vlans: Vlan100-101
FT Auto-sync running-cfg configured state: enabled
FT Auto-sync running-cfg actual state: enabled
FT Auto-sync startup-cfg configured state: enabled
FT Auto-sync startup-cfg actual state: enabled
Name: DNS-CXT , Id: 2
Config count: 167
Description: DNS Lookup Context
Resource-class: DNS-RC
Vlans: Vlan110-111
FT Auto-sync running-cfg configured state: enabled
FT Auto-sync running-cfg actual state: enabled
FT Auto-sync startup-cfg configured state: enabled
FT Auto-sync startup-cfg actual state: enabled

** Note here that FT auto-sync shows that the running config and startup config are being synchronised between the members of the fault-tolerant group. **

Resource-Classing – Class those resources boy – maximise your balanced load.

One part of having contexts is the ability to allocate a share of the physical device’s resources to a virtual context. In our example below we give 50 percent of total chassis throughput to the WWW context and 30 percent to the DNS context. This leaves 20 percent for the Admin base context so the device does not become overloaded.

resource-class DNS-RC
 limit-resource all minimum 20.00 maximum unlimited
 limit-resource mgmt-connections minimum 20.00 maximum unlimited
 limit-resource sticky minimum 20.00 maximum unlimited
 limit-resource rate mgmt-traffic minimum 20.00 maximum unlimited
 limit-resource throughput minimum 30.00 maximum equal-to-min
resource-class WWW-RC
 limit-resource all minimum 20.00 maximum unlimited
 limit-resource mgmt-connections minimum 20.00 maximum equal-to-min
 limit-resource sticky minimum 20.00 maximum equal-to-min
 limit-resource rate mgmt-traffic minimum 20.00 maximum equal-to-min
 limit-resource throughput minimum 50.00 maximum equal-to-min

THO-EST-SLB-01/Admin# sh resource allocation | begin throughput
---------------------------------------------------------------------------
Parameter                 Min       Max       Class
---------------------------------------------------------------------------
throughput                0.00%     80.00%    default
                          50.00%    50.00%    WWW-RC
                          30.00%    30.00%    DNS-RC

Because these devices are in a test lab and I am generating my own traffic, these values should not be taken as gospel, and those with far more knowledge of ACE and SLB principles should comment here if you read this. I’d love to know what DC gurus would recommend.
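The percentage split in the resource classes above can be sanity-checked with a few lines of Python. This is only arithmetic on the guaranteed minimums, assuming they must add up to no more than 100 percent. The function name is made up and it is not modelling how the ACE enforces the limits.

```python
def remaining_share(reserved: dict[str, float]) -> float:
    """Return the percentage left over after all guaranteed minimums.

    Raises ValueError if the classes reserve more than the whole device.
    """
    total = sum(reserved.values())
    if total > 100.0:
        raise ValueError(f"resource classes reserve {total}% in total")
    return 100.0 - total


# Throughput minimums from WWW-RC and DNS-RC in this post.
print(remaining_share({"WWW-RC": 50.0, "DNS-RC": 30.0}))
```

With 50 and 30 percent reserved, 20 percent is left for everything else, which matches the figure I wanted to keep back for the Admin context.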

Fault Tolerance and FT Groups

It is possible to share contexts between devices. This gives us failover if an ACE drops. We get connection redundancy for traffic passing to the servers, as well as device redundancy that lets us continue servicing requests if we need to update or lose an ACE peer.

FT Example

ft interface vlan 100
 ip address 169.254.0.1 255.255.255.252
 peer ip address 169.254.0.2 255.255.255.252
 no shutdown
hostname ACE4710-01
peer hostname ACE4710-02

ft peer 1
 heartbeat interval 300
 heartbeat count 10
 ft-interface vlan 100
ft group 10
 peer 1
 peer priority 110
 associate-context Admin
 inservice
ft group 20
 peer 1
 associate-context WWW-CXT
 inservice
ft group 30
 peer 1
 associate-context DNS-CXT
 inservice
shared-vlan-hostid 1
peer shared-vlan-hostid 2

Here my FT VLAN allows keep-alives to be passed between the two peers. We define the device hostname and the peer’s hostname, then we set up peer 1 with how regular the FT heartbeats are and the number that must be missed before failure. Then I assign groups to associate contexts to. This allows sharing of context configuration. Then I set the remote peer id and voila! Friendship and Rainbows!
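For a rough idea of how quickly a dead peer is noticed with the heartbeat values above, here is the arithmetic in Python. I am assuming the heartbeat interval is given in milliseconds, which is my reading of the ACE documentation, and that the peer is declared down once count heartbeats in a row have been missed. Check both against your software version. The function name is hypothetical.

```python
def ft_peer_timeout_ms(heartbeat_interval_ms: int, heartbeat_count: int) -> int:
    """Approximate time before a silent peer is declared down.

    Assumption: the interval is in milliseconds and `heartbeat_count`
    consecutive missed heartbeats trigger the failure.
    """
    return heartbeat_interval_ms * heartbeat_count


print(ft_peer_timeout_ms(300, 10))  # values from the ft peer config above
```

Under those assumptions the config above gives roughly three seconds before failover kicks in.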

 

Farmville!

Let’s start by discussing server farms and defining real servers. Below we define the following real servers in the ACE.

rserver host WWW01
 ip address 192.168.10.10
 inservice
rserver host WWW02
 ip address 192.168.10.11
 inservice
rserver host WWW03
 ip address 192.168.10.12
 inservice
rserver host WWW04
 ip address 192.168.10.13
 inservice
rserver host DNS01
 ip address 192.168.20.10
 inservice
rserver host DNS02
 ip address 192.168.20.11
 inservice
rserver host DNS03
 ip address 192.168.20.12
 inservice
rserver host DNS04
 ip address 192.168.20.13
 inservice

One Arm and Virtual Server Farms

Simple enough to define a real server. The important trick to remember is inservice. Treat it like no shut! Now that we have defined our real servers we need to nest them inside a virtual server farm.
This server farm sits behind the IP that is presented to the world. It will distribute requests using round-robin load sharing and service them accordingly.

serverfarm host WWW-FRONTEND-SF
 predictor roundrobin
 rserver WWW01
 inservice
 rserver WWW02
 inservice
 rserver WWW03
 inservice
 rserver WWW04
 inservice
serverfarm host DNS-SF
 predictor roundrobin
 rserver DNS01
 inservice
 rserver DNS02
 inservice
 rserver DNS03
 inservice
 rserver DNS04
 inservice
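If round-robin is new to you, the Python below is a small sketch of the idea: hand out real servers in order and skip any that are out of service. It is a toy model rather than the ACE's actual predictor, and the class and method names are my own.

```python
class RoundRobinFarm:
    """Toy server farm that hands out in-service real servers in turn."""

    def __init__(self, rservers):
        self.rservers = list(rservers)
        self.in_service = set(self.rservers)  # everything starts inservice
        self._next = 0                        # index of the next candidate

    def pick(self):
        """Return the next in-service server, or None if none are available."""
        for _ in range(len(self.rservers)):
            candidate = self.rservers[self._next]
            self._next = (self._next + 1) % len(self.rservers)
            if candidate in self.in_service:
                return candidate
        return None


farm = RoundRobinFarm(["WWW01", "WWW02", "WWW03", "WWW04"])
print([farm.pick() for _ in range(5)])

farm.in_service.discard("WWW02")  # e.g. a health probe marked it failed
print([farm.pick() for _ in range(3)])
```

Taking WWW02 out of the set is the model's stand-in for a failed probe or a manual no inservice: the rotation simply carries on with whoever is left.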

Simple enough there. Now comes the head scratching part!

Map inside my Map so we can discover while we discover.

Follow this key and decipher! It does make sense – trust me!

Define the class-map WWW-CMAP. This matches traffic destined for the listed virtual IP. The multi-match policy-map MATCH-REQUEST-ACTION-PMAP matches on WWW-CMAP first, then applies the actions listed under that class. One of those actions hands the traffic to the second policy-map, LB-WWW-PMAP, which then assigns it to a server farm.

** Disclaimer – As far as my little mind understands, this is how it all works. Feel free to correct. I have been reading a lot and there isn’t much info out there! **

class-map match-all WWW-CMAP
 2 match virtual-address 192.168.10.1 tcp eq www
policy-map multi-match MATCH-REQUEST-ACTION-PMAP
 class WWW-CMAP
 loadbalance vip inservice
 loadbalance policy LB-WWW-PMAP
 loadbalance vip icmp-reply
policy-map type loadbalance http first-match LB-WWW-PMAP
 class class-default
 serverfarm WWW-FRONTEND-SF 

class-map match-all DNS-CMAP
 3 match virtual-address 192.168.20.1 tcp eq dns

policy-map multi-match MATCH-REQUEST-ACTION-PMAP
 class DNS-CMAP
 loadbalance vip inservice
 loadbalance policy LB-DNS-PMAP
 loadbalance vip icmp-reply 

policy-map type loadbalance http first-match LB-DNS-PMAP
 class class-default
 serverfarm DNS-SF
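The chain in the config above (class-map, then multi-match policy, then load-balance policy, then server farm) can be pictured as three lookups. The Python below is only a mental model built from the names in this post. The dictionaries, the function and the port numbers (80 for www, 53 for dns) are my assumptions, and none of it is ACE syntax.

```python
# class-map name -> (virtual IP, protocol, port) it matches
CLASS_MAPS = {
    "WWW-CMAP": ("192.168.10.1", "tcp", 80),
    "DNS-CMAP": ("192.168.20.1", "tcp", 53),
}
# multi-match policy: class-map name -> load-balance policy to apply
MULTI_MATCH_PMAP = {
    "WWW-CMAP": "LB-WWW-PMAP",
    "DNS-CMAP": "LB-DNS-PMAP",
}
# load-balance policy -> server farm used by its class-default
LB_POLICIES = {
    "LB-WWW-PMAP": "WWW-FRONTEND-SF",
    "LB-DNS-PMAP": "DNS-SF",
}


def resolve_serverfarm(dst_ip: str, proto: str, port: int):
    """Follow class-map -> multi-match policy -> LB policy -> server farm."""
    for cmap, vip in CLASS_MAPS.items():
        if vip == (dst_ip, proto, port):
            return LB_POLICIES[MULTI_MATCH_PMAP[cmap]]
    return None  # traffic that matches no VIP is not load balanced


print(resolve_serverfarm("192.168.10.1", "tcp", 80))
print(resolve_serverfarm("192.168.20.1", "tcp", 53))
print(resolve_serverfarm("10.0.0.1", "tcp", 80))
```

Reading the config with this picture in mind, each extra VIP is just one more entry in each of the three tables.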

Alright – now because this setup is a one-armed ACE install we need to point a static route back to the SVI. Now our traffic goes to a server of the ACE’s choosing, dealt out round-robin style.

Ant’s thoughts

So far so good. The ACE for me has been reliable and as customisable as I need. The next little post will cover the health checking probes which allow a server farm to mark a real server offline. Great if you need to upgrade, install, change or fix. It’s a lot to take in but I am enjoying what this product can do.

Oh and expect a rant towards programmers soon with how they put data onto the wire and think that they know best.