This blog looks to highlight a solution to ingress routing for multi-DC networks and their applications that have location independence using overlay protocols such as VXLAN. It takes advantage of events occurring within the infrastructure and manipulates the network accordingly. This is done through event state monitoring and the use of APIs.
Network Virtualisation brings administrators the ability to achieve this today. This is no longer in the realm of science fiction, nor is it as complex as brain surgery. To achieve the desired outcome of an active-active application topology with local east-west and north-south routing, there are no specific requirements of the network fabric.
Let’s build this application topology and this specific outcome using VMware’s NSX for vSphere 6.2.
Location Independence is a nice way to describe the Logical Switching that is provided by NSX. Using the VXLAN protocol it is possible to provide L2 connectivity to workloads that sit across L3 boundaries. VXLAN overlays provided by NSX come in two flavours: Logical Switch and Universal Logical Switch. Quite simply, the difference between the two is that a Logical Switch is bound to a single vCenter whilst a Universal Logical Switch is replicated between numerous vCenters.
Site-Local routing for east-west traffic
Normally applications comprise numerous workloads within an application tier. Whether it be a three-tier application or a two-tier one, there is a requirement for routing. Routing has traditionally been provided in a centralised manner by a gateway device like a core switch. Whilst suitable for connectivity, it leads to undesirable application hair-pinning and can induce some latency. Simplifying east-west traffic has been made a little easier with Distributed Logical Routing. Distributed Logical Routing with NSX provides the default gateway for each network local to the host a workload resides on. That means if a workload needs to route between two subnets it can do so within the hypervisor’s kernel space.
Site-local routing for north-south egress traffic
Traffic that leaves the environment can take advantage of local routes specific to the site. Egress optimisation comes in the form of a locale-id. In a multi-site topology the locale-id determines what routes are sent to the NSX controller. The controller has the locale-id impressed on the routes and only sends routes to the vSphere hosts with a matching locale-id. This ensures that traffic leaving the application topology takes the best egress route out of a site-local edge gateway.
If an NSX Manager is responsible for more than one site, the locale-id can be modified at the UDLR, cluster, or host level.
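To make the locale-id behaviour concrete, here is a minimal Python sketch of the filtering idea only. It is a toy model and not NSX code: the `Route` and `Host` classes, the `routes_for_host` function, the host names, the next-hop addresses, and the locale-id strings are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # destination network
    next_hop: str    # site-local edge gateway (made-up address)
    locale_id: str   # locale-id stamped on the route (placeholder string)


@dataclass(frozen=True)
class Host:
    name: str
    locale_id: str


def routes_for_host(host, routes):
    """Return only the routes whose locale-id matches the host's locale-id."""
    return [r for r in routes if r.locale_id == host.locale_id]


# Each site learns a default route from its own edge gateway.
routes = [
    Route("0.0.0.0/0", "10.0.1.1", "site-a"),
    Route("0.0.0.0/0", "10.0.2.1", "site-b"),
]
host_a = Host("esx-a-01", "site-a")
host_b = Host("esx-b-01", "site-b")

for host in (host_a, host_b):
    for route in routes_for_host(host, routes):
        print(f"{host.name}: {route.prefix} via {route.next_hop}")
```

Each host only ever sees the default route of its own site, which is the effect described above: egress traffic leaves via the site-local edge.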
Site-local routing for north-south ingress traffic
With a workload that resides in an active-active topology it is paramount that traffic destined for the application topology enters via the correct site. Traffic that is homed to the correct site ensures the client-server communication reaches the server via the optimal path. When it is not, traffic ingresses via Site-A and is routed across a datacenter interconnect to Site-B to reach the active workload.
Homing traffic to the correct site prevents this. The shortest route ensures that the next hop is the correct hop to reach where the active workload resides.
Because the infrastructure layer that runs the compute load has been disconnected from the network, both have been oblivious to events that occur in the other. If a vMotion occurs and a workload moves from one host to another, or from one site to another, the network has had to accommodate this. This has led to suboptimal designs and an overall increase in complexity, because network administrators have had to stretch networks (L2 VLAN DCIs) and deal with large L2 networks.
When the infrastructure and network are aware of each other, and events in one can influence the network topology, there is a marked decrease in network complexity.
Achieving site-local routing for north-south ingress traffic
Here is an approach to ingress optimisation that uses APIs and an application that monitors event state and can drive the network based on what is happening at the application infrastructure layer.
Site-A is advertising the Web front end of the application to the WAN. The 192.168.101.0/24 network is being advertised into the OSPF protocol that is currently being used between the HQ site and branches. The workload currently being accessed is 192.168.101.10. This machine is located in Site-A. Traffic destined for 192.168.101.10 will come through the WAN, find the route for 192.168.101.0/24, and see that the next hop is via Site-A.
If the machine moves from Site-A to Site-B with vMotion, the workload will be on the same subnet and the same logical network but in a different datacenter. If the routing table is not updated then traffic will ingress via Site-A, trombone back across to Site-B, and then potentially return out via Site-A or lead to asymmetric routing. This is bad with an Edge firewall in play!
To deliver the dream
The application (built against NSX and vCenter APIs) has flagged the Site-B NSX Edge to wait for an event. This particular event is vMotion. When it detects a vMotion event the following occurs:
- Waiting for vMotion event
- Extract Workload IP address
- Inject Workload IP as a static route (192.168.101.10/32) to the Site-B NSX Edge
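The three steps above can be sketched roughly as follows. This is not Babs's module and it makes no real NSX or vCenter API calls: `VMotionEvent`, `EdgeStub`, `on_vmotion`, the VM name, and the next-hop address are hypothetical stand-ins that only show the shape of the logic.

```python
from dataclasses import dataclass, field


@dataclass
class VMotionEvent:
    # Hypothetical event shape; a real module would read this from vCenter.
    vm_name: str
    vm_ip: str
    destination_site: str


@dataclass
class EdgeStub:
    # In-memory stand-in for an NSX Edge's static route table.
    site: str
    static_routes: dict = field(default_factory=dict)

    def add_static_route(self, network, next_hop, description):
        self.static_routes[network] = (next_hop, description)


def on_vmotion(event, edge, next_hop):
    """Extract the workload IP and inject it as a /32 static route."""
    if event.destination_site != edge.site:
        return None  # the workload did not land behind this edge
    host_route = f"{event.vm_ip}/32"
    edge.add_static_route(
        host_route, next_hop, f"Ingress Optimised for {event.vm_name}"
    )
    return host_route


site_b_edge = EdgeStub(site="Site-B")
# Waiting for the event is simulated by building the event by hand.
event = VMotionEvent(
    vm_name="Web01", vm_ip="192.168.101.10", destination_site="Site-B"
)
injected = on_vmotion(event, site_b_edge, next_hop="10.0.2.2")
print(injected, site_b_edge.static_routes[injected])
```

In a real deployment the stub methods would be replaced by calls to the NSX and vCenter APIs; the event filtering and the /32 construction are the parts that carry over.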
The edge is currently configured to redistribute static routes into OSPF, so this will automatically inject /32s into OSPF. While the workload is at the DR site or branch site (known as Site-B) it has an ingress-optimised path. The OSPF routing table at the WAN will show the route 192.168.101.10/32 via Site-B.
Within the UI itself you can see that there is a description against the route providing information about the host it represents. In the video later on the viewer will see “Ingress Optimised for Web01”. Traffic that requires a session with 192.168.101.10 will find it via Site-B!
If the workload moves back to Site-A via a vMotion event, the workload’s host route (192.168.101.10/32), which was injected as a static route on the Site-B NSX Edge, will be withdrawn and connectivity via Site-A is again represented by 192.168.101.0/24.
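Why the /32 wins comes down to longest-prefix matching: a router prefers the most specific route that contains the destination. The sketch below uses Python's standard `ipaddress` module to model the WAN's view before the move, while the host route is injected, and after it is withdrawn. The `lookup` helper and the dictionary-based route table are illustrative assumptions, not how an OSPF router stores its routes.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the next hop of the longest prefix that contains destination."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in route_table if dest in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return route_table[best]


subnet = ipaddress.ip_network("192.168.101.0/24")
host_route = ipaddress.ip_network("192.168.101.10/32")

# Only the /24 is advertised: everything enters via Site-A.
wan_routes = {subnet: "Site-A"}
before = lookup(wan_routes, "192.168.101.10")

# Workload moved to Site-B and the /32 was injected.
wan_routes[host_route] = "Site-B"
during = lookup(wan_routes, "192.168.101.10")
neighbour = lookup(wan_routes, "192.168.101.11")  # other hosts still use the /24

# Workload moved back and the /32 was withdrawn.
del wan_routes[host_route]
after = lookup(wan_routes, "192.168.101.10")

print(before, during, neighbour, after)
```

Only the moved workload is steered to Site-B; every other address in the subnet keeps following the /24 via Site-A.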
What about route-table explosion?
For a long time there has been the notion that if /32 host routes are used then the route table grows. If there are hundreds of workloads being protected by the application, there would be an explosion of routes within the routing table.
There is an extension being developed that will refactor routes based on the networks it sees. For example, if there were four workloads with the IP addresses .1, .2, .3, and .4 then, following the previous example, there would be four /32 routes injected into the NSX Edge as static routes and lifted into the WAN’s OSPF routing table. The refactoring in the application would advertise the best summary possible. Case in point: instead of four /32 routes there would be a summary route of 192.168.101.0/29 advertised via Site-B.
If .5 and .6 were also moved it would advertise two additional /32 host routes. Python and API goodness at its finest!
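The post does not say how the extension computes its summary, so the following is only a sketch of two possible approaches using Python's standard `ipaddress` module. The helper names `exact_summary` and `covering_prefix` are hypothetical. The first collapses host routes into the fewest prefixes that cover exactly those hosts; the second finds the smallest single prefix that contains all of them.

```python
import ipaddress


def exact_summary(addresses):
    """Collapse host addresses into the fewest prefixes that cover only them."""
    hosts = [ipaddress.ip_network(f"{addr}/32") for addr in addresses]
    return list(ipaddress.collapse_addresses(hosts))


def covering_prefix(addresses):
    """Return the smallest single prefix that contains every address."""
    ips = [ipaddress.ip_address(addr) for addr in addresses]
    net = ipaddress.ip_network(f"{ips[0]}/32")
    # Widen the prefix one bit at a time until every address fits inside it.
    while not all(ip in net for ip in ips):
        net = net.supernet()
    return net


moved = [f"192.168.101.{n}" for n in (1, 2, 3, 4)]
print(exact_summary(moved))
print(covering_prefix(moved))
```

For .1 to .4 the exact collapse yields three prefixes (.1/32, .2/31, .4/32), while the single covering prefix is 192.168.101.0/29. The /29 also contains .0, .5, .6, and .7, so a single summary trades precision for a smaller table; any address inside it that is still at Site-A would need its own more specific route to stay reachable via the right site.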
As Winston Churchill was famously quoted as saying:
I may be drunk, Router, but in the morning I’ll be optimised and you will still be ugly!
The Babs factor
My colleague Babs (Andrew Babakian) is a team member in Sydney, Australia. He, along with a couple of other SEs in my team, always chews the fat about how network virtualisation opens itself to new use cases, new approaches to old and new problems, and how it can bring world peace!
Babs has brains the size of a planet, two patents to his name, a couple of CCIEs, and is working towards a Doctorate. This module was built by him and solely uses NSX and vCenter APIs. It is purely based on vMotion events.
Babs also has a patent for this ingress module –
Dynamic Virtual Machine Network Policy For Ingress Optimization
A method of performing ingress traffic optimization for active/active data centers. The method creates site-specific grouping constructs for virtual machines that run applications that are advertised to the external networks. The site specific grouping constructs provide an abstraction to decouple virtual machines from traditional networks for common ingress network policies.
Kudos to him for writing this. His session at VMworld, NET4855 with Dimitri Desmidt received a 4.9/5.0 rating. This was (as I believe) the highest rated VMworld Session of 2015. It received a standing ovation due to the live demo. Babs and Dimitri must have sacrificed a few goats to the demo gods! Well done lads.
Demonstration of solution
This is a video that Babs made which takes the time to go through the entire environment, set it up, and explain what is occurring. It explains how the module influences and watches for the events it is programmed for and subsequently responds to inject a static route into OSPF.
tl;dr – it gets sexy around the 7 minute mark!
I tweeted last night about how awesome it is to work with people who happily share knowledge and want to teach other people. Imagine if the workload was actually a load balancer VIP and it dragged its application front end with it, along with the routing. How easy does application migration become?
Pretty damn cool that this is programmable networking and what a time to be working with Network Virtualisation.