Ingress Optimisation with NSX for vSphere

This blog post highlights a solution to ingress routing for multi-DC networks whose applications have location independence through overlay protocols such as VXLAN. The solution takes advantage of events occurring within the infrastructure and manipulates the network accordingly, using event-state monitoring and APIs.

Network Virtualisation

Network Virtualisation brings administrators the ability to achieve this today. This is no longer in the realm of science fiction, nor is it as complex as brain surgery. To achieve the desired outcome of an active-active application topology with local east-west and north-south routing, there are no specific requirements of the network fabric.

Let's build this application topology, and achieve this specific outcome, using VMware's NSX for vSphere 6.2.

Switching

Location independence is a nice way to describe the Logical Switching provided by NSX. Using the VXLAN protocol, it is possible to provide L2 connectivity to workloads across L3 boundaries. VXLAN overlays provided by NSX come in two flavours – the Logical Switch and the Universal Logical Switch. The difference is quite simple: a Logical Switch is bound to a single vCenter, whilst a Universal Logical Switch is replicated between numerous vCenters.
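
For illustration, here is a minimal sketch of creating a Logical Switch programmatically, assuming the NSX-v REST endpoint POST /api/2.0/vdn/scopes/<scope-id>/virtualwires (the scope being a transport zone; POSTing against a universal transport zone on the primary NSX Manager would yield a Universal Logical Switch instead). The hostname, credentials, and scope ID are placeholders.

```python
# Hedged sketch: create a Logical Switch via the NSX-v REST API.
# The endpoint, scope ID, hostname and credentials are assumptions/placeholders.
import requests

NSX_MGR = "https://nsxmgr-site-a.example.local"   # placeholder
SCOPE_ID = "vdnscope-1"                           # transport zone ID (placeholder)

payload = """
<virtualWireCreateSpec>
  <name>Web-Tier-LS</name>
  <tenantId>pandom</tenantId>
  <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</virtualWireCreateSpec>
""".strip()

resp = requests.post(f"{NSX_MGR}/api/2.0/vdn/scopes/{SCOPE_ID}/virtualwires",
                     data=payload,
                     headers={"Content-Type": "application/xml"},
                     auth=("admin", "VMware1!"), verify=False)
print(resp.text)   # the ID of the new logical switch, e.g. virtualwire-10
```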

[Figure: Cross-site topology]

Site-Local routing for east-west traffic

Applications are normally comprised of numerous workloads within an application tier. Whether it is a three-tier or two-tier application, there is a requirement for routing. Routing has traditionally been provided in a centralised manner by a gateway device such as a core switch. Whilst suitable for connectivity, this leads to undesirable traffic hairpinning for the application and can induce latency. Simplifying east-west traffic has been made easier with Distributed Logical Routing. Distributed Logical Routing with NSX provides the default gateway for each network locally on the host where a workload resides. That means if a workload needs to route between two subnets, it can do so within the hypervisor's kernel space.

Site-local routing for north-south egress traffic

Routing that leaves the environment can take advantage of local routes specific to the site. Egress optimisation comes in the form of a locale-id. In a multi-site topology, the locale-id determines which routes are sent to the NSX Controller. The controller stamps the locale-id on the routes and only sends routes to the vSphere hosts with a matching locale-id. This ensures that traffic leaving the application topology takes advantage of the best egress route out of a site-local edge gateway.
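
As a purely illustrative model (this is not NSX code), the locale-id behaviour can be pictured as a filter: the controller only programs a route onto hosts whose locale-id matches the locale-id stamped on that route.

```python
# Toy model of locale-id based route filtering (illustration only).
routes = [
    {"prefix": "0.0.0.0/0", "next_hop": "esg-site-a", "locale_id": "site-a"},
    {"prefix": "0.0.0.0/0", "next_hop": "esg-site-b", "locale_id": "site-b"},
]

def routes_for_host(host_locale_id):
    """Return only the routes whose locale-id matches the host's locale-id."""
    return [r for r in routes if r["locale_id"] == host_locale_id]

print(routes_for_host("site-b"))
# Site-B hosts only receive the default route via their local edge gateway.
```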

[Figure: Local site egress optimisation through locale-id]

If an NSX Manager is responsible for more than one site, the locale-id can be modified at the UDLR, cluster, or host level.

Site-local routing for north-south ingress traffic

With a workload that resides in an active-active topology, it is paramount that traffic destined for the application topology enters via the correct site. Traffic that is homed to the correct site ensures the client-server communication reaches the server via the most optimal path. When it cannot, the following occurs:

[Figure: Sad trombone!]

This prevents traffic from ingressing via Site-A and routing across the datacenter interconnect to reach the active workload in Site-B. The most specific route ensures that the next hop is the correct hop for the site where the active workload resides.

Historically there has been a disconnect between the infrastructure layer that runs the compute load and the network: both have been oblivious to events that occur in the other. If a vMotion occurs and a workload moves from one host to another, or one site to another, the network has had to accommodate this. This has led to suboptimal designs and an overall increase in complexity, because network administrators had to stretch networks (L2 VLAN DCIs) and deal with large L2 domains.

When the infrastructure and the network are aware of each other, and the network topology can be influenced based on this awareness, there is a marked decrease in network complexity.

Achieving site-local routing for north-south ingress traffic

Here is an approach to ingress optimisation that uses APIs and an application that monitors event state, driving the network based on what is happening at the application infrastructure layer.

[Figure: The application topology]

Site-A is advertising the web front end of the application to the WAN. The 192.168.101.0/24 network is being advertised into OSPF, which is currently being used between the HQ site and the branches. The workload currently being accessed is 192.168.101.10. This machine is located in Site-A. Traffic destined for 192.168.101.10 will come through the WAN, find the route for 192.168.101.0/24, and see that the next hop is via Site-A.

If the machine is vMotioned from Site-A to Site-B, the workload will remain on the same subnet and the same logical network, but in a different datacenter. If the routing table is not updated, traffic will ingress via Site-A and trombone back across to Site-B, then potentially return out via Site-A or lead to asymmetric routing. This is bad with an Edge firewall in play!

[Figure: Trying to avoid the trombone!]

To deliver the dream


The application (built against the NSX and vCenter APIs) has been pointed at the Site-B NSX Edge and waits for an event – in this case a vMotion. When it detects a vMotion event the following occurs (a code sketch follows below):

  • Wait for a vMotion event
  • Extract the workload's IP address
  • Inject the workload's IP as a static route (192.168.101.10/32) on the Site-B NSX Edge

[Figure: vMotion detected – inject the routes!]
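
Below is a minimal sketch of that flow, assuming pyVmomi for the vCenter event subscription and the NSX-v Edge static-routing API (/api/4.0/edges/<edge-id>/routing/config/static). Hostnames, credentials, the edge ID and the next-hop address are placeholders, and this is not Babs' actual module.

```python
# Hedged sketch: watch for vMotion events and inject a /32 static route
# on the Site-B NSX Edge. Names, IDs and credentials are placeholders.
import ssl
import time

import requests
from pyVim.connect import SmartConnect
from pyVmomi import vim

VCENTER = "vcenter-site-b.example.local"          # placeholder
NSX_MGR = "https://nsxmgr-site-b.example.local"   # placeholder
EDGE_ID = "edge-2"                                # Site-B NSX Edge (placeholder)
NEXT_HOP = "192.168.10.2"                         # UDLR address on the Edge transit segment (assumed)

si = SmartConnect(host=VCENTER, user="administrator@vsphere.local", pwd="VMware1!",
                  sslContext=ssl._create_unverified_context())

# Event collector that only returns completed vMotion events.
spec = vim.event.EventFilterSpec(eventTypeId=["VmMigratedEvent"])
collector = si.content.eventManager.CreateCollectorForEvents(filter=spec)

def inject_host_route(ip, vm_name):
    """PUT a /32 static route for the migrated workload onto the Site-B Edge.

    Note: this PUT replaces the Edge's static routing config; a real module
    would GET the existing config and merge the new route into it."""
    body = (
        "<staticRouting><staticRoutes><route>"
        f"<network>{ip}/32</network>"
        f"<nextHop>{NEXT_HOP}</nextHop>"
        f"<description>Ingress Optimised for {vm_name}</description>"
        "</route></staticRoutes></staticRouting>"
    )
    requests.put(f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/routing/config/static",
                 data=body, headers={"Content-Type": "application/xml"},
                 auth=("admin", "VMware1!"), verify=False)

while True:
    for event in collector.ReadNextEvents(50):   # 1. wait for a vMotion event
        vm = event.vm.vm                         # 2. the workload that moved
        ip = vm.guest.ipAddress                  #    and its IP address
        if ip:
            inject_host_route(ip, vm.name)       # 3. inject the /32 static route
    time.sleep(10)
```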

The edge is currently configured to redistribute static routes into OSPF, so the /32 is automatically injected into OSPF. While the workload is at the DR or branch site (known as Site-B), it has an ingress-optimised path. The OSPF routing table at the WAN will see the route 192.168.101.10/32 via Site-B.

[Figure: /32 seen via Site-B in the WAN routing table]

Within the UI itself you can see that there is a description against the route providing information about the host it represents. In the video later on, the viewer will see “Ingress Optimised for Web01”. Traffic that requires a session with 192.168.101.10 will find it via Site-B!
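
Why does the /32 win while Site-A is still advertising the covering /24? Upstream routers select the longest (most specific) matching prefix. Here is a purely illustrative Python model of that lookup – not router code:

```python
# Illustrative longest-prefix-match lookup (not router code).
import ipaddress

rib = {
    ipaddress.ip_network("192.168.101.0/24"): "via Site-A",
    ipaddress.ip_network("192.168.101.10/32"): "via Site-B (injected /32)",
}

def lookup(destination):
    """Return the next hop of the most specific matching prefix."""
    dst = ipaddress.ip_address(destination)
    matches = [net for net in rib if dst in net]
    return rib[max(matches, key=lambda net: net.prefixlen)]

print(lookup("192.168.101.10"))   # via Site-B (injected /32)
print(lookup("192.168.101.20"))   # via Site-A -- only the /24 matches
```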


If the workload moves back to Site-A via a vMotion event, the workload's host route (192.168.101.10/32), the static route injected on the Site-B NSX Edge, will be withdrawn, and connectivity via Site-A is once again represented by 192.168.101.0/24.
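
The withdrawal step can be sketched the same way, again assuming the NSX-v static-routing endpoint and reusing the NSX_MGR and EDGE_ID placeholders from the sketch above: read back the Edge's static routes, drop the workload's /32, and write the config back.

```python
# Hedged sketch: withdraw the injected /32 when the workload returns to Site-A.
import xml.etree.ElementTree as ET
import requests

URL = f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/routing/config/static"

def withdraw_host_route(ip):
    # Fetch the current static routing config from the Site-B Edge.
    cfg = ET.fromstring(requests.get(URL, auth=("admin", "VMware1!"),
                                     verify=False).text)
    routes = cfg.find("staticRoutes")
    if routes is not None:
        for route in list(routes):
            if route.findtext("network") == f"{ip}/32":
                routes.remove(route)             # drop the workload's /32
    # Write the trimmed config back; 192.168.101.0/24 via Site-A now wins.
    requests.put(URL, data=ET.tostring(cfg),
                 headers={"Content-Type": "application/xml"},
                 auth=("admin", "VMware1!"), verify=False)
```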


What about route-table explosion?

For a long time there has been the notion that using /32 host routes makes the routing table grow. If there were hundreds of applications being protected by the application, there would be an explosion of routes within the routing table.

There is an extension being developed that will be able to refactor routes based on the networks it sees. For example, if there were four workloads with the IP addresses .1, .2, .3, and .4, then by the previous example there would be four /32 routes injected into the NSX Edge as static routes and lifted into the WAN's OSPF routing table. The refactoring in the application would advertise the best summary possible: instead of four /32 routes there would be a summary route advertised via Site-B of 192.168.101.0/29.

If .5 and .6 were also moved, it would advertise two additional /32 host routes. Python and API goodness at its finest!
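
Here is a hedged sketch of that summarisation idea using only Python's ipaddress module: compute the smallest single prefix that covers all of the moved /32s. For .1 through .4 this yields 192.168.101.0/29, matching the example above; the extension's actual summarisation policy may well differ.

```python
# Hedged sketch: smallest single prefix covering a set of /32 host routes.
import ipaddress

def covering_supernet(host_ips):
    """Return the smallest single network that contains every host IP."""
    nets = [ipaddress.ip_network(f"{ip}/32") for ip in host_ips]
    summary = nets[0]
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()   # widen the prefix one bit at a time
    return summary

moved = ["192.168.101.1", "192.168.101.2", "192.168.101.3", "192.168.101.4"]
print(covering_supernet(moved))        # 192.168.101.0/29
```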

As Winston Churchill was famously quoted as saying:

I may be drunk, Router, but in the morning I’ll be optimised and you will still be ugly!

The Babs factor

My colleague Babs (Andrew Babakian) is a team member in Sydney, Australia. He, along with a couple of other SEs in my team, always chews the fat about how network virtualisation opens itself up to new use cases, new approaches to old and new problems, and how it can bring world peace!

Babs has a brain the size of a planet, two patents to his name, a couple of CCIEs, and is working towards a Doctorate. This module was built by him and uses solely the NSX and vCenter APIs. It is purely based on vMotion events.

Babs also has a patent for this ingress module –

Dynamic Virtual Machine Network Policy For Ingress Optimization
United States
A method of performing ingress traffic optimization for active/active data centers. The method creates site-specific grouping constructs for virtual machines that run applications that are advertised to the external networks. The site specific grouping constructs provide an abstraction to decouple virtual machines from traditional networks for common ingress network policies.

Kudos to him for writing this. His VMworld session NET4855, with Dimitri Desmidt, received a 4.9/5.0 rating. This was (I believe) the highest-rated VMworld session of 2015. It received a standing ovation due to the live demo. Babs and Dimitri must have sacrificed a few goats to the demo gods! Well done, lads.

Demonstration of solution

This is a video Babs made which walks through the entire environment, sets it up, and explains what is occurring. It shows how the module watches for the events it is programmed for and subsequently responds by injecting a static route into OSPF.

tl;dr – it gets sexy around the 7 minute mark!

Summary

I tweeted last night about how awesome it is to work with people who happily share knowledge and want to teach other people. Babs and the other SEs in my team constantly show how network virtualisation opens itself up to new use cases and new approaches to old problems. Imagine if the workload were actually a load balancer VIP and it dragged its application front end, along with the routing, with it. How easy does application migration become?

Pretty damn cool that this is programmable networking. What a time to be working with Network Virtualisation.

11 thoughts on “Ingress Optimisation with NSX for vSphere”

  1. +vRay says:

    Hi there Anthony, one question:
    Why do we need Python scripting to add /32 routes after the migration of a VM? Is this something required for egress optimization?

    thanks

    +vRay

    • Griffin Ellis says:

      It isn’t something required for egress optimization, only for ingress optimization. VMware has not implemented any sort of ingress optimization baked into NSX 6.2, although I’ve been informed this is on the roadmap.

      • pandom says:

        Thanks for the comment Griffin. Nothing in product today – the use of the script is an example of API consumption.

        Also – I’d love to see that version of the roadmap!

    • pandom says:

      It is not required for egress to work. If you’re a workload or an endpoint on the other side of the network, you need to look at what your routing table sees. If 192.168.103.0/24 is being advertised by Site A, that means you enter there, only to then be routed back through to the single VM on Site B.

      If Site B could advertise the most specific route to the workload (a /32), then the route seen would be via Site B and more specific!

  2. creis says:

    Regarding the paragraph quoted below, shouldn’t it read:

    three IP addresses .1, .2, .3, […] summary route advertised via Site-B of 192.168.101.0/30 ?

    (with four IP addresses .1, .2, .3, .4, it would yield .0/30 and .4/32 [assuming the algorithm considers the .0 address as unusable / the network address]).

    Adding .5 would yield .0/30 + .4/31
    Adding .6 would yield .0/30 + .4/31 + .6/32

    Informative post by the way, thank you ! I’m very surprised to hear this is on the roadmap though, any source to quote on that ?

    —–88—–
    […] For example if there was four workloads that had the IP addresses .1, .2, .3, .4 then by the previous example there would be four /32 routes injected into the NSX Edge as static routes and lifted into the WANs OSPF routing table. The refactoring in the application would advertise the best summary possible. Case in point instead of four /32 routes there would be a summary route advertised via Site-B of 192.168.101.0/29.

    If .5 and .6 we also moved it would advertised two additional /32 host routes. Python and API goodness at its finest!

  3. Ajesh says:

    Suppose I have 20 VMs (each a /32), of which I’m migrating only 10 to Site B while the remaining 10 stay at Site A. What happens to the ingress traffic for the Site A and Site B VMs?

    • pandom says:

      The 10 at Site A would be covered by the existing /24 (or whatever route is in the ESG). Site B would advertise the /32 routes. In the upstream routing table, the most specific prefix to a destination takes precedence.
