Ingress Optimisation with NSX for vSphere

This blog highlights a solution to ingress routing for multi-DC networks whose applications have location independence through overlay protocols such as VXLAN. It takes advantage of events occurring within the infrastructure and manipulates the network accordingly, through event-state monitoring and the use of APIs.

Network Virtualisation

Network Virtualisation gives administrators the ability to achieve this today. It is no longer in the realm of science fiction, nor is it as complex as brain surgery. To achieve the desired outcome of an active-active application topology with local east-west and north-south routing, there are no specific requirements of the network fabric.

Let's build this application topology and achieve this specific outcome using VMware’s NSX for vSphere 6.2.


Location independence is a nice way to describe the Logical Switching provided by NSX. Using the VXLAN protocol, L2 connectivity can be provided to workloads that sit across L3 boundaries. VXLAN overlays in NSX come in two flavours – the Logical Switch and the Universal Logical Switch. Quite simply, a Logical Switch is bound to a single vCenter whilst a Universal Logical Switch is replicated between numerous vCenters.
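For context, creating a Logical Switch is a single REST call against NSX Manager. The snippet below is a minimal sketch, assuming the NSX-v virtual wire endpoint (/api/2.0/vdn/scopes/{scopeId}/virtualwires); the manager address, credentials and transport zone ID are placeholders. Pointing the same call at a universal transport zone on the primary NSX Manager would create a Universal Logical Switch instead.

import requests

NSX_MGR = "https://nsx-manager.example.local"   # placeholder NSX Manager address
SCOPE_ID = "vdnscope-1"                         # transport zone (use the universal TZ for a ULS)

payload = """<virtualWireCreateSpec>
  <name>Web-Tier-LS</name>
  <description>Logical switch for the web tier</description>
  <tenantId>default</tenantId>
  <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</virtualWireCreateSpec>"""

# A successful POST returns the ID of the new logical switch (a virtualwire-N value).
resp = requests.post(
    f"{NSX_MGR}/api/2.0/vdn/scopes/{SCOPE_ID}/virtualwires",
    data=payload,
    headers={"Content-Type": "application/xml"},
    auth=("admin", "VMware1!"),                 # placeholder credentials
    verify=False,
)
print(resp.status_code, resp.text)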


Cross Site topology

Site-Local routing for east-west traffic

Applications are normally comprised of numerous workloads within an application tier. Whether it is a three-tier or two-tier application, there is a requirement for routing. Routing has traditionally been provided in a centralised manner by a gateway device such as a core switch. Whilst suitable for connectivity, this causes undesirable application hair-pinning and can induce latency. East-west traffic has been made simpler with Distributed Logical Routing. Distributed Logical Routing with NSX provides the default gateway for each network locally on the host a workload resides on. That means if a workload needs to route between two subnets, it can do so within the hypervisor's kernel space.

Site-local routing for north-south egress traffic

Routing that leaves the environment can take advantage of local routes specific to the site. Egress optimisation comes in the form of a locale-id. In a multi-site topology the locale-id determines which routes are sent to the NSX Controller. The controller stamps the locale-id on the routes and only sends routes to the vSphere hosts with a matching locale-id. This ensures that traffic leaving the application topology takes the best egress route out of a site-local edge gateway.


Local Site Egress Optimisation through Locale-ID

If an NSX Manager is responsible for more than one site, the locale-id can be modified at the UDLR, cluster, or host level.

Site-local routing for north-south ingress traffic

With a workload that resides in an active-active topology, it is paramount that traffic destined for the application enters via the correct site. Traffic that is homed to the correct site ensures client-server communication reaches the server via the most optimal path. When it cannot, the below occurs:


Sad Trombone!


Ingress optimisation prevents traffic entering via Site-A and routing across a datacenter interconnect to Site-B to reach the active workload. Advertising the shortest route ensures that the next hop presented to the WAN is the correct hop to the site where the active workload resides.

With a disconnect between the infrastructure layer that runs the compute load and the network, both have been oblivious to events that occur in the other. If a vMotion occurs and a workload moves from one host to another, or one site to another, the network has had to accommodate it. This has led to suboptimal designs and an overall increase in complexity, because network administrators had to stretch networks (L2 VLAN DCIs) and deal with large L2 domains.

When the infrastructure and network are aware of each other, and events in one can influence the topology of the other, there is a marked decrease in network complexity.

Achieving site-local routing for north-south ingress traffic

Here is an approach to ingress optimisation that uses APIs and an application to monitor event state, and drives the network based on what is happening at the application infrastructure layer.


The application topology

Site-A is advertising the web front end of the application to the WAN. The network is advertised into OSPF, which is currently being used between the HQ site and the branches. The workload currently being accessed is located in Site-A. Traffic destined for it will come through the WAN, find the route for its network, and see that the next hop is via Site-A.

If the machine is vMotioned from Site-A to Site-B, the workload is still on the same subnet and same logical network, but in a different datacenter. If the routing table is not updated, traffic will ingress via Site-A and trombone across the interconnect to Site-B, and return traffic may then hairpin back out via Site-A or take a different path and cause asymmetric routing. This is bad with an edge firewall in play!


Trying to avoid the trombone!

To deliver the dream


The application (built against the NSX and vCenter APIs) is pointed at the Site-B NSX Edge and waits for a particular event – a vMotion. When it detects a vMotion event, the following occurs (sketched in code below):

  • Wait for a vMotion event
  • Extract the workload's IP address
  • Inject the workload's IP as a /32 static route on the Site-B NSX Edge
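Below is a minimal Python sketch of that loop, for illustration only; it is not Babs' actual module. It assumes pyVmomi for the vCenter event stream and the NSX-v static routing endpoint (/api/4.0/edges/{edgeId}/routing/config/static), and the vCenter address, NSX Manager address, edge ID, next hop and credentials are all placeholders.

import ssl
import time

import requests
from pyVim.connect import SmartConnect
from pyVmomi import vim

VCENTER = "vcenter-siteb.example.local"          # placeholder vCenter for Site-B
NSX_MGR = "https://nsx-manager.example.local"    # placeholder NSX Manager
EDGE_ID = "edge-2"                               # the Site-B NSX Edge
NEXT_HOP = "192.168.250.2"                       # placeholder next hop towards the workload's gateway

ctx = ssl._create_unverified_context()
si = SmartConnect(host=VCENTER, user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

# Event collector that only yields completed vMotion events.
spec = vim.event.EventFilterSpec(eventTypeId=["VmMigratedEvent"])
collector = si.content.eventManager.CreateCollectorForEvents(spec)


def inject_host_route(ip, vm_name):
    """PUT a /32 static route for the workload onto the Site-B edge."""
    body = (
        "<staticRouting><staticRoutes><route>"
        f"<network>{ip}/32</network>"
        f"<nextHop>{NEXT_HOP}</nextHop>"
        f"<description>Ingress Optimised for {vm_name}</description>"
        "</route></staticRoutes></staticRouting>"
    )
    requests.put(f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/routing/config/static",
                 data=body, headers={"Content-Type": "application/xml"},
                 auth=("admin", "VMware1!"), verify=False)


while True:
    for event in collector.ReadNextEvents(10):   # poll for new vMotion events
        vm = event.vm.vm                         # managed object of the migrated VM
        ip = vm.guest.ipAddress                  # workload IP reported by VMware Tools
        if ip:
            inject_host_route(ip, vm.name)
    time.sleep(5)

Note that a PUT to routing/config/static replaces the edge's entire static route configuration, so a production version would first GET the existing routes and merge the new /32 in before writing the config back.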

vMotion detected – Inject the routes!

The edge is currently configured to redistribute static routes into OSPF, so this automatically injects the /32 into OSPF. While the workload is at the DR or branch site (Site-B), it has an ingress-optimised path: the OSPF routing table in the WAN now sees the route via Site-B.


/32 seen via Site B in WAN routing table

Within the UI itself there is a description against the route providing information about the host it represents. In the video later on the viewer will see “Ingress Optimised for Web01”. Traffic that requires a session with the workload will now find it via Site-B!


If the workload moves back to Site-A via a vMotion event, the workload's host route (the /32 static route injected on the Site-B NSX Edge) is withdrawn, and connectivity is once again represented by the network route advertised from Site-A.
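Withdrawal can reuse the same (assumed) endpoint and placeholder names as the sketch above: read the current static routes back from the Site-B edge, remove the workload's /32, and write the remainder.

import xml.etree.ElementTree as ET

import requests

NSX_MGR = "https://nsx-manager.example.local"   # same placeholders as the earlier sketch
EDGE_ID = "edge-2"
AUTH = ("admin", "VMware1!")


def withdraw_host_route(ip):
    """Drop the workload's /32 from the Site-B edge so OSPF withdraws it."""
    url = f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/routing/config/static"
    current = requests.get(url, auth=AUTH, verify=False)
    config = ET.fromstring(current.text)         # the <staticRouting> document
    routes = config.find("staticRoutes")
    if routes is not None:
        for route in list(routes):
            if route.findtext("network") == f"{ip}/32":
                routes.remove(route)             # strip the injected host route
    requests.put(url, data=ET.tostring(config),
                 headers={"Content-Type": "application/xml"},
                 auth=AUTH, verify=False)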


What about route-table explosion?

For a long time there has been the notion that using /32 host routes makes the routing table grow. If hundreds of applications were protected by this application, there would be an explosion of routes within the routing table.

An extension being developed adds a function that refactors routes based on the networks it sees. For example, if four workloads with the IP addresses .1, .2, .3 and .4 moved, then by the previous example there would be four /32 routes injected into the NSX Edge as static routes and lifted into the WAN's OSPF routing table. The refactoring in the application would instead advertise the best summary possible: rather than four /32 routes, a single summary route would be advertised via Site-B.

If .5 and .6 were also moved, it would advertise two additional /32 host routes. Python and API goodness at its finest!
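The summarisation itself can be done with the Python standard library. The sketch below is a simplification of whatever the real extension does, and the addresses are purely illustrative; it finds the smallest single prefix that covers a set of host addresses.

import ipaddress


def covering_summary(host_ips):
    """Return the smallest single prefix that contains every given host address."""
    nets = [ipaddress.ip_network(f"{ip}/32") for ip in host_ips]
    summary = nets[0]
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()    # widen by one bit until every host fits
    return summary


# Four workloads now at Site-B: advertise one summary instead of four /32s.
print(covering_summary(["172.16.10.1", "172.16.10.2", "172.16.10.3", "172.16.10.4"]))
# -> 172.16.10.0/29

A fuller implementation would also check that the summary does not accidentally cover workloads still homed at Site-A before advertising it.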

As Winston Churchill was famously quoted as saying:

I may be drunk, Router, but in the morning I’ll be optimised and you will still be ugly!

The Babs factor

My colleague Babs (Andrew Babakian) is a team member in Sydney, Australia. He, along with a couple of other SEs in my team, always chews the fat about how network virtualisation opens itself up to new use cases, new approaches to old and new problems, and how it can bring world peace!

Babs has a brain the size of a planet, two patents to his name, a couple of CCIEs, and is working towards a doctorate. This module was built by him and solely uses the NSX and vCenter APIs. It is purely based on vMotion events.

Babs also has a patent for this ingress module –

Dynamic Virtual Machine Network Policy For Ingress Optimization
United States
A method of performing ingress traffic optimization for active/active data centers. The method creates site-specific grouping constructs for virtual machines that run applications that are advertised to the external networks. The site specific grouping constructs provide an abstraction to decouple virtual machines from traditional networks for common ingress network policies.

Kudos to him for writing this. His VMworld session NET4855, with Dimitri Desmidt, received a 4.9/5.0 rating. This was (I believe) the highest-rated VMworld session of 2015. It received a standing ovation thanks to the live demo. Babs and Dimitri must have sacrificed a few goats to the demo gods! Well done, lads.

Demonstration of solution

This is a video that Babs made which goes through the entire environment, sets it up, and explains what is occurring. It shows how the module watches for the events it is programmed for and subsequently responds by injecting a static route that is redistributed into OSPF.

tl;dr – it gets sexy around the 7 minute mark!


I tweeted last night about how awesome it is to work with people who happily share knowledge and want to teach others. Imagine if the workload was actually a load balancer VIP and it dragged its application front end, along with the routing, with it. How easy does application migration become?

Pretty damn cool that this is programmable networking and what a time to be working with Network Virtualisation.

What’s new in NSX 6.2 – Traceflow

This is a post in the series – What’s new in 6.2? It covers off the new features of a pseudo-major NSX release.

Introducing Traceflow

Traceflow adds to the toolbox NSX provides to help operationalise the Network Virtualisation platform. Traceflow allows the injection of varying types of packets into application topologies and, as the name suggests, traces the flow along the path. It collects observations of actions, hosts, and the relevant components and their names. This helps administrators visualise the path through a topology.

Tracing within a Layer 2 domain

Using Traceflow it is possible to craft a packet with a variety of settings. I have picked a source and destination VM on a Logical Switch; Traceflow can be used on Logical Switches in Unicast or Hybrid mode.


There is also the ability to select the protocol and then modify additional fields. I have chosen a TCP packet with a SRC/DST port of 80 for this example. The firewall rules ‘protecting’ my workloads are permit any any.



This rule matches the App-01 Web Tier, whose VMs are in a Security Group (matched on a Security Tag), to the individual VMs App01 and App02, and allows all traffic. When the Traceflow is executed the following output is seen:


At first the output looks rather busy. It is possible to identify the following information from it:

  • SRC: Web01 NIC1
  • DST: App01 NIC1
  • Packet flow and order of operations
  • Objects between two points

These Virtual Machines are on a VXLAN Logical Segment. This allows administrators to provide Layer 2 connectivity between workloads independent of the underlying infrastructure.

The order of operations displayed is as follows:

  1. The Traceflow packet is injected into Web01 vNIC.
  2. Received by the Distributed Firewall protecting the Web01 vNIC
  3. Forwarded (due to permit rule) by the Distributed Firewall protecting the Web01 vNIC
  4. Forwarded via VXLAN Tunnel Endpoint of host
  5. Received via VXLAN Tunnel Endpoint of host (where App01 currently is located)
  6. Received by the Distributed Firewall protecting the App01 vNIC
  7. Forwarded (due to permit rule) by Distributed Firewall protecting the App01 vNIC
  8. Delivered to destination workload App01.

That gives administrators visibility of all the objects in the topology between two end points.

Identifying the Deniers

So what would happen if the administrator decided to ratchet down security? What would occur if the rule was changed to block this traffic?

Time to see how Traceflow reacts. When the administrator runs Traceflow a second time the following output is seen.


The result shows 1 Dropped observation in red. Something has been blocked. The sequence is as follows:

  1. The Traceflow packet is injected into Web01 vNIC.
  2. Received by the Distributed Firewall protecting the Web01 vNIC
  3. Dropped immediately (due to deny rule) by Distributed Firewall protecting the Web01 vNIC on egress.

The component name for Sequence 2 states that the Firewall (Rule 1005) is the culprit. All the objects in the Component Name column are hyperlinked; clicking one reveals more information about the object.


The hyperlinked drop details show that Rule ID 1005 is the culprit, as suspected. The reason given is FW_RULE.

If this is not the desired behaviour, or the rule should not be enforced on this workload, the administrator can quickly, easily, and efficiently identify the rule and remediate accordingly.

Layer 3 Traces just got visible

Taking the same approach as with security policies on a single Layer 2 domain, it is possible to perform Traceflow across routed segments. In this example the administrator traces an ICMP flow from Web01 to an IP address attached to the DLR.


The difference between this Traceflow and the last one is that the destination is an IP address and it is an ICMP trace. The address is attached to the DLR; in this case it is the gateway IP for that subnet, which is local to all hosts in the transport zone the Logical Switch and DLR are assigned to. When the flow is executed the output below is seen:



Time to look at the steps occurring here to gain an insight into how the traffic is being processed:

  1. Traceflow packet is injected into the vNIC of Web01 VM
  2. Forwarded (due to permit rule) by the Distributed Firewall protecting the Web01 vNIC
  3. Received by the Distributed Firewall protecting the Web01 vNIC
  4. Logical Switch App-01-Flat forwards this packet
  5. Packet is received by App-01-DLR
  6. Packet is returned by App-01-DLR
  7. Logical Switch App-01-Flat forwards this packet
  8. Received by the Distributed Firewall protecting the Web01 vNIC
  9. Forwarded (due to permit rule) by the Distributed Firewall protecting the Web01 vNIC


Like before, it is possible to understand the related objects from the Component Name hyperlink. The observation details outline the Segment ID and Component Name. Very handy for knowing which VXLAN Network Identifier (VNI) is assigned to a Logical Switch.




Traceflow is a great addition to the tools within VMware NSX for vSphere. It is born out of a maturing platform and puts actionable information at an administrator's fingertips. I personally like how I can correlate firewall policies to where a packet stops. I also like that I can inject varying traffic types into my topologies very easily.

VMware NSX for vSphere 6.2 is available now.