Testing Distributed Firewall heap usage

The Distributed Firewall in NSX for vSphere uses a number of memory allocations, also known as heaps. They are sized based on the amount of physical memory in the host. The heaps are exposed under /system/heaps on each vSphere host.

The heaps

Under NSX 6.2.2 the heaps are unnamed:

[[email protected]:/var/log] vsish -e ls /system/heaps|grep vsip
vsip-0x430dc0f31000/
vsip-0x43083cd1e000/
vsip-0x43082bd62000/
vsip-0x430f403d6000/
vsip-0x4306e8c1c000/

Under NSX 6.2.3 and later the heaps are named:

[[email protected]:~] vsish -e ls /system/heaps|grep vsip
vsip-attr-0x430e22ac4000/
vsip-flow-0x430774f3e000/
vsip-ipdiscovery-0x430731b51000/
vsip-rules-0x43074e54e000/
vsip-state-0x430eac11f000/
vsip-module-0x430724911000/
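
Because the naming differs between releases, anything that reports on these heaps may want to pull the function out of the name where there is one. Here is a rough sketch in shell, based only on the two listings above. The function name heap_function and the fallback labels "unnamed" and "unknown" are my own choices for illustration, not part of NSX.

```shell
# Hypothetical helper: print the function part of a vsip heap name.
heap_function() {
  # vsish ls prints a trailing slash on each entry; strip it first.
  heap=${1%/}
  case "$heap" in
    vsip-0x*)                 # 6.2.2 style: vsip-<address>
      echo "unnamed" ;;
    vsip-*-0x*)               # 6.2.3+ style: vsip-<function>-<address>
      name=${heap#vsip-}
      echo "${name%-0x*}" ;;
    *)
      echo "unknown" ;;
  esac
}

heap_function "vsip-rules-0x43074e54e000/"
heap_function "vsip-0x430dc0f31000/"

# On a host, each line of the listing could be fed through it:
#   vsish -e ls /system/heaps | grep vsip | while read -r h; do heap_function "$h"; done
```

The unnamed pattern is tested first so that a 6.2.2 heap is never mistaken for a named one.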

Each heap serves a different function. Depending on that function, a heap is allocated memory in 8MB increments up to the maximum for that heap. This is not configurable by the user. Below are the heap maximums for NSX 6.2.2/6.2.3 on a host with 128GB of memory or more:

  • vsip-module has a maximum heap size of 512MB
  • vsip-state has a maximum heap size of 1535MB
  • vsip-rules has a maximum heap size of 1535MB
  • vsip-ipdiscovery has a maximum heap size of 384MB
  • vsip-attr has a maximum heap size of 256MB
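
To put these maximums in perspective, the 20% free buffer used by the test in this post can be turned into megabytes. This is a minimal sketch in shell using the maximums listed above. The function name heap_used_limit_mb is hypothetical, and the integer division rounds down.

```shell
# Hypothetical helper: given a heap maximum in MB and the percentage that
# must stay free, print how many MB can be used before the buffer is eaten.
heap_used_limit_mb() {
  max_mb=$1
  free_pct=$2
  # Shell arithmetic is integer only, so the result is rounded down.
  echo $(( max_mb * (100 - free_pct) / 100 ))
}

# Maximums as listed above for a host with 128GB or more.
for entry in module:512 state:1535 rules:1535 ipdiscovery:384 attr:256; do
  name=${entry%%:*}
  max=${entry##*:}
  echo "vsip-$name: $(heap_used_limit_mb "$max" 20)MB of ${max}MB"
done
```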

In certain environments and situations, customers have used more than 95% of these heap maximums. When building distributed firewall rules with containers (objects), there are situations where nesting, different group combinations, and very large IP Sets have resulted in many copies of object containers being realised on the data plane. This has resulted in high heap usage.

I have worked closely with a very smart colleague (Hello Dale!) who has imparted his endless knowledge of this to me. With my recent foray into PShould, PSate, and PowerNSX, I thought I would make a test for him. This test covers all vSphere hosts that have the DFW installed and checks that each of their heaps has more than 20% memory free.

The module

The script is pretty straightforward and commented. It requires the following modules to be installed:

  • PSate
  • PShould
  • Posh-SSH
  • PowerNSX
  • PowerCLI
## Test for DFW Memory heap usage
#a: Anthony Burke - @pandom_
#c: (dcoghland for original idea and initial code, nbradford for sanity checks)


## DO NOT EDIT.
### The limit threshold is recommended as a buffer. If 80% of memory or more is used the test will fail.
## Some math for heap percentage
  $limit = 20
  $total = (100-$limit)
## Prompt for the ESXi credentials used for the SSH sessions
  $esxi_creds = (Get-Credential)



## Initiate Test sequence
DescribingEach "Distributed Firewall Memory heaps"{
  $vSphereHosts = Get-VmHost
  # For each vSphere host found by Get-VMhost connect to host with SSH
  foreach ( $vsphere in $vSphereHosts ) {
    GivenEach "vSphere Host $($vSphere.name)" {
      $esxi_SSH_Session = New-SSHSession -ComputerName $vsphere -Credential $esxi_creds -AcceptKey
      #Invoke vsish command to list all VSIP heaps and store it
      $vsish_command_1 = "vsish -e ls /system/heaps|grep vsip"
      $vsish_object_1 = Invoke-SSHCommand -SessionId $esxi_SSH_Session.SessionId -Command $vsish_command_1 -EnsureConnection
      #Upon the stored object, for each heap listed, use SSH session to check heap memory remaining.
      foreach ($heap in $vsish_object_1.output) {

        # Each listed heap name already ends in "/", so "stats" is appended directly
        $command = "vsish -e get /system/heaps/${heap}stats"
        $stats = Invoke-SSHCommand -SessionId $esxi_SSH_Session.SessionId -Command $command -EnsureConnection
        $stats.output | ? { $_ -match "(percent free of max size):(\d{1,3})" } > $Null
        # Based on the regex output, use matches and PShould to determine remaining memory is more than limit (ex:80 is more than 20)
        It "has not surpassed the $total % memory threshold on memory heap $heap for $vsphere" {
          # Cast to [int] so the comparison is numeric rather than string-based
          [int]$matches[2] | should be -gt $limit
        }
      }
    }
  }
}

When run as a script across an environment this is the output in a 6.2.3 or higher environment:

Screenshot 2016-08-23 21.10.33

If this is run in a 6.2.2 environment the heaps do not have names. If a host has a heap that goes over the limit, the individual test will return red, stating that it exceeded the threshold.

By default the safe-threshold is 20% remaining.
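
The core check can also be sketched directly in shell, for example to spot-check a single host from its own command line. This assumes, as the regex in the script does, that the stats output contains a line of the form "percent free of max size:NN". The function names and the sample text are hypothetical and only there for illustration.

```shell
# Read stats text on stdin and print the number from the
# "percent free of max size:NN" line (first match only).
percent_free() {
  sed -n 's/.*percent free of max size:\([0-9]\{1,3\}\).*/\1/p' | head -n 1
}

# Succeed only if more than LIMIT percent is free (default 20),
# mirroring the "greater than limit" check in the test.
heap_ok() {
  free=$1
  limit=${2:-20}
  [ "$free" -gt "$limit" ]
}

# Illustrative input only, not captured from a real host.
sample='   percent free of max size:97'
free=$(printf '%s\n' "$sample" | percent_free)
if heap_ok "$free"; then
  echo "OK: $free% free"
else
  echo "FAIL: $free% free"
fi

# On a host the input would come from vsish instead, for example:
#   vsish -e get /system/heaps/<heap>/stats | percent_free
```

The comparison is done with the numeric -gt test, so a value of 100 or a single-digit value is handled as a number rather than as text.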

Git it

Check the script out on GitHub now.

PowerNSX Log Insight Segmenter

PowerNSX has been a focus of mine for a little while. I also have a penchant for Log Insight. I like the product. In an earlier blog post I outlined an approach to segmenting any application with Log Insight and the NSX Distributed Firewall.

I have created a tool that takes what I learned from segmenting production Log Insight instances and builds a set of rules from it. These predefined Security Groups and rules capture the legitimate traffic to Log Insight and protect the cluster.

Screenshot 2016-05-13 13.20.04

The Log Insight Segmenter is designed to work on Log Insight Clusters using an Integrated Load Balancer (ILB). When running the code a user is prompted for the following:

  • The IP address assigned to LogInsightLoadBalancerIPAddress in the script will be used as the Log Insight ILB IP address. A warning displays the IP address currently assigned to the variable and asks the user to confirm that it is correct.
  • A second warning explains what is about to occur and asks whether the user wants to proceed.
  • Answering No to either prompt aborts the script.

An administrator can define a custom ILB IP address by appending the -LogInsightLoadBalancerIPAddress parameter:

  • .\segmentLI.ps1 -LogInsightLoadBalancerIPAddress 10.100.0.9

The IP address used here is subsequently used in the rules that are created. It is the destination IP address for communication from external sources.

Running the script results in this:

Screenshot 2016-05-13 13.50.23

After this has run all an administrator needs to do is add an IP Set or object to the Security Group SG-Administrative-Sources and access is granted.

Because this is a generic script for many environments, some small tweaks may be needed. I would suggest replacing ANY in the source field with the relevant vCenter objects and IP ranges for syslog sources.

Download the script and let me know how you fare.