I was working in my lab and had installed a fresh vCenter 6 Server Appliance. I seized the hosts from a previous install. What remained was an old installation of NSX (the Manager had previously been attached to the old vCenter) and a Log Insight deployment. I had my vCenter running with new hosts and my new DVS built.

When I went to prepare the clusters for NSX, the VIBs deployed to each host successfully. When I went to configure VXLAN, however, I got an error: “Error retrieving DVS forwarding class for switch dvs-40”. Odd. I went into my ESX host and checked the object ID of my installed DVS: it was 33. So somewhere I had stale information from my previous installation. NSX Manager stores information about the DVS used for Logical Switching in its database: the DVS ID, the associated clusters, the uplinks assigned to the DVS, and the port-group used for the VMkernel interface for VXLAN.

Cleaning this up means editing that database directly, which can only be done from the shell of NSX Manager. The shell password is used by GSS/VMware Staff/VMware Tech Support to gain access to the heart of NSX Manager. Please don’t ask me for it.

Sussing the database

After logging into the shell I drop into the database stored on NSX Manager. Let’s see what stale information is in my database.

Here are the DVS contexts that NSX Manager is aware of. Note that dvs-40 is here but dvs-33 is not. Both dvs-35 and dvs-40 are stale records. There is no dvs-33!

secureall=# select * from vdn_vds_context;
id | switch_id | mtu | teaming_policy | vmknic_dvpg_id | promiscuous_mode
310 | dvs-35 | 1600 | 0 | | f
363 | dvs-40 | 1600 | 0 | | f
(2 rows)
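
The comparison I am doing by eye here can be written down as a tiny sketch. The rows are copied from the output above; the `live_switches` set and the `find_stale_contexts` helper are my own illustrative names, and the set assumes dvs-33 is the only DVS in the new vCenter, as it is in this lab.

```python
# Rows copied from the vdn_vds_context output above: (id, switch_id).
nsx_contexts = [(310, "dvs-35"), (363, "dvs-40")]

# Assumption: the DVS IDs that really exist in the new vCenter.
# In this lab that is only dvs-33.
live_switches = {"dvs-33"}


def find_stale_contexts(contexts, live):
    """Return the (id, switch_id) rows whose switch no longer exists."""
    return [(ctx_id, switch) for ctx_id, switch in contexts if switch not in live]


stale = find_stale_contexts(nsx_contexts, live_switches)
for ctx_id, switch in stale:
    print(f"context {ctx_id} points at missing switch {switch}")
```

Both rows come back as stale, which matches what the table shows.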

Let’s see what compute clusters are registered with NSX. Again, more stale records.

secureall=# select * from vdn_cluster;
id | cluster_id | vlan_id | ip_pool_id | vmknic_count
315 | domain-c9 | 0 | ipaddresspool-3 | 1
368 | domain-c7 | 0 | ipaddresspool-7 | 1
(2 rows)

Okay, old clusters. My current clusters are c3 and c5. Let’s look at the uplinks on the DVS.

secureall=# select * from vds_teaming_uplink_port;
id | uplink_port_name | vds_context | active_port
311 | Uplink 4 | 310 | t
312 | Uplink 3 | 310 | f
313 | Uplink 2 | 310 | f
314 | Uplink 1 | 310 | f
364 | Uplink 4 | 363 | t
365 | Uplink 3 | 363 | f
366 | Uplink 2 | 363 | f
367 | Uplink 1 | 363 | f
(8 rows)

As I suspected, the old uplinks are pinned to a DVS context. Uplink 4 is marked active on each context, but the contexts themselves are stale. The port-groups that the VMkernel interfaces sit in will likewise match a stale DVS context, which in turn references a port-group that doesn’t exist.

secureall=# select * from vdn_vmknic_portgroup;
id | moid | vds_context | vlan_id | backing_status
318 | dvportgroup-60 | 310 | 0 | 0
374 | dvportgroup-64 | 363 | 0 | 0
(2 rows)

With that assumption confirmed, it is clear why the forwarding class cannot be retrieved when deploying the VXLAN VMkernel interface.
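
To make the relationships easier to see, here is a rough sketch that groups the child rows under the DVS context they reference. The data is copied from the three query outputs above; the `dependents_by_context` helper and the dictionary layout are purely illustrative and are not anything NSX Manager provides.

```python
# id -> switch_id, from vdn_vds_context.
contexts = {310: "dvs-35", 363: "dvs-40"}

# (id, uplink_port_name, vds_context, active_port) from vds_teaming_uplink_port.
uplinks = [
    (311, "Uplink 4", 310, True), (312, "Uplink 3", 310, False),
    (313, "Uplink 2", 310, False), (314, "Uplink 1", 310, False),
    (364, "Uplink 4", 363, True), (365, "Uplink 3", 363, False),
    (366, "Uplink 2", 363, False), (367, "Uplink 1", 363, False),
]

# (id, moid, vds_context) from vdn_vmknic_portgroup.
portgroups = [(318, "dvportgroup-60", 310), (374, "dvportgroup-64", 363)]


def dependents_by_context(contexts, uplinks, portgroups):
    """Group uplink and port-group rows under the context id they reference."""
    summary = {
        ctx: {"switch": switch, "uplinks": [], "active": [], "portgroups": []}
        for ctx, switch in contexts.items()
    }
    for _row_id, name, ctx, active in uplinks:
        summary[ctx]["uplinks"].append(name)
        if active:
            summary[ctx]["active"].append(name)
    for _row_id, moid, ctx in portgroups:
        summary[ctx]["portgroups"].append(moid)
    return summary


for ctx, info in dependents_by_context(contexts, uplinks, portgroups).items():
    print(ctx, info["switch"], info["active"], info["portgroups"])
```

Every child row hangs off context 310 or 363, so nothing in these tables knows about the DVS that actually exists.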

To cleanse some stale records

Now that we have confirmed there are a number of stale records, let’s remove the DVS context completely. That should delete all linked objects.

secureall=# delete from vdn_vds_context where id = '363'
secureall-# ;
ERROR: update or delete on table "vdn_vds_context" violates foreign key constraint "vds_teaming_uplink_port_vds_context_fkey" on table "vds_teaming_uplink_port"
DETAIL: Key (id)=(363) is still referenced from table "vds_teaming_uplink_port".

Whoops! That definitely didn’t work. The delete violates a foreign key constraint. This essentially means that all child rows referencing a parent row must be unlinked (deleted, in this case) before the parent can be deleted.
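
If foreign keys are new to you, the behaviour is easy to reproduce away from NSX Manager. The sketch below uses an in-memory SQLite database as a stand-in (NSX Manager itself runs Postgres, as the error output shows) with a cut-down, assumed version of two of the tables; only the table and column names are borrowed from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE vdn_vds_context (id INTEGER PRIMARY KEY, switch_id TEXT)")
conn.execute(
    "CREATE TABLE vds_teaming_uplink_port ("
    " id INTEGER PRIMARY KEY,"
    " uplink_port_name TEXT,"
    " vds_context INTEGER REFERENCES vdn_vds_context(id))"
)
conn.execute("INSERT INTO vdn_vds_context VALUES (363, 'dvs-40')")
conn.execute("INSERT INTO vds_teaming_uplink_port VALUES (364, 'Uplink 4', 363)")
conn.commit()


def try_delete_parent(connection):
    """Attempt to delete the parent row; report whether the database allowed it."""
    try:
        connection.execute("DELETE FROM vdn_vds_context WHERE id = 363")
        return True
    except sqlite3.IntegrityError:
        return False


# Parent first: rejected, because a child row still references it.
parent_first = try_delete_parent(conn)

# Child first, then parent: allowed.
conn.execute("DELETE FROM vds_teaming_uplink_port WHERE vds_context = 363")
child_first = try_delete_parent(conn)

print(parent_first, child_first)
```

The first attempt is refused and the second goes through, which is the same ordering rule the real database is enforcing.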

Here we go

secureall=# delete from vdn_vmknic_portgroup where vds_context = '363';
secureall=# delete from vds_teaming_uplink_port where vds_context='363';
secureall=# delete from vdn_cluster_vds_contexts where vds_context_id_fk = '363';
secureall=# delete from vdn_vds_context where id = '363' ;
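
Since the order matters, a small helper that prints the statements child-first for a given context ID saves some typing if there is more than one stale context. This is only a sketch: the `cleanup_statements` name is made up, the table list is limited to the four tables I touched above, and wrapping everything in `BEGIN`/`COMMIT` is my own addition so the change can be abandoned with `ROLLBACK` instead if something looks wrong.

```python
def cleanup_statements(context_id):
    """Return SQL that removes one vdn_vds_context row, children first."""
    ctx = int(context_id)  # refuse anything that is not a plain integer
    return [
        "BEGIN;",
        # Child tables that reference the context go first.
        f"DELETE FROM vdn_vmknic_portgroup WHERE vds_context = '{ctx}';",
        f"DELETE FROM vds_teaming_uplink_port WHERE vds_context = '{ctx}';",
        f"DELETE FROM vdn_cluster_vds_contexts WHERE vds_context_id_fk = '{ctx}';",
        # The parent row goes last, once nothing references it.
        f"DELETE FROM vdn_vds_context WHERE id = '{ctx}';",
        "COMMIT;",
    ]


print("\n".join(cleanup_statements(363)))
```

The output is meant to be read over and pasted into the `secureall=#` prompt by hand, not run blindly.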

Now that the stale information has been cleared from the database it is time to check that

Given this is a lab environment with no working NSX, the quickest way to resolve this would have been to redeploy NSX Manager. With my mentality of troubleshoot, don’t restore, I was adamant about figuring this one out. Shout out to Dmitri Kalintsev and Nick Bradford (bad conscience, good conscience) who were sitting on my shoulder as we hacked through the database!

3 thoughts on “Clearing stale records from NSX Manager”

  1. Dear author, how do I access the shell of NSX Manager? I accessed NSX Manager, but I can’t run the command “secureall=# select * from vdn_vds_context;”

  2. Hi Pandom,
    Thanks for the detailed explanation. I wonder if you can just let us know how to connect to NSX vPostgres from low level access, ie DB Name, role name,…
    By the way, recently VMware published a KB for those enthusiastic about having NSX Manager linux level access: https://kb.vmware.com/s/article/2149630
