Simplified Maintenance of System Security and Availability with VMware Skyline Health

If you’re reading this article, there’s a good chance you’re already familiar with VMware’s vSphere and vSAN Health features. The Health feature originally shipped with vSphere 6.7 and monitored a small handful of items, largely based around configuration best practice. It automatically discovered environmental issues and offered detailed explanation and guidance on self-service resolution.

As vSphere 6.7 has continued to evolve, so too has the Health feature, which is now known as “Skyline Health”. Skyline Health is a full-featured, dynamic health check, diagnosis and rectification facility that helps keep your VMware environment safe, secure and stable through more than 90 separate checks. It also unifies the previously separate checks for vSphere and vSAN health.

I mentioned before that Skyline Health performs “dynamic” health checks – let’s dig into this a little deeper.

Some of the checks cover basic best practice: ensuring that logs are stored on persistent storage, that a sufficient number of heartbeat datastores are configured for vSphere High Availability, or that your version of vSphere is still within general support. In addition to these, a number of security and stability issues are now also checked, such as CPUs that require microcode updates, the recent VMware Directory Service vulnerability, and known problematic hardware / driver versions.

As an example, I recently deployed a new Dell EMC VxRail cluster for a customer who required an external vCenter Server Appliance. We grabbed a copy of vCenter Server 6.7 U3b and deployed it as per the normal process, then proceeded to deploy the cluster and spin up some test workloads. As part of the validation process prior to customer handover, we checked Skyline Health. Here’s how a brand-new, fresh-out-the-box deployment looked:

As you can see, quite a number of issues needed to be addressed, and this was certainly not a state in which we were prepared to hand the system over to the customer!

A CPU microcode update was required to address hardware-level vulnerabilities, and a known issue with the specific version of ESXi that shipped with the VxRail hardware was causing excessive false-positive hardware alerts to be raised. Additionally, a known potential UEFI boot issue was detected and, most critically, the recent VMware Directory Service vulnerability that can allow an attacker to bypass vCenter authentication was also flagged.

This is the real strength of Skyline Health and why I’ve chosen to write about it today. Because the checks are dynamic, it’s constantly updated with new issues as they’re discovered, so it can show you simple, at-a-glance information on risks to your environment that might not have been known about yesterday. That gives you the knowledge you need to make straightforward decisions that protect and maintain the security and availability of your environment.
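To make the idea of “dynamic” checks a little more concrete, here is a minimal Python sketch of the general pattern: a catalogue of checks held as data and evaluated against an inventory. It is purely illustrative and is not how Skyline Health is implemented. Every name in it (Host, Check, run_checks, the host names and the microcode revision numbers) is a hypothetical I’ve made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Host:
    # Hypothetical inventory record; the fields are illustrative only.
    name: str
    microcode_rev: int
    logs_persistent: bool


@dataclass(frozen=True)
class Check:
    # A check is just a title plus a rule that says whether a host is affected.
    title: str
    is_affected: Callable[[Host], bool]


def run_checks(hosts: List[Host], catalogue: List[Check]) -> List[Tuple[str, str]]:
    """Return a (host name, check title) pair for every check a host fails."""
    return [(h.name, c.title) for h in hosts for c in catalogue if c.is_affected(h)]


# Made-up inventory for the example.
hosts = [
    Host("esx01", microcode_rev=5, logs_persistent=True),
    Host("esx02", microcode_rev=7, logs_persistent=False),
]

# Yesterday's catalogue: a single best-practice check.
catalogue = [
    Check("Logs not on persistent storage", lambda h: not h.logs_persistent),
]
before = run_checks(hosts, catalogue)

# A newly discovered issue arrives as one more catalogue entry.
# Nothing in the environment has changed, but a new finding appears.
catalogue.append(
    Check("CPU microcode update required", lambda h: h.microcode_rev < 7)
)
after = run_checks(hosts, catalogue)

print(before)
print(after)
```

The point of the sketch is that the environment itself never changes between the two runs. Only the catalogue grows, which is why a system that looked healthy yesterday can show a new finding today.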

In the above case, we performed a simple vCenter upgrade, followed by the VxRail-automated upgrade of the ESXi hosts, after which we were greeted with a lovely array of all-green ticks:

If you’ve got a version of vSphere that supports it, you can take the Skyline Health feature for a spin yourself – click on the vCenter Server at the very top of your hierarchy, then the “Monitor” tab and you’ll see Skyline Health at the bottom of the list.