Securing Your vSphere Environment from Ransomware Part 2: Rubber Meets Road

In Part 1, I may have scared the crap out of you when I introduced the idea of protecting your vSphere environment from ransomware attacks using some resources from VMware. But that was an iceberg-level introduction to the concept.

Here, we are going to talk about what you can actually do to protect your vSphere environment.

My focus here is on securing the ESXi hosts themselves and the other vSphere components (vCenter, etc.). I say this because I am not really going to talk about ransomware protection at the OS layer, and I fear some a-hole reading this might be like, “Hey buddy! What about Ransomware/Virus Protection on your VMs? Whatareyagonnadoaboutthat, huh? Whaddayasay?”

That is not my focus here, journalist from a 1940’s Press Conference.

A friendly reminder, as disclosed in Part 1, that most of what follows here is based on recommendations from VMware’s Ransomware Resource Center. However, I will be getting into some detail on how to automate the recommendations there, so I hope you find this valuable. Also, a last-minute edit: this will be a three-parter because of the sheer length of this one. Part 3 will simply be a continuation of our discussion here about Hardening All the Things.

The “Low Hanging Fruit” Stuff You Can Do

First off, let’s say you want to get started, but you want to “triage” and do some things that are low effort but high reward for securing everything. Here are a few things you can do right away that take little effort and probably won’t cost you anything. These are in no particular order:

  1. Patch your stuff early, often, strategically, and preferably in an automated way. I probably don’t have to tell you this, but I would be remiss if I didn’t mention it: I put everything from the firmware on up to all things VMware into a bona fide pipelined lifecycle (technically we’re not 100% there, but we’re getting there). This approach is methodical and full of various test/development phases that dig deeper than just, “OMG VENDOR RELEASE PATCH. DURRRRRRR MUST INSTALL PATCH!” I will have to blog about that someday.
  2. Related to #1 above, have a plan in place to efficiently apply emergency security patches.
  3. Turn off SSH on all ESXi hosts, if it isn’t already (see the quick audit sketch just after this list). We are going to talk about including this in automated configuration drift remediation later in this post.
  4. Ensure your ESXi root accounts have exceedingly difficult passwords that are known only to a very small subset of engineers. Ideally, you will want to change those passwords regularly, but that’s easier said than done. I have some ideas about this in a later section.
  5. Start sparking conversations across teams to organize an official plan to respond to a ransomware attack. I am putting this one early in our list here (we still have a ways to go) because the defined response allows you to figure out what to put into place to be proactive about ransomware. In other words, making a plan first allows you to figure out how to protect yourself from ransomware before it happens. The best resource I have found starts with cisa.gov, which lists recommended plans from various standards bodies and government agencies, like this one.
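
By the way, if you want a quick read on where you stand with #3 before automating anything, here is a minimal PowerCLI sketch (assuming you already have a Connect-VIServer session to vCenter) that lists any hosts where SSH is running or set to start at boot:

# Quick audit: report every ESXi host where SSH is running or set to start with the host.
# Assumes an existing Connect-VIServer session to vCenter.
Get-VMHost | ForEach-Object {
    $sshService = Get-VMHostService -VMHost $_ | Where-Object { $_.Key -eq "TSM-SSH" }
    [PSCustomObject]@{
        Host       = $_.Name
        SSHRunning = $sshService.Running
        SSHPolicy  = $sshService.Policy
    }
} | Where-Object { $_.SSHRunning -or $_.SSHPolicy -eq "on" } | Format-Table -AutoSize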

Digging Deeper

Use VMware’s vSphere Security Guide to evaluate and take action on your ESXi hosts.

There, you can download a PDF file and an Excel spreadsheet that details VMware’s recommended list of settings that can be implemented to harden your environment. This will be time consuming and the guide goes pretty deep. Here’s a screenshot that will give you an idea about what it looks like:

vSphere Security Guide

From this you should get the idea. Note the tabs at the bottom, which will give you a defined standard across most of the vSphere infrastructure.

This should be a team effort. You can manually go through the list and figure out what makes sense to implement, etc. If you have vROps, you can enable the vSphere Security Configuration Guide Benchmarks. If you don’t, then you’ll have to do this manually or script it.

I did find this set of scripts from Tony Reardon, but I would test these on your own and improve if you need to. If you find any others, or have any you have created yourself, contact me at bryan.sullins@thinkingoutcloud.org.
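
To give you a feel for what scripting these checks looks like, here is a minimal PowerCLI sketch that audits a couple of advanced settings the guide covers. The setting names and target values below are just illustrative; pull the real list and recommended values from the spreadsheet for your vSphere version:

# Audit a few example advanced settings from the Security Configuration Guide.
# The setting names and target values here are illustrative only; use the
# spreadsheet for your vSphere version as the source of truth.
$settingsToCheck = @{
    "UserVars.ESXiShellInteractiveTimeOut" = 900
    "Security.AccountLockFailures"         = 5
}

foreach ($vmhost in Get-VMHost) {
    foreach ($name in $settingsToCheck.Keys) {
        $current = (Get-AdvancedSetting -Entity $vmhost -Name $name).Value
        if ($current -ne $settingsToCheck[$name]) {
            Write-Output "$($vmhost.Name): $name is $current (guide recommends $($settingsToCheck[$name]))"
        }
    }
}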

Keep in mind that you can do this in a phased approach. Once you have this down, you should have a list of configuration items that need to be enforced, which brings us to the next section:

Implement Automated Configuration Drift Remediation

Yeah, I know this one’s a mouthful, but it’s an important one. Even in the smallest of environments, things change over time: troubleshooting events cause snowflake-workarounds or people forget to turn off SSH (which I have totally, like, never done).

What you are doing here is scripting out the settings and configurations that need to be imposed on the ESXi hosts (from the aforementioned Security Configuration Guide), and scheduling them to run at a regular interval each day through some scheduled scripting platform, such as the two I list below.

There are many ways to accomplish this, but here are the two I am aware of and most familiar with. These are the ones I know will work for you, because they have worked for me:

  1. PowerCLI/Jenkins – We have talked about this one before.
    Pros – PowerCLI will have the widest range of settings. It is easy to manage. It is well known and well-supported.
    Cons – With Jenkins it’s hard to have it stop temporarily if you need to do something specific (if it turns off SSH every hour but you need SSH on temporarily for troubleshooting, then it’s hard to make an exception).
  2. Ansible/AWX/Tower – We have talked about this one too.
    Pros – Really easy to implement since this is idempotent (no need to implement pesky conditionals per se).
    Cons – Will not have all available settings, or you will have to resort to raw esxcli commands, which is risky and kludgy in this instance. You will need a method for keeping the inventory up to date. If you don’t, you will have to maintain that manually, which is also full of its own implications.

Let’s go through the easy example of locking down SSH (and keeping it locked down). Bear in mind this is just one setting; there are dozens of recommended settings in the Security Configuration Guide, so this will probably need to be phased.

For the PowerCLI method, you can use my PowerCLI VMWARE KIT as a launching point; the function for this looks like the following:

Function Stop-MYSSH {
    <#
    .SYNOPSIS
    Simple command to stop SSH on the given Host.
    .DESCRIPTION
    This command stops SSH on the given host and ensures the SSH service does not start with the host.
    .EXAMPLE
    Stop-MYSSH -ESXiHost [ESXiHOSTFQDN]
    #>
    param (
        [Parameter(Mandatory=$true)][string]$ESXiHost
    )
    $SSHService = Get-VMHostService -VMHost $ESXiHost | Where-Object {$_.Key -eq "TSM-SSH"}
    # If SSH is running, or its policy would start it at boot, stop it and disable the at-boot policy.
    If ($SSHService.Running -or $SSHService.Policy -eq "on") {
        $SSHService | Stop-VMHostService -Confirm:$false
        $SSHService | Set-VMHostService -Policy "Off"
    }
}

You will notice I am doing two things here: if SSH is running, or if SSH has a policy of “on” (which would start the service at boot time), I am shutting off SSH and I am ensuring that the “at-boot” policy for SSH is disabled.

This would be just one function among many for the settings you want to enforce. Use MY VMWARE KIT as a guide for creating your own module for this. And as for scheduling this through Jenkins, my post on using Jenkins for this very purpose should help.
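
To make the scheduling part concrete, here is a rough sketch of the kind of wrapper script a Jenkins job could run on an interval. The module name, paths, and environment variables are placeholders for whatever you use, and the exclusion-list bit is one way to soften the “hard to make an exception” con I mentioned above:

# Wrapper script a Jenkins job could run on a schedule.
# Module name, paths, and environment variables below are placeholders.
param (
    [string]$vCenter = "vcenter.example.com",
    [string]$ExclusionFile = "C:\drift\ssh-exceptions.txt"   # hosts temporarily allowed to keep SSH on
)

Import-Module MyVMwareKit   # hypothetical module containing Stop-MYSSH and friends

# Credentials would come from the Jenkins credential store (injected as environment
# variables here), never hard-coded in the job.
Connect-VIServer -Server $vCenter -User $env:VC_USER -Password $env:VC_PASS | Out-Null

$exclusions = @()
if (Test-Path $ExclusionFile) { $exclusions = Get-Content $ExclusionFile }

foreach ($vmhost in Get-VMHost) {
    if ($exclusions -contains $vmhost.Name) { continue }   # skip hosts with a temporary exception
    Stop-MYSSH -ESXiHost $vmhost.Name
}

Disconnect-VIServer -Server $vCenter -Confirm:$false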

For the Ansible/AWX code, it looks like this:

---
- hosts: all
  gather_facts: false
  vars:
    vcentername: vcenternamehere
    vcenterpassword: protectusingansiblevault
  no_log: false

  tasks:

  - name: Gather SSH Service facts about ESXi Hosts . . . .
    vmware_host_service_facts:
      hostname: '{{ vcentername }}'
      username: 'administrator@vsphere.local' # or whatever
      password: '{{ vcenterpassword }}'
      esxi_hostname: '{{ inventory_hostname }}'
      validate_certs: false
    delegate_to: localhost
    register: host_service_facts  

  - name: Stop SSH Service and set to disabled at boot for ESXi hosts that are not set properly . . . 
    vmware_host_service_manager:
      hostname: '{{ vcentername }}'
      username: 'administrator@vsphere.local' # or whatever
      password: '{{ vcenterpassword }}'
      esxi_hostname: '{{ inventory_hostname }}'
      validate_certs: false
      service_name: TSM-SSH
      service_policy: 'off'   # quote this so YAML keeps it as the string "off", not a boolean
      state: stop
    delegate_to: localhost
    when: (host_service_facts.host_service_facts[inventory_hostname] | json_query("[?key=='TSM-SSH'].running") | first) != false

And of course, you can continue to add additional settings as needed in this playbook. Arguably, the when conditional is not “required” since this is idempotent, but it’s a pattern you can apply in other contexts. This one would get scheduled through AWX, so you can use my AWX post for that.

Wow – it’s like everything I have done culminates in this post. Almost like I planned it.

Automated ESXi root Password Rotations

And finally, on to password rotations. Admittedly, this one is both high in importance and high in inconvenience. Since ransomware engines attempt to break into multiple hosts and replicate from host to host, finding a method to randomize the passwords and make them unique on a per-host basis will harden your environment nicely.

But First, An Aside:

You might be thinking, “But Bryan, what about having to type the password into the ESXi console? You can’t use copy/paste there, so if you have this wicked-long password, then you have to type it in to get into the ESXi console.”

Well, what are you doing in there? Troubleshooting, right?

Well, if you recall, way back in June 2020, I made a post about ESXi host slipstreaming, in which I made the comment that we treat ESXi hosts as cattle (we can nuke and pave a new ESXi host in 18 minutes). No need to troubleshoot when you can just nuke and pave the host in place and install an entirely new one.

Anyway, the idea here is to have an automated method to rotate passwords (randomized, lengthy, and unique per host) so that it is 1) more difficult to attack and 2) decreases the blast radius.

The automated method starts with a secrets manager like Hashicorp Vault. It is possible to use PowerCLI to automate rotating passwords from the Vault store and synchronizing the changes across your ESXi hosts.
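
I don’t have a polished, published solution for this (yet), but to make the pattern concrete, here is a rough sketch for a single host. It assumes a Vault KV v2 secrets engine mounted at secret/, a token with write access in the usual VAULT_ADDR/VAULT_TOKEN environment variables, and placeholder hostnames and paths. Note that Set-VMHostAccount generally wants a direct connection to the ESXi host rather than going through vCenter:

# Sketch: rotate the root password on one ESXi host and record it in Vault (KV v2).
# VAULT_ADDR and VAULT_TOKEN environment variables are assumed; paths are placeholders.
$esxiHost = "esxi01.example.com"

# 1. Generate a long random password locally (adjust the character set to satisfy
#    your host's password-quality policy).
$chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#%&*@".ToCharArray()
$newPassword = -join (1..32 | ForEach-Object { $chars | Get-Random })

# 2. Store it in Vault BEFORE changing anything; KV v2 keeps prior versions, so the
#    old password is still recoverable if the host change fails.
$body = @{ data = @{ password = $newPassword } } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri "$($env:VAULT_ADDR)/v1/secret/data/esxi/$esxiHost" `
    -Headers @{ "X-Vault-Token" = $env:VAULT_TOKEN } -ContentType "application/json" -Body $body | Out-Null

# 3. Change the password on the host (direct connection using the current root password,
#    e.g. read from the previous Vault version into $currentPassword).
Connect-VIServer -Server $esxiHost -User root -Password $currentPassword | Out-Null
Set-VMHostAccount -UserAccount (Get-VMHostAccount -Id root) -Password $newPassword | Out-Null
Disconnect-VIServer -Server $esxiHost -Confirm:$false

You would wrap that in a loop over all hosts and schedule it the same way as the drift remediation scripts above.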

The downside is that if you ever want to get into the root account for ESXi by whatever means, you will need to literally log in to Hashicorp Vault and copy out the password.
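
That lookup can at least be scripted, too. A minimal sketch of pulling a host’s current root password back out of the same hypothetical Vault path (KV v2) looks like this:

# Read the current root password for a host back out of Vault (KV v2).
$esxiHost = "esxi01.example.com"
$secret = Invoke-RestMethod -Method Get -Uri "$($env:VAULT_ADDR)/v1/secret/data/esxi/$esxiHost" `
    -Headers @{ "X-Vault-Token" = $env:VAULT_TOKEN }
$rootPassword = $secret.data.data.password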

. . . I just realized I haven’t even talked about MFA yet!

Here’s the best I can do for now on this: a video from Hashicorp. At the 15 minute mark is where things get interesting. You will see that you have the ability to have Vault dynamically assign different passwords to each host at regular intervals.

Neato mosquito!

Next post will be about MFA, Backups, Hardening vCenter, and Network Segmentation, among other topics.

Questions? Hit me up on twitter @RussianLitGuy or email me at bryansullins@thinkingoutcloud.org. I would love to hear from you.
