Single Touch, Production Ready ESXi Provisioning with Ansible

TL;DR: The source code and documentation for what follows is on my github here: https://github.com/bryansullins/baremetalesxi. For the full story and explanations, keep reading and enjoy!

A successful automated installation of ESXi

The Story

I got my start in IT Infrastructure Engineering at a very large and well-known hardware company that also offered IT Services. I was part of a team that built and maintained an IaaS Cloud for Healthcare Providers.

We rolled out ESXi hosts by the enclosure. That’s 16-32 ESXi hosts at a time, and when I joined the team we did it . . . . manually.

Manually.

. . . From the ground up. . . . From rack and stack, to the firmware updates, to getting them into vCenter and production ready.

This was about 2014. We automated some of it and it got a lot better. I can’t take all the credit. Getting to single/zero touch for ESXi provisioning was super tough, but we did what we could.

With the more recent releases of Ansible, single touch (possibly zero touch) is now a reality.

What Does Single Touch Mean?

Let’s all make sure we know what I mean when I use the phrase “Single Touch”:

“Bare Metal Single Touch ESXi Rollout (Provisioning) – The user launches the automation once (user input is acceptable: hostname, IP, etc.), but once executed, the ESXi host will go from powered off with no OS to being fully configured in vCenter, in its destination cluster, production ready.”

As defined by me, Bryan Sullins

How It Works: A Breakdown – Part 1 – ESXi Automated Installation for a Custom-Built ISO

At the top of this post, you will see the github link for what follows, but in case you are lazy like me, here it is again: https://github.com/bryansullins/baremetalesxi.

I will break this down into two parts. Part 1 is the ESXi bare metal installation. This is an automated kickstart with a self-contained ISO that has all of the installation parameters available without the need for PXE or TFTP. You will need to make the ISO available from a webserver, but that is all handled in the playbook.

To understand how this has to be done, you need some knowledge of kickstart, which goes a bit beyond the scope of this post. There are many ways you can make a kickstart file available for an installation, but remember, I wanted to make this ISO a standalone, self-contained automated install. If you were to do this manually, you would need to:

  1. Mount the ESXi ISO provided by the vendor.
  2. Copy all files out to a staging directory.
  3. Create a kickstart file with IP info, etc. You can also do scripting for additional setup: (esxcli commands, etc.).
  4. Tarball the kickstart file with your choice of name (bmks.tgz), and copy it into the root of the iso.
  5. Edit both boot.cfg files (one for Legacy Boot and one for UEFI) to reference the kickstart file and append bmks.tgz to the list of tarballs that must be extracted (see the sketch after this list).
  6. Burn all files back into the now-customized iso.
  7. Boot from Virtual Media, etc.
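To give you a sense of step 5, here is a rough sketch of what the edited portion of boot.cfg ends up looking like. This is illustrative only, not the exact file from the repo: the stock module list varies by ESXi build, and the kernelopt path assumes the kickstart is packed into bmks.tgz under etc/vmware/weasel, which is what the playbook below does.

kernelopt=runweasel ks=file://etc/vmware/weasel/ks.cfg
modules=<stock module list left as-is> --- /bmks.tgz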

For a detailed description of the process, William Lam’s post on this matter was integral for me to get this done.

You will note in the playbook I am doing what I describe above:

## /opt/baremetal is the staging directory.
  - name: Mounting source directory from official production ESXi ISO . . . copying over build files . . . backing up defaults . . .
    shell: |
      mkdir /mnt/{{ esxi_hostname }}
      mount -o loop -t iso9660 /opt/esxiisosrc/{{ src_iso_file }} /mnt/{{ esxi_hostname }}/
      mkdir /opt/baremetal/{{ esxi_hostname }}
      mkdir /opt/baremetal/temp/{{ esxi_hostname }}
      mkdir -p /opt/baremetal/temp/{{ esxi_hostname }}/etc/vmware/weasel
      cp -r /mnt/{{ esxi_hostname }}/* /opt/baremetal/{{ esxi_hostname }}/
      umount /mnt/{{ esxi_hostname }}
      mv /opt/baremetal/{{ esxi_hostname }}/boot.cfg /opt/baremetal/{{ esxi_hostname }}/boot.cfg.orig
      mv /opt/baremetal/{{ esxi_hostname }}/efi/boot/boot.cfg /opt/baremetal/{{ esxi_hostname }}/efi/boot/boot.cfg.orig
  
## The following two tasks will make the custom iso bootable by both legacy and UEFI implementations:    
  - name: Copying custom boot.cfg to root directory . . .
    copy:
      src: files/{{ esxi_build }}/boot.cfg
      dest: /opt/baremetal/{{ esxi_hostname }}
      owner: root
      group: root
      mode: '0744'

  - name: Copying custom UEFI boot.cfg to root efi directory . . .
    copy:
      src: files/{{ esxi_build }}/efi/boot/boot.cfg
      dest: /opt/baremetal/{{ esxi_hostname }}/efi/boot
      owner: root
      group: root
      mode: '0744'

## Additional options can be appended after the "reboot" at the end of the content section, such as scripted esxcli commands, etc.
  - name: Creating kickstart file with proper automation contents . . .
    copy:
      force: true
      dest: /opt/baremetal/temp/{{ esxi_hostname }}/etc/vmware/weasel/ks.cfg
      content: |
        vmaccepteula
        clearpart --firstdisk=local --overwritevmfs
        install --firstdisk=local --overwritevmfs
        rootpw --iscrypted {{ encrypted_root_password }}
        network --bootproto=static --addvmportgroup=1 --vlanid={{ vlan_id }} --ip={{ host_management_ip }} --netmask={{ net_mask }} --gateway={{ gate_way }} --nameserver="#.#.#.#,#.#.#.#" --hostname={{ esxi_hostname }}
        reboot 

  - name: Scripting commands to tarball the kickstart file and make the proper iso . . .
    shell: |
      chmod ugo+x /opt/baremetal/temp/{{ esxi_hostname }}/etc/vmware/weasel/ks.cfg
      cd /opt/baremetal/temp/{{ esxi_hostname }}
      tar czvf bmks.tgz *
      chmod ugo+x /opt/baremetal/temp/{{ esxi_hostname }}/bmks.tgz
      cp /opt/baremetal/temp/{{ esxi_hostname }}/bmks.tgz /opt/baremetal/{{ esxi_hostname }}/
      cd /opt/baremetal/{{ esxi_hostname }}
      
  - name: Creating bootable iso from all files . . .
    shell: >
      mkisofs
      -relaxed-filenames
      -J
      -R
      -b isolinux.bin
      -c boot.cat
      -no-emul-boot
      -boot-load-size 4
      -boot-info-table
      -eltorito-alt-boot
      -e efiboot.img
      -boot-load-size 1
      -no-emul-boot
      -o /opt/baremetal/{{ esxi_hostname }}.iso
      /opt/baremetal/{{ esxi_hostname }}/

  - name: Moving created iso to webserver . . .
    shell: |
      mv /opt/baremetal/{{ esxi_hostname }}.iso /usr/share/nginx/html/isos/

One important coding “teachable moment” here is the difference above between:

shell: |

and

shell: >

The “|” symbol is YAML’s literal block scalar: newlines are preserved, so each line below it runs as its own shell command.

The “>” symbol is YAML’s folded block scalar: newlines are folded into spaces, so all of the lines below it are joined into a single one-line command.
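A minimal illustration of the difference, using plain echo commands rather than anything from the actual playbook:

  - name: Literal block scalar - two separate commands
    shell: |
      echo "first command"
      echo "second command"

  - name: Folded block scalar - one command spread over several lines
    shell: >
      echo
      "this all becomes a single command line"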

The last part of this simply boots from the created ISO on the nginx webserver, and the lights-out management will do the rest of the work:

# Can also use the Dell/EMC iDRAC Repo . . .
  - name: Booting once using the custom built iso . . .
    hpilo_boot:
      host: "{{ ilo_ip }}"
      login: admin
      password: "{{ ilo_password }}"
      media: cdrom
      image: http://#.#.#.#/isos/{{ esxi_hostname }}.iso # <- Your webserver url should go here.
    delegate_to: localhost

After this, Ansible will simply wait 16 minutes, then we move on to Part 2: ESXi host configuration in vCenter.
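In the playbook that wait is just a timed pause. A minimal sketch of such a task, assuming the built-in pause module (the 16 minutes is what works on my hardware; tune it for yours):

  - name: Waiting for the automated ESXi installation to complete . . .
    pause:
      minutes: 16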

Part 2: ESXi Host Configuration in vCenter

The rest of this is “the easy part”. With the VMware modules that shipped after the Ansible 2.8 release, single touch became a reality. Now it’s a simple matter of including the vmware modules that do what you need for your standard configuration. I won’t include all of the code here; it’s self-explanatory. But let’s take the first example, vmware_host:

  - name: Adding ESXi host "{{ esxi_hostname }}.yourdomain.here" to vCenter . . .
    vmware_host:
      hostname: "{{ vcenter_hostname }}"
      username: "administrator@vsphere.local"
      password: "{{ vcenter_password }}"
      datacenter_name: "{{ datacenter_name }}"
      cluster_name: "{{ cluster_name }}"
      esxi_hostname: "{{ esxi_hostname }}.yourdomain.here"
      esxi_username: "root"
      esxi_password: "{{ esxi_password }}"
      state: present
      validate_certs: false
    delegate_to: localhost

Here, the module is vmware_host. The rest is the authentication and required info that the vmware_host module needs. One pretty common idea in “Ansible-world” is “state”: state: present means the host will be added into vCenter (and state: absent would remove it).

The vCenter and ESXi portion of the playbook:

  1. Adds the host to vCenter into the Cluster specified.
  2. Adds the license key to the Host.
  3. Adds vmnic1 to Standard vSwitch0
  4. Changes some Advanced Settings, including the Syslog Loghost
  5. Restarts syslog (required to save syslog settings).
  6. Sets Power Management to “High Performance”
  7. Adds a vmkernel port group for the vMotion interface
  8. Adds a vMotion kernel port with the proper IP Address
  9. Configures NTP, Starts the NTP Service, and sets it to start at boot.
  10. Adds vmnic2,vmnic3 to the vDS.
  11. Stops ESXi Shell Service and sets to disable at boot (Idempotent)
  12. Stops SSH Service and sets to disable at boot (Idempotent)
  13. Takes the host out of Maintenance Mode (Idempotent; see the sketch after this list)
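As one example of what those tasks look like, here is a sketch of the last step using the vmware_maintenancemode module. Treat it as illustrative rather than a copy of the repo’s task, and check the module docs for your Ansible version:

  - name: Exiting Maintenance Mode for host "{{ esxi_hostname }}.yourdomain.here" . . .
    vmware_maintenancemode:
      hostname: "{{ vcenter_hostname }}"
      username: "administrator@vsphere.local"
      password: "{{ vcenter_password }}"
      esxi_hostname: "{{ esxi_hostname }}.yourdomain.here"
      state: absent # absent takes the host out of Maintenance Mode
      validate_certs: false
    delegate_to: localhost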

And that's it! Easy-peasy, right? Questions? Hit me up on Twitter @RussianLitGuy or email me at bryansullins@thinkingoutcloud.org. I would love to hear from you.

18 thoughts

  1. Good question, and thanks for stopping by!

    The vars_prompt section of https://github.com/bryansullins/baremetalesxi/blob/master/fullmetalbuild.yml will prompt the user who runs the playbook for desired ESXi hostname, management ip, vmotion ip, ilo ip, and cluster_name. The rest of the variables should be defined in the vcentervars.yml file located here:

    https://github.com/bryansullins/baremetalesxi/blob/master/vars/vcentervars.yml

    vcentervars.yml is probably not the greatest name for it, now that I think about it, but you can add/change any variable info there since the playbook calls the vcentervars.yml file under the vars_files: section.

    Hope this helps!

    1. Hey Mitzi – Thanks for stopping by!

      Yes indeed, an Ansible build server is in order. Very astute!

      You would test/debug your playbooks on a separate Ansible server (CLI-style), get them to work and build your git repo there. Once you merge/push your git repo, you ensure the separate AWX server has matching config to run your playbooks (pip modules, virtualenv, etc.) and then your git repo will do the rest in an AWX Template.

      I also have a post on AWX here: https://thinkingoutcloud.org/2020/11/16/the-power-of-ansible-awx-a-k-a-the-free-ansible-tower/

      Hope this helps!

  2. One question about automating tasks for patching ESXi hosts via vCenter Lifecycle Manager. Is that something you can throw some light on? I have been looking at automating ESXi patching from vCenter using Ansible, but the articles I have seen so far are related to patching ESXi hosts via esxcli and installing VIBs on individual hosts, as opposed to using vCenter to remediate the hosts. Can you please help here?

    1. Thank you for stopping by!

      At this time, I don’t have enough experience with vLCM to take a stab at it directly.

      And, if you don’t want to (or can’t) use vSphere Update Manager (VUM) to remediate hosts, the only Ansible option I am aware of is (as you said) to use Ansible raw/shell to upload the VIB and install the updates that way. This can be automated using serial: 1 in the playbook if you want rolling restarts, but you really need to test it, because in my humble opinion that is a high-risk proposition. A rough sketch follows below.
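      Purely as an illustration of that serial idea (this is not from the repo; the inventory group, depot path, and profile name are placeholders), a rolling-patch play could be shaped like this, with Maintenance Mode and reboots still to be handled around it:

      - hosts: esxi_hosts
        gather_facts: false
        serial: 1 # patch one host at a time
        tasks:
          - name: Applying an ESXi image profile from an uploaded depot . . .
            raw: esxcli software profile update -d /vmfs/volumes/datastore1/patches/ESXi-depot.zip -p ESXi-X.X.X-standard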

      PowerCLI may be an option – Looks like there are improvements to PowerCLI 12.1 to include more vLCM, but I have not tried it myself:

      https://blogs.vmware.com/PowerCLI/2020/10/new-release-powercli-12-1-vlcm-enhancements.html

      The challenge with automated updates for ESXi is firmware. Not having tried it, I am hoping the vLCM will give people options. We use HPE Server Profiles with firmware baselines and with our nuke-and-pave ESXi-host-as-cattle approach, it’s completely seamless.

      Once we go to vSphere 7.0, I will revisit this. I hope this helps!

      1. Yes.

        We leverage HPE Image Streamer (https://support.hpe.com/hpesc/public/docDisplay?docId=a00003508en_us&docLocale=en_US) and Ansible to fully automate our ESXi host builds. We can decomm/rebuild any host to/from any vCenter (at the same site, of course) in 18 minutes.

        We treat ESXi as nothing more than a transient Compute Node. We don’t treat it as a static snowflake.

        Within the context of the firmware, each time the host is rebuilt, the firmware is automatically updated to match its baseline.

        And when we have new ESXi patch cycles, we simply rebuild them with a new image.

        The Server Profiles we use allow us to dynamically re-zone shared storage, but that’s still something we are testing.

  3. I’m looking at your script and it looks like a good starting point for our non-pxe environment. Do you really have 3 vmnics? First you add vmnic1 to your standard vswitch, later you add vmnic2/3 to the vDS.

    1. Good question, there are 4 total (vmnic0-vmnic3):

      vmnic0 – single (default) nic used to “bootstrap” the host so we can connect to it for the rest of the automation – this is done in the kickstart phase, so you don’t need to add it explicitly in the Ansible code.
      vmnic1 – added into the standard switch to team with the aforementioned vmnic0.

      vmnic2-3 – added as redundant nics into the vDS.
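      For the vDS piece specifically, a hedged sketch of what that task can look like using the vmware_dvs_host module (the vDS name is a placeholder; check the module docs for your Ansible version):

      - name: Adding vmnic2/vmnic3 as uplinks on the vDS for host "{{ esxi_hostname }}.yourdomain.here" . . .
        vmware_dvs_host:
          hostname: "{{ vcenter_hostname }}"
          username: "administrator@vsphere.local"
          password: "{{ vcenter_password }}"
          esxi_hostname: "{{ esxi_hostname }}.yourdomain.here"
          switch_name: your-vds-name-here
          vmnics:
            - vmnic2
            - vmnic3
          state: present
        delegate_to: localhost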

  4. Are you still using this with vSphere 7.0? I followed the steps I did with previous 6.7 versions, but now I get “Fatal error: 15 (Not found)” right after “Loading /tpm.v00”. I retried the steps multiple times and I don’t know what I could have done wrong. Any changes that have to be made for 7.0? I copied and modified the 2 boot.cfg files as I did with 6.7.

      1. Oh well… with “unmodified” I mean not modified by Ansible scripts, just to be clear.

      2. How embarrassing, I simply forgot to change the src_iso_file variable and changed only esxi_build. Please delete all my other posts as this is too much noise 😉

  5. Thanks Bryan, this is the same as what I am creating right now. But I am thinking of not creating an ISO on every build. Just a single ISO that accepts dynamic variables, like passing the hostname, IP, etc. as arguments into the kickstart. Any thoughts?

    Thanks,
    Warren

    1. Yes – that can be done, but in the interest of full disclosure, I couldn’t get it to work. If you do, I would love to see how you did! You would need to:

      1. Have an available networked storage location (NFS/HTTP works) for both the single ISO and the kickstart file.
      2. Modify the line in the fullmetalbuild.yml and the baremetalesxi/files/9484548/boot.cfg to reflect the new changes (roughly as in the sketch below).

      Those will take some kickstart knowledge, so hit up my email or DMs if you have questions.
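      As a rough sketch of the boot.cfg side of that change (illustrative only; the URL is a placeholder), the kernelopt line would point at a kickstart served from the webserver instead of one baked into the ISO, and the per-host values would then live in that served ks.cfg:

      kernelopt=runweasel ks=http://#.#.#.#/kickstarts/ks.cfg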

  6. How can we add the DNS server IP, DNS suffix, and NTP server during the ESXi installation (in case we don’t have a vCenter)?

    1. There are two ways you can do that, both have advantages and disadvantages.

      The first way is to alter the main YAML file to set up those items using kickstart. First, create the prompts for those under vars_prompt in this YAML file:

      https://github.com/bryansullins/baremetalesxi/blob/master/fullmetalbuild.yml

      Then, under the task:

      - name: Creating kickstart file with proper automation contents . . .

      Under “content” you can inject the parameters for DNS and NTP, etc. This will set up the ESXi hosts using kickstart.

      Or, with the tasks that configure the host post-boot, you can specify the ESXi host instead of vCenter server (last I checked, those Ansible modules are flexible enough to do that). So, for example, under:

      - name: Configuring NTP servers for host "{{ esxi_hostname }}.yourdomain.here" . . .
        vmware_host_ntp:
          hostname: "{{ vcenter_hostname }}"
          username: "administrator@vsphere.local"
          password: "{{ vcenter_password }}"
          esxi_hostname: "{{ esxi_hostname }}.yourdomain.here"
          ntp_servers:
            - time.nist.gov
          validate_certs: false
        delegate_to: localhost

      Specify the ESXi host info under hostname, username, and password instead of the vCenter info. But without vCenter, it means you’d have to do that for all tasks.
