A disclaimer: There are probably other ways to solve what follows here, but this is how I solved it. Also, I do not claim to be a guru at bash scripting, but what I talk about here should give you some ideas . . .
Ever felt like you had a pretty good handle on things . . . then all of a sudden you realize you don’t?
Worse, have you ever felt, after figuring something out that might be considered, “rudimentary,” you think to yourself . . . “Uh . . . maybe I should have known that?”
It’s been one of those weeks for me . . .
I have been using Ansible for about 5 years, and so far I have apparently been fortunate enough that my playbooks use a single user, or at least run everything using become: yes.
Additionally, my bash commands, outside of my ESXi Single Touch playbooks, have been one or two liners. Therefore, I have been able to simply become the root user, do the things root needs to do, then I am out.
This week, I ran into a case where I had a set of bash-level commands (mostly find and cp) that could only be run as root (or had root-only permissions), but then the final command (after the aforementioned file preparation) absolutely could not be run as root. Here are the details as a problem statement:
The Scenario:
- A file has been copied up to a bunch of Ubuntu VMs using a file sync tool into a directory that has root-only permissions. This was a testing scenario, so the number of VMs was 14. This is a very small subset of what is in prod.
- This file synchronization could have failed.
- Copying up the file is going to take some time – it’s an 800MB file and these VMs are across different geographical locations. It’s not really practical to just say, “screw it, I’ll just copy the file up to every machine even if it’s there already.” This would take FOREVER across every machine.
- This file just so happens to reside in a randomly named directory, so we have to find the file in the directory location, then copy it to a known location.
- And finally, we have to execute a command on the file that cannot be run as root.
I am not sure if I have ever tried to switch user accounts mid-playbook, as it were. Or, more likely there has been some other kind of workaround, like, using sudo, or running everything as a user that does have permissions. . . . Or creating one. . . . Or it’s, “not recommended to run that command across 200 machines as root, but go ahead WHATEVS LOL”. And so on.
This time, I did not have any of the above as options. And I think the main factor in this case is that I had to save command outputs to variables, which has its own peccadilloes.
The Process – bash Scripting in Ansible
First, let’s back up and talk about the process. It is this blogger’s opinion (I have always wanted to say that), that there are three major options for running bash commands through Ansible on target machines:
- Use the shell, raw, or command modules to run the bash commands on the target machines. Each of those three has its advantages and disadvantages, all of which go beyond this post.
- Make a bash script as a *.sh file, copy it up to the target machine, and run it as a bash script, like you would any other script. This also has its advantages and disadvantages, all of which go beyond this post.
- Use the shell/copy modules to script the commands, but use Ansible register to save the results and use Ansible Blocks for error checking, which is part of this post.
Take the following bash code:
TARFILELOC=$(sudo find /srv/app/ -name appinstall.tar)
DESTFOLDER=/opt/app/dest_folder
sudo cp "$TARFILELOC" "$DESTFOLDER"
You could, for example, use the shell module with something like this:
tasks:
  - name: Find app tarball and copy to the proper destination
    shell: |
      TARFILELOC=$(sudo find /srv/app/ -name appinstall.tar)
      DESTFOLDER=/opt/app/dest_folder
      sudo cp "$TARFILELOC" "$DESTFOLDER"
This has worked for me. Quite well actually. And maybe as a “one-off” it might be OK. But, it’s not really the “Ansible” way.
First of all, this goes into a vacuum. I suppose there is a way to error check through bash, but remember that that would happen beyond the control of Ansible. This is what I would call the, “have fun storming the castle” method. You are sending this off to the machine and hoping for the best, as it were. I am not saying it’s wrong, but you have less control with this method.
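If you did want to error check through bash, a minimal sketch might look like the following. It assumes bash is at /bin/bash and GNU find is on the target (both true on a stock Ubuntu box), and it swaps the inline sudo for become. Treat it as one way to do it, not the way:

```yaml
tasks:
  - name: Find app tarball and copy it, failing loudly if anything goes wrong
    shell: |
      set -euo pipefail
      # -print -quit stops at the first match (GNU find)
      TARFILELOC=$(find /srv/app/ -name appinstall.tar -print -quit)
      # find exits 0 even when it matches nothing, so check for an empty result
      if [ -z "$TARFILELOC" ]; then
        echo "appinstall.tar not found under /srv/app/" >&2
        exit 1
      fi
      cp "$TARFILELOC" /opt/app/dest_folder
    args:
      executable: /bin/bash   # the shell module defaults to /bin/sh
    become: true
```

Ansible only sees the final exit code and whatever landed on stdout/stderr, which is exactly the "less control" problem.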
Instead, and this is the lesson overall here, you should always try to use Ansible mechanisms for doing things. This gives you much more control and is a much better method for using the data long term in the playbook – the output can be captured this way, and used throughout the playbook, even for other tasks. The following is much more like it:
vars:
  dest_folder: /opt/app/dest_folder
tasks:
  - name: Finding tar location . . .
    shell: |
      find /srv/app/ -name appinstall.tar
    register: tar_loc
  - name: Copy tarball to destination folder . . .
    copy:
      src: "{{ tar_loc.stdout }}"
      dest: "{{ dest_folder }}"
      remote_src: yes
Notice that we are using Ansible register here to refer to the results repeatedly.
One thing I am missing here is error checking in the form of failed_when, and I am aware. Right now, let’s just focus on what we have – I am getting to error-checking.
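In the meantime, here is a rough sketch of what a failed_when check on the find task might look like. The condition is my assumption about what "failed" should mean here: find returns 0 even when it matches nothing, so empty output is treated as a failure too.

```yaml
tasks:
  - name: Finding tar location . . .
    shell: |
      find /srv/app/ -name appinstall.tar
    register: tar_loc
    # Fail on a non-zero exit code OR when find printed nothing at all
    failed_when: tar_loc.rc != 0 or tar_loc.stdout | length == 0
    # A search never changes the target, so do not report "changed"
    changed_when: false
```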
Going the Next Level: Ansible Blocks
The most recent use case, as described above, caused me to do some research (read: marathon Google sessions). The result of that research gave me the power of Ansible Blocks. Remember, I need to run a whole series of commands as root, then switch back to a lesser-privileged user. That would look like this:
vars:
  dest_folder: /opt/app/dest_folder
tasks:
  - block:
      - name: Finding tar location . . .
        shell: |
          find /srv/app/ -name appinstall.tar
        register: tar_loc
      - name: Copy tarball to destination folder . . .
        copy:
          src: "{{ tar_loc.stdout }}"
          dest: "{{ dest_folder }}"
          remote_src: yes
    become: true
    become_user: root
    become_method: sudo
  - name: Non-root commands from the authenticated user can be run here
    shell: /lesser/privileged/user/command/goes/here --path "{{ dest_folder }}"
I am not saying you can’t switch by other means, but your best bet is to use the Ansible methods for capturing the data you need. Why?
Variable Scope would be one reason. Any Ansible variables or registers that are created in the block are available elsewhere in the playbook, even though you switched *nix users.
YEAH! WHAT DO YOU THINK OF THAT?
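A stripped-down sketch of that scoping behavior, reusing the tar_loc register from above (the debug task is just my stand-in for whatever the non-root step really is):

```yaml
tasks:
  - block:
      - name: Finding tar location . . .
        shell: find /srv/app/ -name appinstall.tar
        register: tar_loc
    become: true    # only the tasks inside the block run as root

  - name: The registered variable is still available outside the block
    debug:
      msg: "Found tarball at {{ tar_loc.stdout }}"
```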
It Gets Better: Error Checking with Ansible Blocks
As it turns out, Ansible Blocks have an added benefit of allowing you to do more robust error checking using block/rescue/always. From the Ansible documentation, linked earlier:
“Blocks also introduce the ability to handle errors in a way similar to exceptions in most programming languages.”
So what happens if my find command fails, for example? I can take additional action, like maybe copy the file from somewhere else, and so on:
vars:
  dest_folder: /opt/app/dest_folder
tasks:
  - block:
      - name: Finding tar location . . .
        shell: |
          find /srv/app/ -name appinstall.tar
        register: tar_loc
    rescue:
      - name: Copy the file from a remote server . . .
        copy:
          src: /srv/myfiles/appinstall.tar
          dest: "{{ dest_folder }}"
          owner: foo
          group: foo
          mode: u+rw,g-wx,o-rwx
    always:
      - name: Copy tarball to destination folder . . .
        copy:
          src: "{{ tar_loc.stdout }}"
          dest: "{{ dest_folder }}"
          remote_src: yes
    become: true
    become_user: root
    become_method: sudo
  - name: Non-root commands from the authenticated user can be run here
    shell: /lesser/privileged/user/command/goes/here --path "{{ dest_folder }}"
Hopefully, you get the idea. There are a multitude of options . . .
Maybe instead you want to be emailed if the find task fails and deal with it by other means. Or you could still use blocks, let it fail, and simply run the playbook again after you’ve remediated it.
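As a rough sketch of the email idea, something like the following could work. It assumes the community.general collection is installed for the mail module, and the SMTP host and recipient address are hypothetical placeholders. The failed_when condition is also my assumption, since find exits 0 even when it finds nothing:

```yaml
tasks:
  - block:
      - name: Finding tar location . . .
        shell: find /srv/app/ -name appinstall.tar
        register: tar_loc
        failed_when: tar_loc.rc != 0 or tar_loc.stdout | length == 0
    rescue:
      - name: Email a human about the missing tarball
        community.general.mail:
          host: smtp.example.com    # hypothetical mail relay
          port: 25
          to: ops@example.com       # hypothetical recipient
          subject: "appinstall.tar missing on {{ inventory_hostname }}"
          body: "find did not locate appinstall.tar under /srv/app/."
        delegate_to: localhost      # send from the control node
        become: false
      - name: Stop this host so the playbook can be rerun after remediation
        fail:
          msg: "appinstall.tar was not found; fix the sync and run the playbook again."
    become: true
```

The explicit fail task matters: without it, a successful rescue would mark the host as recovered and the playbook would carry on.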
Hit me up on twitter @RussianLitGuy or email me at bryansullins@thinkingoutcloud.org. I would love to hear from you!