Guidelines For Digital Transformation Part 2: Transform and Roll Out

Part 1 was about changing your company’s culture and, in doing so, setting the stage for your next steps. Here I will go through what kinds of projects qualify as Digital Transformation.

My disclaimer here is that this post is by no means meant to be a deep dive, but the guidelines listed here will definitely be a good start on your path to Digital Transformation.

But First, An Anecdote . . .

My first ever IT Admin job was at a non-profit organization that assisted people with disabilities and operated mostly on donations. It just so happened that I signed on right after we had received a huge donation. In one of the very first meetings I attended, the CEO detailed where all the money went. The money for IT was . . . to put it lightly . . . very small.

I was bold enough to ask the CEO, privately, about why the IT budget was so small by comparison. His answer was straightforward:

“We publish what we spend donations on to the public, and donors want to see that it goes to smiling kids in wheelchairs. If they see that it goes to something intangible, like computer equipment for our business, they get angry about it.”

I know, right?

Now, I will admit wholeheartedly that I have a bias here, but, first of all, there are ways to make the intangible tangible. That’s the second time I’ve linked that post in this series, by the way.

Second of all, leadership usually balks at a larger investment in IT. They don’t see the benefit because, to them, IT is a cost center that doesn’t “generate revenue”. But if you invest properly, there should be a big ROI.

So, in the interest of full transparency, I am not saying that Digital Transformation is inexpensive. It is not. One of my jobs is to deal with the sticker shock of going from a “small mom and pop shop” to “an Enterprise.” The key, ultimately, is to convince people going through this type of transformation that “running your business like an Enterprise” means that you have to “invest like an Enterprise,” and that’s not an easy pill to swallow.

Notice that I didn’t say, “spend like an Enterprise”. I said “invest like an Enterprise”. The latter means that you will get that money back and it will be profitable if you invest it right.

Or, to put it more bluntly, as was stated by one of my colleagues recently:

“If you want to run with the big dawgs, you have to invest big dawg money.”

Here’s just one example: the companies that get it right realize that they have to invest in data collection at the very least. What do your customers spend their money on? How can you forecast what they will spend their money on in the future? How do you make the right decisions about the business, period?

The answer is some form of Enterprise-level data collection. Do you want disparate spreadsheets stitched together by humans? Or do you want a robust dashboard backed by a database that is centralized and highly available?
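To make that a little more concrete, here is a toy sketch of “one central place with a query on top” versus scattered spreadsheets. Python’s built-in sqlite3 stands in for whatever centralized, highly available database you would actually run, and the table and numbers are invented:

```python
# Toy illustration only: sqlite3 stands in for a real centralized, highly available database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, category TEXT, amount REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [("acme", "widgets", 1200.00, "2023-01-05"),
     ("acme", "support", 300.00, "2023-01-20"),
     ("globex", "widgets", 950.00, "2023-01-11")],
)

# One query answers "what do customers spend their money on?" -- no spreadsheet merging required.
query = "SELECT category, SUM(amount) AS total FROM purchases GROUP BY category ORDER BY total DESC"
for category, total in conn.execute(query):
    print(category, total)  # widgets 2150.0, support 300.0
```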

We could also start down the rabbit hole of Business Continuity if you want (which I will later), but let’s table that for now.

And yes, all you higher-ups, I am sure every Department says this: The Marketing Department says, “You want to run with the big dawg advertisers, you have to invest big dawg advertising money!” Manufacturing says, “You want to run with the big dawg manufacturers, you have to invest big dawg manufacturing money!”

So I get it, hands are out everywhere. Hopefully with Part 1 and Part 2 here, you will at least have an idea of where your money can be invested with IT. See this as a buffet from which you can pick and choose, or you can prioritize some things over others. You don’t have to **insert buzzword eyeroll here** “boil the ocean”.

The NIST Definition of a Cloud

Way back in 2011, NIST defined what qualifies as a “Cloud”. There are five essential characteristics in the list:

  1. On-Demand Self-Service
  2. Broad Network Access
  3. Resource Pooling
  4. Rapid Elasticity
  5. Measured Service

A friendly reminder that these are not limited to Public Clouds. These can all be implemented on your Private Cloud as well. In fact, that’s kind of the idea behind Digital Transformation: try to get the same kinds of efficiencies and expectations in your own datacenter as you would get in the Public Cloud.

Furthermore, I want to add two more, in my professional opinion: Business Continuity and Secure by Default. But hey, it was 2011 and arguably those are a given, so who am I to criticize NIST?

IMPORTANT INTERJECTION! Enterprise Documentation and Tribal Knowledge

I hope I talked about this in Part 1, but FFS don’t let people keep shit to themselves. It’s another one of my (many?) pet peeves about Engineers. I used to block off Fridays for documentation. All day. Digital Transformation also means being resilient to people leaving. The next person should be able to pick up right where their predecessor left off.

I know how hard it is to get people to document, but I dare say someone should be put on a PIP if they don’t do it, just to let you know how strongly I feel about it. But hey, I am just a lowly Sales Engineer, so if not that, find some way to incentivize the sharing of knowledge.

I am aware of at least two large Enterprises who let people go because they were territorial and didn’t share, so there’s that.

One More Thing and Then Let’s Talk About the List

To interject my own selfish take on things, a strong opinion I have, however weakly held, is that if you have the word Senior in your title, particularly in Infrastructure, you should not be doing mundane, stupid crap like firmware updates. You should be doing cool stuff like IaC or DevOps or Kubernetes. And you should want to do all the cool stuff.

If your company “doesn’t have the head count to offload firmware updates,” then get to coding. Firmware is one of the easiest things to automate with the biggest payoff. It’s a good place to start your Digital Transformation journey.
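As one possible starting point, here is a minimal sketch of a firmware inventory pass over the DMTF Redfish API that most modern server BMCs expose. The BMC list and credentials are placeholders you would pull from your CMDB, and you should verify the exact resource paths against your vendor’s Redfish implementation:

```python
# Sketch: collect firmware versions from each server's BMC via the standard Redfish API.
# HOSTS and AUTH are placeholders; verify=False is for lab use only.
import requests

HOSTS = ["bmc01.example.com", "bmc02.example.com"]  # hypothetical BMC addresses
AUTH = ("svc-firmware", "not-a-real-password")      # hypothetical service account

def firmware_inventory(bmc: str) -> dict:
    base = f"https://{bmc}/redfish/v1/UpdateService/FirmwareInventory"
    index = requests.get(base, auth=AUTH, verify=False, timeout=30).json()
    inventory = {}
    for member in index.get("Members", []):
        item = requests.get(f"https://{bmc}{member['@odata.id']}",
                            auth=AUTH, verify=False, timeout=30).json()
        inventory[item.get("Name", "unknown")] = item.get("Version", "unknown")
    return inventory

if __name__ == "__main__":
    for bmc in HOSTS:
        print(bmc, firmware_inventory(bmc))  # compare against your desired baseline from here
```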

Many of my points below are from my own selfish desire to not do mundane stupid crap; I am sure you’ll be able to see that in this post.

Also, something less selfish and more about “doing things right” is a friendly reminder about the SRE O’Reilly book I mentioned in Part 1. See the section on “Automating Away Toil.”

On-Demand Self-Service and Broad Network Access

Can your users create their own VMs or other resources through a unified and centralized interface (think ServiceNow)?

The theme here, again, is to not have Engineers do mundane stupid crap. Also, if your Engineers right-click a VM Template and manually configure it, they’re doing it wrong. Automate the entire process, including Template creation. Pipeline it, GitOps it, IaC it. The only time someone should be logging into something directly is for troubleshooting or if they’re using jump servers.
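To illustrate one way to do it (sketched here with pyVmomi since we are talking vSphere), here is a minimal clone-from-template routine that could sit behind a pipeline and a self-service catalog item. The vCenter address, credentials, template/pool/folder names, and the quota number are placeholders; treat this as a sketch, not a drop-in implementation:

```python
# Minimal pyVmomi sketch of "no right-clicking templates": clone VMs programmatically so the
# whole flow can live in a pipeline behind a self-service portal. All names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

MAX_VMS_PER_REQUEST = 5  # a simple guardrail against "40 instead of 4"

def find_obj(content, vimtype, name):
    """Return the first inventory object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

def clone_from_template(count: int, prefix: str = "app-vm"):
    if count > MAX_VMS_PER_REQUEST:
        raise ValueError(f"Refusing to create {count} VMs; limit is {MAX_VMS_PER_REQUEST}")
    ctx = ssl._create_unverified_context()  # lab only; use real certificates in production
    si = SmartConnect(host="vcenter.example.com", user="svc-provision",
                      pwd="not-a-real-password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        template = find_obj(content, vim.VirtualMachine, "rhel9-golden-template")
        pool = find_obj(content, vim.ResourcePool, "SelfService")
        folder = find_obj(content, vim.Folder, "self-service-vms")
        spec = vim.vm.CloneSpec(location=vim.vm.RelocateSpec(pool=pool), powerOn=True)
        return [template.Clone(folder=folder, name=f"{prefix}-{i:02d}", spec=spec)
                for i in range(count)]  # returns vCenter Task objects to wait on
    finally:
        Disconnect(si)
```

Note the quota check at the top; that is the first of the guardrails I will get to in a minute.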

Furthermore, while we’re at it, let’s bang out “Broad Network Access”. Sure, we have the more obvious “network access is wide-ranging,” but we also need to ensure we are removing barriers to execution: it’s kind of hard to deliver a VM in an automated way if there’s no Network Automation. Adding VMs to Load Balancers or adding Firewall Rules come to mind. This is also where products like NSX come in handy (by the way, I want to be transparent about this and state that yes, I work at VMware).
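To show the shape of it, here is a sketch of a post-provision hook that adds a new VM to a load balancer pool and opens its service port. The endpoint, token, and names are hypothetical stand-ins for whatever your load balancer and firewall (NSX or otherwise) actually expose:

```python
# Sketch of the "no VM delivery without Network Automation" point. Everything here is a
# hypothetical internal API; substitute the real NSX/load balancer/firewall calls you use.
import requests

NETWORK_API = "https://netauto.example.com/api/v1"    # hypothetical automation endpoint
TOKEN = {"Authorization": "Bearer not-a-real-token"}  # hypothetical service credential

def add_to_pool(pool: str, ip: str, port: int = 443) -> None:
    """Add the new VM as a member of a load balancer pool."""
    requests.post(f"{NETWORK_API}/lb-pools/{pool}/members", headers=TOKEN,
                  json={"ip": ip, "port": port}, timeout=30).raise_for_status()

def allow_inbound(rule_set: str, ip: str, port: int = 443) -> None:
    """Open the service port to the new VM in the relevant firewall rule set."""
    requests.post(f"{NETWORK_API}/firewall/{rule_set}/rules", headers=TOKEN,
                  json={"dest": ip, "port": port, "action": "allow"}, timeout=30).raise_for_status()

# Called by the provisioning pipeline right after the clone task finishes:
# add_to_pool("web-prod", "10.20.30.41"); allow_inbound("web-prod", "10.20.30.41")
```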

Also, don’t forget about the provisioning of storage. That will need to be automated too.

And while you’re at it, you should also include some form of Configuration Drift Remediation. Something like SaltStack Config might help you there.

Boom! There’s product name drop #2 for you!
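To illustrate what drift remediation actually means, here is a toy version of the idea: compare actual state to desired state and converge. A tool like SaltStack Config does this for real, at scale; the NTP get/set functions below are placeholders:

```python
# Toy drift remediation: desired state wins. The get/set functions are placeholders for
# however you actually read and push configuration (Salt, Ansible, an API, etc.).
DESIRED_NTP = ["ntp1.example.com", "ntp2.example.com"]  # hypothetical desired state

def get_ntp_servers(host: str) -> list:
    raise NotImplementedError("read the host's actual NTP configuration here")

def set_ntp_servers(host: str, servers: list) -> None:
    raise NotImplementedError("push the desired NTP configuration here")

def remediate(host: str) -> bool:
    """Return True if the host had drifted and was corrected."""
    if sorted(get_ntp_servers(host)) != sorted(DESIRED_NTP):
        set_ntp_servers(host, DESIRED_NTP)
        return True
    return False
```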

Additional things you’ll need here are IPAM and a CMDB/DCIM.

The usual objection people will have about Self-Service is guardrails. “Duhhhhhh, what if someone accidentally creates 40 VMs when they meant to create 4, DUUUUUURRRRRR!”

Well, I have three fixes for those things. The first is you should include guardrails throughout the entire process and in the code, and the other two are . . . .

Resource Pooling and Measured Service

I am also going to piggyback Logging and Monitoring here. Of course things are going to scale up and grow, so you need resource pooling: an organized place for things to go that is centrally available for consumption. Resources should be categorized and secured from each other.

Logging and Monitoring is not just about monitoring performance; it’s also about Governance: monitoring costs and standards.

And by the way, remember those VMs that get created through Self-Service? How about automatically configuring them for logging and monitoring out of the gate, at creation?
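Something like the following could run as a post-provision step right after the clone finishes. The collector address and monitoring endpoint are placeholders; the point is that a new VM shows up in logging and monitoring without a human touching it:

```python
# Sketch: enroll a freshly provisioned VM in logging and monitoring at creation time.
# The collector, API endpoint, and tags are placeholders.
import requests

LOG_COLLECTOR = "logs.example.com"                           # hypothetical central collector
MONITORING_API = "https://monitor.example.com/api/v1/hosts"  # hypothetical registration endpoint

def rsyslog_forwarding_conf(collector: str = LOG_COLLECTOR) -> str:
    """Render an rsyslog drop-in that forwards everything to the central collector over TCP."""
    return f"*.* @@{collector}:514\n"

def register_with_monitoring(hostname: str, ip: str) -> None:
    """Tell the monitoring platform about the new VM so dashboards and alerts exist on day one."""
    requests.post(MONITORING_API, timeout=30,
                  json={"hostname": hostname, "ip": ip, "tags": ["self-service", "auto-enrolled"]}
                  ).raise_for_status()
```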

The Measured Service part is also known as “Showback” or “Chargeback,” which can be a useful tool no matter where you run your stuff.

I feel compelled to talk about right-sizing while we are at it. You want Application owners to stop over-sizing their VMs? Start charging them for it.
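The math behind that is not complicated. The rates and sizes below are invented, but once usage is measured, billing the application owner for it is just arithmetic:

```python
# Toy showback/chargeback math with invented rates.
RATES = {"vcpu": 15.00, "ram_gb": 5.00, "disk_gb": 0.10}  # hypothetical $/month

def monthly_charge(vcpus: int, ram_gb: int, disk_gb: int) -> float:
    return round(vcpus * RATES["vcpu"] + ram_gb * RATES["ram_gb"] + disk_gb * RATES["disk_gb"], 2)

# An over-sized 16 vCPU / 64 GB VM vs. the 4 vCPU / 16 GB it actually needs:
print(monthly_charge(16, 64, 200))  # 580.0 per month
print(monthly_charge(4, 16, 200))   # 160.0 per month
```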

Rapid Elasticity

Rapid Elasticity means that you can automatically scale up or scale down based on need. Provided you plan for it, you can do this at every level, including at the hardware layer. Virtualization itself allows for scaling at the VM level, but through automation, you can scale up/down ESXi through what’s referred to as a “nuke and pave” or “deploy and destroy” model.

This allows you, within minutes, to provision/reprovision/decommission ESXi hosts to/from various clusters. Got a sale coming up in retail and you are going to need more hosts in the peaking cluster? Reprovision ESXi hosts from one cluster to another with a button click. At my last job, I took 8 ESXi hosts from bare metal to VM-ready and into a cluster in 90 minutes. It didn’t happen overnight and it wasn’t easy, but that work saved us almost $200,000 a year since our ESXi hosts were plug-and-play across the datacenter.

Again, this is accomplished through models where you code; the ideal situation is that all the robots do all the work. Through event-driven automation you can even autoscale. But, we’re getting off-track.

A warning: the hardest part about the “nuke and pave” approach is selling people on the idea. People get emotionally attached to their robots. Start small and iterate.
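If you do start small, the loop itself is conceptually simple. Here is a sketch of the event-driven version; the metric source and the provisioning/reclaim functions are placeholders for whatever your monitoring platform and IaC pipelines actually expose:

```python
# Sketch of event-driven "nuke and pave" scaling. Thresholds, cluster name, and the three
# placeholder functions are assumptions; wire them to your monitoring and pipelines.
import time

SCALE_UP_AT = 0.80    # add a host when average cluster CPU crosses 80%
SCALE_DOWN_AT = 0.40  # reclaim a host when it drops below 40%

def cluster_cpu_utilization(cluster: str) -> float:
    raise NotImplementedError("query your monitoring platform here")

def provision_host_into(cluster: str) -> None:
    raise NotImplementedError("kick off the bare-metal-to-ESXi pipeline here")

def reclaim_host_from(cluster: str) -> None:
    raise NotImplementedError("evacuate, wipe, and return the host to the spare pool")

def autoscale(cluster: str = "retail-peaking") -> None:
    while True:
        util = cluster_cpu_utilization(cluster)
        if util > SCALE_UP_AT:
            provision_host_into(cluster)
        elif util < SCALE_DOWN_AT:
            reclaim_host_from(cluster)
        time.sleep(300)  # re-evaluate every five minutes
```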

Secure by Default and Business Continuity

I don’t want to turn this post into a preachy rant on ensuring everything is secure. I’ve kind of already done that. But just remember that security practices can be included at creation time with your code. Follow the principles you are supposed to follow (Defense in Depth, Principle of Least Privilege) and ensure they are baked into the cake.
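One way to bake it in: validate and correct every self-service request against a hardening baseline before anything gets built. The policy knobs below are invented examples of the kinds of defaults you might enforce:

```python
# Toy "secure by default" gate: the hardening baseline always wins over whatever was requested.
BASELINE = {
    "password_ssh_allowed": False,   # keys only
    "default_firewall": "deny-all",  # open ports explicitly, per request
    "admin_access": "role-based",    # least privilege, no shared root
}

def enforce_baseline(request: dict) -> dict:
    """Merge the hardening baseline into the request; baseline values override the request."""
    return {**request, **BASELINE}

print(enforce_baseline({"name": "app-vm-01", "password_ssh_allowed": True}))
# -> password_ssh_allowed comes back False before anything is provisioned
```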

And I will admit wholeheartedly that “business continuity” is a loaded phrase, and one of the hardest things to get approved. It’s expensive, but it’s (hopefully) extremely rare that you’d have to use it.

But it’s necessary and should be baked into the cake. The question is how much and how responsive? What RTO/RPO can you define and how do you get there? What’s your DR site going to be? Should you build your own Datacenter, or should you use the cloud? That’s all up to you, but as I mentioned in Part 1, work with your Leadership on scenarios where stuff goes down and how much it costs. Most people are not willing to take that kind of a hit. DR/Backups and whatnot are just like paying for insurance: you will rarely use it, but you will be glad that it’s there.
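If you need numbers for that Leadership conversation, the back-of-the-napkin version looks like this. Every figure below is invented; plug in your own:

```python
# Invented numbers to frame the DR conversation: cost of downtime vs. cost of shrinking it.
REVENUE_PER_HOUR = 50_000    # hypothetical revenue impact while the system is down
CURRENT_RTO_HOURS = 24       # "restore from backups and pray"
TARGET_RTO_HOURS = 2         # with a warm DR site
DR_ANNUAL_COST = 250_000     # hypothetical annual cost of that DR capability

outage_cost_now = REVENUE_PER_HOUR * CURRENT_RTO_HOURS      # 1,200,000 per incident
outage_cost_with_dr = REVENUE_PER_HOUR * TARGET_RTO_HOURS   # 100,000 per incident
print(outage_cost_now - outage_cost_with_dr)  # 1,100,000 avoided per incident vs. 250k/year spent
```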

Hopefully this gives you a good place to start. Call me if you need me. See you next time.

Hit me up on twitter @RussianLitGuy or email me at bryansullins@thinkingoutcloud.org. I would love to hear from you!
