The New Stack Podcast

Automation for Cloud Optimization

Episode Summary

During the pandemic, many organizations sped up their move to the cloud without fully understanding the costs, both human and financial, they would pay for the convenience and scalability of a digital transformation. “They really didn’t have a baseline,” said Mekka Williams, principal engineer at Spot by NetApp, in this episode of The New Stack Makers podcast. “And so those first cloud bills, I'm sure, were shocking, because you don't get a cloud bill when you run on your on-premises environment, or even your private cloud, where you've already paid the cost for the infrastructure that you're using.” What’s especially worrisome is that many of those costs are simply wasted, Williams said. “Most of the containerized applications running in Kubernetes clusters are running underutilized,” she said. “And anything that's underutilized in the cloud equates to waste. And if we want to be really lean and clean and use resources in a very efficient manner, we have to have really good cloud strategy in order to do that.” This episode of The New Stack Makers, hosted by Heather Joslyn, TNS features editor, focused on CloudOps, which in this case stands for “cloud operations.” (It can also stand for “cloud optimization,” but more about that later.) The conversation was sponsored by Spot by NetApp.

Episode Notes

During the pandemic, many organizations sped up their move to the cloud — without fully understanding the costs, both human and financial, they would pay for the convenience and scalability of a digital transformation.

 

“They really didn’t have a baseline,” said Mekka Williams, principal engineer at Spot by NetApp, in this episode of The New Stack Makers podcast. “And so those first cloud bills, I'm sure, were shocking, because you don't get a cloud bill when you run on your on-premises environment, or even your private cloud, where you've already paid the cost for the infrastructure that you're using.”

 

What’s especially worrisome is that many of those costs are simply wasted, Williams said. “Most of the containerized applications running in Kubernetes clusters are running underutilized,” she said. “And anything that's underutilized in the cloud equates to waste. And if we want to be really lean and clean and use resources in a very efficient manner, we have to have really good cloud strategy in order to do that.”
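
To make the idea of underutilization concrete, here is a minimal Python sketch of how a team might estimate the idle share of the CPU it has reserved. The `Workload` type, the workload names and all of the numbers are hypothetical and are not taken from the episode or from any Spot by NetApp product.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_requested: float  # cores reserved for the workload (hypothetical figure)
    cpu_used: float       # average cores actually consumed (hypothetical figure)


def wasted_fraction(w: Workload) -> float:
    """Return the share of requested CPU that sits idle (0.0 means fully used)."""
    if w.cpu_requested <= 0:
        return 0.0
    # Clamp at zero so a workload that bursts above its request is not "negative waste".
    return max(0.0, 1.0 - w.cpu_used / w.cpu_requested)


workloads = [
    Workload("checkout", cpu_requested=4.0, cpu_used=1.0),
    Workload("search", cpu_requested=2.0, cpu_used=1.5),
]

for w in workloads:
    print(f"{w.name}: {wasted_fraction(w):.0%} of requested CPU idle")
```

In a consumption-based environment, that idle share is capacity that is paid for but never used, which is the sense in which underutilization "equates to waste."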

 

This episode of The New Stack Makers, hosted by Heather Joslyn, TNS features editor, focused on CloudOps, which in this case stands for “cloud operations.” (It can also stand for “cloud optimization,” but more about that later.)

 

The conversation was sponsored by Spot by NetApp.

 

Automation for Cloud Optimization

 

Many organizations that moved quickly to the cloud during the dog days of the pandemic have begun to revisit the decisions they made and update their strategies, Williams said.

 

“We see some organizations that are trying to modernize their applications further, to make better use of the services that are available in the cloud,” she said. “The cloud is getting more complex as they grow and mature in their journey.

 

“And so they're looking for ways to simplify their operations. And, as always, keep their costs down. Keep things simple for their DevOps and SREs, not incur additional technical debt, but still make the best use out of their cloud, wherever they are.”

 

Automation holds the key to CloudOps — both definitions — according to Williams. For starters, it makes teams more efficient.

 

“The less tasks that your workforce have to perform manually, the more time they have to spend focused on business logic and being innovative,” Williams said. “Automation also helps you with repeatability. And it's less error-prone, and it helps you standardize. Really good automation simplifies your environment greatly.”

 

Automating repetitive tasks can also help protect your site reliability engineers (SREs) from burnout, she said.

 

Practicing “good data hygiene,” Williams said, also helps contain costs and reduce toil: “Making sure you're using the right tier of data, making sure you're not over-provisioned. And the type of storage you need, you don't need to pay top dollar for high-performing storage, if it's just backup data that doesn't get accessed that often.”
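
As an illustration of matching the storage tier to how often data is read, the following Python sketch maps an access rate to a tier. The tier names and thresholds are assumptions chosen only for the example; real cloud providers define their own tiers, prices and retrieval rules.

```python
def suggest_tier(reads_per_month: int) -> str:
    """Suggest a storage tier from how often the data is read.

    The thresholds below are hypothetical and exist only to show the idea.
    """
    if reads_per_month >= 100:
        return "hot"      # frequently accessed: pay for performance
    if reads_per_month >= 1:
        return "cool"     # occasionally accessed: cheaper, slower
    return "archive"      # rarely or never read, such as old backups


datasets = {"orders-db": 5000, "monthly-reports": 4, "2019-backups": 0}

for name, reads in datasets.items():
    print(f"{name}: {suggest_tier(reads)}")
```

A real policy would also weigh retrieval fees and minimum storage durations, which this sketch leaves out.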

 

Such practices are “good to know on-premises, but these are imperative to know when you're in the cloud,” she said, in order to reduce waste.

 

During this episode, Williams pointed to solutions in the Spot by NetApp portfolio that use automation to help make the most of cloud infrastructure, such as its flagship product, Elastigroup, which takes advantage of excess capacity to scale workloads.
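
The arithmetic behind running on excess capacity can be sketched in a few lines of Python. This is not Elastigroup's algorithm or API; the hourly rates are made-up figures, and the sketch assumes only that spare capacity is billed at a lower rate than on-demand capacity.

```python
HOURS_PER_MONTH = 730  # common approximation for one month of uptime


def monthly_cost(hourly_rate: float, instances: int, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running a fleet for a month at a flat hourly rate."""
    return hourly_rate * instances * hours


def savings_fraction(on_demand_rate: float, spare_rate: float) -> float:
    """Fraction of the on-demand bill saved by running on spare capacity."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return 1.0 - spare_rate / on_demand_rate


# Hypothetical prices in dollars per instance-hour.
on_demand, spare = 0.10, 0.03

print(f"on-demand: ${monthly_cost(on_demand, instances=10):,.2f} per month")
print(f"spare:     ${monthly_cost(spare, instances=10):,.2f} per month")
print(f"saved:     {savings_fraction(on_demand, spare):.0%}")
```

The trade-off is that spare capacity can be reclaimed by the provider, which is why the replacement and scaling steps need to be automated rather than handled by hand.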

 

In June, Spot by NetApp acquired Instaclustr, a solution for managing open source database and streaming technologies. The company recognizes the growing importance of open source for enterprises. “We're paying attention to trends for cloud applications,” Williams said, “and we're growing the portfolio to address the needs that are top of mind for those customers.”

 

Check out the entire episode to learn more about CloudOps.

Episode Transcription

Alex Williams  0:08  

You're listening to The New Stack Makers, a podcast made for people who develop, deploy and manage at-scale software. For more conversations and articles, go to thenewstack.io. All right, now on with the show.

 

Colleen Coll  0:28  

Spot by NetApp enables cloud operations teams to deliver scalability, performance and security for cloud infrastructure and applications at the lowest possible cost, through continuous automation and optimization, combined with deep visibility and governance. Learn more at spot.io.

 

Heather Joslyn  0:49  

Hello, everyone, and welcome to another episode of The New Stack Makers podcast. I'm Heather Joslyn, features editor of The New Stack. And today we're going to be talking about CloudOps, which in this case stands for cloud operations. What does it mean to be an ops engineer in a time when your network might be distributed on a cloud, or several clouds, or clouds and on-prem, or clouds, on-prem and the edge? The possibilities are endless. And how are large organizations thinking about ops and changing their thinking to align with the needs of cloud native? We're joined today by Mekka Williams, principal engineer at Spot by NetApp. Hi, Mekka.

 

Mekka Williams  1:27  

Hi. Great to be with you.

Heather Joslyn  1:32  

It's great to have you. Mekka, can you tell us a little bit about Spot by NetApp and what you do there?

 

Mekka Williams  1:35  

Yeah, sure. So I'm a bit of a veteran here at NetApp. I just celebrated 16 years, so I've been around a little while. For those that don't know, NetApp is a 30-year veteran in data management services, and our traditional solutions are for on-premises data centers and the like. In recent years we've been expanding that portfolio, taking the rich history and expertise that we've built up in storage efficiency and high-performing storage solutions and bringing that into cloud environments. And so Spot by NetApp is the part of our portfolio that's focused on helping folks be really lean and efficient in the cloud, both with resource utilization and cost, and really responding to more modern applications. So I'm very excited about the acquisition and where we're headed.

 

Heather Joslyn  2:29  

Terrific. So we've got one more thing before we get started with our conversation today: This episode is sponsored by Spot by NetApp. So let's jump in. It's a big topic, so let's start with a fundamental question: What are some of the biggest ways in which operations is different when your applications are running on the cloud?

 

Mekka Williams  2:47  

So of course, it depends on the architecture of your application, and there are so many things that can be different. But, you know, the cloud is this consumption-based resource; it's shared. And while it's important to understand your application's needs no matter where it runs, it's increasingly important in the cloud. You really have to understand at a granular level what resources your application needs and how it responds to changes in the environment, because this is a governing factor in how reliable and performant your application is. In addition, the cloud's attack surface is completely different, and it can be more complex in some cases. So you really have to be mindful of the solutions you put in place to secure your application and all the proximal data.

 

Heather Joslyn  3:32  

What would you say are the components of a CloudOps organization?

 

Mekka Williams  3:36  

So if you think about DevOps as a culture, and I don't want to be too preachy here, but you know, this culture of being lean and collaborative, and breaking down silos, and being really efficient, it's kind of applying that culture across all of the operations that are required to operate in the cloud. So you have things like FinOps, making sure you have a good financial strategy; DevOps proper, making sure your software supply chain is intact; SecOps; DataOps. It's all of these operations being really lean with the processes and tasks that you need to operate in the cloud efficiently.

 

Heather Joslyn  4:16  

I think, Mekka, it might also be good for us to sort of go back just a little bit. I've heard so many different definitions of DevOps. How would you describe it? What is DevOps, Mekka?

Mekka Williams  4:28  

Oh, gosh, yeah, this is a great question. I love this. So, let's see, my definition is kind of wordy. I'll be concise, but it really is this culture of collaboration, this shared responsibility across all the teams that are involved, from the planning phase to the deployment phase, getting these teams working together with a shared goal of getting increased value, with high quality, to the customer. Right? So it's these processes, this tearing down of silos, this really working together, but being as efficient as possible. And I always like to add a little bit of seasoning: continuously improving, continuously learning, as you iterate through each cycle of providing that software.

 

Heather Joslyn  5:14  

Yeah, sort of a feedback loop of what's working and what isn't, and a continued need to iterate. In the intro, I talked about how people are running on multiple clouds or in hybrid situations. What are some of the challenges, particularly from an ops point of view?

 

Mekka Williams  5:30  

So with multicloud and even hybrid cloud, all of these platforms have a set of APIs and tools that are advertised to help you streamline the processes and operations that you may need to perform in order to run your application or workload. And when you run in a heterogeneous environment, sometimes you incur technical debt trying to automate and standardize your processes. There are some custom APIs and abstraction layers that can help you do this, or sometimes organizations may develop their own, and then they incur this technical debt. So the complexities increase as you add different platforms to your overall data center or your environment, and that can be increasingly complicated. And then each different surface is a different attack surface, again. So really managing and being able to standardize these processes across those environments can be challenging.

 

Heather Joslyn  6:33  

A lot of these organizations are running on Kubernetes; they have containerized microservices. What role does that play in CloudOps?

 

Mekka Williams  6:41  

Containers in general, and Kubernetes, have done a ton in the way of helping developers and businesses deliver content, fixes and enhancements to their end users more quickly. I mean, it's basically revolutionized DevOps, right? We've had to really define and get clean with DevOps in order to keep up with the pace of content delivery enabled by containerization and Kubernetes. And Kubernetes is the de facto platform for cloud native development. I mean, it's everywhere. But we know that most of the containerized applications running in Kubernetes clusters are running underutilized, right? And anything that's underutilized in the cloud equates to waste. And if we want to be really lean and clean and utilize resources in a very efficient manner, we have to have a really good CloudOps strategy in order to do that. So, again, it's understanding what your application needs and being really efficient in running and operating.

 

Heather Joslyn  7:44  

I want to talk a little later in our conversation about cost optimization, but what can you tell me about the role that automation should play in a CloudOps organization?

 

Mekka Williams  7:53  

Well, you know, I used to do DevOps exclusively for a living and I feel like automation is the lifeblood of all operations. We should automate everything.

 

Heather Joslyn  8:04  

Automate all the things.

 

Mekka Williams  8:05  

Automate all the things. Exactly. But, you know, we have the perspective that automation helps teams be more efficient. The less tasks that your workforce have to perform manually, the more time they have to spend focused on business logic and being innovative. And then automation also helps you with repeatability. And it's less error-prone, and it helps you standardize. Really good automation simplifies your environment greatly. So yes, it's really true: automate all the things.

Heather Joslyn  8:41  

Also, my understanding is that there's a role that automation plays in just making your engineers and your developers happier, because they're not having to do a lot of repetitive tasks. Is that a sort of secondary benefit of using automation heavily in your operations?

 

Mekka Williams  8:58  

Absolutely. I believe that's true. I think I read somewhere that over the course of the pandemic, it was observed that DevOps engineers and SREs were burning out faster than anybody else, right? So anything that can be done to help make their lives easier should be done. And yes, if something is required to be repeatable, both to make your processes more efficient and to make those DevOps and SREs' lives easier, it absolutely should be automated.

 

Heather Joslyn  9:26  

I sort of said in the intro that, in this case, CloudOps refers to cloud operations, but it also can refer to cloud optimization. How do you define that? And how does that overlap with, say, site reliability engineering, or does it?

 

Mekka Williams  9:40  

So cloud optimization, I think, is crucial to being successful with CloudOps. This should be the goal of your CloudOps strategy. Again, the focus of our Spot by NetApp portfolio is a good example: We want you to be as successful as possible and get the most out of your cloud, wherever you are. Right? And so part of that strategy is making sure you are running with the right instance types, or that you are mindful of any waste that you have, and that you continually optimize and try to be leaner and more efficient each time you iterate through your workflow. And so yeah, optimization, I think, is the end goal; it's always the goal of any good CloudOps strategy. It's this regular evaluation: Am I getting as much as I can? Am I as good as I can be? Cloud continually innovates, too, so there's always a new opportunity to optimize. And I think cloud optimization is a complement to SRE, right? SRE work tends to take a software approach to the operationalization of what's required to keep services up and running with high quality. So good and strategic CloudOps that results in good cloud optimization, again, makes SREs' lives easier.

Heather Joslyn  11:01  

Yeah, you want to keep them pretty happy, because they're very important to the running of businesses and large enterprises.

Mekka Williams  11:09  

Yes, cupcakes and cloud optimization keep your SREs happy.

 

Heather Joslyn  11:14  

Indeed. What can you tell me about what your organization has learned from its customers and how they're thinking about CloudOps?

 

Mekka Williams  11:21  

So I think we observed a big rush to the cloud, especially with the pandemic, when everybody went home, right? So where migration, I think, was the initial strategy, just get there and utilize cloud infrastructure, I think we're seeing customers kind of revisit that and start to look at ways to make better use of the cloud. So these strategies are being updated where it didn't make sense to go lift and shift to the cloud. Maybe it was monolithic applications, or maybe it was just for convenience, to get there faster; they may have moved processes and applications too hastily, and now they're revisiting. So we see some organizations that are trying to modernize their applications further, to make better use of the services that are available in the cloud. And the cloud is getting more complex as they grow and mature in their journey. And so they're looking for ways to simplify their operations and, as always, keep their costs down, keep things simple for their DevOps and SREs, not incur additional technical debt, but still make the best use out of their cloud, wherever they are.

 

Heather Joslyn  12:39  

We definitely heard about organizations moving to the cloud during the pandemic because, as you said, everybody went home and the world of work changed a great deal. What were you seeing as some of the most common mistakes people were making in terms of operations when they rushed to the cloud?

 

Mekka Williams  12:53  

So I think this shift from CapEx to OpEx really caught people. I know it caught me in my old DevOps work. So it's not really understanding, in a steady state, on a sunny day, what those costs look like and what that resource utilization looks like. And so they really didn't have a baseline. Those first cloud bills, I'm sure, were shocking, because you don't get a cloud bill when you run on your own resources, your on-premises environment, or even your private cloud, where you've already paid the cost for the infrastructure that you're using. Now, if you take those same processes and applications and workloads and you throw them in the cloud without any strategy, and you have an on-demand consumption model, you're getting charged by the minute or by the usage. That sticker shock, I think, was a big deal. I think that was a reason why a lot of folks decided to go back and reevaluate. So the CloudOps strategy is so important when you make that move, to make sure you don't waste money. And honestly, a good CloudOps strategy, if you plan properly, is going to benefit you even on premises. So how we get lean and really efficient, and practice good hygiene, data hygiene, and all those things, the security, all of that can benefit you on premises and anywhere you run your applications and workloads.

 

Heather Joslyn  14:21  

What do you mean by data hygiene, good data hygiene? What's an example of data hygiene?

Mekka Williams  14:25  

Right. So, making sure you're using the right tier of data, making sure you're not over-provisioned. And the type of storage you need: You don't need to pay top dollar for high-performing storage if it's just backup data that doesn't get accessed that often. Don't have hot storage provisioned for cold data. Understand what your data heat map looks like and where the data needs to be to be the most efficient. Should you be using caches? These are all strategies that you should evaluate, and understand your application's needs. These are good to know on premises, but these are imperative to know when you're in the cloud. And don't waste, right? If you can use storage efficiencies to reduce your data footprint, you should do that. These are all things that keep your costs down and actually keep you more green, because the smaller your data footprint, the less greenhouse gas you produce. And this is all good stuff to do.

Heather Joslyn  15:23  

How does Spot by NetApp tackle CloudOps, both definitions, operations and optimization?

 

Mekka Williams  15:28  

So Spot by NetApp has really leaned into the cloud. I mean, we have a robust portfolio of CloudOps solutions, with Spot at its core: everything from automated cost optimization, using AI and machine learning algorithms to make compute selection for you, to the automation of Kubernetes management, to load balancing the infrastructure used by your Kubernetes clusters. We have cloud security. And the enterprise is embracing open source technologies, so we've recently added an addition to the portfolio to help you manage open source database and streaming technologies. So we're paying attention to trends for cloud applications, and we're growing the portfolio to address the needs that are top of mind for those customers.

 

Heather Joslyn  16:24  

Engineers often find cost optimization time-consuming and painful. But we're coming into an economic period where a lot of companies are tightening their belts and looking at ways they can cut costs. How does Spot by NetApp make it easier to get more bang for your buck in terms of your cloud spend?

 

Mekka Williams  16:42  

So the flagship product or solution of the Spot by NetApp portfolio is the Elastigroup solution, and this is really at the heart of optimizing for cloud infrastructure. So cloud has these different charge models. Elastigroup takes advantage of excess capacity, tying in the automation required to respond to the replacement of that excess capacity if it happens during your application uptime, and it has really good autoscaling capabilities. And so this is one really simple way, and it's very easy to use, that can help you get the most bang for your buck out of the cloud. And then there are the other products layered on top of that, like Spot Ocean, which will help you load balance and be very cost-efficient for your Kubernetes-based applications. And then we have visibility into what your spend looks like. And as I mentioned, we have Instaclustr, our latest addition, which will manage your open source-based databases for you. You can provision and configure your open source database clusters, and Instaclustr will take care of the management for you, as well as for Kafka streaming. We have Spot Security. We have lots of automation goodness coming that, again, makes the burden of the work and tasks that are required to operate in the cloud easier for your DevOps and SREs. And it's all in one place, right? We provide all of these services and offerings in a centralized way. So we're really paying attention to who our end users are, and growing and innovating in key ways.

 

Heather Joslyn  18:23  

You mentioned that it provides visibility into the spend. So are there dashboards that visualize it, so you can see that in real time?

Mekka Williams  18:32  

Yeah, so understanding your spend in a dashboard with just the cloud native tools can be cumbersome, because you have to look in so many different places. So we do have CloudCheckr, which gives you kind of a single pane of glass. I know you love that term.

 

Heather Joslyn  18:50  

Love it. We very much love, love that term.

 

Mekka Williams  18:55  

But yeah, it gives you visibility across your clouds, so it helps you get a holistic view of what your spend is. And that's really important, especially if you have a multicloud environment; sometimes it can be challenging to normalize and make sense of your budgets and your spend. And so that's a really useful view that the Spot portfolio provides. And then the Spot Elastigroup dashboard is really good too, because it shows you in real time, at all times, how much you're actually saving. And who does not want to see that?

 

Heather Joslyn  19:27  

That's true, and it's information you can share with your bosses and the people in the C-suite, and so on.

 

Mekka Williams  19:32  

Yes, I have so many screenshots of my Elastigroup dashboards stored so that I can pick them up and...

 

Heather Joslyn  19:41  

It's cool. Is there anything else about this whole issue of CloudOps that we didn't cover that you feel is important for our listeners to know?

 

Mekka Williams  19:49  

Well, I do. Because Instaclustr is our latest addition, I do want to add one aspect that maybe we didn't talk about so much, in the spirit of offloading DevOps and SREs. You know, cloud is innovating at lightning speed. I mean, tech is always innovating really quickly, but cloud and Kubernetes have advanced innovation in such a way that we have to do a lot of work to keep up. And so products like Instaclustr, which help manage open source technologies, go a long way in keeping down the burden of filling the talent and skill gap. So your DevOps and SRE teams can focus on their tasks and their jobs without having to constantly learn new skills and feel like they have to learn all the stuff and know all this stuff. It's fun to do that in your spare time, or if you just want to geek out sometimes, that's great. But the pressure to have to do it to keep your job is something that I think contributes to burnout. And so this is a space that I expect to see grow, in particular in the DevOps space, because the matrix of DevOps tools available is so vast. And so anything that we can do to streamline and standardize and automate and keep things workable and functional for our DevOps and SRE heroes, I think, will go a long way.

 

Heather Joslyn  21:16  

It does seem like that's a realization that people are having: that there's only so much we can ask of people in terms of building skills and building skills quickly. And it's interesting to hear that it does align with what we're hearing from sources and from developers and engineers, too. A lot is expected of them in terms of cognitive load, in terms of building new skills constantly. Yeah, absolutely. Well, thank you very much for joining us today, Mekka. I just want to thank Mekka Williams from Spot by NetApp for joining us for this conversation. And I'd like to thank our friends at Spot by NetApp for sponsoring this episode. And I'd like to thank all of you for joining us for this episode of The New Stack Makers. See you next time.

 

Colleen Coll  21:56  

Spot by NetApp enables cloud operations teams to deliver scalability, performance and security for cloud infrastructure and applications at the lowest possible cost, through continuous automation and optimization, combined with deep visibility and governance. Learn more at spot.io.

 

Alex Williams  22:17  

Thanks for listening. If you like the show, please rate and review us on Apple Podcasts, Spotify, or wherever you get your podcasts. That's one of the best ways you can help us grow this community, and we really appreciate your feedback. You can find the full video version of this episode on YouTube. Search for The New Stack, and don't forget to subscribe so you never miss any new videos. Thanks for joining us, and see you soon.

 

Transcribed by https://otter.ai