The New Stack Podcast

Rethinking Web Application Firewalls

Episode Summary

Web Application Firewalls (WAFs) first emerged in the late 1990s as web server attacks became more common. Today, in the context of cloud native technologies, there’s an ongoing rethinking of how a WAF should be applied. No longer is it solely static applications sitting behind a WAF, said Ratan Tipirneni, president and CEO of Tigera, in this episode of The New Stack Makers.

Episode Notes

Web Application Firewalls (WAFs) first emerged in the late 1990s as web server attacks became more common. Today, in the context of cloud native technologies, there’s an ongoing rethinking of how a WAF should be applied.

 

No longer is it solely static applications sitting behind a WAF, said Ratan Tipirneni, president and CEO of Tigera, in this episode of The New Stack Makers.

 

“With cloud native applications and a microservices distributed architecture, you have to assume that something inside your cluster has been compromised,” Tipirneni said. “So just sitting behind a WAF doesn't give you adequate protection; you have to assume that every single microservice container is almost open to the Internet, metaphorically speaking.”

 

So then the question is how do you apply WAF controls?

 

Today’s WAF has to be workload-centric, Tipirneni said. In his view, every workload has to have its own WAF. When a container launches, the WAF control is automatically spun up.

 

So that way, even if something inside a cluster is compromised or exposes some of the services to the Internet, it doesn't matter because the workload is protected, Tipirneni said.
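
One way to picture this pattern is sketched below. This is a minimal illustration, not Tigera's actual implementation: imagine a mutating admission step that appends a hypothetical WAF proxy sidecar to every pod spec before it is scheduled, so the protection travels with the workload. The image name and port are placeholders.

    # Sketch: inject a hypothetical WAF proxy sidecar into a pod spec (Python).
    from copy import deepcopy

    WAF_SIDECAR = {
        "name": "waf-proxy",
        "image": "example.com/waf-proxy:latest",  # placeholder image
        "ports": [{"containerPort": 8080}],       # placeholder port
    }

    def inject_waf_sidecar(pod_spec: dict) -> dict:
        """Return a copy of the pod spec with the WAF sidecar appended."""
        patched = deepcopy(pod_spec)
        containers = patched.setdefault("spec", {}).setdefault("containers", [])
        if not any(c.get("name") == "waf-proxy" for c in containers):
            containers.append(WAF_SIDECAR)
        return patched

    pod = {"spec": {"containers": [{"name": "shopping-cart", "image": "cart:1.0"}]}}
    print(inject_waf_sidecar(pod))  # the pod now carries its own WAF control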

 

So how do you apply this level of security? You have to think in terms of a workload-centric WAF.

The Scenario

 

Vulnerabilities are so numerous now, and cloud native applications have such large attack surfaces, that there is no way to mitigate them using traditional means, Tipirneni said.

 

“It's no longer sufficient to throw out a report that tells you about all the vulnerabilities in your system,” Tipirneni said. “Because that report is not actionable. People operating the services are discovering that the amount of time and effort it takes to remediate all these vulnerabilities is incredible, right? So they're looking for some level of prioritization in terms of where to start.”

 

And the onus is on the user to mitigate the problem, Tipirneni said. Those customers have to think about the blast radius of the vulnerability and its context in the system. The second part: how to manage the attack surface.
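
To make the blast radius point concrete, here is a minimal sketch with made-up service names and a hand-written dependency graph, not any particular product's API: starting from a vulnerable service, walk the call graph to see what it could reach, and rank vulnerabilities by that reach.

    # Sketch: estimate blast radius by walking a service dependency graph (Python).
    from collections import deque

    CALLS = {  # hypothetical graph: service -> services it calls
        "frontend": ["shopping-cart"],
        "shopping-cart": ["payments", "inventory"],
        "payments": ["ledger"],
        "inventory": [],
        "ledger": [],
    }

    def blast_radius(service: str) -> set:
        """All services reachable from a potentially compromised service."""
        seen, queue = set(), deque([service])
        while queue:
            for neighbor in CALLS.get(queue.popleft(), []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    vulnerable = ["shopping-cart", "inventory"]
    ranked = sorted(vulnerable, key=lambda s: len(blast_radius(s)), reverse=True)
    print(ranked)  # ['shopping-cart', 'inventory']: fix the bigger blast radius first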

 

In this world of cloud native applications, customers are discovering very quickly that trying to protect every single thing, when everything has access to everything else, is an almost impossible task, Tipirneni said.

 

What’s needed is a way for users to control how microservices talk to each other, with permissions set for intercommunication. In some cases, specific microservices should not be talking to each other at all.

 

“So that is a highly leveraged activity and security control that can stop many of these attacks,” Tipirneni said.
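
As a rough illustration of that control, here is a sketch using invented service names rather than any specific policy engine: start from default deny and allow only the microservice pairs that genuinely need to communicate.

    # Sketch: default-deny service-to-service policy with an explicit allow list (Python).
    ALLOWED = {  # hypothetical pairs that are permitted to talk
        ("frontend", "shopping-cart"),
        ("shopping-cart", "payments"),
        ("payments", "ledger"),
    }

    def is_allowed(src: str, dst: str) -> bool:
        """Default deny: traffic passes only if the pair is explicitly allowed."""
        return (src, dst) in ALLOWED

    print(is_allowed("frontend", "shopping-cart"))  # True
    print(is_allowed("frontend", "ledger"))         # False: never allowed, so denied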

 

Even after all of that, the user still has to assume that attacks will happen, mainly because there's always the threat of an insider attack.

 

And in that situation, the search is for patterns of anomalous behavior at the process level, the file system level or the system call level, using a baseline of standard behavior to identify deviations, Tipirneni said. Then it’s a matter of trying to tease out some signals, which are indicators of either an attack or a compromise.
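
A toy version of that baselining idea, for illustration only (real systems use far richer features and some machine learning): record the activity a workload normally exhibits, then flag anything outside that baseline as a deviation worth investigating.

    # Sketch: flag runtime events that fall outside a workload's observed baseline (Python).
    # Event names are placeholders for process, file system or system call activity.
    BASELINE = {"read", "write", "connect:payments"}  # learned during normal operation

    def deviations(observed: list) -> list:
        """Return events never seen during the baseline period."""
        return [event for event in observed if event not in BASELINE]

    print(deviations(["read", "connect:payments", "connect:198.51.100.7", "exec:/tmp/x"]))
    # ['connect:198.51.100.7', 'exec:/tmp/x'] are candidate indicators of compromise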

 

“Maybe a simpler use case of that is to constantly monitor at run time for known bad hashes, files or binaries that are known to be bad,” Tipirneni said.
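
That simpler case can be sketched in a few lines. The scan target and the digest below are placeholders; in practice the deny list would come from shared threat intelligence feeds and the scan would run continuously against running workloads.

    # Sketch: hash files under a directory and flag any that match known-bad hashes (Python).
    import hashlib
    from pathlib import Path

    KNOWN_BAD = {"0" * 64}  # placeholder SHA-256 digest; real lists come from threat intel

    def sha256_of(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def scan(directory: str) -> list:
        """Return paths whose contents match a known-bad hash."""
        return [p for p in Path(directory).rglob("*")
                if p.is_file() and sha256_of(p) in KNOWN_BAD]

    print(scan("/usr/local/bin"))  # hypothetical scan target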

 

The real challenge for companies is setting up the architecture to make microservices secure. There are a number of directions the market may take. In the recording, Tipirneni talks about the evolution of the WAF, the importance of observability, and better ways to establish context around the services a company has deployed and the overall systems it has architected.

 

“There is no single silver bullet,” Tipirneni said. “You have to be able to do multiple things to keep your application safe inside cloud native architectures.”

Episode Transcription

Alex Williams  

You're listening to The New Stack Makers, a podcast made for people who develop, deploy and manage at-scale software.

 

For more conversations and articles, go to thenewstack.io. All right, now on with the show.

 

Colleen Coll  

Tigera, the inventor and maintainer of open source Calico, delivers Calico Cloud, the next-generation cloud service for Kubernetes security and observability. Tigera and The New Stack are under common control.

 

Alex Williams  

Hey, it's another episode of The New Stack Makers. And today I am joined once again by Ratan Tipirneni, who is the president and CEO at Tigera. Ratan, thank you so much for joining today. Thank you, Alex. Thanks for having me here. Good to see you again. Definitely great to see you. And this is the second part of a discussion that we had. And some of the topics that we discussed in our first discussion: we talked about how the workplace is changing quite a bit, how that's really forcing a whole new move to cloud services; we're seeing the growth of cloud native applications in the process. But the question still remains about the approach to securing those cloud native applications, and how is that actually done? So how do you identify, how do you assess, how do you prioritize, how do you adapt to risks across the application? How are organizations really thinking about this from a technical perspective? And so I wanted to follow up, and maybe the first thing we can do is talk about one theme that we really touched on, and that's related to web application firewalls. Now, we talked in February; since February, we've had a number of vulnerabilities discovered. What has been your thinking about the evolution of WAF when you think about what we're seeing in the attack landscape, and the preventative measures that companies are taking, or not taking at all?

 

Ratan Tipirneni  

Yep. So Alex, I think WAF has existed for a long time, for several decades, and that technology has been very impactful in thwarting attacks at the application level. However, in the context of cloud native applications, there is a fundamental assumption of the traditional architecture that breaks down. So there's a need to rethink how that layer of security should be applied. More specifically, WAF was always applied at the edge, the security controls applied at the edge, either in the cloud or through CDN players. And that worked really well when you had a set of static applications and there was a concept of a perimeter. However, with cloud native applications, with a microservices distributed architecture, you have to assume that something inside your cluster has been compromised. So just sitting behind a WAF that's sitting at the edge doesn't give you adequate protection; you have to assume that every single microservice container is almost open to the internet, metaphorically speaking. So then the question really is, how do you apply WAF controls? And just to complicate things, these workloads happen to be dynamic, so you can't really statically program WAF controls around any specific workload. So that's where we feel, architecturally, that WAF controls have to be workload-centric, where every individual workload has its own WAF, so to speak. And when a microservice or a container gets spun up, the WAF controls automatically get spun up around it. So that way, even if something inside your cluster is compromised, or if you're exposing some of the services to the internet, it really doesn't matter, because you're now protected by the WAF. And so that's a fundamental architectural change that is required. An additional benefit is that with this type of an architecture, the WAF now has complete context around the application, which is not feasible when the WAF is sitting at the edge or on the CDN. So that really is one of the fundamental changes that has to be driven when deploying security

 

for cloud native applications. So you have to rethink your architecture for the WAF and start to apply those security controls at the workload, and you need a workload-centric WAF. There's lots of different types of workloads. So is that part of the re-architecture, to consider those different types of workloads? Yeah, exactly. You know, and again, when you have a powerful orchestrator like Kubernetes spinning up these containers with these workloads, you have to be able to think about WAF controls in a similarly programmatic way; those get spun up dynamically whenever a workload gets spawned. So what are some of the fundamental differences that we see in WAF as it evolves? Let me start with what remains the same, right. In terms of the security rules and the intelligence that's been built to catch some of the traditional application-centric attacks, the good news is a lot of that software, those libraries, those rule sets are still applicable. Right. So that's the good news, that you get to leverage all that stuff. Where the difference is, you have a non-trivial challenge now of how to actually operationalize deploying a workload-centric WAF. That's what has changed significantly, and it's very different. And we're hearing from customers that the WAF solutions they've been relying on for the last decade or two are really ineffective inside Kubernetes and microservices-based cloud native applications; they're looking for something different. Because with the WAFs sitting on the edge: first, they miss the context of the application; they're getting a lot of false positives, which is not very helpful; and worse, they don't have complete visibility into all the traffic that's coming into these cloud native workloads. So those are the three big challenges we're hearing from customers. And architecturally, you just need to be able to think of a different way of deploying workload-centric WAFs. And that's what we have done. So I know that the firewalls have changed considerably. That's clear. It's not perimeter-based anymore. What about detecting anomalies and issues? What attention do you have to put on the behavior patterns that you see in the network? How do you get beyond practices that once were effective, such as signatures? There are other issues too, such as: it once seemed like you could trust any API, but now those APIs themselves are increasingly a subject of attack. So let me take that in parts. So the first part is, I mean, you're absolutely right, the game has changed. And let me start with a very simple example. If you think about vulnerability scanning to detect known vulnerabilities, that was a pretty well-understood problem with a pretty straightforward solution for traditional application architectures. However, with cloud native applications, and also with the explosion of vulnerabilities, it's no longer sufficient to throw out a report that tells you about all the vulnerabilities in your system, because that report is not actionable. Because what people who are operating these services are discovering is that the amount of time and effort it takes to remediate all these vulnerabilities is incredible. So what they're looking for is some level of prioritization in terms of where to start: which are the vulnerabilities that I need to attack first, which are likely to cause the most amount of damage in the system. So the rules of the game have changed.
So the important challenge right now that customers are struggling with is: can you figure out the blast radius of a known vulnerability in the context of your system? It has to be contextual to your system, and you can only do that at runtime. Then once you understand the blast radius, and the criticality of some of the workloads that a potential vulnerability is actually impacting, it automatically gives you a sorted list of vulnerabilities that you have to manage. So that's the first part. The second part really is related to that: understanding the attack surface, and how you minimize the surface area of attacks, is super critical. Because again, in this world of cloud native applications, customers are discovering very quickly that trying to protect every single thing, when everything has access to everything else, is an almost impossible task. So the better bet for them is to start with understanding how they reduce the attack surface with a default-deny model, only allowing the microservices that need to talk to other microservices to be able to talk to each other. So that is a very highly leveraged activity and a security control that can actually stop many of these attacks.

 

Ratan Tipirneni

So now the third part, which really comes more specifically to your question, is: after having done all that stuff, you have to assume that you're still going to get attacked, mostly because there's always the threat of an insider attack. And in that situation, you're looking for patterns of anomalous behavior, where there's unusual activity, maybe at the process level, at the file system level, or maybe at the system call level, and you're looking to baseline what the standard behavior is. And you're looking for deviations from that behavior, and then trying to tease out some signals, which are indicators of either attack or indicators of compromise. And once again, you do have to lean on some level of machine learning to be able to do that. And just to add to that, maybe a simpler use case of that is to constantly be able to monitor, right at run time, for known bad hashes or files or binaries that are known to be bad. The good news is that in the security community, there's a finite list that the community contributes to; we all know what that list of hashes of bad files is. The challenge is, how do you constantly monitor and recognize those hashes inside your system? And you have to be able to implement that security control, which is another layer of defense. So just to summarize, what I've just articulated is defense in depth: there is no single silver bullet, you have to be able to do multiple things to keep your application safe inside modern cloud native architectures. Interesting perspective there. What is that forcing network providers to do? Because those behavioral differences can be quite granular. To me, it speaks to the evolution of observability. You're absolutely right. Now, in terms of the network providers or cloud providers, just given what you said, what you captured in your question, they're limited in what they can do, because a lot of this is contextual, and it depends on your applications, your services. So some of these security solutions have to be specific to the workloads you deploy. So there's only so much, I think, the cloud players or the network providers can do; the onus is really shifting to the companies and teams deploying these applications. And really, they're the ones who have to go deploy these solutions. And so how do you see observability evolving as a practice? Sure, yeah. So sorry, I missed the second part of the question you asked earlier on. Observability is very interesting, right? First of all, if you look at the last 10 years, observability and security were two silos. And I think last time we spoke, I talked about those two silos converging. And specifically, it's no longer sufficient to report on security incidents in isolation, where you spit out a report of all the vulnerabilities or all the issues inside your system; you have to contextualize it. And so the contextualization, as an example, in our case, we have a Dynamic Service and Threat Graph, which shows visually which services are talking to which other services. It shows an aggregation of namespaces, it shows which services are actually talking out to services outside the cluster sitting on the internet, or which services are accessing IPs or clusters of IPs. So there's a nice visual representation of what is happening inside the cluster. And it's like magic, because as soon as it pops up, people actually, for the first time, understand what's actually happening inside the cluster.
Now, if you start to overlay security information on top of that, it starts to become really meaningful for them, because you can talk about a shopping cart service that probably has a vulnerability, but then you can actually visually start to see which other microservices that shopping cart is actually talking to, in the context of the Dynamic Service and Threat Graph. And when you do that, you automatically have a perspective on what the blast radius is for the vulnerability, and based on which other microservices it's talking to, you can then decide if the other workloads are mission critical or not, and if so, what mitigating action you take. Maybe you decide that the blast radius is pretty huge in terms of impact and mission-critical workloads, and you need to actually put some mitigation in place as a short-term measure before you get a remediation. And you roll out some security controls to stop the flow of traffic to these other microservices from the shopping cart service until you have a remediation, right? So that's a great example of how observability and security kind of work with each other. And once you roll out the security control, you can actually then visually see, through your observability features, whether the traffic has indeed stopped going to those adjacent microservices.
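
The mitigation Tipirneni describes here can be pictured with a short sketch. The field names and workflow are invented for illustration, not Calico's actual policy format: propose a quarantine rule for the suspect service, stage it to observe what it would block, then promote it to enforcement once observability confirms the traffic has stopped.

    # Sketch: propose, stage and promote a quarantine policy for a suspect service (Python).
    def propose_quarantine(service: str) -> dict:
        return {
            "name": f"quarantine-{service}",
            "action": "deny",
            "source": service,   # block egress from the suspect workload
            "destination": "*",  # ...to every other workload
            "mode": "staged",    # dry run: log what would be blocked, enforce nothing yet
        }

    def promote(policy: dict) -> dict:
        """Switch the policy from staged (dry run) to enforced."""
        return {**policy, "mode": "enforced"}

    policy = propose_quarantine("shopping-cart")
    # ...inspect the staged results in the service graph, then:
    policy = promote(policy)
    print(policy["mode"])  # enforced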

 

So that's a very simple example of how observability and security are starting to converge. And I think, from the trajectory of where we feel the industry is going, all conversations about security will happen in the context of observability, or something like a service graph. Because without the context, just absolute data about security is not as meaningful or as valuable to security analysts, or to the people operating this infrastructure and these applications. It's early times; what are some of the tools you're seeing a need for being developed, and what are people relying on in the meantime? There's quite a bit of complexity in what I've just described, and I articulated a pretty simple case. But in reality, when you have maybe hundreds of microservices running on thousands of containers across multiple clusters, the amount of data a human has to process is pretty staggering. And even if you have the best visibility tools, it can still cause an information overload. So a couple of things. One is what we refer to as a lean-back experience, where we try and take the burden away from the user and try to do a lot of the inference ourselves and present conclusions to users about what they should be doing. So that's a huge opportunity for innovation over the next decade. That is in terms of the inference? Correct, exactly. I've heard a lot about inference at the edge, for instance, but we're talking about inference on just the overall network? Correct, yeah. And the microservices running inside the network, and how they're talking to each other, the potential impact, and indicators of compromise or indicators of attack. So inferences on any of these dimensions are very powerful. The second related concept to that is: so you discovered something, but it could take a significant amount of time to remediate it. Because remember, the practical aspects of remediation are, you've got to go back to the source, the software developer, get her to fix it, and come back and test the fix and roll it out. That could take days, weeks or months, right? Who knows? But in the meantime, you're faced with a dilemma: do you shut down those services, or do you take the risk and keep them open, knowing that you've got a pretty big security hole inside your cluster? So that's where the concept of mitigation comes in. The concept of mitigation is that once you've identified a security hole, can you put in place controls that temporarily maybe block or quarantine a specific service, or container, or microservice that has been identified as questionable, until you have an opportunity to get a permanent fix? That's mitigation. And second, can you automate the mitigation? Can the system automatically propose controls for you to quarantine that specific container in question without you having to manually go configure it? And then can you test it by staging it, right? Can you stage it and just watch the traffic flow and say, has this thing really been quarantined or not, and then promote it to production? Right? So you're applying a lot of the software engineering principles to security; we're thinking about it like a CI/CD system, but you're doing operations, and you're not only automatically detecting these things, you're trying to figure out the blast radius, then you're trying to figure out how you apply mitigating controls that the system proposes, and you test those mitigating controls through staging.
If you're satisfied with that, by visual inspection through observability, you promote that to production. A lot of this is ripe for innovation over the next decade. So software engineering principles apply here too, to the management of, for instance, inference models, the mitigation controls that you're putting in place, and really the decision making that you have to do. Who are these teams you're seeing do this work? Are the teams really anyone managing security operations, or who is really doing all this heavy work? And just to kind of maybe build on the first part of your question there for a second, Alex, you know, in terms of applying software engineering principles to security: in fact, I'd go so far as to say that, honestly, that is the only way we have any hope of being able to secure these cloud native applications. Because when you think about why the traditional security models like firewalls are failing, they were all hard coded, right? They were saying: this specific IP address, you let traffic in; this specific IP address, you don't let traffic in. So those are examples of hard-coded rules. And there are customers with, I'm not exaggerating, tens of thousands of these rules, and they're afraid to deprecate any of those rules because they don't know what's going to break.

 

Some of those rules were configured 10 years ago, right? So the modern way of handling security is to treat it like software. And you assign labels to workloads. And you say these workloads are red workloads, these are blue, and these are pink. Reds can talk to blues, and blues cannot talk to pinks. And then if you need to change some of the rules, you're just changing the software or the rules around that, and everything gets reconfigured. Or if you introduce a new workload, you say this workload is tagged as blue or red or pink, or a combination of those, and it then inherits all the rules. So automatically, you can see how easy some of this is once you start to apply the principles of software engineering to security. So what is the level of seniority on the teams you're seeing really adopting these technologies, and how do you see those teams being composed? Who are the people on these teams? You talked about security operations people. Is that the whole team, or is it something different? So it's a little bit of a hybrid. But to the first part of your question, in terms of seniority: one thing that's interesting is that a lot of this adoption, and a lot of the innovation and the thought leadership, is happening bottom up, right? It's not being driven top down. So you may have someone without any fancy title, but they have the vision, and they know how to do this, and they're really driving it; they're getting the rest of the organization to adopt it. So this has very little correlation to someone who has a fairly senior title. The second part of the question: it's really cross-functional in nature, and that mirrors how the software for microservices and cloud native is getting deployed. It's a combination of developers, it's a combination of someone who has responsibility for handling the platform (they may have different titles, and it depends on the size of the organization), definitely someone in the security organization, and someone inside the DevOps team. Now, I've listed four roles, but it's not unusual in smaller companies that maybe all four of these roles are rolled into a single person, and she's handling all these roles, right? So that's very possible. But as you get into larger organizations, for each of the roles I talked about, maybe there's a whole team behind that, but they have to work together to pull this off. They can't work in silos. What's the intersection of what we're talking about, the software engineering principles and how they apply to, for instance, the management of inference models and these other topics, with what we are seeing in the evolution of WAFs, and not just protecting the overall network, but the applications and the services themselves? So I think, if you start to apply... I mean, think about software engineering: most of software engineering and programming languages are built on a concept of abstraction. If you look at some of the advances being made in software, I would argue one of the core reasons is better tools and better programming languages. And what does that mean? Better abstraction. Because just cognitively, a programmer can do more things, because a lot of the complexity is hidden from the programmer; you don't have to worry about the registers and the bits and the bytes, you're operating at a different level of abstraction, right? So that's why, over the last two to three decades, software engineers today have become so much more productive compared to a decade or two ago, because of that level of abstraction.
So the concept is very similar, I think, in cloud native security: you bring in that level of abstraction. And as a simple example, you start to assign labels and identity to workloads, and then you start to configure rules that allow you to dictate the behavior of these workloads, what they can and cannot do. So what that takes away is the physical dependency on where that workload lives, right? It's no longer attached to an IP address, a particular system, or particular infrastructure; it really doesn't matter. And when you need to change the rules, you probably need to just go change one rule, instead of going in and changing 53 different hardcoded points of reference to a specific IP address. Right. So that's a very simple example of what I'm talking about. And some of the inference I talked about is really layered on top of that. Once you start to baseline the data of how blue workloads have been talking to red workloads for the last 23 days, and suddenly you see a difference in how those two types of workloads are communicating with each other, or some of those workloads start to make some calls out to the internet that you've not seen in the past 23 days, it starts to bring up a question of why. Why are the blue workloads behaving differently, right? Because you now have data from 23 days that tells you how they're supposed to behave. And then it gives you the ability to probe deeper through signals from process behavior, file system behavior, system call behavior, and then start to narrow down and get to what could be different and what could be the source of the problem.

 

And then there's the rest of what I talked about: so once you find the problem, what do you do about it from a mitigation perspective, while you wait for a final remediation that could take weeks or months? Can the system tell you automatically what rules and what security policies to configure, allow you to test them out through staging, and then let you promote them into production? That entire workflow. So all of those, I'd say, are principles that in some form or shape were borrowed from software engineering. So that's the beauty of it, right? We don't need to reinvent a lot of these things; we can actually dip into the history and the repository of software practices. That's a really fascinating topic; I could talk about it for a while. I'm particularly interested in the ramifications for how this may positively affect an organization. So if you can start seeing those calls out to the internet, you know that you can start tracing those calls to learn more, really. And that goes way beyond your network itself, into the extended network and the extended supply chain. Really, that seems to be a kickoff for another conversation another time. For now, I want to thank you very much. It's been a really interesting discussion. I always enjoy getting into the teams and how they're working, because I think you learn a lot through the work that they're doing, because they're the ones who are really working on the projects. So thank you for that perspective. Thank you. And we learn a lot from our customers and working with them. So I completely agree with you. Alex, this has been fun. Thank you so much for the time.

 

Tigera, the inventor and maintainer of open source Calico, delivers Calico Cloud, the next-generation cloud service for Kubernetes security and observability. Tigera and The New Stack are under common control. Thanks for listening. If you like the show, please rate and review us on Apple Podcasts, Spotify, or wherever you get your podcasts. That's one of the best ways you can help us grow this community, and we really appreciate your feedback. You can find the full video version of this episode on YouTube. Search for The New Stack, and don't forget to subscribe so you never miss any new videos. Thanks for joining us, and see you soon.

 

Transcribed by https://otter.ai