EP8 — How LinkedIn Built Their Vulnerability Management Program
Three years ago LinkedIn had no vulnerability management program in place. Today that’s a completely different story. Over the past three years, they built their program from scratch and rapidly scaled to keep their 25k+ employees and 800 million users safe and secure.
How did LinkedIn achieve this scale so quickly and what lessons were learned along the way? On today’s episode we speak with Justin Anderson, LinkedIn’s Head of Vulnerability Management, who was tasked with building out the company’s program. Justin’s experience, which spans the US Air Force and MITRE, offers a unique perspective on what it takes to overcome the challenges of scaling a security program.
Topics discussed in this episode:
- What Justin and his team prioritized as they began building LinkedIn’s vulnerability management program.
- How the scalability challenges Justin faced in the military prepared him for the challenges of scaling LinkedIn’s vulnerability management program.
- How to incentivize developers to take security seriously and create a win-win for developers and security.
- Why Justin is skeptical of the traditional security champions program model and what he recommends teams do instead.
- How security is evolving and what Justin believes security teams of the future will look like.
- Group that helps connect tech companies with Veterans – https://breakline.org/
- LinkedIn’s approach to developer productivity and happiness – How LinkedIn Does Developer Productivity Engineering with Grant Jenks
Justin: Thanks so much for having me.
Harshil: I am very excited to have you here, especially because this topic is very near and dear to my heart. I spent a lot of weekends and nights trying to figure some of these things out. And you've delivered, you've been doing it at a much bigger scale than what I had seen myself earlier. Really exciting topic for me, I'm sure our audience is going to love this as well. Before we go too far down the path of the specific topic around vulnerability management and things like that, I would love to hear a little bit about your background. Where did you start your cybersecurity career, and where were you before LinkedIn?
Justin: Yeah, definitely. So I guess I'll start off with the beginning of my cyber career. I remember when I was going to college, it was kind of up in the air, I wasn't sure what I wanted to do. I actually had been kind of dead set on this career field called Acquisition in the US Air Force. I'd been ROTC at that time, getting ready to be an Air Force officer, and I got a call from my commander in the middle of lunch my sophomore year, where they were doing the shred outs to different careers, and they're like, “I know you put number one as Acquisitions, but actually we're going to put you into our number two, Cyber Warfare Officer”. So I was a little surprised at the time, but go figure, it's actually a really great career. So I spent my first five years actually in the US Air Force working on offensive and defensive cyber operations. Went all the way up from leading 250 people as a captain in the US Air Force, running IT, and then some defensive functions all the way to Air Force Cyber Ops, working with the NSA and CyberCom to help kind of coordinate some of those offensive CyberOps. So it was a fun time. After that, I spent a couple of years at MITRE doing some more research oriented projects, also still working with the government on some IR platforms and some other fun projects there. And then these last three years I've been at LinkedIn building the vulnerability management program. So I started out with no program, and now we're here with fully automated servers being scanned on a daily basis, massive amounts of data and getting all our SREs to keep our environment healthy. So it's been a fun journey.
Harshil: That's amazing. I've had the privilege of working with a few other ex-Air Force and ex-Army people in application security and in security in general. I have noticed that a lot of the Air Force people tend to think that they are much more technologically advanced and savvier than the Army. Do you have any opinions on it?
Justin: That's the funny thing about when you enter the military, there's this thing where if you were to enlist in the military, they do almost an IQ equivalent test, it's called the ASVAB. And the way the different services work, the Air Force actually picks the highest scorers on that test for the enlisted force. So the expectation is as an enlisted person, you're more technical entering the Air Force. In the officer world and in the cyber world, ironically, I think the different services have approached cyber in different ways and the Army had the budget and the manpower to really build a robust cyber operations workforce. So I think they're actually a bit ahead of the Air Force. At the end of the day, all the smart people in the Air Force seem to want to be pilots, and there are some smart people who want to be in cyber as well, but I think a lot of the mindshare and a lot of the budget goes to the flyers in the Air Force. The Army knows the importance of cyber for sure.
Harshil: Right. Yeah well at least your arguments had data in it as compared to just opinions for all the other ones.
Justin: Haha, yeah exactly.
Harshil: So when you transitioned from the armed forces, from the Air Force to the corporate world, did you see any similarities for people who have never worked in the Air Force or the armed forces in general?
Justin: Yeah, I think there's a lot of similarities. So one thing the Air Force does give you is a very corporate environment because the enlisted folks you work with are pretty technically competent and a lot of them are experts in their field. You have to defer to their opinions a lot of times. And in some of the other services it's more directive where you just get the orders and people go execute the orders. So I think the more corporate environment definitely lends itself to transitioning into the tech world where again, you've got people much smarter than you and with many more years experience. And you may be in a leadership role supervising them, and you need to make sure that you're not trying to force your opinion. You want to get the best possible solution, and that means being malleable about your thought process there. The other thing too, I think pretty similar is just the scale. Scale in the Air Force was about a million devices, as you can imagine, very heavy on the workstations just because there's so many employees and contractors and government civilians. Flip that over in the tech world where you have much fewer employees but hundreds of thousands of servers. But again, scale problems still exist and I think a lot of the approaches that the DoD takes are informative for tech companies, they maybe don't fit one to one, but I think they are definitely useful.
Harshil: Yeah, and one of the things that I've observed in a lot of the veterans who are coming into the corporate world is just a phenomenal discipline that they come with, right? Obviously, technical and domain expertise and all that stuff is table stakes, but the work ethic, the discipline, it is just amazing. And in my previous company, we used to work with an organization called BreakLine, which was placing a lot of the veterans. So for any of the audience members, if you're looking to hire really good talent, you should go check out BreakLine. Really good technical talent, security talent coming out of that.
Justin: Plus one. Actually, I did go through BreakLine. I didn't attend the full thing, but I did meet up with them. They helped a lot with the recruiting process of tech companies. Transitioning is just a huge burden. I feel for veterans because the tech industry really doesn't have a lot of veterans. There's not a lot of people in your network you can really reach out to to transition over. So I'm really a big fan of their work, and I completely agree with you. Veterans I think bring a level of discipline and a level of ownership too that I think is really great at tech companies. Because sometimes you run into people who are happy with the way things have been done and don't really want to change too much and may not feel like other teams' problems are theirs to solve. But I think generally if you throw a veteran in front of a problem, they'll try to solve it even if it's not their problem.
Harshil: Yeah, that's awesome. So let's talk about those problems that you faced initially when you got out of the Air Force and joined MITRE. So you were working on what exactly at MITRE?
Justin: So at MITRE, I actually stayed very government-adjacent. So I worked with the US Air Force for their Cyber Command kind of equivalent. So mostly it was focused around building incident response platforms, we wanted to have the buzzword at the time, and still the buzzword is “single pane of glass”. You want to see all your different metrics, all your different alerts, like very critical alerts in one place, and then also just bring a data driven mindset to the leadership there. I think it was kind of a new thing at the time. The generals kind of still were operating in this ambiguous space, not sure what the environment looked like, not sure how to make decisions. And so me and then some colleagues basically partnered up with some smart officers in the Air Force Cyber Command and just built this platform to help them kind of inform and make decisions better. And then also bring in stuff from the ATT&CK team on how threat actors work, how they can build the detections for this, and then partnering with our friends in the offensive CyberOps in the US Air Force to figure out, as an attacker, how would you approach these things, and what are some lessons we could bake into this product? So that was a couple of years just building that up, but unfortunately, we can't really get into the fun details of that particular assignment.
Harshil: But that's a really good journey though, right? Like, you learn so much about the offensive side of things, you get hands-on experience, you understand how those things operate, and then you go into the defensive side with that knowledge and you understand, or you're trying to scale up things like vulnerability management. And it's very handy to have that perspective of what the offensive side even looks like.
So when you were in your next career step at LinkedIn, was there a big challenge that you can share that got you excited about solving?
Justin: Yeah. So I joined LinkedIn as a technical program manager, which means your job could really be anything depending on what the initiatives are and how the wind's blowing for the day. And so it started off in a function focused mostly on potential brand impacting events and kind of building some rigor around how we respond to those types of events, bringing that incident response background to build that process out. And that was pretty fun, for sure. But a couple of months into that role, I found out that we had internal audit requirements, we had external audit requirements, and it was becoming a big issue that our vulnerability management program was loosely defined but not executing at all. And so they needed somebody to go build that up. And so, funny enough, my first experience of vulnerability management at LinkedIn was being handed about 100 tickets having to do with our IM systems. And they're like, “You're a technical program manager, go solve these tickets”. So I looked at them like, “Alright, this is a very small set of systems, it's a very small set of vulnerabilities. It seems like we don't have a really good process around this. Let me go do some digging”. And yes, after digging around, find out that actually outside of compliance requirements, we really didn't have a good idea of how to address any kind of vulnerabilities at scale. And so I got a couple of engineers together and we started brainstorming. We're like, “Okay, how can we properly scale this and actually address these really fundamental problems for our infrastructure?”. And so what we ended up doing was partnering with SRE orgs and figuring out, “Okay, how do we get data? We need data”. Step one, figure out what's out there. And we ended up just going with, “Okay, we're going to use a third party agent”. It’s simple enough, we can deploy it everywhere, we don't need to build it ourselves. It's just the time to choose from that. 
And then we had to go convince our SREs, “Hey, we want to put these agents on every one of your production systems”. And of course they're terrified of that. The scariest thing to them is like, yet another agent, might boom the system, might cause LinkedIn.com to go down. But by proving it out in small environments, small critical environments, building that credibility and then building those relationships, we eventually scaled it out to the entire infrastructure. And then with that data, the tough thing for them was we're actually creating more work. They didn't know yet, we're putting it on our systems, they're helping us, but then the agents are on the system, then we come back to them, “Alright, we have some vulnerabilities we're finding. Let's start solving these”. And I think the tendency for most teams is to try to be very transactional about that. But we definitely focused on building scalable, high ROI processes on top of that. So we scaled it from there.
Harshil: So I think one of the particular things that makes this process a little bit difficult is as security professionals, we spend so much time and energy and resources in deploying these systems and finding all those vulnerabilities out, and creating tickets or what have you. And most of us, we expect when we create tickets, somebody should do something about it, right? But that's not the reality. Like, people don't actually just fix tickets because we created them. So this process that you talked about in terms of getting or convincing the SREs or whoever owns the underlying platform to actually go and spend time and fix those things. They clearly either didn't know about it or they were not incentivised to do it, or whatever the gap was, but there were some gaps. What makes this challenge particularly challenging? Because in an ideal world, you find problems, you tell whoever owns the infrastructure to go fix them, and they would go fix it. But that doesn't happen. So what makes this piece challenging, in your opinion?
Justin: Yeah, you brought up some good points there. I think the core thing is definitely incentives. If I'm an SRE, my entire job is maintaining the availability of the infrastructure I work on, and ultimately LinkedIn.com. And when you come to me and say there's a security issue that requires me to apply some change to my environment and then potentially reboot or do whatever else I need to do to patch it, you're introducing the potential for my system to fail. And that's really bad for my system. So my immediate inclination is going to be like, “Don't touch my system. It's running right now. Please don't disrupt it at all”. And the other part of that too is I don't clearly understand what the value of fixing this vulnerability is either. And so what we ended up doing was trying to figure out how do we align these incentives? How do we make this a kind of dual win? Because that's where we try to figure out, okay, at the end of the day, individual vulnerabilities aren't really that significant. Unless it's actually an exploited vulnerability and it's actually an imminent risk to LinkedIn, then what we really care about is actually cycle time. How quickly can we refresh our infrastructure, just clean slate because stagnancy is kind of the enemy of security, where if you have a server that you can't turn off and it hasn't been updated for ten years, that's the place the attacker is going to try to get their first foothold so they can go pivot everywhere else. So we started partnering with them to figure out, “okay, how do we increase your cycle time?”. And then also communicating like cycle time is good for both of us because you can't be on Rails 7, version X for the rest of your life. You're going to have updates, you're going to have software that depends on dependencies from newer version software and things keep moving forward. And if your infrastructure is stagnant, you're not going to be able to adopt those new technologies. 
So we kind of communicated that dual benefit and really got the alignment on why this is a good thing for security and why it's a good thing for SRE.
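A minimal sketch of how "cycle time" could be tracked in practice, assuming each host records when it was last rebuilt from a fresh image. The `Host` type, the host names, and the 30-day threshold are hypothetical and only illustrate the idea of surfacing stagnant infrastructure; they are not LinkedIn's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Host:
    name: str
    last_refreshed: datetime  # when the host was last rebuilt from a fresh image


def stale_hosts(hosts, now, max_age):
    """Return the names of hosts not refreshed within max_age, oldest first."""
    overdue = [h for h in hosts if now - h.last_refreshed > max_age]
    overdue.sort(key=lambda h: h.last_refreshed)
    return [h.name for h in overdue]


# Hypothetical fleet and a hypothetical 30-day cycle-time target.
now = datetime(2023, 1, 31)
fleet = [
    Host("web-01", datetime(2023, 1, 20)),
    Host("db-07", datetime(2021, 6, 1)),
    Host("cache-03", datetime(2022, 11, 15)),
]
print(stale_hosts(fleet, now, timedelta(days=30)))  # ['db-07', 'cache-03']
```

Sorting oldest first puts the hosts an attacker would most likely target for a first foothold at the top of the list.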
Harshil: Yeah. So I like that idea of aligning with cycle time, however, I have a question on whether it satisfies the typical security needs or not. So I'll give you an example. So when you have scheduled refreshes like cycle time, so a lot of companies, they decide, “Okay, we're just going to refresh our infrastructure every quarter, every month”, whatever that is, right? So that has nothing to do with the underlying vulnerabilities or risk that you might be exposed to, right? Because that's a set schedule in a lot of cases. Now, let's just say there's a CVE, that's a critical, there's no exploit, but whatever scanning vendor rates it as a critical CVSS 9.8 or whatever that is. Now by compliance standards, you're required to fix it within seven days or 15 days or 30 days. Now, how do you get those two different approaches together? Like, now you have an SRE team that's upgrading our infrastructure once a quarter, so you have to wait at least 90 days. It doesn't matter what your CVE score is or CVSS score is. And then compliance auditors, they are looking for SLAs on vulnerabilities. How do you bridge that gap?
Justin: Yeah, so there's two things I think I heard there that I think are interesting to dig into. So the first one is cycle time not necessarily being related to vulnerabilities. And that's where I think security teams, you own that problem, if you can increase cycle time, then it's incumbent on you to build that process in a way where every refresh does fix vulnerabilities. So that's where we created automation to do image certification, for example, where as part of their performance testing pipeline, we also inject vulnerability tests as well. So we can say, “All right, this image is completely free of vulnerabilities at this time, and then we're going to give it a time to live of X number of days. As long as you're running this image within that TTL, it's healthy, and we're happy and you're not vulnerable. So we're good”. That will be our cycle time. So definitely that step is required for cycle time to matter. The second part of that on compliance, I think security professionals sometimes think these compliance mandates are more strict than they actually are, and so they'll paint themselves into a corner where they set some arbitrary standard of risk which may or may not be correlated to actual risk at all. And I think if you're in that scenario you probably need to work with your GRC team to back it up a little bit and figure out what the right amount of risk is and then what is risky and what's not risky. And so what we did, because we had the luxury of building the vulnerability management program from the ground up, we got to partner with our GRC to set these policies up and we set them up in a very pragmatic manner, which is we bucket the high risk items into kind of non-exploitable high risk items, as far as what an audit would consider as high risk, and we do the refresh to just clear those out in a regular way.
We also have our secondary, I guess you call, extremely high-risk, which is again exploit-available, actively exploited, things that I can have the red teamer actually use to get access to the rest of the environment. These are really bad things. And those we just treat entirely separately. We don't consider those compliance at all because this is past compliance now. This is like we're keeping the company safe and we want to keep our reputation intact and so we have those of course, like asap, fix immediately. And the nice thing is the faster your cycle time gets and the more resilient your infrastructure is, the easier it is to respond to those secondary issues too because it's no longer the end of the world when your SREs have to stop what they're doing and go push an emergency fix. Their workloads have already been sufficiently disaggregated from the host they're on and they're able to push these changes with more confidence.
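The two ideas above, a certified image with a time to live and a separate track for exploitable issues, can be sketched as follows. This is an illustration under stated assumptions, not LinkedIn's implementation: the `CertifiedImage` and `Finding` types, the bucket names, the placeholder CVE identifiers, and the 30-day TTL are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CertifiedImage:
    name: str
    certified_at: datetime  # when the image passed its vulnerability tests
    ttl: timedelta          # how long the certification is considered valid

    def is_healthy(self, now):
        """An image is healthy while it is still inside its certification TTL."""
        return now - self.certified_at <= self.ttl


@dataclass
class Finding:
    cve_id: str
    exploit_available: bool
    actively_exploited: bool


def triage(finding):
    """Route a finding to the emergency track or the regular refresh cycle."""
    if finding.exploit_available or finding.actively_exploited:
        return "emergency"          # fix immediately, outside the compliance flow
    return "scheduled-refresh"      # cleared out by the next certified image


image = CertifiedImage("base-os", datetime(2023, 1, 1), timedelta(days=30))
print(image.is_healthy(datetime(2023, 1, 20)))  # True
print(image.is_healthy(datetime(2023, 3, 1)))   # False

print(triage(Finding("CVE-EXAMPLE-1", False, False)))  # scheduled-refresh
print(triage(Finding("CVE-EXAMPLE-2", True, False)))   # emergency
```

Keeping the severity score out of `triage` is deliberate in this sketch: the split Justin describes depends on exploitability, not on the scanner's rating.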
Harshil: Yeah, so that's actually a really good way to think about it. So when you follow that path of pushing a hotfix for a real issue that is exploitable, was there a leadership alignment saying like, “Hey look, there's going to be a cycle time and for hot fixes or for really important things when it's all hands on deck, we just need you to do it and not push back all the time”. Because one of the things that I had seen earlier is when you don't have buy-in - maybe I'm answering the question for you - but when we don't have the buy-in, a lot of the individual engineers might come back and say, “Well, I don't believe this is exploitable, show me an exploit or show me this proof of this”, right? But we can't just keep spending energy and resources in proving to somebody that this is a real issue. So how did you manage that alignment with the SRE teams or the other team saying, “Hey, if this is a real issue, we're going to come to you and we just have to fix it”.
Justin: Yeah. So I think that's where a lot of security programs have challenges, is building that credibility. And credibility is very fragile too. If you mess it up once you just damage your reputation, you're going to have to build your way back up out of that. So as security professionals, I think that does lead you to err on the side of maybe more risk acceptance than most security people would be comfortable with. But for the case of like, let's say actually exploited vulnerability, or communicating with SREs, we've already built a culture at LinkedIn, where if we say something is P zero, everybody across the company understands P zero and all hands are on deck after that. Some cultures are different. You have to definitely build that credibility again, but part of it is understanding how attackers work and actually knowing how the exploit works and how the vulnerability works. I personally did an OSCP just to kind of learn better how hacking works and how vulnerabilities get exploited and how would an attacker leverage these things. And definitely companies have red teams for the same purposes. You want to have those attackers, they can tell you is this a legitimate concern or is this a theoretical concern? Because where you lose credibility is when you push too hard on a theoretical concern and you get called out for it, and so that becomes an issue. But on the exec buy-in side, I definitely agree as well. And this is for emergencies, this is for cycle time as well. It's just so crucial to have that exec buy-in, and this rigor around compliance being a top level concern, just like availability is. And that's something that we had to build at LinkedIn more recently. And it's also something that requires a lot of work just to maintain because you have to embed yourself in those executive reviews.
You have to embed yourself in the project review meetings, and then get really good metrics and get really good at keeping the executives interested in the metrics. And so every security program, I think since the beginning of vulnerability management, everybody's known that's like, the thing you need to do to convince the execs. But it's one of those things where if you don't do that groundwork ahead of time, you're going to be fighting an uphill battle every single second of your day.
Harshil: Yeah, that is 100% true. I could not agree more. So this is interesting what you mentioned, which is there's got to be a level setting between the security teams as well as the executives and the leadership on the engineering side. Did you run into cases where within the security organization, somebody was saying, “Hey, this is really important, we have to actually fix it now”. But then you thought maybe it's actually not let's accept the risk, let's not wake up people in the middle of the night or on the weekends and it's actually not that important. Have you seen those cases arise as well?
Justin: I've seen that throughout my career. Military was especially good at this, where my favorite one was all these speculative execution vulnerabilities back in the day, Spectre, Meltdown type things where the whole world's hair got lit on fire. Every CISO and every company was like, “We must force patches immediately”. You look at the actual risk of that being exploited, other than if you're a cloud vendor, they had a different scenario where that model actually had to address that immediately. But the rest of the world I feel like overcorrected and just caused major downtime, invested a ton of engineering time in patching that, and really the value and the risk reduction seemed very low and I think we're still struggling with that a little bit today. I think when a vulnerability comes in the news cycle and you start reading on CNBC or the New York Times or very public venues, then I think CISOs tend to just jump all over that and there's some credibility, even though it’s low risk it’s still potentially high reputation damage if you were to get exploited from one of those. So it's definitely something to look out for. But I think the security teams are responsible to hold each other accountable. And security leadership especially, that's your job to call a stop if you feel like maybe some of your engineers are getting a little too worked up about something that might not be exploitable and make sure you hear them out on why they're concerned about the risk level of something. But at the end of the day, you're running a business and you need to be pragmatic about how you invest your time. And the cost of responding to an incident, a vulnerability incident for example, it can go into thousands of engineering hours, which if you multiply that times tech salaries, that's a lot. Millions of dollars just invested into fixing something. So you want to make sure the dollars are wisely spent.
Harshil: Yeah. I'll give you an interesting example. In one of my previous companies we organized one of the first security hackathons. It was like a whole internal bug bounty program. So the team was amazing. We set up this whole half day competition with leaderboards and all that stuff and really cool prizes, and it turned out 70% of the engineering team wanted to participate in that. And so for almost the whole day the entire engineering was shut down. We were just doing hackathon and bug bounty. We did not have a very favorable feedback from the engineering leadership after that day.
Justin: I actually love that type of stuff though, because that's not responding, that's just culture building. And so we do something similar with capture the flags or something for our InDays or hack days. And it creates excitement around security, which you would not get otherwise. I think a lot of people try Security Champion Programs, which is kind of like the buttoned up version of this where you're like, “Hey, you're the security person, come learn about security and champion our values”. But you get a couple of engineers excited about cracking some hash to find a password or getting a shell on a host, and that interest doesn't die down quickly. And now you've got a real champion inside that org who's pushing for security and maybe eventually will join your security org just because they want to learn more.
Harshil: Yeah, so that's interesting. Do you have any, like a Security Champion or any similar programs who are responsible for helping with vuln management?
Justin: So we had a Security Champions program more on our platform security side. So like architecture reviews and then also some of the pen testing products. On the product security side, we had some Security Champions that is getting renewed here. I personally am a little skeptical of the model just because if you're trying to have somebody who has a full time day job and their own area of expertise also get an additional day job and a new area of expertise without the incentives necessarily realigning, it doesn't make sense to me that they're going to revolutionize your security posture on those products. And also those people move teams, they move jobs, there's a lot of churn there as well. So I really don't feel it's a great strategy for lasting security. That's why I focus more on kind of the infrastructure side, like let's build secure tooling and build security libraries and build security capabilities that help them do their job without needing to care about it. But some companies definitely, I think, see value out of it.
Harshil: Right. So in terms of what you just mentioned, which is building secure tooling, secure frameworks and secure libraries and things like that, who actually does it? Okay, so let's take a step back. Who identifies that, “Okay, we need to build this secure library or this shared system that every developer should use, or every SRE should use”. Who decides it, who builds it, and how do you get it adopted?
Justin: So I think this is kind of a new trend within tech companies. I think we'll start seeing it reverberating out to the rest of the industry soon, which is more focused on developer productivity. And so at LinkedIn we have a whole VP level org called Developer Productivity and Happiness. And their whole idea is they build tooling to make developers faster.
Harshil: Does it come with free avocados as well?
Justin: Haha, well it is LinkedIn, of course it comes with free avocados, haha. But these teams are building these libraries so that developers can code faster and they're building metrics around what is your deployment velocity, how quickly are reviews done in your team? Part of that though is security. If I am a developer and I need to publish a service that has some auth or something, I probably want to integrate with some SSO service to make it frictionless. And if I'm a developer not from a security background, I don't know who to go to for that. So usually what we see is security orgs will build their own libraries and I think that's the right place for now. But eventually I could see this functionality just being peeled off into either third party libraries, i.e. like Fuse, Okta, you can integrate, previously a developer can do that. Or maybe this becomes an embedded part of developer productivity teams or the software engineering team themselves. But eventually I think we need to get out of this place where security takes its own mind space and its own specialization, in its own function to get people to do things right. And more like it's just hard to build things insecurely. In fact even with framework updates, it becomes a lot easier for us to, without any kind of custom libraries or ownership, have secure code shipped. I.e. if my team's using C++ and that's the framework we've always used, then I have to worry about a lot of memory safety vulnerabilities. If my team is newer and we decided we want to build on top of Rust, then a lot of those concerns are gone. So a lot of it's just getting the industry to catch up on the frameworks. And then if there is a need for extra scaffolding on top of what your company does right now, have your security team build that. We've had I think attempts at LinkedIn where we'd say, “Hey dev teams, this library is insecure, please go fix it”.
And almost always again, incentives problem, they don't build it to actually fix the problem or it gets built too slowly and then just sits stagnant for a while and no longer solves our concerns. So at the end of the day we realized, “Alright, let's just build these ourselves. We'll partner with the teams to make sure it meets your requirements”. And then we as security get to worry about are they productive, is the tool easy to use? And then especially is it effective at driving down risk? And we care about all those things together so we can carefully monitor that versus like other software engineering teams who might just only care about the productivity part.
Harshil: Right. Yeah, I think that's a good point. Although there are a little bit of unique dynamics for security as well. So for example, a lot of the productivity things could be optional, right? Like if a developer wants to decide to take the longer route and do it in an inefficient way, it's okay, it's their choice. But the same decision-making in a lot of cases cannot be given to a developer. You cannot say a developer wants to choose Log4j and just introduce that. Sure, you can use a vulnerable version of it. You can't allow that, right? In a lot of cases. So how do you maintain that reasonable level of control? I guess back in the day in the waterfall model there would be change approval boards or change review boards or whatever, and you can approve things or block things or whatever. But in this modern engineering environment where developers don't really need anybody's approval, and in that case if you're incentivizing them, or if your objective is to drive adoption of secure frameworks and libraries and things like that, do you only take an incentivisation approach? That “Hey, this is great, like you should use it, if you don't, it's your choice”. Or you actually come in with a mandate and saying “These things are optional, it's good to use, but these things you just have to do it”, and how do you enforce that?
Justin: So I like to think of this as the transition between every company having a very robust QA organization, to the more modern standard of tech company where QA is part of a software engineer's responsibility. You build your own unit test, you build your own integration test. I think security is in a transition where we're still QA and we try to inject these waterfall-like processes into these agile teams, in these fast moving teams and it never works. The worst case is we block those teams from deploying software and hurt the company by slowing down our progress and potentially our products are no longer competitive because we're going to be out-competed by companies who can ship faster. The alternative is kind of what you're talking about, which is we recommend some libraries and there's potential there for people to make bad decisions and leave us in an insecure state. So kind of what I propose and what we're doing a bit here at LinkedIn is more towards the latter of like let engineers make their own decisions, but build things in a way where they can't make really really bad decisions. They can make maybe suboptimal decisions, but the really bad stuff we're going to block. Let's take Log4j for example. If an engineer imports an old version of Log4j, obviously that's not ideal. The best thing to do would be get it out of our monorepo so they can't even import it in the first place. That'd be one way to stop it. Let's say like we never clean it up and they import it into their software and they try to deploy it. The thing we did as a response to Log4j is we pulled an agent JAR patch to break the functionality that makes Log4j exploitable. And so this JAR is actually injected when you build any library on top of our production services. So even if, at your local workstation, you pulled an old Java and an old Log4j and you built your package and it's vulnerable, once you ship it to production, we completely de-fang it and this is no longer exploitable.
And there's no way now for you to kind of make mistakes that would be really bad for you and really bad for LinkedIn, we just prevent it entirely. Like those types of things. And there's some edge cases that are definitely harder to approach in that way. And I think that's still an area where we need to kind of incentivize and we need to maybe take a little bit of the engineers' mindshare to be able to fix those problems. But I think the majority of problems these days can be solved at the infrastructure level, especially with more infrastructure as code and more capabilities to have more constrained environments at the platform level, but more creative environments at the software level. So I think the trend is going to keep shifting more towards, again, more developer autonomy.
Harshil: Yeah, that's phenomenal. I think I've seen more and more adoption of this exact same practice. And some people call it a security engineering function where you're building those guardrails, paved roads, whatever it is, to give an easy path to adopting secure services, but also at the same time, as you mentioned, de-fang the really dangerous stuff without the developers having to do anything about it. So that's pretty awesome. In terms of somebody who is looking to build these types of functions, whether it's empowering developers to choose to make the right decisions or building an automated vulnerability management system, and if someone is starting from scratch, like what you did a few years ago, do you have any key pieces of advice? Like how do you think about it? How do you staff this function, and what are some of the key decisions that you have to make?
Justin: Yeah, definitely. So I guess I'll answer staffing real quick and then I can go on to the advice on the core things you need to worry about. So staffing for our function, because we can build ground up in a tech company, I have the luxury of… we have a team of software engineers and SREs I can leverage, core builders by nature. Actually, my team is entirely non-security focused software engineers and SREs. I'm the only person with a security background and that model has worked pretty well for us. I don't know if it works for all teams, but at the end of the day, you do need builders. And I think that's the challenge that a lot of security professionals are going to face if they aren't also learning software engineering skills and how to be builders as well: as we migrate more towards building security tools, they're going to have less relevance in this old model of like they perform this QA function. So I'd say that's a cautionary thing for a lot of security engineers out there. But you need a team of builders because again, you're building a structure, but you also need your product manager type. That could be me as the engineering manager who has product experience, that could be a dedicated PM, it could be a smart engineer who's like really interested in security and knows how to solve some of these problems. But you do need that function. And then of course, they need to make sure that the products they are building are performant, they're easy to adopt, they solve the security problems and all those things that make a good product a good product. Because if you have an ineffective product too, you're going to get no adoption, and then of course, we'll be back to square one with our security posture. But then on the advice side, let's build out a vulnerability management program.
One of the big things we need to worry about, I mentioned a little bit earlier that telemetry is the number one thing that I think every security program needs to make non-negotiable. And I've seen some paved road approaches where they actually kind of cede telemetry and they say, “Alright, we'll create libraries and if you adopt it, you adopt it”. But if you don't know who's adopting your libraries and where they're adopted, and where your crown jewels are, and then where your data is and all this context, then you can't really make good security decisions. So that's something for sure for any vulnerability management program, you need that either agent based or if you're using agentless, you need that authenticated scan data to figure out what vulnerabilities are out there. And then of course you need a recurring inventory refreshed at least I'd say every 24 hours, so you have a good understanding of the state of your network. And then more and more we're seeing teams kind of try to build a security posture function of like out of my entire ecosystem, what are my weak points, what all do I have out there? And then how do I, in the case of an emergency, find all, say, Log4j for example, where do I find everybody using Log4j? And then where is that specifically routable from one of our customer facing inputs where we're actually exploitable? You need to constantly build that understanding of the environment.
The second thing I'd say is you want to get better. And to get better you need to figure out where your IT teams and your SRE teams are stagnant and where they're having trouble kind of pushing forward on modernizing their infrastructure. Because modern infrastructure is good for security and it's good for availability as well. And so partner with them on driving some of those core capabilities. It could be asset management, it could be config management, any of these core functions that help you again better deploy software and more quickly refresh your infrastructure. It's a dual win and a lot of times that's a reason too, they want to do modernization, but they're constrained too by their… they don’t have so much time, and their execs are pushing for maybe we want to do a lot of ML infra, we need to robust that up really quickly. Well, you need to sometimes push back and help your SREs push back and say, “Actually this is tech debt clean up quarter. We're going to fix all this infrastructure we have. We need to make that scalable because if we wait another year it's going to be an insurmountable problem”. So partner on those areas. And the last thing again we spoke about a little bit is definitely getting the execs to care because if your execs aren't bought in to the strategy then, even if you have the most willing SREs in the world who want to do the right thing and want to build scalable infrastructure and refresh stuff and close all the vulnerabilities, they're not going to have time to do it because their execs are setting the priorities and if compliance isn't a priority they're going to be doing other things. So yeah, just again getting that buy-in is absolutely crucial, and make friends with your SCPs and the SREs I'll say.
Harshil: Yeah, that's a great set of advice. In terms of getting in front of the leadership, are there any metrics that you typically share with them to demonstrate whether the vulnerability management program is working, or not working?
Justin: Yeah, I like to keep it simple because I think if you get too elaborate you lose attention. And so we built two core metrics we care about and we share them with all the execs. First was coverage which again like I said telemetry is non-negotiable. And I wanted to communicate especially as we were ramping our agents everywhere, I wanted to communicate how much of our environment is covered and what are the areas we’re getting push back where like they don't want to install an agent or for some reason they can't install an agent, and where we have no telemetry and it's basically a blind spot in our understanding. And so we publish that coverage metric and we always maintain 95% at least. And if we have any kind of significant drop it becomes a really big issue where we create a P0 equivalent kind of notification to say like, “Hey, we must re-establish visibility on these areas”. So the first thing is coverage, the second thing is just compliance. And we keep this generic as well. For us, compliance means high and critical vulnerabilities must be within SLAs, standard kind of compliance metrics. If you drill down deep into it we are also able to kind of like say, “Okay, here's why we picked this level of risk and here's why we picked these SLAs for the environment”. But to the execs we don't communicate that, we try to abstract it out. We just say overall compliance you're at X percent, which keeps it simple and then of course everybody knows like execs are competitive, that's how they got to be execs. So you have to do the report card, then you do the dashboard, then you do like, here's how you're doing against your peers, and that's pretty powerful too. But I think at the end of the day, it's just like keeping a very simple number and then showing trends over time. And that's what we do too. It's like, “Where were you last time we talked to you during exec review? Here's where you’re at today. Are we trending up or trending down?”.
And then have those system owners, like the IT teams or the SRE teams, be in that meeting to talk about, “Oh, here's the big automation issues we did to get that number there”. Or if there's a slight slip, like, what's the reason? Because at the end of the day, you don't want to be the vulnerability management team saying like, “Oh, this was a miss this quarter because this infrastructure we don't own wasn’t upgraded on time”. It's not really a compelling story. Like, have the system owners there to represent.
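The two metrics Justin describes, coverage and SLA compliance, can be illustrated with a minimal Python sketch. The `Host` and `Finding` records and the SLA windows in `SLA_DAYS` are hypothetical and are not LinkedIn's actual data model or thresholds. Only the 95% coverage floor and the focus on high and critical findings come from the conversation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical SLA windows in days; the episode does not state LinkedIn's values.
SLA_DAYS = {"critical": 7, "high": 30}
COVERAGE_FLOOR = 0.95  # the "95% at least" coverage target mentioned above


@dataclass
class Host:
    name: str
    has_agent: bool  # True if the host reports telemetry


@dataclass
class Finding:
    host: str
    severity: str  # e.g. "critical", "high", "medium", "low"
    found_on: date
    fixed_on: Optional[date] = None  # None while the finding is still open


def coverage(hosts):
    """Fraction of hosts that report telemetry."""
    if not hosts:
        return 0.0
    return sum(h.has_agent for h in hosts) / len(hosts)


def compliance(findings, today):
    """Fraction of high/critical findings fixed, or still open, within SLA."""
    in_scope = [f for f in findings if f.severity in SLA_DAYS]
    if not in_scope:
        return 1.0

    def within_sla(f):
        end = f.fixed_on or today  # open findings age until today
        return (end - f.found_on).days <= SLA_DAYS[f.severity]

    return sum(within_sla(f) for f in in_scope) / len(in_scope)


def coverage_alert(hosts):
    """True when coverage drops below the floor and should be escalated."""
    return coverage(hosts) < COVERAGE_FLOOR
```

Reporting one number per metric, and recomputing it on every inventory refresh, makes it easy to show the trend from one exec review to the next.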
Harshil: I love it. I love the idea to have the SRE or the IT teams in that meeting with you. So it's a real story and it's not a security person reporting good news or bad news and taking credit for the good news or the bad news, right?
Justin: Exactly. Yeah, it is a little bit awkward too being I'm the person who's responsible for finding all the things wrong with the organization. It puts you in a weird spot. So I like to be able to celebrate the wins with them too.
Harshil: Right. Final question, in terms of how you see the industry moving towards in managing vulnerabilities, whether it's infra or app or what have you, how do you see this evolving over the next four to five years?
Justin: Yeah, I alluded to this a little bit earlier, but I really do think stovepipe security teams are going the way of the dodo as far as if we have better infrastructure, we no longer need this set of… because again, I look at QA, I look at old school sys admins, I look at all the kind of historical artifacts of that waterfall process and building software in that waterfall manner. And to me, it seems like security is the last team running in this old school way where we serve as an external function, looking in at how things are built and then either offering advice or blocking things. And that's just a terrible model for the way we operate infrastructure these days. And so I think with more auto updates being pushed and more infrastructure as code, and more ability to stop bad things from happening dynamically, and also help the developers make better decisions proactively, I think security teams are going to have less and less job to do. It's going to be more focused on do I want to build infrastructure. Our dream right now, for my team in particular, and I think across the industry, is how do I get the feedback that you are about to make an insecure configuration change or that you are about to introduce software that's going to create a vulnerability, or that you have to go take a look at something where a new vulnerability is found and this infrastructure is vulnerable, and get that feedback there immediately because right now cycle times are very slow. If you think of like a standard vulnerability management process, it's like scanner creates detection, detection goes to the security engineer, security engineer creates ticket, ticket goes to the SRE, SRE looks at the ticket, goes to the system, makes a change, goes back to the ticket, closes the ticket, back to security engineer, validate, close. That's like the most ridiculous process for a tech company to be running.
So we want to make the process: SRE makes changes, sees changes are bad, reverts. Or infrastructure goes bad, SRE sees change, makes a change. No intermediaries. So I'm hoping that we get there pretty soon and hoping our infrastructure and our tooling especially again from the vendor space, I would hope we get better and better capabilities to do that. And I think there's some good evolutions like Snyk on the SCA side started creating some tooling made for devs which I think is the right part to target. Wiz.io is looking at creating some more CI/CD type infra to give SREs this capability. So hoping to see more and more in that direction.
Harshil: Awesome. Yeah, Tromzo is doing its own part as well to help drive that change. So we're excited about that exact same future. Well Justin, this was a phenomenal conversation. Thank you for diving deeper into this. These are really really actionable insights that you shared here. Thank you for your time and I really appreciate you coming on this podcast.
Justin: Yeah, thank you, Harshil. This was a fantastic conversation. I appreciate it.