EP 17 — Tim Brown: Behind the Scenes of the 2020 SolarWinds Breach
Those in IT, DevOps, and SecOps are all too familiar with the demands of a complex and dynamic technological landscape. For more than two decades, SolarWinds has helped technology professionals and organizations manage and adapt to an ever-expanding ecosystem of IT applications and infrastructure.
In this episode, Tim Brown, Vice President of Security at SolarWinds, gives us an insider view of the 2020 cyberattack where hackers slipped malicious code into the company’s popular network management system and software program, Orion. He shares how his team worked tirelessly to resolve the breach, and how this incident has brought light to the software supply chain security issue and has helped strengthen the whole security industry.
- Tim’s perspective on the dependence of security maturity on engineering process or development process maturity
- How the SolarWinds team handled the 2020 breach
- The importance of creating SBOMs for every application and learning to utilize the data to protect against security vulnerabilities
- Tim’s advice for security leaders working with a supply chain
- What supply chain security will look like in the next few years
Harshil: Welcome, everyone, to yet another episode of the Future of Application Security. Today we have a very special guest: Tim Brown, the Vice President of Security at SolarWinds, who has decades of experience in security and has been a very forward-thinking security leader over the past several years. Tim, welcome to the show.
Tim: Thank you. It's great to be here. I sound old having decades of experience.
Harshil: Haha. You started early.
Tim: That's it. I did start when I was twelve.
Harshil: That's right, haha. Tim, before we go too far down the line, maybe shine a light on what your long security experience has looked like over the past several years. What do you do now, and how did you end up in this current position?
Tim: Yeah, absolutely. So I have been around for a long time. I started out as an engineer building solutions, then an architect and a distinguished engineer designing solutions mostly in the security space. So I cut my teeth really at Symantec for about twelve years, where we acquired, I don't know, 30 products. Really the CTO office back then was myself and one other, Rob Clyde, and we really ran the team. I focused on internal architecture and strategy and really building out, making sure that our products were secure in that way. Then off to a startup for a little while, off to CA where I was CTO for the security unit for them. So identity and access management, a number of different solutions that we had. Then a startup for a little while again, then over to Dell Software. I was one of six Dell fellows, that's the highest level of technical achievement for Dell. So I was one of six of those. Again, distinguished engineer there focused again on security solutions, and really focused on what we were bringing to market, and helping to make sure that we met market needs, those types of things. So I've spent about 20 years telling CISOs kind of what to do and how to manage and those types of things. And I really wanted to get operational experience. I really did want to understand and really sit in the shoes that they're sitting in. So when I took on the head of security role for SolarWinds, that's one of the things I wanted. I wanted both product security as well as operational security. And I started that about five years ago and didn't expect to have it land where it did, right? So I was running security before our incident, running security during, running security after the incident. And in that process I took the role of the CISO on, just to be official, that I was head of security for the corporation. So that's kind of my journey through life in a few minutes.
Harshil: Yeah, that's a very distinguished career with some of the very well known companies for sure. Now at SolarWinds, there's some understanding of what SolarWinds actually does, but I think SolarWinds actually does a lot more. It's a very complex company, a lot of different products. It will be helpful to the audience if you can help them understand the wide variety of things that SolarWinds does.
Tim: First off, we don't do solar and we don't do wind. So we are not a power company. That's the first thing everybody thinks. So we've been doing IT solutions for a number of things, solutions for your IT groups, really the IT practitioner. So, network management, we've been number one from IDC for a number of years. So if you want to manage your network, if you want to know what every switch and router looks like at your 7-Eleven, if you want to look at how your environment is structured across the world, it's complex environments, many different nodes, if you want to see if something is slow, something is performing or not, the SolarWinds Orion platform is really the go-to platform for many. So, along with Orion, we have database tools, we have service management tools, we have logging, we have security tools and SIEMs of our own. So right now we’re at around 40 or 45, 50 products or so, and we've done a lot of free tools that many people around communities use. So we count somewhere around 200,000 to 300,000 customers of all of our solutions. So we are in Fortune 500 companies with some of our products. Pingdom, Papertrail, solutions that people just use kind of every day in their normal everyday life. So, SolarWinds has a lot of great products all focused on really simple, powerful, secure solutions. And that's really the model that we go to work with, that we are providing solutions that people can install, that people can manage on their own, that they can run and get value out of. And that's what we've been doing for many, many years.
Harshil: Right. I know early in my career I used to work quite a bit with SolarWinds Orion, and Pingdom more recently. Very commonplace tools. They might not be 100% security focused, but quite a few of them are very common.
Tim: Right. And they come from, as I always say, a well managed environment is a secure environment. You want to manage it well, you need to monitor it, you need to manage it, and that's where our solutions really come in.
Harshil: Right. And I'm guessing across those 200,000 to 300,000 customers, there's a variety of them, including Fortune 500, government, all kinds of different enterprises as well are part of that.
Tim: Yeah, absolutely everyone across the globe. Because again, some of the tools are just engineering toolsets. You just download it and use it. So they are counted as one of those customers. Pingdom, Papertrail, Loggly, all of those have huge numbers of customers. And then the Orion base is big itself as well.
Harshil: Yeah. So tell me a little bit about the complexity that you might have, considering you have several tens of products, right? I believe you mentioned. And considering the different types of customers that might be using them, some open source, some enterprise, how do you think about how to allocate security resources or how do you prioritize with the limited resources that you might have available?
Tim: Yeah, so there's a different set, right? So our SaaS products, we're hosting them, we're running them. So with those products we have some sort of different responsibilities. For those products, since we're running them ourselves, we can put mitigations in front of things, we can put WAFs in place, we can put other things that are protection of that structure and that environment. So that has its own kind of complexities of what we do. But we're building the products on the back end. So we have common software development lifecycles, we have common models of how we develop code across the company, whether it's on premise or SaaS. So those things help us from a scale perspective. On premise, we don't necessarily control the environment that you install to. So we're a component that in general runs on a Windows operating system, sometimes in other operating systems, but it's a partnership between the customer that's installing and ourselves to make sure that they've installed correctly, that they've installed with the right controls in place, because we don't have control of their environment, right? On the SaaS version, we have control of the environment, we can make sure that that is in the right place. With on prem, we have to make sure to help the customers so that they deploy correctly and appropriately.
Harshil: Right. So it's definitely different sets of responsibilities and ownership. Do you have a separate team, I guess, or separate set of resources to take care of the SaaS structure since it needs different layers of security?
Tim: Yeah, as you would imagine, right? We have an SRE team that manages the running of our SaaS product. So there's a common model for all of the products from an engineering perspective, from how we record. If you have a bug, you put it into Jira, right? Any bugs that are security related get a security tag, that security item gets a CVSS score, and my team monitors any medium, high, or critical issue, making sure they get worked on appropriately. So whether that issue came from an external bug bounty, whether it came from an internal person, whether it came from one of the tools that we run, it follows through that same kind of process. We have a RAFT process for things that we're going to accept risk on. From the product development side, it's very common for what we do from an engineering practice and policy and kind of model. What is different is that the SRE team is really the one responsible for running the SaaS service itself. Now for them, it is different from what we do there, how we kind of manage that solution, right? So that's really where it separates, right? We have to do other things like SOC 2 for our SaaS products, and you know, external pen testing because they're outside. We do external pen testing for internal, but the external ones can hit the outside directly. So those types of things happen.
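The triage flow Tim describes here (tag the bug as security, score it with CVSS, have the security team watch anything medium and above, whatever its source) can be sketched roughly as follows. This is only an illustration, not SolarWinds' tooling: the `Issue` record and `needs_security_review` helper are hypothetical names, and the severity bands are the standard CVSS v3.x qualitative ratings.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


@dataclass
class Issue:
    """Hypothetical tracker record; field names are assumptions."""
    key: str
    source: str          # e.g. "bug-bounty", "internal", "scanner"
    cvss: float
    security: bool = True  # stands in for the "security" tag


def needs_security_review(issues, watched=("medium", "high", "critical")):
    """Return security-tagged issues whose severity the security team monitors.

    The source of the report is deliberately ignored: every issue follows
    the same process regardless of where it came from.
    """
    return [i for i in issues if i.security and cvss_severity(i.cvss) in watched]
```

A low-severity or untagged issue simply drops out of the watched list; everything else stays on the security team's radar until it is worked on or formally risk-accepted.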
Harshil: Yeah, that makes sense. I think one of the things that caught my attention in what you were saying a little bit earlier is that a well managed environment is a secure environment. And a lot of the topics that we just talked about are related to engineering processes, right? So SRE has a certain process or engineering has a certain process. What's your perspective on the dependence of security maturity on engineering maturity, engineering process maturity, or development process maturity in general? What's your perspective?
Tim: Yeah, I think shift left is a little bit of an overused term right now. But in reality, when we want to move security closer to where things are getting defined, when we want to make sure there's a level of maturity across the board, shifting left is important. When we look at requirements, are those requirements defined appropriately? Are the actions that we're doing appropriately managed? When we're doing peer reviews, are we just reviewing that the code is good, or are we reviewing whether the code has introduced security issues, right? When we do those types of things just as part of our processes, we're always looking to strive to really not just fix something, but really improve as we go along. So I think the maturity there is important. And one of the things that we see is that the maturity of IT organizations, some of those best lessons of an IT org need to move over into DevOps as well, and Sec DevOps. So they're not necessarily completely separate, even though some folks believe that they are separate. “Oh, those IT guys have been doing stuff for 20 years, have been doing it all wrong”. That's not true. There's a lot of maturity in the processes of how you manage a firewall. And what I've been looking at and working with is to take those great practices from managing a firewall and your normal IT functions and moving them over into SRE. Not slowing stuff down, but just having those appropriate practices that you should have. And a lot of smaller companies just don't grow up with that level of maturity, so they don't necessarily have that knowledge to start with.
Harshil: Right. Yeah, and I think it becomes a little bit challenging, especially in a modern DevOps type of environment where it's high frequency, high concurrency of things going on in parallel. So while the practices might make sense, it's just hard to implement them because of the scale of the automation and things like that around that, right? It's a different set of challenges.
Tim: Sometimes it's an excuse.
Harshil: Haha, tell me more about that.
Tim: Sometimes it's laziness, right? Sometimes it's “I’d just rather not get a peer review because it slows me down. I can't do that, I can't have somebody look at my code”, right? Or “I have to be a privileged account all the time”, right? Like, really? You're acting as root? We haven't acted as root for years and years and years. What do you mean you need to be root all the time? So with things like that, sometimes it's simply because when you were a company of two people, you were doing those things. But guess what, there's maturity that can be learned, right? And that's one of the things that we've instituted across the board, is that same level of maturity that we have in IT. And it doesn't slow things down to have a secure model, right? And SOC 2 helps you get to some of those mature models. So as long as you're not gaming those systems to get SOC 2, why not have an admin account and a separate root account? Why do you have to be root all the time? Don't be silly, right? It's the right practice to do, right? Change control, make sure you have appropriate change control. Make sure it's audited, make sure that those changes and the approvals are going into a record that you can go back to and understand. You know, rollback for changes, making sure that you can do those. Those are just normal, common best practices that should be done for all organizations, whether it's running a SaaS service, or whether it's an on premise product, or whether it's your IT department. So I guess don't lose the good processes and procedures that your IT org has. You know, don't forget about those, right? They still apply, and they're not that hard to do all the time.
Harshil: Yeah. They just might have to be modified to fit in a different world. That's it, right?
Harshil: Change review is a pull request review now, right? It’s the same thing.
Tim: That's true. So just change them up a little bit, but still the models still are appropriate.
Harshil: Right. So switching gears a little bit, tell me more a little bit about what was the breach in 2020 like for you and your team, specifically?
Tim: Yeah, so December 12, Saturday, 2020 is when it kind of started. We got informed by FireEye Mandiant that we had shipped ___ code in the Orion platform. So we started up an investigation really fast, learned quite a bit really quickly, because we didn't need to investigate whether it was true, because we essentially were presented with the decompiled source that said, “Hey, this isn't your code”. And what we were able to tell was that three builds were affected. Essentially, March to June builds that we did. We knew it wasn't in our source code control system. That's why it got termed a supply chain issue, right? And the broader supply chain issue really came out a little bit later, but like, on day one, it was, yeah, this is a supply chain issue. It's not in our source code, it's somewhere in our build system, we don't know exactly how it got there yet, on day one. But we knew that, we knew which builds were affected. To fast forward the whole thing, it was attributed to the Russian SVR, a very specific targeted attack against SolarWinds. Very quiet, very stealthy. They compromised Office 365 email first, did reconnaissance through there, then came back, compromised an account, and were able to ship a no-op set of code into the product that was built in October. So not infected, but it basically had a no-op so that they could see that it worked. Then they went away, came back in February for March through June builds and put code in. So very, very quiet, very well thought out, very mission centric attack. Attacking the build system in a transient virtual machine as part of that build system. Just a very smart move to not get detected. So inside of us, they were stealthy. And then the code that they dropped was, again, stealthy in what it did. It didn't start for 14 days. It blacklisted things like SolarWinds, so it wouldn't run in any of our domains, any of our test domains, anything like that. It wouldn't run in a number of domains, and it attempted to shut off antivirus.
So the attack was all about, how do I not get detected? How can I get out there and not get detected? Now, what we originally said was that 18,000 customers were affected. Now, later on, what we discovered was under 100 went to a Stage 2 that actually tried to talk to the command and control server and successfully talked to the command and control server. So the belief now is that it was targeting a few major entities, mostly government entities were the end target, and we were really a route to that target. Everything around the code suggests that it wasn't meant to do harm. It was meant to simply provide administrative access to the Windows server that was running Orion. And from there they could move laterally or attempt to move laterally within the environment. But it had to be connected to the internet, which is not something you normally do with an Orion box. So normally you control what you have from a connection perspective, so therefore, a lot of customers were not affected. But it did really affect everybody, right? It ruined our Christmas, ruined many people's Christmas across the board, and really had everybody investigating, “Hey, were you affected or not?”. So the first few weeks were kind of pure hell, 17 to 18 hour days at least, more than that. I'd say more than that. Probably more like 20 hour days. And simply because you had so many things that were happening. So believe it or not, countries call you. So yes, countries can call. And we had countries, we had all the major governments of the world calling, we had special initiatives that were going on in each one of those regions. Now people were worried about Operation Warp Speed. I think it was Warp Speed that was the development of the Covid vaccine, for any of those companies involved. CISA was a great partner throughout this, they really helped us. Their mission is really to amplify the truth. We brought CrowdStrike in as a threat hunt partner.
Great threat hunt partner, not really developers, so we brought in KPMG's forensic team to help us on the development side. We really focused on understanding what the environment looked like, really getting to root cause analysis. About five months of that type of work went on. We shut down development of new features for about six months and developed a new build system, essentially a system that double checked, making sure source code matched what we produced. That was step one in January. Then we moved everything to AWS. So we didn't move it but really recreated the environment in AWS. Then we made everything ephemeral in that environment so there is nothing that is not in code, and then implemented a multiple build pipeline model. So we don't just build once, we build multiple times, and then with C# we're able to get deterministic builds, so I can actually compare the results of each one of those builds. No one person has access to all the build pipelines. So the comparison at the end says my engineering build matches my development build, my development build matches my production or staging build, and we compare them and say, “Okay, we don't ship until they compare”. So for anybody trying to get around or change our build today, it’s very, very difficult. You need collusion among multiple people to do it. So those types of things really got developed in those six months. So a lot of it was just being exemplary in the security space and really showing people how to do that. The build system we open sourced to just help others.
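The "build multiple times and compare" gate Tim describes can be illustrated with a small sketch. This is not the build system SolarWinds open sourced; it is a minimal example that assumes each independent pipeline produces a byte-identical artifact when the build is deterministic, and the names `artifact_digest` and `release_gate` are hypothetical.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def release_gate(artifacts: dict[str, bytes]) -> tuple[bool, dict[str, str]]:
    """Compare the outputs of independently operated build pipelines.

    `artifacts` maps a pipeline name (an assumption, e.g. "engineering",
    "staging") to the bytes it produced from the same source revision.
    Returns (ok, digests); ok is True only if every pipeline produced an
    identical artifact, i.e. "we don't ship until they compare".
    """
    if len(artifacts) < 2:
        raise ValueError("need at least two independent pipelines to compare")
    digests = {name: artifact_digest(blob) for name, blob in artifacts.items()}
    ok = len(set(digests.values())) == 1
    return ok, digests
```

The security value comes less from the hashing than from the separation of duties: if no single person can touch every pipeline, tampering with one build shows up as a mismatch at the gate.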
Harshil: Oh, that's interesting. Yeah, but overall it's such a fascinating story to learn from in several different ways, right? Especially the amount of information sharing that you all have done, which is really helpful to the broader community for sure. And as we were talking about earlier, before the recording, this incident has done positive things to the industry because it brought light to this challenge of software supply chain security, which has been around for a while, but now it's just much more focused in this interconnected world especially when dependencies and third party software is being used so frequently all over the place.
Tim: Yeah, and I think we focused on making sure we did right by our customers. That was our first priority. Then the second one was we saw a great collaboration between the research community, a lot of good information there, and public private partnerships really came to a good place, right? So CISA sharing things with the FBI, sharing things with others to try to just help the community in general, and then supply chain conversations. As I said, we termed it supply chain at first because it wasn't part of our source code system, it was part of our build supply chain. But then quickly our customers started terming it supply chain because we were in the middle of their environment. So the bigger supply chain, whether that is a power system or whether that's a food manufacturer, whether that's somewhere else, we were in the middle of that environment as a component of the system. So that's where supply chain stuff is. There are a lot of conversations around software bills of materials. That's great and important, we're a big fan of having that, but also a big fan of expanding that supply chain to include those systems and understanding what the system in the back end really has in it, and which of your components has access to what, or what data it has access to. So you can really assess the criticality of that component. So a little bit of both.
Harshil: Yeah, I think unfortunately, the current state of software supply chain security is that there is no standard definition of what it actually means, right? We talked to many people and there's a lot of different definitions of software supply chain, and security product vendors are not helping in this scenario.
Tim: Haha, yup.
Harshil: Whether you're doing dependency checking or code integrity and provenance or CI/CD security, everyone calls themselves supply chain security. And I agree with you saying there's multiple definitions of it or the supply chain problem is a broader problem that encompasses SBOMs and the build and CI/CD systems and also code integrity. All those things are components of a supply chain.
Tim: All those things are components of a supply chain. And if you look at what your services that you provide, right? I'm providing power. Okay, so if I'm providing power, what makes up my power grid? What are the components in there? What are the pieces that are there? Okay, now what has access to my data? What has access to my system? What could affect things negatively? Do I have compensating controls for that? Yes or no? All right, now I have that model first, and then at the end the SBOM comes in as far as okay, well, what is the likely scenario that one of these components, be it open source or something else, it's included in this other component that creates a risk, right? And have I mitigated the risk of that component, or is it something I can't really mitigate so I have to take more stringent actions faster?
Tim: I just don't want us to focus on simply the supply chain SBOM problem. That's going to take a lot of work to do and it will sidetrack us.
Harshil: Right. And I think a lot of conversations are getting focused on just generating SBOMs for every application. My suspicion is that that's because the administration recently passed an order requiring vendors to the federal government to provide that. But there's an angle to it in terms of what are you going to do with those SBOMs, right? Or is that even the best thing you can do for securing your dependencies? What's your perspective? I think you had some thoughts on that?
Tim: Yeah, so it's an important sub component, but I've done a number of meetings with both vendors as well as consumers of the technologies. And the consumers say, “Well, what do I do with this data? Who does it? Who takes care of it? I don't have a team to take care of this. It takes me six months to acquire a product to start with. How can I add this on top of it?”. So the infrastructure is not really ready for them to take it on. Now, the value that you have is, say, Log4j, where is it, right? Okay, so I should be able to run a report against all my SBOMs and say, “Oh, Log4j is in these things”. So I might not need to do a call or questionnaire to every one of my vendors and say, “Are you affected by Log4j?”. So there's some minimal benefit to that, but the benefit does not really reduce the risk of the entire system that I'm looking at, right? So I guess I'd like a combination, right? Great SBOMs, but the other part of an SBOM is that you need to have VEX as part of that, being able to say why I'm vulnerable or not vulnerable to certain components. But there's also an environmental factor, as we talked about with SaaS versus on premise. The environmental factors really do matter when you look at exploitability, right? Is something exploitable or not because of the compensating controls that I have? What should I work on first, what should I fix first? So there's absolutely a program around this that needs to be developed. We need to work on implementation. We need to understand what it looks like. I think in this sense, the technology to be able to generate an SBOM, that's simple, right? It's the other half of it, of how you consume it, what you do with it, what actions you take on it, what should I prioritize around the outside? That's what's going to develop over the next few years. If DoD does this in the bill, essentially, so if they go forward, then you'll see other institutions quickly start going towards that way.
It will happen, and I think it's good that it's going to happen. It's just a matter of how do we make it so that it provides the value that we're looking for.
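The "run a report against all my SBOMs" idea from Tim's Log4j example can be sketched in a few lines. The sketch assumes SBOMs that follow the CycloneDX JSON layout (a top-level `components` list whose entries carry `name` and `version`, possibly with nested `components`); the helper names `find_component` and `sbom_report` and the product names in the usage example are hypothetical. It answers only "where is it?", not "am I exploitable?", which is the part VEX data and environmental context would have to fill in.

```python
def find_component(sbom: dict, needle: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs whose name contains `needle`.

    Walks nested components as well as top-level ones. Matching is a
    case-insensitive substring test, which is a simplification.
    """
    hits = []
    stack = list(sbom.get("components", []))
    while stack:
        comp = stack.pop()
        name = comp.get("name", "")
        if needle.lower() in name.lower():
            hits.append((name, comp.get("version", "unknown")))
        stack.extend(comp.get("components", []))
    return sorted(hits)


def sbom_report(sboms: dict[str, dict], needle: str) -> dict[str, list[tuple[str, str]]]:
    """Map each product name to its matching components, omitting clean products."""
    report = {}
    for product, sbom in sboms.items():
        hits = find_component(sbom, needle)
        if hits:
            report[product] = hits
    return report


if __name__ == "__main__":
    # Hypothetical vendor SBOMs, already parsed from JSON.
    sboms = {
        "vendor-app-a": {"components": [{"name": "log4j-core", "version": "2.14.1"}]},
        "vendor-app-b": {"components": [{"name": "slf4j-api", "version": "1.7.32"}]},
    }
    print(sbom_report(sboms, "log4j"))
```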
Harshil: Right. I was just talking to one of the CISOs recently, and he was in a board meeting, and his board said, “Hey, we've been hearing about SBOM or supply chain. I don't really know what that means”. And the board said, “Do something about it”, right? So when you have conversations like this, just because the software supply chain is such a big ticket item, it's being discussed everywhere, but there's not enough awareness about it. What is the advice that you would give to security leaders who have to do something about supply chain security or really want to do something by themselves?
Tim: Yeah. So the first half is to understand the supply chain and the bigger picture, right? That's kind of the first step. For the services that I have, if I don't have a list of my mission and business critical services, build back, right? Start with that and then look at what makes up those mission and business critical services. Once you have that inventory, then start building up for that inventory the specific items that either provide access to an environment that has access to your critical data, understand what those are within that environment, then get the SBOM for those, right? Don't just start with the SBOMs for your thousands of apps that you use across your entire environment and put the infrastructure together to get everything. Bring it down to what really matters, and then once you get that done, then you can expand it out. But if you spend all your time getting all the SBOMs for every application that you have, that's all you're going to be able to do. That's all you're going to be able to afford to do. That's where you will stop, as opposed to being able to reduce risk for the service that you run for your mission and business critical services.
Harshil: I like that approach to SBOMs. So what I understand is that generating SBOMs is non-trivial, it could take a lot of time and energy, so you would be better off focusing on understanding what your asset inventory even looks like, understanding what is really important to the business, and then figuring out the SBOMs.
Tim: Then get the SBOMs for them to start with to be able to work with those vendors to understand, all right, so if I don't have compensating controls for an item and it's in my critical system, and guess what? I really do need to understand exactly those things, right? And then encourage your vendors to share more. But just be careful that you're not encouraging your vendors to share more, and then you're just throwing away the data that they provide you as well, because it's not going to help you on the other side.
Tim: Be practical with the approach. And the approach is really, how do I reduce the level of risk I have for my critical services?
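The ordering Tim recommends (start from mission and business critical services, narrow to the components that can reach critical data, and chase SBOMs first where there is no compensating control) can be expressed as a simple filter and sort. Everything in this sketch is an assumption made for illustration: the `Component` fields, the `sbom_request_order` helper, and the idea that these attributes are already recorded in an inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory entry for a third-party component."""
    name: str
    service: str                    # the service this component is part of
    touches_critical_data: bool     # can it reach critical data or systems?
    has_compensating_control: bool  # is its risk already mitigated some other way?


def sbom_request_order(components, critical_services):
    """Return the components to request SBOMs for, highest priority first.

    Only components inside a critical service that can reach critical data
    are in scope. Within that scope, components without a compensating
    control come first; ties are broken alphabetically by name.
    """
    in_scope = [
        c for c in components
        if c.service in critical_services and c.touches_critical_data
    ]
    return sorted(in_scope, key=lambda c: (c.has_compensating_control, c.name))
```

The point of the filter is the one Tim makes: the list that comes out is short enough to act on, and it can be widened later once the critical services are covered.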
Harshil: Yeah. Right. So since we are in the very, very early phases of supply chain security, software supply chain security, what's your view on where it is going, how will this look like in the next few years?
Tim: Yeah, so I think that if the US Government puts its buying power behind the kind of the SBOM area and the expectation of vendors, we will see that vendors start providing information, right? And it's really up to the vendors to give you that kind of level of detail that people need to assess risk. Some of the other components are more details about how development is done by the vendors, how they secure their own infrastructure, how they secure themselves from attack, but I think we'll get more prescriptive and the expectation of what a vendor provides will go up, right? We know we've seen that, we've seen that from many of our large customers that the expectation is going up. Then I think what you'll see is that requirement being shifted over to commercial entities, I think that's going to happen. And I think the tooling around SBOM and supply chain will start getting a little bit better in that it's not just about the SBOM, it's about the overall system. And then helping people understand the risks that they face within that system, either being discovered from a dependency perspective, or at least being able to kind of give you an appropriate map. And then we'll also see some advancements in that kind of automatic learning and saying, “Oh, wait a second, this component of the system is going away from normal. It is higher risk because normally it does this, but it popped up and did something that it doesn't normally do”. But I think we'll see that in the next kind of five years. It's a little more AI/ML in that space that helps us determine, hey, my supply chain is at higher risk because something weird happened, something different happened. And then we can start looking at why it happened. So I'm a proponent of a lot of this, and I think it's just going to take a few years for us to get there. But I think we're kind of on the right track.
Harshil: Definitely. It's a great look into the future and honestly, this challenge of bill of materials has been solved in the other industries. Like for example, the auto industry decades ago, right? This was a challenge, although a different set of challenges, but it has been a challenge. We're just now starting to get to it in the world of software.
Tim: Yeah, and I think we can learn a lot, right? Because those hardware components also have software. They also have firmware usually associated with them. So just because it's a piece of hardware, it also has vulnerabilities, right? It has vulnerabilities in the way that it's built. Maybe not as prevalent, but definitely there. So I think we can learn more from hardware, really from the hardware bill of materials and its models. They just don't change as much, and they're probably not as complex in the number of components as what we see from a piece of software.
Harshil: Yeah, it makes sense. Tim, we’re at time for this and I feel like we could have continued this conversation for at least an hour more. Thank you for coming on the podcast and really sharing these learnings and your insights. I really appreciate your time for this, Tim.
Tim: Absolutely, it was a pleasure. Thank you.