EP 9 — Mrityunjay Gautam: How Databricks Approaches Product Security
Databricks is responsible for massive amounts of data for more than 7,000 customers worldwide, including more than 40% of the Fortune 500. This means security is mission critical and the stakes are incredibly high. To keep their customers' data secure, Databricks has put major focus into building both their product security team and strategy. In January, their team had just two members; today there are 11, with many additional roles ready to be filled.
To learn more about how Databricks approaches product security, Harshil speaks with the person leading the company's efforts: Mrityunjay Gautam, Databricks' Global Head of Product Security.
Topics discussed in the episode:
- The difference between application security and product security.
- The skill matrix Mrityunjay uses to assess the skill sets of people joining his product security team.
- His recommendations on training programs and valuable resources for those starting their career in product security.
- The three most common challenges in product security and how they can be overcome.
- Understanding the difference between product threat models and deployment threat models.
- How Databricks thinks about threat modeling given their incredibly complex environment.
- How Databricks built a highly engaged group of security champions.
- Strategies Databricks uses to cut down time spent on product security processes and workflows.
Technical books: https://nostarch.com/
Mrityunjay: Thanks for having me here. Thank you so much.
Harshil: All right, so before we get started, why don't you tell us a little bit about yourself, your background, and what you do.
Mrityunjay: Alright. So currently, as you just introduced, I am running the Product Security functions for Databricks, where we are trying to build a proper secure development lifecycle for the entire Databricks platform and anything allied to it. So our target is to make sure that anything we ship out there is not easily hackable. That's our mission statement. Because it's an unfair thing to claim that it's not hackable. That's a fake claim. Nobody makes that, right? So our target is that it should not be easy to hack, and if there's ever a security situation or incident, then the impact should be minimal because of the defense in depth that we try to build into the product. So that's, at a high level, what we do at Databricks. And prior to this, I was with Citrix Systems, where I worked for eleven years. I helped build the entire security engineering function, which included Product Security, which was a defensive organization, the offensive security, red teaming, the vulnerability response program, PSIRT, their entire logging infrastructure, logging standards, and everything across the product. So I did all of those things for Citrix Systems before this. And that's roughly the extent of my leadership experience, because eleven years at Citrix was a long time.
Harshil: Oh, wow. That is really a long time. That's awesome. So coming from Citrix Systems, which is a large company, a lot of different businesses, a lot of different product lines, global presence, coming into Databricks, how did it feel? Was it different?
Mrityunjay: Oh, it was very different, because Citrix has its own upsides. As you correctly said, they had their hands in literally everything, right? From cloud to networking, virtualization, mobile technologies and everything, which was very different. But the primary presence for them was on-prem, right? The on-prem business. Coming to Databricks, everything is in the cloud. Even our offices don't have data centers or anything. It's just like a cafe: you can come in, sit down, and there's an internet connection. So it's a very different view, because the entire lab, all the work, is completely in the cloud. And that changes the perspective of the hacker. What does a breach really mean, from a hacker's perspective, if I have to breach into something like Citrix versus something like Databricks? It's a very different perspective. The challenges are very different.
Harshil: Interesting. And for those of the audience who are not aware of what Databricks does, do you mind giving a really high level overview? What does Databricks do?
Mrityunjay: So Databricks, essentially as a company, we are providing a platform for our customers where you can bring in your data and build AI-based models, machine learning models, and work with them. But what we solve for you is the scalability, the security, and the speed at which you can work on the AI problems. The biggest problem with any machine learning project is that training on the data takes an eternity, right? And if you have to deal with gigabytes and petabytes of data, now you are going out of range. Being built on our infrastructure, it's a super scalable, highly fast system. So that's what Databricks does for our customers. We're essentially helping our customers solve the AI problem. Let's say it that way.
Harshil: That's phenomenal. I'm guessing with that comes the challenge of a lot of data, managing the security of a lot of data from your customers. So you don't have a small responsibility. It's a big challenge I'm guessing.
Mrityunjay: It is, right? Because every time we're onboarding a customer, the customer brings in their data. But at the same time we are effectively onboarding their enemies as well, right? Anybody who wants to breach it and get the data, because that's another avenue the data can be stolen from. So that makes the job very, very critical. We just cannot afford a breach, because that would break the confidence of the customer.
Harshil: That's actually a very interesting perspective that when you onboard a customer, you also onboard the attackers, the potentially bad actors. Because if somebody is trying to go after one of your customers, you hold all the data, they might go after you to get that data. That's interesting.
Now, you mentioned briefly Citrix, your earlier employer, they have a lot of on premise things, and I'm guessing the software development deployment, and hence the security lifecycle would be a little bit different on-prem versus a complete cloud platform like Databricks.
Mrityunjay: It is.
Harshil: What were the key challenges or key differences, I guess that you've seen in an organization that ships on-prem products versus an organization that is cloud native application?
Mrityunjay: So it starts right from the assumption on the base infrastructure. When you use any product from any company, there are two views, right? One view is that of a customer, where you are trying to… like, for example, when you see Databricks, you essentially see one web interface, which is like a notebook-style interface. You can go and create your code. But that's not what the product is when you see it internally. Internally, we are talking about an entire array of servers, a lot of pods, containers and everything, trying to make sure that there is strict multi-tenant segregation and a bunch of things, right? So the complexity of the problem is very different when you see it from the inside view of engineering versus what you get as a customer. That differs quite a bit, because the moment you are talking about a product that is completely in the cloud, you're effectively not just the engineering team building the product but also the IT team managing the product deployment, and the customer only gets the use case. The moment you are dealing with an on-prem product, the IT is still with the customer, right? They have to deploy the product, they have to manage the complexity of IT and everything. So because of that shift of who is going to deploy and manage the product, the security problem changes. Because the moment we ship the product out, I may have written good code, but if it is misconfigured and incorrectly deployed, it is the customer's problem from a liability perspective, and there's only so much we can do about it. At Databricks it's different, because even if we have written amazing code, if we misconfigure something and there's a hack, it is still our liability. So the scope suddenly expands quite a bit, because we are not just talking about writing safe code; it's more than that. Much more than that.
Harshil: Right. Yeah, that's a good point. I mean the layers of the stack that you provide as a product that's ready for the customer to use but there are still some layers that are quote unquote service, right? Because in a way it's your responsibility to manage and maintain and operationalize and secure it on an ongoing basis. So your responsibility, the scope increases for sure. What about the pace of delivery? Because if you're shipping on-prem products I'm guessing there are scheduled updates, releases of software, so you have a little bit more time as compared to a cloud application where I'm guessing potentially faster releases, more frequent releases?
Mrityunjay: So that's where actually the biggest difference comes in for security strategy as well. For example, when you are dealing with a waterfall style of development, you have a good amount of time to work with design reviews and threat models, so those changes can come into the code, you can review it, it goes out. Fine, it's doable. You also know that the scheduled release is at the end of the quarter, right? So you already know it's the end of the quarter, no big deal. Alright, now you get into the cloud world. Now you are literally publishing changes to production maybe 200 times a week. You cannot block the velocity of the development team by saying that only once we clear your design reviews and you have fixed the bugs can you release, because that's not a speed you can sustain if you do it manually. So it's going to be very automation-focused, where a lot of low-hanging problems can be identified and resolved by automation. Things can be auto-blocked, because you cannot expect a human to go and do a health check or a release check 200 times a week. It's just not possible. So that equation becomes very different. And therefore the strategy in which we implement our security, and even the basic SDLC as Microsoft traditionally defined it 15 years back, that same model looks very different when you do it for complete cloud products.
Harshil: Yeah, that's a brilliant way of looking at it, which is the strategy fundamentally differs in terms of you know, in the waterfall model, it's more assessments and communicating the findings as compared to in a more rapid agile model, it's more preventative things that controls are just in place to enable shift left or build securely from the beginning in the first place. I guess that's the existential question. How do you actually implement those things? Do you have a team of people who are building these automations and things? How does that work?
Mrityunjay: So the strategy that we usually follow is… So there's a base strategy, right? Which is the SDLC strategy, which we understand, right? Everybody in the industry understands that. The idea is: how much of this can be automated and shifted left in a way that helps me as a developer when I'm writing something? I'll give you a simple example. This is a super cool project that we are working on. This project is essentially supposed to auto-generate risks; basically it auto-generates threat models, or let's say it does automated risk assessment for Terraform scripts, right? So rather than somebody going in manually, understanding what the environment looks like and what could go wrong, we are trying to extract all that human intelligence and put it into code which can now be offered as a service. So that, let's say, whenever our infrastructure team wants to publish a new change into their cloud, they could just take their TF file, feed it into our service, and it will auto-generate that assessment: "Okay, here are the problems that you will face", right? And by doing that, we just cut down on weeks of back and forth where somebody has to go deploy that thing, look at the state and analyze everything. All of that work is cut down to a matter of a few minutes.
Harshil: So maybe this is a very naive question, but help me understand how that is different from the Terraform scanning tools that already exist. There's TF scan, there are a bunch of commercial products that will look at your Terraform scripts and identify problems with them. How is it different?
Mrityunjay: That's the first step. That's your first step. The problem happens… So this is a little more enhanced than that, because we are talking about the individual components of each section. Like if you are bringing in a specific component, let's say a Key Vault, right? Or let's say a Cosmos DB instance. Each component has its own risks, its own threats, and based on how something is configured to connect, a subsection of that threat set gets included in your final threat model. So it's not as simple as doing a simple scan and figuring it out. We are trying to build a library of threats associated with each cloud component, and then based on what the Terraform state looks like, we are extracting out what the final threat model looks like. And that gives you a risk rating of, "Okay, maybe we need to mitigate this one, we can live with these, but this can go".
Harshil: That's awesome. And is there also an ability to connect what you deploy using Terraform with what is the actual service that's running? Because that would live as code or repo somewhere else, right? Or I mean, within the same source control system. But how do you know what is actually being deployed here as a service?
Mrityunjay: So we have maintained the TF state file for the entire infrastructure, and that's what we essentially use as the input. So we are trying to assess not the TF config, but actually the TF state, so that we have an actual picture of what the deployment really looks like.
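The state-driven approach described above can be sketched as a lookup from deployed resource types into a per-component threat library. This is a minimal illustration only; the threat entries, resource names, and function are assumptions for the sketch, not Databricks' actual catalog or service.

```python
# Hypothetical per-component threat library; these entries are illustrative
# assumptions, not Databricks' actual threat catalog.
THREAT_LIBRARY = {
    "aws_s3_bucket": [
        ("Public read access", "high"),
        ("Missing server-side encryption", "medium"),
    ],
    "aws_security_group": [
        ("Ingress open to 0.0.0.0/0", "high"),
    ],
    "azurerm_cosmosdb_account": [
        ("Firewall allows all networks", "high"),
    ],
}

def assess_state(state: dict) -> list:
    """Walk a parsed Terraform state document and collect the threats
    associated with each deployed resource type."""
    findings = []
    for resource in state.get("resources", []):
        rtype = resource.get("type")
        for threat, severity in THREAT_LIBRARY.get(rtype, []):
            findings.append({
                "resource": f'{rtype}.{resource.get("name")}',
                "threat": threat,
                "severity": severity,
            })
    return findings

# Example: a minimal state with a single bucket resource.
state = {"resources": [{"type": "aws_s3_bucket", "name": "logs"}]}
for finding in assess_state(state):
    print(finding)
```

Assessing the state rather than the config, as he notes, means the input reflects what is actually deployed, including values only resolved at apply time.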
Harshil: Yeah. Are you guys planning to open source this anytime soon?
Mrityunjay: Well, as soon as our testing is done, we might actually. We need to get it cleared with legal, but I don't see a good reason why the community should not be able to use that.
Harshil: Right. Yeah, this will be a very interesting open source project if you're ever able to open source it. Phenomenal. So I’ve heard the use of the term product security in your title and you talked about it as well a little bit. Help me understand, how do you think product security is different than traditionally what we used to call application security? Or is it different?
Mrityunjay: It's a little bit different. Application security is a subset of product security, I would say. Because if you think from an application security perspective, most of the time what people understand by application security is work at the application layer, Layer 7, and everything around it. Product security goes beyond that, because we are dealing with multiple components. We are talking about the entire product as a whole. So we are looking at not just the front-facing interface, but going deeper down, looking at the entire infrastructure, and then assessing it in two different ways, right? One is the assessment from the architecture perspective: how this component interacts with everything else, beyond the front-facing interface. And the second thing is the deployment itself, which is what I was talking about now that we are dealing with the cloud story, right? So between what was traditionally called cloud security, the whole infrastructure security, and application security, you combine all of that together and that's our view of product security.
Harshil: Right, so that's interesting. Does that also mean that you have to either hire a different skill set or train your team with a little bit of a different skill set going towards more cloud configurations, infrastructure containers, all of that as well, which is standard for core AppSec people?
Mrityunjay: You're absolutely right. So the way I've been doing this is that we have built a nice skill matrix: for this team to be successful, what are the skills that we need in the team, and how many people fit in which blocks, right? So we have a good matrix which says, "Okay, with our existing team, we have skills in these X areas out of the total Y areas we need", right? So as we are hiring, not every hire is going to have the same skill set. There are people on my team who are experts on containers and virtualization and the internal systems, right? There are other guys who are complete web hackers, who have the skills to compromise anything; you throw them at anything and they will just break it. Super cool guys. So very different skill sets, but essentially we are trying to fill that matrix in a way that the team in itself is super effective as a product security team, where I would rather have people who are masters of one rather than jacks of all trades.
Harshil: That's awesome. Do you have any suggestions or recommendations on training programs or like, if somebody is just, you know, early in their career and they want to model themselves as a really good, strong product security person, what are the resources they can access? Any suggestions on things they can do?
Mrityunjay: There are a lot of good resources. Again, security, or let's say product security is not a small area, right? I can be blunt about it and I can say that there is nobody on the planet, I can guarantee, who can say that they know everything on all the domains of security. There's literally no one. So first of all, you need to choose your area of expertise that you want to build. Sometimes you want to build a niche, right? Like somebody who feels that maybe they're good at cryptography and they want to build their skills on crypto attacks. That's one area, right? So that's a very different set that they want to study. Somebody who wants to build in web security, for example, they could go through a different set of ideas. There are folks…let's say hacking into the cloud infrastructure is another thing, right? So that's a very different kind of skill set. From a certification perspective today, I think the most popular certification which people like to see is OSCP. So if you have OSCP, it definitely gives us a good confidence that yes, you are somebody who has hands-on experience in actually doing that. And it's not just super theoretical that you have read through a certain page that you're talking about it.
Mrityunjay: So I think that's important. And there are some very nice books on Amazon specifically from No Starch Press on web hacking and everything, which is pretty good, actually.
Harshil: Yeah. So let me ask you this. You mentioned certificates and reading books and things like that. When you're hiring for talent, I'm sure you're looking at hundreds of different resumes. Are there particular things that you look for in a resume, at a very high level? When you're looking at high volume, you're trying to build a strong technical team. What do you look for?
Mrityunjay: So usually when I'm looking at resumes, there are a few things which are red flags; if I see too much of that on a resume, I would probably not talk to the person. So when you look at the resume, if you see a lot of tools listed out there, like Nessus, and Burp, and WebInspect, and Fortify, and that is the highlight of the resume, right on the first page, that "I am good at these 70 different tools", then I know, "Okay, this person does not really know security". All they know is running tools, looking at reports, and sending them out. Because when you're working in product security, you're not really dealing with issues which are easy to discover or which are well known. You're dealing with literally finding zero-days every day in the product, because nobody knows about them. You're doing that research, right? So you have to know that the person has experience beyond that, beyond running tools. The other thing I like to see, and again, OSCP was definitely one big positive, the other big positive for me when I'm looking at resumes is: has this person ever talked or presented at a security conference? It's not about whether the topic was relevant, but about that research mindset, right? Whether you are willing to not stop when you face a problem, but dig deeper, see what new things there are to do, and break through. Some of those things definitely stand out as positive. And it helps me talk to the right people.
Harshil: Yeah. So you mentioned that research mindset. So I'm guessing like, if somebody has a history of Bug Bounty record, that might be interesting as well, as above and beyond recognition.
Mrityunjay: Correct. So a lot of times, specifically Bug Bounty, there are so many people who are playing Bug Bounty and people write that on their resume, which is great, actually. But I usually like to go into the CV and read through what they have done and the complexity of their problem.
Harshil: That's amazing. You actually take the time to go read that.
Mrityunjay: Yes, I will do that. I will check if they're on HackerOne, I'll go to their hacktivity and check out their profile and see what kind of issues they have really submitted, right? And what the complexity has really been, and if there's a common pattern I'm seeing, like "Okay, maybe this guy only does this, that's the only thing I have seen", then I know that they're limited to certain areas.
Harshil: Yeah. I found myself looking into people's public GitHub repos and just trying to see what they work on, what they release, what they're contributing to, and things like that. For me, that's also a good indicator.
Mrityunjay: I do agree with you. And I have even seen people writing some blogs which they build, and sometimes those blogs and the quality of content says a lot about how much they really analyze the problem.
Harshil: Right. Yeah, so switching gears a little bit. I mean, we talked about a lot of different topics within product security, right? You mentioned, obviously, the traditional web AppSec, there is the cloud security stuff, there is crypto, and some of the deeper things around architecture as well. Now, since the scope of AppSec has expanded to this broader category of product security, when you come into a new organization, like you did six or seven months ago, how do you think about the strategy? Because it's very easy to be very tactical and just keep operating on a day-to-day basis, because there's just so much to do. But how do you take a step back and think about strategy from a product security perspective?
Mrityunjay: There are two things that I did, I think. The first was to not make an assumption about whether the company has a good strategy or not. Let's not make that assumption; actually go through it. So I started with a few things: look at the kinds of defects getting filed, run through all the defects, look at all the open problems that are known in the system, right? That's the homework that you've got to do, to understand the level of depth at which they have already been analyzed. Secondly, there were a lot of interviews that I had to do with various leaders and individual contributors, specifically in engineering, because that also gives a good picture. When you're in product security, you're dealing with engineering as your customer, right? And you've got to keep your customer happy. If you cannot deliver that value, it's not going to work. So from an engineering perspective, it was very important to get that feedback: what is not working with ProdSec? Or, for example, what is it that you would like to see done better? Right. And that view is very important. So it actually took me almost three weeks to run through 20 to 25 different interviews with different people to understand where things are, do the bug assessment, do all the background assessment. Look at the quality of the threat models which are written, right? What is missing, what is not. And it definitely helps to have that skill yourself, because then I can take a look and say, "Okay, fine, I can see that you are sticking to a standard set of problems on spoofing and tampering, but you are actually missing an actual design problem here", right? Those kinds of inputs. And then you see the difference: okay, some people are doing beautiful work, they've done excellent work even if they have not been in security, versus some people are messy.
So you can kind of get that picture. You do a gap analysis based on that, and then you start mapping it against a timeline: "Okay, this is realistically what I can do". That's one. And secondly, there has to be a quantitative measure by which you can be honest about your own assessment, right? So the model that I used was the OWASP SAMM model. This is the maturity model which checks your product security maturity end to end, right? So we did an assessment on that, and again, it does not just cover product security, but also offensive security, incident response, cloud infrastructure, everything, right from requirements up to response. So we go through the whole thing, we do the assessment, and we say, "Okay, I have a score of X out of three, three being the highest score possible. From X, how do I reach three, and what is realistic?" And we've got to understand that the growth is always going to be nonlinear. It's not going to be a straight line where, "Hey, I can reach from X to X plus one in one month, and X plus one to X plus two in another month". It doesn't work that way. So be realistic about it, have a fair expectation. It took me around a month to get all those things sorted. And after that, the winning factor was the support from leadership.
Harshil: That’s always important. So I guess one of the questions that I used to spend a lot of time just trying to figure out myself and never found the right answer is when you look at these maturity models, whether it's Open SAMM or BSIMM or what have you, there are so many different domains, and agreed, all of them are important in one way or the other. But when you do a maturity assessment, I'm sure there's going to be a lot of gaps in any company in a lot of different areas.
Harshil: So it's also not reasonable to expect that we will reach the highest maturity level across all of those domains.
Mrityunjay: Oh, I never said we will. Haha.
Harshil: Exactly. So you had to pick and choose. Now the question is, how do you pick and choose which domains to focus on first and which ones come next?
Mrityunjay: Well, it's all about the ROI, right? Where is my investment going to give me the maximum return at the end of the day, which is what the shift-left model is, if you think about it. There are some things where we definitely have to build maturity first. For example, the vulnerability response program in itself is super critical, because your customer is thinking about it, or some hacker has reported an issue to you. Even though it's the rightmost section, there has to be a certain amount of process and maturity there. So that definitely has to be done. But once that reactive model is set up, you don't have to immediately bring as much maturity to your last bit of pre-release pentesting as you need to bring to the threat model. Because the reality is that fixing a problem at the threat model level is always going to be much less expensive. So it's all about shifting left at the end of the day. There are 15 different sections in the OWASP SAMM model, and if I look at them as a bar chart, I'll start raising the rightmost bar, the response, first, bring it as high as possible, and then immediately shift to my leftmost bars and start bringing them up from there. So it's going to be a curve which is left-heavy. And that's the general strategy.
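The prioritization he describes (stand up response first, then invest left-heavy) can be sketched as a weighted gap calculation over maturity scores. The practice names, scores, and weights below are illustrative assumptions for the sketch, not an official OWASP SAMM scoring tool.

```python
# Illustrative SAMM-style maturity scores on a 0-3 scale; the practice
# names and values are assumptions, not a real assessment.
scores = {
    "threat_modeling": 0.5,
    "secure_build": 1.0,
    "security_testing": 1.5,
    "incident_response": 0.5,
}

# Shift-left weighting: the reactive floor (response) is weighted highest
# so it is raised first, then investment moves to the leftmost practices.
priority_weight = {
    "incident_response": 3.0,
    "threat_modeling": 2.0,
    "secure_build": 1.5,
    "security_testing": 1.0,
}

def prioritize(scores: dict, weights: dict, target: float = 3.0) -> list:
    """Rank practices by weighted gap to the target maturity level."""
    gaps = {p: (target - s) * weights[p] for p, s in scores.items()}
    return sorted(gaps, key=gaps.get, reverse=True)

print(prioritize(scores, priority_weight))
# Response comes out first, then threat modeling: a left-heavy curve.
```

The weights encode his argument that a defect prevented at threat-model time is far cheaper than one caught at pre-release pentesting.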
Harshil: Yeah, that makes sense. And I'm guessing as you change your areas of investments, you probably will need different types of skills within the team, different types of functions, different types of processes. So the team's responsibilities, charter, objectives also will keep changing as you go through that maturity improvement.
Mrityunjay: I would say so. Yes, you're absolutely right.
Harshil: Fantastic. In terms of just the broader industry, do you have any insight on what are the common things that you've seen across, let's just call it product security? Any common challenges within the industry? If you want to share a few thoughts on that?
Mrityunjay: I think there are certain kinds of challenges which we are seeing across the industry. One of the most common things we are seeing is distributed tracing. So, like, everybody has models for logging. And when you're writing code, everybody's writing logs, which is great, actually. But the problem happens when you're dealing with a microservices architecture, right? So many microservices interacting with each other in a multi-tenant environment, spread over multiple clouds, including customers' data centers. Now you're dealing with a super complex environment. Imagine a hacker who just entered the system from some random endpoint somewhere and then jumped between services, and jumped around, right? If you have to realistically find the trace of exactly when that happened, and not just the entry point but exactly where they went, it's super hard, because each of these services is being developed by a different group, right? Every team is building their own, and therefore everybody follows their own model of what they want to log and what they don't. The cross-service jump tends to get missed. And that's the problem of distributed tracing. It's the one common problem which I have seen happening at so many companies. So that's something which I think the industry as a whole is maturing towards, including us. We are maturing as well. But it's one problem. Let's think about it. What else? The other problem which I have seen very commonly is the view on threat modeling. Most of the time people see threat modeling as just a product threat model, which is the STRIDE model Microsoft defined a long time back. The role is bigger than that, specifically in the cloud space. So I like to see threat modeling as two problems to be solved, not one.
So there's a product threat model, which is what the traditional threat model is, and then there's a deployment threat model, which is something we are completely missing. This is what I was talking about earlier, right? So threat modeling is actually a split problem, but most of the people who are looking at threat models are only solving one of them, not the second one, which is another issue we are seeing. And the third issue I usually see is around penetration testing. A lot of times when you are getting penetration testing done, even with a third-party vendor or whatever, it's all about the ROI. So most of the time people are just looking at the OWASP Top 10: "Okay, can I find this vulnerability?" But what happens is that you miss everything which is a business logic vulnerability, because you didn't go that deep. So the depth of pentesting is something which is missing. I think those are the three things that come to my mind immediately.
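The distributed-tracing gap described above is commonly closed by propagating a correlation ID with every cross-service call, so each hop logs under the same trace. A minimal sketch, assuming hypothetical header and service names (this is the general pattern, not Databricks' logging standard):

```python
import uuid

# Hypothetical header name used to carry the trace ID between services.
TRACE_HEADER = "X-Trace-Id"

def handle_request(headers: dict, service: str, log: list) -> dict:
    """Reuse the incoming trace ID, or mint one at the edge, and log the
    hop under it so cross-service jumps can be reconstructed later."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    log.append({"trace_id": trace_id, "service": service})
    # Return the headers a downstream call would carry.
    return {TRACE_HEADER: trace_id}

log = []
edge = handle_request({}, "gateway", log)       # entry point mints the ID
handle_request(edge, "auth-service", log)       # downstream hops reuse it
handle_request(edge, "cluster-manager", log)

# All three hops share one trace_id and can be stitched back together,
# which is exactly the cross-service jump that per-team logging misses.
assert len({entry["trace_id"] for entry in log}) == 1
```

In practice this is what standards like W3C Trace Context formalize; the point is that the ID must cross team and service boundaries, which is where ad hoc per-team logging breaks down.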
Harshil: That's awesome. Yeah, that's a really good summary of key challenges, and I 100% agree those are unsolved problems in the industry. On threat modeling, you brought up a good point, which is that it's not just the product threat model, but also the deployment threat model. Have you seen anybody do similar work? Are any other security teams working on this kind of consolidation or expansion of threat models?
Mrityunjay: I am not personally aware of that. I'm more than sure Microsoft, Google, these guys would be invested in it. But I have done this for Citrix in the past, and we worked on it for years there as well. I'm more than sure the problem is not an unknown problem, and people will realize it sooner or later.
Harshil: Awesome. How do you guys do threat models? I mean, if you have a fast moving engineering environment, how do you actually practically do it? And do you do it for everything or do you do it for selective things?
Mrityunjay: So, we have two kinds of design review processes. One is what we like to call a light threat model, which doesn't really require a formal drafting of everything. That will typically happen for minor feature changes, etc. And the second is a formal threat model, for when there's any major change which touches a security component. So those are two different streams, obviously. The first one, which is a simple design review, requires much less investment and is typically done over one or two calls. The other one requires maybe a week's worth of work. Now, the problem happens on scalability, and for that we have a brilliant champions program. Honestly, I've never seen it work as well in other companies as it is working here. We have more than 40 champions from engineering who are really involved, and they're really good. They work very closely with the ProductSec team; we literally work as partners, we try to build threat models quickly, and our target is to speed up the process as much as possible. So it's not just my team: with the 40 champions effectively acting as an extension of the ProductSec team, that works out very well.
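The two-stream triage described above can be pictured as a small routing rule. This is only a sketch of the idea, not Databricks' actual criteria; the specific triggers below (security boundary, new service, customer data) are assumptions standing in for whatever signals a real program would use.

```python
def review_type(touches_security_boundary: bool,
                new_service: bool,
                handles_customer_data: bool) -> str:
    """Route a design change to a light design review (one or two calls)
    or a formal threat model (roughly a week of work).
    The trigger conditions here are illustrative assumptions."""
    if touches_security_boundary or new_service or handles_customer_data:
        return "formal threat model"
    return "light design review"

# A minor UI tweak stays in the fast lane; a new auth flow does not.
print(review_type(False, False, False))  # light design review
print(review_type(True, False, False))   # formal threat model
```

Codifying the routing rule, however crude, is what lets 40-plus champions apply it consistently without pulling the central team into every decision.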
Harshil: That's awesome. Especially if you can fully leverage those security champions, who obviously are very deep into the code base and the architecture, they are the best people to do the threat modeling work. What do you do with all the artifacts that come out of it? Let's just say a security champion performed a threat model, light or heavy, whatever that is. It may or may not come with some artifacts, or findings, or to-dos, or what have you. Do you maintain visibility and tracking?
Mrityunjay: We do. We put everything into Jira. There is a specific label that we use, and then it is tracked at the engineering leadership level.
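A label-based workflow like this is easy to report on, because one JQL query pulls every open finding for leadership review. The helper below is a hedged sketch: the episode doesn't name the actual label or project, so `product-security` and `ENG` are placeholders, and the query would be run through Jira's standard search REST API.

```python
def findings_jql(label: str, project: str = "ENG") -> str:
    """Build a JQL query for all open threat-model findings carrying the
    tracking label. Label and project key here are hypothetical."""
    return (f'project = {project} AND labels = "{label}" '
            f'AND statusCategory != Done ORDER BY priority DESC')

jql = findings_jql("product-security")
print(jql)
# project = ENG AND labels = "product-security" AND statusCategory != Done
# ORDER BY priority DESC
```

The advantage of filtering on `statusCategory != Done` rather than individual status names is that the query keeps working even as teams customize their board workflows.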
Harshil: So your expectation is for the engineering team to not only identify them through security champions but also fix them to a certain level.
Mrityunjay: So the security champions work with ProductSec; we don't just leave it to engineering. The problem is very weird here, because let's say I am somebody in engineering who is trying to build something. It is hard for me to find the design problems there, because if I could have found them, I would have fixed them at the beginning, right? So relying on engineering alone to do it is an unfair expectation, very honestly. Between the champions and engineering, they bring a lot of good knowledge of the product, the architecture, and the code. We bring in that evil eye, right? "Okay, how do we break things?" And you combine those two and you become a very strong force to analyze the whole thing.
Harshil: Right. And where do you draw that line? Let's say you have some findings out of an architecture review. One extreme is to force them to fix things, but then you're at the same time disincentivizing them from finding things in the first place, or incentivizing them to hide things from you. The other extreme is "do whatever you want, it's up to you," totally leaving it up to them. So what's your perspective? Where do you stand between those two extremes?
Mrityunjay: So my perspective is not about absolute security; it's about risk management. You have to understand that if we are not fixing something completely, is there a mitigation we can apply to lower the severity or the possibility of exploitation? At that stage, we come to a question of what is an acceptable risk when you go out with something. There's no product out in the world with zero vulnerabilities, right? You see the number of patches that Chrome gets these days? It's just crazy. There's no product out there which has no vulnerability. So we have to have a risk management attitude, and that's what we do. At the end of the day, we will clear it by saying, "Okay, fine, from a threat modeling perspective, you have to mitigate this." When the code is written, we will go through a manual code review and see whether the mitigation was actually applied. And if we find anything where we think, "Yeah, this may or may not work," then sure, go ahead and do an exploitation and see if it actually works.
Harshil: Interesting. So not only does the security team get involved in making the decision about whether a risk is acceptable, but somebody also validates whether the fix has been put in place in the code. That sounds like a lot of work.
Mrityunjay: It is. It is a lot of work, but it's very interesting. Let's say that.
Harshil: Fascinating. This is super exciting. You guys are doing a lot of amazing work. Are you hiring for AppSec?
Mrityunjay: We are. We are actively hiring.
Harshil: What types of roles are you hiring for?
Mrityunjay: Well, that's the Product Security role, right? Everything that I just told you, that's the kind of thing we are doing. We are hiring in the US as well as Europe, and we actually grew the team from two people in January, when I started, to eleven people now, which is reasonably fast, frankly, for six months of hiring. Our target is to hire at least four more people by the end of this year. But you never know, we may get more.
Harshil: That's amazing. Well, listeners, if you're looking to join a top-notch security team with security-conscious developers, go to Databricks.com and check out the security openings on their careers page. Mrityunjay, this has got me super excited, but that's all the time we have today. Thank you so much for spending the time here, and I hope to have you back sometime soon. Let's stay in touch.
Mrityunjay: Thanks, Harshil. Thank you so much for having me here. Thank you.