sam's terrain: interviews

Managing the patch process

April 23, 2004

A software developer who began his career with Microsoft at its Melbourne operation is now at the nerve centre of one of the company's most important divisions - its security response centre (MSRC).

Iain Mulholland worked in Melbourne for two years from 2000 and then moved to Redmond, where he is now the operations manager of the MSRC.

Mulholland, 32, is the equivalent of a program manager, but his division is very much in the spotlight, given the apparent new emphasis that Microsoft has placed on security.

The MSRC is a core team consisting of a number of program managers, each of whom focuses on a specific technical area - be it Internet Explorer, Windows, Office and so on.

Understandably, while there's a good deal of process involved in producing the many patches that Microsoft puts out on the second Tuesday of each month, Mulholland can speak about only part of it. But he was quite forthcoming during a phone chat.

Once a vulnerability is deemed to be of sufficient importance to be patched, a virtual team gets on the job, Mulholland said. "Let's assume this is a Windows flaw. Some people are in Hyderabad, India, others in Ireland (where most of the localisation is done)."

Work is assigned to developers based on who is responsible for a particular block of code, and the first patch that is developed is tested on the developer's own box. Such a build is referred to as "a private", as it is tested on just the one workstation.

Once this is seen to work, the developer seeks to check in his or her work during the daily team meeting. The fix is next checked into the main source tree of the application in question, a package is built and the testing broadens.

"Since this is a hypothetical Windows flaw, the next round of testing is undertaken by the Windows Sustained Engineering Team - we have teams like this for all our major products," Mulholland said. "They will slot the testing into their next test window. The patch gets deployed more broadly on development domains, but if it doesn't work as expected, then it goes back and is analysed all over again."
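The stages Mulholland describes can be sketched as a simple progression. This is only an illustration: the stage names and the `advance` function are hypothetical, and the sketch assumes that a failed test sends the fix all the way back to the developer's private build, which the interview does not spell out.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical names for the stages described in the interview."""
    PRIVATE = 0                 # tested only on the developer's own box
    CHECKED_IN = 1              # checked into the main source tree
    PACKAGE_BUILT = 2           # a package is built, testing broadens
    SUSTAINED_ENGINEERING = 3   # tested by the sustained engineering team
    READY_FOR_RELEASE = 4       # passed all necessary testing


def advance(stage: Stage, passed: bool) -> Stage:
    """Move a fix to the next stage, or back to the start if testing failed.

    Assumption: a failure at any stage returns the fix to PRIVATE.
    """
    if not passed:
        return Stage.PRIVATE
    # Stay at the final stage once it has been reached.
    return Stage(min(stage + 1, Stage.READY_FOR_RELEASE))
```

For instance, `advance(Stage.SUSTAINED_ENGINEERING, False)` returns `Stage.PRIVATE`, mirroring the remark that a patch which does not work as expected "goes back and is analysed all over again".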

The release cycle for a patch begins about seven to 10 days before the actual release date. "The patch, now that it has passed all necessary testing, is put up on Windows Update - we release patches for all versions of an application at the same time and for all languages as well," Mulholland said. "Bulletins are prepared in international English as it is easier to translate. This goes through two or three sets of editorial teams before they are finalised."

Mulholland defended the amount of time taken to release patches and attributed it to the large number of versions of software, the number of languages involved and the number of applications which had to remain unaffected by the patching.

"We are not going to stick to some timeline of 30 days, or 60 days or 90 days," he said. "What we are bothered about is our customers - they have to be looked after and that's our only concern. If somebody feeds me details of a vulnerability and says that he or she will release an exploit based on it after giving us X number of days to issue a patch, we won't work to that. The patch has to work - only then will it be released."

Mulholland said Microsoft often used third parties to test the patches which the company was getting ready to release, "in order to get a different opinion. Of course, due to commercial confidentiality, I can't name names."

He also defended the issuing of 20 patches rolled into four advisories this month. "Sure, you can say this is a PR stunt. But customers have told us that it is better to get a number of patches for a certain part of the system in one go, rather than in 14 fragments. And like I said, the customers are our number one priority."

More, in Mulholland's own words:

Q: How is the security response team structured? Is it a US-only operation or spread over many countries? If the latter, how is the coordination done, given the time zones?

Security is an issue that spans the globe, so there are dedicated individuals whose sole focus is security response. Of course, the headquarters for the MSRC is in Redmond, but we have first responders on every continent and work diligently to address threats quickly at the moment they arrive. In the event of a threatening security issue, the MSRC works across Microsoft to address the issue and help customers worldwide to be safe.

Q: How many people work on checking vulnerability reports on mailing lists? Do they monitor underground channels? Warez sites?

This is a difficult question to answer as there are so many different people functioning within the MSRC. Across Microsoft, there are security experts who monitor the security community in the ongoing effort to help keep customers ahead of changing threats. Secure@microsoft.com is monitored around the clock, 365 days a year. I cannot comment on operational specifics, but my team has its thumb on the pulse of the security community. When a security issue threatens customers, the MSRC works in concert with several specially focused teams to investigate, fix and learn from security vulnerabilities.

Q: Are they split into teams based on apps? Or is the division of labour done some other way?

The MSRC is like a hub, with spokes leading out to each product group. Within the team, we have individuals who focus on particular technology areas. So, for example, we have people who focus on the core Windows OS, others who look at IE and similar web technologies and so on. This allows us to develop a really strong understanding of the types of security vulnerabilities that can affect these products and technologies and (it) also means we foster really deep relationships with the engineering teams who are responsible for servicing the products.

Q: Let's say X submits details of what he/she perceives to be a vulnerability. What happens next?

Let me walk you through the process from a high level. First off, we monitor secure@microsoft.com 24 hours a day, 7 days a week. When the MSRC receives a vulnerability report at this address, the first thing we do is determine whether it's really a security vulnerability. If the mail really does appear to report a potential vulnerability, we perform an initial assessment to try to reproduce the scenario the customer reported. Only about one mail in 10 turns out to be a legitimate flaw in our software.

Once the initial triaging phase is over, things become much more formal. The individual who reported the vulnerability is notified that an investigation has been opened, a tracking number is included, and the individual is invited to be in constant contact with the MSRC regarding the status of the investigation. Next, we send a request to the affected product's development team, asking for their help in investigating the issue further. We then meet regularly with representatives from the product teams to discuss the status of all ongoing investigations. During the formal investigation, the development team works to reproduce the problem, and then scrutinises the product's source code to understand exactly why it happens.
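As a rough sketch of the triage steps described above (not Microsoft's actual tooling), the decision could be modelled like this. The `Report` fields, the `Outcome` names and the sequential tracking numbers are all assumptions made for illustration, as is the handling of a report that cannot be reproduced, which the interview does not cover.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional


class Outcome(Enum):
    NOT_SECURITY = "not a security vulnerability"
    NOT_REPRODUCED = "could not be reproduced"      # assumed outcome
    INVESTIGATION_OPENED = "investigation opened"


@dataclass
class Report:
    reporter: str
    looks_like_vulnerability: bool   # result of the first read of the mail
    reproduced: bool                 # result of the initial assessment
    tracking_number: Optional[int] = None


# Hypothetical numbering scheme: 1, 2, 3, ...
_tracking_numbers = count(1)


def triage(report: Report) -> Outcome:
    """Apply the two checks in order, then open a tracked investigation."""
    if not report.looks_like_vulnerability:
        return Outcome.NOT_SECURITY
    if not report.reproduced:
        return Outcome.NOT_REPRODUCED
    # The reporter is notified and given a tracking number at this point.
    report.tracking_number = next(_tracking_numbers)
    return Outcome.INVESTIGATION_OPENED
```

Only reports that pass both checks receive a tracking number, which matches the description of the process becoming "much more formal" once the initial triage is over.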

Only about one percent of all emails sent to the MSRC identify a bona fide security vulnerability, but in those cases, it's vital that we get a quality fix to our customers as quickly as possible. You usually see these in the form of security updates, issued (on) the second Tuesday of the month, but they also appear in service packs. When an update is built, it must go through an extensive testing process to make sure it completely fixes the vulnerability without breaking any applications. The most important thing in this process is to make sure that we issue a quality update for our customers.
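The "second Tuesday of the month" schedule is easy to compute. The following is a minimal sketch using Python's standard `datetime` module; the function name `second_tuesday` is made up for this example.

```python
import datetime


def second_tuesday(year: int, month: int) -> datetime.date:
    """Return the date of the second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    # The second Tuesday falls one week after the first.
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)
```

For April 2004, the month this interview was published, `second_tuesday(2004, 4)` gives 13 April 2004.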

Q: Does X have to make any kind of commitment when he/she submits the bug?

Security researchers who report vulnerabilities to the MSRC have no formal commitment to Microsoft. They are kept in constant communication with the team working to identify and build a fix for the vulnerability, and are often even invited to be a part of the investigation. Most researchers understand that security is an industry-wide issue and truly desire to keep computer users safe. To that end, they are a vital part of the MSRC and usually work with us to help protect customers.

Q: What happens when one of the team notices a 0-day exploit posted to any of the vulnerability mailing lists?

We have an extensive 24x7 emergency response process that can be called into action to address a security threat. In the event of a zero-day attack, we would come together to examine the issue and determine appropriate action. Since software is so complex, having an update immediately would be unlikely, but we would determine what customers could do to protect themselves from attack. This would likely be in the form of a workaround. Another crucial part of our process is the communications aspect. We work hard to ensure that customers have clear and timely information, providing guidance to help keep customers safe from threats. A great example of this emergency process at work is the recent MyDoom worm, and how we came together to help mitigate the worm's effect on customers.

Q: Let's take the case of the eEye vulnerabilities - some of them have gone unpatched for as long as 215 days. How does the security team ascertain whether or not anybody else has happened upon the same flaws? After all, Marc Maiffret (the chief hacking officer of eEye Digital Security) isn't the only hacker with that level of expertise.

Security response is a practical matter. The MSRC works to follow the course of action that is of greatest use to customers and the least value to malicious users. Recent incidents, from Code Red to Blaster and even Witty, have taught us that the critical delta in security response is between the posting of a bulletin and application of the associated update.

While we do track for signs that private issues are being used maliciously as part of our security response cycle, which also generally includes the creation of an emergency skeleton build of a fix, our customers have asked that we publicise a security issue only after we're confident that we've developed and tested a quality fix. As you might expect, higher quality equates to faster uptake and reduced customer exposure.

Similarly, this is one reason we moved to the monthly release cycle. Customers asked for a more predictable update schedule so that they could allocate resources and apply updates in a more timely manner. As a result of improvement to our security response process and ongoing customer education, our April bulletin cycle drove record downloads of the updates in the first 24 hours - nearly double anything we've seen in the past.

Q: What philosophy does Microsoft have with regard to disclosure?

Microsoft, like many other organisations within the IT industry, continues to encourage responsible disclosure of vulnerabilities to minimise risk to computer users. We believe the commonly accepted practice of reporting vulnerabilities directly to a vendor serves everyone's best interests by ensuring that customers receive comprehensive, high-quality patches for security vulnerabilities with no exposure to malicious attackers while the patch is being developed.

We believe that this method of disclosure is the only way that end users are not put at undue risk. There has been much discussion on this topic, and I would suggest that you look into the Organization for Internet Safety's guidelines for responsible disclosure to see what the industry is saying.