The 10th anniversary of the Google search incident that incorrectly classified the entire World Wide Web as malware is another opportunity to reflect upon computer system defects, human error, process flaws, organizational mistakes, and the best principles and practices for addressing them in the IT industry. In this blog and my upcoming book, Bugs: A Short History of Computer System Failure, I will chronicle some important system failures of the past and discuss ideas for improving system quality in the future. As information technology becomes increasingly woven into life, the quality of hardware and software impacts our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that we have no choice but to get better at delivering technology solutions because our lives depend on it.
On 31 January 2009, a Google engineer manually updated the search engine’s blacklist of sites classified as malware to include the URL ‘/’; this change meant that every organic Google search result for the entire World Wide Web (WWW or Web) was incorrectly classified as malware. Fortunately, Google’s on-call Site Reliability Engineering (SRE) team quickly identified the problem and fixed it within an hour. Besides affecting organic search results, the error also impacted Google’s email service, Gmail, where users reported genuine messages being routed to spam folders; interestingly, advertised or promoted search results were not affected. This essay explores some of the business and technology factors that contributed to the system defect, the incident’s timely resolution, and the wider implications for the Web, search, and malware classification.
Source: VisualCapitalist.com
According to multiple sources including JumpShot, Netmarketshare.com, and Statista.com, Google has 60–80% of the market share for web search traffic depending on the country. Google is also the default search engine on most smartphones running the Android operating system; according to Gartner Research and Statista.com, Android has held about 85% market share since 2017. If one also accounts for its sister properties such as Google Images, Maps, and YouTube, then Google holds an impressive 90% market share of web, mobile, and in-app searches. There are some potential threats on the horizon to Google’s dominance in Search; they range from Amazon’s Alexa and Echo devices, which are used to search for and buy products, to users spending more time on Facebook, and even some users opting out of data sharing entirely through ad- and cookie-blocking browser plugins. In the end, though, Google handles 3.5 billion searches per day, has more than 1.5 billion unique users, and earns about $32B annually in advertising revenue from search.
Malware is software designed to intentionally cause harm to an individual user, a computing device, or a larger network of nodes by attacking the system’s availability, confidentiality, or integrity. There are different types of malware such as computer viruses, worms, spam, Trojan horses, ransomware, spyware, adware, and others. What began out of curiosity and fun when the Internet was an academic computing environment has now turned into malice and profit because malware means big business and serious trouble for corporations, governments, and individuals across the world. According to various computer security reports from McAfee, Center for Strategic and International Studies (CSIS), IBM, the Ponemon Institute, and Symantec, there are several cybercrime statistics one should be concerned about:
Google Search Warning for Malware
So with great power comes great responsibility. Through the Stopbadware.org initiative, Google has partnered since 2006 with the likes of Consumer Reports, Mozilla, PayPal, Verisign, Verizon, and others to prevent, mitigate, and remediate malware websites. StopBadware receives data from different content and hosting providers, defines criteria for classifying malware sites, maintains a common clearinghouse of URLs blacklisted by community members, aggregates malware statistics, manages the appeal process if a site is blocked by providers, and publishes advisory documents and best practices to reduce the incidence of malware.
Although Google supports StopBadware through data sharing, participates in its working groups, and contributes financially to the organization, Google’s Safe Browsing Initiative and Secure Web APIs are separate services that use Google’s own private blacklist, curated by both man and machine. This list is periodically updated, and on 31 January 2009, a Google engineer accidentally added and committed the “/” URL to the blacklist; Google’s system interpreted this URL as matching all Web URLs.
Twitter was briefly ablaze and abuzz with people reporting the error using the hashtags #googmayharm and #googmayhem. The warning message in Google’s organic search results also linked to Stopbadware.org, and the torrent of users clicking the link overwhelmed that website, in effect a distributed denial of service (DDoS). Users could still copy and paste links into the URL field and visit the sites manually, but the widespread perception on that Saturday morning was that the Web was experiencing a malware catastrophe.
The good news for Google and the Web was that the Google SRE team was on call and actively monitoring and supporting its cloud services. The SRE team was notified of user complaints, identified the root cause, communicated a response to the global community through its blog and Twitter, reverted the blacklist change, and deployed the updated configuration to its services.
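To make the failure mode concrete, here is a minimal sketch in Python of how a single "/" entry can flag every URL. Google's actual matching logic is not public in this account, so the substring rule in is_flagged and the guard in validate_entry are assumptions made purely for illustration, and the example hostnames are made up.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_flagged(url: str, blacklist: Iterable[str]) -> bool:
    """Return True if any blacklist pattern occurs in the URL's host + path.

    Assumed matching rule (not Google's documented behavior): a plain
    substring test against "host/path".
    """
    parts = urlsplit(url)
    # Every URL has at least the root path "/", even when it is omitted.
    target = parts.netloc + (parts.path or "/")
    return any(pattern in target for pattern in blacklist)


def validate_entry(pattern: str) -> None:
    """Hypothetical pre-commit guard: reject patterns that match everything."""
    # A pattern made only of slashes, wildcards, or whitespace has no
    # host or path component left once those characters are removed.
    if not pattern.strip().strip("/*"):
        raise ValueError(f"pattern {pattern!r} would match every URL")


blacklist = ["badsite.example/downloads"]
print(is_flagged("https://news.example.org/story", blacklist))  # False

# The accidental entry: "/" appears in the host + path of every URL.
blacklist.append("/")
print(is_flagged("https://news.example.org/story", blacklist))  # True
```

Under this assumed rule, a check such as validate_entry("/") would raise an error before the entry ever reached the list, while a specific entry such as "badsite.example/downloads" would pass.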
Google’s search services, like much of its cloud platform, are distributed across servers located around the world, so the blacklist configuration update was released in a staggered, rolling fashion. The search errors began appearing between 6:27 and 6:40 AM PST, when the blacklist was initially changed, and began disappearing between 7:10 and 7:25 AM PST, when that change was reverted.
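Taking those reported times at face value, a little arithmetic bounds how long any one server could have shown the warning. The sketch below assumes that every server received the bad change somewhere in the first window and the revert somewhere in the second; the variable names are mine.

```python
from datetime import datetime, timedelta

FMT = "%H:%M"

# Reported windows (AM PST) for the bad change and for its revert.
first_bad = datetime.strptime("06:27", FMT)
last_bad = datetime.strptime("06:40", FMT)
first_fix = datetime.strptime("07:10", FMT)
last_fix = datetime.strptime("07:25", FMT)


def minutes(delta: timedelta) -> int:
    """Convert a timedelta to whole minutes."""
    return int(delta.total_seconds() // 60)


# Last server to break paired with the first to be fixed: lower bound.
shortest_exposure = minutes(first_fix - last_bad)
# First server to break paired with the last to be fixed: upper bound.
longest_exposure = minutes(last_fix - first_bad)

print(shortest_exposure, longest_exposure)  # 30 58
```

The upper bound of 58 minutes is consistent with the error being fixed within an hour.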
While this story is about a negative incident involving Google, there are several positive lessons to be learned for IT professionals.
In subsequent articles, I will discuss specific system incidents involving malware that resulted in security breaches as well as strategies and tactics for preventing and reacting to these events.