The 21st anniversary of the USS Yorktown incident offers an opportunity to reflect upon computer system defects, human error, process flaws, and the best principles and practices for solution delivery in the IT industry. In this blog and my upcoming book, **Bugs: A Short History of Computer System Failure**, I will chronicle some important system failures of the past and discuss ideas for improving the future of system quality. As information technology becomes increasingly woven into life, the quality of hardware and software impacts our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that we have no choice but to get better at delivering technology solutions because our lives depend on it.
On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia, due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems. The Yorktown had successfully served the US Navy since 1984 without a major incident during multiple combat operations; however, as part of an IT modernization program dubbed Smart Ship, its control systems were modified in 1996 to use a network of PCs running Windows NT 4.0. This essay will examine the details of the software error that stopped the Yorktown and discuss the IT matters that contributed to the system failure.
USS Yorktown at anchor
The USS Yorktown (CG-48) was launched on 17 January 1983 and commissioned on 4 July 1984. A Ticonderoga-class cruiser, the Yorktown was designed to use the American Aegis system, which integrated computers with radar data to track targets and guide weapons. Weighing nearly 10,000 tons and spanning 173 meters in length, the Yorktown supported a variety of armaments including two Mark-26 surface-to-air missile (SAM) launchers, eight RGM-84 Harpoon anti-ship missiles, two Mark-32 torpedo tubes, and four lightweight mounted guns. Propelled by four General Electric gas turbine engines with 80,000 horsepower, the Yorktown could reach speeds greater than 30 knots. The ship could also carry two Sikorsky Seahawk helicopters, the Navy relative of the US Army’s Blackhawk. The Yorktown’s weapons, aerial resources, speed, and crew of 33 officers and more than 340 enlisted personnel made it one of the US Navy’s most versatile surface units, able to support carrier battle formations, amphibious assaults, escort missions, and interdiction assignments.
The Yorktown was built in sections called modules. The modules were joined to form the ship’s hull, after which the deckhouse sections were lifted aboard. During module construction, hundreds of subassemblies were made and equipped with piping, ventilation ducts, and other shipboard hardware. These subassemblies were then joined to make the modules, which were outfitted with larger equipment such as electrical panels and propulsion and power generation machinery. At the Ingalls shipyard in Pascagoula, Mississippi, this modular process was supported by a Computer-Aided Design (CAD) and Manufacturing program; the CAD system directed the operation of digital equipment used to cut steel plates, cut and bend pipes, and form sheet metal assemblies. For launching, the ship was moved several hundred yards across land by a wheel-on-rail transfer system to a floating dry dock. The dock was then moved away from the water’s edge and ballasted down so that the ship could float free; thereafter the ship was moved to an outfitting berth in preparation for the traditional christening ceremony. Upon completion of post-launch outfitting, the ship went through extensive dockside and at-sea testing to ensure that ship and crew were ready to work safely at sea. Litton Industries needed 15 months to manufacture the Yorktown; the ship cost approximately $1 billion to build and $28 million to operate annually.
Bezzavetny (left) collides with Yorktown (right)
In its first deployment during 1985–1986, the Yorktown undertook several successful expeditions, including the interception of the Achille Lauro hijackers, two Black Sea excursions, and three military operations off the Libyan coast. During its second and third deployments, it participated in several US and NATO exercises around Europe and the Mediterranean, including an exercise of the “right of innocent passage” in Soviet territorial waters on 12 February 1988, which triggered a collision with the Soviet frigate Bezzavetny that some observers have called the “last incident of the Cold War”. It also played a key role in Operation Provide Comfort, which aided Kurdish refugees during the US-Iraqi War of 1992. In its fourth and fifth deployments, spanning 1993–1995, it served in counter-narcotic operations in the Caribbean as well as UN-sponsored actions related to the war in Yugoslavia. By this time, the Yorktown and its crew had earned awards for naval gunfire support (1987), electronic warfare excellence (1991), sustained combat readiness (1992), and superior safety (1993).
In October 1995, the US Naval Research Advisory Committee (NRAC) published a paper recommending approaches to reduce manning; the report’s thesis was that culture and tradition, not technology, were the obstacles to lower staffing and lifecycle costs. Thus in December 1995, the US Navy established the Smart Ship Program Office (SSPO) to pursue the goal of reducing staff while maintaining combat readiness through new technology and process changes. So-called “smart ships” would consist of several new systems to automate navigation, monitor equipment sensors, control machinery and fuel, and communicate over both fiber optic and wireless networks. The SSPO chose the Yorktown as its first testbed, and by December 1996, the ship was equipped with the first prototype of the Smart Ship System. The system was designed and built by a subsidiary of Litton Industries; it consisted of a Local Area Network (LAN) of 27 client PCs communicating over fiber optic cable with a server. All the Smart Ship machines ran Microsoft Windows NT 4.0. The system was projected to save $2.8 million per year by reducing manual operations and maintenance costs, shrinking the ship’s staff by 4 officers and about 40 enlisted personnel. In May 1997, the Yorktown, with a reduced crew, successfully completed a five-month deployment serving in Caribbean counter-narcotic operations and performing test exercises alongside the USS George Washington in her carrier battle group. The Navy Man Power Analysis Center (MPAC) and Operational Test and Evaluation Force (OPTEVFOR) groups subsequently reviewed the Yorktown’s crew and ship capabilities and concluded that the ship could meet its operational requirements.
On 21 September 1997, the USS Yorktown was performing training exercises off the coast of Cape Charles, Virginia, when a crew member began troubleshooting a fuel valve that was physically closed but that the Smart Ship’s Standard Machinery Control System (SMCS) reported as open. The technician tried to digitally calibrate and reset the fuel valve by entering a value of 0 for one of the valve’s component properties into the SMCS Remote Database Manager (RDM). The RDM program then attempted to divide by that valve property; a divide-by-zero arithmetic exception was thrown, the program did not catch it, and the RDM crashed. Since other Smart Ship systems depended on RDM availability across the LAN, these other SMCS components, including the ones controlling the motor and propulsion machinery, began to fail in a domino-like sequence until the ship stopped dead in the water. The crew was able to troubleshoot and restart the ship’s systems after two hours and forty-five minutes, and the Yorktown returned to base in Norfolk, Virginia.
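The RDM source code has never been published, so the following Python sketch is purely illustrative and is not a reconstruction of the Navy’s software. It assumes a hypothetical `ValveRecord` with a `travel_span` property used as a divisor, and every name in it is my own invention. It shows the failure mode described above (an operator-entered zero reaching a division unchecked) alongside two common safeguards: rejecting the bad value at the point of entry, and containing the arithmetic fault so that callers receive "unknown" rather than an unhandled exception.

```python
from dataclasses import dataclass
from typing import Optional


class InvalidCalibration(ValueError):
    """Raised when an operator-entered value is rejected at the input boundary."""


@dataclass
class ValveRecord:
    # Hypothetical record; the real RDM schema is not public.
    name: str
    travel_span: float  # illustrative "component property" used as a divisor


def percent_open_unchecked(raw_position: float, valve: ValveRecord) -> float:
    # Mirrors the failure mode: a zero span raises ZeroDivisionError,
    # and nothing in this function catches it.
    return 100.0 * raw_position / valve.travel_span


def set_travel_span(valve: ValveRecord, entered: float) -> None:
    # Safeguard 1: validate at the point of entry so a bad value
    # never reaches the stored record.
    if entered <= 0:
        raise InvalidCalibration(
            f"{valve.name}: travel span must be positive, got {entered}"
        )
    valve.travel_span = entered


def percent_open_or_none(raw_position: float, valve: ValveRecord) -> Optional[float]:
    # Safeguard 2: contain the arithmetic fault and report "unknown"
    # instead of letting the exception propagate to dependent components.
    try:
        return percent_open_unchecked(raw_position, valve)
    except ZeroDivisionError:
        return None
```

With either safeguard in place, a zero entered during calibration results in a rejected input or a single unknown reading, and the programs that depend on the database keep running.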
There are conflicting reports on several aspects of the Yorktown incident that we shall explore now. One controversy is whether the Yorktown returned to base on its own or was towed by another vessel. Anthony DiGiorgio, a civilian contract engineer with 26 years of experience working on naval control systems in the Atlantic Fleet Technical Support Center, initially stated in a critical article he penned for the June 1998 issue of the Naval Institute Proceedings (NIP) journal that the ship had been towed. His account was later disputed by Captain Richard Rushton, the commanding officer of the Yorktown; Rushton stated that the Yorktown had two FFG-7 emergency power units that were activated when the propulsion system failed. Rushton also indicated that similar program crashes had occurred twice since the Smart Ship installation due to incorrect values entered into the RDM; in each case, the ship’s systems were restarted, RDM values were reset, and the ship performed as expected and required. In the same June 1998 NIP article, DiGiorgio elaborated further on the incident and alleged that the ship needed two days of pierside repairs and maintenance. However, in the August 1998 issue of the Government Computer News (GCN) magazine, DiGiorgio retracted his earlier declaration and suggested that GCN reporters had altered his original story; GCN published a statement standing by its journalists and the original narrative. While Rushton’s story does explain what we know, the retraction by DiGiorgio suggests intense organizational and political pressure to suppress the full facts of the story.
The other major point of contention was whether usage of Microsoft Windows NT 4.0 contributed to the failure. Windows NT 4.0 was selected in March 1997 as the standard OS for both networks and PCs as part of the Navy’s Information Technology for the 21st Century initiative (IT-21). Bill Gates even went so far as to nominate the Smart Ship program for the Computerworld Smithsonian Awards. DiGiorgio assigned some blame to Windows NT in his June 1998 article, stating that “using Windows NT… on a warship is similar to hoping that luck will be in our favor.” Ron Redman, deputy IT director of the Aegis Program Executive Office, is quoted in a June 1998 issue of GCN as saying that “UNIX is a better system for control of equipment and machinery, whereas NT is a better system for the transfer of information and data. NT has never been fully refined and there are times when we have had shutdowns that resulted from NT.” However, Rushton again defended the Navy’s choice of operating system, stating that “NT was never the cause of any problem on the ship. The problems were all in programs, databases, and code within the individual pieces of software we were using.” Based on the public record, while there was organizational pressure to use Windows NT, this decision does not appear to have directly contributed to the Yorktown incident, notwithstanding the negative comments from DiGiorgio and Redman.
The US Navy CIO office commissioned an inquiry led by Ron Turner into the USS Yorktown incident. Although the Navy’s investigation report was not made public, several lessons can be learned from this event that are useful for software development professionals.
CAE corrected some of the aforementioned SMCS software issues, and the Yorktown underwent further testing after updates to its SMCS, returning to active service over a year later. On 25 September 1999, the ship departed Pascagoula for a four-month counter-narcotics deployment in the Caribbean, and it served without problems. The Yorktown’s final mission was patrolling the Persian Gulf from February to August of 2004; the ship assisted with protecting Iraqi oil terminals and conducting other maritime security operations. Approaching the end of its 35-year estimated life cycle, the Yorktown was decommissioned and struck on 10 December 2004; since that time, it has been berthed at the Naval Inactive Ships Maintenance Facility in Philadelphia, Pennsylvania, but not yet scrapped.
The SSPO’s ambitions and budget, on the other hand, faced considerable scrutiny after the Yorktown incident. The original contract awarded to Litton Industries was worth $138.6 million, and the program’s scope was installing the Smart Ship System on all 27 CG-47 cruisers. Installing the Smart Ship System on the USS Ticonderoga (CG-47) took 70 weeks and cost the company $30 million. Contract renegotiations took place, and under the amended agreement, Litton would complete work on two additional ships already started, with an option to build four more. Litton engineers were able to install the system on the USS Monterey in 20 weeks, but the costly lessons of complexity and retrofitting new technology onto legacy systems meant that the SSPO’s transformation goals would have to await a new generation of Navy ships.