Friday, December 21, 2007

Who needs hackers? Customs computers failed and passengers were kept on pl

Who needs hackers?
Published: September 12, 2007

NOTHING was moving. International travelers flying into Los Angeles
International Airport — more than 17,000 of them — were stuck on
planes for hours one day in mid-August after computers for the United
States Customs and Border Protection agency went down and stayed down
for nine hours.

[Image] John Schwartz on System Breakdowns (mp3)
Dan Steinberg/Associated Press


Relatives and friends waited for travelers at the Los Angeles airport
in August after U.S. Customs computers failed and passengers were kept
on planes.

Hackers? Nope. Though it was the kind of chaos that malevolent
computer intruders always seem to be creating in the movies, the
problem was traced to a malfunctioning network card on a desktop
computer. The flawed card slowed the network and set off a domino
effect as failures rippled through the customs network at the airport,
officials said.

Everybody knows hackers are the biggest threat to computer networks,
except that it ain't necessarily so.

Yes, hackers are still out there, and not just teenagers: malicious
insiders, political activists, mobsters and even government agents all
routinely test public and private computer networks and occasionally
disrupt services. But experts say that some of the most serious, even
potentially devastating, problems with networks arise from sources
with no malevolent component.

Whether it's the Los Angeles customs fiasco or the unpredictable
network cascade that brought the global Skype telephone service down
for two days in August, problems arising from flawed systems,
increasingly complex networks and even technology headaches from
corporate mergers can make computer systems less reliable. Meanwhile,
society as a whole is growing ever more dependent on computers and
computer networks, as automated controls become the norm for air
traffic, pipelines, dams, the electrical grid and more.

"We don't need hackers to break the systems because they're falling
apart by themselves," said Peter G. Neumann, an expert in computing
risks and principal scientist at SRI International, a research
institute in Menlo Park, Calif.

Steven M. Bellovin, a professor of computer science at Columbia
University, said: "Most of the problems we have day to day have
nothing to do with malice. Things break. Complex systems break in
complex ways."

When the electrical grid went out in the summer of 2003 throughout the
Eastern United States and Canada, "it wasn't any one thing, it was a
cascading set of things," Mr. Bellovin noted.

That is why Andreas M. Antonopoulos, a founding partner at Nemertes
Research, a technology research company in Mokena, Ill., says, "The
threat is complexity itself."

Change is the fuel of business, but it also introduces complexity, Mr.
Antonopoulos said, whether by bringing together incompatible computer
networks or simply by growing beyond the network's ability to keep up.

"We have gone from fairly simple computing architectures to massively
distributed, massively interconnected and interdependent networks," he
said, adding that as a result, flaws have become increasingly hard to
predict or spot. Simpler systems could be understood and their
behavior characterized, he said, but greater complexity brings
unintended consequences.

"On the scale we do it, it's more like forecasting weather," he said.

Kenneth M. Ritchhart, the chief information officer for the customs
and border agency, agreed that complexity was at the heart of the
problem at the Los Angeles airport. "As we move from stovepipes to
interdependent systems," he said, "it becomes increasingly difficult
to identify and correct problems."

At first, the agency thought the source of the trouble was routers,
not a network card. "Many times the problems you see that you try
to correct are not the root causes of the problem," he said.

And even though his department takes the threat of hacking and
malicious cyberintruders seriously, he said, "I've got a list of 16
things that I try to address in terms of outages — only one of them is
cyber- or malicious attacks." Others include national power failures,
data corruption and physical attacks on facilities.

In the case of Skype, the company — which says it has more than 220
million users, with millions online at any time — was deluged on Aug.
16 with login attempts by computers that had restarted after
downloading a security update for Microsoft's Windows operating
system. A company employee, Villu Arak, posted a note online that
blamed a "massive restart of our users' computers across the globe
within a very short time frame" for the 48-hour failure, saying it had
overtaxed the network. Though the company has software to "self-heal"
in such situations, "this event revealed a previously unseen software
bug" in the program that allocates computing resources.

As computer networks are cobbled together, said Matt Moynahan, the
chief executive of Veracode, a security company, "the Law of the
Weakest Link always seems to prevail." Whatever flaw or weakness
allows a problem to occur compromises the entire system, just as one
weak section of a levee can inundate an entire community, he said.
Dan Steinberg/Associated Press

Some of the thousands of passengers delayed in Los Angeles by a
customs computer failure lined up for screening.

This is not a new problem, of course. The first flight of the space
shuttle in 1981 was delayed minutes before launching because of a
previously undetected software problem.

The "bug heard round the world," as a former NASA software engineer,
John B. Garman, put it in a technical paper, came down to a failure
that would emerge only if a certain sequence of events occurred — and
even then only once in 64 times. He wrote: "It is complexity of design
and process that got us (and Murphy's Law!). Complexity in the sense
that we, the 'software industry,' are still naïve and forge into large
systems such as this with too little computer, budget, schedule and
definition of the software code."

In another example, the precursor to the Internet known as the Arpanet
collapsed for four hours in 1980 after years of smooth functioning.
According to Dr. Neumann of SRI, the collapse "resulted from an
unforeseen interaction among three different causes" that included
what he called "an overly lazy garbage collection algorithm" that
allowed the errors to accumulate and overwhelm the fledgling network.

Where are the weaknesses most likely to have grave consequences? Every
expert has a suggestion.

Aviel D. Rubin, a professor of computer science at Johns Hopkins
University, said that glitches could be an enormous problem in
high-tech voting machines. "Maybe we have focused too much on hackers
and not on the possibility of something going wrong," he said.
"Sometimes the worst problems happen by accident."

Dr. Rubin, who is director of the Center for Correct, Usable,
Reliable, Auditable and Transparent Elections, a group financed by the
National Science Foundation to study voting issues, noted that
glitches had already shown up in many elections using the new
generation of voting machines sold to states in the wake of the
Florida election crisis in 2000, when the fate of the national
election came down to issues like hanging chads on punch-card ballots.

Dr. Bellovin at Columbia said he also worried about what might happen
with the massively complex antimissile systems that the government is
developing. "It's a system you can't really test until the real thing
happens," he said.

There are better ways.

Making systems strong enough to recover quickly from the inevitable
glitches and problems can keep disruption to a minimum. The customs
service came under some of the most heated criticism for not having a
backup plan that could quickly compensate for the network flameout;
eventually, airport officials had to provide fuel to the planes so
that the airlines could run the air-conditioning, and provided food,
beverages and diapers to the trapped passengers.

Mr. Ritchhart said it was unfair to characterize his department as
having no backup plan. In fact, there were two — but neither addressed
the problem. The main backup plan envisions a shutdown of the national
customs network, and allows local networks to function independently.
Since it was the local network that was in trouble at Los Angeles, he
said, that backup plan did not work.

The other fallback involves setting up customs agents with laptops
that are equipped to scan the millions of names on the watchlists and
to perform other functions. That system was put in place, he said, but
the laptops operate at one-third the speed of the computer network,
and the delays persisted. The agency is reviewing its policies to
improve its response, he said, and is considering having agents call
colleagues in other cities to perform searches on functioning parts
of the network if a similar slowdown occurs.

The best answer, Dr. Neumann says, is to build computers that are
secure and stable from the start. A system with fewer flaws also
deters hackers, he said. "If you design the thing right in the first
place, you can make it reliable, secure, fault tolerant and human
safe," he said. "The technology is there to do this right if anybody
wanted to take the effort."

He was part of an effort that began in the 1960s to develop a
rock-solid network-operating system known as Multics, but those
efforts gave way to more commercially successful systems. Multics'
creators were so farsighted, Dr. Neumann recalled, that they even
anticipated and prevented the "Year 2000" problem that had to be
corrected in other computers. That flaw, known as Y2K, caused some
machines to malfunction if they detected dates after Jan. 1, 2000.
Billions of dollars were spent to prevent problems.

Dr. Neumann, who has been preaching network stability since the 1960s,
said, "The message never got through." Pressures to ship software and
hardware quickly and to keep costs at a minimum, he said, have worked
against more secure and robust systems.

"We throw this together, shrink wrap it and throw it out there," he
said. "There's no incentive to do it right, and that's pitiful."
