I.B.M. Plans Supercomputer That Works at Speed of Life
By STEVE LOHR
Two years ago, when an IBM supercomputer known
as Deep Blue beat the world chess champion, Garry
Kasparov, it seemed a confirmation
of the Computer
Age, a triumph of machine
over man. On Monday, IBM
is announcing a five-year,
$100-million program to
build a supercomputer whose
ambitions dwarf Deep
Blue's.
The goal of the new supercomputer,
to be called Blue
Gene, is to simulate one
of the most common routines in
natural biology -- the process
by which amino acids
intricately fold themselves
into full-fledged proteins,
the body's molecular work
force whose chores range
from metabolizing food to
fighting disease.
Protein folding may be
routine, but it is a routine of
enormous complexity. To
simulate the process is
beyond the reach of
contemporary computing. To
meet the challenge, Blue
Gene is being built to run
500 times faster than the
world's fastest
supercomputer today, by
using an innovative design
that seeks to sharply accelerate
the already torrid rate
at which the speed of computers
improves.
If successful, the IBM project
would not only be a
breakthrough in cutting-edge
computing, but would also
help supply fundamental
insights into the basic physics
and chemistry of biology,
opening the door to a new
understanding of diseases
and more effective drugs,
perhaps ones individually
tailored to a person's genetic
makeup.
The publicly financed human
genome project, an
international research initiative
begun in 1990 with the
goal of deciphering all
of the human genetic code by
2005, is already generating
a huge quantity of
biological data for computation.
IBM's Blue Gene
program is, in a sense,
an effort to take the next step --
feeding the genetic data
into a supercomputer to try to
understand basic biological
processes.
Computer scientists and biologists
who have been
briefed in advance of Monday's
announcement are
impressed by the ambition
and promise of the IBM
effort.
"The combination of all the
ideas that IBM is putting
together to make a supercomputer
on this scale is really
exciting," said Ken Kennedy,
a computer scientist at
Rice University who is co-chairman
of the President's
Information Technology Advisory
Committee.
"And there will be a lot
more benefit to society from
this project than there
was from having Deep Blue beat
Kasparov in chess," he added.
Biologists and medical experts
say the broadest impact
of the IBM research, at
least during the next few years,
will likely come from its
contribution to improving the
field of computer simulations
of molecular biology in
general, rather than the
"grand challenge" of protein
folding itself. It is expected
to take five years before
Blue Gene is ready to begin
the marathon simulations of
protein folding.
For more than a decade, biological
researchers have
used computer simulations
to study the activity of
proteins in the body --
how drugs bind to proteins, for
example, or how cell membranes
absorb some
substances while screening
out others.
Being able to run faster,
longer and better simulations
of more modest molecular
mysteries than protein
folding could have big health-care
payoffs in
understanding ailments like
heart disease and high
blood pressure.
"The promise of what IBM
is doing is far beyond the
one machine," said Bernard
Brooks, a principal
investigator at the National
Institutes of Health and a
leading expert in computer
simulations of molecular
biology. "The really important
work that can be done
with this technology is
in smaller-scale simulations
rather than the demonstration
project of protein
folding."
For IBM, Blue Gene is a research
program of its
renowned Watson Labs. But
it is the expected
trickle-down of research
knowledge into commercial
uses that justifies the
company's $100 million
investment in Blue Gene.
Since mid-1998, IBM has jumped
from third in
supercomputer installations
worldwide, after the Cray
division of Silicon Graphics
and Sun Microsystems, to
the top spot.
In that time, IBM has nearly
doubled its share of the
500 most powerful machines,
from 75 to 141 last
month, according to "The
Top 500 Supercomputing
Sites," a list compiled
by three academic
supercomputer experts. At
the same time, the number of
installations for Cray fell
sharply while Sun
Microsystems held steady.
"There is no doubt in our
mind that a lot of that
improvement is because of
what we learned with Deep
Blue," said Paul Horn, senior
vice president of
research. "The payoff can
be enormous."
Several IBM supercomputers
are already at work on
the human genome project
worldwide, including one
that is host to one of the
project's central databases in
Toronto.
The announcement Monday,
just as the project is getting
under way, is also clearly
an image-burnishing step by
IBM, intended to emphasize
its commitment to
supercomputing and to research.
Blue Gene, experts
agree, is a multidisciplinary
endeavor requiring not
only computer hardware,
software and manufacturing
expertise but also mathematicians,
biologists, chemists
and physicists.
In addition, the Blue Gene project
should serve as a kind of recruiting
tool for IBM research -- and
perhaps serve as a venture that
could lift the stature of
computer-science research in
general. Such a lift, according to
Kennedy of Rice University, is
badly needed. Computer talent, to be
sure, has perhaps never been in such
great demand as it is today.
Yet the excitement of
Internet start-ups and the
lure of stock options, Kennedy
notes, have meant that computer-science
students
increasingly shun graduate
studies and advanced
research.
"A few projects like this
could re-establish research
institutions -- academic
or corporate -- as centers of
excitement in computing,"
Kennedy said. "It's going to
bring some of those minds
back."
The frontier of computational
biology is certainly a
field that can stir excitement
in the research community
as well as hold out the
promise of being a huge industry
someday. In the last few
years, IBM has built a
30-person team of researchers
in computational
biology.
IBM hopes its supercomputer
project will stimulate the
field. "We want to attract
significant interest and
involvement from university
researchers and from the
scientific community in
general," Dr. Sharon Nunes, a
senior research manager,
said. "If we can influence this
fundamental research, it
will happen faster."
The computing innovation
behind Blue Gene, in
essence, is to build a computer
that works much as
nature works -- a triumph,
if it succeeds, of marrying
simplicity and complexity.
The computer scientists at
IBM plan to sharply simplify
the RISC (reduced instruction-set
computing)
architecture used in the
chips that run engineering
workstations and supercomputers
today. The "instruction
set" -- the total vocabulary
of machine-language
instructions a computer
understands -- will number 57
for Blue Gene, compared
with about 200 for most RISC
machines.
Then, instead of putting
a single microprocessor on a
chip, Blue Gene will have
32 microprocessors -- the
calculating engines of computers
-- on each chip.
Sixty-four such chips will
be inserted on each
motherboard, with eight
motherboards in each of the 64
computing towers of Blue
Gene.
When completed, Blue Gene
will stand about six feet
high, occupying a floor
space of 40 feet by 40 feet at
the Watson labs in Yorktown
Heights, N.Y. It will have
a total of about 1 million
microprocessors.
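The "about 1 million" figure follows from multiplying out the layout described above. A minimal sketch of that arithmetic, using only the counts quoted in the article (the constant and function names are illustrative, not IBM's):

```python
# Counts quoted in the article; the names are illustrative assumptions.
PROCESSORS_PER_CHIP = 32
CHIPS_PER_BOARD = 64
BOARDS_PER_TOWER = 8
TOWERS = 64


def total_processors():
    """Multiply out the hierarchy: processors -> chips -> boards -> towers."""
    return PROCESSORS_PER_CHIP * CHIPS_PER_BOARD * BOARDS_PER_TOWER * TOWERS


if __name__ == "__main__":
    print(total_processors())  # 1048576, which is 2**20
```

The product is 1,048,576, slightly more than a million, which is consistent with the rounded figure.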
Among the innovations computer
scientists find most
impressive about Blue Gene
is that IBM will place
memory for storing data
on the same chip as the
microprocessor. In conventional
computer designs, the
memory for storage is separate
from the processor.
Shuttling data from the memory
to the processor is a
major bottleneck in computers,
slowing them down.
Only within the last year
or so, because of advances in
chip making and miniaturization,
has it become
possible to consider putting
memory and processing on
the same chip in the way
that IBM is developing.
To attain the speeds Blue
Gene seeks within five years,
IBM must try a new architecture
of computing. The
conventional wisdom holds
that microprocessor speeds
can theoretically double
every 18 months, a
phenomenon known as Moore's
Law, for Gordon
Moore, the chip pioneer
who first observed it. With
Moore's Law, it would take
about 15 years to achieve
the speed target for Blue
Gene.
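The size of that wait can be checked with a rough calculation. The sketch below assumes speed doubles exactly every 18 months and that the target is the 500-fold gain cited elsewhere in the article; the function name is hypothetical.

```python
import math


def years_to_speedup(speedup, doubling_months=18):
    """Years needed to reach a given speedup if speed doubles every
    `doubling_months` months (a simplifying assumption)."""
    doublings = math.log2(speedup)
    return doublings * doubling_months / 12


if __name__ == "__main__":
    print(round(years_to_speedup(500), 1))  # 13.4
```

Under these assumptions the answer is roughly 13 to 14 years, in the same range as the article's "about 15 years."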
"There's no way you get to
where IBM is heading
unless you change today's
computing architecture," said
Arvind, a computer scientist
at the Massachusetts
Institute of Technology,
who uses only a single name.
"It looks as if they have
an outstanding engineering
plan. If they can execute
it properly, it will be a real
breakthrough."
Blue Gene's speed target
is a petaflop -- that is, a
thousand trillion floating-point
operations, or
calculations, each second.
Such a speed would make
the machine 500 times faster
than the two fastest
supercomputers in operation
today -- an IBM
supercomputer at the Lawrence
Livermore National
Laboratory, and an Intel
machine at the Los Alamos lab.
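Taken together, the two figures imply the speed of today's fastest machines. A minimal sketch of the division, assuming a petaflop is exactly 10^15 operations per second (the names are illustrative):

```python
PETAFLOP = 10 ** 15   # floating-point operations per second
SPEEDUP = 500         # Blue Gene's target relative to today's fastest machines


def implied_current_speed():
    """Speed of today's fastest machines implied by the article's figures."""
    return PETAFLOP / SPEEDUP


if __name__ == "__main__":
    print(implied_current_speed())  # 2000000000000.0, about 2 teraflops
```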
To translate Blue Gene's
speed into a personal
computer scale: If a fast
PC were represented as an inch
tall, the IBM machine would
be 20 miles high.
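The analogy can be turned back into numbers. The sketch below assumes the comparison is strictly proportional and that Blue Gene runs at one petaflop; the speed it derives for a "fast PC" is an inference from the analogy, not a figure from IBM.

```python
INCHES_PER_MILE = 63360   # 5,280 feet x 12 inches
PETAFLOP = 10 ** 15       # floating-point operations per second


def height_ratio(miles=20):
    """How many one-inch PCs stack up to the height given in miles."""
    return miles * INCHES_PER_MILE


def implied_pc_flops(miles=20):
    """PC speed implied if height is proportional to speed (an assumption)."""
    return PETAFLOP / height_ratio(miles)


if __name__ == "__main__":
    print(height_ratio())             # 1267200
    print(round(implied_pc_flops()))  # roughly 789 million operations a second
```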
The hardware design of Blue
Gene is innovative
indeed, but the real challenge,
as is so often the case in
computing, will be the software.
For in simplifying the
hardware design for speed,
the complexity of protein
folding is left to the software.
And the software, among
other things, has to be
"self-healing" so that the
simulation does not grind
to a halt if a few processors
break down. The software
must recognize the flawed
processors and re-route
the data.
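IBM has not said how the re-routing will work, so the following is only a toy illustration of the general idea: work assigned to a processor found to be flawed is handed to the survivors. Every name here is hypothetical, and a real machine would face far harder problems, such as detecting failures and recovering partial results.

```python
def assign(work_units, processors):
    """Spread work units round-robin across the given processors."""
    plan = {p: [] for p in processors}
    for i, unit in enumerate(work_units):
        plan[processors[i % len(processors)]].append(unit)
    return plan


def reroute(plan, failed):
    """Return a new plan with work moved off the failed processors."""
    survivors = {p: list(units) for p, units in plan.items() if p not in failed}
    if not survivors:
        raise RuntimeError("no healthy processors left")
    orphaned = [u for p, units in plan.items() if p in failed for u in units]
    for unit in orphaned:
        # Give each orphaned unit to the survivor with the least work so far.
        target = min(survivors, key=lambda p: len(survivors[p]))
        survivors[target].append(unit)
    return survivors


if __name__ == "__main__":
    plan = assign(list(range(8)), ["p0", "p1", "p2", "p3"])
    print(reroute(plan, {"p1"}))
```

In this sketch the simulation continues with three processors instead of four, and no unit of work is lost.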
"We have some idea how we're
going to do this," said
Marc Snir, a senior researcher
at Watson. "But I would
be lying if I said we have
solved this. We do have
research to do."
If all the computer wizardry
works as planned, it will
still take Blue Gene about
a year to simulate on the
computer the folding of
a single protein. How long does
it take the body to fold
one? Less than a second.
"It is absolutely amazing
the complexity of the problem
and the simplicity with
which the body does it every
day," Ajay Royyuru, a researcher
in IBM's
computational biology center,
noted.