Parallel processing in geography

by
Ian Turton

Introduction
This chapter is about parallel processing, or parallel computing; the terms are
used synonymously. It will focus on ways to produce real applications, not
computer science abstractions. It will start out by describing what parallel
computing is and why, as a geographer, you should even care. It will then give a
brief historical overview of supercomputing and the rise of parallel computers. It
will then attempt to set out what parallel computing is good for and what it is not
good for, and then finish up by showing you how you might get started with
parallel computing.

What is parallel computing?
Parallel processing at its simplest is making use of more than one central
processing unit at the same time to allow you to complete a long computational
task more quickly. This should not be confused with so-called multitasking,
where a single processor gives the appearance of working on more than one task
by splitting its time between programs; if both programs are computationally
intensive, it will take more than twice as long for the pair to complete as it
would for either alone. Nor are we concerned here with specialized processors
(e.g. graphics controllers or disk managers) that work in parallel with a
processor.
Some tasks are easy to parallelize. One example often used is building a brick
wall. If it would take one person four days to build, it would probably take a
well-organized team of four bricklayers one day to build it. However, some tasks
are clearly unsuited to parallelization; for example, if it takes a woman nine
months to have a baby, it will still take nine women nine months! But if the task
were to produce nine babies, then getting nine women to help would provide a
considerable speedup. Speedup, a term we will return to later, is often
used to describe how well (or badly) a parallel program is working; it can be
defined as the time taken to run the program on a single processor divided by the
time taken to run it on a larger number of processors (N). The closer the speedup
is to N, the better the program is performing.
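The definition of speedup can be written down as a short calculation. What follows is a minimal sketch in Python, not part of the original chapter; the function names speedup and efficiency are hypothetical, and efficiency (speedup divided by N) is an added convenience for expressing how close the speedup is to N. The figures are taken from the bricklaying and baby examples above.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Time on one processor divided by time on N processors."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, n_processors: int) -> float:
    """Speedup divided by N (an assumed helper measure); 1.0 means speedup equals N."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return speedup(t_single, t_parallel) / n_processors


# Brick wall: 4 days for one bricklayer, 1 day for a team of four.
wall_speedup = speedup(4.0, 1.0)
wall_efficiency = efficiency(4.0, 1.0, 4)

# One baby: 9 months for one woman, still 9 months for nine women.
baby_speedup = speedup(9.0, 9.0)
baby_efficiency = efficiency(9.0, 9.0, 9)

print(wall_speedup, wall_efficiency)
print(baby_speedup, baby_efficiency)
```

The wall gives a speedup equal to N, the ideal case, while the single baby gives a speedup of one however many helpers are added.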
Parallel processing is often felt to be the preserve of large number-crunching
engineering disciplines; however, geography has many large complex problems
that require the use of the largest supercomputers available. It also has many
smaller problems that can still benefit from parallelism on a smaller scale that is
easily available to geographers.

Types of parallel computer
There are several different types of parallel computer, some of which are better
than others for different types of task. Flynn (1972) proposed a classification of
computers based on their use of instructions (the program) and their use of data
(Table 3.1). He divided computers into four possible groups formed by the
intersection of machines that used single streams of data and multiple streams of
data, and machines that used single streams of instructions and multiple streams
of instructions.
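The four groups follow from two questions: does the machine use one instruction stream or many, and one data stream or many? A minimal sketch in Python of that intersection, assuming a hypothetical helper flynn_class that is not taken from Flynn's paper or from this chapter:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn (1972) class name for the given numbers of streams.

    Hypothetical helper for illustration: one stream counts as 'single' (S),
    more than one as 'multiple' (M).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instructions = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instructions}I{data}D"


# Enumerate the four combinations of single and multiple streams.
for n_instr in (1, 2):
    for n_data in (1, 2):
        print(n_instr, n_data, flynn_class(n_instr, n_data))
```

The stream counts 1 and 2 stand for "single" and "multiple"; any count above one falls into the same class.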
A SISD processor works one instruction at a time on a single piece of data;
this is the classical or von Neumann processor. The operations are ordered in time
and are easily traced and understood. Some would argue that the introduction of
pipelining in modern processors introduces an element of temporal parallelism
into the processor. This is, however, not true parallelism and can be ignored for
our purposes as it is not completely within the control of the programmer.
A MISD processor would have to apply multiple instructions to a single piece
of data at the same time. This is clearly of no use in the real world, and the
category is therefore seen by many as a serious failing of Flynn’s classification,
since it classifies non-existent processor types.
The last two classes of processor are of more interest to parallel programmers.
Firstly, SIMD machines have a series of processors that operate in exact
lockstep, each carrying out the same operation on a different piece of data at the
same time. In the 1980s, Thinking Machines produced the massively parallel
Connection Machine with many thousands of processors. These machines were
useful for some limited types of problem but proved to be less than useful for
many types of real problem; Thinking Machines went bankrupt and has
ceased to make hardware.
Secondly, MIMD machines are more generally useful, having many
processors performing different instructions on different pieces of data at the
same time. There is no need for each processor to be exactly in step with
each other processor or even to be carrying out a similar task. This allows the
programmer much greater flexibility in programming the machine to carry out
the task required as opposed to coercing the algorithm to fit the machine.
