Environment Used

Symmetry Computer

The computer I used was a Sequent Symmetry S27, a shared-memory multiprocessor with all processors attached to the same memory bus. The processors rely on per-processor copy-back caches to reduce bus contention. In my opinion, the caching works well and reduces the contention to almost nothing for four processors.

The system was equipped with four Intel i386 processors running at 20 MHz, 80 MB of main memory and 2 GB of disk space. The bus speed was 10 MHz.

The operating system was Dynix version 3.0.17.9, Sequent's multi-processor port of Berkeley Unix 4.2. Processing is totally symmetric with dynamic load balancing, so each processor can execute any process, including the Dynix kernel. Dynix includes some support for function and data partitioning: basically some code wrapped around the fork() call, plus a locking mechanism.

The basic synchronization object for parallel programs is the hardware lock. On the Symmetry it is implemented with the xchgb (byte exchange) machine instruction, which is guaranteed to be atomic.
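
As a rough illustration of how a lock can be built on an atomic exchange, here is a minimal sketch in portable C11. It is not the Dynix implementation: the names spin_lock_t, spin_init, spin_acquire, spin_release and spin_demo are hypothetical, and atomic_exchange_explicit() from <stdatomic.h> stands in for the machine instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the Dynix or FastThreads API. */
typedef struct { atomic_uchar held; } spin_lock_t;   /* 0 = free, 1 = held */

static void spin_init(spin_lock_t *l) { atomic_init(&l->held, 0); }

static void spin_acquire(spin_lock_t *l)
{
    /* Atomically store 1 and fetch the old byte; an old value of 0 means we won. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0) {
        /* Wait with plain loads so the spinning stays in the local cache. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_release(spin_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

enum { SPIN_WORKERS = 4, SPIN_ITERS = 100000 };
static spin_lock_t counter_lock;
static long shared_counter;          /* protected by counter_lock */

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_ITERS; i++) {
        spin_acquire(&counter_lock);
        shared_counter++;
        spin_release(&counter_lock);
    }
    return NULL;
}

/* Starts four workers that increment one counter under the lock and
   returns the final count. */
static long spin_demo(void)
{
    pthread_t t[SPIN_WORKERS];
    spin_init(&counter_lock);
    shared_counter = 0;
    for (int i = 0; i < SPIN_WORKERS; i++) {
        int rc = pthread_create(&t[i], NULL, spin_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SPIN_WORKERS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

The inner loop of plain loads is a design choice that suits a bus-based machine with copy-back caches: waiting processors read their own cached copy and only go to the bus when the lock byte changes.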

Shared memory is implemented with file mapping, using the mmap() call and a shared-memory version of the sbrk() call. This approach has several disadvantages, some of which are mentioned in section 5.1 below. [3]
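
The file-mapping idea can be sketched with the POSIX form of mmap(). This is only an approximation under stated assumptions: the Dynix calls may differ in arguments and behaviour, the helper names shared_map and shared_demo are hypothetical, and the temporary file path is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of a temporary file as memory
   shared between this process and its future children.
   Returns NULL on failure. */
static void *shared_map(size_t size)
{
    assert(size > 0);
    char path[] = "/tmp/shmdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                     /* the file lives on until it is unmapped */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}

/* The child stores a value in the shared region; the parent reads it
   back after the child has exited. Returns the value seen, or -1. */
static int shared_demo(void)
{
    int *cell = shared_map(sizeof *cell);
    if (cell == NULL)
        return -1;
    *cell = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                   /* child */
        *cell = 42;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int seen = *cell;
    munmap(cell, sizeof *cell);
    return seen;
}
```

Because the mapping is created before fork(), parent and child see the same pages; an ordinary malloc() region would be copied instead and the parent would still read 0.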


FastThreads

FastThreads is a library package implementing lightweight threads. On startup it spawns one process (the thread scheduler) for each processor and then creates the main user thread. Using a simple non-preemptive algorithm, the scheduler processes then schedule and run the threads that the user threads create.
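
To make "non-preemptive" concrete, here is a deliberately simplified sketch of one scheduler draining a FIFO run queue. It is not how FastThreads is written: real threads have their own stacks and can block, and there is one scheduler per processor, whereas here each task simply runs to completion. All names (task_spawn, scheduler_run, scheduler_demo) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void *arg);
typedef struct { task_fn fn; void *arg; } task_t;

enum { QUEUE_CAP = 64 };
static task_t run_queue[QUEUE_CAP];   /* circular FIFO of runnable tasks */
static size_t q_head, q_len;

/* Append a task to the run queue. Returns 0 on success, -1 if full. */
static int task_spawn(task_fn fn, void *arg)
{
    if (q_len == QUEUE_CAP)
        return -1;
    run_queue[(q_head + q_len) % QUEUE_CAP] = (task_t){ fn, arg };
    q_len++;
    return 0;
}

/* Run tasks in FIFO order until the queue is empty. A running task is
   never interrupted; it gives up the processor only by returning.
   Returns the number of tasks run. */
static int scheduler_run(void)
{
    int ran = 0;
    while (q_len > 0) {
        task_t t = run_queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        t.fn(t.arg);                  /* may call task_spawn() itself */
        ran++;
    }
    return ran;
}

static void child_task(void *arg) { ++*(int *)arg; }

/* Plays the role of the main user thread: it creates three more tasks. */
static void main_task(void *arg)
{
    for (int i = 0; i < 3; i++)
        task_spawn(child_task, arg);
}

/* Returns how many child tasks ran (expected: 3). */
static int scheduler_demo(void)
{
    int hits = 0;
    q_head = q_len = 0;
    task_spawn(main_task, &hits);
    int ran = scheduler_run();
    assert(ran == 4);                 /* main task plus three children */
    (void)ran;
    return hits;
}
```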

The synchronization functions necessary for parallel programming are present in the form of locks and barriers. Threads can choose whether to spin-wait or block when synchronization is needed.
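
A spin-waiting barrier can be sketched as follows. This is a generic sense-reversing barrier written with C11 atomics and POSIX threads, not the FastThreads barrier; spin_barrier_t, barrier_init, barrier_wait and barrier_demo are hypothetical names, and a blocking variant would put the waiter to sleep instead of looping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    atomic_int waiting;   /* parties still to arrive in the current phase */
    atomic_int sense;     /* flips each time the barrier opens */
    int parties;
} spin_barrier_t;

static void barrier_init(spin_barrier_t *b, int parties)
{
    b->parties = parties;
    atomic_init(&b->waiting, parties);
    atomic_init(&b->sense, 0);
}

static void barrier_wait(spin_barrier_t *b)
{
    /* The value `sense` will take once this phase completes. */
    int my_sense = !atomic_load(&b->sense);
    if (atomic_fetch_sub(&b->waiting, 1) == 1) {
        /* Last arrival: reset the count first, then release the others. */
        atomic_store(&b->waiting, b->parties);
        atomic_store(&b->sense, my_sense);
    } else {
        while (atomic_load(&b->sense) != my_sense)
            ;                          /* spin-wait */
    }
}

enum { PARTIES = 4 };
static spin_barrier_t phase_barrier;
static int slot[PARTIES];
static int sums[PARTIES];

static void *barrier_worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    slot[id] = id + 1;                 /* phase 1: publish own value */
    barrier_wait(&phase_barrier);
    int s = 0;                         /* phase 2: read everyone's value */
    for (int i = 0; i < PARTIES; i++)
        s += slot[i];
    sums[id] = s;
    return NULL;
}

/* Returns 1 if every worker saw all four values after the barrier. */
static int barrier_demo(void)
{
    pthread_t t[PARTIES];
    barrier_init(&phase_barrier, PARTIES);
    for (int i = 0; i < PARTIES; i++) {
        int rc = pthread_create(&t[i], NULL, barrier_worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < PARTIES; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < PARTIES; i++)
        if (sums[i] != 1 + 2 + 3 + 4)
            return 0;
    return 1;
}
```

Spinning is cheap when every waiter has its own processor and the wait is short; blocking is the better choice when there are more threads than processors.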

FastThreads is a good package for experimenting with threads, but it has serious problems with I/O. The thread schedulers block when a thread does I/O, which means that no thread can be run on that processor until the I/O completes (although other ordinary Dynix processes can run).

A file opened by a thread is local to that processor (process) only. If a thread blocks and then starts running on another processor, the file descriptors are lost. Thread management as part of the operating system would have given noticeably better performance. For further information, see [1].

To get a better idea of what the programs were doing I modified the FastThreads library so that it could generate event traces. The event traces could then be displayed in graph form using a processor status display program developed at the Toshiba R&D Center. For more ideas on visualizing parallel program execution see [4].
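
The kind of trace hook involved might look like the sketch below. The record layout, the event kinds and the text output format are all assumptions made for illustration; the actual trace format and the input expected by the display program are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event kinds and record layout. */
typedef enum { EV_START, EV_BLOCK, EV_RESUME, EV_EXIT } event_kind;

typedef struct {
    unsigned long tick;    /* time stamp in arbitrary units */
    int processor;         /* scheduler process that saw the event */
    int thread;            /* thread identifier */
    event_kind kind;
} trace_event;

enum { TRACE_CAP = 1024 };
typedef struct {
    trace_event ev[TRACE_CAP];
    size_t count;          /* events stored */
    size_t dropped;        /* events lost because the buffer was full */
} trace_buf;

/* Append one event; once the buffer is full, count the loss instead. */
static void trace_record(trace_buf *tb, unsigned long tick,
                         int processor, int thread, event_kind kind)
{
    if (tb->count == TRACE_CAP) {
        tb->dropped++;
        return;
    }
    tb->ev[tb->count++] = (trace_event){ tick, processor, thread, kind };
}

/* Write one text line per event; returns the number of lines written. */
static size_t trace_dump(const trace_buf *tb, FILE *out)
{
    static const char *const names[] = { "start", "block", "resume", "exit" };
    for (size_t i = 0; i < tb->count; i++)
        fprintf(out, "%lu %d %d %s\n", tb->ev[i].tick,
                tb->ev[i].processor, tb->ev[i].thread, names[tb->ev[i].kind]);
    return tb->count;
}

/* Records the life of one thread that migrates from processor 0 to 1. */
static size_t trace_demo(void)
{
    static trace_buf tb;
    tb.count = 0;
    tb.dropped = 0;
    trace_record(&tb, 10, 0, 7, EV_START);
    trace_record(&tb, 25, 0, 7, EV_BLOCK);
    trace_record(&tb, 40, 1, 7, EV_RESUME);
    trace_record(&tb, 55, 1, 7, EV_EXIT);
    return trace_dump(&tb, stdout);
}
```

Keeping one buffer per scheduler process would avoid the need for a lock around trace_record(), at the cost of merging the buffers by time stamp afterwards.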

I used the FastThreads library throughout the investigation.

© Ola Sigurdson 1991-08-20 Ola@Sigurdson.SE