Subsections

Further Investigation

Performance with Different System Configurations

The four processors available really limited the scope of investigation. Measurements should be done on computers with far more processors to see what happens with performance. Some questions to be answered are: Will bus contention limit the parallelism? Can the Improved Parallel Quicksort keep up the parallelism as the number of processors increase, or will lock contention increase? Will thread latency increase even more for small number of elements?

The FastThread library, although simple and robust enough, has its problems. One is the inability to do I/O in a straightforward manner as mentioned in section 2.2. The other is the operating system interference that causes thread latency, described in section 5.2 above. If the sorting algorithms were implemented on an operating system with integrated threads, performance should increase.

Real Application Input Data

For my measurements, I've use arbitrary input data (though chosen with some thought). To get a more useful evaluation of the algorithms, real applications should be examined in order to locate typical sort data. With measurements based on real data, program designers could choose the most appropriate algorithm for each problem.

Algorithm implementation

The Improved Parallel Quicksort is, as mentioned above, far from optimal. Further tuning of the critical parts should improve the efficiency to nearly that of the ordinary Quicksort.

The parallel quicksort algorithms suggested have the parallel threshold hard-coded. This means that they will perform optimally only for a rather limited data set. With some sort of dynamic threshold, the algorithms should be able to adapt to different input data.

Timing of External Algorithms

The external algorithms should be exhaustively timed using different I/O system configurations. Especially the I/O configuration should affect performance.

A different thread implementation than FastThreads' should be used when implementing the external algorithms. The Unix process is probably sufficient for this purpose.

© Ola Sigurdson 1991-08-20 Ola@Sigurdson.SE