CS 102: Data Structures

Spring 2006


Lab 6

Timing Sorts


Introduction

In this lab we will create methods to implement insertion sort and quicksort algorithms. We will then time these sorts on arrays of different lengths to see where the quicksort becomes more efficient than the insertion sort. Finally we will create a hybrid sort that calls insertion sort for small arrays and quicksort for larger arrays, thereby getting the best of both worlds. We will time the three sorting methods to verify that the hybrid sort performs well across the range of array sizes.

This lab counts as two labs: the first lab credit is for the work on coding and timing the results of the sort routines; the second lab grade is for the report on your work.

Setup

The starting code for this lab is in the two files ArraySort.java and SortTest.java. You should also get the folder chn from lab 4 to put into your code folder, so these utilities are accessible. The code for SortTest is complete. You will complete three methods in the ArraySort class.

Specifications

The class ArraySort is a container for sorting methods only -- it has no state and need not be instantiated as an object. It is like the Math class. There will be three public sort methods, quicksort, insertSort, and quicksort2, the hybrid sort. These will be supported by private helper methods as needed.


To Do

The ArraySort class has already been started. Two helper methods for the quicksort algorithm are done for you, partition and medianOfThreeIndex.

1. Complete the quicksortHelper method, using the partition method that is already done. This should be a recursive quicksort on the segment of the array from first to last inclusive.

2. Complete the insertSortHelper method. This should do an insertion sort on the segment of the array from first to last inclusive.

3. Use the SortTest driver to test your sorts by printing the sorted array, by inserting calls to the method printArray appropriately. When the sorts are correct, remove the printing and run the driver to time the sorts. Note that to be accurate, your times should be five seconds or more. For each sort you should subtract out the loop that does the copying. Then use a spreadsheet to record your results and plot the times for the two sorts. Your objective is to estimate the length of list where quicksort becomes faster than insertion sort. Be sure to do five or more runs for each size and average the results. For any sort, the time (which is in seconds) divided by the number of repetitions gives you the time for one sort. Multiply by 1,000,000 to get the time in microseconds and plot this.

You should try array lengths of 10, 20, 30, 40, 50, ..., 100 to get a first estimate of the crossover length, then try additional lengths to pin it down as best you can.

4. Based on your results in part 3, choose a threshold where you switch from insertion sort to quicksort. Then create the method quicksort2 that is based on this threshold. Use a helper method as needed. Note that all sorts of sublists that are below the threshold should be done with the insertion sort.

5. Run the timing for all three sorts and graph your results. Try every 10 up to 100, then 100, 200, 300, 400, 500, 1000. At some point, you can leave out the insertion sort when it takes too much time.

6. Write a formal lab report on this lab, illustrated by your graphs. This report should be 4 to 8 pages and should include the setup for the lab (type of computer, including processor and speed, language used, the process followed and any other relevant details), hypotheses that you have about the results of your experiments (timing runs), a summary of the results of your experiments (timings), the conclusions that you draw and how they compare to your hypotheses. Note, your results should not include all the details of your results, but a summary table using the averages and appropriate graphs integrated into your document. This should be written so that it can be read by someone with an understanding of the material that you have covered in the course who has not done the experiment. If in doubt, explain, don't assume that the reader knows something.

Note: when doing the timings, use a number of repetitions that gives you results of 5 seconds or more. Then multiply the time by 1,000,000 / number of repetitions to get the time in microseconds. As the length gets longer, you can use fewer repetitions, as long as your net times are 5 seconds or more. Use a spreadsheet to tabulate and graph your results.