Saturday, December 10, 2011

Merge Sort Vs Bucket Sort

Merge sort     

       Merge sort is based on the divide-and-conquer paradigm. Its worst-case running time has a lower order of growth than insertion sort. Since we are dealing with subproblems, we state each subproblem as sorting a subarray A[p .. r]. Initially, p = 1 and r = n, but these values change as we recurse through subproblems.

To sort A[p .. r]:

     1.Divide Step

       If a given array A has zero or one element, simply return; it is already sorted. Otherwise, split A[p .. r] into two subarrays A[p.. q] and A[q + 1 .. r], each containing about half of the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].

     2.Conquer Step

       Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].

     3. Combine Step

       Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and A[q + 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure MERGE (A, p, q, r).



Algorithm: Merge Sort

To sort the entire sequence A[1 .. n], make the initial call to the procedure MERGE-SORT (A, 1, n).

MERGE-SORT (A, p, r)

  1. IF p < r // Check for base case
  2. THEN q = FLOOR[(p + r)/2] // Divide step
  3. MERGE (A, p, q) // Conquer step.
  4. MERGE (A, q + 1, r) // Conquer step.
  5. MERGE (A, p, q, r) // Conquer step.

Example: Bottom-up view of the above procedure for n = 8.






Merging


What remains is the MERGE procedure. The following is the input and output of the MERGE procedure.

  • Input: Array A and indices p, q, r such that p ≤ q ≤ r and subarray A[p .. q] is sorted and subarray A[q + 1 .. r] is sorted. By restrictions on p, q, r, neither subarray is empty.
  • Output: The two subarrays are merged into a single sorted subarray in A[p .. r].

We implement it so that it takes Θ(n) time, where n = r − p + 1, which is the number of elements being merged.




Pseudocode

Merge (A, p, q, r )

  1.  n1 ← q − p + 1
  2.  n2 ← r − q
  3.  Create arrays L[1 . . n1 + 1] and R[1 . . n2 + 1]
  4.  FOR i ← 1 TO n1
  5.  DO L[i] ← A[p + i − 1]
  6.  FOR j ← 1 TO n2
  7. DO R[j] ← A[q + j ]
  8.  L[n1 + 1] ← ∞
  9. R[n2 + 1] ← ∞
  10.  i ← 1
  11.  j ← 1
  12.  FOR k ← p TO r
  13. 1DO IF L[i ] ≤ R[ j]
  14. THEN A[k] ← L[i]
  15.  i ← i + 1
  16. ELSE A[k] ← R[j]
  17.  j ← j + 1

Implementation

void mergeSort(int numbers[], int temp[], int array_size)
{
m_sort(numbers, temp, 0, array_size - 1);
}
void m_sort(int numbers[], int temp[], int left, int right)
{
int mid;
if (right > left)
{
mid = (right + left) / 2;
m_sort(numbers, temp, left, mid);
m_sort(numbers, temp, mid+1, right);
merge(numbers, temp, left, mid+1, right);
}
}
void merge(int numbers[], int temp[], int left, int mid, int right)
{
int i, left_end, num_elements, tmp_pos;
left_end = mid - 1;
tmp_pos = left;
num_elements = right - left + 1;
while ((left <= left_end) && (mid <= right))
{
if (numbers[left] <= numbers[mid])
{
temp[tmp_pos] = numbers[left];
tmp_pos = tmp_pos + 1;
left = left +1;
}
else
{
temp[tmp_pos] = numbers[mid];
tmp_pos = tmp_pos + 1;
mid = mid + 1;
}
}
while (left <= left_end)
{
temp[tmp_pos] = numbers[left];
left = left + 1;
tmp_pos = tmp_pos + 1;
}
while (mid <= right)
{
temp[tmp_pos] = numbers[mid];
mid = mid + 1;
tmp_pos = tmp_pos + 1;
}
for (i = 0; i <= num_elements; i++)
{
numbers[right] = temp[right];
right = right - 1;
}
}



Linkhttp://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Sorting/mergeSort.htm


Bucket Sort

       Bucket sort runs in linear time on the average. It assumes that the input is generated by a random process that distributes elements uniformly over the interval [0, 1).

       The idea of Bucket sort is to divide the interval [0, 1) into n equal-sized subintervals, or buckets, and then distribute the n input numbers into the buckets. Since the inputs are uniformly distributed over (0, 1), we don't expect many numbers to fall into each bucket. To produce the output, simply sort the numbers in each bucket and then go through the bucket in order, listing the elements in each.

       The code assumes that input is in n-element array A and each element in A satisfies 0 ≤ A[i] ≤ 1. We also need an auxiliary arrayB[0 . . n -1] for linked-lists (buckets).


BUCKET_SORT (A)
  1. n ← length [A]
  2. For i = 1 to n do
  3. Insert A[i] into list B[nA[i]]
  4. For i = 0 to n-1 do
  5. Sort list B with Insertion sort
  6. Concatenate the lists B[0], B[1], . . B[n-1] together in order.

Example

       Given input array A[1..10]. The array B[0..9] of sorted lists or buckets after line 5. Bucket i holds values in the interval [i/10, (i+1)/10]. The sorted output consists of a concatenation in order of the lists first B[0] then B[1] then B[2] ... and the last one isB[9].




Analysis

       All lines except line 5 take O(n) time in the worst case. We can see inspection that total time to examine all buckets in line 5 is O(n-1)i.e., O(n).

       The only interesting part of the analysis is the time taken by Insertion sort in line 5. Let ni be the random variable denoting the number of elements in the bucket B[i]. Since the expected time to sort by INSERTION_SORT is O(n2), the expected time to sort the elements in bucket B[i] is

       E[O(2ni)] = O(E[2ni]]

      Therefore, the total expected time to sort all elements in all buckets is

       n-1∑i=0 O(E[2ni]) = O n-1∑i=0 (E[2ni]) ------------ A

       In order to evaluate this summation, we must determine the distribution of each random variable n


       We have n elements and n buckets. The probability that a given element falls in a bucket B[i] is 1/n i.e., Probability = p = 1/n. (Note that this problem is the same as that of "Balls-and-Bin" problem)

Therefore, the probability follows the binomial distribution, which has

       mean: E[ni] = np = 1
       variance: Var[ni] = np(1- p) = 1- 1/n

       For any random variable, we have

       E[2ni] = Var[ni] + E2[ni]
       = 1 - 1/n + 12
       = 2 - 1/n
       = (1)



Conclusion: 
       
       Para sa akin mas magandang gamit ang Bucket sort keysa sa Merge sort, dahil mas mabilis ang process ng pagsosorting pag bucket sort ang gagamitin kasi ito linear ang sorting keysa sa merge sort ng keylangan mo pang i-divide ng i-divide bago ma sort. :D




---------------------------
Medialdia, Myrone J.
BSCS3-1
200912295
---------------------------

1 comment: