Algorithms: Part 4 - Randomized Algorithms


Randomized Algorithms
Definitions
Probabilistic Analysis: The analysis of algorithms where the input or behaviour is randomized. Focusing on the random element of the algorithm allows us to determine the expected runtime, performance, or outcome.
Expected Value: The mean/average of a random variable over many trials, computed as E[X] = sum over all outcomes x of x * P(x). For example, a fair 6-sided die has an expected value of (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5, the average of all potential outcomes.
Discrete Distribution: A probability distribution where the random variable takes on distinct, separate values. Example: Rolling a 6-sided die.
Continuous Distribution: A probability distribution where the random variable can take on any value within a range. Example: A person's height.
Uniform Distribution: A probability distribution where each possible outcome is equally likely.
Variance: A measure of how spread out a random variable is from the mean (expected value). It is calculated as: Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2.
- Standard Deviation: A measure of how much the random variable deviates from the mean. It is the square root of the variance: StdDev(X) = sqrt(Var(X)).
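To make these definitions concrete, here is a minimal C sketch (not from the original notes) that computes the expected value, variance, and standard deviation of a fair 6-sided die directly from the formulas above:
#include <math.h>
#include <stdio.h>

int main(void) {
    double p = 1.0 / 6.0; // uniform distribution over the 6 faces
    double expected = 0.0, variance = 0.0;

    // E[X] = sum over all outcomes x of x * P(x)
    for (int x = 1; x <= 6; x++) {
        expected += x * p;
    }
    // Var(X) = E[(X - E[X])^2]
    for (int x = 1; x <= 6; x++) {
        variance += (x - expected) * (x - expected) * p;
    }

    printf("E[X]      = %.2f\n", expected);       // 3.50
    printf("Var(X)    = %.2f\n", variance);       // ~2.92
    printf("StdDev(X) = %.2f\n", sqrt(variance)); // ~1.71
    return 0;
}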
Hiring Problem
The Hiring Problem describes a scenario where you're hiring for a job, and you interview candidates one by one.
After each interview, you decide to hire the candidate only if they are the best one seen so far.
Using probabilistic analysis, we can determine how many times you'll hire a new candidate on average.
The candidates arrive in a random order, meaning that the best candidate appears at an arbitrary position in the sequence.
Define the Indicator Random Variables
Each candidate is indexed as i, where 1 <= i <= n.
The Indicator Random Variable is
X_i = 1, if the i-th candidate is hired
X_i = 0, if the i-th candidate is not hired
X = X_1 + X_2 + ... + X_n is the total number of hires
Calculate the Probability of Hiring the i-th Candidate
To determine E[X_i], we compute:
E[X_i] = Pr{X_i = 1}
For X_i to equal 1, candidate i must be the best of the first i candidates. Since the candidates arrive in random order, each of the first i candidates is equally likely to be the best so far, so Pr{X_i = 1} = 1/i.
Use the Linearity of Expectation to find the Expected Value
E[X] = E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n] = 1 + 1/2 + 1/3 + ... + 1/n
This forms a Harmonic Series, H_n, which can be approximated using the natural logarithm: H_n = ln(n) + O(1).
As an example: for n = 100 candidates, the expected number of hires is H_100 ≈ 5.19.
Conclusion
The expected number of hires before finding the best candidate is O(ln n). This is much better than the n hires that occur if the candidates arrive sorted in increasing order of quality.
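As a sanity check, here is a minimal C simulation of the hiring problem (an illustration with assumed candidate and trial counts, not part of the original analysis); the average number of hires should land near H_N = ln(N) + γ ≈ 5.19 for N = 100:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100         // number of candidates (illustrative choice)
#define TRIALS 10000  // number of simulated hiring processes

int main(void) {
    srand((unsigned)time(NULL));
    long totalHires = 0;

    for (int t = 0; t < TRIALS; t++) {
        int rank[N];
        for (int i = 0; i < N; i++) rank[i] = i;
        // Fisher-Yates shuffle: candidates arrive in a random order.
        for (int i = N - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = rank[i]; rank[i] = rank[j]; rank[j] = tmp;
        }
        // Hire whenever the current candidate is the best seen so far.
        int best = -1;
        for (int i = 0; i < N; i++) {
            if (rank[i] > best) {
                best = rank[i];
                totalHires++;
            }
        }
    }

    printf("average hires:       %.2f\n", (double)totalHires / TRIALS);
    printf("ln(N) + gamma (H_N): %.2f\n", log((double)N) + 0.5772);
    return 0;
}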
Why this matters
By analyzing expected behaviour, we can design algorithms around the average runtime and avoid the worst case scenario.
This has wide application, such as analyzing randomized algorithms, hashing, and load balancing.
Probabilistic Analysis Checklist
- Define the Indicator Random Variables
- Define a binary variable X_i: 1 if the event occurs and 0 otherwise.
- Calculate the Expected Values using Linearity of Expectation
- Simplify and Conclude
EXAMPLE: Rolling a 6-sided die
- Define the Indicator Random Variable
X_i = 1 if the i-th roll is a 1
X_i = 0 if the i-th roll is not a 1
X_i indicates whether a 1 has occurred on roll i
- Linearity of Expectation
The probability that any single roll is a 1 is
p = 1/6
so E[X_i] = 1/6 for each roll. By the geometric distribution, it will take, on average, 1/p = 6 rolls until a 1 is rolled.
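A quick simulation makes this concrete. The sketch below (illustrative, with an assumed trial count) rolls a die until a 1 appears and averages the number of rolls:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));
    const int trials = 100000;
    long totalRolls = 0;

    for (int t = 0; t < trials; t++) {
        int rolls = 0;
        int face;
        do {                     // keep rolling until a 1 appears
            face = rand() % 6 + 1; // uniform over {1, ..., 6}
            rolls++;
        } while (face != 1);
        totalRolls += rolls;
    }

    // Geometric distribution with p = 1/6: expected waiting time 1/p = 6.
    printf("average rolls until a 1: %.2f\n", (double)totalRolls / trials);
    return 0;
}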
Quicksort
Quicksort is an in-place sorting algorithm that has a worst case runtime of O(n^2) and a best/average case of O(n log n).
The worst case happens when the pivot selection results in an imbalanced recursion tree. If the pivot is always the largest or smallest element, the recursion depth becomes n, leading to: T(n) = T(n-1) + O(n) = O(n^2).
To illustrate the worst case scenario, let's take the sorted list below:
[1, 2, 3, 4, 5]
If we chose the last index as the pivot (largest item) on each recursive call, we would split the list into an empty list on the right and a recursive call on the left that shrinks by only one element, costing O(n) partition work on each call.
[1, 2, 3, 4, 5]
[1, 2, 3, 4] [5]
[1, 2, 3] [4]
[1, 2] [3]
[1] [2]
Therefore, since the depth of the tree is n, we have created a runtime of O(n^2).
The solution to avoid this worst case runtime is to introduce a randomized pivot.
We can more formally define the problem and solution.
Problem with Fixed Pivots
Choosing the first or last element of the array can lead to poor performance on certain inputs (e.g. reverse sorted, already sorted, arrays of all the same value)
When the pivot is the smallest or largest element, the recursion tree becomes unbalanced.
This results in a recursion depth of n, leading to a runtime of O(n^2).
Choosing a randomized pivot
By choosing a pivot randomly, we reduce the chance of consistently picking a bad pivot.
Using a random pivot will, on average, divide the array into two roughly equal halves.
By using a random pivot on each recursive step, we are far more likely to get a balanced recursion tree with a depth of O(log n), leading to an average runtime of O(n log n).
Intuition on the Average Case
In each recursive step where a randomized pivot is chosen, the pivot is drawn from a uniform distribution over the current subarray.
If we call a pivot "good" when it lands in the middle half of the array, the probability of choosing a good pivot is 1/2. Therefore the probability of choosing a good pivot is high.
As the randomized pivot is applied at each level of recursion, even if a "bad" pivot is occasionally chosen, the probability of a long string of "bad" pivots is extremely low, so the probability of a balanced tree is very high.
Expected Depth of the Recursion
The best case is O(log n), meaning the tree has been split into roughly two halves at each level.
The worst case is O(n), meaning the smallest or largest element is chosen each time.
With a randomized pivot, the expected depth of the recursion tree is O(log n).
Why does Randomization Work?
Randomization ensures that the algorithm does not depend on the order of the input.
The probability of consistently picking bad pivots decreases exponentially with the depth of the tree.
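As a rough worked example (assuming, as above, that a "good" pivot is one landing in the middle half of the array, so it is chosen with probability 1/2): the probability of d bad pivots in a row is (1/2)^d, so even d = 30 consecutive bad levels has probability (1/2)^30 ≈ 9.3 × 10^-10, less than one in a billion.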
Summary
Due to the randomized choice of pivot, the average runtime is O(n log n).
The goal is to create a balanced recursion tree to stabilize the runtime.
Code Example
The example below shows the output of a non-randomized and a randomized quicksort on a worst case input, where the non-randomized implementation chooses the last element as the pivot.
input = { 1, 2, 3, 4, 5, 6, 7, 8 }
NON-RANDOMIZED QUICKSORT:
Recursion level: 0, On Side: root, Sorting from index 0 to 7
{ 1, 2, 3, 4, 5, 6, 7, 8, }
Recursion level: 1, On Side: left, Sorting from index 0 to 6
{ 1, 2, 3, 4, 5, 6, 7, }
Recursion level: 2, On Side: left, Sorting from index 0 to 5
{ 1, 2, 3, 4, 5, 6, }
Recursion level: 3, On Side: left, Sorting from index 0 to 4
{ 1, 2, 3, 4, 5, }
Recursion level: 4, On Side: left, Sorting from index 0 to 3
{ 1, 2, 3, 4, }
Recursion level: 5, On Side: left, Sorting from index 0 to 2
{ 1, 2, 3, }
Recursion level: 6, On Side: left, Sorting from index 0 to 1
{ 1, 2, }
Recursion level: 7, On Side: left, Sorting from index 0 to 0
{ 1, }
Recursion level: 7, On Side: right, Sorting from index 2 to 1
{ }
Recursion level: 6, On Side: right, Sorting from index 3 to 2
{ }
Recursion level: 5, On Side: right, Sorting from index 4 to 3
{ }
Recursion level: 4, On Side: right, Sorting from index 5 to 4
{ }
Recursion level: 3, On Side: right, Sorting from index 6 to 5
{ }
Recursion level: 2, On Side: right, Sorting from index 7 to 6
{ }
Recursion level: 1, On Side: right, Sorting from index 8 to 7
{ }
RANDOMIZED QUICKSORT:
Recursion level: 0, On Side: root, Sorting from index 0 to 7
{ 1, 2, 3, 4, 5, 6, 7, 8, }
Recursion level: 1, On Side: left, Sorting from index 0 to 3
{ 1, 2, 3, 4, }
Recursion level: 2, On Side: left, Sorting from index 0 to 2
{ 1, 2, 3, }
Recursion level: 3, On Side: left, Sorting from index 0 to 1
{ 1, 2, }
Recursion level: 4, On Side: left, Sorting from index 0 to -1
{ }
Recursion level: 4, On Side: right, Sorting from index 1 to 1
{ 2, }
Recursion level: 3, On Side: right, Sorting from index 3 to 2
{ }
Recursion level: 2, On Side: right, Sorting from index 4 to 3
{ }
Recursion level: 1, On Side: right, Sorting from index 5 to 7
{ 6, 7, 8, }
Recursion level: 2, On Side: left, Sorting from index 5 to 5
{ 6, }
Recursion level: 2, On Side: right, Sorting from index 7 to 7
{ 8, }
We can see that the depth of the non-randomized quicksort is 8 levels, while the depth of the randomized quicksort on this run is 5 levels.
We can conclude that by choosing the pivot uniformly at random between start and end, we can mitigate the worst case and push the runtime towards O(n log n).
Below is a code example of an implemented randomized quicksort.
#include <stdbool.h>
#include <stdlib.h>

// Callback type and swap helper filled in (inferred from their usage
// below) so the example compiles as written.
typedef void (*LogCallback)(int *arr, int level, int start, int end,
                            const char *side);

// Swap two elements of the array by index.
static void _swap(int *arr, int a, int b) {
    int tmp = arr[a];
    arr[a] = arr[b];
    arr[b] = tmp;
}

int _partition(int *arr, int start, int end, bool randomized) {
    int pIdx = start;
    if (randomized) {
        // Pick a pivot uniformly at random from [start, end] and swap it
        // with the end element so the loop below can treat it uniformly.
        int randPos = rand() % (end - start + 1) + start;
        _swap(arr, randPos, end);
    }
    // Lomuto partition: move every element <= the pivot to the left.
    for (int i = start; i < end; i++) {
        if (arr[i] <= arr[end]) {
            _swap(arr, i, pIdx);
            pIdx++;
        }
    }
    // Place the pivot into its final sorted position.
    _swap(arr, pIdx, end);
    return pIdx;
}

void _quicksort(int *arr, int start, int end, int level, const char *side,
                LogCallback logCallback, bool randomized) {
    if (logCallback) {
        logCallback(arr, level, start, end, side);
    }
    if (start >= end) {
        return;
    }
    int pivot = _partition(arr, start, end, randomized);
    // Recurse on the subarrays to the left and right of the pivot.
    _quicksort(arr, start, pivot - 1, level + 1, "left", logCallback,
               randomized);
    _quicksort(arr, pivot + 1, end, level + 1, "right", logCallback,
               randomized);
}
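To run the example, a small driver is needed. Below is a minimal sketch that assumes the _quicksort above; the logState callback is illustrative and reproduces the trace format shown earlier:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Illustrative callback reproducing the trace format shown above.
static void logState(int *arr, int level, int start, int end,
                     const char *side) {
    printf("Recursion level: %d, On Side: %s, Sorting from index %d to %d\n",
           level, side, start, end);
    printf("{ ");
    for (int i = start; i <= end; i++) {
        printf("%d, ", arr[i]);
    }
    printf("}\n");
}

int main(void) {
    srand((unsigned)time(NULL)); // seed the random pivot selection
    int input[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int n = (int)(sizeof(input) / sizeof(input[0]));

    printf("RANDOMIZED QUICKSORT:\n");
    _quicksort(input, 0, n - 1, 0, "root", logState, true);
    return 0;
}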
Monte Carlo Simulations
Monte Carlo Simulations are randomized algorithms with a deterministic, bounded runtime whose output is an estimate rather than an exact answer.
They use randomness in the input, via a sampling method, to determine an approximate result.
Here is a very basic example:
Random Inputs: You generate 1000 random arrays of varying order and size. This introduces randomness into the simulation.
Algorithm Execution: You run the randomized algorithm (quicksort) on each input and record the recursion depth.
Data Collection: You collect the recursion depths for all 1000 runs.
Analysis: You aggregate the results to determine the most common recursion depth, which correlates with the runtime behavior:
- If the depth of the recursion is O(log N), then the runtime is very likely O(N log N)
- If the depth is O(N), the runtime is likely O(N^2)
Due to the randomness of the randomized quicksort algorithm when choosing a pivot, each run through the algorithm may yield slightly different depths, but they should be around O(log N).
The accuracy of the estimate should increase as the number of trials increases.
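As a sketch of how such a simulation could look, reusing _quicksort and the LogCallback type from the code example above (the trackDepth callback and the run/array sizes are illustrative choices):
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int maxDepth; // deepest recursion level seen in the current run

// Callback that records the recursion depth instead of printing the array.
static void trackDepth(int *arr, int level, int start, int end,
                       const char *side) {
    (void)arr; (void)start; (void)end; (void)side;
    if (level > maxDepth) maxDepth = level;
}

int main(void) {
    srand((unsigned)time(NULL));
    enum { RUNS = 1000, N = 1024 };
    static int arr[N];
    long depthSum = 0;

    for (int r = 0; r < RUNS; r++) {
        for (int i = 0; i < N; i++) arr[i] = rand(); // fresh random input
        maxDepth = 0;
        _quicksort(arr, 0, N - 1, 0, "root", trackDepth, true);
        depthSum += maxDepth;
    }
    // log2(1024) = 10, so an average depth that stays a small multiple of
    // 10 suggests O(log N) depth and therefore O(N log N) runtime.
    printf("average max depth over %d runs: %.1f\n", RUNS,
           (double)depthSum / RUNS);
    return 0;
}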
Las Vegas Algorithm
A Las Vegas Algorithm is a randomized algorithm that has a random runtime but a guaranteed correct result.
Using Randomized Quicksort as an example:
We know the expected runtime is O(n log n) but this can vary around the expected time, depending on the input and choice of pivot.
But we know a correctly implemented Randomized Quicksort Algorithm will always result in a sorted list.
Here is a clearer definition of a Las Vegas Algorithm:
- The algorithm makes random choices during its execution
- The random choices still lead to a correct solution; the result is always correct
- If the algorithm fails to produce a correct result, it retries, repeating the process until it succeeds
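A classic textbook illustration of this retry pattern (a sketch, not from these notes): probe random indices of an array in which half the entries equal the target. The number of probes is random, with an expected value of 2, but the returned index is always correct.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Las Vegas search: the array is known to contain the target. The number
// of random probes varies from run to run, but the returned index is
// always a correct position of the target.
int lasVegasFind(const int *arr, int n, int target) {
    for (;;) {
        int i = rand() % n;   // random choice during execution
        if (arr[i] == target) {
            return i;         // the result is always correct
        }
        // otherwise retry, repeating the process
    }
}

int main(void) {
    srand((unsigned)time(NULL));
    int arr[] = {3, 7, 3, 7, 3, 7, 3, 7}; // half the entries are 7
    int idx = lasVegasFind(arr, 8, 7);
    printf("found 7 at index %d\n", idx); // ~2 probes expected per call
    return 0;
}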