Algorithms: Part 4 - Randomized Algorithms


Randomized Algorithms
Definitions
Probabilistic Analysis: The analysis of algorithms where the input or behaviour is randomized. Focusing on the random element of the algorithm allows us to determine the expected runtime, performance, or outcome.
Expected Value: The mean/average of a random variable over many trials, computed as E[X] = sum over all outcomes x of x * P(x). For example, a fair 6-sided die has an expected value of (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5, the average of all potential outcomes.
Discrete Distribution: A probability distribution where the random variable takes on distinct, separate values. Example: Rolling a 6-sided die.
Continuous Distribution: A probability distribution where the random variable can take on any value within a range. Example: A person's height.
Uniform Distribution: A probability distribution where each possible outcome is equally likely.
Variance: A measure of how spread out a random variable is from the mean (expected value). It is calculated as: Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2.
- Standard Deviation: A measure of how much the random variable deviates from the mean. It is the square root of the variance: StdDev(X) = sqrt(Var(X)).
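To make these definitions concrete, here is a minimal C sketch (not from the original notes) that computes the expected value, variance, and standard deviation of a fair 6-sided die directly from the formulas above:
#include <math.h>
#include <stdio.h>

int main(void) {
    double p = 1.0 / 6.0; // uniform distribution over the 6 faces
    double expected = 0.0, variance = 0.0;

    // E[X] = sum over all outcomes x of x * P(x)
    for (int x = 1; x <= 6; x++) {
        expected += x * p;
    }
    // Var(X) = E[(X - E[X])^2]
    for (int x = 1; x <= 6; x++) {
        variance += (x - expected) * (x - expected) * p;
    }

    printf("E[X]      = %.2f\n", expected);       // 3.50
    printf("Var(X)    = %.2f\n", variance);       // ~2.92
    printf("StdDev(X) = %.2f\n", sqrt(variance)); // ~1.71
    return 0;
}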
Hiring Problem
The Hiring Problem describes a scenario where you're hiring for a job, and you interview candidates one by one.
After each interview, you decide to hire the candidate only if they are the best one seen so far.
Using probabilistic analysis, we can determine how many times you'll hire a new candidate on average.
The candidates arrive in a random order, meaning that the best candidate appears at an arbitrary position in the sequence.
Define the Indicator Random Variables
Each candidate is indexed as i, where 1 <= i <= n.
The Indicator Random Variable is
X_i = 1, if the i-th candidate is hired
X_i = 0, if the i-th candidate is not hired
X = X_1 + X_2 + ... + X_n is the total number of hires
Calculate the Probability of Hiring the i-th Candidate
To determine E[X_i], we compute:
E[X_i] = Pr{X_i = 1}
For X_i to equal 1, candidate i must be the best of the first i candidates. Since the candidates arrive in random order, each of the first i candidates is equally likely to be the best so far, so Pr{X_i = 1} = 1/i.
Use the Linearity of Expectation to find the Expected Value
E[X] = E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n] = 1 + 1/2 + 1/3 + ... + 1/n
This forms a Harmonic Series, H_n, which can be approximated using the natural logarithm: H_n = ln(n) + O(1).
As an example: for n = 100 candidates, the expected number of hires is H_100 ≈ 5.19.
Conclusion
The expected number of hires before finding the best candidate is O(ln n). This is much better than the n hires that occur if the candidates arrive sorted in increasing order of quality.
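As a sanity check, here is a minimal C simulation of the hiring problem (an illustration with assumed candidate and trial counts, not part of the original analysis); the average number of hires should land near H_N = ln(N) + γ ≈ 5.19 for N = 100:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100         // number of candidates (illustrative choice)
#define TRIALS 10000  // number of simulated hiring processes

int main(void) {
    srand((unsigned)time(NULL));
    long totalHires = 0;

    for (int t = 0; t < TRIALS; t++) {
        int rank[N];
        for (int i = 0; i < N; i++) rank[i] = i;
        // Fisher-Yates shuffle: candidates arrive in a random order.
        for (int i = N - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = rank[i]; rank[i] = rank[j]; rank[j] = tmp;
        }
        // Hire whenever the current candidate is the best seen so far.
        int best = -1;
        for (int i = 0; i < N; i++) {
            if (rank[i] > best) {
                best = rank[i];
                totalHires++;
            }
        }
    }

    printf("average hires:       %.2f\n", (double)totalHires / TRIALS);
    printf("ln(N) + gamma (H_N): %.2f\n", log((double)N) + 0.5772);
    return 0;
}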
Why this matters
By analyzing expected behaviour, we can design algorithms around the average runtime and avoid the worst case scenario.
This has wide application, such as analyzing randomized algorithms, hashing, and load balancing.
Probabilistic Analysis Checklist
- Define the Indicator Random Variables
- Define a binary variable X_i: 1 if the event occurs and 0 otherwise.
- Calculate the Expected Values using Linearity of Expectation
- Simplify and Conclude
EXAMPLE: Rolling a 6-sided die
- Define the Indicator Random Variable
X_i = 1 if the i-th roll is a 1
X_i = 0 if the i-th roll is not a 1
X_i indicates whether a 1 has occurred on roll i
- Linearity of Expectation
The probability that any single roll is a 1 is
p = 1/6
so E[X_i] = 1/6 for each roll. By the geometric distribution, it will take, on average, 1/p = 6 rolls until a 1 is rolled.
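A quick simulation makes this concrete. The sketch below (illustrative, with an assumed trial count) rolls a die until a 1 appears and averages the number of rolls:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));
    const int trials = 100000;
    long totalRolls = 0;

    for (int t = 0; t < trials; t++) {
        int rolls = 0;
        int face;
        do {                     // keep rolling until a 1 appears
            face = rand() % 6 + 1; // uniform over {1, ..., 6}
            rolls++;
        } while (face != 1);
        totalRolls += rolls;
    }

    // Geometric distribution with p = 1/6: expected waiting time 1/p = 6.
    printf("average rolls until a 1: %.2f\n", (double)totalRolls / trials);
    return 0;
}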
Quicksort
Quicksort is an in-place sorting algorithm that has a worst case runtime of O(n^2) and a best/average case of O(n log n).
The worst case happens when the pivot selection results in an imbalanced recursion tree. If the pivot is always the largest or smallest element, the recursion depth becomes n, leading to: T(n) = T(n-1) + O(n) = O(n^2).
To illustrate the worst case scenario, let's take the sorted list below:
[1, 2, 3, 4, 5]
If we chose the last index as the pivot (largest item) on each recursive call, we would split the list into an empty list on the right and a recursive call on the left that shrinks by only one element, costing O(n) partition work on each call.
[1, 2, 3, 4, 5]
[1, 2, 3, 4] [5]
[1, 2, 3] [4]
[1, 2] [3]
[1] [2]
Therefore, since the depth of the tree is n, we have created a runtime of O(n^2).
The solution to avoid this worst case runtime is to introduce a randomized pivot.
We can more formally define the problem and solution.
Problem with Fixed Pivots
Choosing the first or last element of the array can lead to poor performance on certain inputs (e.g. reverse sorted, already sorted, arrays of all the same value)
When the pivot is the smallest or largest element, the recursion tree becomes unbalanced.
This results in a recursion depth of n, leading to a runtime of O(n^2).
Choosing a randomized pivot
By choosing a pivot randomly, we reduce the chance of consistently picking a bad pivot.
Using a random pivot will, on average, divide the array into two roughly equal halves.
By using a random pivot on each recursive step, we are far more likely to get a balanced recursion tree with a depth of O(log n), leading to an average runtime of O(n log n).
Intuition on the Average Case
In each recursive step where a randomized pivot is chosen, the pivot is drawn from a uniform distribution over the current subarray.
If we call a pivot "good" when it lands in the middle half of the array, the probability of choosing a good pivot is 1/2. Therefore the probability of choosing a good pivot is high.
As the randomized pivot is applied at each level of recursion, even if a "bad" pivot is occasionally chosen, the probability of a long string of "bad" pivots is extremely low, so the probability of a balanced tree is very high.
Expected Depth of the Recursion
The best case is O(log n), meaning the tree has been split into roughly two halves at each level.
The worst case is O(n), meaning the smallest or largest element is chosen each time.
With a randomized pivot, the expected depth of the recursion tree is O(log n).
Why does Randomization Work?
Randomization ensures that the algorithm does not depend on the order of the input.
The probability of consistently picking bad pivots decreases exponentially with the depth of the tree.
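As a rough worked example (assuming, as above, that a "good" pivot is one landing in the middle half of the array, so it is chosen with probability 1/2): the probability of d bad pivots in a row is (1/2)^d, so even d = 30 consecutive bad levels has probability (1/2)^30 ≈ 9.3 × 10^-10, less than one in a billion.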
Summary
Due to the randomized choice of pivot, the average runtime is O(n log n).
The goal is to create a balanced recursion tree to stabilize the runtime.
Code Example
The example below shows the output of a non-randomized and a randomized quicksort on a worst case input, where the non-randomized implementation chooses the last element as the pivot.
input = { 1, 2, 3, 4, 5, 6, 7, 8 }
NON-RANDOMIZED QUICKSORT:
Recursion level: 0, On Side: root, Sorting from index 0 to 7
{ 1, 2, 3, 4, 5, 6, 7, 8, }
Recursion level: 1, On Side: left, Sorting from index 0 to 6
{ 1, 2, 3, 4, 5, 6, 7, }
Recursion level: 2, On Side: left, Sorting from index 0 to 5
{ 1, 2, 3, 4, 5, 6, }
Recursion level: 3, On Side: left, Sorting from index 0 to 4
{ 1, 2, 3, 4, 5, }
Recursion level: 4, On Side: left, Sorting from index 0 to 3
{ 1, 2, 3, 4, }
Recursion level: 5, On Side: left, Sorting from index 0 to 2
{ 1, 2, 3, }
Recursion level: 6, On Side: left, Sorting from index 0 to 1
{ 1, 2, }
Recursion level: 7, On Side: left, Sorting from index 0 to 0
{ 1, }
Recursion level: 7, On Side: right, Sorting from index 2 to 1
{ }
Recursion level: 6, On Side: right, Sorting from index 3 to 2
{ }
Recursion level: 5, On Side: right, Sorting from index 4 to 3
{ }
Recursion level: 4, On Side: right, Sorting from index 5 to 4
{ }
Recursion level: 3, On Side: right, Sorting from index 6 to 5
{ }
Recursion level: 2, On Side: right, Sorting from index 7 to 6
{ }
Recursion level: 1, On Side: right, Sorting from index 8 to 7
{ }
RANDOMIZED QUICKSORT:
Recursion level: 0, On Side: root, Sorting from index 0 to 7
{ 1, 2, 3, 4, 5, 6, 7, 8, }
Recursion level: 1, On Side: left, Sorting from index 0 to 3
{ 1, 2, 3, 4, }
Recursion level: 2, On Side: left, Sorting from index 0 to 2
{ 1, 2, 3, }
Recursion level: 3, On Side: left, Sorting from index 0 to 1
{ 1, 2, }
Recursion level: 4, On Side: left, Sorting from index 0 to -1
{ }
Recursion level: 4, On Side: right, Sorting from index 1 to 1
{ 2, }
Recursion level: 3, On Side: right, Sorting from index 3 to 2
{ }
Recursion level: 2, On Side: right, Sorting from index 4 to 3
{ }
Recursion level: 1, On Side: right, Sorting from index 5 to 7
{ 6, 7, 8, }
Recursion level: 2, On Side: left, Sorting from index 5 to 5
{ 6, }
Recursion level: 2, On Side: right, Sorting from index 7 to 7
{ 8, }
We can see that the depth of the non-randomized quicksort is 8 levels, while the depth of the randomized quicksort on this run is 5 levels.
We can conclude that by choosing the pivot uniformly at random between start and end, we can mitigate the worst case and push the runtime towards O(n log n).
Below is a code example of an implemented randomized quicksort.
#include <stdbool.h>
#include <stdlib.h>

// Callback type and swap helper filled in (inferred from their usage
// below) so the example compiles as written.
typedef void (*LogCallback)(int *arr, int level, int start, int end,
                            const char *side);

// Swap two elements of the array by index.
static void _swap(int *arr, int a, int b) {
    int tmp = arr[a];
    arr[a] = arr[b];
    arr[b] = tmp;
}

int _partition(int *arr, int start, int end, bool randomized) {
    int pIdx = start;
    if (randomized) {
        // Pick a pivot uniformly at random from [start, end] and swap it
        // with the end element so the loop below can treat it uniformly.
        int randPos = rand() % (end - start + 1) + start;
        _swap(arr, randPos, end);
    }
    // Lomuto partition: move every element <= the pivot to the left.
    for (int i = start; i < end; i++) {
        if (arr[i] <= arr[end]) {
            _swap(arr, i, pIdx);
            pIdx++;
        }
    }
    // Place the pivot into its final sorted position.
    _swap(arr, pIdx, end);
    return pIdx;
}

void _quicksort(int *arr, int start, int end, int level, const char *side,
                LogCallback logCallback, bool randomized) {
    if (logCallback) {
        logCallback(arr, level, start, end, side);
    }
    if (start >= end) {
        return;
    }
    int pivot = _partition(arr, start, end, randomized);
    // Recurse on the subarrays to the left and right of the pivot.
    _quicksort(arr, start, pivot - 1, level + 1, "left", logCallback,
               randomized);
    _quicksort(arr, pivot + 1, end, level + 1, "right", logCallback,
               randomized);
}
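To run the example, a small driver is needed. Below is a minimal sketch that assumes the _quicksort above; the logState callback is illustrative and reproduces the trace format shown earlier:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Illustrative callback reproducing the trace format shown above.
static void logState(int *arr, int level, int start, int end,
                     const char *side) {
    printf("Recursion level: %d, On Side: %s, Sorting from index %d to %d\n",
           level, side, start, end);
    printf("{ ");
    for (int i = start; i <= end; i++) {
        printf("%d, ", arr[i]);
    }
    printf("}\n");
}

int main(void) {
    srand((unsigned)time(NULL)); // seed the random pivot selection
    int input[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int n = (int)(sizeof(input) / sizeof(input[0]));

    printf("RANDOMIZED QUICKSORT:\n");
    _quicksort(input, 0, n - 1, 0, "root", logState, true);
    return 0;
}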
Monte Carlo Simulations
Monte Carlo Simulations are randomized algorithms with a deterministic, bounded runtime whose output is an estimate rather than an exact answer.
They use randomness in the input, via a sampling method, to determine an approximate result.
Here is a very basic example:
Random Inputs: You generate 1000 random arrays of varying order and size. This introduces randomness into the simulation.
Algorithm Execution: You run the randomized algorithm (quicksort) on each input and record the recursion depth.
Data Collection: You collect the recursion depths for all 1000 runs.
Analysis: You aggregate the results to determine the most common recursion depth, which correlates with the runtime behavior:
- If the depth of the recursion is O(log N), then the runtime is very likely O(N log N)
- If the depth is O(N), the runtime is likely O(N^2)
Due to the randomness of the randomized quicksort algorithm when choosing a pivot, each run through the algorithm may yield slightly different depths, but they should be around O(log N).
The accuracy of the estimate should increase as the number of trials increases.
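As a sketch of how such a simulation could look, reusing _quicksort and the LogCallback type from the code example above (the trackDepth callback and the run/array sizes are illustrative choices):
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int maxDepth; // deepest recursion level seen in the current run

// Callback that records the recursion depth instead of printing the array.
static void trackDepth(int *arr, int level, int start, int end,
                       const char *side) {
    (void)arr; (void)start; (void)end; (void)side;
    if (level > maxDepth) maxDepth = level;
}

int main(void) {
    srand((unsigned)time(NULL));
    enum { RUNS = 1000, N = 1024 };
    static int arr[N];
    long depthSum = 0;

    for (int r = 0; r < RUNS; r++) {
        for (int i = 0; i < N; i++) arr[i] = rand(); // fresh random input
        maxDepth = 0;
        _quicksort(arr, 0, N - 1, 0, "root", trackDepth, true);
        depthSum += maxDepth;
    }
    // log2(1024) = 10, so an average depth that stays a small multiple of
    // 10 suggests O(log N) depth and therefore O(N log N) runtime.
    printf("average max depth over %d runs: %.1f\n", RUNS,
           (double)depthSum / RUNS);
    return 0;
}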
Las Vegas Algorithm
A Las Vegas Algorithm is a randomized algorithm that has a random runtime but a guaranteed correct result.
Using Randomized Quicksort as an example:
We know the expected runtime is O(n log n) but this can vary around the expected time, depending on the input and choice of pivot.
But we know a correctly implemented Randomized Quicksort Algorithm will always result in a sorted list.
Here is a clearer definition of a Las Vegas Algorithm:
- The algorithm makes random choices during its execution
- The random choices still lead to a correct solution; the result is always correct
- If the algorithm fails to produce a correct result, it retries, repeating the process until it succeeds
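A classic textbook illustration of this retry pattern (a sketch, not from these notes): probe random indices of an array in which half the entries equal the target. The number of probes is random, with an expected value of 2, but the returned index is always correct.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Las Vegas search: the array is known to contain the target. The number
// of random probes varies from run to run, but the returned index is
// always a correct position of the target.
int lasVegasFind(const int *arr, int n, int target) {
    for (;;) {
        int i = rand() % n;   // random choice during execution
        if (arr[i] == target) {
            return i;         // the result is always correct
        }
        // otherwise retry, repeating the process
    }
}

int main(void) {
    srand((unsigned)time(NULL));
    int arr[] = {3, 7, 3, 7, 3, 7, 3, 7}; // half the entries are 7
    int idx = lasVegasFind(arr, 8, 7);
    printf("found 7 at index %d\n", idx); // ~2 probes expected per call
    return 0;
}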