The Arbitrary Substitution Principle

Robin Hillyard
4 min read · Jan 26, 2022


[Image: Cleopatra with the Asp (1630), by Guido Reni]

The Arbitrary Substitution Principle (ASP) concerns code where, if you changed the order of the terms of an expression, the code would behave differently. Alternatively, it concerns code where substituting one of two logically equivalent alternatives for the other would change the behavior.

Time for some examples. A simple example of the first type would be this Java fragment:

boolean a = false;
boolean x = a && evaluateRemotePredicate();

If we swapped the two operands of &&, the result would be the same (false) but the behavior would be very different: as written, evaluateRemotePredicate() (potentially a very slow and expensive invocation) is never called at all. In Java, this effect is known as short-circuiting and is unique to the && and || operators. In functional languages such as Scala, the evaluation rules are determined by the definition of the method itself, so this sort of thing (lazy evaluation via call-by-name) is much more common than in Java.
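As an aside, here is a minimal sketch of how such deferred evaluation can be expressed explicitly in Java, using a Supplier to play the role of a call-by-name parameter (lazyAnd and evaluateRemotePredicate are illustrative names of my own, not library methods):

import java.util.function.Supplier;

public class LazyAnd {
    // Evaluates the second operand only when the first is true,
    // mimicking the short-circuit behavior of && for an arbitrary computation.
    static boolean lazyAnd(boolean first, Supplier<Boolean> second) {
        return first && second.get();
    }

    public static void main(String[] args) {
        // The lambda body never runs because the first operand is false.
        boolean x = lazyAnd(false, () -> {
            System.out.println("expensive predicate evaluated");
            return true;
        });
        System.out.println(x); // prints: false
    }
}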

The second type of ASP situation arises when you have two alternative, but more or less equivalent, solutions to a problem. A classic example is Hibbard deletion from a binary search tree. This technique, or something similar, is required when you need to delete a node (x) that has two children: the parent (y) of the node to be deleted cannot accommodate both of them. If you search for this technique you will see it described in terms of finding x's successor (z) and having it essentially replace x. This works because z, which is the minimum element of the right-hand branch of x, can have at most one child: if it had a left-hand sub-tree, it couldn't also be a minimum. You can find a good description of this method at the site for Algorithms, 4th Edition, by Sedgewick and Wayne: https://algs4.cs.princeton.edu/32bst/ Take note particularly of the truly bizarre behavior of the animation captioned "Each BST contains 150 nodes…".

private Node delete(Node x, Key key) {
    if (x == null) return null;
    int cmp = key.compareTo(x.key);
    if      (cmp < 0) x.left  = delete(x.left,  key);
    else if (cmp > 0) x.right = delete(x.right, key);
    else {
        // Found the node to delete.
        if (x.right == null) return x.left;
        if (x.left  == null) return x.right;
        Node t = x;
        // Start of ASP "violation": arbitrarily prefer the successor.
        x = min(t.right);             // the successor: minimum of the right subtree
        x.right = deleteMin(t.right); // remove the successor from its old position
        x.left  = t.left;
    }
    return x;
}

Note how, in addition to words in the description like successor, we also have the following code references: min, deleteMin, left, right. As the authors ask: "Why not use the predecessor?" Indeed, successor can switch to predecessor, right to left, left to right, min to max, and deleteMin to deleteMax in every one of these situations (we must be consistent, of course). The best behavior is obtained when the choice between the "rightist" version and the "leftist" version is made randomly: always choosing the successor gradually skews the tree, which is exactly the bizarre behavior that the animation mentioned above demonstrates.
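For concreteness, here is a sketch of what that randomized version might look like. This is my own illustration, not code from the book, and it assumes max and deleteMax helpers symmetric to the min and deleteMin used above:

private final java.util.Random random = new java.util.Random();

private Node delete(Node x, Key key) {
    if (x == null) return null;
    int cmp = key.compareTo(x.key);
    if      (cmp < 0) x.left  = delete(x.left,  key);
    else if (cmp > 0) x.right = delete(x.right, key);
    else {
        if (x.right == null) return x.left;
        if (x.left  == null) return x.right;
        Node t = x;
        if (random.nextBoolean()) {
            // "Rightist": replace t with its successor.
            x = min(t.right);
            x.right = deleteMin(t.right);
            x.left  = t.left;
        } else {
            // "Leftist": replace t with its predecessor.
            x = max(t.left);
            x.left  = deleteMax(t.left);
            x.right = t.right;
        }
    }
    return x;
}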

So, how does this help us in our programming? As far as I know, there are no IDE inspections that flag this problem. But programmers should be on the lookout, especially when reviewing a peer's code. Wherever such a choice has been made, whether arbitrarily or deliberately, it should be commented as such. An unjustified choice between two equally valid alternatives should always be questioned as a red flag (a code smell). Did the author recognize that a choice was being made? Why did the author choose as they did? Was it an inherent bias that might have led to a loss of generality or, in the case of Hibbard deletion, poor performance?

Another example relates to the Quick-Union solution to the Union-Find problem, also described by Sedgewick and Wayne: https://algs4.cs.princeton.edu/15uf/ This time, heeding the ASP violation leads directly to a major improvement in performance. In the description of Quick-Union, we read: "then rename one of the components by linking one of these roots to the other." As soon as you read this sentence, your ASP antennae should go up. The key phrase is "one … to the other": no indication is given of how to choose which is "one." Could it be that it actually matters? You bet. The improvement that the authors call Weighted Quick-Union uses the key phrase "and always connect the smaller tree to the larger." Simply by paying attention to tree sizes, we improve the worst-case cost of each operation from O(N) to O(log N).
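Here is a minimal sketch of Weighted Quick-Union, following the book's description (the field and method names are my own):

public class WeightedQuickUnion {
    private final int[] parent; // parent[i] = parent of element i
    private final int[] size;   // size[i] = number of elements in the tree rooted at i

    public WeightedQuickUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every element starts as its own root
            size[i] = 1;
        }
    }

    private int find(int p) {
        while (p != parent[p]) p = parent[p]; // climb to the root
        return p;
    }

    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        // The ASP-aware choice: always link the smaller tree to the larger,
        // which keeps every tree's height at O(log N).
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }
}

The if/else in union is the only place where the two "equivalent" alternatives differ, and it is precisely where the performance is won.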

I will close with one more example, one which makes the difference between being able to solve a problem and not being able to solve it at all. It relates to a LeetCode problem called "Reaching Points." The problem is described in terms of a starting point and an ending point. But writing a solution that proceeds from the start to the end (to which the wording might easily lead you) is a purely arbitrary choice, given that the move function presented can readily be inverted. Without explicitly giving the solution away, let me suggest that the successful solution to this problem depends on recognizing a violation of the ASP and fixing it!
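To make the invertibility concrete without spoiling the rest of the solution, here is a sketch of the move and its inverse (the class and method names are illustrative):

public class ReachingPointsMoves {
    // The two forward moves: (x, y) -> (x + y, y) and (x, y) -> (x, x + y).
    static long[] moveRight(long x, long y) { return new long[] { x + y, y }; }
    static long[] moveUp(long x, long y)    { return new long[] { x, x + y }; }

    // Going backwards there is no choice at all: since x + y is strictly
    // greater than both x and y (for positive coordinates), the larger
    // coordinate must be the one that was just produced.
    static long[] inverseMove(long x, long y) {
        return (x > y) ? new long[] { x - y, y } : new long[] { x, y - x };
    }
}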

I definitely need to come up with a better name for this important principle! Perhaps one of my readers can suggest one.
