SetBang 2: SetBang programming

In Part 1, I introduced SetBang, an esoteric language based on set theory. Here, I’ll write some SetBang programs, and show that it’s Turing Complete.

First, let’s talk about the “mystery operator” that I introduced: *, which is identical to:

(~#1=(_{}{}~,_~\2>\2>_.2>\2>\2>_&{}2>{}),0)

Remember that 2> is a swap sequence with behavior ... X Y -> ... Y X. You’ll see it a lot.

What does this * do? Let’s take it apart. It’s a conditional, noting the outer (), and the else-clause is simple: 0. So its behavior is:

... 0 -> ... 0 0.

In the then-case, we ~# TOS and equality-check against 1, and then go into another conditional based on that result. So, if TOS is a 1-element set, we execute _{}{}~. The _ eats the boolean produced by the =:

... {e} 1 -> ... {e}

and then the two {}'s leave us with:

... U(e)

and the trailing ~ duplicates it, leaving ... U(e) U(e).

In the else-case of the inner conditional, we have at least two elements.

... {e, f, ...} 0.

We _ the boolean and ~ the set {e, f, ...}, then use \2>\2> to extract its elements (remember that the 2>'s are swaps), and _ the remainder set:

... {e, f, ...} e f

We use . to compute the exclusive-or of e and f, and 2> it out of the way. Then we repeat the process using & for the intersection.

... (e . f) (e & f)

With the {} and swaps, we end up at:

... U(e & f) U(e . f)

It’s probably not clear what this does, so let’s look at how it operates when e = {x}, and f = {x, y}:

... 0 -> ... 0 0

... {{x}} -> ... x x

... {{x}, {x, y}} -> ... U({x} & {x, y}) U({x} . {x, y}) = ... x y
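
If it helps to see the recipe outside of SetBang, here's a small Haskell sketch of it (my own model, nothing to do with the actual implementation): hereditarily finite sets as a recursive wrapper around Data.Set, and an unpair that special-cases the singleton {{x}} and otherwise takes U(e & f) and U(e . f).

import qualified Data.Set as S
import Data.Set (Set)

-- A hereditarily finite set: every element is itself such a set.
newtype HF = HF (Set HF) deriving (Eq, Ord, Show)

set :: [HF] -> HF
set = HF . S.fromList

-- U(s): the union of s's elements, which is what each {} computes.
bigU :: HF -> HF
bigU (HF s) = HF (S.unions [ e | HF e <- S.toList s ])

-- The & and . operators: intersection and symmetric difference.
inter, symDiff :: HF -> HF -> HF
inter   (HF a) (HF b) = HF (S.intersection a b)
symDiff (HF a) (HF b) = HF ((a S.\\ b) `S.union` (b S.\\ a))

-- Destructure {{x}, {x, y}} the way * does.
unpair :: HF -> (HF, HF)
unpair (HF s) = case S.toList s of
  []     -> (HF S.empty, HF S.empty)                    -- {}: behaves as ... 0 -> ... 0 0
  [e]    -> (bigU e, bigU e)                            -- {{x}}: the pair (x, x)
  [e, f] -> (bigU (e `inter` f), bigU (e `symDiff` f))  -- the general case worked above
  _      -> error "not an ordered pair"                 -- * is undefined here anyway

-- Example: destructuring (x, y) with x = {} and y = {{}} gives back (x, y).
example :: (HF, HF)
example = unpair (set [set [x], set [x, y]])
  where x = set []; y = set [set []]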

Ordered pairs

The set-theoretic ordered pair (x, y) is represented by {{x}, {x, y}}. The % operator builds ordered pairs and the * command destructures them. That's why % and * exist in the language; but if they weren't provided, they could be built from the other operators. They exist largely to make the language more convenient.

Substitutions like this can, again, be tested at the SetBang repl like so:

S∈tBang> :test %* %(~#1=(_{}{}~,_~\2>\2>_.2>\2>\2>_&{}2>{}),0)
............... All tests passed.

We prefix both expressions with % because we only care about getting identical actions on ordered pairs. (The behavior of * is undefined on sets that aren’t either ordered pairs or empty.)

Using ordered pairs, we can build up linked lists, using {} for nil and (x, y), as defined above, for cons cells. We'll use this in proving that SetBang is Turing Complete.

Arithmetic

SetBang can do arithmetic. Here’s a predecessor macro:

S∈tBang> :macro pred \_#
Stack: {7} {5}
S∈tBang> 6 :pred:
Stack: {7} {5} 5

Here are imperative addition and subtraction functions:

:macro swap 2>

:macro impPlus [:swap:':swap::pred:]_

:macro impMinus [:swap::pred::swap::pred:]_

with the caveat that minus is interpreted to mean limited subtraction (returning 0 when conventional subtraction would yield a negative number). These work, but we have the tools to do better. These macros, after all, use imperative loops. They’re not purely functional and they don’t invoke set theory.

We can get a truer, purer minus using -#, e.g. ... 7 4 -> ... {4, 6, 5} -> ... 3.
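
In Haskell terms (a sketch over plain Data.Set of Ints, standing in for the real hereditarily finite sets), that minus is just set difference followed by cardinality:

import qualified Data.Set as S

-- Limited subtraction: the ordinal n is {0, ..., n-1}, so n \ k = {k, ..., n-1},
-- whose cardinality is n - k, or 0 if k is the larger of the two.
minus :: Int -> Int -> Int
minus n k = S.size (ordinal n S.\\ ordinal k)
  where ordinal m = S.fromList [0 .. m - 1]

-- minus 7 4 == 3, and minus 4 7 == 0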

How do we get a purer addition function? This can be achieved using ordered pairs:

:macro plus 02>{2>~3>%"}3>_12>{2>~3>%"}2>_|#

How does it work? It uses {}-comprehensions to map over each set:

  • X -> {(0, x) for all x ∈ X}
  • Y -> {(1, y) for all y ∈ Y}

and then a union, followed by taking the cardinality, performs the addition.
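
The same idea as a Haskell sketch (with the ordinals again flattened to Ints): tag each side so the union is disjoint, then count.

import qualified Data.Set as S

-- Addition as the cardinality of a disjoint union: tag X's elements with 0
-- and Y's with 1, so nothing collides when the two sets are unioned.
plus :: Int -> Int -> Int
plus m n = S.size (tagged 0 m `S.union` tagged 1 n)
  where tagged t k = S.fromList [ (t :: Int, x) | x <- [0 .. k - 1] ]

-- plus 3 4 == 7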

Cartesian products aren’t very hard to write either.

:macro prod {2>{2>%"}};

It has a set comprehension within another one. That's not a problem. Let's look at how it works. Starting from ... X Y, we go immediately into a loop over the Y-elements. We do a swap and start a loop over the X-elements, which gives us ... y x. The 2> is a swap, % makes the pair, and " puts it in the result set.

The behavior of the inner comprehension is ... y X -> ... y {(x, y) for all x ∈ X}.

One might expect, then, that the behavior of the outer loop would be ... X Y -> ... (X × Y). It's close. The thing to remember, though, is that the side effects on the stack that you might expect to move (and destroy) the X never actually happen. The stack state that exists when a {}-comprehension is entered is reused for each "iteration" of the comprehension.

Thus, it's not possible to dump a set's elements onto the stack using a program like {~"}. If you want to do that, you have to use the imperative []-loop, as in this program: [\2>]_, which iteratively applies \ to a set, leaving its elements on the stack, and then deletes the set once it's empty.

To multiply numbers, there's a simple program that does the job; since later macros will call it, let's register it as times:

:macro times :prod:#

which macroexpands to {2>{2>%"}};#.
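
Sketched in Haskell under the same simplification as before, the product and the multiplication it buys us look like this:

import qualified Data.Set as S

-- Cartesian product of two ordinals-as-sets.
prod :: Int -> Int -> S.Set (Int, Int)
prod m n = S.fromList [ (x, y) | x <- [0 .. m - 1], y <- [0 .. n - 1] ]

-- Multiplication is just the cardinality of the product.
times :: Int -> Int -> Int
times m n = S.size (prod m n)

-- times 6 7 == 42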

Can we divide? Yes, we can. Here’s one way to do it. It turns out to be unbearably slow, but it is mathematically correct:

:macro quot 2>~'3<2>~{3<:times:"}&\_#

Why is it slow? It gives us the following set-theoretic definition of division:

n quot k = #(n' ∩ k*n') - 1, where n' is the successor n + 1 = {0, …, n}; e.g. 19 quot 5 = #({0, …, 19} ∩ {0, 5, 10, 15, …}) - 1 = 4 - 1 = 3

Unfortunately, it’s O(n^3)– worse yet, not in the size of the number, but in the number itself. This is not an efficient division algorithm.
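
Transcribed into Haskell (a sketch of the formula, not of the stack shuffling), the definition reads:

import qualified Data.Set as S

-- n quot k = #(n' ∩ k*n') - 1, where n' = n + 1 = {0, ..., n}
-- and k*n' = {k*i | i ∈ n'}.
quotSB :: Int -> Int -> Int
quotSB n k = S.size (n' `S.intersection` kn') - 1
  where n'  = S.fromList [0 .. n]
        kn' = S.fromList [ k * i | i <- [0 .. n] ]

-- quotSB 19 5 == 3, quotSB 20 5 == 4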

In fact, SetBang carries a persistent danger of inefficiency. Why is that? Well, let's consider what hereditarily finite sets are: rose trees whose nodes contain no information. In Haskell, this could be implemented as follows.

data Set = Empty | Node [Set]

or, equivalently:

RoseTree (), where

data RoseTree a = Leaf a | Branch [RoseTree a]

An implementation that uses shared data (and, as mine does, exploits numeric constants) is required; otherwise, you’ll have exponential storage just to represent the natural numbers. Given that there is no control over choice order (it’s implementation-defined) it is hard to stamp out the risk of unexpected exponential behavior completely (although we will not observe it in any example here).
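
To put a number on that danger: with no sharing at all, the tree for the von Neumann ordinal n has 2^n nodes, since each ordinal contains a fresh copy of every smaller one. A quick Haskell sketch of the count:

-- Tree nodes needed for the ordinal n with no sharing: the empty set is one
-- node, and the ordinal n contains a full copy of every k < n.
nodes :: Integer -> Integer
nodes 0 = 1
nodes n = 1 + sum [ nodes k | k <- [0 .. n - 1] ]

-- map nodes [0..6] == [1,2,4,8,16,32,64], i.e. nodes n == 2^n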

One way to make arithmetic faster would be to use a different encoding than the ordinals. One candidate would be to use bit sets (e.g. 23 = {0, 1, 2, 4}) and write the arithmetical operators on those (as well as conversions both ways). Another would be to use von Neumann indices, where a hereditarily finite set’s index is computed as:

I({}) = 0

I({a, b, …}) = 2^I(a) + 2^I(b) + …

This function I is relatively easy to invert (call its inverse J). For example, we’d represent the number 11 not with {0, 1, …, 10} but with:

J(11) = {J(0), J(1), J(3)}

J(3) = {J(0), J(1)}, J(1) = {J(0)}, J(0) = {}, ergo:

J(11) = {{}, {{}}, {{}, {{}}}}

These sets are far more compact than the ordinals for the same numbers. Arithmetic could be construed to operate on numbers represented in this way, and would then take on a flavor of (much more efficient) binary arithmetic. We won’t be doing that here, though: it’s far too practical.
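
Here's a Haskell sketch of I and its inverse J (the names idx and unidx are mine), using the same recursive set type as in the earlier sketch, repeated so the snippet stands alone:

import qualified Data.Set as S
import Data.Set (Set)
import Data.Bits (testBit)

newtype HF = HF (Set HF) deriving (Eq, Ord, Show)

-- I: the index of a hereditarily finite set.
idx :: HF -> Integer
idx (HF s) = sum [ 2 ^ idx e | e <- S.toList s ]

-- J: the set whose index is n, read straight off n's binary expansion.
unidx :: Integer -> HF
unidx n = HF (S.fromList [ unidx (toInteger b)
                         | b <- [0 .. nbits - 1], testBit n b ])
  where nbits = length (takeWhile (<= n) (iterate (* 2) 1))

-- unidx 11 is the set {{}, {{}}, {{}, {{}}}} above, and idx (unidx 11) == 11.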

You can test for primality in SetBang:

:macro not 0=

:macro divides ~3<2>{2>:times:"};2>?

:macro prime ~2-{2>:divides:}:not:

Is it fast? No. It’s horribly inefficient– in the current implementation, it’s O(n^4), and takes 10 seconds to figure out that 23 is prime, so we can expect it to take a couple of days on 257, and 21 million years on 65,537– but it’s correct.
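
The mathematical content of those macros, sketched in Haskell rather than transcribed from the stack code: d divides n exactly when n turns up among the multiples d*i, and n is prime when no d in {2, …, n-1} divides it.

import qualified Data.Set as S

-- d divides n, tested the set-theoretic way: build a set of multiples of d
-- (indexed by a large enough ordinal) and check n for membership.
divides :: Int -> Int -> Bool
divides d n = n `S.member` S.fromList [ d * i | i <- [0 .. n] ]

-- n is prime iff no d in {2, ..., n-1} divides it.
prime :: Int -> Bool
prime n = n >= 2 && not (any (`divides` n) [2 .. n - 1])

-- prime 23 == True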

If one wished to put the entire set of primes (lazily evaluated) on the stack, one could do so:

${~:prime:(_",__0)}

which, in its full macroexpanded glory, becomes:

${~~2-{2>~3<2>{2>{2>{2>%"}};#"};2>?}0=(_",__0)}

I don’t recommend doing this. If you’re interested in working with large (6+ bit) prime numbers, I recommend representing the numbers in a more efficient way. Of course, that removes from SetBang its delicious impracticality, and suggests that one might use other languages altogether when one needs to work with large primes like 47.

The fact that there are five {}-comprehensions in the macro-less form above suggests that it is O(n^5) to compute the nth prime, and that’s about correct. (It’s slightly worse; because the nth prime is approximately n*log(n), it’s O(n^5*(log n)^4).) This might be one of the few ways in which SetBang is readable: nested {} comprehensions give you an intuitive sense of how cataclysmically inefficient your number-theory code is. And remember that n here is the number itself, and not the size (in, say, bits or digits) of the number.

Data structures

This language obviously isn't the best choice for number theory, so let's move on. With % and * we have the machinery to build up linked lists. Let's do that.

Let’s write some macros.

:macro BIG 3^^\_#

:macro u *2>~:BIG:?(__:BIG:,_')2>%

:macro d *2>\_#2>%

:macro l %

:macro r *

:macro p *2>~!2>*

:macro g *2>_@2>*

:macro w *2>[2>%

:macro x *2>]2>%

What do these do? Well, the first one, BIG, simply puts the number 255 (2^(2^3) – 1) on the stack. That’s the maximum value of a single unsigned byte.

Except for l, the remaining macros assume that TOS will be an ordered pair or {}, so let's consider what might keep that invariant true, in light of the other operators. Note one edge case: * destructures an ordered pair, but on 0 = {} it behaves as ... 0 -> ... 0 0. This suggests that these macros are intended for an environment in which TOS is always a linked list (possibly {}). That is the correct intuition, and we can understand the first six macros in terms of their effects on the stack when TOS is a linked list.

  • u : ... (h, t) -> ... (h', t) where h' = min(h + 1, 255)
  • d : ... (h, t) -> ... (h*, t) where h* = max(h - 1, 0)
  • l : ... a (h, t) -> ... (a, (h, t)) and | (h, t) -> | (0, (h, t))
  • r : ... (h, t) -> ... h t and ... 0 -> 0 0
  • p : ... (h, t) -> ... (h, t) with #h printed to console
  • g : ... (_, t) -> ... (c, t) where c is read from console

It’s worth noting the behavior of l and r in edge cases. The edge case of l is when the stack is deficient, noting that % demands 2 arguments. Because SetBang left-fills with {}’s, the behavior is to add a {} to the linked list at TOS. The edge case of r is when TOS is 0, in which case we end up with another 0.
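
Here's a rough Haskell model of the tape discipline these macros maintain (a sketch under my reading of the semantics): the stack below TOS plays the role of the tape to the left of the head, and the list at TOS holds the current cell followed by everything to its right.

-- The tape: cells to the left of the head (nearest first), then the list at
-- TOS, whose head is the current cell.
data Tape = Tape [Int] [Int] deriving Show

-- "Start with a 0": the list at TOS begins empty.
start :: Tape
start = Tape [] []

-- Reading past either end of the tape produces a fresh zero cell, mirroring
-- the left-fill behavior of % and the ... 0 -> ... 0 0 behavior of *.
hd :: [Int] -> Int
hd (x:_) = x
hd []    = 0

tl :: [Int] -> [Int]
tl = drop 1

u, d, l, r :: Tape -> Tape
u (Tape ls rs) = Tape ls (min 255 (hd rs + 1) : tl rs)  -- clamp at BIG = 255
d (Tape ls rs) = Tape ls (max 0   (hd rs - 1) : tl rs)  -- clamp at 0
l (Tape ls rs) = Tape (tl ls) (hd ls : rs)              -- cons the cell below onto TOS
r (Tape ls rs) = Tape (hd rs : ls) (tl rs)              -- uncons: head goes below TOS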

The remaining two macros, w and x, might look a little bit odd. Standing alone, neither is legal code. That's OK, though. SetBang doesn't require that macros expand to legal code, and as long as the w's and x's are balanced, the expanded program will be legal. So let's consider the expansion of wSx, where S is a string of SetBang code. The behavior of wSx, then, is that of a SetBang []-loop, but with the head of TOS (rather than TOS itself) used to decide whether to continue the loop.

We can now prove that SetBang is Turing Complete. Brainfuck is Turing Complete, and we can translate any Brainfuck program to a SetBang program as follows:

  • Start with a 0,
  • replace all instances of +, -, >, <, ., ,, [, and ] with :u:, :d:, :l:, :r:, :p:, :g:, :w:, and :x:, respectively. 

Therefore, if for some reason you wish not to write in SetBang, you can always write your program in Brainfuck and transliterate it to SetBang!
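
To underline how mechanical this is, here's a Haskell sketch of the transliterator (assuming the macros above have been defined in the SetBang session that runs the output):

-- Transliterate Brainfuck to SetBang, using the table above. Anything that
-- isn't a Brainfuck command is dropped, since Brainfuck treats it as a comment
-- but SetBang would try to execute it.
bfToSetBang :: String -> String
bfToSetBang prog = "0" ++ concatMap tr prog   -- "start with a 0"
  where
    tr c = maybe "" id (lookup c table)
    table = zip "+-><.,[]" [":u:", ":d:", ":l:", ":r:", ":p:", ":g:", ":w:", ":x:"]

-- e.g. bfToSetBang "+++[-]" evaluates to "0:u::u::u::w::d::x:"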

This proves that SetBang is Turing Complete, but that shouldn’t surprise us. It’s a fairly complex language, using every punctuation mark on the keyboard as a command. Powerful commands like % and * feel like cheating, and clearly the numerical commands aren’t all necessary: we can always write 0'''' instead of 4, for example.

So how much can we cut and still have a usable language? Which operators are necessary, and which ones can we do away with? And as we cut away more and more from the language, what does the code end up looking like? This is what we’ll focus on in Part 3.