SetBang 3: Fast(er) math in SetBang

SetBang, even when limited to the finite world, is inherently “inefficient”. We might like to live in a world where straightforward set operations can be performed quickly. Unfortunately, that’s not the case. Short programs with no obvious infinite loops can still take a long time.

Let’s restrict ourselves to hereditarily definite sets. A definite set in SetBang is that is derived from other finite sets, and hereditarily definite means that it’s a definite set whose elements are all (hereditarily) definite sets.

There are many finite and probably-finite sets that aren’t definite. For example, if we admit this macro:

:macro magic #003<[003<[~#1?(_\_\_2>'2>,__2>_12>0)]_2>(4<~5<2>/4>'4>_,4<'4>_)]_;~22>?(__0,_2003<[003<[~#1?(_\_\_2>'2>,__2>_12>0)]_2>(4<~5<2>/4>'4>_,4<'4>_)]_;[~~03>[\3<~3<[\_2>{'"}2>]_4<[~3<~4<2>4<.3>&{'"}]_3>2>]__3<~4>~3<~4<2>4<=(;;,_2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)(_1))(_~3<~4<2>4<(03>~{}3<~{}3<-#'[\_#3<~3<~4>[\_2>{'"}2>]_~5<~6>~3<~4<2>4<=(;;,_2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)(_1))(_3<~4>6<2>/5>4<2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)3<,__3>)]_;,2>)(__1[~3<~4<2>4<.3>&{'"}]_,____00),___10)]_)

we can create these sets:

:macro mersenne ${^\_#~:magic:(_",__0)}

:macro fermat ${^^'~:magic:(_",__0)}

:macro maybebot :fermat:`5_`1; 

What does the magic macro do? Its behavior is:

... N -> ... (1 if N is prime, 0 if N is composite).

As a result, mersenne produces the set of Mersenne primes, which is possibly infinite although that remains unknown, and fermat produces that of Fermat primes, of which there are probably exactly 5 (although there may be infinitely many). The set maybebot, derived from fermat, is nonterminating empty (“bottom”) if there are only 5 Fermat primes, and a one-element set if there are 6 or more. The last of these is a set that is trivially, provably, finite; but its derivation (ultimately from $) makes it a risk of non-termination. It’s finite, but not definite.

If we remove $ from SetBang, we can make every set not only definite but hereditarily definite (meaning that the elements are definite sets as well). We still have the potential for non-terminating computations due to [] loops, but otherwise, even if we use {}, everything will terminate.

Is SetBang still “inefficient”? Yes. So let’s talk about what an efficient SetBang would look like, and assume that we’re only working hereditarily definite sets (i.e. there is no $). An efficient SetBang would:

  • canonicalize sets so that set equality checks (=) can be performed in bounded time as a function of the size of the set.
  • perform membership checks (?) in bounded time.
  • represent structured large sets (e.g. 8^#^, the set of all subsets of {0, …, 255}) in an efficient way that doesn’t require storing the entire set.
    • The example above has 2^256 elements. We can’t store it in its entirety but, because it’s a highly structured large set, we can easily check for membership in it.
  • perform primitive operations (namely, \and /) in constant or logarithmic time.
  • “execute” {} comprehensions in a way (strict or eager) that doesn’t take too much time or space upfront, but that allows the resulting sets to be used efficiently, assuming that the code inside the comprehensions runs efficiently.

Is it possible, on some physical device possibly unlike an existing computer, to make an efficient SetBang, even excluding non-definite sets by removing $? It’s extremely unlikely, as we’ll see below.

Before we do that, we note that ordinals are an extremely inefficient way to represent numbers, so let’s use binary encoding to represent them as sets of ordinals instead. (An implementation, of course, can handle small finite ordinals as efficiently as if they were constants, urelements, or near equivalently, machine integers.) So we represent the number 37 as {0, 2, 5} (where the numerals on the right-hand side are still ordinals) as opposed to {0, …, 36}.

We could go further than binary and use von Neumann indices (or “hereditary binary” representations) defined like so:

I({}) = 0

I({a, b, …}) = 2^a + 2^b + …

This creates a bijection between the natural numbers and the hereditarily finite sets. Letting J be I’s inverse, we’d represent 37 as:

J(37) = {J(0), J(2), J(5)} = {{}, {J(1)}, {J(0), J(2)}} = {{}, {{J(0)}}, {{}, {J(1)}}} = {{}, {{{}}}, {{},{{{}}}}}

Why don’t we do this? In practice, I don’t consider it worth it. The multiple recursion of the index encoding and decoding complicate code considerably. Besides, an implementation can hard-code the first 256 ordinal constants and get the same efficiency (for numbers below 2^256) that we would have as if we treated them as urelements. While we gain efficient storage of certain “sparse” numbers (e.g. 2^(2^(2^(2^7 + 1))) + 23) by using hereditary binary, we complicate our algorithms (by, among other things, replacing single recursion with multiple recursion, requiring unpredictable amounts of stack space) and we still can’t do a lot with them. It does not enable us to make tractable operations out of those that would otherwise be intractable (e.g. precise arithmetic on such massive numbers).

With the binary representation, we have a compact addition function:

S∈tBang> :macro addbinary [~3<~4<2>4<.3>&{'"}]_
Stack: {0, 1, 6, 3, 5} {7, 4, 5, 8}
S∈tBang> :addbinary:
Stack: {0, 1, 4, 3, 9}

We see it correctly adding 107 + 432 = 539. The way it works is by repeatedly executing the behavior:

... X Y -> ... (X xor Y) (X and Y) -> ... (X xor Y) {z + 1 : z ∈ X and Y}

until the latter set (TOS) is zero/empty. This propagates the carries in the addition function, and it terminates when there are no more carries. It may not be the most efficient job, but it does the job in polynomial time (of the size of the sets, or the bit-sizes of the numbers).

With this, we can now illustrate concretely that our dream of efficient (as defined above) SetBang is probably not possible. Consider the following code.

^{:sumbinary:=}(1,0);;

with sumbinary (using a loop, but guaranteed to terminate in polynomial time) defined like so:

:macro sumbinary 02>[\3<:addbinary:2>]_

and the former’s behavior is

S X -> 1 if some subset Y of X has Sum(Y) = S, 0 if else.

This is the subset-sum problem, known to be NP-complete. Ergo, if P is not NP, an efficient SetBang implementation is not possible. Even by eschewing $, and restricting ourselves to the most trivial measurement of a set (“Is it empty?”) it will always be easy to write terminating but intractable programs.

In short, one could use laziness to make the {} operator “efficient” and simply not do gnarly computations until needed, and there’s room for cleverness around equality and membership checks (which often don’t require computation of the entire set) but even an empty-check or a \ creates a risk, even in the definite world, of exponential-or-worse time cost.

These results shouldn’t be surprising. Power sets are structured and one can therefore check quickly that, for example, {{{{5}}}} ∈ P(P(P(P(N)))), but they’re still very big. It’s possible, in principle, that SetBang could be implemented in such a way that a large number of complex queries happen efficiently. It’s just not possible, even with $ and [] taken out, to make SetBang fast in all cases. The language is too powerful.

Although there is much FUD around programming languages and speed, SetBang is an example of a language that is literally “not fast”.

Fast(er) math

On Github, I’ve included a library of faster math functions in resources/fastmath.sbg. These operate on numbers that have been converted to binary, like so:

Stack: 37
S∈tBang> :tobinary:
Stack: {0, 2, 5}

If you actually want to convert such numbers back into the more familiar ordinals, you can use frombinary. We’ve seen addbinary, and there isn’t much more to the other arithmetic operators: they’re standard “school book” implementations of subtraction, multiplication, and division-with-remainder on binary numbers. They improve the performance of our arithmetic considerably.

Recall that we previously had an implementation that would take 21 million years to determine that 65,537 was prime. We can now do it in about 30 seconds.

S∈tBang> 00/9'''''''/
Stack: {0, 16}
S∈tBang> :primebinary:
Stack: 1

That’s still a few hundreds of thousands (if not millions) of times slower than, say, trial division in C, but it’s an improvement over what it was.

Here’s a program that (quasi-)efficiently puts the set of all primes on the stack:

S∈tBang> ${~:tobinary::primebinary:(_",__0)}
Stack: {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53, ...}

In fact, full SetBang allows one to put nearly any describable mathematical object on the stack. You can’t always do much with it, of course. Depending on what it is, you might not be able to do anything with it, as it could be:

{n where n is a Gödel number for a proof of theorem T}

which might equal {} when T is provably false (assuming a much smarter implementation of SetBang) but is “bottom” when T is unprovable.

SetBang 2: SetBang programming

In Part 1, I introduced SetBang, an esoteric language based on set theory. Here, I’ll write some SetBang programs, and show that it’s Turing Complete.

First, let’s talk about the “mystery operator” that I introduced: *, which is identical to:

(~#1=(_{}{}~,_~\2>\2>_.2>\2>\2>_&{}2>{}),0)

Remember that 2> is a swap sequence with behavior ... X Y -> ... Y X. You’ll see it a lot.

What does this * do? Let’s take it apart. It’s a conditional, noting the outer (), and the else-clause is simple: 0. So its behavior is:

... 0 -> ... 0 0.

In the then-case, we ~# TOS and equality-check against 1, and then go into another conditional based on that result. So, if TOS is a 1-element set, we execute _{}{}. The eats the boolean produced by the =:

... {e} 1 -> ... {e}

and then we do two {}{}, leaving us with:

... U(e).

In the else-case of the inner conditional, we have at least two elements.

... {e, f, ...} 0.

We the boolean and ~ the set {e, f, ...} and then use \2>\2> to extract its elements (remember that 2> are swaps) then _the remainder set.

... {e, f, ...} e f

We use . to compute the exclusive-or of e and f, and 2> it out of the way. Then we repeat the process using & for the intersection.

... (e . f) (e & f)

With the {} and swaps, we end up at:

... U(e & f) U(e . f)

It’s probably not clear what this does, so let’s look at how it operates when e = {x}, and f = {x, y}:

... 0 -> 0 0

... {{x}} -> x x

... {{x}, {x, y}} -> ... ({x} & {x, y}) ({x} . {x, y}) = ... x y

Ordered pairs

The set-theoretic ordered pair (x, y) is represented by {{x}, {x, y}}. The % operator builds ordered pairs and the * command destructures them. That’s why they exist in the language but, if they weren’t provided, they could be built from the other operators. They exists largely to make the language more convenient. 

Substitutions like this can, again, be tested at the SetBang repl like so:

S∈tBang> :test %* %(~#1=(_{}{}~,_~\2>\2>_.2>\2>\2>_&{}2>{}),0)
............... All tests passed.

We prefix both expressions with % because we only care about getting identical actions on ordered pairs. (The behavior of * is undefined on sets that aren’t either ordered pairs or empty.)

Using ordered pairs, we can build up linked lists, using {} for nil and (x, y), as designed above, for cons cells. We’ll use this in proving that SetBang is Turing Complete.

Arithmetic

SetBang can do arithmetic. Here’s a predecessor macro:

S∈tBang> :macro pred \_#
Stack: {7} {5}
S∈tBang> 6 :pred:
Stack: {7} {5} 5

Here are imperative addition and subtraction functions:

:macro swap 2>

:macro impPlus [:swap:':swap::pred:]_

:macro impMinus [:swap::pred::swap::pred:]_

with the caveat that minus is interpreted to mean limited subtraction (returning 0 when conventional subtraction would yield a negative number). These work, but we have the tools to do better. These macros, after all, use imperative loops. They’re not purely functional and they don’t invoke set theory.

We can get a truer, purer minus using -#, e.g. ... 7 4 -> ... {4, 6, 5} -> ... 3.

How do we get a purer addition function? This can be achieved using ordered pairs:

:macro plus 02>{2>~3>%"}3>_12>{2>~3>%"}2>_|#

How does it work? It uses {}-comprehensions to map over each set:

  • X -> {(0, x) for all x ∈ X}
  • Y -> {(1, y) for all y ∈ Y}

and then a union, followed by a cardinality check, can be used to perform the addition.

Cartesian products aren’t very hard to write either.

:macro prod {2>{2>%"}};

It has a set comprehension within another one. That’s not a problem. Let’s look at how it works. Starting from ... X Y, we go immediately into a loop over the Y-elements. We do a swap and start a loop on the X-elements, and have ... y x. The 2>is a swap, and % makes the pair, and "puts it in a set.

The behavior of the inner comprehension is ... y X -> ... y {(x, y) for all x ∈ X}.

One might expect, then, that the behavior of the outer loop should be ... X Y -> ... (X x Y). It’s close. The thing to remember though is that side effects on the stack that would be expected to move (and destroy) the X, such effects never happen. The stack state that exists when a % is executed is used for each “iteration” of the {} comprehension.

Thus, it’s not possible to populate the stack with a set using a program like {~"}. If you want to do that, you have to use the imperative []-loop, like in this program:[\2>]_, which iteratively applies \ to a set, and then deletes it when it’s empty.

To multiply numbers, there’s a simple program that does the job:

:prod:#, which macroexpands to {2>{2>%"}};#.

Can we divide? Yes, we can. Here’s one way to do it. It turns out to be unbearably slow, but it is mathematically correct:

:macro quot 2>~'3<2>~{3<:times:"}&\_#

Why is it slow? It gives us the following set-theoretic definition of division:

n quot k = #(n’ ∩ k*n’) – 1, e.g. 19 div 5 = #({0, … 19} ∩ {0, 5, 10, 15…}) – 1 = 4 – 1 = 3

Unfortunately, it’s O(n^3)– worse yet, not in the size of the number, but in the number itself. This is not an efficient division algorithm.

In fact, SetBang carries a persistent danger of inefficiency. Why is that? Well, let’s consider what hereditary finite sets are: Rose trees whose nodes contain no information. In Haskell, this could be implemented like what follows.

data Set = Empty | Node [Set]

or, equivalently:

RoseTree (), where

data RoseTree a = Leaf a | Branch [RoseTree a]

An implementation that uses shared data (and, as mine does, exploits numeric constants) is required; otherwise, you’ll have exponential storage just to represent the natural numbers. Given that there is no control over choice order (it’s implementation-defined) it is hard to stamp out the risk of unexpected exponential behavior completely (although we will not observe it in any example here).

One way to make arithmetic faster would be to use a different encoding than the ordinals. One candidate would be to use bit sets (e.g. 23 = {0, 1, 2, 4}) and write the arithmetical operators on those (as well as conversions both ways). Another would be to use von Neumann indices, where a hereditarily finite set’s index is computed as:

I({}) = 0

I({a, b, …}) = 2^a + 2^b + …

This function I is relatively easy to invert (call its inverse J). For example, we’d represent the number 11 not with {0, 1, …, 10} but with:

J(9) = {J(0), J(1), J(3)}

J(3) = {J(0), J(1)}, J(1) = {J(0)}, J(0) = {}, ergo:

J(11) = {{}, {{}}, {{}, {{}}}}

These sets are far more compact than the ordinals for the same numbers. Arithmetic could be construed to operate on numbers represented in this way, and would then take on a flavor of (much more efficient) binary arithmetic. We won’t be doing that here, though: it’s far too practical.

You can test for primality in SetBang:

:macro not 0=

:macro divides ~3<2>{2>:times:"};2>?

:macro prime ~2-{2>:divides:}:not:

Is it fast? No. It’s horribly inefficient– in the current implementation, it’s O(n^4), and takes 10 seconds to figure out that 23 is prime, so we can expect it to take a couple of days on 257, and 21 million years on 65,537– but it’s correct.

If one wished to put the entire set of primes (lazily evaluated) on the stack, one could do so:

${~:prime:(_",__0)}

which, in its full macroexpanded glory, becomes:

${~~2-{2>~3<2>{2>{2>{2>%"}};#"};2>?}0=(_",__0)}

I don’t recommend doing this. If you’re interested in working with large (6+ bit) prime numbers, I recommend representing the numbers in a more efficient way. Of course, that removes from SetBang its delicious impracticality, and suggests that one might use other languages altogether when one needs to work with large primes like 47.

The fact that there are five {}-comprehensions in the macro-less form above suggests that it is O(n^5) to compute the nth prime, and that’s about correct. (It’s slightly worse; because the nth prime is approximately n*log(n), it’s O(n^5*(log n)^4).) This might be one of the few ways in which SetBang is readable: nested {} comprehensions give you an intuitive sense of how cataclysmically inefficient your number-theory code is. And remember that n here is the number itself, and not the size (in, say, bits or digits) of the number.

Data structures

With % and * we have the machinery to build up linked lists. Let’s do that. This language obviously isn’t the best choice for number theory.

Let’s write some macros.

:macro BIG 3^^\_#

:macro u *2>~:BIG:?(__:BIG:,_')2>%

:macro d *2>\_#2>%

:macro l %

:macro r *

:macro p *2>~!2>*

:macro g *2>_@2>*

:macro w *2>[2>%

:macro x *2>]2>%

What do these do? Well, the first one, BIG, simply puts the number 255 (2^(2^3) – 1) on the stack. That’s the maximum value of a single unsigned byte.

Except for l, the remaining macros assume that TOS will be an ordered pair or {}, so let’s consider what might make that always true, in light of the other operators. To take note of an edge case, remember that * destructures an ordered pair, until we get to 0 = {}, when it behaves as ... 0 -> ... 0 0. This might suggest that these macros are intended for an environment in which TOS is always a linked list (possibly {}). That would be the correct intuition, and we can understand the first six operators in terms of their effects on the stack when TOS is a linked list.

  • u : ... (h, t) -> ... (h', t) where h' = min(h + 1, 255)
  • d : ... (h, t) -> ... (h*, t) where h* = max(h - 1, 0)
  • l : ... a (h, t) -> ... (a, (h, t)) and | (h, t) -> | (0, (h, t))
  • r : ... (h, t) -> ... h t and ... 0 -> 0 0
  • p : ... (h, t) -> ... (h, t) with #h printed to console
  • g : ... (_, t) -> ... (c, t) where c is read from console

It’s worth noting the behavior of l and r in edge cases. The edge case of l is when the stack is deficient, noting that % demands 2 arguments. Because SetBang left-fills with {}’s, the behavior is to add a {} to the linked list at TOS. The edge case of r is when TOS is 0, in which case we end up with another 0.

The remaining two macros, wand x, might look a little bit odd. Standing alone, neither is legal code. That’s OK, though. SetBang doesn’t require that macros expand to legal code, and as long as ws and xs are balanced, it will generate legal code. So let’s consider the expansion of wSx, where S is a string of SetBang code. The behavior of wSx, then, is a SetBang []-loop, but with the head of TOS (rather than TOS itself) being used to make the determination about whether to continue the loop.

We can now prove that SetBang is Turing Complete. Brainfuck is Turing Complete, and we can translate any Brainfuck program to a SetBang program as follows:

  • Start with a 0,
  • replace all instances of +, -, >, <, ., ,, [, and ] with :u:, :d:, :l:, :r:, :p:, :g:, :w:, and :x:, respectively. 

Therefore, if for some reason you wish not to write in SetBang, you can always write your program in Brainfuck and transliterate it to SetBang!

This proves that SetBang is Turing Complete, but that shouldn’t surprise us. It’s a fairly complex language, using every punctuation mark on the keyboard as a command. Powerful commands like % and * feel like cheating, and clearly the numerical commands aren’t all necessary: we can always write 0'''' instead of 4, for example.

So how much can we cut and still have a usable language? Which operators are necessary, and which ones can we do away with? And as we cut away more and more from the language, what does the code end up looking like? This is what we’ll focus on in Part 3.

 

SetBang 1: Toward a more unreadable esolang

SetBang (or S∈tBang) is a language based on the foundation of all mathematics: set theory. It’s esoteric by design, so I wouldn’t use it for anything where performance, readability, or programmer sanity were requirements. However, it’s probably great for job security.

Getting SetBang

You’ll need Leiningen (Clojure’s package manager) to run SetBang. SetBang lives on this repository.

“Hello, World!” in SetBang.

The first thing one does in any programming language is write “Hello, World!”. In SetBang, that can achieved as follows.

  1. Open a new file called helloworld.sbg.
  2. In that file, type in the following short program:6^#~8[2>'2>\_#]_!5^#[2>'2>\_#]_~5[2>'2>\_#]_!~9'''  [2>'2>\_#]_~~!!3[2>'2>\_#]_~!5^#~9'''[2>'2>\_#]_!!~8    ][2>'2>\_#]_!~!~3[2>'2>\_#]_!~3[2>\_#2>\_#]_!9''[2>\_#2>\_#]_!5^#'!9'!___
  3. Run ./setbang helloworld.sbg at the command line.

Your output should be:

Hello, world!

If you don’t feel like typing that, the program is provided at resources/hello_nomacro.sbg.

It may not be clear, from the code above, what SetBang is doing. In fact, that particular program doesn’t use any set theory at all. Don’t worry: both the language and the theory behind it will become clearer over time.

The Stack

Every SetBang program runs with a stack. This is the most accessible form of state in SetBang. To see how it works, run ./setbangat the command line. Turn :numeric(a configuration option) off like so:

S∈tBang> :numeric off
Stack:

Right now, the stack is empty. We introduce the first fundamental operator of SetBang: 0. It’s a program that pushes a value, the empty set, on the stack.

S∈tBang> 0
Stack: {}
S∈tBang> 000
Stack: {} {} {} {}

We’ll describe this as 0 having the behavior ... -> ... 0. The program 000 has the behavior ... -> ... {} {} {}. It puts 3 empty sets on the stack.

Empty sets aren’t terribly exciting, so we’ll use another command, /, whose behavior is ... X e -> ... (X ∪ {e}).

S∈tBang> /
Stack: {} {} {{}}
S∈tBang> /
Stack: {} {{{}}}
S∈tBang> /
Stack: {{{{}}}}

If you want to drop the top-of-stack (“TOS”) use _, whose behavior is ... X -> .... You can duplicate TOS with ~, whose behavior is  ... X -> ... X X.

S∈tBang> _
Stack:
S∈tBang>0~
Stack: {} {}
S∈tBang> /
Stack: {{}}

There’s an additional command ', which behaves identically to ~/. Its behavior is ... X -> ... X  ∪ {X} . It’s not strictly necessary, but it can make SetBang coding easier, so it’s included.

S∈tBang> _
Stack:
S∈tBang> 0'
Stack: {{}}
S∈tBang> ~/
Stack: {{}{{}}}
S∈tBang> '
Stack: {{}{{}}{{}{{}}}}

One might be tempted to ask: so what are these things sets of? That’s the beauty of SetBang: just sets! The only values in SetBang are sets. It’s the ultimate unityped language, as there are no urelements at all. SetBang lives strictly within the Von Neumann universe.

Of course, this can get unwieldy. For example, execute the program  ''' on the stack above, and you’ll get:

S∈tBang> '''
Stack: {{}{{}}{{}{{}}{{}{{}}{{}{{}}}}{{}{{}}}}{{}{{}}{{}{{}}}}{{}{{}}}{{}{{}}{{}{{}}{{}{{}}{{}{{}}}}{{}{{}}}}{{}{{}}{{}{{}}}}{{}{{}}}}}

That’s not very human-readable. It turns out that a certain class of sets get a lot of use. They’re the natural numbers, defined like so:

  • {} is a natural number and often called “0”.
  • if n is a natural number, then n ∪ {n}, also known as n’ (or S(n) or n+1), is a natural number.

If you want your natural numbers visualized in the more familiar notation, use :numeric on.

S∈tBang> :numeric on
Stack: 6
S∈tBang> '''
Stack: 9

Because SetBang is unityped, we’re going to use 1 = {0} = {{}} as synonymous with “True” and 0 = {} as synonymous with “False”.

There are commands 1, 2, … , for putting some of the more commonly used natural numbers on the stack. However, don’t be fooled into thinking of them as values. Values don’t exist in SetBang, and there are no literals. They’re commands that do specific things, and 10 does not place the number 10 on the stack, because it’s a followed by a . Instead, it places a {{}} (“1”) and a {} (“0”) on the stack. We’ll discuss tools for creating large numbers later.

S∈tBang> 10
Stack: 9 1 0

The other numerical commands aren’t necessary to write SetBang, of course. One can always use 0'''instead of 3, for example.

Another operator of interest is the cardinality operator, #, whose behavior is ... X -> ... #X where #X is the size of X. Let’s build up the set {2, 3, 5, 7}:

S∈tBang> 02/3/5/7/
Stack: 9 1 0 {7, 3, 2, 5}

We use #and we get back 4. Then, to show that it’s identical to {0, 1, 2, 3}, we use ?, whose behavior is ... X e -> (1 if e ∈ X, 0 if else). That tells us that 2 ∈ 4, as we expect.

S∈tBang> #
Stack: 9 1 0 4
S∈tBang> 2?
Stack: 9 1 0 1

Crucial in stack manipulation are the > and <commands, which behave as follows.
> : ... X1 X2 ... Xk K -> ... Xk X1 ... Xk-1 where k = #K (i.e. K's cardinality).

< : ... X1 X2 ... Xk K -> ... X2 X3 ... Xk X1

Thus, for example >would have the behaviors:

... X Y 2 -> ... Y X

... X Y Z 3 -> ... Z X Y

... X Y Z W 4 -> ... W X Y Z

The 2>pattern is commonly used to swap the top two values on the stack and, of course, 2<has the same effect.

S∈tBang> 2>
Stack: 9 1 1 0
S∈tBang> 3<
Stack: 9 1 0 1
S∈tBang> 4<
Stack: 1 0 1 9
S∈tBang> 7 3
Stack: 1 0 1 9 7 3
S∈tBang> 6>6>5<
Stack: 7 1 0 1 9 3
S∈tBang> >
Stack: 7 1 9 0 1

There’s a command ;whose behavior is ... X Y -> ... Y. If it weren’t provided, it could be simulated with 2>_.

For one last thing before getting into the theory that motivated the language, I’ll introduce conditional execution. The block (code)executes the program in codeif and only if TOS is non-empty, and (then,else)will execute the program thenif TOS is non-empty and elseif TOS is an empty set.

S∈tBang> (___34,9)
Stack: 7 1 3 4
S∈tBang> 0(9,___)
Stack: 7 1
S∈tBang> (80)(9)
Stack: 7 1 8 0

How much can we do with what we have so far? Surprisingly, a lot. However, let’s motivate the language with some set theory by introducing more operators, those justified by the ZFC system of set theory.

ZFC and SetBang

I won’t reiterate the axioms, but I’ll explain what they mean and what they contribute to SetBang.

Extensionality

The Axiom of Extensionality defines what it is for sets to be the same. Two sets are equal if they contain the same elements. If they are equal, they’re members of the same sets. This also implies the nonexistence of urelements by stating that there’s only one thing (namely, the empty set) that has no members.

This is realized in SetBang with the operator =. Its behavior is:

... X Y -> ... (1 if X = Y, 0 if X /= Y)

There is one caveat: equality checking can take a long time. Specifically, it can take infinitely long, as we’ll see when we deal with infinite sets. It behaves well on finite sets, though.

S∈tBang> 02/3/5/7/ 4
Stack: {7, 3, 2, 5} 4
S∈tBang> =
Stack: 0

Of course, {7, 3, 2, 5} is not equal to 4, so this returns false/0. Equality is independent of ordering, hence:

S∈tBang> _
Stack:
S∈tBang> 02/3/5/7/ 07/3/2/5/
Stack: {7, 3, 2, 5} {7, 3, 2, 5}
S∈tBang> =
Stack: 1

Pairing

Pairing is the quintessential building axiom in ZFC set theory. It says that for any sets x, y, there is also a set {x, y} containing exactly them (and no other elements). Pairing is implemented using the +command, with behavior:

... X Y -> ... {X, Y}.

For user convenience, we have some related commands:

" : ... X -> {X}

% : ... X Y -> {{X}, {X, Y}}

Both of these can be defined in terms of +:

" is ~+

% is 2>~~+3>++

The latter of these may be hard to believe. One can work it out at the REPL.

S∈tBang> 3 8 %
Stack: {{3}, {3, 8}}
S∈tBang> 3 8 2>~~+3>++
Stack: {{3}, {3, 8}} {{3}, {3, 8}}
S∈tBang> =
Stack: 1

For more assurance, one can try it out using QuickCheck-like generative testing, via the :testdirective.

S∈tBang> :test % 2>~~+3>++
............... All tests passed.
Stack: 1

This gives us a high degree of confidence that the programs behave identically. To see a case of failure, let’s do one that ought to fail:

S∈tBang> :test #62>? _1
.........
Test #9 FAILED!

Stack was: Stack: {{{0, 3}, 2}, 4} {4, {0, 5}} {7, 2} 9
Program #1: #62>?
Result #1 Stack: {{{0, 3}, 2}, 4} {4, {0, 5}} {7, 2} 0

Program #2: _1
Result #2 Stack: {{{0, 3}, 2}, 4} {4, {0, 5}} {7, 2} 1

Noting that 2>is a swap and that ?is comparison on natural numbers, the behavior of 6#2>? is:

... X -> ... (1 if #X < 6, 0 if #X > 6)

This is why the first nine tests (numbering starts at zero, of course) pass– that program is identical to _1 in the simpler cases– but #9 fails. Although the complexity parameter is hard-coded right now, I plan to change that.

If you want to test something against the empty string, use:test with one argument, like so.

S∈tBang> :test ~_
............... All tests passed.

Note that ~_ is a no-op; it duplicates TOS, then drops it.

Infinity

The Axiom of Infinity provides the ZFC system with an infinite set, ω, defined such that:

  • 0 = {} ∈ ω.
  • if x ∈ ω, then x’ = x ∪ {x}.

In SetBang, you can put infinite sets on the stack. The $command achieves that. It places, on the stack, the set ω = {0, 1, 2, 3, …}.

S∈tBang> $
Stack: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...}
S∈tBang> $
Stack: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...} {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...}

Much larger infinities can be approached in SetBang. Of course, computers are finite, so doing any work that requires the entire set is impossible. We can use it, still, because it’s realized lazily. Let’s generate the natural number 512 using 9^#(to be explained later) and do a membership check in that infinite set.

S∈tBang> $
Stack: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...}
S∈tBang> 9^#
Stack: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...} 512
S∈tBang> ?
Stack: 1

It’s there. Now, here’s a bit of bad news. In SetBang, ?and =are extremely “dumb”. If you use ?to check for membership in an infinite set and what you’re looking for isn’t there, SetBang will search forever. (It exits quickly in the “true” case). If you’re comparing sets for equality, SetBang will keep working until it finds a difference, and that can only happen if one set is finite. SetBang will quickly resolve $0=(they’re not equal) but $$=, while clearly “true” (ω = ω) will never terminate.

Why is this? Determining equality of sets, at the infinite level, is undecidable in the general case. There are tricks that could be used to make sets “smarter”, but I’ve chosen to make SetBang (designed to be an esoteric language, and not a practical tool) predictably dumb instead of unpredictably smart. We’ll show ways to get around this, later on.

Specification, Replacement, and Union

The Axiom Schema of Specification says that any describable subset of a set is also a set. As specified, it refers to logic formulas (free in one variable) like

(∃y)(∃z) (y∈x)⋀(z∈x)⋀~(y=z)

which is free in x and corresponds to “x has at least 2 elements.” SetBang doesn’t use symbolic logic per se, but it uses code objects (to be described) in their stead. Since it allows the pulling of subsets, it’s the filter (as in map, filter, and reduce) of set theory.

The Axiom Schema of Replacement says that the image of a set under a function of a set is also a set. This is the map of set theory.

Finally, the Axiom of Union allows one to take the union of a set (the set of all elements of its elements) and call that a set. This is an unordered, limited reduce that exists in set theory.

In short:

S is a set, p is a predicate and f is a function.

Comprehension: {s : s ∈ S and p(s)} is a set.

Replacement: {f(s) : s ∈ S} is a set.

Union: {x : x ∈ s for some s ∈ S} is a set.

We’re able to unify all three of these with the {}comprehension (or set comprehension). If codeis a program whose behavior is ... x -> ... f(x), then {code}has the behavior:

Y1 ... YN X -> ... Union(f(x) for all x ∈ X).

where Y1 Y2 ... YNis the stack at the time the {}is encountered.

Set comprehensions may be lazy, and will have to be so if applied to infinite sets. The stack used (the Yiabove) will be the stack as it was when the lazy set was created, which may be a different time from when its elements are realized. For this reason, it’s best that code sequences put inside {} comprehensions not contain side effects, and we make it a rule that they’re not allowed to. Side effects on the stack will be ignored, and I/O becomes unpredictable.

In a language like Haskell or Lisp, you can simulate map, concat and filterusing the more general mapcat, like so:

map f xs = mapcat (\x -> [f x]) xs

filter f xs = mapcat (\x -> if f x then [x] else []) xs

concat xs = mapcat identity xs

We use the same principles to implement replacement (map), specification (filter), and union in SetBang. If  is code whose behavior ... x -> f(x) then

map F --> {F"}

(Remember that the behavior of is ... X -> ... {X}. For example:

Stack: {{15, 5, 10}, {7, 13, 3, 2, 11, 5}, {0, 1, 4, 9}}
S∈tBang> ~{#"}
Stack: {{15, 5, 10}, {7, 13, 3, 2, 11, 5}, {0, 1, 4, 9}} {6, 3, 4}

and

filter F --> {~F(_",__0}}

like so with F = 5?, a program returning 1(true) if 5 ∈ X, where X is TOS.

Stack: {{0, 1, 4, 9}, {15, 5, 10}, {7, 13, 3, 2, 11, 5}}
S∈tBang> ~{~5?(_",__0)}
Stack: {...} {{15, 5, 10}, {7, 13, 3, 2, 11, 5}}

Finally, the union is the easiest of all to write:

union --> {}

Stack: {{0, 1, 4, 9}, {15, 5, 10}, {7, 13, 3, 2, 11, 5}}
S∈tBang> {}
Stack: {0, 7, 1, 4, 15, 13, 3, 2, 11, 9, 5, 10}

SetBang comes with built-in operators for the binary union, intersection, difference, and exclusive or of two sets:

| : ... X Y -> ... (X ∪ Y)

& : ... X Y -> ... (X ∩ Y)

- : ... X Y -> ... (X - Y)

. : ... X Y -> ... (X - Y) ∪ (Y - X)

If, however, these didn’t exist, they could be written (with one caveat, explained below) using set comprehensions:

| <--> +{}

& <--> {2>~3<~3>?(_",__0)};

- <--> 2>{2>~3<~3>?(__0,_")};

. <--> 2>~3<~3>2>{2>~3<~3>?(__0,_")};3>2>2>{2>~3<~3>?(__0,_")};+{}

In truth, it’s much better to use the built-ins, and here’s why. {}-comprehensions become strange when infinite sets are involved. One will get identical behavior from 0$& and $0&, but the same is not true of the intersection program written above. Let’s look at this using the macro system (which will be featured heavily in future posts).

S∈tBang> :macro inter {2>~3<~3>?(_",__0)};
Stack:
S∈tBang> $ 0 :inter:
Stack: 0
S∈tBang> 0 $
Stack: 0 0 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...}
S∈tBang> :inter:

.... hangs forever!

As I said, SetBang prefers to be predictably dumb in handling infinite sets, rather than unpredictably smart. However, the implementation of & is smart enough to know that set intersections commute and to avoid looping over an infinite set when possible. Set comprehensions don’t do this, so 0 $ :inter: triggers an infinite loop (over $).

Laziness can usually rescue us, but the REPL demands elements. The problem with the set computed by 0 $ :inter:is that, while empty, it produces no elements. It’s equivalent to filtering, with \_ -> False, from an infinite stream. The result is an empty stream, but the computer is not usually smart enough to figure it out (and, in fact, the general problem is mathematically undecidable). We call this ⊥ (pronounced “bot”). It’s equal to the empty set, but the computer will never know that it’s equal to the empty set: it will try, fruitlessly, to find elements forever.

The &command is smart enough to reorder arguments when it reduces the likelihood of producing a ⊥, while set comprehensions aren’t.

Power Set

The Axiom of the Power Set allows one to produce, for a given set X, the set of all of its subsets. The command for this is ^, with behavior ... X -> ... P(X).

S∈tBang> 3
Stack: 3
S∈tBang> ^
Stack: {0, 1, {2}, 3, 2, {1}, {1, 2}, {0, 2}}

These are the 8 subsets of 3 = {0, 1, 2, 3}.

Laziness comes into play at a certain size (currently, around 2^16 elements) because power sets can be very big.

S∈tBang> 5^#
Stack: 32
S∈tBang> ^
Stack: {0,1,{1},2,{2},{0, 2},{1, 2},3,{3},{0, 3},{1, 3},{0, 1, 3},{3, 2},{0, 3, 2},{1, 3, 2},4, ...}
S∈tBang> _$
Stack: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, ...}
S∈tBang> ^
Stack: {0,1,{1},2,{2},{0, 2},{1, 2},3,{3},{0, 3},{1, 3},{0, 1, 3},{3, 2},{0, 3, 2},{1, 3, 2},4, ...}
S∈tBang> ^
Stack: {0,1,{1},2,{{1}},{0, {1}},{1, {1}},{0, 1, {1}},{2},{0, 2},{1, 2},3,{2, {1}},{0, 2, {1}},{1, 2, {1}},{0, 1, 2, {1}}, ...}

In theory, we can place huge sets on the stack. $^isn’t only infinite but uncountably infinite. That said, SetBang can only generate a finite number of values, so most elements of that set P(ω)– namely, infinite subsets of ω– will never be generated.

Choice

One historically controversial axiom in set theory is the Axiom of Choice, which says that there exists a “function” f : Set -> Set such that f(X) ∈ X for all non-empty X.

Our choice command has the behavior:

... {} -> ... {} {}

... X -> ... (X - {e}) e where e ∈ X 

Which element does it pick? This is where SetBang gets weird and hilariously impractical: in this implementation, it’s nondeterministic. (It is not, however, random.) You can’t, in general, rely on elements being chosen in the same order. Why is it this way? Due to our needs as finite creatures, how a set is computed determines what we can observe. (This is the difference between {} and ⊥. The latter takes an infinite amount of time for one to figure out that it’s empty.) The easiest way to impose a choice function on sets is to create an ordering on them and, as a matter of rule, choose the least element. However, as implemented, infinite sets are processes that generate elements and one may never know when the least element has been generated.

I might remark that this presents a difference between an idealistic notion of choice (a function that exists platonically) and a practical desire, when possible, to produce elements quickly. SetBang favors the latter. Iterated choice will produce new elements in some order (until the set is empty).

For convenience, there’s also a choose-many command ` whose behavior is:

` : ... X K -> (X - Y) Y where #Y = min(#K, #X) and Y ⊆ X

It’s used like so, to choose 9 elements from 65536 = {0, …, 65535}:

S∈tBang> 4^#^#
Stack: 65536
S∈tBang> 9`;
Stack: {0, 60363, 20083, 25535, 39036, 10859, 41408, 65065, 50300}

The mathematical Axiom of Choice is not actually necessary for SetBang’s choice operator to be allowed, insofar as only finitely many choices can actually be made, computationally speaking, and Choice isn’t required for that. So, to be pedantic, it’s technically {\"}that requires the Axiom.

Regularity

The Axiom of Regularity is one of my favorite set-theoretic axioms. The others all define ways to construct sets, while Regularity imposes an upper bound on what can be sets. Strictly speaking, it says that every non-empty set contains an element that is disjoint from it. This prevents cycles in the membership relation (e.g. a ∈ a or b ∈ c ∈ d ∈ b) that would raise paradox-inducing questions around “set-ness”. Luckily, we don’t have to worry about that in SetBang: there is no way to produce a set that violates the Axiom of Regularity.

The rest of the language

We’re already Turing complete. We can express general recursion in {} comprehensions, and we can get things out of sets, but so far we’ve been writing programs at the REPL that don’t actually do anything. If you run a file consisting of 00/ at the command line, you will compute the set {0}, but not actually do anything with it.

SetBang, therefore, comes with two I/O operators: and  @. Their behavior is as follows:

! : ... X -> prints byte valued at min(#X, 255)  to stdout

@ : ... -> ... C -> where C is the natural number corresponding to one byte read from stdin

It might be tempting to use this in conjunction with set comprehensions, but that’s not a good idea, because set comprehensions are lazy, making it unpredictable when that I/O will happen. (In fact, it’s not legal SetBang to do I/O in a set comprehension.)

Imperative, eager loops can be realized using square brackets ([]). The behavior of is, if TOS is empty, to skip the block entirely. If TOS is not empty, then the program code is executed and TOS is checked again, with execution repeating until TOS is empty.

Thus, this program is an “echo” that will take one character from stdin and echo it to stdout. It never terminates.

1[@!].

This program will put the bytes 10, 9, 8, …, 1 to the console.

9'[~!\_#]

The behavior of \_#, please note, is comparable to decrement (except on 0, when it returns 0 because there is no -1).

What else?

There are 42 characters that can serve as SetBang operators. We’ve seen 41 of them.

  • for putting empty sets on the stack. Also: 1, 2, … , 9 as macro-like replacements for 0', 0'' and so on.
  • and ?for equality and membership testing, returning 1on true and 0on false.
    • e.g. ... {2, 4} 4 -> ... 1under ? since 2 ∈ 4 = {0, 1, 2, 3}.
  • + for pairing (... X Y -> ... {X, Y}).
  • 'for the increment operator (... X -> ... X ∪ {X}).
  • "for “quoting” a value into a set, (... X -> ... {X}).
    • Equivalent to ~+.
  • to pop from the stack (... X -> ...).
  • ~ to duplicate TOS (... X -> ... X X).
  • < and for rotations on the top of the stack,
    • e.g. 3>is a program whose behavior is (... X Y Z -> ... Z X Y).
  • as shorthand for 2>_ (“swap-drop”) with behavior ... X Y -> ... Y.
  • {} for set comprehensions, which build sets based on existing sets.
  • &,|,-, andare for intersections, unions, differences, and exclusive-or respectively. 
  • (“choice”) and its (right-)inverse and the iterated choice `.
    • \ : ... X -> ... (X - {e}) {e}
    • / : ... X Y -> (X ∪ {Y})
    • and then, e.g., ... {2, 3, 5, 7} 2 -> {2, 11, 5} {7, 3}under `.
  • ^which creates the power set (e.g. ... X -> ... P(X)).
  • #for replacing TOS with natural number according to cardinality.
    • ... X -> #X works reliably when X is finite.
    • when X may be infinite, you’ll get a lazy possibly-infinite set.
    • does not handle transfinite cardinalities. It doesn’t detect uncountability, for example.
  • () and for conditional execution based on TOS.
    • (^'#) executes the program ^'# if TOS is non-empty.
    • (^'#,_1) is similar but executes _1 if TOS is empty.
  • $to push an infinite set {0, 1, 2, …} on to the stack.
  •   for output and for input. Both convert bytes to natural numbers between 0 and 255 inclusive.
  • [] for looping.
    • [\_#] will execute the program \_# until TOS is empty.
    • 1[] will loop forever (doing nothing).
  • : for macro invocation.
    • Canonical example is :macro swap 2>. Then :swap:_ -> 2>_.
  • % for `… X Y -> {{X}, {X, Y}}, a sort of special pairing.
    • Could be implemented as 2>~~+3>++.

There’s one more (using the character *) which, although useful, is also not necessary as a primitive of the language. If we wanted to build it, we could represent it as:

(~#1=(_{}{}~,_~\2>\2>_.2>\2>\2>_&{}2>{}),0)

For now, I’ll leave this “mystery operator” unexplored, but we’ll come back to it in the next installment. It turns out to be very useful.

Going forward

So, that’s the language. There are many questions to ask about it, though, such as:

  • is it possible to make infinite sets “smarter”? For example, the program ${'"}0searches for 0 in the set {1, 2, 3, …}. Because SetBang is deliberately dumb in interpreting infinite sets, that program will search forever. Is it possible to make it smarter?
  • Full SetBang has a lot of redundant commands. Is it possible to strip some of them out of the language? What’s the minimum possible SetBang?
  • How should we handle error conditions (e.g. stack underflow)?
  • Can we get better at handling larger ordinals like ω+1, ω*2, ω^ω and the like?
  • What do useful programs in SetBang actually look like?
  • SetBang is slow. Is it possible to make it faster?
  • Is this language Turing complete; that is, as powerful as any other programming language? It turns out that the answer is Yes, and I’ll prove it in Part 2.