SetBang 3: Fast(er) math in SetBang

SetBang, even when limited to the finite world, is inherently “inefficient”. We might like to live in a world where straightforward set operations can be performed quickly. Unfortunately, that’s not the case. Short programs with no obvious infinite loops can still take a long time.

Let’s restrict ourselves to hereditarily definite sets. A definite set in SetBang is one that is derived from other finite sets, and a hereditarily definite set is a definite set whose elements are all (hereditarily) definite sets.

There are many finite and probably-finite sets that aren’t definite. For example, if we admit this macro:

:macro magic #003<[003<[~#1?(_\_\_2>'2>,__2>_12>0)]_2>(4<~5<2>/4>'4>_,4<'4>_)]_;~22>?(__0,_2003<[003<[~#1?(_\_\_2>'2>,__2>_12>0)]_2>(4<~5<2>/4>'4>_,4<'4>_)]_;[~~03>[\3<~3<[\_2>{'"}2>]_4<[~3<~4<2>4<.3>&{'"}]_3>2>]__3<~4>~3<~4<2>4<=(;;,_2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)(_1))(_~3<~4<2>4<(03>~{}3<~{}3<-#'[\_#3<~3<~4>[\_2>{'"}2>]_~5<~6>~3<~4<2>4<=(;;,_2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)(_1))(_3<~4>6<2>/5>4<2>~3<~4<2>4<-3>2>-(~{}3<~4>{~3>2>~4>?(_",__0)}\2>[\3<&2>]_;(~"4<2>-3>2>-1[~3<~4<2>4<.3>&{'"}]_[~3<~4<2>4<.3>&{'"}]_,___0),_)3<,__3>)]_;,2>)(__1[~3<~4<2>4<.3>&{'"}]_,____00),___10)]_)

we can create these sets:

:macro mersenne ${^\_#~:magic:(_",__0)}

:macro fermat ${^^'~:magic:(_",__0)}

:macro maybebot :fermat:`5_`1; 

What does the magic macro do? Its behavior is:

... N -> ... (1 if N is prime, 0 if N is composite).

As a result, mersenne produces the set of Mersenne primes, which may be infinite (whether it is remains unknown), and fermat produces the set of Fermat primes, of which there are probably exactly 5 (although there may be infinitely many). The set maybebot, derived from fermat, is “bottom” (a non-terminating empty computation) if there are only 5 Fermat primes, and a one-element set if there are 6 or more. The last of these is a set that is trivially, provably, finite; but its derivation (ultimately from $) puts it at risk of non-termination. It’s finite, but not definite.
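To make “finite, but not definite” concrete, here is a rough Python analogue of these three sets (my sketch, not part of SetBang), with is_prime standing in for magic and lazy generators standing in for $’s enumeration of the naturals:

from itertools import count

def is_prime(n):
    # Trial division; plays the role of magic (N -> 1 if prime, 0 if composite).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Analogue of mersenne: lazily filter the numbers 2^n - 1 for primality.
mersenne = (2**n - 1 for n in count() if is_prime(2**n - 1))

# Analogue of fermat: lazily filter the numbers 2^(2^n) + 1.
fermat = (2**(2**n) + 1 for n in count() if is_prime(2**(2**n) + 1))

def maybebot():
    # Skip five Fermat primes, then demand a sixth. If only five exist,
    # this never returns, even though the set it would build is trivially finite.
    for _ in range(5):
        next(fermat)
    return next(fermat)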

If we remove $ from SetBang, we can make every set not only definite but hereditarily definite (meaning that the elements are definite sets as well). We still have the potential for non-terminating computations due to [] loops, but otherwise, even if we use {}, everything will terminate.

Is SetBang still “inefficient”? Yes. So let’s talk about what an efficient SetBang would look like, and assume that we’re only working with hereditarily definite sets (i.e. there is no $). An efficient SetBang would:

  • canonicalize sets so that set equality checks (=) can be performed in bounded time as a function of the size of the set.
  • perform membership checks (?) in bounded time.
  • represent structured large sets (e.g. 8^#^, the set of all subsets of {0, …, 255}) in an efficient way that doesn’t require storing the entire set (see the sketch after this list).
    • The example above has 2^256 elements. We can’t store it in its entirety but, because it’s a highly structured large set, we can easily check for membership in it.
  • perform primitive operations (namely, \ and /) in constant or logarithmic time.
  • “execute” {} comprehensions in a way (strict or lazy) that doesn’t take too much time or space upfront, but that allows the resulting sets to be used efficiently, assuming that the code inside the comprehensions runs efficiently.
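On the third point, a minimal Python sketch (my illustration, not an actual SetBang internal) shows how a structured large set can support fast membership checks without ever being materialized:

class PowerSetOfRange:
    # Represents P({0, ..., n-1}) symbolically; it has 2^n elements,
    # none of which are stored.
    def __init__(self, n):
        self.n = n

    def __contains__(self, candidate):
        # x ∈ P({0, ..., n-1}) iff x ⊆ {0, ..., n-1}.
        return all(isinstance(e, int) and 0 <= e < self.n for e in candidate)

huge = PowerSetOfRange(256)   # 8^#^ from the list above: 2^256 elements
print({3, 77, 200} in huge)   # True, checked in time proportional to the candidate
print({3, 300} in huge)       # False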

Is it possible, on some physical device (perhaps unlike any existing computer), to make an efficient SetBang, even after excluding non-definite sets by removing $? It’s extremely unlikely, as we’ll see below.

Before we get to that, note that ordinals are an extremely inefficient way to represent numbers, so let’s use binary encoding to represent them as sets of ordinals instead. (An implementation, of course, can handle small finite ordinals as efficiently as if they were constants, urelements, or, near-equivalently, machine integers.) So we represent the number 37 as {0, 2, 5} (where the numerals on the right-hand side are still ordinals) rather than as {0, …, 36}.
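In Python terms, the conversions are a one-liner each (a sketch of the encoding itself, not of any particular implementation; the names mirror the tobinary/frombinary macros that appear later):

def to_binary(n):
    # Number -> set of bit positions: 37 = 0b100101 -> {0, 2, 5}.
    return {i for i in range(n.bit_length()) if (n >> i) & 1}

def from_binary(bits):
    # Set of bit positions -> number.
    return sum(1 << i for i in bits)

assert to_binary(37) == {0, 2, 5}
assert from_binary({0, 2, 5}) == 37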

We could go further than binary and use von Neumann indices (or “hereditary binary” representations) defined like so:

I({}) = 0

I({a, b, …}) = 2^I(a) + 2^I(b) + …

This creates a bijection between the natural numbers and the hereditarily finite sets. Letting J be I’s inverse, we’d represent 37 as:

J(37) = {J(0), J(2), J(5)} = {{}, {J(1)}, {J(0), J(2)}} = {{}, {{J(0)}}, {{}, {J(1)}}} = {{}, {{{}}}, {{},{{{}}}}}
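Here is a Python sketch of I and J as defined above (mine; frozensets are used so that sets can contain sets). Note that both directions recurse on every element, which is the multiple recursion complained about below:

def I(s):
    # Hereditarily finite set -> natural: I(S) = 2^I(a) + 2^I(b) + ... for a, b, ... in S.
    return sum(2 ** I(a) for a in s)

def J(n):
    # Natural -> hereditarily finite set (I's inverse): one element per set bit.
    return frozenset(J(i) for i in range(n.bit_length()) if (n >> i) & 1)

assert I(J(37)) == 37
assert J(37) == frozenset({J(0), J(2), J(5)})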

Why don’t we do this? In practice, I don’t consider it worth it. The multiple recursion of the index encoding and decoding complicates code considerably. Besides, an implementation can hard-code the first 256 ordinal constants and get the same efficiency (for numbers below 2^256) that we would have if we treated them as urelements. While we gain efficient storage of certain “sparse” numbers (e.g. 2^(2^(2^(2^7 + 1))) + 23) by using hereditary binary, we complicate our algorithms (by, among other things, replacing single recursion with multiple recursion, requiring unpredictable amounts of stack space) and we still can’t do a lot with them. It does not make tractable the operations that would otherwise be intractable (e.g. precise arithmetic on such massive numbers).

With the binary representation, we have a compact addition function:

S∈tBang> :macro addbinary [~3<~4<2>4<.3>&{'"}]_
Stack: {0, 1, 6, 3, 5} {7, 4, 5, 8}
S∈tBang> :addbinary:
Stack: {0, 1, 4, 3, 9}

We see it correctly adding 107 + 432 = 539. The way it works is by repeatedly executing the behavior:

... X Y -> ... (X xor Y) (X and Y) -> ... (X xor Y) {z + 1 : z ∈ X and Y}

until the latter set (TOS) is zero/empty. This propagates the carries in the addition, terminating when there are no more carries. It may not be the most efficient method, but it does the job in polynomial time (in the size of the sets, i.e. the bit-sizes of the numbers).
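Transcribed into Python (my rendering of the behavior, not of the macro itself), the loop is:

def add_binary(x, y):
    # Add two numbers represented as sets of bit positions.
    while y:                            # loop until the carry set is empty
        xor = x ^ y                     # bit positions that add without a carry
        carry = {z + 1 for z in x & y}  # shared bits carry one place to the left
        x, y = xor, carry
    return x

assert add_binary({0, 1, 6, 3, 5}, {7, 4, 5, 8}) == {0, 1, 4, 3, 9}  # 107 + 432 = 539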

With this, we can now illustrate concretely that our dream of an efficient (as defined above) SetBang is probably not possible. Consider the following code.

^{:sumbinary:=}(1,0);;

with sumbinary (using a loop, but guaranteed to terminate in polynomial time) defined like so:

:macro sumbinary 02>[\3<:addbinary:2>]_

The former’s behavior is:

S X -> 1 if some subset Y of X has Sum(Y) = S, 0 otherwise.

This is the subset-sum problem, known to be NP-complete. Ergo, if P ≠ NP, an efficient SetBang implementation is not possible. Even after eschewing $ and restricting ourselves to the most trivial measurement of a set (“Is it empty?”), it will always be easy to write terminating but intractable programs.
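For comparison, here is the same check written directly in Python (a sketch; the exponential enumeration that SetBang’s ^ performs implicitly is explicit here):

from itertools import combinations

def has_subset_with_sum(s, x):
    # S X -> True iff some subset Y of X has sum(Y) == S. There are 2^|X|
    # subsets to try in the worst case, which is what makes this intractable.
    return any(sum(c) == s
               for r in range(len(x) + 1)
               for c in combinations(x, r))

print(has_subset_with_sum(10, {3, 5, 7, 11}))  # True: 3 + 7 = 10
print(has_subset_with_sum(2, {3, 5, 7, 11}))   # False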

In short, one could use laziness to make the {} operator “efficient” and simply not do gnarly computations until needed, and there’s room for cleverness around equality and membership checks (which often don’t require computation of the entire set), but even an empty-check or a \ creates a risk, even in the definite world, of exponential-or-worse time cost.

These results shouldn’t be surprising. Power sets are structured and one can therefore check quickly that, for example, {{{{5}}}} ∈ P(P(P(P(N)))), but they’re still very big. It’s possible, in principle, that SetBang could be implemented in such a way that a large number of complex queries happen efficiently. It’s just not possible, even with $ and [] taken out, to make SetBang fast in all cases. The language is too powerful.

Although there is much FUD around programming languages and speed, SetBang is an example of a language that is literally “not fast”.

Fast(er) math

On GitHub, I’ve included a library of faster math functions in resources/fastmath.sbg. These operate on numbers that have been converted to binary, like so:

Stack: 37
S∈tBang> :tobinary:
Stack: {0, 2, 5}

If you want to convert such numbers back into the more familiar ordinals, you can use frombinary. We’ve seen addbinary, and there isn’t much more to the other arithmetic operators: they’re standard “schoolbook” implementations of subtraction, multiplication, and division-with-remainder on binary numbers. They improve the performance of our arithmetic considerably.
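As one illustration of the schoolbook style (my sketch, not the code in fastmath.sbg), multiplication on this representation is shift-and-add, reusing add_binary from the sketch above:

def mul_binary(x, y):
    # Multiply two numbers represented as sets of bit positions:
    # for each set bit i of y, add in x shifted left by i places.
    acc = set()
    for i in y:
        acc = add_binary(acc, {j + i for j in x})
    return acc

assert mul_binary({0, 2, 5}, {1}) == {1, 3, 6}  # 37 * 2 = 74
assert mul_binary({0, 1}, {0, 1}) == {0, 3}     # 3 * 3 = 9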

Recall that we previously had an implementation that would take 21 million years to determine that 65,537 was prime. We can now do it in about 30 seconds.

S∈tBang> 00/9'''''''/
Stack: {0, 16}
S∈tBang> :primebinary:
Stack: 1

That’s still hundreds of thousands (if not millions) of times slower than, say, trial division in C, but it’s an improvement over what it was.

Here’s a program that (quasi-)efficiently puts the set of all primes on the stack:

S∈tBang> ${~:tobinary::primebinary:(_",__0)}
Stack: {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53, ...}

In fact, full SetBang allows one to put nearly any describable mathematical object on the stack. You can’t always do much with it, of course. Depending on what it is, you might not be able to do anything with it, as it could be:

{n : n is a Gödel number for a proof of theorem T}

which might equal {} when T is provably false (assuming a much smarter implementation of SetBang) but is “bottom” when T is unprovable.