Constructing Computation 1: Symbols, nonsymbols, and equality.

I want to do something different from my normal writing. I want to construct programming. What do I mean? I want to start from some basic principles and build up to the familiar. I won’t be proving anything, and I’m not going to claim that what I’m building is necessarily the “right” way to do things (this is just for fun), but I want to explain how programming feels when one strips away the complexities and gets to the basics. This is a model of programming designed to work for anyone interested in learning the fundamentals. This will either be very interesting or loopy and divergent, and I will be editorially incompetent when it comes to determining which is the case.

To start, let’s create an object language (a world) where we have the following:

  • All objects (that is, all things in the world) are nonsymbols or symbols.

That’s easy enough. It’s tautological. It’s a symbol, or it’s not one. However, the distinction is important to us. Now, some computational models only have one category of “thing” (e.g. lambda calculus, in which everything is a function) and I’m starting with two. Why? With a symbol, we can “inspect” it and immediately know what it is. Nonsymbols are more like “black boxes”; you can observe their behavior, but you can’t know “everything” about an arbitrary nonsymbol.

Let’s start with some of the things we can do with symbols.

  • There is a function succ that takes a symbol x and maps it to a different symbol x' (shorthand for succ[x]). There are no duplicates in the infinite set {x, x', x'', …}. We call that set Gen[x], and we call x and y incomparable iff x ∉ Gen[y] and y ∉ Gen[x]. We call a symbol x basic if there is no y such that y' = x. Applied to a nonsymbol, succ returns a special symbol, _error (which is basic, so there is no confusion between this return and a successful call to succ).
  • There is a symbol called 0 and, for any “English language word” (nonempty string of letters and underscores) “foo” there is a corresponding symbol _foo. These are all mutually incomparable and basic. The most useful of these will be _error.
    • We call 0′, 1; 0”, 2; and so on, so we have the natural numbers as a subset of our symbol language.
  • There is a meta-level function called eq that takes two symbols x and y and returns _true if they are identical and _false if they are not. It returns _error if either argument is a nonsymbol.

What makes symbols useful is the eq function, because it means that we can have total knowledge about what a symbol is. We know that 0 is 0, and that 0 is not 1. We know that _true''' (call that _true3) is not _error or 5. We have a countable number of symbols, and assume that it’s one computation step to call eq on any two symbols (and get _true or _false). We also are going to allow another function to exist:

  • There is a function called comparable that takes two symbols x and y and returns _true if they are comparable, _false if they are not, and _error if either argument is a nonsymbol. This is assumed, likewise, to require only one computation step. The main purpose of this is to give us a natural? predicate, which we’ll use in code (later) to identify natural numbers. We also allow ourselves pred (which returns x when given x', and _error when given a basic symbol, like 0) and basic (which maps each symbol to the single basic symbol in the equivalence class defined by the comparability predicate) and consider those, as well, to be one computation step.
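The symbol layer above can be sketched concretely. Here is a minimal model, assuming each symbol is represented as a pair (basic_name, n), meaning the basic symbol basic_name with succ applied n times; all names here (is_symbol, natural_p, ZERO, and so on) are my own inventions, not part of the object language.

```python
# A sketch of the symbol layer, under the assumption that a symbol is a
# pair (basic_name, n): basic_name with succ applied n times.

ERROR = ("_error", 0)
ZERO = ("0", 0)

def is_symbol(x):
    return isinstance(x, tuple) and len(x) == 2 and isinstance(x[1], int)

def succ(x):
    return (x[0], x[1] + 1) if is_symbol(x) else ERROR

def pred(x):
    if not is_symbol(x) or x[1] == 0:
        return ERROR                      # basic symbols have no predecessor
    return (x[0], x[1] - 1)

def basic(x):
    return (x[0], 0) if is_symbol(x) else ERROR

def eq(x, y):
    if not (is_symbol(x) and is_symbol(y)):
        return ERROR
    return ("_true", 0) if x == y else ("_false", 0)

def comparable(x, y):
    if not (is_symbol(x) and is_symbol(y)):
        return ERROR
    return ("_true", 0) if x[0] == y[0] else ("_false", 0)

def natural_p(x):
    return comparable(x, ZERO)            # naturals: symbols comparable to 0
```

Note that natural_p(("_error", 0)) comes out _false, as it should: _error is a symbol, just one incomparable to 0.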

Symbols are “data atoms”. We have something nice here. In a typical object-oriented language, a “thing” can have all sorts of weird properties– it might have changing state or unusual access behaviors, and it might have interacting local functions called methods. The space of things that can be objects is also extensible by the user. It can be a string, a file handle, or an abstract object used solely for perfect identity (i.e. comparing false to anything else, since it holds an address on the machine where it lives, that nothing else can.) There’s a lot less to symbols. The set of them that exists is limited (countably infinite) and pre-defined. You can’t do much with them. Strings, negative numbers, stateful entities, and other such things live at a higher level. We’ll get to those; they’re not symbols in this language.

Nonsymbols are not as well-behaved with regard to eq, the universal equality function. Given nonsymbols, it returns _error, giving us no information. Why? Because for nonsymbols, one cannot compare them for equality in the general case. The set of nonsymbols is not as well behaved. A nonsymbol can’t be observed for its “true” value directly. So let’s explain what we can do with nonsymbols:

  • There is a function called apply that takes two arguments. If its first argument is a symbol, it returns _error. However, the first argument will almost always be a nonsymbol. The second argument may be a symbol or nonsymbol, and likewise for the return. In this light, we can view nonsymbols as “like functions”; we liken a nonsymbol ns to the function mapping x to apply(ns, x). The purpose of apply, then, is to give us the interpretation of nonsymbols.

We write a nonsymbol’s notation based on its behavior under apply, so one nonsymbol is {_ => _error}, which returns _error for any input. That’s not especially interesting. Another one (which I’ll call “CAT”) is {0 => 67, 1 => 65, 2 => 84, _ => _error}. What I’m writing here in the {} notation is in our natural language; not code. However, I want to make something clear. About nonsymbols that we create and “own”, we can know the full behavior. We know that {_ => 0} is constantly zero. We just can’t assume such “niceness” among arbitrary nonsymbols. I’ll also note that for every nonsymbol we can describe, there are infinitely many that we can’t describe.
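As a sketch, we can model nonsymbols as opaque Python callables and apply as a function call; the names apply_, always_error, and CAT are mine, and symbols are shrunk to plain ints and strings for brevity.

```python
ERROR = "_error"

# apply: a symbol in the first position is a garbage call, so it yields _error.
def apply_(f, x):
    return f(x) if callable(f) else ERROR

# The two example nonsymbols from the text.
always_error = lambda _: ERROR                         # {_ => _error}
CAT = lambda x: {0: 67, 1: 65, 2: 84}.get(x, ERROR)    # {0 => 67, 1 => 65, 2 => 84, _ => _error}
```

So apply_(CAT, 1) observes 65, and any argument outside the table observes _error.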

Let’s look at a more pathological nonsymbol A:

A = {0 => A, _ => _error}

It’s self-referential. Is that allowed? There’s no reason it can’t be. Under apply with argument 0, it returns itself. This does mean that a naive “print nonsymbol” utility on A might fail, printing:

{0 => {0 => {0 => {0 => {0 => {0 => ...[infinite loop]...}}}}}}
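In the callable sketch, A is just a function that returns itself, and the naive printer needs a depth guard to avoid looping forever; show and its cut-off depth are my own additions.

```python
ERROR = "_error"

def A(x):
    return A if x == 0 else ERROR        # A = {0 => A, _ => _error}

# A printer guarded by a depth limit; without the guard, printing A
# would recurse without end, as in the text's infinite-loop output.
def show(ns, depth=3):
    if not callable(ns):
        return repr(ns)
    if depth == 0:
        return "{...}"
    return "{0 => " + show(ns(0), depth - 1) + ", _ => _error}"
```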

There are also less-pathological nonsymbols with interesting behavior that can’t be written explicitly as this-to-that correspondences. For example, the succ function is realized by the nonsymbol:

{0 => 1, 1 => 2, 2 => 3, …; _error => _error‘, … ; _true => _true‘; …; $nsym => _error}

and eq by:

{0 => {0 => _true, $sym => _false, _ => _error}, 1 => {1 => _true, $sym => _false, _ => _error}, …, $nsym => {_ => _error}}

where $sym matches any symbol and $nsym matches any nonsymbol. 

However, there are nonsymbols without such easy-to-write patterns. For example, here’s one:

{ [symbol or nonsymbol N : eq[_true, apply[natural?, apply[N, _size]]] does not return _true] => _error,
  [nonsymbol N : eq[_true, apply[natural?, apply[N, _size]]] returns _true, with k = apply[N, _size]] =>
    {$sym => _error,
     $nsym => {_size => k, 0 => apply[$nsym, apply[N, 0]], ..., k-1 => apply[$nsym, apply[N, k-1]]}}}

That’s an extremely important nonsymbol. It’s called “VectorMap” and it works like this:

A := {_size => 3, 0 => 5, 1 => 7, 2 => 2, _ => _error}

F := {0 => 1, 1 => 2, ...} # that's succ (but VectorMap can take any nsym)

apply[apply[VectorMap, A], F] => {_size => 3, 0 => 6, 1 => 8, 2 => 3, _ => _error}
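A sketch of VectorMap in the callable model, mirroring the curried shape apply[apply[VectorMap, A], F]; the names (vector_map, succ_ns) are mine. This version realizes the image eagerly, where a more faithful sketch would defer each application until observed.

```python
ERROR = "_error"

def apply_(f, x):
    return f(x) if callable(f) else ERROR

def vector_map(n):
    size = apply_(n, "_size")
    if not isinstance(size, int) or size < 0:
        return ERROR                      # n does not live in the Vector context
    def with_fn(f):
        if not callable(f):
            return ERROR                  # the second argument must be a nonsymbol
        image = {i: apply_(f, apply_(n, i)) for i in range(size)}
        image["_size"] = size
        return lambda x: image.get(x, ERROR)
    return with_fn

A = lambda x: {"_size": 3, 0: 5, 1: 7, 2: 2}.get(x, ERROR)
succ_ns = lambda x: x + 1 if isinstance(x, int) else ERROR
B = vector_map(A)(succ_ns)    # {_size => 3, 0 => 6, 1 => 8, 2 => 3, _ => _error}
```

Note that any observation of A outside the Vector context's prescribed keys (such as a stray _metadata field) simply doesn't get copied, as described below.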

Given a nonsymbol understood to follow a certain set of rules (that _size => integer k and only symbols 0, 1, …, k – 1 are supported) it produces a new vector that is the image of the old one. What if those rules don’t apply? Then there are fields we end up not copying. It is just not possible to ask any questions about an arbitrary nonsymbol’s behavior over all inputs. If we created nonsymbol ns, we can verify that it’s compliant with invariants we created for it; however for an arbitrary one, we cannot answer questions that require us to query across all inputs. In other words, we’re not allowed to ask questions like:

  • Does nonsymbol ns return y for some argument x?
  • Do nonsymbols ns1 and ns2 differ in returns for some argument x? (This is why nonsymbol equality comparisons can’t be made.)
  • Is nonsymbol ns constant (that is, is ns(x) identical to ns(y) for all x and y)?
  • Is there some finite number k for which ns^k(0) (that is, ns applied to 0 k times) is a symbol?

To have an intuition for the theoretical reasons why such operations can’t be done on nonsymbols, consider (and just take on face value, for now) that some nonsymbols N exist of the form {[n is a natural number corresponding, in Gödel encoding, to a proof of an undecidable statement] => 1, _ => 0}. In that case, its constancy is undecidable. That should explain the three impossibilities above; the fourth is an expression of the Halting Problem, which Turing proved undecidable.

The problem with all such questions is that we don’t have an infinite amount of time, and those require an infinite amount of search. Symbols alone are countably infinite, and nonsymbols exist in a class so large that it cannot be called a set. (Nonsymbols, unlike mathematical sets, can contain themselves.) We are somewhat lucky in that the set of nonsymbols that we can describe in any formal language is still countable, reducing how much we have to worry about in the real world; still, these questions remain unanswerable (Gödel, Church, Turing) even over that (countable) set of nonsymbols we can build.

Because nonsymbols are unruly in the general case, we find ourselves wanting to define contexts (which will later be formalized better as types; contexts here are informal English-language entities we use to understand certain kinds of nonsymbols) which are principled processes of calling apply on nonsymbols (making observations) to recover all information considered relevant. For example, above we used the Vector context:

Observe _size. If it's not a natural number, then it's not part of the Vector context. If it is some natural number k, then observe 0, 1, ..., k - 1.

This gives us, additionally, an equality definition for Vector. If all observations are identical according to this process, then the Vectors are considered equal. This means that these:

{_size => 1, 0 => 18} (from now on, assume _ => _error unless otherwise specified.)

and

{_size => 1, 0 => 18, _metadata => _blow_up_world}

are equal in the Vector context, even though the latter has gnarly _metadata. In some other context, they might not be equal.

For a nonsymbol to be “not part of the Vector context” means that one can’t interpret it as a Vector. Nothing prevents us from trying computationally, because we’ve created a language in which even garbage function calls, such as apply with its first argument a symbol, return something: specifically, _error. For example, using vector equality semantics on {0 => 1, 1 => _empty} and {0 => 2, 1 => _empty} leaves us with nonsensical instructions, because observing _size yields _error, and “0, …, _error - 1” doesn’t make any sense. Computationally, a vector-equality function would be expected to return _error in such a case. If the machinery responsible for looping over “0, …, k - 1” weren’t guarded against erroneous values, one can imagine a naive interpretation falling into an infinite loop, because the counter succ^i(0) never equals _error for any i.
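The Vector-context equality procedure, with the _error guard just described, might be sketched like this (vector_eq and the example names are mine):

```python
ERROR = "_error"

def vector_eq(a, b):
    ka, kb = a("_size"), b("_size")
    # Guard: if either _size observation is not a natural number, the
    # instructions are nonsensical, so return _error rather than loop.
    if not (isinstance(ka, int) and ka >= 0 and isinstance(kb, int) and kb >= 0):
        return ERROR
    if ka != kb:
        return "_false"
    # Compare only the observations the Vector context prescribes.
    return "_true" if all(a(i) == b(i) for i in range(ka)) else "_false"

v1 = lambda x: {"_size": 1, 0: 18}.get(x, ERROR)
v2 = lambda x: {"_size": 1, 0: 18, "_metadata": "_blow_up_world"}.get(x, ERROR)
bad = lambda x: {0: 1, 1: "_empty"}.get(x, ERROR)
```

Here vector_eq(v1, v2) is _true despite the gnarly _metadata, and vector_eq(bad, bad) is _error because neither argument lives in the Vector context.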

Here’s another context called LL for now.

The symbol _empty is in LL, and no other symbols are. For a nonsymbol N, observe N(0). If N(1) = _empty, terminate. Otherwise, observe N(1) in the LL context.

So, here’s something in the LL context:

{0 => 67, 1 => {0 => 65, 1 => {0 => 84, 1 => _empty}}}

What we have, there, is a lazy linked list corresponding to the word “CAT” (in Ascii).
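Reading something out of the LL context can be sketched as a loop that observes key 0, then follows key 1 until it hits _empty; cons, ll_to_list, and the chr() decoding are my own helpers.

```python
ERROR = "_error"

def cons(h, t):
    # A linked-list cell: {0 => h, 1 => t, _ => _error}
    return lambda x: h if x == 0 else (t if x == 1 else ERROR)

def ll_to_list(ns):
    out = []
    while ns != "_empty":
        if not callable(ns):
            return ERROR          # we ran off the LL context
        out.append(ns(0))
        ns = ns(1)
    return out

cat = cons(67, cons(65, cons(84, "_empty")))
```

Decoding the observations as Ascii, "".join(chr(c) for c in ll_to_list(cat)) recovers "CAT".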

In practice, we’ll also want to keep a preferred context living with our data, so we might see nonsymbols that look like this:

{_type => _vector, _size => 64, 0 => 83, …, 63 => 197}

That gives us an Object1 context for equality:

Two symbols x and y are equal if eq[x, y]. A nonsymbol and symbol are never equal. For two nonsymbols, observe _type for each. If they are both equal symbols, "look up" the corresponding context (e.g. Vector for _vector) and apply it. If they are differing symbols, then they are unequal. If this lookup fails (i.e. their _type is unknown) or either has a nonsymbol observation _type, then they don't live in the Object context and this comparison fails.

By “look up”, I’m assuming that we have access to a “golden source” of interpretations for each _type observation. In the context of distributed systems, that turns out to be a terrible assumption. But it will work for us for now. Even still, the above context is not always satisfactory. Sometimes, we want cross-type equality, e.g. we want

{_type => _list, 0 => 67, 1 => {_type => _list, 0 => 65, 1 => {_type => _list, 0 => 84, 1 => _empty}}}

and

{_type => _vector, _size => 3, 0 => 67, 1 => 65, 2 => 84}

to be treated as equal in our Object context, since they both represent the word “CAT” (in Ascii). Well, that gets gnarly quickly. Now our Object context admits N^2 different equality comparisons, because we need one for each pair of possible _type values. It gets even worse if we allow users to extend the definition of Object, making its list of supported types potentially infinite.

We treat nonsymbols as lazy, which means that their return values are provided on an as-needed basis. They don’t exist in a “completed” form. This is important because many of them contain an infinite amount of data inside them. For example, one has already seen how basic linked lists behave:

{0 => 212, 1 => {0 => 867, 1 => {0 => 5309, 1 => _empty}}}

but then there are also lazy streams that have infinite depth, such as this one:

{0 => 2, 1 => {0 => 3, 1 => {0 => 5, 1 => {0 => 7, 1 => {0 => 11, 1 => {0 => 13, 1 => {…}}}}}}}

which contains the entire set of prime numbers.
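Such a stream can be sketched as a closure whose tail at key 1 is only constructed when observed, so the “infinite” structure costs nothing until pierced; primes_from is my own name, and it assumes its argument is at least 2.

```python
ERROR = "_error"

# A lazy prime stream: {0 => head, 1 => (rest of the primes), _ => _error}.
# The tail is built only when key 1 is actually observed.
def primes_from(n):
    while any(n % d == 0 for d in range(2, n)):
        n += 1                            # advance to the next prime
    head = n
    return lambda x: head if x == 0 else (primes_from(head + 1) if x == 1 else ERROR)

s = primes_from(2)
```

Observing s(0) gives 2, s(1)(0) gives 3, s(1)(1)(0) gives 5, and so on, without the set of primes ever being completed.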

This laziness has a lot of benefits. It gives us one of the most powerful nonsymbols out there, which I’m calling Branch:

{[nonsymbol N s.t. N(0) is _true] => {T => {_ => T(0)}}, 
  [nonsymbol N s.t. N(0) is _false] => {_ => {F => F(0)}}}

A thunk is a nonsymbol whose return value is constant, e.g. Thunk[x] = {_ => x}. Since it’s irrelevant what argument is used to “pierce” it, we can use 0 by convention (or _unit if we wish to suggest, more strongly, a never-used input value). The design principle of the Branch nonsymbol is to take three thunks. The first (condition) is observed at 0 (that is, evaluated) no matter what. If it returns _true, we evaluate the second thunk and never the third. If it’s _false, then we ignore the second and evaluate the third.

We use apply3[x, y, z, w] as shorthand for apply[apply[apply[x, y], z], w] and we note that:

apply3[Branch, Thunk[_true], Thunk[42], Thunk[_blow_up_world]]

never, in fact, blows up the world. It returns 42. This gives us conditional execution, and it will be used to realize one of the most important language primitives, which is the if statement.
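A sketch of Branch in the callable model (branch, thunk, and blow_up_world are my names): the condition thunk is always pierced, but only the chosen arm’s thunk ever is.

```python
ERROR = "_error"

def thunk(x):
    return lambda _: x                        # {_ => x}

def branch(cond):
    if not callable(cond):
        return ERROR
    c = cond(0)                               # the condition is always evaluated
    if c == "_true":
        return lambda t: (lambda f: t(0))     # pierce the then-thunk only
    if c == "_false":
        return lambda t: (lambda f: f(0))     # pierce the else-thunk only
    return ERROR

def blow_up_world(_):
    raise RuntimeError("this thunk should never have been pierced!")

result = branch(thunk("_true"))(thunk(42))(blow_up_world)   # 42; no explosion
```

The else-arm is passed but never called, which is exactly the conditional-execution behavior the if statement needs.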

I’m going to talk about one other context, which will seem radically disconnected from how we usually think of the concept. I’m going to call it “Array” even though it seems nothing like an actual computational array (i.e. a block of contiguous memory):

No nonsymbols are in the Array context, and neither are symbols not comparable to zero (i.e. not natural numbers). Given a natural number k, observe the largest integer m such that m^2 <= k. If k - m^2 >= m, then k is not in the Array context. Otherwise, observe k - m^2.

In other words, only a subset of the natural numbers are in this context. The first observation gives us a maximal value, and the second gives us a value corresponding to the data it contains.

For example, 281474976842501 is an array. That number is equal to 2^48 + 2 * 2^16 + 3 * 2^8 + 5; so m = 2^24 and our second observation is 131845, which we interpret as the 3-byte array [2, 3, 5].
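The two Array observations, and the decoding of the example above into its bytes, can be sketched as follows (array_observe and bytes_out are my names):

```python
import math

ERROR = "_error"

# The Array context's observations, under the encoding k = m^2 + data
# with data < m, consistent with the worked example above.
def array_observe(k):
    if not isinstance(k, int) or k < 0:
        return ERROR
    m = math.isqrt(k)                # largest integer m with m*m <= k
    data = k - m * m
    if data >= m:
        return ERROR                 # not in the Array context
    return (m, data)

k = 2**48 + 2 * 2**16 + 3 * 2**8 + 5
m, data = array_observe(k)           # m == 2**24, data == 131845
bytes_out = [(data >> (8 * i)) & 255 for i in reversed(range(3))]   # [2, 3, 5]
```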

We now have an adjective we can apply over contexts. A context is Arrayable if:

  1. there is some principled way of encoding all things that live within it into an Array, and
  2. there is a fixed set of observations after which, for all things that live within it, we have a finite upper bound on the integer value of the Array computed, and
  3. each semantically different value will map to a single Array, distinct from any other, but any values that are equal within that context will map to the same Array.

Vector is not Arrayable (its entries may be arbitrary nonsymbols, which no finite integer can encode), but here are two examples of Arrayable contexts, BigInt and AsciiString:

BigInt: Observe _upper. Observe _data. If either observation is not a natural number, or if _data exceeds _upper, the nonsymbol does not live in the BigInt context. Otherwise, it is Arrayable as n(_upper)^2 + n(_data), where n(k) denotes the natural number observed at key k.

AsciiString: Observe _size and, if it is a natural number, call it k. (Otherwise, the nonsymbol is not an AsciiString.) Observe 0, ..., k-1. If all observations are natural numbers from 0 to 127 inclusive, you have an AsciiString. Otherwise, the nonsymbol does not live in this context. If it does, this is Arrayable as: 128^0 * n(0) + 128^1 * n(1) + ... + 128^(k-1) * n(k - 1) + (128^k)^2.

Here is an AsciiString nonsymbol for “CAT”:

{_size => 3, 0 => 67, 1 => 65, 2 => 84}

Its corresponding Array (integer) is 2^42 + 84 * 2^14 + 65 * 2^7 + 67 = 4398047895747.
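The AsciiString-to-Array encoding can be sketched directly (ascii_to_array is my own name): base-128 digits as the data, plus (128^k)^2 marking the size.

```python
ERROR = "_error"

def ascii_to_array(ns):
    k = ns("_size")
    if not isinstance(k, int) or k < 0:
        return ERROR
    vals = [ns(i) for i in range(k)]
    if not all(isinstance(v, int) and 0 <= v <= 127 for v in vals):
        return ERROR                 # not an AsciiString
    # 128^0 * n(0) + ... + 128^(k-1) * n(k-1) + (128^k)^2
    return sum(v * 128**i for i, v in enumerate(vals)) + (128**k) ** 2

cat = lambda x: {"_size": 3, 0: 67, 1: 65, 2: 84}.get(x, ERROR)
```

Applied to the “CAT” nonsymbol above, this recovers 4398047895747, matching the arithmetic.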

Whither code?

So far, we’ve dealt only with data. I haven’t yet gotten to code: what it is, and why we like it. Code is what we call a nonsymbol within some Arrayable context (called a language) that allows us to produce a symbol or nonsymbol, in a predictable and useful way, from it. For example, here’s a piece of code, expressed as a nonsymbol.

{_size => 15, 0 => 40, 1 => 102, 2 => 110, 3 => 32, 4 => 120, 5 => 32, 6 => 40, 7 => 115, 8 => 117, 9 => 99, 10 => 99, 11 => 32, 12 => 120, 13 => 41, 14 => 41}

It can better be expressed as an Ascii string:

"(fn x (succ x))"

Our language might give us the tools to translate this into the nonsymbol we want: {x => x'}.

We can’t trust nonsymbols very far. The space in which they live is too big. Let’s just talk about a simple (but infinitely supported) nonsymbol called Add:

{0 => {0 => 0, 1 => 1, 2 => 2, …}, 1 => {0 => 1, 1 => 2, 2 => 3, …}, 2 => {0 => 2, 1 => 3, 2 => 4, …}}

We can’t afford to realize (or “complete”) this nonsymbol in its entirety, but that doesn’t mean we can’t use it, because we’re only going to need a small number of its infinite cases in a real-world running program. We need some way of specifying what it is, without having to allocate infinite memory (which no machine has) for the table. We end up wanting to be able to write something like this (in pseudocode):

$Use name #add for (fn x -> (if (eq x 0) then (fn y -> y) else (fn y -> (succ (#add (pred x) y)))))

When we execute this, we see its operational semantics.

(#add 2 3) is ((#add 2) 3)
  => (((fn x -> (if (eq x 0) ...)) 2) 3)
  => ((if (eq 2 0) ...) 3)
  => ((if _false (fn y -> y) (fn y -> (succ (#add (pred 2) y)))) 3)
  => ((fn y -> (succ (#add (pred 2) y))) 3)
  => (succ (#add (pred 2) 3))
  => (succ (#add 1 3))
  => ... => (succ (succ (#add 0 3)))
  => ... => (succ (succ ((fn y -> y) 3)))
  => ... => 5
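The same recursion can be sketched directly, using only succ, pred, and a test against zero, in the curried shape of the pseudocode (add is my name for #add); it inherits the trace’s poor performance.

```python
def succ(x):
    return x + 1

def pred(x):
    return x - 1 if x > 0 else "_error"

# #add as in the pseudocode: recurse on x, succ-ing the result at each step.
def add(x):
    if x == 0:
        return lambda y: y                       # (fn y -> y)
    return lambda y: succ(add(pred(x))(y))       # (fn y -> (succ (#add (pred x) y)))
```

So add(2)(3) evaluates to 5, through exactly the succ-by-succ steps traced above.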

Of course, this version of addition really succs when it comes to performance. (It’s not even tail-recursive!) Why are we performing so much succage just to add two natural numbers? Well, this is a “least assumptions” model and it doesn’t assume, axiomatically, that one knows how to add. We derive that from succ (which we do assume). In reality, you’d almost certainly want to have a primitive called add or + that performs the addition in a more performant way. Invoking it would perform an apply on the appropriate nonsymbol, e.g.:

(+ 5 7) => apply2[{nonsymbol for addition}, 5, 7]

How does this work in the world of real computation? Well, two things are different. First of all, we actually have a lot of those primitives that we didn’t assume to exist. For example, we have addition and multiplication provided by the chip: not the entire nonsymbol, but a process that executes it over a large number of values (e.g. 64-bit integers). We don’t have to write, in general, arithmetic operations in terms of succ. Again, we could, but it’d perform extremely poorly.

On the other hand, there’s a false assumption above, which is that eq takes constant time (one “computation step”). On a finite set of symbols, it can be very fast but, in the real world, this doesn’t hold true. No physical object that we know of can differentiate an infinite number of states and communicate back to us, reliably, which state is observed. That’s why finite sets of symbols and Arrays become important.

What we really have in the state of a computer is a nonsymbol that has the capability to represent a finite (but very large!) set of symbols. We can think of it as a nonsymbol for which there is some natural number N that returns _error whenever we throw anything but an integer between 0 and N – 1 at it, and returns 0 or 1 when applied to such an integer. Since that state is Arrayable (if we know N) we can imagine it as a single integer or (more usefully) as an array of bits or bytes.

Summary

Again, my purpose here isn’t to do groundbreaking computer science, and that’s not what I’m doing. None of these ideas are original to me, and the only thing I’m trying to accomplish is to create a world that presents these concepts in an accessible but still semi-rigorous way. So here’s what we have so far.

  • We have a universe of symbols and nonsymbols. Symbols allow us to compare for equality (eq) but for a nonsymbol we can never assess its full identity; we can only apply it to various arguments to compute a “return” for each. There’s a succ function on symbols that produces a new one. 
  • We have countably infinitely many symbols and a special symbol called 0; the set {0, succ[0], succ[succ[0]], …} is the natural numbers.
  • Nonsymbols can take symbols or nonsymbols as arguments, and return symbols or nonsymbols. They can be thought-of as akin to functions, but they actually operate on the whole class (nonsymbols are not a set).
  • We have a currently-informal notion of context that allows us to make observations (applications to arguments) of a nonsymbol in a principled way. Some nonsymbols don’t “live in” that context, meaning that the observations are garbage, but many do. These allow us to interpret certain classes of nonsymbols in a better light. For example, we can’t compare arbitrary nonsymbols for equality, but we can compare them for equality as Vectors (i.e. according to observations made in the Vector context).
  • We have a special context called Array that represents a subset of natural numbers. In that context, we observe first an upper bound, and then a natural number data that is less than that.
  • A context is Arrayable if there is a principled way of converting any nonsymbol living in it into an Array, and back, with no context-relevant information lost. (Behavior of that nonsymbol that lives outside the context might, however, be lost.)
  • Languages are Arrayable contexts in which a “string-like” nonsymbol is converted into a symbol or nonsymbol “described” by it. In practice, we think of this as the conversion of a string (file contents) into a function (stateless program).

I don’t know where this is going, or if it’s interesting or useful to anyone, but this is what we have now, and I hope I’ve made at least a few concepts clearer (or more attractive, or more interesting).