A Postfix Programming Language

09 Apr 2022


(Update: All further considerations in rpnfl)

Partially building upon the ideas of this blog entry and a bit on Lisp.

Syntax

There are two fundamental statements:

List Statement:

[f1 f2 f3 f4]

Sequence Statement:

{f1 f2 f3 f4}

With f1, f2, f3, f4 here being an exemplary number of functions.

A List Statement has no inputs and has as it’s output the stream of the ordered results of each function run independently.
For example:

[1 2 3]

outputs a stream containing first 1, then 2, then 3, before terminating.

A Sequence Statement has as many inputs as f1 does and has as many outputs as f4 has, with each function’s output being connected to the next function’s input.
For example:

{invert sum}

has one input and one output, or

{[1 2 3] invert sum print}

has no inputs and no outputs.

((wait, can’t we just consider the sequence interpretation to be the default, and use {} for something else? Specifically, for escaping?))

Every in and output is a stream, with the functions behaviour deciding how much of the stream is read. For example a function like map takes two input streams:

[1 2 3] [\inc] map

With \ here escaping the function inc for it to be passed as data to map, instead of to be executed.

However, while the first input stream of map is read until termination, the second input stream is only read once. So for example:

[1 2 3] [\inc \invert] map

would still result in an output stream equivalent to [2 3 4], not [0.5 0.3333 0.25].

Generally, a function will always read it’s first (left-most) input stream(s) until termination, while all after that are only read as needed. For example:

{[1 2 3] [4 5 6] concat}

will read it’s first input stream until termination, then it’s second input stream until termination, etc.

and

{[1 2 3] [4 5 6] zip}

will read it’s first and second stream in tandem, until one terminates.

Ideas

Labelling

We want to be able to give a label to things. We definitely want to define named functions, but we also might want to label a certain stream and attach it somewhere down the line. Is this clean?

A function would be:

\{f1 f2 f3 f4} ~> function_name

A constant would be:

\[f1 f2 f3 f4] ~> const_name

With the statement being escaped, this means that it will not be executed right away, but instead inserted wherever the respective label appears in the code.

However, if we consider \{f1 f2 f3 f4} as a statement as a whole, it effectively again just is a function with one output stream, just as it would appear in something like:

{[1 2 3] \{inc sqr} map}

so the ~> function_name part can be considered as simply taking one input stream and giving it a name. So that would mean, that at any point, we could simply pop a stream from the stack and give it a name.

Problems:
We can’t allow any output to end up being connected to multiple inputs, so therefore, this would mean that we can not allow any label to be applied twice. Of course on the other hand, we want to be able to use named functions multiple times.

This can be solved by defining \ to be a function that takes whatever statement it is escaping and giving an output stream of infinite copies of it.

We also have to consider what precisely the appearance of a label in the source code means. For the use with functions, we want a label to “unescape” whatever statement it contains, i.e.:

{
    \{sum inc} ~> sum_and_1
    
    [1 2 3] sum_and_1 print
    # The above and below statements should be equivalent
    [1 2 3] {sum inc} print
}

However, for a stream, we don’t want anything to be unescaped:

    "data.txt" file ~> our_data

    our_data print

This means we need some differentiation in the syntax between the two. Considering that we usually have many instances of calling a function and few instances of labelling a stream, we want the former be the more convenient one to use. Perhaps something like this:

"data.txt" ~> our_data
'our_data print

with ' here meaning that we actually want the labelled stream to be at that place back up to be dangled, instead of reading it’s contents once and executing them.

Problem: Wait, does this make sense? And does it have to be this not-clean?

Tying up loose streams

Build in a macro for repeated application of a function until the stack of input streams is exhausted. For example something like this:

{[1 2 3] [4 5 6] [7 8 9] concat@}

would be equivalent to

{[1 2 3] [4 5 6] [7 8 9] concat concat}

Maybe @@ can stand for a “black hole”, in cases where we have dangling output streams that we don’t care about.

Problem: If we say the rule is to always pick up the left most stream first, in the inputs to the second concat being [7 8 9] first and [1 2 3 4 5 6] second, i.e. we get [7 8 9 1 2 3 4 5 6]

Can this be solved?

Yes, if we treat the dangling streams as a stack instead of a queue, i.e., every function statement pushes all it’s output streams onto the stream stack. And every time we need input streams, we pop from the stream stack. Then we would have:

{
    [1 2 3]    # gives: s1 (~> Stack: s1)
    [3 4 5]    # gives: s2 (~> Stack: s2 s1)
    [6 7 9]    # gives: s3 (~> Stack: s3 s2 s1)
    concat     # takes: s3 s2, gives: s4 (~> Stack: s4 s1)
    concat     # takes: s4 s1, gives: s5 (~> Stack: s5)
}

Assuming concat works such that it will pass through the second stream it pops of the stack first and the first stream second, we would get the first concat outputting equivalent to [4 5 6 7 8 9], and then the second concat outputting equivalent to [1 2 3 4 5 6 7 8 9].

This also makes a lot more sense, since we generally want to have things that are “close by” (in terms of character distance in the source code) treated first.

Keywords/Macros/Functions

There are functions and there are macros. There are no reserved keywords. Functions are either anonymous or have an alphanumeric (plus _ ?) label. Macros are rules that transform a certain sequence of symbols and expressions into a valid executable expression. A macro can be defined only with symbols as terminals of it’s grammar.

Functions are:

Parameters

Allow for a function to also take parameters in parenthesis instead of as input, for example:

{[1 2 3] map(\inc)}

with a list of N parameters being applied as the N right-most inputs of the function, i.e., as if they had been written before the function, and each parameter in square brackets.

Leaving streams dangling

If a sequence statement has remaining output streams dangling, they become outputs of the statement.
For example:

{
    [[1 11] [2 13] [3 17]]
    unzip    # gives: s1 s2 (~> Stack: s2[2 4 6] s1[1 3 5])
    map(\str) # takes: s2, gives: s3
    # Stack: s3 s1
}

will result in a stream stack with the top stream being equivalent to ["11" "13" "17"] and the second to [1 3 5].

Problem: This might not be what we want, since the “first” values of the tuples are now lower in the stack and the “second” values of the tuples are at the top of the stack. In a sense, this could be the inverse of what we want, however, with this interpretation [(1 11) (2 13) (3 17)] unzip is equivalent to [1 2 3] [11 13 17], which is clean. But of course unzip could also be defined the other way round if needed.

Do we really want the dangling streams to be a stack, not a queue? Is that intuitive?

A way to reorder streams

Say we want to define a function that takes a function as input and applies it to a predefined list of numbers. However, map takes as it’s first stream the data, and it’s second stream the function. How can me make it such that the dangling input that remains is the function?

\{[1 2 3] % map} ~> do_it_to_123

With % here being a blank, such that map will treat [1 2 3] as it’s “data” input and whatever is the input to the entire sequence statement as the “function” input.

((maybe $ instead of %, or some other character?))

How does this play with the dangling stream stack? What does happen if we do:

\{ % % map } ~> map_in_a_trenchcoat

Maybe we can number these dangling inputs:

\{ %1 %0 map } ~> map_but_swapped_inputs

The downside of allowing a blank character like that is that the first statement in a sequence statement no longer tells us the number of input streams of the sequence statement. Whether this is a bad thing, I don’t know, it’s not like it’s relevant for parsing, since we have to read the entire sequence statement anyway to get the number of output streams.

Of course, just as we might want to reorder input streams, we might also want to mess with output streams, so for example something like:

{[1 2 3] | map(\str)}

With | here meaning “leave the previous output stream dangling until the end of the sequence statement instead of pushing it into the stack”.

((maybe use some other character that conveys the “divide” more, | is generally associated with piping, which is kinda the opposite of what it does here))

However, I’m not sure whether this is necessary, or makes for pretty code.