On functional programming

Aah, functional programming: a source of elitism, a way to separate the common from the élite, or, according to the common, just intellectual masturbation. What it is, no-one really knows; endless wars are fought over which languages are functional and which are not…

Now, functional programming seems to have some intrinsic link to the lambda calculus; many programming languages even introduce functions by a ‘lambda’ keyword for no good reason at all, just to be different. For the rest of this discourse I shall use the notation (x.. . E..) in favour of λx.. . E..; I believe it suffices, it’s easier to type out, and it nicely delimits the scope of symbols. That lambda was just an excuse to get a Greek letter into it any-way.

Well, I think that in the end, the relationship between the lambda calculus and functional programming languages is completely overstated. In fact, I think the lambda calculus has little to do with ‘functions’ in the mathematical sense. What is a function? Commonly a function in mathematics is just defined as a collection of pairs such that no first element of those pairs is ever paired with two distinct second elements, though the converse needn’t be true. So it’s just a pairing of elements from the domain of the function to the image of that function. Are lambda abstractions a collection of pairings? Maybe they are in some interpretation, but I prefer to see lambda abstractions as a way to get those pairings. They aren’t functions, they’re algorithms, a completely distinct concept: instructions to compute functions. They’re just a list of instructions some-one or some-thing can follow; you start with an arbitrary first element of such a pair, follow the instructions, and you end up with the second. Algorithms can thus be used as a way to define functions, by simply giving their algorithms, a practice with the immediate advantage that using the function then becomes an easy task. A disadvantage is that algorithms are often very long, and not all functions can be expressed as algorithms.
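
To make that distinction concrete, here is a small sketch in Scheme (the language I’ll lean on later any-way): two procedures embodying two entirely different algorithms, yet denoting exactly the same collection of pairs, hence the same function, assuming non-negative integers:

;two algorithms, one function: each pairs every n with 2n
(define (double-by-addition n)
  (+ n n))

;a different, slower list of instructions, the same pairs
(define (double-by-counting n)
  (if (= n 0) 0
      (+ 2 (double-by-counting (- n 1)))))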

What is a lambda abstraction?

I’ve yet to come across a standardized and authoritative definition of the concept, but I think we can safely say that a lambda application (x. E) A, with x a symbol and E a string of symbols, must be identical to E with all occurrences of x in that string replaced by A, commonly written as E[x:=A]. That was one of the interesting things about the lambda calculus: all one needed to define all algorithms was basically a way to re-order symbols. Obviously this simple identity is not satisfied any more by a lot of languages that use the lambda keyword to introduce function literals; in fact, a lot have a well-defined evaluation order because the Church–Rosser theorem, which implies the order of evaluation to be irrelevant, doesn’t hold in those languages. The lambda calculus would become a lot more complex if an evaluation order had to be specified, and one often cited beauty of it is its minimalism.
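
As a small worked example of that identity: applying (x. x x) to the identity abstraction (y. y) takes two substitutions, each line below being the previous one with the replacement E[x:=A] carried out once:

(x. x x) (y. y)
(y. y) (y. y)
(y. y)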

Purely functional languages

The poster child of ‘purely’ functional programming languages is often held to be Haskell, named after Haskell Curry, the guy who, as if that weren’t enough already, also got the term ‘currying’ wrongly named after him. Haskell’s notation is a bit strange; to leech the examples from Wikipedia (hereby this document is licensed under the GNU Free Documentation Licence too):

-- using recursion
factorial 0 = 1
factorial n = n * factorial (n - 1)

-- using lists
factorial n = product [1..n]

-- using recursion but written without pattern matching
factorial n = if n > 0 then n * factorial (n-1) else 1

-- using fold
factorial n = foldl (*) 1 [1..n]

-- using only prefix notation and n+k-patterns
factorial 0 = 1
factorial (n+1) = (*) (n+1) (factorial n)

Here we have no less than five definitions of the same function in Haskell. We can group these in a number of ways; what’s interesting is the difference between the inner three and the outer two. The inner three are algorithmic: they define the function by giving a way for some reader, one who understands all those symbols but has an I.Q. of 0 otherwise, to compute it for any value we want. The outer two rely on pattern matching; that is, we simply give a set of relations, two cases each, which the function must satisfy, and the implementation figures out its own algorithm. In fact, seeing that Haskell is purely functional, even though we could have specified an algorithm that was just inefficient, the implementation would be free to change our algorithm to something more efficient, as long as the return value is the same. As long as the pairs end up being identical, clearly we’re not dealing with algorithms, we’re dealing with functions. It’s the hallmark of purely functional programming languages that algorithms can be changed by the implementation.

Some-one once told me that the pattern matching version is just syntactic sugar for the if-then-else version. Well, what is syntactic sugar? You can, in the end, justify in some way that C, or even Haskell, is just ‘syntactic sugar’ for assembly, no? Clearly we need some restriction to that definition. One could propose that needing a Turing Complete model of computation to automate the re-write to what it’s supposedly sugar for exceeds the limits of syntactic sugar; maybe even restrict it more. In any case, it’s quite clear that the C code array[index], being sugar for *(array+index), needs a less powerful re-writing system than all the pattern matching that Haskell supports; to begin with, the order of the patterns in many languages that support pattern matching is irrelevant, and other functions’ patterns, or relationships, can freely be put in-between. In effect, Haskell is a sharp turn away from algorithms. The word programming itself comes from a Greek root which will be more recognisable in the uninflected form ‘prographein’, essentially Greek for the Latin ‘praescribere’, or ‘to præscribe’: simply a set of instructions which tells some thing what to do. The English word ‘program’ also has a certain connotation of ‘a list in which the order of the items is very much relevant’.

Update: some-one also pointed out to me, correctly, that case-statements and pattern matching are different from a true if-then-else in ML-based languages, because if-then-elses require an else-branch whereas case-statements do not, which is essential for the type inference of ML-based languages, showing that it cannot be replaced by a simple ‘internal if-then-else’.

Often, people say that Haskell is based on System F. What is the difference between System F and other lambda calculi any-way? All I know of is the type system, that’s it; so the only reason it could be based on System F, and not the simply typed lambda calculus or the untyped one, is the type system. And that’s more or less the case, I guess: Haskell hardly works with lambda applications and abstractions as its main building blocks, but its type system is indeed based on System F. Indeed, it wouldn’t even be Turing Complete using only lambda terms and that type system; it has to be extended with other constructs.

That’s not to say that there’s any-thing at all wrong with Haskell or other purely functional languages, or that what they do is easy to do; it’s just not algorithmic, and therefore it deals in functions and not lambda abstractions. Haskell’s notation is intended to read like definitions in mathematical textbooks. (A thing that’s always amused me is that they use if-then-else as a language construct, while surely in a purely functional environment it could simply be a three-argument function?)
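
In a strict language the branches would have to be delayed by hand; a sketch in Scheme, with if-function a hypothetical name of my own, where the caller wraps both branches in thunks:

;a three-argument if as an ordinary procedure; without the thunks
;both branches would be evaluated before the procedure is even entered
(define (if-function condition then-thunk else-thunk)
  (if condition (then-thunk) (else-thunk)))

(define (factorial n)
  (if-function (= n 0)
               (lambda () 1)
               (lambda () (* n (factorial (- n 1))))))

In lazy Haskell no thunks would be needed, which is presumably exactly why it could be an ordinary three-argument function there.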

The Lisp language family

One of the oldest language families still in use, and commonly credited with popularizing functional programming, and this is the point where I lose any grip on the meaning of ‘functional programming’. I can see why purely functional languages are functional: they don’t have subroutines as in C, they have functions in the very mathematical sense of the word, doing what functions do; they define pairs of input and output, nothing more. Lambda abstractions, as established before, do a bit more: they also tell you how to calculate that output from that input. But surely I can just keep my C subroutines free from side-effects, except main of course, and have the very same semantics as Lisps; on top of that, Lisps actually allow side-effects to occur. The lexical scope of most mainstream variants today allows for functions to lose their referential transparency by re-defining global variables they use in their body. Some Scheme programmers see this as a mark of either supreme bad style, or simply having picked the wrong language, hence all procedures that allow for this end in an exclamation mark; but Common Lisp programmers do this all the time, and the language is a lot more tailored to it.

So why is it called functional programming then? And what’s the connexion people often point to between purely functional languages and this? Is it some old relic, because Lisp’s just that old and was far more functional than any other language in its early days? Or the back-then prætentious use of the lambda keyword to introduce literal functions? Is it that procedures are first-class objects that can thus be passed as arguments to functions, and dynamically generated at run-time? JavaScript has all those facilities, so why isn’t that functional? Because it simply uses the humble function keyword to introduce literals opposed to lambda?

Let us focus on Scheme, clearly the more functional of the two main dialects in use today, and again leech from Wikipedia to define the factorial:

;using recursion
(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))

I’ll also throw in Scheme analogues of the other variants Haskell had, for comparison:

;using folding.
(define (factorial n)
  (foldr * (list-range 1 n)))

;using list products
(define (factorial n)
  (list-product (list-range 1 n)))

Before I could even start I had to write these (yes, I like coding in Scheme, why do you ask?):

;folds with right-associativity
(define (foldr binary-operator lst)
  (binary-operator (car lst)
                   (if (null? (cddr lst)) (cadr lst)
                       (foldr binary-operator (cdr lst)))))

;create a product out of a list of elements
(define (list-product lst)
  (if (null? (cdr lst)) (car lst)
      (foldr * lst)))

;make a range
(define (list-range start stop . step)
  (let ((step (if (null? step) 1 (car step))))
    (cons start
          (if (>= start stop) '()
              (list-range (+ start step) stop step)))))

Yeah, hate me, you zealots of tail-optimization. Now, the difference between the algorithmic paradigm and the purely functional paradigm is quite clear: in Haskell and similar ML-derived languages, we use [1..n], showing that it’s meant to resemble mathematical textbook notation; it’s a built-in language construct of some sort. Lisp is the antithesis of normal mathematical notation: the notation is purely algorithmic, and consistent above all. An implementation of Scheme will just execute my algorithm; GHC would probably optimize all those variants to the very same algorithm.
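
And for those zealots: the accumulating variant is only a small twist away, a sketch:

;tail-recursive variant: the recursive call is the entire return value,
;so a proper Scheme implementation runs it in constant space
(define (factorial n)
  (let loop ((n n) (accumulator 1))
    (if (< n 2) accumulator
        (loop (- n 1) (* accumulator n)))))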

Now, the result in each is the same, but the example I stole from that free encyclopædia is the præferred one by far, even if I had already written those helpers, which made the other implementations a lot faster to write. They both recurse twice: first to make that list, and second to fold the multiplication operator over it. The diligent reader will also have noticed that I used (null? (cddr lst)) instead of the more intuïtive (<= (length lst) 2); the simple answer is that, again, the entire list has to be traversed to get the length. Of course, this one will give an error if the length is 0 or 1, but folding a binary operator over a list of 0 or 1 elements has no meaning. And I tend to write my code very insecurely, with very little error-catching, if the alternative is that I have to check each and every time; I’d make sure at another point that I’m never folding a list with fewer than two elements. I also claimed all these functions were the same; that’s not true, some will give an error when trying to get the factorial of a number lower than two, and I’m not exactly sure how Haskell‘d handle this one. Of course, simply using:

;using apply
(define (factorial n)
  (apply * (list-range 1 n)))

also does the job, and is probably the most efficient option; which is, by the way, what I find to be the true distinguishing strength of Lisps: not first-class functions, but homo-iconicity (what also helps is that multiplication can take as many arguments as it likes; it folds on its own). Interestingly, this implementation also returns the correct result for the factorial of 0, because (list-range 1 0) just happens to evaluate to a list containing only the number 1. This is hardly a thing I intended when I made the algorithm; I didn’t care, because placing the lower bound higher than the upper is just nonsense, and I could have let it return an error. But this shows the functioning of algorithms: there’s no deep underlying mathematical principle behind the correctness of this factorial algorithm, it just works, and by sheer dumb luck.
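
To illustrate that parenthetical remark, a few sketched evaluations:

;multiplication folds over however many arguments it is given,
;and with none at all it returns the multiplicative identity
(* 1 2 3 4)     ; => 24
(*)             ; => 1
(apply * '(1))  ; => 1, which is why (factorial 0) comes out right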

So, at the least I’ve hopefully made it compelling that ‘functional programming’ isn’t as much one paradigm as many people think. I’d personally say that Lisps are closer to Fortran than they are to ML-derivatives like Haskell, and those again are in fact quite close to Prolog: in the former ‘functional paradigm’ we express algorithms, in the latter we define relations and query over them.

And lambda-Scheme?

To satisfy the idea of the lambda abstraction, we would have to remove the constructs define and letrec, and obviously any-thing that ends with !, from the language. The last is obvious; the first two are more subtle: they enable a function to occur within its own defining instance. Within a lambda expression, the symbol referring to that function is either bound in the ‘parent’ function, or simply free; in either case, the semantics are different.

…Or so we may be naïve enough to think: what if we just already had a symbol, let’s say fact, that evaluated to the factorial function, which we then used in the definition of our local factorial function? We can easily say that letrec without a symbol fact already bound in a higher scope has the very same semantics as let in our hypothetical situation. Some would say that we would need letrec any-way to make it. A: this is of course wrong, we can use a fixed point combinator; B: the same applies to any primitive function like +, we would need it defined to define it.
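
A sketch of point A in Scheme, using the strict-language variant of the fixed point combinator, usually called Z; note that neither define below refers to its own symbol in its body, so both are mere abbreviations, symbols rather than variables:

;the Z combinator; the (lambda (v) ...) wrapping delays the
;self-application, which would otherwise loop in a strict language
(define Z
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

;factorial without letrec and without any self-reference by name
(define factorial
  (Z (lambda (self)
       (lambda (n)
         (if (= n 0) 1 (* n (self (- n 1))))))))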

Problems with recursion

We know that in the lambda calculus we have to use a fixed point combinator because we use symbols as opposed to variables; all functions are literal and anonymous. The quæstion is if allowing a function to call itself in its body violates the notion of a lambda abstraction per se. One can call functions inside the body of a lambda abstraction, functions which needn’t have been defined before; in fact, this is a quintessential trait of many purely functional languages, to use functions before one defines them, or rather, there is no ‘before’ and ‘after’. I wouldn’t know; it depends on one’s definition of lambda abstraction again, though allowing for it would make them more complicated than they traditionally were, again pointing to that minimalism. Consider notation such as:

T := (x y. x)
F := (x y. y)
...

where the dots would be a body that enables our shorthand. This is often seen as just defining a quick substitute; what a lot of people seem to forget is that this can in fact be expressed without any meta-language, completely in the lambda calculus itself:

(T F.
...
) (x y. x) (x y. y)

where again the dots signal the scope of the body of our shorthand, from the very definition of a lambda application. And of course, recursion without fixed point combinators cannot just be reduced to a simple lambda application any-more: it’s not a definition, it’s a substitution, hence they are symbols, not variables.

What is the difference between a symbol and a variable then? A naïve and simple definition is that a variable is… variable, and a symbol is not. However, I’m personally in favour of a stronger distinction: a symbol can always, without change of meaning, be exchanged for its value. Therefore, symbols truly are just there to make things more readable; they aren’t strictly needed.

ML-derived languages, being based on the non-Turing-Complete System F typed lambda calculus, however even need variables in lieu of symbols to be Turing Complete. Lisps can evade this by using fixed points, but a fixed point combinator cannot even be implemented in ML-derived languages without functions being able to already refer to themselves before they are defined.

Does the term ‘functional programming’ have any meaning?

I’ve earlier on already drawn some comparisons between Scheme and JavaScript; the former’s often held to be of the functional family while the latter isn’t. Many’d probably hate me for comparing what is often held to be a work of art and elegance to the often-abused scourge of DHTML scripting, but humour me.

Exactly what functional facility does Scheme have that JavaScript lacks? Scheme doesn’t have pattern matching or type inference like purely functional languages tend to have. It supports various features that a lot of C-style languages lack, such as dynamically creating functions at run-time, functions that can consume or return functions, closures, anonymous functions; but really, so does JavaScript. JavaScript is often held to be object-oriented, but Alan Kay, the man who coined that very term, disputes this, and in fact said, ironically:

OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.

Alternatively, we could try to establish that control structures in Scheme are fundamentally different from JavaScript’s: we have (if condition then-branch else-branch) in Scheme versus if (condition) then-branch else else-branch in JavaScript. At first, the former appears to be like any function in Scheme, while the latter has a unique syntax to it, as is normal for control structures in JavaScript. However, the syntax is all the first example shares with functions; the Scheme specs are pretty transparent about it not being a function (procedure). In addition to not evaluating all of its arguments, it can also not be passed to procedures as an argument itself, or be returned from one; therefore, even if it were a procedure, it isn’t first-class, which removes any argument we could make about it in this context.
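
A sketch of that last point: if itself cannot be handed to another procedure, but an eta-expanded wrapper (select is my own name for it) can be, at the price of always evaluating both branches:

;(apply if (list #t 1 2)) is a syntax violation: if is not a procedure
;select is one, and thus first-class, but it evaluates both branches
(define (select condition consequent alternative)
  (if condition consequent alternative))

(map select '(#t #f) '(yes yes) '(no no)) ; => (yes no)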

But obviously, even though the capabilities are the same, Scheme is traditionally used in a much more functional manner; JavaScript code is generally filled with assignment statements, while a simple search on ! could provide us with a proof that a Scheme source does not contain any. But why? It’s fairly obvious that Scheme’s syntax lends itself a lot better to the functional paradigm, but why? Considering that

function factorial(n) {
    var output = 1;
    while (n > 1)
        output *= n--;
    return output;
}

is in fact easier to do in the form:

function factorial(n) {
    return (n == 0) ? 1 : n * factorial(n - 1);
}

Of course, the same choice of flavour is even offered in C, but that doesn’t support first-class functions and closures, et cetera; which leaves me but to wonder why the recursive expression of the factorial is so often seen as a typical example of functional programming, as many C programs also use it. The latter example also shows that if-then-else can be treated as a proper expression, and of course:

function factorial(n) {
    return arrayRange(1, n).fold(multiplication);
}

is also supported. We would of course have to define arrayRange, prototype fold onto arrays, and create a function multiplication, as, unlike in Lisp, there is a proper distinction between operators and functions; so maybe that’s it? I believe that ML-based purely functional languages also treat them differently, though. In case you wonder why I didn’t write those functions out here: I don’t like coding in JavaScript at all.

So, what is it? Maybe Lisp lends itself better to functional programming because of its use of lists opposed to arrays? Lisp has arrays though, they’re just called vectors; likewise, the OO paradigm of JavaScript allows for an easy implementation of lists. There seems to be a trend amongst languages associated with functional programming to use lists in favour of vectors, so that could be it; they do make recursion a lot more convenient.

Though the comparison of Scheme and JavaScript naturally doesn’t hold for all functional languages against all procedural languages, it does show that languages themselves aren’t as much functional as the way people use them is, often for no good reason. The true difference between Lisps and JavaScript to me is that the former’s homo-iconic; the obvious difference between Lisps and MLs being static typing, type inference, and above all pattern matching.

And ‘declarative programming’?

This term also surfaces a lot, and I find it quite possibly even vaguer than ‘functional programming’. According to Wikipedia:

Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:

  • A program that describes what computation should be performed and not how to compute it
  • Any programming language that lacks side effects (or more specifically, is referentially transparent)
  • A language with a clear correspondence to mathematical logic.

These definitions overlap substantially.

These are conveniently listed in ascending order of vagueness, and the first is pretty vague to begin with. The only sensible reading of the first I could give is to not use algorithms at all; this would make imperative programming synonymous with algorithmic programming, or simply… programming? As the word ‘imperative’ already hints, you tell a computer what to do. Purely declarative programming would thus be describing what relationships your solution must satisfy, and telling the computer to give it; an art in itself, no doubt. We can clearly see this in Haskell’s pattern matching and type inference: we just gave two relationships a factorial function must satisfy, and Haskell gives us that function. Or maybe not? Seeing the quintessential:

-- List reversing function
rev [] = []
rev (s:xs) = rev xs ++ [s]

To me, this is on the line between algorithmic and non-algorithmic. The argument of the second equation is given as a pattern, with s:xs standing for cons(s,xs); surely this is enough information to unambiguously tell Haskell that the argument must be a list, and to bind the head to the symbol s and the tail to xs? In Scheme we would have:

;more obviously algorithmic
(define (rev lst)
  (if (null? lst) lst
      (append (rev (cdr lst)) (list (car lst)))))

How much does that decomposing argument compromise its status as an algorithm? The way it reads there, it is in fact a mathematical equation, a property the function must satisfy; but could it just be syntactic sugar for a decomposition of the argument? Let us after all not forget that in Scheme, in reality, all procedures are single-argument functions: they take a list as argument. Code like (lambda (x y) (+ x y)) can also be expressed as (lambda lst (+ (car lst) (cadr lst))), so clearly the list is bound to a pattern here; the only difference between these two literals is that the former raises an error when not applied to exactly two arguments, while the latter simply throws any further arguments away. So we could change our function to:

;without argument patterns
(define (rev . lst)
  (if (null? lst) lst
      (append (apply rev (cdr lst)) (list (car lst)))))

The difference in semantics is now of course that it expects (rev 1 2 3) instead of (rev (list 1 2 3)). The pattern matching also does not go infinitely high up: while (lambda (x y . z) expression ...) is acceptable in lieu of (lambda lst expression ...), (lambda ((x . y) z) expression ...) simply raises an error instead of producing a procedure that demands, of the list it is applied to, that its car be a pair and its cddr both exist and be (). It would also be worth noting that in Haskell rev xs ++ [s] = rev (s:xs) isn’t allowed, so clearly this = symbol here is not our very symmetric mathematical identity relation. It’s often called a ’single assignment’, but how much can we really ‘assign’ a value to a function applied to an argument, whose argument is a pattern which is decomposed in the value of the assignment itself? I’m not really sure what it is; it’s neither a true assignment as set! is, nor does it define a true æquality relation.
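
To see those argument patterns in action, a few sketched applications:

((lambda (x y) (+ x y)) 1 2)                  ; => 3
((lambda lst (+ (car lst) (cadr lst))) 1 2)   ; => 3
((lambda lst (+ (car lst) (cadr lst))) 1 2 3) ; => 3, the 3 is thrown away
;((lambda (x y) (+ x y)) 1 2 3)               ; error: wrong number of arguments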

However, in Lisp the notation (a . b) cannot be used to create a pair. Lisp’s syntax itself is composed of lists, and therefore also of pairs; it’s not an operation, it’s syntax. In ML-derivatives, a:b is an actual cons operation which may be used to return a consed value. The æquivalent in Lisp would be (cons a b), or alternatively (cons . (a . (b . ()))), which is evidently a completely different pair from (a . b) itself.
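
A small demonstration of that difference:

(cons 'a 'b)  ; => (a . b), an operation that constructs the pair
'(a . b)      ; => (a . b) as well, but the reader built it: it is syntax
;(a . b)      ; unquoted, this is not even a valid expression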

The second definition on the list, having no side-effects, is even more vague. Of course, this assumes our language has some intuïtive correspondence to the function or the subroutine: that a thing can be ‘evaluated’ to produce effects along with the value of its evaluation. But what is a ’side-effect’? If I reduce (n. n (x. F) T) (x y. y) to T, then this would take me 3 reductions. How do I know this? Well, every reduction obviously had an extra ‘effect’ besides reducing: it also incremented a counter. And this brings us to the vagueness of ‘effect’: that abstraction cannot reach that ‘effect’, can it? The program can’t either; is expending cycles on a CPU an ‘effect’?

Referential transparency is clearer: it simply says the return value of the function or subroutine must always be identical given the same arguments. Well, what are arguments? Depending on the context of that lambda term above, surely T may differ; see the above example. So sending out side-effects that one cannot reach can’t be defined, and is related to other similar problems in metaphysics (how can I know whether an Angel dies every time I take a dump, if Angels are invisible to man?), and receiving side-effects can just be seen as arguments. Furthermore, side-effects which only happen inside functions, to local variables, may make a difference algorithmically, but not functionally: it results in the same collection of pairs.
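
As a sketch of that last claim: the following factorial clobbers a local variable with set! on every pass through the loop, yet seen from outside it is the very same collection of pairs as the recursive one:

;internally imperative, externally the same function
(define (factorial n)
  (let ((accumulator 1))
    (do ((i 2 (+ i 1)))
        ((> i n) accumulator)
      (set! accumulator (* accumulator i)))))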

It may not be referentially transparent for instance in PHP to write:

// raises a global value by a given argument and returns the result
function raise_by($raise_by) {
    global $global_value;
    return $global_value + $raise_by;
}
echo raise_by(5);

as it’s not immediately clear from an application of the function itself, unless one knows the definition; but it does the same as:

// raises a value by a given argument and returns the result
function raise_by($value, $raise_by) {
    return $value + $raise_by;
}
echo raise_by($global_value, 5);

However:

// updates a counter by a given value and outputs the new value
function increment_by($increment_by) {
    global $global_value;
    return $global_value += $increment_by;
}
echo increment_by(5);

This is the point where we cannot reduce it any-more: it not only kills an Angel, it also has access to how many Angels it has killed. So clearly only outward, or only inward, does not break referential transparency, but both at the same time surely does.

The third property is a good example of the Law of Joost: ‘There are things that are correct, things that are wrong, and, even worse, things that aren’t even wrong any-more.’ The first to formalize it gets an honourable mention in this blog that no one reads, and whose few readers either hate it or do not understand it.