nihil architecture blogs

content

Why I dislike LaTeX on a fundamental level

LaTeX is pretty much the standard for publishing mathematical documents, a pretty old standard too, and one I dislike on a fundamental level, already I should have acquired copious amounts of nerd-rage by writing this, but you know it turns me one when you guys are mad at me.

My main dislike is hardly the quality of the output—though that could really use some work too, despite what the liberal media may tell you, it’s far from the quality of a professional typesetter—it’s the language used to specify the output. The language to me is so daft and devoid of any reasonable thought, it feels like gotos all over the place again.

The year is 2010, Dijkstra has won, few modern languages still support the goto, those that do often require the code to have special privileges and/or be explicitly placed in some ‘unsafe’ clause. However there are more things to healthy programming, say I’m programming some code which does some-thing with prime numbers, tries to find some pattern in it, just hypothetically, say that initially, I’m not that ambitious, and I investigate only the first 1 000 000 primes. Afterwards, I’m on a lead and I want to scoop it up a notch and go for the first 10 000 000, surely, if I programmed it scalably, I would only have to change the number 1000000 to 10000000 once in my code and the rest adapts right? If I run a gravity simulator, say I want to then try it for two times as much gravity? If I programmed this well I would have to change that number only once right? If I make some program to output a web-page, and I suddenly decide to output XHTML instead of HTML, surely if I did this well I would only have to change one switch some-where and the rest follows?

Okay, so let’s port this idea to LaTeX, a real life example, a friend of mine was once collaborating, he used em-dashes in his version, his parter thought they were too prætentious, so he changed them to commata. He had to edit every single em-dash, a simple substitute didn’t work because it turns out they used --- in more places than em-dashes. It was a frustrating task I can imagine. Now, it’s quite possible to just use \medium-pause every time and let this be output as em-dashes or as commata at one’s pleasure. But this is seldom done, and LaTeX certainly doesn’t make this very convenient nor encourages this. Another quite simple example would be that you’re writing some-thing which uses the speed of light, and in a second edition you find out that the symbol conflicts with some other symbol in the same formula, you’d wish you had used s_l there or some other thing instead of c the entire time, now you have to edit it, probably missing a few, and introducing errors. In my world, I would have used some-thing like [lightspeed] from the start and bound that concept to a symbol at only one piece in the entire code, which is then easily changed. Alternatively, you might find out your publisher’s style doesn’t really like dots and wants crosses or spaces for multiplication, you’re stuck on editing that. You could just search and replace, but of course you want to check. Isn’t it easier to just specify multiplication each time and define the symbol associated with that at some higher level?

Another part we’re always getting hot about is re-use of code, Two or more, use a for!, LaTeX isn’t particularly friendly about this thing either, often you will find multiple æquations which re-use some basic concept into it which often leads to copy-ing and pasting of text. I once learnt that as soon as you copy and paste your code, you’re doing some-thing wrong and I agree with this maxim. Consider this simple example:


$\land$ and $\lor$ distribute over each other, more formally:
\begin{eqnarray}
x \land (y \lor z) &=& (x \land y) \or x (\land z) \\
x \lor (y \land z) &=& (x \lor y) \land x (\lor z)
\end{eqnarray}

Splendid, I’d rather just use:


\mutual_distribution(\land,\lor)

And define once how this is output, and then be done with it. I can change that template once then later if I want to make some changes. I’m to be honest surprised that LaTeX automatically enumerates your æquations for you and you don’t have to do that yourself. And yes, it is possible with some hacking around to make such a template, but the syntax for it is so limited and confuzzling that almost no one does it, and it almost makes things less readable to begin with. I tried and I tried to make this happen a little more but the language was quite clearly never designed for it that well, and it’s madness and pain.

LaTeX lacks a \Beta on its own, quite obviously because it looks visually the same as B so why bother? Well, even if you were a dusty computer scientists who has a fetish with not keeping his mind ordered, it still begs the quæstion in what typeface?, it’s not unlikely that a typeface exists which for clarity puts Greek letters in a slightly different makeup, and I would not think this would be a bad idea, differentiating A and ? in various manuscripts can be very handy. And this requires extensive workarounds in LaTeX, to be expected from a tool coming from a time which prided itself on writing all possible code with the mentality of not looking ahead and taking into account the future. LaTeX code manages to be a reasonable write-only language, an impressive feat considering you don’t even have to mentally follow the flow of computation, that its commenting syntax is so verbose that it discourages people from using them might be another part of the problem.

Another thing I don’t like is how it enters and closes math-mode, sure, if math-modes can’t nest? Why not take the same open and closing tag right? I am sure to speak for all of us when I say that never has debugging been such a pain as LaTeX incorrect LaTeX math-mode closing, the error messages are so enormously cryptic by misplacing a dollar sign that you often have no idea exactly which it is you misplaced and often when you got it to work you still have no idea what you did wrong. It’s like rebooting your computer when some-thing went wrong and you’re still not exactly sure on how you fixed it. It turns out that closing tags help error messages and help you find them. Maybe this is Donald’s savant mind speaking who makes no errors, but for us mortals this is damned annoying.

It’s quite clear that the base of this language was designed in a time when people were practically still cave-men scratching COBOL code in granite walls to keep track of how many mammoths they hunted down.

Finally, on a more meta-level. LaTeX, well, just PDF or Postscript really, defines a lot of things which I feel a document ought not define. Yes, they are languages to specify the layout for a printing place, every pixel, every vector, and I feel that the only thing that should ever receive that data is the printer, or the person who checks it before it goes to the printer. The consumer should never receive that on his or her computer. Things like font size, font type, letter spacing, colour, text width, line spacing, these should all be properties of the viewer not of the document it reads. Supposedly LaTeX is about legibility, and it has used some archaic myths about what is optimal legibility to algorithmically force that down our throats, in reality probably more Donald Knuth’s præferred style. In the end, legibility is subjective, dependent on person, varies through locus and time, has a genetic and an environmental component, what one person considers legible another does not. So people should be able to adjust those settings in their viewer, computers are powerful enough to calculate that on the fly nowadays. Especially since e-readers are coming up, why not use that advantage?

Some people have already done this, it’s called separation of præsentation and content, take apart what you are saying semantically, and how you are layouting what you say for the most part as you can. It’s scalable, it takes into account that the house style of the publishing magazine may later on change, it enables the same document to easily be published in multiple styles depending on the target audience, it also gives the audience itself control over the style if they receive it in digital format, and so on. I for one hate with a passion that new misodendric hype of basically 1 em line spacing in papers. I’m still not really sure if they actually hit enter twice after every line or make some switch that does that automatically for each line, but as they use LaTeX I would not be surprised if they actually accomplished their treeslaughter by the former. Every time I see it, I would kill to have my PDF viewer implement a simple switch that could take that away, it reads both annoyingly and I’d like my grandchildren to have a plentiful supply of oxygen.

Restrictions and clearer languages

After reading about this new C# language, some design choices of it baffled me, which were part of a new trend, I’m not talking about lambda abstractions, about some object oriented features, about things that drastically reduce performance, I’m talking about if(x = y) being illegal. Conditionals in C# apparently require their value to be of the type bool, if the compiler cannot prove they are of type bool, this is considered a static compile time error.

Now, to some programmers, if(x = y) would be naïvely perceived as a typo, or applying basic rules to a C-syntax. But it’s actually quite possibly intended code in many languages. C popularized an interesting concept, assignment wasn’t a statement, but an expression, it had a well defined value, namely, the value that was being assigned. This allowed for ‘chain assignments’ : a = b= c = d, and on most architectures this maps most efficiently to machine operations. It’s also a rather common way in JavaScript to loop over some list, and for that reason the ‘next’ operation yields null rather than an error if there is no next, stopping the loop.

It’s also to many programmers a source of frustration, typos can often lead to unintended code, if(x = 5) is of course always true. Though a lot of programmers have taught themselves to use if(5 == x) to catch those errors early. So C# has chosen to favour the latter problem, a restriction on power and expressiveness to protect people against themselves. Even though the problem could have just as easily been avoided by using := for assignment, oh well.

And this seems to be the trend more and more, restrictions and restrictions on power to protect people ‘against themselves’, my rants on psychiatry and society should make it clear that I am generally opposed to protect people ‘against themselves’. Let people do what they want, maybe they do a thing they’ll regret, maybe they’ll stop doing it afterwards. It’s better to teach a child to not touch a pan by letting it burn its fingers then by just saying ‘don’t do it’ and the child having grown up knowing it shouldn’t do it, but never really knew why. And this is also exactly what these restrictions do, instead of teaching people good programming, and the theory behind it and why they shouldn’t do certain things, you just don’t allow them to do it, they won’t do it because they have no choice, and they’ll never understand why they shouldn’t and they’ll be kept ignorant forever. My hunch is that there are more programmers on the planet that don’t realize why they use == ‘in an if-statement’ and = outside it than those who realize why this is done. Which really was one of the beautiful things of C, it gave the programmer choice, and indeed, at some points it does pay to be able to use assignment inside a conditional. while(obj = array[i++]) ... is code which I very commonly use in languages where accessing outside of the bounds of an array yields null rather than an error, and it probably was designed so for this purpose. If this C# idea catches on, people will probably be kept more and more ignorant and in the end completely be obscure to the fact that== is a binary operator, just like +.

A thing people often misunderstand about this issue is static versus dynamic typing, often things like In statically typed languages, variables have a type, in dynamically typed languages, values have a type. are uttered, quite a crude approximation. Static in most context is the same as ‘lexical’, a statically typed language is designed so that ‘types’ can be inferred from analysing source code alone without running it or computing any values. Static typing is older, and less powerful than dynamic typing, but also more restrictive. Static typing is also easier to implement, for this reason, when the first high level languages came, people had the interesting idea of coming up with ‘type declarations’, so that a compiler could analyse these during compilation and safeguard a programmer against errors whose result may be well defined in machine term operations, but whose result is also ‘nonsensical’ to any human reader. Applying float addition to integers is a perfectly defined operation that produces a new natural number in binary base which you can interpret as whatever you want, a character, an unsigned integer, a signed integer, a float, a four-character ASCII string, it’s all about interpretation of that scalar ordinal value. However in any of these interpretations, to human readers it’ll most likely not make any ’sense’. So if in the source code of programs variables of the ‘wrong’ type were used together, lexically, the code wouldn’t compile, and signal an error. In static typing, this can just be inferred from source code.

Dynamic typing is some-thing different, more powerful, and more dangerous, ‘dynamic’ in this context has the usual meaning of ‘only known when the code is run’, in dynamic tying, values carry a label, which is checked when operations are applied, if it doesn’t match, a runtime error or exception is raised. Dynamic typing was a novelty due to Lisp, a language in which source code itself was dynamic, a lexical analysis was insufficient to determine if types wouldn’t conflict, because the source code itself was no longer static. Other languages adopted this then-performance-costly idea because of the raw power it afforded, the downside is that programmers again had to manually verify their code and follow the logic of it to ensure that type errors would not occur. Which in statically typed languages can be proven by the compiler.

And proven is a very important word here, it can be proven by a compiler, but not decided, it’s theoretically impossible to decide if a type error is going to occur lexically. Where deciding means that all programs you reject will have type errors, and all you accept will not. Compilers only offer the guarantee that accepted programs will not have them. It’s quite possible that a rejected program will also not have it, but as long as the compiler can’t prove it, it will reject. This is a pretty awkward mentally. ‘I do not accept your program, for I cannot show it will work.’, rather than ‘I do not accept, for I can show that it will not work’, however, though the latter path is also possible, choosing it will necessarily leave open the option of type errors. A trivial example would be:

{int a, b; float c; a = true ? b : c;}

This will usually be rejected in statically typed languages, even-though a type error will never occur from this, of course, this example is quite useless but a more useful example would be:

function concact(x,y) {
if("array" == typoeof x && "array" == typeof y)
return (new Array).append(x,y);
else if ("string" == typoeof x && "string" == typeof y)
return x + y;
else if("number" == typoeof x && "number" == typeof y)
return x + y;
else return null;
}

Dynamically typed languages often offer facilities for runtime type-checking to choose a program flow, the same thing happened as above, a statically typed language would reject this example, even though for whatever input it may get, a type error cannot possibly occur within this function, a compiler has no way to prove this algorithmically from a lexical analysis. Lexically the function + which applies to two numbers is applied to the same variables in the same lexical environment whereon array_append which applies to two arrays is also applied. The compiler cannot prove no type error is going to occur. Many people who coded in ‘archaic’ static languages often felt the pain of having to re-write and overload functions that do the exact same thing that have to deal with both integers and floats, but templates and generics did offer a marginal solution.

The real rescue for this problem in statically typed languages is parametric polymorphism, where types are not constants, but variables. The function above is then ‘typed’ as simply ‘taking two identical types, producing a value of same type, or null’.

Static typing is not only a restriction on programmers to ‘protect them against themselves’, it’s also taking with it a lot of things that need no protecting. It’s banning cars because some of them are unsafe, and taking the safe one’s with it because you have no way to infer præcisely which are safe.

Other languages have evolved a different tactic, they aim to make their syntax ‘clear’, a well known and ridiculously failed example is a certain language where the now ubiquitously readable action of C++; would be similarly expressed as ADD 1 TO COBOL GIVING COBOL, as it turned out, use and conventions make readability, not natural language. Though the latter example is a lot clearer to people never having programmed. The sheer mass of exposure to the former form makes it immediately clear. A dynamically typed and less extreme example would be:

def factorial(n):
if x > 1:
return factorial(n - 1)
else:
return 1

Python, often using English words, some-what reads like English, define factorial in n: If x is greater than 1, return factorial of n minus 1, else return 1. It has shown that it helps some people to read the code aloud in their mind, one of the ancestors of Python has a wholly different vision:

(define (factorial n) (apply * (range 1 n)))

Or at least how I would write it, people often say that Lisp is ‘unreadable’ but I beg to differ, in my opinion Lisp code is exceptionally clear, consistent, there is no such thing as operator præcedence, not only is the delimiting of lexical blocks explicit, the limit of an expression is, the range of an operation is and so on, with syntax highlighting there is no confusion if the code you edit is still inside of the scope of your control structure and so on, the one thing is, it’s impossible to read it like English. Why does infix notation exist even though it’s inconsistent, is it more clearer due to conventions, or because x = y can be read as ‘x is y’ in English?

I know that to be able to read it as English or to be able to use layout indentation doesn’t work for me. I find myself counting invisible characters in Python and then summing them to determine block structure. I like the fact that in Scheme, what it does is clear from a simple rule that requires no thought at all, the value of evaluating a list is the evaluation of the head as an expression applied to the evaluation of the other forms. But I’ve found that most people like to see ‘a variable’ as being identical to the value, rather than evaluating to a value.

Another thing lately while discussing some features of Clojure is when I found that most Clojure programmers see ‘a list’ as ‘an ordered collection’ for all purposes not that different from a vector. Thinking about it that way just gives me shrivels when programming, ‘a list’ to me is a binary tree which on its outer right leave points to a special nil constant, or that constant itself. This is what I ’see’ conceptually when I work with lists, when I get their head or their tail. I also found out there an interesting differing view in that those that advocated against using cadddr in lieu of simply fourth found themselves translating ‘the fourth item of the list’ to ‘the head of the tail of the tail of the tail’ of the list. Where those that insisted on cadddr being clearer found themselves wanting ‘the car of the cdr of the cdr of the cdr of the pair’ and thinking ‘That would be uumm, the fourth of the list.’. The Zen of Python famously includes There should be only one obvious way to do it., I find that when people have such conceptions about ‘clear code’, they forget that different people have a differing view on ‘clear code’, many people’ll call APL, Forth or Lisp a mistake because the code is unreadable, users of either often passionately disagree and say a program is instantly readable and very clear. The fact that these languages have endured so long should point to this. But they do have in common that they can’t be read out loud in English. I’m a visual thinker, most people I know that defend Lisp are so likewise, maybe that explains some things. I never ‘read my code aloud’, I don’t think English is a very good language to express mathematical or sequential logic in. Notation like forall x forall y : ( forall z : ¬(z in y)) -> x + y = x is a lot clearer than ‘For all x for all y, if for all z, z is not in y, then x plus y æquals x’, but this is often shortened down to ‘x plus zero is always x, where we encode zero as the empty set.’ Still, that doesn’t define the idea ‘empty set’. I personally rather see code as what it is, a series of expressions evaluating to a value, that value then taken as the value of an outer expression to work with. Different people, different methodology. Which is also a reason I don’t think people should be restricted, different restrictions work differently for different people.

From my perspective though, seeing code as English is a bad idea and I can imagine that people make errors like if(x = y) where they mean if(x == y), once you see them both as binary in the same vein of + where the former simply evaluates to its second operant always and has the side effect of changing the value of the memory location the first points to that of the second. And the latter operation evaluates to true or false depending on the æquality of both. But some instead ‘choose’ to read if(x == y) a = b; as ‘if x is y, then a is b’, which is where the trouble begins is my hunch. It’s a very approximate and awkward simulation of assertive mathematics, programming language are not assertive and do not constrain their variables concordantly the truth of the termination value.

Of course, I said I was averse to protecting people ‘from themselves’, I have nothing against protecting people ‘from others’. I think it’s perfectly acceptable that an aviation company demands its software be written in a very type safe and very restrictive language. But ideally this should already be done by people who’ve learnt the hard way what a simple typo can cause and understand why the restrictions are there and what exactly they prævent. Many people nowadays start learning programming with C#, ideally people should be taught programming in a language without any type checking at all to firmly grasp the difference between float and integer operations.

On functional programming

Aah, functional programming, a source of elitism, to separate the common from the elite, or according to the common just intellectual masturbation. What it is no-one really knows, endless wars fought over which languages are functional and which are not…

Now, functional programming seems to have some intrinsic link to the lambda calculus, many programming languages even introduce functions by a  ’lambda keyword’ for no good reason at all, just to be different, for the rest of this discourse I shall use the notation (x.. . E..) in favour of λx.. . E..—I believe it suffices, it’s easier to type out, and nicely delimits the scope of symbols. That lambda was just an excluse to get a Greek letter into it any-way.

Well, I think that in the end, the relationship between lambda calculus and functional programming languages are completely overstated. In fact, I think lambda calculus has little to do with ‘functions’ in the mathematical sense, what is a function? Commonly a function in mathematics is just defined as a collection of pairs in such a way that no first element of those pairs is ever paired with two distinct second elements, though the converse needn’t be true. So it’s just obviously a pairing of elements from the domain of the function to the image of that function. Are lambda abstractions a collection of pairings? Maybe they are in some interpretation, but I more like to see lambda abstractions as a way to get those pairings, they aren’t functions, they’re algorithms, a completely distinct concept. They are instructions to compute functions. They’re just a list of instructions some-one or some-thing can follow, you start with a random first element of such a pair, follow the instructions, and you end up with the second. Algorithms can thus be used as a way to define functions, by simply giving their algorithms, this practice of course has the immediate advantage that using the function then becomes an easy task. A disadvantage is that algorithms are often very long and not all functions can be expressed in algorithms.

What is a lambda abstraction?

I’ve yet to come to a standardized and authoritative definition of the concept, but I think we can safely say that a lambda application (x. E) A with x a symbol and E string of symbols must be identical to E with all occurrences of x in that string replaced by A. Commonly written as E[x:=A]. That was one of the interesting things about lambda calculus, that all one needed to define all algorithms was just basically a way to re-order symbols. Obviously this simple identity is not satisfied any more by a lot of languages that use the lambda keyword to introduce function literals, in fact, a lot have a well-defined evaluation order because the Church-Rosser Theorem—which implies the order of evaluation to be irrelevant—doesn’t hold on those languages. Lambda calculus would become a lot more complex if an evaluation order had to be specified and one often cited beauty of it is its minimalism.

Purely functional languages

The poster child of ‘purely’ functional programming languages is often held to be Haskell, named after the guy who if it wasn’t enough already got the term ‘currying’ named wrongly after him. Haskell’s notation is a bit strange, to leech the examples from Wikipedia: (Hereby this document is licensed under the GNU Free documentation licence too)

-- using recursion
factorial 0 = 1
factorial n = n * factorial (n - 1)

-- using lists
factorial n = product [1..n]

-- using recursion but written without pattern matching
factorial n = if n > 0 then n * factorial (n-1) else 1

-- using fold
factorial n = foldl (*) 1 [1..n]

-- using only prefix notation and n+k-patterns
factorial 0 = 1
factorial (n+1) = (*) (n+1) (factorial n)

Here we have no less than five definitions of the same function in Haskell, we can group these in a number of ways, what’s interesting is the difference between the inner three and the outer two. The inner three are algorithmic, it defines it by giving a way to some reader who understands all those symbols but has an I.Q. of 0 otherwise to also compute the function for any value we want. The other two rely on pattern matching, that is, we simply give up a set of relations, two cases in each, with that function must satisfy and the implementation figures out its own algorithm, in fact, seeing that Haskell is purely functional, even though we could have specified an algorithm that was just inefficient, the implementation would be free to change our algorithm to something that’s more efficient, as long as the return value is the same, as long as the pairs end up being identical, clearly we’re not dealing with algorithms, we’re dealing with functions. It’s the hallmark of purely functional programming languages that algorithms can be changed by the implementation.

Some one once told me that the pattern matching version is just syntactic sugar for the if-then-else version. Well, what is syntactic sugar? I mean, you can in the end justify in some way that C, or even Haskell, is just ’syntactic sugar’ for assembly no? Clearly we need some restriction to that definition. One could propose that needing a Turing Complete model of computation to automate the re-write to what it’s supposedly sugar for exceeds the limits of syntactic sugar. Maybe even restrict it more, in any sense, it’s quite clear that in the C code array[index]—being sugar for *(array+index)—needs a less powerful re-writing system than all the pattern matching that Haskell supports, to begin with the order of the patterns in many languages that support pattern matching is irrelevant, and other functions patterns, or relationships can freely be put in-between. In effect, Haskell is a sharp turn away from algorithms, the word programming itself comes from a Greek root which will be more recognisable in an uninflicted form ‘prographien’, essentially Greek for the latin ‘praescribere’ or ‘to præscribe’, simply a set of instructions which tells some thing what to do. The English word ‘program’ also has a certain connotation to ‘a list in which the order of the items is very much relevant’.

Update:, some-one also pointed out to me correctly that case-statements and pattern matching are different than a true if-then-else in ML based languages because if-then-else’s require an else-branch whereas case-statements do not, which is essential for the type inference of ML-based languages showing that it cannot be replaced by a simple ‘internal-if-then-else’.

Often, people say that Haskell is based on System F, what is the difference between System F and other lambda calculi any-way? All I know is the type system, that’s it, so the only reason it could be based on System F and not simply typed lambda calculus or untyped, is the type system. And that’s more or less the case I guess, Haskell hardly works with lambda applications and abstractions as its main building blocks, but its type system is indeed based on System F. Indeed, it wouldn’t even be Turing Complete using only lambda terms and that type system, it has to be extended with other constructs.

That’s not to say that there’s at all a thing wrong with Haskell or other purely functional languages, or that it’s easy to do, it’s just not algorithmic and therefore it deals in functions and not lambda abstractions. Haskell’s notation is intended to read like definitions in mathematical textbooks, (a thing that’s always amused me is that they use if-else-then as a language construct, while surely in a purely functional environment it could simply be a three-argument function?)

The Lisp language family

One of the oldest language families still in use, and commonly credited with popularizing functional programming, and this is the point where I’ve lost any grip on the meaning of ‘functional programming’? I can see why purely functional languages are, because they don’t have subroutines as in C, they have functions in the very mathematical sense of the word, doing what functions do, they define pairs of input and output, nothing more. Lambda abstractions as established before do a bit more, they also tell you how to calculate that output from that input. But surely I can just keep my C subroutines free from side-effects—except main,of course—and have the very same semantics Lisps; on top of that, Lisps actually allow side-effects to occur. The lexical scope of most mainstream variants today allows for functions to lose their referential transparency by re-defining global variables they use in their body, some Scheme programmers see this as a mark of either supreme bad style, or simply having picked the wrong language, hence all procedures that allow for this all end in an exclamation mark, but Common Lisp programmers do this all the time and the language is a lot more tailored to this.

So why is it called functional programming then? and what’s the connexion people often point to purely functional languages and this? Is it some old relic because Lisp’s just that old and was far more functional than any other language in its early days? Or the back-then prætentious use of the lambda keyword to introduce literal functions? Is it that procedures are first class-objects that can thus be passed as arguments to functions, and dynamically generated at run-time? JavaScript has all those facilities, so why isn’t that functional? Because it simply uses the humble function keyword to introduce literals oppose to lambda?

Focussing on Scheme, the clearly more-functional of the two main dialects in use today. To again leech from Wikipedia to define the factorial:

;using recursion
(define (factorial n)
(if (= n 0)
1
(* n (factorial (- n 1)))))

I’ll also throw in the other variants Haskell had in Scheme analogy too for comparison:

;using folding.
(define (factorial n)
(foldr * (list-range 1 n)))

;using list products
(define (factorial n)
(list-product (list-range 1 n)))

Before I could even start I had to write these (yes, I like coding in Scheme, why do you ask?):

;folds with right-associativity
(define (foldr binary-operator lst)
(binary-operator (car lst)
(if (null? (cddr lst)) (cadr lst)
(foldr binary-operator (cdr lst)))))

;create a product out of a list of elements
(define (list-product lst)
(if (null? (cdr lst)) (car lst)
(foldr * lst)))))

;make a range
(define (list-range start stop . step)
(let ((step (if (null? step) 1 (car step))))
(cons start
(if (>= start stop) '()
(list-range (+ start step) stop step)))))

Yeah, hate me you zealots of tail-optimization. Now, the difference of the algorithmic paradigm with the purely functional paradigm is quite clear. In Haskell and similar ML-derived languages, we use [1..n] showing that it’s meant to resemble mathematical textbook notation, it’s a built in language-construct of some sorts. Lisp is the antithesis of normal mathematical notation, the notation is purely algorithmic, and consistent above all, an implementation of Scheme will just execute my algorithm, the GHC would probably optimize all those variants to the very same algorithm.

Now, the result in each is the same, but the example I stole from that free encyclopædia is the præferred one by far, even if I had already written those other functions which would make those other implementations a lot faster to write. They both recurse twice, they first recurse to make that list, and secondly to fold a multiplication operator to it. The diligent reader also noticed that I used (null? (cddr lst)) instead of the more intuïtive (< = (length? lst) 2), the simple answer is that again, the entire list has to be traversed to get the length. Of course, this one will give an error if the length is 0 or 1, but folding a list to a binary operator with 0 or 1 elements has no meaning. And I tend to write my code very insecure with very little error-catching if that means I have to check each and every time, I’d make sure at another point that I’m never folding a list with less than two elements. I also claimed all functions were the same, that’s not true, some will give an error when trying to get the factorial of a number lower than two, not exactly sure how Haskell‘d handle this one. Of course, simply using:

;using apply
(define (factorial n)
(apply * (list-range 1 n)))

Also does its work, and is probably the most efficient option. Which is by the way what I find to be the true distinguishing strength of Lisps, not first-class functions, but homo-iconicity (what also helps is that multiplication can take as many arguments as it likes, it folds on its own). Interestingly, this implementation also returns the correct result for the factorial of 0. Because (list-range 1 0) just happens to evaluate to a list containing only the number 1. This is hardly a thing I intended when I made the algorithm, I didn’t care because placing the lower bound higher than the upper is just nonsense, I could have let it return an error, but this shows the functioning of algorithms, there’s no deep underlying mathematical principle behind the correctness of this factorial algorithm, it just works, and by sheer dumb luck.

So, at the least I’ve hopefully made compelling that ‘functional programming’, isn’t as much one paradigm as many people think. I’d personally say that Lisps are is closer to Fortran than they are to ML-derivatives like Haskell, and those again are in fact quite close to Prolog. In the former ‘functional paradigm’ we express algorithms, in the latter we define relations and query over them.

And lambda-Scheme?

To satisfy the idea of the lambda abstraction, we would have to remove the constructs define, letrec and obviously any-thing that ends with ! from the language. The last is obvious, the first two are more subtle, they enable a function to occur within its own defining instance, within a lambda expression, the symbol referring to that function is either bound in the ‘parent’ function, or simply free. In any case, the semantics are different.

…Or so we may be naïve to think, what if we just already had a symbol, let’s say fact that evaluated to the factorial function which we used in the definition of our local factorial function? We can easily just say that letrec without a symbol fact already bound in a higher scope has the very same semantics as let in our hypothetical situation. Some would say that we wound need letrec any way to make it. A: this is of course wrong, we can use a fixed point combinator; B: the same applies to any primitive function like +, we would need it defined to define it.

Problems with recursion

We know that in lambda calculus we have to use a fixed point combinator because we use symbols opposed to variables, all functions are literal and anonymous, the quæstion is if allowing a function to call itself in its body violates the notion of a lambda abstraction per se. One can call functions inside the body of a lambda abstraction, functions which needn’t have been defined before, in fact, this is a quintessential trait of many purely functional languages, use functions before one defines them, or rather, there is no ‘before’ and ‘after’, and I wouldn’t know, it depends on one’s definition of lambda-abstraction again. Though, allowing for it would make them more complicated than they traditionally were, again pointing to the minimalsm. Considering notation such as:

T := (x y. x)
F := (x y. y)
...

where the dots would be a body that enable our shorthand, this is often seen as just defining a quick substitute, what a lot of people seem to forget is that this can in fact be expressed without any meta-language, completely in the lambda calculus itself:

(T F.
...
) (x y. x) (x y. y)

Where again, the dots signal a the scope of the body of our shorthand, from the very definition of a lambda application, and of course recursion without fixed point combinators cannot just be reduce to a simple lambda application any-more. It’s not a definition, it’s a substitution, hence it are symbols, not variables.

What is the difference between a symbol and a variable then? A naïve and simple definition is that a variable is… variable, and a symbol is not. However, I’m personally in favour of a stronger distinction, that is, a symbol can always without change of meaning exchanged for its value. Therefore, symbols truly are just to make things more readable, they aren’t strictly needed.

ML-derived languages being based on the non Turing-Complete System F typed lambda calculus however even need variables in lieu of symbols to be Turing Complete, Lisps can evade this by using fixed points, but a fixed point combinator cannot even be implemented in ML-derived languages without functions being able to already refer to themselves before they are defined.

Does the term ‘functional programming’ have any meaning?

I’ve earlier on already drawn some comparisons between Scheme, and JavaScript, the former’s often held to be of the functional family while the latter isn’t. Many’d probably hate me for comparing what is often held as a work of art and elegance to the often-abused scourge of DHTML scripting, but humour me.

Exactly what functional facility does Scheme have that JavaScript lacks? Scheme doesn’t have pattern matching or type-inference like purely functional languages tend to have. It supports various features that a lot of C-style languages lack such as dynamically creating functions at runtime, functions that can consume or return functions, closures, anonymous functions, but really, so does JavaScript. JavaScript is often held to be object oriented, but Alan Kay, the man who coined that very term disputes this, and in fact said, ironically:

OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.

Alternatively, we could try to establish that control structures in Scheme are fundamentally different from JavaScript, we have: (if condition then-branch else-branch) in Scheme versus if (condition) then-branch else else-branch in JavaScript. At first the former appears to be like any function in Scheme while the latter seems to have a unique syntax to it, normal of control structures in JavaScript. However, the shared syntax is all it shares with functions in the first example, the Scheme specs are pretty transparent about it not being a function (procedure), in addition to not evaluating all of its arguments, it can also not be passed to functions as argument itself or just be returned from it, therefore even if it were a procedure, it isn’t first-class which removes any argument we could make about it in this context.

But obviously, even though the capabilities are the same, Scheme is traditionally used in a much more functional manner, JavaScript code is generally filled with assignment statements while a simple search on ! could provide us with a proof that a Scheme source does not contain any. But why? it’s fairly obvious that Scheme’s syntax lends itself a lot better to the functional paradigm, but why? consindering that

function factorial(n) {
var output = 1;
while(n > 1)
output *= n--;
return output;
}

Is in fact easier to do in the form:

function factorial(n) {
return (n == 0) ? 1 : n * factorial (n-1);
}

Of course, the same choice of flavour is even offered in C, but that doesn’t support first-class functions and closures, et cetera. Which leaves me but to wonder why the recursive expression of the factorial is often seen as a typical example of the use of functional programming, as many C‘ programs also use it. The latter example also shows that if-then-else can be treated as a proper expression, and of course:

function factorial(n) {
return arrayRange(1,n).fold(multiplication);
}

Is also supported, we would of course have to define arrayRange, prototype fold to arrays and create a function multiplication, as unlike in Lisp, there is a proper distinction between operators and functions, so maybe that’s it? I believe that ML-based purely functional languages also treat them differently though. In case you wonder why I didn’t make those functions here, I don’t like coding in JavaScript at all.

So, what is it? Maybe Lisp lends itself better for functional programming because of its use of lists opposed to arrays? Lisp has arrays though, they’re just called vectors, likewise, the OO paradigm of JavaScript allows for an easy implementation of lists. There seems to be a trend amongst languages associated with functional programming to use lists in favour of vectors so that could be it, they do make recursion a lot more convenient.

Though the comparison of Scheme and JavaScript naturally doesn’t hold for all functional languages against all procedural languages, it does show that languages themselves aren’t as much functional as the way people use them, often for no good reason. The true difference between Lisps and JavaScript to me is that the former’s homo-iconic. The obvious difference between Lisps and MLs being static typing, type inference and above all pattern matching.

And ‘declarative programming’?

This term also surfaces a lot, which I find quite possibly even vaguer than functional programming, accordingly Wikipedia:

Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:

  • A program that describes what computation should be performed and not how to compute it
  • Any programming language that lacks side effects (or more specifically, is referentially transparent)
  • A language with a clear correspondence to mathematical logic.

These definitions overlap substantially.

These are conveniently listed in ascending order of vagueness, and the first is pretty vague to begin with. The only sensible definition of the first I could give is to not use algorithms at all, this would make imperative programming synonymous with algorithmic programming, or simply… programming? As the word ‘imperative’ already hints, you tell a computer what to do. Purely declarative programming would thus be describing what relationships your solution must satisfy, and telling the computer to give it, an art in itself, no doubt. We can clearly see this in Haskell’s pattern matching and type inference. We just gave two relationships a factorial function must satisfy and Haskell gives us that function, or maybe not? Seeing the quintessential:

-- List reversing function
rev [] = []
rev (s:xs) = rev xs ++ [s]

To me, this is the on the line between algorithmic and non-algorithmic. The argument on the second function is given in a pattern, with s:xs standing for cons(s,xs), surely this is enough information to unambiguously tell Haskell that the argument must be a list, and to bind the head the symbol s and the tail to xs? In Scheme we would have:

;more obviously algorithmic
(define (rev lst)
(if (null? lst) lst
(append (rev (cdr lst)) (list (car lst)))))

How much does that decomposing argument compromise its status as an algorithm? the way it reads there is in fact a mathematical equation, a property it must satisfy, but could it just be syntactic sugar for a decomposition of the argument? Let us after all not forget that in Scheme in reality, all procedures are single-argument functions, they take a list as argument. Code like (lambda (x y) (+ x y)) can also be expressed as (lambda lst (+ (car lst) (cadr lst))), so clearly the list is bound to a pattern here, the only difference between these two literals is that the former raises an error on more than two arguments, and the latter simply throws the rest of the arguments away. So we could change our function to:

;without argument patterns
(define (rev . lst)
(if (null? lst) lst
(append (apply rev (cdr lst)) (list (car lst)))))

The difference in semantics is now of course that it expects (rev 1 2 3) instead of (rev (list 1 2 3)), the pattern matching also does not go infinitely high up, while (lambda (x y . z) expression ...) is acceptable in lieu of (lambda lst expression ...), (lambda ((x . y) z) expression ...) simply returns an error instead of a procedure that demands the list it is applied to for its car to be pair and its cddr to both exist and be (). It would also be worth nothing that in Haskell rev xs ++ [s] = rev (s:xs) isn’t allowed, so clearly this = symbol here is not our very symmetric mathematical identity relation. It’s often called a ’single assignment’, but how much can we really ‘assign’ a value to a function applied to an argument whose argument is a pattern which is decomposed in the value of the assignment itself? I’m not really sure what it is, it’s neither a true assignment as set! is, nor is it defining a true æquality relation.

However, in Lisp the notation (a . b)cannot be used to create a pair, Lisp’s syntax itself is composed of lists and therefore also pairs, it’s not an operation, it’s syntax. In ML-derivatives, a:b is an actual cons operation which may be used to return a consed value. The æquivalent in Lisp would be (cons a b) or alternatively (cons . (a . (b . ()))), which is evidently a completely different pair.

The second element on the list, having no side-effects is even more vague, of course this assumes our language has some intuïtive correspondence to the function or the subroutine, that a thing can be ‘evaluated’ to produce one in along with the value of its evaluation, what is a ’side-effect’? If I reduce (n. n (x. F) T) (x y. y) to T, then this would take me 3 reductions. How do I know this? Well, every reduction obviously had an extra ‘effect’ besides reducing, it also incremented a counter. And this brings us to the vagueness of ‘effect’, that abstraction cannot reach that ‘effect’ can it? The program can’t either, is expending cycles on a CPU an ‘effect’?

Referential transparency is more clear, it simply says the return value of the function or subroutine must always be identical given the same arguments, well, what are arguments? Depending on the context of that lambda term above, surely T may differ, see the above example. So sending out side-effects that one cannot reach can’t be defined and is related to other similar problems in metaphysics, How can I know if an Angel doesn’t die every time I take a dump if Angles are invisible to man?, and receiving side-effects can just be seen as arguments. Furthermore, side-effects which only happen inside functions to local variables may be algorithmically different, but not functionally, it results into same collection of pairs.

It may not be referentially transparent for instance in PHP to write:

// raises a global value by a given argument and returns the result
function raise_by($raise_by) {
global $global_value;
return $global_value+$raise_by;
}
echo raise_by(5);

As it’s not immediately clear from the instance of the function itself unless one knows the definition, but it does the same as:

// raises a value by a given argument and returns the result
function raise_by($value,$raise_by) {
return $value+$raise_by;
}
echo raise_by($global_value,5);

However:

// updates a counter by a given value and outputs the new value.
function increment_by($increment_by) {
global $global_value;
return $global_value+=$increment_by;
}
echo increment_by(5);

This is the point were we cannot reduce it any-more, it not only kills an Angel, it also has access to how many Angels it has killed. So clearly only outward, or only inward does not break referential transparency, but both at the same time surely do.

The third property is a good example of The Law of Joost There are things that are correct, things that are wrong, and even worse, things that aren’t even wrong any-more., first to formalize it gets an honourable mention in this blog that no one reads, and those that do either hate or not understand.

Oh mathematics is so perfect

This is probably extremely useless info for most people and trivial for others but I just realized this and it’s too elegant to not bother the few that both don’t know of it and care about it with it. Basically, in mathematics there is a principle called ‘from a contradiction follows all’, or when you’ve stumbled on a contradiction, any proposition that can be expressed in your language is automatically true. Or rather, you may infer from that any so called ’sentence’ from your language, as sentences negated are also sentences… well… contradictions lead to contradictions. Often a contradiction is defined as for any sentence S, if we can derive:

S ∧ ¬S

We have a contradiction, the ¬ symbol takes a sentence, and negates it, which basically says that its falsity is true (note that that is different from saying that it’s not true or that it ’s false, but we’ll ignore that henceon for convenience sake.) and ∧ just means ‘and’.

So, how can we prove from that every other sentence? Well, it’s really quite simple:

  1. S ∧ ¬S
  2. S ∨ T (since S is true, then surely only S is true, or only T, or both for any random T)
  3. T (but we just said that the falsity of S is true, and since S or T are true, if S is false, then T must be true right?)

Well, there you have it, any random sentence T with a truth value can be easily demonstrated to be true, which is… unhandy, it’s not that inconsistent mathematics is ‘wrong’, it’s just not very practical and above all not very interesting.

Logical systems may have another property, that’s called functional completeness, functional completeness basically means that we can compose from our functions all possible functions that take any number of arguments from {0,1} (true and false) and return a member of {0,1}. So what we want is basically a system in which we can construct any permutation of truth values. Now truth operations are in principle functions, when we say S ∧ T we can also express that concept as ∧(S,T) = 1.

    Most often, the canonical example given for a system that is functionally complete is a system that contains the operations ∧,∨,→,¬ or AND, OR, IMPLIES, and NOT respectively, which may be defined as:

    • AND: (1,1) » 1; else: 0
    • OR: (0,0) » 0; else: 1
    • NOT: (0) » 1; (1) » 0
    • IMPLIES: (1,0) » 0; else: 1

    It should become apparent that NOT is quite special as it’s the only function to take one argument. But this definition is extremely redundant because we may construct IMPLIES as follows:

    S → T := ¬S ∨ T

    Just do the maths, they return the same value each on the same inputs, so we can scrap IMPLIES from the list and still retain functional completeness, after all, we can construct it from the other functions. But it doesn’t end there, we can also scrap OR from the list by:

    S ∨ T := ¬(¬S ∧ ¬T)

    Quite intuitive, at least one of them is true if it’s not true, that both of them are not true, right? If both of them are not true, then surely not one of them can be true? So, we have a functionally complete system with only NOT and AND as we can construct the rest from it. But it goes on, we could have also opted to scrap AND by:

    S ∧ T := ¬(¬S ∨ ¬T)

    Both of them are true if and only if it isn’t true that at least one of them is not true, right? So, NOT and OR alone is also functionally complete. Lastly, the least intuitive one is this:

    S ∨ T := (S → T) → T

    Or, at least one of S,T is true if that S implies T is true implies T. Walk with me on this one, say that S is true here, then T is true, if T is true follows from that then as T implies itself of course, T is also true, thus the whole picture is true if S is true regardless of the other situation. If T is true, then S doesn’t matter any more and the fact that S implies T is irrelevant to the truth of T, so if T is true this picture also implies T, thus we have established that this returns true at least if either S or T are true. Now we want it to return false if both aren’t true. Well, we already established that IMPLIES returns true in any case that it’s first argument is not true. Thus it collapses, by that to TRUE → FALSE, which by the definition above is the only possible combination of arguments that results into FALSE.

    So we’ve established that AND can be constructed using only OR and NOT, that OR can be constructed using only IMPLIES and NOT, and that OR can be constructed using only AND and NOT and that IMPLIES can be constructed using only AND and NOT, so, for a functionally complete system all we need is NOT, and one of the others. People just love redundancy don’t they?

    This is all fine and nice, but back to that contradiction, how do we define it, well, we stated S ∧ ¬S, that’s the most common, but not the only way, another is: S → ¬S, for surely if a sentence implies its own negation, that automatically leads to the former definition? And lastly, you guessed it ¬(S ∨ ¬S) . So, it all falls into place, if a system can state a contradiction, it is functionally complete, if it is functionally complete it can state a contradiction, another one of those fine things which show why God hates us and why all systems that have some useful and cool property automatically also have a major downside that comes with it.

    And no—you zealots of rigour, this is not intended as a proof, this is an illustration, I’m not going to Hilbert-style this on a blog that’s mainly for a relative take on holocaust denial.

    script tag

    Any one remotely familiar with infernal web coding will recognise this:

    <script src="virus.js" type="text/javascript" />

    Most people into serious web design rather not see this tag around. Then there are people that are even more seriously into web design and recognise that it can still be there as long as it doesn’t break any rules and is not needed to let the page function and it still operates like a proper document if author style and scripts are turned off. Strangely a lot of people that hate Javascript do it to be cool and still make sites that are not a logically structured document and fail to work with author styles turned off. That aside, all people that failed to understand what I just typed bugger off, the hardcore remaining I want to bore with this issue.

    Now, that infamous tag above, what does it mean really? I haven’t the frankest idea and it has been been bugging me for quite some while. I know what browsers do when they encounter it, I just don’t know what it means in a descriptive sense. For instance:

    <meta name="author" content="Jan Smit" />
    <base href="http://jansmitzedeloosheidsdiensten.nl" />
    <title>Jan Smit, Zedeloosheidsdiensten NV</title>

    All have defined meaning, respectively ‘the author of this page is Jan Smit’, ‘the base relative uri of this document is http://jansmitzedeloosheidsdiensten.nl’ and ‘the title of this document is Jan Smit, Zedeloosheidsdiensten NV’. Now, a lot of people’ll say that the script tag is a link to a script. But it isn’t, observe this famous example:

    <link rel="stylesheet" type="text/css" media="print" href="print.css" />

    This is a link to an external style sheet. Now a link is what makes the web great and why it’s called the web all documents on it link to other documents. And the rel attribute of a link defines the relation the other document has to the current one. In this case our semantics say ‘There exists a document on the uri print.css which relation to this document is that it is its stylesheet, which is for print media and has a mime-type of text/css.’, that is what the document says, now of course a browser if it knows this should be intelligent enough to apply that style sheet to the document after fetching it.

    But the script attribute is not the equivalent for scripts. For one, it can be placed in the body of a document, it’s part of is logical structure, it can come after or before a certain paragraph or heading in it. Clearly this makes no sense if it had the semantics the link above had. Is it comparable to an a element then? Which says: ‘This part of the document specifically is linked to this other document’, well, no. Because it defines per definition no part of the document that is because if the element has content in it, it’s the script itself.

    It’s like the img element in the end, the reason that it also has a src and not a href attribute. It doesn’t link to an image, it says ‘At this part in the document is this image which may be located at…’ and the script element does the same for scripts. Except that the ‘this part’ makes no sense here, an image is part of the document itself, of its logical structure, a script isn’t. The script was a mistake from day one. And an image can be linked to too, it just changes the meaning, say that our document is a news article, it can then link to an image which’ relation is a large picture of the event in quæstion or an alternate format. Or perhaps a tutorial that is also available in the form of an image.

    Now, the obvious solution is this:

    <link rel="script" type="application/ecmascript" media="screen" href="virus.js" />

    which also provides a means to automatically disable scripts on certain media. But strangely no browser seems intelligent enough to force that script over that page if its told ‘A document whose relation to this document is that it’s its script of the mime-type application/ecmascript for screen media is located at virus.js”, also despite that a search on the web seems to suggest that more people wonder why ‘this doesn’t work’ no vendor seems to add this little thoughtful piece of intellect to their browser, and the W3C isn’t telling them to either.