nihil architecture blogs


Restrictions and clearer languages

After reading about this new C# language, some of its design choices baffled me; they are part of a new trend. I’m not talking about lambda abstractions, about some object-oriented features, or about things that drastically reduce performance. I’m talking about if(x = y) being illegal. Conditionals in C# require their value to be of the type bool; if the compiler cannot prove they are of type bool, this is considered a static compile-time error.

Now, to some programmers, if(x = y) would naïvely be perceived as a typo, or as a blind application of basic rules to a C syntax. But it’s actually quite possibly intended code in many languages. C popularized an interesting concept: assignment wasn’t a statement but an expression; it had a well-defined value, namely the value being assigned. This allowed for ‘chained assignments’, a = b = c = d, and on most architectures this maps most efficiently onto machine operations. It’s also a rather common way in JavaScript to loop over some list, which works because reading past the end of an array yields undefined, a falsy value, rather than an error, stopping the loop.
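A quick sketch in JavaScript (my own illustration, not from the original) of assignment as an expression:

```javascript
// Because assignment is an expression, each '=' yields the value assigned,
// so one value ripples through the chain from right to left.
let a, b, c;
const d = 4;
a = b = c = d; // (c = d) evaluates first to 4, which feeds (b = ...), and so on
console.log([a, b, c]); // all three now hold 4
```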

It’s also, to many programmers, a source of frustration: typos can easily lead to unintended code, and if(x = 5) is of course always true. A lot of programmers have therefore taught themselves to write if(5 == x) to catch those errors early. So C# has chosen to attack the typo problem with a restriction on power and expressiveness, to protect people against themselves. Even though the problem could just as easily have been avoided by using := for assignment. Oh well.
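Both the typo and the ‘Yoda’ defence, sketched (by me) in JavaScript:

```javascript
let x = 3;
if (x = 5) { // the typo: this assigns 5 to x, and 5 is truthy, so the branch always runs
  // ...
}
// x has been silently clobbered by the condition:
console.log(x); // 5
// The 'Yoda' order turns the same slip into a hard error instead of a silent bug:
// if (5 = x) {}   // SyntaxError: a literal is not a valid assignment target
```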

And this seems to be the trend more and more: restriction upon restriction on power to protect people ‘against themselves’. My rants on psychiatry and society should make it clear that I am generally opposed to protecting people ‘against themselves’. Let people do what they want; maybe they do a thing they’ll regret, maybe they’ll stop doing it afterwards. It’s better to teach a child not to touch a pan by letting it burn its fingers than by just saying ‘don’t do it’, after which the child grows up knowing it shouldn’t, but never really knowing why. And this is exactly what these restrictions do: instead of teaching people good programming, the theory behind it, and why they shouldn’t do certain things, you simply don’t allow them to do it. They won’t do it because they have no choice, they’ll never understand why they shouldn’t, and they’ll be kept ignorant forever.

My hunch is that there are more programmers on the planet who don’t realize why they use == ‘in an if-statement’ and = outside it than programmers who do realize why this is done. Which really was one of the beautiful things of C: it gave the programmer choice, and indeed, at some points it pays to be able to use assignment inside a conditional. while(obj = array[i++]) ... is code which I very commonly use in languages where accessing outside the bounds of an array yields null rather than an error, and it was probably designed so for this purpose. If this C# idea catches on, people will be kept more and more ignorant, and in the end be completely oblivious to the fact that == is a binary operator, just like +.
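The while(obj = array[i++]) idiom above, as runnable JavaScript (where an out-of-bounds access yields undefined, which is falsy, rather than null; the idiom of course assumes the list itself holds no falsy values):

```javascript
const array = ["red", "green", "blue"];
const seen = [];
let i = 0, obj;
// Each pass assigns the next element to obj; the assignment's value is that
// element, so the loop stops the moment array[i++] falls off the end.
while (obj = array[i++]) {
  seen.push(obj);
}
console.log(seen); // ["red", "green", "blue"]
```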

A thing people often misunderstand about this issue is static versus dynamic typing. Things like ‘In statically typed languages, variables have a type; in dynamically typed languages, values have a type.’ are often uttered: quite a crude approximation. ‘Static’ in most contexts means the same as ‘lexical’: a statically typed language is designed so that ‘types’ can be inferred from analysing the source code alone, without running it or computing any values. Static typing is older and less powerful than dynamic typing, but also more restrictive, and it is easier to implement. For this reason, when the first high-level languages came, people had the interesting idea of coming up with ‘type declarations’, so that a compiler could analyse these during compilation and safeguard a programmer against errors whose result may be well defined in machine terms, but ‘nonsensical’ to any human reader.

Applying float addition to integers is a perfectly defined operation that produces a new string of bits which you can interpret as whatever you want: a character, an unsigned integer, a signed integer, a float, a four-character ASCII string; it’s all about the interpretation of that scalar ordinal value. In any of these interpretations, however, the result will most likely not make any ‘sense’ to human readers. So if, lexically, variables of the ‘wrong’ type were used together in the source code, the code wouldn’t compile, and an error would be signalled. In static typing, this can be inferred from the source code alone.
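The ‘it’s all interpretation’ point can be made concrete in JavaScript with a DataView over a shared buffer (my sketch; the particular values are just one example):

```javascript
// The same four bytes in memory, read under three different interpretations.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setFloat32(0, 1.5);            // write the bit pattern of the float 1.5
const asFloat = view.getFloat32(0); // 1.5
const asInt = view.getInt32(0);     // 1069547520: the very same bits as a signed integer
const asBytes = [view.getUint8(0), view.getUint8(1),
                 view.getUint8(2), view.getUint8(3)];
console.log(asFloat, asInt, asBytes); // nothing changed in memory, only the reading of it
```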

Dynamic typing is something different, more powerful, and more dangerous. ‘Dynamic’ in this context has the usual meaning of ‘only known when the code is run’: in dynamic typing, values carry a label which is checked when operations are applied to them; if it doesn’t match, a runtime error or exception is raised. Dynamic typing was a novelty due to Lisp, a language in which source code itself was dynamic; a lexical analysis was insufficient to determine whether types would conflict, because the source code itself was no longer static. Other languages adopted this then-performance-costly idea because of the raw power it afforded. The downside is that programmers again had to manually verify their code and follow its logic to ensure that type errors would not occur, which in statically typed languages can be proven by the compiler.
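A small sketch (mine) of that runtime label check in JavaScript: the mismatch only surfaces when the offending operation actually runs.

```javascript
function apply(f) { return f(); }

let caught = null;
try {
  apply(42); // 42 carries the label "number", not "function"...
} catch (e) {
  caught = e; // ...so the mismatch is only discovered here, at run time
}
console.log(caught instanceof TypeError); // true
```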

And ‘proven’ is a very important word here: it can be proven by a compiler, but not decided; it’s theoretically impossible to decide lexically whether a type error is going to occur. ‘Deciding’ would mean that all programs you reject have type errors, and all programs you accept do not. Compilers only offer the guarantee that accepted programs will not have them. It’s quite possible that a rejected program would also not have them, but as long as the compiler can’t prove it, it will reject. This is a pretty awkward mentality: ‘I do not accept your program, for I cannot show it will work’, rather than ‘I do not accept it, for I can show that it will not work’. The latter path is also possible, but choosing it necessarily leaves open the possibility of type errors. A trivial example would be:

{int a, b; float c; a = true ? b : c;}

This will usually be rejected in statically typed languages, even though a type error will never occur from it. This example is of course quite useless; a more useful one would be:

function concat(x, y) {
    if (Array.isArray(x) && Array.isArray(y))
        return x.concat(y);
    else if ("string" == typeof x && "string" == typeof y)
        return x + y;
    else if ("number" == typeof x && "number" == typeof y)
        return x + y;
    else return null;
}

Dynamically typed languages often offer facilities for runtime type-checking to choose a program flow, as happens above. A statically typed language would reject this example, even though, for whatever input it may get, a type error cannot possibly occur within this function; a compiler has no way to prove this algorithmically from a lexical analysis. Lexically, the operator + which applies to two numbers is applied to the same variables, in the same lexical environment, to which the concatenation which applies to two arrays is also applied. The compiler cannot prove that no type error is going to occur. Many people who coded in ‘archaic’ static languages have felt the pain of having to rewrite and overload functions that do the exact same thing in order to deal with both integers and floats; templates and generics did offer a partial solution.

The real rescue for this problem in statically typed languages is parametric polymorphism, where types are not constants but variables. The function above is then ‘typed’ simply as ‘taking two values of one identical type, producing a value of that same type, or null’.
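In a Hindley–Milner-style notation (my paraphrase, not the syntax of any particular language), that type could be written as:

```
concat : ∀α. (α, α) → α ∪ {null}
```

One type variable α, ranging over array, string and number, expresses all three overloads at once.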

Static typing is not only a restriction on programmers to ‘protect them against themselves’; it also takes with it a lot of things that need no protecting. It’s banning cars because some of them are unsafe, and taking the safe ones with it, because you have no way to infer præcisely which are safe.

Other languages have evolved a different tactic: they aim to make their syntax ‘clear’. A well-known and ridiculously failed example is a certain language in which the now ubiquitously readable action of C++; would be expressed as ADD 1 TO COBOL GIVING COBOL. As it turned out, use and conventions make readability, not natural language; though the latter form is a lot clearer to people who have never programmed, the sheer mass of exposure to the former makes it immediately clear. A dynamically typed and less extreme example would be:

def factorial(n):
    if n > 1:
        return n * factorial(n - 1)
    return 1

Python, often using English words, somewhat reads like English: define factorial in n: if n is greater than 1, return n times factorial of n minus 1; else return 1. Apparently it helps some people to read the code aloud in their mind. One of the ancestors of Python has a wholly different vision:

(define (factorial n) (apply * (range 1 (+ n 1))))
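For comparison, the same ‘product of a range’ reading of factorial, sketched (by me) in JavaScript:

```javascript
// factorial(n) as the product of 1..n, rather than as explicit recursion
const range = (from, to) => // inclusive range: from, from+1, ..., to
  Array.from({ length: to - from + 1 }, (_, i) => from + i);
const factorial = n => range(1, n).reduce((acc, k) => acc * k, 1);
console.log(factorial(5)); // 120
```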

Or at least, that is how I would write it. People often say that Lisp is ‘unreadable’, but I beg to differ; in my opinion Lisp code is exceptionally clear and consistent. There is no such thing as operator præcedence; not only is the delimiting of lexical blocks explicit, so is the limit of an expression, the range of an operation, and so on. With syntax highlighting there is no confusion about whether the code you edit is still inside the scope of your control structure. The one thing is: it’s impossible to read it like English. Why does infix notation exist even though it’s inconsistent? Is it clearer due to conventions, or because x = y can be read as ‘x is y’ in English?

I know that reading code as English, or using layout indentation, doesn’t work for me; I find myself counting invisible characters in Python and then summing them to determine block structure. I like the fact that in Scheme, what the code does is clear from a simple rule that requires no thought at all: the value of evaluating a list is the evaluation of the head, as an expression, applied to the evaluations of the other forms. But I’ve found that most people like to see ‘a variable’ as being identical to a value, rather than evaluating to a value.
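That one Scheme rule, sketched as a toy evaluator in JavaScript (my illustration; forms are arrays, symbols are strings):

```javascript
// The rule: the value of a list is the value of its head (a function)
// applied to the values of the remaining forms.
function evaluate(form, env) {
  if (Array.isArray(form)) {
    const [head, ...rest] = form.map(f => evaluate(f, env));
    return head(...rest);
  }
  if (typeof form === "string") return env[form]; // a variable evaluates to a value
  return form;                                    // numbers are self-evaluating
}

const env = { "+": (a, b) => a + b, "*": (a, b) => a * b };
console.log(evaluate(["+", 1, ["*", 2, 3]], env)); // 7
```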

Another thing came up lately while discussing some features of Clojure: I found that most Clojure programmers see ‘a list’ as ‘an ordered collection’, for all purposes not that different from a vector. Thinking about it that way just gives me shivers when programming; ‘a list’ to me is a binary tree whose outermost right leaf points to a special nil constant, or that constant itself. This is what I ‘see’ conceptually when I work with lists, when I take their head or their tail. I also found an interesting difference of views there: those who advocated against using cadddr in lieu of simply fourth found themselves translating ‘the fourth item of the list’ to ‘the head of the tail of the tail of the tail of the list’. Those who insisted that cadddr is clearer found themselves wanting ‘the car of the cdr of the cdr of the cdr of the pair’ and thinking ‘that would be, umm, the fourth of the list’.

The Zen of Python famously includes ‘There should be one-- and preferably only one --obvious way to do it.’ I find that when people hold such conceptions about ‘clear code’, they forget that different people have differing views on ‘clear code’. Many people will call APL, Forth or Lisp a mistake because the code is unreadable; users of each often passionately disagree and say a program is instantly readable and very clear. The fact that these languages have endured so long should point to this. What they do have in common is that they can’t be read out loud in English. I’m a visual thinker, and most people I know who defend Lisp are likewise; maybe that explains some things. I never ‘read my code aloud’; I don’t think English is a very good language to express mathematical or sequential logic in.
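The binary-tree view of a list, and the cadddr reading, sketched (mine) in JavaScript with two-element arrays as pairs:

```javascript
// A list is a pair whose rightmost cdr is nil, or nil itself.
const nil = null;
const cons = (car_, cdr_) => [car_, cdr_];
const car = p => p[0];
const cdr = p => p[1];

const list = cons(1, cons(2, cons(3, cons(4, nil)))); // the list (1 2 3 4)

// 'the fourth' as 'the car of the cdr of the cdr of the cdr':
const cadddr = l => car(cdr(cdr(cdr(l))));
console.log(cadddr(list)); // 4
```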
Notation like ∀x ∀y : (∀z : ¬(z ∈ y)) → x + y = x is a lot clearer to me than ‘for all x and for all y, if for all z, z is not in y, then x plus y æquals x’, which is often shortened to ‘x plus zero is always x, where we encode zero as the empty set’. Still, that doesn’t define the idea of the ‘empty set’. I personally would rather see code as what it is: a series of expressions evaluating to a value, that value then taken as the value of an outer expression to work with. Different people, different methodology. Which is also a reason I don’t think people should be restricted: different restrictions work differently for different people.

From my perspective, though, seeing code as English is a bad idea, and I can imagine why people make errors like if(x = y) where they mean if(x == y). Once you see both as binary operators in the same vein as +, the former simply always evaluates to its second operand, with the side effect of changing the value of the memory location the first points to; the latter evaluates to true or false depending on the æquality of both. But some instead ‘choose’ to read if(x == y) a = b; as ‘if x is y, then a is b’, which is where the trouble begins, is my hunch. It’s a very approximate and awkward simulation of assertive mathematics; programming languages are not assertive and do not constrain their variables in accordance with the truth of the resulting value.

Of course, I said I was averse to protecting people ‘from themselves’; I have nothing against protecting people ‘from others’. I think it’s perfectly acceptable for an aviation company to demand that its software be written in a very type-safe and very restrictive language. But ideally this should be done by people who’ve learnt the hard way what a simple typo can cause, and who understand why the restrictions are there and what exactly they prævent. Many people nowadays start learning programming with C#; ideally, people should be taught programming in a language without any type checking at all, to firmly grasp the difference between float and integer operations.