Syntax Matters in Slang

This post was inspired by Ola Bini’s notes on syntax. I agree with most of Ola’s thoughts. I’m not convinced that repetition aids reading or improves syntax, and think that optimising syntax for writing is mostly irrelevant. I might agree less about the familiarity of syntax. I would love it if people, including me, were more open to trying languages with unfamiliar syntax, but the familiarity is important to many on some level.

Some of these thoughts form the basis for Slang’s syntax. I’m not quite ready to define The Zen of Slang, but I can give a shot to stating some of the underlying principles behind its syntax:

  • Reading code is an order of magnitude more important than writing code
  • You should still be able to type a lot of code quickly in any text editor
  • Concise is good, repetition is bad
  • There can be multiple ways to write the same thing, but one way should be canonical
  • Familiarity is good, in this case with:
    • Scala
    • to a lesser extent, other curly brace languages
    • mathematical notation

How these goals actually effect the syntax deserves more explanation

Reading > Writing

We read code more than we write it, and reading is more important. I hope this is accepted by everyone nowadays and I’m not going into why. But how to optimise for readability is another question, and probably highly subjective. In Slang, I am experimenting with syntax that I think is readable, but that might not be always easy to write.

Slang makes very few restrictions on what Unicode characters are allowed in identifiers — I believe that well-known mathematical operators in code that deals with maths are more readable than English words. The ability to write close to mathematical notation in many cases is enhanced by limited mix-fix operator support, but this is really less about syntax and more about libraries. Slang’s standard libraries will prefer well-known mathematical operator symbols instead of word-based method names.

A thing I really love about Scala’s syntax is the ability to define one-line methods without using curly braces.

def +(v: Vector2) = Vector2(x + v.x, y + v.y)

I’d find it incredibly distracting for readability if there were curly braces around this or if the method body was on a separate line due to code conventions. For me, such one-liners and even multi-line single-expression methods are important.

def sum(xs: [Int]) =
  loop(i := 0, acc := 0) while i < #xs do
    (i + 1, acc + xsᵢ)
  yield acc

Many methods that are based on loops can conveniently be implemented as a single loop expression because Slang doesn’t have mutable local variables (at least not yet) that would be defined in a code block, but instead prefers “anonymous recursion” for computing varying values. while is syntax sugar for making that construct more concise. There is just one loop construct, where other languages would have for loops, tail recursion, while loops, or do while loops. It’s likely that I’ll add a ‘for each’ construct, though.

There are other things to consider when thinking about code readability, but that could be a whole post or series of posts on its own.

Writing cannot suck!

Typing Unicode symbols is not easy. The solution I use regularly is to have often-used symbols in a text file, and copy them from there when needed, or to search in a Character Map application (there’s a good one in Ubuntu). This slows down writing, which often doesn’t matter if you spend more time thinking than writing anyway.

But sometimes you still want to type fast, and for that, Slang’s library will have ASCII aliases for every non-ASCII identifier. You can write ASCII names instead of Unicode symbols and replace them later when a code snippet is complete and tested. In the future, an IDE should know what the canonical forms are and do the replacement automatically.

This means there will effectively be two ways of writing many programs. Some language designers would consider multiple ways of doing things bad, but in Slang I allow it with no regrets — this comes from the desire to achieve a balance between the goals “Reading > Writing” and “Writing cannot suck!”.

Repetition is bad and multiple ways of writing things

Ok, I mentioned that non-ASCII symbols always have ASCII aliases (in fact it is mostly vice versa). Consider the following:

class Boolean {
  def xor(that: Boolean) = // intrinsic operation
  def ⊻(that: Boolean) = xor(that)

  // Methods ending with … are prefix operators
  def not…() = true ⊻ this
  def ¬…() = true ⊻ this
}

In the above code, we create aliases by either duplicating definitions (not, ¬) or defining similar methods where the aliases call the “canonical” method (xor, ). There is repetition in both cases and potential performance issues in the latter case if methods are not in-lined. It is how most widely used languages allow aliasing, because they are not too concerned with providing multiple ways of doing things. But Slang is concerned with that, because it helps deal with the above-mentioned tension between reading and writing. So Slang makes aliases explicit:

class Boolean {
  def xor(that: Boolean) = // intrinsic operation
  def not…() = true ⊻ this

  alias ⊻ = xor
  alias ¬… = not…
}

The repetition is gone — to define an alias, we only define its name and the name of the aliasee. Right now only simple identifiers may be aliased, but I might allow more complex cases later.

If you don’t like the aliases Slang provides for the standard types, you could define your own:

extending Boolean {
  alias `no way!` = not… // I hope nobody will actually do this :)
}

Now you can write: if collection.isEmpty.`no way!` then ...

As you can see, even spaces are allowed in identifiers, as long as the identifier is surrounded by back-ticks (same as in Scala). We can’t use the `no way!` alias as a prefix operator, though, because Slang currently allows only certain operator symbols to be prefix instead of infix. I might add user-defined precedence/fixity rules later, but not totally sure about that.

Now, I hear alarm bells sounding in some readers’ heads already. How come I’m willing to give potential Slang programmers so many tools to shoot themselves in the foot? Because I want to believe that most programmers are not stupid and the usefulness of this in the hands of smart people outweighs the negative sides.

Similarly to term aliases, there are type aliases:

type SDL_Rect = (x: Short, y: Short, w: Short, h: Short)

Familiar is good

I don’t have a whole lot to say about this, as things are not yet set in stone. I really like most of Scala’s syntax and I want to stick to something similar for the most part. Significant whitespace would be an interesting experiment, but I’m not sure if it’s a clear improvement over curly braces, which are familiar to the vast majority of programmers. Mathematical notation is complex, and varies between different fields of mathematics, blackboard and paper, and so on. But there is a part of it that is commonly understood and I would like to make some subset of that usable in Slang, although slight differences in syntax might be necessary.

Some examples:

∃! xs , x -> x == 3 // there exists only one x in xs where x == 3
2 ∈ xs // 2 is an element of xs
5 + ∏ xs // 5 + the product of xs
∛(27) + |-3| // cube root of 27 + absolute value of -3

// definition of for all on an array of integers
extending [Int] {
  def ∀…,…(ƒ: (Int) → Boolean) =
    loop(i ≔ 0, z ≔ true) while z ∧ i < #this do
      (i + 1, z ∧ ƒ(thisᵢ))
    yield z
}

You may have noticed

  • quantifiers (for all, exists etc.). For this purpose I allowed comma to be used in a few special identifiers: ∀…,… ∃…,… ∄…,… ∃!…,…
  • subscripts can be used to index into arrays and other indexed collections
  • Unicode aliases like ∧, ≔ and →
  • ∏… as a prefix operator
  • |…| as a closed operator

These all actually translate to method calls such as

xs.∃!…,…((x: Int) -> x == 3)
2.∈(xs)
5.+(xs.∏…())
∛(27).+(3.-…().|…|())
this.#…()
this.subscript(i)

Conclusion

Syntax is quite important in Slang. I’m not designing it syntax-first, but syntax weighs into most considerations. I think the syntax as a whole will be somewhat familiar to many, being similar to Scala and a couple of other recent JVM languages.

The language is optimised for reading, mostly by allowing concise method definitions, a large range of Unicode characters in identifiers, but also by having some support for mix-fix operators. Aliases to Unicode symbols allow fast writing as well, and the alias definitions are made explicit in code. Lots of aliases allow one to write the same program in multiple ways, but in the future, there might be IDE support for “normalising” the syntax.

There are some new things here as well, or at least things that the vast majority of languages don’t have. One is the translation of subscript and superscript characters to their normal equivalents. Another is the loop construct, which is basically anonymous recursion similar to some functional language constructs. It turned out more flexible than I thought, and I’d like to write another post about that, soon.