Last time I posted about how I came to start designing a new programming language and writing a compiler for it. I’d like to continue by explaining a bit about how I started the project and how I’m implementing the compiler, and by describing the project’s goals and the language I want to create.
At about the same time that I started having ideas for a higher-level object-functional language, one that allows a data-oriented programming style and has a runtime closer to the machine than the JVM, Geoff Reedy’s Scala on LLVM appeared. I think that’s a really interesting project and I hope it turns into something awesome. But it is not quite what I was looking for, because it still needs the Java libraries and must carry some of the baggage that comes from Scala/Java interoperability (such as null).
But LLVM caught my attention, as I had been hearing more and more about it. I went through the Kaleidoscope tutorial and soon had my mind set that LLVM would be the library/toolset I would use as the back-end of my compiler to output machine code. By the way, I suggest that anyone who, like me, is new to implementing compilers should try that tutorial. It takes about a day to get through, and in the end you have a simple toy language with a basic REPL and JIT.
I tried to continue adding stuff to Kaleidoscope, but realized yet again that I really don’t like C++ (which is what the tutorial used). I found no Java bindings for LLVM, so I decided to learn a language that does have bindings: Haskell. I Learned me a Haskell, or part of it. It is another awesome language, but the syntax and purity scare me a bit, and I just couldn’t imagine being as comfortable writing Haskell as I am when writing Scala. So, back to Scala. I initially thought I would output LLVM bitcode from Scala, but after reading some docs, it seemed easier to generate LLVM assembly text files (.ll), similarly to the Scala LLVM project (I even reused some bits of code from there). And indeed, it was much easier to get something running that way, and that’s how I’ll implement the first compiler.
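To illustrate the text-based approach, here is a minimal sketch in Scala of what emitting LLVM assembly as plain text can look like. It is not the actual compiler code: the `LlEmitter` object, its method names and the output file name `out.ll` are all made up for this example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical mini-emitter: builds the text of an LLVM assembly (.ll) module.
object LlEmitter {
  // Render a function that adds two 32-bit integers and returns the sum.
  def addFunction(name: String): String =
    s"""define i32 @$name(i32 %a, i32 %b) {
       |entry:
       |  %sum = add i32 %a, %b
       |  ret i32 %sum
       |}
       |""".stripMargin

  // In this sketch a module is just its top-level definitions joined by blank lines.
  def module(functions: Seq[String]): String = functions.mkString("\n")

  // Write the module text to a file using the JDK's file API.
  def writeFile(path: String, text: String): Unit = {
    Files.write(Paths.get(path), text.getBytes(StandardCharsets.UTF_8))
    ()
  }

  def main(args: Array[String]): Unit =
    writeFile("out.ll", module(Seq(addFunction("add"))))
}
```

A file produced this way is ordinary text, so it can be inspected in an editor and then handed to LLVM’s command-line tools (for example `llc`) to be turned into machine code.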
But let’s talk about the language and the project goals as well. Actually, I want to split this project into separate phases.
- The first phase is to create a relatively simple language that has a nice Scala-like syntax.
- The second phase is trying out various designs and implementations based on that small base language.
- The third phase, if I get that far, is still an unknown at the moment: taking what I’ve learned in the previous phases and applying it to actually implement the language I want.
At some point, I will try to bootstrap a compiler in the language itself. The project might end up creating yet another esoteric language, but I hope it will turn out to have some usefulness.
The separation into phases is necessary because I still have a lot to learn and don’t want to rush into creating a full-blown general-purpose programming language. It is also because I don’t want to think too far ahead at the moment. I have started reading some books on programming languages and compilers: Programming Language Pragmatics for the introduction, followed by the Dragon Book and Types and Programming Languages (thanks to Daniel Spiewak for the suggestions). I have a lot of theory to go through, and progress may be slow.
The language from the first phase is codenamed Klang.
The name comes from Kaleidoscope + language (no relation to Viktor Klang :)), because Kaleidoscope was what I started with, although I altered the syntax to be more like Scala. Which features will make it into Klang is not yet set in stone, but I’ll list some of them here and describe the language in more detail in the next post, because this one is already pretty long. Anyway, here are some of the intended features:
- A Scala-like (but not always identical) syntax
- Type safety
- Type inference (somewhat similar to Scala’s)
- Named and default arguments
- Passing functions as arguments to other functions
- Implicit conversions (but they can’t be used for pimping)
- Anonymous functions
- Extern function declarations (for linking to native C libraries)
- Primitive types: Byte, Short, Int, Long, Float, Double, Boolean
  - Considering having Byte be unsigned, but not sure
- Tuple types with optionally named elements
  - Function argument lists are tuples
  - Can call a function with any expression that returns a tuple (if the argument list matches)
  - Multiple named return values from functions
- Arrays and UTF-8 Strings
- Blocks and local values (no mutable local variables)
- Control structures
  - If expressions
  - Some kind of for-loop-like structure, specifics not decided yet
  - && and ||
- Packages or namespaces
- A tiny standard library
- … maybe a few more things
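Since Klang’s syntax is not fixed yet, the following sketch is written in plain Scala, not Klang. It only shows how a few of the listed features (named and default arguments, passing functions as arguments, calling a function with a tuple, if expressions) look in the language that Klang’s syntax is modeled on. All names here are invented for the example, and Klang’s named tuple elements and named return values are not shown.

```scala
object FeatureSketch {
  // Named and default arguments: `factor` can be omitted or passed by name.
  def scale(value: Int, factor: Int = 2): Int = value * factor

  // Passing a function as an argument to another function.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A two-argument function and a function that returns a tuple.
  def add(a: Int, b: Int): Int = a + b
  def minMax(a: Int, b: Int): (Int, Int) = (math.min(a, b), math.max(a, b))

  def main(args: Array[String]): Unit = {
    println(scale(value = 10))             // 20
    println(scale(factor = 3, value = 10)) // 30

    // Anonymous function passed as an argument.
    println(applyTwice(x => x + 1, 5))     // 7

    // In Scala the tuple has to be adapted explicitly with `tupled`;
    // the list above suggests Klang would accept the tuple directly.
    val addFn: (Int, Int) => Int = add
    println(addFn.tupled(minMax(7, 3)))    // 10

    // `if` is an expression that yields a value; no mutable variable is needed.
    val size = if (scale(1) > 1) "big" else "small"
    println(size)                          // big
  }
}
```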
A feature that will probably not make it is polymorphism. There will likely not be an Any or Object type that all other objects inherit from. And you will not be able to ask for the type of a value at runtime.
At least, the above is roughly what I think will be in the first version of Klang. Once that version is done, I might make it available on GitHub under some liberal license. More in the next post.