Holy Bitcode Batman, You’re Writing a Compiler!

Recently I’ve taken an interest in programming language design and have started working on a compiler for a new language. The reasons for doing that are perhaps less practical than I’d like, because I’m a practical man, or sometimes pretend to be. At least I’m usually more interested in the application of mathematics than the beauty of mathematics itself. The same used to go for programming languages.

As the few readers of this blog may know, I’ve been experimenting with game programming now and then, but haven’t been driven enough to actually complete and publish a game. In early years I tried Pascal and C++. I did complete some stupid little games in Pascal, but C++ was more a hindrance to me than an enabler. At some point I learned UnrealScript while developing a Deus Ex mod and kind of liked it. Actually I think that was part of what lead me to Java and getting jobs in Enterprise Java programming. I never really thought to write games in Java, though.

When I discovered Scala I fell in love and thought it was the be-all and the end-all of programming languages and very applicable to games. Tried to write a couple of games in it, wrote some libraries for my own use and even ported a physics engine from Java and C++ to Scala. I loved doing it, loved the language features, the type system, the syntax, the preference of immutability over mutability. Perhaps most of all I loved Scala’s elegant mixture of object-oriented and functional programming.

Eventually something about making games in it started nagging me, though. The performance was good enough for the type of game I was making, but the game was also using a lot of resources for the type of game it was. It started to feel wasteful. The idea that games are a big part of what is pushing hardware forward and have to take the most out of it was somehow stuck in my head.

Scala runs on the JVM, which has the nice abstraction of a big heap, and the memory management is done for you. My boss Jevgeni has given a really awesome talk on that topic titled “Do you really get memory?”. But whatever tricks the JVM does to make that abstraction work well enough for most applications, it does produce some amount of waste — extra CPU cycles and garbage objects which need to be collected to free the memory for new objects. And that is a big part of what the JVM engineers are continuously improving on. They are probably the most efficient at collecting (virtual) garbage in the world! But there are cases where that kind of heap abstraction doesn’t seem to hold well, and high-performance games are one of those cases.

My game engine used lots of immutable, short-lived objects. Things soon to become garbage, in other words. Garbage, dying a slow death in the heap. Every small Vector2(x, y) tracked by the collector, maybe living in a separate heap region from its closest friends. And looking up bits from here and there in the heap is really expensive from the CPUs perspective. Even when the Sun Java VM started optimizing away heap allocation of very short-lived objects (enabled by escape analysis), that only gave me a small performance boost. The situation has improved, but back then I decided to try and avoid so much garbage being produced. I optimized some functions manually, doing scalar replacement of Vector and Matrix objects. That made the code look really ugly and unreadable because it hid the mathematical formulas.

I couldn’t stand it. Neither could I stand all these cycles wasted on GC. So I wrote an optimizer that plugged into the Scala compiler and did the scalar replacement automatically. It worked, and gave a more significant improvement than the JVM’s escape analysis optimizations at that time; garbage production was being reduced, I was going green! But it was hard for me to maintain the optimizer as I knew almost nothing about compilers and was just going by my nose. There were some corner cases that were hard to handle correctly. It only worked on code written against a very specific library, optimizing away well-known constructor and method calls.

Writing that optimizer got me somewhat interested in compilers, though. I remember saying to someone during a job interview a few years ago that I like complex problems, but am not interested in the really complex stuff like compilers. Sometimes things work out as the reverse of what you think.

Anyway, working on my game engine, I wanted to create a really powerful entity system. I wanted to use mix-ins and other nice Scala features. Reading some blog posts about game entity systems, I realized that most people seemed to be moving away from inheritance-based systems into component-based systems. Reading more about component-based systems made me run into the topic of data oriented design (PDF), which is all about thinking about data first, and how the program processes it. A couple of presentations on that left an impression on me and made me realize just how expensive it actually is to make the CPU churn through megabytes of random memory.

But I didn’t want to switch to C++ or C to be able to take advantage of the kinds of optimizations that data-oriented programming can give. I had the idea that maybe there should be a language that was object-functional like Scala, but compiled down to very CPU- and cache-friendly data structures and functions, the kind of structures one would use when doing data-oriented programming manually. And I have huge respect for people who do the latter. But I noticed that some of them seemed to be wanting a better language than C++ as well.

So, a language as expressive and type safe as Scala, similarly object-functional, but with more efficient memory access, CPU cache and parallelization friendly, enabling a data-oriented programming style with less hassle than C++. Perhaps one that could even run some functions on the GPU. What could that look like? I started thinking that maybe I should find out first-hand. [Update to clarify my goals] Well, at least I want to find out what it would be like to try to get there. I’m sure combining the type safety and power of Scala with the raw performance of C is way too ambitious for me. So I’m setting the goal way lower, but more about that in the next post.[/Update]

I’ve never really been a language geek. I’ve programmed in several different languages and at the moment am good friends with only two: Java and Scala. There are scores of other programming languages out there and I’m sure there is some language that is at least 2/3 of what I’m looking for.

But I think some bug bit me, because I couldn’t let go of the idea of creating one myself. And this blog post was a lengthy, boring preface to a series of hopefully less boring posts documenting my experiment. I have no illusions — I don’t think I’ll create the “next big language” or anything close — there are people much smarter than me who have been working on programming languages for years and decades. But this will be a fun learning experiment and maybe something useful will come out of it. Next time I will talk about the kind of language(s) I want to create. The (s) is because I want to create a really simple language first, to learn more about compilers during the process, and later expand from that base.


9 thoughts on “Holy Bitcode Batman, You’re Writing a Compiler!

  1. There’s certainly something to be said for designing and implementing your own language. i’m doing the same myself, and, interestingly, somewhat from a DOD perspective since my language is based around a document structure equivalent to XML but with a C-style syntax i devised i call Struxt.

    Anyway, if your goal is to write games, you might have a look at the latest version of Objective-C using Apple’s latest development tools (LLVM). It’s really quite an optimal and efficient platform. If you insist on continuing to design your own language, you’d serve yourself well to learn as much as you can about Google’s Go language. It’s beautifully designed, thoroughly consistent and it’s capabilities for easing concurrency trounces most other languages.

  2. I think the world can always use more languages, so the best of luck to you! However, speaking as a language geek who has spent a *lot* of time thinking about this problem, I think you’re going to run headlong into an irreconcilable dichotomy:

    – a language as […] type safe as Scala
    – with more efficient memory access

    By “more efficient”, I assume you mean “not bound by the GC”. You want access to pointers, real values (on the stack), manual control of allocation/deallocation, etc. Those things cannot be reconciled with type safety. Well, you can get stack values, but you can’t force them to *stay* on the stack without allowing your language to grow some potential surprises (e.g. a function that *appears* to be pass-by-reference but is in fact pass-by-value).

    Pointers are obviously trouble since you can spindle them using a Turing Computable function (i.e. pointer math in your program), and so there’s no way (in general) for the type system to help you. There are tricks for restricting the expressiveness of your language which help here (e.g. linear typing), but they don’t solve the whole problem. Manual control of malloc/free is of course very, *very* important for performance, but without some sort of alias tracking (again requiring something like linear typing), you won’t be able to guarantee the safety of such operations in general.

    There has been a lot of research into this sort of thing (research largely ignored by Go, btw). There are ways to do better than Scala or the JVM in general, but they’re not trivial and they come with a high complexity cost. I do hope you enjoy the challenge, but I think you’re going to be surprised by how difficult it is to reach the goal you have set out!

  3. @Ivan I haven’t tried that yet, but I did actually reuse some of the code from the Scala LLVM project as my current compiler (written in Scala) currently generates LLVM assembly as well.

    I totally expect to run into complications beyond my ability. I probably should have mentioned that I don’t expect to actually achieve creating the language I described. Some of the intricacies of Scala’s type system are still beyond me and I don’t even want to try to replicate all of that. Maybe just a small part of it. So “as type safe as Scala” was definitely an exaggeration. Actually, my initial goal is set quite low, and once I have completed that, I’ll see about where I want to take this next.

  4. I programmed in Scala for a while, had short interludes with Haskell, and recently toyed with this idea as well. I went on gathering some references on the topic and, after scanning through “Implementing functional languages: a tutorial” [1], I realized I might better start from something that already exists. The Habit programming language [2] caught my attention lately. It is a strict version of Haskell and it seems like a good place to get his hands dirty. Unfortunately I don’t know if they plan to release any source code.

    Traditionally, programming languages have always been the product of the work of one or two likely minded individuals. Future will tell if I’m wrong but, given the amount of knowledge we accumulated on programming languages for 40 years, I don’t think it is a good idea anymore and I’d be curious to see if a language spec could be driven by a community for a change. Anyway, I’ll be following your progress with great interest!

    [1] http://tinyurl.com/qtajee
    [2] http://lambda-the-ultimate.org/node/4205

  5. You mentioned UnrealScript and DeusEx. Check out the Unreal Engine, that is one of the most powerful game engines right now.

    The limits of games is not bound by the host programming language but by resource handling and rendering, it’s more about working with shaders, tricks with textures and lightning during preprocess than worrying about GC. Unreal Engine uses UnrealScript which is pretty Java like instead of using C++ for game code. You don’t need raw memory access, you need the opposite, you need to focus on the game rather than low level stuff.

    Don’t listen to me, listen to what the people behind Unreal Engine think of a new language to work with games: http://www.cs.princeton.edu/~dpw/popl/06/Tim-POPL.ppt

    Check my project Sloth: https://github.com/BelfryGames/sloth

    I copied code meant to work with the newest OpenGL features with C++ and converted it to Scala. The examples run almost at the same framerate (i.e 855[Scala] vs 860[C++]).

    There I used really careful techniques to handle arrays before passing them to the GPU to avoid copying data. I made arrays of Vector that where side by side sharing contiguous memory while they still allowed handling as individuals but where good to go as one copy instruction when passed to the GPU via LWJGL. The only loss with my approach is memory but not processing power compared to C++.

    That doesn’t mean my approach scales to Game Logic without extra work but you must be extra careful on how you interact with the graphic device as that is where the biggest bottleneck is.

    I would honestly give it another go at optimizing you Scala code rather than writing a language from scratch. I recommend against worrying about raw memory management. Projects like ScalaCL http://code.google.com/p/scalacl/ to make optimization to code are good ways of improving Scala without going too far, like you did with the optimizer plugin.

    Nonetheless I would be extremely grateful if you proved me wrong and made a new kickass language to make games with :)

  6. Pingback: My Language Experiment « Villane

  7. I started something very similar 3-4 years ago. I hadn’t heard of Scala at the time but the language I designed ended up looking and feeling a lot like it. The work was almost endless which is what finally drove me to search more thoroughly for an existing solution and voilà, Scala. Unfortunately the JVM really does suck for game programming, but I still enjoy Scala. :) Best of luck with your project.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s