Scala: for a/vs while {update}

As I mentioned in a post about the performance of Scala’s functional style for expressions compared to the imperative while, the Scala 2.7.0 release is going to improve that over the 2.6.x releases, because they fixed a performance blooper with the Range class. The same code in Scala 2.7.0 RC1 (Release Candidate) makes the for expression over a range about 2.5 times faster. But it’s still 60 times slower than while. I’ll re-post the code and the new comparison results, along with the Time singleton.

object Time {
  def apply[T](name: String)(block: => T) {
    val start = System.currentTimeMillis
    try {
        block
    } finally {
        val diff = System.currentTimeMillis - start
        println("# Block \"" + name +"\" completed, time taken: " + diff + " ms (" + diff / 1000.0 + " s)")
    }
  }
}

object ComprehensionPerfImpact {
  def main(args : Array[String]) : Unit = {
    val n = 100000000

    Time("for-comprehension") {
      for (x <- 1 to n) {}
    }
    Time("while") {
      var x = 0
      while (x <= n) { x += 1 }
    }
  }
}&#91;/sourcecode&#93;
<code># Block "for-comprehension" completed, time taken: 6672 ms (6.672 s)
# Block "while" completed, time taken: 109 ms (0.109 s)</code>

The previous results with Scala 2.6.0 were: <code>16.797 s</code> vs. <code>0.125 s</code>. Note that the small difference in <code>while</code> performance between then and now is probably random, as the results differ a bit on each execution.

In the previous post the conclusion was that since a while expression is compiled into some simple byte-code that uses <code>goto</code>, and for is translated into some higher-order function calls, they can never quite match in performance. But this still seems like an awfully big difference to me. If I find some time for it during this week or the next, I'll profile the code in detail and find what exactly causes the for-comprehension to perform so slowly.

<span id="more-13"></span>

In the mean time, I implemented a simple and really naive <code>MyRange</code> class, by extending <code>Iterator[Int]</code>. It isn't very correct, but performs well because it doesn't do much. It also shows how to use custom classes in for comprehensions:


class MyRange(start: Int, end: Int) extends Iterator[Int] {
  var curr = start
  def hasNext = curr < end
  def next = { curr += 1; curr }
}

object ComprehensionPerfImpact2 {
  def main(args : Array&#91;String&#93;) : Unit = {
    val n = 100000000

    Time("my range") {
      for (x <- new MyRange(1,n)) {}
    }
  }
}&#91;/sourcecode&#93;
<code># Block "my range" completed, time taken: 2329 ms (2.329 s)</code>

Now, this seems better. Still 20 times slower than while, but I guess we couldn't get it much faster any more just by changing libraries -- MyRange is a pretty minimal implementation of an Iterator. Actually, you can make it slightly faster by making MyRange an internal class of the "my range" closure and moving the <code>curr</code> variable out of the class -- I guess that way the Scala compiler can optimize <code>Int</code> usage and will not have to do auto-boxing. But doing that would result in some weird and definitely not reusable code -- it would be better to just use while.

PS. Something strange happened when I put all of these three loops in the same class -- the second for expression (whether it was using the built-in range or MyRange) became slower by 2-3 seconds. I have no explanation for why that happens at the moment, but didn't want that to mess up the results, that's why I put the last loop in its own class.

<b>Update:</b> I thought of trying the same MyRange class in a while, to get a comparison of for and while expressions that do the exact same thing. The difference is roughly 9 times:

...
    Time("while: my range") {
      var r = new MyRange(1,n)
      while (r.hasNext) { r.next }
    }
...

# Block "while: my range" completed, time taken: 265 ms (0.265 s)

Advertisements

4 thoughts on “Scala: for a/vs while {update}

  1. Pingback: Scala: for a/vs while « Villane

  2. Hi,

    Microbenchmarks in the JVM are quite tricky due to the optimizing JIT. There are many things to look for, but instead of going through them here I will point you to this article[1] that mentions many of them. Even for a quick and simple benchmark, it’s very important to warm-up the JIT by performing many untimed iterations before the timed ones.

    Hope it helps,
    Ismael

    [1] http://www.ibm.com/developerworks/java/library/j-jtp02225.html

  3. I ran almost identical benchmarks to these and found the same result.

    I agree with the micro-benchmark comment previously mentioned, but I do not think it will change your comparative results here that much.

    I think this kind of benchmark is very enlightening about understanding what code is fast and what code is slow short of reading the byte assembly. The most basic coding elements should be benchmarked and understood, so that developers can understand the effects of employing one mechanism over another. Here the mechanism is a language feature comparison versus a classic data structure comparison.

    I had run my own benchmarks that are almost exactly the same as yours before I found this article of your. I came to similiar results. In addition, the results between a java while loop and scala were about the same. This is expected since they should produce the same byte code and eventually the same fast native integer based code as HOTSPOT does its work upon successive iterations.

    I ran tests using range.foreach/closures, for-comprehensions, and while-loop. I believe the mechanism for the for-comprehension is similiar to a closure call and thus the slowness. The speed of for-comprehension versus closure is very close on 2.6.x scala. There is a very important overhead to closures, and maybe that overhead is similiar to the closure overhead. Maybe there will opportunities around optimizing closure calls to speed this up.

    I found Joshua Blochs comments on closure performance interesting. Closures have a lot things they must do under the covers to make them happen properly.

    http://www.parleys.com/display/PARLEYS/The+Closures+Controversy

    Scala invites one to use these closures everywhere including the inner-most loops. This is what concerns me most about scala. I would hope that such simple and common use cases can be optimized, but such optimization may take as long it took to get from byte-code-interpretation to JIT to HOTSPOT.

    Scala does have a leg-up on those dynamic languages. This much is certain. However, if coders are using closures all over the place in their inner most loops, then performance could be hit hard. I hope I am missing something on that point.

    If you have not seen it, there is small-ish but more complex benchmark comparing java, scala, and groovy:

    http://dmy999.com/article/26/scala-vs-groovy-static-typing-is-key-to-performance

    This shows scala performing well in a more real-world benchmark (for comprehensions and all) and this is ultimately what matters.

    Keep doing those micro-benchmarks! There is no better way to a get “feel” for what kind of code is fast versus slow in any language IMO. Talk is cheap – experimenting gets results.
    Steve

  4. I’ve taken a look at what happens when when using a List, which is strict (1 to n yields a Range which is lazy). First observation, I get a heap overflow for such a big value of n :b
    So I set it lower. Second observation, for comprehension with a List takes considerably less to complete IF you don’t count the time it takes to construct Range/List. Building a List from a Range (r.toList) is, apparently, very expensive.
    My 2c

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s