How can a language whose compiler is written in C ever be faster than C?

Question Detail: 

Taking a look at Julia's webpage, you can see some benchmarks of several languages across several algorithms (timings shown below). How can a language with a compiler originally written in C, outperform C code?

enter image description here Figure: benchmark times relative to C (smaller is better, C performance = 1.0).

Asked By : ben
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/45486

Answered By : tsleyson

There is no necessary relation between the implementation of the compiler and the output of the compiler. You could write a compiler in a language like Python or Ruby, whose most common implementations are very slow, and that compiler could output highly optimized machine code capable of outperforming C. The compiler itself would take a long time to run, because its code is written in a slow language. (To be more precise, written in a language with a slow implementation. Languages aren't really inherently fast or slow, as Raphael points out in a comment. I expand on this idea below.) The compiled program would be as fast as its own implementation allowed—we could write a compiler in Python that generates the same machine code as a Fortran compiler, and our compiled programs would be as fast as Fortran, even though they would take a long time to compile.

It's a different story if we're talking about an interpreter. Interpreters have to be running while the program they're interpreting is running, so there is a connection between the language in which the interpreter is implemented and the performance of the interpreted code. It takes some clever runtime optimization to make an interpreted language which runs faster than the language in which the interpreter is implemented, and the final performance can depend on how amenable a piece of code is to this kind of optimization. Many languages, such as Java and C#, use runtimes with a hybrid model which combines some of the benefits of interpreters with some of the benefits of compilers.

As a concrete example, let's look more closely at Python. Python has several implementations. The most common is CPython, a bytecode interpreter written in C. There's also PyPy, which is written in a specialized dialect of Python called RPython, and which uses a hybrid compilation model somewhat like the JVM. PyPy is much faster than CPython in most benchmarks; it uses all sorts of amazing tricks to optimize the code at runtime. However, the Python language which PyPy runs is exactly the same Python language that CPython runs, barring a few differences which don't affect performance.

Suppose we wrote a compiler in the Python language for Fortran. Our compiler produces the same machine code as GFortran. Now we compile a Fortran program. We can run our compiler on top of CPython, or we can run it on PyPy, since it's written in Python and both of these implementations run the same Python language. What we'll find is that if we run our compiler on CPython, then run it on PyPy, then compile the same Fortran source with GFortran, we'll get exactly the same machine code all three times, so the compiled program will always run at around the same speed. However, the time it takes to produce that compiled program will be different. CPython will most likely take longer than PyPy, and PyPy will most likely take longer than GFortran, even though all of them will output the same machine code at the end.

From scanning the Julia website's benchmark table, it looks like none of the languages running on interpreters (Python, R, Matlab/Octave, Javascript) have any benchmarks where they beat C. This is generally consistent with what I'd expect to see, although I could imagine code written with Python's highly optimized Numpy library (written in C and Fortran) beating some possible C implementations of similar code. The languages which are equal to or better than C are being compiled (Fortran, Julia) or using a hybrid model with partial compilation (Java, and probably LuaJIT). PyPy also uses a hybrid model, so it's entirely possible that if we ran the same Python code on PyPy instead of CPython, we'd actually see it beat C on some benchmarks.

No comments

Powered by Blogger.