How to crash the Java compiler (JDK 8 update 112)

I have been testing Java optimizations lately and found a bug in the Java compiler by accident. The JDK version I use is 8 update 112, which is the latest production release available at this point (Dec 28, 2016). I have just reported the bug to Oracle, and hopefully they will fix it soon. In this blog post I will explain how to reproduce the bug. My computer is a MacBook Pro (i7 / 8 GB RAM / Early 2013 / OS X Yosemite).

EDIT: The bug is now confirmed: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8172106

Basically I wrote a small program that creates a big Java class with lots of unused assignments. Originally my intention was to check how the JVM would handle the performance of unused assignments (hopefully JIT would ignore them all, but I will check on that later), when I suddenly stumbled upon the bug.

Consider the source code below. This is part of my original code and I removed some pieces that are not needed to reproduce the bug:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class Main {

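   public static void main(String[] args) throws IOException {

      // Change this to a valid folder on your computer.
      final String folder = "/Users/hugo/temp/";

      // try-with-resources flushes and closes the writer even if a write fails.
      try (BufferedWriter out = new BufferedWriter(new FileWriter(folder + "Performance.java"))) {
         out.write("public class Performance {\n");
         out.write("\tpublic static int doNothing() {\n");
         out.write("\t\tint i = 0;\n");
         // Emit roughly 100 million unused assignments.
         for (int i = 1; i < 99999999; i++) {
            out.write("\t\ti = " + i + ";\n");
         }
         // The method and class are left unclosed: the stack trace below shows
         // javac failing in readSource, while it is still loading the file.
      }
   }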
}

This code compiles correctly. If you run it (change the “folder” variable first so that it points to a valid folder on your computer), it will generate a file called Performance.java. This is a big source file, about 1.5 GB in size.

Now if you try to compile this class:

javac Performance.java

You will get the following error message:

An exception has occurred in the compiler (1.8.0_112). 
Please file a bug against the Java compiler via the Java 
bug reporting page (http://bugreport.java.com) after checking 
the Bug Database (http://bugs.java.com) for duplicates. 
Include your program and the following diagnostic in your 
report. Thank you.
java.lang.IllegalArgumentException
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
   at com.sun.tools.javac.util.BaseFileManager$ByteBufferCache.get(BaseFileManager.java:325)
   at com.sun.tools.javac.util.BaseFileManager.makeByteBuffer(BaseFileManager.java:294)
   at com.sun.tools.javac.file.RegularFileObject.getCharContent(RegularFileObject.java:114)
   at com.sun.tools.javac.file.RegularFileObject.getCharContent(RegularFileObject.java:53)
   at com.sun.tools.javac.main.JavaCompiler.readSource(JavaCompiler.java:602)
   at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:665)
   at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:950)
   at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
   at com.sun.tools.javac.main.Main.compile(Main.java:523)
   at com.sun.tools.javac.main.Main.compile(Main.java:381)
   at com.sun.tools.javac.main.Main.compile(Main.java:370)
   at com.sun.tools.javac.main.Main.compile(Main.java:361)
   at com.sun.tools.javac.Main.compile(Main.java:56)
   at com.sun.tools.javac.Main.main(Main.java:42)
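
The top frame of the trace is ByteBuffer.allocate, which throws IllegalArgumentException when it is asked for a negative capacity. One plausible reading (an assumption, not something the trace proves) is that a buffer size derived from the roughly 1.5 GB file overflowed a 32-bit int. Here is a minimal sketch of that failure mode. The file size, the 50% padding rule and all names are hypothetical and chosen only for illustration; they are not taken from the javac source:

```java
import java.nio.ByteBuffer;

class AllocateOverflowDemo {

    // Hypothetical sizing rule: the file size plus 50% headroom, computed in int arithmetic.
    static int paddedCapacity(int fileSize) {
        return fileSize + fileSize / 2; // silently wraps around past Integer.MAX_VALUE
    }

    // Returns true if ByteBuffer.allocate rejects the requested capacity.
    static boolean allocateRejects(int capacity) {
        try {
            ByteBuffer.allocate(capacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // thrown for negative capacities
        }
    }

    public static void main(String[] args) {
        int fileSize = 1_500_000_000; // assumed size in bytes, roughly 1.5 GB
        int capacity = paddedCapacity(fileSize);
        System.out.println("capacity = " + capacity);
        System.out.println("rejected = " + allocateRejects(capacity));
    }
}
```

With these assumed numbers the padded capacity wraps to a negative value, and allocate refuses it with the same exception type that appears in the trace.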

I have found bugs in the JVM before, but this is the first bug I have found in the Java compiler. You can leave your comments below.

Comparing the volatile keyword in Java, C and C++

The volatile keyword exists in some programming languages, including Java, C and C++. I want to discuss in this blog post the differences, similarities and some details that every programmer should know about this keyword and help you understand how it affects your code. My focus will be only on Java, C and C++ because those are the languages I have most experience with. Feel free to share in the comments information about how other programming languages deal with the volatile modifier.

You probably know that Wikipedia has a good article about the volatile keyword, but the article doesn’t go very deep into the language details. The goal of this blog post is to compare the languages side by side and provide a bit more knowledge to anyone who is interested in this subject.

Declaring a volatile variable in Java

In Java, you can only add the volatile keyword to fields (instance or static):

public class MyClass {
   private volatile int count = 0;
   public static volatile int sum = 0;
}

Arrays can be volatile, but the elements won’t be volatile:

volatile int[] numbers = new int[10];
...
numbers[0] = 5;  // not volatile
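
If the elements themselves need volatile read/write semantics, one option from the standard library is java.util.concurrent.atomic.AtomicIntegerArray. A small sketch (the class and method names are made up for the example):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class VolatileElementsDemo {

    // Each element of this array is read and written with volatile semantics.
    private static final AtomicIntegerArray numbers = new AtomicIntegerArray(10);

    static int writeThenRead(int index, int value) {
        numbers.set(index, value);  // volatile write of one element
        return numbers.get(index);  // volatile read of the same element
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(0, 5));
    }
}
```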

Local variables cannot be volatile:

public void doStuff() {
   volatile int count = 0; // Error: illegal start of expression
}

Declaring a volatile variable in C and C++

In C and C++, the volatile keyword can be applied to local and global variables:

volatile int globalNumber = 0;
int* volatile pInt; // volatile pointer to nonvolatile int
volatile int* volatile pInt2; // volatile pointer to a volatile int

void doStuff() {
   volatile float localNumber = 5.0f;
}

Structs can also have volatile fields (the default member initializer below requires C++11; plain C does not allow it):

struct A {
   volatile double n = 0.0;
};

In C++, class members can also be volatile:

class T {
   volatile long id = 0;
};

C++ also allows the volatile qualifier to be applied to return values, parameters and class methods. This is explained at the end of the post (see below).

In Java, volatile means memory visibility

When a variable is declared volatile in Java, the compiler is informed that such variable is shared and its value can be changed by another thread. Since the value of a volatile variable can change unexpectedly, the compiler has to make sure its value is propagated predictably to all other threads. Let’s consider a simple piece of code:

volatile boolean isDone = false;
public void run() {
   while (!isDone)
      doWork();
}

The isDone variable is declared volatile. If that variable is set to true by another thread, this piece of code will read the new value and immediately leave the while-loop. If we forget to add the volatile modifier to that variable, there is a good chance the loop will never finish if the variable is set to true by another thread.

If a variable is declared volatile, all threads must read its most recent written value. The JVM does not cache volatile variables in registers or other caches that are hidden from other processors. This gives visibility to the variable and is considered a weaker form of synchronization between threads.
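
The stop-flag pattern described above can be sketched as a small runnable program. This is only an illustration: the class name, the sleep duration and the join timeout are arbitrary choices, and the counter stands in for doWork():

```java
class StopFlagDemo {

    private static volatile boolean isDone = false;

    // Starts a worker that spins until isDone becomes true,
    // then reports whether the worker actually stopped.
    static boolean workerStops() {
        isDone = false;
        Thread worker = new Thread(() -> {
            long iterations = 0;
            while (!isDone) {
                iterations++; // stand-in for doWork()
            }
        });
        worker.setDaemon(true); // never keep the JVM alive because of this demo
        worker.start();
        try {
            Thread.sleep(50);   // let the worker spin for a moment
            isDone = true;      // volatile write, visible to the worker
            worker.join(5000);  // wait up to five seconds for it to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStops());
    }
}
```

Removing the volatile modifier from isDone makes the outcome depend on the JIT: the worker may keep spinning because it is allowed to hoist the read out of the loop.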

In Java, the volatile keyword guarantees memory visibility and it is meant to be used in concurrent programming.

In C/C++, volatile means no optimizations

In C/C++, optimizations are done by the compiler. Most compilers have command line parameters that allow you to specify different levels of optimization. For example, take a look at some options from CLANG:

-O0 = no optimization
-O1 = somewhere between -O0 and -O2
-O2 = Moderate level of optimization, which enables most optimizations
-O3 = Like -O2, plus optimizations that take longer to perform or that may generate larger code

Some compiler optimizations include removing unused assignments and reordering memory operations. This is easy to check on the gcc.godbolt.org website. Let’s take a look at a simple example:

[Image: cpp1]

This code has an int variable x that receives a useless assignment on line 2. The compiler detects this issue, ignores the value 10 and decides to return 30 directly (see assembly code on the right).

The volatile keyword in C/C++ tells the compiler that every read and write of the variable must actually happen, which effectively disables optimizations on that variable. If we try the same code again, but with the volatile modifier applied to the x variable, the assembly code looks totally different:

[Image: cpp2.png]

Now the useless assignment isn’t removed by the compiler and the assembly code on the right reflects exactly the source code on the left. You may ask why someone would use this kind of less efficient code. One answer is memory-mapped I/O, where gadgets and electronic peripherals (e.g., sensors, displays, etc.) must communicate with the software in a very specific way. Optimizations could break the communication with those electronic parts.

Still in this topic, I have found an interesting difference between GCC and CLANG when it comes to optimizations and the volatile keyword. You can check it out in my previous article: a note about the volatile keyword in c++

In C/C++, the volatile keyword is NOT suitable for concurrent programming. It is meant to be used only with special memory (e.g., memory-mapped I/O).

In Java, volatile doesn’t guarantee atomicity and thread-safety

A common misunderstanding about the volatile keyword in Java is whether operations on such variables are atomic or not. The answer to this question can be YES and NO, so it depends on what the code is doing. Let me explain.

Operations are atomic if you look at individual reads and writes of the volatile variable. In other words, reading or writing a value works as if you are entering a synchronized block. So the integrity of the value is kept safe and there is no chance you will read a broken value because some other thread was still in the process of writing it.

Operations are NOT atomic if the code has to read the value before updating it. For example, consider this code:

volatile int i = 0;
i += 10;  // not atomic

The += operator isn’t atomic because operations on volatile variables perform no locking. Because there is no locking, the executing thread cannot block other threads before the operation is completed. So read-update-write sequences on volatile variables are not thread-safe in Java. Another thread could just sneak into the sequence and create a race condition.

One way of implementing atomicity in Java is to use atomic classes from the java.util.concurrent.atomic package, like AtomicInteger, AtomicBoolean, AtomicReference, etc. The previous code would look like this:

AtomicInteger i = new AtomicInteger(0);
i.addAndGet(10);  // atomic
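
To see the difference in practice, here is a small sketch that updates a volatile int and an AtomicInteger from two threads at the same time. The class name and the iteration count are arbitrary. The atomic counter always ends at the expected total, while the volatile counter can end lower because some read-update-write sequences overlap:

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {

    static volatile int volatileCounter = 0;
    static final AtomicInteger atomicCounter = new AtomicInteger(0);

    // Runs two threads that each add 1 to both counters perThread times.
    // Returns the final value of the atomic counter.
    static int run(int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Runnable task = () -> {
            for (int n = 0; n < perThread; n++) {
                volatileCounter += 1;        // read-update-write: updates can be lost
                atomicCounter.addAndGet(1);  // atomic
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return atomicCounter.get();
    }

    public static void main(String[] args) {
        int atomicTotal = run(100_000);
        System.out.println("atomic:   " + atomicTotal);
        System.out.println("volatile: " + volatileCounter);
    }
}
```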

In C/C++, volatile can be used in return types and parameters

C++ allows return types and parameters to be volatile. The following function takes a volatile int parameter and returns another volatile int:

volatile int work(volatile int x) {
  return x + 1;
}

As discussed before, the only difference the volatile qualifier makes in this case is that it won’t be optimized by the compiler. The same function without the volatile keywords would be optimized. Here is the assembly code generated by the CLANG compiler:

[Image: volatile_function]

Now compare the same function without the volatile qualifiers. The assembly code is simplified:

[Image: nonvolatile_function.png]

In C++, volatile can also be used in class methods, constructors and operators

C++ also allows you to use the volatile qualifier in class methods. Here is a simple example:

class MyClass {
  public:
   int normalMethod() { 
      return 1; 
    }
   int volatileMethod() volatile { 
      return 2; 
    }
};

If an instance of this class is declared volatile, then the code can only call volatile methods on that object:

int main() {
   volatile MyClass a;
   a.normalMethod();  // error: nonvolatile method cannot be called
   a.volatileMethod();  // OK 
   return 0;
}

If the instance is not volatile, then the code can call both volatile and nonvolatile methods:

int main() {
   MyClass a;
   a.normalMethod();  // OK
   a.volatileMethod();  // OK 
   return 0;
}

Conclusion

The meaning of the volatile keyword differs from language to language and this blog post briefly explains how it works in Java, C and C++.

In Java, volatile variables are meant to be used in multithreaded contexts. Threads that concurrently read a volatile variable will always read its latest value, which means the JVM will not cache the variable internally and its value will be shared across all processors.

Atomicity and thread-safety in Java are not guaranteed by volatile variables. In summary, you should use the volatile keyword only if writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value. Besides that, there shouldn’t be other variables participating in the concurrent state of the class, otherwise you will have race conditions. Finally, volatile is only enough when locking is not required for any other reason while the variable is being accessed.

In C/C++, the volatile keyword is meant to be used with special memory: manipulating I/O registers or memory-mapped hardware. All optimizations on those variables will be disabled, which means that assignments and their order will be preserved. The volatile qualifier doesn’t help us in multithreaded code.

We can finish with a table that summarizes the points discussed:

                                Java                                    C / C++
Purpose                         Concurrent programming                  Special memory (e.g., memory-mapped I/O)
What difference does it make?   Adds visibility to the variable         Disables optimizations on the variable
Is the operation atomic?        Yes for individual reads and writes;    No
                                no if the write depends on the
                                current value
Applies to                      Class fields                            Local and global variables; return types;
                                                                        function parameters; class fields and
                                                                        methods (C++ only)

Beware the return statement in Javascript

The return statement in Javascript looks innocent and intuitive, but it hides a caveat that can break the logic of your code and make a function silently return undefined. The best way to illustrate this is to use a simple example, so let’s consider this function:

function greeter(name) {
    return 'Hello ' + name;
}
greeter('Hugo');  // works as expected

The function above works as expected and you don’t have to worry much about it. Now let’s imagine that you have to concatenate more strings inside the greeter function and you end up with a much longer line of code like this:

function greeter(name) {
   return 'Hello, ' + name + ',\nThis is a long long long sentence.\nThis is another long sentence.';
}
console.log(greeter('Hugo'));

The code above also works fine if you keep it exactly as it is. You can notice that the string is too long to fit the width of the screen, so any decent developer would care about that and try to break that long line into smaller readable chunks. One possibility is:

function greeter(name) {
   return 
       'Hello, ' + name + ',\n' +
       'This is a long long long sentence.\n' + 
       'This is another long sentence.';
}
console.log(greeter('Hugo'));

B-I-N-G-O! The code above does NOT work as expected. This function returns undefined because Javascript’s automatic semicolon insertion ends the return statement at the line break, so the return is actually empty and the next lines are never treated as part of it. There are some ways to fix this issue. One of them is to remove the line break after the return keyword, like this:

function greeter(name) {
   return 'Hello, ' + name + ',\n' +
       'This is a long long long sentence.\n' + 
       'This is another long sentence.';
}
console.log(greeter('Hugo'));

You can notice that the code above is a bit unaligned. A perfectionist (like me) would add the line break after the return in order to fix the alignment problem, but this actually creates the other (much bigger) problem. Sigh.

Another way of fixing the code is to use parentheses around the return value. In this case the alignment isn’t a problem:

function greeter(name) {
   return (
       'Hello, ' + name + ',\n' +
       'This is a long long long sentence.\n' + 
       'This is another long sentence.'
   );
}

Parentheses are valid in this case because the return statement can be followed by an expression. It is quite uncommon to see such syntax, but in this case it plays an important role. This is one of the many details about the language that you have to be aware of in order to play it safe. If you develop code in other languages — like Java or C++ — you know that such code without the parentheses would work perfectly fine in those languages, so it would be fairly easy to forget to use them when you switch back to Javascript.
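
For comparison, here is the same function in Java, where a line break right after return is harmless because the statement only ends at the semicolon. This is just a sketch with a made-up class name:

```java
class GreeterDemo {

    static String greeter(String name) {
        // The line break after "return" does not end the statement in Java.
        return
            "Hello, " + name + ",\n" +
            "This is a long long long sentence.\n" +
            "This is another long sentence.";
    }

    public static void main(String[] args) {
        System.out.println(greeter("Hugo"));
    }
}
```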

Typescript is better

In Typescript we could have some similar code like this:

function greeter(name: string): string {
   return 
       'Hello, ' + name + ',\n' +
       'This is a long long long sentence.\n' + 
       'This is another long sentence.';
}
console.log(greeter('Hugo'));

If you try to compile this code with Typescript you get the following error message:

code.ts(3,8): error TS7027: Unreachable code detected.

So it basically says that line 3 has unreachable code and now you have to fix it. This is a much better world compared to pure Javascript development since you know that this issue will not waste your time later.

If you use Typescript, make sure the compiler option “allowUnreachableCode” is not enabled (when it is set to true, unreachable code is not reported and you will not see the error above). Read more about this flag on the compiler options page.

One question may come to your mind at this point: should Typescript fix the code automatically by generating JS code that doesn’t contain the line break after the return statement? Nope. The number of scenarios that this issue could happen is huge, so it is much safer to just warn the developer instead of trying to fix the code silently. I think Typescript is doing a good job here (kudos to Microsoft).

Conclusion

The return statement issue described in this post is just one of many caveats of the Javascript language. It should remind you that Javascript alone can be a treacherous path and you should definitely consider better alternatives like Typescript in your development. Javascript alone can make the software maintenance a nightmare in the long run, so serious developers must look for better alternatives in order to detect and fix mistakes as early as possible. Be productive and stay safe.

Having fun with Typescript, ThreeJS and Ammo.js

This week I implemented a simple 3D experiment with Typescript, ThreeJS and Ammo.js. It works like an FPS game where you can navigate with the mouse pointer and W-A-S-D keys, and you can also shoot cannon balls with the mouse button.

[Image: webgl_shooter.png]

Using Typescript

Typescript is a great language that compiles to Javascript code. If you are a web developer and have never used this language before, you should check it out immediately because it has a lot of interesting things that can make your life (as a web programmer) much easier. You know that Javascript is a powerful language, but alone it can make the maintenance of big projects a nightmare. Typescript is a great step towards high quality maintainable code because it is a typed superset of Javascript.

The challenge of using Typescript with some external libraries is that you need the type definition files of those libraries to compile your code. A type definition file is a piece of typescript code that contains the declaration of types, classes, interfaces and functions of a library. It doesn’t provide implementation details, only types and signatures. This allows Typescript to analyze your code and detect programming errors. For example, check the ThreeJS definition file here:

DefinitelyTyped is a website that aggregates lots of definition files and it is a great asset for Typescript developers. Sometimes it doesn’t contain definition files that you need and this was the case of the Ammo.js library I used in my experiment.

No definition files – What should I do?

When there is no definition file available for the library you want to use, you can build your own. Typescript can certainly guide you through this process. The compiler errors reported by Typescript will show you exactly what is missing in the definition file. For example, let’s try to declare a variable like this:

let physicsWorld: Ammo.btDiscreteDynamicsWorld;

You can start with an empty definition file (ammo.d.ts). Compiling the code above will show this message:

Cannot find namespace 'Ammo'.

So you can use this information to fix your custom definition file for that library. In this case, you can just add an empty namespace like this:

declare namespace Ammo {
}

After that, if you try to compile again the error message will change to:

Module 'Ammo' has no exported member 'btDiscreteDynamicsWorld'.

So you can change your definition file to:

declare namespace Ammo {
    export class btDiscreteDynamicsWorld {
    }
}

Following this pattern, you can basically add the missing information piece by piece to the definition file until your project is fully compilable.  Of course you have to keep an eye on the real library code and make sure your declarations are consistent with the real implementation. For Ammo.js, you can see below the final definition file I created. Note that there is no implementation code, only types and signatures.

declare namespace Ammo {

   export class btDefaultCollisionConfiguration {}

   export class btCollisionDispatcher {
      constructor(c: btDefaultCollisionConfiguration);
   }

   export class btVector3 {
      x(): number;
      y(): number;
      z(): number;
      constructor(x: number, y: number, z: number);
   }

   export class btAxisSweep3 {
      constructor(min: btVector3, max: btVector3);
   }

   export class btSequentialImpulseConstraintSolver {}

   export class btDiscreteDynamicsWorld {
      constructor(a: btCollisionDispatcher, b: btAxisSweep3, c: btSequentialImpulseConstraintSolver, d: btDefaultCollisionConfiguration);
      setGravity(v: btVector3);
      addRigidBody(b: btRigidBody);
      stepSimulation(n1: number, n2: number);
   }

   export class btConvexShape {
      calculateLocalInertia(n: number, v: btVector3);
      setMargin(n: number);
   }

   export class btBoxShape extends btConvexShape {
      constructor(v: btVector3);
   }

   export class btSphereShape extends btConvexShape {
      constructor(radius: number);
   }

   export class btRigidBody {
      constructor(info: btRigidBodyConstructionInfo);
      setActivationState(s: number);
   }

   export class btQuaternion {
      x(): number;
      y(): number;
      z(): number;
      w(): number;
      constructor(x: number, y: number, z: number, w: number);
   }

   export class btTransform {
      setIdentity();
      setOrigin(v: btVector3);
      getOrigin(): btVector3;
      setRotation(q: btQuaternion);
      getRotation(): btQuaternion;
   }

   export class btRigidBodyConstructionInfo {
      constructor(mass: number, motionState: btDefaultMotionState, shape: btConvexShape, inertia: btVector3);
   }

   export class btDefaultMotionState {
      constructor(t: btTransform);
   }
}

You should realize that my type definitions for Ammo.js are far from complete, considering what the library does. Ammo is a huge library and it contains a lot more things that my code will not use at this point. This is okay and the main point here is that the file I created is enough for me to compile my experiment and keep moving. If I need more methods from Ammo.js, I can adjust that file as needed and move on.

Typescript and browser differences

Sometimes Typescript will NOT understand your Javascript code and you will have to deal with it. Let me give you a clear example:

let m = document.body.requestPointerLock || 
        document.body.mozRequestPointerLock || 
        document.body.webkitRequestPointerLock;

The code above is syntactically correct and it looks for the Pointer Lock API in the current browser. Typescript complains about this code with the following error message:

Property 'mozRequestPointerLock' does not exist on type 'HTMLElement'.
Property 'webkitRequestPointerLock' does not exist on type 'HTMLElement'.

Those methods are browser-specific and Typescript doesn’t recognize them. To get rid of this error, one possible solution is to assign the document.body value to a variable of type “any”. This will prevent Typescript from checking if those properties belong to the HTMLElement type:

let _body: any = document.body;
let m = _body.requestPointerLock || 
        _body.mozRequestPointerLock || 
        _body.webkitRequestPointerLock;

There are other ways of handling such cases, but I won’t explain them in this post. One great thing about Typescript is that it doesn’t stop the compilation when it finds errors like the one above. It simply generates the Javascript output as if the code were correct and you can clean up that error message later.

Conclusions

Working with Typescript is an amazing experience. I believe that Microsoft is doing a great job with this language and contributing to the future of web development with great ideas. My experiment with ThreeJS and Ammo.js is just beginning. I have good experience with the BulletPhysics library (see my personal projects) and this should allow me to expand the code and build more cool demos, examples and games.

A note about the volatile keyword in C++

The volatile keyword in C++ is widely misunderstood. Lots of developers believe that it makes the declared variable an atomic variable, but that’s wrong. The truth is that you should use std::atomic or related code (e.g., mutex) on a variable if you want to guarantee atomic operations in a multithreaded context.

What I want to discuss in this article is something else. Before I start you should remember that the volatile keyword should be used when you want to tell the compiler to not optimize a given variable. It basically says “don’t perform any optimizations on operations on this memory because something else outside the program (which the compiler is not aware of) may change it”. The most common use of volatile is in memory-mapped I/O. Gadgets and electronic peripherals (e.g., sensors, displays, etc.) may communicate with the software and optimizations could break the code.

The use of the volatile keyword has caught my attention recently and I want to show you a curious thing about it. I will go through some code examples and I will use an amazing online tool called gcc.godbolt.org to look at the assembly code generated by the compiler.

With the GCC compiler

Let’s start with a very simple example without the volatile keyword. As you can see in the code below, we have an int variable and a while loop. The GCC compiler realizes that the variable will never change its value and the loop will never finish. So it optimizes the assembly code by ignoring the variable check and keeps jumping forever back to the L2 label.

[Image: no_volatile]

Now consider the same code, but let’s add volatile to the declaration of the int variable on line 4. As we can see, the GCC compiler now generates an assembly code that is totally different than the previous example and it respects the value == 5 check inside the loop.

[Image: volatile1.png]

Now let’s change this example a little bit. What would happen if we move the int variable into a struct? Consider the code below without the volatile keyword. The assembly code is the same as in the first example. The GCC compiler ignores the loop check and keeps jumping forever.

[Image: no_volatile2]

If we want to tell the compiler to not optimize the int variable, should we add the volatile keyword to the int variable declaration inside the struct? Let’s see how this would work:

[Image: volatile2]

As we can see, the GCC compiler simply ignored the volatile definition inside the struct and optimized away the loop again. A careless developer could easily assume that code was correct and move on to other tasks. Bugs later would require hours of debugging in order to find the source of the problem. The real solution here is to add the volatile keyword to the struct object on line 6:

[Image: volatile3.png]

I’ve tried different versions of GCC and they all give the same result regarding this issue. Now let’s try CLANG and see what happens.

With the CLANG compiler

Using clang 3.9 we can start without the volatile keyword. As you can see below, the loop is optimized just like it happened with the GCC compiler.

[Image: clang0.png]

Now let’s try the volatile keyword in the struct member variable. We have a surprise here: CLANG doesn’t optimize the loop like GCC did. I tried older versions of CLANG and the output is the same.

[Image: clang1.png]

We get the same result when we move the volatile to the object on line 6.

[Image: clang2.png]

So which compiler is the correct one?

It seems to me that CLANG is doing a better job than GCC because the loop deals with the struct object and its member variable. So if the volatile modifier is present either in the object or in its member, then the compiler has to respect that in the loop.

If you have more thoughts about this, please leave comments below.

EDIT 1: I posted this discussion on reddit and we have some great comments there.

EDIT 2: This issue was also discussed on the CppCast Episode 76 with Dan Saks, check it out: http://cppcast.com/2016/10/dan-saks/

Reading text files line by line: Java, C++ and C benchmark

I wanted to compare how Java, C++ and C perform when reading a text file line by line and printing the output. I’ve implemented some possibilities and, at the end, we can compare the speed of each execution.

Java 7 version (BufferedReader)

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Main7 {

   private static final String FILENAME = "/path/to/file.txt";

   public static void main(String[] args) throws IOException {
      File file = new File(FILENAME);
      // try-with-resources closes the reader once the loop is done
      try (BufferedReader br = new BufferedReader(new FileReader(file))) {
         for (String line; (line = br.readLine()) != null; ) {
            System.out.println(line);
         }
      }
   }
}

Java 8 version (Stream)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Main8 {

   private static final String FILENAME = "/path/to/file.txt";

   public static void main(String[] args) throws IOException {
       try (Stream<String> stream = Files.lines(Paths.get(FILENAME))) {
           stream.forEach(System.out::println);
       }
   } 
}

C++ (ifstream)

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main() {
   const constexpr char* FILENAME = "/path/to/file.txt";
   ifstream file(FILENAME);

   string line;
   while (file.peek() != EOF) {
      getline(file, line);
      cout << line << '\n';
   }
   return 0;
}

C (FILE)

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   FILE* fp = fopen("/path/to/file.txt", "r");
   if (fp == NULL)
      exit(EXIT_FAILURE);

   char* line = NULL;
   size_t len = 0;
   while ((getline(&line, &len, fp)) != -1) {
      printf("%s", line);
   }

   fclose(fp);
   if (line)
      free(line);
   exit(EXIT_SUCCESS);
}

Performance

Below you can see how each program above performs when reading a TXT file with 10,440 lines (file size is 3.5 MB).

 Version  Time
 Java 7 (BufferedReader)  0.254 seconds
 Java 8 (Stream)  0.324 seconds
 C++ (ifstream)  0.429 seconds
 C (FILE)  0.023 seconds

As we can see, C is by far the fastest option. I am surprised that C++ isn’t faster than Java, but this is probably because of the ifstream and std::getline() implementation. This is not the first time I have seen the C++ Standard Library show performance issues compared to other languages (the regular expression implementation was the first).
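
The numbers above come without the measuring code. As a rough sketch of one way to time the Java side from inside the program (an assumption, not the method behind the table), System.nanoTime() can bracket the read loop. The temporary file, its contents and the class name are made up for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

class ReadTimingDemo {

    // Reads the file line by line and returns the number of lines read.
    static long countLines(Path file) {
        long lines = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Creates a temporary text file with the given number of identical lines.
    static Path createSample(int lineCount) {
        try {
            Path file = Files.createTempFile("sample", ".txt");
            Files.write(file, Collections.nCopies(lineCount, "some sample text"), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createSample(10_440);
        long start = System.nanoTime();
        long lines = countLines(file);
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("%d lines in %.3f ms%n", lines, elapsedNanos / 1_000_000.0);
        file.toFile().delete();
    }
}
```

Timing inside the program leaves out JVM startup, so the result is not directly comparable with a wall-clock measurement of the whole process.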

Advantages of Mercurial over Subversion

I have been wondering about version control systems (VCS) recently and I wanted to build a list with clear advantages of Mercurial over Subversion. First of all, Mercurial is a distributed VCS, Subversion is not. This brings advantages like:

1) You have a full copy of the repository on your computer, so you don’t have to rely on server backups in case a server goes down.

2) If you don’t have internet connection, you can still work and commit the changes to your local repository. When the connection is back online, you can push your changes to the server.

3) Mercurial organizes revisions as changesets which allow you to very easily branch/merge the code base. Merging a branch in SVN is harder.