Convert Java into C#

Rodrigo Di Olivera is a powerfully creative man, of relatively few words, who perhaps speaks loudest through the software he creates (including Boo, Prevayler).

Now he’s released a nice piece of programming alchemy: a tool that converts Java into C#.

One day I hope to port my Arcadia RPG engine to Windows Mobile (.NET Compact Framework). I expect this tool has just made that task almost trivial, given that Arcadia makes very few demands of its host environment’s APIs.

Software Engineering Radio

Hearing a speech by an expert makes a quite different impression to reading their written words. Its not surprising the two media aren’t interchangeable, given that differing senses, neural circuitry, and mental frameworks are involved in experiencing each.

My feeling is that speech better outlines general concepts and big ideas, while the variety of intonation in voice can convey subtle values judgments and preferences that text renders flat. Written text, meanwhile, rules supreme when fine detail needs to be communicated, or when the consumer of the information needs to regulate their consumption rate (for example, when reading mathematical formulae).

Software Engineering Radio is a series of about 100 podcasts from leading experts on a diverse range of topics. Its a fairly effortless and enjoyable way to expose your mind to some of the finest memes (and people) in the field today, and a refreshingly distinct alternative to reading blogs and books.

Static vs Dynamic Typing: My Preference

There has been a vigourous debate in the couple of years running through the whole software development community on the merits of Dynamic vs Static typing.

When I was at uni in the mid 90’s, the static typing forces seemed dominant, but the tables have turned. Now dynamic typing is cool. Ruby, Groovy Python, Erlang, Clojure and Javascript are wowing everyone with their funky features and terseness. Sometimes it seems like all the smart people are doing it, while the mediocre corporate.programmers of the world plod along in statically typed Java, C# or C++.

Where do I stand?

Static Typing Forever!!!

I still love statically typed languages. For me, its the only way to fly.

The safety & correctness aspect of static typing is only a part of their value.The fundamental reason for typing is to put more of the intent of the code into the language. Thus requiring less knowledge be stored in the programmers head.

def writeToStream(stream)
  stream.write(@content)
end

bytesWritten = envelope.writeToStream(filestream)

In this Ruby example, the programmer clearly intended “stream” to be a particular type of object. That is, stream does have a type, its just implicit in their head. Well then, why not use a language and toolset smart enough to understand type of ‘stream’? Why burden the programmers brain with sole responsibility for implementing the type system?

Yes, I’ve used Dynamic Typing in Anger

For 6 months Ive been working on a large commercial JRuby/Rails and heavy Javascript dynamic web site. Dynamically typed code all day long. Yes, you can stay DRY and keep the syntax clutter low. But Im tired of operating in the fog of partial certainty about what the objects Im woking with actually are and can do.

The challenge: Making Static Typing easier

I think the surging popularity of dynamic typing is symptomatic of the pain of the syntax-heavy static type systems in C++/Java/C#. But I would rather fix the pain and keep the types.

Type Inference is the answer. To those who favor dynamic types: have you looked at statically inferred-type language like Scala, Boo or Haskell?

Virtual Machines will become fastest code execution environment

I’m going to paraphrase a point made recently by Charles Nutter (can’t recall exactly where), because it bears repeating and was a rather profound realization for me.

Nowadays, optimization of code is not bounded by available CPU power for performing optimization, but by

  • information available about the context in which the code will be run
  • the ability to change or withdraw an optimization in response to changed conditions

JITing VMs have better information about code’s execution context than static compiler. They can “change their mind”, and recompile a method with different optimizations applied in response to contextual changes. This permits more aggressive optimizations; inlining of virtual methods being a classic example. Therefore, their trend will be not only to equal, but surpass pre-compiled executables, for modern OO/Functional/Dynamic apps. Virtual machines like the JVM, .NET and cousins, are destined to become the fastest place to run the software of the future.

Of course, for simple, fixed tasks, old C code will still run fastest of all. In comprehending this principle, its important to compare apples with apples. Demands for flexibility and productivity in software have relentlessly pushed it towards more dynamic behaviors. Popular modern apps like Eclipse or Rails rely on dynamic behaviors that are impossible or impractical in statically linked C or C++, and this will only grow.

The JVM needs Value Types

My biggest frustration with Java and the JVM gets little attention outside very geek circles: the lack of support for Value Types in the JVM (known as structs in .NET). Here I’m going to describe them and argue the case for supporting them.

Value Types give the ability to lay out an application’s data in memory in a much more efficient way than the JVM currently allows. They can significantly improve the performance of JVM applications that use small, numerous objects. Since memory layout is considered a low-level issue, the lack of, or even the concept of, Value Types, doesn’t receive much publicity & discussion. Undeservedly! I hope that I can convince you that in fact, a crucial foundation piece needed for a performant, all-purpose VM is currently missing from Java.

What are Value Types?

I use the term “Value Types” to mean lightweight objects that behave, and are laid out in memory, like Java’s primitives:

  • A value type resides “at” the field, local variable, or array cell where it is defined. Thus, it is created as part of a containing object, stack frame or array, respectively.
  • Like a primitive, it is not allocated as an object on the heap and thus it is not garbage collected. To be passed as a reference type (ie normal object), it must be boxed. Only the boxed form can be null.
  • Unlike a primitive, a value type can have (a) multiple named fields which can be either value types or references to normal objects, and (b) static and instance methods like any other object.
  • When a value type is passed or assigned, a copy of its values is made.
  • When value types are allocated in an array, the array cells contain the type’s actual data fields. When reference types are allocated in an array, the cells contain pointers to the object record, which reside somewhere else on the heap. This quality is one of value type’s most compelling advantages.
  • Value types dont support inheritance well.

Examples of Value Types

The Common Language Runtime in the .NET platform supports Value Types (called Structs), and so its APIs provide a number of examples. A compelling example is its Decimal , which covers similar ground to Java’s cumbersome, inefficient BigDecimal. It supports 28 significant digits of precision without floating point error, and does all this in 8 bytes of storage. From a programmers perspective, it behaves just like one of Java’s primitive types, providing a blend of efficiency and convenience. Other examples are string, DateTime and Point (you can see the tendency for value types to be small, lightweight things.)

Value Types can be useful outside core libraries too. In 3D graphics, Triangles are ubiquitous and make an excellent value type, as do Points, Rectangles, Ranges and Quaternions. In scientific computing, Complex Numbers consist of a pair of floating point components, the real and imaginary parts, which are perfect candidates to be value types.

Why & Where are they more Efficient

Recall that a value type object is nested within its containing type in memory, whereas every regular reference type object has its own distinct record in the heap, and containing objects have a pointer to that record. This is the cause of all the efficiency benefits; when compared to a reference type, they

  • requires less space in memory to represent the same data fields (once).
  • do not require a separate allocation operation – memory is allocated as part of their parent’s allocation
  • require less CPU effort to access their field in memory, because they are a fixed offset from their parent objects memory address. Reference types require de-referencing a pointer to access a non-primitive object field.
  • make better, less-polluting use of CPU cache memory, because reside in the same block of memory as their containing object, and because they are more compact. Reference objects are allocated onto a mixed heap shared with other threads and object types, so loading a block of heap memory into a cache line can “pollute” it with irrelevant data.
  • The VM’s Garbage Collection system can treat value types as having the lifespan of their containing reference type, so garbage collection of value types is very low cost.
  • Modern VMs relocate objects in memory when they garbage collect. This requires them to track and rewrite all pointers to relocated objects as they are moved. Because value types do not have pointers to them, there is nothing to track or relocate.

Note that reference types can be shared by many parents, but value types have only parent, and will result in duplicated copies where shared amongst many objects. They are not appropriate for widely shared, or shared mutable, data.

Arrays of Value Types: Benefits Multiplied!

Value types power is unleashed when they are stacked in arrays for bulk operations- neat rows of data-bricks lined up in memory. The performance advantage of a large value-type array, relative to a large reference-type array, is massive;

  • An entire cache line can be filled with the value type data, as soon as the first instance is accessed, and iterating over the sequence can proceed without going back to main memory. A reference type fills the cache up with pointer addresses, to heap records that may be scattered or mixed with other object types. (Granted, a smart GC can often arrange related heap records together after the first generation copy, thus alleviating this problem)
  • Because value types have a regular fixed size, the memory address of the nth array element can be computed from n and the vale type’s size. This is quicker than a pointer dereference, which requires a memory read.
  • Huge numbers of value types can be created in a single bulk memory allocation. Although reference type allocation is apparently very fast in modern VMs, it still must be done for every reference type array member and presumably requires thread/CPU synchronization/barriers.
  • Huge numbers of value types can be cleaned up without any need for the GC to consider their lifespan or track their reachability. When the containing array becomes eligible for collection, they are all disposed of at once.

Because arrays of value types consist of a sequence of data values and nothing else, they can also be directly mapped onto data from IO operations. For example, imagine a bulk quantity of point data is written into a block of memory in a JVM, from either the network or filesystem. With value types this data could be “interpreted” as type Point[] in place, without any subsequent copying or allocation. (See ByteBuffer for how this currently works for Java primitives.)

An extension of such a buffer sharing approach might allow Java apps to work efficiently with hardware accelerators for 3D graphics and physics. Typically, the APIs to such devices work on the same copy of the data as the client application, as the data volumes involved are too large to duplicate.

An Example: Triangle Mesh Data

Now let me give a practical example: imagine we want to represent a
10,000-triangle mesh.

struct Vertex {

double x;

double y;

double z;

//constructor, methods etc..

}

struct Triangle {

Vertex v1; Vertex v2; Vertex v3;

//constructor, methods etc..

}

class Mesh {

Triangle[] meshTriangles = new Triangle[10000];

}

So all up, as well as 10,000 triangles, there are 30,000 vertices. Lets
compare how value types and reference types handle this challenge:

Value Types Reference Types
Allocation Meshes entire mesh allocated in one block in one
operation
Mesh’s memory allocated in 40001 separate blocks
in 40001 separate new calls
Access a field eg meshTriangles[8622].v1.y Field’s address can be determined as direct
offset from array address
Requires loading and de-referencing 2 pointers
to locate the field
Cache Friendliness When a mesh field is accessed, mesh data will
typically fill the entire cache line
Accessing mesh data will load into cache
whatever is in that part of the heap, mesh data mixed with possibly
unrelated data.
Garbage Collector copying reachable data Entire mesh be block copied. Track and rewrite
one reference pointer when data relocated
GC must trace into all 10000 triangle to copy
each vertex separately. Track and rewrite 40001 reference pointers when
data relocated.

Value Types and JVM Performance

I see us entering a world where the JVM is the fastest place to execute modern code.

So in that world, what’s the next bottleneck? Thats how performance works, right? Amdahl‘s law etc. With code path execution and garbage collection out of the way, the chatty, cache-unfriendly one-size-fits-all solution that is reference types will increasingly show up as a brake on achievable performance.

I’m not anti-reference types, BTW. They are the correct “default” model of an object. But there are simply times when they give you much more flexibility than you need, for a far greater price than you want to pay.

Bringing Value Types to the JVM

I hope Ive given you a sense of how cool value types would be in Java. High performance games, scientific computing and lower-level systems programming applications could become quite practical.

But unfortunately, value types are not something that can be implemented by spitting out different bytecode from a compiler. They require support at the bytecode and JVM level. Therefore, code using value-types can only run upon a newer VM with value type support. In that way, they are like the Java 5 changes: non-value type based classes can still execute fine on a VM with value type support, but not the reverse.

New JVMs and bytecode format changes dont happen very often. But one of the most significant revisions ever is going on right now; the Da Vinci Machine, associated with forthcoming JDK 7. From its blurb:

“We are extending the JVM with first-class architectural support for languages other than Java, especially dynamic languages. This project will prototype a number of extensions to the JVM, so that it can run non-Java languages efficiently, with a performance level comparable to that of Java itself.”

Although its stated mission is more oriented towards improving dynamic langauge support, there is a broader list of possible changes that includes a (differently nuanced) form of value type support.

Honk if you like Value Types too…

If you think value type support is desirable and important for the JVM, please help to raise awareness of the issue:

  • Leave a comment here – its easy! (Registration not needed to post comments)
  • Advocate it on Da Vinci Machine forums like:
  1. JVM Languages Google group
  2. Da Vinci Machine Mailing list
  • Advocate in the wider Java community, eg your work, your blog, TheServerSide.com, JavaLobby.org, http://www.infoq.com/java/, etc

Honk if you dont like Value Types

Please also feel free to comment if you feel there are inaccuracies, mistakes etc in my post, or other reasons why value types cant fly on the JVM (beyond “political difficulty”)