The JVM needs Value Types

My biggest frustration with Java and the JVM gets little attention outside the geekiest circles: the JVM’s lack of support for Value Types (known as structs in .NET). Here I’m going to describe them and argue the case for supporting them.

Value Types make it possible to lay out an application’s data in memory much more efficiently than the JVM currently allows. They can significantly improve the performance of JVM applications that use many small objects. Because memory layout is considered a low-level issue, the concept of Value Types, let alone their absence, doesn’t receive much publicity or discussion. Undeservedly! I hope to convince you that a crucial foundation piece needed for a performant, all-purpose VM is currently missing from Java.

What are Value Types?

I use the term “Value Types” to mean lightweight objects that behave, and are laid out in memory, like Java’s primitives:

  • A value type resides “at” the field, local variable, or array cell where it is defined. Thus, it is created as part of a containing object, stack frame or array, respectively.
  • Like a primitive, it is not allocated as an object on the heap and thus it is not garbage collected. To be passed as a reference type (i.e. a normal object), it must be boxed. Only the boxed form can be null.
  • Unlike a primitive, a value type can have (a) multiple named fields which can be either value types or references to normal objects, and (b) static and instance methods like any other object.
  • When a value type is passed or assigned, a copy of its values is made.
  • When value types are allocated in an array, the array cells contain the type’s actual data fields. When reference types are allocated in an array, the cells contain pointers to the object records, which reside somewhere else on the heap. This quality is one of value types’ most compelling advantages.
  • Value types don’t support inheritance well.
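The copy-on-assignment point above is the one that most changes program behaviour, so here is a minimal Java sketch of the difference. Java has no value types, so the copy has to be emulated by hand; `MutablePoint` and its `copy()` method are hypothetical names invented for this illustration, not an existing API.

```java
// Sketch only: contrasts today's reference semantics with an emulation of
// value-type copy semantics. All names here are illustrative assumptions.
class CopySemanticsDemo {

    static final class MutablePoint {
        double x, y;

        MutablePoint(double x, double y) {
            this.x = x;
            this.y = y;
        }

        // Emulates what assigning a value type would do implicitly: copy the fields.
        MutablePoint copy() {
            return new MutablePoint(x, y);
        }
    }

    // Reference semantics: both variables name the same heap object.
    static double aliasThenMutate() {
        MutablePoint a = new MutablePoint(1.0, 2.0);
        MutablePoint b = a;   // copies the pointer, not the data
        b.x = 99.0;
        return a.x;           // the change is visible through a
    }

    // Value-like semantics: the assignment copies the data.
    static double copyThenMutate() {
        MutablePoint a = new MutablePoint(1.0, 2.0);
        MutablePoint b = a.copy();
        b.x = 99.0;
        return a.x;           // a is unaffected
    }

    public static void main(String[] args) {
        System.out.println("aliased: a.x = " + aliasThenMutate()); // 99.0
        System.out.println("copied:  a.x = " + copyThenMutate());  // 1.0
    }
}
```

With real value types the second behaviour would come for free, and without the extra heap allocation that `copy()` performs here.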

Examples of Value Types

The Common Language Runtime in the .NET platform supports Value Types (called structs), and so its APIs provide a number of examples. A compelling example is its Decimal, which covers similar ground to Java’s cumbersome, inefficient BigDecimal. It supports 28 significant digits of precision without binary floating point error, and does all this in 16 bytes of storage. From a programmer’s perspective, it behaves just like one of Java’s primitive types, providing a blend of efficiency and convenience. Other examples are DateTime and Point (you can see the tendency for value types to be small, lightweight things).
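For readers who have not used it, this is roughly what the Java side of that comparison looks like. It is a small sketch using only `java.math.BigDecimal`; the class name `DecimalDemo` is made up for the example. Every intermediate result is a separate heap object, and arithmetic goes through method calls rather than operators.

```java
import java.math.BigDecimal;

// Sketch only: shows why exact decimal arithmetic is wanted, and how it is
// spelled with BigDecimal in Java today.
class DecimalDemo {

    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3; // false
    }

    // BigDecimal gets the right answer, at the cost of three object
    // allocations and method-call syntax.
    static boolean bigDecimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0; // true
    }

    public static void main(String[] args) {
        System.out.println("double exact?     " + doubleSumIsExact());
        System.out.println("BigDecimal exact? " + bigDecimalSumIsExact());
    }
}
```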

Value Types can be useful outside core libraries too. In 3D graphics, Triangles are ubiquitous and make an excellent value type, as do Points, Rectangles, Ranges and Quaternions. In scientific computing, Complex Numbers consist of a pair of floating point components, the real and imaginary parts, which are perfect candidates to be value types.

Why & Where are they more Efficient?

Recall that a value type object is nested within its containing type in memory, whereas every regular reference type object has its own distinct record in the heap, and containing objects hold a pointer to that record. This is the cause of all the efficiency benefits. Compared to reference types, value types:

  • require less space in memory to represent the same data fields (when the data is stored once).
  • do not require a separate allocation operation: their memory is allocated as part of their parent’s allocation.
  • require less CPU effort to access their fields in memory, because each field sits at a fixed offset from the parent object’s memory address. Reference types require de-referencing a pointer to access a non-primitive object field.
  • make better, less-polluting use of CPU cache memory, because they reside in the same block of memory as their containing object, and because they are more compact. Reference objects are allocated onto a mixed heap shared with other threads and object types, so loading a block of heap memory into a cache line can “pollute” it with irrelevant data.
  • are very cheap to garbage collect, because the VM’s garbage collector can treat them as having the lifespan of their containing reference type.
  • need no pointer tracking when the garbage collector moves objects. Modern VMs relocate objects in memory when they garbage collect, which requires them to track and rewrite all pointers to relocated objects. Because value types do not have pointers to them, there is nothing extra to track or rewrite.

Note that reference types can be shared by many parents, but a value type has only one parent, and sharing it amongst many objects results in duplicated copies. Value types are not appropriate for widely shared, or shared mutable, data.
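To put a rough number on the space saving, here is a back-of-envelope estimate for a line segment made of two 2D points. The constants are assumptions, not measurements: a 12-byte object header, 4-byte references and 8-byte object alignment, which is a common configuration for a 64-bit JVM but varies between VMs and settings. The class and method names are invented for the example.

```java
// Sketch only: a simplified size model under the stated assumptions.
class FootprintEstimate {
    static final int HEADER = 12; // assumed object header size in bytes
    static final int REF = 4;     // assumed reference size in bytes
    static final int DOUBLE = 8;
    static final int ALIGN = 8;   // assumed object alignment in bytes

    // Round a size up to the next multiple of ALIGN.
    static int align(int bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    // Line holds two pointers; each Point is its own heap record.
    static int referenceLayoutBytes() {
        int point = align(HEADER + 2 * DOUBLE); // 32
        int line = align(HEADER + 2 * REF);     // 24
        return line + 2 * point;                // 88
    }

    // Line holds the four doubles inline, as a value-type layout would.
    static int valueLayoutBytes() {
        return align(HEADER + 4 * DOUBLE);      // 48
    }

    public static void main(String[] args) {
        System.out.println("reference layout: " + referenceLayoutBytes() + " bytes");
        System.out.println("value layout:     " + valueLayoutBytes() + " bytes");
    }
}
```

Under these assumptions the nested layout needs a little over half the memory, and the 32 bytes of actual data account for most of it.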

Arrays of Value Types: Benefits Multiplied!

The power of value types is unleashed when they are stacked in arrays for bulk operations: neat rows of data-bricks lined up in memory. The performance advantage of a large value-type array, relative to a large reference-type array, is massive:

  • An entire cache line can be filled with the value type data as soon as the first instance is accessed, and iterating over the sequence can proceed without going back to main memory for each element. An array of reference types fills the cache with pointers to heap records that may be scattered or mixed with other object types. (Granted, a smart GC can often arrange related heap records together after the first generation copy, thus alleviating this problem.)
  • Because value types have a regular fixed size, the memory address of the nth array element can be computed from n and the value type’s size. This is quicker than a pointer dereference, which requires a memory read.
  • Huge numbers of value types can be created in a single bulk memory allocation. Although reference type allocation is apparently very fast in modern VMs, it still must be done for every reference type array member and presumably requires thread/CPU synchronization/barriers.
  • Huge numbers of value types can be cleaned up without any need for the GC to consider their lifespan or track their reachability. When the containing array becomes eligible for collection, they are all disposed of at once.
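The address computation described above can be imitated in today's Java by packing the fields into one primitive array, which is the usual workaround when the layout matters. The sketch below is one way to do it; `FlatVertexArray` and its methods are hypothetical names, and the technique gives up the type safety and convenience that real value types would keep.

```java
// Sketch only: emulates an array of 3-double value types with a single
// double[] and index arithmetic.
class FlatVertexArray {
    static final int STRIDE = 3;          // doubles per vertex: x, y, z
    static final int X = 0, Y = 1, Z = 2; // field offsets within a vertex

    private final double[] data;

    // One allocation covers every vertex.
    FlatVertexArray(int count) {
        data = new double[count * STRIDE];
    }

    void set(int n, double x, double y, double z) {
        int base = n * STRIDE; // position of the nth vertex
        data[base + X] = x;
        data[base + Y] = y;
        data[base + Z] = z;
    }

    // Field location is computed from n and the element size; no pointer is followed.
    double get(int n, int field) {
        return data[n * STRIDE + field];
    }

    // Sums the y field of the first 'count' vertices, walking memory in order.
    static double sumY(FlatVertexArray vertices, int count) {
        double total = 0.0;
        for (int n = 0; n < count; n++) {
            total += vertices.get(n, Y);
        }
        return total;
    }

    public static void main(String[] args) {
        FlatVertexArray vertices = new FlatVertexArray(4);
        for (int n = 0; n < 4; n++) {
            vertices.set(n, n, 2.0 * n, 0.0);
        }
        System.out.println("sum of y = " + sumY(vertices, 4)); // 0 + 2 + 4 + 6 = 12.0
    }
}
```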

Because arrays of value types consist of a sequence of data values and nothing else, they can also be directly mapped onto data from IO operations. For example, imagine a bulk quantity of point data is written into a block of memory in a JVM, from either the network or filesystem. With value types this data could be “interpreted” as type Point[] in place, without any subsequent copying or allocation. (See ByteBuffer for how this currently works for Java primitives.)
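As a sketch of how the primitive-only version of this works today, the example below fills a `ByteBuffer` the way an IO operation might, then reads it through a `DoubleBuffer` view without copying. The buffer calls are standard `java.nio`; the point layout (two consecutive doubles per point), the class name and the helper names are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Sketch only: reinterprets raw bytes as doubles in place. A Point[] view
// over the same bytes is what value types would add.
class PointBufferDemo {
    static final int DOUBLES_PER_POINT = 2; // assumed layout: x then y

    // Stands in for data arriving from the network or filesystem:
    // writes the points (i, 10 * i) for i = 0 .. count - 1.
    static ByteBuffer incomingBytes(int count) {
        ByteBuffer raw = ByteBuffer.allocate(count * DOUBLES_PER_POINT * Double.BYTES);
        raw.order(ByteOrder.BIG_ENDIAN);
        for (int i = 0; i < count; i++) {
            raw.putDouble(i);
            raw.putDouble(10.0 * i);
        }
        raw.flip(); // switch from writing to reading
        return raw;
    }

    // Reads the y coordinate of the nth point through a view of the same bytes.
    static double yOfPoint(ByteBuffer raw, int n) {
        DoubleBuffer view = raw.asDoubleBuffer(); // shares the bytes, no copy
        return view.get(n * DOUBLES_PER_POINT + 1);
    }

    public static void main(String[] args) {
        ByteBuffer raw = incomingBytes(5);
        System.out.println("y of point 3 = " + yOfPoint(raw, 3)); // 30.0
    }
}
```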

An extension of such a buffer sharing approach might allow Java apps to work efficiently with hardware accelerators for 3D graphics and physics. Typically, the APIs to such devices work on the same copy of the data as the client application, as the data volumes involved are too large to duplicate.

An Example: Triangle Mesh Data

Now let me give a practical example: imagine we want to represent a 10,000-triangle mesh.

// Hypothetical syntax: "struct" declares a value type
struct Vertex {
    double x;
    double y;
    double z;
    //constructor, methods etc..
}

struct Triangle {
    Vertex v1; Vertex v2; Vertex v3;
    //constructor, methods etc..
}

class Mesh {
    Triangle[] meshTriangles = new Triangle[10000];
}

So all up, as well as 10,000 triangles, there are 30,000 vertices. Let’s compare how value types and reference types handle this challenge:

Allocation
  • Value Types: the mesh’s entire memory is allocated in one block, in one operation.
  • Reference Types: the mesh’s memory is allocated in 40,001 separate blocks, in 40,001 separate new calls.

Access a field, e.g. meshTriangles[8622].v1.y
  • Value Types: the field’s address can be determined as a direct offset from the array address.
  • Reference Types: requires loading and de-referencing 2 pointers to locate the field.

Cache Friendliness
  • Value Types: when a mesh field is accessed, mesh data will typically fill the entire cache line.
  • Reference Types: accessing mesh data will load into cache whatever is in that part of the heap, mesh data mixed with possibly unrelated data.

Garbage Collector copying reachable data
  • Value Types: the entire mesh can be block copied. Track and rewrite one reference pointer when data is relocated.
  • Reference Types: the GC must trace into all 10,000 triangles to copy each vertex separately. Track and rewrite 40,001 reference pointers when data is relocated.
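The 40,001 figure can be checked by building the mesh with ordinary Java classes and counting the allocations. This is a sketch of the reference-type column only; the counter and the class `ReferenceMeshDemo` are instrumentation invented for the example, and the vertex coordinates are arbitrary.

```java
// Sketch only: the reference-type version of the mesh, with a counter
// incremented once per heap allocation.
class ReferenceMeshDemo {
    static int allocations = 0;

    static final class Vertex {
        final double x, y, z;

        Vertex(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
            allocations++;
        }
    }

    static final class Triangle {
        final Vertex v1, v2, v3; // three pointers to separate heap records

        Triangle(Vertex v1, Vertex v2, Vertex v3) {
            this.v1 = v1;
            this.v2 = v2;
            this.v3 = v3;
            allocations++;
        }
    }

    static Triangle[] buildMesh(int triangleCount) {
        allocations = 0;
        Triangle[] mesh = new Triangle[triangleCount]; // holds pointers only
        allocations++;                                 // the array itself
        for (int i = 0; i < triangleCount; i++) {
            mesh[i] = new Triangle(
                    new Vertex(i, 0, 0),
                    new Vertex(i, 1, 0),
                    new Vertex(i, 0, 1));
        }
        return mesh;
    }

    public static void main(String[] args) {
        Triangle[] mesh = buildMesh(10_000);
        System.out.println("allocations: " + allocations); // 1 + 10,000 + 30,000 = 40001

        // Reading one field follows two pointers after indexing the array:
        // mesh[8622] -> Triangle record, then .v1 -> Vertex record, then y.
        double y = mesh[8622].v1.y;
        System.out.println("y = " + y); // 0.0
    }
}
```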

Value Types and JVM Performance

I see us entering a world where the JVM is the fastest place to execute modern code.

So in that world, what’s the next bottleneck? That’s how performance works, right? Amdahl’s law and so on. With code path execution and garbage collection out of the way, the chatty, cache-unfriendly one-size-fits-all solution that is reference types will increasingly show up as a brake on achievable performance.

I’m not anti-reference types, BTW. They are the correct “default” model of an object. But there are simply times when they give you much more flexibility than you need, for a far greater price than you want to pay.

Bringing Value Types to the JVM

I hope I’ve given you a sense of how cool value types would be in Java. High performance games, scientific computing and lower-level systems programming applications could become quite practical.

But unfortunately, value types are not something that can be implemented by spitting out different bytecode from a compiler. They require support at the bytecode and JVM level. Therefore, code using value types can only run on a newer VM with value type support. In that way, they are like the Java 5 changes: classes that don’t use value types can still execute fine on a VM with value type support, but not the reverse.

New JVMs and bytecode format changes don’t happen very often. But one of the most significant revisions ever is going on right now: the Da Vinci Machine, associated with the forthcoming JDK 7. From its blurb:

“We are extending the JVM with first-class architectural support for languages other than Java, especially dynamic languages. This project will prototype a number of extensions to the JVM, so that it can run non-Java languages efficiently, with a performance level comparable to that of Java itself.”

Although its stated mission is more oriented towards improving dynamic language support, there is a broader list of possible changes that includes a (differently nuanced) form of value type support.

Honk if you like Value Types too…

If you think value type support is desirable and important for the JVM, please help to raise awareness of the issue:

  • Leave a comment here – it’s easy! (Registration not needed to post comments)
  • Advocate it on Da Vinci Machine forums like:
      1. JVM Languages Google group
      2. Da Vinci Machine Mailing list
  • Advocate in the wider Java community, e.g. your work, your blog, TheServerSide.com, JavaLobby.org, http://www.infoq.com/java/, etc.

Honk if you don’t like Value Types

Please also feel free to comment if you feel there are inaccuracies or mistakes in my post, or other reasons why value types can’t fly on the JVM (beyond “political difficulty”).

15 Comments

  1. Alex said,

    September 29, 2008 at 4:11 pm

    Thanks man, now I really understand what value types are. I personally like the idea of bringing them to the JVM.

  2. Denis said,

    September 30, 2008 at 8:11 pm

    Used to code for Symbian. Now I consider distinguishing types in this particular way to be bad practice. Frankly, in the old days they had one whole part of a book just for string types (it was a real pain). Nowadays they are trying to get rid of it in new Symbian OS versions. Don’t bring this into Java. There are so many other languages for it. Just don’t mix it into Java, please.

  3. Ismael Juma said,

    October 2, 2008 at 2:13 pm

    Hi,

    Although I think value types would be useful to have in the JVM, it’s worth mentioning that there are HotSpot modifications that propose to get many of their benefits automatically. See the “Automatic Object Inlining” section of:

    wikis.sun.com/display/HotSpotInternals/Publications+JKU

    As usual, there are trade-offs when it comes to choosing an automatic versus manual approach.

  4. Art B said,

    October 3, 2008 at 7:44 am

    Wouldn’t it make more sense to use annotations to denote classes the programmer feels are value-ish and let HotSpot optimise them that way? That way, if at a later date you want to extend them, you don’t have to worry about making major changes; HotSpot will just stop applying the optimization.

  5. benhutchison said,

    October 3, 2008 at 9:40 am

    Denis: I’m proposing to bring it into the JVM platform, /not/ Java the language.

    There is software it is simply not possible to build competitively on the JVM unless value types are used. To me, saying “use another language” is to admit defeat.

  6. benhutchison said,

    October 3, 2008 at 9:41 am

    Ismael,

    Thanks for the link. “Automatic Object Inlining” seems like clever stuff.

    I’ve been told that GHC Haskell automatically decides it can use unboxed records (i.e. value types) internally in some cases.

  7. benhutchison said,

    October 3, 2008 at 9:46 am

    Art B,

    It could be done with annotations; however, changing between value and reference types changes the semantics (i.e. meaning/function) of a program. In general, you could not switch back and forth without breaking the software. In some limited cases, it seems you can, as is described in the link Ismael posted.

  8. Niclas Wiberg said,

    August 5, 2009 at 9:23 pm

    I would like to have support for value types in the JVM. But I think it may be necessary to support it on the source code level as well.

    At first sight, it may appear that immutable classes (see http://creativekarma.com/ee.php/weblog/comments/value_types_in_java/) could be treated as value types by the JVM. Indeed, the “Automatic Object Inlining” mentioned above seems to do exactly this. I doubt that it could be a general solution however, because of the problem of unique but equal instances.

    For instance, Integer a = new Integer(3); Integer b = new Integer(3); creates two equal but unique objects.

    If immutable classes like Integer would be implemented internally as value types, what would be the desired object identity semantics? For instance, when checking the identity of two variables using the == operator, the only sensible thing seems to be to use equals(). But that would break with current practice, for instance a == b is false in the above example.

    Another problem is locking: owning the lock of object a above is not the same as owning the lock of object b, although the objects are equal.

    I would guess that a JVM implementation of “Automatic Object Inlining” has to be very restrictive, making sure to avoid these problems. That probably means that such “value types” can not be stored in arrays. Which would be a pity because, as indicated in the article here, that is one of the really interesting scenarios for value types.

    Perhaps it is a mistake to have public constructors on immutable classes, since they open up the possibility of creating unique equal objects. An immutable class should perhaps instead only provide object creation using static methods, like Integer.valueOf(int). That way, software should never rely on object identities.

    What about this: New immutable classes should be written without public constructors, providing object construction using static methods. Such classes could get special treatment from the JVM, implementing them as value types when deemed suitable. Object identity would then be defined according to either the equals() method or just identity in all fields. Perhaps such classes should be annotated in a special way, or extend a common class java.lang.Immutable.

  9. August 5, 2009 at 9:38 pm

    Niclas,

    Thanks for your comments.

    Re: Changed semantics for value types (e.g. equals, locking, etc.)… absolutely. One can’t switch Integer between a Value type and a Ref type without breaking things like equals().

    Java’s Integer shouldn’t be retrofitted anyway, since we already have ‘int’. It’s more complex types that are more interesting/useful, like tuples and arrays.

    .NET is IMO the best guide for how to do it decently.

    I wouldn’t expect to support locking over value types. Locking over arbitrary objects is widely seen as a near useless feature anyway, a mistake in Java’s design. Use dedicated Monitors instead.

    Re:

  10. Niclas Wiberg said,

    August 6, 2009 at 12:13 am

    The really interesting question is: How to make the JVM optimize away a reference object into a value object?

    This is in my view the most interesting question, since it appears difficult at this point to introduce a really new concept into the Java language. As far as I can tell, the main reason why the JVM can’t make value types of immutable classes is the object identity problem.

    So, my idea was to introduce a small change that lets the JVM know that object identity is not a problem for a particular class.

    For instance, there could be a new abstract class java.lang.Immutable. Its subclasses would be required to have only primitive (or Immutable) final fields and to follow some rules for the equals() and hashCode() methods (essentially being based on the field values). Object identity (the == operator) would be defined as the same as equals(), and null-valued variables of Immutable type would not be allowed. This would allow a smart JVM to make inline value objects of variables declared as such classes.

    There are some problems. For instance, an Immutable object might be assigned to an Object variable and used like a reference object. That would probably require that a “real” object is created for that value, so that it can have a reference. The question is what happens with object identity in that case. It could be solved by keeping a register of referenced immutable objects, sharing them when necessary so that there are never two unique but equal instances. A bit awkward, but it would preserve language semantics while only incurring a cost for the new Immutable classes, and only when they are used in the perhaps “least interesting” way. An alternative would be to have a special handling of “==” for variables declared as Object, checking whether they are Immutable or not. But that would come at a cost. Yet another alternative would be to let object identity be undefined for subclasses of Immutable. But that would break with Java language semantics.

  11. Thaina said,

    May 6, 2011 at 1:52 am

    I want this too

    It is the most important thing for low-level performance of the Java language right now

    I added something like this to the Google App Engine language feature requests 2 days ago

    I’m a C# game programmer, so I know how really useful this is. It makes me curious why all the clever people at Sun and Oracle never thought about it, even though C# has had it since the first version

  12. Thaina said,

    May 6, 2011 at 1:59 am

    Sorry for digging this up but, as you can see, Java still doesn’t support this

  13. Piotr Kołaczkowski said,

    January 13, 2012 at 10:57 pm

    Please give value types to the JVM. Other JVM languages, e.g. Scala would get huge benefits from it. There is also *no* semantic difference between value types and reference types for immutable classes (which Scala and Java use a lot) – so it is a perfect case for automatic optimisation at the JVM level. Java would benefit from this even without changing the Java language.

  14. Niclas Wiberg said,

    January 14, 2012 at 1:25 am

    It would be nice to have objects of immutable Java classes be treated as values by the JVM, in the sense that they can be copied and stored by value instead of referenced. For instance this could be very useful with large arrays of small immutable objects.

    I think it would require changes to the Java language however.

    The problem is that such immutable objects must be assignable to variables of class Object, and then follow all the semantics of that class. This includes following the semantics of the == operator and some esoteric things like locks.

    (I posted some ideas above on how to handle this problem with the current Java language, but they are quite awkward.)

    It is probably necessary to introduce some Java language changes to do this.

  15. rose00 said,

    March 25, 2012 at 6:08 am

    Here’s some progress on working this gracefully into Java. Not surprisingly, the hardest part is the arrays.

    https://blogs.oracle.com/jrose/entry/value_types_in_the_vm

    Thanks for an excellent exposition!

