Retooling for Scala

As an undergraduate at Melbourne Uni in the mid 90s, we mainly used C with vi on slow, ugly unix terminals. I thought it sucked.  When Java appeared on my radar in 96 I was in 3rd year, and I immediately jumped ship to get onto a higher-level, OO language. 12 years later, and I’d call myself a “native speaker” - Im familiar with almost every corner (and almost every shortcoming) of the language.

Anyway, Java’s weaknesses are well documented elsewhere, so I wont bore you. Ive been searching for the exit for some years, and have finally found my ticket into the next generation, something to get excited and inspired by, something worth comprehensively rewiring my brain for, something to love. Its Scala.

If at first you dont succeed…

I only got into Scala on my second attempt, and even then, the realisation of how good a programing language it is came only slowly. Now it rushes like a torrent in my head, opening new vistas and resetting expectations about how easy programming should be.

The first time, a year ago, its weird swiss-academic accent put me off; eg the name: type syntax, underscore wildcards._, Bottom Types, and the inability of the Eclipse plugin to auto-import like Java does. “Not for me”, I thought, “too strange and theoretical!”

So I followed Groovy, JRuby, C# and Boo instead. But none of them were right for me. C# and Boo dont run on the JVM. And the weak type systems in Groovy and JRuby is something I cannot comfortably swallow, because deep down I’m convinced types represent a fundamental programming truth, and thus should be modelled by the language.

…try, and try again

So I came back and made a second attempt at Scala. I bought Programming in Scala (which is a great beginner book) and I listened to an inspiring talk by Scala creator Martin Odersky. And gradually, I was able to see beyond the strange accent and appreciate that Scala delivers a cutting-edge, ultra-high-level, performant language thats widely usable in the real world, because its so thouroughly compatible with Java and the JVM.

If you’re considering Scala, realize its a very deep, very intelligently designed language. Like me, you might not grok it without considerable effort… but once you do, the rewards are worth it.

Convert Java into C#

Rodrigo Di Olivera is a powerfully creative man, of relatively few words, who perhaps speaks loudest through the software he creates (including Boo, Prevayler).

Now he’s released a nice piece of programming alchemy: a tool that converts Java into C#.

One day I hope to port my Arcadia RPG engine to Windows Mobile (.NET Compact Framework). I expect this tool has just made that task almost trivial, given that Arcadia makes very few demands of its host environment’s APIs.

Software Engineering Radio

Hearing a speech by an expert makes a quite different impression to reading their written words. Its not surprising the two media aren’t interchangeable, given that differing senses, neural circuitry, and mental frameworks are involved in experiencing each.

My feeling is that speech better outlines general concepts and big ideas, while the variety of intonation in voice can convey subtle values judgments and preferences that text renders flat. Written text, meanwhile, rules supreme when fine detail needs to be communicated, or when the consumer of the information needs to regulate their consumption rate (for example, when reading mathematical formulae).

Software Engineering Radio is a series of about 100 podcasts from leading experts on a diverse range of topics. Its a fairly effortless and enjoyable way to expose your mind to some of the finest memes (and people) in the field today, and a refreshingly distinct alternative to reading blogs and books.

Static vs Dynamic Typing: My Preference

There has been a vigourous debate in the couple of years running through the whole software development community on the merits of Dynamic vs Static typing. Time to state my preferences in this enormous, extended group decision process

When I was at uni in the mid 90’s, the static typing forces seemed dominant, but the tables have turned. Now dynamic typing is cool. Sometimes it seems like all the smart people are doing it, while the mediocre corporate.programmers of the world plod along in statically typed Java, C# or C++.

“Strong Typing, Weak Winds”

So said my former dev manager, and Ruby/Smalltalk fan,  Steve Hayes. His proverb questions the traditionally perceived motivation for a type system: to infer correctness of code prior to runtime. Nowadays runtime (ie unit) testing may do a better job of ensuring correctness, without requiring types.

And languages like Ruby, Groovy Python and Javascript are wowing everyone with their speed and terseness…

…but I still favor Static Typing

I still favor statically typed languages; my inner theoretician prevails over my inner practician.

The safety/correctness aspect of types is approx 40%  type’s their value, IMO. The other 60% is about putting more of the intent of the code into the language, and thus requiring less knowledge be stored in the programmers head.

def writeToStream(stream)
  return stream.write(@content)
end

bytesWritten = envelope.writeToStream(filestream)

In this Ruby example, the programmer clearly intended “stream” to be a particular type of object. That is, stream does have a type, its just implicit in their head. Well then, why not use a language and toolset smart enough to understand type of ’stream’? Why burden the programmers brain with sole responsibility for implementing the type system?

The challenge: Making Static Typing easier

I think the surging popularity of dynamic typing is symptomatic of the pain of the syntax-heavy static type systems in C++/Java/C#. (I suspect there’s also some misplaced “I like Ruby / Rails, therefore I like dynamic typing” association going on) But I would rather fix the pain and keep the types.

Type Inference is the answer. To those who favor dynamic types: have you looked at statically inferred-type language like Boo? In Boo, one rarely has to specify the type of an expression, and yet can enjoy > 10x faster execution, enable IDE navigation/code-completion/refactoring support, and compile time checking:

def writeToStream(stream as FileStream):
  return stream.write(_content)

bytesWritten = envelope.writeToStream(filestream)

Boo doesnt infer parameters to methods, but other than that, this statically typed code looks very like the Ruby version doesn’t it?

Virtual Machines will become fastest code execution environment

I’m going to paraphrase a point made recently by Charles Nutter (can’t recall exactly where), because it bears repeating and was a rather profound realization for me.

Nowadays, optimization of code is not bounded by available CPU power for performing optimization, but by

  • information available about the context in which the code will be run
  • the ability to change or withdraw an optimization in response to changed conditions

JITing VMs have better information about code’s execution context than static compiler. They can “change their mind”, and recompile a method with different optimizations applied in response to contextual changes. This permits more aggressive optimizations; inlining of virtual methods being a classic example. Therefore, their trend will be not only to equal, but surpass pre-compiled executables, for modern OO/Functional/Dynamic apps. Virtual machines like the JVM, .NET and cousins, are destined to become the fastest place to run the software of the future.

Of course, for simple, fixed tasks, old C code will still run fastest of all. In comprehending this principle, its important to compare apples with apples. Demands for flexibility and productivity in software have relentlessly pushed it towards more dynamic behaviors. Popular modern apps like Eclipse or Rails rely on dynamic behaviors that are impossible or impractical in statically linked C or C++, and this will only grow.

The JVM needs Value Types

My biggest frustration with Java and the JVM gets little attention outside very geek circles: the lack of support for Value Types in the JVM (known as structs in .NET). Here I’m going to describe them and argue the case for supporting them.

Value Types give the ability to lay out an application’s data in memory in a much more efficient way than the JVM currently allows. They can significantly improve the performance of JVM applications that use small, numerous objects. Since memory layout is considered a low-level issue, the lack of, or even the concept of, Value Types, doesn’t receive much publicity & discussion. Undeservedly! I hope that I can convince you that in fact, a crucial foundation piece needed for a performant, all-purpose VM is currently missing from Java.

What are Value Types?

I use the term “Value Types” to mean lightweight objects that behave, and are laid out in memory, like Java’s primitives:

  • A value type resides “at” the field, local variable, or array cell where it is defined. Thus, it is created as part of a containing object, stack frame or array, respectively.
  • Like a primitive, it is not allocated as an object on the heap and thus it is not garbage collected. To be passed as a reference type (ie normal object), it must be boxed. Only the boxed form can be null.
  • Unlike a primitive, a value type can have (a) multiple named fields which can be either value types or references to normal objects, and (b) static and instance methods like any other object.
  • When a value type is passed or assigned, a copy of its values is made.
  • When value types are allocated in an array, the array cells contain the type’s actual data fields. When reference types are allocated in an array, the cells contain pointers to the object record, which reside somewhere else on the heap. This quality is one of value type’s most compelling advantages.
  • Value types dont support inheritance well.

Examples of Value Types

The Common Language Runtime in the .NET platform supports Value Types (called Structs), and so its APIs provide a number of examples. A compelling example is its Decimal , which covers similar ground to Java’s cumbersome, inefficient BigDecimal. It supports 28 significant digits of precision without floating point error, and does all this in 8 bytes of storage. From a programmers perspective, it behaves just like one of Java’s primitive types, providing a blend of efficiency and convenience. Other examples are string, DateTime and Point (you can see the tendency for value types to be small, lightweight things.)

Value Types can be useful outside core libraries too. In 3D graphics, Triangles are ubiquitous and make an excellent value type, as do Points, Rectangles, Ranges and Quaternions. In scientific computing, Complex Numbers consist of a pair of floating point components, the real and imaginary parts, which are perfect candidates to be value types.

Why & Where are they more Efficient

Recall that a value type object is nested within its containing type in memory, whereas every regular reference type object has its own distinct record in the heap, and containing objects have a pointer to that record. This is the cause of all the efficiency benefits; when compared to a reference type, they

  • requires less space in memory to represent the same data fields (once).
  • do not require a separate allocation operation - memory is allocated as part of their parent’s allocation
  • require less CPU effort to access their field in memory, because they are a fixed offset from their parent objects memory address. Reference types require de-referencing a pointer to access a non-primitive object field.
  • make better, less-polluting use of CPU cache memory, because reside in the same block of memory as their containing object, and because they are more compact. Reference objects are allocated onto a mixed heap shared with other threads and object types, so loading a block of heap memory into a cache line can “pollute” it with irrelevant data.
  • The VM’s Garbage Collection system can treat value types as having the lifespan of their containing reference type, so garbage collection of value types is very low cost.
  • Modern VMs relocate objects in memory when they garbage collect. This requires them to track and rewrite all pointers to relocated objects as they are moved. Because value types do not have pointers to them, there is nothing to track or relocate.

Note that reference types can be shared by many parents, but value types have only parent, and will result in duplicated copies where shared amongst many objects. They are not appropriate for widely shared, or shared mutable, data.

Arrays of Value Types: Benefits Multiplied!

Value types power is unleashed when they are stacked in arrays for bulk operations- neat rows of data-bricks lined up in memory. The performance advantage of a large value-type array, relative to a large reference-type array, is massive;

  • An entire cache line can be filled with the value type data, as soon as the first instance is accessed, and iterating over the sequence can proceed without going back to main memory. A reference type fills the cache up with pointer addresses, to heap records that may be scattered or mixed with other object types. (Granted, a smart GC can often arrange related heap records together after the first generation copy, thus alleviating this problem)
  • Because value types have a regular fixed size, the memory address of the nth array element can be computed from n and the vale type’s size. This is quicker than a pointer dereference, which requires a memory read.
  • Huge numbers of value types can be created in a single bulk memory allocation. Although reference type allocation is apparently very fast in modern VMs, it still must be done for every reference type array member and presumably requires thread/CPU synchronization/barriers.
  • Huge numbers of value types can be cleaned up without any need for the GC to consider their lifespan or track their reachability. When the containing array becomes eligible for collection, they are all disposed of at once.

Because arrays of value types consist of a sequence of data values and nothing else, they can also be directly mapped onto data from IO operations. For example, imagine a bulk quantity of point data is written into a block of memory in a JVM, from either the network or filesystem. With value types this data could be “interpreted” as type Point[] in place, without any subsequent copying or allocation. (See ByteBuffer for how this currently works for Java primitives.)

An extension of such a buffer sharing approach might allow Java apps to work efficiently with hardware accelerators for 3D graphics and physics. Typically, the APIs to such devices work on the same copy of the data as the client application, as the data volumes involved are too large to duplicate.

An Example: Triangle Mesh Data

Now let me give a practical example: imagine we want to represent a
10,000-triangle mesh.

struct Vertex {

double x;

double y;

double z;

//constructor, methods etc..

}

struct Triangle {

Vertex v1; Vertex v2; Vertex v3;

//constructor, methods etc..

}

class Mesh {

Triangle[] meshTriangles = new Triangle[10000];

}

So all up, as well as 10,000 triangles, there are 30,000 vertices. Lets
compare how value types and reference types handle this challenge:

Value Types Reference Types
Allocation Meshes entire mesh allocated in one block in one
operation
Mesh’s memory allocated in 40001 separate blocks
in 40001 separate new calls
Access a field eg meshTriangles[8622].v1.y Field’s address can be determined as direct
offset from array address
Requires loading and de-referencing 2 pointers
to locate the field
Cache Friendliness When a mesh field is accessed, mesh data will
typically fill the entire cache line
Accessing mesh data will load into cache
whatever is in that part of the heap, mesh data mixed with possibly
unrelated data.
Garbage Collector copying reachable data Entire mesh be block copied. Track and rewrite
one reference pointer when data relocated
GC must trace into all 10000 triangle to copy
each vertex separately. Track and rewrite 40001 reference pointers when
data relocated.

Value Types and JVM Performance

I see us entering a world where the JVM is the fastest place to execute modern code.

So in that world, what’s the next bottleneck? Thats how performance works, right? Amdahl’s law etc. With code path execution and garbage collection out of the way, the chatty, cache-unfriendly one-size-fits-all solution that is reference types will increasingly show up as a brake on achievable performance.

I’m not anti-reference types, BTW. They are the correct “default” model of an object. But there are simply times when they give you much more flexibility than you need, for a far greater price than you want to pay.

Bringing Value Types to the JVM

I hope Ive given you a sense of how cool value types would be in Java. High performance games, scientific computing and lower-level systems programming applications could become quite practical.

But unfortunately, value types are not something that can be implemented by spitting out different bytecode from a compiler. They require support at the bytecode and JVM level. Therefore, code using value-types can only run upon a newer VM with value type support. In that way, they are like the Java 5 changes: non-value type based classes can still execute fine on a VM with value type support, but not the reverse.

New JVMs and bytecode format changes dont happen very often. But one of the most significant revisions ever is going on right now; the Da Vinci Machine, associated with forthcoming JDK 7. From its blurb:

“We are extending the JVM with first-class architectural support for languages other than Java, especially dynamic languages. This project will prototype a number of extensions to the JVM, so that it can run non-Java languages efficiently, with a performance level comparable to that of Java itself.”

Although its stated mission is more oriented towards improving dynamic langauge support, there is a broader list of possible changes that includes a (differently nuanced) form of value type support.

Honk if you like Value Types too…

If you think value type support is desirable and important for the JVM, please help to raise awareness of the issue:

  • Leave a comment here - its easy! (Registration not needed to post comments)
  • Advocate it on Da Vinci Machine forums like:
  1. JVM Languages Google group
  2. Da Vinci Machine Mailing list
  • Advocate in the wider Java community, eg your work, your blog, TheServerSide.com, JavaLobby.org, http://www.infoq.com/java/, etc

Honk if you dont like Value Types

Please also feel free to comment if you feel there are inaccuracies, mistakes etc in my post, or other reasons why value types cant fly on the JVM (beyond “political difficulty”)

How to easily back-out any Subversion revision

My colleague David Kemp taught me a very useful technique yesterday: how to easily back-out (ie erase) the effect of one or more Subversion commits, even when those commits happened in the past and are not at the “tip” of the repository. Barring merge problems, its as if those commits never happened.

merge -r243:242 file:///c:/projects/arcadia/Development/Repository/Arcadia

Note how in this command, we merge from a higher number revision backward to a lower number, on the same branch. This creates working copy modifications with the changes contained in Revision 243 undone.

Here’s a screenshot from within Eclipse:

Full Speed Debugging

I have to spread the word on the goodness that is the Java Hotspot runtime’s Full Speed Debugging.

Essentially, running a JVM in debug mode incurrs no performance cost unless you are actually doing debugging (eg setting breakpoints, stepping through code, etc). So you can routinely start your JVMs, even in production, with remote debug turned on (& appropriate security of course). Whenever you feel like having a look, attach a debugger to the JVM and do some debugging, then disconnect and let it run on at full speed.

I have been using this technique alot with Heroes of Arcadia test-play and are loving it; if I observe weirdness in the game, I immediately stop, hook in with a debugger and examine the game state. No messing around trying to replicate the problem later.

Heroes of Arcadia progress update

Playscape Games mobile Strategy-RPG Heroes of Arcadia (renamed from ‘Arcadia’) has been 3 years since inception to date. I suspect some friends and observers of the project, quite understandably, wonder (a) what we’re doing all day, and (b) if we’ll ever finish.

Well, I’m hard at work. Making a quality RPG, on any platform, involves a large investment of effort is a wide variety of areas.

To convey a sense of the many little details that go into it, here’s my notes for a source code commit spanning 9/3/08 - 31/3/08, about 6 man days of development time on my part-time schedule. Apart from demonstrating a lack of discipline when it comes to regular source control checkins ;), I’m quite happy with the steady progress we’re making.

As for a “release date” - sorry, I cannot estimate that accurately enough to speculate on. However, we’re aiming for a second beta release mid year.

Numerous enhancements and fixes (too long between commits):

Hero class removed, replaced by ExtendedUnit
HeroType removed, heroes become regular UnitTypes

Added/updated sprites:
-Animated structures
-Increased size potions, fixed tombstone
-Shadows removed

TargetSelectionRequestor
-Merge with its interface, removal of dead code
-becomes a Runnable task
-Less filtering of invalid target applied by UI
-MapViewInputHandler greatly simplified by moving target selection logic to TargetSelectionRequestor

Remove World methods delegating to Zone

MapView
- Health bar goes yellow/red when wounded
- Fixed bug where sprites faced wrong way in combat
- Made ranged/melee attack code pathways more uniform
- Removed turn counter from the HUD
- Removed “Cannot enter” warning when attempt move to impassable location

IOManager
-Fix: Zone added to read table when deserialized
- Better exception handling when content script init fails

Textwindow
-Globally enforce rule requiring use of task queue to display text boxes

Scripting:
Completed northern wolves quest
Completed forest shrines quest

Removed remaining resource rules code

Party leader passes baton on death, new leader selection dialog

Big speed up in Faction invoke code
-NPC factions serially called from Zone
-Zone runs from Task queue, World dosnt anymore
-NPC Factions not intersecting player faction are only run periodically
-Faction class no longer runnable, PlayerFaction remains runnable, code diverging alot.

Relationships
-Added 5th alliance state Neutral, renamed old neutral to “wary”

Pathing & Movement
-Replaced Breadth-First Search with AStar pathfinder
-AI now prefers not to paths thru other npc factions

boost AI aggression to pevent units fleeing too much

Introduced consistent unit targeting rules and added AI
-Melee: defender nominates defender
-Ranged: attacker nominates defender

New AddTrigger command for scripting

Remove obselete Abort- and Contiue- TargetSelection commands

New LocationFunction types: Location -> int, used in pathfinding

Fixed bug in unit’s cached event history causing party escorts not to
attack

Fixed divide by zero bug when finding minimum threat locations

Quests now award XP on completion

Situationist Game AI

I have just become aware of Game-Developer/Philosopher Adam Russell through his thought provoking article on Situationist Game AI in Game AI Wisdom 4. I look forward to reading more of his work in the near future.

Reductionism vs Constructionism

The article lucidly describes one of the key design tensions in designing any game with story:

  • Reductionism: You want your world to be filled with interesting, autonomous agents that the player can interact with
  • Constructionism: You want to choreograph these agents in a very controlled, predictable way when required to tell the story

These tendencies are conflicting; for example, through interacting freely with an agent you might move, re-align or even kill it, which compromises the ability to use the agent to advance a pre-conceived story. He gives a nice exposition of how this tension has been handled, or more correctly, “worked around”, in various games to date. Its the best explanation Ive yet read of a tension that has plagued me during the developement of Heroes of Arcadia.

The beginnings of a solution

Adam’s response is to advocate a hybrid approach dubbed Situationist game AI. Unfortunately, this solution is much less precisely articulated than the problem. But he starts down the road with vigour. He draws parallels with debates about local vs global influence in sociology and psycology, and finds middle ways in both fields where an autonomous agent’s environment strongly conditions and shapes its response.

He then sketches how these ideas might be applied in game animation, perception of the environment, and in layering of multiple action-states in an agent’s behavioral representation. Little that’s directly usable at this stage, but I do hope it spurs other more practically-minded developers to make advances in this area.

« Previous entries