it should have a title that is “Man having serious doubts about whether saying everything twice is actually the best we software engineers can do”

Easy solutions for Caching in the JVM

Bounded LRU Cache

A common use case: a cache data structure that:

  • has a bounded maximum size and thus memory consumption (which in turn implies a cache eviction policy like LRU)
  • is safe and performant for concurrent use

AFAICT, there’s nothing in the JDK or the concurrency JSRs that meets both of these requirements. I googled and found the open source ConcurrentLinkedHashMap library, which does.

[An incorrect comment about the project missing tests has been removed]
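For a baseline, here is a minimal JDK-only sketch of the bounded-LRU idea (my own illustration, not the library’s interface): a LinkedHashMap in access order with removeEldestEntry gives the bound and the eviction policy, and wrapping it in Collections.synchronizedMap makes it safe for concurrent use. Every access then contends on one lock, which is exactly the performance gap a purpose-built concurrent LRU structure aims to close.

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal JDK-only bounded LRU cache sketch (not the ConcurrentLinkedHashMap API). */
final class BoundedLruCache<K, V> {

  private final Map<K, V> map;

  BoundedLruCache(final int maxEntries) {
    // accessOrder = true means iteration order is least-recently-accessed first,
    // and removeEldestEntry evicts once the bound is exceeded.
    Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    };
    // Coarse-grained synchronization: correct, but every get/put shares one lock.
    this.map = Collections.synchronizedMap(lru);
  }

  V get(K key)             { return map.get(key); }
  void put(K key, V value) { map.put(key, value); }
}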

Unbounded Garbage Collectable Cache

If, instead, you want a cache that grows with use but can be reclaimed when memory is short, then a ConcurrentReferenceHashMap configured with SoftReferences is a good solution.
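For illustration only, here is a rough JDK-only approximation of that idea (not the ConcurrentReferenceHashMap API itself): a ConcurrentHashMap holding SoftReference values. The GC may clear the values under memory pressure; unlike the real thing, this sketch only drops stale entries lazily on lookup, rather than purging them via a ReferenceQueue.

import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Rough approximation of a soft-reference cache: values are softly reachable,
 *  so the GC may reclaim them when memory is short. Stale map entries linger
 *  until the next lookup (or a ReferenceQueue-driven sweep, not shown). */
final class SoftValueCache<K, V> {

  private final ConcurrentMap<K, SoftReference<V>> map =
      new ConcurrentHashMap<K, SoftReference<V>>();

  void put(K key, V value) {
    map.put(key, new SoftReference<V>(value));
  }

  V get(K key) {
    SoftReference<V> ref = map.get(key);
    if (ref == null) {
      return null;
    }
    V value = ref.get();
    if (value == null) {
      map.remove(key, ref);  // referent was collected; drop the stale entry
    }
    return value;
  }
}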

Escape Analysis changes the rules for performance critical code

Over the last few days it’s been dawning on me how Escape Analysis changes the rules for performance-critical code.

It’s a big deal. Short-lived objects become nearly free to use. Convenience rules. If you need to tuple-ize two Ints, stick them in a List to pass through an API, only to immediately unpack them back into Ints on the other side: go ahead. Don’t worry. Those objects will probably never be created; they exist only conceptually.
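A hypothetical illustration of that pattern (using a tiny pair class rather than a List, but the reasoning is the same): when the short-lived holder never escapes the calling code and the callee is inlined, HotSpot’s escape analysis (-XX:+DoEscapeAnalysis) may scalar-replace it, so the allocation effectively never happens.

/** Hypothetical example: a pair object that exists only to carry two ints
 *  across an API boundary and be unpacked immediately on the other side. */
final class IntPair {
  final int first;
  final int second;
  IntPair(int first, int second) { this.first = first; this.second = second; }
}

final class Distances {
  // The tuple is conceptual: created, passed, unpacked, discarded.
  static int manhattan(int x, int y) {
    IntPair p = new IntPair(x, y);
    return unpack(p);
  }

  private static int unpack(IntPair p) {
    return Math.abs(p.first) + Math.abs(p.second);
  }
}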

Get back to what you ought to be doing: Abstracting and Composing.

Arcadia’s level editor lives again

This is a sight I haven’t seen for 3 months – my game’s level editor working. It’s a relief to have it back (partially).

Heavy refactoring of game code, technical debt and experimentation conspired to break it. It’s taken much of the past month to fix.

[Screenshot: Arcadia/Resources/Scenarios/NuWoodhaven.scenario.xml open in the Eclipse SDK, 2009-07-06]

XSL for Xml Schema Migration & Data Transformation

I’ve recently been using XSL to do schema migration on XML data; i.e. small, incremental modifications to XML where most of the existing data is preserved. The application was to massage some Heroes of Arcadia level data, serialized with XStream, to keep it in step with code changes I’d made.

First I tried using Scala’s scala.xml.transform.RewriteRule, but hit some bugs (#2124 and #2125) that convinced me to let it stabilize further before using it in earnest.

So instead I used the new XSL support in Eclipse 3.5 (in WTP 3.1). It seems to work well.

I’d only used XSL back in 2001 and never fully understood it. I found most entry-level tutorials (e.g. ZVON and W3Schools) unhelpful for this kind of application, because they don’t explain copying existing nodes well.

My breakthrough came when I discovered Jesper Tverskov’s article on the Identity Template. This explains how to recursively process XML by copying with modification.

One case Jesper doesn’t cover is copying where segments of the subtree are omitted and others are retained but modified.

I solved that in my template by using an <xsl:for-each> to pick out the part of the subtree I wanted to preserve, wrapped around the <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy> structure that’s fundamental to recursive processing.

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="//vector[default]">
  <size><xsl:value-of select="default/elementCount"/></size>
  <xsl:for-each select="default/elementData">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:for-each>
</xsl:template>

Sample input and output:

<__features id="50" serialization="custom">
  <vector>
    <default>
      <capacityIncrement>4</capacityIncrement>
      <elementCount>1</elementCount>
      <elementData id="51">
        <arcadia.domain.Unit reference="46"/>
        <null/>
        <null/>
        <null/>
        <null/>
      </elementData>
    </default>
  </vector>
</__features>

<__features id="50" serialization="custom">
  <size>1</size>
  <elementData id="51">
    <arcadia.domain.Unit reference="46"/>
    <null/>
    <null/>
    <null/>
    <null/>
  </elementData>
</__features>

Appropriate Use of Mutable Data in Functional Software Systems

Since I wrote this post, I have become aware of Rich Hickey’s discussion of State and Identity on Clojure.org. I feel he expresses the same truths in a generally clearer, sharper way than I have managed to here…

++++++++++++++++++

This post arose out of a Twitter discussion with Daniel Spiewak. I don’t know whether Newton could have encoded his laws of Mechanics into 140 characters, but I need more space to share my thoughts on the (hopefully) simpler matter of: what is the appropriate use of Mutable Data in Functional software systems?

Entities and Identities

I feel the answer is inter-connected with the idea of Identity represented by the data in a software system. As Eric Evans explained so well, data that has a strong identity is called an Entity.

A classic example, present in almost every business software system, is some “User Entity”, modeling a person’s account in that system. In such a system, when we speak about a particular User, conceptually we really want there to be only one such entity in existence.

The identity of an entity is normally governed by a unique & immutable primary key. Other than the primary key, every attribute of the entity may change. A User, for example, might change their favorites list, address, name or possibly even sex.

Functional Interaction with Entities

Conceptually there’s only one entity. But in practice, there’s one “official version”, plus other possibly divergent versions, any of which may be promoted to official status.

Entities should be accessed via transactions (in the broadest sense, including STM or concurrent data structures). A functional program takes an immutable snapshot of the entity at some point in time. It may derive an updated version of the entity from that, and then request that its copy be promoted to the “master” copy. Typically, this promotion would be done using optimistic concurrency, so that if another process changed the entity first, the update must retry. But other models (e.g. pessimistic concurrency, change merging) might be used instead.

Note that

  • An executing program has no guarantee that it operates on the “latest” version of an entity.
  • After the fact, it is possible to say “at time X, the official state of the entity was thus…”, by time-stamping updates. This is important, for example, in a legal case if we wanted to know who owned some property at a particular point in time in a dispute.
  • This can be viewed as either Shared State or Message Passing concurrency. We can view the “official entity version” as shared state, or as an agent to whom we send read-entity and write-entity messages.
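To make the optimistic promotion described above concrete, here is a minimal Java sketch (the UserState/UserEntity names and methods are hypothetical; a real system might use an STM, an agent, or a database transaction instead):

import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable snapshot of a User entity: the id is the identity,
 *  everything else is data that a derived version may differ in. */
final class UserState {
  final long id;
  final String name;
  UserState(long id, String name) { this.id = id; this.name = name; }
  UserState withName(String newName) { return new UserState(id, newName); }
}

final class UserEntity {
  // The "official version". Readers snapshot it; writers propose replacements.
  private final AtomicReference<UserState> official;

  UserEntity(UserState initial) { official = new AtomicReference<UserState>(initial); }

  UserState snapshot() { return official.get(); }

  /** Optimistic promotion: derive a new version functionally, then attempt to
   *  install it; if another process got there first, re-read and retry. */
  void rename(String newName) {
    while (true) {
      UserState current = official.get();
      UserState updated = current.withName(newName);
      if (official.compareAndSet(current, updated)) {
        return;  // our version became the master copy
      }
      // another process promoted its version first; retry against the new official state
    }
  }
}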

Pretty much as Daniel put it (borrowed from Clojure, apparently): “Everything is immutable except for concurrency-safe entities”.

The meaning of Entity Update

Something special happens when an entity’s master state is updated. This is the synchronization/convergence point between separate threads and components. It is the boundary between the functional code, which transforms some input state into some new entity states, and the imperative world of sequentially updating mutable state. It is the end of a Unit of Work, a time when a functional program says “this is not a means to an end, this is the end”. The transient becomes persisted, allowing the state in the current moment to affect and interact with other points in time.

Granularity, Scalability and Consistency

So how many entities do we need? Coarse or fine grained? A few larger, internally complex entities, or many simpler and smaller entities?

It’s a matter of choosing a point on a spectrum whose extremes are:

  • A single StateOfTheWorld entity whose attributes encompass all data in the system. As in a purely functional system, the program sees only its own private version of state, without any interference.
  • Every tiny piece of data is its own entity. This would be like programming only with atomic global variables. Everyone sees and shares all state without privacy.

I don’t think there’s one “right” answer. All I can offer is an outline of some design forces acting in either direction:

Fewer Entities each containing more data

  • For: data consistency within an entity. A coarse-grained entity starts in a consistent state and is never externally updated. A program is free to compute changes to the entity’s state independently, without risk of being disrupted by other programs’ updates. The benefits of the purely functional model accrue.
  • Against: contention if multiple processes try to simultaneously update the same entity. For an extreme example, imagine if two functional programs simultaneously computed a new StateOfTheWorld entity – they would operate in a perpetual state of fatal contention.
  • Against: copying overhead. A minimum of log(N) (where N is the “size” of the entity), and often much more, of an entity’s data must be copied to update it.

More Entities each containing less data

  • For: Less copying overhead. We still need to copy a minimum of log(N) of an entity’s data to update it, but N is smaller.
  • For: Less contention when multiple processes try to simultaneously update entities, because the updates are spread across more fine-grained locks.
  • Against: data inconsistency between inter-related entities. We either have to tolerate inconsistency between entities (e.g. Person entity ‘Ben’ has a son ‘Otto’, but Person entity ‘Otto’ doesn’t have a father ‘Ben’), or introduce an extra layer of locking over the entities (as most databases do). Note that if we introduce locking above the entity level, our contention benefits go away, and we may be creating de-facto coarse-grained entities.

It’s interesting to note that the same trade-off between consistency and scalability shows up many times elsewhere – eBay being a nice example.

Non-Global Identity

Above, I presented Entities as having a globally unique identity. Much of the time, that’s a good way to think about them, but be aware it is a simplification. In fact, identity is inherently defined relative to some scope. That scope needn’t be global.

Here are 3 different scopes for the entity “the number of people on Earth at midnight on Dec 31, 1999”:

  1. The globally true answer that an omniscient god might know
  2. My national government’s official figure
  3. My personal estimate

Scopes may nest. I’m still contemplating exactly how this affects the design of systems with mutable data.

OO/Imperative programmers: ‘Study Functional Programming or Be Ignorant’

Right now, if you want to understand the state of the art in computer programming, those are your choices as I see them.

Sorry to be so blunt … please don’t shoot the messenger.

My awakening started when I began searching for a better Java, and found Scala. Scala had these weird (to my then-ignorant eyes) functional features, and learned Scala people often talked about Haskell. A uni friend, Bernie Pope, raved about it too.

I took a look at Haskell. It wasn’t a pleasurable experience. Much of the code was almost incomprehensible, and half the concepts I’d never heard of before. It made me feel stupid, but what I actually was, was ignorant.

I say ignorant, rather than innocent, because Haskell has been around for over a decade, and FP much longer.

I still struggle to read most Haskell, and I certainly can’t use it to build anything. But I am starting to get a sense of just how sophisticated it is, and a map of its concepts in my mind. I began by going down a rabbit-hole, expecting to find a burrow, a community of Haskellers programming in their own unique way. Instead, bit by bit I’ve realised I’ve ended up in a vast cavern absolutely full of stuff I barely even knew existed.

The bit that stirs me up is that this “stuff” isn’t, repeat is not, Haskell-specific. It’s rooted in the fabric of our reality, in our mathematics and our problem domains, and bits poke up like the tips of icebergs into mainstream OO languages, their true structure part-revealed. But rarely in the OO world have I found such carefully abstracted and subtle techniques of programming in daily use.

Ideas from Haskell & Functional Programming will continue to flow into the mainstream over the next couple of decades. Innovations will be trumpeted, trends identified, features debated, technologies evangelized.

But personally, I’m too curious (and too lazy a typist) to wait that long.

Why Javaspaces are not for me

In 2001/2002 I worked for 18 months at the now defunct UK Javaspace vendor Intamission, building their Javaspace implementation Autevo, which went on to… er… “inspire” the open source Blitz product, released by Intamission co-founder & lead programmer Dan Creswell after he left the company.

One of the less satisfying aspects of that project was that, as I worked with Javaspace/Tuplespace-based systems, I came to believe in them less and less. I now believe that while Javaspaces have some great qualities, they also have some serious flaws. Here’s a brief outline of what I think these flaws are.

Some background for the layman

Javaspaces pride themselves on having a simple API based on 4 methods:

  • write puts an object into the shared distributed space
  • read returns a copy of one object matching some provided query criteria. There are blocking and polling variants.
  • take returns a copy of one object matching some provided query criteria and removes it from the space. There are blocking and polling variants.
  • notify registers interest in being notified should an object enter the space matching the provided criteria.

Javaspaces use a Query by Example mechanism to query the state of the space. One presents a template Entry to the space, with zero or more public, non-primitive, serializable fields, any of which may be null; the space matches any Entry whose fields are equals() to the template’s fields, treating null fields of the template as wildcards that match anything. Thinking of an Entry as a tuple is a good analogy.
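To make the matching model concrete, here is a rough sketch against the Jini Entry and JavaSpace types as I recall them (the Person entry and its fields are purely illustrative):

import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;

/** Illustrative Entry: public object-typed fields and a public no-arg constructor. */
public class Person implements Entry {
  public String name;   // must be an object type, never a primitive
  public Integer age;
  public Person() {}
}

class QueryByExampleSketch {
  /** A template with null fields acts as a wildcard query. */
  static Person anyTwentyFiveYearOld(JavaSpace space) throws Exception {
    Person template = new Person();
    template.age = 25;     // must match exactly, via equals()
    template.name = null;  // wildcard: any name matches
    // Blocks up to 1000ms for a matching entry; returns a copy, or null.
    return (Person) space.read(template, null, 1000L);
  }
  // "Age between 25 and 34", however, simply cannot be expressed as a template.
}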

I subscribe to the design aesthetic “as simple as possible, but no simpler”. Unfortunately, the Javaspace query mechanism is too simplistic, and not sufficiently powerful to solve practical distributed-systems problems with:

No Range Queries

The query API allows one to specify either an exact value, or any value, but nothing in between. You can say “a Person of Age == 25”, or “a Person of any Age”, but not “a Person of Age 25–34”. This is a result of Javaspace’s fixation with a flawed query-by-example model. As you might imagine, this limitation quickly becomes problematic when you try to build real applications.

Use of Null as Wildcard

Another nasty side-effect of the query-by-example model is that null becomes overloaded as a wildcard value. Accordingly, Entry fields cannot be primitive, and may not be null for any legitimate application purpose.

Lack of a readAll Operation

readAll would be the Javaspace analog of a SQL SELECT statement; i.e. find all Entries in the space matching the provided criteria. Sorry, no can do. A Javaspace can give you one of them, but not all of them.

So for example, if we want to ask our space-based application, “How many tasks are in the queue?”, it can only reply “All I can tell you is that there is at least one.”

The Future

Someone should combine the best ideas from Javaspaces – a focus on high-performance, short-term storage of self-contained objects/messages, leases, and event notification – with a decent, extensible query mechanism. Now that would fly. (Maybe someone already has? I haven’t been following this field closely in recent years.)

Deleting from the Heroes of Arcadia domain: a thorny problem

Deleting objects from a Domain Model in a safe and scalable way, with minimal adverse effects, is a thorny problem.

Eric Evans’ Domain-Driven Design book has the best discussion I’ve read, under the section on Aggregates, but it’s not comprehensive.

I’m struggling to find a really nice solution in Heroes of Arcadia. My best shot so far involves a mixed approach – cleaning out as many root refs as I can, plus stamping the deleted object as inactive. An example:

Units can be removed from a Zone.

There are two primary references to a Unit, via its Faction and its Location. These are bi-directional. When a unit is removed, it notifies its Faction and Location that it is going and they remove their ref to it. As these are the primary indices, cleaning them out will prevent the creation of new refs to the removed unit.

Units may also be referenced from other domain entities uni-directionally (meaning the unit is unaware it is referenced), for example, by being the target of a spell effect. These refs will remain after the unit is “removed”. The unit must stamp itself as being removed/inactive, and all domain operations need to tolerate this state. Gradually, the removed unit should “drain out” of the system as refs expire.
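A minimal sketch of that mixed approach (the Unit, Faction and Location names follow the description above; the method names and the SpellEffect example are hypothetical):

import java.util.HashSet;
import java.util.Set;

class Faction {
  private final Set<Unit> units = new HashSet<Unit>();
  void removeUnit(Unit u) { units.remove(u); }
}

class Location {
  private final Set<Unit> units = new HashSet<Unit>();
  void removeUnit(Unit u) { units.remove(u); }
}

class Unit {
  private Faction faction;    // bi-directional primary reference
  private Location location;  // bi-directional primary reference
  private boolean active = true;

  /** Clean out the primary (bi-directional) references, then stamp the unit
   *  inactive so remaining uni-directional references can tolerate it. */
  void remove() {
    faction.removeUnit(this);
    location.removeUnit(this);
    active = false;
  }

  boolean isActive() { return active; }
}

class SpellEffect {
  private Unit target;  // uni-directional: the unit doesn't know about this reference

  void tick() {
    if (target != null && !target.isActive()) {
      target = null;  // the removed unit gradually "drains out" of the system
    }
  }
}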

Annoying Java Bug/Limitation: Generic methods don’t combine well with static imports

package test;
import java.util.*;

import static test.StaticallyImported.*;

public class Test {

  public static void main(String[] args) {

    //passing Type Argument works fine
    HashMap m1 = 
      StaticallyImported.<HashMap>aStaticallyImportedGenericMethod();

    //same thing, with static import syntax: compiler error
    HashMap m2 = <HashMap>aStaticallyImportedGenericMethod();
  }
}

class StaticallyImported {
  static <A extends Map> A aStaticallyImportedGenericMethod() {
    return null;
  }
}
