Appropriate Use of Mutable Data in Functional Software Systems

Since I wrote this post, I have become aware of Rich Hickey’s discussion of State and Identity on Clojure.org. I feel he expresses the same truths in a generally clearer, sharper way than I have managed to here…

++++++++++++++++++

This post arose out of a discussion with Daniel Spiewak on Twitter. I don’t know whether Newton could have encoded his laws of mechanics in 140 characters, but I need more space to share my thoughts on the (hopefully) simpler question: what is the appropriate use of mutable data in functional software systems?

Entities and Identities

I feel the answer is interconnected with the idea of identity, as represented by the data in a software system. As Eric Evans explained so well, data that has a strong identity is called an Entity.

A classic example, present in almost every business software system, is some “User Entity”, modeling a person’s account in that system. In such a system, when we speak about a particular User, conceptually we really want there to be only one such entity in existence.

The identity of an entity is normally governed by a unique and immutable primary key. Other than the primary key, every attribute of the entity may change. A User, for example, might change their favorites list, address, name or possibly even sex.

Functional Interaction with Entities

Conceptually there’s only one entity. In practice, there is only one “official” version, alongside other, possibly divergent versions that may be promoted to official status.

Entities should be accessed via transactions (in the broadest sense, including STM or concurrent data structures). A functional program takes an immutable snapshot of the entity at some point in time. It may derive an updated version of the entity from that snapshot, and then request that its copy be promoted to the “master” copy. Typically this promotion would use optimistic concurrency, so that if another process changed the entity first, the update must retry. Other models (e.g. pessimistic concurrency control, or merging of changes) could be used instead.
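Here is a minimal Python sketch of that snapshot, derive and promote cycle using optimistic concurrency. The `EntityRef` holder and the `User` record are hypothetical names for illustration and don't come from any particular library. Python has no compare-and-set on object references, so the sketch guards only the tiny compare-and-set step with a lock. The derivation of the new version runs outside it.

```python
import threading
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class User:
    """Immutable snapshot of a hypothetical User entity."""
    user_id: int                      # unique, immutable primary key
    name: str
    favorites: Tuple[str, ...] = ()


class EntityRef:
    """Hypothetical holder of the single 'official' version of an entity."""

    def __init__(self, initial):
        self._value = initial
        self._version = 0
        self._lock = threading.Lock()  # guards only the compare-and-set step

    def snapshot(self):
        # Return an immutable snapshot plus the version it was read at.
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, expected_version, new_value):
        # Promote new_value to official only if nobody committed in the meantime.
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True

    def update(self, fn):
        # Optimistic concurrency: read, derive a new version, try to commit, retry on conflict.
        while True:
            current, version = self.snapshot()
            proposed = fn(current)    # derived without holding any lock
            if self.compare_and_set(version, proposed):
                return proposed


alice = EntityRef(User(user_id=1, name="Alice"))
alice.update(lambda u: replace(u, favorites=u.favorites + ("Scala",)))
```

When commits collide, the function passed to `update` may run more than once. That is one reason it should be pure: its only effect should be its return value.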

Note that:

  • An executing program has no guarantee that it operates on the “latest” version of an entity.
  • After the fact, it is possible to say “at time X, the official state of the entity was thus…”, by time-stamping updates. This matters, for example, in a legal dispute over who owned some property at a particular point in time.
  • This can be viewed as either Shared State or Message Passing concurrency. We can treat the “official entity version” as shared state, or as an agent to which we send read-entity and write-entity messages.
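Time-stamping can be as simple as keeping every official version in an append-only log ordered by commit time. The Python sketch below is illustrative only. It uses a hypothetical `EntityHistory` class and plain numeric timestamps. A real system would also have to decide whose clock those timestamps come from.

```python
import bisect
from typing import Any, List, Optional


class EntityHistory:
    """Hypothetical append-only log of an entity's official versions."""

    def __init__(self):
        self._times: List[float] = []
        self._states: List[Any] = []

    def record(self, timestamp: float, state: Any) -> None:
        # Commits to a single official version are serialized,
        # so their timestamps should never go backwards.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("updates must be recorded in time order")
        self._times.append(timestamp)
        self._states.append(state)

    def as_of(self, timestamp: float) -> Optional[Any]:
        # Latest state committed at or before `timestamp`; None if there was none yet.
        i = bisect.bisect_right(self._times, timestamp)
        return self._states[i - 1] if i else None


deed = EntityHistory()
deed.record(100.0, {"owner": "Ann"})
deed.record(250.0, {"owner": "Ben"})
print(deed.as_of(200.0))  # {'owner': 'Ann'}
```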

Pretty much as Daniel put it (borrowing from Clojure, apparently): “Everything is immutable except for concurrency-safe entities.”

The meaning of Entity Update

Something special happens when an entity’s master state is updated. This is the synchronization and convergence point between separate threads and components. It is the boundary between the functional code, which transforms input state into new entity states, and the imperative world of sequentially updated mutable state. It is the end of a Unit of Work, the moment when a functional program says “this is not a means to an end, this is the end”. The transient becomes persistent, allowing the state of the current moment to affect and interact with other points in time.

Granularity, Scalability and Consistency

So how many entities do we need? Coarse or fine grained? A few larger, internally complex entities, or many simpler and smaller entities?

It’s a matter of choosing a point on a spectrum whose extremes are:

  • A single StateOfTheWorld entity whose attributes encompass all data in the system. As in a purely functional system, each program sees only its own private version of the state, free from interference.
  • Every tiny piece of data is its own entity. This would be like programming only with atomic global variables. Everyone sees and shares all state without privacy.

I don’t think there’s one “right” answer. All I can offer is an outline of some design forces acting in either direction:

Fewer Entities each containing more data

  • For: data consistency within an entity. A coarse-grained entity starts in a consistent state and is never updated externally. A program is free to compute changes to the entity’s state independently, without risk of being disrupted by other programs’ updates. The benefits of the purely functional model accrue.
  • Against: contention if multiple processes try to update the same entity simultaneously. As an extreme example, imagine two functional programs simultaneously computing a new StateOfTheWorld entity: they would operate in a perpetual state of fatal contention.
  • Against: copying overhead. Updating an entity means copying at least log(N) of its data (where N is the “size” of the entity), and often much more.
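The log(N) figure assumes the entity is stored in a tree-shaped persistent data structure. In such a structure, an update copies only the nodes on the path from the root to the changed item and shares everything else with the previous version (path copying). Below is a minimal Python sketch using a plain binary search tree. The `Node`, `insert`, `build` and `lookup` names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return a new tree with key set to value; the old tree is left untouched.

    Only nodes on the root-to-key path are copied; all other subtrees are shared.
    """
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)


def lookup(node, key):
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return None if node is None else node.value


def build(keys):
    # Build a balanced tree from sorted keys (demo only).
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], f"v{keys[mid]}", build(keys[:mid]), build(keys[mid + 1:]))


def node_ids(node, acc=None):
    # Identities of every node object reachable from `node`.
    acc = set() if acc is None else acc
    if node is not None:
        acc.add(id(node))
        node_ids(node.left, acc)
        node_ids(node.right, acc)
    return acc


old = build(list(range(1, 16)))           # 15 nodes, 4 levels deep
new = insert(old, 1, "changed")
print(len(node_ids(new) - node_ids(old)))  # 4: only the root-to-leaf path was copied
print(lookup(old, 1))                      # v1: the old version is unchanged
```

For a balanced tree of N nodes the copied path has about log2(N) nodes. An unbalanced tree or a flat array would copy more.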

More Entities each containing less data

  • For: less copying overhead. We still need to copy at least log(N) of an entity’s data to update it, but N is smaller.
  • For: less contention when multiple processes try to update entities simultaneously, because the updates are spread across more fine-grained locks.
  • Against: data inconsistency between inter-related entities. We either have to tolerate inconsistency between entities (e.g. Person entity ‘Ben’ has a son ‘Otto’, but Person entity ‘Otto’ doesn’t have a father ‘Ben’), or introduce an extra layer of locking over the entities (as most databases do). Note that if we introduce locking above the entity level, our contention benefits go away, and we may be creating de facto coarse-grained entities.
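Below is a Python sketch of the Ben and Otto example. It assumes a hypothetical `Ref` class that gives each `Person` entity its own lock. `link_separately` updates the two entities one after the other, so a reader in the gap sees Ben listing Otto as a son while Otto has no father yet. `link_together` adds the extra layer of locking. That restores consistency, but for the duration of the update it effectively turns the pair into one coarse-grained entity.

```python
import threading
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Person:
    name: str                           # used as the primary key in this sketch
    father: Optional[str] = None
    sons: Tuple[str, ...] = ()


class Ref:
    """Hypothetical fine-grained entity: one official value guarded by its own lock."""

    def __init__(self, value: Person):
        self.key = value.name           # immutable primary key
        self.value = value
        self.lock = threading.Lock()


def is_consistent(father: Person, son: Person) -> bool:
    # The link must be recorded on both sides or on neither.
    return (son.name in father.sons) == (son.father == father.name)


def link_separately(father: Ref, son: Ref, between: Callable[[], None] = lambda: None) -> None:
    # Two independent entity updates; `between` stands in for another
    # process that happens to read both entities in the gap.
    with father.lock:
        father.value = replace(father.value, sons=father.value.sons + (son.key,))
    between()
    with son.lock:
        son.value = replace(son.value, father=father.key)


def link_together(father: Ref, son: Ref) -> None:
    # Extra locking layer over both entities. Locks are always taken in key
    # order, so two concurrent callers cannot deadlock each other.
    first, second = sorted((father, son), key=lambda r: r.key)
    with first.lock, second.lock:
        father.value = replace(father.value, sons=father.value.sons + (son.key,))
        son.value = replace(son.value, father=father.key)


def read_together(a: Ref, b: Ref) -> Tuple[Person, Person]:
    # Readers must take the same locks to be sure of a consistent pair.
    first, second = sorted((a, b), key=lambda r: r.key)
    with first.lock, second.lock:
        return a.value, b.value


ben, otto = Ref(Person("Ben")), Ref(Person("Otto"))
seen = []
link_separately(ben, otto, between=lambda: seen.append((ben.value, otto.value)))
print(is_consistent(*seen[0]))                  # False: Otto has no father yet

sam, tom = Ref(Person("Sam")), Ref(Person("Tom"))
link_together(sam, tom)
print(is_consistent(*read_together(sam, tom)))  # True
```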

It’s interesting to note that the same trade-off between consistency and scalability shows up in many other places; eBay is a nice example.

Non-Global Identity

Above, I presented Entities as having a globally unique identity. Much of the time that’s a good way to think about them, but be aware that it is a simplification. In fact, identity is inherently defined relative to some scope, and that scope needn’t be global.

Here are three different scopes for the entity “the number of people on Earth at midnight on Dec 31, 1999”:

  1. The globally true answer that an omniscient god might know
  2. My national government’s official figure
  3. My personal estimate

Scopes may nest. I’m still contemplating exactly how this affects the design of systems with mutable data.

OO/Imperative programmers: ‘Study Functional Programming or Be Ignorant’

Right now, if you want to understand the state of the art in computer programming, those are your choices as I see them.

Sorry to be so blunt … please don’t shoot the messenger.

My awakening started when I began searching for a better Java and found Scala. Scala had these functional features that seemed weird (in my ignorance), and learned Scala people often talked about Haskell. A uni friend, Bernie Pope, raved about it too.

I took a look at Haskell. It wasn’t a pleasurable experience. Much of the code was almost incomprehensible, and half the concepts I’d never heard of before. It made me feel stupid, but what I actually was, was ignorant.

I say ignorant, rather than innocent, because Haskell has been around for over a decade, and FP much longer.

I still struggle to read most Haskell, and I certainly can’t use it to build anything. But I am starting to get a sense of just how sophisticated it is, and to form a map of its concepts in my mind. I began by going down a rabbit-hole, expecting to find a burrow: a community of Haskelliers programming in their own unique way. Instead, bit by bit, I’ve realised I’ve ended up in a vast cavern absolutely full of stuff I barely knew existed.

The bit that stirs me up is that this “stuff” isn’t, repeat is not, Haskell-specific. It’s rooted in the fabric of our reality, in our mathematics and our problem domains, and bits of it poke up like the tips of icebergs into mainstream OO languages, their true structure only partly revealed. But rarely in the OO world have I found such carefully abstracted and subtle programming techniques in daily use.

Ideas from Haskell & Functional Programming will continue to flow into the mainstream over the next couple of decades. Innovations will be trumpeted, trends identified, features debated, technologies evangelized.

But personally, I’m too curious (and too lazy a typist) to wait that long.