Entities, Programs and Bindings: an embryonic programming paradigm

!Warning! This is one of my most abstract and part-formed blog posts, a rough sketch of some ideas that are still forming in my mind. You might want to skip on to something more tangible.

I believe the ideas expressed in Rich Hickey’s monograph “On State and Identity” (and my own independent grasping toward these ideas) are true.

For the past few days I have been contemplating what large-scale software looks like in this model. The answer seems quite exciting. Traditional immutable versus mutable state conflicts are finally cleared away – instead, each has its defined place.

Values

A value is something time-invariant. All observations of a value should be equivalent. Computationally, a value may be passed around in unresolved or part-resolved form, and further computation is required when parts of the value are accessed.
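As a rough illustration of a value being passed around in part-resolved form, here is a minimal Scala sketch. The `Report` class and the `evaluations` counter are hypothetical names invented for this example; the counter exists only to show when the deferred work actually happens.

```scala
object ValueSketch {
  // Instrumentation only: counts how often the deferred part is computed.
  var evaluations = 0

  // An immutable value whose `total` is resolved on first access and then cached.
  final class Report(items: Vector[Int]) {
    lazy val total: Int = {
      evaluations += 1
      items.sum
    }
  }
}
```

Every observer of the same `Report` sees the same `total`; the only thing that varies is whether the computation has already been paid for.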

Entities

An entity is a time-varying stream of values associated with a particular identity. Note that the history of an entity’s value, as well as its current value, is available, which is why a Stream is returned (the history may need to be truncated to fit within real-world computational and storage limits).

def entity[T](id: ID): Stream[(T, Timestamp)]

Entities change their value transactionally. All changes to the value are recorded permanently. Changes are timestamped in system-time.

def update[T](id: ID, newValue: T): Unit // in transaction

Logically, the number of entities is infinite and they are never created or destroyed. If one asks for an entity which has not previously been set to a value, then it returns an empty stream.

Note: There is no way to ask what entities exist. One must find entities via references from other already known entities, and not via a system-level “browse-style” method.
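To make the entity model a little more concrete, here is a minimal in-memory sketch in Scala. It is only one possible reading of the description above: `Store`, `Version` and the injected `clock` are hypothetical names, a plain `List` (newest first) stands in for the stream, and `synchronized` is a crude stand-in for a real transaction.

```scala
object EntitySketch {
  // Hypothetical stand-ins for the ID and Timestamp types.
  final case class ID(key: String)
  type Timestamp = Long

  // One recorded change: the new value and the system-time at which it was set.
  final case class Version[T](value: T, at: Timestamp)

  final class Store[T](clock: () => Timestamp) {
    // Each ID maps to its full history, newest version first.
    private var histories = Map.empty[ID, List[Version[T]]]

    // Never fails: an entity that was never set simply has an empty history.
    def entity(id: ID): List[Version[T]] = synchronized {
      histories.getOrElse(id, Nil)
    }

    // Prepends a timestamped version; earlier versions are kept, never overwritten.
    def update(id: ID, newValue: T): Unit = synchronized {
      histories = histories.updated(id, Version(newValue, clock()) :: entity(id))
    }
  }
}
```

Note that `Store` deliberately offers no way to list its IDs, in keeping with the rule that entities are found only through references.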

IDs

An ID is an opaque immutable value representing an identity. IDs can be used to obtain the values of the entity they reference. IDs are values like any other, and can be stored as/in entities, passed to functions, etc.

There is one special identity, system-time. This refers to an entity that holds the current system time and updates in some discretized approximation of a smoothly changing variable.

Programs

Programs are the things that happen, the things that update entity values and respond to updates in entity state. Most operations are functional, but imperative operations may be required when communicating with external systems, i.e. for IO.

Functions with Bindings

Functional Programs consist of two things:

  • a (purely functional) Function from N inputs to M outputs. It can and should contain other functions nested to any depth.
  • a Binding of the function’s inputs and outputs to either
    • the value of an entity. In the case of inputs, this means the function is evaluated passing the entity’s value as a parameter. For outputs, this means the entity’s value is assigned that value returned by the function.
    • the event of an entity changing value, for inputs only. This means that a function fires when an entity has changed value. (cf. FRP)

The ability to launch programs when entities change state is a key to the operation of the paradigm.
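A small Scala sketch may help show what a Function with a Binding could look like. Everything here is hypothetical: `Cell` is a stripped-down stand-in for an entity (no history, no transactions), and `bind` wires a pure two-input function so that it fires on input change events and assigns its result to an output.

```scala
object BindingSketch {
  // A minimal observable cell standing in for an entity.
  final class Cell[T](initial: T) {
    private var current: T = initial
    private var listeners: List[T => Unit] = Nil

    def value: T = current

    // Register a callback for the "entity changed value" event.
    def onChange(f: T => Unit): Unit = listeners = f :: listeners

    def set(v: T): Unit = {
      current = v
      listeners.foreach(_(v))
    }
  }

  // Binds a pure function: it re-runs whenever either input changes
  // and its result is assigned to the output cell.
  def bind[A, B, C](inA: Cell[A], inB: Cell[B], out: Cell[C])(f: (A, B) => C): Unit = {
    def fire(): Unit = out.set(f(inA.value, inB.value))
    inA.onChange(_ => fire())
    inB.onChange(_ => fire())
  }
}
```

For example, binding `price` and `quantity` cells to a `total` cell with `_ * _` keeps `total` up to date without the multiplication itself knowing anything about entities.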

Imperative Operations

Programs can also run imperatively, in order to sequence IO operations for communicating with outside systems. They can bind inputs to entity values or entity change events, exactly as functional programs do, but do not return any outputs that are assigned back onto entities.

This explanation is rather terse. I need to give some examples, some diagrams, and some explanations. Perhaps some outline of what my goals are. To do anything practical with this paradigm requires a lot of infrastructure built on top of this base. For now, I’m mainly writing this down as a record of my own thought process, for later re-editing.

Oh, how shall I spend my Creative Mana?

I’m coming late to the party: in February Erkki Lindpere posed a worthwhile question on his blog that I want to talk about:

“Anyway, please comment, if you are also a programmer, or a geek with a different specialization, who has a lot of hobby projects, how do you manage not to get swamped with them or get the feeling that you are always working on cool stuff but never releasing?”

Myself, I have been developing a fantasy mobile game, Heroes of Arcadia, since early 2005. While so far I’ve got some funding, formed a company and released 2 beta versions, a finished game remains /at least/ 12 months off. [full story: http://bit.ly/e3UKE]

I would never have dreamed in early 2005 that I’d still be unfinished in 2009. I would never have imagined the marathon effort it would require of me to create a polished, fun, playable and stable game.

This experience has taught me an important life lesson: Pick your creative battles!

Our lives are finite. There are actually a limited number of things one person can attempt and complete in one lifetime.

Therefore, I feel it’s important to try to understand what you truly aspire to, what you are really good at, what you’re going to be happy doing. Find the sweet spot and really try to focus on it.

My strategy increasingly is to ensure that everything I am involved with fits with my overall aspirations, that the pieces reinforce each other, and that they reuse what I’ve learned and done already in life. In a word, Synergy.

By trying to create something large, I’ve gained a sense of how much creative ‘mana’ I’ve got in me. The blunt truth is that it feels somewhat finite, and less than I had idealistically imagined in my youthful daydreams.

So I want to ensure that I spend this mana in the most fruitful way I can.

Anything I attempt to do, I want to

  • yield some outcome
  • create something beautiful
  • be proud of what I’m doing
  • enjoy the journey as well as the destination
  • learn and grow from it

Appropriate Use of Mutable Data in Functional Software Systems

Since I wrote this post, I have become aware of Rich Hickey’s discussion of State and Identity on Clojure.org. I feel he expresses the same truths in a generally clearer, sharper way than I have managed to here…

++++++++++++++++++

This post arose out of a discussion with Daniel Spiewak on Twitter. I don’t know whether Newton could have encoded his laws of Mechanics into 140 characters, but I need more space to share my thoughts on the (hopefully) simpler matter of: What is appropriate use of Mutable Data in Functional software systems?

Entities and Identities

I feel the answer is inter-connected with the idea of Identity represented by the data in a software system. As Eric Evans explained so well, data that has a strong identity is called an Entity.

A classic example, present in almost every business software system, is some “User Entity”, modeling a person’s account in that system. In such a system, when we speak about a particular User, conceptually we really want there to be only one such entity in existence.

The identity of an entity is normally governed by a unique & immutable primary key. Other than the primary key, every attribute of the entity may change. A User, for example, might change their favorites list, address, name or possibly even sex.

Functional Interaction with Entities

Conceptually there’s only one entity. But actually, there’s only one “official version”, and other possibly divergent versions, which may be promoted to official status.

Entities should be accessed via transactions (in the broadest sense, including STM or concurrent data structures). A functional program makes an immutable snapshot of the entity at some time. It may derive an updated version of the entity from that, and then request that its copy be promoted as the “master” copy. Typically, this promotion would be done using optimistic concurrency, so that if another process changed the entity first, the update must retry. But other models (e.g. pessimistic concurrency, change merging) might be used instead.
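The snapshot, derive and promote cycle can be sketched with a compare-and-set loop on the JVM. This is one minimal way to do optimistic concurrency, not the only one; `Master` and `User` are hypothetical names, and the only library call used is `java.util.concurrent.atomic.AtomicReference.compareAndSet`.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object OptimisticSketch {
  // Hypothetical immutable snapshot type for a User entity.
  final case class User(id: Long, name: String, favorites: Set[String])

  // Holds the current "master" version of one entity.
  final class Master[T <: AnyRef](initial: T) {
    private val ref = new AtomicReference[T](initial)

    // An immutable snapshot of the entity as of now.
    def snapshot: T = ref.get()

    // Derive a new version from a snapshot and try to promote it.
    // If another writer promoted a version first, compareAndSet fails
    // and the derivation is retried against the newer snapshot.
    @tailrec
    final def update(f: T => T): T = {
      val seen = ref.get()
      val next = f(seen)
      if (ref.compareAndSet(seen, next)) next else update(f)
    }
  }
}
```

Because `f` may run more than once under contention, it needs to be a pure function of the snapshot, which is exactly the division of labour described above.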

Note that

  • An executing program has no guarantee that it operates on the “latest” version of an entity.
  • After the fact, it is possible to say “at time X, the official state of the entity was thus…”, by time-stamping updates. This is important, for example, in a legal dispute where we need to know who owned some property at a particular point in time.
  • This can be viewed as either Shared State or Message Passing concurrency. We can view the “official entity version” as shared state, or as an agent to whom we send read-entity and write-entity messages.
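The “at time X, the official state was thus” query in the list above is easy to sketch once updates are time-stamped. The names `Version` and `asOf` are hypothetical, and the history is assumed to be kept newest first.

```scala
object AsOfSketch {
  // One promoted version and the time at which it became official.
  final case class Version[T](value: T, at: Long)

  // Returns the value that was official at `time`, if the entity had one yet.
  // `history` is assumed to be ordered newest first.
  def asOf[T](history: List[Version[T]], time: Long): Option[T] =
    history.find(_.at <= time).map(_.value)
}
```

With a property’s ownership history recorded this way, the question of who owned it on a given date becomes a lookup rather than a reconstruction.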

Pretty much as Daniel put it (borrowed from Clojure, apparently): everything is immutable except for concurrency-safe entities.

The meaning of Entity Update

Something special happens when an entity’s master state is updated. This is the synchronization/convergence point between separate threads and components. It is the boundary between the functional code, which transforms some input state into some new entity states, and the imperative world of sequentially updating mutable state. It is the end of a Unit of Work, a time when a functional program says “this is not a means to an end, this is the end”. The transient becomes persisted, allowing the state in the current moment to affect and interact with other points in time.

Granularity, Scalability and Consistency

So how many entities do we need? Coarse or fine grained? A few larger, internally complex entities, or many simpler and smaller entities?

It’s a matter of choosing a point on a spectrum whose extremes are

  • A single StateOfTheWorld entity whose attributes encompass all data in the system. As in a purely functional system, the program sees only its own private version of state, without any interference.
  • Every tiny piece of data is its own entity. This would be like programming only with atomic global variables. Everyone sees and shares all state without privacy.

I don’t think there’s one “right” answer. All I can offer is an outline of some design forces acting in either direction:

Fewer Entities each containing more data

  • For: data consistency within an entity. A coarse grained entity starts in a consistent state and is never externally updated. A program is free to compute changes to the entity’s state independently without risk of being disrupted by other programs’ updates. The benefits of the purely functional model accrue.
  • Against: Contention if multiple processes try to simultaneously update the same entity. For an extreme example, imagine if two functional programs simultaneously computed a new StateOfTheWorld entity – they would operate in a perpetual state of fatal contention.
  • Against: Copying Overhead. Need to copy a minimum of log(N) (where N is the “size” of the entity), and often much more, of an entity’s data to update it.
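The log(N) figure in the last point comes from path copying in persistent tree structures. Here is a minimal sketch using an unbalanced binary search tree (so the log(N) bound only holds when the tree happens to be balanced; a production structure would rebalance). `Tree`, `Node`, `Leaf` and `insert` are names invented for this example.

```scala
object PathCopySketch {
  sealed trait Tree
  case object Leaf extends Tree
  final case class Node(left: Tree, key: Int, right: Tree) extends Tree

  // Returns a new tree containing `k`. Only the nodes on the search path
  // are rebuilt; every subtree off that path is reused by reference.
  def insert(t: Tree, k: Int): Tree = t match {
    case Leaf => Node(Leaf, k, Leaf)
    case n @ Node(l, key, r) =>
      if (k < key) Node(insert(l, k), key, r)
      else if (k > key) Node(l, key, insert(r, k))
      else n
  }
}
```

The old tree stays valid after an insert, so a reader holding the previous version is never disturbed; the cost is the handful of nodes copied along the path.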

More Entities each containing less data

  • For: Less copying overhead. We still need to copy a minimum of log(N) of an entity’s data to update it, but N is smaller.
  • For: Less contention when multiple processes try to simultaneously update entities, because the updates are spread across more fine-grained locks.
  • Against: data inconsistency between inter-related entities. We either have to tolerate inconsistency between entities (e.g. Person entity ‘Ben’ has a son ‘Otto’, but Person entity ‘Otto’ doesn’t have a father ‘Ben’), or introduce an extra layer of locking over the entities (as most databases do). Note that if we introduce locking above the entity level, our contention benefits go away, and we may be creating de-facto coarse-grain entities.
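The Ben and Otto example can be sketched in a few lines. This is an illustration under assumed names (`Person`, `Family`, `addSon`, `setFather`, `link`, `consistent`): the two single-person updates model fine-grained entities, while `link` models a coarse-grained Family entity whose two sides change in one step.

```scala
object GranularitySketch {
  final case class Person(
    name: String,
    father: Option[String] = None,
    sons: Set[String] = Set.empty
  )
  type Family = Map[String, Person]

  // Invariant spanning two entities: every son listed on a father
  // names that father back.
  def consistent(f: Family): Boolean =
    f.values.forall(p => p.sons.forall(s => f.get(s).exists(_.father.contains(p.name))))

  // Fine-grained style: two separate single-person updates.
  def addSon(f: Family, father: String, son: String): Family =
    f.updated(father, f(father).copy(sons = f(father).sons + son))

  def setFather(f: Family, son: String, father: String): Family =
    f.updated(son, f(son).copy(father = Some(father)))

  // Coarse-grained style: both sides change in one derived version,
  // so no observer can see the half-finished state.
  def link(f: Family, father: String, son: String): Family =
    setFather(addSon(f, father, son), son, father)
}
```

If `addSon` and `setFather` are promoted as separate updates, any reader arriving between them sees the inconsistent state; promoting only the result of `link` avoids that at the price of contention on the whole Family.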

It’s interesting to note that the same trade-off between consistency and scalability shows up many times elsewhere – eBay being a nice example.

Non-Global Identity

Above, I presented Entities as having a globally unique identity. Much of the time, that’s a good way to think about them, but be aware it is a simplification. In fact, identity is inherently defined relative to some scope. That scope needn’t be global.

Here are 3 different scopes for the entity “The number of people on earth at midnight on Dec 31, 1999”:

  1. The globally true answer that an omniscient god might know
  2. My national government’s official figure
  3. My personal estimate

Scopes may nest. I’m still contemplating exactly how this affects the design of systems with mutable data.

OO/Imperative programmers: ‘Study Functional Programming or Be Ignorant’

Right now, if you want to understand the state of the art in computer programming, those are your choices as I see them.

Sorry to be so blunt … please don’t shoot the messenger.

My awakening started when I began searching for a better Java, and found Scala. Scala had these weird (in my ignorance) functional features and learned Scala people often talked about Haskell. A uni friend Bernie Pope raved about it too.

I took a look at Haskell. It wasn’t a pleasurable experience. Much of the code was almost incomprehensible. Half the concepts I’d never heard of before. It made me feel stupid, but actually what I was was ignorant.

I say ignorant, rather than innocent, because Haskell has been around for over a decade, and FP much longer.

I still struggle to read most Haskell, and I certainly can’t use it to build anything. But I am starting to get a sense of just how sophisticated it is, and a map of its concepts in my mind. I began by going down a rabbit-hole, and expected to find a burrow, a community of Haskelliers programming in their own unique way. Instead, bit by bit I’ve realised I’ve ended up in a vast cavern absolutely full of stuff I barely even knew existed.

The bit that stirs me up is that this “stuff” isn’t, repeat is not, Haskell-specific. It’s rooted in the fabric of our reality, in our mathematics and our problem domains, and bits poke up like the tips of icebergs into mainstream OO languages, their true structure part-revealed. But rarely in the OO-world have I found such carefully abstracted and subtle techniques of programming in daily use.

Ideas from Haskell & Functional Programming will continue to flow into the mainstream over the next couple of decades. Innovations will be trumpeted, trends identified, features debated, technologies evangelized.

But personally, I’m too curious (and too lazy a typist) to wait that long.