Most software processes can be distilled down to a few discrete steps:
- Gather data
- Aggregate to something manageable
- Examine the data and decide what actions must be taken
- Execute the operations
These steps, or phases, are true for both the smallest and largest processes. A process here is defined as “a series of actions or steps taken in order to achieve a particular end,” and this description holds true for both small pure functions and large business workflows.
More often than not, people don’t actually reflect on these issues and just interleave all the steps. There are, however, good reasons why you might want to extract these to distinct phases, or at least identify them and make use of their properties.
Before we go into detail on the steps, let’s look at how a typical function might be constructed and identify where the boundaries between the steps lie.
void sendVotingReminder(int userId, string whenCannotVote) {
    // userId and whenCannotVote are part of the gather phase, but we also need some more data
    var user = getUser(userId);
    // We might also need to calculate some data
    var name = user.firstName + " " + user.lastName;
    // Then we need to decide what to do
    string msg;
    if (user.age < 18) { // assumes the user has an age field
        // We use null as a "don't send anything" message
        msg = null;
    } else if (!user.canVote) {
        msg = "Hi " + name + " " + whenCannotVote;
    } else {
        msg = "Hi " + name + ". Remember to vote!";
    }
    // Now that decisions have been made, we need to take action
    if (msg == null) {
        return;
    } else {
        sendMail(msg);
    }
}
I wrote the function that way to make each step easily distinguishable, although people tend to shorten functions to avoid redundancy and would rather write something like the following.
void sendVotingReminder(int userId, string whenCannotVote) {
    var user = getUser(userId);
    if (user.age >= 18) { // assumes the user has an age field
        sendMail("Hi " + user.firstName + " " + user.lastName +
            (user.canVote ? ". Remember to vote!" : " " + whenCannotVote));
    }
}
While it’s a lot terser, the different steps are interleaved, making it more difficult to extract just a portion of the process, identify the business rules, and so on. As the process becomes more complex, this often results in code that is difficult to understand.
Now that we’ve looked at a concrete example, it’s time to describe the steps in more detail.
Gather/collect data: The process of retrieving all the data you need in order to decide what to do. These values can come from function parameters, or they can be fetched from external sources such as databases and remote APIs. For many processes, it can even be a combination of data from several sources.
Aggregate/calculate data: When we have all the data we need, we massage it to make it more manageable by combining datasets, filtering, converting, cleaning, calculating and so on. We do this to make the next step easier to read, write and reason about.
Decide: Once data is in a format that’s easy to process, we can look at it and decide what to do. These are the business rules, e.g. “if this then that”. This is the part that actually matters, while the other steps are just the necessary cruft that makes the decision possible and puts it into effect.
Act/execute: Given that we have decided what to do, we need to actually perform the operation. This typically involves writing to databases, sending emails, calling external APIs and so on.
The steps described above should come as no surprise, and most experienced developers will probably go “well, duh!”. Many will probably also state that it’s not that simple in the real world as we need to do error handling, performance optimization and so on, and I totally agree. The point of this post is to reflect on the distinct phases in a process and their properties to help us develop, refactor and test such processes.
We could describe any process with the following function:
let makeProcess (gather : 'a -> 'b) (aggregate : 'b -> 'c) (decide : 'c -> 'd) (act : 'd -> 'e) : 'a -> 'e =
    gather >> aggregate >> decide >> act
: val makeProcess:
:   gather: ('a -> 'b) ->
:     aggregate: ('b -> 'c) ->
:     decide: ('c -> 'd) -> act: ('d -> 'e) -> ('a -> 'e)
While most type systems don’t allow us to describe many properties of these functions, we can discuss them in prose.
gather
: Often a mix of pure and impure; things that are already fetched (like parameters) are pure, while things we need to fetch from other sources are impure. In general, we’ll say that this step is impure. On the other hand, it will never mutate any data, only read data. You should be able to call this function with all possible parameters, ignoring all results, and the world will still look exactly the same.
aggregate
: Combines values into a more actionable format. Given the same data, it will always return the same result. The step is pure and can, for instance, be memoized. This is why I like to think of gather and aggregate as distinct phases. Pure functions are easy to reason about and test, so the more you’re able to encode as pure functions, the better.
decide
: Only looks at the aggregated data and never writes any data, and is thus a pure function. This is also where most domain logic/business rules reside. Reducing this to a single pure step makes it trivially testable. Nothing has to be mocked, and the core of the domain becomes very understandable. As this is the main part that is of interest to the outside, keeping it pure, separate, tested and documented is great for communicating with users of the system.
act
: Performs the decided operations and is definitely not pure. This is the only part of the process which mutates data. It will only use data which is added by the prior decision step, and it will execute the effect.
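Because aggregate is pure, caching its results is safe. The following is a minimal sketch of that idea; memoize, fullName and the call counter are hypothetical helpers made up for illustration, not part of the process code in this post.

```fsharp
open System.Collections.Generic

// Hypothetical helper: remembers the result of a pure function per argument.
let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue x with
        | true, value -> value
        | _ ->
            let value = f x
            cache.[x] <- value
            value

// Counter used only to show how often the real function runs.
let calls = ref 0

// A tiny pure "aggregate" step: same input, same output.
let fullName (firstName : string, lastName : string) =
    calls.Value <- calls.Value + 1
    firstName + " " + lastName

let cachedFullName = memoize fullName

printfn "%s" (cachedFullName ("Ada", "Lovelace"))
printfn "%s" (cachedFullName ("Ada", "Lovelace"))
printfn "fullName ran %d time(s)" calls.Value
```

The same trick would be unsafe for gather (the world may have changed between calls) and for act (the effect would be silently skipped the second time).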
To summarize:
gather
: queries the world, no effects on the world
aggregate
: pure – no contact with the world
decide
: pure – no contact with the world
act
: doesn’t query the world, only executes the decided effects and nothing beyond them
Testing gather requires us to mock the sources it fetches data from. But it might be easier to test gather >> aggregate rather than gather alone, and that’s fine – testing aggregate alone doesn’t always give much benefit. Similarly, testing act requires us to mock the sources which are mutated. Testing decide is “trivial” as it doesn’t read or write to the outside world.
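To make the “trivial” part concrete, here is a small sketch of testing a decide step with nothing but value comparisons. The User and Action types and the message texts are simplified assumptions for this illustration.

```fsharp
// Simplified, hypothetical types for this sketch.
type User = { age : int; canVote : bool }

type Action =
    | NoAction
    | SendMail of message : string

// Pure: the result depends only on the arguments.
let decide (fullname : string, user : User) : Action =
    if user.age < 18 then NoAction
    elif not user.canVote then SendMail (sprintf "Hi %s. You cannot vote." fullname)
    else SendMail (sprintf "Hi %s. Remember to vote!" fullname)

// No mocks needed: build the input, call the function, compare the output.
// Union cases get structural equality, so (=) is enough.
let tooYoung = decide ("Ann Lee", { age = 17; canVote = false })
let reminder = decide ("Ann Lee", { age = 30; canVote = true })

printfn "%b" (tooYoung = NoAction)
printfn "%b" (reminder = SendMail "Hi Ann Lee. Remember to vote!")
```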
Since aggregate and decide are pure functions, you might just have one function which does both, or you might not have either of them at all. A function which doesn’t do anything and just returns the value passed into it is called the identity function. We can use this to “skip” steps where we don’t need to look at or change the data.
You can run gather >> aggregate >> decide until hell freezes over, and you won’t have had any effect on the world! This is a really nice property.
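One way to picture this property is a “dry run”: compute the decisions without executing them. The sketch below uses made-up steps and an in-memory outbox as a stand-in for the outside world; all names are assumptions for illustration.

```fsharp
type Action =
    | DoNothing
    | SendMail of message : string

// Stand-in for the outside world: mails that have been "sent".
let outbox = ResizeArray<string>()

// Would normally read from a database; here it just fabricates a name.
let gather (userId : int) = (userId, sprintf "user %d" userId)

let aggregate (userId : int, name : string) = (userId, name.ToUpper())

let decide (userId : int, name : string) =
    if userId % 2 = 0 then SendMail (sprintf "Hi %s" name) else DoNothing

// The only step that touches the outbox.
let act (action : Action) =
    match action with
    | DoNothing -> ()
    | SendMail message -> outbox.Add message

// Everything up to and including decide.
let plan = gather >> aggregate >> decide

// Planning can be repeated freely; the outbox stays empty.
let planned = [ 1 .. 4 ] |> List.map plan
let sentAfterPlanning = outbox.Count

// Only acting changes the world.
planned |> List.iter act
let sentAfterActing = outbox.Count

printfn "%d mails after planning, %d after acting" sentAfterPlanning sentAfterActing
```

Splitting the pipeline like this also makes it possible to log or review the planned actions before anything is executed.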
Let’s look at some silly examples to show that our makeProcess is able to describe regular functions.
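// We decide that + should be performed
let myadd = makeProcess id id (+) id
myadd 1 2 // Returns 3
// The caller supplies the act step (written with an explicit parameter so it stays generic)
let perform f = makeProcess id id id f
let add a b = a + b
perform add 1 2 // Returns 3
// Summing a list is the act step
let mysum = makeProcess id id id (List.fold (+) 0)
mysum [1 .. 3] // Returns 6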
let const' x = makeProcess (fun _ -> x) id id id
let const'' x = makeProcess id (fun _ -> x) id id
let const''' x = makeProcess id id (fun _ -> x) id
let const'''' x = makeProcess id id id (fun _ -> x)
: val const': x: 'a -> ('b -> 'a)
: val const'': x: 'a -> ('b -> 'a)
: val const''': x: 'a -> ('b -> 'a)
: val const'''': x: 'a -> ('b -> 'a)
Let’s look at how we could split these steps out of sendVotingReminder, but first we need to convert it to F#.
let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId
    // aggregate
    let name = user.firstName + " " + user.lastName
    // decide
    let msg =
        if user.age < 18 // assumes the user has an age field
        then null
        else if not user.canVote
        then sprintf "Hi %s %s" name whenCannotVote
        else sprintf "Hi %s. Remember to vote!" name
    // act
    if isNull msg
    then ()
    else sendMail msg
Encoding the possible decisions as a closed set is good both for documentation and robustness.
type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string
Remember that act shouldn’t query the outside world, so the only information it has available is what is available in Action. We could drop NoActionBecauseUserTooYoung by using an Option if we need 0 or 1 action, or support 0 to many actions by returning a list of actions.
Sometimes it makes sense to let aggregate return more information to decide, like the fact that a user is too young. But having a “no-op” case is often a very useful feature (like the NullObject pattern in OOP, the identity function or empty for Monoid), so we’ll leave it in.
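For comparison, here is a rough sketch of the “0 to many actions” alternative, where decide returns a list and the empty list plays the role of the no-op case. The trimmed-down User type and the sendMail parameter on act are assumptions made for this illustration only.

```fsharp
// Trimmed-down, hypothetical user type for this sketch.
type User = { firstName : string; age : int; canVote : bool }

type Action =
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// An empty list means "nothing to do", so no dedicated no-op case is needed.
let decide (user : User, whenCannotVote : string) : Action list =
    if user.age < 18 then []
    elif not user.canVote then
        [ SendCannotVoteMessage (sprintf "Hi %s %s" user.firstName whenCannotVote) ]
    else
        [ SendReminder (sprintf "Hi %s. Remember to vote!" user.firstName) ]

// act receives the mail function as a parameter and runs every decided action.
let act (sendMail : string -> unit) (actions : Action list) : unit =
    for action in actions do
        match action with
        | SendCannotVoteMessage message
        | SendReminder message -> sendMail message

// Usage with an in-memory "mail server".
let sent = ResizeArray<string>()
decide ({ firstName = "Ann"; age = 16; canVote = false }, "Sorry!") |> act (fun m -> sent.Add m)
decide ({ firstName = "Bob"; age = 40; canVote = true }, "Sorry!") |> act (fun m -> sent.Add m)
printfn "%A" (List.ofSeq sent)
```

The trade-off is that the reason for doing nothing (the user being too young) is no longer visible in the result, which is part of why the post keeps the explicit case.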
let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId
    // aggregate
    let name = user.firstName + " " + user.lastName
    // decide
    let action =
        if user.age < 18 // assumes the user has an age field
        then NoActionBecauseUserTooYoung
        else if not user.canVote
        then SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote)
        else SendReminder (sprintf "Hi %s. Remember to vote!" name)
    // act
    match action with
    | NoActionBecauseUserTooYoung ->
        ()
    | SendCannotVoteMessage message ->
        sendMail message
    | SendReminder message ->
        sendMail message
We can start by creating inner functions for the parts we wish to extract:
type Gathered =
    { user : User
      whenCannotVote : string
    }
type Aggregated =
    { user : User
      whenCannotVote : string
      fullname : string
    }
let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let gather (userId : int) (whenCannotVote : string) : Gathered =
        // Gathered and Aggregated share field names, so we qualify the first field
        { Gathered.user = getUser userId; whenCannotVote = whenCannotVote }
    // aggregate
    let aggregate (gathered : Gathered) : Aggregated =
        let name = gathered.user.firstName + " " + gathered.user.lastName
        { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }
    // decide
    let decide (aggregated : Aggregated) : Action =
        if aggregated.user.age < 18 // assumes the user has an age field
        then NoActionBecauseUserTooYoung
        else if not aggregated.user.canVote
        then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
        else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)
    // act
    let act (action : Action) : unit =
        match action with
        | NoActionBecauseUserTooYoung ->
            ()
        | SendCannotVoteMessage message ->
            sendMail message
        | SendReminder message ->
            sendMail message
    gather userId whenCannotVote
    |> aggregate
    |> decide
    |> act
This is still the same function, and we can now reduce it to just its parts. Note that gather now takes its two inputs as a single tuple, so that it can be composed with the other steps using >>:
type Gathered =
    { user : User
      whenCannotVote : string
    }
type Aggregated =
    { user : User
      whenCannotVote : string
      fullname : string
    }
type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string
let gather (userId : int, whenCannotVote : string) : Gathered =
    // Gathered and Aggregated share field names, so we qualify the first field
    { Gathered.user = getUser userId; whenCannotVote = whenCannotVote }
let aggregate (gathered : Gathered) : Aggregated =
    let name = gathered.user.firstName + " " + gathered.user.lastName
    { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }
let decide (aggregated : Aggregated) : Action =
    if aggregated.user.age < 18 // assumes the user has an age field
    then NoActionBecauseUserTooYoung
    else if not aggregated.user.canVote
    then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
    else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)
let act (action : Action) : unit =
    match action with
    | NoActionBecauseUserTooYoung ->
        ()
    | SendCannotVoteMessage message ->
        sendMail message
    | SendReminder message ->
        sendMail message
// sendVotingReminder : int * string -> unit
let sendVotingReminder = makeProcess gather aggregate decide act
Just looking at the types, we can pretty much guess what’s going on. It’s pretty easy to describe decide to business users, and pretty easy to test it in isolation. It’s actually pretty easy to test each part in isolation as necessary, provided the impure steps accept functions for communicating with their dependencies.
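As a rough sketch of what “accepting functions for communicating with dependencies” could look like, the version below passes getUser into gather and sendMail into act. The simplified types, the stub and the in-memory outbox are assumptions for illustration; the aggregate step is left out to keep it short.

```fsharp
// Simplified, hypothetical types for this sketch.
type User = { firstName : string; lastName : string; age : int; canVote : bool }

type Action =
    | NoAction
    | SendMail of message : string

// Impure step: the data source is passed in as a function.
let gather (getUser : int -> User) (userId : int) : User =
    getUser userId

// Pure step: needs no dependencies at all.
let decide (user : User) : Action =
    if user.age < 18 || not user.canVote then NoAction
    else SendMail (sprintf "Hi %s %s. Remember to vote!" user.firstName user.lastName)

// Impure step: the effect is passed in as a function.
let act (sendMail : string -> unit) (action : Action) : unit =
    match action with
    | NoAction -> ()
    | SendMail message -> sendMail message

// Test doubles: a canned user source and an in-memory outbox.
let stubGetUser (userId : int) : User =
    { firstName = "Test"; lastName = sprintf "User%d" userId; age = 20 + userId; canVote = true }

let sent = ResizeArray<string>()

let sendVotingReminder =
    gather stubGetUser >> decide >> act (fun message -> sent.Add message)

sendVotingReminder 1
printfn "%A" (List.ofSeq sent)
```

In production the same steps would be wired up with the real database lookup and mail client instead of the stub and the outbox.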
We’ll look at a final example with just the end result. We create an API which returns dummy data for our example.
open System // needed for DateTime
type User =
    { userId: int
      firstName: string
      lastName: string
    }
type Profile =
    { address: string
    }
type Post =
    { userId: int
      published : DateTime
    }
let getUser (userId : int) : User =
    { userId = userId
      firstName = sprintf "firstname %d" userId
      lastName = sprintf "lastname %d" userId
    }
let getProfile (userId : int) : Profile =
    { address = sprintf "address for %d" userId }
// Dummy data: a single post, written today by user 1
let getPosts () : Post list =
    [
        { userId = 1
          published = DateTime.Today
        }
    ]
Now we’re ready to build our process, and the first step is to gather all the data needed.
let gather (userId : int) =
    let user = getUser userId
    let profile = getProfile userId
    let posts = getPosts ()
    (user, profile, posts)
After all data is gathered, we need to process it. It is often useful to create a new structure to hold our information. This is pure, so given the same arguments, it will always return the same result, and it will never have any effects on the outside world.
type TodayDigestInfo = { userId: int; fullname: string; address: string; numBlogsToday: int }
let aggregate ((user, profile, blogs) : (User * Profile * Post list)) =
    { userId = user.userId
      fullname = sprintf "%s, %s" user.lastName user.firstName
      address = profile.address.ToUpper()
      numBlogsToday = blogs |> Seq.filter (fun b -> b.userId = user.userId && b.published.Date = DateTime.Today) |> Seq.length
    }
When we have our data, we’re ready to make decisions about what to do. Making the decision, the important business logic, is pure, and all possible outcomes are typed in the result of the function.
type Action =
    | SendCongratulationCard of name : string * address: string * message : string
    | ShameUser of userId : int * why : string
let dailyDigest (info : TodayDigestInfo) : Action =
    if info.numBlogsToday = 0
    then ShameUser (info.userId, "Booo. You didn't write any posts!")
    else SendCongratulationCard (info.fullname, info.address, (sprintf "You wrote %d posts" info.numBlogsToday))
Pure functions don’t actually “do” anything, so given our decisions, we need to modify the world. Everything we need to execute the decision should be stored in the data passed to our execute function from the decision.
let executeAction (action : Action) =
    match action with
    | SendCongratulationCard (name, address, message) ->
        sprintf "UPS.sendCard %A %A %A" name address message
    | ShameUser (userId, why) ->
        sprintf "Shaming %A -- %A" userId why
And finally, we’ll create our process. Our process will then have the type int -> string: it takes a userId and returns a string describing the result of the action.
let sendDailyDigest = makeProcess gather aggregate dailyDigest executeAction
Let’s test our code:
sendDailyDigest 1
: val it: string =
:   "UPS.sendCard "lastname 1, firstname 1" "ADDRESS FOR 1" "You wrote 1 posts""
sendDailyDigest 2 // Returns "Shaming 2 -- "Booo. You didn't write any posts!""
All this might look like complete overkill, and in many cases it is. But recognizing that processes, from the smallest + function to the largest business processes, all share the same general steps with the same properties is powerful knowledge. It makes it easier to extract parts that can be reused by other processes, parts that should be tested more thoroughly and so on.
In many cases you only want to extract a single part for some reason, like the business logic. The important thing to remember is that these are common boundaries that are often quite natural to extract, and that extracting them often yields benefits as processes become more complex. Just having these distinct blocks in functions can be beneficial, as it makes the code easier to reason about and reduces spaghetti code.