Your Objects, the Unix Way
Applying the Unix Philosophy to Object-Oriented Design
Today I’m thrilled to present a guest post from my friend John Pignata. John is Director of Engineering Operations at GroupMe and writes a great blog on Ruby software engineering that you should absolutely check out.
In 1964, Doug McIlroy, an engineer at Bell Labs, wrote an internal memo describing some of his ideas for the Multics operating system. The surviving tenth page summarizes the four items he felt were most important. The first item on the list reads:
“We should have some ways of coupling programs like [a] garden hose – screw in another segment when it becomes necessary to massage data in another way.”
This sentence describes what ultimately became the Unix pipeline: the chaining together of a set of programs such that the output of one is fed into the next as input. Every time we run a command like tail -5000 access.log | awk '{print $4}' | sort | uniq -c, we’re benefiting from the legacy of McIlroy’s garden hose analogy. The pipeline enables each program to expose a small set of features and, through the interface of the standard streams, collaborate with other programs to deliver a larger unit of functionality.
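The same shape can be sketched in plain Ruby, where each stage is a small callable and composition plays the role of the pipe. This is only an illustration of the idea, assuming Ruby 2.7 or later (for Proc#>> and Enumerable#tally); the stage names and the sample log lines are made up for the example.

```ruby
# Each stage takes a list of lines and returns a new list, mirroring
# one program in the shell pipeline.
tail  = ->(n)     { ->(lines) { lines.last(n) } }                          # tail -n
field = ->(index) { ->(lines) { lines.map { |line| line.split[index] } } } # awk '{print $N}'
count = ->(values) { values.tally.sort }                                   # sort | uniq -c

# Proc#>> composes left to right, much like the shell's pipe.
# awk's $4 is 1-indexed, so it corresponds to index 3 here.
pipeline = tail.(3) >> field.(3) >> count

# Hypothetical log lines; the fourth whitespace-separated field is the verb.
log = [
  "10.0.0.1 - - GET /index",
  "10.0.0.2 - - POST /guestbook",
  "10.0.0.3 - - GET /guestbook",
  "10.0.0.4 - - GET /about",
]

result = pipeline.call(log)
# result is a sorted list of [value, count] pairs for the last three lines:
# [["GET", 2], ["POST", 1]]
```

None of the three stages knows about the others. Swapping one out, or adding a fourth, doesn’t require touching the rest.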
It’s mind-expanding when a Unix user figures out how to snap together their system’s various small command-line programs to accomplish tasks. The pipeline renders each command-line tool more powerful as a stage in a larger operation than it could have been as a stand-alone utility. We can couple together any number of these small programs as necessary and build new tools which add specific features as we need them. We can now speak Unix in compound sentences.
Without the interface of the standard streams to allow programs to collaborate, Unix systems might have ended up with larger programs with duplicative feature sets. In Microsoft Windows, most programs tend to be their own closed universes of functionality. To get a word count of a document you’re writing on a Unix system, you’d run wc -w document.md. On a system running Windows you’d likely have to boot the entire Microsoft Word application in order to get a word count of document.docx. The count functionality of Word is locked in the context of editing a Word document.
Just as Unix and Windows are composed of programs as units of functionality, our systems are composed of objects. When we build chunky, monolithic objects that wrap huge swaths of procedural code, we’re building our own closed universes of functionality. We’re trapping the features we’ve built within a given context of use. Our objects are obfuscating important domain concepts by hiding them as implementation details.
In coarse-grained systems, each object fills multiple roles, increasing its complexity and resistance to change. Extending an object’s functionality or swapping out one implementation for another sometimes involves major shotgun surgery. The battle against complexity is fought within the definition of every object in your system. Fine-grained systems are composed of objects that are easier to understand, to modify, and to use.
Some years after that memo, McIlroy summarized the Unix philosophy as: “write programs that do one thing and do it well. Write programs to work together.” Eric Raymond rephrased this in The Art of Unix Programming as the Rule of Modularity: “write simple parts connected by clean interfaces.” This philosophy is a powerful strategy to help us manage complexity within our systems. Like Unix, our systems should consist of many small components, each of which is focused on a specific task and works with the others via their interfaces to accomplish a larger task.
Everything But the Kitchen Sink
Let’s look at an example of an object that contains several different features. The requirement represented in this code is to create an old-school web guestbook for our home page. For anyone who missed the late nineties: like its physical analog, a web guestbook was a place for visitors to acknowledge a visit to a web page and to leave public comments for the maintainer.
When we start the project the requirements are straightforward: save a name, an IP address, and a comment from any visitor that fills out the form and display those contents in an index view. We scaffold up a controller, generate a migration, a new model, sprinkle web design in some ERB templates, high five, and call it a day. This is a Rails system, I know this.
Over time our requirements begin growing and we slowly start adding new features. First, in real-life operations we realize that spammers are posting to the form so we want to build a simple spam filter to reject posts containing certain words. We also realize we want some kind of rate-limiting to prevent visitors from posting more than one message per day. Finally, we want to post to Twitter when a visitor signs our guestbook because if we’re going to be anachronistic with our code example, let’s get really weird with it.
require "set"

class GuestbookEntry < ActiveRecord::Base
  SPAM_TRIGGER_KEYWORDS = %w(viagra acne adult loans xxx mortgage).to_set
  RATE_LIMIT_KEY = "guestbook"
  RATE_LIMIT_TTL = 1.day

  validate :ensure_content_not_spam, on: :create
  validate :ensure_ip_address_not_rate_limited, on: :create

  after_create :post_to_twitter
  after_create :record_ip_address

  private

  # Spam filter: reject content containing any trigger keyword.
  def ensure_content_not_spam
    flagged_words = SPAM_TRIGGER_KEYWORDS & Set.new(content.split)
    unless flagged_words.empty?
      errors.add(:content, "Your post has been rejected.")
    end
  end

  # Rate limiting: reject if this IP address has posted within the TTL.
  def ensure_ip_address_not_rate_limited
    if $redis.exists?(rate_limit_key)
      errors.add(:base, "Sorry, please try again later.")
    end
  end

  # Announce the new entry on Twitter.
  def post_to_twitter
    client = Twitter::Client.new(Configuration.twitter_options)
    client.update("We had a visitor! #{name} said #{content.first(50)}")
  end

  def record_ip_address
    $redis.setex(rate_limit_key, RATE_LIMIT_TTL, "1")
  end

  # Per-visitor key, shared by the check above and the write in record_ip_address.
  def rate_limit_key
    "#{RATE_LIMIT_KEY}:#{ip_address}"
  end
end
These features are oversimplified but set that aside for the purposes of this example. The above code shows us a hodgepodge of entwined features. Like the Microsoft Word example of the word count feature, the features we’ve built are locked within the context of creating a GuestbookEntry.
This kind of approach has several real-world implications. For one, the tests for this object likely exercise some of these features in the context of saving a database object. We don’t need to roundtrip to our database in order to validate that our rate limiting code is working, but since we’ve hung it off an after_create callback that’s likely what we’ll do because that’s the interface our application is using. These tests are also likely littered with unrelated details and setup due to the coupling to unrelated but neighboring behavior and data.
At a glance it’s difficult to untangle which code relates to which feature. When looking at the code we have to think about each line to discern which of the object’s behaviors that line is principally concerned with. Clear naming helps us in this example, but in a system where each behavior was represented by a domain object, we’d be able to assume that a line of code related to the object’s assigned responsibility.
Lastly, it’s easy for us to overlook the fact that we have, for example, the acorn of a user content spam filter in our system because it’s a minor detail of another object. If this were its own domain concept it would be much clearer that it was a first-class role within the system.
Applying the Unix Philosophy
Let’s look at this implementation through the lens of the Rule of Modularity. The above code fails the “simple parts, clean interfaces” sniff test. In our current factoring, we can’t extend or change these features without diffusing more details about them into the GuestbookEntry object. The only interface through which our model uses this behavior is a set of internal callbacks triggered through the object’s lifecycle. There are no external interfaces to these features despite the fact that each has its own behavior and data. This object now has several reasons to change.
Let’s refactor these features by extracting this behavior to independent objects to see how these shake out as stand-alone domain concepts. First we’ll extract the code in our spam check implementation into its own object.
require "set"

class UserContentSpamChecker
  TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

  def initialize(content)
    @content = content
  end

  def spam?
    flagged_words.present?
  end

  private

  def flagged_words
    TRIGGER_KEYWORDS & @content.split
  end
end
Features like this have serious sprawl potential. When we first see the problem of abuse we’re likely to respond with the simplest thing that could work. There’s usually quite a bit of churn in this code as our combatants expose new weaknesses in our implementation. The rate of change of our spam protection strategy is inherently different than that of our GuestbookEntry persistence object. Identifying our UserContentSpamChecker as its own dedicated domain concept and establishing it as such will allow us to more easily maintain and extend this functionality independently of where it’s being used.
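Because the checker is now a plain object, we can poke at it without a model or a database in sight. Here is a minimal sketch of that, with one assumption: it swaps ActiveSupport’s present? for a plain empty? check so that it runs outside Rails. The sample strings are made up.

```ruby
require "set"

class UserContentSpamChecker
  TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

  def initialize(content)
    @content = content
  end

  def spam?
    # `present?` comes from ActiveSupport; `empty?` keeps this sketch plain Ruby.
    !flagged_words.empty?
  end

  private

  def flagged_words
    TRIGGER_KEYWORDS & @content.split
  end
end

UserContentSpamChecker.new("cheap loans here").spam?  # flagged: "loans" is a keyword
UserContentSpamChecker.new("Lovely home page!").spam? # not flagged

# Exact, case-sensitive word matching is easy to slip past, which is the
# kind of weakness that drives churn in this object:
UserContentSpamChecker.new("cheap LOANS!").spam?      # not flagged
```

Tightening the matching (case, punctuation, scoring) would now be a change to this one object and its tests.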
Next we’ll extract our rate limiting code. Some small changes are required to decouple it fully from the guestbook such as the addition of a namespace.
class UserContentRateLimiter
  DEFAULT_TTL = 1.day

  def initialize(ip_address, namespace, options = {})
    @ip_address = ip_address
    @namespace = namespace
    @ttl = options.fetch(:ttl, DEFAULT_TTL)
    @redis = options.fetch(:redis, $redis)
  end

  def exceeded?
    @redis.exists?(key)
  end

  def record
    @redis.setex(key, @ttl, "1")
  end

  private

  def key
    "rate_limiter:#{@namespace}:#{@ip_address}"
  end
end
Now that we have a stand-alone domain object, more advanced requirements for this rate limiting logic will only change this one object. Our tests can exercise this feature in isolation, apart from any potential consumer of its functionality. This will not only speed up tests, but also help future readers understand the feature more quickly by reading its tests.
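As a rough sketch of what exercising it in isolation could look like, we can hand the limiter a stand-in through its :redis option. FakeRedis below is hypothetical and implements only the two calls the limiter makes; the TTL is written as plain seconds instead of 1.day so the snippet runs without ActiveSupport, and the IP address is a documentation-range placeholder.

```ruby
class UserContentRateLimiter
  DEFAULT_TTL = 86_400 # one day in seconds; stands in for ActiveSupport's 1.day

  def initialize(ip_address, namespace, options = {})
    @ip_address = ip_address
    @namespace = namespace
    @ttl = options.fetch(:ttl, DEFAULT_TTL)
    @redis = options.fetch(:redis, $redis)
  end

  def exceeded?
    @redis.exists?(key)
  end

  def record
    @redis.setex(key, @ttl, "1")
  end

  private

  def key
    "rate_limiter:#{@namespace}:#{@ip_address}"
  end
end

# Hypothetical in-memory stand-in for the Redis client.
class FakeRedis
  def initialize
    @store = {}
  end

  def exists?(key)
    @store.key?(key)
  end

  def setex(key, _ttl, value)
    @store[key] = value # expiry is ignored in this sketch
    "OK"
  end
end

redis = FakeRedis.new
guestbook = UserContentRateLimiter.new("203.0.113.7", :guestbook, redis: redis)

guestbook.exceeded? # false: nothing recorded yet
guestbook.record
guestbook.exceeded? # true: this IP has posted to the guestbook

# The namespace keeps other consumers' limits separate for the same IP.
comments = UserContentRateLimiter.new("203.0.113.7", :comments, redis: redis)
comments.exceeded?  # false
```

No database, no model, and no real Redis server are involved, which is what makes a test like this fast and easy to read.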
Finally we’ll extract our call to the Twitter gem. It’s tiny, but there’s good reason to keep it separate from our GuestbookEntry. Since Twitter and the gem are third-party APIs, we’d like to isolate the coupling to an adapter object that we use to hide the nitty-gritty details of sending a tweet.
class Tweeter
  def post(content)
    client.update(content)
  end

  private

  def client
    @client ||= Twitter::Client.new(Configuration.twitter_options)
  end
end
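One possible variation, not part of the version above, is to let the adapter accept its client. That would let a test swap in a double and confirm what gets sent without touching the network. In this sketch the optional constructor argument and the RecordingClient class are assumptions made for illustration; the fallback to Twitter::Client is unchanged and is never reached here.

```ruby
class Tweeter
  # `client` is an optional injection point added for this sketch.
  def initialize(client = nil)
    @client = client
  end

  def post(content)
    client.update(content)
  end

  private

  def client
    # Falls back to the real gem client when none is injected.
    @client ||= Twitter::Client.new(Configuration.twitter_options)
  end
end

# Hypothetical test double that remembers what it was asked to send.
class RecordingClient
  attr_reader :updates

  def initialize
    @updates = []
  end

  def update(text)
    @updates << text
    text
  end
end

recorder = RecordingClient.new
Tweeter.new(recorder).post("We had a visitor!")
recorder.updates # the one message handed to the client
```

Callers still see only post, so the gem’s API stays confined to this one object.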
Now that we have these smaller components, we can change our GuestbookEntry object to make use of them. We’ll replace the extracted logic with calls to the objects we’ve just created.
class GuestbookEntry < ActiveRecord::Base
  validate :ensure_content_not_spam, on: :create
  validate :ensure_ip_address_not_rate_limited, on: :create

  after_create :post_to_twitter
  after_create :record_ip_address

  private

  def ensure_content_not_spam
    if UserContentSpamChecker.new(content).spam?
      errors.add(:content, "Post rejected.")
    end
  end

  def ensure_ip_address_not_rate_limited
    if rate_limiter.exceeded?
      errors.add(:base, "Please try again later.")
    end
  end

  def post_to_twitter
    Tweeter.new.post("New visitor! #{name} said #{content.first(50)}")
  end

  def record_ip_address
    rate_limiter.record
  end

  # Memoized so the validation and the after_create callback share one limiter.
  def rate_limiter
    @rate_limiter ||= UserContentRateLimiter.new(ip_address, :guestbook)
  end
end
This new version is only a couple of lines shorter than our original implementation but it knows much less about its constituent parts. Many of the details of how these features are implemented have found a dedicated home in our domain, with our model calling those collaborators to accomplish the larger task of creating a GuestbookEntry. These features are now independently testable and individually addressable. They are no longer locked in the context of creating a GuestbookEntry. At the meager cost of a few more files and some more code we now have simpler objects and a better set of interfaces. These objects can be changed with less risk of ripple effects and their interfaces can be called by other objects in the system.
Wrapping Up
“Good code invariably has small methods and small objects. I get lots of resistance to this idea, especially from experienced developers, but no one thing I do to systems provides as much help as breaking it into more pieces.”
– Kent Beck, Smalltalk Best Practice Patterns
The Unix philosophy illustrates that small components that work together through an interface can be extraordinarily powerful. Nesting an aspect of your domain as an implementation detail of a specific model conflates responsibilities, bloats code, makes tests less isolated and slower, and hides concepts that should be first-class in your system.
Don’t let your domain concepts be shy. Promote them to full-fledged objects to make them more understandable, isolate and speed up their tests, reduce the likelihood that changes in neighboring features will have ripple effects, and provide the concepts a place to evolve apart from the rest of the system. The only thing we know with certainty about the futures of our systems is that they will change. We can design our systems to be more amenable to inevitable change by following the Unix philosophy and building clean interfaces between small objects that have one single responsibility.