Your Objects, the Unix Way

Applying the Unix Philosophy to Object-Oriented Design

John Pignata on Nov 28, 2012

Today I’m thrilled to present a guest post from my friend John Pignata. John is Director of Engineering Operations at GroupMe and writes a great blog on Ruby software engineering that you should absolutely check out.

In 1964, Doug McIlroy, an engineer at Bell Labs, wrote an internal memo describing some of his ideas for the Multics operating system. The surviving tenth page summarizes the four items he felt were most important. The first item on the list reads:

“We should have some ways of coupling programs like [a] garden hose – screw in another segment when it becomes necessary to massage data in another way.”

This sentence describes what ultimately became the Unix pipeline: the chaining together of a set of programs such that the output of one is fed into the next as input. Every time we run a command like tail -5000 access.log | awk '{print $4}' | sort | uniq -c we’re benefiting from the legacy of McIlroy’s garden hose analogy. The pipeline enables each program to expose a small set of features and, through the interface of the standard streams, collaborate with other programs to deliver a larger unit of functionality.

It’s mind-expanding when a Unix user figures out how to snap together their system’s various small command-line programs to accomplish tasks. The pipeline renders each command-line tool more powerful as a stage in a larger operation than it could have been as a stand-alone utility. We can couple together any number of these small programs as necessary and build new tools which add specific features as we need them. We can now speak Unix in compound sentences.

Without the interface of the standard streams to allow programs to collaborate, Unix systems might have ended up with larger programs with duplicative feature sets. In Microsoft Windows, most programs tend to be their own closed universes of functionality. To get word count of a document you’re writing on a Unix system, you’d run wc -w document.md. On a system running Windows you’d likely have to boot the entire Microsoft Word application in order to get a word count of document.docx. The count functionality of Word is locked in the context of use of editing a Word document.

Just as Unix and Windows are composed of programs as units of functionality, our systems are composed of objects. When we build chunky, monolithic objects that wrap huge swaths of procedural code, we’re building our own closed universes of functionality. We’re trapping the features we’ve built within a given context of use. Our objects are obfuscating important domain concepts by hiding them as implementation details.

In coarse-grained systems, each single object fills multiple roles increasing their complexity and resistance to change. Extending an object’s functionality or swapping out an implementation for another sometimes involves major shotgun surgery. The battle against complexity is fought within the definition of every object in your system. Fine-grained systems are composed of objects that are easier to understand, to modify, and to use.

Some years after that memo, McIlroy summarized the Unix philosophy as: “write programs that do one thing and do it well. Write programs to work together.” Eric Raymond rephrased this in The Art of Unix Programming as the Rule of Modularity: “write simple parts connected by clean interfaces.” This philosophy is a powerful strategy to help us manage complexity within our systems. Like Unix, our systems should consist of many small components each of which are focused on a specific task and work with each other via their interfaces to accomplish a larger task.

Everything But the Kitchen Sink

Let’s look at an example of an object that contains several different features. The requirement represented in this code is to create an old-school web guestbook for our home page. For anyone who missed the late nineties, like its physical analog a web guestbook was a place for visitors to acknowledge a visit to a web page and to leave public comments for the maintainer.

When we start the project the requirements are straightforward: save a name, an IP address, and a comment from any visitor that fills out the form and display those contents in an index view. We scaffold up a controller, generate a migration, a new model, sprinkle web design in some ERB templates, high five, and call it a day. This is a Rails system, I know this.

Over time our requirements begin growing and we slowly start adding new features. First, in real-life operations we realize that spammers are posting to the form so we want to build a simple spam filter to reject posts containing certain words. We also realize we want some kind of rate-limiting to prevent visitors from posting more than one message per day. Finally, we want to post to Twitter when a visitor signs our guestbook because if we’re going to be anachronistic with our code example, let’s get really weird with it.

require "set"

class GuestbookEntry < ActiveRecord::Base
  SPAM_TRIGGER_KEYWORDS = %w(viagra acne adult loans xxx mortgage).to_set
  RATE_LIMIT_KEY = "guestbook"
  RATE_LIMIT_TTL = 1.day

  validate :ensure_content_not_spam, on: :create
  validate :ensure_ip_address_not_rate_limited, on: :create

  after_create :post_to_twitter
  after_create :record_ip_address

  private

  def ensure_content_not_spam
    flagged_words = SPAM_TRIGGER_KEYWORDS & Set.new(content.split)

    unless flagged_words.empty?
      errors[:content] << "Your post has been rejected."
    end
  end

  def ensure_ip_address_not_rate_limited
    if $redis.exists(RATE_LIMIT_KEY)
      errors[:base] << "Sorry, please try again later."
    end
  end

  def post_to_twitter
    client = Twitter::Client.new(Configuration.twitter_options)
    client.update("We had a visitor! #{name} said #{content.first(50)}")
  end

  def record_ip_address
    $redis.setex("#{RATE_LIMIT_KEY}:#{ip_address}", RATE_LIMIT_TTL, "1")
  end
end

These features are oversimplified but set that aside for the purposes of this example. The above code shows us a hodgepodge of entwined features. Like the Microsoft Word example of the word count feature, the features we’ve built are locked within the context of creating a GuestbookEntry.

This kind of approach has several real-world implications. For one, the tests for this object likely exercise some of these features in the context of saving a database object. We don’t need to roundtrip to our database in order to validate that our rate limiting code is working, but since we’ve hung it off an after_create callback that’s likely what we might do because that’s the interface our application is using. These tests also likely littered with unrelated details and setup due to the coupling to unrelated but neighboring behavior and data.

At a glance it’s difficult to untangle which code relates to what feature. When looking at the code we have to think about at each line to discern which of the object’s behavior that line is principally concerned with. Clear naming helps us in this example but in a system where each behavior was represented by a domain object, we’d be able to assume that a line of code related to the object’s assigned responsibility.

Lastly, it’s easy for us to glance over the fact that we have, for example, the acorn of a user content spam filter in our system because it’s a minor detail of another object. If this were its own domain concept it would be much clearer that it was a first-class role within the system.

Applying the Unix Philosophy

Let’s look at this implementation through the lens of the Rule of Modularity. The above code fails the “simple parts, clean interfaces” sniff test. In our current factoring, we can’t extend or change these features without diffusing more details about them into the GuestbookEntry object. The interface by which our model uses this behavior through internal callbacks trigged through the object’s lifecycle. There are no external interfaces to these features despite the fact that each has their own behavior and data. This object now has several reasons to change.

Let’s refactor these features by extracting this behavior to independent objects to see how these shake out as stand-alone domain concepts. First we’ll extract the code in our spam check implementation into its own object.

require "set"

class UserContentSpamChecker
  TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

  def initialize(content)
    @content = content
  end

  def spam?
    flagged_words.present?
  end

  private

  def flagged_words
    TRIGGER_KEYWORDS & @content.split
  end
end

Features like this have serious sprawl potential. When we first see the problem of abuse we’re likely we respond with the simplest thing that could work. There’s usually quite a bit of churn in this code as our combatants expose new weaknesses in our implementation. The rate of change of our spam protection strategy is inherently different than that of our GuestbookEntry persistence object. Identifying our UserContentSpamChecker as its own dedicated domain concept and establishing it as such will allow us to more easily maintain and extend this functionality independently of where it’s being used.

Next we’ll extract our rate limiting code. Some small changes are required to decouple it fully from the guestbook such as the addition of a namespace.

class UserContentRateLimiter
  DEFAULT_TTL = 1.day

  def initialize(ip_address, namespace, options = {})
    @ip_address = ip_address
    @namespace  = namespace
    @ttl        = options.fetch(:ttl, DEFAULT_TTL)
    @redis      = options.fetch(:redis, $redis)
  end

  def exceeded?
    @redis.exists?(key)
  end

  def record
    @redis.setex(key, @ttl, "1")
  end

  private

  def key
    "rate_limiter:#{@namespace}:#{@ip_address}"
  end
end

Now that we have a stand-alone domain object, more advanced requirements for this rate limiting logic will only change this one object. Our tests can exercise this feature in isolation apart from any potential consumer of its functionality. This will not only speed up tests, but help future readers of the code in reading the tests to understand the feature more quickly.

Finally we’ll extract our call to the Twitter gem. It’s tiny, but there’s good reason to keep it separate from our GuestbookEntry. Since Twitter and the gem are third-party APIs, we’d like to isolate the coupling to an adapter object that we use to hide the nitty-gritty details of sending a tweet.

class Tweeter
  def post(content)
    client.update(content)
  end

  private

  def client
    @client ||= Twitter::Client.new(Configuration.twitter_options)
  end
end

Now that we have these smaller components, we can change our GuestbookEntry object to make use of them. We’ll replace the extracted logic with calls to the objects we’ve just created.

class GuestbookEntry < ActiveRecord::Base
  validate :ensure_content_not_spam, on: :create
  validate :ensure_ip_address_not_rate_limited, on: :create

  after_create :post_to_twitter
  after_create :record_ip_address

  private

  def ensure_content_not_spam
    if UserContentSpamChecker.new(content).spam?
      errors[:content] << "Post rejected."
    end
  end

  def ensure_ip_address_not_rate_limited
    if rate_limiter.exceeded?
      errors[:base] << "Please try again later."
    end
  end

  def post_to_twitter
    Tweeter.new.post("New visitor! #{name} said #{content.first(50)}")
  end

  def record_ip_address
    rate_limiter.record
  end

  def rate_limiter
    @rate_limiter ||= UserContentRateLimiter.new(ip_address, :guestbook)
  end
end

This new version is only a couple of lines shorter than our original implementation but it knows much less about its constituent parts. Many of the details of the “how” these features are implemented have found a dedicated home in our domain with our model calling those collaborators to accomplish the larger task of creating a GuestbookEntry. These features are now independently testable and individually addressable. They are no longer locked in the context of creating a GuestbookEntry. At the meager cost of a few more files and some more code we now have simpler objects and a better set of interfaces. These objects can be changed with less risk of ripple effects and their interfaces can be called by other objects in the system.

Wrapping Up

“Good code invariably has small methods and small objects. I get lots of resistance to this idea, especially from experienced developers, but no one thing I do to systems provides as much help as breaking it into more pieces.”

– Kent Beck, Smalltalk Best Practice Patterns

The Unix philosophy illustrates that small components that work together through an interface can be extraordinarily powerful. Nesting an aspect of your domain as an implementation detail of a specific model conflates responsibilities, bloats code, makes tests less isolated and slower, and hides concepts that should be first-class in your system.

Don’t let your domain concepts be shy. Promote them to full-fledged objects to make them more understandable, isolate and speed up their tests, reduce the likelihood that changes in neighboring features will have ripple effects, and provide the concepts a place to evolve apart from the rest of the system. The only thing we know with certainty about the futures of our systems is that they will change. We can design our systems to be more amenable to inevitable change by following the Unix philosophy and building clean interfaces between small objects that have one single responsibility.