Build your own Code Climate Analysis Engine

Michael Bernstein on Jul 7, 2015

Recently, we announced the release of the Code Climate Platform, which lets anyone create and deploy static analysis tools to an audience of over 50,000 developers. These Open Source static analysis tools are called “engines,” and in this post I’ll show you how to create one from scratch.

We’ll create an engine that greps through your source code and looks for instances of problematic words like FIXME, TODO, and BUG. This engine conforms to the Code Climate engine specification, and all of the code for this engine is available on GitHub. When we get up and running, you’ll see the engine’s results printed on your command line.

Engines that you create can be tested with the Code Climate command line tool. Once you’ve got an Open Source engine that works, we’d love to chat with you about making it available on our cloud platform, so that your whole community can have access to it!

For more information join our Developer Program.

What’s an engine made of?

Instead of asking you to dive into the Code Climate engine specification to learn what an engine is, I’ll give you a brief overview here.

A Code Climate engine is a containerized program which analyzes source code and prints issues in JSON to STDOUT.

Sound simple? We really think it is! Hopefully this blog post will illustrate what we mean.

More concretely, the FIXME engine we’re going to create contains three important files:

A Dockerfile which specifies the Docker image
A bin/fixme executable wrapper script that runs the engine
The index.js file, which contains the engine source code

There are other requirements in the specification regarding resource allocation, timing, and the shape of the output data (which we’ll see more of below), but that’s really all there is to it.

A little bit of setup

Before we write any code, you’ll need a few things running locally to test your engine, so you might as well get that out of the way now:

A Docker environment (we recommend Docker For Mac for OSX development)
The Code Climate CLI tool (you can brew tap codeclimate/formulae && brew install codeclimate on OSX)

Run codeclimate -v when you’re done. If it prints a version number, you should be ready to go!

FIXME

The idea for FIXME came to us when we were brainstorming new engine ideas which were both high value and easy to implement. We wanted to release a sort of Hello, world engine, but didn’t want it to be one that did something totally pointless. Thus, FIXME was born.

The FIXME engine looks for (case-insensitive, whole word) instances of the following strings in your project’s files:

TODO
FIXME
HACK
BUG
XXX

This is not a novel idea. It’s well known that instances of these phrases in your code are lurking problems, waiting to manifest themselves when you least expect it. We also felt it worth implementing because running a FIXME engine in your workflow has the following benefits:

Existing FIXMEs and hacks will be more visible to you and your team
New FIXMEs will bubble up and can even fail your pull requests if you configure them properly on codeclimate.com

Pretty nifty for around 75 lines of code.

To achieve this, the engine runs a case-insensitive grep command on all of the files you specify, and emits a Code Climate issue wherever it finds a match.

Implementing an engine in JavaScript

The meat of the actual engine is in the index.js file, which contains around 50 lines of JavaScript. The entirety of the file can be found here. I’ll highlight a few important sections of the code for the engine below, but if you have any questions, please open an issue on the GitHub repo and I’ll try my best to answer promptly!

On to the code. After requiring our dependencies and typing out the module boilerplate, we put the phrases we want to find in grep pattern format:

var fixmeStrings = "'(FIXME|TODO|HACK|XXX|BUG)'";

This will be used in a case-insensitive search against all of the files the engine will analyze.
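To make the matching rules concrete, here is a minimal sketch of the same case-insensitive, whole-word search done in plain JavaScript rather than with grep. The names fixmeWords, fixmePattern, and findMatches are hypothetical and are not part of the published engine, and a RegExp word boundary is only an approximation of grep's whole-word behavior.

```javascript
// Hypothetical helper for illustration; the real engine shells out to grep.
var fixmeWords = ["FIXME", "TODO", "HACK", "XXX", "BUG"];

// \b anchors give a whole-word match; the "i" flag makes it case-insensitive.
var fixmePattern = new RegExp("\\b(" + fixmeWords.join("|") + ")\\b", "i");

// Returns one { lineNum, match } entry for each line that contains a match.
function findMatches(text) {
  var results = [];
  text.split("\n").forEach(function (line, index) {
    var found = fixmePattern.exec(line);
    if (found) {
      results.push({ lineNum: index + 1, match: found[1] });
    }
  });
  return results;
}

var sample = "var a = 1; // todo: rename\nvar debug = true;\n// FIXME later";
console.log(findMatches(sample));
```

Note that the second sample line is not reported: the "bug" in "debug" is not a whole word.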

Next, we create a function that we will use to print issues to STDOUT according to the issue data type specification in the engine spec. The printIssue function accepts a file name, a line number, and the issue string:

var printIssue = function(fileName, lineNum, matchedString){
  var issue = {
    "type": "issue",
    "check_name": "FIXME found",
    "description": matchedString + " found",
    "categories": ["Bug Risk"],
    "location":{
      "path": fileName,
      "lines": {
        "begin": lineNum,
        "end": lineNum
      }
    }
  };

  // Issues must be followed by a null byte
  var issueString = JSON.stringify(issue)+"\0";
  console.log(issueString);
}

This data format contains information about the location, category, and description of each issue your engine emits. It’s at the heart of our engine specification and massaging data from an existing tool to conform to this format is typically straightforward.
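To show why the null byte matters, here is a rough sketch of how a consumer might split an engine's output back into individual issues. The parseIssues and makeIssue functions and the sample paths are assumptions made for this illustration; this is not the Code Climate CLI's actual code.

```javascript
// Hypothetical consumer, for illustration only; not the Code Climate CLI's code.
function parseIssues(output) {
  return output
    .split("\0") // each issue is terminated by a null byte
    .map(function (chunk) { return chunk.trim(); }) // drop the newline console.log appends
    .filter(function (chunk) { return chunk.length > 0; })
    .map(function (chunk) { return JSON.parse(chunk); });
}

// Builds one issue in the shape printIssue emits.
function makeIssue(path, lineNum, word) {
  return {
    type: "issue",
    check_name: "FIXME found",
    description: word + " found",
    categories: ["Bug Risk"],
    location: { path: path, lines: { begin: lineNum, end: lineNum } }
  };
}

// Simulated engine output: every issue is followed by "\0" and a newline.
var sampleOutput =
  JSON.stringify(makeIssue("lib/a.js", 3, "TODO")) + "\0\n" +
  JSON.stringify(makeIssue("lib/b.js", 7, "FIXME")) + "\0\n";

parseIssues(sampleOutput).forEach(function (issue) {
  console.log(issue.location.path + ":" + issue.location.lines.begin + " " + issue.description);
});
```

Because a null byte cannot appear inside serialized JSON text, it makes an unambiguous separator between issues.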

The data in the JSON your engine prints will be consumed by the CLI and if you join our Developer Program and work with us, it can also be made available to all users of codeclimate.com. We’ll work with you to ensure your engine is spec compliant and meets our security and performance standards, and get your work in front of a lot of people!

The actual code that greps each file isn’t super interesting, but you should check it out on GitHub and open an issue on the repo if you have a question.

Because it’s a requirement of engines to respect the file exclusion rules passed to it by the CLI or our cloud services, though, I’ll show a bit of how that works:

// Uses glob to traverse code directory and find files to analyze,
// excluding files passed in by CLI config
var fileWalk = function(excludePaths){
  var analysisFiles = [];
  var allFiles = glob.sync("/code/**/**", {});

  allFiles.forEach(function(file, i, a){
    if(excludePaths.indexOf(file.split("/code/")[1]) < 0) {
      if(!fs.lstatSync(file).isDirectory()){
        analysisFiles.push(file);
      }
    }
  });

  return analysisFiles;
}

Here I am using the npm glob module to iterate recursively over all of the files under /code. This location also comes from the engine specification. The fileWalk function takes an array of excludePaths, which the engine reads from /config.json (this file is made available to your engine after the CLI parses a project’s .codeclimate.yml file). This all happens in the main function of the engine, runEngine:

FixMe.prototype.runEngine = function(){
  // Check for existence of config.json, parse exclude paths if it exists
  if (fs.existsSync("/config.json")) {
    var engineConfig = JSON.parse(fs.readFileSync("/config.json"));
    var excludePaths = engineConfig.exclude_paths;
  } else {
    var excludePaths = [];
  }

  // Walk /code/ path and find files to analyze
  var analysisFiles = fileWalk(excludePaths);

  // Execute main loop and find fixmes in valid files
  analysisFiles.forEach(function(f, i, a){
    findFixmes(f);
  });
}

This main function hopefully gives you a clear picture of what this engine does:

It parses a JSON file and extracts an array of files to exclude from analysis
It passes this list of files to a function that walks all files available to the engine, and produces a list of files to be analyzed
It passes the list of analyzable files to the findFixmes function, which greps each file and prints an issue to STDOUT for every match
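The first two steps can be tried without Docker or a real file system. The sketch below uses in-memory stand-ins for /config.json and for the glob results; configJson, allFiles, and filterExcluded are hypothetical names, and the file paths are made up for the example.

```javascript
// Hypothetical stand-ins for the contents of /config.json and the glob results.
var configJson = '{"exclude_paths": ["vendor/lib.js", "test/test.js"]}';
var allFiles = [
  "/code/index.js",
  "/code/vendor/lib.js",
  "/code/test/test.js",
  "/code/bin/fixme"
];

// Same comparison fileWalk uses: strip the /code/ prefix, then look the
// relative path up in the exclude list.
function filterExcluded(files, excludePaths) {
  return files.filter(function (file) {
    var relativePath = file.split("/code/")[1];
    return excludePaths.indexOf(relativePath) < 0;
  });
}

// Fall back to an empty list when the config has no exclude_paths key.
var excludePaths = JSON.parse(configJson).exclude_paths || [];
var analysisFiles = filterExcluded(allFiles, excludePaths);

console.log(analysisFiles);
```

One consequence of this comparison is that only exact relative paths are excluded; an entry naming a directory or a wildcard pattern would not match any file here.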

Packaging it up

How engines are packaged as Docker containers is important: it has its own section of the engine specification. The Dockerfile for FIXME is pretty typical:

FROM node

MAINTAINER Michael R. Bernstein

RUN useradd -u 9000 -r -s /bin/false app

RUN npm install glob

WORKDIR /code
COPY . /usr/src/app

USER app
VOLUME /code

CMD ["/usr/src/app/bin/fixme"]

Here’s a breakdown of each line (for more information about each directive, see the official Docker documentation):

The official node Docker container is the basis for this engine container. It has node and npm installed, and generally makes our lives easier.
Declare a maintainer for the container.
Create the app user to run the command as specified.
Install packages with npm install glob so that the external dependency is available when the engine runs.
Set the WORKDIR to /code, where the source to be analyzed will be mounted.
Copy the engine code to /usr/src/app.
Use the app user that we created earlier.
Mount /code as a VOLUME per the spec.
Our engine specification says that the engine should launch and run immediately, so we use CMD to achieve this. In the case of FIXME, the executable wrapper script instantiates the engine we wrote in JavaScript above, and runs it. Check it out:

#!/usr/bin/env node

var FixMe = require('../index');
var fixMe = new FixMe();

fixMe.runEngine();

We now have all of the pieces in place. Let’s test it out.

Testing your engine locally

If you want to test the code for this engine locally, clone the codeclimate-fixme repository and follow these steps:

Build the docker image with docker build -t codeclimate/codeclimate-fixme . (You must be inside the project directory to do this)
Make sure the engine is enabled in the .codeclimate.yml file of the project you want to analyze:

  engines:
    fixme:
      enabled: true

Test the engine against the engine code itself (whoooah) with codeclimate analyze --dev

And you should see some results from test/test.js! Pretty cool, right?

Note that if you want to test modifications you are making to this engine, you should build the image with a different image name, e.g. codeclimate/codeclimate-fixme-YOURNAME. You would then add fixme-YOURNAME to your .codeclimate.yml file as well.

If you get stuck during development, invoke codeclimate console and run:

Analyze.new(['-e', 'my-engine', '--dev']).run

And you should be able to see what’s going on under the hood.

What will you build?

Hopefully seeing how straightforward an engine can be will give you lots of great ideas for engines you can implement on your own. If tools for your language don’t exist, contact us, and maybe we can help you out!

Simple ideas like FIXME have a lot of power when your entire team has access to them. Wire up the codeclimate CLI tool in your build process, push your repositories to Code Climate, and keep pursuing healthy code. We can’t wait to see what you’ll build.