Problem Driven Development

Introduction

I recently worked on building a bespoke pre-commit hook that lints new code with standardjs to enforce a unified company-wide code style. We needed this because many engineers consistently forgot to lint code to match company accepted style. In this post, I intend to walk you through how we achieved this and more importantly, how the script evolved into its final form. I know many have argued against pre-commit hooks but I’ll not dwell on their pros or cons here. We decided as a team to use pre-commit hooks to achieve this and that’s all that matters for the moment.

The pre-commit hook had to meet these requirements.

It installs transparently without a developer’s intervention.
Runs before every commit — obvious, isn’t it?
It runs only on staged files.
It ignores deleted and renamed files.
Exhaustively lint all files; do not stop on the first linting error.
Bonus: Ignore non-staged changes, i.e. if a file is partially staged (via git add -p), only lint the staged part.

Update the script to run the linting on what’s in the index not on the filesystem. this way if a user does git add . and lint later, they can’t commit until they stage the new linted changes

Setup

For ease of distribution and installation, we settled on the husky package. It makes it super easy to setup git hooks and hides all the complexity of running the hook from end users. The documentation is quite straightforward, so I won’t repeat that here. Once setup, Husky generates a hidden directory .husky and places the pre-commit hook there. Your job then is to script your desired behavior upon commit. Upon npm install, Husky will transparently install any new pre-commit hook on your behalf.

Initial Script — Running blindly on all files

The first approach that comes to mind is to simply run standardjs non-discriminately whenever the script is invoked.

#!/bin/bash
npx standard 

Iteration two — Running only on staged files

The previous script runs the linter on all files in the repo, which might be tolerable for very small codebases, but for any significantly large codebase, this will be noticeably slower than desired. A bigger problem with this approach is that it defeats the whole point of pre-commit hooks: you usually care about the new changes you’re committing, not the whole repo.

To address the shortcomings of the previous approach, I discovered I could exploit the git diff tool in a clever way. Typically, when you run git diff, you get the diffs in hunks for each file, but by applying the --name-only and --cached flags, git would return only the names of the files that have been staged for commit instead. With the changed filenames in hand, you just had to run them through standardjs, which accepts individual files too.

#!/bin/bash
stagedFiles=$(git diff --name-only --cached | grep -e "*.js")
for stagedFile in "$stagedFiles"; do
	npx standard "$stagedFile"
done

Iteration three — Ignoring Deleted/Renamed Files and some (premature?) Optimizations

The above script is quite robust, but it still has a couple of problems in that it doesn’t ignore deleted or renamed files, and it invokes a whole process just to filter out non-javascript files.

From the documentation of git diff, that was easy to fix; we just needed to add two new flags; --diff-filter=AM and -- *.js.

#!/bin/bash
stagedFiles=$(git diff --name-only --cached --diff-filter=AM -- "*.js")
for stagedFile in "$stagedFiles"; do
	npx standard "$stagedFile"
done

Iteration four — Refactoring to improve compatibility

The script now worked well, but during code review, someone who was more experienced with bash pointed out we could improve compatibility, avoid one extra variable by just piping the results to read, and also handle file names with special characters and spaces well by applying the -r flag to read. So I did that.

#!/bin/bash
git diff --name-only --cached --diff-filter=AM -- "*.js" | \
while read -r stagedFile; do
	npx standard "$stagedFile"
done

Final iteration — Attempting to support partially staged files (and giving up)

At this point, the script met almost all the requirements except the last. It’s not a common occurrence, so it wasn’t really a priority, and we could consider the script complete, but I’m a perfectionist… so I decided to handle that case too — or so I thought.

This corner case turned out to be quite challenging to overcome, but git checkout-index proved promising. This command allows one to copy the current state of all files in the index into the current working directory. It’s like saying that, hey git, don’t just give me the diffs; give me the whole file with its currently staged modification. This effectively meant that if you stage a part of a file, only that staged portion will reflect when you invoke that command; all other unstaged changes will not. This command, however, could not ignore renamed files, so I had to find a workaround. Fortunately, it took an optional file name, so all I had to do was start from git diff with the appropriate flags and pipe the individual files into git checkout-index.

#!/bin/bash
HEAD=$(git rev-parse HEAD)
# HEAD helps avoid conflicting with files from other running processes
TMP_DIR="/tmp/$HEAD/" 
git diff --name-only --cached --diff-filter=AM -- "*.js" | \
while read -r stagedFile; do
	git checkout-index -f --prefix="$TMP_DIR" "$stagedFile" 
	npx standard "$TMP_DIR/$stagedFile"
done

rm -rf "$TMP_DIR" 

Despite additional complexity, this script almost works, except it has one problem: standard-js will report the file at prefix/filename instead of the actual file name we care about when there’s an error. This could be resolved by capturing standardjs output and piping it through sed and deleting the prefix from it, but at that point, I felt like it was too much work to handle this corner case, so I relented. The final script stayed at iteration four.

Final Script

#!/bin/bash
git diff --name-only --cached --diff-filter=AM -- "*.js" | \
while read -r stagedFile; do
	npx standard "$stagedFile"
done

Parting Thoughts

For the few patient ones who read to the end, you may be wondering why this uninteresting article? Well, I wanted to put across the point that one doesn’t have to know upfront exactly how a solution to a problem should look like but instead allow to evolve as they solve it by piecemeal. Many real world code you may encounter may look neatly polished but they’re usually a product of constant evolution and improvement over a long period. Prior to writing this, I knew little about the git commands I used to solve the problem. I just had to understand the problem I was dealing with and all of its constraints. That ultimately led me to a solution, or at least close to one. Put differently, the normal way to write working code is to write code that’s not great at first and let it evolve until it meets its requirements. This means we should choose tools, frameworks, architectural design, etc. on the go and as dictated by the business problem at hand — I call this problem driven development.