Code Review is Critical (so why don't more projects do it?)

I recently wrote about how frustrating it is as a hiring manager when code samples come in that don’t live up to best practices. This isn’t just a factor of hiring though. It always surprises me how many teams I interact with that don’t do any formal code review process at all (even automated tools)! Why does it matter? Let’s dig into that.

Why is Code Quality so Important?

This question is one that I hear a lot. Or, I hear something along the lines of “we don’t care about that here” (which, invariably, means that they don’t understand the answer to this question).

Code quality means a few things to me:

  • code is written thoughtfully to conform to industry best practices

  • code is written in both a secure and performant way

  • code is written to align as a team to a common architecture.

Code quality may mean other things to you in other contexts, but I believe these are the big ones.

At the end of the day, we want to ensure that:

  • code (and code history) can be easily read and contributed to by anyone (both experienced and new team members)

  • code is easily managed and updated

  • code can be easily and reliably tested.

Each of these areas of focus become much more challenging when code is of poor quality and/or overly complicated. It becomes even worse if it abandons industry / community best practices and delves deeply into homebrew solutions that only a small group of individuals can maintain / understand.

Example

Let’s quickly examine with a few lines of code:

/**
 * {@inheritdoc}
 */
public function getCacheableMetadata($name) {
  return new CacheableMetadata();
}

Right off the bat, there are two things I want to call attention to.

  1. the return statement is exactly two spaces indented on the line. not one “tab” character, not four space. two spaces.

  2. we are calling CacheableMetadata, not Drupal\Core\Cache\CacheableMetadata

it’s important to be conscious of the effects our actions may have on entire ecosystems
— https://www.nps.gov/articles/leave-no-trace-seven-principles.htm

In both of these cases I am adhering to Drupal best practices and common coding conventions from the Drupal community. Does it really matter if we are using 2 spaces, 4 spaces, a tab, a space, etc.? I mean, yes—if you subscribe to Drupal coding standards, yes it does. But that’s not the hill I want to die on today. Let’s for now, say, “no” it doesn’t matter what you use as long as everyone else on your team is using the same thing. If everyone on the team is doing it slightly different we will have significant churn in our git history every single time someone opens a file and changes anything.

To me, I want to follow principles similar to the “leave no trace principles” that are so common amongst hikers and campers. When I am editing a file I want to do what I need to do with the most minimal impact possible on that file. If I open a pull request that is supposed to change one line of code and the resulting pull request contains hundreds (or even thousands) of changed files? That’s really bad! One line of code changed in development should result in exactly one line of code changed in a pull request. No more, no less.

The other thing in this particular example is relying on a use statement for my return statement. I’m returning an instance of the CacheableMetadata class and doing so by relying on a use statement farther up in the file (not included in the example). While you can technically fully namespace a class here, it’s not best practices to do so. Again, you don’t want half your team doing it one way and another half doing it another.

Conforming to Best Practices

You may personally disagree with some best practices. That’s ok! Some of them annoy me to no end (I particularly hate 80 character per line when writing comments). One of the most important reasons to get over that frustration and follow them is automation. The last thing I want to do when reviewing a pull request is read code line by line and count spaces, comment length, etc. Do I see errors doing this? Yes! Do I catch them all and/or enjoy doing it? No!

Using automated code quality tools like PHPCS, GrumPHP, etc. help eliminate the need to manually review the “easy” stuff. The Drupal community even has a special project called Coder that implements common / best practices for PHPCS to make sure that you are specifically conforming to Drupal standards! USE IT ON EVERY PROJECT! If you aren’t a PHP dev there are many other frameworks with similar toolsets (e.g. Gulp has many different options that we commonly use on projects for SCSS and JS linting).

If your team is committed to following coding standards and you use automated tools (like PHPCS) to automate the testing of these standards, this is a major battle won. Code quality scanning should happen on every pull request which in turn should happen any time code changes. As the architect, I don’t even look at pull requests until code quality scans pass. I would still encourage manual code review, however.

I recently had a circumstance on a Drupal project were my automated code sniffing tools were passing. However, on closer inspection, I discovered that a developer had hard coded a HTML table template inside the build function of a PHP class. (For those non-Drupalists reading, the Twig template system is intended for exactly this sort of thing). Was this developer’s code “valid?” Sure! It was passing PHPCS. Was it “the right way to do it?” Definitely not. In this example, a dozen lines of HTML has no place in PHP. I asked the dev to do some minor refactoring (it ended up being a largely copy / paste job to dump the table into Twig and some minor refactoring to use a theme function instead of just returning the table directly).

This scenario is the best of both worlds. Automation is running in parallel with manual review. The automated tools are looking for all the common issues that can easily be found just parsing the code, the manual review is applying logic and context to more complex questions like “should this code live here” and “was this build the right way.”

Writing Secure and Performant Code

A question I hear constantly is “can we automatically scan for security vulnerabilities in code.” The short answer is “no.” Oh sure, there are things you can do like drush pm:security that will tell you if you are using a contributed project that has a known security vulnerability. But if you’re writing custom code and doing silly things like not escaping user input, hard coding api keys or passwords, opening yourself to SQL injection, etc. there isn’t a magical security fairy that is going to warn you. Obviously Drupal.org has attempted to capture many of the best practices surrounding security and performance. But at the end of the day, this requires manual review from a Drupal expert. If you’re writing Wordpress code, or Sharepoint code, or whatever code, you’ll be in a similar boat. You need someone who is highly proficient not just in the language of code being written but in the best practices for the framework that you are writing that code for. Sure, there are broad swaths of best practices for PHP applications. But if you’re writing Drupal applications following Wordpress best practices… there’s a pretty high probability you’re going to mess something up. If you’re following only PHP best practices while writing Drupal code, you are likely doing more work than you need too, since Drupal (and Symfony) both provide a lot of tools to help you write more secure and performant code. For instance, you could write a custom method to sanitize a string of text using PHP. Or you could just use the out of the box Html::escape() method provided by Drupal core.

Acquia’s PS team actually has a dedicated team for doing security and performance reviews as part of all of our custom builds. We go back and make sure that what we’ve built is performant and secure. We also do this as part of our manual / ongoing code review process. But we even assume that our architects and developers will miss something somewhere so we still do an aggressive review right before we go live (after we are feature complete) to make sure that everything is secure and performant.

Aligning Architecture

The final point here becomes more critical the larger your team is. Obviously if everyone is following the same coding standards (good) but putting the same “sorts” of things in a bunch of different places, you can still end up with a real mess on your hands! This is a task I always put on the architect (or lead developer): devise a strategy for each story as to where the code should live. In the Drupal-world, this may be a module vs. the theme. This may be the config/install directory vs. the config/optional directory. Could what the developer is building work in both places? Yep! Should it go in either place? Definitely not.

I firmly believe in my developers’ ability to figure this stuff out on their own. I also firmly believe that a group of ten intelligent people will necessarily bring different solutions to the table. So I’m explicit in my implementation details on how I want something architected. This ensures that as developers implement code they are using the tools I want and putting the solutions in the places I want them to go. It does create work for me (or the architect) but it also helps ensure that the overall code standards are much higher at the end of the build.

One of the major struggles in any shared building model is trying to make it seem as if there is a single contributor. This is readily apparent when writing articles, books, etc. If you have five authors all writing your book with sightly different tones of voice (passive vs. active), spellings (American English vs. British English), and tense (past vs. present) every time your reader changes sections it’s going to read like a totally different piece of literature. This would be unacceptable! Code quality should be much the same way. Code should read like one person wrote all of it wether there is one person or thirty contributing.

Remember: it’s really easy to look at the work a developer did, test it, read it, and go yep. This is great! Let’s merge it. But, you forgot to look at where the work was done. Maybe there was already work in the project that does 90% of what was required for this ticket. Instead of modifying 10% of the code and adding, the pull request adds entirely new code and duplicates a bunch of logic. Maybe all of our functions of a certain type live a particular place, and this one now lives somewhere else.

To me, this all falls under the category of code quality. Sure, it’s not “line to line” code quality like the spacing issue is, but it’s still a critical part of the overall quality and readability of the code.

In Conclusion

If you’re on a team that doesn’t use continuous integration, doesn’t use pull requests, and just opens feature branches for “quick testing before merging into the integration branch” this article is for you. I realize that everything I’m talking about here adds overhead to the development process. You have to pay the piper sometime. For me, I would much rather a more time consuming / costly development process than production downtime. In this case, the best results come both from automating the infrastructure and continuing to manually interject.

Photo by Andy Kelly on Unsplash