Architecture

Gitignore Files: The Filesystem MVP

If you’ve seen me talk about Drupal or are an avid reader of the blog, you probably know I do a lot of work with Acquia’s Build and Launch Tools (BLT). BLT recently shipped its 12th major release to support Drupal 9.x. That release re-architected a significant part of BLT, and as a result a fair amount of functionality was broken out into separate plugins. Dane Powell, BLT’s current maintainer, wrote up a great update guide that covers this in more detail if you’re interested.

As part of this re-architecting, the long-used blt-project was sunset in favor of Acquia’s drupal-recommended-project. As with any new project, it has needed a few minor tweaks, and one of those tweaks had to do with the .gitignore file. You know what? An improperly formatted .gitignore file makes life really difficult.

So in this post I wanted to talk a little bit about why the .gitignore file is your filesystem MVP.

What is a .gitignore File?

There are git commands for just about anything you could possibly want to do with a repository. That’s one of the reasons git is so popular and so powerful. But there isn’t actually a command to ignore something. If you have something you don’t want git to manage for you, you have to create a rule, and that rule lives in a special dotfile called .gitignore.
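
As a rough illustration of what such rules look like, here is a small sketch of the most common pattern forms. The file and directory names are hypothetical and chosen only to show the syntax; they are not taken from BLT or any real project.

```gitignore
# Hypothetical names, for illustration only.

# A bare name matches a file or directory at any depth in the repository.
debug.log

# A wildcard matches any file with this extension.
*.tmp

# A leading slash anchors the pattern to the directory holding this .gitignore.
/build

# A trailing slash matches directories only.
cache/

# A leading "!" re-includes a file that an earlier pattern excluded.
!important.tmp
```

Rules are read top to bottom and the last matching pattern wins, which is why the re-include line comes after the wildcard it overrides.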

For more information, check out Atlassian’s documentation (which is pretty thorough). Keep in mind that .gitignore files are a git construct and not specific to any particular git host. So GitHub, GitLab, Bitbucket, your local git, etc. all respect .gitignore files (and respect them in exactly the same way). So that’s pretty cool!

Why Should You Use a .gitignore File?
(Why is it the MVP?)

I’ve talked a bit in previous posts about critical files that should be included in your repositories. It’s just as important (if not more so) to keep the wrong files out of your git repo and the .gitignore is the last line of defense for this.

In general, I would say the files you shouldn’t commit fall into a few categories:

  1. They’re redundant

  2. They need to be regenerated as part of another process

  3. They’re actually not useful

  4. They are risky

Redundant Files

Many of the files you “could” commit in a repository are redundant because other processes already produce them. Examples include dependencies managed by dependency managers like Composer or npm. You want these tools, not git, to manage which versions of your dependencies are built and deployed, so you shouldn’t store the files in your repo! Excluding these files also cuts down your repository size significantly, often by tens or hundreds of thousands of files. This makes cloning and pushing significantly faster, and it makes managing these dependencies significantly simpler.

Common examples of this might be:

# Ignore drupal core.
docroot/core
# Ignore contrib modules. These should be created during build process.
docroot/modules/contrib
docroot/themes/contrib
docroot/profiles/contrib
docroot/libraries
drush/Commands
# Dependencies
vendor/
docroot/themes/custom/*/node_modules

They Should be Regenerated

An example of this one is files that are compiled as part of a build process, like minified JavaScript and CSS files. It’s a best practice to never commit these files, because you want to re-compile / re-minify them any time you’re doing a build or deployment to ensure that you have the most current version.

Common examples of this might be:

# Ignore custom theme build artifacts
docroot/themes/custom/*/css
docroot/themes/custom/*/styleguide
docroot/themes/custom/*/js/dist

Note that in this case you probably would commit the unminified JavaScript and the SCSS files (if you’re using SCSS). You just don’t want to commit the minified JS and compiled CSS.

Another example here might be a complete build artifact, meaning something built out with all the dependencies and all of the compiled front end code, sanitized for actual production use, etc. Again, in this case, you want to generate the build artifact without committing it into your development repository.

Not Useful Files

The next category is files that are actually not useful. Prime examples might be a .DS_Store generated by macOS or configuration files/folders generated by applications (like your IDE). Does it, strictly speaking, hurt to have these files in the repo? Probably not. But they add a lot of noise that makes your git repo larger and clutters your pull requests / git diffs. They should never be in the repo. Ever.

Common examples of this might be:

# OS X
.DS_Store
.AppleDouble
.LSOverride
# Thumbnails
._*
# Files that might appear on external disk
.Spotlight-V100
.Trashes
# Windows image file caches
Thumbs.db
ehthumbs.db
# Folder config file
Desktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/

Risky Files

The final category is less easy to prescribe but is one of the most critical. Many projects rely on API keys or other credentials that are highly sensitive. Having these in git history could lead to a compromised application / webserver / etc. It’s definitely a best practice to gitignore any files containing these secrets that might be “accidentally” committed.

An example from the Cloudflare Drupal Module:

# Ignore cloudflare CMI export file because it includes API keys.
config/default/cloudflare.settings.yml
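
Beyond module-specific config exports, a few patterns are commonly used to keep credentials out of a repository. The sketch below is illustrative only: the file names are assumptions about where a project might keep secrets, not rules taken from BLT or from any particular project.

```gitignore
# Hypothetical examples; adjust to wherever your project actually stores secrets.

# Local environment files that often hold API keys and passwords.
.env
.env.*
# Keep a committed template with placeholder values (re-include it).
!.env.example

# Per-developer Drupal settings overrides that may contain database credentials.
docroot/sites/*/settings.local.php

# Private keys and certificates.
*.pem
*.key
```

Committing a template such as .env.example gives new developers the list of required values without ever exposing the real ones.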

In Conclusion

Obviously your mileage may vary depending on what application you’re building, but the principles here should apply quite broadly to software development. Here’s a complete .gitignore example for a Drupal project. Yes, there are a lot of rules there. Also yes, many of these rules might be unnecessary (e.g. why have the $RECYCLE.BIN/ rule in there if no one on the team is developing on Windows?).

The critical thing here with a gitignore is that you want to anticipate files before they end up in your repo. It’s much better to have an overly restrictive gitignore that you can gradually loosen than to have a blank one where you are constantly realizing “oh we shouldn’t have added this.” Remember, once you commit something into git and push it, it’s incredibly challenging to get it back out again. This is particularly important to remember for those items under the “risky” category. If you push a password into a public git repository, consider it compromised.

As with many architectural (and devops) conversations, this is one that will be ongoing as your project continues! Definitely spend some time considering it up front. And if possible, rely on things like BLT to provide you with a recommended default so that you don’t have to try and think of all the things.

Photo by Giorgio Trovato on Unsplash

Related Content

Critical Tools for Development: Environment Detector

Thinking about the entire development ecosystem can be both challenging and overwhelming. It’s one of the many things that actually stop organizations and teams from following best practices and implementing a DevOps workflow. Why? It’s non-trivial to handle conditional elements in different environments.

Environmental Differences and Similarities

The most common environments on web projects are:

  • local

  • continuous integration (CI / CD)

  • development (dev)

  • testing (qa / stage / test)

  • production

Now, obviously environment names are all made up and they don’t really matter anyway. So you could have 4 environments or you could have 10. The point is, there are some things between environments that should be identical and others that should be quite different.

The challenge: knowing the difference.

For instance, if I were to write a simple hello world in PHP, it might look something like this.

<?php
print "Hello World!";

Obviously, there is nothing special or conditional about this statement. It just prints Hello World 100% of the time. Now as we think about application development… simple print statements like this one aren’t often useful. We usually want to customize them to say hello to the specific user. We want to conditionalize them to not appear unless certain conditions are met. In other words, we want to make our code as dynamic and responsive as possible!

This is the same concept for environments. There are certain aspects of your application that can (and should) behave very differently based on the environment that they are running in. Here are a few examples:

  • caching configuration - caching should be turned WAY up in production and other cloud environments (for parity) but disabled or turned way DOWN in local / CI environments for easier development. Similarly, you may want to keep JavaScript / CSS files un-minified locally (while you are developing them) but minified in production (where performance matters)

  • security configuration - due to domain restriction, SSL certificates, etc. some of your application’s security settings may need to be relaxed locally or during CI

  • API keys - many integrations (e.g. e-commerce) mirror your web application’s multi-environment setup. You absolutely do not want your development web application communicating with your production e-commerce platform!

  • database / other settings - if you’re using a virtual solution locally (like Lando or Drupal VM) you may have an entirely different containerization strategy locally than you do in the cloud. You also likely have different database names and passwords (or you should) for local and CI environments than you do in the “real” hosting environments.

So the question becomes: just as with customizing that Hello World statement, if we want to customize these scenarios, we need a way in code to differentiate between environments AND we need code that is smart enough to handle that differentiation.
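
To make the idea concrete, here is a minimal sketch of environment-conditional settings. The function name, the environment names, and the setting keys are all hypothetical and exist only for illustration; a real Drupal project would set its own configuration in its settings files.

```php
<?php

/**
 * Returns example settings for a given environment name.
 *
 * Hypothetical sketch: the keys and values are illustrative, not a real API.
 */
function settingsForEnvironment(string $env): array {
  // Defaults tuned for cloud environments (dev, test, prod) to keep parity.
  $settings = [
    'cache_max_age' => 3600,
    'aggregate_css_js' => TRUE,
    'payment_api' => 'sandbox',
  ];

  // Local and CI builds favor fast feedback over performance.
  if (in_array($env, ['local', 'ci'], TRUE)) {
    $settings['cache_max_age'] = 0;
    $settings['aggregate_css_js'] = FALSE;
  }

  // Only production may talk to the live third-party service.
  if ($env === 'prod') {
    $settings['payment_api'] = 'live';
  }

  return $settings;
}
```

Note that the function needs to be told which environment it is in; it has no way to work that out on its own.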

Environment Detectors to the rescue

Most quality hosting organizations provide some sort of environment variables. These vary from organization to organization, so you’ll want to carefully review their documentation. But they do typically exist! Acquia’s documentation is online and shows how to do it on our platform.

TL;DR: there is an AH_SITE_ENVIRONMENT environment variable, readable from PHP code, that holds the name of the environment the code is currently running in.
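
As a minimal sketch of reading that variable directly: the helper name below and the 'local' fallback are assumptions for illustration, based on the variable normally being unset outside of Acquia hosting.

```php
<?php

/**
 * Returns the current environment name, or a fallback when none is set.
 *
 * Hypothetical helper; the 'local' fallback is an assumption.
 */
function currentEnvironment(string $fallback = 'local'): string {
  // getenv() returns FALSE when the variable is not defined.
  $env = getenv('AH_SITE_ENVIRONMENT');
  return ($env === FALSE || $env === '') ? $fallback : $env;
}
```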

Acquia also provides an open source / extendable Environment Detector. If you host with Acquia? Cool! Use ours. If you don’t, also cool! Use ours as an example for your own.

Basically, the environment detector uses the environment variables to figure out where we are. Like so:

  /**
   * Is AH prod.
   *
   * @param string|null $ah_env
   *   Environment machine name.
   *
   * @return bool
   *   TRUE if prod, FALSE otherwise.
   */
  public static function isAhProdEnv($ah_env = NULL) {
    if (is_null($ah_env)) {
      $ah_env = self::getAhEnv();
    }
    // ACE prod is 'prod'; ACSF can be '01live', '02live', ...
    return $ah_env == 'prod' || preg_match('/^\d*live$/', $ah_env);
  }

Then, you can use the method in your code. So…

<?php

use Acquia\DrupalEnvironmentDetector\AcquiaDrupalEnvironmentDetector;

// Declared as a plain function so the snippet is valid outside a class.
function test() {
  if (AcquiaDrupalEnvironmentDetector::isAhProdEnv()) {
    print "Hello on Prod!";
  }
  else {
    print "Hello World";
  }
}

In this case, our Hello World could be customized to say hello from prod (if we are on prod) and Hello World everywhere else.

You can take this model even further (like BLT has), and you can also use the Environment Detector in custom code!

Getting Started

So the biggest thing to know when you get started with Environment Detection is that while you CAN and SHOULD use it in a lot of places in your DevOps and development pipelines, if you aren’t using it now it can be non-trivial to begin integrating it.

Why? Because it can change really significant portions of your workflow. Change settings files conditionally in all environments? That can (and likely will) mean that all of your environments are temporarily going to break while you iron out the details. I mean, that’s ok! Just know that before you go in (and maybe don’t assign a super junior person to do it).

So the takeaway I would suggest is that you plan your architecture and think through how you want to implement the conditional aspects of environments.

I would also recommend that you use Acquia’s Build and Launch Tools (BLT) as a model. We use environment detection everywhere in BLT and it really makes it quite powerful. Obviously, if you consistently host in “many” different hosting environments this gets more challenging. But just because it’s challenging doesn’t mean you shouldn’t do it!

My recommendation: start very small and very simple. Especially if you have to write your own environment detector. Make sure it works. Make sure it works up and down your stack. THEN start trying to write some simple code. THEN get more fancy and start integrating it into your devops and deployment workflows.
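
If you do end up writing your own detector, a starting point might look like the sketch below. The class name, the MY_HOST_ENVIRONMENT variable, and the environment names are all hypothetical; substitute whatever your hosting provider actually exposes. Each check accepts an optional environment name, mirroring the isAhProdEnv() signature shown earlier, so the logic can be verified before anything else depends on it.

```php
<?php

/**
 * Minimal custom environment detector (hypothetical sketch).
 */
final class SimpleEnvironmentDetector {

  /**
   * Returns the environment name, defaulting to 'local' when unset.
   *
   * MY_HOST_ENVIRONMENT is a placeholder for your host's real variable.
   */
  public static function getEnv(): string {
    $env = getenv('MY_HOST_ENVIRONMENT');
    return ($env === FALSE || $env === '') ? 'local' : $env;
  }

  /**
   * TRUE only on production.
   */
  public static function isProd(?string $env = NULL): bool {
    return ($env ?? self::getEnv()) === 'prod';
  }

  /**
   * TRUE on a developer machine or in CI.
   */
  public static function isLocalOrCi(?string $env = NULL): bool {
    return in_array($env ?? self::getEnv(), ['local', 'ci'], TRUE);
  }

}
```

Starting with two or three checks like these keeps the first integration small enough to confirm in every environment before you build on it.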
