The Everything Wrong Scenario: Fixing a Broken Drupal 9 Site

I’ve spent some time over the last couple of days helping some colleagues deal with a horribly broken Drupal 9 site. This post covers the problems we faced and the actions we took to fix them (yes, PROBLEMS, not just one). We’ll also do a bit of root cause analysis!

Overview of Problems

1. The first thing we noticed was that the admin/content page in the Drupal admin UI was broken. This was obviously a concern, both because the page was broken AND because the work in progress (removing an old theme) shouldn’t have affected this page. This led to some additional analysis, and we discovered that the admin/content page was actually broken on the production site as well. This means that:

something broke the page “earlier”
it got missed in the code review / testing / continuous integration / deployment cycle
all of our sites (local / dev / stage / prod) were infected with the issue

2. The next thing we ran into was a Composer 2 incompatibility. On its own this isn’t a deal breaker, but it was a major blocker for me because I had already updated to Composer 2 locally and the project wasn’t yet compatible. To work around this, I either needed to downgrade to Composer 1 or upgrade the project to Composer 2. Upgrading is problematic in this scenario: we already have a broken site, and trying to upgrade all the things in the middle of a broken site is bad bad bad.

3. The next thing I noted (while trying to upgrade to support Composer 2) was a significant number of configuration issues related to shifting modules and Lightning. An example of this:

  Unexpected error during import with operation update for core.entity_view_display.media.instagram.embedded: The "instagram" plugin does not exist. Valid plugin IDs for Drupal\media\MediaSourceManager are: file, audio_file, video_file, oembed:video, image, oembed:instagram, twitter, video_embed_field

This is because the instagram plugin has changed to be oembed:instagram, but with a broken site I can’t execute the necessary database updates to make that change.
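A quick way to confirm which source plugin active configuration still points at is to read the media type's config with Drush. This is only a minimal sketch: the config name media.type.instagram is an assumption inferred from the bundle name in the error above, and show_media_source is a hypothetical helper, not something from the project.

```shell
# Hypothetical helper: print the source plugin ID stored for a media type.
# Assumes the media type's config object is named media.type.<bundle>.
show_media_source() {
  bundle="$1"
  drush config:get "media.type.${bundle}" source
}
```

Usage would be show_media_source instagram. If the output still shows instagram rather than oembed:instagram, the pending update has not been applied.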

Since my local was so broken, I jumped out to a cloud environment to try to reproduce the admin/content issue and work around it. The good news is that I was able to get the error!

Uncaught PHP Exception InvalidArgumentException: "A valid cache entry key is required. Use getAll() to get all table data." at /mnt/www/html/drupal4gov/docroot/core/modules/views/src/ViewsData.php line 140 request_id="v-80413364-243b-11eb-83bf-d70cc6e981a0"

4. I tried to enable the Views UI module in prod so that I could open the view and see what was up. That’s where I found my final error:

The module libraries does not exist.

Starting Repairs

So we are in a cluster here, as we have four competing problems that are all interacting to create the perfect storm. The site is broken, we are blocked from updating, and we have a database in a bad state (module gone but still enabled). Let’s start fixing this with the “easy” stuff first.

Fixing the Libraries Problem

The libraries issue is, honestly, the easiest to fix. This is a semi-common problem in Drupal: you have removed a module from the codebase (in this case, libraries) but you didn’t properly uninstall it first.

Warning: this is dangerous and should only be done in a worst case scenario. Ideally, you would properly uninstall a module to avoid this scenario entirely (which means you uninstall it, THEN remove it from the codebase: this may need to be facilitated across multiple deployments).

To fix the libraries problem, I ran this command:

drush cdel core.extension module.libraries

drush cr

This effectively removes the configuration for the libraries module directly from active configuration so it stops throwing errors.
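Given the warning above, it seems prudent to take a database dump and look at the entry before deleting it. The following is a sketch only: remove_orphaned_module and the backup file name are hypothetical, and it assumes Drush is run from inside the project.

```shell
# Hypothetical helper: back up, inspect, then remove an orphaned module entry
# from core.extension. Each step runs only if the previous one succeeded.
remove_orphaned_module() {
  module="$1"
  # Dump the database first; the file name is an arbitrary choice.
  drush sql:dump --gzip --result-file="../before-${module}-removal.sql" &&
  # Print the entry so you can see what is about to be deleted.
  drush config:get core.extension "module.${module}" &&
  # Same as the cdel command above, using the long command name.
  drush config:delete core.extension "module.${module}" &&
  drush cache:rebuild
}
```

For the case in this post, that would be remove_orphaned_module libraries.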

Fixing the Views Problem

Now that the libraries problem is fixed, I can enable views_ui and get into the admin/content view config to see what’s going on.

[Screenshot: the admin/content view configuration in Views UI]

If you look carefully, you’ll see there are FOUR broken/missing handler issues in this view. That’s a problem! The “quick” fix is to delete the broken handlers from the view’s fields, filters, and relationships. The more thorough fix is to go into docroot/core/modules/node/config/optional/views.view.content.yml, grab the “proper” version of this view config, and replace the active config with it. Once this is done, the admin/content issue should be resolved.
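Before replacing anything, it can help to see how far the active view has drifted from the version core ships. This is a rough sketch that assumes the docroot layout used in this post; diff_content_view is a hypothetical helper. Expect some noise in the output, since active configuration normally carries a uuid that the shipped file does not.

```shell
# Hypothetical helper: diff the core-provided content view against the
# active copy. $1 optionally overrides the path to the core YAML file.
diff_content_view() {
  core_copy="${1:-docroot/core/modules/node/config/optional/views.view.content.yml}"
  active_copy="$(mktemp)"
  # Write the active configuration out as YAML so it can be compared.
  drush config:get views.view.content --format=yaml > "$active_copy"
  diff -u "$core_copy" "$active_copy"
  status=$?
  rm -f "$active_copy"
  # diff exits 0 when the files match and 1 when they differ.
  return "$status"
}
```

Lines that only appear on the active side and mention missing or broken handlers are the ones worth a closer look.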

Fixing the Composer 2 Issues

I wrote extensively about updating to Composer 2. Hop over to that post to read more about that!
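If you take the downgrade route mentioned earlier while the project catches up, Composer can switch its own major version. A small sketch: switch_composer is a hypothetical wrapper, while the --1 and --2 flags to self-update are part of Composer itself.

```shell
# Hypothetical wrapper: switch the installed Composer to major version 1 or 2,
# then print the version so you can confirm the switch.
switch_composer() {
  case "$1" in
    1|2)
      composer self-update "--$1" && composer --version
      ;;
    *)
      echo "usage: switch_composer 1|2" >&2
      return 1
      ;;
  esac
}
```

For example, switch_composer 1 to work on the project as it stands, and switch_composer 2 once it is compatible.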

Resolving the Update Issues

So once you’ve gotten the site working with Composer 2 (and updated the config to remove libraries and resolve the admin/content view) it’s time to make sure that all the current code aligns. This should be done by following this procedure:

sync your (now working) database
update your dependencies with composer (locally)
run database updates (drush updb -y)
run lightning updates (if using lightning — drush update:lightning)
export config changes

Note that on the project, my issues in updates were due to a media_entity_instagram update that wasn’t being properly applied. In the end, this was just a database update / config update that needed to be applied… but of course I couldn’t do that because of the other issues going on.

Root Cause Analysis

So the root cause of this is a super common one: dependencies got updated without properly applying database updates and integrating configuration updates. The fact that the admin/content page got pushed to production while throwing a fatal error is a good indication that testing didn’t occur either.

It’s absolutely critical that any time a module, profile, or drupal/core gets updated, you run database updates, export configuration, and thoroughly test. Thoroughly testing means re-running the updates you just applied from the beginning (e.g. syncing the database, running updates, importing configuration, etc.). It should also mean some manual testing!
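One way to read “re-running the updates from the beginning” is a scripted rehearsal on a fresh copy of the database that imports the exported config instead of exporting it, then checks for drift. A sketch under assumptions: rehearse_deploy and the alias you pass in are hypothetical names.

```shell
# Hypothetical helper: replay the update on a fresh database copy.
# $1 is a Drush site alias to copy the database from (an assumption).
rehearse_deploy() {
  source_alias="$1"
  drush sql:sync "$source_alias" @self -y &&  # start from a clean database copy
  drush updb -y &&                            # apply pending database updates
  drush config:import -y &&                   # import the exported configuration
  drush cache:rebuild &&
  drush config:status                         # list any remaining config differences
}
```

If config:status still lists differences after a clean import, something in the export was missed. This does not replace clicking through pages like admin/content by hand.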

At the end of the day, this one took a few hours to sort out. The underlying problem was actually super simple to fix, but all the other cruft in the way made it much more challenging than it “should have been” to deal with.

Do everything in your power to avoid these situations! We were not far off from trying to roll back the database and the deployed tag. That is the nightmare scenario because you have to choose between content / work and a working website. Thankfully we were able to work around this issue, so that wasn’t necessary!