Overlooking Security Vulnerabilities: The Danger of HTML

I was recently working on a security audit and spent a fair amount of time flagging issues I found in a Drupal 9 site. Would you believe that some of the most serious vulnerabilities I found were actually HTML vulnerabilities?

HTML is a language that is probably older than some of you reading this (it was released in 1993), but it still represents the backbone of any website. Even if you have nifty PHP, JavaScript (like React, Angular, Vue, etc.), or whatever else actually running your platform, it still renders down to plain HTML.

The result is that people easily forget about the really incredibly stupid things you can do with HTML. Or the incredibly dangerous things you can do with HTML. So while you probably don’t have to worry about a SQL injection vector from HTML code, that doesn’t mean there isn’t a danger.

Let’s hop in the Wayback Machine and talk about some of the many ways HTML can bite you, even if you’re using a modern web content management system (like Drupal or WordPress).

Dangerous HTML Tags

First and foremost, let’s get this out of the way: HTML itself isn’t all that dangerous. What is dangerous is what you can do with HTML.

Some of the most critical tags include those that bring in or embed content from other sources. These are:

  • embed

  • iframe

  • video

  • img

  • script

  • style

  • etc.
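To make that list a little more concrete, here is a minimal sketch of how you might scan a chunk of HTML for these tags. It uses only Python’s standard library. The names `RISKY_TAGS`, `RiskyTagFinder`, and `find_risky_tags` are mine, invented for illustration; they are not part of Drupal, WordPress, or any audit tool.

```python
from html.parser import HTMLParser

# Tags named in the list above; extend this set for your own audit.
RISKY_TAGS = {"embed", "iframe", "video", "img", "script", "style"}


class RiskyTagFinder(HTMLParser):
    """Collects (tag, src) pairs for tags that can pull in or run outside content."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IFRAME> and <iframe> both match.
        if tag in RISKY_TAGS:
            # "src" is where most of these tags load remote content from;
            # it is None for inline <script> or <style> blocks.
            self.found.append((tag, dict(attrs).get("src")))


def find_risky_tags(html):
    """Return every risky tag found in the given HTML fragment."""
    finder = RiskyTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Finding one of these tags doesn’t mean something is wrong. It just tells you where to look, and the `src` value tells you who you are trusting.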

Now, you might be thinking, “But Mike, we always use JavaScript on websites. We embed videos and the like all the time!” And yes, that is totally legit. Let’s add a bit more context here.

When there’s a script tag on a website (hey, like this one), the visitor to the site doesn’t get a great deal of control over which scripts run (unless they happen to have a script-blocking extension installed, say, an ad blocker or the like). When those scripts (or whatever) run, the end user doesn’t necessarily “know” all the things that are happening. For instance, did you know I’m running Google Analytics on this page, right now? Of course, Google Analytics is pretty harmless; it anonymizes your data, etc. But if you haven’t read my privacy policy (or looked at the source code of this nifty Squarespace site), you may not have realized it was there.

Let’s expand that out a bit farther. Let’s imagine that someone compromises your website. Obviously they are going to do that with something other than HTML. BUT once they are inside your website, they could use HTML to

How Does That Impact Drupal / WordPress?

Remember, a content management system is designed to manage content. Anytime you start running things “outside” the CMS (on other servers, etc.) the CMS loses control over what it is serving up.

In a typical world for Drupal / WordPress, if you want custom JavaScript or HTML to execute, you have to build and test it and then deploy it as part of the application. Usually it’s part of the templating system in the CMS. But when your WYSIWYG editor is wide open and you can just throw those tags directly into the body, you’re actually exposing yourself to potential security vulnerabilities!

One of the most common scenarios I’ve run across is this:

  • create a new page

  • content gets partially typed up

  • content gets partially copied and pasted in from another source

This last piece is the gotcha. If the WYSIWYG isn’t configured to strip out HTML, you may be copying and pasting in tags you didn’t even know were there. And if you accidentally copy in JavaScript that starts executing on your site, then you have rogue code running that you not only don’t control but may not even know is there!
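Here is a rough sketch of what “stripping out HTML” on paste can look like, again in plain Python. The allowlist in `ALLOWED_TAGS` and the names `PasteCleaner` and `clean_pasted_html` are assumptions I made up for the example. This is not how Drupal’s or WordPress’s text filters are implemented, and on a real site you should lean on the CMS’s own filtering or a maintained sanitizer library instead of rolling your own.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist of formatting-only tags; adjust for your own editor.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "blockquote", "h2", "h3"}
# Tags whose inner text is dropped entirely, not just unwrapped.
DROP_CONTENT_TAGS = {"script", "style"}
# Void elements never get a closing tag.
VOID_TAGS = {"br"}


class PasteCleaner(HTMLParser):
    """Rebuilds pasted HTML, keeping only allowlisted tags and plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Attributes are dropped on purpose: no onclick, style, or src survives.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so stray "<" or "&" cannot turn back into markup.
            self.out.append(escape(data))


def clean_pasted_html(html):
    """Return the fragment with everything outside the allowlist removed."""
    cleaner = PasteCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

The design choice that matters is the allowlist. Instead of trying to name every dangerous tag, you name the handful of safe ones and throw away the rest, so the iframe you didn’t know you copied never makes it into the body.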

The general takeaway (particularly in Drupal) is that if you need to integrate with an outside webpage / script that you do not directly control, you need to very tightly restrict who can add these tags / scripts. If “just anyone” can drop an iframe or embed tag onto a page, then that can represent a really significant security vulnerability.

Let’s imagine you have to have an iframe. Let’s talk about best-case scenarios!

  1. Don’t implement it via the WYSIWYG. Build out a block or some other content entity that lets you put in the pertinent details (usually the URL) and then transform that URL into an iframe using other mechanisms. TL;DR: the WYSIWYG never accepts the iframe (but you still get an iframe).

  2. Use something like the Security Kit (seckit) module to restrict where scripts and/or embeds/iframes can load from. This helps make sure that if you are using these types of HTML tags, you control where they can communicate to.

  3. Heavily restrict which types of users can place these types of tags / content.

  4. Rely on services that provide their own embed features (like YouTube!).
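The first two items can be sketched together. The idea is that an editor only ever enters a URL, the code decides whether that URL is acceptable, and the markup gets generated for them. The function names and the hosts in `ALLOWED_EMBED_HOSTS` are hypothetical, and this is plain Python rather than an actual Drupal block or the Security Kit module’s configuration. The second function builds a Content-Security-Policy header value using the standard `frame-src` and `script-src` directives, which is the kind of restriction the second item is describing.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical allowlist; on a real site this would live in configuration.
ALLOWED_EMBED_HOSTS = {"www.youtube.com", "player.vimeo.com"}


def build_iframe(url, title):
    """Return iframe markup for an approved URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        # Rejects javascript:, data:, and plain http: URLs.
        raise ValueError("embed URLs must use https")
    if parsed.hostname not in ALLOWED_EMBED_HOSTS:
        raise ValueError(f"host not allowed: {parsed.hostname}")
    # Escape both values so a quote in the input cannot break out of the attribute.
    return (
        f'<iframe src="{escape(url, quote=True)}" '
        f'title="{escape(title, quote=True)}" loading="lazy"></iframe>'
    )


def content_security_policy():
    """Build a CSP header value that limits frames to the same allowlist."""
    sources = " ".join(f"https://{host}" for host in sorted(ALLOWED_EMBED_HOSTS))
    return f"frame-src {sources}; script-src 'self'"
```

With this in place, `build_iframe("https://www.youtube.com/embed/abc123", "Demo")` gives you markup, while a `javascript:` URL or an unknown host raises an error before anything reaches the page. The header is the backstop: even if a stray iframe sneaks in some other way, the browser refuses to load it from a host that isn’t on the list.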

In Conclusion

At the end of the day, HTML is such an old technology that it’s quite easy to overlook how dangerous it can be. And yes, I realize that the real culprit here is almost always going to be JavaScript. You’re not wrong if you’re thinking that. JavaScript is dangerous! BUT in this case, if JavaScript gets in the front door to your metaphorical house because you hung the keys on the front stoop (or worse, just left the door hanging open altogether), then it’s not just JavaScript’s fault.

Make sure you have smart, sensible restrictions in place. Just because a WYSIWYG editor CAN be Dreamweaver 2021 and use “any HTML under the sun” doesn’t mean it should be. Heavily restrict HTML tags to styling tags only and leave all the scripting and layout tags to other, more sensible tools.

Photo by Candace McDaniel from StockSnap
