HTML Entities

The year is 2003 and you are building a website to teach others how to make websites. Internet Explorer (IE) is the dominant browser, holding about 95% of the browser market, so teaching others to use Microsoft's marquee element seems like a good idea. You code up this example for your users:
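
A minimal sketch of what such an example might look like. The scrolling text is a hypothetical placeholder, and marquee is long obsolete, so treat this as a period piece rather than a recommendation:

```html
<!-- Hypothetical 2003-era tutorial snippet; the text is a placeholder. -->
<marquee>Welcome to my website!</marquee>
```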

You don't know about semantic HTML yet, or the importance of complying with web standards. Using a presentation element like marquee that only works in IE is an accepted practice of the time. But then you expand your tutorial to explain the markup:
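
A sketch of that naive attempt, in which the markup is pasted straight into the page in the hope that readers will see it. The wording is hypothetical:

```html
<p>Type this to make scrolling text:</p>
<!-- Intended to be shown as source, but the browser treats it as a real element. -->
<marquee>Welcome to my website!</marquee>
```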

That didn't work. The HTML is rendered, not shown as literal HTML. You remember that a preformatted element is used to wrap up code, so you add a pre tag:
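
Wrapping the same hypothetical snippet in a preformatted element might look like this:

```html
<p>Type this to make scrolling text:</p>
<pre>
<marquee>Welcome to my website!</marquee>
</pre>
```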

The pre tag does not do what you want. The content is rendered in a monospace font and the whitespace is not coalesced, but any HTML in a preformatted element is still interpreted instead of being rendered literally.

You need to escape the source somehow to make it not HTML. The angle brackets < and > are what make text look like tags. Many of our traditional programming languages use backslashes to take away a character's special meaning. Perhaps you could put backslashes in front of these brackets? Try it.

Backslashes aren't the right tool for this job. Instead, you use HTML entities, which are short codes for characters. Instead of typing literal angle brackets, you use the entities &lt; and &gt;, like this:
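
A minimal sketch, reusing a hypothetical marquee snippet. Each < becomes &lt; and each > becomes &gt;, so the browser displays the brackets instead of parsing a tag:

```html
<p>Type this to make scrolling text:</p>
<pre>&lt;marquee&gt;Welcome to my website!&lt;/marquee&gt;</pre>
```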

HTML entities can be expressed in several forms. The copyright symbol ©, for example, has three possible representations:
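
The three forms are a named (mnemonic) reference, a decimal numeric reference, and a hexadecimal numeric reference. The numeric forms use the character's Unicode code point, which for the copyright symbol is U+00A9, or 169 in decimal. The surrounding paragraph markup is only illustrative:

```html
<p>&copy; mnemonic form</p>
<p>&#169; decimal form</p>
<p>&#xA9; hexadecimal form</p>
```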

All three forms start with an ampersand and end with a semicolon. Not all symbols have a mnemonic form.

You don't always have to use HTML entities to produce angle brackets. The relational operators in this code display just fine:
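
As an illustration, here is a hypothetical C-like conditional whose variable and function names are made up. Each bracket is followed by a space, so the browser does not mistake it for the start of a tag:

```html
<pre>
if (a < b && c > d) {
  swap(a, d);
}
</pre>
```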

The angle brackets here don't form HTML tags, so entities aren't needed. But it's probably a good idea to use them anyway, just to be safe. Additionally, since entities start with ampersands, you should probably use the &amp; entity for the ampersands in the logical AND operator too. You don't want ampersands accidentally being interpreted as HTML entities. The statement p = &copy; in a C program would not render correctly, because the browser would replace &copy; with the copyright symbol. Your carefully escaped HTML source would look like this:
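
A sketch of fully escaped source, using a hypothetical C-like conditional with made-up names. Every <, >, and & in the code is written as an entity:

```html
<pre>
if (a &lt; b &amp;&amp; c &gt; d) {
  swap(a, d);
}
</pre>
```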

Another occasion to use HTML entities occurs when you are adding attributes to a tag. Consider this img tag, whose alt text contains quotation marks:
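
A sketch of the problem, with a hypothetical file name and description:

```html
<!-- Broken on purpose: the inner quotation marks end the alt value early. -->
<img src="cat.jpg" alt="My cat "helping" me type">
```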

This isn't going to work. Quotation marks are syntactically significant inside a tag. You can escape them using the quotation mark entity:
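
A minimal sketch of the fix, with a hypothetical file name and description. The &quot; entity stands in for each inner quotation mark:

```html
<img src="cat.jpg" alt="My cat &quot;helping&quot; me type">
```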

The HTML specification also allows attribute values to be single-quoted, or not quoted at all if the value contains no whitespace or certain other special characters, such as quotation marks and angle brackets. However, you will not see anything but double quotation marks around attribute values in this book. Feel free to investigate on your own the various opinions that people have about how to quote attributes.

Escaping special characters is the only reason you must use HTML entities, but there are other reasons you might choose to use them. For example, you might want to add a character that is awkward to type, like the long dash or emdash:
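
The emdash has the mnemonic &mdash;. A minimal sketch, with a made-up sentence:

```html
<p>Entities are easy to type&mdash;once you know their names.</p>
```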

Using the HTML entity is probably easier than remembering the keyboard shortcut or opening your operating system's character map.

HTML entities also make the semantic intent clear. Suppose you wish to typeset a date range. Typographers insist that you use the endash to separate dates:
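
The endash has the mnemonic &ndash;. A minimal sketch, with made-up dates:

```html
<p>The workshop runs June 3&ndash;7, 2003.</p>
```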

Had you typed literal emdashes and endashes, someone maintaining your source code might not notice the difference. The mnemonics make the difference explicit.

Typographers want you to use HTML entities on a few other occasions. For example, when a sentence trails off, you don't put three periods. You put an ellipsis:
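
The ellipsis has the mnemonic &hellip;. A minimal sketch, with a made-up sentence:

```html
<p>And then the browser crashed&hellip;</p>
```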

Try highlighting the ellipsis. See how it's a single character instead of three? The semantic meaning is clearer. The mnemonic stands for horizontal ellipsis. There's also a vertical ellipsis that's handy for typesetting matrices.

When you quote something, typographers often want you to use curly quotes, not straight quotes. You add curly quotes with the entities &ldquo; and &rdquo;, the mnemonics of which stand for left double quote and right double quote. Compare these renditions of a pithy quotation found on Pinterest:
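
A sketch of such a comparison. The quotation is a made-up placeholder, not the one from the original page. The first paragraph uses straight quotes and the second uses curly quotes:

```html
<p>"Dream big. Nap often."</p>
<p>&ldquo;Dream big. Nap often.&rdquo;</p>
```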

The argument is that curly quotes scan better as the reader's eyes pass over the text. What do you think? If you've done a lot of programming, you probably see no problem with straight quotes. See "Has the Internet Killed Curly Quotes?" for more on this story.

Emoji may also be added to your HTML documents using HTML entities. Consider this thumbs-up:
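
Emoji generally have no mnemonic form, so you use a numeric reference to the Unicode code point. The thumbs-up is U+1F44D, which is 128077 in decimal. A minimal sketch showing both numeric forms, with illustrative paragraph markup:

```html
<p>Nice work! &#x1F44D;</p>
<p>Nice work! &#128077;</p>
```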

You don't need HTML entities for emoji, emdashes, endashes, ellipses, curly quotes, and all these other symbols. Your editor can substitute them in for you, you can type their keyboard shortcuts, or you can copy and paste them. Entering them literally will work just fine as long as your workflow supports Unicode. This isn't always the case, especially with emoji, so you may want to use entities until all of your software has caught up.

For example, some text editors support basic emoji but not the skin tone levels that were first introduced in 2015. An emoji's skin tone can be changed by following it immediately with a modifier. Five different tones are defined: dark, medium-dark, medium, medium-light, and light, and their modifiers are shown here:
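
A sketch using the thumbs-up emoji (U+1F44D) as the base character, which is an assumption about the example. The modifiers occupy the code points U+1F3FB through U+1F3FF, running from light to dark:

```html
<p>&#x1F44D;&#x1F3FF; dark</p>
<p>&#x1F44D;&#x1F3FE; medium-dark</p>
<p>&#x1F44D;&#x1F3FD; medium</p>
<p>&#x1F44D;&#x1F3FC; medium-light</p>
<p>&#x1F44D;&#x1F3FB; light</p>
```

Software that doesn't support the modifiers typically shows the base emoji followed by a separate color swatch or placeholder box.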

Be aware that Unicode's spectrum of five skin tones does not adequately represent the vibrant array of real human beings.