HTML started as a small language for merely marking up the structure of a document, leaving the browser to decide how the structures would be rendered. However, as the language grew in popularity, and as browsers deviated from the standard to add glitz and glam, HTML began to acquire tags for changing the visual presentation. The font tag, for example, could be used to set the face, color, and size of text. The u tag could be used to underline text, the strike tag to cross out text, and the center tag to center its child elements. These new tags had little to do with the structure of the document and a lot to do with its presentation.
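For illustration only, presentational markup of the kind described above might have looked like the following sketch. The text and attribute values are invented for this example.

```html
<!-- Legacy presentational markup: shown for contrast, not for use in new documents. -->
<center>
  <!-- font sets the face, color, and size of the text it wraps -->
  <font face="Arial" color="red" size="5">Grand Opening</font>
  <!-- u underlines text; strike crosses it out -->
  <p><u>Today only</u>: <strike>$20</strike> $15</p>
</center>
```

Nothing in this markup says what the content is. It says only how the content should look.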
Most of these presentational tags have since been removed from HTML. HTML 4 deprecated them in favor of stylesheets, and HTML5 declared most of them obsolete, restoring HTML as the language of structure. Presentation was to be expressed elsewhere, in stylesheets, a topic you'll read about soon.
When an HTML document is written so that the markup declares structure but not presentation, the document is said to be written in semantic HTML. Separating the semantic meaning of a tag from its visual appearance will likely take some practice. For example, when you see a ul tag, you automatically think of a bulleted list. In semantic HTML, however, a ul is merely an unordered list of items. Its presentation will be decided by the stylesheet, and it may look nothing like your notion of a list when rendered. This is often the case for ul tags used to structure lists of navigation links.
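A minimal sketch of a ul used for navigation links appears below. The link targets and labels are hypothetical. With no stylesheet the browser shows bullets, but a stylesheet could just as well lay these items out as a horizontal menu bar.

```html
<!-- An unordered list of navigation links; the hrefs are hypothetical. -->
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/articles">Articles</a></li>
  <li><a href="/about">About</a></li>
</ul>
```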
Let's examine a few more HTML elements and discuss them through the lens of semantics.
To create an image with a caption, you could use an img element followed by a p element, but nothing in the markup would tie the two together. Enter the figure element, an element that explicitly associates content with an explanatory caption:
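A minimal sketch of such a figure follows. The image path is a hypothetical placeholder, and the caption is wrapped in a figcaption element, which is how a figure marks its caption.

```html
<figure>
  <!-- placeholder.jpg is a hypothetical path; substitute any image -->
  <img src="placeholder.jpg" alt="A randomly chosen photograph">
  <figcaption>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</figcaption>
</figure>
```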
The text here is lorem ipsum text. Such placeholder text has been used by the typesetting industry for decades in promotional materials. The original Latin text comes from an essay written by Cicero, a Roman orator. The image comes from a website that offers similar randomness but for photographs.
Suppose you wish to share some Java code on a web page. When you insert the code directly into the HTML, the rendering is probably not what you had in mind:
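As a sketch of the situation, assume the page contains a small, made-up Java class pasted in with no surrounding markup:

```html
<!-- No markup around the code: the browser treats it as ordinary text. -->
public class Hello {
  public static void main(String[] args) {
    System.out.println("Hello, world!");
  }
}
```

A browser renders this as a single run of flowing text, with each linebreak and indent collapsed to one space.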
There are no tags in the HTML. Without markup, the code has the same semantic meaning as plain text. The browser collapses the whitespace and renders it like any other flowing text.
The pre tag can be used to communicate that content is preformatted:
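A minimal sketch, assuming a small, made-up Java class as the content to share:

```html
<!-- Inside pre, linebreaks and indentation are preserved. -->
<pre>
public class Hello {
  public static void main(String[] args) {
    System.out.println("Hello, world!");
  }
}
</pre>
```

Code that contains the characters < or & would need them written as &lt; and &amp; so the browser does not mistake them for markup.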
The semantic meaning of a pre element is that the whitespace in the content is significant. How the content is displayed is still left up to the browser. Most browsers render preformatted elements using a monospace font.
Inside a preformatted element, all whitespace is considered significant. Sometimes you only want a significant linebreak, as in poetry. Without any markup, the linebreaks are lost in this limerick:
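As a stand-in example, the sketch below uses a limerick by Edward Lear placed in a paragraph with no further markup:

```html
<!-- The linebreaks below exist only in the source; the browser collapses them. -->
<p>
There was an Old Man with a beard,
Who said, "It is just as I feared!
Two Owls and a Hen,
Four Larks and a Wren,
Have all built their nests in my beard!"
</p>
```

The browser renders all five lines as one flowing paragraph.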
Putting this text in a preformatted element would fix the rendering, but it isn't semantically appropriate if you want to make just linebreaks significant. For that, you can use the break element, whose tag is <br>:
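A minimal sketch, using a limerick by Edward Lear as stand-in content:

```html
<!-- Each br marks a linebreak that is part of the poem's structure. -->
<p>
There was an Old Man with a beard,<br>
Who said, "It is just as I feared!<br>
Two Owls and a Hen,<br>
Four Larks and a Wren,<br>
Have all built their nests in my beard!"
</p>
```

The last line needs no break because the paragraph ends there.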
Break is a void element. It has no children and no closing tag.
If you are using multiple break elements to add padding between elements, and not to mark structural linebreaks, you are violating the element's semantic meaning. Padding must be added not through HTML but through a stylesheet.
Suppose you have this US census data that you would like to put in a document:
The data is unstructured and hard to interpret. It belongs in a table element. The immediate children of a table are table row elements, which are marked with the tag tr. Each row is broken into table data elements, which are marked with the tag td:
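A minimal sketch of such a table follows. The region names and counts are made-up placeholders, not real census figures.

```html
<!-- Placeholder values only; these are not real census data. -->
<table>
  <tr>
    <td>Region A</td>
    <td>1,000,000</td>
  </tr>
  <tr>
    <td>Region B</td>
    <td>2,500,000</td>
  </tr>
  <tr>
    <td>Region C</td>
    <td>750,000</td>
  </tr>
</table>
```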
The semantic structure is clear, and the browser also happens to display the table in a readable grid. The semantic meaning can be made even more clear by prepending a row of table heading cells:
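A minimal sketch of a table with a heading row is shown below. Heading cells use the th tag in place of td. The column labels, region names, and counts are made-up placeholders, not real census figures.

```html
<!-- Placeholder values only; these are not real census data. -->
<table>
  <thead>
    <!-- th marks a heading cell rather than a data cell -->
    <tr>
      <th>Region</th>
      <th>Population</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Region A</td>
      <td>1,000,000</td>
    </tr>
    <tr>
      <td>Region B</td>
      <td>2,500,000</td>
    </tr>
  </tbody>
</table>
```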
The rows are further distinguished using the thead and tbody tags.
When presentational HTML was fashionable, table elements were used to define the overall layout of a page. In semantic HTML, tables are not used for layout. A table element is used only to structure tabular data.