by Alexandre Alapetite on 2003-04-11; updated on 2005-07-04

Headers and META information in HTML

I will briefly introduce headers, META information and other technical points that matter in HTML pages.
This documentation is written for XHTML 1.0 but applies, with few changes, to other versions of HTML.

Abstract
Table of contents
HTML headers
Charset and document type
The language
- Strictest standard
- Secondary languages and alpha-3
A description
Some keywords
Robots
LINK, a META using an external file
Other META information
Declaration of the type of the document
Licence
Comments


HTML headers

HTML Web pages start with a <head> area where META information can be stored.
Here is an example (inspired by this page):

test.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="content-language" content="en-GB" />
<title>META information HTML - Doc Alex</title>
<meta name="language" content="en-GB" />
<meta name="robots" content="all" />
<meta name="description" content="Important META information in HTML" />
<meta name="keywords" content="HTML,META,charset,languages,ISO" />
<link rel="author" href="mailto:alex@france.fr" xml:lang="fr-FR" title="Alexandre Alapetite" />
<link rel="start" href="../../index.en.html" />
<link rel="up" href="../index.en.html" />
<link rel="contents" href="#toc" />
<link rel="help" href="../../about/index.en.html" />
<link rel="search" href="../../find.html" />
<link rel="alternate" href="index.fr.html" hreflang="fr-FR" xml:lang="fr-FR" title="Documentation META en français" />
<link rel="alternate" type="application/pdf" media="print" href="alx_meta.pdf" />
</head>

<body>
...
</body>
</html>

NB: "http-equiv" META tags are meant to simulate or replace HTTP headers, whereas the others are intended for browsers and other clients such as search engines.


Charset and document type

Specifying the document charset and type is one of the most important pieces of information, especially in the current internationalization process. The charset is the name of the character set used to encode the document.
If you are in Western Europe (and use Windows Notepad, for example), your documents are probably encoded in ISO-8859-1.
Specifying the charset lets browsers know which font to use and display the right characters rather than those of another alphabet (Greek, Russian, Chinese, ...) (see Microsoft MSDN), not to mention Unicode charsets like UTF-8.
Moreover, you can keep special characters like [é,è,à,å,ø,æ...] as they are, with no need to replace them with their associated codes [&eacute;,&egrave;,&agrave;,&aring;,&aelig;,...]. Since we are talking about HTML Web pages, the document type will always be text/html.
The UTF-8 Unicode encoding should be favoured at all times. The document charset and type are specified by the line:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

This line should be placed as high as possible in the headers (in particular before <title>), since browsers reprocess the page when they read it.
More information about character sets can be found at MauveCloud’s, and at Alan Wood’s for Unicode.
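
As an illustration, here is a minimal sketch of a page that declares UTF-8 before its title and then uses accented characters directly. The title and body text are invented for the example, and the file itself is assumed to be saved in UTF-8 by the editor.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">
<head>
<!-- Charset first, so that the title below is already read as UTF-8 -->
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Café, crème brûlée and smørrebrød</title>
</head>
<body>
<!-- Literal characters: no &eacute; or &oslash; entities needed -->
<p>é è à å ø æ</p>
</body>
</html>
```

If the declared charset does not match the encoding actually used to save the file, the accented characters will be displayed incorrectly, so both must be kept in sync.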


The language

This information is also essential. It makes it possible to search by language, to invoke automatic translators or speech synthesis without errors, to be indexed correctly by search engines, etc.
The language is specified by the lines:

<meta http-equiv="content-language" content="fr-FR" />
<meta name="language" content="fr-FR" />

It is quite a pity, but both lines are needed: tools working mainly on the HTTP protocol or on the file format will use the first line, whereas tools working mainly on the content will rather use the second one.
Note that for documents written in several languages, it is possible to list those languages in order of importance, separated by commas ",".
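
For instance, a page written mostly in French with some passages in British English might be declared as follows. This is only a sketch: the choice of languages is an assumption made for the example.

```html
<!-- Main language first, then the secondary one, separated by a comma -->
<meta http-equiv="content-language" content="fr-FR,en-GB" />
<meta name="language" content="fr-FR,en-GB" />
```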

Strictest standard

It is better to use the strictest standard for coding the language (see RFC-3066), whose schema is xx-XX:
2 lowercase characters for the language, "-", then 2 uppercase characters for the country.
The first two characters code the general language, French for example, using the ISO-639-1 standard, while the last two characters, which are optional, specify a country coded with the ISO-3166 standard. This allows a distinction between French from France, from Switzerland or from Canada, for example. The country code has to be that of a country where the language is an official language. For the few languages that have no official country, no sub code is added.

Some examples:

French from France: fr-FR
French from Canada: fr-CA
French from Switzerland: fr-CH
English from the United Kingdom: en-GB
English from the USA: en-US
English from Canada: en-CA
Danish from Denmark: da-DK
Italian from Italy: it-IT
Italian from Switzerland: it-CH
Spanish from Spain: es-ES
German from Germany: de-DE
German from Switzerland: de-CH
Swedish from Sweden: sv-SE
Esperanto: eo
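
These codes can be used wherever the document accepts a language attribute. Here is a small sketch, assuming a page in British English that quotes a sentence in Swiss French; the sentences are invented, and the extra lang attribute follows the XHTML 1.0 compatibility guidelines for pages served as text/html.

```html
<p xml:lang="en-GB" lang="en-GB">The menu of the day says:</p>
<!-- fr-CH: language "fr" (ISO-639-1) + country "CH" (ISO-3166) -->
<p xml:lang="fr-CH" lang="fr-CH">Septante francs pour deux personnes.</p>
<!-- Esperanto has no official country, so no country sub code -->
<p xml:lang="eo" lang="eo">Bonvenon!</p>
```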


Secondary languages and alpha-3

There are many secondary, less widely used languages, whose codes are listed in the ISO-639-2 document and whose schema is xxx.

Example:

Old Provençal: pro

NB: For compatibility reasons, it is not recommended to use the ISO-639-2 alpha-3 standard (i.e. 3 alphabetic characters) to code languages that already have an alpha-2 code in the ISO-639-1 document.
In the same way, if an alpha-2 country code exists, it is better not to use the alpha-3 country code (other link: United Nations).


A description

It is possible to include a short description of the page. This description is often used by search engines (like Lycos) as a summary displayed just under each link in the results of a query.
The line to use is:

<meta name="description" xml:lang="en-GB" content="Important META information in HTML" />

It is worth restating the language on this line if you provide a description in several languages.
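
A sketch of a description provided in two languages could look like this. The French wording is an illustrative translation, not taken from an actual page.

```html
<!-- One description per language, each tagged with its own xml:lang -->
<meta name="description" xml:lang="en-GB" content="Important META information in HTML" />
<meta name="description" xml:lang="fr-FR" content="Informations META importantes en HTML" />
```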


Some keywords

There is a META tag whose aim is to give search engines some keywords describing the Web page. However, it appears that search engines make little use of it, since it is seldom provided and often incorrect. Nevertheless, some small engines use it, like the one for personal Web pages hosted by the provider chez Free.
It can be declared this way:

<meta name="keywords" xml:lang="en-GB" content="HTML,META,charset,languages,ISO" />

Here again, it can be useful to specify the language.


Robots

There is a META tag reserved for specifying whether or not the page is to be indexed by search engines, and whether or not search engines are allowed to follow its links.
It is logical to put this line before the META description and keywords.

<meta name="robots" content="all" />

The value is a combination of "index" or "noindex" with "follow" or "nofollow", with two shortcuts, "all" (the default) and "none". So the possible values are:
"all", synonymous with "index,follow": full indexing.
"noindex,follow": not indexed, but links may be followed.
"index,nofollow": indexed, without link analysis.
"none", synonymous with "noindex,nofollow": neither indexing nor link analysis.
Moreover, there are some other values, like Google’s.
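
As a sketch, a guestbook page whose text may be indexed but whose visitor-supplied links should not be followed could use the following lines. The page and its description and keywords are hypothetical.

```html
<!-- Guestbook page: index the text, but do not follow visitors' links -->
<meta name="robots" content="index,nofollow" />
<meta name="description" content="Guestbook of the site" />
<meta name="keywords" content="guestbook,comments" />
```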


LINK, a META using an external file

It is recommended to use the almost synonymous LINK tag for META information that involves external files. This tag is widely used and well recognised.
Some uses deserve to be encouraged, such as links to alternate versions of the page translated into another language, or provided in another format (PDF, Word, ...), for which a target medium can be specified (printer, voice synthesis, PDA, ...). It is also possible to specify links to the glossary, table of contents, etc. to enhance navigation.
It is important to specify the type of the document, as a MIME type, for links that do not refer to HTML documents. Other information such as hreflang (language of the target document), charset or media (screen, printer, voice, ...) should not be added if it is identical to that of the current document.

Some examples:

<link rel="stylesheet" type="text/css" href="monstyle.css" />: Links a CSS style sheet.
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />: Changes the icon attached to the URL. The file should be named favicon.ico and placed at the root of your site.
<link rel="alternate" hreflang="en-GB" href="page.en.html" />: Proposes a version in another language (alternate associated with hreflang).
<link rel="alternate" type="application/msword" media="print" href="page.doc" />: Proposes a version in another format and/or for another medium, here the printer (alternate associated with media). Well used by Internet Explorer with a Word document.
<link rel="next" href="page2.html" />: Next page in a collection. See also other standard links like start, up, first, prev, next, last, ... Well used by Mozilla, Opera, ...
<link rel="glossary" href="page_glossary.html" />: Propose a glossary in relation with this page.
<link rel="author" href="mailto:alex@france.fr" xml:lang="fr-FR" title="Alexandre Alapetite" />: Link to the author.

The use of LINK is becoming more and more indispensable as navigation becomes more complex, as the media used to access the Internet diversify, and as people with disabilities must be able to access the information.
The use of next and prev, for example, allows browsers like Mozilla to prefetch linked pages and so speed up navigation.
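
To illustrate, here is a sketch of the navigation links for the second page of a three-page collection. All file names are hypothetical.

```html
<!-- Hypothetical page2.html, the second of three pages -->
<link rel="start" href="index.html" />
<link rel="first" href="page1.html" />
<link rel="prev" href="page1.html" />
<link rel="next" href="page3.html" />
<link rel="last" href="page3.html" />
<link rel="contents" href="index.html#toc" />
<link rel="glossary" href="page_glossary.html" />
<!-- The MIME type is given because the target is not an HTML document -->
<link rel="alternate" type="application/pdf" media="print" href="page2.pdf" />
```

Browsers that do not understand a given rel value simply ignore the line, so such links can be added without risk for older clients.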

[Figure: Mozilla 1.4a’s site navigation bar, using LINK]



Other META information

Other META information tags are useful and widely used, like the ones that control redirection and refreshing, or the validity period of a page. These tags should of course be used only when necessary.

<meta http-equiv="refresh" content="5; url=https://alexandre.alapetite.fr" />: Automatic redirection to another URL after the specified number of seconds. More information in my documentation about redirections.
<meta http-equiv="expires" content="Mon, 24 Nov 1980 09:20:00 GMT" />: Date (in the RFC 1123 format) after which this page is no longer valid and should be reloaded.
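
As a sketch, a complete redirection page built on the refresh line might look as follows. The title and the wording of the fallback paragraph are assumptions; the visible link is there for clients that ignore the refresh.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<!-- Redirect after 5 seconds -->
<meta http-equiv="refresh" content="5; url=https://alexandre.alapetite.fr" />
<title>Page moved</title>
</head>
<body>
<!-- Fallback for clients that do not honour the refresh -->
<p>This page has moved. If you are not redirected within 5 seconds, please follow <a href="https://alexandre.alapetite.fr">this link</a>.</p>
</body>
</html>
```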

Several other META information tags have been proposed, but unfortunately there is no real standardisation yet, despite some good projects, like the one from Dublin Core. Furthermore, browsers make little use of META information tags.
A list of META information tags can be found here. Here are a few of them:

<meta name="author" content="Alexandre Alapetite" />: Name of the author of this page.
<meta name="generator" content="Notepad" />: Tool used to create the page.
<meta name="geo.position" content="43.923;4.828" />: Terrestrial latitude and longitude of the subject of this page, to be indexed by GeoURL or GeoTags
<meta http-equiv="Page-Enter" content="RevealTrans(Duration=1,Transition=23)" />: Graphical transition under Internet Explorer.


Declaration of the type of the document

The declaration of the document type is becoming more and more important in the current formalisation process.
Several HTML versions coexist: HTML 3.2, HTML 4.01 and XHTML 2.0, to name three of the latest. There are also other documents using XML. So it is important to avoid version conflicts, which produce errors and unexpected rendering in the different browsers.

Furthermore, this declaration allows the validation, and possibly the automatic correction, of the page by tools such as the online one from the W3C.
This validation is indispensable.

I would advise using XHTML 1.0 Transitional, written as close as possible to XHTML 1.0 Strict (XHTML 1.1). This is specified this way:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">

Since XHTML is more formal than older HTML, it allows a precise verification by simple tools, like the one I am developing.
Note that this declaration is slightly different for XHTML 1.0 documents using a FRAMESET:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">
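
For completeness, here is a sketch of a minimal frameset document using this declaration. The frame file names, titles and column widths are hypothetical.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Frameset example</title>
</head>
<!-- A frameset document has a frameset element in place of body -->
<frameset cols="25%,75%">
<frame src="menu.html" title="Menu" />
<frame src="content.html" title="Content" />
<noframes>
<body>
<p>This site uses frames: see the <a href="content.html">content</a> directly.</p>
</body>
</noframes>
</frameset>
</html>
```

The pages loaded inside the frames remain ordinary documents and keep their own Transitional or Strict declaration.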