Metadata, search engines, social networks
Once you make a webpage, you want people to find it and to share it.
Letting Google read you
There are people who make a living out of Search Engine Optimisation (SEO). The details can be arcane. But the basics are very simple and you should learn them yourself. Google indexes three things:
- The text of your page
- The metadata of your page
- How your page is linked from other pages
In the first place, Google needs to be able to read your page. The Google-bot can read text. More specifically, it can read the text in an HTML page. Everything else requires extra effort on your part. Flash used to be a pain, for example.
The text that is easiest for Google to read is the source of the webpage. That means that any text added afterwards by JavaScript (or loaded in from other pages) is not scraped by Google unless you make an extra effort. Websites that are made as separate HTML pages (like your thesis) or with a server-side language like PHP (like WordPress) are generally fine. If you handle navigation and content loading with JavaScript, you need to take extra measures to make sure the content is indexed.[1]
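To make the difference concrete, here is a minimal sketch of a page with one paragraph in the source and one added by a script. The ids "static" and "dynamic" are made up for this illustration. A crawler that only reads the HTML source would see the first paragraph but not the second.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Static versus script-generated text</title>
</head>
<body>
<!-- Present in the page source, so a crawler that only reads the HTML sees it -->
<p id="static">This paragraph is part of the HTML source.</p>

<!-- Empty in the page source; only filled in once the script below has run -->
<p id="dynamic"></p>

<script>
  // Text added here does not appear in the source of the page.
  document.getElementById("dynamic").textContent =
    "This paragraph was added afterwards by JavaScript.";
</script>
</body>
</html>
```

You can see the same difference yourself by comparing "View source" in the browser with what the rendered page shows.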
Historically, Google’s algorithms have excelled at indexing text without using keywords. But a page’s metadata matters as well. More about that in the next section.
Finally, a deciding factor in Google’s ranking since the early days has been how others link to your page. The more people link to your page, the better it will perform in Google results. This is not easy to achieve overnight. But once you have a few well-visited sites linking to your page, it will usually start to perform much better in the rankings.
Metadata
Metadata means data about data. In the context of a website, metadata is all the information contained in the code that is not directly visible on the page, but can be read by computer programs. Metadata is useful to organise information (like in a library catalogue), or to quickly decide if something is worth your time (the summary on the back of a book), or even if you can afford it (the price tag).
One metadata tag you have surely used is <title>
<title>This is the title of the website</title>
This is the title of your page, as displayed in the browser tab. It is used by default in many other places, such as Google search results, bookmark names, etc.
For other meta-information, there is a dedicated HTML tag: <meta>. It is used for many different kinds of metadata, using the general form:
<meta name="meta-data-name" content="The contents of the metadata" />
One meta-tag that is important to use is the description.
<meta name="description" content="Here you can put a short description of your website. It will be used as a description in some cases." />
Your description does not need to be different for each page—but it very well can be.
Google and meta-data
Search engines have traditionally used metadata to create their indexes. HTML has a tag to specify keywords. But keywords can be incomplete, and authors can try to trick search engines into giving them a good rating by adding popular keywords that are not really related to the content. That is why Google has long been ignoring keywords, and favours reading and indexing the text of the page directly.
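For reference, the keywords tag follows the same general meta form shown earlier. This is only a sketch, and the keyword list is invented for illustration. As explained above, Google ignores this tag, so there is little reason to spend time on it.

```html
<!-- Keywords meta tag: ignored by Google; the keywords here are made up -->
<meta name="keywords" content="metadata, search engines, social networks" />
```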
That does not mean Google does not look at metadata at all. The name of a listing in search results is taken directly from the <title> tag. And if it cannot extract a nice sentence to use in the short descriptive text underneath a link, it will use the description meta tag.[2]
Facebook, Twitter, previews and metadata
Page metadata has gained a new importance with the rise of social media. When sharing a link to a page, sites like Facebook and Twitter show a preview of that page. Needless to say, the more enticing that preview, the more people will visit that page. The preview is based on meta-data present in the page.
The Open Graph protocol
To control the appearance of the preview on Facebook, Facebook has introduced the Open Graph protocol. It is loosely based on the existing RDFa standard. In true monopolist style, it is not completely compatible. Open Graph uses meta tags that look just like regular HTML meta tags, except that they use property instead of name:
<meta property="og:something" content="The content of the Facebook metadata" />
These are the most important ones:
<meta property="og:title" content="The title of the page" />
<meta property="og:description" content="A short description of the page’s content" />
<meta property="og:image" content="http://example.com/url-to-preview-image.jpg" />
One might notice that some of these properties are very similar to the ones we could already encode with other HTML meta tags. There are some differences though. The <title> tag often contains both the name of the website and the current page:
<title>Mouse (computing) - Wikipedia, the free encyclopedia</title>
In a Facebook preview these are separated. So they could be encoded like:
<meta property="og:title" content="Mouse (computing)" />
<meta property="og:site_name" content="Wikipedia, the free encyclopedia" />
If this information is not encoded in an og tag, Facebook will fall back to the <title> for og:title, and to the website’s URL for og:site_name.
The one tag that is really useful, and that I would definitely recommend using, is og:image. It allows one to specify an image that is shown in the feed. The preferred format is currently 1200 by 630 pixels. Images in exactly this format will show up the biggest in the timeline.
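A minimal sketch of an image declaration at the preferred size follows. The URL is the placeholder used earlier. The og:image:width and og:image:height properties are optional structured properties from the Open Graph protocol. That stating the dimensions helps the preview in a particular case is an assumption, not something this text can guarantee.

```html
<!-- Preview image; the URL is a placeholder -->
<meta property="og:image" content="http://example.com/url-to-preview-image.jpg" />
<!-- Optional: declare the image dimensions in pixels (here the preferred 1200 by 630) -->
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
```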
You can check how Facebook scrapes your website in the Debugger.
Twitter cards
Twitter has its own standard similar to Open Graph, called Twitter cards. It once again lets you describe the title, description, preview image, etc. If no Twitter tags are present, Twitter falls back to the Facebook data. This means using Twitter tags is not really necessary in most cases. Two tags can be practical though. With twitter:site and twitter:creator one can link to the Twitter accounts that are responsible for the site and the creator of the specific page, respectively. In this way, when content is shared through Twitter, Twitter users can quickly find these accounts.
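As a sketch, the two tags could look like this. Both account names are hypothetical placeholders. Twitter's own documentation writes these tags with the name attribute rather than property.

```html
<!-- Account responsible for the website as a whole (hypothetical handle) -->
<meta name="twitter:site" content="@example_site" />
<!-- Account of the author of this specific page (hypothetical handle) -->
<meta name="twitter:creator" content="@example_author" />
```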
An example
Putting it all together, a recent newspaper article on why the Flemish public likes Dutch TV personalities can be encoded as follows:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Waarom Vlamingen toch zo van Nederlanders houden - NRC Handelsblad van donderdag 11 juni 2015</title>
<meta name="description" content="Het Laatste Nieuws lijft Jan Mulder in voor een recordbedrag als columnist: hij zou 750.000 euro krijgen voor drie jaar columns schrijven. Waarom doet hij het zo goed bij onze zuiderburen?" />
<meta property="og:title" content="Waarom Vlamingen toch zo van Nederlanders houden" />
<meta property="og:image" content="http://static.nrc.nl/images/w640/1106mediajanmulder.jpg" />
<meta property="dc:creator" content="The name of the author of the page" />
<meta name="twitter:site" content="@nrc" />
</head>
<body>
....
</body>
</html>
[1] Admittedly, this is rapidly changing: Google is learning to read more and more JavaScript. Even then, you still have to use a ‘client-side router’ to make sure every part of your site has its own URL.
[2] Google can read more sophisticated metadata, although this requires more advanced markup techniques. Check out ‘Promote Your Content with Structured Data Markup’.