Funnel that link juice with canonicalization

In an earlier post, I talked about how you can canonicalize your domain name to sort out potential user confusion between homophones (words that sound the same but are spelled differently).

Note: Remember that to canonicalize a URL is to select one primary version and divert all traffic from the other various versions to the canonical version.

That earlier advice was cool, but it’s not the end of your canonicalization considerations. Not even close! There are many other URL variation issues to resolve through URL canonicalization. And given that search engine indexes are so literal as to make Dr. Sheldon Cooper look like an artistic free thinker, we need to actively address this issue for SEO.

The problem is that the search engines regard each version of a URL as a separate entity, even when multiple URLs point to the same page content. As such, each URL variation for a given page earns its own ranking value in the search engine index, thereby diluting the potential rank of the target page.

Multiple URLs pointing to the same page?

At first it seems odd. You’d think there were discrete addresses for each page on the web! But the web is more flexible than you might think, and for search engines, that is the rub. This is especially troublesome for domain home pages. Below is an example of just a few URL variations that would likely point to the same hypothetical domain home page:

  • www.mysite.com/
  • www.mysite.com
  • mysite.com
  • mysite.com/
  • www.mysite.com/index.html
  • www.mysite.com/index.html?var=1
  • www.mysite.com/en/us/
  • www.<ExternalHostProvider>.com/~mysite

Believe it or not, this is an abbreviated list! The list of all possible permutations is even larger. Not all of these apply in every circumstance, but I’ve seen each of these pull up the home page of a non-canonicalized site, and that’s a problem.
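
To make the problem concrete, here is a minimal Python sketch that collapses several of the variants above into a single form. It assumes the canonical choice is http://www.mysite.com/ and that a short list of default file names (my guess at a typical set) should be dropped. The function name canonicalize and the constants are made up for illustration. It also throws away the query string, which is only reasonable for a home page like this one, and it leaves the /en/us/ and external-host variants alone because resolving those takes knowledge of your particular site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the canonical host, the bare host,
# and the set of default page file names are not taken from any real site.
CANONICAL_HOST = "www.mysite.com"
BARE_HOST = "mysite.com"
DEFAULT_PAGES = {"index.html", "index.htm", "index.php",
                 "default.html", "default.htm", "default.php"}


def canonicalize(url: str) -> str:
    """Map a home page URL variant onto one canonical form."""
    # Variants such as "mysite.com" carry no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Host names are case-insensitive; fold the bare host into the www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # An empty path and "/" are the same page; drop default file names.
    path = parts.path or "/"
    head, _, file_name = path.rpartition("/")
    if file_name.lower() in DEFAULT_PAGES:
        path = head + "/"

    # The query string and fragment are discarded (home page assumption).
    return urlunsplit(("http", host, path, "", ""))


if __name__ == "__main__":
    for variant in ["www.mysite.com/", "www.mysite.com", "mysite.com",
                    "mysite.com/", "www.mysite.com/index.html",
                    "www.mysite.com/index.html?var=1"]:
        print(variant, "->", canonicalize(variant))
```

Six different strings go in and one URL comes out, which is exactly the consolidation you want the search engines to perform.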

Testing your site for canonicalization problems

So how many variations of your domain home page are indexed and earning their own rank, taking SEO value away from your preferred (canonical) home page URL? Let’s find out. Start your favorite browser and open three session tabs, all pointing to www.google.com (you can run this with www.bing.com as well). In the three browser tabs, type the following lines of text respectively in the Google search text boxes:

  1. site:<mysite>.com
  2. site:www.<mysite>.com
  3. site:<mysite>.com -site:www.<mysite>.com

Replace <mysite> with just your root domain name and its associated top-level domain (it doesn’t have to be .COM) exactly as shown.

Note: This test is intended for sites at the root of their own domain. If your site is a blog subdomain under a host, such as blogspot.com, the test as described won’t provide useful results. Also note that the number of results you see is not necessarily a complete listing of all indexed pages in Google. It’s common for the results from “site:” queries to be merely a sample subset, especially with larger sites, regardless of which search engine is queried.

The first test query searches for all pages in the index from the entire domain (including all subdomains). The second test query specifically looks for indexed pages associated with the “www.” subdomain. Finally, the third test reruns the first query while using the second query as an exclusion filter. This filter removes all indexed results that include the “www.” subdomain in the URL.

If the last test produces results (which may include URLs in subdomains other than “www.”), examine the search results in detail for both the “www.” and non-“www.” variations of your site’s URLs. If you find both, you need to canonicalize your site to consolidate the URL variations in the search engine so your pages can earn the highest possible rank for the canonical URL form.
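
If you run this test for more than one domain, typing the queries gets old fast. The small Python helper below builds the three query strings for you. It is only a convenience sketch: the function name site_queries is my own, and it does nothing but assemble text for you to paste into the search box.

```python
from typing import List


def site_queries(domain: str) -> List[str]:
    """Return the three test queries for a root domain such as 'mysite.com'."""
    root = domain.strip().lower()
    # Accept input with or without the "www." prefix.
    if root.startswith("www."):
        root = root[len("www."):]
    return [
        "site:" + root,                          # every indexed page on the domain
        "site:www." + root,                      # only pages on the www. subdomain
        "site:" + root + " -site:www." + root,   # everything except www. pages
    ]


if __name__ == "__main__":
    for query in site_queries("mysite.com"):
        print(query)
```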

So you discover you have canonicalization problems. Welcome to the club! Now let’s fix them.

Canonicalize external inbound links with 301 redirects

URLs are found through crawled links. And with the notable exception of nefarious link spammers (you know who you are!), links coming to you from external sites are typically out of your control. So what do you do when external sites link to your home page using a non-canonicalized URL? Use redirects! More specifically, use permanent redirects – aka the 301 redirect.

For the list of URL variations shown above, most webmasters choose the first variant as their canonical URL. This URL form includes the “www.” prefix as well as a trailing slash (but feel free to buck the system and choose your own style; just be consistent about it!). Once you’ve selected your canonical URL, you need to set up 301 redirects for all the other, non-canonical URLs.

The key here is the use of the 301 or permanent redirect rather than the 302 or temporary redirect (which is often the default type offered by web servers). Search engines react differently to 301s than they do to 302s. When they encounter a 302, they assume the redirect will eventually be removed (such as when an online store redirects a URL for an out-of-stock inventory item to a “Temporarily Sold Out” page). As the 302 is temporary, the original URL is assumed to be coming back online soon, so the search engines make no change to their indexes. However, a 301 tells the search engines that the original URL is permanently gone, which enables them to safely transfer all of the search index ranking value from the old URL to the new URL! This is how you consolidate all of those diluted index ranking values from the URL variants to your new canonical URL.
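
Because the 301 versus 302 distinction matters so much, it is worth checking what your server really sends. The Python sketch below is one way to do that with the standard library, under a few assumptions: the site answers over plain HTTP, a HEAD request is allowed, and the function names fetch_status and classify are mine. The http.client module does not follow redirects on its own, which is what we want here, since we need to see the first response.

```python
import http.client
from typing import Optional, Tuple


def fetch_status(host: str, path: str = "/") -> Tuple[int, Optional[str]]:
    """Send one HEAD request over plain HTTP and return (status, Location)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify(status: int, location: Optional[str], canonical_url: str) -> str:
    """Describe a response in terms of the 301 versus 302 discussion."""
    if status == 301:
        if location == canonical_url:
            return "ok: permanent redirect to the canonical URL"
        return "check: permanent redirect to an unexpected target"
    if status == 302:
        return "fix: temporary redirect, switch it to a 301"
    if status == 200:
        return "no redirect: this URL serves content directly"
    return "other: status %d needs a manual look" % status
```

For example, calling fetch_status("mysite.com") and passing the two returned values to classify along with "http://www.mysite.com/" should report "ok" once the redirect is configured correctly.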

How do you set up a 301 redirect?

Excellent question, Grasshopper. The answer depends upon your web server environment. Users of Apache HTTP Server can modify the .htaccess file at the root of their website by inserting a bit of script. I’ve used variants of the following script in my work:

# Redirects non www. URLs to the full www.mysite.com/ URL
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [L,R=301]

# Redirects URLs specifying default page file name references to the full www.mysite.com/ URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*(default|index)\.(html|php|htm)\ HTTP/
RewriteRule ^(([^/]+/)*)(default|index)\.(html|php|htm)$ http://www.mysite.com/$1 [L,R=301]

This script permanently redirects all variants of the home page URL that omit the “www.” domain name prefix, omit the trailing slash, or add file name references, to refer to a canonicalized URL. So even if you get a great authoritative link from www.BigPowerfulSite.com but they use a non-canonical URL and can’t be bothered to change it, who cares? You’ve just fixed the problem from your side.
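
If you are on neither Apache nor IIS, the same logic can be expressed in application code. The following is a rough Python sketch of a WSGI application that loosely mirrors the two rules above. It is not a drop-in equivalent of the .htaccess script: the host names, the list of default file names, and the function name canonical_redirect_app are assumptions for illustration, and unlike the second rewrite rule it also redirects requests that carry a query string.

```python
# Assumed values for illustration; substitute your own canonical choices.
CANONICAL_HOST = "www.mysite.com"
BARE_HOST = "mysite.com"
DEFAULT_PAGES = {"index.html", "index.htm", "index.php",
                 "default.html", "default.htm", "default.php"}


def canonical_redirect_app(environ, start_response):
    """Answer non-canonical requests with a 301 to the canonical URL."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")

    # Drop a default page file name, keeping the folder and trailing slash.
    head, _, file_name = path.rpartition("/")
    clean_path = head + "/" if file_name.lower() in DEFAULT_PAGES else path

    if host == BARE_HOST or clean_path != path:
        location = "http://" + CANONICAL_HOST + clean_path
        if query:
            location += "?" + query
        # 301 signals a permanent move, as opposed to a temporary 302.
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]

    body = b"This is the canonical URL.\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, you could serve the application with make_server from the standard library's wsgiref.simple_server module and request it with different Host headers.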

If your web site runs on Internet Information Services (IIS) for Windows Server, you will likely use the IIS graphical user interface to set up your redirects. To save a ton of space in this post, I refer you to your IIS documentation for task-specific information. You can also run a web search on how to set up a 301 redirect on IIS!

Canonicalize your internal linking with consistent, absolute URLs

So now that you’ve got your external, inbound links canonicalized, you need to look inside. Inside your own site, I mean. Considering that internal links are theoretically controlled by one person (or organization), it’s a bit surprising how often intra-site links use inconsistent URLs. And all links offer SEO value (at least to some degree, with standard caveats, of course), even if they are from and to pages in the same domain, such as a link from your home page to your About Us page. You gotta do your own links right!

You need to scan your site (or use your CMS) to search your source code for all intra-site links (both inline and navigational) to other pages on your site. Ensure the following rules are followed:

  • All links use absolute (aka full) URLs rather than relative URLs
  • All references to default pages in a folder consistently follow the same form (such as a trailing slash, but no file name)

The consistent use of the same canonical URL for each webpage on your site will reinforce your site-wide canonicalization policy, which is simply a good SEO best practice.
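
If your CMS cannot run this kind of check for you, a short script can. The Python sketch below scans a page's HTML for links that break the two rules above, plus internal links that use the bare host name. It assumes the canonical host is www.mysite.com and uses a guessed list of default file names; the class and function names are my own. Treat it as a starting point rather than a complete crawler, since it looks at one page's markup at a time.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed values for illustration; substitute your own canonical choices.
CANONICAL_HOST = "www.mysite.com"
BARE_HOST = "mysite.com"
DEFAULT_PAGES = {"index.html", "index.htm", "index.php",
                 "default.html", "default.htm", "default.php"}


class LinkAudit(HTMLParser):
    """Collect (href, problem) pairs for anchor tags on one page."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or href.startswith("#"):
            return  # no target, or a same-page fragment
        parts = urlsplit(href)
        if parts.scheme not in ("", "http", "https"):
            return  # mailto:, tel:, and similar are not page links
        if not parts.scheme or not parts.netloc:
            self.problems.append((href, "relative URL"))
            return
        host = parts.netloc.lower()
        if host not in (CANONICAL_HOST, BARE_HOST):
            return  # external link, outside this audit
        if host == BARE_HOST:
            self.problems.append((href, "non-canonical host"))
        if parts.path.rpartition("/")[2].lower() in DEFAULT_PAGES:
            self.problems.append((href, "default page file name"))


def audit_links(html: str):
    """Return the list of link problems found in an HTML string."""
    audit = LinkAudit()
    audit.feed(html)
    return audit.problems
```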

Canonicalize the URLs in your Sitemap file

While you’re ensuring that your intra-site links on your webpages are consistent and use canonical, absolute URLs, check your XML Sitemap files, too. Consistency counts here, and you don’t want to list file names for directory default pages in your Sitemap when your new canonical policy is to not list them!
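
A Sitemap is easy to check by script because it is plain XML. The Python sketch below lists every URL in a Sitemap that is not on the canonical host or that names a default page file. It assumes the standard sitemaps.org 0.9 namespace, a canonical host of www.mysite.com, and a guessed list of default file names; the function name flag_sitemap_urls is mine.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumed values for illustration; substitute your own canonical choices.
CANONICAL_HOST = "www.mysite.com"
DEFAULT_PAGES = {"index.html", "index.htm", "index.php",
                 "default.html", "default.htm", "default.php"}
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def flag_sitemap_urls(xml_text: str):
    """Return the Sitemap URLs that do not follow the canonical policy."""
    root = ET.fromstring(xml_text)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        file_name = parts.path.rpartition("/")[2].lower()
        # Flag a wrong host or a default file name such as index.html.
        if parts.netloc.lower() != CANONICAL_HOST or file_name in DEFAULT_PAGES:
            flagged.append(url)
    return flagged
```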

Use the <link> rel="canonical" tag to define the canonical URL for a given page

Lastly, as reinforcement insurance for your new canonicalization policy, add the <link> rel=canonical tag to the <head> section of every webpage. For example, for an About Us page on my fictional website, I’d add the following code:

<link rel="canonical" href="http://www.mysite.com/aboutus.html" />

The rel=canonical tag is really a hint from a webmaster to the search engine indicating the desired canonical URL for the given page’s content. I wouldn’t depend on this tag alone to handle your canonicalization efforts, but using it is definitely part of a sound, overall canonicalization strategy.
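
Since a typo in this tag is easy to miss, it can help to confirm what a page really declares. The Python sketch below pulls the rel=canonical href values out of an HTML string using the standard library parser. The class and function names are my own, and the sketch only reads markup you hand it; fetching the page is left to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values and any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str):
    """Return all canonical URLs declared in an HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A healthy page returns exactly one URL, and that URL matches the form your redirects and internal links use. An empty list or more than one entry deserves a closer look.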

Don’t forget about the search engines’ webmaster tools

If your site employs URL parameters for passing user tracking codes, authentication, or other data from the client to the server, these codes can get embedded in inbound links and thus indexed by the search engines. That does not help your canonicalization efforts.

A smart way to address this situation is to log on to your search engine webmaster tools account (you are signed up, right?) and set up URL parameters to be ignored when indexing URLs from your site. To set this up, go to both Google (under Site configuration > Settings > Parameter handling) and Bing (under the Index tab > URL Normalization) and add your settings.

Given that this technique requires that you create an account with the search engine and that you actively take the time to change these settings in their tools, this effort likely carries more weight with search engines than just the rel=canonical tag (at a minimum, it certainly won’t hurt!). This URL parameter handling work also contributes to canonicalizing the URLs of your site in search.
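
To see what “ignoring a parameter” amounts to, here is a small Python sketch that strips a set of parameters from a URL while keeping the rest. It does not replace the settings in the webmaster tools; it only illustrates the idea, and it could be handy when you generate the href for your rel=canonical tag. The parameter names sessionid and trackingcode are hypothetical, as is the function name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; list the ones your own site really uses.
IGNORED_PARAMS = {"sessionid", "trackingcode"}


def strip_ignored_params(url: str) -> str:
    """Remove ignored query parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == "__main__":
    print(strip_ignored_params(
        "http://www.mysite.com/products.html?item=42&sessionid=abc123"))
```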

The goal of your canonicalization strategy is one URL per one webpage. By funneling all of the search engine’s attention to one URL per page, you will earn the highest possible ranking value for that page. There’s a lot more to do in SEO than canonicalization, but this is an important success factor. We’ll continue this conversation in the days and weeks to come. Stay tuned!

