Canonical links: how to avoid SEO chaos
Definition
A canonical page is the page that a search engine identifies as the primary among a group of pages with similar content.
A canonical link is a link that points to the canonical page and includes the rel attribute with the value canonical: <link rel="canonical" href="link"/>.
A non-canonical page is a page that includes the rel="canonical" attribute with the address of another page.
What the canonical attribute looks like
The rel="canonical" attribute can be specified in two ways:
- <link rel="canonical" href="link" /> — in the <head> section of the page.
- Link: <link>; rel="canonical" — in the HTTP header.
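Both forms can be read programmatically. The following is a minimal sketch, not a full implementation: the class and function names are illustrative, and the Link header parsing is deliberately simplified (it assumes the usual comma-separated layout of link entries).

```python
import re
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collects href values of <link rel="canonical"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_tokens and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonical_from_html(html):
    """Return every canonical href declared in the <head> of an HTML string."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonicals


def canonical_from_link_header(header_value):
    """Return the target of the first rel="canonical" entry in a Link header value, or None."""
    match = re.search(
        r'<([^>]+)>[^,]*?;\s*rel\s*=\s*"?canonical"?', header_value, re.IGNORECASE
    )
    return match.group(1) if match else None
```

A list with more than one entry from canonical_from_html signals a page that declares several canonical addresses, which is worth reviewing.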
The Canonicalization Process
Canonicalization is the process of selecting the primary page among duplicates (identical pages available at different addresses) or among pages with similar content.
Why canonical links are important for SEO
- Avoiding duplicate content.
Search engines do not favor duplicate content because it clutters search results, and their algorithms can struggle to identify the main page correctly. The rel="canonical" attribute indicates which URL should be indexed. Google notes that it does not always honor the specified canonical address, because canonical tags are hints rather than directives. Proper use of canonical tags nevertheless reduces the risk of the bot choosing the wrong canonical page.
- Efficient use of search engines' crawl budget.
A large amount of duplicate content wastes the crawl budget: search engines spend their resources scanning non-unique pages instead of finding new or updated content. With proper setup, search bots crawl non-canonical pages much less frequently than canonical ones.
- Consolidation of traffic.
The canonical attribute helps consolidate traffic to identical or repetitive pages. It lets the search engine collect all the information about the different pages (e.g., links to them) and associate it with one URL. For example, links to site.com/air-conditioners/red?gclid=123 can be combined with links to site.com/air-conditioners.
- Accuracy of data in Google Search Console.
Data in the Performance report in Google Search Console has been tied to canonical addresses since 2019. To get accurate data from the report, you therefore need to specify the correct canonical pages.
- Protection against spam.
Setting a canonical for each page helps protect against spam in which competitors generate junk pages through GET parameters.
When to use canonical
There are several situations where the canonical attribute is necessary. In other cases, it can be used at your discretion.
For duplicate pages
Often, the same page can be accessible via different URLs. This happens because a section/product/service can belong to several categories. In this case, you need to choose one address as the main one and set canonical for other duplicate pages.
Example: In an online store, you can access a product page in three ways:
- site.com/lg/air-conditioners
- site.com/brand/lg/air-conditioners
- site.com/home-appliances/brand/lg/air-conditioners/
You can choose any of these as the canonical address, but it is better to choose the first or second option, as their depth level is less than that of the third. A study of ranking factors by Backlinko showed that shorter URLs correlate with higher positions in Google.
Sorting pages are also considered duplicates by search engines, as the order of content display does not change the actual content of the page:
- site.com/air-conditioners/?sort=name_desc
- site.com/air-conditioners/?sort=price_desc
- site.com/air-conditioners/?sort=new
- etc.
Additionally, canonical should be used in cases where the page content does not change after applying filters on the site.
Example: There is a "split systems" page that contains 5 models. After applying the "Cooling area up to 30 sq. m" filter, the page still shows the same 5 models. In this case, the content has not changed, so it is worth setting canonical to the parent page.
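The filter rule can be sketched as a comparison of the items shown before and after filtering. This is a minimal illustration under assumptions: the function name and the item identifiers are hypothetical, and treating a page with changed content as self-canonical is a choice made here, not something the rule above prescribes.

```python
def canonical_for_filtered_page(parent_url, filtered_url, parent_items, filtered_items):
    """Return the URL that the filtered page should declare as canonical.

    If the filter leaves the set of listed items unchanged, the filtered page
    duplicates its parent and points to it. Otherwise (assumption) the
    filtered page keeps a canonical that points to itself.
    """
    if set(filtered_items) == set(parent_items):
        return parent_url
    return filtered_url


# Hypothetical data: the same five models before and after the filter.
models = ["m1", "m2", "m3", "m4", "m5"]
print(canonical_for_filtered_page(
    "https://site.com/split-systems/",
    "https://site.com/split-systems/?area=30",
    models,
    models,
))
```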
For pages with similar content
If you have similar content at different addresses, you should also use canonical. For example, these can be products that differ only in color or size. In this case, one main page is selected, and the other pages point to it with canonical links. This method is worth applying when there is no search demand for queries such as "product+color" or "product+size".
For AMP pages
For pages created using AMP technology, you need to specify the main (non-AMP) page as the canonical address.
Example:
- URL: site.com/amp/air-conditioners
- Canonical: site.com/air-conditioners
Situations when canonical can be used
In the following cases, canonical is one of several possible solutions to a technical issue.
For dynamic URLs
Dynamic URLs appear on a site when identifiers and parameters are appended to an address by filters, spam, ad click-throughs, etc.
Examples:
- site.com/air-conditioners/inverter?color=red
- site.com/air-conditioners/inverter?gclid=ABCD
Such duplicates can be filtered out using canonical.
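One way to derive the canonical address for such URLs is to drop the parameters that do not change the content. The sketch below uses only the Python standard library; the IGNORED_PARAMS set is a hypothetical example and has to be adapted to the parameters a given site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed not to change the page content.
IGNORED_PARAMS = {"gclid", "color", "sort"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL without ignored query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # An empty query string is omitted by urlunsplit, so no trailing "?" remains.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_ignored_params("https://site.com/air-conditioners/inverter?gclid=ABCD"))
```

Parameters that are not in the set (for example, a page number) are kept, so pages whose content really differs keep distinct addresses.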
For copies of pages on multilingual and multi-regional sites
Versions of the same page in different languages are considered copies if the main content is written in one language, and only some text elements are translated. In this case, the main version should be specified as the canonical page.
Using rel="canonical" on pagination pages
Canonical on pagination pages can be set in two ways:
- If there is a general page that contains all the content from the entire pagination, the canonical tag on each pagination page points to it;
- Otherwise, each pagination page's canonical points to itself.
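The two options can be expressed as one small function. This is a sketch under assumptions: the "page" query parameter and the convention that the first page lives at the base URL are hypothetical and depend on how the site builds its pagination.

```python
def pagination_canonical(base_url, page, view_all_url=None):
    """Return the canonical address for a pagination page.

    If a page containing all items exists, every pagination page points to it.
    Otherwise each page points to itself.
    """
    if view_all_url is not None:
        return view_all_url
    if page <= 1:
        # Assumption: the first page is served at the base URL itself.
        return base_url
    # Assumption: later pages are addressed with a "page" query parameter.
    return f"{base_url}?page={page}"


print(pagination_canonical("https://site.com/air-conditioners/", 3))
```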
For individual print pages
Sometimes print versions exist as separate pages that have no value for search engines.
Examples:
- site.com/air-conditioners/
- site.com/air-conditioners/buy
Setting canonical to the parent page helps avoid duplication.
For merging pages
Canonical can be used to merge pages when the content is the same, but the URLs differ:
- with or without the www prefix: https://site.com and https://www.site.com
- http and https protocols: http://site.com and https://site.com
- presence or absence of a slash at the end of the URL: site.com/air-conditioners/ and site.com/air-conditioners
For different URL spellings
Pages can be identical in content and differ only in the capitalization of the address.
Examples:
- site.com/air-conditioners/lg/
- site.com/air-conditioners/LG/
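The protocol, www, trailing-slash, and letter-case variants can all be reduced to one form with a normalization step. The sketch below picks one possible convention (https, no www, lowercase path, no trailing slash); these choices are assumptions made for illustration, and lowercasing the path is only safe if the server treats addresses case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Map URL variants of the same page to one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Lowercase the path and drop the trailing slash; keep "/" for the home page.
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


for variant in (
    "http://www.site.com/air-conditioners/LG/",
    "https://site.com/air-conditioners/lg",
):
    print(normalize_url(variant))
```

Whichever convention is chosen, it should match the address the server actually serves, so the canonical never points to a URL that redirects elsewhere.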
How to specify the canonical address of a page
There are three main ways to specify a canonical page.
HTML code
The most popular way is to use the <link> tag in the <head> section of the HTML document:
<link rel="canonical" href="link to the canonical page" />
Note: canonical links should be set on all duplicate pages.
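When the tag is generated by a template or script, the address should be escaped before it is placed in the href attribute. A minimal sketch with a hypothetical helper name:

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the <link> tag for the <head> section, escaping the URL for HTML."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


print(canonical_link_tag("https://site.com/air-conditioners"))
```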
HTTP header
Canonicalization can be done for both regular HTML pages and electronic documents (PDF, DOC, XLS, etc.). For example, if a PDF file is available at different URLs, you need to specify the canonical one through the HTTP header as follows:
Link: <link to the canonical page>; rel="canonical"
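How the header is attached depends on the web server or framework, so the sketch below only builds the header as a name/value pair. The function name and the PDF address are hypothetical.

```python
from urllib.parse import urlsplit


def canonical_link_header(canonical_url):
    """Return the (name, value) pair of a Link header that declares the canonical URL."""
    parts = urlsplit(canonical_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("canonical URL must be absolute")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical document address, used only for illustration.
print(canonical_link_header("https://site.com/docs/manual.pdf"))
```

The pair can then be added to the response for every duplicate address under which the document is served.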
Sitemap.xml file
All pages in the sitemap are considered canonical by default. Therefore, there should be no duplicates in the sitemaps. Otherwise, search bots may get confused in choosing the canonical address. No attributes are needed to specify the canonical page.
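A sitemap that lists each canonical address exactly once can be generated from a list of URLs. This is a minimal sketch using the standard library; it assumes the input already contains only canonical addresses and removes exact repeats only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML with one <url> entry per distinct canonical URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in canonical_urls:
        if url in seen:
            continue  # skip exact repeats so no address is listed twice
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://site.com/air-conditioners",
    "https://site.com/air-conditioners",
    "https://site.com/split-systems",
]))
```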
How to check canonicalization
It is important to understand whether the specified canonical address is taken into account. There are two ways to do this:
Google Search Console
In Google Search Console, open the URL Inspection tool and enter the address of the page you want to check. The tool shows both the user-declared canonical (the URL specified in the rel="canonical" attribute) and the Google-selected canonical, so you can see whether they differ.
Using search engines
In Google, enter a query with the URL of the checked page in the search box:
info: site.com/link_to_checked_page
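The first result will be the canonical page selected by the search engine. Note that Google announced the retirement of the info: operator in 2019, so this check may no longer return results; in that case, rely on Google Search Console.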
Common Canonicalization Mistakes
The most common mistakes when using canonical links are listed below.
- Specifying a non-existent canonical page.
As a result, the search engine will simply ignore the canonical link. This happens most often due to typos or an incorrect protocol.
- Cross-referencing canonicals.
This happens when pages reference each other. For example, page A specifies page B as canonical, and page B specifies page A. The search engine will also ignore such tags.
- Setting canonical to irrelevant pages.
The canonical tag should always point to a page with content similar to that of the page on which it is set.
- Specifying multiple canonical pages.
The canonical attribute should be used only once per page. The search engine will ignore multiple attributes.
- Incorrect implementation of canonical for pagination.
Pointing the canonical of every pagination page to the first page is a common mistake. The correct way is to have each pagination page point to itself, or to point all of them to a consolidated page containing all pagination content.
- Using relative URLs in the canonical tag.
Always use absolute URLs in the canonical tag to avoid confusion. For example, use <link rel="canonical" href="https://www.example.com/page" /> instead of <link rel="canonical" href="/page" />.
- Setting canonical tags on noindexed pages.
Canonical tags should not be used on pages that are set to noindex. If a page is not meant to be indexed, using canonical tags is redundant and may confuse search engines.
- Canonicalizing to the wrong language version.
Ensure that the canonical tag points to the correct language version of the page. For example, the English version should have a canonical tag pointing to the English page, not a page in another language.
- Ignoring URL parameters in canonical tags.
URL parameters can significantly change the content of a page. Be careful when setting canonical tags on pages with URL parameters to ensure that they point to the correct canonical version without the parameters if appropriate.
- Not updating canonical tags after site migrations.
After a site migration or URL structure change, it's crucial to update all canonical tags to reflect the new URLs. Failing to do so can lead to broken links and indexing issues.
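Several of these mistakes can be detected automatically once the canonical declarations of a site have been collected. The sketch below assumes a hypothetical input format (a dictionary that maps each crawled page URL to the list of canonical addresses found on it) and treats a target missing from that dictionary as a stand-in for a non-existent page, which only holds if the crawl covers the whole site.

```python
from urllib.parse import urlsplit


def audit_canonicals(declared):
    """Return (page, issue) pairs for common canonical mistakes.

    declared maps each page URL to the list of canonical hrefs found on it.
    """
    issues = []
    for page, targets in declared.items():
        if len(targets) > 1:
            issues.append((page, "multiple canonical tags"))
            continue
        if not targets:
            continue  # no canonical declared; nothing to check here
        target = targets[0]
        if not urlsplit(target).scheme:
            issues.append((page, "relative canonical URL"))
        elif target not in declared:
            issues.append((page, "canonical target not among known pages"))
        elif target != page and declared[target] == [page]:
            issues.append((page, "cross-referencing canonicals"))
    return issues


# Hypothetical crawl result used for illustration.
crawl = {
    "https://site.com/a": ["https://site.com/b"],
    "https://site.com/b": ["https://site.com/a"],
    "https://site.com/c": ["/c"],
}
for page, issue in audit_canonicals(crawl):
    print(page, "->", issue)
```

The check covers only two-page loops; longer chains of canonicals would need the targets to be followed further.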
Best Practices for Canonicalization
To ensure effective use of canonical tags, follow these best practices:
- Audit your site regularly.
Check for duplicate content and confirm that all canonical tags are correctly implemented and point to the appropriate pages.
- Use consistent URL structures.
Maintain a consistent URL structure throughout your site to simplify canonicalization and reduce the chances of duplicate content.
- Combine canonical tags with other SEO techniques.
Canonical tags are just one tool in your SEO toolkit. Use them alongside other techniques like 301 redirects, meta robots tags, and sitemaps to manage duplicate content effectively.
- Test changes before deploying.
Before implementing canonical tags site-wide, test them on a few pages to ensure they work as expected and do not negatively impact your site's SEO performance.
- Monitor changes in Google Search Console.
Use Google Search Console to monitor how Google is interpreting your canonical tags. Check the Coverage and Performance reports regularly to identify any issues or unexpected behavior.
- Educate your team.
Ensure that everyone involved in your website's development and content management understands the importance of canonical tags and how to implement them correctly.
Conclusion
Canonicalization is a crucial aspect of SEO that helps manage duplicate content, consolidate traffic, and improve the accuracy of data in tools like Google Search Console. By understanding when and how to use canonical tags, you can ensure that search engines correctly identify the primary pages on your site, improving your site's visibility and performance in search results. Avoid common mistakes, follow best practices, and regularly audit your site to maintain effective canonicalization.
By correctly implementing canonical tags, you not only help search engines understand your site's structure but also improve the overall user experience by guiding users to the most relevant and authoritative version of your content.