
Duplicate content is one of the most common technical SEO problems, and one of the most damaging to entity authority. When multiple URLs on your site (or across the web) serve the same or substantially similar content, Google must decide which version to index. If it chooses the wrong one, your entity signals on the preferred page get diluted or ignored entirely.

Canonical tags, hreflang attributes, and proper URL management are the tools you use to tell Google which version of your content is the authoritative one. They focus your entity signals on the right pages instead of spreading them thin across duplicates.

How Duplicate Content Affects Entity Signals

flowchart TD
    A["Page A: /about/<br/>Has Organization schema,<br/>entity description, sameAs"] --> D["Google Finds<br/>Duplicate Content"]
    B["Page B: /about/index.html<br/>Same content, same schema"] --> D
    C["Page C: /about/?ref=email<br/>Same content with URL parameter"] --> D
    D --> E{"Which URL<br/>is canonical?"}
    E -->|You specify| F["Canonical tag on A<br/>Entity signals consolidated"]
    E -->|Google guesses| G["Google picks B or C<br/>Entity signals may be split"]
    F --> H["Strong Entity Signal<br/>on preferred URL"]
    G --> I["Diluted Entity Signal<br/>across multiple URLs"]
    style A fill:#222221,stroke:#6b8f71,color:#ede9e3
    style F fill:#222221,stroke:#6b8f71,color:#ede9e3
    style H fill:#222221,stroke:#6b8f71,color:#ede9e3
    style G fill:#222221,stroke:#c47a5a,color:#ede9e3
    style I fill:#222221,stroke:#c47a5a,color:#ede9e3

In the diagram above, three URLs serve the same About page content. Without a canonical tag, Google must guess which URL is the "real" one. If it guesses wrong, the entity signals on your preferred URL may not be the ones Google indexes.

Key concept: Canonical tags do not prevent crawling. They tell Google which URL should be the indexed version. Google treats the canonical tag as a strong hint rather than an absolute directive, but in most cases it respects a properly implemented one.

Common Canonicalization Issues

| Issue | Example | Entity Impact | Fix |
| --- | --- | --- | --- |
| www vs. non-www | https://example.com and https://www.example.com both serve content | Entity signals split between two hostnames | 301 redirect one to the other. Set canonical on all pages. |
| HTTP vs. HTTPS | http://example.com and https://example.com both accessible | Entity signals split. Also a security issue. | 301 redirect HTTP to HTTPS. |
| Trailing slash vs. no trailing slash | /about/ and /about both serve content | Minor signal dilution | Choose one format, redirect the other. Set canonical. |
| URL parameters | /about/?utm_source=email and /about/ serve same content | Tracking parameters create duplicate URLs | Set canonical to the clean URL (without parameters). Google Search Console's URL Parameters tool was retired in 2022, so rely on canonical tags and consistent internal links. |
| Index file variations | /about/, /about/index.html, /about/index.php all serve same page | Multiple indexed versions of entity pages | Redirect all variations to one canonical URL. |
| Pagination duplicates | /blog/ and /blog/page/1/ serve identical content | Minor. Affects blog pages more than entity pages. | Canonical /blog/page/1/ to /blog/. Google no longer uses rel="next"/"prev" as an indexing signal. |
| Print or AMP versions | /about/ and /about/print/ or /about/amp/ serve same content | Duplicate entity content across versions | Canonical to the main version. |
| Case sensitivity | /About/ and /about/ treated as different URLs by some servers | Duplicate content with different URLs | Standardize to lowercase. Redirect uppercase variations. |
| Session IDs in URLs | /about/?sessionid=abc123 creates unique URL per visitor | Potentially infinite duplicate URLs | Remove session IDs from URLs. Use cookies instead. |
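
Most of the fixes in the table come down to mapping every variation of a URL onto one preferred form. The sketch below shows one way that mapping could look. The policy it encodes (https, non-www, lowercase paths, trailing slash, no tracking or session parameters) is an assumption chosen for illustration, and the names normalize_url, TRACKING_PARAMS, and INDEX_FILES are hypothetical, so adapt them to whatever format your site has chosen.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed policy for this sketch: https, non-www host, lowercase path,
# trailing slash on directory-style paths, tracking/session parameters dropped.
TRACKING_PARAMS = {"ref", "sessionid", "gclid", "fbclid"}
INDEX_FILES = ("index.html", "index.htm", "index.php")


def normalize_url(url: str) -> str:
    """Map a URL variation onto the assumed canonical format."""
    parts = urlsplit(url)

    # www vs. non-www: this sketch prefers the bare hostname.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Case sensitivity: standardize the path to lowercase.
    path = parts.path.lower() or "/"

    # Index file variations: /about/index.html -> /about/
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Trailing slash: add one unless the last segment looks like a file.
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"

    # URL parameters: keep only parameters that are not tracking or session IDs.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS and not key.lower().startswith("utm_")
    ]

    # HTTP vs. HTTPS: always emit https, and drop any fragment.
    return urlunsplit(("https", host, path, urlencode(query), ""))


print(normalize_url("http://www.example.com/About/index.html?utm_source=email"))
```

A function like this is useful for generating the href of a canonical tag or for deciding where a redirect should point. It does not replace the redirects themselves, which still have to be configured on the server.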

Implementing Canonical Tags

A canonical tag is an HTML element placed in the <head> section of a page that specifies the preferred URL for that content.

<link rel="canonical" href="https://example.com/about/" />

Every page on your site should have a self-referencing canonical tag (pointing to its own URL) at minimum. This tells Google "this is the definitive URL for this content" even when no duplicate exists. It is a preventive measure.

For pages that are duplicates of another page, the canonical tag should point to the original:

<!-- On the duplicate page /about/?ref=email -->
<link rel="canonical" href="https://example.com/about/" />

Canonical Tag Rules

| Rule | Explanation |
| --- | --- |
| Every page needs a canonical tag | Self-referencing canonicals prevent future duplicate issues |
| Use absolute URLs | Always include the full URL with protocol and domain |
| Canonical must return 200 status | Do not canonical to a page that redirects or returns a 404 |
| Canonical must be indexable | Do not canonical to a noindex page |
| One canonical per page | Multiple canonical tags confuse Google. Use only one. |
| Match canonical with sitemap | The URL in your sitemap should be the canonical version |
| Match canonical with internal links | Link to the canonical URL, not duplicate variations |
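
Some of these rules can be checked directly from a page's HTML. The following is a minimal sketch using only Python's standard library. It covers three rules (a canonical tag exists, there is only one, and it uses an absolute URL). The status-code, indexability, sitemap, and internal-link rules need HTTP requests or a crawl and are not covered here. The names CanonicalCollector and canonical_findings are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here as well.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attributes.get("href") or "")


def canonical_findings(html: str) -> list[str]:
    """Return a list of rule violations found in the HTML (empty if none)."""
    collector = CanonicalCollector()
    collector.feed(html)

    if not collector.hrefs:
        return ["no canonical tag"]

    findings = []
    if len(collector.hrefs) > 1:
        findings.append("multiple canonical tags")

    # An absolute URL has both a scheme (https) and a host (example.com).
    parts = urlsplit(collector.hrefs[0])
    if not (parts.scheme and parts.netloc):
        findings.append("canonical is not an absolute URL")
    return findings


page = '<head><link rel="canonical" href="/about/" /></head>'
print(canonical_findings(page))
```

The sketch does not verify that the tag sits inside the <head> element, so treat a clean result as a first pass rather than a full validation.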

Hreflang for Multilingual Sites

If your entity operates in multiple languages or regions, hreflang tags tell Google which language version of a page to show to which users. Without hreflang, Google may show the English version to French users, or the US version to UK users.

<link rel="alternate" hreflang="en" href="https://example.com/about/" />
<link rel="alternate" hreflang="id" href="https://example.com/id/about/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/about/" />

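Hreflang implementation is complex and error-prone. Common mistakes include missing return tags (each page referenced in an hreflang set must link back to the pages that reference it) and hreflang URLs that are not the canonical version of the page.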

For entity authority, hreflang ensures that your entity signals reach the right audience. If you have an Indonesian About page and an English About page, each with appropriate schema markup, hreflang tells Google to show the Indonesian version to Indonesian users and the English version to English users.
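
Hreflang annotations are expected to be reciprocal: if the English page lists the Indonesian page as an alternate, the Indonesian page should list the English page in return. The sketch below checks that property on data you have already collected. The input shape (a dictionary mapping each page URL to the hreflang entries declared on it) and the name missing_return_tags are assumptions made for this example, not part of any tool.

```python
def missing_return_tags(alternates: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Find hreflang references that are not reciprocated.

    alternates maps a page URL to the {hreflang code: target URL} pairs
    declared on that page. Returns (source, target) pairs where the target
    page does not declare the source page as one of its alternates.
    """
    missing = []
    for source, declared in alternates.items():
        for target in set(declared.values()):
            if target == source:
                continue  # self-references need no return tag
            declared_on_target = alternates.get(target, {})
            if source not in declared_on_target.values():
                missing.append((source, target))
    return sorted(missing)


# Hypothetical site: the Indonesian page forgets to reference the English page.
site = {
    "https://example.com/about/": {
        "en": "https://example.com/about/",
        "id": "https://example.com/id/about/",
        "x-default": "https://example.com/about/",
    },
    "https://example.com/id/about/": {
        "id": "https://example.com/id/about/",
    },
}
print(missing_return_tags(site))
```

Each pair in the result names a page that declares an alternate and the alternate that fails to link back, which is where the missing tag needs to be added.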

flowchart LR
    A["User in Indonesia<br/>searches brand name"] --> B{"hreflang<br/>configured?"}
    B -->|Yes| C["Google serves<br/>/id/about/<br/>Indonesian entity signals"]
    B -->|No| D["Google guesses<br/>May serve English page"]
    E["User in US<br/>searches brand name"] --> B
    B -->|Yes| F["Google serves<br/>/about/<br/>English entity signals"]
    B -->|No| D
    style C fill:#222221,stroke:#6b8f71,color:#ede9e3
    style F fill:#222221,stroke:#6b8f71,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3

Only 55% of websites implement self-referencing canonical tags, and only 38% properly handle URL parameters. If you implement all five practices in the chart above, your entity signals will be significantly more focused than the average website.

Auditing Your Canonicalization

To audit your site's canonicalization:

  1. View the source of every entity-critical page. Check for a <link rel="canonical"> tag.
  2. Verify the canonical URL uses the correct protocol (https://), domain (www or non-www, whichever you chose), and path (with or without trailing slash).
  3. Try accessing your entity pages with different URL variations (www, non-www, with and without trailing slash, with a random parameter like ?test=1). Each variation should either redirect to the canonical URL or carry a canonical tag pointing to it.
  4. Check Google Search Console's "Duplicate" entries in the Pages report. These show where Google has found duplicates and which canonical it selected.
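
Step 3 is easier to repeat if the URL variations are generated rather than typed by hand. The sketch below builds the list of variations for one canonical URL, using the same ?test=1 parameter suggested above. The function name url_variations is hypothetical, and the code only produces the URLs. Requesting them and inspecting the redirect or canonical tag is left to whichever HTTP client or browser you prefer.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variations(canonical: str) -> list[str]:
    """List duplicate-prone variations of a canonical URL to test by hand or script."""
    parts = urlsplit(canonical)

    # www and non-www forms of the host.
    host = parts.netloc
    bare_host = host[4:] if host.startswith("www.") else host
    hosts = [bare_host, "www." + bare_host]

    # With and without trailing slash (the root path only has one form).
    path = parts.path or "/"
    paths = {path}
    if path != "/":
        paths.add(path.rstrip("/") if path.endswith("/") else path + "/")

    variants = set()
    for scheme in ("http", "https"):
        for candidate_host in hosts:
            for candidate_path in paths:
                # Clean URL and the same URL with a throwaway parameter.
                variants.add(urlunsplit((scheme, candidate_host, candidate_path, "", "")))
                variants.add(urlunsplit((scheme, candidate_host, candidate_path, "test=1", "")))

    # The canonical URL itself is not a variation.
    variants.discard(canonical)
    return sorted(variants)


for url in url_variations("https://example.com/about/"):
    print(url)
```

Every URL in the output should either redirect to the canonical URL or serve a page whose canonical tag points to it.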

Assignment

  1. View the source of your homepage, About page, Contact page, and Services page. Does each have a self-referencing canonical tag? If not, add one to each page.
  2. Test URL variations for your homepage: try with and without www, with and without trailing slash, and with a random URL parameter (?test=1). For each variation, check: does it redirect to the canonical URL, or does it serve duplicate content?
  3. Open Google Search Console and check the Pages report for any duplicate content issues. Record the number of pages flagged as "Duplicate without user-selected canonical" and "Duplicate, submitted URL not selected as canonical."
  4. If your site has multiple language versions, audit your hreflang implementation. Verify that return tags exist on every referenced page and that all hreflang URLs are canonical.
  5. Create a canonicalization policy document for your site: which URL format is canonical (www or non-www, trailing slash or not), how URL parameters should be handled, and which pages need explicit canonical tags.