Seeing double: the SEO dangers of duplicate content

As the world’s most ubiquitous search engine, Google’s mission is to deliver relevant, comprehensive and specialised information in response to any question or phrase. As a result, it prioritises sites that can offer unique, original content combined with site authority. Correspondingly, it deprioritises pages that are derivative and/or unoriginal – that is, those high in duplicate content.

Duplicate content is a particular problem for ecommerce sites, especially in the context of product descriptions. The Primary Content that sits on your product and category pages – that is, the essential information that drives organic traffic and turns browsers into buyers, including product and category descriptions – is the most important text on your website. Despite this, too many etailers ‘make do’ with unoriginal content, diminishing not only their site’s SEO but also the customer journey.

Here we explore why this common pitfall matters, how it arises and how to avoid it.

What is duplicate content and why is it bad?

Here is Google’s definition of duplicate content:

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

– Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices.

– Store items shown or linked via multiple distinct URLs.

– Printer-only versions of web pages.

While Google has repeatedly stated that it does not penalise ‘innocent’ duplicate content (i.e. non-spam sites), it certainly doesn’t reward it either. What it does favour is unique Primary Content, and the corresponding added value that it offers a reader.

And although Google may not exactly punish duplicate content, it does overlook it. Google’s algorithm filters out duplicates so that, as a rule, only one version of a text appears in any one Search Engine Results Page (SERP). While this is fine for the one entry Google decides is ‘canonical’ – i.e. the original or otherwise most valuable iteration of this information – it means that any pages deemed non-canonical are effectively hidden from search results. As a result, duplicate pages across the same domain often cannibalise one another’s traffic and SEO value, reducing the overall impact of your site in the SERPs.

A sub-category of duplicate content is ‘thin’ content. Here is Google’s take again:

Google believes that pure, or ‘thin’, affiliate websites do not provide additional value for web users, especially (but not only) if they are part of a program that distributes its content across a network of affiliates. These sites often appear to be cookie-cutter sites or templates where the same or similar content is replicated within the same site, or across multiple domains or languages. Because a search results page could return several of these sites, all with the same content, thin affiliates create a frustrating user experience.

Examples of thin affiliates:

– Pages with product affiliate links on which the product descriptions and reviews are copied directly from the original merchant without any original content or added value.

– Pages of product affiliation where the majority of the site is made for affiliation and contains a limited amount of original content or added value for users.

As you can see, Google specifically calls out ecommerce sites that copy/paste manufacturer information, as well as aggregator and spin-off sites. According to reports by Sistrix and Search Engine Land on Google’s 2017 ‘Fred’ update, thin content fared poorly in the SERPs, and some low-value, content-heavy sites saw their Google visibility drop by 50-90% following the update.

So what are the common duplication errors found on ecommerce websites?

We can see that duplicate or ‘thin’ content brings little to no SEO value to your ecommerce site; in some cases, it can even damage its performance in Google search. The problem is, this kind of content is everywhere – particularly in ecommerce.

Here are four common causes of duplication on etail sites:

1. Duplicate URLs

It’s fairly common for the same product to be reachable via more than one URL path on an ecommerce site. For example, if a sofa comes in multiple colours or fabrics, each variant might end up on its own page:

/sofas/two-seater/red

/sofas/two-seater/blue

Because your description copy across these products is bound to be similar or even identical, they count as duplicate pages – even if the images are different.

Furthermore, there are many ways in which web developers can unwittingly duplicate content across similar – but not identical – URLs. For example, serving pages over the secure https protocol, rather than http, is common on ecommerce platforms. But while Google favours sites that use https encryption, a move to https that leaves the old http pages accessible alongside the new ones effectively duplicates pretty much every page, including products.

The result is duplication on a site-wide level. Even where sign-ins are meant to keep crawlers out of an etailer’s https pages, mistakes in web development can still expose swathes of these encrypted pages to bots, leaving search engine crawlers unsure which version is the original, canonical page. Again, Google cannot determine the ‘true’ origin of the content, and the site begins to cannibalise its own traffic.
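A simple way to spot this kind of protocol- or host-level duplication on your own site is to normalise a list of crawled URLs and see which ones collapse into the same key. The Python sketch below makes several assumptions: https is the preferred scheme, and neither the `www.` prefix nor a trailing slash changes the page. The domain `example-shop.com` and the function names are hypothetical. It is an auditing aid only, not a description of how Google itself canonicalises URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url):
    """Collapse common non-semantic URL differences into one comparison key.

    Illustrative assumptions: https is preferred, 'www.' is optional,
    host case is irrelevant and a trailing slash does not change the page.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Paths stay case-sensitive: many servers treat /Sofas and /sofas differently.
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string, since it may select different content.
    return urlunsplit(("https", host, path, parts.query, ""))


def find_duplicate_urls(urls):
    """Group crawled URLs whose normalised form is identical."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise_url(url)].append(url)
    # Only keys reached through more than one URL are duplicates.
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical crawl output for an imaginary shop.
crawled = [
    "http://www.example-shop.com/sofas/two-seater/",
    "https://example-shop.com/sofas/two-seater",
    "https://example-shop.com/sofas/armchair",
]
for key, urls in find_duplicate_urls(crawled).items():
    print(key, "<-", urls)
```

Each group it reports is a candidate for a canonical tag or a 301 redirect to the single preferred URL.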

2. Similar products 

If two different products are similar in function, or come from the same manufacturer, it can be tempting to produce product descriptions as quickly as possible by copying and pasting text between items.

Unfortunately, even if slight alterations are made between descriptions, Google can still recognise the copy’s origin. Your product descriptions are then likely to count as thin content, offering neither unique nor specialised information to your user.
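You can measure the overlap between two descriptions yourself to see why light edits don't disguise copied text. The sketch below uses w-shingling with Jaccard similarity, a standard textbook measure of near-duplication. Google has not published its own method, so treat this as an illustration rather than a model of its algorithm. The product copy and the choice of three-word shingles are assumptions for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Share of k-word shingles two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty (or very short) texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical copy for two colour variants, differing by a single word.
red_copy = ("A compact two-seater sofa with deep cushions, solid oak legs "
            "and a removable red velvet cover.")
blue_copy = ("A compact two-seater sofa with deep cushions, solid oak legs "
             "and a removable blue velvet cover.")

print(round(jaccard_similarity(red_copy, blue_copy), 2))  # 0.67
```

Swapping one word leaves two-thirds of the shingles shared in this short example. In a longer description, the same one-word change pushes the score even closer to 1.0.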

3. Multi-brand (aggregator) sites & marketplaces

Marketplace and multi-brand or aggregator platforms are off to a bad start in terms of original content. Many of these websites lack the resources to create their own unique content for ever-expanding product ranges. As a result, some end up copy/pasting merchant or manufacturer information onto products to fill otherwise empty space.

This is, of course, the definition of duplicate content. Since Google will probably prefer the original, canonical manufacturer version, these content-thin sites will find their pages automatically filtered out of many product searches.

Obvious exceptions to this rule are marketplace behemoths such as Amazon or Etsy, whose brands are bigger than those of the products they stock. This results in a different problem…

4. Copycat product feeds

If you’re on the other side of the equation as a brand, supplying product feeds to an influential aggregator, marketplace or social commerce site, it can be tempting to pass on your lovingly crafted, unique description copy with it.

Don’t risk it. Large multi-brand domains will almost always carry more authority in search engines than your brand site – unless you, too, are a powerful Google entity. Either way, this creates strong competition between their product page and the one on your own site, putting your original content – now potentially judged ‘duplicate’ – at risk of being pushed out of the SERPs.

How can SEO teams tackle the issue of duplicate content?

Fortunately, there are a few ways to minimise the effect of duplicate content on your website’s performance, depending on your issue.

1. Canonicalisation

If you have two pages on your site featuring similar or identical text and you can’t get rid of either, there are ways of indicating to Google which should be indexed. This process, known as canonicalisation, helps the search engine understand what is duplicated and what is ‘original’ – that is, what you want it to take notice of, and what you want it to ignore.

Access the HTML of the non-canonical page – that is, the duplicate – and place the following tag in its <head> section:

<link rel="canonical" href="http://www.preferredURL.com/morevaluable">

Google treats this tag as a strong signal when crawling your site and will generally prioritise the canonical URL in search results, although it is a hint rather than a binding directive. You can also add a self-referential canonical tag to the preferred page itself, reinforcing which URL you want indexed.
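If you manage many page templates, check that each duplicate page actually carries the tag you intended. The Python sketch below uses the standard library's `html.parser` to extract `<link rel="canonical">` values. The helper names and the example-shop.com URLs are hypothetical. The check that each page declares exactly one canonical reflects commonly recommended practice rather than anything stated above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, and attribute values can be None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonicals(html):
    """Return all canonical URLs declared in an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


def check_canonical(html, expected_url):
    """Report whether a page declares exactly one canonical pointing at expected_url."""
    found = find_canonicals(html)
    if len(found) != 1:
        return f"expected one canonical tag, found {len(found)}"
    if found[0] != expected_url:
        return f"canonical points to {found[0]}, expected {expected_url}"
    return "ok"


# Hypothetical duplicate (blue variant) page pointing at its preferred version.
duplicate_page = """
<html><head>
  <link rel="canonical" href="https://www.example-shop.com/sofas/two-seater">
</head><body>Blue two-seater sofa</body></html>
"""
print(check_canonical(duplicate_page, "https://www.example-shop.com/sofas/two-seater"))  # ok
```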

Canonicalisation has its limits, most crucially when a different site is duplicating your content. Although Google supports cross-domain canonicalisation, you’ll have your work cut out convincing external domains to add canonical tags to every page using your content, directing SEO benefits back to you.

2. 301 redirect

According to Google, this is a faster and more effective way of achieving what canonical tags accomplish, although, again, you can only apply it to URLs on your own domain. Using permanent 301 redirects, SEOs can ensure that anything requesting URL A (including Googlebot and visitors arriving via external links) is sent to URL B instead, effectively retiring the page at URL A.

301 redirects are a great way to ensure that your users, as well as Google, only ever reach one version of a duplicate page. Furthermore, the link equity (‘link juice’) and other ranking signals accumulated by URL A are passed on to URL B. Unfortunately, 301s aren’t always possible – for example, when product pages are similar but not exactly the same, and you need both versions available to the consumer. In that case, canonical tags may be the better option.
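As a minimal sketch of the mechanism, the WSGI application below handles requests for two hypothetical duplicate sofa URLs. It answers each with a 301 and a `Location` header pointing at the preferred page, and serves every other path normally. The paths and the `REDIRECTS` table are assumptions for illustration. On a real store, redirects usually live in the web server, CDN or ecommerce platform configuration instead.

```python
# Hypothetical table of duplicate paths and the single URL each should resolve to.
REDIRECTS = {
    "/sofas/two-seater/red": "/sofas/two-seater",
    "/sofas/two-seater/blue": "/sofas/two-seater",
}


def app(environ, start_response):
    """Minimal WSGI app: permanently redirect known duplicates, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: browsers and crawlers should use the target from now on.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    body = f"Product page for {path}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()` on the result. A request for `/sofas/two-seater/red` should then land on `/sofas/two-seater`.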

Small tweaks in how you curate your URLs can yield real results in the battle against duplication. Products in different fabrics or colours that are otherwise identical should be displayed on the same page, not separate URLs. Differing URL paths that display the same product should 301 redirect to a single prioritised version, pooling all cumulative link juice and SEO authority.
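You can start consolidating variants by generating a redirect map from your existing URLs. The sketch below assumes, purely for illustration, that each variant appears as a final path segment naming a colour or fabric, as in /sofas/two-seater/red. The `VARIANT_SEGMENTS` vocabulary and the function names are hypothetical. The sketch also flags redirect chains, because pointing a URL at another redirect adds an unnecessary hop.

```python
# Hypothetical vocabulary of colour/fabric segments; a real store would
# derive this from its product data rather than hard-coding it.
VARIANT_SEGMENTS = {"red", "blue", "grey", "linen", "velvet"}


def build_variant_redirects(paths):
    """Map each variant URL path to its parent product path.

    Assumes a variant is the last path segment, e.g. /sofas/two-seater/red.
    """
    redirects = {}
    for path in paths:
        segments = path.strip("/").split("/")
        if len(segments) > 1 and segments[-1].lower() in VARIANT_SEGMENTS:
            redirects[path] = "/" + "/".join(segments[:-1])
    return redirects


def find_redirect_chains(redirects):
    """Return source paths whose target is itself redirected (A -> B -> C)."""
    return [source for source, target in redirects.items() if target in redirects]


paths = [
    "/sofas/two-seater/red",
    "/sofas/two-seater/blue",
    "/sofas/two-seater",
    "/sofas/armchair/",
]
redirects = build_variant_redirects(paths)
print(redirects)
print(find_redirect_chains(redirects))
```

Once the variant URLs redirect to the parent page, colour or fabric becomes an option the shopper selects on that single page.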

3. Invest in unique content

The simplest way to search-optimise your Primary Content and avoid duplication is to be smarter about how you create and distribute it.

In a perfect world, every product description on your site would be individually crafted to a common brief. By providing unique, detailed information to your consumer at every touch point, you encourage Google to rank your pages higher in the SERPs. Ideally, this content should then be regularly updated and replaced to stay ahead of competitor copycats. Fresh, high-quality, regularly updated copy can also lift both traffic and conversion.

Furthermore, while you can’t realistically prevent other sites from copy/pasting your original product content, you can stop serving it up to aggregators and marketplaces in product feeds and inventory listings. We at Quill offer a content variant add-on to meet this need; along with our core product description-writing service, we create unique variants for use in your data feeds, either in the core brand tone of voice or adapted for the appropriate site destination.

Of course, writing original content for every page of your site – and keeping that content routinely updated – is, for most ecommerce businesses, a gargantuan task. With hundreds or even thousands of products to update at any one time, content production at an adequate speed and scale is a significant operational challenge.

In these cases, outsourcing content creation can be an effective solution, answering not only questions of duplication and SEO but user experience too. Get in contact to discover how Quill can leverage its unique Cloud technologies and 2500-strong freelancer Network to battle on-site duplication and help your brand conquer the SERPs.
