OK, this isn’t a rant about my site, although we have been affected. It’s more of a helpful guide for anyone whose site has problems that normal SEO recovery strategies haven’t fixed. Most SEO companies will concentrate on cleaning up backlinks, but few will go deep into your site to reveal hidden problems that you may not have thought of. This whole article is therefore about duplicate content, an area that should be simple, right? Well, it’s not. Depending on your web developer it can be a huge nightmare, as it has been for us.
1. Basic Content Duplication
This is a fairly simple area to tackle. It involves keeping all of your content clean and original, which is an essential best practice for e-commerce sites, yet copied content is one of the main causes of duplication. When we upload new styles to our website we are often given a spreadsheet from our supplier with a basic description of the article. We used to just copy and paste the content from that spreadsheet directly into the description box on our site. What’s wrong with that, you may ask? Well, it is not your content, is it? It was not written by you and in all probability it already exists somewhere else on the web, maybe on the manufacturer’s website. You need to come up with your own description, written by you or a member of staff. I would advise reading the original piece, then putting it to one side and writing your own from memory. If you copy other people’s content then your page will suffer the consequences and drop down the search listings. One tip: don’t just swap the sentences around. I caught one of our chaps doing that the other day and gave him a slap! Rewrite it completely.
Make it an interesting read as well. Google likes ‘sharability’ and ‘enjoyment’, so if someone enjoys reading the content they are more likely to want to share it with their friends. The content must also be peppered with the keywords that you want the page to rank for, for lots of reasons. You need to convey as much as possible about the article in question: what it’s made of, the lining and sole, what it can be worn with and why it would look good. Those keywords will help with the on-page SEO, but also, when you submit a product feed to Google through the Merchant Center, you need to make sure that your product will be found when people are searching. PLAs, or Product Listing Ads, will not automatically select your product just because you have uploaded it and paid a certain PPC. You need to use those two- or three-word key phrases in the copy so that Google will select your product over a competitor’s. If a key search term is Pink Waterproof Wellies and you sell one, then put that term in the copy even if you have to change the tense.
E.g. “This is one of Hunter’s best-selling Pink Waterproof Wellies…”
This goes for all of the static pages on your website as well. Brand information is one area where it is easy to fall foul of this. Sure, go to the Rockport website and read about the company, when it was started, by whom and who it’s owned by now, then go away and rewrite it. Make it interesting.
Make your content original, informative and entertaining.
2. Duplicate Development Sites
Back in April 2011 our infamous web developer set up a development area for our website. They asked me to set up a subdomain of our main site and point it at our server (using the A record), and they would use that for all stages of any new development. This was “dev.shoes.co.uk”. The problem was that they neglected to put a noindex/nofollow tag in the header of this version of our website! This resulted in every page being duplicated in Google’s index. The result was the start of the site’s decline in the index and a major factor in its subsequent poor performance in search. I was a bit wet behind the ears with SEO at the time and this completely passed me by. It would be more than a year before I spotted the pages in the search results and had the tags put in place.
I have never seen a warning about a problem like this in anything I have read. You may think that this is basic stuff and no right-minded developer would do this, but they did, so check yours.
Where is the dev area for your website, and can Google find it? If it is searchable then block it.
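As a rough way of checking this for yourself, here is a minimal Python sketch that looks for a robots meta tag containing noindex in a page’s HTML. The class and function names are my own invention for illustration, and it assumes you have already saved or fetched the HTML of a dev page yourself. It only checks the meta tag, not robots.txt or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_blocked_from_index(html):
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

If this returns False for a page on your dev subdomain, that page is a candidate for turning up in search alongside the live one.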
3. Duplicate URLs
It is really easy for a developer to fall foul of this and create duplicate URLs without even realising it, and I am going to show you a few ways it can happen. It is a really effective way of getting pages that share the same content under different URLs dumped far down in search.
Most websites have a hierarchy to all of their pages – in our case it is: site/gender/brand/product id/colour
This seems like really basic stuff but we had a problem with it. The developer that added our inline HTML brand pages, e.g. www.shoes.co.uk/hush-puppies.html, added links to the gender pages. The problem was that the links themselves were not written in lower case. This resulted in the following.
Home page menu link: http://www.shoes.co.uk/mens/hush-puppies
Link from the html page: http://www.shoes.co.uk/Mens/Hush-Puppies
Now of course these are two completely different URLs, one capitalised, the other lowercase, yet the content on the page is the same! This will lead to both pages appearing in the index and both pages ranking low. We even had the following:
Which means that potentially you could have four different duplicates!
You need to correct the formatting first, then 301 redirect the wrong URLs to the single correct ones.
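To give an idea of how you might build that redirect list, here is a small Python sketch. It assumes, as on our site, that the all-lowercase URL is the correct one. The function names are made up for illustration, and the output is just a mapping you would hand to whoever sets up the 301s on your server.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_path(url):
    """Return the URL with its host and path lowercased."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(),
         parts.query, parts.fragment)
    )


def case_redirects(urls):
    """Map each wrongly-cased URL to the lowercase URL it should 301 to."""
    return {url: lowercase_path(url) for url in urls
            if url != lowercase_path(url)}
```

Feeding it the two Hush Puppies links above gives a single entry, pointing the capitalised version at the lowercase one.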
Many websites stick to just one convention: you either have a slash at the end of a URL or you don’t. In our case our main website does not have slashes at the end of its URLs, but in WordPress, which is what this blog is written on, there are slashes at the end. We had a problem with this. Once again our enlightened developer had one rule on the home page and another for the inline pages. Using the Hush Puppies link above we had the following:
Now of course you know what I am going to say! Both pages turned up in Google with the same content, and both pages were dumped low down. So we are now up to a potential 8 versions of the same page. Well, this didn’t happen, but you get the picture! A URL with a slash after it can have completely different content from one without, and Google will treat them separately. If you have this problem and they have the same content, it is once again a source of duplicate content and it will harm your site.
You need to have one format and use 301 redirects to correct the pages.
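Here is a minimal Python sketch of the ‘one format’ rule, assuming the no-slash convention that our main site uses. The function name is hypothetical, and the home page is left alone because its path is just a single slash.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url):
    """Return the URL without a trailing slash on its path (home page excepted)."""
    parts = urlsplit(url)
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        # Fall back to "/" in case the path was nothing but slashes.
        path = path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

Any URL that changes when you run it through this is one that needs a 301 to the version that comes out.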
Underscores and other separators
Like all websites we have a number of feeds going out to various platforms; Google Shopping is one example, affiliate feeds are another. On our main site all of the URLs use a hyphen as a separator for gaps, both in brand names like Hush-Puppies and in styles with more than one word. We had a situation where we were sending a feed to Google which used underscores instead of hyphens within the name of the product in the URL, for example:
Main site: http://www.shoes.co.uk/womens/iron-fist/high-heels/timmy-chew-heel-1424200001/black
It’s a very subtle difference and difficult to spot, but by scouring the duplicate title and description section of Webmaster Tools it all came to light! Guess what? Either both versions were being shown in Google, or just the ones with hyphens, or just the ones with underscores, so it was a complete mess. Imagine the combination of all three of the above! We had to ‘harmonise’ all of the URLs and rewrite the code for the feed exports, making sure that they all followed the same rules.
When I analysed the sitemap that was being sent to Google I also found a mix of all these errors in the file: combinations of slashes on the end of URLs, underscore errors and capitalisation. The whole export script had to be rewritten.
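If you want to run the same sort of check on your own sitemap, something like the following Python sketch would do it. It assumes a standard sitemaps.org XML file and the same house rules as our site (lowercase, hyphens, no trailing slash). The function names are my own, and it only reports the problems; it doesn’t fix them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def url_problems(url):
    """List which of the three formatting rules a URL's path breaks."""
    path = urlsplit(url).path
    problems = []
    if path != path.lower():
        problems.append("uppercase")
    if len(path) > 1 and path.endswith("/"):
        problems.append("trailing slash")
    if "_" in path:
        problems.append("underscore")
    return problems


def audit_sitemap(xml_text):
    """Return {url: [problems]} for every <loc> entry that breaks a rule."""
    root = ET.fromstring(xml_text)
    report = {}
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        problems = url_problems(url)
        if problems:
            report[url] = problems
    return report
```

An empty result means every URL in the file follows the same rules, which is what you want before the sitemap goes anywhere near Google.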
Most websites that are built in PHP have unfriendly URLs, which are then rewritten into the format I have explained above. However, in some cases the old non-rewritten versions can still persist, depending on where you are on the website. If this is the case then you can use the URL Parameters section of Google Webmaster Tools to tell Google not to index these suffixes. A good example could be:
http://www.shoes.co.uk/womens/iron-fist/high-heels/timmy-chew-heel-1424200001/black – the non-mod-rewritten version may be:
This results, wait for it, yet again, in the same content being viewable on two different URLs. You have to make sure that the rewritten URL is used in every link to an inline page, and tell Google not to index the non-friendly versions! You can do this in Webmaster Tools under URL Parameters, but be very careful and always ask an expert to do it, because you can ruin a site by getting this bit wrong.
Your site should also use canonical tags so that Google recognises the one true version of each page, but you’ll still need to do all of this clean-up too!
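For anyone who hasn’t seen one, the canonical tag is a single link element in the head of the page. Here is a tiny Python sketch that builds one; the function name is hypothetical and the URL is simply the Hush Puppies page used earlier.

```python
from html import escape


def canonical_link(url):
    """Build the <link rel="canonical"> element for a page's one true URL."""
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))
```

The same tag, with the same href, goes on every variant of the page, so whichever version Google crawls it is told which one counts.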
If not specified correctly, your website could include https and non-https URLs which have the same content. It depends on how the code is written once you are in the secure area of your website. Our site is secured by a Thawte certificate and the sign-up pages are set to https. This was not always the case, and I had to politely point out to our experts that the site sign-up form was not under https. Once this was corrected (many years ago!) the outbound links from these pages created https versions of standard http pages and… you guessed it, they were duplicates! E.g.
You need to remove these pages very carefully. If you simply redirect all https to http then your checkout won’t work, so it has to be handled very specifically.
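To show what ‘handled very specifically’ might look like, here is a hedged Python sketch of the decision. The secure path prefixes are placeholders I have made up, not our real ones, and the function name is hypothetical. It only works out where a URL should redirect to; the redirect itself would live in your server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical paths that must stay on https; substitute your own.
SECURE_PREFIXES = ("/checkout", "/sign-up")


def https_redirect_target(url, secure_prefixes=SECURE_PREFIXES):
    """Return the http URL to 301 to, or None if the URL should be left alone."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None
    for prefix in secure_prefixes:
        # Match the prefix itself or anything beneath it.
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            return None
    return urlunsplit(
        ("http", parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

The point of the exception list is that secure pages never get a redirect target, so the checkout carries on working while the stray https copies of ordinary pages are folded back into their http versions.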
The home page of most websites is called the index page. Most are index.php, index.asp, index.htm or index.html, depending on the scripting language your developer has used. I am sure you have guessed already, but yes, we had further duplication with two home pages, one with index.php after it. As far as Google is concerned these are separate URLs that could contain different content, and it was another factor in Google dropping the site! You do not want to have them both; pick one or the other.
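A small Python sketch of the ‘one or the other’ rule, assuming you pick the version without the file name. The function name is made up and the list of index files is just the four mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = ("index.php", "index.asp", "index.htm", "index.html")


def strip_index_file(url):
    """Return the URL with a trailing index file name removed from its path."""
    parts = urlsplit(url)
    head, _, last = parts.path.rpartition("/")
    path = head + "/" if last.lower() in INDEX_FILES else parts.path
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

As with the other checks, any URL that comes out different from how it went in is one to 301 to the cleaned-up version.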
4. Duplicate Titles and Descriptions
I don’t need to tell you that having pages with duplicate titles and descriptions is a bad thing. This can quite easily happen if you do not have a way of specifying them in your content management system (CMS). When we set up the site the Meta tags, which is what titles and descriptions are, were auto-generated. This meant that some pages would have the same Meta just because there was no way of writing them in the CMS. We had to change the way it was done and add all of this, and we then spent a long time separating them. Just six months ago we spotted another fine web developer error: the boot pages for all genders used the Men’s Meta tags. A simple error in the code meant they were written into every page of boots. This was corrected, and over time Google dropped the errors from the Webmaster Tools warnings!
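You don’t have to wait for Google to warn you about this. Here is a minimal Python sketch that groups pages by title and reports any title used more than once. It assumes you already have a mapping of page URL to title from a crawl or a CMS export; the function name and the example paths in the note below are invented for illustration.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Given {url: title}, return {normalised title: [urls]} for repeated titles."""
    by_title = defaultdict(list)
    for url, title in pages.items():
        # Ignore differences in case and spacing when comparing titles.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: sorted(urls) for title, urls in by_title.items()
            if len(urls) > 1}
```

With pages like /mens/boots and /womens/boots both carrying the title “Men’s Boots”, the two URLs come back grouped under that one title, which is exactly the boots error described above. The same function works for descriptions if you pass those in instead.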
5. Duplicate content caused by Pagination
Most e-commerce websites, and Shoes.co.uk is no exception, have more than one page for a brand or category because there are more products than will fit on a single page. This creates Meta duplication warnings in Webmaster Tools. You can partly tackle this with the URL Parameters tool, by telling Google to ignore any URL with a page= parameter in it. Google also recently launched its rel="next" and rel="prev" tags so that you can add code to the header specifying how paginated pages relate to each other. We have not done this yet but need to, as we still get warnings about the page=all pages duplicating content with the inline brand and category pages. There is more about this on the Google blog post about pagination.
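As a sketch of what those header tags look like, here is a short Python function that produces them for a given page in a series. It assumes pages are numbered from 1 with a page= query parameter and that page 1 lives at the plain URL; both are assumptions for illustration, as is the function name, and we have not implemented this on our own site yet.

```python
def pagination_links(base_url, page, last_page):
    """Return the rel="prev"/rel="next" link tags for one page of a series."""

    def page_url(number):
        # Assumption: page 1 is the plain URL, later pages add ?page=N.
        return base_url if number == 1 else "{}?page={}".format(base_url, number)

    links = []
    if page > 1:
        links.append('<link rel="prev" href="{}">'.format(page_url(page - 1)))
    if page < last_page:
        links.append('<link rel="next" href="{}">'.format(page_url(page + 1)))
    return links
```

The first page only gets a next link, the last page only gets a prev link, and a category that fits on a single page gets neither.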
6. Duplicate Blog Content
WordPress is a fantastic tool, isn’t it! Well, it’s flawed, and rather than setting things out in an ideal way for SEO it is very easy to get wrong and ruin your site! Let me explain: many sites take advantage of the amazing-looking tag cloud that can be set on all the pages of a blog. In fact, scroll up and you will see our tag cloud on the right of the page.
Every WordPress blog post can be put in any number of categories and tags. Guess what? Every tag and every category shows the same post, but with a URL extended by the category or tag. So every post you make can be duplicated many times by their use. Of course it’s nice to have a filing system for a blog, but it means that every tag and category multiplies the duplication of the post. Install All In One SEO and block Google from indexing the tags and categories, and while you are at it the author pages as well!
I think that after reading this you will have come to two clear conclusions. Firstly, that we at Shoes.co.uk have had an uphill struggle, still ongoing, cleaning up our site to get rid of all the duplicate pages! When we embarked on this task Google was indexing over 32,000 pages from our site. This has now fallen to around 7,400, a drop of around 77%! We are still not completely clean; the blog was only fixed last week!
The second conclusion must be that duplicate content is a huge subject that web developers can get badly wrong. It’s not just switching around a few sentences that will get you out of the woods. You have to look at all of the areas above and change your site from top to bottom if you are a victim of any of them. As I said in the first paragraph, most SEO companies will not look at all of this. They will suggest it’s your backlink profile that is bad, then they may address duplicate titles and page content, but are they looking at everything? Or are they just assuming that all of this doesn’t happen? I think so. You as a webmaster have to be extremely vigilant and pull your developer up on this stuff. Sloppy and negligent coding can kill a site.
Nigel Carr 2013 ©