WordPress duplicate content might lead you to supplemental results ruin
Yeah, WordPress is the most popular CMS among bloggers these days, but few of them know about one of its biggest issues: duplicate content. Duplicate content from where, you might ask? Well, from the way this CMS organizes its pages and posts. Theoretically, Google should index a number of pages only slightly bigger than the total number of posts on a blog, but that’s not the case, as WordPress also creates lots of additional pages for archives, authors, categories, search, etc. So we end up with several pages showing the same content, and that leads to many of your pages landing in Google’s supplemental results.
What are Google’s supplemental results? According to their FAQ: "Supplemental sites are part of Google’s auxiliary index. We’re able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index. The index in which a site is included is completely automated; there’s no way for you to select or change the index in which your site appears. Please be assured that the index in which a site is included does not affect its PageRank."
Yabady yabada… OOOOK! What we should take from this is that Google’s supplemental results are a place you don’t want your site to be. Pages considered "uninteresting" but still indexed by Google end up here. The downside is that these pages rank very badly in SERPs and will bring you almost no traffic. That’s why you have to "tune" your WordPress to get as many pages out of the supplemental results as possible. There are several ways of doing this, but the best one is to get rid of that duplicate content.
First of all, let’s see how you can check how many of your pages are in the supplemental results. It’s simple, just type site:mikesquarter.com *** -spght in Google (pages in the supplemental results are marked with - Supplemental Result -). As you can see, my blog returns 215 supplemental results out of a total of 983 (site:mikesquarter.com). That means I have 768 "good" pages.
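The arithmetic behind those numbers is easy to script if you want to track it over time. A tiny Python sketch; the function name index_breakdown is made up for this example, and the two counts are simply the ones quoted above.

```python
def index_breakdown(total_pages, supplemental_pages):
    """Return (main_index_pages, supplemental_share) for a site: query."""
    main_pages = total_pages - supplemental_pages
    share = supplemental_pages / total_pages
    return main_pages, share


if __name__ == "__main__":
    # Counts taken from the two site: queries mentioned in the post.
    main_pages, share = index_breakdown(983, 215)
    print(f"{main_pages} pages in the main index, "
          f"{share:.1%} of indexed pages are supplemental")
```

With the counts above, a bit more than a fifth of the indexed pages sit in the supplemental results.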

Let’s take a look at the robots.txt file I use, which I uploaded today to see what good it will do. First of all, I wanted to stop the author, category, archive and tag pages from being indexed.
Disallow: /author/
Disallow: /archives/
Disallow: /tag/
Disallow: /category/
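Then I eliminated the feed and trackback pages, as the feeds are XML files not meant to be read by humans, by adding the following lines. You can see I used the Googlebot User-agent field in order to be able to use some of the special characters supported only by this bot. Keep in mind that Googlebot obeys only the most specific User-agent group that matches it, so any rule meant for Googlebot, including the ones above, has to be listed under this group too.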
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
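How the * and $ characters in those three rules behave can be sketched in a few lines of Python. This is only an illustration, not Google’s actual matcher: the helper names pattern_to_regex and is_blocked and the sample paths are made up for the example, and it assumes that * stands for any run of characters and that a trailing $ pins the rule to the end of the URL path.

```python
import re

# The three Googlebot rules from the robots.txt above.
RULES = ["/*/feed/$", "/*/feed/rss/$", "/*/trackback/$"]


def pattern_to_regex(pattern):
    """Turn a Googlebot-style Disallow pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing '$'
    anchors the rule at the end of the path; without '$' the rule is a
    plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, rules=RULES):
    """Return True if any rule matches the given URL path."""
    return any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical paths, just to see which ones the rules catch.
    for path in ["/my-post/feed/", "/my-post/feed/rss/", "/my-post/trackback/",
                 "/feed/", "/my-post/feed/atom/", "/my-post/"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Under these assumptions the top-level /feed/ and paths such as /my-post/feed/atom/ are not covered by the three rules, so they would need rules of their own if you want them out as well.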
My blog uses a plugin that lets readers translate the content into various languages, but these translated pages tended to end up in the supplemental results a lot. So, after reading a couple of articles about it, I decided to eliminate these pages too, by adding the following lines.
Disallow: /it/
Disallow: /de/
Disallow: /es/
Disallow: /fr/
Disallow: /pt/
Disallow: /ja/
Disallow: /co/
Disallow: /ru/
Disallow: /zh-CN/
I did this because most of my search engine traffic comes from the English part of the blog, but I’m still not sure I made the right choice. So, what do you think about it?
Another way of controlling which pages get indexed and which don’t is to add this little snippet to your header.php file, just above the </head> tag:
<?php // Index only the home page, static pages and single posts; everything else gets noindex.
if ( is_home() || is_page() || is_single() ) {
    echo '<meta name="robots" content="index,follow" />';
} else {
    echo '<meta name="robots" content="noindex,follow" />';
} ?>
In both cases the goal is the same: Googlebot should only index your home page, your posts and your static pages. So you’ll end up with fewer indexed pages than before, in fact a lot fewer. But, hopefully, if these pages have a decent number of inbound links and are well optimized, they’ll rank well in SERPs and bring you a lot more traffic than before.
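If you go the meta tag route, it is worth checking that each type of page really sends the tag you expect. Here is a small Python sketch that pulls the robots directives out of a page’s HTML using only the standard library. The class and function names and the sample HTML are invented for this example; in practice you would feed it the source of one of your own pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex,follow" into separate, normalized directives.
            self.directives.extend(
                item.strip().lower() for item in content.split(",") if item.strip()
            )


def robots_directives(html):
    """Return the list of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical <head> of a category archive page.
    sample = ('<html><head><title>Category</title>'
              '<meta name="robots" content="noindex,follow" /></head></html>')
    print(robots_directives(sample))
```

A category or tag archive should come back with noindex in the list, while a single post should come back with index.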
If you have any other ways of solving the supplemental results issue in WordPress, or just want to leave a reply, please do so.
Tags: Google, results, robots.txt, supplemental, wordpress
I have not used the plugin for translating blog content into other languages, and I wanted to know if it’s any good at attracting traffic from regional search engines. IMO, you should at least allow the spider to crawl and index the pages in these other languages, so the pages can rank on regional search engines, maybe for niche keywords, and get you some traffic.
Yea, I’ve taken those rules out of the robots.txt. Some of the foreign languages, particularly Spanish, do bring me a couple of Google visitors, but since I’m currently experiencing the sandbox effect, I can’t give any numbers. I’ll just have to wait and see.
On the other hand, on my other blogs where I use the plugin, I get about 5% of my Google visitors from foreign keywords, and almost 95% of the indexed pages in other languages end up in the supplemental results…