Rankings Institute: Duplicate Content
Right now, before I leave for the airport, I want to tell you guys about my major takeaway from the latest module of Rankings Institute. Actually, I got a little confused. Last week we talked about 404 errors and why they matter, but that’s actually in module three of the Rankings Institute. This week we need to talk about module two, which I skipped over.
One of the big discussions in module two is duplicate content. This is something that we’ve talked about before. One of the cool things in Rankings Institute is a bunch of advanced techniques that talk to you about how to find duplicate content, how to assess the impact of duplicate content on your site, and so forth.
What we’re going to talk about on the podcast today, however, is what duplicate content is. I want to demystify this for you, because there is always a lot of discussion about spinners and duplicate content and the effect that it can have on your site, what duplicate content actually is and what Google is looking for. So I’m going to try to demystify this for you. This is something that we’ve talked about before, but it was 30 episodes or so ago and I think I have a more clear explanation for you now, so I think that’s what we’ll talk about today.
I think everybody knows that Google doesn’t like duplicate content, that’s very clear. I think what people aren’t clear on is what duplicate content is. There are kind of two specific cases of duplicate content that you need to worry about on your website.
The case that everybody always wants to talk about is the case where content on your site appears on somebody else’s site. We’ll get to that, but I’m going to tell you that when I read the Google Webmaster Guidelines it’s pretty clear to me, and it has been for years, that that’s not really the duplicate content that they were talking about when they initially wrote those Webmaster Guidelines.
The duplicate content that they’re talking about is actually duplicate content on your own site. What that really means is Google does not want you to have any content on your site that can be reached by more than one URL. How is this possible? Unfortunately, WordPress is a duplicate content generator, because in typical themes a lot of times what WordPress does is it generates category pages and tag pages. Oftentimes these category and tag pages show full posts. When they do show those full posts basically those posts are duplicates of the original post URL.
Going back to the e-cigarettes example, let’s say you review five different e-cigarette brands on your e-cigarettes website, the one that we talked about last week. Let’s say, for the sake of argument, that each one of these articles is 800 words long because you’re creating excellent content we talked about a couple weeks ago. These are awesome reviews with pictures and great analysis of why a particular e-cig is better than another and comparing brands, you’re really helping people out and you’ve created some awesome reviews that are going to help people decide which e-cigarette to buy.
All of those e-cigarette articles have their own URL – yoursite.com/ecigarettereview1, yoursite.com/ecigarettereview2, and so on. Of course, you wouldn’t actually use URLs like that, you would want to put the name of the e-cigarette in the URL for SEO. So you have these URLs and those are the kind of authoritative URLs, that’s where the content lives.
Because you’re trying to help your reader you’ve placed all of those reviews in the category “reviews,” and you have a button on your site that a user can click on and see all of the reviews that you’ve written. That’s how you’re using the category page and that is an excellent use of the category feature in WordPress because it provides a positive user experience. Users who want to see all the e-cigarettes that you’ve reviewed can easily do that, that’s exactly what you want from a user experience standpoint.
But if you look at that URL, you’ll find that the page created for that category duplicates the header tags that you used for each of your articles, plus some of the content (if you’re using excerpts) or all of the content (if you’re using full posts) that lives at those five original review URLs. That is duplicate content from Google’s point of view. That is what you want to avoid.
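To make that overlap concrete, here is a minimal sketch that measures how much of a post's text reappears on a category page. It compares overlapping five-word runs ("shingles"), which is a common textbook technique and not a description of how Google actually detects duplicates. The function names and sample text are hypothetical.

```python
def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(post_text, page_text, size=5):
    """Fraction of the post's shingles that also appear on the other page."""
    post = shingles(post_text, size)
    if not post:
        return 0.0
    return len(post & shingles(page_text, size)) / len(post)


# Hypothetical sample text, not real review copy.
review = ("this e-cigarette has a long battery life and "
          "a clean flavor that lasts all day")
full_post_category = "Reviews " + review + " Read more reviews below"
excerpt_category = "Reviews this e-cigarette has a long battery life and more"

full_score = containment(review, full_post_category)    # whole post repeated
excerpt_score = containment(review, excerpt_category)   # only the opening repeated
```

A category page that shows full posts scores 1.0 here, because every run of words in the review also appears on the category page. An excerpt scores somewhere between 0 and 1.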
Now, most themes that are out there, the really good ones like Thesis and Genesis, and most SEO plugins like the All-in-One SEO plugin and Yoast SEO plugin, will allow you to “no index” those pages. We’ve talked about this before, but let’s go over that.
When the Google robot comes to scan your site, which is called crawling, there are a couple of directives that you can give the robot, and Google will to some degree respect them.
You can tell Google “don’t index this page,” which means “Hey Google, I got this page here, it’s for my users, but I’m telling you in advance that I really don’t want it in your search engine results. It’s not really something that is appropriate for your search engine and it’s not the kind of thing that we want searchers to land on.” That’s helpful to Google because they’re counting on you as a webmaster to tell them what’s good to keep in the index and what isn’t good to keep in the index. Then there’s also no follow, which tells Google, “Hey Google, don’t follow the links that you find on this page.”
In the case of no index you’re telling Google, “You’re free to read the page and follow whatever links that you find there and crawl around my site by going down this page, but I don’t want you to actually put this page in your index.” In the case of no follow you’re telling Google, “You can read this page, but I don’t want you to follow the links you find on it or pass link juice down through those links.” These are the kinds of things that you do when you’re optimizing link juice on your site.
In the case of duplicate content though, what you’re telling Google is, “This category page just contains information that you’ve already indexed. I want you to read this page and find the links to all the posts that are here and all the links that are on those pages, feel free to read it, but this is duplicate content and I don’t want you to index this because I know you don’t like duplicate content.”
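As a rough illustration, the instruction described above is usually delivered as a robots meta tag in the page's head section. The sketch below builds that tag for different page types. The helper names and the page-type table are hypothetical, and in practice a theme or SEO plugin writes this tag for you.

```python
def robots_meta(index=True, follow=True):
    """Build a robots meta tag for a page's <head> section."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


# Hypothetical rules for a WordPress-style site: keep posts in the index,
# keep category and tag archives out, but let the robot follow their links.
PAGE_RULES = {
    "post": {"index": True, "follow": True},
    "category": {"index": False, "follow": True},
    "tag": {"index": False, "follow": True},
}


def robots_meta_for(page_type):
    """Look up the rules for a page type, defaulting to index and follow."""
    rules = PAGE_RULES.get(page_type, {"index": True, "follow": True})
    return robots_meta(**rules)


category_tag = robots_meta_for("category")
# category_tag is '<meta name="robots" content="noindex, follow">'
```

The "noindex, follow" combination matches the category page case: the robot can still read the page and reach every post linked from it, but the page itself stays out of the index.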
The one exception to this could be if you have a category page that is somehow unique. Perhaps your category page starts off with 800 words of unique content or 500 words of unique content. In that case you might want to index it. Or if your category page is something that you want to index strategically for some reason, maybe you have a bunch of products that fall into a category and that’s a search term you’re targeting and you have strategically decided, then you need to somehow make that page unique if you want to index it.
That’s the first kind of duplicate content. You would be amazed how many people out there on the internet have accepted default settings in WordPress and have their category pages and tag pages indexed. Their homepage is reachable and indexed at more than one address, the www version is indexed as well as the non-www version, and there are four or five copies of the same content in Google’s index.
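One way to spot the www and non-www problem is to collapse each URL to a comparison key and see which keys collect more than one address. This is only a sketch under stated assumptions: it assumes that http, https, www, non-www, and trailing-slash variants all serve the same content, it ignores query strings, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variant_key(url):
    """Collapse scheme, www prefix, and trailing slash into one key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path


def group_variants(urls):
    """Return only the keys that are reachable at more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical list of URLs found in a crawl of the example site.
urls = [
    "http://yoursite.com/",
    "http://www.yoursite.com/",
    "https://www.yoursite.com",
    "http://yoursite.com/ecigarettereview1",
]
duplicates = group_variants(urls)
```

Here the three homepage addresses land under a single key, which flags them as copies of the same page, while the review URL appears only once and is left out of the result.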
Google does not like that, and it is one of the things, like the 404, that goes on their list of whether you’ve been naughty or nice. Duplicate content goes on the naughty list, just like 404 errors. These things add up against you, eventually pushing your site down. It’s like if you’re trying to swim and somebody keeps handing you one-pound weights, it just makes it harder and harder to swim.
Maybe you can swim with five pounds of weights in your hand, but it’s easier to swim without them. Ranking is the same way. For duplicate content you want to make sure that from Google’s point of view you don’t have duplicate content on your site.
The second kind of duplicate content is the kind we’ve talked about so many times before, it’s the kind of duplicate content where your content that’s on your website appears on other websites around the web. This is going to happen to you. If you’ve been creating websites for any amount of time you know that spammers are going to scrape your site and post your content on their site, sometimes they link back to your site and sometimes they don’t, trying to get your content to rank so that they can get traffic in the search engines for your content.
Nothing is more frustrating than to look at a Google result where your content that was stolen from you and is living on another website is outranking your original content, the exact same content on your very own site. How can this happen? Obviously it happens because the root domain on the site that stole your content may have more authority, or the person who stole your content may have gone to the trouble of pointing links to the stolen content, so they’ve done some sort of SEO work on the content that they stole from you. Or maybe your site is penalized, maybe you have 404 errors, maybe you have duplicate content. For whatever reason, the important thing is that Google has lost track of who the original author of the content was.
This is even more important if you’re posting content that you spun, which, given the way Google is valuing quality content, I’m not really a strong believer in spun content anymore. I know it’s possible to rank spun content and there are some applications for it, but I’m not a big fan of spun content the way things are going on the internet. I think that’s something you need to be thinking about very carefully if you’ve got a strategy that includes content spinning.
If you have content on your website that is from another source, my recommendation to you, quite frankly, is to rewrite that content. Create some new content, create some unique content, make sure that your content that’s appearing on your site is not appearing elsewhere.
If you have quotes from other places, normal editorial stuff where you’re saying “The Prime Minister of India said…,” and that’s appearing all over the web because you’re quoting someone, that’s normal. That should be in the middle of an article of 800 or 1,000 words of unique content. That’s okay, that’s absolutely fine.
What I’m talking about is a significant amount of content that’s clearly ripped off from somewhere else. If you have that duplicate content, maybe you need duplicate content, maybe there is a case where you’re republishing a press release where the authoritative source for the content is actually on PR Web or someplace like that, but it’s of value to your users to place that content on your site. Go ahead and do that, but tell Google not to index that content. Say, “Hey Google, this is stuff I don’t want you to look at, don’t put it in your index,” and that will give Google the signal that you’re trying to help them make sure that the content they index from your site is unique.
So there are two kinds of duplicate content: duplicate content where the same stuff appears more than once on your site, and duplicate content where stuff on your site appears elsewhere on the internet. Google is working very hard to try and keep track of your stuff when you publish it first, particularly if you’ve set up Google authorship on your site, which I recommend if you haven’t. If you have Google authorship running and you have content that you’ve posted recently and your site is being crawled well by Google, then Google is going to see content on your site first and 99 times out of 100 you’re not going to have a problem with being outranked by someone else, because Google knows where they saw the content first.
I think the real problem is for those of you out there that have had marketing strategies where you’ve spun content and maybe not spun it very well, or you’ve used PLR content and maybe that content wasn’t very unique. Google sees that as duplicate and it’s dragging your site down. I definitely recommend, and this is one of my major takeaways from module two of the Rankings Institute, that you just go ahead and avoid duplicate content. If you’re serious about ranking web pages in Google, don’t mess around with duplicate content on your own site and don’t create duplicate content by bringing other people’s content onto your site.
That’s my advice from the Rankings Institute. If you think that’s cool, you can sign up for the notification list over there so you can know when they open this course up again. I’m really enjoying it, and Alex and Andrew are pretty amazing. Week four is mind blowing, so next week when I’m back in Dallas we’ll talk a little bit about my major takeaway from the first part of the link building campaigns that they’ll be talking about in module four of Rankings Institute.
One More Thing About Content…
I am a little bit short on time today because I’m about to have to rush off to catch a car to drive from Kuala Lumpur to Melaka, so I have to run down to the lobby to do that, but I did want to leave you with this one last idea before I go.
With all this discussion about duplicate content you may be going, “Woe is me, I was using a lot of PLR.” It’s still okay to use PLR, you just need to understand this duplicate content thing and maybe rewrite it a little more aggressively.
“But I have this duplicate content and I have so much content I need to create, it’s just a beating. I just don’t know how I’m going to create all this content.” When I create content myself, the problem is oftentimes what to write about exactly. It’s not that I can’t write easily. If something is already in my head, as you guys know from listening to me, I have a lot to say and I’m able to ramble on easily for long periods of time, for 30 minutes at a time just like I’m doing right now. The problem is if I need to write an article about e-cigarettes or something, I don’t know enough about e-cigarettes, so I need to spend an hour, or two, or three, reading about e-cigarettes and that’s such a beating, it really is difficult.
Jon Leger has done a lot of different stuff like Rank Crew and a bunch of stuff over the years, he’s a good guy, he has a new product coming out that’s really pretty amazing for doing research. This is not to generate content that you would publish, this is to generate quality research that you could then use to form the basis for an article that you would then write. Give me pages about facts and ideas around e-cigarettes or around some subtopic of e-cigarettes and then I’ll read that, and then I can write an article based on that. It’s that kind of idea.
This can really cut your content creation time down significantly. This project that he has is called Content Ferret. The thing about Content Ferret, it’s an impressive product, it’s a product that pulls out curated content, it’s really neat the way it works. Because of that, it doesn’t work on just any topic. He’s releasing it with a certain number of topics, you can find what those topics are over on the sales page, and he’s going to add topics over time.
Now, I have experience with Jon and these sorts of things. He’s incredibly good at hiring armies of people to create curated content for stuff like this. For a long time I talked about Article Builder, and I still use that for various things from time to time. Content Ferret is an idea like that from the standpoint of building a team to create content behind the scenes to power a content engine.
One of the reasons Jon does this is he’s interested in creating jobs, so he creates jobs for people when he does these kinds of things, real paying jobs that people have and use for money to feed their families. So that’s one of his intrinsic motivations for creating this kind of content. Those of you that have been around Jon a long time know he does a lot of charity work and so forth. In fact, one of my favorite things that he did years ago was save a train museum up the highway from where I live. He is kind of a neat guy.
Anyway, Content Ferret, if you’re the kind of person who can write just fine but you don’t want to have to do the research you can check Content Ferret out and see what you think. Jon has a really nice video there that explains to you what Content Ferret does and he also has a very generous return policy that if it doesn’t work out for you there’s not really much risk. If that speaks to you, I like Jon and I like his products and I think you might like Content Ferret.
Wrapping Things Up…
As I mentioned, I’m going to put on a pair of pants and head down to the lobby. You didn’t know I was podcasting without pants, did you? That’s probably a little too much information, but if it makes you feel any better I have my workout shorts on. So it’s not like naked podcasting, it’s like podcasting in workout clothes. I need to put on a pair of slacks and head down to the lobby.
I hope you have an absolutely fantastic day. I hope this discussion about duplicate content was helpful to you and makes some sense. If you have questions about that, I would love to hear about them in the show notes at http://www.latenightim.com/lnim074-duplicate-content-seo/
Remember, in a couple of episodes I’ll be talking about listener feedback and answering questions. If you have a question, bring it on through the hotline at 214-444-8655. Most of you are listening on your iPhone or Samsung Galaxy, that’s what the market data says anyway, so you can just hit pause and dial 214-444-8655 and throw me some feedback. I’d love that.