Is duplicate content dragging your site down in Google without you even knowing it? Most people think duplicate content means someone stole their articles. The bigger problem is usually duplicate content you are creating on your own site without realizing it. In this transcript, Mark breaks down both types of duplicate content and explains exactly what to do about each one.
What You'll Learn in This Episode
- The two distinct types of duplicate content that hurt your SEO
- Why WordPress is a duplicate content generator by default
- How to use noindex and nofollow directives to fix internal duplication
- What to do when someone scrapes your content and outranks you
- Why spun content and PLR are increasingly risky strategies
Episode Summary
Mark shares his key takeaway from module two of the Rankings Institute course: duplicate content is more dangerous and more misunderstood than most site owners realize. There are two specific types you need to worry about.
Type 1: Duplicate content on your own site. This is the type Google was primarily targeting when they wrote the Webmaster Guidelines, and it is the one most people overlook. The problem occurs when the same content on your site can be reached through more than one URL. WordPress creates this situation by default through category pages and tag pages that display full posts. If you have five product reviews in a “reviews” category, and your category page shows the full text of each review, Google sees those reviews duplicated at two different URLs.
The fix involves using noindex directives on category and tag pages. Most quality WordPress themes like Thesis and Genesis, plus SEO plugins like Yoast, allow you to noindex these pages easily. The noindex directive tells Google the page exists for users but should not be included in search results. The nofollow directive is a separate instruction: it tells Google not to follow the links on that page. On its own it does not stop Google from crawling the page or keep the page out of the index.
For category pages, Mark recommends noindex with follow. This tells Google not to index the duplicate content but to still follow the links on that page to discover your individual posts. The one exception: if you have created substantial unique content on a category page, you might want to index it strategically.
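To make the "noindex with follow" recommendation concrete, here is a minimal Python sketch of the decision it describes. The function name robots_meta and the page type labels are made up for illustration; in practice a WordPress theme or SEO plugin emits this tag for you, so treat this as a picture of the logic rather than anything you would install.

```python
def robots_meta(page_type: str, has_unique_content: bool = False) -> str:
    """Return a robots meta tag for one kind of page.

    page_type is a hypothetical label such as "post", "category" or "tag".
    has_unique_content covers the exception from the episode: an archive
    page with substantial original copy that you want indexed.
    """
    archive_types = {"category", "tag"}
    if page_type in archive_types and not has_unique_content:
        # Keep the archive out of search results, but still let
        # crawlers follow its links through to the individual posts.
        content = "noindex, follow"
    else:
        content = "index, follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    for kind in ("post", "category", "tag"):
        print(kind, "->", robots_meta(kind))
```

The resulting tag belongs in the head section of the page. A category page with real, unique copy can be passed has_unique_content=True to keep it indexable.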
You would be amazed how many site owners have accepted WordPress default settings and have four or five copies of the same content in Google's index through category pages, tag pages, redirected homepages, www versus non-www versions, and more.
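If you have a list of your site's URLs (from a crawl or a sitemap export, for example), a rough way to spot the www versus non-www and trailing-slash variants is to normalize each URL and group the ones that collapse to the same key. The sketch below is only an assumption about how you might do that: the helper names and example.com URLs are invented, and it deliberately ignores the scheme and query string, so it will not catch every kind of duplicate.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Collapse www/non-www and trailing-slash variants to one key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/reviews/widget/" and "/reviews/widget" as the same path.
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def group_duplicates(urls):
    """Return only the keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawl = [
        "http://example.com/reviews/widget",
        "http://www.example.com/reviews/widget/",
        "http://example.com/about",
    ]
    for key, found in group_duplicates(crawl).items():
        print(key, "is reachable at", found)
```

Any group this reports is a candidate for a redirect or a canonical tag. It cannot tell you which version should win; that is still your call.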
Type 2: Your content appearing on other sites. This happens when scrapers steal your content and post it elsewhere. The most frustrating scenario is when stolen content outranks your original because the scraper's domain has more authority or they have built links to the stolen content. If you have a well-maintained site with Google Authorship set up and your site gets crawled regularly, Google will usually recognize your content as the original. The real danger is for sites using spun or PLR content that was never truly unique to begin with.
Mark's recommendation: if you have content from other sources on your site, rewrite it. If you must republish something like a press release, use noindex to tell Google not to include it in their index. Remove the excuses that give Google reasons to push your site down.
Key Takeaways
- The most common duplicate content problem is content on your own site reachable through multiple URLs
- WordPress category and tag pages create duplicate content by default; use noindex to fix this
- Check for www versus non-www duplication and homepage redirect issues
- If someone scrapes your content, Google Authorship and regular crawling help establish you as the original source
- Avoid spun content and poorly rewritten PLR; Google is increasingly good at detecting it
- Duplicate content is like swimming with weights: you can survive it, but removing them makes everything easier
What's Changed Since This Episode
Mark recorded this in early 2014. The fundamentals of duplicate content management remain important, but the technical landscape has evolved.
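Canonical tags have become the standard solution. While Mark discusses noindex and nofollow, the rel=canonical link tag has become the primary tool for handling duplicate content in 2026. It tells Google which URL you consider the authoritative version of a page, and Google treats it as a strong hint rather than a strict command. For many situations it is a more elegant solution than noindex, because the duplicate URLs stay reachable for visitors while Google is pointed at the one version you want ranked.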
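For reference, a canonical tag is a single line in the head section of a page. The short Python sketch below builds one. The function name canonical_link and the example.com URL are placeholders, and modern SEO plugins generate this tag for you.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build the canonical link tag for a page's preferred URL."""
    # Escape the URL so characters like & and " are safe inside the attribute.
    href = escape(preferred_url, quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Every duplicate version of the page should carry this same tag.
    print(canonical_link("https://example.com/reviews/widget/"))
```

The important design point is that every URL variant of the page (www, non-www, with or without tracking parameters) carries the same tag pointing at the one preferred URL.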
Google Authorship no longer exists. Google discontinued the Authorship program in 2014, shortly after this episode was recorded. However, Google's ability to determine original content creators has improved through other signals, including crawl data, publication timestamps, and site authority metrics.
Most modern WordPress themes and SEO plugins handle duplication automatically. The Yoast SEO plugin and similar tools now set sensible defaults for noindex on archives and handle canonical tags automatically. The manual configuration Mark describes is still good to understand, but much of it is handled out of the box in 2026.
AI-generated content has created new duplicate content challenges. With AI writing tools producing similar outputs across multiple users, a new form of near-duplicate content has emerged that Google is actively working to address.
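To get a feel for what "near-duplicate" means, here is a toy Python sketch that compares two passages by their overlapping three-word sequences (often called shingles) and reports a Jaccard similarity between 0 and 1. This is a textbook technique chosen purely for illustration. It is not a description of how Google detects spun or AI-generated content, and the function names and sample sentences are made up.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    # A text shorter than `size` words still yields one (short) shingle.
    count = max(len(words) - size + 1, 1)
    return {tuple(words[i:i + size]) for i in range(count)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles in either text."""
    sa, sb = shingles(a, size), shingles(b, size)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "this widget is the best product i have ever reviewed"
    spun = "this widget is the finest product i have ever reviewed"
    print(f"similarity: {jaccard(original, spun):.2f}")
```

Swapping a single word, as a content spinner would, still leaves a large share of the three-word sequences intact. That is why lightly rewritten copy tends to remain recognizable as a duplicate.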
Resources Mentioned
- Yoast SEO Plugin — Handles noindex directives and canonical tags for WordPress
- LNIM Podcast
Related Episodes
If you found this episode helpful, you might also enjoy:
- LNIM074 Show Notes — The Impact of Duplicate Content on SEO
- LNIM073 — Are Broken Links Killing Your SEO?
- LNIM072 — Content Is King for SEO
Listen and Subscribe
Listen to Late Night Internet Marketing on Apple Podcasts or subscribe at latenightim.com/internet-marketing-podcast/. Have a question for Mark? Call the digital recorder at 214-444-8655 or drop a comment below.




Mark, this is very helpful. I heard the podcast and then came to your site to read the transcript. But I am a beginner. I understand it’s important to not index category pages (and archive pages and maybe tag pages??). But, HOW DO I DO IT? Can you demonstrate exactly how to set up the Yoast SEO plugin?
I really like your podcast. Thanks for all the great content you produce.