The causes and potential negative effects of duplicate content

Published: 15th February 2011
You know duplicate content can have a negative effect on web site rankings. But how do you examine whether a particular web site exhibits this problem, and how do you mitigate or avoid it?

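To begin, you can divide duplicate content into two main categories: duplicate content that results from site architecture, and duplicate content that results from content theft.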



Duplicate Content as a Result of Site Architecture

Some examples of site architecture itself leading to duplicate content are as follows:

- Print-friendly pages

- Pages with substantially similar content that can be accessed via different URLs

- Pages with items that are extremely similar, such as a series of differently colored shirts in an e-commerce catalog having similar descriptions

- Pages that are part of an improperly configured affiliate program tracking application

- Pages with duplicate title or meta tag values

- Pages whose URLs carry session IDs

- Canonicalization problems



All of these scenarios are discussed at length in this chapter.

To look for duplicate content as a result of site architecture, you can use a "site:example.com" query to examine the URLs of a web site that a search engine has indexed. All major search engines (Google, Yahoo!, and Bing) support this feature. Usually this will quickly reveal whether, for example, print-friendly pages are being indexed.

Google frequently places content it perceives as duplicate content in the "supplemental index." This is noted at the bottom of a search engine result with the phrase "supplemental result." If your web site has many pages in the supplemental index, it may mean that those pages are considered duplicate content, at least by Google.

Investigate several pages of URLs if possible, and look for the aforementioned cases. Look especially at the later pages of results. It is extremely easy to create duplicate content problems without realizing it, so viewing your site from the vantage point of a search engine may be useful.
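Once you have collected a list of indexed URLs from a "site:" query, a small script can help spot URLs that probably point at the same page. The following is only a minimal sketch: the parameter names in IGNORED_PARAMS (session IDs, print flags, affiliate codes) are hypothetical and would need to match whatever your own site uses, and treating "www" and non-"www" hosts as equivalent is an assumption that may not hold everywhere.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical query parameters that do not change the page content.
# Replace these with the names your own site actually uses.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "print", "aff"}


def normalize_url(url):
    """Reduce a URL to a comparison key that ignores common duplicate sources."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assume www and non-www serve the same content
    path = parts.path.rstrip("/") or "/"
    # Drop ignored parameters and sort the rest so ordering does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return host + path + ("?" + urlencode(query) if query else "")


def find_duplicate_groups(urls):
    """Group URLs by normalized key and keep only groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    indexed = [
        "http://www.example.com/shirts/",
        "http://example.com/shirts?sid=abc123",
        "http://example.com/shirts?print=1",
        "http://example.com/pants",
    ]
    for key, members in find_duplicate_groups(indexed).items():
        print(key, "->", members)
```

Any group this reports is only a candidate. Confirm that the pages really serve substantially similar content before changing the architecture or excluding anything.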



Duplicate Content as a Result of Content Theft

Content theft creates an entirely different problem. Just as thieves can steal tangible goods, they can also steal content, hence the name. It creates a similar problem for search engines, because they strive to filter duplicate content from search results, across different web sites as well, and will sometimes make the wrong assumption as to which instance of the content is the original, authoritative one.

This is an insidious problem in some cases, and can have a disastrous effect on rankings. CopyScape (copyscape.com) is a service that helps you find content thieves by scanning other pages for content similar to that of a given page. Sitemaps can also help by getting new content indexed more quickly, thereby removing the ambiguity as to who is the original author.
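To give a feel for what "scanning for similar content" can mean, here is a rough sketch that compares two texts by the overlap of their word sequences (shingles), scored with Jaccard similarity. This is a common general technique and is not a description of how CopyScape or any search engine works internally; the function names and the shingle size of five words are arbitrary choices made for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=5):
    """Score two texts from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "Duplicate content can have a negative effect on web site rankings."
    suspect = "duplicate content can have a negative effect on web site rankings"
    print(jaccard_similarity(original, suspect))
```

A score near 1.0 suggests that one page was copied from the other, but it cannot tell you which copy came first. Establishing that still depends on evidence such as publication dates or indexing history.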

Unfortunately, fighting content theft is ridiculously time-consuming and expensive, especially if lawyers get involved. Doing so for all instances is probably unrealistic, and search engines generally do accurately assess who is the original author and display that one preferentially. In Google, the illicit duplicates are typically relegated to the supplemental index. However, it may be necessary to take action in the unlikely case that the URLs with the stolen content actually rank better than yours.



Excluding Duplicate Content

When you have duplicate content on your site, you can remove it entirely by altering the architecture of the web site. But sometimes a web site has to contain duplicate content. The most typical scenario is when the business rules that drive the web site require it.

To address this, you can simply exclude it from the view of a search engine. Here are the two ways of excluding pages:



Using the Robots Meta Tag

This is addressed first, not because it's universally the optimal way to exclude content, but rather because it has virtually no limitations as to its application. Using the robots meta tag, you can exclude any HTML-based content from a web site on a page-by-page basis. It is frequently an easier method to use when eliminating duplicate content from a preexisting site for which the source code is available, or when a site contains many complex dynamic URLs.
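As an illustration, the tag goes in the head section of each page you want excluded. The sketch below builds it from a small helper; the function name robots_meta_tag and its parameters are hypothetical and not part of any library. It assumes you want the page kept out of the index while still letting crawlers follow its links, which is expressed with the standard "noindex, follow" value.

```python
def robots_meta_tag(index=False, follow=True):
    """Build a robots meta tag string for the <head> of a page.

    Hypothetical helper for illustration only. The defaults exclude the
    page from the index but still let crawlers follow its links.
    """
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}" />'.format(", ".join(directives))


if __name__ == "__main__":
    # Emit the tag for a duplicate page, such as a print-friendly version.
    print(robots_meta_tag())
```

With the defaults, the helper prints `<meta name="robots" content="noindex, follow" />`. In a dynamic site, you could call something like it from the page template only for the URLs that you have identified as duplicates.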















My name is Daksh, and I help online businesses improve their link popularity, especially through social bookmarking, article submission, and directory submission services.

This article is free for republishing
Source: http://jamesdaksh.articlealley.com/the-causes-and-potential-negative-effects-of-duplicate-content-2035092.html

