Preventing Blog Scrapers
I read Problogger's blog every once in a while as I often find good tips in there. Today I noticed just such a post.
The post is about blog scrapers, something I have often wondered about. Scrapers are people or bots that take your RSS feed and use it as their own blog or content, often with no link to you or mention of you. They do so purely for the advertising revenue. They hurt us in two ways: 1) they dilute your site's ranking by causing search engines to treat your content as duplicate content, and 2) they make money off your work.
They are different from the many excellent aggregators and other online tools that use small snippets of your content under fair use and often drive traffic and new users to your site.
My blog has been abused by scrapers in the past, and so have other blogs that I read. Until now I have just sent a note to the scraper or their host and it gets cleared up, but clearly that doesn't scale. It's hard to find them, and when you do there is no sure way to get them to stop. That is why I liked the blog entry mentioned above. It outlines some pretty good tips to stop them. One quote from Matt Cutts of Google I found especially interesting:
“…if the syndicated article has a link to the original source of that article, then it is pretty much guaranteed the original home of that article will always have the higher PageRank, compared to all the syndicated copies. And that just makes it that much easier for us to do duplicate content detection and say: ‘You know what, this is the original article; this is the good one, so go with that.’”
Basically, if each post in your RSS feed has a link back to the original post, that would help SEO and combat the scrapers. So that got me thinking: how best could we implement this with BlogCFC? Ray may opt to have something like this in version 6, but for those of us who want to do this sooner, what is the best approach? It would be easy enough to update the rss.cfm file in some way. I am more concerned with doing it in a way that would be difficult for scrapers to strip out and that would make sense to readers of our usual feed (not make us look stupid).
Maybe just a simple link at the bottom saying 'original entry at xyz domain' is all we need?
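To make the idea concrete, here is a rough sketch of what appending such a link to every feed item could look like. It is written in Python rather than ColdFusion and is not BlogCFC's actual rss.cfm code. The function name add_source_links and the site_name parameter are made up for illustration, and it assumes a plain RSS 2.0 feed where each item has a link and a description element.

```python
import html
import xml.etree.ElementTree as ET


def add_source_links(rss_xml: str, site_name: str) -> str:
    """Append an 'Original entry at ...' link to each item description.

    Hypothetical helper: assumes an RSS 2.0 document passed in as a string.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        desc = item.find("description")
        if not link or desc is None:
            continue  # nothing to link back to, so leave this item alone
        # Build the footer as HTML; ElementTree escapes it again when the
        # feed is serialized, which is how HTML normally travels in RSS.
        footer = '<p>Original entry at <a href="{0}">{1}</a></p>'.format(
            html.escape(link, quote=True), html.escape(site_name)
        )
        body = desc.text or ""
        if footer not in body:  # do not add the footer twice
            desc.text = body + footer
    return ET.tostring(root, encoding="unicode")
```

A footer like this is trivial for a determined scraper to strip out, so treat it as a starting point. Working the link into the body of the post would be harder to remove, at the cost of being harder to automate.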


Comments (moderation on)
That won't fix stuff like this, of course (Ray's site copied).
http://razorproxy.com/index.php?q=aHR0cDovL3d3dy5j...
What I have done is institute an automatic <more> tag, so that any long post automatically links back to my site for the details.