


Oct 31 2008

Blog Scraping

category: Rants

For the uninitiated:

Blog scraping is the process of scanning through a large number of blogs, usually daily, searching for and copying content. This process is conducted through automated software. The software and the individuals who run the software are sometimes referred to as blog scrapers.

Scraping means copying content from a blog that the person running the scraper does not own. If the material is copyrighted, this is copyright infringement unless a license permits reuse. The scraped content often ends up on spam blogs, or splogs.

This is something I’ve not had to deal with in the past. I guess my blog wasn’t useful enough or interesting enough to scrape. That changed recently when I saw a pingback awaiting moderation. I’m always curious to see who is linking to my site.

Most of those links are related to the plugin I’ve written for WordPress. This pingback, however, was bizarre: it contained my post in its entirety. It turns out that the site, which I won’t name so as not to boost its link count, has blatantly copied my posts and republished them on its own blog. The link back to my original post was left intact, hence the pingback.

A quick look through the other entries on the site shows that it has been scraping numerous other blogs and posting their content surrounded by Google ads.

Blog posts are copyrighted material, and republishing them without my express consent and permission is at the very least unethical and potentially illegal. Furthermore, the other scraped posts are a jumbled mess of unrelated articles that have nothing to do with each other.

As flattering as it might be to have my content stolen, it irritates me that it was scraped without permission only to end up on a spammy blog.

Cheers
