Wholesale Blog Plagiarism … Alert
Written by Om Malik - Posted Sunday, December 25 at 9:05 AM
Updated: December 27, 2005: Wholesale blog plagiarism is a much wider problem than most of us realize. In the past few days, as we tried to get one site shut down, many more examples have come to light of sites that are simply ripping off content and repurposing it to make ad dollars. I could list quite a few names here, but why give them the traffic? Some suggest no more full feeds, which has sparked off a whole different debate. I am not cutting off the full feeds because some people are not doing the right thing. That doesn’t mean I am not worried about this whole trend. I turned to Dick Costolo, CEO of FeedBurner, and asked him if he could do something. Dick replies…
We do have the ability to throttle these kinds of things IF they are identifying themselves. Frequently, these kinds of sites use a tool that masks their identity by just requesting the feed with a blank user-agent string (for example, instead of sending up “FeedDemon/1.0” or “Googlebot”, they just send up nothing). The problem with banning a blank string is that there are a bunch of perfectly valid home-grown RSS readers out there that also send blank strings, and finally, these guys usually bounce around from IP address to IP address.
Clearly, these sites ONLY exist because they can make money from Google AdSense. The issue is important enough for Google to step in and do something. Everett says that in the short term it may not be a bother for Google. Jeremy Pepper points out that while the splog sites are doing this to make a quick buck, what about the aggregators and others who are repurposing the content and making money off that?
One commenter writes, “Scraper sites may soon become the Achilles heel of the Google AdSense program and trigger massive advertiser withdrawal, like what happened to banner advertisements of the Web 1.0 era, when many sites started to reload the page every few seconds to get billions of ad displays and advertisers lost millions.” Meanwhile there has been a lot of behind-the-scenes conversation that cannot be blogged right now.
Original Post…
Last week, Mike over at Crunch Notes was complaining about Josh Stomel, who was making slight changes to Mike’s posts and reposting them as his own writing. Well, at least Stomel made some effort. This morning, Andy Abramson sent me an email about this website which is lifting and reposting the posts from GigaOM wholesale, images and everything.
These guys, who call themselves a magazine network, are so dumb they even took the categories. Apparently, these people are ripping off not just my content, but also the content of other bloggers. The design seems to be inspired by “Weblogs Inc” and clearly, this site was created to make money off other people’s work. Think of this as a new kind of splog. All right folks, I need some suggestions on how to make this shit stop. The domain is registered to someone in Texas, and the email address on the domain registration information goes to RezGlobal, a wholesale luxury travel agency: a global company that has a website, but no executives.
Update: Thanks to reader suggestions, some aggressive reporting by Dave Burstein and Andy Abramson, along with a quick response by the site’s host, xb90.com has been shut down. Thank you all for the moral and technical support.
Comments and Trackbacks
Om,
You contact their hosting provider and notify them of copyright infringement. If they are a US provider it should be a simple matter. They might move, but eventually they’ll get tired.
If you want to pursue more litigious methods, you need to find out who they are.
any idea how to find out who their hosting service is? are the nameservers an indication of who the hosting provider might be?
This should be the info you need…
Technical Contact:
Master , Domain
gstirling@rezglobal.com
4311 North 10th Street
Mc Allen , Texas 78504
United States
(775) 363-4545
Om…According to Netcraft, that domain’s ip address is 216.219.95.188 which belongs to…
Host Department LLC
501 Silverside Road, Suite 105
Wilmington, DE 19809
According to this…
http://uptime.netcraft.com/up/hosted?netname=WORLD-ISP-NETWORK,216.219.80.0,216.219.95.255
That site is one of 50 on that server.
Hi there PXLated, thanks for helping out with that information. i sent an email to their abuse department as well. let’s see what happens. appreciate the help deeply
Om: Make sure to cite the DMCA in all correspondence. This is a case where the DMCA is your friend, in a very big way. Even if the ISP for some reason doesn’t believe or like you, they are required by law to automatically remove the content immediately and put it in safe harbor until the matter is resolved. If they don’t, you can sue them, which is why they will automatically do it.
DMCA is the best route to go - send it to them, to their ISP, and the ISP’s upstream. You’ll get a quick response.
Matt
Welcome to the next spam wave. I know some of these guys are making thousands a day by taking XML feeds and repackaging them in high-paying markets.
The only way you will be able to shut them down is to report them to Google AdSense for breaking their TOS.
You can also contact Google and get their AdSense shut off…
thanks guys for the tips. i am following up on this and will keep you posted. appreciate the tips. very helpful.
om,
it looks like these guys even splogged your post on the fact they are splogging your posts!
cheers, mark
hey mark,
that is funny, and further proof that they are nothing but scammers. please help spread the word and make this a nightmare not just for this guy but also for his host, and everyone else
LOL. They even have your plagiarism blog published on their site. Reminds me of a stupid boy in my school who always copied from me in exams and sometimes even my name. HaHa!
Meanwhile check this website www.techwhack.com, he plagiarises the entire news networks content as blogs to earn from ads and has a very high ranking in Google news.
Om, regarding plagiarism, I can’t speak but a big company is already writing a tool to detect the same.
khabri
thanks for the heads up on the tool. whichever company is doing that, it is doing a lot of us a favor. and i am kind of sick and tired of always fighting this battle. so some automation will come in handy
I don’t see why you seem so pissed! I see the ‘Times of India’ having one page called ‘bloggers park’ where they lift articles from the blogosphere! It’s basically free content for them, and it brings in revenue as they have print ads. Now, will you call that plagiarism? He doesn’t give credit, I agree, but guess what: over a period of time people will know the real source. Try introducing referrals to your own site then and there so that the reader knows where the content is coming from.
Trying to go after these kinds of people is just a pure waste of time and effort. If you are successful in this, after a month it may be someone else!!
Om, I’ve been a hosting provider for 10+ years. We are, as others have stated, required to state on our site the person who is the DMCA-related legal contact. That means you are lucky (temporarily) that these sploggers’ provider is in the US. It won’t be the case next time. As a hoster we always respond to DMCA complaints (though I am not sure we are required to immediately block a site alleged to be a copyright violator); however, a crook hardly ever hosts with a seemingly law-abiding hoster like us. These guys simply have some sort of XML-sucking tool, so their cost of copying and moving the site is almost zero. As we’ve learnt from the spam war, content creators can’t win this lopsided war.
I’ve found posts of mine copied onto splogs before, but it doesn’t seem too harmful. These websites get virtually no traffic, of course, and if you occasionally link back to one of your posts in a post, then when it is copied there, Google will see that as yet another incoming link, which is arguably beneficial. It’s sort of like someone else doing SEO dirty work for you.
Still, it’s pretty weak.
joseph, thanks for your input and you are right. normally i ignore these bastards but this time around it is just too much… oh well.
sanjay
thanks for your input. i agree, some of these nefarious people are simply ruining it for all the other law abiding folks. i still wish there is something we could do about this. i have followed the instructions from all the folks who have left a comment, and i wonder if that is enough.
thanks for your words of support.
Om,
For sites like mine and others that do not rely on ad revenue to pay for themselves, this ends up building our legend. For you and those like Tom Keating and a few others who do rely on ad revenue, the stealing of traffic hurts in the wallet, which is why I alerted you.
Andy
Sanjay: You actually are required to immediately place any content which has a DMCA complaint against it in safe harbor (i.e. temporarily remove it from the site, by whatever means you choose). At that point, you are only required to notify the accused party that you have taken it down and that there is a complaint against it.
This is to protect you as the ISP, and not anybody else.
From the moment you take it down, there is really no further action you must take. You are not required to make a judgement on whether or not the content is illegal, and in fact, you may allow the content to be re-posted if you feel the accused is correct. At this point, any lawsuits are between the accuser and the accused and you are not a party to it. 99% of the time, it doesn’t get this far though as the accused has been “caught” and usually cowers away into a hole at that point.
It’s something that is happening more and more out there. At least you have presence of mind to have your copyright below and are not relying on just a Creative Commons license.
But, it does beg the question about other legitimate services that do use your posts and make money off of them. What about the news aggregators that pull your posts, yet have AdSense or other ads there? What about people that are blogging via Del.icio.us - pretty much just link blogs that also have ads? Where does the line get drawn?
No, I don’t think what the xb90 guy is doing is kosher, but I do think others have been rabidly attacked for doing stupid stuff. This guy is milking the system, while others have just been lazy.
Just my .02.
I started putting comments on their site stating that they are ripping off your content.
Om, find their requests for your feed in your logs and block them with Apache. I assume the requests for your feed will come from the same server and IP address. If you need a hand let me know.
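To make the log-checking step concrete, here is a minimal sketch in Python. It assumes the server writes Apache's "combined" log format and that the feed lives under `/feed/`; the function name, the feed path, and the sample log lines are all made up for illustration and are not taken from GigaOM's actual setup. It only tallies who is requesting the feed; the blocking itself would still be done in the Apache configuration.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# ip identd user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def feed_requests_by_client(log_lines, feed_path="/feed/"):
    """Count feed requests per (IP address, user-agent) pair."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(feed_path):
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts

# Hypothetical sample lines; a real run would read the access log file instead.
sample_log = [
    '203.0.113.7 - - [25/Dec/2005:09:05:00 -0800] "GET /feed/ HTTP/1.1" 200 5120 "-" ""',
    '203.0.113.7 - - [25/Dec/2005:09:35:00 -0800] "GET /feed/ HTTP/1.1" 200 5120 "-" ""',
    '198.51.100.4 - - [25/Dec/2005:09:40:00 -0800] "GET /feed/ HTTP/1.1" 200 5120 "-" "FeedDemon/1.0"',
    '198.51.100.4 - - [25/Dec/2005:09:41:00 -0800] "GET /about/ HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

# Most frequent feed clients first, with the user-agent shown via repr()
# so that blank strings are visible.
for (ip, agent), hits in feed_requests_by_client(sample_log).most_common():
    print(ip, repr(agent), hits)
```

A client that polls the feed heavily with a blank user-agent is a candidate for blocking, though as noted in the post, scrapers tend to hop between IP addresses and some legitimate readers also send blank strings, so a list like this is a starting point rather than proof.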
Nik,
that is a great idea. i think it is something i am going to try and get done. it might be a better way of blocking them. thanks again.
[…] I am going to refer you to Om Malik’s post about him getting ripped off by some website that is merely copying RSS feeds and not giving credit to the original author. This is ridiculous and hopefully will be stopped […]
[…] sites that repost content are nothing new. However, this latest, reported by Om and Andy, is really over the top. A company called XB 90 media is lifting posts out of RSS Feeds, and reposting them whole without attribution. They’re monetizing the site with Google Adsense. It’s such a laughable job that they’ve even preserved the original poster’s categories. […]
While RSS feeds can spark a new take on a subject and we all build upon others comments, wholesale plagiarism of blogs shouldn’t be tolerated, just as we wouldn’t tolerate ripping-off the content of the AP or a ‘conventional’ online publication. If legal action cannot be taken, these thieves should at least be ‘out-ed.’
Ed
[…] Om writes about his experience with splogs and plagiarism too. What’s great about these complete copy-cat blogs is when they steal and re-post the entries that talk about them stealing and re-posting content. It shows, of course, that this is just an automated system, and these guys are doing it to generate revenue. […]
I found another site, http://onlinecashkings.com/, that is doing the same thing, except that they are taking stuff from http://www.calacanis.com/ and not Om Malik. This seems to be getting more and more out of hand and we need to start doing something about it.
“The only way you will be able to shut them down is to report them to google adsense for breaking their TOS.”
You’re joking, right? Why would Google shut down advertisers who are making them thousands a day? Google has ZERO oversight on the content that runs AdSense. They take no responsibility.
“Google is a provider of information, not a mediator. We serve ads targeted to certain web pages, but we don’t control the content of these pages. For these kinds of questions or comments, it is best to directly address the webmaster of the page in question.”
Google is the crack dealer handing out drugs at the playground and then saying “but I’m not forcing anyone to do it”.
Om - to piggyback on pxlated, and if you use IE (unfortunately not available for FF or Opera yet), you can use the Netcraft toolbar to see who is hosting a domain as you visit it (see toolbar.netcraft.com). It also has some anti-phishing stuff there.
Just a thought…
If they are grabbing a feed and it’s “full”, they get value. If the feed is just a “Summary” with a link to your site, there is little (or less) value. I know some don’t like “summary” feeds but hey, it’s an easy defense against these types.
This is why I’m against full text in feeds. Set your feed to include only the first N words and you’re done. I can’t see how you can complain if your feed contains the full text, you’re practically giving it all away.
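For anyone curious what "only the first N words" amounts to, a rough sketch follows. The function name, the default of 50 words, and the example URL are assumptions for illustration; real blog software has its own excerpt settings, and this is not how any particular platform implements them.

```python
def summarize(text, max_words=50, link=None):
    """Return the first max_words words of text, marked as truncated.

    Text that already fits within the limit is returned unchanged.
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    summary = " ".join(words[:max_words]) + " [...]"
    if link:
        # Point readers (and anyone who scraped the feed) back to the source.
        summary += " Read the full post: " + link
    return summary

# Hypothetical post text and URL.
post = "Last week Mike was complaining about someone reposting his writing"
print(summarize(post, max_words=5, link="http://example.com/post"))
```

The trade-off is the one argued elsewhere in this thread: a summary feed gives scrapers less to steal, but it also gives honest subscribers less to read.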
Looks like the site’s been taken down…
“Account for domain voip.xb90.com has been suspended”
Paul
Om, the site has been suspended, I just checked…
site is down - hope it stays down, those crooks.
This site http://www.yukonet.com/default.aspx?content=showChannel&channelId=109273 shows Om’s content with ads.
Well done on shutting it down. A useful tool for discovering this stuff is http://www.copyscape.com/ - I’ve found a few people stealing my marketing copy with it.
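How such detection tools work internally isn't described here, so the following is only a generic sketch of one common way to spot copied text: break both texts into overlapping runs of words ("shingles") and measure how many runs they share (Jaccard similarity). The function names and the three-word shingle size are illustrative choices, and nothing here is claimed to be Copyscape's method.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(a, b, size=3):
    """Share of word shingles two texts have in common, from 0.0 to 1.0."""
    shingles_a, shingles_b = shingles(a, size), shingles(b, size)
    if not shingles_a or not shingles_b:
        return 0.0  # too short to compare
    return len(shingles_a & shingles_b) / len(shingles_a | shingles_b)

# Hypothetical texts: a post and a copy with only the last word changed.
original = "these guys even took the categories"
suspect = "these guys even took the images"
print(jaccard_similarity(original, suspect))
```

A score near 1.0 suggests a wholesale copy, while lightly reworded posts score lower; any threshold for flagging a page would need tuning on real examples.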
Publishers using the summary feature of RSS feeds is one side, aggregators using the “excerpt” capacity of most of the RSS parsing software is the other.
Having twenty words of Om’s content as Memeorandum does at the moment means people will come to this site; having the whole article is theft.
It is an issue which is going to become much more important as mashups and aggregations try to add value to original content.
Om,
The ripoff posted above is taking your feedburner feed. You know how to fix that I think..
Looks like you killed them, site is down.
Splog away! Splog away! Splog away all!
Gee, I can’t even take some blogging time off for the Christmas holiday without some major controversy hitting the blogosphere. Apparently, some splogger has been lifting several bloggers’ full content surrounded by Google AdSense. Andy Abramson, a fell…
[…] This issue has got some traction over the last 24 hours, because big-time blogger, Om Malik, got the shits his content was being ripped off. Fact is, though, it’s happening all over the net, all-day, every-day. There’s simply stuff all respect for writer’s copyright on the Internet and there needs to be a fundamental change take place or it’s going to go on unabated and the tenuous economics that one day, might make blogging and independent media sustainable remain even further off in the distance. […]
Om,
I think this story has touched the heart of so many bloggers.
Looking at the popularity and number of comments in this post, I am sure to find this in Best of GigaOm 2006 ebook.
Sometime back, About.com picked up a story from my blog and didn’t give any credit. I wrote to the About.com editor but my email was simply ignored.
http://labnol.blogspot.com/2005/12/aboutcom-and-blog-plagiarism.html
Amit.
[…] On the issue of wholesale copying of blog posts. This comment sums it up nicely. […]
Talk about a nasty Christmas surprise…
If, as seems likely, the ripoff artists are mirroring your posts wholesale using some kind of bot, it is very easy to add some JavaScript code to your page that defaces the plagiarized site big time.
Scraper sites may soon become the Achilles heel of the Google AdSense program and trigger massive advertiser withdrawal, like what happened to banner advertisements of the Web 1.0 era, when many sites started to reload the page every few seconds to get billions of ad displays and advertisers lost millions…
Splogs Go Away Quietly into that Good Night
In my post last night titled "Splog Away! Splog away! Splog Away all!" - a pun of "Dash Away! Dash Away! Dash Away all!" - I commented on a recent splogging of several bloggers’ content. I’ve been splogged a…
[…] Seen first at Threadwatch, followed from their point to a fellow 9ruler Om Malik. […]
i know you don’t want to do this, but in august i killed our full text feed because of the text theft. our feeds now just display the summary. i didn’t want to go that route either, but i was getting sick of our content being copied/pasted and placed on a page with adsense ads (we don’t even have adsense on our page, so why should they make money where we don’t). as soon as i killed the full text feed, they completely stopped. i haven’t seen any new content stolen since august.
Om, this is nothing new - webmasters have been complaining of their sites being scraped for short-term Google AdSense monetization for some time now.
By the way, your blog entries don’t seem to show for this Firefox user:
Error: [Exception… “Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIXMLHttpRequest.status]” nsresult: “0x80040111 (NS_ERROR_NOT_AVAILABLE)” location: “JS frame :: http://gigaom.com/wp-content/themes/gigaom/javascript/giga.js :: stateHandler :: line 32” data: no]
Source File: http://gigaom.com/wp-content/themes/gigaom/javascript/giga.js
Line: 32
Sorry to hear about this Om but I have one question. Have you been hiding under a rock or something? The amazing part to me is that someone who is supposedly so net savvy JUST realized that scraper sites steal blog content for the purpose of displaying ads next to it. Just seems odd you didn’t know about this a long time ago. In the online marketing and advertising business this kind of thing is old news.
I hate to post this, Om, but stopping the scrapers is an almost impossible task from the webmaster’s end: you kill one and hundreds will appear!…
IMHO, the best way to stop this is to eliminate their financial incentive. Almost 90% of these scrapers monetize with AdSense; if only Google had a strict AdSense spam policy and acted on the spam reports fast enough (meaning they would have to deploy more warm bodies), this problem wouldn’t be this big!
Well now you know, how the artistes felt when you downloaded their music using Napster or BitTorrent.
awww…don’t tell me you never ripped off a song in your life huh?
hey MIT Dude, i think it was in the napster days. its simpler to just buy now. thank you itunes. still your point is very valid.
MIT Dude, I don’t think your analogy is quite right. The main issue here is these guys making money out of your content, which is not something you do when you download music (unless you copy it to a CD and then sell it on the street).
The worst thing here is plagiarism: when someone takes your content and pretends he is the author (something you are unlikely to do with a Britney Spears song).
[…] Check out this bitchfest. […]
[…] I’ve read some things that get people ticked off pretty quickly. I guess I never thought my blog would be one of them (indirectly). Unfortunately, Om and others don’t really like aggregation of blogs (like this one does right now). […]
If you have ever checked out a “hire a freelancer” website, you know, the ones where people post their Internet and related technologies projects for so-called “professionals” to bid on, you’d find that many, many projects are to “clone” or “scrape” another website. It makes me sick that people bid on these illegal projects!!! I’m not saying that all the buyers and sellers are crooks, but I am saying that this activity is going on in plain sight yet nothing is being done about it.
I have to agree with AGoToGuy here–this was news 10 months ago. Since then Technorati notifications on my domain name have been running 10-20 a week, all splogs. Wikipedia has had it defined for months.
http://en.wikipedia.org/wiki/Splog
If you click on the “Ads by Goooogle” link you can report the offending site, but that’s playing whack-a-mole, you’ll waste more time reporting than the offender does generating the splog.
while ppl are discussing this issue I thought I should check opinion of legal/ethical experts on
http://sf.getvendors.com (check out the news & views section).. we need to polish it and fix a number of issues (the final version will look quite different and load fast)..but looking at this discussion wondering what you folks think about approach..Feel free to take shots..
Drop your full feeds Om, it doesn’t give you 100% protection but it does limit the scrapers ability to rob from you.
Sorry, Duncan, and others suggesting that bloggers drop their full-text feeds… that’s throwing the baby out with the bathwater.
What next? Gee, I think I’ll stop sending and receiving e-mail and just IM people with a link to view a note to them on my Web site?
Spammers and other thugs on the Internet shouldn’t force us to make content distribution and accessibility a pain in the ass for the 99% of the people with ethics.
Instead, people should indeed lobby Google to institute policies and procedures that — while not, unfortunately, likely to increase their revenues — will at least largely make the blogosphere a better, less-scraped place.
The problem, though, is that Google AND its advertisers have no economic incentive to clean this stuff up. Look at how things currently are with splogs:
1) Asshat creates a splog featuring, say, Viagra links.
2) Some floppy fella searches Google for the Big V.
3) He lands on a scraper page and sees a big bold ad for Viagra, along with some scraped info from a medical blog about the topic. He’s happy. He clicks through on the ad to a real Viagra site.
4) Google’s happy. They just got paid.
5) The real Viagra site’s happy… they just got a new, valid customer (someone truly interested in their product), albeit in a slightly round-about way (one extra click).
6) And the scraper site’s happy. They just got AdSense money.
So, basically, in this typical scenario, EVERYONE is happy except for the folks whose content is being scraped. And unless they’re uber-geeks who check for their links regularly, they probably don’t even KNOW their stuff is being scraped, so no harm no foul, right?
And those being scraped, so to speak? They’re typically not Google’s customers. Their unhappiness currently isn’t any sort of a liability for Google. Worse yet, making them happy (serving as a copyright policeman) is likely to LOSE Google money.
I sincerely believe that Google’s engineers are trying to figure out algorithmic ways to blast the sploggers to hell and kill their AdSense revenue largely BECAUSE it’s the right thing to do. But given the lack of economic incentive, I sincerely doubt this is a top-priority project over there.
[…] Om Malik’s blog post on blog scraping » […]
This site: laptop-notebook.blogspot.com ripped off two of my reviews, word for word (along with ripping off a lot of other sites, including Trusted Reviews, Ziff-Davis Net, PC World, PC Magazine, LAPTOP Magazine, etc).
It’s hosted by Google. I have contacted Google, both the Adsense and the Blogger sides. Nothing, not a word.
How do these people get away with it? They don’t even change the wording of any of the reviews they lift (and they do a damn fine job, including lifting all relevant images).
I repackage many rss feeds on my site and I have google ads. I don’t see any problem with it. I’m offering a service by aggregating various feeds together. An RSS feed is meant to be redistributed, is it not? (BTW I give clear attribution and links to the original rss feed and web page. )
Make sure to get a cached copy of one of their offending webpages, including the date and time. You can get one from Google by searching for “cache:” followed by the offending website’s URL. Archive.org’s WayBackMachine may include more than one copy, so you can see how long they have been copying you over time.
Om Malik’s blog plagiarized - Been there, here’s one way to resolve it…
…
[…] Now Om Malik got hit by this and he has all the right to be pissed off. It’s a case of theft and should be handled at that level. Though some of the solutions suggested in the comment trail can be packaged together as a solution to deal with this growing issue of blogiarism. […]
The answer is DRM, not DMCA. You need to proactively protect your content using technical means, not reactively throw lawyers at the problem. This would mean changing the way everyone publishes and consumes content. Unfortunately, that’s a huge change, but it’s the only way we can ensure correct attribution and compensation.
[…] Granted this isn’t as egregious as stealing my entire feed - or someone like ATLBloggers republishing my feed with my permission (I joined the site, essentially opting in) - but it still sucks. Others, like A-listers Steve Rubel and Om Malik, face this problem of “blog plagiarism” and “blog theft” daily. I can’t just opt out of this guy doing what he’s doing. How should I proceed? Ask to be taken off entirely? Request a link back to my main page and the original post? Be thankful more folks are reading my words? […]
[…] Recently Om ran into the problem of plagiarism in blogs, a trouble serious enough for blogs in general and many times harder for top bloggers like Malik. These are splogs that copy entire sites, images and posts included, in order to attract casual readers who arrive from search engines, are tempted to click a little on the ads, and generate profits for the content thieves who produce them. Xb90.com was one of these. Om published a furious post in which he threw accusations left and right, among others at Google, which allows advertising on such sites without any filtering, and asked for his readers’ help. The current round of the battle ended when the splog was taken off the net with the kind help of several readers. […]
[…] A few days ago when I made a stink about certain sites simply republishing the content without as much as a hat tip, I got a smack down from others who thought let the text flow. Jason and Jeff Jarvis have since picked up the flag, and are basically pointing out that this is becoming a big problem. Not sure where it is going to end, but Mike Rundle is sounding an alarm over a new site called Top Ten Sources. The site is simply republishing entire feeds, regardless of the content length or whatever. So the cycle continues… […]
[…] Om Malik calls it plagiarism and points to a number of bloggers making the same complaint. I don’t really see it that way. […]
Thanks for your page. It’s my favorite.
Amour
It’s a win-win situation. Google makes money, scrapers make money, AdWords publishers get targeted traffic.
But it’s also a trade-off for Google: it needs to maintain the quality of its results, but also be as profitable as possible. So while other search engines aren’t doing anything to improve their SERPs and ban scrapers, why would Google care?