Wednesday, June 14, 2006

Thieves and Idiots... A Slap at Many Faces

Ready for a sharp-tongued diversion in which I take a slap at several entities in a single post? Ready for a piece that both is and isn't about poker? I thought so. Blame it on a pissy day at the tables... I feel the need to vent.

The launching point for this mini-sized discourse is over at Gambling 911, a site that's been floundering for a bit, looking for a new purpose ever since the most recent season of "American Idol" concluded. There was a day not too long ago when the Gambling 911 page featured six different "Idol" pieces out of the eight that were visible on a standard-sized monitor. Who sayeth overkill? Thank God the World Cup has started --- for Gambling 911, at least, it's a change of pace.

Anyhow, Gambling 911 somehow snuck in a piece not too long ago that was related to a topic we reported on here: the continuing shady practices --- mostly various forms of online property theft and spamming --- by the 888 Holdings/Cassava Enterprises group that also runs Pacific Poker/Casino-On-Net.

Or, as Gambling 911 reported it: "'Site scrapping and content theft are very serious issues that need to be halted now,' wrote [casino watchdog Bryan] Bailey on his popular posting forum. That forum also explains exactly what 'site scrapping' is in more detail."

Hilarious, if only because it's called "site scraping" (as the original quote shows). I have no problem with typos (and serve up many of them myself), but quoting something twice, specifically, and still getting it wrong sort of crosses the line. There are those moments when Gambling 911 demonstrates that it's about an equal mix of (a) competent individuals and (b) random-key-punching floozies more interested in flashing their boobs at the nearest camera than getting a story even close to accurate. Guess which group's representative likely typed this one up.

Scrapers, scrapers everywhere

But, whatever. Dear Gambling 911: Learn how to use the "Ctrl-C" function on your keyboard when copying a quote.

As for me, it's on to the matter of site scraping itself, and why 888 remains front and center on the list of online-poker scumbags. Bryan Bailey, the webmaster of the Casinomeister watchdog site referenced above, reported that one thread on the topic at a related site extended to 15 pages, reflecting its relevance to anyone who has a web site related to poker or online gaming.

I'll let Bailey's words define what "site scraping" is:

"Basically, Site Scraping is where a black/grey hat webmaster uses a bot to grab as much content as possible from different websites. This content is then used to form the basis of this webmasters new site.

"So not only is this content theft, the offending site could possibly affect your own sites' rankings in the Search Engines if they utilise your stolen content as it is possible you will be hit by a duplicate content filter, thus lowering your serps across the board.

"The benefits of scraping [also] allow these webmasters to upload normally recently expired domains that they have purchased thus avoiding the Google sandbox, with thousands of pages of themed content. This in turn will initially allow the site to rank highly for its targeted terms. However, the pages being served to the visitor will redirect to the targetted casino of which the webmaster either affiliates with, or works for..."

You've heard of the page-rank game, right? If you have any sort of poker-content website, then beware: 888 and its affiliates may well be stealing from you. Needless to say, there are also lots and lots of links to evidence of 888's complicity, viewable at the sites linked above.

However, it's one thing to define what site scraping is; it's still better to show you an example of it in action. Readers who've been visiting for a while will remember a story I did a few weeks ago about a poker-software program I encountered that was 99 and 44/100% scam. Okay, it was 100% scam; so much for Dirty Harry and the benefit of the doubt. I mentioned encountering the news release for the scam site at another site called Poker News Hub, which I described in several ways, the nicest of which was probably "vapid poker mindlessness."

Allow me to update and correct myself: Poker News Hub is a site scraper in action, and it scrapes a rather prominent source... Yahoo! News. Take a look at any recent day's listings on the Poker News Hub site, then go to Yahoo!, click the "News" tab above the main text-search box, then type in "poker". Voila. Your story lists --- if it works as it did when I tested this --- will be one and the same.

Unless Poker News Hub is a fringe division of Yahoo! itself (which seems unlikely, given the affiliate ads), the site is web thievery of the site-scraping variety. What they're doing is grabbing the XML feed from Yahoo! for the "poker" topic, then using programming macros to repurpose it onto their own site, sandwiched between all those affiliate banners. It's an example of how site-scrapers operate, though they come in many flavors. Nor is this Poker News Hub site necessarily connected to 888, Cassava, or any of its affiliates: we're just using it here as an example of the process.
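For the curious, here's a rough sketch of how little effort that kind of feed repurposing takes. This is not Poker News Hub's actual code, which I haven't seen; it assumes a plain RSS 2.0 layout, and the sample feed, the headlines, and the function name `feed_items` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a news-search feed on the topic "poker".
# The real feed's layout may differ; this assumes plain RSS 2.0.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample news search: poker</title>
    <item>
      <title>First sample headline</title>
      <link>http://example.com/story-1</link>
    </item>
    <item>
      <title>Second sample headline</title>
      <link>http://example.com/story-2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# A scraper would drop these pairs straight into its own page template.
for title, link in feed_items(SAMPLE_FEED):
    print(title, "->", link)
```

A dozen lines, in other words. Everything past that point is just page layout and affiliate banners.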

So why the rant? It's because of the nature of theft --- thieves may steal from only one source at a time, but collectively, their acts cost everyone.

So guard your poker content, because someone can use it to make money at your expense. One trick I often use is to look for a distinctive phrase, such as "Nor is this Poker News Hub site necessarily connected to 888" in the above, then (after a few weeks have passed) I search for the complete phrase on the web to see where else it might pop into view. Such methods help identify the thieves, and for me, they often help separate the wheat from the chaff.
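If you'd rather automate the distinctive-phrase trick, a minimal sketch might look like the following. It assumes you have already saved the text of the pages you want to check; the page names and sample text are invented, and the helper names (`normalize`, `quoted_query`, `pages_containing`) are mine, not part of any search engine's tools.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quoted_query(phrase: str) -> str:
    """Wrap the phrase in double quotes, the usual exact-phrase search form."""
    return '"' + phrase.strip() + '"'


def pages_containing(phrase: str, pages: dict) -> list:
    """Return the names of the pages whose text contains the phrase."""
    needle = normalize(phrase)
    return [name for name, text in pages.items() if needle in normalize(text)]


phrase = "Nor is this Poker News Hub site necessarily connected to 888"

# Invented sample pages; in practice this would be text you fetched yourself.
sample_pages = {
    "my-own-post": "... Nor is this Poker News Hub site necessarily\nconnected to 888, Cassava ...",
    "suspect-copy": "PLAY NOW! nor is this poker news hub site necessarily connected to 888",
    "unrelated": "A hand history from last night's tournament.",
}

print(quoted_query(phrase))
print(pages_containing(phrase, sample_pages))
```

Normalizing case and whitespace matters because scraped copies often get re-wrapped or re-cased on the way into someone else's template.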

I went back a few weeks into one of my own posts here, and found out that, sure as the sun rises in the east, the bot-driven site scrapers have even hit this little ol' blog. From the same scam-themed post I mentioned earlier, I selected the phrase "Sites like these are avenues of opportunity for even" --- with quotes on both ends of the word string. I went to Yahoo! and typed it in ... and found 43 fucking matches. And yes, all but a couple of the links that turn up are going to site-link variations of the 888/Cassava/Pacific thieves mentioned higher up in this post.

Miss WWSB (World Wide Scraper Babe) plying her wares

I might not be able to fight them, but I damn sure don't have to like them.

And I damn sure don't have to give them my business, either.


Jonathan Bailey said...

There are ways to fight back against scrapers; DMCA notices work very well and don't cost any money.

If you want any help fighting against scrapers, just let me know. You can use the contact form on my site to email me if you wish.

I'll gladly help any way that I can, especially against these scrapers.

Anonymous said...

Although full content scraping is a moderate issue, what about Google News? They basically compile their news from a slew of sources, although they usually only include the headline of the content and a sentence or two of the content itself.

Do you consider this a threat? I am torn, and haven't decided my alignment on the issue to be honest.

Haley said...

As long as it's just the headline and a link to the original source, it's not a problem. The issue comes with sites that repurpose other people's original efforts for their own purpose and potential profit... as with the sites mentioned in this post.