Referrer spam attack! Or is it?
One of the peculiar aspects of having a blog is that you're not only a writer, but also a website administrator. If you have your own domain, you pay for bandwidth, and you allow people to come on and write whatever pops into their heads. As a result, there are things you have to watch.
And so it was with some alarm that I determined the other night that I was suddenly getting an enormous amount of traffic from a host of domains, all ending with ".listenernetwork.com/SearchWeb.asp." Before the first dot were the names of various radio stations around the country. When I went to my logs and clicked on the referral sources (or, as I've now learned, "referer" as the computer servers know it), you couldn't tell what they were searching for. But if you went to the various "listenernetwork.com" home pages, you would see that they're all a template, no doubt generated at a single location, that various radio stations use as a site for their listeners to visit. Generic and cheap, but seemingly unique at first glance. And shades of OregonLive and its sister sites in the anemic Advance Publications chain, they're all almost identical.
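For anyone who wants to spot this kind of pattern in their own logs, here is a minimal sketch in Python. It assumes the server writes the common Apache "combined" log format; the function names and the idea of grouping by the last two labels of the hostname are illustrative choices, not anything from the tools described above.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Tail of an Apache "combined" log line (assumed format):
#   "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referer_host(line):
    """Return the hostname of the referer in one log line, or None."""
    match = LINE_RE.search(line)
    if not match:
        return None
    referer = match.group("referer")
    if referer in ("", "-"):  # "-" means no referer was sent
        return None
    return urlsplit(referer).hostname

def count_by_parent_domain(lines, labels=2):
    """Tally referers by the last `labels` parts of their hostname."""
    counts = Counter()
    for line in lines:
        host = referer_host(line)
        if host:
            counts[".".join(host.split(".")[-labels:])] += 1
    return counts
```

Feeding the log's lines to `count_by_parent_domain` and looking at `.most_common(10)` would put a flood of radio-station subdomains under a single "listenernetwork.com" entry.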
That was all very interesting to notice, but it didn't solve my problem. The evil hits were coming fast and heavy, and they were landing on various category archive pages on this site. After nearly three and a half years, those archive pages have gotten mighty long. And there are tons of images on them, which makes them a heavy load to send out. If that's a robot spamming me, there's going to be nothing but trouble ahead. It's going to chew up huge hunks of bandwidth, and sully my hit counter with fake traffic.
And so off I went to find a solution. Blocking the IP addresses of the visitors wouldn't work, because the hits were coming in from all sorts of different addresses, and I was sure they were all being faked. The advice I most often received was that I could keep these referrals away by modifying a file on my server called ".htaccess." It is only with grave, grave trepidation that I mess with such things, but in the heat of yet another battle with spammers, off I went to try to set up a barrier.
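The logic behind a referrer block is simple even if the server configuration is fiddly. The sketch below shows only that logic, in Python rather than in the server's own rule syntax; it is not the set of rules tried here, and the function names are made up for illustration. Keep in mind that the Referer header is supplied by the visitor's browser, so it can be missing or forged.

```python
from urllib.parse import urlsplit

# Domain taken from the post; add others as needed.
BLOCKED_SUFFIXES = ("listenernetwork.com",)

def is_blocked_referer(referer, blocked=BLOCKED_SUFFIXES):
    """True if the referer's host is a blocked domain or one of its subdomains."""
    if not referer:
        return False
    host = urlsplit(referer).hostname
    if host is None:  # no scheme, "-" placeholder, or otherwise unparseable
        return False
    return any(host == s or host.endswith("." + s) for s in blocked)

def status_for(referer):
    """HTTP status a server following this rule would send back."""
    return 403 if is_blocked_referer(referer) else 200
```

Matching on the dot-prefixed suffix catches every station's subdomain at once without also catching an unrelated domain that merely ends in the same letters.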
Based on various highly technical posts I found on various sites, I came up with a number of different ways that it could be done. But try as I might with my limited technical skills, I couldn't keep the "listenernetwork" hits away. And so I brooded for the better part of two days about how I could save my bandwidth and keep the visits from artificially inflating my hit count.
During all this stewing, I noticed that the hostile searches all landed on the same archives on my site: Family, Food, International, National, and Nostalgia, with the last being the most prevalent. As a temporary fix, I renamed all those archives by sticking a "2" on the end of each name, and deleting the archive page that had each name on it without the "2." Sure enough, that kept "listenernetwork" referrals from making it to my hit counter -- they were still arriving, looking for, say, the "Nostalgia" archive, but instead of seeing any of my pages, they'd get a "404 - File Not Found" error. I was o.k., for now at least.
Still trickling in, though, were a smaller number of hits from various search engines, including Yahoo and a Denver newspaper's site, and on these, you could see the same bizarre search term: "What Manhattan deli served up a corned beef and tongue sandwich called 'Tongue’s for the Memory'?" While trying to figure out what to do to try to block that, it finally dawned on me that maybe that was what the listenernetwork searches were looking for, too.
Yes! Of course! It was that post I had written a while back about Hobby's Deli in Newark! Many of those search terms were in that post! And guess what? That post appears in exactly five archive categories -- the same five that were being hit by the listenernetwork searches. So the listenernetwork attack and the tongue sandwich attack were all part of one and the same evil plot, and they were all looking for the Hobby's piece.
Just for kicks, I ran the search through Google. Tons of hits, including this one as No. 1. Still no clue as to who the evil spammer is, but at last I'm getting somewhere. Then I tried running the search with the word "listenernetwork" up front, and ...
...lo and behold, the scales fell from my eyes. Check it out. It looks as though it's not a spam robot hitting my site at all, but a nationwide internet trivia quiz sponsored by a bunch of radio stations. And all those IP addresses? They may not be fake. Those may be actual radio listeners trying to earn "points" in a giveaway contest.
So what to do now? I'm thinking of taking down the defenses I've been throwing up. If those are legitimate hits, I should let my counter count them -- and just hope I don't run out of bandwidth at the end of the month. But there are so many of them, at so many odd hours, I can't believe they're all real readers. What to do?
Wherever this leads, I'm determined to figure out how to block referrals like this in the future. If ".htaccess" will do it, I've got to become more expert at it and figure out how.
One last note: While checking those referral logs, I noticed that the teenagers of America are major bandwidth thieves. All those "MySpace" pages are chock full of images being lifted from other servers via hotlinks. Among the ones the teens liked best from me were this, this and this. But if they are hotlinked to the original places where I had them stored until yesterday, their sites are now displaying this. Clean up your act, kids!
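For the curious, hotlink protection comes down to one test: is someone else's page asking for one of my images? Here is a rough Python sketch of that test. The host names are placeholders (this site's real host names would go there), the list of image extensions is an assumption, and the actual blocking on a server would be done in its configuration rather than in a script like this.

```python
import posixpath
from urllib.parse import urlsplit

# Placeholder host names; substitute the site's own.
OWN_HOSTS = {"example.com", "www.example.com"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}

def is_hotlink(path, referer, own_hosts=OWN_HOSTS):
    """True if an image is being requested from a page on another site."""
    extension = posixpath.splitext(urlsplit(path).path)[1].lower()
    if extension not in IMAGE_EXTENSIONS:
        return False
    if not referer or referer == "-":
        # No referer: direct visits and privacy-minded browsers are let through.
        return False
    host = urlsplit(referer).hostname
    return host is not None and host not in own_hosts
```

A request that fails the test can be refused outright or answered with a substitute image, which is the more entertaining option.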
P.S. To answer the trivia question, it's the Carnegie Deli. I've put up this post, with a fake date and an archive category all its own, to see if I can distract the traffic over there. I'll get the hits without the bandwidth drain.
Comments (6)
Interesting post, Jack. And I'm sure your message about stealing bandwidth will stop the myspace kids from using your pictures. hahahahaha...
A more clever message would have been a picture of Nancy Reagan proclaiming "Just say No".
Posted by justin | November 7, 2005 5:49 AM
Actually, I believe I've got them all "forbidden" now. They can steal the pictures all they want, but they'll have to show them via their own bandwidth, as they can no longer hotlink successfully to my server.
I did get a sadistic pleasure out of that "revised version." One kid had an image from my server tiled across his page as wallpaper. Wish I could be there when he saw what it looked like after my little change. Dude!
Posted by Jack Bog | November 7, 2005 5:55 AM
Touché...
... and you have to keep this blog around, until your daughters become teenagers. Besides suicide bombers, teenagers are the most irrational group of people alive.
Posted by justin | November 7, 2005 7:52 AM
MySpace. If there's a bigger cesspool of inanity available on the infobahn, I'm not sure where to find it. They gleefully give Joe Average Nitwit the easy ability to create a wholly unreadable "home page", complete with retina-searing color combinations and tiled backgrounds the likes of which went out of style... oh, about five minutes after they came INTO style, lo these many years ago. *shudder*
(Disclaimer: I have a MySpace account, but only because my closest buddies in the office have accounts there and insisted I join the collective, as it were. It's much the same with LiveJournal... I have an LJ so I can comment on LJs.)
I am, however, curious about my website traffic for the first time in ages. Hmm, where'd I put those log processing scripts...?
Posted by GreyDuck | November 7, 2005 9:41 AM
The problem is that people who do not pay for bandwidth have no idea how much of someone else's they are using when they hotlink an image that will be viewed on a website (like, say, Fark.com) that gets tens of thousands of hits. I make it a rule to only hotlink images from large corporate websites--CNN, Yahoo, etc.
Posted by Dave J. | November 7, 2005 10:14 AM
Just a technical point: Contrary to popular opinion, it’s nearly impossible to fake an IP address on a TCP connection. Had those been actual spambots, they would likely have used borrowed*, but real, IP addresses. That is, when a computer gets taken over by a virus, spammers can post from that computer without its owner's knowledge.
* technical term is “trojanned”
Posted by Nate | November 8, 2005 12:19 PM