SPAM sucks. People who use SPAM suck. Anyone who responds to a SPAM email and purchases something because of it sucks. It is hard to believe that anyone would ever READ a spam email, much less send a spam jackass any money, but they do, and the spam keeps coming as a result. (For the record, the official technical term for anyone involved in sending mass unsolicited email is "spam jackass.")
There are many ways the spam jackasses get the email addresses that they deluge with digital crapola. One of the most underhanded is mining them from websites (one of the underhanded ways, that is; every way short of asking and getting permission is unscrupulous, and the jackasses don't ever do that). They do this with robots that search for email addresses in mailto links or in any textual content. Did I mention "jackasses"?
They then use these email addresses for spam and/or sell the collected addresses to other fools willing to spam, er, other jackasses.
If you operate a webserver anywhere, you should protect against this (and if you are a web surfer anywhere, you should be aware of this and NEVER put your email address on anything that will likely end up on a web page).
You can protect against this by using the jackasses' built-in jackass qualities against them. For example, you can use the "robots.txt" file to specifically tell robots to stay away from a particular area of a site. Then what do the bad robots do? If you say stay out, they consider that an invitation and come right in. You can then either simply record which clients have done that (user agent, IP, etc.) and block them, or you can fill up that directory, or a file in that directory, with bogus and offensive email addresses (this is the method I prefer). Note that there are even some server-side programs that trap the spider in a loop and generate thousands of bogus email addresses (wpoison, for one).
You can also check the user agent string of every request at your webserver and block certain robots based on this value (although they, being unscrupulous jackasses, will often set the agent string to match IE, Netscape, etc.).
All of this detection and manipulation can be done with Apache and a few simple modules such as mod_rewrite. There are known lists of user agents that are email "siphons," and there is the above method of checking for yourself with a "trap" to determine which robots are looking where you tell them not to (regardless of user agent). Combining these and setting up mod_rewrite rules is a must. It's not that hard and it works.
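Here is a rough sketch of the "record who walked into the trap" option, done with stock Apache directives (mod_setenvif and mod_log_config). The /trap/ path, the trapfmt nickname, and the logs/trap.log file name are all made up for illustration, so adjust them to your own site.

```apache
# Assumed robots.txt at the document root (the /trap/ path is hypothetical):
#   User-agent: *
#   Disallow: /trap/

# Flag any request that enters the forbidden directory anyway.
SetEnvIf Request_URI "^/trap/" trap_hit

# Log only the flagged requests: client IP, time, request line, user agent.
LogFormat "%h %t \"%r\" \"%{User-Agent}i\"" trapfmt
CustomLog logs/trap.log trapfmt env=trap_hit
```

Anything that shows up in that log ignored a direct "stay out," so its IP and user agent are fair candidates for a block rule.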
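Example Apache config:
RewriteEngine On
# The [OR] flag belongs on the same line as the condition it joins.
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
# Send matching agents to a dead-end page.
RewriteRule ^.*$ /spammer.html [L]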
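To combine the known-agent list with whatever the trap turns up, one possible approach is to add the caught IP addresses as extra conditions. This is only a sketch: 192.0.2.10 is a documentation placeholder address, not a real offender, and the rule returns a plain 403 Forbidden instead of a custom page.

```apache
RewriteEngine On
# Known harvester user agents.
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# An address caught by the trap (placeholder value, replace with your own).
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```

Blocking by IP catches the robots that lie about their user agent, at the cost of having to maintain the list by hand.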
Note that setting up a rule to block all agents that DO NOT identify themselves is also easy, but risky. Any valid proxy server, browser, or other agent that does not identify itself will be blocked (although calling these "valid" is pushing it; in my book: no ID, no site).
Another way to block agents is to use Apache environment variables. This is rather clever and is elaborated on in an evolt article. The article is very good and also covers setting the "trap" and then stopping the bad robots.
Note that while you can't stop them all, especially when they are jackasses and don't play by the rules (as when they report their agent as IE), you can stop most. Use the trap and the stop and you will put an end to a lot of spam.
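For the record, a minimal sketch of that "no ID, no site" rule might look like the following. It assumes a missing User-Agent header shows up as an empty string in the condition, and it carries exactly the risk described above.

```apache
RewriteEngine On
# Matches when the User-Agent header is missing or blank.
RewriteCond %{HTTP_USER_AGENT} ^$
# Refuse the request with 403 Forbidden.
RewriteRule .* - [F]
```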
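As a rough idea of what the environment variable approach can look like (a sketch of the general technique, not necessarily what the article does), mod_setenvif can tag a request and the access control directives can then refuse it. The bad_bot variable name and the /var/www/html path are assumptions for illustration.

```apache
# Tag requests from known harvesters (case-insensitive match).
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot

# Apache 2.2-style access control; on 2.4 the equivalent is a
# <RequireAll> block with "Require all granted" and "Require not env bad_bot".
<Directory "/var/www/html">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>
```

The nice part of this design is that the list of bad agents lives in one place, and any number of directories can deny on the same variable.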
For more info on SPAM and SPAM USER AGENTS, see the Web Robots Database.