Email address obfuscation

Everyone knows the story: an innocent email address is posted online and a big bad spambot finds it, relaying it to every spammer on the face of the earth… the email address becomes useless due to the 500 spam emails you get every day!

Note: This post contains old code. Read about the updated code at http://itgotmethinking.com/2009/02/01/obfuscating-email-addresses-revisited/

I always try to encode email addresses on sites I build in an effort to make the addresses more difficult to abuse. This has become a very common practice, thanks largely to free encoder tools such as the Hivelogic Enkoder by Dan Benjamin.

Some encoding methods are easy to beat

However, precisely because of their popularity, some spambots are being written to overcome simpler encoding methods, such as obscuring the address using character entities. A block of text encoded as character entities is easy to defeat with automated decoders, even by an amateur like me. The patterns are still there: the ‘mailto’ protocol, the @ sign, the .com/.org/.whatever, etc… they just look like this:


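&#121;&#111;&#117;&#114;&#110;&#97;&#109;&#101;
&#64;&#115;&#111;&#109;&#101;&#100;&#111;&#109;
&#97;&#105;&#110;&#46;&#99;&#111;&#109;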
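
To give a sense of how little work it takes to undo character entities, here is a minimal sketch of a decoder. The function name decodeEntities and the sample string are my own illustrations and are not taken from any real spambot; the sketch only handles numeric entities.

```javascript
// Hypothetical helper: turn numeric character entities back into plain text.
function decodeEntities(text) {
    return text
        // Hexadecimal entities, e.g. &#x40; becomes "@"
        .replace(/&#x([0-9a-f]+);/gi, function (match, hex) {
            return String.fromCodePoint(parseInt(hex, 16));
        })
        // Decimal entities, e.g. &#64; becomes "@"
        .replace(/&#(\d+);/g, function (match, dec) {
            return String.fromCodePoint(parseInt(dec, 10));
        });
}

// Illustrative input: the address a@b.com written as decimal entities
var encoded = '&#97;&#64;&#98;&#46;&#99;&#111;&#109;';
var decoded = decodeEntities(encoded); // "a@b.com"
```

Two regular expressions are all it takes, which is why I don't consider entity encoding much of an obstacle on its own.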

Javascript shouldn’t be required

More complex encoding methods, such as the aforementioned Hivelogic Enkoder, still seem to work well, but they also rely on Javascript. While the Javascript adds several layers of complexity for the bot to overcome, it also limits what your visitors can do on your site. What if your visitor has Javascript disabled? Many times, they’ll see nothing… no fake email address, no chunks of encoded text, nada. This is a big no-no if you’re a supporter of graceful degradation/progressive enhancement.

I decided to search around for encoding tricks that would work without Javascript. Guess what? I couldn’t find any, aside from the simple character entity encoding described above. It wasn’t much of a surprise, to be honest.

What about spelling it out at myaddress dot com?

I started looking at the common trick of writing “name at somewhere dot com.” I have a couple of problems with this approach: First of all, the address isn’t clickable. Sounds silly, but it’s a big usability component if you’re trying to encourage people to contact you. Secondly, I think it would be easy to write a bot that finds page content written in that format, especially in places where it’s a common practice, such as bulletin boards and forums. This seems like a temptation for a zealous hacker who wants to prove his or her worth. No thanks.
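
As a rough illustration of why I think the spelled-out format is easy to target, here is a sketch of a pattern that rewrites “name at somewhere dot com” as a normal address. The function name unspell is hypothetical, and the pattern only covers the plain “at” and “dot” wording, so treat it as a guess at what a bot author might try, not as something I’ve seen in the wild.

```javascript
// Hypothetical harvester step: rewrite "name at somewhere dot com" as a real address.
function unspell(text) {
    // user part, the word "at", then two or more domain labels joined by the word "dot"
    var pattern = /([\w.+-]+)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)/gi;
    return text.replace(pattern, function (match, user, domain) {
        // Turn every " dot " inside the domain back into a period
        return user + '@' + domain.replace(/\s+dot\s+/gi, '.');
    });
}

var rebuilt = unspell('name at somewhere dot com'); // "name@somewhere.com"
```

A pattern this loose will also catch ordinary sentences now and then, but a spammer is unlikely to care about a few bad addresses.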

A simple compromise

After kicking around these thoughts for a while, I decided to implement a compromise between two of the three methods I’ve covered so far: using an altered email address to fake out the bots, and using Javascript to make the bots work a little harder.

Granted, I can tell you in advance this isn’t a foolproof method, but it’s very easy to implement and doesn’t leave Javascript-deprived visitors out in the cold.

Part one: Add some useless text to your address.

Yes, that means we’ll be using the standard email link technique:


<a href="mailto:someone@somewhere.com">Email me!</a>

Important! Notice that I didn’t type the email address between the ‘a’ tags… that would make this system pointless! I suggest typing sensible alternate text, such as “Email me!” or the email recipient’s name.

By adding a little bit of unrelated text to the username portion of the address (“REALLYNICE”), we can prevent spambots from knowing what our true email address is:


<a href="mailto:someoneREALLYNICE@somewhere.com">Email me!</a>

So what does this do? It keeps the link clickable, and it prevents the spambot from knowing what our real email address is.

What doesn’t this do? It doesn’t get rid of the useless text (“REALLYNICE”), which means that while the link remains clickable, it’s also useless if the visitor doesn’t manually edit the address.

As the webpage developer, it’s my duty to make the link human-readable, and make the dummy text as obvious as possible. The following example is much easier to read by the average person:


<a href="mailto:someoneRemoveThisText@somewhere.com">Email me!</a>
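
To make the idea concrete, here is a sketch of what a very simple harvester might do with a link like the ones above. The function name harvestMailto is made up for this example, and I’m assuming the bot does nothing smarter than pull addresses out of mailto links.

```javascript
// Hypothetical harvester: collect every address found in a mailto: link.
function harvestMailto(html) {
    // Capture everything after "mailto:" up to the closing quote or a "?" (subject, etc.)
    var pattern = /href\s*=\s*["']mailto:([^"'?]+)/gi;
    var found = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        found.push(match[1]);
    }
    return found;
}

var page = '<a href="mailto:someoneRemoveThisText@somewhere.com">Email me!</a>';
var harvested = harvestMailto(page);
// harvested holds only the altered address: someoneRemoveThisText@somewhere.com
```

The bot walks away with an address that doesn’t exist, while a human can still read it and work out what to delete.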

Part two: use Javascript to make it easier on the end user

At this stage, the email address is somewhat usable, but still requires effort on the end user’s part. If they click the link, the address will appear in their email program with the full text “someoneRemoveThisText@somewhere.com”. Some people will see what they have to do and act accordingly, but others might not realize they need to take action.

Here’s where Javascript comes in as a progressive enhancement: we can use Javascript to remove the dummy text when the link is clicked!

A simple Javascript function examines the link’s href, finds the specified bit of dummy text, and removes it automatically:


function doMail(theLink, key){

    //Get the href attribute. This includes the anti-spam 'key'
    var before = theLink.getAttribute('href');

    //If the link has no href, or the anti-spam key is not found in it, exit without doing anything.
    //This also covers repeat clicks: after the first click the key has already been removed
    if(!before || before.indexOf(key) === -1) return;

    //Our new variable "addy" is a combination of the text that
    //comes BEFORE the key and the text that comes AFTER its first occurrence
    var position = before.indexOf(key);
    var addy = before.slice(0, position) + before.slice(position + key.length);

    //Substitute the original link with the new link ("addy")
    theLink.href = addy;

}

Sample usage:


<a href="mailto:someoneRemoveThisText@somewhere.com" 
   onclick="doMail(this, 'RemoveThisText')">email</a>

Because the dummy text is specified as a key when the function is called, you can use whatever dummy text you like. For instance, at my office all emails are formatted as: givenname.familyname@ouroffice.org. You could rewrite the address in the following way:


<a href="mailto:givenname.familyname.dummyText@ouroffice.org"
   onclick="doMail(this, '.dummyText')">email</a>

or


<a href="mailto:givennamenoSpam.familyname@ouroffice.org"
   onclick="doMail(this, 'noSpam')">email</a>

Of course, I recommend avoiding terms that are easy for spambots to recognize, such as “nospam”. Why not get creative with something like “SpamSucks”?


<a href="mailto:givennameSpamSucks.familyname@ouroffice.org"
   onclick="doMail(this, 'SpamSucks')">email</a>

You can even put the dummy text in the domain name, if you choose:


<a href="mailto:givenname.familyname@iDontWantSpamAtouroffice.org"
   onclick="doMail(this, 'iDontWantSpamAt')">email</a>
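
If you’d rather not repeat an onclick attribute on every link, the same idea could be wired up from a script. This is only a sketch under my own assumptions: the data-mail-key attribute and the names stripKey and attachMailLinks are hypothetical and not part of the function above.

```javascript
// Pure helper: remove the first occurrence of the dummy text from an href.
function stripKey(href, key) {
    var position = href.indexOf(key);
    // Nothing to remove if the key is empty or not present
    if (key === '' || position === -1) return href;
    return href.slice(0, position) + href.slice(position + key.length);
}

// Hypothetical wiring: every link carrying a data-mail-key attribute is cleaned on click.
function attachMailLinks(root) {
    var links = root.querySelectorAll('a[data-mail-key]');
    for (var i = 0; i < links.length; i++) {
        links[i].addEventListener('click', function () {
            // Inside the listener, "this" is the link that was clicked
            var key = this.getAttribute('data-mail-key');
            this.setAttribute('href', stripKey(this.getAttribute('href'), key));
        });
    }
}

// Only touch the DOM when running in a browser
if (typeof document !== 'undefined') {
    attachMailLinks(document);
}
```

With this approach a link would be written as <a href="mailto:givennameSpamSucks.familyname@ouroffice.org" data-mail-key="SpamSucks">email</a>, and the script would need to load after the links exist (at the end of the page, for instance). Visitors without Javascript still get the same readable, clickable fallback.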

Nothing is invincible!

This email obfuscation method may wind up being easy to crack by the more sophisticated bots, but I feel comfortable knowing that I’ve added a reasonable layer of complexity the spambot must overcome. That alone should keep many of the simpler bots from harvesting my addresses!
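
To be honest about the limits, here is a sketch of how a bot written specifically for this trick could undo it. The function name harvestWithKey is hypothetical, and the pattern assumes the markup looks exactly like my sample usage (a mailto href followed by an inline doMail call with the key in single quotes).

```javascript
// Hypothetical smarter harvester: pair each mailto: href with the key passed to doMail().
function harvestWithKey(html) {
    var pattern = /href\s*=\s*["']mailto:([^"'?]+)["'][^>]*?doMail\(\s*this\s*,\s*'([^']*)'/gi;
    var found = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        // Remove the first occurrence of the key, the same way the click handler would
        found.push(match[1].replace(match[2], ''));
    }
    return found;
}

var sample = '<a href="mailto:someoneRemoveThisText@somewhere.com" \n' +
             '   onclick="doMail(this, \'RemoveThisText\')">email</a>';
var cracked = harvestWithKey(sample);
// cracked holds the real address: someone@somewhere.com
```

A bot has to be written with this exact technique in mind for that to work, which is the extra effort I’m counting on.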

I’m also happy because the email address is still human readable (if the dummy text is sensibly written), and is still clickable with or without Javascript. Plus the Javascript is extremely lightweight and the entire method is easier to implement than some of the crazier encoding methods being used today.