Identifying Spam

Sometimes it's quite easy to determine if a message is spam, based on the obvious "spam-like" content of a given message or the name of the sender. Many spam filters work simply by searching for the most common words and names used by spammers. However, things are rarely that easy. 

 

A definition we've been moving towards is that spam is "unsolicited bulk, commercial, or objectionable email, often sent using stolen resources." Once we unpack this definition, it becomes clear that spam identification is difficult and requires a systematic approach, one that cannot be completely automated. For example, it takes research and analysis to determine whether headers have been forged, and at what step in the delivery process. Determining whether a message is truly "unsolicited" adds another level of complexity, because it requires qualitative judgment. Spam identification is a crucial part of our mission, and this section briefly summarizes how we evaluate a questionable message. In short, these are the general questions we ask when judging whether a message is spam.

 

Is there a prior relationship between the sender and the recipient?

From our perspective, determining whether a message is unsolicited is the key goal. To this end, it helps to verify whether a prior relationship exists between the sender and the recipient. If you receive bulk email from a person or company you have never heard of, it is unlikely that you requested it. An analysis of the message content and of the business being advertised can often confirm or rule out a prior relationship. However, companies and individuals who have had relationships with the victims can still send unsolicited messages, and those messages are still spam.

 

Is there a legitimate removal option?

Another important clue is a removal option. Removal, or "opt-out," options typically take the form of an email address or Web site link within the message. In theory, the recipient can follow the removal instructions to stop further mailings. Research on the particulars of the removal option is necessary to distinguish spam from legitimate bulk email that the recipient may have subscribed to in the past. Even a working removal option does not by itself mean the message is not spam. Some less scrupulous senders actually sell the addresses that respond to their removal instructions to spammers, who then send further unsolicited email to these "confirmed live" accounts. A removal option thus becomes yet another tool in the spammer's toolbox. Further investigation is required to determine whether the removal option actually works and does not lead to more unsolicited email.
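
To give a sense of the first step of that research, here is a minimal Python sketch that collects the removal options a message advertises: URIs in a `List-Unsubscribe` header (defined in RFC 2369) and `mailto:` or Web links on body lines that mention removal. The keyword list and the `removal_options` function are hypothetical; finding an option says nothing about whether it is honored or abused, which still has to be checked separately.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical keywords that suggest a removal ("opt-out") instruction.
REMOVAL_WORDS = ("remove", "unsubscribe", "opt-out", "opt out")

# mailto: addresses and http(s) links appearing in text.
LINK_RE = re.compile(r"(mailto:[^\s<>\"]+|https?://[^\s<>\"]+)", re.IGNORECASE)


def removal_options(msg: Message) -> list[str]:
    """List the removal addresses/links a message advertises (unverified)."""
    options = []
    # RFC 2369 List-Unsubscribe: URIs enclosed in angle brackets.
    header = msg.get("List-Unsubscribe", "")
    options.extend(re.findall(r"<([^>]+)>", header))
    # Body lines that mention removal; multipart bodies are skipped here.
    body = "" if msg.is_multipart() else str(msg.get_payload())
    for line in body.splitlines():
        if any(word in line.lower() for word in REMOVAL_WORDS):
            options.extend(LINK_RE.findall(line))
    return options


raw = (
    "From: deals@example.com\n"
    "List-Unsubscribe: <mailto:leave@example.com>\n"
    "\n"
    "Great offer!\n"
    "To be removed, visit http://example.com/remove?id=42\n"
)
print(removal_options(message_from_string(raw)))
# ['mailto:leave@example.com', 'http://example.com/remove?id=42']
```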

 

Was there an attempt to conceal header information?

The next bit of detective work involves examining the headers. The headers provide information such as the sender of the message, the recipient, the mailer used to send it, the names of the servers that processed the message along the way, and so on. Header information provides a good summary of the path that a piece of mail took. Unfortunately, spammers can forge header information easily; it is a trivial matter to insert arbitrary information to cover their tracks. The last thing many spammers want is to reveal their identities and whereabouts. Various tools allow you to trace the paths of messages, but detailed and systematic analysis of the headers is often necessary to sort out what doesn't make sense and spot inconsistencies or impossibilities.
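
Part of that analysis can be scripted. Each server that relays a message prepends its own `Received:` header, so reading them bottom to top reconstructs the path, and each hop's sending host should normally match the host that received the previous hop. The sketch below is a simplification: the regular expression only handles the `from HOST ... by HOST` form, the function names are hypothetical, and a mismatch is a clue to investigate rather than proof of forgery, since legitimate internal relays often use differing names.

```python
import re
from email import message_from_string

# Matches the "from HOST ... by HOST" part of a Received: header.
# Real headers vary widely; this pattern is a simplifying assumption.
HOP_RE = re.compile(r"\bfrom\s+([^\s;]+).*?\bby\s+([^\s;]+)", re.IGNORECASE | re.DOTALL)


def received_hops(raw: str) -> list[tuple[str, str]]:
    """Return (from_host, by_host) pairs, oldest hop first."""
    msg = message_from_string(raw)
    hops = []
    # Servers prepend Received: headers, so they appear newest first.
    for header in reversed(msg.get_all("Received", [])):
        match = HOP_RE.search(header)
        if match:
            hops.append((match.group(1).lower(), match.group(2).lower()))
    return hops


def inconsistent_hops(hops: list[tuple[str, str]]) -> list[int]:
    """Indexes of hops whose sending host is not the previous receiving host."""
    return [i for i in range(1, len(hops)) if hops[i][0] != hops[i - 1][1]]


raw = (
    "Received: from relay.example.net by mx.example.org; Mon, 1 Jan 2024 10:00:02 +0000\n"
    "Received: from dialup-7.isp.example by relay.example.net; Mon, 1 Jan 2024 10:00:01 +0000\n"
    "Received: from mail.bigbank.example by mail.bigbank.example; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Subject: Verify your account\n"
    "\n"
    "Please log in.\n"
)
print(inconsistent_hops(received_hops(raw)))
# [1]: hop 1 came from dialup-7.isp.example, not from the host hop 0 claims
```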

 

Was the message sent by bulk methods?

Another important clue is whether the message was sent in bulk. Were multiple copies of this message sent? If you received multiple copies of the same message, that often indicates it was delivered in bulk by an automated tool. In most cases, however, it is difficult to tell: you receive only one copy and cannot know whether one or one million such messages were sent. Our spam system's unique architecture enables staff to quickly verify whether certain messages were delivered in bulk. Bulk delivery alone does not identify a message as spam, since many legitimate, solicited messages are sent in bulk, but it is a worthwhile clue to consider.
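
One common way to recognize copies of the same message, shown here as a hedged sketch rather than the method our system uses, is to normalize each body (case, whitespace, digits) so that trivial per-copy variations disappear, hash the result, and count how often each fingerprint occurs. The normalization steps and the threshold of three copies are assumptions.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body: str) -> str:
    """Hash a body after removing trivial per-copy variations."""
    text = body.lower()
    text = re.sub(r"\d+", "#", text)          # order numbers, IDs, prices
    text = re.sub(r"\s+", " ", text).strip()  # spacing and line wrapping
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bulk_fingerprints(bodies: list[str], threshold: int = 3) -> set[str]:
    """Fingerprints that occur at least `threshold` times (assumed cutoff)."""
    counts = Counter(fingerprint(body) for body in bodies)
    return {fp for fp, n in counts.items() if n >= threshold}


bodies = [
    "Buy now! Order #1001",
    "buy   NOW! order #2002",
    "Buy now!\nOrder #3003",
    "Lunch tomorrow?",
]
flagged = bulk_fingerprints(bodies)
print(len(flagged))  # 1: the three "Buy now!" variants share a fingerprint
```

Spammers often vary copies more than this (random filler words, altered spellings), so a fuzzier similarity measure would be needed in practice; this sketch only catches near-identical copies.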

 

What is the content of the email?

Finally, the content of an email may provide clues to whether it was unsolicited. The first giveaway is usually the curious grammar and word choice that spammers seem to employ. Spammers often favor certain patterns, such as liberal use of ALL CAPS and multiple exclamation points (!!!). As we discussed earlier, spam as advertising attracts certain businesses more than others. Many of these common types of email, such as multilevel marketing messages, are often less than fully legal attempts to involve consumers in schemes that may be completely fraudulent. An analysis of the content involves checking whether the email message:

 

* Advertises something for sale

* Offers money-making opportunities

* Advertises pornographic Web sites or products

* Contains offensive material

* Contains or attaches suspicious software code

* Otherwise follows patterns typical of many other already-identified spam messages
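
A few of the surface patterns mentioned above, such as liberal ALL CAPS and runs of exclamation points, are easy to measure. The following sketch computes them as simple signals; the three-letter minimum word length, the 0.5 caps ratio, and the two-run cutoff in the hypothetical `looks_shouty` rule are illustrative assumptions, and none of these signals is conclusive on its own.

```python
import re


def content_signals(text: str) -> dict[str, float]:
    """Simple surface signals of 'spam-like' writing style."""
    words = re.findall(r"[A-Za-z]+", text)
    # Ignore very short words, where capitalization says little.
    long_words = [w for w in words if len(w) >= 3]
    caps = [w for w in long_words if w.isupper()]
    return {
        "caps_ratio": len(caps) / len(long_words) if long_words else 0.0,
        "exclamation_runs": len(re.findall(r"!{2,}", text)),
    }


def looks_shouty(text: str) -> bool:
    """Hypothetical rule: mostly ALL-CAPS words, or repeated '!!' runs."""
    s = content_signals(text)
    return s["caps_ratio"] > 0.5 or s["exclamation_runs"] >= 2


print(content_signals("FREE MONEY NOW!!! Act fast!!"))
# {'caps_ratio': 0.6, 'exclamation_runs': 2}
```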

 

Another thing to check is whether the apparent "legitimate business" referred to in the message uses a Web-based email account while advertising a business domain in the message body. The clues discussed in this section are just some of those the Operations Center staff use to determine whether a message is spam. No single clue alone causes the staff to treat a piece of email as spam. If we could feasibly ask each user whether they had requested the email, we would not need these indirect clues; instead, a certain amount of judgment is required to decide whether a specific message is unsolicited and whether it's spam. The figure below shows the basic parts of a spam message, illustrating the discussion in this section.
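
The Web-based-account check lends itself to partial automation: compare the sender's domain against a list of free webmail providers, and look for links to some other domain in the body. In this sketch the provider list is a small hypothetical sample and the function name is made up; a hit is one clue among many, not a verdict.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample of free Web-based email providers.
WEBMAIL_DOMAINS = {"hotmail.com", "yahoo.com", "gmail.com"}

# Host part of http(s) links in the body.
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def webmail_sender_with_business_links(raw: str) -> bool:
    """True if a webmail sender's body links to some other domain."""
    msg = message_from_string(raw)
    _, address = parseaddr(msg.get("From", ""))
    sender_domain = address.rpartition("@")[2].lower()
    if sender_domain not in WEBMAIL_DOMAINS:
        return False
    # Multipart bodies are skipped in this sketch.
    body = "" if msg.is_multipart() else str(msg.get_payload())
    hosts = {host.lower() for host in URL_HOST_RE.findall(body)}
    return any(host != sender_domain for host in hosts)


raw = (
    "From: Best Deals <bestdeals2024@hotmail.com>\n"
    "\n"
    "Visit http://www.cheap-meds.example/order today!\n"
)
print(webmail_sender_with_business_links(raw))  # True
```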

 

 

Traditional Anti-Spam Methods

The most logical and practical places to filter email for spam are the mail user agent (MUA) and the mail transfer agent (MTA), but the two are by no means equally effective. MUAs are the client applications that allow users to retrieve and send mail from their computers; common MUAs include Netscape Messenger, Microsoft Outlook, and Eudora. MTAs are like post offices: they are programs that reside on mail servers and are responsible for routing and sometimes delivering mail. MTA- and MUA-based filtering is usually based on the header information, the mailer type, or the IP address or domain name of the sender.

 

To filter at the MUA level, email users must explicitly create anti-spam filters on their own machines. This approach has several shortcomings. First, it places the burden of anti-spam work on the recipient, which is not only time-consuming but also largely ineffective. Email users typically lack the expertise to create effective filters, and they have no access to the most current spam. Filters based on past spam generally fail to block current spam, because spammers constantly change their messages to evade such filters; the filters users create quickly become outdated because they are fighting yesterday's spam. Any attempt to combat the flood of spam must itself leverage the power of the same networks that spammers exploit, and must operate in real time, around the clock. Finally, by the time spam reaches the user's machine, much of the damage is already done, in terms of storage costs on the mail server.

 

Filtering in the MTA, on the other hand, is often accomplished by adding rules to the configuration of the specific mail system running on the server. MTA-level filtering is more effective than MUA filtering because it covers a larger number of mail accounts from a central point of administration. The drawback is that users need to provide spam messages and other information to the email administrators so that current information can be incorporated into an organization-wide filtering list. Because the list is built in reaction to spamming activity, it requires continuous maintenance to stay current and effective. The filters also draw on data from only one ISP and lack input on the types of spam circulating in the rest of the Internet. Another problem is the tendency to produce "false positives," cases in which legitimate mail is incorrectly identified and filtered as spam. If the filter list is not made with care, or if domains are incorrectly blocked, valid email messages are discarded along with the spam.

 

Whether it's MUA filtering or MTA filtering, the same essential problems exist. For individual ISPs and email users, the available information about current spam attacks is limited, and the Internet represents a huge playing field. Traditional measures both block legitimate email and reduce productivity because service providers' staff and the user community need to continually devote time to fighting this problem. It's a never-ending battle because spammers' techniques and tools are always changing.

 

Additionally, once a particular domain is blocked, it is trivial for a spammer to obtain another one and resume spam attacks. Because persistent spammers can easily obtain new IP addresses and new domain names on a daily basis, reactive blocking and filtering is futile, like trying to hit a moving target.
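
A toy domain blocklist makes the point concrete: the list only knows about domains that have already been reported, so a freshly registered domain passes straight through. The domains and the `is_blocked` function are hypothetical.

```python
from email.utils import parseaddr

# Domains already reported and blocked (hypothetical entries).
BLOCKED_DOMAINS = {"spammy-offers.example"}


def is_blocked(from_header: str) -> bool:
    """Block mail whose sender domain is already on the list."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower() in BLOCKED_DOMAINS


print(is_blocked("deals@spammy-offers.example"))   # True
# The same spammer registers a new domain the next day and gets through.
print(is_blocked("deals@spammy-offers2.example"))  # False
```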

 

MC's Anti-Spam Solution

Our anti-spam solution is a server-side, Internet-wide solution that actively seeks out, identifies, analyzes, and ultimately defuses spam attacks before they can overwhelm networks and irritate email users. It is also part of a comprehensive solution that blocks viruses and other threats that arrive via email.

 

This solution uses filters based on human and/or machine analysis to determine whether email messages should be routed normally, sidelined, or modified. It combines service and software components with automated and human-directed functions to forge the best defense against spam. The main service components are the "Probe Network" and the Operations Center (OC).
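
The three outcomes above (route normally, sideline, or modify) map naturally onto rules that each carry an action. The following is a minimal sketch, not our actual rule format: the `Rule` fields, the sample patterns, and the choice of tagging the subject line as the "modify" action are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"    # route normally
    SIDELINE = "sideline"  # hold as grey mail (suspected spam)
    MODIFY = "modify"      # deliver, but mark the message


@dataclass
class Rule:
    name: str
    field: str             # "subject" or "body" in this sketch
    pattern: re.Pattern
    action: Action


def route(subject: str, body: str, rules: list[Rule]) -> tuple[Action, str]:
    """Apply the first matching rule; return its action and the final subject."""
    fields = {"subject": subject, "body": body}
    for rule in rules:
        if rule.pattern.search(fields[rule.field]):
            if rule.action is Action.MODIFY:
                # Hypothetical "modify" behavior: tag the subject line.
                return Action.MODIFY, "[SUSPECT] " + subject
            return rule.action, subject
    return Action.DELIVER, subject


# Hard-coded sample rules; in the service described here, rules come from the OC.
RULES = [
    Rule("mlm-offer", "body", re.compile(r"earn \$\d+ a day", re.IGNORECASE), Action.SIDELINE),
    Rule("shouty-subject", "subject", re.compile(r"!{3,}"), Action.MODIFY),
]

print(route("Hello", "Earn $500 a day from home", RULES))
# (<Action.SIDELINE: 'sideline'>, 'Hello')
```

Taking the first matching rule keeps the sketch simple; a production system might instead combine several rule hits into a score before deciding.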

 

Together these components add up to a dynamic and effective solution to the spam problem, one that takes the guesswork out of spam identification.

 

Probe Network

The Probe Network is a large collection of email accounts with a statistical reach of over 100 million email addresses. The email accounts in this pool are created worldwide and include addresses hosted by some of the largest ISPs in the world. The email accounts that are used for detection are called probe accounts. Probe accounts are the first step in the real-time detection and analysis of spam. They attract spam. As mentioned earlier, spammers are quite resourceful in their harvesting of email addresses.

 

Many of the probe accounts, therefore, are strategically seeded to attract and catch large quantities of spam. Knowing where spammers go to collect email addresses helps to strengthen the Probe Network. As a result, spammers never know if they are sending mail to an unsuspecting recipient or to a probe account.

 

The structure of the Probe Network also provides powerful evidence that helps to judge if a message is spam. This virtual "net" of numerous accounts spread all over the Internet makes it easy for us to quickly verify that a given message was sent using bulk methods. When the same questionable message is caught by different probes, alarms go off and we can take action.
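
The alarm can be thought of as counting how many distinct probe accounts have caught the same message; repeated copies at a single probe add no evidence of delivery across the Internet. In this sketch, messages are assumed to arrive already reduced to a fingerprint string (for example, a hash of the normalized body), and the `ProbeMonitor` class and its threshold of three probes are hypothetical.

```python
from collections import defaultdict


class ProbeMonitor:
    """Raise an alarm when one message shows up at several distinct probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.seen_by: dict[str, set[str]] = defaultdict(set)
        self.alarmed: set[str] = set()

    def record(self, probe_id: str, fingerprint: str) -> bool:
        """Record a catch; return True only when the threshold is first reached."""
        probes = self.seen_by[fingerprint]
        probes.add(probe_id)  # duplicates at the same probe are not counted
        if len(probes) >= self.threshold and fingerprint not in self.alarmed:
            self.alarmed.add(fingerprint)
            return True
        return False


monitor = ProbeMonitor()
events = [("probe-a", "fp1"), ("probe-a", "fp1"), ("probe-b", "fp1"),
          ("probe-c", "fp1"), ("probe-d", "fp1")]
print([monitor.record(probe, fp) for probe, fp in events])
# [False, False, False, True, False]
```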

 

The Operations Center

When a probe account detects a possible spam attack on the Internet, the probe immediately routes the message to the Operations Center (OC), a spam-analysis center staffed round-the-clock, 365 days a year. The OC consists of a dedicated team of email experts whose mission is to provide swift, accurate responses to spam threats and to proactively research and develop technologies that eliminate future threats. Their duties include:

 

* Analyzing incoming email from the Probe Network

* Developing, validating, and transmitting anti-spam rules to our mail servers

* Managing and seeding the accounts in the Probe Network

* Researching spam attacks

* Collecting statistics and information to evaluate the effectiveness of filtering servers

 

The experts at the OC are another example of what sets our filtering service apart from other filtering systems. As we saw earlier, certain qualitative skills are essential to accurately distinguish spam from legitimate email. Most email users won't tolerate losing legitimate mail to the fight against spam. Our extremely low false positive rate is a direct result of the incorporation of the OC into the anti-spam process. The OC serves as an intelligent buffer between the spammer and the unwilling recipient of spam.

 

This added intelligence, however, doesn't come at the expense of privacy. The OC only has access to mail addressed to the probe accounts; its specialists have no access to email users' personal email. In the end, the email user has the final say: our customers can access a list of all blocked messages via our online control panel. We refer to these blocked messages, which are suspected spam, as grey mail.

 

Email scanning

Using updated anti-spam rules transmitted from the OC, the servers check the headers, contents, and other information in each message and identify grey mail (suspected spam). Grey mail is routed to a special storage area.

 

Mail Flow with Anti-Spam Filtering

The diagram below shows how the mail flow process works:

 

Summary

This unique anti-spam solution can be summarized in three steps:

 

1. Find Spam

First, spam is actively sought using the Probe Network, an extensive array of dedicated email accounts with a statistical reach of over 100 million email addresses.

 

2. Identify Spam

When the Probe Network finds possible spam, it forwards that email to the OC. There, spam experts verify that the email is spam and write rules to block it. They send those rules to the scanning servers.

 

3. Stop Spam

Using updated rules from the OC, scanning servers identify and filter spam messages from incoming email. Grey mail is diverted to a special storage area, where users can review it via our online control panel.