Cloudmark's Advanced Message Fingerprinting™ algorithms are designed to target sophisticated spamming and virus proliferation techniques. By fingerprinting only relevant threat attributes in messages, Cloudmark is able to identify spam, phishing and viruses that have undergone polymorphic changes in text, URL, image, sender or other attributes. As each message comes in, these algorithms generate a set of fingerprints that represent unique aspects of the message. Message fingerprints are matched against a database of known "bad" fingerprints, which is updated every 60 seconds. If there is a match, the message will be blocked without the need for reporting.
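The matching step can be illustrated with a short Python sketch. Cloudmark's actual fingerprinting algorithms are proprietary, so this example makes a simplifying assumption: the only "relevant" attribute is the set of host names linked from the message body. The names `fingerprints` and `is_known_bad` are hypothetical.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Simplifying assumption: the only "relevant" attribute is the set of linked hosts.
# Cloudmark's real fingerprinting algorithms are proprietary and are not shown here.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def fingerprints(body: str) -> set[str]:
    """Return one SHA-256 fingerprint per distinct host linked from the body."""
    hosts = {urlsplit(url).hostname for url in URL_RE.findall(body)}
    hosts.discard(None)  # skip URLs with no parsable host
    return {hashlib.sha256(h.encode("utf-8")).hexdigest() for h in hosts}


def is_known_bad(body: str, bad_db: set[str]) -> bool:
    """Block the message if any of its fingerprints is in the known-bad database."""
    return not fingerprints(body).isdisjoint(bad_db)


bad_db = fingerprints("Buy now at http://cheap-pills.example/offer?id=1")
print(is_known_bad("Gr3at d3als!! http://CHEAP-PILLS.example/x", bad_db))  # True
print(is_known_bad("Lunch at noon?", bad_db))  # False
```

Because the wording around the link is ignored, the first two messages produce the same fingerprint even though their text differs, which is the property that makes fingerprinting resistant to polymorphic changes.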
Completely new outbreaks are detected rapidly through a global network of millions of reporters, including system administrators, end users, honeypots and other sources. All feedback is corroborated and analyzed by Cloudmark's Trust Evaluation System (TES), which tracks the reputation of each reporting source. Reporters earn trust over time by consistently submitting correct abuse feedback. This protects the integrity of the reports and keeps classification extremely accurate. Because feedback is corroborated continuously, any misclassification is corrected in near real time. No other system offers this kind of continuous feedback loop.
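The idea of trust earned over time can be sketched as a trust-weighted vote. This is not TES itself, whose rules are not described here; the starting trust, learning rate, threshold and the `TrustSystem` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical parameters; TES's actual scoring rules are not described in this document.
INITIAL_TRUST = 0.1     # new reporters start with little influence
LEARNING_RATE = 0.2     # how fast trust moves after each corroborated verdict
REPORT_THRESHOLD = 0.5  # trust-weighted score needed to classify a message as spam


@dataclass
class TrustSystem:
    trust: dict[str, float] = field(default_factory=dict)

    def score(self, reports: dict[str, bool]) -> float:
        """Trust-weighted vote: +trust for a 'spam' report, -trust for 'not spam'."""
        total = 0.0
        for reporter, says_spam in reports.items():
            weight = self.trust.setdefault(reporter, INITIAL_TRUST)
            total += weight if says_spam else -weight
        return total

    def classify(self, reports: dict[str, bool]) -> bool:
        return self.score(reports) >= REPORT_THRESHOLD

    def update(self, reports: dict[str, bool], verdict: bool) -> None:
        """Move trust toward 1 for reporters who matched the verdict, toward 0 otherwise."""
        for reporter, says_spam in reports.items():
            t = self.trust.setdefault(reporter, INITIAL_TRUST)
            target = 1.0 if says_spam == verdict else 0.0
            self.trust[reporter] = t + LEARNING_RATE * (target - t)


ts = TrustSystem()
reports = {"alice": True, "bob": True, "mallory": False}
print(ts.classify(reports))  # False: three new reporters carry little weight
for _ in range(10):
    ts.update({"alice": True, "mallory": False}, verdict=True)
print(ts.classify(reports))  # True: alice has earned trust, mallory has lost it
```

A reporter who keeps submitting feedback that disagrees with the corroborated verdict steadily loses influence, so a single bad actor cannot flip classifications on its own.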
Zombies, Open Relays, and Known Spam Sources
A "zombie" is a computer that has been taken over by a spammer and is used to send out bulk mailings without the knowledge of the computer's owner. Typically this happens because the owner opened a virus-infected file, which gave the spammer a back door into the system. Zombies are one of the most common sources of spam. Another prevalent source is "open relays": insecure mail servers that spammers can use freely. Spammers use automated tools to scour the Internet for vulnerable mail servers and then hijack those servers to increase the amount of spam they can send.
To combat this problem, several third-party organizations maintain databases ("blacklists") of the IP addresses of these compromised machines. Other databases list known professional spammers. We have arrangements with approximately 15 of these organizations that allow our system to download full copies of their blacklists hourly and incorporate them into the spam filtering system.
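Because the blacklists are downloaded in full, each sender IP can be checked locally without a network query per message. The following Python sketch assumes a hypothetical list format of one IP address or CIDR range per line, with `#` starting a comment; each real provider publishes its own format, and a production system would use a faster structure (such as a radix tree) rather than a linear scan.

```python
import ipaddress
from typing import Iterable


def load_blacklist(lines: Iterable[str]) -> list:
    """Parse a hypothetical one-entry-per-line blacklist into network objects."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            # A bare address becomes a single-host network (/32 or /128).
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blacklisted(ip: str, networks: list) -> bool:
    """True if the sender IP falls inside any listed network (v4 and v6 never match each other)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


blacklist = load_blacklist([
    "192.0.2.0/24   # hypothetical zombie range",
    "198.51.100.7",
    "2001:db8::/32",
])
print(is_blacklisted("192.0.2.55", blacklist))   # True
print(is_blacklisted("203.0.113.1", blacklist))  # False
```

Lists from several providers can be merged by concatenating the results of `load_blacklist` for each downloaded file.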
DNS and RFC Violations
Spammers tend to be careless in how they send email. It is therefore important to scrutinize each inbound email to see whether it follows the rules defined by current Internet standards. Below are just a few of the tests that examine the sending mail server and the message:
- Did the mail server falsely identify itself in the "HELO/EHLO" data?
- Does the mail server have a missing or invalid reverse DNS record?
- Is the domain missing "A" and "MX" DNS records or using illegitimate values?
- Was there an SPF violation? (Was the email sent from a mail server that is not authorized to send mail using the sender's domain name?)
- Are message headers improperly formatted or missing required data?
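Two of these checks can be sketched with Python's standard `email` module. The sketch assumes the reverse DNS name has already been looked up (DNS queries are left out so the example runs offline), and the exact-match HELO rule is a deliberately simplified, hypothetical version of the test; many legitimate servers would need a looser comparison.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional

# RFC 5322 (section 3.6) requires only the Date and From header fields.
REQUIRED_HEADERS = ("Date", "From")


def header_violations(raw_message: str) -> list[str]:
    """Report missing required headers and an unparsable Date header."""
    msg = message_from_string(raw_message)
    problems = [f"missing {name}" for name in REQUIRED_HEADERS if msg[name] is None]
    if msg["Date"] is not None:
        try:
            parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):  # TypeError on Python < 3.10, ValueError after
            problems.append("malformed Date")
    return problems


def helo_violations(helo: str, ptr_name: Optional[str]) -> list[str]:
    """Hypothetical, simplified HELO test against an already-resolved reverse DNS name."""
    if ptr_name is None:
        return ["no reverse DNS"]
    if helo.rstrip(".").lower() != ptr_name.rstrip(".").lower():
        return ["HELO does not match reverse DNS"]
    return []


print(header_violations("Subject: hi\n\nbody\n"))  # ['missing Date', 'missing From']
print(helo_violations("localhost", "mail.example.com"))  # ['HELO does not match reverse DNS']
```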
There are more than 1,000 of these tests. Blocking on any single test would block a large amount of legitimate email; however, these tests are extremely effective when used in a filtering system that also combines message fingerprinting and real-time threat reporting.
Spammers are very aware of the filtering techniques used by top-tier business email service providers such as KD Interactive Hosted Email. This has led them to develop creative tactics and advanced software in an attempt to beat anti-spam systems. For instance, many spammers encode their text and HTML email (for example, with Base64) to hide it from signature-based filters. It is also very common for spam to mix invisible HTML code into the visible content, or to use subtle variations in wording and punctuation and deliberately misspelled words.
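Normalizing a message before analysis undoes the encoding and hidden-HTML tricks. The sketch below uses Python's `email` package to reverse the transfer encoding and then strips HTML comments and `display:none` elements with regular expressions; the patterns and the `visible_text` function are simplified assumptions, and a real filter would use a full HTML parser.

```python
import base64
import re
from email import message_from_string, policy

# Simplified patterns for hidden content; nested or unusual markup will slip past them.
HIDDEN_RE = re.compile(
    r"<!--.*?-->"  # HTML comments
    r"|<(\w+)[^>]*style\s*=\s*[\"'][^\"']*display\s*:\s*none[^>]*>.*?</\1\s*>",  # display:none elements
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(raw_message: str) -> str:
    """Decode every text part and return only the text a reader would see."""
    msg = message_from_string(raw_message, policy=policy.default)
    parts = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html"):
            text = part.get_content()  # undoes Base64 / quoted-printable and the charset
            if ctype == "text/html":
                text = TAG_RE.sub(" ", HIDDEN_RE.sub(" ", text))
            parts.append(text)
    return " ".join(" ".join(parts).split())  # collapse whitespace


html = '<p>Cheap <span style="display:none">xq7</span>pills<!-- hash buster --> here</p>'
sample = (
    "From: a@example.com\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n\n"
    + base64.b64encode(html.encode("utf-8")).decode("ascii") + "\n"
)
print(visible_text(sample))  # Cheap pills here
```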
The Cloudmark system is completely text-, language- and format-agnostic. Cloudmark's Advanced Message Fingerprinting algorithms are resistant to polymorphic changes in text, URL, image, sender or other attributes because only the relevant ("spammy") parts of the message are fingerprinted and tracked.
Combining the Tests
After all of the tests have run, their results are combined into a final confidence level for each email. The confidence level is compared against a threshold to determine whether the email should be identified as spam. If the email's confidence level is lower than the threshold, the email is deemed spam-free and is delivered normally. If it is higher than the threshold, the email is identified as spam.
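A weighted sum is one simple way to combine test results into a confidence level. The test names, weights and threshold below are hypothetical, and the source does not say how a score exactly equal to the threshold is handled, so the sketch treats it as not spam.

```python
# Hypothetical weights and threshold; the real tests and values are not published.
TEST_WEIGHTS = {
    "fingerprint_match": 10.0,
    "ip_blacklisted": 4.0,
    "spf_fail": 2.0,
    "no_reverse_dns": 1.0,
    "helo_mismatch": 0.5,
    "missing_date": 0.5,
}
SPAM_THRESHOLD = 5.0


def confidence(failed_tests: set[str]) -> float:
    """Sum the weights of every failed test; unknown test names contribute nothing."""
    return sum(TEST_WEIGHTS.get(name, 0.0) for name in failed_tests)


def is_spam(failed_tests: set[str], threshold: float = SPAM_THRESHOLD) -> bool:
    # Assumption: a score exactly equal to the threshold is delivered normally.
    return confidence(failed_tests) > threshold


print(is_spam({"helo_mismatch", "no_reverse_dns"}))  # False (score 1.5)
print(is_spam({"ip_blacklisted", "spf_fail"}))       # True (score 6.0)
```

With these weights, no single low-weight rule violation such as a HELO mismatch can push a message over the threshold on its own, which matches the point that individual tests are only reliable in combination.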