AntiSpam AntiVirus Gateway Specifications & Details
The Clever AntiSpam AntiVirus Gateway
stands as a first line of defense against virus-infected
messages that may be harmful to your systems, and
performs a series of tests to determine if an email
message should be accepted.
We rely on ETrust AntiVirus from
Computer Associates, with signature files updated hourly
as they become available.
The AntiSpam system adheres strictly
to the SMTP standard as stated in
RFC 2821 (a Request for Comments, the
published SMTP specification), and accepts or rejects messages based on
these guidelines, along with other filtering techniques
that have been developed and refined over months of
research and several hundred thousand processed
messages.
Click the flow chart to the right to
see a quick overview of how it works.
We continue to monitor these
mechanisms and regularly update them to adapt to new
circumstances as the junk email environment evolves.
Please take the time to read the
following document, and if you still have questions,
check the FAQ page.
Tips for staying off
the Spam Lists
How It Works
When an outside system connects to the
Gateway and attempts to transmit a message,
a series of tests is performed throughout
the delivery process, beginning with Phase
One:
-
First, the IP address
from which the sending system is calling
is checked against published DNS records
by a Reverse Lookup. A legitimate
network must have a Reverse DNS Pointer
(PTR) record for any IP address that
will be transmitting mail to the outside
world. If the IP address, for example,
is 1.2.3.4 then we query
4.3.2.1.in-addr.arpa for the PTR record
that should be returned. The PTR record
should contain a fully qualified domain
name (FQDN) of the mail system that is
talking to us right now, such as "mail.example.tld".
We then perform a Forward DNS Lookup for
that value to see if an IP address is
returned from the published records for
that domain, which should resolve to the
same IP address we started with. If the
PTR record is missing, or doesn't
contain a proper FQDN, or if the attempt
to resolve that FQDN back to the
original IP address fails, the session
is terminated. This is usually the case
for "zombie" computers that have been
infected with a Trojan virus, or "spamware"
that's running on an unauthorized
workstation or server that has internet
access but is not normally supposed to
be sending mail.
-
Next, the sending
system should identify itself with a
FQDN in a HELO/EHLO introduction
command. This must have a DNS record
someplace that resolves to an IP
address. If it does not, the session is
terminated. A good example might be
"mail.example.tld" or even just
"example.tld". If example.tld is a real
domain, there will be DNS records for it
held by the hosting service, and an
associated IP address will be
available. If, however, the connecting
system identifies itself incorrectly
with something like "MyHomePC", we know
that it's not part of a domain and
should not be generating mail directly
to destination servers. Instead, it
should be talking to a local Mail
Transport Agent (MTA), such as the mail
service at their Internet Service
Provider (ISP) and logging in and
requesting that the MTA transmit a
message to us. The local MTA will be
able to authenticate the sender, and
will have the correct DNS records, and
the mail will be accepted, at least
through this point. A filtering
technique is also applied here, because
some ISPs, especially DSL and cable
Internet providers, publish DNS records in
the public space for all of their
home-user customers, even though those
customers are not generally authorized
to send mail directly. We examine
the text of the HELO/EHLO identifier for
certain clues that indicate one of
these machines, and if
a match is found the session is
terminated, because these machines should
not be sending mail directly either but
should be using the MTA as above. Note
that some DSL-connected networks do in
fact host their own DNS records and
act as their own MTA. These cases
are easily identified by examining their
DNS records and are passed through; if
they are configured correctly, they
virtually never present a HELO/EHLO
identifier that fails this test.
-
In the third step,
the sending system will tell us who the
message is from in a MAIL FROM
structure. This value must be a
properly formatted email address per the
RFC. If it is not, the address is
rejected and the session terminated. If
it is properly formatted, the domain
part after the "@" symbol is further
examined by retrieving DNS records for
that domain. We even make a quick call
out to the mail system of the sender
domain and ask if the address is
legitimate! If the domain doesn't exist,
or if the information returned cannot be
reconciled with what we already know,
the address will be rejected and the
session will terminate. For example,
if the IP address above belongs
to the Hotmail network but the email
address claims to be "example@yahoo.com",
we can reasonably conclude that this
transmission is a forgery, or that the
legitimate mail client that originated
the message is not correctly configured
and may have its settings mixed up
between two MTAs that it can send mail
through. In the latter case, correcting
the settings in the mail client will
cure this failure. Another cause for
rejection is if the mail service for the
sender domain tells us that the address
is totally bogus. The primary exception
to this paragraph is the Null Sender,
represented as "<>" instead of a
complete email address. Per the RFC,
Null Sender is used to return delivery
notices, bounce messages, and postmaster
advisories from other systems, typically
in response to mail that was sent out
from our mail server. Compliance with the
standards dictates that we must accept
such messages, and some
spammers exploit this requirement by
sending junk mail using the Null
Sender. We thwart some of this later in
the process, if such messages actually
get that far by passing the tests in the
preceding sections.
-
Now we will get one
or more recipients as complete email
addresses telling us who we'll need to
deliver the message to. Each of these
is checked to ensure that it is a valid
account on our network. No address
will be accepted here that isn't hosted
by us. If we were to accept and
subsequently attempt to blindly deliver
messages to domains outside of our
network, this would be considered Open
Relaying, and is not permitted.
Remember that we're only talking about
incoming messages from the outside world
here, not about any outbound messages
that our clients may send through us
(those are handled through the mail
server and require authentication, and
are not processed through the Gateway).
Note that these addresses must match
exactly in spelling and punctuation
(although not case-sensitive) to a valid
mailbox. In the past, it was common
practice to have a "garbage" alias
that would catch mail that was intended
for a valid domain but didn't have the
local part before the "@" symbol
correct. This was intended to accept
mail that simply had a typographical
error in the address, and in some cases
it was used as a general alias to
facilitate sorting of incoming mail
based on the address it was sent to
without having multiple mail accounts.
Unfortunately this has led to a
situation where spammers could put
anything in the local part of the
address and get delivery, flooding
mailboxes with all manner of junk, so
this capability is no longer available
(we offer unlimited aliases to be set up
to accept mail, so this shouldn't be a
significant issue). Finally, it is here
that we can often defeat spammers
exploiting the Null Sender mentioned in
the section above, because real
postmaster responses will never have
more than one recipient since they are a
direct response to a single message from
a single sender. Spammers attempting to
abuse the Null Sender also often try to
pass multiple recipients through as
well, because it's more efficient for
them to only transmit the message
content once to several targets. We
catch these attempts and treat such
sessions as hostile and will terminate
them without going any further.
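The Phase One checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Gateway's actual code: the function names are hypothetical, the lookups go through the system resolver rather than a dedicated DNS client, and the FQDN test is deliberately rough. The lookup functions are passed in as parameters so the logic can be exercised without touching the network.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Return the name queried for the PTR record of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


def system_ptr_lookup(ip):
    """Reverse lookup via the system resolver; raises OSError on failure."""
    return socket.gethostbyaddr(ip)[0]


def system_a_lookup(host):
    """Forward lookup via the system resolver; raises OSError on failure."""
    return socket.gethostbyname_ex(host)[2]


def is_fqdn(name):
    """Rough check (an assumption): two or more non-empty dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def forward_confirmed(ip, ptr_lookup=system_ptr_lookup, a_lookup=system_a_lookup):
    """Step one: PTR must name an FQDN that resolves back to the same IP."""
    try:
        host = ptr_lookup(ip)
        return is_fqdn(host) and ip in a_lookup(host)
    except OSError:
        return False


def helo_resolves(helo_name, a_lookup=system_a_lookup):
    """Step two: the HELO/EHLO name must look like an FQDN and resolve."""
    if not is_fqdn(helo_name):
        return False
    try:
        return len(a_lookup(helo_name)) > 0
    except OSError:
        return False


def null_sender_abuse(mail_from, recipients):
    """Step four: a Null Sender with more than one recipient is treated as hostile."""
    return mail_from == "<>" and len(recipients) > 1


print(reverse_name("1.2.3.4"))  # 4.3.2.1.in-addr.arpa
```

A session would be dropped as soon as `forward_confirmed` or `helo_resolves` returns False, or `null_sender_abuse` returns True. The pattern matching on HELO/EHLO text for home-user machines is left out because the clues it looks for are not described here.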
At this point, we've
received what's generally referred to as the
SMTP Envelope: the handling instructions for
the message we're about to get. Appearance
of any of the above sender or
recipient email addresses in the actual
message that you see in your regular mail
client (Outlook/Eudora/Webmail) is purely
coincidental. We have not yet received
any of the message data, which includes the
Headers and Body sections of an email
message that you're probably used to
seeing. The Headers may contain completely
different information, depending on the
source and intent of the message, and on how
many relay points on the internet have
previously received and forwarded the
message along. If the process has passed
all of the above tests and made it this far,
we will allow the sending system to transmit
the actual message, which it does with a
DATA command. At the conclusion of this,
the sending system tells us that we now have
the entire message, and we have one last
shot at rejecting it if something inside it
fails the next round of inspections:
-
In Phase Two, we
first examine the headers for missing
key elements that would indicate that
the message is spam or poorly
constructed by some "spamware" or Trojan
virus or similarly unwelcome source. We
also look closely at the "Received"
lines, specifically the IP addresses
that theoretically have carried this
message on its journey. It is important
to note here that absolutely every
part of the headers found here can
easily be fabricated from nothing.
There is no reliability in any of
it; however, mistakes are often
made in such fabrications that make
them easy to detect. After processing
several thousand messages, patterns
emerge and tell-tale signs become
apparent. These are what we look for
and regularly adapt as spammer behavior
tries to adjust to the latest in
defensive measures. For example, the
Sobig Virus generated emails containing
copies and mutations of itself and in
doing so would place a specific discrete
marker in the value of the Date Header,
probably so if it came back around to
the author he wouldn't infect himself.
Even while antivirus signature files
were running behind the mutations,
filters such as ours could detect the
marker and either discard the messages
or at least quarantine them for later
human review. While it's possible to
scan both the Headers and the entire
message Body for key words or phrases,
this sort of filtering has proven over
time to be largely ineffective. Spammers
intent on getting their message
delivered employ intentional
misspellings and character substitutions
in so many combinations, and change them
so quickly, that keeping up would
require an army of
personnel at a prohibitive cost.
A more recently developed process called
Bayesian Filtering was supposed to be
the end-all to spam. It scans the
message body for various words and
applies a cumulative positive or
negative score for each word found in
the first and last few hundred
characters of a message. Spammers
defeated this method in less than 48
hours by saturating their messages with
leading and trailing paragraphs of text
that was either direct quotes from
famous written works or utter nonsense
made up of highly positive-scoring
words. I'm told a lot of work was put
into that concept; too bad it only
lasted a couple of days.
-
Next, we take a look
at any attachment files that may be
included in the message. Files with
names ending in certain extensions are
always treated as likely to be harmful
and therefore will trigger a rejection.
Currently, these are EXE, PIF, SCR, BAT,
COM, DLL, VBS, CPL, and JS. This list
may change as circumstances require.
Normally ZIP files are allowed; however,
this may be temporarily suspended during
times when a new virus is on its initial
"run" and signature files are not yet
available to correctly identify a virus
that may be inside. We reject messages
with these attachments because in the
real world virus signatures come out
after the virus is released, and it's in
that initial lag period that the damage
can be most severe while no one has any
protection yet. In some cases, we will
quarantine such messages for human
review, and if warranted a copy may be
sent to the Virus Lab at Computer
Associates, particularly if there is any
probability that it represents a new
strain or mutation of an existing virus
or possibly a completely new beast
altogether, so that they can get an
update to virus signatures published as
quickly as possible for everyone's
benefit.
-
Finally, the entire
message is swept with our
industry-leading ETrust AntiVirus software from
Computer Associates. This scan engine
includes Heuristic Virus Detection,
which often detects new mutations of
known viruses, even if there is no
published signature for that new
strain. We are able to automatically
deploy updated virus signatures as often
as every hour, without disruption of
service, so you can be sure we're always
on the front lines in virus defense.
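As a rough illustration of the header and attachment checks in Phase Two, the sketch below uses Python's standard `email` package. The function names are hypothetical, and the list of required headers is an assumption (Date and From are the two fields the message format standard makes mandatory); the text above does not say which elements the Gateway actually looks for. The extension list is the one quoted above.

```python
from email.message import EmailMessage

# Extensions listed in the text above as always rejected.
BLOCKED_EXTENSIONS = {"exe", "pif", "scr", "bat", "com", "dll", "vbs", "cpl", "js"}

# Assumption: only the two headers the message format standard requires.
REQUIRED_HEADERS = ("Date", "From")


def missing_headers(msg, required=REQUIRED_HEADERS):
    """Return the required header names that are absent from the message."""
    return [name for name in required if msg.get(name) is None]


def blocked_attachments(msg, blocked=BLOCKED_EXTENSIONS):
    """Return attachment filenames whose final extension is on the blocked list."""
    hits = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and "." in filename:
            extension = filename.rsplit(".", 1)[1].lower()
            if extension in blocked:
                hits.append(filename)
    return hits


def build_example():
    """Build a small test message with a disguised executable attached."""
    msg = EmailMessage()
    msg["From"] = "sender@example.tld"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"MZ",
        maintype="application",
        subtype="octet-stream",
        filename="invoice.pdf.EXE",
    )
    return msg


example = build_example()
print(missing_headers(example))      # ['Date']
print(blocked_attachments(example))  # ['invoice.pdf.EXE']
```

Only the last extension is compared, case-insensitively, so a name like "invoice.pdf.EXE" is still caught. The virus scan itself is a separate step and is not represented here.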
At this stage, if nothing
inappropriate was discovered, we tell the
sending system that we will accept the
message and disconnect. All of the above
takes place in just a few seconds, excluding
upload time of the message data. We
optimize the performance of all of this by
"caching" the information we gather from
other systems, such as address verification
results and DNS records, for a period of
time, so that when similar messages are
attempted we can process them faster with
the information we gathered previously.
This also is done as a courtesy to those
other networks because it's not appropriate
to repeatedly pound their systems for the
same information we just got a few minutes
earlier...the data is just not likely to
change that quickly. The duration of that
caching varies depending on type and
content, and can range from 90 minutes to a
week or more for some things (DNS records
for major service providers like AOL and
Yahoo, for example, haven't changed in months
or years). In most cases the source of
the information tells us how long we should
rely on the information to remain valid.
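The caching described above amounts to storing each result together with an expiry time. A minimal sketch, assuming a simple in-memory store (the class name `TTLCache` and its methods are hypothetical, not part of the Gateway):

```python
import time


class TTLCache:
    """Keep lookup results until their time-to-live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without waiting
        self._items = {}      # key -> (value, expiry time)

    def put(self, key, value, ttl_seconds):
        """Store a value that stays valid for ttl_seconds from now."""
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value


cache = TTLCache()
# 90 minutes, the low end of the range mentioned above.
cache.put("mail.example.tld", ["1.2.3.4"], ttl_seconds=90 * 60)
print(cache.get("mail.example.tld"))  # ['1.2.3.4']
```

For DNS answers, the `ttl_seconds` value would normally come from the record itself, which is the case where the source tells us how long the information stays valid.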
WHEW! What a gauntlet
that must be run by every message that comes
into the system...and we don't stop there!
From here the message goes into a final
process queue, where the sender information
is checked against our maintained
Whitelist. If it's listed, the message is
forwarded into the primary mail server for
final delivery. If the sender is not
listed, the message may be held for review
in Phase Three by one of our staff, who will
put human eyes on the message for things no
mechanical system can accurately detect,
like inappropriate Subjects (e.g. "Vi@ra",
"Fr33 M3ds", or "M0rtg@g3 R@t3s") and other
content that isn't machine-readable, such as
HTML-linked images containing ads or
offensive material. Such messages are
rerouted to a database and analyzed for
patterns that help us adjust the automated part
of the system. While a single or a few
messages won't tell us much, the
accumulation of them over time is invaluable
toward refining the system to stay at peak
performance. Real messages are marked for
whitelisting so that future messages by the
same sender don't get held up. It may be
important to note here that a new message
that passes all the tests from a sender that
the system has never seen may not be
released from the review system immediately,
depending on how heavy the server load is,
and the time of day. It is specifically NOT
recommended to "test" the system to see if
it works, especially by sending email into
it from places it's never seen before. All
aspects of the system are constantly
monitored by both human and machine
attendants, so in the unlikely event of a
system outage, the appropriate technical
personnel are made aware promptly, and the
notification systems are regularly tested.
This, combined with various system
redundancies, makes the whole system highly
reliable and stable, so end-user testing
isn't necessary and would not yield much in
the way of useful information to you
anyway. Who knows, we might even discard
test messages under the network abuse clause
of the hosting agreement, just because we're
feeling a bit mischievous!
Now for your
questions...we're sure you have them after
reading this document. Okay, on to the FAQ!