SpamAssassin: Filtering E-Mail

Introduction

SpamAssassin is a mail filter that identifies SPAM, or unsolicited commercial e-mail, using text analysis. Its rule base applies a wide range of heuristic tests to mail headers and body text.

Overview

E-mail users occasionally receive messages from unknown sources offering goods or services. Users tend to get this e-mail without having asked for it. Often the goods and services offered do not interest the user, and the messages clutter the in-box, obscuring important messages. Because the user never requested information about the offered goods and services, it is difficult to pinpoint how the user's e-mail address ended up on the mailers' lists, and therefore difficult to remove the address from those lists.

Some messages include a section that allows the user to opt out of future mailings. This works if the sender is a legitimate business that honors the user's request to be removed from its mailing list. Other senders, however, treat an opt-out request as confirmation that the address is valid, then keep mailing to it and share it with other businesses. The federal CAN-SPAM Act of 2003 requires commercial e-mail to include an opt-out mechanism. An earlier bill, S.1618, would have required such a notice at the end of each unsolicited e-mail; it never became law, but some unsolicited e-mail messages still refer to it. Currently, no Indiana legislation regulates unsolicited e-mail.

The Engineering Computer Network (ECN) has installed SpamAssassin so that users can filter their e-mail automatically. SpamAssassin is an opt-in service because the scanner cannot identify SPAM with one-hundred percent accuracy. When SpamAssassin is enabled, it processes incoming e-mail messages and rates their spamminess on a point scale. When a message scores a high enough point level, it is marked as SPAM. Marked SPAM messages are delivered according to user preferences. Unmarked, non-SPAM messages are appended to the user's in-box as usual.

Implementation

To use SpamAssassin, an incoming mail filter must be in place. In this document, the program procmail implements the mail filtering and can be configured in a variety of ways. Messages marked as SPAM can be delivered to the in-box, delivered to a separate mail folder, deleted, or bounced back to the sender. Examples of several of these options are shown below.

To determine whether a message is SPAM, SpamAssassin runs a variety of tests, each contributing to a total number of points. When the total reaches a certain level, the message is considered SPAM and processed accordingly. Some tests lower the overall point total, indicating that the message may not be SPAM after all. Although adding customized rules is not covered here, SpamAssassin does allow the point system to be customized: both the point threshold and the score of each rule are configurable. Examples of changing the SpamAssassin scoring system are shown below.

Requirements

This document describes the use of SpamAssassin at ECN. To use SpamAssassin, all of the following criteria must be met:

  • The final destination of your e-mail must be an ECN server,
  • The ECN server must be running the RedHat Enterprise operating system, and
  • No existing e-mail filtering may be enabled (such as the vacation program).

To determine which server holds your e-mail, log on to the ECN server and run the mailbox command. To determine the operating system, use the uname -s command; RedHat Enterprise systems display Linux. To determine whether a mail filter is already in operation, check whether the file .forward exists in your home directory.

In the example below, the user kosh logs on to his departmental server named Pier and checks the location of his e-mail mailbox, the operating system version and if there's a .forward file. The server answers back that his e-mail mailbox is forwarded to Pier, that Pier is running Linux and no mail forwarding file exists.

pier.ecn.purdue.edu% mailbox
Mail for "kosh" is forwarded to "kosh@pier.ecn.purdue.edu".
pier.ecn.purdue.edu% uname -s
Linux
pier.ecn.purdue.edu% cat .forward
cat: cannot open .forward

You do not need to understand all of these checks. The initialization script, sainit, stops if the operating system version or existing mail forwarding prevents it from setting up SpamAssassin. In those cases, your ECN site specialist will help set up SpamAssassin.
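For readers who want to automate two of the manual checks above, here is a minimal Python sketch. The function name check_sa_requirements is hypothetical, and it only mirrors the uname -s and .forward checks; the ECN-specific mailbox command is not reproduced. It assumes, as stated above, that RedHat Enterprise systems report Linux.

```python
import os
import platform


def check_sa_requirements(home=None, sysname=None):
    """Hypothetical helper mirroring two of the manual checks above.

    Returns a list of problems found; an empty list means both checks passed.
    The ECN-specific `mailbox` check is deliberately not reproduced here.
    """
    home = home if home is not None else os.path.expanduser("~")
    # platform.system() reports the same kernel name as `uname -s`.
    sysname = sysname if sysname is not None else platform.system()
    problems = []
    if sysname != "Linux":
        problems.append(f"operating system is {sysname!r}, expected 'Linux'")
    # An existing .forward file means some mail filtering is already enabled.
    if os.path.exists(os.path.join(home, ".forward")):
        problems.append(".forward already exists")
    return problems
```

Calling check_sa_requirements() with no arguments checks the current machine and your home directory; the parameters exist only so the logic can be tried against other values.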

Configuring SpamAssassin

SpamAssassin comes with a variety of tests, each contributing to a total number of points. By default, the point threshold for marking a message as SPAM is 5.0: messages whose matching rules add up to 5.0 or more points are marked as SPAM, and messages scoring less than 5.0 are delivered normally. A rule that indicates SPAM adds its positive score to the total, and rules that more clearly indicate SPAM carry higher scores. Likewise, a rule that indicates a non-SPAM message adds a negative score, lowering the total. This helps keep SpamAssassin from falsely identifying a legitimate message as SPAM.
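The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustration of the summing and threshold comparison only, not SpamAssassin's own code. The S_1618 default of 3.344 is taken from this document; the other two rule names and their scores are hypothetical.

```python
# Default threshold described above; required_hits in user_prefs changes it.
REQUIRED_HITS = 5.0


def classify(hits, required_hits=REQUIRED_HITS):
    """Sum the scores of the rules a message matched and compare with the threshold.

    hits maps rule ID -> score. Positive scores push toward SPAM,
    negative scores pull away from it.
    """
    total = sum(hits.values())
    return total, total >= required_hits


# S_1618's default comes from this document; the other rules are hypothetical.
spammy = {"S_1618": 3.344, "HYPOTHETICAL_SPAM_RULE": 2.5}
balanced = {**spammy, "HYPOTHETICAL_HAM_RULE": -1.0}

print(classify(spammy))    # total of about 5.844, marked as SPAM
print(classify(balanced))  # total of about 4.844, delivered normally
```

The second message matches the same two SPAM-indicating rules, but one negative rule pulls its total back under 5.0.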

With its default settings, SpamAssassin should correctly identify SPAM about ninety percent of the time. The following sections describe ways to tweak SpamAssassin to raise that success rate.

Changing Required Hits

Occasionally SpamAssassin will miss SPAM messages or mark non-SPAM messages because the scoring threshold is set too high or too low. The default threshold is 5.0. If too many SPAM messages reach the in-box unmarked, try lowering the threshold. If too many non-SPAM messages are being diverted from the in-box, try raising the threshold.

To change the threshold, log on to the ECN server and edit the file .spamassassin/user_prefs in your home directory. Near the top of the file is the default setting written when SpamAssassin was first run:

# How many hits before a mail is considered spam.
required_hits           5

Change the setting for required_hits from 5 to a new value; to start, try raising or lowering it by one or two points. Save the file. The new threshold applies starting with the next incoming e-mail message.
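To see why small steps are suggested, here is a hedged Python sketch of the trade-off. The two message totals are made up for illustration: lowering required_hits by one catches the missed SPAM message, while lowering it by two also starts marking the legitimate one.

```python
def is_spam(total_score, required_hits):
    """A message is marked as SPAM when its total reaches required_hits."""
    return total_score >= required_hits


# Hypothetical totals for two messages, made up for illustration.
messages = {"missed SPAM": 4.3, "legitimate mail": 3.1}

for required_hits in (5, 4, 3):
    marked = [name for name, score in messages.items()
              if is_spam(score, required_hits)]
    print(f"required_hits {required_hits}: marked {marked}")
# required_hits 5: marked []
# required_hits 4: marked ['missed SPAM']
# required_hits 3: marked ['missed SPAM', 'legitimate mail']
```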

Changing Scoring Values

Occasionally SpamAssassin will miss SPAM messages, or mark non-SPAM messages, because one of its rules scores too many or too few points. SpamAssassin comes with a large number of rules, and each rule has a point value. See the SpamAssassin: Tests Performed document for each rule's identifier, score value, and description.

To change the score a rule adds when it matches, look up the rule identifier. Then log on to the ECN server and edit the file .spamassassin/user_prefs. Near the bottom of the file is a place to put new rule score values:

# score SYMBOLIC_TEST_NAME n.nn

Add a new line to the file with the word score followed by the rule ID and the new score value.

As an example, say kosh wants messages that mention Senate Bill 1618 to receive a higher score than the default. The rule list includes a rule labeled S_1618, described as "Claims compliance with senate bill 1618", with a default score of 3.344 points toward being a SPAM message. kosh raises the score by adding the following line to the .spamassassin/user_prefs file:

# score SYMBOLIC_TEST_NAME n.nn
score S_1618                6.0

The changes to the score for this rule will be processed on the next e-mail message that matches.
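The following Python sketch shows how kosh's edit combines with the default threshold. The parser load_prefs is hypothetical and simplified: it handles only the single-value required_hits and score lines used in this document, not everything SpamAssassin accepts in user_prefs.

```python
# Hypothetical excerpt of .spamassassin/user_prefs after kosh's edit.
USER_PREFS = """\
# How many hits before a mail is considered spam.
required_hits           5

# score SYMBOLIC_TEST_NAME n.nn
score S_1618                6.0
"""

DEFAULT_SCORES = {"S_1618": 3.344}  # default score from the text above


def load_prefs(text):
    """Return (required_hits, score overrides) from user_prefs-style text.

    Simplified sketch: only the single-value forms shown in this document
    are handled.
    """
    required_hits = 5.0
    overrides = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "required_hits":
            required_hits = float(fields[1])
        elif fields[0] == "score":
            overrides[fields[1]] = float(fields[2])
    return required_hits, overrides


required_hits, overrides = load_prefs(USER_PREFS)
scores = {**DEFAULT_SCORES, **overrides}
# A message matching only S_1618 now reaches the threshold on its own.
print(scores["S_1618"], scores["S_1618"] >= required_hits)  # 6.0 True
```

With the new value of 6.0, a message matching S_1618 is marked as SPAM unless negative-scoring rules pull its total back below 5.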

Customization

The SpamAssassin initialization script installs a default filtering rule that marks messages detected as SPAM. This works well enough for casual browsing, because SpamAssassin places the identifying mark at the beginning of the subject line. Some e-mail applications can apply rules based on the altered subject line, such as automatically filing messages into separate folders or showing them in a different color in the message listing. These are client-side rules, and the available functionality depends on the e-mail application.

In cooperation with the e-mail filtering software procmail, SpamAssassin can do more than mark a message as SPAM. The message can also be filed to a separate mail folder, deleted outright, or bounced back to the sender as undeliverable. These are server-side rules, and they act on incoming messages as the mail server delivers them.

Client-side and server-side rules each have advantages and disadvantages. Client-side rules are implemented by the e-mail application and are relatively easy to manage through its graphical interface. However, with client-side rules every message, SPAM or not, must transfer over the network before the sorting rules are applied. Server-side rules can reduce the volume of messages transferred to the e-mail application, but changes to the mail filtering software are difficult and non-intuitive, and an error in the filter can block message delivery.

Server-side Customization

Redirecting SPAM Messages

The mail server can place messages marked as SPAM into a separate folder as they are received. After the account has been initialized for mail filtering with the sainit command, one section of the procmail command script, .procmailrc, must be edited to enable the message filing command. This section is in the middle of the script and is commented out initially. Remove the comment character, the pound sign (#), from the three lines between the arrows, but not from the arrow lines themselves. The script starts out looking like this:

#  Move to file or delete
#
#  This filter will move potential SPAM messages to a separate file.
#  Messages will not appear in the INBOX.  It is safer to move to a
#  file rather than deleting the message because it's possible that
#  some message may not be SPAM.
#
#  In this example, the SPAM messages will be moved to a file called
#  "SPAM". To delete the messages, change "SPAM" to "/dev/null".
#
#  In order to activate this rule, remove the comment character (#)
#  from the beginning of the three lines between the arrows.
#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#:0
#* ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
#SPAM
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Change the three lines between the arrows to this:

#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
:0
* ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
SPAM
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the change is made, messages marked as SPAM are filed into a file named SPAM in the home directory; procmail creates the file automatically if it does not exist. To file the messages where the ECN Web Mail system can see them, use the name mail/SPAM instead of SPAM.
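The effect of the recipe can be modeled in Python as a rough sketch, not as procmail itself. The function destination and the "INBOX" label are hypothetical. The sketch relies on two procmail defaults: conditions are matched case-insensitively, and they are checked against the message header, where ^ anchors to the start of any header line.

```python
import re

# The recipe's condition line translated to a Python regular expression.
# IGNORECASE mirrors procmail's default case-insensitive matching, and
# MULTILINE lets "^" match at the start of every header line.
SPAM_SUBJECT = re.compile(r"^Subject: \*\*\*\*\*SPAM\*\*\*\*\*",
                          re.IGNORECASE | re.MULTILINE)


def destination(headers, spam_folder="SPAM"):
    """Hypothetical model of the recipe: choose where one message goes.

    spam_folder is "SPAM", "mail/SPAM" (visible to ECN Web Mail), or
    "/dev/null" (discard). "INBOX" stands in for normal delivery.
    """
    if SPAM_SUBJECT.search(headers):
        return spam_folder
    return "INBOX"


tagged = "From: someone@example.com\nSubject: *****SPAM***** Cheap offers\n"
print(destination(tagged))               # SPAM
print(destination(tagged, "mail/SPAM"))  # mail/SPAM
print(destination("Subject: Meeting notes\n"))  # INBOX
```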

Deleting SPAM Messages

The mail server can also delete messages marked as SPAM, never to be seen again. This is a risky way to handle SPAM, because SpamAssassin may misidentify a valid message as SPAM, and a deleted message cannot be recovered from a trash folder. Deleting messages marked as SPAM works just like redirecting them, as described above, except that the message file SPAM is changed to /dev/null. /dev/null is a UNIX device that discards everything written to it. A mail deleting command looks like this:

#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
:0
* ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
/dev/null
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the change is made, new messages marked as SPAM are discarded instead of being delivered to the in-box.

Co-existing with "vacation"

The "vacation" program is a utility that automatically tells senders that you are not reading e-mail. It uses a template message, stored in the home directory as the file .vacation.msg, which looks something like this:

 
From: kosh (via the vacation program)
Subject: away from my mail

I will not be reading my mail for a while.
Your mail regarding "$SUBJECT" will be read when I return.

Vacation alters the mail filtering file .forward so that messages are still stored in the in-box as usual, while an auto-reply based on the message file above is also sent back to the sender. The vacation program records its auto-replies in a database file, stored in the home directory as .vacation.db, that remembers the last time it replied to each sender. Remembering the last reply time keeps the vacation program from sending multiple replies to the same sender: it auto-replies to a given sender only once a week.
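The once-a-week rule can be sketched in Python as follows. This is a hypothetical model of the behavior described above, not the vacation program's code: a plain dictionary stands in for .vacation.db, and the real database format is not modeled.

```python
from datetime import datetime, timedelta

REPLY_INTERVAL = timedelta(days=7)  # reply to each sender at most once a week


def should_reply(sender, now, last_replied):
    """Return True (and record the reply) if `sender` has not had one within a week.

    last_replied is a dict standing in for .vacation.db.
    """
    previous = last_replied.get(sender)
    if previous is not None and now - previous < REPLY_INTERVAL:
        return False
    last_replied[sender] = now
    return True


db = {}
start = datetime(2007, 11, 6)
print(should_reply("a@example.com", start, db))                      # True
print(should_reply("a@example.com", start + timedelta(days=3), db))  # False
print(should_reply("a@example.com", start + timedelta(days=8), db))  # True
```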

The problem with running SpamAssassin and the vacation program at the same time is that both use the same mail forwarding file, .forward. It is possible to get SpamAssassin and vacation to work together, but it requires editing SpamAssassin and vacation files by hand.

Enabling Vacation

Enabling the vacation program requires three items:

  • The vacation message file,
  • The vacation database file, and
  • The vacation rule enabled in SpamAssassin's message filter.
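As a hedged aid, this Python sketch reports which of the three items appear to be in place. The function vacation_checklist is hypothetical, it uses the file names given in this document, and the .procmailrc test is a heuristic: it looks for an uncommented action line that pipes to /usr/bin/vacation, like the one in the .procmailrc vacation section.

```python
import os
import re

# Heuristic: an uncommented procmail action line piping to /usr/bin/vacation.
VACATION_ACTION = re.compile(r"^\s*\|\s*/usr/bin/vacation\b")


def vacation_checklist(home):
    """Hypothetical helper: report which vacation items appear to exist under `home`."""
    def exists(name):
        return os.path.isfile(os.path.join(home, name))

    rule_enabled = False
    rc_path = os.path.join(home, ".procmailrc")
    if os.path.isfile(rc_path):
        with open(rc_path) as rc:
            rule_enabled = any(VACATION_ACTION.match(line) for line in rc)

    return {
        ".vacation.msg": exists(".vacation.msg"),
        ".vacation.db": exists(".vacation.db"),
        "vacation rule in .procmailrc": rule_enabled,
    }
```

For example, vacation_checklist(os.path.expanduser("~")) checks your own home directory; any value of False points to the step still to be done.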

To create a vacation message file, run the program vacation. If vacation hasn't been run before, it first opens an editor with a template message. Accept or edit the template message, save the file, and exit the editor. Next, vacation asks whether to show or edit the message; reply yes or no as needed. Finally, vacation reports that a .forward file already exists. When the vacation program asks whether to remove the file, reply no.

Would you like to see it? no
Would you like to edit it? no
You have a .forward file in your home directory containing:

#forward.template-v1
#
# Filter e-mail through "procmail".
#
# If procmail is not found or fails for some reason, exit with status
# 75. Status 75 tells sendmail that a temporary failure occurred and to
# try again later. This way, mail won't be lost so long as a correction
# is made before sendmail gives up trying a few days later.
#
"|/usr/local/bin/procmail -f- || exit 75"
#
# If at some point in the future, you want to bounce messages back,
# uncomment the lines in the ".procmailrc" file and switch the line
# above to:
#
# "|/usr/local/bin/procmail -f-"
#
# See the document
#
#    https://engineering.purdue.edu/ECN/documents/UNIX/sa/
#
# for more information about the subject of bouncing messages.

Would you like to remove it and disable the vacation feature? no

Next, initialize a vacation database file. Do this by executing vacation -I.

Finally, when you are ready to have the vacation program handle all newly received messages, change the message filter file. At the bottom of the procmail command script, .procmailrc, is a section with the vacation command, commented out initially. Remove the comment character, the pound sign (#), from the three lines between the arrows, but not from the arrow lines themselves. The script starts out looking like this:

#  Enabling the vacation program
#
#  By adding the rule below, the vacation program will be enabled.
#  Be sure to initialize a vacation message first (the file .vacation.msg),
#  and initialize the database (using the command "vacation -I").
#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#:0c
#*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
#| /usr/bin/vacation $LOGNAME
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Change the three lines between the arrows to this:

#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
:0c
*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
| /usr/bin/vacation $LOGNAME
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the change is made, new messages are processed through SpamAssassin as before, and each non-SPAM message is also passed to the vacation program for an auto-reply to the sender.
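The two unusual parts of this recipe are the c flag on :0c, which hands a copy of the message to the action while the original continues to normal delivery, and the ! after *, which negates the condition. The Python sketch below is a hypothetical model of this one recipe only; the action strings are labels, and vacation's own once-a-week check still applies on top of it.

```python
import re

# Same condition as the recipe; procmail matches case-insensitively by default.
SPAM_SUBJECT = re.compile(r"^Subject: \*\*\*\*\*SPAM\*\*\*\*\*",
                          re.IGNORECASE | re.MULTILINE)


def vacation_recipe(headers):
    """Hypothetical model of ':0c' with a negated ('*!') condition.

    Returns the actions taken for one message: only messages WITHOUT the
    SPAM tag are copied to vacation, and every message is still delivered.
    """
    actions = []
    if not SPAM_SUBJECT.search(headers):  # '*!' negates the match
        actions.append("| /usr/bin/vacation $LOGNAME")  # the 'c' copy
    actions.append("deliver to in-box")  # the original continues on
    return actions


print(vacation_recipe("Subject: lunch?\n"))
print(vacation_recipe("Subject: *****SPAM***** buy now\n"))
```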

Disabling Vacation

Disabling the auto-reply feature reverses the third step of enabling it: put the comment characters back at the start of the three lines in the vacation section of the .procmailrc file. The script starts out looking like this:

#  Enabling the vacation program
#
#  By adding the rule below, the vacation program will be enabled.
#  Be sure to initialize a vacation message first (the file .vacation.msg),
#  and initialize the database (using the command "vacation -I").
#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
:0c
*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
| /usr/bin/vacation $LOGNAME
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Change the three lines between the arrows to this:

#
###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#:0c
#*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
#| /usr/bin/vacation $LOGNAME
###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There's no need to work with the vacation message file .vacation.msg or the database file .vacation.db. Leaving these files in place allows for the vacation command to work with the same data next time.

Last modified: 2015/04/21 11:07:4.758393 GMT-4 by curtis.f.smith.1
Created: 2007/11/06 13:47:28.899000 US/Eastern by brian.r.brinegar.1.
