SpamAssassin: Filtering E-Mail
SpamAssassin is a mail filter that identifies SPAM, or unsolicited commercial e-mail, using text analysis. Its rule base applies a wide range of heuristic tests to mail headers and body text.
E-mail users will occasionally see messages from unknown sources offering goods or services. Users tend to get this e-mail without solicitation. Often the goods and services offered are not interesting to the user, and the messages clutter the in-box, obscuring the important messages. Because the users didn't request information about the offered goods and services, it is difficult to pinpoint how the user's e-mail address became part of the mailers' lists, and therefore difficult to remove the address from those lists.
Some messages include a section that allows the user to opt out of future mailings. This method works if the sender is a legitimate business and honors the user's request to be removed from its mailing list. But some senders use the opt-out request to confirm that the e-mail address is valid, then keep using the address in future mailings and share it with other businesses. Currently, no federal legislation requires an opt-out from future mailings. At one time, a bill numbered S.1618 would have required such a message at the end of each unsolicited e-mail, and some unsolicited e-mail messages still include a reference to the failed bill. Currently, no Indiana legislation regulates unsolicited e-mail.
The Engineering Computer Network (ECN) has installed a copy of SpamAssassin that users can enable to filter their e-mail automatically. SpamAssassin is an opt-in service, because the scanner cannot predict with one-hundred percent accuracy whether a message is SPAM. When SpamAssassin is enabled, it processes the incoming e-mail messages and rates their spamminess on a point scale. When a message evaluates to a high enough point level, the message is marked as SPAM. Marked SPAM messages are delivered based on user preferences. Unmarked, non-SPAM messages are appended to the user's in-box as usual.
In order to use SpamAssassin, an incoming mail filter is implemented. In this document, the program procmail implements the mail filtering and can be configured in a variety of ways. Messages marked as SPAM can be delivered to the in-box, delivered to a separate mail folder, deleted or bounced back to the sender. Examples of each are shown below.
In order to determine if a message is SPAM, SpamAssassin comes with a variety of tests, each counting towards a total number of points. When a certain level of points is reached, the message is considered to be SPAM and processed accordingly. Note that some tests lower the overall point level, indicating that the message may not be SPAM after all. Although SpamAssassin doesn't have a way of adding customized rules, it does allow the point system to be customized. The point threshold is configurable, as is the score of each rule. Examples of changing SpamAssassin's scoring system are shown below.
This document describes the use of SpamAssassin at ECN. In order to use SpamAssassin, all of the following criteria must be met:
- The final destination of your e-mail must be an ECN server,
- The ECN server must be running the Sun Solaris 10 operating system,
- No existing e-mail filtering is enabled (such as the vacation program).
To determine your e-mail server, log on to an ECN server and execute the mailbox command. To determine the operating system version, use the uname -sr command. Solaris 10 systems display as SunOS 5.10. To determine whether a preexisting mail filter is in operation, check whether the file .forward exists.
In the example below, the user kosh logs on to his Sun server named Pier and checks the location of his e-mail mailbox, the operating system version and if there's a .forward file. The Sun server answers back that his e-mail mailbox is forwarded to Pier, that Pier is running Solaris 10 and no mail forwarding file exists.
    pier.ecn.purdue.edu% mailbox
    Mail for "kosh" is forwarded to "email@example.com".
    pier.ecn.purdue.edu% uname -sr
    SunOS 5.10
    pier.ecn.purdue.edu% cat .forward
    cat: cannot open .forward
Don't worry about understanding much of this. The initialization script shown below will intervene if there is a problem setting up SpamAssassin because of the operating system version or mail forwarding. Your ECN site specialist will assist with setting up SpamAssassin in cases where the initialization script stops.
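For the curious, the checks described above amount to something like the following Python sketch. It is not the actual sainit script, whose contents are not shown in this document. The function names are made up for illustration, and the mailbox check is left out because the output format of the ECN mailbox command is known only from the single example above.

```python
from pathlib import Path


def is_solaris_10(uname_sr: str) -> bool:
    """Return True when `uname -sr` output reports SunOS 5.10 (Solaris 10)."""
    return uname_sr.split() == ["SunOS", "5.10"]


def has_forward_file(home: Path) -> bool:
    """Return True when a .forward file already exists in the home directory."""
    return (home / ".forward").exists()


def can_enable_spamassassin(uname_sr: str, home: Path) -> bool:
    """Hypothetical pre-flight check: right OS and no existing mail filter."""
    return is_solaris_10(uname_sr) and not has_forward_file(home)
```

A caller would pass the text printed by uname -sr and the path of the home directory, for example can_enable_spamassassin("SunOS 5.10", Path.home()).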
To initialize SpamAssassin, log on to the Sun server with your mailbox and execute the command sainit. The initialization script makes consistency checks before installing. If mail forwarding is already enabled on your account, it stops before making changes. The initialization script adds two files to the home directory: a .forward file, which directs the mail server to filter the mail instead of storing the message in the in-box, and a .procmailrc file, which tells the mail server how the message is filtered. The initialization script can also remove the filter. Run the command sainit again and the script will detect that SpamAssassin is installed and ask whether it should remove the filter. It removes the filtering software by removing the file .forward, then the file .procmailrc.
In the example below, the user kosh logs on to his Sun server named Pier and runs the initialization script. The script verifies that kosh's e-mail is unfiltered before installing the SpamAssassin filter. A yes or no prompt is presented just before changes are made.
    pier.ecn.purdue.edu% sainit
    Welcome to SpamAssassin initialization wizard.

    SpamAssassin analyzes incoming e-mail messages and uses a rating
    system to mark messages as potential spam.

    This script will enable SpamAssassin on your account.
    Do you wish to continue: yes

    SpamAssassin installed.

    To remove SpamAssassin from your account, remove from your home
    directory the file ".forward" then the file ".procmailrc".

    The default function of SpamAssassin is to filter the message, but
    leave it in your INBOX. To enable additional features of
    SpamAssassin, edit the file ".procmailrc" and follow the instructions.
    pier.ecn.purdue.edu%
SpamAssassin comes with a variety of tests, each counting towards a total number of points. By default, the total point threshold to score a message as SPAM is 5.0. Messages whose matching rules add up to 5.0 or more points get marked as SPAM. Messages scoring less than 5.0 are delivered normally. When a rule indicating a SPAM message matches, the total increases by that rule's positive score value. Rules that more strongly indicate SPAM carry higher positive score values. Likewise, when a rule indicating a non-SPAM message matches, the total decreases by that rule's negative score value. This helps keep SpamAssassin from falsely identifying a legitimate message as SPAM.
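The arithmetic can be illustrated with a short Python sketch. This is a simplified model, not SpamAssassin's own code. The S_1618 score of 3.344 is the default quoted later in this document; the other two rule names and their scores are hypothetical and exist only to make the example work.

```python
# Simplified model of the point system: sum the scores of the rules
# that matched and compare the total against the threshold.
EXAMPLE_SCORES = {
    "S_1618": 3.344,                  # default score quoted in this document
    "HYPOTHETICAL_SPAM_RULE": 2.0,    # made-up rule that suggests SPAM
    "HYPOTHETICAL_HAM_RULE": -1.5,    # made-up rule that suggests non-SPAM
}


def total_score(matched_rules, scores=EXAMPLE_SCORES):
    """Add up the score of every rule that matched the message."""
    return sum(scores[name] for name in matched_rules)


def is_spam(matched_rules, required_hits=5.0, scores=EXAMPLE_SCORES):
    """A message is marked as SPAM when its total reaches the threshold."""
    return total_score(matched_rules, scores) >= required_hits
```

With these numbers, a message matching S_1618 and the hypothetical SPAM rule totals more than 5.0 and is marked, while the same message also matching the hypothetical non-SPAM rule drops below the threshold and is delivered normally.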
With its default settings, SpamAssassin should correctly identify SPAM about ninety percent of the time. The following sections describe ways of tuning SpamAssassin to raise the rate at which it correctly identifies SPAM messages.
Changing Required Hits
Occasionally SpamAssassin will miss SPAM messages or hit non-SPAM messages because the scoring threshold is misplaced. The default threshold is 5.0. If too many SPAM messages are entering the in-box as unmarked, normal messages, try lowering the score threshold. If too many non-SPAM messages are being diverted from the in-box, try raising the score threshold.
To change the score threshold value, log on to the Sun server and edit the file named .spamassassin/user_prefs. Near the top of the file will be the default setting when SpamAssassin was first run.
    # How many hits before a mail is considered spam.
    required_hits 5
Change the setting for required_hits from 5 to a new value. Try incrementing or decrementing the value by one or two points to start with. Write out the file. The changes to the scoring will be processed on the next e-mail message.
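For those who would rather script the change than open an editor, the edit can be sketched in Python as below. This is an illustration only: the function name is made up, and it works on the text of the file, so reading and writing .spamassassin/user_prefs is left to the caller.

```python
import re

# Matches an active (uncommented) required_hits line and captures the
# keyword plus its spacing, so only the number is replaced.
_REQUIRED_HITS = re.compile(r"^([ \t]*required_hits[ \t]+)\S+", re.MULTILINE)


def set_required_hits(prefs_text: str, new_value) -> str:
    """Return prefs_text with the required_hits threshold set to new_value."""
    updated, count = _REQUIRED_HITS.subn(
        lambda m: f"{m.group(1)}{new_value}", prefs_text
    )
    if count == 0:
        # No active setting found: append one at the end of the file.
        if updated and not updated.endswith("\n"):
            updated += "\n"
        updated += f"required_hits {new_value}\n"
    return updated
```

Commented-out lines are left alone, because the pattern requires the line to begin with the keyword itself.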
Changing Scoring Values
Occasionally SpamAssassin will miss SPAM messages, or hit non-SPAM messages, because one of the rules scores too many or too few points. SpamAssassin comes with a large number of rules, and each rule has a point value. See the document SpamAssassin: Tests Performed for each rule's identifier, score value and description.
To change the score value when a rule matches, look up the rule identifier. Then log on to the Sun server and edit the file named .spamassassin/user_prefs. Near the bottom of the file will be a place to put new rule score values.
# score SYMBOLIC_TEST_NAME n.nn
Add a new line to the file with the word score followed by the rule ID and the new score value.
As an example, say kosh wants to mark messages talking about Senate Bill 1618 with a higher score value than the default. Looking through the rule list, he finds a rule labeled S_1618, described as "Claims compliance with senate bill 1618", with a default score value of 3.344 points toward being a SPAM message. kosh gives it a higher point value by adding the following line to the .spamassassin/user_prefs file:
    # score SYMBOLIC_TEST_NAME n.nn
    score S_1618 6.0
The changes to the score for this rule will be processed on the next e-mail message that matches.
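To see why this one line matters, the Python sketch below reads score lines out of the text of a user_prefs file and lays them over the defaults. It is a simplified reading of the file format, assuming one score value per line as in the example above; the function name is made up for illustration.

```python
def parse_score_overrides(prefs_text: str) -> dict:
    """Collect `score RULE_ID value` lines, ignoring comments."""
    overrides = {}
    for raw_line in prefs_text.splitlines():
        # Drop anything after a comment character, then split into words.
        words = raw_line.split("#", 1)[0].split()
        if len(words) == 3 and words[0] == "score":
            overrides[words[1]] = float(words[2])
    return overrides


# The default score for S_1618 quoted in this document.
defaults = {"S_1618": 3.344}

prefs = "# score SYMBOLIC_TEST_NAME n.nn\nscore S_1618 6.0\n"
effective = {**defaults, **parse_score_overrides(prefs)}
```

With the default threshold of 5.0, a message matching only S_1618 was previously unmarked at 3.344 points; with the override it reaches 6.0 points and is marked as SPAM on that rule alone.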
The SpamAssassin initialization script installs a default filtering rule to mark messages when they are detected as SPAM. This works well enough for casually browsing messages, because SpamAssassin places the identifying mark at the beginning of the subject line. Some e-mail applications can apply rules that act on the altered subject line, such as automatically filing messages into new folders or showing the message in a different color in the message listing. These are client-side rules, and the functionality available for dealing with SPAM messages depends on the e-mail application.
SpamAssassin, in cooperation with the e-mail filtering software procmail, can do more than just mark the message as SPAM. It is also possible to file the message to a separate mail folder, delete the message outright, or bounce the message back to the sender as undeliverable. These are server-side rules, and they act on the incoming messages as they are being delivered by the mail server.
There are advantages and disadvantages to both client-side and server-side rules. Client-side rules are implemented by the e-mail application and are relatively easy to manage using the graphical interface. However, client-side rules require that the e-mail messages, SPAM or not, transfer over the network before sorting rules are applied. Server-side rules can reduce the volume of messages transferred to the client e-mail application, but they require changes to the mail filtering software that can be difficult or non-intuitive, and an error can inhibit message delivery.
Microsoft Outlook XP contains a mail organization function that allows incoming messages to be filed, colored or hidden. Rules can easily be applied to messages that match a source or destination address, or have a particular subject line. Since the default setting in the SpamAssassin initialization script is to mark the message with a special subject line tag, it is easy to add an organization rule to Outlook XP. This section gives an example of how to make Outlook XP change the color of a message when it is a SPAM message. It is assumed that Outlook XP is already configured to receive mail.
Start by selecting the in-box folder as the current mail folder, then click on the button in the toolbar labeled Organize. This splits the window in two; the top pane is Ways to Organize Inbox. Select the tab labeled Using Colors on the left.
The available options in the organize by color section are coloring by from and coloring by sent to. In order to color the message that SpamAssassin has determined to be SPAM, there needs to be a rule that colors by subject line, because SpamAssassin will change the subject line to include the phrase "*****SPAM*****". To color by subject line, use the advanced organizing features by clicking on the Automatic Formatting... button near the upper right of the window.
Add a new rule by clicking the Add button. Name the rule SPAM. Click on Condition....
Add a condition for SpamAssassin marked messages by typing into the Search for the word(s) text box the phrase *****SPAM***** (five asterisks, the word SPAM, then five more asterisks). The field should already be set to subject field only. Click on OK. Next, set the color of the text in SPAM messages. Click on the Font... button.
Select a font style for the SPAM messages. In this example, set the color to Red and the font style to Bold. Then click OK. Back at the automatic formatting window, complete the rule by clicking OK.
Redirecting SPAM Messages
It is possible to have the mail server place messages marked as SPAM into a separate folder as they are received. After initializing the account for mail filtering with the sainit command, a section of the command script needs editing to enable the message filing command. In the middle of the procmail command script .procmailrc is a section containing the message filing command, which is initially commented out. Remove the comment character, the pound sign (#), from the three lines near the bottom of that section, between the arrows but not including the lines with the arrows. The section starts out looking like this:
    # Move to file or delete
    #
    # This filter will move potential SPAM messages to a separate file.
    # Messages will not appear in the INBOX. It is safer to move to a
    # file rather than deleting the message because it's possible that
    # some message may not be SPAM.
    #
    # In this example, the SPAM messages will be moved to a file called
    # "SPAM". To delete the messages, change "SPAM" to "/dev/null".
    #
    # In order to activate this rule, remove the comment character (#)
    # from the beginning of the three lines between the arrows.
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    #:0
    #* ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    #SPAM
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Change the three lines between the arrows to this:
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    :0
    * ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    SPAM
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the change is made, the messages will begin filing to the new file named SPAM in the home directory. If the file does not exist, it will be created automatically by procmail. To change the message file to one visible by the ECN Web Mail system, use the name mail/SPAM instead of SPAM.
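The hand edit can also be expressed as a small Python sketch, which may help make clear exactly which lines change. It assumes the layout shown in this document: a heading comment line that names the section, followed further down by a pair of arrow lines beginning with ###v and ###^. The function name is made up, and the function works on the text of .procmailrc rather than on the file itself.

```python
def set_block_enabled(procmailrc: str, heading: str, enabled: bool) -> str:
    """Comment or uncomment the lines between the arrows of one section."""
    out = []
    in_section = False  # True once the heading comment has been seen
    in_arrows = False   # True while between the ###v and ###^ lines
    for line in procmailrc.splitlines():
        if line.strip() == heading:
            in_section = True
        elif in_section and line.startswith("###v"):
            in_arrows = True
        elif in_arrows and line.startswith("###^"):
            in_section = in_arrows = False
        elif in_arrows:
            if enabled and line.startswith("#"):
                line = line[1:]       # remove one comment character
            elif not enabled and not line.startswith("#"):
                line = "#" + line     # put the comment character back
        out.append(line)
    trailing = "\n" if procmailrc.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Calling set_block_enabled(text, "# Move to file or delete", True) turns the filing rule on and leaves every other section untouched. Keeping a copy of the original .procmailrc before writing any change back is a sensible precaution, since an error in this file can interfere with mail delivery.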
Deleting SPAM Messages
It is possible to have the mail server delete messages marked as SPAM, never to be seen again. This is a risky way of handling SPAM with SpamAssassin, because SpamAssassin may misidentify a valid message as SPAM. Setting the mail server to delete messages leaves no way of recovering deleted mail from a trash folder. The method to delete messages marked as SPAM is just like redirecting messages, as described above, but with the message file changed from SPAM to /dev/null. The message file /dev/null is a UNIX device that discards anything written to it. A mail deleting command would look like this:
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    :0
    * ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    /dev/null
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the change is made, new messages marked as SPAM will be discarded instead of being collected in the in-box.
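The filing and deleting recipes share the same condition and differ only in where the message goes. The Python sketch below mimics that decision so the condition can be tried against sample headers. It is an approximation of what procmail does, not procmail itself: the regular expression is copied from the recipe, case is ignored because procmail conditions are case-insensitive by default, and the function name and the "INBOX" label are made up for illustration.

```python
import re

# Same pattern as the procmail condition line in the recipes.
SPAM_SUBJECT = re.compile(
    r"^Subject: \*\*\*\*\*SPAM\*\*\*\*\*", re.MULTILINE | re.IGNORECASE
)


def destination(headers: str, spam_target: str = "SPAM", inbox: str = "INBOX") -> str:
    """Return where a message would be delivered, given its header text."""
    return spam_target if SPAM_SUBJECT.search(headers) else inbox
```

Passing spam_target="/dev/null" models the deleting recipe, and passing "mail/SPAM" models the folder name used for ECN Web Mail. Note that the pattern is anchored to the start of the subject line, so a message that merely mentions the word SPAM in its subject is not affected.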
The "vacation" program is a utility that returns a message to the sender saying that you're not reading e-mail. It uses a template message, stored in the home directory as the file .vacation.msg, which looks something like this:
    From: kosh (via the vacation program)
    Subject: away from my mail

    I will not be reading my mail for a while.
    Your mail regarding "$SUBJECT" will be read when I return.
Vacation alters the mail filtering file .forward so that messages are stored into the in-box as usual, but also sends an auto-reply message back to the sender patterned after the message file above. The vacation program stores information about the auto-reply message into a database file, stored in the home directory as the file .vacation.db, that remembers the last time it auto-replied to the sender. Remembering the last reply time prevents the vacation program from sending back multiple messages to the sender. The vacation program will only auto-reply to a sender once a week.
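The once-a-week behavior can be pictured with the following Python sketch. It only models the idea described above and is not the vacation program's code: a plain dictionary stands in for the .vacation.db file, and the function name is made up.

```python
from datetime import datetime, timedelta

REPLY_INTERVAL = timedelta(days=7)  # at most one auto-reply per sender per week


def should_auto_reply(last_replies: dict, sender: str, now: datetime) -> bool:
    """Decide whether to auto-reply, recording the time when a reply is sent."""
    last = last_replies.get(sender)
    if last is not None and now - last < REPLY_INTERVAL:
        return False  # this sender was answered less than a week ago
    last_replies[sender] = now
    return True
```

A sender who writes three times in one week gets a single auto-reply, while a different sender writing during the same week still gets one of their own.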
Part of the problem between running SpamAssassin and the vacation program simultaneously is that they both use the same mail forwarding file, .forward. It is possible to get SpamAssassin and vacation to work together, but it requires hand editing of SpamAssassin files and vacation files.
Enabling the vacation program requires three items:
- The vacation message file,
- The vacation database file,
- Enabling vacation in SpamAssassin's message filter.
To create a vacation message file, execute the program vacation. If vacation hasn't been run before, it will first bring up the editor with a template message. Accept or edit the template message, write out the template message file and exit the editor. Next, a series of questions will ask about showing or editing the message. Reply yes or no as needed. Finally, a message reports that a .forward file already exists. When the vacation program asks to remove the file, reply no.
    Would you like to see it? no
    Would you like to edit it? no
    You have a .forward file in your home directory containing:
    #forward.template-v1
    #
    # Filter e-mail through "procmail".
    #
    # If procmail is not found or fails for some reason, exit with status
    # 75. Status 75 tells sendmail that a temporary failure occurred and to
    # try again later. This way, mail won't be lost so long as a correction
    # is made before sendmail gives up trying a few days later.
    #
    "|/usr/local/bin/procmail -f- || exit 75"
    #
    # If at some point in the future, you want to bounce messages back,
    # uncomment the lines in the ".procmailrc" file and switch the line
    # above to:
    #
    # "|/usr/local/bin/procmail -f-"
    #
    # See the document
    #
    # https://engineering.purdue.edu/ECN/documents/UNIX/sa/
    #
    # for more information about the subject of bouncing messages.
    #
    Would you like to remove it and disable the vacation feature? no
Next, initialize a vacation database file. Do this by executing vacation -I.
Finally, when ready to enable the vacation program for all newly received messages, a change is needed to the message filter file. At the bottom of the procmail command script .procmailrc is a section containing the vacation command, which is initially commented out. Remove the comment character, the pound sign (#), from the three lines near the bottom of that section, between the arrows but not including the lines with the arrows. The section starts out looking like this:
    # Enabling the vacation program
    #
    # By adding the rule below, the vacation program will be enabled.
    # Be sure to initialize a vacation message first (the file .vacation.msg),
    # and initialize the database (using the command "vacation -I").
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    #:0c
    #*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    #| /usr/bin/vacation $LOGNAME
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Change the three lines between the arrows to this:
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    :0c
    *! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    | /usr/bin/vacation $LOGNAME
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the change is made, new messages will be processed through SpamAssassin as before, but non-SPAM messages will also be passed to the vacation program, which sends auto-replies back to the sender.
Removing the auto-reply feature of the vacation program is nearly the same as the third step of adding auto-reply. Replace the comment characters for the vacation program section of the .procmailrc file back into place. The script starts out looking like this:
    # Enabling the vacation program
    #
    # By adding the rule below, the vacation program will be enabled.
    # Be sure to initialize a vacation message first (the file .vacation.msg),
    # and initialize the database (using the command "vacation -I").
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    :0c
    *! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    | /usr/bin/vacation $LOGNAME
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Change the three lines between the arrows to this:
    #
    ###vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    #:0c
    #*! ^Subject: \*\*\*\*\*SPAM\*\*\*\*\*
    #| /usr/bin/vacation $LOGNAME
    ###^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There's no need to work with the vacation message file .vacation.msg or the database file .vacation.db. Leaving these files in place allows for the vacation command to work with the same data next time.
Last modified: 2014/01/17 10:31:32.455152 US/Eastern by
Created: 2007/11/06 13:47:28.899000 US/Eastern by brian.r.brinegar.1.