SpamAssassin

From Halon, SMTP software for hosting providers

The primary anti-spam engine of the Halon SMTP software is a commercial technology from CYREN, which detects spam by measuring message volume globally. SpamAssassin is included as well, primarily to provide a "second opinion" when the CYREN engine isn't sure.

Configuration

The system ships with SpamAssassin's default configuration, with these exceptions:

score RDNS_NONE 0    # We don't do reverse lookups for performance reasons
lock_method flock    # Faster locking method
use_bayes 0          # Faster Bayes database (lines below)
bayes_auto_learn 1
bayes_file_mode 0777
bayes_path /storage/spamassassin/bayesian/bayes
bayes_store_module Mail::SpamAssassin::BayesStore::SDB 

For information on what SA rules are included by default, please see SpamAssassin's official default rules page. It's possible to add your own custom SA rules on the /mail/spamassassin/ (Configuration > System settings > SpamAssassin) page.

Usage

SpamAssassin can be used with the HSL function ScanSA in the DATA flow. It is also available in the graphical "Anti-spam" block.

if (ScanSA() >= 5)
    Reject("SpamAssassin thinks your message is SPAM");

DNSBL usage

SpamAssassin has a strict policy for which DNSBLs to include by default. Some of them might not be free for you to use. However, your system's accuracy and performance shouldn't degrade significantly if some of those DNSBLs start blocking you.

Disabling DNSBL and other online checks

If you have a very high-traffic system that is likely to be blocked by all of SpamAssassin's included DNSBLs, you can disable all of them by adding

skip_rbl_checks  1
skip_uribl_checks 1

to your configuration, or disable some of them by adding lines such as

score __RCVD_IN_ZEN 0 # SpamHaus
score URIBL_SC_SURBL 0 # SURBL
score URIBL_WS_SURBL 0
score URIBL_PH_SURBL 0
score URIBL_MW_SURBL 0
score URIBL_AB_SURBL 0
score URIBL_JP_SURBL 0
score RCVD_IN_DOB 0 # Day Old Bread
score URIBL_RHS_DOB  0
score DNS_FROM_DOB 0

or you can block them based on domain name using

dns_query_restriction deny spamhaus.org
dns_query_restriction deny sorbs.net

Performance

Before reading this chapter, please familiarise yourself with the general performance guidelines.

SpamAssassin has several "deep scanning" and network lookup filters that may take some time, depending on the size, content, complexity and layout of a message. High-volume systems need to cope with this, and for that reason we have developed a queuing and bypass system.

Configuration           Default          Description
antispam_sa_sizelimit   512 KiB          Maximum message size; NOT_SCANNED_TOO_BIG=0 is returned if exceeded
antispam_sa_waitlimit   30 s             Maximum estimated queue wait time; NOT_SCANNED_QUEUE_TOO_LONG=0 is returned if exceeded
antispam_sa_processes   Depends on RAM   Maximum number of SpamAssassin processes

Queueing

In order to provide a predictable throughput, we have implemented a queue in front of SpamAssassin, and messages are bypassed if the estimated wait time exceeds antispam_sa_waitlimit. However, in some cases SpamAssassin may still take longer, for one of two possible reasons:

  • The message is being processed by SpamAssassin, which for some reason takes more time than expected, for example because rules are being reloaded or because of some other unforeseen event.
  • The message has been put into the wait queue (and is somewhat committed to waiting), but the wait takes much longer than estimated. In that case, it waits up to three times antispam_sa_waitlimit before stepping out of the queue.
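
The limits and the queue behaviour described above can be modelled in a few lines. The following is a minimal Python sketch, not Halon's implementation: the wait-time estimator, the value chosen for antispam_sa_processes and all function names are assumptions made for illustration. Only the 512 KiB and 30 s defaults, the two NOT_SCANNED results and the factor of three come from the text.

```python
from dataclasses import dataclass


@dataclass
class SaLimits:
    sizelimit_bytes: int = 512 * 1024  # antispam_sa_sizelimit default (512 KiB)
    waitlimit_s: float = 30.0          # antispam_sa_waitlimit default (30 s)
    processes: int = 4                 # hypothetical; the real default depends on RAM


def estimate_wait(queued: int, avg_scan_s: float, processes: int) -> float:
    """Hypothetical estimator: queued messages are spread evenly over the workers."""
    return queued * avg_scan_s / processes


def admit(size_bytes: int, queued: int, avg_scan_s: float, limits: SaLimits) -> str:
    """Return "SCAN", or the name of the bypass result from the table above."""
    if size_bytes > limits.sizelimit_bytes:
        return "NOT_SCANNED_TOO_BIG"
    if estimate_wait(queued, avg_scan_s, limits.processes) > limits.waitlimit_s:
        return "NOT_SCANNED_QUEUE_TOO_LONG"
    return "SCAN"


def max_time_in_queue(limits: SaLimits) -> float:
    """A message already in the queue steps out after three times the wait limit."""
    return 3 * limits.waitlimit_s
```

With these assumed numbers, a small message arriving behind 200 queued messages that each take about one second would be bypassed, because the estimated wait (50 s) exceeds the 30 s limit.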

Bypass

In some cases it might not be necessary to scan a message using SpamAssassin, for example if CYREN returns "spam" or if the message is a legitimate DSN message.

Bypass large volumes of bounces to a specific user:

if ($sender == "" and rate("dsn-bypass", $recipient, 10, 300) == false) {
    echo "DSN bypass";
    Deliver();
}
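
To make the condition above easier to reason about, here is a small Python model of a per-recipient rate counter. It is a sketch under stated assumptions, not Halon's rate() implementation: it assumes the check returns false once a recipient has already received 10 bounces within 300 seconds, and the class and function names are hypothetical.

```python
from collections import defaultdict, deque


class RateCounter:
    """Toy sliding-window counter keyed by (namespace, entry)."""

    def __init__(self):
        self._hits = defaultdict(deque)

    def allow(self, namespace, entry, count, interval, now):
        """Record a hit and return True while fewer than `count` hits
        fall within the last `interval` seconds; otherwise return False."""
        window = self._hits[(namespace, entry)]
        # Drop hits that are older than the interval
        while window and now - window[0] >= interval:
            window.popleft()
        if len(window) >= count:
            return False
        window.append(now)
        return True


def should_bypass_dsn(counter, sender, recipient, now):
    """Mirror the condition above: an empty envelope sender (a bounce)
    and a rate check that returns False."""
    return sender == "" and counter.allow("dsn-bypass", recipient, 10, 300, now) is False
```

In this model the first ten bounces to a recipient are still scanned; only the excess volume within the window is delivered without a SpamAssassin scan.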

Bypass messages from whitelisted servers:

function ScanSA() {
    global $senderip;
    global $senderdomain;
    // Skip the scan (score 0) for senders found on any of the whitelists below
    if (dnswl($senderip) or spamhauswl($senderip) or emailreg($senderdomain, $senderip)) return 0;
    return builtin ScanSA();
}

function emailreg($senderdomain, $senderip) {
    if ($senderip =~ ":") return false; // IPv6 addresses are not looked up
    $reverseip = implode(".", array_reverse(explode(".", $senderip)));
    $emailreg = dns("$senderdomain.$reverseip.resl.emailreg.org");
    if (!count($emailreg)) return false;
    echo "emailregwl: ".$emailreg;
    if ($emailreg[0] =~ ''/127\.0\.\d\.0/'') return true;
    return false;
}

function dnswl($senderip) {
    $dnswl = dnsbl($senderip, "list.dnswl.org");
    if (!count($dnswl)) return false;
    $parts = explode(".", $dnswl[0]);
    if ($parts[3] == "255") return false;
    $cat = [ "2" => "financial", "3" => "serviceproviders", "4" => "orgs", "5" => "ISP", "7" => "travel", "8" => "governments", "9" => "media", "10" => "special", "11" => "edu", "12" => "health", "13" => "industrial", "14" => "retail", "15" => "marketing" ];
    echo "dnswl: ".$dnswl[0]." cat ".$cat[$parts[2]]." score ".$parts[3];
    if ($parts[3] == "3")
        return true; // Only return true for "high" trust IP
    return false;
}

function spamhauswl($senderip) {
    $dnswl = dnsbl($senderip, "swl.spamhaus.org");
    if (!count($dnswl)) return false;
    $cat = [ "2" => "ind", "3" => "trans", "102" => "ind-temp", "103" => "trans-temp" ];
    echo "spamhauswl: ".$dnswl[0]." cat ".$cat[explode(".", $dnswl[0])[3]];
    return true;
}
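
The lookups above rely on two conventions: the sender IP is reversed octet by octet to build the query name, and the answer is an address whose last two octets encode the result. The following Python sketch illustrates only that string handling, without performing any DNS queries. The function names are hypothetical, and the category table is copied from the dnswl function above.

```python
import re

# Category codes as listed in the dnswl() function above
DNSWL_CATEGORIES = {
    2: "financial", 3: "serviceproviders", 4: "orgs", 5: "ISP", 7: "travel",
    8: "governments", 9: "media", 10: "special", 11: "edu", 12: "health",
    13: "industrial", 14: "retail", 15: "marketing",
}


def reverse_ipv4(ip):
    """Reverse the octets of a dotted IPv4 address."""
    return ".".join(reversed(ip.split(".")))


def emailreg_query(domain, ip):
    """Build the emailreg.org query name, or None for IPv6 addresses."""
    if ":" in ip:
        return None
    return f"{domain}.{reverse_ipv4(ip)}.resl.emailreg.org"


def emailreg_listed(answer):
    """Apply the same pattern as the HSL code to a returned address."""
    return re.search(r"127\.0\.\d\.0", answer) is not None


def decode_dnswl(answer):
    """Split a DNSWL answer into (category, trust, high_trust).

    Returns None when the last octet is 255, which the code above
    treats as "not whitelisted".
    """
    parts = answer.split(".")
    if parts[3] == "255":
        return None
    category = DNSWL_CATEGORIES.get(int(parts[2]))
    trust = int(parts[3])
    return category, trust, trust == 3
```

For example, a sender at 192.0.2.10 claiming example.com would be looked up as example.com.10.2.0.192.resl.emailreg.org, and a DNSWL answer of 127.0.5.3 decodes to the "ISP" category with the highest trust level.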