Mail::SpamAssassin::Plugin::Bayes - determine spammishness using a Bayesian classifier
This is a Bayesian-style probabilistic classifier, using an algorithm based on the one detailed in Paul Graham's A Plan For Spam paper at:
http://www.paulgraham.com/spam.html
It also incorporates some other aspects taken from Gary Robinson's webpage on the subject at:
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
And the chi-square probability combiner as described here:
http://www.linuxjournal.com/print.php?sid=6467
The results are incorporated into SpamAssassin as the BAYES_* rules.
Languages enabled in Bayes stopwords processing. Every language has a default stopwords regexp; tokens matching this regular expression are not considered in Bayes processing.
Custom regular expressions for additional languages can be defined in local.cf.
Custom regular expressions can be specified by using the bayes_stopword_lang keyword, where lang is the language code, as in the following example:
bayes_stopword_languages en se
bayes_stopword_en (?:you|me)
bayes_stopword_se (?:du|mig)
Regexps are case-insensitive and will be anchored automatically at beginning and end.
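As an illustration of the anchoring, the sketch below assumes the pattern is applied to each whole token; the sample tokens named in the comments are hypothetical:
# matches the tokens "you", "You" and "ME" (case-insensitive)
# does not match "youth" or "some" (anchored at both ends)
bayes_stopword_en (?:you|me)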
To disable stopwords usage, specify bayes_stopword_languages disable.
Only one bayes_stopword_languages or bayes_stopword_xx configuration line can be used. A new configuration line overrides the old one, for example one from the default SpamAssassin ruleset (60_bayes_stopwords.cf).
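A minimal local.cf sketch of this override behavior, assuming fr is not already covered by the default ruleset and that the regexp shown is purely illustrative. Because a later bayes_stopword_languages line replaces the earlier one, the local line lists every language that should stay enabled:
bayes_stopword_languages en fr
bayes_stopword_fr (?:le|la|les)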
Configures the maximum number of characters a token can contain.