Future Directions for SpamAssassin: Bayesian Probability


Naive Bayesian classification -- as seen on Slashdot ;)

Given a corpus of mail messages, each known to be spam or non-spam: break each message down into "words", and track how frequently each word appears in spam vs. non-spam.
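A minimal Python sketch of this training step follows. It is not SpamAssassin's code, which is written in Perl. `tokenize` and `train` are hypothetical names. The regex tokenizer is a deliberately crude assumption. Each word is counted once per message (the number of spam and non-spam messages containing it), not once per occurrence.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a message into a set of lowercase "words".

    Hypothetical, deliberately crude tokenizer: runs of letters, digits,
    and a few punctuation characters common in spam ($ ' ! -).
    """
    return set(re.findall(r"[a-z0-9$'!-]+", text.lower()))

def train(corpus):
    """Count, for each word, how many spam and non-spam messages contain it.

    corpus: iterable of (message_text, is_spam) pairs.
    Returns (spam_counts, ham_counts, n_spam, n_ham).
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in corpus:
        words = tokenize(text)  # a set, so each word counts once per message
        if is_spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham
```

For example, training on `[("Buy cheap pills now", True), ("Cheap pills, buy today", True), ("Meeting notes for today", False)]` records "cheap" in two spam messages and no non-spam ones. "today" appears once in each class.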

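From these counts you can estimate the probability that a new mail is spam, based on the words used within it. The "naive" part is the assumption that each word's presence is independent of the others, so the evidence from each word can simply be multiplied together.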
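Below is a hedged sketch of the classification step. It assumes per-class counts of how many training messages contain each word. `spam_probability` is a hypothetical name. The sketch also makes these assumptions:

- Add-one (Laplace) smoothing.
- Words never seen in training are ignored.
- Only words present in the message contribute. Absent words are skipped.

Independence lets the per-word likelihood ratios multiply, so it works in log-odds, where they add.

```python
import math
import re

def tokenize(text):
    # Same crude, hypothetical tokenizer as the training sketch.
    return set(re.findall(r"[a-z0-9$'!-]+", text.lower()))

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | words in text) with a naive Bayes combination.

    spam_counts / ham_counts map word -> number of spam / non-spam training
    messages containing it; n_spam and n_ham (both assumed > 0) are totals.
    """
    # Start from the prior odds of spam vs. non-spam in the training corpus.
    log_odds = math.log(n_spam / n_ham)
    for word in tokenize(text):
        s, h = spam_counts.get(word, 0), ham_counts.get(word, 0)
        if s == 0 and h == 0:
            continue  # never seen in training: no evidence either way
        # Laplace-smoothed probability that a spam / non-spam message contains `word`.
        p_word_spam = (s + 1) / (n_spam + 2)
        p_word_ham = (h + 1) / (n_ham + 2)
        # "Naive" step: words treated as independent, so log-likelihood ratios add.
        log_odds += math.log(p_word_spam / p_word_ham)
    # Numerically stable logistic function: log-odds -> probability in (0, 1).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

With the three-message toy corpus (two spam, one non-spam), "cheap pills" scores about 0.91. "meeting notes" scores about 0.22. A message of entirely unseen words falls back to the prior, 2/3. Real filters typically also cap how many words contribute and handle rare words more carefully. The sketch does neither.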