Having to deal with spam is a given. Trying to automate management of SpamAssassin ham and spam databases on my shared hosting, I decided to build two scripts, one to automatically learn spam, and the other to learn ham.
Obviously, there are caveats and also warnings… But that is always the case, is it not?
Warnings, Caveats, Requirements
The scripts below come with the following warnings, caveats and requirements:
- sa-learn must be executable by the user account running the scripts, and the Bayes DB file used by that user must be writable as well
- mail must be in the maildir format (not mbox format – understand the difference)
- spam directories will be cleared after “learning”
- spam must be “tossed” into the .spam mailbox folder; this means:
  - POP/POP3 mailboxes that simply download and delete emails will never work
  - IMAP clients must support a server-side spam folder, and users must move all spam there
- ham is only read from the “main” mailbox folders that are not hidden (i.e. not prefixed with a dot “.”)
- any spam left in a “main” mailbox will be learned as ham, although correct re-classification afterwards, script re-runs and DB expiration should eventually fix that
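A couple of the requirements above can be checked up front. The following is only a minimal sketch: the helper names `is_maildir` and `have_sa_learn` are made up for illustration, and the maildir test assumes nothing more than the standard `cur/`, `new/` and `tmp/` subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 looks like a maildir,
# i.e. it contains the cur/, new/ and tmp/ subdirectories.
is_maildir() {
    [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]
}

# Hypothetical helper: succeeds if sa-learn can be found on the PATH.
have_sa_learn() {
    command -v sa-learn >/dev/null 2>&1
}
```

Something like `have_sa_learn || echo "sa-learn not found"` at the top of either script would catch the most basic problem before anything is learned or deleted.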
learnham.sh
This script assumes that every message that is no longer new (i.e. one that sits in a cur directory) is ham, as long as its mailbox folder is not “hidden” (i.e. does not have a dot “.” prefix):
#!/bin/sh
# Learn ham from every non-empty cur directory outside dot-prefixed folders.
for hamdir in `find ~/mail/ -type d -not -path "*/.spam/cur" -not -path "*/.*/cur" -path "*/cur" -not -empty`
do
    echo Learning ham from $hamdir/\*
    sa-learn --showdots --ham $hamdir/\*
done
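Before letting the ham script loose, it may be worth seeing which folders it would actually pick up. Here is a rough dry-run sketch: `list_ham_dirs` is a hypothetical helper name, the mail root is passed in as an argument instead of being hard-coded, and `-empty` is assumed to be available (it is in GNU and BSD find, but it is not part of POSIX).

```shell
#!/bin/sh
# Hypothetical helper: print the cur directories under $1 that would be
# treated as ham, i.e. non-empty and not inside a dot-prefixed folder.
# Nothing is learned or modified.
list_ham_dirs() {
    find "$1" -type d -path "*/cur" ! -path "*/.*/cur" ! -empty
}
```

Running `list_ham_dirs ~/mail/` should print one directory per line; if a spam or other dot-prefixed folder shows up in that list, something is off with the folder layout.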
learnspam.sh
This script assumes that all messages in the */.spam/cur mailbox folders are spam:
#!/bin/sh
# Learn spam from every non-empty .spam/cur directory, then clear it.
for spamdir in `find ~/mail/ -type d -path "*/.spam/cur" -not -empty`
do
    echo Learning spam from $spamdir/\*
    sa-learn --showdots --spam $spamdir/\*
    rm $spamdir/* >/dev/null 2>&1
done
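Since the spam folders are cleared after learning, a more cautious variant would delete messages only when learning succeeded. This is a sketch under assumptions, not a drop-in replacement: `learn_and_clear` and the `LEARN` override are hypothetical names, and it assumes that sa-learn exits with a non-zero status when it fails.

```shell
#!/bin/sh
# Hypothetical helper: learn spam from the directory $1 and clear it
# only if the learn command succeeded.
# LEARN is an illustrative override so the function can be tried
# without sa-learn; it defaults to "sa-learn --spam".
learn_and_clear() {
    dir=$1
    if ${LEARN:-sa-learn --spam} "$dir"; then
        # Delete the learned messages; find avoids overly long argument lists.
        find "$dir" -type f -exec rm -f {} +
    else
        echo "Learning failed for $dir; messages kept" >&2
        return 1
    fi
}
```

In the loop above, the `sa-learn` and `rm` lines could then be swapped for a single `learn_and_clear $spamdir` call, so that a failed run leaves the spam in place for the next attempt.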