Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time
  my $iter = new Mail::SpamAssassin::ArchiveIterator(
    {
      'opt_j'     => 0,
      'opt_n'     => 1,
      'opt_all'   => 1,
      'opt_cache' => 1,
    }
  );

  $iter->set_functions( \&wanted, sub { } );

  eval { $iter->run(@ARGV); };

  sub wanted {
    my($class, $filename, $recv_date, $msg_array) = @_;

    ...
  }
The Mail::SpamAssassin::ArchiveIterator module will go through a set of mbox files, mbx files, and directories (with a single message per file) and generate a list of messages. It will then call the wanted and results functions appropriately per message.
The constructor returns a new Mail::SpamAssassin::ArchiveIterator object. You may pass the following attribute-value pairs to the constructor. The pairs are optional unless otherwise noted.
If the value of opt_j is 0, the list of messages to process will be kept in memory, and only 1 message at a time will be processed by the wanted subroutine. Restarting is not allowed.
If the value is 1, the list of messages to process will be kept in a temporary file, and only 1 message at a time will be processed by the wanted subroutine. Restarting is not allowed.
If the value is 2 or higher, the list of messages to process will be kept in a temporary file, and the process will split into a parent/child mode. As many children as the option value will be forked off, and each child will process messages via the wanted subroutine in parallel. Restarting is allowed.
NOTE: For opt_j >= 1, an extra child process will be created to determine the list of messages, sort the list, and so on, as appropriate. This will keep the list in memory (possibly in multiple copies) before the final list is written to a temporary file, which is then used for processing. The list-generation child will then exit, freeing up the memory.
If both opt_head and opt_tail are specified, then the opt_head value specifies a subset of the opt_tail selection to use; in other words, the opt_tail splice is applied first.
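The ordering can be illustrated with a small standalone sketch. It assumes that opt_tail keeps the last N entries and opt_head keeps the first N entries of a list; select_messages is a hypothetical helper written for this illustration, not part of the module, and the module's own code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): apply the tail
# selection first, then take the head of what remains.
sub select_messages {
    my ($messages, $head, $tail) = @_;
    my @list = @$messages;

    # Assumption: opt_tail keeps the last $tail entries.
    if ($tail && $tail < @list) {
        @list = @list[-$tail .. -1];
    }

    # Assumption: opt_head keeps the first $head entries of that result.
    if ($head && $head < @list) {
        @list = @list[0 .. $head - 1];
    }

    return @list;
}

# Ten messages, head of 2, tail of 5: the tail selects 6..10,
# and the head then selects 6 and 7 from that.
my @picked = select_messages([1 .. 10], 2, 5);
print "@picked\n";   # prints: 6 7
```

With a head of 2 and a tail of 5, the result is the first two messages of the last five, not the first two messages of the whole list.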
If opt_want_date is set, the received date is filled in for the wanted_sub callback below. Set this to 0 to avoid this; it's a good idea to set this to 0 if you can, as it imposes a performance hit.
A separate directory can be given for the files used by opt_cache, if you don't want to mix them with the input files (as is the default). The directory must be both readable and writable.
Note that if opt_want_date is set to 0, the received date scalar will be undefined.
Files with names ending in .gz or .bz2 will be properly uncompressed via a call to gzip -dc and bzip2 -dc respectively.
The target_paths array is expected to be one element per path in the following format: class:format:raw_location
run() returns 0 if there was an error (can't open a file, etc.) and 1 if there were no errors.
The format is one of the following: dir is a directory whose files are individual messages, file is a file with a single message, mbox is an mbox formatted file, and mbx is an mbx formatted directory.
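As a rough sketch of how a caller might take apart one target_paths element in the class:format:raw_location layout described above: parse_target is a hypothetical helper for illustration only, and the class value "ham" and the path are made-up sample values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): split one target into
# its class, format, and raw location. The split limit of 3 keeps any
# colons inside the location itself intact.
sub parse_target {
    my ($target) = @_;
    my ($class, $format, $location) = split /:/, $target, 3;
    die "malformed target '$target'\n" unless defined $location;
    return ($class, $format, $location);
}

# "ham" and the path are sample values chosen for this example.
my ($class, $format, $location) = parse_target('ham:dir:/var/mail/corpus');
print "$class $format $location\n";   # prints: ham dir /var/mail/corpus
```

Limiting the split to three fields is a design choice in this sketch so that a location containing a colon is not truncated.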
detect can also be used. This assumes mbox for any file whose path contains the pattern /\.mbox/i, file for STDIN and anything that is not a directory, or dir otherwise.
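The detect rules above can be sketched roughly as follows. detect_format is a hypothetical helper, not the module's own code, and the use of "-" to stand for STDIN is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): guess a format name
# from a location, following the rules described for "detect".
sub detect_format {
    my ($location) = @_;
    my $is_dir = -d $location;

    # A non-directory whose path matches /\.mbox/i is treated as mbox.
    return 'mbox' if !$is_dir && $location =~ /\.mbox/i;

    # Assumption: "-" stands for STDIN. STDIN and any other
    # non-directory are treated as a single-message file.
    return 'file' if $location eq '-' || !$is_dir;

    # Everything else is a directory of messages.
    return 'dir';
}

# Sample locations; the results depend on what exists on disk.
for my $location ('corpus/ham.mbox', '-', '.') {
    print "$location => ", detect_format($location), "\n";
}
```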
File globbing is allowed in raw_location (see perldoc -f glob). A ~ at the front of the value will be replaced by the value of the HOME environment variable. Escaped whitespace is protected as well.
NOTE: ~user is not allowed.
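A minimal sketch of the tilde handling described above, for illustration only: expand_home is a hypothetical helper, the optional second argument exists only to make the example reproducible, and rejecting ~user with an error is an assumption about how "not allowed" might be enforced.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): replace a leading ~
# with the home directory, which defaults to the HOME environment
# variable.
sub expand_home {
    my ($location, $home) = @_;
    $home = $ENV{HOME} unless defined $home;

    # Assumption: a ~user prefix is rejected outright.
    die "~user is not allowed: $location\n" if $location =~ m{^~[^/]};

    # Only a ~ at the very front of the value is replaced.
    $location =~ s{^~}{$home} if defined $home;
    return $location;
}

# The home directory here is a sample value passed in explicitly.
print expand_home('~/corpus/ham', '/home/alice'), "\n";
# prints: /home/alice/corpus/ham
```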
Mail::SpamAssassin
spamassassin
mass-check