NAME
    Mail::SpamAssassin::Message - decode, render, and hold an RFC-2822
    message

DESCRIPTION
    This module encapsulates an email message and allows access to the
    various MIME message parts and message metadata.

    The message structure, after initiating a parse() cycle, looks like
    this:

      Message object, also top-level node in Message::Node tree
         |
         +---> Message::Node for other parts in MIME structure
         |       |---> [ more Message::Node parts ... ]
         |       [ others ... ]
         |
         +---> Message::Metadata object to hold metadata

PUBLIC METHODS
    new()
        Creates a Mail::SpamAssassin::Message object. Takes a hash
        reference as a parameter. The hash key/value pairs used are as
        follows:

        "message" is either undef (which will use STDIN), a scalar of the
        entire message, an array reference of the message with 1 line per
        array element, or a file glob which holds the entire contents of
        the message.

        Note: The message is expected to generally be in RFC 2822 format,
        optionally including an mbox message separator line (the "From "
        line) as the first line.

        "parse_now" specifies whether to create the MIME tree at
        object-creation time or later as necessary.

        The *parse_now* option is false (0) by default. This spares
        SpamAssassin from generating the tree of
        Mail::SpamAssassin::Message::Node objects and their related data
        if the tree is not going to be used. This is handy, for instance,
        when running "spamassassin -d", which only needs the pristine
        header and body, both of which are always handled when the object
        is created.

        "subparse" specifies how many MIME recursion levels should be
        parsed. Defaults to 20.

    _do_parse()
        Non-public function which initiates a MIME part parse (generates
        a tree) of the current message. Typically called by find_parts()
        as necessary.

    find_parts()
        Used to search the tree for specific MIME parts. See
        *Mail::SpamAssassin::Message::Node* for more details.

    get_pristine_header()
        Returns pristine headers of the message.
        If no specific header name (case-insensitive) is given as a
        parameter, all headers are returned as a scalar, including the
        blank line at the end of the headers.

        If called in an array context, an array is returned with each
        matching header in a separate element. In a scalar context, the
        last matching header is returned.

        For example, if 'Subject' is specified as the header and there
        are 2 Subject headers in a message, the last/bottom one in the
        message is returned in scalar context, or both are returned in
        array context.

        Note: the returned header will include the ending newline and any
        embedded whitespace folding.

    get_mbox_separator()
        Returns the mbox separator found in the message, or undef if
        there wasn't one.

    get_body()
        Returns an array of the pristine message body, one line per array
        element.

    get_pristine()
        Returns a scalar of the entire pristine message.

    get_pristine_body()
        Returns a scalar of the pristine message body.

    extract_message_metadata($main)
    $str = get_metadata($hdr)
    put_metadata($hdr, $text)
    delete_metadata($hdr)
    $str = get_all_metadata()
    finish_metadata()
        Destroys the metadata for this message. Once a message has been
        scanned fully, the metadata is no longer required. Destroying
        this will free up some memory.

    finish()
        Clean up an object so that it can be destroyed.

    receive_date()
        Return a time_t value with the received date of the current
        message, or current time if received time couldn't be determined.

PARSING METHODS, NON-PUBLIC
    These methods take an RFC 2822-esque formatted message and create a
    tree with all of the MIME body parts included. Those parts will be
    decoded as necessary, and text/html parts will be rendered into a
    standard text format, suitable for use in SpamAssassin.

    parse_body()
        parse_body() hands the body part it receives to the appropriate
        part parser: _parse_multipart() for multipart/* parts, or
        _parse_normal() for everything else.
        Multipart sections become the root of sub-trees, while everything
        else becomes a leaf in the tree.

        For multipart messages, the first call to parse_body() doesn't
        create a new sub-tree and just uses the parent node to contain
        children. All other calls to parse_body() will cause a new
        sub-tree root to be created, and children will exist underneath
        that root. (This is just so the tree doesn't have a root node
        which points at the actual root node.)

    _parse_multipart()
        Generate a root node, and for each child part call parse_body()
        to generate the tree.

    _parse_normal()
        Generate a leaf node and add it to the parent.
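
The tree-building rules described for parse_body(), _parse_multipart(), and _parse_normal() can be illustrated with a small standalone sketch. This is not the module's implementation: the hash-based nodes, the pre-split "parts" input, the $is_top flag, and every function name below (new_node, sketch_parse_body, sketch_parse_multipart, sketch_parse_normal, build_tree, outline) are hypothetical stand-ins chosen for illustration. The sketch only shows the dispatch on content type and the rule that the first multipart call reuses the parent node rather than creating a new sub-tree root.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node; NOT the real Message::Node class.
sub new_node {
    my ($type) = @_;
    return { type => $type, children => [] };
}

# Dispatch on content type: multipart/* goes to the multipart handler,
# everything else becomes a leaf.
sub sketch_parse_body {
    my ($parent, $part, $is_top) = @_;
    if ($part->{type} =~ m{^multipart/}i) {
        sketch_parse_multipart($parent, $part, $is_top);
    }
    else {
        sketch_parse_normal($parent, $part);
    }
    return;
}

# A multipart section is the root of a sub-tree. On the first (top-level)
# call the parent node itself holds the children, so the tree does not get
# a root that merely points at the actual root.
sub sketch_parse_multipart {
    my ($parent, $part, $is_top) = @_;
    my $root = $parent;
    if (!$is_top) {
        $root = new_node($part->{type});
        push @{ $parent->{children} }, $root;
    }
    for my $child (@{ $part->{parts} }) {
        sketch_parse_body($root, $child, 0);
    }
    return;
}

# Any non-multipart part becomes a leaf under its parent.
sub sketch_parse_normal {
    my ($parent, $part) = @_;
    push @{ $parent->{children} }, new_node($part->{type});
    return;
}

# Assumption of this sketch: the input is already split into nested parts.
# For simplicity, a non-multipart message becomes one leaf under the top node.
sub build_tree {
    my ($part) = @_;
    my $top = new_node($part->{type});
    sketch_parse_body($top, $part, 1);
    return $top;
}

# Render the tree as indented lines, one node per line.
sub outline {
    my ($node, $depth) = @_;
    $depth //= 0;
    my @lines = (('  ' x $depth) . $node->{type});
    push @lines, outline($_, $depth + 1) for @{ $node->{children} };
    return @lines;
}

my $tree = build_tree({
    type  => 'multipart/mixed',
    parts => [
        { type  => 'multipart/alternative',
          parts => [ { type => 'text/plain' }, { type => 'text/html' } ] },
        { type => 'image/png' },
    ],
});
print "$_\n" for outline($tree);
```

In this sketch the top node takes on the role of the multipart/mixed root directly, the nested multipart/alternative section gets its own sub-tree root, and the text/plain, text/html, and image/png parts end up as leaves.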