buildsys: drop upstream tarball and add extracted sources

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
This commit is contained in:
Stoiko Ivanov 2021-03-24 17:44:22 +01:00
parent 054f24dbbb
commit 37ef577538
586 changed files with 144918 additions and 0 deletions

Binary file not shown.

403
upstream/CREDITS Normal file
View File

@ -0,0 +1,403 @@
Copyright (C) 2019 The Apache Software Foundation
Project Management Committee (PMC):
This list contains PMC members in alphabetical order (and their Amazon
wishlists). The PMC can be reached at <private /at/ spamassassin.apache.org>
(for private correspondence) or <dev /at/ spamassassin.apache.org> (for
public contact); we request that the dev list should be used for all
non-confidential correspondence.
- Giovanni Bechis
- Karsten Bräckelmann
- Alex Broens http://www.msf.org/en/donate
- Bill Cole
- John Hardin
- Dave Jones
- Adam Katz
- Henrik Krohns
- Sidney Markowitz
- Mark Martinec
- Kevin A. McGrail
- Michael Parker http://www.amazon.com/o/registry/10BBAR2M03T6F
- Joe Quinn
Committers:
This list contains committers in alphabetical order
- Paul Stead
- Merijn van den Kroonenberg
- Bryan Vest
PMC Emeritus & inactive committers:
This list contains all PMC Emeritus & inactive committers in alphabetical order.
We wish them well and hope to see them return someday.
- Matthew Cline
- Duncan Findlay - PMC Emeritus
- Tony Finch
- Steve Freegard
- Craig Hughes
- Matt Kettler - PMC Emeritus
- Justin Mason - Creator & PMC Emeritus
- Robert Menschel
- Daryl C. W. O'Shea - PMC Emeritus
- John Gardiner Myers
- Daniel Quinlan - PMC Emeritus
- Dale 'Doc' Schneider - PMC Emeritus
- Matt Sergeant
- Henry Stern
- Malte S. Stretz - PMC Emeritus
- Warren Togami
- Theo Van Dinter - PMC Emeritus
- Matt Yackley
- Daniel Lemke
Spamc credits:
- Author: Craig Hughes
- Conversion to a thread-safe shared library by Liam Widdowson
- Portions by Brad Jorsch
- Windows adaption by Sidney Markowitz
- autoconf wrapper (for Unix)/alternative (for Windows): Malte S. Stretz,
based on work done by Sidney Markowitz
- spamc/qmail-spamc.c by John Peacock
Spamd:
- Author: Craig Hughes
- Parts by Malte S. Stretz
Algorithms:
The Bayesian-style probabilistic classifier used by SpamAssassin's BAYES
rules is based on an approach outlined by Gary Robinson. Thanks, Gary!
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
Major contributions:
- Michael Bell, <mikebell90(at)yahoo.com>: Bayes documentation.
- Kelsey Cummings, <kgc(at)sonic.net>: client-ip-address security in spamd;
sql-configs-with-setuid switch added to spamd.
- Dallas L. Engelken <dallase(at)nmgi.com>: ImageInfo plugin.
- Justin England, <jengland(at)enetis.net>: SQL support.
- Steve Friedl, <steve(at)unixwiz.net>: UNIX domain socket support in
spamd/spamc.
- Ryan Finnie, <ryan(at)finnie.org>: message encapsulation as
message/rfc822 attachment.
- Matt Kettler, <mkettler_sa(at)comcast.net>: most of the anti-drug rules in
rules/20_drugs.cf; bug fix for list of reserved IP addresses; others.
- Brad "anomie" Jorsch, <anomie(at)users.sourceforge.net>: fix
to avoid losing mail from spamc; BSMTP and -e support; tracking of
number of spamd processes; several other mods.
- Kristian Köhntopp, <kris(at)koehntopp.de>: LDAP support.
- Matthias Leisi, <matthias(at)leisi.net>: Mail::SpamAssassin::Plugin::ASN
plugin.
- Daniel Lemke, <lemke(at)jam-software.com>: many Windows support fixes
- John Madden, <maddenj+spamassassin at skynet.ie>: spamc -F configuration
file support.
- Sidney Markowitz, <sidney(at)sidney.com>: fix to DNS tests;
message-size sanity-checking in spamc; language identification;
Win32 build support.
- Marc Merlin, <marc_soft(at)merlins.org>: RBL ordering/timeouts;
time-logging for debug speed testing.
- Bob Menschel: 'longwords' rules, some documentation.
- Eugene/Yevgeniy Miretsky, <eugene(at)invision.net>: bug fix for
spamc -c; Spamc timeout support; support for REPORT, REPORT_IFSPAM
and SYMBOLS methods in spamc.
- Gertjan van Noord, <vannoord(at)let.rug.nl>: TextCat language classifier
- Michael Parker, <parkerm(at)pobox.com>: Bayes-in-SQL and AWL-in-SQL;
writing Bayes regression tests.
- John Peacock, <jpeacock(at)rowman.com>: qmail-spamc in contrib;
patch to README file
- Marc Perkel, <marc(at)perkel.com>: 30-or-so rules; about 20 of which are
still in codebase.
- Nico Prenzel, <nico.prenzel(at)pn-systeme.de>: remote learning and
spam-reporting support in spamc/spamd.
- Dustin Sallings, <dustin+spamassassin(at)spy.net>: support for
virtual users in spamd.
- Ed Allen Smith, <easmith(at)beatrice.rutgers.edu>: GA improvements; 6
rules.
- Henry Stern: perceptron score optimizer (replacing the GA).
- Ivo Truxa: TxRep reputation database plugin.
- Liam Widdowson, <liam(at)inodes.org>: shared-library use of spamc.
- Radoslaw Zielinski, <radek(at)pld-linux.org>:
Mail::SpamAssassin::Spamd::Apache2, a mod_perl2 module implementing spamd,
contributed as a Google Summer of Code project.
Translators:
- Michel Bouissou, <michel(at)bouissou.net>: French translation.
- Olivier Nicole, <on(at)cs.ait.ac.th>: prior version of French translation.
- Jesse Houwing, <j.houwing(at)student.utwente.nl>: Dutch translation.
- Alexander Litvinov, <lan(at)ac-sw.com>: Russian translation.
- Peter Mann, <peter.mann(at)tuke.sk>: Slovak translation.
- Klaus Heinz, <klaus.heinz(at)onlinehome.de>: German translation.
Patch submitters:
- Michael Brown, <michaelb(at)opentext.com>: support to build libspamc.so
- Nick "Nix" Alcock, <nix(at)esperi.org.uk>: DCC fix
- Bob Apthorpe, <apthorpe+sa(at)cynistar.net>: tools/sa-stats.pl
- Alan Barrett, <apb(at)cequrux.com>: base64 decoding code
- Rod Begbie, <rod(at)begbie.com> <rOD-spamassassin(at)arsecandle.org>: DCC
bugfixes.
- Robert Bihlmeyer, <robbe(at)orcus.priv.at>: CHARSET_FARAWAY_HEADERS test
- Richard Birkett, <richard(at)birkett.com>: patch to a build script.
- Cassandra Brockett, <cass(at)ophiuchi.net>: brought SQL checks and
documentation into line
- Adrian Bunk, <bunk(at)fs.tum.de>: URI_IS_POUND
- Ken Causey, <ken(at)kencausey.com>: patch to remove reserved IPs;
improve performance of network lookups.
- Jean Charles Delepine, <delepine(at)u-picardie.fr>: report_charset
support.
- Anirvan Chatterjee, <anirvan(at)chatterjee.net>:
<anirvan(at)chatterjee.net>; suggestions for TO_HAS_SPACES rule
and TO_ADDRESS_EQ_REAL;
- Andrey Chernov, <ache(at)nagual.pp.ru>: miscellaneous bug fixes.
- Christopher Davis, <ckd-spamassassin(at)ckdhr.com>: patch to
contrib/spamcheck.py.
- Vince Delvecchio, <vince.delvecchio(at)analog.com>: negative look-behinds
PORN_4
- Chris Eykamp, <chris(at)eykamp.com>: 4 weight loss rules.
- David B Funk <dbfunk(at)engineering.uiowa.edu>: Fix for newline collapse
causing excessive whitespace rules to not match
- Alan Ford, <alan(at)whirlnet.co.uk>: Getting a list of symbols of tests hit
with spamd; patch for spamd response headers.
- Scott Griffith, <skod(at)ises-llc.com>: 12 tests; 7 still in codebase.
- Ask Bjoern Hansen, <ask(at)develooper.com> <ask(at)apache.org>:
REPORT_IFSPAM in spamd.
- Sean Harding, <sharding(at)dogcow.org>: patch for
X-Spam-Checker-Version.
- Klaus Heinz, <klaus.heinz(at)onlinehome.de>: changes to rules;
packaging fixes for UNIX package; German translation.
- Ed Henderson, <ed.henderson(at)certainty.net>: fix for vpopmail support in
spamd.
- David Hull, <hull(at)paracel.com> <hull(at)davidhull.org>:
rewrite_subject and report_header; rules
- Morbus Iff, <morbus(at)disobey.com>: don't create prefs patch.
- Steve Keay, <steve-spamassassin-bugzilla(at)keay.com>: spamd -A network
ranges support.
- Vivek Khera, <khera(at)kcilink.com>: contributed to Razor2 patch.
- Alexander Kourakos, <awk(at)bnt.com>: bug fixes.
- Juergen Kreileder, <kreilede(at)issan.informatik.uni-dortmund.de>:
misc fixes; Bayes ignore Gnus annotation
- Henrik Krohns, <hege@hege.li>, 'uridnssub' keyword for URIDNSBL plugin.
- David M. Koppelman, <koppel(at)ece.lsu.edu>: bayes_score report
fix; bayes expiry time bugfix.
- Gregor Lawatscheck, <gpel(at)mpex.net>: rule suggestions
- Michael Lemke, <lemkemch(at)t-online.de>: Win32Locker fix
- John Levine, <johnl(at)iecc.com>: added --syslog=stderr support.
- Tom Lipkis, <tal(at)pss.com>: SunOS 4.1.4 support.
- Dave Lugo, <dlugo(at)etherboy.com>: documentation regarding use of
DCC in INSTALL file.
- Mark Martinec, <Mark.Martinec(at)ijs.si>: contributed to Razor2 patch
- Morgan Massena, <mmassena(at)ipowerplant.com>: patch to SQL support to
allow installer to specify more table details.
- Kevin McGrail, <kmcgrail(at)pccc.com>: portability fix for
Red Hat 5 support.
- Rob McMillin, <rlm(at)pricegrabber.com> <rfm(at)scareduck.com>:
rule fixes.
- Michael Moncur, <mgm(at)starlingtech.com>: many rules; SQL fix.
- John Morrissey, <jwm(at)horde.net>: fixed "check_rbl() doesn't check $#ips
properly".
- Dirk Mueller, <mueller(at)kde.org>: spamc low-memory bugfix.
- Nate Mueller, <nate(at)cs.wisc.edu>: SSL support in spamd/spamc;
- Rob Nagler, <nagler(at)bivio.com>: speed-up bug fix.
- Nathan Neulinger, <nneul(at)umr.edu>: security patch; code
cleanup; dccifd support.
- John Newman, <jnewman(at)scms.waikato.ac.nz>: UW .mbx file-format support.
- Mike Nolan, <nolan(at)naic.edu>: SunOS build directions
- Martin Östlund, <mo(at)microsaft.nu>: Slackware 9.0 rc-script for spamd.
- Tomasz Ostrowski, <tometzky(at)batory.org.pl>: perl 5.005 support.
- Henning P. Schmiedehausen, <hps(at)intermeta.de> <henning(at)apache.org>:
adding ? to shell globs.
- Francesco Potortì, <pot(at)gnu.org>: documentation improvements
- Alan Premselaar, <alien(at)12inch.com>: rule suggestions.
- Martin Radford, <martin-sabz(at)zamenhof.demon.co.uk>: rules and
rule descriptions.
- Daniel Rall, <dlr(at)finemaltcoding.com>: documentation fix.
- Brad Rathbun, <brad(at)computechnv.com>: tools/sa-stats.pl.
- Xavier Renaut, <node3667(at)users.sourceforge.net>: contrib/spamproxyd
bugfix.
- Bobby Rose, <brose(at)med.wayne.edu>: Pyzor support; dcc_path.
- Klaus Johannes Rusch, <KlausRusch(at)atmedia.net>: fix for
find_all_addrs_in_line().
- Bart Schaefer, <schaefer(at)zanshin.com>: bug fix for DCC; Razor
support; rules.
- Dianne Skoll, <dfs(at)roaringpenguin.com>: one line change to
__OUTLOOK_MUA.
- Al Smith, <al.smith(at)aeschi.ch.eu.org>: fix to SSL spamd bug.
- Sander Steffann, <sander(at)steffann.nl>: patch to contrib/spamcheck.py.
- Michael Stenner, <mstenner(at)phy.duke.edu>: ident authentication support
in spamd.
- Brett A. Thomas, <bthomas(at)vindicia.com>: improved sa-learn
command-line API.
- Reini Urban, <rurban(at)x-ray.at>: fix to Makefile for cygwin
- Tobias von Koch, <tvk(at)weltcharts.de>: DCC support fixes; rules.
- Vince Vielhaber, <vev(at)michvhf.com>: spamc -c bugfix patch.
- Charlie Watts, <cewatts(at)frontier.net>: patch to deal with perl
bug on BSD platforms; DNS timeouts.
- Andrew Wilson, <andrew(at)rivendale.net>: support for MIME::Entity
contributed.
- Jeremy Zawodny, <jeremy(at)zawodny.com>: patch to SQL support.
- The Little Rubber Ninja Homepage <http://www.unsuave.com/ninja/>, owned
by Mike Quinn: source of the original SpamAssassin logo.
If your name is not here, and you've submitted a patch that was included,
it's just an oversight. Please mail me at <jm /at/ jmason.org> and I'll add
you to the list.
ASF Sponsorship:
SpamAssassin is an Apache Software Foundation project. The Apache Software
Foundation could not exist without the continued generous support from the
community.
Please take a moment to view the complete list of sponsors by visiting:
http://www.apache.org/foundation/thanks.html
If you are interested in sponsoring the ASF, please read the sponsorship page
by visiting:
http://www.apache.org/foundation/sponsorship.html for more information.
Resources:
Thanks to our previous mirrors: Peregrine Computer Consultants Corporation
(previously Peregrine Hardware, Inc.) and Kevin A. McGrail, Jeremy Zawodny,
Mark Reynolds, RedIRIS, Hagen Herrschaft, and PlanetMirror.
Thanks to Mark Reynolds of Reynolds Technology (http://www.reynolds.net.au/)
for the registration of spamassassin.org.
Thanks to Kelsey Cummings and Sonic.net (http://www.sonic.net/) for
significant contributions with network services, and on the back-end; our
score-optimization systems would be significantly weedier without their help.
http://www.pccc.com/
http://www.zawodny.com/
http://www.reynolds.net.au/
http://www.rediris.es/
http://www.hrxnet.de/
http://www.planetmirror.com/
http://sourceforge.net/
http://www.sonic.net/
Finally:
Thanks to James Thompson at cPanel Inc who designed our new logo in 2014 and
to Christian Rauh, winner of the SpamAssassin logo contest, who created,
designed, and illustrated our first Apache SpamAssassin logo.

3514
upstream/Changes Normal file

File diff suppressed because it is too large Load Diff

481
upstream/INSTALL Normal file
View File

@ -0,0 +1,481 @@
Upgrading SpamAssassin?
-----------------------
Please be sure to read the UPGRADE file for important changes that
have been made since previous versions. In particular, 3.3.0 no
longer includes a default ruleset.
Installing or Upgrading SpamAssassin
------------------------------------
Using CPAN via CPAN.pm:
perl -MCPAN -e shell [as root]
o conf prerequisites_policy ask
install Mail::SpamAssassin
quit
Using Linux:
Debian unstable: apt-get install spamassassin
Gentoo: emerge mail-filter/spamassassin
Fedora: yum install spamassassin
Alternatively download the tarfile, zipfile, and/or build your own RPM
from https://spamassassin.apache.org/. Building from tar/zip file is
usually as simple as:
[unzip/untar the archive]
cd Mail-SpamAssassin-*
perl Makefile.PL
[option: add -DSPAMC_SSL to $CFLAGS to build an SSL-enabled spamc]
make
make install [as root]
After installing SpamAssassin, you need to download and install the
SpamAssassin ruleset using "sa-update". See the "Installing Rules"
section below.
Please make sure to read this whole document before installing, especially
the prerequisite information further down.
To install as non-root, see the directions below.
If you are running AFS, you may also need to specify INSTALLSITELIB and
SITELIBEXP.
Note that you can upgrade SpamAssassin using these instructions, as long
as you take care to read the caveats in the file UPGRADE. Upgrading
will not delete your learnt Bayes data or local rule modifications.
If you're using SunOS 4.1.x, see
http://wiki.spamassassin.org/w/BuildingOnSunOS4 for build tips.
Installing SpamAssassin for Personal Use (Not System-Wide)
----------------------------------------------------------
These steps assume the following, so substitute as necessary:
- Your UNIX login is "user"
- Your home directory is /home/user
- The location of the procmail executable is /usr/bin/procmail
Many more details of this process are at
http://wiki.apache.org/spamassassin/SingleUserUnixInstall
1. Uncompress and extract the SpamAssassin archive, using "unzip" or
"tar xvfz", in a temporary directory.
2. change directory into it:
cd Mail-SpamAssassin-*
3. Make SpamAssassin as normal, but using your home directory as the
target:
perl Makefile.PL PREFIX=$HOME
make
make install
Please see the file PACKAGING, sections "Changing paths in the Makefile"
and "Setting further options on the command line" for more information
on available command line variables.
4. Install the SpamAssassin ruleset using "sa-update":
$HOME/bin/sa-update
See the "Installing Rules" section below if you do not wish to download
the rules directly from the internet.
NOTE: Because LWP does not support IPv6, sa-update as of 3.4.0 will use
the binaries curl, wget or fetch to download rule updates with LWP used
as a fallback if none of the binaries exist.
5. If you already use procmail, skip to step 7. If not, ensure procmail
is installed using "which procmail" or install it from www.procmail.org.
6. Create a .forward file in your home directory containing the below
lines:
"|IFS=' ' && exec /usr/bin/procmail -f- || exit 75 #user"
7. Edit or create a .procmailrc file in your home directory containing the
below lines. If you already have a .procmailrc file, add the lines to
the top of your .procmailrc file:
:0fw: spamassassin.lock
| /home/user/bin/spamassassin
The above line filters all incoming mail through SpamAssassin and tags
probable spam with a unique header. If you would prefer to have spam
blocked and saved to a file called "caughtspam" in your home directory,
instead of passed through and tagged, append this directly below the above
lines:
:0:
* ^X-Spam-Status: Yes
caughtspam
Also, see the file procmailrc.example and
http://wiki.apache.org/spamassassin/UsedViaProcmail
8. Now, you should be ready to send some test emails and ensure everything
works as expected. First, send yourself a test email that doesn't
contain anything suspicious. You should receive it normally, but there
will be a header containing "X-Spam-Status: No". If you are only
tagging your spam, send yourself a copy of the GTUBE test string to
check to be sure it is marked as spam. GTUBE is located in the
sample-spam.txt message distributed with SpamAssassin and also at:
https://spamassassin.apache.org/gtube/
If your test emails don't get through to you, immediately rename your
.forward file until you figure out cause of the the problem, so you
don't lose incoming email.
Note: one possible cause for this is the use of smrsh on the MTA system;
see http://wiki.spamassassin.org/w/ProcmailVsSmrsh for details.
9. You can now customize SpamAssassin. See README for more information.
Installing Rules
----------------
Rules are normally installed by running a sa-update command.
The version of sa-update program should match the version of SpamAssassin
modules, so invoking sa-update should be performed only after installing
or upgrading SpamAssassin code, not before.
Installing rules from network is done with a single command:
sa-update
This is normally run as root.
If you wish to install rules from downloaded files, rather than "live" from
the latest online ruleset, here is how to do it.
Obtain all the following files:
Mail-SpamAssassin-rules-xxx.tgz
Mail-SpamAssassin-rules-xxx.tgz.asc
Mail-SpamAssassin-rules-xxx.tgz.md5
Mail-SpamAssassin-rules-xxx.tgz.sha1
(where xxx may look something like '3.3.0-rc1.r893295')
Save them all to the current directory.
Obtain a rules-signing public key:
curl -O https://spamassassin.apache.org/updates/GPG.KEY
Import the signing key to the SpamAssassin gpg keyring, so that the rules
files can be verified safely:
sa-update --import GPG.KEY
Install rules from a compressed tar archive:
sa-update --install Mail-SpamAssassin-rules-xxx.tgz
Note that the ".tgz.asc", ".tgz.md5" and ".tgz.sha1" files all need to
be in the same directory, otherwise sa-update will fail.
If the intended rules destination directory differs from a default location
as assumed by sa-update and SpamAssassin, such as when running a content
filter within a Unix jail or on an unusual installation, please supply the
rules destination directory to sa-update through its option --updatedir,
such as:
sa-update --updatedir /var/jail/var/db/spamassassin/3.003000
CPAN
----
Most of the modules listed below are available via the Comprehensive Perl
Archive Network (CPAN, see http://www.cpan.org/ for more information).
While each module is different, most can be installed via a few simple
commands such as:
$ perl -MCPAN -e shell
cpan> o conf prerequisites_policy ask
cpan> install Module::Name
cpan> quit
If there are problems or questions regarding the installation any of the
modules, please see the CPAN and relevant module's documentation for more
information. We can't provide documentation or installation support for
third party modules.
Additional information about the CPAN module is also available via
"perldoc CPAN".
Most Linux distributions also offer the CPAN modules in their own native
formats (RPMs, Debian packages, etc.), so you should be able to find these
through those mechanisms, too, if you prefer.
Required Perl Interpreter
-------------------------
Perl 5.8.1 or a later version is required.
Preferred versions are 5.8.8, or 5.10.1 or later.
Most of the functions might still work with Perl 5.6.1 or 5.6.2,
but 5.6.* is no longer a supported version.
Required Perl Modules
---------------------
In addition to the modules associated with Perl, some additional modules
need to be installed or upgraded depending on the version of Perl that you
are running.
You can get an immediate report on which of these modules you may need (or
want) to upgrade, by running "perl build/check_dependencies" from the
SpamAssassin build directory.
The list of required modules that do not ship with Perl and must be
installed:
- Digest::SHA1 (from CPAN),
or the newer Digest::SHA which is a perl base module since Perl 5.10.0
The Digest::SHA1 module is used as a cryptographic hash for some
tests and the Bayes subsystem if Digest::SHA module is not available.
An external perl module razor-agents-2.84 as used by a Razor2 plugin
seems to be the only remaining component depending on Digest::SHA1
(note that a packager may ship a patched version of razor-agents which
can use Digest::SHA instead)
Debian: apt-get install libdigest-sha1-perl
Gentoo: emerge dev-perl/Digest-SHA1
Fedora: yum install perl-Digest-SHA1
- HTML::Parser >= 3.43 (from CPAN)
HTML is used for an ever-increasing amount of email so this dependency
is unavoidable. Run "perldoc -q html" for additional information.
Debian: apt-get install libhtml-parser-perl
Gentoo: emerge dev-perl/HTML-Parser
Fedora: yum install perl-HTML-Parser
- Net::DNS (from CPAN)
Used for all DNS-based tests (SBL, XBL, SpamCop, DSBL, etc.),
perform MX checks, used when manually reporting spam to SpamCop,
and used by sa-update to gather version information.
You need to make sure the Net::DNS version is sufficiently up-to-date:
- version 0.34 or higher on Unix systems
- version 0.46 or higher on Windows systems
Debian/Ubuntu: apt-get install libnet-dns-perl
Fedora: yum install perl-Net-DNS
- NetAddr::IP (from CPAN)
Used to parse IP addresses and IP address ranges for
"trusted_networks".
Debian/Ubuntu: apt-get install libnetaddr-ip-perl
Fedora: yum install perl-NetAddr-IP
- Time::HiRes (from CPAN)
Used by asynchronous DNS lookups to operate timeouts with subsecond
precision and to report processing times accurately.
- LWP (aka libwww-perl) (from CPAN)
This set of modules will include both the LWP::UserAgent and
HTTP::Date modules, used by sa-update to retrieve update archives.
Fedora: yum install perl-libwww-perl
- HTTP::Date (from CPAN)
Used by sa-update to deal with certain Date requests.
- IO::Zlib (from CPAN)
Used by sa-update to uncompress update archives.
Version 1.04 or later is required.
Fedora: yum install perl-IO-Zlib
- Archive::Tar (from CPAN)
Used by sa-update to expand update archives.
Version 1.23 or later is required.
Fedora: yum install perl-Archive-Tar
Optional Modules
----------------
In addition, the following modules will be used for some checks, if
available and the version is high enough. If they are not available or if
their version is too low, SpamAssassin will still work, just not as
effectively because some of the spam-detection tests will have to be
skipped.
Note: SpamAssassin will not warn you if these are installed, but the
version is too low for them to be used.
- MIME::Base64
This module is highly recommended to increase the speed with which
Base64 encoded messages/mail parts are decoded.
- DB_File (from CPAN, included in many distributions)
Used to store data on-disk, for the Bayes-style logic, TxRep, and
auto-whitelist. *Much* more efficient than the other standard Perl
database packages. Strongly recommended.
There seems to be a bug in libdb 4.1.25, which is
distributed by default on some versions of Linux. See
http://wiki.apache.org/spamassassin/DbFileSleepBug for details.
- Net::SMTP (from CPAN)
Used when manually reporting spam to SpamCop.
- Mail::SPF (from CPAN)
Used to check DNS Sender Policy Framework (SPF) records to fight email
address forgery and make it easier to identify spams. This module
makes Mail::SPF::Query obsolete.
Net::DNS version 0.58 or higher is required.
Note that NetAddr::IP (required by Mail::SPF) versions up to and
including version 4.006 include a bug that will slow down the entire
perl interpreter. NetAddr::IP version 4.007 or later fixes this.
- Geo::IP (from CPAN)
Used by the RelayCountry plugin (not enabled by default) to determine
the domain country codes of each relay in the path of an email.
IP::Country::Fast is used as alternative if Geo::IP is not installed.
This is not recommended as it's obsolete.
- Net::Ident (from CPAN)
If you plan to use the --auth-ident option to spamd, you will need
to install this module.
- IO::Socket::INET6 (from CPAN)
This is required if the first nameserver listed in your IP
configuration or /etc/resolv.conf file is available only via an IPv6
address.
Fedora: yum install perl-IO-Socket-INET6
- IO::Socket::SSL (from CPAN)
If you wish to use SSL encryption to communicate between spamc and
spamd (the --ssl option to spamd), you need to install this
module. (You will need the OpenSSL libraries and use the
ENABLE_SSL="yes" argument to Makefile.PL to build and run an SSL
compatible spamc.)
Fedora: yum install perl-IO-Socket-SSL
- Compress::Zlib (from CPAN)
If you wish to use the optional zlib compression for communication
between spamc and spamd (the -z option to spamc), useful for
long-distance use of spamc over the internet, you need to install
this module.
Fedora: yum install perl-Compress-Zlib
- Mail::DKIM (from CPAN)
If this module is installed, and you enable the DKIM plugin,
SpamAssassin will perform DKIM lookups when a DKIM-Signature header is
present in the message headers. Current versions of Mail::DKIM (0.20
or later) also perform Domain Key lookups on DomainKey-Signature headers,
without requiring the Mail::DomainKeys module, which is now obsolete.
Version 0.37 or later is preferred, the absolute minimal version is 0.31.
Note that the Mail::DKIM module in turn requires the Digest::SHA module
and OpenSSL libraries.
- DBI *and* DBD driver/modules for your database (from CPAN)
If you intend to use SpamAssassin with an SQL database backend for
user configuration data, Bayes storage, or other storage, you will need
to have these installed; both the basic DBI module and the driver for
your database.
- Encode::Detect (from CPAN)
If you plan to use the normalize_charset config setting to detect
charsets and convert them into Unicode, you will need to install
this module.
- Apache::Test (from CPAN)
If you plan to run the Apache2 version of spamd in the
"spamd-apache2" directory, you will need to install this
module.
- Apache 2 and mod_perl
If you plan to run the Apache2 version of spamd in the "spamd-apache2"
directory, you will need to ensure these are installed.
Ubuntu: sudo apt-get install apache2 libapache2-mod-perl2
- Razor2
If you plan to use Vipul's Razor, note that versions up to and
including version 2.82 include a bug that will slow down the entire
perl interpreter. Version 2.83 or later fixes this.
If you do not plan to use this plugin, be sure to comment out
its loadplugin line in "/etc/mail/spamassassin/v310.pre".
What Next?
----------
Take a look at the USAGE document for more information on how to use
SpamAssassin.
The SpamAssassin Wiki <http://wiki.spamassassin.org/> contains
information on custom plugins, extensions, and other optional modules
included with SpamAssassin.
(end of INSTALL)
// vim:tw=74:

35
upstream/INSTALL.VMS Normal file
View File

@ -0,0 +1,35 @@
Notes on building SpamAssassin on VMS
-------------------------------------
- Firstly, do not do a default build; this will build a set of C components
which currently use Autoconf to determine their dependencies, to build
"spamc".
Instead, run
perl Makefile.PL
make spamassassin
If you want "spamc" I suggest you provide a replacement header and
binaries.mk we can use, then we can special-case VMS in Makefile.PL to use
those instead of running autoconf.
- The generated Makefile.PL no longer relies on UNIX input/output
redirection; /bin/sh "for" loops; @foo "silent" directives; backslashed
continuations; or [ -f foo ] || somecommand conditionals. At least, not in
our code any more ;)
- There are still some UNIX paths inside the generated Makefile:
"rules/local.cf" for example. Suggestions on how to work around this
neatly inside the Makefile syntax while gaining VMS compatibility would be
welcome!
- "make test" will probably rely heavily on UNIX-style input and output
redirection, so will probably still fail.
- bug 1099 in the SA Bugzilla is being used to track progress.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=1099

202
upstream/LICENSE Normal file
View File

@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

585
upstream/MANIFEST Normal file
View File

@ -0,0 +1,585 @@
CREDITS
Changes
INSTALL
INSTALL.VMS
LICENSE
MANIFEST
MANIFEST.SKIP
Makefile.PL
NOTICE
PACKAGING
README
TRADEMARK
UPGRADE
USAGE
build/check_dependencies
build/convert_pods_to_doc
build/get_version
build/mkrules
build/parse-rules-for-masses
build/preprocessor
build/sha256sum.pl
build/sha512sum.pl
ldap/README
ldap/README.testing
ldap/sa_test.ldif
lib/Mail/SpamAssassin.pm
lib/Mail/SpamAssassin/AICache.pm
lib/Mail/SpamAssassin/ArchiveIterator.pm
lib/Mail/SpamAssassin/AsyncLoop.pm
lib/Mail/SpamAssassin/AutoWhitelist.pm
lib/Mail/SpamAssassin/Bayes.pm
lib/Mail/SpamAssassin/Bayes/CombineChi.pm
lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
lib/Mail/SpamAssassin/BayesStore.pm
lib/Mail/SpamAssassin/BayesStore/BDB.pm
lib/Mail/SpamAssassin/BayesStore/DBM.pm
lib/Mail/SpamAssassin/BayesStore/MySQL.pm
lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
lib/Mail/SpamAssassin/BayesStore/Redis.pm
lib/Mail/SpamAssassin/Util/TinyRedis.pm
lib/Mail/SpamAssassin/BayesStore/SDBM.pm
lib/Mail/SpamAssassin/BayesStore/SQL.pm
lib/Mail/SpamAssassin/Client.pm
lib/Mail/SpamAssassin/Conf.pm
lib/Mail/SpamAssassin/Conf/LDAP.pm
lib/Mail/SpamAssassin/Conf/Parser.pm
lib/Mail/SpamAssassin/Conf/SQL.pm
lib/Mail/SpamAssassin/Constants.pm
lib/Mail/SpamAssassin/DBBasedAddrList.pm
lib/Mail/SpamAssassin/Dns.pm
lib/Mail/SpamAssassin/DnsResolver.pm
lib/Mail/SpamAssassin/HTML.pm
lib/Mail/SpamAssassin/Locales.pm
lib/Mail/SpamAssassin/Locker.pm
lib/Mail/SpamAssassin/Locker/Flock.pm
lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
lib/Mail/SpamAssassin/Locker/Win32.pm
lib/Mail/SpamAssassin/Logger.pm
lib/Mail/SpamAssassin/Logger/File.pm
lib/Mail/SpamAssassin/Logger/Stderr.pm
lib/Mail/SpamAssassin/Logger/Syslog.pm
lib/Mail/SpamAssassin/MailingList.pm
lib/Mail/SpamAssassin/Message.pm
lib/Mail/SpamAssassin/Message/Metadata.pm
lib/Mail/SpamAssassin/Message/Metadata/Received.pm
lib/Mail/SpamAssassin/Message/Node.pm
lib/Mail/SpamAssassin/NetSet.pm
lib/Mail/SpamAssassin/PerMsgLearner.pm
lib/Mail/SpamAssassin/PerMsgStatus.pm
lib/Mail/SpamAssassin/PersistentAddrList.pm
lib/Mail/SpamAssassin/Plugin.pm
lib/Mail/SpamAssassin/Plugin/AskDNS.pm
lib/Mail/SpamAssassin/Plugin/ASN.pm
lib/Mail/SpamAssassin/Plugin/AWL.pm
lib/Mail/SpamAssassin/Plugin/AccessDB.pm
lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
lib/Mail/SpamAssassin/Plugin/Bayes.pm
lib/Mail/SpamAssassin/Plugin/BodyEval.pm
lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
lib/Mail/SpamAssassin/Plugin/Check.pm
lib/Mail/SpamAssassin/Plugin/DCC.pm
lib/Mail/SpamAssassin/Plugin/DKIM.pm
lib/Mail/SpamAssassin/Plugin/DNSEval.pm
lib/Mail/SpamAssassin/Plugin/FreeMail.pm
lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
lib/Mail/SpamAssassin/Plugin/Hashcash.pm
lib/Mail/SpamAssassin/Plugin/HashBL.pm
lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
lib/Mail/SpamAssassin/Plugin/OLEVBMacro.pm
lib/Mail/SpamAssassin/Plugin/OneLineBodyRuleType.pm
lib/Mail/SpamAssassin/Plugin/Phishing.pm
lib/Mail/SpamAssassin/Plugin/PhishTag.pm
lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
lib/Mail/SpamAssassin/Plugin/Pyzor.pm
lib/Mail/SpamAssassin/Plugin/Razor2.pm
lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
lib/Mail/SpamAssassin/Plugin/RelayEval.pm
lib/Mail/SpamAssassin/Plugin/ResourceLimits.pm
lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
lib/Mail/SpamAssassin/Plugin/Reuse.pm
lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
lib/Mail/SpamAssassin/Plugin/SPF.pm
lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
lib/Mail/SpamAssassin/Plugin/SpamCop.pm
lib/Mail/SpamAssassin/Plugin/Test.pm
lib/Mail/SpamAssassin/Plugin/TextCat.pm
lib/Mail/SpamAssassin/Plugin/TxRep.pm
lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
lib/Mail/SpamAssassin/Plugin/URIDetail.pm
lib/Mail/SpamAssassin/Plugin/URIEval.pm
lib/Mail/SpamAssassin/Plugin/VBounce.pm
lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
lib/Mail/SpamAssassin/PluginHandler.pm
lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
lib/Mail/SpamAssassin/RegistryBoundaries.pm
lib/Mail/SpamAssassin/Reporter.pm
lib/Mail/SpamAssassin/SQLBasedAddrList.pm
lib/Mail/SpamAssassin/SpamdForkScaling.pm
lib/Mail/SpamAssassin/SubProcBackChannel.pm
lib/Mail/SpamAssassin/Timeout.pm
lib/Mail/SpamAssassin/Util.pm
lib/Mail/SpamAssassin/Util/DependencyInfo.pm
lib/Mail/SpamAssassin/Util/Progress.pm
lib/Mail/SpamAssassin/Util/ScopedTimer.pm
lib/Mail/SpamAssassin/Util/TieOneStringHash.pm
lib/spamassassin-run.pod
procmailrc.example
rules.README
rules/active.list
rules/init.pre
rules/languages
rules/local.cf
rules/regression_tests.cf
rules/sa-update-pubkey.txt
rules/user_prefs.template
rules/v310.pre
rules/v312.pre
rules/v320.pre
rules/v330.pre
rules/v340.pre
rules/v341.pre
rules/v342.pre
rules/v343.pre
rules/20_aux_tlds.cf
sa-awl.raw
sa-check_spamd.raw
sa-compile.raw
sa-learn.raw
sa-update.raw
sample-nonspam.txt
sample-spam.txt
spamassassin.raw
spamc/Makefile.in
spamc/Makefile.win
spamc/README.qmail
spamc/README.win
spamc/acconfig.h
spamc/config.h.in
spamc/config.h.win
spamc/configure
spamc/configure.in
spamc/configure.pl
spamc/getopt.c
spamc/getopt.h
spamc/libspamc.c
spamc/libspamc.h
spamc/qmail-spamc.c
spamc/spamc.c
spamc/spamc.h.in
spamc/spamc.h.win
spamc/spamc.pod
spamc/utils.c
spamc/utils.h
spamc/version.h.in
spamc/version.h.pl
spamd-apache2/MANIFEST
spamd-apache2/MANIFEST.SKIP
spamd-apache2/META.yml
spamd-apache2/Makefile.PL
spamd-apache2/README.apache
spamd-apache2/bin/Bench-spamd.pl
spamd-apache2/bin/Spamd
spamd-apache2/bin/apache-spamd.pl
spamd-apache2/lib/Mail/SpamAssassin/Spamd.pm
spamd-apache2/lib/Mail/SpamAssassin/Spamd/Apache2.pm
spamd-apache2/lib/Mail/SpamAssassin/Spamd/Apache2/AclIP.pm
spamd-apache2/lib/Mail/SpamAssassin/Spamd/Apache2/AclRFC1413.pm
spamd-apache2/lib/Mail/SpamAssassin/Spamd/Apache2/Config.pm
spamd-apache2/lib/Mail/SpamAssassin/Spamd/Config.pm
spamd-apache2/t/30run.t
spamd-apache2/t/TEST.PL
spamd-apache2/t/certs/Makefile
spamd-apache2/t/certs/server.crt
spamd-apache2/t/certs/server.csr
spamd-apache2/t/certs/server.key
spamd-apache2/t/conf/extra.last.conf.in
spamd/OSXStartup/README
spamd/OSXStartup/SAspamd
spamd/OSXStartup/StartupParameters.plist
spamd/PROTOCOL
spamd/README
spamd/README.SuSE
spamd/README.vpopmail
spamd/netbsd-rc-script.sh
spamd/redhat-rc-script.sh
spamd/slackware-rc-script.sh
spamd/solaris-rc-script.sh
spamd/spamd.raw
spamd/suse-ancient-rc-script.sh
sql/README
sql/README.awl
sql/README.bayes
sql/README.txrep
sql/awl_mysql.sql
sql/awl_pg.sql
sql/bayes_mysql.sql
sql/bayes_pg.sql
sql/userpref_mysql.sql
sql/userpref_pg.sql
sql/txrep_mysql.sql
sql/txrep_pg.sql
sql/txrep_sqlite.sql
t/README
t/SATest.pl
t/SATest.pm
t/all_modules.t
t/autolearn.t
t/autolearn_force.t
t/autolearn_force_fail.t
t/basic_lint.t
t/basic_lint_without_sandbox.t
t/basic_meta.t
t/basic_meta2.t
t/basic_obj_api.t
t/bayesbdb.t
t/bayesdbm.t
t/bayesdbm_flock.t
t/bayessdbm.t
t/bayessdbm_seen_delete.t
t/bayessql.t
t/blacklist_autolearn.t
t/body_mod.t
t/body_str.t
t/check_implemented.t
t/cidrs.t
t/config.dist
t/config_errs.t
t/config_text.t
t/config_tree_recurse.t
t/cpp_comments_in_spamc.t
t/cross_user_config_leak.t
t/olevbmacro.t
t/data/01_test_rules.cf
t/data/01_test_rules.pre
t/data/Dumpheaders.pm
t/data/dkim/test-adsp-11.msg
t/data/dkim/test-adsp-12.msg
t/data/dkim/test-adsp-13.msg
t/data/dkim/test-adsp-14.msg
t/data/dkim/test-adsp-15.msg
t/data/dkim/test-adsp-16.msg
t/data/dkim/test-adsp-17.msg
t/data/dkim/test-adsp-18.msg
t/data/dkim/test-adsp-19.msg
t/data/dkim/test-adsp-20.msg
t/data/dkim/test-adsp-21.msg
t/data/dkim/test-adsp-22.msg
t/data/dkim/test-fail-01.msg
t/data/dkim/test-fail-02.msg
t/data/dkim/test-fail-03.msg
t/data/dkim/test-fail-04.msg
t/data/dkim/test-fail-05.msg
t/data/dkim/test-fail-06.msg
t/data/dkim/test-fail-07.msg
t/data/dkim/test-fail-08.msg
t/data/dkim/test-fail-09.msg
t/data/dkim/test-pass-01.msg
t/data/dkim/test-pass-02.msg
t/data/dkim/test-pass-03.msg
t/data/dkim/test-pass-04.msg
t/data/dkim/test-pass-05.msg
t/data/dkim/test-pass-06.msg
t/data/dkim/test-pass-07.msg
t/data/dkim/test-pass-08.msg
t/data/dkim/test-pass-09.msg
t/data/dkim/test-pass-10.msg
t/data/dkim/test-pass-11.msg
t/data/dkim/test-pass-12.msg
t/data/dkim/test-pass-13.msg
t/data/dkim/test-pass-14.msg
t/data/dkim/test-pass-15.msg
t/data/dkim/test-pass-16.msg
t/data/dkim/test-pass-17.msg
t/data/dkim/test-pass-18.msg
t/data/dkim/test-pass-19.msg
t/data/etc/hello.txt
t/data/etc/testhost.cert
t/data/etc/testhost.key
t/data/mime-subject.txt
t/data/nice/001
t/data/nice/002
t/data/nice/003
t/data/nice/004
t/data/nice/005
t/data/nice/006
t/data/nice/007
t/data/nice/008
t/data/nice/009
t/data/nice/010
t/data/nice/011
t/data/nice/012
t/data/nice/013
t/data/nice/014
t/data/nice/015
t/data/nice/016
t/data/nice/base64.txt
t/data/nice/crlf-endings
t/data/nice/dkim/AddedVtag_07
t/data/nice/dkim/BasicTest_01
t/data/nice/dkim/MIMEnowsp_05
t/data/nice/dkim/MIMEsimple_04
t/data/nice/dkim/MultipleAuthRes_10
t/data/nice/dkim/MultipleReceived_08
t/data/nice/dkim/MultipleSig_06
t/data/nice/dkim/NonExistingHeader_09
t/data/nice/dkim/Nowsp_03
t/data/nice/dkim/Simple_02
t/data/nice/mailman_message.txt
t/data/nice/mime1
t/data/nice/mime2
t/data/nice/mime3
t/data/nice/mime4
t/data/nice/mime5
t/data/nice/mime6
t/data/nice/mime7
t/data/nice/mime8
t/data/nice/mime9
t/data/nice/no_body
t/data/nice/not_gtube.eml
t/data/nice/orig_ip_hdr.eml
t/data/nice/spf1
t/data/nice/spf2
t/data/nice/spf3
t/data/nice/spf3-received-spf
t/data/reporterplugin.pm
t/data/spam/001
t/data/spam/002
t/data/spam/003
t/data/spam/004
t/data/spam/005
t/data/spam/006
t/data/spam/007
t/data/spam/008
t/data/spam/009
t/data/spam/010
t/data/spam/011
t/data/spam/012
t/data/spam/013
t/data/spam/014
t/data/spam/015
t/data/spam/016
t/data/spam/017
t/data/spam/018
t/data/spam/badctype1
t/data/spam/badctype2
t/data/spam/badmime.txt
t/data/spam/badmime2.txt
t/data/spam/badmime3.txt
t/data/spam/base64.txt
t/data/spam/bsmtp
t/data/spam/bsmtpnull
t/data/spam/dnsbl.eml
t/data/spam/gtube.eml
t/data/spam/gtubedcc.eml
t/data/spam/olevbmacro/encrypted.eml
t/data/spam/olevbmacro/goodcsv.eml
t/data/spam/olevbmacro/macro.eml
t/data/spam/olevbmacro/malicemacro.eml
t/data/spam/olevbmacro/nomacro.eml
t/data/spam/olevbmacro/renamedmacro.eml
t/data/spam/olevbmacro/zippwmacro.eml
t/data/spam/razor2
t/data/spam/relayUS.eml
t/data/spam/spf1
t/data/spam/spf2
t/data/spam/spf3
t/data/spamc_blank.cf
t/data/taintcheckplugin.pm
t/data/testplugin.pm
t/data/testplugin2.pm
t/data/validuserplugin.pm
t/data/whitelists/action.eff.org
t/data/whitelists/amazon_co_uk_ship
t/data/whitelists/amazon_com_ship
t/data/whitelists/cert.org
t/data/whitelists/debian_bts_reassign
t/data/whitelists/ibm_enews_de
t/data/whitelists/infoworld
t/data/whitelists/linuxplanet
t/data/whitelists/lp.org
t/data/whitelists/media_unspun
t/data/whitelists/mlist_mailman_message
t/data/whitelists/mlist_yahoo_groups_message
t/data/whitelists/mypoints
t/data/whitelists/neat_net_tricks
t/data/whitelists/netcenter-direct_de
t/data/whitelists/netsol_renewal
t/data/whitelists/networkworld
t/data/whitelists/oracle_net_techblast
t/data/whitelists/orbitz.com
t/data/whitelists/paypal.com
t/data/whitelists/register.com_password
t/data/whitelists/ryanairmail.com
t/data/whitelists/sf.net
t/data/whitelists/winxpnews.com
t/data/whitelists/yahoo-inc.com
t/data/phishing/openphish-feed.txt
t/data/phishing/phishtank-feed.csv
t/data/spam/phishing_openphish.eml
t/data/spam/phishing_phishtank.eml
t/phishing.t
t/date.t
t/db_awl_path.t
t/db_awl_perms.t
t/db_based_whitelist.t
t/db_based_whitelist_ips.t
t/dcc.t
t/debug.t
t/desc_wrap.t
t/dkim.t
t/dnsbl.t
t/dnsbl_sc_meta.t
t/duplicates.t
t/freemail.t
t/get_all_headers.t
t/get_headers.t
t/gtube.t
t/hashcash.t
t/html_colors.t
t/html_obfu.t
t/html_utf8.t
t/idn_dots.t
t/if_can.t
t/ifversion.t
t/ip_addrs.t
t/lang_lint.t
t/lang_pl_tests.t
t/line_endings.t
t/lint_nocreate_prefs.t
t/memory_cycles.t
t/metadata.t
t/mimeheader.t
t/mimeparse.t
t/missing_hb_separator.t
t/nonspam.t
t/originating_ip_hdr.t
t/plugin.t
t/plugin_file.t
t/plugin_priorities.t
t/prefs_include.t
t/priorities.t
t/razor2.t
t/rcvd_parser.t
t/re_base_extraction.t
t/recips.t
t/recreate.t
t/recursion.t
t/regexp_valid.t
t/relaycountry_fast.t
t/relaycountry_geoip.t
t/relaycountry_geoip2.t
t/relative_scores.t
t/report_safe.t
t/reportheader.t
t/reportheader_8bit.t
t/reuse.t
t/root_spamd.t
t/root_spamd_tell.t
t/root_spamd_tell_paranoid.t
t/root_spamd_tell_x.t
t/root_spamd_tell_x_paranoid.t
t/root_spamd_u.t
t/root_spamd_u_dcc.t
t/root_spamd_virtual.t
t/root_spamd_x.t
t/root_spamd_x_paranoid.t
t/root_spamd_x_u.t
t/rule_multiple.t
t/rule_names.t
t/rule_types.t
t/sa_awl.t
t/sa_check_spamd.t
t/sa_compile.t
t/sha1.t
t/shortcircuit.t
t/spam.t
t/spamc.t
t/spamc_B.t
t/spamc_E.t
t/spamc_c.t
t/spamc_c_stdout_closed.t
t/spamc_cf.t
t/spamc_headers.t
t/spamc_l.t
t/spamc_optC.t
t/spamc_optL.t
t/spamc_x_E_R.t
t/spamc_x_e.t
t/spamc_y.t
t/spamc_z.t
t/spamd.t
t/spamd_allow_user_rules.t
t/spamd_client.t
t/spamd_hup.t
t/spamd_kill_restart.t
t/spamd_kill_restart_rr.t
t/spamd_ldap.t
t/spamd_maxchildren.t
t/spamd_maxsize.t
t/spamd_parallel.t
t/spamd_plugin.t
t/spamd_port.t
t/spamd_prefork_stress.t
t/spamd_prefork_stress_2.t
t/spamd_prefork_stress_3.t
t/spamd_prefork_stress_4.t
t/spamd_protocol_10.t
t/spamd_report.t
t/spamd_report_ifspam.t
t/spamd_sql_prefs.t
t/spamd_ssl.t
t/spamd_ssl_accept_fail.t
t/spamd_stop.t
t/spamd_symbols.t
t/spamd_syslog.t
t/spamd_unix.t
t/spamd_unix_and_tcp.t
t/spamd_user_rules_leak.t
t/spamd_utf8.t
t/spamd_whitelist_leak.t
t/spf.t
t/sql_based_whitelist.t
t/stop_always_matching_regexps.t
t/strip2.t
t/strip_no_subject.t
t/stripmarkup.t
t/tainted_msg.t
t/test_dir
t/text_bad_ctype.t
t/timeout.t
t/trust_path.t
t/uri.t
t/uri_html.t
t/uri_text.t
t/uribl.t
t/urilocalbl_geoip.t
t/utf8.t
t/util_wrap.t
t/whitelist_addrs.t
t/whitelist_from.t
t/whitelist_subject.t
t/whitelist_to.t
t/zz_cleanup.t
t/spamc_bug6176.t
t/data/spam/dnsbl_domsonly.eml
t/uribl_domains_only.t
t/data/spam/dnsbl_ipsonly.eml
t/uribl_all_types.t
t/uribl_ips_only.t
t/uri_list.t
t/dnsbl_subtests.t
powered_by/128-powered-by-spamassassin.png
powered_by/256-powered-by-spamassassin.png
powered_by/512-powered-by-spamassassin.png
powered_by/LOGO_USAGE.TXT
powered_by/powered_by_spamassassin.psd
META.yml Module YAML meta-data (added by MakeMaker)
META.json Module JSON meta-data (added by MakeMaker)

124
upstream/MANIFEST.SKIP Normal file
View File

@ -0,0 +1,124 @@
\.a$
\.bak$
\.dll$
\.exe$
\.exists$
\.log$
\.o$
\.obj$
\.old$
\.pid$
\.so$
\.svn/
\.gitignore$
\.swp$
\.tmp$
\.tst$
\bautom4te\.cache/
\bcore(\.\d+)?$
^Makefile$
^spamc/Makefile$
\bold\.cf$
\btmon\.out$
\b[Oo][Ll][Dd]$
\b[Oo][Uu][Tt]$
^blib/
^blibdirs$
^build/3\.\d\.\d_change_summary$
^build/cf_to_html$
^build/buildbot_ready$
^build/find_meta_missing_dependencies$
^build/README$
^build/replace_license_blocks$
^build/tag_.*$
^build/update_.*$
^build/pga.*$
^build/old/.*$
^debian/
^doc/
^lib/Mail/SpamAssassin/Plugin/NetCache\.pm$
^lib/Mail/SpamAssassin/Util/MemoryDump\.pm$
^lm/
^made-doc-stamp$
^Mail-SpamAssassin-.*$
^pm_to_blib$
^pod2html?-?.*$
^presentation$
^rules/.*\.cf$
^rules/.*\.pm$
^rules/STATISTICS-set\d.txt$
^sa-filter$
^sa-learn$
^sa_with_mail_audit$
^spamassassin$
^spamc/config\.(h|status|log)$
^spamc/qmail-spamc$
^spamc/spamc$
^spamc/spamc\.h$
^spamc/spamc\.conf$
^spamc/sslspamc$
^spamc/version\.h$
^spamd/spamd$
^stuff/
^t/bayessql\.cf$
^t/config$
^t/data/nice/cjk/
^t/data/whitelists/
^t/do_net$
^t/log/
^t/rule_tests\.t$
^t/sql_based_whitelist\.cf$
^tasks/
^testmails.*$
^todo$
^wordfreqs/
~$
^masses/
^contrib/
^tools/
^pod2ref
^sa-update$
^sa-compile$
^build/describe-to-po-file$
^rules/70_sandbox\.cf$
^build/automc/
^rulesrc/.*$
^rules/active\.list$
^build/mkupdates/
^build/buildbot/
^hack/
^patches/
(^|/)\.
~$
/logs/
/t/logs/
\.conf$
\.pl$
\bCVS\b
/t/TEST$
^lib/Mail/SpamAssassin/Plugin/P595Body\.pm
^lib/Mail/SpamAssassin/Plugin/RabinKarpBody\.pm
^rules/svn_only\.pre$
^masses/tenpass/randomise$
^masses/rule-dev/seek-phrases-in-log$
^masses/rule-dev/seek-phrases-in-corpus$
^build/announcements/.*\.txt$
^t/mass_check\.t$
^build/backup
^build/hudson
^build/jenkins
^rules/.*\.pm$
^artifacts/
^t\.rules/
^t/make_install\.t$
^backend/
^sa-awl$
^sa-check_spamd$
^xt/
^build/rebuild_xt$
^build/repackage_latest_update_rules$
^MYMETA.(json|yml)$
^trunk-only.*
^t/mkrules\.t$
^t/mkrules_else\.t$
^t/spamc_H\.t$

61
upstream/META.json Normal file
View File

@ -0,0 +1,61 @@
{
"abstract" : "Apache SpamAssassin is an extensible email filter which is used to identify spam",
"author" : [
"The Apache SpamAssassin Project <dev@spamassassin.apache.org>"
],
"dynamic_config" : 1,
"generated_by" : "ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921",
"license" : [
"apache_2_0"
],
"meta-spec" : {
"url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
"version" : "2"
},
"name" : "Mail-SpamAssassin",
"no_index" : {
"directory" : [
"t",
"inc"
]
},
"prereqs" : {
"build" : {
"requires" : {
"ExtUtils::MakeMaker" : "0"
}
},
"runtime" : {
"requires" : {
"Archive::Tar" : "1.23",
"Digest::SHA1" : "0",
"Errno" : "0",
"File::Copy" : "2.02",
"File::Spec" : "0.8",
"HTML::Parser" : "3.43",
"IO::Zlib" : "1.04",
"Mail::DKIM" : "0.31",
"Net::DNS" : "0.34",
"NetAddr::IP" : "4.01",
"Pod::Usage" : "1.1",
"Sys::Hostname" : "0",
"Test::More" : "0",
"Time::HiRes" : "0",
"Time::Local" : "0"
}
}
},
"release_status" : "stable",
"resources" : {
"homepage" : "https://spamassassin.apache.org/",
"license" : [
"http://www.apache.org/licenses/LICENSE-2.0.html"
],
"repository" : {
"type" : "svn",
"url" : "http://svn.apache.org/repos/asf/spamassassin/"
},
"x_MailingList" : "http://wiki.apache.org/spamassassin/MailingLists"
},
"version" : "3.004004"
}

39
upstream/META.yml Normal file
View File

@ -0,0 +1,39 @@
---
abstract: 'Apache SpamAssassin is an extensible email filter which is used to identify spam'
author:
- 'The Apache SpamAssassin Project <dev@spamassassin.apache.org>'
build_requires:
ExtUtils::MakeMaker: 0
dynamic_config: 1
generated_by: 'ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921'
license: apache
meta-spec:
url: http://module-build.sourceforge.net/META-spec-v1.4.html
version: 1.4
name: Mail-SpamAssassin
no_index:
directory:
- t
- inc
requires:
Archive::Tar: 1.23
Digest::SHA1: 0
Errno: 0
File::Copy: 2.02
File::Spec: 0.8
HTML::Parser: 3.43
IO::Zlib: 1.04
Mail::DKIM: 0.31
Net::DNS: 0.34
NetAddr::IP: 4.01
Pod::Usage: 1.1
Sys::Hostname: 0
Test::More: 0
Time::HiRes: 0
Time::Local: 0
resources:
homepage: https://spamassassin.apache.org/
license: http://www.apache.org/licenses/LICENSE-2.0.html
repository: http://svn.apache.org/repos/asf/spamassassin/
x_MailingList: http://wiki.apache.org/spamassassin/MailingLists
version: 3.004004

1170
upstream/Makefile.PL Normal file

File diff suppressed because it is too large Load Diff

35
upstream/NOTICE Normal file
View File

@ -0,0 +1,35 @@
This product includes software developed by the Apache Software
Foundation (http://www.apache.org/).
SpamAssassin is a trademark of the Apache Software Foundation.
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
The OpenSSL Project - http://www.openssl.org/source/
spamc and libspamc use OpenSSL to perform SSL encryption.
Steffen Ullrich - http://search.cpan.org/%7esullr/
spamd uses IO::Socket::SSL to perform SSL encryption.

315
upstream/PACKAGING Normal file
View File

@ -0,0 +1,315 @@
Introduction
------------
The Makefile which is used to build SpamAssassin is created by calling
perl Makefile.PL
This is the standard Perl way of building packages. It involves the
Perl module ExtUtils::MakeMaker which creates a Makefile.
ExtUtils::MakeMaker recognizes several variables which can be set at
the command line to give the user the possibility to influence the
contents of the generated Makefile. All macros written to the Makefile
can be changed on the command line like this:
perl Makefile.PL FOO="bar"
This would give the (exemplary) macro 'FOO' the value 'bar'.
Now has the internal structure of ExtUtils::MakeMaker and that of the
generated Makefiles changed over the years. For a description of the
features your version supports, please read
perldoc ExtUtils::MakeMaker
One important thing to know when you're building packages is that Perl
uses three different "repositories" for installed modules and their
corresponding files: 'perl', 'site' and 'vendor' (the latter was
introduced with Perl 5.6.0). These have the following meanings:
perl: This should be used only by essential modules shipped with
Perl or modules required by one of these. And maybe for some
other important modules chosen by some obscure selection
process. Only one thing is sure about this set of directories:
SpamAssassin doesn't belong there.
site: This is the default. The libs (.pm files) of the modules are
installed into the site_perl subdir in the Perl lib dir.
Everything installed via the CPAN shell or directly from
sources should go there.
vendor: This repository was officially introduced some time after
Perl 5.005_03 (maybe with 5.6.0). It's intended to be the
target for all modules installed from distribution specific
packages; that means RPMs, debs, ebuilds, etc. The rationale
behind this is that this prevents modules installed by the
user from being overwritten by packaged ones.
The wanted repository can be chosen by setting the variable INSTALLDIRS.
So according to the description above should packages probably use
perl Makefile.PL INSTALLDIRS=vendor
That's definitely the correct way to go for Debian, according to their
Perl Policy [DEBPERL]. But I've heard that the vendor stuff is either
broken or not set on many other systems, especially Red Hat ones. Google
might help to find out more on this topic.
The following resources might help understanding this stuff:
[MANEUMM616], [MM00779], [P5P94113].
Changing paths in the Makefile
------------------------------
Internally the Makefile defined quite some paths for the different settings
of INSTALLDIRS. One can change them directly but to be independent of the
version of ExtUtils::MakeMaker the following variables should be used:
PREFIX:
Sets the prefix below which SpamAssassin is installed. Please note the
exceptions for SYSCONFDIR.
Default is the prefix Perl was built with (call
perl -V:prefix
to see the value). Normally something like /usr or /usr/local.
Samples:
This will install the spamassassin apps in /foo/bin, the libs in
/foo/lib/perl5, the shared stuff in /foo/share/spamassassin and make
SpamAssassin look for config files in /foo/etc/mail/spamassassin:
perl Makefile.PL PREFIX=/foo
LIB:
This will change the directory where the SpamAssassin libraries (.pm files)
are installed. The module's architecture-independent files will be put into
the given directory, the architecture-dependent files into a subdirectory
with the name of the current architecture.
The default is something like PREFIX/lib/perl5/site_perl/PERL_VERSION (for
INSTALLDIRS=site).
Samples:
Under i686-Linux, put the architecture-independent files below ~/.libs
and the architecture-dependent ones below ~/.libs/i686-linux:
perl Makefile.PL LIB=~/.libs
DATADIR (DEFRULESDIR):
SpamAssassin's real logic lies in its shipped rule definitions and the
corresponding scores. The files with these settings have to be saved
somewhere, normally below PREFIX/share/spamassassin. The full path to
that directory can be changed with this variable (DEFRULESDIR is a
synonym).
ATTENTION: All files within this directory are removed when SpamAssassin
is installed!
Samples:
Install everything into the default locations but put the rules in
/tmp/sa-rules (for whatever reason):
perl Makefile.PL DATADIR=/tmp/sa-rules
SYSCONFDIR:
Sets the base dir for the config files. See also CONFDIR.
The default depends on the PREFIX and is compliant to the FHS:
- if PREFIX is either /usr or /usr/local:
/etc
- if PREFIX starts with /opt:
/etc/opt
- else:
PREFIX/etc
Samples:
This will (on Windows) install below 'C:\Program Files\SpamAssassin' but
look for the config files in 'C:\Program Files\Shared Files\SpamAssassin':
perl Makefile.PL PREFIX="C:/Program Files/SpamAssassin"
SYSCONFDIR="C:/Program Files/Shared Files/SpamAssassin"
To put the apps and libs below ~/.sa-bin but the config below ~/.sa-etc
try the following:
perl Makefile.PL PREFIX=$HOME/.sa-bin SYSCONFDIR=$HOME/.sa-etc
And the following installs SpamAssassin in /usr/local and forces the
config files to be below /usr/local, too:
perl Makefile.PL PREFIX=/usr/local SYSCONFDIR=/usr/local/etc
CONFDIR (LOCALRULESDIR):
SpamAssassin looks for its config files in SYSCONFDIR/mail/spamassassin.
(There is also a sample local.cf created if such a file doesn't exist yet.)
Some people didn't like this path for various reasons so the full path to
the config files can be changed here (this more or less makes SYSCONFDIR
obsolete). A synonym for this variable is LOCALRULESDIR.
Samples:
If you'd like to have the config files directly in /etc/spamassassin
try this:
perl Makefile.PL CONFDIR=/etc/spamassassin
LOCALSTATEDIR:
"sa-update" will download rule updates into LOCALSTATEDIR/spamassassin.
The default depends on the PREFIX and is compliant to the FHS:
- if PREFIX is either /usr or /usr/local:
/var/lib
- if PREFIX starts with /opt:
/var/opt
- else:
PREFIX/var
Samples:
If you'd like to have the downloaded rules files in /var/spamassassin
try this:
perl Makefile.PL LOCALSTATEDIR=/var
Installing to a directory different from the final destination
--------------------------------------------------------------
When you're building packages, it's often needed to install the stuff to
some temporary directory and then build the package from there. The problem
with this approach is that the build system of SpamAssassin needs to write
some final paths to the libs and the applications.
Previous versions offered some complicated variables to achieve this. Those
hacks weren't compatible to current versions of ExtUtils::MakeMaker. But
ExtUtils::MakeMaker 6.06 introduced a feature which is well known from the GNU
build tools [GNUMAKECMD]: The variable DESTDIR.
The value of DESTDIR is simply prepended to all other paths on make install.
So if you wanted to create a SpamAssassin package for a system which will
have it installed in /usr but you want to create that package from some temp
dir, you would do something like this:
perl Makefile.PL Makefile.PL PREFIX=/usr DESTDIR=/tmp/sa-build
make
make disttest
make install
cd /tmp/sa-build
build_some_package
Setting further options on the command line
-------------------------------------------
Besides the directories, the build process of SpamAssassin supports several
other settings to set or enable some features. For some of these settings
the user is asked before the Makefile is created. To avoid these questions
(and accept the defaults, whatever they are) it is possible to redirect
STDIN from the null device like this:
perl Makefile.PL < /dev/null
Or, under Windows:
perl Makefile.PL < nul
The following variables are supported:
ENABLE_SSL:
Can be set to either "yes" or "no" (default). Makes it possible to use SSL
encryption on the (TCP) connection between spamc and spamd.
Sample:
Build spamc with SSL, use defaults for all other questions:
perl Makefile.PL ENABLE_SSL=yes < /dev/null
CONTACT_ADDRESS:
Each reported spam contains an address under which the confused user/client
can request more information about the tagging of his mail. That address can
be set here. The default is to query the buildung user, falling back to the
string "the administrator of that system".
Sample:
The user can find some information on the page http://example.com/tag/:
perl Makefile.PL CONTACT_ADDRESS="http://example.com/tag/"
RUN_NET_TESTS:
Vipul's Razor and Net::DNS are optional modules. If one of those modules is
found to be installed, some special tests can be performed when 'make test'
is run. The builder is asked if he wants to do so. Default is "no" (because
those tests can fail if there are problems with the network connection or
the servers).
Sample:
Run only the Razor tests:
perl Makefile.PL RUN_NET_TESTS=yes < /dev/null
make test TEST_FILES="t/razor*.t"
Twisting Perl details
---------------------
The build process of SpamAssassin has to know several details of the Perl
calling it later. This is used to work around some Perl bugs and make it
all actually work :o) The following additional variables are supported to
modify these settings:
PERL_BIN:
The path to the perl application which will be used to call the scripts
(like spamassassin and spamd). It makes sense to set this if you build
SpamAssassin on some weird build host which happen to have Perl in
/some/weird/location which is definitely not the location on the end
user's box. The default is the value of the macro FULLPERL which should
be the path to the perl processing Makefile.PL.
Sample:
Building with some weird perl:
/local/buildsys/perl-5.6.1/bin/perl Makefile.PL PERL_BIN=/usr/bin/perl
Obsolete Variables
------------------
The following list shows variables recognized by the old build system and
their new counterparts (no, the ones in the end aren't in the wrong order,
it actually was that complicated):
old: PREFIX=/bar/foo INST_PREFIX=/foo
new: PREFIX=/foo DESTDIR=/bar
old: INST_SITELIB=/foo
new: LIB=/foo
old: SYSCONFDIR=/bar/foo INST_SYSCONFDIR=/foo
new: SYSCONFDIR=/foo DESTDIR=/bar
old: LOCAL_RULES_DIR=/foo PKG_LOCAL_RULES_DIR=/bar/foo
new: LOCALRULESDIR=/foo DESTDIR=/bar
old: DEF_RULES_DIR=/foo PKG_DEF_RULES_DIR=/bar/foo
new: DEFRULESDIR=/foo DESTDIR=/bar
Using one of the following variables will make the Makefile generation
process die:
INST_PREFIX
INST_SITELIB
INST_SYSCONFDIR
LOCAL_RULES_DIR
DEF_RULES_DIR
If you think you need to use one of those nevertheless, you can set the
variable IGNORE_CRUFT to "yes".
Resources
---------
[BUGZILLA] SpamAssassin bug database:
<http://issues.apache.org/SpamAssassin/>
[DEBPERL] Debian Perl Policy, Chapter 3: Packaged Modules:
<http://www.debian.org/doc/packaging-manuals/perl-policy/ch-module_packages.html>
[GNUMAKECMD] GNU make manual: Make Conventions: Variables for Specifying
Commands
<http://www.gnu.org/manual/make-3.79.1/html_chapter/make_14.html#SEC119>
[MANEUMM616] The man page for ExtUtils::MakeMaker 6.16:
<http://search.cpan.org/author/MSCHWERN/ExtUtils-MakeMaker-6.16/lib/ExtUtils/MakeMaker.pm#Default_Makefile_Behaviour>
[MM00779] makemaker-at-perl-dot-org: Michael G Schwern: "Re: MakeMaker
problems with relocation" (PREFIX was broken):
<http://www.mail-archive.com/makemaker@perl.org/msg00779.html>
[P5P94113] perl5-porters: Michael G Schwern: "Re: OS X's vendorlib default
seems wrong" (description of different repositoreis):
<http://archive.develooper.com/perl5-porters@perl.org/msg94113.html>
[RHBUG78053] Red Hat bug 78053: "incompatible changes in behavior of
MakeMaker; affects rpm build process" (introduction of DESTDIR):
<https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=78053>

341
upstream/README Normal file
View File

@ -0,0 +1,341 @@
Welcome to Apache SpamAssassin!
-------------------------------
What is Apache SpamAssassin
---------------------------
Apache SpamAssassin is the #1 Open Source anti-spam platform giving
system administrators a filter to classify email and block "spam"
(unsolicited bulk email). It uses a robust scoring framework and plug-ins
to integrate a wide range of advanced heuristic and statistical analysis
tests on email headers and body text including text analysis, Bayesian
filtering, DNS blocklists, and collaborative filtering databases.
Apache SpamAssassin is a project of the Apache Software Foundation (ASF).
What Apache SpamAssassin Is Not
-------------------------------
Apache SpamAssassin is not a program to delete spam, route spam and ham to
separate mailboxes or folders, or send bounces when you receive spam.
Those are mail routing functions, and Apache SpamAssassin is not a mail
router. Apache SpamAssassin is a mail filter or classifier. It will examine
each message presented to it, and assign a score indicating the
likelihood that the mail is spam. An external program must then
examine this score and do any routing the user wants done. There are
many programs that will easily perform these functions after examining
the score assigned by Apache SpamAssassin.
How Apache SpamAssassin Works
-----------------------------
Apache SpamAssassin uses a wide range of heuristic tests on mail headers and
body text to identify "spam", also known as unsolicited commercial
email.
Once identified, the mail can then be optionally tagged as spam for
later filtering using the user's own mail user-agent application.
Apache SpamAssassin typically differentiates successfully between spam and
non-spam in between 95% and 100% of cases, depending on what kind of mail
you get and your training of its Bayesian filter. Specifically,
Apache SpamAssassin has been shown to produce around 1.5% false negatives (spam
that was missed) and around 0.06% false positives (ham incorrectly marked
as spam). See the rules/STATISTICS*.txt files for more information.
Apache SpamAssassin also includes plugins to support reporting spam messages
automatically or manually to collaborative filtering databases such as
Pyzor, DCC, and Vipul's Razor.
The distribution provides "spamassassin", a command line tool to
perform filtering, along with the "Mail::SpamAssassin" module set
which allows Apache SpamAssassin to be used in spam-protection proxy SMTP or
POP/IMAP server, or a variety of different spam-blocking scenarios.
In addition, "spamd", a daemonized version of Apache SpamAssassin which
runs persistently, is available. Using its counterpart, "spamc",
a lightweight client written in C, an MTA can process large volumes of
mail through Apache SpamAssassin without having to fork/exec a perl interpreter
for each message.
Questions? Need Help?
---------------------
If you have questions about Apache SpamAssassin, please check the Wiki[1] to
see if someone has already posted an answer to your question. (The
Wiki doubles as a FAQ.) Failing that, post a message to the
spamassassin-users mailing list[2]. If you've found a bug (and you're
sure it's a bug after checking the Wiki), please file a report in our
Bugzilla[3].
[1]: http://wiki.apache.org/spamassassin/
[2]: http://wiki.apache.org/spamassassin/MailingLists
[3]: http://issues.apache.org/SpamAssassin/
Please also be sure to read the man pages.
Upgrading Apache SpamAssassin
-----------------------------
IMPORTANT: If you are upgrading from a previous major version of Apache
SpamAssassin, please be sure to read the notes in UPGRADE to find out
what has changed in a non- backward compatible way.
Installing Apache SpamAssassin
------------------------------
See the INSTALL file.
Customizing Apache SpamAssassin
-------------------------------
These are the configuration files installed by Apache SpamAssassin. The commands
that can be used therein are listed in the POD documentation for the
Mail::SpamAssassin::Conf class (run the following command to read it:
"perldoc Mail::SpamAssassin::Conf"). Note: The following directories are
the standard defaults that people use. There is an explanation of all the
default locations that Apache SpamAssassin will look at the end.
- /usr/share/spamassassin/*.cf:
Distributed configuration files, with all defaults. Do not modify
these, as they are overwritten when you upgrade.
- /var/lib/spamassassin/*/*.cf:
Local state directory; updated rulesets, overriding the
distributed configuration files, downloaded using "sa-update". Do
not modify these, as they are overwritten when you run
"sa-update".
- /etc/mail/spamassassin/*.cf:
Site config files, for system admins to create, modify, and
add local rules and scores to. Modifications here will be
appended to the config loaded from the above directory.
- /etc/mail/spamassassin/*.pre:
Plugin control files, installed from the distribution. These are
used to control what plugins are loaded. Modifications here will
be loaded before any configuration loaded from the above
directories.
You want to modify these files if you want to load additional
plugins, or inhibit loading a plugin that is enabled by default.
If the files exist in /etc/mail/spamassassin, they will not
be overwritten during future installs.
- /usr/share/spamassassin/user_prefs.template:
Distributed default user preferences. Do not modify this, as it is
overwritten when you upgrade.
- /etc/mail/spamassassin/user_prefs.template:
Default user preferences, for system admins to create, modify, and
set defaults for users' preferences files. Takes precedence over
the above prefs file, if it exists.
Do not put system-wide settings in here; put them in a file in the
"/etc/mail/spamassassin" directory ending in ".cf". This file is
just a template, which will be copied to a user's home directory
for them to change.
- $USER_HOME/.spamassassin:
User state directory. Used to hold spamassassin state, such
as a per-user automatic whitelist, and the user's preferences
file.
- $USER_HOME/.spamassassin/user_prefs:
User preferences file. If it does not exist, one of the
default prefs file from above will be copied here for the
user to edit later, if they wish.
Unless you're using spamd, there is no difference in
interpretation between the rules file and the preferences file, so
users can add new rules for their own use in the
"~/.spamassassin/user_prefs" file, if they like. (spamd disables
this for security and increased speed.)
- $USER_HOME/.spamassassin/bayes*
Statistics databases used for Bayesian filtering. If they do
not exist, they will be created by Apache SpamAssassin.
Spamd users may wish to create a shared set of bayes databases;
the "bayes_path" and "bayes_file_mode" configuration settings
can be used to do this.
See "perldoc sa-learn" for more documentation on how
to train this.
File Locations:
Apache SpamAssassin will look in a number of areas to find the default
configuration files that are used. The "__*__" text are variables
whose value you can see by looking at the first several lines of the
"spamassassin" or "spamd" scripts.
They are set on install time and can be overridden with the Makefile.PL
command line options DATADIR (for __def_rules_dir__) and CONFDIR (for
__local_rules_dir__). If none of these options were given, FHS-compliant
locations based on the PREFIX (which becomes __prefix__) are chosen.
These are:
__prefix__ __def_rules_dir__ __local_rules_dir__
-------------------------------------------------------------------------
/usr /usr/share/spamassassin /etc/mail/spamassassin
/usr/local /usr/local/share/spamassassin /etc/mail/spamassassin
/opt/$DIR /opt/$DIR/share/spamassassin /etc/opt/mail/spamassassin
$DIR $DIR/share/spamassassin $DIR/etc/mail/spamassassin
The files themselves are then looked for in these paths:
- Distributed Configuration Files
'__def_rules_dir__'
'__prefix__/share/spamassassin'
'/usr/local/share/spamassassin'
'/usr/share/spamassassin'
- Site Configuration Files
'__local_rules_dir__'
'__prefix__/etc/mail/spamassassin'
'__prefix__/etc/spamassassin'
'/usr/local/etc/spamassassin'
'/usr/pkg/etc/spamassassin'
'/usr/etc/spamassassin'
'/etc/mail/spamassassin'
'/etc/spamassassin'
- Default User Preferences File
'__local_rules_dir__/user_prefs.template'
'__prefix__/etc/mail/spamassassin/user_prefs.template'
'__prefix__/share/spamassassin/user_prefs.template'
'/etc/spamassassin/user_prefs.template'
'/etc/mail/spamassassin/user_prefs.template'
'/usr/local/share/spamassassin/user_prefs.template'
'/usr/share/spamassassin/user_prefs.template'
In addition, the "Distributed Configuration Files" location is overridden
by a "Local State Directory", used to store an updated copy of the
ruleset:
__prefix__ __local_state_dir__
-------------------------------------------------------------------------
/usr /var/lib/spamassassin/__version__
/usr/local /var/lib/spamassassin/__version__
/opt/$DIR /var/opt/spamassassin/__version__
$DIR $DIR/var/spamassassin/__version__
This is normally written to by the "sa-update" script. "__version__" is
replaced by a representation of the version number, so that multiple
versions of Apache SpamAssassin will not interfere with each other's rulesets.
After installation, try "perldoc Mail::SpamAssassin::Conf" to see what
can be set. Common first-time tweaks include:
- required_score
Set this higher to make Apache SpamAssassin less sensitive.
If you are installing Apache SpamAssassin system-wide, this is
**strongly** recommended!
Statistics on how many false positives to expect at various
different thresholds are available in the "STATISTICS.txt" file in
the "rules" directory.
- rewrite_header, add_header
These options affect the way messages are tagged as spam or
non-spam. This makes it easy to identify incoming mail.
- ok_locales
If you expect to receive mail in non-ISO-8859 character sets (ie.
Chinese, Cyrillic, Japanese, Korean, or Thai) then set this.
Learning
--------
Apache SpamAssassin includes a Bayesian learning filter, so it is worthwhile
training Apache SpamAssassin with your collection of non-spam and spam,
if possible. This will make it more accurate for your incoming mail.
Do this using the "sa-learn" tools, like so:
sa-learn --spam ~/Mail/saved-spam-folder
sa-learn --ham ~/Mail/inbox
sa-learn --ham ~/Mail/other-nonspam-folder
If these are mail folders in mbox format, use the --mbox switch, for
Maildirs use a trailing slash, like Maildir/cur/.
Use as many mailboxes as you like. Note that Apache SpamAssassin will remember
what mails it has learnt from, so you can re-run this as often as you like.
Localization
------------
All text displayed to users is taken from the configuration files. This
means that you can translate messages, test descriptions, and templates
into other languages.
If you do so, we would *really* appreciate it if you could contribute
these translations, so that they can be added to the
distribution. Please file a bug in our Bugzilla[4], and attach your
translations. You will, of course, be credited for this work!
[4]: http://issues.apache.org/SpamAssassin/
Disabled code
-------------
There are some tests and code in Apache SpamAssassin that are turned off by
default: experimental code, slow code, or code that depends on
non-open-source software or services that are not always free. These
disabled tests include:
- DCC: depends on non-open-source software (disabled in init.pre)
- MAPS: commercial service (disabled in 50_scores.cf)
- TextCat: slow (disabled in init.pre)
- various optional plugins, disabled for speed (disabled in *.pre)
To turn on tests disabled in 50_scores.cf, simply assign them a non-zero
score, e.g. by adding score lines to your ~/.spamassassin/user_prefs file.
To turn on tests disabled by commenting out the required plugin in
init.pre, you need to uncomment the loadplugin line and make sure the
prerequisites for proper operation of the plugin are present.
Automatic Reputation System
--------------------------
Apache SpamAssassin includes an automatic reputation system. The way it works is
by tracking for each sender address a rolling average score of messages
so far seen from there. Then, it combines this long-term average score
for the sender with the score for the particular message being evaluated,
after all other rules have been applied.
This functionality can be enabled or disabled with the
"use_txrep" option.
For more information, read sql/README.txrep
(end of README)
// vim:tw=74:

67
upstream/TRADEMARK Normal file
View File

@ -0,0 +1,67 @@
Guidelines for Use of the SpamAssassin Trademark
The goal of these guidelines is to minimize the likelihood that email
users and other relevant readers will be confused as to the relationship
between the licensees of the SpamAssassin(tm) technology, and the project
itself. The Apache Software Foundation has acquired the trademark and
will be taking the appropriate steps to protect the valuable goodwill that
has developed under the SpamAssassin mark.
What do these guidelines cover?
These guidelines explain how you can use the trademark SpamAssassin to
refer to the SpamAssassin software as required for reasonable and
customary use in describing the origin of the software and reproducing
the content of the NOTICE file. All such use is at your own risk.
Without advance written permission from the Apache Software Foundation,
you may not use the SpamAssassin trademark, logos, or artwork for any
other purpose. Under no circumstances may you use SpamAssassin in or as
part of a product, service, company, domain, or other name, or in any
slogan, tag line, promotional campaign title, advertising hook, or meta
tag. If you have any concerns about whether your plans might conflict
with these guidelines, please contact the Apache Software Foundation at
<human-response@apache.org>.
What are the rules?
Proper use of the SpamAssassin trademark is simple: (1) stylize the mark
properly, (2) use it in a grammatically appropriate way, (3) use the (tm)
marking and attribute ownership in the fine print, and (4) avoid any
misleading usage. The following paragraphs provide additional detail on
these four steps.
(1) Stylization of the SpamAssassin mark.
SpamAssassin should always be written in mixed case with the initial S and
the first A in assassin in capital letters, and the other letters in
lowercase type. There is no space between "Spam" and "Assassin".
(2) Grammatically Appropriate Use of the SpamAssassin mark.
A trademark is used correctly when the word "brand" would comfortably fit
between the mark and the following word. "SpamAssassin [brand] software"
and "SpamAssassin [brand] technology" are correct uses; "SpamAssassin
[brand] will intercept" is not. A trademark is an adjective that should
be followed by an appropriate generic term. It may be cumbersome to
always make technically correct use of the SpamAssassin trademark, but you
should do so the first time you refer to the mark and as often as possible
thereafter.
Because a trademark is an adjective, it should never be used in a
possessive form ("SpamAssassin's") or made plural ("SpamAssassins") or
used as a verb. You also should not create new forms of the trademark
("SpamAssassinate"). Consistency in repetition helps make a mark more
memorable, and will promote the success of the SpamAssassin project.
(3) Markings and Fine Print.
The appropriate marking to use with SpamAssassin is the (tm) symbol
(&#153;). At the bottom of the page in which SpamAssassin is referenced,
you should add the statement "SpamAssassin is a trademark of the Apache
Software Foundation".
(4) Avoiding Misleading Use.
The SpamAssassin mark must never be used to imply that the Apache Software
Foundation or the SpamAssassin project wrote, tested, endorses, or
approves any particular third party product.

589
upstream/UPGRADE Normal file
View File

@ -0,0 +1,589 @@
Note for Users Upgrading to SpamAssassin 3.4.4
----------------------------------------------
- FromNameSpoof: fns_extrachars parameter default value has been increased to 50
- nosubject and maxhits tflags now work correctly with sa-compile
Note for Users Upgrading to SpamAssassin 3.4.3
----------------------------------------------
- New subjprefix keyword added, this can be used to add a prefix to
email Subject if the original email matches a particular rule
- New Util::is_fqdn_valid() function to validate hostname (DNS name) format
(Bug 7736). To check if a name contains valid TLD, it's still needed to
additionally use RegistryBoundaries::is_domain_valid()
- New OLEVBMacro plugin to detect OLE Macro inside documents attached to emails,
this plugin requires Archive::Zip and IO::String Perl modules to work.
- Due to the dangerous nature of sa-update --allowplugins option, it
now prints a warning that --reallyallowplugins is required to use it.
This is to make sure all the legacy installations and wiki guides etc
still using it needlessly get fixed.
- TxRep and Awl plugins has been modified to be compatible
with latest Postgresql versions.
You should upgrade your sql database running the following command:
MySQL:
"ALTER TABLE `txrep` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
"ALTER TABLE `awl` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
"ALTER TABLE `awl` ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;"
PostgreSQL:
"ALTER TABLE txrep RENAME COLUMN count TO msgcount;"
"ALTER TABLE awl RENAME COLUMN count TO msgcount;"
"ALTER TABLE awl ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP;"
- body_part_scan_size 50000, rawbody_part_scan_size 500000 defaults added (Bug 6582)
These enable safer and faster scanning of large emails.
- ALL pseudo-header now returns decoded headers, so it's usage is consistent
with single header matching (:raw returns undecoded and folded like before).
- RegistryBoundaries did not load 20_aux_tlds.cf properly in older versions.
Old hardcoded list is now removed and RB will print "no tlds defined, need
to run sa-update" unless it can find list from config files.
- Deprecated functions: Parser::is_delimited_regexp_valid(),
Parser::is_regexp_valid(), Util::regexp_remove_delimiters(),
Util::make_qr(). These all are combined into new Util::compile_regexp().
- DNSEval: add check_rbl_headers to check specific headers in rbl
- DNSEval: add check_rbl_ns_from to check against an rbl for dns servers
- HashBL: Add check_hashbl_bodyre, check_hashbl_emails, check_hashbl_uris,
hashbl_ignore
- ASN: Support IPv6 with asn_lookup_ipv6 (Bug 7211)
- sa-update: New option --httputil to force used download utility
- Add rules_matching() expression to meta rules
- Add tflags domains_only/ips_only to DNSEval.pm functions
- RelayCountry: Added new metadata: X-Spam-Countries-External (_RELAYCOUNTRYEXT_),
X-Spam-Countries-Auth (_RELAYCOUNTRYAUTH_), X-Spam-Countries-All (_RELAYCOUNTRYALL_)
- New tflag "nosubject" for 'body' rules, to stop matching the Subject
header which is part of the body text.
Note for Users Upgrading to SpamAssassin 3.4.2
----------------------------------------------
- We now support SHA-512 and SHA-256 signatures for our rules updates.
- We may stop producing SHA-1 signatures in the near future so upgrading
to 3.4.2 is important. sa-update no longer uses these signatures.
See https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7614
- freemail_import_whitelist_auth, freemail_import_def_whitelist_auth added (Bug 6451)
New plugins
-----------
There are four new plugins added with this release:
Mail::SpamAssassin::Plugin::HashBL
The HashBL plugin is the interface to The Email Blocklist (EBL).
The EBL is intended to filter spam that is sent from IP addresses
and domains that cannot be blocked without causing significant
numbers of false positives.
Mail::SpamAssassin::Plugin::ResourceLimits
This plugin leverages BSD::Resource to assure your spamd child processes
do not exceed specified CPU or memory limit. If this happens, the child
process will die. See the BSD::Resource for more details.
Mail::SpamAssassin::Plugin::FromNameSpoof
This plugin allows for detection of the From:name field being used to mislead
recipients into thinking an email is from another address. The man page
includes examples and we expect to put test rules for this plugin into
rulesrc soon!
Mail::SpamAssassin::Plugin::Phishing
This plugin finds uris used in phishing campaigns detected by
OpenPhish (https://openphish.com) or PhishTank (https://phishtank.com) feeds.
These plugins are disabled by default. To enable, uncomment
the loadplugin configuration options in file v342.pre, or add it to
some local .pre file such as local.pre .
Notable changes
---------------
For security reasons SSLv3 support has been removed from spamc(1).
GeoIP2 support has been added to RelayCountry and URILocalBL plugins due
to GeoIP legacy api deprecations.
New configuration options
-------------------------
A new template tag _DKIMSELECTOR_ that maps to the DKIM selector (the 's' tag)
from valid signatures has been added.
A 'uri_block_cont' option to URILocalBL plugin to score uris per continent has been added.
Possible continent codes are:
af, as, eu, na, oc, sa for Africa, Asia, Europe, North America,
Oceania and South America.
The 'country_db_type' and 'country_db_path' options has been added to be able
to choose in RelayCountry plugin between GeoIP legacy
(discontinued from 04/01/2018), GeoIP2, IP::Country::Fast
and IP::Country::DB_File.
GeoIP legacy is still the default option but it will be deprecated
in future releases.
A config option 'uri_country_db_path' has been added to be able to choose
in URILocalBL plugin between GeoIP legacy and new GeoIP2 api.
A config option 'resource_limit_cpu' (default: 0 or no limit) has been added
to configure how many cpu cycles are allowed on a child process before it dies.
A config option 'resource_limit_mem' (default: 0 or no limit) has been added
to configure the maximum number of bytes of memory allowed both for
(virtual) address space bytes and resident set size.
A new config option 'report_wrap_width' (default: 70) has been added
to set the wrap width for description lines in the X-Spam-Report header.
Notable Internal changes
------------------------
SpamAssassin can cope with new Net::DNS module versions.
The "bytes" pragma has been remove from both core modules and plugins for
better utf-8 compatibility, there has been also some other utf-8 related fixes.
The spamc(1) client can now be build against OpenSSL 1.1.0.
The test framework has been switched to Test::More module.
Other updates
-------------
A list of top-level domains in registrar boundaries was updated.
Note for Users Upgrading to SpamAssassin 3.4.1
----------------------------------------------
- The TxRep plugin is now included and disabled by default for new installs.
To replace an existing AWL configuration with TxRep, follow the steps below:
- Disable AWL
- Enable TxRep
- Set txrep_factor equal to your previous AWL factor
- Set use_txrep to 1
For more detailed information and more configuration options, consult the
documentation in Mail::SpamAssassin::Plugin::TxRep.
- The $VALID_TLDS_RE global in registrar boundaries is deprecated but kept for
third-party plugin compatibility. It will become increasingly out of date
and may be removed in a future release.
See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
to obtain a valid list of TLDs, i.e.
$self->{main}->{registryboundaries}->{valid_tlds_re}
- Mail::SpamAssassin::Util::RegistrarBoundaries is being replaced by
Mail::SpamAssassin::RegistryBoundaries so that new TLDs can be updated via
20_aux_tlds.cf delivered via sa-update.
***3rd Party Plugin Authors, Please Note***
The following functions will be removed in the next release after 3.4.1
excepting any emergency break/fix releases immediately after 3.4.1:
Mail::SpamAssassin::Util::RegistrarBoundaries::is_domain_valid
Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain
Mail::SpamAssassin::Util::RegistrarBoundaries::split_domain
Mail::SpamAssassin::Util::uri_to_domain
And the following variables will also be removed:
Mail::SpamAssassin::Util::RegistrarBoundaries::US_STATES
Mail::SpamAssassin::Util::RegistrarBoundaries::THREE_LEVEL_DOMAINS
Mail::SpamAssassin::Util::RegistrarBoundaries::TWO_LEVEL_DOMAINS
Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS_RE
Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS
This change should only affect 3rd party plugin authors who will need to
update their code to utilize Mail::SpamAssassin::RegistryBoundaries
instead of the functions and variables in
Mail::SpamAssassin::Util::RegistrarBoundaries and the function
Mail::SpamAssassin::Util::uri_to_domain which are deprecated and will be
removed.
For example, the $VALID_TLDS_RE global in registrar boundaries is
deprecated but kept for third-party plugin compatibility. It will become
increasingly out of date and may be removed in a future release.
See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
to obtain a valid list of TLDs, i.e.
$self->{main}->{registryboundaries}->{valid_tlds_re}
- It is now recommended that users uncomment "normalize_charset 1" in
local.cf. It will break rules that depend on messages being in non-UTF8
encodings, but going forward this will enable more robust unicode rules and
will eventually become the default.
Note for Users Upgrading to SpamAssassin 3.4.0
----------------------------------------------
- When Bayes classification is in use and messages are 'learned' as spam
or ham and stored in a database, the Bayes plugin generates internal
message IDs of learned messages and stores them in a 'seen' database to
avoid re-learning duplicates and accidental un-learning messages that
were not previously learned. With changes in bug 5185, the calculation
of message IDs in a bayes 'seen' database has changed, so new code can
no longer associate new messages with those learned before the change.
- Note that this change does not affect recognition of old tokens and the
classification algorithm, only duplicate detection and unlearning of old
messages is affected.
- Because of this change, if you use Bayes and you are upgrading from a
version prior to 3.4.0, you may consider wiping your Bayes database
and starting fresh.
- There is a new optional dependency on Net::Patricia to speed up lookups
on internal_networks, trusted_networks or msa_networks when these lists
contain a larger number of entries. Consider installing this module to
speed up classification.
- The minimal required version of NetAddr::IP was bumped to 4.010
- In addition to existing backends, the 3.4.0 introduces support for keeping
a Bayes database on a Redis server, either running locally, or accessed
over network. Similar to SQL backends, the database may be concurrently
used by several hosts running SpamAssassin.
- For more detail on these and other changes, please see the Announcement
file at:
http://svn.apache.org/repos/asf/spamassassin/branches/3.4/build/announcements/3.4.0.txt
Note for Users Upgrading to SpamAssassin 3.3.0
-----------------------------------------------
- Rules are no longer included with SpamAssassin "out of the box". You will
need to immediately run "sa-update", or download the additional rules .tgz
package and run "sa-update --install" with it, to get a ruleset.
- The BETA label has been taken off of the SpamAssassin SQL support. Please
be aware that occasional changes may still be made to this area of the
code. You should be sure to read this upgrade document each time you
install a new version to determine if any SQL updates need to be made to
your local installation.
- The DKIM plugin is now enabled by default for new installs, if the perl
module Mail::DKIM is installed. However, installation of SpamAssassin
will not overwrite existing .pre configuration files, so to use DKIM when
upgrading from a previous release that did not use DKIM, a directive:
loadplugin Mail::SpamAssassin::Plugin::DKIM
will need to be uncommented in file "v312.pre", or added to some
other .pre file, such as local.pre.
Note for Users Upgrading to SpamAssassin 3.2.0
-----------------------------------------------
- The "127/8" network, including 127.0.0.1, is now always implicitly part of
"trusted_networks" and "internal_networks". It's impossible to remove these
from the trusted/internal sets, since if you cannot trust the host where
SpamAssassin is running, you cannot trust SpamAssassin itself. If you
previously had "trusted_networks" and "internal_networks" lines listing those
hosts, you should now remove them, otherwise a minor (non-lint-error) warning
will be output.
- For ISPs -- version 3.2.0 now includes a new way to specify Mail Submission
Agents, relay hosts which accept mail from your own users and authenticates
them appropriately. See the Mail::SpamAssassin::Conf manual page for the
"msa_networks" setting.
Note for Users Upgrading to SpamAssassin 3.1.0
-----------------------------------------------
- A significant amount of core functionality has been moved into
plugins. These include, AWL (auto-whitelist), DCC, Pyzor, Razor2,
SpamCop reporting and TextCat. For information on configuring these
plugins please refer to their individual documentation:
perldoc Mail::SpamAssassin::Plugin::* (ie AWL, DCC, etc)
- There are now multiple files read to enable plugins in the
/etc/mail/spamassassin directory; previously only one, "init.pre" was
read. Now both "init.pre", "v310.pre", and any other files ending
in ".pre" will be read. As future releases are made, new plugins
will be added to new files named according to the release they're
added in.
- Due to license restrictions the DCC and Razor2 plugins are disabled
by default. We encourage you to read the appropriate license
yourself and decide if you are able to re-enable the plugins for
your site.
- The use_auto_whitelist config option has been moved to a user config
option, this allows individual users to turn on or off whitelisting
regardless of what is set in the system config. If you would like to
disable AWL (auto-whitelist) on a site-wide basis, then you can comment
out the plugin in "v310.pre".
- The bayes_auto_learn_threshold_* config options for bayes have moved
to a plugin. In general the default should work just fine however
if you are interested in changing these values you should see:
perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
- The AWL support for NDBM_File databases has been dropped, due to a
bug in that package which renders it useless (bug 4353). It appears
that SDBM_File, the package which will be used instead, is fully
compatible with NDBM however, so this should be unnoticeable.
- The prefork algorithm for spamd has been changed. In this version spamd
will attempt to keep a small number of "hot" child processes as busy as
possible, and keep any others as idle as possible, using something
similar to the Apache httpd server scaling algorithm. This reduces
memory usage and swapping. You can use the --round-robin switch for
spamd to disable this scaling algorithm, and the behaviour seen in the
3.0.x versions will be used instead, where all processes receive an
equal load and no scaling takes place.
- As of 3.1.0, in addition to the generic BayesSQL support (via
Mail::SpamAssassin::BayesStore::SQL) usable by multiple database
drivers there is now specific support for MySQL 4.1+ and
PostgreSQL. This support is based on non-standard features present
in both database servers that allow for various performance boosts.
If you were using the previous BayesSQL support with MySQL, and
already have MySQL 4.1+ installed you can begin using the new module
immediately by replacing the bayes_store_module line in your
configuration with: Mail::SpamAssassin::BayesStore::MySQL
We do however recommend that you switch your MySQL tables over to
InnoDB for better data integrity and multi user support. You can
most often do this via a simple ALTER TABLE command, refer to the
MySQL documentation for more information.
If you were previously using PostgreSQL for your bayes database then
we STRONGLY recommend switching to the PostgreSQL specific backend:
Mail::SpamAssassin::BayesStore::PgSQL
To switch to this backend you should first run sa-learn --backup to
make a backup of your existing data and then drop and recreate the
database following the instructions in sql/README.bayes. Then you
can restore the database with sa-learn --restore. If you have
multiple users then you will have to run --backup and --restore for
each user to fully restore the database.
- See http://wiki.apache.org/spamassassin/UpgradeTo310 for a
supplementary list of upgrade notes. It will be updated with any
upgrade notes not captured in this document.
- Finally, this document is likely not complete. Other configuration
options/arguments may have changed from older versions, etc. It would
be good to double-check any custom configuration options to make sure
they're still valid. This could be as simple as running "spamassassin
--lint", or more complex, as required by the environment.
Note for Users Upgrading to SpamAssassin 3.0.x
----------------------------------------------
- The SpamAssassin 2.6x release series was the last set of releases to
officially support perl versions earlier than perl 5.6.1. If you are
using an earlier version of perl, you will need to upgrade before you
can use the 3.0.0 version of SpamAssassin. You will also want to make
sure that you have the appropriate versions of required and optional
modules as they may have changed from old versions. The INSTALL
document has the modules and version requirements listed.
- See http://wiki.apache.org/spamassassin/UpgradeTo300 for a
supplementary list of upgrade notes. It will be updated with any
upgrade notes not captured in this document.
- SpamAssassin 3.0.0 has a significantly different API (Application Program
Interface) from the 2.x series of code. This means that if you use
SpamAssassin through a third-party utility (milter, etc,) you need to make
sure you have an updated version which supports 3.0.0. See
http://wiki.apache.org/spamassassin/UpgradeTo300 for information about
third-party software.
- The --auto-whitelist, --whitelist and -a options for "spamd" and
"spamassassin" to turn on the auto-whitelist have been removed and
replaced by the "use_auto_whitelist" configuration option which is
also now turned on by default.
- The --virtual-config switch for spamd had to be dropped, due to licensing
issues. It is replaced by the --virtual-config-dir switch.
- The "rewrite_subject" and "subject_tag" configuration options were
deprecated and are now removed. Instead, using "rewrite_header Subject
[your desired setting]". e.g.
rewrite_subject 1
subject_tag ****SPAM(_SCORE_)****
becomes
rewrite_header Subject ****SPAM(_SCORE_)****
- The "sa-learn --rebuild" command has been deprecated; please use
"sa-learn --sync" instead. The --rebuild option will remain temporarily
for backward compatibility.
- The Bayesian storage modules have been completely re-written and now
include Berkeley DB (DBM) storage as well as SQL based storage (see
sql/README.bayes for more information). In addition, a new format
has been introduced for the bayes database that stores tokens in fixed
length hashes (Bayes v3). All DBM databases should be automatically
converted to this new format the first time they are opened for write.
You can manually perform the upgrade by running "sa-learn --sync"
from the command line.
Due to the database format change, you will want to do something like
this when upgrading:
- stop running spamassassin/spamd (ie: you don't want it to be running
during the upgrade)
- run "sa-learn --rebuild", this will sync your journal. if you skip
this step, any data from the journal will be lost when the DB is
upgraded.
- upgrade SA to 3.0.0
- run "sa-learn --sync", which will cause the db format to be upgraded.
if you want to see what is going on, you can add the "-D" option.
- test the new database by running some sample mails through
SpamAssassin, and/or at least running "sa-learn --dump" to make sure
the data looks valid.
- start running spamassassin/spamd again
If, instead of uprading your Bayes database, you want to wipe it and
start fresh, you can run "sa-learn --clear" to safely remove your
Bayes database files. If the --clear command issues an error then
you can simply delete the Bayes database files ("bayes_*") while SA
is not running; SpamAssassin will recreate them in the current
format when it runs.
- "spamd" now has a default max-children setting of 5; no more than 5
child scanner processes will be run in parallel. Previously, there was
no default limit unless you specified the "-m" switch when starting
spamd.
- If you are using a UNIX machine with all database files on local disks,
and no sharing of those databases across NFS filesystems, you can use a
more efficient, but non-NFS-safe, locking mechanism. Do this by adding
the line "lock_method flock" to the /etc/mail/spamassassin/local.cf
file. This is strongly recommended if you're not using NFS, as it is
much faster than the NFS-safe locker.
- Please note that the use of the following commandline parameters for
spamassassin and spamd have been deprecated and may be removed in
upcoming versions of SpamAssassin. Please discontinue usage of these
options:
in the 2.6x series: --add-from, --pipe, -F, --stop-at-threshold,
-S, -P (spamassassin only)
in the 3.0.x series: --auto-whitelist, -a, --whitelist-factory, -M,
--warning-from, -w, --log-to-mbox, -l
- user_scores_sql_table is no longer supported. If you need to use a table
name, other than the default, create a custom query using the
user_scores_sql_custom_query config option.
- SpamAssassin runs in "taint mode" by default for improved security.
Certain third-party modules may be incompatible with taint mode.
- 2.6x deprecated the use of the "check_bayes_db" script, and it
has been removed in 3.0.0. Please see the sa-learn man/pod
documentation for more info.
- Finally, this document is likely not complete. Other configuration
options/arguments may have changed from older versions, etc. It would
be good to double-check any custom configuration options to make sure
they're still valid. This could be as simple as running "spamassassin
--lint", or more complex, as required by the environment.
An example: "require_version <version>" hasn't changed itself, but the
internal version representation is now "x.yyyzzz" instead of "x.yz"
which could cause issues if "require_version 3.00" is expected to work
(it won't, it needs to be "require_version 3.000000").
Note for Users Upgrading from SpamAssassin 2.5x
-----------------------------------------------
- Due to major reliability shortcomings in the database support libraries
other than DB_File, we now require that the DB_File module be installed
to use SpamAssassin's Bayes rules.
SpamAssassin will still work without DB_File installed, but the Bayes
support will be disabled.
If you install DB_File and wish to import old Bayes database data, each
user with a Bayes db should run "sa-learn --import" to copy old entries
from the other formats into a new DB_File file.
Due to the database library change, and the change to the database
format itself, you will want to do something like this when upgrading:
- stop running spamassassin/spamd (ie: you don't want it to be running
during the upgrade)
- run "sa-learn --rebuild", this will sync your journal. if you skip
this step, any data from the journal will be lost when the DB is
upgraded.
- install DB_File module if necessary
- upgrade SA to 3.0.0
- if you were using another database module previously, run "sa-learn
--import" to migrate the data into new DB_File files
- run "sa-learn --sync", which will cause the db format to be upgraded.
if you want to see what is going on, you can add the "-D" option.
- test the new database by running some sample mails through
SpamAssassin, and/or at least running "sa-learn --dump" to make sure
the data looks valid.
- start running spamassassin/spamd again
Obviously the steps will be different depending on your environment, but
you get the idea. :)
Note For Users Upgrading From SpamAssassin 2.3x or 2.4x
-------------------------------------------------------
- SpamAssassin no longer includes code to handle local mail delivery, as
it was not reliable enough, compared to procmail. So now, if you relied
on spamassassin to write the mail into your mail folder, you'll have to
change your setup to use procmail as detailed below. If you used
spamassassin to filter your mail and then something else wrote it into a
folder for you, then you should be fine.
- Support for versions of the optional Mail::Audit module is no longer
included.
- The default mode of tagging (which used to be ***SPAM*** in the subject
line) no longer takes place. Instead the message is rewritten. If an
incoming message is tagged as spam, instead of modifying the original
message, SpamAssassin will create a new report message and attach the
original message as a message/rfc822 MIME part (ensuring the original
message is completely preserved and easier to recover). If you do not
want to modify the body of incoming spam, use the "report_safe" option.
The "report_header" and "defang_mime" options have been removed as a
result.
(end of UPGRADE)
//vim:tw=74:

250
upstream/USAGE Normal file
View File

@ -0,0 +1,250 @@
Important Note For Users Upgrading From Earlier Versions
--------------------------------------------------------
SpamAssassin no longer includes code to handle local mail delivery, as it
was not reliable enough, compared to procmail. So now, if you relied on
spamassassin to write the mail into your mail folder, you'll have to
change your setup to use procmail as detailed below.
If you used spamassassin to filter your mail and then something else wrote
it into a folder for you, then you should be fine.
Steps to take for every installation:
- Install Mail::SpamAssassin on your mail server, as per the INSTALL
document.
- Test it:
spamassassin -t < sample-nonspam.txt > nonspam.out
spamassassin -t < sample-spam.txt > spam.out
Verify (using a text viewer, ie. "less" or "notepad") that nonspam.out
has not been tagged as spam, and that spam.out has. The files should
contain the full text and headers of the messages, the "spam.out"
message should contain the header "X-Spam-Flag: YES" and be annotated
with a report from SpamAssassin, and there should be no errors when you
run the commands.
Even though sample-nonspam.txt is not spam, nonspam.out will
contain a SpamAssassin report anyway. This is a side-effect of
the "-t" (test) switch. However, there should be less than 5
points accumulated; when the "-t" switch is not in use, the report
text would not be added. For more verbose (debugging) output, add
the "-D" switch.
If the commands do not work, DO NOT PROCEED TO THE NEXT STEP, as you
will lose mail!
If you use KMail:
- http://kmail.kde.org/tools.html mentions:
The filter setup is the work of five minutes (if that!) if you have a
working spamassassin set up.
The filter in question is "<any header><matches regexp> ."
The action is "<pipe through> spamassassin"
Then, in the advanced options, uncheck the "If this filter matches,
stop processing here" box. If you keep this filter at the top, it will
analyze any incoming mail, decide whether it's spam or not, and flag
it accordingly.
[Then add] a second filter behind it, which searches for the added
spam-flags and diverts them into a specific spam folder. [...]
If you use procmail, or haven't decided on any of the above examples:
- Make a backup of your .procmailrc (if you already have one).
cp ~/.procmailrc ~/.procmailrc.bak
- add the line from procmailrc.example to ~/.procmailrc, at the top of
the file before any existing recipes.
That'll process all mail through SA, and refile spam messages to
a folder called "caughtspam" in your home directory.
- Send yourself a mail message, and ensure it gets to you. If it does
not, copy your old backed-up .procmailrc file back into place and ask
your sysadmin for help! Here's commands to do that:
cp ~/.procmailrc.bak ~/.procmailrc
echo "Help!" | mail root
If you want to use SpamAssassin site-wide:
- take a look at the notes on the Wiki website, currently at
<http://wiki.apache.org/spamassassin/UsingSiteWide>. You will probably
want to use 'spamd' (see below). You may want to investigate the
new Apache mod_perl module, in the 'spamd-apache2' directory, too.
- *PLEASE* let your users know you've installed it, and how to turn it
off! This is our #1 tech support query, and the users are usually
pretty frustrated once it reaches that stage.
- *PLEASE* consider setting it up as "off by default" for most accounts,
and let users opt-in to using it. Quite a few folks prefer not to
have their mail filtered, presumably because they don't use their
email address publicly and do not get much spam.
- Note that procmail users adding spamc to /etc/procmailrc should
add the line 'DROPPRIVS=yes' at the top of the file.
The Auto-Whitelist
------------------
The auto-whitelist is enabled using the 'use_auto_whitelist' option.
(See http://wiki.apache.org/spamassassin/AutoWhitelist for details on
how it works, if you're curious.)
Other Installation Notes
------------------------
- Hashcash is a useful system; it requires that senders exercise a
CPU-intensive task before they can send mail to you, so we give that
some bonus points. However, it requires that you list what addresses
you expect to receive mail for, by adding 'hashcash_accept' lines to
your ~/.spamassassin/user_prefs or /etc/mail/spamassassin/local.cf
files. See the Mail::SpamAssassin::Plugin::Hashcash manual page for
details on how to specify these.
- SpamAssassin now uses a temporary file in /tmp (or $TMPDIR, if that's
set in the environment) for Pyzor and DCC checks. Make sure that this
directory is either (a) not writable by other users, or (b) not shared
over NFS, for security.
- You can create your own system-wide rules files in
/etc/mail/spamassassin; their filenames should end in ".cf". Multiple
files will be read, and SpamAssassin will not overwrite these files
when installing a new version.
- You should not modify the files in /usr/share/spamassassin; these
will be overwritten when you upgrade. Any changes you make in
files in the /etc/mail/spamassassin directory, however, will
override these files.
- Rules can be turned off by setting their scores to 0 in a
configuration or user-preference file.
- Speakers of Chinese, Japanese, Korean or Arabic may find it useful to
turn off the rules listed at the end of the "user_prefs.template"
file; we've found out that these rules are still triggering on
non-spam CJK mails.
- If you have an unusual network configuration, you should probably
set 'trusted_networks'. This allows SpamAssassin to determine where
your internal network ends and the internet begins, and allows DNS
checks to be more accurate. If your mail host is NATed, you will
almost certainly need to set 'trusted_networks' to get correct
results.
- A very handy new feature is SPF support, which allows you to check
that the message sender is permitted by their domain to send from the
IP address used. This has the potential to greatly cut down on mail
forgery. (see http://spf.pobox.com/ for more details.)
- MDaemon users should add this line to their "local.cf" file:
report_safe_copy_headers X-MDRcpt-To X-MDRemoteIP X-MDaemon-Deliver-To
Otherwise, MDaemon's internal delivery will fail when SpamAssassin
rewrites a message as spam.
- The distribution includes 'spamd', a daemonized version of
SpamAssassin which runs persistently. Using its counterpart,
'spamc', a lightweight client written in C, an MTA can process
large volumes of mail through SpamAssassin without having to
fork/exec a perl interpreter for each message. Take a look in the
'spamd' and 'spamc' directories for more details.
- The distribution also includes 'spamd-apache2', a mod_perl module
allowing the Apache HTTP server to be used as a platform for a
daemonized SpamAssassin, in an upwardly-compatible fashion from
'spamd'. If you don't require some of the spamd features it does not
implement (such as switching UIDs to read per-user configuration from
user home directories), this may be much faster than spamd. Take a
look at the 'spamd-apache2' directory for details.
- spamc can now be built as a shared library for use with milters or
to link into other existing programs; simply run "make libspamc.so"
to build this.
- If you get spammed, it is helpful to everyone else if you re-run
spamassassin with the "-r" option to report the message in question as
"verified spam". This will add it to Vipul's Razor, DCC and Pyzor,
assuming you've set these up appropriately.
spamassassin -r < spam-message
If you use mutt as your mail reader, this macro will bind the X key to
report a spam message.
macro index X "| spamassassin -r"
This is, of course, optional -- but you'll get lots of good-netizen
karma. ;)
- Quite often, if you've been on the internet for a while, you'll have
accumulated a few old email accounts that nowadays get nothing but
spam. You can set these up as spam traps using SpamAssassin; see the
''SPAM TRAPPING'' section of the spamassassin manual page for details.
If you don't want to go to the bother of setting up a system yourself
to do this, take a look here [1] for a simple forwarding-based
alternative.
[1]: http://wiki.apache.org/spamassassin/SpamTrapping
- Scores and other user preferences can now be loaded from, and Bayes
and automatic reputation data can be stored in, an SQL database; see
the 'sql' subdirectory for more details.
If you are setting up a large 'spamd' system-wide installation, with
Bayes and/or automatic reputation, we strongly recommend using SQL as
storage. It has proven more reliable than the default DB_File storage
backend at several large sites. For Bayes, you should always use the
new Redis backend (see Mail::SpamAssassin::BayesStore::Redis).
- If you are running SpamAssassin under a disk quota, or are setting up
'spamd' with users with disk quotas, be warned that the DB_File
database module used by SpamAssassin for Bayes, TxRep, and AWL storage
seems to be unreliable in the face of quotas (bug 3796). In this
situation, we recommend using SQL storage for those databases, instead
of DB_File.
- Lots more ways to integrate SpamAssassin can be read at
http://wiki.SpamAssassin.org/ .
(end of USAGE)
// vim:tw=74:

View File

@ -0,0 +1,6 @@
#!/usr/bin/perl
use strict;
use lib 'lib';
use Mail::SpamAssassin::Util::DependencyInfo;
exit Mail::SpamAssassin::Util::DependencyInfo::long_diagnostics();

View File

@ -0,0 +1,17 @@
#!/usr/bin/perl
use Pod::Html;
use Pod::Text ();
foreach (@ARGV) {
$in = $_;
s,^(lib|spamd|spamc)/|\.(pod|pm)$,,g;
tr,/,_,;
# convert to HTML: doc/foo.html
pod2html ("--infile=$in", "--outfile=doc/$_.html");
# and to text: doc/foo.txt
my $parser = Pod::Text->new ();
$parser->parse_from_file ($in, "doc/$_.txt");
}

15
upstream/build/get_version Executable file
View File

@ -0,0 +1,15 @@
#!/usr/bin/perl
open (IN, "< lib/Mail/SpamAssassin.pm")
or die "cannot open lib/Mail/SpamAssassin.pm\n";
while (<IN>) {
($VERSION) = /^\s*\$VERSION\s*=\s*\"(\S+)\"\;/ unless ( $VERSION );
($EXTRA) = /^\s*\@EXTRA_VERSION\s*=\s*q\w\((\S+?)\)/ unless ( $EXTRA );
($DEVEL) = /^\s*\$IS_DEVEL_BUILD\s*=\s*(1);/ unless ( $DEVEL );
}
close IN;
print $VERSION;
print "-$EXTRA" if ( $EXTRA );
print "-svn" if ( $DEVEL );
print "\n";

1276
upstream/build/mkrules Executable file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,219 @@
#!/usr/bin/perl -w
#
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
use strict;
sub usage {
die "
parse-rules-for-masses: parse the SpamAssassin rules files for mass-checks,
evolving, and frequency analysis
usage: ./parse-rules-for-masses [-d rulesdir] [-o outputfile] [-s scoreset] [-x]
rulesdir defaults to ../rules
outputfile defaults to ./tmp/rules.pl
scoreset default to 0
-x do not include test rules files (ie 70_*)
";
}
use Getopt::Long;
use Data::Dumper;
our (@rulesdirs, $outputfile, $scoreset, $skip_test_rules);
GetOptions (
"d=s" => \@rulesdirs,
"o=s" => \$outputfile,
"s=i" => \$scoreset,
"x" => \$skip_test_rules,
"help|h|?" => sub { usage(); } );
if ($#rulesdirs < 0) {
@rulesdirs = ("../rules");
}
if (!defined $outputfile) {
$outputfile = "./tmp/rules.pl";
mkdir ("tmp", 0755);
}
$scoreset = 0 if ( !defined $scoreset );
my $rules = { };
$rules->{_scoreset} = $scoreset;
readrules(@rulesdirs);
my $scores = { };
foreach my $key (keys %{$rules}) {
next if $key eq '_scoreset';
$scores->{$key} = $rules->{$key}->{score};
}
writerules($outputfile);
exit;
sub readrules {
foreach my $indir (@_) {
my @files = <$indir/*.cf>;
my $file;
my $scores_mutable = 1;
my %rulesfound = ();
my %langs = ();
foreach $file (sort @files) {
$scores_mutable = 1; # always start off mutable in each file
if ($skip_test_rules) {
next if ($file =~ /70_/);
}
open (IN, "<$file");
while (<IN>)
{
# these appear in comments, so deal with them before comment stripping
# takes place
if (/<\/gen:mutable>/i) {
$scores_mutable = 0;
}
elsif (/<gen:mutable>/i) {
$scores_mutable = 1;
}
s/#.*$//g; s/^\s+//; s/\s+$//; next if /^$/;
# TODO: this could be overwriting stuff
my $lang = '';
if (s/^lang\s+(\S+)\s+//) {
$lang = $1;
}
if (/^(header|rawbody|body|full|uri|askdns|meta|mimeheader|reuse)\s+(\S+)\s+(.*)$/) {
my $type = $1;
my $name = $2;
my $val = $3;
if (exists $rules->{$name}->{type} && $type eq 'reuse') {
# "reuse" should be skipped if we already have a rule
next;
}
$rules->{$name} ||= { };
$rules->{$name}->{type} = $type;
$rules->{$name}->{lang} = $lang;
$rules->{$name}->{issubrule} = ($name =~ /^__/) ? '1' : '0';
$rules->{$name}->{tflags} = '';
$rules->{$name}->{eval} = ($val =~ /\beval:(\w+)/) ? $1 : '0';
if ($type eq "meta") {
my @depends = grep { !/^\d+$/ } ($val =~ m/(\w+)/g);
push(@{ $rules->{$name}->{depends} }, @depends);
}
$rules->{$name}->{code} = $val;
} elsif (/^describe\s+(\S+)\s+(.+)$/) {
$rules->{$1} ||= { };
if ($lang) {
$rules->{$1}->{describe} ||= $2;
}
else {
$rules->{$1}->{describe} = $2;
}
} elsif (/^tflags\s+(\S+)\s+(.+)$/) {
$rules->{$1} ||= { };
$rules->{$1}->{tflags} = $2;
} elsif (/^score\s+(\S+)\s+(.+)$/) {
my($name,$score) = ($1,$2);
$rules->{$name} ||= { };
if ( $score =~ /\s/ ) { # there are multiple scores
($score) = (split(/\s+/,$score))[$scoreset];
}
$rules->{$name}->{score} = $score;
$rules->{$name}->{mutable} = $scores_mutable;
}
}
close IN;
}
}
foreach my $rule (keys %{$rules}) {
next if ($rule eq '_scoreset');
if (!defined $rules->{$rule}->{type}) {
delete $rules->{$rule}; # no rule definition -> no rule
next;
}
my $tflags = $rules->{$rule}->{tflags};
if (!defined $rules->{$rule}->{score}) {
my $def = 1.0;
if ($rule =~ /^T_/) {
$def = 0.01;
}
if ($tflags =~ /\bnice\b/) {
$rules->{$rule}->{score} = -$def;
} else {
$rules->{$rule}->{score} = $def;
}
$rules->{$rule}->{no_score_found} = 1;
}
# ignore net rules in set 0 or set 2
if ($tflags =~ /\bnet\b/ && ($scoreset & 1) == 0) {
$rules->{$rule}->{mutable} = 0;
$rules->{$rule}->{score} = 0;
}
# ignore bayes rules in set 0 or set 2
if ($tflags =~ /\blearn\b/ && ($scoreset & 2) == 0) {
$rules->{$rule}->{mutable} = 0;
$rules->{$rule}->{score} = 0;
}
# if a rule didn't have a score specified, assume it's
# mutable
if (!defined $rules->{$rule}->{mutable}) {
$rules->{$rule}->{mutable} = 1;
}
# although T_ test rules are clamped to 0.01. this works well
# for release mass-checks, at least
if ($rule =~ /^T_/) {
$rules->{$rule}->{mutable} = 0;
} elsif ($rule eq 'AWL') { # ignore entirely
$rules->{$rule}->{mutable} = 0;
$rules->{$rule}->{score} = 0;
}
}
}
sub writerules {
my $outfile = shift;
# quick hack to create the tmp directory
system ("mkdir -p $outfile 2>/dev/null ; rmdir $outfile 2>/dev/null");
open (OUT, ">$outfile") or die "cannot write to $outfile";
print OUT "# dumped at ".`date`."\n";
$Data::Dumper::Purity = 1;
print OUT Data::Dumper->Dump ([$rules, $scores], ['*rules', '*scores']);
print OUT "1;";
close OUT;
}

228
upstream/build/preprocessor Executable file
View File

@ -0,0 +1,228 @@
#!/usr/bin/perl
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
# This script isn't supposed to be run by hand, it's used by `make` as a pre-
# processor. It currently accepts these options on the command line:
#
# -M<module> Enables <module>
# -D<variable>=<value> Defines the <variable> to be <value>; if the value
# doesn't contain an equal sign, it is interpreted
# as a file and all of its lines containing equal
# signs are taken as <variable>=<value> pairs
#
# and some more to help with non-UNIX platforms, where command-line input
# and output redirection are not always available:
#
# -i<file> Read from input file <file>
# -o<file> Write to output file <file>
#
# -I<dir> Read from input directory <dir>
# -O<dir> Write to output directory <dir>
# <filename> ... Process named files from -I<dir> to -O<dir>
#
# -m<perm> Use chmod permissions <perm> for files
#
# Those modules are currently implemented:
# conditional Comments out every line containing the string
# REMOVEFORINST
# vars Replaces variables: upper case strings surrounded
# by double at-signs, eg. @@VERSION@@. The values are
# taken from the environment and can be overwritten with
# the -D switch. Empty/undefined variables are removed.
# sharpbang Does some sharpbang (#!) replacement. Uses PERL_BIN and
# PERL_WARN.
use strict;
use warnings;
my %modules = ();
my %defines = ();
my @infiles = ();
my $infile;
my $outfile;
my $indir;
my $outdir;
my $mode;
# Each environment variable counts as an own defined var for us.
foreach (keys %ENV) {
$defines{$_} = $ENV{$_};
}
foreach (@ARGV) {
if (/^-M([a-z]+)$/) { $modules{$1} = 1; }
elsif (/^-D([A-Z0-9_]+)=(.*)$/) { $defines{$1} = $2; }
elsif (/^-D([^=]+)$/) { read_defs($1); }
elsif (/^-i(.+)$/) { $infile = $1; }
elsif (/^-o(.+)$/) { $outfile = $1; }
elsif (/^-I(.+)$/) { $indir = $1; }
elsif (/^-m(.*)$/) { $mode = '0'.$1; }
elsif (/^-O(.+)$/) { $outdir = $1; }
elsif (/^(.+)$/) { push (@infiles, $1); }
}
# On Windows, we get -m without an arg. avoid problems with that
# by just ignoring that switch.
$mode = undef unless $mode;
if (defined ($indir) && defined ($outdir) && scalar @infiles > 0) {
require File::Spec;
my $fname;
while ($fname = shift @infiles) {
my $in = File::Spec->catfile ($indir, $fname);
my $out = File::Spec->catfile ($outdir, $fname);
do_file ($in, $out);
}
}
elsif (defined ($infile) && defined($outfile)) {
do_file ($infile, $outfile);
}
else {
# just do STDIN/STDOUT . Not recommended for portability as
# it requires "<" and ">" for Makefile to do its work.
#
do_stdin();
}
sub read_defs {
my ($in) = @_;
open (DEFS, "<$in") or die "Cannot open $in: $!";
foreach (<DEFS>) {
$_ =~ s/^\s+|\s+$//g;
next if /^#/;
next unless /=/;
my ($var, $val) = split(/\s*=\s*/, $_, 2);
$var =~ tr/A-Z_//cd;
$defines{$var} = $val;
}
close (DEFS);
}
sub do_file {
my ($in, $out) = @_;
open (FOOIN, "<$in") or die "Cannot open $in: $!";
open (FOOOUT, ">$out") or die "Cannot open $out: $!";
do_it();
close (FOOIN);
close (FOOOUT);
if (defined $mode) {
chmod (oct $mode, $out) or die "Cannot chmod $mode $out: $!";
}
}
sub do_stdin {
open (FOOIN, "<&STDIN") or die "Cannot dup stdin: $!";
open (FOOOUT, ">&STDOUT") or die "Cannot dup stdout: $!";
do_it();
close (FOOIN);
close (FOOOUT);
}
sub do_it {
# The perlpath can be overwritten via -DPERL_BIN=<perlpath>
my $perl;
if($defines{'PERL_BIN'} && ($defines{PERL_BIN} ne 'this')) {
$perl = $defines{'PERL_BIN'};
unless(-x $perl) {
warn("No such PERL_BIN: $perl");
}
}
else {
# use eval so the module is not loaded unless needed; it's slow
eval 'use Config; $perl = $Config{"perlpath"};';
}
# Warnings are enabled per default
my $perl_warn = ' -w';
# The warnings can be overwritten via -DPERL_WARN=<yes|no>
if ($defines{'PERL_WARN'} and $defines{'PERL_WARN'} eq 'no') {
$perl_warn = '';
}
# Taint mode is enabled per default except on 5.005
my $perl_taint = ' -T';
# The taint mode can be disabled with -DPERL_TAINT=<yes|no>
if ($defines{'PERL_TAINT'} and $defines{'PERL_TAINT'} eq 'no') {
$perl_taint = '';
}
$defines{PERL_MAJOR_VER} = sub {
$] =~ /^(\d\.\d\d\d)/ or die "bad perl ver $]";
return $1;
};
$defines{PLUGIN_POD} = sub {
# Grab active plugin list
my @plugin_pod = ();
foreach my $pre (<rules/*.pre>) {
if (open(INIT, $pre)) {
while (<INIT>) {
if (/^loadplugin\s+(.*?)\s*$/) { push(@plugin_pod, " $1\n"); }
}
close(INIT);
}
}
return join('', sort @plugin_pod);
};
my $towrite = '';
while (<FOOIN>) {
$_ = pack("C0A*", $_); # turn off UTF8-ness
# Conditional compiling
if ($modules{'conditional'}) {
# DELETE lines carrying the REMOVE_ON_BUILD or (deprecated) REMOVEFORINST tag
if(/\bREMOVE(?:FORINST|_ON_BUILD)\b/) {
next;
}
}
$towrite .= $_;
}
# Sharpbang (#!) replacement (see also ExtUtils::MY->fixin)
if ($modules{'sharpbang'}) {
$towrite =~ s/^#![^\n]*perl[^\n]*\n/#!${perl}${perl_taint}${perl_warn}\n/;
}
# Variable replacement (do in one invocation)
if ($modules{'vars'}) {
# Replace all @@VARS@@
while ($towrite =~ /\@\@([A-Z][A-Z0-9_]*)\@\@/) {
my $what = $1;
my $d = $defines{$what} || '';
if (ref($d) =~ /^CODE/) { $d = $d->(); }
$towrite =~ s/\@\@${what}\@\@/$d/g;
}
}
print FOOOUT $towrite;
}

11
upstream/build/sha256sum.pl Executable file
View File

@ -0,0 +1,11 @@
#!/usr/bin/perl
BEGIN {
require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
}
$/=undef;
while(<>) {
print sha256_hex($_)," $ARGV\n";
}

11
upstream/build/sha512sum.pl Executable file
View File

@ -0,0 +1,11 @@
#!/usr/bin/perl
BEGIN {
require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
}
$/=undef;
while(<>) {
print sha512_hex($_)," $ARGV\n";
}

116
upstream/ldap/README Normal file
View File

@ -0,0 +1,116 @@
Using SpamAssassin With An LDAP Server
--------------------------------------
SpamAssassin can now load users' score files from an LDAP server. The concept
here is to have a web application (PHP/perl/ASP/etc.) that will allow users to
be able to update their local preferences on how SpamAssassin will filter their
e-mail. The most common use for a system like this would be for users to be
able to update the white list of addresses (whitelist_from) without the need
for them to update their $HOME/.spamassassin/user_prefs file. It is also quite
common for users listed in /etc/passwd to not have a home directory,
therefore, the only way to have their own local settings would be through a
database or LDAP server.
SpamAssassin will check the global configuration file (ie. any file matching
/etc/mail/spamassassin/*.cf) for the following settings:
user_scores_dsn ldap://host:port/dc=basedn,dc=de?attr?scope?uid=__USERNAME__
user_scores_ldap_username bind dn
user_scores_ldap_password password
The first option, user_scores_dsn, describes the data source name that will be
used to create the connection to your LDAP server. You have to write the DSN as
an LDAP URL, the components being the host and port to connect to, the base DN
for the search, the scope of the search (base, one or sub), the single
attribute being the multivalued attribute used to hold the configuration data
(space separated pairs of key and value, just as in a file) and finally the
filter being the expression used to filter out the wanted username. Note that
the filter expression uses the literal text __USERNAME__ as a placeholder for
the username (SpamAssassin will use a s///g statement to replace it with the
actual username).
Examples:
ldap://localhost:389/dc=koehntopp,dc=de?spamassassin?sub?uid=__USERNAME__
ldap://localhost:389/o=stooges?spamassassin?sub?uid=__USERNAME__
If the user_scores_dsn option does not exist, SpamAssassin will not attempt
to use an LDAP server for retrieving users' preferences. Note that this will
NOT look for test rules, only local scores, whitelist_from(s), and
required_score.
Requirements
------------
In order for SpamAssassin to work with your LDAP database, you must have
the perl Net::LDAP module installed. You'll also need the URI module.
In order for spamd to use the LDAP driver, you will have to start spamd
with the additional parameters '--ldap-config -x'.
Each user that wants to utilise the SpamAssassin LDAP driver must add
the 'spamassassin' attribute in their object (either manually or via the
web interface of your making/choice) like this (see the file sa_test.ldif
in this directory for a full database example):
spamassassin: add_header all Foo LDAP read
Database Schema
---------------
You can use any schema extension to your user entries with SpamAssassin,
as long as the attribute is multivalued and correctly named in your LDAP url.
We are currently using a <customername>spamassassin field that is part of
our inetOrgPerson subclass.
Here's an example for openldap's /etc/openldap/schema/inetorgperson.schema :
# SpamAssassin
# see http://SpamAssassin.org/ .
attributetype ( 2.16.840.1.113730.3.1.217
NAME 'spamassassin'
DESC 'SpamAssassin user preferences settings'
EQUALITY caseExactMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
(don't forget to add "$ spamassassin" to the objectclass MAY clause.)
Testing SpamAssassin/LDAP
-------------------------
To test your LDAP setup, and debug any possible problems, you should start
spamd with the -D option, which will keep spamd in the foreground, and will
output debug message to the terminal. You should then test spamd with a
message by calling spamc. You can use the sample-spam.txt file with the
following command:
cat sample-spam.txt | spamc
Watch the debug output from spamd and look for the following debug line:
retrieving LDAP prefs for <username>: <value>
If you do not see the above text, then the LDAP query was not successful, and
you should see any error messages reported. <username> should be the user
that was passed to spamd and is usually the user executing spamc.
If you need to set up LDAP, a good guide is here:
http://yolinux.com/TUTORIALS/LinuxTutorialLDAP.html
To test LDAP support using the SpamAssassin test suite, you need to
perform a little bit of manual configuration first. See the file
"ldap/README.testing" for details.
******
NB: This should be considered BETA, and the interface or overall
operation of LDAP support may change at any time with future releases of SA.
******
Please send any comments to <kris at koehntopp.de> and file bugs via
<http://issues.apache.org/SpamAssassin/>.
Kristian Köhntopp

View File

@ -0,0 +1,106 @@
How to enable the SpamAssassin LDAP self-test
---------------------------------------------
- install openldap server, using apt-get etc. On Debian (unstable),
that's done as follows:
sudo apt-get install slapd ldap-utils
Then enter an admin password.
- Patch the 'inetorgperson.schema' file, found in the following locations:
Fedora Core 1: /etc/openldap/schema/inetorgperson.schema
Debian: /etc/ldap/schema/inetorgperson.schema
as follows:
--- /etc/openldap/schema/inetorgperson.schema.default 2003-10-23 07:26:01.000000000 -0700
+++ /etc/openldap/schema/inetorgperson.schema 2004-02-05 22:07:01.000000000 -0800
@@ -121,6 +121,13 @@
DESC 'RFC2798: personal identity information, a PKCS #12 PFX'
SYNTAX 1.3.6.1.4.1.1466.115.121.1.5 )
+# spamassassin
+# see http://SpamAssassin.org/ .
+attributetype ( 2.16.840.1.113730.3.1.220
+ NAME 'spamassassin'
+ DESC 'SpamAssassin user preferences settings'
+ EQUALITY caseExactMatch
+ SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
# inetOrgPerson
# The inetOrgPerson represents people who are associated with an
@@ -138,5 +145,5 @@
labeledURI $ mail $ manager $ mobile $ o $ pager $
photo $ roomNumber $ secretary $ uid $ userCertificate $
x500uniqueIdentifier $ preferredLanguage $
- userSMIMECertificate $ userPKCS12 )
+ userSMIMECertificate $ userPKCS12 $ spamassassin )
)
- set up according to
http://yolinux.com/TUTORIALS/LinuxTutorialLDAP.html#TUTORIAL . To go into
some detail, here's what you need to do...
- Edit the slapd.conf file, and add the "o=stooges" suffix, so that the LDIF
file we'll be using will be valid.
On Debian, this means changing the existing /etc/ldap/slapd.conf file
according to these substitutions:
s/"dc=jmason,dc=org"/"o=stooges"/g
s/"cn=admin,dc=jmason,dc=org"/"cn=StoogeAdmin,o=stooges"/g
Also, add these two lines to set the "root" password so the LDIF file
can be loaded (ONLY FOR TESTING! DO NOT DO THIS ON A LIVE SERVER!):
rootdn "cn=StoogeAdmin,o=stooges"
rootpw secret1
On Fedora Core 1, you can use the file from
http://yolinux.com/TUTORIALS/OpenLDAP2.0-stooges-slapd.conf-sample.txt
pretty much as-is.
- Next, create the storage directories:
sudo mkdir /var/lib/ldap/stooges /var/lib/ldap/fraternity
sudo chown ldap.ldap /var/lib/ldap/stooges [Fedora Core only]
sudo chown ldap.ldap /var/lib/ldap/fraternity [Fedora Core only]
- And start the server:
sudo /etc/rc.d/init.d/ldap start [Fedora Core only]
sudo /etc/init.d/slapd start [Debian only]
- Now load the LDIF data for our testing: (this LDIF file adds a "spamassassin"
line to the "curley" user.)
ldapadd -f ldap/sa_test.ldif -xv -D "cn=StoogeAdmin,o=stooges" \
-h 127.0.0.1 -w secret1
- install Net::LDAP using CPAN:
perl -MCPAN -e shell
install Convert::ASN1
install Net::LDAP
quit
- create the test flag file:
touch t/do_ldap
- now "make test" will test against the LDAP server as well. You can
also run "cd t; ./spamd_ldap.t" to run just that test directly.

View File

@ -0,0 +1,99 @@
dn: o=stooges
objectClass: top
objectClass: organization
o: stooges
description: The Three Stooges
dn: cn=StoogeAdmin,o=stooges
objectClass: organizationalRole
cn: StoogeAdmin
description: LDAP Directory Administrator
dn: ou=MemberGroupA,o=stooges
ou: MemberGroupA
objectClass: top
objectClass: organizationalUnit
description: Members of MemberGroupA
dn: ou=MemberGroupB,o=stooges
ou: MemberGroupB
objectClass: top
objectClass: organizationalUnit
description: Members of MemberGroupB
dn: cn=Larry Anderson,ou=MemberGroupA,o=stooges
ou: MemberGroupA
o: stooges
cn: Larry Anderson
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: LAnderson@isp.com
givenname: Larry
sn: Anderson
uid: larry
homePostalAddress: 15 Cherry Ln.$Plano TX 78888
postalAddress: 15 Fitzhugh Ave.
l: Dallas
st: TX
postalcode: 76888
telephoneNumber: (800)555-1212
homePhone: 800-555-1313
facsimileTelephoneNumber: 800-555-1414
userPassword: larrysecret
title: Account Executive
dn: cn=Moe Anderson,ou=MemberGroupA,o=stooges
ou: MemberGroupA
o: stooges
cn: Moe Anderson
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: MAnderson@isp.com
givenname: Moe
sn: Anderson
uid: moe
initials: Bob
homePostalAddress: 16 Cherry Ln.$Plano TX 78888
postalAddress: 15 Fitzhugh Ave.
l: Dallas
st: TX
postalcode: 76888
pager: 800-555-1319
homePhone: 800-555-1313
telephoneNumber: (800)555-1213
mobile: 800-555-1318
title: Manager of Product Development
facsimileTelephoneNumber: 800-555-3318
manager: cn=Larry Anderson,ou=MemberGroupA,o=stooges
userPassword: moesecret
dn: cn=Curley Anderson,ou=MemberGroupB,o=stooges
ou: MemberGroupB
o: stooges
cn: Curley Anderson
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: CAnderson@isp.com
givenname: Curley
sn: Anderson
uid: curley
initials: Joe
homePostalAddress: 14 Cherry Ln.$Plano TX 78888
postalAddress: 15 Fitzhugh Ave.
spamassassin: add_header all Foo LDAP read
l: Dallas
st: TX
postalcode: 76888
pager: 800-555-1319
homePhone: 800-555-1313
telephoneNumber: (800)555-1214
mobile: 800-555-1318
title: Developemnt Engineer
facsimileTelephoneNumber: 800-555-3318
userPassword: curleysecret

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,214 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::AICache - provide access to cached information for
ArchiveIterator
=head1 DESCRIPTION
This module allows ArchiveIterator to use cached atime information instead of
having to read every message separately.
=head1 PUBLIC METHODS
=over 4
=cut
package Mail::SpamAssassin::AICache;
use File::Spec;
use File::Path;
use File::Basename;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
use re 'taint';
use Errno qw(EBADF);
=item new()
Generates a new cache object.
=back
=cut
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = shift;
if (!defined $self) { $self = {}; }
$self->{cache} = {};
$self->{dirty} = 0;
$self->{prefix} ||= '/';
my $use_cache = 1;
# be sure to use rel2abs() here, since otherwise relative paths
# are broken by the prefix stuff
if ($self->{type} eq 'dir') {
$self->{cache_file} = File::Spec->catdir(
$self->{prefix},
File::Spec->rel2abs($self->{path}),
'.spamassassin_cache');
my @stat = stat($self->{cache_file});
@stat or dbg("AIcache: no access to %s: %s", $self->{cache_file}, $!);
$self->{cache_mtime} = $stat[9] || 0;
}
else {
my @split = File::Spec->splitpath($self->{path});
$self->{cache_file} = File::Spec->catdir(
$self->{prefix},
File::Spec->rel2abs($split[1]),
join('_', '.spamassassin_cache', $self->{type}, $split[2]));
my @stat = stat($self->{cache_file});
@stat or dbg("AIcache: no access to %s: %s", $self->{cache_file}, $!);
$self->{cache_mtime} = $stat[9] || 0;
# for mbox and mbx, verify whether mtime on cache file is >= mtime of
# messages file. if it is, use it, otherwise don't.
@stat = stat($self->{path});
@stat or dbg("AIcache: no access to %s: %s", $self->{path}, $!);
if ($stat[9] > $self->{cache_mtime}) {
$use_cache = 0;
}
}
$self->{cache_file} = File::Spec->canonpath($self->{cache_file});
# go ahead and read in the cache information
local *CACHE;
if (!$use_cache) {
# not in use
} elsif (!open(CACHE, $self->{cache_file})) {
dbg("AIcache: cannot open AI cache file (%s): %s", $self->{cache_file},$!);
} else {
for ($!=0; defined($_=<CACHE>); $!=0) {
my($k,$v) = split(/\t/, $_);
next unless (defined $k && defined $v);
$self->{cache}->{$k} = $v;
}
defined $_ || $!==0 or
$!==EBADF ? dbg("AIcache: error reading from AI cache file: $!")
: warn "error reading from AI cache file: $!";
close CACHE
or die "error closing AI cache file (".$self->{cache_file}."): $!";
}
bless($self,$class);
$self;
}
sub count {
my ($self) = @_;
return keys %{$self->{cache}};
}
sub check {
my ($self, $name) = @_;
return $self->{cache} unless $name;
# for dir collections: just use the info on a file, if an entry
# exists for that file. it's very unlikely that a file will be
# changed to contain a different Date header, and it's slow to check.
# return if ($self->{type} eq 'dir' && (stat($name))[9] > $self->{cache_mtime});
$name = $self->canon($name);
return $self->{cache}->{$name};
}
sub update {
my ($self, $name, $date) = @_;
return unless $name;
$name = $self->canon($name);
# if information is different than cached version, set dirty and update
if (!exists $self->{cache}->{$name} || $self->{cache}->{$name} != $date) {
$self->{cache}->{$name} = $date;
$self->{dirty} = 1;
}
}
sub finish {
my ($self) = @_;
return unless $self->{dirty};
# Cache is dirty, so write out new file
# create enclosing dir tree, if required
eval {
mkpath(dirname($self->{cache_file}));
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn "cannot mkpath for AI cache file ($self->{cache_file}): $eval_stat\n";
};
my $towrite = '';
while(my($k,$v) = each %{$self->{cache}}) {
$towrite .= "$k\t$v\n";
}
{
# ignore signals while we're writing this file
local $SIG{'INT'} = 'IGNORE';
local $SIG{'TERM'} = 'IGNORE';
if (!open(CACHE, ">".$self->{cache_file}))
{
warn "creating AI cache file failed (".$self->{cache_file}."): $!";
# TODO: should we delete it/clean it up?
}
else {
print CACHE $towrite
or warn "error writing to AI cache file: $!";
close CACHE
or warn "error closing AI cache file (".$self->{cache_file}."): $!";
}
}
return;
}
sub canon {
my ($self, $name) = @_;
if ($self->{type} eq 'dir') {
# strip off dirs, just look at filename
$name = (File::Spec->splitpath($name))[2];
}
else {
# we may get in a "/path/mbox.offset", so trim to just offset as necessary
$name =~ s/^.+\.(\d+)$/$1/;
}
return $name;
}
# ---------------------------------------------------------------------------
1;
__END__

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,683 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::AsyncLoop - scanner asynchronous event loop
=head1 DESCRIPTION
An asynchronous event loop used for long-running operations, performed "in the
background" during the Mail::SpamAssassin::check() scan operation, such as DNS
blocklist lookups.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::AsyncLoop;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Time::HiRes qw(time);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Logger;
our @ISA = qw();
# obtain timer resolution if possible
our $timer_resolution;
BEGIN {
eval {
$timer_resolution = Time::HiRes->can('clock_getres')
? Time::HiRes::clock_getres(Time::HiRes::CLOCK_REALTIME())
: 0.001; # wild guess, assume resolution is better than 1s
1;
} or do {
$timer_resolution = 1; # Perl's builtin timer ticks at one second
};
}
#############################################################################
sub new {
# called from PerMsgStatus, a new AsyncLoop object is created
# for each new message processing
my $class = shift;
$class = ref($class) || $class;
my ($main) = @_;
my $self = {
main => $main,
queries_started => 0,
queries_completed => 0,
total_queries_started => 0,
total_queries_completed => 0,
pending_lookups => { },
timing_by_query => { },
all_lookups => { }, # keyed by "rr_type/domain"
};
bless ($self, $class);
$self;
}
# Given a domain name, produces a listref of successively stripped down
# parent domains, e.g. a domain '2.10.Example.COM' would produce a list:
# '2.10.example.com', '10.example.com', 'example.com', 'com', ''
#
sub domain_to_search_list {
my ($domain) = @_;
$domain =~ s/^\.+//; $domain =~ s/\.+\z//; # strip leading and trailing dots
my @search_keys;
if ($domain =~ /\[/) { # don't split address literals
@search_keys = ( $domain, '' ); # presumably an address literal
} else {
local $1;
$domain = lc $domain;
for (;;) {
push(@search_keys, $domain);
last if $domain eq '';
# strip one level
$domain = ($domain =~ /^ (?: [^.]* ) \. (.*) \z/xs) ? $1 : '';
}
if (@search_keys > 20) { # enforce some sanity limit
@search_keys = @search_keys[$#search_keys-19 .. $#search_keys];
}
}
return \@search_keys;
}
# ---------------------------------------------------------------------------
=item $ent = $async->start_lookup($ent, $master_deadline)
Register the start of a long-running asynchronous lookup operation.
C<$ent> is a hash reference containing the following items:
=over 4
=item key (required)
A key string, unique to this lookup. This is what is reported in
debug messages, used as the key for C<get_lookup()>, etc.
=item id (required)
An ID string, also unique to this lookup. Typically, this is the DNS packet ID
as returned by DnsResolver's C<bgsend> method. Sadly, the Net::DNS
architecture forces us to keep a separate ID string for this task instead of
reusing C<key> -- if you are not using DNS lookups through DnsResolver, it
should be OK to just reuse C<key>.
=item type (required)
A string, typically one word, used to describe the type of lookup in log
messages, such as C<DNSBL>, C<MX>, C<TXT>.
=item zone (optional)
A zone specification (typically a DNS zone name - e.g. host, domain, or RBL)
which may be used as a key to look up per-zone settings. No semantics on this
parameter is imposed by this module. Currently used to fetch by-zone timeouts.
=item timeout_initial (optional)
An initial value of elapsed time for which we are willing to wait for a
response (time in seconds, floating point value is allowed). When elapsed
time since a query started exceeds the timeout value and there are no other
queries to wait for, the query is aborted. The actual timeout value ranges
from timeout_initial and gradually approaches timeout_min (see next parameter)
as the number of already completed queries approaches the number of all
queries started.
If a caller does not explicitly provide this parameter or its value is
undefined, a default initial timeout value is settable by a configuration
variable rbl_timeout.
If a value of the timeout_initial parameter is below timeout_min, the initial
timeout is set to timeout_min.
=item timeout_min (optional)
A lower bound (in seconds) to which the actual timeout approaches as the
number of queries completed approaches the number of all queries started.
Defaults to 0.2 * timeout_initial.
=back
C<$ent> is returned by this method, with its contents augmented by additional
information.
=cut
sub start_lookup {
my ($self, $ent, $master_deadline) = @_;
my $id = $ent->{id};
my $key = $ent->{key};
defined $id && $id ne '' or die "oops, no id";
$key or die "oops, no key";
$ent->{type} or die "oops, no type";
my $now = time;
$ent->{start_time} = $now if !defined $ent->{start_time};
# are there any applicable per-zone settings?
my $zone = $ent->{zone};
my $settings; # a ref to a by-zone or to global settings
my $conf_by_zone = $self->{main}->{conf}->{by_zone};
if (defined $zone && $conf_by_zone) {
# dbg("async: searching for by_zone settings for $zone");
$zone =~ s/^\.//; $zone =~ s/\.\z//; # strip leading and trailing dot
for (;;) { # 2.10.example.com, 10.example.com, example.com, com, ''
if (exists $conf_by_zone->{$zone}) {
$settings = $conf_by_zone->{$zone};
last;
} elsif ($zone eq '') {
last;
} else { # strip one level, careful with address literals
$zone = ($zone =~ /^( (?: [^.] | \[ (?: \\. | [^\]\\] )* \] )* )
\. (.*) \z/xs) ? $2 : '';
}
}
}
dbg("async: applying by_zone settings for %s", $zone) if $settings;
my $t_init = $ent->{timeout_initial}; # application-specified has precedence
$t_init = $settings->{rbl_timeout} if $settings && !defined $t_init;
$t_init = $self->{main}->{conf}->{rbl_timeout} if !defined $t_init;
$t_init = 0 if !defined $t_init; # last-resort default, just in case
my $t_end = $ent->{timeout_min}; # application-specified has precedence
$t_end = $settings->{rbl_timeout_min} if $settings && !defined $t_end;
$t_end = $self->{main}->{conf}->{rbl_timeout_min} if !defined $t_end; # added for bug 7070
$t_end = 0.2 * $t_init if !defined $t_end;
$t_end = 0 if $t_end < 0; # just in case
$t_init = $t_end if $t_init < $t_end;
my $clipped_by_master_deadline = 0;
if (defined $master_deadline) {
my $time_avail = $master_deadline - time;
$time_avail = 0.5 if $time_avail < 0.5; # give some slack
if ($t_init > $time_avail) {
$t_init = $time_avail; $clipped_by_master_deadline = 1;
$t_end = $time_avail if $t_end > $time_avail;
}
}
$ent->{timeout_initial} = $t_init;
$ent->{timeout_min} = $t_end;
$ent->{display_id} = # identifies entry in debug logging and similar
join(", ", grep { defined }
map { ref $ent->{$_} ? @{$ent->{$_}} : $ent->{$_} }
qw(sets rules rulename type key) );
$self->{pending_lookups}->{$key} = $ent;
$self->{queries_started}++;
$self->{total_queries_started}++;
dbg("async: starting: %s (timeout %.1fs, min %.1fs)%s",
$ent->{display_id}, $ent->{timeout_initial}, $ent->{timeout_min},
!$clipped_by_master_deadline ? '' : ', capped by time limit');
$ent;
}
# ---------------------------------------------------------------------------
=item $ent = $async->bgsend_and_start_lookup($domain, $type, $class, $ent, $cb, %options)
A common idiom: calls C<bgsend>, followed by a call to C<start_lookup>,
returning the argument $ent object as modified by C<start_lookup> and
filled-in with a query ID.
=cut
sub bgsend_and_start_lookup {
my($self, $domain, $type, $class, $ent, $cb, %options) = @_;
$ent = {} if !$ent;
$domain =~ s/\.+\z//s; # strip trailing dots, these sometimes still sneak in
$ent->{id} = undef;
$ent->{query_type} = $type;
$ent->{query_domain} = $domain;
$ent->{type} = $type if !exists $ent->{type};
$cb = $ent->{completed_callback} if !$cb; # compatibility with SA < 3.4
my $key = $ent->{key} || '';
my $dnskey = uc($type) . '/' . lc($domain);
my $dns_query_info = $self->{all_lookups}{$dnskey};
if ($dns_query_info) { # DNS query already underway or completed
my $id = $ent->{id} = $dns_query_info->{id}; # re-use existing query
return if !defined $id; # presumably blocked, or other fatal failure
my $id_tail = $id; $id_tail =~ s{^\d+/IN/}{};
lc($id_tail) eq lc($dnskey)
or info("async: unmatched id %s, key=%s", $id, $dnskey);
my $pkt = $dns_query_info->{pkt};
if (!$pkt) { # DNS query underway, still waiting for results
# just add our query to the existing one
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
dbg("async: query %s already underway, adding no.%d %s",
$id, scalar @{$dns_query_info->{applicants}},
$ent->{rulename} || $key);
} else { # DNS query already completed, re-use results
# answer already known, just do the callback and be done with it
if (!$cb) {
dbg("async: query %s already done, re-using for %s", $id, $key);
} else {
dbg("async: query %s already done, re-using for %s, callback",
$id, $key);
eval {
$cb->($ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (1) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s completed, callback %s failed: %s\n",
$id, $key, $@);
};
}
}
}
else { # no existing query, open a new DNS query
$dns_query_info = $self->{all_lookups}{$dnskey} = {}; # new query needed
my($id, $blocked);
my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
if ($dns_query_blockages) {
my $search_list = domain_to_search_list($domain);
foreach my $parent_domain (@$search_list) {
$blocked = $dns_query_blockages->{$parent_domain};
last if defined $blocked; # stop at first defined, can be true or false
}
}
if ($blocked) {
dbg("async: blocked by dns_query_restriction: %s", $dnskey);
} else {
dbg("async: launching %s for %s", $dnskey, $key);
$id = $self->{main}->{resolver}->bgsend($domain, $type, $class, sub {
my($pkt, $pkt_id, $timestamp) = @_;
# this callback sub is called from DnsResolver::poll_responses()
# dbg("async: in a bgsend_and_start_lookup callback, id %s", $pkt_id);
if ($pkt_id ne $id) {
warn "async: mismatched dns id: got $pkt_id, expected $id\n";
return;
}
$self->set_response_packet($pkt_id, $pkt, $ent->{key}, $timestamp);
$dns_query_info->{pkt} = $pkt;
my $cb_count = 0;
foreach my $tuple (@{$dns_query_info->{applicants}}) {
my($appl_ent, $appl_cb) = @$tuple;
if ($appl_cb) {
dbg("async: calling callback on key %s%s", $key,
!defined $appl_ent->{rulename} ? ''
: ", rule ".$appl_ent->{rulename});
$cb_count++;
eval {
$appl_cb->($appl_ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (2) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s completed, callback %s failed: %s\n",
$id, $appl_ent->{key}, $@);
};
}
}
delete $dns_query_info->{applicants};
dbg("async: query $id completed, no callbacks run") if !$cb_count;
});
}
return if !defined $id;
$dns_query_info->{id} = $ent->{id} = $id;
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
$self->start_lookup($ent, $options{master_deadline});
}
return $ent;
}
# ---------------------------------------------------------------------------
=item $ent = $async->get_lookup($key)
Retrieve the pending-lookup object for the given key C<$key>.
If the lookup is complete, this will return C<undef>.
Note that a lookup is still considered "pending" until C<complete_lookups()> is
called, even if it has been reported as complete via C<set_response_packet()>.
=cut
sub get_lookup {
my ($self, $key) = @_;
return $self->{pending_lookups}->{$key};
}
# ---------------------------------------------------------------------------
=item $async->log_lookups_timing()
Log sorted timing for all completed lookups.
=cut
sub log_lookups_timing {
my ($self) = @_;
my $timings = $self->{timing_by_query};
for my $key (sort { $timings->{$a} <=> $timings->{$b} } keys %$timings) {
dbg("async: timing: %.3f %s", $timings->{$key}, $key);
}
}
# ---------------------------------------------------------------------------
=item $alldone = $async->complete_lookups()
Perform a poll of the pending lookups, to see if any are completed.
Callbacks on completed queries will be called from poll_responses().
If there are no lookups remaining, or if too much time has elapsed since
any results were returned, C<1> is returned, otherwise C<0>.
=cut
sub complete_lookups {
my ($self, $timeout, $allow_aborting_of_expired) = @_;
my $alldone = 0;
my $anydone = 0;
my $allexpired = 1;
my %typecount;
my $pending = $self->{pending_lookups};
$self->{queries_started} = 0;
$self->{queries_completed} = 0;
my $now = time;
if (defined $timeout && $timeout > 0 &&
%$pending && $self->{total_queries_started} > 0)
{
# shrink a 'select' timeout if a caller specified unnecessarily long
# value beyond the latest deadline of any outstanding request;
# can save needless wait time (up to 1 second in harvest_dnsbl_queries)
my $r = $self->{total_queries_completed} / $self->{total_queries_started};
my $r2 = $r * $r; # 0..1
my $max_deadline;
while (my($key,$ent) = each %$pending) {
my $t_init = $ent->{timeout_initial};
my $dt = $t_init - ($t_init - $ent->{timeout_min}) * $r2;
my $deadline = $ent->{start_time} + $dt;
$max_deadline = $deadline if !defined $max_deadline ||
$deadline > $max_deadline;
}
if (defined $max_deadline) {
# adjust to timer resolution, only deals with 1s and with fine resolution
$max_deadline = 1 + int $max_deadline
if $timer_resolution == 1 && $max_deadline > int $max_deadline;
my $sufficient_timeout = $max_deadline - $now;
$sufficient_timeout = 0 if $sufficient_timeout < 0;
if ($timeout > $sufficient_timeout) {
dbg("async: reducing select timeout from %.1f to %.1f s",
$timeout, $sufficient_timeout);
$timeout = $sufficient_timeout;
}
}
}
# trap this loop in an eval { } block, as Net::DNS could throw
# die()s our way; in particular, process_dnsbl_results() has
# thrown die()s before (bug 3794).
eval {
if (%$pending) { # any outstanding requests still?
$self->{last_poll_responses_time} = $now;
my $nfound = $self->{main}->{resolver}->poll_responses($timeout);
dbg("async: select found %s responses ready (t.o.=%.1f)",
!$nfound ? 'no' : $nfound, $timeout);
}
$now = time; # capture new timestamp, after possible sleep in 'select'
# A callback routine may generate another DNS query, which may insert
# an entry into the %$pending hash thus invalidating the each() context.
# So, make sure that callbacks are not called while the each() context
# is open. [Bug 6937]
#
while (my($key,$ent) = each %$pending) {
my $id = $ent->{id};
if (exists $self->{finished}->{$id}) {
delete $self->{finished}->{$id};
$anydone = 1;
$ent->{finish_time} = $now if !defined $ent->{finish_time};
my $elapsed = $ent->{finish_time} - $ent->{start_time};
dbg("async: completed in %.3f s: %s", $elapsed, $ent->{display_id});
$self->{timing_by_query}->{". $key"} += $elapsed;
$self->{queries_completed}++;
$self->{total_queries_completed}++;
delete $pending->{$key};
}
}
if (%$pending) { # still any requests outstanding? are they expired?
my $r =
!$allow_aborting_of_expired || !$self->{total_queries_started} ? 1.0
: $self->{total_queries_completed} / $self->{total_queries_started};
my $r2 = $r * $r; # 0..1
while (my($key,$ent) = each %$pending) {
$typecount{$ent->{type}}++;
my $t_init = $ent->{timeout_initial};
my $dt = $t_init - ($t_init - $ent->{timeout_min}) * $r2;
# adjust to timer resolution, only deals with 1s and fine resolution
$dt = 1 + int $dt if $timer_resolution == 1 && $dt > int $dt;
$allexpired = 0 if $now <= $ent->{start_time} + $dt;
}
dbg("async: queries completed: %d, started: %d",
$self->{queries_completed}, $self->{queries_started});
}
# ensure we don't get stuck if a request gets lost in the ether.
if (! %$pending) {
$alldone = 1;
}
elsif ($allexpired && $allow_aborting_of_expired) {
# avoid looping forever if we haven't got all results.
dbg("async: escaping: lost or timed out requests or responses");
$self->abort_remaining_lookups();
$alldone = 1;
}
else {
dbg("async: queries active: %s%s at %s",
join (' ', map { "$_=$typecount{$_}" } sort keys %typecount),
$allexpired ? ', all expired' : '', scalar(localtime(time)));
$alldone = 0;
}
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
# resignal if alarm went off
die "async: (3) $eval_stat\n" if $eval_stat =~ /__alarm__ignore__\(.*\)/s;
dbg("async: caught complete_lookups death, aborting: %s", $eval_stat);
$alldone = 1; # abort remaining
};
return wantarray ? ($alldone,$anydone) : $alldone;
}
# ---------------------------------------------------------------------------
=item $async->abort_remaining_lookups()
Abort any remaining lookups.
=cut
sub abort_remaining_lookups {
my ($self) = @_;
my $pending = $self->{pending_lookups};
my $foundcnt = 0;
my $now = time;
while (my($key,$ent) = each %$pending) {
dbg("async: aborting after %.3f s, %s: %s",
$now - $ent->{start_time},
(defined $ent->{timeout_initial} &&
$now > $ent->{start_time} + $ent->{timeout_initial}
? 'past original deadline' : 'deadline shrunk'),
$ent->{display_id} );
$foundcnt++;
$self->{timing_by_query}->{"X $key"} = $now - $ent->{start_time};
$ent->{finish_time} = $now if !defined $ent->{finish_time};
delete $pending->{$key};
}
# call any remaining callbacks, indicating the query has been aborted
#
my $all_lookups_ref = $self->{all_lookups};
foreach my $dnskey (keys %$all_lookups_ref) {
my $dns_query_info = $all_lookups_ref->{$dnskey};
my $cb_count = 0;
foreach my $tuple (@{$dns_query_info->{applicants}}) {
my($ent, $cb) = @$tuple;
if ($cb) {
dbg("async: calling callback/abort on key %s%s", $dnskey,
!defined $ent->{rulename} ? '' : ", rule ".$ent->{rulename});
$cb_count++;
eval {
$cb->($ent, undef); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (2) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s aborted, callback %s failed: %s\n",
$dnskey, $ent->{key}, $@);
};
}
dbg("async: query $dnskey aborted, no callbacks run") if !$cb_count;
}
delete $dns_query_info->{applicants};
}
dbg("async: aborted %d remaining lookups", $foundcnt) if $foundcnt > 0;
delete $self->{last_poll_responses_time};
$self->{main}->{resolver}->bgabort();
1;
}
# ---------------------------------------------------------------------------
=item $async->set_response_packet($id, $pkt, $key, $timestamp)
Register a "response packet" for a given query. C<$id> is the ID for the
query, and must match the C<id> supplied in C<start_lookup()>. C<$pkt> is the
packet object for the response. A parameter C<$key> identifies an entry in a
hash %{$self->{pending_lookups}} where the object which spawned this query can
be found, and through which further information about the query is accessible.
C<$pkt> may be undef, indicating that no response packet is available, but a
query has completed (e.g. was aborted or dismissed) and is no longer "pending".
The DNS resolver's response packet C<$pkt> will be made available to a callback
subroutine through its argument as well as in C<$ent-<gt>{response_packet}>.
=cut
sub set_response_packet {
my ($self, $id, $pkt, $key, $timestamp) = @_;
$self->{finished}->{$id} = 1; # only key existence matters, any value
$timestamp = time if !defined $timestamp;
my $pending = $self->{pending_lookups};
if (!defined $key) { # backward compatibility with 3.2.3 and older plugins
# a third-party plugin did not provide $key in a call, search for it:
if ($id eq $pending->{$id}->{id}) { # I feel lucky, key==id ?
$key = $id;
} else { # then again, maybe not, be more systematic
for my $tkey (keys %$pending) {
if ($id eq $pending->{$tkey}->{id}) { $key = $tkey; last }
}
}
dbg("async: got response on id $id, search found key $key");
}
if (!defined $key) {
info("async: no key, response packet not remembered, id $id");
} else {
my $ent = $pending->{$key};
my $ent_id = $ent->{id};
if (!defined $ent_id) {
# should not happen, troubleshooting
info("async: ignoring response, id %s, ent_id is undef: %s",
$id, join(', ', %$ent));
} elsif ($id ne $ent_id) {
info("async: ignoring response, mismatched id $id, expected $ent_id");
} else {
$ent->{finish_time} = $timestamp;
$ent->{response_packet} = $pkt;
}
}
1;
}
=item $async->report_id_complete($id,$key,$key,$timestamp)
Legacy. Equivalent to $self->set_response_packet($id,undef,$key,$timestamp),
i.e. providing undef as a response packet. Register that a query has
completed and is no longer "pending". C<$id> is the ID for the query,
and must match the C<id> supplied in C<start_lookup()>.
One or the other of C<set_response_packet()> or C<report_id_complete()>
should be called, but not both.
=cut
sub report_id_complete {
my ($self, $id, $key, $timestamp) = @_;
$self->set_response_packet($id, undef, $key, $timestamp);
}
# ---------------------------------------------------------------------------
=item $time = $async->last_poll_responses_time()
Get the time of the last call to C<poll_responses()> (which is called
from C<complete_lookups()>. If C<poll_responses()> was never called or
C<abort_remaining_lookups()> has been called C<last_poll_responses_time()>
will return undef.
=cut
sub last_poll_responses_time {
my ($self) = @_;
return $self->{last_poll_responses_time};
}
1;
=back
=cut

View File

@ -0,0 +1,354 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::AutoWhitelist - auto-whitelist handler for SpamAssassin
=head1 SYNOPSIS
(see Mail::SpamAssassin)
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
This class is used internally by SpamAssassin to manage the automatic
whitelisting functionality. Please refer to the C<Mail::SpamAssassin>
documentation for public interfaces.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::AutoWhitelist;
use strict;
use warnings;
# use bytes;
use re 'taint';
use NetAddr::IP 4.000;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my ($main, $msg) = @_;
my $conf = $main->{conf};
my $self = {
main => $main,
factor => $conf->{auto_whitelist_factor},
ipv4_mask_len => $conf->{auto_whitelist_ipv4_mask_len},
ipv6_mask_len => $conf->{auto_whitelist_ipv6_mask_len},
};
my $factory;
if ($main->{pers_addr_list_factory}) {
$factory = $main->{pers_addr_list_factory};
}
else {
my $type = $conf->{auto_whitelist_factory};
if ($type =~ /^([_A-Za-z0-9:]+)$/) {
$type = untaint_var($type);
eval '
require '.$type.';
$factory = '.$type.'->new();
1;
'
or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn "auto-whitelist: $eval_stat\n";
undef $factory;
};
$main->set_persistent_address_list_factory($factory) if $factory;
}
else {
warn "auto-whitelist: illegal auto_whitelist_factory setting\n";
}
}
if (!defined $factory) {
$self->{checker} = undef;
} else {
$self->{checker} = $factory->new_checker($self->{main});
}
bless ($self, $class);
$self;
}
###########################################################################
=item $meanscore = awl->check_address($addr, $originating_ip, $signedby);
This method will return the mean score of all messages associated with the
given address, or undef if the address hasn't been seen before.
If B<$originating_ip> is supplied, it will be used in the lookup.
=cut
sub check_address {
my ($self, $addr, $origip, $signedby) = @_;
if (!defined $self->{checker}) {
return; # no factory defined; we can't check
}
$self->{entry} = undef;
my $fulladdr = $self->pack_addr ($addr, $origip);
my $entry = $self->{checker}->get_addr_entry ($fulladdr, $signedby);
$self->{entry} = $entry;
if (!$entry->{msgcount}) {
# no entry found
if (defined $origip) {
# try upgrading a default entry (probably from "add-addr-to-foo")
my $noipaddr = $self->pack_addr ($addr, undef);
my $noipent = $self->{checker}->get_addr_entry ($noipaddr, undef);
if (defined $noipent->{msgcount} && $noipent->{msgcount} > 0) {
dbg("auto-whitelist: found entry w/o IP address for $addr: replacing with $origip");
$self->{checker}->remove_entry($noipent);
# Now assign proper entry the count and totscore values of the
# no-IP entry instead of assigning the whole value to avoid
# wiping out any information added to the previous entry.
$entry->{msgcount} = $noipent->{msgcount};
$entry->{totscore} = $noipent->{totscore};
}
}
}
if ($entry->{msgcount} < 0 ||
$entry->{msgcount} != $entry->{msgcount} || # test for NaN
$entry->{totscore} != $entry->{totscore})
{
warn "auto-whitelist: resetting bad data for ($addr, $origip), ".
"count: $entry->{msgcount}, totscore: $entry->{totscore}\n";
$entry->{msgcount} = $entry->{totscore} = 0;
}
return !$entry->{msgcount} ? undef : $entry->{totscore} / $entry->{msgcount};
}
###########################################################################
=item awl->count();
This method will return the count of messages used in determining the
whitelist correction.
=cut
sub count {
my $self = shift;
return $self->{entry}->{msgcount};
}
###########################################################################
=item awl->add_score($score);
This method will add half the score to the current entry. Half the
score is used, so that repeated use of the same From and IP address
combination will gradually reduce the score.
=cut
sub add_score {
my ($self,$score) = @_;
if (!defined $self->{checker}) {
return; # no factory defined; we can't check
}
if ($score != $score) {
warn "auto-whitelist: attempt to add a $score to AWL entry ignored\n";
return; # don't try to add a NaN
}
$self->{entry}->{msgcount} ||= 0;
$self->{checker}->add_score($self->{entry}, $score);
}
###########################################################################
=item awl->add_known_good_address($addr);
This method will add a score of -100 to the given address -- effectively
"bootstrapping" the address as being one that should be whitelisted.
=cut
sub add_known_good_address {
my ($self, $addr, $signedby) = @_;
return $self->modify_address($addr, -100, $signedby);
}
###########################################################################
=item awl->add_known_bad_address($addr);
This method will add a score of 100 to the given address -- effectively
"bootstrapping" the address as being one that should be blacklisted.
=cut
sub add_known_bad_address {
my ($self, $addr, $signedby) = @_;
return $self->modify_address($addr, 100, $signedby);
}
###########################################################################
sub remove_address {
my ($self, $addr, $signedby) = @_;
return $self->modify_address($addr, undef, $signedby);
}
###########################################################################
sub modify_address {
my ($self, $addr, $score, $signedby) = @_;
if (!defined $self->{checker}) {
return; # no factory defined; we can't check
}
my $fulladdr = $self->pack_addr ($addr, undef);
my $entry = $self->{checker}->get_addr_entry ($fulladdr, $signedby);
# remove any old entries (will remove per-ip entries as well)
# always call this regardless, as the current entry may have 0
# scores, but the per-ip one may have more
$self->{checker}->remove_entry($entry);
# remove address only, no new score to add
if (!defined $score) { return 1; }
if ($score != $score) { return 1; } # don't try to add a NaN
# else add score. get a new entry first
$entry = $self->{checker}->get_addr_entry ($fulladdr, $signedby);
$self->{checker}->add_score($entry, $score);
return 1;
}
###########################################################################
sub finish {
my $self = shift;
return if !defined $self->{checker};
$self->{checker}->finish();
}
###########################################################################
sub ip_to_awl_key {
my ($self, $origip) = @_;
my $result;
local $1;
if (!defined $origip) {
# could not find an IP address to use
} elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) {
my $mask_len = $self->{ipv4_mask_len};
$mask_len = 16 if !defined $mask_len;
# handle the default and easy cases manually
if ($mask_len == 32) {
$result = $origip;
} elsif ($mask_len == 16) {
$result = $1;
} else {
my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
if (!defined $origip_obj) { # invalid IPv4 address
dbg("auto-whitelist: bad IPv4 address $origip");
} else {
$result = $origip_obj->network->addr;
$result =~s/(\.0){1,3}\z//; # truncate zero tail
}
}
} elsif ($origip =~ /:/ && # triage
$origip =~
/^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
# looks like an IPv6 address
my $mask_len = $self->{ipv6_mask_len};
$mask_len = 48 if !defined $mask_len;
my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
if (!defined $origip_obj) { # invalid IPv6 address
dbg("auto-whitelist: bad IPv6 address $origip");
} elsif (NetAddr::IP->can('full6')) { # since NetAddr::IP 4.010
$result = $origip_obj->network->full6; # string in a canonical form
$result =~ s/(:0000){1,7}\z/::/; # compress zero tail
}
} else {
dbg("auto-whitelist: bad IP address $origip");
}
if (defined $result && length($result) > 39) { # just in case, keep under
$result = substr($result,0,39); # the awl.ip field size
}
if (defined $result) {
dbg("auto-whitelist: IP masking %s -> %s", $origip,$result);
}
return $result;
}
###########################################################################
sub pack_addr {
my ($self, $addr, $origip) = @_;
$addr = lc $addr;
$addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia
if (defined $origip) {
$origip = $self->ip_to_awl_key($origip);
}
if (!defined $origip) {
# could not find an IP address to use, could be localhost mail
# or from the user running "add-addr-to-*".
$origip = 'none';
}
return $addr . "|ip=" . $origip;
}
###########################################################################
1;
=back
=cut

View File

@ -0,0 +1,165 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Bayes - support for learning classifiers
=head1 DESCRIPTION
This is the general class used to train a learning classifier with new samples
of spam and ham mail, and classify based on prior training.
Prior to version 3.3.0, the default Bayes implementation was here; if you're
looking for information on that, it has moved to
C<Mail::SpamAssassin::Plugin::Bayes>.
=cut
package Mail::SpamAssassin::Bayes;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my ($main) = @_;
my $self = {
'main' => $main,
'conf' => $main->{conf},
'use_ignores' => 1,
};
bless ($self, $class);
$self->{main}->call_plugins("learner_new");
$self;
}
###########################################################################
sub finish {
my $self = shift;
# we don't need to do the plugin; Mail::SpamAssassin::finish() does
# that for us
%{$self} = ();
}
###########################################################################
# force the Bayes dbs to be closed, if they haven't already been; called
# at the end of scan operation, or when switching between user IDs,
# or when C<Mail::SpamAssassin::finish_learner()> is called.
#
sub force_close {
my $self = shift;
my $quiet = shift;
$self->{main}->call_plugins("learner_close", { quiet => $quiet });
}
###########################################################################
sub ignore_message {
my ($self,$PMS) = @_;
return 0 unless $self->{use_ignores};
my $ig_from = $self->{main}->call_plugins ("check_wb_list",
{ permsgstatus => $PMS, type => 'from', list => 'bayes_ignore_from' });
my $ig_to = $self->{main}->call_plugins ("check_wb_list",
{ permsgstatus => $PMS, type => 'to', list => 'bayes_ignore_to' });
my $ignore = $ig_from || $ig_to;
dbg("bayes: not using bayes, bayes_ignore_from or _to rule") if $ignore;
return $ignore;
}
###########################################################################
sub learn {
my ($self, $isspam, $msg, $id) = @_;
return unless $self->{conf}->{use_learner};
return unless defined $msg;
if( $self->{use_ignores} ) # Remove test when PerMsgStatus available.
{
# DMK, koppel@ece.lsu.edu: Hoping that the ultimate fix to bug 2263 will
# make it unnecessary to construct a PerMsgStatus here.
my $PMS = new Mail::SpamAssassin::PerMsgStatus $self->{main}, $msg;
my $ignore = $self->ignore_message($PMS);
$PMS->finish();
return 0 if $ignore;
}
return $self->{main}->call_plugins("learn_message", { isspam => $isspam, msg => $msg, id => $id });
}
###########################################################################
sub forget {
my ($self, $msg, $id) = @_;
return unless $self->{conf}->{use_learner};
return unless defined $msg;
return $self->{main}->call_plugins("forget_message", { msg => $msg, id => $id });
}
###########################################################################
sub sync {
my ($self, $sync, $expire, $opts) = @_;
return 0 unless $self->{conf}->{use_learner};
if ($sync) {
$self->{main}->call_plugins("learner_sync", $opts );
}
if ($expire) {
$self->{main}->call_plugins("learner_expire_old_training", $opts );
}
return 0;
}
###########################################################################
sub is_scan_available {
my $self = shift;
return 0 unless $self->{conf}->{use_learner};
return $self->{main}->call_plugins("learner_is_scan_available");
}
###########################################################################
sub dump_bayes_db {
my($self, $magic, $toks, $regex) = @_;
return 0 unless $self->{conf}->{use_learner};
return $self->{main}->call_plugins("learner_dump_database", {
magic => $magic, toks => $toks, regex => $regex });
}
1;

View File

@ -0,0 +1,124 @@
# Chi-square probability combining and related constants.
#
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
use strict; # make Test::Perl::Critic happy
# this package is a no-op; the real impl code is in another pkg.
package Mail::SpamAssassin::Bayes::CombineChi; 1;
# Force into another package, so our symbols will appear in that namespace with
# no indirection, for speed. Other combiners must do the same, since Bayes.pm
# uses this namespace directly. This means only one combiner can be loaded at
# any time.
package Mail::SpamAssassin::Bayes::Combine;
use strict;
use warnings;
# use bytes;
use re 'taint';
use POSIX qw(frexp);
use constant LN2 => log(2);
# Value for 'x' in Gary Robinson's f(w) equation.
# "Let x = the number used when n [hits] is 0."
our $FW_X_CONSTANT = 0.538;
# Value for 's' in the f(w) equation. "We can see s as the "strength" (hence
# the use of "s") of an original assumed expectation ... relative to how
# strongly we want to consider our actual collected data." Low 's' means
# trust collected data more strongly.
our $FW_S_CONSTANT = 0.030;
# (s . x) for the f(w) equation.
our $FW_S_DOT_X = ($FW_X_CONSTANT * $FW_S_CONSTANT);
# Should we ignore tokens with probs very close to the middle ground (.5)?
# tokens need to be outside the [ .5-MPS, .5+MPS ] range to be used.
our $MIN_PROB_STRENGTH = 0.346;
###########################################################################
# Chi-Squared method. Produces mostly boolean $result,
# but with a grey area.
sub combine {
my ($ns, $nn, $sortedref) = @_;
# @$sortedref contains an array of the probabilities
my $wc = scalar @$sortedref;
return unless $wc;
my ($H, $S);
my ($Hexp, $Sexp);
$Hexp = $Sexp = 0;
# see bug 3118
my $totmsgs = ($ns + $nn);
if ($totmsgs == 0) { return; }
$S = ($ns / $totmsgs);
$H = ($nn / $totmsgs);
foreach my $prob (@$sortedref) {
$S *= 1.0 - $prob;
$H *= $prob;
if ($S < 1e-200) {
my $e;
($S, $e) = frexp($S);
$Sexp += $e;
}
if ($H < 1e-200) {
my $e;
($H, $e) = frexp($H);
$Hexp += $e;
}
}
$S = log($S) + $Sexp * LN2;
$H = log($H) + $Hexp * LN2;
# note: previous versions used (2 * $wc) as second arg ($v), but the chi2q()
# fn then just used ($v/2) internally! changed to simply supply $wc as
# ($halfv) directly instead to avoid redundant doubling and halving. The
# side-effect is that chi2q() uses a different API now, but it's only used
# here anyway.
$S = 1.0 - chi2q(-2.0 * $S, $wc);
$H = 1.0 - chi2q(-2.0 * $H, $wc);
return (($S - $H) + 1.0) / 2.0;
}
# Chi-squared function (API changed; see comment above)
sub chi2q {
my ($x2, $halfv) = @_;
my $m = $x2 / 2.0;
my ($sum, $term);
$sum = $term = exp(0 - $m);
# replace 'for my $i (1 .. (($v/2)-1))' idiom, which creates a temp
# array, with a plain C-style for loop
my $i;
for ($i = 1; $i < $halfv; $i++) {
$term *= $m / $i;
$sum += $term;
}
return $sum < 1.0 ? $sum : 1.0;
}
1;

View File

@ -0,0 +1,77 @@
# Naive-Bayesian-style probability combining and related constants.
#
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
use strict; # make Test::Perl::Critic happy
# this package is a no-op; the real impl code is in another pkg.
package Mail::SpamAssassin::Bayes::CombineNaiveBayes; 1;
# Force into another package, so our symbols will appear in that namespace with
# no indirection, for speed. Other combiners must do the same, since Bayes.pm
# uses this namespace directly. This means only one combiner can be loaded at
# any time.
package Mail::SpamAssassin::Bayes::Combine;
use strict;
use warnings;
# use bytes;
use re 'taint';
###########################################################################
# Value for 'x' in Gary Robinson's f(w) equation.
# "Let x = the number used when n [hits] is 0."
our $FW_X_CONSTANT = 0.600;
# Value for 's' in the f(w) equation. "We can see s as the "strength" (hence
# the use of "s") of an original assumed expectation ... relative to how
# strongly we want to consider our actual collected data." Low 's' means
# trust collected data more strongly.
our $FW_S_CONSTANT = 0.160;
# (s . x) for the f(w) equation.
our $FW_S_DOT_X = ($FW_X_CONSTANT * $FW_S_CONSTANT);
# Should we ignore tokens with probs very close to the middle ground (.5)?
# tokens need to be outside the [ .5-MPS, .5+MPS ] range to be used.
our $MIN_PROB_STRENGTH = 0.430;
###########################################################################
# Combine probabilities using Gary Robinson's naive-Bayesian-style
# combiner
sub combine {
my ($ns, $nn, $sortedref) = @_;
my $wc = scalar @$sortedref;
return unless $wc;
my $P = 1;
my $Q = 1;
foreach my $pw (@$sortedref) {
$P *= (1-$pw);
$Q *= $pw;
}
$P = 1 - ($P ** (1 / $wc));
$Q = 1 - ($Q ** (1 / $wc));
return (1 + ($P - $Q) / ($P + $Q)) / 2.0;
}
1;

View File

@ -0,0 +1,921 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::BayesStore - Storage Module for default Bayes classifier
=head1 DESCRIPTION
This is the public API for the Bayesian store methods. Any implementation of
the storage module for the default Bayes classifier must implement these methods.
=cut
package Mail::SpamAssassin::BayesStore;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Logger;
# TODO: if we ever get tuits, it'd be good to make these POD
# method docs more perlish... hardly a biggie.
=head1 METHODS
=over 4
=item new
public class (Mail::SpamAssassin::BayesStore) new (Mail::SpamAssassin::Plugin::Bayes $bayes)
Description:
This method creates a new instance of the Mail::SpamAssassin::BayesStore
object. You must pass in an instance of the Mail::SpamAssassin::Plugin::Bayes
object, which is stashed for use throughout the module.
=cut
sub new {
my ($class, $bayes) = @_;
$class = ref($class) || $class;
my $self = {
'bayes' => $bayes,
'supported_db_version' => 0,
'db_version' => undef,
};
bless ($self, $class);
$self;
}
=item DB_VERSION
public instance (Integer) DB_VERSION ()
Description:
This method returns the currently supported database version for the
implementation.
=cut
sub DB_VERSION {
my ($self) = @_;
return $self->{supported_db_version};
}
=item read_db_configs
public instance () read_db_configs ()
Description:
This method reads any needed config variables from the configuration object
and then calls the Mail::SpamAssassin::Plugin::Bayes read_db_configs method.
=cut
sub read_db_configs {
my ($self) = @_;
# TODO: at some stage, this may be useful to read config items which
# control database bloat, like
#
# - use of hapaxes
# - use of case-sensitivity
# - more midrange-hapax-avoidance tactics when parsing headers (future)
#
# for now, we just set these settings statically.
my $conf = $self->{bayes}->{main}->{conf};
# Minimum desired database size? Expiry will not shrink the
# database below this number of entries. 100k entries is roughly
# equivalent to a 5Mb database file.
$self->{expiry_max_db_size} = $conf->{bayes_expiry_max_db_size};
$self->{expiry_pct} = $conf->{bayes_expiry_pct};
$self->{expiry_period} = $conf->{bayes_expiry_period};
$self->{expiry_max_exponent} = $conf->{bayes_expiry_max_exponent};
$self->{bayes}->read_db_configs();
}
=item prefork_init
public instance (Boolean) prefork_init ()
Description:
This optional method is called in the parent process shortly before
forking off child processes.
=cut
# sub prefork_init {
# my ($self) = @_;
# }
=item spamd_child_init
public instance (Boolean) spamd_child_init ()
Description:
This optional method is called in a child process shortly after being spawned.
=cut
# sub spamd_child_init {
# my ($self) = @_;
# }
=item tie_db_readonly
public instance (Boolean) tie_db_readonly ()
Description:
This method opens up the database in readonly mode.
=cut
sub tie_db_readonly {
my ($self) = @_;
die "bayes: tie_db_readonly: not implemented\n";
}
=item tie_db_writable
public instance (Boolean) tie_db_writable ()
Description:
This method opens up the database in writable mode.
Any callers of this methods should ensure that they call untie_db()
afterwards.
=cut
sub tie_db_writable {
my ($self) = @_;
die "bayes: tie_db_writable: not implemented\n";
}
=item untie_db
public instance () untie_db ()
Description:
This method unties the database.
=cut
sub untie_db {
my $self = shift;
die "bayes: untie_db: not implemented\n";
}
=item calculate_expire_delta
public instance (%) calculate_expire_delta (Integer $newest_atime,
Integer $start,
Integer $max_expire_mult)
Description:
This method performs a calculation on the data to determine the optimum
atime for token expiration.
=cut
sub calculate_expire_delta {
my ($self, $newest_atime, $start, $max_expire_mult) = @_;
die "bayes: calculate_expire_delta: not implemented\n";
}
=item token_expiration
public instance (Integer, Integer,
Integer, Integer) token_expiration(\% $opts,
Integer $newest_atime,
Integer $newdelta)
Description:
This method performs the database specific expiration of tokens based on
the passed in C<$newest_atime> and C<$newdelta>.
=cut
sub token_expiration {
my ($self, $opts, $newest_atime, $newdelta) = @_;
die "bayes: token_expiration: not implemented\n";
}
=item expire_old_tokens
public instance (Boolean) expire_old_tokens (\% hashref)
Description:
This method expires old tokens from the database.
=cut
sub expire_old_tokens {
my ($self, $opts) = @_;
my $ret;
my $eval_stat;
eval {
local $SIG{'__DIE__'}; # do not run user die() traps in here
if ($self->tie_db_writable()) {
$ret = $self->expire_old_tokens_trapped ($opts);
}
1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
};
if (!$self->{bayes}->{main}->{learn_caller_will_untie}) {
$self->untie_db();
}
if (defined $eval_stat) { # if we died, untie the dbs.
warn "bayes: expire_old_tokens: $eval_stat\n";
return 0;
}
$ret;
}
=item expire_old_tokens_trapped
public instance (Boolean) expire_old_tokens_trapped (\% $opts)
Description:
This methods does the actual token expiration.
XXX More docs here about the methodology and what not
=cut
sub expire_old_tokens_trapped {
my ($self, $opts) = @_;
# Flag that we're doing work
$self->set_running_expire_tok();
# We don't need to do an expire, so why were we called? Oh well.
if (!$self->expiry_due()) {
$self->remove_running_expire_tok();
return 0;
}
my $started = time();
my @vars = $self->get_storage_variables();
if ( $vars[10] > time ) {
dbg("bayes: expiry found newest atime in the future, resetting to current time");
$vars[10] = time;
}
# How many tokens do we want to keep?
my $goal_reduction = int($self->{expiry_max_db_size} * $self->{expiry_pct});
dbg("bayes: expiry check keep size, ".$self->{expiry_pct}." * max: $goal_reduction");
# Make sure we keep at least 100000 tokens in the DB
if ( $goal_reduction < 100000 ) {
$goal_reduction = 100000;
dbg("bayes: expiry keep size too small, resetting to 100,000 tokens");
}
# Now turn goal_reduction into how many to expire.
$goal_reduction = $vars[3] - $goal_reduction;
dbg("bayes: token count: ".$vars[3].", final goal reduction size: $goal_reduction");
if ( $goal_reduction < 1000 ) { # too few tokens to expire, abort.
dbg("bayes: reduction goal of $goal_reduction is under 1,000 tokens, skipping expire");
$self->set_last_expire(time());
$self->remove_running_expire_tok(); # this won't be cleaned up, so do it now.
return 1; # we want to indicate things ran as expected
}
# Estimate new atime delta based on the last atime delta
my $newdelta = 0;
if ( $vars[9] > 0 ) {
# newdelta = olddelta * old / goal;
# this may seem backwards, but since we're talking delta here,
# not actual atime, we want smaller atimes to expire more tokens,
# and visa versa.
#
$newdelta = int($vars[8] * $vars[9] / $goal_reduction);
}
# Calculate size difference between last expiration token removal
# count and the current goal removal count.
my $ratio = ($vars[9] == 0 || $vars[9] > $goal_reduction) ? $vars[9]/$goal_reduction : $goal_reduction/$vars[9];
dbg("bayes: first pass? current: ".time().", Last: ".$vars[4].", atime: ".$vars[8].", count: ".$vars[9].", newdelta: $newdelta, ratio: $ratio, period: ".$self->{expiry_period});
## ESTIMATION PHASE
#
# Do this for the first expire or "odd" looking results cause a first pass to determine atime:
#
# - last expire was more than 30 days ago
# assume mail flow stays roughly the same month to month, recompute if it's > 1 month
# - last atime delta was under expiry period
# if we're expiring often max_db_size should go up, but let's recompute just to check
# - last reduction count was < 1000 tokens
# ditto
# - new estimated atime delta is under expiry period
# ditto
# - difference of last reduction to current goal reduction is > 50%
# if the two values are out of balance, estimating atime is going to be funky, recompute
#
if ( (time() - $vars[4] > 86400*30) || ($vars[8] < $self->{expiry_period}) || ($vars[9] < 1000)
|| ($newdelta < $self->{expiry_period}) || ($ratio > 1.5) ) {
dbg("bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)");
my $start = $self->{expiry_period}; # exponential search starting at ...? 1/2 day, 1, 2, 4, 8, 16, ...
my $max_expire_mult = 2**$self->{expiry_max_exponent}; # $max_expire_mult * $start = max expire time (256 days), power of 2.
dbg("bayes: expiry max exponent: ".$self->{expiry_max_exponent});
my %delta = $self->calculate_expire_delta($vars[10], $start, $max_expire_mult);
return 0 unless (%delta);
# This will skip the for loop if debugging isn't enabled ...
if (would_log('dbg', 'bayes')) {
dbg("bayes: atime\ttoken reduction");
dbg("bayes: ========\t===============");
for(my $i = 1; $i<=$max_expire_mult; $i <<= 1) {
dbg("bayes: ".$start*$i."\t".(exists $delta{$i} ? $delta{$i} : 0));
}
}
# Now figure out which max_expire_mult value gives the closest results to goal_reduction, without
# going over ... Go from the largest delta backwards so the reduction size increases
# (tokens that expire at 4 also expire at 3, 2, and 1, so 1 will always be the largest expiry...)
#
for( ; $max_expire_mult > 0; $max_expire_mult>>=1 ) {
next unless exists $delta{$max_expire_mult};
if ($delta{$max_expire_mult} > $goal_reduction) {
$max_expire_mult<<=1; # the max expire is actually the next power of 2 out
last;
}
}
# if max_expire_mult gets to 0, either we can't expire anything, or 1 is <= $goal_reduction
$max_expire_mult ||= 1;
# $max_expire_mult is now equal to the value we should use ...
# Check to see if the atime value we found is really good.
# It's not good if:
# - $max_expire_mult would not expire any tokens. This means that the majority of
# tokens are old or new, and more activity is required before an expiry can occur.
# - reduction count < 1000, not enough tokens to be worth doing an expire.
#
if ( !exists $delta{$max_expire_mult} || $delta{$max_expire_mult} < 1000 ) {
dbg("bayes: couldn't find a good delta atime, need more token difference, skipping expire");
$self->set_last_expire(time());
$self->remove_running_expire_tok(); # this won't be cleaned up, so do it now.
return 1; # we want to indicate things ran as expected
}
$newdelta = $start * $max_expire_mult;
dbg("bayes: first pass decided on $newdelta for atime delta");
}
else { # use the estimation method
dbg("bayes: can do estimation method for expiry, skipping first pass");
}
my ($kept, $deleted, $num_hapaxes, $num_lowfreq) = $self->token_expiration($opts, $newdelta, @vars);
my $done = time();
my $msg = "expired old bayes database entries in ".($done - $started)." seconds";
my $msg2 = "$kept entries kept, $deleted deleted";
if ($opts->{verbose}) {
my $hapax_pc = ($num_hapaxes * 100) / $kept;
my $lowfreq_pc = ($num_lowfreq * 100) / $kept;
print "$msg\n$msg2\n" or die "Error writing: $!";
printf "token frequency: 1-occurrence tokens: %3.2f%%\n", $hapax_pc
or die "Error writing: $!";
printf "token frequency: less than 8 occurrences: %3.2f%%\n", $lowfreq_pc
or die "Error writing: $!";
}
else {
dbg("bayes: $msg: $msg2");
}
$self->remove_running_expire_tok();
return 1;
}
=item sync_due
public instance (Boolean) sync_due ()
Description:
This methods determines if a sync is due.
=cut
sub sync_due {
my ($self) = @_;
die "bayes: sync_due: not implemented\n";
}
=item expiry_due
public instance (Boolean) expiry_due ()
Description:
This methods determines if an expire is due.
=cut
sub expiry_due {
my ($self) = @_;
$self->read_db_configs(); # make sure this has happened here
# If force expire was called, do the expire no matter what.
return 1 if ($self->{bayes}->{main}->{learn_force_expire});
# if config says not to auto expire then no need to continue
return 0 if ($self->{bayes}->{main}->{conf}->{bayes_auto_expire} == 0);
# is the database too small for expiry? (Do *not* use "scalar keys",
# as this will iterate through the entire db counting them!)
my @vars = $self->get_storage_variables();
my $ntoks = $vars[3];
my $last_expire = time() - $vars[4];
if (!$self->{bayes}->{main}->{ignore_safety_expire_timeout}) {
# if we're not ignoring the safety timeout, don't run an expire more
# than once every 12 hours.
return 0 if ($last_expire < 43200);
}
else {
# if we are ignoring the safety timeout (e.g.: mass-check), still
# limit the expiry to only one every 5 minutes.
return 0 if ($last_expire < 300);
}
dbg("bayes: DB expiry: tokens in DB: $ntoks, Expiry max size: ".$self->{expiry_max_db_size}.", Oldest atime: ".$vars[5].", Newest atime: ".$vars[10].", Last expire: ".$vars[4].", Current time: ".time());
my $conf = $self->{bayes}->{main}->{conf};
if ($ntoks <= 100000 || # keep at least 100k tokens
$self->{expiry_max_db_size} > $ntoks || # not enough tokens to cause an expire
$vars[10]-$vars[5] < 43200 || # delta between oldest and newest < 12h
$self->{db_version} < $self->DB_VERSION # ignore old db formats
) {
return 0;
}
return 1;
}
=item seen_get
public instance (Char) seen_get (String $msgid)
Description:
This method retrieves the stored value, if any, for C<$msgid>. The return
value is the stored string ('s' for spam and 'h' for ham) or undef if
C<$msgid> is not found.
=cut
sub seen_get {
my ($self, $msgid) = @_;
die "bayes: seen_get: not implemented\n";
}
=item seen_put
public instance (Boolean) seen_put (String $msgid, Char $flag)
Description:
This method records C<$msgid> as the type given by C<$flag>. C<$flag> is
one of two values 's' for spam and 'h' for ham.
=cut
sub seen_put {
my ($self, $msgid, $flag) = @_;
die "bayes: seen_put: not implemented\n";
}
=item seen_delete
public instance (Boolean) seen_delete (String $msgid)
Description:
This method removes C<$msgid> from storage.
=cut
sub seen_delete {
my ($self, $msgid) = @_;
die "bayes: seen_delete: not implemented\n";
}
=item get_storage_variables
public instance (@) get_storage_variables ()
Description:
This method retrieves the various administrative variables used by
the Bayes storage implementation.
The values returned in the array are in the following order:
0: scan count base
1: number of spam
2: number of ham
3: number of tokens in db
4: last expire atime
5: oldest token in db atime
6: db version value
7: last journal sync
8: last atime delta
9: last expire reduction count
10: newest token in db atime
=cut
sub get_storage_variables {
my ($self) = @_;
die "bayes: get_storage_variables: not implemented\n";
}
=item dump_db_toks
public instance () dump_db_toks (String $template, String $regex, @ @vars)
Description:
This method loops over all tokens, computing the probability for the token
and then printing it out according to the passed in template.
=cut
sub dump_db_toks {
my ($self, $template, $regex, @vars) = @_;
die "bayes: dump_db_toks: not implemented\n";
}
=item set_last_expire
public instance (Boolean) _set_last_expire (Integer $time)
Description:
This method sets the last expire time.
=cut
sub set_last_expire {
my ($self, $time) = @_;
die "bayes: set_last_expire: not implemented\n";
}
=item get_running_expire_tok
public instance (Time) get_running_expire_tok ()
Description:
This method determines if an expire is currently running and returns the time
the expire started.
=cut
sub get_running_expire_tok {
my ($self) = @_;
die "bayes: get_running_expire_tok: not implemented\n";
}
=item set_running_expire_tok
public instance (Time) set_running_expire_tok ()
Description:
This method sets the running expire time to the current time.
=cut
sub set_running_expire_tok {
my ($self) = @_;
die "bayes: set_running_expire_tok: not implemented\n";
}
=item remove_running_expire_tok
public instance (Boolean) remove_running_expire_tok ()
Description:
This method removes a currently set running expire time.
=cut
sub remove_running_expire_tok {
my ($self) = @_;
die "bayes: remove_running_expire_tok: not implemented\n";
}
=item tok_get
public instance (Integer, Integer, Time) tok_get (String $token)
Description:
This method retrieves the specified token (C<$token>) from storage and returns
it's spam count, ham count and last access time.
=cut
sub tok_get {
my ($self, $token) = @_;
die "bayes: tok_get: not implemented\n";
}
=item tok_get_all
public instance (\@) tok_get_all (@ @tokens)
Description:
This method retrieves the specified tokens (C<@tokens>) from storage and
returns an array ref of arrays spam count, ham count and last access time.
=cut
sub tok_get_all {
my ($self, $tokens) = @_;
die "bayes: tok_get_all: not implemented\n";
}
=item tok_count_change
public instance (Boolean) tok_count_change (Integer $spam_count,
Integer $ham_count,
String $token,
Time $atime)
Description:
This method takes a C<$spam_count> and C<$ham_count> and adds it to
C<$token> along with updating C<$token>s atime with C<$atime>.
=cut
sub tok_count_change {
my ($self, $spam_count, $ham_count, $token, $atime) = @_;
die "bayes: tok_count_change: not implemented\n";
}
=item multi_tok_count_change
public instance (Boolean) multi_tok_count_change (Integer $spam_count,
Integer $ham_count,
\% $tokens,
String $atime)
Description:
This method takes a C<$spam_count> and C<$ham_count> and adds it to all
of the tokens in the C<$tokens> hash ref along with updating each tokens
atime with C<$atime>.
=cut
sub multi_tok_count_change {
my ($self, $spam_count, $ham_count, $tokens, $atime) = @_;
die "bayes: multi_tok_count_change: not implemented\n";
}
=item nspam_nham_get
public instance (Integer, Integer) nspam_nham_get ()
Description:
This method retrieves the total number of spam and the total number of ham
currently under storage.
=cut
sub nspam_nham_get {
my ($self) = @_;
die "bayes: nspam_nham_get: not implemented\n";
}
=item nspam_nham_change
public instance (Boolean) nspam_nham_change (Integer $num_spam,
Integer $num_ham)
Description:
This method updates the number of spam and the number of ham in the database.
=cut
sub nspam_nham_change {
my ($self, $num_spam, $num_ham) = @_;
die "bayes: nspam_nham_change: not implemented\n";
}
=item tok_touch
public instance (Boolean) tok_touch (String $token,
Time $atime)
Description:
This method updates the given tokens (C<$token>) access time.
=cut
sub tok_touch {
my ($self, $token, $atime) = @_;
die "bayes: tok_touch: not implemented\n";
}
=item tok_touch_all
public instance (Boolean) tok_touch_all (\@ $tokens,
Time $atime)
Description:
This method does a mass update of the given list of tokens C<$tokens>, if the existing token
atime is < C<$atime>.
=cut
sub tok_touch_all {
my ($self, $tokens, $atime) = @_;
die "bayes: tok_touch_all: not implemented\n";
}
=item cleanup
public instance (Boolean) cleanup ()
Description:
This method performs any cleanup necessary before moving onto the next
operation.
=cut
sub cleanup {
my ($self) = @_;
die "bayes: cleanup: not implemented\n";
}
=item get_magic_re
public instance get_magic_re (String)
Description:
This method returns a regexp which indicates a magic token.
=cut
sub get_magic_re {
my ($self) = @_;
die "bayes: get_magic_re: not implemented\n";
}
=item sync
public instance (Boolean) sync (\% $opts)
Description:
This method performs a sync of the database.
=cut
sub sync {
my ($self, $opts) = @_;
die "bayes: sync: not implemented\n";
}
=item perform_upgrade
public instance (Boolean) perform_upgrade (\% $opts)
Description:
This method is a utility method that performs any necessary upgrades
between versions. It should know how to handle previous versions and
what needs to happen to upgrade them.
A true return value indicates success.
=cut
sub perform_upgrade {
my ($self, $opts) = @_;
die "bayes: perform_upgrade: not implemented\n";
}
=item clear_database
public instance (Boolean) clear_database ()
Description:
This method deletes all records for a particular user.
Callers should be aware that any errors returned by this method
could causes the database to be inconsistent for the given user.
=cut
sub clear_database {
my ($self) = @_;
die "bayes: clear_database: not implemented\n";
}
=item backup_database
public instance (Boolean) backup_database ()
Description:
This method will dump the users database in a machine readable format.
=cut
sub backup_database {
my ($self) = @_;
die "bayes: backup_database: not implemented\n";
}
=item restore_database
public instance (Boolean) restore_database (String $filename, Boolean $showdots)
Description:
This method restores a database from the given filename, C<$filename>.
Callers should be aware that any errors returned by this method
could causes the database to be inconsistent for the given user.
=cut
sub restore_database {
my ($self, $filename, $showdots) = @_;
die "bayes: restore_database: not implemented\n";
}
=item db_readable
public instance (Boolean) db_readable ()
Description:
This method returns whether or not the Bayes DB is available in a
readable state.
=cut
sub db_readable {
my ($self) = @_;
die "bayes: db_readable: not implemented\n";
}
=item db_writable
public instance (Boolean) db_writable ()
Description:
This method returns whether or not the Bayes DB is available in a
writable state.
=cut
sub db_writable {
my ($self) = @_;
die "bayes: db_writable: not implemented\n";
}
sub sa_die { Mail::SpamAssassin::sa_die(@_); }
1;
=back
=cut

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,75 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::BayesStore::SDBM;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Fcntl;
use Mail::SpamAssassin::BayesStore::DBM;
use Mail::SpamAssassin::Logger;
# our @DBNAMES; # <---- unused!
our @ISA = qw( Mail::SpamAssassin::BayesStore::DBM );
sub HAS_DBM_MODULE {
my ($self) = @_;
if (exists($self->{has_dbm_module})) {
return $self->{has_dbm_module};
}
$self->{has_dbm_module} = eval { require SDBM_File; };
}
sub DBM_MODULE {
return "SDBM_File";
}
# Possible file extensions used by the kinds of database files SDBM_File
# might create. We need these so we can create a new file and rename
# it into place.
sub DB_EXTENSIONS {
return ('.pag', '.dir');
}
sub _unlink_file {
my ($self, $filename) = @_;
for my $ext ($self->DB_EXTENSIONS) {
unlink $filename . $ext;
}
}
sub _rename_file {
my ($self, $sourcefilename, $targetfilename) = @_;
for my $ext ($self->DB_EXTENSIONS) {
return 0 unless (rename($sourcefilename . $ext, $targetfilename . $ext));
}
return 1;
}
# this is called directly from sa-learn(1).
sub perform_upgrade {
return 1;
}
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,606 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Client - Client for spamd Protocol
=head1 SYNOPSIS
my $client = Mail::SpamAssassin::Client->new({
port => 783,
host => 'localhost',
username => 'someuser'});
or
my $client = Mail::SpamAssassin::Client->new({
socketpath => '/path/to/socket',
username => 'someuser'});
Optionally takes timeout, which is applied to IO::Socket for the
initial connection. If not supplied, it defaults to 30 seconds.
if ($client->ping()) {
print "Ping is ok\n";
}
my $result = $client->process($testmsg);
if ($result->{isspam} eq 'True') {
do something with spam message here
}
=head1 DESCRIPTION
Mail::SpamAssassin::Client is a module which provides a perl implementation of
the spamd protocol.
=cut
package Mail::SpamAssassin::Client;
use strict;
use warnings;
use re 'taint';
use IO::Socket;
use Errno qw(EBADF);
our($io_socket_module_name);
BEGIN {
if (eval { require IO::Socket::IP }) {
$io_socket_module_name = 'IO::Socket::IP';
} elsif (eval { require IO::Socket::INET6 }) {
$io_socket_module_name = 'IO::Socket::INET6';
} elsif (eval { require IO::Socket::INET }) {
$io_socket_module_name = 'IO::Socket::INET';
}
}
my $EOL = "\015\012";
my $BLANK = $EOL x 2;
my $PROTOVERSION = 'SPAMC/1.5';
=head1 PUBLIC METHODS
=head2 new
public class (Mail::SpamAssassin::Client) new (\% $args)
Description:
This method creates a new Mail::SpamAssassin::Client object.
=cut
sub new {
my ($class, $args) = @_;
$class = ref($class) || $class;
my $self = {};
# with a sockets_path set then it makes no sense to set host and port
if ($args->{socketpath}) {
$self->{socketpath} = $args->{socketpath};
}
else {
$self->{port} = $args->{port};
$self->{host} = $args->{host};
}
if (defined $args->{username}) {
$self->{username} = $args->{username};
}
if ($args->{timeout}) {
$self->{timeout} = $args->{timeout} || 30;
}
bless($self, $class);
$self;
}
=head2 process
public instance (\%) process (String $msg)
Description:
This method calls the spamd server with the PROCESS command.
The return value is a hash reference containing several pieces of information,
if available:
content_length
isspam
score
threshold
message
=cut
sub process {
my ($self, $msg, $is_check_p) = @_;
my $command = 'PROCESS';
if ($is_check_p) {
warn "Passing in \$is_check_p is deprecated, just call the check method instead.\n";
$command = 'CHECK';
}
return $self->_filter($msg, $command);
}
=head2 check
public instance (\%) check (String $msg)
Description:
The method implements the check call.
See the process method for the return value.
=cut
sub check {
my ($self, $msg) = @_;
return $self->_filter($msg, 'CHECK');
}
=head2 headers
public instance (\%) headers (String $msg)
Description:
This method implements the headers call.
See the process method for the return value.
=cut
sub headers {
my ($self, $msg) = @_;
return $self->_filter($msg, 'HEADERS');
}
=head2 learn
public instance (Boolean) learn (String $msg, Integer $learntype)
Description:
This method implements the learn call. C<$learntype> should be
an integer, 0 for spam, 1 for ham and 2 for forget. The return
value is a boolean indicating if the message was learned or not.
An undef return value indicates that there was an error and you
should check the resp_code/resp_msg values to determine what
the error was.
=cut
sub learn {
my ($self, $msg, $learntype) = @_;
$self->_clear_errors();
my $remote = $self->_create_connection();
return unless $remote;
my $msgsize = length($msg.$EOL);
print $remote "TELL $PROTOVERSION$EOL";
print $remote "Content-length: $msgsize$EOL";
print $remote "User: $self->{username}$EOL" if defined $self->{username};
if ($learntype == 0) {
print $remote "Message-class: spam$EOL";
print $remote "Set: local$EOL";
}
elsif ($learntype == 1) {
print $remote "Message-class: ham$EOL";
print $remote "Set: local$EOL";
}
elsif ($learntype == 2) {
print $remote "Remove: local$EOL";
}
else { # bad learntype
$self->{resp_code} = 00;
$self->{resp_msg} = 'do not know';
return;
}
print $remote "$EOL";
print $remote $msg;
print $remote "$EOL";
$! = 0; my $line = <$remote>;
# deal gracefully with a Perl I/O bug which may return status EBADF at eof
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (1): $!")
: die "error reading from spamd (1): $!";
return unless defined $line;
my ($version, $resp_code, $resp_msg) = $self->_parse_response_line($line);
$self->{resp_code} = $resp_code;
$self->{resp_msg} = $resp_msg;
return unless $resp_code == 0;
my $did_set = '';
my $did_remove = '';
for ($!=0; defined($line=<$remote>); $!=0) {
local $1;
if ($line =~ /DidSet: (.*)/i) {
$did_set = $1;
}
elsif ($line =~ /DidRemove: (.*)/i) {
$did_remove = $1;
}
elsif ($line =~ /^${EOL}$/) {
last;
}
}
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (2): $!")
: die "error reading from spamd (2): $!";
close $remote or die "error closing socket: $!";
if ($learntype == 0 || $learntype == 1) {
return $did_set =~ /local/;
}
else { #safe since we've already checked the $learntype values
return $did_remove =~ /local/;
}
}
=head2 report
public instance (Boolean) report (String $msg)
Description:
This method provides the report interface to spamd.
=cut
sub report {
my ($self, $msg) = @_;
$self->_clear_errors();
my $remote = $self->_create_connection();
return unless $remote;
my $msgsize = length($msg.$EOL);
print $remote "TELL $PROTOVERSION$EOL";
print $remote "Content-length: $msgsize$EOL";
print $remote "User: $self->{username}$EOL" if defined $self->{username};
print $remote "Message-class: spam$EOL";
print $remote "Set: local,remote$EOL";
print $remote "$EOL";
print $remote $msg;
print $remote "$EOL";
$! = 0; my $line = <$remote>;
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (3): $!")
: die "error reading from spamd (3): $!";
return unless defined $line;
my ($version, $resp_code, $resp_msg) = $self->_parse_response_line($line);
$self->{resp_code} = $resp_code;
$self->{resp_msg} = $resp_msg;
return unless $resp_code == 0;
my $reported_p = 0;
for ($!=0; defined($line=<$remote>); $!=0) {
if ($line =~ /DidSet:\s+.*remote/i) {
$reported_p = 1;
last;
}
elsif ($line =~ /^${EOL}$/) {
last;
}
}
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (4): $!")
: die "error reading from spamd (4): $!";
close $remote or die "error closing socket: $!";
return $reported_p;
}
=head2 revoke
public instance (Boolean) revoke (String $msg)
Description:
This method provides the revoke interface to spamd.
=cut
sub revoke {
my ($self, $msg) = @_;
$self->_clear_errors();
my $remote = $self->_create_connection();
return unless $remote;
my $msgsize = length($msg.$EOL);
print $remote "TELL $PROTOVERSION$EOL";
print $remote "Content-length: $msgsize$EOL";
print $remote "User: $self->{username}$EOL" if defined $self->{username};
print $remote "Message-class: ham$EOL";
print $remote "Set: local$EOL";
print $remote "Remove: remote$EOL";
print $remote "$EOL";
print $remote $msg;
print $remote "$EOL";
$! = 0; my $line = <$remote>;
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (5): $!")
: die "error reading from spamd (5): $!";
return unless defined $line;
my ($version, $resp_code, $resp_msg) = $self->_parse_response_line($line);
$self->{resp_code} = $resp_code;
$self->{resp_msg} = $resp_msg;
return unless $resp_code == 0;
my $revoked_p = 0;
for ($!=0; defined($line=<$remote>); $!=0) {
if ($line =~ /DidRemove:\s+remote/i) {
$revoked_p = 1;
last;
}
elsif ($line =~ /^${EOL}$/) {
last;
}
}
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (6): $!")
: die "error reading from spamd (6): $!";
close $remote or die "error closing socket: $!";
return $revoked_p;
}
=head2 ping
public instance (Boolean) ping ()
Description:
This method performs a server ping and returns 0 or 1 depending on
if the server responded correctly.
=cut
sub ping {
my ($self) = @_;
my $remote = $self->_create_connection();
return 0 unless ($remote);
print $remote "PING $PROTOVERSION$EOL";
print $remote "$EOL"; # bug 6187, bumps protocol version to 1.5
$! = 0; my $line = <$remote>;
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (7): $!")
: die "error reading from spamd (7): $!";
close $remote or die "error closing socket: $!";
return unless defined $line;
my ($version, $resp_code, $resp_msg) = $self->_parse_response_line($line);
return 0 unless ($resp_msg eq 'PONG');
return 1;
}
=head1 PRIVATE METHODS
=head2 _create_connection
private instance (IO::Socket) _create_connection ()
Description:
This method sets up a proper IO::Socket connection based on the arguments
used when creating the client object.
On failure, it sets an internal error code and returns undef.
=cut
sub _create_connection {
my ($self) = @_;
my $remote;
if ($self->{socketpath}) {
$remote = IO::Socket::UNIX->new( Peer => $self->{socketpath},
Type => SOCK_STREAM,
Timeout => $self->{timeout},
);
}
else {
my %params = ( Proto => "tcp",
PeerAddr => $self->{host},
PeerPort => $self->{port},
Timeout => $self->{timeout},
);
$remote = $io_socket_module_name->new(%params);
}
unless ($remote) {
print "Failed to create connection to spamd daemon: $!\n";
return;
}
$remote;
}
=head2 _parse_response_line
private instance (@) _parse_response_line (String $line)
Description:
This method parses the initial response line/header from the server
and returns its parts.
We have this as a separate method in case we ever decide to get fancy
with the response line.
=cut
sub _parse_response_line {
my ($self, $line) = @_;
$line =~ s/\r?\n$//;
return split(/\s+/, $line, 3);
}
=head2 _clear_errors
private instance () _clear_errors ()
Description:
This method clears out any current errors.
=cut
sub _clear_errors {
my ($self) = @_;
$self->{resp_code} = undef;
$self->{resp_msg} = undef;
}
=head2 _filter
private instance (\%) _filter (String $msg, String $command)
Description:
Makes the actual call to the spamd server for the various filter method
(ie PROCESS, CHECK, HEADERS, etc). The command that is passed in is
sent to the spamd server.
The return value is a hash reference containing several pieces of information,
if available:
content_length
isspam
score
threshold
message (if available)
=cut
sub _filter {
my ($self, $msg, $command) = @_;
my %data;
$self->_clear_errors();
my $remote = $self->_create_connection();
return 0 unless ($remote);
my $msgsize = length($msg.$EOL);
print $remote "$command $PROTOVERSION$EOL";
print $remote "Content-length: $msgsize$EOL";
print $remote "User: $self->{username}$EOL" if defined $self->{username};
print $remote "$EOL";
print $remote $msg;
print $remote "$EOL";
$! = 0; my $line = <$remote>;
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (8): $!")
: die "error reading from spamd (8): $!";
return unless defined $line;
my ($version, $resp_code, $resp_msg) = $self->_parse_response_line($line);
$self->{resp_code} = $resp_code;
$self->{resp_msg} = $resp_msg;
return unless $resp_code == 0;
for ($!=0; defined($line=<$remote>); $!=0) {
local($1,$2,$3);
if ($line =~ /Content-length: (\d+)/) {
$data{content_length} = $1;
}
elsif ($line =~ m!Spam: (\S+) ; (\S+) / (\S+)!) {
$data{isspam} = $1;
$data{score} = $2 + 0;
$data{threshold} = $3 + 0;
}
elsif ($line =~ /^${EOL}$/) {
last;
}
}
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (9): $!")
: die "error reading from spamd (9): $!";
my $return_msg;
for ($!=0; defined($line=<$remote>); $!=0) {
$return_msg .= $line;
}
defined $line || $!==0 or
$!==EBADF ? dbg("error reading from spamd (10): $!")
: die "error reading from spamd (10): $!";
$data{message} = $return_msg if ($return_msg);
close $remote or die "error closing socket: $!";
return \%data;
}
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,206 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Conf::LDAP - load SpamAssassin scores from LDAP database
=head1 SYNOPSIS
(see Mail::SpamAssassin)
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
This class is used internally by SpamAssassin to load scores from an LDAP
database. Please refer to the C<Mail::SpamAssassin> documentation for public
interfaces.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::Conf::LDAP;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my ($main) = @_;
my $self = {
'main' => $main
};
bless ($self, $class);
$self;
}
###########################################################################
sub load_modules { # static
dbg("ldap: loading Net::LDAP and URI");
eval {
require Net::LDAP; # actual server connection
require URI; # parse server connection dsn
};
# do any other preloading that will speed up operation
}
###########################################################################
=item $f->load ($username)
Read configuration parameters from LDAP server and parse scores from it.
=back
=cut
sub load {
my ($self, $username) = @_;
my $conf = $self->{main}->{conf};
my $url = $conf->{user_scores_dsn}; # an ldap URI
dbg("ldap: URL is $url");
if(!defined($url) || $url eq '') {
dbg("ldap: No URL defined; skipping LDAP");
return;
}
eval {
# make sure we can see croak messages from DBI
local $SIG{'__DIE__'} = sub { warn "$_[0]"; };
require Net::LDAP;
require URI;
load_with_ldap($self, $username, $url);
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
if ($conf->{user_scores_fail_to_global}) {
info("ldap: failed to load user (%s) scores from LDAP server, ".
"using a global default: %s", $username, $eval_stat);
return 1;
} else {
warn sprintf(
"ldap: failed to load user (%s) scores from LDAP server: %s\n",
$username, $eval_stat);
return 0;
}
};
}
sub load_with_ldap {
my ($self, $username, $url) = @_;
# ldapurl = scheme "://" [hostport] ["/"
# [dn ["?" [attributes] ["?" [scope]
# ["?" [filter] ["?" extensions]]]]]]
my $uri = URI->new("$url");
my $host = $uri->host;
if (!defined($host) || $host eq '') {
dbg("ldap: No server specified, assuming localhost");
$host = "localhost";
}
my $port = $uri->port;
my $base = $uri->dn;
my @attr = $uri->attributes;
my $scope = $uri->scope;
my $filter = $uri->filter;
my $scheme = $uri->scheme;
my %extn = $uri->extensions; # unused
$filter =~ s/__USERNAME__/$username/g;
dbg("ldap: host=$host, port=$port, base='$base', attr=${attr[0]}, scope=$scope, filter='$filter'");
my $main = $self->{main};
my $conf = $main->{conf};
my $ldapuser = $conf->{user_scores_ldap_username};
my $ldappass = $conf->{user_scores_ldap_password};
if(!$ldapuser) {
undef($ldapuser);
} else {
dbg("ldap: user='$ldapuser'");
}
if(!$ldappass) {
undef($ldappass);
} else {
# don't log this to avoid leaking sensitive info
# dbg("ldap: pass='$ldappass'");
}
my $f_attribute = $attr[0];
my $ldap = Net::LDAP->new ("$host:$port",
onerror => "warn",
scheme => $scheme);
if (!defined($ldapuser) && !defined($ldappass)) {
$ldap->bind;
} else {
$ldap->bind($ldapuser, password => $ldappass);
}
my $result = $ldap->search( base => $base,
filter => $filter,
scope => $scope,
attrs => \@attr
);
my $config_text = '';
foreach my $entry ($result->all_entries) {
my @v = $entry->get_value($f_attribute);
foreach my $v (@v) {
dbg("ldap: retrieving prefs for $username: $v");
$config_text .= $v."\n";
}
}
if ($config_text ne '') {
$conf->{main} = $main;
$conf->parse_scores_only($config_text);
delete $conf->{main};
}
return;
}
###########################################################################
sub sa_die { Mail::SpamAssassin::sa_die(@_); }
###########################################################################
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,195 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Conf::SQL - load SpamAssassin scores from SQL database
=head1 SYNOPSIS
(see Mail::SpamAssassin)
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
This class is used internally by SpamAssassin to load scores from an SQL
database. Please refer to the C<Mail::SpamAssassin> documentation for public
interfaces.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::Conf::SQL;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my ($main) = @_;
my $self = {
'main' => $main
};
bless ($self, $class);
$self;
}
###########################################################################
sub load_modules { # static
eval {
require DBI;
};
# do any other preloading that will speed up operation
}
###########################################################################
=item $f->load ($username)
Read configuration parameters from SQL database and parse scores from it.
=back
=cut
sub load {
my ($self, $username) = @_;
my $conf = $self->{main}->{conf};
my $dsn = $conf->{user_scores_dsn};
if (!defined($dsn) || $dsn eq '') {
dbg("config: no DSN defined; skipping sql");
return 1;
}
eval {
# make sure we can see croak messages from DBI
local $SIG{'__DIE__'} = sub { die "$_[0]"; };
require DBI;
load_with_dbi($self, $username, $dsn);
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
if ($conf->{user_scores_fail_to_global}) {
info("config: failed to load user (%s) scores from SQL database, ".
"using a global default: %s", $username, $eval_stat);
return 1;
} else {
warn sprintf(
"config: failed to load user (%s) scores from SQL database: %s\n",
$username, $eval_stat);
return 0;
}
};
return 1;
}
sub load_with_dbi {
my ($self, $username, $dsn) = @_;
my $main = $self->{main};
my $conf = $main->{conf};
my $dbuser = $conf->{user_scores_sql_username};
my $dbpass = $conf->{user_scores_sql_password};
my $custom_query = $conf->{user_scores_sql_custom_query};
my $f_preference = 'preference';
my $f_value = 'value';
my $f_username = 'username';
my $f_table = 'userpref';
my $dbh = DBI->connect($dsn, $dbuser, $dbpass, {'PrintError' => 0});
if ($dbh) {
my $sql;
if (defined($custom_query)) {
$sql = $custom_query;
my $quoted_username = $dbh->quote($username);
my ($mailbox, $domain) = split('@', $username);
my $quoted_mailbox = $dbh->quote($mailbox);
my $quoted_domain = $dbh->quote($domain);
$sql =~ s/_USERNAME_/$quoted_username/g;
$sql =~ s/_TABLE_/$f_table/g;
$sql =~ s/_MAILBOX_/$quoted_mailbox/g;
$sql =~ s/_DOMAIN_/$quoted_domain/g;
}
else {
$sql = "select $f_preference, $f_value from $f_table where ".
"$f_username = ".$dbh->quote($username).
" or $f_username = '\@GLOBAL' order by $f_username asc";
}
dbg("config: Conf::SQL: executing SQL: $sql");
my $sth = $dbh->prepare($sql);
if ($sth) {
my $rv = $sth->execute();
if ($rv) {
dbg("config: retrieving prefs for $username from SQL server");
my @row;
my $config_text = '';
while (@row = $sth->fetchrow_array()) {
$config_text .= (defined($row[0]) ? $row[0] : '') . "\t" .
(defined($row[1]) ? $row[1] : '') . "\n";
}
if ($config_text ne '') {
$conf->{main} = $main;
$conf->parse_scores_only($config_text);
delete $conf->{main};
}
$sth->finish();
undef $sth;
}
else {
die "config: SQL error: $sql\n".$sth->errstr."\n";
}
}
else {
die "config: SQL error: " . $dbh->errstr . "\n";
}
$dbh->disconnect();
}
else {
die "config: SQL error: " . DBI->errstr . "\n";
}
}
###########################################################################
sub sa_die { Mail::SpamAssassin::sa_die(@_); }
###########################################################################
1;

View File

@ -0,0 +1,414 @@
# Constants used in many parts of the SpamAssassin codebase.
#
# TODO! we need to reimplement parts of the RESERVED regexp!
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Constants;
use strict;
use warnings;
use re 'taint';
use Exporter ();
our @ISA = qw(Exporter);
our(@BAYES_VARS, @IP_VARS, @SA_VARS, %EXPORT_TAGS, @EXPORT_OK);
# NOTE: Unless you need these to be available at BEGIN time, you're better with this out of a BEGIN block with a simple our statement.
BEGIN {
@IP_VARS = qw(
IP_IN_RESERVED_RANGE IP_PRIVATE LOCALHOST IPV4_ADDRESS IP_ADDRESS
);
@BAYES_VARS = qw(
DUMP_MAGIC DUMP_TOKEN DUMP_BACKUP
);
# These are generic constants that may be used across several modules
@SA_VARS = qw(
HARVEST_DNSBL_PRIORITY MBX_SEPARATOR
MAX_BODY_LINE_LENGTH MAX_HEADER_KEY_LENGTH MAX_HEADER_VALUE_LENGTH
MAX_HEADER_LENGTH ARITH_EXPRESSION_LEXER AI_TIME_UNKNOWN
CHARSETS_LIKELY_TO_FP_AS_CAPS MAX_URI_LENGTH RULENAME_RE IS_RULENAME
META_RULES_MATCHING_RE
);
%EXPORT_TAGS = (
bayes => [ @BAYES_VARS ],
ip => [ @IP_VARS ],
sa => [ @SA_VARS ],
all => [ @BAYES_VARS, @IP_VARS, @SA_VARS ],
);
@EXPORT_OK = ( @BAYES_VARS, @IP_VARS, @SA_VARS );
}
# BAYES_VARS
use constant DUMP_MAGIC => 1;
use constant DUMP_TOKEN => 2;
use constant DUMP_SEEN => 4;
use constant DUMP_BACKUP => 8;
# IP_VARS
# ---------------------------------------------------------------------------
# Initialize a regexp for private IPs, i.e. ones that could be
# used inside a company and be the first or second relay hit by
# a message. Some companies use these internally and translate
# them using a NAT firewall. These are listed in the RBL as invalid
# originators -- which is true, if you receive the mail directly
# from them; however we do not, so we should ignore them.
#
# sources:
# IANA = <https://www.iana.org/numbers>,
# 5735 = <https://tools.ietf.org/html/rfc5735>
# 6598 = <https://tools.ietf.org/html/rfc6598>
# 4193 = <https://tools.ietf.org/html/rfc4193>
# CYMRU = <https://www.team-cymru.com/bogon-reference.html>
#
# This includes:
# host-local address space 127.0.0.0/8 and ::1,
# link-local address space 169.254.0.0/16 and fe80::/10,
# private-use address space 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
# TODO: Unique Local Unicast Addresses fc00::/7 (RFC 4193)
# shared address space 100.64.0.0/10 (RFC 6598 - for use in CGN),
# IPv4-mapped IPv6 address ::ffff:0:0/96 (RFC 3513)
#
use constant IP_PRIVATE => qr{^(?:
(?: # IPv4 addresses
10| # 10.0.0.0/8 Private Use (5735, 1918)
127| # 127.0.0.0/8 Host-local (5735, 1122)
169\.254| # 169.254.0.0/16 Link-local (5735, 3927)
172\.(?:1[6-9]|2[0-9]|3[01])| # 172.16.0.0/12 Private Use (5735, 1918)
192\.168| # 192.168.0.0/16 Private Use (5735, 1918)
100\.(?:6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7]) # 100.64.0.0/10 CGN (6598)
)\..*
|
(?: # IPv6 addresses
# don't use \b here, it hits on :'s
(?:IPv6: # with optional prefix
| (?<![a-f0-9:])
)
(?:
# IPv4 mapped in IPv6
# note the colon after the 12th byte in each here
(?:
# first 6 (12 bytes) non-zero
(?:0{1,4}:){5} ffff:
|
# leading zeros omitted (note {0,5} not {1,5})
::(?:0{1,4}:){0,4} ffff:
|
# trailing zeros (in the first 6) omitted
(?:0{1,4}:){1,4}: ffff:
|
# 0000 in second up to (including) fifth omitted
0{1,4}::(?:0{1,4}:){1,3} ffff:
|
# 0000 in third up to (including) fifth omitted
(?:0{1,4}:){2}:0{1,2}: ffff:
|
# 0000 in fourth up to (including) fifth omitted
(?:0{1,4}:){3}:0: ffff:
|
# 0000 in fifth omitted
(?:0{1,4}:){4}: ffff:
)
# and the IPv4 address appended to all of the 12 bytes above
(?:
10|
127|
169\.254|
172\.(?:1[6-9]|2[0-9]|3[01])|
192\.168|
100\.(?:6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])
)\..*
| # or IPv6 link-local address space, fe80::/10
fe[89ab][0-9a-f]:.*
| # or the host-local ::1 addr, as a pure IPv6 address
# all 8 (16 bytes) of them present
(?:0{1,4}:){7} 0{0,3}1
|
# leading zeros omitted
:(?::0{1,4}){0,6}: 0{0,3}1
|
# 0000 in second up to (including) seventh omitted
0{1,4}:(?::0{1,4}){0,5}: 0{0,3}1
|
# 0000 in third up to (including) seventh omitted
(?:0{1,4}:){2}(?::0{1,4}){0,4}: 0{0,3}1
|
# 0000 in fourth up to (including) seventh omitted
(?:0{1,4}:){3}(?::0{1,4}){0,3}: 0{0,3}1
|
# 0000 in fifth up to (including) seventh omitted
(?:0{1,4}:){4}(?::0{1,4}){0,2}: 0{0,3}1
|
# 0000 in sixth up to (including) seventh omitted
(?:0{1,4}:){5}(?::0{1,4}){0,1}: 0{0,3}1
|
# 0000 in seventh omitted
(?:0{1,4}:){6}: 0{0,3}1
)
(?![a-f0-9:])
)
)}oxi;
# backward compatibility
use constant IP_IN_RESERVED_RANGE => IP_PRIVATE;
# ---------------------------------------------------------------------------
# match the various ways of saying "localhost".
use constant LOCALHOST => qr/
(?:
# as a string
localhost(?:\.localdomain)?
|
\b(?<!:) # ensure no "::" IPv6 marker before this one
# plain IPv4
127\.0\.0\.1 \b
|
# IPv6 addresses
# don't use \b here, it hits on :'s
(?:IPv6: # with optional prefix
| (?<![a-f0-9:])
)
(?:
# IPv4 mapped in IPv6
# note the colon after the 12th byte in each here
(?:
# first 6 (12 bytes) non-zero
(?:0{1,4}:){5} ffff:
|
# leading zeros omitted (note {0,5} not {1,5})
::(?:0{1,4}:){0,4} ffff:
|
# trailing zeros (in the first 6) omitted
(?:0{1,4}:){1,4}: ffff:
|
# 0000 in second up to (including) fifth omitted
0{1,4}::(?:0{1,4}:){1,3} ffff:
|
# 0000 in third up to (including) fifth omitted
(?:0{1,4}:){2}:0{1,2}: ffff:
|
# 0000 in fourth up to (including) fifth omitted
(?:0{1,4}:){3}:0: ffff:
|
# 0000 in fifth omitted
(?:0{1,4}:){4}: ffff:
)
# and the IPv4 address appended to all of the 12 bytes above
127\.0\.0\.1 # no \b, we check later
| # or (separately) a pure IPv6 address
# all 8 (16 bytes) of them present
(?:0{1,4}:){7} 0{0,3}1
|
# leading zeros omitted
:(?::0{1,4}){0,6}: 0{0,3}1
|
# 0000 in second up to (including) seventh omitted
0{1,4}:(?::0{1,4}){0,5}: 0{0,3}1
|
# 0000 in third up to (including) seventh omitted
(?:0{1,4}:){2}(?::0{1,4}){0,4}: 0{0,3}1
|
# 0000 in fourth up to (including) seventh omitted
(?:0{1,4}:){3}(?::0{1,4}){0,3}: 0{0,3}1
|
# 0000 in fifth up to (including) seventh omitted
(?:0{1,4}:){4}(?::0{1,4}){0,2}: 0{0,3}1
|
# 0000 in sixth up to (including) seventh omitted
(?:0{1,4}:){5}(?::0{1,4}){0,1}: 0{0,3}1
|
# 0000 in seventh omitted
(?:0{1,4}:){6}: 0{0,3}1
)
(?![a-f0-9:])
)
/oxi;
# ---------------------------------------------------------------------------
# an IP address, in IPv4 format only.
#
use constant IPV4_ADDRESS => qr/\b
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)
\b/ox;
# ---------------------------------------------------------------------------
# an IP address, in IPv4, IPv4-mapped-in-IPv6, or IPv6 format. NOTE: cannot
# just refer to $IPV4_ADDRESS, due to perl bug reported in nesting qr//s. :(
#
use constant IP_ADDRESS => qr/
(?:
\b(?<!:) # ensure no "::" IPv4 marker before this one
# plain IPv4, as above
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\b
|
# IPv6 addresses
# don't use \b here, it hits on :'s
(?:IPv6: # with optional prefix
| (?<![a-f0-9:])
)
(?:
# IPv4 mapped in IPv6
# note the colon after the 12th byte in each here
(?:
# first 6 (12 bytes) non-zero
(?:[a-f0-9]{1,4}:){6}
|
# leading zeros omitted (note {0,5} not {1,5})
::(?:[a-f0-9]{1,4}:){0,5}
|
# trailing zeros (in the first 6) omitted
(?:[a-f0-9]{1,4}:){1,5}:
|
# 0000 in second up to (including) fifth omitted
[a-f0-9]{1,4}::(?:[a-f0-9]{1,4}:){1,4}
|
# 0000 in third up to (including) fifth omitted
(?:[a-f0-9]{1,4}:){2}:(?:[a-f0-9]{1,4}:){1,3}
|
# 0000 in fourth up to (including) fifth omitted
(?:[a-f0-9]{1,4}:){3}:(?:[a-f0-9]{1,4}:){1,2}
|
# 0000 in fifth omitted
(?:[a-f0-9]{1,4}:){4}:[a-f0-9]{1,4}:
)
# and the IPv4 address appended to all of the 12 bytes above
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d) # no \b, we check later
| # or (separately) a pure IPv6 address
# all 8 (16 bytes) of them present
(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}
|
# leading zeros omitted
:(?::[a-f0-9]{1,4}){1,7}
|
# trailing zeros omitted
(?:[a-f0-9]{1,4}:){1,7}:
|
# 0000 in second up to (including) seventh omitted
[a-f0-9]{1,4}:(?::[a-f0-9]{1,4}){1,6}
|
# 0000 in third up to (including) seventh omitted
(?:[a-f0-9]{1,4}:){2}(?::[a-f0-9]{1,4}){1,5}
|
# 0000 in fourth up to (including) seventh omitted
(?:[a-f0-9]{1,4}:){3}(?::[a-f0-9]{1,4}){1,4}
|
# 0000 in fifth up to (including) seventh omitted
(?:[a-f0-9]{1,4}:){4}(?::[a-f0-9]{1,4}){1,3}
|
# 0000 in sixth up to (including) seventh omitted
(?:[a-f0-9]{1,4}:){5}(?::[a-f0-9]{1,4}){1,2}
|
# 0000 in seventh omitted
(?:[a-f0-9]{1,4}:){6}:[a-f0-9]{1,4}
|
# :: (the unspecified address 0:0:0:0:0:0:0:0)
# dos: I don't expect to see this address in a header, and
# it may cause non-address strings to match, but we'll
# include it for now since it is valid
::
)
(?![a-f0-9:])
)
/oxi;
# ---------------------------------------------------------------------------
use constant HARVEST_DNSBL_PRIORITY => 500;
# regular expression that matches message separators in The University of
# Washington's MBX mailbox format
use constant MBX_SEPARATOR => qr/^([\s\d]\d-[a-zA-Z]{3}-\d{4}\s\d{2}:\d{2}:\d{2}.*),(\d+);([\da-f]{12})-(\w{8})\r?$/;
# $1 = datestamp (str)
# $2 = size of message in bytes (int)
# $3 = message status - binary (hex)
# $4 = message ID (hex)
# ---------------------------------------------------------------------------
# values used for internal message representations
# maximum byte length of lines in the body
use constant MAX_BODY_LINE_LENGTH => 2048;
# maximum byte length of a header key
use constant MAX_HEADER_KEY_LENGTH => 256;
# maximum byte length of a header value including continued lines
use constant MAX_HEADER_VALUE_LENGTH => 8192;
# maximum byte length of entire header
use constant MAX_HEADER_LENGTH => 65536;
# maximum byte length of any given URI
use constant MAX_URI_LENGTH => 8192;
# used for meta rules and "if" conditionals in Conf::Parser
use constant ARITH_EXPRESSION_LEXER => qr/(?:
[\-\+\d\.]+| # A Number
\w[\w\:]*| # Rule or Class Name
[\(\)]| # Parens
\|\|| # Boolean OR
\&\&| # Boolean AND
\^| # Boolean XOR
!(?!=)| # Boolean NOT
>=?| # GT or EQ
<=?| # LT or EQ
==| # EQ
!=| # NEQ
[\+\-\*\/]| # Mathematical Operator
[\?:] # ? : Operator
)/ox;
# ArchiveIterator
# if AI doesn't read in the message in the first pass to see if the received
# date makes the message useful or not, we need to mark it so that in the
# second pass (when the message is actually read + processed) the received
# date is calculated. this value signifies "unknown" from the first pass.
use constant AI_TIME_UNKNOWN => 0;
# Charsets which use capital letters heavily in their encoded representation.
use constant CHARSETS_LIKELY_TO_FP_AS_CAPS => qr{[-_a-z0-9]*(?:
koi|jp|jis|euc|gb|big5|isoir|cp1251|windows-1251|georgianps|pt154|tis
)[-_a-z0-9]*}ix;
# Allowed rulename format
use constant RULENAME_RE => qr([_a-zA-Z][_a-zA-Z0-9]{0,127});
# Exact match
use constant IS_RULENAME => qr/^${\(RULENAME_RE)}$/;
# meta function rules_matching(), takes argument RULENAME_RE with glob *? characters
use constant META_RULES_MATCHING_RE => qr/(?<!_)\brules_matching\(\s*([_a-zA-Z*?][_a-zA-Z0-9*?]{0,127})\s*\)/;
1;

View File

@ -0,0 +1,179 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::DBBasedAddrList;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Fcntl;
use Mail::SpamAssassin::PersistentAddrList;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;
our @ISA = qw(Mail::SpamAssassin::PersistentAddrList);
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = $class->SUPER::new(@_);
$self->{class} = $class;
bless ($self, $class);
$self;
}
###########################################################################
sub new_checker {
my ($factory, $main) = @_;
my $class = $factory->{class};
my $self = {
'main' => $main,
'accum' => { },
'is_locked' => 0,
'locked_file' => ''
};
my @order = split(/\s+/, $main->{conf}->{auto_whitelist_db_modules});
untaint_var(\@order);
my $dbm_module = Mail::SpamAssassin::Util::first_available_module (@order);
if (!$dbm_module) {
die "auto-whitelist: cannot find a usable DB package from auto_whitelist_db_modules: " .
$main->{conf}->{auto_whitelist_db_modules}."\n";
}
my $umask = umask ~ (oct($main->{conf}->{auto_whitelist_file_mode}));
# if undef then don't worry -- empty hash!
if (defined($main->{conf}->{auto_whitelist_path})) {
my $path = $main->sed_path($main->{conf}->{auto_whitelist_path});
my ($mod1, $mod2);
if ($main->{locker}->safe_lock
($path, 30, $main->{conf}->{auto_whitelist_file_mode}))
{
$self->{locked_file} = $path;
$self->{is_locked} = 1;
($mod1, $mod2) = ('R/W', O_RDWR | O_CREAT);
}
else {
$self->{is_locked} = 0;
($mod1, $mod2) = ('R/O', O_RDONLY);
}
dbg("auto-whitelist: tie-ing to DB file of type $dbm_module $mod1 in $path");
($self->{is_locked} && $dbm_module eq 'DB_File') and
Mail::SpamAssassin::Util::avoid_db_file_locking_bug($path);
if (! tie %{ $self->{accum} }, $dbm_module, $path, $mod2,
oct($main->{conf}->{auto_whitelist_file_mode}) & 0666)
{
my $err = $!; # might get overwritten later
if ($self->{is_locked}) {
$self->{main}->{locker}->safe_unlock($self->{locked_file});
$self->{is_locked} = 0;
}
die "auto-whitelist: cannot open auto_whitelist_path $path: $err\n";
}
}
umask $umask;
bless ($self, $class);
return $self;
}
###########################################################################
sub finish {
my $self = shift;
dbg("auto-whitelist: DB addr list: untie-ing and unlocking");
untie %{$self->{accum}};
if ($self->{is_locked}) {
dbg("auto-whitelist: DB addr list: file locked, breaking lock");
$self->{main}->{locker}->safe_unlock ($self->{locked_file});
$self->{is_locked} = 0;
}
# TODO: untrap signals to unlock the db file here
}
###########################################################################
sub get_addr_entry {
my ($self, $addr, $signedby) = @_;
my $entry = {
addr => $addr,
};
$entry->{msgcount} = $self->{accum}->{$addr} || 0;
$entry->{totscore} = $self->{accum}->{$addr.'|totscore'} || 0;
dbg("auto-whitelist: db-based $addr scores ".$entry->{msgcount}.'/'.$entry->{totscore});
return $entry;
}
###########################################################################
sub add_score {
my($self, $entry, $score) = @_;
$entry->{msgcount} ||= 0;
$entry->{addr} ||= '';
$entry->{msgcount}++;
$entry->{totscore} += $score;
dbg("auto-whitelist: add_score: new count: ".$entry->{msgcount}.", new totscore: ".$entry->{totscore});
$self->{accum}->{$entry->{addr}} = $entry->{msgcount};
$self->{accum}->{$entry->{addr}.'|totscore'} = $entry->{totscore};
return $entry;
}
###########################################################################
sub remove_entry {
my ($self, $entry) = @_;
my $addr = $entry->{addr};
delete $self->{accum}->{$addr};
delete $self->{accum}->{$addr.'|totscore'};
if ($addr =~ /^(.*)\|ip=none$/) {
# it doesn't have an IP attached.
# try to delete any per-IP entries for this addr as well.
# could be slow...
my $mailaddr = $1;
while (my ($key, $value) = each %{$self->{accum}}) {
# regex will catch both key and key|totscore entries and delete them
if ($key =~ /^\Q${mailaddr}\E\|/) {
delete $self->{accum}->{$key};
}
}
}
}
###########################################################################
1;

View File

@ -0,0 +1,730 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
use strict; # make Test::Perl::Critic happy
package Mail::SpamAssassin::Dns; 1;
package Mail::SpamAssassin::PerMsgStatus;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::AsyncLoop;
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
use File::Spec;
use IO::Socket;
use POSIX ":sys_wait_h";
our $KNOWN_BAD_DIALUP_RANGES; # Nothing uses this var???
our $LAST_DNS_CHECK;
# use very well-connected domains (fast DNS response, many DNS servers,
# geographical distribution is a plus, TTL of at least 3600s)
our @EXISTING_DOMAINS = qw{
adelphia.net
akamai.com
apache.org
cingular.com
colorado.edu
comcast.net
doubleclick.com
ebay.com
gmx.net
google.com
intel.com
kernel.org
linux.org
mit.edu
motorola.com
msn.com
sourceforge.net
sun.com
w3.org
yahoo.com
};
our $IS_DNS_AVAILABLE = undef;
#Removed $VERSION per BUG 6422
#$VERSION = 'bogus'; # avoid CPAN.pm picking up razor ver
###########################################################################
BEGIN {
# some trickery. Load these modules right here, if possible; that way, if
# the module exists, we'll get it loaded now. Very useful to avoid attempted
# loads later (which will happen). If we do a fork(), we could wind up
# attempting to load these modules in *every* subprocess.
#
# # We turn off strict and warnings, because Net::DNS and Razor both contain
# # crud that -w complains about (perl 5.6.0). Not that this seems to work,
# # mind ;)
# no strict;
# local ($^W) = 0;
no warnings;
eval {
require Net::DNS;
require Net::DNS::Resolver;
};
eval {
require MIME::Base64;
};
eval {
require IO::Socket::UNIX;
};
};
###########################################################################
sub do_rbl_lookup {
my ($self, $rule, $set, $type, $host, $subtest) = @_;
$host =~ s/\.\z//s; # strip a redundant trailing dot
my $key = "dns:$type:$host";
my $existing_ent = $self->{async}->get_lookup($key);
# only make a specific query once
if (!$existing_ent) {
my $ent = {
key => $key,
zone => $host, # serves to fetch other per-zone settings
type => "DNSBL-".$type,
sets => [ ], # filled in below
rules => [ ], # filled in below
# id is filled in after we send the query below
};
$existing_ent = $self->{async}->bgsend_and_start_lookup(
$host, $type, undef, $ent,
sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
master_deadline => $self->{master_deadline} );
}
if ($existing_ent) {
# always add set
push @{$existing_ent->{sets}}, $set;
# sometimes match or always match
if (defined $subtest) {
$self->{dnspost}->{$set}->{$subtest} = $rule;
} else {
push @{$existing_ent->{rules}}, $rule;
}
$self->{rule_to_rblkey}->{$rule} = $key;
}
}
# TODO: these are constant so they should only be added once at startup
sub register_rbl_subtest {
my ($self, $rule, $set, $subtest) = @_;
if ($subtest =~ /^sb:/) {
warn("dns: ignored $rule, SenderBase rules are deprecated\n");
return 0;
}
$self->{dnspost}->{$set}->{$subtest} = $rule;
}
sub do_dns_lookup {
my ($self, $rule, $type, $host) = @_;
$host =~ s/\.\z//s; # strip a redundant trailing dot
my $key = "dns:$type:$host";
my $ent = {
key => $key,
zone => $host, # serves to fetch other per-zone settings
type => "DNSBL-".$type,
rules => [ $rule ],
# id is filled in after we send the query below
};
$ent = $self->{async}->bgsend_and_start_lookup(
$host, $type, undef, $ent,
sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
master_deadline => $self->{master_deadline} );
$ent;
}
###########################################################################
sub dnsbl_hit {
my ($self, $rule, $question, $answer) = @_;
my $log = "";
if (substr($rule, 0, 2) eq "__") {
# don't bother with meta rules
} elsif ($answer->type eq 'TXT') {
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
$log = join('',$answer->txtdata);
local $1;
$log =~ s{ (?<! [<(\[] ) (https? : // \S+)}{<$1>}xgi;
} else { # assuming $answer->type eq 'A'
local($1,$2,$3,$4,$5);
if ($question->string =~ m/^((?:[0-9a-fA-F]\.){32})(\S+\w)/) {
$log = ' listed in ' . lc($2);
my $ipv6addr = join('', reverse split(/\./, lc $1));
$ipv6addr =~ s/\G(....)/$1:/g; chop $ipv6addr;
$ipv6addr =~ s/:0{1,3}/:/g;
$log = $ipv6addr . $log;
} elsif ($question->string =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\S+\w)/) {
$log = "$4.$3.$2.$1 listed in " . lc($5);
} else {
$log = 'listed in ' . $question->string;
}
}
# TODO: this may result in some log messages appearing under the
# wrong rules, since we could see this sequence: { test one hits,
# test one's message is logged, test two hits, test one fires again
# on another IP, test one's message is logged for that other IP --
# but under test two's heading }. Right now though it's better
# than just not logging at all.
$self->{already_logged} ||= { };
if ($log && !$self->{already_logged}->{$log}) {
$self->test_log($log);
$self->{already_logged}->{$log} = 1;
}
if (!$self->{tests_already_hit}->{$rule}) {
$self->got_hit($rule, "RBL: ", ruletype => "dnsbl");
}
}
sub dnsbl_uri {
my ($self, $question, $answer) = @_;
my $qname = $question->qname;
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
#
# rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
my $rdatastr = $answer->UNIVERSAL::can('txtdata') ? join('',$answer->txtdata)
: $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
: $answer->rdatastr;
if (defined $qname && defined $rdatastr) {
my $qclass = $question->qclass;
my $qtype = $question->qtype;
my @vals;
push(@vals, "class=$qclass") if $qclass ne "IN";
push(@vals, "type=$qtype") if $qtype ne "A";
my $uri = "dns:$qname" . (@vals ? "?" . join(";", @vals) : "");
push @{ $self->{dnsuri}->{$uri} }, $rdatastr;
dbg("dns: hit <$uri> $rdatastr");
}
}
# called as a completion routine to bgsend by DnsResolver::poll_responses;
# returns 1 on successful packet processing
sub process_dnsbl_result {
my ($self, $ent, $pkt) = @_;
return if !$pkt;
my $question = ($pkt->question)[0];
return if !$question;
my $sets = $ent->{sets} || [];
my $rules = $ent->{rules};
# NO_DNS_FOR_FROM
if ($self->{sender_host} &&
# fishy, qname should have been "RFC 1035 zone format" -decoded first
lc($question->qname) eq lc($self->{sender_host}) &&
$question->qtype =~ /^(?:A|MX)$/ &&
$pkt->header->rcode =~ /^(?:NXDOMAIN|SERVFAIL)$/ &&
++$self->{sender_host_fail} == 2)
{
for my $rule (@{$rules}) {
$self->got_hit($rule, "DNS: ", ruletype => "dns");
}
}
# DNSBL tests are here
foreach my $answer ($pkt->answer) {
next if !$answer;
# track all responses
$self->dnsbl_uri($question, $answer);
my $answ_type = $answer->type;
# TODO: there are some CNAME returns that might be useful
next if ($answ_type ne 'A' && $answ_type ne 'TXT');
if ($answ_type eq 'A') {
# Net::DNS::RR::A::address() is available since Net::DNS 0.69
my $ip_address = $answer->UNIVERSAL::can('address') ? $answer->address
: $answer->rdatastr;
# skip any A record that isn't on 127.0.0.0/8
next if $ip_address !~ /^127\./;
}
for my $rule (@{$rules}) {
$self->dnsbl_hit($rule, $question, $answer);
}
for my $set (@{$sets}) {
if ($self->{dnspost}->{$set}) {
$self->process_dnsbl_set($set, $question, $answer);
}
}
}
return 1;
}
sub process_dnsbl_set {
my ($self, $set, $question, $answer) = @_;
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
#
# rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
my $rdatastr = $answer->UNIVERSAL::can('txtdata') ? join('',$answer->txtdata)
: $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
: $answer->rdatastr;
while (my ($subtest, $rule) = each %{ $self->{dnspost}->{$set} }) {
next if $self->{tests_already_hit}->{$rule};
if ($subtest =~ /^\d+\.\d+\.\d+\.\d+$/) {
# test for exact equality, not a regexp (an IPv4 address)
$self->dnsbl_hit($rule, $question, $answer) if $subtest eq $rdatastr;
}
# bitmask
elsif ($subtest =~ /^\d+$/) {
# Bug 6803: response should be within 127.0.0.0/8, ignore otherwise
if ($rdatastr =~ m/^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/ &&
Mail::SpamAssassin::Util::my_inet_aton($rdatastr) & $subtest)
{
$self->dnsbl_hit($rule, $question, $answer);
}
}
# regular expression
else {
my $test = qr/$subtest/;
if ($rdatastr =~ /$test/) {
$self->dnsbl_hit($rule, $question, $answer);
}
}
}
}
sub harvest_until_rule_completes {
my ($self, $rule) = @_;
dbg("dns: harvest_until_rule_completes");
my $result = 0;
for (my $first=1; ; $first=0) {
# complete_lookups() may call completed_callback(), which may
# call start_lookup() again (like in Plugin::URIDNSBL)
my ($alldone,$anydone) =
$self->{async}->complete_lookups($first ? 0 : 1.0, 1);
$result = 1 if $self->is_rule_complete($rule);
last if $result || $alldone;
dbg("dns: harvest_until_rule_completes - check_tick");
$self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
}
return $result;
}
sub harvest_dnsbl_queries {
my ($self) = @_;
dbg("dns: harvest_dnsbl_queries");
for (my $first=1; ; $first=0) {
# complete_lookups() may call completed_callback(), which may
# call start_lookup() again (like in Plugin::URIDNSBL)
# the first time around we specify a 0 timeout, which gives
# complete_lookups a chance to ripe any available results and
# abort overdue requests, without needlessly waiting for more
my ($alldone,$anydone) =
$self->{async}->complete_lookups($first ? 0 : 1.0, 1);
last if $alldone;
dbg("dns: harvest_dnsbl_queries - check_tick");
$self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
}
# explicitly abort anything left
$self->{async}->abort_remaining_lookups();
$self->{async}->log_lookups_timing();
$self->mark_all_async_rules_complete();
1;
}
# collect and process whatever DNS responses have already arrived,
# don't waste time waiting for more, don't poll too often.
# don't abort any queries even if overdue,
sub harvest_completed_queries {
my ($self) = @_;
# don't bother collecting responses too often
my $last_poll_time = $self->{async}->last_poll_responses_time();
return if defined $last_poll_time && time - $last_poll_time < 0.1;
my ($alldone,$anydone) = $self->{async}->complete_lookups(0, 0);
if ($anydone) {
dbg("dns: harvested completed queries");
# $self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
}
}
sub set_rbl_tag_data {
my ($self) = @_;
# DNS URIs
my $rbl_tag = $self->{tag_data}->{RBL}; # just in case, should be empty
$rbl_tag = '' if !defined $rbl_tag;
while (my ($dnsuri, $answers) = each %{ $self->{dnsuri} }) {
# when parsing, look for elements of \".*?\" or \S+ with ", " as separator
$rbl_tag .= "<$dnsuri>" . " [" . join(", ", @{ $answers }) . "]\n";
}
if (defined $rbl_tag && $rbl_tag ne '') {
chomp $rbl_tag;
$self->set_tag('RBL', $rbl_tag);
}
}
###########################################################################
sub rbl_finish {
my ($self) = @_;
$self->set_rbl_tag_data();
delete $self->{dnspost};
delete $self->{dnsuri};
}
###########################################################################
sub load_resolver {
my ($self) = @_;
$self->{resolver} = $self->{main}->{resolver};
return $self->{resolver}->load_resolver();
}
sub clear_resolver {
my ($self) = @_;
dbg("dns: clear_resolver");
$self->{main}->{resolver}->{res} = undef;
return 0;
}
sub lookup_ns {
my ($self, $dom) = @_;
return unless $self->load_resolver();
return if ($self->server_failed_to_respond_for_domain ($dom));
my $nsrecords;
dbg("dns: looking up NS for '$dom'");
eval {
my $query = $self->{resolver}->send($dom, 'NS');
my @nses;
if ($query) {
foreach my $rr ($query->answer) {
if ($rr->type eq "NS") { push (@nses, $rr->nsdname); }
}
}
$nsrecords = [ @nses ];
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("dns: NS lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
return;
};
$nsrecords;
}
sub is_dns_available {
my ($self) = @_;
my $dnsopt = $self->{conf}->{dns_available};
my $dnsint = $self->{conf}->{dns_test_interval} || 600;
my @domains;
$LAST_DNS_CHECK ||= 0;
my $diff = time() - $LAST_DNS_CHECK;
# undef $IS_DNS_AVAILABLE if we should be testing for
# working DNS and our check interval time has passed
if ($dnsopt eq "test" && $diff > $dnsint) {
$IS_DNS_AVAILABLE = undef;
dbg("dns: is_dns_available() last checked %.1f seconds ago; re-checking",
$diff);
}
return $IS_DNS_AVAILABLE if (defined $IS_DNS_AVAILABLE);
$LAST_DNS_CHECK = time();
$IS_DNS_AVAILABLE = 0;
if ($dnsopt eq "no") {
dbg("dns: dns_available set to no in config file, skipping test");
return $IS_DNS_AVAILABLE;
}
# Even if "dns_available" is explicitly set to "yes", we want to ignore
# DNS if we're only supposed to be looking at local tests.
goto done if ($self->{main}->{local_tests_only});
# Check version numbers - runtime check only
if (defined $Net::DNS::VERSION) {
if (am_running_on_windows()) {
if ($Net::DNS::VERSION < 0.46) {
warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.46 for Win32");
return $IS_DNS_AVAILABLE;
}
}
else {
if ($Net::DNS::VERSION < 0.34) {
warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.34");
return $IS_DNS_AVAILABLE;
}
}
}
$self->clear_resolver();
goto done unless $self->load_resolver();
if ($dnsopt eq "yes") {
# optionally shuffle the list of nameservers to distribute the load
if ($self->{conf}->{dns_options}->{rotate}) {
my @nameservers = $self->{resolver}->available_nameservers();
Mail::SpamAssassin::Util::fisher_yates_shuffle(\@nameservers);
dbg("dns: shuffled NS list: " . join(", ", @nameservers));
$self->{resolver}->available_nameservers(@nameservers);
}
$IS_DNS_AVAILABLE = 1;
dbg("dns: dns_available set to yes in config file, skipping test");
return $IS_DNS_AVAILABLE;
}
if ($dnsopt =~ /^test:\s*(\S.*)$/) {
@domains = split (/\s+/, $1);
dbg("dns: looking up NS records for user specified domains: %s",
join(", ", @domains));
} else {
@domains = @EXISTING_DOMAINS;
dbg("dns: looking up NS records for built-in domains");
}
# do the test with a full set of configured nameservers
my @nameservers = $self->{resolver}->configured_nameservers();
# optionally shuffle the list of nameservers to distribute the load
if ($self->{conf}->{dns_options}->{rotate}) {
Mail::SpamAssassin::Util::fisher_yates_shuffle(\@nameservers);
dbg("dns: shuffled NS list, testing: " . join(", ", @nameservers));
} else {
dbg("dns: testing resolver nameservers: " . join(", ", @nameservers));
}
# Try the different nameservers here and collect a list of working servers
my @good_nameservers;
foreach my $ns (@nameservers) {
$self->{resolver}->available_nameservers($ns); # try just this one
for (my $retry = 3; $retry > 0 && @domains; $retry--) {
my $domain = splice(@domains, rand(@domains), 1);
dbg("dns: trying ($retry) $domain, server $ns ...");
my $result = $self->lookup_ns($domain);
$self->{resolver}->finish_socket();
if (!$result) {
dbg("dns: NS lookup of $domain using $ns failed horribly, ".
"may not be a valid nameserver");
last;
} elsif (!@$result) {
dbg("dns: NS lookup of $domain using $ns failed, no results found");
} else {
dbg("dns: NS lookup of $domain using $ns succeeded => DNS available".
" (set dns_available to override)");
push(@good_nameservers, $ns);
last;
}
}
}
if (!@good_nameservers) {
dbg("dns: all NS queries failed => DNS unavailable ".
"(set dns_available to override)");
} else {
$IS_DNS_AVAILABLE = 1;
dbg("dns: NS list: ".join(", ", @good_nameservers));
$self->{resolver}->available_nameservers(@good_nameservers);
}
done:
# jm: leaving this in!
dbg("dns: is DNS available? " . $IS_DNS_AVAILABLE);
return $IS_DNS_AVAILABLE;
}
###########################################################################
sub server_failed_to_respond_for_domain {
my ($self, $dom) = @_;
if ($self->{dns_server_too_slow}->{$dom}) {
dbg("dns: server for '$dom' failed to reply previously, not asking again");
return 1;
}
return 0;
}
sub set_server_failed_to_respond_for_domain {
my ($self, $dom) = @_;
dbg("dns: server for '$dom' failed to reply, marking as bad");
$self->{dns_server_too_slow}->{$dom} = 1;
}
###########################################################################
sub enter_helper_run_mode {
my ($self) = @_;
dbg("dns: entering helper-app run mode");
$self->{old_slash} = $/; # Razor pollutes this
%{$self->{old_env}} = ();
if ( %ENV ) {
# undefined values in %ENV can result due to autovivification elsewhere,
# this prevents later possible warnings when we restore %ENV
while (my ($key, $value) = each %ENV) {
$self->{old_env}->{$key} = $value if defined $value;
}
}
Mail::SpamAssassin::Util::clean_path_in_taint_mode();
my $newhome;
if ($self->{main}->{home_dir_for_helpers}) {
$newhome = $self->{main}->{home_dir_for_helpers};
} else {
# use spamd -u user's home dir
$newhome = (Mail::SpamAssassin::Util::portable_getpwuid ($>))[7];
}
if ($newhome) {
$ENV{'HOME'} = Mail::SpamAssassin::Util::untaint_file_path ($newhome);
}
# enforce SIGCHLD as DEFAULT; IGNORE causes spurious kernel warnings
# on Red Hat NPTL kernels (bug 1536), and some users of the
# Mail::SpamAssassin modules set SIGCHLD to be a fatal signal
# for some reason! (bug 3507)
$self->{old_sigchld_handler} = $SIG{CHLD};
$SIG{CHLD} = 'DEFAULT';
}
sub leave_helper_run_mode {
my ($self) = @_;
dbg("dns: leaving helper-app run mode");
$/ = $self->{old_slash};
%ENV = %{$self->{old_env}};
if (defined $self->{old_sigchld_handler}) {
$SIG{CHLD} = $self->{old_sigchld_handler};
} else {
# if SIGCHLD has never been explicitly set, it's returned as undef.
# however, when *setting* SIGCHLD, using undef(%) or assigning to an
# undef value produces annoying 'Use of uninitialized value in scalar
# assignment' warnings. That's silly. workaround:
$SIG{CHLD} = 'DEFAULT';
}
}
# note: this must be called before leave_helper_run_mode() is called,
# as the SIGCHLD signal must be set to DEFAULT for it to work.
sub cleanup_kids {
my ($self, $pid) = @_;
if ($SIG{CHLD} && $SIG{CHLD} ne 'IGNORE') { # running from spamd
waitpid ($pid, 0);
}
}
###########################################################################
sub register_async_rule_start {
my ($self, $rule) = @_;
dbg("dns: $rule lookup start");
$self->{rule_to_rblkey}->{$rule} = '*ASYNC_START';
}
sub register_async_rule_finish {
my ($self, $rule) = @_;
dbg("dns: $rule lookup finished");
delete $self->{rule_to_rblkey}->{$rule};
}
sub mark_all_async_rules_complete {
my ($self) = @_;
$self->{rule_to_rblkey} = { };
}
sub is_rule_complete {
my ($self, $rule) = @_;
my $key = $self->{rule_to_rblkey}->{$rule};
if (!defined $key) {
# dbg("dns: $rule lookup complete, not in list");
return 1;
}
if ($key eq '*ASYNC_START') {
dbg("dns: $rule lookup not yet complete");
return 0; # not yet complete
}
my $ent = $self->{async}->get_lookup($key);
if (!defined $ent) {
dbg("dns: $rule lookup complete, $key no longer pending");
return 1;
}
dbg("dns: $rule lookup not yet complete");
return 0; # not yet complete
}
###########################################################################
# interface called by SPF plugin
sub check_for_from_dns {
my ($self, $pms) = @_;
if (defined $pms->{sender_host_fail}) {
return ($pms->{sender_host_fail} == 2); # both MX and A need to fail
}
}
1;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,110 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Locales;
use strict;
use warnings;
# use bytes;
use re 'taint';
###########################################################################
# A mapping of known country codes to frequent charsets used therein.
# note that the ISO and CP charsets will already have been permitted,
# so only "unusual" charsets should be listed here.
#
# Country codes should be lowercase, charsets uppercase.
#
# A good listing is in /usr/share/config/charsets from KDE 2.2.1
#
our %charsets_for_locale = (
# Japanese: Peter Evans writes: iso-2022-jp = rfc approved, rfc 1468, created
# by Jun Murai in 1993 back when he didn't have white hair! rfc approved.
# (rfc 2237) <-- by M$.
'ja' => 'EUCJP JISX020119760 JISX020819830 JISX020819900 JISX020819970 '.
'JISX021219900 JISX021320001 JISX021320002 SHIFT_JIS SHIFTJIS '.
'ISO2022JP SJIS JIS7 JISX0201 JISX0208 JISX0212',
# Korea
'ko' => 'EUCKR KSC56011987',
# Cyrillic: Andrew Vasilyev notes CP866 is common (bug 2278)
'ru' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
'ka' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
'tg' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
'be' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
'uk' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
'bg' => 'KOI8R KOI8U KOI8T ISOIR111 CP1251 GEORGIANPS CP1251 PT154 CP866',
# Thai
'th' => 'TIS620',
# Chinese (simplified and traditional). Peter Evans writes: new government
# mandated chinese encoding = gb18030, chinese mail is supposed to be
# iso-2022-cn (rfc 1922?)
'zh' => 'GB1988 GB2312 GB231219800 GB18030 GBK BIG5HKSCS BIG5 EUCTW ISO2022CN',
# Chinese Traditional charsets only
'zh.big5' => 'BIG5HKSCS BIG5 EUCTW',
# Chinese Simplified charsets only
'zh.gb2312' => 'GB1988 GB2312 GB231219800 GB18030 GBK ISO2022CN',
);
###########################################################################
sub is_charset_ok_for_locales {
my ($cs, @locales) = @_;
$cs = uc $cs; $cs =~ s/[^A-Z0-9]//g;
$cs =~ s/^3D//gs; # broken by quoted-printable
$cs =~ s/:.*$//gs; # trim off multiple charsets, just use 1st
study $cs; # study is a no-op since perl 5.16.0, eliminating related bugs
#warn "JMD $cs";
# always OK (the net speaks mostly roman charsets)
return 1 if ($cs eq 'USASCII');
return 1 if ($cs =~ /^ISO8859/);
return 1 if ($cs =~ /^ISO10646/);
return 1 if ($cs =~ /^UTF/);
return 1 if ($cs =~ /^UCS/);
return 1 if ($cs =~ /^CP125/);
return 1 if ($cs =~ /^WINDOWS/); # argh, Windows
return 1 if ($cs eq 'IBM852');
return 1 if ($cs =~ /^UNICODE11UTF[78]/); # wtf? never heard of it
return 1 if ($cs eq 'XUNKNOWN'); # added by sendmail when converting to 8bit
return 1 if ($cs eq 'ISO'); # Magellan, sending as 'charset=iso 8859-15'. grr
foreach my $locale (@locales) {
if (!defined($locale) || $locale eq 'C') { $locale = 'en'; }
$locale =~ s/^([a-z][a-z]).*$/$1/; # zh_TW... => zh
my $ok_for_loc = $charsets_for_locale{$locale};
next if (!defined $ok_for_loc);
if ($ok_for_loc =~ /(?:^| )\Q${cs}\E(?:$| )/) {
return 1;
}
}
return 0;
}
1;

View File

@ -0,0 +1,74 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Locker;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Fcntl;
use Time::HiRes ();
use Mail::SpamAssassin;
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = { };
bless ($self, $class);
$self;
}
###########################################################################
sub safe_lock {
my ($self, $path, $max_retries, $mode) = @_;
# max_retries is optional, should default to about 30
# mode is UNIX-style and optional, should default to 0700,
# callers must specify --x bits
die "locker: safe_lock not implemented by Locker subclass";
}
###########################################################################
sub safe_unlock {
my ($self, $path) = @_;
die "locker: safe_unlock not implemented by Locker subclass";
}
###########################################################################
sub refresh_lock {
my ($self, $path) = @_;
die "locker: refresh_lock not implemented by Locker subclass";
}
###########################################################################
sub jittery_one_second_sleep {
my ($self) = @_;
Time::HiRes::sleep(rand(1.0) + 0.5);
}
###########################################################################
1;

View File

@ -0,0 +1,173 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Locker::Flock;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin;
use Mail::SpamAssassin::Locker;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Logger;
use File::Spec;
use IO::File;
use Fcntl qw(:DEFAULT :flock);
our @ISA = qw(Mail::SpamAssassin::Locker);
###########################################################################
sub new {
my $class = shift;
my $self = $class->SUPER::new(@_);
$self;
}
###########################################################################
# Attempt to create a file lock, using NFS-UNsafe locking techniques.
sub safe_lock {
my ($self, $path, $max_retries, $mode) = @_;
my $is_locked = 0;
my @stat;
$max_retries ||= 30;
$mode ||= "0700";
$mode = (oct $mode) & 0666;
dbg ("locker: mode is $mode");
my $lock_file = "$path.mutex";
my $umask = umask(~$mode);
my $fh = new IO::File();
if (!$fh->open ($lock_file, O_RDWR|O_CREAT)) {
umask $umask; # just in case
die "locker: safe_lock: cannot create lockfile $lock_file: $!\n";
}
umask $umask; # we've created the file, so reset umask
dbg("locker: safe_lock: created $lock_file");
my $unalarmed = 0;
my $oldalarm = 0;
# use a SIGALRM-based timer -- more efficient than second-by-second
# sleeps
my $eval_stat;
eval {
local $SIG{ALRM} = sub { die "alarm\n" };
dbg("locker: safe_lock: trying to get lock on $path with $max_retries timeout");
# max_retries is basically seconds! so use it for the timeout
$oldalarm = alarm $max_retries;
# HELLO!?! IO::File doesn't have a flock() method?!
if (!flock($fh, LOCK_EX)) {
warn "locker: safe_lock: cannot obtain a lock on log file: $!";
} else {
alarm $oldalarm;
$unalarmed = 1; # avoid calling alarm(0) twice
dbg("locker: safe_lock: link to $lock_file: link ok");
$is_locked = 1;
# just to be nice: let people know when it was locked
$fh->print("$$\n") or die "error writing to lock file: $!";
$fh->flush or die "cannot flush lock file: $!";
# keep the FD around - we need to keep the lockfile open or the lock
# is unlocked!
$self->{lock_fhs} ||= { };
$self->{lock_fhs}->{$path} = $fh;
}
1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
};
$unalarmed or alarm $oldalarm; # if we die'd above, need to reset here
if (defined $eval_stat) {
if ($eval_stat =~ /alarm/) {
dbg("locker: safe_lock: timed out after $max_retries seconds");
} else {
die "locker: safe_lock: $eval_stat\n";
}
}
return $is_locked;
}
###########################################################################
sub safe_unlock {
my ($self, $path) = @_;
if (!exists $self->{lock_fhs} || !defined $self->{lock_fhs}->{$path}) {
dbg("locker: safe_unlock: no lock handle for $path - already unlocked?");
return;
}
my $fh = $self->{lock_fhs}->{$path};
delete $self->{lock_fhs}->{$path};
flock($fh, LOCK_UN) or die "cannot unlock a log file: $!";
$fh->close or die "error closing a lock file: $!";
dbg("locker: safe_unlock: unlocked $path.mutex");
# do NOT unlink! this would open a race, whereby:
#
# procA: ....unlock (unlocked lockfile)
# procB: lock (gets lock on lockfile)
# procA: unlink (deletes lockfile)
# (procB's lock is now deleted as well!)
# procC: create, lock (gets lock on new file)
#
# both procB and procC would then think they had locks, and both
# would write to the database file. this is bad.
#
# unlink ("$path.mutex");
#
# side-effect: we leave a .mutex file around. but hey!
}
###########################################################################
sub refresh_lock {
my($self, $path) = @_;
return unless $path;
if (!exists $self->{lock_fhs} || !defined $self->{lock_fhs}->{$path}) {
warn "locker: refresh_lock: no lock handle for $path\n";
return;
}
my $fh = $self->{lock_fhs}->{$path};
$fh->print("$$\n") or die "error writing to lock file: $!";
$fh->flush or die "cannot flush lock file: $!";
dbg("locker: refresh_lock: refresh $path.mutex");
}
###########################################################################
1;

View File

@ -0,0 +1,245 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Locker::UnixNFSSafe;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin;
use Mail::SpamAssassin::Locker;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Logger;
use File::Spec;
use Time::Local;
use Fcntl qw(:DEFAULT :flock);
our @ISA = qw(Mail::SpamAssassin::Locker);
###########################################################################
sub new {
my $class = shift;
my $self = $class->SUPER::new(@_);
$self;
}
###########################################################################
# NFS-safe locking (I hope!):
# Attempt to create a file lock, using NFS-safe locking techniques.
#
# Locking code adapted from code by Alexis Rosen <alexis@panix.com>
# by Kelsey Cummings <kgc@sonic.net>, with mods by jm and quinlan
#
# A good implementation of Alexis' code, for reference, is here:
# http://mail-index.netbsd.org/netbsd-bugs/1996/04/17/0002.html
use constant LOCK_MAX_AGE => 600; # seconds
sub safe_lock {
my ($self, $path, $max_retries, $mode) = @_;
my $is_locked = 0;
my @stat;
$max_retries ||= 30;
$mode ||= "0700";
$mode = (oct $mode) & 0666;
dbg ("locker: mode is $mode");
my $lock_file = "$path.lock";
my $hname = Mail::SpamAssassin::Util::fq_hostname();
my $lock_tmp = Mail::SpamAssassin::Util::untaint_file_path
($path.".lock.".$hname.".".$$);
# keep this for unlocking
$self->{lock_tmp} = $lock_tmp;
my $umask = umask(~$mode);
if (!open(LTMP, ">$lock_tmp")) {
umask $umask; # just in case
die "locker: safe_lock: cannot create tmp lockfile $lock_tmp for $lock_file: $!\n";
}
umask $umask;
autoflush LTMP 1;
dbg("locker: safe_lock: created $lock_tmp");
for (my $retries = 0; $retries < $max_retries; $retries++) {
if ($retries > 0) { $self->jittery_one_second_sleep(); }
print LTMP "$hname.$$\n" or warn "Error writing to $lock_tmp: $!";
dbg("locker: safe_lock: trying to get lock on $path with $retries retries");
if (link($lock_tmp, $lock_file)) {
dbg("locker: safe_lock: link to $lock_file: link ok");
$is_locked = 1;
last;
}
# link _may_ return false even if the link _is_ created
@stat = lstat($lock_tmp);
@stat or warn "locker: error accessing $lock_tmp: $!";
if (defined $stat[3] && $stat[3] > 1) {
dbg("locker: safe_lock: link to $lock_file: stat ok");
$is_locked = 1;
last;
}
# check age of lockfile ctime
my $now = ($#stat < 11 ? undef : $stat[10]);
@stat = lstat($lock_file);
@stat or warn "locker: error accessing $lock_file: $!";
my $lock_age = ($#stat < 11 ? undef : $stat[10]);
if (defined($lock_age) && defined($now) && ($now - $lock_age) > LOCK_MAX_AGE)
{
# we got a stale lock, break it
dbg("locker: safe_lock: breaking stale $lock_file: age=" .
(defined $lock_age ? $lock_age : "undef") . " now=$now");
unlink($lock_file)
or warn "locker: safe_lock: unlink of lock file $lock_file failed: $!\n";
}
}
close LTMP or die "error closing $lock_tmp: $!";
unlink($lock_tmp)
or warn "locker: safe_lock: unlink of temp lock $lock_tmp failed: $!\n";
# record this for safe unlocking
if ($is_locked) {
@stat = lstat($lock_file);
@stat or warn "locker: error accessing $lock_file: $!";
my $lock_ctime = ($#stat < 11 ? undef : $stat[10]);
$self->{lock_ctimes} ||= { };
$self->{lock_ctimes}->{$path} = $lock_ctime;
}
return $is_locked;
}
###########################################################################
sub safe_unlock {
my ($self, $path) = @_;
my $lock_file = "$path.lock";
my $lock_tmp = $self->{lock_tmp};
if (!$lock_tmp) {
dbg("locker: safe_unlock: $path.lock never locked");
return;
}
# 1. Build a temp file and stat that to get an idea of what the server
# thinks the current time is (our_tmp.st_ctime). note: do not use time()
# directly because the server's clock may be out of sync with the client's.
my @stat_ourtmp;
if (!defined sysopen(LTMP, $lock_tmp, O_CREAT|O_WRONLY|O_EXCL, 0700)) {
warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp: $!";
return;
} else {
autoflush LTMP 1;
print LTMP "\n" or warn "Error writing to $lock_tmp: $!";
if (!(@stat_ourtmp = stat(LTMP)) || (scalar(@stat_ourtmp) < 11)) {
@stat_ourtmp or warn "locker: error accessing $lock_tmp: $!";
warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp";
close LTMP or die "error closing $lock_tmp: $!";
unlink($lock_tmp)
or warn "locker: safe_lock: unlink of lock file failed: $!\n";
return;
}
}
my $ourtmp_ctime = $stat_ourtmp[10]; # paranoia
if (!defined $ourtmp_ctime) {
die "locker: safe_unlock: stat failed on $lock_tmp";
}
close LTMP or die "error closing $lock_tmp: $!";
unlink($lock_tmp)
or warn "locker: safe_lock: unlink of lock file failed: $!\n";
# 2. If the ctime hasn't been modified, unlink the file and return. If the
# lock has expired, sleep the usual random interval before returning. If we
# didn't sleep, there could be a race if the caller immediately tries to
# relock the file.
my $lock_ctime = $self->{lock_ctimes}->{$path};
if (!defined $lock_ctime) {
warn "locker: safe_unlock: no ctime recorded for $lock_file";
return;
}
my @stat_lock = lstat($lock_file);
@stat_lock or warn "locker: error accessing $lock_file: $!";
my $now_ctime = $stat_lock[10];
if (defined $now_ctime && $now_ctime == $lock_ctime)
{
# things are good: the ctimes match so it was our lock
unlink($lock_file)
or warn "locker: safe_unlock: unlink failed: $lock_file\n";
dbg("locker: safe_unlock: unlink $lock_file");
if ($ourtmp_ctime >= $lock_ctime + LOCK_MAX_AGE) {
# the lock has expired, so sleep a bit; use some randomness
# to avoid race conditions.
dbg("locker: safe_unlock: lock expired on $lock_file expired safely; sleeping");
my $i; for ($i = 0; $i < 5; $i++) {
$self->jittery_one_second_sleep();
}
}
return;
}
# 4. Either ctime has been modified, or the entire lock file is missing.
# If the lock should still be ours, based on the ctime of the temp
# file, warn it was stolen. If not, then our lock is expired and
# someone else has grabbed the file, so warn it was lost.
if ($ourtmp_ctime < $lock_ctime + LOCK_MAX_AGE) {
warn "locker: safe_unlock: lock on $lock_file was stolen";
} else {
warn "locker: safe_unlock: lock on $lock_file was lost due to expiry";
}
}
###########################################################################
sub refresh_lock {
my($self, $path) = @_;
return unless $path;
# this could arguably read the lock and make sure the same process
# owns it, but this shouldn't, in theory, be an issue.
# TODO: in NFS, it definitely may be one :(
my $lock_file = "$path.lock";
utime time, time, $lock_file;
# update the lock_ctimes entry
my @stat = lstat($lock_file);
@stat or warn "locker: error accessing $lock_file: $!";
my $lock_ctime = ($#stat < 11 ? undef : $stat[10]);
$self->{lock_ctimes}->{$path} = $lock_ctime;
dbg("locker: refresh_lock: refresh $path.lock");
}
###########################################################################
1;

View File

@ -0,0 +1,116 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Locker::Win32;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Fcntl;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Locker;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Logger;
use File::Spec;
use Time::Local;
our @ISA = qw(Mail::SpamAssassin::Locker);
###########################################################################
sub new {
my $class = shift;
my $self = $class->SUPER::new(@_);
$self;
}
###########################################################################
use constant LOCK_MAX_AGE => 600; # seconds
sub safe_lock {
my ($self, $path, $max_retries, $mode) = @_;
my @stat;
$max_retries ||= 30;
# $mode is ignored on win32
my $lock_file = "$path.lock";
if (-e $lock_file && -M $lock_file > (LOCK_MAX_AGE / 86400)) {
dbg("locker: safe_lock: breaking stale lock: $lock_file");
unlink($lock_file)
or warn "locker: safe_lock: unlink of lock file $lock_file failed: $!\n";
}
for (my $retries = 0; $retries < $max_retries; $retries++) {
if ($retries > 0) {
sleep(1);
# TODO: $self->jittery_one_second_sleep();?
}
dbg("locker: safe_lock: trying to get lock on $path with $retries retries");
if (!defined sysopen(LOCKFILE, $lock_file, O_RDWR|O_CREAT|O_EXCL)) {
dbg("locker: safe_lock: failed to create lock tmpfile $lock_file: $!");
} else {
dbg("locker: safe_lock: link to $lock_file: sysopen ok");
close(LOCKFILE) or warn "error closing a lock file: $!";
return 1;
}
my @stat = stat($lock_file);
@stat or warn "locker: error accessing $lock_file: $!";
# check age of lockfile ctime
my $age = ($#stat < 11 ? undef : $stat[10]);
if ((!defined($age) && $retries > $max_retries / 2) ||
(defined($age) && (time - $age > LOCK_MAX_AGE)))
{
dbg("locker: safe_lock: breaking stale lock: $lock_file");
unlink($lock_file)
or warn "locker: safe_lock: unlink of lock file $lock_file failed: $!\n";
}
}
return 0;
}
###########################################################################
sub safe_unlock {
my ($self, $path) = @_;
unlink("$path.lock")
or warn "locker: safe_unlock: unlink failed: $path.lock\n";
dbg("locker: safe_unlock: unlink $path.lock");
}
###########################################################################
sub refresh_lock {
my($self, $path) = @_;
return unless $path;
# this could arguably read the lock and make sure the same process
# owns it, but this shouldn't, in theory, be an issue.
utime time, time, "$path.lock";
dbg("locker: refresh_lock: refresh $path.lock");
}
###########################################################################
1;

View File

@ -0,0 +1,412 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Logger - SpamAssassin logging module
=head1 SYNOPSIS
use Mail::SpamAssassin::Logger;
$SIG{__WARN__} = sub {
log_message("warn", $_[0]);
};
$SIG{__DIE__} = sub {
log_message("error", $_[0]) if !$^S;
};
=cut
package Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Exporter ();
use Time::HiRes ();
our @ISA = qw(Exporter);
our @EXPORT = qw(dbg info would_log);
our @EXPORT_OK = qw(log_message);
use constant ERROR => 0;
use constant WARNING => 1;
use constant INFO => 2;
use constant DBG => 3;
my %log_level = (
0 => 'ERROR',
1 => 'WARNING',
2 => 'INFO',
3 => 'DBG',
);
# global shared object
our %LOG_SA;
our $LOG_ENTERED; # to avoid recursion on die or warn from within logging
# duplicate message line suppressor
our $LOG_DUPMIN = 10; # only start suppressing after x duplicate lines
our $LOG_DUPLINE = ''; # remembers last log line
our $LOG_DUPLEVEL = ''; # remembers last log level
our $LOG_DUPTIME; # remembers last log line timestamp
our $LOG_DUPCNT = 0; # counts duplicates
# defaults
$LOG_SA{level} = WARNING; # log info, warnings and errors
$LOG_SA{facility} = {}; # no dbg facilities turned on
# always log to stderr initially
use Mail::SpamAssassin::Logger::Stderr;
$LOG_SA{method}->{stderr} = Mail::SpamAssassin::Logger::Stderr->new();
=head1 METHODS
=over 4
=item add_facilities(facilities)
Enable debug logging for specific facilities. Each facility is the area
of code to debug. Facilities can be specified as a hash reference (the
key names are used), an array reference, an array, or a comma-separated
scalar string. Facility names are case-sensitive.
If "all" is listed, then all debug facilities are implicitly enabled,
except for those explicitly disabled. A facility name may be preceded
by a "no" (case-insensitive), which explicitly disables it, overriding
the "all". For example: all,norules,noconfig,nodcc. When facility names
are given as an ordered list (array or scalar, not a hash), the last entry
applies, e.g. 'nodcc,dcc,dcc,noddc' is equivalent to 'nodcc'. Note that
currently no facility name starts with a "no", it is advised to keep this
practice with newly added facility names to make life easier.
Higher priority informational messages that are suitable for logging in
normal circumstances are available with an area of "info". Some very
verbose messages require the facility to be specifically enabled (see
C<would_log> below).
=cut
sub add_facilities {
my ($facilities) = @_;
my @facilities;
if (ref ($facilities) eq '') {
if (defined $facilities && $facilities ne '0') {
@facilities = split(/,/, $facilities);
}
}
elsif (ref ($facilities) eq 'ARRAY') {
@facilities = @{ $facilities };
}
elsif (ref ($facilities) eq 'HASH') {
@facilities = keys %{ $facilities };
}
@facilities = grep(/^\S+$/, @facilities);
if (@facilities) {
for my $fac (@facilities) {
local ($1,$2);
$LOG_SA{facility}->{$2} = !defined($1) if $fac =~ /^(no)?(.+)\z/si;
}
# turn on debugging if facilities other than "info" are enabled
if (grep { !/^info\z/ && !/^no./si } keys %{ $LOG_SA{facility} }) {
$LOG_SA{level} = DBG if $LOG_SA{level} < DBG;
}
else {
$LOG_SA{level} = INFO if $LOG_SA{level} < INFO;
}
# debug statement last so we might see it
dbg("logger: adding facilities: " . join(", ", @facilities));
dbg("logger: logging level is " . $log_level{$LOG_SA{level}});
}
}
=item log_message($level, @message)
Log a message at a specific level. Levels are specified as strings:
"warn", "error", "info", and "dbg". The first element of the message
must be prefixed with a facility name followed directly by a colon.
=cut
sub log_message {
my ($level, @message) = @_;
# too many die and warn messages out there, don't log the ones that we don't
# own. jm: off: this makes no sense -- if a dependency module dies or warns,
# we want to know about it, unless we're *SURE* it's not something worth
# worrying about.
# if ($level eq "error" or $level eq "warn") {
# return unless $message[0] =~ /^\S+:/;
# }
if ($level eq "error") {
# don't log alarm timeouts or broken pipes of various plugins' network checks
return if (index($message[0], '__ignore__') != -1);
# dos: we can safely ignore any die's that we eval'd in our own modules so
# don't log them -- this is caller 0, the use'ing package is 1, the eval is 2
my @caller = caller 2;
return if (defined $caller[3] && defined $caller[0] &&
$caller[3] =~ /^\(eval\)$/ &&
$caller[0] =~ m#^Mail::SpamAssassin(?:$|::)#);
}
return if $LOG_ENTERED; # avoid recursion on die or warn from within logging
$LOG_ENTERED = 1; # no 'returns' from this point on, must clear the flag
my $message = join(" ", @message);
$message =~ s/[\r\n]+$//; # remove any trailing newlines
my $now = Time::HiRes::time;
# suppress duplicate loglines
if ($message eq $LOG_DUPLINE) {
$LOG_DUPCNT++;
$LOG_DUPTIME = $now;
# only start suppressing after x identical lines
if ($LOG_DUPCNT >= $LOG_DUPMIN) {
$LOG_ENTERED = 0;
return;
}
} else {
if ($LOG_DUPCNT >= $LOG_DUPMIN) {
$LOG_DUPCNT -= $LOG_DUPMIN - 1;
if ($LOG_DUPCNT > 1) {
_log_message($LOG_DUPLEVEL,
"$LOG_DUPLINE [... logline repeated $LOG_DUPCNT times]",
$LOG_DUPTIME);
} else {
_log_message($LOG_DUPLEVEL, $LOG_DUPLINE, $LOG_DUPTIME);
}
}
$LOG_DUPCNT = 0;
$LOG_DUPLINE = $message;
$LOG_DUPLEVEL = $level;
}
_log_message($level, $message, $now);
$LOG_ENTERED = 0;
}
# Private helper
sub _log_message {
# split on newlines and call log_message multiple times; saves
# the subclasses having to understand multi-line logs
my $first = 1;
foreach my $line (split(/\n/, $_[1])) {
# replace control characters with "_", tabs and spaces get
# replaced with a single space.
$line =~ tr/\x09\x20\x00-\x1f/ _/s;
if ($first) {
$first = 0;
} else {
local $1;
$line =~ s/^([^:]+?):/$1: [...]/;
}
while (my ($name, $object) = each %{ $LOG_SA{method} }) {
$object->log_message($_[0], $line, $_[2]);
}
}
}
=item dbg("facility: message")
This is used for all low priority debugging messages.
=cut
sub dbg {
_log(DBG, @_) if $LOG_SA{level} >= DBG;
1; # always return the same simple value, regardless of log level
}
=item info("facility: message")
This is used for informational messages indicating a normal, but
significant, condition. This should be infrequently called. These
messages are typically logged when SpamAssassin is run as a daemon.
=cut
sub info {
_log(INFO, @_) if $LOG_SA{level} >= INFO;
1; # always return the same simple value, regardless of log level
}
# remember to avoid deep recursion, my friend
sub _log {
my $facility;
local ($1);
# it's faster to access this as the $_[1] alias, and not to perform
# string mods until we're sure we actually want to log anything
if ($_[1] =~ /^([a-z0-9_-]*):/i) {
$facility = $1;
} else {
$facility = "generic";
}
# log all info, warn, and error messages;
# only debug if asked to
if ($_[0] == DBG) {
return unless
exists $LOG_SA{facility}->{$facility} ? $LOG_SA{facility}->{$facility}
: $LOG_SA{facility}->{all};
}
my ($level, $message, @args) = @_;
$message =~ s/^(?:[a-z0-9_-]*):\s*//i;
$message = sprintf($message,@args) if @args;
$message =~ s/\n+$//s;
$message =~ s/^/${facility}: /mg;
# no reason to go through warn()
log_message(($level == INFO ? "info" : "dbg"), $message);
}
=item add(method => 'syslog', socket => $socket, facility => $facility)
C<socket> is the type the syslog ("unix" or "inet"). C<facility> is the
syslog facility (typically "mail").
=item add(method => 'file', filename => $file)
C<filename> is the name of the log file.
=item add(method => 'stderr')
No options are needed for stderr logging, just don't close stderr first.
=cut
sub add {
my %params = @_;
my $name = lc($params{method});
my $class = ucfirst($name);
return 0 if $class !~ /^\w+$/; # be paranoid
eval 'use Mail::SpamAssassin::Logger::'.$class.'; 1'
or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
die "logger: add $class failed: $eval_stat\n";
};
if (!exists $LOG_SA{method}->{$name}) {
my $object;
my $eval_stat;
eval '$object = Mail::SpamAssassin::Logger::'.$class.'->new(%params); 1'
or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
undef $object; # just in case
};
if (!$object) {
if (!defined $eval_stat) {
$eval_stat = "Mail::SpamAssassin::Logger::$class->new ".
"failed to return an object";
}
warn "logger: failed to add $name method: $eval_stat\n";
}
else {
$LOG_SA{method}->{$name} = $object;
dbg("logger: successfully added $name method\n");
return 1;
}
return 0;
}
warn "logger: $name method already added\n";
return 1;
}
=item remove(method)
Remove a logging method. Only the method name needs to be passed as a
scalar.
=cut
sub remove {
my ($method) = @_;
my $name = lc($method);
if (exists $LOG_SA{method}->{$name}) {
delete $LOG_SA{method}->{$name};
info("logger: removing $name method");
return 1;
}
warn "logger: unable to remove $name method, not present to be removed\n";
return 1;
}
=item would_log($level, $facility)
Returns false if a message at the given level and with the given facility
would not be logged. Returns 1 if a message at a given level and facility
would be logged normally. Returns 2 if the facility was specifically
enabled.
The facility argument is optional.
=cut
sub would_log {
my ($level, $facility) = @_;
if ($level eq 'dbg') {
return 0 if $LOG_SA{level} < DBG;
return 1 if !$facility;
return ($LOG_SA{facility}->{$facility} ? 2 : 0)
if exists $LOG_SA{facility}->{$facility};
return 1 if $LOG_SA{facility}->{all};
return 0;
} elsif ($level eq 'info') {
return $LOG_SA{level} >= INFO;
}
warn "logger: would_log called with unknown level: $level\n";
return 0;
}
=item close_log()
Close all logs.
=cut
sub close_log {
while (my ($name, $object) = each %{ $LOG_SA{method} }) {
$object->close_log();
}
}
END {
close_log();
}
1;
=back
=cut

View File

@ -0,0 +1,117 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Logger::File - log to file
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Logger::File
=head1 DESCRIPTION
=cut
package Mail::SpamAssassin::Logger::File;
use strict;
use warnings;
# use bytes;
use re 'taint';
use POSIX ();
use Time::HiRes ();
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(am_running_on_windows);
our @ISA = ();
# ADDING OS-DEPENDENT LINE TERMINATOR - BUG 6456
my $eol = "\n";
if (am_running_on_windows()) {
$eol = "\r\n";
}
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = { };
bless ($self, $class);
# parameters
my %params = @_;
$self->{filename} = $params{filename} || 'spamassassin.log';
$self->{timestamp_fmt} = $params{timestamp_fmt};
if (! $self->init()) {
die "logger: file initialization failed$eol";
}
return($self);
}
# logging via file is requested
sub init {
my ($self) = @_;
if (open(STDLOG, ">> $self->{filename}")) {
dbg("logger: successfully opened file $self->{filename}");
# ensure it's unbuffered
my $oldfh = select STDLOG;
$| = 1;
select $oldfh;
return 1;
}
else {
warn "logger: failed to open file $self->{filename}: $!$eol";
return 0;
}
}
sub log_message {
my ($self, $level, $msg, $ts) = @_;
my $timestamp;
my $fmt = $self->{timestamp_fmt};
my $now = defined $ts ? $ts : Time::HiRes::time;
if (!defined $fmt) {
$timestamp = scalar localtime($now); # default, backward compatibility
} elsif ($fmt eq '') {
$timestamp = '';
} else {
$timestamp = POSIX::strftime($fmt, localtime($now));
}
$timestamp .= ' ' if $timestamp ne '';
my($nwrite) = syswrite(STDLOG, sprintf("%s[%s] %s: %s%s",
$timestamp, $$, $level, $msg, $eol));
defined $nwrite or warn "error writing to log file: $!";
}
sub close_log {
my ($self) = @_;
if (defined $self->{filename}) {
close(STDLOG) or die "error closing log file: $!";
}
}
1;

View File

@ -0,0 +1,95 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Logger::Stderr - log to standard error
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Logger::Stderr
=head1 DESCRIPTION
=cut
package Mail::SpamAssassin::Logger::Stderr;
use strict;
use warnings;
# use bytes;
use re 'taint';
use POSIX ();
use Time::HiRes ();
our @ISA = ();
# ADDING OS-DEPENDENT LINE TERMINATOR - BUG 6456
# Using Mail::SpamAssassin::Util::am_running_on_windows() leads to circular
# dependencies. So, we are duplicating the code instead.
use constant RUNNING_ON_WINDOWS => ($^O =~ /^(?:mswin|dos|os2)/oi);
my $eol = "\n";
if (RUNNING_ON_WINDOWS) {
$eol = "\r\n";
}
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = { };
bless ($self, $class);
my %params = @_;
$self->{timestamp_fmt} = $params{timestamp_fmt};
return($self);
}
sub log_message {
my ($self, $level, $msg, $ts) = @_;
my $timestamp;
my $fmt = $self->{timestamp_fmt};
my $now = defined $ts ? $ts : Time::HiRes::time;
if (!defined $fmt) {
# default since 3.3.0
my $datetime = POSIX::strftime("%b %d %H:%M", localtime($now));
utf8::encode($datetime) if utf8::is_utf8($datetime); # Bug 7305
$timestamp = sprintf("%s:%06.3f", $datetime, $now-int($now/60)*60);
# Bug 6329: %e is not in a POSIX standard, use %d instead and edit
local $1; $timestamp =~ s/^(\S+\s+)0/$1 /;
} elsif ($fmt eq '') {
$timestamp = '';
} else {
$timestamp = POSIX::strftime($fmt, localtime($now));
}
$timestamp .= ' ' if $timestamp ne '';
my($nwrite) = syswrite(STDERR, sprintf("%s[%d] %s: %s%s",
$timestamp, $$, $level, $msg, $eol));
defined $nwrite or warn "error writing to log file: $!";
}
sub close_log {
my ($self) = @_;
}
1;

View File

@ -0,0 +1,265 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Logger::Syslog - log to syslog
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Logger::Syslog
=head1 DESCRIPTION
=cut
package Mail::SpamAssassin::Logger::Syslog;
use strict;
use warnings;
# use bytes;
use re 'taint';
use POSIX qw(:sys_wait_h setsid sigprocmask);
use Time::HiRes ();
use Sys::Syslog qw(:DEFAULT setlogsock);
use Mail::SpamAssassin::Logger;
our @ISA = ();
# %prio_map maps Logger.pm log level names (warn, error, info, dbg)
# into standard Sys::Syslog::syslog() log level names
#
our %prio_map = (dbg => 'debug', debug => 'debug', info => 'info',
notice => 'notice', warn => 'warning', warning => 'warning',
error => 'err', err => 'err', crit => 'crit', alert => 'alert',
emerg => 'emerg');
# make sure never to hit the CPAN-RT#56826 bug (memory corruption
# when closelog() is called twice), fixed in Sys-Syslog 0.28
our $syslog_open = 0;
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = { };
bless ($self, $class);
# initialization
$self->{already_done_failure_warning} = 0;
$self->{disabled} = 0;
$self->{consecutive_failures} = 0;
$self->{failure_threshold} = 10;
$self->{SIGPIPE_RECEIVED} = 0;
# parameters
my %params = @_;
$self->{ident} = $params{ident} || 'spamassassin';
$self->{log_socket} = $params{socket};
$self->{log_facility} = $params{facility};
$self->{timestamp_fmt} = $params{timestamp_fmt};
if (! $self->init()) {
die "logger: syslog initialization failed\n";
}
return($self);
}
# logging via syslog is requested
sub init {
my ($self) = @_;
my $log_socket = $self->{log_socket};
$log_socket = '' if !defined $log_socket;
my $eval_stat;
eval {
if ($log_socket eq '') {
# calling setlogsock is optional, let Sys::Syslog choose a default
} else {
dbg("logger: calling setlogsock($log_socket)");
setlogsock($log_socket) or die "setlogsock($log_socket) failed: $!";
}
dbg("logger: opening syslog with $log_socket socket");
# the next call is required to actually open the socket
openlog($self->{ident}, 'cons,pid,ndelay', $self->{log_facility});
$syslog_open = 1;
1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("logger: connection to syslog/$log_socket failed: $eval_stat");
};
# Solaris sometimes doesn't support UNIX-domain syslog sockets apparently;
# the same is true for perl 5.6.0 build on an early version of Red Hat 7!
# In these cases we try it with INET instead.
# See also Bug 6267 and Bug 6331.
if (defined($eval_stat) && $log_socket ne 'inet') {
dbg("logger: trying setlogsock('inet')");
undef $eval_stat;
eval {
setlogsock('inet') or die "setlogsock('inet') failed: $!";
dbg("logger: opening syslog using inet socket");
openlog($self->{ident}, 'cons,pid,ndelay', $self->{log_facility});
$syslog_open = 1;
1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("logger: connection to syslog/inet failed: $eval_stat");
};
}
# we failed!
if (defined $eval_stat) {
return 0;
}
else {
dbg("logger: successfully connected to syslog/$log_socket");
return 1;
}
}
sub log_message {
my ($self, $level, $msg, $ts) = @_;
return if $self->{disabled};
# map level names
$level = $prio_map{$level};
if (!defined $level) { # just in case
$level = 'err';
$msg = '(bad prio: ' . $_[1] . ') ' . $msg;
}
# install a new handler for SIGPIPE -- this signal has been
# found to occur with syslog-ng after syslog-ng restarts.
local $SIG{'PIPE'} = sub {
$self->{SIGPIPE_RECEIVED}++;
# force a log-close. trap possible die() calls
eval { closelog() } if $syslog_open;
$syslog_open = 0;
};
my $timestamp = '';
my $fmt = $self->{timestamp_fmt};
if (defined $fmt && $fmt ne '') { # for completeness, rarely used
my $now = defined $ts ? $ts : Time::HiRes::time;
$timestamp = POSIX::strftime($fmt, localtime($now));
}
$msg = $timestamp . ' ' . $msg if $timestamp ne '';
# no longer needed since a patch to bug 6745:
# # important: do not call syslog() from the SIGCHLD handler
# # child_handler(). otherwise we can get into a loop if syslog()
# # forks a process -- as it does in syslog-ng apparently! (bug 3625)
# $Mail::SpamAssassin::Logger::LOG_SA{INHIBIT_LOGGING_IN_SIGCHLD_HANDLER} = 1;
my $eval_stat;
eval {
syslog($level, "%s", $msg); 1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
};
# no longer needed since a patch to bug 6745:
# $Mail::SpamAssassin::Logger::LOG_SA{INHIBIT_LOGGING_IN_SIGCHLD_HANDLER} = 0;
if (defined $eval_stat) {
if ($self->check_syslog_sigpipe($msg)) {
# dealt with
}
else {
warn "logger: syslog failed: $eval_stat\n";
# only write this warning once, it gets annoying fast
if (!$self->{already_done_failure_warning}) {
warn "logger: try using --syslog-socket={unix,inet} or --syslog=file\n";
$self->{already_done_failure_warning} = 1;
}
}
$self->syslog_incr_failure_counter();
}
else {
$self->{consecutive_failures} = 0;
$self->check_syslog_sigpipe($msg); # check for SIGPIPE anyway (bug 3625)
}
$SIG{PIPE} = 'IGNORE'; # this may have been reset (bug 4026)
}
sub check_syslog_sigpipe {
my ($self, $msg) = @_;
if (!$self->{SIGPIPE_RECEIVED}) {
return 0; # didn't have a SIGPIPE
}
eval {
# SIGPIPE received when writing to syslog -- close and reopen
# the log handle, then try again.
closelog() if $syslog_open;
$syslog_open = 0;
openlog($self->{ident}, 'cons,pid,ndelay', $self->{log_facility});
$syslog_open = 1;
syslog('debug', "%s", "syslog reopened");
syslog('info', "%s", $msg);
# now report what happened
$msg = "SIGPIPE received, reopening log socket";
dbg("log: $msg");
syslog('info', "%s", $msg);
# if we've received multiple sigpipes, logging is probably still broken.
if ($self->{SIGPIPE_RECEIVED}) {
warn "logger: syslog failure: multiple SIGPIPEs received\n";
$self->{disabled} = 1;
}
$self->{SIGPIPE_RECEIVED} = 0;
return 1;
1; # just to not forget a good habit
} or do { # something died? that's not good.
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("log: failure in check_syslog_sigpipe: $eval_stat");
$self->syslog_incr_failure_counter();
}
}
sub syslog_incr_failure_counter {
my ($self) = @_;
$self->{consecutive_failures}++;
if ($self->{consecutive_failures}++ > $self->{failure_threshold}) {
warn("logger: syslog() failed " . $self->{consecutive_failures} .
" times in a row, disabled\n");
$self->{disabled} = 1;
return 1;
}
return 0;
}
sub close_log {
my ($self) = @_;
closelog() if $syslog_open;
$syslog_open = 0;
}
1;

View File

@ -0,0 +1,151 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
# Eval Tests to detect genuine mailing lists.
use strict; # make Test::Perl::Critic happy
package Mail::SpamAssassin::MailingList; 1;
package Mail::SpamAssassin::PerMsgStatus;
use strict;
use warnings;
# use bytes;
use re 'taint';
sub detect_mailing_list {
my ($self) = @_;
return 1 if $self->detect_ml_ezmlm();
return 1 if $self->detect_ml_mailman();
return 1 if $self->detect_ml_sympa();
return 0;
}
# EZMLM
# Mailing-List: .*run by ezmlm
# Precedence: bulk
# List-Post: <mailto:
# List-Help: <mailto:
# List-Unsubscribe: <mailto:[a-zA-Z\.-]+-unsubscribe@
# List-Subscribe: <mailto:[a-zA-Z\.-]+-subscribe@
sub detect_ml_ezmlm {
my ($self) = @_;
return 0 unless $self->get('mailing-list') =~ /ezmlm$/;
return 0 unless $self->get('precedence') eq "bulk\n";
return 0 unless $self->get('list-post') =~ /^<mailto:/i;
return 0 unless $self->get('list-help') =~ /^<mailto:/i;
return 0 unless $self->get('list-unsubscribe') =~ /<mailto:[a-zA-Z\.-]+-unsubscribe\@/i;
return 0 unless $self->get('list-subscribe') =~ /<mailto:[a-zA-Z\.-]+-subscribe\@/i;
return 1; # assume ezmlm then.
}
# MailMan (the gnu mailing list manager)
# Precedence: bulk [or list for v2]
# List-Help: <mailto:
# List-Post: <mailto:
# List-Subscribe: .*<mailto:.*=subscribe>
# List-Id:
# List-Unsubscribe: .*<mailto:.*=unsubscribe>
# List-Archive:
# X-Mailman-Version: \d
#
# However, for mailing list membership reminders, most of
# those headers are gone, so we identify on the following:
#
# Subject: ...... mailing list memberships reminder (v1)
# or X-List-Administrivia: yes (only in version 2)
# X-Mailman-Version: \d
# Precedence: bulk [or list for v2]
# X-No-Archive: yes
# Errors-To:
# X-BeenThere:
sub detect_ml_mailman {
my ($self) = @_;
return 0 unless $self->get('x-mailman-version') =~ /^\d/;
return 0 unless $self->get('precedence') =~ /^(?:bulk|list)$/;
if ($self->get('x-list-administrivia') =~ /yes/ ||
$self->get('subject') =~ /mailing list memberships reminder$/)
{
return 0 unless defined $self->get('errors-to',undef);
return 0 unless defined $self->get('x-beenthere',undef);
return 0 unless $self->get('x-no-archive') =~ /yes/;
return 1;
}
return 0 unless defined $self->get('list-id',undef);
return 0 unless $self->get('list-help') =~ /^<mailto:/i;
return 0 unless $self->get('list-post') =~ /^<mailto:/i;
return 0 unless $self->get('list-subscribe') =~ /<mailto:.*=subscribe>/i;
return 0 unless $self->get('list-unsubscribe') =~ /<mailto:.*=unsubscribe>/i;
return 1; # assume this is a valid mailman list
}
# Sympa
# Return-Path: somelist-owner@somedomain.com [...]
# Precedence: list [...]
# List-Id: <somelist@somedomain.com>
# List-Help: <mailto:sympa@somedomain.com?subject=help>
# List-Subscribe: <mailto:somedomain.com?subject=subscribe%20somelist>
# List-Unsubscribe: <mailto:sympa@somedomain.com?subject=unsubscribe%somelist>
# List-Post: <mailto:somelist@somedomain.com>
# List-Owner: <mailto:somelist-request@somedomain.com>
# X-Mailer: Sympa 6.2.22
# X-Sympa-To: somelist@somedomain.com
# [and optionally] List-Archive: <http://www.somedomain.com/wws/arc/somelist>
sub detect_ml_sympa {
my ($self) = @_;
return 1 if $self->get('X-Mailer') =~ /^Sympa \d\.\d\.\d/;
return 1 if defined ($self->get('X-Sympa-To',undef));
return 0;
}
# Lyris
# Not implemented - need headers
sub detect_ml_lyris {
}
# ListBuilder
# Sep 17 2002 jm: turned off due to bad S/O ratio
# sub detect_ml_listbuilder {
# my ($self, $full) = @_;
#
# my $reply = $self->get('Reply-To:addr');
# if ($reply !~ /\@lb.bcentral.com/) { return 0; }
#
# # Received: from unknown (HELO lbrout14.listbuilder.com) (204.71.191.9)
# my $rcvd = $self->get('received');
# return 0 unless ($rcvd =~ /\blbrout\d+\.listbuilder\.com\b/i);
# return 0 unless ($rcvd =~ /\b204\.71\.191\.\d+\b/);
#
# # _______________________________________________________________________
# # Powered by List Builder
# # To unsubscribe follow the link:
# # http://lb.bcentral.com/ex/sp?c=19511&s=76CA511711046877&m=14
# $full = join ("\n", @{$full});
#
# if ($full !~ /__________________{40,}\s+Powered by List Builder\s/) { return 0; }
# if ($full !~
# m,\shttp://lb\.bcentral\.com/ex/sp\?c=[0-9A-Z]*&s=[0-9A-Z]*&m=[0-9A-Z]*\s,)
# { return 0; }
#
# return 1;
# }
1;

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,112 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Message::Metadata - extract metadata from a message
=head1 DESCRIPTION
This class is tasked with extracting "metadata" from messages for use as
Bayes tokens, fodder for eval tests, or other rules. Metadata is
supplemental data inferred from the message, like the examples below.
It is held in two forms:
1. as name-value pairs of strings, presented in mail header format. For
example, "X-Languages" => "en". This is the general form for simple
metadata that's useful as Bayes tokens, can be added to marked-up
messages using "add_header", etc., such as the trusted-relay inference
and language detection.
2. as more complex data structures on the $msg->{metadata} object. This
is the form used for metadata like the HTML parse data, which is stored
there for access by eval rule code. Because it's not simple strings,
it's not added as a Bayes token by default (Bayes needs simple strings).
=head1 PUBLIC METHODS
=over 4
=cut
package Mail::SpamAssassin::Message::Metadata;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin;
use Mail::SpamAssassin::Constants qw(:sa);
use Mail::SpamAssassin::Util qw(reverse_ip_address);
use Mail::SpamAssassin::Message::Metadata::Received;
use Mail::SpamAssassin::Logger;
=item new()
=back
=cut
sub new {
my ($class, $msg) = @_;
$class = ref($class) || $class;
my $self = {
msg => $msg,
strings => { }
};
bless($self,$class);
$self;
}
sub extract {
my ($self, $msg, $permsgstatus) = @_;
# pre-chew Received headers
$self->parse_received_headers ($permsgstatus, $msg);
foreach my $tuple (
[$self->{relays_trusted}, 'RELAYSTRUSTEDREVIP' ],
[$self->{relays_untrusted}, 'RELAYSUNTRUSTEDREVIP'],
[$self->{relays_internal}, 'RELAYSINTERNALREVIP' ],
[$self->{relays_external}, 'RELAYSEXTERNALREVIP' ])
{ my($rly, $tag) = @$tuple;
my @revips;
@revips = map {
my($ip,$revip);
$ip = $_->{ip} if ref $_ && !$_->{ip_private};
$revip = reverse_ip_address($ip) if defined $ip && $ip ne '';
defined $revip && $revip ne '' ? $revip : ();
} @$rly if $rly;
$permsgstatus->set_tag($tag,
@revips == 1 ? $revips[0] : \@revips) if @revips;
}
$permsgstatus->{main}->call_plugins("extract_metadata",
{ msg => $msg, permsgstatus => $permsgstatus,
conf => $permsgstatus->{main}->{conf} });
}
sub finish {
my ($self) = @_;
%{$self} = ();
}
1;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,337 @@
# Mail::SpamAssassin::NetSet - object to manipulate CIDR net IP addrs
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::NetSet;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Time::HiRes qw(time);
use NetAddr::IP 4.000;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Logger;
our $have_patricia;
BEGIN {
eval {
require Net::Patricia;
Net::Patricia->VERSION(1.16); # need AF_INET6 support
import Net::Patricia;
$have_patricia = 1;
};
}
###########################################################################
sub new {
my ($class,$netset_name) = @_;
$class = ref($class) || $class;
$netset_name = '' if !defined $netset_name; # object name for debugging
my $self = {
name => $netset_name, num_nets => 0,
cache_hits => 0, cache_attempts => 0,
};
$self->{pt} = Net::Patricia->new(&AF_INET6) if $have_patricia;
bless $self, $class;
$self;
}
###########################################################################
sub DESTROY {
my($self) = shift;
if (exists $self->{cache}) {
local($@, $!, $_); # protect outer layers from a potential surprise
my($hits, $attempts) = ($self->{cache_hits}, $self->{cache_attempts});
dbg("netset: cache %s hits/attempts: %d/%d, %.1f %%",
$self->{name}, $hits, $attempts, 100*$hits/$attempts) if $attempts > 0;
}
}
###########################################################################
sub add_cidr {
my ($self, @nets) = @_;
$self->{nets} ||= [ ];
my $numadded = 0;
delete $self->{cache}; # invalidate cache (in case of late additions)
foreach my $cidr_orig (@nets) {
my $cidr = $cidr_orig; # leave original unchanged, useful for logging
# recognizes syntax:
# [IPaddr%scope]/len or IPaddr%scope/len or IPv4addr/mask
# optionally prefixed by a '!' to indicate negation (exclusion);
# the %scope (i.e. interface), /len or /mask are optional
local($1,$2,$3,$4);
$cidr =~ s/^\s+//;
my $exclude = ($cidr =~ s/^!\s*//) ? 1 : 0;
my $masklen; # netmask or a prefix length
$masklen = $1 if $cidr =~ s{ / (.*) \z }{}xs;
# discard optional brackets
$cidr = $1 if $cidr =~ /^ \[ ( [^\]]* ) \] \z/xs;
my $scope;
# IPv6 Scoped Address (RFC 4007, RFC 6874, RFC 3986 "unreserved" charset)
if ($cidr =~ s/ % ( [A-Z0-9._~-]* ) \z //xsi) { # scope <zone_id> ?
$scope = $1; # interface specification
# discard interface specification, currently just ignored
info("netset: ignoring interface scope '%%%s' in IP address %s",
$scope, $cidr_orig);
}
my $is_ip4 = 0;
if ($cidr =~ /^ \d+ (\. | \z) /x) { # looks like an IPv4 address
if ($cidr =~ /^ (\d+) \. (\d+) \. (\d+) \. (\d+) \z/x) {
# also strips leading zeroes, not liked by inet_pton
$cidr = sprintf('%d.%d.%d.%d', $1,$2,$3,$4);
$masklen = 32 if !defined $masklen;
} elsif ($cidr =~ /^ (\d+) \. (\d+) \. (\d+) \.? \z/x) {
$cidr = sprintf('%d.%d.%d.0', $1,$2,$3);
$masklen = 24 if !defined $masklen;
} elsif ($cidr =~ /^ (\d+) \. (\d+) \.? \z/x) {
$cidr = sprintf('%d.%d.0.0', $1,$2);
$masklen = 16 if !defined $masklen;
} elsif ($cidr =~ /^ (\d+) \.? \z/x) {
$cidr = sprintf('%d.0.0.0', $1);
$masklen = 8 if !defined $masklen;
} else {
warn "netset: illegal IPv4 address given: '$cidr_orig'\n";
next;
}
$is_ip4 = 1;
}
if ($self->{pt}) {
if (defined $masklen) {
$masklen =~ /^\d{1,3}\z/
or die "Network mask not supported, use a CIDR syntax: '$cidr_orig'";
}
my $key = $cidr;
my $prefix_len = $masklen;
if ($is_ip4) {
$key = '::ffff:' . $key; # turn it into an IPv4-mapped IPv6 addresses
$prefix_len += 96 if defined $prefix_len;
}
$prefix_len = 128 if !defined $prefix_len;
$key .= '/' . $prefix_len;
# dbg("netset: add_cidr (patricia trie) %s => %s",
# $cidr_orig, $exclude ? '!'.$key : $key);
defined eval {
$self->{pt}->add_string($key, $exclude ? '!'.$key : $key)
} or warn "netset: illegal IP address given (patricia trie): ".
"'$key': $@\n";
}
$cidr .= '/' . $masklen if defined $masklen;
my $ip = NetAddr::IP->new($cidr);
if (!defined $ip) {
warn "netset: illegal IP address given: '$cidr_orig'\n";
next;
}
# dbg("netset: add_cidr %s => %s => %s", $cidr_orig, $cidr, $ip);
# if this is an IPv4 address, create an IPv6 representation, too
my ($ip4, $ip6);
if ($is_ip4) {
$ip4 = $ip;
$ip6 = $self->_convert_ipv4_cidr_to_ipv6($cidr);
} else {
$ip6 = $ip;
}
# bug 5931: this is O(n^2). bad if there are lots of nets. There are good
# reasons to keep it for linting purposes, though, so don't start skipping
# it until we have over 200 nets in our list
if (scalar @{$self->{nets}} < 200) {
next if ($self->is_net_declared($ip4, $ip6, $exclude, 0));
}
# note: it appears a NetAddr::IP object takes up about 279 bytes
push @{$self->{nets}}, {
exclude => $exclude,
ip4 => $ip4,
ip6 => $ip6,
as_string => $cidr_orig,
};
$numadded++;
}
$self->{num_nets} += $numadded;
$numadded;
}
sub get_num_nets {
my ($self) = @_;
return $self->{num_nets};
}
sub _convert_ipv4_cidr_to_ipv6 {
my ($self, $cidr) = @_;
# only do this for IPv4 addresses
return unless $cidr =~ /^\d+[.\/]/;
if ($cidr !~ /\//) { # no mask
return NetAddr::IP->new6("::ffff:".$cidr);
}
# else we have a CIDR mask specified. use new6() to do this
#
my $ip6 = NetAddr::IP->new6($cidr)->cidr;
# 127.0.0.1 -> 0:0:0:0:0:0:7F00:0001/128
# 127/8 -> 0:0:0:0:0:0:7F00:0/104
# now, move that from 0:0:0:0:0:0: space to 0:0:0:0:0:ffff: space
if (!defined $ip6 || $ip6 !~ /^0:0:0:0:0:0:(.*)$/) {
warn "oops! unparseable IPv6 address for $cidr: $ip6";
return;
}
return NetAddr::IP->new6("::ffff:$1");
}
sub _nets_contains_network {
my ($self, $net4, $net6, $exclude, $quiet, $netname, $declared) = @_;
return 0 unless (defined $self->{nets});
foreach my $net (@{$self->{nets}}) {
# check to see if the new network is contained by the old network
my $in4 = defined $net4 && defined $net->{ip4} && $net->{ip4}->contains($net4);
my $in6 = defined $net6 && defined $net->{ip6} && $net->{ip6}->contains($net6);
if ($in4 || $in6) {
warn sprintf("netset: cannot %s %s as it has already been %s\n",
$exclude ? "exclude" : "include",
$netname,
$net->{exclude} ? "excluded" : "included") unless $quiet;
# a network that matches an excluded network isn't contained by "nets"
# return 0 if we're not just looking to see if the network was declared
return 0 if (!$declared && $net->{exclude});
return 1;
}
}
return 0;
}
sub is_net_declared {
my ($self, $net4, $net6, $exclude, $quiet) = @_;
return $self->_nets_contains_network($net4, $net6, $exclude,
$quiet, $net4 || $net6, 1);
}
sub contains_ip {
my ($self, $ip) = @_;
my $result = 0;
if (!$self->{num_nets}) { return 0 }
$self->{cache_attempts}++;
if ($self->{cache} && exists $self->{cache}{$ip}) {
dbg("netset: %s cached lookup on %s, %d networks, result: %s",
$self->{name}, $ip, $self->{num_nets}, $self->{cache}{$ip});
$self->{cache_hits}++;
return $self->{cache}{$ip};
} elsif ($self->{pt}) {
# do a quick lookup on a Patricia Trie
my $t0 = time;
local($1,$2,$3,$4); local $_ = $ip;
$_ = $1 if /^ \[ ( [^\]]* ) \] \z/xs; # discard optional brackets
s/%[A-Z0-9:._-]+\z//si; # discard interface specification
if (m{^ (\d+) \. (\d+) \. (\d+) \. (\d+) \z}x) {
$_ = sprintf('::ffff:%d.%d.%d.%d', $1,$2,$3,$4);
} else {
s/^IPv6://si; # discard optional 'IPv6:' prefix
}
eval { $result = $self->{pt}->match_string($_); 1 } or undef $result;
$result = defined $result && $result !~ /^!/ ? 1 : 0;
dbg("netset: %s patricia lookup on %s, %d networks, result: %s, %.3f ms",
$self->{name}, $ip, $self->{num_nets}, $result, 1000*(time - $t0));
} else {
# do a sequential search on a list of NetAddr::IP objects
my $t0 = time;
my ($ip4, $ip6);
if ($ip =~ /^\d+\./) {
$ip4 = NetAddr::IP->new($ip);
$ip6 = $self->_convert_ipv4_cidr_to_ipv6($ip);
} else {
$ip6 = NetAddr::IP->new($ip);
}
foreach my $net (@{$self->{nets}}) {
if ((defined $ip4 && defined $net->{ip4} && $net->{ip4}->contains($ip4))
|| (defined $ip6 && defined $net->{ip6} && $net->{ip6}->contains($ip6))){
$result = !$net->{exclude};
last;
}
}
dbg("netset: %s lookup on %s, %d networks, result: %s, %.3f ms",
$self->{name}, $ip, $self->{num_nets}, $result, 1000*(time - $t0));
}
$self->{cache}{$ip} = $result;
return $result;
}
sub contains_net {
my ($self, $net) = @_;
my $exclude = $net->{exclude};
my $net4 = $net->{ip4};
my $net6 = $net->{ip6};
return $self->_nets_contains_network($net4, $net6, $exclude, 1, "", 0);
}
sub ditch_cache {
my ($self) = @_;
if (exists $self->{cache}) {
dbg("netset: ditch cache on %s", $self->{name});
delete $self->{cache};
}
}
sub clone {
my ($self) = @_;
my $dup = Mail::SpamAssassin::NetSet->new($self->{name});
if ($self->{nets}) {
@{$dup->{nets}} = @{$self->{nets}};
}
if ($self->{pt}) {
my $dup_pt = $dup->{pt};
$self->{pt}->climb(sub {
my $key = $_[0]; $key =~ s/^!//;
defined eval { $dup_pt->add_string($key, $_[0]) }
or die "Adding a network $_[0] to a patricia trie failed: $@";
1;
});
}
$dup->{num_nets} = $self->{num_nets};
return $dup;
}
###########################################################################
1;

View File

@ -0,0 +1,195 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::PerMsgLearner - per-message status (spam or not-spam)
=head1 SYNOPSIS
my $spamtest = new Mail::SpamAssassin ({
'rules_filename' => '/etc/spamassassin.rules',
'userprefs_filename' => $ENV{HOME}.'/.spamassassin/user_prefs'
});
my $mail = $spamtest->parse();
my $status = $spamtest->learn($mail,$id,$isspam,$forget);
my $didlearn = $status->did_learn();
$status->finish();
=head1 DESCRIPTION
The Mail::SpamAssassin C<learn()> method returns an object of this
class. This object encapsulates all the per-message state for
the learning process.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::PerMsgLearner;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Bayes;
use Mail::SpamAssassin::Logger;
our @ISA = qw();
###########################################################################
sub new {
my $class = shift;
$class = ref($class) || $class;
my ($main, $msg) = @_;
my $self = {
'main' => $main,
'msg' => $msg,
'learned' => 0,
'master_deadline' => $msg->{master_deadline}, # dflt inherited from msg
};
$self->{conf} = $self->{main}->{conf};
$self->{bayes_scanner} = $self->{main}->{bayes_scanner};
bless ($self, $class);
$self;
}
###########################################################################
# $status->learn_spam($id)
#
# Learn the message as spam.
#
# C<$id> is an optional message-identification string, used internally
# to tag the message. If it is C<undef>, one will be generated.
# It should be unique to that message.
#
# This is a semi-private API; callers should use
# C<$spamtest-E<gt>learn($mail,$id,$isspam,$forget)> instead.
sub learn_spam {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->add_all_addresses_to_blacklist ($self->{msg});
# }
# use the real message-id here instead of mass-check's idea of an "id",
# as we may deliver the msg into another mbox format but later need
# to forget it's training.
$self->{learned} = $self->{bayes_scanner}->learn (1, $self->{msg}, $id);
}
###########################################################################
# $status->learn_ham($id)
#
# Learn the message as ham.
#
# C<$id> is an optional message-identification string, used internally
# to tag the message. If it is C<undef>, one will be generated.
# It should be unique to that message.
#
# This is a semi-private API; callers should use
# C<$spamtest-E<gt>learn($mail,$id,$isspam,$forget)> instead.
sub learn_ham {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->add_all_addresses_to_whitelist ($self->{msg});
# }
$self->{learned} = $self->{bayes_scanner}->learn (0, $self->{msg}, $id);
}
###########################################################################
# $status->forget($id)
#
# Forget about a previously-learned message.
#
# C<$id> is an optional message-identification string, used internally
# to tag the message. If it is C<undef>, one will be generated.
# It should be unique to that message.
#
# This is a semi-private API; callers should use
# C<$spamtest-E<gt>learn($mail,$id,$isspam,$forget)> instead.
sub forget {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->remove_all_addresses_from_whitelist ($self->{msg});
# }
$self->{learned} = $self->{bayes_scanner}->forget ($self->{msg}, $id);
}
###########################################################################
=item $didlearn = $status->did_learn()
Returns C<1> if the message was learned from or forgotten successfully.
=cut
sub did_learn {
my ($self) = @_;
return ($self->{learned});
}
###########################################################################
=item $status->finish()
Finish with the object.
=cut
sub finish {
my $self = shift;
%{$self} = ();
}
###########################################################################
1;
__END__
=back
=head1 SEE ALSO
Mail::SpamAssassin(3)
spamassassin(1)

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,162 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::PersistentAddrList - persistent address list base class
=head1 SYNOPSIS
my $factory = PersistentAddrListSubclass->new();
$spamtest->set_persistent_addr_list_factory ($factory);
... call into SpamAssassin classes...
SpamAssassin will call:
my $addrlist = $factory->new_checker($spamtest);
$entry = $addrlist->get_addr_entry ($addr);
...
=head1 DESCRIPTION
All persistent address list implementations, used by the auto-whitelist
code to track known-good email addresses, use this as a base class.
See C<Mail::SpamAssassin::DBBasedAddrList> for an example.
=head1 METHODS
=over 4
=cut
package Mail::SpamAssassin::PersistentAddrList;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw();
###########################################################################
=item $factory = PersistentAddrListSubclass->new();
This creates a factory object, which SpamAssassin will call to create
a new checker object for the persistent address list.
=cut
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = { };
bless ($self, $class);
$self;
}
###########################################################################
=item my $addrlist = $factory->new_checker();
Create a new address-list checker object from the factory. Called by the
SpamAssassin classes.
=cut
sub new_checker {
my ($factory, $main) = @_;
die "auto-whitelist: unimplemented base method"; # override this
}
###########################################################################
=item $entry = $addrlist->get_addr_entry ($addr);
Given an email address C<$addr>, return an entry object with the details of
that address.
The entry object is a reference to a hash, which must contain at least
two keys: C<count>, which is the count of times that address has been
encountered before; and C<totscore>, which is the total of all scores for
messages associated with that address. From these two fields, an average
score will be calculated, and the score for the current message will be
regressed towards that mean message score.
The hash can contain whatever other data your back-end needs to store,
under other keys.
The method should never return C<undef>, or a hash that does not contain
a C<count> key and a C<totscore> key.
=cut
sub get_addr_entry {
my ($self, $addr, $signedby) = @_;
my $entry = { };
die "auto-whitelist: unimplemented base method"; # override this
return $entry;
}
###########################################################################
=item $entry = $addrlist->add_score($entry, $score);
This method should add the given score to the whitelist database for the
given entry, and then return the new entry.
=cut
sub add_score {
my ($self, $entry, $score) = @_;
die "auto-whitelist: unimplemented base method"; # override this
}
###########################################################################
=item $entry = $addrlist->remove_entry ($entry);
This method should remove the given entry from the whitelist database.
=cut
sub remove_entry {
my ($self, $entry) = @_;
die "auto-whitelist: unimplemented base method"; # override this
}
###########################################################################
=item $entry = $addrlist->finish ();
Clean up, if necessary. Called by SpamAssassin when it has finished
checking, or adding to, the auto-whitelist database.
=cut
sub finish {
my ($self) = @_;
}
###########################################################################
1;
=back
=cut

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,491 @@
# SpamAssassin - ASN Lookup Plugin
#
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
###########################################################################
#
# Originated by Matthias Leisi, 2006-12-15 (SpamAssassin enhancement #4770).
# Modifications by D. Stussy, 2010-12-15 (SpamAssassin enhancement #6484):
#
# Since SA 3.4.0 a fixed text prefix (such as AS) to each ASN is configurable
# through an asn_prefix directive. Its value is 'AS' by default for backward
# compatibility with SA 3.3.*, but is rather redundant and can be set to an
# empty string for clarity if desired.
#
# Enhanced TXT-RR decoding for alternative formats from other DNS zones.
# Some of the supported formats of TXT RR are (quoted strings here represent
# individual string fields in a TXT RR):
# "1103" "192.88.99.0" "24"
# "559 1103 1239 1257 1299 | 192.88.99.0/24 | US | iana | 2001-06-01"
# "192.88.99.0/24 | AS1103 | SURFnet, The Netherlands | 2002-10-15 | EU"
# "15169 | 2a00:1450::/32 | IE | ripencc | 2009-10-05"
# "as1103"
# Multiple routes are sometimes provided by returning multiple TXT records
# (e.g. from cymru.com). This form of a response is handled as well.
#
# Some zones also support IPv6 lookups, for example:
# asn_lookup_ipv6 origin6.asn.cymru.com [_ASN_ _ASNCIDR_]
=head1 NAME
Mail::SpamAssassin::Plugin::ASN - SpamAssassin plugin to look up the
Autonomous System Number (ASN) of the connecting IP address.
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::ASN
asn_lookup asn.routeviews.org _ASN_ _ASNCIDR_
asn_lookup_ipv6 origin6.asn.cymru.com _ASN_ _ASNCIDR_
add_header all ASN _ASN_ _ASNCIDR_
header TEST_AS1234 X-ASN =~ /^1234$/
=head1 DESCRIPTION
This plugin uses DNS lookups to the services of an external DNS zone such
as at C<http://www.routeviews.org/> to do the actual work. Please make
sure that your use of the plugin does not overload their infrastructure -
this generally means that B<you should not use this plugin in a
high-volume environment> or that you should use a local mirror of the
zone (see C<ftp://ftp.routeviews.org/dnszones/>). Other similar zones
may also be used.
=head1 TEMPLATE TAGS
This plugin allows you to create template tags containing the connecting
IP's AS number and route info for that AS number.
The default config will add a header field that looks like this:
X-Spam-ASN: AS24940 213.239.192.0/18
where "24940" is the ASN and "213.239.192.0/18" is the route
announced by that ASN where the connecting IP address came from.
If the AS announces multiple networks (more/less specific), they will
all be added to the C<_ASNCIDR_> tag, separated by spaces, eg:
X-Spam-ASN: AS1680 89.138.0.0/15 89.139.0.0/16
Note that the literal "AS" before the ASN in the _ASN_ tag is configurable
through the I<asn_prefix> directive and may be set to an empty string.
=head1 CONFIGURATION
The standard ruleset contains a configuration that will add a header field
containing ASN data to scanned messages. The bayes tokenizer will use the
added header field for bayes calculations, and thus affect which BAYES_* rule
will trigger for a particular message.
B<Note> that in most cases you should not score on the ASN data directly.
Bayes learning will probably trigger on the _ASNCIDR_ tag, but probably not
very well on the _ASN_ tag alone.
=head1 SEE ALSO
http://www.routeviews.org/ - all data regarding routing, ASNs, etc....
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4770 -
SpamAssassin Issue #4770 concerning this plugin
=head1 STATUS
No in-depth analysis of the usefulness of bayes tokenization of ASN data has
been performed.
=cut
package Mail::SpamAssassin::Plugin::ASN;
use strict;
use warnings;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(reverse_ip_address);
use Mail::SpamAssassin::Dns;
use Mail::SpamAssassin::Constants qw(:ip);
our @ISA = qw(Mail::SpamAssassin::Plugin);
our $txtdata_can_provide_a_list;
my $IPV4_ADDRESS = IPV4_ADDRESS;
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->set_config($mailsa->{conf});
#$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
#more robust version check from Damyan Ivanov - Bug 7095
$txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
return $self;
}
###########################################################################
sub set_config {
my ($self, $conf) = @_;
my @cmds;
=head1 ADMINISTRATOR SETTINGS
=over 4
=item asn_lookup asn-zone.example.com [ _ASNTAG_ _ASNCIDRTAG_ ]
Use this to lookup the ASN info in the specified zone for the first external
IPv4 address and add the AS number to the first specified tag and routing info
to the second specified tag.
If no tags are specified the AS number will be added to the _ASN_ tag and the
routing info will be added to the _ASNCIDR_ tag. You must specify either none
or both of the tag names. Tag names must start and end with an underscore.
If two or more I<asn_lookup>s use the same set of template tags, the results of
their lookups will be appended to each other in the template tag values in no
particular order. Duplicate results will be omitted when combining results.
In a similar fashion, you can also use the same template tag for both the AS
number tag and the routing info tag.
Examples:
asn_lookup asn.routeviews.org
asn_lookup asn.routeviews.org _ASN_ _ASNCIDR_
asn_lookup myview.example.com _MYASN_ _MYASNCIDR_
asn_lookup asn.routeviews.org _COMBINEDASN_ _COMBINEDASNCIDR_
asn_lookup myview.example.com _COMBINEDASN_ _COMBINEDASNCIDR_
asn_lookup in1tag.example.net _ASNDATA_ _ASNDATA_
=item asn_lookup_ipv6 asn-zone6.example.com [_ASN_ _ASNCIDR_]
Use specified zone for lookups of IPv6 addresses. If zone supports both
IPv4 and IPv6 queries, use both asn_lookup and asn_lookup_ipv6 for the same
zone.
=item clear_asn_lookups
Removes any previously declared I<asn_lookup> entries from a list of queries.
=item asn_prefix 'prefix_string' (default: 'AS')
The string specified in the argument is prepended to each ASN when storing
it as a tag. This prefix is rather redundant, but its default value 'AS'
is kept for backward compatibility with versions of SpamAssassin earlier
than 3.4.0. A sensible setting is an empty string. The argument may be (but
need not be) enclosed in single or double quotes for clarity.
=back
=cut
push (@cmds, {
setting => 'asn_lookup',
is_admin => 1,
code => sub {
my ($conf, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
local($1,$2,$3);
unless ($value =~ /^(\S+?)\.?(?:\s+_(\S+)_\s+_(\S+)_)?$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my ($zone, $asn_tag, $route_tag) = ($1, $2, $3);
$asn_tag = 'ASN' if !defined $asn_tag;
$route_tag = 'ASNCIDR' if !defined $route_tag;
push @{$conf->{asnlookups}},
{ zone=>$zone, asn_tag=>$asn_tag, route_tag=>$route_tag };
}
});
push (@cmds, {
setting => 'asn_lookup_ipv6',
is_admin => 1,
code => sub {
my ($conf, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
local($1,$2,$3);
unless ($value =~ /^(\S+?)\.?(?:\s+_(\S+)_\s+_(\S+)_)?$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my ($zone, $asn_tag, $route_tag) = ($1, $2, $3);
$asn_tag = 'ASN' if !defined $asn_tag;
$route_tag = 'ASNCIDR' if !defined $route_tag;
push @{$conf->{asnlookups_ipv6}},
{ zone=>$zone, asn_tag=>$asn_tag, route_tag=>$route_tag };
}
});
push (@cmds, {
setting => 'clear_asn_lookups',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NOARGS,
code => sub {
my ($conf, $key, $value, $line) = @_;
if (defined $value && $value ne '') {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
delete $conf->{asnlookups};
delete $conf->{asnlookups_ipv6};
}
});
push (@cmds, {
setting => 'asn_prefix',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
default => 'AS',
code => sub {
my ($conf, $key, $value, $line) = @_;
$value = '' if !defined $value;
local($1,$2);
$value = $2 if $value =~ /^(['"])(.*)\1\z/; # strip quotes if any
$conf->{$key} = $value; # keep tainted
}
});
$conf->{parser}->register_commands(\@cmds);
}
# ---------------------------------------------------------------------------
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $self->{main}->{conf};
if (!$pms->is_dns_available()) {
dbg("asn: DNS is not available, skipping ASN checks");
return;
}
if (!$conf->{asnlookups} && !$conf->{asnlookups_ipv6}) {
dbg("asn: no asn_lookups configured, skipping ASN lookups");
return;
}
# initialize the tag data so that if no result is returned from the DNS
# query we won't end up with a missing tag. Don't use $pms->set_tag()
# here to avoid triggering any tag-dependent action unnecessarily
if ($conf->{asnlookups}) {
foreach my $entry (@{$conf->{asnlookups}}) {
$pms->{tag_data}->{$entry->{asn_tag}} ||= '';
$pms->{tag_data}->{$entry->{route_tag}} ||= '';
}
}
if ($conf->{asnlookups_ipv6}) {
foreach my $entry (@{$conf->{asnlookups_ipv6}}) {
$pms->{tag_data}->{$entry->{asn_tag}} ||= '';
$pms->{tag_data}->{$entry->{route_tag}} ||= '';
}
}
# get reversed IP address of last external relay to lookup
# don't return until we've initialized the template tags
my $relay = $pms->{relays_external}->[0];
if (!defined $relay) {
dbg("asn: no first external relay IP available, skipping ASN check");
return;
} elsif ($relay->{ip_private}) {
dbg("asn: first external relay is a private IP, skipping ASN check");
return;
}
my $ip = $relay->{ip};
my $reversed_ip = reverse_ip_address($ip);
if (defined $reversed_ip) {
dbg("asn: using first external relay IP for lookups: %s", $ip);
} else {
dbg("asn: could not parse first external relay IP: %s, skipping", $ip);
return;
}
my $lookup_zone;
if ($ip =~ /^$IPV4_ADDRESS$/o) {
if (!defined $conf->{asnlookups}) {
dbg("asn: asn_lookup for IPv4 not defined, skipping");
return;
}
$lookup_zone = "asnlookups";
} else {
if (!defined $conf->{asnlookups_ipv6}) {
dbg("asn: asn_lookup_ipv6 for IPv6 not defined, skipping");
return;
}
$lookup_zone = "asnlookups_ipv6";
}
# we use arrays and array indices rather than hashes and hash keys
# in case someone wants the same zone added to multiple sets of tags
my $index = 0;
foreach my $entry (@{$conf->{$lookup_zone}}) {
# do the DNS query, have the callback process the result
my $zone_index = $index;
my $zone = $reversed_ip . '.' . $entry->{zone};
my $key = "asnlookup-${lookup_zone}-${zone_index}-".$entry->{zone};
my $ent = $pms->{async}->bgsend_and_start_lookup($zone, 'TXT', undef,
{ type => 'ASN', key => $key, zone => $lookup_zone },
sub { my($ent, $pkt) = @_;
$self->process_dns_result($pms, $pkt, $zone_index, $lookup_zone) },
master_deadline => $pms->{master_deadline}
);
$pms->register_async_rule_start($key) if $ent;
$index++;
}
}
#
# TXT-RR format of response:
# 3 fields, each as one TXT RR <character-string> (RFC 1035): ASN IP MASK
# The latter two fields are combined to create a CIDR.
# or: At least 2 fields made of a single or multiple
# <character-string>s, fields are separated by a vertical bar.
# They will be the ASN and CIDR fields in any order.
# If only one field is returned, it is the ASN. There will
# be no CIDR field in that case.
#
sub process_dns_result {
my ($self, $pms, $pkt, $zone_index, $lookup_zone) = @_;
my $conf = $self->{main}->{conf};
my $zone = $conf->{$lookup_zone}[$zone_index]->{zone};
my $asn_tag = $conf->{$lookup_zone}[$zone_index]->{asn_tag};
my $route_tag = $conf->{$lookup_zone}[$zone_index]->{route_tag};
my($any_asn_updates, $any_route_updates, $tag_value);
my(@asn_tag_data, %asn_tag_data_seen);
$tag_value = $pms->get_tag($asn_tag);
if (defined $tag_value) {
my $prefix = $pms->{conf}->{asn_prefix};
if (defined $prefix && $prefix ne '') {
# must strip prefix before splitting on whitespace
$tag_value =~ s/(^| )\Q$prefix\E(?=\d+)/$1/gs;
}
@asn_tag_data = split(/ /,$tag_value);
%asn_tag_data_seen = map(($_,1), @asn_tag_data);
}
my(@route_tag_data, %route_tag_data_seen);
$tag_value = $pms->get_tag($route_tag);
if (defined $tag_value) {
@route_tag_data = split(/ /,$tag_value);
%route_tag_data_seen = map(($_,1), @route_tag_data);
}
# NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
my @answer = !defined $pkt ? () : $pkt->answer;
foreach my $rr (@answer) {
#dbg("asn: %s: lookup result packet: %s", $zone, $rr->string);
next if $rr->type ne 'TXT';
my @strings = $txtdata_can_provide_a_list ? $rr->txtdata :
$rr->char_str_list; # historical
next if !@strings;
for (@strings) { utf8::encode($_) if utf8::is_utf8($_) }
my @items;
if (@strings > 1 && join('',@strings) !~ m{\|}) {
# routeviews.org style, multiple string fields in a TXT RR
@items = @strings;
if (@items >= 3 && $items[1] !~ m{/} && $items[2] =~ /^\d+\z/) {
$items[1] .= '/' . $items[2]; # append the net mask length to route
}
} else {
# cymru.com and spameatingmonkey.net style, or just a single field
@items = split(/\s*\|\s*/, join(' ',@strings));
}
my(@route_value, @asn_value);
if (@items && $items[0] =~ /(?: (?:^|\s+) (?:AS)? \d+ )+ \z/xsi) {
# routeviews.org and cymru.com style, ASN is the first field,
# possibly a whitespace-separated list (e.g. cymru.com)
@asn_value = split(' ',$items[0]);
@route_value = split(' ',$items[1]) if @items >= 2;
} elsif (@items > 1 && $items[1] =~ /(?: (?:^|\s+) (?:AS)? \d+ )+ \z/xsi) {
# spameatingmonkey.net style, ASN is the second field
@asn_value = split(' ',$items[1]);
@route_value = split(' ',$items[0]);
} else {
dbg("asn: unparseable response: %s", join(' ', map("\"$_\"",@strings)));
}
foreach my $route (@route_value) {
if (!defined $route || $route eq '') {
# ignore, just in case
} elsif ($route =~ m{/0+\z}) {
# unassigned/unannounced address space
} elsif ($route_tag_data_seen{$route}) {
dbg("asn: %s duplicate route %s", $route_tag, $route);
} else {
dbg("asn: %s added route %s", $route_tag, $route);
push(@route_tag_data, $route);
$route_tag_data_seen{$route} = 1;
$any_route_updates = 1;
}
}
foreach my $asn (@asn_value) {
$asn =~ s/^AS(?=\d+)//si;
if (!$asn || $asn == 4294967295) {
# unassigned/unannounced address space
} elsif ($asn_tag_data_seen{$asn}) {
dbg("asn: %s duplicate asn %s", $asn_tag, $asn);
} else {
dbg("asn: %s added asn %s", $asn_tag, $asn);
push(@asn_tag_data, $asn);
$asn_tag_data_seen{$asn} = 1;
$any_asn_updates = 1;
}
}
}
if ($any_asn_updates && @asn_tag_data) {
$pms->{msg}->put_metadata('X-ASN', join(' ',@asn_tag_data));
my $prefix = $pms->{conf}->{asn_prefix};
if (defined $prefix && $prefix ne '') { s/^/$prefix/ for @asn_tag_data }
$pms->set_tag($asn_tag,
@asn_tag_data == 1 ? $asn_tag_data[0] : \@asn_tag_data);
}
if ($any_route_updates && @route_tag_data) {
# Bayes already has X-ASN, Route is pointless duplicate, skip
#$pms->{msg}->put_metadata('X-ASN-Route', join(' ',@route_tag_data));
$pms->set_tag($route_tag,
@route_tag_data == 1 ? $route_tag_data[0] : \@route_tag_data);
}
}
# Version features
sub has_asn_lookup_ipv6 { 1 }
1;

View File

@ -0,0 +1,633 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::AWL - Normalize scores via auto-whitelist
=head1 SYNOPSIS
To try this out, add this or uncomment this line in init.pre:
loadplugin Mail::SpamAssassin::Plugin::AWL
Use the supplied 60_awl.cf file (ie you don't have to do anything) or
add these lines to a .cf file:
header AWL eval:check_from_in_auto_whitelist()
describe AWL From: address is in the auto white-list
tflags AWL userconf noautolearn
priority AWL 1000
=head1 DESCRIPTION
This plugin module provides support for the auto-whitelist. It keeps
track of the average SpamAssassin score for senders. Senders are
tracked using a combination of their From: address and their IP address.
It then uses that average score to reduce the variability in scoring
from message to message and modifies the final score by pushing the
result towards the historical average. This improves the accuracy of
filtering for most email.
=head1 TEMPLATE TAGS
This plugin module adds the following C<tags> that can be used as
placeholders in certain options. See C<Mail::SpamAssassin::Conf>
for more information on TEMPLATE TAGS.
_AWL_ AWL modifier
_AWLMEAN_ Mean score on which AWL modification is based
_AWLCOUNT_ Number of messages on which AWL modification is based
_AWLPRESCORE_ Score before AWL
=cut
package Mail::SpamAssassin::Plugin::AWL;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::AutoWhitelist;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_from_in_auto_whitelist");
$self->set_config($mailsaobject->{conf});
return $self;
}
sub set_config {
my($self, $conf) = @_;
my @cmds;
=head1 USER PREFERENCES
The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
SpamAssassin handles incoming email messages.
=over 4
=item use_auto_whitelist ( 0 | 1 ) (default: 1)
Whether to use auto-whitelists. Auto-whitelists track the long-term
average score for each sender and then shift the score of new messages
toward that long-term average. This can increase or decrease the score
for messages, depending on the long-term behavior of the particular
correspondent.
For more information about the auto-whitelist system, please look
at the the C<Automatic Whitelist System> section of the README file.
The auto-whitelist is not intended as a general-purpose replacement
for static whitelist entries added to your config files.
Note that certain tests are ignored when determining the final
message score:
- rules with tflags set to 'noautolearn'
=cut
push (@cmds, {
setting => 'use_auto_whitelist',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
=item auto_whitelist_factor n (default: 0.5, range [0..1])
How much towards the long-term mean for the sender to regress a message.
Basically, the algorithm is to track the long-term mean score of messages for
the sender (C<mean>), and then once we have otherwise fully calculated the
score for this message (C<score>), we calculate the final score for the
message as:
C<finalscore> = C<score> + (C<mean> - C<score>) * C<factor>
So if C<factor> = 0.5, then we'll move to half way between the calculated
score and the mean. If C<factor> = 0.3, then we'll move about 1/3 of the way
from the score toward the mean. C<factor> = 1 means just use the long-term
mean; C<factor> = 0 mean just use the calculated score.
=cut
push (@cmds, {
setting => 'auto_whitelist_factor',
default => 0.5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item auto_whitelist_ipv4_mask_len n (default: 16, range [0..32])
The AWL database keeps only the specified number of most-significant bits
of an IPv4 address in its fields, so that different individual IP addresses
within a subnet belonging to the same owner are managed under a single
database record. As we have no information available on the allocated
address ranges of senders, this CIDR mask length is only an approximation.
The default is 16 bits, corresponding to a former class B. Increase the
number if a finer granularity is desired, e.g. to 24 (class C) or 32.
A value 0 is allowed but is not particularly useful, as it would treat the
whole internet as a single organization. The number need not be a multiple
of 8, any split is allowed.
=cut
push (@cmds, {
setting => 'auto_whitelist_ipv4_mask_len',
default => 16,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
} elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_ipv4_mask_len} = $value;
}
});
=item auto_whitelist_ipv6_mask_len n (default: 48, range [0..128])
The AWL database keeps only the specified number of most-significant bits
of an IPv6 address in its fields, so that different individual IP addresses
within a subnet belonging to the same owner are managed under a single
database record. As we have no information available on the allocated address
ranges of senders, this CIDR mask length is only an approximation. The default
is 48 bits, corresponding to an address range commonly allocated to individual
(smaller) organizations. Increase the number for a finer granularity, e.g.
to 64 or 96 or 128, or decrease for wider ranges, e.g. 32. A value 0 is
allowed but is not particularly useful, as it would treat the whole internet
as a single organization. The number need not be a multiple of 4, any split
is allowed.
=cut
push (@cmds, {
setting => 'auto_whitelist_ipv6_mask_len',
default => 48,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
} elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_ipv6_mask_len} = $value;
}
});
=item user_awl_sql_override_username
Used by the SQLBasedAddrList storage implementation.
If this option is set the SQLBasedAddrList module will override the set
username with the value given. This can be useful for implementing global
or group based auto-whitelist databases.
=cut
push (@cmds, {
setting => 'user_awl_sql_override_username',
default => '',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_distinguish_signed
Used by the SQLBasedAddrList storage implementation.
If this option is set the SQLBasedAddrList module will keep separate
database entries for DKIM-validated e-mail addresses and for non-validated
ones. A pre-requisite when setting this option is that a field awl.signedby
exists in a SQL table, otherwise SQL operations will fail (which is why we
need this option at all - for compatibility with pre-3.3.0 database schema).
A plugin DKIM should also be enabled, as otherwise there is no benefit from
turning on this option.
=cut
push (@cmds, {
setting => 'auto_whitelist_distinguish_signed',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
=back
=head1 ADMINISTRATOR SETTINGS
These settings differ from the ones above, in that they are considered 'more
privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
No matter what C<allow_user_rules> is set to, these can never be set from a
user's C<user_prefs> file.
=over 4
=item auto_whitelist_factory module (default: Mail::SpamAssassin::DBBasedAddrList)
Select alternative whitelist factory module.
=cut
push (@cmds, {
setting => 'auto_whitelist_factory',
is_admin => 1,
default => 'Mail::SpamAssassin::DBBasedAddrList',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_path /path/filename (default: ~/.spamassassin/auto-whitelist)
This is the automatic-whitelist directory and filename. By default, each user
has their own whitelist database in their C<~/.spamassassin> directory with
mode 0700. For system-wide SpamAssassin use, you may want to share this
across all users, although that is not recommended.
=cut
push (@cmds, {
setting => 'auto_whitelist_path',
is_admin => 1,
default => '__userstate__/auto-whitelist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
if (-d $value) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_path} = $value;
}
});
=item auto_whitelist_db_modules Module ... (default: see below)
What database modules should be used for the auto-whitelist storage database
file. The first named module that can be loaded from the perl include path
will be used. The format is:
PreferredModuleName SecondBest ThirdBest ...
ie. a space-separated list of perl module names. The default is:
DB_File GDBM_File SDBM_File
NDBM_File is no longer supported, since it appears to have bugs that
preclude its use for the AWL (see SpamAssassin bug 4353).
=cut
push (@cmds, {
setting => 'auto_whitelist_db_modules',
is_admin => 1,
default => 'DB_File GDBM_File SDBM_File',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_file_mode (default: 0700)
The file mode bits used for the automatic-whitelist directory or file.
Make sure you specify this using the 'x' mode bits set, as it may also be used
to create directories. However, if a file is created, the resulting file will
not have any execute bits set (the umask is set to 0111).
=cut
push (@cmds, {
setting => 'auto_whitelist_file_mode',
is_admin => 1,
default => '0700',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value !~ /^0?[0-7]{3}$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_file_mode} = untaint_var($value);
}
});
=item user_awl_dsn DBI:databasetype:databasename:hostname:port
Used by the SQLBasedAddrList storage implementation.
This will set the DSN used to connect. Example:
C<DBI:mysql:spamassassin:localhost>
=cut
push (@cmds, {
setting => 'user_awl_dsn',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item user_awl_sql_username username
Used by the SQLBasedAddrList storage implementation.
The authorized username to connect to the above DSN.
=cut
push (@cmds, {
setting => 'user_awl_sql_username',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item user_awl_sql_password password
Used by the SQLBasedAddrList storage implementation.
The password for the database username, for the above DSN.
=cut
push (@cmds, {
setting => 'user_awl_sql_password',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item user_awl_sql_table tablename
Used by the SQLBasedAddrList storage implementation.
The table user auto-whitelists are stored in, for the above DSN.
=cut
push (@cmds, {
setting => 'user_awl_sql_table',
is_admin => 1,
default => 'awl',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
$conf->{parser}->register_commands(\@cmds);
}
sub check_from_in_auto_whitelist {
my ($self, $pms) = @_;
return 0 unless ($pms->{conf}->{use_auto_whitelist});
my $timer = $self->{main}->time_method("total_awl");
my $from = lc $pms->get('From:addr');
# dbg("auto-whitelist: From: $from");
return 0 unless $from =~ /\S/;
# find the earliest usable "originating IP". ignore private nets
my $origip;
foreach my $rly (reverse (@{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}}))
{
next if ($rly->{ip_private});
if ($rly->{ip}) {
$origip = $rly->{ip}; last;
}
}
my $scores = $pms->{conf}->{scores};
my $tflags = $pms->{conf}->{tflags};
my $points = 0;
my $signedby = $pms->get_tag('DKIMDOMAIN');
undef $signedby if defined $signedby && $signedby eq '';
foreach my $test (@{$pms->{test_names_hit}}) {
# ignore tests with 0 score in this scoreset,
# or if the test is marked as "noautolearn"
next if !$scores->{$test};
next if exists $tflags->{$test} && $tflags->{$test} =~ /\bnoautolearn\b/;
return 0 if exists $tflags->{$test} && $tflags->{$test} =~ /\bnoawl\b/;
$points += $scores->{$test};
}
my $awlpoints = (sprintf "%0.3f", $points) + 0;
# Create the AWL object
my $whitelist;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($pms->{main});
my $meanscore;
{ # check
my $timer = $self->{main}->time_method("check_awl");
$meanscore = $whitelist->check_address($from, $origip, $signedby);
}
my $delta = 0;
dbg("auto-whitelist: AWL active, pre-score: %s, autolearn score: %s, ".
"mean: %s, IP: %s, address: %s %s",
$pms->{score}, $awlpoints,
!defined $meanscore ? 'undef' : sprintf("%.3f",$meanscore),
$origip || 'undef',
$from, $signedby ? "signed by $signedby" : '(not signed)');
if (defined $meanscore) {
$delta = $meanscore - $awlpoints;
$delta *= $pms->{main}->{conf}->{auto_whitelist_factor};
$pms->set_tag('AWL', sprintf("%2.1f",$delta));
if (defined $meanscore) {
$pms->set_tag('AWLMEAN', sprintf("%2.1f", $meanscore));
}
$pms->set_tag('AWLCOUNT', sprintf("%2.1f", $whitelist->count()));
$pms->set_tag('AWLPRESCORE', sprintf("%2.1f", $pms->{score}));
}
# Update the AWL *before* adding the new score, otherwise
# early high-scoring messages are reinforced compared to
# later ones. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=159704
if (!$pms->{disable_auto_learning}) {
my $timer = $self->{main}->time_method("update_awl");
$whitelist->add_score($awlpoints);
}
# now redundant, got_hit() takes care of it
# for my $set (0..3) { # current AWL score changes with each hit
# $pms->{conf}->{scoreset}->[$set]->{"AWL"} = sprintf("%0.3f", $delta);
# }
if ($delta != 0) {
$pms->got_hit("AWL", "AWL: ", ruletype => 'eval',
score => sprintf("%0.3f", $delta));
}
$whitelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
# try an unlock, in case we got that far
eval { $whitelist->finish(); } if $whitelist;
return 0;
};
dbg("auto-whitelist: post auto-whitelist score: %.3f", $pms->{score});
# test hit is above
return 0;
}
sub blacklist_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to add address to blacklist\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to add address to blacklist");
return;
}
my $whitelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
if ($whitelist->add_known_bad_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: adding address to blacklist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: adding address to blacklist: " . $args->{address});
$status = 0;
}
else {
print "SpamAssassin auto-whitelist: error adding address to blacklist\n" if ($args->{cli_p});
dbg("auto-whitelist: error adding address to blacklist");
$status = 1;
}
$whitelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
return 0;
};
return $status;
}
sub whitelist_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to add address to whitelist\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to add address to whitelist");
return 0;
}
my $whitelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
if ($whitelist->add_known_good_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: adding address to whitelist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: adding address to whitelist: " . $args->{address});
$status = 1;
}
else {
print "SpamAssassin auto-whitelist: error adding address to whitelist\n" if ($args->{cli_p});
dbg("auto-whitelist: error adding address to whitelist");
$status = 0;
}
$whitelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
return 0;
};
return $status;
}
sub remove_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to remove address\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to remove address");
return 0;
}
my $whitelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
if ($whitelist->remove_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: removing address: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: removing address: " . $args->{address});
$status = 1;
}
else {
print "SpamAssassin auto-whitelist: error removing address\n" if ($args->{cli_p});
dbg("auto-whitelist: error removing address");
$status = 0;
}
$whitelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
return 0;
};
return $status;
}
1;
=back
=cut

View File

@ -0,0 +1,173 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::AccessDB - check message against Access Database
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::AccessDB
header ACCESSDB eval:check_access_database('/etc/mail/access.db')
describe ACCESSDB Message would have been caught by accessdb
tflags ACCESSDB userconf
score ACCESSDB 2
=head1 DESCRIPTION
Many MTAs support access databases, such as Sendmail, Postfix, etc.
This plugin does similar checks to see whether a message would have
been flagged.
The rule returns false if an entry isn't found, or the entry has a RHS of
I<OK> or I<SKIP>.
The rule returns true if an entry exists and has a RHS of I<REJECT>, I<ERROR>,
or I<DISCARD>.
Note: only the first word (split on non-word characters) of the RHS
is checked, so C<error:5.7.1:...> means C<ERROR>.
B<AccessDB Pointers:>
http://www.faqs.org/docs/securing/chap22sec178.html
http://www.postfix.org/access.5.html
=cut
package Mail::SpamAssassin::Plugin::AccessDB;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Fcntl;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
use constant HAS_DB_FILE => eval { require DB_File; };
sub new {
my $class = shift;
my $mailsaobject = shift;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule("check_access_database");
return $self;
}
sub check_access_database {
my ($self, $pms, $path) = @_;
if (!HAS_DB_FILE) {
return 0;
}
my %access;
my %ok = map { $_ => 1 } qw/ OK SKIP /;
my %bad = map { $_ => 1 } qw/ REJECT ERROR DISCARD /;
$path = $self->{main}->sed_path ($path);
dbg("accessdb: tie-ing to DB file R/O in $path");
if (tie %access,"DB_File",$path, O_RDONLY) {
my @lookfor;
# Look for "From:" versions as well!
foreach my $from ($pms->all_from_addrs()) {
# $user."\@"
# rotate through $domain and check
my ($user,$domain) = split(/\@/, $from,2);
push(@lookfor, "From:$from",$from);
if ($user) {
push(@lookfor, "From:$user\@", "$user\@");
}
if ($domain) {
while ($domain =~ /\./) {
push(@lookfor, "From:$domain", $domain);
$domain =~ s/^[^.]*\.//;
}
push(@lookfor, "From:$domain", $domain);
}
}
# we can only match this if we have at least 1 untrusted header
if ($pms->{num_relays_untrusted} > 0) {
my $lastunt = $pms->{relays_untrusted}->[0];
# If there was a reverse lookup, use it in a lookup
if (! $lastunt->{no_reverse_dns}) {
my $rdns = $lastunt->{lc_rdns};
while($rdns =~ /\./) {
push(@lookfor, "From:$rdns", $rdns);
$rdns =~ s/^[^.]*\.//;
}
push(@lookfor, "From:$rdns", $rdns);
}
# do both IP and net (rotate over IP)
my ($ip) = $lastunt->{ip};
$ip =~ tr/0-9.//cd;
while($ip =~ /\./) {
push(@lookfor, "From:$ip", $ip);
$ip =~ s/\.[^.]*$//;
}
push(@lookfor, "From:$ip", $ip);
}
my $retval = 0;
my %cache;
foreach (@lookfor) {
next if ($cache{$_}++);
dbg("accessdb: looking for $_");
# Some systems put a null at the end of the key, most don't...
my $result = $access{$_} || $access{"$_\000"} || next;
my ($type) = split(/\W/,$result);
$type = uc $type;
if (exists $ok{$type}) {
dbg("accessdb: hit OK: $type, $_");
$retval = 0;
last;
}
if (exists $bad{$type} || $type =~ /^\d+$/) {
$retval = 1;
dbg("accessdb: hit not-OK: $type, $_");
}
}
dbg("accessdb: untie-ing DB file $path");
untie %access;
return $retval;
}
else {
dbg("accessdb: cannot open accessdb $path R/O: $!");
}
return 0;
}
1;

View File

@ -0,0 +1,164 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
AntiVirus - simple anti-virus tests
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::AntiVirus
body MICROSOFT_EXECUTABLE eval:check_microsoft_executable()
body MIME_SUSPECT_NAME eval:check_suspect_name()
=head1 DESCRIPTION
The MICROSOFT_EXECUTABLE rule works by checking for 3 possibilities in
the message in any application/* or text/* part in the message:
=over 4
=item - in text parts, look for a uuencoded executable start string
=item - in application parts, look for filenames ending in an executable extension
=item - in application parts, look for a base64 encoded executable start string
=back
=cut
package Mail::SpamAssassin::Plugin::AntiVirus;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Util;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule("check_microsoft_executable");
$self->register_eval_rule("check_suspect_name");
return $self;
}
sub check_microsoft_executable {
my ($self, $pms) = @_;
_check_attachments(@_) unless exists $pms->{antivirus_microsoft_exe};
return $pms->{antivirus_microsoft_exe};
}
sub check_suspect_name {
my ($self, $pms) = @_;
_check_attachments(@_) unless exists $pms->{antivirus_suspect_name};
return $pms->{antivirus_suspect_name};
}
sub _check_attachments {
my ($self, $pms) = @_;
$pms->{antivirus_microsoft_exe} = 0;
$pms->{antivirus_suspect_name} = 0;
# MICROSOFT_EXECUTABLE triggered here
foreach my $p ($pms->{msg}->find_parts(qr/./, 1)) {
my ($ctype, $boundary, $charset, $name) =
Mail::SpamAssassin::Util::parse_content_type($p->get_header('content-type'));
$name = lc($name || '');
my $cte = lc($p->get_header('content-transfer-encoding') || '');
$ctype = lc $ctype;
if ($name && $name =~ /\.(?:ade|adp|asx|bas|bat|chm|cmd|com|cpl|crt|dll|exe|hlp|hta|inf|ins|isp|js|jse|lnk|mda|mdb|mde|mdt|mdw|mdz|msc|msi|msp|mst|nws|ops|pcd|pif|prf|reg|scf|scr\??|sct|shb|shs|shm|swf|url|vb|vbe|vbs|vbx|vxd|wsc|wsf|wsh)$/)
{
# file extension indicates an executable
$pms->{antivirus_microsoft_exe} = 1;
}
elsif ($cte =~ /base64/ && defined $p->raw()->[0] &&
$p->raw()->[0] =~ /^TV[opqr].A..[AB].[AQgw][A-H].A/)
{
# base64-encoded executable
$pms->{antivirus_microsoft_exe} = 1;
}
elsif ($ctype =~ /^text\b/) {
# uuencoded executable
for (@{$p->raw()}) {
if (/^M35[GHIJK].`..`..*````/) {
# uuencoded executable
$pms->{antivirus_microsoft_exe} = 1;
}
}
}
# MIME_SUSPECT_NAME triggered here
if ($name && $ctype ne "application/octet-stream") {
$name =~ s/.*\.//;
$ctype =~ s@/(x-|vnd\.)@/@;
if (
# text
(($name =~ /^(?:txt|[px]?html?|xml)$/) &&
($ctype !~ m@^(?:text/(?:plain|[px]?html?|english|sgml|xml|enriched|richtext)|message/external-body)@)) ||
# image
(($name =~ /^(?:jpe?g|tiff?|gif|png)$/) &&
($ctype !~ m@^(?:image/|application/mac-binhex)@)) ||
# vcard
(($name eq "vcf") && $ctype ne "text/vcard") ||
# application
(($name =~ /^(?:bat|com|exe|pif|scr|swf|vbs)$/) &&
($ctype !~ m@^application/@)) ||
# msword
(($name eq "doc") && ($ctype !~ m@^application/.*word$@)) ||
# powerpoint
(($name eq "ppt") &&
($ctype !~ m@^application/.*(?:powerpoint|ppt)$@)) ||
# excel
(($name eq "xls") && ($ctype !~ m@^application/.*excel$@))
)
{
$pms->{antivirus_suspect_name} = 1;
}
}
}
}
1;

View File

@ -0,0 +1,660 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
AskDNS - form a DNS query using tag values, and look up the DNSxL lists
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::AskDNS
askdns D_IN_DWL _DKIMDOMAIN_._vouch.dwl.spamhaus.org TXT /\b(transaction|list|all)\b/
=head1 DESCRIPTION
Using a DNS query template as specified in a parameter of a askdns rule,
the plugin replaces tag names as found in the template with their values
and launches DNS queries as soon as tag values become available. When DNS
responses trickle in, filters them according to the requested DNS resource
record type and optional subrule filtering expression, yielding a rule hit
if a response meets filtering conditions.
=head1 USER SETTINGS
=over 4
=item rbl_timeout t [t_min] [zone] (default: 15 3)
The rbl_timeout setting is common to all DNS querying rules (as implemented
by other plugins). It can specify a DNS query timeout globally, or individually
for each zone. When the zone parameter is specified, the settings affects DNS
queries when their query domain equals the specified zone, or is its subdomain.
See the C<Mail::SpamAssassin::Conf> POD for details on C<rbl_timeout>.
=back
=head1 RULE DEFINITIONS
=over 4
=item askdns NAME_OF_RULE query_template [rr_type [subqueryfilter]]
A query template is a string which will be expanded to produce a domain name
to be used in a DNS query. The template may include SpamAssassin tag names,
which will be replaced by their values to form a final query domain.
The final query domain must adhere to rules governing DNS domains, i.e.
must consist of fields each up to 63 characters long, delimited by dots.
There may be a trailing dot at the end, but it is redundant / carries
no semantics, because SpamAssassin uses a Net::DSN::Resolver::send method
for querying DNS, which ignores any 'search' or 'domain' DNS resolver options.
Domain names in DNS queries are case-insensitive.
A tag name is a string of capital letters, preceded and followed by an
underscore character. This syntax mirrors the add_header setting, except that
tags cannot have parameters in parenthesis when used in askdns templates.
Tag names may appear anywhere in the template - each queried DNS zone
prescribes how a query should be formed.
A query template may contain any number of tag names including none,
although in the most common anticipated scenario exactly one tag name would
appear in each askdns rule. Specified tag names are considered dependencies.
Askdns rules with dependencies on the same set of tags are grouped, and all
queries in a group are launched as soon as all their dependencies are met,
i.e. when the last of the awaited tag values becomes available by a call
to set_tag() from some other plugin or elsewhere in the SpamAssassin code.
Launched queries from all askdns rules are grouped too according to a pair
of: query type and an expanded query domain name. Even if there are multiple
rules producing the same type/domain pair, only one DNS query is launched,
and a reply to such query contributes to all the constituent rules.
A tag may produce none, one or multiple values. Askdns rules awaiting for
a tag which never receives its value never result in a DNS query. Tags which
produce multiple values will result in multiple queries launched, each with
an expanded template using one of the tag values. An example is a DKIMDOMAIN
tag which yields a list of signing domains, one for each valid signature in
a signed message.
When more than one distinct tag name appears in a template, each potentially
resulting in multiple values, a Cartesian product is formed, and each tuple
results in a launch of one DNS query (duplicates excluded). For example,
a query template _A_._B_.example._A_.com where tag A is a list (11,22)
and B is (xx,yy,zz), will result in queries: 11.xx.example.11.com,
22.xx.example.22.com, 11.yy.example.11.com, 22.yy.example.22.com,
11.zz.example.11.com, 22.zz.example.22.com .
A parameter rr_type following the query template is a comma-separated list
of expected DNS resource record (RR) types. Missing rr_type parameter implies
an 'A'. A DNS result may bring resource records of multiple types, but only
resource records of a type found in the rr_type parameter list are considered,
other resource records found in the answer section of a DNS reply are ignored
for this rule. A value ANY in the rr_type parameter list matches any resource
record type. An empty DNS answer section does not match ANY.
The rr_type parameter not only provides a filter for RR types found in
the DNS answer, but also determines the DNS query type. If only a single
RR type is specified in the parameter (e.g. TXT), than this is also the RR
type of a query. When more than one RR type is specified (e.g. A, AAAA, TXT)
or if ANY is specified, then the DNS query type will be ANY and the rr_type
parameter will only act as a filter on a result.
Currently recognized RR types in the rr_type parameter are: ANY, A, AAAA,
MX, TXT, PTR, NAPTR, NS, SOA, CERT, CNAME, DNAME, DHCID, HINFO, MINFO,
RP, HIP, IPSECKEY, KX, LOC, SRV, SSHFP, SPF.
https://www.iana.org/assignments/dns-parameters/dns-parameters.xml
The last optional parameter of a rule is a filtering expression, a.k.a. a
subrule. Its function is much like the subrule in URIDNSBL plugin rules,
or in the check_rbl eval rules. The main difference is that with askdns
rules there is no need to manually group rules according to their queried
zone, as the grouping is automatic and duplicate queries are implicitly
eliminated.
The subrule filtering parameter can be: a plain string, a regular expression,
a single numerical value or a pair of numerical values, or a list of rcodes
(DNS status codes of a response). Absence of the filtering parameter implies
no filtering, i.e. any positive DNS response (rcode=NOERROR) of the requested
RR type will result in a rule hit, regardless of the RR value returned with
the response.
When a plain string is used as a filter, it must be enclosed in single or
double quotes. For the rule to hit, the response must match the filtering
string exactly, and a RR type of a response must match the query type.
Typical use is an exact text string for TXT queries, or an exact quad-dotted
IPv4 address. In case of a TXT or SPF resource record which can return
multiple character-strings (as defined in Section 3.3 of [RFC1035]), these
strings are concatenated with no delimiters before comparing the result
to the filtering string. This follows requirements of several documents,
such as RFC 5518, RFC 7208, RFC 4871, RFC 5617. Examples of a plain text
filtering parameter: "127.0.0.1", "transaction", 'list' .
A regular expression follows a familiar perl syntax like /.../ or m{...}
optionally followed by regexp flags (such as 'i' for case-insensitivity).
If a DNS response matches the requested RR type and the regular expression,
the rule hits. Examples: /^127\.0\.0\.\d+$/, m{\bdial up\b}i .
A single numerical value can be a decimal number, or a hexadecimal number
prefixed by 0x. Such numeric filtering expression is typically used with
RR type-A DNS queries. The returned value (an IPv4 address) is masked
with a specified filtering value and tested to fall within a 127.0.0.0/8
network range - the rule hits if the result is nonzero:
((r & n) != 0) && ((r & 0xff000000) == 0x7f000000). An example: 0x10 .
A pair of numerical values (each a decimal, hexadecimal or quad-dotted)
delimited by a '-' specifies an IPv4 address range, and a pair of values
delimited by a '/' specifies an IPv4 address followed by a bitmask. Again,
this type of filtering expression is primarily intended with RR type-A
DNS queries. The rule hits if the RR type matches, and the returned IP
address falls within the specified range: (r >= n1 && r <= n2), or
masked with a bitmask matches the specified value: (r & m) == (n & m) .
As a shorthand notation, a single quad-dotted value is equivalent to
a n-n form, i.e. it must match the returned value exactly with all its bits.
Some typical examples of a numeric filtering parameter are: 127.0.1.2,
127.0.1.20-127.0.1.39, 127.0.1.0/255.255.255.0, 0.0.0.16/0.0.0.16,
0x10/0x10, 16, 0x10 .
Lastly, the filtering parameter can be a comma-separated list of DNS status
codes (rcode), enclosed in square brackets. Rcodes can be represented either
by their numeric decimal values (0=NOERROR, 3=NXDOMAIN, ...), or their names.
See https://www.iana.org/assignments/dns-parameters for the list of names. When
testing for a rcode where rcode is nonzero, a RR type parameter is ignored
as a filter, as there is typically no answer section in a DNS reply when
rcode indicates an error. Example: [NXDOMAIN], or [FormErr,ServFail,4,5] .
=back
=cut
package Mail::SpamAssassin::Plugin::AskDNS;
use strict;
use warnings;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Util qw(decode_dns_question_entry);
use Mail::SpamAssassin::Logger;
use version 0.77;
our @ISA = qw(Mail::SpamAssassin::Plugin);
our %rcode_value = ( # https://www.iana.org/assignments/dns-parameters, RFC 6195
NOERROR => 0, FORMERR => 1, SERVFAIL => 2, NXDOMAIN => 3, NOTIMP => 4,
REFUSED => 5, YXDOMAIN => 6, YXRRSET => 7, NXRRSET => 8, NOTAUTH => 9,
NOTZONE => 10, BADVERS => 16, BADSIG => 16, BADKEY => 17, BADTIME => 18,
BADMODE => 19, BADNAME => 20, BADALG => 21, BADTRUNC => 22,
);
our $txtdata_can_provide_a_list;
sub new {
my($class,$sa_main) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($sa_main);
bless($self, $class);
$self->set_config($sa_main->{conf});
#$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
#more robust version check from Damyan Ivanov - Bug 7095
$txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
return $self;
}
# ---------------------------------------------------------------------------
# Accepts argument as a string in single or double quotes, or as a regular
# expression in // or m{} notation, or as a numerical value or a pair of
# numerical values, or as a bracketed and comma-separated list of DNS rcode
# names or their numerical codes. Recognized numerical forms are: m, n1-n2,
# or n/m, where n,n1,n2,m can be any of: decimal digits, 0x followed by
# up to 8 hexadecimal digits, or an IPv4 address in quad-dotted notation.
# The argument is checked for syntax, undef is returned on syntax errors.
# A string that looks like a regular expression is converted to a compiled
# Regexp object and returned as a result. Otherwise, numeric components of
# the remaining three forms are converted as follows: hex or decimal numeric
# strings are converted to a number and a quad-dot is converted to a number,
# then components are reassembled into a string delimited by '-' or '/'.
# As a special backward compatibility measure, a single quad-dot (with no
# second number) is converted into n-n, to distinguish it from a traditional
# mask-only form. A list or rcodes is returned as a hashref, where keys
# represent specified numerical rcodes.
#
# Arguments like the following are anticipated:
# "127.0.0.1", "some text", 'some "more" text',
# /regexp/flags, m{regexp}flags,
# 127.0.1.2 (same as 127.0.1.2-127.0.1.2 or 127.0.1.2/255.255.255.255)
# 127.0.1.20-127.0.1.39 (= 0x7f000114-0x7f000127 or 2130706708-2130706727)
# 0.0.0.16/0.0.0.16 (same as 0x10/0x10 or 16/0x10 or 16/16)
# 16 (traditional style mask-only, same as 0x10)
# [NXDOMAIN], [FormErr,ServFail,4,5]
#
sub parse_and_canonicalize_subtest {
my($subtest) = @_;
my $result;
local($1,$2,$3);
# modifiers /a, /d, /l, /u in suffix form were added with perl 5.13.10 (5.14)
# currently known modifiers are [msixoadlu], but let's not be too picky here
if ( $subtest =~ m{^ / (.+) / ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \( (.+) \) ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \[ (.+) \] ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \{ (.+) \} ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* < (.+) > ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* (\S) (.+) \1 ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ (["']) (.*) \1 \z}xs) { # quoted string
$result = $2;
} elsif ($subtest =~ m{^ \[ ( (?:[A-Z]+|\d+)
(?: \s* , \s* (?:[A-Z]+|\d+) )* ) \] \z}xis) {
# a comma-separated list of rcode names or their decimal values
my @rcodes = split(/\s*,\s*/, uc $1);
for (@rcodes) { $_ = $rcode_value{$_} if exists $rcode_value{$_} }
return if grep(!/^\d+\z/, @rcodes);
# a hashref indicates a list of DNS rcodes (stored as hash keys)
$result = { map( ($_,1), @rcodes) };
} elsif ($subtest =~ m{^ ([^/-]+) (?: ([/-]) (.+) )? \z}xs) {
my($n1,$delim,$n2) = ($1,$2,$3);
my $any_quad_dot;
for ($n1,$n2) {
if (!defined $_) {
# ok, $n2 may not exist
} elsif (/^\d{1,10}\z/) {
$_ = 0 + $_; # decimal string -> number
} elsif (/^0x[0-9a-zA-Z]{1,8}\z/) {
$_ = hex($_); # hex string -> number
} elsif (/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/) {
$_ = Mail::SpamAssassin::Util::my_inet_aton($_); # quad-dot -> number
$any_quad_dot = 1;
} else {
return;
}
}
$result = defined $n2 ? $n1.$delim.$n2
: $any_quad_dot ? $n1.'-'.$n1 : "$n1";
}
return $result;
}
sub set_config {
my($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'askdns',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my($self, $key, $value, $line) = @_;
local($1,$2,$3,$4);
if (!defined $value || $value =~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
} elsif ($value !~ /^ (\S+) \s+ (\S+)
(?: \s+ ([A-Za-z0-9,]+)
(?: \s+ (.*?) )? )? \s* $/xs) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
} else {
my($rulename,$query_template,$query_type,$subtest) = ($1,$2,$3,$4);
$query_type = 'A' if !defined $query_type;
$query_type = uc $query_type;
my @answer_types = split(/,/, $query_type);
# https://www.iana.org/assignments/dns-parameters/dns-parameters.xml
if (grep(!/^(?:ANY|A|AAAA|MX|TXT|PTR|NAPTR|NS|SOA|CERT|CNAME|DNAME|
DHCID|HINFO|MINFO|RP|HIP|IPSECKEY|KX|LOC|SRV|
SSHFP|SPF)\z/x, @answer_types)) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$query_type = 'ANY' if @answer_types > 1 || $answer_types[0] eq 'ANY';
if (defined $subtest) {
$subtest = parse_and_canonicalize_subtest($subtest);
defined $subtest or return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
# collect tag names as used in each query template
my @tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/g;
my %seen; @tags = grep(!$seen{$_}++, @tags); # filter out duplicates
# group rules by tag names used in them (to be used as a hash key)
my $depends_on_tags = !@tags ? '' : join(',',@tags);
# subgroup rules by a DNS RR type and a nonexpanded query template
my $query_template_key = $query_type . ':' . $query_template;
$self->{askdns}{$depends_on_tags}{$query_template_key} ||=
{ query => $query_template, rules => {}, q_type => $query_type,
a_types => # optimization: undef means "same as q_type"
@answer_types == 1 && $answer_types[0] eq $query_type ? undef
: \@answer_types };
$self->{askdns}{$depends_on_tags}{$query_template_key}{rules}{$rulename}
= $subtest;
# dbg("askdns: rule: %s, config dep: %s, domkey: %s, subtest: %s",
# $rulename, $depends_on_tags, $query_template_key, $subtest);
# just define the test so that scores and lint works
$self->{parser}->add_test($rulename, undef,
$Mail::SpamAssassin::Conf::TYPE_EMPTY_TESTS);
}
}
});
$conf->{parser}->register_commands(\@cmds);
}
# run as early as possible, launching DNS queries as soon as their
# dependencies are fulfilled
#
sub parsed_metadata {
my($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $pms->{conf};
return if !$pms->is_dns_available;
$pms->{askdns_map_dnskey_to_rules} = {};
# walk through all collected askdns rules, obtain tag values whenever
# they may become available, and launch DNS queries right after
#
for my $depends_on_tags (keys %{$conf->{askdns}}) {
my @tags;
@tags = split(/,/, $depends_on_tags) if $depends_on_tags ne '';
if (would_log("dbg","askdns")) {
while ( my($query_template_key, $struct) =
each %{$conf->{askdns}{$depends_on_tags}} ) {
my($query_template, $query_type, $answer_types_ref, $rules) =
@$struct{qw(query q_type a_types rules)};
dbg("askdns: depend on tags %s, rules: %s ",
$depends_on_tags, join(', ', keys %$rules));
}
}
if (!@tags) {
# no dependencies on tags, just call directly
$self->launch_queries($pms,$depends_on_tags);
} else {
# enqueue callback for tags needed
$pms->action_depends_on_tags(@tags == 1 ? $tags[0] : \@tags,
sub { my($pms,@args) = @_;
$self->launch_queries($pms,$depends_on_tags) }
);
}
}
}
# generate DNS queries - called for each set of rules
# when their tag dependencies are met
#
sub launch_queries {
my($self, $pms, $depends_on_tags) = @_;
my $conf = $pms->{conf};
my %tags;
# obtain tag/value pairs of tags we depend upon in this set of rules
if ($depends_on_tags ne '') {
%tags = map( ($_,$pms->get_tag($_)), split(/,/,$depends_on_tags) );
}
dbg("askdns: preparing queries which depend on tags: %s",
join(', ', map($_.' => '.$tags{$_}, keys %tags)));
# replace tag names in a query template with actual tag values
# and launch DNS queries
while ( my($query_template_key, $struct) =
each %{$conf->{askdns}{$depends_on_tags}} ) {
my($query_template, $query_type, $answer_types_ref, $rules) =
@$struct{qw(query q_type a_types rules)};
my @rulenames = keys %$rules;
if (grep($conf->{scores}->{$_}, @rulenames)) {
dbg("askdns: query template %s, type %s, rules: %s",
$query_template,
!$answer_types_ref ? $query_type
: $query_type.'/'.join(',',@$answer_types_ref),
join(', ', @rulenames));
} else {
dbg("askdns: query template %s, type %s, all rules disabled: %s",
$query_template, $query_type, join(', ', @rulenames));
next;
}
# collect all tag names from a template, each may occur more than once
my @templ_tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/gs;
# filter out duplicate tag names, and tags with undefined or empty value
my %seen;
@templ_tags = grep(!$seen{$_}++ && defined $tags{$_} && $tags{$_} ne '',
@templ_tags);
my %templ_vals; # values that each tag takes
for my $t (@templ_tags) {
my %seen;
# a tag value may be a space-separated list,
# store it as an arrayref, removing duplicate values
$templ_vals{$t} = [ grep(!$seen{$_}++, split(' ',$tags{$t})) ];
}
# count through all tag value tuples
my @digit = (0) x @templ_tags; # counting accumulator
OUTER:
for (;;) {
my %current_tag_val; # maps a tag name to its current iteration value
for my $j (0 .. $#templ_tags) {
my $t = $templ_tags[$j];
$current_tag_val{$t} = $templ_vals{$t}[$digit[$j]];
}
local $1;
my $query_domain = $query_template;
$query_domain =~ s{_([A-Z][A-Z0-9]*)_}
{ defined $current_tag_val{$1} ? $current_tag_val{$1}
: '' }ge;
# the $dnskey identifies this query in AsyncLoop's pending_lookups
my $dnskey = join(':', 'askdns', $query_type, $query_domain);
dbg("askdns: expanded query %s, dns key %s", $query_domain, $dnskey);
if ($query_domain eq '') {
# ignore, just in case
} else {
if (!exists $pms->{askdns_map_dnskey_to_rules}{$dnskey}) {
$pms->{askdns_map_dnskey_to_rules}{$dnskey} =
[ [$query_type, $answer_types_ref, $rules] ];
} else {
push(@{$pms->{askdns_map_dnskey_to_rules}{$dnskey}},
[$query_type, $answer_types_ref, $rules] );
}
# launch a new DNS query for $query_type and $query_domain
my $ent = $pms->{async}->bgsend_and_start_lookup(
$query_domain, $query_type, undef,
{ key => $dnskey, zone => $query_domain },
sub { my ($ent2,$pkt) = @_;
$self->process_response_packet($pms, $ent2, $pkt, $dnskey) },
master_deadline => $pms->{master_deadline} );
# these rules are now underway; unless the rule hits, these will
# not be considered "finished" until harvest_dnsbl_queries() completes
$pms->register_async_rule_start($dnskey) if $ent;
}
last if !@templ_tags;
# increment accumulator, little-endian
for (my $j = 0; ; $j++) {
last if ++$digit[$j] <= $#{$templ_vals{$templ_tags[$j]}};
$digit[$j] = 0; # and carry
last OUTER if $j >= $#templ_tags;
}
}
}
}
sub process_response_packet {
my($self, $pms, $ent, $pkt, $dnskey) = @_;
my $conf = $pms->{conf};
my %rulenames_hit;
# map a dnskey back to info on queries which caused this DNS lookup
my $queries_ref = $pms->{askdns_map_dnskey_to_rules}{$dnskey};
my($header, @question, @answer, $qtype, $rcode);
# NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
if ($pkt) {
@answer = $pkt->answer;
$header = $pkt->header;
@question = $pkt->question;
$qtype = uc $question[0]->qtype if @question;
$rcode = uc $header->rcode if $header; # 'NOERROR', 'NXDOMAIN', ...
# NOTE: qname is encoded in RFC 1035 zone format, decode it
dbg("askdns: answer received, rcode %s, query %s, answer has %d records",
$rcode,
join(', ', map(join('/', decode_dns_question_entry($_)), @question)),
scalar @answer);
if (defined $rcode && exists $rcode_value{$rcode}) {
# Net::DNS return a rcode name for codes it knows about,
# and returns a number for the rest; we deal with numbers from here on
$rcode = $rcode_value{$rcode} if exists $rcode_value{$rcode};
}
}
if (!@answer) {
# a trick to make the following loop run at least once, so that we can
# evaluate also rules which only care for rcode status
@answer = ( undef );
}
# NOTE: $rr->rdstring returns the result encoded in a DNS zone file
# format, i.e. enclosed in double quotes if a result contains whitespace
# (or other funny characters), and may use \DDD encoding or \X quoting as
# per RFC 1035. Using $rr->txtdata instead avoids this unnecessary encoding
# step and a need for decoding by a caller, returning an unmodified string.
# Caveat: in case of multiple RDATA <character-string> fields contained
# in a resource record (TXT, SPF, HINFO), starting with Net::DNS 0.69
# the $rr->txtdata in a list context returns these strings as a list.
# The $rr->txtdata in a scalar context always returns a single string
# with <character-string> fields joined by a single space character as
# a separator. The $rr->txtdata in Net::DNS 0.68 and older returned
# such joined space-separated string even in a list context.
# RFC 5518: If the RDATA in a TXT record contains multiple
# character-strings (as defined in Section 3.3 of [RFC1035]),
# the code handling such reply from DNS MUST assemble all of these
# marshaled text blocks into a single one before any syntactical
# verification takes place.
# The same goes for RFC 4408 (SPF), RFC 4871 (DKIM), RFC 5617 (ADSP),
# draft-kucherawy-dmarc-base (DMARC), ...
for my $rr (@answer) {
my($rr_rdatastr, $rdatanum, $rr_type);
if (!$rr) {
# special case, no answer records, only rcode can be tested
} else {
$rr_type = uc $rr->type;
if ($rr_type eq 'A') {
# Net::DNS::RR::A::address() is available since Net::DNS 0.69
$rr_rdatastr = $rr->UNIVERSAL::can('address') ? $rr->address
: $rr->rdatastr;
if ($rr_rdatastr =~ m/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/) {
$rdatanum = Mail::SpamAssassin::Util::my_inet_aton($rr_rdatastr);
}
} elsif ($rr->UNIVERSAL::can('txtdata')) {
# TXT, SPF: join with no intervening spaces, as per RFC 5518
if ($txtdata_can_provide_a_list || $rr_type ne 'TXT') {
$rr_rdatastr = join('', $rr->txtdata); # txtdata() in list context!
} else { # char_str_list() is only available for TXT records
$rr_rdatastr = join('', $rr->char_str_list); # historical
}
} else {
# rdatastr() is historical, use rdstring() since Net::DNS 0.69
$rr_rdatastr = $rr->UNIVERSAL::can('rdstring') ? $rr->rdstring
: $rr->rdatastr;
utf8::encode($rr_rdatastr) if utf8::is_utf8($rr_rdatastr);
}
# dbg("askdns: received rr type %s, data: %s", $rr_type, $rr_rdatastr);
}
my $j = 0;
for my $q_tuple (!ref $queries_ref ? () : @$queries_ref) {
next if !$q_tuple;
my($query_type, $answer_types_ref, $rules) = @$q_tuple;
next if !defined $qtype || $query_type ne $qtype;
$answer_types_ref = [$query_type] if !defined $answer_types_ref;
# mark rule as done
$pms->{askdns_map_dnskey_to_rules}{$dnskey}[$j++] = undef;
while (my($rulename,$subtest) = each %$rules) {
my $match;
local($1,$2,$3);
if (ref $subtest eq 'HASH') { # a list of DNS rcodes (as hash keys)
$match = 1 if $subtest->{$rcode};
} elsif ($rcode != 0) {
# skip remaining tests on DNS error
} elsif (!defined($rr_type) ||
!grep($_ eq 'ANY' || $_ eq $rr_type, @$answer_types_ref) ) {
# skip remaining tests on wrong RR type
} elsif (!defined $subtest) {
$match = 1; # any valid response of the requested RR type matches
} elsif (ref $subtest eq 'Regexp') { # a regular expression
$match = 1 if $rr_rdatastr =~ $subtest;
} elsif ($rr_rdatastr eq $subtest) { # exact equality
$match = 1;
} elsif (defined $rdatanum &&
$subtest =~ m{^ (\d+) (?: ([/-]) (\d+) )? \z}x) {
my($n1,$delim,$n2) = ($1,$2,$3);
$match =
!defined $n2 ? ($rdatanum & $n1) && # mask only
(($rdatanum & 0xff000000) == 0x7f000000) # 127/8
: $delim eq '-' ? $rdatanum >= $n1 && $rdatanum <= $n2 # range
: $delim eq '/' ? ($rdatanum & $n2) == (int($n1) & $n2) # value/mask
: 0; # notice int($n1) to fix perl ~5.14 taint bug (Bug 7725)
}
if ($match) {
$self->askdns_hit($pms, $ent->{query_domain}, $qtype,
$rr_rdatastr, $rulename);
$rulenames_hit{$rulename} = 1;
}
}
}
}
# these rules have completed (since they got at least 1 hit)
$pms->register_async_rule_finish($_) for keys %rulenames_hit;
}
sub askdns_hit {
my($self, $pms, $query_domain, $qtype, $rr_rdatastr, $rulename) = @_;
$rr_rdatastr = '' if !defined $rr_rdatastr; # e.g. with rules testing rcode
dbg('askdns: domain "%s" listed (%s): %s',
$query_domain, $rulename, $rr_rdatastr);
# only the first hit will show in the test log report, even if
# an answer section matches more than once - got_hit() handles this
$pms->clear_test_state;
$pms->test_log(sprintf("%s %s:%s", $query_domain,$qtype,$rr_rdatastr));
$pms->got_hit($rulename, 'ASKDNS: ', ruletype => 'askdns'); # score=>$score
}
1;

View File

@ -0,0 +1,261 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::AutoLearnThreshold - threshold-based discriminator for Bayes auto-learning
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
=head1 DESCRIPTION
This plugin implements the threshold-based auto-learning discriminator
for SpamAssassin's Bayes subsystem. Auto-learning is a mechanism
whereby high-scoring mails (or low-scoring mails, for non-spam) are fed
into its learning systems without user intervention, during scanning.
Note that certain tests are ignored when determining whether a message
should be trained upon:
=over 4
=item * rules with tflags set to 'learn' (the Bayesian rules)
=item * rules with tflags set to 'userconf' (user configuration)
=item * rules with tflags set to 'noautolearn'
=back
Also note that auto-learning occurs using scores from either scoreset 0
or 1, depending on what scoreset is used during message check. It is
likely that the message check and auto-learn scores will be different.
=cut
package Mail::SpamAssassin::Plugin::AutoLearnThreshold;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub new {
my $class = shift;
my $mailsaobject = shift;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->set_config($mailsaobject->{conf});
return $self;
}
sub set_config {
my($self, $conf) = @_;
my @cmds;
=head1 USER OPTIONS
The following configuration settings are used to control auto-learning:
=over 4
=item bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
The score threshold below which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a non-spam message.
=cut
push (@cmds, {
setting => 'bayes_auto_learn_threshold_nonspam',
default => 0.1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.
Note: SpamAssassin requires at least 3 points from the header, and 3
points from the body to auto-learn as spam. Therefore, the minimum
working value for this option is 6.
If the test option autolearn_force is set, the minimum value will
remain at 6 points but there is no requirement that the points come
from body and header rules. This option is useful for autolearning
with rules that are considered to be extremely safe indicators of
the spaminess of a message.
=cut
push (@cmds, {
setting => 'bayes_auto_learn_threshold_spam',
default => 12.0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item bayes_auto_learn_on_error (0 | 1) (default: 0)
With C<bayes_auto_learn_on_error> off, autolearning will be performed
even if bayes classifier already agrees with the new classification (i.e.
yielded BAYES_00 for what we are now trying to teach it as ham, or yielded
BAYES_99 for spam). This is a traditional setting, the default was chosen
to retain backward compatibility.
With C<bayes_auto_learn_on_error> turned on, autolearning will be performed
only when a bayes classifier had a different opinion from what the autolearner
is now trying to teach it (i.e. it made an error in judgement). This strategy
may or may not produce better future classifications, but usually works
very well, while also preventing unnecessary overlearning and slows down
database growth.
=cut
push (@cmds, {
setting => 'bayes_auto_learn_on_error',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
$conf->{parser}->register_commands(\@cmds);
}
sub autolearn_discriminator {
my ($self, $params) = @_;
my $scan = $params->{permsgstatus};
my $conf = $scan->{conf};
# Figure out min/max for autolearning.
# Default to specified auto_learn_threshold settings
my $min = $conf->{bayes_auto_learn_threshold_nonspam};
my $max = $conf->{bayes_auto_learn_threshold_spam};
# Find out what score we should consider this message to have ...
my $score = $scan->get_autolearn_points();
my $body_only_points = $scan->get_body_only_points();
my $head_only_points = $scan->get_head_only_points();
my $learned_points = $scan->get_learned_points();
# find out if any of the tests added an autolearn_force status
my $force_autolearn = $scan->get_autolearn_force_status();
my $force_autolearn_names = $scan->get_autolearn_force_names();
dbg("learn: auto-learn? ham=$min, spam=$max, ".
"body-points=".$body_only_points.", ".
"head-points=".$head_only_points.", ".
"learned-points=".$learned_points);
my $isspam;
if ($score < $min) {
$isspam = 0;
} elsif ($score >= $max) {
$isspam = 1;
} else {
dbg("learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam");
return;
}
my $learner_said_ham_points = -1.0;
my $learner_said_spam_points = 1.0;
if ($isspam) {
my $required_body_points = 3;
my $required_head_points = 3;
#Set a lower threshold of "just has to be spam" if autolearn_force was set on a rule
if ($force_autolearn) {
$required_body_points = -99;
$required_head_points = -99;
dbg("learn: auto-learn: autolearn_force flagged for a rule. Removing separate body and head point threshold. Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
dbg("learn: auto-learn: autolearn_force flagged because of rule(s): $force_autolearn_names");
} else {
dbg("learn: auto-learn: autolearn_force not flagged for a rule. Body Only Points: $body_only_points ($required_body_points req'd) / Head Only Points: $head_only_points ($required_head_points req'd)");
}
if ($body_only_points < $required_body_points) {
dbg("learn: auto-learn? no: scored as spam but too few body points (".
$body_only_points." < ".$required_body_points.")");
return;
}
if ($head_only_points < $required_head_points) {
dbg("learn: auto-learn? no: scored as spam but too few head points (".
$head_only_points." < ".$required_head_points.")");
return;
}
if ($learned_points < $learner_said_ham_points) {
dbg("learn: auto-learn? no: scored as spam but learner indicated ham (".
$learned_points." < ".$learner_said_ham_points.")");
return;
}
if (!$scan->is_spam()) {
dbg("learn: auto-learn? no: scored as ham but autolearn wanted spam");
return;
}
} else {
if ($learned_points > $learner_said_spam_points) {
dbg("learn: auto-learn? no: scored as ham but learner indicated spam (".
$learned_points." > ".$learner_said_spam_points.")");
return;
}
if ($scan->is_spam()) {
dbg("learn: auto-learn? no: scored as spam but autolearn wanted ham");
return;
}
}
if ($conf->{bayes_auto_learn_on_error}) {
# learn-on-error strategy chosen:
# only allow learning if the autolearning classifier was unsure or
# had a different opinion from what we are trying to make it learn
#
my $tests = $scan->get_tag('TESTS');
if (defined $tests && $tests ne 'none') {
my %t = map { ($_,1) } split(/,/, $tests);
if ($isspam && $t{'BAYES_99'} || !$isspam && $t{'BAYES_00'}) {
dbg("learn: auto-learn? no: learn-on-error, %s, already classified ".
"as such", $isspam ? 'spam' : 'ham');
return;
}
}
}
dbg("learn: auto-learn? yes, ".($isspam?"spam ($score > $max)":"ham ($score < $min)")." autolearn_force=".($force_autolearn?"yes":"no"));
#Return an array reference because call_plugins only carry's one return value
return [$isspam, $force_autolearn, $force_autolearn_names];
}
1;
=back
=cut

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,301 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Plugin::BodyEval;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:sa);
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# the important bit!
$self->register_eval_rule("multipart_alternative_difference");
$self->register_eval_rule("multipart_alternative_difference_count");
$self->register_eval_rule("check_blank_line_ratio");
$self->register_eval_rule("tvd_vertical_words");
$self->register_eval_rule("check_stock_info");
$self->register_eval_rule("check_body_length");
return $self;
}
sub multipart_alternative_difference {
my ($self, $pms, $fulltext, $min, $max) = @_;
$self->_multipart_alternative_difference($pms) unless (exists $pms->{madiff});
if (($min == 0 || $pms->{madiff} > $min) &&
($max eq "undef" || $pms->{madiff} <= $max)) {
return 1;
}
return 0;
}
sub multipart_alternative_difference_count {
my ($self, $pms, $fulltext, $ratio, $minhtml) = @_;
$self->_multipart_alternative_difference($pms) unless (exists $pms->{madiff});
return 0 unless $pms->{madiff_html} > $minhtml;
return(($pms->{madiff_text} / $pms->{madiff_html}) > $ratio);
}
sub _multipart_alternative_difference {
my ($self, $pms) = @_;
$pms->{madiff} = 0;
$pms->{madiff_html} = 0;
$pms->{madiff_text} = 0;
my $msg = $pms->{msg};
# Find all multipart/alternative parts in the message
my @ma = $msg->find_parts(qr@^multipart/alternative\b@i);
# If there are no multipart/alternative sections, skip this test.
return if (!@ma);
# Figure out what the MIME content of the message looks like
my @content = $msg->content_summary();
# Exchange meeting requests come in as m/a text/html text/calendar,
# which we want to ignore because of the high FP rate it would cause.
#
if (@content == 3 && $content[2] eq 'text/calendar' &&
$content[1] eq 'text/html' &&
$content[0] eq 'multipart/alternative') {
return;
}
# Go through each of the multipart parts
foreach my $part (@ma) {
my %html;
my %text;
# limit our search to text-based parts
my @txt = $part->find_parts(qr@^text\b@i);
foreach my $text (@txt) {
# we only care about the rendered version of the part
my ($type, $rnd) = $text->rendered();
next unless defined $type;
# parse the rendered text into tokens. assume they are whitespace
# separated, and ignore anything that doesn't have a word-character
# in it (0-9a-zA-Z_) since those are probably things like bullet
# points, horizontal lines, etc. this assumes that punctuation
# in one part will be the same in other parts.
#
if ($type eq 'text/html') {
foreach my $w (grep(/\w/, split(/\s+/, $rnd))) {
#dbg("eval: HTML: $w");
$html{$w}++;
}
# If there are no words, mark if there's at least 1 image ...
if (!%html && exists $pms->{html}{inside}{img}) {
# Use "\n" as the mark since it can't ever occur normally
$html{"\n"}=1;
}
}
else {
foreach my $w (grep(/\w/, split(/\s+/, $rnd))) {
#dbg("eval: TEXT: $w");
$text{$w}++;
}
}
}
# How many HTML tokens do we have at the start?
my $orig = keys %html;
next if ($orig == 0);
$pms->{madiff_html} = $orig;
$pms->{madiff_text} = keys %text;
dbg('eval: text words: ' . $pms->{madiff_text} . ', html words: ' . $pms->{madiff_html});
# If the token appears at least as many times in the text part as
# in the html part, remove it from the list of html tokens.
while(my ($k,$v) = each %text) {
delete $html{$k} if (exists $html{$k} && $html{$k}-$text{$k} < 1);
}
#map { dbg("eval: LEFT: $_") } keys %html;
# In theory, the tokens should be the same in both text and html
# parts, so there would be 0 tokens left in the html token list, for
# a 0% difference rate. Calculate it here, and record the difference
# if it's been the highest so far in this message.
my $diff = scalar(keys %html)/$orig*100;
$pms->{madiff} = $diff if ($diff > $pms->{madiff});
dbg("eval: " . sprintf "madiff: left: %d, orig: %d, max-difference: %0.2f%%", scalar(keys %html), $orig, $pms->{madiff});
}
return;
}
sub check_blank_line_ratio {
my ($self, $pms, $fulltext, $min, $max, $minlines) = @_;
if (!defined $minlines || $minlines < 1) {
$minlines = 1;
}
my $blank_line_ratio_ref = $pms->{blank_line_ratio};
if (! exists $blank_line_ratio_ref->{$minlines}) {
$fulltext = $pms->get_decoded_body_text_array();
my $blank = 0;
my $nlines = 0;
foreach my $chunk (@$fulltext) {
foreach (split(/^/m, $chunk, -1)) {
$nlines++;
$blank++ if !/\S/;
}
}
# report -1 if it's a blank message ...
$blank_line_ratio_ref->{$minlines} =
$nlines < $minlines ? -1 : 100 * $blank / $nlines;
}
return (($min == 0 && $blank_line_ratio_ref->{$minlines} <= $max) ||
($blank_line_ratio_ref->{$minlines} > $min &&
$blank_line_ratio_ref->{$minlines} <= $max));
}
sub tvd_vertical_words {
my ($self, $pms, $text, $min, $max) = @_;
# klugy
$max = 101 if ($max >= 100);
if (!defined $pms->{tvd_vertical_words}) {
$pms->{tvd_vertical_words} = -1;
foreach (@{$text}) {
my $l = length $_;
next unless ($l > 5);
my $spaces = tr/ / /;
my $nonspaces = $l - $spaces;
my $pct;
if ($spaces > $nonspaces || $nonspaces == 0) {
$pct = 100;
}
else {
$pct = int(100*$spaces/$nonspaces);
}
$pms->{tvd_vertical_words} = $pct if ($pct > $pms->{tvd_vertical_words});
}
}
dbg("eval: tvd_vertical_words value: $pms->{tvd_vertical_words} / min: $min / max: $max - value must be >= min and < max");
return 1 if ($pms->{tvd_vertical_words} >= $min && $pms->{tvd_vertical_words} < $max);
}
sub check_stock_info {
my ($self, $pms, $fulltext, $min) = @_;
$self->_check_stock_info($pms) unless (exists $pms->{stock_info});
if ($min == 0 || $pms->{stock_info} >= $min) {
return 1;
}
return 0;
}
sub _check_stock_info {
my ($self, $pms) = @_;
$pms->{stock_info} = 0;
# Find all multipart/alternative parts in the message
my @parts = $pms->{msg}->find_parts(qr@^text/plain$@i);
return if (!@parts);
# Go through each of the multipart parts
my %hits;
my $part = $parts[0];
my ($type, $rnd) = $part->rendered();
return unless $type;
# bug 5644,5717: avoid pathological cases where a regexp takes massive amount
# of time by applying the regexp to limited-size text chunks, one at a time
foreach my $rnd_chunk (
Mail::SpamAssassin::Message::split_into_array_of_short_paragraphs($rnd))
{
foreach ( $rnd_chunk =~ /^\s*([^:\s][^:\n]{2,29})\s*:\s*\S/mg ) {
my $str = lc $_;
$str =~ tr/a-z//cd;
#$str =~ s/([a-z])0([a-z])/$1o$2/g;
if ($str =~ /(
^trad(?:e|ing)date|
company(?:name)?|
s\w?(?:t\w?o\w?c\w?k|y\w?m(?:\w?b\w?o\w?l)?)|
t(?:arget|icker)|
(?:opening|current)p(?:rice)?|
p(?:rojected|osition)|
expectations|
weeks?high|
marketperformance|
(?:year|week|month|day|price)(?:target|estimates?)|
sector|
r(?:ecommendation|ating)
)$/x) {
$hits{$1}++;
dbg("eval: stock info hit: $1");
}
}
}
$pms->{stock_info} = scalar keys %hits;
dbg("eval: stock info total: ".$pms->{stock_info});
return;
}
sub check_body_length {
my ($self, $pms, undef, $min) = @_;
my $body_length = $pms->{msg}->{pristine_body_length};
dbg("eval: body_length - %s - check for min of %s", $body_length, $min);
return (defined $body_length && $body_length <= $min) ? 1 : 0;
}
# ---------------------------------------------------------------------------
# capability checks for "if can()":
#
sub has_check_body_length { 1 }
1;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,662 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
DNSEVAL - look up URLs against DNS blocklists
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::DNSEval
rbl_headers EnvelopeFrom,Reply-To,Disposition-Notification-To
header RBL_IP eval:check_rbl_headers('rbl', 'rbl.example.com.', '127.0.0.2')
describe RBL_IP From address associated with spam domains
tflags RBL_IP net
reuse RBL_IP
Supported extra tflags from SpamAssassin 3.4.3:
domains_only - only non-IP-address "host" components are queried
ips_only - only IP addresses as the "host" component will be queried
=head1 DESCRIPTION
The DNSEval plugin queries dns to see if a domain or an ip address
present on one of email's headers is on a particular rbl.
=cut
package Mail::SpamAssassin::Plugin::DNSEval;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Util qw(reverse_ip_address);
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $IP_ADDRESS = IP_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# this is done this way so that the same list can be used here and in
# check_start()
$self->{'evalrules'} = [
'check_rbl_accreditor',
'check_rbl',
'check_rbl_ns_from',
'check_rbl_txt',
'check_rbl_sub',
'check_rbl_results_for',
'check_rbl_from_host',
'check_rbl_from_domain',
'check_rbl_envfrom',
'check_rbl_headers',
'check_rbl_rcvd',
'check_dns_sender',
];
$self->set_config($mailsaobject->{conf});
foreach(@{$self->{'evalrules'}}) {
$self->register_eval_rule($_);
}
return $self;
}
=head1 USER PREFERENCES
The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
SpamAssassin handles incoming email messages.
=over
=item rbl_headers
This option tells SpamAssassin in which headers to check for content
used to query the specified rbl.
If on the headers content there is an email address, an ip address
or a domain name, it will be checked on the specified rbl.
The configuration option can be overridden by passing an headers list as
last parameter to check_rbl_headers.
The default headers checked are:
=back
=over
=item *
EnvelopeFrom
=item *
Reply-To
=item *
Disposition-Notification-To
=item *
X-WebmailclientIP
=item *
X-Source-IP
=back
=cut
sub set_config {
my ($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'rbl_headers',
default => 'EnvelopeFrom,Reply-To,Disposition-Notification-To,X-WebmailclientIP,X-Source-IP',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
}
);
$conf->{parser}->register_commands(\@cmds);
}
# this is necessary because PMS::run_rbl_eval_tests() calls these functions
# directly as part of PMS
sub check_start {
my ($self, $opts) = @_;
foreach(@{$self->{'evalrules'}}) {
$opts->{'permsgstatus'}->register_plugin_eval_glue($_);
}
}
sub ip_list_uniq_and_strip_private {
my ($self, @origips) = @_;
my @ips;
my %seen;
my $IP_PRIVATE = IP_PRIVATE;
foreach my $ip (@origips) {
next unless $ip;
next if (exists ($seen{$ip})); $seen{$ip} = 1;
next if ($ip =~ /$IP_PRIVATE/o);
push(@ips, $ip);
}
return @ips;
}
# check an RBL if the message contains an "accreditor assertion,"
# that is, the message contains the name of a service that will vouch
# for their practices.
#
sub check_rbl_accreditor {
my ($self, $pms, $rule, $set, $rbl_server, $subtest, $accreditor) = @_;
if (!defined $pms->{accreditor_tag}) {
$self->message_accreditor_tag($pms);
}
if ($pms->{accreditor_tag}->{$accreditor}) {
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
}
return 0;
}
# Check for an Accreditor Assertion within the message, that is, the name of
# a third-party who will vouch for the sender's practices. The accreditor
# can be asserted in the EnvelopeFrom like this:
#
# listowner@a--accreditor.mail.example.com
#
# or in an 'Accreditor" Header field, like this:
#
# Accreditor: accreditor1, parm=value; accreditor2, parm-value
#
# This implementation supports multiple accreditors, but ignores any
# parameters in the header field.
#
sub message_accreditor_tag {
my ($self, $pms) = @_;
my %acctags;
if ($pms->get('EnvelopeFrom:addr') =~ /[@.]a--([a-z0-9]{3,})\./i) {
(my $tag = $1) =~ tr/A-Z/a-z/;
$acctags{$tag} = -1;
}
my $accreditor_field = $pms->get('Accreditor',undef);
if (defined $accreditor_field) {
my @accreditors = split(/,/, $accreditor_field);
foreach my $accreditor (@accreditors) {
my @terms = split(' ', $accreditor);
if ($#terms >= 0) {
my $tag = $terms[0];
$tag =~ tr/A-Z/a-z/;
$acctags{$tag} = -1;
}
}
}
$pms->{accreditor_tag} = \%acctags;
}
sub check_rbl_backend {
my ($self, $pms, $rule, $set, $rbl_server, $type, $subtest) = @_;
local ($_);
# First check that DNS is available, if not do not perform this check
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
(index($rbl_server, '.') >= 0) &&
($rbl_server !~ /\.$/)) {
$rbl_server .= ".";
}
dbg("dns: checking RBL $rbl_server, set $set");
# ok, make a list of all the IPs in the untrusted set
my @fullips = map { $_->{ip} } @{$pms->{relays_untrusted}};
# now, make a list of all the IPs in the external set, for use in
# notfirsthop testing. This will often be more IPs than found
# in @fullips. It includes the IPs that are trusted, but
# not in internal_networks.
my @fullexternal = map {
(!$_->{internal}) ? ($_->{ip}) : ()
} @{$pms->{relays_trusted}};
push (@fullexternal, @fullips); # add untrusted set too
# Make sure a header significantly improves results before adding here
# X-Sender-Ip: could be worth using (very low occurence for me)
# X-Sender: has a very low bang-for-buck for me
my $IP_ADDRESS = IP_ADDRESS;
my @originating;
for my $header (@{$pms->{conf}->{originating_ip_headers}}) {
my $str = $pms->get($header,undef);
next unless defined $str && $str ne '';
push (@originating, ($str =~ m/($IP_ADDRESS)/g));
}
# Let's go ahead and trim away all private ips (KLC)
# also uniq the list and strip dups. (jm)
my @ips = $self->ip_list_uniq_and_strip_private(@fullips);
# if there's no untrusted IPs, it means we trust all the open-internet
# relays, so we can return right now.
return 0 unless (scalar @ips + scalar @originating > 0);
dbg("dns: IPs found: full-external: ".join(", ", @fullexternal).
" untrusted: ".join(", ", @ips).
" originating: ".join(", ", @originating));
my $trusted = $self->{main}->{conf}->{trusted_networks};
# If name is foo-notfirsthop, check all addresses except for
# the originating one. Suitable for use with dialup lists, like the PDL.
# note that if there's only 1 IP in the untrusted set, do NOT pop the
# list, since it'd remove that one, and a legit user is supposed to
# use their SMTP server (ie. have at least 1 more hop)!
# If name is foo-lastexternal, check only the Received header just before
# it enters our internal networks; we can trust it and it's the one that
# passed mail between networks
if ($set =~ /-(notfirsthop|lastexternal)$/)
{
# use the external IP set, instead of the trusted set; the user may have
# specified some third-party relays as trusted. Also, don't use
# @originating; those headers are added by a phase of relaying through
# a server like Hotmail, which is not going to be in dialup lists anyway.
@ips = $self->ip_list_uniq_and_strip_private(@fullexternal);
if ($1 eq "lastexternal") {
@ips = (defined $ips[0]) ? ($ips[0]) : ();
} else {
pop @ips if (scalar @ips > 1);
}
}
# If name is foo-firsttrusted, check only the Received header just
# after it enters our trusted networks; that's the only one we can
# trust the IP address from (since our relay added that header).
# And if name is foo-untrusted, check any untrusted IP address.
elsif ($set =~ /-(first|un)trusted$/)
{
my @tips;
foreach my $ip (@originating) {
if ($ip && !$trusted->contains_ip($ip)) {
push(@tips, $ip);
}
}
@ips = $self->ip_list_uniq_and_strip_private (@ips, @tips);
if ($1 eq "first") {
@ips = (defined $ips[0]) ? ($ips[0]) : ();
} else {
shift @ips;
}
}
else
{
my @tips;
foreach my $ip (@originating) {
if ($ip && !$trusted->contains_ip($ip)) {
push(@tips, $ip);
}
}
# add originating IPs as untrusted IPs (if they are untrusted)
@ips = reverse $self->ip_list_uniq_and_strip_private (@ips, @tips);
}
# How many IPs max you check in the received lines
my $checklast=$self->{main}->{conf}->{num_check_received};
if (scalar @ips > $checklast) {
splice (@ips, $checklast); # remove all others
}
my $tflags = $pms->{conf}->{tflags}->{$rule};
# Trusted relays should only be checked against nice rules (dnswls)
if (defined $tflags && $tflags !~ /\bnice\b/) {
# remove trusted hosts from beginning
while (@ips && $trusted->contains_ip($ips[0])) { shift @ips }
}
unless (scalar @ips > 0) {
dbg("dns: no untrusted IPs to check");
return 0;
}
dbg("dns: only inspecting the following IPs: ".join(", ", @ips));
eval {
foreach my $ip (@ips) {
my $revip = reverse_ip_address($ip);
$pms->do_rbl_lookup($rule, $set, $type,
$revip.'.'.$rbl_server, $subtest) if defined $revip;
}
};
# note that results are not handled here, hits are handled directly
# as DNS responses are harvested
return 0;
}
sub check_rbl {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
}
sub check_rbl_txt {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'TXT', $subtest);
}
# run for first message
sub check_rbl_sub {
my ($self, $pms, $rule, $set, $subtest) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
$pms->register_rbl_subtest($rule, $set, $subtest);
}
# backward compatibility
sub check_rbl_results_for {
#warn "dns: check_rbl_results_for() is deprecated, use check_rbl_sub()\n";
check_rbl_sub(@_);
}
# this only checks the address host name and not the domain name because
# using the domain name had much worse results for dsn.rfc-ignorant.org
sub check_rbl_from_host {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $_[1]->all_from_addrs());
}
sub check_rbl_headers {
my ($self, $pms, $rule, $set, $rbl_server, $subtest, $test_headers) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
my @env_hdr;
my $conf = $self->{main}->{conf};
if ( defined $test_headers ) {
@env_hdr = split(/,/, $test_headers);
} else {
@env_hdr = split(/,/, $conf->{rbl_headers});
}
foreach my $rbl_headers (@env_hdr) {
my $addr = $_[1]->get($rbl_headers.':addr', undef);
if ( defined $addr && $addr =~ /\@([^\@\s]+)/ ) {
$self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
$subtest, $addr);
} else {
my $host = $pms->get($rbl_headers);
chomp($host);
if($host =~ /^$IP_ADDRESS$/ ) {
return if ($conf->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
return if ($conf->{tflags}->{$rule}||'') =~ /\bips_only\b/;
}
$pms->do_rbl_lookup($rule, $set, 'A',
"$host.$rbl_server", $subtest) if ( defined $host and $host ne "");
}
}
}
=over 4
=item check_rbl_from_domain
This checks all the from addrs domain names as an alternate to check_rbl_from_host. As of v3.4.1, it has been improved to include a subtest for a specific octet.
=back
=cut
sub check_rbl_from_domain {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $_[1]->all_from_addrs_domains());
}
=over 4
=item check_rbl_ns_from
This checks the dns server of the from addrs domain name.
It is possible to include a subtest for a specific octet.
=back
=cut
sub check_rbl_ns_from {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
my $domain;
my @nshost = ();
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
for my $from ($pms->get('EnvelopeFrom:addr')) {
next unless defined $from;
$from =~ tr/././s; # bug 3366
if ($from =~ m/ \@ ( [^\@\s]+ \. [^\@\s]+ )/x ) {
$domain = lc($1);
last;
}
}
return 0 unless defined $domain;
dbg("dns: checking NS for host $domain");
my $key = "NS:" . $domain;
my $obj = { dom => $domain, rule => $rule, set => $set, rbl_server => $rbl_server, subtest => $subtest };
my $ent = {
key => $key, zone => $domain, obj => $obj, type => "URI-NS",
};
# dig $dom ns
$ent = $pms->{async}->bgsend_and_start_lookup(
$domain, 'NS', undef, $ent,
sub { my ($ent2,$pkt) = @_;
$self->complete_ns_lookup($pms, $ent2, $pkt, $domain) },
master_deadline => $pms->{master_deadline} );
return $ent;
}
sub complete_ns_lookup {
my ($self, $pms, $ent, $pkt, $host) = @_;
my $rule = $ent->{obj}->{rule};
my $set = $ent->{obj}->{set};
my $rbl_server = $ent->{obj}->{rbl_server};
my $subtest = $ent->{obj}->{subtest};
if (!$pkt) {
# $pkt will be undef if the DNS query was aborted (e.g. timed out)
dbg("DNSEval: complete_ns_lookup aborted %s", $ent->{key});
return;
}
dbg("DNSEval: complete_ns_lookup %s", $ent->{key});
my @ns = $pkt->authority;
foreach my $rr (@ns) {
my $nshost = $rr->mname;
if(defined($nshost)) {
chomp($nshost);
if ( defined $subtest ) {
dbg("dns: checking [$nshost] / $rule / $set / $rbl_server / $subtest");
} else {
dbg("dns: checking [$nshost] / $rule / $set / $rbl_server");
}
$pms->do_rbl_lookup($rule, $set, 'A',
"$nshost.$rbl_server", $subtest) if ( defined $nshost and $nshost ne "");
}
}
}
=over 4
=item check_rbl_rcvd
This checks all received headers domains or ip addresses against a specific rbl.
It is possible to include a subtest for a specific octet.
=back
=cut
sub check_rbl_rcvd {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
my %seen;
my @udnsrcvd = ();
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
my $rcvd = $pms->{relays_untrusted}->[$pms->{num_relays_untrusted} - 1];
my @dnsrcvd = ( $rcvd->{ip}, $rcvd->{by}, $rcvd->{helo}, $rcvd->{rdns} );
# unique values
foreach my $value (@dnsrcvd) {
if ( ( defined $value ) && (! $seen{$value}++ ) ) {
push @udnsrcvd, $value;
}
}
foreach my $host ( @udnsrcvd ) {
if((defined $host) and ($host ne "")) {
chomp($host);
if($host =~ /^$IP_ADDRESS$/ ) {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bips_only\b/;
$host =~ s/\.$//;
}
if ( defined $subtest ) {
dbg("dns: checking [$host] / $rule / $set / $rbl_server / $subtest");
} else {
dbg("dns: checking [$host] / $rule / $set / $rbl_server");
}
$pms->do_rbl_lookup($rule, $set, 'A',
"$host.$rbl_server", $subtest) if ( defined $host and $host ne "");
}
}
return 0;
}
# this only checks the address host name and not the domain name because
# using the domain name had much worse results for dsn.rfc-ignorant.org
sub check_rbl_envfrom {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $_[1]->get('EnvelopeFrom:addr',undef));
}
sub _check_rbl_addresses {
my ($self, $pms, $rule, $set, $rbl_server, $subtest, @addresses) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
my %hosts;
for (@addresses) {
next if !defined($_) || !/ \@ ( [^\@\s]+ )/x;
my $address = $1;
# strip leading & trailing dots (as seen in some e-mail addresses)
$address =~ s/^\.+//; $address =~ s/\.+\z//;
# squash duplicate dots to avoid an invalid DNS query with a null label
$address =~ tr/.//s;
$hosts{lc($address)} = 1 if $address =~ /\./; # must by a FQDN
}
return unless scalar keys %hosts;
if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
(index($rbl_server, '.') >= 0) &&
($rbl_server !~ /\.$/)) {
$rbl_server .= ".";
}
for my $host (keys %hosts) {
if ($host =~ /^$IP_ADDRESS$/) {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bips_only\b/;
}
dbg("dns: checking [$host] / $rule / $set / $rbl_server");
$pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
}
}
sub check_dns_sender {
my ($self, $pms, $rule) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
my $host;
for my $from ($pms->get('EnvelopeFrom:addr',undef)) {
next unless defined $from;
$from =~ tr/././s; # bug 3366
if ($from =~ m/ \@ ( [^\@\s]+ \. [^\@\s]+ )/x ) {
$host = lc($1);
last;
}
}
return 0 unless defined $host;
if ($host eq 'compiling.spamassassin.taint.org') {
# only used when compiling
return 0;
}
dbg("dns: checking A and MX for host $host");
$pms->do_dns_lookup($rule, 'A', $host);
$pms->do_dns_lookup($rule, 'MX', $host);
# cache name of host for later checking
$pms->{sender_host} = $host;
return 0;
}
# capability checks for "if can(Mail::SpamAssassin::Plugin::DNSEval::XXX)":
#
sub has_tflags_domains_only { 1 }
sub has_tflags_ips_only { 1 }
1;

View File

@ -0,0 +1,650 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Plugin::FreeMail;
use strict;
use warnings;
use re 'taint';
my $VERSION = 2.003;
=head1 NAME
FreeMail - check message headers/body for freemail-domains
=head1 SYNOPSIS
If for example From-address is freemail, and Reply-To or address found in mail body is
different freemail address, return success. Good sign of Nigerian scams
etc. Test idea from Marc Perkel.
Also separate functions to check various portions of message for freemails.
=head1 CONFIGURATION
freemail_domains domain ...
List of domains to be used in checks.
Regexp is not supported, but following wildcards work:
? for single character (does not match a dot)
* for multiple characters (does not match a dot)
For example:
freemail_domains hotmail.com hotmail.co.?? yahoo.* yahoo.*.*
freemail_whitelist email/domain ...
Emails or domains listed here are ignored (pretend they aren't
freemail). No wildcards!
freemail_import_whitelist_auth 1/0
Entries in whitelist_auth will also be used to whitelist emails
or domains from being freemail. Default is 0.
freemail_import_def_whitelist_auth 1/0
Entries in def_whitelist_auth will also be used to whitelist emails
or domains from being freemail. Default is 0.
header FREEMAIL_REPLYTO eval:check_freemail_replyto(['option'])
Checks/compares freemail addresses found from headers and body.
Possible options:
replyto From: or body address is different than Reply-To
(this is the default)
reply as above, but if no Reply-To header is found,
compares From: and body
header FREEMAIL_FROM eval:check_freemail_from(['regex'])
Checks all possible "from" headers to see if sender is freemail.
Uses SA all_from_addrs() function (includes 'Resent-From', 'From',
'EnvelopeFrom' etc).
Add optional regex to match the found email address(es). For example,
to see if user ends in digit: check_freemail_from('\d@')
If you use multiple check_freemail_from rules with regexes, remember
that they might hit different emails from different heades. To match
a certain header only, use check_freemail_header.
header FREEMAIL_HDRX eval:check_freemail_header('header' [, 'regex'])
Searches defined header for freemail address. Optional regex to match
the found address (like in check_freemail_from).
header FREEMAIL_BODY eval:check_freemail_body(['regex'])
Searches body for freemail address. With optional regex to match.
=head1 CHANGELOG
1.996 - fix freemail_skip_bulk_envfrom
1.997 - set freemail_skip_when_over_max to 1 by default
1.998 - don't warn about missing freemail_domains when linting
1.999 - default whitelist undisclosed-recipient@yahoo.com etc
2.000 - some cleaning up
2.001 - fix freemail_whitelist
2.002 - _add_desc -> _got_hit, fix description email append bug
2.003 - freemail_import_(def_)whitelist_auth
=cut
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Util qw(compile_regexp);
our @ISA = qw(Mail::SpamAssassin::Plugin);
# default email whitelist
our $email_whitelist = qr/
^(?:
abuse|support|sales|info|helpdesk|contact|kontakt
| (?:post|host|domain)master
| undisclosed.* # yahoo.com etc(?)
| request-[a-f0-9]{16} # live.com
| bounced?- # yahoo.com etc
| [a-f0-9]{8}(?:\.[a-f0-9]{8}|-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) # gmail msgids?
| .+=.+=.+ # gmail forward
)\@
/xi;
# skip replyto check when envelope sender is
# allow <> for now
# no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
our $skip_replyto_envfrom = qr/
(?:
^(?:post|host|domain)master
| ^double-bounce
| ^(?:sentto|owner|return|(?:gr)?bounced?)-.+
| -(?:request|bounces?|admin|owner)
| \b(?:do[._-t]?)?no[._-t]?repl(?:y|ies)
| .+=.+
)\@
/xi;
sub dbg { Mail::SpamAssassin::Plugin::dbg ("FreeMail: @_"); }
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->{freemail_available} = 1;
$self->set_config($mailsa->{conf});
$self->register_eval_rule("check_freemail_replyto");
$self->register_eval_rule("check_freemail_from");
$self->register_eval_rule("check_freemail_header");
$self->register_eval_rule("check_freemail_body");
return $self;
}
sub _init_email_regex {
my ($self) = @_;
dbg("initializing email regex");
# Some regexp tips courtesy of http://www.regular-expressions.info/email.html
# full email regex v0.02
$self->{email_regex} = qr/
(?=.{0,64}\@) # limit userpart to 64 chars (and speed up searching?)
(?<![a-z0-9!#\$%&'*+\/=?^_`{|}~-]) # start boundary
( # capture email
[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+ # no dot in beginning
(?:\.[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+)* # no consecutive dots, no ending dot
\@
(?:[a-z0-9](?:[a-z0-9-]{0,59}[a-z0-9])?\.){1,4} # max 4x61 char parts (should be enough?)
$self->{main}->{registryboundaries}->{valid_tlds_re} # ends with valid tld
)
/xi;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'freemail_max_body_emails',
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_max_body_freemails',
default => 3,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_skip_when_over_max',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_skip_bulk_envfrom',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_add_describe_email',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_import_whitelist_auth',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_import_def_whitelist_auth',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
$conf->{parser}->register_commands(\@cmds);
}
sub parse_config {
my ($self, $opts) = @_;
if ($opts->{key} eq "freemail_domains") {
foreach my $temp (split(/\s+/, $opts->{value})) {
if ($temp =~ /^[a-z0-9.*?-]+$/i) {
my $value = lc($temp);
if ($value =~ /[*?]/) { # separate wildcard list
$self->{freemail_temp_wc}{$value} = 1;
}
else {
$self->{freemail_domains}{$value} = 1;
}
}
else {
warn("invalid freemail_domains: $temp");
}
}
$self->inhibit_further_callbacks();
return 1;
}
if ($opts->{key} eq "freemail_whitelist") {
foreach my $temp (split(/\s+/, $opts->{value})) {
my $value = lc($temp);
if ($value =~ /\w[.@]\w/) {
$self->{freemail_whitelist}{$value} = 1;
}
else {
warn("invalid freemail_whitelist: $temp");
}
}
$self->inhibit_further_callbacks();
return 1;
}
return 0;
}
sub finish_parsing_end {
my ($self, $opts) = @_;
my $wcount = 0;
if (defined $self->{freemail_temp_wc}) {
my @domains;
foreach my $value (keys %{$self->{freemail_temp_wc}}) {
$value =~ s/\./\\./g;
$value =~ s/\?/./g;
$value =~ s/\*/[^.]*/g;
push(@domains, $value);
}
my $doms = join('|', @domains);
$self->{freemail_domains_re} = qr/\@(?:${doms})$/;
$wcount = scalar @domains;
undef $self->{freemail_temp_wc};
delete $self->{freemail_temp_wc};
}
my $count = scalar keys %{$self->{freemail_domains}};
if ($count + $wcount) {
dbg("loaded freemail_domains entries: $count normal, $wcount wildcard");
}
else {
if ($self->{main}->{lint_rules} ||1) {
dbg("no freemail_domains entries defined, disabling plugin");
}
else {
warn("no freemail_domains entries defined, disabling plugin");
}
$self->{freemail_available} = 0;
}
# valid_tlds_re will be available at finish_parsing_end, compile it now,
# we only need to do it once and before possible forking
if ($self->{freemail_available} && !$self->{email_regex}) {
$self->_init_email_regex();
}
return 0;
}
sub _is_freemail {
my ($self, $email, $pms) = @_;
return 0 if $email eq '';
if (defined $self->{freemail_whitelist}{$email}) {
dbg("whitelisted email: $email");
return 0;
}
my $domain = $email;
$domain =~ s/.*\@//;
if (defined $self->{freemail_whitelist}{$domain}) {
dbg("whitelisted domain: $domain");
return 0;
}
if ($email =~ $email_whitelist) {
dbg("whitelisted email, default: $email");
return 0;
}
foreach my $list ('whitelist_auth','def_whitelist_auth') {
if ($pms->{conf}->{"freemail_import_$list"}) {
foreach my $regexp (values %{$pms->{conf}->{$list}}) {
if ($email =~ /$regexp/o) {
dbg("whitelisted email, $list: $email");
return 0;
}
}
}
}
if (defined $self->{freemail_domains}{$domain}
or ( defined $self->{freemail_domains_re}
and $email =~ $self->{freemail_domains_re} )) {
return 1;
}
return 0;
}
sub _parse_body {
my ($self, $pms) = @_;
# Parse body
if (not defined $pms->{freemail_cache}{body}) {
%{$pms->{freemail_cache}{body}} = ();
my %seen;
my @body_emails;
# get all <a href="mailto:", since they don't show up on stripped_body
my $parsed = $pms->get_uri_detail_list();
while (my($uri, $info) = each %{$parsed}) {
if (defined $info->{types}->{a} and not defined $info->{types}->{parsed}) {
if ($uri =~ /^(?:(?i)mailto):$self->{email_regex}/o) {
my $email = lc($1);
push(@body_emails, $email) unless defined $seen{$email};
$seen{$email} = 1;
last if scalar @body_emails >= 20; # sanity
}
}
}
# scan stripped normalized body
# have to do this way since get_uri_detail_list doesn't know what mails are inside <>
my $body = $pms->get_decoded_stripped_body_text_array();
BODY: foreach (@$body) {
# strip urls with possible emails inside
s{<?https?://\S{0,255}(?:\@|%40)\S{0,255}}{ }gi;
# strip emails contained in <>, not mailto:
# also strip ones followed by quote-like "wrote:" (but not fax: and tel: etc)
s{<?(?<!mailto:)$self->{email_regex}(?:>|\s{1,10}(?!(?:fa(?:x|csi)|tel|phone|e?-?mail))[a-z]{2,11}:)}{ }gi;
while (/$self->{email_regex}/g) {
my $email = lc($1);
utf8::encode($email) if utf8::is_utf8($email); # chars to UTF-8
push(@body_emails, $email) unless $seen{$email};
$seen{$email} = 1;
last BODY if @body_emails >= 40; # sanity
}
}
my $count_all = 0;
my $count_fm = 0;
foreach my $email (@body_emails) { # as UTF-8 octets
if (++$count_all == $pms->{main}->{conf}->{freemail_max_body_emails}) {
if ($pms->{main}->{conf}->{freemail_skip_when_over_max}) {
$pms->{freemail_skip_body} = 1;
dbg("too many unique emails found from body");
return 0;
}
}
next unless $self->_is_freemail($email, $pms);
if (++$count_fm == $pms->{main}->{conf}->{freemail_max_body_freemails}) {
if ($pms->{main}->{conf}->{freemail_skip_when_over_max}) {
$pms->{freemail_skip_body} = 1;
dbg("too many unique freemails found from body");
return 0;
}
}
$pms->{freemail_cache}{body}{$email} = 1;
}
dbg("all body freemails: ".join(', ', keys %{$pms->{freemail_cache}{body}}))
if scalar keys %{$pms->{freemail_cache}{body}};
}
if (defined $pms->{freemail_skip_body}) {
dbg("[cached] body email limit exceeded, skipping");
return 0;
}
return 1;
}
sub _got_hit {
my ($self, $pms, $email, $desc) = @_;
my $rulename = $pms->get_current_eval_rule_name();
if (defined $pms->{conf}->{descriptions}->{$rulename}) {
$desc = $pms->{conf}->{descriptions}->{$rulename};
}
if ($pms->{main}->{conf}->{freemail_add_describe_email}) {
$email =~ s/\@/[at]/g;
$pms->test_log($email);
}
$pms->got_hit($rulename, "", description => $desc, ruletype => 'eval');
}
sub check_freemail_header {
my ($self, $pms, $header, $regex) = @_;
return 0 unless $self->{freemail_available};
my $rulename = $pms->get_current_eval_rule_name();
dbg("RULE ($rulename) check_freemail_header".(defined $regex ? " regex:$regex" : ""));
unless (defined $header) {
warn("check_freemail_header needs argument");
return 0;
}
my $re;
if (defined $regex) {
my ($rec, $err) = compile_regexp($regex, 0);
if (!$rec) {
warn "freemail: invalid regexp for $rulename '$regex': $err\n";
return 0;
}
$re = $rec;
}
my @emails = map (lc, $pms->{main}->find_all_addrs_in_line ($pms->get($header)));
if (!scalar (@emails)) {
dbg("header $header not found from mail");
return 0;
}
dbg("addresses from header $header: ".join(';',@emails));
foreach my $email (@emails) {
if ($self->_is_freemail($email, $pms)) {
if (defined $re) {
next unless $email =~ /$re/o;
dbg("HIT! $email is freemail and matches regex");
}
else {
dbg("HIT! $email is freemail");
}
$self->_got_hit($pms, $email, "Header $header is freemail");
return 1;
}
}
return 0;
}
sub check_freemail_body {
my ($self, $pms, $regex) = @_;
return 0 unless $self->{freemail_available};
my $rulename = $pms->get_current_eval_rule_name();
dbg("RULE ($rulename) check_freemail_body".(defined $regex ? " regex:$regex" : ""));
return 0 unless $self->_parse_body($pms);
my $re;
if (defined $regex) {
my ($rec, $err) = compile_regexp($regex, 0);
if (!$rec) {
warn "freemail: invalid regexp for $rulename '$regex': $err\n";
return 0;
}
$re = $rec;
}
if (defined $re) {
foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
if ($email =~ /$re/o) {
dbg("HIT! email from body is freemail and matches regex: $email");
$self->_got_hit($pms, $email, "Email from body is freemail");
return 0;
}
}
}
elsif (scalar keys %{$pms->{freemail_cache}{body}}) {
my $emails = join(', ', keys %{$pms->{freemail_cache}{body}});
dbg("HIT! body has freemails: $emails");
$self->_got_hit($pms, $emails, "Body contains freemails");
return 0;
}
return 0;
}
sub check_freemail_from {
my ($self, $pms, $regex) = @_;
return 0 unless $self->{freemail_available};
my $rulename = $pms->get_current_eval_rule_name();
dbg("RULE ($rulename) check_freemail_from".(defined $regex ? " regex:$regex" : ""));
my $re;
if (defined $regex) {
my ($rec, $err) = compile_regexp($regex, 0);
if (!$rec) {
warn "freemail: invalid regexp for $rulename '$regex': $err\n";
return 0;
}
$re = $rec;
}
my %from_addrs = map { lc($_) => 1 } ($pms->all_from_addrs());
delete $from_addrs{''}; # no empty ones thx
unless (scalar keys %from_addrs) {
dbg("no from-addresses found to check");
return 0;
}
dbg("all from-addresses: ".join(', ', keys %from_addrs));
foreach my $email (keys %from_addrs) {
next unless $self->_is_freemail($email, $pms);
if (defined $re) {
next unless $email =~ /$re/o;
dbg("HIT! $email is freemail and matches regex");
}
else {
dbg("HIT! $email is freemail");
}
$self->_got_hit($pms, $email, "Sender address is freemail");
return 0;
}
return 0;
}
sub check_freemail_replyto {
my ($self, $pms, $what) = @_;
return 0 unless $self->{freemail_available};
my $rulename = $pms->get_current_eval_rule_name();
dbg("RULE ($rulename) check_freemail_replyto");
if (defined $what) {
if ($what ne 'replyto' and $what ne 'reply') {
warn("invalid check_freemail_replyto option: $what");
return 0;
}
}
else {
$what = 'replyto';
}
# Skip mailing-list etc looking requests, mostly FPs from them
if ($pms->{main}->{conf}->{freemail_skip_bulk_envfrom}) {
my $envfrom = lc($pms->get("EnvelopeFrom"));
if ($envfrom =~ $skip_replyto_envfrom) {
dbg("envelope sender looks bulk, skipping check: $envfrom");
return 0;
}
}
my $from = lc($pms->get("From:addr"));
my $replyto = lc($pms->get("Reply-To:addr"));
my $from_is_fm = $self->_is_freemail($from, $pms);
my $replyto_is_fm = $self->_is_freemail($replyto, $pms);
dbg("From address: $from") if $from ne '';
dbg("Reply-To address: $replyto") if $replyto ne '';
if ($from_is_fm and $replyto_is_fm and ($from ne $replyto)) {
dbg("HIT! From and Reply-To are different freemails");
$self->_got_hit($pms, "$from, $replyto", "From and Reply-To are different freemails");
return 0;
}
if ($what eq 'replyto') {
if (!$replyto_is_fm) {
dbg("Reply-To is not freemail, skipping check");
return 0;
}
}
elsif ($what eq 'reply') {
if ($replyto ne '' and !$replyto_is_fm) {
dbg("Reply-To defined and is not freemail, skipping check");
return 0;
}
elsif (!$from_is_fm) {
dbg("No Reply-To and From is not freemail, skipping check");
return 0;
}
}
my $reply = $replyto_is_fm ? $replyto : $from;
return 0 unless $self->_parse_body($pms);
# Compare body to headers
if (scalar keys %{$pms->{freemail_cache}{body}}) {
my $check = $what eq 'replyto' ? $replyto : $reply;
dbg("comparing $check to body freemails");
foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
if ($email ne $check) {
dbg("HIT! $check and $email are different freemails");
$self->_got_hit($pms, "$check, $email", "Different freemails in reply header and body");
return 0;
}
}
}
return 0;
}
1;

View File

@ -0,0 +1,437 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
FromNameSpoof - perform various tests to detect spoof attempts using the From header name section
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
# Does the From:name look like it contains an email address
header __PLUGIN_FROMNAME_EMAIL eval:check_fromname_contains_email()
# Is the From:name different to the From:addr header
header __PLUGIN_FROMNAME_DIFFERENT eval:check_fromname_different()
# From:name and From:addr owners differ
header __PLUGIN_FROMNAME_OWNERS_DIFFER eval:check_fromname_owners_differ()
# From:name domain differs to from header
header __PLUGIN_FROMNAME_DOMAIN_DIFFER eval:check_fromname_domain_differ()
# From:name and From:address don't match and owners differ
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
# From:name address matches To:address
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
=head1 DESCRIPTION
Perform various tests against From:name header to detect spoofing. Steps in place to
ensure minimal FPs.
=head1 CONFIGURATION
The plugin allows you to skip emails that have been DKIM signed by specific senders:
fns_ignore_dkim googlegroups.com
FromNameSpoof allows for a configurable closeness when matching the From:addr and From:name,
the closeness can be adjusted with:
fns_extrachars 50
B<Note> that FromNameSpoof detects the "owner" of a domain by the following search:
<owner>.<tld>
By default FromNameSpoof will ignore the TLD when testing if From:addr is spoofed.
Default 1
fns_check 1
Check levels:
0 - Strict checking of From:name != From:addr
1 - Allow for different tlds
2 - Allow for different aliases but same domain
=head1 TAGS
The following tags are added to the set if a spoof is detected. They are available for
use in reports, header fields, other plugins, etc.:
_FNSFNAMEADDR_
Detected spoof address from From:name header
_FNSFNAMEDOMAIN_
Detected spoof domain from From:name header
_FNSFNAMEOWNER_
Detected spoof owner from From:name header
_FNSFADDRADDR_
Actual From:addr address
_FNSFADDRDOMAIN_
Actual From:addr domain
_FNSFADDROWNER_
Actual From:addr detected owner
=head1 EXAMPLE
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
meta FROMNAME_SPOOF_EQUALS_TO (__PLUGIN_FROMNAME_SPOOF && __PLUGIN_FROMNAME_EQUALS_TO)
describe FROMNAME_SPOOF_EQUALS_TO From:name is spoof to look like To: address
score FROMNAME_SPOOF_EQUALS_TO 1.2
=cut
use strict;
package Mail::SpamAssassin::Plugin::FromNameSpoof;
my $VERSION = 0.9;
use Mail::SpamAssassin::Plugin;
use List::Util ();
use Mail::SpamAssassin::Util;
use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);
sub dbg { Mail::SpamAssassin::Plugin::dbg ("FromNameSpoof: @_"); }
sub uri_to_domain {
my ($self, $domain) = @_;
return unless defined $domain;
if ($Mail::SpamAssassin::VERSION <= 3.004000) {
Mail::SpamAssassin::Util::uri_to_domain($domain);
} else {
$self->{main}->{registryboundaries}->uri_to_domain($domain);
}
}
# constructor: register the eval rule
sub new
{
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->set_config($mailsaobject->{conf});
# the important bit!
$self->register_eval_rule("check_fromname_spoof");
$self->register_eval_rule("check_fromname_different");
$self->register_eval_rule("check_fromname_domain_differ");
$self->register_eval_rule("check_fromname_contains_email");
$self->register_eval_rule("check_fromname_equals_to");
$self->register_eval_rule("check_fromname_owners_differ");
$self->register_eval_rule("check_fromname_equals_replyto");
return $self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds = ();
push (@cmds, {
setting => 'fns_add_addrlist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my($self, $key, $value, $line) = @_;
local($1,$2);
if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my $listname = "FNS_$1";
$value = $2;
$self->{parser}->add_to_addrlist ($listname, split(/\s+/, lc($value)));
$self->{fns_addrlists}{$listname} = 1;
}
});
push (@cmds, {
setting => 'fns_remove_addrlist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my($self, $key, $value, $line) = @_;
local($1,$2);
if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my $listname = "FNS_$1";
$value = $2;
$self->{parser}->remove_from_addrlist ($listname, split (/\s+/, $value));
}
});
push(@cmds, {
setting => 'fns_extrachars',
default => 50,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push (@cmds, {
setting => 'fns_ignore_dkim',
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
$self->{fns_ignore_dkim}->{$_} = 1 foreach (split(/\s+/, lc($value)));
}
});
push (@cmds, {
setting => 'fns_ignore_headers',
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
$self->{fns_ignore_header}->{$_} = 1 foreach (split(/\s+/, $value));
}
});
push(@cmds, {
setting => 'fns_check',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
$conf->{parser}->register_commands(\@cmds);
}
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
$pms->action_depends_on_tags('DKIMDOMAIN',
sub { my($pms,@args) = @_;
$self->_check_fromnamespoof($pms);
}
);
1;
}
sub check_fromname_different
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_address_different};
}
sub check_fromname_domain_differ
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_domain_different};
}
sub check_fromname_spoof
{
my ($self, $pms, $check_lvl) = @_;
$self->_check_fromnamespoof($pms);
if ( not defined $check_lvl ) {
$check_lvl = $pms->{conf}->{fns_check};
}
my @array = (
($pms->{fromname_address_different}) ,
($pms->{fromname_address_different} && $pms->{fromname_owner_different}) ,
($pms->{fromname_address_different} && $pms->{fromname_domain_different})
);
return $array[$check_lvl];
}
sub check_fromname_contains_email
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_contains_email};
}
sub check_fromname_equals_replyto
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_equals_replyto};
}
sub check_fromname_equals_to
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_equals_to_addr};
}
sub check_fromname_owners_differ
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_owner_different};
}
sub _check_fromnamespoof
{
my ($self, $pms) = @_;
return if (defined $pms->{fromname_contains_email});
my $conf = $pms->{conf};
$pms->{fromname_contains_email} = 0;
$pms->{fromname_address_different} = 0;
$pms->{fromname_equals_to_addr} = 0;
$pms->{fromname_domain_different} = 0;
$pms->{fromname_owner_different} = 0;
$pms->{fromname_equals_replyto} = 0;
foreach my $addr (split / /, $pms->get_tag('DKIMDOMAIN') || '') {
if ($conf->{fns_ignore_dkim}->{lc($addr)}) {
dbg("ignoring, DKIM signed: $addr");
return 0;
}
}
foreach my $iheader (keys %{$conf->{fns_ignore_header}}) {
if ($pms->get($iheader)) {
dbg("ignoring, header $iheader found");
return 0 if ($pms->get($iheader));
}
}
my $list_refs = {};
if ($conf->{fns_addrlists}) {
my @lists = keys %{$conf->{fns_addrlists}};
foreach my $list (@lists) {
$list_refs->{$list} = $conf->{$list};
}
s/^FNS_// foreach (@lists);
dbg("using addrlists: ".join(', ', @lists));
}
my %fnd = ();
my %fad = ();
my %tod = ();
$fnd{'addr'} = $pms->get("From:name");
if ($fnd{'addr'} =~ /\b((?>[\w\.\!\#\$\%\&\'\*\+\/\=\?\^\_\`\{\|\}\~\-]+@[\w\-\.]+\.[\w\-\.]+))\b/i) {
my $nochar = ($fnd{'addr'} =~ y/A-Za-z0-9//c);
$nochar -= ($1 =~ y/A-Za-z0-9//c);
return 0 unless ((length($fnd{'addr'})+$nochar) - length($1) <= $conf->{'fns_extrachars'});
$fnd{'addr'} = lc $1;
} else {
return 0;
}
my $replyto = lc $pms->get("Reply-To:addr");
$fad{'addr'} = lc $pms->get("From:addr");
my @toaddrs = $pms->all_to_addrs();
return 0 unless @toaddrs;
$tod{'addr'} = lc $toaddrs[0];
$fnd{'domain'} = $self->uri_to_domain($fnd{'addr'});
$fad{'domain'} = $self->uri_to_domain($fad{'addr'});
$tod{'domain'} = $self->uri_to_domain($tod{'addr'});
return 0 unless (defined $fnd{'domain'} && defined $fad{'domain'});
$pms->{fromname_contains_email} = 1;
$fnd{'owner'} = $self->_find_address_owner($fnd{'addr'}, $list_refs);
$fad{'owner'} = $self->_find_address_owner($fad{'addr'}, $list_refs);
$tod{'owner'} = $self->_find_address_owner($tod{'addr'}, $list_refs);
$pms->{fromname_address_different} = 1 if ($fnd{'addr'} ne $fad{'addr'});
$pms->{fromname_domain_different} = 1 if ($fnd{'domain'} ne $fad{'domain'});
$pms->{fromname_equals_to_addr} = 1 if ($fnd{'addr'} eq $tod{addr});
$pms->{fromname_equals_replyto} = 1 if ($fnd{'addr'} eq $replyto);
if ($fnd{'owner'} ne $fad{'owner'}) {
$pms->{fromname_owner_different} = 1;
}
if ($pms->{fromname_address_different}) {
$pms->set_tag("FNSFNAMEADDR", $fnd{'addr'});
$pms->set_tag("FNSFADDRADDR", $fad{'addr'});
$pms->set_tag("FNSFNAMEOWNER", $fnd{'owner'});
$pms->set_tag("FNSFADDROWNER", $fad{'owner'});
$pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
$pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
}
}
sub _find_address_owner
{
my ($self, $check, $list_refs) = @_;
foreach my $owner (keys %{$list_refs}) {
foreach my $white_addr (keys %{$list_refs->{$owner}}) {
my $regexp = qr/$list_refs->{$owner}{$white_addr}/i;
if ($check =~ /$regexp/) {
$owner =~ s/^FNS_//i;
return lc $owner;
}
}
}
my $owner = $self->uri_to_domain($check);
$check =~ /^([^\@]+)\@(.*)$/;
if ($owner ne $2) {
return $self->_find_address_owner("$1\@$owner", $list_refs);
}
$owner =~ /^([^\.]+)\./;
return lc $1;
}
1;

View File

@ -0,0 +1,216 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Plugin::HTMLEval;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Locales;
use Mail::SpamAssassin::Util qw(untaint_var compile_regexp);
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# the important bit!
$self->register_eval_rule("html_tag_balance");
$self->register_eval_rule("html_image_only");
$self->register_eval_rule("html_image_ratio");
$self->register_eval_rule("html_charset_faraway");
$self->register_eval_rule("html_tag_exists");
$self->register_eval_rule("html_test");
$self->register_eval_rule("html_eval");
$self->register_eval_rule("html_text_match");
$self->register_eval_rule("html_title_subject_ratio");
$self->register_eval_rule("html_text_not_match");
$self->register_eval_rule("html_range");
$self->register_eval_rule("check_iframe_src");
return $self;
}
sub html_tag_balance {
my ($self, $pms, undef, $rawtag, $rawexpr) = @_;
return 0 if $rawtag !~ /^([a-zA-Z0-9]+)$/;
my $tag = $1;
return 0 unless exists $pms->{html}{inside}{$tag};
return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $expr = untaint_var($1);
$pms->{html}{inside}{$tag} =~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $val = untaint_var($1);
return eval "\$val $expr";
}
sub html_image_only {
my ($self, $pms, undef, $min, $max) = @_;
return (exists $pms->{html}{inside}{img} &&
exists $pms->{html}{length} &&
$pms->{html}{length} > $min &&
$pms->{html}{length} <= $max);
}
sub html_image_ratio {
my ($self, $pms, undef, $min, $max) = @_;
return 0 unless (exists $pms->{html}{non_space_len} &&
exists $pms->{html}{image_area} &&
$pms->{html}{image_area} > 0);
my $ratio = $pms->{html}{non_space_len} / $pms->{html}{image_area};
return ($ratio > $min && $ratio <= $max);
}
sub html_charset_faraway {
my ($self, $pms) = @_;
return 0 unless exists $pms->{html}{charsets};
my @locales = Mail::SpamAssassin::Util::get_my_locales($pms->{conf}->{ok_locales});
return 0 if grep { $_ eq "all" } @locales;
my $okay = 0;
my $bad = 0;
for my $c (split(' ', $pms->{html}{charsets})) {
if (Mail::SpamAssassin::Locales::is_charset_ok_for_locales($c, @locales)) {
$okay++;
}
else {
$bad++;
}
}
return ($bad && ($bad >= $okay));
}
sub html_tag_exists {
my ($self, $pms, undef, $tag) = @_;
return exists $pms->{html}{inside}{$tag};
}
sub html_test {
my ($self, $pms, undef, $test) = @_;
return $pms->{html}{$test};
}
sub html_eval {
my ($self, $pms, undef, $test, $rawexpr) = @_;
return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $expr = untaint_var($1);
# workaround bug 3320: weird perl bug where additional, very explicit
# untainting into a new var is required.
my $tainted = $pms->{html}{$test};
return 0 unless defined($tainted);
my $val = $tainted;
# just use the value in $val, don't copy it needlessly
return eval "\$val $expr";
}
sub html_text_match {
my ($self, $pms, undef, $text, $regexp) = @_;
my ($rec, $err) = compile_regexp($regexp, 0);
if (!$rec) {
warn "htmleval: html_text_match invalid regexp '$regexp': $err";
return 0;
}
foreach my $string (@{$pms->{html}{$text}}) {
next unless defined $string;
if ($string =~ $rec) {
return 1;
}
}
return 0;
}
sub html_title_subject_ratio {
my ($self, $pms, undef, $ratio) = @_;
my $subject = $pms->get('Subject');
if ($subject eq '') {
return 0;
}
my $max = 0;
for my $string (@{ $pms->{html}{title} }) {
if ($string) {
my $ratio = length($string) / length($subject);
$max = $ratio if $ratio > $max;
}
}
return $max > $ratio;
}
sub html_text_not_match {
my ($self, $pms, undef, $text, $regexp) = @_;
for my $string (@{ $pms->{html}{$text} }) {
if (defined $string && $string !~ /${regexp}/) {
return 1;
}
}
return 0;
}
sub html_range {
my ($self, $pms, undef, $test, $min, $max) = @_;
return 0 unless exists $pms->{html}{$test};
$test = $pms->{html}{$test};
# not all perls understand what "inf" means, so we need to do
# non-numeric tests! urg!
if (!defined $max || $max eq "inf") {
return ($test eq "inf") ? 1 : ($test > $min);
}
elsif ($test eq "inf") {
# $max < inf, so $test == inf means $test > $max
return 0;
}
else {
# if we get here everything should be a number
return ($test > $min && $test <= $max);
}
}
sub check_iframe_src {
my ($self, $pms) = @_;
foreach my $v ( values %{$pms->{html}->{uri_detail}} ) {
return 1 if $v->{types}->{iframe};
}
return 0;
}
1;

View File

@ -0,0 +1,110 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
package Mail::SpamAssassin::Plugin::HTTPSMismatch;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# the important bit!
$self->register_eval_rule ("check_https_http_mismatch");
return $self;
}
# [lt]a href="http://baboz-njeryz.de/"[gt]https://bankofamerica.com/[lt]/a[gt]
# ("<" and ">" replaced with "[lt]" and "[gt]" to avoid Kaspersky Desktop AV
# false positive ;)
sub check_https_http_mismatch {
my ($self, $permsgstatus, undef, $minanchors, $maxanchors) = @_;
my $IP_ADDRESS = IP_ADDRESS;
$minanchors ||= 1;
if (!exists $permsgstatus->{chhm_hit}) {
$permsgstatus->{chhm_hit} = 0;
$permsgstatus->{chhm_anchors} = 0;
foreach my $k ( keys %{$permsgstatus->{html}->{uri_detail}} ) {
my %uri_detail = %{$permsgstatus->{html}->{uri_detail}};
my $v = ${uri_detail}{$k};
# if the URI wasn't used for an anchor tag, or the anchor text didn't
# exist, skip this.
next unless (exists $v->{anchor_text} && @{$v->{anchor_text}});
my $uri;
if ($k =~ m@^https?://([^/:]+)@i) {
$uri = $1;
# Skip IPs since there's another rule to catch that already
if ($uri =~ /^$IP_ADDRESS+$/) {
undef $uri;
next;
}
# want to compare whole hostnames instead of domains?
# comment this next section to the blank line.
$uri = $self->{main}->{registryboundaries}->trim_domain($uri);
undef $uri unless ($self->{main}->{registryboundaries}->is_domain_valid($uri));
}
next unless $uri;
$permsgstatus->{chhm_anchors}++ if exists $v->{anchor_text};
foreach (@{$v->{anchor_text}}) {
if (m@https://([^/:]+)@i) {
my $https = $1;
# want to compare whole hostnames instead of domains?
# comment this next section to the blank line.
if ($https !~ /^$IP_ADDRESS+$/) {
$https = $self->{main}->{registryboundaries}->trim_domain($https);
undef $https unless ($self->{main}->{registryboundaries}->is_domain_valid($https));
}
next unless $https;
dbg("https_http_mismatch: domains $uri -> $https");
next if $uri eq $https;
$permsgstatus->{chhm_hit} = 1;
last;
}
}
}
dbg("https_http_mismatch: anchors ".$permsgstatus->{chhm_anchors});
}
return ( $permsgstatus->{chhm_hit} && $permsgstatus->{chhm_anchors} >= $minanchors && (defined $maxanchors && $permsgstatus->{chhm_anchors} < $maxanchors) );
}
1;

View File

@ -0,0 +1,682 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
HashBL - query hashed (and unhashed) DNS blocklists
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::HashBL
header HASHBL_EMAIL eval:check_hashbl_emails('ebl.msbl.org')
describe HASHBL_EMAIL Message contains email address found on EBL
hashbl_acl_freemail gmail.com
header HASHBL_OSENDR eval:check_hashbl_emails('rbl.example.com/A', 'md5/max=10/shuffle', 'X-Original-Sender', '^127\.', 'freemail')
describe HASHBL_OSENDR Message contains email address found on HASHBL
tflags HASHBL_OSENDR net
body HASHBL_BTC eval:check_hashbl_bodyre('btcbl.foo.bar', 'sha1/max=10/shuffle', '\b([13][a-km-zA-HJ-NP-Z1-9]{25,34})\b')
describe HASHBL_BTC Message contains BTC address found on BTCBL
priority HASHBL_BTC -100 # required priority to launch async lookups
header HASHBL_URI eval:check_hashbl_uris('rbl.foo.bar', 'sha1', '127.0.0.32')
describe HASHBL_URI Message contains uri found on rbl
=head1 DESCRIPTION
This plugin support multiple types of hashed or unhashed DNS blocklists.
OPTS refers to multiple generic options:
raw do not hash data, query as is
md5 hash query with MD5
sha1 hash query with SHA1
case keep case before hashing, default is to lowercase
max=x maximum number of queries
shuffle if max exceeded, random shuffle queries before truncating to limit
Multiple options can be separated with slash or other non-word character.
If OPTS is empty ('') or missing, default is used.
HEADERS refers to slash separated list of Headers to process:
ALL all headers
ALLFROM all From headers as returned by $pms->all_from_addrs()
EnvelopeFrom message envelope from (Return-Path etc)
HeaderName any header as used with $pms->get()
if HEADERS is empty ('') or missing, default is used.
=over 4
=item header RULE check_hashbl_emails('bl.example.com/A', 'OPTS', 'HEADERS/body', '^127\.')
Check email addresses from DNS list, "body" can be specified along with
headers to search body for emails. Optional subtest regexp to match DNS
answer. Note that eval rule type must always be "header".
DNS query type can be appended to list with /A (default) or /TXT.
Additional supported OPTS:
nodot strip username dots from email
notag strip username tags from email
nouri ignore emails inside uris
noquote ignore emails inside < > or possible quotings
Default OPTS: sha1/notag/noquote/max=10/shuffle
Default HEADERS: ALLFROM/Reply-To/body
For existing public email blacklist, see: http://msbl.org/ebl.html
header HASHBL_EBL check_hashbl_emails('ebl.msbl.org')
priority HASHBL_EBL -100 # required for async query
=over 4
=item header RULE check_hashbl_uris('bl.example.com/A', 'OPTS', '^127\.')
Check uris from DNS list, optional subtest regexp to match DNS
answer.
DNS query type can be appended to list with /A (default) or /TXT.
Default OPTS: sha1/max=10/shuffle
=back
=item body RULE check_hashbl_bodyre('bl.example.com/A', 'OPTS', '\b(match)\b', '^127\.')
Search body for matching regexp and query the string captured. Regexp must
have a single capture ( ) for the string ($1). Optional subtest regexp to
match DNS answer. Note that eval rule type must be "body" or "rawbody".
=back
=cut
package Mail::SpamAssassin::Plugin::HashBL;
use strict;
use warnings;
my $VERSION = 0.101;
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha1_hex);
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Util qw(compile_regexp);
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub dbg {
my $msg = shift;
Mail::SpamAssassin::Plugin::dbg("HashBL: $msg", @_);
}
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
# are network tests enabled?
if ($mailsa->{local_tests_only}) {
$self->{hashbl_available} = 0;
dbg("local tests only, disabling HashBL");
} else {
$self->{hashbl_available} = 1;
}
$self->register_eval_rule("check_hashbl_emails");
$self->register_eval_rule("check_hashbl_uris");
$self->register_eval_rule("check_hashbl_bodyre");
$self->set_config($mailsa->{conf});
return $self;
}
sub set_config {
my($self, $conf) = @_;
my @cmds;
push (@cmds, {
setting => 'hashbl_ignore',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
default => {},
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
foreach my $str (split (/\s+/, $value)) {
$self->{hashbl_ignore}->{lc $str} = 1;
}
}
});
$conf->{parser}->register_commands(\@cmds);
}
sub _parse_args {
my ($self, $acl) = @_;
if (not defined $acl) {
return ();
}
$acl =~ s/\s+//g;
if ($acl !~ /^[a-z0-9]{1,32}$/) {
warn("invalid acl name: $acl");
return ();
}
if ($acl eq 'all') {
return ();
}
if (defined $self->{hashbl_acl}{$acl}) {
warn("no such acl defined: $acl");
return ();
}
}
sub parse_config {
my ($self, $opt) = @_;
if ($opt->{key} =~ /^hashbl_acl_([a-z0-9]{1,32})$/i) {
$self->inhibit_further_callbacks();
return 1 unless $self->{hashbl_available};
my $acl = lc($1);
my @opts = split(/\s+/, $opt->{value});
foreach my $tmp (@opts)
{
if ($tmp =~ /^(\!)?(\S+)$/i) {
my $neg = $1;
my $value = lc($2);
if (defined $neg) {
$self->{hashbl_acl}{$acl}{$value} = 0;
} else {
next if $acl eq 'all';
# exclusions overrides
if ( not defined $self->{hashbl_acl}{$acl}{$value} ) {
$self->{hashbl_acl}{$acl}{$value} = 1
}
}
} else {
warn("invalid acl: $tmp");
}
}
return 1;
}
return 0;
}
sub finish_parsing_end {
my ($self, $opts) = @_;
return 0 if !$self->{hashbl_available};
# valid_tlds_re will be available at finish_parsing_end, compile it now,
# we only need to do it once and before possible forking
if (!exists $self->{email_re}) {
$self->_init_email_re();
}
return 0;
}
sub _init_email_re {
my ($self) = @_;
# Some regexp tips courtesy of http://www.regular-expressions.info/email.html
# full email regex v0.02
$self->{email_re} = qr/
(?=.{0,64}\@) # limit userpart to 64 chars (and speed up searching?)
(?<![a-z0-9!#\$%&'*+\/=?^_`{|}~-]) # start boundary
( # capture email
[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+ # no dot in beginning
(?:\.[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+)* # no consecutive dots, no ending dot
\@
(?:[a-z0-9](?:[a-z0-9-]{0,59}[a-z0-9])?\.){1,4} # max 4x61 char parts (should be enough?)
$self->{main}->{registryboundaries}->{valid_tlds_re} # ends with valid tld
)
/xi;
# default email whitelist
$self->{email_whitelist} = qr/
^(?:
abuse|support|sales|info|helpdesk|contact|kontakt
| (?:post|host|domain)master
| undisclosed.* # yahoo.com etc(?)
| request-[a-f0-9]{16} # live.com
| bounced?- # yahoo.com etc
| [a-f0-9]{8}(?:\.[a-f0-9]{8}|-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) # gmail msgids?
| .+=.+=.+ # gmail forward
)\@
/xi;
}
sub _get_emails {
my ($self, $pms, $opts, $from, $acl) = @_;
my @emails; # keep find order
my %seen;
my @tmp_email;
my $domain;
foreach my $hdr (split(/\//, $from)) {
my $parsed_emails = $self->_parse_emails($pms, $opts, $hdr);
foreach (@$parsed_emails) {
next if exists $seen{$_};
my @tmp_email = split('@', $_);
my $domain = $tmp_email[1];
if (defined($acl) and ($acl ne "all") and defined($domain)) {
if (defined($self->{hashbl_acl}{$acl}{$domain}) and ($self->{hashbl_acl}{$acl}{$domain} eq 1)) {
push @emails, $_;
$seen{$_} = 1;
}
} else {
push @emails, $_;
$seen{$_} = 1;
}
}
}
return \@emails;
}
sub _parse_emails {
my ($self, $pms, $opts, $hdr) = @_;
if (exists $pms->{hashbl_email_cache}{$hdr}) {
return $pms->{hashbl_email_cache}{$hdr};
}
if ($hdr eq 'ALLFROM') {
my @emails = $pms->all_from_addrs();
return $pms->{hashbl_email_cache}{$hdr} = \@emails;
}
if (not defined $pms->{hashbl_whitelist}) {
%{$pms->{hashbl_whitelist}} = map { lc($_) => 1 }
( $pms->get("X-Original-To:addr"),
$pms->get("Apparently-To:addr"),
$pms->get("Delivered-To:addr"),
$pms->get("Envelope-To:addr"),
);
if ( defined $pms->{hashbl_whitelist}{''} ) {
delete $pms->{hashbl_whitelist}{''};
}
}
my $str = '';
if ($hdr eq 'ALL') {
$str = join("\n", $pms->get('ALL'));
} elsif ($hdr eq 'body') {
# get all <a href="mailto:", since they don't show up on stripped_body
my $uris = $pms->get_uri_detail_list();
while (my($uri, $info) = each %{$uris}) {
if (defined $info->{types}->{a} && !defined $info->{types}->{parsed}) {
if ($uri =~ /^mailto:(.+)/i) {
$str .= "$1\n";
}
}
}
my $body = join('', $pms->get_decoded_stripped_body_text_array());
if ($opts =~ /\bnouri\b/) {
# strip urls with possible emails inside
$body =~ s#<?https?://\S{0,255}(?:\@|%40)\S{0,255}# #gi;
}
if ($opts =~ /\bnoquote\b/) {
# strip emails contained in <>, not mailto:
# also strip ones followed by quote-like "wrote:" (but not fax: and tel: etc)
$body =~ s#<?(?<!mailto:)$self->{email_re}(?:>|\s{1,10}(?!(?:fa(?:x|csi)|tel|phone|e?-?mail))[a-z]{2,11}:)# #gi;
}
$str .= $body;
} else {
$str .= join("\n", $pms->get($hdr));
}
my @emails; # keep find order
my %seen;
while ($str =~ /($self->{email_re})/g) {
next if exists $seen{$1};
push @emails, $1;
}
return $pms->{hashbl_email_cache}{$hdr} = \@emails;
}
sub check_hashbl_emails {
my ($self, $pms, $list, $opts, $from, $subtest, $acl) = @_;
return 0 if !$self->{hashbl_available};
return 0 if !$pms->is_dns_available();
return 0 if !$self->{email_re};
my $rulename = $pms->get_current_eval_rule_name();
if (!defined $list) {
warn "HashBL: $rulename blocklist argument missing\n";
return 0;
}
if ($subtest) {
my ($rec, $err) = compile_regexp($subtest, 0);
if (!$rec) {
warn "HashBL: $rulename invalid subtest regex: $@\n";
return 0;
}
$subtest = $rec;
}
# Defaults
$opts = 'sha1/notag/noquote/max=10/shuffle' if !$opts;
$from = 'ALLFROM/Reply-To/body' if !$from;
# Find all emails
my $emails = $self->_get_emails($pms, $opts, $from, $acl);
if (!@$emails) {
if(defined $acl) {
dbg("$rulename: no emails found ($from) on acl $acl");
} else {
dbg("$rulename: no emails found ($from)");
}
return 0;
} else {
dbg("$rulename: raw emails found: ".join(', ', @$emails));
}
# Filter list
my $keep_case = $opts =~ /\bcase\b/i;
my $nodot = $opts =~ /\bnodot\b/i;
my $notag = $opts =~ /\bnotag\b/i;
my @filtered_emails; # keep order
my %seen;
foreach my $email (@$emails) {
next if exists $seen{$email};
if (($email =~ $self->{email_whitelist}) or defined ($pms->{hashbl_whitelist}{$email})) {
dbg("Address whitelisted: $email");
next;
}
if ($nodot || $notag) {
my ($username, $domain) = ($email =~ /(.*)(\@.*)/);
$username =~ tr/.//d if $nodot;
$username =~ s/\+.*// if $notag;
$email = $username.$domain;
}
push @filtered_emails, $keep_case ? $email : lc($email);
$seen{$email} = 1;
}
# Randomize order
if ($opts =~ /\bshuffle\b/) {
Mail::SpamAssassin::Util::fisher_yates_shuffle(\@filtered_emails);
}
# Truncate list
my $max = $opts =~ /\bmax=(\d+)\b/ ? $1 : 10;
$#filtered_emails = $max-1 if scalar @filtered_emails > $max;
foreach my $email (@filtered_emails) {
$self->_submit_query($pms, $rulename, $email, $list, $opts, $subtest);
}
return 0;
}
sub check_hashbl_uris {
my ($self, $pms, $list, $opts, $subtest) = @_;
return 0 if !$self->{hashbl_available};
return 0 if !$pms->is_dns_available();
my $rulename = $pms->get_current_eval_rule_name();
if (!defined $list) {
warn "HashBL: $rulename blocklist argument missing\n";
return 0;
}
if ($subtest) {
my ($rec, $err) = compile_regexp($subtest, 0);
if (!$rec) {
warn "HashBL: $rulename invalid subtest regex: $@\n";
return 0;
}
$subtest = $rec;
}
# Defaults
$opts = 'sha1/max=10/shuffle' if !$opts;
# Filter list
my $keep_case = $opts =~ /\bcase\b/i;
if ($opts =~ /raw/) {
warn "HashBL: $rulename raw option invalid\n";
return 0;
}
my $uris = $pms->get_uri_detail_list();
my %seen;
my @filtered_uris;
while (my($uri, $info) = each %{$uris}) {
# we want to skip mailto: uris
next if ($uri =~ /^mailto:/i);
next if exists $seen{$uri};
# no hosts/domains were found via this uri, so skip
next unless $info->{hosts};
next unless $info->{cleaned};
next unless $info->{types}->{a} || $info->{types}->{parsed};
foreach my $uri (@{$info->{cleaned}}) {
# check url
push @filtered_uris, $keep_case ? $uri : lc($uri);
}
$seen{$uri} = 1;
}
# Randomize order
if ($opts =~ /\bshuffle\b/) {
Mail::SpamAssassin::Util::fisher_yates_shuffle(\@filtered_uris);
}
# Truncate list
my $max = $opts =~ /\bmax=(\d+)\b/ ? $1 : 10;
$#filtered_uris = $max-1 if scalar @filtered_uris > $max;
foreach my $furi (@filtered_uris) {
$self->_submit_query($pms, $rulename, $furi, $list, $opts, $subtest);
}
return 0;
}
sub check_hashbl_bodyre {
my ($self, $pms, $bodyref, $list, $opts, $re, $subtest) = @_;
return 0 if !$self->{hashbl_available};
return 0 if !$pms->is_dns_available();
my $rulename = $pms->get_current_eval_rule_name();
if (!defined $list) {
warn "HashBL: $rulename blocklist argument missing\n";
return 0;
}
if (!$re) {
warn "HashBL: $rulename missing body regex\n";
return 0;
}
my ($rec, $err) = compile_regexp($re, 0);
if (!$rec) {
warn "HashBL: $rulename invalid body regex: $@\n";
return 0;
}
$re = $rec;
if ($subtest) {
my ($rec, $err) = compile_regexp($subtest, 0);
if (!$rec) {
warn "HashBL: $rulename invalid subtest regex: $@\n";
return 0;
}
$subtest = $rec;
}
# Defaults
$opts = 'sha1/max=10/shuffle' if !$opts;
my $keep_case = $opts =~ /\bcase\b/i;
# Search body
my @matches;
my %seen;
if (ref($bodyref) eq 'ARRAY') {
# body, rawbody
foreach (@$bodyref) {
while ($_ =~ /$re/gs) {
next if !defined $1;
my $match = $keep_case ? $1 : lc($1);
next if exists $seen{$match};
$seen{$match} = 1;
push @matches, $match;
}
}
} else {
# full
while ($$bodyref =~ /$re/gs) {
next if !defined $1;
my $match = $keep_case ? $1 : lc($1);
next if exists $seen{$match};
$seen{$match} = 1;
push @matches, $match;
}
}
if (!@matches) {
dbg("$rulename: no matches found");
return 0;
} else {
dbg("$rulename: matches found: '".join("', '", @matches)."'");
}
# Randomize order
if ($opts =~ /\bshuffle\b/) {
Mail::SpamAssassin::Util::fisher_yates_shuffle(\@matches);
}
# Truncate list
my $max = $opts =~ /\bmax=(\d+)\b/ ? $1 : 10;
$#matches = $max-1 if scalar @matches > $max;
foreach my $match (@matches) {
$self->_submit_query($pms, $rulename, $match, $list, $opts, $subtest);
}
return 0;
}
sub _hash {
my ($self, $opts, $value) = @_;
my $hashtype = $opts =~ /\b(raw|sha1|md5)\b/i ? lc($1) : 'sha1';
if ($hashtype eq 'sha1') {
return sha1_hex($value);
} elsif ($hashtype eq 'md5') {
return md5_hex($value);
} else {
return $value;
}
}
sub _submit_query {
my ($self, $pms, $rulename, $value, $list, $opts, $subtest) = @_;
if (exists $pms->{conf}->{hashbl_ignore}->{lc $value}) {
dbg("query skipped, ignored string: $value");
return 1;
}
my $hash = $self->_hash($opts, $value);
dbg("querying $value ($hash) from $list");
if (exists $pms->{conf}->{hashbl_ignore}->{$hash}) {
dbg("query skipped, ignored hash: $value");
return 1;
}
my $type = $list =~ s,/(A|TXT)$,,i ? uc($1) : 'A';
my $lookup = "$hash.$list";
my $key = "HASHBL_EMAIL:$lookup";
my $ent = {
key => $key,
zone => $list,
rulename => $rulename,
type => "HASHBL",
hash => $hash,
value => $value,
subtest => $subtest,
};
$ent = $pms->{async}->bgsend_and_start_lookup($lookup, $type, undef, $ent,
sub { my ($ent, $pkt) = @_; $self->_finish_query($pms, $ent, $pkt); },
master_deadline => $pms->{master_deadline}
);
$pms->register_async_rule_start($rulename) if $ent;
}
sub _finish_query {
my ($self, $pms, $ent, $pkt) = @_;
if (!$pkt) {
# $pkt will be undef if the DNS query was aborted (e.g. timed out)
dbg("lookup was aborted: $ent->{rulename} $ent->{key}");
return;
}
my $dnsmatch = $ent->{subtest} ? $ent->{subtest} : qr/^127\./;
my @answer = $pkt->answer;
foreach my $rr (@answer) {
if ($rr->address =~ $dnsmatch) {
dbg("$ent->{rulename}: $ent->{zone} hit '$ent->{value}'");
$ent->{value} =~ s/\@/[at]/g;
$pms->test_log($ent->{value});
$pms->got_hit($ent->{rulename}, '', ruletype => 'eval');
$pms->register_async_rule_finish($ent->{rulename});
return;
}
}
}
# Version features
sub has_hashbl_bodyre { 1 }
sub has_hashbl_emails { 1 }
sub has_hashbl_uris { 1 }
sub has_hashbl_ignore { 1 }
1;

View File

@ -0,0 +1,352 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::Hashcash - perform hashcash verification tests
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::Hashcash
=head1 DESCRIPTION
Hashcash is a payment system for email where CPU cycles used as the
basis for an e-cash system. This plugin makes it possible to use valid
hashcash tokens added by mail programs as a bonus for messages.
=cut
=head1 USER SETTINGS
=over 4
=item use_hashcash { 1 | 0 } (default: 1)
Whether to use hashcash, if it is available.
=cut
=item hashcash_accept user@example.com ...
Used to specify addresses that we accept HashCash tokens for. You should set
it to match all the addresses that you may receive mail at.
Like whitelist and blacklist entries, the addresses are file-glob-style
patterns, so C<friend@somewhere.com>, C<*@isp.com>, or C<*.domain.net> will all
work. Specifically, C<*> and C<?> are allowed, but all other metacharacters
are not. Regular expressions are not used for security reasons.
The sequence C<%u> is replaced with the current user's username, which
is useful for ISPs or multi-user domains.
Multiple addresses per line, separated by spaces, is OK. Multiple
C<hashcash_accept> lines is also OK.
=cut
=item hashcash_doublespend_path /path/to/file (default: ~/.spamassassin/hashcash_seen)
Path for HashCash double-spend database. HashCash tokens are only usable once,
so their use is tracked in this database to avoid providing a loophole.
By default, each user has their own, in their C<~/.spamassassin> directory with
mode 0700/0600. Note that once a token is 'spent' it is written to this file,
and double-spending of a hashcash token makes it invalid, so this is not
suitable for sharing between multiple users.
=cut
=item hashcash_doublespend_file_mode (default: 0700)
The file mode bits used for the HashCash double-spend database file.
Make sure you specify this using the 'x' mode bits set, as it may also be used
to create directories. However, if a file is created, the resulting file will
not have any execute bits set (the umask is set to 111).
=cut
package Mail::SpamAssassin::Plugin::Hashcash;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
use Errno qw(ENOENT EACCES);
use Fcntl;
use File::Path;
use File::Basename;
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
our @ISA = qw(Mail::SpamAssassin::Plugin);
use constant HAS_DB_FILE => eval { require DB_File; };
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("check_hashcash_value");
$self->register_eval_rule ("check_hashcash_double_spend");
$self->set_config($mailsaobject->{conf});
return $self;
}
###########################################################################
sub set_config {
my($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'use_hashcash',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'hashcash_doublespend_path',
default => '__userstate__/hashcash_seen',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
});
push(@cmds, {
setting => 'hashcash_doublespend_file_mode',
default => "0700",
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'hashcash_accept',
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
});
$conf->{parser}->register_commands(\@cmds);
}
###########################################################################
sub check_hashcash_value {
my ($self, $scanner, $valmin, $valmax) = @_;
my $val = $self->_run_hashcash($scanner);
return ($val >= $valmin && $val < $valmax);
}
sub check_hashcash_double_spend {
my ($self, $scanner) = @_;
$self->_run_hashcash($scanner);
return ($scanner->{hashcash_double_spent});
}
############################################################################
sub _run_hashcash {
my ($self, $scanner) = @_;
if (defined $scanner->{hashcash_value}) { return $scanner->{hashcash_value}; }
$scanner->{hashcash_value} = 0;
# X-Hashcash: 0:031118:camram-spam@camram.org:c068b58ade6dcbaf
# or:
# X-hashcash: 1:20:040803:hashcash@freelists.org::6dcdb3a3ad4e1b86:1519d
# X-hashcash: 1:20:040803:jm@jmason.org::6b484d06469ccb28:8838a
# X-hashcash: 1:20:040803:adam@cypherspace.org::a1cbc54bf0182ea8:5d6a0
# call down to {msg} so that we can get it as an array of
# individual headers
my @hdrs = $scanner->{msg}->get_header ("X-Hashcash");
if (scalar @hdrs == 0) {
@hdrs = $scanner->{msg}->get_header ("Hashcash");
}
foreach my $hc (@hdrs) {
my $value = $self->_run_hashcash_for_one_string($scanner, $hc);
if ($value) {
# remove the "double-spend" bool if we did find a usable string;
# this happens when one string is already spent, but another
# string has not yet been.
delete $scanner->{hashcash_double_spent};
return $value;
}
}
return 0;
}
sub _run_hashcash_for_one_string {
my ($self, $scanner, $hc) = @_;
if (!$hc) { return 0; }
$hc =~ s/\s+//gs; # remove whitespace from multiline, folded tokens
# untaint the string for paranoia, making sure not to allow \n \0 \' \"
if ($hc =~ /^[-A-Za-z0-9\xA0-\xFF:_\/\%\@\.\,\= \*\+\;]+$/) {
$hc = untaint_var($hc);
}
if (!$hc) { return 0; }
my ($ver, $bits, $date, $rsrc, $exts, $rand, $trial);
if ($hc =~ /^0:/) {
($ver, $date, $rsrc, $trial) = split (/:/, $hc, 4);
}
elsif ($hc =~ /^1:/) {
($ver, $bits, $date, $rsrc, $exts, $rand, $trial) =
split (/:/, $hc, 7);
# extensions are, as yet, unused by SpamAssassin
}
else {
dbg("hashcash: version $ver stamps not yet supported");
return 0;
}
if (!$trial) {
dbg("hashcash: no trial in stamp '$hc'");
return 0;
}
my $accept = $scanner->{conf}->{hashcash_accept};
if (!$self->_check_hashcash_resource ($scanner, $accept, $rsrc)) {
dbg("hashcash: resource $rsrc not accepted here");
return 0;
}
# get the hash collision from the token. Computing the hash collision
# is very easy (great!) -- just get SHA1(token) and count the 0 bits at
# the start of the SHA1 hash, according to the draft at
# http://www.hashcash.org/draft-hashcash.txt .
my $value = 0;
my $bitstring = unpack ("B*", sha1($hc));
$bitstring =~ /^(0+)/ and $value = length $1;
# hashcash v1 tokens: if the "claimed value" of the token is less than
# what the token actually contains (ie. token was accidentally generated
# with 24 bits instead of the claimed 20), then cut it down to just the
# claimed value. that way it's a bit tidier and more deterministic.
if ($bits && $value > $bits) {
$value = $bits;
}
dbg("hashcash: token value: $value");
if ($self->was_hashcash_token_double_spent ($scanner, $hc)) {
$scanner->{hashcash_double_spent} = 1;
return 0;
}
$scanner->{hashcash_value} = $value;
return $value;
}
sub was_hashcash_token_double_spent {
my ($self, $scanner, $token) = @_;
my $main = $self->{main};
if (!$main->{conf}->{hashcash_doublespend_path}) {
dbg("hashcash: hashcash_doublespend_path not defined or empty");
return 0;
}
if (!HAS_DB_FILE) {
dbg("hashcash: DB_File module not installed, cannot use double-spend db");
return 0;
}
my $path = $main->sed_path ($main->{conf}->{hashcash_doublespend_path});
my $parentdir = dirname ($path);
my $stat_errn = stat($parentdir) ? 0 : 0+$!;
if ($stat_errn == 0 && !-d _) {
dbg("hashcash: parent dir $parentdir exists but is not a directory");
} elsif ($stat_errn == ENOENT) {
# run in an eval(); if mkpath has no perms, it calls die()
eval {
mkpath ($parentdir, 0, (oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0777));
};
}
my %spenddb;
if (!tie %spenddb, "DB_File", $path, O_RDWR|O_CREAT,
(oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0666))
{
dbg("hashcash: failed to tie to $path: $@ $!");
# not a serious error. TODO?
return 0;
}
if (exists $spenddb{$token}) {
untie %spenddb;
dbg("hashcash: token '$token' spent already");
return 1;
}
$spenddb{$token} = time;
dbg("hashcash: marking token '$token' as spent");
# TODO: expiry?
untie %spenddb;
return 0;
}
sub _check_hashcash_resource {
my ($self, $scanner, $list, $addr) = @_;
$addr = lc $addr;
if (defined ($list->{$addr})) { return 1; }
study $addr; # study is a no-op since perl 5.16.0, eliminating related bugs
foreach my $regexp (values %{$list})
{
# allow %u == current username
# \\ is added by $conf->add_to_addrlist()
$regexp =~ s/\\\%u/$scanner->{main}->{username}/gs;
if ($addr =~ /$regexp/i) {
return 1;
}
}
# TODO: use "To" and "Cc" addresses gleaned from the mails in the Bayes
# database trained as ham, as well.
return 0;
}
############################################################################
1;
=back
=cut

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,409 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
# -------------------------------------------------------
# ImageInfo Plugin for SpamAssassin
# Version: 0.7
# Created: 2006-08-02
# Modified: 2007-01-17
#
# Changes:
# 0.7 - added image_name_regex to allow pattern matching on the image name
# - added support for image/pjpeg content types (progressive jpeg)
# - updated imageinfo.cf with a few sample rules for using image_name_regex()
# 0.6 - fixed dems_ bug in image_size_range_
# 0.5 - added image_named and image_to_text_ratio
# 0.4 - added image_size_exact and image_size_range
# 0.3 - added jpeg support
# 0.2 - optimized by theo
# 0.1 - added gif/png support
#
#
# Usage:
# image_count()
#
# body RULENAME eval:image_count(<type>,<min>,[max])
# type: 'all','gif','png', or 'jpeg'
# min: required, message contains at least this
# many images
# max: optional, if specified, message must not
# contain more than this number of images
#
# image_count() examples
#
# body ONE_IMAGE eval:image_count('all',1,1)
# body ONE_OR_MORE_IMAGES eval:image_count('all',1)
# body ONE_PNG eval:image_count('png',1,1)
# body TWO_GIFS eval:image_count('gif',2,2)
# body MANY_JPEGS eval:image_count('gif',5)
#
# pixel_coverage()
#
# body RULENAME eval:pixel_coverage(<type>,<min>,[max])
# type: 'all','gif','png', or 'jpeg'
# min: required, message contains at least this
# much pixel area
# max: optional, if specified, message must not
# contain more than this much pixel area
#
# pixel_coverage() examples
#
# body LARGE_IMAGE_AREA eval:pixel_coverage('all',150000) # catches any images that are 150k pixel/sq or higher
# body SMALL_GIF_AREA eval:pixel_coverage('gif',1,40000) # catches only gifs that 1 to 40k pixel/sql
#
# image_name_regex()
#
# body RULENAME eval:image_name_regex(<regex>)
# regex: full quoted regexp, see examples below
#
# image_name_regex() examples
#
# body CG_DOUBLEDOT_GIF eval:image_name_regex('/^\w{2,9}\.\.gif$/i') # catches double dot gifs abcd..gif
#
#
#
# -------------------------------------------------------
package Mail::SpamAssassin::Plugin::ImageInfo;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("image_count");
$self->register_eval_rule ("pixel_coverage");
$self->register_eval_rule ("image_size_exact");
$self->register_eval_rule ("image_size_range");
$self->register_eval_rule ("image_named");
$self->register_eval_rule ("image_name_regex");
$self->register_eval_rule ("image_to_text_ratio");
return $self;
}
# -----------------------------------------
my %get_details = (
'gif' => sub {
my ($pms, $part) = @_;
my $header = $part->decode(13);
# make sure this is actually a valid gif..
return unless $header =~ s/^GIF(8[79]a)//;
my $version = $1;
my ($width, $height, $packed, $bgcolor, $aspect) = unpack("vvCCC", $header);
my $color_table_size = 1 << (($packed & 0x07) + 1);
# for future enhancements
#my $global_color_table = $packed & 0x80;
#my $has_global_color_table = $global_color_table ? 1 : 0;
#my $sorted_colors = ($packed & 0x08)?1:0;
#my $resolution = ((($packed & 0x70) >> 4) + 1);
if ($height && $width) {
my $area = $width * $height;
$pms->{imageinfo}->{pc_gif} += $area;
$pms->{imageinfo}->{dems_gif}->{"${height}x${width}"} = 1;
$pms->{imageinfo}->{names_all}->{$part->{'name'}} = 1 if $part->{'name'};
dbg("imageinfo: gif image ".($part->{'name'} ? $part->{'name'} : '')." is $height x $width pixels ($area pixels sq.), with $color_table_size color table");
}
},
'png' => sub {
my ($pms, $part) = @_;
my $data = $part->decode();
return unless (substr($data, 0, 8) eq "\x89PNG\x0d\x0a\x1a\x0a");
my $datalen = length $data;
my $pos = 8;
my $chunksize = 8;
my ($width, $height) = ( 0, 0 );
my ($depth, $ctype, $compression, $filter, $interlace);
while ($pos < $datalen) {
my ($len, $type) = unpack("Na4", substr($data, $pos, $chunksize));
$pos += $chunksize;
last if $type eq "IEND"; # end of png image.
next unless ( $type eq "IHDR" && $len == 13 );
my $bytes = substr($data, $pos, $len + 4);
my $crc = unpack("N", substr($bytes, -4, 4, ""));
if ($type eq "IHDR" && $len == 13) {
($width, $height, $depth, $ctype, $compression, $filter, $interlace) = unpack("NNCCCCC", $bytes);
last;
}
}
if ($height && $width) {
my $area = $width * $height;
$pms->{imageinfo}->{pc_png} += $area;
$pms->{imageinfo}->{dems_png}->{"${height}x${width}"} = 1;
$pms->{imageinfo}->{names_all}->{$part->{'name'}} = 1 if $part->{'name'};
dbg("imageinfo: png image ".($part->{'name'} ? $part->{'name'} : '')." is $height x $width pixels ($area pixels sq.)");
}
},
'jpeg' => sub {
my ($pms, $part) = @_;
my $data = $part->decode();
my $index = substr($data, 0, 2);
return unless $index eq "\xFF\xD8";
my $pos = 2;
my $chunksize = 4;
my ($prec, $height, $width, $comps) = (undef,0,0,undef);
while (1) {
my ($xx, $mark, $len) = unpack("CCn", substr($data, $pos, $chunksize));
last if (!defined $xx || $xx != 0xFF);
last if (!defined $mark || $mark == 0xDA || $mark == 0xD9);
last if (!defined $len || $len < 2);
$pos += $chunksize;
my $block = substr($data, $pos, $len - 2);
my $blocklen = length($block);
if ( ($mark >= 0xC0 && $mark <= 0xC3) || ($mark >= 0xC5 && $mark <= 0xC7) ||
($mark >= 0xC9 && $mark <= 0xCB) || ($mark >= 0xCD && $mark <= 0xCF) ) {
($prec, $height, $width, $comps) = unpack("CnnC", substr($block, 0, 6, ""));
last;
}
$pos += $blocklen;
}
if ($height && $width) {
my $area = $height * $width;
$pms->{imageinfo}->{pc_jpeg} += $area;
$pms->{imageinfo}->{dems_jpeg}->{"${height}x${width}"} = 1;
$pms->{imageinfo}->{names_all}->{$part->{'name'}} = 1 if $part->{'name'};
dbg("imageinfo: jpeg image ".($part->{'name'} ? $part->{'name'} : '')." is $height x $width pixels ($area pixels sq.)");
}
},
);
sub _get_images {
my ($self,$pms) = @_;
my $result = 0;
foreach my $type ( 'all', keys %get_details ) {
$pms->{'imageinfo'}->{"pc_$type"} = 0;
$pms->{'imageinfo'}->{"count_$type"} = 0;
}
foreach my $p ($pms->{msg}->find_parts(qr@^image/(?:gif|png|jpe?g)$@, 1)) {
# make sure its base64 encoded
my $cte = lc($p->get_header('content-transfer-encoding') || '');
next if ($cte !~ /^base64$/);
my ($type) = $p->{'type'} =~ m@/(\w+)$@;
$type = 'jpeg' if $type eq 'jpg';
if ($type && exists $get_details{$type}) {
$get_details{$type}->($pms,$p);
$pms->{'imageinfo'}->{"count_$type"} ++;
}
}
foreach my $name ( keys %{$pms->{'imageinfo'}->{"names_all"}} ) {
dbg("imageinfo: image name $name found");
}
foreach my $type ( keys %get_details ) {
$pms->{'imageinfo'}->{'pc_all'} += $pms->{'imageinfo'}->{"pc_$type"};
$pms->{'imageinfo'}->{'count_all'} += $pms->{'imageinfo'}->{"count_$type"};
foreach my $dem ( keys %{$pms->{'imageinfo'}->{"dems_$type"}} ) {
dbg("imageinfo: adding $dem to dems_all");
$pms->{'imageinfo'}->{'dems_all'}->{$dem} = 1;
}
}
}
# -----------------------------------------
sub image_named {
my ($self,$pms,$body,$name) = @_;
return unless (defined $name);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
return 0 unless (exists $pms->{'imageinfo'}->{"names_all"});
return 1 if (exists $pms->{'imageinfo'}->{"names_all"}->{$name});
return 0;
}
# -----------------------------------------
sub image_name_regex {
my ($self,$pms,$body,$re) = @_;
return unless (defined $re);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
return 0 unless (exists $pms->{'imageinfo'}->{"names_all"});
my $hit = 0;
foreach my $name (keys %{$pms->{'imageinfo'}->{"names_all"}}) {
dbg("imageinfo: checking image named $name against regex $re");
if (eval { $name =~ /$re/ }) { $hit = 1 }
dbg("imageinfo: error in regex /$re/ - $@") if $@;
if ($hit) {
dbg("imageinfo: image_name_regex hit on $name");
return 1;
}
}
return 0;
}
# -----------------------------------------
sub image_count {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
# dbg("imageinfo: count: $min, ".($max ? $max:'').", $type, ".$pms->{'imageinfo'}->{"count_$type"});
return result_check($min, $max, $pms->{'imageinfo'}->{"count_$type"});
}
# -----------------------------------------
sub pixel_coverage {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless (defined $type && defined $min);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
# dbg("imageinfo: pc_$type: $min, ".($max ? $max:'').", $type, ".$pms->{'imageinfo'}->{"pc_$type"});
return result_check($min, $max, $pms->{'imageinfo'}->{"pc_$type"});
}
# -----------------------------------------
sub image_to_text_ratio {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless (defined $type && defined $min && defined $max);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
# depending on how you call this eval (body vs rawbody),
# the $textlen will differ.
my $textlen = length(join('',@$body));
return 0 unless ( $textlen > 0 && exists $pms->{'imageinfo'}->{"pc_$type"} && $pms->{'imageinfo'}->{"pc_$type"} > 0);
my $ratio = $textlen / $pms->{'imageinfo'}->{"pc_$type"};
dbg("imageinfo: image ratio=$ratio, min=$min max=$max");
return result_check($min, $max, $ratio, 1);
}
# -----------------------------------------
sub image_size_exact {
my ($self,$pms,$body,$type,$height,$width) = @_;
return unless (defined $type && defined $height && defined $width);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
return 0 unless (exists $pms->{'imageinfo'}->{"dems_$type"});
return 1 if (exists $pms->{'imageinfo'}->{"dems_$type"}->{"${height}x${width}"});
return 0;
}
# -----------------------------------------
sub image_size_range {
my ($self,$pms,$body,$type,$minh,$minw,$maxh,$maxw) = @_;
return unless (defined $type && defined $minh && defined $minw);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
$self->_get_images($pms);
}
my $name = 'dems_'.$type;
return unless (exists $pms->{'imageinfo'}->{$name});
foreach my $dem ( keys %{$pms->{'imageinfo'}->{"dems_$type"}}) {
my ($h,$w) = split(/x/,$dem);
next if ($h < $minh); # height less than min height
next if ($w < $minw); # width less than min width
next if (defined $maxh && $h > $maxh); # height more than max height
next if (defined $maxw && $w > $maxw); # width more than max width
# if we make it here, we have a match
return 1;
}
return 0;
}
# -----------------------------------------
sub result_check {
my ($min, $max, $value, $nomaxequal) = @_;
return 0 unless defined $value;
return 0 if ($value < $min);
return 0 if (defined $max && $value > $max);
return 0 if (defined $nomaxequal && $nomaxequal && $value == $max);
return 1;
}
# -----------------------------------------
1;

View File

@ -0,0 +1,683 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
MIMEEval - perform various tests against MIME structure and body
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::MIMEEval
body NAME_OF_RULE eval:check_for_mime
body NAME_OF_RULE eval:check_for_mime_html
body NAME_OF_RULE eval:check_for_mime_html_only
body NAME_OF_RULE eval:check_mime_multipart_ratio
body NAME_OF_RULE eval:check_msg_parse_flags
body NAME_OF_RULE eval:check_for_ascii_text_illegal
body NAME_OF_RULE eval:check_abundant_unicode_ratio
body NAME_OF_RULE eval:check_for_faraway_charset
body NAME_OF_RULE eval:check_for_uppercase
body NAME_OF_RULE eval:check_ma_non_text
body NAME_OF_RULE eval:check_base64_length
body NAME_OF_RULE eval:check_qp_ratio
=head1 DESCRIPTION
Perform various tests against MIME structure and body.
=cut
package Mail::SpamAssassin::Plugin::MIMEEval;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Locales;
use Mail::SpamAssassin::Constants qw(:sa CHARSETS_LIKELY_TO_FP_AS_CAPS);
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_for_mime");
$self->register_eval_rule("check_for_mime_html");
$self->register_eval_rule("check_for_mime_html_only");
$self->register_eval_rule("check_mime_multipart_ratio");
$self->register_eval_rule("check_msg_parse_flags");
$self->register_eval_rule("check_for_ascii_text_illegal");
$self->register_eval_rule("check_abundant_unicode_ratio");
$self->register_eval_rule("check_for_faraway_charset");
$self->register_eval_rule("check_for_uppercase");
$self->register_eval_rule("check_ma_non_text");
$self->register_eval_rule("check_base64_length");
$self->register_eval_rule("check_qp_ratio");
return $self;
}
###########################################################################
sub are_more_high_bits_set {
my ($self, $str) = @_;
# TODO: I suspect a tr// trick may be faster here
my $numhis = () = ($str =~ /[\200-\377]/g);
my $numlos = length($str) - $numhis;
($numlos <= $numhis && $numhis > 3);
}
=over 4
=item has_check_for_ascii_text_illegal
Adds capability check for "if can()" for check_for_ascii_text_illegal
=cut
sub has_check_for_ascii_text_illegal { 1 }
=item check_for_ascii_text_illegal
If a MIME part claims to be text/plain or text/plain;charset=us-ascii and the Content-Transfer-Encoding is 7bit (either explicitly or by default), then we should enforce the actual text being only TAB, NL, SPACE through TILDE, i.e. all 7bit characters excluding NO-WS-CTL (per RFC-2822).
All mainstream MTA's get this right.
=cut
sub check_for_ascii_text_illegal {
my ($self, $pms) = @_;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_ascii_text_illegal};
return ($pms->{mime_ascii_text_illegal} > 0);
}
=item has_check_abundant_unicode_ratio
Adds capability check for "if can()" for check_abundant_unicode_ratio
=cut
sub has_check_abundant_unicode_ratio { 1 }
=item check_abundant_unicode_ratio
A MIME part claiming to be text/plain and containing Unicode characters must be encoded as quoted-printable or base64, or use UTF data coding (typically with 8bit encoding). Any message in 7bit or 8bit encoding containing (HTML) Unicode entities will not render them as Unicode, but literally.
Thus a few such sequences might occur on a mailing list of developers discussing such characters, but a message with a high density of such characters is likely spam.
=cut
sub check_abundant_unicode_ratio {
my ($self, $pms, undef, $ratio) = @_;
# validate ratio?
return 0 unless ($ratio =~ /^\d{0,3}\.\d{1,3}$/);
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_text_unicode_ratio};
return ($pms->{mime_text_unicode_ratio} >= $ratio);
}
sub check_for_faraway_charset {
my ($self, $pms, $body) = @_;
my $type = $pms->get('Content-Type',undef);
my @locales = Mail::SpamAssassin::Util::get_my_locales($self->{main}->{conf}->{ok_locales});
return 0 if grep { $_ eq "all" } @locales;
$type = get_charset_from_ct_line($type) if defined $type;
if (defined $type &&
!Mail::SpamAssassin::Locales::is_charset_ok_for_locales
($type, @locales))
{
# sanity check. Some charsets (e.g. koi8-r) include the ASCII
# 7-bit charset as well, so make sure we actually have a high
# number of 8-bit chars in the body text first.
$body = join("\n", @$body);
if ($self->are_more_high_bits_set ($body)) {
return 1;
}
}
0;
}
sub check_for_mime {
my ($self, $pms, undef, $test) = @_;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{$test};
return $pms->{$test};
}
# any text/html MIME part
sub check_for_mime_html {
my ($self, $pms) = @_;
my $ctype = $pms->get('Content-Type');
return 1 if $ctype =~ m{^text/html}i;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_body_html_count};
return ($pms->{mime_body_html_count} > 0);
}
# HTML without some other type of MIME text part
sub check_for_mime_html_only {
my ($self, $pms) = @_;
my $ctype = $pms->get('Content-Type');
return 1 if $ctype =~ m{^text/html}i;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_body_html_count};
return 0 unless exists $pms->{mime_body_text_count};
return ($pms->{mime_body_html_count} > 0 &&
$pms->{mime_body_text_count} == 0);
}
sub check_mime_multipart_ratio {
my ($self, $pms, undef, $min, $max) = @_;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_multipart_ratio};
return ($pms->{mime_multipart_ratio} >= $min &&
$pms->{mime_multipart_ratio} < $max);
}
sub _check_mime_header {
my ($self, $pms, $ctype, $cte, $cd, $charset, $name) = @_;
$charset ||= '';
if ($ctype eq 'text/html') {
$pms->{mime_body_html_count}++;
}
elsif ($ctype =~ m@^text@i) {
$pms->{mime_body_text_count}++;
}
if ($cte =~ /base64/) {
$pms->{mime_base64_count}++;
}
elsif ($cte =~ /quoted-printable/) {
$pms->{mime_qp_count}++;
}
if ($cd && $cd =~ /attachment/) {
$pms->{mime_attachment}++;
}
if ($ctype =~ /^text/ &&
$cte =~ /base64/ &&
(!$charset || $charset =~ /(?:us-ascii|ansi_x3\.4-1968|iso-ir-6|ansi_x3\.4-1986|iso_646\.irv:1991|ascii|iso646-us|us|ibm367|cp367|csascii)/) &&
!($cd && $cd =~ /^(?:attachment|inline)/))
{
$pms->{mime_base64_encoded_text} = 1;
}
if ($charset =~ /iso-\S+-\S+\b/i &&
$charset !~ /iso-(?:8859-\d{1,2}|2022-(?:jp|kr))\b/)
{
$pms->{mime_bad_iso_charset} = 1;
}
# MIME_BASE64_LATIN: now a zero-hitter
# if (!$name &&
# $cte =~ /base64/ &&
# $charset =~ /\b(?:us-ascii|iso-8859-(?:[12349]|1[0345])|windows-(?:125[0247]))\b/)
# {
# $pms->{mime_base64_latin} = 1;
# }
# MIME_QP_NO_CHARSET: now a zero-hitter
# if ($cte =~ /quoted-printable/ && $cd =~ /inline/ && !$charset) {
# $pms->{mime_qp_inline_no_charset} = 1;
# }
# MIME_HTML_NO_CHARSET: now a zero-hitter
# if ($ctype eq 'text/html' &&
# !(defined($charset) && $charset) &&
# !($cd && $cd =~ /^(?:attachment|inline)/))
# {
# $pms->{mime_html_no_charset} = 1;
# }
if ($charset =~ /[a-z]/i) {
if (defined $pms->{mime_html_charsets}) {
$pms->{mime_html_charsets} .= " ".$charset;
} else {
$pms->{mime_html_charsets} = $charset;
}
if (! $pms->{mime_faraway_charset}) {
my @l = Mail::SpamAssassin::Util::get_my_locales($self->{main}->{conf}->{ok_locales});
if (!(grep { $_ eq "all" } @l) &&
!Mail::SpamAssassin::Locales::is_charset_ok_for_locales($charset, @l))
{
$pms->{mime_faraway_charset} = 1;
}
}
}
}
sub _check_attachments {
my ($self, $pms) = @_;
# MIME status
my $where = -1; # -1 = start, 0 = nowhere, 1 = header, 2 = body
my $qp_bytes = 0; # total bytes in QP regions
my $qp_count = 0; # QP-encoded bytes in QP regions
my @part_bytes; # MIME part total bytes
my @part_type; # MIME part types
my $normal_chars = 0; # MIME text bytes that aren't encoded
my $unicode_chars = 0; # MIME text bytes that are unicode entities
# MIME header information
my $part = -1; # MIME part index
# indicate the scan has taken place
$pms->{mime_checked_attachments} = 1;
# results
# $pms->{mime_base64_blanks} = 0; # expensive to determine, no longer avail
$pms->{mime_base64_count} = 0;
$pms->{mime_base64_encoded_text} = 0;
# $pms->{mime_base64_illegal} = 0;
# $pms->{mime_base64_latin} = 0;
$pms->{mime_body_html_count} = 0;
$pms->{mime_body_text_count} = 0;
$pms->{mime_faraway_charset} = 0;
# $pms->{mime_html_no_charset} = 0;
$pms->{mime_missing_boundary} = 0;
$pms->{mime_multipart_alternative} = 0;
$pms->{mime_multipart_ratio} = 1.0;
$pms->{mime_qp_count} = 0;
# $pms->{mime_qp_illegal} = 0;
# $pms->{mime_qp_inline_no_charset} = 0;
$pms->{mime_qp_long_line} = 0;
$pms->{mime_qp_ratio} = 0;
$pms->{mime_ascii_text_illegal} = 0;
$pms->{mime_text_unicode_ratio} = 0;
# Get all parts ...
foreach my $p ($pms->{msg}->find_parts(qr/./)) {
# message headers
my ($ctype, $boundary, $charset, $name) = Mail::SpamAssassin::Util::parse_content_type($p->get_header("content-type"));
if ($ctype eq 'multipart/alternative') {
$pms->{mime_multipart_alternative} = 1;
}
my $cte = $p->get_header('Content-Transfer-Encoding') || '';
chomp($cte = defined($cte) ? lc $cte : "");
my $cd = $p->get_header('Content-Disposition') || '';
chomp($cd = defined($cd) ? lc $cd : "");
$charset = lc $charset if ($charset);
$name = lc $name if ($name);
$self->_check_mime_header($pms, $ctype, $cte, $cd, $charset, $name);
# If we're not in a leaf node in the tree, there will be no raw
# section, so skip it.
if (! $p->is_leaf()) {
next;
}
$part++;
$part_type[$part] = $ctype;
$part_bytes[$part] = 0 if $cd !~ /attachment/;
my $cte_is_base64 = $cte =~ /base64/i;
my $previous = '';
foreach (@{$p->raw()}) {
# if ($cte_is_base64) {
# if ($previous =~ /^\s*$/ && /^\s*$/) { # expensive, avoid!
# $pms->{mime_base64_blanks} = 1; # never used, don't bother
# }
# # MIME_BASE64_ILLEGAL: now a zero-hitter
# # if (m@[^A-Za-z0-9+/=\n]@ || /=[^=\s]/) {
# # $pms->{mime_base64_illegal} = 1;
# # }
# }
# if ($pms->{mime_html_no_charset} && $ctype eq 'text/html' && defined $charset) {
# $pms->{mime_html_no_charset} = 0;
# }
if ($pms->{mime_multipart_alternative} && $cd !~ /attachment/ &&
($ctype eq 'text/plain' || $ctype eq 'text/html')) {
$part_bytes[$part] += length;
}
if ($where != 1 && $cte eq "quoted-printable" && ! /^SPAM: /) {
# RFC 5322: Each line SHOULD be no more than 78 characters,
# excluding the CRLF.
# RFC 2045: The Quoted-Printable encoding REQUIRES that
# encoded lines be no more than 76 characters long.
# Bug 5491: 6% of email classified as HAM by SA triggered the
# MIME_QP_LONG_LINE rule. Apple Mail can generate a QP-line
# that is 2 chars too long. Same goes for Outlook Web Access.
# lines include one trailing \n character
# if (length > 76+1) { # conforms to RFC 5322 and RFC 2045
if (length > 78+1) { # conforms to RFC 5322 only, not RFC 2045
$pms->{mime_qp_long_line} = 1;
}
$qp_bytes += length;
# MIME_QP_DEFICIENT: zero-hitter now
# check for illegal substrings (RFC 2045), hexadecimal values 7F-FF and
# control characters other than TAB, or CR and LF as parts of CRLF pairs
# if (!$pms->{mime_qp_illegal} && /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff]/)
# {
# $pms->{mime_qp_illegal} = 1;
# }
# count excessive QP bytes
if (index($_, '=') != -1) {
# whoever wrote this next line is an evil hacker -- jm
my $qp = () = m/=(?:09|3[0-9ABCEF]|[2456][0-9A-F]|7[0-9A-E])/g;
if ($qp) {
$qp_count += $qp;
# tabs and spaces at end of encoded line are okay. Also, multiple
# whitespace at the end of a line are OK, like ">=20=20=20=20=20=20".
my ($trailing) = m/((?:=09|=20)+)\s*$/g;
if ($trailing) {
$qp_count -= (length($trailing) / 3);
}
}
}
}
# if our charset is ASCII, this should only contain 7-bit characters
# except NUL or a free-standing CR. anything else is a violation of
# the definition of charset="us-ascii".
if ($ctype eq 'text/plain' && (!defined $charset || $charset eq 'us-ascii')) {
# no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
if (m/[\x00\x0d\x80-\xff]+/) {
if (would_log('dbg', 'eval')) {
my $str = $_;
$str =~ s/([\x00\x0d\x80-\xff]+)/'<' . unpack('H*', $1) . '>'/eg;
dbg("check: ascii_text_illegal: matches " . $str . "\n");
}
$pms->{mime_ascii_text_illegal}++;
}
}
# if we're text/plain, we should never see unicode escapes in this
# format, especially not for 7bit or 8bit.
if ($ctype eq 'text/plain' && ($cte eq '' || $cte eq '7bit' || $cte eq '8bit')) {
my ($text, $subs) = $_;
$subs = $text =~ s/&#x[0-9A-F]{4};//g;
$normal_chars += length($text);
$unicode_chars += $subs;
if ($subs && would_log('dbg', 'eval')) {
my $str = $_;
$str = substr($str, 0, 512) . '...' if (length($str) > 512);
dbg("check: abundant_unicode: " . $str . " (" . $subs . ")\n");
}
}
$previous = $_;
}
}
if ($qp_bytes) {
$pms->{mime_qp_ratio} = $qp_count / $qp_bytes;
$pms->{mime_qp_count} = $qp_count;
$pms->{mime_qp_bytes} = $qp_bytes;
}
if ($normal_chars) {
$pms->{mime_text_unicode_ratio} = $unicode_chars / $normal_chars;
}
if ($pms->{mime_multipart_alternative}) {
my $text;
my $html;
# bug 4207: we want the size of the last parts
for (my $i = $part; $i >= 0; $i--) {
next if !defined $part_bytes[$i];
if (!defined($html) && $part_type[$i] eq 'text/html') {
$html = $part_bytes[$i];
}
elsif (!defined($text) && $part_type[$i] eq 'text/plain') {
$text = $part_bytes[$i];
}
last if (defined($html) && defined($text));
}
if (defined($text) && defined($html) && $html > 0) {
$pms->{mime_multipart_ratio} = ($text / $html);
}
}
# Look to see if any multipart boundaries are not "balanced"
foreach my $val (values %{$pms->{msg}->{mime_boundary_state}}) {
if ($val != 0) {
$pms->{mime_missing_boundary} = 1;
last;
}
}
}
=item has_check_qp_ratio
Adds capability check for "if can()" for check_qp_ratio
=cut
sub has_check_qp_ratio { 1 }
=item check_qp_ratio
Takes a min ratio to use in eval to see if there is an spamminess to the ratio of
quoted printable to total bytes in an email.
=back
=cut
sub check_qp_ratio {
my ($self, $pms, undef, $min) = @_;
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{mime_qp_ratio};
my $qp_ratio = $pms->{mime_qp_ratio};
dbg("eval: qp_ratio - %s - check for min of %s", $qp_ratio, $min);
return (defined $qp_ratio && $qp_ratio >= $min) ? 1 : 0;
}
sub check_msg_parse_flags {
my($self, $pms, $type, $type2) = @_;
$type = $type2 if ref($type);
return defined $pms->{msg}->{$type};
}
sub check_for_uppercase {
my ($self, $pms, $body, $min, $max) = @_;
local ($_);
if (exists $pms->{uppercase}) {
return ($pms->{uppercase} > $min && $pms->{uppercase} <= $max);
}
if ($self->body_charset_is_likely_to_fp($pms)) {
$pms->{uppercase} = 0; return 0;
}
# Dec 20 2002 jm: trade off some speed for low memory footprint, by
# iterating over the array computing sums, instead of joining the
# array into a giant string and working from that.
my $len = 0;
my $lower = 0;
my $upper = 0;
foreach (@{$body}) {
# examine lines in the body that have an intermediate space
next unless /\S\s+\S/;
# strip out lingering base64 (currently possible for forwarded messages)
next if /^(?:[A-Za-z0-9+\/=]{60,76} ){2}/;
my $line = $_; # copy so we don't muck up the original
# remove shift-JIS charset codes
$line =~ s/\x1b\$B.*\x1b\(B//gs;
$len += length($line);
# count numerals as lower case, otherwise 'date|mail' is spam
$lower += ($line =~ tr/a-z0-9//d);
$upper += ($line =~ tr/A-Z//);
}
# report only on mails above a minimum size; otherwise one
# or two acronyms can throw it off
if ($len < 200) {
$pms->{uppercase} = 0;
return 0;
}
if (($upper + $lower) == 0) {
$pms->{uppercase} = 0;
} else {
$pms->{uppercase} = ($upper / ($upper + $lower)) * 100;
}
return ($pms->{uppercase} > $min && $pms->{uppercase} <= $max);
}
sub body_charset_is_likely_to_fp {
my ($self, $pms) = @_;
# check for charsets where this test will FP -- iso-2022-jp, gb2312,
# koi8-r etc.
#
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
my @charsets;
my $type = $pms->get('Content-Type',undef);
$type = get_charset_from_ct_line($type) if defined $type;
push (@charsets, $type) if defined $type;
if (defined $pms->{mime_html_charsets}) {
push (@charsets, split(' ', $pms->{mime_html_charsets}));
}
my $CHARSETS_LIKELY_TO_FP_AS_CAPS = CHARSETS_LIKELY_TO_FP_AS_CAPS;
foreach my $charset (@charsets) {
if ($charset =~ /^${CHARSETS_LIKELY_TO_FP_AS_CAPS}$/) {
return 1;
}
}
return 0;
}
sub get_charset_from_ct_line {
my $type = shift;
if (!defined $type) { return; }
if ($type =~ /charset="([^"]+)"/i) { return $1; }
if ($type =~ /charset='([^']+)'/i) { return $1; }
if ($type =~ /charset=(\S+)/i) { return $1; }
return;
}
# came up on the users@ list, look for multipart/alternative parts which
# include non-text parts -- skip certain types which occur normally in ham
sub check_ma_non_text {
my($self, $pms) = @_;
foreach my $map ($pms->{msg}->find_parts(qr@^multipart/alternative$@i)) {
foreach my $p ($map->find_parts(qr/./, 1, 0)) {
next if (lc $p->{'type'} eq 'multipart/related');
next if (lc $p->{'type'} eq 'application/rtf');
next if ($p->{'type'} =~ m@^text/@i);
return 1;
}
}
return 0;
}
sub check_base64_length {
my $self = shift;
my $pms = shift;
shift; # body array, unnecessary
my $min = shift;
my $max = shift;
if (!defined $pms->{base64_length}) {
$pms->{base64_length} = $self->_check_base64_length($pms->{msg});
}
return 0 if (defined $max && $pms->{base64_length} > $max);
return $pms->{base64_length} >= $min;
}
sub _check_base64_length {
my $self = shift;
my $msg = shift;
my $result = 0;
foreach my $p ($msg->find_parts(qr@.@, 1)) {
my $ctype=
Mail::SpamAssassin::Util::parse_content_type($p->get_header('content-type'));
# FPs from Google Calendar invites, etc.
# perhaps just limit to test, and image?
next if ($ctype eq 'application/ics');
my $cte = lc($p->get_header('content-transfer-encoding') || '');
next if ($cte !~ /^base64$/);
foreach my $l ( @{$p->raw()} ) {
$result = length $l if length $l > $result;
}
}
return $result;
}
1;

View File

@ -0,0 +1,227 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
MIMEHeader - perform regexp tests against MIME headers
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
mimeheader NAME_OF_RULE Content-Id =~ /foo/
=head1 DESCRIPTION
This plugin allows regexp rules to be written against MIME headers in the
message.
=head1 RULE DEFINITIONS AND PRIVILEGED SETTINGS
=over 4
=item mimeheader NAME_OF_RULE Header-Name =~ /pattern/modifiers
Specify a rule. C<NAME_OF_RULE> is the name of the rule to be used,
C<Header-Name> is the name of the MIME header to check, and
C</pattern/modifiers> is the Perl regular expression to match against this.
Note that in a message of multiple parts, each header will be checked
against the pattern separately. In other words, if multiple parts
have a 'Content-Type' header, each header's value will be tested
individually as a separate string.
Header names are considered case-insensitive.
The header values are normally cleaned up a little; for example, whitespace
around the newline character in "folded" headers will be replaced with a single
space. Append C<:raw> to the header name to retrieve the raw, undecoded value,
including pristine whitespace, instead.
=back
=cut
package Mail::SpamAssassin::Plugin::MIMEHeader;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var compile_regexp);
use Mail::SpamAssassin::Constants qw(:sa);
our @ISA = qw(Mail::SpamAssassin::Plugin);
our @TEMPORARY_METHODS;
my $RULENAME_RE = RULENAME_RE;
# ---------------------------------------------------------------------------
# constructor
sub new {
my $class = shift;
my $samain = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($samain);
bless ($self, $class);
$self->set_config($samain->{conf});
return $self;
}
# ---------------------------------------------------------------------------
sub set_config {
my($self, $conf) = @_;
my @cmds;
my $pluginobj = $self; # allow use inside the closure below
push (@cmds, {
setting => 'mimeheader',
is_priv => 1,
code => sub {
my ($self, $key, $value, $line) = @_;
local ($1,$2,$3);
if ($value !~ s/^(${RULENAME_RE})\s+//) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my $rulename = untaint_var($1);
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
# Take :raw to hdrname!
if ($value !~ /^([^:\s]+(?:\:(?:raw)?)?)\s*([=!]~)\s*(.+)$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my $hdrname = $1;
my $negated = $2 eq '!~' ? 1 : 0;
my $pattern = $3;
$hdrname =~ s/:$//;
my $if_unset = '';
if ($pattern =~ s/\s+\[if-unset:\s+(.+)\]$//) {
$if_unset = $1;
}
my ($rec, $err) = compile_regexp($pattern, 1);
if (!$rec) {
info("mimeheader: invalid regexp for $rulename '$pattern': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{mimeheader_tests}->{$rulename} = {
hdr => $hdrname,
negated => $negated,
if_unset => $if_unset,
pattern => $rec
};
# now here's a hack; generate a fake eval rule function to
# call this rule's _real_ code!
# TODO: we should have a more elegant way for new rule types to
# be defined
my $evalfn = "_mimeheader_eval_$rulename";
# don't redefine the subroutine if it already exists!
# this causes lots of annoying warnings and such during things like
# "make test".
return if (defined &{'Mail::SpamAssassin::Plugin::MIMEHeader::'.$evalfn});
$self->{parser}->add_test($rulename, $evalfn."()",
$Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
# evalfn/rulename safe, sanitized by $RULENAME_RE
my $evalcode = '
sub Mail::SpamAssassin::Plugin::MIMEHeader::'.$evalfn.' {
$_[0]->eval_hook_called($_[1], q{'.$rulename.'});
}
';
eval
$evalcode . '; 1'
or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn "mimeheader: plugin error: $eval_stat\n";
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
};
$pluginobj->register_eval_rule($evalfn);
push @TEMPORARY_METHODS, "Mail::SpamAssassin::Plugin::MIMEHeader::${evalfn}";
}
});
$conf->{parser}->register_commands(\@cmds);
}
# ---------------------------------------------------------------------------
sub eval_hook_called {
my ($pobj, $scanner, $rulename) = @_;
my $rule = $scanner->{conf}->{mimeheader_tests}->{$rulename};
my $hdr = $rule->{hdr};
my $negated = $rule->{negated};
my $if_unset = $rule->{if_unset};
my $pattern = $rule->{pattern};
my $getraw;
if ($hdr =~ s/:raw$//) {
$getraw = 1;
} else {
$getraw = 0;
}
foreach my $p ($scanner->{msg}->find_parts(qr/./)) {
my $val;
if ($getraw) {
$val = $p->raw_header($hdr);
} else {
$val = $p->get_header($hdr);
}
$val = $if_unset if !defined $val;
if ($val =~ $pattern) {
return ($negated ? 0 : 1);
}
}
return ($negated ? 1 : 0);
}
# ---------------------------------------------------------------------------
sub finish_tests {
my ($self, $params) = @_;
foreach my $method (@TEMPORARY_METHODS) {
undef &{$method};
}
@TEMPORARY_METHODS = (); # clear for next time
}
# ---------------------------------------------------------------------------
1;

View File

@ -0,0 +1,974 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::OLEVBMacro - search attached documents for evidence of containing an OLE Macro
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::OLEVBMacro
ifplugin Mail::SpamAssassin::Plugin::OLEVBMacro
body OLEMACRO eval:check_olemacro()
describe OLEMACRO Attachment has an Office Macro
body OLEMACRO_MALICE eval:check_olemacro_malice()
describe OLEMACRO_MALICE Potentially malicious Office Macro
body OLEMACRO_ENCRYPTED eval:check_olemacro_encrypted()
describe OLEMACRO_ENCRYPTED Has an Office doc that is encrypted
body OLEMACRO_RENAME eval:check_olemacro_renamed()
describe OLEMACRO_RENAME Has an Office doc that has been renamed
body OLEMACRO_ZIP_PW eval:check_olemacro_zip_password()
describe OLEMACRO_ZIP_PW Has an Office doc that is password protected in a zip
body OLEMACRO_CSV eval:check_olemacro_csv()
describe OLEMACRO_CSV Malicious csv file that tries to exec cmd.exe detected
endif
=head1 DESCRIPTION
This plugin detects OLE Macro inside documents attached to emails.
It can detect documents inside zip files as well as encrypted documents.
=head1 REQUIREMENT
This plugin requires Archive::Zip and IO::String perl modules.
=head1 USER PREFERENCES
The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
the module handles attached documents
=cut
package Mail::SpamAssassin::Plugin::OLEVBMacro;
use strict;
use warnings;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Util qw(compile_regexp);
use constant HAS_ARCHIVE_ZIP => eval { require Archive::Zip; };
use constant HAS_IO_STRING => eval { require IO::String; };
BEGIN
{
eval{
import Archive::Zip qw( :ERROR_CODES :CONSTANTS )
};
eval{
import IO::String
};
}
use re 'taint';
use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);
our $VERSION = '0.52';
# https://www.openoffice.org/sc/compdocfileformat.pdf
# http://blog.rootshell.be/2015/01/08/searching-for-microsoft-office-files-containing-macro/
my $marker1 = "\xd0\xcf\x11\xe0";
my $marker2 = "\x00\x41\x74\x74\x72\x69\x62\x75\x74\x00";
# embedded object in rtf files (https://www.biblioscape.com/rtf15_spec.htm)
my $marker3 = "\x5c\x6f\x62\x6a\x65\x6d\x62";
my $marker4 = "\x5c\x6f\x62\x6a\x64\x61\x74";
my $marker5 = "\x5c\x20\x6f\x62\x6a\x64\x61\x74";
# Excel .xlsx encrypted package, thanks to Dan Bagwell for the sample
my $encrypted_marker = "\x45\x00\x6e\x00\x63\x00\x72\x00\x79\x00\x70\x00\x74\x00\x65\x00\x64\x00\x50\x00\x61\x00\x63\x00\x6b\x00\x61\x00\x67\x00\x65";
# this code burps an ugly message if it fails, but that's redirected elsewhere
# AZ_OK is a constant exported by Archive::Zip
my $az_ok;
eval '$az_ok = AZ_OK';
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->set_config($mailsaobject->{conf});
$self->register_eval_rule("check_olemacro");
$self->register_eval_rule("check_olemacro_csv");
$self->register_eval_rule("check_olemacro_malice");
$self->register_eval_rule("check_olemacro_renamed");
$self->register_eval_rule("check_olemacro_encrypted");
$self->register_eval_rule("check_olemacro_zip_password");
return $self;
}
sub dbg {
Mail::SpamAssassin::Plugin::dbg ("OLEVBMacro: @_");
}
sub set_config {
my ($self, $conf) = @_;
my @cmds = ();
push(@cmds, {
setting => 'olemacro_num_mime',
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=over 4
=item olemacro_num_mime (default: 5)
Configure the maximum number of matching MIME parts the plugin will scan
=back
=cut
push(@cmds, {
setting => 'olemacro_num_zip',
default => 8,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=over 4
=item olemacro_num_zip (default: 8)
Configure the maximum number of matching zip members the plugin will scan
=back
=cut
push(@cmds, {
setting => 'olemacro_zip_depth',
default => 2,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=over 4
=item olemacro_zip_depth (default: 2)
Depth to recurse within Zip files
=back
=cut
push(@cmds, {
setting => 'olemacro_extended_scan',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
=over 4
=item olemacro_extended_scan ( 0 | 1 ) (default: 0)
Scan more files for potential macros, the C<olemacro_skip_exts> parameter will still be honored.
This parameter is off by default, this option is needed only to run
C<eval:check_olemacro_renamed> rule.
If this is turned on consider adjusting values for C<olemacro_num_mime> and C<olemacro_num_zip>
and prepare for more CPU overhead
=back
=cut
push(@cmds, {
setting => 'olemacro_prefer_contentdisposition',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
=over 4
=item olemacro_prefer_contentdisposition ( 0 | 1 ) (default: 1)
Choose if the content-disposition header filename be preferred if ambiguity is encountered whilst trying to get filename
=back
=cut
push(@cmds, {
setting => 'olemacro_max_file',
default => 1024000,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=over 4
=item olemacro_max_file (default: 1024000)
Configure the largest file that the plugin will decode from the MIME objects
=back
=cut
# https://blogs.msdn.microsoft.com/vsofficedeveloper/2008/05/08/office-2007-file-format-mime-types-for-http-content-streaming-2/
# https://technet.microsoft.com/en-us/library/ee309278(office.12).aspx
push(@cmds, {
setting => 'olemacro_exts',
default => qr/(?:doc|docx|dot|pot|ppa|pps|ppt|rtf|sldm|xl|xla|xls|xlsx|xlt|xltx|xslb)$/,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my ($rec, $err) = compile_regexp($value, 0);
if (!$rec) {
dbg("config: invalid olemacro_exts '$value': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{olemacro_exts} = $rec;
},
}
);
=over 4
=item olemacro_exts (default: (?:doc|docx|dot|pot|ppa|pps|ppt|rtf|sldm|xl|xla|xls|xlsx|xlt|xltx|xslb)$)
Set the case-insensitive regexp used to configure the extensions the plugin
targets for macro scanning
=back
=cut
push(@cmds, {
setting => 'olemacro_macro_exts',
default => qr/(?:docm|dotm|ppam|potm|ppst|ppsm|pptm|sldm|xlm|xlam|xlsb|xlsm|xltm|xltx|xps)$/,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my ($rec, $err) = compile_regexp($value, 0);
if (!$rec) {
dbg("config: invalid olemacro_macro_exts '$value': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{olemacro_macro_exts} = $rec;
},
});
=over 4
=item olemacro_macro_exts (default: (?:docm|dotm|ppam|potm|ppst|ppsm|pptm|sldm|xlm|xlam|xlsb|xlsm|xltm|xltx|xps)$)
Set the case-insensitive regexp used to configure the extensions the plugin
treats as containing a macro
=back
=cut
push(@cmds, {
setting => 'olemacro_skip_exts',
default => qr/(?:dotx|potx|ppsx|pptx|sldx)$/,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my ($rec, $err) = compile_regexp($value, 0);
if (!$rec) {
dbg("config: invalid olemacro_skip_exts '$value': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{olemacro_skip_exts} = $rec;
},
});
=over 4
=item olemacro_skip_exts (default: (?:dotx|potx|ppsx|pptx|sldx|xltx)$)
Set the case-insensitive regexp used to configure extensions for the plugin
to skip entirely, these should only be guaranteed macro free files
=back
=cut
push(@cmds, {
setting => 'olemacro_skip_ctypes',
default => qr/^(?:text\/)/,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my ($rec, $err) = compile_regexp($value, 0);
if (!$rec) {
dbg("config: invalid olemacro_skip_ctypes '$value': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{olemacro_skip_ctypes} = $rec;
},
});
=over 4
=item olemacro_skip_ctypes (default: ^(?:text\/))
Set the case-insensitive regexp used to configure content types for the
plugin to skip entirely, these should only be guaranteed macro free
=back
=cut
push(@cmds, {
setting => 'olemacro_zips',
default => qr/(?:zip)$/,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my ($rec, $err) = compile_regexp($value, 0);
if (!$rec) {
dbg("config: invalid olemacro_zips '$value': $err");
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{olemacro_zips} = $rec;
},
});
=over 4
=item olemacro_zips (default: (?:zip)$)
Set the case-insensitive regexp used to configure extensions for the plugin
to target as zip files, files listed in configs above are also tested for zip
=back
=cut
$conf->{parser}->register_commands(\@cmds);
}
sub check_olemacro {
my ($self,$pms,$body,$name) = @_;
_check_attachments(@_) unless exists $pms->{olemacro_exists};
return $pms->{olemacro_exists};
}
sub check_olemacro_csv {
my ($self,$pms,$body,$name) = @_;
my $chunk_size = $pms->{conf}->{olemacro_max_file};
foreach my $part ($pms->{msg}->find_parts(qr/./, 1)) {
next unless ($part->{type} eq "text/plain");
my ($ctt, $ctd, $cte, $name) = _get_part_details($pms, $part);
next unless defined $ctt;
next if $name eq '';
# we skipped what we need/want to
my $data = undef;
# if name extension is csv - return true
if ($name =~ /\.csv/i) {
dbg("Found csv file with name $name");
$data = $part->decode($chunk_size) unless defined $data;
if($data =~ /MSEXCEL\|.{1,20}Windows\\System32\\cmd\.exe/) {
$pms->{olemacro_csv} = 1;
}
}
}
return $pms->{olemacro_csv};
}
sub check_olemacro_malice {
my ($self,$pms,$body,$name) = @_;
_check_attachments(@_) unless exists $pms->{olemacro_malice};
return $pms->{olemacro_malice};
}
sub check_olemacro_renamed {
my ($self,$pms,$body,$name) = @_;
_check_attachments(@_) unless exists $pms->{olemacro_renamed};
if ( $pms->{olemacro_renamed} == 1 ) {
dbg("Found Office document with a renamed macro");
}
return $pms->{olemacro_renamed};
}
sub check_olemacro_encrypted {
my ($self,$pms,$body,$name) = @_;
_check_attachments(@_) unless exists $pms->{olemacro_encrypted};
return $pms->{olemacro_encrypted};
}
sub check_olemacro_zip_password {
my ($self,$pms,$body,$name) = @_;
_check_attachments(@_) unless exists $pms->{olemacro_zip_password};
return $pms->{olemacro_zip_password};
}
sub _check_attachments {
my ($self,$pms,$body,$name) = @_;
my $mimec = 0;
my $chunk_size = $pms->{conf}->{olemacro_max_file};
$pms->{olemacro_exists} = 0;
$pms->{olemacro_malice} = 0;
$pms->{olemacro_renamed} = 0;
$pms->{olemacro_encrypted} = 0;
$pms->{olemacro_zip_password} = 0;
$pms->{olemacro_office_xml} = 0;
foreach my $part ($pms->{msg}->find_parts(qr/./, 1)) {
next if ($part->{type} =~ /$pms->{conf}->{olemacro_skip_ctypes}/i);
my ($ctt, $ctd, $cte, $name) = _get_part_details($pms, $part);
next unless defined $ctt;
next if $name eq '';
next if ($name =~ /$pms->{conf}->{olemacro_skip_exts}/i);
# we skipped what we need/want to
my $data = undef;
# if name is macrotype - return true
if ($name =~ /$pms->{conf}->{olemacro_macro_exts}/i) {
dbg("Found macrotype attachment with name $name");
$pms->{olemacro_exists} = 1;
$data = $part->decode($chunk_size) unless defined $data;
_check_encrypted_doc($pms, $name, $data);
_check_macrotype_doc($pms, $name, $data);
return 1 if $pms->{olemacro_exists} == 1;
}
# if name is ext type - check and return true if needed
if ($name =~ /$pms->{conf}->{olemacro_exts}/i) {
dbg("Found attachment with name $name");
$data = $part->decode($chunk_size) unless defined $data;
_check_encrypted_doc($pms, $name, $data);
_check_oldtype_doc($pms, $name, $data);
# zipped doc that matches olemacro_exts - strange
if (_check_macrotype_doc($pms, $name, $data)) {
$pms->{olemacro_renamed} = $pms->{olemacro_office_xml};
}
return 1 if $pms->{olemacro_exists} == 1;
}
if ($name =~ /$pms->{conf}->{olemacro_zips}/i) {
dbg("Found zip attachment with name $name");
$data = $part->decode($chunk_size) unless defined $data;
_check_zip($pms, $name, $data);
return 1 if $pms->{olemacro_exists} == 1;
}
if ($pms->{conf}->{olemacro_extended_scan} == 1) {
dbg("Extended scan attachment with name $name");
$data = $part->decode($chunk_size) unless defined $data;
if (_is_office_doc($data)) {
$pms->{olemacro_renamed} = 1;
dbg("Found $name to be an Office Doc!");
_check_encrypted_doc($pms, $name, $data);
_check_oldtype_doc($pms, $name, $data);
}
if (_check_macrotype_doc($pms, $name, $data)) {
$pms->{olemacro_renamed} = $pms->{olemacro_office_xml};
}
_check_zip($pms, $name, $data);
return 1 if $pms->{olemacro_exists} == 1;
}
# if we get to here with data a part has been scanned nudge as reqd
$mimec+=1 if defined $data;
if ($mimec >= $pms->{conf}->{olemacro_num_mime}) {
dbg('MIME limit reached');
last;
}
dbg("No Marker of a Macro found in file $name");
}
return 0;
}
sub _check_zip {
my ($pms, $name, $data, $depth) = @_;
if (!HAS_ARCHIVE_ZIP) {
warn "check_zip not supported, required module Archive::Zip missing\n";
return 0;
}
return 0 if $pms->{conf}->{olemacro_num_zip} == 0;
$depth = $depth || 1;
return 0 if ($depth > $pms->{conf}->{olemacro_zip_depth});
return 0 unless _is_zip_file($name, $data);
my $zip = _open_zip_handle($data);
return 0 unless $zip;
dbg("Zip opened");
my $filec = 0;
my @members = $zip->members();
# foreach zip member
# - skip if in skip exts
# - return 1 if in macro types
# - check for marker if doc type
# - check if a zip
foreach my $member (@members){
my $mname = lc $member->fileName();
next if ($mname =~ /$pms->{conf}->{olemacro_skip_exts}/i);
my $data = undef;
my $status = undef;
# if name is macrotype - return true
if ($mname =~ /$pms->{conf}->{olemacro_macro_exts}/i) {
dbg("Found macrotype zip member $mname");
$pms->{olemacro_exists} = 1;
if ($member->isEncrypted()) {
dbg("Zip member $mname is encrypted (zip pw)");
$pms->{olemacro_zip_password} = 1;
return 1;
}
( $data, $status ) = $member->contents() unless defined $data;
return 1 unless $status == $az_ok;
_check_encrypted_doc($pms, $name, $data);
_check_macrotype_doc($pms, $name, $data);
return 1 if $pms->{olemacro_exists} == 1;
}
if ($mname =~ /$pms->{conf}->{olemacro_exts}/i) {
dbg("Found zip member $mname");
if ($member->isEncrypted()) {
dbg("Zip member $mname is encrypted (zip pw)");
$pms->{olemacro_zip_password} = 1;
next;
}
( $data, $status ) = $member->contents() unless defined $data;
next unless $status == $az_ok;
_check_encrypted_doc($pms, $name, $data);
_check_oldtype_doc($pms, $name, $data);
# zipped doc that matches olemacro_exts - strange
if (_check_macrotype_doc($pms, $name, $data)) {
$pms->{olemacro_renamed} = $pms->{olemacro_office_xml};
}
return 1 if $pms->{olemacro_exists} == 1;
}
if ($mname =~ /$pms->{conf}->{olemacro_zips}/i) {
dbg("Found zippy zip member $mname");
( $data, $status ) = $member->contents() unless defined $data;
next unless $status == $az_ok;
_check_zip($pms, $name, $data, $depth);
return 1 if $pms->{olemacro_exists} == 1;
}
if ($pms->{conf}->{olemacro_extended_scan} == 1) {
dbg("Extended scan attachment with member name $mname");
( $data, $status ) = $member->contents() unless defined $data;
next unless $status == $az_ok;
if (_is_office_doc($data)) {
dbg("Found $name to be an Office Doc!");
_check_encrypted_doc($pms, $name, $data);
$pms->{olemacro_renamed} = 1;
_check_oldtype_doc($pms, $name, $data);
}
if (_check_macrotype_doc($pms, $name, $data)) {
$pms->{olemacro_renamed} = $pms->{olemacro_office_xml};
}
_check_zip($pms, $name, $data, $depth);
return 1 if $pms->{olemacro_exists} == 1;
}
# if we get to here with data a member has been scanned nudge as reqd
$filec+=1 if defined $data;
if ($filec >= $pms->{conf}->{olemacro_num_zip}) {
dbg('Zip limit reached');
last;
}
}
return 0;
}
sub _get_part_details {
my ($pms, $part) = @_;
#https://en.wikipedia.org/wiki/MIME#Content-Disposition
#https://github.com/mikel/mail/pull/464
my $ctt = $part->get_header('content-type');
return undef unless defined $ctt; ## no critic (ProhibitExplicitReturnUndef)
my $cte = lc($part->get_header('content-transfer-encoding') || '');
return undef unless ($cte =~ /^(?:base64|quoted\-printable)$/); ## no critic (ProhibitExplicitReturnUndef)
$ctt = _decode_part_header($part, $ctt || '');
my $name = '';
my $cttname = '';
my $ctdname = '';
if($ctt =~ m/(?:file)?name\s*=\s*["']?([^"';]*)["']?/is){
$cttname = $1;
$cttname =~ s/\s+$//;
}
my $ctd = $part->get_header('content-disposition');
$ctd = _decode_part_header($part, $ctd || '');
if($ctd =~ m/filename\s*=\s*["']?([^"';]*)["']?/is){
$ctdname = $1;
$ctdname =~ s/\s+$//;
}
if (lc $ctdname eq lc $cttname) {
$name = $ctdname;
} elsif ($ctdname eq '') {
$name = $cttname;
} elsif ($cttname eq '') {
$name = $ctdname;
} else {
if ($pms->{conf}->{olemacro_prefer_contentdisposition}) {
$name = $ctdname;
} else {
$name = $cttname;
}
}
return $ctt, $ctd, $cte, lc $name;
}
sub _open_zip_handle {
my ($data) = @_;
# open our archive from raw data
my $SH = IO::String->new($data);
Archive::Zip::setErrorHandler( \&_zip_error_handler );
my $zip = Archive::Zip->new();
if($zip->readFromFileHandle( $SH ) != $az_ok){
dbg("cannot read zipfile");
# as we cannot read it its not a zip (or too big/corrupted)
# so skip processing.
return 0;
}
return $zip;
}
sub _check_macrotype_doc {
my ($pms, $name, $data) = @_;
if (!HAS_IO_STRING) {
warn "check_macrotype_doc not supported, required module IO::String missing\n";
return 0;
}
return 0 unless _is_zip_file($name, $data);
my $zip = _open_zip_handle($data);
return 0 unless $zip;
#https://www.decalage.info/vba_tools
my %macrofiles = (
'word/vbaproject.bin' => 'word2k7',
'macros/vba/_vba_project' => 'word97',
'xl/vbaproject.bin' => 'xl2k7',
'xl/embeddings/oleObject1.bin' => 'xl2k13',
'_vba_project_cur/vba/_vba_project' => 'xl97',
'ppt/vbaproject.bin' => 'ppt2k7',
);
my @members = $zip->members();
foreach my $member (@members){
my $mname = lc $member->fileName();
if (exists($macrofiles{$mname})) {
dbg("Found $macrofiles{$mname} vba file");
$pms->{olemacro_exists} = 1;
last;
}
}
# Look for a member named [Content_Types].xml and do checks
if (my $ctypesxml = $zip->memberNamed('[Content_Types].xml')) {
dbg('Found [Content_Types].xml file');
$pms->{olemacro_office_xml} = 1;
if (!$pms->{olemacro_exists}) {
my ( $data, $status ) = $ctypesxml->contents();
if (($status == $az_ok) && (_check_ctype_xml($data))) {
$pms->{olemacro_exists} = 1;
}
}
}
if (($pms->{olemacro_exists}) && (_find_malice_bins($zip))) {
$pms->{olemacro_malice} = 1;
}
return $pms->{olemacro_exists};
}
# Office 2003
sub _check_oldtype_doc {
my ($pms, $name, $data) = @_;
if (_check_markers($data)) {
$pms->{olemacro_exists} = 1;
if (_check_malice($data)) {
$pms->{olemacro_malice} = 1;
}
return 1;
}
}
# Encrypted doc
sub _check_encrypted_doc {
my ($pms, $name, $data) = @_;
if (_is_encrypted_doc($data)) {
dbg("File $name is encrypted");
$pms->{olemacro_encrypted} = 1;
}
return $pms->{olemacro_encrypted};
}
sub _is_encrypted_doc {
my ($data) = @_;
#http://stackoverflow.com/questions/14347513/how-to-detect-if-a-word-document-is-password-protected-before-uploading-the-file/14347730#14347730
if (_is_office_doc($data)) {
if ($data =~ /(?:<encryption xmlns)/i) {
return 1;
}
if (index($data, "\x13") == 523) {
return 1;
}
if (index($data, "\x2f") == 532) {
return 1;
}
if (index($data, "\xfe") == 520) {
return 1;
}
my $tdata = substr $data, 2000;
$tdata =~ s/\\0/ /g;
if (index($tdata, "E n c r y p t e d P a c k a g e") > -1) {
return 1;
}
if (index($tdata, $encrypted_marker) > -1) {
return 1;
}
}
}
sub _is_office_doc {
my ($data) = @_;
if (index($data, $marker1) == 0) {
return 1;
}
}
sub _is_zip_file {
my ($name, $data) = @_;
if (index($data, 'PK') == 0) {
return 1;
} else {
return($name =~ /(?:zip)$/i);
}
}
sub _check_markers {
my ($data) = @_;
if (index($data, $marker1) == 0 && index($data, $marker2) > -1) {
dbg('Marker 1 & 2 found');
return 1;
}
if (index($data, $marker3) > -1) {
dbg('Marker 3 found');
return 1;
}
if (index($data, $marker4) > -1) {
dbg('Marker 4 found');
return 1;
}
if (index($data, $marker5) > -1) {
dbg('Marker 5 found');
return 1;
}
if (index($data, 'w:macrosPresent="yes"') > -1) {
dbg('XML macros marker found');
return 1;
}
if (index($data, 'vbaProject.bin.rels') > -1) {
dbg('XML macros marker found');
return 1;
}
}
sub _find_malice_bins {
my ($zip) = @_;
my @binfiles = $zip->membersMatching( '.*\.bin' );
foreach my $member (@binfiles){
my ( $data, $status ) = $member->contents();
next unless $status == $az_ok;
if (_check_malice($data)) {
return 1;
}
}
}
sub _check_malice {
my ($data) = @_;
# https://www.greyhathacker.net/?p=872
if ($data =~ /(?:document|auto|workbook)_?open/i) {
dbg('Found potential malicious code');
return 1;
}
}
sub _check_ctype_xml {
my ($data) = @_;
# http://download.microsoft.com/download/D/3/3/D334A189-E51B-47FF-B0E8-C0479AFB0E3C/[MS-OFFMACRO].pdf
if ($data =~ /ContentType=["']application\/vnd\.ms-office\.vbaProject["']/i){
dbg('Found VBA ref');
return 1;
}
if ($data =~ /macroEnabled/i) {
dbg('Found Macro Ref');
return 1;
}
if ($data =~ /application\/vnd\.ms-excel\.(?:intl)?macrosheet/i) {
dbg('Excel macrosheet found');
return 1;
}
}
sub _zip_error_handler {
1;
}
sub _decode_part_header {
my($part, $header_field_body) = @_;
return '' unless defined $header_field_body && $header_field_body ne '';
# deal with folding and cream the newlines and such
$header_field_body =~ s/\n[ \t]+/\n /g;
$header_field_body =~ s/\015?\012//gs;
local($1,$2,$3);
# Multiple encoded sections must ignore the interim whitespace.
# To avoid possible FPs with (\s+(?==\?))?, look for the whole RE
# separated by whitespace.
1 while $header_field_body =~
s{ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) }
{$1$2}xsg;
# transcode properly encoded RFC 2047 substrings into UTF-8 octets,
# leave everything else unchanged as it is supposed to be UTF-8 (RFC 6532)
# or plain US-ASCII
$header_field_body =~
s{ (?: = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = ) }
{ $part->__decode_header($1, uc($2), $3) }xsge;
return $header_field_body;
}
1;

View File

@ -0,0 +1,154 @@
=head1 NAME
Mail::SpamAssassin::Plugin::OneLineBodyRuleType - spamassassin body test plugin
=cut
package Mail::SpamAssassin::Plugin::OneLineBodyRuleType;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Constants qw(:sa);
use strict;
use warnings;
use re 'taint';
our @ISA = qw();
# constructor
sub new {
my $class = shift;
$class = ref($class) || $class;
my $self = {};
bless ($self, $class);
return $self;
}
###########################################################################
sub check_rules_at_priority {
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
my $checkobj = $params->{checkobj};
my $priority = $params->{priority};
Mail::SpamAssassin::Plugin::Check::do_one_line_body_tests($checkobj,
$pms, $priority);
}
sub check_start {
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
my $conf = $pms->{conf};
# this method runs before the body ruleset is compiled, but after
# finish_tests(). perfect spot to remove rules from the body
# set and add to another set...
my $test_set = $conf->{body_tests};
foreach my $pri (keys %{$test_set})
{
foreach my $rulename (keys %{$test_set->{$pri}})
{
if ($conf->{generate_body_one_line_sub}->{$rulename}) {
# add the rule to the one-liner set
$conf->{one_line_body_tests}->{$pri} ||= { };
$conf->{one_line_body_tests}->{$pri}->{$rulename} =
$test_set->{$pri}->{$rulename};
}
if ($conf->{skip_body_rules}->{$rulename}) {
# remove from the body set
delete $test_set->{$pri}->{$rulename};
}
}
}
}
###########################################################################
1;
# inject this method into the Check plugin's namespace
# TODO: we need a better way to define new ruletypes via plugin
package Mail::SpamAssassin::Plugin::Check;
sub do_one_line_body_tests {
my ($self, $pms, $priority) = @_;
# TODO: should have a consttype for plugin-defined "alien" rule types,
# probably something like TYPE_ALIEN_TESTS. it's only used as a key
# for {user_rules_of_type}, so that should be fine
$self->run_generic_tests ($pms, $priority,
consttype => $Mail::SpamAssassin::Conf::TYPE_BODY_TESTS,
type => 'one_line_body',
testhash => $pms->{conf}->{one_line_body_tests},
args => [ ],
loop_body => sub
{
my ($self, $pms, $conf, $rulename, $pat, %opts) = @_;
my $sub = '
my ($self, $line) = @_;
my $qrptr = $self->{main}->{conf}->{test_qrs};
';
if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
{
# support multiple matches
my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
$max = untaint_var($max);
if ($max) {
$sub .= '
if (exists $self->{tests_already_hit}->{q{'.$rulename.'}}) {
return 0 if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';
}
';
}
# avoid [perl #86784] bug (fixed in 5.13.x), access the arg through ref
$sub .= '
my $lref = \$line;
pos $$lref = 0;
'.$self->hash_line_for_rule($pms, $rulename).'
while ($$lref =~ /$qrptr->{q{'.$rulename.'}}/go) {
$self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
'. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "") . '
'. ($max? 'last if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';' : '') . '
}
';
} else {
$sub .= '
'.$self->hash_line_for_rule($pms, $rulename).'
if ($line =~ /$qrptr->{q{'.$rulename.'}}/o) {
$self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
'. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "return 1") . '
}
';
}
return if ($opts{doing_user_rules} &&
!$self->is_user_rule_sub($rulename.'_one_line_body_test'));
$self->add_temporary_method ($rulename.'_one_line_body_test', $sub);
},
pre_loop_body => sub
{
my ($self, $pms, $conf, %opts) = @_;
$self->add_evalstr($pms, '
my $bodytext = $self->get_decoded_stripped_body_text_array();
$self->{main}->call_plugins("run_body_fast_scan", {
permsgstatus => $self, ruletype => "body",
priority => '.$opts{priority}.', lines => $bodytext
});
');
});
}
###########################################################################
1;

View File

@ -0,0 +1,751 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::PDFInfo - PDFInfo Plugin for SpamAssassin
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
=head1 DESCRIPTION
This plugin helps detected spam using attached PDF files
=over 4
=item See "Usage:" below - more documentation see 20_pdfinfo.cf
Original info kept for history. For later changes see SVN repo
-------------------------------------------------------
PDFInfo Plugin for SpamAssassin
Version: 0.8
Info: $Id: PDFInfo.pm 904 2007-08-12 01:36:23Z root $
Created: 2007-08-10
Modified: 2007-08-10
By: Dallas Engelken
Changes:
0.8 - added .fdf detection (thanks John Lundin) [axb]
0.7 - fixed empty body/pdf count buglet(thanks Jeremy) [axb]
0.6 - added support for tags - PDFCOUNT, PDFVERSION, PDFPRODUCER, etc.
- fixed issue on perl 5.6.1 where pdf_match_details() failed to call
_find_pdf_mime_parts(), resulting in no detection of pdf mime parts.
- quoted-printable support - requires MIME::QuotedPrint (which should be in everyones
install as a part of the MIME-Base64 package which is a SA req)
- added simple pdf_is_empty_body() function with counts the body bytes minus the
subject line. can add optional <bytes> param if you need to allow for a few bytes.
0.5 - fix warns for undef $pdf_tags
- remove { } and \ before running eval in pdf_match_details to avoid eval error
0.4 - added pdf_is_encrypted() function
- added option to look for image HxW on same line
0.3 - added 2nd fuzzy md5 which uses pdf tag layout as data
- renamed pdf_image_named() to pdf_named()
- PDF images are encapsulated and have no names. We are matching the PDF file name.
- renamed pdf_image_name_regex() to pdf_name_regex()
- PDF images are encapsulated and have no names. We are matching the PDF file name.
- changed pdf_image_count() a bit and added pdf_count().
- pdf_count() checks how many pdf attachments there are on the mail
- pdf_image_count() checks how many images are found within all pdfs in the mail.
- removed the restriction of the pdf containing an image in order to md5 it.
- added pdf_match_details() function to check the following 'details'
- author: Author of PDF if specified
- producer: Software used to produce PDF
- creator: Software used to produce PDF, usually similar to producer
- title: Title of PDF
- created: Creation Date
- modified: Last Modified
0.2 - support PDF octet-stream
0.1 - just ported over the imageinfo code, and renamed to pdfinfo.
- removed all support for png, gif, and jpg from the code.
- prepended pdf_ to all function names to avoid conflicts with ImageInfo in SA 3.2.
Usage:
pdf_count()
body RULENAME eval:pdf_count(<min>,[max])
min: required, message contains at least x pdf mime parts
max: optional, if specified, must not contain more than x pdf mime parts
pdf_image_count()
body RULENAME eval:pdf_image_count(<min>,[max])
min: required, message contains at least x images in pdf attachments.
max: optional, if specified, must not contain more than x pdf images
pdf_pixel_coverage()
body RULENAME eval:pdf_pixel_coverage(<min>,[max])
min: required, message contains at least this much pixel area
max: optional, if specified, message must not contain more than this much pixel area
pdf_named()
body RULENAME eval:pdf_named(<string>)
string: exact file name match, if you need partial match, see pdf_name_regex()
pdf_name_regex()
body RULENAME eval:pdf_name_regex(<regex>)
regex: regular expression, see examples in ruleset
pdf_match_md5()
body RULENAME eval:pdf_match_md5(<string>)
string: 32-byte md5 hex
pdf_match_fuzzy_md5()
body RULENAME eval:pdf_match_md5(<string>)
string: 32-byte md5 hex - see ruleset for obtaining the fuzzy md5
pdf_match_details()
body RULENAME eval:pdf_match_details(<detail>,<regex>);
detail: author, creator, created, modified, producer, title
regex: regular expression, see examples in ruleset
pdf_is_encrypted()
body RULENAME eval:pdf_is_encrypted()
pdf_is_empty_body()
body RULENAME eval:pdf_is_empty_body(<bytes>)
bytes: maximum byte count to allow and still consider it empty
NOTE: See the ruleset for more examples that are not documented here.
=back
=cut
# -------------------------------------------------------
package Mail::SpamAssassin::Plugin::PDFInfo;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(compile_regexp);
use strict;
use warnings;
# use bytes;
use Digest::MD5 qw(md5_hex);
use MIME::QuotedPrint;
our @ISA = qw(Mail::SpamAssassin::Plugin);
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("pdf_count");
$self->register_eval_rule ("pdf_image_count");
$self->register_eval_rule ("pdf_pixel_coverage");
$self->register_eval_rule ("pdf_image_size_exact");
$self->register_eval_rule ("pdf_image_size_range");
$self->register_eval_rule ("pdf_named");
$self->register_eval_rule ("pdf_name_regex");
$self->register_eval_rule ("pdf_image_to_text_ratio");
$self->register_eval_rule ("pdf_match_md5");
$self->register_eval_rule ("pdf_match_fuzzy_md5");
$self->register_eval_rule ("pdf_match_details");
$self->register_eval_rule ("pdf_is_encrypted");
$self->register_eval_rule ("pdf_is_empty_body");
return $self;
}
# -----------------------------------------
my %get_details = (
'pdf' => sub {
my ($self, $pms, $part) = @_;
my $type = $part->{'type'} || 'base64';
my $data = '';
if ($type eq 'quoted-printable') {
$data = decode_qp($data); # use QuotedPrint->decode_qp
}
else {
$data = $part->decode(); # just use built in base64 decoder
}
my $index = substr($data, 0, 8);
return unless ($index =~ /.PDF\-(\d\.\d)/);
my $version = $1;
$self->_set_tag($pms, 'PDFVERSION', $version);
# dbg("pdfinfo: pdf version = $version");
my ($height, $width, $fuzzy_data, $pdf_tags);
my ($producer, $created, $modified, $title, $creator, $author) = ('unknown','0','0','untitled','unknown','unknown');
my ($md5, $fuzzy_md5) = ('', '');
my ($total_height, $total_width, $total_area, $line_count) = (0,0,0,0);
my $name = $part->{'name'} || '';
$self->_set_tag($pms, 'PDFNAME', $name);
my $no_more_fuzzy = 0;
my $got_image = 0;
my $encrypted = 0;
while($data =~ /([^\n]+)/g) {
# dbg("pdfinfo: line=$1");
my $line = $1;
$line_count++;
# lines containing high bytes will have no data we need, so save some cycles
next if ($line =~ /[\x80-\xff]/);
if (!$no_more_fuzzy && $line_count < 70) {
if ($line !~ m/^\%/ && $line !~ m/^\/(?:Height|Width|(?:(?:Media|Crop)Box))/ && $line !~ m/^\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+cm$/) {
$line =~ s/\s+$//; # strip off whitespace at end.
$fuzzy_data .= $line;
}
}
if ($line =~ m/^\/([A-Za-z]+)/) {
$pdf_tags .= $1;
}
$got_image=1 if ($line =~ m/\/Image/);
$encrypted=1 if ($line =~ m/^\/Encrypt/);
# once we hit the first stream, we stop collecting data for fuzzy md5
$no_more_fuzzy = 1 if ($line =~ m/stream/);
# From a v1.3 pdf
# [12234] dbg: pdfinfo: line=630 0 0 149 0 0 cm
# [12234] dbg: pdfinfo: line=/Width 630
# [12234] dbg: pdfinfo: line=/Height 149
if ($got_image) {
if ($line =~ /^(\d+)\s+\d+\s+\d+\s+(\d+)\s+\d+\s+\d+\s+cm$/) {
$width = $1;
$height = $2;
}
elsif ($line =~ /^\/Width\s(\d+)/) {
$width = $1;
}
elsif ($line =~ /^\/Height\s(\d+)/) {
$height = $1;
}
elsif ($line =~ m/\/Width\s(\d+)\/Height\s(\d+)/) {
$width = $1;
$height = $2;
}
}
# did pdf contain image data?
if ($got_image && $width && $height) {
$no_more_fuzzy = 1;
my $area = $width * $height;
$total_height += $height;
$total_width += $width;
$total_area += $area;
$pms->{pdfinfo}->{dems_pdf}->{"${height}x${width}"} = 1;
$pms->{'pdfinfo'}->{"count_pdf_images"} ++;
dbg("pdfinfo: Found image in PDF ".($name ? $name : '')." - $height x $width pixels ($area pixels sq.)");
$self->_set_tag($pms, 'PDFIMGDIM', "${height}x${width}");
$height=0; $width=0; # reset and check for next image
$got_image = 0;
}
# [5310] dbg: pdfinfo: line=<</Producer(GPL Ghostscript 8.15)
# [5310] dbg: pdfinfo: line=/CreationDate(D:20070703144220)
# [5310] dbg: pdfinfo: line=/ModDate(D:20070703144220)
# [5310] dbg: pdfinfo: line=/Title(Microsoft Word - Document1)
# [5310] dbg: pdfinfo: line=/Creator(PScript5.dll Version 5.2)
# [5310] dbg: pdfinfo: line=/Author(colet)>>endobj
# or all on same line inside xml - v1.6+
# <</CreationDate(D:20070226165054-06'00')/Creator( Adobe Photoshop CS2 Windows)/Producer(Adobe Photoshop for Windows -- Image Conversion Plug-in)/ModDate(D:20070226165100-06'00')>>
if ($line =~ /\/Producer\s?\(([^\)\\]+)/) {
$producer = $1;
}
if ($line =~ /\/CreationDate\s?\(D\:(\d+)/) {
$created = $1;
}
if ($line =~ /\/ModDate\s?\(D\:(\d+)/) {
$modified = $1;
}
if ($line =~ /\/Title\s?\(([^\)\\]+)/) {
$title = $1;
# Title=\376\377\000w\000w\000n\000g
# Title=wwng
$title =~ s/\\\d{3}//g;
}
if ($line =~ /\/Creator\s?\(([^\)\\]+)/) {
$creator = $1;
}
if ($line =~ /\/Author\s?\(([^\)]+)/) {
$author = $1;
# Author=\376\377\000H\000P\000_\000A\000d\000m\000i\000n\000i\000s\000t\000r\000a\000t\000o\000r
# Author=HP_Administrator
$author =~ s/\\\d{3}//g;
}
}
# store the file name so we can check pdf_named() or pdf_name_match() later.
$pms->{pdfinfo}->{names_pdf}->{$name} = 1 if $name;
# store encrypted flag.
$pms->{pdfinfo}->{encrypted} = $encrypted;
# if we had multiple images in the pdf, we need to store the total HxW as well.
# If it was a single Image PDF, then this value will already be in the hash.
$pms->{pdfinfo}->{dems_pdf}->{"${total_height}x${total_width}"} = 1 if ($total_height && $total_width);;
if ($total_area) {
$pms->{pdfinfo}->{pc_pdf} = $total_area;
$self->_set_tag($pms, 'PDFIMGAREA', $total_area);
dbg("pdfinfo: Filename=$name Total HxW: $total_height x $total_width ($total_area area)") if ($total_area);
}
dbg("pdfinfo: Filename=$name Title=$title Author=$author Producer=$producer Created=$created Modified=$modified");
$md5 = uc(md5_hex($data)) if $data;
$fuzzy_md5 = uc(md5_hex($fuzzy_data)) if $fuzzy_data;
my $tags_md5;
$tags_md5 = uc(md5_hex($pdf_tags)) if $pdf_tags;
dbg("pdfinfo: MD5 results for ".($name ? $name : '')." - md5=".($md5 ? $md5 : '')." fuzzy1=".($fuzzy_md5 ? $fuzzy_md5 : '')." fuzzy2=".($tags_md5 ? $tags_md5 : ''));
# we dont need tags for these.
$pms->{pdfinfo}->{details}->{created} = $created if $created;
$pms->{pdfinfo}->{details}->{modified} = $modified if $modified;
if ($producer) {
$pms->{pdfinfo}->{details}->{producer} = $producer if $producer;
$self->_set_tag($pms, 'PDFPRODUCER', $producer);
}
if ($title) {
$pms->{pdfinfo}->{details}->{title} = $title;
$self->_set_tag($pms, 'PDFTITLE', $title);
}
if ($creator) {
$pms->{pdfinfo}->{details}->{creator} = $creator;
$self->_set_tag($pms, 'PDFCREATOR', $creator);
}
if ($author) {
$pms->{pdfinfo}->{details}->{author} = $author;
$self->_set_tag($pms, 'PDFAUTHOR', $author);
}
if ($md5) {
$pms->{pdfinfo}->{md5}->{$md5} = 1;
$self->_set_tag($pms, 'PDFMD5', $fuzzy_md5);
}
if ($fuzzy_md5) {
$pms->{pdfinfo}->{fuzzy_md5}->{$fuzzy_md5} = 1;
$self->_set_tag($pms, 'PDFMD5FUZZY1', $fuzzy_md5);
}
if ($tags_md5) {
$pms->{pdfinfo}->{fuzzy_md5}->{$tags_md5} = 1;
$self->_set_tag($pms, 'PDFMD5FUZZY2', $tags_md5);
}
},
);
# ----------------------------------------
sub _set_tag {
my ($self, $pms, $tag, $value) = @_;
dbg("pdfinfo: set_tag called for $tag $value");
return unless ($tag && $value);
if (exists $pms->{tag_data}->{$tag}) {
$pms->{tag_data}->{$tag} .= " $value"; # append value
}
else {
$pms->{tag_data}->{$tag} = $value;
}
}
# ----------------------------------------
sub _find_pdf_mime_parts {
my ($self,$pms) = @_;
# bail early if message does not have pdf parts
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
# initialize
$pms->{'pdfinfo'}->{"pc_pdf"} = 0;
$pms->{'pdfinfo'}->{"count_pdf"} = 0;
$pms->{'pdfinfo'}->{"count_pdf_images"} = 0;
my @parts = $pms->{msg}->find_parts(qr@^(image|application)/(pdf|octet\-stream)$@, 1);
my $part_count = scalar @parts;
dbg("pdfinfo: Identified $part_count possible mime parts that need checked for PDF content");
# cache this so we can easily bail
$pms->{'pdfinfo'}->{'no_parts'} = 1 unless $part_count;
foreach my $p (@parts) {
my $type = $p->{'type'} =~ m@/([\w\-]+)$@;
my $name = $p->{'name'} || '';
my $cte = lc( $p->get_header('content-transfer-encoding') || '' );
dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : '')."");
# make sure its a cte we support
next unless ($cte =~ /^(?:base64|quoted\-printable)$/);
# filename must end with .pdf, or application type can be pdf
# sometimes windows muas will wrap a pdf up inside a .dat file
# v0.8 - Added .fdf phoney PDF detection
next unless ($name =~ /\.[fp]df$/ || $type eq 'pdf');
# if we get this far, make sure type is pdf for sure (not octet-stream or anything else)
$type='pdf';
if ($type && exists $get_details{$type}) {
$get_details{$type}->($self, $pms, $p);
$pms->{'pdfinfo'}->{"count_$type"} ++;
}
}
$self->_set_tag($pms, 'PDFCOUNT', $pms->{'pdfinfo'}->{"count_pdf"});
$self->_set_tag($pms, 'PDFIMGCOUNT', $pms->{'pdfinfo'}->{"count_pdf_images"});
}
# ----------------------------------------
sub pdf_named {
my ($self,$pms,$body,$name) = @_;
return unless (defined $name);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
return 1 if (exists $pms->{'pdfinfo'}->{"names_pdf"}->{$name});
return 0;
}
# -----------------------------------------
sub pdf_name_regex {
my ($self,$pms,$body,$re) = @_;
return unless (defined $re);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
my ($rec, $err) = compile_regexp($re, 2);
if (!$rec) {
info("pdfinfo: invalid regexp '$re': $err");
return 0;
}
my $hit = 0;
foreach my $name (keys %{$pms->{'pdfinfo'}->{"names_pdf"}}) {
if ($name =~ $rec) {
dbg("pdfinfo: pdf_name_regex hit on $name");
return 1;
}
}
return 0;
}
# -----------------------------------------
sub pdf_is_encrypted {
my ($self,$pms,$body) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return $pms->{'pdfinfo'}->{'encrypted'};
}
# -----------------------------------------
sub pdf_count {
my ($self,$pms,$body,$min,$max) = @_;
return unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf"});
}
# -----------------------------------------
sub pdf_image_count {
my ($self,$pms,$body,$min,$max) = @_;
return unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf_images"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf_images"});
}
# -----------------------------------------
sub pdf_pixel_coverage {
my ($self,$pms,$body,$min,$max) = @_;
return unless (defined $min);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
# dbg("pdfinfo: pc_$type: $min, ".($max ? $max:'').", $type, ".$pms->{'pdfinfo'}->{"pc_pdf"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"pc_pdf"});
}
# -----------------------------------------
sub pdf_image_to_text_ratio {
my ($self,$pms,$body,$min,$max) = @_;
return unless (defined $min && defined $max);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
# depending on how you call this eval (body vs rawbody),
# the $textlen will differ.
my $textlen = length(join('',@$body));
return 0 unless ( $textlen > 0 && exists $pms->{'pdfinfo'}->{"pc_pdf"} && $pms->{'pdfinfo'}->{"pc_pdf"} > 0);
my $ratio = $textlen / $pms->{'pdfinfo'}->{"pc_pdf"};
dbg("pdfinfo: image ratio=$ratio, min=$min max=$max");
return result_check($min, $max, $ratio, 1);
}
# -----------------------------------------
sub pdf_is_empty_body {
my ($self,$pms,$body,$min) = @_;
$min ||= 0; # default to 0 bytes
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless $pms->{'pdfinfo'}->{"count_pdf"};
# check for cached result
return 1 if $pms->{'pdfinfo'}->{"no_body_text"};
shift @$body; # shift body array removes line #1 -> subject line.
my $bytes = 0;
my $textlen = length(join('',@$body));
foreach my $line (@$body) {
next unless ($line =~ m/\S/);
next if ($line =~ m/^Subject/);
$bytes += length($line);
}
dbg("pdfinfo: is_empty_body = $bytes bytes");
if ($bytes == 0 || ($bytes <= $min)) {
$pms->{'pdfinfo'}->{"no_body_text"} = 1;
return 1;
}
# cache it and return 0
$pms->{'pdfinfo'}->{"no_body_text"} = 0;
return 0;
}
# -----------------------------------------
sub pdf_image_size_exact {
my ($self,$pms,$body,$height,$width) = @_;
return unless (defined $height && defined $width);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
return 1 if (exists $pms->{'pdfinfo'}->{"dems_pdf"}->{"${height}x${width}"});
return 0;
}
# -----------------------------------------
sub pdf_image_size_range {
my ($self,$pms,$body,$minh,$minw,$maxh,$maxw) = @_;
return unless (defined $minh && defined $minw);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
foreach my $dem ( keys %{$pms->{'pdfinfo'}->{"dems_pdf"}}) {
my ($h,$w) = split(/x/,$dem);
next if ($h < $minh); # height less than min height
next if ($w < $minw); # width less than min width
next if (defined $maxh && $h > $maxh); # height more than max height
next if (defined $maxw && $w > $maxw); # width more than max width
# if we make it here, we have a match
return 1;
}
return 0;
}
# -----------------------------------------
sub pdf_match_md5 {
my ($self,$pms,$body,$md5) = @_;
return unless defined $md5;
my $uc_md5 = uc($md5); # uppercase matches only
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"md5"});
return 1 if (exists $pms->{'pdfinfo'}->{"md5"}->{$uc_md5});
return 0;
}
# -----------------------------------------
sub pdf_match_fuzzy_md5 {
my ($self,$pms,$body,$md5) = @_;
return unless defined $md5;
my $uc_md5 = uc($md5); # uppercase matches only
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"fuzzy_md5"});
return 1 if (exists $pms->{'pdfinfo'}->{"fuzzy_md5"}->{$uc_md5});
return 0;
}
# -----------------------------------------
sub pdf_match_details {
my ($self, $pms, $body, $detail, $regex) = @_;
return unless ($detail && $regex);
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{'details'});
my $check_value = $pms->{pdfinfo}->{details}->{$detail};
return unless $check_value;
my ($rec, $err) = compile_regexp($regex, 2);
if (!$rec) {
info("pdfinfo: invalid regexp '$regex': $err");
return 0;
}
if ($check_value =~ $rec) {
dbg("pdfinfo: pdf_match_details $detail $regex matches $check_value");
return 1;
}
return 0;
}
# -----------------------------------------
sub result_check {
my ($min, $max, $value, $nomaxequal) = @_;
return 0 unless defined $value;
return 0 if ($value < $min);
return 0 if (defined $max && $value > $max);
return 0 if (defined $nomaxequal && $nomaxequal && $value == $max);
return 1;
}
# -----------------------------------------
1;

View File

@ -0,0 +1,273 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
###########################################################################
package Mail::SpamAssassin::Plugin::PhishTag;
use strict;
use warnings;
use Errno qw(EBADF);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Logger;
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub new{
my ($class, $mailsa)=@_;
$class=ref($class) ||$class;
my $self = $class->SUPER::new($mailsa);
bless($self,$class);
$self->set_config($mailsa->{conf});
return $self;
}
sub set_config{
my($self, $conf) = @_;
my @cmds;
push (@cmds, {
setting => 'trigger_target',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
is_admin => 1,
});
push (@cmds, {
setting => 'trigger_config',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
is_admin => 1,
default => '',
});
push (@cmds, {
setting => 'trigger_ratio',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
is_admin => 1,
default => 0,
});
$conf->{parser}->register_commands(\@cmds);
}
#prepare the plugin
sub check_start{
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
#initialize the PHISHTAG data structure for
#saving configuration information
$pms->{PHISHTAG} = {};
$pms->{PHISHTAG}->{triggers}={};
$pms->{PHISHTAG}->{targets}=[];
#read the configuration info
$self->read_configfile($params);
$self->read_settings($params);
}
sub read_settings{
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
my $triggers= $pms->{PHISHTAG}->{triggers};
my $targets= $pms->{PHISHTAG}->{targets};
while (my ($tname,$ttarget)=each %{$pms->{conf}->{trigger_target}}){
push @$targets, [$ttarget, $tname];
$$triggers{$tname}=0;
}
}
sub read_configfile{
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
#nothing interesting here if there is not a configuration file
return if($pms->{conf}->{trigger_config} !~/\S/);
my $triggers= $pms->{PHISHTAG}->{triggers};
my $targets= $pms->{PHISHTAG}->{targets};
my $target;
local *F;
open(F, '<', $pms->{conf}->{trigger_config});
for ($!=0; <F>; $!=0) {
#each entry is separated by blank lines
undef($target) if(!/\S/);
#lines that start with pound are comments
next if(/^\s*\#/);
#an entry starts with a URL line prefixed with the word "target"
if(/^target\s+(\S+)/){
$target=[$1];
push @$targets,$target;
}
#add the test to the list of listened triggers
#and to the triggers of the last target
elsif(defined $target){
s/\s+//g;
$$triggers{$_}=0;
push @$target, $_;
}
}
defined $_ || $!==0 or
$!==EBADF ? dbg("PHISHTAG: error reading config file: $!")
: die "error reading config file: $!";
close(F) or die "error closing config file: $!";
}
sub hit_rule {
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
my $rulename = $params->{rulename};
#mark the rule as hit
if(defined($pms->{PHISHTAG}->{triggers}->{$rulename})){
$pms->{PHISHTAG}->{triggers}->{$rulename}=1;
dbg("PHISHTAG: $rulename has been caught\n");
}
}
sub check_post_learn {
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
#find out which targets have fulfilled their requirements
my $triggers= $pms->{PHISHTAG}->{triggers};
my $targets= $pms->{PHISHTAG}->{targets};
my @filled;
foreach my $target(@$targets){
my $uri= $$target[0];
my $fulfilled=1;
#all the triggers of a target have to exist for it to be fulfilled
foreach my $i(1..$#$target){
if(! $triggers->{$$target[$i]}){
$fulfilled=0;
last;
}
}
if($fulfilled){
push @filled, $uri;
dbg("PHISHTAG: Fulfilled $uri\n");
}
}
if(scalar(@filled) &&
$pms->{conf}->{trigger_ratio} > rand(100)){
$pms->{PHISHTAG}->{letgo}=0;
$pms->{PHISHTAG}->{uri}=$filled[int(rand(scalar(@filled)))];
dbg("PHISHTAG: Decided to keep this email and point to ".
$pms->{PHISHTAG}->{uri});
#make sure that SpamAssassin does not remove this email
$pms->got_hit("PHISHTAG_TOSS",
"BODY: ",
score => -100);
}
else{
dbg("PHISHTAG: Will let this email to SpamAssassin's discretion\n");
$pms->{PHISHTAG}->{letgo}=1;
}
#nothing interesting here, if we will not rewrite the email
if($pms->{PHISHTAG}->{letgo}){
return;
}
my $pristine_body=\$pms->{msg}->{pristine_body};
#dbg("PRISTINE>>\n".$$pristine_body);
my $uris = $pms->get_uri_detail_list();
#rewrite the url
while (my($uri, $info) = each %{$uris}) {
if(defined ($info->{types}->{a})){
$$pristine_body=~s/$uri/$pms->{PHISHTAG}->{uri}/mg;
}
}
dbg("PRISTINE>>\n".$$pristine_body);
}
1;
__END__
=head1 NAME
PhishTag - SpamAssassin plugin for redirecting links in incoming emails.
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::PhishTag
trigger_ratio 0.1
trigger_target RULE_NAME http://www.antiphishing.org/consumer_recs.html
=head1 DESCRIPTION
PhishTag enables administrators to rewrite links in emails that trigger certain
tests, preferably anti-phishing blacklist tests. The plugin will inhibit the
blocking of a portion of the emails that trigger the test by SpamAssassin, and
let them pass to the users' inbox after the rewrite. It is useful in providing
training to email users about company policies and general email usage.
=head1 OPTIONS
The following options can be set by modifying the configuration file.
=over 4
=item * trigger_ratio percentage_value
Sets the probability in percentage that a positive test will trigger the
email rewrite, e.g. 0.1 will rewrite on the average 1 in 1000 emails that
match the trigger.
=item * trigger_target RULE_NAME http_url
The name of the test which would trigger the email rewrite; all the URLs
will be replaced by http_url.
=back
=head1 DOWNLOAD
The source of this plugin is available at:
http://umut.topkara.org/PhishTag/PhishTag.pm
a sample configuration file is also available:
http://umut.topkara.org/PhishTag/PhishTag.cf
=head1 SEE ALSO
Check the list of tests performed by SpamAssassin to modify the
configuration file to match your needs from
https://spamassassin.apache.org/tests.html
=head1 AUTHOR
Umut Topkara, 2008, E<lt>umut@topkara.orgE<gt>
http://umut.topkara.org
=head1 COPYRIGHT AND LICENSE
This plugin is free software; you can redistribute it and/or modify
it under the same terms as SpamAssassin itself, either version 3.2.4
or, at your option, any later version of SpamAssassin you may have
available.
=cut

Some files were not shown because too many files have changed in this diff Show More