update SpamAssassin to 4.0.0

generated by make update-upstream

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
This commit is contained in:
Stoiko Ivanov 2023-03-13 21:13:17 +01:00
parent 56cebc6b1a
commit ae52237fd8
448 changed files with 32497 additions and 16814 deletions

View File

@ -1,4 +1,4 @@
Copyright (C) 2021 The Apache Software Foundation
Copyright (C) 2022 The Apache Software Foundation
Project Management Committee (PMC):

File diff suppressed because it is too large Load Diff

View File

@ -11,16 +11,16 @@ Installing or Upgrading SpamAssassin
Using CPAN via CPAN.pm:
perl -MCPAN -e shell [as root]
perl -MCPAN -e shell # as root
o conf prerequisites_policy ask
install Mail::SpamAssassin
quit
Using Linux:
Debian unstable: apt-get install spamassassin
Debian/Ubuntu: apt-get install spamassassin
Gentoo: emerge mail-filter/spamassassin
Fedora: yum install spamassassin
Fedora/CentOS/RedHat: yum install spamassassin
Alternatively download the tarfile, zipfile, and/or build your own RPM
from https://spamassassin.apache.org/. Building from tar/zip file is
@ -28,10 +28,9 @@ usually as simple as:
[unzip/untar the archive]
cd Mail-SpamAssassin-*
perl Makefile.PL
[option: add -DSPAMC_SSL to $CFLAGS to build an SSL-enabled spamc]
perl Makefile.PL # add ENABLE_SSL=yes for SSL support
make
make install [as root]
make install # as root
After installing SpamAssassin, you need to download and install the
SpamAssassin ruleset using "sa-update". See the "Installing Rules"
@ -42,16 +41,10 @@ the prerequisite information further down.
To install as non-root, see the directions below.
If you are running AFS, you may also need to specify INSTALLSITELIB and
SITELIBEXP.
Note that you can upgrade SpamAssassin using these instructions, as long
as you take care to read the caveats in the file UPGRADE. Upgrading
will not delete your learnt Bayes data or local rule modifications.
If you're using SunOS 4.1.x, see
http://wiki.spamassassin.org/w/BuildingOnSunOS4 for build tips.
Installing SpamAssassin for Personal Use (Not System-Wide)
----------------------------------------------------------
@ -62,7 +55,7 @@ These steps assume the following, so substitute as necessary:
- The location of the procmail executable is /usr/bin/procmail
Many more details of this process are at
http://wiki.apache.org/spamassassin/SingleUserUnixInstall
https://wiki.apache.org/spamassassin/SingleUserUnixInstall
1. Uncompress and extract the SpamAssassin archive, using "unzip" or
"tar xvfz", in a temporary directory.
@ -119,7 +112,7 @@ http://wiki.apache.org/spamassassin/SingleUserUnixInstall
caughtspam
Also, see the file procmailrc.example and
http://wiki.apache.org/spamassassin/UsedViaProcmail
https://wiki.apache.org/spamassassin/UsedViaProcmail
8. Now, you should be ready to send some test emails and ensure everything
works as expected. First, send yourself a test email that doesn't
@ -136,7 +129,7 @@ http://wiki.apache.org/spamassassin/UsedViaProcmail
don't lose incoming email.
Note: one possible cause for this is the use of smrsh on the MTA system;
see http://wiki.spamassassin.org/w/ProcmailVsSmrsh for details.
see https://wiki.apache.org/spamassassin/ProcmailVsSmrsh for details.
9. You can now customize SpamAssassin. See README for more information.
@ -153,18 +146,25 @@ Installing rules from network is done with a single command:
sa-update
This is normally run as root.
For security reasons, it should not be run as root, but as the user normally
running SpamAssassin. You can run the initial setup once as root, to create
necessary directories etc. Then you need to change ownership of
LOCAL_STATE_DIR to that user (usually: chown -R user:user
/var/lib/spamassassin), you can find out the default directory with
sa-update --help (look for --updatedir). Same needs to be done for
LOCAL_RULES_DIR/sa-update-keys (usually: chown -R user:user
/etc/mail/spamassassin/sa-update-keys), the directory can be found with
spamassassin --help (look for --siteconfigpath).
If you wish to install rules from downloaded files, rather than "live" from
the latest online ruleset, here is how to do it.
Obtain all the following files:
Obtain all the following files from https://spamassassin.apache.org/downloads.cgi:
Mail-SpamAssassin-rules-xxx.tgz
Mail-SpamAssassin-rules-xxx.tgz.asc
Mail-SpamAssassin-rules-xxx.tgz.md5
Mail-SpamAssassin-rules-xxx.tgz.sha1
(where xxx may look something like '3.3.0-rc1.r893295')
Mail-SpamAssassin-rules-xxx.tgz.sha512
(where xxx may look something like '4.0.0.r1900144')
Save them all to the current directory.
Obtain a rules-signing public key:
@ -180,7 +180,7 @@ Install rules from a compressed tar archive:
sa-update --install Mail-SpamAssassin-rules-xxx.tgz
Note that the ".tgz.asc", ".tgz.md5" and ".tgz.sha1" files all need to
Note that the ".tgz", ".tgz.asc" and ".tgz.sha512" files all need to
be in the same directory, otherwise sa-update will fail.
@ -197,7 +197,7 @@ CPAN
----
Most of the modules listed below are available via the Comprehensive Perl
Archive Network (CPAN, see http://www.cpan.org/ for more information).
Archive Network (CPAN, see https://www.cpan.org/ for more information).
While each module is different, most can be installed via a few simple
commands such as:
@ -222,11 +222,7 @@ through those mechanisms, too, if you prefer.
Required Perl Interpreter
-------------------------
Perl 5.8.1 or a later version is required.
Preferred versions are 5.8.8, or 5.10.1 or later.
Most of the functions might still work with Perl 5.6.1 or 5.6.2,
but 5.6.* is no longer a supported version.
Perl 5.14.0 or a later version is required.
Required Perl Modules
@ -243,81 +239,41 @@ SpamAssassin build directory.
The list of required modules that do not ship with Perl and must be
installed:
- Digest::SHA1 (from CPAN),
or the newer Digest::SHA which is a perl base module since Perl 5.10.0
- Digest::SHA (from CPAN)
The Digest::SHA1 module is used as a cryptographic hash for some
tests and the Bayes subsystem if Digest::SHA module is not available.
An external perl module razor-agents-2.84 as used by a Razor2 plugin
seems to be the only remaining component depending on Digest::SHA1
(note that a packager may ship a patched version of razor-agents which
can use Digest::SHA instead)
Debian: apt-get install libdigest-sha1-perl
Gentoo: emerge dev-perl/Digest-SHA1
Fedora: yum install perl-Digest-SHA1
Used as a cryptographic hash for some tests and the Bayes subsystem.
It is also required by the DKIM plugin.
- HTML::Parser >= 3.43 (from CPAN)
HTML is used for an ever-increasing amount of email so this dependency
is unavoidable. Run "perldoc -q html" for additional information.
Debian: apt-get install libhtml-parser-perl
Gentoo: emerge dev-perl/HTML-Parser
Fedora: yum install perl-HTML-Parser
- Net::DNS (from CPAN)
- Net::DNS >= 0.69 (from CPAN)
Used for all DNS-based tests (SBL, XBL, SpamCop, DSBL, etc.),
perform MX checks, used when manually reporting spam to SpamCop,
and used by sa-update to gather version information.
You need to make sure the Net::DNS version is sufficiently up-to-date:
- NetAddr::IP >= 4.010 (from CPAN)
- version 0.34 or higher on Unix systems
- version 0.46 or higher on Windows systems
Used to parse IP addresses and IP address ranges for "trusted_networks".
Used in determining which DNS tests are to be done for each of the
header's received fields. Used by AWL plugin for extracting network
addresses. Used by DNSxL rules for assembling DNS queries.
Debian/Ubuntu: apt-get install libnet-dns-perl
Fedora: yum install perl-Net-DNS
Avoid buggy versions 4.034-4.035 and 4.045-4.054.
- NetAddr::IP (from CPAN)
Examples of installing required modules on popular distributions:
Used to parse IP addresses and IP address ranges for
"trusted_networks".
Debian/Ubuntu:
apt-get install libdigest-sha-perl libhtml-parser-perl libnet-dns-perl libnetaddr-ip-perl
Debian/Ubuntu: apt-get install libnetaddr-ip-perl
Fedora: yum install perl-NetAddr-IP
Gentoo:
emerge dev-perl/Digest-SHA dev-perl/HTML-Parser dev-perl/Net-DNS dev-perl/NetAddr-IP
- Time::HiRes (from CPAN)
Used by asynchronous DNS lookups to operate timeouts with subsecond
precision and to report processing times accurately.
- LWP (aka libwww-perl) (from CPAN)
This set of modules will include both the LWP::UserAgent and
HTTP::Date modules, used by sa-update to retrieve update archives.
Fedora: yum install perl-libwww-perl
- HTTP::Date (from CPAN)
Used by sa-update to deal with certain Date requests.
- IO::Zlib (from CPAN)
Used by sa-update to uncompress update archives.
Version 1.04 or later is required.
Fedora: yum install perl-IO-Zlib
- Archive::Tar (from CPAN)
Used by sa-update to expand update archives.
Version 1.23 or later is required.
Fedora: yum install perl-Archive-Tar
Fedora/CentOS/RedHat:
yum install perl-Digest-SHA perl-HTML-Parser perl-Net-DNS perl-NetAddr-IP
Optional Modules
@ -329,140 +285,161 @@ their version is too low, SpamAssassin will still work, just not as
effectively because some of the spam-detection tests will have to be
skipped.
Note: SpamAssassin will not warn you if these are installed, but the
version is too low for them to be used.
Note: SpamAssassin may not warn you if these are installed, but the version
is too low for them to be used.
- MIME::Base64
- MIME::Base64 (from CPAN)
This module is highly recommended to increase the speed with which
Base64 encoded messages/mail parts are decoded.
- Encode::Detect::Detector (from CPAN)
- DB_File (from CPAN, included in many distributions)
For proper detection of charsets and converting them into Unicode, you
will need to install this module.
Used to store data on-disk, for the Bayes-style logic, TxRep, and
auto-whitelist. *Much* more efficient than the other standard Perl
database packages. Strongly recommended.
- Net::LibIDN2 (from CPAN)
- Net::LibIDN (from CPAN)
There seems to be a bug in libdb 4.1.25, which is
distributed by default on some versions of Linux. See
http://wiki.apache.org/spamassassin/DbFileSleepBug for details.
Provides mapping between Internationalized Domain Names (IDN) in Unicode
and ASCII-compatible encoding (ACE) for use in DNS and comparisions.
The module is optional, but without it Unicode IDN names found in mail
will not be suitable for DNS queries and welcome/blocklisting.
Either module should work fine, but newer Net::LibIDN2 might not be
available in all distributions.
- Net::SMTP (from CPAN)
Used when manually reporting spam to SpamCop.
- Mail::SPF (from CPAN)
Used to check DNS Sender Policy Framework (SPF) records to fight email
address forgery and make it easier to identify spams. This module
makes Mail::SPF::Query obsolete.
Net::DNS version 0.58 or higher is required.
Note that NetAddr::IP (required by Mail::SPF) versions up to and
including version 4.006 include a bug that will slow down the entire
perl interpreter. NetAddr::IP version 4.007 or later fixes this.
- Geo::IP (from CPAN)
Used by the RelayCountry plugin (not enabled by default) to determine
the domain country codes of each relay in the path of an email.
IP::Country::Fast is used as alternative if Geo::IP is not installed.
This is not recommended as it's obsolete.
- Net::Ident (from CPAN)
If you plan to use the --auth-ident option to spamd, you will need
to install this module.
- IO::Socket::INET6 (from CPAN)
This is required if the first nameserver listed in your IP
configuration or /etc/resolv.conf file is available only via an IPv6
address.
Fedora: yum install perl-IO-Socket-INET6
- IO::Socket::SSL (from CPAN)
If you wish to use SSL encryption to communicate between spamc and
spamd (the --ssl option to spamd), you need to install this
module. (You will need the OpenSSL libraries and use the
ENABLE_SSL="yes" argument to Makefile.PL to build and run an SSL
compatible spamc.)
Fedora: yum install perl-IO-Socket-SSL
- Compress::Zlib (from CPAN)
If you wish to use the optional zlib compression for communication
between spamc and spamd (the -z option to spamc), useful for
long-distance use of spamc over the internet, you need to install
this module.
Fedora: yum install perl-Compress-Zlib
- Email::Address::XS
Used to parse email addresses from header fields like To/From/cc, per
RFC 5322. If installed, it may additionally be used by internal parser
to process complex lists.
- Mail::DKIM (from CPAN)
If this module is installed, and you enable the DKIM plugin,
SpamAssassin will perform DKIM lookups when a DKIM-Signature header is
present in the message headers. Current versions of Mail::DKIM (0.20
or later) also perform Domain Key lookups on DomainKey-Signature headers,
present in the message headers. Current versions of Mail::DKIM (0.20 or
later) also perform Domain Key lookups on DomainKey-Signature headers,
without requiring the Mail::DomainKeys module, which is now obsolete.
Version 0.37 or later is preferred, the absolute minimal version is 0.31.
Version 0.37 or later is preferred, the absolute minimal version is
0.31.
Note that the Mail::DKIM module in turn requires the Digest::SHA module
and OpenSSL libraries.
- Mail::SPF (from CPAN)
Used to check DNS Sender Policy Framework (SPF) records to fight email
address forgery and make it easier to identify spams.
- MaxMind::DB::Reader::XS (GeoIP2) (from CPAN)
- MaxMind::DB::Reader (GeoIP2) (from CPAN)
- IP::Country::DB_File (from CPAN)
- Geo::IP (old deprecated GeoIP) (from CPAN)
- IP::Country::Fast (deprecated) (from CPAN)
Geolocation modules, choose one from the list (in recommended order).
Used by the RelayCountry plugin (not enabled by default) to determine
the domain country codes of each relay in the path of an email. Also
used by the URILocalBL plugin (not enabled by default) to provide ISP
and Country code based filtering.
See: https://wiki.apache.org/spamassassin/RelayCountryPlugin
- Mail::DMARC
Used by the optional DMARC check plugin, which itself requires DKIM and
SPF features working.
- DB_File (from CPAN)
Used to store data on-disk, for the Bayes-style logic, TxRep, and
auto-welcomelist. *Much* more efficient than the other standard Perl
database packages. Strongly recommended.
There seems to be a bug in libdb 4.1.25, which is
distributed by default on some versions of Linux. See
https://wiki.apache.org/spamassassin/DbFileSleepBug for details.
- IO::Socket::IP (from CPAN)
- IO::Socket::INET6 (from CPAN)
Installing IO::Socket::IP is recommended if spamd is to listen on IPv6
sockets or if DNS queries should go to an IPv6 name server. If
IO::Socket::IP is not available, using a deprecated module
IO::Socket::INET6 will be attempted, and in its absence the support for
IPv6 will not be available. Some plugins and underlying modules may
also prefer IO::Socket::IP over IO::Socket::INET6.
- IO::Socket::SSL (from CPAN)
If you wish to use SSL encryption to communicate between spamc and spamd
(the --ssl option to spamd), you need to install this module. (You will
need the OpenSSL libraries and use the ENABLE_SSL=yes argument to
Makefile.PL to build and run an SSL compatible spamc.)
- Net::Patricia
If this module is available, it will be used for IP address lookups in
tables internal_networks, trusted_networks, and msa_networks.
Recommended when a number of entries in these tables is hundred or more.
However, in case of overlapping (or conflicting) networks in these
tables, lookup results may differ as Net::Patricia finds a
tightest-matching entry, while a sequential NetAddr::IP search finds a
first-matching entry. So when overlapping network ranges are given,
specifying more specific subnets (longest netmask) first, followed by
wider subnets ensures predictable results.
- DBI *and* DBD driver/modules for your database (from CPAN)
If you intend to use SpamAssassin with an SQL database backend for
user configuration data, Bayes storage, or other storage, you will need
to have these installed; both the basic DBI module and the driver for
your database.
If you intend to use SpamAssassin with an SQL database backend for user
configuration data, Bayes storage, or other storage, you will need to
have these installed; both the basic DBI module and the driver for your
database (for example DBD::MariaDB, DBD::mysql or DBD::Pg).
- Archive::Zip
- IO::String
- Encode::Detect (from CPAN)
If you plan to use the normalize_charset config setting to detect
charsets and convert them into Unicode, you will need to install
this module.
- Apache::Test (from CPAN)
If you plan to run the Apache2 version of spamd in the
"spamd-apache2" directory, you will need to install this
module.
- Apache 2 and mod_perl
If you plan to run the Apache2 version of spamd in the "spamd-apache2"
directory, you will need to ensure these are installed.
Ubuntu: sudo apt-get install apache2 libapache2-mod-perl2
Required by the optional OLEVBMacro plugin.
- Razor2
If you plan to use Vipul's Razor, note that versions up to and
including version 2.82 include a bug that will slow down the entire
perl interpreter. Version 2.83 or later fixes this.
If you plan to use Vipul's Razor, note that versions up to and including
version 2.82 include a bug that will slow down the entire perl
interpreter. Version 2.83 or later fixes this.
If you do not plan to use this plugin, be sure to comment out
its loadplugin line in "/etc/mail/spamassassin/v310.pre".
If you do not plan to use this plugin, be sure to comment out its
loadplugin line in "/etc/mail/spamassassin/v310.pre".
- Digest::SHA1 (from CPAN)
An external perl module razor-agents-2.84 as used by a Razor2 plugin
seems to be the only remaining component depending on Digest::SHA1 (note
that a packager may ship a patched version of razor-agents which can use
Digest::SHA instead)
- LWP::UserAgent (aka libwww-perl) (from CPAN)
Can be used by sa-update to retrieve update archives, as alternative to
curl/wget/fetch.
- Net::SMTP (from CPAN)
Used when manually reporting spam to SpamCop.
Examples of installing most recommended modules on popular distributions:
Debian/Ubuntu:
apt-get install libencode-detect-perl libnet-libidn-perl \
libemail-address-xs-perl libmail-dkim-perl libmail-spf-perl \
libio-socket-ip-perl
Gentoo:
emerge dev-perl/Encode-Detect dev-perl/Net-LibIDN \
dev-perl/Email-Address-XS dev-perl/Mail-DKIM dev-perl/Mail-SPF
Fedora/CentOS/RedHat:
yum install perl-MIME-Base64 perl-Encode-Detect perl-Net-LibIDN \
perl-Email-Address-XS perl-Mail-DKIM perl-Mail-SPF perl-IO-Socket-IP
What Next?
@ -471,7 +448,7 @@ What Next?
Take a look at the USAGE document for more information on how to use
SpamAssassin.
The SpamAssassin Wiki <http://wiki.spamassassin.org/> contains
The SpamAssassin Wiki <https://wiki.apache.org/spamassassin/> contains
information on custom plugins, extensions, and other optional modules
included with SpamAssassin.

View File

@ -30,6 +30,6 @@ Notes on building SpamAssassin on VMS
- bug 1099 in the SA Bugzilla is being used to track progress.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=1099
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=1099

View File

@ -27,7 +27,7 @@ lib/Mail/SpamAssassin.pm
lib/Mail/SpamAssassin/AICache.pm
lib/Mail/SpamAssassin/ArchiveIterator.pm
lib/Mail/SpamAssassin/AsyncLoop.pm
lib/Mail/SpamAssassin/AutoWhitelist.pm
lib/Mail/SpamAssassin/AutoWelcomelist.pm
lib/Mail/SpamAssassin/Bayes.pm
lib/Mail/SpamAssassin/Bayes/CombineChi.pm
lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
@ -49,6 +49,7 @@ lib/Mail/SpamAssassin/Constants.pm
lib/Mail/SpamAssassin/DBBasedAddrList.pm
lib/Mail/SpamAssassin/Dns.pm
lib/Mail/SpamAssassin/DnsResolver.pm
lib/Mail/SpamAssassin/GeoDB.pm
lib/Mail/SpamAssassin/HTML.pm
lib/Mail/SpamAssassin/Locales.pm
lib/Mail/SpamAssassin/Locker.pm
@ -74,20 +75,23 @@ lib/Mail/SpamAssassin/Plugin/ASN.pm
lib/Mail/SpamAssassin/Plugin/AWL.pm
lib/Mail/SpamAssassin/Plugin/AccessDB.pm
lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
lib/Mail/SpamAssassin/Plugin/AuthRes.pm
lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
lib/Mail/SpamAssassin/Plugin/Bayes.pm
lib/Mail/SpamAssassin/Plugin/BodyEval.pm
lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
lib/Mail/SpamAssassin/Plugin/Check.pm
lib/Mail/SpamAssassin/Plugin/DCC.pm
lib/Mail/SpamAssassin/Plugin/DecodeShortURLs.pm
lib/Mail/SpamAssassin/Plugin/DKIM.pm
lib/Mail/SpamAssassin/Plugin/DMARC.pm
lib/Mail/SpamAssassin/Plugin/DNSEval.pm
lib/Mail/SpamAssassin/Plugin/ExtractText.pm
lib/Mail/SpamAssassin/Plugin/FreeMail.pm
lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
lib/Mail/SpamAssassin/Plugin/HashBL.pm
lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
lib/Mail/SpamAssassin/Plugin/Hashcash.pm
lib/Mail/SpamAssassin/Plugin/HashBL.pm
lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
@ -116,7 +120,7 @@ lib/Mail/SpamAssassin/Plugin/URIDetail.pm
lib/Mail/SpamAssassin/Plugin/URIEval.pm
lib/Mail/SpamAssassin/Plugin/VBounce.pm
lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
lib/Mail/SpamAssassin/Plugin/WelcomeListSubject.pm
lib/Mail/SpamAssassin/PluginHandler.pm
lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
lib/Mail/SpamAssassin/RegistryBoundaries.pm
@ -132,7 +136,6 @@ lib/Mail/SpamAssassin/Util/ScopedTimer.pm
lib/Mail/SpamAssassin/Util/TieOneStringHash.pm
lib/spamassassin-run.pod
procmailrc.example
rules.README
rules/active.list
rules/init.pre
rules/languages
@ -148,7 +151,10 @@ rules/v340.pre
rules/v341.pre
rules/v342.pre
rules/v343.pre
rules/v400.pre
rules/20_aux_tlds.cf
rules-extras/README.txt
rules-extras/10_uridnsbl_skip_financial.cf
sa-awl.raw
sa-check_spamd.raw
sa-compile.raw
@ -222,6 +228,9 @@ sql/awl_mysql.sql
sql/awl_pg.sql
sql/bayes_mysql.sql
sql/bayes_pg.sql
sql/decodeshorturl_mysql.sql
sql/decodeshorturl_pg.sql
sql/decodeshorturl_sqlite.sql
sql/userpref_mysql.sql
sql/userpref_pg.sql
sql/txrep_mysql.sql
@ -231,10 +240,13 @@ t/README
t/SATest.pl
t/SATest.pm
t/all_modules.t
t/askdns.t
t/authres.t
t/autolearn.t
t/autolearn_force.t
t/autolearn_force_fail.t
t/basic_lint.t
t/basic_lint_net.t
t/basic_lint_without_sandbox.t
t/basic_meta.t
t/basic_meta2.t
@ -246,6 +258,7 @@ t/bayessdbm.t
t/bayessdbm_seen_delete.t
t/bayessql.t
t/blacklist_autolearn.t
t/blocklist_autolearn.t
t/body_mod.t
t/body_str.t
t/check_implemented.t
@ -256,10 +269,13 @@ t/config_text.t
t/config_tree_recurse.t
t/cpp_comments_in_spamc.t
t/cross_user_config_leak.t
t/olevbmacro.t
t/dmarc.t
t/arc.t
t/data/01_test_rules.cf
t/data/01_test_rules.pre
t/data/Dumpheaders.pm
t/data/dkim/arc/ok01.eml
t/data/dkim/arc/ko01.eml
t/data/dkim/test-adsp-11.msg
t/data/dkim/test-adsp-12.msg
t/data/dkim/test-adsp-13.msg
@ -307,6 +323,15 @@ t/data/dkim/test-pass-23.msg
t/data/etc/hello.txt
t/data/etc/testhost.cert
t/data/etc/testhost.key
t/data/geodb/GeoIP2-City.mmdb
t/data/geodb/GeoIP2-Country.mmdb
t/data/geodb/GeoIP2-ISP.mmdb
t/data/geodb/GeoIPCity.dat
t/data/geodb/GeoIPISP.dat
t/data/geodb/create_GeoIPCity.README
t/data/geodb/create_GeoIPISP.README
t/data/geodb/create_ipcc.sh
t/data/geodb/ipcc.db
t/data/mime-subject.txt
t/data/nice/001
t/data/nice/002
@ -324,6 +349,7 @@ t/data/nice/013
t/data/nice/014
t/data/nice/015
t/data/nice/016
t/data/nice/authres
t/data/nice/base64.txt
t/data/nice/crlf-endings
t/data/nice/dkim/AddedVtag_07
@ -336,6 +362,10 @@ t/data/nice/dkim/MultipleSig_06
t/data/nice/dkim/NonExistingHeader_09
t/data/nice/dkim/Nowsp_03
t/data/nice/dkim/Simple_02
t/data/nice/dmarc/noneok.eml
t/data/nice/dmarc/quarok.eml
t/data/nice/dmarc/rejectok.eml
t/data/nice/dmarc/strictrejectok.eml
t/data/nice/mailman_message.txt
t/data/nice/mime1
t/data/nice/mime2
@ -353,6 +383,14 @@ t/data/nice/spf1
t/data/nice/spf2
t/data/nice/spf3
t/data/nice/spf3-received-spf
t/data/nice/spf4-received-spf-nofold
t/data/nice/spf5-received-spf-crlf
t/data/nice/spf6-received-spf-crlf2
t/data/nice/unicode1
t/data/nice/unicode2
t/data/nice.mbox
t/data/phishing/openphish-feed.txt
t/data/phishing/phishtank-feed.csv
t/data/reporterplugin.pm
t/data/spam/001
t/data/spam/002
@ -380,107 +418,149 @@ t/data/spam/badmime3.txt
t/data/spam/base64.txt
t/data/spam/bsmtp
t/data/spam/bsmtpnull
t/data/spam/decodeshorturl/base.eml
t/data/spam/decodeshorturl/base2.eml
t/data/spam/decodeshorturl/chain.eml
t/data/spam/dmarc/nodmarc.eml
t/data/spam/dmarc/noneko.eml
t/data/spam/dmarc/quarko.eml
t/data/spam/dmarc/rejectko.eml
t/data/spam/dmarc/strictrejectko.eml
t/data/spam/dnsbl.eml
t/data/spam/dnsbl_domsonly.eml
t/data/spam/dnsbl_ipsonly.eml
t/data/spam/extracttext/gtube_png.eml
t/data/spam/extracttext/gtube_pdf.eml
t/data/spam/extracttext/gtube_b64_oct.eml
t/data/spam/freemail1
t/data/spam/freemail2
t/data/spam/freemail3
t/data/spam/fromnamespoof/spoof1
t/data/spam/gtube.eml
t/data/spam/gtubedcc.eml
t/data/spam/gtubedcc_crlf.eml
t/data/spam/hashbl
t/data/spam/olevbmacro/encrypted.eml
t/data/spam/olevbmacro/goodcsv.eml
t/data/spam/olevbmacro/macro.eml
t/data/spam/olevbmacro/malicemacro.eml
t/data/spam/olevbmacro/nomacro.eml
t/data/spam/olevbmacro/renamedmacro.eml
t/data/spam/olevbmacro/target_uri.eml
t/data/spam/olevbmacro/zippwmacro.eml
t/data/spam/phishing_openphish.eml
t/data/spam/phishing_phishtank.eml
t/data/spam/pyzor
t/data/spam/razor2
t/data/spam/relayUS.eml
t/data/spam/spf1
t/data/spam/spf2
t/data/spam/spf3
t/data/spam/unicode1
t/data/spam/urilocalbl_net.eml
t/data/spamc_blank.cf
t/data/taintcheckplugin.pm
t/data/testplugin.pm
t/data/testplugin2.pm
t/data/validuserplugin.pm
t/data/whitelists/action.eff.org
t/data/whitelists/amazon_co_uk_ship
t/data/whitelists/amazon_com_ship
t/data/whitelists/cert.org
t/data/whitelists/debian_bts_reassign
t/data/whitelists/ibm_enews_de
t/data/whitelists/infoworld
t/data/whitelists/linuxplanet
t/data/whitelists/lp.org
t/data/whitelists/media_unspun
t/data/whitelists/mlist_mailman_message
t/data/whitelists/mlist_yahoo_groups_message
t/data/whitelists/mypoints
t/data/whitelists/neat_net_tricks
t/data/whitelists/netcenter-direct_de
t/data/whitelists/netsol_renewal
t/data/whitelists/networkworld
t/data/whitelists/oracle_net_techblast
t/data/whitelists/orbitz.com
t/data/whitelists/paypal.com
t/data/whitelists/register.com_password
t/data/whitelists/ryanairmail.com
t/data/whitelists/sf.net
t/data/whitelists/winxpnews.com
t/data/whitelists/yahoo-inc.com
t/data/phishing/openphish-feed.txt
t/data/phishing/phishtank-feed.csv
t/data/spam/phishing_openphish.eml
t/data/spam/phishing_phishtank.eml
t/phishing.t
t/data/welcomelists/action.eff.org
t/data/welcomelists/amazon_co_uk_ship
t/data/welcomelists/amazon_com_ship
t/data/welcomelists/cert.org
t/data/welcomelists/debian_bts_reassign
t/data/welcomelists/ibm_enews_de
t/data/welcomelists/infoworld
t/data/welcomelists/linuxplanet
t/data/welcomelists/lp.org
t/data/welcomelists/media_unspun
t/data/welcomelists/mlist_mailman_message
t/data/welcomelists/mlist_yahoo_groups_message
t/data/welcomelists/mypoints
t/data/welcomelists/neat_net_tricks
t/data/welcomelists/netcenter-direct_de
t/data/welcomelists/netsol_renewal
t/data/welcomelists/networkworld
t/data/welcomelists/oracle_net_techblast
t/data/welcomelists/orbitz.com
t/data/welcomelists/paypal.com
t/data/welcomelists/register.com_password
t/data/welcomelists/ryanairmail.com
t/data/welcomelists/sf.net
t/data/welcomelists/winxpnews.com
t/data/welcomelists/yahoo-inc.com
t/date.t
t/db_awl_path.t
t/db_awl_path_welcome_block.t
t/db_awl_perms.t
t/db_awl_perms_welcome_block.t
t/db_based_welcomelist.t
t/db_based_welcomelist_ips.t
t/db_based_whitelist.t
t/db_based_whitelist_ips.t
t/dcc.t
t/debug.t
t/decodeshorturl.t
t/desc_wrap.t
t/dkim.t
t/dnsbl.t
t/dnsbl_sc_meta.t
t/duplicates.t
t/dnsbl_subtests.t
t/enable_compat.t
t/extracttext.t
t/freemail.t
t/freemail_welcome_block.t
t/fromnamespoof.t
t/get_all_headers.t
t/get_headers.t
t/gtube.t
t/hashcash.t
t/header.t
t/header_utf8.t
t/hashbl.t
t/html_colors.t
t/html_obfu.t
t/html_utf8.t
t/idn_dots.t
t/if_can.t
t/if_else.t
t/ifversion.t
t/ip_addrs.t
t/lang_lint.t
t/lang_pl_tests.t
t/line_endings.t
t/lint_nocreate_prefs.t
t/local_tests_only.t
t/memory_cycles.t
t/metadata.t
t/mimeheader.t
t/mimeparse.t
t/missing_hb_separator.t
t/mkrules.t
t/mkrules_else.t
t/nonspam.t
t/olevbmacro.t
t/originating_ip_hdr.t
t/pdfinfo.t
t/phishing.t
t/plugin.t
t/plugin_file.t
t/plugin_priorities.t
t/prefs_include.t
t/priorities.t
t/priorities_welcome_block.t
t/perlcritic.t
t/perlcritic.pl
t/podchecker.t
t/pyzor.t
t/razor2.t
t/rcvd_parser.t
t/re_base_extraction.t
t/recips.t
t/recreate.t
t/recursion.t
t/regexp_named_capture.t
t/regexp_valid.t
t/relaycountry_fast.t
t/relaycountry_geoip.t
t/relaycountry_geoip2.t
t/relative_scores.t
t/relaycountry.t
t/report_safe.t
t/reportheader.t
t/reportheader_8bit.t
@ -500,14 +580,18 @@ t/rule_multiple.t
t/rule_names.t
t/rule_types.t
t/sa_awl.t
t/sa_awl_welcome_block.t
t/sa_check_spamd.t
t/sa_compile.t
t/sha1.t
t/shortcircuit.t
t/shortcircuit_before_dns.t
t/spam.t
t/spamc.t
t/spamc_B.t
t/spamc_E.t
t/spamc_H.t
t/spamc_bug6176.t
t/spamc_c.t
t/spamc_c_stdout_closed.t
t/spamc_cf.t
@ -540,6 +624,7 @@ t/spamd_report.t
t/spamd_report_ifspam.t
t/spamd_sql_prefs.t
t/spamd_ssl.t
t/spamd_ssl_z.t
t/spamd_ssl_accept_fail.t
t/spamd_stop.t
t/spamd_symbols.t
@ -548,8 +633,11 @@ t/spamd_unix.t
t/spamd_unix_and_tcp.t
t/spamd_user_rules_leak.t
t/spamd_utf8.t
t/spamd_welcomelist_leak.t
t/spamd_whitelist_leak.t
t/spf.t
t/spf_welcome_block.t
t/sql_based_welcomelist.t
t/sql_based_whitelist.t
t/stop_always_matching_regexps.t
t/strip2.t
@ -557,30 +645,31 @@ t/strip_no_subject.t
t/stripmarkup.t
t/tainted_msg.t
t/test_dir
t/testrules.yml
t/text_bad_ctype.t
t/timeout.t
t/trust_path.t
t/uri.t
t/uri_html.t
t/uri_list.t
t/uri_saferedirect.t
t/uri_text.t
t/uribl.t
t/urilocalbl_geoip.t
t/uri_saferedirect.t
t/uribl_all_types.t
t/uribl_domains_only.t
t/uribl_ips_only.t
t/urilocalbl.t
t/utf8.t
t/util_wrap.t
t/welcomelist_addrs.t
t/welcomelist_from.t
t/welcomelist_subject.t
t/welcomelist_to.t
t/wlbl_uri.t
t/whitelist_addrs.t
t/whitelist_from.t
t/whitelist_subject.t
t/whitelist_to.t
t/zz_cleanup.t
t/spamc_bug6176.t
t/data/spam/dnsbl_domsonly.eml
t/uribl_domains_only.t
t/data/spam/dnsbl_ipsonly.eml
t/uribl_all_types.t
t/uribl_ips_only.t
t/uri_list.t
t/dnsbl_subtests.t
powered_by/128-powered-by-spamassassin.png
powered_by/256-powered-by-spamassassin.png
powered_by/512-powered-by-spamassassin.png

View File

@ -10,6 +10,7 @@
\.pid$
\.so$
\.svn/
\.gitattributes$
\.gitignore$
\.swp$
\.tmp$
@ -22,6 +23,7 @@
\btmon\.out$
\b[Oo][Ll][Dd]$
\b[Oo][Uu][Tt]$
^.github/
^blib/
^blibdirs$
^build/3\.\d\.\d_change_summary$
@ -63,7 +65,7 @@
^t/bayessql\.cf$
^t/config$
^t/data/nice/cjk/
^t/data/whitelists/
^t/data/welcomelists/
^t/do_net$
^t/log/
^t/rule_tests\.t$
@ -118,7 +120,4 @@
^build/rebuild_xt$
^build/repackage_latest_update_rules$
^MYMETA.(json|yml)$
^trunk-only.*
^t/mkrules\.t$
^t/mkrules_else\.t$
^t/spamc_H\.t$
^textcat/

View File

@ -4,13 +4,14 @@
"The Apache SpamAssassin Project <dev@spamassassin.apache.org>"
],
"dynamic_config" : 1,
"generated_by" : "ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921",
"generated_by" : "ExtUtils::MakeMaker version 7.62, CPAN::Meta::Converter version 2.150010",
"license" : [
"unknown",
"apache_2_0"
],
"meta-spec" : {
"url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
"version" : "2"
"version" : 2
},
"name" : "Mail-SpamAssassin",
"no_index" : {
@ -22,26 +23,72 @@
"prereqs" : {
"build" : {
"requires" : {
"ExtUtils::MakeMaker" : "0"
"ExtUtils::MakeMaker" : "6.64"
}
},
"configure" : {
"requires" : {
"ExtUtils::MakeMaker" : "6.64"
}
},
"runtime" : {
"recommends" : {
"Archive::Zip" : "0",
"BSD::Resource" : "0",
"Compress::Zlib" : "0",
"DBD::SQLite" : "1.5901",
"DBI" : "0",
"DB_File" : "0",
"Email::Address::XS" : "0",
"Encode::Detect::Detector" : "0",
"Geo::IP" : "0",
"IO::Socket::INET6" : "0",
"IO::Socket::IP" : "0.09",
"IO::Socket::SSL" : "1.76",
"IO::String" : "0",
"IP::Country::DB_File" : "0",
"IP::Country::Fast" : "0",
"LWP::UserAgent" : "0",
"MIME::Base64" : "0",
"Mail::DKIM" : "0.37",
"Mail::DMARC" : "0",
"Mail::SPF" : "0",
"MaxMind::DB::Reader" : "0",
"MaxMind::DB::Reader::XS" : "0",
"Net::CIDR::Lite" : "0",
"Net::LibIDN" : "0",
"Net::LibIDN2" : "0",
"Net::Patricia" : "1.16",
"Net::SMTP" : "0",
"Razor2::Client::Agent" : "2.61"
},
"requires" : {
"Archive::Tar" : "1.23",
"Digest::SHA1" : "0",
"Errno" : "0",
"File::Copy" : "2.02",
"File::Spec" : "0.8",
"HTML::Parser" : "3.43",
"IO::Zlib" : "1.04",
"Mail::DKIM" : "0.31",
"Net::DNS" : "0.34",
"Net::DNS" : "0.69",
"NetAddr::IP" : "4.01",
"Pod::Usage" : "1.1",
"Sys::Hostname" : "0",
"Test::More" : "0",
"Time::HiRes" : "0",
"Time::Local" : "0"
"Time::Local" : "0",
"perl" : "5.014"
}
},
"test" : {
"recommends" : {
"Net::DNS::Nameserver" : "0"
},
"requires" : {
"Devel::Cycle" : "0",
"Perl::Critic::Policy::Perlsecret" : "0",
"Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict" : "0",
"Test::More" : "0",
"Text::Diff" : "0"
}
}
},
@ -57,5 +104,6 @@
},
"x_MailingList" : "http://wiki.apache.org/spamassassin/MailingLists"
},
"version" : "3.004006"
"version" : "4.000000",
"x_serialization_backend" : "JSON::PP version 4.06"
}

View File

@ -3,37 +3,73 @@ abstract: 'Apache SpamAssassin is an extensible email filter which is used to id
author:
- 'The Apache SpamAssassin Project <dev@spamassassin.apache.org>'
build_requires:
ExtUtils::MakeMaker: 0
Devel::Cycle: '0'
ExtUtils::MakeMaker: '6.64'
Perl::Critic::Policy::Perlsecret: '0'
Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict: '0'
Test::More: '0'
Text::Diff: '0'
configure_requires:
ExtUtils::MakeMaker: '6.64'
dynamic_config: 1
generated_by: 'ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921'
license: apache
generated_by: 'ExtUtils::MakeMaker version 7.62, CPAN::Meta::Converter version 2.150010'
license: unknown
meta-spec:
url: http://module-build.sourceforge.net/META-spec-v1.4.html
version: 1.4
version: '1.4'
name: Mail-SpamAssassin
no_index:
directory:
- t
- inc
recommends:
Archive::Zip: '0'
BSD::Resource: '0'
Compress::Zlib: '0'
DBD::SQLite: '1.5901'
DBI: '0'
DB_File: '0'
Email::Address::XS: '0'
Encode::Detect::Detector: '0'
Geo::IP: '0'
IO::Socket::INET6: '0'
IO::Socket::IP: '0.09'
IO::Socket::SSL: '1.76'
IO::String: '0'
IP::Country::DB_File: '0'
IP::Country::Fast: '0'
LWP::UserAgent: '0'
MIME::Base64: '0'
Mail::DKIM: '0.37'
Mail::DMARC: '0'
Mail::SPF: '0'
MaxMind::DB::Reader: '0'
MaxMind::DB::Reader::XS: '0'
Net::CIDR::Lite: '0'
Net::LibIDN: '0'
Net::LibIDN2: '0'
Net::Patricia: '1.16'
Net::SMTP: '0'
Razor2::Client::Agent: '2.61'
requires:
Archive::Tar: 1.23
Digest::SHA1: 0
Errno: 0
File::Copy: 2.02
File::Spec: 0.8
HTML::Parser: 3.43
IO::Zlib: 1.04
Mail::DKIM: 0.31
Net::DNS: 0.34
NetAddr::IP: 4.01
Pod::Usage: 1.1
Sys::Hostname: 0
Test::More: 0
Time::HiRes: 0
Time::Local: 0
Archive::Tar: '1.23'
Errno: '0'
File::Copy: '2.02'
File::Spec: '0.8'
HTML::Parser: '3.43'
IO::Zlib: '1.04'
Mail::DKIM: '0.31'
Net::DNS: '0.69'
NetAddr::IP: '4.01'
Pod::Usage: '1.1'
Sys::Hostname: '0'
Time::HiRes: '0'
Time::Local: '0'
perl: '5.014'
resources:
MailingList: http://wiki.apache.org/spamassassin/MailingLists
homepage: https://spamassassin.apache.org/
license: http://www.apache.org/licenses/LICENSE-2.0.html
repository: http://svn.apache.org/repos/asf/spamassassin/
x_MailingList: http://wiki.apache.org/spamassassin/MailingLists
version: 3.004006
version: '4.000000'
x_serialization_backend: 'CPAN::Meta::YAML version 0.018'

View File

@ -1,19 +1,14 @@
#!/usr/bin/perl
require 5.6.1;
require v5.14.0;
use strict;
use warnings;
use Config;
use ExtUtils::MakeMaker;
use ExtUtils::MakeMaker 6.64;
# avoid stupid 'Argument "6.30_01" isn't numeric in numeric ge (>=)' warnings;
my $mm_version = eval $ExtUtils::MakeMaker::VERSION;
# raising the version of makemaker to 6.46 per bug 6598 & 6131
if ($mm_version < 6.46) {
die "Apache SpamAssassin Makefile.PL requires at least ExtUtils::MakeMaker v6.46";
}
# raising the version of makemaker to 6.64 to use TEST_REQUIRES
use constant MIN_MAKEMAKER_VERSION => 6.64;
use constant RUNNING_ON_WINDOWS => ($^O =~ /^(mswin|dos|os2)/oi);
use constant HAS_DBI => eval { require DBI; };
@ -177,9 +172,7 @@ my %makefile = (
},
# be quite explicit about this; afaik CPAN.pm is sensible using this
# also see CURRENT_PM below
'PREREQ_PM' => {
'Digest::SHA1' => 0, # 2.0 is oldest tested version
'File::Spec' => 0.8, # older versions lack some routines we need
'File::Copy' => 2.02, # this version is shipped with 5.005_03, the oldest version known to work
'Pod::Usage' => 1.10, # all versions prior to this do seem to be buggy
@ -187,18 +180,37 @@ my %makefile = (
'Archive::Tar' => 1.23, # for sa-update
'IO::Zlib' => 1.04, # for sa-update
'Mail::DKIM' => 0.31,
'Net::DNS' => (RUNNING_ON_WINDOWS ? 0.46 : 0.34), # bugs in older revs
'Net::DNS' => 0.69,
'NetAddr::IP' => 4.010,
'Sys::Hostname' => 0,
'Test::More' => 0,
'Time::HiRes' => 0,
'Time::Local' => 0,
'Errno' => 0,
},
# In case MIN_MAKEMAKER_VERSION is greater than the version bundled in the core of MIN_PERL_VERSION
# use this to ensure CPAN will automatically upgrade MakeMaker if needed
'BUILD_REQUIRES' => {
'ExtUtils::MakeMaker' => MIN_MAKEMAKER_VERSION,
},
'CONFIGURE_REQUIRES' => {
'ExtUtils::MakeMaker' => MIN_MAKEMAKER_VERSION,
},
# The modules that are not core that are used in default tests
'TEST_REQUIRES' => {
'Devel::Cycle' => 0,
'Test::More' => 0,
'Text::Diff' => 0,
'Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict' => 0,
'Perl::Critic::Policy::Perlsecret' => 0,
},
'dist' => {
COMPRESS => 'gzip -9f',
SUFFIX => 'gz',
SUFFIX => '.gz',
TARFLAGS => 'cf',
DIST_DEFAULT => 'tardist',
CI => 'svn commit',
@ -228,12 +240,8 @@ my %makefile = (
'rules/*.pm',
# don't remove these. they are built from 'rulesrc' in SVN, but
# in a distribution tarball, they're not
# 'rules/70_sandbox.cf',
# 'rules/72_active.cf',
# this file is no longer built, or used
'rules/70_sandbox.cf',
'rules/72_active.cf',
'rules/70_inactive.cf',
)
@ -246,6 +254,7 @@ my %makefile = (
# We have only this Makefile.PL and this option keeps MakeMaker from
# asking all questions twice after a 'make dist*'.
'NORECURS' => 1,
'MIN_PERL_VERSION'=> 5.014000,
);
# rules/72_active.cf is built from "rulesrc", but *must* exist before
@ -254,14 +263,6 @@ my @FILES_THAT_MUST_EXIST = qw(
rules/72_active.cf
);
# make sure certain optional modules are up-to-date if they are installed
# also see PREREQ_PM above
my %CURRENT_PM = (
'Net::DNS' => (RUNNING_ON_WINDOWS ? 0.46 : 0.34),
'Razor2::Client::Agent' => 2.40,
);
# All the $(*MAN1*) stuff is empty/zero if Perl was Configured with -Dman1dir=none;
# however, support site/vendor man1 dirs (bug 5338)
unless($Config{installman1dir}
@ -324,11 +325,13 @@ See that file if you wish to customize what tests are run, and how.
# check optional module versions
use lib 'lib';
use Mail::SpamAssassin::Util::DependencyInfo;
if (Mail::SpamAssassin::Util::DependencyInfo::long_diagnostics() != 0) {
# missing required module? die!
# bug 5908: http://cpantest.grango.org/wiki/CPANAuthorNotes says
# we should exit with a status of 0, but without creating Makefile
require Mail::SpamAssassin::Util::DependencyInfo;
if (Mail::SpamAssassin::Util::DependencyInfo::long_diagnostics(1) != 0) {
# This prints a full report of missing required and optional modules and binaries
# but only exit 0 without creating Makefile if there are missing required binaries.
# See http://cpantest.grango.org/wiki/CPANAuthorNotes
# Continuing when there are missing required CPAN modules allows cpan to install them
# before it runs make on the Makefile
exit 0;
}
@ -362,26 +365,45 @@ $makefile{META_MERGE} = {
MailingList => 'http://wiki.apache.org/spamassassin/MailingLists',
},
prereqs => {
runtime => {
recommends => {
'MIME::Base64' => 0,
'DB_File' => 0,
'Net::SMTP' => 0,
'Net::LibIDN2' => 0,
'Net::LibIDN' => 0,
'Mail::SPF' => 0,
'MaxMind::DB::Reader' => 0,
'MaxMind::DB::Reader::XS' => 0,
'Geo::IP' => 0,
'IP::Country::DB_File' => 0,
'IP::Country::Fast' => 0,
'Razor2::Client::Agent' => 2.61,
'Net::Ident' => 0,
'IO::Socket::IP' => 0.09,
'IO::Socket::INET6' => 0,
'IO::Socket::SSL' => 1.76,
'Compress::Zlib' => 0,
'Mail::DKIM' => 0.37,
'DBI' => 0,
'Getopt::Long' => 2.32,
'DBD::SQLite' => 1.59_01,
'LWP::UserAgent' => 0,
'HTTP::Date' => 0,
'Archive::Tar' => 1.23,
'IO::Zlib' => 1.04,
'Encode::Detect' => 0
}
'Encode::Detect::Detector' => 0,
'Net::Patricia' => 1.16,
'Net::CIDR::Lite' => 0,
'BSD::Resource' => 0,
'Archive::Zip' => 0,
'IO::String' => 0,
'Email::Address::XS' => 0,
'Mail::DMARC' => 0,
},
},
test => {
recommends => {
'Net::DNS::Nameserver' => 0,
},
},
},
};
#######################################################################
@ -390,7 +412,7 @@ $makefile{META_MERGE} = {
$makefile{EXE_FILES} = [ values %{$makefile{EXE_FILES}} ];
$makefile{AUTHOR} =~ s/(<.+) at (.+>)/$1\@$2/;
WriteMakefile(%makefile);
print "Makefile written by ExtUtils::MakeMaker $mm_version\n";
print "Makefile written by ExtUtils::MakeMaker $ExtUtils::MakeMaker::VERSION\n";
#######################################################################
@ -795,7 +817,14 @@ sub _set_macro_PERL_TAINT {
sub _set_macro_PREPROCESS {
return if get_macro('PREPROCESS');
set_macro('PREPROCESS', join(' ', macro_ref('PERL_BIN'), qq{build/preprocessor}));
# Bug 8038 - work around quirk of newer Extutils::MakeMaker on Windows with dmake
my $perl_bin = get_expanded_macro('FULLPERL');
if ($RUNNING_ON_WINDOWS and ($::Config{make} eq 'dmake') and ($perl_bin =~ /^([a-zA-Z]:)?\\"(.*)$/)) {
$perl_bin = "\"$1\\$2";
} else {
$perl_bin = macro_ref('PERL_BIN');
}
set_macro('PREPROCESS', join(' ', $perl_bin, qq{build/preprocessor}));
}
# This routine sets the value for CONFIGURE (spamc only).
@ -1131,6 +1160,7 @@ conf__install:
$(PERL) -MFile::Copy -e "copy(q[rules/v341.pre], q[$(B_CONFDIR)/v341.pre]) unless -f q[$(B_CONFDIR)/v341.pre]"
$(PERL) -MFile::Copy -e "copy(q[rules/v342.pre], q[$(B_CONFDIR)/v342.pre]) unless -f q[$(B_CONFDIR)/v342.pre]"
$(PERL) -MFile::Copy -e "copy(q[rules/v343.pre], q[$(B_CONFDIR)/v343.pre]) unless -f q[$(B_CONFDIR)/v343.pre]"
$(PERL) -MFile::Copy -e "copy(q[rules/v400.pre], q[$(B_CONFDIR)/v400.pre]) unless -f q[$(B_CONFDIR)/v400.pre]"
data__install:
-$(MKPATH) $(B_DATADIR)

View File

@ -289,25 +289,25 @@ Resources
---------
[BUGZILLA] SpamAssassin bug database:
<http://issues.apache.org/SpamAssassin/>
<https://issues.apache.org/SpamAssassin/>
[DEBPERL] Debian Perl Policy, Chapter 3: Packaged Modules:
<http://www.debian.org/doc/packaging-manuals/perl-policy/ch-module_packages.html>
<https://www.debian.org/doc/packaging-manuals/perl-policy/ch-module_packages.html>
[GNUMAKECMD] GNU make manual: Make Conventions: Variables for Specifying
Commands
<http://www.gnu.org/manual/make-3.79.1/html_chapter/make_14.html#SEC119>
<https://www.gnu.org/software/make/manual/html_node/Command-Variables.html#Command-Variables>
[MANEUMM616] The man page for ExtUtils::MakeMaker 6.16:
<http://search.cpan.org/author/MSCHWERN/ExtUtils-MakeMaker-6.16/lib/ExtUtils/MakeMaker.pm#Default_Makefile_Behaviour>
<https://search.cpan.org/author/MSCHWERN/ExtUtils-MakeMaker-6.16/lib/ExtUtils/MakeMaker.pm#Default_Makefile_Behaviour>
[MM00779] makemaker-at-perl-dot-org: Michael G Schwern: "Re: MakeMaker
problems with relocation" (PREFIX was broken):
<http://www.mail-archive.com/makemaker@perl.org/msg00779.html>
<https://www.mail-archive.com/makemaker@perl.org/msg00779.html>
[P5P94113] perl5-porters: Michael G Schwern: "Re: OS X's vendorlib default
seems wrong" (description of different repositoreis):
<http://archive.develooper.com/perl5-porters@perl.org/msg94113.html>
<https://archive.develooper.com/perl5-porters@perl.org/msg94113.html>
[RHBUG78053] Red Hat bug 78053: "incompatible changes in behavior of
MakeMaker; affects rpm build process" (introduction of DESTDIR):

View File

@ -14,7 +14,7 @@ filtering, DNS blocklists, and collaborative filtering databases.
Apache SpamAssassin is a project of the Apache Software Foundation (ASF).
What Apache SpamAssassin is Not
What Apache SpamAssassin Is Not
-------------------------------
Apache SpamAssassin is not a program to delete spam, route spam and ham to
@ -71,9 +71,9 @@ spamassassin-users mailing list[2]. If you've found a bug (and you're
sure it's a bug after checking the Wiki), please file a report in our
Bugzilla[3].
[1]: http://wiki.apache.org/spamassassin/
[2]: http://wiki.apache.org/spamassassin/MailingLists
[3]: http://issues.apache.org/SpamAssassin/
[1]: https://wiki.apache.org/spamassassin/
[2]: https://wiki.apache.org/spamassassin/MailingLists
[3]: https://issues.apache.org/SpamAssassin/
Please also be sure to read the man pages.
@ -151,7 +151,7 @@ default locations that Apache SpamAssassin will look at the end.
- $USER_HOME/.spamassassin:
User state directory. Used to hold spamassassin state, such
as a per-user automatic whitelist, and the user's preferences
as a per-user automatic welcomelist, and the user's preferences
file.
- $USER_HOME/.spamassassin/user_prefs:
@ -298,7 +298,7 @@ these translations, so that they can be added to the
distribution. Please file a bug in our Bugzilla[4], and attach your
translations. You will, of course, be credited for this work!
[4]: http://issues.apache.org/SpamAssassin/
[4]: https://issues.apache.org/SpamAssassin/
Disabled code

View File

@ -1,607 +1,259 @@
Note for Users Upgrading to SpamAssassin 3.4.5
Note for Users Upgrading to SpamAssassin 4.0.0
----------------------------------------------
- Spamassassin test suite can now run against the installed
SpamAssassin files (rather than those in the source directory)
Apache SpamAssassin 4.0.0 represents years of work by the project with
numerous improvements, new rule types, and internal native handling
of messages in international languages. We highly recommend looking
through this file and all of the .pre files to evaluate your
configuration thoroughly. Plugins have been added, removed, and
improved throughout.
- All rules, functions, command line options and modules that contain
"whitelist" or "blacklist" have been renamed to contain more
racially neutral "welcomelist" and "blocklist" terms. This allows
acronyms like WL and BL to remain the same. Previous options will
continue work at least until version 4.1.0 is released. If you have
local settings including scores or meta rules referring to old rule
names, these should be changed and "enable_compat
welcomelist_blocklist" added in init.pre. See:
https://wiki.apache.org/spamassassin/WelcomelistBlocklist (Bug 7826)
- Meta rules no longer use priority values, they are evaluated
dynamically when the rules they depend on are finished. (Bug 7735)
- API: New $pms->rule_ready() function. Any asynchronous eval-function
must now return undef (instead of 0 or 1), if rule result is not
ready when exiting the function. $pms->rule_ready($rulename) or
$pms->got_hit(...) must be called when the result has arrived. If
these are not used, it can break depending meta rule evaluation.
- Setting normalize_charset is now enabled by default. Note that rules
should not expect specific non-UTF8 or UTF8 encoding in
body. Matching is done against the raw data which may vary depending
on normalize_charset setting and whether decoding to UTF8 was
successful. See:
https://wiki.apache.org/spamassassin/WritingRulesAdvanced
- DKIM plugin has added support for ARC signature verification
- The DecodeShortURL plugin has been added and decodes URIs from URL
shorteners that may be used to evade scanning
- Strings can now be captured from rules and later reused using the
special %{TAGNAME} syntax
- The Bayes stopwords, or noise words, are now configurable in order
to optimize Bayes usage for non-English languages. Stopwords for 16
foreign languages have been included. See 60_bayes_stopwords.cf in
the rules files. See Mail::SpamAssassin::Plugin::Bayes and the
bayes_stopword_languages option if you wish to use a different
stopword list. This is highly recommended if you are using Bayes and
you are processing messages in languages other than English.
- The OLEVBMacro plugin has been improved to identify more macros
while also extracting uris from the attachments for automatic
inclusion in RBL lookups
- Internationalized domain name (IDN) support has been added and
requires Net::LibIDN2 or Net::LibIDN module with a new
Util::idn_to_ascii() function. (Bug 7215)
- Improved internal header address (From/To/Cc) parser, now also
handles multiple addresses and includes optional support for
external Email::Address::XS parser, which can handle nested comments
and other oddities.
- Header :addr :name modifiers now return all addresses. Options of
:first :last select only first (topmost) or last header to process
when there are multiple headers with the same name. :addr and :name
may still return multiple values from a single header.
- API: $pms->get() can and should now be called in list
context. Scalar context continues to return multiple values newline
separated, but this should be considered deprecated.
- New ExtractText plugin that extracts text from documents or images
to feed the data into SpamAssassin for standard processing with
existing rules, URIs extracted from documents will fall into normal
RBL lookups.
- New "nolog" tflag added to hide info coming from rules in
SpamAssassin reports
- All log output (stderr, file, syslog) is now escaped properly for \r
\n \t \\, control chars, DEL, and UTF-8 sequences presented as
\x{XX}. Whitespace is not normalized anymore like in versions prior
to 4.0.0.
- API: Logger::add() has new optional 'escape' parameter. New
Logger::escape_str() function.
- API: New $pms->add_uri_detail_list() function. Also new
uri_detail_list types: unlinked, schemeless
- Util::split_domain, trim_domain, and is_domain_valid functions have
a new optional argument ($is_ascii)
- Header names support new :host :domain :ip :revip modifiers
- AskDNS: tag HEADER(hdrname) supported to query any header content
similarly to header rules
- The HashCash module and support has been removed completely, as it
has been long since deprecated
- URILocalBL: uri_block_cc/uri_block_cont now support negation (Bug
7528)
- URILocalBL: IPv6 lookups for hosts is now support, if provided by
your database
- DNS and other asynchronous lookups such as Pyzor and DCC are now
only launched when priority -100 is reached. This allows short
circuiting at a lower priority without sending unneeded DNS queries
and starting process forms. (Bug 5930)
- API: New plugin method callback method check_dnsbl added to launch
network lookups at priority -100 and check_post_dnsbl to harvest own
network lookups
- API: New plugin callback method check_cleanup for cleaning up
things...
- FreeMail: new options freemail_import_welcomelist_auth and
freemail_import_def_welcomelist_auth added (Bug 6451)
- New internal Mail::SpamAssassin::GeoDB module that provides a
unified interface to modules MaxMind::DB::Reader (GeoIP2), Geo::IP,
IP::Country::DB_File, and IP::Country::Fast.
This is utilized by RelayCountry and URILocalBL with settings
geodb_module, geodb_options, and geodb_search_path.
Deprecated settings still work such as country_db_type,
country_db_path, uri_country_db_path, and uri_country_db_isp_path
but will print a warning to migrate to geodb_module/options.
- Razor2 razor_fork option added to create separate Razor2 processes
and read in the results later asynchronously, increasing throughput,
and automatically adjusting rule priorities to -100.
- DCC checks are now done asynchronously if using dccifd, improving
throughput. With dccifd, rule priorities are automatically adjusted
to -100. Commercial reputation rules can be ignored with the option
"use_dcc_rep 0" to save a few CPU cycles.
- Pyzor pyzor_fork option added to create separate Pyzor processes and
read in the results later asynchronously, increasing throughput, and
automatically adjusting rule priorities to -100. Renamed pyzor_max
setting to pyzor_count_min. Added pyzor_welcomelist_min and
pyzor_welcomelist_factor setting. Also try to improve false
positives by ignoring "empty body" messages.
- API: deprecated $pms->register_async_rule_start() and
$pms->register_async_rule_finish() calls though left in for
backwards compatibility. Plugins should only use
$pms->bgsend_and_start_lookup(), which handles required things
Automatically. Direct calls to bgsend or start_lookup should not be
used. $pms->bgsend_and_start_lookup() should always contain
$ent->{rulename} for correct meta dependency handling. Deprecated
start_lookup, get_lookup, lookup_ns, harvest_until_rule_completes,
and is_rule_complete.
- SPF: Mail::SPF is now the only supported perl module and
Mail::SPF::Query is deprecated along with the settings
do_not_use_mail_spf, and do_not_use_mail_spf_query. SPF lookups are
not done asynchronously so using an MTA filter such as pypolicyd-spf
or spf-engine can generate Received-SPF for SpamAssassin to parse.
- unwhitelist_auth now also removes def_whitelist_auth entries
- "ALL" pseudo-header now returns decoded headers, so it's usage is
consistent with single header matching. Using the :raw option mimics
the previous behavior of with undecoded and folded headers.
- SPF: add unwhitelist_from_spf to remove both whitelist_from_spf and
def_whitelist_from_spf entries
- New dns_block_rule option handles blocked DNSBLs (Bug 6728)
- Default SQL schema for userpref.value changed from varchar(100) to
varchar(255), no need to modify unless you hit the limit. (Bug 7803)
- ASN: Support GeoDB for ASN lookups (asn_use_geodb, asn_prefer_geodb,
asn_use_dns).
- URIDetail can now match full hostname with "host" key
- ASN: Default sa-update ruleset doesn't make ASN lookups or add
headers anymore. Configure desired methods, asn_use_geodb or
asn_use_dns, and add_header clauses manually as described in the
plugin documentation. Usage of asn_use_geodb without DNS is
recommended unless ASNCIDR is needed. Do not use rules that check
metadata X-ASN header! Only the new eval function check_asn()
described in plugin manual works reliably.
- BodyEval: plaintext_body_sig_ratio: eval rules for the (first text/plain
MIME part's) body and signature lengths and ratio
Note for Users Upgrading to SpamAssassin 3.4.4
----------------------------------------------
- FromNameSpoof: fns_extrachars parameter default value has been increased to 50
- nosubject and maxhits tflags now work correctly with sa-compile
Note for Users Upgrading to SpamAssassin 3.4.3
----------------------------------------------
- New subjprefix keyword added, this can be used to add a prefix to
email Subject if the original email matches a particular rule
- New Util::is_fqdn_valid() function to validate hostname (DNS name) format
(Bug 7736). To check if a name contains valid TLD, it's still needed to
additionally use RegistryBoundaries::is_domain_valid()
- New OLEVBMacro plugin to detect OLE Macro inside documents attached to emails,
this plugin requires Archive::Zip and IO::String Perl modules to work.
- Due to the dangerous nature of sa-update --allowplugins option, it
now prints a warning that --reallyallowplugins is required to use it.
This is to make sure all the legacy installations and wiki guides etc
still using it needlessly get fixed.
- TxRep and Awl plugins has been modified to be compatible
with latest Postgresql versions.
You should upgrade your sql database running the following command:
MySQL:
"ALTER TABLE `txrep` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
"ALTER TABLE `awl` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
"ALTER TABLE `awl` ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;"
PostgreSQL:
"ALTER TABLE txrep RENAME COLUMN count TO msgcount;"
"ALTER TABLE awl RENAME COLUMN count TO msgcount;"
"ALTER TABLE awl ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP;"
- body_part_scan_size 50000, rawbody_part_scan_size 500000 defaults added (Bug 6582)
These enable safer and faster scanning of large emails.
- ALL pseudo-header now returns decoded headers, so it's usage is consistent
with single header matching (:raw returns undecoded and folded like before).
- RegistryBoundaries did not load 20_aux_tlds.cf properly in older versions.
Old hardcoded list is now removed and RB will print "no tlds defined, need
to run sa-update" unless it can find list from config files.
- Deprecated functions: Parser::is_delimited_regexp_valid(),
Parser::is_regexp_valid(), Util::regexp_remove_delimiters(),
Util::make_qr(). These all are combined into new Util::compile_regexp().
- DNSEval: add check_rbl_headers to check specific headers in rbl
- DNSEval: add check_rbl_ns_from to check against an rbl for dns servers
- HashBL: Add check_hashbl_bodyre, check_hashbl_emails, check_hashbl_uris,
hashbl_ignore
- ASN: Support IPv6 with asn_lookup_ipv6 (Bug 7211)
- sa-update: New option --httputil to force used download utility
- Add rules_matching() expression to meta rules
- Add tflags domains_only/ips_only to DNSEval.pm functions
- RelayCountry: Added new metadata: X-Spam-Countries-External (_RELAYCOUNTRYEXT_),
X-Spam-Countries-Auth (_RELAYCOUNTRYAUTH_), X-Spam-Countries-All (_RELAYCOUNTRYALL_)
- New tflag "nosubject" for 'body' rules, to stop matching the Subject
header which is part of the body text.
Note for Users Upgrading to SpamAssassin 3.4.2
----------------------------------------------
- We now support SHA-512 and SHA-256 signatures for our rules updates.
- We may stop producing SHA-1 signatures in the near future so upgrading
to 3.4.2 is important. sa-update no longer uses these signatures.
See https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7614
- freemail_import_whitelist_auth, freemail_import_def_whitelist_auth added (Bug 6451)
New plugins
-----------
There are four new plugins added with this release:
Mail::SpamAssassin::Plugin::HashBL
The HashBL plugin is the interface to The Email Blocklist (EBL).
The EBL is intended to filter spam that is sent from IP addresses
and domains that cannot be blocked without causing significant
numbers of false positives.
Mail::SpamAssassin::Plugin::ResourceLimits
This plugin leverages BSD::Resource to assure your spamd child processes
do not exceed specified CPU or memory limit. If this happens, the child
process will die. See the BSD::Resource for more details.
Mail::SpamAssassin::Plugin::FromNameSpoof
This plugin allows for detection of the From:name field being used to mislead
recipients into thinking an email is from another address. The man page
includes examples and we expect to put test rules for this plugin into
rulesrc soon!
Mail::SpamAssassin::Plugin::Phishing
This plugin finds uris used in phishing campaigns detected by
OpenPhish (https://openphish.com) or PhishTank (https://phishtank.com) feeds.
These plugins are disabled by default. To enable, uncomment
the loadplugin configuration options in file v342.pre, or add it to
some local .pre file such as local.pre .
Notable changes
---------------
For security reasons SSLv3 support has been removed from spamc(1).
GeoIP2 support has been added to RelayCountry and URILocalBL plugins due
to GeoIP legacy api deprecations.
New configuration options
-------------------------
A new template tag _DKIMSELECTOR_ that maps to the DKIM selector (the 's' tag)
from valid signatures has been added.
A 'uri_block_cont' option to URILocalBL plugin to score uris per continent has been added.
Possible continent codes are:
af, as, eu, na, oc, sa for Africa, Asia, Europe, North America,
Oceania and South America.
The 'country_db_type' and 'country_db_path' options has been added to be able
to choose in RelayCountry plugin between GeoIP legacy
(discontinued from 04/01/2018), GeoIP2, IP::Country::Fast
and IP::Country::DB_File.
GeoIP legacy is still the default option but it will be deprecated
in future releases.
A config option 'uri_country_db_path' has been added to be able to choose
in URILocalBL plugin between GeoIP legacy and new GeoIP2 api.
A config option 'resource_limit_cpu' (default: 0 or no limit) has been added
to configure how many cpu cycles are allowed on a child process before it dies.
A config option 'resource_limit_mem' (default: 0 or no limit) has been added
to configure the maximum number of bytes of memory allowed both for
(virtual) address space bytes and resident set size.
A new config option 'report_wrap_width' (default: 70) has been added
to set the wrap width for description lines in the X-Spam-Report header.
Notable Internal changes
------------------------
SpamAssassin can cope with new Net::DNS module versions.
The "bytes" pragma has been remove from both core modules and plugins for
better utf-8 compatibility, there has been also some other utf-8 related fixes.
The spamc(1) client can now be build against OpenSSL 1.1.0.
The test framework has been switched to Test::More module.
Other updates
-------------
A list of top-level domains in registrar boundaries was updated.
Note for Users Upgrading to SpamAssassin 3.4.1
----------------------------------------------
- The TxRep plugin is now included and disabled by default for new installs.
To replace an existing AWL configuration with TxRep, follow the steps below:
- Disable AWL
- Enable TxRep
- Set txrep_factor equal to your previous AWL factor
- Set use_txrep to 1
For more detailed information and more configuration options, consult the
documentation in Mail::SpamAssassin::Plugin::TxRep.
- The $VALID_TLDS_RE global in registrar boundaries is deprecated but kept for
third-party plugin compatibility. It will become increasingly out of date
and may be removed in a future release.
See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
to obtain a valid list of TLDs, i.e.
$self->{main}->{registryboundaries}->{valid_tlds_re}
- Mail::SpamAssassin::Util::RegistrarBoundaries is being replaced by
Mail::SpamAssassin::RegistryBoundaries so that new TLDs can be updated via
20_aux_tlds.cf delivered via sa-update.
***3rd Party Plugin Authors, Please Note***
The following functions will be removed in the next release after 3.4.1
excepting any emergency break/fix releases immediately after 3.4.1:
Mail::SpamAssassin::Util::RegistrarBoundaries::is_domain_valid
Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain
Mail::SpamAssassin::Util::RegistrarBoundaries::split_domain
Mail::SpamAssassin::Util::uri_to_domain
And the following variables will also be removed:
Mail::SpamAssassin::Util::RegistrarBoundaries::US_STATES
Mail::SpamAssassin::Util::RegistrarBoundaries::THREE_LEVEL_DOMAINS
Mail::SpamAssassin::Util::RegistrarBoundaries::TWO_LEVEL_DOMAINS
Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS_RE
Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS
This change should only affect 3rd party plugin authors who will need to
update their code to utilize Mail::SpamAssassin::RegistryBoundaries
instead of the functions and variables in
Mail::SpamAssassin::Util::RegistrarBoundaries and the function
Mail::SpamAssassin::Util::uri_to_domain which are deprecated and will be
removed.
For example, the $VALID_TLDS_RE global in registrar boundaries is
deprecated but kept for third-party plugin compatibility. It will become
increasingly out of date and may be removed in a future release.
See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
to obtain a valid list of TLDs, i.e.
$self->{main}->{registryboundaries}->{valid_tlds_re}
- It is now recommended that users uncomment "normalize_charset 1" in
local.cf. It will break rules that depend on messages being in non-UTF8
encodings, but going forward this will enable more robust unicode rules and
will eventually become the default.
Note for Users Upgrading to SpamAssassin 3.4.0
----------------------------------------------
- When Bayes classification is in use and messages are 'learned' as spam
or ham and stored in a database, the Bayes plugin generates internal
message IDs of learned messages and stores them in a 'seen' database to
avoid re-learning duplicates and accidental un-learning messages that
were not previously learned. With changes in bug 5185, the calculation
of message IDs in a bayes 'seen' database has changed, so new code can
no longer associate new messages with those learned before the change.
- Note that this change does not affect recognition of old tokens and the
classification algorithm, only duplicate detection and unlearning of old
messages is affected.
- Because of this change, if you use Bayes and you are upgrading from a
version prior to 3.4.0, you may consider wiping your Bayes database
and starting fresh.
- There is a new optional dependency on Net::Patricia to speed up lookups
on internal_networks, trusted_networks or msa_networks when these lists
contain a larger number of entries. Consider installing this module to
speed up classification.
- The minimal required version of NetAddr::IP was bumped to 4.010
- In addition to existing backends, the 3.4.0 introduces support for keeping
a Bayes database on a Redis server, either running locally, or accessed
over network. Similar to SQL backends, the database may be concurrently
used by several hosts running SpamAssassin.
- For more detail on these and other changes, please see the Announcement
file at:
http://svn.apache.org/repos/asf/spamassassin/branches/3.4/build/announcements/3.4.0.txt
Note for Users Upgrading to SpamAssassin 3.3.0
-----------------------------------------------
- Rules are no longer included with SpamAssassin "out of the box". You will
need to immediately run "sa-update", or download the additional rules .tgz
package and run "sa-update --install" with it, to get a ruleset.
- The BETA label has been taken off of the SpamAssassin SQL support. Please
be aware that occasional changes may still be made to this area of the
code. You should be sure to read this upgrade document each time you
install a new version to determine if any SQL updates need to be made to
your local installation.
- The DKIM plugin is now enabled by default for new installs, if the perl
module Mail::DKIM is installed. However, installation of SpamAssassin
will not overwrite existing .pre configuration files, so to use DKIM when
upgrading from a previous release that did not use DKIM, a directive:
loadplugin Mail::SpamAssassin::Plugin::DKIM
will need to be uncommented in file "v312.pre", or added to some
other .pre file, such as local.pre.
Note for Users Upgrading to SpamAssassin 3.2.0
-----------------------------------------------
- The "127/8" network, including 127.0.0.1, is now always implicitly part of
"trusted_networks" and "internal_networks". It's impossible to remove these
from the trusted/internal sets, since if you cannot trust the host where
SpamAssassin is running, you cannot trust SpamAssassin itself. If you
previously had "trusted_networks" and "internal_networks" lines listing those
hosts, you should now remove them, otherwise a minor (non-lint-error) warning
will be output.
- For ISPs -- version 3.2.0 now includes a new way to specify Mail Submission
Agents, relay hosts which accept mail from your own users and authenticates
them appropriately. See the Mail::SpamAssassin::Conf manual page for the
"msa_networks" setting.
Note for Users Upgrading to SpamAssassin 3.1.0
-----------------------------------------------
- A significant amount of core functionality has been moved into
plugins. These include, AWL (auto-whitelist), DCC, Pyzor, Razor2,
SpamCop reporting and TextCat. For information on configuring these
plugins please refer to their individual documentation:
perldoc Mail::SpamAssassin::Plugin::* (ie AWL, DCC, etc)
- There are now multiple files read to enable plugins in the
/etc/mail/spamassassin directory; previously only one, "init.pre" was
read. Now both "init.pre", "v310.pre", and any other files ending
in ".pre" will be read. As future releases are made, new plugins
will be added to new files named according to the release they're
added in.
- Due to license restrictions the DCC and Razor2 plugins are disabled
by default. We encourage you to read the appropriate license
yourself and decide if you are able to re-enable the plugins for
your site.
- The use_auto_whitelist config option has been moved to a user config
option, this allows individual users to turn on or off whitelisting
regardless of what is set in the system config. If you would like to
disable AWL (auto-whitelist) on a site-wide basis, then you can comment
out the plugin in "v310.pre".
- The bayes_auto_learn_threshold_* config options for bayes have moved
to a plugin. In general the default should work just fine however
if you are interested in changing these values you should see:
perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
- The AWL support for NDBM_File databases has been dropped, due to a
bug in that package which renders it useless (bug 4353). It appears
that SDBM_File, the package which will be used instead, is fully
compatible with NDBM however, so this should be unnoticeable.
- The prefork algorithm for spamd has been changed. In this version spamd
will attempt to keep a small number of "hot" child processes as busy as
possible, and keep any others as idle as possible, using something
similar to the Apache httpd server scaling algorithm. This reduces
memory usage and swapping. You can use the --round-robin switch for
spamd to disable this scaling algorithm, and the behaviour seen in the
3.0.x versions will be used instead, where all processes receive an
equal load and no scaling takes place.
- As of 3.1.0, in addition to the generic BayesSQL support (via
Mail::SpamAssassin::BayesStore::SQL) usable by multiple database
drivers there is now specific support for MySQL 4.1+ and
PostgreSQL. This support is based on non-standard features present
in both database servers that allow for various performance boosts.
If you were using the previous BayesSQL support with MySQL, and
already have MySQL 4.1+ installed you can begin using the new module
immediately by replacing the bayes_store_module line in your
configuration with: Mail::SpamAssassin::BayesStore::MySQL
We do however recommend that you switch your MySQL tables over to
InnoDB for better data integrity and multi user support. You can
most often do this via a simple ALTER TABLE command, refer to the
MySQL documentation for more information.
If you were previously using PostgreSQL for your bayes database then
we STRONGLY recommend switching to the PostgreSQL specific backend:
Mail::SpamAssassin::BayesStore::PgSQL
To switch to this backend you should first run sa-learn --backup to
make a backup of your existing data and then drop and recreate the
database following the instructions in sql/README.bayes. Then you
can restore the database with sa-learn --restore. If you have
multiple users then you will have to run --backup and --restore for
each user to fully restore the database.
- See http://wiki.apache.org/spamassassin/UpgradeTo310 for a
supplementary list of upgrade notes. It will be updated with any
upgrade notes not captured in this document.
- Finally, this document is likely not complete. Other configuration
options/arguments may have changed from older versions, etc. It would
be good to double-check any custom configuration options to make sure
they're still valid. This could be as simple as running "spamassassin
--lint", or more complex, as required by the environment.
Note for Users Upgrading to SpamAssassin 3.0.x
----------------------------------------------
- The SpamAssassin 2.6x release series was the last set of releases to
officially support perl versions earlier than perl 5.6.1. If you are
using an earlier version of perl, you will need to upgrade before you
can use the 3.0.0 version of SpamAssassin. You will also want to make
sure that you have the appropriate versions of required and optional
modules as they may have changed from old versions. The INSTALL
document has the modules and version requirements listed.
- See http://wiki.apache.org/spamassassin/UpgradeTo300 for a
supplementary list of upgrade notes. It will be updated with any
upgrade notes not captured in this document.
- SpamAssassin 3.0.0 has a significantly different API (Application Program
Interface) from the 2.x series of code. This means that if you use
SpamAssassin through a third-party utility (milter, etc,) you need to make
sure you have an updated version which supports 3.0.0. See
http://wiki.apache.org/spamassassin/UpgradeTo300 for information about
third-party software.
- The --auto-whitelist, --whitelist and -a options for "spamd" and
"spamassassin" to turn on the auto-whitelist have been removed and
replaced by the "use_auto_whitelist" configuration option which is
also now turned on by default.
- The --virtual-config switch for spamd had to be dropped, due to licensing
issues. It is replaced by the --virtual-config-dir switch.
- The "rewrite_subject" and "subject_tag" configuration options were
deprecated and are now removed. Instead, using "rewrite_header Subject
[your desired setting]". e.g.
rewrite_subject 1
subject_tag ****SPAM(_SCORE_)****
becomes
rewrite_header Subject ****SPAM(_SCORE_)****
- The "sa-learn --rebuild" command has been deprecated; please use
"sa-learn --sync" instead. The --rebuild option will remain temporarily
for backward compatibility.
- The Bayesian storage modules have been completely re-written and now
include Berkeley DB (DBM) storage as well as SQL based storage (see
sql/README.bayes for more information). In addition, a new format
has been introduced for the bayes database that stores tokens in fixed
length hashes (Bayes v3). All DBM databases should be automatically
converted to this new format the first time they are opened for write.
You can manually perform the upgrade by running "sa-learn --sync"
from the command line.
Due to the database format change, you will want to do something like
this when upgrading:
- stop running spamassassin/spamd (ie: you don't want it to be running
during the upgrade)
- run "sa-learn --rebuild", this will sync your journal. if you skip
this step, any data from the journal will be lost when the DB is
upgraded.
- upgrade SA to 3.0.0
- run "sa-learn --sync", which will cause the db format to be upgraded.
if you want to see what is going on, you can add the "-D" option.
- test the new database by running some sample mails through
SpamAssassin, and/or at least running "sa-learn --dump" to make sure
the data looks valid.
- start running spamassassin/spamd again
If, instead of uprading your Bayes database, you want to wipe it and
start fresh, you can run "sa-learn --clear" to safely remove your
Bayes database files. If the --clear command issues an error then
you can simply delete the Bayes database files ("bayes_*") while SA
is not running; SpamAssassin will recreate them in the current
format when it runs.
- "spamd" now has a default max-children setting of 5; no more than 5
child scanner processes will be run in parallel. Previously, there was
no default limit unless you specified the "-m" switch when starting
spamd.
- If you are using a UNIX machine with all database files on local disks,
and no sharing of those databases across NFS filesystems, you can use a
more efficient, but non-NFS-safe, locking mechanism. Do this by adding
the line "lock_method flock" to the /etc/mail/spamassassin/local.cf
file. This is strongly recommended if you're not using NFS, as it is
much faster than the NFS-safe locker.
- Please note that the use of the following commandline parameters for
spamassassin and spamd have been deprecated and may be removed in
upcoming versions of SpamAssassin. Please discontinue usage of these
options:
in the 2.6x series: --add-from, --pipe, -F, --stop-at-threshold,
-S, -P (spamassassin only)
in the 3.0.x series: --auto-whitelist, -a, --whitelist-factory, -M,
--warning-from, -w, --log-to-mbox, -l
- user_scores_sql_table is no longer supported. If you need to use a table
name, other than the default, create a custom query using the
user_scores_sql_custom_query config option.
- SpamAssassin runs in "taint mode" by default for improved security.
Certain third-party modules may be incompatible with taint mode.
- 2.6x deprecated the use of the "check_bayes_db" script, and it
has been removed in 3.0.0. Please see the sa-learn man/pod
documentation for more info.
- Finally, this document is likely not complete. Other configuration
options/arguments may have changed from older versions, etc. It would
be good to double-check any custom configuration options to make sure
they're still valid. This could be as simple as running "spamassassin
--lint", or more complex, as required by the environment.
An example: "require_version <version>" hasn't changed itself, but the
internal version representation is now "x.yyyzzz" instead of "x.yz"
which could cause issues if "require_version 3.00" is expected to work
(it won't, it needs to be "require_version 3.000000").
Note for Users Upgrading from SpamAssassin 2.5x
-----------------------------------------------
- Due to major reliability shortcomings in the database support libraries
other than DB_File, we now require that the DB_File module be installed
to use SpamAssassin's Bayes rules.
SpamAssassin will still work without DB_File installed, but the Bayes
support will be disabled.
If you install DB_File and wish to import old Bayes database data, each
user with a Bayes db should run "sa-learn --import" to copy old entries
from the other formats into a new DB_File file.
Due to the database library change, and the change to the database
format itself, you will want to do something like this when upgrading:
- stop running spamassassin/spamd (ie: you don't want it to be running
during the upgrade)
- run "sa-learn --rebuild", this will sync your journal. if you skip
this step, any data from the journal will be lost when the DB is
upgraded.
- install DB_File module if necessary
- upgrade SA to 3.0.0
- if you were using another database module previously, run "sa-learn
--import" to migrate the data into new DB_File files
- run "sa-learn --sync", which will cause the db format to be upgraded.
if you want to see what is going on, you can add the "-D" option.
- test the new database by running some sample mails through
SpamAssassin, and/or at least running "sa-learn --dump" to make sure
the data looks valid.
- start running spamassassin/spamd again
Obviously the steps will be different depending on your environment, but
you get the idea. :)
Note For Users Upgrading From SpamAssassin 2.3x or 2.4x
-------------------------------------------------------
- SpamAssassin no longer includes code to handle local mail delivery, as
it was not reliable enough, compared to procmail. So now, if you relied
on spamassassin to write the mail into your mail folder, you'll have to
change your setup to use procmail as detailed below. If you used
spamassassin to filter your mail and then something else wrote it into a
folder for you, then you should be fine.
- Support for versions of the optional Mail::Audit module is no longer
included.
- The default mode of tagging (which used to be ***SPAM*** in the subject
line) no longer takes place. Instead the message is rewritten. If an
incoming message is tagged as spam, instead of modifying the original
message, SpamAssassin will create a new report message and attach the
original message as a message/rfc822 MIME part (ensuring the original
message is completely preserved and easier to recover). If you do not
want to modify the body of incoming spam, use the "report_safe" option.
The "report_header" and "defang_mime" options have been removed as a
result.
- sa-update: New --score-multiplier, --score-limit, and --forcemirror
options added.
#1 forcemirror: forces sa-update to use a specific mirror server,
#2 score-multiplier: adjust all scores from update channel by a
given multiplier to quickly level set scores to match your
preferred threshold
#3 score-limit adjusts all scores from update channel over a
specified limit to a new limit
- New dns_options "nov4" and "nov6" added. IMPORTANT:; You must set
nov6 if your DNS resolver is filtering IPv6 AAAA replies.
- API: Added Message::get_pristine_body_digest(),
Message::get_msgid(), and Message::generate_msgid()
functions. Removed deprecated private Plugin::Bayes::get_msgid()
function.
- Bayes and TxRep seen Message-ID tracking hashing method changed. No
actions are required. If re-learning some old messages, they might
be learned twice but old IDs should expire automatically.
- report_charset defaults now to UTF-8.
- Meta rules inherit net tflag setting from dependencies (Bug 7735)
- BodyEval: Added plaintext_body_sig_ratio eval rules for the first
text/plain MIME part's body and signature length ratio.
- API: Now supports multiple calls of $pms->test_log() for
rules. Added $pms->check_cleanup() to finalize tags, reports,
etc. Deprecated internal $pms->{test_log_msgs}, renamed to
$pms->{test_logs}. Deprecated $pms->clear_test_state() as it is not
needed anymore. $pms->test_log() now accepts $rulename as second
argument.
- URIDNSBL: urirhsbl/urirhssub rules support "notrim" tflag to force
querying the full hostname instead of just the domain. This works
best if the specific uribl supports this mode. (Bug 7835)
- Removed deprecated --auth-ident and --ident-timeout options from
spamd
- MIMEHeader: support matching ALL header, tflags range, and tflags
concat
- Autolearn: add new tflags autolearn_header/autolearn_body. These can
force a rule to count as header or body points accordingly. (Bug
7907)
- SSL client certificate support for spamc/spamd is now easier. New
spamc options --ssl-cert, --ssl-key, --ssl-ca-file, and
--ssl-ca-path. New spamd options --ssl-verify, --ssl-ca-file, and
--ssl-ca-path (Bug 7267)
- ArchiveIterator now automatically uncompressed all gzip, bzip2, xz,
lz4, lzip, and lzo-compressed files (Bug 7598). These apply to
spamassassin and sa-learn commands also.
- New DMARC policy check plugin.
- New project maintained DecodeShortURLs plugin which may not be
directly compatible with rules from other third party plugins. See
The plugin documentation for configuration and rule format.
- Installing module Net::CIDR::Lite allows the use of dash-separated
IP range format (e.g. 192.168.1.1-192.168.255.255) for NetSet tables
including internal_networks, trusted_networks, msa_networks, and
uri_local_cidr.
- The HashBL plugin in 342.pre is now enabled by default.
- HeaderEval check_for_unique_subject_id() function is deprecated.
(end of UPGRADE)
//vim:tw=74:

View File

@ -84,7 +84,7 @@ If you use procmail, or haven't decided on any of the above examples:
If you want to use SpamAssassin site-wide:
- take a look at the notes on the Wiki website, currently at
<http://wiki.apache.org/spamassassin/UsingSiteWide>. You will probably
<https://wiki.apache.org/spamassassin/UsingSiteWide>. You will probably
want to use 'spamd' (see below). You may want to investigate the
new Apache mod_perl module, in the 'spamd-apache2' directory, too.
@ -101,11 +101,11 @@ If you want to use SpamAssassin site-wide:
add the line 'DROPPRIVS=yes' at the top of the file.
The Auto-Whitelist
The Auto-Welcomelist
------------------
The auto-whitelist is enabled using the 'use_auto_whitelist' option.
(See http://wiki.apache.org/spamassassin/AutoWhitelist for details on
The auto-welcomelist is enabled using the 'use_auto_welcomelist' option.
(See https://wiki.apache.org/spamassassin/AutoWelcomelist for details on
how it works, if you're curious.)
@ -113,15 +113,6 @@ Other Installation Notes
------------------------
- Hashcash is a useful system; it requires that senders exercise a
CPU-intensive task before they can send mail to you, so we give that
some bonus points. However, it requires that you list what addresses
you expect to receive mail for, by adding 'hashcash_accept' lines to
your ~/.spamassassin/user_prefs or /etc/mail/spamassassin/local.cf
files. See the Mail::SpamAssassin::Plugin::Hashcash manual page for
details on how to specify these.
- SpamAssassin now uses a temporary file in /tmp (or $TMPDIR, if that's
set in the environment) for Pyzor and DCC checks. Make sure that this
directory is either (a) not writable by other users, or (b) not shared
@ -161,7 +152,7 @@ Other Installation Notes
- A very handy new feature is SPF support, which allows you to check
that the message sender is permitted by their domain to send from the
IP address used. This has the potential to greatly cut down on mail
forgery. (see http://spf.pobox.com/ for more details.)
forgery.
- MDaemon users should add this line to their "local.cf" file:
@ -219,7 +210,7 @@ Other Installation Notes
to do this, take a look here [1] for a simple forwarding-based
alternative.
[1]: http://wiki.apache.org/spamassassin/SpamTrapping
[1]: https://wiki.apache.org/spamassassin/SpamTrapping
- Scores and other user preferences can now be loaded from, and Bayes
@ -242,7 +233,7 @@ Other Installation Notes
- Lots more ways to integrate SpamAssassin can be read at
http://wiki.SpamAssassin.org/ .
https://wiki.apache.org/spamassassin/ .
(end of USAGE)

View File

@ -239,6 +239,7 @@ sub lint_rule_text {
my $pretext = q{
loadplugin Mail::SpamAssassin::Plugin::Check
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
util_rb_tld com # skip "need to run sa-update" warn
use_bayes 0
};

View File

@ -1,7 +1,7 @@
#!/usr/bin/perl
BEGIN {
require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
require Digest::SHA; Digest::SHA->import(qw(sha256_hex sha512_hex));
}
$/=undef;

View File

@ -1,7 +1,7 @@
#!/usr/bin/perl
BEGIN {
require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
require Digest::SHA; Digest::SHA->import(qw(sha256_hex sha512_hex));
}
$/=undef;

View File

@ -6,9 +6,9 @@ SpamAssassin can now load users' score files from an LDAP server. The concept
here is to have a web application (PHP/perl/ASP/etc.) that will allow users to
be able to update their local preferences on how SpamAssassin will filter their
e-mail. The most common use for a system like this would be for users to be
able to update the white list of addresses (whitelist_from) without the need
for them to update their $HOME/.spamassassin/user_prefs file. It is also quite
common for users listed in /etc/passwd to not have a home directory,
able to update the white list of addresses (welcomelist_from, previously whitelist_from)
without the need for them to update their $HOME/.spamassassin/user_prefs file.
It is also quite common for users listed in /etc/passwd to not have a home directory,
therefore, the only way to have their own local settings would be through a
database or LDAP server.
@ -38,8 +38,8 @@ Examples:
If the user_scores_dsn option does not exist, SpamAssassin will not attempt
to use an LDAP server for retrieving users' preferences. Note that this will
NOT look for test rules, only local scores, whitelist_from(s), and
required_score.
NOT look for test rules, only local scores, welcomelist_from(s) (previously whitelist_from),
and required_score.
Requirements
------------
@ -68,7 +68,7 @@ our inetOrgPerson subclass.
Here's an example for openldap's /etc/openldap/schema/inetorgperson.schema :
# SpamAssassin
# see http://SpamAssassin.org/ .
# see https://SpamAssassin.org/ .
attributetype ( 2.16.840.1.113730.3.1.217
NAME 'spamassassin'
DESC 'SpamAssassin user preferences settings'
@ -97,9 +97,6 @@ If you do not see the above text, then the LDAP query was not successful, and
you should see any error messages reported. <username> should be the user
that was passed to spamd and is usually the user executing spamc.
If you need to set up LDAP, a good guide is here:
http://yolinux.com/TUTORIALS/LinuxTutorialLDAP.html
To test LDAP support using the SpamAssassin test suite, you need to
perform a little bit of manual configuration first. See the file
"ldap/README.testing" for details.
@ -111,6 +108,6 @@ operation of LDAP support may change at any time with future releases of SA.
******
Please send any comments to <kris at koehntopp.de> and file bugs via
<http://issues.apache.org/SpamAssassin/>.
<https://issues.apache.org/SpamAssassin/>.
Kristian Köhntopp

View File

@ -40,7 +40,7 @@ Mail::SpamAssassin - Spam detector and markup engine
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using several methods
including text analysis, internet-based realtime blacklists, statistical
including text analysis, internet-based realtime blocklists, statistical
analysis, and internet-based hashing algorithms.
Using its rule base, it uses a wide range of heuristic tests on mail
@ -64,7 +64,7 @@ use warnings;
# use bytes;
use re 'taint';
require 5.006_001;
require v5.14.0;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants;
@ -87,11 +87,10 @@ use Time::HiRes qw(time);
use Cwd;
use Config;
our $VERSION = "3.004006"; # update after release (same format as perl $])
our $VERSION = "4.000000"; # update after release (same format as perl $])
#our $IS_DEVEL_BUILD = 1; # 1 for devel build
our $IS_DEVEL_BUILD = 0; # 0 for release versions including rc & pre releases
# Used during the prerelease/release-candidate part of the official release
# process. If you hacked up your SA, you should add a version_tag to your .cf
# files; this variable should not be modified.
@ -101,18 +100,18 @@ our @ISA = qw();
# SUB_VERSION is now just <yyyy>-<mm>-<dd>
our $SUB_VERSION = 'svnunknown';
if ('$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $' =~ ':') {
# Subversion keyword "$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $" has been successfully expanded.
if ('$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $' =~ ':') {
# Subversion keyword "$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $" has been successfully expanded.
# Doesn't happen with automated launchpad builds:
# https://bugs.launchpad.net/launchpad/+bug/780916
$SUB_VERSION = (split(/\s+/,'$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $ updated by SVN'))[1];
$SUB_VERSION = (split(/\s+/,'$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $ updated by SVN'))[1];
}
if (defined $IS_DEVEL_BUILD && $IS_DEVEL_BUILD) {
if ('$LastChangedRevision: 1888548 $' =~ ':') {
# Subversion keyword "$LastChangedRevision: 1888548 $" has been successfully expanded.
push(@EXTRA_VERSION, ('r' . qw{$LastChangedRevision: 1888548 $ updated by SVN}[1]));
if ('$LastChangedRevision: 1905971 $' =~ ':') {
# Subversion keyword "$LastChangedRevision: 1905971 $" has been successfully expanded.
push(@EXTRA_VERSION, ('r' . qw{$LastChangedRevision: 1905971 $ updated by SVN}[1]));
} else {
push(@EXTRA_VERSION, ('r' . 'svnunknown'));
}
@ -428,8 +427,8 @@ sub new {
$self->timer_enable();
}
$self->{conf} ||= new Mail::SpamAssassin::Conf ($self);
$self->{plugins} = Mail::SpamAssassin::PluginHandler->new ($self);
$self->{conf} ||= Mail::SpamAssassin::Conf->new($self);
$self->{plugins} = Mail::SpamAssassin::PluginHandler->new($self);
$self->{save_pattern_hits} ||= 0;
@ -469,7 +468,7 @@ sub create_locker {
# for slow but safe, by keeping in quotes
eval '
use Mail::SpamAssassin::Locker::'.$class.';
$self->{locker} = new Mail::SpamAssassin::Locker::'.$class.' ($self);
$self->{locker} = Mail::SpamAssassin::Locker::'.$class.'->new($self);
1;
' or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
@ -728,6 +727,9 @@ sub init_learner {
if (exists $opts->{$k}) { $self->{$v} = $opts->{$k}; }
}
# Set flag which can be checked from plugins etc
$self->{learning} = 1;
return \%ret;
}
@ -757,6 +759,7 @@ Finish learning.
sub finish_learner {
my $self = shift;
$self->{bayes_scanner}->force_close(1) if $self->{bayes_scanner};
delete $self->{learning};
1;
}
@ -832,7 +835,7 @@ sub signal_user_changed {
$self->{bayes_scanner}->finish() if $self->{bayes_scanner};
if ($self->{conf}->{use_bayes}) {
require Mail::SpamAssassin::Bayes;
$self->{bayes_scanner} = new Mail::SpamAssassin::Bayes ($self);
$self->{bayes_scanner} = Mail::SpamAssassin::Bayes->new($self);
} else {
delete $self->{bayes_scanner} if $self->{bayes_scanner};
}
@ -949,73 +952,84 @@ sub revoke_as_spam {
###########################################################################
=item $f->add_address_to_whitelist ($addr, $cli_p)
=item $f->add_address_to_welcomelist ($addr, $cli_p)
Previously add_address_to_whitelist which will work interchangeably until 4.1.
Given a string containing an email address, add it to the automatic
whitelist database.
welcomelist database.
If $cli_p is set then underlying plugin may give visual feedback on additions/failures.
=cut
sub add_address_to_whitelist {
sub add_address_to_welcomelist {
my ($self, $addr, $cli_p) = @_;
$self->call_plugins("whitelist_address", { address => $addr,
$self->call_plugins("welcomelist_address", { address => $addr,
cli_p => $cli_p });
}
*add_address_to_whitelist = \&add_address_to_welcomelist; # removed in 4.1
###########################################################################
=item $f->add_all_addresses_to_whitelist ($mail, $cli_p)
=item $f->add_all_addresses_to_welcomelist ($mail, $cli_p)
Previously add_all_addresses_to_whitelist which will work interchangeably until 4.1.
Given a mail message, find as many addresses in the usual headers (To, Cc, From
etc.), and the message body, and add them to the automatic whitelist database.
etc.), and the message body, and add them to the automatic welcomelist database.
If $cli_p is set then underlying plugin may give visual feedback on additions/failures.
=cut
sub add_all_addresses_to_whitelist {
sub add_all_addresses_to_welcomelist {
my ($self, $mail_obj, $cli_p) = @_;
foreach my $addr ($self->find_all_addrs_in_mail ($mail_obj)) {
$self->call_plugins("whitelist_address", { address => $addr,
$self->call_plugins("welcomelist_address", { address => $addr,
cli_p => $cli_p });
}
}
*add_all_addresses_to_whitelist = \&add_all_addresses_to_welcomelist; # removed in 4.1
###########################################################################
=item $f->remove_address_from_whitelist ($addr, $cli_p)
=item $f->remove_address_from_welcomelist ($addr, $cli_p)
Previously remove_address_from_whitelist which will work interchangeably until 4.1.
Given a string containing an email address, remove it from the automatic
whitelist database.
welcomelist database.
If $cli_p is set then underlying plugin may give visual feedback on additions/failures.
=cut
sub remove_address_from_whitelist {
sub remove_address_from_welcomelist {
my ($self, $addr, $cli_p) = @_;
$self->call_plugins("remove_address", { address => $addr,
cli_p => $cli_p });
}
*remove_address_from_whitelist = \&remove_address_from_welcomelist; # removed in 4.1
###########################################################################
=item $f->remove_all_addresses_from_whitelist ($mail, $cli_p)
=item $f->remove_all_addresses_from_welcomelist ($mail, $cli_p)
Previously remove_all_addresses_from_whitelist which will work interchangeably until 4.1.
Given a mail message, find as many addresses in the usual headers (To, Cc, From
etc.), and the message body, and remove them from the automatic whitelist
etc.), and the message body, and remove them from the automatic welcomelist
database.
If $cli_p is set then underlying plugin may give visual feedback on additions/failures.
=cut
sub remove_all_addresses_from_whitelist {
sub remove_all_addresses_from_welcomelist {
my ($self, $mail_obj, $cli_p) = @_;
foreach my $addr ($self->find_all_addrs_in_mail ($mail_obj)) {
@ -1023,30 +1037,36 @@ sub remove_all_addresses_from_whitelist {
cli_p => $cli_p });
}
}
*remove_all_addresses_from_whitelist = \&remove_all_addresses_from_welcomelist; # removed in 4.1
###########################################################################
=item $f->add_address_to_blacklist ($addr, $cli_p)
=item $f->add_address_to_blocklist ($addr, $cli_p)
Previously add_address_to_blacklist which will work interchangeably until 4.1.
Given a string containing an email address, add it to the automatic
whitelist database with a high score, effectively blacklisting them.
welcomelist database with a high score, effectively blocklisting them.
If $cli_p is set then underlying plugin may give visual feedback on additions/failures.
=cut
sub add_address_to_blacklist {
sub add_address_to_blocklist {
my ($self, $addr, $cli_p) = @_;
$self->call_plugins("blacklist_address", { address => $addr,
$self->call_plugins("blocklist_address", { address => $addr,
cli_p => $cli_p });
}
*add_address_to_blacklist = \&add_address_to_blocklist; # removed in 4.1
###########################################################################
=item $f->add_all_addresses_to_blacklist ($mail, $cli_p)
=item $f->add_all_addresses_to_blocklist ($mail, $cli_p)
Previously add_all_addresses_to_blacklist which will work interchangeably until 4.1.
Given a mail message, find addresses in the From headers and add them to the
automatic whitelist database with a high score, effectively blacklisting them.
automatic welcomelist database with a high score, effectively blocklisting them.
Note that To and Cc addresses are not used.
@ -1054,23 +1074,26 @@ If $cli_p is set then underlying plugin may give visual feedback on additions/fa
=cut
sub add_all_addresses_to_blacklist {
sub add_all_addresses_to_blocklist {
my ($self, $mail_obj, $cli_p) = @_;
$self->init(1);
my @addrlist;
my @hdrs = $mail_obj->get_header('From');
if ($#hdrs >= 0) {
push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
foreach my $hdr (@hdrs) {
my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
foreach my $addr (@addrs) {
push @addrlist, $addr->{address} if defined $addr->{address};
}
}
foreach my $addr (@addrlist) {
$self->call_plugins("blacklist_address", { address => $addr,
$self->call_plugins("blocklist_address", { address => $addr,
cli_p => $cli_p });
}
}
*add_all_addresses_to_blacklist = \&add_all_addresses_to_blocklist; # removed in 4.1
###########################################################################
@ -1158,13 +1181,12 @@ sub remove_spamassassin_markup {
my $hdrs = $mail_obj->get_pristine_header();
my $body = $mail_obj->get_pristine_body();
# remove DOS line endings
$hdrs =~ s/\r//gs;
# force \n for line-ending processing temporarily
$hdrs =~ s/\015?\012/\n/gs;
$body =~ s/\015?\012/\n/gs;
# unfold SA added headers, but not X-Spam-Prev headers ...
$hdrs = "\n".$hdrs; # simplifies regexp below
1 while $hdrs =~ s/(\nX-Spam-(?!Prev).+?)\n[ \t]+(\S.*\n)/$1 $2/g;
$hdrs =~ s/^\n//;
1 while $hdrs =~ s/((?:^|\n)X-Spam-(?!Prev).+?)\n[ \t]+(\S.*\n)/$1 $2/g;
###########################################################################
# Backward Compatibility, pre 3.0.x.
@ -1215,14 +1237,11 @@ sub remove_spamassassin_markup {
}
# remove any other X-Spam headers we added, will be unfolded
$hdrs = "\n".$hdrs; # simplifies regexp below
1 while $hdrs =~ s/\nX-Spam-.*\n/\n/g;
$hdrs =~ s/^\n//;
1 while $hdrs =~ s/(^|\n)X-Spam-.*\n/$1/g;
# re-add DOS line endings
if ($mail_obj->{line_ending} ne "\n") {
$hdrs =~ s/\r?\n/$mail_obj->{line_ending}/gs;
}
# force original message line endings
$hdrs =~ s/\n/$mail_obj->{line_ending}/gs;
$body =~ s/\n/$mail_obj->{line_ending}/gs;
# Put the whole thing back together ...
return join ('', $mbox, $hdrs, $body);
@ -1235,8 +1254,8 @@ sub remove_spamassassin_markup {
Read a configuration file and parse user preferences from it.
User preferences are as defined in the C<Mail::SpamAssassin::Conf> manual page.
In other words, they include scoring options, scores, whitelists and
blacklists, and so on, but do not include rule definitions, privileged
In other words, they include scoring options, scores, welcomelists and
blocklists, and so on, but do not include rule definitions, privileged
settings, etc. unless C<allow_user_rules> is enabled; and they never include
the administrator settings.
@ -1261,7 +1280,8 @@ sub read_scoreonly_config {
$text = "file start $filename\n" . $text;
# add an extra \n in case file did not end in one.
$text .= "\nfile end $filename\n";
$text .= "\n" unless $text =~ /\n\z/;
$text .= "file end $filename\n";
$self->{conf}->{main} = $self;
$self->{conf}->parse_scores_only ($text);
@ -1326,7 +1346,7 @@ sub load_scoreonly_ldap {
=item $f->set_persistent_address_list_factory ($factoryobj)
Set the persistent address list factory, used to create objects for the
automatic whitelist algorithm's persistent-storage back-end. See
automatic welcomelist algorithm's persistent-storage back-end. See
C<Mail::SpamAssassin::PersistentAddrList> for the API these factory objects
must implement, and the API the objects they produce must implement.
@ -1499,7 +1519,7 @@ sub lint_rules {
$self->{dont_copy_prefs} = $olddcp; # revert back to previous
# bug 5048: override settings to ensure a faster lint
$self->{'conf'}->{'use_auto_whitelist'} = 0;
$self->{'conf'}->{'use_auto_welcomelist'} = 0;
$self->{'conf'}->{'bayes_auto_learn'} = 0;
my $mail = $self->parse(\@testmsg, 1, { master_deadline => undef });
@ -1531,7 +1551,6 @@ sub finish {
$self->call_plugins("finish_tests", { conf => $self->{conf},
main => $self });
$self->{conf}->finish(); delete $self->{conf};
$self->{plugins}->finish(); delete $self->{plugins};
if ($self->{bayes_scanner}) {
@ -1541,6 +1560,8 @@ sub finish {
$self->{resolver}->finish() if $self->{resolver};
$self->{conf}->finish(); delete $self->{conf};
$self->timer_end("finish");
%{$self} = ();
}
@ -1662,6 +1683,11 @@ sub init {
# Note that this PID has run init()
$self->{_initted} = $$;
# if spamd or other forking, wait for spamd_child_init
if (!$self->{skip_prng_reseeding}) {
$self->set_global_state_dir();
}
#fix spamd reading root prefs file
if (!defined $use_user_pref) {
$use_user_pref = 1;
@ -1737,10 +1763,18 @@ sub init {
}
if ($self->{pre_config_text}) {
$self->{config_text} = $self->{pre_config_text} . $self->{config_text};
$self->{pre_config_text} .= "\n" unless $self->{pre_config_text} =~ /\n\z/;
$self->{config_text} = "file start (pre_config_text)\n".
$self->{pre_config_text}.
"file end (pre_config_text)\n".
$self->{config_text};
}
if ($self->{post_config_text}) {
$self->{config_text} .= $self->{post_config_text};
$self->{post_config_text} .= "\n" unless $self->{post_config_text} =~ /\n\z/;
$self->{config_text} .= "\n" unless $self->{config_text} =~ /\n\z/;
$self->{config_text} .= "file start (post_config_text)\n".
$self->{post_config_text}.
"file end (post_config_text)\n";
}
if ($self->{config_text} !~ /\S/) {
@ -1771,7 +1805,7 @@ sub init {
# Initialize the Bayes subsystem
if ($self->{conf}->{use_bayes}) {
require Mail::SpamAssassin::Bayes;
$self->{bayes_scanner} = new Mail::SpamAssassin::Bayes ($self);
$self->{bayes_scanner} = Mail::SpamAssassin::Bayes->new($self);
}
$self->{'learn_to_journal'} = $self->{conf}->{bayes_learn_to_journal};
@ -1796,6 +1830,21 @@ sub init {
# should be called only after configuration has been parsed
$self->{resolver} = Mail::SpamAssassin::DnsResolver->new($self);
# load GeoDB if some plugin wants it
if ($self->{geodb_wanted}) {
eval '
use Mail::SpamAssassin::GeoDB;
$self->{geodb} = Mail::SpamAssassin::GeoDB->new({
conf => $self->{conf}->{geodb},
wanted => $self->{geodb_wanted},
});
1;
';
if ($@ || !$self->{geodb}) {
dbg("config: GeoDB disabled: $@");
}
}
# TODO -- open DNS cache etc. if necessary
}
@ -1826,10 +1875,10 @@ sub _read_cf_pre {
dbg("config: file or directory $path not accessible: $!");
} elsif (-d _) {
foreach my $file ($self->$filelistmethod($path)) {
$txt .= read_cf_file($file);
$txt .= $self->read_cf_file($file);
}
} elsif (-f _ && -s _ && -r _) {
$txt .= read_cf_file($path);
$txt .= $self->read_cf_file($path);
}
}
@ -1838,9 +1887,14 @@ sub _read_cf_pre {
sub read_cf_file {
my($path) = @_;
my($self, $path) = @_;
my $txt = '';
if ($self->{cf_files_read}->{$path}++) {
dbg("config: skipping already read file: $path");
return $txt;
}
local *IN;
if (open (IN, "<".$path)) {
@ -1852,7 +1906,8 @@ sub read_cf_file {
$txt = "file start $path\n" . $txt;
# add an extra \n in case file did not end in one.
$txt .= "\nfile end $path\n";
$txt .= "\n" unless $txt =~ /\n\z/;
$txt .= "file end $path\n";
dbg("config: read file $path");
}
@ -1900,7 +1955,7 @@ sub get_and_create_userstate_dir {
dbg("config: error accessing $fname: $!");
} else { # does not exist, create it
eval {
mkpath($fname, 0, 0700); 1;
mkpath(Mail::SpamAssassin::Util::untaint_file_path($fname), 0, 0700); 1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("config: mkdir $fname failed: $eval_stat");
@ -1910,6 +1965,58 @@ sub get_and_create_userstate_dir {
$fname;
}
# find the most global writable state dir
# used by dns_block_rule state files etc
sub set_global_state_dir {
my ($self) = @_;
# try home_dir_for_helpers
my $helper_dir = $self->{home_dir_for_helpers} || '';
if ($helper_dir) {
my $dir = File::Spec->catdir($helper_dir, ".spamassassin");
return if $self->test_global_state_dir($dir);
}
# try user home (if different from helper home)
my $home;
if (am_running_on_windows()) {
# Windows has a special folder for common appdata (Bug 8050)
$home = Mail::SpamAssassin::Util::common_application_data_directory();
} else {
$home = (Mail::SpamAssassin::Util::portable_getpwuid ($>))[7];
}
if ($home && $home ne $helper_dir) {
my $dir = File::Spec->catdir($home, ".spamassassin");
return if $self->test_global_state_dir($dir);
}
# try LOCAL_STATE_DIR
return if $self->test_global_state_dir($self->{LOCAL_STATE_DIR});
# fallback to userstate
$self->{global_state_dir} = $self->get_and_create_userstate_dir();
dbg("config: global_state_dir set to userstate_dir: $self->{global_state_dir}");
}
sub test_global_state_dir {
my ($self, $dir) = @_;
eval { mkpath($dir, 0, 0700); }; # just a single stat if exists already
# Purge stale test files (enough to do only some times randomly)
if (rand() < 0.2 && opendir(WT_DIR, $dir)) {
foreach (grep {index($_, '.sawritetest') == 0 &&
(-M File::Spec->catfile($dir, $_)||0) > 0.0001} readdir(WT_DIR)) {
unlink(Mail::SpamAssassin::Util::untaint_file_path(File::Spec->catfile($dir, $_)));
}
closedir WT_DIR;
}
my $n = ".sawritetest$$".Mail::SpamAssassin::Util::pseudo_random_string(6);
my $file = File::Spec->catfile($dir, $n);
if (Mail::SpamAssassin::Util::touch_file($file, { create_exclusive => 1 })) {
dbg("config: global_state_dir set to $dir");
$self->{global_state_dir} = $dir;
unlink($file);
return 1;
}
unlink($file); # just in case?
return 0;
}
=item $fullpath = $f->find_rule_support_file ($filename)
Find a rule-support file, such as C<languages> or C<triplets.txt>,
@ -1923,8 +2030,22 @@ it exists, or undef if it doesn't exist.
sub find_rule_support_file {
my ($self, $filename) = @_;
my @paths;
# search custom directories first
if ($self->{site_rules_filename}) {
foreach my $path (split("\000", $self->{site_rules_filename})) {
push @paths, $path if -d $path;
}
}
if ($self->{rules_filename} && -d $self->{rules_filename}) {
push @paths, $self->{rules_filename}
}
# updates sub-directory missing from @default_rules_path
push @paths, '__local_state_dir__/__version__/updates_spamassassin_org';
push @paths, @default_rules_path;
return $self->first_existing_path(
map { my $p = $_; $p =~ s{$}{/$filename}; $p } @default_rules_path );
map { my $p = $_; $p =~ s{$}{/$filename}; $p } @paths );
}
=item $f->create_default_prefs ($filename, $username [ , $userdir ] )
@ -2006,15 +2127,15 @@ sub expand_name {
if (am_running_on_windows()) {
my $userprofile = $ENV{USERPROFILE} || '';
return $userprofile if ($userprofile && $userprofile =~ m/^[a-z]\:[\/\\]/oi);
return $userprofile if ($userprofile =~ m/^\\\\/o);
return $userprofile if ($userprofile && $userprofile =~ m/^[a-z]\:[\/\\]/i);
return $userprofile if ($userprofile =~ m/^\\\\/);
return $home if ($home && $home =~ m/^[a-z]\:[\/\\]/oi);
return $home if ($home =~ m/^\\\\/o);
return $home if ($home && $home =~ m/^[a-z]\:[\/\\]/i);
return $home if ($home =~ m/^\\\\/);
return '';
} else {
return $home if ($home && $home =~ /\//o);
return $home if ($home && index($home, '/') != -1);
return (getpwnam($name))[7] if ($name ne '');
return (getpwuid($>))[7];
}
@ -2028,6 +2149,9 @@ sub sed_path {
return $self->{conf}->{sed_path_cache}->{$path};
}
# <4.0 compatibility check, to be removed in 4.1
my $check_compat = $path eq '__userstate__/auto-welcomelist';
my $orig_path = $path;
$path =~ s/__local_rules_dir__/$self->{LOCAL_RULES_DIR} || ''/ges;
@ -2035,10 +2159,21 @@ sub sed_path {
$path =~ s/__def_rules_dir__/$self->{DEF_RULES_DIR} || ''/ges;
$path =~ s{__prefix__}{$self->{PREFIX} || $Config{prefix} || '/usr'}ges;
$path =~ s{__userstate__}{$self->get_and_create_userstate_dir() || ''}ges;
$path =~ s/__global_state_dir__/$self->{global_state_dir} || ''/ges;
$path =~ s{__perl_major_ver__}{$self->get_perl_major_version()}ges;
$path =~ s/__version__/${VERSION}/gs;
$path =~ s/^\~([^\/]*)/$self->expand_name($1)/es;
# <4.0 compatibility check, to be removed in 4.1
if ($check_compat) {
if ($path =~ m{^(.+)/(.+)$}) {
# Use auto-whitelist if found
if (!-e $path && -e "$1/auto-whitelist") {
$path = "$1/auto-whitelist";
}
}
}
$path = Mail::SpamAssassin::Util::untaint_file_path ($path);
$self->{conf}->{sed_path_cache}->{$orig_path} = $path;
return $path;
@ -2077,16 +2212,32 @@ sub get_pre_files_in_dir {
return $self->_get_cf_pre_files_in_dir($dir, 'pre');
}
sub _reorder_dir {
# Official ASF channel should be loaded first in
# order to be able to override scores by using custom channels
# bz 7991
if($a eq 'updates_spamassassin_org.cf') {
return -1;
} elsif ($b eq 'updates_spamassassin_org.cf') {
return 1;
}
return $a cmp $b;
}
sub _get_cf_pre_files_in_dir {
my ($self, $dir, $type) = @_;
if ($self->{config_tree_recurse}) {
my @cfs;
# copied from Mail::SpamAssassin::Util::untaint_file_path
# fix bugs 8010 and 8025 by using an untaint pattern that is better on Windows than File::Find's default
my $chars = '-_A-Za-z0-9.#%=+,/:()\\@\\xA0-\\xFF\\\\';
my $re = qr{^\s*([$chars][${chars}~ ]*)\z};
# use "eval" to avoid loading File::Find unless this is specified
eval ' use File::Find qw();
File::Find::find(
{ untaint => 1,
am_running_on_windows() ? (untaint_pattern => $re) : (),
follow => 1,
wanted =>
sub { push(@cfs, $File::Find::name) if /\.\Q$type\E$/i && -f $_ }
@ -2095,7 +2246,7 @@ sub _get_cf_pre_files_in_dir {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
die "_get_cf_pre_files_in_dir error: $eval_stat";
};
@cfs = sort { $a cmp $b } @cfs;
@cfs = sort { _reorder_dir($a, $b) } @cfs;
return @cfs;
}
else {
@ -2104,7 +2255,7 @@ sub _get_cf_pre_files_in_dir {
/\.${type}$/i && -f "$dir/$_" } readdir(SA_CF_DIR);
closedir SA_CF_DIR;
return map { "$dir/$_" } sort { $a cmp $b } @cfs;
return map { "$dir/$_" } sort { _reorder_dir($a, $b) } @cfs;
}
}
@ -2127,11 +2278,18 @@ sub call_plugins {
return unless $self->{plugins};
# Use some calls ourself too
if ($subname eq 'finish_parsing_end') {
if ($subname eq 'spamd_child_init') {
# set global dir now if spamd
$self->set_global_state_dir();
} elsif ($subname eq 'finish_parsing_end') {
# Initialize RegistryBoundaries, now that util_rb_tld etc from config is
# read. Plugins can also now use {valid_tlds_re} to one time compile
# regexes in finish_parsing_end.
$self->{registryboundaries} = Mail::SpamAssassin::RegistryBoundaries->new ($self);
} elsif ($subname eq 'whitelist_address' || $subname eq 'blacklist_address') {
# Warn about backwards compatibility, removed in 4.1
# Third party usage should be rare event, so do not translate function names
warn "config: Deprecated $subname called from call_plugins, use welcomelist_address or blocklist_address\n";
}
# safety net in case some plugin changes global settings, Bug 6218
@ -2152,8 +2310,12 @@ sub find_all_addrs_in_mail {
Errors-To Mail-Followup-To))
{
my @hdrs = $mail_obj->get_header($header);
if ($#hdrs < 0) { next; }
push (@addrlist, $self->find_all_addrs_in_line(join (" ", @hdrs)));
foreach my $hdr (@hdrs) {
my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
foreach my $addr (@addrs) {
push @addrlist, $addr->{address} if defined $addr->{address};
}
}
}
# find addrs in body, too
@ -2176,17 +2338,22 @@ sub find_all_addrs_in_mail {
sub find_all_addrs_in_line {
my ($self, $line) = @_;
return () unless defined $line;
# a more permissive pattern based on "dot-atom" as per RFC2822
my $ID_PATTERN = '[-a-z0-9_\+\:\=\!\#\$\%\&\*\^\?\{\}\|\~\/\.]+';
my $HOST_PATTERN = '[-a-z0-9_\+\:\/]+';
my $ID_PATTERN = qr/[-a-zA-Z0-9_\+\:\=\!\#\$\%\&\*\^\?\{\}\|\~\/\.]+/;
my $HOST_PATTERN = qr/[-a-zA-Z0-9_\+\:\/]+/;
my @addrs;
my %seen;
while ($line =~ s/(?:mailto:)?\s*
($ID_PATTERN \@
$HOST_PATTERN(?:\.$HOST_PATTERN)+)//oix)
($HOST_PATTERN(?:\.$HOST_PATTERN)+))//oix)
{
my $addr = $1;
my $host = $2;
next unless Mail::SpamAssassin::Util::is_fqdn_valid($host);
next unless $self->{registryboundaries}->is_domain_valid($host);
$addr =~ s/^mailto://;
next if (defined ($seen{$addr})); $seen{$addr} = 1;
push (@addrs, $addr);

View File

@ -25,13 +25,13 @@ use warnings;
use re 'taint';
use Errno qw(ENOENT EACCES EBADF);
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Util qw(compile_regexp);
use Mail::SpamAssassin::Constants qw(:sa);
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::AICache;
# 256 KiB is a big email, unless stated otherwise
use constant BIG_BYTES => 256*1024;
# 500 KiB is a big email, unless stated otherwise
use constant BIG_BYTES => 500*1024;
our ( $MESSAGES, $AICache, %class_opts );
@ -43,9 +43,9 @@ Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time
=head1 SYNOPSIS
my $iter = new Mail::SpamAssassin::ArchiveIterator(
my $iter = Mail::SpamAssassin::ArchiveIterator->new(
{
'opt_max_size' => 256 * 1024, # 0 implies no limit
'opt_max_size' => 500 * 1024, # 0 implies no limit
'opt_cache' => 1,
}
);
@ -77,7 +77,7 @@ and C<result_sub> functions appropriately per message.
###########################################################################
=item $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
=item $item = Mail::SpamAssassin::ArchiveIterator->new( [ { opt => val, ... } ] )
Constructs a new C<Mail::SpamAssassin::ArchiveIterator> object. You may
pass the following attribute-value pairs to the constructor. The pairs are
@ -91,7 +91,7 @@ A value of option I<opt_max_size> determines a limit (number of bytes)
beyond which a message is considered large and is skipped by ArchiveIterator.
A value 0 implies no size limit, all messages are examined. An undefined
value implies a default limit of 256 KiB.
value implies a default limit of 500 KiB.
=item opt_all
@ -205,8 +205,6 @@ sub new {
$self->{s} = [ ]; # spam, of course
$self->{h} = [ ]; # ham, as if you couldn't guess
$self->{access_problem} = 0;
if ($self->{opt_all}) {
$self->{opt_max_size} = 0;
} elsif (!defined $self->{opt_max_size}) {
@ -235,10 +233,14 @@ sub set_functions {
=item run ( @target_paths )
Generates the list of messages to process, then runs each message through the
configured wanted subroutine. Files which have a name ending in C<.gz> or
C<.bz2> will be properly uncompressed via call to C<gzip -dc> and C<bzip2 -dc>
respectively.
Generates the list of messages to process, then runs each message through
the configured wanted subroutine.
Compressed files are detected and uncompressed automatically regardless of
file extension. Supported formats are C<gzip>, C<bzip2>, C<xz>, C<lz4>,
C<lzip>, C<lzo>. Gzip is uncompressed via IO::Zlib module, others use their
specific command line tool (bzip2/xz/lz4/lzip/lzop). Compressed
mailbox/mbox files are not supported.
The target_paths array is expected to be either one element per path in the
following format: C<class:format:raw_location>, or a hash reference containing
@ -292,6 +294,11 @@ sub run {
return 0;
}
# Find some uncompressors (gzip is handled with IO::Zlib)
foreach ('bzip2','xz','lz4','lzip','lzop') {
$self->{$_.'_path'} = Mail::SpamAssassin::Util::find_executable_in_env_path($_);
}
# scan the targets and get the number and list of messages
$self->_scan_targets(\@targets,
sub {
@ -316,11 +323,16 @@ sub run {
sub _run {
my ($self, $messages) = @_;
my $messages_run = 0;
while (my $message = shift @{$messages}) {
my($class, undef, $date, undef, $result) = $self->_run_message($message);
&{$self->{result_sub}}($class, $result, $date) if $result;
if ($result) {
$messages_run++;
&{$self->{result_sub}}($class, $result, $date);
}
return ! $self->{access_problem};
}
# Return success if atleast some files were processed through
return $messages_run > 0;
}
############################################################################
@ -346,24 +358,8 @@ sub _run_message {
sub _run_file {
my ($self, $class, $format, $where, $date) = @_;
if (!_mail_open($where)) {
$self->{access_problem} = 1;
return;
}
my $stat_errn = stat(INPUT) ? 0 : 0+$!;
if ($stat_errn == ENOENT) {
dbg("archive-iterator: no such input ($where)");
return;
}
elsif ($stat_errn != 0) {
warn "archive-iterator: no access to input ($where): $!";
return;
}
elsif (!-f _ && !-c _ && !-p _) {
warn "archive-iterator: not a plain file (or char.spec. or pipe) ($where)";
return;
}
my $fh = $self->_mail_open($where, 1);
return unless $fh;
my $opt_max_size = $self->{opt_max_size};
if (!$opt_max_size) {
@ -375,7 +371,7 @@ sub _run_file {
# note that -s can only deal with files, it returns 0 on char.spec. STDIN
info("archive-iterator: skipping large message: ".
"file size %d, limit %d bytes", -s _, $opt_max_size);
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
return;
}
@ -384,12 +380,12 @@ sub _run_file {
my $len = 0;
my $str = '';
my($inbuf,$nread);
while ( $nread=read(INPUT,$inbuf,16384) ) {
while ( $nread=read($fh,$inbuf,16384) ) {
$len += $nread;
if ($opt_max_size && $len > $opt_max_size) {
info("archive-iterator: skipping large message: read %d, limit %d bytes",
$len, $opt_max_size);
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
return;
}
$str .= $inbuf;
@ -400,7 +396,7 @@ sub _run_file {
for my $j (0..$#msg) {
if ($msg[$j] =~ /^\015?$/) { $header = $j; last }
}
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
$date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -418,21 +414,20 @@ sub _run_mailbox {
}
my @msg;
my $header;
if (!_mail_open($file)) {
$self->{access_problem} = 1;
return;
}
my $fh = $self->_mail_open($file, 1);
return unless $fh;
my $opt_max_size = $self->{opt_max_size};
dbg("archive-iterator: _run_mailbox %s, ofs %d, limit %d",
$file, $offset, $opt_max_size||0);
seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
seek($fh,$offset,0) or die "cannot reposition file to $offset: $!";
my $size = 0;
for ($!=0; <INPUT>; $!=0) {
for ($!=0; <$fh>; $!=0) {
#Changed Regex to use option Per bug 6703
last if (substr($_,0,5) eq "From " && @msg && /$self->{opt_from_regex}/o);
last if (/^From / && @msg && $_ =~ $self->{opt_from_regex});
$size += length($_);
push (@msg, $_);
@ -441,7 +436,7 @@ sub _run_mailbox {
info("archive-iterator: skipping large message: ".
"%d lines, %d bytes, limit %d bytes",
scalar @msg, $size, $opt_max_size);
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
return;
}
@ -452,7 +447,7 @@ sub _run_mailbox {
defined $_ || $!==0 or
$!==EBADF ? dbg("archive-iterator: error reading: $!")
: die "error reading: $!";
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
$date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -464,23 +459,24 @@ sub _run_mailbox {
sub _run_mbx {
my ($self, $class, $format, $where, $date) = @_;
my ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
my ($file, $offset);
{ local($1,$2); # Bug 7140 (avoids perl bug [perl #123880])
($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
}
my @msg;
my $header;
if (!_mail_open($file)) {
$self->{access_problem} = 1;
return;
}
my $fh = $self->_mail_open($file, 1);
return unless $fh;
my $opt_max_size = $self->{opt_max_size};
dbg("archive-iterator: _run_mbx %s, ofs %d, limit %d",
$file, $offset, $opt_max_size||0);
seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
seek($fh,$offset,0) or die "cannot reposition file to $offset: $!";
my $size = 0;
for ($!=0; <INPUT>; $!=0) {
for ($!=0; <$fh>; $!=0) {
last if ($_ =~ MBX_SEPARATOR);
$size += length($_);
push (@msg, $_);
@ -490,7 +486,7 @@ sub _run_mbx {
info("archive-iterator: skipping large message: ".
"%d lines, %d bytes, limit %d bytes",
scalar @msg, $size, $opt_max_size);
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
return;
}
@ -501,7 +497,7 @@ sub _run_mbx {
defined $_ || $!==0 or
$!==EBADF ? dbg("archive-iterator: error reading: $!")
: die "error reading: $!";
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
$date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -579,12 +575,9 @@ sub _scan_targets {
if ($format eq 'detect') {
# detect the format
my $stat_errn = stat($location) ? 0 : 0+$!;
if ($stat_errn == ENOENT) {
$thisformat = 'file'; # actually, no file - to be detected later
}
elsif ($stat_errn != 0) {
warn "archive-iterator: no access to $location: $!";
$thisformat = 'file';
if ($stat_errn != 0) {
warn "archive-iterator: no access to $location: $!\n";
next;
}
elsif (-d _) {
# it's a directory
@ -623,37 +616,121 @@ sub _scan_targets {
}
sub _mail_open {
my ($file) = @_;
my ($self, $file, $ignore_missing) = @_;
my $fh;
# Go ahead and try to open the file
# bug 5288: the "magic" version of open will strip leading and trailing
# whitespace from the expression. switch to the three-argument version
# of open which does not strip whitespace. see "perldoc -f open" and
# "perldoc perlipc" for more information.
# Assume that the file by default is just a plain file
my @expr = ( $file );
my $mode = '<';
# Handle different types of compressed files
if ($file =~ /\.gz$/) {
$mode = '-|';
unshift @expr, 'gunzip', '-cd';
if (!open($fh, '<', $file)) {
# Don't warn about disappeared files
if ($ignore_missing && $! == ENOENT) {
dbg("archive-iterator: no access to $file: $!");
} else {
warn "archive-iterator: no access to $file: $!\n"
}
elsif ($file =~ /\.bz2$/) {
$mode = '-|';
unshift @expr, 'bzip2', '-cd';
}
# Go ahead and try to open the file
if (!open (INPUT, $mode, @expr)) {
warn "archive-iterator: unable to open $file: $!\n";
return 0;
return;
}
# bug 5249: mail could have 8-bit data, need this on some platforms
binmode INPUT or die "cannot set input file to binmode: $!";
binmode $fh or die "cannot set input file to binmode: $!";
return 1;
# Detect compressed data (only from files, can't reopen pipe)
if (-f $file && read($fh, my $magic, 6)) {
# GZIP
if ($magic =~ /^\x1F\x8B/) {
dbg("archive-iterator: detected gzip file $file, reopening with IO::Zlib");
close $fh or die "error closing input file: $!";
eval { require IO::Zlib; };
if ($@) { warn "archive-iterator: IO::Zlib required for $file: $@\n"; return; }
$fh = IO::Zlib->new($file, "rb");
if (!$fh) {
if ($ignore_missing && $! == ENOENT) {
dbg("archive-iterator: no access to $file: $!");
} else {
warn "archive-iterator: no access to $file: $!\n";
}
return;
}
}
# BZIP2
elsif ($magic =~ /^\x42\x5A(?:\x68|\x30)/) {
dbg("archive-iterator: detected bzip2 file $file, reopening with bzip2");
close $fh or die "error closing input file: $!";
if (!$self->{bzip2_path}) {
warn "archive-iterator: bzip2 executable required for $file\n";
return;
}
if (!open($fh, '-|', $self->{bzip2_path}, '-cd', $file)) {
warn "archive-iterator: no access to $file: $!\n";
return;
}
binmode $fh or die "cannot set input file to binmode: $!";
}
# XZ
elsif ($magic =~ /^\xFD\x37\x7A\x58\x5A\x00/) {
dbg("archive-iterator: detected xz file $file, reopening with xz");
close $fh or die "error closing input file: $!";
if (!$self->{xz_path}) {
warn "archive-iterator: xz executable required for $file\n";
return;
}
if (!open($fh, '-|', $self->{xz_path}, '-cd', $file)) {
warn "archive-iterator: no access to $file: $!\n";
return;
}
binmode $fh or die "cannot set input file to binmode: $!";
}
# LZ4
elsif ($magic =~ /^\x04\x22\x4D\x18/) {
dbg("archive-iterator: detected lz4 file $file, reopening with lz4");
close $fh or die "error closing input file: $!";
if (!$self->{lz4_path}) {
warn "archive-iterator: lz4 executable required for $file\n";
return;
}
if (!open($fh, '-|', $self->{lz4_path}, '-cd', $file)) {
warn "archive-iterator: no access to $file: $!\n";
return;
}
binmode $fh or die "cannot set input file to binmode: $!";
}
# LZIP
elsif ($magic =~ /^\x4C\x5A\x49\x50/) {
dbg("archive-iterator: detected lzip file $file, reopening with lzip");
close $fh or die "error closing input file: $!";
if (!$self->{lzip_path}) {
warn "archive-iterator: lzip executable required for $file\n";
return;
}
if (!open($fh, '-|', $self->{lzip_path}, '-cd', $file)) {
warn "archive-iterator: no access to $file: $!\n";
return;
}
binmode $fh or die "cannot set input file to binmode: $!";
}
# LZO
elsif ($magic =~ /^\x89\x4C\x5A\x4F\x00\x0D/) {
dbg("archive-iterator: detected lzo file $file, reopening with lzop");
close $fh or die "error closing input file: $!";
if (!$self->{lzop_path}) {
warn "archive-iterator: lzop executable required for $file\n";
return;
}
if (!open($fh, '-|', $self->{lzop_path}, '-cd', $file)) {
warn "archive-iterator: no access to $file: $!\n";
return;
}
binmode $fh or die "cannot set input file to binmode: $!";
} else {
# Reset position
seek($fh,0,0);
}
}
return $fh;
}
sub _set_default_message_selection_opts {
@ -663,11 +740,15 @@ sub _set_default_message_selection_opts {
$self->{opt_want_date} = 1 unless (defined $self->{opt_want_date});
$self->{opt_cache} = 0 unless (defined $self->{opt_cache});
#Changed Regex to include boundaries for Communigate Pro versions (5.2.x and later). per Bug 6413
$self->{opt_from_regex} = '^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)' unless (defined $self->{opt_from_regex});
#STRIP LEADING AND TRAILING / FROM REGEX FOR OPTION
$self->{opt_from_regex} =~ s/^\///;
$self->{opt_from_regex} =~ s/\/$//;
if (!defined $self->{opt_from_regex}) {
$self->{opt_from_regex} = qr/^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;
} elsif (ref($self->{opt_from_regex}) ne 'Regexp') {
my ($rec, $err) = compile_regexp($self->{opt_from_regex}, 1);
if (!$rec) {
die "fatal: invalid mbox_format_from_regex '$self->{opt_from_regex}': $err\n";
}
$self->{opt_from_regex} = $rec;
}
dbg("archive-iterator: _set_default_message_selection_opts After: Scanprob[$self->{opt_scanprob}], want_date[$self->{opt_want_date}], cache[$self->{opt_cache}], from_regex[$self->{opt_from_regex}]");
@ -750,7 +831,7 @@ sub _scan_directory {
# Maildir format: bug 3003
for my $sub ("new", "cur") {
opendir (DIR, "$folder/$sub")
or die "Can't open '$folder/$sub' dir: $!\n";
or die "archive-iterator: can't open '$folder/$sub' dir: $!\n";
# Don't learn from messages marked as deleted
# Or files starting with a leading dot
push @files, map { "$sub/$_" } grep { !/^\.|:2,.*T/ } readdir(DIR);
@ -790,9 +871,10 @@ sub _scan_directory {
my $stat_errn = stat($file) ? 0 : 0+$!;
if ($stat_errn == ENOENT) {
# no longer there?
dbg("archive-iterator: no access to $file: $!");
}
elsif ($stat_errn != 0) {
warn "archive-iterator: no access to $file: $!";
warn "archive-iterator: no access to $file: $!\n";
}
elsif (-f _ || -c _ || -p _) {
$self->_scan_file($class, $file, $bkfunc);
@ -801,7 +883,7 @@ sub _scan_directory {
push(@subdirs, $file);
}
else {
warn "archive-iterator: $file is not a plain file or directory: $!";
warn "archive-iterator: $file is not a plain file or directory\n";
}
}
undef @files; # release storage
@ -844,18 +926,17 @@ sub _scan_file {
}
my $header = '';
if (!_mail_open($mail)) {
$self->{access_problem} = 1;
return;
}
for ($!=0; <INPUT>; $!=0) {
my $fh = $self->_mail_open($mail);
return unless $fh;
for ($!=0; <$fh>; $!=0) {
last if /^\015?$/s;
$header .= $_;
}
defined $_ || $!==0 or
$!==EBADF ? dbg("archive-iterator: error reading: $!")
: die "error reading: $!";
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
return if ($self->{opt_skip_empty_messages} && $header eq '');
@ -896,7 +977,6 @@ sub _scan_mailbox {
$folder =~ s/\/\s*$//; #Remove trailing slash, if there
if (!opendir(DIR, $folder)) {
warn "archive-iterator: can't open '$folder' dir: $!\n";
$self->{access_problem} = 1;
return;
}
while ($_ = readdir(DIR)) {
@ -921,9 +1001,8 @@ sub _scan_mailbox {
foreach my $file (@files) {
$self->_bump_scan_progress();
if ($file =~ /\.(?:gz|bz2)$/) {
if ($file =~ /\.(?:gz|bz2|xz|lz[o4]?)$/i) {
warn "archive-iterator: compressed mbox folders are not supported at this time\n";
$self->{access_problem} = 1;
next;
}
@ -943,20 +1022,18 @@ sub _scan_mailbox {
}
unless ($count) {
if (!_mail_open($file)) {
$self->{access_problem} = 1;
next;
}
my $fh = $self->_mail_open($file);
next unless $fh;
my $start = 0; # start of a message
my $where = 0; # current byte offset
my $first = ''; # first line of message
my $header = ''; # header text
my $in_header = 0; # are in we a header?
while (!eof INPUT) {
while (!eof $fh) {
my $offset = $start; # byte offset of this message
my $header = $first; # remember first line
for ($!=0; <INPUT>; $!=0) {
for ($!=0; <$fh>; $!=0) {
if ($in_header) {
if (/^\015?$/s) {
$in_header = 0;
@ -966,15 +1043,15 @@ sub _scan_mailbox {
}
}
#Changed Regex to use option Per bug 6703
if (substr($_,0,5) eq "From " && /$self->{opt_from_regex}/o) {
if (/^From / && $_ =~ $self->{opt_from_regex}) {
$in_header = 1;
$first = $_;
$start = $where;
$where = tell INPUT;
$where = tell $fh;
$where >= 0 or die "cannot obtain file position: $!";
last;
}
$where = tell INPUT;
$where = tell $fh;
$where >= 0 or die "cannot obtain file position: $!";
}
defined $_ || $!==0 or
@ -986,7 +1063,7 @@ sub _scan_mailbox {
$info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
}
}
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
}
while(my($k,$v) = each %{$info}) {
@ -1027,7 +1104,6 @@ sub _scan_mbx {
$folder =~ s/\/\s*$//; # remove trailing slash, if there is one
if (!opendir(DIR, $folder)) {
warn "archive-iterator: can't open '$folder' dir: $!\n";
$self->{access_problem} = 1;
return;
}
while ($_ = readdir(DIR)) {
@ -1053,9 +1129,8 @@ sub _scan_mbx {
foreach my $file (@files) {
$self->_bump_scan_progress();
if ($folder =~ /\.(?:gz|bz2)$/) {
if ($folder =~ /\.(?:gz|bz2|xz|lz[o4]?)$/i) {
warn "archive-iterator: compressed mbx folders are not supported at this time\n";
$self->{access_problem} = 1;
next;
}
@ -1075,13 +1150,11 @@ sub _scan_mbx {
}
unless ($count) {
if (!_mail_open($file)) {
$self->{access_problem} = 1;
next;
}
my $fh = $self->_mail_open($file);
next unless $fh;
# check the mailbox is in mbx format
$! = 0; $fp = <INPUT>;
$! = 0; $fp = <$fh>;
defined $fp || $!==0 or
$!==EBADF ? dbg("archive-iterator: error reading: $!")
: die "error reading: $!";
@ -1092,18 +1165,17 @@ sub _scan_mbx {
}
# skip mbx headers to the first email...
seek(INPUT,2048,0) or die "cannot reposition file to 2048: $!";
my $sep = MBX_SEPARATOR;
seek($fh,2048,0) or die "cannot reposition file to 2048: $!";
for ($!=0; <INPUT>; $!=0) {
if ($_ =~ /$sep/) {
my $offset = tell INPUT;
for ($!=0; <$fh>; $!=0) {
if ($_ =~ MBX_SEPARATOR) {
my $offset = tell $fh;
$offset >= 0 or die "cannot obtain file position: $!";
my $size = $2;
# gather up the headers...
my $header = '';
for ($!=0; <INPUT>; $!=0) {
for ($!=0; <$fh>; $!=0) {
last if (/^\015?$/s);
$header .= $_;
}
@ -1116,7 +1188,7 @@ sub _scan_mbx {
}
# go onto the next message
seek(INPUT, $offset + $size, 0)
seek($fh, $offset + $size, 0)
or die "cannot reposition file to $offset + $size: $!";
}
else {
@ -1126,7 +1198,7 @@ sub _scan_mbx {
defined $_ || $!==0 or
$!==EBADF ? dbg("archive-iterator: error reading: $!")
: die "error reading: $!";
close INPUT or die "error closing input file: $!";
close $fh or die "error closing input file: $!";
}
while(my($k,$v) = each %{$info}) {

View File

@ -42,6 +42,7 @@ use Time::HiRes qw(time);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(idn_to_ascii domain_to_search_list);
our @ISA = qw();
@ -71,9 +72,9 @@ sub new {
main => $main,
queries_started => 0,
queries_completed => 0,
total_queries_started => 0,
total_queries_completed => 0,
pending_lookups => { },
pending_rules => { }, # maintain pending rules list for meta evaluation
rules_for_key => { }, # record all rules used by a key for logging
timing_by_query => { },
all_lookups => { }, # keyed by "rr_type/domain"
};
@ -82,66 +83,54 @@ sub new {
$self;
}
# Given a domain name, produces a listref of successively stripped down
# parent domains, e.g. a domain '2.10.Example.COM' would produce a list:
# '2.10.example.com', '10.example.com', 'example.com', 'com', ''
#
sub domain_to_search_list {
my ($domain) = @_;
$domain =~ s/^\.+//; $domain =~ s/\.+\z//; # strip leading and trailing dots
my @search_keys;
if ($domain =~ /\[/) { # don't split address literals
@search_keys = ( $domain, '' ); # presumably an address literal
} else {
local $1;
$domain = lc $domain;
for (;;) {
push(@search_keys, $domain);
last if $domain eq '';
# strip one level
$domain = ($domain =~ /^ (?: [^.]* ) \. (.*) \z/xs) ? $1 : '';
}
if (@search_keys > 20) { # enforce some sanity limit
@search_keys = @search_keys[$#search_keys-19 .. $#search_keys];
}
}
return \@search_keys;
}
# ---------------------------------------------------------------------------
=item $ent = $async->start_lookup($ent, $master_deadline)
=item $ent = $async->bgsend_and_start_lookup($name, $type, $class, $ent, $cb, %options)
Register the start of a long-running asynchronous lookup operation.
C<$ent> is a hash reference containing the following items:
Launch async DNS lookups. This is the only official method supported for
plugins since version 4.0.0. Do not use bgsend and start_lookup separately.
Merges duplicate queries automatically, only launches one and calls all
related callbacks on answer.
=over 4
=item key (required)
=item $name (required)
A key string, unique to this lookup. This is what is reported in
debug messages, used as the key for C<get_lookup()>, etc.
Name to query.
=item id (required)
=item $type (required)
An ID string, also unique to this lookup. Typically, this is the DNS packet ID
as returned by DnsResolver's C<bgsend> method. Sadly, the Net::DNS
architecture forces us to keep a separate ID string for this task instead of
reusing C<key> -- if you are not using DNS lookups through DnsResolver, it
should be OK to just reuse C<key>.
Type to query, A, TXT, NS, etc.
=item type (required)
=item $class (required/deprecated)
Deprecated, ignored, set as undef.
=item C<$ent> is a required hash reference containing the following items:
=over 4
=item $ent->{rulename} (required)
The rulename that started and/or depends on this query. Required for rule
dependencies to work correctly. Can be a single rulename, or array of
multiple rulenames.
=item $ent->{type} (optional)
A string, typically one word, used to describe the type of lookup in log
messages, such as C<DNSBL>, C<MX>, C<TXT>.
messages, such as C<DNSBL>, C<URIBL-A>. If not defined, default is value of
$type.
=item zone (optional)
=item $ent->{zone} (optional)
A zone specification (typically a DNS zone name - e.g. host, domain, or RBL)
which may be used as a key to look up per-zone settings. No semantics on this
parameter is imposed by this module. Currently used to fetch by-zone timeouts.
A zone specification (typically a DNS zone name - e.g. host, domain, or
RBL) which may be used as a key to look up per-zone settings. No semantics
on this parameter is imposed by this module. Currently used to fetch
by-zone timeouts (from rbl_timeout setting). Defaults to $name.
=item timeout_initial (optional)
=item $ent->{timeout_initial} (optional)
An initial value of elapsed time for which we are willing to wait for a
response (time in seconds, floating point value is allowed). When elapsed
@ -158,20 +147,267 @@ variable rbl_timeout.
If a value of the timeout_initial parameter is below timeout_min, the initial
timeout is set to timeout_min.
=item timeout_min (optional)
=item $ent->{timeout_min} (optional)
A lower bound (in seconds) to which the actual timeout approaches as the
number of queries completed approaches the number of all queries started.
Defaults to 0.2 * timeout_initial.
=item $ent->{key}, $ent->{id} (deprecated)
Deprecated, ignored, automatically generated since 4.0.0.
=item $ent->{YOUR_OWN_ITEM}
Any other custom values/objects that you want to pass on to the answer
callback.
=back
C<$ent> is returned by this method, with its contents augmented by additional
information.
=item $cb (required)
Callback function for answer, called as $cb->($ent, $pkt). C<$ent> is the
same object that bgsend_and_start_lookup was called with. C<$pkt> is the
packet object for the response, Net::DNS:RR objects can be found from
$pkt->answer.
=item %options (required)
Hash of options. Only supported and required option is master_deadline:
master_deadline => $pms->{master_deadline}
=back
=cut
sub start_queue {
my($self) = @_;
$self->{wait_queue} = 1;
}
sub launch_queue {
my($self) = @_;
delete $self->{wait_queue};
if ($self->{bgsend_queue}) {
dbg("async: launching queued lookups");
foreach (@{$self->{bgsend_queue}}) {
$self->bgsend_and_start_lookup(@$_);
}
delete $self->{bgsend_queue};
}
}
sub bgsend_and_start_lookup {
my $self = shift;
my($domain, $type, $class, $ent, $cb, %options) = @_;
return if $self->{main}->{resolver}->{no_resolver};
# Waiting for priority -100 to launch?
if ($self->{wait_queue}) {
push @{$self->{bgsend_queue}}, [@_];
dbg("async: DNS priority not reached, queueing lookup: $domain/$type");
return $ent;
}
if (!defined $ent->{rulename} && !$self->{rulename_warned}++) {
my($package, $filename, $line) = caller;
warn "async: bgsend_and_start_lookup called without rulename, ".
"from $package ($filename) line $line. You are likely using ".
"a plugin that is not compatible with SpamAssasin 4.0.0.";
}
$domain =~ s/\.+\z//s; # strip trailing dots, these sometimes still sneak in
$domain = idn_to_ascii($domain);
# At this point the $domain should already be encoded to UTF-8 and
# IDN converted to ASCII-compatible encoding (ACE). Make sure this is
# really the case in order to be able to catch any leftover omissions.
if (utf8::is_utf8($domain)) {
utf8::encode($domain);
my($package, $filename, $line) = caller;
info("bgsend_and_start_lookup: Unicode domain name, expected octets: %s, ".
"called from %s line %d", $domain, $package, $line);
} elsif ($domain =~ tr/\x00-\x7F//c) { # is not all-ASCII
my($package, $filename, $line) = caller;
info("bgsend_and_start_lookup: non-ASCII domain name: %s, ".
"called from %s line %d", $domain, $package, $line);
}
my $dnskey = uc($type).'/'.lc($domain);
my $dns_query_info = $self->{all_lookups}{$dnskey};
$ent = {} if !$ent;
$ent->{id} = undef;
my $key = $ent->{key} = $dnskey;
$ent->{query_type} = $type;
$ent->{query_domain} = $domain;
$ent->{type} = $type if !exists $ent->{type};
$ent->{zone} = $domain if !exists $ent->{zone};
$cb = $ent->{completed_callback} if !$cb; # compatibility with SA < 3.4
my @rulenames = grep { defined } (ref $ent->{rulename} ?
@{$ent->{rulename}} : $ent->{rulename});
$self->{rules_for_key}->{$key}{$_} = 1 foreach (@rulenames);
if ($dns_query_info) { # DNS query already underway or completed
if ($dns_query_info->{blocked}) {
dbg("async: blocked by %s: %s, rules: %s", $dns_query_info->{blocked},
$dnskey, join(", ", @rulenames));
return;
}
my $id = $ent->{id} = $dns_query_info->{id}; # re-use existing query
return if !defined $id; # presumably some fatal failure
my $id_tail = $id; $id_tail =~ s{^\d+/IN/}{};
lc($id_tail) eq lc($dnskey)
or info("async: unmatched id %s, key=%s", $id, $dnskey);
my $pkt = $dns_query_info->{pkt};
if (!$pkt) { # DNS query underway, still waiting for results
# just add our query to the existing one
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
$self->{pending_rules}->{$_}{$key} = 1 foreach (@rulenames);
dbg("async: query %s already underway, adding no.%d, rules: %s",
$id, scalar @{$dns_query_info->{applicants}},
join(", ", @rulenames));
} else { # DNS query already completed, re-use results
# answer already known, just do the callback and be done with it
delete $self->{pending_rules}->{$_}{$key} foreach (@rulenames);
if (!$cb) {
dbg("async: query %s already done, re-using for %s, rules: %s",
$id, $key, join(", ", @rulenames));
} else {
dbg("async: query %s already done, re-using for %s, callback, rules: %s",
$id, $key, join(", ", @rulenames));
eval {
$cb->($ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (1) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("async: query %s completed, callback %s failed: %s\n",
$id, $key, $@);
};
}
}
}
else { # no existing query, open a new DNS query
$dns_query_info = $self->{all_lookups}{$dnskey} = {}; # new query needed
my($id, $blocked, $check_dbrdom);
# dns_query_restriction
my $blocked_by = 'dns_query_restriction';
my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
# dns_block_rule
my $dns_block_domains = $self->{main}->{conf}->{dns_block_rule_domains};
if ($dns_query_blockages || $dns_block_domains) {
my $search_list = domain_to_search_list($domain);
foreach my $parent_domain ((@$search_list, '*')) {
if ($dns_query_blockages) {
$blocked = $dns_query_blockages->{$parent_domain};
last if defined $blocked; # stop at first defined, can be true or false
}
if ($parent_domain ne '*' && exists $dns_block_domains->{$parent_domain}) {
# save for later check.. ps. untainted already
$check_dbrdom = $dns_block_domains->{$parent_domain};
}
}
}
if (!$blocked && $check_dbrdom) {
my $blockfile =
$self->{main}->sed_path("__global_state_dir__/dnsblock_${check_dbrdom}");
if (my $mtime = (stat($blockfile))[9]) {
if (time - $mtime <= $self->{main}->{conf}->{dns_block_time}) {
$blocked = 1;
$blocked_by = 'dns_block_rule';
} else {
dbg("async: dns_block_rule removing expired $blockfile");
unlink($blockfile);
}
}
}
if ($blocked) {
dbg("async: blocked by %s: %s, rules: %s", $blocked_by, $dnskey,
join(", ", @rulenames));
$dns_query_info->{blocked} = $blocked_by;
} else {
dbg("async: launching %s, rules: %s", $dnskey, join(", ", @rulenames));
$id = $self->{main}->{resolver}->bgsend($domain, $type, $class, sub {
my($pkt, $pkt_id, $timestamp) = @_;
# this callback sub is called from DnsResolver::poll_responses()
# dbg("async: in a bgsend_and_start_lookup callback, id %s", $pkt_id);
if ($pkt_id ne $id) {
warn "async: mismatched dns id: got $pkt_id, expected $id\n";
return;
}
$self->set_response_packet($pkt_id, $pkt, $ent->{key}, $timestamp);
$dns_query_info->{pkt} = $pkt;
my $cb_count = 0;
foreach my $tuple (@{$dns_query_info->{applicants}}) {
my($appl_ent, $appl_cb) = @$tuple;
my @rulenames = grep { defined } (ref $appl_ent->{rulename} ?
@{$appl_ent->{rulename}} : $appl_ent->{rulename});
foreach (@rulenames) {
delete $self->{pending_rules}->{$_}{$appl_ent->{key}};
}
if ($appl_cb) {
dbg("async: calling callback on key %s, rules: %s",
$key, join(", ", @rulenames));
$cb_count++;
eval {
$appl_cb->($appl_ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (2) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("async: query %s completed, callback %s failed: %s\n",
$id, $appl_ent->{key}, $@);
};
}
}
delete $dns_query_info->{applicants};
dbg("async: query $id completed, no callbacks run") if !$cb_count;
});
}
return if !defined $id;
$dns_query_info->{id} = $ent->{id} = $id;
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
$self->{pending_rules}->{$_}{$key} = 1 foreach (@rulenames);
$self->_start_lookup($ent, $options{master_deadline});
}
return $ent;
}
# ---------------------------------------------------------------------------
=item $ent = $async->start_lookup($ent, $master_deadline)
DIRECT USE DEPRECATED since 4.0.0, please use bgsend_and_start_lookup.
=cut
sub start_lookup {
my $self = shift;
if (!$self->{start_lookup_warned}++) {
my($package, $filename, $line) = caller;
warn "async: deprecated start_lookup called, ".
"from $package ($filename) line $line. You are likely using ".
"a plugin that is not compatible with SpamAssasin 4.0.0.";
}
return if $self->{main}->{resolver}->{no_resolver};
$self->_start_lookup(@_);
}
# Internal use not deprecated. :-)
sub _start_lookup {
my ($self, $ent, $master_deadline) = @_;
my $id = $ent->{id};
@ -229,16 +465,16 @@ sub start_lookup {
$ent->{timeout_initial} = $t_init;
$ent->{timeout_min} = $t_end;
my @rulenames = grep { defined } (ref $ent->{rulename} ?
@{$ent->{rulename}} : $ent->{rulename});
$ent->{display_id} = # identifies entry in debug logging and similar
join(", ", grep { defined }
map { ref $ent->{$_} ? @{$ent->{$_}} : $ent->{$_} }
qw(sets rules rulename type key) );
join(", ", grep { defined } map { $ent->{$_} } qw(type key));
$self->{pending_lookups}->{$key} = $ent;
$self->{queries_started}++;
$self->{total_queries_started}++;
dbg("async: starting: %s (timeout %.1fs, min %.1fs)%s",
dbg("async: starting: %s%s (timeout %.1fs, min %.1fs)%s",
@rulenames ? join(", ", @rulenames).", " : '',
$ent->{display_id}, $ent->{timeout_initial}, $ent->{timeout_min},
!$clipped_by_master_deadline ? '' : ', capped by time limit');
@ -247,135 +483,15 @@ sub start_lookup {
# ---------------------------------------------------------------------------
=item $ent = $async->bgsend_and_start_lookup($domain, $type, $class, $ent, $cb, %options)
A common idiom: calls C<bgsend>, followed by a call to C<start_lookup>,
returning the argument $ent object as modified by C<start_lookup> and
filled-in with a query ID.
=cut
sub bgsend_and_start_lookup {
my($self, $domain, $type, $class, $ent, $cb, %options) = @_;
$ent = {} if !$ent;
$domain =~ s/\.+\z//s; # strip trailing dots, these sometimes still sneak in
$ent->{id} = undef;
$ent->{query_type} = $type;
$ent->{query_domain} = $domain;
$ent->{type} = $type if !exists $ent->{type};
$cb = $ent->{completed_callback} if !$cb; # compatibility with SA < 3.4
my $key = $ent->{key} || '';
my $dnskey = uc($type) . '/' . lc($domain);
my $dns_query_info = $self->{all_lookups}{$dnskey};
if ($dns_query_info) { # DNS query already underway or completed
my $id = $ent->{id} = $dns_query_info->{id}; # re-use existing query
return if !defined $id; # presumably blocked, or other fatal failure
my $id_tail = $id; $id_tail =~ s{^\d+/IN/}{};
lc($id_tail) eq lc($dnskey)
or info("async: unmatched id %s, key=%s", $id, $dnskey);
my $pkt = $dns_query_info->{pkt};
if (!$pkt) { # DNS query underway, still waiting for results
# just add our query to the existing one
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
dbg("async: query %s already underway, adding no.%d %s",
$id, scalar @{$dns_query_info->{applicants}},
$ent->{rulename} || $key);
} else { # DNS query already completed, re-use results
# answer already known, just do the callback and be done with it
if (!$cb) {
dbg("async: query %s already done, re-using for %s", $id, $key);
} else {
dbg("async: query %s already done, re-using for %s, callback",
$id, $key);
eval {
$cb->($ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (1) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s completed, callback %s failed: %s\n",
$id, $key, $@);
};
}
}
}
else { # no existing query, open a new DNS query
$dns_query_info = $self->{all_lookups}{$dnskey} = {}; # new query needed
my($id, $blocked);
my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
if ($dns_query_blockages) {
my $search_list = domain_to_search_list($domain);
foreach my $parent_domain (@$search_list) {
$blocked = $dns_query_blockages->{$parent_domain};
last if defined $blocked; # stop at first defined, can be true or false
}
}
if ($blocked) {
dbg("async: blocked by dns_query_restriction: %s", $dnskey);
} else {
dbg("async: launching %s for %s", $dnskey, $key);
$id = $self->{main}->{resolver}->bgsend($domain, $type, $class, sub {
my($pkt, $pkt_id, $timestamp) = @_;
# this callback sub is called from DnsResolver::poll_responses()
# dbg("async: in a bgsend_and_start_lookup callback, id %s", $pkt_id);
if ($pkt_id ne $id) {
warn "async: mismatched dns id: got $pkt_id, expected $id\n";
return;
}
$self->set_response_packet($pkt_id, $pkt, $ent->{key}, $timestamp);
$dns_query_info->{pkt} = $pkt;
my $cb_count = 0;
foreach my $tuple (@{$dns_query_info->{applicants}}) {
my($appl_ent, $appl_cb) = @$tuple;
if ($appl_cb) {
dbg("async: calling callback on key %s%s", $key,
!defined $appl_ent->{rulename} ? ''
: ", rule ".$appl_ent->{rulename});
$cb_count++;
eval {
$appl_cb->($appl_ent, $pkt); 1;
} or do {
chomp $@;
# resignal if alarm went off
die "async: (2) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s completed, callback %s failed: %s\n",
$id, $appl_ent->{key}, $@);
};
}
}
delete $dns_query_info->{applicants};
dbg("async: query $id completed, no callbacks run") if !$cb_count;
});
}
return if !defined $id;
$dns_query_info->{id} = $ent->{id} = $id;
push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
$self->start_lookup($ent, $options{master_deadline});
}
return $ent;
}
# ---------------------------------------------------------------------------
=item $ent = $async->get_lookup($key)
Retrieve the pending-lookup object for the given key C<$key>.
If the lookup is complete, this will return C<undef>.
Note that a lookup is still considered "pending" until C<complete_lookups()> is
called, even if it has been reported as complete via C<set_response_packet()>.
DEPRECATED since 4.0.0. Do not use.
=cut
sub get_lookup {
my ($self, $key) = @_;
warn("async: deprecated get_lookup function used\n");
return $self->{pending_lookups}->{$key};
}
@ -415,18 +531,16 @@ sub complete_lookups {
my %typecount;
my $pending = $self->{pending_lookups};
$self->{queries_started} = 0;
$self->{queries_completed} = 0;
my $now = time;
if (defined $timeout && $timeout > 0 &&
%$pending && $self->{total_queries_started} > 0)
%$pending && $self->{queries_started} > 0)
{
# shrink a 'select' timeout if a caller specified unnecessarily long
# value beyond the latest deadline of any outstanding request;
# can save needless wait time (up to 1 second in harvest_dnsbl_queries)
my $r = $self->{total_queries_completed} / $self->{total_queries_started};
my $r = $self->{queries_completed} / $self->{queries_started};
my $r2 = $r * $r; # 0..1
my $max_deadline;
while (my($key,$ent) = each %$pending) {
@ -457,9 +571,9 @@ sub complete_lookups {
if (%$pending) { # any outstanding requests still?
$self->{last_poll_responses_time} = $now;
my $nfound = $self->{main}->{resolver}->poll_responses($timeout);
dbg("async: select found %s responses ready (t.o.=%.1f)",
!$nfound ? 'no' : $nfound, $timeout);
my ($nfound, $ncb) = $self->{main}->{resolver}->poll_responses($timeout);
dbg("async: select found %d responses ready (t.o.=%.1f), did %d callbacks",
$nfound, $timeout, $ncb);
}
$now = time; # capture new timestamp, after possible sleep in 'select'
@ -475,18 +589,19 @@ sub complete_lookups {
$anydone = 1;
$ent->{finish_time} = $now if !defined $ent->{finish_time};
my $elapsed = $ent->{finish_time} - $ent->{start_time};
dbg("async: completed in %.3f s: %s", $elapsed, $ent->{display_id});
$self->{timing_by_query}->{". $key"} += $elapsed;
my @rulenames = keys %{$self->{rules_for_key}->{$key}};
dbg("async: completed in %.3f s: %s, rules: %s",
$elapsed, $ent->{display_id}, join(", ", @rulenames));
$self->{timing_by_query}->{". $key ($ent->{type})"} += $elapsed;
$self->{queries_completed}++;
$self->{total_queries_completed}++;
delete $pending->{$key};
}
}
if (%$pending) { # still any requests outstanding? are they expired?
my $r =
!$allow_aborting_of_expired || !$self->{total_queries_started} ? 1.0
: $self->{total_queries_completed} / $self->{total_queries_started};
!$allow_aborting_of_expired || !$self->{queries_started} ? 1.0
: $self->{queries_completed} / $self->{queries_started};
my $r2 = $r * $r; # 0..1
while (my($key,$ent) = each %$pending) {
$typecount{$ent->{type}}++;
@ -496,8 +611,6 @@ sub complete_lookups {
$dt = 1 + int $dt if $timer_resolution == 1 && $dt > int $dt;
$allexpired = 0 if $now <= $ent->{start_time} + $dt;
}
dbg("async: queries completed: %d, started: %d",
$self->{queries_completed}, $self->{queries_started});
}
# ensure we don't get stuck if a request gets lost in the ether.
@ -511,9 +624,9 @@ sub complete_lookups {
$alldone = 1;
}
else {
dbg("async: queries active: %s%s at %s",
dbg("async: queries still pending: %s%s",
join (' ', map { "$_=$typecount{$_}" } sort keys %typecount),
$allexpired ? ', all expired' : '', scalar(localtime(time)));
$allexpired ? ', all expired' : '');
$alldone = 0;
}
1;
@ -544,15 +657,20 @@ sub abort_remaining_lookups {
my $foundcnt = 0;
my $now = time;
$self->{pending_rules} = {};
while (my($key,$ent) = each %$pending) {
dbg("async: aborting after %.3f s, %s: %s",
$now - $ent->{start_time},
my $dur = $now - $ent->{start_time};
my @rulenames = keys %{$self->{rules_for_key}->{$key}};
my $msg = sprintf( "async: aborting after %.3f s, %s: %s, rules: %s",
$dur,
(defined $ent->{timeout_initial} &&
$now > $ent->{start_time} + $ent->{timeout_initial}
? 'past original deadline' : 'deadline shrunk'),
$ent->{display_id} );
$ent->{display_id}, join(", ", @rulenames) );
$dur > 1 ? info($msg) : dbg($msg);
$foundcnt++;
$self->{timing_by_query}->{"X $key"} = $now - $ent->{start_time};
$self->{timing_by_query}->{"X $key"} = $dur;
$ent->{finish_time} = $now if !defined $ent->{finish_time};
delete $pending->{$key};
}
@ -566,8 +684,10 @@ sub abort_remaining_lookups {
foreach my $tuple (@{$dns_query_info->{applicants}}) {
my($ent, $cb) = @$tuple;
if ($cb) {
dbg("async: calling callback/abort on key %s%s", $dnskey,
!defined $ent->{rulename} ? '' : ", rule ".$ent->{rulename});
my @rulenames = grep { defined } (ref $ent->{rulename} ?
@{$ent->{rulename}} : $ent->{rulename});
dbg("async: calling callback/abort on key %s, rules: %s", $dnskey,
join(", ", @rulenames));
$cb_count++;
eval {
$cb->($ent, undef); 1;
@ -575,7 +695,7 @@ sub abort_remaining_lookups {
chomp $@;
# resignal if alarm went off
die "async: (2) $@\n" if $@ =~ /__alarm__ignore__\(.*\)/s;
warn sprintf("query %s aborted, callback %s failed: %s\n",
warn sprintf("async: query %s aborted, callback %s failed: %s\n",
$dnskey, $ent->{key}, $@);
};
}
@ -594,6 +714,8 @@ sub abort_remaining_lookups {
=item $async->set_response_packet($id, $pkt, $key, $timestamp)
For internal use, do not call from plugins.
Register a "response packet" for a given query. C<$id> is the ID for the
query, and must match the C<id> supplied in C<start_lookup()>. C<$pkt> is the
packet object for the response. A parameter C<$key> identifies an entry in a
@ -645,6 +767,8 @@ sub set_response_packet {
=item $async->report_id_complete($id,$key,$key,$timestamp)
DEPRECATED since 4.0.0. Do not use.
Legacy. Equivalent to $self->set_response_packet($id,undef,$key,$timestamp),
i.e. providing undef as a response packet. Register that a query has
completed and is no longer "pending". C<$id> is the ID for the query,

View File

@ -17,7 +17,7 @@
=head1 NAME
Mail::SpamAssassin::AutoWhitelist - auto-whitelist handler for SpamAssassin
Mail::SpamAssassin::AutoWelcomelist - auto-welcomelist handler for SpamAssassin
=head1 SYNOPSIS
@ -27,10 +27,10 @@ Mail::SpamAssassin::AutoWhitelist - auto-whitelist handler for SpamAssassin
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
several internet-based realtime blocklists.
This class is used internally by SpamAssassin to manage the automatic
whitelisting functionality. Please refer to the C<Mail::SpamAssassin>
welcomelisting functionality. Please refer to the C<Mail::SpamAssassin>
documentation for public interfaces.
=head1 METHODS
@ -39,7 +39,7 @@ documentation for public interfaces.
=cut
package Mail::SpamAssassin::AutoWhitelist;
package Mail::SpamAssassin::AutoWelcomelist;
use strict;
use warnings;
@ -64,9 +64,9 @@ sub new {
my $conf = $main->{conf};
my $self = {
main => $main,
factor => $conf->{auto_whitelist_factor},
ipv4_mask_len => $conf->{auto_whitelist_ipv4_mask_len},
ipv6_mask_len => $conf->{auto_whitelist_ipv6_mask_len},
factor => $conf->{auto_welcomelist_factor},
ipv4_mask_len => $conf->{auto_welcomelist_ipv4_mask_len},
ipv6_mask_len => $conf->{auto_welcomelist_ipv6_mask_len},
};
my $factory;
@ -74,23 +74,22 @@ sub new {
$factory = $main->{pers_addr_list_factory};
}
else {
my $type = $conf->{auto_whitelist_factory};
my $type = $conf->{auto_welcomelist_factory};
if ($type =~ /^([_A-Za-z0-9:]+)$/) {
$type = untaint_var($type);
eval '
require '.$type.';
$factory = '.$type.'->new();
1;
'
or do {
' or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn "auto-whitelist: $eval_stat\n";
warn "auto-welcomelist: $eval_stat\n";
undef $factory;
};
$main->set_persistent_address_list_factory($factory) if $factory;
}
else {
warn "auto-whitelist: illegal auto_whitelist_factory setting\n";
warn "auto-welcomelist: illegal auto_welcomelist_factory setting\n";
}
}
@ -136,7 +135,7 @@ sub check_address {
my $noipent = $self->{checker}->get_addr_entry ($noipaddr, undef);
if (defined $noipent->{msgcount} && $noipent->{msgcount} > 0) {
dbg("auto-whitelist: found entry w/o IP address for $addr: replacing with $origip");
dbg("auto-welcomelist: found entry w/o IP address for $addr: replacing with $origip");
$self->{checker}->remove_entry($noipent);
# Now assign proper entry the count and totscore values of the
# no-IP entry instead of assigning the whole value to avoid
@ -151,7 +150,7 @@ sub check_address {
$entry->{msgcount} != $entry->{msgcount} || # test for NaN
$entry->{totscore} != $entry->{totscore})
{
warn "auto-whitelist: resetting bad data for ($addr, $origip), ".
warn "auto-welcomelist: resetting bad data for ($addr, $origip), ".
"count: $entry->{msgcount}, totscore: $entry->{totscore}\n";
$entry->{msgcount} = $entry->{totscore} = 0;
}
@ -164,7 +163,7 @@ sub check_address {
=item awl->count();
This method will return the count of messages used in determining the
whitelist correction.
welcomelist correction.
=cut
@ -191,7 +190,7 @@ sub add_score {
return; # no factory defined; we can't check
}
if ($score != $score) {
warn "auto-whitelist: attempt to add a $score to AWL entry ignored\n";
warn "auto-welcomelist: attempt to add a $score to AWL entry ignored\n";
return; # don't try to add a NaN
}
@ -204,7 +203,7 @@ sub add_score {
=item awl->add_known_good_address($addr);
This method will add a score of -100 to the given address -- effectively
"bootstrapping" the address as being one that should be whitelisted.
"bootstrapping" the address as being one that should be welcomelisted.
=cut
@ -220,7 +219,7 @@ sub add_known_good_address {
=item awl->add_known_bad_address($addr);
This method will add a score of 100 to the given address -- effectively
"bootstrapping" the address as being one that should be blacklisted.
"bootstrapping" the address as being one that should be blocklisted.
=cut
@ -295,13 +294,13 @@ sub ip_to_awl_key {
} else {
my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
if (!defined $origip_obj) { # invalid IPv4 address
dbg("auto-whitelist: bad IPv4 address $origip");
dbg("auto-welcomelist: bad IPv4 address $origip");
} else {
$result = $origip_obj->network->addr;
$result =~s/(\.0){1,3}\z//; # truncate zero tail
}
}
} elsif ($origip =~ /:/ && # triage
} elsif (index($origip, ':') >= 0 && # triage
$origip =~
/^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
# looks like an IPv6 address
@ -309,19 +308,19 @@ sub ip_to_awl_key {
$mask_len = 48 if !defined $mask_len;
my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
if (!defined $origip_obj) { # invalid IPv6 address
dbg("auto-whitelist: bad IPv6 address $origip");
dbg("auto-welcomelist: bad IPv6 address $origip");
} elsif (NetAddr::IP->can('full6')) { # since NetAddr::IP 4.010
$result = $origip_obj->network->full6; # string in a canonical form
$result =~ s/(:0000){1,7}\z/::/; # compress zero tail
}
} else {
dbg("auto-whitelist: bad IP address $origip");
dbg("auto-welcomelist: bad IP address $origip");
}
if (defined $result && length($result) > 39) { # just in case, keep under
$result = substr($result,0,39); # the awl.ip field size
}
if (defined $result) {
dbg("auto-whitelist: IP masking %s -> %s", $origip,$result);
dbg("auto-welcomelist: IP masking %s -> %s", $origip,$result);
}
return $result;
}

View File

@ -111,7 +111,7 @@ sub learn {
{
# DMK, koppel@ece.lsu.edu: Hoping that the ultimate fix to bug 2263 will
# make it unnecessary to construct a PerMsgStatus here.
my $PMS = new Mail::SpamAssassin::PerMsgStatus $self->{main}, $msg;
my $PMS = Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $msg);
my $ignore = $self->ignore_message($PMS);
$PMS->finish();
return 0 if $ignore;

View File

@ -35,14 +35,11 @@ use Errno qw(EBADF);
#use Data::Dumper;
use File::Basename;
use File::Path;
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
use Digest::SHA qw(sha1);
use Mail::SpamAssassin::BayesStore;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(compile_regexp);
our @ISA = qw( Mail::SpamAssassin::BayesStore );
@ -77,6 +74,7 @@ sub new {
sub DESTROY {
my $self = shift;
$self->_close_db;
}
@ -650,6 +648,14 @@ sub dump_tokens {
my($self, $template, $regex, @vars) = @_;
dbg("bayes: dump_tokens starting");
if (defined $regex) {
my ($rec, $err) = compile_regexp($regex, 2);
if (!$rec) {
die "Invalid dump_tokens regex '$regex': $err\n";
}
$regex = $rec;
}
my $cursor = $self->{handles}->{tokens}->db_cursor;
$cursor or die "Couldn't get cursor: $BerkeleyDB::Error";
my ($token, $value) = ("", "");

View File

@ -27,14 +27,10 @@ use Errno qw(EBADF);
use File::Basename;
use File::Spec;
use File::Path;
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
use Digest::SHA qw(sha1);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows compile_regexp);
use Mail::SpamAssassin::BayesStore;
use Mail::SpamAssassin::Logger;
@ -992,9 +988,17 @@ sub get_storage_variables {
sub dump_db_toks {
my ($self, $template, $regex, @vars) = @_;
if (defined $regex) {
my ($rec, $err) = compile_regexp($regex, 2);
if (!$rec) {
die "Invalid dump_tokens regex '$regex': $err\n";
}
$regex = $rec;
}
while (my ($tok, $tokvalue) = each %{$self->{db_toks}}) {
next if ($tok =~ MAGIC_RE); # skip magic tokens
next if (defined $regex && ($tok !~ /$regex/o));
next if (defined $regex && $tok !~ /$regex/o);
# We have the value already, so just unpack it.
my ($ts, $th, $atime) = $self->tok_unpack ($tokvalue);

View File

@ -30,6 +30,9 @@ In addition, this module will support rollback on error, if you are
using the InnoDB database table type in MySQL. For more information
please review the instructions in sql/README.bayes.
This module is also compatible with MariaDB and DBD::MariaDB can be used
instead of DBD::mysql driver.
=cut
package Mail::SpamAssassin::BayesStore::MySQL;
@ -292,7 +295,8 @@ sub set_running_expire_tok {
return 0 unless (defined($self->{_dbh}));
my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)";
my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)
ON DUPLICATE KEY UPDATE runtime=VALUES(runtime)";
my $time = time();
@ -339,6 +343,147 @@ sub remove_running_expire_tok {
return 1;
}
=head2 tok_get
public instance (Integer, Integer, Integer) tok_get (String $token)
Description:
This method retrieves a specified token (C<$token>) from the database
and returns it's spam_count, ham_count and last access time.
=cut
sub tok_get {
my ($self, $token) = @_;
return (0,0,0) unless (defined($self->{_dbh}));
my $sql = "SELECT spam_count, ham_count, atime
FROM bayes_token
WHERE id = ?
AND token = ?";
my $sth = $self->{_dbh}->prepare_cached($sql);
unless (defined($sth)) {
dbg("bayes: tok_get: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return (0,0,0);
}
$sth->bind_param(1, $self->{_userid});
$sth->bind_param(2, $token, DBI::SQL_BINARY);
my $rc = $sth->execute();
unless ($rc) {
dbg("bayes: tok_get: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return (0,0,0);
}
my ($spam_count, $ham_count, $atime) = $sth->fetchrow_array();
$sth->finish();
$spam_count = 0 if (!$spam_count || $spam_count < 0);
$ham_count = 0 if (!$ham_count || $ham_count < 0);
$atime = 0 if (!$atime);
return ($spam_count, $ham_count, $atime)
}
=head2 tok_get_all
public instance (\@) tok_get (@ $tokens)
Description:
This method retrieves the specified tokens (C<$tokens>) from storage and returns
an array ref of arrays spam count, ham count and last access time.
=cut
sub tok_get_all {
my ($self, @tokens) = @_;
return [] unless (defined($self->{_dbh}));
my $token_list_size = scalar(@tokens);
dbg("bayes: tok_get_all: token count: $token_list_size");
my @tok_results;
my $search_index = 0;
my $results_index = 0;
my $bunch_end;
my $token_select = $self->_token_select_string();
my $multi_sql = "SELECT $token_select, spam_count, ham_count, atime
FROM bayes_token
WHERE id = ?
AND token IN ";
# fetch tokens in bunches of 100 until there are <= 100 left, then just fetch the rest
while ($token_list_size > $search_index) {
my $bunch_size;
if ($token_list_size - $search_index > 100) {
$bunch_size = 100;
}
else {
$bunch_size = $token_list_size - $search_index;
}
while ($token_list_size - $search_index >= $bunch_size) {
my @tok;
my $in_str = '(';
$bunch_end = $search_index + $bunch_size;
for ( ; $search_index < $bunch_end; $search_index++) {
$in_str .= '?,';
push(@tok, $tokens[$search_index]);
}
chop $in_str;
$in_str .= ')';
my $dynamic_sql = $multi_sql . $in_str;
my $sth = $self->{_dbh}->prepare($dynamic_sql);
unless (defined($sth)) {
dbg("bayes: tok_get_all: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return [];
}
my $idx = 0;
$sth->bind_param(++$idx, $self->{_userid});
$sth->bind_param(++$idx, $_, DBI::SQL_BINARY) foreach (@tok);
my $rc = $sth->execute();
unless ($rc) {
dbg("bayes: tok_get_all: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return [];
}
my $results = $sth->fetchall_arrayref();
$sth->finish();
foreach my $result (@{$results}) {
# Make sure that spam_count and ham_count are not negative
$result->[1] = 0 if (!$result->[1] || $result->[1] < 0);
$result->[2] = 0 if (!$result->[2] || $result->[2] < 0);
# Make sure that atime has a value
$result->[3] = 0 if (!$result->[3]);
$tok_results[$results_index++] = $result;
}
}
}
return \@tok_results;
}
=head2 nspam_nham_change
public instance (Boolean) nspam_nham_change (Integer $num_spam,
@ -421,10 +566,22 @@ sub tok_touch {
AND token = ?
AND atime < ?";
my $rows = $self->{_dbh}->do($sql, undef, $atime, $self->{_userid},
$token, $atime);
my $sth = $self->{_dbh}->prepare_cached($sql);
unless (defined($rows)) {
unless (defined($sth)) {
dbg("bayes: tok_touch: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return 0;
}
$sth->bind_param(1, $atime);
$sth->bind_param(2, $self->{_userid});
$sth->bind_param(3, $token, DBI::SQL_BINARY);
$sth->bind_param(4, $atime);
my $rows = $sth->execute();
unless ($rows) {
dbg("bayes: tok_touch: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return 0;
@ -478,20 +635,29 @@ sub tok_touch_all {
return 1 unless (scalar(@{$tokens}));
my $sql = "UPDATE bayes_token SET atime = ? WHERE id = ? AND token IN (";
my @bindings = ($atime, $self->{_userid});
foreach my $token (@{$tokens}) {
foreach (@{$tokens}) {
$sql .= "?,";
push(@bindings, $token);
}
chop($sql); # get rid of trailing ,
$sql .= ") AND atime < ?";
push(@bindings, $atime);
my $rows = $self->{_dbh}->do($sql, undef, @bindings);
my $sth = $self->{_dbh}->prepare($sql);
unless (defined($rows)) {
unless (defined($sth)) {
dbg("bayes: tok_touch_all: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return [];
}
my $idx = 0;
$sth->bind_param(++$idx, $atime);
$sth->bind_param(++$idx, $self->{_userid});
$sth->bind_param(++$idx, $_, DBI::SQL_BINARY) foreach (@{$tokens});
$sth->bind_param(++$idx, $atime);
my $rows = $sth->execute();
unless ($rows) {
dbg("bayes: tok_touch_all: SQL error: ".$self->{_dbh}->errstr());
$self->{_dbh}->rollback();
return 0;
@ -735,7 +901,8 @@ sub _initialize_db {
return 0;
}
$id = $self->{_dbh}->{'mysql_insertid'};
$id = $self->{_dsn} =~ /^DBI:MariaDB/i ?
$self->{_dbh}->{'mariadb_insertid'} : $self->{_dbh}->{'mysql_insertid'};
$self->{_dbh}->commit();
@ -797,10 +964,12 @@ sub _put_token {
return 0;
}
my $rc = $sth->execute($spam_count,
$ham_count,
$self->{_userid},
$token);
$sth->bind_param(1, $spam_count);
$sth->bind_param(2, $ham_count);
$sth->bind_param(3, $self->{_userid});
$sth->bind_param(4, $token, DBI::SQL_BINARY);
my $rc = $sth->execute();
unless ($rc) {
dbg("bayes: _put_token: SQL error: ".$self->{_dbh}->errstr());
@ -824,14 +993,16 @@ sub _put_token {
return 0;
}
my $rc = $sth->execute($self->{_userid},
$token,
$spam_count,
$ham_count,
$atime,
$spam_count,
$ham_count,
$atime);
$sth->bind_param(1, $self->{_userid});
$sth->bind_param(2, $token, DBI::SQL_BINARY);
$sth->bind_param(3, $spam_count);
$sth->bind_param(4, $ham_count);
$sth->bind_param(5, $atime);
$sth->bind_param(6, $spam_count);
$sth->bind_param(7, $ham_count);
$sth->bind_param(8, $atime);
my $rc = $sth->execute();
unless ($rc) {
dbg("bayes: _put_token: SQL error: ".$self->{_dbh}->errstr());
@ -948,12 +1119,15 @@ sub _put_tokens {
return 0;
}
$sth->bind_param(1, $spam_count);
$sth->bind_param(2, $ham_count);
$sth->bind_param(3, $self->{_userid});
# 4, update token in foreach loop
my $error_p = 0;
foreach my $token (keys %{$tokens}) {
my $rc = $sth->execute($spam_count,
$ham_count,
$self->{_userid},
$token);
$sth->bind_param(4, $token, DBI::SQL_BINARY);
my $rc = $sth->execute();
unless ($rc) {
dbg("bayes: _put_tokens: SQL error: ".$self->{_dbh}->errstr());
@ -984,18 +1158,21 @@ sub _put_tokens {
return 0;
}
$sth->bind_param(1, $self->{_userid});
# 2, update token in foreach loop
$sth->bind_param(3, $spam_count);
$sth->bind_param(4, $ham_count);
$sth->bind_param(5, $atime);
$sth->bind_param(6, $spam_count);
$sth->bind_param(7, $ham_count);
$sth->bind_param(8, $atime);
my $error_p = 0;
my $new_tokens = 0;
my $need_atime_update_p = 0;
foreach my $token (keys %{$tokens}) {
my $rc = $sth->execute($self->{_userid},
$token,
$spam_count,
$ham_count,
$atime,
$spam_count,
$ham_count,
$atime);
$sth->bind_param(2, $token, DBI::SQL_BINARY);
my $rc = $sth->execute();
if (!$rc) {
dbg("bayes: _put_tokens: SQL error: ".$self->{_dbh}->errstr());
@ -1070,6 +1247,22 @@ sub _put_tokens {
return 1;
}
=head2 _token_select_string
private instance (String) _token_select_string
Description:
This method returns the string to be used in SELECT statements to represent
the token column.
The default is to use the RPAD function to pad the token out to 5 characters.
=cut
sub _token_select_string {
return "RPAD(token, 5, ' ')";
}
sub sa_die { Mail::SpamAssassin::sa_die(@_); }
1;

View File

@ -51,7 +51,7 @@ use constant HAS_DBI => eval { require DBI; };
# We need this so we can import the pg_types, since this is a DBD::Pg specific module it should be ok
# YUCK! This little require/import trick is required for the rpm stuff
BEGIN { require DBD::Pg; import DBD::Pg qw(:pg_types); }
BEGIN { require DBD::Pg; DBD::Pg->import(qw(:pg_types)); }
=head1 METHODS

View File

@ -119,16 +119,12 @@ use warnings;
# use bytes;
use re 'taint';
use Errno qw(EBADF);
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Timeout;
use Digest::SHA qw(sha1);
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::BayesStore;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Timeout;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Util::TinyRedis;
our $VERSION = 0.09;

View File

@ -21,7 +21,11 @@ Mail::SpamAssassin::BayesStore::SQL - SQL Bayesian Storage Module Implementation
=head1 DESCRIPTION
This module implements a SQL based bayesian storage module.
This module implements a SQL based bayesian storage module. It's compatible
with SQLite and possibly other standard SQL servers.
Do not use this for MySQL/MariaDB or PgSQL, they have their own specific
modules.
=cut
@ -32,11 +36,7 @@ use warnings;
# use bytes;
use re 'taint';
use Errno qw(EBADF);
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
use Digest::SHA qw(sha1);
use Mail::SpamAssassin::BayesStore;
use Mail::SpamAssassin::Logger;
@ -1676,6 +1676,13 @@ sub _connect_db {
dbg("bayes: database connection established");
}
# SQLite PRAGMA attributes - here for tests, see bug 8033
if ($self->{_dsn} =~ /^dbi:SQLite:.*?;(.+)/i) {
foreach my $attr (split(/;/, $1)) {
$dbh->do("PRAGMA $attr");
}
}
$self->{_dbh} = $dbh;
return 1;
@ -2348,12 +2355,13 @@ Description:
This method returns the string to be used in SELECT statements to represent
the token column.
The default is to use the RPAD function to pad the token out to 5 characters.
The default is to use the SUBSTR function to pad the token out to 5 characters.
=cut
sub _token_select_string {
return "RPAD(token, 5, ' ')";
# Use SQLite compatible RPAD alternative
return "SUBSTR(token || ' ', 1, 5)";
}
sub sa_die { Mail::SpamAssassin::sa_die(@_); }

View File

@ -106,6 +106,10 @@ sub new {
$self->{username} = $args->{username};
}
if ($args->{max_size}) {
$self->{max_size} = $args->{max_size};
}
if ($args->{timeout}) {
$self->{timeout} = $args->{timeout} || 30;
}
@ -135,6 +139,8 @@ threshold
message
report
=cut
sub process {
@ -150,6 +156,41 @@ sub process {
return $self->_filter($msg, $command);
}
=head2 spam_report
public instance (\%) spam_report (String $msg)
Description:
The method implements the report call.
See the process method for the return value.
=cut
sub spam_report {
my ($self, $msg) = @_;
return $self->_filter($msg, 'REPORT');
}
=head2 spam_report_ifspam
public instance (\%) spam_report_ifspam (String $msg)
Description:
The method implements the report_ifspam call.
A report will be returned only if the message is spam.
See the process method for the return value.
=cut
sub spam_report_ifspam {
my ($self, $msg) = @_;
return $self->_filter($msg, 'REPORT_IFSPAM');
}
=head2 check
public instance (\%) check (String $msg)
@ -270,10 +311,10 @@ sub learn {
close $remote or die "error closing socket: $!";
if ($learntype == 0 || $learntype == 1) {
return $did_set =~ /local/;
return index($did_set, 'local') >= 0;
}
else { #safe since we've already checked the $learntype values
return $did_remove =~ /local/;
return index($did_remove, 'local') >= 0;
}
}
@ -534,12 +575,15 @@ threshold
message (if available)
report (if available)
=cut
sub _filter {
my ($self, $msg, $command) = @_;
my %data;
my $msgsize;
$self->_clear_errors();
@ -547,7 +591,10 @@ sub _filter {
return 0 unless ($remote);
my $msgsize = length($msg.$EOL);
if(defined $self->{max_size}) {
$msg = substr($msg,0,$self->{max_size});
}
$msgsize = length($msg.$EOL);
print $remote "$command $PROTOVERSION$EOL";
print $remote "Content-length: $msgsize$EOL";
@ -595,7 +642,11 @@ sub _filter {
$!==EBADF ? dbg("error reading from spamd (10): $!")
: die "error reading from spamd (10): $!";
if($command =~ /^REPORT/) {
$data{report} = $return_msg if ($return_msg);
} else {
$data{message} = $return_msg if ($return_msg);
}
close $remote or die "error closing socket: $!";

File diff suppressed because it is too large Load Diff

View File

@ -27,7 +27,7 @@ Mail::SpamAssassin::Conf::LDAP - load SpamAssassin scores from LDAP database
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
several internet-based realtime blocklists.
This class is used internally by SpamAssassin to load scores from an LDAP
database. Please refer to the C<Mail::SpamAssassin> documentation for public
@ -191,6 +191,9 @@ sub load_with_ldap {
}
if ($config_text ne '') {
$conf->{main} = $main;
$config_text = "file start (ldap config)\n".
$config_text.
"file end (ldap config)\n";
$conf->parse_scores_only($config_text);
delete $conf->{main};
}

File diff suppressed because it is too large Load Diff

View File

@ -27,7 +27,7 @@ Mail::SpamAssassin::Conf::SQL - load SpamAssassin scores from SQL database
=head1 DESCRIPTION
Mail::SpamAssassin is a module to identify spam using text analysis and
several internet-based realtime blacklists.
several internet-based realtime blocklists.
This class is used internally by SpamAssassin to load scores from an SQL
database. Please refer to the C<Mail::SpamAssassin> documentation for public
@ -166,6 +166,9 @@ sub load_with_dbi {
}
if ($config_text ne '') {
$conf->{main} = $main;
$config_text = "file start (sql config)\n".
$config_text.
"file end (sql config)\n";
$conf->parse_scores_only($config_text);
delete $conf->{main};
}

View File

@ -34,13 +34,14 @@ our(@BAYES_VARS, @IP_VARS, @SA_VARS, %EXPORT_TAGS, @EXPORT_OK);
BEGIN {
@IP_VARS = qw(
IP_IN_RESERVED_RANGE IP_PRIVATE LOCALHOST IPV4_ADDRESS IP_ADDRESS
IS_IP_PRIVATE IS_LOCALHOST IS_IPV4_ADDRESS IS_IP_ADDRESS
);
@BAYES_VARS = qw(
DUMP_MAGIC DUMP_TOKEN DUMP_BACKUP
);
# These are generic constants that may be used across several modules
@SA_VARS = qw(
HARVEST_DNSBL_PRIORITY MBX_SEPARATOR
MBX_SEPARATOR
MAX_BODY_LINE_LENGTH MAX_HEADER_KEY_LENGTH MAX_HEADER_VALUE_LENGTH
MAX_HEADER_LENGTH ARITH_EXPRESSION_LEXER AI_TIME_UNKNOWN
CHARSETS_LIKELY_TO_FP_AS_CAPS MAX_URI_LENGTH RULENAME_RE IS_RULENAME
@ -168,7 +169,10 @@ use constant IP_PRIVATE => qr{^(?:
)
(?![a-f0-9:])
)
)}oxi;
)}xi;
# exact match
use constant IS_IP_PRIVATE => qr/^${\(IP_PRIVATE)}$/;
# backward compatibility
use constant IP_IN_RESERVED_RANGE => IP_PRIVATE;
@ -246,7 +250,10 @@ use constant LOCALHOST => qr/
)
(?![a-f0-9:])
)
/oxi;
/xi;
# exact match
use constant IS_LOCALHOST => qr/^${\(LOCALHOST)}$/;
# ---------------------------------------------------------------------------
# an IP address, in IPv4 format only.
@ -256,11 +263,13 @@ use constant IPV4_ADDRESS => qr/\b
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
(?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)
\b/ox;
\b/x;
# exact match
use constant IS_IPV4_ADDRESS => qr/^${\(IPV4_ADDRESS)}$/;
# ---------------------------------------------------------------------------
# an IP address, in IPv4, IPv4-mapped-in-IPv6, or IPv6 format. NOTE: cannot
# just refer to $IPV4_ADDRESS, due to perl bug reported in nesting qr//s. :(
# an IP address, in IPv4, IPv4-mapped-in-IPv6, or IPv6 format.
#
use constant IP_ADDRESS => qr/
(?:
@ -344,12 +353,13 @@ use constant IP_ADDRESS => qr/
)
(?![a-f0-9:])
)
/oxi;
/xi;
# exact match
use constant IS_IP_ADDRESS => qr/^${\(IP_ADDRESS)}$/;
# ---------------------------------------------------------------------------
use constant HARVEST_DNSBL_PRIORITY => 500;
# regular expression that matches message separators in The University of
# Washington's MBX mailbox format
use constant MBX_SEPARATOR => qr/^([\s\d]\d-[a-zA-Z]{3}-\d{4}\s\d{2}:\d{2}:\d{2}.*),(\d+);([\da-f]{12})-(\w{8})\r?$/;
@ -388,7 +398,7 @@ use constant ARITH_EXPRESSION_LEXER => qr/(?:
!=| # NEQ
[\+\-\*\/]| # Mathematical Operator
[\?:] # ? : Operator
)/ox;
)/x;
# ArchiveIterator

View File

@ -53,23 +53,23 @@ sub new_checker {
'locked_file' => ''
};
my @order = split(/\s+/, $main->{conf}->{auto_whitelist_db_modules});
my @order = split(/\s+/, $main->{conf}->{auto_welcomelist_db_modules});
untaint_var(\@order);
my $dbm_module = Mail::SpamAssassin::Util::first_available_module (@order);
if (!$dbm_module) {
die "auto-whitelist: cannot find a usable DB package from auto_whitelist_db_modules: " .
$main->{conf}->{auto_whitelist_db_modules}."\n";
die "auto-welcomelist: cannot find a usable DB package from auto_welcomelist_db_modules: " .
$main->{conf}->{auto_welcomelist_db_modules}."\n";
}
my $umask = umask ~ (oct($main->{conf}->{auto_whitelist_file_mode}));
my $umask = umask ~ (oct($main->{conf}->{auto_welcomelist_file_mode}));
# if undef then don't worry -- empty hash!
if (defined($main->{conf}->{auto_whitelist_path})) {
my $path = $main->sed_path($main->{conf}->{auto_whitelist_path});
if (defined($main->{conf}->{auto_welcomelist_path})) {
my $path = $main->sed_path($main->{conf}->{auto_welcomelist_path});
my ($mod1, $mod2);
if ($main->{locker}->safe_lock
($path, 30, $main->{conf}->{auto_whitelist_file_mode}))
($path, 30, $main->{conf}->{auto_welcomelist_file_mode}))
{
$self->{locked_file} = $path;
$self->{is_locked} = 1;
@ -80,20 +80,20 @@ sub new_checker {
($mod1, $mod2) = ('R/O', O_RDONLY);
}
dbg("auto-whitelist: tie-ing to DB file of type $dbm_module $mod1 in $path");
dbg("auto-welcomelist: tie-ing to DB file of type $dbm_module $mod1 in $path");
($self->{is_locked} && $dbm_module eq 'DB_File') and
Mail::SpamAssassin::Util::avoid_db_file_locking_bug($path);
if (! tie %{ $self->{accum} }, $dbm_module, $path, $mod2,
oct($main->{conf}->{auto_whitelist_file_mode}) & 0666)
oct($main->{conf}->{auto_welcomelist_file_mode}) & 0666)
{
my $err = $!; # might get overwritten later
if ($self->{is_locked}) {
$self->{main}->{locker}->safe_unlock($self->{locked_file});
$self->{is_locked} = 0;
}
die "auto-whitelist: cannot open auto_whitelist_path $path: $err\n";
die "auto-welcomelist: cannot open auto_welcomelist_path $path: $err\n";
}
}
umask $umask;
@ -106,10 +106,10 @@ sub new_checker {
sub finish {
my $self = shift;
dbg("auto-whitelist: DB addr list: untie-ing and unlocking");
dbg("auto-welcomelist: DB addr list: untie-ing and unlocking");
untie %{$self->{accum}};
if ($self->{is_locked}) {
dbg("auto-whitelist: DB addr list: file locked, breaking lock");
dbg("auto-welcomelist: DB addr list: file locked, breaking lock");
$self->{main}->{locker}->safe_unlock ($self->{locked_file});
$self->{is_locked} = 0;
}
@ -128,7 +128,7 @@ sub get_addr_entry {
$entry->{msgcount} = $self->{accum}->{$addr} || 0;
$entry->{totscore} = $self->{accum}->{$addr.'|totscore'} || 0;
dbg("auto-whitelist: db-based $addr scores ".$entry->{msgcount}.'/'.$entry->{totscore});
dbg("auto-welcomelist: db-based $addr scores ".$entry->{msgcount}.'/'.$entry->{totscore});
return $entry;
}
@ -143,7 +143,7 @@ sub add_score {
$entry->{msgcount}++;
$entry->{totscore} += $score;
dbg("auto-whitelist: add_score: new count: ".$entry->{msgcount}.", new totscore: ".$entry->{totscore});
dbg("auto-welcomelist: add_score: new count: ".$entry->{msgcount}.", new totscore: ".$entry->{totscore});
$self->{accum}->{$entry->{addr}} = $entry->{msgcount};
$self->{accum}->{$entry->{addr}.'|totscore'} = $entry->{totscore};

View File

@ -29,7 +29,7 @@ use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::AsyncLoop;
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows compile_regexp);
use File::Spec;
use IO::Socket;
@ -37,30 +37,24 @@ use POSIX ":sys_wait_h";
our $KNOWN_BAD_DIALUP_RANGES; # Nothing uses this var???
our $LAST_DNS_CHECK;
our $LAST_DNS_CHECK = 0;
# use very well-connected domains (fast DNS response, many DNS servers,
# geographical distribution is a plus, TTL of at least 3600s)
# these MUST contain both A/AAAA records so we can test dns_options v6
# Updated 8/2019 from https://ip6.nl/#!list?db=alexa500
#
our @EXISTING_DOMAINS = qw{
adelphia.net
akamai.com
apache.org
cingular.com
colorado.edu
comcast.net
doubleclick.com
ebay.com
gmx.net
bing.com
cloudflare.com
digitalpoint.com
facebook.com
google.com
intel.com
kernel.org
linux.org
mit.edu
motorola.com
msn.com
sourceforge.net
sun.com
w3.org
linkedin.com
netflix.com
php.net
wikipedia.org
yahoo.com
};
@ -84,10 +78,6 @@ BEGIN {
# local ($^W) = 0;
no warnings;
eval {
require Net::DNS;
require Net::DNS::Resolver;
};
eval {
require MIME::Base64;
};
@ -101,71 +91,52 @@ BEGIN {
sub do_rbl_lookup {
my ($self, $rule, $set, $type, $host, $subtest) = @_;
$host =~ s/\.\z//s; # strip a redundant trailing dot
my $key = "dns:$type:$host";
my $existing_ent = $self->{async}->get_lookup($key);
# only make a specific query once
if (!$existing_ent) {
my $ent = {
key => $key,
zone => $host, # serves to fetch other per-zone settings
type => "DNSBL-".$type,
sets => [ ], # filled in below
rules => [ ], # filled in below
# id is filled in after we send the query below
};
$existing_ent = $self->{async}->bgsend_and_start_lookup(
$host, $type, undef, $ent,
sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
master_deadline => $self->{master_deadline} );
}
if ($existing_ent) {
# always add set
push @{$existing_ent->{sets}}, $set;
# sometimes match or always match
if (defined $subtest) {
$self->{dnspost}->{$set}->{$subtest} = $rule;
} else {
push @{$existing_ent->{rules}}, $rule;
}
$self->{rule_to_rblkey}->{$rule} = $key;
}
}
# TODO: these are constant so they should only be added once at startup
sub register_rbl_subtest {
my ($self, $rule, $set, $subtest) = @_;
if ($subtest =~ /^sb:/) {
warn("dns: ignored $rule, SenderBase rules are deprecated\n");
info("dns: ignored $rule, SenderBase rules are deprecated");
return 0;
}
# Compile as regex if not pure ip/bitmask (same check in process_dnsbl_result)
if ($subtest !~ /^\d+(?:\.\d+\.\d+\.\d+)?$/) {
my ($rec, $err) = compile_regexp($subtest, 0);
if (!$rec) {
warn("dns: invalid rule $rule subtest regexp '$subtest': $err\n");
return 0;
}
$subtest = $rec;
}
}
$self->{dnspost}->{$set}->{$subtest} = $rule;
dbg("dns: launching rule %s, set %s, type %s, %s", $rule, $set, $type,
defined $subtest ? "subtest $subtest" : 'no subtest');
my $ent = {
rulename => $rule,
type => "DNSBL",
set => $set,
subtest => $subtest,
};
my $ret = $self->{async}->bgsend_and_start_lookup($host, $type, undef, $ent,
sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
master_deadline => $self->{master_deadline}
);
return 0 if defined $ret; # no query started
return; # return undef for async status
}
# Deprecated, was only used from DNSEval.pm?
sub do_dns_lookup {
my ($self, $rule, $type, $host) = @_;
$host =~ s/\.\z//s; # strip a redundant trailing dot
my $key = "dns:$type:$host";
my $ent = {
key => $key,
zone => $host, # serves to fetch other per-zone settings
type => "DNSBL-".$type,
rules => [ $rule ],
# id is filled in after we send the query below
rulename => $rule,
type => "DNSBL",
};
$ent = $self->{async}->bgsend_and_start_lookup(
$host, $type, undef, $ent,
$self->{async}->bgsend_and_start_lookup($host, $type, undef, $ent,
sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
master_deadline => $self->{master_deadline} );
$ent;
master_deadline => $self->{master_deadline}
);
}
###########################################################################
@ -180,38 +151,31 @@ sub dnsbl_hit {
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
$log = join('',$answer->txtdata);
$log = join('', $answer->txtdata);
utf8::encode($log) if utf8::is_utf8($log);
local $1;
$log =~ s{ (?<! [<(\[] ) (https? : // \S+)}{<$1>}xgi;
} else { # assuming $answer->type eq 'A'
local($1,$2,$3,$4,$5);
if ($question->string =~ m/^((?:[0-9a-fA-F]\.){32})(\S+\w)/) {
if ($question->string =~ /^((?:[0-9a-fA-F]\.){32})(\S+\w)/) {
$log = ' listed in ' . lc($2);
my $ipv6addr = join('', reverse split(/\./, lc $1));
$ipv6addr =~ s/\G(....)/$1:/g; chop $ipv6addr;
$ipv6addr =~ s/:0{1,3}/:/g;
$log = $ipv6addr . $log;
} elsif ($question->string =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\S+\w)/) {
} elsif ($question->string =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\S+\w)/) {
$log = "$4.$3.$2.$1 listed in " . lc($5);
} else {
$log = 'listed in ' . $question->string;
} elsif ($question->string =~ /^(\S+)(?<!\.)/) {
$log = "listed in ".lc($1);
}
}
# TODO: this may result in some log messages appearing under the
# wrong rules, since we could see this sequence: { test one hits,
# test one's message is logged, test two hits, test one fires again
# on another IP, test one's message is logged for that other IP --
# but under test two's heading }. Right now though it's better
# than just not logging at all.
$self->{already_logged} ||= { };
if ($log && !$self->{already_logged}->{$log}) {
$self->test_log($log);
$self->{already_logged}->{$log} = 1;
if ($log) {
$self->test_log($log, $rule);
}
if (!$self->{tests_already_hit}->{$rule}) {
dbg("dns: rbl rule $rule hit");
$self->got_hit($rule, "RBL: ", ruletype => "dnsbl");
}
}
@ -219,16 +183,26 @@ sub dnsbl_hit {
sub dnsbl_uri {
my ($self, $question, $answer) = @_;
my $qname = $question->qname;
my $rdatastr;
if ($answer->UNIVERSAL::can('txtdata')) {
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
#
# rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
my $rdatastr = $answer->UNIVERSAL::can('txtdata') ? join('',$answer->txtdata)
: $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
: $answer->rdatastr;
$rdatastr = join('', $answer->txtdata);
} else {
$rdatastr = $answer->rdstring;
# encoded in a RFC 1035 zone file format (escaped), decode it
$rdatastr =~ s{ \\ ( [0-9]{3} | (?![0-9]{3}) . ) }
{ length($1)==3 && $1 <= 255 ? chr($1) : $1 }xgse;
}
# Bug 7236: Net::DNS attempts to decode text strings in a TXT record as
# UTF-8 since version 0.69, which is undesired: octets failing the UTF-8
# decoding are converted to a Unicode "replacement character" U+FFFD, and
# ASCII text is unnecessarily flagged as perl native characters.
utf8::encode($rdatastr) if utf8::is_utf8($rdatastr);
my $qname = $question->qname;
if (defined $qname && defined $rdatastr) {
my $qclass = $question->qclass;
my $qtype = $question->qtype;
@ -236,8 +210,8 @@ sub dnsbl_uri {
push(@vals, "class=$qclass") if $qclass ne "IN";
push(@vals, "type=$qtype") if $qtype ne "A";
my $uri = "dns:$qname" . (@vals ? "?" . join(";", @vals) : "");
push @{ $self->{dnsuri}->{$uri} }, $rdatastr;
$self->{dnsuri}{$uri}{$rdatastr} = 1;
dbg("dns: hit <$uri> $rdatastr");
}
}
@ -251,19 +225,17 @@ sub process_dnsbl_result {
my $question = ($pkt->question)[0];
return if !$question;
my $sets = $ent->{sets} || [];
my $rules = $ent->{rules};
my $rulename = $ent->{rulename};
# NO_DNS_FOR_FROM
if ($self->{sender_host} &&
# fishy, qname should have been "RFC 1035 zone format" -decoded first
lc($question->qname) eq lc($self->{sender_host}) &&
$question->qtype =~ /^(?:A|MX)$/ &&
$pkt->header->rcode =~ /^(?:NXDOMAIN|SERVFAIL)$/ &&
++$self->{sender_host_fail} == 2)
{
for my $rule (@{$rules}) {
$self->got_hit($rule, "DNS: ", ruletype => "dns");
# Mark rule ready for meta rules, but only if this was the last lookup
# pending, rules can have many lookups launched for different IPs
if (!$self->get_async_pending_rules($rulename)) {
$self->rule_ready($rulename);
# Mark depending check_rbl_sub rules too
if (exists $self->{rbl_subs}{$ent->{set}}) {
foreach (@{$self->{rbl_subs}{$ent->{set}}}) {
$self->rule_ready($_->[1]);
}
}
}
@ -274,44 +246,69 @@ sub process_dnsbl_result {
$self->dnsbl_uri($question, $answer);
my $answ_type = $answer->type;
# TODO: there are some CNAME returns that might be useful
next if ($answ_type ne 'A' && $answ_type ne 'TXT');
if ($answ_type eq 'A') {
# Net::DNS::RR::A::address() is available since Net::DNS 0.69
my $ip_address = $answer->UNIVERSAL::can('address') ? $answer->address
: $answer->rdatastr;
next if $answ_type ne 'A' && $answ_type ne 'TXT';
my $rdatastr;
if ($answer->UNIVERSAL::can('txtdata')) {
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
$rdatastr = join('', $answer->txtdata);
} else {
$rdatastr = $answer->rdstring;
# encoded in a RFC 1035 zone file format (escaped), decode it
$rdatastr =~ s{ \\ ( [0-9]{3} | (?![0-9]{3}) . ) }
{ length($1)==3 && $1 <= 255 ? chr($1) : $1 }xgse;
}
# Bug 7236: Net::DNS attempts to decode text strings in a TXT record as
# UTF-8 since version 0.69, which is undesired: octets failing the UTF-8
# decoding are converted to a Unicode "replacement character" U+FFFD, and
# ASCII text is unnecessarily flagged as perl native characters.
utf8::encode($rdatastr) if utf8::is_utf8($rdatastr);
# skip any A record that isn't on 127.0.0.0/8
next if $ip_address !~ /^127\./;
next if $answ_type eq 'A' && $rdatastr !~ /^127\./;
# check_rbl tests
if (defined $ent->{subtest}) {
if ($self->check_subtest($rdatastr, $ent->{subtest})) {
$self->dnsbl_hit($rulename, $question, $answer);
}
for my $rule (@{$rules}) {
$self->dnsbl_hit($rule, $question, $answer);
}
for my $set (@{$sets}) {
if ($self->{dnspost}->{$set}) {
$self->process_dnsbl_set($set, $question, $answer);
} else {
$self->dnsbl_hit($rulename, $question, $answer);
}
# check_rbl_sub tests
if (exists $self->{rbl_subs}{$ent->{set}}) {
$self->process_dnsbl_set($ent->{set}, $question, $answer, $rdatastr);
}
}
return 1;
}
sub process_dnsbl_set {
my ($self, $set, $question, $answer) = @_;
my ($self, $set, $question, $answer, $rdatastr) = @_;
# txtdata returns a non- zone-file-format encoded result, unlike rdstring;
# avoid space-separated RDATA <character-string> fields if possible,
# txtdata provides a list of strings in a list context since Net::DNS 0.69
#
# rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
my $rdatastr = $answer->UNIVERSAL::can('txtdata') ? join('',$answer->txtdata)
: $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
: $answer->rdatastr;
while (my ($subtest, $rule) = each %{ $self->{dnspost}->{$set} }) {
foreach my $args (@{$self->{rbl_subs}{$set}}) {
my $subtest = $args->[0];
my $rule = $args->[1];
next if $self->{tests_already_hit}->{$rule};
if ($self->check_subtest($rdatastr, $subtest)) {
$self->dnsbl_hit($rule, $question, $answer);
}
}
}
if ($subtest =~ /^\d+\.\d+\.\d+\.\d+$/) {
# test for exact equality, not a regexp (an IPv4 address)
$self->dnsbl_hit($rule, $question, $answer) if $subtest eq $rdatastr;
sub check_subtest {
my ($self, $rdatastr, $subtest) = @_;
# regular expression
if (ref($subtest) eq 'Regexp') {
if ($rdatastr =~ $subtest) {
return 1;
}
}
# bitmask
elsif ($subtest =~ /^\d+$/) {
@ -319,40 +316,21 @@ sub process_dnsbl_set {
if ($rdatastr =~ m/^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/ &&
Mail::SpamAssassin::Util::my_inet_aton($rdatastr) & $subtest)
{
$self->dnsbl_hit($rule, $question, $answer);
return 1;
}
}
# regular expression
else {
my $test = qr/$subtest/;
if ($rdatastr =~ /$test/) {
$self->dnsbl_hit($rule, $question, $answer);
}
# test for exact equality (an IPv4 address)
if ($subtest eq $rdatastr) {
return 1;
}
}
return 0;
}
sub harvest_until_rule_completes {
my ($self, $rule) = @_;
dbg("dns: harvest_until_rule_completes");
my $result = 0;
for (my $first=1; ; $first=0) {
# complete_lookups() may call completed_callback(), which may
# call start_lookup() again (like in Plugin::URIDNSBL)
my ($alldone,$anydone) =
$self->{async}->complete_lookups($first ? 0 : 1.0, 1);
$result = 1 if $self->is_rule_complete($rule);
last if $result || $alldone;
dbg("dns: harvest_until_rule_completes - check_tick");
$self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
}
return $result;
}
# Deprecated since 4.0, meta rules do not depend on priorities anymore
sub harvest_until_rule_completes {}
sub harvest_dnsbl_queries {
my ($self) = @_;
@ -370,7 +348,7 @@ sub harvest_dnsbl_queries {
my ($alldone,$anydone) =
$self->{async}->complete_lookups($first ? 0 : 1.0, 1);
last if $alldone;
last if $alldone || $self->{deadline_exceeded} || $self->{shortcircuited};
dbg("dns: harvest_dnsbl_queries - check_tick");
$self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
@ -379,7 +357,6 @@ sub harvest_dnsbl_queries {
# explicitly abort anything left
$self->{async}->abort_remaining_lookups();
$self->{async}->log_lookups_timing();
$self->mark_all_async_rules_complete();
1;
}
@ -403,12 +380,14 @@ sub harvest_completed_queries {
sub set_rbl_tag_data {
my ($self) = @_;
return if !$self->{dnsuri};
# DNS URIs
my $rbl_tag = $self->{tag_data}->{RBL}; # just in case, should be empty
$rbl_tag = '' if !defined $rbl_tag;
while (my ($dnsuri, $answers) = each %{ $self->{dnsuri} }) {
while (my ($dnsuri, $answers) = each %{$self->{dnsuri}}) {
# when parsing, look for elements of \".*?\" or \S+ with ", " as separator
$rbl_tag .= "<$dnsuri>" . " [" . join(", ", @{ $answers }) . "]\n";
$rbl_tag .= "<$dnsuri>" . " [" . join(", ", keys %$answers) . "]\n";
}
if (defined $rbl_tag && $rbl_tag ne '') {
chomp $rbl_tag;
@ -423,7 +402,7 @@ sub rbl_finish {
$self->set_rbl_tag_data();
delete $self->{dnspost};
delete $self->{rbl_subs};
delete $self->{dnsuri};
}
@ -442,55 +421,103 @@ sub clear_resolver {
return 0;
}
# Deprecated since 4.0.0
sub lookup_ns {
warn "dns: deprecated lookup_ns called, query ignored\n";
return;
}
sub test_dns_a_aaaa {
my ($self, $dom) = @_;
return unless $self->load_resolver();
return if ($self->server_failed_to_respond_for_domain ($dom));
my $nsrecords;
dbg("dns: looking up NS for '$dom'");
my ($a, $aaaa) = (0, 0);
if ($self->{conf}->{dns_options}->{v4}) {
eval {
my $query = $self->{resolver}->send($dom, 'NS');
my @nses;
my $query = $self->{resolver}->send($dom, 'A');
if ($query) {
foreach my $rr ($query->answer) {
if ($rr->type eq "NS") { push (@nses, $rr->nsdname); }
if ($rr->type eq 'A') { $a = 1; last; }
}
}
$nsrecords = [ @nses ];
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("dns: NS lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
return;
dbg("dns: test A lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
return (undef, undef);
};
if (!$a) {
dbg("dns: test A lookup returned no results, use \"dns_options nov4\" if resolver doesn't support A queries");
}
} else {
$a = 1;
}
$nsrecords;
if ($self->{conf}->{dns_options}->{v6}) {
eval {
my $query = $self->{resolver}->send($dom, 'AAAA');
if ($query) {
foreach my $rr ($query->answer) {
if ($rr->type eq 'AAAA') { $aaaa = 1; last; }
}
}
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("dns: test AAAA lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
return (undef, undef);
};
if (!$aaaa) {
dbg("dns: test AAAA lookup returned no results, use \"dns_options nov6\" if resolver doesn't support AAAA queries");
}
} else {
$aaaa = 1;
}
return ($a, $aaaa);
}
sub is_dns_available {
my ($self) = @_;
my $dnsopt = $self->{conf}->{dns_available};
my $dnsint = $self->{conf}->{dns_test_interval} || 600;
my @domains;
$LAST_DNS_CHECK ||= 0;
my $diff = time() - $LAST_DNS_CHECK;
# Fast response for the most common cases
return 1 if $IS_DNS_AVAILABLE && $dnsopt eq "yes";
return 0 if defined $IS_DNS_AVAILABLE && $dnsopt eq "no";
# croak on misconfigured flags
if (!$self->{conf}->{dns_options}->{v4} &&
!$self->{conf}->{dns_options}->{v6})
{
warn 'dns: error: dns_options "nov4" and "nov6" are both set, '.
' only use either, or use "dns_available no" to really disable DNS'.
"\n";
$IS_DNS_AVAILABLE = 0;
$self->{conf}->{dns_available} = "no";
return 0;
}
# undef $IS_DNS_AVAILABLE if we should be testing for
# working DNS and our check interval time has passed
if ($dnsopt eq "test" && $diff > $dnsint) {
if ($dnsopt eq "test") {
my $diff = time - $LAST_DNS_CHECK;
if ($diff > ($self->{conf}->{dns_test_interval}||600)) {
$IS_DNS_AVAILABLE = undef;
dbg("dns: is_dns_available() last checked %.1f seconds ago; re-checking",
$diff);
if ($LAST_DNS_CHECK) {
dbg("dns: is_dns_available() last checked %.1f seconds ago; re-checking", $diff);
} else {
dbg("dns: is_dns_available() initial check");
}
}
$LAST_DNS_CHECK = time;
}
return $IS_DNS_AVAILABLE if (defined $IS_DNS_AVAILABLE);
$LAST_DNS_CHECK = time();
return $IS_DNS_AVAILABLE if defined $IS_DNS_AVAILABLE;
$IS_DNS_AVAILABLE = 0;
if ($dnsopt eq "no") {
dbg("dns: dns_available set to no in config file, skipping test");
return $IS_DNS_AVAILABLE;
@ -498,26 +525,16 @@ sub is_dns_available {
# Even if "dns_available" is explicitly set to "yes", we want to ignore
# DNS if we're only supposed to be looking at local tests.
goto done if ($self->{main}->{local_tests_only});
# Check version numbers - runtime check only
if (defined $Net::DNS::VERSION) {
if (am_running_on_windows()) {
if ($Net::DNS::VERSION < 0.46) {
warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.46 for Win32");
if ($self->{main}->{local_tests_only}) {
dbg("dns: using local tests only, DNS not available");
return $IS_DNS_AVAILABLE;
}
}
else {
if ($Net::DNS::VERSION < 0.34) {
warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.34");
#$self->clear_resolver();
if (!$self->load_resolver()) {
dbg("dns: could not load resolver, DNS not available");
return $IS_DNS_AVAILABLE;
}
}
}
$self->clear_resolver();
goto done unless $self->load_resolver();
if ($dnsopt eq "yes") {
# optionally shuffle the list of nameservers to distribute the load
@ -532,13 +549,18 @@ sub is_dns_available {
return $IS_DNS_AVAILABLE;
}
my @domains;
my @rtypes;
push @rtypes, 'A' if $self->{main}->{conf}->{dns_options}->{v4};
push @rtypes, 'AAAA' if $self->{main}->{conf}->{dns_options}->{v6};
if ($dnsopt =~ /^test:\s*(\S.*)$/) {
@domains = split (/\s+/, $1);
dbg("dns: looking up NS records for user specified domains: %s",
join(", ", @domains));
dbg("dns: testing %s records for user specified domains: %s",
join("/", @rtypes), join(", ", @domains));
} else {
@domains = @EXISTING_DOMAINS;
dbg("dns: looking up NS records for built-in domains");
dbg("dns: testing %s records for built-in domains: %s",
join("/", @rtypes), join(", ", @domains));
}
# do the test with a full set of configured nameservers
@ -556,19 +578,19 @@ sub is_dns_available {
my @good_nameservers;
foreach my $ns (@nameservers) {
$self->{resolver}->available_nameservers($ns); # try just this one
for (my $retry = 3; $retry > 0 && @domains; $retry--) {
for (my $retry = 0; $retry < 3 && @domains; $retry++) {
my $domain = splice(@domains, rand(@domains), 1);
dbg("dns: trying ($retry) $domain, server $ns ...");
my $result = $self->lookup_ns($domain);
dbg("dns: trying $domain, server $ns ..." .
($retry ? " (retry $retry)" : ""));
my ($ok_a, $ok_aaaa) = $self->test_dns_a_aaaa($domain);
$self->{resolver}->finish_socket();
if (!$result) {
dbg("dns: NS lookup of $domain using $ns failed horribly, ".
"may not be a valid nameserver");
if (!defined $ok_a || !defined $ok_aaaa) {
# error printed already
last;
} elsif (!@$result) {
dbg("dns: NS lookup of $domain using $ns failed, no results found");
} elsif (!$ok_a && !$ok_aaaa) {
dbg("dns: lookup of $domain using $ns failed, no results found");
} else {
dbg("dns: NS lookup of $domain using $ns succeeded => DNS available".
dbg("dns: lookup of $domain using $ns succeeded => DNS available".
" (set dns_available to override)");
push(@good_nameservers, $ns);
last;
@ -585,8 +607,6 @@ sub is_dns_available {
$self->{resolver}->available_nameservers(@good_nameservers);
}
done:
# jm: leaving this in!
dbg("dns: is DNS available? " . $IS_DNS_AVAILABLE);
return $IS_DNS_AVAILABLE;
}
@ -676,55 +696,26 @@ sub cleanup_kids {
###########################################################################
sub register_async_rule_start {
# Deprecated async functions, everything is handled automatically
# now by bgsend .. $self->{async}->{pending_rules}
sub register_async_rule_start {}
sub register_async_rule_finish {}
sub mark_all_async_rules_complete {}
sub is_rule_complete {}
# Return number of pending DNS lookups for a rule,
# or list all of rules still pending
sub get_async_pending_rules {
my ($self, $rule) = @_;
dbg("dns: $rule lookup start");
$self->{rule_to_rblkey}->{$rule} = '*ASYNC_START';
}
sub register_async_rule_finish {
my ($self, $rule) = @_;
dbg("dns: $rule lookup finished");
delete $self->{rule_to_rblkey}->{$rule};
}
sub mark_all_async_rules_complete {
my ($self) = @_;
$self->{rule_to_rblkey} = { };
}
sub is_rule_complete {
my ($self, $rule) = @_;
my $key = $self->{rule_to_rblkey}->{$rule};
if (!defined $key) {
# dbg("dns: $rule lookup complete, not in list");
return 1;
if (defined $rule) {
return 0 if !exists $self->{async}->{pending_rules}{$rule};
return scalar keys %{$self->{async}->{pending_rules}{$rule}};
} else {
return grep { %{$self->{async}->{pending_rules}{$_}} }
keys %{$self->{async}->{pending_rules}};
}
if ($key eq '*ASYNC_START') {
dbg("dns: $rule lookup not yet complete");
return 0; # not yet complete
}
my $ent = $self->{async}->get_lookup($key);
if (!defined $ent) {
dbg("dns: $rule lookup complete, $key no longer pending");
return 1;
}
dbg("dns: $rule lookup not yet complete");
return 0; # not yet complete
}
###########################################################################
# interface called by SPF plugin
sub check_for_from_dns {
my ($self, $pms) = @_;
if (defined $pms->{sender_host_fail}) {
return ($pms->{sender_host_fail} == 2); # both MX and A need to fail
}
}
1;

View File

@ -40,21 +40,24 @@ use warnings;
# use bytes;
use re 'taint';
require 5.008001; # needs utf8::is_utf8()
use Mail::SpamAssassin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Util qw(untaint_var decode_dns_question_entry);
use Mail::SpamAssassin::Util qw(untaint_var decode_dns_question_entry
idn_to_ascii reverse_ip_address
domain_to_search_list);
use Socket;
use Errno qw(EADDRINUSE EACCES);
use Time::HiRes qw(time);
use version 0.77;
our @ISA = qw();
our $have_net_dns;
our $io_socket_module_name;
BEGIN {
$have_net_dns = eval { require Net::DNS; };
if (eval { require IO::Socket::IP }) {
$io_socket_module_name = 'IO::Socket::IP';
} elsif (eval { require IO::Socket::INET6 }) {
@ -78,7 +81,6 @@ sub new {
};
bless ($self, $class);
$self->load_resolver();
$self;
}
@ -94,8 +96,8 @@ Load the C<Net::DNS::Resolver> object. Returns 0 if Net::DNS cannot be used,
sub load_resolver {
my ($self) = @_;
if ($self->{res}) { return 1; }
$self->{no_resolver} = 1;
return 0 if $self->{no_resolver};
return 1 if $self->{res};
# force only ipv4 if no IO::Socket::INET6 or ipv6 doesn't work
my $force_ipv4 = $self->{main}->{force_ipv4};
@ -112,7 +114,7 @@ sub load_resolver {
if ($io_socket_module_name) {
$sock6 = $io_socket_module_name->new(LocalAddr=>'::', Proto=>'udp');
}
if ($sock6) { $sock6->close() or warn "error closing socket: $!" }
if ($sock6) { $sock6->close() or warn "dns: error closing socket: $!\n" }
$sock6;
} or do {
dbg("dns: socket module %s is available, but no host support for IPv6",
@ -123,13 +125,14 @@ sub load_resolver {
}
eval {
require Net::DNS;
die "Net::DNS required\n" if !$have_net_dns;
die "Net::DNS 0.69 required\n"
if (version->parse(Net::DNS->VERSION) < version->parse(0.69));
# force_v4 is set in new() to avoid error in older versions of Net::DNS
# that don't have it; other options are set by function calls so a typo
# or API change will cause an error here
my $res = $self->{res} = Net::DNS::Resolver->new(force_v4 => $force_ipv4);
if ($res) {
$self->{no_resolver} = 0;
$self->{force_ipv4} = $force_ipv4;
$self->{force_ipv6} = $force_ipv6;
$self->{retry} = 1; # retries for non-backgrounded query
@ -164,7 +167,7 @@ sub load_resolver {
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("dns: eval failed: $eval_stat");
warn("dns: resolver create failed: $eval_stat\n");
};
dbg("dns: using socket module: %s version %s%s",
@ -173,12 +176,13 @@ sub load_resolver {
$self->{force_ipv4} ? ', forced IPv4' :
$self->{force_ipv6} ? ', forced IPv6' : '');
dbg("dns: is Net::DNS::Resolver available? %s",
$self->{no_resolver} ? "no" : "yes" );
if (!$self->{no_resolver} && defined $Net::DNS::VERSION) {
$self->{res} ? "yes" : "no" );
if ($self->{res} && defined $Net::DNS::VERSION) {
dbg("dns: Net::DNS version: %s", $Net::DNS::VERSION);
}
return (!$self->{no_resolver});
$self->{no_resolver} = !$self->{res};
return defined $self->{res};
}
=item $resolver = $res->get_resolver()
@ -237,18 +241,17 @@ sub available_nameservers {
}
if ($self->{force_ipv4} || $self->{force_ipv6}) {
# filter the list according to a chosen protocol family
my $ip4_re = IPV4_ADDRESS;
my(@filtered_addr_port);
for (@{$self->{available_dns_servers}}) {
local($1,$2);
/^ \[ (.*) \] : (\d+) \z/xs or next;
my($addr,$port) = ($1,$2);
if ($addr =~ /^${ip4_re}\z/o) {
if ($addr =~ IS_IPV4_ADDRESS) {
push(@filtered_addr_port, $_) unless $self->{force_ipv6};
} elsif ($addr =~ /:.*:/) {
push(@filtered_addr_port, $_) unless $self->{force_ipv4};
} else {
warn "Unrecognized DNS server specification: $_";
warn "dns: Unrecognized DNS server specification: $_\n";
}
}
if (@filtered_addr_port < @{$self->{available_dns_servers}}) {
@ -361,7 +364,7 @@ sub connect_sock {
if ($self->{sock}) {
$self->{sock}->close()
or info("connect_sock: error closing socket %s: %s", $self->{sock}, $!);
or info("dns: connect_sock: error closing socket %s: %s", $self->{sock}, $!);
$self->{sock} = undef;
}
my $sock;
@ -378,13 +381,12 @@ sub connect_sock {
# is unspecified, causing EINVAL failure when automatically assigned local
# IP address and a remote address do not belong to the same address family.
# Let's choose a suitable source address if possible.
my $ip4_re = IPV4_ADDRESS;
my $srcaddr;
if ($self->{force_ipv4}) {
$srcaddr = "0.0.0.0";
} elsif ($self->{force_ipv6}) {
$srcaddr = "::";
} elsif ($ns_addr =~ /^${ip4_re}\z/o) {
} elsif ($ns_addr =~ IS_IPV4_ADDRESS) {
$srcaddr = "0.0.0.0";
} elsif ($ns_addr =~ /:.*:/) {
$srcaddr = "::";
@ -400,10 +402,10 @@ sub connect_sock {
$lport = $self->pick_random_available_port();
if (!defined $lport) {
$lport = 0;
dbg("no configured local ports for DNS queries, letting OS choose");
dbg("dns: no configured local ports for DNS queries, letting OS choose");
}
if ($attempts+1 > 50) { # sanity check
warn "could not create a DNS resolver socket in $attempts attempts\n";
warn "dns: could not create a DNS resolver socket in $attempts attempts\n";
$errno = 0;
last;
}
@ -431,12 +433,12 @@ sub connect_sock {
$self->disable_available_port($lport);
}
} else {
warn "error creating a DNS resolver socket: $errno";
warn "dns: error creating a DNS resolver socket: $errno";
goto no_sock;
}
}
if (!$sock) {
warn "could not create a DNS resolver socket in $attempts attempts: $errno";
warn "dns: could not create a DNS resolver socket in $attempts attempts: $errno\n";
goto no_sock;
}
@ -534,9 +536,8 @@ sub new_dns_packet {
# construct a PTR query if it looks like an IPv4 address
if (!defined($type) || $type eq 'PTR') {
local($1,$2,$3,$4);
if ($domain =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
$domain = "$4.$3.$2.$1.in-addr.arpa.";
if ($domain =~ IS_IPV4_ADDRESS) {
$domain = reverse_ip_address($domain).".in-addr.arpa.";
$type = 'PTR';
}
}
@ -598,13 +599,7 @@ sub new_dns_packet {
my $udp_payload_size = $self->{conf}->{dns_options}->{edns};
if ($udp_payload_size && $udp_payload_size > 512) {
# dbg("dns: adding EDNS ext, UDP payload size %d", $udp_payload_size);
if ($packet->UNIVERSAL::can('edns')) { # available since Net::DNS 0.69
$packet->edns->size($udp_payload_size);
} else { # legacy mechanism
my $optrr = Net::DNS::RR->new(Type => 'OPT', Name => '', TTL => 0,
Class => $udp_payload_size);
$packet->push('additional', $optrr);
}
}
}
@ -658,6 +653,8 @@ sub _packet_id {
=item $id = $res->bgsend($domain, $type, $class, $cb)
DIRECT USE DISCOURAGED, please use bgsend_and_start_lookup in plugins.
Quite similar to C<Net::DNS::Resolver::bgsend>, except that when a reply
packet eventually arrives, and C<poll_responses> is called, the callback
sub reference C<$cb> will be called.
@ -673,7 +670,7 @@ be used, like so:
my $id = $self->{resolver}->bgsend($domain, $type, undef, sub {
my ($reply, $reply_id, $timestamp) = @_;
$self->got_a_reply ($reply, $reply_id);
$self->got_a_reply($reply, $reply_id);
});
The callback can ignore the reply as an invalid packet sent to the listening
@ -685,6 +682,19 @@ sub bgsend {
my ($self, $domain, $type, $class, $cb) = @_;
return if $self->{no_resolver};
my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
if ($dns_query_blockages) {
my $search_list = domain_to_search_list($domain);
foreach my $parent_domain ((@$search_list, '*')) {
my $blocked = $dns_query_blockages->{$parent_domain};
next if !defined $blocked; # not listed
last if !$blocked; # allowed
# blocked
dbg("dns: bgsend, query $type/$domain blocked by dns_query_restriction: $parent_domain");
return;
}
}
$self->{send_timed_out} = 0;
my $pkt = $self->new_dns_packet($domain, $type, $class);
@ -747,7 +757,7 @@ sub bgread {
$answerpkt or die "bgread: decoding DNS packet failed: $@";
$answerpkt->answerfrom($peerhost);
if (defined $decoded_length && $decoded_length ne "" && $decoded_length != length($data)) {
warn sprintf("bgread: received a %d bytes packet from %s, decoded %d bytes\n",
warn sprintf("dns: bgread: received a %d bytes packet from %s, decoded %d bytes\n",
length($data), $peerhost, $decoded_length);
}
return $answerpkt;
@ -767,13 +777,16 @@ sub poll_responses {
return if $self->{no_resolver};
return if !$self->{sock};
my $cnt = 0;
my $cnt_cb = 0;
my $rin = $self->{sock_as_vec};
my $rout;
for (;;) {
my ($nfound, $timeleft, $eval_stat);
eval { # use eval to catch alarm signal
# if a restartable signal is caught, retry 3 times before aborting
my $eintrcount = 3;
eval { # use eval to caught alarm signal
my $timer; # collects timestamp when variable goes out of scope
if (!defined($timeout) || $timeout > 0)
{ $timer = $self->{main}->time_method("poll_dns_idle") }
@ -787,16 +800,21 @@ sub poll_responses {
# most likely due to an alarm signal, resignal if so
die "dns: (2) $eval_stat\n" if $eval_stat =~ /__alarm__ignore__\(.*\)/s;
warn "dns: select aborted: $eval_stat\n";
return;
last;
} elsif (!defined $nfound || $nfound < 0) {
if ($!{EINTR} and $eintrcount > 0) {
$eintrcount--;
next;
}
if ($!) { warn "dns: select failed: $!\n" }
else { info("dns: select interrupted") } # shouldn't happen
return;
last;
} elsif (!$nfound) {
if (!defined $timeout) { warn("dns: select returned empty-handed\n") }
elsif ($timeout > 0) { dbg("dns: select timed out %.3f s", $timeout) }
return;
last;
}
$cnt += $nfound;
my $now = time;
$timeout = 0; # next time around collect whatever is available, then exit
@ -853,12 +871,12 @@ sub poll_responses {
if ($cb) {
$cb->($packet, $id, $now);
$cnt++;
$cnt_cb++;
} else { # no match, report the problem
if ($rcode eq 'REFUSED' || $id =~ m{^\d+/NO_QUESTION_IN_PACKET\z}) {
# the failure was already reported above
} else {
dbg("dns: no callback for id $id, ignored, packet on next debug line");
info("dns: no callback for id $id, ignored, packet on next debug line");
# prevent filling normal logs with huge packet dumps
dbg("dns: %s", $packet ? $packet->string : "undef");
}
@ -867,11 +885,11 @@ sub poll_responses {
if ($id =~ m{^(\d+)/}) {
my $dnsid = $1; # the raw DNS packet id
my @matches =
grep(m{^\Q$dnsid\E/}, keys %{$self->{id_to_callback}});
grep(m{^\Q$dnsid\E/}o, keys %{$self->{id_to_callback}});
if (!@matches) {
dbg("dns: no likely matching queries for id %s", $dnsid);
info("dns: no likely matching queries for id %s", $dnsid);
} else {
dbg("dns: a likely matching query: %s", join(', ', @matches));
info("dns: a likely matching query: %s", join(', ', @matches));
}
}
}
@ -879,7 +897,35 @@ sub poll_responses {
}
}
return $cnt;
return ($cnt, $cnt_cb);
}
use constant RECV_FLAGS => eval { MSG_DONTWAIT } || 0; # Not in Windows
# Used to flush stale DNS responses, which we don't need to process
sub flush_responses {
my ($self) = @_;
return if $self->{no_resolver};
return if !$self->{sock};
my $rin = $self->{sock_as_vec};
my $rout;
my $nfound;
my $packetsize = $self->{res}->udppacketsize;
$packetsize = 512 if $packetsize < 512; # just in case
$self->{sock}->blocking(0) unless(RECV_FLAGS);
for (;;) {
eval { # use eval to catch alarm signal
($nfound, undef) = select($rout=$rin, undef, undef, 0);
1;
} or do {
last;
};
last if !$nfound;
last if !$self->{sock}->recv(my $data, $packetsize+256, RECV_FLAGS);
}
$self->{sock}->blocking(1) unless(RECV_FLAGS);
}
###########################################################################
@ -920,8 +966,9 @@ sub send {
# using some arbitrary encoding (they are normally just 7-bit ascii
# characters anyway, just need to get rid of the utf8 flag). Bug 6959
# Most if not all af these come from a SPF plugin.
# (was a call to utf8::encode($name), now we prefer a proper idn_to_ascii)
#
utf8::encode($name);
$name = idn_to_ascii($name);
my $retrans = $self->{retrans};
my $retries = $self->{retry};
@ -985,7 +1032,7 @@ sub finish_socket {
my ($self) = @_;
if ($self->{sock}) {
$self->{sock}->close()
or warn "finish_socket: error closing socket $self->{sock}: $!";
or warn "dns: finish_socket: error closing socket $self->{sock}: $!\n";
undef $self->{sock};
}
}
@ -1014,7 +1061,7 @@ sub fhs_to_vec {
foreach my $sock (@fhlist) {
my $fno = fileno($sock);
if (!defined $fno) {
warn "dns: oops! fileno now undef for $sock";
warn "dns: oops! fileno now undef for $sock\n";
} else {
vec ($rin, $fno, 1) = 1;
}

View File

@ -0,0 +1,899 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::GeoDB - unified interface for geoip modules
Plugins need to signal SA main package the modules they want loaded
package Mail::SpamAssassin::Plugin::MyPlugin;
sub new {
...
$self->{main}->{geodb_wanted}->{country} = 1;
$self->{main}->{geodb_wanted}->{isp} = 1;
)
(internal stuff still subject to change)
=cut
package Mail::SpamAssassin::GeoDB;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Socket;
use version 0.77;
our @ISA = qw();
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Logger;
my @geoip_default_path = qw(
/usr/local/share/GeoIP
/usr/share/GeoIP
/var/lib/GeoIP
/opt/share/GeoIP
);
# load order (city contains country, isp contains asn)
my @geoip_types = qw( city country isp asn );
# v6 is not needed, automatically tries *v6.dat also
my %geoip_default_files = (
'city' => ['GeoIPCity.dat','GeoLiteCity.dat'],
'country' => ['GeoIP.dat'],
'isp' => ['GeoIPISP.dat'],
'asn' => ['GeoIPASNum.dat'],
);
my %geoip2_default_files = (
'city' => ['GeoIP2-City.mmdb','GeoLite2-City.mmdb',
'dbip-city.mmdb','dbip-city-lite.mmdb'],
'country' => ['GeoIP2-Country.mmdb','GeoLite2-Country.mmdb',
'dbip-country.mmdb','dbip-country-lite.mmdb'],
'isp' => ['GeoIP2-ISP.mmdb','GeoLite2-ISP.mmdb'],
'asn' => ['GeoIP2-ASN.mmdb','GeoLite2-ASN.mmdb'],
);
my %country_to_continent = (
'AP'=>'AS','EU'=>'EU','AD'=>'EU','AE'=>'AS','AF'=>'AS','AG'=>'NA',
'AI'=>'NA','AL'=>'EU','AM'=>'AS','CW'=>'NA','AO'=>'AF','AQ'=>'AN',
'AR'=>'SA','AS'=>'OC','AT'=>'EU','AU'=>'OC','AW'=>'NA','AZ'=>'AS',
'BA'=>'EU','BB'=>'NA','BD'=>'AS','BE'=>'EU','BF'=>'AF','BG'=>'EU',
'BH'=>'AS','BI'=>'AF','BJ'=>'AF','BM'=>'NA','BN'=>'AS','BO'=>'SA',
'BR'=>'SA','BS'=>'NA','BT'=>'AS','BV'=>'AN','BW'=>'AF','BY'=>'EU',
'BZ'=>'NA','CA'=>'NA','CC'=>'AS','CD'=>'AF','CF'=>'AF','CG'=>'AF',
'CH'=>'EU','CI'=>'AF','CK'=>'OC','CL'=>'SA','CM'=>'AF','CN'=>'AS',
'CO'=>'SA','CR'=>'NA','CU'=>'NA','CV'=>'AF','CX'=>'AS','CY'=>'AS',
'CZ'=>'EU','DE'=>'EU','DJ'=>'AF','DK'=>'EU','DM'=>'NA','DO'=>'NA',
'DZ'=>'AF','EC'=>'SA','EE'=>'EU','EG'=>'AF','EH'=>'AF','ER'=>'AF',
'ES'=>'EU','ET'=>'AF','FI'=>'EU','FJ'=>'OC','FK'=>'SA','FM'=>'OC',
'FO'=>'EU','FR'=>'EU','FX'=>'EU','GA'=>'AF','GB'=>'EU','GD'=>'NA',
'GE'=>'AS','GF'=>'SA','GH'=>'AF','GI'=>'EU','GL'=>'NA','GM'=>'AF',
'GN'=>'AF','GP'=>'NA','GQ'=>'AF','GR'=>'EU','GS'=>'AN','GT'=>'NA',
'GU'=>'OC','GW'=>'AF','GY'=>'SA','HK'=>'AS','HM'=>'AN','HN'=>'NA',
'HR'=>'EU','HT'=>'NA','HU'=>'EU','ID'=>'AS','IE'=>'EU','IL'=>'AS',
'IN'=>'AS','IO'=>'AS','IQ'=>'AS','IR'=>'AS','IS'=>'EU','IT'=>'EU',
'JM'=>'NA','JO'=>'AS','JP'=>'AS','KE'=>'AF','KG'=>'AS','KH'=>'AS',
'KI'=>'OC','KM'=>'AF','KN'=>'NA','KP'=>'AS','KR'=>'AS','KW'=>'AS',
'KY'=>'NA','KZ'=>'AS','LA'=>'AS','LB'=>'AS','LC'=>'NA','LI'=>'EU',
'LK'=>'AS','LR'=>'AF','LS'=>'AF','LT'=>'EU','LU'=>'EU','LV'=>'EU',
'LY'=>'AF','MA'=>'AF','MC'=>'EU','MD'=>'EU','MG'=>'AF','MH'=>'OC',
'MK'=>'EU','ML'=>'AF','MM'=>'AS','MN'=>'AS','MO'=>'AS','MP'=>'OC',
'MQ'=>'NA','MR'=>'AF','MS'=>'NA','MT'=>'EU','MU'=>'AF','MV'=>'AS',
'MW'=>'AF','MX'=>'NA','MY'=>'AS','MZ'=>'AF','NA'=>'AF','NC'=>'OC',
'NE'=>'AF','NF'=>'OC','NG'=>'AF','NI'=>'NA','NL'=>'EU','NO'=>'EU',
'NP'=>'AS','NR'=>'OC','NU'=>'OC','NZ'=>'OC','OM'=>'AS','PA'=>'NA',
'PE'=>'SA','PF'=>'OC','PG'=>'OC','PH'=>'AS','PK'=>'AS','PL'=>'EU',
'PM'=>'NA','PN'=>'OC','PR'=>'NA','PS'=>'AS','PT'=>'EU','PW'=>'OC',
'PY'=>'SA','QA'=>'AS','RE'=>'AF','RO'=>'EU','RU'=>'EU','RW'=>'AF',
'SA'=>'AS','SB'=>'OC','SC'=>'AF','SD'=>'AF','SE'=>'EU','SG'=>'AS',
'SH'=>'AF','SI'=>'EU','SJ'=>'EU','SK'=>'EU','SL'=>'AF','SM'=>'EU',
'SN'=>'AF','SO'=>'AF','SR'=>'SA','ST'=>'AF','SV'=>'NA','SY'=>'AS',
'SZ'=>'AF','TC'=>'NA','TD'=>'AF','TF'=>'AN','TG'=>'AF','TH'=>'AS',
'TJ'=>'AS','TK'=>'OC','TM'=>'AS','TN'=>'AF','TO'=>'OC','TL'=>'AS',
'TR'=>'EU','TT'=>'NA','TV'=>'OC','TW'=>'AS','TZ'=>'AF','UA'=>'EU',
'UG'=>'AF','UM'=>'OC','US'=>'NA','UY'=>'SA','UZ'=>'AS','VA'=>'EU',
'VC'=>'NA','VE'=>'SA','VG'=>'NA','VI'=>'NA','VN'=>'AS','VU'=>'OC',
'WF'=>'OC','WS'=>'OC','YE'=>'AS','YT'=>'AF','RS'=>'EU','ZA'=>'AF',
'ZM'=>'AF','ME'=>'EU','ZW'=>'AF','AX'=>'EU','GG'=>'EU','IM'=>'EU',
'JE'=>'EU','BL'=>'NA','MF'=>'NA','BQ'=>'NA','SS'=>'AF','**'=>'**',
);
sub new {
my ($class, $conf) = @_;
$class = ref($class) || $class;
my $self = {};
bless ($self, $class);
$self->{cache} = ();
$self->init_database($conf || {});
$self;
}
sub init_database {
my ($self, $opts) = @_;
# Try city too if country wanted
$opts->{wanted}->{city} = 1 if $opts->{wanted}->{country};
# Try isp too if asn wanted
$opts->{wanted}->{isp} = 1 if $opts->{wanted}->{asn};
my $geodb_opts = {
'module' => $opts->{conf}->{module} || undef,
'dbs' => $opts->{conf}->{options} || undef,
'wanted' => $opts->{wanted} || undef,
'search_path' => defined $opts->{conf}->{geodb_search_path} ?
$opts->{conf}->{geodb_search_path} : \@geoip_default_path,
};
my ($db, $dbapi, $loaded);
## GeoIP2
if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'geoip2')) {
($db, $dbapi) = $self->load_geoip2($geodb_opts);
$loaded = 'geoip2' if $db;
}
## Geo::IP
if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'geoip')) {
($db, $dbapi) = $self->load_geoip($geodb_opts);
$loaded = 'geoip' if $db;
}
## IP::Country::DB_File
if (!$db && $geodb_opts->{module} && $geodb_opts->{module} eq 'dbfile') {
# Only try if geodb_module and path to ipcc.db specified
($db, $dbapi) = $self->load_dbfile($geodb_opts);
$loaded = 'dbfile' if $db;
}
## IP::Country::Fast
if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'fast')) {
($db, $dbapi) = $self->load_fast($geodb_opts);
$loaded = 'fast' if $db;
}
if (!$db) {
dbg("geodb: No supported database could be loaded");
die("No supported GeoDB database could be loaded\n");
}
# country can be aliased to city
if (!$dbapi->{country} && $dbapi->{city}) {
$dbapi->{country} = $dbapi->{city};
}
if (!$dbapi->{country_v6} && $dbapi->{city_v6}) {
$dbapi->{country_v6} = $dbapi->{city_v6}
}
# GeoIP2 asn can be aliased to isp
if ($loaded eq 'geoip2') {
if (!$dbapi->{asn} && $dbapi->{isp}) {
$dbapi->{asn} = $dbapi->{isp};
}
if (!$dbapi->{asn_v6} && $dbapi->{isp_v6}) {
$dbapi->{asn_v6} = $dbapi->{isp_v6}
}
}
$self->{db} = $db;
$self->{dbapi} = $dbapi;
foreach (@{$self->get_dbinfo()}) {
dbg("geodb: database info: ".$_);
}
#dbg("geodb: apis available: ".join(', ', sort keys %{$self->{dbapi}}));
return 1;
}
sub load_geoip2 {
my ($self, $geodb_opts) = @_;
my ($db, $dbapi, $ok);
# Warn about fatal errors if this module was specifically requested
my $errwarn = ($geodb_opts->{module}||'') eq 'geoip2';
eval {
require MaxMind::DB::Reader;
} or do {
my $err = $@;
$err =~ s/ at .*//s;
$err = "geodb: MaxMind::DB::Reader (GeoIP2) module load failed: $err";
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
};
my %path;
foreach my $dbtype (@geoip_types) {
# skip country if city already loaded
next if $dbtype eq 'country' && $db->{city};
# skip asn if isp already loaded
next if $dbtype eq 'asn' && $db->{isp};
# skip if not needed
next if $geodb_opts->{wanted} && !$geodb_opts->{wanted}->{$dbtype};
# only autosearch if no absolute path given
if (!defined $geodb_opts->{dbs}->{$dbtype}) {
# Try some default locations
PATHS_GEOIP2: foreach my $p (@{$geodb_opts->{search_path}}) {
foreach my $f (@{$geoip2_default_files{$dbtype}}) {
if (-f "$p/$f") {
$path{$dbtype} = "$p/$f";
dbg("geodb: GeoIP2: search found $dbtype $p/$f");
last PATHS_GEOIP2;
}
}
}
} else {
if (!-f $geodb_opts->{dbs}->{$dbtype}) {
dbg("geodb: GeoIP2: $dbtype database requested, but not found: ".
$geodb_opts->{dbs}->{$dbtype});
next;
}
$path{$dbtype} = $geodb_opts->{dbs}->{$dbtype};
}
if (defined $path{$dbtype}) {
eval {
$db->{$dbtype} = MaxMind::DB::Reader->new(
file => $path{$dbtype},
);
die "unknown error" unless $db->{$dbtype};
1;
};
if ($@ || !$db->{$dbtype}) {
my $err = $@;
$err =~ s/\s+Trace begun.*//s;
$err =~ s/ at .*//s;
dbg("geodb: GeoIP2: $dbtype load failed: $err");
} else {
dbg("geodb: GeoIP2: loaded $dbtype from $path{$dbtype}");
$ok = 1;
}
} else {
my $from = defined $geodb_opts->{dbs}->{$dbtype} ?
$geodb_opts->{dbs}->{$dbtype} : "default locations";
dbg("geodb: GeoIP2: $dbtype database not found from $from");
}
}
if (!$ok) {
warn("geodb: GeoIP2 requested, but no databases could be loaded\n") if $errwarn;
return (undef, undef)
}
# dbinfo_DBTYPE()
$db->{city} and $dbapi->{dbinfo_city} = sub {
my $m = $_[0]->{db}->{city}->metadata();
return "GeoIP2 city: ".$m->description()->{en}." / ".localtime($m->build_epoch());
};
$db->{country} and $dbapi->{dbinfo_country} = sub {
my $m = $_[0]->{db}->{country}->metadata();
return "GeoIP2 country: ".$m->description()->{en}." / ".localtime($m->build_epoch());
};
$db->{isp} and $dbapi->{dbinfo_isp} = sub {
my $m = $_[0]->{db}->{isp}->metadata();
return "GeoIP2 isp: ".$m->description()->{en}." / ".localtime($m->build_epoch());
};
$db->{asn} and $dbapi->{dbinfo_asn} = sub {
my $m = $_[0]->{db}->{asn}->metadata();
return "GeoIP2 asn: ".$m->description()->{en}." / ".localtime($m->build_epoch());
};
# city()
$db->{city} and $dbapi->{city} = $dbapi->{city_v6} = sub {
my $res = {};
my $city;
eval {
$city = $_[0]->{db}->{city}->record_for_address($_[1]);
1;
} or do {
$@ =~ s/\s+Trace begun.*//s;
dbg("geodb: GeoIP2 city query failed for $_[1]: $@");
return $res;
};
eval {
$res->{city_name} = $city->{city}->{names}->{en};
$res->{country} = $city->{country}->{iso_code};
$res->{country_name} = $city->{country}->{names}->{en};
$res->{continent} = $city->{continent}->{code};
$res->{continent_name} = $city->{continent}->{names}->{en};
1;
};
return $res;
};
# country()
$db->{country} and $dbapi->{country} = $dbapi->{country_v6} = sub {
my $res = {};
my $country;
eval {
$country = $_[0]->{db}->{country}->record_for_address($_[1]);
1;
} or do {
$@ =~ s/\s+Trace begun.*//s;
dbg("geodb: GeoIP2 country query failed for $_[1]: $@");
return $res;
};
eval {
$res->{country} = $country->{country}->{iso_code};
$res->{country_name} = $country->{country}->{names}->{en};
$res->{continent} = $country->{continent}->{code};
$res->{continent_name} = $country->{continent}->{names}->{en};
1;
};
return $res;
};
# isp()
$db->{isp} and $dbapi->{isp} = $dbapi->{isp_v6} = sub {
my $res = {};
my $isp;
eval {
$isp = $_[0]->{db}->{isp}->record_for_address($_[1]);
1;
} or do {
$@ =~ s/\s+Trace begun.*//s;
dbg("geodb: GeoIP2 isp query failed for $_[1]: $@");
return $res;
};
eval {
$res->{asn} = $isp->{autonomous_system_number};
$res->{asn_organization} = $isp->{autonomous_system_organization};
$res->{isp} = $isp->{isp};
$res->{organization} = $isp->{organization};
1;
};
return $res;
};
# asn()
$db->{asn} and $dbapi->{asn} = $dbapi->{asn_v6} = sub {
my $res = {};
my $asn;
eval {
$asn = $_[0]->{db}->{asn}->record_for_address($_[1]);
1;
} or do {
$@ =~ s/\s+Trace begun.*//s;
dbg("geodb: GeoIP2 asn query failed for $_[1]: $@");
return $res;
};
eval {
$res->{asn} = $asn->{autonomous_system_number};
$res->{asn_organization} = $asn->{autonomous_system_organization};
1;
};
return $res;
};
return ($db, $dbapi);
}
sub load_geoip {
my ($self, $geodb_opts) = @_;
my ($db, $dbapi, $ok);
my ($gic_wanted, $gic_have, $gip_wanted, $gip_have);
my ($flags, $fix_stderr, $can_ipv6);
# Warn about fatal errors if this module was specifically requested
my $errwarn = ($geodb_opts->{module}||'') eq 'geoip';
eval {
require Geo::IP;
# need GeoIP C library 1.6.3 and GeoIP perl API 1.4.4 or later to avoid messages leaking - Bug 7153
$gip_wanted = version->parse('v1.4.4');
$gip_have = version->parse(Geo::IP->VERSION);
$gic_wanted = version->parse('v1.6.3');
eval { $gic_have = version->parse(Geo::IP->lib_version()); }; # might not have lib_version()
$gic_have = 'none' if !defined $gic_have;
dbg("geodb: GeoIP: versions: Geo::IP $gip_have, C library $gic_have");
$flags = 0;
$fix_stderr = 0;
if (ref($gic_have) eq 'version') {
# this code burps an ugly message if it fails, but that's redirected elsewhere
eval '$flags = Geo::IP::GEOIP_SILENCE' if $gip_wanted >= $gip_have;
$fix_stderr = $flags && $gic_wanted >= $gic_have;
}
$can_ipv6 = Geo::IP->VERSION >= 1.39 && Geo::IP->api eq 'CAPI';
1;
} or do {
my $err = $@;
$err =~ s/ at .*//s;
$err = "geodb: Geo::IP module load failed: $err";
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
};
my %path;
foreach my $dbtype (@geoip_types) {
# skip country if city already loaded
next if $dbtype eq 'country' && $db->{city};
# skip asn if isp already loaded
next if $dbtype eq 'asn' && $db->{isp};
# skip if not needed
next if $geodb_opts->{wanted} && !$geodb_opts->{wanted}->{$dbtype};
# only autosearch if no absolute path given
if (!defined $geodb_opts->{dbs}->{$dbtype}) {
# Try some default locations
PATHS_GEOIP: foreach my $p (@{$geodb_opts->{search_path}}) {
foreach my $f (@{$geoip_default_files{$dbtype}}) {
if (-f "$p/$f") {
$path{$dbtype} = "$p/$f";
dbg("geodb: GeoIP: search found $dbtype $p/$f");
if ($can_ipv6 && $f =~ s/\.(dat)$/v6.$1/i) {
if (-f "$p/$f") {
$path{$dbtype."_v6"} = "$p/$f";
dbg("geodb: GeoIP: search found $dbtype $p/$f");
}
}
last PATHS_GEOIP;
}
}
}
} else {
if (!-f $geodb_opts->{dbs}->{$dbtype}) {
dbg("geodb: GeoIP: $dbtype database requested, but not found: ".
$geodb_opts->{dbs}->{$dbtype});
next;
}
$path{$dbtype} = $geodb_opts->{dbs}->{$dbtype};
}
}
if (!$can_ipv6) {
dbg("geodb: GeoIP: IPv6 support not enabled, versions Geo::IP 1.39, GeoIP C API 1.4.7 required");
}
if ($fix_stderr) {
open(OLDERR, ">&STDERR");
open(STDERR, ">/dev/null");
}
foreach my $dbtype (@geoip_types) {
next unless defined $path{$dbtype};
eval {
$db->{$dbtype} = Geo::IP->open($path{$dbtype}, Geo::IP->GEOIP_STANDARD | $flags);
if ($can_ipv6 && defined $path{$dbtype."_v6"}) {
$db->{$dbtype."_v6"} = Geo::IP->open($path{$dbtype."_v6"}, Geo::IP->GEOIP_STANDARD | $flags);
}
};
if ($@ || !$db->{$dbtype}) {
my $err = $@;
$err =~ s/ at .*//s;
dbg("geodb: GeoIP: database $path{$dbtype} load failed: $err");
} else {
dbg("geodb: GeoIP: loaded $dbtype from $path{$dbtype}");
$ok = 1;
}
}
if ($fix_stderr) {
open(STDERR, ">&OLDERR");
close(OLDERR);
}
if (!$ok) {
warn("geodb: GeoIP requested, but no databases could be loaded\n") if $errwarn;
return (undef, undef)
}
# dbinfo_DBTYPE()
$db->{city} and $dbapi->{dbinfo_city} = sub {
return "Geo::IP IPv4 city: " . ($_[0]->{db}->{city}->database_info || '?')." / IPv6: ".
($_[0]->{db}->{city_v6} ? $_[0]->{db}->{city_v6}->database_info || '?' : 'no')
};
$db->{country} and $dbapi->{dbinfo_country} = sub {
return "Geo::IP IPv4 country: " . ($_[0]->{db}->{country}->database_info || '?')." / IPv6: ".
($_[0]->{db}->{country_v6} ? $_[0]->{db}->{country_v6}->database_info || '?' : 'no')
};
$db->{isp} and $dbapi->{dbinfo_isp} = sub {
return "Geo::IP IPv4 isp: " . ($_[0]->{db}->{isp}->database_info || '?')." / IPv6: ".
($_[0]->{db}->{isp_v6} ? $_[0]->{db}->{isp_v6}->database_info || '?' : 'no')
};
$db->{asn} and $dbapi->{dbinfo_asn} = sub {
return "Geo::IP IPv4 asn: " . ($_[0]->{db}->{asn}->database_info || '?')." / IPv6: ".
($_[0]->{db}->{asn_v6} ? $_[0]->{db}->{asn_v6}->database_info || '?' : 'no')
};
# city()
$db->{city} and $dbapi->{city} = sub {
my $res = {};
my $city;
if ($_[1] =~ IS_IPV4_ADDRESS) {
$city = $_[0]->{db}->{city}->record_by_addr($_[1]);
} elsif ($_[0]->{db}->{city_v6}) {
$city = $_[0]->{db}->{city_v6}->record_by_addr_v6($_[1]);
}
if (!defined $city) {
dbg("geodb: GeoIP city query failed for $_[1]");
return $res;
}
$res->{city_name} = $city->city;
$res->{country} = $city->country_code;
$res->{country_name} = $city->country_name;
$res->{continent} = $city->continent_code;
return $res;
};
$dbapi->{city_v6} = $dbapi->{city} if $db->{city_v6};
# country()
$db->{country} and $dbapi->{country} = sub {
my $res = {};
my $country;
eval {
if ($_[1] =~ IS_IPV4_ADDRESS) {
$country = $_[0]->{db}->{country}->country_code_by_addr($_[1]);
} elsif ($_[0]->{db}->{country_v6}) {
$country = $_[0]->{db}->{country_v6}->country_code_by_addr_v6($_[1]);
}
1;
};
if (!defined $country) {
dbg("geodb: GeoIP country query failed for $_[1]");
return $res;
};
$res->{country} = $country || 'XX';
$res->{continent} = $country_to_continent{$country} || 'XX';
return $res;
};
$dbapi->{country_v6} = $dbapi->{country} if $db->{country_v6};
# isp()
$db->{isp} and $dbapi->{isp} = sub {
my $res = {};
my $isp;
eval {
if ($_[1] =~ IS_IPV4_ADDRESS) {
$isp = $_[0]->{db}->{isp}->isp_by_addr($_[1]);
} else {
# TODO?
return $res;
}
1;
};
if (!defined $isp) {
dbg("geodb: GeoIP isp query failed for $_[1]");
return $res;
};
$res->{isp} = $isp;
return $res;
};
# asn()
$db->{asn} and $dbapi->{asn} = sub {
my $res = {};
my $asn;
eval {
if ($_[1] =~ IS_IPV4_ADDRESS) {
$asn = $_[0]->{db}->{asn}->isp_by_addr($_[1]);
} else {
# TODO?
return $res;
}
1;
};
if (!defined $asn || $asn !~ /^((?:AS)?\d+)(?:\s+(.+))?/) {
dbg("geodb: GeoIP asn query failed for $_[1]");
return $res;
};
$res->{asn} = $1;
$res->{asn_organization} = $2 if defined $2;
return $res;
};
return ($db, $dbapi);
}
sub load_dbfile {
my ($self, $geodb_opts) = @_;
my ($db, $dbapi);
# Warn about fatal errors if this module was specifically requested
my $errwarn = ($geodb_opts->{module}||'') eq 'dbfile';
if (!defined $geodb_opts->{dbs}->{country}) {
my $err = "geodb: IP::Country::DB_File requires geodb_options country:/path/to/ipcc.db";
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
}
if (!-f $geodb_opts->{dbs}->{country}) {
my $err = "geodb: IP::Country::DB_File database not found: ".$geodb_opts->{dbs}->{country};
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
}
eval {
require IP::Country::DB_File;
$db->{country} = IP::Country::DB_File->new($geodb_opts->{dbs}->{country});
1;
};
if ($@ || !$db->{country}) {
my $err = $@;
$err =~ s/ at .*//s;
$err = "geodb: IP::Country::DB_File country load failed: $err";
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
} else {
dbg("geodb: IP::Country::DB_File loaded country from ".$geodb_opts->{dbs}->{country});
}
# dbinfo_DBTYPE()
$db->{country} and $dbapi->{dbinfo_country} = sub {
return "IP::Country::DB_File country: ".localtime($_[0]->{db}->{country}->db_time());
};
# country();
$db->{country} and $dbapi->{country} = $dbapi->{country_v6} = sub {
my $res = {};
my $country;
if ($_[1] =~ IS_IPV4_ADDRESS) {
$country = $_[0]->{db}->{country}->inet_atocc($_[1]);
} else {
$country = $_[0]->{db}->{country}->inet6_atocc($_[1]);
}
if (!defined $country) {
dbg("geodb: IP::Country::DB_File country query failed for $_[1]");
return $res;
};
$res->{country} = $country || 'XX';
$res->{continent} = $country_to_continent{$country} || 'XX';
return $res;
};
return ($db, $dbapi);
}
sub load_fast {
my ($self, $geodb_opts) = @_;
my ($db, $dbapi);
# Warn about fatal errors if this module was specifically requested
my $errwarn = ($geodb_opts->{module}||'') eq 'fast';
eval {
require IP::Country::Fast;
$db->{country} = IP::Country::Fast->new();
1;
};
if ($@ || !$db->{country}) {
my $err = $@;
$err =~ s/ at .*//s;
$err = "geodb: IP::Country::Fast load failed: $err";
$errwarn ? warn("$err\n") : dbg($err);
return (undef, undef);
}
# dbinfo_DBTYPE()
$db->{country} and $dbapi->{dbinfo_country} = sub {
return "IP::Country::Fast country: ".localtime($_[0]->{db}->{country}->db_time());
};
# country();
$db->{country} and $dbapi->{country} = sub {
my $res = {};
my $country;
if ($_[1] =~ IS_IPV4_ADDRESS) {
$country = $_[0]->{db}->{country}->inet_atocc($_[1]);
} else {
return $res;
}
if (!defined $country) {
dbg("geodb: IP::Country::Fast country query failed for $_[1]");
return $res;
};
$res->{country} = $country || 'XX';
$res->{continent} = $country_to_continent{$country} || 'XX';
return $res;
};
return ($db, $dbapi);
}
# return array, infoline per database type
sub get_dbinfo {
my ($self, $db) = @_;
my @lines;
foreach (@geoip_types) {
if (exists $self->{dbapi}->{"dbinfo_".$_}) {
push @lines,
$self->{dbapi}->{"dbinfo_".$_}->($self) || "$_ failed";
}
}
return \@lines;
}
sub get_country {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
if ($ip =~ IS_IP_PRIVATE) {
return '**';
}
if ($ip !~ IS_IP_ADDRESS) {
$ip = name_to_ip($ip);
return 'XX' if !defined $ip;
}
if ($self->{dbapi}->{city}) {
return $self->_get('city',$ip)->{country} || 'XX';
} elsif ($self->{dbapi}->{country}) {
return $self->_get('country',$ip)->{country} || 'XX';
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_continent {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
# If it's already CC, use our own lookup table..
if (length($ip) == 2) {
return $country_to_continent{uc($ip)} || 'XX';
}
if ($self->{dbapi}->{city}) {
return $self->_get('city',$ip)->{continent} || 'XX';
} elsif ($self->{dbapi}->{country}) {
return $self->_get('country',$ip)->{continent} || 'XX';
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_isp {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
if ($self->{dbapi}->{isp}) {
return $self->_get('isp',$ip)->{isp};
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_isp_org {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
if ($self->{dbapi}->{isp}) {
return $self->_get('isp',$ip)->{organization};
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_asn {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
if ($self->{dbapi}->{asn}) {
return $self->_get('asn',$ip)->{asn};
} elsif ($self->{dbapi}->{isp}) {
return $self->_get('isp',$ip)->{asn};
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_asn_org {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
if ($self->{dbapi}->{asn}) {
return $self->_get('asn',$ip)->{asn_organization};
} elsif ($self->{dbapi}->{isp}) {
return $self->_get('isp',$ip)->{asn_organization};
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
sub get_all {
my ($self, $ip) = @_;
return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
my $all = {};
if ($ip =~ IS_IP_PRIVATE) {
return { 'country' => '**' };
}
if ($ip !~ IS_IP_ADDRESS) {
$ip = name_to_ip($ip);
if (!defined $ip) {
return { 'country' => 'XX' };
}
}
if ($self->{dbapi}->{city}) {
my $res = $self->_get('city',$ip);
$all->{$_} = $res->{$_} foreach (keys %$res);
} elsif ($self->{dbapi}->{country}) {
my $res = $self->_get('country',$ip);
$all->{$_} = $res->{$_} foreach (keys %$res);
}
if ($self->{dbapi}->{isp}) {
my $res = $self->_get('isp',$ip);
$all->{$_} = $res->{$_} foreach (keys %$res);
}
if ($self->{dbapi}->{asn}) {
my $res = $self->_get('asn',$ip);
$all->{$_} = $res->{$_} foreach (keys %$res);
}
return $all;
}
sub can {
my ($self, $check) = @_;
return defined $self->{dbapi}->{$check};
}
# TODO: use SA internal dns synchronously?
# This shouldn't be called much, as plugins
# should do their own resolving if needed
sub name_to_ip {
my $name = shift;
if (my $ip = inet_aton($name)) {
$ip = inet_ntoa($ip);
dbg("geodb: resolved internally $name: $ip");
return $ip;
}
dbg("geodb: failed to internally resolve $name");
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
sub _get {
my ($self, $type, $ip) = @_;
# reset cache at 100 ips
if (scalar keys %{$self->{cache}} >= 100) {
$self->{cache} = ();
}
if (!exists $self->{cache}{$ip}{$type}) {
if ($self->{dbapi}->{$type}) {
$self->{cache}{$ip}{$type} = $self->{dbapi}->{$type}->($self,$ip);
} else {
return undef; ## no critic (ProhibitExplicitReturnUndef)
}
}
return $self->{cache}{$ip}{$type};
}
1;

View File

@ -24,9 +24,6 @@ use strict;
use warnings;
use re 'taint';
require 5.008; # need basic Unicode support for HTML::Parser::utf8_mode
# require 5.008008; # Bug 3787; [perl #37950]: Malformed UTF-8 character ...
use HTML::Parser 3.43 ();
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:sa);
@ -66,7 +63,7 @@ my %elements_whitespace = map {; $_ => 1 }
# elements that push URIs
my %elements_uri = map {; $_ => 1 }
qw( body table tr td a area link img frame iframe embed script form base bgsound ),
qw( body table tr td a area link img frame iframe embed script form base bgsound meta ),
;
# style attribute not accepted
@ -248,6 +245,18 @@ sub parse {
# the HTML::Parser API won't do it for us
$text =~ s/<(\w+)\s*\/>/<$1>/gi;
# Normalize unicode quotes, messes up attributes parsing
# U+201C e2 80 9c LEFT DOUBLE QUOTATION MARK
# U+201D e2 80 9d RIGHT DOUBLE QUOTATION MARK
# Examples of input:
# <a href=\x{E2}\x{80}\x{9D}https://foobar.com\x{E2}\x{80}\x{9D}>
# .. results in uri "\x{E2}\x{80}\x{9D}https://foobar.com\x{E2}\x{80}\x{9D}"
if (utf8::is_utf8($text)) {
$text =~ s/(?:\x{201C}|\x{201D})/"/g;
} else {
$text =~ s/\x{E2}\x{80}(?:\x{9C}|\x{9D})/"/g;
}
if (!$self->UNIVERSAL::can('utf8_mode')) {
# utf8_mode is cleared by default, only warn if it would need to be set
warn "message: cannot set utf8_mode, module HTML::Parser is too old\n"
@ -352,8 +361,8 @@ sub canon_uri {
my ($self, $uri) = @_;
# URIs don't have leading/trailing whitespace ...
$uri =~ s/^\s+//;
$uri =~ s/\s+$//;
$uri =~ s/^[\s\xA0]+//;
$uri =~ s/[\s\xA0]+$//;
# Make sure all the URIs are nice and short
if (length $uri > MAX_URI_LENGTH) {
@ -414,6 +423,17 @@ sub html_uri {
}
}
}
elsif ($tag eq "meta" &&
exists $attr->{'http-equiv'} &&
exists $attr->{content} &&
$attr->{'http-equiv'} =~ /refresh/i &&
$attr->{content} =~ /\burl\s*=/i)
{
my $uri = $attr->{content};
$uri =~ s/^.*\burl\s*=\s*//i;
$uri =~ s/\s*;.*//i;
$self->push_uri($tag, $uri);
}
}
# this might not be quite right, may need to pay attention to table nesting
@ -516,7 +536,7 @@ sub text_style {
my $whcolor = $1 ? 'bgcolor' : 'fgcolor';
my $value = lc $2;
if ($value =~ /rgb/) {
if (index($value, 'rgb') >= 0) {
$value =~ tr/0-9,//cd;
my @rgb = split(/,/, $value);
$new{$whcolor} = sprintf("#%02x%02x%02x",
@ -705,6 +725,8 @@ sub html_tests {
{
$self->{charsets} .= exists $self->{charsets} ? " $1" : $1;
}
# todo: capture URI from meta refresh tag
}
sub display_text {
@ -1154,7 +1176,7 @@ sub _merge_uri {
return "/" . $r_path;
}
else {
if ($base_path =~ m|/|) {
if (index($base_path, '/') >= 0) {
$base_path =~ s|(?<=/)[^/]*$||;
}
else {

View File

@ -78,9 +78,6 @@ sub is_charset_ok_for_locales {
$cs =~ s/:.*$//gs; # trim off multiple charsets, just use 1st
dbg ("locales: is $cs ok for @locales?");
study $cs; # study is a no-op since perl 5.16.0, eliminating related bugs
#warn "JMD $cs";
# always OK (the net speaks mostly roman charsets)
return 1 if ($cs eq 'USASCII');
return 1 if ($cs eq 'ASCII');

View File

@ -69,6 +69,11 @@ sub jittery_one_second_sleep {
Time::HiRes::sleep(rand(1.0) + 0.5);
}
sub jittery_half_second_sleep {
my ($self) = @_;
Time::HiRes::sleep(rand(0.5) + 0.25);
}
###########################################################################
1;

View File

@ -55,7 +55,7 @@ sub safe_lock {
my $lock_file = "$path.mutex";
my $umask = umask(~$mode);
my $fh = new IO::File();
my $fh = IO::File->new;
if (!$fh->open ($lock_file, O_RDWR|O_CREAT)) {
umask $umask; # just in case

View File

@ -29,6 +29,7 @@ use Mail::SpamAssassin::Logger;
use File::Spec;
use Time::Local;
use Fcntl qw(:DEFAULT :flock);
use Errno qw(EEXIST);
our @ISA = qw(Mail::SpamAssassin::Locker);
@ -60,7 +61,7 @@ sub safe_lock {
$max_retries ||= 30;
$mode ||= "0700";
$mode = (oct $mode) & 0666;
dbg ("locker: mode is $mode");
dbg ("locker: mode is %03o", $mode);
my $lock_file = "$path.lock";
my $hname = Mail::SpamAssassin::Util::fq_hostname();
@ -76,11 +77,11 @@ sub safe_lock {
die "locker: safe_lock: cannot create tmp lockfile $lock_tmp for $lock_file: $!\n";
}
umask $umask;
autoflush LTMP 1;
LTMP->autoflush(1);
dbg("locker: safe_lock: created $lock_tmp");
for (my $retries = 0; $retries < $max_retries; $retries++) {
if ($retries > 0) { $self->jittery_one_second_sleep(); }
for (my $retries = 0; $retries < $max_retries * 2; $retries++) {
if ($retries > 0) { $self->jittery_half_second_sleep(); }
print LTMP "$hname.$$\n" or warn "Error writing to $lock_tmp: $!";
dbg("locker: safe_lock: trying to get lock on $path with $retries retries");
if (link($lock_tmp, $lock_file)) {
@ -88,6 +89,10 @@ sub safe_lock {
$is_locked = 1;
last;
}
# if lock exists, it's already likely locked, no point complaining here
unless ($!{EEXIST}) {
warn "locker: creating link $lock_file to $lock_tmp failed: '$!'";
}
# link _may_ return false even if the link _is_ created
@stat = lstat($lock_tmp);
@stat or warn "locker: error accessing $lock_tmp: $!";
@ -149,7 +154,7 @@ sub safe_unlock {
warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp: $!";
return;
} else {
autoflush LTMP 1;
LTMP->autoflush(1);
print LTMP "\n" or warn "Error writing to $lock_tmp: $!";
if (!(@stat_ourtmp = stat(LTMP)) || (scalar(@stat_ourtmp) < 11)) {
@ -157,7 +162,7 @@ sub safe_unlock {
warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp";
close LTMP or die "error closing $lock_tmp: $!";
unlink($lock_tmp)
or warn "locker: safe_lock: unlink of lock file failed: $!\n";
or warn "locker: safe_lock: unlink of lock file $lock_tmp failed: $!\n";
return;
}
}
@ -169,7 +174,7 @@ sub safe_unlock {
close LTMP or die "error closing $lock_tmp: $!";
unlink($lock_tmp)
or warn "locker: safe_lock: unlink of lock file failed: $!\n";
or warn "locker: safe_lock: unlink of lock file $lock_tmp failed: $!\n";
# 2. If the ctime hasn't been modified, unlink the file and return. If the
# lock has expired, sleep the usual random interval before returning. If we
@ -191,7 +196,7 @@ sub safe_unlock {
{
# things are good: the ctimes match so it was our lock
unlink($lock_file)
or warn "locker: safe_unlock: unlink failed: $lock_file\n";
or warn "locker: safe_unlock: unlinking $lock_file failed: $!\n";
dbg("locker: safe_unlock: unlink $lock_file");
if ($ourtmp_ctime >= $lock_ctime + LOCK_MAX_AGE) {

View File

@ -72,7 +72,7 @@ sub safe_lock {
return 1;
}
my @stat = stat($lock_file);
@stat or warn "locker: error accessing $lock_file: $!";
@stat or dbg("locker: error accessing $lock_file: $!");
# check age of lockfile ctime
my $age = ($#stat < 11 ? undef : $stat[10]);

View File

@ -75,7 +75,22 @@ $LOG_SA{facility} = {}; # no dbg facilities turned on
# always log to stderr initially
use Mail::SpamAssassin::Logger::Stderr;
$LOG_SA{method}->{stderr} = Mail::SpamAssassin::Logger::Stderr->new();
$LOG_SA{method}->{stderr} =
Mail::SpamAssassin::Logger::Stderr->new(escape =>
exists $ENV{'SA_LOGGER_ESCAPE'} ? $ENV{'SA_LOGGER_ESCAPE'} : 1
);
# Use of M:SA:Util causes circular dependencies, separate helper here.
my %escape_map =
("\r" => '\\r', "\n" => '\\n', "\t" => '\\t', "\\" => '\\\\');
sub escape_str {
# Things are already forced as octets by _log, no utf8::encode needed
# Control chars, DEL, backslash
$_[0] =~ s@
( [\x00-\x1F\x7F\x80-\xFF\\] )
@ $escape_map{$1} || sprintf("\\x{%02X}",ord($1))
@egsx;
}
=head1 METHODS
@ -165,7 +180,7 @@ sub log_message {
# don't log them -- this is caller 0, the use'ing package is 1, the eval is 2
my @caller = caller 2;
return if (defined $caller[3] && defined $caller[0] &&
$caller[3] =~ /^\(eval\)$/ &&
$caller[3] eq '(eval)' &&
$caller[0] =~ m#^Mail::SpamAssassin(?:$|::)#);
}
@ -215,13 +230,15 @@ sub _log_message {
foreach my $line (split(/\n/, $_[1])) {
# replace control characters with "_", tabs and spaces get
# replaced with a single space.
$line =~ tr/\x09\x20\x00-\x1f/ _/s;
# Deprecated here, see new Bug 6583 escaping in Logger/*.pm modules
#$line =~ tr/\x09\x20\x00-\x1f/ _/s;
if ($first) {
$first = 0;
} else {
local $1;
$line =~ s/^([^:]+?):/$1: [...]/;
}
while (my ($name, $object) = each %{ $LOG_SA{method} }) {
$object->log_message($_[0], $line, $_[2]);
}
@ -274,6 +291,9 @@ sub _log {
}
my ($level, $message, @args) = @_;
utf8::encode($message) if utf8::is_utf8($message); # handle as octets
$message =~ s/^(?:[a-z0-9_-]*):\s*//i;
$message = sprintf($message,@args) if @args;
@ -284,18 +304,28 @@ sub _log {
log_message(($level == INFO ? "info" : "dbg"), $message);
}
=item add(method => 'syslog', socket => $socket, facility => $facility)
=item add(method => 'syslog', socket => $socket, facility => $facility, escape => $escape)
C<socket> is the type the syslog ("unix" or "inet"). C<facility> is the
syslog facility (typically "mail").
=item add(method => 'file', filename => $file)
If optional C<escape> is true, all non-ascii characters are escaped for safe
output: backslashes change to \\ and non-ascii chars to \x{XX} or \x{XXXX}
(Unicode). If not defined, pre-4.0 style sanitizing is used
( tr/\x09\x20\x00-\x1f/_/s ).
C<filename> is the name of the log file.
Escape value can be overridden with environment variable
C<SA_LOGGER_ESCAPE>.
=item add(method => 'stderr')
=item add(method => 'file', filename => $file, escape => $escape)
C<filename> is the name of the log file. C<escape> works as described
above.
=item add(method => 'stderr', escape => $escape)
No options are needed for stderr logging, just don't close stderr first.
C<escape> works as described above.
=cut
@ -307,6 +337,10 @@ sub add {
return 0 if $class !~ /^\w+$/; # be paranoid
if (exists $ENV{'SA_LOGGER_ESCAPE'}) {
$params{escape} = $ENV{'SA_LOGGER_ESCAPE'}
}
eval 'use Mail::SpamAssassin::Logger::'.$class.'; 1'
or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;

View File

@ -37,13 +37,17 @@ use re 'taint';
use POSIX ();
use Time::HiRes ();
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(am_running_on_windows);
our @ISA = ();
# ADDING OS-DEPENDENT LINE TERMINATOR - BUG 6456
# Using Mail::SpamAssassin::Util::am_running_on_windows() leads to circular
# dependencies. So, we are duplicating the code instead.
use constant RUNNING_ON_WINDOWS => ($^O =~ /^(?:mswin|dos|os2)/oi);
my $eol = "\n";
if (am_running_on_windows()) {
if (RUNNING_ON_WINDOWS) {
$eol = "\r\n";
}
@ -58,6 +62,7 @@ sub new {
my %params = @_;
$self->{filename} = $params{filename} || 'spamassassin.log';
$self->{timestamp_fmt} = $params{timestamp_fmt};
$self->{escape} = $params{escape} if exists $params{escape};
if (! $self->init()) {
die "logger: file initialization failed$eol";
@ -101,6 +106,16 @@ sub log_message {
}
$timestamp .= ' ' if $timestamp ne '';
if ($self->{escape}) {
# Bug 6583, escape
Mail::SpamAssassin::Logger::escape_str($msg);
} elsif (!exists $self->{escape}) {
# Backwards compatible pre-4.0 escaping, if $escape not given.
# replace control characters with "_", tabs and spaces get
# replaced with a single space.
$msg =~ tr/\x09\x20\x00-\x1f/ _/s;
}
my($nwrite) = syswrite(STDLOG, sprintf("%s[%s] %s: %s%s",
$timestamp, $$, $level, $msg, $eol));
defined $nwrite or warn "error writing to log file: $!";

View File

@ -36,6 +36,7 @@ use re 'taint';
use POSIX ();
use Time::HiRes ();
use Mail::SpamAssassin::Logger;
our @ISA = ();
@ -59,6 +60,7 @@ sub new {
my %params = @_;
$self->{timestamp_fmt} = $params{timestamp_fmt};
$self->{escape} = $params{escape} if exists $params{escape};
return($self);
}
@ -83,6 +85,16 @@ sub log_message {
}
$timestamp .= ' ' if $timestamp ne '';
if ($self->{escape}) {
# Bug 6583, escape
Mail::SpamAssassin::Logger::escape_str($msg);
} elsif (!exists $self->{escape}) {
# Backwards compatible pre-4.0 escaping, if $escape not given.
# replace control characters with "_", tabs and spaces get
# replaced with a single space.
$msg =~ tr/\x09\x20\x00-\x1f/ _/s;
}
my($nwrite) = syswrite(STDERR, sprintf("%s[%d] %s: %s%s",
$timestamp, $$, $level, $msg, $eol));
defined $nwrite or warn "error writing to log file: $!";

View File

@ -73,6 +73,7 @@ sub new {
$self->{log_socket} = $params{socket};
$self->{log_facility} = $params{facility};
$self->{timestamp_fmt} = $params{timestamp_fmt};
$self->{escape} = $params{escape} if exists $params{escape};
if (! $self->init()) {
die "logger: syslog initialization failed\n";
@ -148,6 +149,16 @@ sub log_message {
$msg = '(bad prio: ' . $_[1] . ') ' . $msg;
}
if ($self->{escape}) {
# Bug 6583, escape
Mail::SpamAssassin::Logger::escape_str($msg);
} elsif (!exists $self->{escape}) {
# Backwards compatible pre-4.0 escaping, if $escape not given
# replace control characters with "_", tabs and spaces get
# replaced with a single space.
$msg =~ tr/\x09\x20\x00-\x1f/ _/s;
}
# install a new handler for SIGPIPE -- this signal has been
# found to occur with syslog-ng after syslog-ng restarts.
local $SIG{'PIPE'} = sub {

View File

@ -46,10 +46,8 @@ use strict;
use warnings;
use re 'taint';
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) }
}
use Digest::SHA qw(sha1 sha1_hex);
use Scalar::Util qw(tainted);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Message::Node;
@ -159,7 +157,7 @@ sub new {
if (ref $message eq 'ARRAY') {
@message = @{$message};
}
elsif (ref($message) eq 'GLOB' || ref($message) =~ /^IO::/) {
elsif (ref($message) eq 'GLOB' || index(ref($message), 'IO::') == 0) {
if (defined fileno $message) {
# sysread+split avoids a Perl I/O bug (Bug 5985)
@ -207,14 +205,14 @@ sub new {
# messages? Tainting the message is important because it prevents certain
# exploits later.
if (Mail::SpamAssassin::Util::am_running_in_taint_mode() &&
grep { !Scalar::Util::tainted($_) } @message) {
grep { !tainted($_) } @message) {
local($_);
# To preserve newlines, no joining and splitting here, process each line
# directly as is.
foreach (@message) {
$_ = Mail::SpamAssassin::Util::taint_var($_);
}
if (grep { !Scalar::Util::tainted($_) } @message) {
if (grep { !tainted($_) } @message) {
die "Mail::SpamAssassin::Message failed to enforce message taintness";
}
}
@ -256,7 +254,7 @@ sub new {
# bug 4363
# Check to see if we should do CRLF instead of just LF
# For now, just check the first and last line and do whatever it does
if (@message && ($message[0] =~ /\015\012/ || $message[-1] =~ /\015\012/)) {
if (index($message[0], "\015\012") != -1 || index($message[-1], "\015\012") != -1) {
$self->{line_ending} = "\015\012";
dbg("message: line ending changed to CRLF");
}
@ -270,7 +268,12 @@ sub new {
for (;;) {
# make sure not to lose the last header field when there is no body
my $eof = !@message;
my $current = $eof ? "\n" : shift @message;
my $current = $eof ? $self->{line_ending} : shift @message;
# Bug 7785: spamass-milter breaks wrapped headers, add any missing \r
if ($squash_crlf) {
$current =~ s/(?<!\015)\012/\015\012/gs;
}
if ( $current =~ /^[ \t]/ ) {
# This wasn't useful in terms of a rule, but we may want to treat it
@ -306,7 +309,7 @@ sub new {
}
}
if ($current =~ /^\r?$/) { # a regular end of a header section
if ($current eq $self->{line_ending}) { # a regular end of a header section
if ($eof) {
$self->{'missing_head_body_separator'} = 1;
} else {
@ -395,7 +398,7 @@ sub new {
# either a blank line or the boundary (if defined), insert a blank line
# to ensure proper parsing - do not consider MIME headers at the beginning of the body
# to be part of the message headers.
if ($self->{'type'} =~ /^multipart\//i && $#message > 0 && $message[0] =~ /\S/)
if (index($self->{'type'}, 'multipart/') == 0 && $#message > 0 && $message[0] =~ /\S/)
{
if (!defined $boundary || $message[0] !~ /^--\Q$boundary\E/)
{
@ -537,6 +540,77 @@ sub get_pristine_body {
return $self->{pristine_body};
}
=item get_pristine_body_digest()
Returns SHA1 hex digest of the pristine message body.
CRLF line endings are normalized to LF before hashing.
=cut
sub get_pristine_body_digest {
my ($self) = @_;
return $self->{pristine_body_digest} if exists $self->{pristine_body_digest};
if ($self->{line_ending} eq "\015\012") {
# Don't make a copy, process line by line to save memory
# CRLF should be exception, so it's not that critical here
my $sha = Digest::SHA->new('sha1');
while ($self->{pristine_body} =~ /(.*?)(\015\012)?/gs) {
$sha->add($1.(defined $2 ? "\012" : ""));
}
$self->{pristine_body_digest} = $sha->hexdigest;
} else {
$self->{pristine_body_digest} = sha1_hex($self->{pristine_body});
}
dbg("message: pristine body digest: ".$self->{pristine_body_digest});
return $self->{pristine_body_digest};
}
# ---------------------------------------------------------------------------
=item get_msgid()
Returns Message-ID header for the message, with <> and surrounding
whitespace removed. Returns undef, if nothing found between <>.
=cut
sub get_msgid {
my ($self) = @_;
my $msgid = $self->get_header("Message-Id");
if (defined $msgid && $msgid =~ /^\s*<(.+)>\s*$/s) {
return $1;
} else {
return;
}
}
=item generate_msgid()
Generate a calculated "Message-ID" in B<sha1hex@sa_generated> format, using
To, Date headers and pristine body as source for hashing.
=cut
sub generate_msgid {
my ($self) = @_;
return $self->{msgid_generated} if exists $self->{msgid_generated};
# See Bug 5185, not using Received headers etc anymore
my $to = $self->get_header("To") || '';
my $date = $self->get_header("Date") || '';
my $body_digest = $self->get_pristine_body_digest();
$self->{msgid_generated} =
sha1_hex($to."\000".$date."\000".$body_digest).'@sa_generated';
return $self->{msgid_generated};
}
# ---------------------------------------------------------------------------
=item extract_message_metadata($permsgstatus)
@ -706,6 +780,7 @@ sub finish {
# temporary files are deleted even if the finish() method is omitted
sub DESTROY {
my $self = shift;
# best practices: prevent potential calls to eval and to system routines
# in code of a DESTROY method from clobbering global variables $@ and $!
local($@,$!); # keep outer error handling unaffected by DESTROY
@ -774,7 +849,7 @@ sub parse_body {
#
my ($msg, $boundary, $body, $subparse) = @$toparse;
if ($msg->{'type'} =~ m{^multipart/}i && defined $boundary && $subparse > 0) {
if (index($msg->{'type'}, 'multipart/') == 0 && defined $boundary && $subparse > 0) {
$self->_parse_multipart($toparse);
}
else {
@ -782,7 +857,8 @@ sub parse_body {
$self->_parse_normal($toparse);
# bug 5041: process message/*, but exclude message/partial content types
if ($msg->{'type'} =~ m{^message/(?!partial\z)}i && $subparse > 0)
if (index($msg->{'type'}, 'message/') == 0 &&
$msg->{'type'} ne 'message/partial' && $subparse > 0)
{
# Just decode the part, but we don't need the resulting string here.
$msg->decode(0);
@ -798,7 +874,7 @@ sub parse_body {
# bug 5051, bug 3748: check $msg->{decoded}: sometimes message/* parts
# have no content, and we get stuck waiting for STDIN, which is bad. :(
if ($msg->{'type'} =~ m{^message/(?:rfc822|global)\z}i &&
if (($msg->{'type'} eq 'message/rfc822' || $msg->{'type'} eq 'message/global') &&
defined $msg->{'decoded'} && $msg->{'decoded'} ne '')
{
# Ok, so this part is still semi-recursive, since M::SA::Message
@ -857,6 +933,7 @@ sub _parse_multipart {
my($self, $toparse) = @_;
my ($msg, $boundary, $body, $subparse) = @{$toparse};
my $nested_boundary = 0;
# we're not supposed to be a leaf, so prep ourselves
$msg->{'body_parts'} = [];
@ -907,6 +984,7 @@ sub _parse_multipart {
my $header;
my $part_array;
my $found_end_boundary;
my $found_last_end_boundary;
my $partcnt = 0;
my $line_count = @{$body};
@ -915,7 +993,12 @@ sub _parse_multipart {
# deal with the mime part;
# a triage before an unlikely-to-match regexp avoids a CPU hotspot
$found_end_boundary = defined $boundary && substr($_,0,2) eq '--'
&& /^--\Q$boundary\E(?:--)?\s*$/;
&& /^--\Q$boundary\E(--)?\s*$/;
$found_last_end_boundary = $found_end_boundary && $1;
if ($found_end_boundary && $nested_boundary) {
$found_end_boundary = 0;
$nested_boundary = 0 if ($found_last_end_boundary); # bug 7358 - handle one level of non-unique boundary string
}
if ( --$line_count == 0 || $found_end_boundary ) {
my $line = $_; # remember the last line
@ -951,8 +1034,20 @@ sub _parse_multipart {
$part_array = [];
}
my($p_boundary);
($part_msg->{'type'}, $p_boundary) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
($part_msg->{'type'}, my $p_boundary, undef, undef, my $ct_was_missing) =
Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
# bug 5741: if ct was missing and parent == multipart/digest, then
# type should be set as message/rfc822
if ($ct_was_missing) {
if ($msg->{'type'} eq 'multipart/digest') {
dbg("message: missing type, setting multipart/digest child as message/rfc822");
$part_msg->{'type'} = 'message/rfc822';
} else {
dbg("message: missing type, setting as default text/plain");
}
}
$p_boundary ||= $boundary;
dbg("message: found part of type ".$part_msg->{'type'}.", boundary: ".(defined $p_boundary ? $p_boundary : ''));
@ -962,12 +1057,8 @@ sub _parse_multipart {
push(@{$self->{'parse_queue'}}, [ $part_msg, $p_boundary, $part_array, $subparse ]);
$msg->add_body_part($part_msg);
# rfc 1521 says /^--boundary--$/, some MUAs may just require /^--boundary--/
# but this causes problems with horizontal lines when the boundary is
# made up of dashes as well, etc.
if (defined $boundary) {
# no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
if ($line =~ /^--\Q${boundary}\E--\s*$/) {
if ($found_last_end_boundary) {
# Make a note that we've seen the end boundary
$self->{mime_boundary_state}->{$boundary}--;
last;
@ -1016,6 +1107,12 @@ sub _parse_multipart {
if ($header) {
my ( $key, $value ) = split ( /:\s*/, $header, 2 );
$part_msg->header( $key, $value );
if (defined $boundary && lc $key eq 'content-type') {
my (undef, $nested_bound) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
if (defined $nested_bound && $nested_bound eq $boundary) {
$nested_boundary = 1;
}
}
}
$in_body = 1;
@ -1070,12 +1167,18 @@ sub _parse_normal {
dbg("message: parsing normal part");
# 0: content-type, 1: boundary, 2: charset, 3: filename
# 0: content-type, 1: boundary, 2: charset, 3: filename 4: ct_missing
my @ct = Mail::SpamAssassin::Util::parse_content_type($msg->header('content-type'));
# multipart sections are required to have a boundary set ... If this
# one doesn't, assume it's malformed and revert to text/plain
$msg->{'type'} = ($ct[0] !~ m@^multipart/@i || defined $boundary ) ? $ct[0] : 'text/plain';
# bug 5741: don't overwrite the default type assigned by _parse_multipart()
if (!$ct[4]) {
$msg->{'type'} = (index($ct[0], 'multipart/') != 0 || defined $boundary) ?
$ct[0] : 'text/plain'
} else {
dbg("message: missing type, setting previous multipart type: %s", $msg->{'type'});
}
$msg->{'charset'} = $ct[2];
# attempt to figure out a name for this attachment if there is one ...
@ -1086,9 +1189,6 @@ sub _parse_normal {
elsif ($ct[3]) {
$msg->{'name'} = $ct[3];
}
if ($msg->{'name'}) {
$msg->{'name'} = Encode::decode("MIME-Header", $msg->{'name'});
}
$msg->{'boundary'} = $boundary;
@ -1096,7 +1196,8 @@ sub _parse_normal {
# ahead and write the part data out to a temp file -- why keep sucking
# up RAM with something we're not going to use?
#
if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) {
unless ($msg->{'type'} eq 'text/plain' || $msg->{'type'} eq 'text/html' ||
index($msg->{'type'}, 'message/') == 0) {
my($filepath, $fh);
eval {
($filepath, $fh) = Mail::SpamAssassin::Util::secure_tmpfile(); 1;
@ -1131,7 +1232,7 @@ sub get_mimepart_digests {
if (!exists $self->{mimepart_digests}) {
# traverse all parts which are leaves, recursively
$self->{mimepart_digests} =
[ map(sha1_hex($_->decode) . ':' . lc($_->{type}||''),
[ map(sha1_hex($_->decode) . ':' . ($_->{type}||''),
$self->find_parts(qr/^/,1,1)) ];
}
return $self->{mimepart_digests};
@ -1202,16 +1303,19 @@ sub get_body_text_array_common {
# text/plain rendered as html otherwise.
if ($html_needs_setting && $type eq 'text/html') {
$self->{metadata}->{html} = $p->{html_results};
push @{$self->{metadata}->{html_all}}, $p->{html_results};
}
}
}
# whitespace handling (warning: small changes have large effects!)
$text =~ s/\n+\s*\n+/\f/gs; # double newlines => form feed
$text =~ s/\n+\s*\n+/\x00/gs; # double newlines => null
# $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace (incl. VT, NBSP) => space
$text =~ tr/ \t\n\r\x0b/ /s; # whitespace (incl. VT) => space
$text =~ tr/\f/\n/; # form feeds => newline
# $text =~ tr/ \t\n\r\x0b/ /s; # whitespace (incl. VT) => single space
$text =~ s/\s+/ /gs; # Unicode whitespace => single space
$text =~ tr/\x00/\n/; # null => newline
utf8::encode($text) if utf8::is_utf8($text);
my @textary = split_into_array_of_short_lines($text);
$self->{$key} = \@textary;
@ -1246,7 +1350,7 @@ sub get_decoded_body_text_array {
my $scansize = $self->{rawbody_part_scan_size};
# Find all parts which are leaves
my @parts = $self->find_parts(qr/^(?:text|message)\b/i,1);
my @parts = $self->find_parts(qr/^(?:text|message)\b/,1);
return $self->{text_decoded} unless @parts;
# Go through each part

View File

@ -50,6 +50,10 @@ use Mail::SpamAssassin::Dns;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Constants qw(:ip);
my $IP_ADDRESS = IP_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
my $LOCALHOST = LOCALHOST;
# ---------------------------------------------------------------------------
sub parse_received_headers {
@ -117,10 +121,6 @@ sub parse_received_headers {
}
}
my $IP_ADDRESS = IP_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
my $LOCALHOST = LOCALHOST;
my @hdrs = $msg->get_header('Received');
# Now add the single line headers like X-Originating-IP. (bug 5680)
@ -336,9 +336,6 @@ sub parse_received_line {
my $ident = '';
my $envfrom = undef;
my $mta_looked_up_dns = 0;
my $IP_ADDRESS = IP_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
my $LOCALHOST = LOCALHOST;
my $auth = '';
# ---------------------------------------------------------------------------
@ -430,7 +427,7 @@ sub parse_received_line {
$auth = "GMX ($4 / $3)";
}
# Critical Path Messaging Server
elsif (/ \(authenticated as /&&/\) by .+ \(\d{1,2}\.\d\.\d{3}(?:\.\d{1,3})?\) \(authenticated as .+\) id /) {
elsif (/ \(authenticated as / && /\) by .+ \(\d{1,2}\.\d\.\d{3}(?:\.\d{1,3})?\) \(authenticated as .+\) id /) {
$auth = 'CriticalPath';
}
# Postfix 2.3 and later with "smtpd_sasl_authenticated_header yes"
@ -820,7 +817,7 @@ sub parse_received_line {
# Received: from [193.220.176.134] by web40310.mail.yahoo.com via HTTP;
# Wed, 12 Feb 2003 14:22:21 PST
if (/ via HTTP$/&&/^\[(${IP_ADDRESS})\] by (\S+) via HTTP$/) {
if (/ via HTTP$/ && /^\[(${IP_ADDRESS})\] by (\S+) via HTTP$/) {
$ip = $1; $by = $2; goto enough;
}
@ -944,13 +941,13 @@ sub parse_received_line {
# Received: from [129.24.215.125] by ws1-7.us4.outblaze.com with http for
# _bushisevil_@mail.com; Thu, 13 Feb 2003 15:59:28 -0500
if (/ with http for /&&/^\[(${IP_ADDRESS})\] by (\S+) with http for /) {
if (/ with http for / && /^\[(${IP_ADDRESS})\] by (\S+) with http for /) {
$ip = $1; $by = $2; goto enough;
}
# Received: from snake.corp.yahoo.com(216.145.52.229) by x.x.org via smap (V1.3)
# id xma093673; Wed, 26 Mar 03 20:43:24 -0600
if (/ via smap /&&/^(\S+)\((${IP_ADDRESS})\) by (\S+) via smap /) {
if (/ via smap / && /^(\S+)\((${IP_ADDRESS})\) by (\S+) via smap /) {
$mta_looked_up_dns = 1;
$rdns = $1; $ip = $2; $by = $3; goto enough;
}
@ -965,13 +962,13 @@ sub parse_received_line {
# Received: from [192.168.0.71] by web01-nyc.clicvu.com (Post.Office MTA
# v3.5.3 release 223 ID# 0-64039U1000L100S0V35) with SMTP id com for
# <x@x.org>; Tue, 25 Mar 2003 11:42:04 -0500
if (/ \(Post/&&/^\[(${IP_ADDRESS})\] by (\S+) \(Post/) {
if (/ \(Post/ && /^\[(${IP_ADDRESS})\] by (\S+) \(Post/) {
$ip = $1; $by = $2; goto enough;
}
# Received: from [127.0.0.1] by euphoria (ArGoSoft Mail Server
# Freeware, Version 1.8 (1.8.2.5)); Sat, 8 Feb 2003 09:45:32 +0200
if (/ \(ArGoSoft/&&/^\[(${IP_ADDRESS})\] by (\S+) \(ArGoSoft/) {
if (/ \(ArGoSoft/ && /^\[(${IP_ADDRESS})\] by (\S+) \(ArGoSoft/) {
$ip = $1; $by = $2; goto enough;
}
@ -984,7 +981,7 @@ sub parse_received_line {
# Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4)
# for asrg@ietf.org; 7 Mar 2003 21:10:38 -0000
if (/ with BSMTP/&&/^\S+ by \S+ with BSMTP/) {
if (/ with BSMTP/ && /^\S+ by \S+ with BSMTP/) {
return 0; # BSMTP != a TCP/IP handover, ignore it
}

View File

@ -36,8 +36,6 @@ use strict;
use warnings;
use re 'taint';
require 5.008001; # needs utf8::is_utf8()
use Mail::SpamAssassin;
use Mail::SpamAssassin::Constants qw(:sa);
use Mail::SpamAssassin::HTML;
@ -167,8 +165,8 @@ sub header {
my $key = lc($rawkey);
# Trim whitespace off of the header keys
$key =~ s/^\s+//;
$key =~ s/\s+$//;
#$key =~ s/^\s+//;
#$key =~ s/\s+$//;
if (@_) {
my $raw_value = shift;
@ -388,6 +386,12 @@ sub detect_utf16 {
my $sum_l_o = 0;
my $decoder = undef;
# avoid scan if BOM present
if( $data =~ /^(?:\xff\xfe|\xfe\xff)/ ) {
dbg( "message: detect_utf16: found BOM" );
return; # let perl figure it out from the BOM
}
my @msg_h = unpack 'H' x length( $data ), $data;
my @msg_l = unpack 'h' x length( $data ), $data;
@ -397,11 +401,11 @@ sub detect_utf16 {
$sum_h_o += hex $msg_h[$i+1];
$sum_l_e += hex $msg_l[$i];
$sum_l_o += hex $msg_l[$i+1];
if( $check_char =~ /20 00/ ) {
if (index($check_char, '20 00') >= 0) {
# UTF-16LE space char detected
$utf16le_clues++;
}
if( $check_char =~ /00 20/ ) {
if (index($check_char, '00 20') >= 0) {
# UTF-16BE space char detected
$utf16be_clues++;
}
@ -416,7 +420,7 @@ sub detect_utf16 {
if( $utf16le_clues > $utf16be_clues ) {
dbg( "message: detect_utf16: UTF-16LE" );
$decoder = Encode::find_encoding("UTF-16LE");
} elsif( $utf16le_clues > $utf16be_clues ) {
} elsif( $utf16be_clues > $utf16le_clues ) {
dbg( "message: detect_utf16: UTF-16BE" );
$decoder = Encode::find_encoding("UTF-16BE");
} else {
@ -450,6 +454,7 @@ sub _normalize {
# my $data = $_[0]; # avoid copying large strings
my $charset_declared = $_[1];
my $return_decoded = $_[2]; # true: Unicode characters, false: UTF-8 octets
my $insist_on_declared_charset = $_[3]; # no FB_CROAK in Encode::decode
warn "message: _normalize() was given characters, expected bytes: $_[0]\n"
if utf8::is_utf8($_[0]);
@ -457,10 +462,6 @@ sub _normalize {
# workaround for Encode::decode taint laundering bug [rt.cpan.org #84879]
my $data_taint = substr($_[0], 0, 0); # empty string, tainted like $data
if (!defined $charset_declared || $charset_declared eq '') {
$charset_declared = 'us-ascii';
}
# number of characters with code above 127
my $cnt_8bits = $_[0] =~ tr/\x00-\x7F//c;
@ -469,7 +470,8 @@ sub _normalize {
/^(?: (?:US-)?ASCII | ANSI[_ ]? X3\.4- (?:1986|1968) |
ISO646-US )\z/xsi)
{ # declared as US-ASCII (a.k.a. ANSI X3.4-1986) and it really is
dbg("message: kept, charset is US-ASCII as declared");
dbg("message: contains only US-ASCII characters, declared %s, not decoding",
$charset_declared);
return $_[0]; # is all-ASCII, no need for decoding
}
@ -479,17 +481,21 @@ sub _normalize {
UTF-?8 | (KOI8|EUC)-[A-Z]{1,2} |
Big5 | GBK | GB[ -]?18030 (?:-20\d\d)? )\z/xsi)
{ # declared as extended ASCII, but it is actually a plain 7-bit US-ASCII
dbg("message: kept, charset is US-ASCII, declared %s", $charset_declared);
dbg("message: contains only US-ASCII characters, declared %s, not decoding",
$charset_declared);
return $_[0]; # is all-ASCII, no need for decoding
}
# Try first to strictly decode based on a declared character set.
my $rv;
if ($charset_declared =~ /^UTF-?8\z/i) {
# attempt decoding as strict UTF-8 (flags: FB_CROAK | LEAVE_SRC)
# Try first as UTF-8 ignoring declaring?
my $tried_utf8;
if ($cnt_8bits && !$insist_on_declared_charset) {
if (eval { $rv = $enc_utf8->decode($_[0], 1|8); defined $rv }) {
dbg("message: decoded as declared charset UTF-8");
dbg("message: decoded as charset UTF-8, declared %s",
$charset_declared);
return $_[0] if !$return_decoded;
$rv .= $data_taint; # carry taintedness over, avoid Encode bug
return $rv; # decoded
@ -499,8 +505,16 @@ sub _normalize {
$err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
$err = " ($err)";
}
dbg("message: failed decoding as declared charset UTF-8 ($err)");
dbg("message: failed decoding as charset UTF-8, declared %s%s",
$charset_declared, $err);
$tried_utf8 = 1;
}
}
if ($charset_declared =~ /^(?:US-)?ASCII\z/i
&& !$insist_on_declared_charset) {
# declared as US-ASCII but contains 8-bit characters, makes no sense
# to attempt decoding first as strict US-ASCII as we know it would fail
} elsif ($charset_declared =~ /^UTF[ -]?16/i) {
# Handle cases where spammers use UTF-16 encoding without including a BOM
@ -508,8 +522,10 @@ sub _normalize {
# https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7252
my $decoder = detect_utf16( $_[0] );
if (defined $decoder) {
if (eval { $rv = $decoder->decode($_[0], 1|8); defined $rv }) {
dbg("message: declared charset %s decoded as charset %s", $charset_declared, $decoder->name);
dbg("message: decoded as charset %s, declared %s",
$decoder->name, $charset_declared);
return $_[0] if !$return_decoded;
$rv .= $data_taint; # carry taintedness over, avoid Encode bug
return $rv; # decoded
@ -519,20 +535,10 @@ sub _normalize {
$err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
$err = " ($err)";
}
dbg("message: failed decoding as declared charset %s%s", $charset_declared, $err);
dbg("message: failed decoding as charset %s, declared %s%s",
$decoder->name, $charset_declared, $err);
}
};
} elsif ($cnt_8bits &&
eval { $rv = $enc_utf8->decode($_[0], 1|8); defined $rv }) {
dbg("message: decoded as charset UTF-8, declared %s", $charset_declared);
return $_[0] if !$return_decoded;
$rv .= $data_taint; # carry taintedness over, avoid Encode bug
return $rv; # decoded
} elsif ($charset_declared =~ /^(?:US-)?ASCII\z/i) {
# declared as US-ASCII but contains 8-bit characters, makes no sense
# to attempt decoding first as strict US-ASCII as we know it would fail
} else {
# try decoding as a declared character set
@ -553,8 +559,11 @@ sub _normalize {
my($chset, $decoder);
if ($charset_declared =~ /^(?: ISO-?8859-1 | Windows-1252 | CP1252 )\z/xi) {
$chset = 'Windows-1252'; $decoder = $enc_w1252;
} elsif ($charset_declared =~ /^UTF-?8\z/i) {
$chset = 'UTF-8'; $decoder = $enc_utf8;
} else {
$chset = $charset_declared; $decoder = Encode::find_encoding($chset);
$chset = $charset_declared;
$decoder = Encode::find_encoding($chset);
if (!$decoder && $chset =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
$decoder = Encode::find_encoding('GBK'); # a subset of GB18030
dbg("message: no decoder for a declared charset %s, using GBK",
@ -564,20 +573,24 @@ sub _normalize {
if (!$decoder) {
dbg("message: failed decoding, no decoder for a declared charset %s",
$chset);
} else {
}
elsif ($tried_utf8 && $chset eq 'UTF-8') {
# was already tried initially, no point doing again
}
else {
my $check_flags = Encode::LEAVE_SRC; # 0x0008
$check_flags |= Encode::FB_CROAK unless $insist_on_declared_charset;
my $err = '';
eval { $rv = $decoder->decode($_[0], 1|8) }; # FB_CROAK | LEAVE_SRC
if (eval { $rv = $decoder->decode($_[0], $check_flags); defined $rv }) {
dbg("message: decoded as charset %s, declared %s",
$decoder->name, $charset_declared);
} else {
if ($@) {
$err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
$err = " ($err)";
}
if (lc $chset eq lc $charset_declared) {
dbg("message: %s as declared charset %s%s",
defined $rv ? 'decoded' : 'failed decoding', $charset_declared, $err);
} else {
dbg("message: %s as charset %s, declared %s%s",
defined $rv ? 'decoded' : 'failed decoding',
$chset, $charset_declared, $err);
dbg("message: failed decoding as charset %s, declared %s%s",
$decoder->name, $charset_declared, $err);
}
}
}
@ -589,7 +602,7 @@ sub _normalize {
# Note that Windows-1252 is a proper superset of ISO-8859-1.
#
if (!defined $rv && !$cnt_8bits) {
dbg("message: kept, guessed charset is US-ASCII, declared %s",
dbg("message: contains only US-ASCII characters, declared %s, not decoding",
$charset_declared);
return $_[0]; # is all-ASCII, no need for decoding
@ -675,7 +688,7 @@ sub _normalize {
=item rendered()
render_text() takes the given text/* type MIME part, and attempts to
rendered() takes the given text/* type MIME part, and attempts to
render it into a text scalar. It will always render text/html, and will
use a heuristic to determine if other text/* parts should be considered
text/html. Two scalars are returned: the rendered type (either text/html
@ -686,18 +699,31 @@ or whatever the original type was), and the rendered text.
sub rendered {
my ($self) = @_;
if (!exists $self->{rendered}) {
# Cached?
if (exists $self->{rendered}) {
return ($self->{rendered_type}, $self->{rendered});
}
# We only know how to render text/plain and text/html ...
# Note: for bug 4843, make sure to skip text/calendar parts
# we also want to skip things like text/x-vcard
# text/x-aol is ignored here, but looks like text/html ...
return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
my $type = lc $self->{'type'};
unless ($type eq 'text/plain' || $type eq 'text/html') {
return (undef,undef);
}
my $text = $self->decode; # QP and Base64 decoding, bytes
my $text_len = length($text); # num of bytes in original charset encoding
my $charset = $self->{charset};
if (!defined $charset) {
dbg("message: no charset declared, using us-ascii");
$charset = 'us-ascii';
}
# render text/html always
if ($text ne '' && $self->{'type'} =~ m{^text/html$}i)
if ($text ne '' && $type eq 'text/html')
{
$self->{rendered_type} = 'text/html';
@ -712,21 +738,39 @@ sub rendered {
# subroutine _normalize() to return Unicode text. See Bug 7133
#
$character_semantics = 1; # $text will be in characters
$text = _normalize($text, $self->{charset}, 1); # bytes to chars
} elsif (!defined $self->{charset} ||
$self->{charset} =~ /^(?:US-ASCII|UTF-8)\z/i) {
# With some luck input can be interpreted as UTF-8, do not warn.
# It is still possible to hit the HTML::Parses utf8_mode bug however.
$text = _normalize($text, $charset, 1); # bytes to chars
} elsif ($charset =~ /^(?:US-ASCII|UTF-8)\z/i) {
if ($text !~ tr/\x00-\x7F//c) {
# all-ASCII, keep as octets (utf8 flag off)
dbg("message: contains only US-ASCII characters, declared %s, not decoding",
$charset);
} else { # non-ASCII, try UTF-8
my $rv;
# with some luck input can be interpreted as UTF-8
if (eval { $rv = $enc_utf8->decode($text, 1|8); defined $rv }) {
$text = $rv; # decoded to perl characters
$character_semantics = 1; # $text will be in characters
dbg("message: decoded as charset UTF-8, declared %s", $charset);
} else {
my $err = '';
if ($@) {
$err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
$err = " ($err)";
}
dbg("message: failed decoding as charset UTF-8, declared %s%s",
$charset, $err);
}
}
} else {
dbg("message: 'normalize_charset' is off, encoding will likely ".
"be misinterpreted; declared charset: %s", $self->{charset});
"be misinterpreted; declared charset: %s", $charset);
}
# the 0 requires decoded HTML results to be in bytes (not characters)
my $html = Mail::SpamAssassin::HTML->new($character_semantics,0); # object
# the 1 requires decoded HTML results to be in characters (utf8 flag on)
my $html = Mail::SpamAssassin::HTML->new($character_semantics,1); # object
$html->parse($text); # parse+render text
# resulting HTML-decoded text is in bytes, likely encoded as UTF-8
# resulting HTML-decoded text is in perl characters (utf8 flag on)
$self->{rendered} = $html->get_rendered_text();
$self->{visible_rendered} = $html->get_rendered_text(invisible => 0);
$self->{invisible_rendered} = $html->get_rendered_text(invisible => 1);
@ -735,24 +779,56 @@ sub rendered {
# end-of-document result values that require looking at the text
my $r = $self->{html_results}; # temporary reference for brevity
# count the number of spaces in the rendered text (likely UTF-8 octets)
my $space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
# count the number of spaces in the rendered text
my $space;
if (utf8::is_utf8($self->{rendered})) {
my $str = $self->{rendered};
$str =~ s/\S+//g; # delete non-whitespace Unicode characters
$space = length $str; # count remaining Unicode space characters
undef $str; # deallocate storage
dbg("message: spaces (Unicode) in HTML: %d out of %d%s",
$space, length $self->{rendered},
$character_semantics ? '' : ', octets!?');
} else {
$space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
dbg("message: spaces (octets) in HTML: %d out of %d%s",
$space, length $self->{rendered},
$character_semantics ? ', chars!?' : '');
}
# we may want to add the count of other Unicode whitespace characters
$r->{html_length} = length $self->{rendered}; # bytes (likely UTF-8)
$r->{html_length} = length $self->{rendered}; # perl characters count
$r->{non_space_len} = $r->{html_length} - $space;
$r->{ratio} = ($text_len - $r->{html_length}) / $text_len if $text_len;
}
else { # plain text
if ($self->{normalize} && $enc_utf8) {
# request transcoded result as UTF-8 octets!
$text = _normalize($text, $self->{charset}, 0);
$text = _normalize($text, $charset, 1); # bytes to chars
} elsif ($charset =~ /^(?:US-ASCII|UTF-8)\z/i) {
if ($text =~ tr/\x00-\x7F//c) { # non-ASCII, try UTF-8
my $rv;
# with some luck input can be interpreted as UTF-8
if (eval { $rv = $enc_utf8->decode($text, 1|8); defined $rv }) {
$text = $rv; # decoded to perl characters
dbg("message: decoded as charset UTF-8, declared %s", $charset);
} else {
my $err = '';
if ($@) {
$err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
$err = " ($err)";
}
$self->{rendered_type} = $self->{type};
$self->{rendered} = $self->{'visible_rendered'} = $text;
$self->{'invisible_rendered'} = '';
dbg("message: failed decoding as charset UTF-8, declared %s%s",
$charset, $err);
}
} else {
dbg("message: contains only US-ASCII characters, declared %s, not decoding",
$charset);
}
}
$self->{rendered_type} = $type;
$self->{rendered} = $self->{visible_rendered} = $text;
$self->{invisible_rendered} = '';
}
return ($self->{rendered_type}, $self->{rendered});
@ -841,24 +917,24 @@ Delete the specified header (decoded and raw) from the Node information.
sub delete_header {
my($self, $hdr) = @_;
foreach ( grep(/^${hdr}$/i, keys %{$self->{'headers'}}) ) {
foreach ( grep(/^${hdr}$/io, keys %{$self->{'headers'}}) ) {
delete $self->{'headers'}->{$_};
delete $self->{'raw_headers'}->{$_};
}
my @neworder = grep(!/^${hdr}$/i, @{$self->{'header_order'}});
my @neworder = grep(!/^${hdr}$/io, @{$self->{'header_order'}});
$self->{'header_order'} = \@neworder;
}
# decode a header appropriately. don't bother adding it to the pod documents.
sub __decode_header {
# decode 'encoded-word' (RFC 2047, RFC 2231)
sub _decode_mime_encoded_word {
my ( $encoding, $cte, $data ) = @_;
if ( $cte eq 'B' ) {
if ( uc $cte eq 'B' ) {
# base 64 encoded
$data = Mail::SpamAssassin::Util::base64_decode($data);
}
elsif ( $cte eq 'Q' ) {
elsif ( uc $cte eq 'Q' ) {
# quoted printable
# the RFC states that in the encoded text, "_" is equal to "=20"
@ -868,12 +944,24 @@ sub __decode_header {
}
else {
# not possible since the input has already been limited to 'B' and 'Q'
die "message: unknown encoding type '$cte' in RFC2047 header";
die "message: unknown encoding type '$cte' in RFC 2047 header";
}
return _normalize($data, $encoding, 0); # transcode to UTF-8 octets
if (defined $encoding) {
# RFC 2231 section 5: Language specification in Encoded Words
# =?US-ASCII*EN?Q?Keith_Moore?=
# strip optional language information following an asterisk
$encoding =~ s{ \* .* \z }{}xs;
$data = _normalize($data, $encoding, 0, 1); # transcode to UTF-8 octets
}
# dbg("message: _decode_mime_encoded_word (%s, %s): %s",
# $cte, $encoding || '-', $data);
return $data; # as UTF-8 octets
}
# Decode base64 and quoted-printable in headers according to RFC2047.
# Decode base64 and quoted-printable in headers according to RFC 2047.
#
sub _decode_header {
my($header_field_body, $header_field_name) = @_;
@ -881,35 +969,86 @@ sub _decode_header {
return '' unless defined $header_field_body && $header_field_body ne '';
# deal with folding and cream the newlines and such
$header_field_body =~ s/\n[ \t]+/\n /g;
$header_field_body =~ s/\n[ \t]/\n /g; # turning tab into space on folds
$header_field_body =~ s/\015?\012//gs;
if ($header_field_body =~ tr/\x00-\x7F//c) {
# Non-ASCII characters in header are not allowed by RFC 5322, but
# RFC 6532 relaxed the rule and allows UTF-8 encoding in header
# field bodies; no other encoding is allowed there (apart from
# RFC 2047 MIME encoded words, which must be all-ASCII anyway).
# The following call keeps UTF-8 octets if valid, otherwise tries
# some decoding guesswork so that the result is valid UTF-8 (octets).
$header_field_body = _normalize($header_field_body, 'UTF-8', 0);
}
if ($header_field_name =~
/^ (?: Received | (?:Resent-)? (?: Message-ID | Date ) |
MIME-Version | References | In-Reply-To | List-.* ) \z /xsi ) {
# Bug 6945: some header fields must not be processed for MIME encoding
# Bug 7466: leave out the Content-*
# Bug 7249: leave out the Content-*
} else {
local($1,$2,$3);
} elsif (index($header_field_body, '=?') != -1) { # triage for possible encoded-words
local($1,$2,$3,$4);
# Multiple encoded sections must ignore the interim whitespace.
# To avoid possible FPs with (\s+(?==\?))?, look for the whole RE
# separated by whitespace.
1 while $header_field_body =~
s{ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) }
{$1$2}xsg;
# transcode properly encoded RFC 2047 substrings into UTF-8 octets,
# leave everything else unchanged as it is supposed to be UTF-8 (RFC 6532)
# or plain US-ASCII
$header_field_body =~
s{ (?: = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = ) }
{ __decode_header($1, uc($2), $3) }xsge;
s{ ( = \? [A-Za-z0-9*_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
(?= = \? [A-Za-z0-9*_-]+ \? [bqBQ] \? [^?]* \? = ) }{$1}xsg;
# Bug 7249: work around violations of the RFC 2047 section 5 requirement:
# Each 'encoded-word' MUST represent an integral number of characters.
# A multi-octet character may not be split across adjacent 'encoded-word's
# Unfortunately such violations are not uncommon.
#
# Bug 7307: to deal with the above, base64/QP decoding must be decoupled
# from decoding a specified multi-byte character set into UTF-8.
# A previous simpler code could not handle base64 fill bits correctly
# (merging of adjecent encoded sections before base64/QP decoding them).
my @sections; # array of pairs: [string, encoding]
my $last_encoding = '';
while ( $header_field_body =~
m{ \G = \? ([A-Za-z0-9*_-]+) \? ([bqBQ]) \? ([^?]*) \? =
| ( [^=]+ | . ) }xsg ) {
my($encoding, $str);
if (defined $1) { # we have an encoded section
$encoding = lc $1;
# decode base64 / QP decoding, remember encoding charset
$str = _decode_mime_encoded_word(undef, $2, $3);
} else { # non-encoded text
$encoding = '';
$str = $4;
}
if ($encoding eq $last_encoding && @sections) {
# merge sections with same encoding - in violation of RFC 2047 sect.5
$sections[$#sections]->[0] .= $str;
} else {
push(@sections, [$str, $encoding]);
}
$last_encoding = $encoding;
}
# dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
# transcode encoded RFC 2047 substrings (already base64/QP-decoded)
# into UTF-8 octets, leave everything else unchanged as it is supposed
# to be UTF-8 (RFC 6532) or its plain US-ASCII subset (RFC 5322);
#
my $decoded_result = '';
for my $sect (@sections) {
my $encoding = $sect->[1];
# RFC 2231 section 5: Language specification in Encoded Words
# =?US-ASCII*EN?Q?Keith_Moore?=
# strip optional language information following an asterisk
$encoding =~ s{ \* .* \z }{}xs;
$decoded_result .=
$encoding eq '' ? $sect->[0] : _normalize($sect->[0], $encoding, 0, 1);
}
$header_field_body = $decoded_result;
}
dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
return $header_field_body;
}

View File

@ -33,7 +33,7 @@ BEGIN {
eval {
require Net::Patricia;
Net::Patricia->VERSION(1.16); # need AF_INET6 support
import Net::Patricia;
Net::Patricia->import;
$have_patricia = 1;
};
}
@ -59,6 +59,7 @@ sub new {
sub DESTROY {
my($self) = shift;
if (exists $self->{cache}) {
local($@, $!, $_); # protect outer layers from a potential surprise
my($hits, $attempts) = ($self->{cache_hits}, $self->{cache_attempts});
@ -76,7 +77,45 @@ sub add_cidr {
my $numadded = 0;
delete $self->{cache}; # invalidate cache (in case of late additions)
# Pre-parse x.x.x.x-x.x.x.x range notation into CIDR blocks
# requires Net::CIDR::Lite
my @nets2;
foreach my $cidr_orig (@nets) {
next if index($cidr_orig, '-') == -1; # Triage
my $cidr = $cidr_orig;
my $exclude = ($cidr =~ s/^!\s*//) ? 1 : 0;
local($1);
$cidr =~ s/\b0+(\d+)/$1/; # Strip leading zeroes
eval { require Net::CIDR::Lite; }; # Only try to load now when it's necessary
if ($@) {
warn "netset: IP range notation '$cidr_orig' requires Net::CIDR::Lite module, ignoring\n";
$cidr_orig = undef;
next;
}
my $cidrs = Net::CIDR::Lite->new;
eval { $cidrs->add_range($cidr); };
if ($@) {
my $err = $@; $err =~ s/ at .*//s;
warn "netset: illegal IP range '$cidr_orig': $err\n";
$cidr_orig = undef;
next;
}
my @arr = $cidrs->list;
if (!@arr) {
my $err = $@; $err =~ s/ at .*//s;
warn "netset: failed to parse IP range '$cidr_orig': $err\n";
$cidr_orig = undef;
next;
}
# Save exclude flag
if ($exclude) { $_ = "!$_" foreach (@arr); }
# Rewrite this @nets value directly, add any rest to @nets2
$cidr_orig = shift @arr;
push @nets2, @arr if @arr;
}
foreach my $cidr_orig (@nets, @nets2) {
next unless defined $cidr_orig;
my $cidr = $cidr_orig; # leave original unchanged, useful for logging
# recognizes syntax:

View File

@ -21,7 +21,7 @@ Mail::SpamAssassin::PerMsgLearner - per-message status (spam or not-spam)
=head1 SYNOPSIS
my $spamtest = new Mail::SpamAssassin ({
my $spamtest = Mail::SpamAssassin->new({
'rules_filename' => '/etc/spamassassin.rules',
'userprefs_filename' => $ENV{HOME}.'/.spamassassin/user_prefs'
});
@ -97,8 +97,8 @@ sub learn_spam {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->add_all_addresses_to_blacklist ($self->{msg});
# if ($self->{main}->{learn_with_welcomelist}) {
# $self->{main}->add_all_addresses_to_blocklist ($self->{msg});
# }
# use the real message-id here instead of mass-check's idea of an "id",
@ -124,8 +124,8 @@ sub learn_ham {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->add_all_addresses_to_whitelist ($self->{msg});
# if ($self->{main}->{learn_with_welcomelist}) {
# $self->{main}->add_all_addresses_to_welcomelist ($self->{msg});
# }
$self->{learned} = $self->{bayes_scanner}->learn (0, $self->{msg}, $id);
@ -148,8 +148,8 @@ sub forget {
my ($self, $id) = @_;
# bug 4096
# if ($self->{main}->{learn_with_whitelist}) {
# $self->{main}->remove_all_addresses_from_whitelist ($self->{msg});
# if ($self->{main}->{learn_with_welcomelist}) {
# $self->{main}->remove_all_addresses_from_welcomelist ($self->{msg});
# }
$self->{learned} = $self->{bayes_scanner}->forget ($self->{msg}, $id);

File diff suppressed because it is too large Load Diff

View File

@ -33,7 +33,7 @@ SpamAssassin will call:
=head1 DESCRIPTION
All persistent address list implementations, used by the auto-whitelist
All persistent address list implementations, used by the auto-welcomelist
code to track known-good email addresses, use this as a base class.
See C<Mail::SpamAssassin::DBBasedAddrList> for an example.
@ -81,7 +81,7 @@ SpamAssassin classes.
sub new_checker {
my ($factory, $main) = @_;
die "auto-whitelist: unimplemented base method"; # override this
die "auto-welcomelist: unimplemented base method"; # override this
}
###########################################################################
@ -109,7 +109,7 @@ a C<count> key and a C<totscore> key.
sub get_addr_entry {
my ($self, $addr, $signedby) = @_;
my $entry = { };
die "auto-whitelist: unimplemented base method"; # override this
die "auto-welcomelist: unimplemented base method"; # override this
return $entry;
}
@ -117,27 +117,27 @@ sub get_addr_entry {
=item $entry = $addrlist->add_score($entry, $score);
This method should add the given score to the whitelist database for the
This method should add the given score to the welcomelist database for the
given entry, and then return the new entry.
=cut
sub add_score {
my ($self, $entry, $score) = @_;
die "auto-whitelist: unimplemented base method"; # override this
die "auto-welcomelist: unimplemented base method"; # override this
}
###########################################################################
=item $entry = $addrlist->remove_entry ($entry);
This method should remove the given entry from the whitelist database.
This method should remove the given entry from the welcomelist database.
=cut
sub remove_entry {
my ($self, $entry) = @_;
die "auto-whitelist: unimplemented base method"; # override this
die "auto-welcomelist: unimplemented base method"; # override this
}
###########################################################################
@ -145,7 +145,7 @@ sub remove_entry {
=item $entry = $addrlist->finish ();
Clean up, if necessary. Called by SpamAssassin when it has finished
checking, or adding to, the auto-whitelist database.
checking, or adding to, the auto-welcomelist database.
=cut

View File

@ -427,6 +427,20 @@ The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
=back
=item $plugin->check_dnsbl ( { options ... } )
Called when DNSBL or other network lookups are being launched, implying
current running priority of -100. This is the place to start your own
asynchronously-started network lookups.
=over 4
=item permsgstatus
The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
=back
=item $plugin->check_post_dnsbl ( { options ... } )
Called after the DNSBL results have been harvested. This is a good
@ -440,6 +454,21 @@ The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
=back
=item $plugin->check_cleanup ( { options ... } )
Called just before message check is finishing and before possible
auto-learning. This is guaranteed to be always called, unlike check_tick
and check_post_dnsbl. Used for cleaning up left callbacks or forked
children etc, last chance to make rules hit.
=over 4
=item permsgstatus
The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
=back
=item $plugin->check_post_learn ( { options ... } )
Called after auto-learning may (or may not) have taken place. If you
@ -817,7 +846,9 @@ Reference to the original message object.
=back
=item $plugin->whitelist_address( { options ... } )
=item $plugin->welcomelist_address( { options ... } )
Previously whitelist_address which will work interchangeably until 4.1.
Called when a request is made to add an address to a
persistent address list.
@ -834,7 +865,9 @@ Indicate if the call is being made from a command line interface.
=back
=item $plugin->blacklist_address( { options ... } )
=item $plugin->blocklist_address( { options ... } )
Previously blacklist_address which will work interchangeably until 4.1.
Called when a request is made to add an address to a
persistent address list.
@ -882,7 +915,7 @@ only spamd calls this API.
=item result
The C<'result: ...'> line for this scan. Format is as described
at B<http://wiki.apache.org/spamassassin/SpamdSyslogFormat>.
at B<https://wiki.apache.org/spamassassin/SpamdSyslogFormat>.
=back
@ -1013,17 +1046,27 @@ to receive specific events, or control the callback chain behaviour.
=over 4
=item $plugin->register_eval_rule ($nameofevalsub)
=item $plugin->register_eval_rule ($nameofevalsub, $ruletype)
Plugins that implement an eval test will need to call this, so that
SpamAssassin calls into the object when that eval test is encountered.
See the B<REGISTERING EVAL RULES> section for full details.
Since 4.0, optional $ruletype can be specified to enforce that eval function
cannot be called with wrong ruletype from configuration, for example user
using "header FOO eval:foobar()" instead of "body FOO eval:foobar()".
Mismatch will result in lint failure. $ruletype can be one of:
$Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS
$Mail::SpamAssassin::Conf::TYPE_BODY_EVALS (allows both body and rawbody)
$Mail::SpamAssassin::Conf::TYPE_RAWBODY_EVALS
$Mail::SpamAssassin::Conf::TYPE_FULL_EVALS
=cut
sub register_eval_rule {
my ($self, $nameofsub) = @_;
$self->{main}->{conf}->register_eval_rule ($self, $nameofsub);
my ($self, $nameofsub, $ruletype) = @_;
$self->{main}->{conf}->register_eval_rule ($self, $nameofsub, $ruletype);
}
=item $plugin->register_generated_rule_method ($nameofsub)
@ -1132,7 +1175,7 @@ called from rules in the configuration files, in the plugin class' constructor.
For example,
$plugin->register_eval_rule ('check_for_foo')
$plugin->register_eval_rule ('check_for_foo', $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS)
will cause C<$plugin-E<gt>check_for_foo()> to be called for this
SpamAssassin rule:
@ -1157,18 +1200,21 @@ In other words, the eval test method should look something like this:
sub check_for_foo {
my ($self, $permsgstatus, ...arguments...) = @_;
...code returning 0 or 1
...code returning 0 (miss), 1 (hit), or undef (async function)
}
The eval rule should return C<1> for a hit, or C<0> if the rule is not hit.
Special case of "return undef" must be used when result is not yet ready and
it will be later declared with PerMsgStatus functions got_hit() or
rule_ready() - see their documentation for more info. Make sure not to
return undef by mistake.
Note that the headers can be accessed using the C<get()> method on the
C<Mail::SpamAssassin::PerMsgStatus> object, and the body by
C<get_decoded_stripped_body_text_array()> and other similar methods.
Similarly, the C<Mail::SpamAssassin::Conf> object holding the current
configuration may be accessed through C<$permsgstatus-E<gt>{main}-E<gt>{conf}>.
The eval rule should return C<1> for a hit, or C<0> if the rule
is not hit.
State for a single message being scanned should be stored on the C<$permsgstatus>
object, not on the C<$self> object, since C<$self> persists between scan
operations. See the 'lifecycle note' on the C<check_start()> method above.
@ -1216,8 +1262,8 @@ Mail::SpamAssassin(3)
Mail::SpamAssassin::PerMsgStatus(3)
http://wiki.apache.org/spamassassin/PluginWritingTips
https://wiki.apache.org/spamassassin/PluginWritingTips
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2163
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2163
=cut

View File

@ -50,13 +50,30 @@ Autonomous System Number (ASN) of the connecting IP address.
loadplugin Mail::SpamAssassin::Plugin::ASN
# Default / recommended settings
asn_use_geodb 1
asn_use_dns 1
asn_prefer_geodb 1
# Do lookups and add tags / X-Spam-ASN header
asn_lookup asn.routeviews.org _ASN_ _ASNCIDR_
asn_lookup_ipv6 origin6.asn.cymru.com _ASN_ _ASNCIDR_
add_header all ASN _ASN_ _ASNCIDR_
header TEST_AS1234 X-ASN =~ /^1234$/
# Rules to test ASN or Organization
# NOTE: Do not use rules that check metadata X-ASN header,
# only check_asn() eval function works correctly.
# Rule argument is full regexp to match.
# ASN Number: GeoIP ASN or DNS
# Matched string includes asn_prefix if defined, and normally
# looks like "AS1234" (DNS) or "AS1234 Google LLC" (GeoIP)
header AS_1234 eval:check_asn('/^AS1234\b/')
# ASN Organisation: GeoIP ASN has, DNS lists might not have
# Note the second parameter which checks MYASN tag (default is ASN)
asn_lookup myview.example.com _MYASN_ _MYASNCIDR_
header AS_GOOGLE eval:check_asn('/\bGoogle\b/i', 'MYASN')
=head1 DESCRIPTION
@ -68,12 +85,17 @@ high-volume environment> or that you should use a local mirror of the
zone (see C<ftp://ftp.routeviews.org/dnszones/>). Other similar zones
may also be used.
GeoDB (GeoIP ASN) database lookups are supported since SpamAssassin 4.0 and
it's recommended to use them instead of DNS queries, unless C<_ASNCIDR_>
is needed.
=head1 TEMPLATE TAGS
This plugin allows you to create template tags containing the connecting
IP's AS number and route info for that AS number.
The default config will add a header field that looks like this:
If you use add_header as documented in the example before, a header field is
added that looks like this:
X-Spam-ASN: AS24940 213.239.192.0/18
@ -87,29 +109,19 @@ all be added to the C<_ASNCIDR_> tag, separated by spaces, eg:
Note that the literal "AS" before the ASN in the _ASN_ tag is configurable
through the I<asn_prefix> directive and may be set to an empty string.
=head1 CONFIGURATION
C<_ASNCIDR_> is not available with local GeoDB ASN lookups.
The standard ruleset contains a configuration that will add a header field
containing ASN data to scanned messages. The bayes tokenizer will use the
added header field for bayes calculations, and thus affect which BAYES_* rule
will trigger for a particular message.
=head1 BAYES
B<Note> that in most cases you should not score on the ASN data directly.
Bayes learning will probably trigger on the _ASNCIDR_ tag, but probably not
very well on the _ASN_ tag alone.
The bayes tokenizer will use ASN data for bayes calculations, and thus
affect which BAYES_* rule will trigger for a particular message. No
in-depth analysis of the usefulness of bayes tokenization of ASN data has
been performed.
=head1 SEE ALSO
http://www.routeviews.org/ - all data regarding routing, ASNs, etc....
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4770 -
SpamAssassin Issue #4770 concerning this plugin
=head1 STATUS
No in-depth analysis of the usefulness of bayes tokenization of ASN data has
been performed.
=cut
package Mail::SpamAssassin::Plugin::ASN;
@ -120,27 +132,23 @@ use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(reverse_ip_address);
use Mail::SpamAssassin::Dns;
use Mail::SpamAssassin::Util qw(reverse_ip_address compile_regexp);
use Mail::SpamAssassin::Constants qw(:ip);
our @ISA = qw(Mail::SpamAssassin::Plugin);
our $txtdata_can_provide_a_list;
my $IPV4_ADDRESS = IPV4_ADDRESS;
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->register_eval_rule("check_asn", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->set_config($mailsa->{conf});
#$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
#more robust version check from Damyan Ivanov - Bug 7095
$txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
# we need GeoDB ASN
$self->{main}->{geodb_wanted}->{asn} = 1;
return $self;
}
@ -201,6 +209,25 @@ is kept for backward compatibility with versions of SpamAssassin earlier
than 3.4.0. A sensible setting is an empty string. The argument may be (but
need not be) enclosed in single or double quotes for clarity.
=item asn_use_geodb ( 0 / 1 ) (default: 1)
Use Mail::SpamAssassin::GeoDB module to lookup ASN numbers. You need
suitable supported module like GeoIP2 or GeoIP with ISP or ASN database
installed (for example, add EditionIDs GeoLite2-ASN in GeoIP.conf for
geoipupdate program).
GeoDB can only set _ASN_ tag, it has no data for _ASNCIDR_. If you need
both, then set asn_prefer_geodb 0 so DNS rules are tried.
=item asn_prefer_geodb ( 0 / 1 ) (default: 1)
If set, DNS lookups (asn_lookup rules) will not be run if GeoDB successfully
finds ASN. Set this to 0 to get _ASNCIDR_ even if GeoDB finds _ASN_.
=item asn_use_dns ( 0 / 1 ) (default: 1)
Set to 0 to never allow DNS queries.
=back
=cut
@ -272,26 +299,53 @@ need not be) enclosed in single or double quotes for clarity.
}
});
push (@cmds, {
setting => 'asn_use_geodb',
default => 1,
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
push (@cmds, {
setting => 'asn_prefer_geodb',
default => 1,
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
push (@cmds, {
setting => 'asn_use_dns',
default => 1,
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
$conf->{parser}->register_commands(\@cmds);
}
# ---------------------------------------------------------------------------
sub parsed_metadata {
sub extract_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $self->{main}->{conf};
my $conf = $pms->{conf};
if (!$pms->is_dns_available()) {
dbg("asn: DNS is not available, skipping ASN checks");
my $geodb = $self->{main}->{geodb};
my $has_geodb = $conf->{asn_use_geodb} && $geodb && $geodb->can('asn');
if ($has_geodb) {
dbg("asn: using GeoDB ASN for lookups");
} else {
dbg("asn: GeoDB ASN not available");
if (!$conf->{asn_use_dns} || !$pms->is_dns_available()) {
dbg("asn: DNS is not available, skipping ASN check");
return;
}
if (!$conf->{asnlookups} && !$conf->{asnlookups_ipv6}) {
dbg("asn: no asn_lookups configured, skipping ASN lookups");
if ($self->{main}->{learning}) {
dbg("asn: learning message, skipping DNS-based ASN check");
return;
}
}
# initialize the tag data so that if no result is returned from the DNS
# query we won't end up with a missing tag. Don't use $pms->set_tag()
@ -309,9 +363,11 @@ sub parsed_metadata {
}
}
# get reversed IP address of last external relay to lookup
# don't return until we've initialized the template tags
my $relay = $pms->{relays_external}->[0];
# Initialize status
$pms->{asn_results} = ();
# get IP address of last external relay to lookup
my $relay = $opts->{msg}->{metadata}->{relays_external}->[0];
if (!defined $relay) {
dbg("asn: no first external relay IP available, skipping ASN check");
return;
@ -319,18 +375,45 @@ sub parsed_metadata {
dbg("asn: first external relay is a private IP, skipping ASN check");
return;
}
my $ip = $relay->{ip};
my $reversed_ip = reverse_ip_address($ip);
if (defined $reversed_ip) {
dbg("asn: using first external relay IP for lookups: %s", $ip);
# GeoDB lookup
my $asn_found;
if ($has_geodb) {
my $asn = $geodb->get_asn($ip);
my $org = $geodb->get_asn_org($ip);
if (!defined $asn) {
dbg("asn: GeoDB ASN lookup failed");
} else {
dbg("asn: could not parse first external relay IP: %s, skipping", $ip);
$asn_found = 1;
dbg("asn: GeoDB found ASN $asn");
# Prevent double prefix
my $asn_value =
length($conf->{asn_prefix}) && index($asn, $conf->{asn_prefix}) != 0 ?
$conf->{asn_prefix}.$asn : $asn;
$asn_value .= ' '.$org if defined $org && length($org);
$pms->set_tag('ASN', $asn_value);
# For Bayes
$pms->{msg}->put_metadata('X-ASN', $asn);
}
}
# Skip DNS if GeoDB was successful and preferred
if ($asn_found && $conf->{asn_prefer_geodb}) {
dbg("asn: GeoDB lookup successful, skipping DNS lookups");
return;
}
# No point continuing without DNS from now on
if (!$conf->{asn_use_dns} || !$pms->is_dns_available()) {
dbg("asn: skipping disabled DNS lookups");
return;
}
dbg("asn: using DNS for lookups");
my $lookup_zone;
if ($ip =~ /^$IPV4_ADDRESS$/o) {
if ($ip =~ IS_IPV4_ADDRESS) {
if (!defined $conf->{asnlookups}) {
dbg("asn: asn_lookup for IPv4 not defined, skipping");
return;
@ -344,6 +427,12 @@ sub parsed_metadata {
$lookup_zone = "asnlookups_ipv6";
}
my $reversed_ip = reverse_ip_address($ip);
if (!defined $reversed_ip) {
dbg("asn: could not parse IP: %s, skipping", $ip);
return;
}
# we use arrays and array indices rather than hashes and hash keys
# in case someone wants the same zone added to multiple sets of tags
my $index = 0;
@ -351,14 +440,12 @@ sub parsed_metadata {
# do the DNS query, have the callback process the result
my $zone_index = $index;
my $zone = $reversed_ip . '.' . $entry->{zone};
my $key = "asnlookup-${lookup_zone}-${zone_index}-".$entry->{zone};
my $ent = $pms->{async}->bgsend_and_start_lookup($zone, 'TXT', undef,
{ type => 'ASN', key => $key, zone => $lookup_zone },
$pms->{async}->bgsend_and_start_lookup($zone, 'TXT', undef,
{ rulename => 'asn_lookup', type => 'ASN' },
sub { my($ent, $pkt) = @_;
$self->process_dns_result($pms, $pkt, $zone_index, $lookup_zone) },
master_deadline => $pms->{master_deadline}
);
$pms->register_async_rule_start($key) if $ent;
$index++;
}
}
@ -376,6 +463,9 @@ sub parsed_metadata {
sub process_dns_result {
my ($self, $pms, $pkt, $zone_index, $lookup_zone) = @_;
# NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
return if !$pkt;
my $conf = $self->{main}->{conf};
my $zone = $conf->{$lookup_zone}[$zone_index]->{zone};
@ -403,14 +493,10 @@ sub process_dns_result {
%route_tag_data_seen = map(($_,1), @route_tag_data);
}
# NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
my @answer = !defined $pkt ? () : $pkt->answer;
foreach my $rr (@answer) {
foreach my $rr ($pkt->answer) {
#dbg("asn: %s: lookup result packet: %s", $zone, $rr->string);
next if $rr->type ne 'TXT';
my @strings = $txtdata_can_provide_a_list ? $rr->txtdata :
$rr->char_str_list; # historical
my @strings = $rr->txtdata;
next if !@strings;
for (@strings) { utf8::encode($_) if utf8::is_utf8($_) }
@ -485,7 +571,49 @@ sub process_dns_result {
}
}
sub check_asn {
my ($self, $pms, $re, $asn_tag) = @_;
my $rulename = $pms->get_current_eval_rule_name();
if (!defined $re) {
warn "asn: rule $rulename eval argument missing\n";
return 0;
}
my ($rec, $err) = compile_regexp($re, 2);
if (!$rec) {
warn "asn: invalid regexp for $rulename '$re': $err\n";
return 0;
}
$asn_tag = 'ASN' unless defined $asn_tag;
$pms->action_depends_on_tags($asn_tag,
sub { my($pms,@args) = @_;
$self->_check_asn($pms, $rulename, $rec, $asn_tag);
}
);
return; # return undef for async status
}
sub _check_asn {
my ($self, $pms, $rulename, $rec, $asn_tag) = @_;
$pms->rule_ready($rulename); # mark rule ready for metas
my $asn = $pms->get_tag($asn_tag);
return if !defined $asn;
if ($asn =~ $rec) {
$pms->test_log("$asn_tag: $asn", $rulename);
$pms->got_hit($rulename, "");
}
}
# Version features
sub has_asn_lookup_ipv6 { 1 }
sub has_asn_geodb { 1 }
sub has_check_asn { 1 }
sub has_check_asn_tag { 1 } # $asn_tag parameter for check_asn()
1;

View File

@ -17,7 +17,7 @@
=head1 NAME
Mail::SpamAssassin::Plugin::AWL - Normalize scores via auto-whitelist
Mail::SpamAssassin::Plugin::AWL - Normalize scores via auto-welcomelist
=head1 SYNOPSIS
@ -28,14 +28,14 @@ To try this out, add this or uncomment this line in init.pre:
Use the supplied 60_awl.cf file (ie you don't have to do anything) or
add these lines to a .cf file:
header AWL eval:check_from_in_auto_whitelist()
describe AWL From: address is in the auto white-list
header AWL eval:check_from_in_auto_welcomelist()
describe AWL From: address is in the auto welcome-list
tflags AWL userconf noautolearn
priority AWL 1000
=head1 DESCRIPTION
This plugin module provides support for the auto-whitelist. It keeps
This plugin module provides support for the auto-welcomelist. It keeps
track of the average SpamAssassin score for senders. Senders are
tracked using a combination of their From: address and their IP address.
It then uses that average score to reduce the variability in scoring
@ -63,7 +63,7 @@ use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::AutoWhitelist;
use Mail::SpamAssassin::AutoWelcomelist;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;
@ -80,7 +80,8 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_from_in_auto_whitelist");
$self->register_eval_rule("check_from_in_auto_welcomelist", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_from_in_auto_whitelist", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1
$self->set_config($mailsaobject->{conf});
@ -99,18 +100,20 @@ SpamAssassin handles incoming email messages.
=over 4
=item use_auto_whitelist ( 0 | 1 ) (default: 1)
=item use_auto_welcomelist ( 0 | 1 ) (default: 1)
Whether to use auto-whitelists. Auto-whitelists track the long-term
Previously use_auto_whitelist which will work interchangeably until 4.1.
Whether to use auto-welcomelists. Auto-welcomelists track the long-term
average score for each sender and then shift the score of new messages
toward that long-term average. This can increase or decrease the score
for messages, depending on the long-term behavior of the particular
correspondent.
For more information about the auto-whitelist system, please look
at the C<Automatic Whitelist System> section of the README file.
The auto-whitelist is not intended as a general-purpose replacement
for static whitelist entries added to your config files.
For more information about the auto-welcomelist system, please look
at the C<Automatic Welcomelist System> section of the README file.
The auto-welcomelist is not intended as a general-purpose replacement
for static welcomelist entries added to your config files.
Note that certain tests are ignored when determining the final
message score:
@ -120,12 +123,15 @@ message score:
=cut
push (@cmds, {
setting => 'use_auto_whitelist',
setting => 'use_auto_welcomelist',
aliases => ['use_auto_whitelist'], # backward compatible - to be removed for 4.1
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
=item auto_whitelist_factor n (default: 0.5, range [0..1])
=item auto_welcomelist_factor n (default: 0.5, range [0..1])
Previously auto_whitelist_factor which will work interchangeably until 4.1.
How much towards the long-term mean for the sender to regress a message.
Basically, the algorithm is to track the long-term mean score of messages for
@ -143,12 +149,15 @@ mean; C<factor> = 0 mean just use the calculated score.
=cut
push (@cmds, {
setting => 'auto_whitelist_factor',
setting => 'auto_welcomelist_factor',
aliases => ['auto_whitelist_factor'], # backward compatible - to be removed for 4.1
default => 0.5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item auto_whitelist_ipv4_mask_len n (default: 16, range [0..32])
=item auto_welcomelist_ipv4_mask_len n (default: 16, range [0..32])
Previously auto_whitelist_ipv4_mask_len which will work interchangeably until 4.1.
The AWL database keeps only the specified number of most-significant bits
of an IPv4 address in its fields, so that different individual IP addresses
@ -164,7 +173,8 @@ of 8, any split is allowed.
=cut
push (@cmds, {
setting => 'auto_whitelist_ipv4_mask_len',
setting => 'auto_welcomelist_ipv4_mask_len',
aliases => ['auto_whitelist_ipv4_mask_len'], # removed in 4.1
default => 16,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
@ -174,11 +184,13 @@ of 8, any split is allowed.
} elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_ipv4_mask_len} = $value;
$self->{auto_welcomelist_ipv4_mask_len} = $value;
}
});
=item auto_whitelist_ipv6_mask_len n (default: 48, range [0..128])
=item auto_welcomelist_ipv6_mask_len n (default: 48, range [0..128])
Previously auto_whitelist_ipv6_mask_len which will work interchangeably until 4.1.
The AWL database keeps only the specified number of most-significant bits
of an IPv6 address in its fields, so that different individual IP addresses
@ -195,7 +207,8 @@ is allowed.
=cut
push (@cmds, {
setting => 'auto_whitelist_ipv6_mask_len',
setting => 'auto_welcomelist_ipv6_mask_len',
aliases => ['auto_whitelist_ipv6_mask_len'], # removed in 4.1
default => 48,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
@ -205,7 +218,7 @@ is allowed.
} elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_ipv6_mask_len} = $value;
$self->{auto_welcomelist_ipv6_mask_len} = $value;
}
});
@ -215,7 +228,7 @@ Used by the SQLBasedAddrList storage implementation.
If this option is set the SQLBasedAddrList module will override the set
username with the value given. This can be useful for implementing global
or group based auto-whitelist databases.
or group based auto-welcomelist databases.
=cut
@ -225,7 +238,9 @@ or group based auto-whitelist databases.
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_distinguish_signed
=item auto_welcomelist_distinguish_signed
Previously auto_whitelist_distinguish_signed which will work interchangeably until 4.1.
Used by the SQLBasedAddrList storage implementation.
@ -240,7 +255,8 @@ turning on this option.
=cut
push (@cmds, {
setting => 'auto_whitelist_distinguish_signed',
setting => 'auto_welcomelist_distinguish_signed',
aliases => ['auto_whitelist_distinguish_signed'], # removed in 4.1
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
@ -256,32 +272,38 @@ user's C<user_prefs> file.
=over 4
=item auto_whitelist_factory module (default: Mail::SpamAssassin::DBBasedAddrList)
=item auto_welcomelist_factory module (default: Mail::SpamAssassin::DBBasedAddrList)
Select alternative whitelist factory module.
Previously auto_whitelist_factory which will work interchangeably until 4.1.
Select alternative welcomelist factory module.
=cut
push (@cmds, {
setting => 'auto_whitelist_factory',
setting => 'auto_welcomelist_factory',
aliases => ['auto_whitelist_factory'], # removed in 4.1
is_admin => 1,
default => 'Mail::SpamAssassin::DBBasedAddrList',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_path /path/filename (default: ~/.spamassassin/auto-whitelist)
=item auto_welcomelist_path /path/filename (default: ~/.spamassassin/auto-welcomelist)
This is the automatic-whitelist directory and filename. By default, each user
has their own whitelist database in their C<~/.spamassassin> directory with
Previously auto_whitelist_path which will work interchangeably until 4.1.
This is the automatic-welcomelist directory and filename. By default, each user
has their own welcomelist database in their C<~/.spamassassin> directory with
mode 0700. For system-wide SpamAssassin use, you may want to share this
across all users, although that is not recommended.
=cut
push (@cmds, {
setting => 'auto_whitelist_path',
setting => 'auto_welcomelist_path',
aliases => ['auto_whitelist_path'], # removed in 4.1
is_admin => 1,
default => '__userstate__/auto-whitelist',
default => '__userstate__/auto-welcomelist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
@ -291,13 +313,15 @@ across all users, although that is not recommended.
if (-d $value) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_path} = $value;
$self->{auto_welcomelist_path} = $value;
}
});
=item auto_whitelist_db_modules Module ... (default: see below)
=item auto_welcomelist_db_modules Module ... (default: see below)
What database modules should be used for the auto-whitelist storage database
Previously auto_whitelist_db_modules which will work interchangeably until 4.1.
What database modules should be used for the auto-welcomelist storage database
file. The first named module that can be loaded from the perl include path
will be used. The format is:
@ -313,15 +337,18 @@ preclude its use for the AWL (see SpamAssassin bug 4353).
=cut
push (@cmds, {
setting => 'auto_whitelist_db_modules',
setting => 'auto_welcomelist_db_modules',
aliases => ['auto_whitelist_db_modules'], # removed in 4.1
is_admin => 1,
default => 'DB_File GDBM_File SDBM_File',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=item auto_whitelist_file_mode (default: 0700)
=item auto_welcomelist_file_mode (default: 0700)
The file mode bits used for the automatic-whitelist directory or file.
Previously auto_whitelist_file_mode which will work interchangeably until 4.1.
The file mode bits used for the automatic-welcomelist directory or file.
Make sure you specify this using the 'x' mode bits set, as it may also be used
to create directories. However, if a file is created, the resulting file will
@ -330,7 +357,8 @@ not have any execute bits set (the umask is set to 0111).
=cut
push (@cmds, {
setting => 'auto_whitelist_file_mode',
setting => 'auto_welcomelist_file_mode',
aliases => ['auto_whitelist_file_mode'], # removed in 4.1
is_admin => 1,
default => '0700',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
@ -339,7 +367,7 @@ not have any execute bits set (the umask is set to 0111).
if ($value !~ /^0?[0-7]{3}$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{auto_whitelist_file_mode} = untaint_var($value);
$self->{auto_welcomelist_file_mode} = untaint_var($value);
}
});
@ -390,7 +418,7 @@ The password for the database username, for the above DSN.
Used by the SQLBasedAddrList storage implementation.
The table user auto-whitelists are stored in, for the above DSN.
The table user auto-welcomelists are stored in, for the above DSN.
=cut
@ -404,15 +432,15 @@ The table user auto-whitelists are stored in, for the above DSN.
$conf->{parser}->register_commands(\@cmds);
}
sub check_from_in_auto_whitelist {
sub check_from_in_auto_welcomelist {
my ($self, $pms) = @_;
return 0 unless ($pms->{conf}->{use_auto_whitelist});
return 0 unless ($pms->{conf}->{use_auto_welcomelist});
my $timer = $self->{main}->time_method("total_awl");
my $from = lc $pms->get('From:addr');
# dbg("auto-whitelist: From: $from");
# dbg("auto-welcomelist: From: $from");
return 0 unless $from =~ /\S/;
# find the earliest usable "originating IP". ignore private nets
@ -443,18 +471,18 @@ sub check_from_in_auto_whitelist {
my $awlpoints = (sprintf "%0.3f", $points) + 0;
# Create the AWL object
my $whitelist;
my $welcomelist;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($pms->{main});
$welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($pms->{main});
my $meanscore;
{ # check
my $timer = $self->{main}->time_method("check_awl");
$meanscore = $whitelist->check_address($from, $origip, $signedby);
$meanscore = $welcomelist->check_address($from, $origip, $signedby);
}
my $delta = 0;
dbg("auto-whitelist: AWL active, pre-score: %s, autolearn score: %s, ".
dbg("auto-welcomelist: AWL active, pre-score: %s, autolearn score: %s, ".
"mean: %s, IP: %s, address: %s %s",
$pms->{score}, $awlpoints,
!defined $meanscore ? 'undef' : sprintf("%.3f",$meanscore),
@ -463,13 +491,13 @@ sub check_from_in_auto_whitelist {
if (defined $meanscore) {
$delta = $meanscore - $awlpoints;
$delta *= $pms->{main}->{conf}->{auto_whitelist_factor};
$delta *= $pms->{main}->{conf}->{auto_welcomelist_factor};
$pms->set_tag('AWL', sprintf("%2.1f",$delta));
if (defined $meanscore) {
$pms->set_tag('AWLMEAN', sprintf("%2.1f", $meanscore));
}
$pms->set_tag('AWLCOUNT', sprintf("%2.1f", $whitelist->count()));
$pms->set_tag('AWLCOUNT', sprintf("%2.1f", $welcomelist->count()));
$pms->set_tag('AWLPRESCORE', sprintf("%2.1f", $pms->{score}));
}
@ -478,7 +506,7 @@ sub check_from_in_auto_whitelist {
# later ones. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=159704
if (!$pms->{disable_auto_learning}) {
my $timer = $self->{main}->time_method("update_awl");
$whitelist->add_score($awlpoints);
$welcomelist->add_score($awlpoints);
}
# now redundant, got_hit() takes care of it
@ -491,135 +519,138 @@ sub check_from_in_auto_whitelist {
score => sprintf("%0.3f", $delta));
}
$whitelist->finish();
$welcomelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
# try an unlock, in case we got that far
eval { $whitelist->finish(); } if $whitelist;
eval { $welcomelist->finish(); } if $welcomelist;
return 0;
};
dbg("auto-whitelist: post auto-whitelist score: %.3f", $pms->{score});
dbg("auto-welcomelist: post auto-welcomelist score: %.3f", $pms->{score});
# test hit is above
return 0;
}
*check_from_in_auto_whitelist = \&check_from_in_auto_welcomelist; # removed in 4.1
sub blacklist_address {
sub blocklist_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to add address to blacklist\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to add address to blacklist");
print "SpamAssassin auto-welcomelist: failed to add address to blocklist\n" if ($args->{cli_p});
dbg("auto-welcomelist: failed to add address to blocklist");
return;
}
my $whitelist;
my $welcomelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
$welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});
if ($whitelist->add_known_bad_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: adding address to blacklist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: adding address to blacklist: " . $args->{address});
if ($welcomelist->add_known_bad_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-welcomelist: adding address to blocklist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-welcomelist: adding address to blocklist: " . $args->{address});
$status = 0;
}
else {
print "SpamAssassin auto-whitelist: error adding address to blacklist\n" if ($args->{cli_p});
dbg("auto-whitelist: error adding address to blacklist");
print "SpamAssassin auto-welcomelist: error adding address to blocklist\n" if ($args->{cli_p});
dbg("auto-welcomelist: error adding address to blocklist");
$status = 1;
}
$whitelist->finish();
$welcomelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
eval { $welcomelist->finish(); };
return 0;
};
return $status;
}
*blacklist_address = \&blocklist_address; # removed in 4.1
sub whitelist_address {
sub welcomelist_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to add address to whitelist\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to add address to whitelist");
print "SpamAssassin auto-welcomelist: failed to add address to welcomelist\n" if ($args->{cli_p});
dbg("auto-welcomelist: failed to add address to welcomelist");
return 0;
}
my $whitelist;
my $welcomelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
$welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});
if ($whitelist->add_known_good_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: adding address to whitelist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: adding address to whitelist: " . $args->{address});
if ($welcomelist->add_known_good_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-welcomelist: adding address to welcomelist: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-welcomelist: adding address to welcomelist: " . $args->{address});
$status = 1;
}
else {
print "SpamAssassin auto-whitelist: error adding address to whitelist\n" if ($args->{cli_p});
dbg("auto-whitelist: error adding address to whitelist");
print "SpamAssassin auto-welcomelist: error adding address to welcomelist\n" if ($args->{cli_p});
dbg("auto-welcomelist: error adding address to welcomelist");
$status = 0;
}
$whitelist->finish();
$welcomelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
eval { $welcomelist->finish(); };
return 0;
};
return $status;
}
*whitelist_address = \&welcomelist_address; # removed in 4.1
sub remove_address {
my ($self, $args) = @_;
return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});
unless ($args->{address}) {
print "SpamAssassin auto-whitelist: failed to remove address\n" if ($args->{cli_p});
dbg("auto-whitelist: failed to remove address");
print "SpamAssassin auto-welcomelist: failed to remove address\n" if ($args->{cli_p});
dbg("auto-welcomelist: failed to remove address");
return 0;
}
my $whitelist;
my $welcomelist;
my $status;
eval {
$whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
$welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});
if ($whitelist->remove_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-whitelist: removing address: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-whitelist: removing address: " . $args->{address});
if ($welcomelist->remove_address($args->{address}, $args->{signedby})) {
print "SpamAssassin auto-welcomelist: removing address: " . $args->{address} . "\n" if ($args->{cli_p});
dbg("auto-welcomelist: removing address: " . $args->{address});
$status = 1;
}
else {
print "SpamAssassin auto-whitelist: error removing address\n" if ($args->{cli_p});
dbg("auto-whitelist: error removing address");
print "SpamAssassin auto-welcomelist: error removing address\n" if ($args->{cli_p});
dbg("auto-welcomelist: error removing address");
$status = 0;
}
$whitelist->finish();
$welcomelist->finish();
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
eval { $whitelist->finish(); };
warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
eval { $welcomelist->finish(); };
return 0;
};

View File

@ -72,7 +72,7 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule("check_access_database");
$self->register_eval_rule("check_access_database", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}

View File

@ -64,8 +64,8 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule("check_microsoft_executable");
$self->register_eval_rule("check_suspect_name");
$self->register_eval_rule("check_microsoft_executable", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_suspect_name", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -107,7 +107,7 @@ sub _check_attachments {
# file extension indicates an executable
$pms->{antivirus_microsoft_exe} = 1;
}
elsif ($cte =~ /base64/ && defined $p->raw()->[0] &&
elsif (index($cte, 'base64') >= 0 && defined $p->raw()->[0] &&
$p->raw()->[0] =~ /^TV[opqr].A..[AB].[AQgw][A-H].A/)
{
# base64-encoded executable

View File

@ -56,18 +56,30 @@ See the C<Mail::SpamAssassin::Conf> POD for details on C<rbl_timeout>.
A query template is a string which will be expanded to produce a domain name
to be used in a DNS query. The template may include SpamAssassin tag names,
which will be replaced by their values to form a final query domain.
The final query domain must adhere to rules governing DNS domains, i.e.
must consist of fields each up to 63 characters long, delimited by dots.
There may be a trailing dot at the end, but it is redundant / carries
no semantics, because SpamAssassin uses a Net::DSN::Resolver::send method
for querying DNS, which ignores any 'search' or 'domain' DNS resolver options.
must consist of fields each up to 63 characters long, delimited by dots,
not exceeding 255 characters. International domain names (in UTF-8) are
allowed and will be encoded to ASCII-compatible encoding (ACE) according
to IDN rules. Syntactically invalid resulting queries will be discarded
by the DNS resolver code (with some info warnings).
There may be a trailing dot at the end, but it is redundant / carries no
semantics, because SpamAssassin uses a Net::DSN::Resolver::send method for
querying DNS, which ignores any 'search' or 'domain' DNS resolver options.
Domain names in DNS queries are case-insensitive.
A tag name is a string of capital letters, preceded and followed by an
underscore character. This syntax mirrors the add_header setting, except that
tags cannot have parameters in parenthesis when used in askdns templates.
Tag names may appear anywhere in the template - each queried DNS zone
prescribes how a query should be formed.
underscore character. This syntax mirrors the add_header setting, except
that tags cannot have parameters in parenthesis when used in askdns
templates (exceptions found below). Tag names may appear anywhere in the
template - each queried DNS zone prescribes how a query should be formed.
Special supported tag HEADER() can be used to query any header content,
using same header names/modifiers that as header rules support. For example
_HEADER(Reply-To:addr:domain)_ can be used to query the trimmed domain part
of Reply-To address. See Mail::SpamAssassin::Conf documentation about
header rules.
A query template may contain any number of tag names including none,
although in the most common anticipated scenario exactly one tag name would
@ -114,7 +126,8 @@ parameter will only act as a filter on a result.
Currently recognized RR types in the rr_type parameter are: ANY, A, AAAA,
MX, TXT, PTR, NAPTR, NS, SOA, CERT, CNAME, DNAME, DHCID, HINFO, MINFO,
RP, HIP, IPSECKEY, KX, LOC, SRV, SSHFP, SPF.
RP, HIP, IPSECKEY, KX, LOC, GPOS, SRV, OPENPGPKEY, SSHFP, SPF, TLSA, URI,
CAA, CSYNC.
https://www.iana.org/assignments/dns-parameters/dns-parameters.xml
@ -189,8 +202,10 @@ use warnings;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Util qw(decode_dns_question_entry);
use Mail::SpamAssassin::Util qw(decode_dns_question_entry idn_to_ascii
compile_regexp is_fqdn_valid);
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use version 0.77;
our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -202,8 +217,6 @@ our %rcode_value = ( # https://www.iana.org/assignments/dns-parameters, RFC 619
BADMODE => 19, BADNAME => 20, BADALG => 21, BADTRUNC => 22,
);
our $txtdata_can_provide_a_list;
sub new {
my($class,$sa_main) = @_;
@ -213,10 +226,6 @@ sub new {
$self->set_config($sa_main->{conf});
#$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
#more robust version check from Damyan Ivanov - Bug 7095
$txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
return $self;
}
@ -253,20 +262,14 @@ sub parse_and_canonicalize_subtest {
my $result;
local($1,$2,$3);
# modifiers /a, /d, /l, /u in suffix form were added with perl 5.13.10 (5.14)
# currently known modifiers are [msixoadlu], but let's not be too picky here
if ( $subtest =~ m{^ / (.+) / ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \( (.+) \) ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \[ (.+) \] ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* \{ (.+) \} ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* < (.+) > ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
} elsif ($subtest =~ m{^ m \s* (\S) (.+) \1 ([a-z]*) \z}xs) {
$result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
if ($subtest =~ m{^/ .+ / [a-z]* \z}xs ||
$subtest =~ m{^m (\W) .+ (\W) [a-z]* \z}xs) {
my ($rec, $err) = compile_regexp($subtest, 1);
if (!$rec) {
warn "askdns: subtest compile failed: '$subtest': $err\n";
} else {
$result = $rec;
}
} elsif ($subtest =~ m{^ (["']) (.*) \1 \z}xs) { # quoted string
$result = $2;
} elsif ($subtest =~ m{^ \[ ( (?:[A-Z]+|\d+)
@ -324,8 +327,9 @@ sub set_config {
my @answer_types = split(/,/, $query_type);
# https://www.iana.org/assignments/dns-parameters/dns-parameters.xml
if (grep(!/^(?:ANY|A|AAAA|MX|TXT|PTR|NAPTR|NS|SOA|CERT|CNAME|DNAME|
DHCID|HINFO|MINFO|RP|HIP|IPSECKEY|KX|LOC|SRV|
SSHFP|SPF)\z/x, @answer_types)) {
DHCID|HINFO|MINFO|RP|HIP|IPSECKEY|KX|LOC|GPOS|SRV|
OPENPGPKEY|SSHFP|SPF|TLSA|URI|CAA|CSYNC)\z/x,
@answer_types)) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$query_type = 'ANY' if @answer_types > 1 || $answer_types[0] eq 'ANY';
@ -333,25 +337,19 @@ sub set_config {
$subtest = parse_and_canonicalize_subtest($subtest);
defined $subtest or return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
# initialize rule structure
$self->{askdns}{$rulename}{query} = $query_template;
$self->{askdns}{$rulename}{q_type} = $query_type;
$self->{askdns}{$rulename}{a_types} = \@answer_types;
$self->{askdns}{$rulename}{subtest} = $subtest;
$self->{askdns}{$rulename}{tags} = ();
# collect tag names as used in each query template
my @tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/g;
my %seen; @tags = grep(!$seen{$_}++, @tags); # filter out duplicates
# group rules by tag names used in them (to be used as a hash key)
my $depends_on_tags = !@tags ? '' : join(',',@tags);
# subgroup rules by a DNS RR type and a nonexpanded query template
my $query_template_key = $query_type . ':' . $query_template;
$self->{askdns}{$depends_on_tags}{$query_template_key} ||=
{ query => $query_template, rules => {}, q_type => $query_type,
a_types => # optimization: undef means "same as q_type"
@answer_types == 1 && $answer_types[0] eq $query_type ? undef
: \@answer_types };
$self->{askdns}{$depends_on_tags}{$query_template_key}{rules}{$rulename}
= $subtest;
# dbg("askdns: rule: %s, config dep: %s, domkey: %s, subtest: %s",
# $rulename, $depends_on_tags, $query_template_key, $subtest);
# also support common HEADER(arg) tag which does $pms->get(arg)
my @tags = $query_template =~ /_([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*(?:\(.*?\))?)_/g;
# save rule to tag dependencies
$self->{askdns}{$rulename}{tags}{$_} = 1 foreach (@tags);
# just define the test so that scores and lint works
$self->{parser}->add_test($rulename, undef,
@ -366,182 +364,127 @@ sub set_config {
# run as early as possible, launching DNS queries as soon as their
# dependencies are fulfilled
#
sub parsed_metadata {
sub check_dnsbl {
my($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $pms->{conf};
return if !$pms->is_dns_available;
$pms->{askdns_map_dnskey_to_rules} = {};
return if !$pms->is_dns_available();
# walk through all collected askdns rules, obtain tag values whenever
# they may become available, and launch DNS queries right after
#
for my $depends_on_tags (keys %{$conf->{askdns}}) {
my @tags;
@tags = split(/,/, $depends_on_tags) if $depends_on_tags ne '';
if (would_log("dbg","askdns")) {
while ( my($query_template_key, $struct) =
each %{$conf->{askdns}{$depends_on_tags}} ) {
my($query_template, $query_type, $answer_types_ref, $rules) =
@$struct{qw(query q_type a_types rules)};
dbg("askdns: depend on tags %s, rules: %s ",
$depends_on_tags, join(', ', keys %$rules));
foreach my $rulename (keys %{$conf->{askdns}}) {
if (!$conf->{scores}->{$rulename}) {
dbg("askdns: skipping disabled rule $rulename");
next;
}
}
if (!@tags) {
# no dependencies on tags, just call directly
$self->launch_queries($pms,$depends_on_tags);
} else {
# enqueue callback for tags needed
my @tags = sort keys %{$conf->{askdns}{$rulename}{tags}};
if (@tags) {
dbg("askdns: rule %s depends on tags: %s", $rulename,
join(', ', @tags));
$pms->action_depends_on_tags(@tags == 1 ? $tags[0] : \@tags,
sub { my($pms,@args) = @_;
$self->launch_queries($pms,$depends_on_tags) }
$self->launch_queries($pms,$rulename,\@tags) }
);
} else {
# no dependencies on tags, just call directly
$self->launch_queries($pms,$rulename,[]);
}
}
}
# generate DNS queries - called for each set of rules
# when their tag dependencies are met
# generate DNS queries - called for each rule when its tag dependencies
# are met
#
sub launch_queries {
my($self, $pms, $depends_on_tags) = @_;
my $conf = $pms->{conf};
my($self, $pms, $rulename, $tags) = @_;
my %tags;
# obtain tag/value pairs of tags we depend upon in this set of rules
if ($depends_on_tags ne '') {
%tags = map( ($_,$pms->get_tag($_)), split(/,/,$depends_on_tags) );
my $arule = $pms->{conf}->{askdns}{$rulename};
my $query_tmpl = $arule->{query};
my $queries;
if (@$tags) {
if (!exists $pms->{askdns_qtmpl_cache}{$query_tmpl}) {
# replace tags in query template
# iterate through each tag, replacing list of strings as we go
my %q_iter = ( "$query_tmpl" => 1 );
foreach my $tag (@$tags) {
# cache tag values locally
if (!exists $pms->{askdns_tag_cache}{$tag}) {
my $valref = $pms->get_tag_raw($tag);
my @vals = grep { defined $_ && $_ ne '' } (ref $valref ? @$valref : $valref);
# Paranoid check for undefined tag
if (!@vals) {
dbg("askdns: skipping rule $rulename, no value found for tag: $tag");
return;
}
dbg("askdns: preparing queries which depend on tags: %s",
join(', ', map($_.' => '.$tags{$_}, keys %tags)));
# replace tag names in a query template with actual tag values
# and launch DNS queries
while ( my($query_template_key, $struct) =
each %{$conf->{askdns}{$depends_on_tags}} ) {
my($query_template, $query_type, $answer_types_ref, $rules) =
@$struct{qw(query q_type a_types rules)};
my @rulenames = keys %$rules;
if (grep($conf->{scores}->{$_}, @rulenames)) {
dbg("askdns: query template %s, type %s, rules: %s",
$query_template,
!$answer_types_ref ? $query_type
: $query_type.'/'.join(',',@$answer_types_ref),
join(', ', @rulenames));
$pms->{askdns_tag_cache}{$tag} = \@vals;
}
my %q_iter_new;
foreach my $q (keys %q_iter) {
# handle space separated multi-valued tags
foreach my $val (@{$pms->{askdns_tag_cache}{$tag}}) {
my $qtmp = $q;
$qtmp =~ s/\Q_${tag}_\E/${val}/g;
$q_iter_new{$qtmp} = 1;
}
}
%q_iter = %q_iter_new;
}
# cache idn'd queries
my @q_arr;
push @q_arr, idn_to_ascii($_) foreach (keys %q_iter);
$pms->{askdns_qtmpl_cache}{$query_tmpl} = \@q_arr;
}
$queries = $pms->{askdns_qtmpl_cache}{$query_tmpl};
} else {
dbg("askdns: query template %s, type %s, all rules disabled: %s",
$query_template, $query_type, join(', ', @rulenames));
push @$queries, idn_to_ascii($query_tmpl);
}
foreach my $query (@$queries) {
if (!is_fqdn_valid($query, 1)) {
dbg("askdns: skipping invalid query ($rulename): $query");
next;
}
# collect all tag names from a template, each may occur more than once
my @templ_tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/gs;
# filter out duplicate tag names, and tags with undefined or empty value
my %seen;
@templ_tags = grep(!$seen{$_}++ && defined $tags{$_} && $tags{$_} ne '',
@templ_tags);
my %templ_vals; # values that each tag takes
for my $t (@templ_tags) {
my %seen;
# a tag value may be a space-separated list,
# store it as an arrayref, removing duplicate values
$templ_vals{$t} = [ grep(!$seen{$_}++, split(' ',$tags{$t})) ];
}
# count through all tag value tuples
my @digit = (0) x @templ_tags; # counting accumulator
OUTER:
for (;;) {
my %current_tag_val; # maps a tag name to its current iteration value
for my $j (0 .. $#templ_tags) {
my $t = $templ_tags[$j];
$current_tag_val{$t} = $templ_vals{$t}[$digit[$j]];
}
local $1;
my $query_domain = $query_template;
$query_domain =~ s{_([A-Z][A-Z0-9]*)_}
{ defined $current_tag_val{$1} ? $current_tag_val{$1}
: '' }ge;
# the $dnskey identifies this query in AsyncLoop's pending_lookups
my $dnskey = join(':', 'askdns', $query_type, $query_domain);
dbg("askdns: expanded query %s, dns key %s", $query_domain, $dnskey);
if ($query_domain eq '') {
# ignore, just in case
} else {
if (!exists $pms->{askdns_map_dnskey_to_rules}{$dnskey}) {
$pms->{askdns_map_dnskey_to_rules}{$dnskey} =
[ [$query_type, $answer_types_ref, $rules] ];
} else {
push(@{$pms->{askdns_map_dnskey_to_rules}{$dnskey}},
[$query_type, $answer_types_ref, $rules] );
}
# launch a new DNS query for $query_type and $query_domain
my $ent = $pms->{async}->bgsend_and_start_lookup(
$query_domain, $query_type, undef,
{ key => $dnskey, zone => $query_domain },
sub { my ($ent2,$pkt) = @_;
$self->process_response_packet($pms, $ent2, $pkt, $dnskey) },
master_deadline => $pms->{master_deadline} );
# these rules are now underway; unless the rule hits, these will
# not be considered "finished" until harvest_dnsbl_queries() completes
$pms->register_async_rule_start($dnskey) if $ent;
}
last if !@templ_tags;
# increment accumulator, little-endian
for (my $j = 0; ; $j++) {
last if ++$digit[$j] <= $#{$templ_vals{$templ_tags[$j]}};
$digit[$j] = 0; # and carry
last OUTER if $j >= $#templ_tags;
}
}
dbg("askdns: launching query ($rulename): $query");
my $ret = $pms->{async}->bgsend_and_start_lookup(
$query, $arule->{q_type}, undef,
{ rulename => $rulename, type => 'AskDNS' },
sub { my ($ent,$pkt) = @_;
$self->process_response_packet($pms, $ent, $pkt, $rulename) },
master_deadline => $pms->{master_deadline}
);
$pms->rule_ready($rulename) if !$ret; # mark ready if nothing launched
}
}
sub process_response_packet {
my($self, $pms, $ent, $pkt, $dnskey) = @_;
my($self, $pms, $ent, $pkt, $rulename) = @_;
my $conf = $pms->{conf};
my %rulenames_hit;
# map a dnskey back to info on queries which caused this DNS lookup
my $queries_ref = $pms->{askdns_map_dnskey_to_rules}{$dnskey};
my($header, @question, @answer, $qtype, $rcode);
# NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
if ($pkt) {
@answer = $pkt->answer;
$header = $pkt->header;
@question = $pkt->question;
$qtype = uc $question[0]->qtype if @question;
$rcode = uc $header->rcode if $header; # 'NOERROR', 'NXDOMAIN', ...
return if !$pkt;
my @question = $pkt->question;
return if !@question;
$pms->rule_ready($rulename); # mark rule ready for metas
my @answer = $pkt->answer;
my $rcode = uc $pkt->header->rcode; # 'NOERROR', 'NXDOMAIN', ...
# NOTE: qname is encoded in RFC 1035 zone format, decode it
dbg("askdns: answer received, rcode %s, query %s, answer has %d records",
$rcode,
dbg("askdns: answer received (%s), rcode %s, query %s, answer has %d records",
$rulename, $rcode,
join(', ', map(join('/', decode_dns_question_entry($_)), @question)),
scalar @answer);
if (defined $rcode && exists $rcode_value{$rcode}) {
# Net::DNS return a rcode name for codes it knows about,
# and returns a number for the rest; we deal with numbers from here on
$rcode = $rcode_value{$rcode} if exists $rcode_value{$rcode};
}
}
if (!@answer) {
# a trick to make the following loop run at least once, so that we can
# evaluate also rules which only care for rcode status
@answer = ( undef );
}
@answer = (undef) if !@answer;
# NOTE: $rr->rdstring returns the result encoded in a DNS zone file
# format, i.e. enclosed in double quotes if a result contains whitespace
@ -561,9 +504,12 @@ sub process_response_packet {
# the code handling such reply from DNS MUST assemble all of these
# marshaled text blocks into a single one before any syntactical
# verification takes place.
# The same goes for RFC 4408 (SPF), RFC 4871 (DKIM), RFC 5617 (ADSP),
# The same goes for RFC 7208 (SPF), RFC 4871 (DKIM), RFC 5617 (ADSP),
# draft-kucherawy-dmarc-base (DMARC), ...
my $arule = $pms->{conf}->{askdns}{$rulename};
my $subtest = $arule->{subtest};
for my $rr (@answer) {
my($rr_rdatastr, $rdatanum, $rr_type);
if (!$rr) {
@ -571,37 +517,32 @@ sub process_response_packet {
} else {
$rr_type = uc $rr->type;
if ($rr_type eq 'A') {
# Net::DNS::RR::A::address() is available since Net::DNS 0.69
$rr_rdatastr = $rr->UNIVERSAL::can('address') ? $rr->address
: $rr->rdatastr;
$rr_rdatastr = $rr->address;
if ($rr_rdatastr =~ m/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/) {
$rdatanum = Mail::SpamAssassin::Util::my_inet_aton($rr_rdatastr);
}
} elsif ($rr->UNIVERSAL::can('txtdata')) {
# TXT, SPF: join with no intervening spaces, as per RFC 5518
if ($txtdata_can_provide_a_list || $rr_type ne 'TXT') {
$rr_rdatastr = join('', $rr->txtdata); # txtdata() in list context!
} else { # char_str_list() is only available for TXT records
$rr_rdatastr = join('', $rr->char_str_list); # historical
}
# Net::DNS attempts to decode text strings in a TXT record as UTF-8,
# which is undesired: octets failing the UTF-8 decoding are converted
# to a Unicode "replacement character" U+FFFD (encoded as octets
# \x{EF}\x{BF}\x{BD} in UTF-8), and ASCII text is unnecessarily
# flagged as perl native characters (utf8 flag on), which can be
# disruptive on later processing, e.g. implicitly upgrading strings
# on concatenation. Unfortunately there is no way of legally bypassing
# the UTF-8 decoding by Net::DNS::RR::TXT in Net::DNS::RR::Text.
# Try to minimize damage by encoding back to UTF-8 octets:
utf8::encode($rr_rdatastr) if utf8::is_utf8($rr_rdatastr);
} else {
# rdatastr() is historical, use rdstring() since Net::DNS 0.69
$rr_rdatastr = $rr->UNIVERSAL::can('rdstring') ? $rr->rdstring
: $rr->rdatastr;
$rr_rdatastr = $rr->rdstring;
utf8::encode($rr_rdatastr) if utf8::is_utf8($rr_rdatastr);
}
# dbg("askdns: received rr type %s, data: %s", $rr_type, $rr_rdatastr);
}
for my $q_tuple (!ref $queries_ref ? () : @$queries_ref) {
next if !$q_tuple;
my($query_type, $answer_types_ref, $rules) = @$q_tuple;
next if !defined $qtype;
$answer_types_ref = [$query_type] if !defined $answer_types_ref;
while (my($rulename,$subtest) = each %$rules) {
my $match;
local($1,$2,$3);
if (ref $subtest eq 'HASH') { # a list of DNS rcodes (as hash keys)
@ -609,7 +550,7 @@ sub process_response_packet {
} elsif ($rcode != 0) {
# skip remaining tests on DNS error
} elsif (!defined($rr_type) ||
!grep($_ eq 'ANY' || $_ eq $rr_type, @$answer_types_ref) ) {
!grep($_ eq 'ANY' || $_ eq $rr_type, @{$arule->{a_types}}) ) {
# skip remaining tests on wrong RR type
} elsif (!defined $subtest) {
$match = 1; # any valid response of the requested RR type matches
@ -628,15 +569,10 @@ sub process_response_packet {
: 0; # notice int($n1) to fix perl ~5.14 taint bug (Bug 7725)
}
if ($match) {
$self->askdns_hit($pms, $ent->{query_domain}, $qtype,
$self->askdns_hit($pms, $ent->{query_domain}, $question[0]->qtype,
$rr_rdatastr, $rulename);
$rulenames_hit{$rulename} = 1;
}
}
}
}
# these rules have completed (since they got at least 1 hit)
$pms->register_async_rule_finish($_) for keys %rulenames_hit;
}
sub askdns_hit {
@ -648,9 +584,11 @@ sub askdns_hit {
# only the first hit will show in the test log report, even if
# an answer section matches more than once - got_hit() handles this
$pms->clear_test_state;
$pms->test_log(sprintf("%s %s:%s", $query_domain,$qtype,$rr_rdatastr));
$pms->test_log(sprintf("%s %s:%s", $query_domain,$qtype,$rr_rdatastr), $rulename);
$pms->got_hit($rulename, 'ASKDNS: ', ruletype => 'askdns'); # score=>$score
}
# Version features
sub has_tag_header { 1 } # HEADER() was implemented together with Conf::feature_get_host # Bug 7734
1;

View File

@ -0,0 +1,553 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::AuthRes - use Authentication-Results header fields
=head1 SYNOPSIS
=head2 SpamAssassin configuration:
loadplugin Mail::SpamAssassin::Plugin::AuthRes
authres_trusted_authserv myserv.example.com
authres_networks all
=head1 DESCRIPTION
This plugin parses Authentication-Results header fields and can supply the
results obtained to other plugins, so as to avoid repeating checks that have
been performed already.
=cut
package Mail::SpamAssassin::Plugin::AuthRes;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
# use bytes;
use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
# list of valid methods and values
# https://www.iana.org/assignments/email-auth/email-auth.xhtml
# some others not in that list:
# dkim-atps=neutral
my %method_result = (
'auth' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
'dkim' => {'fail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'temperror'=>1},
'dkim-adsp' => {'discard'=>1,'fail'=>1,'none'=>1,'nxdomain'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'unknown'=>1},
'dkim-atps' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'neutral'=>1},
'dmarc' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
'domainkeys' => {'fail'=>1,'neutral'=>1,'none'=>1,'permerror'=>1,'policy'=>1,'pass'=>1,'temperror'=>1},
'iprev' => {'fail'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
'rrvs' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'unknown'=>1},
'sender-id' => {'fail'=>1,'hardfail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'softfail'=>1,'temperror'=>1},
'smime' => {'fail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'temperror'=>1},
'spf' => {'fail'=>1,'hardfail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'softfail'=>1,'temperror'=>1},
'vbr' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
);
my %method_ptype_prop = (
'auth' => {'smtp' => {'auth'=>1,'mailfrom'=>1}},
'dkim' => {'header' => {'d'=>1,'i'=>1,'b'=>1}},
'dkim-adsp' => {'header' => {'from'=>1}},
'dkim-atps' => {'header' => {'from'=>1}},
'dmarc' => {'header' => {'from'=>1}},
'domainkeys' => {'header' => {'d'=>1,'from'=>1,'sender'=>1}},
'iprev' => {'policy' => {'iprev'=>1}},
'rrvs' => {'smtp' => {'rcptto'=>1}},
'sender-id' => {'header' => {'*'=>1}},
'smime' => {'body' => {'smime-part'=>1,'smime-identifer'=>1,'smime-serial'=>1,'smime-issuer'=>1}},
'spf' => {'smtp' => {'mailfrom'=>1,'helo'=>1}},
'vbr' => {'header' => {'md'=>1,'mv'=>1}},
);
# Some MIME helpers
my $QUOTED_STRING = qr/"((?:[^"\\]++|\\.)*+)"?/;
my $TOKEN = qr/[^\s\x00-\x1f\x80-\xff\(\)\<\>\@\,\;\:\/\[\]\?\=\"]+/;
my $ATOM = qr/[a-zA-Z0-9\@\!\#\$\%\&\\\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+/;
sub new {
my ($class, $mailsa) = @_;
# the usual perlobj boilerplate to create a subclass object
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->set_config($mailsa->{conf});
# process first as other plugins might depend on us
$self->register_method_priority("parsed_metadata", -10);
$self->register_eval_rule("check_authres_result", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
=head1 ADMINISTRATOR OPTIONS
=over
=item authres_networks internal/trusted/all (default: internal)
Process Authenticated-Results headers set by servers from these networks
(refers to SpamAssassin *_networks zones). Any header outside this is
completely ignored (affects all module settings).
internal = internal_networks
trusted = internal_networks + trusted_networks
all = all above + all external
Setting "all" is safe only if your MX servers filter properly all incoming
A-R headers, and you use authres_trusted_authserv to match your authserv-id.
This is suitable for default OpenDKIM for example. These settings might
also be required if your filters do not insert A-R header to correct
position above the internal Received header (some known offenders: OpenDKIM,
OpenDMARC, amavisd-milter).
=back
=cut
push (@cmds, {
setting => 'authres_networks',
is_admin => 1,
default => 'internal',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value =~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
$value = lc($value);
if ($value =~ /^(?:internal|trusted|all)$/) {
$self->{authres_networks} = $value;
} else {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
}
});
=over
=item authres_trusted_authserv authservid1 id2 ... (default: none)
Trusted authentication server IDs (the domain-name-like first word of
Authentication-Results field, also known as C<authserv-id>).
Note that if set, ALL A-R headers are ignored unless a match is found.
Use strongly recommended, possibly along with authres_networks all.
=back
=cut
push (@cmds, {
setting => 'authres_trusted_authserv',
is_admin => 1,
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value =~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
foreach my $id (split(/\s+/, lc $value)) {
$self->{authres_trusted_authserv}->{$id} = 1;
}
}
});
=over
=item authres_ignored_authserv authservid1 id2 ... (default: none)
Ignored authentication server IDs (the domain-name-like first word of
Authentication-Results field, also known as C<authserv-id>).
Any A-R header is ignored if match is found.
=back
=cut
push (@cmds, {
setting => 'authres_ignored_authserv',
is_admin => 1,
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || $value =~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
foreach my $id (split(/\s+/, lc $value)) {
$self->{authres_ignored_authserv}->{$id} = 1;
}
}
});
$conf->{parser}->register_commands(\@cmds);
}
=head1 METADATA
Parsed headers are stored in $pms-E<gt>{authres_parsed}, as a hash of array
of hashes where results are collected by method. For example, the header
field:
Authentication-Results: server.example.com;
spf=pass smtp.mailfrom=bounce.example.org;
dkim=pass header.i=@example.org;
dkim=fail header.i=@another.signing.domain.example
Produces the following structure:
$pms->{authres_parsed} = {
'dkim' => [
{
'properties' => {
'header' => {
'i' => '@example.org'
}
},
'authserv' => 'server.example.com',
'result' => 'pass',
'version' => 1,
'reason' => ''
},
{
'properties' => {
'header' => {
'i' => '@another.signing.domain.example'
}
},
'result' => 'fail',
'authserv' => 'server.example.com',
'version' => 1,
'reason' => ''
},
],
}
Within each array, the order of results is the original, which should be most
recent results first.
For checking result of methods, $pms-E<gt>{authres_result} is available:
$pms->{authres_result} = {
'dkim' => 'pass',
'spf' => 'fail',
}
=head1 EVAL FUNCTIONS
=over 4
=item header RULENAME eval:check_authres_result(method, result)
Can be used to check results.
ifplugin Mail::SpamAssassin::Plugin::AuthRes
ifplugin !(Mail::SpamAssassin::Plugin::SPF)
header SPF_PASS eval:check_authres_result('spf', 'pass')
header SPF_FAIL eval:check_authres_result('spf', 'fail')
header SPF_SOFTFAIL eval:check_authres_result('spf', 'softfail')
header SPF_TEMPFAIL eval:check_authres_result('spf', 'tempfail')
endif
ifplugin !(Mail::SpamAssassin::Plugin::DKIM)
header DKIM_VERIFIED eval:check_authres_result('dkim', 'pass')
header DKIM_INVALID eval:check_authres_result('dkim', 'fail')
endif
endif
=back
=cut
sub check_authres_result {
my ($self, $pms, $method, $wanted_result) = @_;
my $result = $pms->{authres_result}->{$method};
$wanted_result = lc($wanted_result);
if ($wanted_result eq 'missing') {
return !defined($result) ? 1 : 0;
}
return ($wanted_result eq $result);
}
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my @authres;
my $nethdr;
if ($pms->{conf}->{authres_networks} eq 'internal') {
$nethdr = 'ALL-INTERNAL';
} elsif ($pms->{conf}->{authres_networks} eq 'trusted') {
$nethdr = 'ALL-TRUSTED';
} else {
$nethdr = 'ALL';
}
foreach my $hdr (split(/^/m, $pms->get($nethdr))) {
if ($hdr =~ /^(?:Arc\-)?Authentication-Results:\s*(.+)/i) {
push @authres, $1;
}
}
if (!@authres) {
dbg("authres: no Authentication-Results headers found from %s",
$pms->{conf}->{authres_networks});
return 0;
}
foreach (@authres) {
eval {
$self->parse_authres($pms, $_);
} or do {
dbg("authres: skipping header, $@");
}
}
$pms->{authres_result} = {};
# Set $pms->{authres_result} info for all found methods
# 'pass' will always win if multiple results
foreach my $method (keys %method_result) {
my $parsed = $pms->{authres_parsed}->{$method};
next if !$parsed;
foreach my $pref (@$parsed) {
if (!$pms->{authres_result}->{$method} ||
$pref->{result} eq 'pass')
{
$pms->{authres_result}->{$method} = $pref->{result};
}
}
}
if (%{$pms->{authres_result}}) {
dbg("authres: results: %s",
join(' ', map { $_.'='.$pms->{authres_result}->{$_} }
sort keys %{$pms->{authres_result}}));
} else {
dbg("authres: no results");
}
}
sub parse_authres {
my ($self, $pms, $hdr) = @_;
dbg("authres: parsing Authentication-Results: $hdr");
my $authserv;
my $version = 1;
my @methods;
local $_ = $hdr;
# authserv-id
if (!/\G($TOKEN)/gcs) {
die("invalid authserv\n");
}
$authserv = lc($1);
if (%{$pms->{conf}->{authres_trusted_authserv}}) {
if (!$pms->{conf}->{authres_trusted_authserv}->{$authserv}) {
die("authserv not trusted: $authserv\n");
}
}
if ($pms->{conf}->{authres_ignored_authserv}->{$authserv}) {
die("ignored authserv: $authserv\n");
}
skip_cfws();
if (/\G\d+/gcs) { # skip authserv version
skip_cfws();
}
if (!/\G;/gcs) {
die("missing delimiter\n");
}
skip_cfws();
while (pos() < length()) {
my ($method, $result);
my $reason = '';
my $props = {};
# skip none method
if (/\Gnone\b/igcs) {
die("method none\n");
}
# method / version = result
if (!/\G([\w-]+)/gcs) {
die("invalid method\n");
}
$method = lc($1);
if (!exists $method_result{$method}) {
die("unknown method: $method\n");
}
skip_cfws();
if (/\G\//gcs) {
skip_cfws();
if (!/\G\d+/gcs) {
die("invalid $method version\n");
}
$version = $1;
skip_cfws();
}
if (!/\G=/gcs) {
die("missing result for $method: ".substr($_, pos())."\n");
}
skip_cfws();
if (!/\G(\w+)/gcs) {
die("invalid result for $method\n");
}
$result = $1;
if (!exists $method_result{$method}{$result}) {
die("unknown result for $method: $result\n");
}
skip_cfws();
# reason = value
if (/\Greason\b/igcs) {
skip_cfws();
if (!/\G=/gcs) {
die("invalid reason\n");
}
skip_cfws();
if (!/\G$QUOTED_STRING|($TOKEN)/gcs) {
die("invalid reason\n");
}
$reason = defined $1 ? $1 : $2;
skip_cfws();
}
# ptype.property = value
while (pos() < length()) {
my ($ptype, $property, $value);
# ptype
if (!/\G(\w+)/gcs) {
die("invalid ptype: ".substr($_,pos())."\n");
}
$ptype = lc($1);
if (!exists $method_ptype_prop{$method}{$ptype}) {
die("unknown ptype: $ptype\n");
}
skip_cfws();
# dot
if (!/\G\./gcs) {
die("missing property\n");
}
skip_cfws();
# property
if (!/\G(\w+)/gcs) {
die("invalid property\n");
}
$property = lc($1);
if (!exists $method_ptype_prop{$method}{$ptype}{$property} &&
!exists $method_ptype_prop{$method}{$ptype}{'*'}) {
die("unknown property for $ptype: $property\n");
}
skip_cfws();
# =
if (!/\G=/gcs) {
die("missing property value\n");
}
skip_cfws();
# value:
# The grammar is ( value / [ [ local-part ] "@" ] domain-name )
# where value := token / quoted-string
# and local-part := dot-atom / quoted-string / obs-local-part
if (!/\G$QUOTED_STRING|($ATOM(?:\.$ATOM)*|$TOKEN)(?=(?:[\s;]|$))/gcs) {
die("invalid $ptype.$property value\n");
}
$value = defined $1 ? $1 : $2;
skip_cfws();
$props->{$ptype}->{$property} = $value;
if (/\G(?:;|$)/gcs) {
skip_cfws();
last;
}
}
push @methods, [$method, {
'authserv' => $authserv,
'version' => $version,
'result' => $result,
'reason' => $reason,
'properties' => $props,
}];
}
# paranoid check..
if (pos() < length()) {
die("parse ended prematurely?\n");
}
# Pushed to pms only if header parsed completely
foreach my $marr (@methods) {
push @{$pms->{authres_parsed}->{$marr->[0]}}, $marr->[1];
}
return 1;
}
# skip whitespace and comments
sub skip_cfws {
/\G\s*/gcs;
if (/\G\(/gcs) {
my $i = 1;
while (/\G.*?([()]|\z)/gcs) {
$1 eq ')' ? $i-- : $i++;
last if !$i;
}
die("comment not ended\n") if $i;
/\G\s*/gcs;
}
}
#sub check_cleanup {
# my ($self, $opts) = @_;
# my $pms = $opts->{permsgstatus};
# use Data::Dumper;
# print STDERR Dumper($pms->{authres_parsed});
# print STDERR Dumper($pms->{authres_result});
#}
1;

View File

@ -105,7 +105,11 @@ Note: SpamAssassin requires at least 3 points from the header, and 3
points from the body to auto-learn as spam. Therefore, the minimum
working value for this option is 6.
If the test option autolearn_force is set, the minimum value will
If test option C<autolearn_header> or C<autolearn_body> is set, points from
that rule are forced to count as coming from header or body accordingly.
This can be useful for adjusting some meta rules.
If the test option C<autolearn_force> is set, the minimum value will
remain at 6 points but there is no requirement that the points come
from body and header rules. This option is useful for autolearning
with rules that are considered to be extremely safe indicators of

View File

@ -37,7 +37,43 @@ And the chi-square probability combiner as described here:
The results are incorporated into SpamAssassin as the BAYES_* rules.
=head1 METHODS
=head1 ADMINISTRATOR SETTINGS
=over 4
=item bayes_stopword_languages lang (default: en)
Languages enabled in bayes stopwords processing, every language have a
default stopwords regexp, tokens matching this regular expression will not
be considered in bayes processing.
Custom regular expressions for additional languages can be defined in C<local.cf>.
Custom regular expressions can be specified by using the C<bayes_stopword_lang>
keyword like in the following example:
bayes_stopword_languages en se
bayes_stopword_en (?:you|me)
bayes_stopword_se (?:du|mig)
Regexps are case-insensitive will be anchored automatically at beginning and
end.
To disable stopwords usage, specify C<bayes_stopword_languages disable>.
Only one bayes_stopword_languages or bayes_stopword_xx configuration line
can be used. New configuration line will override the old one, for example
the ones from SpamAssassin default ruleset (60_bayes_stopwords.cf).
=back
=over 4
=item bayes_max_token_length (default: 15)
Configure the maximum number of character a token could contain
=back
=cut
@ -48,16 +84,12 @@ use warnings;
# use bytes;
use re 'taint';
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) }
}
use Digest::SHA qw(sha1 sha1_hex);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Util qw(compile_regexp untaint_var);
# pick ONLY ONE of these combining implementations.
use Mail::SpamAssassin::Bayes::CombineChi;
@ -135,13 +167,16 @@ our $IGNORED_HDRS = qr{(?: (?:X-)?Sender # misc noise
| X-Gnus-Mail-Source
| Xref
)}x;
)}ix;
# Note only the presence of these headers, in order to reduce the
# hapaxen they generate.
our $MARK_PRESENCE_ONLY_HDRS = qr{(?: X-Face
|X-(?:Gnu-?PG|PGP|GPG)(?:-Key)?-Fingerprint
|D(?:KIM|omainKey)-Signature
|X-Google-DKIM-Signature
|ARC-(?:Message-Signature|Seal)
|Autocrypt
)}ix;
# tweaks tested as of Nov 18 2002 by jm posted to -devel at
@ -220,6 +255,8 @@ use constant REQUIRE_SIGNIFICANT_TOKENS_TO_SCORE => -1;
# How long a token should we hold onto? (note: German speakers typically
# will require a longer token than English ones.)
# This is just a default value, option can be changed using
# bayes_max_token_length option
use constant MAX_TOKEN_LENGTH => 15;
###########################################################################
@ -236,10 +273,100 @@ sub new {
$self->{conf} = $main->{conf};
$self->{use_ignores} = 1;
$self->register_eval_rule("check_bayes");
# Old default stopword list, need to have hardcoded one incase sa-update is not available
$self->{bayes_stopword}{en} = qr/(?:a(?:ble|l(?:ready|l)|n[dy]|re)|b(?:ecause|oth)|c(?:an|ome)|e(?:ach|mail|ven)|f(?:ew|irst|or|rom)|give|h(?:a(?:ve|s)|ttp)|i(?:n(?:formation|to)|t\'s)|just|know|l(?:ike|o(?:ng|ok))|m(?:a(?:de|il(?:(?:ing|to))?|ke|ny)|o(?:re|st)|uch)|n(?:eed|o[tw]|umber)|o(?:ff|n(?:ly|e)|ut|wn)|p(?:eople|lace)|right|s(?:ame|ee|uch)|t(?:h(?:at|is|rough|e)|ime)|using|w(?:eb|h(?:ere|y)|ith(?:out)?|or(?:ld|k))|y(?:ears?|ou(?:(?:\'re|r))?))/;
$self->set_config($self->{conf});
$self->register_eval_rule("check_bayes", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'bayes_max_token_length',
default => MAX_TOKEN_LENGTH,
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'bayes_stopword_languages',
default => ['en'],
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRINGLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
my @langs;
if ($value eq 'disable') {
@{$self->{bayes_stopword_languages}} = ();
}
else {
foreach my $lang (split(/(?:\s*,\s*|\s+)/, lc($value))) {
if ($lang !~ /^([a-z]{2})$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
push @langs, $lang;
}
if (!@langs) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
@{$self->{bayes_stopword_languages}} = @langs;
}
}
});
$conf->{parser}->register_commands(\@cmds);
}
sub parse_config {
my ($self, $opts) = @_;
# Ignore users's configuration lines
return 0 if $opts->{user_config};
if ($opts->{key} =~ /^bayes_stopword_([a-z]{2})$/i) {
$self->inhibit_further_callbacks();
my $lang = lc($1);
foreach my $re (split(/\s+/, $opts->{value})) {
my ($rec, $err) = compile_regexp('^(?i)'.$re.'$', 0);
if (!$rec) {
warn "bayes: invalid regexp for $opts->{key}: $err\n";
return 0;
}
$self->{bayes_stopword}{$lang} = $rec;
}
return 1;
}
return 0;
}
sub finish_parsing_end {
my ($self, $opts) = @_;
my $conf = $opts->{conf};
my @langs;
foreach my $lang (@{$conf->{bayes_stopword_languages}}) {
if (defined $self->{bayes_stopword}{$lang}) {
push @langs, $lang;
} else {
warn "bayes: missing stopwords regexp for language '$lang'\n";
}
}
if (@langs) {
dbg("bayes: stopwords for languages enabled: ".join(' ', @langs));
@{$conf->{bayes_stopword_languages}} = @langs;
} else {
dbg("bayes: no stopword languages enabled");
$conf->{bayes_stopword_languages} = [];
}
return 0;
}
sub finish {
my $self = shift;
if ($self->{store}) {
@ -410,10 +537,11 @@ sub _learn_trapped {
my @msgid = ( $msgid );
if (!defined $msgid) {
@msgid = $self->get_msgid($msg);
@msgid = ( $msg->generate_msgid(), $msg->get_msgid() );
}
foreach my $msgid_t ( @msgid ) {
next if !defined $msgid_t;
my $seen = $self->{store}->seen_get ($msgid_t);
if (defined ($seen)) {
@ -545,7 +673,7 @@ sub _forget_trapped {
my $isspam;
if (!defined $msgid) {
@msgid = $self->get_msgid($msg);
@msgid = ( $msg->generate_msgid(), $msg->get_msgid() );
}
while( $msgid = shift @msgid ) {
@ -671,7 +799,6 @@ sub learner_is_scan_available {
sub scan {
my ($self, $permsgstatus, $msg) = @_;
my $score;
return unless $self->{conf}->{use_learner};
@ -756,6 +883,7 @@ sub scan {
if (@pw_keys > N_SIGNIFICANT_TOKENS) { $#pw_keys = N_SIGNIFICANT_TOKENS - 1 }
my @sorted;
my $score;
foreach my $tok (@pw_keys) {
next if $tok_strength{$tok} <
$Mail::SpamAssassin::Bayes::Combine::MIN_PROB_STRENGTH;
@ -964,49 +1092,6 @@ sub learner_dump_database {
###########################################################################
# TODO: these are NOT public, but the test suite needs to call them.
sub get_msgid {
my ($self, $msg) = @_;
my @msgid;
my $msgid = $msg->get_header("Message-Id");
if (defined $msgid && $msgid ne '' && $msgid !~ /^\s*<\s*(?:\@sa_generated)?>.*$/) {
# remove \r and < and > prefix/suffixes
chomp $msgid;
$msgid =~ s/^<//; $msgid =~ s/>.*$//g;
push(@msgid, $msgid);
}
# Modified 2012-01-17 per bug 5185 to remove last received from msg_id calculation
# Use sha1_hex(Date: and top N bytes of body)
# where N is MIN(1024 bytes, 1/2 of body length)
#
my $date = $msg->get_header("Date");
$date = "None" if (!defined $date || $date eq ''); # No Date?
#Removed per bug 5185
#my @rcvd = $msg->get_header("Received");
#my $rcvd = $rcvd[$#rcvd];
#$rcvd = "None" if (!defined $rcvd || $rcvd eq ''); # No Received?
# Make a copy since pristine_body is a reference ...
my $body = join('', $msg->get_pristine_body());
if (length($body) > 64) { # Small Body?
my $keep = ( length $body > 2048 ? 1024 : int(length($body) / 2) );
substr($body, $keep) = '';
}
#Stripping all CR and LF so that testing midstream from MTA and post delivery don't
#generate different id's simply because of LF<->CR<->CRLF changes.
$body =~ s/[\r\n]//g;
unshift(@msgid, sha1_hex($date."\000".$body).'@sa_generated');
return wantarray ? @msgid : $msgid[0];
}
sub get_body_from_msg {
my ($self, $msg) = @_;
@ -1024,7 +1109,7 @@ sub get_body_from_msg {
if (!defined $msgdata) {
# why?!
warn "bayes: failed to get body for ".scalar($self->get_msgid($self->{msg}))."\n";
warn "bayes: failed to get body for ".scalar($self->{msg}->generate_msgid())."\n";
return { };
}
@ -1052,8 +1137,10 @@ sub _get_msgdata_from_permsgstatus {
# The calling functions expect a uniq'ed array of tokens ...
sub tokenize {
my ($self, $msg, $msgdata) = @_;
my $conf = $self->{conf};
my $t_src = $conf->{bayes_token_sources};
my $t_src = $self->{conf}->{bayes_token_sources};
$self->{stopword_cache} = ();
# visible tokens from the body
my @tokens_body;
@ -1117,6 +1204,8 @@ sub tokenize {
dbg("bayes: tokenized header: %d tokens", scalar @tokens_header);
}
delete $self->{stopword_cache};
# Go ahead and uniq the array, skip null tokens (can happen sometimes)
# generate an SHA1 hash and take the lower 40 bits as our token
my %tokens;
@ -1137,6 +1226,7 @@ sub _tokenize_line {
my $region = $_[3];
local ($_) = $_[1];
my $conf = $self->{conf};
my @rettokens;
# include quotes, .'s and -'s for URIs, and [$,]'s for Nigerian-scam strings,
@ -1174,7 +1264,7 @@ sub _tokenize_line {
# cleared, even if the source string has perl characters semantics !!!
# Is this really still desirable?
foreach my $token (split) {
TOKEN: foreach my $token (split) {
$token =~ s/^[-'"\.,]+//; # trim non-alphanum chars at start or end
$token =~ s/[-'"\.,]+$//; # so we don't get loads of '"foo' tokens
@ -1183,7 +1273,7 @@ sub _tokenize_line {
# tokens, so the SQL BayesStore returns undef. I really want a way
# of optimizing that out, but I haven't come up with anything yet.
#
next if ( defined $magic_re && $token =~ /$magic_re/ );
next if ( defined $magic_re && $token =~ /$magic_re/o );
# *do* keep 3-byte tokens; there's some solid signs in there
my $len = length($token);
@ -1192,8 +1282,24 @@ sub _tokenize_line {
# area, and it just slows us down to record them.
# See http://wiki.apache.org/spamassassin/BayesStopList for more info.
#
next if $len < 3 ||
($token =~ /^(?:a(?:ble|l(?:ready|l)|n[dy]|re)|b(?:ecause|oth)|c(?:an|ome)|e(?:ach|mail|ven)|f(?:ew|irst|or|rom)|give|h(?:a(?:ve|s)|ttp)|i(?:n(?:formation|to)|t\'s)|just|know|l(?:ike|o(?:ng|ok))|m(?:a(?:de|il(?:(?:ing|to))?|ke|ny)|o(?:re|st)|uch)|n(?:eed|o[tw]|umber)|o(?:ff|n(?:ly|e)|ut|wn)|p(?:eople|lace)|right|s(?:ame|ee|uch)|t(?:h(?:at|is|rough|e)|ime)|using|w(?:eb|h(?:ere|y)|ith(?:out)?|or(?:ld|k))|y(?:ears?|ou(?:(?:\'re|r))?))$/i);
next if $len < 3;
# check stopwords regexp if not cached
if (@{$conf->{bayes_stopword_languages}}) {
if (!exists $self->{stopword_cache}{$token}) {
foreach my $lang (@{$conf->{bayes_stopword_languages}}) {
if ($token =~ $self->{bayes_stopword}{$lang}) {
dbg("bayes: skipped token '$token' because it's in stopword list for language '$lang'");
$self->{stopword_cache}{$token} = 1;
next TOKEN;
}
}
$self->{stopword_cache}{$token} = 0;
} else {
# bail out if cached known
next if $self->{stopword_cache}{$token};
}
}
# are we in the body? If so, apply some body-specific breakouts
if ($region == 1 || $region == 2) {
@ -1212,7 +1318,7 @@ sub _tokenize_line {
# used as part of split tokens such as "HTo:D*net" indicating that
# the domain ".net" appeared in the To header.
#
if ($len > MAX_TOKEN_LENGTH && $token !~ /\*/) {
if ($len > $conf->{bayes_max_token_length} && index($token, '*') == -1) {
if (TOKENIZE_LONG_8BIT_SEQS_AS_UTF8_CHARS && $token =~ /[\x80-\xBF]{2}/) {
# Bug 7135
@ -1287,9 +1393,6 @@ sub _tokenize_headers {
my %parsed;
my %user_ignore;
$user_ignore{lc $_} = 1 for @{$self->{main}->{conf}->{bayes_ignore_headers}};
# get headers in array context
my @hdrs;
my @rcvdlines;
@ -1317,7 +1420,7 @@ sub _tokenize_headers {
# remove user-specified headers here, after Received, in case they
# want to ignore that too
next if exists $user_ignore{lc $hdr};
next if exists $self->{conf}->{bayes_ignore_header}->{lc $hdr};
# Prep the header value
$val ||= '';
@ -1344,22 +1447,38 @@ sub _tokenize_headers {
elsif ($hdr =~ /^${MARK_PRESENCE_ONLY_HDRS}$/i) {
$val = "1"; # just mark the presence, they create lots of hapaxen
}
elsif ($hdr =~ /^x-spam-relays-(?:external|internal|trusted|untrusted)$/) {
# remove redundant rdns helo ident envfrom intl auth msa words
$val =~ s/ [a-z]+=/ /g;
}
if (MAP_HEADERS_MID) {
if ($hdr =~ /^(?:In-Reply-To|References|Message-ID)$/i) {
if (exists $parsed{"*MI"}) {
$parsed{"*MI"} .= " ".$val;
} else {
$parsed{"*MI"} = $val;
}
}
}
if (MAP_HEADERS_FROMTOCC) {
if ($hdr =~ /^(?:From|To|Cc)$/i) {
if (exists $parsed{"*Ad"}) {
$parsed{"*Ad"} .= " ".$val;
} else {
$parsed{"*Ad"} = $val;
}
}
}
if (MAP_HEADERS_USERAGENT) {
if ($hdr =~ /^(?:X-Mailer|User-Agent)$/i) {
if (exists $parsed{"*UA"}) {
$parsed{"*UA"} .= " ".$val;
} else {
$parsed{"*UA"} = $val;
}
}
}
# replace hdr name with "compressed" version if possible
if (defined $HEADER_NAME_COMPRESSION{$hdr}) {
@ -1371,11 +1490,13 @@ sub _tokenize_headers {
} else {
$parsed{$hdr} = $val;
}
}
if (would_log('dbg', 'bayes') > 1) {
foreach my $hdr (sort keys %parsed) {
dbg("bayes: header tokens for $hdr = \"$parsed{$hdr}\"");
}
}
return %parsed;
}
@ -1393,7 +1514,7 @@ sub _pre_chew_content_type {
}
# stop-list words for Content-Type header: these wind up totally gray
$val =~ s/\b(?:text|charset)\b//;
$val =~ s/\b(?:text|charset)\b/ /g;
$val;
}
@ -1468,10 +1589,17 @@ sub _pre_chew_addr_header {
my ($self, $val) = @_;
local ($_);
my @addrs = $self->{main}->find_all_addrs_in_line ($val);
my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($val);
my @toks;
foreach (@addrs) {
push (@toks, $self->_tokenize_mail_addrs ($_));
foreach my $addr (@addrs) {
if (defined $addr->{phrase}) {
foreach (split(/\s+/, $addr->{phrase})) {
push @toks, "N*".$_; # Bug 6319
}
}
if (defined $addr->{address}) {
push @toks, $self->_tokenize_mail_addrs($addr->{address});
}
}
return join (' ', @toks);
}

View File

@ -39,16 +39,16 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("multipart_alternative_difference");
$self->register_eval_rule("multipart_alternative_difference_count");
$self->register_eval_rule("check_blank_line_ratio");
$self->register_eval_rule("tvd_vertical_words");
$self->register_eval_rule("check_stock_info");
$self->register_eval_rule("check_body_length");
$self->register_eval_rule("multipart_alternative_difference", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("multipart_alternative_difference_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_blank_line_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("tvd_vertical_words", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_stock_info", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_body_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("plaintext_body_length");
$self->register_eval_rule("plaintext_sig_length");
$self->register_eval_rule("plaintext_body_sig_ratio");
$self->register_eval_rule("plaintext_body_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("plaintext_sig_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("plaintext_body_sig_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -69,7 +69,7 @@ sub multipart_alternative_difference_count {
my ($self, $pms, $fulltext, $ratio, $minhtml) = @_;
$self->_multipart_alternative_difference($pms) unless (exists $pms->{madiff});
return 0 unless $pms->{madiff_html} > $minhtml;
return(($pms->{madiff_text} / $pms->{madiff_html}) > $ratio);
return (($pms->{madiff_text} / $pms->{madiff_html}) > $ratio);
}
sub _multipart_alternative_difference {
@ -81,7 +81,7 @@ sub _multipart_alternative_difference {
my $msg = $pms->{msg};
# Find all multipart/alternative parts in the message
my @ma = $msg->find_parts(qr@^multipart/alternative\b@i);
my @ma = $msg->find_parts(qr@^multipart/alternative\b@);
# If there are no multipart/alternative sections, skip this test.
return if (!@ma);
@ -104,7 +104,7 @@ sub _multipart_alternative_difference {
my %text;
# limit our search to text-based parts
my @txt = $part->find_parts(qr@^text\b@i);
my @txt = $part->find_parts(qr@^text\b@);
foreach my $text (@txt) {
# we only care about the rendered version of the part
my ($type, $rnd) = $text->rendered();
@ -123,7 +123,7 @@ sub _multipart_alternative_difference {
}
# If there are no words, mark if there's at least 1 image ...
if (!%html && exists $pms->{html}{inside}{img}) {
if (!%html && exists $text->{html_results}{inside}{img}) {
# Use "\n" as the mark since it can't ever occur normally
$html{"\n"}=1;
}
@ -222,7 +222,7 @@ sub tvd_vertical_words {
}
dbg("eval: tvd_vertical_words value: $pms->{tvd_vertical_words} / min: $min / max: $max - value must be >= min and < max");
return 1 if ($pms->{tvd_vertical_words} >= $min && $pms->{tvd_vertical_words} < $max);
return ($pms->{tvd_vertical_words} >= $min && $pms->{tvd_vertical_words} < $max);
}
sub check_stock_info {
@ -241,7 +241,7 @@ sub _check_stock_info {
$pms->{stock_info} = 0;
# Find all multipart/alternative parts in the message
my @parts = $pms->{msg}->find_parts(qr@^text/plain$@i);
my @parts = $pms->{msg}->find_parts(qr@^text/plain$@);
return if (!@parts);
# Go through each of the multipart parts
@ -360,11 +360,17 @@ sub _plaintext_body_sig_ratio {
# Find the last occurence of a signature delimiter and get the body and
# signature lengths.
my ($len_b, $len_s) = map { length } $text =~ /(^|.*\n)-- \n(.*?)$/s;
if (! defined $len_b) { # no sig marker, all body
$len_b = length $text;
$len_s = 0;
my $len_b = length($text);
my $len_s = 0;
while ($text =~ /^-- ?\r?$/mg) {
# ignore decoy marker at the end
next if ( length($text) - $+[0] <= 4 );
$len_b = $-[0];
$len_s = length($text) - $+[0];
}
$pms->{plaintext_body_sig_ratio}->{body_length} = $len_b;

View File

@ -119,6 +119,12 @@ sub extract_set {
my $nicepri = $pri; $nicepri =~ s/-/neg/g;
$self->extract_set_pri($conf, $test_set->{$pri}, $ruletype.'_'.$nicepri);
}
# Clear extract_hints tmpfile
if ($self->{tmpf}) {
unlink $self->{tmpf};
delete $self->{tmpf};
}
}
###########################################################################
@ -190,11 +196,13 @@ NEXT_RULE:
next NEXT_RULE;
}
# ignore ReplaceTags rules
my $is_a_replacetags_rule = $conf->{replace_rules}->{$name};
# ignore ReplaceTags rules, and regex capture template rules
my $is_a_replace_rule = $conf->{replace_rules}->{$name} ||
$conf->{capture_rules}->{$name} ||
$conf->{capture_template_rules}->{$name};
my ($minlen, $lossy, @bases);
if (!$is_a_replacetags_rule) {
if (!$is_a_replace_rule) {
eval { # catch die()s
my ($qr, $mods) = $self->simplify_and_qr_regexp($rule);
($lossy, @bases) = $self->extract_hints($rule, $qr, $mods);
@ -202,6 +210,7 @@ NEXT_RULE:
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
$eval_stat =~ s/ at .*//s;
dbg("zoom: giving up on regexp: $eval_stat");
};
@ -220,9 +229,9 @@ NEXT_RULE:
}
}
if ($is_a_replacetags_rule || !$minlen || !@bases) {
if ($is_a_replace_rule || !$minlen || !@bases) {
dbg("zoom: ignoring rule %s, %s", $name,
$is_a_replacetags_rule ? 'is a replace rule'
$is_a_replace_rule ? 'is a replace rule'
: !@bases ? 'no bases' : 'no minlen');
push @failed, { orig => $rule };
$cached->{rule_bases}->{$cachekey} = { };
@ -469,7 +478,7 @@ sub simplify_and_qr_regexp {
}
else {
die "case-i" if $rule =~ /\(\?i\)/;
die "case-i" if $mods =~ /i/;
die "case-i" if index($mods, 'i') >= 0;
# always case-i: /A(?i:ct) N(?i:ow)/ => /Act Now/
$rule =~ s/(?<!\\)\(\?i\:(.*?)\)/$1/gs and die "case-i";
@ -504,10 +513,7 @@ sub simplify_and_qr_regexp {
}
sub extract_hints {
my $self = shift;
my $rawrule = shift;
my $rule = shift;
my $mods = shift;
my ($self, $rawrule, $rule, $mods) = @_;
my $main = $self->{main};
my $orig = $rule;
@ -534,24 +540,30 @@ sub extract_hints {
# r? => (r|)
$rule =~ s/(?<!\\)(\w)\?/\($1\|\)/gs;
my ($tmpf, $tmpfh) = Mail::SpamAssassin::Util::secure_tmpfile();
# Create single tmpfile for extract_hints to use, instead of thousands
if (!$self->{tmpf}) {
($self->{tmpf}, my $tmpfh) = Mail::SpamAssassin::Util::secure_tmpfile();
$tmpfh or die "failed to create a temporary file";
untaint_var(\$tmpf);
close $tmpfh;
$self->{tmpf} = untaint_var($self->{tmpf});
}
open(my $tmpfh, '>'.$self->{tmpf})
or die "error opening $self->{tmpf}: $!";
binmode $tmpfh;
print $tmpfh "use bytes; m{" . $rule . "}" . $mods
or die "error writing to $tmpf: $!";
close $tmpfh or die "error closing $tmpf: $!";
or die "error writing to $self->{tmpf}: $!";
close $tmpfh or die "error closing $self->{tmpf}: $!";
my $perl = $self->get_perl();
$self->{perl} = $self->get_perl() if !exists $self->{perl};
local *IN;
open (IN, "$perl -c -Mre=debug $tmpf 2>&1 |")
or die "cannot run $perl: ".exit_status_str($?,$!);
open (IN, "$self->{perl} -c -Mre=debug $self->{tmpf} 2>&1 |")
or die "cannot run $self->{perl}: ".exit_status_str($?,$!);
my($inbuf,$nread,$fullstr); $fullstr = '';
while ( $nread=read(IN,$inbuf,16384) ) { $fullstr .= $inbuf }
defined $nread or die "error reading from pipe: $!";
unlink $tmpf or die "cannot unlink $tmpf: $!";
close IN or die "error closing pipe: $!";
defined $fullstr or warn "empty result from a pipe";
@ -1110,7 +1122,7 @@ sub fixup_re {
$output .= "\"$esc\"";
}
}
else {
elsif ($fixup_re_test) {
print "PRE: $pre\nTOK: $tok\n" or die "error writing: $!";
}
}
@ -1132,6 +1144,8 @@ sub fixup_re {
$output =~ s/\*\*BACKSLASH\*\*/\\\\/gs;
if ($fixup_re_test) { print "OUTPUT: $output\n" or die "error writing: $!" }
utf8::encode($output) if utf8::is_utf8($output); # force octets
return $output;
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -30,6 +30,12 @@ Taking into account signatures from any signing domains:
full DKIM_VALID_AU eval:check_dkim_valid_author_sig()
full DKIM_VALID_EF eval:check_dkim_valid_envelopefrom()
Taking into account ARC signatures (Authenticated Received Chain, RFC 8617)
from any signing domains:
full ARC_SIGNED eval:check_arc_signed()
full ARC_VALID eval:check_arc_valid()
Taking into account signatures from specified signing domains only:
(quotes may be omitted on domain names consisting only of letters, digits,
dots, and minus characters)
@ -127,6 +133,7 @@ package Mail::SpamAssassin::Plugin::DKIM;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Timeout;
use Mail::SpamAssassin::Util qw(idn_to_ascii);
use strict;
use warnings;
@ -145,19 +152,23 @@ sub new {
bless ($self, $class);
# signatures
$self->register_eval_rule("check_dkim_signed");
$self->register_eval_rule("check_dkim_valid");
$self->register_eval_rule("check_dkim_valid_author_sig");
$self->register_eval_rule("check_dkim_testing");
$self->register_eval_rule("check_dkim_valid_envelopefrom");
$self->register_eval_rule("check_dkim_signed", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_arc_signed", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_dkim_valid", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_arc_valid", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_dkim_valid_author_sig", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_dkim_testing", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dkim_valid_envelopefrom", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
# author domain signing practices
$self->register_eval_rule("check_dkim_adsp");
$self->register_eval_rule("check_dkim_dependable");
$self->register_eval_rule("check_dkim_adsp", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dkim_dependable", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
# whitelisting
$self->register_eval_rule("check_for_dkim_whitelist_from");
$self->register_eval_rule("check_for_def_dkim_whitelist_from");
# welcomelisting
$self->register_eval_rule("check_for_dkim_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_dkim_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); #Stub - Remove in SA 4.1
$self->register_eval_rule("check_for_def_dkim_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_def_dkim_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); #Stub - Remove in SA 4.1
# old names (aliases) for compatibility
$self->register_eval_rule("check_dkim_verified"); # = check_dkim_valid
@ -179,17 +190,19 @@ sub set_config {
=over 4
=item whitelist_from_dkim author@example.com [signing-domain]
=item welcomelist_from_dkim author@example.com [signing-domain]
Works similarly to whitelist_from, except that in addition to matching
Previously whitelist_from_dkim which will work interchangeably until 4.1.
Works similarly to welcomelist_from, except that in addition to matching
an author address (From) to the pattern in the first parameter, the message
must also carry a valid Domain Keys Identified Mail (DKIM) signature made by
a signing domain (SDID, i.e. the d= tag) that is acceptable to us.
Only one whitelist entry is allowed per line, as in C<whitelist_from_rcvd>.
Multiple C<whitelist_from_dkim> lines are allowed. File-glob style characters
Only one welcomelist entry is allowed per line, as in C<welcomelist_from_rcvd>.
Multiple C<welcomelist_from_dkim> lines are allowed. File-glob style characters
are allowed for the From address (the first parameter), just like with
C<whitelist_from_rcvd>.
C<welcomelist_from_rcvd>.
The second parameter (the signing-domain) does not accept full file-glob style
wildcards, although a simple '*.' (or just a '.') prefix to a domain name
@ -201,39 +214,43 @@ will be an Author Domain Signature (sometimes called first-party signature)
which is a signature where the signing domain (SDID) of a signature matches
the domain of the author's address (i.e. the address in a From header field).
Since this whitelist requires a DKIM check to be made, network tests must
Since this welcomelist requires a DKIM check to be made, network tests must
be enabled.
Examples of whitelisting based on an author domain signature (first-party):
Examples of welcomelisting based on an author domain signature (first-party):
whitelist_from_dkim joe@example.com
whitelist_from_dkim *@corp.example.com
whitelist_from_dkim *@*.example.com
welcomelist_from_dkim joe@example.com
welcomelist_from_dkim *@corp.example.com
welcomelist_from_dkim *@*.example.com
Examples of whitelisting based on third-party signatures:
Examples of welcomelisting based on third-party signatures:
whitelist_from_dkim jane@example.net example.org
whitelist_from_dkim rick@info.example.net example.net
whitelist_from_dkim *@info.example.net example.net
whitelist_from_dkim *@* mail7.remailer.example.com
whitelist_from_dkim *@* *.remailer.example.com
welcomelist_from_dkim jane@example.net example.org
welcomelist_from_dkim rick@info.example.net example.net
welcomelist_from_dkim *@info.example.net example.net
welcomelist_from_dkim *@* mail7.remailer.example.com
welcomelist_from_dkim *@* *.remailer.example.com
=item def_whitelist_from_dkim author@example.com [signing-domain]
=item def_welcomelist_from_dkim author@example.com [signing-domain]
Same as C<whitelist_from_dkim>, but used for the default whitelist entries
in the SpamAssassin distribution. The whitelist score is lower, because
Previously def_whitelist_from_dkim which will work interchangeably until 4.1.
Same as C<welcomelist_from_dkim>, but used for the default welcomelist entries
in the SpamAssassin distribution. The welcomelist score is lower, because
these are often targets for abuse of public mailers which sign their mail.
=item unwhitelist_from_dkim author@example.com [signing-domain]
=item unwelcomelist_from_dkim author@example.com [signing-domain]
Previously unwhitelist_from_dkim which will work interchangeably until 4.1.
Removes an email address with its corresponding signing-domain field
from def_whitelist_from_dkim and whitelist_from_dkim tables, if it exists.
Parameters to unwhitelist_from_dkim must exactly match the parameters of
a corresponding whitelist_from_dkim or def_whitelist_from_dkim config
from def_welcomelist_from_dkim and welcomelist_from_dkim tables, if it exists.
Parameters to unwelcomelist_from_dkim must exactly match the parameters of
a corresponding welcomelist_from_dkim or def_welcomelist_from_dkim config
option which created the entry, for it to be removed (a domain name is
matched case-insensitively); i.e. if a signing-domain parameter was
specified in a whitelisting command, it must also be specified in the
unwhitelisting command.
specified in a welcomelisting command, it must also be specified in the
unwelcomelisting command.
Useful for removing undesired default entries from a distributed configuration
by a local or site-specific configuration or by C<user_prefs>.
@ -375,7 +392,7 @@ Example:
=item dkim_minimum_key_bits n (default: 1024)
The smallest size of a signing key (in bits) for a valid signature to be
considered for whitelisting. Additionally, the eval function check_dkim_valid()
considered for welcomelisting. Additionally, the eval function check_dkim_valid()
will return false on short keys when called with explicitly listed domains,
and the eval function check_dkim_valid_author_sig() will return false on short
keys (regardless of its arguments). Setting the option to 0 disables a key
@ -389,11 +406,13 @@ prepend its own signature on a copy of some third party mail and re-send it,
which makes it no more trustworthy than without such signature. This is also
a reason for a rule DKIM_VALID to have a near-zero score, i.e. a rule hit
is only informational.
This option is evaluated on ARC signatures checks as well.
=cut
push (@cmds, {
setting => 'whitelist_from_dkim',
setting => 'welcomelist_from_dkim',
aliases => ['whitelist_from_dkim'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
@ -407,13 +426,14 @@ is only informational.
my $address = $1;
my $sdid = defined $2 ? $2 : ''; # empty implies author domain signature
$address =~ s/(\@[^@]*)\z/lc($1)/e; # lowercase the email address domain
$self->{parser}->add_to_addrlist_dkim('whitelist_from_dkim',
$self->{parser}->add_to_addrlist_dkim('welcomelist_from_dkim',
$address, lc $sdid);
}
});
push (@cmds, {
setting => 'def_whitelist_from_dkim',
setting => 'def_welcomelist_from_dkim',
aliases => ['def_whitelist_from_dkim'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
@ -427,13 +447,14 @@ is only informational.
my $address = $1;
my $sdid = defined $2 ? $2 : ''; # empty implies author domain signature
$address =~ s/(\@[^@]*)\z/lc($1)/e; # lowercase the email address domain
$self->{parser}->add_to_addrlist_dkim('def_whitelist_from_dkim',
$self->{parser}->add_to_addrlist_dkim('def_welcomelist_from_dkim',
$address, lc $sdid);
}
});
push (@cmds, {
setting => 'unwhitelist_from_dkim',
setting => 'unwelcomelist_from_dkim',
aliases => ['unwhitelist_from_dkim'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
@ -447,9 +468,9 @@ is only informational.
my $address = $1;
my $sdid = defined $2 ? $2 : ''; # empty implies author domain signature
$address =~ s/(\@[^@]*)\z/lc($1)/e; # lowercase the email address domain
$self->{parser}->remove_from_addrlist_dkim('whitelist_from_dkim',
$self->{parser}->remove_from_addrlist_dkim('welcomelist_from_dkim',
$address, lc $sdid);
$self->{parser}->remove_from_addrlist_dkim('def_whitelist_from_dkim',
$self->{parser}->remove_from_addrlist_dkim('def_welcomelist_from_dkim',
$address, lc $sdid);
}
});
@ -480,7 +501,7 @@ is only informational.
}
});
# minimal signing key size in bits that is acceptable for whitelisting
# minimal signing key size in bits that is acceptable for welcomelisting
push (@cmds, {
setting => 'dkim_minimum_key_bits',
default => 1024,
@ -530,6 +551,18 @@ sub check_dkim_signed {
return $result;
}
sub check_arc_signed {
my ($self, $pms, $full_ref, @acceptable_domains) = @_;
$self->_check_dkim_signature($pms) if !$pms->{arc_checked_signature};
my $result = 0;
if (!$pms->{arc_signed}) {
# don't bother
} elsif (!@acceptable_domains) {
$result = 1; # no additional constraints, any signing domain will do
}
return $result;
}
sub check_dkim_valid {
my ($self, $pms, $full_ref, @acceptable_domains) = @_;
$self->_check_dkim_signature($pms) if !$pms->{dkim_checked_signature};
@ -545,6 +578,19 @@ sub check_dkim_valid {
return $result;
}
sub check_arc_valid {
my ($self, $pms, $full_ref, @acceptable_domains) = @_;
$self->_check_dkim_signature($pms) if !$pms->{arc_checked_signature};
my $result = 0;
if (!$pms->{arc_valid}) {
# don't bother
} elsif (!@acceptable_domains) {
$result = 1; # no additional constraints, any signing domain will do,
# also any signing key size will do
}
return $result;
}
sub check_dkim_valid_author_sig {
my ($self, $pms, $full_ref, @acceptable_domains) = @_;
$self->_check_dkim_signature($pms) if !$pms->{dkim_checked_signature};
@ -560,9 +606,10 @@ sub check_dkim_valid_author_sig {
sub check_dkim_valid_envelopefrom {
my ($self, $pms, $full_ref) = @_;
my $result = 0;
my $envfrom=$self->{'main'}->{'registryboundaries'}->uri_to_domain($pms->get("EnvelopeFrom"));
my ($envfrom) = ($pms->get('EnvelopeFrom:addr')||'') =~ /\@(\S+)/;
# if no envelopeFrom, it cannot be valid
return $result if !$envfrom;
return $result if !defined $envfrom;
$envfrom = lc $envfrom;
$self->_check_dkim_signature($pms) if !$pms->{dkim_checked_signature};
if (!$pms->{dkim_valid}) {
# don't bother
@ -646,19 +693,21 @@ sub check_dkim_testing {
return $result;
}
sub check_for_dkim_whitelist_from {
sub check_for_dkim_welcomelist_from {
my ($self, $pms) = @_;
$self->_check_dkim_whitelist($pms) if !$pms->{whitelist_checked};
return $pms->{dkim_match_in_whitelist_from_dkim} ||
$pms->{dkim_match_in_whitelist_auth};
$self->_check_dkim_welcomelist($pms) if !$pms->{welcomelist_checked};
return ($pms->{dkim_match_in_welcomelist_from_dkim} ||
$pms->{dkim_match_in_welcomelist_auth}) ? 1 : 0;
}
*check_for_dkim_whitelist_from = \&check_for_dkim_welcomelist_from; # removed in 4.1
sub check_for_def_dkim_whitelist_from {
sub check_for_def_dkim_welcomelist_from {
my ($self, $pms) = @_;
$self->_check_dkim_whitelist($pms) if !$pms->{whitelist_checked};
return $pms->{dkim_match_in_def_whitelist_from_dkim} ||
$pms->{dkim_match_in_def_whitelist_auth};
$self->_check_dkim_welcomelist($pms) if !$pms->{welcomelist_checked};
return ($pms->{dkim_match_in_def_welcomelist_from_dkim} ||
$pms->{dkim_match_in_def_welcomelist_auth}) ? 1 : 0;
}
*check_for_def_dkim_whitelist_from = \&check_for_def_dkim_welcomelist_from; # removed in 4.1
# ---------------------------------------------------------------------------
@ -667,8 +716,7 @@ sub _dkim_load_modules {
if (!$self->{tried_loading}) {
$self->{service_available} = 0;
my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
$self->{main}->time_method("dkim_load_modules");
my $timemethod = $self->{main}->time_method("dkim_load_modules");
my $eval_stat;
eval {
# Have to do this so that RPM doesn't find these as required perl modules.
@ -686,6 +734,9 @@ sub _dkim_load_modules {
my $version = Mail::DKIM::Verifier->VERSION;
if (version->parse($version) >= version->parse(0.31)) {
dbg("dkim: using Mail::DKIM version $version");
} elsif (version->parse($version) < version->parse(0.50)) {
dbg("dkim: Mail::DKIM $version is older than 0.50 ".
"ARC support will not be available, suggested upgrade to 0.50 or later!");
} else {
info("dkim: Mail::DKIM $version is older than the required ".
"minimal version 0.31, suggested upgrade to 0.37 or later!");
@ -698,6 +749,18 @@ sub _dkim_load_modules {
eval { require Mail::DKIM::DkimPolicy } # ignoring status
}
}
eval {
# Have to do this so that RPM doesn't find these as required perl modules.
{ require Mail::DKIM::ARC::Verifier }
$self->{arc_available} = 1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
if (defined $eval_stat) {
dbg("dkim: cannot load Mail::DKIM::ARC module, DKIM::ARC checks disabled: %s",
$eval_stat);
}
$self->{arc_available} = 0;
};
}
return $self->{service_available};
}
@ -720,8 +783,8 @@ sub _check_dkim_signed_by {
next if $minimum_key_bits && $sig->{_spamassassin_key_size} &&
$sig->{_spamassassin_key_size} < $minimum_key_bits;
}
my $sdid = $sig->domain;
next if !defined $sdid; # a signature with a missing required tag 'd' ?
my ($sdid) = (defined $sig->identity)? $sig->identity =~ /\@(\S+)/ : ($sig->domain);
next if !defined $sdid; # a signature with a missing required tag 'd' or 'i' ?
$sdid = lc $sdid;
if ($must_be_author_domain_signature) {
next if !$pms->{dkim_author_domains}->{$sdid};
@ -765,9 +828,10 @@ sub _check_dkim_signature {
my ($self, $pms) = @_;
my $conf = $pms->{conf};
my($verifier, @signatures, @valid_signatures);
my($verifier, $arc_verifier, @signatures, @arc_signatures, @valid_signatures, @arc_valid_signatures);
$pms->{dkim_checked_signature} = 1; # has this sub already been invoked?
$pms->{arc_checked_signature} = 1; # has this sub already been invoked?
$pms->{dkim_signatures_ready} = 0; # have we obtained & verified signatures?
$pms->{dkim_signatures_dependable} = 0;
# dkim_signatures_dependable =
@ -776,16 +840,18 @@ sub _check_dkim_signature {
# (no signatures, or message was not truncated) )
$pms->{dkim_signatures} = \@signatures;
$pms->{dkim_valid_signatures} = \@valid_signatures;
$pms->{arc_signatures} = \@arc_signatures;
$pms->{arc_valid_signatures} = \@arc_valid_signatures;
$pms->{dkim_signed} = 0;
$pms->{arc_signed} = 0;
$pms->{dkim_valid} = 0;
$pms->{arc_valid} = 0;
$pms->{dkim_key_testing} = 0;
# the following hashes are keyed by a signing domain (SDID):
$pms->{dkim_author_sig_tempfailed} = {}; # DNS timeout verifying author sign.
$pms->{dkim_has_valid_author_sig} = {}; # a valid author domain signature
$pms->{dkim_has_any_author_sig} = {}; # valid or invalid author domain sign.
$self->_get_authors($pms) if !$pms->{dkim_author_addresses};
my $suppl_attrib = $pms->{msg}->{suppl_attrib};
if (defined $suppl_attrib && exists $suppl_attrib->{dkim_signatures}) {
# caller of SpamAssassin already supplied DKIM signature objects
@ -793,48 +859,88 @@ sub _check_dkim_signature {
@signatures = @$provided_signatures if ref $provided_signatures;
$pms->{dkim_signatures_ready} = 1;
$pms->{dkim_signatures_dependable} = 1;
dbg("dkim: signatures provided by the caller, %d signatures",
dbg("dkim: DKIM signatures provided by the caller, %d signatures",
scalar(@signatures));
}
if (defined $suppl_attrib && exists $suppl_attrib->{arc_signatures}) {
# caller of SpamAssassin already supplied ARC signature objects
my $provided_arc_signatures = $suppl_attrib->{arc_signatures};
@arc_signatures = @$provided_arc_signatures if ref $provided_arc_signatures;
$pms->{arc_signatures_ready} = 1;
$pms->{arc_signatures_dependable} = 1;
dbg("dkim: ARC signatures provided by the caller, %d signatures",
scalar(@arc_signatures));
}
if ($pms->{dkim_signatures_ready}) {
if ($pms->{dkim_signatures_ready} or $pms->{arc_signatures_ready}) {
# signatures already available and verified
_check_valid_signature($self, $pms, $verifier, 'DKIM', \@signatures) if $self->{service_available};
_check_valid_signature($self, $pms, $arc_verifier, 'ARC', \@arc_signatures) if $self->{arc_available};
} elsif (!$pms->is_dns_available()) {
dbg("dkim: signature verification disabled, DNS resolving not available");
} elsif (!$self->_dkim_load_modules()) {
# Mail::DKIM module not available
} else {
# signature objects not provided by the caller, must verify for ourselves
my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
$self->{main}->time_method("check_dkim_signature");
use version 0.77;
if (version->parse(Mail::DKIM::Verifier->VERSION) >= version->parse(0.40)) {
my $timemethod = $self->{main}->time_method("check_dkim_signature");
if (Mail::DKIM::Verifier->VERSION >= 0.40) {
my $edns = $conf->{dns_options}->{edns};
if ($edns && $edns >= 1024) {
# Let Mail::DKIM use our interface to Net::DNS::Resolver.
# Only do so if EDNS0 provides a reasonably-sized UDP payload size,
# as our interface does not provide a DNS fallback to TCP, unlike
# the Net::DNS::Resolver::send which does provide it.
# See also Bug 7265 regarding a choice of a resolver.
# my $res = $self->{main}->{resolver}->get_resolver;
my $res = $self->{main}->{resolver};
dbg("dkim: providing our own resolver: %s", ref $res);
Mail::DKIM::DNS::resolver($res);
}
}
$verifier = Mail::DKIM::Verifier->new;
if (!$verifier) {
dbg("dkim: cannot create Mail::DKIM::Verifier object");
return;
$verifier = Mail::DKIM::Verifier->new if $self->{service_available};
_check_signature($self, $pms, $verifier, 'DKIM', \@signatures) if $self->{service_available};
$arc_verifier = Mail::DKIM::ARC::Verifier->new if $self->{arc_available};
_check_signature($self, $pms, $arc_verifier, 'ARC', \@arc_signatures) if $self->{arc_available};
}
}
sub _check_signature {
my($self, $pms, $verifier, $type, $signatures) = @_;
my $sig_type = lc $type;
$self->_get_authors($pms) if !$pms->{"${sig_type}_author_addresses"};
my(@valid_signatures);
my $conf = $pms->{conf};
if (!$verifier) {
if ($type eq 'DKIM') {
dbg("dkim: cannot create Mail::DKIM::Verifier object");
} elsif ($type eq 'ARC') {
dbg("dkim: cannot create Mail::DKIM::ARC::Verifier object");
}
return;
} else {
if ($type eq 'DKIM') {
$pms->{dkim_verifier} = $verifier;
#
} elsif ($type eq 'ARC') {
$pms->{arc_verifier} = $verifier;
}
}
# feed content of a message into verifier, using \r\n endings,
# required by Mail::DKIM API (see bug 5300)
# note: bug 5179 comment 28: perl does silly things on non-Unix platforms
# unless we use \015\012 instead of \r\n
eval {
my $str = $pms->{msg}->get_pristine();
$str =~ s/\r?\n/\015\012/sg; # ensure \015\012 ending
if ($pms->{msg}->{line_ending} eq "\015\012") {
# message already CRLF, just feed it
$verifier->PRINT($str);
} else {
# feeding large chunk to Mail::DKIM is _much_ faster than line-by-line
$str =~ s/\012/\015\012/gs; # LF -> CRLF
$verifier->PRINT($str);
undef $str;
}
1;
} or do { # intercept die() exceptions and render safe
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
@ -847,7 +953,7 @@ sub _check_dkim_signature {
{ secs => $timeout, deadline => $pms->{master_deadline} });
my $err = $timer->run_and_catch(sub {
dbg("dkim: performing public key lookup and signature verification");
dbg("dkim: performing public $type key lookup and signature verification");
$verifier->CLOSE(); # the action happens here
# currently SpamAssassin's parsing is better than Mail::Address parsing,
@ -856,8 +962,17 @@ sub _check_dkim_signature {
# versions before 0.29 only provided a public interface to fetch one
# signature, newer versions allow access to all signatures of a message
@signatures = $verifier->UNIVERSAL::can("signatures") ?
@$signatures = $verifier->UNIVERSAL::can("signatures") ?
$verifier->signatures : $verifier->signature;
if (would_log("dbg","dkim")) {
foreach my $signature (@$signatures) {
dbg("dkim: $type signature i=%s d=%s",
map(!defined $_ ? '(undef)' : $_,
$signature->identity, $signature->domain
)
);
}
}
});
if ($timer->timed_out()) {
dbg("dkim: public key lookup or verification timed out after %s s",
@ -867,23 +982,43 @@ sub _check_dkim_signature {
} elsif ($err) {
chomp $err;
dbg("dkim: public key lookup or verification failed: $err");
dbg("dkim: $type public key lookup or verification failed: $err");
}
if ($type eq 'DKIM') {
$pms->{dkim_signatures_ready} = 1;
if (!@signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
if (!@$signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
$pms->{dkim_signatures_dependable} = 1;
}
_check_valid_signature($self, $pms, $verifier, 'DKIM', \@$signatures) if $self->{service_available};
} elsif ($type eq 'ARC') {
$pms->{arc_signatures_ready} = 1;
if (!@$signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
$pms->{arc_signatures_dependable} = 1;
}
_check_valid_signature($self, $pms, $verifier, 'ARC', \@$signatures) if $self->{arc_available};
}
}
if ($pms->{dkim_signatures_ready}) {
sub _check_valid_signature {
my($self, $pms, $verifier, $type, $signatures) = @_;
my $sig_type = lc $type;
$self->_get_authors($pms) if !$pms->{"${sig_type}_author_addresses"};
my(@valid_signatures);
my $conf = $pms->{conf};
# DKIM signatures check
if ($pms->{"${sig_type}_signatures_ready"}) {
my $sig_result_supported;
# dkim_minimum_key_bits is evaluated for ARC signatures as well
my $minimum_key_bits = $conf->{dkim_minimum_key_bits};
foreach my $signature (@signatures) {
foreach my $signature (@$signatures) {
# old versions of Mail::DKIM would give undef for an invalid signature
next if !defined $signature;
next if !$signature->selector; # empty selector
$sig_result_supported = $signature->UNIVERSAL::can("result_detail");
# test for empty selector (must not treat a selector "0" as missing!)
next if !defined $signature->selector || $signature->selector eq "";
my($info, $valid, $expired);
$valid =
($sig_result_supported ? $signature : $verifier)->result eq 'pass';
@ -904,23 +1039,24 @@ sub _check_dkim_signature {
push(@valid_signatures, $signature) if $valid && !$expired;
# check if we have a potential Author Domain Signature, valid or not
my $d = $signature->domain;
my ($d) = (defined $signature->identity)? $signature->identity =~ /\@(\S+)/ : ($signature->domain);
if (!defined $d) {
# can be undefined on a broken signature with missing required tags
} else {
$d = lc $d;
if ($pms->{dkim_author_domains}->{$d}) { # SDID matches author domain
$pms->{dkim_has_any_author_sig}->{$d} = 1;
if ($pms->{"${sig_type}_author_domains"}->{$d}) { # SDID matches author domain
$pms->{"${sig_type}_has_any_author_sig"}->{$d} = 1;
if ($valid && !$expired &&
$key_size && $key_size >= $minimum_key_bits) {
$pms->{dkim_has_valid_author_sig}->{$d} = 1;
$pms->{"${sig_type}_has_valid_author_sig"}->{$d} = 1;
} elsif ( ($sig_result_supported ? $signature
: $verifier)->result_detail
=~ /\b(?:timed out|SERVFAIL)\b/i) {
$pms->{dkim_author_sig_tempfailed}->{$d} = 1;
$pms->{"${sig_type}_author_sig_tempfailed"}->{$d} = 1;
}
}
}
if ($type eq 'DKIM') {
if (would_log("dbg","dkim")) {
dbg("dkim: %s %s, i=%s, d=%s, s=%s, a=%s, c=%s, %s, %s, %s",
$info,
@ -935,14 +1071,28 @@ sub _check_dkim_signature {
: 'does not match author domain',
);
}
} elsif ($type eq 'ARC') {
if (would_log("dbg","dkim")) {
dbg("dkim: %s %s, i=%s, d=%s, s=%s, a=%s, c=%s, %s, %s, %s",
$info,
$type,
map(!defined $_ ? '(undef)' : $_,
$signature->identity, $d, $signature->selector,
$signature->algorithm, scalar($signature->canonicalization),
$key_size ? "key_bits=$key_size" : "unknown key size",
($sig_result_supported ? $signature : $verifier)->result ),
defined $d && $pms->{arc_author_domains}->{$d}
? 'matches author domain'
: 'does not match author domain',
);
}
}
}
if (@valid_signatures) {
if ($type eq 'DKIM') {
$pms->{dkim_signed} = 1;
$pms->{dkim_valid} = 1;
# let the result stand out more clearly in the log, use uppercase
my $sig = $valid_signatures[0];
my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
dbg("dkim: signature verification result: %s", uc($sig_res));
# supply values for both tags
my(%seen1, %seen2, %seen3, @identity_list, @domain_list, @selector_list);
@ -958,14 +1108,27 @@ sub _check_dkim_signature {
@domain_list == 1 ? $domain_list[0] : \@domain_list);
$pms->set_tag('DKIMSELECTOR',
@selector_list == 1 ? $selector_list[0] : \@selector_list);
} elsif (@signatures) {
} elsif ($type eq 'ARC') {
$pms->{arc_signed} = 1;
$pms->{arc_valid} = 1;
}
# let the result stand out more clearly in the log, use uppercase
my $sig = $valid_signatures[0];
my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
dbg("dkim: $type signature verification result: %s", uc($sig_res));
} elsif (@$signatures) {
if ($type eq 'DKIM') {
$pms->{dkim_signed} = 1;
my $sig = $signatures[0];
my $sig_res =
($sig_result_supported && $sig ? $sig : $verifier)->result_detail;
dbg("dkim: signature verification result: %s", uc($sig_res));
} elsif ($type eq 'ARC') {
$pms->{arc_signed} = 1;
}
my $sig = @$signatures[0];
my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
dbg("dkim: $type signature verification result: %s", uc($sig_res));
} else {
dbg("dkim: signature verification result: none");
dbg("dkim: $type signature verification result: none");
}
}
}
@ -1064,8 +1227,7 @@ sub _check_dkim_adsp {
dbg("dkim: adsp not retrieved, module Mail::DKIM not available");
} else { # do the ADSP DNS lookup
my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
$self->{main}->time_method("check_dkim_adsp");
my $timemethod = $self->{main}->time_method("check_dkim_adsp");
my $practices; # author domain signing practices object
my $timeout = $pms->{conf}->{dkim_timeout};
@ -1074,12 +1236,13 @@ sub _check_dkim_adsp {
my $err = $timer->run_and_catch(sub {
eval {
if (Mail::DKIM::AuthorDomainPolicy->UNIVERSAL::can("fetch")) {
my $author_domain_ace = idn_to_ascii($author_domain);
dbg("dkim: adsp: performing lookup on _adsp._domainkey.%s",
$author_domain);
$author_domain_ace);
# get our Net::DNS::Resolver object
my $res = $self->{main}->{resolver}->get_resolver;
$practices = Mail::DKIM::AuthorDomainPolicy->fetch(
Protocol => "dns", Domain => $author_domain,
Protocol => "dns", Domain => $author_domain_ace,
DnsResolver => $res);
}
1;
@ -1128,36 +1291,36 @@ sub _check_dkim_adsp {
}
}
sub _check_dkim_whitelist {
sub _check_dkim_welcomelist {
my ($self, $pms) = @_;
$pms->{whitelist_checked} = 1;
$pms->{welcomelist_checked} = 1;
$self->_get_authors($pms) if !$pms->{dkim_author_addresses};
my $authors_str = join(", ", @{$pms->{dkim_author_addresses}});
if ($authors_str eq '') {
dbg("dkim: check_dkim_whitelist: could not find author address");
dbg("dkim: check_dkim_weclomelist: could not find author address");
return;
}
# collect whitelist entries matching the author from all lists
# collect welcomelist entries matching the author from all lists
my @acceptable_sdid_tuples;
$self->_wlcheck_acceptable_signature($pms, \@acceptable_sdid_tuples,
'def_whitelist_from_dkim');
'def_welcomelist_from_dkim');
$self->_wlcheck_author_signature($pms, \@acceptable_sdid_tuples,
'def_whitelist_auth');
'def_welcomelist_auth');
$self->_wlcheck_acceptable_signature($pms, \@acceptable_sdid_tuples,
'whitelist_from_dkim');
'welcomelist_from_dkim');
$self->_wlcheck_author_signature($pms, \@acceptable_sdid_tuples,
'whitelist_auth');
'welcomelist_auth');
if (!@acceptable_sdid_tuples) {
dbg("dkim: no wl entries match author %s, no need to verify sigs",
$authors_str);
return;
}
# if the message doesn't pass DKIM validation, it can't pass DKIM whitelist
# if the message doesn't pass DKIM validation, it can't pass DKIM welcomelist
# trigger a DKIM check;
# continue if one or more signatures are valid or we want the debug info
@ -1177,38 +1340,38 @@ sub _check_dkim_whitelist {
}
}
if (@valid) {
dbg("dkim: author %s, WHITELISTED by %s",
dbg("dkim: author %s, WELCOMELISTED by %s",
$authors_str, join(", ",@valid));
} elsif (@fail) {
dbg("dkim: author %s, found in %s BUT IGNORED",
$authors_str, join(", ",@fail));
} else {
dbg("dkim: author %s, not in any dkim whitelist", $authors_str);
dbg("dkim: author %s, not in any dkim welcomelist", $authors_str);
}
}
# check for verifier-acceptable signatures; an empty (or undefined) signing
# domain in a whitelist implies checking for an Author Domain Signature
# domain in a welcomelist implies checking for an Author Domain Signature
#
sub _wlcheck_acceptable_signature {
my ($self, $pms, $acceptable_sdid_tuples_ref, $wl) = @_;
my $wl_ref = $pms->{conf}->{$wl};
foreach my $author (@{$pms->{dkim_author_addresses}}) {
foreach my $white_addr (keys %$wl_ref) {
my $wl_addr_ref = $wl_ref->{$white_addr};
my $re = qr/$wl_addr_ref->{re}/i;
# dbg("dkim: WL %s %s, d: %s", $wl, $white_addr,
my $author_lc = lc($author);
foreach my $welcome_addr (keys %$wl_ref) {
my $wl_addr_ref = $wl_ref->{$welcome_addr};
# dbg("dkim: WL %s %s, d: %s", $wl, $welcome_addr,
# join(", ", map { $_ eq '' ? "''" : $_ } @{$wl_addr_ref->{domain}}));
if ($author =~ $re) {
if ($author_lc =~ /$wl_addr_ref->{re}/) {
foreach my $sdid (@{$wl_addr_ref->{domain}}) {
push(@$acceptable_sdid_tuples_ref, [$author,$sdid,$wl,$re]);
push(@$acceptable_sdid_tuples_ref, [$author,$sdid,$wl,$welcome_addr]);
}
}
}
}
}
# use a traditional whitelist_from -style addrlist, the only acceptable DKIM
# use a traditional welcomelist_from -style addrlist, the only acceptable DKIM
# signature is an Author Domain Signature. Note: don't pre-parse and store
# domains; that's inefficient memory-wise and only saves one m//
#
@ -1216,11 +1379,11 @@ sub _wlcheck_author_signature {
my ($self, $pms, $acceptable_sdid_tuples_ref, $wl) = @_;
my $wl_ref = $pms->{conf}->{$wl};
foreach my $author (@{$pms->{dkim_author_addresses}}) {
foreach my $white_addr (keys %$wl_ref) {
my $re = qr/$wl_ref->{$white_addr}/i;
# dbg("dkim: WL %s %s", $wl, $white_addr);
if ($author =~ $re) {
push(@$acceptable_sdid_tuples_ref, [$author,undef,$wl,$re]);
my $author_lc = lc($author);
foreach my $welcome_addr (keys %$wl_ref) {
# dbg("dkim: WL %s %s", $wl, $welcome_addr);
if ($author_lc =~ /$wl_ref->{$welcome_addr}/) {
push(@$acceptable_sdid_tuples_ref, [$author,undef,$wl,$welcome_addr]);
}
}
}
@ -1238,9 +1401,10 @@ sub _wlcheck_list {
foreach my $signature (@{$pms->{dkim_signatures}}) {
# old versions of Mail::DKIM would give undef for an invalid signature
next if !defined $signature;
next if !$signature->selector; # empty selector
my $sig_result_supported = $signature->UNIVERSAL::can("result_detail");
# test for empty selector (must not treat a selector "0" as missing!)
next if !defined $signature->selector || $signature->selector eq "";
my($info, $valid, $expired, $key_size_weak);
$valid =
($sig_result_supported ? $signature : $verifier)->result eq 'pass';
@ -1256,13 +1420,13 @@ sub _wlcheck_list {
}
}
my $sdid = $signature->domain;
my ($sdid) = (defined $signature->identity)? $signature->identity =~ /\@(\S+)/ : ($signature->domain);
$sdid = lc $sdid if defined $sdid;
my %tried_authors;
foreach my $entry (@$acceptable_sdid_tuples_ref) {
my($author, $acceptable_sdid, $wl, $re) = @$entry;
# $re and $wl are here for logging purposes only, $re already checked.
my($author, $acceptable_sdid, $wl, $welcome_addr) = @$entry;
# $welcome_addr and $wl are here for logging purposes only, already checked.
# The $acceptable_sdid is a verifier-acceptable signing domain
# identifier (to be matched against a 'd' tag in signatures).
# When $acceptable_sdid is undef or an empty string it implies
@ -1274,7 +1438,7 @@ sub _wlcheck_list {
my $matches = 0;
if (!defined $sdid) {
# don't bother, invalid signature with a missing 'd' tag
# don't bother, invalid signature with a missing 'd' or 'i' tag
} elsif (!defined $acceptable_sdid || $acceptable_sdid eq '') {
# An "Author Domain Signature" (sometimes called a first-party
@ -1287,7 +1451,7 @@ sub _wlcheck_list {
$matches = 1 if $sdid eq $author_domain;
} else { # checking for verifier-acceptable signature
# The second argument to a 'whitelist_from_dkim' option is now (since
# The second argument to a 'welcomelist_from_dkim' option is now (since
# version 3.3.0) supposed to be a signing domain (SDID), no longer an
# identity (AUID). Nevertheless, be prepared to accept the full e-mail
# address there for compatibility, and just ignore its local-part.
@ -1303,17 +1467,17 @@ sub _wlcheck_list {
if (would_log("dbg","dkim")) {
if ($sdid eq $author_domain) {
dbg("dkim: %s author domain signature by %s, MATCHES %s %s",
$info, $sdid, $wl, $re);
$info, $sdid, $wl, $welcome_addr);
} else {
dbg("dkim: %s third-party signature by %s, author domain %s, ".
"MATCHES %s %s", $info, $sdid, $author_domain, $wl, $re);
"MATCHES %s %s", $info, $sdid, $author_domain, $wl, $welcome_addr);
}
}
# a defined value indicates at least a match, not necessarily valid
# (this complication servers to preserve logging compatibility)
$any_match_by_wl{$wl} = '' if !exists $any_match_by_wl{$wl};
}
# only valid signature can cause whitelisting
# only valid signature can cause welcomelisting
$matches = 0 if !$valid || $expired || $key_size_weak;
if ($matches) {
@ -1328,4 +1492,7 @@ sub _wlcheck_list {
return ($any_match_at_all, \%any_match_by_wl);
}
# Version features
sub has_arc { 1 }
1;

View File

@ -0,0 +1,360 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
#
# Author: Giovanni Bechis <gbechis@apache.org>
=head1 NAME
Mail::SpamAssassin::Plugin::DMARC - check DMARC policy
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::DMARC
ifplugin Mail::SpamAssassin::Plugin::DMARC
header DMARC_PASS eval:check_dmarc_pass()
describe DMARC_PASS DMARC pass policy
tflags DMARC_PASS net nice
score DMARC_PASS -0.001
header DMARC_REJECT eval:check_dmarc_reject()
describe DMARC_REJECT DMARC reject policy
tflags DMARC_REJECT net
score DMARC_REJECT 0.001
header DMARC_QUAR eval:check_dmarc_quarantine()
describe DMARC_QUAR DMARC quarantine policy
tflags DMARC_QUAR net
score DMARC_QUAR 0.001
header DMARC_NONE eval:check_dmarc_none()
describe DMARC_NONE DMARC none policy
tflags DMARC_NONE net
score DMARC_NONE 0.001
header DMARC_MISSING eval:check_dmarc_missing()
describe DMARC_MISSING Missing DMARC policy
tflags DMARC_MISSING net
score DMARC_MISSING 0.001
endif
=head1 DESCRIPTION
This plugin checks if emails match DMARC policy, the plugin needs both DKIM
and SPF plugins enabled.
=cut
package Mail::SpamAssassin::Plugin::DMARC;
use strict;
use warnings;
use re 'taint';
my $VERSION = 0.2;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub dbg { my $msg = shift; Mail::SpamAssassin::Logger::dbg("DMARC: $msg", @_); }
sub info { my $msg = shift; Mail::SpamAssassin::Logger::info("DMARC: $msg", @_); }
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->set_config($mailsa->{conf});
$self->register_eval_rule("check_dmarc_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dmarc_reject", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dmarc_quarantine", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dmarc_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_dmarc_missing", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
=over 4
=item dmarc_save_reports ( 0 | 1 ) (default: 0)
Store DMARC reports using Mail::DMARC::Store, mail-dmarc.ini must be configured to save and send DMARC reports.
=back
=cut
push(@cmds, {
setting => 'dmarc_save_reports',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
$conf->{parser}->register_commands(\@cmds);
}
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
# Force waiting of SPF and DKIM results
$pms->{dmarc_async_queue} = [];
}
sub _check_eval {
my ($self, $pms, $result) = @_;
if (exists $pms->{dmarc_async_queue}) {
my $rulename = $pms->get_current_eval_rule_name();
push @{$pms->{dmarc_async_queue}}, sub {
if ($result->()) {
$pms->got_hit($rulename, '', ruletype => 'header');
} else {
$pms->rule_ready($rulename);
}
};
return; # return undef for async status
}
$self->_check_dmarc($pms);
# make sure not to return undef, as this is not async anymore
return $result->() || 0;
}
sub check_dmarc_pass {
my ($self, $pms, $name) = @_;
my $result = sub {
defined $pms->{dmarc_result} &&
$pms->{dmarc_result} eq 'pass' &&
$pms->{dmarc_policy} ne 'no policy available';
};
return $self->_check_eval($pms, $result);
}
sub check_dmarc_reject {
my ($self, $pms, $name) = @_;
my $result = sub {
defined $pms->{dmarc_result} &&
$pms->{dmarc_result} eq 'fail' &&
$pms->{dmarc_policy} eq 'reject';
};
return $self->_check_eval($pms, $result);
}
sub check_dmarc_quarantine {
my ($self, $pms, $name) = @_;
my $result = sub {
defined $pms->{dmarc_result} &&
$pms->{dmarc_result} eq 'fail' &&
$pms->{dmarc_policy} eq 'quarantine';
};
return $self->_check_eval($pms, $result);
}
sub check_dmarc_none {
my ($self, $pms, $name) = @_;
my $result = sub {
defined $pms->{dmarc_result} &&
$pms->{dmarc_result} eq 'fail' &&
$pms->{dmarc_policy} eq 'none';
};
return $self->_check_eval($pms, $result);
}
sub check_dmarc_missing {
my ($self, $pms, $name) = @_;
my $result = sub {
defined $pms->{dmarc_result} &&
$pms->{dmarc_policy} eq 'no policy available';
};
return $self->_check_eval($pms, $result);
}
sub check_tick {
my ($self, $opts) = @_;
$self->_check_async_queue($opts->{permsgstatus});
}
sub check_cleanup {
my ($self, $opts) = @_;
# Finish it whether SPF and DKIM is ready or not
$self->_check_async_queue($opts->{permsgstatus}, 1);
}
sub _check_async_queue {
my ($self, $pms, $finish) = @_;
return unless exists $pms->{dmarc_async_queue};
# Check if SPF or DKIM is ready
if ($finish || ($pms->{spf_checked} && $pms->{dkim_checked_signature})) {
$self->_check_dmarc($pms);
$_->() foreach (@{$pms->{dmarc_async_queue}});
# No more async queueing needed. If any evals are called later, they
# will act on the results directly.
delete $pms->{dmarc_async_queue};
}
}
sub _check_dmarc {
my ($self, $pms, $name) = @_;
return unless $pms->is_dns_available();
# Load DMARC module
if (!exists $self->{has_mail_dmarc}) {
my $eval_stat;
eval {
require Mail::DMARC::PurePerl;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
};
if (!defined($eval_stat)) {
dbg("using Mail::DMARC::PurePerl for DMARC checks");
$self->{has_mail_dmarc} = 1;
} else {
dbg("cannot load Mail::DMARC::PurePerl: module: $eval_stat");
dbg("Mail::DMARC::PurePerl is required for DMARC checks, DMARC checks disabled");
$self->{has_mail_dmarc} = undef;
}
}
return if !$self->{has_mail_dmarc};
return if $pms->{dmarc_checked};
$pms->{dmarc_checked} = 1;
my $lasthop = $pms->{relays_external}->[0];
if (!defined $lasthop) {
dbg("no external relay found, skipping DMARC check");
return;
}
my $from_addr = ($pms->get('From:first:addr'))[0];
return if not defined $from_addr;
return if index($from_addr, '@') == -1;
my $mfrom_domain = ($pms->get('EnvelopeFrom:first:addr:host'))[0];
if (!defined $mfrom_domain) {
$mfrom_domain = ($pms->get('From:first:addr:domain'))[0];
return if !defined $mfrom_domain;
dbg("EnvelopeFrom header not found, using From");
}
my $spf_status = 'none';
if ($pms->{spf_pass}) { $spf_status = 'pass'; }
elsif ($pms->{spf_fail}) { $spf_status = 'fail'; }
elsif ($pms->{spf_permerror}) { $spf_status = 'fail'; }
elsif ($pms->{spf_none}) { $spf_status = 'fail'; }
elsif ($pms->{spf_neutral}) { $spf_status = 'neutral'; }
elsif ($pms->{spf_softfail}) { $spf_status = 'softfail'; }
my $spf_helo_status = 'none';
if ($pms->{spf_helo_pass}) { $spf_helo_status = 'pass'; }
elsif ($pms->{spf_helo_fail}) { $spf_helo_status = 'fail'; }
elsif ($pms->{spf_helo_permerror}) { $spf_helo_status = 'fail'; }
elsif ($pms->{spf_helo_none}) { $spf_helo_status = 'fail'; }
elsif ($pms->{spf_helo_neutral}) { $spf_helo_status = 'neutral'; }
elsif ($pms->{spf_helo_softfail}) { $spf_helo_status = 'softfail'; }
my $dmarc = Mail::DMARC::PurePerl->new();
$dmarc->source_ip($lasthop->{ip});
$dmarc->header_from_raw($from_addr);
my $suppl_attrib = $pms->{msg}->{suppl_attrib};
if (defined $suppl_attrib && exists $suppl_attrib->{dkim_signatures}) {
my $dkim_signatures = $suppl_attrib->{dkim_signatures};
foreach my $signature ( @$dkim_signatures ) {
$dmarc->dkim( domain => $signature->domain, result => $signature->result );
dbg("DKIM result for domain " . $signature->domain . ": " . $signature->result);
}
} else {
$dmarc->dkim($pms->{dkim_verifier}) if (ref($pms->{dkim_verifier}));
}
my $result;
eval {
$dmarc->spf([
{
scope => 'mfrom',
domain => $mfrom_domain,
result => $spf_status,
},
{
scope => 'helo',
domain => $lasthop->{lc_helo},
result => $spf_helo_status,
},
]);
$result = $dmarc->validate();
};
if ($@) {
dbg("error while evaluating domain $mfrom_domain: $@");
return;
}
if (defined($pms->{dmarc_result} = $result->result)) {
if ($pms->{conf}->{dmarc_save_reports}) {
my $rua = eval { $result->published()->rua(); };
if (defined $rua && index($rua, 'mailto:') >= 0) {
eval { $dmarc->save_aggregate(); };
if ($@) {
info("report could not be saved: $@");
} else {
dbg("report will be sent to $rua");
}
}
}
if (defined $result->reason->[0]{comment} &&
$result->reason->[0]{comment} eq 'too many policies') {
dbg("result: no policy available (too many policies)");
$pms->{dmarc_policy} = 'no policy available';
} elsif ($result->result eq 'pass') {
dbg("result: pass");
$pms->{dmarc_policy} = $result->published->p;
} elsif ($result->result ne 'none') {
dbg("result: $result->{result}, disposition: $result->{disposition}, dkim: $result->{dkim}, spf: $result->{spf} (spf: $spf_status, spf_helo: $spf_helo_status)");
$pms->{dmarc_policy} = $result->disposition;
} else {
dbg("result: no policy available");
$pms->{dmarc_policy} = 'no policy available';
}
}
}
1;

View File

@ -45,7 +45,7 @@ package Mail::SpamAssassin::Plugin::DNSEval;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use Mail::SpamAssassin::Util qw(reverse_ip_address is_fqdn_valid);
use Mail::SpamAssassin::Util qw(reverse_ip_address idn_to_ascii compile_regexp is_fqdn_valid);
use strict;
use warnings;
@ -55,7 +55,6 @@ use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $IP_ADDRESS = IP_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
# constructor: register the eval rule
sub new {
@ -75,7 +74,6 @@ sub new {
'check_rbl_ns_from',
'check_rbl_txt',
'check_rbl_sub',
'check_rbl_results_for',
'check_rbl_from_host',
'check_rbl_from_domain',
'check_rbl_envfrom',
@ -86,7 +84,7 @@ sub new {
$self->set_config($mailsaobject->{conf});
foreach(@{$self->{'evalrules'}}) {
$self->register_eval_rule($_);
$self->register_eval_rule($_, $Mail::SpamAssassin::Conf::TYPE_RBL_EVALS);
}
return $self;
@ -154,21 +152,104 @@ sub set_config {
# directly as part of PMS
sub check_start {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
foreach(@{$self->{'evalrules'}}) {
$opts->{'permsgstatus'}->register_plugin_eval_glue($_);
$pms->register_plugin_eval_glue($_);
}
# Initialize check_rbl_sub tests
$self->_init_rbl_subs($pms);
}
sub _init_rbl_subs {
my ($self, $pms) = @_;
my $conf = $pms->{conf};
# Very hacky stuff and direct rbl_evals usage for now, TODO rewrite everything
foreach my $rule (@{$conf->{eval_to_rule}->{check_rbl_sub}||[]}) {
next if !exists $conf->{rbl_evals}->{$rule};
next if !$conf->{scores}->{$rule};
# rbl_evals is [$function,[@args]]
my $args = $conf->{rbl_evals}->{$rule}->[1];
my ($set, $subtest) = @$args;
if (!defined $subtest) {
warn("dnseval: missing subtest for rule $rule\n");
next;
}
if ($subtest =~ /^sb:/) {
warn("dnseval: ignored $rule, SenderBase rules are deprecated\n");
next;
}
# Compile as regex if not pure ip/bitmask (same check in process_dnsbl_result)
if ($subtest !~ /^\d+(?:\.\d+\.\d+\.\d+)?$/) {
my ($rec, $err) = compile_regexp($subtest, 0);
if (!$rec) {
warn("dnseval: invalid rule $rule subtest regexp '$subtest': $err\n");
next;
}
$subtest = $rec;
}
dbg("dnseval: initialize check_rbl_sub for rule $rule, set $set, subtest $subtest");
push @{$pms->{rbl_subs}{$set}}, [$subtest, $rule];
}
}
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
return 1 if $self->{main}->{conf}->{skip_rbl_checks};
return 1 if !$pms->is_dns_available();
# Process relaylists only once, not everytime in check_rbl_backend
#
# ok, make a list of all the IPs in the untrusted set
my @fullips = map { $_->{ip} } @{$pms->{relays_untrusted}};
# now, make a list of all the IPs in the external set, for use in
# notfirsthop testing. This will often be more IPs than found
# in @fullips. It includes the IPs that are trusted, but
# not in internal_networks.
my @fullexternal = map {
(!$_->{internal}) ? ($_->{ip}) : ()
} @{$pms->{relays_trusted}};
push @fullexternal, @fullips; # add untrusted set too
# Make sure a header significantly improves results before adding here
# X-Sender-Ip: could be worth using (very low occurence for me)
# X-Sender: has a very low bang-for-buck for me
my @originating;
foreach my $header (@{$pms->{conf}->{originating_ip_headers}}) {
my $str = $pms->get($header, undef);
next unless defined $str && $str ne '';
push @originating, ($str =~ m/($IP_ADDRESS)/g);
}
# Let's go ahead and trim away all private ips (KLC)
# also uniq the list and strip dups. (jm)
my @ips = $self->ip_list_uniq_and_strip_private(@fullips);
# if there's no untrusted IPs, it means we trust all the open-internet
# relays, so we skip checks
if (scalar @ips + scalar @originating > 0) {
dbg("dnseval: IPs found: full-external: ".join(", ", @fullexternal).
" untrusted: ".join(", ", @ips).
" originating: ".join(", ", @originating));
@{$pms->{dnseval_fullexternal}} = @fullexternal;
@{$pms->{dnseval_ips}} = @ips;
@{$pms->{dnseval_originating}} = @originating;
}
return 1;
}
sub ip_list_uniq_and_strip_private {
my ($self, @origips) = @_;
my @ips;
my %seen;
my $IP_PRIVATE = IP_PRIVATE;
foreach my $ip (@origips) {
next unless $ip;
next if (exists ($seen{$ip})); $seen{$ip} = 1;
next if ($ip =~ /$IP_PRIVATE/o);
next if exists $seen{$ip};
$seen{$ip} = 1;
next if $ip =~ IS_IP_PRIVATE;
push(@ips, $ip);
}
return @ips;
@ -181,12 +262,17 @@ sub ip_list_uniq_and_strip_private {
sub check_rbl_accreditor {
my ($self, $pms, $rule, $set, $rbl_server, $subtest, $accreditor) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
if (!defined $pms->{accreditor_tag}) {
$self->message_accreditor_tag($pms);
}
if ($pms->{accreditor_tag}->{$accreditor}) {
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
# return undef for async status
return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
}
return 0;
}
@ -226,58 +312,16 @@ sub message_accreditor_tag {
$pms->{accreditor_tag} = \%acctags;
}
sub check_rbl_backend {
sub _check_rbl_backend {
my ($self, $pms, $rule, $set, $rbl_server, $type, $subtest) = @_;
local ($_);
# First check that DNS is available, if not do not perform this check
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
return if !exists $pms->{dnseval_ips}; # no untrusted ips
if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
(index($rbl_server, '.') >= 0) &&
($rbl_server !~ /\.$/)) {
$rbl_server .= ".";
}
dbg("dns: checking RBL $rbl_server, set $set");
# ok, make a list of all the IPs in the untrusted set
my @fullips = map { $_->{ip} } @{$pms->{relays_untrusted}};
# now, make a list of all the IPs in the external set, for use in
# notfirsthop testing. This will often be more IPs than found
# in @fullips. It includes the IPs that are trusted, but
# not in internal_networks.
my @fullexternal = map {
(!$_->{internal}) ? ($_->{ip}) : ()
} @{$pms->{relays_trusted}};
push (@fullexternal, @fullips); # add untrusted set too
# Make sure a header significantly improves results before adding here
# X-Sender-Ip: could be worth using (very low occurence for me)
# X-Sender: has a very low bang-for-buck for me
my $IP_ADDRESS = IP_ADDRESS;
my @originating;
for my $header (@{$pms->{conf}->{originating_ip_headers}}) {
my $str = $pms->get($header,undef);
next unless defined $str && $str ne '';
push (@originating, ($str =~ m/($IP_ADDRESS)/g));
}
# Let's go ahead and trim away all private ips (KLC)
# also uniq the list and strip dups. (jm)
my @ips = $self->ip_list_uniq_and_strip_private(@fullips);
# if there's no untrusted IPs, it means we trust all the open-internet
# relays, so we can return right now.
return 0 unless (scalar @ips + scalar @originating > 0);
dbg("dns: IPs found: full-external: ".join(", ", @fullexternal).
" untrusted: ".join(", ", @ips).
" originating: ".join(", ", @originating));
$rbl_server =~ s/\.+\z//; # strip unneeded trailing dot
dbg("dnseval: checking RBL $rbl_server, set $set, rule $rule");
my $trusted = $self->{main}->{conf}->{trusted_networks};
my @ips = @{$pms->{dnseval_ips}};
# If name is foo-notfirsthop, check all addresses except for
# the originating one. Suitable for use with dialup lists, like the PDL.
@ -293,9 +337,9 @@ sub check_rbl_backend {
# specified some third-party relays as trusted. Also, don't use
# @originating; those headers are added by a phase of relaying through
# a server like Hotmail, which is not going to be in dialup lists anyway.
@ips = $self->ip_list_uniq_and_strip_private(@fullexternal);
@ips = $self->ip_list_uniq_and_strip_private(@{$pms->{dnseval_fullexternal}});
if ($1 eq "lastexternal") {
@ips = (defined $ips[0]) ? ($ips[0]) : ();
@ips = defined $ips[0] ? ($ips[0]) : ();
} else {
pop @ips if (scalar @ips > 1);
}
@ -307,14 +351,14 @@ sub check_rbl_backend {
elsif ($set =~ /-(first|un)trusted$/)
{
my @tips;
foreach my $ip (@originating) {
foreach my $ip (@{$pms->{dnseval_originating}}) {
if ($ip && !$trusted->contains_ip($ip)) {
push(@tips, $ip);
}
}
@ips = $self->ip_list_uniq_and_strip_private (@ips, @tips);
@ips = $self->ip_list_uniq_and_strip_private(@ips, @tips);
if ($1 eq "first") {
@ips = (defined $ips[0]) ? ($ips[0]) : ();
@ips = defined $ips[0] ? ($ips[0]) : ();
} else {
shift @ips;
}
@ -322,7 +366,7 @@ sub check_rbl_backend {
else
{
my @tips;
foreach my $ip (@originating) {
foreach my $ip (@{$pms->{dnseval_originating}}) {
if ($ip && !$trusted->contains_ip($ip)) {
push(@tips, $ip);
}
@ -333,71 +377,77 @@ sub check_rbl_backend {
}
# How many IPs max you check in the received lines
my $checklast=$self->{main}->{conf}->{num_check_received};
my $checklast = $self->{main}->{conf}->{num_check_received};
if (scalar @ips > $checklast) {
splice (@ips, $checklast); # remove all others
}
my $tflags = $pms->{conf}->{tflags}->{$rule};
# Trusted relays should only be checked against nice rules (dnswls)
if (defined $tflags && $tflags !~ /\bnice\b/) {
if (($pms->{conf}->{tflags}->{$rule}||'') !~ /\bnice\b/) {
# remove trusted hosts from beginning
while (@ips && $trusted->contains_ip($ips[0])) { shift @ips }
}
unless (scalar @ips > 0) {
dbg("dns: no untrusted IPs to check");
dbg("dnseval: no untrusted IPs to check");
return 0;
}
dbg("dns: only inspecting the following IPs: ".join(", ", @ips));
dbg("dnseval: only inspecting the following IPs: ".join(", ", @ips));
eval {
my $queries;
foreach my $ip (@ips) {
my $revip = reverse_ip_address($ip);
$pms->do_rbl_lookup($rule, $set, $type,
$revip.'.'.$rbl_server, $subtest) if defined $revip;
if (defined(my $revip = reverse_ip_address($ip))) {
my $ret = $pms->do_rbl_lookup($rule, $set, $type, $revip.'.'.$rbl_server, $subtest);
$queries++ if defined $ret;
}
}
};
# note that results are not handled here, hits are handled directly
# as DNS responses are harvested
return 0;
return 0 if !$queries; # no query started
return; # return undef for async status
}
sub check_rbl {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
# return undef for async status
return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
}
sub check_rbl_txt {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
$self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'TXT', $subtest);
}
# run for first message
sub check_rbl_sub {
my ($self, $pms, $rule, $set, $subtest) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
return 0 if !$pms->is_dns_available();
$pms->register_rbl_subtest($rule, $set, $subtest);
# return undef for async status
return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'TXT', $subtest);
}
# backward compatibility
sub check_rbl_results_for {
#warn "dns: check_rbl_results_for() is deprecated, use check_rbl_sub()\n";
check_rbl_sub(@_);
sub check_rbl_sub {
my ($self, $pms, $rule, $set, $subtest) = @_;
# just a dummy, _init_rbl_subs/do_rbl_lookup handles the subs
return; # return undef for async status
}
# this only checks the address host name and not the domain name because
# using the domain name had much worse results for dsn.rfc-ignorant.org
sub check_rbl_from_host {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->all_from_addrs());
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
# return undef for async status
return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
$subtest, $pms->all_from_addrs());
}
sub check_rbl_headers {
@ -415,16 +465,18 @@ sub check_rbl_headers {
@env_hdr = split(/,/, $conf->{rbl_headers});
}
my $queries;
foreach my $rbl_headers (@env_hdr) {
my $addr = $pms->get($rbl_headers.':addr', undef);
if ( defined $addr && $addr =~ /\@([^\@\s]+)/ ) {
$self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
my $ret = $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
$subtest, $addr);
$queries++ if defined $ret;
} else {
my $unsplitted_host = $pms->get($rbl_headers);
chomp($unsplitted_host);
foreach my $host (split(/\n/, $unsplitted_host)) {
if($host =~ /^$IP_ADDRESS$/ ) {
if ($host =~ IS_IP_ADDRESS) {
next if ($conf->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
@ -432,17 +484,23 @@ sub check_rbl_headers {
next unless is_fqdn_valid($host);
next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
}
$pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
$queries++ if defined $ret;
}
}
}
return 0 if !$queries; # no query started
return; # return undef for async status
}
=over 4
=item check_rbl_from_domain
This checks all the from addrs domain names as an alternate to check_rbl_from_host. As of v3.4.1, it has been improved to include a subtest for a specific octet.
This checks all the from addrs domain names as an alternate to
check_rbl_from_host. As of v3.4.1, it has been improved to include a
subtest for a specific octet.
=back
@ -450,9 +508,14 @@ This checks all the from addrs domain names as an alternate to check_rbl_from_ho
sub check_rbl_from_domain {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->all_from_addrs_domains());
}
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
# return undef for async status
return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
$subtest, $pms->all_from_addrs_domains());
}
=over 4
=item check_rbl_ns_from
@ -472,6 +535,7 @@ sub check_rbl_ns_from {
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
dbg("dnseval: EnvelopeFrom header not found") unless defined (($pms->get("EnvelopeFrom:addr"))[0]);
for my $from ($pms->get('EnvelopeFrom:addr')) {
next unless defined $from;
$from =~ tr/././s; # bug 3366
@ -482,20 +546,20 @@ sub check_rbl_ns_from {
}
return 0 unless defined $domain;
dbg("dns: checking NS for host $domain");
dbg("dnseval: checking NS for host $domain");
my $key = "NS:" . $domain;
my $obj = { dom => $domain, rule => $rule, set => $set, rbl_server => $rbl_server, subtest => $subtest };
my $ent = {
rulename => $rule, key => $key, zone => $domain, obj => $obj, type => "URI-NS",
rulename => $rule, zone => $domain, obj => $obj, type => "URI-NS",
};
# dig $dom ns
$ent = $pms->{async}->bgsend_and_start_lookup(
my $ret = $pms->{async}->bgsend_and_start_lookup(
$domain, 'NS', undef, $ent,
sub { my ($ent2,$pkt) = @_;
$self->complete_ns_lookup($pms, $ent2, $pkt, $domain) },
master_deadline => $pms->{master_deadline} );
return $ent;
return 0 if !defined $ret; # no query started
return; # return undef for async status
}
sub complete_ns_lookup {
@ -508,11 +572,11 @@ sub complete_ns_lookup {
if (!$pkt) {
# $pkt will be undef if the DNS query was aborted (e.g. timed out)
dbg("DNSEval: complete_ns_lookup aborted %s", $ent->{key});
dbg("dnseval: complete_ns_lookup aborted %s", $ent->{key});
return;
}
dbg("DNSEval: complete_ns_lookup %s", $ent->{key});
dbg("dnseval: complete_ns_lookup %s", $ent->{key});
my @ns = $pkt->authority;
foreach my $rr (@ns) {
@ -521,9 +585,9 @@ sub complete_ns_lookup {
chomp($nshost);
if (is_fqdn_valid($nshost)) {
if ( defined $subtest ) {
dbg("dns: checking [$nshost] / $rule / $set / $rbl_server / $subtest");
dbg("dnseval: checking [$nshost] / $rule / $set / $rbl_server / $subtest");
} else {
dbg("dns: checking [$nshost] / $rule / $set / $rbl_server");
dbg("dnseval: checking [$nshost] / $rule / $set / $rbl_server");
}
$pms->do_rbl_lookup($rule, $set, 'A',
"$nshost.$rbl_server", $subtest);
@ -559,10 +623,11 @@ sub check_rbl_rcvd {
}
}
my $queries;
foreach my $host ( @udnsrcvd ) {
if((defined $host) and ($host ne "")) {
chomp($host);
if($host =~ /^$IP_ADDRESS$/ ) {
if ($host =~ IS_IP_ADDRESS) {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
@ -572,49 +637,55 @@ sub check_rbl_rcvd {
next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
}
if ( defined $subtest ) {
dbg("dns: checking [$host] / $rule / $set / $rbl_server / $subtest");
dbg("dnseval: checking [$host] / $rule / $set / $rbl_server / $subtest");
} else {
dbg("dns: checking [$host] / $rule / $set / $rbl_server");
dbg("dnseval: checking [$host] / $rule / $set / $rbl_server");
}
$pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
$queries++ if defined $ret;
}
}
return 0;
return 0 if !$queries; # no query started
return; # return undef for async status
}
# this only checks the address host name and not the domain name because
# using the domain name had much worse results for dsn.rfc-ignorant.org
sub check_rbl_envfrom {
my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
_check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->get('EnvelopeFrom:addr',undef));
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 if !$pms->is_dns_available();
# return undef for async status
return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
$subtest, $pms->get('EnvelopeFrom:addr',undef));
}
sub _check_rbl_addresses {
my ($self, $pms, $rule, $set, $rbl_server, $subtest, @addresses) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
$rbl_server =~ s/\.+\z//; # strip unneeded trailing dot
my %hosts;
for (@addresses) {
next if !defined($_) || !/ \@ ( [^\@\s]+ )/x;
next if !defined($_) || !/\@([^\@\s]+)/;
my $address = $1;
# strip leading & trailing dots (as seen in some e-mail addresses)
$address =~ s/^\.+//; $address =~ s/\.+\z//;
$address =~ s/^\.+//;
$address =~ s/\.+\z//;
# squash duplicate dots to avoid an invalid DNS query with a null label
$address =~ tr/.//s;
$hosts{lc($address)} = 1 if $address =~ /\./; # must by a FQDN
# Also checks it's FQDN
if ($address =~ tr/.//s) {
$hosts{lc($address)} = 1;
}
}
return unless scalar keys %hosts;
if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
(index($rbl_server, '.') >= 0) &&
($rbl_server !~ /\.$/)) {
$rbl_server .= ".";
}
my $queries;
for my $host (keys %hosts) {
if ($host =~ /^$IP_ADDRESS$/) {
if ($host =~ IS_IP_ADDRESS) {
next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
$host = reverse_ip_address($host);
} else {
@ -622,23 +693,26 @@ sub _check_rbl_addresses {
next unless is_fqdn_valid($host);
next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
}
dbg("dns: checking [$host] / $rule / $set / $rbl_server");
$pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
dbg("dnseval: checking [$host] / $rule / $set / $rbl_server");
my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
$queries++ if defined $ret;
}
return 0 if !$queries; # no async
return; # return undef for async status
}
sub check_dns_sender {
my ($self, $pms, $rule) = @_;
return 0 if $self->{main}->{conf}->{skip_rbl_checks};
return 0 unless $pms->is_dns_available();
return 0 if !$pms->is_dns_available();
my $host;
for my $from ($pms->get('EnvelopeFrom:addr',undef)) {
foreach my $from ($pms->get('EnvelopeFrom:addr', undef)) {
next unless defined $from;
$from =~ tr/././s; # bug 3366
if ($from =~ m/ \@ ( [^\@\s]+ \. [^\@\s]+ )/x ) {
$from =~ tr/.//s; # bug 3366
if ($from =~ m/\@([^\@\s]+\.[^\@\s]+)/) {
$host = lc($1);
last;
}
@ -650,15 +724,45 @@ sub check_dns_sender {
return 0;
}
dbg("dns: checking A and MX for host $host");
$host = idn_to_ascii($host);
dbg("dnseval: checking A and MX for host $host");
$pms->do_dns_lookup($rule, 'A', $host);
$pms->do_dns_lookup($rule, 'MX', $host);
my $queries;
my $ret = $self->do_sender_lookup($pms, $rule, 'A', $host);
$queries++ if defined $ret;
$ret = $self->do_sender_lookup($pms, $rule, 'MX', $host);
$queries++ if defined $ret;
# cache name of host for later checking
$pms->{sender_host} = $host;
return 0 if !$queries; # no query started
return; # return undef for async status
}
return 0;
sub do_sender_lookup {
my ($self, $pms, $rule, $type, $host) = @_;
my $ent = {
rulename => $rule,
type => "DNSBL-Sender",
};
return $pms->{async}->bgsend_and_start_lookup(
$host, $type, undef, $ent, sub {
my ($ent, $pkt) = @_;
return if !$pkt; # aborted / timed out
$pms->rule_ready($ent->{rulename}); # mark as run, could still hit
foreach my $answer ($pkt->answer) {
next if !$answer;
next if $answer->type ne 'A' && $answer->type ne 'MX';
if ($pkt->header->rcode eq 'NXDOMAIN' ||
$pkt->header->rcode eq 'SERVFAIL')
{
if (++$pms->{sender_host_fail} == 2) {
$pms->got_hit($ent->{rulename}, "DNS: ", ruletype => "dns");
}
}
}
},
master_deadline => $self->{master_deadline},
);
}
# capability checks for "if can(Mail::SpamAssassin::Plugin::DNSEval::XXX)":

View File

@ -0,0 +1,944 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
DecodeShortURLs - Check for shortened URLs
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::DecodeShortURLs
url_shortener tinyurl.com
url_shortener_get bit.ly
body HAS_SHORT_URL eval:short_url()
describe HAS_SHORT_URL Message has one or more shortened URLs
body SHORT_URL_REDIR eval:short_url_redir()
describe SHORT_URL_REDIR Message has shortened URL that resulted in a valid redirection
body SHORT_URL_CHAINED eval:short_url_chained()
describe SHORT_URL_CHAINED Message has shortened URL chained to other shorteners
body SHORT_URL_MAXCHAIN eval:short_url_maxchain()
describe SHORT_URL_MAXCHAIN Message has shortened URL that causes too many redirections
body SHORT_URL_LOOP eval:short_url_loop()
describe SHORT_URL_LOOP Message has short URL that loops back to itself
body SHORT_URL_200 eval:short_url_code('200') # Can check any non-redirect HTTP code
describe SHORT_URL_200 Message has shortened URL returning HTTP 200
body SHORT_URL_404 eval:short_url_code('404') # Can check any non-redirect HTTP code
describe SHORT_URL_404 Message has shortened URL returning HTTP 404
uri URI_TINYURL_BLOCKED m,https://tinyurl\.com/app/nospam,
describe URI_TINYURL_BLOCKED Message contains a tinyurl that has been disabled due to abuse
uri URI_BITLY_BLOCKED m,^https://bitly\.com/a/blocked,
describe URI_BITLY_BLOCKED Message contains a bit.ly URL that has been disabled due to abuse
=head1 DESCRIPTION
This plugin looks for URLs shortened by a list of URL shortening services.
Upon finding a matching URL, plugin will send a HTTP request to the
shortening service and retrieve the Location-header which points to the
actual shortened URL. It then adds this URL to the list of URIs extracted
by SpamAssassin which can then be accessed by uri rules and plugins such as
URIDNSBL.
This plugin will follow chained redirections, where a short URL redirects to
another short URL. Redirection depth limit can be set with
C<max_short_url_redirections>.
Maximum of C<max_short_urls> short URLs are checked in a message (10 by
default). Setting it to 0 disables HTTP requests, allowing only short_url()
test to work and report found shorteners.
All supported rule types for checking short URLs and redirection status are
documented in L<SYNOPSIS> section.
=head1 NOTES
This plugin runs at the check_dnsbl hook (priority -100) so that it may
modify the parsed URI list prior to normal uri rules or the URIDNSBL plugin.
=cut
package Mail::SpamAssassin::Plugin::DecodeShortURLs;
use Mail::SpamAssassin::Plugin;
use strict;
use warnings;
use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);
my $VERSION = 4.00;
use constant HAS_LWP_USERAGENT => eval { require LWP::UserAgent; };
sub dbg { my $msg = shift; return Mail::SpamAssassin::Logger::dbg("DecodeShortURLs: $msg", @_); }
sub info { my $msg = shift; return Mail::SpamAssassin::Logger::info("DecodeShortURLs: $msg", @_); }
sub new {
my $class = shift;
my $mailsaobject = shift;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
if ($mailsaobject->{local_tests_only}) {
dbg("local tests only, disabling HTTP requests");
$self->{net_disabled} = 1;
}
elsif (!HAS_LWP_USERAGENT) {
dbg("module LWP::UserAgent not installed, disabling HTTP requests");
$self->{net_disabled} = 1;
}
$self->set_config($mailsaobject->{conf});
$self->register_method_priority ('check_dnsbl', -10);
$self->register_eval_rule('short_url', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_redir', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_200', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_404', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_code', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_chained', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_maxchain', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_loop', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule('short_url_tests'); # for legacy plugin compatibility warning
return $self;
}
=head1 PRIVILEGED SETTINGS
=over 4
=item url_shortener domain [domain...] (default: none)
Domains that should be considered as an URL shortener. If the domain begins
with a '.', 3rd level tld of the main domain will be checked.
Example:
url_shortener tinyurl.com
url_shortener .page.link
=back
=over 4
=item url_shortener_get domain [domain...] (default: none)
Alias to C<url_shortener>. HTTP request will be done with GET method,
instead of default HEAD. Required for some services like bit.ly to return
blocked URL correctly.
Example:
url_shortener_get bit.ly
=back
=cut
sub set_config {
my($self, $conf) = @_;
my @cmds = ();
push (@cmds, {
setting => 'url_shortener',
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
foreach my $domain (split(/\s+/, $value)) {
$self->{url_shortener}->{lc $domain} = 1; # 1 == head
}
}
});
push (@cmds, {
setting => 'url_shortener_get',
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
foreach my $domain (split(/\s+/, $value)) {
$self->{url_shortener}->{lc $domain} = 2; # 2 == get
}
}
});
=over 4
=item clear_url_shortener [domain] [domain...]
Clear configured url_shortener and url_shortener_get domains, for example to
override default settings from an update channel. If domains are specified,
then only those are removed from list.
=back
=cut
push (@cmds, {
setting => 'clear_url_shortener',
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
$self->{url_shortener} = {};
} else {
foreach my $domain (split(/\s+/, $value)) {
delete $self->{url_shortener}->{lc $domain};
}
}
}
});
=over 4
=item url_shortener_cache_type (default: none)
The cache type that is being utilized. Currently only supported value is
C<dbi> that implies C<url_shortener_cache_dsn> is a DBI connect string.
DBI module is required.
Example:
url_shortener_cache_type dbi
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_type',
default => '',
is_priv => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=over 4
=item url_shortener_cache_dsn (default: none)
The DBI dsn of the database to use.
For SQLite, the database will be created automatically if it does not
already exist, the supplied path and file must be read/writable by the
user running spamassassin or spamd.
For MySQL/MariaDB or PostgreSQL, see sql-directory for database table
creation clauses.
You will need to have the proper DBI module for your database. For example
DBD::SQLite, DBD::mysql, DBD::MariaDB or DBD::Pg.
Minimum required SQLite version is 3.24.0 (available from DBD::SQLite 1.59_01).
Examples:
url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/spamassassin/DecodeShortURLs.db
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_dsn',
default => '',
is_priv => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=over 4
=item url_shortener_cache_username (default: none)
The username that should be used to connect to the database. Not used for
SQLite.
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_username',
default => '',
is_priv => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=over 4
=item url_shortener_cache_password (default: none)
The password that should be used to connect to the database. Not used for
SQLite.
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_password',
default => '',
is_priv => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
=over 4
=item url_shortener_cache_ttl (default: 86400)
The length of time a cache entry will be valid for in seconds.
Default is 86400 (1 day).
See C<url_shortener_cache_autoclean> for database cleaning.
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_ttl',
is_admin => 1,
default => 86400,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=over 4
=item url_shortener_cache_autoclean (default: 1000)
Automatically purge old entries from database. Value describes a random run
chance of 1/x. The default value of 1000 means that cleaning is run
approximately once for every 1000 messages processed. Value of 1 would mean
database is cleaned every time a message is processed.
Set 0 to disable automatic cleaning and to do it manually.
=back
=cut
push (@cmds, {
setting => 'url_shortener_cache_autoclean',
is_admin => 1,
default => 1000,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=over 4
=item url_shortener_loginfo (default: 0 (off))
If this option is enabled (set to 1), then short URLs and the decoded URLs will be logged with info priority.
=back
=cut
push (@cmds, {
setting => 'url_shortener_loginfo',
is_admin => 1,
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
=over 4
=item url_shortener_timeout (default: 5)
Maximum time a short URL HTTP request can take, in seconds.
=back
=cut
push (@cmds, {
setting => 'url_shortener_timeout',
is_admin => 1,
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=over 4
=item max_short_urls (default: 10)
Maximum amount of short URLs that will be looked up per message. Chained
redirections are not counted, only initial short URLs found.
Setting it to 0 disables HTTP requests, allowing only short_url() test to
work and report any found shortener URLs.
=back
=cut
push (@cmds, {
setting => 'max_short_urls',
is_admin => 1,
default => 10,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=over 4
=item max_short_url_redirections (default: 10)
Maximum depth of chained redirections that a short URL can generate.
=back
=cut
push (@cmds, {
setting => 'max_short_url_redirections',
is_admin => 1,
default => 10,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=over 4
=item url_shortener_user_agent (default: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36)
Set User-Agent header for HTTP requests. Some services require it to look
like a common browser.
=back
=cut
push (@cmds, {
setting => 'url_shortener_user_agent',
is_admin => 1,
default => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
});
$conf->{parser}->register_commands(\@cmds);
}
=head1 ACKNOWLEDGEMENTS
Original DecodeShortURLs plugin was developed by Steve Freegard.
=cut
sub short_url_tests {
# Legacy compatibility warning done in finish_parsing_start
return 0;
}
sub finish_parsing_start {
my ($self, $opts) = @_;
if ($opts->{conf}->{eval_to_rule}->{short_url_tests}) {
warn "DecodeShortURLs: Legacy configuration format detected. ".
"Eval function short_url_tests() is no longer supported, ".
"please see documentation for the new rule format.\n";
}
}
sub initialise_url_shortener_cache {
my ($self, $conf) = @_;
return if $self->{dbh};
return if !$conf->{url_shortener_cache_type};
if (!$conf->{url_shortener_cache_dsn}) {
warn "DecodeShortURLs: invalid cache configuration\n";
return;
}
##
## SQLite
##
if ($conf->{url_shortener_cache_type} =~ /^(?:dbi|sqlite)$/i
&& $conf->{url_shortener_cache_dsn} =~ /^dbi:SQLite/)
{
eval {
local $SIG{'__DIE__'};
require DBI;
require DBD::SQLite;
DBD::SQLite->VERSION(1.59_01); # Required for ON CONFLICT
$self->{dbh} = DBI->connect_cached(
$conf->{url_shortener_cache_dsn}, '', '',
{RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
);
$self->{dbh}->do("
CREATE TABLE IF NOT EXISTS short_url_cache (
short_url TEXT PRIMARY KEY NOT NULL,
decoded_url TEXT NOT NULL,
hits INTEGER NOT NULL DEFAULT 1,
created INTEGER NOT NULL,
modified INTEGER NOT NULL
)
");
# Maintaining index for cleaning is likely more expensive than occasional full table scan
#$self->{dbh}->do("
# CREATE INDEX IF NOT EXISTS short_url_modified
# ON short_url_cache(created)
#");
$self->{sth_insert} = $self->{dbh}->prepare("
INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
VALUES (?,?,strftime('%s','now'),strftime('%s','now'))
ON CONFLICT(short_url) DO UPDATE
SET decoded_url = excluded.decoded_url,
modified = excluded.modified,
hits = hits + 1
");
$self->{sth_select} = $self->{dbh}->prepare("
SELECT decoded_url FROM short_url_cache
WHERE short_url = ?
");
$self->{sth_delete} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE short_url = ? AND created < strftime('%s','now') - $conf->{url_shortener_cache_ttl}
");
$self->{sth_clean} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE created < strftime('%s','now') - $conf->{url_shortener_cache_ttl}
");
};
}
##
## MySQL/MariaDB
##
elsif (lc $conf->{url_shortener_cache_type} eq 'dbi'
&& $conf->{url_shortener_cache_dsn} =~ /^dbi:(?:mysql|MariaDB)/i)
{
eval {
local $SIG{'__DIE__'};
require DBI;
$self->{dbh} = DBI->connect_cached(
$conf->{url_shortener_cache_dsn},
$conf->{url_shortener_cache_username},
$conf->{url_shortener_cache_password},
{RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
);
$self->{sth_insert} = $self->{dbh}->prepare("
INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
VALUES (?,?,UNIX_TIMESTAMP(),UNIX_TIMESTAMP())
ON DUPLICATE KEY UPDATE
decoded_url = VALUES(decoded_url),
modified = VALUES(modified),
hits = hits + 1
");
$self->{sth_select} = $self->{dbh}->prepare("
SELECT decoded_url FROM short_url_cache
WHERE short_url = ?
");
$self->{sth_delete} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE short_url = ? AND created < UNIX_TIMESTAMP() - $conf->{url_shortener_cache_ttl}
");
$self->{sth_clean} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE created < UNIX_TIMESTAMP() - $conf->{url_shortener_cache_ttl}
");
};
}
##
## PostgreSQL
##
elsif (lc $conf->{url_shortener_cache_type} eq 'dbi'
&& $conf->{url_shortener_cache_dsn} =~ /^dbi:Pg/i)
{
eval {
local $SIG{'__DIE__'};
require DBI;
$self->{dbh} = DBI->connect_cached(
$conf->{url_shortener_cache_dsn},
$conf->{url_shortener_cache_username},
$conf->{url_shortener_cache_password},
{RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
);
$self->{sth_insert} = $self->{dbh}->prepare("
INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
VALUES (?,?,CAST(EXTRACT(epoch FROM NOW()) AS INT),CAST(EXTRACT(epoch FROM NOW()) AS INT))
ON CONFLICT (short_url) DO UPDATE SET
decoded_url = EXCLUDED.decoded_url,
modified = EXCLUDED.modified,
hits = short_url_cache.hits + 1
");
$self->{sth_select} = $self->{dbh}->prepare("
SELECT decoded_url FROM short_url_cache
WHERE short_url = ?
");
$self->{sth_delete} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE short_url ? = AND created < CAST(EXTRACT(epoch FROM NOW()) AS INT) - $conf->{url_shortener_cache_ttl}
");
$self->{sth_clean} = $self->{dbh}->prepare("
DELETE FROM short_url_cache
WHERE created < CAST(EXTRACT(epoch FROM NOW()) AS INT) - $conf->{url_shortener_cache_ttl}
");
};
##
## ...
##
} else {
warn "DecodeShortURLs: invalid cache configuration\n";
return;
}
if ($@ || !$self->{sth_clean}) {
warn "DecodeShortURLs: cache connect failed: $@\n";
undef $self->{dbh};
undef $self->{sth_insert};
undef $self->{sth_select};
undef $self->{sth_delete};
undef $self->{sth_clean};
}
}
sub short_url {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url} ? 1 : 0;
}
sub short_url_redir {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_redir} ? 1 : 0;
}
sub short_url_200 {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_200} ? 1 : 0;
}
sub short_url_404 {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_404} ? 1 : 0;
}
sub short_url_code {
my ($self, $pms, undef, $code) = @_;
# Make sure checks are run
$self->_check_short($pms);
return 0 unless defined $code && $code =~ /^\d{3}$/;
return $pms->{"short_url_$code"} ? 1 : 0;
}
sub short_url_chained {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_chained} ? 1 : 0;
}
sub short_url_maxchain {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_maxchain} ? 1 : 0;
}
sub short_url_loop {
my ($self, $pms) = @_;
# Make sure checks are run
$self->_check_short($pms);
return $pms->{short_url_loop} ? 1 : 0;
}
sub _check_shortener_uri {
my ($uri, $conf) = @_;
local($1,$2);
return 0 unless $uri =~ m{^
https?:// # Only http
(?:[^\@/?#]*\@)? # Ignore user:pass@
([^/?#:]+) # (Capture hostname)
(?::\d+)? # Possible port
(.*?\w)? # Some path wanted
}ix;
my $host = lc $1;
my $has_path = defined $2;
my $levels = $host =~ tr/.//;
# No point looking at single level "xxx.yy" without a path
return if $levels == 1 && !$has_path;
if (exists $conf->{url_shortener}->{$host}) {
return {
'uri' => $uri,
'method' => $conf->{url_shortener}->{$host} == 1 ? 'head' : 'get',
};
}
# if domain is a 3rd level domain check if there is a url shortener
# on the 2nd level tld
elsif ($levels == 2 && $host =~ /^(?!www)[^.]+(\.[^.]+\.[^.]+)$/i &&
exists $conf->{url_shortener}->{$1}) {
return {
'uri' => $uri,
'method' => $conf->{url_shortener}->{$1} == 1 ? 'head' : 'get',
};
}
return;
}
sub check_dnsbl {
my ($self, $opts) = @_;
$self->_check_short($opts->{permsgstatus});
}
sub _check_short {
my ($self, $pms) = @_;
return if $pms->{short_url_checked}++;
my $conf = $pms->{conf};
# Sort short URLs into hash to de-dup them
my %short_urls;
my $uris = $pms->get_uri_detail_list();
while (my($uri, $info) = each %{$uris}) {
next unless $info->{domains} && $info->{cleaned};
if (my $short_url_info = _check_shortener_uri($uri, $conf)) {
$short_urls{$uri} = $short_url_info;
last if scalar keys %short_urls >= $conf->{max_short_urls};
}
}
# Bail out if no shortener was found
return unless %short_urls;
# Mark that a URL shortener was found
$pms->{short_url} = 1;
# Bail out if network lookups not enabled or max_short_urls 0
return if $self->{net_disabled};
return if !$conf->{max_short_urls};
# Initialize cache
$self->initialise_url_shortener_cache($conf);
# Initialize LWP
my $ua = LWP::UserAgent->new(
'agent' => $conf->{url_shortener_user_agent},
'max_redirect' => 0,
'timeout' => $conf->{url_shortener_timeout},
);
$ua->env_proxy;
# Launch HTTP requests
foreach my $uri (keys %short_urls) {
$self->recursive_lookup($short_urls{$uri}, $pms, $ua);
}
# Automatically purge old entries
if ($self->{dbh} && $conf->{url_shortener_cache_autoclean}
&& rand() < 1/$conf->{url_shortener_cache_autoclean})
{
dbg("cleaning stale cache entries");
eval { $self->{sth_clean}->execute(); };
if ($@) { dbg("cache cleaning failed: $@"); }
}
}
sub recursive_lookup {
my ($self, $short_url_info, $pms, $ua, %been_here) = @_;
my $conf = $pms->{conf};
my $count = scalar keys %been_here;
dbg("redirection count $count") if $count;
if ($count >= $conf->{max_short_url_redirections}) {
dbg("found more than $conf->{max_short_url_redirections} shortener redirections");
# Fire test
$pms->{short_url_maxchain} = 1;
return;
}
my $short_url = $short_url_info->{uri};
my $location;
if (defined($location = $self->cache_get($short_url))) {
if ($conf->{url_shortener_loginfo}) {
info("found cached $short_url => $location");
} else {
dbg("found cached $short_url => $location");
}
# Cached http code?
if ($location =~ /^\d{3}$/) {
$pms->{"short_url_$location"} = 1;
# Update cache
$self->cache_add($short_url, $location);
return;
}
} else {
# Not cached; do lookup
my $method = $short_url_info->{method};
my $response = $ua->$method($short_url);
if (!$response->is_redirect) {
dbg("URL is not redirect: $short_url = ".$response->status_line);
my $rcode = $response->code;
if ($rcode =~ /^\d{3}$/) {
$pms->{"short_url_$rcode"} = 1;
# Update cache
$self->cache_add($short_url, $rcode);
}
return;
}
$location = $response->headers->{location};
if ($self->{url_shortener_loginfo}) {
info("found $short_url => $location");
} else {
dbg("found $short_url => $location");
}
}
# Update cache
$self->cache_add($short_url, $location);
# Bail out if $short_url redirects to itself
if ($short_url eq $location) {
dbg("URL is redirect to itself");
return;
}
# At this point we have a valid redirection and new URL in $response
$pms->{short_url_redir} = 1;
# Set chained here otherwise we might mark a disabled page or
# redirect back to the same host as chaining incorrectly.
$pms->{short_url_chained} = 1 if $count;
# Check if we are being redirected to a local page
# Don't recurse in this case...
if ($location !~ m{^[a-z]+://}i) {
my $orig_location = $location;
my $orig_short_url = $short_url;
# Strip to..
if (index($location, '/') == 0) {
$short_url =~ s{^([a-z]+://.*?)[/?#].*}{$1}; # ..absolute path
} else {
$short_url =~ s{^([a-z]+://.*)/}{$1}; # ..relative path
}
$location = "$short_url/$location";
dbg("looks like a local redirection: $orig_short_url => $location ($orig_location)");
$pms->add_uri_detail_list($location) if !$pms->{uri_detail_list}->{$location};
return;
}
if (exists $been_here{$location}) {
# Loop detected
dbg("error: loop detected: $location");
$pms->{short_url_loop} = 1;
return;
}
$been_here{$location} = 1;
$pms->add_uri_detail_list($location) if !$pms->{uri_detail_list}->{$location};
# Check for recursion
if (my $short_url_info = _check_shortener_uri($location, $conf)) {
# Recurse...
$self->recursive_lookup($short_url_info, $pms, $ua, %been_here);
}
}
sub cache_add {
my ($self, $short_url, $decoded_url) = @_;
return if !$self->{dbh};
return if length($short_url) > 256 || length($decoded_url) > 512;
# Upsert
eval { $self->{sth_insert}->execute($short_url, $decoded_url); };
if ($@) {
dbg("could not add to cache: $@");
}
return;
}
sub cache_get {
my ($self, $key) = @_;
return if !$self->{dbh};
# Make sure expired entries are gone. Just a quick check for primary key,
# not that expensive.
eval { $self->{sth_delete}->execute($key); };
if ($@) {
dbg("cache delete failed: $@");
return;
}
# Now try to get it (don't bother parsing if something was deleted above,
# it would be rare event anyway)
eval { $self->{sth_select}->execute($key); };
if ($@) {
dbg("cache get failed: $@");
return;
}
my @row = $self->{sth_select}->fetchrow_array();
if (@row) {
return $row[0];
}
return;
}
# Version features
sub has_short_url { 1 }
sub has_autoclean { 1 }
sub has_short_url_code { 1 }
sub has_user_agent { 1 } # url_shortener_user_agent
sub has_get { 1 } # url_shortener_get
sub has_clear { 1 } # clear_url_shortener
sub has_timeout { 1 } # url_shortener_timeout
sub has_max_redirections { 1 } # max_short_url_redirections
# short_url() will always hit if matching url_shortener was found, even
# without HTTP requests. To check if a valid HTTP redirection response was
# seen, use short_url_redir().
sub has_short_url_redir { 1 }
1;

View File

@ -0,0 +1,713 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
# Authors: Jonas Eckerman, Dave Wreski, Giovanni Bechis
=head1 NAME
ExtractText - extracts text from documenmts.
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::ExtractText
ifplugin Mail::SpamAssassin::Plugin::ExtractText
extracttext_external pdftotext /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use pdftotext .pdf application/pdf
# http://docx2txt.sourceforge.net
extracttext_external docx2txt /usr/bin/docx2txt {} -
extracttext_use docx2txt .docx application/docx
extracttext_external antiword /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
extracttext_use antiword .doc application/(?:vnd\.?)?ms-?word.*
extracttext_external unrtf /usr/bin/unrtf --nopict {}
extracttext_use unrtf .doc .rtf application/rtf text/rtf
extracttext_external odt2txt /usr/bin/odt2txt --encoding=UTF-8 {}
extracttext_use odt2txt .odt .ott application/.*?opendocument.*text
extracttext_use odt2txt .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter
extracttext_external tesseract {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
extracttext_use tesseract .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
add_header all ExtractText-Flags _EXTRACTTEXTFLAGS_
header PDF_NO_TEXT X-ExtractText-Flags =~ /\bpdftotext_NoText\b/
describe PDF_NO_TEXT PDF without text
score PDF_NO_TEXT 0.001
header DOC_NO_TEXT X-ExtractText-Flags =~ /\b(?:antiword|openxml|unrtf|odt2txt)_NoText\b/
describe DOC_NO_TEXT Document without text
score DOC_NO_TEXT 0.001
header EXTRACTTEXT exists:X-ExtractText-Flags
describe EXTRACTTEXT Email processed by extracttext plugin
score EXTRACTTEXT 0.001
endif
=head1 DESCRIPTION
This module uses external tools to extract text from message parts,
and then sets the text as the rendered part. External tool must output
plain text, not HTML or other non-textual result.
How to extract text is completely configurable, and based on
MIME part type and file name.
=head1 CONFIGURATION
All configuration lines in user_prefs files will be ignored.
=over 4
=item extracttext_maxparts (default: 10)
Configure the maximum mime parts number to analyze, a value of 0 means all mime parts
will be analyzed
=item extracttext_timeout (default: 5 10)
Configure the timeout in seconds of external tool checks, per attachment.
Second argument speficies maximum total time for all checks.
=back
=head2 Tools
=over
=item extracttext_use
Specifies what tool to use for what message parts.
The general syntax is
extracttext_use C<name> C<specifiers>
=back
=over
=item name
the internal name of a tool.
=item specifiers
File extension and regular expressions for file names and MIME
types. The regular expressions are anchored to beginning and end.
=back
=head3 Examples
extracttext_use antiword .doc application/(?:vnd\.?)?ms-?word.*
extracttext_use openxml .docx .dotx .dotm application/(?:vnd\.?)openxml.*?word.*
extracttext_use openxml .doc .dot application/(?:vnd\.?)?ms-?word.*
extracttext_use unrtf .doc .rtf application/rtf text/rtf
=over
=item extracttext_external
Defines an external tool. The tool must read a document on standard input
or from a file and write text to standard output.
The special keyword "{}" will be substituted at runtime with the temporary
filename to be scanned by the external tool.
Environment variables can be defined with "{KEY=VALUE}", these strings will
be removed from commandline.
It is required that commandline used outputs result directly to STDOUT.
The general syntax is
extracttext_external C<name> C<command> C<parameters>
=back
=over
=item name
The internal name of this tool.
=item command
The full path to the external command to run.
=item parameters
Parameters for the external command. The temporary file name containing
the document will be automatically added as last parameter.
=back
=head3 Examples
extracttext_external antiword /usr/bin/antiword -t -w 0 -m UTF-8.txt {} -
extracttext_external unrtf /usr/bin/unrtf --nopict {}
extracttext_external odt2txt /usr/bin/odt2txt --encoding=UTF-8 {}
=head2 Metadata
The plugin adds some pseudo headers to the message. These headers are seen by
the bayes system, and can be used in normal SpamAssassin rules.
The headers are also available as template tags as noted below.
=head3 Example
The fictional example headers below are based on a message containing this:
=over
=item 1
A perfectly normal PDF.
=item 2
An OpenXML document with a word document inside.
Neither Office document contains text.
=back
=head3 Headers
=over
=item X-ExtractText-Chars
Tag: _EXTRACTTEXTCHARS_
Contains a count of characters that were extracted.
X-ExtractText-Chars: 10970
=item X-ExtractText-Words
Tag: _EXTRACTTEXTWORDS_
Contains a count of "words" that were extracted.
X-ExtractText-Chars: 1599
=item X-ExtractText-Tools
Tag: _EXTRACTTEXTTOOLS_
Contains chains of tools used for extraction.
X-ExtractText-Tools: pdftotext openxml_antiword
=item X-ExtractText-Types
Tag: _EXTRACTTEXTTYPES_
Contains chains of MIME types for parts found during extraction.
X-ExtractText-Types: application/pdf; application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/ms-word
=item X-ExtractText-Extensions
Tag: _EXTRACTTEXTEXTENSIONS_
Contains chains of canonicalized file extensions for parts
found during extraction.
X-ExtractText-Extensions: pdf docx
=item X-ExtractText-Flags
Tag: _EXTRACTTEXTFLAGS_
Contains notes from the plugin.
X-ExtractText-Flags: openxml_NoText
=back
=head3 Rules
Example:
header PDF_NO_TEXT X-ExtractText-Flags =~ /\bpdftotext_Notext\b/
describe PDF_NO_TEXT PDF without text
=cut
package Mail::SpamAssassin::Plugin::ExtractText;
use strict;
use warnings;
use re 'taint';
my $VERSION = 0.001;
use File::Basename;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw (compile_regexp untaint_var untaint_file_path
proc_status_ok exit_status_str);
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub new {
my ($class, $mailsa) = @_;
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsa);
bless ($self, $class);
$self->{match} = [];
$self->{tools} = {};
$self->{magic} = 0;
$self->register_method_priority('post_message_parse', -1);
$self->set_config($mailsa->{conf});
return $self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'extracttext_maxparts',
default => 10,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'extracttext_timeout',
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
local ($1,$2);
unless ($value =~ /^(\d+)(?:\s+(\d+))?$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{extracttext_timeout} = $1;
$self->{extracttext_timeout_total} = $2;
}
});
$conf->{parser}->register_commands(\@cmds);
}
sub parse_config {
my ($self, $opts) = @_;
# Ignore users's configuration lines
return 0 if $opts->{user_config};
if ($opts->{key} eq 'extracttext_use') {
$self->inhibit_further_callbacks();
# Temporary kludge to notify users. Double backslashes have zero benefit for this plugin config.
if ($opts->{value} =~ s/\\\\/\\/g) {
warn "extracttext: DOUBLE BACKSLASHES DEPRECATED, change config to single backslashes, autoconverted for backward compatibility: $opts->{key} $opts->{value}\n";
}
if ($opts->{value} =~ /(?:to|2)html\b/) {
warn "extracttext: HTML tools are not supported, plain text output is required. Please remove: $opts->{key} $opts->{value}\n";
return 1;
}
my @vals = split(/\s+/, $opts->{value});
my $tool = lc(shift @vals);
return 0 unless @vals;
foreach my $what (@vals) {
my $where;
if (index($what, '/') >= 0) {
$where = 'type';
} else {
$where = 'name';
if ($what =~ /^\.[a-zA-Z0-9]+$/) {
$what = ".*\\$what";
}
}
my ($rec, $err) = compile_regexp('^(?i)'.$what.'$', 0);
if (!$rec) {
warn("invalid regexp '$what': $err\n");
return 0;
}
push @{$self->{match}}, {where=>$where, what=>$rec, tool=>$tool};
dbg('extracttext: use: %s %s %s', $tool, $where, $what);
}
return 1;
}
if ($opts->{key} eq 'extracttext_external') {
$self->inhibit_further_callbacks();
# Temporary kludge to notify users. Double backslashes have zero benefit for this plugin config.
if ($opts->{value} =~ s/\\\\/\\/g) {
warn "extracttext: DOUBLE BACKSLASHES DEPRECATED, change config to single backslashes, autoconverted for backward compatibility: $opts->{key} $opts->{value}\n";
}
if ($opts->{value} =~ /(?:to|2)html\b/) {
warn "extracttext: HTML tools are not supported, plain text output is required. Please remove: $opts->{key} $opts->{value}\n";
return 1;
}
my %env;
while ($opts->{value} =~ s/\{(.+?)\}/ /g) {
my ($k,$v) = split(/=/, $1, 2);
$env{$k} = defined $v ? $v : '';
}
my @vals = split(/\s+/, $opts->{value});
my $name = lc(shift @vals);
return 0 unless @vals > 1;
if ($self->{tools}->{$name}) {
warn "extracttext: duplicate tool defined: $name\n";
return 0;
}
#unless (-x $vals[0]) {
# warn "extracttext: missing tool: $name ($vals[0])\n";
# return 0;
#}
$self->{tools}->{$name} = {
'name' => $name,
'type' => 'external',
'env' => \%env,
'cmd' => \@vals,
};
dbg('extracttext: external: %s "%s"', $name, join('","', @vals));
return 1;
}
return 0;
}
# Extract 'text' via running an external command.
sub _extract_external {
my ($self, $object, $tool) = @_;
my ($errno, $pipe_errno, $tmp_file, $err_file, $pid);
my $resp = '';
my @cmd = @{$tool->{cmd}};
Mail::SpamAssassin::PerMsgStatus::enter_helper_run_mode($self);
# Set environment variables
foreach (keys %{$tool->{env}}) {
$ENV{$_} = $tool->{env}{$_};
}
my $timer = Mail::SpamAssassin::Timeout->new(
{ secs => $self->{main}->{conf}->{extracttext_timeout},
deadline => $self->{'master_deadline'} });
my $err = $timer->run_and_catch(sub {
local $SIG{PIPE} = sub { die "__brokenpipe__ignore__\n" };
($tmp_file, my $tmp_fh) = Mail::SpamAssassin::Util::secure_tmpfile();
$tmp_file or die "failed to create a temporary file";
print $tmp_fh ${$object->{data}};
close($tmp_fh);
($err_file, my $err_fh) = Mail::SpamAssassin::Util::secure_tmpfile();
$err_file or die "failed to create a temporary file";
close($err_fh);
$err_file = untaint_file_path($err_file);
foreach (@cmd) {
# substitute "{}" with the temporary file name to pass to the external software
s/\{\}/$tmp_file/;
$_ = untaint_var($_);
}
$pid = Mail::SpamAssassin::Util::helper_app_pipe_open(*EXTRACT, undef, ">$err_file", @cmd);
$pid or die "$!\n";
# read+split avoids a Perl I/O bug (Bug 5985)
my($inbuf, $nread);
while ($nread = read(EXTRACT, $inbuf, 8192)) { $resp .= $inbuf }
defined $nread or die "error reading from pipe: $!";
$errno = 0;
close EXTRACT or $errno = $!;
if (proc_status_ok($?, $errno)) {
dbg("extracttext: [%s] (%s) finished successfully", $pid, $cmd[0]);
} elsif (proc_status_ok($?, $errno, 0, 1)) { # sometimes it exits with 1
dbg("extracttext: [%s] (%s) finished: %s", $pid, $cmd[0], exit_status_str($?, $errno));
} else {
info("extracttext: [%s] (%s) error: %s", $pid, $cmd[0], exit_status_str($?, $errno));
}
# Save return status for later
$pipe_errno = $?;
});
if (defined(fileno(*EXTRACT))) { # still open
if ($pid) {
if (kill('TERM', $pid)) {
dbg("extracttext: killed stale helper [$pid] ($cmd[0])");
} else {
dbg("extracttext: killing helper application [$pid] ($cmd[0]) failed: $!");
}
}
$errno = 0;
close EXTRACT or $errno = $!;
proc_status_ok($?, $errno)
or info("extracttext: [%s] (%s) error: %s", $pid, $cmd[0], exit_status_str($?, $errno));
}
Mail::SpamAssassin::PerMsgStatus::leave_helper_run_mode($self);
unlink($tmp_file);
# Read first line from STDERR
my $err_resp = -s $err_file ?
do { open(ERRF, $err_file); $_ = <ERRF>; close(ERRF); chomp; $_; } : '';
unlink($err_file);
if ($err_resp ne '') {
dbg("extracttext: [$pid] ($cmd[0]) stderr output: $err_resp");
}
# If the output starts with the command that has been run it's
# probably an error message
if ($pipe_errno) {
if ($err_resp =~ /\b(?:Usage:|No such file or directory)/) {
warn "extracttext: error from $cmd[0], please verify configuration: $err_resp\n";
}
elsif ($err_resp =~ /^Syntax (?:Warning|Error): (?:May not be a PDF file|Couldn't find trailer dictionary)/) {
# Ignore pdftotext
}
elsif ($err_resp =~ /^Error in (?:findFileFormatStream|fopenReadStream): (?:truncated file|file not found)/) {
# Ignore tesseract
}
elsif ($err_resp =~ /^libpng error:/) {
# Ignore tesseract
}
elsif ($err_resp =~ /^Corrupt JPEG data:/) {
# Ignore tesseract
}
elsif ($err_resp =~ /^\S+ is not a Word Document/) {
# Ignore antiword
}
elsif (!$resp) {
warn "extracttext: error (".($pipe_errno/256).") from $cmd[0]: $err_resp\n";
}
return (0, $resp);
}
return (1, $resp);
}
sub _extract_object {
my ($self, $object, $tool) = @_;
my ($ok, $text);
if ($tool->{type} eq 'external') {
($ok, $text) = $self->_extract_external($object, $tool);
} else {
warn "extracttext: bad tool type: $tool->{type}\n";
return 0;
}
return 0 unless $ok;
if ($text =~ /^[\s\r\n]*$/s) {
$text = '';
} else {
# Remove not important html elements
#$text =~ s/(?=<!DOCTYPE)([\s\S]*?)>//g;
#$text =~ s/(?=<!--)([\s\S]*?)-->//g;
}
if ($text eq '') {
dbg('extracttext: No text extracted');
}
$text = untaint_var($text);
utf8::encode($text) if utf8::is_utf8($text);
return (1, $text);
}
sub _get_extension {
my ($self, $object) = @_;
my $fext;
if ($object->{name} && $object->{name} =~ /\.([^.\\\/]+)$/) {
$fext = $1;
}
elsif ($object->{file} && $object->{file} =~ /\.([^.\\\/]+)$/) {
$fext = $1;
}
return $fext ? ($fext) : ();
}
sub _extract {
my ($self, $coll, $part, $type, $name, $data, $tool) = @_;
my $object = {
'data' => $data,
'type' => $type,
'name' => $name
};
my @fexts;
my @types;
my @tools = ($tool->{name});
my ($ok, $text) = $self->_extract_object($object,$tool);
# when url+text, script never returns to this point from _extract_object above
#
return 0 unless $ok;
if ($text ne '' && would_log('dbg','extracttext') > 1) {
dbg("extracttext: text extracted:\n$text");
}
push @{$coll->{text}}, $text;
push @types, $type;
push @fexts, $self->_get_extension($object);
if ($text eq '') {
push @{$coll->{flags}}, 'NoText';
push @{$coll->{text}}, 'NoText';
} else {
if ($text =~ /<a(?:\s+[^>]+)?\s+href="([^">]*)"/) {
push @{$coll->{flags}}, 'ActionURI';
dbg("extracttext: ActionURI: $1");
push @{$coll->{text}}, $text;
}
if ($text =~ /NoText/) {
push @{$coll->{flags}},'NoText';
dbg("extracttext: NoText");
push @{$coll->{text}}, $text;
}
$coll->{chars} += length($text);
# the following is safe (regarding clobbering the @_) since perl v5.11.0
$coll->{words} += split(/\W+/s,$text) - 1;
# $coll->{words} += scalar @{[split(/\W+/s,$text)]} - 1; # old perl hack
dbg("extracttext: rendering text for type $type with $tool->{name}");
$part->set_rendered($text);
}
if (@types) {
push @{$coll->{types}}, join(', ', @types);
}
if (@fexts) {
push @{$coll->{extensions}}, join('_', @fexts);
}
push @{$coll->{tools}}, join('_', @tools);
return 1;
}
#
# check attachment type and match with the right tool
#
sub _check_extract {
my ($self, $coll, $checked, $part, $decoded, $data, $type, $name) = @_;
return 0 unless (defined $type || defined $name);
foreach my $match (@{$self->{match}}) {
next unless $self->{tools}->{$match->{tool}};
next if $checked->{$match->{tool}};
if ($match->{where} eq 'name') {
next unless (defined $name && $name =~ $match->{what});
} elsif ($match->{where} eq 'type') {
next unless (defined $type && $type =~ $match->{what});
} else {
next;
}
$checked->{$match->{tool}} = 1;
# dbg("extracttext: coll: $coll, part: $part, type: $type, name: $name, data: $data, tool: $self->{tools}->{$match->{tool}}");
return 1 if $self->_extract($coll,$part,$type,$name,$data,$self->{tools}->{$match->{tool}});
}
return 0;
}
sub post_message_parse {
my ($self, $opts) = @_;
my $timer = $self->{main}->time_method("extracttext");
my $msg = $opts->{'message'};
$self->{'master_deadline'} = $msg->{'master_deadline'};
my $starttime = time;
my %collect = (
'tools' => [],
'types' => [],
'extensions' => [],
'flags' => [],
'chars' => 0,
'words' => 0,
'text' => [],
);
my $conf = $self->{main}->{conf};
my $maxparts = $conf->{extracttext_maxparts};
my $ttimeout = $conf->{extracttext_timeout_total} ||
$conf->{extracttext_timeout} > 10 ? $conf->{extracttext_timeout} : 10;
my $nparts = 0;
foreach my $part ($msg->find_parts(qr/./, 1)) {
next unless $part->is_leaf;
if ($maxparts > 0 && ++$nparts > $maxparts) {
dbg("extracttext: Skipping MIME parts exceeding the ${maxparts}th");
last;
}
if (time - $starttime >= $ttimeout) {
dbg("extracttext: Skipping MIME parts, total execution timeout exceeded");
last;
}
my (undef,$rtd) = $part->rendered;
next if defined $rtd;
my %checked = ();
my $dat = $part->decode();
my $typ = $part->{type};
my $nam = $part->{name};
my $dec = 1;
next if $self->_check_extract(\%collect,\%checked,$part,\$dec,\$dat,$typ,$nam);
}
return 1 unless @{$collect{tools}};
my @uniq_tools = do { my %seen; grep { !$seen{$_}++ } @{$collect{tools}} };
my @uniq_types = do { my %seen; grep { !$seen{$_}++ } @{$collect{types}} };
my @uniq_ext = do { my %seen; grep { !$seen{$_}++ } @{$collect{extensions}} };
my @uniq_flags = do { my %seen; grep { !$seen{$_}++ } @{$collect{flags}} };
$msg->put_metadata('X-ExtractText-Words', $collect{words});
$msg->put_metadata('X-ExtractText-Chars', $collect{chars});
$msg->put_metadata('X-ExtractText-Tools', join(' ', @uniq_tools));
$msg->put_metadata('X-ExtractText-Types', join(' ', @uniq_types));
$msg->put_metadata('X-ExtractText-Extensions', join(' ', @uniq_ext));
$msg->put_metadata('X-ExtractText-Flags', join(' ', @uniq_flags));
return 1;
}
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $msg = $pms->get_message();
foreach my $tag (('Words','Chars','Tools','Types','Extensions','Flags')) {
my $v = $msg->get_metadata("X-ExtractText-$tag");
if (defined $v) {
$pms->set_tag("ExtractText$tag", $v);
dbg("extracttext: tag: $tag $v");
}
}
return 1;
}
1;

View File

@ -20,7 +20,7 @@ use strict;
use warnings;
use re 'taint';
my $VERSION = 2.003;
my $VERSION = 4.000;
=head1 NAME
@ -48,19 +48,21 @@ freemail_domains domain ...
For example:
freemail_domains hotmail.com hotmail.co.?? yahoo.* yahoo.*.*
freemail_whitelist email/domain ...
freemail_welcomelist email/domain ...
Previously freemail_whitelist which will work interchangeably until 4.1.
Emails or domains listed here are ignored (pretend they aren't
freemail). No wildcards!
freemail_import_whitelist_auth 1/0
freemail_import_welcomelist_auth 1/0
Entries in whitelist_auth will also be used to whitelist emails
Entries in welcomelist_auth will also be used to welcomelist emails
or domains from being freemail. Default is 0.
freemail_import_def_whitelist_auth 1/0
freemail_import_def_welcomelist_auth 1/0
Entries in def_whitelist_auth will also be used to whitelist emails
Entries in def_welcomelist_auth will also be used to welcomelist emails
or domains from being freemail. Default is 0.
header FREEMAIL_REPLYTO eval:check_freemail_replyto(['option'])
@ -96,17 +98,6 @@ header FREEMAIL_BODY eval:check_freemail_body(['regex'])
Searches body for freemail address. With optional regex to match.
=head1 CHANGELOG
1.996 - fix freemail_skip_bulk_envfrom
1.997 - set freemail_skip_when_over_max to 1 by default
1.998 - don't warn about missing freemail_domains when linting
1.999 - default whitelist undisclosed-recipient@yahoo.com etc
2.000 - some cleaning up
2.001 - fix freemail_whitelist
2.002 - _add_desc -> _got_hit, fix description email append bug
2.003 - freemail_import_(def_)whitelist_auth
=cut
use Mail::SpamAssassin::Plugin;
@ -115,8 +106,8 @@ use Mail::SpamAssassin::Util qw(compile_regexp);
our @ISA = qw(Mail::SpamAssassin::Plugin);
# default email whitelist
our $email_whitelist = qr/
# default email welcomelist
our $email_welcomelist = qr/
^(?:
abuse|support|sales|info|helpdesk|contact|kontakt
| (?:post|host|domain)master
@ -142,7 +133,7 @@ our $skip_replyto_envfrom = qr/
)\@
/xi;
sub dbg { Mail::SpamAssassin::Plugin::dbg ("FreeMail: @_"); }
sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("FreeMail: $msg", @_); }
sub new {
my ($class, $mailsa) = @_;
@ -153,10 +144,10 @@ sub new {
$self->{freemail_available} = 1;
$self->set_config($mailsa->{conf});
$self->register_eval_rule("check_freemail_replyto");
$self->register_eval_rule("check_freemail_from");
$self->register_eval_rule("check_freemail_header");
$self->register_eval_rule("check_freemail_body");
$self->register_eval_rule("check_freemail_replyto", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_freemail_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_freemail_header", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_freemail_body", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}
@ -215,13 +206,15 @@ sub set_config {
}
);
push(@cmds, {
setting => 'freemail_import_whitelist_auth',
setting => 'freemail_import_welcomelist_auth',
aliases => ['freemail_import_whitelist_auth'], # removed in 4.1
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
push(@cmds, {
setting => 'freemail_import_def_whitelist_auth',
setting => 'freemail_import_def_welcomelist_auth',
aliases => ['freemail_import_def_whitelist_auth'], # removed in 4.1
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
@ -234,9 +227,9 @@ sub parse_config {
if ($opts->{key} eq "freemail_domains") {
foreach my $temp (split(/\s+/, $opts->{value})) {
if ($temp =~ /^[a-z0-9.*?-]+$/i) {
if ($temp !~ tr/a-zA-Z0-9.*?-//c) {
my $value = lc($temp);
if ($value =~ /[*?]/) { # separate wildcard list
if ($value =~ tr/*?//) { # separate wildcard list
$self->{freemail_temp_wc}{$value} = 1;
}
else {
@ -244,21 +237,21 @@ sub parse_config {
}
}
else {
warn("invalid freemail_domains: $temp");
warn("freemail: invalid freemail_domains: $temp\n");
}
}
$self->inhibit_further_callbacks();
return 1;
}
if ($opts->{key} eq "freemail_whitelist") {
if ($opts->{key} eq "freemail_welcomelist" || $opts->{key} eq "freemail_whitelist") {
foreach my $temp (split(/\s+/, $opts->{value})) {
my $value = lc($temp);
if ($value =~ /\w[.@]\w/) {
$self->{freemail_whitelist}{$value} = 1;
$self->{freemail_welcomelist}{$value} = 1;
}
else {
warn("invalid freemail_whitelist: $temp");
warn("freemail: invalid freemail_welcomelist: $temp\n");
}
}
$self->inhibit_further_callbacks();
@ -292,12 +285,7 @@ sub finish_parsing_end {
dbg("loaded freemail_domains entries: $count normal, $wcount wildcard");
}
else {
if ($self->{main}->{lint_rules} ||1) {
dbg("no freemail_domains entries defined, disabling plugin");
}
else {
warn("no freemail_domains entries defined, disabling plugin");
}
$self->{freemail_available} = 0;
}
@ -315,29 +303,29 @@ sub _is_freemail {
return 0 if $email eq '';
if (defined $self->{freemail_whitelist}{$email}) {
dbg("whitelisted email: $email");
if (defined $self->{freemail_welcomelist}{$email}) {
dbg("welcomelisted email: $email");
return 0;
}
my $domain = $email;
$domain =~ s/.*\@//;
if (defined $self->{freemail_whitelist}{$domain}) {
dbg("whitelisted domain: $domain");
if (defined $self->{freemail_welcomelist}{$domain}) {
dbg("welcomelisted domain: $domain");
return 0;
}
if ($email =~ $email_whitelist) {
dbg("whitelisted email, default: $email");
if ($email =~ $email_welcomelist) {
dbg("welcomelisted email, default: $email");
return 0;
}
foreach my $list ('whitelist_auth','def_whitelist_auth') {
foreach my $list ('welcomelist_auth','def_welcomelist_auth') {
if ($pms->{conf}->{"freemail_import_$list"}) {
foreach my $regexp (values %{$pms->{conf}->{$list}}) {
if ($email =~ /$regexp/o) {
dbg("whitelisted email, $list: $email");
dbg("welcomelisted email, $list: $email");
return 0;
}
}
@ -422,21 +410,13 @@ sub _parse_body {
return 1;
}
sub _got_hit {
my ($self, $pms, $email, $desc) = @_;
sub _test_log {
my ($self, $pms, $email, $rulename) = @_;
my $rulename = $pms->get_current_eval_rule_name();
if (defined $pms->{conf}->{descriptions}->{$rulename}) {
$desc = $pms->{conf}->{descriptions}->{$rulename};
if ($pms->{conf}->{freemail_add_describe_email}) {
$email =~ s/\@/(at)/g;
$pms->test_log($email, $rulename);
}
if ($pms->{main}->{conf}->{freemail_add_describe_email}) {
$email =~ s/\@/[at]/g;
$pms->test_log($email);
}
$pms->got_hit($rulename, "", description => $desc, ruletype => 'eval');
}
sub check_freemail_header {
@ -448,7 +428,7 @@ sub check_freemail_header {
dbg("RULE ($rulename) check_freemail_header".(defined $regex ? " regex:$regex" : ""));
unless (defined $header) {
warn("check_freemail_header needs argument");
warn("freemail: check_freemail_header needs argument\n");
return 0;
}
@ -462,13 +442,13 @@ sub check_freemail_header {
$re = $rec;
}
my @emails = map (lc, $pms->{main}->find_all_addrs_in_line ($pms->get($header)));
my @emails = map (lc, $pms->get("$header:addr"));
if (!scalar (@emails)) {
dbg("header $header not found from mail");
return 0;
}
dbg("addresses from header $header: ".join(';',@emails));
dbg("addresses from header $header: ".join(', ', @emails));
foreach my $email (@emails) {
if ($self->_is_freemail($email, $pms)) {
@ -479,7 +459,7 @@ sub check_freemail_header {
else {
dbg("HIT! $email is freemail");
}
$self->_got_hit($pms, $email, "Header $header is freemail");
$self->_test_log($pms, $email, $rulename);
return 1;
}
}
@ -511,16 +491,16 @@ sub check_freemail_body {
foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
if ($email =~ /$re/o) {
dbg("HIT! email from body is freemail and matches regex: $email");
$self->_got_hit($pms, $email, "Email from body is freemail");
return 0;
$self->_test_log($pms, $email, $rulename);
return 1;
}
}
}
elsif (scalar keys %{$pms->{freemail_cache}{body}}) {
my $emails = join(', ', keys %{$pms->{freemail_cache}{body}});
dbg("HIT! body has freemails: $emails");
$self->_got_hit($pms, $emails, "Body contains freemails");
return 0;
$self->_test_log($pms, $emails, $rulename);
return 1;
}
return 0;
@ -563,8 +543,8 @@ sub check_freemail_from {
else {
dbg("HIT! $email is freemail");
}
$self->_got_hit($pms, $email, "Sender address is freemail");
return 0;
$self->_test_log($pms, $email, $rulename);
return 1;
}
return 0;
@ -580,7 +560,7 @@ sub check_freemail_replyto {
if (defined $what) {
if ($what ne 'replyto' and $what ne 'reply') {
warn("invalid check_freemail_replyto option: $what");
warn("freemail: invalid check_freemail_replyto option: $what\n");
return 0;
}
}
@ -590,25 +570,34 @@ sub check_freemail_replyto {
# Skip mailing-list etc looking requests, mostly FPs from them
if ($pms->{main}->{conf}->{freemail_skip_bulk_envfrom}) {
my $envfrom = lc($pms->get("EnvelopeFrom"));
if ($envfrom =~ $skip_replyto_envfrom) {
my $envfrom = ($pms->get("EnvelopeFrom"))[0];
if (defined $envfrom && $envfrom =~ $skip_replyto_envfrom) {
dbg("envelope sender looks bulk, skipping check: $envfrom");
return 0;
}
}
my $from = lc($pms->get("From:addr"));
my $replyto = lc($pms->get("Reply-To:addr"));
my $from_is_fm = $self->_is_freemail($from, $pms);
my $replyto_is_fm = $self->_is_freemail($replyto, $pms);
my @from_addrs = map (lc, $pms->get("From:addr"));
dbg("From address: ".join(", ", @from_addrs)) if @from_addrs;
dbg("From address: $from") if $from ne '';
dbg("Reply-To address: $replyto") if $replyto ne '';
my @replyto_addrs = map (lc, $pms->get("Reply-To:addr"));
dbg("Reply-To address: ".join(", ", @replyto_addrs)) if @replyto_addrs;
if ($from_is_fm and $replyto_is_fm and ($from ne $replyto)) {
my $from_is_fm = grep { $self->_is_freemail($_, $pms) } @from_addrs;
my $replyto_is_fm = grep { $self->_is_freemail($_, $pms) } @replyto_addrs;
my $from_not_in_replyto = 1;
foreach my $from (@from_addrs) {
next unless grep { $_ eq $from } @replyto_addrs;
$from_not_in_replyto = 0;
}
if ($from_is_fm and $replyto_is_fm and $from_not_in_replyto) {
dbg("HIT! From and Reply-To are different freemails");
$self->_got_hit($pms, "$from, $replyto", "From and Reply-To are different freemails");
return 0;
my $from = join(",", @from_addrs);
my $replyto = join(",", @replyto_addrs);
$self->_test_log($pms, "$from -> $replyto", $rulename);
return 1;
}
if ($what eq 'replyto') {
@ -618,7 +607,7 @@ sub check_freemail_replyto {
}
}
elsif ($what eq 'reply') {
if ($replyto ne '' and !$replyto_is_fm) {
if (@replyto_addrs and !$replyto_is_fm) {
dbg("Reply-To defined and is not freemail, skipping check");
return 0;
}
@ -627,19 +616,21 @@ sub check_freemail_replyto {
return 0;
}
}
my $reply = $replyto_is_fm ? $replyto : $from;
return 0 unless $self->_parse_body($pms);
# Compare body to headers
if (scalar keys %{$pms->{freemail_cache}{body}}) {
my $check = $what eq 'replyto' ? $replyto : $reply;
dbg("comparing $check to body freemails");
foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
if ($email ne $check) {
dbg("HIT! $check and $email are different freemails");
$self->_got_hit($pms, "$check, $email", "Different freemails in reply header and body");
return 0;
my $reply_addrs = $what eq 'replyto' ? \@replyto_addrs :
$replyto_is_fm ? \@replyto_addrs : \@from_addrs;
dbg("comparing to body freemails: ".join(", ", @$reply_addrs));
foreach my $body_email (keys %{$pms->{freemail_cache}{body}}) {
foreach my $reply_email (@$reply_addrs) {
if ($body_email ne $reply_email) {
dbg("HIT! $reply_email (Reply) and $body_email (Body) are different freemails");
$self->_test_log($pms, "$reply_email, $body_email", $rulename);
return 1;
}
}
}
}

View File

@ -17,29 +17,33 @@
=head1 NAME
FromNameSpoof - perform various tests to detect spoof attempts using the From header name section
FromNameSpoof - perform various tests to detect spoof attempts using the
From header name section
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
# Does the From:name look like it contains an email address
# From:name and From:addr do not match, matching depends on C<fns_check> setting
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
# From:name and From:addr do not match (same as above rule and C<fns_check 0>)
header __PLUGIN_FROMNAME_DIFFERENT eval:check_fromname_different()
# From:name and From:addr domains differ
header __PLUGIN_FROMNAME_DOMAIN_DIFFER eval:check_fromname_domain_differ()
# From:name looks like it contains an email address (not same as From:addr)
header __PLUGIN_FROMNAME_EMAIL eval:check_fromname_contains_email()
# Is the From:name different to the From:addr header
header __PLUGIN_FROMNAME_DIFFERENT eval:check_fromname_different()
# From:name matches any To:addr
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
# From:name and From:addr owners differ
header __PLUGIN_FROMNAME_OWNERS_DIFFER eval:check_fromname_owners_differ()
# From:name domain differs to from header
header __PLUGIN_FROMNAME_DOMAIN_DIFFER eval:check_fromname_domain_differ()
# From:name and From:address don't match and owners differ
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
# From:name address matches To:address
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
# From:name matches Reply-To:addr
header __PLUGIN_FROMNAME_EQUALS_REPLYTO eval:check_fromname_equals_replyto()
=head1 DESCRIPTION
@ -61,17 +65,21 @@ B<Note> that FromNameSpoof detects the "owner" of a domain by the following sear
<owner>.<tld>
By default FromNameSpoof will ignore the TLD when testing if From:addr is spoofed.
Default 1
By default FromNameSpoof will ignore the TLD when comparing addresses:
fns_check 1
Check levels:
0 - Strict checking of From:name != From:addr
1 - Allow for different tlds
1 - Allow for different TLDs
2 - Allow for different aliases but same domain
"Owner" info can also be mapped as aliases with C<fns_add_addrlist>. For
example, to consider "googlemail.com" as "gmail":
fns_add_addrlist (gmail) *@googlemail.com
=head1 TAGS
The following tags are added to the set if a spoof is detected. They are available for
@ -93,48 +101,35 @@ use in reports, header fields, other plugins, etc.:
Actual From:addr domain
_FNSFADDROWNER_
Actual From:addr detected owner
Actual From:addr owner
=head1 EXAMPLE
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
meta FROMNAME_SPOOF_EQUALS_TO (__PLUGIN_FROMNAME_SPOOF && __PLUGIN_FROMNAME_EQUALS_TO)
describe FROMNAME_SPOOF_EQUALS_TO From:name is spoof to look like To: address
score FROMNAME_SPOOF_EQUALS_TO 1.2
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
meta FROMNAME_SPOOF_EQUALS_TO (__PLUGIN_FROMNAME_SPOOF && __PLUGIN_FROMNAME_EQUALS_TO)
describe FROMNAME_SPOOF_EQUALS_TO From:name is spoof to look like To: address
score FROMNAME_SPOOF_EQUALS_TO 1.2
=cut
use strict;
package Mail::SpamAssassin::Plugin::FromNameSpoof;
my $VERSION = 0.9;
use strict;
use warnings;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use List::Util ();
use Mail::SpamAssassin::Util;
use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);
sub dbg { Mail::SpamAssassin::Plugin::dbg ("FromNameSpoof: @_"); }
my $VERSION = 1.0;
sub uri_to_domain {
my ($self, $domain) = @_;
return unless defined $domain;
if ($Mail::SpamAssassin::VERSION <= 3.004000) {
Mail::SpamAssassin::Util::uri_to_domain($domain);
} else {
$self->{main}->{registryboundaries}->uri_to_domain($domain);
}
}
sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("FromNameSpoof: $msg", @_); }
# constructor: register the eval rule
sub new
{
sub new {
my $class = shift;
my $mailsaobject = shift;
@ -146,13 +141,13 @@ sub new
$self->set_config($mailsaobject->{conf});
# the important bit!
$self->register_eval_rule("check_fromname_spoof");
$self->register_eval_rule("check_fromname_different");
$self->register_eval_rule("check_fromname_domain_differ");
$self->register_eval_rule("check_fromname_contains_email");
$self->register_eval_rule("check_fromname_equals_to");
$self->register_eval_rule("check_fromname_owners_differ");
$self->register_eval_rule("check_fromname_equals_replyto");
$self->register_eval_rule("check_fromname_spoof", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_different", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_domain_differ", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_contains_email", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_equals_to", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_owners_differ", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_fromname_equals_replyto", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}
@ -164,14 +159,13 @@ sub set_config {
setting => 'fns_add_addrlist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my($self, $key, $value, $line) = @_;
my ($self, $key, $value, $line) = @_;
local($1,$2);
if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
if ($value !~ /^ \( (.+?) \) \s+ (.+) \z/sx) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my $listname = "FNS_$1";
$value = $2;
$self->{parser}->add_to_addrlist ($listname, split(/\s+/, lc($value)));
my $listname = "FNS_".lc($1);
$self->{parser}->add_to_addrlist($listname, split(/\s+/, lc $2));
$self->{fns_addrlists}{$listname} = 1;
}
});
@ -180,14 +174,13 @@ sub set_config {
setting => 'fns_remove_addrlist',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my($self, $key, $value, $line) = @_;
my ($self, $key, $value, $line) = @_;
local($1,$2);
if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
if ($value !~ /^ \( (.+?) \) \s+ (.+) \z/sx) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
my $listname = "FNS_$1";
$value = $2;
$self->{parser}->remove_from_addrlist ($listname, split (/\s+/, $value));
my $listname = "FNS_".lc($1);
$self->{parser}->remove_from_addrlist($listname, split (/\s+/, lc $2));
}
});
@ -206,7 +199,7 @@ sub set_config {
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
$self->{fns_ignore_dkim}->{$_} = 1 foreach (split(/\s+/, lc($value)));
$self->{fns_ignore_dkim}->{$_} = 1 foreach (split(/\s+/, lc $value));
}
});
@ -227,6 +220,16 @@ sub set_config {
setting => 'fns_check',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
if ($value !~ /^[012]$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{fns_check} = $value;
}
});
$conf->{parser}->register_commands(\@cmds);
@ -235,203 +238,287 @@ sub set_config {
sub parsed_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
$pms->action_depends_on_tags('DKIMDOMAIN',
sub { my($pms,@args) = @_;
$self->_check_fromnamespoof($pms);
# If fns_ignore_dkim used, force wait for DKIM results
if (%{$pms->{conf}->{fns_ignore_dkim}}) {
if ($self->{main}->{local_tests_only}) {
dbg("local tests only, ignoring fns_ignore_dkim setting");
}
# Check that DKIM module is loaded (a bit kludgy check)
elsif (exists $pms->{conf}->{dkim_timeout}) {
# Initialize async queue, any eval calls will queue their checks
$pms->{fromname_async_queue} = [];
# Process and finish queue as soon as DKIM is ready
$pms->action_depends_on_tags('DKIMDOMAIN', sub {
$self->_check_async_queue($pms);
});
} else {
dbg("DKIM plugin not loaded, ignoring fns_ignore_dkim setting");
}
}
);
1;
}
sub check_fromname_different
{
my ($self, $pms) = @_;
sub _check_eval {
my ($self, $pms, $result) = @_;
if (exists $pms->{fromname_async_queue}) {
my $rulename = $pms->get_current_eval_rule_name();
push @{$pms->{fromname_async_queue}}, sub {
if ($result->()) {
$pms->got_hit($rulename, '', ruletype => 'header');
} else {
$pms->rule_ready($rulename);
}
};
return; # return undef for async status
}
$self->_check_fromnamespoof($pms);
return $pms->{fromname_address_different};
# make sure not to return undef, as this is not async anymore
return $result->() || 0;
}
sub check_fromname_domain_differ
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_domain_different};
}
sub check_fromname_spoof
{
sub check_fromname_spoof {
my ($self, $pms, $check_lvl) = @_;
$self->_check_fromnamespoof($pms);
if ( not defined $check_lvl ) {
# Some deprecated eval parameter, was not documented?
if (!defined $check_lvl || $check_lvl !~ /^[012]$/) {
$check_lvl = $pms->{conf}->{fns_check};
}
my $result = sub {
my @array = (
($pms->{fromname_address_different}) ,
($pms->{fromname_address_different} && $pms->{fromname_owner_different}) ,
($pms->{fromname_address_different}),
($pms->{fromname_address_different} && $pms->{fromname_owner_different}),
($pms->{fromname_address_different} && $pms->{fromname_domain_different})
);
$array[$check_lvl];
};
return $array[$check_lvl];
return $self->_check_eval($pms, $result);
}
sub check_fromname_contains_email
{
sub check_fromname_different {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_address_different};
};
return $self->_check_eval($pms, $result);
}
sub check_fromname_domain_differ {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_domain_different};
};
return $self->_check_eval($pms, $result);
}
sub check_fromname_contains_email {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_contains_email};
};
return $self->_check_eval($pms, $result);
}
sub check_fromname_equals_to {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_equals_to_addr};
};
return $self->_check_eval($pms, $result);
}
sub check_fromname_owners_differ {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_owner_different};
};
return $self->_check_eval($pms, $result);
}
sub check_fromname_equals_replyto {
my ($self, $pms) = @_;
my $result = sub {
$pms->{fromname_equals_replyto};
};
return $self->_check_eval($pms, $result);
}
sub check_cleanup {
my ($self, $opts) = @_;
$self->_check_async_queue($opts->{permsgstatus});
}
# Shall only be called when DKIMDOMAIN is ready, or from check_cleanup() to
# make sure _check_fromnamespoof is called if DKIMDOMAIN was never set
sub _check_async_queue {
my ($self, $pms) = @_;
if (exists $pms->{fromname_async_queue}) {
$self->_check_fromnamespoof($pms);
return $pms->{fromname_contains_email};
$_->() foreach (@{$pms->{fromname_async_queue}});
# No more async queueing needed. If any evals are called later, they
# will act on the results directly.
delete $pms->{fromname_async_queue};
}
}
sub check_fromname_equals_replyto
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_equals_replyto};
}
sub check_fromname_equals_to
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_equals_to_addr};
}
sub check_fromname_owners_differ
{
my ($self, $pms) = @_;
$self->_check_fromnamespoof($pms);
return $pms->{fromname_owner_different};
}
sub _check_fromnamespoof
{
sub _check_fromnamespoof {
my ($self, $pms) = @_;
return if (defined $pms->{fromname_contains_email});
return if $pms->{fromname_checked};
$pms->{fromname_checked} = 1;
my $conf = $pms->{conf};
$pms->{fromname_contains_email} = 0;
$pms->{fromname_address_different} = 0;
$pms->{fromname_equals_to_addr} = 0;
$pms->{fromname_domain_different} = 0;
$pms->{fromname_owner_different} = 0;
$pms->{fromname_equals_replyto} = 0;
foreach my $addr (split / /, $pms->get_tag('DKIMDOMAIN') || '') {
if ($conf->{fns_ignore_dkim}->{lc($addr)}) {
foreach my $addr (split(/\s+/, $pms->get_tag('DKIMDOMAIN')||'')) {
if ($conf->{fns_ignore_dkim}->{lc $addr}) {
dbg("ignoring, DKIM signed: $addr");
return 0;
return;
}
}
foreach my $iheader (keys %{$conf->{fns_ignore_header}}) {
if ($pms->get($iheader)) {
dbg("ignoring, header $iheader found");
return 0 if ($pms->get($iheader));
return;
}
}
# Parse From addr
my $from_addr = lc $pms->get('From:addr');
my $from_domain = $self->{main}->{registryboundaries}->uri_to_domain("mailto:$from_addr");
return unless defined $from_domain;
# Parse From name
my $fromname = lc $pms->get('From:name');
# Very common to have From address cloned into name, ignore?
#if ($fromname eq $from_addr) {
# dbg("ignoring, From-name is exactly same as From addr: $fromname");
# return;
#}
my ($fromname_addr, $fromname_domain);
if ($fromname =~ /\b([\w\.\!\#\$\%\&\'\*\+\/\=\?\^\_\`\{\|\}\~-]+\@\w[\w-]*\.\w[\w.-]++)\b/i) {
$fromname_addr = $1;
$fromname_domain = $self->{main}->{registryboundaries}->uri_to_domain("mailto:$fromname_addr");
# No valid domain/TLD found? Any reason to keep testing a possibly obfuscated one?
if (!defined $fromname_domain) {
dbg("no From-name addr found");
return;
}
$pms->{fromname_contains_email} = 1; # check_fromname_contains_email hit
# Calculate "closeness" (this really needs documentation, as it's hard to understand)
my $nochar = ($fromname =~ y/a-z0-9//c);
$nochar -= ($fromname_addr =~ y/a-z0-9//c);
my $len = length($fromname) + $nochar - length($fromname_addr);
unless ($len <= $conf->{fns_extrachars}) {
dbg("not enough closeness for From-name/addr: $fromname <=> $fromname_addr ($len <= $conf->{fns_extrachars})");
return;
}
} else {
# No point continuing if email was not found inside name
dbg("no From-name addr found");
return;
}
# Parse owners
my $list_refs = {};
if ($conf->{fns_addrlists}) {
my @lists = keys %{$conf->{fns_addrlists}};
foreach my $list (@lists) {
$list_refs->{$list} = $conf->{$list};
}
s/^FNS_// foreach (@lists);
dbg("using addrlists: ".join(', ', @lists));
dbg("using addrlists for owner aliases: ".join(', ', map { s/^FNS_//r; } @lists));
}
my $fromname_owner = $self->_find_address_owner($fromname_addr, $fromname_domain, $list_refs);
my $from_owner = $self->_find_address_owner($from_addr, $from_domain, $list_refs);
my %fnd = ();
my %fad = ();
my %tod = ();
dbg("Parsed From-name addr/domain/owner: $fromname_addr/$fromname_domain/$fromname_owner");
dbg("Parsed From-addr addr/domain/owner: $from_addr/$from_domain/$from_owner");
$fnd{'addr'} = $pms->get("From:name");
if ($fnd{'addr'} =~ /\b((?>[\w\.\!\#\$\%\&\'\*\+\/\=\?\^\_\`\{\|\}\~\-]+@[\w\-\.]+\.[\w\-\.]+))\b/i) {
my $nochar = ($fnd{'addr'} =~ y/A-Za-z0-9//c);
$nochar -= ($1 =~ y/A-Za-z0-9//c);
return 0 unless ((length($fnd{'addr'})+$nochar) - length($1) <= $conf->{'fns_extrachars'});
$fnd{'addr'} = lc $1;
} else {
return 0;
if ($fromname_addr ne $from_addr) {
dbg("From-name addr differs from From addr: $fromname_addr != $from_addr");
$pms->{fromname_address_different} = 1;
}
my $replyto = lc $pms->get("Reply-To:addr");
$fad{'addr'} = lc $pms->get("From:addr");
my @toaddrs = $pms->all_to_addrs();
return 0 unless @toaddrs;
$tod{'addr'} = lc $toaddrs[0];
$fnd{'domain'} = $self->uri_to_domain($fnd{'addr'});
$fad{'domain'} = $self->uri_to_domain($fad{'addr'});
$tod{'domain'} = $self->uri_to_domain($tod{'addr'});
return 0 unless (defined $fnd{'domain'} && defined $fad{'domain'});
$pms->{fromname_contains_email} = 1;
$fnd{'owner'} = $self->_find_address_owner($fnd{'addr'}, $list_refs);
$fad{'owner'} = $self->_find_address_owner($fad{'addr'}, $list_refs);
$tod{'owner'} = $self->_find_address_owner($tod{'addr'}, $list_refs);
$pms->{fromname_address_different} = 1 if ($fnd{'addr'} ne $fad{'addr'});
$pms->{fromname_domain_different} = 1 if ($fnd{'domain'} ne $fad{'domain'});
$pms->{fromname_equals_to_addr} = 1 if ($fnd{'addr'} eq $tod{addr});
$pms->{fromname_equals_replyto} = 1 if ($fnd{'addr'} eq $replyto);
if ($fnd{'owner'} ne $fad{'owner'}) {
if ($fromname_domain ne $from_domain) {
dbg("From-name domain differs from From domain: $fromname_domain != $from_domain");
$pms->{fromname_domain_different} = 1;
}
if ($fromname_owner ne $from_owner) {
dbg("From-name owner differs from From owner: $fromname_owner != $from_owner");
$pms->{fromname_owner_different} = 1;
}
if ($pms->{fromname_address_different}) {
$pms->set_tag("FNSFNAMEADDR", $fnd{'addr'});
$pms->set_tag("FNSFADDRADDR", $fad{'addr'});
$pms->set_tag("FNSFNAMEOWNER", $fnd{'owner'});
$pms->set_tag("FNSFADDROWNER", $fad{'owner'});
$pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
$pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
# Check Reply-To related
my $replyto_addr = lc $pms->get('Reply-To:addr');
if ($fromname_addr eq $replyto_addr) {
dbg("From-name addr is same as Reply-To addr: $fromname_addr");
$pms->{fromname_equals_replyto} = 1;
}
dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
# Check To related
foreach my $to_addr ($pms->all_to_addrs()) {
if ($fromname_addr eq $to_addr) {
dbg("From-name addr is same as To addr: $fromname_addr");
$pms->{fromname_equals_to_addr} = 1;
last;
}
}
# Set tags
if ($pms->{fromname_address_different} || $pms->{fromname_owner_different}) {
$pms->set_tag("FNSFNAMEADDR", $fromname_addr);
$pms->set_tag("FNSFNAMEDOMAIN", $fromname_domain);
$pms->set_tag("FNSFNAMEOWNER", $fromname_owner);
$pms->set_tag("FNSFADDRADDR", $from_addr);
$pms->set_tag("FNSFADDRDOMAIN", $from_domain);
$pms->set_tag("FNSFADDROWNER", $from_owner);
}
}
sub _find_address_owner
{
my ($self, $check, $list_refs) = @_;
sub _find_address_owner {
my ($self, $addr, $addr_domain, $list_refs) = @_;
# Check fns addrlist first for user defined mapping
foreach my $owner (keys %{$list_refs}) {
foreach my $white_addr (keys %{$list_refs->{$owner}}) {
my $regexp = qr/$list_refs->{$owner}{$white_addr}/i;
if ($check =~ /$regexp/) {
$owner =~ s/^FNS_//i;
foreach my $listaddr (keys %{$list_refs->{$owner}}) {
if ($addr =~ $list_refs->{$owner}{$listaddr}) {
$owner =~ s/^FNS_//;
return lc $owner;
}
}
}
my $owner = $self->uri_to_domain($check);
$check =~ /^([^\@]+)\@(.*)$/;
if ($owner ne $2) {
return $self->_find_address_owner("$1\@$owner", $list_refs);
# If we have subdomain addr foo.bar@sub.domain.com,
# this will try to recheck foo.bar@domain.com from addrlist
local($1,$2);
if ($addr =~ /^([^\@]+)\@(.+)$/) {
if ($2 ne $addr_domain) {
return $self->_find_address_owner("$1\@$addr_domain", $addr_domain, $list_refs);
}
}
$owner =~ /^([^\.]+)\./;
return lc $1;
# Grab the first component of TLD
if ($addr_domain =~ /^([^.]+)\./) {
return $1;
} else {
return $addr_domain;
}
}
1;

View File

@ -39,18 +39,18 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("html_tag_balance");
$self->register_eval_rule("html_image_only");
$self->register_eval_rule("html_image_ratio");
$self->register_eval_rule("html_charset_faraway");
$self->register_eval_rule("html_tag_exists");
$self->register_eval_rule("html_test");
$self->register_eval_rule("html_eval");
$self->register_eval_rule("html_text_match");
$self->register_eval_rule("html_title_subject_ratio");
$self->register_eval_rule("html_text_not_match");
$self->register_eval_rule("html_range");
$self->register_eval_rule("check_iframe_src");
$self->register_eval_rule("html_tag_balance", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_image_only", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_image_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_charset_faraway", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_tag_exists", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_test", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_eval", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_text_match", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_title_subject_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_text_not_match", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("html_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_iframe_src", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -61,65 +61,88 @@ sub html_tag_balance {
return 0 if $rawtag !~ /^([a-zA-Z0-9]+)$/;
my $tag = $1;
return 0 unless exists $pms->{html}{inside}{$tag};
return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $expr = untaint_var($1);
$pms->{html}{inside}{$tag} =~ /^([\<\>\=\!\-\+ 0-9]+)$/;
foreach my $html (@{$pms->{html_all}}) {
next unless exists $html->{inside}{$tag};
$html->{inside}{$tag} =~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $val = untaint_var($1);
return 1 if eval "\$val $expr";
}
return eval "\$val $expr";
return 0;
}
sub html_image_only {
my ($self, $pms, undef, $min, $max) = @_;
return (exists $pms->{html}{inside}{img} &&
exists $pms->{html}{length} &&
$pms->{html}{length} > $min &&
$pms->{html}{length} <= $max);
foreach my $html (@{$pms->{html_all}}) {
if (exists $html->{inside}{img} && exists $html->{length} &&
$html->{length} > $min && $html->{length} <= $max)
{
return 1;
}
}
return 0;
}
sub html_image_ratio {
my ($self, $pms, undef, $min, $max) = @_;
return 0 unless (exists $pms->{html}{non_space_len} &&
exists $pms->{html}{image_area} &&
$pms->{html}{image_area} > 0);
my $ratio = $pms->{html}{non_space_len} / $pms->{html}{image_area};
return ($ratio > $min && $ratio <= $max);
foreach my $html (@{$pms->{html_all}}) {
next unless (exists $html->{non_space_len} &&
exists $html->{image_area} &&
$html->{image_area} > 0);
my $ratio = $html->{non_space_len} / $html->{image_area};
return 1 if $ratio > $min && $ratio <= $max;
}
return 0;
}
sub html_charset_faraway {
my ($self, $pms) = @_;
return 0 unless exists $pms->{html}{charsets};
my @locales = Mail::SpamAssassin::Util::get_my_locales($pms->{conf}->{ok_locales});
return 0 if grep { $_ eq "all" } @locales;
foreach my $html (@{$pms->{html_all}}) {
next unless exists $html->{charsets};
my $okay = 0;
my $bad = 0;
for my $c (split(' ', $pms->{html}{charsets})) {
foreach my $c (split(/\s+/, $html->{charsets})) {
if (Mail::SpamAssassin::Locales::is_charset_ok_for_locales($c, @locales)) {
$okay++;
}
else {
} else {
$bad++;
}
}
return ($bad && ($bad >= $okay));
return 1 if $bad && $bad >= $okay;
}
return 0;
}
sub html_tag_exists {
my ($self, $pms, undef, $tag) = @_;
return exists $pms->{html}{inside}{$tag};
foreach my $html (@{$pms->{html_all}}) {
return 1 if exists $html->{inside}{$tag};
}
return 0;
}
sub html_test {
my ($self, $pms, undef, $test) = @_;
return $pms->{html}{$test};
foreach my $html (@{$pms->{html_all}}) {
return 1 if $html->{$test};
}
return 0;
}
sub html_eval {
@ -128,29 +151,38 @@ sub html_eval {
return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
my $expr = untaint_var($1);
foreach my $html (@{$pms->{html_all}}) {
# workaround bug 3320: weird perl bug where additional, very explicit
# untainting into a new var is required.
my $tainted = $pms->{html}{$test};
return 0 unless defined($tainted);
my $tainted = $html->{$test};
next unless defined($tainted);
my $val = $tainted;
# just use the value in $val, don't copy it needlessly
return eval "\$val $expr";
return 1 if eval "\$val $expr";
}
return 0;
}
sub html_text_match {
my ($self, $pms, undef, $text, $regexp) = @_;
my ($rec, $err) = compile_regexp($regexp, 0);
if (!$rec) {
warn "htmleval: html_text_match invalid regexp '$regexp': $err";
return 0;
}
foreach my $string (@{$pms->{html}{$text}}) {
foreach my $html (@{$pms->{html_all}}) {
next unless ref($html->{$text}) eq 'ARRAY';
foreach my $string (@{$html->{$text}}) {
next unless defined $string;
if ($string =~ $rec) {
return 1;
}
}
}
return 0;
}
@ -161,54 +193,74 @@ sub html_title_subject_ratio {
if ($subject eq '') {
return 0;
}
foreach my $html (@{$pms->{html_all}}) {
my $max = 0;
for my $string (@{ $pms->{html}{title} }) {
foreach my $string (@{$html->{title}}) {
if ($string) {
my $ratio = length($string) / length($subject);
$max = $ratio if $ratio > $max;
my $ratio_s = length($string) / length($subject);
$max = $ratio_s if $ratio_s > $max;
}
}
return $max > $ratio;
return 1 if $max > $ratio;
}
return 0;
}
sub html_text_not_match {
my ($self, $pms, undef, $text, $regexp) = @_;
for my $string (@{ $pms->{html}{$text} }) {
if (defined $string && $string !~ /${regexp}/) {
my ($rec, $err) = compile_regexp($regexp, 0);
if (!$rec) {
warn "htmleval: html_text_not_match invalid regexp '$regexp': $err";
return 0;
}
foreach my $html (@{$pms->{html_all}}) {
next unless ref($html->{$text}) eq 'ARRAY';
foreach my $string (@{$html->{$text}}) {
if (defined $string && $string !~ $rec) {
return 1;
}
}
}
return 0;
}
sub html_range {
my ($self, $pms, undef, $test, $min, $max) = @_;
return 0 unless exists $pms->{html}{$test};
$test = $pms->{html}{$test};
foreach my $html (@{$pms->{html_all}}) {
next unless defined $html->{$test};
my $value = $html->{$test};
# not all perls understand what "inf" means, so we need to do
# non-numeric tests! urg!
if (!defined $max || $max eq "inf") {
return ($test eq "inf") ? 1 : ($test > $min);
return 1 if $value > $min;
}
elsif ($test eq "inf") {
# $max < inf, so $test == inf means $test > $max
return 0;
elsif ($value eq "inf") {
# $max < inf, so $value == inf means $value > $max
next;
}
else {
# if we get here everything should be a number
return ($test > $min && $test <= $max);
return 1 if $value > $min && $value <= $max;
}
}
return 0;
}
sub check_iframe_src {
my ($self, $pms) = @_;
foreach my $v ( values %{$pms->{html}->{uri_detail}} ) {
foreach my $html (@{$pms->{html_all}}) {
foreach my $v (values %{$html->{uri_detail}}) {
return 1 if $v->{types}->{iframe};
}
}
return 0;
}

View File

@ -38,7 +38,7 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule ("check_https_http_mismatch");
$self->register_eval_rule ("check_https_http_mismatch", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -47,64 +47,62 @@ sub new {
# ("<" and ">" replaced with "[lt]" and "[gt]" to avoid Kaspersky Desktop AV
# false positive ;)
sub check_https_http_mismatch {
my ($self, $permsgstatus, undef, $minanchors, $maxanchors) = @_;
my $IP_ADDRESS = IP_ADDRESS;
my ($self, $pms, undef, $minanchors, $maxanchors) = @_;
$minanchors ||= 1;
if (!exists $permsgstatus->{chhm_hit}) {
$permsgstatus->{chhm_hit} = 0;
$permsgstatus->{chhm_anchors} = 0;
foreach my $html (@{$pms->{html_all}}) {
my $hit = 0;
my $anchors = 0;
foreach my $k (keys %{$html->{uri_detail}}) {
my $v = $html->{uri_detail}->{$k};
foreach my $k ( keys %{$permsgstatus->{html}->{uri_detail}} ) {
my %uri_detail = %{$permsgstatus->{html}->{uri_detail}};
my $v = ${uri_detail}{$k};
# if the URI wasn't used for an anchor tag, or the anchor text didn't
# exist, skip this.
next unless (exists $v->{anchor_text} && @{$v->{anchor_text}});
next unless exists $v->{anchor_text} && @{$v->{anchor_text}};
my $uri;
if ($k =~ m@^https?://([^/:]+)@i) {
if ($k =~ m@^https?://([^/:?#]+)@i) {
$uri = $1;
# Skip IPs since there's another rule to catch that already
if ($uri =~ /^$IP_ADDRESS+$/) {
undef $uri;
if ($uri =~ IS_IP_ADDRESS) {
$uri = undef;
next;
}
# want to compare whole hostnames instead of domains?
# comment this next section to the blank line.
$uri = $self->{main}->{registryboundaries}->trim_domain($uri);
undef $uri unless ($self->{main}->{registryboundaries}->is_domain_valid($uri));
my $domain = $self->{main}->{registryboundaries}->uri_to_domain($uri);
$uri = undef unless $self->{main}->{registryboundaries}->is_domain_valid($domain);
}
next unless $uri;
$permsgstatus->{chhm_anchors}++ if exists $v->{anchor_text};
$anchors++ if exists $v->{anchor_text};
foreach (@{$v->{anchor_text}}) {
if (m@https://([^/:]+)@i) {
if (m@https://([^\s/:?#]+)@i) {
my $https = $1;
# want to compare whole hostnames instead of domains?
# comment this next section to the blank line.
if ($https !~ /^$IP_ADDRESS+$/) {
if ($https !~ IS_IP_ADDRESS) {
$https = $self->{main}->{registryboundaries}->trim_domain($https);
undef $https unless ($self->{main}->{registryboundaries}->is_domain_valid($https));
$https = undef unless $self->{main}->{registryboundaries}->is_domain_valid($https);
}
next unless $https;
dbg("https_http_mismatch: domains $uri -> $https");
next if $uri eq $https;
$permsgstatus->{chhm_hit} = 1;
$hit = 1;
last;
}
}
}
dbg("https_http_mismatch: anchors ".$permsgstatus->{chhm_anchors});
dbg("https_http_mismatch: anchors $anchors");
return 1 if $hit && $anchors >= $minanchors &&
(!defined $maxanchors || $anchors < $maxanchors);
}
return ( $permsgstatus->{chhm_hit} && $permsgstatus->{chhm_anchors} >= $minanchors && (defined $maxanchors && $permsgstatus->{chhm_anchors} < $maxanchors) );
return 0;
}
1;

File diff suppressed because it is too large Load Diff

View File

@ -1,352 +0,0 @@
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>
=head1 NAME
Mail::SpamAssassin::Plugin::Hashcash - perform hashcash verification tests
=head1 SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::Hashcash
=head1 DESCRIPTION
Hashcash is a payment system for email where CPU cycles used as the
basis for an e-cash system. This plugin makes it possible to use valid
hashcash tokens added by mail programs as a bonus for messages.
=cut
=head1 USER SETTINGS
=over 4
=item use_hashcash { 1 | 0 } (default: 1)
Whether to use hashcash, if it is available.
=cut
=item hashcash_accept user@example.com ...
Used to specify addresses that we accept HashCash tokens for. You should set
it to match all the addresses that you may receive mail at.
Like whitelist and blacklist entries, the addresses are file-glob-style
patterns, so C<friend@somewhere.com>, C<*@isp.com>, or C<*.domain.net> will all
work. Specifically, C<*> and C<?> are allowed, but all other metacharacters
are not. Regular expressions are not used for security reasons.
The sequence C<%u> is replaced with the current user's username, which
is useful for ISPs or multi-user domains.
Multiple addresses per line, separated by spaces, is OK. Multiple
C<hashcash_accept> lines is also OK.
=cut
=item hashcash_doublespend_path /path/to/file (default: ~/.spamassassin/hashcash_seen)
Path for HashCash double-spend database. HashCash tokens are only usable once,
so their use is tracked in this database to avoid providing a loophole.
By default, each user has their own, in their C<~/.spamassassin> directory with
mode 0700/0600. Note that once a token is 'spent' it is written to this file,
and double-spending of a hashcash token makes it invalid, so this is not
suitable for sharing between multiple users.
=cut
=item hashcash_doublespend_file_mode (default: 0700)
The file mode bits used for the HashCash double-spend database file.
Make sure you specify this using the 'x' mode bits set, as it may also be used
to create directories. However, if a file is created, the resulting file will
not have any execute bits set (the umask is set to 111).
=cut
package Mail::SpamAssassin::Plugin::Hashcash;
use strict;
use warnings;
# use bytes;
use re 'taint';
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
use Errno qw(ENOENT EACCES);
use Fcntl;
use File::Path;
use File::Basename;
BEGIN {
eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
}
our @ISA = qw(Mail::SpamAssassin::Plugin);
use constant HAS_DB_FILE => eval { require DB_File; };
# constructor: register the eval rule
sub new {
my $class = shift;
my $mailsaobject = shift;
# some boilerplate...
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("check_hashcash_value");
$self->register_eval_rule ("check_hashcash_double_spend");
$self->set_config($mailsaobject->{conf});
return $self;
}
###########################################################################
sub set_config {
my($self, $conf) = @_;
my @cmds;
push(@cmds, {
setting => 'use_hashcash',
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'hashcash_doublespend_path',
default => '__userstate__/hashcash_seen',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
});
push(@cmds, {
setting => 'hashcash_doublespend_file_mode',
default => "0700",
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
push(@cmds, {
setting => 'hashcash_accept',
default => {},
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
});
$conf->{parser}->register_commands(\@cmds);
}
###########################################################################
sub check_hashcash_value {
my ($self, $scanner, $valmin, $valmax) = @_;
my $val = $self->_run_hashcash($scanner);
return ($val >= $valmin && $val < $valmax);
}
sub check_hashcash_double_spend {
my ($self, $scanner) = @_;
$self->_run_hashcash($scanner);
return ($scanner->{hashcash_double_spent});
}
############################################################################
sub _run_hashcash {
my ($self, $scanner) = @_;
if (defined $scanner->{hashcash_value}) { return $scanner->{hashcash_value}; }
$scanner->{hashcash_value} = 0;
# X-Hashcash: 0:031118:camram-spam@camram.org:c068b58ade6dcbaf
# or:
# X-hashcash: 1:20:040803:hashcash@freelists.org::6dcdb3a3ad4e1b86:1519d
# X-hashcash: 1:20:040803:jm@jmason.org::6b484d06469ccb28:8838a
# X-hashcash: 1:20:040803:adam@cypherspace.org::a1cbc54bf0182ea8:5d6a0
# call down to {msg} so that we can get it as an array of
# individual headers
my @hdrs = $scanner->{msg}->get_header ("X-Hashcash");
if (scalar @hdrs == 0) {
@hdrs = $scanner->{msg}->get_header ("Hashcash");
}
foreach my $hc (@hdrs) {
my $value = $self->_run_hashcash_for_one_string($scanner, $hc);
if ($value) {
# remove the "double-spend" bool if we did find a usable string;
# this happens when one string is already spent, but another
# string has not yet been.
delete $scanner->{hashcash_double_spent};
return $value;
}
}
return 0;
}
sub _run_hashcash_for_one_string {
my ($self, $scanner, $hc) = @_;
if (!$hc) { return 0; }
$hc =~ s/\s+//gs; # remove whitespace from multiline, folded tokens
# untaint the string for paranoia, making sure not to allow \n \0 \' \"
if ($hc =~ /^[-A-Za-z0-9\xA0-\xFF:_\/\%\@\.\,\= \*\+\;]+$/) {
$hc = untaint_var($hc);
}
if (!$hc) { return 0; }
my ($ver, $bits, $date, $rsrc, $exts, $rand, $trial);
if ($hc =~ /^0:/) {
($ver, $date, $rsrc, $trial) = split (/:/, $hc, 4);
}
elsif ($hc =~ /^1:/) {
($ver, $bits, $date, $rsrc, $exts, $rand, $trial) =
split (/:/, $hc, 7);
# extensions are, as yet, unused by SpamAssassin
}
else {
dbg("hashcash: version $ver stamps not yet supported");
return 0;
}
if (!$trial) {
dbg("hashcash: no trial in stamp '$hc'");
return 0;
}
my $accept = $scanner->{conf}->{hashcash_accept};
if (!$self->_check_hashcash_resource ($scanner, $accept, $rsrc)) {
dbg("hashcash: resource $rsrc not accepted here");
return 0;
}
# get the hash collision from the token. Computing the hash collision
# is very easy (great!) -- just get SHA1(token) and count the 0 bits at
# the start of the SHA1 hash, according to the draft at
# http://www.hashcash.org/draft-hashcash.txt .
my $value = 0;
my $bitstring = unpack ("B*", sha1($hc));
$bitstring =~ /^(0+)/ and $value = length $1;
# hashcash v1 tokens: if the "claimed value" of the token is less than
# what the token actually contains (ie. token was accidentally generated
# with 24 bits instead of the claimed 20), then cut it down to just the
# claimed value. that way it's a bit tidier and more deterministic.
if ($bits && $value > $bits) {
$value = $bits;
}
dbg("hashcash: token value: $value");
if ($self->was_hashcash_token_double_spent ($scanner, $hc)) {
$scanner->{hashcash_double_spent} = 1;
return 0;
}
$scanner->{hashcash_value} = $value;
return $value;
}
sub was_hashcash_token_double_spent {
my ($self, $scanner, $token) = @_;
my $main = $self->{main};
if (!$main->{conf}->{hashcash_doublespend_path}) {
dbg("hashcash: hashcash_doublespend_path not defined or empty");
return 0;
}
if (!HAS_DB_FILE) {
dbg("hashcash: DB_File module not installed, cannot use double-spend db");
return 0;
}
my $path = $main->sed_path ($main->{conf}->{hashcash_doublespend_path});
my $parentdir = dirname ($path);
my $stat_errn = stat($parentdir) ? 0 : 0+$!;
if ($stat_errn == 0 && !-d _) {
dbg("hashcash: parent dir $parentdir exists but is not a directory");
} elsif ($stat_errn == ENOENT) {
# run in an eval(); if mkpath has no perms, it calls die()
eval {
mkpath ($parentdir, 0, (oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0777));
};
}
my %spenddb;
if (!tie %spenddb, "DB_File", $path, O_RDWR|O_CREAT,
(oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0666))
{
dbg("hashcash: failed to tie to $path: $@ $!");
# not a serious error. TODO?
return 0;
}
if (exists $spenddb{$token}) {
untie %spenddb;
dbg("hashcash: token '$token' spent already");
return 1;
}
$spenddb{$token} = time;
dbg("hashcash: marking token '$token' as spent");
# TODO: expiry?
untie %spenddb;
return 0;
}
sub _check_hashcash_resource {
my ($self, $scanner, $list, $addr) = @_;
$addr = lc $addr;
if (defined ($list->{$addr})) { return 1; }
study $addr; # study is a no-op since perl 5.16.0, eliminating related bugs
foreach my $regexp (values %{$list})
{
# allow %u == current username
# \\ is added by $conf->add_to_addrlist()
$regexp =~ s/\\\%u/$scanner->{main}->{username}/gs;
if ($addr =~ /$regexp/i) {
return 1;
}
}
# TODO: use "To" and "Cc" addresses gleaned from the mails in the Bayes
# database trained as ham, as well.
return 0;
}
############################################################################
1;
=back
=cut

View File

@ -25,12 +25,15 @@ use Errno qw(EBADF);
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Locales;
use Mail::SpamAssassin::Util qw(get_my_locales parse_rfc822_date);
use Mail::SpamAssassin::Util qw(get_my_locales parse_rfc822_date
is_valid_utf_8);
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:sa :ip);
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $IP_ADDRESS = IP_ADDRESS;
# constructor: register the eval rule
sub new {
my $class = shift;
@ -42,51 +45,44 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_for_fake_aol_relay_in_rcvd");
$self->register_eval_rule("check_for_faraway_charset_in_headers");
$self->register_eval_rule("check_for_unique_subject_id");
$self->register_eval_rule("check_illegal_chars");
$self->register_eval_rule("check_for_forged_hotmail_received_headers");
$self->register_eval_rule("check_for_no_hotmail_received_headers");
$self->register_eval_rule("check_for_msn_groups_headers");
$self->register_eval_rule("check_for_forged_eudoramail_received_headers");
$self->register_eval_rule("check_for_forged_yahoo_received_headers");
$self->register_eval_rule("check_for_forged_juno_received_headers");
$self->register_eval_rule("check_for_forged_gmail_received_headers");
$self->register_eval_rule("check_for_matching_env_and_hdr_from");
$self->register_eval_rule("sorted_recipients");
$self->register_eval_rule("similar_recipients");
$self->register_eval_rule("check_for_missing_to_header");
$self->register_eval_rule("check_for_forged_gw05_received_headers");
$self->register_eval_rule("check_for_shifted_date");
$self->register_eval_rule("subject_is_all_caps");
$self->register_eval_rule("check_for_to_in_subject");
$self->register_eval_rule("check_outlook_message_id");
$self->register_eval_rule("check_messageid_not_usable");
$self->register_eval_rule("check_header_count_range");
$self->register_eval_rule("check_unresolved_template");
$self->register_eval_rule("check_ratware_name_id");
$self->register_eval_rule("check_ratware_envelope_from");
$self->register_eval_rule("gated_through_received_hdr_remover");
$self->register_eval_rule("received_within_months");
$self->register_eval_rule("check_equal_from_domains");
$self->register_eval_rule("check_for_fake_aol_relay_in_rcvd", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_faraway_charset_in_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_unique_subject_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_illegal_chars", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_hotmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_no_hotmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_msn_groups_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_eudoramail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_yahoo_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_juno_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_gmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_matching_env_and_hdr_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("sorted_recipients", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("similar_recipients", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_missing_to_header", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_gw05_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_shifted_date", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("subject_is_all_caps", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_to_in_subject", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_outlook_message_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_messageid_not_usable", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_header_count_range", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_unresolved_template", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_ratware_name_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_ratware_envelope_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("gated_through_received_hdr_remover", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("received_within_months", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_equal_from_domains", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
return $self;
}
# load triplets.txt into memory
sub compile_now_start {
my ($self) = @_;
$self->word_is_in_dictionary("aba");
}
sub check_for_fake_aol_relay_in_rcvd {
my ($self, $pms) = @_;
local ($_);
$_ = $pms->get('Received');
s/\s/ /gs;
s/\s+/ /gs;
# this is the hostname format used by AOL for their relays. Spammers love
# forging it. Don't make it more specific to match aol.com only, though --
@ -122,142 +118,20 @@ sub check_for_faraway_charset_in_headers {
return 0 if grep { $_ eq "all" } @locales;
for my $h (qw(From Subject)) {
my @hdrs = $pms->get("$h:raw"); # ??? get() returns a scalar ???
if ($#hdrs >= 0) {
$hdr = join(" ", @hdrs);
} else {
$hdr = '';
}
my @hdrs = $pms->get("$h:raw");
foreach my $hdr (@hdrs) {
while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) {
Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales)
or return 1;
}
}
}
0;
}
# Deprecated (Bug 8051)
sub check_for_unique_subject_id {
my ($self, $pms) = @_;
local ($_);
$_ = lc $pms->get('Subject');
study; # study is a no-op since perl 5.16.0, eliminating related bugs
my $id = 0;
if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/
|| /\s{10,}(?:\S\s)?(\S+)$/
|| /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/
|| /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/
|| /\s{3,}[-:\#]([a-z0-9]{5,})$/
|| /[\s._]{3,}([^0\s._]\d{3,})$/
|| /[\s._]{3,}\[(\S+)\]$/
# (7217vPhZ0-478TLdy5829qicU9-0@26) and similar
|| /\(([-\w]{7,}\@\d+)\)$/
# Seven or more digits at the end of a subject is almost certainly a id
|| /\b(\d{7,})\s*$/
# stuff at end of line after "!" or "?" is usually an id
|| /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/
# 9095IPZK7-095wsvp8715rJgY8-286-28 and similar
# excluding 'Re:', etc and the first word
|| /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/
# #30D7 and similar
|| /\s#\s*([a-f0-9]{4,})\s*$/
)
{
$id = $1;
# exempt online purchases
if ($id =~ /\d{5,}/
&& /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/)
{
$id = 0;
}
# for the "foo-bar-baz" case, otherwise it won't
# be found in the dict:
$id =~ s/-//;
}
return ($id && !$self->word_is_in_dictionary($id));
}
# word_is_in_dictionary()
#
# See if the word looks like an English word, by checking if each triplet
# of letters it contains is one that can be found in the English language.
# Does not include triplets only found in proper names, or in the Latin
# and Greek terms that might be found in a larger dictionary
my %triplets;
my $triplets_loaded = 0;
sub word_is_in_dictionary {
my ($self, $word) = @_;
local ($_);
local $/ = "\n"; # Ensure $/ is set appropriately
# $word =~ tr/A-Z/a-z/; # already done by this stage
$word =~ s/^\s+//;
$word =~ s/\s+$//;
# If it contains a digit, dash, etc, it's not a valid word.
# Don't reject words like "can't" and "I'll"
return 0 if ($word =~ /[^a-z\']/);
# handle a few common "blah blah blah (comment)" styles
return 1 if ($word eq "ot"); # off-topic
return 1 if ($word =~ /(?:linux|nix|bsd)/); # not in most dicts
return 1 if ($word =~ /(?:whew|phew|attn|tha?nx)/); # not in most dicts
my $word_len = length($word);
# Unique IDs probably aren't going to be only one or two letters long
return 1 if ($word_len < 3);
if (!$triplets_loaded) {
# take a copy to avoid modifying the real one
my @default_triplets_path = @Mail::SpamAssassin::default_rules_path;
s{$}{/triplets.txt} for @default_triplets_path;
my $filename = $self->{main}->first_existing_path (@default_triplets_path);
if (!defined $filename) {
dbg("eval: failed to locate the triplets.txt file");
return 1;
}
local *TRIPLETS;
if (!open (TRIPLETS, "<$filename")) {
dbg("eval: failed to open '$filename', cannot check dictionary: $!");
return 1;
}
for($!=0; <TRIPLETS>; $!=0) {
chomp;
$triplets{$_} = 1;
}
defined $_ || $!==0 or
$!==EBADF ? dbg("eval: error reading from $filename: $!")
: die "error reading from $filename: $!";
close(TRIPLETS) or die "error closing $filename: $!";
$triplets_loaded = 1;
} # if (!$triplets_loaded)
my $i;
for ($i = 0; $i < ($word_len - 2); $i++) {
my $triplet = substr($word, $i, 3);
if (!$triplets{$triplet}) {
dbg("eval: unique ID: letter triplet '$triplet' from word '$word' not valid");
return 0;
}
} # for ($i = 0; $i < ($word_len - 2); $i++)
# All letter triplets in word were found to be valid
return 1;
}
# look for 8-bit and other illegal characters that should be MIME
@ -268,7 +142,18 @@ sub check_illegal_chars {
$header .= ":raw" unless $header =~ /:raw$/;
my $str = $pms->get($header);
return 0 if !defined $str || $str eq '';
return 0 if !defined $str || $str !~ /\S/;
if ($str =~ tr/\x00-\x7F//c && is_valid_utf_8($str)) {
# is non-ASCII and is valid UTF-8
if ($str =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//) {
dbg("eval: %s is valid UTF-8 but contains controls: %s", $header, $str);
} else {
# todo: only with a SMTPUTF8 mail
dbg("eval: %s is valid UTF-8: %s", $header, $str);
return 0;
}
}
# count illegal substrings (RFC 2045)
# (non-ASCII + C0 controls except TAB, NL, CR)
@ -291,12 +176,12 @@ sub gated_through_received_hdr_remover {
my ($self, $pms) = @_;
my $txt = $pms->get("Mailing-List",undef);
if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/) {
if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/m) {
my $dlto = $pms->get("Delivered-To");
my $rcvd = $pms->get("Received");
# ensure we have other indicative headers too
if ($dlto =~ /^mailing list \S+\@\S+/ &&
if ($dlto =~ /^mailing list \S+\@\S+/m &&
$rcvd =~ /qmail \d+ invoked (?:from network|by .{3,20})\); \d+ ... \d+/)
{
return 1;
@ -336,9 +221,8 @@ sub _check_for_forged_hotmail_received_headers {
return if $self->check_for_msn_groups_headers($pms);
my $ip = $pms->get('X-Originating-Ip',undef);
my $IP_ADDRESS = IP_ADDRESS;
my $orig = $pms->get('X-OriginatorOrg',undef);
my $ORIGINATOR = 'hotmail.com';
my $ORIGINATOR = qr/hotmail\.com|msonline\-outlook/;
if (defined $ip && $ip =~ /$IP_ADDRESS/) { $ip = 1; } else { $ip = 0; }
if (defined $orig && $orig =~ /$ORIGINATOR/) { $orig = 1; } else { $orig = 0; }
@ -347,6 +231,8 @@ sub _check_for_forged_hotmail_received_headers {
# Received: from hotmail.com (f135.law8.hotmail.com [216.33.241.135])
# or like
# Received: from EUR01-VE1-obe.outbound.protection.outlook.com (mail-oln040092066056.outbound.protection.outlook.com [40.92.66.56])
# or
# Received: from VI1PR04MB3039.eurprd04.prod.outlook.com (2603:10a6:802:b::13)
# spammers do not ;)
if ($self->gated_through_received_hdr_remover($pms)) { return; }
@ -355,6 +241,8 @@ sub _check_for_forged_hotmail_received_headers {
{ return; }
if ($rcvd =~ /from \S*\.outbound\.protection\.outlook\.com \(\S+\.outbound\.protection\.outlook\.com[ \)]/ && $orig)
{ return; }
if ($rcvd =~ /from \S*\.eurprd\d+\.prod\.outlook\.com \($IP_ADDRESS\)/ && $orig)
{ return; }
if ($rcvd =~ /from \S*\.hotmail.com \(\[$IP_ADDRESS\][ \):]/ && $ip)
{ return; }
if ($rcvd =~ /from \S+ by \S+\.hotmail(?:\.msn)?\.com with HTTP\;/ && $ip)
@ -465,7 +353,6 @@ sub check_for_forged_eudoramail_received_headers {
$rcvd =~ s/\s+/ /gs; # just spaces, simplify the regexp
my $ip = $pms->get('X-Sender-Ip',undef);
my $IP_ADDRESS = IP_ADDRESS;
if (defined $ip && $ip =~ /$IP_ADDRESS/) { $ip = 1; } else { $ip = 0; }
# Eudoramail formats its received headers like this:
@ -518,7 +405,6 @@ sub check_for_forged_yahoo_received_headers {
if ($rcvd =~ /by web\S+\.mail\S*\.yahoo\.com via HTTP/) { return 0; }
if ($rcvd =~ /by sonic\S+\.consmr\.mail\S*\.yahoo\.com with HTTP/) { return 0; }
if ($rcvd =~ /by smtp\S+\.yahoo\.com with SMTP/) { return 0; }
my $IP_ADDRESS = IP_ADDRESS;
if ($rcvd =~
/from \[$IP_ADDRESS\] by \S+\.(?:groups|scd|dcn)\.yahoo\.com with NNFMP/) {
return 0;
@ -554,13 +440,12 @@ sub check_for_forged_juno_received_headers {
my $xorig = $pms->get('X-Originating-IP');
my $xmailer = $pms->get('X-Mailer');
my $rcvd = $pms->get('Received');
my $IP_ADDRESS = IP_ADDRESS;
if ($xorig ne '') {
# New style Juno has no X-Originating-IP header, and other changes
if($rcvd !~ /from.*\b(?:juno|untd)\.com.*[\[\(]$IP_ADDRESS[\]\)].*by/
&& $rcvd !~ / cookie\.(?:juno|untd)\.com /) { return 1; }
if($xmailer !~ /Juno /) { return 1; }
if(index($xmailer, 'Juno ') == -1) { return 1; }
} else {
if($rcvd =~ /from.*\bmail\.com.*\[$IP_ADDRESS\].*by/) {
if($xmailer !~ /\bmail\.com/) { return 1; }
@ -593,6 +478,7 @@ sub check_for_forged_gmail_received_headers {
if ($received =~ /by smtp\.googlemail\.com with ESMTPSA id \S+/) {
return 0;
}
if ( (length($xgms) >= GOOGLE_MESSAGE_STATE_LENGTH_MIN) &&
(length($xss) >= GOOGLE_SMTP_SOURCE_LENGTH_MIN)) {
return 0;
@ -637,10 +523,9 @@ sub _check_recipients {
my @inputs;
# ToCc: pseudo-header works best, but sometimes Bcc: is better
for ('ToCc', 'Bcc') {
my $to = $pms->get($_); # get recipients
$to =~ s/\(.*?\)//g; # strip out the (comments)
push(@inputs, ($to =~ m/([\w.=-]+\@\w+(?:[\w.-]+\.)+\w+)/g));
for ('ToCc:addr', 'Bcc:addr') {
my @to = $pms->get($_); # get recipients
push @inputs, @to;
last if scalar(@inputs) >= TOCC_SIMILAR_COUNT;
}

View File

@ -99,13 +99,13 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("image_count");
$self->register_eval_rule ("pixel_coverage");
$self->register_eval_rule ("image_size_exact");
$self->register_eval_rule ("image_size_range");
$self->register_eval_rule ("image_named");
$self->register_eval_rule ("image_name_regex");
$self->register_eval_rule ("image_to_text_ratio");
$self->register_eval_rule ("image_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pixel_coverage", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("image_size_exact", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("image_size_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("image_named", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("image_name_regex", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("image_to_text_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -256,7 +256,7 @@ sub _get_images {
sub image_named {
my ($self,$pms,$body,$name) = @_;
return unless (defined $name);
return 0 unless (defined $name);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -272,7 +272,7 @@ sub image_named {
sub image_name_regex {
my ($self,$pms,$body,$re) = @_;
return unless (defined $re);
return 0 unless (defined $re);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -300,7 +300,7 @@ sub image_name_regex {
sub image_count {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless defined $min;
return 0 unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -316,7 +316,7 @@ sub image_count {
sub pixel_coverage {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless (defined $type && defined $min);
return 0 unless (defined $type && defined $min);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -331,7 +331,7 @@ sub pixel_coverage {
sub image_to_text_ratio {
my ($self,$pms,$body,$type,$min,$max) = @_;
return unless (defined $type && defined $min && defined $max);
return 0 unless (defined $type && defined $min && defined $max);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -353,7 +353,7 @@ sub image_to_text_ratio {
sub image_size_exact {
my ($self,$pms,$body,$type,$height,$width) = @_;
return unless (defined $type && defined $height && defined $width);
return 0 unless (defined $type && defined $height && defined $width);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -369,7 +369,7 @@ sub image_size_exact {
sub image_size_range {
my ($self,$pms,$body,$type,$minh,$minw,$maxh,$maxw) = @_;
return unless (defined $type && defined $minh && defined $minw);
return 0 unless (defined $type && defined $minh && defined $minw);
# make sure we have image data read in.
if (!exists $pms->{'imageinfo'}) {
@ -377,7 +377,7 @@ sub image_size_range {
}
my $name = 'dems_'.$type;
return unless (exists $pms->{'imageinfo'}->{$name});
return 0 unless (exists $pms->{'imageinfo'}->{$name});
foreach my $dem ( keys %{$pms->{'imageinfo'}->{"dems_$type"}}) {
my ($h,$w) = split(/x/,$dem);

View File

@ -68,18 +68,18 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_for_mime");
$self->register_eval_rule("check_for_mime_html");
$self->register_eval_rule("check_for_mime_html_only");
$self->register_eval_rule("check_mime_multipart_ratio");
$self->register_eval_rule("check_msg_parse_flags");
$self->register_eval_rule("check_for_ascii_text_illegal");
$self->register_eval_rule("check_abundant_unicode_ratio");
$self->register_eval_rule("check_for_faraway_charset");
$self->register_eval_rule("check_for_uppercase");
$self->register_eval_rule("check_ma_non_text");
$self->register_eval_rule("check_base64_length");
$self->register_eval_rule("check_qp_ratio");
$self->register_eval_rule("check_for_mime", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_for_mime_html", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_for_mime_html_only", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_mime_multipart_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_msg_parse_flags", $Mail::SpamAssassin::Conf::TYPE_HEADER_EVALS);
$self->register_eval_rule("check_for_ascii_text_illegal", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_abundant_unicode_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_for_faraway_charset", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_for_uppercase", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_ma_non_text", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_base64_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule("check_qp_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -89,8 +89,7 @@ sub new {
sub are_more_high_bits_set {
my ($self, $str) = @_;
# TODO: I suspect a tr// trick may be faster here
my $numhis = () = ($str =~ /[\200-\377]/g);
my $numhis = $str =~ tr/\x00-\x7F//c; # number of non-ASCII chars
my $numlos = length($str) - $numhis;
($numlos <= $numhis && $numhis > 3);
@ -182,7 +181,7 @@ sub check_for_mime {
$self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
return 0 unless exists $pms->{$test};
return $pms->{$test};
return $pms->{$test} ? 1 : 0;
}
# any text/html MIME part
@ -232,19 +231,19 @@ sub _check_mime_header {
$pms->{mime_body_text_count}++;
}
if ($cte =~ /base64/) {
if (index($cte, 'base64') >= 0) {
$pms->{mime_base64_count}++;
}
elsif ($cte =~ /quoted-printable/) {
elsif (index($cte, 'quoted-printable') >= 0) {
$pms->{mime_qp_count}++;
}
if ($cd && $cd =~ /attachment/) {
if ($cd && index($cd, 'attachment') >= 0) {
$pms->{mime_attachment}++;
}
if ($ctype =~ /^text/ &&
$cte =~ /base64/ &&
index($cte, 'base64') >= 0 &&
(!$charset || $charset =~ /(?:us-ascii|ansi_x3\.4-1968|iso-ir-6|ansi_x3\.4-1986|iso_646\.irv:1991|ascii|iso646-us|us|ibm367|cp367|csascii)/) &&
!($cd && $cd =~ /^(?:attachment|inline)/))
{
@ -366,9 +365,9 @@ sub _check_attachments {
$part++;
$part_type[$part] = $ctype;
$part_bytes[$part] = 0 if $cd !~ /attachment/;
$part_bytes[$part] = 0 if index($cd, 'attachment') == -1;
my $cte_is_base64 = $cte =~ /base64/i;
my $cte_is_base64 = index($cte, 'base64') >= 0;
my $previous = '';
foreach (@{$p->raw()}) {
@ -385,12 +384,12 @@ sub _check_attachments {
# if ($pms->{mime_html_no_charset} && $ctype eq 'text/html' && defined $charset) {
# $pms->{mime_html_no_charset} = 0;
# }
if ($pms->{mime_multipart_alternative} && $cd !~ /attachment/ &&
if ($pms->{mime_multipart_alternative} && index($cd, 'attachment') == -1 &&
($ctype eq 'text/plain' || $ctype eq 'text/html')) {
$part_bytes[$part] += length;
}
if ($where != 1 && $cte eq "quoted-printable" && ! /^SPAM: /) {
if ($where != 1 && $cte eq "quoted-printable" && index($_, 'SPAM: ') != 0) {
# RFC 5322: Each line SHOULD be no more than 78 characters,
# excluding the CRLF.
# RFC 2045: The Quoted-Printable encoding REQUIRES that
@ -415,9 +414,11 @@ sub _check_attachments {
# }
# count excessive QP bytes
if (index($_, '=') != -1) {
if (index($_, '=') >= 0) {
## no critic (Perlsecret)
# whoever wrote this next line is an evil hacker -- jm
my $qp = () = m/=(?:09|3[0-9ABCEF]|[2456][0-9A-F]|7[0-9A-E])/g;
## use critic
if ($qp) {
$qp_count += $qp;
# tabs and spaces at end of encoded line are okay. Also, multiple
@ -630,7 +631,7 @@ sub get_charset_from_ct_line {
sub check_ma_non_text {
my($self, $pms) = @_;
foreach my $map ($pms->{msg}->find_parts(qr@^multipart/alternative$@i)) {
foreach my $map ($pms->{msg}->find_parts(qr@^multipart/alternative$@)) {
foreach my $p ($map->find_parts(qr/./, 1, 0)) {
next if (lc $p->{'type'} eq 'multipart/related');
next if (lc $p->{'type'} eq 'application/rtf');

View File

@ -51,6 +51,23 @@ around the newline character in "folded" headers will be replaced with a single
space. Append C<:raw> to the header name to retrieve the raw, undecoded value,
including pristine whitespace, instead.
=item tflags NAME_OF_RULE range=x-y
Match only from specific MIME parts, indexed in the order they are parsed.
Part 1 = main message headers. Part 2 = next part etc.
range=1 (match only main headers, not any subparts)
range=2- (match any subparts, but not the main headers)
range=-3 (match only first three parts, including main headers)
range=2-3 (match only first two subparts)
=item tflags NAME_OF_RULE concat
Concatenate all headers from all mime parts (possible range applied) into a
single string for matching. This allows matching headers across multiple
parts with single regex. Normally pattern is tested individually for
different mime parts.
=back
=cut
@ -151,6 +168,9 @@ sub set_config {
$self->{parser}->add_test($rulename, $evalfn."()",
$Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
# Support named regex captures
$self->{parser}->parse_captures($rulename, $rec);
# evalfn/rulename safe, sanitized by $RULENAME_RE
my $evalcode = '
sub Mail::SpamAssassin::Plugin::MIMEHeader::'.$evalfn.' {
@ -178,37 +198,104 @@ sub set_config {
# ---------------------------------------------------------------------------
sub eval_hook_called {
my ($pobj, $scanner, $rulename) = @_;
my ($pobj, $pms, $rulename) = @_;
my $rule = $scanner->{conf}->{mimeheader_tests}->{$rulename};
my $conf = $pms->{conf};
my $rule = $conf->{mimeheader_tests}->{$rulename};
my $hdr = $rule->{hdr};
my $negated = $rule->{negated};
my $if_unset = $rule->{if_unset};
my $pattern = $rule->{pattern};
my $tflags = $conf->{tflags}->{$rulename}||'';
my $getraw;
my $getraw = 0;
if ($hdr =~ s/:raw$//) {
$getraw = 1;
} else {
$getraw = 0;
}
foreach my $p ($scanner->{msg}->find_parts(qr/./)) {
my $range_min = 0;
my $range_max = 1000;
if ($tflags =~ /(?:^|\s)range=(\d+)?(-)?(\d+)?(?:\s|$)/) {
if (defined $1 && defined $2 && defined $3) {
$range_min = $1;
$range_max = $3;
}
elsif (defined $1 && defined $2) {
$range_min = $1;
}
elsif (defined $2 && defined $3) {
$range_max = $3;
}
elsif (defined $1) {
$range_min = $range_max = $1;
}
}
my $multiple = $tflags =~ /\bmultiple\b/;
my $concat = $tflags =~ /\bconcat\b/;
my $maxhits = $tflags =~ /\bmaxhits=(\d+)\b/ ? $1 :
$multiple ? 1000 : 1;
my $cval = '';
my $idx = 0;
foreach my $p ($pms->{msg}->find_parts(qr/./)) {
$idx++;
last if $idx > $range_max;
next if $idx < $range_min;
my $val;
if ($getraw) {
if ($hdr eq 'ALL') {
$val = $p->get_all_headers($getraw, 0);
} elsif ($getraw) {
$val = $p->raw_header($hdr);
} else {
$val = $p->get_header($hdr);
}
$val = $if_unset if !defined $val;
$val = $rule->{if_unset} if !defined $val;
if ($val =~ $pattern) {
return ($negated ? 0 : 1);
if ($concat) {
$val .= "\n" unless $val =~ /\n$/;
$cval .= $val;
next;
}
if (_check($pms, $rulename, $val, $pattern, $negated, $maxhits, "part $idx")) {
return 0;
}
}
return ($negated ? 1 : 0);
if ($concat) {
if (_check($pms, $rulename, $cval, $pattern, $negated, $maxhits, 'concat')) {
return 0;
}
}
if ($negated) {
dbg("mimeheader: ran rule $rulename ======> got hit: \"<negative match>\"");
return 1;
}
return 0;
}
sub _check {
my ($pms, $rulename, $value, $pattern, $negated, $maxhits, $desc) = @_;
my $hits = 0;
my %captures;
while ($value =~ /$pattern/gp) {
last if $negated;
if (%-) {
foreach my $cname (keys %-) {
push @{$captures{$cname}}, grep { $_ ne "" } @{$-{$cname}};
}
}
my $match = defined ${^MATCH} ? ${^MATCH} : "<negative match>";
$pms->got_hit($rulename, '', ruletype => 'eval');
dbg("mimeheader: ran rule $rulename ======> got hit: \"$match\" ($desc)");
last if ++$hits >= $maxhits;
}
$pms->set_captures(\%captures) if %captures;
return $hits;
}
# ---------------------------------------------------------------------------
@ -224,4 +311,10 @@ sub finish_tests {
# ---------------------------------------------------------------------------
sub has_all_header { 1 } # Supports ALL header query (Bug 5582)
sub has_tflags_range { 1 } # Supports tflags range=x-y
sub has_tflags_concat { 1 } # Supports tflags concat
sub has_tflags_multiple { 1 } # Supports tflags multiple
sub has_capture_rules { 1 } # Supports named regex captures (Bug 7992)
1;

File diff suppressed because it is too large Load Diff

View File

@ -66,6 +66,20 @@ sub check_start {
}
}
sub check_cleanup {
my ($self, $params) = @_;
my $pms = $params->{permsgstatus};
my $scoresptr = $pms->{conf}->{scores};
# Force all body rules ready for meta rules. Need to do it here in
# cleanup, because the body is scanned per line instead of per rule
if ($pms->{conf}->{skip_body_rules}) {
foreach (keys %{$pms->{conf}->{skip_body_rules}}) {
$pms->rule_ready($_, 1) if $scoresptr->{$_};
}
}
}
###########################################################################
1;
@ -96,13 +110,16 @@ sub do_one_line_body_tests {
if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
{
$sub .= '
my $hitsptr = $self->{tests_already_hit};
';
# support multiple matches
my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
$max = untaint_var($max);
if ($max) {
$sub .= '
if (exists $self->{tests_already_hit}->{q{'.$rulename.'}}) {
return 0 if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';
if ($hitsptr->{q{'.$rulename.'}}) {
return 0 if $hitsptr->{q{'.$rulename.'}} >= '.$max.';
}
';
}
@ -111,17 +128,17 @@ sub do_one_line_body_tests {
my $lref = \$line;
pos $$lref = 0;
'.$self->hash_line_for_rule($pms, $rulename).'
while ($$lref =~ /$qrptr->{q{'.$rulename.'}}/go) {
while ($$lref =~ /$qrptr->{q{'.$rulename.'}}/gop) {
$self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
'. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "") . '
'. ($max? 'last if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';' : '') . '
'. ($max? 'last if $hitsptr->{q{'.$rulename.'}} >= '.$max.';' : '') . '
}
';
} else {
$sub .= '
'.$self->hash_line_for_rule($pms, $rulename).'
if ($line =~ /$qrptr->{q{'.$rulename.'}}/o) {
if ($line =~ /$qrptr->{q{'.$rulename.'}}/op) {
$self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
'. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "return 1") . '
}
@ -129,6 +146,11 @@ sub do_one_line_body_tests {
}
# Make sure rule is marked ready for meta rules
$sub .= '
$self->rule_ready(q{'.$rulename.'}, 1);
';
return if ($opts{doing_user_rules} &&
!$self->is_user_rule_sub($rulename.'_one_line_body_test'));

View File

@ -130,6 +130,27 @@ This plugin helps detected spam using attached PDF files
body RULENAME eval:pdf_is_empty_body(<bytes>)
bytes: maximum byte count to allow and still consider it empty
pdf_image_to_text_ratio()
body RULENAME eval:pdf_image_to_text_ratio(<min>,<max>)
Ratio calculated as body_length / total_image_area
min: minimum ratio
max: maximum ratio
pdf_image_size_exact()
body RULENAME eval:pdf_image_size_exact(<h>,<w>)
h: image height is exactly h
w: image width is exactly w
pdf_image_size_range()
body RULENAME eval:pdf_image_size_range(<minh>,<minw>,[<maxh>],[<maxw>])
minh: image height is atleast minh
minw: image width is atleast minw
maxh: (optional) image height is no more than maxh
maxw: (optional) image width is no more than maxw
NOTE: See the ruleset for more examples that are not documented here.
=back
@ -145,9 +166,8 @@ use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(compile_regexp);
use strict;
use warnings;
# use bytes;
use re 'taint';
use Digest::MD5 qw(md5_hex);
use MIME::QuotedPrint;
our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -161,89 +181,115 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("pdf_count");
$self->register_eval_rule ("pdf_image_count");
$self->register_eval_rule ("pdf_pixel_coverage");
$self->register_eval_rule ("pdf_image_size_exact");
$self->register_eval_rule ("pdf_image_size_range");
$self->register_eval_rule ("pdf_named");
$self->register_eval_rule ("pdf_name_regex");
$self->register_eval_rule ("pdf_image_to_text_ratio");
$self->register_eval_rule ("pdf_match_md5");
$self->register_eval_rule ("pdf_match_fuzzy_md5");
$self->register_eval_rule ("pdf_match_details");
$self->register_eval_rule ("pdf_is_encrypted");
$self->register_eval_rule ("pdf_is_empty_body");
$self->register_eval_rule ("pdf_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_image_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_pixel_coverage", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_image_size_exact", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_image_size_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_named", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_name_regex", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_image_to_text_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_match_md5", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_match_fuzzy_md5", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_match_details", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_is_encrypted", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->register_eval_rule ("pdf_is_empty_body", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
# lower priority for add_uri_detail_list to work
$self->register_method_priority ("parsed_metadata", -1);
return $self;
}
# -----------------------------------------
sub parsed_metadata {
my ($self, $opts) = @_;
my %get_details = (
'pdf' => sub {
my ($self, $pms, $part) = @_;
my $pms = $opts->{permsgstatus};
my $type = $part->{'type'} || 'base64';
my $data = '';
# initialize
$pms->{pdfinfo}->{count_pdf} = 0;
$pms->{pdfinfo}->{count_pdf_images} = 0;
if ($type eq 'quoted-printable') {
$data = decode_qp($data); # use QuotedPrint->decode_qp
}
else {
$data = $part->decode(); # just use built in base64 decoder
my @parts = $pms->{msg}->find_parts(qr@^(image|application)/(pdf|octet\-stream)$@, 1);
my $part_count = scalar @parts;
dbg("pdfinfo: Identified $part_count possible mime parts that need checked for PDF content");
foreach my $p (@parts) {
my $type = $p->{type} || '';
my $name = $p->{name} || '';
dbg("pdfinfo: found part, type=$type file=$name");
# filename must end with .pdf, or application type can be pdf
# sometimes windows muas will wrap a pdf up inside a .dat file
# v0.8 - Added .fdf phoney PDF detection
next unless ($name =~ /\.[fp]df$/i || $type =~ m@/pdf$@);
_get_pdf_details($pms, $p);
$pms->{pdfinfo}->{count_pdf}++;
}
my $index = substr($data, 0, 8);
_set_tag($pms, 'PDFCOUNT', $pms->{pdfinfo}->{count_pdf});
_set_tag($pms, 'PDFIMGCOUNT', $pms->{pdfinfo}->{count_pdf_images});
}
return unless ($index =~ /.PDF\-(\d\.\d)/);
sub _get_pdf_details {
my ($pms, $part) = @_;
my $data = $part->decode();
# Remove UTF-8 BOM
$data =~ s/^\xef\xbb\xbf//;
# Search magic in first 1024 bytes
if ($data !~ /^.{0,1024}\%PDF\-(\d\.\d)/s) {
dbg("pdfinfo: PDF magic header not found, invalid file?");
return;
}
my $version = $1;
$self->_set_tag($pms, 'PDFVERSION', $version);
_set_tag($pms, 'PDFVERSION', $version);
# dbg("pdfinfo: pdf version = $version");
my ($height, $width, $fuzzy_data, $pdf_tags);
my ($producer, $created, $modified, $title, $creator, $author) = ('unknown','0','0','untitled','unknown','unknown');
my ($md5, $fuzzy_md5) = ('', '');
my ($fuzzy_data, $pdf_tags);
my ($md5, $fuzzy_md5) = ('','');
my ($total_height, $total_width, $total_area, $line_count) = (0,0,0,0);
my $name = $part->{'name'} || '';
$self->_set_tag($pms, 'PDFNAME', $name);
my $name = $part->{name} || '';
_set_tag($pms, 'PDFNAME', $name);
# store the file name so we can check pdf_named() or pdf_name_match() later.
$pms->{pdfinfo}->{names_pdf}->{$name} = 1 if $name;
my $no_more_fuzzy = 0;
my $got_image = 0;
my $encrypted = 0;
my %uris;
while($data =~ /([^\n]+)/g) {
while ($data =~ /([^\n]+)/g) {
# dbg("pdfinfo: line=$1");
my $line = $1;
$line_count++;
# lines containing high bytes will have no data we need, so save some cycles
next if ($line =~ /[\x80-\xff]/);
if (!$no_more_fuzzy && $line_count < 70) {
if (!$no_more_fuzzy && ++$line_count < 70) {
if ($line !~ m/^\%/ && $line !~ m/^\/(?:Height|Width|(?:(?:Media|Crop)Box))/ && $line !~ m/^\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+cm$/) {
$line =~ s/\s+$//; # strip off whitespace at end.
$fuzzy_data .= $line;
}
}
if ($line =~ m/^\/([A-Za-z]+)/) {
$pdf_tags .= $1;
}
$got_image=1 if ($line =~ m/\/Image/);
$encrypted=1 if ($line =~ m/^\/Encrypt/);
# once we hit the first stream, we stop collecting data for fuzzy md5
$no_more_fuzzy = 1 if ($line =~ m/stream/);
$no_more_fuzzy = 1 if index($line, 'stream') >= 0;
}
$got_image = 1 if index($line, '/Image') >= 0;
if (!$encrypted && index($line, '/Encrypt') == 0) {
# store encrypted flag.
$encrypted = $pms->{pdfinfo}->{encrypted} = 1;
}
# From a v1.3 pdf
# [12234] dbg: pdfinfo: line=630 0 0 149 0 0 cm
# [12234] dbg: pdfinfo: line=/Width 630
# [12234] dbg: pdfinfo: line=/Height 149
if ($got_image) {
my ($width, $height);
if ($line =~ /^(\d+)\s+\d+\s+\d+\s+(\d+)\s+\d+\s+\d+\s+cm$/) {
$width = $1;
$height = $2;
@ -258,21 +304,38 @@ my %get_details = (
$width = $1;
$height = $2;
}
}
# did pdf contain image data?
if ($got_image && $width && $height) {
if ($width && $height) {
$no_more_fuzzy = 1;
my $area = $width * $height;
$total_height += $height;
$total_width += $width;
$total_area += $area;
$pms->{pdfinfo}->{dems_pdf}->{"${height}x${width}"} = 1;
$pms->{'pdfinfo'}->{"count_pdf_images"} ++;
dbg("pdfinfo: Found image in PDF ".($name ? $name : '')." - $height x $width pixels ($area pixels sq.)");
$self->_set_tag($pms, 'PDFIMGDIM', "${height}x${width}");
$height=0; $width=0; # reset and check for next image
$got_image = 0;
$pms->{pdfinfo}->{count_pdf_images}++;
dbg("pdfinfo: Found image in PDF $name: $height x $width pixels ($area pixels sq.)");
_set_tag($pms, 'PDFIMGDIM', "${height}x${width}");
$got_image = $height = $width = 0; # reset and check for next image
}
}
#
# Triage - expecting / to be found for rest of the checks
#
next unless index($line, '/') >= 0;
if ($line =~ m/^\/([A-Za-z]+)/) {
$pdf_tags .= $1;
}
# XXX some pdf have uris but are stored inside binary data
if (keys %uris < 20 && $line =~ /(?:\/S\s{0,2}\/URI\s{0,2}|^\s*)\/URI\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
my $location = _parse_string($1);
next unless index($location, '.') > 0; # ignore some binary mess
if (!exists $uris{$location}) {
$uris{$location} = 1;
dbg("pdfinfo: found URI: $location");
$pms->add_uri_detail_list($location);
}
}
# [5310] dbg: pdfinfo: line=<</Producer(GPL Ghostscript 8.15)
@ -283,380 +346,231 @@ my %get_details = (
# [5310] dbg: pdfinfo: line=/Author(colet)>>endobj
# or all on same line inside xml - v1.6+
# <</CreationDate(D:20070226165054-06'00')/Creator( Adobe Photoshop CS2 Windows)/Producer(Adobe Photoshop for Windows -- Image Conversion Plug-in)/ModDate(D:20070226165100-06'00')>>
if ($line =~ /\/Producer\s?\(([^\)\\]+)/) {
$producer = $1;
# Or hex values
# /Creator<FEFF005700720069007400650072>
if ($line =~ /\/Author\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
my $author = _parse_string($1);
dbg("pdfinfo: found property Author=$author");
$pms->{pdfinfo}->{details}->{author}->{$author} = 1;
_set_tag($pms, 'PDFAUTHOR', $author);
}
if ($line =~ /\/CreationDate\s?\(D\:(\d+)/) {
$created = $1;
if ($line =~ /\/Creator\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
my $creator = _parse_string($1);
dbg("pdfinfo: found property Creator=$creator");
$pms->{pdfinfo}->{details}->{creator}->{$creator} = 1;
_set_tag($pms, 'PDFCREATOR', $creator);
}
if ($line =~ /\/ModDate\s?\(D\:(\d+)/) {
$modified = $1;
if ($line =~ /\/CreationDate\s{0,2}\(D\:(\d+)/) {
my $created = _parse_string($1);
dbg("pdfinfo: found property Created=$created");
$pms->{pdfinfo}->{details}->{created}->{$created} = 1;
}
if ($line =~ /\/Title\s?\(([^\)\\]+)/) {
$title = $1;
# Title=\376\377\000w\000w\000n\000g
# Title=wwng
$title =~ s/\\\d{3}//g;
if ($line =~ /\/ModDate\s{0,2}\(D\:(\d+)/) {
my $modified = _parse_string($1);
dbg("pdfinfo: found property Modified=$modified");
$pms->{pdfinfo}->{details}->{modified}->{$modified} = 1;
}
if ($line =~ /\/Creator\s?\(([^\)\\]+)/) {
$creator = $1;
if ($line =~ /\/Producer\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
my $producer = _parse_string($1);
dbg("pdfinfo: found property Producer=$producer");
$pms->{pdfinfo}->{details}->{producer}->{$producer} = 1;
_set_tag($pms, 'PDFPRODUCER', $producer);
}
if ($line =~ /\/Author\s?\(([^\)]+)/) {
$author = $1;
# Author=\376\377\000H\000P\000_\000A\000d\000m\000i\000n\000i\000s\000t\000r\000a\000t\000o\000r
# Author=HP_Administrator
$author =~ s/\\\d{3}//g;
if ($line =~ /\/Title\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
my $title = _parse_string($1);
dbg("pdfinfo: found property Title=$title");
$pms->{pdfinfo}->{details}->{title}->{$title} = 1;
_set_tag($pms, 'PDFTITLE', $title);
}
}
# store the file name so we can check pdf_named() or pdf_name_match() later.
$pms->{pdfinfo}->{names_pdf}->{$name} = 1 if $name;
# store encrypted flag.
$pms->{pdfinfo}->{encrypted} = $encrypted;
# if we had multiple images in the pdf, we need to store the total HxW as well.
# If it was a single Image PDF, then this value will already be in the hash.
$pms->{pdfinfo}->{dems_pdf}->{"${total_height}x${total_width}"} = 1 if ($total_height && $total_width);;
$pms->{pdfinfo}->{dems_pdf}->{"${total_height}x${total_width}"} = 1 if ($total_height && $total_width);
if ($total_area) {
$pms->{pdfinfo}->{pc_pdf} = $total_area;
$self->_set_tag($pms, 'PDFIMGAREA', $total_area);
dbg("pdfinfo: Filename=$name Total HxW: $total_height x $total_width ($total_area area)") if ($total_area);
_set_tag($pms, 'PDFIMGAREA', $total_area);
dbg("pdfinfo: Total HxW: $total_height x $total_width ($total_area area)");
}
dbg("pdfinfo: Filename=$name Title=$title Author=$author Producer=$producer Created=$created Modified=$modified");
$md5 = uc(md5_hex($data)) if $data;
$fuzzy_md5 = uc(md5_hex($fuzzy_data)) if $fuzzy_data;
my $tags_md5;
my $tags_md5 = '';
$tags_md5 = uc(md5_hex($pdf_tags)) if $pdf_tags;
dbg("pdfinfo: MD5 results for ".($name ? $name : '')." - md5=".($md5 ? $md5 : '')." fuzzy1=".($fuzzy_md5 ? $fuzzy_md5 : '')." fuzzy2=".($tags_md5 ? $tags_md5 : ''));
dbg("pdfinfo: MD5 results for $name: md5=$md5 fuzzy1=$fuzzy_md5 fuzzy2=$tags_md5");
# we dont need tags for these.
$pms->{pdfinfo}->{details}->{created} = $created if $created;
$pms->{pdfinfo}->{details}->{modified} = $modified if $modified;
if ($producer) {
$pms->{pdfinfo}->{details}->{producer} = $producer if $producer;
$self->_set_tag($pms, 'PDFPRODUCER', $producer);
}
if ($title) {
$pms->{pdfinfo}->{details}->{title} = $title;
$self->_set_tag($pms, 'PDFTITLE', $title);
}
if ($creator) {
$pms->{pdfinfo}->{details}->{creator} = $creator;
$self->_set_tag($pms, 'PDFCREATOR', $creator);
}
if ($author) {
$pms->{pdfinfo}->{details}->{author} = $author;
$self->_set_tag($pms, 'PDFAUTHOR', $author);
}
if ($md5) {
$pms->{pdfinfo}->{md5}->{$md5} = 1;
$self->_set_tag($pms, 'PDFMD5', $fuzzy_md5);
_set_tag($pms, 'PDFMD5', $fuzzy_md5);
}
if ($fuzzy_md5) {
$pms->{pdfinfo}->{fuzzy_md5}->{$fuzzy_md5} = 1;
$self->_set_tag($pms, 'PDFMD5FUZZY1', $fuzzy_md5);
_set_tag($pms, 'PDFMD5FUZZY1', $fuzzy_md5);
}
if ($tags_md5) {
$pms->{pdfinfo}->{fuzzy_md5}->{$tags_md5} = 1;
$self->_set_tag($pms, 'PDFMD5FUZZY2', $tags_md5);
_set_tag($pms, 'PDFMD5FUZZY2', $tags_md5);
}
},
}
);
# ----------------------------------------
sub _parse_string {
local $_ = shift;
# Anything inside < > is hex encoded
if (/^</) {
# Might contain whitespace so search all hex values
my $str = '';
$str .= pack("H*", $1) while (/([0-9A-Fa-f]{2})/g);
$_ = $str;
# Handle/strip UTF-16 (in ultra-naive way for now)
s/\x00//g if (s/^(?:\xfe\xff|\xff\xfe)//);
} else {
s/^\(//; s/\)$//;
# Decode octals
# Author=\376\377\000H\000P\000_\000A\000d\000m\000i\000n\000i\000s\000t\000r\000a\000t\000o\000r
s/(?<!\\)\\([0-3][0-7][0-7])/pack("C",oct($1))/ge;
# Handle/strip UTF-16 (in ultra-naive way for now)
s/\x00//g if (s/^(?:\xfe\xff|\xff\xfe)//);
# Unescape some stuff like \\ \( \)
# Title(Foo \(bar\))
s/\\([()\\])/$1/g;
}
# Limit to some sane length
return substr($_, 0, 256);
}
sub _set_tag {
my ($pms, $tag, $value) = @_;
my ($self, $pms, $tag, $value) = @_;
dbg("pdfinfo: set_tag called for $tag $value");
return unless ($tag && $value);
return unless defined $value && $value ne '';
dbg("pdfinfo: set_tag called for $tag: $value");
if (exists $pms->{tag_data}->{$tag}) {
$pms->{tag_data}->{$tag} .= " $value"; # append value
# Limit to some sane length
if (length($pms->{tag_data}->{$tag}) < 2048) {
$pms->{tag_data}->{$tag} .= ' '.$value; # append value
}
}
else {
$pms->{tag_data}->{$tag} = $value;
}
}
# ----------------------------------------
sub _find_pdf_mime_parts {
my ($self,$pms) = @_;
# bail early if message does not have pdf parts
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
# initialize
$pms->{'pdfinfo'}->{"pc_pdf"} = 0;
$pms->{'pdfinfo'}->{"count_pdf"} = 0;
$pms->{'pdfinfo'}->{"count_pdf_images"} = 0;
my @parts = $pms->{msg}->find_parts(qr@^(image|application)/(pdf|octet\-stream)$@, 1);
my $part_count = scalar @parts;
dbg("pdfinfo: Identified $part_count possible mime parts that need checked for PDF content");
# cache this so we can easily bail
$pms->{'pdfinfo'}->{'no_parts'} = 1 unless $part_count;
foreach my $p (@parts) {
my $type = $p->{'type'} =~ m@/([\w\-]+)$@;
my $name = $p->{'name'} || '';
my $cte = lc( $p->get_header('content-transfer-encoding') || '' );
dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : '')."");
# make sure its a cte we support
next unless ($cte =~ /^(?:base64|quoted\-printable)$/);
# filename must end with .pdf, or application type can be pdf
# sometimes windows muas will wrap a pdf up inside a .dat file
# v0.8 - Added .fdf phoney PDF detection
next unless ($name =~ /\.[fp]df$/ || $type eq 'pdf');
# if we get this far, make sure type is pdf for sure (not octet-stream or anything else)
$type='pdf';
if ($type && exists $get_details{$type}) {
$get_details{$type}->($self, $pms, $p);
$pms->{'pdfinfo'}->{"count_$type"} ++;
}
}
$self->_set_tag($pms, 'PDFCOUNT', $pms->{'pdfinfo'}->{"count_pdf"});
$self->_set_tag($pms, 'PDFIMGCOUNT', $pms->{'pdfinfo'}->{"count_pdf_images"});
}
# ----------------------------------------
sub pdf_named {
my ($self,$pms,$body,$name) = @_;
return unless (defined $name);
my ($self, $pms, $body, $name) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 unless defined $name;
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
return 1 if (exists $pms->{'pdfinfo'}->{"names_pdf"}->{$name});
return 1 if exists $pms->{pdfinfo}->{names_pdf}->{$name};
return 0;
}
# -----------------------------------------
sub pdf_name_regex {
my ($self,$pms,$body,$re) = @_;
return unless (defined $re);
my ($self, $pms, $body, $regex) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 unless defined $regex;
return 0 unless exists $pms->{pdfinfo}->{names_pdf};
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
my ($rec, $err) = compile_regexp($re, 2);
my ($rec, $err) = compile_regexp($regex, 2);
if (!$rec) {
info("pdfinfo: invalid regexp '$re': $err");
my $rulename = $pms->get_current_eval_rule_name();
warn "pdfinfo: invalid regexp for $rulename '$regex': $err";
return 0;
}
my $hit = 0;
foreach my $name (keys %{$pms->{'pdfinfo'}->{"names_pdf"}}) {
foreach my $name (keys %{$pms->{pdfinfo}->{names_pdf}}) {
if ($name =~ $rec) {
dbg("pdfinfo: pdf_name_regex hit on $name");
return 1;
}
}
return 0;
}
# -----------------------------------------
sub pdf_is_encrypted {
my ($self,$pms,$body) = @_;
my ($self, $pms, $body) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return $pms->{'pdfinfo'}->{'encrypted'};
return $pms->{pdfinfo}->{encrypted} ? 1 : 0;
}
# -----------------------------------------
sub pdf_count {
my ($self,$pms,$body,$min,$max) = @_;
return unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf"});
my ($self, $pms, $body, $min, $max) = @_;
return _result_check($min, $max, $pms->{pdfinfo}->{count_pdf});
}
# -----------------------------------------
sub pdf_image_count {
my ($self,$pms,$body,$min,$max) = @_;
return unless defined $min;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf_images"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf_images"});
my ($self, $pms, $body, $min, $max) = @_;
return _result_check($min, $max, $pms->{pdfinfo}->{count_pdf_images});
}
# -----------------------------------------
sub pdf_pixel_coverage {
my ($self,$pms,$body,$min,$max) = @_;
return unless (defined $min);
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
# dbg("pdfinfo: pc_$type: $min, ".($max ? $max:'').", $type, ".$pms->{'pdfinfo'}->{"pc_pdf"});
return result_check($min, $max, $pms->{'pdfinfo'}->{"pc_pdf"});
return _result_check($min, $max, $pms->{pdfinfo}->{pc_pdf});
}
# -----------------------------------------
sub pdf_image_to_text_ratio {
my ($self,$pms,$body,$min,$max) = @_;
return unless (defined $min && defined $max);
my ($self, $pms, $body, $min, $max) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
return 0 unless defined $max;
return 0 unless $pms->{pdfinfo}->{pc_pdf};
# depending on how you call this eval (body vs rawbody),
# the $textlen will differ.
my $textlen = length(join('',@$body));
my $textlen = length(join('', @$body));
return 0 unless $textlen;
return 0 unless ( $textlen > 0 && exists $pms->{'pdfinfo'}->{"pc_pdf"} && $pms->{'pdfinfo'}->{"pc_pdf"} > 0);
my $ratio = $textlen / $pms->{'pdfinfo'}->{"pc_pdf"};
my $ratio = $textlen / $pms->{pdfinfo}->{pc_pdf};
dbg("pdfinfo: image ratio=$ratio, min=$min max=$max");
return result_check($min, $max, $ratio, 1);
}
# -----------------------------------------
return _result_check($min, $max, $ratio, 1);
}
sub pdf_is_empty_body {
my ($self,$pms,$body,$min) = @_;
my ($self, $pms, $body, $min) = @_;
return 0 unless $pms->{pdfinfo}->{count_pdf};
$min ||= 0; # default to 0 bytes
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless $pms->{'pdfinfo'}->{"count_pdf"};
# check for cached result
return 1 if $pms->{'pdfinfo'}->{"no_body_text"};
shift @$body; # shift body array removes line #1 -> subject line.
my $bytes = 0;
my $textlen = length(join('',@$body));
my $idx = 0;
foreach my $line (@$body) {
next unless ($line =~ m/\S/);
next if ($line =~ m/^Subject/);
next if $idx++ == 0; # skip subject line
next unless $line =~ /\S/;
$bytes += length($line);
# no hit if minimum already exceeded
return 0 if $bytes > $min;
}
dbg("pdfinfo: is_empty_body = $bytes bytes");
if ($bytes == 0 || ($bytes <= $min)) {
$pms->{'pdfinfo'}->{"no_body_text"} = 1;
dbg("pdfinfo: pdf_is_empty_body matched ($bytes <= $min)");
return 1;
}
# cache it and return 0
$pms->{'pdfinfo'}->{"no_body_text"} = 0;
return 0;
}
# -----------------------------------------
sub pdf_image_size_exact {
my ($self,$pms,$body,$height,$width) = @_;
return unless (defined $height && defined $width);
my ($self, $pms, $body, $height, $width) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 unless defined $width;
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
return 1 if (exists $pms->{'pdfinfo'}->{"dems_pdf"}->{"${height}x${width}"});
return 1 if exists $pms->{pdfinfo}->{dems_pdf}->{"${height}x${width}"};
return 0;
}
# -----------------------------------------
sub pdf_image_size_range {
my ($self,$pms,$body,$minh,$minw,$maxh,$maxw) = @_;
return unless (defined $minh && defined $minw);
my ($self, $pms, $body, $minh, $minw, $maxh, $maxw) = @_;
# make sure we have image data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 unless defined $minw;
return 0 unless exists $pms->{pdfinfo}->{dems_pdf};
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
foreach my $dem ( keys %{$pms->{'pdfinfo'}->{"dems_pdf"}}) {
my ($h,$w) = split(/x/,$dem);
foreach my $dem (keys %{$pms->{pdfinfo}->{dems_pdf}}) {
my ($h, $w) = split(/x/, $dem);
next if ($h < $minh); # height less than min height
next if ($w < $minw); # width less than min width
next if (defined $maxh && $h > $maxh); # height more than max height
next if (defined $maxw && $w > $maxw); # width more than max width
# if we make it here, we have a match
return 1;
}
@ -664,88 +578,54 @@ sub pdf_image_size_range {
return 0;
}
# -----------------------------------------
sub pdf_match_md5 {
my ($self, $pms, $body, $md5) = @_;
my ($self,$pms,$body,$md5) = @_;
return unless defined $md5;
return 0 unless defined $md5;
my $uc_md5 = uc($md5); # uppercase matches only
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"md5"});
return 1 if (exists $pms->{'pdfinfo'}->{"md5"}->{$uc_md5});
return 1 if exists $pms->{pdfinfo}->{md5}->{uc $md5};
return 0;
}
# -----------------------------------------
sub pdf_match_fuzzy_md5 {
my ($self, $pms, $body, $md5) = @_;
my ($self,$pms,$body,$md5) = @_;
return unless defined $md5;
return 0 unless defined $md5;
my $uc_md5 = uc($md5); # uppercase matches only
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{"fuzzy_md5"});
return 1 if (exists $pms->{'pdfinfo'}->{"fuzzy_md5"}->{$uc_md5});
return 1 if exists $pms->{pdfinfo}->{fuzzy_md5}->{uc $md5};
return 0;
}
# -----------------------------------------
sub pdf_match_details {
my ($self, $pms, $body, $detail, $regex) = @_;
return unless ($detail && $regex);
# make sure we have pdf data read in.
if (!exists $pms->{'pdfinfo'}) {
$self->_find_pdf_mime_parts($pms);
}
return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
return 0 unless (exists $pms->{'pdfinfo'}->{'details'});
my $check_value = $pms->{pdfinfo}->{details}->{$detail};
return unless $check_value;
return 0 unless defined $regex;
return 0 unless exists $pms->{pdfinfo}->{details}->{$detail};
my ($rec, $err) = compile_regexp($regex, 2);
if (!$rec) {
info("pdfinfo: invalid regexp '$regex': $err");
my $rulename = $pms->get_current_eval_rule_name();
warn "pdfinfo: invalid regexp for $rulename '$regex': $err";
return 0;
}
if ($check_value =~ $rec) {
dbg("pdfinfo: pdf_match_details $detail $regex matches $check_value");
foreach (keys %{$pms->{pdfinfo}->{details}->{$detail}}) {
if ($_ =~ $rec) {
dbg("pdfinfo: pdf_match_details $detail ($regex) match: $_");
return 1;
}
}
return 0;
}
# -----------------------------------------
sub result_check {
sub _result_check {
my ($min, $max, $value, $nomaxequal) = @_;
return 0 unless defined $value;
return 0 if ($value < $min);
return 0 if (defined $max && $value > $max);
return 0 if (defined $nomaxequal && $nomaxequal && $value == $max);
return 0 unless defined $min && defined $value;
return 0 if $value < $min;
return 0 if defined $max && $value > $max;
return 0 if defined $nomaxequal && $nomaxequal && $value == $max;
return 1;
}
# -----------------------------------------
1;

View File

@ -21,8 +21,10 @@ package Mail::SpamAssassin::Plugin::PhishTag;
use strict;
use warnings;
use re 'taint';
use Errno qw(EBADF);
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -220,7 +222,7 @@ PhishTag - SpamAssassin plugin for redirecting links in incoming emails.
=head1 DESCRIPTION
PhishTag enables administrators to rewrite links in emails that trigger certain
tests, preferably anti-phishing blacklist tests. The plugin will inhibit the
tests, preferably anti-phishing blocklist tests. The plugin will inhibit the
blocking of a portion of the emails that trigger the test by SpamAssassin, and
let them pass to the users' inbox after the rewrite. It is useful in providing
training to email users about company policies and general email usage.

View File

@ -1,6 +1,6 @@
#
# Author: Giovanni Bechis <gbechis@apache.org>
# Copyright 2018,2019 Giovanni Bechis
# Copyright 2018,2020 Giovanni Bechis
#
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
@ -31,6 +31,7 @@ Mail::SpamAssassin::Plugin::Phishing - check uris against phishing feed
ifplugin Mail::SpamAssassin::Plugin::Phishing
phishing_openphish_feed /etc/mail/spamassassin/openphish-feed.txt
phishing_phishtank_feed /etc/mail/spamassassin/phishtank-feed.csv
phishing_phishstats_feed /etc/mail/spamassassin/phishstats-feed.csv
body URI_PHISHING eval:check_phishing()
describe URI_PHISHING Url match phishing in feed
endif
@ -38,21 +39,25 @@ Mail::SpamAssassin::Plugin::Phishing - check uris against phishing feed
=head1 DESCRIPTION
This plugin finds uris used in phishing campaigns detected by
OpenPhish or PhishTank feeds.
OpenPhish, PhishTank or PhishStats feeds.
The Openphish free feed is updated every 6 hours and can be downloaded from
https://openphish.com/feed.txt.
The Premium Openphish feed is not currently supported.
The PhishTank free feed is updated every 1 hours and can be downloaded from
http://data.phishtank.com/data/online-valid.csv.
To avoid download limits a registration is required.
The PhishStats feed is updated every 90 minutes and can be downloaded from
https://phishstats.info/phish_score.csv.
=cut
package Mail::SpamAssassin::Plugin::Phishing;
use strict;
use warnings;
use re 'taint';
my $VERSION = 1.1;
use Errno qw(EBADF);
@ -61,7 +66,7 @@ use Mail::SpamAssassin::PerMsgStatus;
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub dbg { Mail::SpamAssassin::Plugin::dbg ("Phishing: @_"); }
sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("Phishing: $msg", @_); }
sub new {
my ($class, $mailsa) = @_;
@ -71,7 +76,7 @@ sub new {
bless ($self, $class);
$self->set_config($mailsa->{conf});
$self->register_eval_rule("check_phishing");
$self->register_eval_rule("check_phishing", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
return $self;
}
@ -81,14 +86,98 @@ sub set_config {
my @cmds;
push(@cmds, {
setting => 'phishing_openphish_feed',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
}
);
=head1 ADMIN PREFERENCES
The following options can be used in site-wide (C<local.cf>)
configuration files to customize how the module handles phishing uris
=cut
=over 4
=item phishing_openphish_feed
Absolute path of the downloaded OpenPhish datafeed.
=back
=cut
push(@cmds, {
setting => 'phishing_phishtank_feed',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
}
);
=over 4
=item phishing_phishtank_feed
Absolute path of the downloaded PhishTank datafeed.
=back
=cut
push(@cmds, {
setting => 'phishing_uri_noparam',
is_admin => 1,
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
}
);
=over 4
=item phishing_uri_noparam ( 0 | 1 ) (default: 0)
If this option is set uri parameters will not be take into consideration
when parsing the phishing uris datafeed.
If this option is enabled and the url without parameters is "generic"
(like https://www.kisa.link/url_redirector.php?url=...) the url will be
skipped.
=back
=cut
push(@cmds, {
setting => 'phishing_phishstats_feed',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
}
);
=over 4
=item phishing_phishstats_feed
Absolute path of the downloaded PhishStats datafeed.
=back
=cut
push(@cmds, {
setting => 'phishing_phishstats_minscore',
is_admin => 1,
default => 6,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
}
);
=over 4
=item phishing_phishstats_minscore ( 0 - 10 ) (default: 6)
Minimum score to take into consideration for phishing uris downloaded
from PhishStats datafeed.
=back
=cut
$conf->{parser}->register_commands(\@cmds);
}
@ -100,7 +189,8 @@ sub finish_parsing_end {
sub _read_configfile {
my ($self) = @_;
my $conf = $self->{main}->{registryboundaries}->{conf};
my @phtank_ln;
my (@phtank_ln, @phstats_ln);
my $stripped_cluri;
local *F;
if ( defined($conf->{phishing_openphish_feed}) && ( -f $conf->{phishing_openphish_feed} ) ) {
@ -109,10 +199,14 @@ sub _read_configfile {
chomp;
#lines that start with pound are comments
next if(/^\s*\#/);
$stripped_cluri = $_;
if ( $conf->{phishing_uri_noparam} eq 1 ) {
$stripped_cluri =~ s/\?.*//;
}
my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($_);
if ( defined $phishdomain ) {
push @{$self->{PHISHING}->{$_}->{phishdomain}}, $phishdomain;
push @{$self->{PHISHING}->{$_}->{phishinfo}->{$phishdomain}}, "OpenPhish";
push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "OpenPhish";
}
}
@ -133,11 +227,14 @@ sub _read_configfile {
@phtank_ln = split(/,/, $_);
$phtank_ln[1] =~ s/\"//g;
$stripped_cluri = $phtank_ln[1];
if ( $conf->{phishing_uri_noparam} eq 1 ) {
$stripped_cluri =~ s/\?.*//;
}
my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($phtank_ln[1]);
if ( defined $phishdomain ) {
push @{$self->{PHISHING}->{$phtank_ln[1]}->{phishdomain}}, $phishdomain;
push @{$self->{PHISHING}->{$phtank_ln[1]}->{phishinfo}->{$phishdomain}}, "PhishTank";
push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "PhishTank";
}
}
@ -146,6 +243,40 @@ sub _read_configfile {
: die "error reading config file: $!";
close(F) or die "error closing config file: $!";
}
if ( defined($conf->{phishing_phishstats_feed}) && (-f $conf->{phishing_phishstats_feed} ) ) {
open(F, '<', $conf->{phishing_phishstats_feed});
for ($!=0; <F>; $!=0) {
#skip first line
next if ( $. eq 1);
chomp;
#lines that start with pound are comments
next if(/^\s*\#/);
# CSV: Date,Score,URL,IP
@phstats_ln = split(/,/, $_);
$phstats_ln[1] =~ s/\"//g;
$phstats_ln[2] =~ s/\"//g;
if ( $conf->{phishing_phishstats_minscore} >= $phstats_ln[1] ) {
next;
}
$stripped_cluri = $phstats_ln[2];
if ( $conf->{phishing_uri_noparam} eq 1 ) {
$stripped_cluri =~ s/\?.*//;
}
my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($phstats_ln[2]);
if ( defined $phishdomain ) {
push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "PhishStats";
}
}
defined $_ || $!==0 or
$!==EBADF ? dbg("PHISHING: error reading config file: $!")
: die "error reading config file: $!";
close(F) or die "error closing config file: $!";
}
}
sub check_phishing {
@ -153,10 +284,11 @@ sub check_phishing {
my $feedname;
my $domain;
my $stripped_cluri;
my $dcnt;
my $uris = $pms->get_uri_detail_list();
my $rulename = $pms->get_current_eval_rule_name();
while (my($uri, $info) = each %{$uris}) {
# we want to skip mailto: uris
next if ($uri =~ /^mailto:/i);
@ -166,12 +298,21 @@ sub check_phishing {
if (($info->{types}->{a}) || ($info->{types}->{parsed})) {
# check url
foreach my $cluri (@{$info->{cleaned}}) {
if ( exists $self->{PHISHING}->{$cluri} ) {
$stripped_cluri = $cluri;
if( $self->{main}->{conf}->{phishing_uri_noparam} eq 1 ) {
$stripped_cluri =~ s/\?.*//;
$dcnt = $stripped_cluri =~ tr/\///;
}
# If uri without parameters are considered, skip too short uris
# like https://www.google.com/url?sa=t&url=http://badsite.com
if( ($self->{main}->{conf}->{phishing_uri_noparam} eq 1) && ($dcnt <= 3) ) {
next;
}
if ( exists $self->{PHISHING}->{$stripped_cluri} ) {
$domain = $self->{main}->{registryboundaries}->uri_to_domain($cluri);
$feedname = $self->{PHISHING}->{$cluri}->{phishinfo}->{$domain}[0];
dbg("HIT! $domain [$cluri] found in $feedname feed");
$pms->test_log("$feedname ($domain)");
$pms->got_hit($rulename, "", ruletype => 'eval');
$feedname = $self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$domain}[0];
dbg("HIT! $domain [$stripped_cluri] found in $feedname feed");
$pms->test_log("$feedname ($domain)", $rulename);
return 1;
}
}

View File

@ -37,13 +37,17 @@ package Mail::SpamAssassin::Plugin::Pyzor;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Timeout;
use Mail::SpamAssassin::Util qw(untaint_var untaint_file_path
proc_status_ok exit_status_str);
use Mail::SpamAssassin::SubProcBackChannel;
use Mail::SpamAssassin::Util qw(untaint_var untaint_file_path am_running_on_windows
proc_status_ok exit_status_str force_die);
use strict;
use warnings;
# use bytes;
use re 'taint';
use Storable;
use POSIX qw(PIPE_BUF WNOHANG);
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub new {
@ -64,7 +68,7 @@ sub new {
dbg("pyzor: network tests on, attempting Pyzor");
}
$self->register_eval_rule("check_pyzor");
$self->register_eval_rule("check_pyzor", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->set_config($mailsaobject->{conf});
@ -87,11 +91,27 @@ Whether to use Pyzor, if it is available.
push (@cmds, {
setting => 'use_pyzor',
is_admin => 1,
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
=item pyzor_max NUMBER (default: 5)
=item pyzor_fork (0|1) (default: 1)
Instead of running Pyzor synchronously, fork separate process for it and
read the results in later (similar to async DNS lookups). Increases
throughput. Considered experimental on Windows, where default is 0.
=cut
push(@cmds, {
setting => 'pyzor_fork',
is_admin => 1,
default => am_running_on_windows()?0:1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=item pyzor_count_min NUMBER (default: 5)
This option sets how often a message's body checksum must have been
reported to the Pyzor server before SpamAssassin will consider the Pyzor
@ -103,18 +123,69 @@ set this to a relatively low value, e.g. C<5>.
=cut
push (@cmds, {
setting => 'pyzor_max',
setting => 'pyzor_count_min',
is_admin => 1,
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
# Deprecated setting, the name makes no sense!
push (@cmds, {
setting => 'pyzor_max',
is_admin => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
code => sub {
my ($self, $key, $value, $line) = @_;
warn("deprecated setting used, change pyzor_max to pyzor_count_min\n");
if ($value !~ /^\d+$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{pyzor_count_min} = $value;
}
});
=item pyzor_welcomelist_min NUMBER (default: 10)
Previously pyzor_whitelist_min which will work interchangeably until 4.1.
This option sets how often a message's body checksum must have been
welcomelisted to the Pyzor server for SpamAssassin to consider ignoring the
result. Final decision is made by pyzor_welcomelist_factor.
=cut
push (@cmds, {
setting => 'pyzor_welcomelist_min',
aliases => ['pyzor_whitelist_min'], # removed in 4.1
is_admin => 1,
default => 10,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item pyzor_welcomelist_factor NUMBER (default: 0.2)
Previously pyzor_whitelist_factor which will work interchangeably until 4.1.
Ignore Pyzor result if REPORTCOUNT x NUMBER >= pyzor_welcomelist_min.
For default setting this means: 50 reports requires 10 welcomelistings.
=cut
push (@cmds, {
setting => 'pyzor_welcomelist_factor',
aliases => ['pyzor_whitelist_factor'], # removed in 4.1
is_admin => 1,
default => 0.2,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=back
=head1 ADMINISTRATOR OPTIONS
=over 4
=item pyzor_timeout n (default: 3.5)
=item pyzor_timeout n (default: 5)
How many seconds you wait for Pyzor to complete, before scanning continues
without the Pyzor results. A numeric value is optionally suffixed by a
@ -142,7 +213,7 @@ removing one of them.
push (@cmds, {
setting => 'pyzor_timeout',
is_admin => 1,
default => 3.5,
default => 5,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_DURATION
});
@ -202,178 +273,339 @@ you should use this, as the current PATH will have been cleared.
sub is_pyzor_available {
my ($self) = @_;
my $pyzor = $self->{main}->{conf}->{pyzor_path} || '';
unless ($pyzor) {
$pyzor = Mail::SpamAssassin::Util::find_executable_in_env_path('pyzor');
}
my $pyzor = $self->{main}->{conf}->{pyzor_path} ||
Mail::SpamAssassin::Util::find_executable_in_env_path('pyzor');
unless ($pyzor && -x $pyzor) {
dbg("pyzor: pyzor is not available: no pyzor executable found");
dbg("pyzor: no pyzor executable found");
$self->{pyzor_available} = 0;
return 0;
}
# remember any found pyzor
$self->{main}->{conf}->{pyzor_path} = $pyzor;
dbg("pyzor: pyzor is available: " . $self->{main}->{conf}->{pyzor_path});
dbg("pyzor: pyzor is available: $pyzor");
return 1;
}
sub get_pyzor_interface {
my ($self) = @_;
sub finish_parsing_start {
my ($self, $opts) = @_;
if (!$self->{main}->{conf}->{use_pyzor}) {
dbg("pyzor: use_pyzor option not enabled, disabling Pyzor");
$self->{pyzor_interface} = "disabled";
$self->{pyzor_available} = 0;
# If forking, hard adjust priority -100 to launch early
# Find rulenames from eval_to_rule mappings
if ($opts->{conf}->{pyzor_fork}) {
foreach (@{$opts->{conf}->{eval_to_rule}->{check_pyzor}}) {
dbg("pyzor: adjusting rule $_ priority to -100");
$opts->{conf}->{priority}->{$_} = -100;
}
elsif ($self->is_pyzor_available()) {
$self->{pyzor_interface} = "pyzor";
$self->{pyzor_available} = 1;
}
else {
dbg("pyzor: no pyzor found, disabling Pyzor");
$self->{pyzor_available} = 0;
}
}
sub check_pyzor {
my ($self, $permsgstatus, $full) = @_;
my ($self, $pms, $full) = @_;
# initialize valid tags
$permsgstatus->{tag_data}->{PYZOR} = "";
return 0 if !$self->{pyzor_available};
return 0 if !$self->{main}->{conf}->{use_pyzor};
return 0 if $pms->{pyzor_running};
$pms->{pyzor_running} = 1;
return 0 if !$self->is_pyzor_available();
my $timer = $self->{main}->time_method("check_pyzor");
$self->get_pyzor_interface();
return 0 unless $self->{pyzor_available};
# initialize valid tags
$pms->{tag_data}->{PYZOR} = '';
return $self->pyzor_lookup($permsgstatus, $full);
# create fulltext tmpfile now (before possible forking)
$pms->{pyzor_tmpfile} = $pms->create_fulltext_tmpfile();
## non-forking method
if (!$self->{main}->{conf}->{pyzor_fork}) {
my @results = $self->pyzor_lookup($pms);
return $self->_check_result($pms, \@results);
}
## forking method
$pms->{pyzor_rulename} = $pms->get_current_eval_rule_name();
# create socketpair for communication
$pms->{pyzor_backchannel} = Mail::SpamAssassin::SubProcBackChannel->new();
my $back_selector = '';
$pms->{pyzor_backchannel}->set_selector(\$back_selector);
eval {
$pms->{pyzor_backchannel}->setup_backchannel_parent_pre_fork();
} or do {
dbg("pyzor: backchannel pre-setup failed: $@");
delete $pms->{pyzor_backchannel};
return 0;
};
my $pid = fork();
if (!defined $pid) {
info("pyzor: child fork failed: $!");
delete $pms->{pyzor_backchannel};
return 0;
}
if (!$pid) {
$0 = "$0 (pyzor)";
$SIG{CHLD} = 'DEFAULT';
$SIG{PIPE} = 'IGNORE';
$SIG{$_} = sub {
eval { dbg("pyzor: child process $$ caught signal $_[0]"); };
force_die(6); # avoid END and destructor processing
} foreach am_running_on_windows()?qw(INT HUP TERM QUIT):qw(INT HUP TERM TSTP QUIT USR1 USR2);
dbg("pyzor: child process $$ forked");
$pms->{pyzor_backchannel}->setup_backchannel_child_post_fork();
my @results = $self->pyzor_lookup($pms);
my $backmsg;
eval {
$backmsg = Storable::freeze(\@results);
};
if ($@) {
dbg("pyzor: child return value freeze failed: $@");
force_die(0); # avoid END and destructor processing
}
if (!syswrite($pms->{pyzor_backchannel}->{parent}, $backmsg)) {
dbg("pyzor: child backchannel write failed: $!");
}
force_die(0); # avoid END and destructor processing
}
$pms->{pyzor_pid} = $pid;
eval {
$pms->{pyzor_backchannel}->setup_backchannel_parent_post_fork($pid);
} or do {
dbg("pyzor: backchannel post-setup failed: $@");
delete $pms->{pyzor_backchannel};
return 0;
};
return; # return undef for async status
}
sub pyzor_lookup {
my ($self, $permsgstatus, $fulltext) = @_;
my @response;
my $pyzor_count;
my $pyzor_whitelisted;
my $timeout = $self->{main}->{conf}->{pyzor_timeout};
my ($self, $pms) = @_;
$pyzor_count = 0;
$pyzor_whitelisted = 0;
my $pid;
# use a temp file here -- open2() is unreliable, buffering-wise, under spamd
my $tmpf = $permsgstatus->create_fulltext_tmpfile($fulltext);
my $conf = $self->{main}->{conf};
my $timeout = $conf->{pyzor_timeout};
# note: not really tainted, this came from system configuration file
my $path = untaint_file_path($self->{main}->{conf}->{pyzor_path});
my $opts = untaint_var($self->{main}->{conf}->{pyzor_options}) || '';
my $path = untaint_file_path($conf->{pyzor_path});
my $opts = untaint_var($conf->{pyzor_options}) || '';
$permsgstatus->enter_helper_run_mode();
$pms->enter_helper_run_mode();
my $pid;
my @resp;
my $timer = Mail::SpamAssassin::Timeout->new(
{ secs => $timeout, deadline => $permsgstatus->{master_deadline} });
{ secs => $timeout, deadline => $pms->{master_deadline} });
my $err = $timer->run_and_catch(sub {
local $SIG{PIPE} = sub { die "__brokenpipe__ignore__\n" };
dbg("pyzor: opening pipe: " . join(' ', $path, $opts, "check", "< $tmpf"));
dbg("pyzor: opening pipe: ".
join(' ', $path, $opts, "check", "<".$pms->{pyzor_tmpfile}));
$pid = Mail::SpamAssassin::Util::helper_app_pipe_open(*PYZOR,
$tmpf, 1, $path, split(' ', $opts), "check");
$pms->{pyzor_tmpfile}, 1, $path, split(' ', $opts), "check");
$pid or die "$!\n";
# read+split avoids a Perl I/O bug (Bug 5985)
my($inbuf,$nread,$resp); $resp = '';
while ( $nread=read(PYZOR,$inbuf,8192) ) { $resp .= $inbuf }
my($inbuf, $nread);
my $resp = '';
while ($nread = read(PYZOR, $inbuf, 8192)) { $resp .= $inbuf }
defined $nread or die "error reading from pipe: $!";
@response = split(/^/m, $resp, -1); undef $resp;
@resp = split(/^/m, $resp, -1);
my $errno = 0; close PYZOR or $errno = $!;
if (proc_status_ok($?,$errno)) {
my $errno = 0;
close PYZOR or $errno = $!;
if (proc_status_ok($?, $errno)) {
dbg("pyzor: [%s] finished successfully", $pid);
} elsif (proc_status_ok($?,$errno, 0,1)) { # sometimes it exits with 1
dbg("pyzor: [%s] finished: %s", $pid, exit_status_str($?,$errno));
} elsif (proc_status_ok($?, $errno, 0, 1)) { # sometimes it exits with 1
dbg("pyzor: [%s] finished: %s", $pid, exit_status_str($?, $errno));
} else {
info("pyzor: [%s] error: %s", $pid, exit_status_str($?,$errno));
}
if (!@response) {
# this exact string is needed below
warn("no response\n"); # yes, this is possible
return;
}
chomp for @response;
if ($response[0] =~ /^Traceback/) {
warn("internal error, python traceback seen in response: ".
join("\\n", @response));
} else {
dbg("pyzor: got response: ".join("\\n", @response));
info("pyzor: [%s] error: %s", $pid, exit_status_str($?, $errno));
}
});
if (defined(fileno(*PYZOR))) { # still open
if ($pid) {
if (kill('TERM',$pid)) { dbg("pyzor: killed stale helper [$pid]") }
else { dbg("pyzor: killing helper application [$pid] failed: $!") }
if (kill('TERM', $pid)) {
dbg("pyzor: killed stale helper [$pid]");
} else {
dbg("pyzor: killing helper application [$pid] failed: $!");
}
my $errno = 0; close PYZOR or $errno = $!;
proc_status_ok($?,$errno)
or info("pyzor: [%s] error: %s", $pid, exit_status_str($?,$errno));
}
$permsgstatus->leave_helper_run_mode();
my $errno = 0;
close PYZOR or $errno = $!;
proc_status_ok($?, $errno)
or info("pyzor: [%s] error: %s", $pid, exit_status_str($?, $errno));
}
$pms->leave_helper_run_mode();
if ($timer->timed_out()) {
dbg("pyzor: check timed out after $timeout seconds");
return 0;
}
if ($err) {
return ();
} elsif ($err) {
chomp $err;
if ($err eq "__brokenpipe__ignore__") {
dbg("pyzor: check failed: broken pipe");
} elsif ($err eq "no response") {
dbg("pyzor: check failed: no response");
} else {
warn("pyzor: check failed: $err\n");
info("pyzor: check failed: $err");
return ();
}
return @resp;
}
sub check_tick {
my ($self, $opts) = @_;
$self->_check_forked_result($opts->{permsgstatus}, 0);
}
sub check_cleanup {
my ($self, $opts) = @_;
$self->_check_forked_result($opts->{permsgstatus}, 1);
}
sub _check_forked_result {
my ($self, $pms, $finish) = @_;
return 0 if !$pms->{pyzor_backchannel};
return 0 if !$pms->{pyzor_pid};
my $timer = $self->{main}->time_method("check_pyzor");
$pms->{pyzor_abort} = $pms->{deadline_exceeded} || $pms->{shortcircuited};
my $kid_pid = $pms->{pyzor_pid};
# if $finish, force waiting for the child
my $pid = waitpid($kid_pid, $finish && !$pms->{pyzor_abort} ? 0 : WNOHANG);
if ($pid == 0) {
#dbg("pyzor: child process $kid_pid not finished yet, trying later");
if ($pms->{pyzor_abort}) {
dbg("pyzor: bailing out due to deadline/shortcircuit");
kill('TERM', $kid_pid);
if (waitpid($kid_pid, WNOHANG) == 0) {
sleep(1);
if (waitpid($kid_pid, WNOHANG) == 0) {
dbg("pyzor: child process $kid_pid still alive, KILL");
kill('KILL', $kid_pid);
waitpid($kid_pid, 0);
}
}
delete $pms->{pyzor_pid};
delete $pms->{pyzor_backchannel};
}
return 0;
} elsif ($pid == -1) {
# child does not exist?
dbg("pyzor: child process $kid_pid already handled?");
delete $pms->{pyzor_backchannel};
return 0;
}
foreach my $one_response (@response) {
$pms->rule_ready($pms->{pyzor_rulename}); # mark rule ready for metas
dbg("pyzor: child process $kid_pid finished, reading results");
my $backmsg;
my $ret = sysread($pms->{pyzor_backchannel}->{latest_kid_fh}, $backmsg, am_running_on_windows()?512:PIPE_BUF);
if (!defined $ret || $ret == 0) {
dbg("pyzor: could not read result from child: ".($ret == 0 ? 0 : $!));
delete $pms->{pyzor_backchannel};
return 0;
}
delete $pms->{pyzor_backchannel};
my $results;
eval {
$results = Storable::thaw($backmsg);
};
if ($@) {
dbg("pyzor: child return value thaw failed: $@");
return;
}
$self->_check_result($pms, $results);
}
sub _check_result {
my ($self, $pms, $results) = @_;
if (!@$results) {
dbg("pyzor: no response from server");
return 0;
}
my $count = 0;
my $count_wl = 0;
foreach my $res (@$results) {
chomp($res);
if ($res =~ /^Traceback/) {
info("pyzor: internal error, python traceback seen in response: $res");
return 0;
}
dbg("pyzor: got response: $res");
# this regexp is intended to be a little bit forgiving
if ($one_response =~ /^\S+\t.*?\t(\d+)\t(\d+)\s*$/) {
if ($res =~ /^\S+\t.*?\t(\d+)\t(\d+)\s*$/) {
# until pyzor servers can sync their DBs,
# sum counts obtained from all servers
$pyzor_whitelisted += $2+0;
$pyzor_count += $1+0;
}
else {
$count += untaint_var($1)+0; # crazy but needs untainting
$count_wl += untaint_var($2)+0;
} else {
# warn on failures to parse
dbg("pyzor: failure to parse response \"$one_response\"");
info("pyzor: failure to parse response \"$res\"");
}
}
$permsgstatus->set_tag('PYZOR', $pyzor_whitelisted ? "Whitelisted."
: "Reported $pyzor_count times.");
my $conf = $self->{main}->{conf};
if ($pyzor_count >= $self->{main}->{conf}->{pyzor_max}) {
dbg("pyzor: listed: COUNT=$pyzor_count/$self->{main}->{conf}->{pyzor_max} WHITELIST=$pyzor_whitelisted");
my $count_min = $conf->{pyzor_count_min};
my $wl_min = $conf->{pyzor_welcomelist_min};
my $wl_limit = $count_wl >= $wl_min ?
$count * $conf->{pyzor_welcomelist_factor} : 0;
dbg("pyzor: result: COUNT=$count/$count_min WELCOMELIST=$count_wl/$wl_min/%.1f",
$wl_limit);
$pms->set_tag('PYZOR', "Reported $count times, welcomelisted $count_wl times.");
# Empty body etc results in same hash, we should skip very large numbers..
if ($count >= 1000000 || $count_wl >= 10000) {
dbg("pyzor: result exceeded hardcoded limits, ignoring: count/wl 1000000/10000");
return 0;
}
# Welcomelisted?
if ($wl_limit && $count_wl >= $wl_limit) {
dbg("pyzor: message welcomelisted");
return 0;
}
if ($count >= $count_min) {
if ($conf->{pyzor_fork}) {
# forked needs to run got_hit()
$pms->got_hit($pms->{pyzor_rulename}, "", ruletype => 'eval');
return 0;
}
return 1;
}
return 0;
}
sub plugin_report {
my ($self, $options) = @_;
return unless $self->{pyzor_available};
return unless $self->{main}->{conf}->{use_pyzor};
return if !$self->{pyzor_available};
return if !$self->{main}->{conf}->{use_pyzor};
return if $options->{report}->{options}->{dont_report_to_pyzor};
return if !$self->is_pyzor_available();
if (!$options->{report}->{options}->{dont_report_to_pyzor} && $self->is_pyzor_available())
{
# use temporary file: open2() is unreliable due to buffering under spamd
my $tmpf = $options->{report}->create_fulltext_tmpfile($options->{text});
if ($self->pyzor_report($options, $tmpf)) {
@ -384,8 +616,9 @@ sub plugin_report {
else {
info("reporter: could not report spam to Pyzor");
}
$options->{report}->delete_fulltext_tmpfile();
}
$options->{report}->delete_fulltext_tmpfile($tmpf);
return 1;
}
sub pyzor_report {
@ -449,6 +682,9 @@ sub pyzor_report {
return 1;
}
# Version features
sub has_fork { 1 }
1;
=back

View File

@ -43,11 +43,16 @@ package Mail::SpamAssassin::Plugin::Razor2;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Timeout;
use Mail::SpamAssassin::SubProcBackChannel;
use Mail::SpamAssassin::Util qw(force_die am_running_on_windows);
use strict;
use warnings;
# use bytes;
use re 'taint';
use Storable;
use POSIX qw(PIPE_BUF WNOHANG);
our @ISA = qw(Mail::SpamAssassin::Plugin);
sub new {
@ -73,8 +78,8 @@ sub new {
}
}
$self->register_eval_rule("check_razor2");
$self->register_eval_rule("check_razor2_range");
$self->register_eval_rule("check_razor2", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->register_eval_rule("check_razor2_range", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
$self->set_config($mailsaobject->{conf});
@ -93,10 +98,26 @@ Whether to use Razor2, if it is available.
push(@cmds, {
setting => 'use_razor2',
is_admin => 1,
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=item razor_fork (0|1) (default: 1)
Instead of running Razor2 synchronously, fork separate process for it and
read the results in later (similar to async DNS lookups). Increases
throughput. Considered experimental on Windows, where default is 0.
=cut
push(@cmds, {
setting => 'razor_fork',
is_admin => 1,
default => am_running_on_windows()?0:1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
});
=back
=head1 ADMINISTRATOR SETTINGS
@ -195,7 +216,7 @@ sub razor2_access {
my $sigs = $rc->compute_sigs($objects)
or die "$debug: error in compute_sigs";
# if mail isn't whitelisted, check it out
# if mail isn't welcomelisted, check it out
# see 'man razor-whitelist'
if ($type ne 'check' || ! $rc->local_check($objects->[0])) {
# provide a better error message when servers are unavailable,
@ -347,6 +368,8 @@ sub plugin_report {
return unless $self->{main}->{conf}->{use_razor2};
return if $options->{report}->{options}->{dont_report_to_razor};
my $timer = $self->{main}->time_method("razor2_report");
if ($self->razor2_access($options->{text}, 'report', undef)) {
$options->{report}->{report_available} = 1;
info('reporter: spam reported to Razor');
@ -360,6 +383,8 @@ sub plugin_report {
sub plugin_revoke {
my ($self, $options) = @_;
my $timer = $self->{main}->time_method("razor2_revoke");
return unless $self->{razor2_available};
return if $self->{main}->{local_tests_only};
return unless $self->{main}->{conf}->{use_razor2};
@ -375,34 +400,191 @@ sub plugin_revoke {
}
}
sub finish_parsing_start {
my ($self, $opts) = @_;
# If forking, hard adjust priority -100 to launch early
# Find rulenames from eval_to_rule mappings
if ($opts->{conf}->{razor_fork}) {
foreach (@{$opts->{conf}->{eval_to_rule}->{check_razor2}}) {
dbg("razor2: adjusting rule $_ priority to -100");
$opts->{conf}->{priority}->{$_} = -100;
}
foreach (@{$opts->{conf}->{eval_to_rule}->{check_razor2_range}}) {
dbg("razor2: adjusting rule $_ priority to -100");
$opts->{conf}->{priority}->{$_} = -100;
}
}
}
sub check_razor2 {
my ($self, $permsgstatus, $full) = @_;
my ($self, $pms, $full) = @_;
return $permsgstatus->{razor2_result} if (defined $permsgstatus->{razor2_result});
$permsgstatus->{razor2_result} = 0;
$permsgstatus->{razor2_cf_score} = { '4' => 0, '8' => 0 };
return 0 unless $self->{razor2_available};
return 0 unless $self->{main}->{conf}->{use_razor2};
return unless $self->{razor2_available};
return unless $self->{main}->{conf}->{use_razor2};
return $pms->{razor2_result} if (defined $pms->{razor2_result});
return 0 if $pms->{razor2_running};
$pms->{razor2_running} = 1;
my $timer = $self->{main}->time_method("check_razor2");
my $return;
my @results;
## non-forking method
if (!$self->{main}->{conf}->{razor_fork}) {
# TODO: check for cache header, set results appropriately
# do it this way to make it easier to get out the results later from the
# netcache plugin
($return, @results) =
$self->razor2_access($full, 'check', $permsgstatus->{master_deadline});
# netcache plugin ... what netcache plugin?
(undef, my @results) =
$self->razor2_access($full, 'check', $pms->{master_deadline});
return $self->_check_result($pms, \@results);
}
## forking method
$pms->{razor2_rulename} = $pms->get_current_eval_rule_name();
# create socketpair for communication
$pms->{razor2_backchannel} = Mail::SpamAssassin::SubProcBackChannel->new();
my $back_selector = '';
$pms->{razor2_backchannel}->set_selector(\$back_selector);
eval {
$pms->{razor2_backchannel}->setup_backchannel_parent_pre_fork();
} or do {
dbg("razor2: backchannel pre-setup failed: $@");
delete $pms->{razor2_backchannel};
return 0;
};
my $pid = fork();
if (!defined $pid) {
info("razor2: child fork failed: $!");
delete $pms->{razor2_backchannel};
return 0;
}
if (!$pid) {
$0 = "$0 (razor2)";
$SIG{CHLD} = 'DEFAULT';
$SIG{PIPE} = 'IGNORE';
$SIG{$_} = sub {
eval { dbg("razor2: child process $$ caught signal $_[0]"); };
force_die(6); # avoid END and destructor processing
} foreach am_running_on_windows()?qw(INT HUP TERM QUIT):qw(INT HUP TERM TSTP QUIT USR1 USR2);
dbg("razor2: child process $$ forked");
$pms->{razor2_backchannel}->setup_backchannel_child_post_fork();
(undef, my @results) =
$self->razor2_access($full, 'check', $pms->{master_deadline});
my $backmsg;
eval {
$backmsg = Storable::freeze(\@results);
};
if ($@) {
dbg("razor2: child return value freeze failed: $@");
force_die(0); # avoid END and destructor processing
}
if (!syswrite($pms->{razor2_backchannel}->{parent}, $backmsg)) {
dbg("razor2: child backchannel write failed: $!");
}
force_die(0); # avoid END and destructor processing
}
$pms->{razor2_pid} = $pid;
eval {
$pms->{razor2_backchannel}->setup_backchannel_parent_post_fork($pid);
} or do {
dbg("razor2: backchannel post-setup failed: $@");
delete $pms->{razor2_backchannel};
return 0;
};
return; # return undef for async status
}
sub check_tick {
my ($self, $opts) = @_;
$self->_check_forked_result($opts->{permsgstatus}, 0);
}
sub check_cleanup {
my ($self, $opts) = @_;
$self->_check_forked_result($opts->{permsgstatus}, 1);
}
sub _check_forked_result {
my ($self, $pms, $finish) = @_;
return 0 if !$pms->{razor2_backchannel};
return 0 if !$pms->{razor2_pid};
my $timer = $self->{main}->time_method("check_razor2");
$pms->{razor2_abort} = $pms->{deadline_exceeded} || $pms->{shortcircuited};
my $kid_pid = $pms->{razor2_pid};
# if $finish, force waiting for the child
my $pid = waitpid($kid_pid, $finish && !$pms->{razor2_abort} ? 0 : WNOHANG);
if ($pid == 0) {
#dbg("razor2: child process $kid_pid not finished yet, trying later");
if ($pms->{razor2_abort}) {
dbg("razor2: bailing out due to deadline/shortcircuit");
kill('TERM', $kid_pid);
if (waitpid($kid_pid, WNOHANG) == 0) {
sleep(1);
if (waitpid($kid_pid, WNOHANG) == 0) {
dbg("razor2: child process $kid_pid still alive, KILL");
kill('KILL', $kid_pid);
waitpid($kid_pid, 0);
}
}
delete $pms->{razor2_pid};
delete $pms->{razor2_backchannel};
}
return 0;
} elsif ($pid == -1) {
# child does not exist?
dbg("razor2: child process $kid_pid already handled?");
delete $pms->{razor2_backchannel};
return 0;
}
$pms->rule_ready($pms->{razor2_rulename}); # mark rule ready for metas
dbg("razor2: child process $kid_pid finished, reading results");
my $backmsg;
my $ret = sysread($pms->{razor2_backchannel}->{latest_kid_fh}, $backmsg, am_running_on_windows()?512:PIPE_BUF);
if (!defined $ret || $ret == 0) {
dbg("razor2: could not read result from child: ".($ret == 0 ? 0 : $!));
delete $pms->{razor2_backchannel};
return 0;
}
delete $pms->{razor2_backchannel};
my $results;
eval {
$results = Storable::thaw($backmsg);
};
if ($@) {
dbg("razor2: child return value thaw failed: $@");
return;
}
$self->_check_result($pms, $results);
}
sub _check_result {
my ($self, $pms, $results) = @_;
$self->{main}->call_plugins ('process_razor_result',
{ results => \@results, permsgstatus => $permsgstatus }
{ results => $results, permsgstatus => $pms }
);
foreach my $result (@results) {
foreach my $result (@$results) {
if (exists $result->{result}) {
$permsgstatus->{razor2_result} = $result->{result} if $result->{result};
$pms->{razor2_result} = $result->{result} if $result->{result};
}
elsif ($result->{noresponse}) {
dbg('razor2: part=' . $result->{part} . ' noresponse');
@ -415,46 +597,86 @@ sub check_razor2 {
next if $result->{contested};
my $cf = $permsgstatus->{razor2_cf_score}->{$result->{engine}} || 0;
my $cf = $pms->{razor2_cf_score}->{$result->{engine}} || 0;
if ($result->{confidence} > $cf) {
$permsgstatus->{razor2_cf_score}->{$result->{engine}} = $result->{confidence};
$pms->{razor2_cf_score}->{$result->{engine}} = $result->{confidence};
}
}
}
dbg("razor2: results: spam? " . $permsgstatus->{razor2_result});
while(my ($engine, $cf) = each %{$permsgstatus->{razor2_cf_score}}) {
$pms->{razor2_result} ||= 0;
$pms->{razor2_cf_score} ||= {};
dbg("razor2: results: spam? " . $pms->{razor2_result});
while(my ($engine, $cf) = each %{$pms->{razor2_cf_score}}) {
dbg("razor2: results: engine $engine, highest cf score: $cf");
}
return $permsgstatus->{razor2_result};
if ($self->{main}->{conf}->{razor_fork}) {
# forked needs to run got_hit()
if ($pms->{razor2_rulename} && $pms->{razor2_result}) {
$pms->got_hit($pms->{razor2_rulename}, "", ruletype => 'eval');
}
# forked needs to run range callbacks
if ($pms->{razor2_range_callbacks}) {
foreach (@{$pms->{razor2_range_callbacks}}) {
$self->check_razor2_range($pms, '', @$_);
}
}
}
return $pms->{razor2_result};
}
# Check the cf value of a given message and return if it's within the
# given range
sub check_razor2_range {
my ($self, $permsgstatus, $body, $engine, $min, $max) = @_;
my ($self, $pms, $body, $engine, $min, $max, $rulename) = @_;
# If Razor2 isn't available, or the general test is disabled, don't
# continue.
return unless $self->{razor2_available};
return unless $self->{main}->{conf}->{use_razor2};
return unless $self->{main}->{conf}->{scores}->{'RAZOR2_CHECK'};
return 0 unless $self->{razor2_available};
return 0 unless $self->{main}->{conf}->{use_razor2};
# If Razor2 hasn't been checked yet, go ahead and run it.
unless (defined $permsgstatus->{razor2_result}) {
$self->check_razor2($permsgstatus, $body);
# Check if callback overriding rulename
if (!defined $rulename) {
$rulename = $pms->get_current_eval_rule_name();
}
if ($pms->{razor2_abort}) {
$pms->rule_ready($rulename); # mark rule ready for metas
return;
}
# If forked, call back later unless results are in
if ($self->{main}->{conf}->{razor_fork}) {
if (!defined $pms->{razor2_result}) {
dbg("razor2: delaying check_razor2_range call for $rulename");
# array matches check_razor2_range() argument order
push @{$pms->{razor2_range_callbacks}},
[$engine, $min, $max, $rulename];
return; # return undef for async status
}
} else {
# If Razor2 hasn't been checked yet, go ahead and run it.
# (only if we are non-forking.. forking will handle these in
# callbacks)
if (!$pms->{razor2_running}) {
$self->check_razor2($pms, $body);
}
}
$pms->rule_ready($rulename); # mark rule ready for metas
my $cf = 0;
if ($engine) {
$cf = $permsgstatus->{razor2_cf_score}->{$engine};
return unless defined $cf;
$cf = $pms->{razor2_cf_score}->{$engine};
return 0 unless defined $cf;
}
else {
# If no specific engine was given to the rule, find the highest cf
# determined and use that
while(my ($engine, $ecf) = each %{$permsgstatus->{razor2_cf_score}}) {
while(my ($engine, $ecf) = each %{$pms->{razor2_cf_score}}) {
if ($ecf > $cf) {
$cf = $ecf;
}
@ -462,13 +684,20 @@ sub check_razor2_range {
}
if ($cf >= $min && $cf <= $max) {
$permsgstatus->test_log(sprintf("cf: %3d", $cf));
my $cf_str = sprintf("cf: %3d", $cf);
$pms->test_log($cf_str, $rulename);
if ($self->{main}->{conf}->{razor_fork}) {
$pms->got_hit($rulename, "", ruletype => 'eval');
}
return 1;
}
return;
return 0;
}
# Version features
sub has_fork { 1 }
1;
=back

View File

@ -53,10 +53,8 @@ Following metadata headers and tags are added:
=head1 REQUIREMENT
This plugin requires the GeoIP2, Geo::IP, IP::Country::DB_File or
IP::Country::Fast module from CPAN.
For backward compatibility IP::Country::Fast is used as fallback if no db_type
is specified in the config file.
This plugin uses Mail::SpamAssassin::GeoDB and requires a module supported
by it, for example MaxMind::DB::Reader (GeoIP2).
=cut
@ -64,7 +62,6 @@ package Mail::SpamAssassin::Plugin::RelayCountry;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:ip);
use strict;
use warnings;
# use bytes;
@ -72,6 +69,11 @@ use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $db;
my $dbv6;
my $db_info; # will hold database info
my $db_type; # will hold database type
# constructor: register the eval rule
sub new {
my $class = shift;
@ -82,282 +84,39 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->set_config($mailsaobject->{conf});
# we need GeoDB country
$self->{main}->{geodb_wanted}->{country} = 1;
return $self;
}
sub set_config {
my ($self, $conf) = @_;
my @cmds;
=head1 USER PREFERENCES
The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
SpamAssassin handles incoming email messages.
=over 4
=item country_db_type STRING
This option tells SpamAssassin which type of Geo database to use.
Valid database types are GeoIP, GeoIP2, DB_File and Fast.
=back
=cut
push (@cmds, {
setting => 'country_db_type',
default => "GeoIP",
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value !~ /^(?:GeoIP|GeoIP2|DB_File|Fast)$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{country_db_type} = $value;
}
});
=over 4
=item country_db_path STRING
This option tells SpamAssassin where to find MaxMind GeoIP2 or IP::Country::DB_File database.
If not defined, GeoIP2 default search includes:
/usr/local/share/GeoIP/GeoIP2-Country.mmdb
/usr/share/GeoIP/GeoIP2-Country.mmdb
/var/lib/GeoIP/GeoIP2-Country.mmdb
/usr/local/share/GeoIP/GeoLite2-Country.mmdb
/usr/share/GeoIP/GeoLite2-Country.mmdb
/var/lib/GeoIP/GeoLite2-Country.mmdb
(and same paths again for -City.mmdb, which also has country functionality)
=back
=cut
push (@cmds, {
setting => 'country_db_path',
default => "",
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
if (!defined $value || !length $value) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
if (!-e $value) {
info("config: country_db_path \"$value\" is not accessible");
$self->{country_db_path} = $value;
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{country_db_path} = $value;
}
});
push (@cmds, {
setting => 'geoip2_default_db_path',
default => [
'/usr/local/share/GeoIP/GeoIP2-Country.mmdb',
'/usr/share/GeoIP/GeoIP2-Country.mmdb',
'/var/lib/GeoIP/GeoIP2-Country.mmdb',
'/usr/local/share/GeoIP/GeoLite2-Country.mmdb',
'/usr/share/GeoIP/GeoLite2-Country.mmdb',
'/var/lib/GeoIP/GeoLite2-Country.mmdb',
'/usr/local/share/GeoIP/GeoIP2-City.mmdb',
'/usr/share/GeoIP/GeoIP2-City.mmdb',
'/var/lib/GeoIP/GeoIP2-City.mmdb',
'/usr/local/share/GeoIP/GeoLite2-City.mmdb',
'/usr/share/GeoIP/GeoLite2-City.mmdb',
'/var/lib/GeoIP/GeoLite2-City.mmdb',
],
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRINGLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value eq '') {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
push(@{$self->{geoip2_default_db_path}}, split(/\s+/, $value));
}
});
$conf->{parser}->register_commands(\@cmds);
}
sub get_country {
my ($self, $ip, $db, $dbv6, $country_db_type) = @_;
my $cc;
my $IP_PRIVATE = IP_PRIVATE;
my $IPV4_ADDRESS = IPV4_ADDRESS;
# Private IPs will always be returned as '**'
if ($ip =~ /^$IP_PRIVATE$/o) {
$cc = "**";
}
elsif ($country_db_type eq "GeoIP") {
if ($ip =~ /^$IPV4_ADDRESS$/o) {
$cc = $db->country_code_by_addr($ip);
} elsif (defined $dbv6) {
$cc = $dbv6->country_code_by_addr_v6($ip);
}
}
elsif ($country_db_type eq "GeoIP2") {
my ($country, $country_rec);
eval {
if (index($db->metadata()->description()->{en}, 'City') != -1) {
$country = $db->city( ip => $ip );
} else {
$country = $db->country( ip => $ip );
}
$country_rec = $country->country();
$cc = $country_rec->iso_code();
1;
} or do {
$@ =~ s/\s+Trace begun.*//s;
dbg("metadata: RelayCountry: GeoIP2 failed: $@");
}
}
elsif ($country_db_type eq "DB_File") {
if ($ip =~ /^$IPV4_ADDRESS$/o ) {
$cc = $db->inet_atocc($ip);
} else {
$cc = $db->inet6_atocc($ip);
}
}
elsif ($country_db_type eq "Fast") {
$cc = $db->inet_atocc($ip);
}
$cc ||= 'XX';
return $cc;
}
sub extract_metadata {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $db;
my $dbv6;
my $db_info; # will hold database info
my $db_type; # will hold database type
return if $self->{relaycountry_disabled};
my $country_db_type = $opts->{conf}->{country_db_type};
my $country_db_path = $opts->{conf}->{country_db_path};
if ($country_db_type eq "GeoIP") {
eval {
require Geo::IP;
$db = Geo::IP->open_type(Geo::IP->GEOIP_COUNTRY_EDITION, Geo::IP->GEOIP_STANDARD);
die "GeoIP.dat not found" unless $db;
# IPv6 requires version Geo::IP 1.39+ with GeoIP C API 1.4.7+
if (Geo::IP->VERSION >= 1.39 && Geo::IP->api eq 'CAPI') {
$dbv6 = Geo::IP->open_type(Geo::IP->GEOIP_COUNTRY_EDITION_V6, Geo::IP->GEOIP_STANDARD);
if (!$dbv6) {
dbg("metadata: RelayCountry: GeoIP: IPv6 support not enabled, GeoIPv6.dat not found");
}
} else {
dbg("metadata: RelayCountry: GeoIP: IPv6 support not enabled, versions Geo::IP 1.39, GeoIP C API 1.4.7 required");
}
$db_info = sub { return "Geo::IP IPv4: " . ($db->database_info || '?')." / IPv6: ".($dbv6 ? $dbv6->database_info || '?' : '?') };
1;
} or do {
# Fallback to IP::Country::Fast
dbg("metadata: RelayCountry: GeoIP: GeoIP.dat not found, trying IP::Country::Fast as fallback");
$country_db_type = "Fast";
}
}
elsif ($country_db_type eq "GeoIP2") {
if (!$country_db_path) {
# Try some default locations
foreach (@{$opts->{conf}->{geoip2_default_db_path}}) {
if (-f $_) {
$country_db_path = $_;
last;
}
}
}
if (-f $country_db_path) {
eval {
require GeoIP2::Database::Reader;
$db = GeoIP2::Database::Reader->new(
file => $country_db_path,
locales => [ 'en' ]
);
die "unknown error" unless $db;
$db_info = sub {
my $m = $db->metadata();
return "GeoIP2 ".$m->description()->{en}." / ".localtime($m->build_epoch());
};
1;
} or do {
# Fallback to IP::Country::Fast
$@ =~ s/\s+Trace begun.*//s;
dbg("metadata: RelayCountry: GeoIP2: ${country_db_path} load failed: $@, trying IP::Country::Fast as fallback");
$country_db_type = "Fast";
}
} else {
# Fallback to IP::Country::Fast
my $err = $country_db_path ?
"$country_db_path not found" : "database not found from default locations";
dbg("metadata: RelayCountry: GeoIP2: $err, trying IP::Country::Fast as fallback");
$country_db_type = "Fast";
}
}
elsif ($country_db_type eq "DB_File") {
if (-f $country_db_path) {
eval {
require IP::Country::DB_File;
$db = IP::Country::DB_File->new($country_db_path);
die "unknown error" unless $db;
$db_info = sub { return "IP::Country::DB_File ".localtime($db->db_time()); };
1;
} or do {
# Fallback to IP::Country::Fast
dbg("metadata: RelayCountry: DB_File: ${country_db_path} load failed: $@, trying IP::Country::Fast as fallback");
$country_db_type = "Fast";
}
} else {
# Fallback to IP::Country::Fast
dbg("metadata: RelayCountry: DB_File: ${country_db_path} not found, trying IP::Country::Fast as fallback");
$country_db_type = "Fast";
}
if (!$self->{main}->{geodb} ||
!$self->{main}->{geodb}->can('country')) {
dbg("metadata: RelayCountry: plugin disabled, GeoDB country not available");
$self->{relaycountry_disabled} = 1;
return;
}
if ($country_db_type eq "Fast") {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
eval {
require IP::Country::Fast;
$db = IP::Country::Fast->new();
$db_info = sub { return "IP::Country::Fast ".localtime($db->db_time()); };
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("metadata: RelayCountry: failed to load 'IP::Country::Fast', skipping: $eval_stat");
return 1;
}
}
if (!$db) {
return 1;
}
dbg("metadata: RelayCountry: Using database: ".$db_info->());
my $msg = $opts->{msg};
my $geodb = $self->{main}->{geodb};
my @cc_untrusted;
foreach my $relay (@{$msg->{metadata}->{relays_untrusted}}) {
my $ip = $relay->{ip};
my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
my $cc = $geodb->get_country($ip);
push @cc_untrusted, $cc;
}
my @cc_external;
foreach my $relay (@{$msg->{metadata}->{relays_external}}) {
my $ip = $relay->{ip};
my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
my $cc = $geodb->get_country($ip);
push @cc_external, $cc;
}
@ -369,7 +128,7 @@ sub extract_metadata {
}
if ($found_auth) {
my $ip = $relay->{ip};
my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
my $cc = $geodb->get_country($ip);
push @cc_auth, $cc;
}
}
@ -377,7 +136,7 @@ sub extract_metadata {
my @cc_all;
foreach my $relay (@{$msg->{metadata}->{relays_internal}}, @{$msg->{metadata}->{relays_external}}) {
my $ip = $relay->{ip};
my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
my $cc = $geodb->get_country($ip);
push @cc_all, $cc;
}
@ -400,8 +159,6 @@ sub extract_metadata {
$msg->put_metadata("X-Relay-Countries-All", $ccstr);
dbg("metadata: X-Relay-Countries-All: $ccstr");
$pms->set_tag("RELAYCOUNTRYALL", @cc_all == 1 ? $cc_all[0] : \@cc_all);
return 1;
}
1;

View File

@ -28,6 +28,8 @@ use re 'taint';
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $IPV4_ADDRESS = IPV4_ADDRESS;
# constructor: register the eval rule
sub new {
my $class = shift;
@ -39,17 +41,17 @@ sub new {
bless ($self, $class);
# the important bit!
$self->register_eval_rule("check_for_numeric_helo");
$self->register_eval_rule("check_for_illegal_ip");
$self->register_eval_rule("check_all_trusted");
$self->register_eval_rule("check_no_relays");
$self->register_eval_rule("check_relays_unparseable");
$self->register_eval_rule("check_for_sender_no_reverse");
$self->register_eval_rule("check_for_from_domain_in_received_headers");
$self->register_eval_rule("check_for_forged_received_trail");
$self->register_eval_rule("check_for_forged_received_ip_helo");
$self->register_eval_rule("helo_ip_mismatch");
$self->register_eval_rule("check_for_no_rdns_dotcom_helo");
$self->register_eval_rule("check_for_numeric_helo"); # type does not matter
$self->register_eval_rule("check_for_illegal_ip"); # type does not matter
$self->register_eval_rule("check_all_trusted"); # type does not matter
$self->register_eval_rule("check_no_relays"); # type does not matter
$self->register_eval_rule("check_relays_unparseable"); # type does not matter
$self->register_eval_rule("check_for_sender_no_reverse"); # type does not matter
$self->register_eval_rule("check_for_from_domain_in_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule("check_for_forged_received_trail"); # type does not matter
$self->register_eval_rule("check_for_forged_received_ip_helo"); # type does not matter
$self->register_eval_rule("helo_ip_mismatch"); # type does not matter
$self->register_eval_rule("check_for_no_rdns_dotcom_helo"); # type does not matter
return $self;
}
@ -72,7 +74,7 @@ sub hostname_to_domain {
}
}
sub _helo_forgery_whitelisted {
sub _helo_forgery_welcomelisted {
my ($helo, $rdns) = @_;
if ($helo eq 'msn.com' && $rdns eq 'hotmail.com') { return 1; }
0;
@ -84,12 +86,10 @@ sub check_for_numeric_helo {
my $rcvd = $pms->{relays_untrusted_str};
if ($rcvd) {
my $IP_ADDRESS = IPV4_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
local $1;
# no re "strict"; # since perl 5.21.8: Ranges of ASCII printables...
if ($rcvd =~ /\bhelo=($IP_ADDRESS)(?=[\000-\040,;\[()<>]|\z)/i # Bug 5878
&& $1 !~ /$IP_PRIVATE/) {
if ($rcvd =~ /\bhelo=($IPV4_ADDRESS)(?=[\000-\040,;\[()<>]|\z)/i # Bug 5878
&& $1 !~ IS_IP_PRIVATE) {
return 1;
}
}
@ -108,23 +108,21 @@ sub check_for_illegal_ip {
# due to bug in pure IPv6 address regular expression
sub helo_ip_mismatch {
my ($self, $pms) = @_;
my $IP_ADDRESS = IPV4_ADDRESS;
my $IP_PRIVATE = IP_PRIVATE;
for my $relay (@{$pms->{relays_untrusted}}) {
# is HELO usable?
next unless ($relay->{helo} =~ m/^$IP_ADDRESS$/ &&
$relay->{helo} !~ /$IP_PRIVATE/);
next unless ($relay->{helo} =~ IS_IPV4_ADDRESS &&
$relay->{helo} !~ IS_IP_PRIVATE);
# compare HELO with IP
return 1 if ($relay->{ip} =~ m/^$IP_ADDRESS$/ &&
$relay->{ip} !~ m/$IP_PRIVATE/ &&
return 1 if ($relay->{ip} =~ IS_IPV4_ADDRESS &&
$relay->{ip} !~ IS_IP_PRIVATE &&
$relay->{helo} ne $relay->{ip} &&
# different IP is okay if in same /24
$relay->{helo} =~ /^(\d+\.\d+\.\d+\.)/ &&
index($relay->{ip}, $1) != 0);
}
0;
return 0;
}
###########################################################################
@ -145,7 +143,7 @@ sub check_no_relays {
sub check_relays_unparseable {
my ($self, $pms) = @_;
return $pms->{num_relays_unparseable};
return $pms->{num_relays_unparseable} ? 1 : 0;
}
# Check if the apparent sender (in the last received header) had
@ -215,7 +213,7 @@ sub check_for_from_domain_in_received_headers {
sub check_for_no_rdns_dotcom_helo {
my ($self, $pms) = @_;
if (!exists $pms->{no_rdns_dotcom_helo}) { $self->_check_received_helos($pms); }
return $pms->{no_rdns_dotcom_helo};
return $pms->{no_rdns_dotcom_helo} ? 1 : 0;
}
# Bug 1133
@ -346,7 +344,7 @@ sub _check_for_forged_received {
# allow private IP addrs here, could be a legit screwup
if ($hclassb && $fclassb &&
$hclassb ne $fclassb &&
!($hlo =~ /$IP_PRIVATE/o))
$hlo !~ IS_IP_PRIVATE)
{
dbg2("eval: forged-HELO: massive mismatch on IP-addr HELO: '$hlo' != '$fip'");
$pms->{mismatch_ip_helo}++;
@ -357,7 +355,7 @@ sub _check_for_forged_received {
my $prev = $from[$i-1];
if (defined($prev) && $i > 0
&& $prev =~ /^\w+(?:[\w.-]+\.)+\w+$/
&& $by ne $prev && !_helo_forgery_whitelisted($by, $prev))
&& $by ne $prev && !_helo_forgery_welcomelisted($by, $prev))
{
dbg2("eval: forged-HELO: mismatch on from: '$prev' != '$by'");
$pms->{mismatch_from}++;

View File

@ -49,7 +49,6 @@ SpamAssassin; it is not guaranteed to work with other versions of SpamAssassin.
package Mail::SpamAssassin::Plugin::ReplaceTags;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(compile_regexp qr_to_string);
@ -173,12 +172,15 @@ sub finish_parsing_end {
# do the actual replacement
my ($rec, $err) = compile_regexp($re, 0);
if (!$rec) {
info("replacetags: regexp compilation failed '$re': $err");
info("replacetags: regexp compilation failed for $rule: '$re': $err");
next;
}
$conf->{test_qrs}->{$rule} = $rec;
#dbg("replacetags: replaced $rule: '$origre' => '$re'");
if (would_log('dbg','replacetags') > 1) {
dbg("replacetags: replaced $rule: '$origre' => '$re'");
} else {
dbg("replacetags: replaced $rule");
}
} else {
dbg("replacetags: nothing was replaced in $rule");
}

View File

@ -68,15 +68,15 @@ resident set size
package Mail::SpamAssassin::Plugin::ResourceLimits;
use Mail::SpamAssassin::Plugin ();
use Mail::SpamAssassin::Logger ();
use Mail::SpamAssassin::Util ();
use Mail::SpamAssassin::Constants qw(:sa);
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use strict;
use warnings;
use re 'taint';
use BSD::Resource qw(RLIMIT_RSS RLIMIT_AS RLIMIT_CPU);
use constant HAS_BSD_RESOURCE =>
eval 'use BSD::Resource qw(RLIMIT_CPU RLIMIT_RSS RLIMIT_AS); 1;';
our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -86,58 +86,62 @@ sub new {
$class = ref($class) || $class;
my $self = $class->SUPER::new($mailsaobject);
bless( $self, $class );
bless ($self, $class);
$self->set_config( $mailsaobject->{conf} );
if (!HAS_BSD_RESOURCE) {
warn "ResourceLimits not used, required module BSD::Resource missing\n";
}
$self->set_config($mailsaobject->{conf});
return $self;
}
sub set_config {
my ( $self, $conf ) = @_;
my @cmds = ();
my ($self, $conf) = @_;
my @cmds;
push(
@cmds,
{
push(@cmds, {
setting => 'resource_limit_mem',
is_admin => 1,
default => '0',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
}
);
});
push(
@cmds,
{
push(@cmds, {
setting => 'resource_limit_cpu',
is_admin => 1,
default => '0',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
}
);
});
$conf->{parser}->register_commands( \@cmds );
$conf->{parser}->register_commands(\@cmds);
}
if (HAS_BSD_RESOURCE) { eval '
sub spamd_child_init {
my ($self) = @_;
my $conf = $self->{main}->{conf};
# Set CPU Resource limits if they were specified.
Mail::SpamAssassin::Util::dbg("resourcelimitplugin: In spamd_child_init");
Mail::SpamAssassin::Util::dbg( "resourcelimitplugin: cpu limit: " . $self->{main}->{conf}->{resource_limit_cpu} );
if ( $self->{main}->{conf}->{resource_limit_cpu} ) {
BSD::Resource::setrlimit( RLIMIT_CPU, $self->{main}->{conf}->{resource_limit_cpu}, $self->{main}->{conf}->{resource_limit_cpu} )
|| info("resourcelimitplugin: Unable to set RLIMIT_CPU");
dbg("resourcelimits: cpu limit: " . $conf->{resource_limit_cpu});
if ($conf->{resource_limit_cpu}) {
BSD::Resource::setrlimit( RLIMIT_CPU,
$conf->{resource_limit_cpu}, $conf->{resource_limit_cpu} )
or info("resourcelimits: Unable to set RLIMIT_CPU");
}
# Set Resource limits if they were specified.
Mail::SpamAssassin::Util::dbg( "resourcelimitplugin: mem limit: " . $self->{main}->{conf}->{resource_limit_mem} );
if ( $self->{main}->{conf}->{resource_limit_mem} ) {
BSD::Resource::setrlimit( RLIMIT_RSS, $self->{main}->{conf}->{resource_limit_mem}, $self->{main}->{conf}->{resource_limit_mem} )
|| info("resourcelimitplugin: Unable to set RLIMIT_RSS");
BSD::Resource::setrlimit( RLIMIT_AS, $self->{main}->{conf}->{resource_limit_mem}, $self->{main}->{conf}->{resource_limit_mem} )
|| info("resourcelimitplugin: Unable to set RLIMIT_AS");
dbg("resourcelimits: mem limit: " . $conf->{resource_limit_mem});
if ($conf->{resource_limit_mem}) {
BSD::Resource::setrlimit( RLIMIT_RSS,
$conf->{resource_limit_mem}, $conf->{resource_limit_mem} )
or info("resourcelimits: Unable to set RLIMIT_RSS");
BSD::Resource::setrlimit( RLIMIT_AS,
$conf->{resource_limit_mem}, $conf->{resource_limit_mem} )
or info("resourcelimits: Unable to set RLIMIT_AS");
}
}
'; }
1;

View File

@ -10,6 +10,8 @@ Mail::SpamAssassin::Plugin::Reuse - For reusing old rule hits during a mass-chec
reuse NETWORK_RULE [ NETWORK_RULE_OLD_NAME ]
run_reuse_tests_only 0/1
endif
=head1 DESCRIPTION
@ -18,6 +20,15 @@ The purpose of this plugin is to work in conjunction with B<mass-check
--reuse> to map rules hit in input messages to rule hits in the
mass-check output.
run_reuse_tests_only 1 is special option for spamassassin/spamd use.
Only reuse flagged tests will be run. It will also _enable_ network/DNS
lookups. This is mainly intended for fast mass processing of corpus
messages, so they can be properly reused later. For example:
spamd --pre="loadmodule Mail::SpamAssassin::Plugin::Reuse" \
--pre="run_reuse_tests_only 1" ...
Such dedicated spamd could be scripted to add X-Spam-Status header to
messages efficiently.
=cut
package Mail::SpamAssassin::Plugin::Reuse;
@ -25,12 +36,16 @@ package Mail::SpamAssassin::Plugin::Reuse;
# use bytes;
use strict;
use warnings;
use re 'taint';
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Constants qw(:sa);
our @ISA = qw(Mail::SpamAssassin::Plugin);
my $RULENAME_RE = RULENAME_RE;
# constructor
sub new {
my $invocant = shift;
@ -42,7 +57,7 @@ sub new {
bless ($self, $class);
$self->set_config($samain->{conf});
# make sure we run last (or close) of the finish_parsing_start since
# make sure we run last (or close) of the finish_parsing_end since
# we need all other rules to be defined
$self->register_method_priority("finish_parsing_start", 100);
return $self;
@ -56,28 +71,35 @@ sub set_config {
# e.g.
# reuse NET_TEST_V1 NET_TEST_V0
push (@cmds, { setting => 'reuse',
push (@cmds, {
setting => 'reuse',
type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
code => sub {
my ($conf, $key, $value, $line) = @_;
if ($value !~ /\s*(\w+)(?:\s+(\w+(?:\s+\w+)*))?\s*$/) {
if ($value !~ /^\s*(${RULENAME_RE})(?:\s+(${RULENAME_RE}(?:\s+${RULENAME_RE})*))?\s*$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my $new_name = $1;
my @old_names = ($new_name);
if ($2) {
push @old_names, split (' ', $2);
if (defined $2) {
push @old_names, split (/\s+/, $2);
}
dbg("reuse: read rule, old: @old_names new: $new_name");
dbg("reuse: read rule, old: %s new: %s", join(' ', @old_names), $new_name);
foreach my $old (@old_names) {
push @{$conf->{reuse_tests}->{$new_name}}, $old;
}
}
});
}});
push(@cmds, {
setting => 'run_reuse_tests_only',
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
$conf->{parser}->register_commands(\@cmds);
}
@ -86,11 +108,27 @@ sub finish_parsing_start {
my ($self, $opts) = @_;
my $conf = $opts->{conf};
my $tflags = $conf->{tflags};
dbg("reuse: finish_parsing_start called");
while (my($rulename,$tfl) = each %{$tflags}) {
if ($tfl =~ /\bnet\b/ && !exists $conf->{reuse_tests}->{$rulename}) {
dbg("reuse: forcing reuse of net rule $rulename");
push @{$conf->{reuse_tests}->{$rulename}}, $rulename;
}
}
return 0 if (!exists $conf->{reuse_tests});
if ($conf->{run_reuse_tests_only}) {
# simply delete all rules not reuse
foreach (keys %{$conf->{tests}}) {
if (!defined $conf->{reuse_tests}->{$_}) {
delete $conf->{tests}->{$_};
}
}
return 0;
}
foreach my $rule_name (keys %{$conf->{reuse_tests}}) {
# If the rule does not exist, add a new EMPTY test, set default score
@ -100,7 +138,7 @@ sub finish_parsing_start {
}
if (!exists $conf->{scores}->{$rule_name}) {
my $set_score = ($rule_name =~/^T_/) ? 0.01 : 1.0;
$set_score = -$set_score if ( ($conf->{tflags}->{$rule_name}||'') =~ /\bnice\b/ );
$set_score = -$set_score if ( ($tflags->{$rule_name}||'') =~ /\bnice\b/ );
foreach my $ss (0..3) {
$conf->{scoreset}->[$ss]->{$rule_name} = $set_score;
}
@ -108,7 +146,7 @@ sub finish_parsing_start {
# Figure out when to add any hits -- grab priority and "stage"
my $priority = $conf->{priority}->{$rule_name} || 0;
my $stage = $self->_get_stage_from_rule($opts->{conf}, $rule_name);
my $stage = $self->_get_stage_from_rule($conf, $rule_name);
$conf->{reuse_tests_order}->{$rule_name} = [ $priority, $stage ];
}
@ -118,6 +156,10 @@ sub check_start {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $pms->{conf};
my $scoreset = $conf->{scoreset};
return 0 if $conf->{run_reuse_tests_only};
# Can we reuse?
my $msg = $pms->get_message();
@ -130,30 +172,34 @@ sub check_start {
# now go through the rules and priorities and figure out which ones
# need to be disabled
foreach my $rule (keys %{$pms->{conf}->{reuse_tests}}) {
foreach my $rule (keys %{$conf->{reuse_tests}}) {
dbg("reuse: looking at rule $rule");
my ($priority, $stage) = @{$pms->{conf}->{reuse_tests_order}->{$rule}};
my ($priority, $stage) = @{$conf->{reuse_tests_order}->{$rule}};
# score set could change after check_start but before we add hits,
# so we need to disable the rule in all sets
my @dis;
foreach my $ss (0..3) {
if (exists $pms->{conf}->{scoreset}->[$ss]->{$rule}) {
dbg("reuse: disabling rule $rule in score set $ss");
if (exists $scoreset->[$ss]->{$rule}) {
$pms->{reuse_old_scores}->{$rule}->[$ss] =
$pms->{conf}->{scoreset}->[$ss]->{$rule};
$pms->{conf}->{scoreset}->[$ss]->{$rule} = 0;
$scoreset->[$ss]->{$rule};
$scoreset->[$ss]->{$rule} = 0;
push @dis, $ss;
}
}
dbg("reuse: disabling rule $rule in score sets %s",
join(',', @dis)) if @dis;
# now, check for hits
OLD: foreach my $old_test (@{$pms->{conf}->{reuse_tests}->{$rule}}) {
dbg("reuse: looking for rule $old_test");
foreach my $old_test (@{$conf->{reuse_tests}->{$rule}}) {
if ($old_hash->{$old_test}) {
push @{$pms->{reuse_hits_to_add}->{"$priority $stage"}}, $rule;
dbg("reuse: rule $rule hit, will add at priority $priority, stage " .
"$stage");
last OLD;
last;
} else {
# Make sure rule is marked ready for meta rules
$pms->rule_ready($rule);
}
}
}
@ -163,11 +209,15 @@ sub check_end {
my ($self, $opts) = @_;
my $pms = $opts->{permsgstatus};
my $conf = $pms->{conf};
my $scoreset = $conf->{scoreset};
return 0 if $conf->{run_reuse_tests_only};
foreach my $disabled_rule (keys %{$pms->{reuse_old_scores}}) {
foreach my $ss (0..3) {
next unless exists $pms->{conf}->{scoreset}->[$ss]->{$disabled_rule};
$pms->{conf}->{scoreset}->[$ss]->{$disabled_rule} =
next unless exists $scoreset->[$ss]->{$disabled_rule};
$scoreset->[$ss]->{$disabled_rule} =
$pms->{reuse_old_scores}->{$disabled_rule}->[$ss];
}
}
@ -178,8 +228,11 @@ sub check_end {
sub start_rules {
my ($self, $opts) = @_;
return $self->_add_hits($opts->{permsgstatus}, $opts->{priority},
$opts->{ruletype});
my $pms = $opts->{permsgstatus};
return 0 if $pms->{conf}->{run_reuse_tests_only};
return $self->_add_hits($pms, $opts->{priority}, $opts->{ruletype});
}
sub _add_hits {
@ -215,7 +268,7 @@ my %type_to_stage = (
$Mail::SpamAssassin::Conf::TYPE_URI_EVALS => "eval",
$Mail::SpamAssassin::Conf::TYPE_META_TESTS => "meta",
$Mail::SpamAssassin::Conf::TYPE_RBL_EVALS => "eval",
);
);
sub _get_stage_from_rule {
my ($self, $conf, $rule) = @_;
@ -239,4 +292,4 @@ sub _get_stage_from_rule {
}
}
1;

View File

@ -143,6 +143,16 @@ sub setup_test_set_pri {
dbg "zoom: skipping rule $name, ReplaceTags";
next;
}
# ignore regex capture rules
if ($conf->{capture_rules}->{$name}) {
dbg "zoom: skipping rule $name, regex capture";
next;
}
# ignore regex capture template rules
if ($conf->{capture_template_rules}->{$name}) {
dbg "zoom: skipping rule $name, regex capture template";
next;
}
# we have the rule, and its regexp matches. zero out the body
# rule, so that the module can do the work instead
@ -197,6 +207,11 @@ sub check_rules_at_priority {
$self->{one_line_body}->check_rules_at_priority($params);
}
sub check_cleanup {
my ($self, $params) = @_;
$self->{one_line_body}->check_cleanup($params);
}
###########################################################################
sub run_body_fast_scan {

View File

@ -29,6 +29,12 @@ This plugin checks a message against Sender Policy Framework (SPF)
records published by the domain owners in DNS to fight email address
forgery and make it easier to identify spams.
It's recommended to use MTA filter (pypolicyd-spf / spf-engine etc), so this
plugin can reuse the Received-SPF and/or Authentication-Results header results as is.
Otherwise throughput could suffer, DNS lookups done by this plugin are not
asynchronous.
Those headers will also help when SpamAssassin is not able to correctly detect EnvelopeFrom.
=cut
package Mail::SpamAssassin::Plugin::SPF;
@ -53,22 +59,25 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule ("check_for_spf_pass");
$self->register_eval_rule ("check_for_spf_neutral");
$self->register_eval_rule ("check_for_spf_none");
$self->register_eval_rule ("check_for_spf_fail");
$self->register_eval_rule ("check_for_spf_softfail");
$self->register_eval_rule ("check_for_spf_permerror");
$self->register_eval_rule ("check_for_spf_temperror");
$self->register_eval_rule ("check_for_spf_helo_pass");
$self->register_eval_rule ("check_for_spf_helo_neutral");
$self->register_eval_rule ("check_for_spf_helo_none");
$self->register_eval_rule ("check_for_spf_helo_fail");
$self->register_eval_rule ("check_for_spf_helo_softfail");
$self->register_eval_rule ("check_for_spf_helo_permerror");
$self->register_eval_rule ("check_for_spf_helo_temperror");
$self->register_eval_rule ("check_for_spf_whitelist_from");
$self->register_eval_rule ("check_for_def_spf_whitelist_from");
$self->register_eval_rule ("check_for_spf_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_neutral", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_fail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_softfail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_permerror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_temperror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_neutral", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_fail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_softfail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_permerror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_helo_temperror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_spf_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1
$self->register_eval_rule ("check_for_def_spf_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->register_eval_rule ("check_for_def_spf_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1
$self->register_eval_rule ("check_spf_skipped_noenvfrom", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
$self->set_config($mailsaobject->{conf});
@ -85,37 +94,43 @@ sub set_config {
=over 4
=item whitelist_from_spf user@example.com
=item welcomelist_from_spf user@example.com
Works similarly to whitelist_from, except that in addition to matching
a sender address, a check against the domain's SPF record must pass.
The first parameter is an address to whitelist, and the second is a string
to match the relay's rDNS.
Previously whitelist_from_spf which will work interchangeably until 4.1.
Just like whitelist_from, multiple addresses per line, separated by spaces,
are OK. Multiple C<whitelist_from_spf> lines are also OK.
Works similarly to welcomelist_from, except that in addition to matching a
sender address, a check against the domain's SPF record must pass. The
first parameter is an address to welcomelist, and the second is a string to
match the relay's rDNS.
The headers checked for whitelist_from_spf addresses are the same headers
Just like welcomelist_from, multiple addresses per line, separated by
spaces, are OK. Multiple C<welcomelist_from_spf> lines are also OK.
The headers checked for welcomelist_from_spf addresses are the same headers
used for SPF checks (Envelope-From, Return-Path, X-Envelope-From, etc).
Since this whitelist requires an SPF check to be made, network tests must be
Since this welcomelist requires an SPF check to be made, network tests must be
enabled. It is also required that your trust path be correctly configured.
See the section on C<trusted_networks> for more info on trust paths.
e.g.
whitelist_from_spf joe@example.com fred@example.com
whitelist_from_spf *@example.com
welcomelist_from_spf joe@example.com fred@example.com
welcomelist_from_spf *@example.com
=item def_whitelist_from_spf user@example.com
=item def_welcomelist_from_spf user@example.com
Same as C<whitelist_from_spf>, but used for the default whitelist entries
in the SpamAssassin distribution. The whitelist score is lower, because
Previously def_whitelist_from_spf which will work interchangeably until 4.1.
Same as C<welcomelist_from_spf>, but used for the default welcomelist entries
in the SpamAssassin distribution. The welcomelist score is lower, because
these are often targets for spammer spoofing.
=item unwhitelist_from_spf user@example.com
=item unwelcomelist_from_spf user@example.com
Used to remove a C<whitelist_from_spf> or C<def_whitelist_from_spf> entry.
Previously unwhitelist_from_spf which will work interchangeably until 4.1.
Used to remove a C<welcomelist_from_spf> or C<def_welcomelist_from_spf> entry.
The specified email address has to match exactly the address previously used.
Useful for removing undesired default entries from a distributed configuration
@ -124,17 +139,20 @@ by a local or site-specific configuration or by C<user_prefs>.
=cut
push (@cmds, {
setting => 'whitelist_from_spf',
setting => 'welcomelist_from_spf',
aliases => ['whitelist_from_spf'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST
});
push (@cmds, {
setting => 'def_whitelist_from_spf',
setting => 'def_welcomelist_from_spf',
aliases => ['def_whitelist_from_spf'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST
});
push (@cmds, {
setting => 'unwhitelist_from_spf',
setting => 'unwelcomelist_from_spf',
aliases => ['unwhitelist_from_spf'], # removed in 4.1
type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
code => sub {
my ($self, $key, $value, $line) = @_;
@ -144,9 +162,9 @@ by a local or site-specific configuration or by C<user_prefs>.
unless ($value =~ /^(?:\S+(?:\s+\S+)*)$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
$self->{parser}->remove_from_addrlist('whitelist_from_spf',
$self->{parser}->remove_from_addrlist('welcomelist_from_spf',
split (/\s+/, $value));
$self->{parser}->remove_from_addrlist('def_whitelist_from_spf',
$self->{parser}->remove_from_addrlist('def_welcomelist_from_spf',
split (/\s+/, $value));
}
});
@ -173,38 +191,6 @@ days, weeks).
type => $Mail::SpamAssassin::Conf::CONF_TYPE_DURATION
});
=item do_not_use_mail_spf (0|1) (default: 0)
By default the plugin will try to use the Mail::SPF module for SPF checks if
it can be loaded. If Mail::SPF cannot be used the plugin will fall back to
using the legacy Mail::SPF::Query module if it can be loaded.
Use this option to stop the plugin from using Mail::SPF and cause it to try to
use Mail::SPF::Query instead.
=cut
push(@cmds, {
setting => 'do_not_use_mail_spf',
is_admin => 1,
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
=item do_not_use_mail_spf_query (0|1) (default: 0)
As above, but instead stop the plugin from trying to use Mail::SPF::Query and
cause it to only try to use Mail::SPF.
=cut
push(@cmds, {
setting => 'do_not_use_mail_spf_query',
is_admin => 1,
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
=item ignore_received_spf_header (0|1) (default: 0)
By default, to avoid unnecessary DNS lookups, the plugin will try to use the
@ -250,10 +236,23 @@ working downwards until results are successfully parsed.
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
# Deprecated since 4.0.0, leave for backwards compatibility
push(@cmds, {
setting => 'do_not_use_mail_spf',
is_admin => 1,
default => 0,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
push(@cmds, {
setting => 'do_not_use_mail_spf_query',
is_admin => 1,
default => 1,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
});
$conf->{parser}->register_commands(\@cmds);
}
=item has_check_for_spf_errors
Adds capability check for "if can()" for check_for_spf_permerror, check_for_spf_temperror, check_for_spf_helo_permerror and check_for_spf_helo_permerror
@ -262,23 +261,39 @@ Adds capability check for "if can()" for check_for_spf_permerror, check_for_spf_
sub has_check_for_spf_errors { 1 }
=item has_check_spf_skipped_noenvfrom
Adds capability check for "if can()" for check_spf_skipped_noenvfrom
=cut
sub has_check_spf_skipped_noenvfrom { 1 }
sub parsed_metadata {
my ($self, $opts) = @_;
$self->_get_sender($opts->{permsgstatus});
return 1;
}
# SPF support
sub check_for_spf_pass {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_pass};
return $scanner->{spf_pass} ? 1 : 0;
}
sub check_for_spf_neutral {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_neutral};
return $scanner->{spf_neutral} ? 1 : 0;
}
sub check_for_spf_none {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_none};
return $scanner->{spf_none} ? 1 : 0;
}
sub check_for_spf_fail {
@ -287,43 +302,43 @@ sub check_for_spf_fail {
if ($scanner->{spf_failure_comment}) {
$scanner->test_log ($scanner->{spf_failure_comment});
}
$scanner->{spf_fail};
return $scanner->{spf_fail} ? 1 : 0;
}
sub check_for_spf_softfail {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_softfail};
return $scanner->{spf_softfail} ? 1 : 0;
}
sub check_for_spf_permerror {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_permerror};
return $scanner->{spf_permerror} ? 1 : 0;
}
sub check_for_spf_temperror {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
$scanner->{spf_temperror};
return $scanner->{spf_temperror} ? 1 : 0;
}
sub check_for_spf_helo_pass {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_pass};
return $scanner->{spf_helo_pass} ? 1 : 0;
}
sub check_for_spf_helo_neutral {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_neutral};
return $scanner->{spf_helo_neutral} ? 1 : 0;
}
sub check_for_spf_helo_none {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_none};
return $scanner->{spf_helo_none} ? 1 : 0;
}
sub check_for_spf_helo_fail {
@ -332,38 +347,60 @@ sub check_for_spf_helo_fail {
if ($scanner->{spf_helo_failure_comment}) {
$scanner->test_log ($scanner->{spf_helo_failure_comment});
}
$scanner->{spf_helo_fail};
return $scanner->{spf_helo_fail} ? 1 : 0;
}
sub check_for_spf_helo_softfail {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_softfail};
return $scanner->{spf_helo_softfail} ? 1 : 0;
}
sub check_for_spf_helo_permerror {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_permerror};
return $scanner->{spf_helo_permerror} ? 1 : 0;
}
sub check_for_spf_helo_temperror {
my ($self, $scanner) = @_;
$self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
$scanner->{spf_helo_temperror};
return $scanner->{spf_helo_temperror} ? 1 : 0;
}
sub check_for_spf_whitelist_from {
=over 4
=item check_spf_skipped_noenvfrom
Checks if SPF checks have been skipped because EnvelopeFrom cannot be determined.
=back
=cut
sub check_spf_skipped_noenvfrom {
my ($self, $scanner) = @_;
$self->_check_spf_whitelist($scanner) unless $scanner->{spf_whitelist_from_checked};
$scanner->{spf_whitelist_from};
$self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
if (!exists $scanner->{spf_sender}) {
return 1;
} else {
return 0;
}
}
sub check_for_def_spf_whitelist_from {
sub check_for_spf_welcomelist_from {
my ($self, $scanner) = @_;
$self->_check_def_spf_whitelist($scanner) unless $scanner->{def_spf_whitelist_from_checked};
$scanner->{def_spf_whitelist_from};
$self->_check_spf_welcomelist($scanner) unless $scanner->{spf_welcomelist_from_checked};
return $scanner->{spf_welcomelist_from} ? 1 : 0;
}
*check_for_spf_whitelist_from = \&check_for_spf_welcomelist_from; # removed in 4.1
sub check_for_def_spf_welcomelist_from {
my ($self, $scanner) = @_;
$self->_check_def_spf_welcomelist($scanner) unless $scanner->{def_spf_welcomelist_from_checked};
return $scanner->{def_spf_welcomelist_from} ? 1 : 0;
}
*check_for_def_spf_whitelist_from = \&check_for_def_spf_welcomelist_from; # removed in 4.1
sub _check_spf {
my ($self, $scanner, $ishelo) = @_;
@ -387,7 +424,7 @@ sub _check_spf {
$scanner->{checked_for_received_spf_header} = 1;
dbg("spf: checking to see if the message has a Received-SPF header that we can use");
my @internal_hdrs = split("\n", $scanner->get('ALL-INTERNAL'));
my @internal_hdrs = $scanner->get('ALL-INTERNAL');
unless ($scanner->{conf}->{use_newest_received_spf_header}) {
# look for the LAST (earliest in time) header, it'll be the most accurate
@internal_hdrs = reverse(@internal_hdrs);
@ -459,7 +496,7 @@ sub _check_spf {
dbg("spf: could not parse result from existing Received-SPF header");
}
} elsif ($hdr =~ /^Authentication-Results:.*;\s*SPF\s*=\s*([^;]*)/i) {
} elsif ($hdr =~ /^(?:Arc\-)?Authentication-Results:.*;\s*SPF\s*=\s*([^;]*)/i) {
dbg("spf: found an Authentication-Results header added by an internal host: $hdr");
# RFC 5451 header parser - added by D. Stussy 2010-09-09:
@ -524,8 +561,6 @@ sub _check_spf {
unless (defined $self->{has_mail_spf}) {
my $eval_stat;
eval {
die("Mail::SPF disabled by admin setting\n") if $scanner->{conf}->{do_not_use_mail_spf};
require Mail::SPF;
if (!defined $Mail::SPF::VERSION || $Mail::SPF::VERSION < 2.001) {
die "Mail::SPF 2.001 or later required, this is ".
@ -547,43 +582,18 @@ sub _check_spf {
dbg("spf: using Mail::SPF for SPF checks");
$self->{has_mail_spf} = 1;
} else {
# strip the @INC paths... users are going to see it and think there's a problem even though
# we're going to fall back to Mail::SPF::Query (which will display the same paths if it fails)
$eval_stat =~ s#^Can't locate Mail/SPFd.pm in \@INC .*#Can't locate Mail/SPFd.pm#;
dbg("spf: cannot load Mail::SPF module or create Mail::SPF::Server object: $eval_stat");
dbg("spf: attempting to use legacy Mail::SPF::Query module instead");
undef $eval_stat;
eval {
die("Mail::SPF::Query disabled by admin setting\n") if $scanner->{conf}->{do_not_use_mail_spf_query};
require Mail::SPF::Query;
if (!defined $Mail::SPF::Query::VERSION || $Mail::SPF::Query::VERSION < 1.996) {
die "Mail::SPF::Query 1.996 or later required, this is ".
(defined $Mail::SPF::Query::VERSION ? $Mail::SPF::Query::VERSION : 'unknown')."\n";
}
1;
} or do {
$eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
};
if (!defined($eval_stat)) {
dbg("spf: using Mail::SPF::Query for SPF checks");
$self->{has_mail_spf} = 0;
} else {
dbg("spf: cannot load Mail::SPF::Query module: $eval_stat");
dbg("spf: one of Mail::SPF or Mail::SPF::Query is required for SPF checks, SPF checks disabled");
dbg("spf: cannot load Mail::SPF: module: $eval_stat");
dbg("spf: Mail::SPF is required for SPF checks, SPF checks disabled");
$self->{no_spf_module} = 1;
return;
}
}
}
# skip SPF checks if the A/MX records are nonexistent for the From
# domain, anyway, to avoid crappy messages from slowing us down
# (bug 3016)
return if $scanner->check_for_from_dns();
# TODO: this will only work if the queries are ready before SPF, so never?
return if $scanner->{sender_host_fail} && $scanner->{sender_host_fail} == 2;
if ($ishelo) {
# SPF HELO-checking variant
@ -609,7 +619,7 @@ sub _check_spf {
$scanner->{spf_failure_comment} = undef;
}
my $lasthop = $self->_get_relay($scanner);
my $lasthop = $scanner->{relays_external}->[0];
if (!defined $lasthop) {
dbg("spf: no suitable relay for spf use found, skipping SPF%s check",
$ishelo ? '-helo' : '');
@ -618,7 +628,6 @@ sub _check_spf {
my $ip = $lasthop->{ip}; # always present
my $helo = $lasthop->{helo}; # could be missing
$scanner->{sender} = '' unless $scanner->{sender_got};
if ($ishelo) {
unless ($helo) {
@ -627,19 +636,17 @@ sub _check_spf {
}
dbg("spf: checking HELO (helo=$helo, ip=$ip)");
} else {
$self->_get_sender($scanner) unless $scanner->{sender_got};
# TODO: we're supposed to use the helo domain as the sender identity (for
# mfrom checks) if the sender is the null sender, however determining that
# it's the null sender, and not just a failure to get the envelope isn't
# exactly trivial... so for now we'll just skip the check
if (!$scanner->{sender}) {
if (!$scanner->{spf_sender}) {
# we already dbg'd that we couldn't get an Envelope-From and can't do SPF
return;
}
dbg("spf: checking EnvelopeFrom (helo=%s, ip=%s, envfrom=%s)",
($helo ? $helo : ''), $ip, $scanner->{sender});
($helo ? $helo : ''), $ip, $scanner->{spf_sender});
}
# this test could probably stand to be more strict, but try to test
@ -657,11 +664,8 @@ sub _check_spf {
my ($result, $comment, $text, $err);
# use Mail::SPF if it was available, otherwise use the legacy Mail::SPF::Query
if ($self->{has_mail_spf}) {
# TODO: currently we won't get to here for a mfrom check with a null sender
my $identity = $ishelo ? $helo : ($scanner->{sender}); # || $helo);
my $identity = $ishelo ? $helo : ($scanner->{spf_sender}); # || $helo);
unless ($identity) {
dbg("spf: cannot determine %s identity, skipping %s SPF check",
@ -685,61 +689,21 @@ sub _check_spf {
my $timeout = $scanner->{conf}->{spf_timeout};
my $timer = Mail::SpamAssassin::Timeout->new(
my $timer_spf = Mail::SpamAssassin::Timeout->new(
{ secs => $timeout, deadline => $scanner->{master_deadline} });
$err = $timer->run_and_catch(sub {
$err = $timer_spf->run_and_catch(sub {
my $query = $self->{spf_server}->process($request);
$result = $query->code;
$comment = $query->authority_explanation if $query->can("authority_explanation");
$text = $query->text;
});
} else {
if (!$helo) {
dbg("spf: cannot get HELO, cannot use Mail::SPF::Query, consider installing Mail::SPF");
return;
}
# TODO: if we start doing checks on the null sender using the helo domain
# be sure to fix this so that it uses the correct sender identity
my $query;
eval {
$query = Mail::SPF::Query->new (ip => $ip,
sender => $scanner->{sender},
helo => $helo,
debug => 0,
trusted => 0);
1;
} or do {
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
dbg("spf: cannot create Mail::SPF::Query object: $eval_stat");
return;
};
my $timeout = $scanner->{conf}->{spf_timeout};
my $timer = Mail::SpamAssassin::Timeout->new(
{ secs => $timeout, deadline => $scanner->{master_deadline} });
$err = $timer->run_and_catch(sub {
($result, $comment) = $query->result();
});
} # end of differences between Mail::SPF and Mail::SPF::Query
if ($err) {
chomp $err;
warn("spf: lookup failed: $err\n");
return 0;
}
$result ||= 'timeout'; # bug 5077
$comment ||= '';
$comment =~ s/\s+/ /gs; # no newlines please
@ -774,138 +738,131 @@ sub _check_spf {
}
}
dbg("spf: query for $scanner->{sender}/$ip/$helo: result: $result, comment: $comment, text: $text");
}
sub _get_relay {
my ($self, $scanner) = @_;
# dos: first external relay, not first untrusted
return $scanner->{relays_external}->[0];
if ($ishelo) {
dbg("spf: query for $ip/$helo: result: $result, comment: $comment, text: $text");
} else {
dbg("spf: query for $scanner->{spf_sender}/$ip/$helo: result: $result, comment: $comment, text: $text");
}
}
sub _get_sender {
my ($self, $scanner) = @_;
my $sender;
$scanner->{sender_got} = 1;
$scanner->{sender} = '';
my $relay = $self->_get_relay($scanner);
my $relay = $scanner->{relays_external}->[0];
if (defined $relay) {
$sender = $relay->{envfrom};
my $sender = $relay->{envfrom};
if (defined $sender) {
dbg("spf: found EnvelopeFrom '$sender' in first external Received header");
$scanner->{spf_sender} = lc $sender;
} else {
dbg("spf: EnvelopeFrom not found in first external Received header");
}
}
if ($sender) {
dbg("spf: found Envelope-From in first external Received header");
}
else {
if (!exists $scanner->{spf_sender}) {
# We cannot use the env-from data, since it went through 1 or more relays
# since the untrusted sender and they may have rewritten it.
if ($scanner->{num_relays_trusted} > 0 && !$scanner->{conf}->{always_trust_envelope_sender}) {
dbg("spf: relayed through one or more trusted relays, cannot use header-based Envelope-From, skipping");
return;
}
# we can (apparently) use whatever the current Envelope-From was,
if ($scanner->{num_relays_trusted} > 0 &&
!$scanner->{conf}->{always_trust_envelope_sender}) {
dbg("spf: relayed through one or more trusted relays, ".
"cannot use header-based EnvelopeFrom");
} else {
# we can (apparently) use whatever the current EnvelopeFrom was,
# from the Return-Path, X-Envelope-From, or whatever header.
# it's better to get it from Received though, as that is updated
# hop-by-hop.
$sender = $scanner->get("EnvelopeFrom:addr");
}
if (!$sender) {
dbg("spf: cannot get Envelope-From, cannot use SPF");
return; # avoid setting $scanner->{sender} to undef
}
return $scanner->{sender} = lc $sender;
}
sub _check_spf_whitelist {
my ($self, $scanner) = @_;
$scanner->{spf_whitelist_from_checked} = 1;
$scanner->{spf_whitelist_from} = 0;
# if we've already checked for an SPF PASS and didn't get it don't waste time
# checking to see if the sender address is in the spf whitelist
if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
dbg("spf: whitelist_from_spf: already checked spf and didn't get pass, skipping whitelist check");
return;
}
$self->_get_sender($scanner) unless $scanner->{sender_got};
unless ($scanner->{sender}) {
dbg("spf: spf_whitelist_from: could not find usable envelope sender");
return;
}
$scanner->{spf_whitelist_from} = $self->_wlcheck($scanner,'whitelist_from_spf');
if (!$scanner->{spf_whitelist_from}) {
$scanner->{spf_whitelist_from} = $self->_wlcheck($scanner, 'whitelist_auth');
}
# if the message doesn't pass SPF validation, it can't pass an SPF whitelist
if ($scanner->{spf_whitelist_from}) {
if ($self->check_for_spf_pass($scanner)) {
dbg("spf: whitelist_from_spf: $scanner->{sender} is in user's WHITELIST_FROM_SPF and passed SPF check");
my $sender = ($scanner->get("EnvelopeFrom:addr"))[0];
if (defined $sender) {
dbg("spf: found EnvelopeFrom '$sender' from header");
$scanner->{spf_sender} = lc $sender;
} else {
dbg("spf: whitelist_from_spf: $scanner->{sender} is in user's WHITELIST_FROM_SPF but failed SPF check");
$scanner->{spf_whitelist_from} = 0;
dbg("spf: EnvelopeFrom header not found");
}
} else {
dbg("spf: whitelist_from_spf: $scanner->{sender} is not in user's WHITELIST_FROM_SPF");
}
}
if (!exists $scanner->{spf_sender}) {
dbg("spf: cannot get EnvelopeFrom, cannot use SPF by DNS");
}
}
sub _check_def_spf_whitelist {
sub _check_spf_welcomelist {
my ($self, $scanner) = @_;
$scanner->{def_spf_whitelist_from_checked} = 1;
$scanner->{def_spf_whitelist_from} = 0;
$scanner->{spf_welcomelist_from_checked} = 1;
$scanner->{spf_welcomelist_from} = 0;
# if we've already checked for an SPF PASS and didn't get it don't waste time
# checking to see if the sender address is in the spf whitelist
# checking to see if the sender address is in the spf welcomelist
if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
dbg("spf: def_spf_whitelist_from: already checked spf and didn't get pass, skipping whitelist check");
dbg("spf: welcomelist_from_spf: already checked spf and didn't get pass, skipping welcomelist check");
return;
}
$self->_get_sender($scanner) unless $scanner->{sender_got};
unless ($scanner->{sender}) {
dbg("spf: def_spf_whitelist_from: could not find usable envelope sender");
if (!$scanner->{spf_sender}) {
dbg("spf: spf_welcomelist_from: no EnvelopeFrom available for welcomelist check");
return;
}
$scanner->{def_spf_whitelist_from} = $self->_wlcheck($scanner,'def_whitelist_from_spf');
if (!$scanner->{def_spf_whitelist_from}) {
$scanner->{def_spf_whitelist_from} = $self->_wlcheck($scanner, 'def_whitelist_auth');
}
$scanner->{spf_welcomelist_from} =
$self->_wlcheck($scanner, 'welcomelist_from_spf') ||
$self->_wlcheck($scanner, 'welcomelist_auth');
# if the message doesn't pass SPF validation, it can't pass an SPF whitelist
if ($scanner->{def_spf_whitelist_from}) {
# if the message doesn't pass SPF validation, it can't pass an SPF welcomelist
if ($scanner->{spf_welcomelist_from}) {
if ($self->check_for_spf_pass($scanner)) {
dbg("spf: def_whitelist_from_spf: $scanner->{sender} is in DEF_WHITELIST_FROM_SPF and passed SPF check");
dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is in user's WELCOMELIST_FROM_SPF and passed SPF check");
} else {
dbg("spf: def_whitelist_from_spf: $scanner->{sender} is in DEF_WHITELIST_FROM_SPF but failed SPF check");
$scanner->{def_spf_whitelist_from} = 0;
dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is in user's WELCOMELIST_FROM_SPF but failed SPF check");
$scanner->{spf_welcomelist_from} = 0;
}
} else {
dbg("spf: def_whitelist_from_spf: $scanner->{sender} is not in DEF_WHITELIST_FROM_SPF");
dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is not in user's WELCOMELIST_FROM_SPF");
}
}
sub _check_def_spf_welcomelist {
my ($self, $scanner) = @_;
$scanner->{def_spf_welcomelist_from_checked} = 1;
$scanner->{def_spf_welcomelist_from} = 0;
# if we've already checked for an SPF PASS and didn't get it don't waste time
# checking to see if the sender address is in the spf welcomelist
if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
dbg("spf: def_spf_welcomelist_from: already checked spf and didn't get pass, skipping welcomelist check");
return;
}
if (!$scanner->{spf_sender}) {
dbg("spf: def_spf_welcomelist_from: could not find usable envelope sender");
return;
}
$scanner->{def_spf_welcomelist_from} =
$self->_wlcheck($scanner, 'def_welcomelist_from_spf') ||
$self->_wlcheck($scanner, 'def_welcomelist_auth');
# if the message doesn't pass SPF validation, it can't pass an SPF welcomelist
if ($scanner->{def_spf_welcomelist_from}) {
if ($self->check_for_spf_pass($scanner)) {
dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is in DEF_WELCOMELIST_FROM_SPF and passed SPF check");
} else {
dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is in DEF_WELCOMELIST_FROM_SPF but failed SPF check");
$scanner->{def_spf_welcomelist_from} = 0;
}
} else {
dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is not in DEF_WELCOMELIST_FROM_SPF");
}
}
sub _wlcheck {
my ($self, $scanner, $param) = @_;
if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
if (defined ($scanner->{conf}->{$param}->{$scanner->{spf_sender}})) {
return 1;
} else {
study $scanner->{sender}; # study is a no-op since perl 5.16.0
foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
if ($scanner->{sender} =~ qr/$regexp/i) {
if ($scanner->{spf_sender} =~ $regexp) {
return 1;
}
}

View File

@ -58,7 +58,7 @@ sub new {
my $self = $class->SUPER::new($mailsaobject);
bless ($self, $class);
$self->register_eval_rule("check_shortcircuit");
$self->register_eval_rule("check_shortcircuit"); # type does not matter
$self->set_config($mailsaobject->{conf});
return $self;
@ -88,6 +88,14 @@ that.
To override a test that uses shortcircuiting, you can set the classification
type to C<off>.
Note that DNS and other network lookups are launched when SA reaches
priority -100. If you want to shortcircuit scanning before any network
queries are sent, you need to set lower than -100 priority to any such rule,
like -200 as in the examples below.
Shortcircuited test will be automatically set to priority -200, but only if
the original priority is unchanged at default 0.
=over 4
=item on
@ -99,7 +107,7 @@ shortcircuited. This would allow you, for example, to define a rule such as
body TEST /test/
describe TEST test rule that scores barely over spam threshold
score TEST 5.5
priority TEST -100
priority TEST -200
shortcircuit TEST on
The result of a message hitting the above rule would be a final score of 5.5,
@ -113,11 +121,11 @@ Disables shortcircuiting on said rule.
Shortcircuit the rule using a set of defaults; override the default score of
this rule with the score from C<shortcircuit_spam_score>, set the
C<noautolearn> tflag, and set priority to C<-100>. In other words,
C<noautolearn> tflag, and set priority to C<-200>. In other words,
equivalent to:
shortcircuit TEST on
priority TEST -100
priority TEST -200
score TEST 100
tflags TEST noautolearn
@ -125,11 +133,11 @@ equivalent to:
Shortcircuit the rule using a set of defaults; override the default score of
this rule with the score from C<shortcircuit_ham_score>, set the C<noautolearn>
and C<nice> tflags, and set priority to C<-100>. In other words, equivalent
and C<nice> tflags, and set priority to C<-200>. In other words, equivalent
to:
shortcircuit TEST on
priority TEST -100
priority TEST -200
score TEST -100
tflags TEST noautolearn nice
@ -141,23 +149,22 @@ to:
setting => 'shortcircuit',
code => sub {
my ($self, $key, $value, $line) = @_;
my ($rule,$type);
unless (defined $value && $value !~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
if ($value =~ /^(\S+)\s+(\S+)$/) {
$rule=$1;
$type=$2;
} else {
local($1,$2);
unless ($value =~ /^(\w+)\s+(\w+)$/) {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
my ($rule, $type) = ($1, $2);
if ($type =~ m/^(?:spam|ham)$/) {
if ($type eq "ham" || $type eq "spam") {
dbg("shortcircuit: adding $rule using abbreviation $type");
# set the defaults:
$self->{shortcircuit}->{$rule} = $type;
$self->{priority}->{$rule} = -100;
# don't override existing priority unless it's default 0
$self->{priority}->{$rule} ||= -200;
my $tf = $self->{tflags}->{$rule};
$self->{tflags}->{$rule} = ($tf ? $tf." " : "") .
@ -227,7 +234,7 @@ sub hit_rule {
my $rule = $params->{rulename};
# don't s/c if we're linting
return if ($scan->{lint_rules});
return if ($self->{main}->{lint_rules});
# don't s/c if we're in compile_now()
return if ($self->{am_compiling});
@ -256,6 +263,8 @@ sub hit_rule {
$scscore = $score;
}
$scan->{shortcircuited} = 1;
# bug 5256: if we short-circuit, don't do auto-learning
$scan->{disable_auto_learning} = 1;
$scan->got_hit('SHORTCIRCUIT', '', score => $scscore);
@ -306,6 +315,6 @@ sub compile_now_finish {
=head1 SEE ALSO
C<http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109>
C<https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109>
=cut

View File

@ -29,13 +29,13 @@ SpamCop is a service for reporting spam. SpamCop determines the origin
of unwanted email and reports it to the relevant Internet service
providers. By reporting spam, you have a positive impact on the
problem. Reporting unsolicited email also helps feed spam filtering
systems, including, but not limited to, the SpamCop blacklist used in
systems, including, but not limited to, the SpamCop blocklist used in
SpamAssassin as a DNSBL.
Note that spam reports sent by this plugin to SpamCop each include the
entire spam message.
See http://www.spamcop.net/ for more information about SpamCop.
See https://www.spamcop.net/ for more information about SpamCop.
=cut
@ -43,13 +43,13 @@ package Mail::SpamAssassin::Plugin::SpamCop;
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Logger;
use Mail::SpamAssassin::Util qw(untaint_var);
use IO::Socket;
use strict;
use warnings;
# use bytes;
use re 'taint';
use constant HAS_NET_DNS => eval { require Net::DNS; };
use constant HAS_NET_SMTP => eval { require Net::SMTP; };
our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -63,7 +63,7 @@ sub new {
bless ($self, $class);
# are network tests enabled?
if (!$mailsaobject->{local_tests_only} && HAS_NET_DNS && HAS_NET_SMTP) {
if (!$mailsaobject->{local_tests_only} && HAS_NET_SMTP) {
$self->{spamcop_available} = 1;
dbg("reporter: network tests on, attempting SpamCop");
}
@ -114,7 +114,7 @@ guess will be used as the From: address in SpamCop reports.
=item spamcop_to_address user@example.com (default: generic reporting address)
Your customized SpamCop report submission address. You need to obtain
this address by registering at C<http://www.spamcop.net/>. If this is
this address by registering at C<https://www.spamcop.net/>. If this is
not set, SpamCop reports will go to a generic reporting address for
SpamAssassin users and your reports will probably have less weight in
the SpamCop system.
@ -153,6 +153,35 @@ size that SpamCop will accept at the time of release.
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
=item spamcop_relayhost server:port (default: direct connection to SpamCop)
Direct connection to SpamCop servers (port 587) is used for report
submission by default. If this is undesirable or blocked by local firewall
policies, you can specify a local SMTP relayhost to forward reports.
Relayhost should be configured to not scan the report, for example by using
a separate submission port. SSL or authentication is not supported.
=cut
push (@cmds, {
setting => 'spamcop_relayhost',
default => undef,
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
code => sub {
my ($self, $key, $value, $line) = @_;
if ($value =~ /^(\S+):(\d{2,5})$/) {
$self->{spamcop_relayhost} = untaint_var($1);
$self->{spamcop_relayport} = untaint_var($2);
}
elsif ($value =~ /^$/) {
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
}
else {
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
}
}
});
$conf->{parser}->register_commands(\@cmds);
}
@ -257,17 +286,31 @@ EOM
# send message
my $failure;
my $mx = $head{To};
my $hello = Mail::SpamAssassin::Util::fq_hostname() || $from;
$mx =~ s/.*\@//;
$hello =~ s/.*\@//;
for my $rr (Net::DNS::mx($mx)) {
my $exchange = Mail::SpamAssassin::Util::untaint_hostname($rr->exchange);
next unless $exchange;
my $smtp;
if ($smtp = Net::SMTP->new($exchange,
my @mxs;
if ($options->{report}->{conf}->{spamcop_relayhost}) {
push @mxs, $options->{report}->{conf}->{spamcop_relayhost};
} else {
my $mx = $head{To};
$mx =~ s/.*\@//;
foreach my $rr (Net::DNS::mx($mx)) {
if (defined $rr->exchange) {
push @mxs, Mail::SpamAssassin::Util::untaint_hostname($rr->exchange);
}
}
if (!@mxs) {
warn("reporter: failed to resolve SpamCop MX servers\n");
return 0;
}
}
my $port = $options->{report}->{conf}->{spamcop_relayport} || 587;
for my $exchange (@mxs) {
if (my $smtp = Net::SMTP->new($exchange,
Hello => $hello,
Port => 587,
Port => $port,
Timeout => 10))
{
if ($smtp->mail($from) && smtp_dbg("FROM $from", $smtp) &&

View File

@ -87,8 +87,8 @@ sub new {
}
}
$self->register_eval_rule("check_language");
$self->register_eval_rule("check_body_8bits");
$self->register_eval_rule("check_language"); # type does not matter
$self->register_eval_rule("check_body_8bits", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
$self->set_config($mailsaobject->{conf});
@ -114,7 +114,7 @@ confidence. In that case, no action is taken.
The rule C<UNWANTED_LANGUAGE_BODY> is triggered if none of the languages
detected are in the "ok" list. Note that this is the only effect of the
"ok" list. It does not act as a whitelist against any other form of spam
"ok" list. It does not act as a welcomelist against any other form of spam
scanning.
In your configuration, you must use the two or three letter language
@ -406,7 +406,7 @@ sub load_models {
# create language ngram maps once
for (@lm) {
# look for end delimiter
if (/^0 (.+)/) {
if (index($_, '0 ') == 0 && /^0 (.+)/) {
$ngram->{"language"} = $1;
push(@nm, $ngram);
# reset for next language
@ -449,7 +449,14 @@ sub classify {
$p += exists($ngram->{$_}) ? abs($ngram->{$_} - $i) : $maxp;
$i++;
}
$results{$language} = $p;
# Most latin1 languages have xx and xx.utf8 alternatives (those which
# don't have should be named xx.utf-8). Always strip .utf8 from name,
# it will not be accurate as matching will depend on normalize_charset
# and mail encoding. Keep track of the best score for alternatives.
$language = $short if index($language, '.utf8') > 0;
if (!exists $results{$language} || $results{$language} > $p) {
$results{$language} = $p
}
}
my @results = sort { $results{$a} <=> $results{$b} } keys %results;
@ -459,7 +466,11 @@ sub classify {
my @results_tag;
foreach (@results[0..19]) {
last unless defined $_;
if($best != 0) {
push @results_tag, sprintf "%s:%s(%.02f)", $_, $results{$_}, $results{$_} / $best;
} else {
push @results_tag, sprintf "%s:%s(unknown)", $_, $results{$_};
}
}
$opts->{permsgstatus}->set_tag('TEXTCATRESULTS', join(' ', @results_tag));
@ -539,11 +550,14 @@ sub extract_metadata {
my $body = $msg->get_rendered_body_text_array();
$body = join("\n", @{$body});
$body =~ s/^Subject://i;
# Strip subject prefixes, enhances results
$body =~ s/^(?:[a-z]{2,12}:\s*){1,10}//i;
# Strip anything that looks like url or email, enhances results
$body =~ s{https?://\S+}{ }gs;
$body =~ s{\S+?\@[a-zA-Z]\S+}{ }gs;
$body =~ s/https?(?:\:\/\/|:&#x2F;&#x2F;|%3A%2F%2F)\S{1,1024}/ /gs;
$body =~ s/\S{1,64}?\@[a-zA-Z]\S{1,128}/ /gs;
$body =~ s/\bwww\.\S{1,128}/ /gs;
my $len = length($body);
# truncate after 10k; that should be plenty to classify it

Some files were not shown because too many files have changed in this diff Show More