update SpamAssassin to 4.0.0

generated by make update-upstream Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2025-07-04 00:36:03 +00:00 · 2023-03-13 21:13:17 +01:00 · 2023-03-13 21:13:17 +01:00 · ae52237fd8
commit ae52237fd8
parent 56cebc6b1a
448 changed files with 32497 additions and 16814 deletions
--- a/upstream/CREDITS
+++ b/upstream/CREDITS
@ -1,4 +1,4 @@
-Copyright (C) 2021 The Apache Software Foundation
+Copyright (C) 2022 The Apache Software Foundation

 Project Management Committee (PMC):

--- a/upstream/Changes
+++ b/upstream/Changes
--- a/upstream/INSTALL
+++ b/upstream/INSTALL
@ -11,16 +11,16 @@ Installing or Upgrading SpamAssassin

 Using CPAN via CPAN.pm:

-	perl -MCPAN -e shell                    [as root]
+	perl -MCPAN -e shell  # as root
 	o conf prerequisites_policy ask
 	install Mail::SpamAssassin
 	quit

 Using Linux:

-	Debian unstable: apt-get install spamassassin
+	Debian/Ubuntu: apt-get install spamassassin
 	Gentoo: emerge mail-filter/spamassassin
-	Fedora: yum install spamassassin
+	Fedora/CentOS/RedHat: yum install spamassassin

 Alternatively download the tarfile, zipfile, and/or build your own RPM
 from https://spamassassin.apache.org/.  Building from tar/zip file is
@ -28,10 +28,9 @@ usually as simple as:

 	[unzip/untar the archive]
 	cd Mail-SpamAssassin-*
-	perl Makefile.PL
-	[option: add -DSPAMC_SSL to $CFLAGS to build an SSL-enabled spamc]
+	perl Makefile.PL  # add ENABLE_SSL=yes for SSL support
 	make
-	make install                            [as root]
+	make install  # as root

 After installing SpamAssassin, you need to download and install the
 SpamAssassin ruleset using "sa-update".  See the "Installing Rules"
@ -42,16 +41,10 @@ the prerequisite information further down.

 To install as non-root, see the directions below.

-If you are running AFS, you may also need to specify INSTALLSITELIB and
-SITELIBEXP.
-
 Note that you can upgrade SpamAssassin using these instructions, as long
 as you take care to read the caveats in the file UPGRADE.   Upgrading
 will not delete your learnt Bayes data or local rule modifications.

-If you're using SunOS 4.1.x, see
-http://wiki.spamassassin.org/w/BuildingOnSunOS4 for build tips.
-

 Installing SpamAssassin for Personal Use (Not System-Wide)
 ----------------------------------------------------------
@ -62,7 +55,7 @@ These steps assume the following, so substitute as necessary:
  - The location of the procmail executable is /usr/bin/procmail

 Many more details of this process are at
-http://wiki.apache.org/spamassassin/SingleUserUnixInstall
+https://wiki.apache.org/spamassassin/SingleUserUnixInstall

 1. Uncompress and extract the SpamAssassin archive, using "unzip" or
   "tar xvfz", in a temporary directory.
@ -119,7 +112,7 @@ http://wiki.apache.org/spamassassin/SingleUserUnixInstall
 caughtspam

 Also, see the file procmailrc.example and
-http://wiki.apache.org/spamassassin/UsedViaProcmail
+https://wiki.apache.org/spamassassin/UsedViaProcmail

 8. Now, you should be ready to send some test emails and ensure everything
   works as expected.  First, send yourself a test email that doesn't
@ -136,7 +129,7 @@ http://wiki.apache.org/spamassassin/UsedViaProcmail
   don't lose incoming email.

   Note: one possible cause for this is the use of smrsh on the MTA system;
-   see http://wiki.spamassassin.org/w/ProcmailVsSmrsh for details.
+   see https://wiki.apache.org/spamassassin/ProcmailVsSmrsh for details.

 9. You can now customize SpamAssassin.  See README for more information.

@ -153,18 +146,25 @@ Installing rules from network is done with a single command:

        sa-update

-This is normally run as root.
+For security reasons, it should not be run as root, but as the user normally
+running SpamAssassin.  You can run the initial setup once as root, to create
+necessary directories etc.  Then you need to change ownership of
+LOCAL_STATE_DIR to that user (usually: chown -R user:user
+/var/lib/spamassassin), you can find out the default directory with
+sa-update --help (look for --updatedir).  Same needs to be done for
+LOCAL_RULES_DIR/sa-update-keys (usually: chown -R user:user
+/etc/mail/spamassassin/sa-update-keys), the directory can be found with
+spamassassin --help (look for --siteconfigpath).

 If you wish to install rules from downloaded files, rather than "live" from
 the latest online ruleset, here is how to do it.

-Obtain all the following files:
+Obtain all the following files from https://spamassassin.apache.org/downloads.cgi:

    Mail-SpamAssassin-rules-xxx.tgz
    Mail-SpamAssassin-rules-xxx.tgz.asc
-    Mail-SpamAssassin-rules-xxx.tgz.md5
-    Mail-SpamAssassin-rules-xxx.tgz.sha1
-      (where xxx may look something like '3.3.0-rc1.r893295')
+    Mail-SpamAssassin-rules-xxx.tgz.sha512
+      (where xxx may look something like '4.0.0.r1900144')

 Save them all to the current directory.
 Obtain a rules-signing public key:
@ -180,7 +180,7 @@ Install rules from a compressed tar archive:

    sa-update --install Mail-SpamAssassin-rules-xxx.tgz

-Note that the ".tgz.asc", ".tgz.md5" and ".tgz.sha1" files all need to
+Note that the ".tgz", ".tgz.asc" and ".tgz.sha512" files all need to
 be in the same directory, otherwise sa-update will fail.


@ -197,7 +197,7 @@ CPAN
 ----

 Most of the modules listed below are available via the Comprehensive Perl
-Archive Network (CPAN, see http://www.cpan.org/ for more information).
+Archive Network (CPAN, see https://www.cpan.org/ for more information).
 While each module is different, most can be installed via a few simple
 commands such as:

@ -222,11 +222,7 @@ through those mechanisms, too, if you prefer.
 Required Perl Interpreter
 -------------------------

-Perl 5.8.1 or a later version is required.
-Preferred versions are 5.8.8, or 5.10.1 or later.
-
-Most of the functions might still work with Perl 5.6.1 or 5.6.2,
-but 5.6.* is no longer a supported version.
+Perl 5.14.0 or a later version is required.


 Required Perl Modules
@ -243,81 +239,41 @@ SpamAssassin build directory.
 The list of required modules that do not ship with Perl and must be
 installed:

-  - Digest::SHA1 (from CPAN),
-    or the newer Digest::SHA which is a perl base module since Perl 5.10.0
+  - Digest::SHA (from CPAN)

-    The Digest::SHA1 module is used as a cryptographic hash for some
-    tests and the Bayes subsystem if Digest::SHA module is not available.
-
-    An external perl module razor-agents-2.84 as used by a Razor2 plugin
-    seems to be the only remaining component depending on Digest::SHA1
-    (note that a packager may ship a patched version of razor-agents which
-    can use Digest::SHA instead)
-
-    Debian: apt-get install libdigest-sha1-perl
-    Gentoo: emerge dev-perl/Digest-SHA1
-    Fedora: yum install perl-Digest-SHA1
+    Used as a cryptographic hash for some tests and the Bayes subsystem.
+    It is also required by the DKIM plugin.

  - HTML::Parser >= 3.43 (from CPAN)

    HTML is used for an ever-increasing amount of email so this dependency
    is unavoidable.  Run "perldoc -q html" for additional information.

-    Debian: apt-get install libhtml-parser-perl
-    Gentoo: emerge dev-perl/HTML-Parser
-    Fedora: yum install perl-HTML-Parser
-
-  - Net::DNS (from CPAN)
+  - Net::DNS >= 0.69 (from CPAN)

    Used for all DNS-based tests (SBL, XBL, SpamCop, DSBL, etc.),
    perform MX checks, used when manually reporting spam to SpamCop,
    and used by sa-update to gather version information.

-    You need to make sure the Net::DNS version is sufficiently up-to-date:
+  - NetAddr::IP >= 4.010 (from CPAN)

-      - version 0.34 or higher on Unix systems
-      - version 0.46 or higher on Windows systems
+    Used to parse IP addresses and IP address ranges for "trusted_networks". 
+    Used in determining which DNS tests are to be done for each of the
+    header's received fields.  Used by AWL plugin for extracting network
+    addresses.  Used by DNSxL rules for assembling DNS queries.

-    Debian/Ubuntu: apt-get install libnet-dns-perl
-    Fedora: yum install perl-Net-DNS
+    Avoid buggy versions 4.034-4.035 and 4.045-4.054.

-  - NetAddr::IP (from CPAN)
+Examples of installing required modules on popular distributions:

-    Used to parse IP addresses and IP address ranges for
-    "trusted_networks".
+ Debian/Ubuntu:
+  apt-get install libdigest-sha-perl libhtml-parser-perl libnet-dns-perl libnetaddr-ip-perl

-    Debian/Ubuntu: apt-get install libnetaddr-ip-perl
-    Fedora: yum install perl-NetAddr-IP
+ Gentoo:
+  emerge dev-perl/Digest-SHA dev-perl/HTML-Parser dev-perl/Net-DNS dev-perl/NetAddr-IP

-  - Time::HiRes (from CPAN)
-
-    Used by asynchronous DNS lookups to operate timeouts with subsecond
-    precision and to report processing times accurately.
-
-  - LWP (aka libwww-perl) (from CPAN)
-
-    This set of modules will include both the LWP::UserAgent and
-    HTTP::Date modules, used by sa-update to retrieve update archives.
-
-    Fedora: yum install perl-libwww-perl
-
-  - HTTP::Date (from CPAN)
-
-    Used by sa-update to deal with certain Date requests.
-
-  - IO::Zlib (from CPAN)
-
-    Used by sa-update to uncompress update archives.
-    Version 1.04 or later is required.
-
-    Fedora: yum install perl-IO-Zlib
-
-  - Archive::Tar (from CPAN)
-
-    Used by sa-update to expand update archives.
-    Version 1.23 or later is required.
-
-    Fedora: yum install perl-Archive-Tar
+ Fedora/CentOS/RedHat:
+  yum install perl-Digest-SHA perl-HTML-Parser perl-Net-DNS perl-NetAddr-IP


 Optional Modules
@ -329,140 +285,161 @@ their version is too low, SpamAssassin will still work, just not as
 effectively because some of the spam-detection tests will have to be
 skipped.

-Note: SpamAssassin will not warn you if these are installed, but the
-version is too low for them to be used.
+Note: SpamAssassin may not warn you if these are installed, but the version
+is too low for them to be used.

-  - MIME::Base64
+  - MIME::Base64 (from CPAN)

    This module is highly recommended to increase the speed with which
    Base64 encoded messages/mail parts are decoded.

+  - Encode::Detect::Detector (from CPAN)

-  - DB_File (from CPAN, included in many distributions)
+    For proper detection of charsets and converting them into Unicode, you
+    will need to install this module.

-    Used to store data on-disk, for the Bayes-style logic, TxRep, and
-    auto-whitelist.  *Much* more efficient than the other standard Perl
-    database packages.  Strongly recommended.
+  - Net::LibIDN2 (from CPAN)
+  - Net::LibIDN (from CPAN)

-    There seems to be a bug in libdb 4.1.25, which is
-    distributed by default on some versions of Linux.  See
-    http://wiki.apache.org/spamassassin/DbFileSleepBug for details.
+    Provides mapping between Internationalized Domain Names (IDN) in Unicode
+    and ASCII-compatible encoding (ACE) for use in DNS and comparisions. 
+    The module is optional, but without it Unicode IDN names found in mail
+    will not be suitable for DNS queries and welcome/blocklisting.

+    Either module should work fine, but newer Net::LibIDN2 might not be
+    available in all distributions.

-  - Net::SMTP (from CPAN)
-
-    Used when manually reporting spam to SpamCop.
-
-
-  - Mail::SPF (from CPAN)
-
-    Used to check DNS Sender Policy Framework (SPF) records to fight email
-    address forgery and make it easier to identify spams.  This module
-    makes Mail::SPF::Query obsolete.
-
-    Net::DNS version 0.58 or higher is required.
-
-    Note that NetAddr::IP (required by Mail::SPF) versions up to and
-    including version 4.006 include a bug that will slow down the entire
-    perl interpreter.  NetAddr::IP version 4.007 or later fixes this.
-
-
-  - Geo::IP (from CPAN)
-
-    Used by the RelayCountry plugin (not enabled by default) to determine
-    the domain country codes of each relay in the path of an email.
-
-    IP::Country::Fast is used as alternative if Geo::IP is not installed.
-    This is not recommended as it's obsolete.
-
-
-  - Net::Ident (from CPAN)
-
-    If you plan to use the --auth-ident option to spamd, you will need
-    to install this module.
-
-
-  - IO::Socket::INET6 (from CPAN)
-
-    This is required if the first nameserver listed in your IP
-    configuration or /etc/resolv.conf file is available only via an IPv6
-    address.
-
-    Fedora: yum install perl-IO-Socket-INET6
-
-
-  - IO::Socket::SSL (from CPAN)
-
-    If you wish to use SSL encryption to communicate between spamc and
-    spamd (the --ssl option to spamd), you need to install this
-    module. (You will need the OpenSSL libraries and use the
-    ENABLE_SSL="yes" argument to Makefile.PL to build and run an SSL
-    compatible spamc.)
-
-    Fedora: yum install perl-IO-Socket-SSL
-
-
-  - Compress::Zlib (from CPAN)
-
-    If you wish to use the optional zlib compression for communication
-    between spamc and spamd (the -z option to spamc), useful for
-    long-distance use of spamc over the internet, you need to install
-    this module.
-
-    Fedora: yum install perl-Compress-Zlib
+  - Email::Address::XS

+    Used to parse email addresses from header fields like To/From/cc, per
+    RFC 5322.  If installed, it may additionally be used by internal parser
+    to process complex lists.

  - Mail::DKIM (from CPAN)

    If this module is installed, and you enable the DKIM plugin,
    SpamAssassin will perform DKIM lookups when a DKIM-Signature header is
-    present in the message headers.  Current versions of Mail::DKIM (0.20
-    or later) also perform Domain Key lookups on DomainKey-Signature headers,
+    present in the message headers.  Current versions of Mail::DKIM (0.20 or
+    later) also perform Domain Key lookups on DomainKey-Signature headers,
    without requiring the Mail::DomainKeys module, which is now obsolete. 
-    Version 0.37 or later is preferred, the absolute minimal version is 0.31.
+    Version 0.37 or later is preferred, the absolute minimal version is
+    0.31.
    
-    Note that the Mail::DKIM module in turn requires the Digest::SHA module
-    and OpenSSL libraries.
+  - Mail::SPF (from CPAN)

+    Used to check DNS Sender Policy Framework (SPF) records to fight email
+    address forgery and make it easier to identify spams.
+
+  - MaxMind::DB::Reader::XS (GeoIP2) (from CPAN)
+  - MaxMind::DB::Reader (GeoIP2) (from CPAN)
+  - IP::Country::DB_File (from CPAN)
+  - Geo::IP (old deprecated GeoIP) (from CPAN)
+  - IP::Country::Fast (deprecated) (from CPAN)
+
+    Geolocation modules, choose one from the list (in recommended order).
+
+    Used by the RelayCountry plugin (not enabled by default) to determine
+    the domain country codes of each relay in the path of an email.  Also
+    used by the URILocalBL plugin (not enabled by default) to provide ISP
+    and Country code based filtering.
+
+    See: https://wiki.apache.org/spamassassin/RelayCountryPlugin
+
+  - Mail::DMARC
+
+    Used by the optional DMARC check plugin, which itself requires DKIM and
+    SPF features working.
+
+  - DB_File (from CPAN)
+
+    Used to store data on-disk, for the Bayes-style logic, TxRep, and
+    auto-welcomelist.  *Much* more efficient than the other standard Perl
+    database packages.  Strongly recommended.
+
+    There seems to be a bug in libdb 4.1.25, which is
+    distributed by default on some versions of Linux.  See
+    https://wiki.apache.org/spamassassin/DbFileSleepBug for details.
+
+  - IO::Socket::IP (from CPAN)
+  - IO::Socket::INET6 (from CPAN)
+
+    Installing IO::Socket::IP is recommended if spamd is to listen on IPv6
+    sockets or if DNS queries should go to an IPv6 name server.  If
+    IO::Socket::IP is not available, using a deprecated module
+    IO::Socket::INET6 will be attempted, and in its absence the support for
+    IPv6 will not be available.  Some plugins and underlying modules may
+    also prefer IO::Socket::IP over IO::Socket::INET6.
+
+  - IO::Socket::SSL (from CPAN)
+
+    If you wish to use SSL encryption to communicate between spamc and spamd
+    (the --ssl option to spamd), you need to install this module.  (You will
+    need the OpenSSL libraries and use the ENABLE_SSL=yes argument to
+    Makefile.PL to build and run an SSL compatible spamc.)
+
+  - Net::Patricia
+
+    If this module is available, it will be used for IP address lookups in
+    tables internal_networks, trusted_networks, and msa_networks. 
+    Recommended when a number of entries in these tables is hundred or more. 
+    However, in case of overlapping (or conflicting) networks in these
+    tables, lookup results may differ as Net::Patricia finds a
+    tightest-matching entry, while a sequential NetAddr::IP search finds a
+    first-matching entry.  So when overlapping network ranges are given,
+    specifying more specific subnets (longest netmask) first, followed by
+    wider subnets ensures predictable results.

  - DBI *and* DBD driver/modules for your database (from CPAN)

-    If you intend to use SpamAssassin with an SQL database backend for
-    user configuration data, Bayes storage, or other storage, you will need
-    to have these installed; both the basic DBI module and the driver for
-    your database.
+    If you intend to use SpamAssassin with an SQL database backend for user
+    configuration data, Bayes storage, or other storage, you will need to
+    have these installed; both the basic DBI module and the driver for your
+    database (for example DBD::MariaDB, DBD::mysql or DBD::Pg).

+  - Archive::Zip
+  - IO::String

-  - Encode::Detect (from CPAN)
-
-    If you plan to use the normalize_charset config setting to detect
-    charsets and convert them into Unicode, you will need to install
-    this module.
-
-
-  - Apache::Test (from CPAN)
-
-    If you plan to run the Apache2 version of spamd in the
-    "spamd-apache2" directory, you will need to install this
-    module.
-
-
-  - Apache 2 and mod_perl
-
-    If you plan to run the Apache2 version of spamd in the "spamd-apache2"
-    directory, you will need to ensure these are installed.
-
-    Ubuntu: sudo apt-get install apache2 libapache2-mod-perl2
-
+    Required by the optional OLEVBMacro plugin.

  - Razor2

-    If you plan to use Vipul's Razor, note that versions up to and
-    including version 2.82 include a bug that will slow down the entire
-    perl interpreter.  Version 2.83 or later fixes this.
+    If you plan to use Vipul's Razor, note that versions up to and including
+    version 2.82 include a bug that will slow down the entire perl
+    interpreter.  Version 2.83 or later fixes this.

-    If you do not plan to use this plugin, be sure to comment out
-    its loadplugin line in "/etc/mail/spamassassin/v310.pre".
+    If you do not plan to use this plugin, be sure to comment out its
+    loadplugin line in "/etc/mail/spamassassin/v310.pre".
+
+  - Digest::SHA1 (from CPAN)
+
+    An external perl module razor-agents-2.84 as used by a Razor2 plugin
+    seems to be the only remaining component depending on Digest::SHA1 (note
+    that a packager may ship a patched version of razor-agents which can use
+    Digest::SHA instead)
+
+  - LWP::UserAgent (aka libwww-perl) (from CPAN)
+
+    Can be used by sa-update to retrieve update archives, as alternative to
+    curl/wget/fetch.
+
+  - Net::SMTP (from CPAN)
+
+    Used when manually reporting spam to SpamCop.
+
+Examples of installing most recommended modules on popular distributions:
+
+ Debian/Ubuntu:
+  apt-get install libencode-detect-perl libnet-libidn-perl \
+    libemail-address-xs-perl libmail-dkim-perl libmail-spf-perl \
+    libio-socket-ip-perl
+
+ Gentoo:
+  emerge dev-perl/Encode-Detect dev-perl/Net-LibIDN \
+    dev-perl/Email-Address-XS dev-perl/Mail-DKIM dev-perl/Mail-SPF
+
+ Fedora/CentOS/RedHat:
+  yum install perl-MIME-Base64 perl-Encode-Detect perl-Net-LibIDN \
+    perl-Email-Address-XS perl-Mail-DKIM perl-Mail-SPF perl-IO-Socket-IP


 What Next?
@ -471,7 +448,7 @@ What Next?
 Take a look at the USAGE document for more information on how to use
 SpamAssassin.

-The SpamAssassin Wiki <http://wiki.spamassassin.org/> contains
+The SpamAssassin Wiki <https://wiki.apache.org/spamassassin/> contains
 information on custom plugins, extensions, and other optional modules
 included with SpamAssassin.

--- a/upstream/INSTALL.VMS
+++ b/upstream/INSTALL.VMS
@ -30,6 +30,6 @@ Notes on building SpamAssassin on VMS

  - bug 1099 in the SA Bugzilla is being used to track progress.

-        http://issues.apache.org/SpamAssassin/show_bug.cgi?id=1099
+        https://issues.apache.org/SpamAssassin/show_bug.cgi?id=1099


--- a/upstream/MANIFEST
+++ b/upstream/MANIFEST
@ -27,7 +27,7 @@ lib/Mail/SpamAssassin.pm
 lib/Mail/SpamAssassin/AICache.pm
 lib/Mail/SpamAssassin/ArchiveIterator.pm
 lib/Mail/SpamAssassin/AsyncLoop.pm
-lib/Mail/SpamAssassin/AutoWhitelist.pm
+lib/Mail/SpamAssassin/AutoWelcomelist.pm
 lib/Mail/SpamAssassin/Bayes.pm
 lib/Mail/SpamAssassin/Bayes/CombineChi.pm
 lib/Mail/SpamAssassin/Bayes/CombineNaiveBayes.pm
@ -49,6 +49,7 @@ lib/Mail/SpamAssassin/Constants.pm
 lib/Mail/SpamAssassin/DBBasedAddrList.pm
 lib/Mail/SpamAssassin/Dns.pm
 lib/Mail/SpamAssassin/DnsResolver.pm
+lib/Mail/SpamAssassin/GeoDB.pm
 lib/Mail/SpamAssassin/HTML.pm
 lib/Mail/SpamAssassin/Locales.pm
 lib/Mail/SpamAssassin/Locker.pm
@ -74,20 +75,23 @@ lib/Mail/SpamAssassin/Plugin/ASN.pm
 lib/Mail/SpamAssassin/Plugin/AWL.pm
 lib/Mail/SpamAssassin/Plugin/AccessDB.pm
 lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
+lib/Mail/SpamAssassin/Plugin/AuthRes.pm
 lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
 lib/Mail/SpamAssassin/Plugin/Bayes.pm
 lib/Mail/SpamAssassin/Plugin/BodyEval.pm
 lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
 lib/Mail/SpamAssassin/Plugin/Check.pm
 lib/Mail/SpamAssassin/Plugin/DCC.pm
+lib/Mail/SpamAssassin/Plugin/DecodeShortURLs.pm
 lib/Mail/SpamAssassin/Plugin/DKIM.pm
+lib/Mail/SpamAssassin/Plugin/DMARC.pm
 lib/Mail/SpamAssassin/Plugin/DNSEval.pm
+lib/Mail/SpamAssassin/Plugin/ExtractText.pm
 lib/Mail/SpamAssassin/Plugin/FreeMail.pm
 lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
+lib/Mail/SpamAssassin/Plugin/HashBL.pm
 lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
 lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
-lib/Mail/SpamAssassin/Plugin/Hashcash.pm
-lib/Mail/SpamAssassin/Plugin/HashBL.pm
 lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
 lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
 lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
@ -116,7 +120,7 @@ lib/Mail/SpamAssassin/Plugin/URIDetail.pm
 lib/Mail/SpamAssassin/Plugin/URIEval.pm
 lib/Mail/SpamAssassin/Plugin/VBounce.pm
 lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
-lib/Mail/SpamAssassin/Plugin/WhiteListSubject.pm
+lib/Mail/SpamAssassin/Plugin/WelcomeListSubject.pm
 lib/Mail/SpamAssassin/PluginHandler.pm
 lib/Mail/SpamAssassin/Plugin/URILocalBL.pm
 lib/Mail/SpamAssassin/RegistryBoundaries.pm
@ -132,7 +136,6 @@ lib/Mail/SpamAssassin/Util/ScopedTimer.pm
 lib/Mail/SpamAssassin/Util/TieOneStringHash.pm
 lib/spamassassin-run.pod
 procmailrc.example
-rules.README
 rules/active.list
 rules/init.pre
 rules/languages
@ -148,7 +151,10 @@ rules/v340.pre
 rules/v341.pre
 rules/v342.pre
 rules/v343.pre
+rules/v400.pre
 rules/20_aux_tlds.cf
+rules-extras/README.txt
+rules-extras/10_uridnsbl_skip_financial.cf
 sa-awl.raw
 sa-check_spamd.raw
 sa-compile.raw
@ -222,6 +228,9 @@ sql/awl_mysql.sql
 sql/awl_pg.sql
 sql/bayes_mysql.sql
 sql/bayes_pg.sql
+sql/decodeshorturl_mysql.sql
+sql/decodeshorturl_pg.sql
+sql/decodeshorturl_sqlite.sql
 sql/userpref_mysql.sql
 sql/userpref_pg.sql
 sql/txrep_mysql.sql
@ -231,10 +240,13 @@ t/README
 t/SATest.pl
 t/SATest.pm
 t/all_modules.t
+t/askdns.t
+t/authres.t
 t/autolearn.t
 t/autolearn_force.t
 t/autolearn_force_fail.t
 t/basic_lint.t
+t/basic_lint_net.t
 t/basic_lint_without_sandbox.t
 t/basic_meta.t
 t/basic_meta2.t
@ -246,6 +258,7 @@ t/bayessdbm.t
 t/bayessdbm_seen_delete.t
 t/bayessql.t
 t/blacklist_autolearn.t
+t/blocklist_autolearn.t
 t/body_mod.t
 t/body_str.t
 t/check_implemented.t
@ -256,10 +269,13 @@ t/config_text.t
 t/config_tree_recurse.t
 t/cpp_comments_in_spamc.t
 t/cross_user_config_leak.t
-t/olevbmacro.t
+t/dmarc.t
+t/arc.t
 t/data/01_test_rules.cf
 t/data/01_test_rules.pre
 t/data/Dumpheaders.pm
+t/data/dkim/arc/ok01.eml
+t/data/dkim/arc/ko01.eml
 t/data/dkim/test-adsp-11.msg
 t/data/dkim/test-adsp-12.msg
 t/data/dkim/test-adsp-13.msg
@ -307,6 +323,15 @@ t/data/dkim/test-pass-23.msg
 t/data/etc/hello.txt
 t/data/etc/testhost.cert
 t/data/etc/testhost.key
+t/data/geodb/GeoIP2-City.mmdb
+t/data/geodb/GeoIP2-Country.mmdb
+t/data/geodb/GeoIP2-ISP.mmdb
+t/data/geodb/GeoIPCity.dat
+t/data/geodb/GeoIPISP.dat
+t/data/geodb/create_GeoIPCity.README
+t/data/geodb/create_GeoIPISP.README
+t/data/geodb/create_ipcc.sh
+t/data/geodb/ipcc.db
 t/data/mime-subject.txt
 t/data/nice/001
 t/data/nice/002
@ -324,6 +349,7 @@ t/data/nice/013
 t/data/nice/014
 t/data/nice/015
 t/data/nice/016
+t/data/nice/authres
 t/data/nice/base64.txt
 t/data/nice/crlf-endings
 t/data/nice/dkim/AddedVtag_07
@ -336,6 +362,10 @@ t/data/nice/dkim/MultipleSig_06
 t/data/nice/dkim/NonExistingHeader_09
 t/data/nice/dkim/Nowsp_03
 t/data/nice/dkim/Simple_02
+t/data/nice/dmarc/noneok.eml
+t/data/nice/dmarc/quarok.eml
+t/data/nice/dmarc/rejectok.eml
+t/data/nice/dmarc/strictrejectok.eml
 t/data/nice/mailman_message.txt
 t/data/nice/mime1
 t/data/nice/mime2
@ -353,6 +383,14 @@ t/data/nice/spf1
 t/data/nice/spf2
 t/data/nice/spf3
 t/data/nice/spf3-received-spf
+t/data/nice/spf4-received-spf-nofold
+t/data/nice/spf5-received-spf-crlf
+t/data/nice/spf6-received-spf-crlf2
+t/data/nice/unicode1
+t/data/nice/unicode2
+t/data/nice.mbox
+t/data/phishing/openphish-feed.txt
+t/data/phishing/phishtank-feed.csv
 t/data/reporterplugin.pm
 t/data/spam/001
 t/data/spam/002
@ -380,107 +418,149 @@ t/data/spam/badmime3.txt
 t/data/spam/base64.txt
 t/data/spam/bsmtp
 t/data/spam/bsmtpnull
+t/data/spam/decodeshorturl/base.eml
+t/data/spam/decodeshorturl/base2.eml
+t/data/spam/decodeshorturl/chain.eml
+t/data/spam/dmarc/nodmarc.eml
+t/data/spam/dmarc/noneko.eml
+t/data/spam/dmarc/quarko.eml
+t/data/spam/dmarc/rejectko.eml
+t/data/spam/dmarc/strictrejectko.eml
 t/data/spam/dnsbl.eml
+t/data/spam/dnsbl_domsonly.eml
+t/data/spam/dnsbl_ipsonly.eml
+t/data/spam/extracttext/gtube_png.eml
+t/data/spam/extracttext/gtube_pdf.eml
+t/data/spam/extracttext/gtube_b64_oct.eml
+t/data/spam/freemail1
+t/data/spam/freemail2
+t/data/spam/freemail3
+t/data/spam/fromnamespoof/spoof1
 t/data/spam/gtube.eml
 t/data/spam/gtubedcc.eml
+t/data/spam/gtubedcc_crlf.eml
+t/data/spam/hashbl
 t/data/spam/olevbmacro/encrypted.eml
 t/data/spam/olevbmacro/goodcsv.eml
 t/data/spam/olevbmacro/macro.eml
 t/data/spam/olevbmacro/malicemacro.eml
 t/data/spam/olevbmacro/nomacro.eml
 t/data/spam/olevbmacro/renamedmacro.eml
+t/data/spam/olevbmacro/target_uri.eml
 t/data/spam/olevbmacro/zippwmacro.eml
+t/data/spam/phishing_openphish.eml
+t/data/spam/phishing_phishtank.eml
+t/data/spam/pyzor
 t/data/spam/razor2
 t/data/spam/relayUS.eml
 t/data/spam/spf1
 t/data/spam/spf2
 t/data/spam/spf3
+t/data/spam/unicode1
+t/data/spam/urilocalbl_net.eml
 t/data/spamc_blank.cf
 t/data/taintcheckplugin.pm
 t/data/testplugin.pm
 t/data/testplugin2.pm
 t/data/validuserplugin.pm
-t/data/whitelists/action.eff.org
-t/data/whitelists/amazon_co_uk_ship
-t/data/whitelists/amazon_com_ship
-t/data/whitelists/cert.org
-t/data/whitelists/debian_bts_reassign
-t/data/whitelists/ibm_enews_de
-t/data/whitelists/infoworld
-t/data/whitelists/linuxplanet
-t/data/whitelists/lp.org
-t/data/whitelists/media_unspun
-t/data/whitelists/mlist_mailman_message
-t/data/whitelists/mlist_yahoo_groups_message
-t/data/whitelists/mypoints
-t/data/whitelists/neat_net_tricks
-t/data/whitelists/netcenter-direct_de
-t/data/whitelists/netsol_renewal
-t/data/whitelists/networkworld
-t/data/whitelists/oracle_net_techblast
-t/data/whitelists/orbitz.com
-t/data/whitelists/paypal.com
-t/data/whitelists/register.com_password
-t/data/whitelists/ryanairmail.com
-t/data/whitelists/sf.net
-t/data/whitelists/winxpnews.com
-t/data/whitelists/yahoo-inc.com
-t/data/phishing/openphish-feed.txt
-t/data/phishing/phishtank-feed.csv
-t/data/spam/phishing_openphish.eml
-t/data/spam/phishing_phishtank.eml
-t/phishing.t
+t/data/welcomelists/action.eff.org
+t/data/welcomelists/amazon_co_uk_ship
+t/data/welcomelists/amazon_com_ship
+t/data/welcomelists/cert.org
+t/data/welcomelists/debian_bts_reassign
+t/data/welcomelists/ibm_enews_de
+t/data/welcomelists/infoworld
+t/data/welcomelists/linuxplanet
+t/data/welcomelists/lp.org
+t/data/welcomelists/media_unspun
+t/data/welcomelists/mlist_mailman_message
+t/data/welcomelists/mlist_yahoo_groups_message
+t/data/welcomelists/mypoints
+t/data/welcomelists/neat_net_tricks
+t/data/welcomelists/netcenter-direct_de
+t/data/welcomelists/netsol_renewal
+t/data/welcomelists/networkworld
+t/data/welcomelists/oracle_net_techblast
+t/data/welcomelists/orbitz.com
+t/data/welcomelists/paypal.com
+t/data/welcomelists/register.com_password
+t/data/welcomelists/ryanairmail.com
+t/data/welcomelists/sf.net
+t/data/welcomelists/winxpnews.com
+t/data/welcomelists/yahoo-inc.com
 t/date.t
 t/db_awl_path.t
+t/db_awl_path_welcome_block.t
 t/db_awl_perms.t
+t/db_awl_perms_welcome_block.t
+t/db_based_welcomelist.t
+t/db_based_welcomelist_ips.t
 t/db_based_whitelist.t
 t/db_based_whitelist_ips.t
 t/dcc.t
 t/debug.t
+t/decodeshorturl.t
 t/desc_wrap.t
 t/dkim.t
 t/dnsbl.t
 t/dnsbl_sc_meta.t
-t/duplicates.t
+t/dnsbl_subtests.t
+t/enable_compat.t
+t/extracttext.t
 t/freemail.t
+t/freemail_welcome_block.t
+t/fromnamespoof.t
 t/get_all_headers.t
 t/get_headers.t
 t/gtube.t
-t/hashcash.t
+t/header.t
+t/header_utf8.t
+t/hashbl.t
 t/html_colors.t
 t/html_obfu.t
 t/html_utf8.t
 t/idn_dots.t
 t/if_can.t
+t/if_else.t
 t/ifversion.t
 t/ip_addrs.t
 t/lang_lint.t
 t/lang_pl_tests.t
 t/line_endings.t
 t/lint_nocreate_prefs.t
+t/local_tests_only.t
 t/memory_cycles.t
 t/metadata.t
 t/mimeheader.t
 t/mimeparse.t
 t/missing_hb_separator.t
+t/mkrules.t
+t/mkrules_else.t
 t/nonspam.t
+t/olevbmacro.t
 t/originating_ip_hdr.t
+t/pdfinfo.t
+t/phishing.t
 t/plugin.t
 t/plugin_file.t
 t/plugin_priorities.t
 t/prefs_include.t
 t/priorities.t
+t/priorities_welcome_block.t
+t/perlcritic.t
+t/perlcritic.pl
+t/podchecker.t
+t/pyzor.t
 t/razor2.t
 t/rcvd_parser.t
 t/re_base_extraction.t
 t/recips.t
 t/recreate.t
 t/recursion.t
+t/regexp_named_capture.t
 t/regexp_valid.t
-t/relaycountry_fast.t
-t/relaycountry_geoip.t
-t/relaycountry_geoip2.t
 t/relative_scores.t
+t/relaycountry.t
 t/report_safe.t
 t/reportheader.t
 t/reportheader_8bit.t
@ -500,14 +580,18 @@ t/rule_multiple.t
 t/rule_names.t
 t/rule_types.t
 t/sa_awl.t
+t/sa_awl_welcome_block.t
 t/sa_check_spamd.t
 t/sa_compile.t
 t/sha1.t
 t/shortcircuit.t
+t/shortcircuit_before_dns.t
 t/spam.t
 t/spamc.t
 t/spamc_B.t
 t/spamc_E.t
+t/spamc_H.t
+t/spamc_bug6176.t
 t/spamc_c.t
 t/spamc_c_stdout_closed.t
 t/spamc_cf.t
@ -540,6 +624,7 @@ t/spamd_report.t
 t/spamd_report_ifspam.t
 t/spamd_sql_prefs.t
 t/spamd_ssl.t
+t/spamd_ssl_z.t
 t/spamd_ssl_accept_fail.t
 t/spamd_stop.t
 t/spamd_symbols.t
@ -548,8 +633,11 @@ t/spamd_unix.t
 t/spamd_unix_and_tcp.t
 t/spamd_user_rules_leak.t
 t/spamd_utf8.t
+t/spamd_welcomelist_leak.t
 t/spamd_whitelist_leak.t
 t/spf.t
+t/spf_welcome_block.t
+t/sql_based_welcomelist.t
 t/sql_based_whitelist.t
 t/stop_always_matching_regexps.t
 t/strip2.t
@ -557,30 +645,31 @@ t/strip_no_subject.t
 t/stripmarkup.t
 t/tainted_msg.t
 t/test_dir
+t/testrules.yml
 t/text_bad_ctype.t
 t/timeout.t
 t/trust_path.t
 t/uri.t
 t/uri_html.t
+t/uri_list.t
+t/uri_saferedirect.t
 t/uri_text.t
 t/uribl.t
-t/urilocalbl_geoip.t
-t/uri_saferedirect.t
+t/uribl_all_types.t
+t/uribl_domains_only.t
+t/uribl_ips_only.t
+t/urilocalbl.t
 t/utf8.t
 t/util_wrap.t
+t/welcomelist_addrs.t
+t/welcomelist_from.t
+t/welcomelist_subject.t
+t/welcomelist_to.t
+t/wlbl_uri.t
 t/whitelist_addrs.t
 t/whitelist_from.t
 t/whitelist_subject.t
 t/whitelist_to.t
-t/zz_cleanup.t
-t/spamc_bug6176.t
-t/data/spam/dnsbl_domsonly.eml
-t/uribl_domains_only.t
-t/data/spam/dnsbl_ipsonly.eml
-t/uribl_all_types.t
-t/uribl_ips_only.t
-t/uri_list.t
-t/dnsbl_subtests.t
 powered_by/128-powered-by-spamassassin.png
 powered_by/256-powered-by-spamassassin.png
 powered_by/512-powered-by-spamassassin.png
--- a/upstream/MANIFEST.SKIP
+++ b/upstream/MANIFEST.SKIP
@ -10,6 +10,7 @@
 \.pid$
 \.so$
 \.svn/
+\.gitattributes$
 \.gitignore$
 \.swp$
 \.tmp$
@ -22,6 +23,7 @@
 \btmon\.out$
 \b[Oo][Ll][Dd]$
 \b[Oo][Uu][Tt]$
+^.github/
 ^blib/
 ^blibdirs$
 ^build/3\.\d\.\d_change_summary$
@ -63,7 +65,7 @@
 ^t/bayessql\.cf$
 ^t/config$
 ^t/data/nice/cjk/
-^t/data/whitelists/
+^t/data/welcomelists/
 ^t/do_net$
 ^t/log/
 ^t/rule_tests\.t$
@ -118,7 +120,4 @@
 ^build/rebuild_xt$
 ^build/repackage_latest_update_rules$
 ^MYMETA.(json|yml)$
-^trunk-only.*
-^t/mkrules\.t$
-^t/mkrules_else\.t$
-^t/spamc_H\.t$
+^textcat/
--- a/upstream/META.json
+++ b/upstream/META.json
@ -4,13 +4,14 @@
      "The Apache SpamAssassin Project <dev@spamassassin.apache.org>"
   ],
   "dynamic_config" : 1,
-   "generated_by" : "ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921",
+   "generated_by" : "ExtUtils::MakeMaker version 7.62, CPAN::Meta::Converter version 2.150010",
   "license" : [
+      "unknown",
      "apache_2_0"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
-      "version" : "2"
+      "version" : 2
   },
   "name" : "Mail-SpamAssassin",
   "no_index" : {
@ -22,26 +23,72 @@
   "prereqs" : {
      "build" : {
         "requires" : {
-            "ExtUtils::MakeMaker" : "0"
+            "ExtUtils::MakeMaker" : "6.64"
+         }
+      },
+      "configure" : {
+         "requires" : {
+            "ExtUtils::MakeMaker" : "6.64"
         }
      },
      "runtime" : {
+         "recommends" : {
+            "Archive::Zip" : "0",
+            "BSD::Resource" : "0",
+            "Compress::Zlib" : "0",
+            "DBD::SQLite" : "1.5901",
+            "DBI" : "0",
+            "DB_File" : "0",
+            "Email::Address::XS" : "0",
+            "Encode::Detect::Detector" : "0",
+            "Geo::IP" : "0",
+            "IO::Socket::INET6" : "0",
+            "IO::Socket::IP" : "0.09",
+            "IO::Socket::SSL" : "1.76",
+            "IO::String" : "0",
+            "IP::Country::DB_File" : "0",
+            "IP::Country::Fast" : "0",
+            "LWP::UserAgent" : "0",
+            "MIME::Base64" : "0",
+            "Mail::DKIM" : "0.37",
+            "Mail::DMARC" : "0",
+            "Mail::SPF" : "0",
+            "MaxMind::DB::Reader" : "0",
+            "MaxMind::DB::Reader::XS" : "0",
+            "Net::CIDR::Lite" : "0",
+            "Net::LibIDN" : "0",
+            "Net::LibIDN2" : "0",
+            "Net::Patricia" : "1.16",
+            "Net::SMTP" : "0",
+            "Razor2::Client::Agent" : "2.61"
+         },
         "requires" : {
            "Archive::Tar" : "1.23",
-            "Digest::SHA1" : "0",
            "Errno" : "0",
            "File::Copy" : "2.02",
            "File::Spec" : "0.8",
            "HTML::Parser" : "3.43",
            "IO::Zlib" : "1.04",
            "Mail::DKIM" : "0.31",
-            "Net::DNS" : "0.34",
+            "Net::DNS" : "0.69",
            "NetAddr::IP" : "4.01",
            "Pod::Usage" : "1.1",
            "Sys::Hostname" : "0",
-            "Test::More" : "0",
            "Time::HiRes" : "0",
-            "Time::Local" : "0"
+            "Time::Local" : "0",
+            "perl" : "5.014"
+         }
+      },
+      "test" : {
+         "recommends" : {
+            "Net::DNS::Nameserver" : "0"
+         },
+         "requires" : {
+            "Devel::Cycle" : "0",
+            "Perl::Critic::Policy::Perlsecret" : "0",
+            "Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict" : "0",
+            "Test::More" : "0",
+            "Text::Diff" : "0"
         }
      }
   },
@ -57,5 +104,6 @@
      },
      "x_MailingList" : "http://wiki.apache.org/spamassassin/MailingLists"
   },
-   "version" : "3.004006"
+   "version" : "4.000000",
+   "x_serialization_backend" : "JSON::PP version 4.06"
 }
--- a/upstream/META.yml
+++ b/upstream/META.yml
@ -3,37 +3,73 @@ abstract: 'Apache SpamAssassin is an extensible email filter which is used to id
 author:
  - 'The Apache SpamAssassin Project <dev@spamassassin.apache.org>'
 build_requires:
-  ExtUtils::MakeMaker: 0
+  Devel::Cycle: '0'
+  ExtUtils::MakeMaker: '6.64'
+  Perl::Critic::Policy::Perlsecret: '0'
+  Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict: '0'
+  Test::More: '0'
+  Text::Diff: '0'
+configure_requires:
+  ExtUtils::MakeMaker: '6.64'
 dynamic_config: 1
-generated_by: 'ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.120921'
-license: apache
+generated_by: 'ExtUtils::MakeMaker version 7.62, CPAN::Meta::Converter version 2.150010'
+license: unknown
 meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
-  version: 1.4
+  version: '1.4'
 name: Mail-SpamAssassin
 no_index:
  directory:
    - t
    - inc
+recommends:
+  Archive::Zip: '0'
+  BSD::Resource: '0'
+  Compress::Zlib: '0'
+  DBD::SQLite: '1.5901'
+  DBI: '0'
+  DB_File: '0'
+  Email::Address::XS: '0'
+  Encode::Detect::Detector: '0'
+  Geo::IP: '0'
+  IO::Socket::INET6: '0'
+  IO::Socket::IP: '0.09'
+  IO::Socket::SSL: '1.76'
+  IO::String: '0'
+  IP::Country::DB_File: '0'
+  IP::Country::Fast: '0'
+  LWP::UserAgent: '0'
+  MIME::Base64: '0'
+  Mail::DKIM: '0.37'
+  Mail::DMARC: '0'
+  Mail::SPF: '0'
+  MaxMind::DB::Reader: '0'
+  MaxMind::DB::Reader::XS: '0'
+  Net::CIDR::Lite: '0'
+  Net::LibIDN: '0'
+  Net::LibIDN2: '0'
+  Net::Patricia: '1.16'
+  Net::SMTP: '0'
+  Razor2::Client::Agent: '2.61'
 requires:
-  Archive::Tar: 1.23
-  Digest::SHA1: 0
-  Errno: 0
-  File::Copy: 2.02
-  File::Spec: 0.8
-  HTML::Parser: 3.43
-  IO::Zlib: 1.04
-  Mail::DKIM: 0.31
-  Net::DNS: 0.34
-  NetAddr::IP: 4.01
-  Pod::Usage: 1.1
-  Sys::Hostname: 0
-  Test::More: 0
-  Time::HiRes: 0
-  Time::Local: 0
+  Archive::Tar: '1.23'
+  Errno: '0'
+  File::Copy: '2.02'
+  File::Spec: '0.8'
+  HTML::Parser: '3.43'
+  IO::Zlib: '1.04'
+  Mail::DKIM: '0.31'
+  Net::DNS: '0.69'
+  NetAddr::IP: '4.01'
+  Pod::Usage: '1.1'
+  Sys::Hostname: '0'
+  Time::HiRes: '0'
+  Time::Local: '0'
+  perl: '5.014'
 resources:
+  MailingList: http://wiki.apache.org/spamassassin/MailingLists
  homepage: https://spamassassin.apache.org/
  license: http://www.apache.org/licenses/LICENSE-2.0.html
  repository: http://svn.apache.org/repos/asf/spamassassin/
-  x_MailingList: http://wiki.apache.org/spamassassin/MailingLists
-version: 3.004006
+version: '4.000000'
+x_serialization_backend: 'CPAN::Meta::YAML version 0.018'
--- a/upstream/Makefile.PL
+++ b/upstream/Makefile.PL
@ -1,19 +1,14 @@
 #!/usr/bin/perl
-require 5.6.1;
+require v5.14.0;

 use strict;
 use warnings;
 use Config;

-use ExtUtils::MakeMaker;
+use ExtUtils::MakeMaker 6.64;

-# avoid stupid 'Argument "6.30_01" isn't numeric in numeric ge (>=)' warnings;
-my $mm_version = eval $ExtUtils::MakeMaker::VERSION;
-
-# raising the version of makemaker to 6.46 per bug 6598 & 6131
-if ($mm_version < 6.46) {
-  die "Apache SpamAssassin Makefile.PL requires at least ExtUtils::MakeMaker v6.46";
-}
+# raising the version of makemaker to 6.64 to use TEST_REQUIRES
+use constant MIN_MAKEMAKER_VERSION => 6.64;

 use constant RUNNING_ON_WINDOWS => ($^O =~ /^(mswin|dos|os2)/oi);
 use constant HAS_DBI => eval { require DBI; };
@ -177,9 +172,7 @@ my %makefile = (
    },

    # be quite explicit about this; afaik CPAN.pm is sensible using this
-    # also see CURRENT_PM below
    'PREREQ_PM' => {
-        'Digest::SHA1'         => 0,             # 2.0 is oldest tested version
        'File::Spec'           => 0.8,           # older versions lack some routines we need
        'File::Copy'           => 2.02,          # this version is shipped with 5.005_03, the oldest version known to work
        'Pod::Usage'           => 1.10,          # all versions prior to this do seem to be buggy
@ -187,18 +180,37 @@ my %makefile = (
        'Archive::Tar'         => 1.23,          # for sa-update
        'IO::Zlib'             => 1.04,          # for sa-update
        'Mail::DKIM'           => 0.31,
-        'Net::DNS'             => (RUNNING_ON_WINDOWS ? 0.46 : 0.34), # bugs in older revs
+        'Net::DNS'             => 0.69,
        'NetAddr::IP'          => 4.010,
        'Sys::Hostname'        => 0,
-        'Test::More'           => 0,
        'Time::HiRes'          => 0,
        'Time::Local'          => 0,
        'Errno'                => 0,
    },

+    # In case MIN_MAKEMAKER_VERSION is greater than the version bundled in the core of MIN_PERL_VERSION
+    # use this to ensure CPAN will automatically upgrade MakeMaker if needed
+    'BUILD_REQUIRES' => {
+        'ExtUtils::MakeMaker' => MIN_MAKEMAKER_VERSION,
+    },
+
+    'CONFIGURE_REQUIRES' => {
+        'ExtUtils::MakeMaker' => MIN_MAKEMAKER_VERSION,
+    },
+
+    # The modules that are not core that are used in default tests
+    'TEST_REQUIRES' => {
+        'Devel::Cycle' => 0,
+        'Test::More'   => 0,
+        'Text::Diff' => 0,
+        'Perl::Critic::Policy::TestingAndDebugging::ProhibitNoStrict' => 0,
+        'Perl::Critic::Policy::Perlsecret' => 0,
+    },
+
    'dist' => {
        COMPRESS => 'gzip -9f',
-        SUFFIX => 'gz',
+        SUFFIX => '.gz',
+        TARFLAGS => 'cf',
        DIST_DEFAULT => 'tardist',

        CI => 'svn commit',
@ -228,12 +240,8 @@ my %makefile = (

        'rules/*.pm',

-        # don't remove these. they are built from 'rulesrc' in SVN, but
-        # in a distribution tarball, they're not
-        # 'rules/70_sandbox.cf',
-        # 'rules/72_active.cf',
-
-        # this file is no longer built, or used
+        'rules/70_sandbox.cf',
+        'rules/72_active.cf',
        'rules/70_inactive.cf',

      )
@ -246,6 +254,7 @@ my %makefile = (
    # We have only this Makefile.PL and this option keeps MakeMaker from
    # asking all questions twice after a 'make dist*'.
    'NORECURS' => 1,
+    'MIN_PERL_VERSION'=> 5.014000,
 );

 # rules/72_active.cf is built from "rulesrc", but *must* exist before
@ -254,14 +263,6 @@ my @FILES_THAT_MUST_EXIST = qw(
        rules/72_active.cf
    );

-# make sure certain optional modules are up-to-date if they are installed
-# also see PREREQ_PM above
-my %CURRENT_PM = (
-    'Net::DNS' => (RUNNING_ON_WINDOWS ? 0.46 : 0.34),
-    'Razor2::Client::Agent' => 2.40,
-);
-
-
 # All the $(*MAN1*) stuff is empty/zero if Perl was Configured with -Dman1dir=none;
 # however, support site/vendor man1 dirs (bug 5338)
 unless($Config{installman1dir}
@ -324,11 +325,13 @@ See that file if you wish to customize what tests are run, and how.

 # check optional module versions
 use lib 'lib';
-use Mail::SpamAssassin::Util::DependencyInfo;
-if (Mail::SpamAssassin::Util::DependencyInfo::long_diagnostics() != 0) {
-  # missing required module?  die!
-  # bug 5908: http://cpantest.grango.org/wiki/CPANAuthorNotes says
-  # we should exit with a status of 0, but without creating Makefile
+require Mail::SpamAssassin::Util::DependencyInfo;
+if (Mail::SpamAssassin::Util::DependencyInfo::long_diagnostics(1) != 0) {
+  # This prints a full report of missing required and optional modules and binaries
+  # but only exit 0 without creating Makefile if there are missing required binaries.
+  # See http://cpantest.grango.org/wiki/CPANAuthorNotes
+  # Continuing when there are missing required CPAN modules allows cpan to install them
+  # before it runs make on the Makefile
  exit 0;
 }

@ -362,26 +365,45 @@ $makefile{META_MERGE} = {
    MailingList => 'http://wiki.apache.org/spamassassin/MailingLists',
  },

+  prereqs => {
+    runtime => {
      recommends => {
        'MIME::Base64' =>              0,
        'DB_File' =>                   0,
        'Net::SMTP' =>                 0,
+        'Net::LibIDN2' =>              0,
+        'Net::LibIDN' =>               0,
        'Mail::SPF' =>                 0,
+        'MaxMind::DB::Reader' =>       0,
+        'MaxMind::DB::Reader::XS' =>   0,
        'Geo::IP' =>                   0,
+        'IP::Country::DB_File' =>      0,
+        'IP::Country::Fast' =>         0,
        'Razor2::Client::Agent' =>     2.61,
-    'Net::Ident' =>                0,
+        'IO::Socket::IP' =>            0.09,
        'IO::Socket::INET6' =>         0,
        'IO::Socket::SSL' =>           1.76,
        'Compress::Zlib' =>            0,
        'Mail::DKIM' =>                0.37,
        'DBI' =>                       0,
-    'Getopt::Long' =>              2.32,
+        'DBD::SQLite' =>               1.59_01,
        'LWP::UserAgent' =>            0,
-    'HTTP::Date' =>                0,
-    'Archive::Tar' =>              1.23,
-    'IO::Zlib' =>                  1.04,
-    'Encode::Detect' =>            0
-  }
+        'Encode::Detect::Detector' =>  0,
+        'Net::Patricia' =>             1.16,
+        'Net::CIDR::Lite' =>           0,
+        'BSD::Resource' =>             0,
+        'Archive::Zip' =>              0,
+        'IO::String' =>                0,
+        'Email::Address::XS' =>        0,
+        'Mail::DMARC' =>               0,
+      },
+    },
+    test => {
+      recommends => {
+        'Net::DNS::Nameserver' =>      0,
+      },
+    },
+  },
 };

 #######################################################################
@ -390,7 +412,7 @@ $makefile{META_MERGE} = {
 $makefile{EXE_FILES} = [ values %{$makefile{EXE_FILES}} ];
 $makefile{AUTHOR} =~ s/(<.+) at (.+>)/$1\@$2/;
 WriteMakefile(%makefile);
-print "Makefile written by ExtUtils::MakeMaker $mm_version\n";
+print "Makefile written by ExtUtils::MakeMaker $ExtUtils::MakeMaker::VERSION\n";

 #######################################################################

@ -795,7 +817,14 @@ sub _set_macro_PERL_TAINT {
 sub _set_macro_PREPROCESS {

  return if get_macro('PREPROCESS');
-  set_macro('PREPROCESS', join(' ', macro_ref('PERL_BIN'), qq{build/preprocessor}));
+  # Bug 8038 - work around quirk of newer Extutils::MakeMaker on Windows with dmake
+  my $perl_bin = get_expanded_macro('FULLPERL');
+  if ($RUNNING_ON_WINDOWS and ($::Config{make} eq 'dmake') and ($perl_bin =~ /^([a-zA-Z]:)?\\"(.*)$/)) {
+    $perl_bin = "\"$1\\$2";
+  } else {
+    $perl_bin = macro_ref('PERL_BIN');
+  }
+  set_macro('PREPROCESS', join(' ', $perl_bin, qq{build/preprocessor}));
 }

 # This routine sets the value for CONFIGURE (spamc only).
@ -1131,6 +1160,7 @@ conf__install:
 	$(PERL) -MFile::Copy -e "copy(q[rules/v341.pre], q[$(B_CONFDIR)/v341.pre]) unless -f q[$(B_CONFDIR)/v341.pre]"
 	$(PERL) -MFile::Copy -e "copy(q[rules/v342.pre], q[$(B_CONFDIR)/v342.pre]) unless -f q[$(B_CONFDIR)/v342.pre]"
 	$(PERL) -MFile::Copy -e "copy(q[rules/v343.pre], q[$(B_CONFDIR)/v343.pre]) unless -f q[$(B_CONFDIR)/v343.pre]"
+	$(PERL) -MFile::Copy -e "copy(q[rules/v400.pre], q[$(B_CONFDIR)/v400.pre]) unless -f q[$(B_CONFDIR)/v400.pre]"

 data__install:
 	-$(MKPATH) $(B_DATADIR)
--- a/upstream/PACKAGING
+++ b/upstream/PACKAGING
@ -289,25 +289,25 @@ Resources
 ---------

 [BUGZILLA] SpamAssassin bug database:
-  <http://issues.apache.org/SpamAssassin/>
+  <https://issues.apache.org/SpamAssassin/>

 [DEBPERL] Debian Perl Policy, Chapter 3: Packaged Modules:
-  <http://www.debian.org/doc/packaging-manuals/perl-policy/ch-module_packages.html>
+  <https://www.debian.org/doc/packaging-manuals/perl-policy/ch-module_packages.html>

 [GNUMAKECMD] GNU make manual: Make Conventions: Variables for Specifying
  Commands
-  <http://www.gnu.org/manual/make-3.79.1/html_chapter/make_14.html#SEC119>
+  <https://www.gnu.org/software/make/manual/html_node/Command-Variables.html#Command-Variables>

 [MANEUMM616] The man page for ExtUtils::MakeMaker 6.16:
-  <http://search.cpan.org/author/MSCHWERN/ExtUtils-MakeMaker-6.16/lib/ExtUtils/MakeMaker.pm#Default_Makefile_Behaviour>
+  <https://search.cpan.org/author/MSCHWERN/ExtUtils-MakeMaker-6.16/lib/ExtUtils/MakeMaker.pm#Default_Makefile_Behaviour>

 [MM00779] makemaker-at-perl-dot-org: Michael G Schwern: "Re: MakeMaker
  problems with relocation" (PREFIX was broken):
-  <http://www.mail-archive.com/makemaker@perl.org/msg00779.html>
+  <https://www.mail-archive.com/makemaker@perl.org/msg00779.html>

 [P5P94113] perl5-porters: Michael G Schwern: "Re: OS X's vendorlib default
  seems wrong" (description of different repositoreis):
-  <http://archive.develooper.com/perl5-porters@perl.org/msg94113.html>
+  <https://archive.develooper.com/perl5-porters@perl.org/msg94113.html>

 [RHBUG78053] Red Hat bug 78053: "incompatible changes in behavior of
  MakeMaker; affects rpm build process" (introduction of DESTDIR):
--- a/upstream/README
+++ b/upstream/README
@ -14,7 +14,7 @@ filtering, DNS blocklists, and collaborative filtering databases.
 Apache SpamAssassin is a project of the Apache Software Foundation (ASF).


-What Apache SpamAssassin is Not
+What Apache SpamAssassin Is Not
 -------------------------------

 Apache SpamAssassin is not a program to delete spam, route spam and ham to
@ -71,9 +71,9 @@ spamassassin-users mailing list[2]. If you've found a bug (and you're
 sure it's a bug after checking the Wiki), please file a report in our
 Bugzilla[3].

-	[1]: http://wiki.apache.org/spamassassin/
-	[2]: http://wiki.apache.org/spamassassin/MailingLists
-	[3]: http://issues.apache.org/SpamAssassin/
+	[1]: https://wiki.apache.org/spamassassin/
+	[2]: https://wiki.apache.org/spamassassin/MailingLists
+	[3]: https://issues.apache.org/SpamAssassin/

 Please also be sure to read the man pages.

@ -151,7 +151,7 @@ default locations that Apache SpamAssassin will look at the end.
  - $USER_HOME/.spamassassin:

  	User state directory.  Used to hold spamassassin state, such
-	as a per-user automatic whitelist, and the user's preferences
+	as a per-user automatic welcomelist, and the user's preferences
 	file.

  - $USER_HOME/.spamassassin/user_prefs:
@ -298,7 +298,7 @@ these translations, so that they can be added to the
 distribution. Please file a bug in our Bugzilla[4], and attach your
 translations. You will, of course, be credited for this work!

-	[4]: http://issues.apache.org/SpamAssassin/
+	[4]: https://issues.apache.org/SpamAssassin/


 Disabled code
--- a/upstream/UPGRADE
+++ b/upstream/UPGRADE
@ -1,607 +1,259 @@
-Note for Users Upgrading to SpamAssassin 3.4.5
+Note for Users Upgrading to SpamAssassin 4.0.0
 ----------------------------------------------

- Spamassassin test suite can now run against the installed
-  SpamAssassin files (rather than those in the source directory)
+Apache SpamAssassin 4.0.0 represents years of work by the project with
+numerous improvements, new rule types, and internal native handling
+of messages in international languages. We highly recommend looking
+through this file and all of the .pre files to evaluate your
+configuration thoroughly. Plugins have been added, removed, and
+improved throughout.
+
+- All rules, functions, command line options and modules that contain
+  "whitelist" or "blacklist" have been renamed to contain more
+  racially neutral "welcomelist" and "blocklist" terms. This allows
+  acronyms like WL and BL to remain the same. Previous options will
+  continue work at least until version 4.1.0 is released. If you have
+  local settings including scores or meta rules referring to old rule
+  names, these should be changed and "enable_compat
+  welcomelist_blocklist" added in init.pre. See:
+  https://wiki.apache.org/spamassassin/WelcomelistBlocklist (Bug 7826)
+
+- Meta rules no longer use priority values, they are evaluated
+  dynamically when the rules they depend on are finished. (Bug 7735)
+
+- API: New $pms->rule_ready() function. Any asynchronous eval-function
+  must now return undef (instead of 0 or 1), if rule result is not
+  ready when exiting the function. $pms->rule_ready($rulename) or
+  $pms->got_hit(...) must be called when the result has arrived. If
+  these are not used, it can break depending meta rule evaluation.
+
+- Setting normalize_charset is now enabled by default. Note that rules
+  should not expect specific non-UTF8 or UTF8 encoding in
+  body. Matching is done against the raw data which may vary depending
+  on normalize_charset setting and whether decoding to UTF8 was
+  successful. See:
+  https://wiki.apache.org/spamassassin/WritingRulesAdvanced
+
+- DKIM plugin has added support for ARC signature verification
+
+- The DecodeShortURL plugin has been added and decodes URIs from URL
+  shorteners that may be used to evade scanning
+
+- Strings can now be captured from rules and later reused using the
+     special %{TAGNAME} syntax
+
+- The Bayes stopwords, or noise words, are now configurable in order
+  to optimize Bayes usage for non-English languages. Stopwords for 16
+  foreign languages have been included. See 60_bayes_stopwords.cf in
+  the rules files. See Mail::SpamAssassin::Plugin::Bayes and the
+  bayes_stopword_languages option if you wish to use a different
+  stopword list. This is highly recommended if you are using Bayes and
+  you are processing messages in languages other than English.
+
+- The OLEVBMacro plugin has been improved to identify more macros
+  while also extracting uris from the attachments for automatic
+  inclusion in RBL lookups
+
+- Internationalized domain name (IDN) support has been added and
+  requires Net::LibIDN2 or Net::LibIDN module with a new
+  Util::idn_to_ascii() function. (Bug 7215)
+
+- Improved internal header address (From/To/Cc) parser, now also
+  handles multiple addresses and includes optional support for
+  external Email::Address::XS parser, which can handle nested comments
+  and other oddities.
+
+- Header :addr :name modifiers now return all addresses. Options of
+  :first :last select only first (topmost) or last header to process
+  when there are multiple headers with the same name. :addr and :name
+  may still return multiple values from a single header.
+
+- API: $pms->get() can and should now be called in list
+  context. Scalar context continues to return multiple values newline
+  separated, but this should be considered deprecated.
+
+- New ExtractText plugin that extracts text from documents or images
+  to feed the data into SpamAssassin for standard processing with
+  existing rules, URIs extracted from documents will fall into normal
+  RBL lookups.
+
+- New "nolog" tflag added to hide info coming from rules in
+  SpamAssassin reports
+
+- All log output (stderr, file, syslog) is now escaped properly for \r
+  \n \t \\, control chars, DEL, and UTF-8 sequences presented as
+  \x{XX}.  Whitespace is not normalized anymore like in versions prior
+  to 4.0.0.
+
+- API: Logger::add() has new optional 'escape' parameter.  New
+  Logger::escape_str() function.
+
+- API: New $pms->add_uri_detail_list() function. Also new
+  uri_detail_list types: unlinked, schemeless
+
+- Util::split_domain, trim_domain, and is_domain_valid functions have
+  a new optional argument ($is_ascii)
+
+- Header names support new :host :domain :ip :revip modifiers
+
+- AskDNS: tag HEADER(hdrname) supported to query any header content
+  similarly to header rules
+
+- The HashCash module and support has been removed completely, as it
+  has been long since deprecated
+
+- URILocalBL: uri_block_cc/uri_block_cont now support negation (Bug
+  7528)
+
+- URILocalBL: IPv6 lookups for hosts is now support, if provided by
+  your database
+
+- DNS and other asynchronous lookups such as Pyzor and DCC are now
+  only launched when priority -100 is reached. This allows short
+  circuiting at a lower priority without sending unneeded DNS queries
+  and starting process forms. (Bug 5930)
+
+- API: New plugin method callback method check_dnsbl added to launch
+  network lookups at priority -100 and check_post_dnsbl to harvest own
+  network lookups
+
+- API: New plugin callback method check_cleanup for cleaning up
+  things...
+
+- FreeMail: new options freemail_import_welcomelist_auth and
+  freemail_import_def_welcomelist_auth added (Bug 6451)
+
+- New internal Mail::SpamAssassin::GeoDB module that provides a
+  unified interface to modules MaxMind::DB::Reader (GeoIP2), Geo::IP,
+  IP::Country::DB_File, and IP::Country::Fast.
+
+  This is utilized by RelayCountry and URILocalBL with settings
+  geodb_module, geodb_options, and geodb_search_path.
+
+  Deprecated settings still work such as country_db_type,
+  country_db_path, uri_country_db_path, and uri_country_db_isp_path
+  but will print a warning to migrate to geodb_module/options.
+
+- Razor2 razor_fork option added to create separate Razor2 processes
+  and read in the results later asynchronously, increasing throughput,
+  and automatically adjusting rule priorities to -100.
+
+- DCC checks are now done asynchronously if using dccifd, improving
+  throughput.  With dccifd, rule priorities are automatically adjusted
+  to -100.  Commercial reputation rules can be ignored with the option
+  "use_dcc_rep 0" to save a few CPU cycles.
+
+- Pyzor pyzor_fork option added to create separate Pyzor processes and
+  read in the results later asynchronously, increasing throughput, and
+  automatically adjusting rule priorities to -100. Renamed pyzor_max
+  setting to pyzor_count_min. Added pyzor_welcomelist_min and
+  pyzor_welcomelist_factor setting. Also try to improve false
+  positives by ignoring "empty body" messages.
+
+- API: deprecated $pms->register_async_rule_start() and
+  $pms->register_async_rule_finish() calls though left in for
+  backwards compatibility. Plugins should only use
+  $pms->bgsend_and_start_lookup(), which handles required things
+  Automatically. Direct calls to bgsend or start_lookup should not be
+  used.  $pms->bgsend_and_start_lookup() should always contain
+  $ent->{rulename} for correct meta dependency handling. Deprecated
+  start_lookup, get_lookup, lookup_ns, harvest_until_rule_completes,
+  and is_rule_complete.
+
+- SPF: Mail::SPF is now the only supported perl module and
+  Mail::SPF::Query is deprecated along with the settings
+  do_not_use_mail_spf, and do_not_use_mail_spf_query. SPF lookups are
+  not done asynchronously so using an MTA filter such as pypolicyd-spf
+  or spf-engine can generate Received-SPF for SpamAssassin to parse.

- unwhitelist_auth now also removes def_whitelist_auth entries
+- "ALL" pseudo-header now returns decoded headers, so it's usage is
+  consistent with single header matching. Using the :raw option mimics
+  the previous behavior of with undecoded and folded headers.

- SPF: add unwhitelist_from_spf to remove both whitelist_from_spf and
-  def_whitelist_from_spf entries
+- New dns_block_rule option handles blocked DNSBLs (Bug 6728)

- Default SQL schema for userpref.value changed from varchar(100) to
-  varchar(255), no need to modify unless you hit the limit. (Bug 7803)
+- ASN: Support GeoDB for ASN lookups (asn_use_geodb, asn_prefer_geodb,
+  asn_use_dns).

- URIDetail can now match full hostname with "host" key
+- ASN: Default sa-update ruleset doesn't make ASN lookups or add
+  headers anymore. Configure desired methods, asn_use_geodb or
+  asn_use_dns, and add_header clauses manually as described in the
+  plugin documentation. Usage of asn_use_geodb without DNS is
+  recommended unless ASNCIDR is needed. Do not use rules that check
+  metadata X-ASN header! Only the new eval function check_asn()
+  described in plugin manual works reliably.

- BodyEval: plaintext_body_sig_ratio: eval rules for the (first text/plain
-  MIME part's) body and signature lengths and ratio
-
-Note for Users Upgrading to SpamAssassin 3.4.4
----------------------------------------------
-
- FromNameSpoof: fns_extrachars parameter default value has been increased to 50
-
- nosubject and maxhits tflags now work correctly with sa-compile
-
-Note for Users Upgrading to SpamAssassin 3.4.3
----------------------------------------------
-
- New subjprefix keyword added, this can be used to add a prefix to
-  email Subject if the original email matches a particular rule
-
- New Util::is_fqdn_valid() function to validate hostname (DNS name) format
-  (Bug 7736).  To check if a name contains valid TLD, it's still needed to
-  additionally use RegistryBoundaries::is_domain_valid()
-
- New OLEVBMacro plugin to detect OLE Macro inside documents attached to emails,
-  this plugin requires Archive::Zip and IO::String Perl modules to work.
-
- Due to the dangerous nature of sa-update --allowplugins option, it
-  now prints a warning that --reallyallowplugins is required to use it.
-  This is to make sure all the legacy installations and wiki guides etc
-  still using it needlessly get fixed.
-
- TxRep and Awl plugins has been modified to be compatible 
-  with latest Postgresql versions.
-  You should upgrade your sql database running the following command:
-  MySQL:
-  "ALTER TABLE `txrep` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
-  "ALTER TABLE `awl` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';"
-  "ALTER TABLE `awl` ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;"
-  PostgreSQL:
-  "ALTER TABLE txrep RENAME COLUMN count TO msgcount;"
-  "ALTER TABLE awl RENAME COLUMN count TO msgcount;"
-  "ALTER TABLE awl ADD last_hit timestamp NOT NULL default CURRENT_TIMESTAMP;"
-
- body_part_scan_size 50000, rawbody_part_scan_size 500000 defaults added (Bug 6582)
-  These enable safer and faster scanning of large emails.
-
- ALL pseudo-header now returns decoded headers, so it's usage is consistent
-  with single header matching (:raw returns undecoded and folded like before).
-
- RegistryBoundaries did not load 20_aux_tlds.cf properly in older versions. 
-  Old hardcoded list is now removed and RB will print "no tlds defined, need
-  to run sa-update" unless it can find list from config files.
-
- Deprecated functions: Parser::is_delimited_regexp_valid(),
-  Parser::is_regexp_valid(), Util::regexp_remove_delimiters(),
-  Util::make_qr().  These all are combined into new Util::compile_regexp().
-
- DNSEval: add check_rbl_headers to check specific headers in rbl
-
- DNSEval: add check_rbl_ns_from to check against an rbl for dns servers
-
- HashBL: Add check_hashbl_bodyre, check_hashbl_emails, check_hashbl_uris,
-  hashbl_ignore
-
- ASN: Support IPv6 with asn_lookup_ipv6 (Bug 7211)
-
- sa-update: New option --httputil to force used download utility
-
- Add rules_matching() expression to meta rules
-
- Add tflags domains_only/ips_only to DNSEval.pm functions
-
- RelayCountry: Added new metadata: X-Spam-Countries-External (_RELAYCOUNTRYEXT_),
-  X-Spam-Countries-Auth (_RELAYCOUNTRYAUTH_), X-Spam-Countries-All (_RELAYCOUNTRYALL_)
-
- New tflag "nosubject" for 'body' rules, to stop matching the Subject
-  header which is part of the body text.
-
-Note for Users Upgrading to SpamAssassin 3.4.2
----------------------------------------------
-
- We now support SHA-512 and SHA-256 signatures for our rules updates.
-
- We may stop producing SHA-1 signatures in the near future so upgrading
-to 3.4.2 is important.  sa-update no longer uses these signatures.
-
-See https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7614
-
- freemail_import_whitelist_auth, freemail_import_def_whitelist_auth added (Bug 6451)
-
-
-New plugins
-----------
-
-There are four new plugins added with this release:
-
-  Mail::SpamAssassin::Plugin::HashBL
-
-The HashBL plugin is the interface to The Email Blocklist (EBL).
-The EBL is intended to filter spam that is sent from IP addresses 
-and domains that cannot be blocked without causing significant 
-numbers of false positives.
-
-  Mail::SpamAssassin::Plugin::ResourceLimits
-
-This plugin leverages BSD::Resource to assure your spamd child processes
-do not exceed specified CPU or memory limit. If this happens, the child
-process will die. See the BSD::Resource for more details.
-
-  Mail::SpamAssassin::Plugin::FromNameSpoof
-
-This plugin allows for detection of the From:name field being used to mislead 
-recipients into thinking an email is from another address.  The man page 
-includes examples and we expect to put test rules for this plugin into 
-rulesrc soon!
-
-  Mail::SpamAssassin::Plugin::Phishing
-
-This plugin finds uris used in phishing campaigns detected by
-OpenPhish (https://openphish.com) or PhishTank (https://phishtank.com) feeds.
-
-These plugins are disabled by default. To enable, uncomment
-the loadplugin configuration options in file v342.pre, or add it to
-some local .pre file such as local.pre .
-
-Notable changes
---------------
-
-For security reasons SSLv3 support has been removed from spamc(1).
-
-GeoIP2 support has been added to RelayCountry and URILocalBL plugins due
-to GeoIP legacy api deprecations.
-
-New configuration options
-------------------------
-
-A new template tag _DKIMSELECTOR_ that maps to the DKIM selector (the 's' tag) 
-from valid signatures has been added.
-
-A 'uri_block_cont' option to URILocalBL plugin to score uris per continent has been added.
-Possible continent codes are:
-af, as, eu, na, oc, sa for Africa, Asia, Europe, North America, 
-Oceania and South America.
-
-The 'country_db_type' and 'country_db_path' options has been added to be able 
-to choose in RelayCountry plugin between GeoIP legacy 
-(discontinued from 04/01/2018), GeoIP2, IP::Country::Fast 
-and IP::Country::DB_File.
-GeoIP legacy is still the default option but it will be deprecated 
-in future releases.
-
-A config option 'uri_country_db_path' has been added to be able to choose 
-in URILocalBL plugin between GeoIP legacy and new GeoIP2 api.
-
-A config option 'resource_limit_cpu' (default: 0 or no limit) has been added
-to configure how many cpu cycles are allowed on a child process before it dies.
-
-A config option 'resource_limit_mem' (default: 0 or no limit) has been added
-to configure the maximum number of bytes of memory allowed both for 
-(virtual) address space bytes and resident set size.
-
-A new config option 'report_wrap_width' (default: 70) has been added
-to set the wrap width for description lines in the X-Spam-Report header.
-
-Notable Internal changes
------------------------
-
-SpamAssassin can cope with new Net::DNS module versions.
-
-The "bytes" pragma has been remove from both core modules and plugins for
-better utf-8 compatibility, there has been also some other utf-8 related fixes.
-
-The spamc(1) client can now be build against OpenSSL 1.1.0.
-
-The test framework has been switched to Test::More module.
-
-Other updates
-------------
-
-A list of top-level domains in registrar boundaries was updated.
-
-
-Note for Users Upgrading to SpamAssassin 3.4.1
----------------------------------------------
-
- The TxRep plugin is now included and disabled by default for new installs.
-  To replace an existing AWL configuration with TxRep, follow the steps below:
-  - Disable AWL
-  - Enable TxRep
-  - Set txrep_factor equal to your previous AWL factor
-  - Set use_txrep to 1
-
-  For more detailed information and more configuration options, consult the
-  documentation in Mail::SpamAssassin::Plugin::TxRep.
-
- The $VALID_TLDS_RE global in registrar boundaries is deprecated but kept for 
-  third-party plugin compatibility.  It will become increasingly out of date
-  and may be removed in a future release. 
-
-  See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
-  to obtain a valid list of TLDs, i.e. 
-
-    $self->{main}->{registryboundaries}->{valid_tlds_re} 
-
- Mail::SpamAssassin::Util::RegistrarBoundaries is being replaced by 
-  Mail::SpamAssassin::RegistryBoundaries so that new TLDs can be updated via 
-  20_aux_tlds.cf delivered via sa-update.
-
-  ***3rd Party Plugin Authors, Please Note***
-  
-  The following functions will be removed in the next release after 3.4.1 
-  excepting any emergency break/fix releases immediately after 3.4.1:
-  
-  Mail::SpamAssassin::Util::RegistrarBoundaries::is_domain_valid 
-  Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain
-  Mail::SpamAssassin::Util::RegistrarBoundaries::split_domain 
-  Mail::SpamAssassin::Util::uri_to_domain 
-  
-  And the following variables will also be removed:
-  
-  Mail::SpamAssassin::Util::RegistrarBoundaries::US_STATES
-  Mail::SpamAssassin::Util::RegistrarBoundaries::THREE_LEVEL_DOMAINS
-  Mail::SpamAssassin::Util::RegistrarBoundaries::TWO_LEVEL_DOMAINS
-  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS_RE
-  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS
-  
-  
-  This change should only affect 3rd party plugin authors who will need to 
-  update their code to utilize Mail::SpamAssassin::RegistryBoundaries 
-  instead of the functions and variables in 
-  Mail::SpamAssassin::Util::RegistrarBoundaries and the function 
-  Mail::SpamAssassin::Util::uri_to_domain which are deprecated and will be 
-  removed.
-
-  For example, the $VALID_TLDS_RE global in registrar boundaries is 
-  deprecated but kept for third-party plugin compatibility.  It will become 
-  increasingly out of date and may be removed in a future release.
-
-  See lib/Mail/SpamAssassin/Plugin/FreeMail.pm for an example of the new way
-  to obtain a valid list of TLDs, i.e.
-
-    $self->{main}->{registryboundaries}->{valid_tlds_re}
-
- It is now recommended that users uncomment "normalize_charset 1" in
-  local.cf. It will break rules that depend on messages being in non-UTF8
-  encodings, but going forward this will enable more robust unicode rules and
-  will eventually become the default.
-
-
-
-Note for Users Upgrading to SpamAssassin 3.4.0
----------------------------------------------
-
- When Bayes classification is in use and messages are 'learned' as spam
-  or ham and stored in a database, the Bayes plugin generates internal
-  message IDs of learned messages and stores them in a 'seen' database to
-  avoid re-learning duplicates and accidental un-learning messages that
-  were not previously learned. With changes in bug 5185, the calculation
-  of message IDs in a bayes 'seen' database has changed, so new code can
-  no longer associate new messages with those learned before the change.
-
- Note that this change does not affect recognition of old tokens and the
-  classification algorithm, only duplicate detection and unlearning of old
-  messages is affected.
-
- Because of this change, if you use Bayes and you are upgrading from a
-  version prior to 3.4.0, you may consider wiping your Bayes database
-  and starting fresh.
-
- There is a new optional dependency on Net::Patricia to speed up lookups
-  on internal_networks, trusted_networks or msa_networks when these lists
-  contain a larger number of entries. Consider installing this module to
-  speed up classification.
-
- The minimal required version of NetAddr::IP was bumped to 4.010
-
- In addition to existing backends, the 3.4.0 introduces support for keeping
-  a Bayes database on a Redis server, either running locally, or accessed
-  over network. Similar to SQL backends, the database may be concurrently
-  used by several hosts running SpamAssassin.
-
- For more detail on these and other changes, please see the Announcement
-  file at:
-   http://svn.apache.org/repos/asf/spamassassin/branches/3.4/build/announcements/3.4.0.txt
-
-Note for Users Upgrading to SpamAssassin 3.3.0
-----------------------------------------------
-
- Rules are no longer included with SpamAssassin "out of the box".  You will
-  need to immediately run "sa-update", or download the additional rules .tgz
-  package and run "sa-update --install" with it, to get a ruleset.
-
- The BETA label has been taken off of the SpamAssassin SQL support.  Please
-  be aware that occasional changes may still be made to this area of the
-  code.  You should be sure to read this upgrade document each time you
-  install a new version to determine if any SQL updates need to be made to
-  your local installation.
-
- The DKIM plugin is now enabled by default for new installs, if the perl
-  module Mail::DKIM is installed.  However, installation of SpamAssassin
-  will not overwrite existing .pre configuration files, so to use DKIM when
-  upgrading from a previous release that did not use DKIM, a directive:
-
-    loadplugin Mail::SpamAssassin::Plugin::DKIM
-
-  will need to be uncommented in file "v312.pre", or added to some
-  other .pre file, such as local.pre.
-
-
-Note for Users Upgrading to SpamAssassin 3.2.0
-----------------------------------------------
-
- The "127/8" network, including 127.0.0.1, is now always implicitly part of
-  "trusted_networks" and "internal_networks".  It's impossible to remove these
-  from the trusted/internal sets, since if you cannot trust the host where
-  SpamAssassin is running, you cannot trust SpamAssassin itself.  If you
-  previously had "trusted_networks" and "internal_networks" lines listing those
-  hosts, you should now remove them, otherwise a minor (non-lint-error) warning
-  will be output.
-
- For ISPs -- version 3.2.0 now includes a new way to specify Mail Submission
-  Agents, relay hosts which accept mail from your own users and authenticates
-  them appropriately.  See the Mail::SpamAssassin::Conf manual page for the
-  "msa_networks" setting.
-
-
-Note for Users Upgrading to SpamAssassin 3.1.0
-----------------------------------------------
-
- A significant amount of core functionality has been moved into
-  plugins.  These include, AWL (auto-whitelist), DCC, Pyzor, Razor2,
-  SpamCop reporting and TextCat.  For information on configuring these
-  plugins please refer to their individual documentation:
-  perldoc Mail::SpamAssassin::Plugin::* (ie AWL, DCC, etc)
-
- There are now multiple files read to enable plugins in the
-  /etc/mail/spamassassin directory; previously only one, "init.pre" was
-  read.  Now both "init.pre", "v310.pre", and any other files ending
-  in ".pre" will be read.  As future releases are made, new plugins
-  will be added to new files named according to the release they're
-  added in.
-
- Due to license restrictions the DCC and Razor2 plugins are disabled
-  by default.  We encourage you to read the appropriate license
-  yourself and decide if you are able to re-enable the plugins for
-  your site.
-
- The use_auto_whitelist config option has been moved to a user config
-  option, this allows individual users to turn on or off whitelisting
-  regardless of what is set in the system config.  If you would like to
-  disable AWL (auto-whitelist) on a site-wide basis, then you can comment
-  out the plugin in "v310.pre".
-
- The bayes_auto_learn_threshold_* config options for bayes have moved
-  to a plugin.  In general the default should work just fine however
-  if you are interested in changing these values you should see:
-  perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
-
- The AWL support for NDBM_File databases has been dropped, due to a
-  bug in that package which renders it useless (bug 4353).  It appears
-  that SDBM_File, the package which will be used instead, is fully
-  compatible with NDBM however, so this should be unnoticeable.
-
- The prefork algorithm for spamd has been changed.  In this version spamd
-  will attempt to keep a small number of "hot" child processes as busy as
-  possible, and keep any others as idle as possible, using something
-  similar to the Apache httpd server scaling algorithm. This reduces
-  memory usage and swapping. You can use the --round-robin switch for
-  spamd to disable this scaling algorithm, and the behaviour seen in the
-  3.0.x versions will be used instead, where all processes receive an
-  equal load and no scaling takes place.
-
- As of 3.1.0, in addition to the generic BayesSQL support (via
-  Mail::SpamAssassin::BayesStore::SQL) usable by multiple database
-  drivers there is now specific support for MySQL 4.1+ and
-  PostgreSQL.  This support is based on non-standard features present
-  in both database servers that allow for various performance boosts.
-
-  If you were using the previous BayesSQL support with MySQL, and
-  already have MySQL 4.1+ installed you can begin using the new module
-  immediately by replacing the bayes_store_module line in your
-  configuration with:  Mail::SpamAssassin::BayesStore::MySQL
-
-  We do however recommend that you switch your MySQL tables over to
-  InnoDB for better data integrity and multi user support.  You can
-  most often do this via a simple ALTER TABLE command, refer to the
-  MySQL documentation for more information.
-
-  If you were previously using PostgreSQL for your bayes database then
-  we STRONGLY recommend switching to the PostgreSQL specific backend:
-  Mail::SpamAssassin::BayesStore::PgSQL
-  To switch to this backend you should first run sa-learn --backup to
-  make a backup of your existing data and then drop and recreate the
-  database following the instructions in sql/README.bayes.  Then you
-  can restore the database with sa-learn --restore.  If you have
-  multiple users then you will have to run --backup and --restore for
-  each user to fully restore the database.
-
- See http://wiki.apache.org/spamassassin/UpgradeTo310 for a
-  supplementary list of upgrade notes.  It will be updated with any
-  upgrade notes not captured in this document.
-
- Finally, this document is likely not complete.  Other configuration
-  options/arguments may have changed from older versions, etc.  It would
-  be good to double-check any custom configuration options to make sure
-  they're still valid.  This could be as simple as running "spamassassin
-  --lint", or more complex, as required by the environment.
-
-
-Note for Users Upgrading to SpamAssassin 3.0.x
----------------------------------------------
-
- The SpamAssassin 2.6x release series was the last set of releases to
-  officially support perl versions earlier than perl 5.6.1.  If you are
-  using an earlier version of perl, you will need to upgrade before you
-  can use the 3.0.0 version of SpamAssassin.  You will also want to make
-  sure that you have the appropriate versions of required and optional
-  modules as they may have changed from old versions.  The INSTALL
-  document has the modules and version requirements listed.
-
- See http://wiki.apache.org/spamassassin/UpgradeTo300 for a
-  supplementary list of upgrade notes.  It will be updated with any
-  upgrade notes not captured in this document.
-
- SpamAssassin 3.0.0 has a significantly different API (Application Program
-  Interface) from the 2.x series of code.  This means that if you use
-  SpamAssassin through a third-party utility (milter, etc,) you need to make
-  sure you have an updated version which supports 3.0.0.  See
-  http://wiki.apache.org/spamassassin/UpgradeTo300 for information about
-  third-party software.
-
- The --auto-whitelist, --whitelist and -a options for "spamd" and
-  "spamassassin" to turn on the auto-whitelist have been removed and
-  replaced by the "use_auto_whitelist" configuration option which is
-  also now turned on by default.
-
- The --virtual-config switch for spamd had to be dropped, due to licensing
-  issues.  It is replaced by the --virtual-config-dir switch.
-
- The "rewrite_subject" and "subject_tag" configuration options were
-  deprecated and are now removed. Instead, using "rewrite_header Subject
-  [your desired setting]".  e.g.
-
-    rewrite_subject 1
-    subject_tag ****SPAM(_SCORE_)****
-
-  becomes
-
-    rewrite_header Subject ****SPAM(_SCORE_)****
-
- The "sa-learn --rebuild" command has been deprecated; please use
-  "sa-learn --sync" instead.  The --rebuild option will remain temporarily
-  for backward compatibility.
-
- The Bayesian storage modules have been completely re-written and now
-  include Berkeley DB (DBM) storage as well as SQL based storage (see
-  sql/README.bayes for more information).  In addition, a new format
-  has been introduced for the bayes database that stores tokens in fixed
-  length hashes (Bayes v3).  All DBM databases should be automatically
-  converted to this new format the first time they are opened for write.
-  You can manually perform the upgrade by running "sa-learn --sync"
-  from the command line.
-
-  Due to the database format change, you will want to do something like
-  this when upgrading:
-
-  - stop running spamassassin/spamd (ie: you don't want it to be running
-    during the upgrade)
-  - run "sa-learn --rebuild", this will sync your journal.  if you skip
-    this step, any data from the journal will be lost when the DB is
-    upgraded.
-  - upgrade SA to 3.0.0
-  - run "sa-learn --sync", which will cause the db format to be upgraded.
-    if you want to see what is going on, you can add the "-D" option.
-  - test the new database by running some sample mails through
-    SpamAssassin, and/or at least running "sa-learn --dump" to make sure
-    the data looks valid.
-  - start running spamassassin/spamd again
-
-  If, instead of uprading your Bayes database, you want to wipe it and
-  start fresh, you can run "sa-learn --clear" to safely remove your
-  Bayes database files.  If the --clear command issues an error then
-  you can simply delete the Bayes database files ("bayes_*") while SA
-  is not running; SpamAssassin will recreate them in the current
-  format when it runs.
-
- "spamd" now has a default max-children setting of 5; no more than 5
-  child scanner processes will be run in parallel.  Previously, there was
-  no default limit unless you specified the "-m" switch when starting
-  spamd.
-
- If you are using a UNIX machine with all database files on local disks,
-  and no sharing of those databases across NFS filesystems, you can use a
-  more efficient, but non-NFS-safe, locking mechanism.   Do this by adding
-  the line "lock_method flock" to the /etc/mail/spamassassin/local.cf
-  file. This is strongly recommended if you're not using NFS, as it is
-  much faster than the NFS-safe locker.
-
- Please note that the use of the following commandline parameters for
-  spamassassin and spamd have been deprecated and may be removed in
-  upcoming versions of SpamAssassin.  Please discontinue usage of these
-  options:
-
-    in the 2.6x series:		--add-from, --pipe, -F, --stop-at-threshold,
-    				-S, -P (spamassassin only)
-    in the 3.0.x series:	--auto-whitelist, -a, --whitelist-factory, -M,
-    				--warning-from, -w, --log-to-mbox, -l
-
- user_scores_sql_table is no longer supported.  If you need to use a table
-  name, other than the default, create a custom query using the
-  user_scores_sql_custom_query config option.
-
- SpamAssassin runs in "taint mode" by default for improved security.
-  Certain third-party modules may be incompatible with taint mode.
-
- 2.6x deprecated the use of the "check_bayes_db" script, and it
-  has been removed in 3.0.0.  Please see the sa-learn man/pod
-  documentation for more info.
-
- Finally, this document is likely not complete.  Other configuration
-  options/arguments may have changed from older versions, etc.  It would
-  be good to double-check any custom configuration options to make sure
-  they're still valid.  This could be as simple as running "spamassassin
-  --lint", or more complex, as required by the environment.
-
-  An example: "require_version <version>" hasn't changed itself, but the
-  internal version representation is now "x.yyyzzz" instead of "x.yz"
-  which could cause issues if "require_version 3.00" is expected to work
-  (it won't, it needs to be "require_version 3.000000").
-
-
-Note for Users Upgrading from SpamAssassin 2.5x
-----------------------------------------------
-
- Due to major reliability shortcomings in the database support libraries
-  other than DB_File, we now require that the DB_File module be installed
-  to use SpamAssassin's Bayes rules.
-
-  SpamAssassin will still work without DB_File installed, but the Bayes
-  support will be disabled.
-
-  If you install DB_File and wish to import old Bayes database data, each
-  user with a Bayes db should run "sa-learn --import" to copy old entries
-  from the other formats into a new DB_File file.
-
-  Due to the database library change, and the change to the database
-  format itself, you will want to do something like this when upgrading:
-
-  - stop running spamassassin/spamd (ie: you don't want it to be running
-    during the upgrade)
-  - run "sa-learn --rebuild", this will sync your journal.  if you skip
-    this step, any data from the journal will be lost when the DB is
-    upgraded.
-  - install DB_File module if necessary
-  - upgrade SA to 3.0.0
-  - if you were using another database module previously, run "sa-learn
-    --import" to migrate the data into new DB_File files
-  - run "sa-learn --sync", which will cause the db format to be upgraded.
-    if you want to see what is going on, you can add the "-D" option.
-  - test the new database by running some sample mails through
-    SpamAssassin, and/or at least running "sa-learn --dump" to make sure
-    the data looks valid.
-  - start running spamassassin/spamd again
-
-  Obviously the steps will be different depending on your environment, but
-  you get the idea. :)
-
-
-Note For Users Upgrading From SpamAssassin 2.3x or 2.4x
-------------------------------------------------------
-
- SpamAssassin no longer includes code to handle local mail delivery, as
-  it was not reliable enough, compared to procmail.  So now, if you relied
-  on spamassassin to write the mail into your mail folder, you'll have to
-  change your setup to use procmail as detailed below.  If you used
-  spamassassin to filter your mail and then something else wrote it into a
-  folder for you, then you should be fine.
-
- Support for versions of the optional Mail::Audit module is no longer
-  included.
-
- The default mode of tagging (which used to be ***SPAM*** in the subject
-  line) no longer takes place.  Instead the message is rewritten. If an
-  incoming message is tagged as spam, instead of modifying the original
-  message, SpamAssassin will create a new report message and attach the
-  original message as a message/rfc822 MIME part (ensuring the original
-  message is completely preserved and easier to recover).  If you do not
-  want to modify the body of incoming spam, use the "report_safe" option.
-  The "report_header" and "defang_mime" options have been removed as a
-  result.
+- sa-update: New --score-multiplier, --score-limit, and --forcemirror
+  options added.
+    #1 forcemirror: forces sa-update to use a specific mirror server,
+    #2 score-multiplier: adjust all scores from update channel by a
+     given multiplier to quickly level set scores to match your
+     preferred threshold
+    #3 score-limit adjusts all scores from update channel over a
+     specified limit to a new limit
+
+- New dns_options "nov4" and "nov6" added.  IMPORTANT:; You must set
+  nov6 if your DNS resolver is filtering IPv6 AAAA replies.
+
+- API: Added Message::get_pristine_body_digest(),
+  Message::get_msgid(), and Message::generate_msgid()
+  functions. Removed deprecated private Plugin::Bayes::get_msgid()
+  function.
+
+- Bayes and TxRep seen Message-ID tracking hashing method changed.  No
+  actions are required. If re-learning some old messages, they might
+  be learned twice but old IDs should expire automatically.
+
+- report_charset defaults now to UTF-8.
+
+- Meta rules inherit net tflag setting from dependencies (Bug 7735)
+
+- BodyEval: Added plaintext_body_sig_ratio eval rules for the first
+  text/plain MIME part's body and signature length ratio.
+
+- API: Now supports multiple calls of $pms->test_log() for
+  rules. Added $pms->check_cleanup() to finalize tags, reports,
+  etc. Deprecated internal $pms->{test_log_msgs}, renamed to
+  $pms->{test_logs}. Deprecated $pms->clear_test_state() as it is not
+  needed anymore. $pms->test_log() now accepts $rulename as second
+  argument.
+
+- URIDNSBL: urirhsbl/urirhssub rules support "notrim" tflag to force
+  querying the full hostname instead of just the domain. This works
+  best if the specific uribl supports this mode. (Bug 7835)
+
+- Removed deprecated --auth-ident and --ident-timeout options from
+  spamd
+
+- MIMEHeader: support matching ALL header, tflags range, and tflags
+  concat
+
+- Autolearn: add new tflags autolearn_header/autolearn_body. These can
+  force a rule to count as header or body points accordingly. (Bug
+  7907)
+
+- SSL client certificate support for spamc/spamd is now easier. New
+  spamc options --ssl-cert, --ssl-key, --ssl-ca-file, and
+  --ssl-ca-path. New spamd options --ssl-verify, --ssl-ca-file, and
+  --ssl-ca-path (Bug 7267)
+
+- ArchiveIterator now automatically uncompressed all gzip, bzip2, xz,
+  lz4, lzip, and lzo-compressed files (Bug 7598). These apply to
+  spamassassin and sa-learn commands also.
+
+- New DMARC policy check plugin.
+
+- New project maintained DecodeShortURLs plugin which may not be
+  directly compatible with rules from other third party plugins. See
+  The plugin documentation for configuration and rule format.
+
+- Installing module Net::CIDR::Lite allows the use of dash-separated
+  IP range format (e.g. 192.168.1.1-192.168.255.255) for NetSet tables
+  including internal_networks, trusted_networks, msa_networks, and
+  uri_local_cidr.
+
+- The HashBL plugin in 342.pre is now enabled by default.
+
+- HeaderEval check_for_unique_subject_id() function is deprecated.

 (end of UPGRADE)
-
-//vim:tw=74:
--- a/upstream/USAGE
+++ b/upstream/USAGE
@ -84,7 +84,7 @@ If you use procmail, or haven't decided on any of the above examples:
 If you want to use SpamAssassin site-wide:

  - take a look at the notes on the Wiki website, currently at
-    <http://wiki.apache.org/spamassassin/UsingSiteWide>.  You will probably
+    <https://wiki.apache.org/spamassassin/UsingSiteWide>.  You will probably
    want to use 'spamd' (see below).   You may want to investigate the
    new Apache mod_perl module, in the 'spamd-apache2' directory, too.

@ -101,11 +101,11 @@ If you want to use SpamAssassin site-wide:
    add the line 'DROPPRIVS=yes' at the top of the file.


-The Auto-Whitelist
+The Auto-Welcomelist
 ------------------

-The auto-whitelist is enabled using the 'use_auto_whitelist' option.
-(See http://wiki.apache.org/spamassassin/AutoWhitelist for details on
+The auto-welcomelist is enabled using the 'use_auto_welcomelist' option.
+(See https://wiki.apache.org/spamassassin/AutoWelcomelist for details on
 how it works, if you're curious.)


@ -113,15 +113,6 @@ Other Installation Notes
 ------------------------

  
-  - Hashcash is a useful system; it requires that senders exercise a
-    CPU-intensive task before they can send mail to you, so we give that
-    some bonus points.  However, it requires that you list what addresses
-    you expect to receive mail for, by adding 'hashcash_accept' lines to
-    your ~/.spamassassin/user_prefs or /etc/mail/spamassassin/local.cf
-    files.  See the Mail::SpamAssassin::Plugin::Hashcash manual page for
-    details on how to specify these.
-
-
  - SpamAssassin now uses a temporary file in /tmp (or $TMPDIR, if that's
    set in the environment) for Pyzor and DCC checks.  Make sure that this
    directory is either (a) not writable by other users, or (b) not shared
@ -161,7 +152,7 @@ Other Installation Notes
  - A very handy new feature is SPF support, which allows you to check
    that the message sender is permitted by their domain to send from the
    IP address used.  This has the potential to greatly cut down on mail
-    forgery.  (see http://spf.pobox.com/ for more details.)
+    forgery.


  - MDaemon users should add this line to their "local.cf" file:
@ -219,7 +210,7 @@ Other Installation Notes
    to do this, take a look here [1] for a simple forwarding-based
    alternative.

-      [1]: http://wiki.apache.org/spamassassin/SpamTrapping
+      [1]: https://wiki.apache.org/spamassassin/SpamTrapping


  - Scores and other user preferences can now be loaded from, and Bayes
@ -242,7 +233,7 @@ Other Installation Notes


  - Lots more ways to integrate SpamAssassin can be read at
-    http://wiki.SpamAssassin.org/ .
+    https://wiki.apache.org/spamassassin/ .


 (end of USAGE)
--- a/upstream/build/mkrules
+++ b/upstream/build/mkrules
@ -239,6 +239,7 @@ sub lint_rule_text {
  my $pretext = q{
    loadplugin Mail::SpamAssassin::Plugin::Check
    loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
+    util_rb_tld com # skip "need to run sa-update" warn
    use_bayes 0
  };

--- a/upstream/build/sha256sum.pl
+++ b/upstream/build/sha256sum.pl
@ -1,7 +1,7 @@
 #!/usr/bin/perl

 BEGIN {
-  require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
+  require Digest::SHA; Digest::SHA->import(qw(sha256_hex sha512_hex));
 }

 $/=undef;
--- a/upstream/build/sha512sum.pl
+++ b/upstream/build/sha512sum.pl
@ -1,7 +1,7 @@
 #!/usr/bin/perl

 BEGIN {
-  require Digest::SHA; import Digest::SHA qw(sha256_hex sha512_hex);
+  require Digest::SHA; Digest::SHA->import(qw(sha256_hex sha512_hex));
 }

 $/=undef;
--- a/upstream/ldap/README
+++ b/upstream/ldap/README
@ -6,9 +6,9 @@ SpamAssassin can now load users' score files from an LDAP server.  The concept
 here is to have a web application (PHP/perl/ASP/etc.) that will allow users to
 be able to update their local preferences on how SpamAssassin will filter their
 e-mail.  The most common use for a system like this would be for users to be
-able to update the white list of addresses (whitelist_from) without the need
-for them to update their $HOME/.spamassassin/user_prefs file.  It is also quite
-common for users listed in /etc/passwd to not have a home directory,
+able to update the white list of addresses (welcomelist_from, previously whitelist_from) 
+without the need for them to update their $HOME/.spamassassin/user_prefs file.  
+It is also quite common for users listed in /etc/passwd to not have a home directory,
 therefore, the only way to have their own local settings would be through a
 database or LDAP server.

@ -38,8 +38,8 @@ Examples:

 If the user_scores_dsn option does not exist, SpamAssassin will not attempt
 to use an LDAP server for retrieving users' preferences. Note that this will
-NOT look for test rules, only local scores, whitelist_from(s), and
-required_score.
+NOT look for test rules, only local scores, welcomelist_from(s) (previously whitelist_from), 
+and required_score.

 Requirements
 ------------
@ -68,7 +68,7 @@ our inetOrgPerson subclass.
 Here's an example for openldap's /etc/openldap/schema/inetorgperson.schema :

  # SpamAssassin
-  # see http://SpamAssassin.org/ .
+  # see https://SpamAssassin.org/ .
  attributetype ( 2.16.840.1.113730.3.1.217
          NAME 'spamassassin'
          DESC 'SpamAssassin user preferences settings'
@ -97,9 +97,6 @@ If you do not see the above text, then the LDAP query was not successful, and
 you should see any error messages reported. <username> should be the user
 that was passed to spamd and is usually the user executing spamc.

-If you need to set up LDAP, a good guide is here:
-http://yolinux.com/TUTORIALS/LinuxTutorialLDAP.html
-
 To test LDAP support using the SpamAssassin test suite, you need to
 perform a little bit of manual configuration first.  See the file
 "ldap/README.testing" for details.
@ -111,6 +108,6 @@ operation of LDAP support may change at any time with future releases of SA.
 ******

 Please send any comments to <kris at koehntopp.de> and file bugs via
-<http://issues.apache.org/SpamAssassin/>.
+<https://issues.apache.org/SpamAssassin/>.

 Kristian Köhntopp
--- a/upstream/lib/Mail/SpamAssassin.pm
+++ b/upstream/lib/Mail/SpamAssassin.pm
@ -40,7 +40,7 @@ Mail::SpamAssassin - Spam detector and markup engine
 =head1 DESCRIPTION

 Mail::SpamAssassin is a module to identify spam using several methods
-including text analysis, internet-based realtime blacklists, statistical
+including text analysis, internet-based realtime blocklists, statistical
 analysis, and internet-based hashing algorithms.

 Using its rule base, it uses a wide range of heuristic tests on mail
@ -64,7 +64,7 @@ use warnings;
 # use bytes;
 use re 'taint';

-require 5.006_001;
+require v5.14.0;

 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Constants;
@ -87,11 +87,10 @@ use Time::HiRes qw(time);
 use Cwd;
 use Config;

-our $VERSION = "3.004006";      # update after release (same format as perl $])
+our $VERSION = "4.000000";      # update after release (same format as perl $])
 #our $IS_DEVEL_BUILD = 1;        # 1 for devel build 
 our $IS_DEVEL_BUILD = 0;        # 0 for release versions including rc & pre releases

-
 # Used during the prerelease/release-candidate part of the official release
 # process. If you hacked up your SA, you should add a version_tag to your .cf
 # files; this variable should not be modified.
@ -101,18 +100,18 @@ our @ISA = qw();

 # SUB_VERSION is now just <yyyy>-<mm>-<dd>
 our $SUB_VERSION = 'svnunknown';
-if ('$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $' =~ ':') {
-  # Subversion keyword "$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $" has been successfully expanded.
+if ('$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $' =~ ':') {
+  # Subversion keyword "$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $" has been successfully expanded.
  # Doesn't happen with automated launchpad builds:
  # https://bugs.launchpad.net/launchpad/+bug/780916
-  $SUB_VERSION = (split(/\s+/,'$LastChangedDate: 2021-04-09 19:54:52 +1200 (Fri, 09 Apr 2021) $ updated by SVN'))[1];
+  $SUB_VERSION = (split(/\s+/,'$LastChangedDate: 2022-12-14 02:29:30 +0000 (Wed, 14 Dec 2022) $ updated by SVN'))[1];
 }


 if (defined $IS_DEVEL_BUILD && $IS_DEVEL_BUILD) {
-  if ('$LastChangedRevision: 1888548 $' =~ ':') {
-    # Subversion keyword "$LastChangedRevision: 1888548 $" has been successfully expanded.
-    push(@EXTRA_VERSION, ('r' . qw{$LastChangedRevision: 1888548 $ updated by SVN}[1]));
+  if ('$LastChangedRevision: 1905971 $' =~ ':') {
+    # Subversion keyword "$LastChangedRevision: 1905971 $" has been successfully expanded.
+    push(@EXTRA_VERSION, ('r' . qw{$LastChangedRevision: 1905971 $ updated by SVN}[1]));
  } else {
    push(@EXTRA_VERSION, ('r' . 'svnunknown'));
  }
@ -428,8 +427,8 @@ sub new {
    $self->timer_enable();
  }

-  $self->{conf} ||= new Mail::SpamAssassin::Conf ($self);
-  $self->{plugins} = Mail::SpamAssassin::PluginHandler->new ($self);
+  $self->{conf} ||= Mail::SpamAssassin::Conf->new($self);
+  $self->{plugins} = Mail::SpamAssassin::PluginHandler->new($self);

  $self->{save_pattern_hits} ||= 0;

@ -469,7 +468,7 @@ sub create_locker {
  # for slow but safe, by keeping in quotes
  eval '
    use Mail::SpamAssassin::Locker::'.$class.';
-    $self->{locker} = new Mail::SpamAssassin::Locker::'.$class.' ($self);
+    $self->{locker} = Mail::SpamAssassin::Locker::'.$class.'->new($self);
    1;
  ' or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
@ -728,6 +727,9 @@ sub init_learner {
    if (exists $opts->{$k}) { $self->{$v} = $opts->{$k}; }
  }

+  # Set flag which can be checked from plugins etc
+  $self->{learning} = 1;
+
  return \%ret;
 }

@ -757,6 +759,7 @@ Finish learning.
 sub finish_learner {
  my $self = shift;
  $self->{bayes_scanner}->force_close(1) if $self->{bayes_scanner};
+  delete $self->{learning};
  1;
 }

@ -832,7 +835,7 @@ sub signal_user_changed {
  $self->{bayes_scanner}->finish() if $self->{bayes_scanner};
  if ($self->{conf}->{use_bayes}) {
      require Mail::SpamAssassin::Bayes;
-      $self->{bayes_scanner} = new Mail::SpamAssassin::Bayes ($self);
+      $self->{bayes_scanner} = Mail::SpamAssassin::Bayes->new($self);
  } else {
      delete $self->{bayes_scanner} if $self->{bayes_scanner};
  }
@ -949,73 +952,84 @@ sub revoke_as_spam {

 ###########################################################################

-=item $f->add_address_to_whitelist ($addr, $cli_p)
+=item $f->add_address_to_welcomelist ($addr, $cli_p)
+
+Previously add_address_to_whitelist which will work interchangeably until 4.1.

 Given a string containing an email address, add it to the automatic
-whitelist database.
+welcomelist database.

 If $cli_p is set then underlying plugin may give visual feedback on additions/failures.

 =cut

-sub add_address_to_whitelist {
+sub add_address_to_welcomelist {
  my ($self, $addr, $cli_p) = @_;

-  $self->call_plugins("whitelist_address", { address => $addr,
+  $self->call_plugins("welcomelist_address", { address => $addr,
                                             cli_p => $cli_p });
 }
+*add_address_to_whitelist = \&add_address_to_welcomelist; # removed in 4.1

 ###########################################################################

-=item $f->add_all_addresses_to_whitelist ($mail, $cli_p)
+=item $f->add_all_addresses_to_welcomelist ($mail, $cli_p)
+
+Previously add_all_addresses_to_whitelist which will work interchangeably until 4.1.

 Given a mail message, find as many addresses in the usual headers (To, Cc, From
-etc.), and the message body, and add them to the automatic whitelist database.
+etc.), and the message body, and add them to the automatic welcomelist database.

 If $cli_p is set then underlying plugin may give visual feedback on additions/failures.

 =cut

-sub add_all_addresses_to_whitelist {
+sub add_all_addresses_to_welcomelist {
  my ($self, $mail_obj, $cli_p) = @_;

  foreach my $addr ($self->find_all_addrs_in_mail ($mail_obj)) {
-    $self->call_plugins("whitelist_address", { address => $addr,
+    $self->call_plugins("welcomelist_address", { address => $addr,
                                               cli_p => $cli_p });
  }
 }
+*add_all_addresses_to_whitelist = \&add_all_addresses_to_welcomelist; # removed in 4.1

 ###########################################################################

-=item $f->remove_address_from_whitelist ($addr, $cli_p)
+=item $f->remove_address_from_welcomelist ($addr, $cli_p)
+
+Previously remove_address_from_whitelist which will work interchangeably until 4.1.

 Given a string containing an email address, remove it from the automatic
-whitelist database.
+welcomelist database.

 If $cli_p is set then underlying plugin may give visual feedback on additions/failures.

 =cut

-sub remove_address_from_whitelist {
+sub remove_address_from_welcomelist {
  my ($self, $addr, $cli_p) = @_;

  $self->call_plugins("remove_address", { address => $addr,
                                          cli_p => $cli_p });
 }
+*remove_address_from_whitelist = \&remove_address_from_welcomelist; # removed in 4.1

 ###########################################################################

-=item $f->remove_all_addresses_from_whitelist ($mail, $cli_p)
+=item $f->remove_all_addresses_from_welcomelist ($mail, $cli_p)
+
+Previously remove_all_addresses_from_whitelist which will work interchangeably until 4.1.

 Given a mail message, find as many addresses in the usual headers (To, Cc, From
-etc.), and the message body, and remove them from the automatic whitelist
+etc.), and the message body, and remove them from the automatic welcomelist
 database.

 If $cli_p is set then underlying plugin may give visual feedback on additions/failures.

 =cut

-sub remove_all_addresses_from_whitelist {
+sub remove_all_addresses_from_welcomelist {
  my ($self, $mail_obj, $cli_p) = @_;

  foreach my $addr ($self->find_all_addrs_in_mail ($mail_obj)) {
@ -1023,30 +1037,36 @@ sub remove_all_addresses_from_whitelist {
                                            cli_p => $cli_p });
  }
 }
+*remove_all_addresses_from_whitelist = \&remove_all_addresses_from_welcomelist; # removed in 4.1

 ###########################################################################

-=item $f->add_address_to_blacklist ($addr, $cli_p)
+=item $f->add_address_to_blocklist ($addr, $cli_p)
+
+Previously add_address_to_blacklist which will work interchangeably until 4.1.

 Given a string containing an email address, add it to the automatic
-whitelist database with a high score, effectively blacklisting them.
+welcomelist database with a high score, effectively blocklisting them.

 If $cli_p is set then underlying plugin may give visual feedback on additions/failures.

 =cut

-sub add_address_to_blacklist {
+sub add_address_to_blocklist {
  my ($self, $addr, $cli_p) = @_;
-  $self->call_plugins("blacklist_address", { address => $addr,
+  $self->call_plugins("blocklist_address", { address => $addr,
                                             cli_p => $cli_p });
 }
+*add_address_to_blacklist = \&add_address_to_blocklist; # removed in 4.1

 ###########################################################################

-=item $f->add_all_addresses_to_blacklist ($mail, $cli_p)
+=item $f->add_all_addresses_to_blocklist ($mail, $cli_p)
+
+Previously add_all_addresses_to_blacklist which will work interchangeably until 4.1.

 Given a mail message, find addresses in the From headers and add them to the
-automatic whitelist database with a high score, effectively blacklisting them.
+automatic welcomelist database with a high score, effectively blocklisting them.

 Note that To and Cc addresses are not used.

@ -1054,23 +1074,26 @@ If $cli_p is set then underlying plugin may give visual feedback on additions/fa

 =cut

-sub add_all_addresses_to_blacklist {
+sub add_all_addresses_to_blocklist {
  my ($self, $mail_obj, $cli_p) = @_;

  $self->init(1);

  my @addrlist;
  my @hdrs = $mail_obj->get_header('From');
-  if ($#hdrs >= 0) {
-    push (@addrlist, $self->find_all_addrs_in_line (join (" ", @hdrs)));
+  foreach my $hdr (@hdrs) {
+    my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
+    foreach my $addr (@addrs) {
+      push @addrlist, $addr->{address} if defined $addr->{address};
+    }
  }

  foreach my $addr (@addrlist) {
-    $self->call_plugins("blacklist_address", { address => $addr,
+    $self->call_plugins("blocklist_address", { address => $addr,
                                               cli_p => $cli_p });
  }
-
 }
+*add_all_addresses_to_blacklist = \&add_all_addresses_to_blocklist; # removed in 4.1

 ###########################################################################

@ -1158,13 +1181,12 @@ sub remove_spamassassin_markup {
  my $hdrs = $mail_obj->get_pristine_header();
  my $body = $mail_obj->get_pristine_body();

-  # remove DOS line endings
-  $hdrs =~ s/\r//gs;
+  # force \n for line-ending processing temporarily
+  $hdrs =~ s/\015?\012/\n/gs;
+  $body =~ s/\015?\012/\n/gs;

  # unfold SA added headers, but not X-Spam-Prev headers ...
-  $hdrs = "\n".$hdrs;   # simplifies regexp below
-  1 while $hdrs =~ s/(\nX-Spam-(?!Prev).+?)\n[ \t]+(\S.*\n)/$1 $2/g;
-  $hdrs =~ s/^\n//;
+  1 while $hdrs =~ s/((?:^|\n)X-Spam-(?!Prev).+?)\n[ \t]+(\S.*\n)/$1 $2/g;

 ###########################################################################
  # Backward Compatibility, pre 3.0.x.
@ -1215,14 +1237,11 @@ sub remove_spamassassin_markup {
  }

  # remove any other X-Spam headers we added, will be unfolded
-  $hdrs = "\n".$hdrs;   # simplifies regexp below
-  1 while $hdrs =~ s/\nX-Spam-.*\n/\n/g;
-  $hdrs =~ s/^\n//;
+  1 while $hdrs =~ s/(^|\n)X-Spam-.*\n/$1/g;

-  # re-add DOS line endings
-  if ($mail_obj->{line_ending} ne "\n") {
-    $hdrs =~ s/\r?\n/$mail_obj->{line_ending}/gs;
-  }
+  # force original message line endings
+  $hdrs =~ s/\n/$mail_obj->{line_ending}/gs;
+  $body =~ s/\n/$mail_obj->{line_ending}/gs;

  # Put the whole thing back together ...
  return join ('', $mbox, $hdrs, $body);
@ -1235,8 +1254,8 @@ sub remove_spamassassin_markup {
 Read a configuration file and parse user preferences from it.

 User preferences are as defined in the C<Mail::SpamAssassin::Conf> manual page.
-In other words, they include scoring options, scores, whitelists and
-blacklists, and so on, but do not include rule definitions, privileged
+In other words, they include scoring options, scores, welcomelists and
+blocklists, and so on, but do not include rule definitions, privileged
 settings, etc. unless C<allow_user_rules> is enabled; and they never include
 the administrator settings.

@ -1261,7 +1280,8 @@ sub read_scoreonly_config {

  $text = "file start $filename\n" . $text;
  # add an extra \n in case file did not end in one.
-  $text .= "\nfile end $filename\n";
+  $text .= "\n" unless $text =~ /\n\z/;
+  $text .= "file end $filename\n";

  $self->{conf}->{main} = $self;
  $self->{conf}->parse_scores_only ($text);
@ -1326,7 +1346,7 @@ sub load_scoreonly_ldap {
 =item $f->set_persistent_address_list_factory ($factoryobj)

 Set the persistent address list factory, used to create objects for the
-automatic whitelist algorithm's persistent-storage back-end.  See
+automatic welcomelist algorithm's persistent-storage back-end.  See
 C<Mail::SpamAssassin::PersistentAddrList> for the API these factory objects
 must implement, and the API the objects they produce must implement.

@ -1499,7 +1519,7 @@ sub lint_rules {
  $self->{dont_copy_prefs} = $olddcp;       # revert back to previous

  # bug 5048: override settings to ensure a faster lint
-  $self->{'conf'}->{'use_auto_whitelist'} = 0;
+  $self->{'conf'}->{'use_auto_welcomelist'} = 0;
  $self->{'conf'}->{'bayes_auto_learn'} = 0;

  my $mail = $self->parse(\@testmsg, 1, { master_deadline => undef });
@ -1531,7 +1551,6 @@ sub finish {
  $self->call_plugins("finish_tests", { conf => $self->{conf},
                                        main => $self });

-  $self->{conf}->finish(); delete $self->{conf};
  $self->{plugins}->finish(); delete $self->{plugins};

  if ($self->{bayes_scanner}) {
@ -1541,6 +1560,8 @@ sub finish {

  $self->{resolver}->finish()  if $self->{resolver};

+  $self->{conf}->finish(); delete $self->{conf};
+
  $self->timer_end("finish");
  %{$self} = ();
 }
@ -1662,6 +1683,11 @@ sub init {
  # Note that this PID has run init()
  $self->{_initted} = $$;

+  # if spamd or other forking, wait for spamd_child_init
+  if (!$self->{skip_prng_reseeding}) {
+    $self->set_global_state_dir();
+  }
+
  #fix spamd reading root prefs file
  if (!defined $use_user_pref) {
    $use_user_pref = 1;
@ -1737,10 +1763,18 @@ sub init {
  }

  if ($self->{pre_config_text}) {
-    $self->{config_text} = $self->{pre_config_text} . $self->{config_text};
+    $self->{pre_config_text} .= "\n" unless $self->{pre_config_text} =~ /\n\z/;
+    $self->{config_text} = "file start (pre_config_text)\n".
+                           $self->{pre_config_text}.
+                           "file end (pre_config_text)\n".
+                           $self->{config_text};
  }
  if ($self->{post_config_text}) {
-    $self->{config_text} .= $self->{post_config_text};
+    $self->{post_config_text} .= "\n" unless $self->{post_config_text} =~ /\n\z/;
+    $self->{config_text} .= "\n" unless $self->{config_text} =~ /\n\z/;
+    $self->{config_text} .= "file start (post_config_text)\n".
+                            $self->{post_config_text}.
+                            "file end (post_config_text)\n";
  }

  if ($self->{config_text} !~ /\S/) {
@ -1771,7 +1805,7 @@ sub init {
  # Initialize the Bayes subsystem
  if ($self->{conf}->{use_bayes}) {
      require Mail::SpamAssassin::Bayes;
-      $self->{bayes_scanner} = new Mail::SpamAssassin::Bayes ($self);
+      $self->{bayes_scanner} = Mail::SpamAssassin::Bayes->new($self);
  }
  $self->{'learn_to_journal'} = $self->{conf}->{bayes_learn_to_journal};

@ -1796,6 +1830,21 @@ sub init {
  # should be called only after configuration has been parsed
  $self->{resolver} = Mail::SpamAssassin::DnsResolver->new($self);

+  # load GeoDB if some plugin wants it
+  if ($self->{geodb_wanted}) {
+    eval '
+      use Mail::SpamAssassin::GeoDB;
+      $self->{geodb} = Mail::SpamAssassin::GeoDB->new({
+        conf => $self->{conf}->{geodb},
+        wanted => $self->{geodb_wanted},
+      });
+      1;
+    ';
+    if ($@ || !$self->{geodb}) {
+      dbg("config: GeoDB disabled: $@");
+    }
+  }
+
  # TODO -- open DNS cache etc. if necessary
 }

@ -1826,10 +1875,10 @@ sub _read_cf_pre {
      dbg("config: file or directory $path not accessible: $!");
    } elsif (-d _) {
      foreach my $file ($self->$filelistmethod($path)) {
-        $txt .= read_cf_file($file);
+        $txt .= $self->read_cf_file($file);
      }
    } elsif (-f _ && -s _ && -r _) {
-      $txt .= read_cf_file($path);
+      $txt .= $self->read_cf_file($path);
    }
  }

@ -1838,9 +1887,14 @@ sub _read_cf_pre {


 sub read_cf_file {
-  my($path) = @_;
+  my($self, $path) = @_;
  my $txt = '';

+  if ($self->{cf_files_read}->{$path}++) {
+    dbg("config: skipping already read file: $path");
+    return $txt;
+  }
+
  local *IN;
  if (open (IN, "<".$path)) {

@ -1852,7 +1906,8 @@ sub read_cf_file {

    $txt = "file start $path\n" . $txt;
    # add an extra \n in case file did not end in one.
-    $txt .= "\nfile end $path\n";
+    $txt .= "\n" unless $txt =~ /\n\z/;
+    $txt .= "file end $path\n";

    dbg("config: read file $path");
  }
@ -1900,7 +1955,7 @@ sub get_and_create_userstate_dir {
    dbg("config: error accessing $fname: $!");
  } else {  # does not exist, create it
    eval {
-      mkpath($fname, 0, 0700);  1;
+      mkpath(Mail::SpamAssassin::Util::untaint_file_path($fname), 0, 0700);  1;
    } or do {
      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
      dbg("config: mkdir $fname failed: $eval_stat");
@ -1910,6 +1965,58 @@ sub get_and_create_userstate_dir {
  $fname;
 }

+# find the most global writable state dir
+# used by dns_block_rule state files etc
+sub set_global_state_dir {
+  my ($self) = @_;
+  # try home_dir_for_helpers
+  my $helper_dir = $self->{home_dir_for_helpers} || '';
+  if ($helper_dir) {
+    my $dir = File::Spec->catdir($helper_dir, ".spamassassin");
+    return if $self->test_global_state_dir($dir);
+  }
+  # try user home (if different from helper home)
+  my $home;
+  if (am_running_on_windows()) {
+    # Windows has a special folder for common appdata (Bug 8050)
+    $home = Mail::SpamAssassin::Util::common_application_data_directory();
+  } else {
+    $home = (Mail::SpamAssassin::Util::portable_getpwuid ($>))[7];
+  }
+  if ($home && $home ne $helper_dir) {
+    my $dir = File::Spec->catdir($home, ".spamassassin");
+    return if $self->test_global_state_dir($dir);
+  }
+  # try LOCAL_STATE_DIR
+  return if $self->test_global_state_dir($self->{LOCAL_STATE_DIR});
+  # fallback to userstate
+  $self->{global_state_dir} = $self->get_and_create_userstate_dir();
+  dbg("config: global_state_dir set to userstate_dir: $self->{global_state_dir}");
+}
+
+sub test_global_state_dir {
+    my ($self, $dir) = @_;
+    eval { mkpath($dir, 0, 0700); }; # just a single stat if exists already
+    # Purge stale test files (enough to do only some times randomly)
+    if (rand() < 0.2 && opendir(WT_DIR, $dir)) {
+      foreach (grep {index($_, '.sawritetest') == 0 &&
+                 (-M File::Spec->catfile($dir, $_)||0) > 0.0001} readdir(WT_DIR)) {
+        unlink(Mail::SpamAssassin::Util::untaint_file_path(File::Spec->catfile($dir, $_)));
+      }
+      closedir WT_DIR;
+    }
+    my $n = ".sawritetest$$".Mail::SpamAssassin::Util::pseudo_random_string(6);
+    my $file = File::Spec->catfile($dir, $n);
+    if (Mail::SpamAssassin::Util::touch_file($file, { create_exclusive => 1 })) {
+      dbg("config: global_state_dir set to $dir");
+      $self->{global_state_dir} = $dir;
+      unlink($file);
+      return 1;
+    }
+    unlink($file); # just in case?
+    return 0;
+}
+
 =item $fullpath = $f->find_rule_support_file ($filename)

 Find a rule-support file, such as C<languages> or C<triplets.txt>,
@ -1923,8 +2030,22 @@ it exists, or undef if it doesn't exist.
 sub find_rule_support_file {
  my ($self, $filename) = @_;

+  my @paths;
+  # search custom directories first
+  if ($self->{site_rules_filename}) {
+    foreach my $path (split("\000", $self->{site_rules_filename})) {
+      push @paths, $path  if -d $path;
+    }
+  }
+  if ($self->{rules_filename} && -d $self->{rules_filename}) {
+    push @paths, $self->{rules_filename}
+  }
+  # updates sub-directory missing from @default_rules_path
+  push @paths, '__local_state_dir__/__version__/updates_spamassassin_org';
+  push @paths, @default_rules_path;
+
  return $self->first_existing_path(
-    map { my $p = $_; $p =~ s{$}{/$filename}; $p } @default_rules_path );
+    map { my $p = $_; $p =~ s{$}{/$filename}; $p } @paths );
 }

 =item $f->create_default_prefs ($filename, $username [ , $userdir ] )
@ -2006,15 +2127,15 @@ sub expand_name {
  if (am_running_on_windows()) {
    my $userprofile = $ENV{USERPROFILE} || '';

-    return $userprofile if ($userprofile && $userprofile =~ m/^[a-z]\:[\/\\]/oi);
-    return $userprofile if ($userprofile =~ m/^\\\\/o);
+    return $userprofile if ($userprofile && $userprofile =~ m/^[a-z]\:[\/\\]/i);
+    return $userprofile if ($userprofile =~ m/^\\\\/);

-    return $home if ($home && $home =~ m/^[a-z]\:[\/\\]/oi);
-    return $home if ($home =~ m/^\\\\/o);
+    return $home if ($home && $home =~ m/^[a-z]\:[\/\\]/i);
+    return $home if ($home =~ m/^\\\\/);

    return '';
  } else {
-    return $home if ($home && $home =~ /\//o);
+    return $home if ($home && index($home, '/') != -1);
    return (getpwnam($name))[7] if ($name ne '');
    return (getpwuid($>))[7];
  }
@ -2028,6 +2149,9 @@ sub sed_path {
    return $self->{conf}->{sed_path_cache}->{$path};
  }

+  # <4.0 compatibility check, to be removed in 4.1
+  my $check_compat = $path eq '__userstate__/auto-welcomelist';
+
  my $orig_path = $path;

  $path =~ s/__local_rules_dir__/$self->{LOCAL_RULES_DIR} || ''/ges;
@ -2035,10 +2159,21 @@ sub sed_path {
  $path =~ s/__def_rules_dir__/$self->{DEF_RULES_DIR} || ''/ges;
  $path =~ s{__prefix__}{$self->{PREFIX} || $Config{prefix} || '/usr'}ges;
  $path =~ s{__userstate__}{$self->get_and_create_userstate_dir() || ''}ges;
+  $path =~ s/__global_state_dir__/$self->{global_state_dir} || ''/ges;
  $path =~ s{__perl_major_ver__}{$self->get_perl_major_version()}ges;
  $path =~ s/__version__/${VERSION}/gs;
  $path =~ s/^\~([^\/]*)/$self->expand_name($1)/es;

+  # <4.0 compatibility check, to be removed in 4.1
+  if ($check_compat) {
+    if ($path =~ m{^(.+)/(.+)$}) {
+      # Use auto-whitelist if found
+      if (!-e $path && -e "$1/auto-whitelist") {
+        $path = "$1/auto-whitelist";
+      }
+    }
+  }
+
  $path = Mail::SpamAssassin::Util::untaint_file_path ($path);
  $self->{conf}->{sed_path_cache}->{$orig_path} = $path;
  return $path;
@ -2077,16 +2212,32 @@ sub get_pre_files_in_dir {
  return $self->_get_cf_pre_files_in_dir($dir, 'pre');
 }

+sub _reorder_dir {
+  # Official ASF channel should be loaded first in
+  # order to be able to override scores by using custom channels
+  # bz 7991
+  if($a eq 'updates_spamassassin_org.cf') {
+    return -1;
+  } elsif ($b eq 'updates_spamassassin_org.cf') {
+    return 1;
+  }
+  return $a cmp $b;
+}
+
 sub _get_cf_pre_files_in_dir {
  my ($self, $dir, $type) = @_;

  if ($self->{config_tree_recurse}) {
    my @cfs;
-
+    # copied from Mail::SpamAssassin::Util::untaint_file_path
+    # fix bugs 8010 and 8025 by using an untaint pattern that is better on Windows than File::Find's default
+    my $chars = '-_A-Za-z0-9.#%=+,/:()\\@\\xA0-\\xFF\\\\';
+    my $re = qr{^\s*([$chars][${chars}~ ]*)\z};
    # use "eval" to avoid loading File::Find unless this is specified
    eval ' use File::Find qw();
      File::Find::find(
        { untaint => 1,
+          am_running_on_windows() ? (untaint_pattern => $re) : (),
          follow => 1,
          wanted =>
            sub { push(@cfs, $File::Find::name) if /\.\Q$type\E$/i && -f $_ }
@ -2095,7 +2246,7 @@ sub _get_cf_pre_files_in_dir {
      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
      die "_get_cf_pre_files_in_dir error: $eval_stat";
    };
-    @cfs = sort { $a cmp $b } @cfs;
+    @cfs = sort { _reorder_dir($a, $b) } @cfs;
    return @cfs;
  }
  else {
@ -2104,7 +2255,7 @@ sub _get_cf_pre_files_in_dir {
                     /\.${type}$/i && -f "$dir/$_" } readdir(SA_CF_DIR);
    closedir SA_CF_DIR;

-    return map { "$dir/$_" } sort { $a cmp $b } @cfs;
+    return map { "$dir/$_" } sort { _reorder_dir($a, $b) } @cfs;
  }
 }

@ -2127,11 +2278,18 @@ sub call_plugins {
  return unless $self->{plugins};

  # Use some calls ourself too
-  if ($subname eq 'finish_parsing_end') {
+  if ($subname eq 'spamd_child_init') {
+    # set global dir now if spamd
+    $self->set_global_state_dir();
+  } elsif ($subname eq 'finish_parsing_end') {
    # Initialize RegistryBoundaries, now that util_rb_tld etc from config is
    # read.  Plugins can also now use {valid_tlds_re} to one time compile
    # regexes in finish_parsing_end.
    $self->{registryboundaries} = Mail::SpamAssassin::RegistryBoundaries->new ($self);
+  } elsif ($subname eq 'whitelist_address' || $subname eq 'blacklist_address') {
+    # Warn about backwards compatibility, removed in 4.1
+    # Third party usage should be rare event, so do not translate function names
+    warn "config: Deprecated $subname called from call_plugins, use welcomelist_address or blocklist_address\n";
  }

  # safety net in case some plugin changes global settings, Bug 6218
@ -2152,8 +2310,12 @@ sub find_all_addrs_in_mail {
  				Errors-To Mail-Followup-To))
  {
    my @hdrs = $mail_obj->get_header($header);
-    if ($#hdrs < 0) { next; }
-    push (@addrlist, $self->find_all_addrs_in_line(join (" ", @hdrs)));
+    foreach my $hdr (@hdrs) {
+      my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($hdr);
+      foreach my $addr (@addrs) {
+        push @addrlist, $addr->{address} if defined $addr->{address};
+      }
+    }
  }

  # find addrs in body, too
@ -2176,17 +2338,22 @@ sub find_all_addrs_in_mail {
 sub find_all_addrs_in_line {
  my ($self, $line) = @_;

+  return () unless defined $line;
+
  # a more permissive pattern based on "dot-atom" as per RFC2822
-  my $ID_PATTERN   = '[-a-z0-9_\+\:\=\!\#\$\%\&\*\^\?\{\}\|\~\/\.]+';
-  my $HOST_PATTERN = '[-a-z0-9_\+\:\/]+';
+  my $ID_PATTERN   = qr/[-a-zA-Z0-9_\+\:\=\!\#\$\%\&\*\^\?\{\}\|\~\/\.]+/;
+  my $HOST_PATTERN = qr/[-a-zA-Z0-9_\+\:\/]+/;

  my @addrs;
  my %seen;
  while ($line =~ s/(?:mailto:)?\s*
 	      ($ID_PATTERN \@
-	      $HOST_PATTERN(?:\.$HOST_PATTERN)+)//oix) 
+	      ($HOST_PATTERN(?:\.$HOST_PATTERN)+))//oix) 
  {
    my $addr = $1;
+    my $host = $2;
+    next unless Mail::SpamAssassin::Util::is_fqdn_valid($host);
+    next unless $self->{registryboundaries}->is_domain_valid($host);
    $addr =~ s/^mailto://;
    next if (defined ($seen{$addr})); $seen{$addr} = 1;
    push (@addrs, $addr);
--- a/upstream/lib/Mail/SpamAssassin/ArchiveIterator.pm
+++ b/upstream/lib/Mail/SpamAssassin/ArchiveIterator.pm
@ -25,13 +25,13 @@ use warnings;
 use re 'taint';

 use Errno qw(ENOENT EACCES EBADF);
-use Mail::SpamAssassin::Util;
+use Mail::SpamAssassin::Util qw(compile_regexp);
 use Mail::SpamAssassin::Constants qw(:sa);
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::AICache;

-# 256 KiB is a big email, unless stated otherwise
-use constant BIG_BYTES => 256*1024;
+# 500 KiB is a big email, unless stated otherwise
+use constant BIG_BYTES => 500*1024;

 our ( $MESSAGES, $AICache, %class_opts );

@ -43,9 +43,9 @@ Mail::SpamAssassin::ArchiveIterator - find and process messages one at a time

 =head1 SYNOPSIS

-  my $iter = new Mail::SpamAssassin::ArchiveIterator(
+  my $iter = Mail::SpamAssassin::ArchiveIterator->new(
    { 
-      'opt_max_size' => 256 * 1024,  # 0 implies no limit
+      'opt_max_size' => 500 * 1024,  # 0 implies no limit
      'opt_cache' => 1,
    }
  );
@ -77,7 +77,7 @@ and C<result_sub> functions appropriately per message.

 ###########################################################################

-=item $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
+=item $item = Mail::SpamAssassin::ArchiveIterator->new( [ { opt => val, ... } ] )

 Constructs a new C<Mail::SpamAssassin::ArchiveIterator> object.  You may
 pass the following attribute-value pairs to the constructor.  The pairs are
@ -91,7 +91,7 @@ A value of option I<opt_max_size> determines a limit (number of bytes)
 beyond which a message is considered large and is skipped by ArchiveIterator.

 A value 0 implies no size limit, all messages are examined. An undefined
-value implies a default limit of 256 KiB.
+value implies a default limit of 500 KiB.

 =item opt_all

@ -205,8 +205,6 @@ sub new {
  $self->{s} = [ ];		# spam, of course
  $self->{h} = [ ];		# ham, as if you couldn't guess

-  $self->{access_problem} = 0;
-
  if ($self->{opt_all}) {
    $self->{opt_max_size} = 0;
  } elsif (!defined $self->{opt_max_size}) {
@ -235,10 +233,14 @@ sub set_functions {

 =item run ( @target_paths )

-Generates the list of messages to process, then runs each message through the
-configured wanted subroutine.  Files which have a name ending in C<.gz> or
-C<.bz2> will be properly uncompressed via call to C<gzip -dc> and C<bzip2 -dc>
-respectively.
+Generates the list of messages to process, then runs each message through
+the configured wanted subroutine.
+
+Compressed files are detected and uncompressed automatically regardless of
+file extension.  Supported formats are C<gzip>, C<bzip2>, C<xz>, C<lz4>,
+C<lzip>, C<lzo>.  Gzip is uncompressed via IO::Zlib module, others use their
+specific command line tool (bzip2/xz/lz4/lzip/lzop).  Compressed
+mailbox/mbox files are not supported.

 The target_paths array is expected to be either one element per path in the
 following format: C<class:format:raw_location>, or a hash reference containing
@ -292,6 +294,11 @@ sub run {
    return 0;
  }

+  # Find some uncompressors (gzip is handled with IO::Zlib)
+  foreach ('bzip2','xz','lz4','lzip','lzop') {
+    $self->{$_.'_path'} = Mail::SpamAssassin::Util::find_executable_in_env_path($_);
+  }
+
  # scan the targets and get the number and list of messages
  $self->_scan_targets(\@targets,
    sub {
@ -316,11 +323,16 @@ sub run {
 sub _run {
  my ($self, $messages) = @_;

+  my $messages_run = 0;
  while (my $message = shift @{$messages}) {
    my($class, undef, $date, undef, $result) = $self->_run_message($message);
-    &{$self->{result_sub}}($class, $result, $date) if $result;
+    if ($result) {
+      $messages_run++;
+      &{$self->{result_sub}}($class, $result, $date);
    }
-  return ! $self->{access_problem};
+  }
+  # Return success if atleast some files were processed through
+  return $messages_run > 0;
 }

 ############################################################################
@ -346,24 +358,8 @@ sub _run_message {
 sub _run_file {
  my ($self, $class, $format, $where, $date) = @_;

-  if (!_mail_open($where)) {
-    $self->{access_problem} = 1;
-    return;
-  }
-
-  my $stat_errn = stat(INPUT) ? 0 : 0+$!;
-  if ($stat_errn == ENOENT) {
-    dbg("archive-iterator: no such input ($where)");
-    return;
-  }
-  elsif ($stat_errn != 0) {
-    warn "archive-iterator: no access to input ($where): $!";
-    return;
-  }
-  elsif (!-f _ && !-c _ && !-p _) {
-    warn "archive-iterator: not a plain file (or char.spec. or pipe) ($where)";
-    return;
-  }
+  my $fh = $self->_mail_open($where, 1);
+  return unless $fh;

  my $opt_max_size = $self->{opt_max_size};
  if (!$opt_max_size) {
@ -375,7 +371,7 @@ sub _run_file {
    # note that -s can only deal with files, it returns 0 on char.spec. STDIN
    info("archive-iterator: skipping large message: ".
         "file size %d, limit %d bytes", -s _, $opt_max_size);
-    close INPUT  or die "error closing input file: $!";
+    close $fh  or die "error closing input file: $!";
    return;
  }

@ -384,12 +380,12 @@ sub _run_file {
  my $len = 0;
  my $str = '';
  my($inbuf,$nread);
-  while ( $nread=read(INPUT,$inbuf,16384) ) {
+  while ( $nread=read($fh,$inbuf,16384) ) {
    $len += $nread;
    if ($opt_max_size && $len > $opt_max_size) {
      info("archive-iterator: skipping large message: read %d, limit %d bytes",
           $len, $opt_max_size);
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";
      return;
    }
    $str .= $inbuf;
@ -400,7 +396,7 @@ sub _run_file {
  for my $j (0..$#msg) {
    if ($msg[$j] =~ /^\015?$/) { $header = $j; last }
  }
-  close INPUT  or die "error closing input file: $!";
+  close $fh  or die "error closing input file: $!";

  if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
    $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -418,21 +414,20 @@ sub _run_mailbox {
  }
  my @msg;
  my $header;
-  if (!_mail_open($file)) {
-    $self->{access_problem} = 1;
-    return;
-  }
+
+  my $fh = $self->_mail_open($file, 1);
+  return unless $fh;

  my $opt_max_size = $self->{opt_max_size};
  dbg("archive-iterator: _run_mailbox %s, ofs %d, limit %d",
      $file, $offset, $opt_max_size||0);

-  seek(INPUT,$offset,0)  or die "cannot reposition file to $offset: $!";
+  seek($fh,$offset,0)  or die "cannot reposition file to $offset: $!";

  my $size = 0;
-  for ($!=0; <INPUT>; $!=0) {
+  for ($!=0; <$fh>; $!=0) {
    #Changed Regex to use option Per bug 6703
-    last if (substr($_,0,5) eq "From " && @msg && /$self->{opt_from_regex}/o);
+    last if (/^From / && @msg && $_ =~ $self->{opt_from_regex});
    $size += length($_);
    push (@msg, $_);

@ -441,7 +436,7 @@ sub _run_mailbox {
      info("archive-iterator: skipping large message: ".
           "%d lines, %d bytes, limit %d bytes",
           scalar @msg, $size, $opt_max_size);
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";
      return;
    }

@ -452,7 +447,7 @@ sub _run_mailbox {
  defined $_ || $!==0  or
    $!==EBADF ? dbg("archive-iterator: error reading: $!")
              : die "error reading: $!";
-  close INPUT  or die "error closing input file: $!";
+  close $fh  or die "error closing input file: $!";

  if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
    $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -464,23 +459,24 @@ sub _run_mailbox {
 sub _run_mbx {
  my ($self, $class, $format, $where, $date) = @_;

-  my ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
+  my ($file, $offset);
+  { local($1,$2);  # Bug 7140 (avoids perl bug [perl #123880])
+    ($file, $offset) = ($where =~ m/(.*)\.(\d+)$/);
+  }
  my @msg;
  my $header;

-  if (!_mail_open($file)) {
-    $self->{access_problem} = 1;
-    return;
-  }
+  my $fh = $self->_mail_open($file, 1);
+  return unless $fh;

  my $opt_max_size = $self->{opt_max_size};
  dbg("archive-iterator: _run_mbx %s, ofs %d, limit %d",
      $file, $offset, $opt_max_size||0);

-  seek(INPUT,$offset,0)  or die "cannot reposition file to $offset: $!";
+  seek($fh,$offset,0)  or die "cannot reposition file to $offset: $!";
    
  my $size = 0;
-  for ($!=0; <INPUT>; $!=0) {
+  for ($!=0; <$fh>; $!=0) {
    last if ($_ =~ MBX_SEPARATOR);
    $size += length($_);
    push (@msg, $_);
@ -490,7 +486,7 @@ sub _run_mbx {
      info("archive-iterator: skipping large message: ".
           "%d lines, %d bytes, limit %d bytes",
           scalar @msg, $size, $opt_max_size);
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";
      return;
    }

@ -501,7 +497,7 @@ sub _run_mbx {
  defined $_ || $!==0  or
    $!==EBADF ? dbg("archive-iterator: error reading: $!")
              : die "error reading: $!";
-  close INPUT  or die "error closing input file: $!";
+  close $fh  or die "error closing input file: $!";

  if ($date == AI_TIME_UNKNOWN && $self->{determine_receive_date}) {
    $date = Mail::SpamAssassin::Util::receive_date(join('', splice(@msg, 0, $header)));
@ -579,12 +575,9 @@ sub _scan_targets {
      if ($format eq 'detect') {
 	# detect the format
        my $stat_errn = stat($location) ? 0 : 0+$!;
-        if ($stat_errn == ENOENT) {
-          $thisformat = 'file';  # actually, no file - to be detected later
-        }
-        elsif ($stat_errn != 0) {
-          warn "archive-iterator: no access to $location: $!";
-          $thisformat = 'file';
+        if ($stat_errn != 0) {
+          warn "archive-iterator: no access to $location: $!\n";
+          next;
        }
        elsif (-d _) {
 	  # it's a directory
@ -623,37 +616,121 @@ sub _scan_targets {
 }

 sub _mail_open {
-  my ($file) = @_;
+  my ($self, $file, $ignore_missing) = @_;
+  my $fh;

+  # Go ahead and try to open the file
  # bug 5288: the "magic" version of open will strip leading and trailing
  # whitespace from the expression.  switch to the three-argument version
  # of open which does not strip whitespace.  see "perldoc -f open" and
  # "perldoc perlipc" for more information.
-
-  # Assume that the file by default is just a plain file
-  my @expr = ( $file );
-  my $mode = '<';
-
-  # Handle different types of compressed files
-  if ($file =~ /\.gz$/) {
-    $mode = '-|';
-    unshift @expr, 'gunzip', '-cd';
+  if (!open($fh, '<', $file)) {
+    # Don't warn about disappeared files
+    if ($ignore_missing && $! == ENOENT) {
+      dbg("archive-iterator: no access to $file: $!");
+    } else {
+      warn "archive-iterator: no access to $file: $!\n"
    }
-  elsif ($file =~ /\.bz2$/) {
-    $mode = '-|';
-    unshift @expr, 'bzip2', '-cd';
-  }
-
-  # Go ahead and try to open the file
-  if (!open (INPUT, $mode, @expr)) {
-    warn "archive-iterator: unable to open $file: $!\n";
-    return 0;
+    return;
  }

  # bug 5249: mail could have 8-bit data, need this on some platforms
-  binmode INPUT  or die "cannot set input file to binmode: $!";
+  binmode $fh  or die "cannot set input file to binmode: $!";

-  return 1;
+  # Detect compressed data (only from files, can't reopen pipe)
+  if (-f $file && read($fh, my $magic, 6)) {
+    # GZIP
+    if ($magic =~ /^\x1F\x8B/) {
+      dbg("archive-iterator: detected gzip file $file, reopening with IO::Zlib");
+      close $fh  or die "error closing input file: $!";
+      eval { require IO::Zlib; };
+      if ($@) { warn "archive-iterator: IO::Zlib required for $file: $@\n"; return; }
+      $fh = IO::Zlib->new($file, "rb");
+      if (!$fh) {
+        if ($ignore_missing && $! == ENOENT) {
+          dbg("archive-iterator: no access to $file: $!");
+        } else {
+          warn "archive-iterator: no access to $file: $!\n";
+        }
+        return;
+      }
+    }
+    # BZIP2
+    elsif ($magic =~ /^\x42\x5A(?:\x68|\x30)/) {
+      dbg("archive-iterator: detected bzip2 file $file, reopening with bzip2");
+      close $fh  or die "error closing input file: $!";
+      if (!$self->{bzip2_path}) {
+        warn "archive-iterator: bzip2 executable required for $file\n";
+        return;
+      }
+      if (!open($fh, '-|', $self->{bzip2_path}, '-cd', $file)) {
+        warn "archive-iterator: no access to $file: $!\n";
+        return;
+      }
+      binmode $fh  or die "cannot set input file to binmode: $!";
+    }
+    # XZ
+    elsif ($magic =~ /^\xFD\x37\x7A\x58\x5A\x00/) {
+      dbg("archive-iterator: detected xz file $file, reopening with xz");
+      close $fh  or die "error closing input file: $!";
+      if (!$self->{xz_path}) {
+        warn "archive-iterator: xz executable required for $file\n";
+        return;
+      }
+      if (!open($fh, '-|', $self->{xz_path}, '-cd', $file)) {
+        warn "archive-iterator: no access to $file: $!\n";
+        return;
+      }
+      binmode $fh  or die "cannot set input file to binmode: $!";
+    }
+    # LZ4
+    elsif ($magic =~ /^\x04\x22\x4D\x18/) {
+      dbg("archive-iterator: detected lz4 file $file, reopening with lz4");
+      close $fh  or die "error closing input file: $!";
+      if (!$self->{lz4_path}) {
+        warn "archive-iterator: lz4 executable required for $file\n";
+        return;
+      }
+      if (!open($fh, '-|', $self->{lz4_path}, '-cd', $file)) {
+        warn "archive-iterator: no access to $file: $!\n";
+        return;
+      }
+      binmode $fh  or die "cannot set input file to binmode: $!";
+    }
+    # LZIP
+    elsif ($magic =~ /^\x4C\x5A\x49\x50/) {
+      dbg("archive-iterator: detected lzip file $file, reopening with lzip");
+      close $fh  or die "error closing input file: $!";
+      if (!$self->{lzip_path}) {
+        warn "archive-iterator: lzip executable required for $file\n";
+        return;
+      }
+      if (!open($fh, '-|', $self->{lzip_path}, '-cd', $file)) {
+        warn "archive-iterator: no access to $file: $!\n";
+        return;
+      }
+      binmode $fh  or die "cannot set input file to binmode: $!";
+    }
+    # LZO
+    elsif ($magic =~ /^\x89\x4C\x5A\x4F\x00\x0D/) {
+      dbg("archive-iterator: detected lzo file $file, reopening with lzop");
+      close $fh  or die "error closing input file: $!";
+      if (!$self->{lzop_path}) {
+        warn "archive-iterator: lzop executable required for $file\n";
+        return;
+      }
+      if (!open($fh, '-|', $self->{lzop_path}, '-cd', $file)) {
+        warn "archive-iterator: no access to $file: $!\n";
+        return;
+      }
+      binmode $fh  or die "cannot set input file to binmode: $!";
+    } else {
+      # Reset position
+      seek($fh,0,0);
+    }
+  }
+
+  return $fh;
 }

 sub _set_default_message_selection_opts {
@ -663,11 +740,15 @@ sub _set_default_message_selection_opts {
  $self->{opt_want_date} = 1 unless (defined $self->{opt_want_date});
  $self->{opt_cache} = 0 unless (defined $self->{opt_cache});
  #Changed Regex to include boundaries for Communigate Pro versions (5.2.x and later). per Bug 6413
-  $self->{opt_from_regex} = '^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)' unless (defined $self->{opt_from_regex});
-
-  #STRIP LEADING AND TRAILING / FROM REGEX FOR OPTION
-  $self->{opt_from_regex} =~ s/^\///;
-  $self->{opt_from_regex} =~ s/\/$//;
+  if (!defined $self->{opt_from_regex}) {
+    $self->{opt_from_regex} = qr/^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;
+  } elsif (ref($self->{opt_from_regex}) ne 'Regexp') {
+    my ($rec, $err) = compile_regexp($self->{opt_from_regex}, 1);
+    if (!$rec) {
+      die "fatal: invalid mbox_format_from_regex '$self->{opt_from_regex}': $err\n";
+    }
+    $self->{opt_from_regex} = $rec;
+  }

  dbg("archive-iterator: _set_default_message_selection_opts After: Scanprob[$self->{opt_scanprob}], want_date[$self->{opt_want_date}], cache[$self->{opt_cache}], from_regex[$self->{opt_from_regex}]");

@ -750,7 +831,7 @@ sub _scan_directory {
    # Maildir format: bug 3003
    for my $sub ("new", "cur") {
      opendir (DIR, "$folder/$sub")
-            or die "Can't open '$folder/$sub' dir: $!\n";
+            or die "archive-iterator: can't open '$folder/$sub' dir: $!\n";
      # Don't learn from messages marked as deleted
      # Or files starting with a leading dot
      push @files, map { "$sub/$_" } grep { !/^\.|:2,.*T/ } readdir(DIR);
@ -790,9 +871,10 @@ sub _scan_directory {
    my $stat_errn = stat($file) ? 0 : 0+$!;
    if ($stat_errn == ENOENT) {
      # no longer there?
+      dbg("archive-iterator: no access to $file: $!");
    }
    elsif ($stat_errn != 0) {
-      warn "archive-iterator: no access to $file: $!";
+      warn "archive-iterator: no access to $file: $!\n";
    }
    elsif (-f _ || -c _ || -p _) {
      $self->_scan_file($class, $file, $bkfunc);
@ -801,7 +883,7 @@ sub _scan_directory {
      push(@subdirs, $file);
    }
    else {
-      warn "archive-iterator: $file is not a plain file or directory: $!";
+      warn "archive-iterator: $file is not a plain file or directory\n";
    }
  }
  undef @files;  # release storage
@ -844,18 +926,17 @@ sub _scan_file {
      }

      my $header = '';
-      if (!_mail_open($mail)) {
-        $self->{access_problem} = 1;
-        return;
-      }
-      for ($!=0; <INPUT>; $!=0) {
+      my $fh = $self->_mail_open($mail);
+      return unless $fh;
+
+      for ($!=0; <$fh>; $!=0) {
        last if /^\015?$/s;
        $header .= $_;
      }
      defined $_ || $!==0  or
        $!==EBADF ? dbg("archive-iterator: error reading: $!")
                  : die "error reading: $!";
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";

      return if ($self->{opt_skip_empty_messages} && $header eq '');

@ -896,7 +977,6 @@ sub _scan_mailbox {
    $folder =~ s/\/\s*$//; #Remove trailing slash, if there
    if (!opendir(DIR, $folder)) {
      warn "archive-iterator: can't open '$folder' dir: $!\n";
-      $self->{access_problem} = 1;
      return;
    }
    while ($_ = readdir(DIR)) {
@ -921,9 +1001,8 @@ sub _scan_mailbox {

  foreach my $file (@files) {
    $self->_bump_scan_progress();
-    if ($file =~ /\.(?:gz|bz2)$/) {
+    if ($file =~ /\.(?:gz|bz2|xz|lz[o4]?)$/i) {
      warn "archive-iterator: compressed mbox folders are not supported at this time\n";
-      $self->{access_problem} = 1;
      next;
    }

@ -943,20 +1022,18 @@ sub _scan_mailbox {
    }

    unless ($count) {
-      if (!_mail_open($file)) {
-        $self->{access_problem} = 1;
-	next;
-      }
+      my $fh = $self->_mail_open($file);
+      next unless $fh;

      my $start = 0;		# start of a message
      my $where = 0;		# current byte offset
      my $first = '';		# first line of message
      my $header = '';		# header text
      my $in_header = 0;	# are in we a header?
-      while (!eof INPUT) {
+      while (!eof $fh) {
        my $offset = $start;	# byte offset of this message
        my $header = $first;	# remember first line
-        for ($!=0; <INPUT>; $!=0) {
+        for ($!=0; <$fh>; $!=0) {
 	  if ($in_header) {
            if (/^\015?$/s) {
 	      $in_header = 0;
@ -966,15 +1043,15 @@ sub _scan_mailbox {
 	    }
 	  }
          #Changed Regex to use option Per bug 6703
-	  if (substr($_,0,5) eq "From " && /$self->{opt_from_regex}/o) {
+	  if (/^From / && $_ =~ $self->{opt_from_regex}) {
 	    $in_header = 1;
 	    $first = $_;
 	    $start = $where;
-	    $where = tell INPUT;
+	    $where = tell $fh;
            $where >= 0  or die "cannot obtain file position: $!";
 	    last;
 	  }
-	  $where = tell INPUT;
+	  $where = tell $fh;
          $where >= 0  or die "cannot obtain file position: $!";
        }
        defined $_ || $!==0  or
@ -986,7 +1063,7 @@ sub _scan_mailbox {
 	  $info->{$offset} = Mail::SpamAssassin::Util::receive_date($header);
 	}
      }
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";
    }

    while(my($k,$v) = each %{$info}) {
@ -1027,7 +1104,6 @@ sub _scan_mbx {
    $folder =~ s/\/\s*$//; # remove trailing slash, if there is one
    if (!opendir(DIR, $folder)) {
      warn "archive-iterator: can't open '$folder' dir: $!\n";
-      $self->{access_problem} = 1;
      return;
    }
    while ($_ = readdir(DIR)) {
@ -1053,9 +1129,8 @@ sub _scan_mbx {
  foreach my $file (@files) {
    $self->_bump_scan_progress();

-    if ($folder =~ /\.(?:gz|bz2)$/) {
+    if ($folder =~ /\.(?:gz|bz2|xz|lz[o4]?)$/i) {
      warn "archive-iterator: compressed mbx folders are not supported at this time\n";
-      $self->{access_problem} = 1;
      next;
    }

@ -1075,13 +1150,11 @@ sub _scan_mbx {
    }

    unless ($count) {
-      if (!_mail_open($file)) {
-	$self->{access_problem} = 1;
-        next;
-      }
+      my $fh = $self->_mail_open($file);
+      next unless $fh;

      # check the mailbox is in mbx format
-      $! = 0; $fp = <INPUT>;
+      $! = 0; $fp = <$fh>;
      defined $fp || $!==0  or
        $!==EBADF ? dbg("archive-iterator: error reading: $!")
                  : die "error reading: $!";
@ -1092,18 +1165,17 @@ sub _scan_mbx {
      }

      # skip mbx headers to the first email...
-      seek(INPUT,2048,0)  or die "cannot reposition file to 2048: $!";
-      my $sep = MBX_SEPARATOR;
+      seek($fh,2048,0)  or die "cannot reposition file to 2048: $!";

-      for ($!=0; <INPUT>; $!=0) {
-        if ($_ =~ /$sep/) {
-	  my $offset = tell INPUT;
+      for ($!=0; <$fh>; $!=0) {
+        if ($_ =~ MBX_SEPARATOR) {
+	  my $offset = tell $fh;
          $offset >= 0  or die "cannot obtain file position: $!";
 	  my $size = $2;

 	  # gather up the headers...
 	  my $header = '';
-          for ($!=0; <INPUT>; $!=0) {
+          for ($!=0; <$fh>; $!=0) {
            last if (/^\015?$/s);
 	    $header .= $_;
 	  }
@ -1116,7 +1188,7 @@ sub _scan_mbx {
          }

 	  # go onto the next message
-	  seek(INPUT, $offset + $size, 0)
+	  seek($fh, $offset + $size, 0)
            or die "cannot reposition file to $offset + $size: $!";
 	}
        else {
@ -1126,7 +1198,7 @@ sub _scan_mbx {
      defined $_ || $!==0  or
        $!==EBADF ? dbg("archive-iterator: error reading: $!")
                  : die "error reading: $!";
-      close INPUT  or die "error closing input file: $!";
+      close $fh  or die "error closing input file: $!";
    }

    while(my($k,$v) = each %{$info}) {
--- a/upstream/lib/Mail/SpamAssassin/AsyncLoop.pm
+++ b/upstream/lib/Mail/SpamAssassin/AsyncLoop.pm
@ -42,6 +42,7 @@ use Time::HiRes qw(time);

 use Mail::SpamAssassin;
 use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Util qw(idn_to_ascii domain_to_search_list);

 our @ISA = qw();

@ -71,9 +72,9 @@ sub new {
    main                => $main,
    queries_started     => 0,
    queries_completed   => 0,
-    total_queries_started   => 0,
-    total_queries_completed => 0,
    pending_lookups     => { },
+    pending_rules	=> { },  # maintain pending rules list for meta evaluation
+    rules_for_key	=> { },  # record all rules used by a key for logging
    timing_by_query     => { },
    all_lookups         => { },  # keyed by "rr_type/domain"
  };
@ -82,66 +83,54 @@ sub new {
  $self;
 }

-# Given a domain name, produces a listref of successively stripped down
-# parent domains, e.g. a domain '2.10.Example.COM' would produce a list:
-# '2.10.example.com', '10.example.com', 'example.com', 'com', ''
-#
-sub domain_to_search_list {
-  my ($domain) = @_;
-  $domain =~ s/^\.+//; $domain =~ s/\.+\z//;  # strip leading and trailing dots
-  my @search_keys;
-  if ($domain =~ /\[/) {  # don't split address literals
-    @search_keys = ( $domain, '' );  # presumably an address literal
-  } else {
-    local $1;
-    $domain = lc $domain;
-    for (;;) {
-      push(@search_keys, $domain);
-      last  if $domain eq '';
-      # strip one level
-      $domain = ($domain =~ /^ (?: [^.]* ) \. (.*) \z/xs) ? $1 : '';
-    }
-    if (@search_keys > 20) {  # enforce some sanity limit
-      @search_keys = @search_keys[$#search_keys-19 .. $#search_keys];
-    }
-  }
-  return \@search_keys;
-}
-
 # ---------------------------------------------------------------------------

-=item $ent = $async->start_lookup($ent, $master_deadline)
+=item $ent = $async->bgsend_and_start_lookup($name, $type, $class, $ent, $cb, %options)

-Register the start of a long-running asynchronous lookup operation.
-C<$ent> is a hash reference containing the following items:
+Launch async DNS lookups.  This is the only official method supported for
+plugins since version 4.0.0.  Do not use bgsend and start_lookup separately.
+
+Merges duplicate queries automatically, only launches one and calls all
+related callbacks on answer.

 =over 4

-=item key (required)
+=item $name (required)

-A key string, unique to this lookup.  This is what is reported in
-debug messages, used as the key for C<get_lookup()>, etc.
+Name to query.

-=item id (required)
+=item $type (required)

-An ID string, also unique to this lookup.  Typically, this is the DNS packet ID
-as returned by DnsResolver's C<bgsend> method.  Sadly, the Net::DNS
-architecture forces us to keep a separate ID string for this task instead of
-reusing C<key> -- if you are not using DNS lookups through DnsResolver, it
-should be OK to just reuse C<key>.
+Type to query, A, TXT, NS, etc.

-=item type (required)
+=item $class (required/deprecated)
+
+Deprecated, ignored, set as undef.
+
+=item C<$ent> is a required hash reference containing the following items:
+
+=over 4
+
+=item $ent->{rulename} (required)
+
+The rulename that started and/or depends on this query.  Required for rule
+dependencies to work correctly.  Can be a single rulename, or array of
+multiple rulenames.
+
+=item $ent->{type} (optional)

 A string, typically one word, used to describe the type of lookup in log
-messages, such as C<DNSBL>, C<MX>, C<TXT>.
+messages, such as C<DNSBL>, C<URIBL-A>.  If not defined, default is value of
+$type.

-=item zone (optional)
+=item $ent->{zone} (optional)

-A zone specification (typically a DNS zone name - e.g. host, domain, or RBL)
-which may be used as a key to look up per-zone settings. No semantics on this
-parameter is imposed by this module. Currently used to fetch by-zone timeouts.
+A zone specification (typically a DNS zone name - e.g.  host, domain, or
+RBL) which may be used as a key to look up per-zone settings.  No semantics
+on this parameter is imposed by this module.  Currently used to fetch
+by-zone timeouts (from rbl_timeout setting).  Defaults to $name.

-=item timeout_initial (optional)
+=item $ent->{timeout_initial} (optional)

 An initial value of elapsed time for which we are willing to wait for a
 response (time in seconds, floating point value is allowed). When elapsed
@ -158,20 +147,267 @@ variable rbl_timeout.
 If a value of the timeout_initial parameter is below timeout_min, the initial
 timeout is set to timeout_min.

-=item timeout_min (optional)
+=item $ent->{timeout_min} (optional)

 A lower bound (in seconds) to which the actual timeout approaches as the
 number of queries completed approaches the number of all queries started.
 Defaults to 0.2 * timeout_initial.

+=item $ent->{key}, $ent->{id} (deprecated)
+
+Deprecated, ignored, automatically generated since 4.0.0.
+
+=item $ent->{YOUR_OWN_ITEM}
+
+Any other custom values/objects that you want to pass on to the answer
+callback.
+
 =back

-C<$ent> is returned by this method, with its contents augmented by additional
-information.
+=item $cb (required)
+
+Callback function for answer, called as $cb->($ent, $pkt).  C<$ent> is the
+same object that bgsend_and_start_lookup was called with.  C<$pkt> is the
+packet object for the response, Net::DNS:RR objects can be found from
+$pkt->answer.
+
+=item %options (required)
+
+Hash of options. Only supported and required option is master_deadline:
+
+  master_deadline => $pms->{master_deadline}
+
+=back
+
+=cut
+
+sub start_queue {
+  my($self) = @_;
+
+  $self->{wait_queue} = 1;
+}
+
+sub launch_queue {
+  my($self) = @_;
+
+  delete $self->{wait_queue};
+
+  if ($self->{bgsend_queue}) {
+    dbg("async: launching queued lookups");
+    foreach (@{$self->{bgsend_queue}}) {
+      $self->bgsend_and_start_lookup(@$_);
+    }
+    delete $self->{bgsend_queue};
+  }
+}
+
+sub bgsend_and_start_lookup {
+  my $self = shift;
+  my($domain, $type, $class, $ent, $cb, %options) = @_;
+
+  return if $self->{main}->{resolver}->{no_resolver};
+
+  # Waiting for priority -100 to launch?
+  if ($self->{wait_queue}) {
+    push @{$self->{bgsend_queue}}, [@_];
+    dbg("async: DNS priority not reached, queueing lookup: $domain/$type");
+    return $ent;
+  }
+
+  if (!defined $ent->{rulename} && !$self->{rulename_warned}++) {
+    my($package, $filename, $line) = caller;
+    warn "async: bgsend_and_start_lookup called without rulename, ".
+         "from $package ($filename) line $line. You are likely using ".
+         "a plugin that is not compatible with SpamAssasin 4.0.0.";
+  }
+
+  $domain =~ s/\.+\z//s;  # strip trailing dots, these sometimes still sneak in
+  $domain = idn_to_ascii($domain);
+
+  # At this point the $domain should already be encoded to UTF-8 and
+  # IDN converted to ASCII-compatible encoding (ACE).  Make sure this is
+  # really the case in order to be able to catch any leftover omissions.
+  if (utf8::is_utf8($domain)) {
+    utf8::encode($domain);
+    my($package, $filename, $line) = caller;
+    info("bgsend_and_start_lookup: Unicode domain name, expected octets: %s, ".
+         "called from %s line %d", $domain, $package, $line);
+  } elsif ($domain =~ tr/\x00-\x7F//c) {  # is not all-ASCII
+    my($package, $filename, $line) = caller;
+    info("bgsend_and_start_lookup: non-ASCII domain name: %s, ".
+         "called from %s line %d", $domain, $package, $line);
+  }
+
+  my $dnskey = uc($type).'/'.lc($domain);
+  my $dns_query_info = $self->{all_lookups}{$dnskey};
+
+  $ent = {}  if !$ent;
+  $ent->{id} = undef;
+  my $key = $ent->{key} = $dnskey;
+  $ent->{query_type} = $type;
+  $ent->{query_domain} = $domain;
+  $ent->{type} = $type  if !exists $ent->{type};
+  $ent->{zone} = $domain  if !exists $ent->{zone};
+  $cb = $ent->{completed_callback}  if !$cb;  # compatibility with SA < 3.4
+
+  my @rulenames = grep { defined } (ref $ent->{rulename} ?
+                    @{$ent->{rulename}} : $ent->{rulename});
+
+  $self->{rules_for_key}->{$key}{$_} = 1 foreach (@rulenames);
+
+  if ($dns_query_info) {  # DNS query already underway or completed
+    if ($dns_query_info->{blocked}) {
+      dbg("async: blocked by %s: %s, rules: %s", $dns_query_info->{blocked},
+          $dnskey, join(", ", @rulenames));
+      return;
+    }
+    my $id = $ent->{id} = $dns_query_info->{id};  # re-use existing query
+    return if !defined $id;  # presumably some fatal failure
+    my $id_tail = $id; $id_tail =~ s{^\d+/IN/}{};
+    lc($id_tail) eq lc($dnskey)
+      or info("async: unmatched id %s, key=%s", $id, $dnskey);
+
+    my $pkt = $dns_query_info->{pkt};
+    if (!$pkt) {  # DNS query underway, still waiting for results
+      # just add our query to the existing one
+      push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
+      $self->{pending_rules}->{$_}{$key} = 1 foreach (@rulenames);
+      dbg("async: query %s already underway, adding no.%d, rules: %s",
+          $id, scalar @{$dns_query_info->{applicants}},
+          join(", ", @rulenames));
+
+    } else {  # DNS query already completed, re-use results
+      # answer already known, just do the callback and be done with it
+      delete $self->{pending_rules}->{$_}{$key} foreach (@rulenames);
+      if (!$cb) {
+        dbg("async: query %s already done, re-using for %s, rules: %s",
+            $id, $key, join(", ", @rulenames));
+      } else {
+        dbg("async: query %s already done, re-using for %s, callback, rules: %s",
+            $id, $key, join(", ", @rulenames));
+        eval {
+          $cb->($ent, $pkt); 1;
+        } or do {
+          chomp $@;
+          # resignal if alarm went off
+          die "async: (1) $@\n"  if $@ =~ /__alarm__ignore__\(.*\)/s;
+          warn sprintf("async: query %s completed, callback %s failed: %s\n",
+                       $id, $key, $@);
+        };
+      }
+    }
+  }
+
+  else {  # no existing query, open a new DNS query
+    $dns_query_info = $self->{all_lookups}{$dnskey} = {};  # new query needed
+    my($id, $blocked, $check_dbrdom);
+    # dns_query_restriction
+    my $blocked_by = 'dns_query_restriction';
+    my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
+    # dns_block_rule
+    my $dns_block_domains = $self->{main}->{conf}->{dns_block_rule_domains};
+    if ($dns_query_blockages || $dns_block_domains) {
+      my $search_list = domain_to_search_list($domain);
+      foreach my $parent_domain ((@$search_list, '*')) {
+        if ($dns_query_blockages) {
+          $blocked = $dns_query_blockages->{$parent_domain};
+          last if defined $blocked; # stop at first defined, can be true or false
+        }
+        if ($parent_domain ne '*' && exists $dns_block_domains->{$parent_domain}) {
+          # save for later check.. ps. untainted already
+          $check_dbrdom = $dns_block_domains->{$parent_domain};
+        }
+      }
+    }
+    if (!$blocked && $check_dbrdom) {
+      my $blockfile =
+        $self->{main}->sed_path("__global_state_dir__/dnsblock_${check_dbrdom}");
+      if (my $mtime = (stat($blockfile))[9]) {
+        if (time - $mtime <= $self->{main}->{conf}->{dns_block_time}) {
+          $blocked = 1;
+          $blocked_by = 'dns_block_rule';
+        } else {
+          dbg("async: dns_block_rule removing expired $blockfile");
+          unlink($blockfile);
+        }
+      }
+    }
+    if ($blocked) {
+      dbg("async: blocked by %s: %s, rules: %s", $blocked_by, $dnskey,
+          join(", ", @rulenames));
+      $dns_query_info->{blocked} = $blocked_by;
+    } else {
+      dbg("async: launching %s, rules: %s", $dnskey, join(", ", @rulenames));
+      $id = $self->{main}->{resolver}->bgsend($domain, $type, $class, sub {
+          my($pkt, $pkt_id, $timestamp) = @_;
+          # this callback sub is called from DnsResolver::poll_responses()
+          # dbg("async: in a bgsend_and_start_lookup callback, id %s", $pkt_id);
+          if ($pkt_id ne $id) {
+            warn "async: mismatched dns id: got $pkt_id, expected $id\n";
+            return;
+          }
+          $self->set_response_packet($pkt_id, $pkt, $ent->{key}, $timestamp);
+          $dns_query_info->{pkt} = $pkt;
+          my $cb_count = 0;
+          foreach my $tuple (@{$dns_query_info->{applicants}}) {
+            my($appl_ent, $appl_cb) = @$tuple;
+            my @rulenames = grep { defined } (ref $appl_ent->{rulename} ?
+                      @{$appl_ent->{rulename}} : $appl_ent->{rulename});
+            foreach (@rulenames) {
+              delete $self->{pending_rules}->{$_}{$appl_ent->{key}};
+            }
+            if ($appl_cb) {
+              dbg("async: calling callback on key %s, rules: %s",
+                  $key, join(", ", @rulenames));
+              $cb_count++;
+              eval {
+                $appl_cb->($appl_ent, $pkt); 1;
+              } or do {
+                chomp $@;
+                # resignal if alarm went off
+                die "async: (2) $@\n"  if $@ =~ /__alarm__ignore__\(.*\)/s;
+                warn sprintf("async: query %s completed, callback %s failed: %s\n",
+                             $id, $appl_ent->{key}, $@);
+              };
+            }
+          }
+          delete $dns_query_info->{applicants};
+          dbg("async: query $id completed, no callbacks run")  if !$cb_count;
+        });
+    }
+    return if !defined $id;
+    $dns_query_info->{id} = $ent->{id} = $id;
+    push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
+    $self->{pending_rules}->{$_}{$key} = 1 foreach (@rulenames);
+    $self->_start_lookup($ent, $options{master_deadline});
+  }
+  return $ent;
+}
+
+# ---------------------------------------------------------------------------
+
+=item $ent = $async->start_lookup($ent, $master_deadline)
+
+DIRECT USE DEPRECATED since 4.0.0, please use bgsend_and_start_lookup.

 =cut

 sub start_lookup {
+  my $self = shift;
+
+  if (!$self->{start_lookup_warned}++) {
+    my($package, $filename, $line) = caller;
+    warn "async: deprecated start_lookup called, ".
+         "from $package ($filename) line $line. You are likely using ".
+         "a plugin that is not compatible with SpamAssasin 4.0.0.";
+  }
+
+  return if $self->{main}->{resolver}->{no_resolver};
+  $self->_start_lookup(@_);
+}
+
+# Internal use not deprecated. :-)
+sub _start_lookup {
  my ($self, $ent, $master_deadline) = @_;

  my $id  = $ent->{id};
@ -229,16 +465,16 @@ sub start_lookup {
  $ent->{timeout_initial} = $t_init;
  $ent->{timeout_min} = $t_end;

+  my @rulenames = grep { defined } (ref $ent->{rulename} ?
+                    @{$ent->{rulename}} : $ent->{rulename});
  $ent->{display_id} =  # identifies entry in debug logging and similar
-    join(", ", grep { defined }
-               map { ref $ent->{$_} ? @{$ent->{$_}} : $ent->{$_} }
-               qw(sets rules rulename type key) );
+    join(", ", grep { defined } map { $ent->{$_} } qw(type key));

  $self->{pending_lookups}->{$key} = $ent;

  $self->{queries_started}++;
-  $self->{total_queries_started}++;
-  dbg("async: starting: %s (timeout %.1fs, min %.1fs)%s",
+  dbg("async: starting: %s%s (timeout %.1fs, min %.1fs)%s",
+      @rulenames ? join(", ", @rulenames).", " : '',
      $ent->{display_id}, $ent->{timeout_initial}, $ent->{timeout_min},
      !$clipped_by_master_deadline ? '' : ', capped by time limit');

@ -247,135 +483,15 @@ sub start_lookup {

 # ---------------------------------------------------------------------------

-=item $ent = $async->bgsend_and_start_lookup($domain, $type, $class, $ent, $cb, %options)
-
-A common idiom: calls C<bgsend>, followed by a call to C<start_lookup>,
-returning the argument $ent object as modified by C<start_lookup> and
-filled-in with a query ID.
-
-=cut
-
-sub bgsend_and_start_lookup {
-  my($self, $domain, $type, $class, $ent, $cb, %options) = @_;
-  $ent = {}  if !$ent;
-  $domain =~ s/\.+\z//s;  # strip trailing dots, these sometimes still sneak in
-  $ent->{id} = undef;
-  $ent->{query_type} = $type;
-  $ent->{query_domain} = $domain;
-  $ent->{type} = $type  if !exists $ent->{type};
-  $cb = $ent->{completed_callback}  if !$cb;  # compatibility with SA < 3.4
-
-  my $key = $ent->{key} || '';
-
-  my $dnskey = uc($type) . '/' . lc($domain);
-  my $dns_query_info = $self->{all_lookups}{$dnskey};
-
-  if ($dns_query_info) {  # DNS query already underway or completed
-    my $id = $ent->{id} = $dns_query_info->{id};  # re-use existing query
-    return if !defined $id;  # presumably blocked, or other fatal failure
-    my $id_tail = $id; $id_tail =~ s{^\d+/IN/}{};
-    lc($id_tail) eq lc($dnskey)
-      or info("async: unmatched id %s, key=%s", $id, $dnskey);
-
-    my $pkt = $dns_query_info->{pkt};
-    if (!$pkt) {  # DNS query underway, still waiting for results
-      # just add our query to the existing one
-      push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
-      dbg("async: query %s already underway, adding no.%d %s",
-          $id, scalar @{$dns_query_info->{applicants}},
-          $ent->{rulename} || $key);
-
-    } else {  # DNS query already completed, re-use results
-      # answer already known, just do the callback and be done with it
-      if (!$cb) {
-        dbg("async: query %s already done, re-using for %s", $id, $key);
-      } else {
-        dbg("async: query %s already done, re-using for %s, callback",
-            $id, $key);
-        eval {
-          $cb->($ent, $pkt); 1;
-        } or do {
-          chomp $@;
-          # resignal if alarm went off
-          die "async: (1) $@\n"  if $@ =~ /__alarm__ignore__\(.*\)/s;
-          warn sprintf("query %s completed, callback %s failed: %s\n",
-                       $id, $key, $@);
-        };
-      }
-    }
-  }
-
-  else {  # no existing query, open a new DNS query
-    $dns_query_info = $self->{all_lookups}{$dnskey} = {};  # new query needed
-    my($id, $blocked);
-    my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
-    if ($dns_query_blockages) {
-      my $search_list = domain_to_search_list($domain);
-      foreach my $parent_domain (@$search_list) {
-        $blocked = $dns_query_blockages->{$parent_domain};
-        last if defined $blocked; # stop at first defined, can be true or false
-      }
-    }
-    if ($blocked) {
-      dbg("async: blocked by dns_query_restriction: %s", $dnskey);
-    } else {
-      dbg("async: launching %s for %s", $dnskey, $key);
-      $id = $self->{main}->{resolver}->bgsend($domain, $type, $class, sub {
-          my($pkt, $pkt_id, $timestamp) = @_;
-          # this callback sub is called from DnsResolver::poll_responses()
-        # dbg("async: in a bgsend_and_start_lookup callback, id %s", $pkt_id);
-          if ($pkt_id ne $id) {
-            warn "async: mismatched dns id: got $pkt_id, expected $id\n";
-            return;
-          }
-          $self->set_response_packet($pkt_id, $pkt, $ent->{key}, $timestamp);
-          $dns_query_info->{pkt} = $pkt;
-          my $cb_count = 0;
-          foreach my $tuple (@{$dns_query_info->{applicants}}) {
-            my($appl_ent, $appl_cb) = @$tuple;
-            if ($appl_cb) {
-              dbg("async: calling callback on key %s%s", $key,
-                  !defined $appl_ent->{rulename} ? ''
-                    : ", rule ".$appl_ent->{rulename});
-              $cb_count++;
-              eval {
-                $appl_cb->($appl_ent, $pkt); 1;
-              } or do {
-                chomp $@;
-                # resignal if alarm went off
-                die "async: (2) $@\n"  if $@ =~ /__alarm__ignore__\(.*\)/s;
-                warn sprintf("query %s completed, callback %s failed: %s\n",
-                             $id, $appl_ent->{key}, $@);
-              };
-            }
-          }
-          delete $dns_query_info->{applicants};
-          dbg("async: query $id completed, no callbacks run")  if !$cb_count;
-        });
-    }
-    return if !defined $id;
-    $dns_query_info->{id} = $ent->{id} = $id;
-    push(@{$dns_query_info->{applicants}}, [$ent,$cb]);
-    $self->start_lookup($ent, $options{master_deadline});
-  }
-  return $ent;
-}
-
-# ---------------------------------------------------------------------------
-
 =item $ent = $async->get_lookup($key)

-Retrieve the pending-lookup object for the given key C<$key>.
-
-If the lookup is complete, this will return C<undef>.
-
-Note that a lookup is still considered "pending" until C<complete_lookups()> is
-called, even if it has been reported as complete via C<set_response_packet()>.
+DEPRECATED since 4.0.0. Do not use.

 =cut

 sub get_lookup {
  my ($self, $key) = @_;
+  warn("async: deprecated get_lookup function used\n");
  return $self->{pending_lookups}->{$key};
 }

@ -415,18 +531,16 @@ sub complete_lookups {
  my %typecount;

  my $pending = $self->{pending_lookups};
-  $self->{queries_started} = 0;
-  $self->{queries_completed} = 0;

  my $now = time;

  if (defined $timeout && $timeout > 0 &&
-      %$pending && $self->{total_queries_started} > 0)
+      %$pending && $self->{queries_started} > 0)
  {
    # shrink a 'select' timeout if a caller specified unnecessarily long
    # value beyond the latest deadline of any outstanding request;
    # can save needless wait time (up to 1 second in harvest_dnsbl_queries)
-    my $r = $self->{total_queries_completed} / $self->{total_queries_started};
+    my $r = $self->{queries_completed} / $self->{queries_started};
    my $r2 = $r * $r;  # 0..1
    my $max_deadline;
    while (my($key,$ent) = each %$pending) {
@ -457,9 +571,9 @@ sub complete_lookups {

    if (%$pending) {  # any outstanding requests still?
      $self->{last_poll_responses_time} = $now;
-      my $nfound = $self->{main}->{resolver}->poll_responses($timeout);
-      dbg("async: select found %s responses ready (t.o.=%.1f)",
-          !$nfound ? 'no' : $nfound,  $timeout);
+      my ($nfound, $ncb) = $self->{main}->{resolver}->poll_responses($timeout);
+      dbg("async: select found %d responses ready (t.o.=%.1f), did %d callbacks",
+          $nfound, $timeout, $ncb);
    }
    $now = time;  # capture new timestamp, after possible sleep in 'select'

@ -475,18 +589,19 @@ sub complete_lookups {
        $anydone = 1;
        $ent->{finish_time} = $now  if !defined $ent->{finish_time};
        my $elapsed = $ent->{finish_time} - $ent->{start_time};
-        dbg("async: completed in %.3f s: %s", $elapsed, $ent->{display_id});
-        $self->{timing_by_query}->{". $key"} += $elapsed;
+        my @rulenames = keys %{$self->{rules_for_key}->{$key}};
+        dbg("async: completed in %.3f s: %s, rules: %s",
+            $elapsed, $ent->{display_id}, join(", ", @rulenames));
+        $self->{timing_by_query}->{". $key ($ent->{type})"} += $elapsed;
        $self->{queries_completed}++;
-        $self->{total_queries_completed}++;
        delete $pending->{$key};
      }
    }

    if (%$pending) {  # still any requests outstanding? are they expired?
      my $r =
-        !$allow_aborting_of_expired || !$self->{total_queries_started} ? 1.0
-        : $self->{total_queries_completed} / $self->{total_queries_started};
+        !$allow_aborting_of_expired || !$self->{queries_started} ? 1.0
+        : $self->{queries_completed} / $self->{queries_started};
      my $r2 = $r * $r;  # 0..1
      while (my($key,$ent) = each %$pending) {
        $typecount{$ent->{type}}++;
@ -496,8 +611,6 @@ sub complete_lookups {
        $dt = 1 + int $dt  if $timer_resolution == 1 && $dt > int $dt;
        $allexpired = 0  if $now <= $ent->{start_time} + $dt;
      }
-      dbg("async: queries completed: %d, started: %d",
-          $self->{queries_completed}, $self->{queries_started});
    }

    # ensure we don't get stuck if a request gets lost in the ether.
@ -511,9 +624,9 @@ sub complete_lookups {
      $alldone = 1;
    }
    else {
-      dbg("async: queries active: %s%s at %s",
+      dbg("async: queries still pending: %s%s",
          join (' ', map { "$_=$typecount{$_}" } sort keys %typecount),
-          $allexpired ? ', all expired' : '', scalar(localtime(time)));
+          $allexpired ? ', all expired' : '');
      $alldone = 0;
    }
    1;
@ -544,15 +657,20 @@ sub abort_remaining_lookups {
  my $foundcnt = 0;
  my $now = time;

+  $self->{pending_rules} = {};
+
  while (my($key,$ent) = each %$pending) {
-    dbg("async: aborting after %.3f s, %s: %s",
-        $now - $ent->{start_time},
+    my $dur = $now - $ent->{start_time};
+    my @rulenames = keys %{$self->{rules_for_key}->{$key}};
+    my $msg = sprintf( "async: aborting after %.3f s, %s: %s, rules: %s",
+        $dur,
        (defined $ent->{timeout_initial} &&
         $now > $ent->{start_time} + $ent->{timeout_initial}
           ? 'past original deadline' : 'deadline shrunk'),
-        $ent->{display_id} );
+        $ent->{display_id}, join(", ", @rulenames) );
+    $dur > 1 ? info($msg) : dbg($msg);
    $foundcnt++;
-    $self->{timing_by_query}->{"X $key"} = $now - $ent->{start_time};
+    $self->{timing_by_query}->{"X $key"} = $dur;
    $ent->{finish_time} = $now  if !defined $ent->{finish_time};
    delete $pending->{$key};
  }
@ -566,8 +684,10 @@ sub abort_remaining_lookups {
    foreach my $tuple (@{$dns_query_info->{applicants}}) {
      my($ent, $cb) = @$tuple;
      if ($cb) {
-        dbg("async: calling callback/abort on key %s%s", $dnskey,
-            !defined $ent->{rulename} ? '' : ", rule ".$ent->{rulename});
+        my @rulenames = grep { defined } (ref $ent->{rulename} ?
+                  @{$ent->{rulename}} : $ent->{rulename});
+        dbg("async: calling callback/abort on key %s, rules: %s", $dnskey,
+            join(", ", @rulenames));
        $cb_count++;
        eval {
          $cb->($ent, undef); 1;
@ -575,7 +695,7 @@ sub abort_remaining_lookups {
          chomp $@;
          # resignal if alarm went off
          die "async: (2) $@\n"  if $@ =~ /__alarm__ignore__\(.*\)/s;
-          warn sprintf("query %s aborted, callback %s failed: %s\n",
+          warn sprintf("async: query %s aborted, callback %s failed: %s\n",
                       $dnskey, $ent->{key}, $@);
        };
      }
@ -594,6 +714,8 @@ sub abort_remaining_lookups {

 =item $async->set_response_packet($id, $pkt, $key, $timestamp)

+For internal use, do not call from plugins.
+
 Register a "response packet" for a given query.  C<$id> is the ID for the
 query, and must match the C<id> supplied in C<start_lookup()>. C<$pkt> is the
 packet object for the response. A parameter C<$key> identifies an entry in a
@ -645,6 +767,8 @@ sub set_response_packet {

 =item $async->report_id_complete($id,$key,$key,$timestamp)

+DEPRECATED since 4.0.0. Do not use.
+
 Legacy. Equivalent to $self->set_response_packet($id,undef,$key,$timestamp),
 i.e. providing undef as a response packet. Register that a query has
 completed and is no longer "pending". C<$id> is the ID for the query,
--- a/upstream/lib/Mail/SpamAssassin/AutoWelcomelist.pm
+++ b/upstream/lib/Mail/SpamAssassin/AutoWelcomelist.pm
@ -17,7 +17,7 @@

 =head1 NAME

-Mail::SpamAssassin::AutoWhitelist - auto-whitelist handler for SpamAssassin
+Mail::SpamAssassin::AutoWelcomelist - auto-welcomelist handler for SpamAssassin

 =head1 SYNOPSIS

@ -27,10 +27,10 @@ Mail::SpamAssassin::AutoWhitelist - auto-whitelist handler for SpamAssassin
 =head1 DESCRIPTION

 Mail::SpamAssassin is a module to identify spam using text analysis and
-several internet-based realtime blacklists.
+several internet-based realtime blocklists.

 This class is used internally by SpamAssassin to manage the automatic
-whitelisting functionality.  Please refer to the C<Mail::SpamAssassin>
+welcomelisting functionality.  Please refer to the C<Mail::SpamAssassin>
 documentation for public interfaces.

 =head1 METHODS
@ -39,7 +39,7 @@ documentation for public interfaces.

 =cut

-package Mail::SpamAssassin::AutoWhitelist;
+package Mail::SpamAssassin::AutoWelcomelist;

 use strict;
 use warnings;
@ -64,9 +64,9 @@ sub new {
  my $conf = $main->{conf};
  my $self = {
    main          => $main,
-    factor        => $conf->{auto_whitelist_factor},
-    ipv4_mask_len => $conf->{auto_whitelist_ipv4_mask_len},
-    ipv6_mask_len => $conf->{auto_whitelist_ipv6_mask_len},
+    factor        => $conf->{auto_welcomelist_factor},
+    ipv4_mask_len => $conf->{auto_welcomelist_ipv4_mask_len},
+    ipv6_mask_len => $conf->{auto_welcomelist_ipv6_mask_len},
  };

  my $factory;
@ -74,23 +74,22 @@ sub new {
    $factory = $main->{pers_addr_list_factory};
  }
  else {
-    my $type = $conf->{auto_whitelist_factory};
+    my $type = $conf->{auto_welcomelist_factory};
    if ($type =~ /^([_A-Za-z0-9:]+)$/) {
      $type = untaint_var($type);
      eval '
  	require '.$type.';
        $factory = '.$type.'->new();
        1;
-           '
-      or do {
+      ' or do {
 	my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-	warn "auto-whitelist: $eval_stat\n";
+	warn "auto-welcomelist: $eval_stat\n";
 	undef $factory;
      };
      $main->set_persistent_address_list_factory($factory) if $factory;
    }
    else {
-      warn "auto-whitelist: illegal auto_whitelist_factory setting\n";
+      warn "auto-welcomelist: illegal auto_welcomelist_factory setting\n";
    }
  }

@ -136,7 +135,7 @@ sub check_address {
      my $noipent = $self->{checker}->get_addr_entry ($noipaddr, undef);

      if (defined $noipent->{msgcount} && $noipent->{msgcount} > 0) {
-	dbg("auto-whitelist: found entry w/o IP address for $addr: replacing with $origip");
+	dbg("auto-welcomelist: found entry w/o IP address for $addr: replacing with $origip");
 	$self->{checker}->remove_entry($noipent);
        # Now assign proper entry the count and totscore values of the
        # no-IP entry instead of assigning the whole value to avoid
@ -151,7 +150,7 @@ sub check_address {
      $entry->{msgcount} != $entry->{msgcount} ||  # test for NaN
      $entry->{totscore} != $entry->{totscore})
  {
-    warn "auto-whitelist: resetting bad data for ($addr, $origip), ".
+    warn "auto-welcomelist: resetting bad data for ($addr, $origip), ".
         "count: $entry->{msgcount}, totscore: $entry->{totscore}\n";
    $entry->{msgcount} = $entry->{totscore} = 0;
  }
@ -164,7 +163,7 @@ sub check_address {
 =item awl->count();

 This method will return the count of messages used in determining the
-whitelist correction.
+welcomelist correction.

 =cut

@ -191,7 +190,7 @@ sub add_score {
    return;		# no factory defined; we can't check
  }
  if ($score != $score) {
-    warn "auto-whitelist: attempt to add a $score to AWL entry ignored\n";
+    warn "auto-welcomelist: attempt to add a $score to AWL entry ignored\n";
    return;		# don't try to add a NaN
  }

@ -204,7 +203,7 @@ sub add_score {
 =item awl->add_known_good_address($addr);

 This method will add a score of -100 to the given address -- effectively
-"bootstrapping" the address as being one that should be whitelisted.
+"bootstrapping" the address as being one that should be welcomelisted.

 =cut

@ -220,7 +219,7 @@ sub add_known_good_address {
 =item awl->add_known_bad_address($addr);

 This method will add a score of 100 to the given address -- effectively
-"bootstrapping" the address as being one that should be blacklisted.
+"bootstrapping" the address as being one that should be blocklisted.

 =cut

@ -295,13 +294,13 @@ sub ip_to_awl_key {
    } else {
      my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
      if (!defined $origip_obj) {  # invalid IPv4 address
-        dbg("auto-whitelist: bad IPv4 address $origip");
+        dbg("auto-welcomelist: bad IPv4 address $origip");
      } else {
        $result = $origip_obj->network->addr;
        $result =~s/(\.0){1,3}\z//;  # truncate zero tail
      }
    }
-  } elsif ($origip =~ /:/ &&  # triage
+  } elsif (index($origip, ':') >= 0 &&  # triage
           $origip =~
           /^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
    # looks like an IPv6 address
@ -309,19 +308,19 @@ sub ip_to_awl_key {
    $mask_len = 48  if !defined $mask_len;
    my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
    if (!defined $origip_obj) {  # invalid IPv6 address
-      dbg("auto-whitelist: bad IPv6 address $origip");
+      dbg("auto-welcomelist: bad IPv6 address $origip");
    } elsif (NetAddr::IP->can('full6')) {  # since NetAddr::IP 4.010
      $result = $origip_obj->network->full6;  # string in a canonical form
      $result =~ s/(:0000){1,7}\z/::/;        # compress zero tail
    }
  } else {
-    dbg("auto-whitelist: bad IP address $origip");
+    dbg("auto-welcomelist: bad IP address $origip");
  }
  if (defined $result && length($result) > 39) {  # just in case, keep under
    $result = substr($result,0,39);               # the awl.ip field size
  }
  if (defined $result) {
-    dbg("auto-whitelist: IP masking %s -> %s", $origip,$result);
+    dbg("auto-welcomelist: IP masking %s -> %s", $origip,$result);
  }
  return $result;
 }
--- a/upstream/lib/Mail/SpamAssassin/Bayes.pm
+++ b/upstream/lib/Mail/SpamAssassin/Bayes.pm
@ -111,7 +111,7 @@ sub learn {
  {
    # DMK, koppel@ece.lsu.edu:  Hoping that the ultimate fix to bug 2263 will
    # make it unnecessary to construct a PerMsgStatus here.
-    my $PMS = new Mail::SpamAssassin::PerMsgStatus $self->{main}, $msg;
+    my $PMS = Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $msg);
    my $ignore = $self->ignore_message($PMS);
    $PMS->finish();
    return 0 if $ignore;
--- a/upstream/lib/Mail/SpamAssassin/BayesStore/BDB.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/BDB.pm
@ -35,14 +35,11 @@ use Errno qw(EBADF);
 #use Data::Dumper;
 use File::Basename;
 use File::Path;
-
-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
-}
+use Digest::SHA qw(sha1);

 use Mail::SpamAssassin::BayesStore;
 use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Util qw(compile_regexp);

 our @ISA = qw( Mail::SpamAssassin::BayesStore );

@ -77,6 +74,7 @@ sub new {

 sub DESTROY {
  my $self = shift;
+
  $self->_close_db;
 }

@ -650,6 +648,14 @@ sub dump_tokens {
  my($self, $template, $regex, @vars) = @_;
  dbg("bayes: dump_tokens starting");

+  if (defined $regex) {
+    my ($rec, $err) = compile_regexp($regex, 2);
+    if (!$rec) {
+      die "Invalid dump_tokens regex '$regex': $err\n";
+    }
+    $regex = $rec;
+  }
+
  my $cursor = $self->{handles}->{tokens}->db_cursor;
  $cursor or die "Couldn't get cursor: $BerkeleyDB::Error";
  my ($token, $value) = ("", "");
--- a/upstream/lib/Mail/SpamAssassin/BayesStore/DBM.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/DBM.pm
@ -27,14 +27,10 @@ use Errno qw(EBADF);
 use File::Basename;
 use File::Spec;
 use File::Path;
-
-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
-}
+use Digest::SHA qw(sha1);

 use Mail::SpamAssassin;
-use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
+use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows compile_regexp);
 use Mail::SpamAssassin::BayesStore;
 use Mail::SpamAssassin::Logger;

@ -992,9 +988,17 @@ sub get_storage_variables {
 sub dump_db_toks {
  my ($self, $template, $regex, @vars) = @_;

+  if (defined $regex) {
+    my ($rec, $err) = compile_regexp($regex, 2);
+    if (!$rec) {
+      die "Invalid dump_tokens regex '$regex': $err\n";
+    }
+    $regex = $rec;
+  }
+
  while (my ($tok, $tokvalue) = each %{$self->{db_toks}}) {
    next if ($tok =~ MAGIC_RE); # skip magic tokens
-    next if (defined $regex && ($tok !~ /$regex/o));
+    next if (defined $regex && $tok !~ /$regex/o);

    # We have the value already, so just unpack it.
    my ($ts, $th, $atime) = $self->tok_unpack ($tokvalue);
--- a/upstream/lib/Mail/SpamAssassin/BayesStore/MySQL.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/MySQL.pm
@ -30,6 +30,9 @@ In addition, this module will support rollback on error, if you are
 using the InnoDB database table type in MySQL.  For more information
 please review the instructions in sql/README.bayes.

+This module is also compatible with MariaDB and DBD::MariaDB can be used
+instead of DBD::mysql driver.
+
 =cut

 package Mail::SpamAssassin::BayesStore::MySQL;
@ -292,7 +295,8 @@ sub set_running_expire_tok {

  return 0 unless (defined($self->{_dbh}));

-  my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)";
+  my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)
+             ON DUPLICATE KEY UPDATE runtime=VALUES(runtime)";

  my $time = time();

@ -339,6 +343,147 @@ sub remove_running_expire_tok {
  return 1;
 }

+=head2 tok_get
+
+public instance (Integer, Integer, Integer) tok_get (String $token)
+
+Description:
+This method retrieves a specified token (C<$token>) from the database
+and returns it's spam_count, ham_count and last access time.
+
+=cut
+
+sub tok_get {
+  my ($self, $token) = @_;
+
+  return (0,0,0) unless (defined($self->{_dbh}));
+
+  my $sql = "SELECT spam_count, ham_count, atime
+               FROM bayes_token
+              WHERE id = ?
+                AND token = ?";
+
+  my $sth = $self->{_dbh}->prepare_cached($sql);
+
+  unless (defined($sth)) {
+    dbg("bayes: tok_get: SQL error: ".$self->{_dbh}->errstr());
+    $self->{_dbh}->rollback();
+    return (0,0,0);
+  }
+
+  $sth->bind_param(1, $self->{_userid});
+  $sth->bind_param(2, $token, DBI::SQL_BINARY);
+
+  my $rc = $sth->execute();
+
+  unless ($rc) {
+    dbg("bayes: tok_get: SQL error: ".$self->{_dbh}->errstr());
+    $self->{_dbh}->rollback();
+    return (0,0,0);
+  }
+
+  my ($spam_count, $ham_count, $atime) = $sth->fetchrow_array();
+
+  $sth->finish();
+
+  $spam_count = 0 if (!$spam_count || $spam_count < 0);
+  $ham_count = 0 if (!$ham_count || $ham_count < 0);
+  $atime = 0 if (!$atime);
+
+  return ($spam_count, $ham_count, $atime)
+}
+
+=head2 tok_get_all
+
+public instance (\@) tok_get (@ $tokens)
+
+Description:
+This method retrieves the specified tokens (C<$tokens>) from storage and returns
+an array ref of arrays spam count, ham count and last access time.
+
+=cut
+
+sub tok_get_all {
+  my ($self, @tokens) = @_;
+
+  return [] unless (defined($self->{_dbh}));
+
+  my $token_list_size = scalar(@tokens);
+  dbg("bayes: tok_get_all: token count: $token_list_size");
+  my @tok_results;
+
+  my $search_index = 0;
+  my $results_index = 0;
+  my $bunch_end;
+
+  my $token_select = $self->_token_select_string();
+
+  my $multi_sql = "SELECT $token_select, spam_count, ham_count, atime
+                     FROM bayes_token
+                    WHERE id = ?
+                      AND token IN ";
+
+  # fetch tokens in bunches of 100 until there are <= 100 left, then just fetch the rest
+  while ($token_list_size > $search_index) {
+    my $bunch_size;
+    if ($token_list_size - $search_index > 100) {
+      $bunch_size = 100;
+    }
+    else {
+      $bunch_size = $token_list_size - $search_index;
+    }
+    while ($token_list_size - $search_index >= $bunch_size) {
+      my @tok;
+      my $in_str = '(';
+
+      $bunch_end = $search_index + $bunch_size;
+      for ( ; $search_index < $bunch_end; $search_index++) {
+	$in_str .= '?,';
+	push(@tok, $tokens[$search_index]);
+      }
+      chop $in_str;
+      $in_str .= ')';
+
+      my $dynamic_sql = $multi_sql . $in_str;
+
+      my $sth = $self->{_dbh}->prepare($dynamic_sql);
+
+      unless (defined($sth)) {
+	dbg("bayes: tok_get_all: SQL error: ".$self->{_dbh}->errstr());
+	$self->{_dbh}->rollback();
+	return [];
+      }
+
+      my $idx = 0;
+      $sth->bind_param(++$idx, $self->{_userid});
+      $sth->bind_param(++$idx, $_, DBI::SQL_BINARY) foreach (@tok);
+
+      my $rc = $sth->execute();
+
+      unless ($rc) {
+	dbg("bayes: tok_get_all: SQL error: ".$self->{_dbh}->errstr());
+	$self->{_dbh}->rollback();
+	return [];
+      }
+
+      my $results = $sth->fetchall_arrayref();
+
+      $sth->finish();
+
+      foreach my $result (@{$results}) {
+	# Make sure that spam_count and ham_count are not negative
+	$result->[1] = 0 if (!$result->[1] || $result->[1] < 0);
+	$result->[2] = 0 if (!$result->[2] || $result->[2] < 0);
+	# Make sure that atime has a value
+	$result->[3] = 0 if (!$result->[3]);
+	$tok_results[$results_index++] = $result;
+      }
+    }
+  }
+
+  return \@tok_results;
+}
+
 =head2 nspam_nham_change

 public instance (Boolean) nspam_nham_change (Integer $num_spam,
@ -421,10 +566,22 @@ sub tok_touch {
                AND token = ?
                AND atime < ?";

-  my $rows = $self->{_dbh}->do($sql, undef, $atime, $self->{_userid},
-			       $token, $atime);
+  my $sth = $self->{_dbh}->prepare_cached($sql);

-  unless (defined($rows)) {
+  unless (defined($sth)) {
+    dbg("bayes: tok_touch: SQL error: ".$self->{_dbh}->errstr());
+    $self->{_dbh}->rollback();
+    return 0;
+  }
+
+  $sth->bind_param(1, $atime);
+  $sth->bind_param(2, $self->{_userid});
+  $sth->bind_param(3, $token, DBI::SQL_BINARY);
+  $sth->bind_param(4, $atime);
+
+  my $rows = $sth->execute();
+
+  unless ($rows) {
    dbg("bayes: tok_touch: SQL error: ".$self->{_dbh}->errstr());
    $self->{_dbh}->rollback();
    return 0;
@ -478,20 +635,29 @@ sub tok_touch_all {
  return 1 unless (scalar(@{$tokens}));

  my $sql = "UPDATE bayes_token SET atime = ? WHERE id = ? AND token IN (";
-
-  my @bindings = ($atime, $self->{_userid});
-  foreach my $token (@{$tokens}) {
+  foreach (@{$tokens}) {
    $sql .= "?,";
-    push(@bindings, $token);
  }
  chop($sql); # get rid of trailing ,
-
  $sql .= ") AND atime < ?";
-  push(@bindings, $atime);

-  my $rows = $self->{_dbh}->do($sql, undef, @bindings);
+  my $sth = $self->{_dbh}->prepare($sql);

-  unless (defined($rows)) {
+  unless (defined($sth)) {
+    dbg("bayes: tok_touch_all: SQL error: ".$self->{_dbh}->errstr());
+    $self->{_dbh}->rollback();
+    return [];
+  }
+
+  my $idx = 0;
+  $sth->bind_param(++$idx, $atime);
+  $sth->bind_param(++$idx, $self->{_userid});
+  $sth->bind_param(++$idx, $_, DBI::SQL_BINARY) foreach (@{$tokens});
+  $sth->bind_param(++$idx, $atime);
+
+  my $rows = $sth->execute();
+
+  unless ($rows) {
    dbg("bayes: tok_touch_all: SQL error: ".$self->{_dbh}->errstr());
    $self->{_dbh}->rollback();
    return 0;
@ -735,7 +901,8 @@ sub _initialize_db {
    return 0;
  }

-  $id = $self->{_dbh}->{'mysql_insertid'};
+  $id = $self->{_dsn} =~ /^DBI:MariaDB/i ?
+    $self->{_dbh}->{'mariadb_insertid'} : $self->{_dbh}->{'mysql_insertid'};

  $self->{_dbh}->commit();

@ -797,10 +964,12 @@ sub _put_token {
      return 0;
    }

-    my $rc = $sth->execute($spam_count,
-			   $ham_count,
-			   $self->{_userid},
-			   $token);
+    $sth->bind_param(1, $spam_count);
+    $sth->bind_param(2, $ham_count);
+    $sth->bind_param(3, $self->{_userid});
+    $sth->bind_param(4, $token, DBI::SQL_BINARY);
+
+    my $rc = $sth->execute();

    unless ($rc) {
      dbg("bayes: _put_token: SQL error: ".$self->{_dbh}->errstr());
@ -824,14 +993,16 @@ sub _put_token {
      return 0;
    }

-    my $rc = $sth->execute($self->{_userid},
-			   $token,
-			   $spam_count,
-			   $ham_count,
-			   $atime,
-			   $spam_count,
-			   $ham_count,
-			   $atime);
+    $sth->bind_param(1, $self->{_userid});
+    $sth->bind_param(2, $token, DBI::SQL_BINARY);
+    $sth->bind_param(3, $spam_count);
+    $sth->bind_param(4, $ham_count);
+    $sth->bind_param(5, $atime);
+    $sth->bind_param(6, $spam_count);
+    $sth->bind_param(7, $ham_count);
+    $sth->bind_param(8, $atime);
+
+    my $rc = $sth->execute();

    unless ($rc) {
      dbg("bayes: _put_token: SQL error: ".$self->{_dbh}->errstr());
@ -948,12 +1119,15 @@ sub _put_tokens {
      return 0;
    }

+    $sth->bind_param(1, $spam_count);
+    $sth->bind_param(2, $ham_count);
+    $sth->bind_param(3, $self->{_userid});
+    # 4, update token in foreach loop
+
    my $error_p = 0;
    foreach my $token (keys %{$tokens}) {
-      my $rc = $sth->execute($spam_count,
-			     $ham_count,
-			     $self->{_userid},
-			     $token);
+      $sth->bind_param(4, $token, DBI::SQL_BINARY);
+      my $rc = $sth->execute();

      unless ($rc) {
 	dbg("bayes: _put_tokens: SQL error: ".$self->{_dbh}->errstr());
@ -984,18 +1158,21 @@ sub _put_tokens {
      return 0;
    }

+    $sth->bind_param(1, $self->{_userid});
+    # 2, update token in foreach loop
+    $sth->bind_param(3, $spam_count);
+    $sth->bind_param(4, $ham_count);
+    $sth->bind_param(5, $atime);
+    $sth->bind_param(6, $spam_count);
+    $sth->bind_param(7, $ham_count);
+    $sth->bind_param(8, $atime);
+
    my $error_p = 0;
    my $new_tokens = 0;
    my $need_atime_update_p = 0;
    foreach my $token (keys %{$tokens}) {
-      my $rc = $sth->execute($self->{_userid},
-			     $token,
-			     $spam_count,
-			     $ham_count,
-			     $atime,
-			     $spam_count,
-			     $ham_count,
-			     $atime);
+      $sth->bind_param(2, $token, DBI::SQL_BINARY);
+      my $rc = $sth->execute();

      if (!$rc) {
 	dbg("bayes: _put_tokens: SQL error: ".$self->{_dbh}->errstr());
@ -1070,6 +1247,22 @@ sub _put_tokens {
  return 1;
 }

+=head2 _token_select_string
+
+private instance (String) _token_select_string
+
+Description:
+This method returns the string to be used in SELECT statements to represent
+the token column.
+
+The default is to use the RPAD function to pad the token out to 5 characters.
+
+=cut
+
+sub _token_select_string {
+  return "RPAD(token, 5, ' ')";
+}
+
 sub sa_die { Mail::SpamAssassin::sa_die(@_); }

 1;
--- a/upstream/lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
@ -51,7 +51,7 @@ use constant HAS_DBI => eval { require DBI; };

 # We need this so we can import the pg_types, since this is a DBD::Pg specific module it should be ok
 # YUCK! This little require/import trick is required for the rpm stuff
-BEGIN { require DBD::Pg; import DBD::Pg qw(:pg_types); }
+BEGIN { require DBD::Pg; DBD::Pg->import(qw(:pg_types)); }

 =head1 METHODS

--- a/upstream/lib/Mail/SpamAssassin/BayesStore/Redis.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/Redis.pm
@ -119,16 +119,12 @@ use warnings;
 # use bytes;
 use re 'taint';
 use Errno qw(EBADF);
-use Mail::SpamAssassin::Util qw(untaint_var);
-use Mail::SpamAssassin::Timeout;
+use Digest::SHA qw(sha1);

-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
-}
-
-use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::BayesStore;
+use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Timeout;
+use Mail::SpamAssassin::Util qw(untaint_var);
 use Mail::SpamAssassin::Util::TinyRedis;

 our $VERSION = 0.09;
--- a/upstream/lib/Mail/SpamAssassin/BayesStore/SQL.pm
+++ b/upstream/lib/Mail/SpamAssassin/BayesStore/SQL.pm
@ -21,7 +21,11 @@ Mail::SpamAssassin::BayesStore::SQL - SQL Bayesian Storage Module Implementation

 =head1 DESCRIPTION

-This module implements a SQL based bayesian storage module.
+This module implements a SQL based bayesian storage module.  It's compatible
+with SQLite and possibly other standard SQL servers.
+
+Do not use this for MySQL/MariaDB or PgSQL, they have their own specific
+modules.

 =cut

@ -32,11 +36,7 @@ use warnings;
 # use bytes;
 use re 'taint';
 use Errno qw(EBADF);
-
-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
-}
+use Digest::SHA qw(sha1);

 use Mail::SpamAssassin::BayesStore;
 use Mail::SpamAssassin::Logger;
@ -1676,6 +1676,13 @@ sub _connect_db {
    dbg("bayes: database connection established");
  }

+  # SQLite PRAGMA attributes - here for tests, see bug 8033
+  if ($self->{_dsn} =~ /^dbi:SQLite:.*?;(.+)/i) {
+    foreach my $attr (split(/;/, $1)) {
+      $dbh->do("PRAGMA $attr");
+    }
+  }
+  
  $self->{_dbh} = $dbh;

 return 1;
@ -2348,12 +2355,13 @@ Description:
 This method returns the string to be used in SELECT statements to represent
 the token column.

-The default is to use the RPAD function to pad the token out to 5 characters.
+The default is to use the SUBSTR function to pad the token out to 5 characters.

 =cut

 sub _token_select_string {
-  return "RPAD(token, 5, ' ')";
+  # Use SQLite compatible RPAD alternative
+  return "SUBSTR(token || '     ', 1, 5)";
 }

 sub sa_die { Mail::SpamAssassin::sa_die(@_); }
--- a/upstream/lib/Mail/SpamAssassin/Client.pm
+++ b/upstream/lib/Mail/SpamAssassin/Client.pm
@ -106,6 +106,10 @@ sub new {
    $self->{username} = $args->{username};
  }

+  if ($args->{max_size}) {
+    $self->{max_size} = $args->{max_size};
+  }
+
  if ($args->{timeout}) {
    $self->{timeout} = $args->{timeout} || 30;
  }
@ -135,6 +139,8 @@ threshold

 message

+report
+
 =cut

 sub process {
@ -150,6 +156,41 @@ sub process {
  return $self->_filter($msg, $command);
 }

+=head2 spam_report
+
+public instance (\%) spam_report (String $msg)
+
+Description:
+The method implements the report call.
+
+See the process method for the return value.
+
+=cut
+
+sub spam_report {
+  my ($self, $msg) = @_;
+
+  return $self->_filter($msg, 'REPORT');
+}
+
+=head2 spam_report_ifspam
+
+public instance (\%) spam_report_ifspam (String $msg)
+
+Description:
+The method implements the report_ifspam call.
+A report will be returned only if the message is spam.
+
+See the process method for the return value.
+
+=cut
+
+sub spam_report_ifspam {
+  my ($self, $msg) = @_;
+
+  return $self->_filter($msg, 'REPORT_IFSPAM');
+}
+
 =head2 check

 public instance (\%) check (String $msg)
@ -270,10 +311,10 @@ sub learn {
  close $remote  or die "error closing socket: $!";

  if ($learntype == 0 || $learntype == 1) {
-    return $did_set =~ /local/;
+    return index($did_set, 'local') >= 0;
  }
  else { #safe since we've already checked the $learntype values
-    return $did_remove =~ /local/;
+    return index($did_remove, 'local') >= 0;
  }
 }

@ -534,12 +575,15 @@ threshold

 message (if available)

+report (if available)
+
 =cut

 sub _filter {
  my ($self, $msg, $command) = @_;

  my %data;
+  my $msgsize;

  $self->_clear_errors();

@ -547,7 +591,10 @@ sub _filter {

  return 0 unless ($remote);

-  my $msgsize = length($msg.$EOL);
+  if(defined $self->{max_size}) {
+    $msg = substr($msg,0,$self->{max_size});
+  }
+  $msgsize = length($msg.$EOL);

  print $remote "$command $PROTOVERSION$EOL";
  print $remote "Content-length: $msgsize$EOL";
@ -595,7 +642,11 @@ sub _filter {
    $!==EBADF ? dbg("error reading from spamd (10): $!")
              : die "error reading from spamd (10): $!";

+  if($command =~ /^REPORT/) {
+    $data{report} = $return_msg if ($return_msg);
+  } else {
    $data{message} = $return_msg if ($return_msg);
+  }

  close $remote  or die "error closing socket: $!";

--- a/upstream/lib/Mail/SpamAssassin/Conf.pm
+++ b/upstream/lib/Mail/SpamAssassin/Conf.pm
--- a/upstream/lib/Mail/SpamAssassin/Conf/LDAP.pm
+++ b/upstream/lib/Mail/SpamAssassin/Conf/LDAP.pm
@ -27,7 +27,7 @@ Mail::SpamAssassin::Conf::LDAP - load SpamAssassin scores from LDAP database
 =head1 DESCRIPTION

 Mail::SpamAssassin is a module to identify spam using text analysis and
-several internet-based realtime blacklists.
+several internet-based realtime blocklists.

 This class is used internally by SpamAssassin to load scores from an LDAP
 database.  Please refer to the C<Mail::SpamAssassin> documentation for public
@ -191,6 +191,9 @@ sub load_with_ldap {
  }
  if ($config_text ne '') {
    $conf->{main} = $main;
+    $config_text = "file start (ldap config)\n".
+                   $config_text.
+                   "file end (ldap config)\n";
    $conf->parse_scores_only($config_text);
    delete $conf->{main};
  }
--- a/upstream/lib/Mail/SpamAssassin/Conf/Parser.pm
+++ b/upstream/lib/Mail/SpamAssassin/Conf/Parser.pm
--- a/upstream/lib/Mail/SpamAssassin/Conf/SQL.pm
+++ b/upstream/lib/Mail/SpamAssassin/Conf/SQL.pm
@ -27,7 +27,7 @@ Mail::SpamAssassin::Conf::SQL - load SpamAssassin scores from SQL database
 =head1 DESCRIPTION

 Mail::SpamAssassin is a module to identify spam using text analysis and
-several internet-based realtime blacklists.
+several internet-based realtime blocklists.

 This class is used internally by SpamAssassin to load scores from an SQL
 database.  Please refer to the C<Mail::SpamAssassin> documentation for public
@ -166,6 +166,9 @@ sub load_with_dbi {
 	 }
 	 if ($config_text ne '') {
 	   $conf->{main} = $main;
+	   $config_text = "file start (sql config)\n".
+	                  $config_text.
+	                  "file end (sql config)\n";
 	   $conf->parse_scores_only($config_text);
 	   delete $conf->{main};
 	 }
--- a/upstream/lib/Mail/SpamAssassin/Constants.pm
+++ b/upstream/lib/Mail/SpamAssassin/Constants.pm
@ -34,13 +34,14 @@ our(@BAYES_VARS, @IP_VARS, @SA_VARS, %EXPORT_TAGS, @EXPORT_OK);
 BEGIN { 
  @IP_VARS = qw(
 	IP_IN_RESERVED_RANGE IP_PRIVATE LOCALHOST IPV4_ADDRESS IP_ADDRESS
+	IS_IP_PRIVATE IS_LOCALHOST IS_IPV4_ADDRESS IS_IP_ADDRESS
  );
  @BAYES_VARS = qw(
 	DUMP_MAGIC DUMP_TOKEN DUMP_BACKUP 
  );
  # These are generic constants that may be used across several modules
  @SA_VARS = qw(
-	HARVEST_DNSBL_PRIORITY MBX_SEPARATOR
+	MBX_SEPARATOR
 	MAX_BODY_LINE_LENGTH MAX_HEADER_KEY_LENGTH MAX_HEADER_VALUE_LENGTH
 	MAX_HEADER_LENGTH ARITH_EXPRESSION_LEXER AI_TIME_UNKNOWN
 	CHARSETS_LIKELY_TO_FP_AS_CAPS MAX_URI_LENGTH RULENAME_RE IS_RULENAME
@ -168,7 +169,10 @@ use constant IP_PRIVATE => qr{^(?:
    )
    (?![a-f0-9:])
  )
-)}oxi;
+)}xi;
+
+# exact match
+use constant IS_IP_PRIVATE => qr/^${\(IP_PRIVATE)}$/;

 # backward compatibility
 use constant IP_IN_RESERVED_RANGE => IP_PRIVATE;
@ -246,7 +250,10 @@ use constant LOCALHOST => qr/
 		      )
 		      (?![a-f0-9:])
 		    )
-		  /oxi;
+		  /xi;
+
+# exact match
+use constant IS_LOCALHOST => qr/^${\(LOCALHOST)}$/;

 # ---------------------------------------------------------------------------
 # an IP address, in IPv4 format only.
@ -256,11 +263,13 @@ use constant IPV4_ADDRESS => qr/\b
                    (?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
                    (?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.
                    (?:1\d\d|2[0-4]\d|25[0-5]|[1-9]\d|\d)
-                  \b/ox;
+                  \b/x;
+
+# exact match
+use constant IS_IPV4_ADDRESS => qr/^${\(IPV4_ADDRESS)}$/;

 # ---------------------------------------------------------------------------
-# an IP address, in IPv4, IPv4-mapped-in-IPv6, or IPv6 format.  NOTE: cannot
-# just refer to $IPV4_ADDRESS, due to perl bug reported in nesting qr//s. :(
+# an IP address, in IPv4, IPv4-mapped-in-IPv6, or IPv6 format.
 #
 use constant IP_ADDRESS => qr/
 		    (?:
@ -344,12 +353,13 @@ use constant IP_ADDRESS => qr/
 		      )
 		      (?![a-f0-9:])
 		    )
-		  /oxi;
+		  /xi;
+
+# exact match
+use constant IS_IP_ADDRESS => qr/^${\(IP_ADDRESS)}$/;

 # ---------------------------------------------------------------------------

-use constant HARVEST_DNSBL_PRIORITY =>  500;
-
 # regular expression that matches message separators in The University of
 # Washington's MBX mailbox format
 use constant MBX_SEPARATOR => qr/^([\s\d]\d-[a-zA-Z]{3}-\d{4}\s\d{2}:\d{2}:\d{2}.*),(\d+);([\da-f]{12})-(\w{8})\r?$/;
@ -388,7 +398,7 @@ use constant ARITH_EXPRESSION_LEXER => qr/(?:
        !=|                                     # NEQ
        [\+\-\*\/]|                             # Mathematical Operator
        [\?:]                                   # ? : Operator
-      )/ox;
+      )/x;

 # ArchiveIterator

--- a/upstream/lib/Mail/SpamAssassin/DBBasedAddrList.pm
+++ b/upstream/lib/Mail/SpamAssassin/DBBasedAddrList.pm
@ -53,23 +53,23 @@ sub new_checker {
    'locked_file'	=> ''
  };

-  my @order = split(/\s+/, $main->{conf}->{auto_whitelist_db_modules});
+  my @order = split(/\s+/, $main->{conf}->{auto_welcomelist_db_modules});
  untaint_var(\@order);
  my $dbm_module = Mail::SpamAssassin::Util::first_available_module (@order);
  if (!$dbm_module) {
-    die "auto-whitelist: cannot find a usable DB package from auto_whitelist_db_modules: " .
-	$main->{conf}->{auto_whitelist_db_modules}."\n";
+    die "auto-welcomelist: cannot find a usable DB package from auto_welcomelist_db_modules: " .
+	$main->{conf}->{auto_welcomelist_db_modules}."\n";
  }

-  my $umask = umask ~ (oct($main->{conf}->{auto_whitelist_file_mode}));
+  my $umask = umask ~ (oct($main->{conf}->{auto_welcomelist_file_mode}));

  # if undef then don't worry -- empty hash!
-  if (defined($main->{conf}->{auto_whitelist_path})) {
-    my $path = $main->sed_path($main->{conf}->{auto_whitelist_path});
+  if (defined($main->{conf}->{auto_welcomelist_path})) {
+    my $path = $main->sed_path($main->{conf}->{auto_welcomelist_path});
    my ($mod1, $mod2);

    if ($main->{locker}->safe_lock
-            ($path, 30, $main->{conf}->{auto_whitelist_file_mode}))
+            ($path, 30, $main->{conf}->{auto_welcomelist_file_mode}))
    {
      $self->{locked_file} = $path;
      $self->{is_locked}   = 1;
@ -80,20 +80,20 @@ sub new_checker {
      ($mod1, $mod2) = ('R/O', O_RDONLY);
    }

-    dbg("auto-whitelist: tie-ing to DB file of type $dbm_module $mod1 in $path");
+    dbg("auto-welcomelist: tie-ing to DB file of type $dbm_module $mod1 in $path");

    ($self->{is_locked} && $dbm_module eq 'DB_File') and
            Mail::SpamAssassin::Util::avoid_db_file_locking_bug($path);

    if (! tie %{ $self->{accum} }, $dbm_module, $path, $mod2,
-            oct($main->{conf}->{auto_whitelist_file_mode}) & 0666)
+            oct($main->{conf}->{auto_welcomelist_file_mode}) & 0666)
    {
      my $err = $!;   # might get overwritten later
      if ($self->{is_locked}) {
        $self->{main}->{locker}->safe_unlock($self->{locked_file});
        $self->{is_locked} = 0;
      }
-      die "auto-whitelist: cannot open auto_whitelist_path $path: $err\n";
+      die "auto-welcomelist: cannot open auto_welcomelist_path $path: $err\n";
    }
  }
  umask $umask;
@ -106,10 +106,10 @@ sub new_checker {

 sub finish {
  my $self = shift;
-  dbg("auto-whitelist: DB addr list: untie-ing and unlocking");
+  dbg("auto-welcomelist: DB addr list: untie-ing and unlocking");
  untie %{$self->{accum}};
  if ($self->{is_locked}) {
-    dbg("auto-whitelist: DB addr list: file locked, breaking lock");
+    dbg("auto-welcomelist: DB addr list: file locked, breaking lock");
    $self->{main}->{locker}->safe_unlock ($self->{locked_file});
    $self->{is_locked} = 0;
  }
@ -128,7 +128,7 @@ sub get_addr_entry {
  $entry->{msgcount} = $self->{accum}->{$addr} || 0;
  $entry->{totscore} = $self->{accum}->{$addr.'|totscore'} || 0;

-  dbg("auto-whitelist: db-based $addr scores ".$entry->{msgcount}.'/'.$entry->{totscore});
+  dbg("auto-welcomelist: db-based $addr scores ".$entry->{msgcount}.'/'.$entry->{totscore});
  return $entry;
 }

@ -143,7 +143,7 @@ sub add_score {
    $entry->{msgcount}++;
    $entry->{totscore} += $score;

-    dbg("auto-whitelist: add_score: new count: ".$entry->{msgcount}.", new totscore: ".$entry->{totscore});
+    dbg("auto-welcomelist: add_score: new count: ".$entry->{msgcount}.", new totscore: ".$entry->{totscore});

    $self->{accum}->{$entry->{addr}} = $entry->{msgcount};
    $self->{accum}->{$entry->{addr}.'|totscore'} = $entry->{totscore};
--- a/upstream/lib/Mail/SpamAssassin/Dns.pm
+++ b/upstream/lib/Mail/SpamAssassin/Dns.pm
@ -29,7 +29,7 @@ use Mail::SpamAssassin::Conf;
 use Mail::SpamAssassin::PerMsgStatus;
 use Mail::SpamAssassin::AsyncLoop;
 use Mail::SpamAssassin::Constants qw(:ip);
-use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows);
+use Mail::SpamAssassin::Util qw(untaint_var am_running_on_windows compile_regexp);

 use File::Spec;
 use IO::Socket;
@ -37,30 +37,24 @@ use POSIX ":sys_wait_h";


 our $KNOWN_BAD_DIALUP_RANGES; # Nothing uses this var???
-our $LAST_DNS_CHECK;
+our $LAST_DNS_CHECK = 0;

 # use very well-connected domains (fast DNS response, many DNS servers,
 # geographical distribution is a plus, TTL of at least 3600s)
+# these MUST contain both A/AAAA records so we can test dns_options v6
+# Updated 8/2019 from https://ip6.nl/#!list?db=alexa500
+# 
 our @EXISTING_DOMAINS = qw{
-  adelphia.net
  akamai.com
-  apache.org
-  cingular.com
-  colorado.edu
-  comcast.net
-  doubleclick.com
-  ebay.com
-  gmx.net
+  bing.com
+  cloudflare.com
+  digitalpoint.com
+  facebook.com
  google.com
-  intel.com
-  kernel.org
-  linux.org
-  mit.edu
-  motorola.com
-  msn.com
-  sourceforge.net
-  sun.com
-  w3.org
+  linkedin.com
+  netflix.com
+  php.net
+  wikipedia.org
  yahoo.com
 };

@ -84,10 +78,6 @@ BEGIN {
 # local ($^W) = 0;

  no warnings;
-  eval {
-    require Net::DNS;
-    require Net::DNS::Resolver;
-  };
  eval {
    require MIME::Base64;
  };
@ -101,71 +91,52 @@ BEGIN {
 sub do_rbl_lookup {
  my ($self, $rule, $set, $type, $host, $subtest) = @_;

-  $host =~ s/\.\z//s;  # strip a redundant trailing dot
-  my $key = "dns:$type:$host";
-  my $existing_ent = $self->{async}->get_lookup($key);
-
-  # only make a specific query once
-  if (!$existing_ent) {
-    my $ent = {
-      key => $key,
-      zone => $host,  # serves to fetch other per-zone settings
-      type => "DNSBL-".$type,
-      sets => [ ],  # filled in below
-      rules => [ ], # filled in below
-      # id is filled in after we send the query below
-    };
-    $existing_ent = $self->{async}->bgsend_and_start_lookup(
-        $host, $type, undef, $ent,
-        sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
-      master_deadline => $self->{master_deadline} );
-  }
-
-  if ($existing_ent) {
-    # always add set
-    push @{$existing_ent->{sets}}, $set;
-
-    # sometimes match or always match
  if (defined $subtest) {
-      $self->{dnspost}->{$set}->{$subtest} = $rule;
-    } else {
-      push @{$existing_ent->{rules}}, $rule;
-    }
-
-    $self->{rule_to_rblkey}->{$rule} = $key;
-  }
-}
-
-# TODO: these are constant so they should only be added once at startup
-sub register_rbl_subtest {
-  my ($self, $rule, $set, $subtest) = @_;
-
    if ($subtest =~ /^sb:/) {
-    warn("dns: ignored $rule, SenderBase rules are deprecated\n");
+      info("dns: ignored $rule, SenderBase rules are deprecated");
      return 0;
    }
+    # Compile as regex if not pure ip/bitmask (same check in process_dnsbl_result)
+    if ($subtest !~ /^\d+(?:\.\d+\.\d+\.\d+)?$/) {
+      my ($rec, $err) = compile_regexp($subtest, 0);
+      if (!$rec) {
+        warn("dns: invalid rule $rule subtest regexp '$subtest': $err\n");
+        return 0;
+      }
+      $subtest = $rec;
+    }
+  }

-  $self->{dnspost}->{$set}->{$subtest} = $rule;
+  dbg("dns: launching rule %s, set %s, type %s, %s", $rule, $set, $type,
+    defined $subtest ? "subtest $subtest" : 'no subtest');
+
+  my $ent = {
+    rulename => $rule,
+    type => "DNSBL",
+    set => $set,
+    subtest => $subtest,
+  };
+  my $ret = $self->{async}->bgsend_and_start_lookup($host, $type, undef, $ent,
+    sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
+    master_deadline => $self->{master_deadline}
+  );
+
+  return 0 if defined $ret; # no query started
+  return; # return undef for async status
 }

+# Deprecated, was only used from DNSEval.pm?
 sub do_dns_lookup {
  my ($self, $rule, $type, $host) = @_;

-  $host =~ s/\.\z//s;  # strip a redundant trailing dot
-  my $key = "dns:$type:$host";
-
  my $ent = {
-    key => $key,
-    zone => $host,  # serves to fetch other per-zone settings
-    type => "DNSBL-".$type,
-    rules => [ $rule ],
-    # id is filled in after we send the query below
+    rulename => $rule,
+    type => "DNSBL",
  };
-  $ent = $self->{async}->bgsend_and_start_lookup(
-      $host, $type, undef, $ent,
+  $self->{async}->bgsend_and_start_lookup($host, $type, undef, $ent,
    sub { my($ent, $pkt) = @_; $self->process_dnsbl_result($ent, $pkt) },
-    master_deadline => $self->{master_deadline} );
-  $ent;
+    master_deadline => $self->{master_deadline}
+  );
 }

 ###########################################################################
@ -180,38 +151,31 @@ sub dnsbl_hit {
    # txtdata returns a non- zone-file-format encoded result, unlike rdstring;
    # avoid space-separated RDATA <character-string> fields if possible,
    # txtdata provides a list of strings in a list context since Net::DNS 0.69
-    $log = join('',$answer->txtdata);
+    $log = join('', $answer->txtdata);
+    utf8::encode($log)  if utf8::is_utf8($log);
    local $1;
    $log =~ s{ (?<! [<(\[] ) (https? : // \S+)}{<$1>}xgi;
  } else {  # assuming $answer->type eq 'A'
    local($1,$2,$3,$4,$5);
-    if ($question->string =~ m/^((?:[0-9a-fA-F]\.){32})(\S+\w)/) {
+    if ($question->string =~ /^((?:[0-9a-fA-F]\.){32})(\S+\w)/) {
      $log = ' listed in ' . lc($2);
      my $ipv6addr = join('', reverse split(/\./, lc $1));
      $ipv6addr =~ s/\G(....)/$1:/g;  chop $ipv6addr;
      $ipv6addr =~ s/:0{1,3}/:/g;
      $log = $ipv6addr . $log;
-    } elsif ($question->string =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\S+\w)/) {
+    } elsif ($question->string =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\S+\w)/) {
      $log = "$4.$3.$2.$1 listed in " . lc($5);
-    } else {
-      $log = 'listed in ' . $question->string;
+    } elsif ($question->string =~ /^(\S+)(?<!\.)/) {
+      $log = "listed in ".lc($1);
    }
  }

-  # TODO: this may result in some log messages appearing under the
-  # wrong rules, since we could see this sequence: { test one hits,
-  # test one's message is logged, test two hits, test one fires again
-  # on another IP, test one's message is logged for that other IP --
-  # but under test two's heading }.   Right now though it's better
-  # than just not logging at all.
-
-  $self->{already_logged} ||= { };
-  if ($log && !$self->{already_logged}->{$log}) {
-    $self->test_log($log);
-    $self->{already_logged}->{$log} = 1;
+  if ($log) {
+    $self->test_log($log, $rule);
  }

  if (!$self->{tests_already_hit}->{$rule}) {
+    dbg("dns: rbl rule $rule hit");
    $self->got_hit($rule, "RBL: ", ruletype => "dnsbl");
  }
 }
@ -219,16 +183,26 @@ sub dnsbl_hit {
 sub dnsbl_uri {
  my ($self, $question, $answer) = @_;

-  my $qname = $question->qname;
-
+  my $rdatastr;
+  if ($answer->UNIVERSAL::can('txtdata')) {
    # txtdata returns a non- zone-file-format encoded result, unlike rdstring;
    # avoid space-separated RDATA <character-string> fields if possible,
    # txtdata provides a list of strings in a list context since Net::DNS 0.69
-  #
-  # rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
-  my $rdatastr = $answer->UNIVERSAL::can('txtdata') ? join('',$answer->txtdata)
-               : $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
-                                                    : $answer->rdatastr;
+    $rdatastr = join('', $answer->txtdata);
+  } else {
+    $rdatastr = $answer->rdstring;
+    # encoded in a RFC 1035 zone file format (escaped), decode it
+    $rdatastr =~ s{ \\ ( [0-9]{3} | (?![0-9]{3}) . ) }
+                  { length($1)==3 && $1 <= 255 ? chr($1) : $1 }xgse;
+  }
+
+  # Bug 7236: Net::DNS attempts to decode text strings in a TXT record as
+  # UTF-8 since version 0.69, which is undesired: octets failing the UTF-8
+  # decoding are converted to a Unicode "replacement character" U+FFFD, and
+  # ASCII text is unnecessarily flagged as perl native characters.
+  utf8::encode($rdatastr)  if utf8::is_utf8($rdatastr);
+
+  my $qname = $question->qname;
  if (defined $qname && defined $rdatastr) {
    my $qclass = $question->qclass;
    my $qtype = $question->qtype;
@ -236,8 +210,8 @@ sub dnsbl_uri {
    push(@vals, "class=$qclass") if $qclass ne "IN";
    push(@vals, "type=$qtype") if $qtype ne "A";
    my $uri = "dns:$qname" . (@vals ? "?" . join(";", @vals) : "");
-    push @{ $self->{dnsuri}->{$uri} }, $rdatastr;

+    $self->{dnsuri}{$uri}{$rdatastr} = 1;
    dbg("dns: hit <$uri> $rdatastr");
  }
 }
@ -251,19 +225,17 @@ sub process_dnsbl_result {
  my $question = ($pkt->question)[0];
  return if !$question;

-  my $sets = $ent->{sets} || [];
-  my $rules = $ent->{rules};
+  my $rulename = $ent->{rulename};

-  # NO_DNS_FOR_FROM
-  if ($self->{sender_host} &&
-        # fishy, qname should have been "RFC 1035 zone format" -decoded first
-      lc($question->qname) eq lc($self->{sender_host}) &&
-      $question->qtype =~ /^(?:A|MX)$/ &&
-      $pkt->header->rcode =~ /^(?:NXDOMAIN|SERVFAIL)$/ &&
-      ++$self->{sender_host_fail} == 2)
-  {
-    for my $rule (@{$rules}) {
-      $self->got_hit($rule, "DNS: ", ruletype => "dns");
+  # Mark rule ready for meta rules, but only if this was the last lookup
+  # pending, rules can have many lookups launched for different IPs
+  if (!$self->get_async_pending_rules($rulename)) {
+    $self->rule_ready($rulename);
+    # Mark depending check_rbl_sub rules too
+    if (exists $self->{rbl_subs}{$ent->{set}}) {
+      foreach (@{$self->{rbl_subs}{$ent->{set}}}) {
+        $self->rule_ready($_->[1]);
+      }
    }
  }

@ -274,44 +246,69 @@ sub process_dnsbl_result {
    $self->dnsbl_uri($question, $answer);
    my $answ_type = $answer->type;
    # TODO: there are some CNAME returns that might be useful
-    next if ($answ_type ne 'A' && $answ_type ne 'TXT');
-    if ($answ_type eq 'A') {
-      # Net::DNS::RR::A::address() is available since Net::DNS 0.69
-      my $ip_address = $answer->UNIVERSAL::can('address') ? $answer->address
-                                                          : $answer->rdatastr;
+    next if $answ_type ne 'A' && $answ_type ne 'TXT';
+
+    my $rdatastr;
+    if ($answer->UNIVERSAL::can('txtdata')) {
+      # txtdata returns a non- zone-file-format encoded result, unlike rdstring;
+      # avoid space-separated RDATA <character-string> fields if possible,
+      # txtdata provides a list of strings in a list context since Net::DNS 0.69
+      $rdatastr = join('', $answer->txtdata);
+    } else {
+      $rdatastr = $answer->rdstring;
+      # encoded in a RFC 1035 zone file format (escaped), decode it
+      $rdatastr =~ s{ \\ ( [0-9]{3} | (?![0-9]{3}) . ) }
+                    { length($1)==3 && $1 <= 255 ? chr($1) : $1 }xgse;
+    }
+
+    # Bug 7236: Net::DNS attempts to decode text strings in a TXT record as
+    # UTF-8 since version 0.69, which is undesired: octets failing the UTF-8
+    # decoding are converted to a Unicode "replacement character" U+FFFD, and
+    # ASCII text is unnecessarily flagged as perl native characters.
+    utf8::encode($rdatastr)  if utf8::is_utf8($rdatastr);
+
    # skip any A record that isn't on 127.0.0.0/8
-      next if $ip_address !~ /^127\./;
+    next if $answ_type eq 'A' && $rdatastr !~ /^127\./;
+
+    # check_rbl tests
+    if (defined $ent->{subtest}) {
+      if ($self->check_subtest($rdatastr, $ent->{subtest})) {
+        $self->dnsbl_hit($rulename, $question, $answer);
      }
-    for my $rule (@{$rules}) {
-      $self->dnsbl_hit($rule, $question, $answer);
-    }
-    for my $set (@{$sets}) {
-      if ($self->{dnspost}->{$set}) {
-	$self->process_dnsbl_set($set, $question, $answer);
+    } else {
+      $self->dnsbl_hit($rulename, $question, $answer);
    }
+
+    # check_rbl_sub tests
+    if (exists $self->{rbl_subs}{$ent->{set}}) {
+      $self->process_dnsbl_set($ent->{set}, $question, $answer, $rdatastr);
    }
  }
+
  return 1;
 }

 sub process_dnsbl_set {
-  my ($self, $set, $question, $answer) = @_;
+  my ($self, $set, $question, $answer, $rdatastr) = @_;

-  # txtdata returns a non- zone-file-format encoded result, unlike rdstring;
-  # avoid space-separated RDATA <character-string> fields if possible,
-  # txtdata provides a list of strings in a list context since Net::DNS 0.69
-  #
-  # rdatastr() is historical/undocumented, use rdstring() since Net::DNS 0.69
-  my $rdatastr = $answer->UNIVERSAL::can('txtdata')  ? join('',$answer->txtdata)
-               : $answer->UNIVERSAL::can('rdstring') ? $answer->rdstring
-                                                    : $answer->rdatastr;
-
-  while (my ($subtest, $rule) = each %{ $self->{dnspost}->{$set} }) {
+  foreach my $args (@{$self->{rbl_subs}{$set}}) {
+    my $subtest = $args->[0];
+    my $rule = $args->[1];
    next if $self->{tests_already_hit}->{$rule};
+    if ($self->check_subtest($rdatastr, $subtest)) {
+      $self->dnsbl_hit($rule, $question, $answer);
+    }
+  }
+}

-    if ($subtest =~ /^\d+\.\d+\.\d+\.\d+$/) {
-      # test for exact equality, not a regexp (an IPv4 address)
-      $self->dnsbl_hit($rule, $question, $answer)  if $subtest eq $rdatastr;
+sub check_subtest {
+  my ($self, $rdatastr, $subtest) = @_;
+
+  # regular expression
+  if (ref($subtest) eq 'Regexp') {
+    if ($rdatastr =~ $subtest) {
+      return 1;
+    }
  }
  # bitmask
  elsif ($subtest =~ /^\d+$/) {
@ -319,40 +316,21 @@ sub process_dnsbl_set {
    if ($rdatastr =~ m/^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/ &&
        Mail::SpamAssassin::Util::my_inet_aton($rdatastr) & $subtest)
    {
-	$self->dnsbl_hit($rule, $question, $answer);
+      return 1;
    }
  }
-    # regular expression
  else {
-      my $test = qr/$subtest/;
-      if ($rdatastr =~ /$test/) {
-	$self->dnsbl_hit($rule, $question, $answer);
-      }
+    # test for exact equality (an IPv4 address)
+    if ($subtest eq $rdatastr) {
+      return 1;
    }
  }
+
+  return 0;
 }

-sub harvest_until_rule_completes {
-  my ($self, $rule) = @_;
-
-  dbg("dns: harvest_until_rule_completes");
-  my $result = 0;
-
-  for (my $first=1;  ; $first=0) {
-    # complete_lookups() may call completed_callback(), which may
-    # call start_lookup() again (like in Plugin::URIDNSBL)
-    my ($alldone,$anydone) =
-      $self->{async}->complete_lookups($first ? 0 : 1.0,  1);
-
-    $result = 1  if $self->is_rule_complete($rule);
-    last  if $result || $alldone;
-
-    dbg("dns: harvest_until_rule_completes - check_tick");
-    $self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
-  }
-
-  return $result;
-}
+# Deprecated since 4.0, meta rules do not depend on priorities anymore
+sub harvest_until_rule_completes {}

 sub harvest_dnsbl_queries {
  my ($self) = @_;
@ -370,7 +348,7 @@ sub harvest_dnsbl_queries {
    my ($alldone,$anydone) =
      $self->{async}->complete_lookups($first ? 0 : 1.0,  1);

-    last  if $alldone;
+    last  if $alldone || $self->{deadline_exceeded} || $self->{shortcircuited};

    dbg("dns: harvest_dnsbl_queries - check_tick");
    $self->{main}->call_plugins ("check_tick", { permsgstatus => $self });
@ -379,7 +357,6 @@ sub harvest_dnsbl_queries {
  # explicitly abort anything left
  $self->{async}->abort_remaining_lookups();
  $self->{async}->log_lookups_timing();
-  $self->mark_all_async_rules_complete();
  1;
 }

@ -403,12 +380,14 @@ sub harvest_completed_queries {
 sub set_rbl_tag_data {
  my ($self) = @_;

+  return if !$self->{dnsuri};
+
  # DNS URIs
  my $rbl_tag = $self->{tag_data}->{RBL};  # just in case, should be empty
  $rbl_tag = ''  if !defined $rbl_tag;
-  while (my ($dnsuri, $answers) = each %{ $self->{dnsuri} }) {
+  while (my ($dnsuri, $answers) = each %{$self->{dnsuri}}) {
    # when parsing, look for elements of \".*?\" or \S+ with ", " as separator
-    $rbl_tag .= "<$dnsuri>" . " [" . join(", ", @{ $answers }) . "]\n";
+    $rbl_tag .= "<$dnsuri>" . " [" . join(", ", keys %$answers) . "]\n";
  }
  if (defined $rbl_tag && $rbl_tag ne '') {
    chomp $rbl_tag;
@ -423,7 +402,7 @@ sub rbl_finish {

  $self->set_rbl_tag_data();

-  delete $self->{dnspost};
+  delete $self->{rbl_subs};
  delete $self->{dnsuri};
 }

@ -442,55 +421,103 @@ sub clear_resolver {
  return 0;
 }

+# Deprecated since 4.0.0
 sub lookup_ns {
+  warn "dns: deprecated lookup_ns called, query ignored\n";
+  return;
+}
+
+sub test_dns_a_aaaa {
  my ($self, $dom) = @_;

-  return unless $self->load_resolver();
  return if ($self->server_failed_to_respond_for_domain ($dom));

-  my $nsrecords;
-  dbg("dns: looking up NS for '$dom'");
+  my ($a, $aaaa) = (0, 0);

+  if ($self->{conf}->{dns_options}->{v4}) {
    eval {
-    my $query = $self->{resolver}->send($dom, 'NS');
-    my @nses;
+      my $query = $self->{resolver}->send($dom, 'A');
      if ($query) {
        foreach my $rr ($query->answer) {
-        if ($rr->type eq "NS") { push (@nses, $rr->nsdname); }
+          if ($rr->type eq 'A') { $a = 1; last; }
        }
      }
-    $nsrecords = [ @nses ];
      1;
    } or do {
      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    dbg("dns: NS lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
-    return;
+      dbg("dns: test A lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
+      return (undef, undef);
    };
+    if (!$a) {
+      dbg("dns: test A lookup returned no results, use \"dns_options nov4\" if resolver doesn't support A queries");
+    }
+  } else {
+    $a = 1;
+  }

-  $nsrecords;
+  if ($self->{conf}->{dns_options}->{v6}) {
+    eval {
+      my $query = $self->{resolver}->send($dom, 'AAAA');
+      if ($query) {
+        foreach my $rr ($query->answer) {
+          if ($rr->type eq 'AAAA') { $aaaa = 1; last; }
+        }
+      }
+      1;
+    } or do {
+      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
+      dbg("dns: test AAAA lookup failed horribly, perhaps bad resolv.conf setting? (%s)", $eval_stat);
+      return (undef, undef);
+    };
+    if (!$aaaa) {
+      dbg("dns: test AAAA lookup returned no results, use \"dns_options nov6\" if resolver doesn't support AAAA queries");
+    }
+  } else {
+    $aaaa = 1;
+  }
+
+  return ($a, $aaaa);
 }

 sub is_dns_available {
  my ($self) = @_;
  my $dnsopt = $self->{conf}->{dns_available};
-  my $dnsint = $self->{conf}->{dns_test_interval} || 600;
-  my @domains;

-  $LAST_DNS_CHECK ||= 0;
-  my $diff = time() - $LAST_DNS_CHECK;
+  # Fast response for the most common cases
+  return 1 if $IS_DNS_AVAILABLE && $dnsopt eq "yes";
+  return 0 if defined $IS_DNS_AVAILABLE && $dnsopt eq "no";
+
+  # croak on misconfigured flags
+  if (!$self->{conf}->{dns_options}->{v4} &&
+      !$self->{conf}->{dns_options}->{v6})
+  {
+    warn 'dns: error: dns_options "nov4" and "nov6" are both set, '.
+         ' only use either, or use "dns_available no" to really disable DNS'.
+         "\n";
+    $IS_DNS_AVAILABLE = 0;
+    $self->{conf}->{dns_available} = "no";
+    return 0;
+  }

  # undef $IS_DNS_AVAILABLE if we should be testing for
  # working DNS and our check interval time has passed
-  if ($dnsopt eq "test" && $diff > $dnsint) {
+  if ($dnsopt eq "test") {
+    my $diff = time - $LAST_DNS_CHECK;
+    if ($diff > ($self->{conf}->{dns_test_interval}||600)) {
      $IS_DNS_AVAILABLE = undef;
-    dbg("dns: is_dns_available() last checked %.1f seconds ago; re-checking",
-        $diff);
+      if ($LAST_DNS_CHECK) {
+        dbg("dns: is_dns_available() last checked %.1f seconds ago; re-checking", $diff);
+      } else {
+        dbg("dns: is_dns_available() initial check");
+      }
+    }
+    $LAST_DNS_CHECK = time;
  }

-  return $IS_DNS_AVAILABLE if (defined $IS_DNS_AVAILABLE);
-  $LAST_DNS_CHECK = time();
+  return $IS_DNS_AVAILABLE if defined $IS_DNS_AVAILABLE;

  $IS_DNS_AVAILABLE = 0;
+
  if ($dnsopt eq "no") {
    dbg("dns: dns_available set to no in config file, skipping test");
    return $IS_DNS_AVAILABLE;
@ -498,26 +525,16 @@ sub is_dns_available {

  # Even if "dns_available" is explicitly set to "yes", we want to ignore
  # DNS if we're only supposed to be looking at local tests.
-  goto done if ($self->{main}->{local_tests_only});
-
-  # Check version numbers - runtime check only
-  if (defined $Net::DNS::VERSION) {
-    if (am_running_on_windows()) {
-      if ($Net::DNS::VERSION < 0.46) {
-	warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.46 for Win32");
+  if ($self->{main}->{local_tests_only}) {
+    dbg("dns: using local tests only, DNS not available");
    return $IS_DNS_AVAILABLE;
  }
-    }
-    else {
-      if ($Net::DNS::VERSION < 0.34) {
-	warn("dns: Net::DNS version is $Net::DNS::VERSION, but need 0.34");
+
+  #$self->clear_resolver();
+  if (!$self->load_resolver()) {
+    dbg("dns: could not load resolver, DNS not available");
    return $IS_DNS_AVAILABLE;
  }
-    }
-  }
-
-  $self->clear_resolver();
-  goto done unless $self->load_resolver();

  if ($dnsopt eq "yes") {
    # optionally shuffle the list of nameservers to distribute the load
@ -532,13 +549,18 @@ sub is_dns_available {
    return $IS_DNS_AVAILABLE;
  }

+  my @domains;
+  my @rtypes;
+  push @rtypes, 'A' if $self->{main}->{conf}->{dns_options}->{v4};
+  push @rtypes, 'AAAA' if $self->{main}->{conf}->{dns_options}->{v6};
  if ($dnsopt =~ /^test:\s*(\S.*)$/) {
    @domains = split (/\s+/, $1);
-    dbg("dns: looking up NS records for user specified domains: %s",
-        join(", ", @domains));
+    dbg("dns: testing %s records for user specified domains: %s",
+        join("/", @rtypes), join(", ", @domains));
  } else {
    @domains = @EXISTING_DOMAINS;
-    dbg("dns: looking up NS records for built-in domains");
+    dbg("dns: testing %s records for built-in domains: %s",
+        join("/", @rtypes), join(", ", @domains));
  }

  # do the test with a full set of configured nameservers
@ -556,19 +578,19 @@ sub is_dns_available {
  my @good_nameservers;
  foreach my $ns (@nameservers) {
    $self->{resolver}->available_nameservers($ns);  # try just this one
-    for (my $retry = 3; $retry > 0 && @domains; $retry--) {
+    for (my $retry = 0; $retry < 3 && @domains; $retry++) {
      my $domain = splice(@domains, rand(@domains), 1);
-      dbg("dns: trying ($retry) $domain, server $ns ...");
-      my $result = $self->lookup_ns($domain);
+      dbg("dns: trying $domain, server $ns ..." .
+          ($retry ? " (retry $retry)" : ""));
+      my ($ok_a, $ok_aaaa) = $self->test_dns_a_aaaa($domain);
      $self->{resolver}->finish_socket();
-      if (!$result) {
-        dbg("dns: NS lookup of $domain using $ns failed horribly, ".
-            "may not be a valid nameserver");
+      if (!defined $ok_a || !defined $ok_aaaa) {
+        # error printed already
        last;
-      } elsif (!@$result) {
-        dbg("dns: NS lookup of $domain using $ns failed, no results found");
+      } elsif (!$ok_a && !$ok_aaaa) {
+        dbg("dns: lookup of $domain using $ns failed, no results found");
      } else {
-        dbg("dns: NS lookup of $domain using $ns succeeded => DNS available".
+        dbg("dns: lookup of $domain using $ns succeeded => DNS available".
            " (set dns_available to override)");
        push(@good_nameservers, $ns);
        last;
@ -585,8 +607,6 @@ sub is_dns_available {
    $self->{resolver}->available_nameservers(@good_nameservers);
  }

-done:
-  # jm: leaving this in!
  dbg("dns: is DNS available? " . $IS_DNS_AVAILABLE);
  return $IS_DNS_AVAILABLE;
 }
@ -676,55 +696,26 @@ sub cleanup_kids {

 ###########################################################################

-sub register_async_rule_start {
+# Deprecated async functions, everything is handled automatically
+# now by bgsend .. $self->{async}->{pending_rules}
+sub register_async_rule_start {}
+sub register_async_rule_finish {}
+sub mark_all_async_rules_complete {}
+sub is_rule_complete {}
+
+# Return number of pending DNS lookups for a rule,
+# or list all of rules still pending
+sub get_async_pending_rules {
  my ($self, $rule) = @_;
-  dbg("dns: $rule lookup start");
-  $self->{rule_to_rblkey}->{$rule} = '*ASYNC_START';
-}
-
-sub register_async_rule_finish {
-  my ($self, $rule) = @_;
-  dbg("dns: $rule lookup finished");
-  delete $self->{rule_to_rblkey}->{$rule};
-}
-
-sub mark_all_async_rules_complete {
-  my ($self) = @_;
-  $self->{rule_to_rblkey} = { };
-}
-
-sub is_rule_complete {
-  my ($self, $rule) = @_;
-
-  my $key = $self->{rule_to_rblkey}->{$rule};
-  if (!defined $key) {
-    # dbg("dns: $rule lookup complete, not in list");
-    return 1;
+  if (defined $rule) {
+    return 0 if !exists $self->{async}->{pending_rules}{$rule};
+    return scalar keys %{$self->{async}->{pending_rules}{$rule}};
+  } else {
+    return grep { %{$self->{async}->{pending_rules}{$_}} }
+             keys %{$self->{async}->{pending_rules}};
  }
-
-  if ($key eq '*ASYNC_START') {
-    dbg("dns: $rule lookup not yet complete");
-    return 0;       # not yet complete
-  }
-
-  my $ent = $self->{async}->get_lookup($key);
-  if (!defined $ent) {
-    dbg("dns: $rule lookup complete, $key no longer pending");
-    return 1;
-  }
-
-  dbg("dns: $rule lookup not yet complete");
-  return 0;         # not yet complete
 }

 ###########################################################################

-# interface called by SPF plugin
-sub check_for_from_dns {
-  my ($self, $pms) = @_;
-  if (defined $pms->{sender_host_fail}) {
-    return ($pms->{sender_host_fail} == 2); # both MX and A need to fail
-  }
-}
-
 1;
--- a/upstream/lib/Mail/SpamAssassin/DnsResolver.pm
+++ b/upstream/lib/Mail/SpamAssassin/DnsResolver.pm
@ -40,21 +40,24 @@ use warnings;
 # use bytes;
 use re 'taint';

-require 5.008001;  # needs utf8::is_utf8()
-
 use Mail::SpamAssassin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Constants qw(:ip);
-use Mail::SpamAssassin::Util qw(untaint_var decode_dns_question_entry);
+use Mail::SpamAssassin::Util qw(untaint_var decode_dns_question_entry
+                                idn_to_ascii reverse_ip_address
+                                domain_to_search_list);

 use Socket;
 use Errno qw(EADDRINUSE EACCES);
 use Time::HiRes qw(time);
+use version 0.77;

 our @ISA = qw();

+our $have_net_dns;
 our $io_socket_module_name;
 BEGIN {
+  $have_net_dns = eval { require Net::DNS; };
  if (eval { require IO::Socket::IP }) {
    $io_socket_module_name = 'IO::Socket::IP';
  } elsif (eval { require IO::Socket::INET6 }) {
@ -78,7 +81,6 @@ sub new {
  };
  bless ($self, $class);

-  $self->load_resolver();
  $self;
 }

@ -94,8 +96,8 @@ Load the C<Net::DNS::Resolver> object.  Returns 0 if Net::DNS cannot be used,
 sub load_resolver {
  my ($self) = @_;

-  if ($self->{res}) { return 1; }
-  $self->{no_resolver} = 1;
+  return 0 if $self->{no_resolver};
+  return 1 if $self->{res};

  # force only ipv4 if no IO::Socket::INET6 or ipv6 doesn't work
  my $force_ipv4 = $self->{main}->{force_ipv4};
@ -112,7 +114,7 @@ sub load_resolver {
      if ($io_socket_module_name) {
        $sock6 = $io_socket_module_name->new(LocalAddr=>'::', Proto=>'udp');
      }
-      if ($sock6) { $sock6->close() or warn "error closing socket: $!" }
+      if ($sock6) { $sock6->close() or warn "dns: error closing socket: $!\n" }
      $sock6;
    } or do {
      dbg("dns: socket module %s is available, but no host support for IPv6",
@ -123,13 +125,14 @@ sub load_resolver {
  }
  
  eval {
-    require Net::DNS;
+    die "Net::DNS required\n" if !$have_net_dns;
+    die "Net::DNS 0.69 required\n"
+      if (version->parse(Net::DNS->VERSION) < version->parse(0.69));
    # force_v4 is set in new() to avoid error in older versions of Net::DNS
    # that don't have it; other options are set by function calls so a typo
    # or API change will cause an error here
    my $res = $self->{res} = Net::DNS::Resolver->new(force_v4 => $force_ipv4);
    if ($res) {
-      $self->{no_resolver} = 0;
      $self->{force_ipv4} = $force_ipv4;
      $self->{force_ipv6} = $force_ipv6;
      $self->{retry} = 1;       # retries for non-backgrounded query
@ -164,7 +167,7 @@ sub load_resolver {
    1;
  } or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    dbg("dns: eval failed: $eval_stat");
+    warn("dns: resolver create failed: $eval_stat\n");
  };

  dbg("dns: using socket module: %s version %s%s",
@ -173,12 +176,13 @@ sub load_resolver {
      $self->{force_ipv4} ? ', forced IPv4' :
      $self->{force_ipv6} ? ', forced IPv6' : '');
  dbg("dns: is Net::DNS::Resolver available? %s",
-      $self->{no_resolver} ? "no" : "yes" );
-  if (!$self->{no_resolver} && defined $Net::DNS::VERSION) {
+      $self->{res} ? "yes" : "no" );
+  if ($self->{res} && defined $Net::DNS::VERSION) {
    dbg("dns: Net::DNS version: %s", $Net::DNS::VERSION);
  }

-  return (!$self->{no_resolver});
+  $self->{no_resolver} = !$self->{res};
+  return defined $self->{res};
 }

 =item $resolver = $res->get_resolver()
@ -237,18 +241,17 @@ sub available_nameservers {
  }
  if ($self->{force_ipv4} || $self->{force_ipv6}) {
    # filter the list according to a chosen protocol family
-    my $ip4_re = IPV4_ADDRESS;
    my(@filtered_addr_port);
    for (@{$self->{available_dns_servers}}) {
      local($1,$2);
      /^ \[ (.*) \] : (\d+) \z/xs  or next;
      my($addr,$port) = ($1,$2);
-      if ($addr =~ /^${ip4_re}\z/o) {
+      if ($addr =~ IS_IPV4_ADDRESS) {
        push(@filtered_addr_port, $_)  unless $self->{force_ipv6};
      } elsif ($addr =~ /:.*:/) {
        push(@filtered_addr_port, $_)  unless $self->{force_ipv4};
      } else {
-        warn "Unrecognized DNS server specification: $_";
+        warn "dns: Unrecognized DNS server specification: $_\n";
      }
    }
    if (@filtered_addr_port < @{$self->{available_dns_servers}}) {
@ -361,7 +364,7 @@ sub connect_sock {

  if ($self->{sock}) {
    $self->{sock}->close()
-      or info("connect_sock: error closing socket %s: %s", $self->{sock}, $!);
+      or info("dns: connect_sock: error closing socket %s: %s", $self->{sock}, $!);
    $self->{sock} = undef;
  }
  my $sock;
@ -378,13 +381,12 @@ sub connect_sock {
  # is unspecified, causing EINVAL failure when automatically assigned local
  # IP address and a remote address do not belong to the same address family.
  # Let's choose a suitable source address if possible.
-  my $ip4_re = IPV4_ADDRESS;
  my $srcaddr;
  if ($self->{force_ipv4}) {
    $srcaddr = "0.0.0.0";
  } elsif ($self->{force_ipv6}) {
    $srcaddr = "::";
-  } elsif ($ns_addr =~ /^${ip4_re}\z/o) {
+  } elsif ($ns_addr =~ IS_IPV4_ADDRESS) {
    $srcaddr = "0.0.0.0";
  } elsif ($ns_addr =~ /:.*:/) {
    $srcaddr = "::";
@ -400,10 +402,10 @@ sub connect_sock {
    $lport = $self->pick_random_available_port();
    if (!defined $lport) {
      $lport = 0;
-      dbg("no configured local ports for DNS queries, letting OS choose");
+      dbg("dns: no configured local ports for DNS queries, letting OS choose");
    }
    if ($attempts+1 > 50) {  # sanity check
-      warn "could not create a DNS resolver socket in $attempts attempts\n";
+      warn "dns: could not create a DNS resolver socket in $attempts attempts\n";
      $errno = 0;
      last;
    }
@ -431,12 +433,12 @@ sub connect_sock {
        $self->disable_available_port($lport);
      }
    } else {
-      warn "error creating a DNS resolver socket: $errno";
+      warn "dns: error creating a DNS resolver socket: $errno";
      goto no_sock;
    }
  }
  if (!$sock) {
-    warn "could not create a DNS resolver socket in $attempts attempts: $errno";
+    warn "dns: could not create a DNS resolver socket in $attempts attempts: $errno\n";
    goto no_sock;
  }

@ -534,9 +536,8 @@ sub new_dns_packet {

  # construct a PTR query if it looks like an IPv4 address
  if (!defined($type) || $type eq 'PTR') {
-    local($1,$2,$3,$4);
-    if ($domain =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
-      $domain = "$4.$3.$2.$1.in-addr.arpa.";
+    if ($domain =~ IS_IPV4_ADDRESS) {
+      $domain = reverse_ip_address($domain).".in-addr.arpa.";
      $type = 'PTR';
    }
  }
@ -598,13 +599,7 @@ sub new_dns_packet {
    my $udp_payload_size = $self->{conf}->{dns_options}->{edns};
    if ($udp_payload_size && $udp_payload_size > 512) {
      # dbg("dns: adding EDNS ext, UDP payload size %d", $udp_payload_size);
-      if ($packet->UNIVERSAL::can('edns')) {  # available since Net::DNS 0.69
      $packet->edns->size($udp_payload_size);
-      } else {  # legacy mechanism
-        my $optrr = Net::DNS::RR->new(Type => 'OPT', Name => '', TTL => 0,
-                                      Class => $udp_payload_size);
-        $packet->push('additional', $optrr);
-      }
    }
  }

@ -658,6 +653,8 @@ sub _packet_id {

 =item $id = $res->bgsend($domain, $type, $class, $cb)

+DIRECT USE DISCOURAGED, please use bgsend_and_start_lookup in plugins.
+
 Quite similar to C<Net::DNS::Resolver::bgsend>, except that when a reply
 packet eventually arrives, and C<poll_responses> is called, the callback
 sub reference C<$cb> will be called.
@ -673,7 +670,7 @@ be used, like so:

  my $id = $self->{resolver}->bgsend($domain, $type, undef, sub {
        my ($reply, $reply_id, $timestamp) = @_;
-        $self->got_a_reply ($reply, $reply_id);
+        $self->got_a_reply($reply, $reply_id);
      });

 The callback can ignore the reply as an invalid packet sent to the listening
@ -685,6 +682,19 @@ sub bgsend {
  my ($self, $domain, $type, $class, $cb) = @_;
  return if $self->{no_resolver};

+  my $dns_query_blockages = $self->{main}->{conf}->{dns_query_blocked};
+  if ($dns_query_blockages) {
+    my $search_list = domain_to_search_list($domain);
+    foreach my $parent_domain ((@$search_list, '*')) {
+      my $blocked = $dns_query_blockages->{$parent_domain};
+      next if !defined $blocked; # not listed
+      last if !$blocked; # allowed
+      # blocked
+      dbg("dns: bgsend, query $type/$domain blocked by dns_query_restriction: $parent_domain");
+      return;
+    }
+  }
+
  $self->{send_timed_out} = 0;

  my $pkt = $self->new_dns_packet($domain, $type, $class);
@ -747,7 +757,7 @@ sub bgread {
  $answerpkt or die "bgread: decoding DNS packet failed: $@";
  $answerpkt->answerfrom($peerhost);
  if (defined $decoded_length && $decoded_length ne "" && $decoded_length != length($data)) {
-    warn sprintf("bgread: received a %d bytes packet from %s, decoded %d bytes\n",
+    warn sprintf("dns: bgread: received a %d bytes packet from %s, decoded %d bytes\n",
                 length($data), $peerhost, $decoded_length);
  }
  return $answerpkt;
@ -767,13 +777,16 @@ sub poll_responses {
  return if $self->{no_resolver};
  return if !$self->{sock};
  my $cnt = 0;
+  my $cnt_cb = 0;

  my $rin = $self->{sock_as_vec};
  my $rout;

  for (;;) {
    my ($nfound, $timeleft, $eval_stat);
-    eval {  # use eval to catch alarm signal
+    # if a restartable signal is caught, retry 3 times before aborting
+    my $eintrcount = 3;
+    eval {  # use eval to caught alarm signal
      my $timer;  # collects timestamp when variable goes out of scope
      if (!defined($timeout) || $timeout > 0)
        { $timer = $self->{main}->time_method("poll_dns_idle") }
@ -787,16 +800,21 @@ sub poll_responses {
      # most likely due to an alarm signal, resignal if so
      die "dns: (2) $eval_stat\n"  if $eval_stat =~ /__alarm__ignore__\(.*\)/s;
      warn "dns: select aborted: $eval_stat\n";
-      return;
+      last;
    } elsif (!defined $nfound || $nfound < 0) {
+      if ($!{EINTR} and $eintrcount > 0) {
+        $eintrcount--;
+        next;
+      }
      if ($!) { warn "dns: select failed: $!\n" }
      else    { info("dns: select interrupted") }  # shouldn't happen
-      return;
+      last;
    } elsif (!$nfound) {
      if (!defined $timeout) { warn("dns: select returned empty-handed\n") }
      elsif ($timeout > 0) { dbg("dns: select timed out %.3f s", $timeout) }
-      return;
+      last;
    }
+    $cnt += $nfound;

    my $now = time;
    $timeout = 0;  # next time around collect whatever is available, then exit
@ -853,12 +871,12 @@ sub poll_responses {

        if ($cb) {
          $cb->($packet, $id, $now);
-          $cnt++;
+          $cnt_cb++;
        } else {  # no match, report the problem
          if ($rcode eq 'REFUSED' || $id =~ m{^\d+/NO_QUESTION_IN_PACKET\z}) {
            # the failure was already reported above
          } else {
-            dbg("dns: no callback for id $id, ignored, packet on next debug line");
+            info("dns: no callback for id $id, ignored, packet on next debug line");
            # prevent filling normal logs with huge packet dumps
            dbg("dns: %s", $packet ? $packet->string : "undef");
          }
@ -867,11 +885,11 @@ sub poll_responses {
          if ($id =~ m{^(\d+)/}) {
            my $dnsid = $1;  # the raw DNS packet id
            my @matches =
-              grep(m{^\Q$dnsid\E/}, keys %{$self->{id_to_callback}});
+              grep(m{^\Q$dnsid\E/}o, keys %{$self->{id_to_callback}});
            if (!@matches) {
-              dbg("dns: no likely matching queries for id %s", $dnsid);
+              info("dns: no likely matching queries for id %s", $dnsid);
            } else {
-              dbg("dns: a likely matching query: %s", join(', ', @matches));
+              info("dns: a likely matching query: %s", join(', ', @matches));
            }
          }
        }
@ -879,7 +897,35 @@ sub poll_responses {
    }
  }

-  return $cnt;
+  return ($cnt, $cnt_cb);
+}
+
+use constant RECV_FLAGS => eval { MSG_DONTWAIT } || 0; # Not in Windows
+
+# Used to flush stale DNS responses, which we don't need to process
+sub flush_responses {
+  my ($self) = @_;
+  return if $self->{no_resolver};
+  return if !$self->{sock};
+
+  my $rin = $self->{sock_as_vec};
+  my $rout;
+  my $nfound;
+
+  my $packetsize = $self->{res}->udppacketsize;
+  $packetsize = 512  if $packetsize < 512;  # just in case
+  $self->{sock}->blocking(0) unless(RECV_FLAGS);
+  for (;;) {
+    eval {  # use eval to catch alarm signal
+      ($nfound, undef) = select($rout=$rin, undef, undef, 0);
+      1;
+    } or do {
+	  last;
+    };
+    last if !$nfound;
+    last if !$self->{sock}->recv(my $data, $packetsize+256, RECV_FLAGS);
+  }
+  $self->{sock}->blocking(1) unless(RECV_FLAGS);
 }

 ###########################################################################
@ -920,8 +966,9 @@ sub send {
  # using some arbitrary encoding (they are normally just 7-bit ascii
  # characters anyway, just need to get rid of the utf8 flag).  Bug 6959
  # Most if not all af these come from a SPF plugin.
+  #   (was a call to utf8::encode($name), now we prefer a proper idn_to_ascii)
  #
-  utf8::encode($name);
+  $name = idn_to_ascii($name);

  my $retrans = $self->{retrans};
  my $retries = $self->{retry};
@ -985,7 +1032,7 @@ sub finish_socket {
  my ($self) = @_;
  if ($self->{sock}) {
    $self->{sock}->close()
-      or warn "finish_socket: error closing socket $self->{sock}: $!";
+      or warn "dns: finish_socket: error closing socket $self->{sock}: $!\n";
    undef $self->{sock};
  }
 }
@ -1014,7 +1061,7 @@ sub fhs_to_vec {
  foreach my $sock (@fhlist) {
    my $fno = fileno($sock);
    if (!defined $fno) {
-      warn "dns: oops! fileno now undef for $sock";
+      warn "dns: oops! fileno now undef for $sock\n";
    } else {
      vec ($rin, $fno, 1) = 1;
    }
--- a/upstream/lib/Mail/SpamAssassin/GeoDB.pm
+++ b/upstream/lib/Mail/SpamAssassin/GeoDB.pm
@ -0,0 +1,899 @@
+# <@LICENSE>
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at:
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+
+=head1 NAME
+
+Mail::SpamAssassin::GeoDB - unified interface for geoip modules
+
+Plugins need to signal SA main package the modules they want loaded
+
+package Mail::SpamAssassin::Plugin::MyPlugin;
+sub new {
+  ...
+  $self->{main}->{geodb_wanted}->{country} = 1;
+  $self->{main}->{geodb_wanted}->{isp} = 1;
+)
+
+(internal stuff still subject to change)
+
+=cut
+
+package Mail::SpamAssassin::GeoDB;
+
+use strict;
+use warnings;
+# use bytes;
+use re 'taint';
+
+use Socket;
+use version 0.77;
+
+our @ISA = qw();
+
+use Mail::SpamAssassin::Constants qw(:ip);
+use Mail::SpamAssassin::Logger;
+
+my @geoip_default_path = qw(
+  /usr/local/share/GeoIP
+  /usr/share/GeoIP
+  /var/lib/GeoIP
+  /opt/share/GeoIP
+);
+
+# load order (city contains country, isp contains asn)
+my @geoip_types = qw( city country isp asn );
+
+# v6 is not needed, automatically tries *v6.dat also
+my %geoip_default_files = (
+  'city' => ['GeoIPCity.dat','GeoLiteCity.dat'],
+  'country' => ['GeoIP.dat'],
+  'isp' => ['GeoIPISP.dat'],
+  'asn' => ['GeoIPASNum.dat'],
+);
+
+my %geoip2_default_files = (
+  'city' => ['GeoIP2-City.mmdb','GeoLite2-City.mmdb',
+             'dbip-city.mmdb','dbip-city-lite.mmdb'],
+  'country' => ['GeoIP2-Country.mmdb','GeoLite2-Country.mmdb',
+                'dbip-country.mmdb','dbip-country-lite.mmdb'],
+  'isp' => ['GeoIP2-ISP.mmdb','GeoLite2-ISP.mmdb'],
+  'asn' => ['GeoIP2-ASN.mmdb','GeoLite2-ASN.mmdb'],
+);
+
+my %country_to_continent = (
+'AP'=>'AS','EU'=>'EU','AD'=>'EU','AE'=>'AS','AF'=>'AS','AG'=>'NA',
+'AI'=>'NA','AL'=>'EU','AM'=>'AS','CW'=>'NA','AO'=>'AF','AQ'=>'AN',
+'AR'=>'SA','AS'=>'OC','AT'=>'EU','AU'=>'OC','AW'=>'NA','AZ'=>'AS',
+'BA'=>'EU','BB'=>'NA','BD'=>'AS','BE'=>'EU','BF'=>'AF','BG'=>'EU',
+'BH'=>'AS','BI'=>'AF','BJ'=>'AF','BM'=>'NA','BN'=>'AS','BO'=>'SA',
+'BR'=>'SA','BS'=>'NA','BT'=>'AS','BV'=>'AN','BW'=>'AF','BY'=>'EU',
+'BZ'=>'NA','CA'=>'NA','CC'=>'AS','CD'=>'AF','CF'=>'AF','CG'=>'AF',
+'CH'=>'EU','CI'=>'AF','CK'=>'OC','CL'=>'SA','CM'=>'AF','CN'=>'AS',
+'CO'=>'SA','CR'=>'NA','CU'=>'NA','CV'=>'AF','CX'=>'AS','CY'=>'AS',
+'CZ'=>'EU','DE'=>'EU','DJ'=>'AF','DK'=>'EU','DM'=>'NA','DO'=>'NA',
+'DZ'=>'AF','EC'=>'SA','EE'=>'EU','EG'=>'AF','EH'=>'AF','ER'=>'AF',
+'ES'=>'EU','ET'=>'AF','FI'=>'EU','FJ'=>'OC','FK'=>'SA','FM'=>'OC',
+'FO'=>'EU','FR'=>'EU','FX'=>'EU','GA'=>'AF','GB'=>'EU','GD'=>'NA',
+'GE'=>'AS','GF'=>'SA','GH'=>'AF','GI'=>'EU','GL'=>'NA','GM'=>'AF',
+'GN'=>'AF','GP'=>'NA','GQ'=>'AF','GR'=>'EU','GS'=>'AN','GT'=>'NA',
+'GU'=>'OC','GW'=>'AF','GY'=>'SA','HK'=>'AS','HM'=>'AN','HN'=>'NA',
+'HR'=>'EU','HT'=>'NA','HU'=>'EU','ID'=>'AS','IE'=>'EU','IL'=>'AS',
+'IN'=>'AS','IO'=>'AS','IQ'=>'AS','IR'=>'AS','IS'=>'EU','IT'=>'EU',
+'JM'=>'NA','JO'=>'AS','JP'=>'AS','KE'=>'AF','KG'=>'AS','KH'=>'AS',
+'KI'=>'OC','KM'=>'AF','KN'=>'NA','KP'=>'AS','KR'=>'AS','KW'=>'AS',
+'KY'=>'NA','KZ'=>'AS','LA'=>'AS','LB'=>'AS','LC'=>'NA','LI'=>'EU',
+'LK'=>'AS','LR'=>'AF','LS'=>'AF','LT'=>'EU','LU'=>'EU','LV'=>'EU',
+'LY'=>'AF','MA'=>'AF','MC'=>'EU','MD'=>'EU','MG'=>'AF','MH'=>'OC',
+'MK'=>'EU','ML'=>'AF','MM'=>'AS','MN'=>'AS','MO'=>'AS','MP'=>'OC',
+'MQ'=>'NA','MR'=>'AF','MS'=>'NA','MT'=>'EU','MU'=>'AF','MV'=>'AS',
+'MW'=>'AF','MX'=>'NA','MY'=>'AS','MZ'=>'AF','NA'=>'AF','NC'=>'OC',
+'NE'=>'AF','NF'=>'OC','NG'=>'AF','NI'=>'NA','NL'=>'EU','NO'=>'EU',
+'NP'=>'AS','NR'=>'OC','NU'=>'OC','NZ'=>'OC','OM'=>'AS','PA'=>'NA',
+'PE'=>'SA','PF'=>'OC','PG'=>'OC','PH'=>'AS','PK'=>'AS','PL'=>'EU',
+'PM'=>'NA','PN'=>'OC','PR'=>'NA','PS'=>'AS','PT'=>'EU','PW'=>'OC',
+'PY'=>'SA','QA'=>'AS','RE'=>'AF','RO'=>'EU','RU'=>'EU','RW'=>'AF',
+'SA'=>'AS','SB'=>'OC','SC'=>'AF','SD'=>'AF','SE'=>'EU','SG'=>'AS',
+'SH'=>'AF','SI'=>'EU','SJ'=>'EU','SK'=>'EU','SL'=>'AF','SM'=>'EU',
+'SN'=>'AF','SO'=>'AF','SR'=>'SA','ST'=>'AF','SV'=>'NA','SY'=>'AS',
+'SZ'=>'AF','TC'=>'NA','TD'=>'AF','TF'=>'AN','TG'=>'AF','TH'=>'AS',
+'TJ'=>'AS','TK'=>'OC','TM'=>'AS','TN'=>'AF','TO'=>'OC','TL'=>'AS',
+'TR'=>'EU','TT'=>'NA','TV'=>'OC','TW'=>'AS','TZ'=>'AF','UA'=>'EU',
+'UG'=>'AF','UM'=>'OC','US'=>'NA','UY'=>'SA','UZ'=>'AS','VA'=>'EU',
+'VC'=>'NA','VE'=>'SA','VG'=>'NA','VI'=>'NA','VN'=>'AS','VU'=>'OC',
+'WF'=>'OC','WS'=>'OC','YE'=>'AS','YT'=>'AF','RS'=>'EU','ZA'=>'AF',
+'ZM'=>'AF','ME'=>'EU','ZW'=>'AF','AX'=>'EU','GG'=>'EU','IM'=>'EU',
+'JE'=>'EU','BL'=>'NA','MF'=>'NA','BQ'=>'NA','SS'=>'AF','**'=>'**',
+);
+
+sub new {
+  my ($class, $conf) = @_;
+  $class = ref($class) || $class;
+
+  my $self = {};
+  bless ($self, $class);
+
+  $self->{cache} = ();
+  $self->init_database($conf || {});
+  $self;
+}
+
+sub init_database {
+  my ($self, $opts) = @_;
+
+  # Try city too if country wanted
+  $opts->{wanted}->{city} = 1 if $opts->{wanted}->{country};
+  # Try isp too if asn wanted
+  $opts->{wanted}->{isp} = 1 if $opts->{wanted}->{asn};
+
+  my $geodb_opts = {
+    'module' => $opts->{conf}->{module} || undef,
+    'dbs' => $opts->{conf}->{options} || undef,
+    'wanted' => $opts->{wanted} || undef,
+    'search_path' => defined $opts->{conf}->{geodb_search_path} ?
+      $opts->{conf}->{geodb_search_path} : \@geoip_default_path,
+  };
+
+  my ($db, $dbapi, $loaded);
+
+  ## GeoIP2
+  if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'geoip2')) {
+    ($db, $dbapi) = $self->load_geoip2($geodb_opts);
+    $loaded = 'geoip2' if $db;
+  }
+
+  ## Geo::IP
+  if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'geoip')) {
+    ($db, $dbapi) = $self->load_geoip($geodb_opts);
+    $loaded = 'geoip' if $db;
+  }
+
+  ## IP::Country::DB_File
+  if (!$db && $geodb_opts->{module} && $geodb_opts->{module} eq 'dbfile') {
+    # Only try if geodb_module and path to ipcc.db specified
+    ($db, $dbapi) = $self->load_dbfile($geodb_opts);
+    $loaded = 'dbfile' if $db;
+  }
+
+  ## IP::Country::Fast
+  if (!$db && (!$geodb_opts->{module} || $geodb_opts->{module} eq 'fast')) {
+    ($db, $dbapi) = $self->load_fast($geodb_opts);
+    $loaded = 'fast' if $db;
+  }
+
+  if (!$db) {
+    dbg("geodb: No supported database could be loaded");
+    die("No supported GeoDB database could be loaded\n");
+  }
+
+  # country can be aliased to city
+  if (!$dbapi->{country} && $dbapi->{city}) {
+    $dbapi->{country} = $dbapi->{city};
+  }
+  if (!$dbapi->{country_v6} && $dbapi->{city_v6}) {
+    $dbapi->{country_v6} = $dbapi->{city_v6}
+  }
+  # GeoIP2 asn can be aliased to isp
+  if ($loaded eq 'geoip2') {
+    if (!$dbapi->{asn} && $dbapi->{isp}) {
+      $dbapi->{asn} = $dbapi->{isp};
+    }
+    if (!$dbapi->{asn_v6} && $dbapi->{isp_v6}) {
+      $dbapi->{asn_v6} = $dbapi->{isp_v6}
+    }
+  }
+
+  $self->{db} = $db;
+  $self->{dbapi} = $dbapi;
+
+  foreach (@{$self->get_dbinfo()}) {
+    dbg("geodb: database info: ".$_);
+  }
+  #dbg("geodb: apis available: ".join(', ', sort keys %{$self->{dbapi}}));
+
+  return 1;
+}
+
+sub load_geoip2 {
+  my ($self, $geodb_opts) = @_;
+  my ($db, $dbapi, $ok);
+
+  # Warn about fatal errors if this module was specifically requested
+  my $errwarn = ($geodb_opts->{module}||'') eq 'geoip2';
+
+  eval {
+    require MaxMind::DB::Reader;
+  } or do {
+    my $err = $@;
+    $err =~ s/ at .*//s;
+    $err = "geodb: MaxMind::DB::Reader (GeoIP2) module load failed: $err";
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  };
+
+  my %path;
+  foreach my $dbtype (@geoip_types) {
+    # skip country if city already loaded
+    next if $dbtype eq 'country' && $db->{city};
+    # skip asn if isp already loaded
+    next if $dbtype eq 'asn' && $db->{isp};
+    # skip if not needed
+    next if $geodb_opts->{wanted} && !$geodb_opts->{wanted}->{$dbtype};
+    # only autosearch if no absolute path given
+    if (!defined $geodb_opts->{dbs}->{$dbtype}) {
+      # Try some default locations
+      PATHS_GEOIP2: foreach my $p (@{$geodb_opts->{search_path}}) {
+        foreach my $f (@{$geoip2_default_files{$dbtype}}) {
+          if (-f "$p/$f") {
+            $path{$dbtype} = "$p/$f";
+            dbg("geodb: GeoIP2: search found $dbtype $p/$f");
+            last PATHS_GEOIP2;
+          }
+        }
+      }
+    } else {
+      if (!-f $geodb_opts->{dbs}->{$dbtype}) {
+        dbg("geodb: GeoIP2: $dbtype database requested, but not found: ".
+          $geodb_opts->{dbs}->{$dbtype});
+        next;
+      }
+      $path{$dbtype} = $geodb_opts->{dbs}->{$dbtype};
+    }
+
+    if (defined $path{$dbtype}) {
+      eval {
+        $db->{$dbtype} = MaxMind::DB::Reader->new(
+          file => $path{$dbtype},
+        );
+        die "unknown error" unless $db->{$dbtype};
+        1;
+      };
+      if ($@ || !$db->{$dbtype}) {
+        my $err = $@;
+        $err =~ s/\s+Trace begun.*//s;
+        $err =~ s/ at .*//s;
+        dbg("geodb: GeoIP2: $dbtype load failed: $err");
+      } else {
+        dbg("geodb: GeoIP2: loaded $dbtype from $path{$dbtype}");
+        $ok = 1;
+      }
+    } else {
+      my $from = defined $geodb_opts->{dbs}->{$dbtype} ?
+        $geodb_opts->{dbs}->{$dbtype} : "default locations";
+      dbg("geodb: GeoIP2: $dbtype database not found from $from");
+    } 
+  }
+
+  if (!$ok) {
+    warn("geodb: GeoIP2 requested, but no databases could be loaded\n") if $errwarn;
+    return (undef, undef)
+  }
+
+  # dbinfo_DBTYPE()
+  $db->{city} and $dbapi->{dbinfo_city} = sub {
+    my $m = $_[0]->{db}->{city}->metadata();
+    return "GeoIP2 city: ".$m->description()->{en}." / ".localtime($m->build_epoch());
+  };
+  $db->{country} and $dbapi->{dbinfo_country} = sub {
+    my $m = $_[0]->{db}->{country}->metadata();
+    return "GeoIP2 country: ".$m->description()->{en}." / ".localtime($m->build_epoch());
+  };
+  $db->{isp} and $dbapi->{dbinfo_isp} = sub {
+    my $m = $_[0]->{db}->{isp}->metadata();
+    return "GeoIP2 isp: ".$m->description()->{en}." / ".localtime($m->build_epoch());
+  };
+  $db->{asn} and $dbapi->{dbinfo_asn} = sub {
+    my $m = $_[0]->{db}->{asn}->metadata();
+    return "GeoIP2 asn: ".$m->description()->{en}." / ".localtime($m->build_epoch());
+  };
+
+  # city()
+  $db->{city} and $dbapi->{city} = $dbapi->{city_v6} = sub {
+    my $res = {};
+    my $city;
+    eval {
+      $city = $_[0]->{db}->{city}->record_for_address($_[1]);
+      1;
+    } or do {
+      $@ =~ s/\s+Trace begun.*//s;
+      dbg("geodb: GeoIP2 city query failed for $_[1]: $@");
+      return $res;
+    };
+    eval {
+      $res->{city_name} = $city->{city}->{names}->{en};
+      $res->{country} = $city->{country}->{iso_code};
+      $res->{country_name} = $city->{country}->{names}->{en};
+      $res->{continent} = $city->{continent}->{code};
+      $res->{continent_name} = $city->{continent}->{names}->{en};
+      1;
+    };
+    return $res;
+  };
+
+  # country()
+  $db->{country} and $dbapi->{country} = $dbapi->{country_v6} = sub {
+    my $res = {};
+    my $country;
+    eval {
+      $country = $_[0]->{db}->{country}->record_for_address($_[1]);
+      1;
+    } or do {
+      $@ =~ s/\s+Trace begun.*//s;
+      dbg("geodb: GeoIP2 country query failed for $_[1]: $@");
+      return $res;
+    };
+    eval {
+      $res->{country} = $country->{country}->{iso_code};
+      $res->{country_name} = $country->{country}->{names}->{en};
+      $res->{continent} = $country->{continent}->{code};
+      $res->{continent_name} = $country->{continent}->{names}->{en};
+      1;
+    };
+    return $res;
+  };
+
+  # isp()
+  $db->{isp} and $dbapi->{isp} = $dbapi->{isp_v6} = sub {
+    my $res = {};
+    my $isp;
+    eval {
+      $isp = $_[0]->{db}->{isp}->record_for_address($_[1]);
+      1;
+    } or do {
+      $@ =~ s/\s+Trace begun.*//s;
+      dbg("geodb: GeoIP2 isp query failed for $_[1]: $@");
+      return $res;
+    };
+    eval {
+      $res->{asn} = $isp->{autonomous_system_number};
+      $res->{asn_organization} = $isp->{autonomous_system_organization};
+      $res->{isp} = $isp->{isp};
+      $res->{organization} = $isp->{organization};
+      1;
+    };
+    return $res;
+  };
+
+  # asn()
+  $db->{asn} and $dbapi->{asn} = $dbapi->{asn_v6} = sub {
+    my $res = {};
+    my $asn;
+    eval {
+      $asn = $_[0]->{db}->{asn}->record_for_address($_[1]);
+      1;
+    } or do {
+      $@ =~ s/\s+Trace begun.*//s;
+      dbg("geodb: GeoIP2 asn query failed for $_[1]: $@");
+      return $res;
+    };
+    eval {
+      $res->{asn} = $asn->{autonomous_system_number};
+      $res->{asn_organization} = $asn->{autonomous_system_organization};
+      1;
+    };
+    return $res;
+  };
+
+  return ($db, $dbapi);
+}
+
+sub load_geoip {
+  my ($self, $geodb_opts) = @_;
+  my ($db, $dbapi, $ok);
+  my ($gic_wanted, $gic_have, $gip_wanted, $gip_have);
+  my ($flags, $fix_stderr, $can_ipv6);
+
+  # Warn about fatal errors if this module was specifically requested
+  my $errwarn = ($geodb_opts->{module}||'') eq 'geoip';
+
+  eval {
+    require Geo::IP;
+    # need GeoIP C library 1.6.3 and GeoIP perl API 1.4.4 or later to avoid messages leaking - Bug 7153
+    $gip_wanted = version->parse('v1.4.4');
+    $gip_have = version->parse(Geo::IP->VERSION);
+    $gic_wanted = version->parse('v1.6.3');
+    eval { $gic_have = version->parse(Geo::IP->lib_version()); }; # might not have lib_version()
+    $gic_have = 'none' if !defined $gic_have;
+    dbg("geodb: GeoIP: versions: Geo::IP $gip_have, C library $gic_have");
+    $flags = 0;
+    $fix_stderr = 0;
+    if (ref($gic_have) eq 'version') {
+      # this code burps an ugly message if it fails, but that's redirected elsewhere
+      eval '$flags = Geo::IP::GEOIP_SILENCE' if $gip_wanted >= $gip_have;
+      $fix_stderr = $flags && $gic_wanted >= $gic_have;
+    }
+    $can_ipv6 = Geo::IP->VERSION >= 1.39 && Geo::IP->api eq 'CAPI';
+    1;
+  } or do {
+    my $err = $@;
+    $err =~ s/ at .*//s;
+    $err = "geodb: Geo::IP module load failed: $err";
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  };
+
+  my %path;
+  foreach my $dbtype (@geoip_types) {
+    # skip country if city already loaded
+    next if $dbtype eq 'country' && $db->{city};
+    # skip asn if isp already loaded
+    next if $dbtype eq 'asn' && $db->{isp};
+    # skip if not needed
+    next if $geodb_opts->{wanted} && !$geodb_opts->{wanted}->{$dbtype};
+    # only autosearch if no absolute path given
+    if (!defined $geodb_opts->{dbs}->{$dbtype}) {
+      # Try some default locations
+      PATHS_GEOIP: foreach my $p (@{$geodb_opts->{search_path}}) {
+        foreach my $f (@{$geoip_default_files{$dbtype}}) {
+          if (-f "$p/$f") {
+            $path{$dbtype} = "$p/$f";
+            dbg("geodb: GeoIP: search found $dbtype $p/$f");
+            if ($can_ipv6 && $f =~ s/\.(dat)$/v6.$1/i) {
+              if (-f "$p/$f") {
+                $path{$dbtype."_v6"} = "$p/$f";
+                dbg("geodb: GeoIP: search found $dbtype $p/$f");
+              }
+            }
+            last PATHS_GEOIP;
+          }
+        }
+      }
+    } else {
+      if (!-f $geodb_opts->{dbs}->{$dbtype}) {
+        dbg("geodb: GeoIP: $dbtype database requested, but not found: ".
+          $geodb_opts->{dbs}->{$dbtype});
+        next;
+      }
+      $path{$dbtype} = $geodb_opts->{dbs}->{$dbtype};
+    }
+  }
+
+  if (!$can_ipv6) {
+    dbg("geodb: GeoIP: IPv6 support not enabled, versions Geo::IP 1.39, GeoIP C API 1.4.7 required");
+  }
+
+  if ($fix_stderr) {
+    open(OLDERR, ">&STDERR");
+    open(STDERR, ">/dev/null");
+  }
+  foreach my $dbtype (@geoip_types) {
+    next unless defined $path{$dbtype};
+    eval {
+      $db->{$dbtype} = Geo::IP->open($path{$dbtype}, Geo::IP->GEOIP_STANDARD | $flags);
+      if ($can_ipv6 && defined $path{$dbtype."_v6"}) {
+        $db->{$dbtype."_v6"} = Geo::IP->open($path{$dbtype."_v6"}, Geo::IP->GEOIP_STANDARD | $flags);
+      }
+    };
+    if ($@ || !$db->{$dbtype}) {
+      my $err = $@;
+      $err =~ s/ at .*//s;
+      dbg("geodb: GeoIP: database $path{$dbtype} load failed: $err");
+    } else {
+      dbg("geodb: GeoIP: loaded $dbtype from $path{$dbtype}");
+      $ok = 1;
+    }
+  }
+  if ($fix_stderr) {
+    open(STDERR, ">&OLDERR");
+    close(OLDERR);
+  }
+
+  if (!$ok) {
+    warn("geodb: GeoIP requested, but no databases could be loaded\n") if $errwarn;
+    return (undef, undef)
+  }
+
+  # dbinfo_DBTYPE()
+  $db->{city} and $dbapi->{dbinfo_city} = sub {
+    return "Geo::IP IPv4 city: " . ($_[0]->{db}->{city}->database_info || '?')." / IPv6: ".
+      ($_[0]->{db}->{city_v6} ? $_[0]->{db}->{city_v6}->database_info || '?' : 'no')
+  };
+  $db->{country} and $dbapi->{dbinfo_country} = sub {
+    return "Geo::IP IPv4 country: " . ($_[0]->{db}->{country}->database_info || '?')." / IPv6: ".
+      ($_[0]->{db}->{country_v6} ? $_[0]->{db}->{country_v6}->database_info || '?' : 'no')
+  };
+  $db->{isp} and $dbapi->{dbinfo_isp} = sub {
+    return "Geo::IP IPv4 isp: " . ($_[0]->{db}->{isp}->database_info || '?')." / IPv6: ".
+      ($_[0]->{db}->{isp_v6} ? $_[0]->{db}->{isp_v6}->database_info || '?' : 'no')
+  };
+  $db->{asn} and $dbapi->{dbinfo_asn} = sub {
+    return "Geo::IP IPv4 asn: " . ($_[0]->{db}->{asn}->database_info || '?')." / IPv6: ".
+      ($_[0]->{db}->{asn_v6} ? $_[0]->{db}->{asn_v6}->database_info || '?' : 'no')
+  };
+
+  # city()
+  $db->{city} and $dbapi->{city} = sub {
+    my $res = {};
+    my $city;
+    if ($_[1] =~ IS_IPV4_ADDRESS) {
+      $city = $_[0]->{db}->{city}->record_by_addr($_[1]);
+    } elsif ($_[0]->{db}->{city_v6}) {
+      $city = $_[0]->{db}->{city_v6}->record_by_addr_v6($_[1]);
+    }
+    if (!defined $city) {
+      dbg("geodb: GeoIP city query failed for $_[1]");
+      return $res;
+    }
+    $res->{city_name} = $city->city;
+    $res->{country} = $city->country_code;
+    $res->{country_name} = $city->country_name;
+    $res->{continent} = $city->continent_code;
+    return $res;
+  };
+  $dbapi->{city_v6} = $dbapi->{city} if $db->{city_v6};
+
+  # country()
+  $db->{country} and $dbapi->{country} = sub {
+    my $res = {};
+    my $country;
+    eval {
+      if ($_[1] =~ IS_IPV4_ADDRESS) {
+        $country = $_[0]->{db}->{country}->country_code_by_addr($_[1]);
+      } elsif ($_[0]->{db}->{country_v6}) {
+        $country = $_[0]->{db}->{country_v6}->country_code_by_addr_v6($_[1]);
+      }
+      1;
+    };
+    if (!defined $country) {
+      dbg("geodb: GeoIP country query failed for $_[1]");
+      return $res;
+    };
+    $res->{country} = $country || 'XX';
+    $res->{continent} = $country_to_continent{$country} || 'XX';
+    return $res;
+  };
+  $dbapi->{country_v6} = $dbapi->{country} if $db->{country_v6};
+
+  # isp()
+  $db->{isp} and $dbapi->{isp} = sub {
+    my $res = {};
+    my $isp;
+    eval {
+      if ($_[1] =~ IS_IPV4_ADDRESS) {
+        $isp = $_[0]->{db}->{isp}->isp_by_addr($_[1]);
+      } else {
+        # TODO?
+        return $res;
+      }
+      1;
+    };
+    if (!defined $isp) {
+      dbg("geodb: GeoIP isp query failed for $_[1]");
+      return $res;
+    };
+    $res->{isp} = $isp;
+    return $res;
+  };
+
+  # asn()
+  $db->{asn} and $dbapi->{asn} = sub {
+    my $res = {};
+    my $asn;
+    eval {
+      if ($_[1] =~ IS_IPV4_ADDRESS) {
+        $asn = $_[0]->{db}->{asn}->isp_by_addr($_[1]);
+      } else {
+        # TODO?
+        return $res;
+      }
+      1;
+    };
+    if (!defined $asn || $asn !~ /^((?:AS)?\d+)(?:\s+(.+))?/) {
+      dbg("geodb: GeoIP asn query failed for $_[1]");
+      return $res;
+    };
+    $res->{asn} = $1;
+    $res->{asn_organization} = $2 if defined $2;
+    return $res;
+  };
+
+  return ($db, $dbapi);
+}
+
+sub load_dbfile {
+  my ($self, $geodb_opts) = @_;
+  my ($db, $dbapi);
+
+  # Warn about fatal errors if this module was specifically requested
+  my $errwarn = ($geodb_opts->{module}||'') eq 'dbfile';
+
+  if (!defined $geodb_opts->{dbs}->{country}) {
+    my $err = "geodb: IP::Country::DB_File requires geodb_options country:/path/to/ipcc.db";
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  }
+
+  if (!-f $geodb_opts->{dbs}->{country}) {
+    my $err = "geodb: IP::Country::DB_File database not found: ".$geodb_opts->{dbs}->{country};
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  }
+
+  eval {
+    require IP::Country::DB_File;
+    $db->{country} = IP::Country::DB_File->new($geodb_opts->{dbs}->{country});
+    1;
+  };
+  if ($@ || !$db->{country}) {
+    my $err = $@;
+    $err =~ s/ at .*//s;
+    $err = "geodb: IP::Country::DB_File country load failed: $err";
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  } else {
+    dbg("geodb: IP::Country::DB_File loaded country from ".$geodb_opts->{dbs}->{country});
+  }
+
+  # dbinfo_DBTYPE()
+  $db->{country} and $dbapi->{dbinfo_country} = sub {
+    return "IP::Country::DB_File country: ".localtime($_[0]->{db}->{country}->db_time());
+  };
+
+  # country();
+  $db->{country} and $dbapi->{country} = $dbapi->{country_v6} = sub {
+    my $res = {};
+    my $country;
+    if ($_[1] =~ IS_IPV4_ADDRESS) {
+      $country = $_[0]->{db}->{country}->inet_atocc($_[1]);
+    } else {
+      $country = $_[0]->{db}->{country}->inet6_atocc($_[1]);
+    }
+    if (!defined $country) {
+      dbg("geodb: IP::Country::DB_File country query failed for $_[1]");
+      return $res;
+    };
+    $res->{country} = $country || 'XX';
+    $res->{continent} = $country_to_continent{$country} || 'XX';
+    return $res;
+  };
+
+  return ($db, $dbapi);
+}
+
+sub load_fast {
+  my ($self, $geodb_opts) = @_;
+  my ($db, $dbapi);
+
+  # Warn about fatal errors if this module was specifically requested
+  my $errwarn = ($geodb_opts->{module}||'') eq 'fast';
+
+  eval {
+    require IP::Country::Fast;
+    $db->{country} = IP::Country::Fast->new();
+    1;
+  };
+  if ($@ || !$db->{country}) {
+    my $err = $@;
+    $err =~ s/ at .*//s;
+    $err = "geodb: IP::Country::Fast load failed: $err";
+    $errwarn ? warn("$err\n") : dbg($err);
+    return (undef, undef);
+  }
+
+  # dbinfo_DBTYPE()
+  $db->{country} and $dbapi->{dbinfo_country} = sub {
+    return "IP::Country::Fast country: ".localtime($_[0]->{db}->{country}->db_time());
+  };
+
+  # country();
+  $db->{country} and $dbapi->{country} = sub {
+    my $res = {};
+    my $country;
+    if ($_[1] =~ IS_IPV4_ADDRESS) {
+      $country = $_[0]->{db}->{country}->inet_atocc($_[1]);
+    } else {
+      return $res;
+    }
+    if (!defined $country) {
+      dbg("geodb: IP::Country::Fast country query failed for $_[1]");
+      return $res;
+    };
+    $res->{country} = $country || 'XX';
+    $res->{continent} = $country_to_continent{$country} || 'XX';
+    return $res;
+  };
+
+  return ($db, $dbapi);
+}
+
+# return array, infoline per database type
+sub get_dbinfo {
+  my ($self, $db) = @_;
+
+  my @lines;
+  foreach (@geoip_types) {
+    if (exists $self->{dbapi}->{"dbinfo_".$_}) {
+      push @lines,
+        $self->{dbapi}->{"dbinfo_".$_}->($self) || "$_ failed";
+    }
+  }
+
+  return \@lines;
+}
+
+sub get_country {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  if ($ip =~ IS_IP_PRIVATE) {
+    return '**';
+  }
+
+  if ($ip !~ IS_IP_ADDRESS) {
+    $ip = name_to_ip($ip);
+    return 'XX' if !defined $ip;
+  }
+  
+  if ($self->{dbapi}->{city}) {
+    return $self->_get('city',$ip)->{country} || 'XX';
+  } elsif ($self->{dbapi}->{country}) {
+    return $self->_get('country',$ip)->{country} || 'XX';
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_continent {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  # If it's already CC, use our own lookup table..
+  if (length($ip) == 2) {
+    return $country_to_continent{uc($ip)} || 'XX';
+  }
+
+  if ($self->{dbapi}->{city}) {
+    return $self->_get('city',$ip)->{continent} || 'XX';
+  } elsif ($self->{dbapi}->{country}) {
+    return $self->_get('country',$ip)->{continent} || 'XX';
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_isp {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  if ($self->{dbapi}->{isp}) {
+    return $self->_get('isp',$ip)->{isp};
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_isp_org {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  if ($self->{dbapi}->{isp}) {
+    return $self->_get('isp',$ip)->{organization};
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_asn {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  if ($self->{dbapi}->{asn}) {
+    return $self->_get('asn',$ip)->{asn};
+  } elsif ($self->{dbapi}->{isp}) {
+    return $self->_get('isp',$ip)->{asn};
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_asn_org {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  if ($self->{dbapi}->{asn}) {
+    return $self->_get('asn',$ip)->{asn_organization};
+  } elsif ($self->{dbapi}->{isp}) {
+    return $self->_get('isp',$ip)->{asn_organization};
+  } else {
+    return undef; ## no critic (ProhibitExplicitReturnUndef)
+  }
+}
+
+sub get_all {
+  my ($self, $ip) = @_;
+
+  return undef if !defined $ip || $ip !~ /\S/; ## no critic (ProhibitExplicitReturnUndef)
+
+  my $all = {};
+
+  if ($ip =~ IS_IP_PRIVATE) {
+    return { 'country' => '**' };
+  }
+
+  if ($ip !~ IS_IP_ADDRESS) {
+    $ip = name_to_ip($ip);
+    if (!defined $ip) {
+      return { 'country' => 'XX' };
+    }
+  }
+  
+  if ($self->{dbapi}->{city}) {
+    my $res = $self->_get('city',$ip);
+    $all->{$_} = $res->{$_} foreach (keys %$res);
+  } elsif ($self->{dbapi}->{country}) {
+    my $res = $self->_get('country',$ip);
+    $all->{$_} = $res->{$_} foreach (keys %$res);
+  }
+
+  if ($self->{dbapi}->{isp}) {
+    my $res = $self->_get('isp',$ip);
+    $all->{$_} = $res->{$_} foreach (keys %$res);
+  }
+
+  if ($self->{dbapi}->{asn}) {
+    my $res = $self->_get('asn',$ip);
+    $all->{$_} = $res->{$_} foreach (keys %$res);
+  }
+
+  return $all;
+}
+
+sub can {
+  my ($self, $check) = @_;
+
+  return defined $self->{dbapi}->{$check};
+}
+
+# TODO: use SA internal dns synchronously?
+# This shouldn't be called much, as plugins
+# should do their own resolving if needed
+sub name_to_ip {
+  my $name = shift;
+  if (my $ip = inet_aton($name)) {
+    $ip = inet_ntoa($ip);
+    dbg("geodb: resolved internally $name: $ip");
+    return $ip;
+  }
+  dbg("geodb: failed to internally resolve $name");
+  return undef; ## no critic (ProhibitExplicitReturnUndef)
+}
+
+sub _get {
+  my ($self, $type, $ip) = @_;
+
+  # reset cache at 100 ips
+  if (scalar keys %{$self->{cache}} >= 100) {
+    $self->{cache} = ();
+  }
+
+  if (!exists $self->{cache}{$ip}{$type}) {
+    if ($self->{dbapi}->{$type}) {
+      $self->{cache}{$ip}{$type} = $self->{dbapi}->{$type}->($self,$ip);
+    } else {
+      return undef; ## no critic (ProhibitExplicitReturnUndef)
+    }
+  }
+
+  return $self->{cache}{$ip}{$type};
+}
+
+1;
--- a/upstream/lib/Mail/SpamAssassin/HTML.pm
+++ b/upstream/lib/Mail/SpamAssassin/HTML.pm
@ -24,9 +24,6 @@ use strict;
 use warnings;
 use re 'taint';

-require 5.008;     # need basic Unicode support for HTML::Parser::utf8_mode
-# require 5.008008;  # Bug 3787; [perl #37950]: Malformed UTF-8 character ...
-
 use HTML::Parser 3.43 ();
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Constants qw(:sa);
@ -66,7 +63,7 @@ my %elements_whitespace = map {; $_ => 1 }

 # elements that push URIs
 my %elements_uri = map {; $_ => 1 }
-  qw( body table tr td a area link img frame iframe embed script form base bgsound ),
+  qw( body table tr td a area link img frame iframe embed script form base bgsound meta ),
 ;

 # style attribute not accepted
@ -248,6 +245,18 @@ sub parse {
  # the HTML::Parser API won't do it for us
  $text =~ s/<(\w+)\s*\/>/<$1>/gi;

+  # Normalize unicode quotes, messes up attributes parsing
+  # U+201C e2 80 9c LEFT DOUBLE QUOTATION MARK
+  # U+201D e2 80 9d RIGHT DOUBLE QUOTATION MARK
+  # Examples of input:
+  # <a href=\x{E2}\x{80}\x{9D}https://foobar.com\x{E2}\x{80}\x{9D}>
+  # .. results in uri "\x{E2}\x{80}\x{9D}https://foobar.com\x{E2}\x{80}\x{9D}"
+  if (utf8::is_utf8($text)) {
+    $text =~ s/(?:\x{201C}|\x{201D})/"/g;
+  } else {
+    $text =~ s/\x{E2}\x{80}(?:\x{9C}|\x{9D})/"/g;
+  }
+
  if (!$self->UNIVERSAL::can('utf8_mode')) {
    # utf8_mode is cleared by default, only warn if it would need to be set
    warn "message: cannot set utf8_mode, module HTML::Parser is too old\n"
@ -352,8 +361,8 @@ sub canon_uri {
  my ($self, $uri) = @_;

  # URIs don't have leading/trailing whitespace ...
-  $uri =~ s/^\s+//;
-  $uri =~ s/\s+$//;
+  $uri =~ s/^[\s\xA0]+//;
+  $uri =~ s/[\s\xA0]+$//;

  # Make sure all the URIs are nice and short
  if (length $uri > MAX_URI_LENGTH) {
@ -414,6 +423,17 @@ sub html_uri {
      }
    }
  }
+  elsif ($tag eq "meta" &&
+    exists $attr->{'http-equiv'} &&
+    exists $attr->{content} &&
+    $attr->{'http-equiv'} =~ /refresh/i &&
+    $attr->{content} =~ /\burl\s*=/i)
+  {
+      my $uri = $attr->{content};
+      $uri =~ s/^.*\burl\s*=\s*//i;
+      $uri =~ s/\s*;.*//i;
+      $self->push_uri($tag, $uri);
+  }
 }

 # this might not be quite right, may need to pay attention to table nesting
@ -516,7 +536,7 @@ sub text_style {
 	    my $whcolor = $1 ? 'bgcolor' : 'fgcolor';
 	    my $value = lc $2;

-	    if ($value =~ /rgb/) {
+	    if (index($value, 'rgb') >= 0) {
 	      $value =~ tr/0-9,//cd;
 	      my @rgb = split(/,/, $value);
              $new{$whcolor} = sprintf("#%02x%02x%02x",
@ -705,6 +725,8 @@ sub html_tests {
  {
    $self->{charsets} .= exists $self->{charsets} ? " $1" : $1;
  }
+
+  # todo: capture URI from meta refresh tag
 }

 sub display_text {
@ -1154,7 +1176,7 @@ sub _merge_uri {
    return "/" . $r_path;
  }
  else {
-    if ($base_path =~ m|/|) {
+    if (index($base_path, '/') >= 0) {
      $base_path =~ s|(?<=/)[^/]*$||;
    }
    else {
--- a/upstream/lib/Mail/SpamAssassin/Locales.pm
+++ b/upstream/lib/Mail/SpamAssassin/Locales.pm
@ -78,9 +78,6 @@ sub is_charset_ok_for_locales {
  $cs =~ s/:.*$//gs;            # trim off multiple charsets, just use 1st
  dbg ("locales: is $cs ok for @locales?");

-  study $cs;  # study is a no-op since perl 5.16.0, eliminating related bugs
-  #warn "JMD $cs";
-
  # always OK (the net speaks mostly roman charsets)
  return 1 if ($cs eq 'USASCII');
  return 1 if ($cs eq 'ASCII');
--- a/upstream/lib/Mail/SpamAssassin/Locker.pm
+++ b/upstream/lib/Mail/SpamAssassin/Locker.pm
@ -69,6 +69,11 @@ sub jittery_one_second_sleep {
  Time::HiRes::sleep(rand(1.0) + 0.5);
 }

+sub jittery_half_second_sleep {
+  my ($self) = @_;
+  Time::HiRes::sleep(rand(0.5) + 0.25);
+}
+
 ###########################################################################

 1;
--- a/upstream/lib/Mail/SpamAssassin/Locker/Flock.pm
+++ b/upstream/lib/Mail/SpamAssassin/Locker/Flock.pm
@ -55,7 +55,7 @@ sub safe_lock {

  my $lock_file = "$path.mutex";
  my $umask = umask(~$mode);
-  my $fh = new IO::File();
+  my $fh = IO::File->new;

  if (!$fh->open ($lock_file, O_RDWR|O_CREAT)) {
      umask $umask; # just in case
--- a/upstream/lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
+++ b/upstream/lib/Mail/SpamAssassin/Locker/UnixNFSSafe.pm
@ -29,6 +29,7 @@ use Mail::SpamAssassin::Logger;
 use File::Spec;
 use Time::Local;
 use Fcntl qw(:DEFAULT :flock);
+use Errno qw(EEXIST);

 our @ISA = qw(Mail::SpamAssassin::Locker);

@ -60,7 +61,7 @@ sub safe_lock {
  $max_retries ||= 30;
  $mode ||= "0700";
  $mode = (oct $mode) & 0666;
-  dbg ("locker: mode is $mode");
+  dbg ("locker: mode is %03o", $mode);

  my $lock_file = "$path.lock";
  my $hname = Mail::SpamAssassin::Util::fq_hostname();
@ -76,11 +77,11 @@ sub safe_lock {
      die "locker: safe_lock: cannot create tmp lockfile $lock_tmp for $lock_file: $!\n";
  }
  umask $umask;
-  autoflush LTMP 1;
+  LTMP->autoflush(1);
  dbg("locker: safe_lock: created $lock_tmp");

-  for (my $retries = 0; $retries < $max_retries; $retries++) {
-    if ($retries > 0) { $self->jittery_one_second_sleep(); }
+  for (my $retries = 0; $retries < $max_retries * 2; $retries++) {
+    if ($retries > 0) { $self->jittery_half_second_sleep(); }
    print LTMP "$hname.$$\n"  or warn "Error writing to $lock_tmp: $!";
    dbg("locker: safe_lock: trying to get lock on $path with $retries retries");
    if (link($lock_tmp, $lock_file)) {
@ -88,6 +89,10 @@ sub safe_lock {
      $is_locked = 1;
      last;
    }
+    # if lock exists, it's already likely locked, no point complaining here
+    unless ($!{EEXIST}) {
+      warn "locker: creating link $lock_file to $lock_tmp failed: '$!'";
+    }
    # link _may_ return false even if the link _is_ created
    @stat = lstat($lock_tmp);
    @stat  or warn "locker: error accessing $lock_tmp: $!";
@ -149,7 +154,7 @@ sub safe_unlock {
    warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp: $!";
    return;
  } else {
-    autoflush LTMP 1;
+    LTMP->autoflush(1);
    print LTMP "\n"  or warn "Error writing to $lock_tmp: $!";

    if (!(@stat_ourtmp = stat(LTMP)) || (scalar(@stat_ourtmp) < 11)) {
@ -157,7 +162,7 @@ sub safe_unlock {
      warn "locker: safe_unlock: failed to create lock tmpfile $lock_tmp";
      close LTMP  or die "error closing $lock_tmp: $!";
      unlink($lock_tmp)
-        or warn "locker: safe_lock: unlink of lock file failed: $!\n";
+        or warn "locker: safe_lock: unlink of lock file $lock_tmp failed: $!\n";
      return;
    }
  }
@ -169,7 +174,7 @@ sub safe_unlock {

  close LTMP  or die "error closing $lock_tmp: $!";
  unlink($lock_tmp)
-    or warn "locker: safe_lock: unlink of lock file failed: $!\n";
+    or warn "locker: safe_lock: unlink of lock file $lock_tmp failed: $!\n";

  # 2. If the ctime hasn't been modified, unlink the file and return. If the
  # lock has expired, sleep the usual random interval before returning. If we
@ -191,7 +196,7 @@ sub safe_unlock {
  {
    # things are good: the ctimes match so it was our lock
    unlink($lock_file)
-      or warn "locker: safe_unlock: unlink failed: $lock_file\n";
+      or warn "locker: safe_unlock: unlinking $lock_file failed: $!\n";
    dbg("locker: safe_unlock: unlink $lock_file");

    if ($ourtmp_ctime >= $lock_ctime + LOCK_MAX_AGE) {
--- a/upstream/lib/Mail/SpamAssassin/Locker/Win32.pm
+++ b/upstream/lib/Mail/SpamAssassin/Locker/Win32.pm
@ -72,7 +72,7 @@ sub safe_lock {
      return 1;
    }
    my @stat = stat($lock_file);
-    @stat  or warn "locker: error accessing $lock_file: $!";
+    @stat  or dbg("locker: error accessing $lock_file: $!");

    # check age of lockfile ctime
    my $age = ($#stat < 11 ? undef : $stat[10]);
--- a/upstream/lib/Mail/SpamAssassin/Logger.pm
+++ b/upstream/lib/Mail/SpamAssassin/Logger.pm
@ -75,7 +75,22 @@ $LOG_SA{facility} = {};		# no dbg facilities turned on

 # always log to stderr initially
 use Mail::SpamAssassin::Logger::Stderr;
-$LOG_SA{method}->{stderr} = Mail::SpamAssassin::Logger::Stderr->new();
+$LOG_SA{method}->{stderr} =
+  Mail::SpamAssassin::Logger::Stderr->new(escape =>
+    exists $ENV{'SA_LOGGER_ESCAPE'} ? $ENV{'SA_LOGGER_ESCAPE'} : 1
+  );
+
+# Use of M:SA:Util causes circular dependencies, separate helper here.
+my %escape_map =
+  ("\r" => '\\r', "\n" => '\\n', "\t" => '\\t', "\\" => '\\\\');
+sub escape_str {
+  # Things are already forced as octets by _log, no utf8::encode needed
+  # Control chars, DEL, backslash
+  $_[0] =~ s@
+    ( [\x00-\x1F\x7F\x80-\xFF\\] )
+    @ $escape_map{$1} || sprintf("\\x{%02X}",ord($1))
+    @egsx;
+}

 =head1 METHODS

@ -165,7 +180,7 @@ sub log_message {
    # don't log them -- this is caller 0, the use'ing package is 1, the eval is 2
    my @caller = caller 2;
    return if (defined $caller[3] && defined $caller[0] &&
-		       $caller[3] =~ /^\(eval\)$/ &&
+		       $caller[3] eq '(eval)' &&
 		       $caller[0] =~ m#^Mail::SpamAssassin(?:$|::)#);
  }

@ -215,13 +230,15 @@ sub _log_message {
  foreach my $line (split(/\n/, $_[1])) {
    # replace control characters with "_", tabs and spaces get
    # replaced with a single space.
-    $line =~ tr/\x09\x20\x00-\x1f/  _/s;
+    # Deprecated here, see new Bug 6583 escaping in Logger/*.pm modules
+    #$line =~ tr/\x09\x20\x00-\x1f/  _/s;
+
    if ($first) {
      $first = 0;
    } else {
-      local $1;
      $line =~ s/^([^:]+?):/$1: [...]/;
    }
+
    while (my ($name, $object) = each %{ $LOG_SA{method} }) {
      $object->log_message($_[0], $line, $_[2]);
    }
@ -274,6 +291,9 @@ sub _log {
  }

  my ($level, $message, @args) = @_;
+
+  utf8::encode($message)  if utf8::is_utf8($message); # handle as octets
+
  $message =~ s/^(?:[a-z0-9_-]*):\s*//i;

  $message = sprintf($message,@args)  if @args;
@ -284,18 +304,28 @@ sub _log {
  log_message(($level == INFO ? "info" : "dbg"), $message);
 }

-=item add(method => 'syslog', socket => $socket, facility => $facility)
+=item add(method => 'syslog', socket => $socket, facility => $facility, escape => $escape)

 C<socket> is the type the syslog ("unix" or "inet").  C<facility> is the
 syslog facility (typically "mail").

-=item add(method => 'file', filename => $file)
+If optional C<escape> is true, all non-ascii characters are escaped for safe
+output: backslashes change to \\ and non-ascii chars to \x{XX} or \x{XXXX}
+(Unicode).  If not defined, pre-4.0 style sanitizing is used
+( tr/\x09\x20\x00-\x1f/_/s ).

-C<filename> is the name of the log file.
+Escape value can be overridden with environment variable
+C<SA_LOGGER_ESCAPE>.

-=item add(method => 'stderr')
+=item add(method => 'file', filename => $file, escape => $escape)
+
+C<filename> is the name of the log file.  C<escape> works as described
+above.
+
+=item add(method => 'stderr', escape => $escape)

 No options are needed for stderr logging, just don't close stderr first. 
+C<escape> works as described above.

 =cut

@ -307,6 +337,10 @@ sub add {

  return 0 if $class !~ /^\w+$/; # be paranoid

+  if (exists $ENV{'SA_LOGGER_ESCAPE'}) {
+    $params{escape} = $ENV{'SA_LOGGER_ESCAPE'}
+  }
+
  eval 'use Mail::SpamAssassin::Logger::'.$class.'; 1'
  or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
--- a/upstream/lib/Mail/SpamAssassin/Logger/File.pm
+++ b/upstream/lib/Mail/SpamAssassin/Logger/File.pm
@ -37,13 +37,17 @@ use re 'taint';
 use POSIX ();
 use Time::HiRes ();
 use Mail::SpamAssassin::Logger;
-use Mail::SpamAssassin::Util qw(am_running_on_windows);

 our @ISA = ();

 # ADDING OS-DEPENDENT LINE TERMINATOR - BUG 6456
+
+# Using Mail::SpamAssassin::Util::am_running_on_windows() leads to circular
+# dependencies. So, we are duplicating the code instead.
+use constant RUNNING_ON_WINDOWS => ($^O =~ /^(?:mswin|dos|os2)/oi);
+
 my $eol = "\n";
-if (am_running_on_windows()) {
+if (RUNNING_ON_WINDOWS) {
  $eol = "\r\n";
 }

@ -58,6 +62,7 @@ sub new {
  my %params = @_;
  $self->{filename} = $params{filename} || 'spamassassin.log';
  $self->{timestamp_fmt} = $params{timestamp_fmt};
+  $self->{escape} = $params{escape} if exists $params{escape};

  if (! $self->init()) {
    die "logger: file initialization failed$eol";
@ -101,6 +106,16 @@ sub log_message {
  }
  $timestamp .= ' '  if $timestamp ne '';

+  if ($self->{escape}) {
+    # Bug 6583, escape
+    Mail::SpamAssassin::Logger::escape_str($msg);
+  } elsif (!exists $self->{escape}) {
+    # Backwards compatible pre-4.0 escaping, if $escape not given.
+    # replace control characters with "_", tabs and spaces get
+    # replaced with a single space.
+    $msg =~ tr/\x09\x20\x00-\x1f/  _/s;
+  }
+
  my($nwrite) = syswrite(STDLOG, sprintf("%s[%s] %s: %s%s",
                                         $timestamp, $$, $level, $msg, $eol));
  defined $nwrite  or warn "error writing to log file: $!";
--- a/upstream/lib/Mail/SpamAssassin/Logger/Stderr.pm
+++ b/upstream/lib/Mail/SpamAssassin/Logger/Stderr.pm
@ -36,6 +36,7 @@ use re 'taint';

 use POSIX ();
 use Time::HiRes ();
+use Mail::SpamAssassin::Logger;

 our @ISA = ();

@ -59,6 +60,7 @@ sub new {

  my %params = @_;
  $self->{timestamp_fmt} = $params{timestamp_fmt};
+  $self->{escape} = $params{escape} if exists $params{escape};

  return($self);
 }
@ -83,6 +85,16 @@ sub log_message {
  }
  $timestamp .= ' '  if $timestamp ne '';

+  if ($self->{escape}) {
+    # Bug 6583, escape
+    Mail::SpamAssassin::Logger::escape_str($msg);
+  } elsif (!exists $self->{escape}) {
+    # Backwards compatible pre-4.0 escaping, if $escape not given.
+    # replace control characters with "_", tabs and spaces get
+    # replaced with a single space.
+    $msg =~ tr/\x09\x20\x00-\x1f/  _/s;
+  }
+
  my($nwrite) = syswrite(STDERR, sprintf("%s[%d] %s: %s%s",
                                         $timestamp, $$, $level, $msg, $eol));
  defined $nwrite  or warn "error writing to log file: $!";
--- a/upstream/lib/Mail/SpamAssassin/Logger/Syslog.pm
+++ b/upstream/lib/Mail/SpamAssassin/Logger/Syslog.pm
@ -73,6 +73,7 @@ sub new {
  $self->{log_socket} = $params{socket};
  $self->{log_facility} = $params{facility};
  $self->{timestamp_fmt} = $params{timestamp_fmt};
+  $self->{escape} = $params{escape} if exists $params{escape};

  if (! $self->init()) {
    die "logger: syslog initialization failed\n";
@ -148,6 +149,16 @@ sub log_message {
    $msg = '(bad prio: ' . $_[1] . ') ' . $msg;
  }

+  if ($self->{escape}) {
+    # Bug 6583, escape
+    Mail::SpamAssassin::Logger::escape_str($msg);
+  } elsif (!exists $self->{escape}) {
+    # Backwards compatible pre-4.0 escaping, if $escape not given
+    # replace control characters with "_", tabs and spaces get
+    # replaced with a single space.
+    $msg =~ tr/\x09\x20\x00-\x1f/  _/s;
+  }
+
  # install a new handler for SIGPIPE -- this signal has been
  # found to occur with syslog-ng after syslog-ng restarts.
  local $SIG{'PIPE'} = sub {
--- a/upstream/lib/Mail/SpamAssassin/Message.pm
+++ b/upstream/lib/Mail/SpamAssassin/Message.pm
@ -46,10 +46,8 @@ use strict;
 use warnings;
 use re 'taint';

-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) }
-}
+use Digest::SHA qw(sha1 sha1_hex);
+use Scalar::Util qw(tainted);

 use Mail::SpamAssassin;
 use Mail::SpamAssassin::Message::Node;
@ -159,7 +157,7 @@ sub new {
  if (ref $message eq 'ARRAY') {
     @message = @{$message};
  }
-  elsif (ref($message) eq 'GLOB' || ref($message) =~ /^IO::/) {
+  elsif (ref($message) eq 'GLOB' || index(ref($message), 'IO::') == 0) {
    if (defined fileno $message) {

      # sysread+split avoids a Perl I/O bug (Bug 5985)
@ -207,14 +205,14 @@ sub new {
  # messages? Tainting the message is important because it prevents certain
  # exploits later.
  if (Mail::SpamAssassin::Util::am_running_in_taint_mode() &&
-        grep { !Scalar::Util::tainted($_) } @message) {
+        grep { !tainted($_) } @message) {
    local($_);
    # To preserve newlines, no joining and splitting here, process each line
    # directly as is.
    foreach (@message) {
      $_ = Mail::SpamAssassin::Util::taint_var($_);
    }
-    if (grep { !Scalar::Util::tainted($_) } @message) {
+    if (grep { !tainted($_) } @message) {
      die "Mail::SpamAssassin::Message failed to enforce message taintness";
    }
  }
@ -256,7 +254,7 @@ sub new {
  # bug 4363
  # Check to see if we should do CRLF instead of just LF
  # For now, just check the first and last line and do whatever it does
-  if (@message && ($message[0] =~ /\015\012/ || $message[-1] =~ /\015\012/)) {
+  if (index($message[0], "\015\012") != -1 || index($message[-1], "\015\012") != -1) {
    $self->{line_ending} = "\015\012";
    dbg("message: line ending changed to CRLF");
  }
@ -270,7 +268,12 @@ sub new {
  for (;;) {
    # make sure not to lose the last header field when there is no body
    my $eof = !@message;
-    my $current = $eof ? "\n" : shift @message;
+    my $current = $eof ? $self->{line_ending} : shift @message;
+
+    # Bug 7785: spamass-milter breaks wrapped headers, add any missing \r
+    if ($squash_crlf) {
+      $current =~ s/(?<!\015)\012/\015\012/gs;
+    }

    if ( $current =~ /^[ \t]/ ) {
      # This wasn't useful in terms of a rule, but we may want to treat it
@ -306,7 +309,7 @@ sub new {
        }
      }

-      if ($current =~ /^\r?$/) {  # a regular end of a header section
+      if ($current eq $self->{line_ending}) {  # a regular end of a header section
 	if ($eof) {
 	  $self->{'missing_head_body_separator'} = 1;
 	} else {
@ -395,7 +398,7 @@ sub new {
  # either a blank line or the boundary (if defined), insert a blank line
  # to ensure proper parsing - do not consider MIME headers at the beginning of the body
  # to be part of the message headers.
-  if ($self->{'type'} =~ /^multipart\//i && $#message > 0 && $message[0] =~ /\S/)
+  if (index($self->{'type'}, 'multipart/') == 0 && $#message > 0 && $message[0] =~ /\S/)
  {
    if (!defined $boundary || $message[0] !~ /^--\Q$boundary\E/)
    {
@ -537,6 +540,77 @@ sub get_pristine_body {
  return $self->{pristine_body};
 }

+=item get_pristine_body_digest()
+
+Returns SHA1 hex digest of the pristine message body.
+CRLF line endings are normalized to LF before hashing.
+
+=cut
+
+sub get_pristine_body_digest {
+  my ($self) = @_;
+
+  return $self->{pristine_body_digest} if exists $self->{pristine_body_digest};
+
+  if ($self->{line_ending} eq "\015\012") {
+    # Don't make a copy, process line by line to save memory
+    # CRLF should be exception, so it's not that critical here
+    my $sha = Digest::SHA->new('sha1');
+    while ($self->{pristine_body} =~ /(.*?)(\015\012)?/gs) {
+      $sha->add($1.(defined $2 ? "\012" : ""));
+    }
+    $self->{pristine_body_digest} = $sha->hexdigest;
+  } else {
+    $self->{pristine_body_digest} = sha1_hex($self->{pristine_body});
+  }
+
+  dbg("message: pristine body digest: ".$self->{pristine_body_digest});
+  return $self->{pristine_body_digest};
+}
+
+# ---------------------------------------------------------------------------
+
+=item get_msgid()
+
+Returns Message-ID header for the message, with <> and surrounding
+whitespace removed. Returns undef, if nothing found between <>.
+
+=cut
+
+sub get_msgid {
+  my ($self) = @_;
+
+  my $msgid = $self->get_header("Message-Id");
+  if (defined $msgid && $msgid =~ /^\s*<(.+)>\s*$/s) {
+    return $1;
+  } else {
+    return;
+  }
+}
+
+=item generate_msgid()
+
+Generate a calculated "Message-ID" in B<sha1hex@sa_generated> format, using
+To, Date headers and pristine body as source for hashing.
+
+=cut
+
+sub generate_msgid {
+  my ($self) = @_;
+
+  return $self->{msgid_generated} if exists $self->{msgid_generated};
+
+  # See Bug 5185, not using Received headers etc anymore
+  my $to = $self->get_header("To") || '';
+  my $date = $self->get_header("Date") || '';
+  my $body_digest = $self->get_pristine_body_digest();
+
+  $self->{msgid_generated} =
+    sha1_hex($to."\000".$date."\000".$body_digest).'@sa_generated';
+
+  return $self->{msgid_generated};
+}
+
 # ---------------------------------------------------------------------------

 =item extract_message_metadata($permsgstatus)
@ -706,6 +780,7 @@ sub finish {
 # temporary files are deleted even if the finish() method is omitted
 sub DESTROY {
  my $self = shift;
+
  # best practices: prevent potential calls to eval and to system routines
  # in code of a DESTROY method from clobbering global variables $@ and $! 
  local($@,$!);  # keep outer error handling unaffected by DESTROY
@ -774,7 +849,7 @@ sub parse_body {
    #
    my ($msg, $boundary, $body, $subparse) = @$toparse;

-    if ($msg->{'type'} =~ m{^multipart/}i && defined $boundary && $subparse > 0) {
+    if (index($msg->{'type'}, 'multipart/') == 0 && defined $boundary && $subparse > 0) {
      $self->_parse_multipart($toparse);
    }
    else {
@ -782,7 +857,8 @@ sub parse_body {
      $self->_parse_normal($toparse);

      # bug 5041: process message/*, but exclude message/partial content types
-      if ($msg->{'type'} =~ m{^message/(?!partial\z)}i && $subparse > 0)
+      if (index($msg->{'type'}, 'message/') == 0 &&
+          $msg->{'type'} ne 'message/partial' && $subparse > 0)
      {
        # Just decode the part, but we don't need the resulting string here.
        $msg->decode(0);
@ -798,7 +874,7 @@ sub parse_body {
        # bug 5051, bug 3748: check $msg->{decoded}: sometimes message/* parts
        # have no content, and we get stuck waiting for STDIN, which is bad. :(

-        if ($msg->{'type'} =~ m{^message/(?:rfc822|global)\z}i &&
+        if (($msg->{'type'} eq 'message/rfc822' || $msg->{'type'} eq 'message/global') &&
            defined $msg->{'decoded'} && $msg->{'decoded'} ne '')
        {
 	  # Ok, so this part is still semi-recursive, since M::SA::Message
@ -857,6 +933,7 @@ sub _parse_multipart {
  my($self, $toparse) = @_;

  my ($msg, $boundary, $body, $subparse) = @{$toparse};
+  my $nested_boundary = 0;

  # we're not supposed to be a leaf, so prep ourselves
  $msg->{'body_parts'} = [];
@ -907,6 +984,7 @@ sub _parse_multipart {
  my $header;
  my $part_array;
  my $found_end_boundary;
+  my $found_last_end_boundary;
  my $partcnt = 0;

  my $line_count = @{$body};
@ -915,7 +993,12 @@ sub _parse_multipart {
    # deal with the mime part;
    # a triage before an unlikely-to-match regexp avoids a CPU hotspot
    $found_end_boundary = defined $boundary && substr($_,0,2) eq '--'
-                          && /^--\Q$boundary\E(?:--)?\s*$/;
+                          && /^--\Q$boundary\E(--)?\s*$/;
+    $found_last_end_boundary = $found_end_boundary && $1;
+    if ($found_end_boundary && $nested_boundary) {
+      $found_end_boundary = 0;
+      $nested_boundary = 0 if ($found_last_end_boundary); # bug 7358 - handle one level of non-unique boundary string
+    }
    if ( --$line_count == 0 || $found_end_boundary ) {
      my $line = $_; # remember the last line

@ -951,8 +1034,20 @@ sub _parse_multipart {
        $part_array = [];
      }

-      my($p_boundary);
-      ($part_msg->{'type'}, $p_boundary) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
+      ($part_msg->{'type'}, my $p_boundary, undef, undef, my $ct_was_missing) =
+          Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
+
+      # bug 5741: if ct was missing and parent == multipart/digest, then
+      # type should be set as message/rfc822
+      if ($ct_was_missing) {
+        if ($msg->{'type'} eq 'multipart/digest') {
+          dbg("message: missing type, setting multipart/digest child as message/rfc822");
+          $part_msg->{'type'} = 'message/rfc822';
+        } else {
+          dbg("message: missing type, setting as default text/plain");
+        }
+      }
+
      $p_boundary ||= $boundary;
      dbg("message: found part of type ".$part_msg->{'type'}.", boundary: ".(defined $p_boundary ? $p_boundary : ''));

@ -962,12 +1057,8 @@ sub _parse_multipart {
      push(@{$self->{'parse_queue'}}, [ $part_msg, $p_boundary, $part_array, $subparse ]);
      $msg->add_body_part($part_msg);

-      # rfc 1521 says /^--boundary--$/, some MUAs may just require /^--boundary--/
-      # but this causes problems with horizontal lines when the boundary is
-      # made up of dashes as well, etc.
      if (defined $boundary) {
-        # no re "strict";  # since perl 5.21.8: Ranges of ASCII printables...
-        if ($line =~ /^--\Q${boundary}\E--\s*$/) {
+        if ($found_last_end_boundary) {
 	  # Make a note that we've seen the end boundary
 	  $self->{mime_boundary_state}->{$boundary}--;
          last;
@ -1016,6 +1107,12 @@ sub _parse_multipart {
        if ($header) {
          my ( $key, $value ) = split ( /:\s*/, $header, 2 );
          $part_msg->header( $key, $value );
+          if (defined $boundary && lc $key eq 'content-type') {
+	    my (undef, $nested_bound) = Mail::SpamAssassin::Util::parse_content_type($part_msg->header('content-type'));
+            if (defined $nested_bound && $nested_bound eq $boundary) {
+       	      $nested_boundary = 1;
+            }
+          }
        }
        $in_body = 1;

@ -1070,12 +1167,18 @@ sub _parse_normal {

  dbg("message: parsing normal part");

-  # 0: content-type, 1: boundary, 2: charset, 3: filename
+  # 0: content-type, 1: boundary, 2: charset, 3: filename 4: ct_missing
  my @ct = Mail::SpamAssassin::Util::parse_content_type($msg->header('content-type'));

  # multipart sections are required to have a boundary set ...  If this
  # one doesn't, assume it's malformed and revert to text/plain
-  $msg->{'type'} = ($ct[0] !~ m@^multipart/@i || defined $boundary ) ? $ct[0] : 'text/plain';
+  # bug 5741: don't overwrite the default type assigned by _parse_multipart()
+  if (!$ct[4]) {
+    $msg->{'type'} = (index($ct[0], 'multipart/') != 0 || defined $boundary) ?
+      $ct[0] : 'text/plain'
+  } else {
+    dbg("message: missing type, setting previous multipart type: %s", $msg->{'type'});
+  }
  $msg->{'charset'} = $ct[2];

  # attempt to figure out a name for this attachment if there is one ...
@ -1086,9 +1189,6 @@ sub _parse_normal {
  elsif ($ct[3]) {
    $msg->{'name'} = $ct[3];
  }
-  if ($msg->{'name'}) {
-    $msg->{'name'} = Encode::decode("MIME-Header", $msg->{'name'});
-  }

  $msg->{'boundary'} = $boundary;

@ -1096,7 +1196,8 @@ sub _parse_normal {
  # ahead and write the part data out to a temp file -- why keep sucking
  # up RAM with something we're not going to use?
  #
-  if ($msg->{'type'} !~ m@^(?:text/(?:plain|html)$|message\b)@) {
+  unless ($msg->{'type'} eq 'text/plain' || $msg->{'type'} eq 'text/html' ||
+          index($msg->{'type'}, 'message/') == 0) {
    my($filepath, $fh);
    eval {
      ($filepath, $fh) = Mail::SpamAssassin::Util::secure_tmpfile();  1;
@ -1131,7 +1232,7 @@ sub get_mimepart_digests {
  if (!exists $self->{mimepart_digests}) {
    # traverse all parts which are leaves, recursively
    $self->{mimepart_digests} =
-      [ map(sha1_hex($_->decode) . ':' . lc($_->{type}||''),
+      [ map(sha1_hex($_->decode) . ':' . ($_->{type}||''),
            $self->find_parts(qr/^/,1,1)) ];
  }
  return $self->{mimepart_digests};
@ -1202,16 +1303,19 @@ sub get_body_text_array_common {
      # text/plain rendered as html otherwise.
      if ($html_needs_setting && $type eq 'text/html') {
        $self->{metadata}->{html} = $p->{html_results};
+        push @{$self->{metadata}->{html_all}}, $p->{html_results};
      }
    }
  }

  # whitespace handling (warning: small changes have large effects!)
-  $text =~ s/\n+\s*\n+/\f/gs;		# double newlines => form feed
+  $text =~ s/\n+\s*\n+/\x00/gs;		# double newlines => null
 # $text =~ tr/ \t\n\r\x0b\xa0/ /s;	# whitespace (incl. VT, NBSP) => space
-  $text =~ tr/ \t\n\r\x0b/ /s;		# whitespace (incl. VT) => space
-  $text =~ tr/\f/\n/;			# form feeds => newline
+# $text =~ tr/ \t\n\r\x0b/ /s;		# whitespace (incl. VT) => single space
+  $text =~ s/\s+/ /gs;		        # Unicode whitespace => single space
+  $text =~ tr/\x00/\n/;			# null => newline

+  utf8::encode($text) if utf8::is_utf8($text);
  my @textary = split_into_array_of_short_lines($text);
  $self->{$key} = \@textary;

@ -1246,7 +1350,7 @@ sub get_decoded_body_text_array {
  my $scansize = $self->{rawbody_part_scan_size};

  # Find all parts which are leaves
-  my @parts = $self->find_parts(qr/^(?:text|message)\b/i,1);
+  my @parts = $self->find_parts(qr/^(?:text|message)\b/,1);
  return $self->{text_decoded} unless @parts;

  # Go through each part
--- a/upstream/lib/Mail/SpamAssassin/Message/Metadata/Received.pm
+++ b/upstream/lib/Mail/SpamAssassin/Message/Metadata/Received.pm
@ -50,6 +50,10 @@ use Mail::SpamAssassin::Dns;
 use Mail::SpamAssassin::PerMsgStatus;
 use Mail::SpamAssassin::Constants qw(:ip);

+my $IP_ADDRESS = IP_ADDRESS;
+my $IP_PRIVATE = IP_PRIVATE;
+my $LOCALHOST = LOCALHOST;
+
 # ---------------------------------------------------------------------------

 sub parse_received_headers {
@ -117,10 +121,6 @@ sub parse_received_headers {
    }
  }

-  my $IP_ADDRESS = IP_ADDRESS;
-  my $IP_PRIVATE = IP_PRIVATE;
-  my $LOCALHOST = LOCALHOST;
-
  my @hdrs = $msg->get_header('Received');

  # Now add the single line headers like X-Originating-IP. (bug 5680)
@ -336,9 +336,6 @@ sub parse_received_line {
  my $ident = '';
  my $envfrom = undef;
  my $mta_looked_up_dns = 0;
-  my $IP_ADDRESS = IP_ADDRESS;
-  my $IP_PRIVATE = IP_PRIVATE;
-  my $LOCALHOST = LOCALHOST;
  my $auth = '';

 # ---------------------------------------------------------------------------
@ -430,7 +427,7 @@ sub parse_received_line {
    $auth = "GMX ($4 / $3)";
  }
  # Critical Path Messaging Server
-  elsif (/ \(authenticated as /&&/\) by .+ \(\d{1,2}\.\d\.\d{3}(?:\.\d{1,3})?\) \(authenticated as .+\) id /) {
+  elsif (/ \(authenticated as / && /\) by .+ \(\d{1,2}\.\d\.\d{3}(?:\.\d{1,3})?\) \(authenticated as .+\) id /) {
    $auth = 'CriticalPath';
  }
  # Postfix 2.3 and later with "smtpd_sasl_authenticated_header yes"
@ -820,7 +817,7 @@ sub parse_received_line {
    
    # Received: from [193.220.176.134] by web40310.mail.yahoo.com via HTTP;
    # Wed, 12 Feb 2003 14:22:21 PST
-    if (/ via HTTP$/&&/^\[(${IP_ADDRESS})\] by (\S+) via HTTP$/) {
+    if (/ via HTTP$/ && /^\[(${IP_ADDRESS})\] by (\S+) via HTTP$/) {
      $ip = $1; $by = $2; goto enough;
    }

@ -944,13 +941,13 @@ sub parse_received_line {

    # Received: from [129.24.215.125] by ws1-7.us4.outblaze.com with http for
    # _bushisevil_@mail.com; Thu, 13 Feb 2003 15:59:28 -0500
-    if (/ with http for /&&/^\[(${IP_ADDRESS})\] by (\S+) with http for /) {
+    if (/ with http for / && /^\[(${IP_ADDRESS})\] by (\S+) with http for /) {
      $ip = $1; $by = $2; goto enough;
    }

    # Received: from snake.corp.yahoo.com(216.145.52.229) by x.x.org via smap (V1.3)
    # id xma093673; Wed, 26 Mar 03 20:43:24 -0600
-    if (/ via smap /&&/^(\S+)\((${IP_ADDRESS})\) by (\S+) via smap /) {
+    if (/ via smap / && /^(\S+)\((${IP_ADDRESS})\) by (\S+) via smap /) {
      $mta_looked_up_dns = 1;
      $rdns = $1; $ip = $2; $by = $3; goto enough;
    }
@ -965,13 +962,13 @@ sub parse_received_line {
    # Received: from [192.168.0.71] by web01-nyc.clicvu.com (Post.Office MTA
    # v3.5.3 release 223 ID# 0-64039U1000L100S0V35) with SMTP id com for
    # <x@x.org>; Tue, 25 Mar 2003 11:42:04 -0500
-    if (/ \(Post/&&/^\[(${IP_ADDRESS})\] by (\S+) \(Post/) {
+    if (/ \(Post/ && /^\[(${IP_ADDRESS})\] by (\S+) \(Post/) {
      $ip = $1; $by = $2; goto enough;
    }

    # Received: from [127.0.0.1] by euphoria (ArGoSoft Mail Server 
    # Freeware, Version 1.8 (1.8.2.5)); Sat, 8 Feb 2003 09:45:32 +0200
-    if (/ \(ArGoSoft/&&/^\[(${IP_ADDRESS})\] by (\S+) \(ArGoSoft/) {
+    if (/ \(ArGoSoft/ && /^\[(${IP_ADDRESS})\] by (\S+) \(ArGoSoft/) {
      $ip = $1; $by = $2; goto enough;
    }

@ -984,7 +981,7 @@ sub parse_received_line {

    # Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4)
    # for asrg@ietf.org; 7 Mar 2003 21:10:38 -0000
-    if (/ with BSMTP/&&/^\S+ by \S+ with BSMTP/) {
+    if (/ with BSMTP/ && /^\S+ by \S+ with BSMTP/) {
      return 0;	# BSMTP != a TCP/IP handover, ignore it
    }

--- a/upstream/lib/Mail/SpamAssassin/Message/Node.pm
+++ b/upstream/lib/Mail/SpamAssassin/Message/Node.pm
@ -36,8 +36,6 @@ use strict;
 use warnings;
 use re 'taint';

-require 5.008001;  # needs utf8::is_utf8()
-
 use Mail::SpamAssassin;
 use Mail::SpamAssassin::Constants qw(:sa);
 use Mail::SpamAssassin::HTML;
@ -167,8 +165,8 @@ sub header {
  my $key    = lc($rawkey);

  # Trim whitespace off of the header keys
-  $key       =~ s/^\s+//;
-  $key       =~ s/\s+$//;
+  #$key       =~ s/^\s+//;
+  #$key       =~ s/\s+$//;

  if (@_) {
    my $raw_value = shift;
@ -388,6 +386,12 @@ sub detect_utf16 {
 	my $sum_l_o = 0;
 	my $decoder = undef;

+	# avoid scan if BOM present
+	if( $data =~ /^(?:\xff\xfe|\xfe\xff)/ ) {
+		dbg( "message: detect_utf16: found BOM" );
+		return;	# let perl figure it out from the BOM
+	}
+	
 	my @msg_h = unpack 'H' x length( $data ), $data;
 	my @msg_l = unpack 'h' x length( $data ), $data;

@ -397,11 +401,11 @@ sub detect_utf16 {
 		$sum_h_o += hex $msg_h[$i+1];
 		$sum_l_e += hex $msg_l[$i];
 		$sum_l_o += hex $msg_l[$i+1];
-		if( $check_char =~ /20 00/ ) {
+		if (index($check_char, '20 00') >= 0) {
 			# UTF-16LE space char detected
 			$utf16le_clues++;
 		}
-		if( $check_char =~ /00 20/ ) {
+		if (index($check_char, '00 20') >= 0) {
 			# UTF-16BE space char detected
 			$utf16be_clues++;
 		}
@ -416,7 +420,7 @@ sub detect_utf16 {
 	if( $utf16le_clues > $utf16be_clues ) {
 		dbg( "message: detect_utf16: UTF-16LE" );
 		$decoder = Encode::find_encoding("UTF-16LE");
-	} elsif( $utf16le_clues > $utf16be_clues ) {
+	} elsif( $utf16be_clues > $utf16le_clues ) {
 		dbg( "message: detect_utf16: UTF-16BE" );
 		$decoder = Encode::find_encoding("UTF-16BE");
 	} else {
@ -450,6 +454,7 @@ sub _normalize {
 # my $data = $_[0];  # avoid copying large strings
  my $charset_declared = $_[1];
  my $return_decoded = $_[2];  # true: Unicode characters, false: UTF-8 octets
+  my $insist_on_declared_charset = $_[3];  # no FB_CROAK in Encode::decode

  warn "message: _normalize() was given characters, expected bytes: $_[0]\n"
    if utf8::is_utf8($_[0]);
@ -457,10 +462,6 @@ sub _normalize {
  # workaround for Encode::decode taint laundering bug [rt.cpan.org #84879]
  my $data_taint = substr($_[0], 0, 0);  # empty string, tainted like $data

-  if (!defined $charset_declared || $charset_declared eq '') {
-    $charset_declared = 'us-ascii';
-  }
-
  # number of characters with code above 127
  my $cnt_8bits = $_[0] =~ tr/\x00-\x7F//c;

@ -469,7 +470,8 @@ sub _normalize {
        /^(?: (?:US-)?ASCII | ANSI[_ ]? X3\.4- (?:1986|1968) |
              ISO646-US )\z/xsi)
  { # declared as US-ASCII (a.k.a. ANSI X3.4-1986) and it really is
-    dbg("message: kept, charset is US-ASCII as declared");
+    dbg("message: contains only US-ASCII characters, declared %s, not decoding",
+      $charset_declared);
    return $_[0];  # is all-ASCII, no need for decoding
  }

@ -479,17 +481,21 @@ sub _normalize {
              UTF-?8 | (KOI8|EUC)-[A-Z]{1,2} |
              Big5 | GBK | GB[ -]?18030 (?:-20\d\d)? )\z/xsi)
  { # declared as extended ASCII, but it is actually a plain 7-bit US-ASCII
-    dbg("message: kept, charset is US-ASCII, declared %s", $charset_declared);
+    dbg("message: contains only US-ASCII characters, declared %s, not decoding",
+      $charset_declared);
    return $_[0];  # is all-ASCII, no need for decoding
  }

  # Try first to strictly decode based on a declared character set.

  my $rv;
-  if ($charset_declared =~ /^UTF-?8\z/i) {
-    # attempt decoding as strict UTF-8  (flags: FB_CROAK | LEAVE_SRC)
+
+  # Try first as UTF-8 ignoring declaring?
+  my $tried_utf8;
+  if ($cnt_8bits && !$insist_on_declared_charset) {
    if (eval { $rv = $enc_utf8->decode($_[0], 1|8); defined $rv }) {
-      dbg("message: decoded as declared charset UTF-8");
+      dbg("message: decoded as charset UTF-8, declared %s",
+        $charset_declared);
      return $_[0]  if !$return_decoded;
      $rv .= $data_taint;  # carry taintedness over, avoid Encode bug
      return $rv;  # decoded
@ -499,8 +505,16 @@ sub _normalize {
        $err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
        $err = " ($err)";
      }
-      dbg("message: failed decoding as declared charset UTF-8 ($err)");
+      dbg("message: failed decoding as charset UTF-8, declared %s%s",
+        $charset_declared, $err);
+      $tried_utf8 = 1;
    }
+  }
+
+  if ($charset_declared =~ /^(?:US-)?ASCII\z/i
+           && !$insist_on_declared_charset) {
+    # declared as US-ASCII but contains 8-bit characters, makes no sense
+    # to attempt decoding first as strict US-ASCII as we know it would fail

  } elsif ($charset_declared =~ /^UTF[ -]?16/i) {
    # Handle cases where spammers use UTF-16 encoding without including a BOM
@ -508,8 +522,10 @@ sub _normalize {
    # https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7252

    my $decoder = detect_utf16( $_[0] );
+    if (defined $decoder) {
      if (eval { $rv = $decoder->decode($_[0], 1|8); defined $rv }) {
-      dbg("message: declared charset %s decoded as charset %s", $charset_declared, $decoder->name);
+        dbg("message: decoded as charset %s, declared %s",
+          $decoder->name, $charset_declared);
        return $_[0]  if !$return_decoded;
        $rv .= $data_taint;  # carry taintedness over, avoid Encode bug
        return $rv;  # decoded
@ -519,20 +535,10 @@ sub _normalize {
          $err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
          $err = " ($err)";
        }
-      dbg("message: failed decoding as declared charset %s%s", $charset_declared, $err);
+        dbg("message: failed decoding as charset %s, declared %s%s",
+          $decoder->name, $charset_declared, $err);
+      }
    };
-
-  } elsif ($cnt_8bits &&
-           eval { $rv = $enc_utf8->decode($_[0], 1|8); defined $rv }) {
-    dbg("message: decoded as charset UTF-8, declared %s", $charset_declared);
-    return $_[0]  if !$return_decoded;
-    $rv .= $data_taint;  # carry taintedness over, avoid Encode bug
-    return $rv;  # decoded
-
-  } elsif ($charset_declared =~ /^(?:US-)?ASCII\z/i) {
-    # declared as US-ASCII but contains 8-bit characters, makes no sense
-    # to attempt decoding first as strict US-ASCII as we know it would fail
-
  } else {
    # try decoding as a declared character set

@ -553,8 +559,11 @@ sub _normalize {
    my($chset, $decoder);
    if ($charset_declared =~ /^(?: ISO-?8859-1 | Windows-1252 | CP1252 )\z/xi) {
      $chset = 'Windows-1252'; $decoder = $enc_w1252;
+    } elsif ($charset_declared =~ /^UTF-?8\z/i) {
+      $chset = 'UTF-8'; $decoder = $enc_utf8;
    } else {
-      $chset = $charset_declared; $decoder = Encode::find_encoding($chset);
+      $chset = $charset_declared;
+      $decoder = Encode::find_encoding($chset);
      if (!$decoder && $chset =~ /^GB[ -]?18030(?:-20\d\d)?\z/i) {
        $decoder = Encode::find_encoding('GBK');  # a subset of GB18030
        dbg("message: no decoder for a declared charset %s, using GBK",
@ -564,20 +573,24 @@ sub _normalize {
    if (!$decoder) {
      dbg("message: failed decoding, no decoder for a declared charset %s",
          $chset);
-    } else {
+    }
+    elsif ($tried_utf8 && $chset eq 'UTF-8') {
+      # was already tried initially, no point doing again
+    }
+    else {
+      my $check_flags = Encode::LEAVE_SRC;  # 0x0008
+      $check_flags |= Encode::FB_CROAK  unless $insist_on_declared_charset;
      my $err = '';
-      eval { $rv = $decoder->decode($_[0], 1|8) };  # FB_CROAK | LEAVE_SRC
+      if (eval { $rv = $decoder->decode($_[0], $check_flags); defined $rv }) {
+        dbg("message: decoded as charset %s, declared %s",
+          $decoder->name, $charset_declared);
+      } else {
        if ($@) {
          $err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
          $err = " ($err)";
        }
-      if (lc $chset eq lc $charset_declared) {
-        dbg("message: %s as declared charset %s%s",
-            defined $rv ? 'decoded' : 'failed decoding', $charset_declared, $err);
-      } else {
-        dbg("message: %s as charset %s, declared %s%s",
-            defined $rv ? 'decoded' : 'failed decoding',
-            $chset, $charset_declared, $err);
+        dbg("message: failed decoding as charset %s, declared %s%s",
+          $decoder->name, $charset_declared, $err);
      }
    }
  }
@ -589,7 +602,7 @@ sub _normalize {
  # Note that Windows-1252 is a proper superset of ISO-8859-1.
  #
  if (!defined $rv && !$cnt_8bits) {
-    dbg("message: kept, guessed charset is US-ASCII, declared %s",
+    dbg("message: contains only US-ASCII characters, declared %s, not decoding",
        $charset_declared);
    return $_[0];  # is all-ASCII, no need for decoding

@ -675,7 +688,7 @@ sub _normalize {

 =item rendered()

-render_text() takes the given text/* type MIME part, and attempts to
+rendered() takes the given text/* type MIME part, and attempts to
 render it into a text scalar.  It will always render text/html, and will
 use a heuristic to determine if other text/* parts should be considered
 text/html.  Two scalars are returned: the rendered type (either text/html
@ -686,18 +699,31 @@ or whatever the original type was), and the rendered text.
 sub rendered {
  my ($self) = @_;

-  if (!exists $self->{rendered}) {
+  # Cached?
+  if (exists $self->{rendered}) {
+    return ($self->{rendered_type}, $self->{rendered});
+  }
+
  # We only know how to render text/plain and text/html ...
  # Note: for bug 4843, make sure to skip text/calendar parts
  # we also want to skip things like text/x-vcard
  # text/x-aol is ignored here, but looks like text/html ...
-    return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
+  my $type = lc $self->{'type'};
+  unless ($type eq 'text/plain' || $type eq 'text/html') {
+    return (undef,undef);
+  }

  my $text = $self->decode;  # QP and Base64 decoding, bytes
  my $text_len = length($text);  # num of bytes in original charset encoding

+  my $charset = $self->{charset};
+  if (!defined $charset) {
+    dbg("message: no charset declared, using us-ascii");
+    $charset = 'us-ascii';
+  }
+
  # render text/html always
-    if ($text ne '' && $self->{'type'} =~ m{^text/html$}i)
+  if ($text ne '' && $type eq 'text/html')
  {
    $self->{rendered_type} = 'text/html';

@ -712,21 +738,39 @@ sub rendered {
      # subroutine _normalize() to return Unicode text.  See Bug 7133
      #
      $character_semantics = 1;  # $text will be in characters
-        $text = _normalize($text, $self->{charset}, 1); # bytes to chars
-      } elsif (!defined $self->{charset} ||
-               $self->{charset} =~ /^(?:US-ASCII|UTF-8)\z/i) {
-        # With some luck input can be interpreted as UTF-8, do not warn.
-        # It is still possible to hit the HTML::Parses utf8_mode bug however.
+      $text = _normalize($text, $charset, 1); # bytes to chars
+    } elsif ($charset =~ /^(?:US-ASCII|UTF-8)\z/i) {
+      if ($text !~ tr/\x00-\x7F//c) {
+        # all-ASCII, keep as octets (utf8 flag off)
+        dbg("message: contains only US-ASCII characters, declared %s, not decoding",
+          $charset);
+      } else { # non-ASCII, try UTF-8
+        my $rv;
+        # with some luck input can be interpreted as UTF-8
+        if (eval { $rv = $enc_utf8->decode($text, 1|8); defined $rv }) {
+          $text = $rv;  # decoded to perl characters
+          $character_semantics = 1;  # $text will be in characters
+          dbg("message: decoded as charset UTF-8, declared %s", $charset);
+        } else {
+          my $err = '';
+          if ($@) {
+            $err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
+            $err = " ($err)";
+          }
+          dbg("message: failed decoding as charset UTF-8, declared %s%s",
+            $charset, $err);
+        }
+      }
    } else {
      dbg("message: 'normalize_charset' is off, encoding will likely ".
-            "be misinterpreted; declared charset: %s", $self->{charset});
+          "be misinterpreted; declared charset: %s", $charset);
    }
-      # the 0 requires decoded HTML results to be in bytes (not characters)
-      my $html = Mail::SpamAssassin::HTML->new($character_semantics,0); # object
+    # the 1 requires decoded HTML results to be in characters (utf8 flag on)
+    my $html = Mail::SpamAssassin::HTML->new($character_semantics,1); # object

    $html->parse($text);  # parse+render text

-      # resulting HTML-decoded text is in bytes, likely encoded as UTF-8
+    # resulting HTML-decoded text is in perl characters (utf8 flag on)
    $self->{rendered} = $html->get_rendered_text();
    $self->{visible_rendered} = $html->get_rendered_text(invisible => 0);
    $self->{invisible_rendered} = $html->get_rendered_text(invisible => 1);
@ -735,24 +779,56 @@ sub rendered {
    # end-of-document result values that require looking at the text
    my $r = $self->{html_results};	# temporary reference for brevity

-      # count the number of spaces in the rendered text (likely UTF-8 octets)
-      my $space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
+    # count the number of spaces in the rendered text
+    my $space;
+    if (utf8::is_utf8($self->{rendered})) {
+      my $str = $self->{rendered};
+      $str =~ s/\S+//g;  # delete non-whitespace Unicode characters
+      $space = length $str;  # count remaining Unicode space characters
+      undef $str;  # deallocate storage
+      dbg("message: spaces (Unicode) in HTML: %d out of %d%s",
+          $space, length $self->{rendered},
+          $character_semantics ? '' : ', octets!?');
+    } else {
+      $space = $self->{rendered} =~ tr/ \t\n\r\x0b//;
+      dbg("message: spaces (octets) in HTML: %d out of %d%s",
+          $space, length $self->{rendered},
+          $character_semantics ? ', chars!?' : '');
+    }
    # we may want to add the count of other Unicode whitespace characters

-      $r->{html_length} = length $self->{rendered};  # bytes (likely UTF-8)
+    $r->{html_length} = length $self->{rendered};  # perl characters count
    $r->{non_space_len} = $r->{html_length} - $space;
    $r->{ratio} = ($text_len - $r->{html_length}) / $text_len  if $text_len;
  }
-
  else {  # plain text
    if ($self->{normalize} && $enc_utf8) {
      # request transcoded result as UTF-8 octets!
-        $text = _normalize($text, $self->{charset}, 0);
+      $text = _normalize($text, $charset, 1); # bytes to chars
+    } elsif ($charset =~ /^(?:US-ASCII|UTF-8)\z/i) {
+      if ($text =~ tr/\x00-\x7F//c) {  # non-ASCII, try UTF-8
+        my $rv;
+        # with some luck input can be interpreted as UTF-8
+        if (eval { $rv = $enc_utf8->decode($text, 1|8); defined $rv }) {
+          $text = $rv;  # decoded to perl characters
+          dbg("message: decoded as charset UTF-8, declared %s", $charset);
+        } else {
+          my $err = '';
+          if ($@) {
+            $err = $@; $err =~ s/\s+/ /gs; $err =~ s/(.*) at .*/$1/;
+            $err = " ($err)";
          }
-      $self->{rendered_type} = $self->{type};
-      $self->{rendered} = $self->{'visible_rendered'} = $text;
-      $self->{'invisible_rendered'} = '';
+          dbg("message: failed decoding as charset UTF-8, declared %s%s",
+            $charset, $err);
        }
+      } else {
+        dbg("message: contains only US-ASCII characters, declared %s, not decoding",
+          $charset);
+      }
+    }
+    $self->{rendered_type} = $type;
+    $self->{rendered} = $self->{visible_rendered} = $text;
+    $self->{invisible_rendered} = '';
  }

  return ($self->{rendered_type}, $self->{rendered});
@ -841,24 +917,24 @@ Delete the specified header (decoded and raw) from the Node information.
 sub delete_header {
  my($self, $hdr) = @_;

-  foreach ( grep(/^${hdr}$/i, keys %{$self->{'headers'}}) ) {
+  foreach ( grep(/^${hdr}$/io, keys %{$self->{'headers'}}) ) {
    delete $self->{'headers'}->{$_};
    delete $self->{'raw_headers'}->{$_};
  }
  
-  my @neworder = grep(!/^${hdr}$/i, @{$self->{'header_order'}});
+  my @neworder = grep(!/^${hdr}$/io, @{$self->{'header_order'}});
  $self->{'header_order'} = \@neworder;
 }

-# decode a header appropriately.  don't bother adding it to the pod documents.
-sub __decode_header {
+# decode 'encoded-word' (RFC 2047, RFC 2231)
+sub _decode_mime_encoded_word {
  my ( $encoding, $cte, $data ) = @_;

-  if ( $cte eq 'B' ) {
+  if ( uc $cte eq 'B' ) {
    # base 64 encoded
    $data = Mail::SpamAssassin::Util::base64_decode($data);
  }
-  elsif ( $cte eq 'Q' ) {
+  elsif ( uc $cte eq 'Q' ) {
    # quoted printable

    # the RFC states that in the encoded text, "_" is equal to "=20"
@ -868,12 +944,24 @@ sub __decode_header {
  }
  else {
    # not possible since the input has already been limited to 'B' and 'Q'
-    die "message: unknown encoding type '$cte' in RFC2047 header";
+    die "message: unknown encoding type '$cte' in RFC 2047 header";
  }
-  return _normalize($data, $encoding, 0);  # transcode to UTF-8 octets
+
+  if (defined $encoding) {
+    # RFC 2231 section 5: Language specification in Encoded Words
+    #   =?US-ASCII*EN?Q?Keith_Moore?=
+    # strip optional language information following an asterisk
+    $encoding =~ s{ \* .* \z }{}xs;
+
+    $data = _normalize($data, $encoding, 0, 1);  # transcode to UTF-8 octets
+  }
+  # dbg("message: _decode_mime_encoded_word (%s, %s): %s",
+  #     $cte, $encoding || '-', $data);
+
+  return $data;  # as UTF-8 octets
 }

-# Decode base64 and quoted-printable in headers according to RFC2047.
+# Decode base64 and quoted-printable in headers according to RFC 2047.
 #
 sub _decode_header {
  my($header_field_body, $header_field_name) = @_;
@ -881,35 +969,86 @@ sub _decode_header {
  return '' unless defined $header_field_body && $header_field_body ne '';

  # deal with folding and cream the newlines and such
-  $header_field_body =~ s/\n[ \t]+/\n /g;
+  $header_field_body =~ s/\n[ \t]/\n /g;  # turning tab into space on folds
  $header_field_body =~ s/\015?\012//gs;

+  if ($header_field_body =~ tr/\x00-\x7F//c) {
+    # Non-ASCII characters in header are not allowed by RFC 5322, but
+    # RFC 6532 relaxed the rule and allows UTF-8 encoding in header
+    # field bodies; no other encoding is allowed there (apart from
+    # RFC 2047 MIME encoded words, which must be all-ASCII anyway).
+    # The following call keeps UTF-8 octets if valid, otherwise tries
+    # some decoding guesswork so that the result is valid UTF-8 (octets).
+    $header_field_body = _normalize($header_field_body, 'UTF-8', 0);
+  }
+
  if ($header_field_name =~
       /^ (?: Received | (?:Resent-)? (?: Message-ID | Date ) |
              MIME-Version | References | In-Reply-To | List-.* ) \z /xsi ) {
    # Bug 6945: some header fields must not be processed for MIME encoding
-    # Bug 7466: leave out the Content-*
+    # Bug 7249: leave out the Content-*

-  } else {
-    local($1,$2,$3);
+  } elsif (index($header_field_body, '=?') != -1) {  # triage for possible encoded-words
+    local($1,$2,$3,$4);

    # Multiple encoded sections must ignore the interim whitespace.
    # To avoid possible FPs with (\s+(?==\?))?, look for the whole RE
    # separated by whitespace.
-    1 while $header_field_body =~
-              s{ ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
-                 ( = \? [A-Za-z0-9_-]+ \? [bqBQ] \? [^?]* \? = ) }
-               {$1$2}xsg;
-
-    # transcode properly encoded RFC 2047 substrings into UTF-8 octets,
-    # leave everything else unchanged as it is supposed to be UTF-8 (RFC 6532)
-    # or plain US-ASCII
    $header_field_body =~
-      s{ (?: = \? ([A-Za-z0-9_-]+) \? ([bqBQ]) \? ([^?]*) \? = ) }
-       { __decode_header($1, uc($2), $3) }xsge;
+      s{ (   = \? [A-Za-z0-9*_-]+ \? [bqBQ] \? [^?]* \? = ) \s+
+         (?= = \? [A-Za-z0-9*_-]+ \? [bqBQ] \? [^?]* \? = ) }{$1}xsg;
+
+    # Bug 7249: work around violations of the RFC 2047 section 5 requirement:
+    #   Each 'encoded-word' MUST represent an integral number of characters.
+    #   A multi-octet character may not be split across adjacent 'encoded-word's
+    # Unfortunately such violations are not uncommon.
+    #
+    # Bug 7307: to deal with the above, base64/QP decoding must be decoupled
+    # from decoding a specified multi-byte character set into UTF-8.
+    # A previous simpler code could not handle base64 fill bits correctly
+    # (merging of adjecent encoded sections before base64/QP decoding them).
+
+    my @sections;  # array of pairs: [string, encoding]
+    my $last_encoding = '';
+    while ( $header_field_body =~
+              m{ \G = \? ([A-Za-z0-9*_-]+) \? ([bqBQ]) \? ([^?]*) \? =
+                  | ( [^=]+ | . ) }xsg ) {
+      my($encoding, $str);
+      if (defined $1) {  # we have an encoded section
+        $encoding = lc $1;
+        # decode base64 / QP decoding, remember encoding charset
+        $str = _decode_mime_encoded_word(undef, $2, $3);
+      } else {  # non-encoded text
+        $encoding = '';
+        $str = $4;
+      }
+      if ($encoding eq $last_encoding && @sections) {
+        # merge sections with same encoding - in violation of RFC 2047 sect.5
+        $sections[$#sections]->[0] .= $str;
+      } else {
+        push(@sections, [$str, $encoding]);
+      }
+      $last_encoding = $encoding;
    }

-# dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
+    # transcode encoded RFC 2047 substrings (already base64/QP-decoded)
+    # into UTF-8 octets, leave everything else unchanged as it is supposed
+    # to be UTF-8 (RFC 6532) or its plain US-ASCII subset (RFC 5322);
+    #
+    my $decoded_result = '';
+    for my $sect (@sections) {
+      my $encoding = $sect->[1];
+      # RFC 2231 section 5: Language specification in Encoded Words
+      #   =?US-ASCII*EN?Q?Keith_Moore?=
+      # strip optional language information following an asterisk
+      $encoding =~ s{ \* .* \z }{}xs;
+      $decoded_result .=
+        $encoding eq '' ? $sect->[0] : _normalize($sect->[0], $encoding, 0, 1);
+    }
+    $header_field_body = $decoded_result;
+  }
+
+  dbg("message: _decode_header %s: %s", $header_field_name, $header_field_body);
  return $header_field_body;
 }

--- a/upstream/lib/Mail/SpamAssassin/NetSet.pm
+++ b/upstream/lib/Mail/SpamAssassin/NetSet.pm
@ -33,7 +33,7 @@ BEGIN {
  eval {
    require Net::Patricia;
    Net::Patricia->VERSION(1.16);  # need AF_INET6 support
-    import Net::Patricia;
+    Net::Patricia->import;
    $have_patricia = 1;
  };
 }
@ -59,6 +59,7 @@ sub new {

 sub DESTROY {
  my($self) = shift;
+
  if (exists $self->{cache}) {
    local($@, $!, $_);  # protect outer layers from a potential surprise
    my($hits, $attempts) = ($self->{cache_hits}, $self->{cache_attempts});
@ -76,7 +77,45 @@ sub add_cidr {
  my $numadded = 0;
  delete $self->{cache};  # invalidate cache (in case of late additions)

+  # Pre-parse x.x.x.x-x.x.x.x range notation into CIDR blocks
+  # requires Net::CIDR::Lite
+  my @nets2;
  foreach my $cidr_orig (@nets) {
+    next if index($cidr_orig, '-') == -1; # Triage
+    my $cidr = $cidr_orig;
+    my $exclude = ($cidr =~ s/^!\s*//) ? 1 : 0;
+    local($1);
+    $cidr =~ s/\b0+(\d+)/$1/; # Strip leading zeroes
+    eval { require Net::CIDR::Lite; }; # Only try to load now when it's necessary
+    if ($@) {
+      warn "netset: IP range notation '$cidr_orig' requires Net::CIDR::Lite module, ignoring\n";
+      $cidr_orig = undef;
+      next;
+    }
+    my $cidrs = Net::CIDR::Lite->new;
+    eval { $cidrs->add_range($cidr); };
+    if ($@) {
+      my $err = $@; $err =~ s/ at .*//s;
+      warn "netset: illegal IP range '$cidr_orig': $err\n";
+      $cidr_orig = undef;
+      next;
+    }
+    my @arr = $cidrs->list;
+    if (!@arr) {
+      my $err = $@; $err =~ s/ at .*//s;
+      warn "netset: failed to parse IP range '$cidr_orig': $err\n";
+      $cidr_orig = undef;
+      next;
+    }
+    # Save exclude flag
+    if ($exclude) { $_ = "!$_" foreach (@arr); }
+    # Rewrite this @nets value directly, add any rest to @nets2
+    $cidr_orig = shift @arr;
+    push @nets2, @arr  if @arr;
+  }
+
+  foreach my $cidr_orig (@nets, @nets2) {
+    next unless defined $cidr_orig;
    my $cidr = $cidr_orig;  # leave original unchanged, useful for logging

    # recognizes syntax:
--- a/upstream/lib/Mail/SpamAssassin/PerMsgLearner.pm
+++ b/upstream/lib/Mail/SpamAssassin/PerMsgLearner.pm
@ -21,7 +21,7 @@ Mail::SpamAssassin::PerMsgLearner - per-message status (spam or not-spam)

 =head1 SYNOPSIS

-  my $spamtest = new Mail::SpamAssassin ({
+  my $spamtest = Mail::SpamAssassin->new({
    'rules_filename'      => '/etc/spamassassin.rules',
    'userprefs_filename'  => $ENV{HOME}.'/.spamassassin/user_prefs'
  });
@ -97,8 +97,8 @@ sub learn_spam {
  my ($self, $id) = @_;

  # bug 4096
-  # if ($self->{main}->{learn_with_whitelist}) {
-  # $self->{main}->add_all_addresses_to_blacklist ($self->{msg});
+  # if ($self->{main}->{learn_with_welcomelist}) {
+  # $self->{main}->add_all_addresses_to_blocklist ($self->{msg});
  # }

  # use the real message-id here instead of mass-check's idea of an "id",
@ -124,8 +124,8 @@ sub learn_ham {
  my ($self, $id) = @_;

  # bug 4096
-  # if ($self->{main}->{learn_with_whitelist}) {
-  # $self->{main}->add_all_addresses_to_whitelist ($self->{msg});
+  # if ($self->{main}->{learn_with_welcomelist}) {
+  # $self->{main}->add_all_addresses_to_welcomelist ($self->{msg});
  # }

  $self->{learned} = $self->{bayes_scanner}->learn (0, $self->{msg}, $id);
@ -148,8 +148,8 @@ sub forget {
  my ($self, $id) = @_;

  # bug 4096
-  # if ($self->{main}->{learn_with_whitelist}) {
-  # $self->{main}->remove_all_addresses_from_whitelist ($self->{msg});
+  # if ($self->{main}->{learn_with_welcomelist}) {
+  # $self->{main}->remove_all_addresses_from_welcomelist ($self->{msg});
  # }

  $self->{learned} = $self->{bayes_scanner}->forget ($self->{msg}, $id);
--- a/upstream/lib/Mail/SpamAssassin/PerMsgStatus.pm
+++ b/upstream/lib/Mail/SpamAssassin/PerMsgStatus.pm
--- a/upstream/lib/Mail/SpamAssassin/PersistentAddrList.pm
+++ b/upstream/lib/Mail/SpamAssassin/PersistentAddrList.pm
@ -33,7 +33,7 @@ SpamAssassin will call:

 =head1 DESCRIPTION

-All persistent address list implementations, used by the auto-whitelist
+All persistent address list implementations, used by the auto-welcomelist
 code to track known-good email addresses, use this as a base class.

 See C<Mail::SpamAssassin::DBBasedAddrList> for an example.
@ -81,7 +81,7 @@ SpamAssassin classes.

 sub new_checker {
  my ($factory, $main) = @_;
-  die "auto-whitelist: unimplemented base method";	# override this
+  die "auto-welcomelist: unimplemented base method";	# override this
 }

 ###########################################################################
@ -109,7 +109,7 @@ a C<count> key and a C<totscore> key.
 sub get_addr_entry {
  my ($self, $addr, $signedby) = @_;
  my $entry = { };
-  die "auto-whitelist: unimplemented base method";	# override this
+  die "auto-welcomelist: unimplemented base method";	# override this
  return $entry;
 }

@ -117,27 +117,27 @@ sub get_addr_entry {

 =item $entry = $addrlist->add_score($entry, $score);

-This method should add the given score to the whitelist database for the
+This method should add the given score to the welcomelist database for the
 given entry, and then return the new entry.

 =cut

 sub add_score {
    my ($self, $entry, $score) = @_;
-    die "auto-whitelist: unimplemented base method"; # override this
+    die "auto-welcomelist: unimplemented base method"; # override this
 }

 ###########################################################################

 =item $entry = $addrlist->remove_entry ($entry);

-This method should remove the given entry from the whitelist database.
+This method should remove the given entry from the welcomelist database.

 =cut

 sub remove_entry {
  my ($self, $entry) = @_;
-  die "auto-whitelist: unimplemented base method";	# override this
+  die "auto-welcomelist: unimplemented base method";	# override this
 }

 ###########################################################################
@ -145,7 +145,7 @@ sub remove_entry {
 =item $entry = $addrlist->finish ();

 Clean up, if necessary.  Called by SpamAssassin when it has finished
-checking, or adding to, the auto-whitelist database.
+checking, or adding to, the auto-welcomelist database.

 =cut

--- a/upstream/lib/Mail/SpamAssassin/Plugin.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin.pm
@ -427,6 +427,20 @@ The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.

 =back

+=item $plugin->check_dnsbl ( { options ... } )
+
+Called when DNSBL or other network lookups are being launched, implying
+current running priority of -100.  This is the place to start your own
+asynchronously-started network lookups.
+
+=over 4
+
+=item permsgstatus
+
+The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
+
+=back
+
 =item $plugin->check_post_dnsbl ( { options ... } )

 Called after the DNSBL results have been harvested.  This is a good
@ -440,6 +454,21 @@ The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.

 =back

+=item $plugin->check_cleanup ( { options ... } )
+
+Called just before message check is finishing and before possible
+auto-learning.  This is guaranteed to be always called, unlike check_tick
+and check_post_dnsbl.  Used for cleaning up left callbacks or forked
+children etc, last chance to make rules hit.
+
+=over 4
+
+=item permsgstatus
+
+The C<Mail::SpamAssassin::PerMsgStatus> context object for this scan.
+
+=back
+
 =item $plugin->check_post_learn ( { options ... } )

 Called after auto-learning may (or may not) have taken place.  If you
@ -817,7 +846,9 @@ Reference to the original message object.

 =back

-=item $plugin->whitelist_address( { options ... } )
+=item $plugin->welcomelist_address( { options ... } )
+
+Previously whitelist_address which will work interchangeably until 4.1.

 Called when a request is made to add an address to a
 persistent address list.
@ -834,7 +865,9 @@ Indicate if the call is being made from a command line interface.

 =back

-=item $plugin->blacklist_address( { options ... } )
+=item $plugin->blocklist_address( { options ... } )
+
+Previously blacklist_address which will work interchangeably until 4.1.

 Called when a request is made to add an address to a
 persistent address list.
@ -882,7 +915,7 @@ only spamd calls this API.
 =item result

 The C<'result: ...'> line for this scan.  Format is as described
-at B<http://wiki.apache.org/spamassassin/SpamdSyslogFormat>.
+at B<https://wiki.apache.org/spamassassin/SpamdSyslogFormat>.

 =back

@ -1013,17 +1046,27 @@ to receive specific events, or control the callback chain behaviour.

 =over 4

-=item $plugin->register_eval_rule ($nameofevalsub)
+=item $plugin->register_eval_rule ($nameofevalsub, $ruletype)

 Plugins that implement an eval test will need to call this, so that
 SpamAssassin calls into the object when that eval test is encountered.
 See the B<REGISTERING EVAL RULES> section for full details.

+Since 4.0, optional $ruletype can be specified to enforce that eval function
+cannot be called with wrong ruletype from configuration, for example user
+using "header FOO eval:foobar()" instead of "body FOO eval:foobar()". 
+Mismatch will result in lint failure. $ruletype can be one of:
+
+  $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS
+  $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS  (allows both body and rawbody)
+  $Mail::SpamAssassin::Conf::TYPE_RAWBODY_EVALS
+  $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS
+
 =cut

 sub register_eval_rule {
-  my ($self, $nameofsub) = @_;
-  $self->{main}->{conf}->register_eval_rule ($self, $nameofsub);
+  my ($self, $nameofsub, $ruletype) = @_;
+  $self->{main}->{conf}->register_eval_rule ($self, $nameofsub, $ruletype);
 }

 =item $plugin->register_generated_rule_method ($nameofsub)
@ -1132,7 +1175,7 @@ called from rules in the configuration files, in the plugin class' constructor.

 For example,

-  $plugin->register_eval_rule ('check_for_foo')
+  $plugin->register_eval_rule ('check_for_foo', $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS)

 will cause C<$plugin-E<gt>check_for_foo()> to be called for this
 SpamAssassin rule:
@ -1157,18 +1200,21 @@ In other words, the eval test method should look something like this:

  sub check_for_foo {
    my ($self, $permsgstatus, ...arguments...) = @_;
-    ...code returning 0 or 1
+    ...code returning 0 (miss), 1 (hit), or undef (async function)
  }

+The eval rule should return C<1> for a hit, or C<0> if the rule is not hit. 
+Special case of "return undef" must be used when result is not yet ready and
+it will be later declared with PerMsgStatus functions got_hit() or
+rule_ready() - see their documentation for more info.  Make sure not to
+return undef by mistake.
+
 Note that the headers can be accessed using the C<get()> method on the
 C<Mail::SpamAssassin::PerMsgStatus> object, and the body by
 C<get_decoded_stripped_body_text_array()> and other similar methods.
 Similarly, the C<Mail::SpamAssassin::Conf> object holding the current
 configuration may be accessed through C<$permsgstatus-E<gt>{main}-E<gt>{conf}>.

-The eval rule should return C<1> for a hit, or C<0> if the rule
-is not hit.
-
 State for a single message being scanned should be stored on the C<$permsgstatus>
 object, not on the C<$self> object, since C<$self> persists between scan
 operations.  See the 'lifecycle note' on the C<check_start()> method above.
@ -1216,8 +1262,8 @@ Mail::SpamAssassin(3)

 Mail::SpamAssassin::PerMsgStatus(3)

-http://wiki.apache.org/spamassassin/PluginWritingTips
+https://wiki.apache.org/spamassassin/PluginWritingTips

-http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2163
+https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2163

 =cut
--- a/upstream/lib/Mail/SpamAssassin/Plugin/ASN.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/ASN.pm
@ -50,13 +50,30 @@ Autonomous System Number (ASN) of the connecting IP address.

 loadplugin Mail::SpamAssassin::Plugin::ASN

+ # Default / recommended settings
+ asn_use_geodb 1
+ asn_use_dns 1
+ asn_prefer_geodb 1
+
+ # Do lookups and add tags / X-Spam-ASN header
 asn_lookup asn.routeviews.org _ASN_ _ASNCIDR_
- 
 asn_lookup_ipv6 origin6.asn.cymru.com _ASN_ _ASNCIDR_
-
 add_header all ASN _ASN_ _ASNCIDR_

- header TEST_AS1234 X-ASN =~ /^1234$/
+ # Rules to test ASN or Organization
+ # NOTE: Do not use rules that check metadata X-ASN header,
+ # only check_asn() eval function works correctly.
+ # Rule argument is full regexp to match.
+
+ # ASN Number: GeoIP ASN or DNS
+ # Matched string includes asn_prefix if defined, and normally
+ # looks like "AS1234" (DNS) or "AS1234 Google LLC" (GeoIP)
+ header AS_1234 eval:check_asn('/^AS1234\b/')
+
+ # ASN Organisation: GeoIP ASN has, DNS lists might not have
+ # Note the second parameter which checks MYASN tag (default is ASN)
+ asn_lookup myview.example.com _MYASN_ _MYASNCIDR_
+ header AS_GOOGLE eval:check_asn('/\bGoogle\b/i', 'MYASN')

 =head1 DESCRIPTION

@ -68,12 +85,17 @@ high-volume environment> or that you should use a local mirror of the
 zone (see C<ftp://ftp.routeviews.org/dnszones/>).  Other similar zones
 may also be used.

+GeoDB (GeoIP ASN) database lookups are supported since SpamAssassin 4.0 and
+it's recommended to use them instead of DNS queries, unless C<_ASNCIDR_>
+is needed.
+
 =head1 TEMPLATE TAGS

 This plugin allows you to create template tags containing the connecting
 IP's AS number and route info for that AS number.

-The default config will add a header field that looks like this:
+If you use add_header as documented in the example before, a header field is
+added that looks like this:

 X-Spam-ASN: AS24940 213.239.192.0/18

@ -87,29 +109,19 @@ all be added to the C<_ASNCIDR_> tag, separated by spaces, eg:
 Note that the literal "AS" before the ASN in the _ASN_ tag is configurable
 through the I<asn_prefix> directive and may be set to an empty string.

-=head1 CONFIGURATION
+C<_ASNCIDR_> is not available with local GeoDB ASN lookups.

-The standard ruleset contains a configuration that will add a header field
-containing ASN data to scanned messages.  The bayes tokenizer will use the
-added header field for bayes calculations, and thus affect which BAYES_* rule
-will trigger for a particular message.
+=head1 BAYES

-B<Note> that in most cases you should not score on the ASN data directly.
-Bayes learning will probably trigger on the _ASNCIDR_ tag, but probably not
-very well on the _ASN_ tag alone.
+The bayes tokenizer will use ASN data for bayes calculations, and thus
+affect which BAYES_* rule will trigger for a particular message.  No
+in-depth analysis of the usefulness of bayes tokenization of ASN data has
+been performed.

 =head1 SEE ALSO

 http://www.routeviews.org/ - all data regarding routing, ASNs, etc....

-http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4770 -
-SpamAssassin Issue #4770 concerning this plugin
-
-=head1 STATUS
-
-No in-depth analysis of the usefulness of bayes tokenization of ASN data has
-been performed.
-
 =cut

 package Mail::SpamAssassin::Plugin::ASN;
@ -120,27 +132,23 @@ use re 'taint';

 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
-use Mail::SpamAssassin::Util qw(reverse_ip_address);
-use Mail::SpamAssassin::Dns;
+use Mail::SpamAssassin::Util qw(reverse_ip_address compile_regexp);
 use Mail::SpamAssassin::Constants qw(:ip);

 our @ISA = qw(Mail::SpamAssassin::Plugin);

-our $txtdata_can_provide_a_list;
-
-my $IPV4_ADDRESS = IPV4_ADDRESS;
-
 sub new {
  my ($class, $mailsa) = @_;
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsa);
  bless ($self, $class);

+  $self->register_eval_rule("check_asn", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+
  $self->set_config($mailsa->{conf});

-  #$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
-  #more robust version check from Damyan Ivanov - Bug 7095
-  $txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
+  # we need GeoDB ASN
+  $self->{main}->{geodb_wanted}->{asn} = 1;

  return $self;
 }
@ -201,6 +209,25 @@ is kept for backward compatibility with versions of SpamAssassin earlier
 than 3.4.0. A sensible setting is an empty string. The argument may be (but
 need not be) enclosed in single or double quotes for clarity.

+=item asn_use_geodb ( 0 / 1 )          (default: 1)
+
+Use Mail::SpamAssassin::GeoDB module to lookup ASN numbers.  You need
+suitable supported module like GeoIP2 or GeoIP with ISP or ASN database
+installed (for example, add EditionIDs GeoLite2-ASN in GeoIP.conf for
+geoipupdate program).
+
+GeoDB can only set _ASN_ tag, it has no data for _ASNCIDR_.  If you need
+both, then set asn_prefer_geodb 0 so DNS rules are tried.
+
+=item asn_prefer_geodb ( 0 / 1 )       (default: 1)
+
+If set, DNS lookups (asn_lookup rules) will not be run if GeoDB successfully
+finds ASN. Set this to 0 to get _ASNCIDR_ even if GeoDB finds _ASN_.
+
+=item asn_use_dns ( 0 / 1 )            (default: 1)
+
+Set to 0 to never allow DNS queries.
+
 =back

 =cut
@ -272,26 +299,53 @@ need not be) enclosed in single or double quotes for clarity.
    }
  });

+  push (@cmds, {
+    setting => 'asn_use_geodb',
+    default => 1,
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+
+  push (@cmds, {
+    setting => 'asn_prefer_geodb',
+    default => 1,
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+
+  push (@cmds, {
+    setting => 'asn_use_dns',
+    default => 1,
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+
  $conf->{parser}->register_commands(\@cmds);
 }

 # ---------------------------------------------------------------------------

-sub parsed_metadata {
+sub extract_metadata {
  my ($self, $opts) = @_;

  my $pms = $opts->{permsgstatus};
-  my $conf = $self->{main}->{conf};
+  my $conf = $pms->{conf};

-  if (!$pms->is_dns_available()) {
-    dbg("asn: DNS is not available, skipping ASN checks");
+  my $geodb = $self->{main}->{geodb};
+  my $has_geodb = $conf->{asn_use_geodb} && $geodb && $geodb->can('asn');
+  if ($has_geodb) {
+    dbg("asn: using GeoDB ASN for lookups");
+  } else {
+    dbg("asn: GeoDB ASN not available");
+    if (!$conf->{asn_use_dns} || !$pms->is_dns_available()) {
+      dbg("asn: DNS is not available, skipping ASN check");
      return;
    }
-
-  if (!$conf->{asnlookups} && !$conf->{asnlookups_ipv6}) {
-    dbg("asn: no asn_lookups configured, skipping ASN lookups");
+    if ($self->{main}->{learning}) {
+      dbg("asn: learning message, skipping DNS-based ASN check");
      return;
    }
+  }

  # initialize the tag data so that if no result is returned from the DNS
  # query we won't end up with a missing tag.  Don't use $pms->set_tag()
@ -309,9 +363,11 @@ sub parsed_metadata {
    }
  }

-  # get reversed IP address of last external relay to lookup
-  # don't return until we've initialized the template tags
-  my $relay = $pms->{relays_external}->[0];
+  # Initialize status
+  $pms->{asn_results} = ();
+
+  # get IP address of last external relay to lookup
+  my $relay = $opts->{msg}->{metadata}->{relays_external}->[0];
  if (!defined $relay) {
    dbg("asn: no first external relay IP available, skipping ASN check");
    return;
@ -319,18 +375,45 @@ sub parsed_metadata {
    dbg("asn: first external relay is a private IP, skipping ASN check");
    return;
  }
-
  my $ip = $relay->{ip};
-  my $reversed_ip = reverse_ip_address($ip);
-  if (defined $reversed_ip) {
  dbg("asn: using first external relay IP for lookups: %s", $ip);
+
+  # GeoDB lookup
+  my $asn_found;
+  if ($has_geodb) {
+    my $asn = $geodb->get_asn($ip);
+    my $org = $geodb->get_asn_org($ip);
+    if (!defined $asn) {
+      dbg("asn: GeoDB ASN lookup failed");
    } else {
-    dbg("asn: could not parse first external relay IP: %s, skipping", $ip);
+      $asn_found = 1;
+      dbg("asn: GeoDB found ASN $asn");
+      # Prevent double prefix
+      my $asn_value =
+        length($conf->{asn_prefix}) && index($asn, $conf->{asn_prefix}) != 0 ?
+          $conf->{asn_prefix}.$asn : $asn;
+      $asn_value .= ' '.$org if defined $org && length($org);
+      $pms->set_tag('ASN', $asn_value);
+      # For Bayes
+      $pms->{msg}->put_metadata('X-ASN', $asn);
+    }
+  }
+
+  # Skip DNS if GeoDB was successful and preferred
+  if ($asn_found && $conf->{asn_prefer_geodb}) {
+    dbg("asn: GeoDB lookup successful, skipping DNS lookups");
    return;
  }

+  # No point continuing without DNS from now on
+  if (!$conf->{asn_use_dns} || !$pms->is_dns_available()) {
+    dbg("asn: skipping disabled DNS lookups");
+    return;
+  }
+
+  dbg("asn: using DNS for lookups");
  my $lookup_zone;
-  if ($ip =~ /^$IPV4_ADDRESS$/o) {
+  if ($ip =~ IS_IPV4_ADDRESS) {
    if (!defined $conf->{asnlookups}) {
      dbg("asn: asn_lookup for IPv4 not defined, skipping");
      return;
@ -344,6 +427,12 @@ sub parsed_metadata {
    $lookup_zone = "asnlookups_ipv6";
  }
  
+  my $reversed_ip = reverse_ip_address($ip);
+  if (!defined $reversed_ip) {
+    dbg("asn: could not parse IP: %s, skipping", $ip);
+    return;
+  }
+
  # we use arrays and array indices rather than hashes and hash keys
  # in case someone wants the same zone added to multiple sets of tags
  my $index = 0;
@ -351,14 +440,12 @@ sub parsed_metadata {
    # do the DNS query, have the callback process the result
    my $zone_index = $index;
    my $zone = $reversed_ip . '.' . $entry->{zone};
-    my $key = "asnlookup-${lookup_zone}-${zone_index}-".$entry->{zone};
-    my $ent = $pms->{async}->bgsend_and_start_lookup($zone, 'TXT', undef,
-      { type => 'ASN', key => $key, zone => $lookup_zone },
+    $pms->{async}->bgsend_and_start_lookup($zone, 'TXT', undef,
+      { rulename => 'asn_lookup', type => 'ASN' },
      sub { my($ent, $pkt) = @_;
            $self->process_dns_result($pms, $pkt, $zone_index, $lookup_zone) },
      master_deadline => $pms->{master_deadline}
    );
-    $pms->register_async_rule_start($key) if $ent;
    $index++;
  }
 }
@ -376,6 +463,9 @@ sub parsed_metadata {
 sub process_dns_result {
  my ($self, $pms, $pkt, $zone_index, $lookup_zone) = @_;

+  # NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
+  return if !$pkt;
+
  my $conf = $self->{main}->{conf};

  my $zone = $conf->{$lookup_zone}[$zone_index]->{zone};
@ -403,14 +493,10 @@ sub process_dns_result {
    %route_tag_data_seen = map(($_,1), @route_tag_data);
  }

-  # NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
-  my @answer = !defined $pkt ? () : $pkt->answer;
-
-  foreach my $rr (@answer) {
+  foreach my $rr ($pkt->answer) {
    #dbg("asn: %s: lookup result packet: %s", $zone, $rr->string);
    next if $rr->type ne 'TXT';
-    my @strings = $txtdata_can_provide_a_list ? $rr->txtdata :
-      $rr->char_str_list; # historical
+    my @strings = $rr->txtdata;
    next if !@strings;
    for (@strings) { utf8::encode($_) if utf8::is_utf8($_) }

@ -485,7 +571,49 @@ sub process_dns_result {
  }
 }

+sub check_asn {
+  my ($self, $pms, $re, $asn_tag) = @_;
+
+  my $rulename = $pms->get_current_eval_rule_name();
+  if (!defined $re) {
+    warn "asn: rule $rulename eval argument missing\n";
+    return 0;
+  }
+
+  my ($rec, $err) = compile_regexp($re, 2);
+  if (!$rec) {
+    warn "asn: invalid regexp for $rulename '$re': $err\n";
+    return 0;
+  }
+
+  $asn_tag = 'ASN' unless defined $asn_tag;
+  $pms->action_depends_on_tags($asn_tag,
+    sub { my($pms,@args) = @_;
+      $self->_check_asn($pms, $rulename, $rec, $asn_tag);
+    }
+  );
+
+  return; # return undef for async status
+}
+
+sub _check_asn {
+  my ($self, $pms, $rulename, $rec, $asn_tag) = @_;
+
+  $pms->rule_ready($rulename); # mark rule ready for metas
+
+  my $asn = $pms->get_tag($asn_tag);
+  return if !defined $asn;
+
+  if ($asn =~ $rec) {
+    $pms->test_log("$asn_tag: $asn", $rulename);
+    $pms->got_hit($rulename, "");
+  }
+}
+
 # Version features
 sub has_asn_lookup_ipv6 { 1 }
+sub has_asn_geodb { 1 }
+sub has_check_asn { 1 }
+sub has_check_asn_tag { 1 } # $asn_tag parameter for check_asn()

 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/AWL.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AWL.pm
@ -17,7 +17,7 @@

 =head1 NAME

-Mail::SpamAssassin::Plugin::AWL - Normalize scores via auto-whitelist
+Mail::SpamAssassin::Plugin::AWL - Normalize scores via auto-welcomelist

 =head1 SYNOPSIS

@ -28,14 +28,14 @@ To try this out, add this or uncomment this line in init.pre:
 Use the supplied 60_awl.cf file (ie you don't have to do anything) or
 add these lines to a .cf file:

-  header AWL             eval:check_from_in_auto_whitelist()
-  describe AWL           From: address is in the auto white-list
+  header AWL             eval:check_from_in_auto_welcomelist()
+  describe AWL           From: address is in the auto welcome-list
  tflags AWL             userconf noautolearn
  priority AWL           1000

 =head1 DESCRIPTION

-This plugin module provides support for the auto-whitelist.  It keeps
+This plugin module provides support for the auto-welcomelist.  It keeps
 track of the average SpamAssassin score for senders.  Senders are
 tracked using a combination of their From: address and their IP address.
 It then uses that average score to reduce the variability in scoring
@ -63,7 +63,7 @@ use warnings;
 # use bytes;
 use re 'taint';
 use Mail::SpamAssassin::Plugin;
-use Mail::SpamAssassin::AutoWhitelist;
+use Mail::SpamAssassin::AutoWelcomelist;
 use Mail::SpamAssassin::Util qw(untaint_var);
 use Mail::SpamAssassin::Logger;

@ -80,7 +80,8 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("check_from_in_auto_whitelist");
+  $self->register_eval_rule("check_from_in_auto_welcomelist", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_from_in_auto_whitelist", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1

  $self->set_config($mailsaobject->{conf});

@ -99,18 +100,20 @@ SpamAssassin handles incoming email messages.

 =over 4

-=item use_auto_whitelist ( 0 | 1 )		(default: 1)
+=item use_auto_welcomelist ( 0 | 1 )		(default: 1)

-Whether to use auto-whitelists.  Auto-whitelists track the long-term
+Previously use_auto_whitelist which will work interchangeably until 4.1.
+
+Whether to use auto-welcomelists.  Auto-welcomelists track the long-term
 average score for each sender and then shift the score of new messages
 toward that long-term average.  This can increase or decrease the score
 for messages, depending on the long-term behavior of the particular
 correspondent.

-For more information about the auto-whitelist system, please look
-at the C<Automatic Whitelist System> section of the README file.
-The auto-whitelist is not intended as a general-purpose replacement
-for static whitelist entries added to your config files.
+For more information about the auto-welcomelist system, please look
+at the C<Automatic Welcomelist System> section of the README file.
+The auto-welcomelist is not intended as a general-purpose replacement
+for static welcomelist entries added to your config files.

 Note that certain tests are ignored when determining the final
 message score:
@ -120,12 +123,15 @@ message score:
 =cut

  push (@cmds, {
-		setting => 'use_auto_whitelist',
+		setting => 'use_auto_welcomelist',
+		aliases => ['use_auto_whitelist'], # backward compatible - to be removed for 4.1
 		default => 1,
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
 	       });

-=item auto_whitelist_factor n	(default: 0.5, range [0..1])
+=item auto_welcomelist_factor n	(default: 0.5, range [0..1])
+
+Previously auto_whitelist_factor which will work interchangeably until 4.1.

 How much towards the long-term mean for the sender to regress a message.
 Basically, the algorithm is to track the long-term mean score of messages for
@ -143,12 +149,15 @@ mean; C<factor> = 0 mean just use the calculated score.
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_factor',
+		setting => 'auto_welcomelist_factor',
+		aliases => ['auto_whitelist_factor'], # backward compatible - to be removed for 4.1
 		default => 0.5,
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
 	       });

-=item auto_whitelist_ipv4_mask_len n	(default: 16, range [0..32])
+=item auto_welcomelist_ipv4_mask_len n	(default: 16, range [0..32])
+
+Previously auto_whitelist_ipv4_mask_len which will work interchangeably until 4.1.

 The AWL database keeps only the specified number of most-significant bits
 of an IPv4 address in its fields, so that different individual IP addresses
@ -164,7 +173,8 @@ of 8, any split is allowed.
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_ipv4_mask_len',
+		setting => 'auto_welcomelist_ipv4_mask_len',
+		aliases => ['auto_whitelist_ipv4_mask_len'], # removed in 4.1
 		default => 16,
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
 		code => sub {
@ -174,11 +184,13 @@ of 8, any split is allowed.
 		  } elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32) {
 		    return $Mail::SpamAssassin::Conf::INVALID_VALUE;
 		  }
-		  $self->{auto_whitelist_ipv4_mask_len} = $value;
+		  $self->{auto_welcomelist_ipv4_mask_len} = $value;
 		}
 	       });

-=item auto_whitelist_ipv6_mask_len n	(default: 48, range [0..128])
+=item auto_welcomelist_ipv6_mask_len n	(default: 48, range [0..128])
+
+Previously auto_whitelist_ipv6_mask_len which will work interchangeably until 4.1.

 The AWL database keeps only the specified number of most-significant bits
 of an IPv6 address in its fields, so that different individual IP addresses
@ -195,7 +207,8 @@ is allowed.
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_ipv6_mask_len',
+		setting => 'auto_welcomelist_ipv6_mask_len',
+		aliases => ['auto_whitelist_ipv6_mask_len'], # removed in 4.1
 		default => 48,
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
 		code => sub {
@ -205,7 +218,7 @@ is allowed.
 		  } elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128) {
 		    return $Mail::SpamAssassin::Conf::INVALID_VALUE;
 		  }
-		  $self->{auto_whitelist_ipv6_mask_len} = $value;
+		  $self->{auto_welcomelist_ipv6_mask_len} = $value;
 		}
 	       });

@ -215,7 +228,7 @@ Used by the SQLBasedAddrList storage implementation.

 If this option is set the SQLBasedAddrList module will override the set
 username with the value given.  This can be useful for implementing global
-or group based auto-whitelist databases.
+or group based auto-welcomelist databases.

 =cut

@ -225,7 +238,9 @@ or group based auto-whitelist databases.
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
 	       });

-=item auto_whitelist_distinguish_signed
+=item auto_welcomelist_distinguish_signed
+
+Previously auto_whitelist_distinguish_signed which will work interchangeably until 4.1.

 Used by the SQLBasedAddrList storage implementation.

@ -240,7 +255,8 @@ turning on this option.
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_distinguish_signed',
+		setting => 'auto_welcomelist_distinguish_signed',
+		aliases => ['auto_whitelist_distinguish_signed'], # removed in 4.1
 		default => 0,
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
 	       });
@ -256,32 +272,38 @@ user's C<user_prefs> file.

 =over 4

-=item auto_whitelist_factory module (default: Mail::SpamAssassin::DBBasedAddrList)
+=item auto_welcomelist_factory module (default: Mail::SpamAssassin::DBBasedAddrList)

-Select alternative whitelist factory module.
+Previously auto_whitelist_factory which will work interchangeably until 4.1.
+
+Select alternative welcomelist factory module.

 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_factory',
+		setting => 'auto_welcomelist_factory',
+		aliases => ['auto_whitelist_factory'], # removed in 4.1
 		is_admin => 1,
 		default => 'Mail::SpamAssassin::DBBasedAddrList',
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
 	       });

-=item auto_whitelist_path /path/filename (default: ~/.spamassassin/auto-whitelist)
+=item auto_welcomelist_path /path/filename (default: ~/.spamassassin/auto-welcomelist)

-This is the automatic-whitelist directory and filename.  By default, each user
-has their own whitelist database in their C<~/.spamassassin> directory with
+Previously auto_whitelist_path which will work interchangeably until 4.1.
+
+This is the automatic-welcomelist directory and filename.  By default, each user
+has their own welcomelist database in their C<~/.spamassassin> directory with
 mode 0700.  For system-wide SpamAssassin use, you may want to share this
 across all users, although that is not recommended.

 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_path',
+		setting => 'auto_welcomelist_path',
+		aliases => ['auto_whitelist_path'], # removed in 4.1
 		is_admin => 1,
-		default => '__userstate__/auto-whitelist',
+		default => '__userstate__/auto-welcomelist',
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
 		code => sub {
 		  my ($self, $key, $value, $line) = @_;
@ -291,13 +313,15 @@ across all users, although that is not recommended.
 		  if (-d $value) {
 		    return $Mail::SpamAssassin::Conf::INVALID_VALUE;
 		  }
-		  $self->{auto_whitelist_path} = $value;
+		  $self->{auto_welcomelist_path} = $value;
 		}
 	       });

-=item auto_whitelist_db_modules Module ...	(default: see below)
+=item auto_welcomelist_db_modules Module ...	(default: see below)

-What database modules should be used for the auto-whitelist storage database
+Previously auto_whitelist_db_modules which will work interchangeably until 4.1.
+
+What database modules should be used for the auto-welcomelist storage database
 file.   The first named module that can be loaded from the perl include path
 will be used.  The format is:

@ -313,15 +337,18 @@ preclude its use for the AWL (see SpamAssassin bug 4353).
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_db_modules',
+		setting => 'auto_welcomelist_db_modules',
+		aliases => ['auto_whitelist_db_modules'], # removed in 4.1
 		is_admin => 1,
 		default => 'DB_File GDBM_File SDBM_File',
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
 	       });

-=item auto_whitelist_file_mode		(default: 0700)
+=item auto_welcomelist_file_mode		(default: 0700)

-The file mode bits used for the automatic-whitelist directory or file.
+Previously auto_whitelist_file_mode which will work interchangeably until 4.1.
+
+The file mode bits used for the automatic-welcomelist directory or file.

 Make sure you specify this using the 'x' mode bits set, as it may also be used
 to create directories.  However, if a file is created, the resulting file will
@ -330,7 +357,8 @@ not have any execute bits set (the umask is set to 0111).
 =cut

  push (@cmds, {
-		setting => 'auto_whitelist_file_mode',
+		setting => 'auto_welcomelist_file_mode',
+		aliases => ['auto_whitelist_file_mode'], # removed in 4.1
 		is_admin => 1,
 		default => '0700',
 		type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
@ -339,7 +367,7 @@ not have any execute bits set (the umask is set to 0111).
 		  if ($value !~ /^0?[0-7]{3}$/) {
                    return $Mail::SpamAssassin::Conf::INVALID_VALUE;
                  }
-		  $self->{auto_whitelist_file_mode} = untaint_var($value);
+		  $self->{auto_welcomelist_file_mode} = untaint_var($value);
 		}
 	       });

@ -390,7 +418,7 @@ The password for the database username, for the above DSN.

 Used by the SQLBasedAddrList storage implementation.

-The table user auto-whitelists are stored in, for the above DSN.
+The table user auto-welcomelists are stored in, for the above DSN.

 =cut

@ -404,15 +432,15 @@ The table user auto-whitelists are stored in, for the above DSN.
  $conf->{parser}->register_commands(\@cmds);
 }

-sub check_from_in_auto_whitelist {
+sub check_from_in_auto_welcomelist {
    my ($self, $pms) = @_;

-    return 0 unless ($pms->{conf}->{use_auto_whitelist});
+    return 0 unless ($pms->{conf}->{use_auto_welcomelist});

    my $timer = $self->{main}->time_method("total_awl");

    my $from = lc $pms->get('From:addr');
-  # dbg("auto-whitelist: From: $from");
+  # dbg("auto-welcomelist: From: $from");
    return 0 unless $from =~ /\S/;

    # find the earliest usable "originating IP".  ignore private nets
@ -443,18 +471,18 @@ sub check_from_in_auto_whitelist {
    my $awlpoints = (sprintf "%0.3f", $points) + 0;

   # Create the AWL object
-    my $whitelist;
+    my $welcomelist;
    eval {
-      $whitelist = Mail::SpamAssassin::AutoWhitelist->new($pms->{main});
+      $welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($pms->{main});

      my $meanscore;
      { # check
        my $timer = $self->{main}->time_method("check_awl");
-        $meanscore = $whitelist->check_address($from, $origip, $signedby);
+        $meanscore = $welcomelist->check_address($from, $origip, $signedby);
      }
      my $delta = 0;

-      dbg("auto-whitelist: AWL active, pre-score: %s, autolearn score: %s, ".
+      dbg("auto-welcomelist: AWL active, pre-score: %s, autolearn score: %s, ".
 	  "mean: %s, IP: %s, address: %s %s",
          $pms->{score}, $awlpoints,
          !defined $meanscore ? 'undef' : sprintf("%.3f",$meanscore),
@ -463,13 +491,13 @@ sub check_from_in_auto_whitelist {

      if (defined $meanscore) {
 	$delta = $meanscore - $awlpoints;
-	$delta *= $pms->{main}->{conf}->{auto_whitelist_factor};
+	$delta *= $pms->{main}->{conf}->{auto_welcomelist_factor};
      
 	$pms->set_tag('AWL', sprintf("%2.1f",$delta));
        if (defined $meanscore) {
 	  $pms->set_tag('AWLMEAN', sprintf("%2.1f", $meanscore));
 	}
-	$pms->set_tag('AWLCOUNT', sprintf("%2.1f", $whitelist->count()));
+	$pms->set_tag('AWLCOUNT', sprintf("%2.1f", $welcomelist->count()));
 	$pms->set_tag('AWLPRESCORE', sprintf("%2.1f", $pms->{score}));
      }

@ -478,7 +506,7 @@ sub check_from_in_auto_whitelist {
      # later ones.  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=159704
      if (!$pms->{disable_auto_learning}) {
        my $timer = $self->{main}->time_method("update_awl");
-	$whitelist->add_score($awlpoints);
+	$welcomelist->add_score($awlpoints);
      }

      # now redundant, got_hit() takes care of it
@ -491,135 +519,138 @@ sub check_from_in_auto_whitelist {
                      score => sprintf("%0.3f", $delta));
      }

-      $whitelist->finish();
+      $welcomelist->finish();
      1;
    } or do {
      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-      warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
+      warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
      # try an unlock, in case we got that far
-      eval { $whitelist->finish(); } if $whitelist;
+      eval { $welcomelist->finish(); } if $welcomelist;
      return 0;
    };

-    dbg("auto-whitelist: post auto-whitelist score: %.3f", $pms->{score});
+    dbg("auto-welcomelist: post auto-welcomelist score: %.3f", $pms->{score});

    # test hit is above
    return 0;
 }
+*check_from_in_auto_whitelist = \&check_from_in_auto_welcomelist; # removed in 4.1

-sub blacklist_address {
+sub blocklist_address {
  my ($self, $args) = @_;

-  return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
+  return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});

  unless ($args->{address}) {
-    print "SpamAssassin auto-whitelist: failed to add address to blacklist\n" if ($args->{cli_p});
-    dbg("auto-whitelist: failed to add address to blacklist");
+    print "SpamAssassin auto-welcomelist: failed to add address to blocklist\n" if ($args->{cli_p});
+    dbg("auto-welcomelist: failed to add address to blocklist");
    return;
  }
  
-  my $whitelist;
+  my $welcomelist;
  my $status;

  eval {
-    $whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
+    $welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});

-    if ($whitelist->add_known_bad_address($args->{address}, $args->{signedby})) {
-      print "SpamAssassin auto-whitelist: adding address to blacklist: " . $args->{address} . "\n" if ($args->{cli_p});
-      dbg("auto-whitelist: adding address to blacklist: " . $args->{address});
+    if ($welcomelist->add_known_bad_address($args->{address}, $args->{signedby})) {
+      print "SpamAssassin auto-welcomelist: adding address to blocklist: " . $args->{address} . "\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: adding address to blocklist: " . $args->{address});
      $status = 0;
    }
    else {
-      print "SpamAssassin auto-whitelist: error adding address to blacklist\n" if ($args->{cli_p});
-      dbg("auto-whitelist: error adding address to blacklist");
+      print "SpamAssassin auto-welcomelist: error adding address to blocklist\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: error adding address to blocklist");
      $status = 1;
    }
-    $whitelist->finish();
+    $welcomelist->finish();
    1;
  } or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
-    eval { $whitelist->finish(); };
+    warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
+    eval { $welcomelist->finish(); };
    return 0;
  };

  return $status;
 }
+*blacklist_address = \&blocklist_address; # removed in 4.1

-sub whitelist_address {
+sub welcomelist_address {
  my ($self, $args) = @_;

-  return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
+  return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});

  unless ($args->{address}) {
-    print "SpamAssassin auto-whitelist: failed to add address to whitelist\n" if ($args->{cli_p});
-    dbg("auto-whitelist: failed to add address to whitelist");
+    print "SpamAssassin auto-welcomelist: failed to add address to welcomelist\n" if ($args->{cli_p});
+    dbg("auto-welcomelist: failed to add address to welcomelist");
    return 0;
  }

-  my $whitelist;
+  my $welcomelist;
  my $status;

  eval {
-    $whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
+    $welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});

-    if ($whitelist->add_known_good_address($args->{address}, $args->{signedby})) {
-      print "SpamAssassin auto-whitelist: adding address to whitelist: " . $args->{address} . "\n" if ($args->{cli_p});
-      dbg("auto-whitelist: adding address to whitelist: " . $args->{address});
+    if ($welcomelist->add_known_good_address($args->{address}, $args->{signedby})) {
+      print "SpamAssassin auto-welcomelist: adding address to welcomelist: " . $args->{address} . "\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: adding address to welcomelist: " . $args->{address});
      $status = 1;
    }
    else {
-      print "SpamAssassin auto-whitelist: error adding address to whitelist\n" if ($args->{cli_p});
-      dbg("auto-whitelist: error adding address to whitelist");
+      print "SpamAssassin auto-welcomelist: error adding address to welcomelist\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: error adding address to welcomelist");
      $status = 0;
    }

-    $whitelist->finish();
+    $welcomelist->finish();
    1;
  } or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
-    eval { $whitelist->finish(); };
+    warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
+    eval { $welcomelist->finish(); };
    return 0;
  };

  return $status;
 }
+*whitelist_address = \&welcomelist_address; # removed in 4.1

 sub remove_address {
  my ($self, $args) = @_;

-  return 0 unless ($self->{main}->{conf}->{use_auto_whitelist});
+  return 0 unless ($self->{main}->{conf}->{use_auto_welcomelist});

  unless ($args->{address}) {
-    print "SpamAssassin auto-whitelist: failed to remove address\n" if ($args->{cli_p});
-    dbg("auto-whitelist: failed to remove address");
+    print "SpamAssassin auto-welcomelist: failed to remove address\n" if ($args->{cli_p});
+    dbg("auto-welcomelist: failed to remove address");
    return 0;
  }

-  my $whitelist;
+  my $welcomelist;
  my $status;

  eval {
-    $whitelist = Mail::SpamAssassin::AutoWhitelist->new($self->{main});
+    $welcomelist = Mail::SpamAssassin::AutoWelcomelist->new($self->{main});

-    if ($whitelist->remove_address($args->{address}, $args->{signedby})) {
-      print "SpamAssassin auto-whitelist: removing address: " . $args->{address} . "\n" if ($args->{cli_p});
-      dbg("auto-whitelist: removing address: " . $args->{address});
+    if ($welcomelist->remove_address($args->{address}, $args->{signedby})) {
+      print "SpamAssassin auto-welcomelist: removing address: " . $args->{address} . "\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: removing address: " . $args->{address});
      $status = 1;
    }
    else {
-      print "SpamAssassin auto-whitelist: error removing address\n" if ($args->{cli_p});
-      dbg("auto-whitelist: error removing address");
+      print "SpamAssassin auto-welcomelist: error removing address\n" if ($args->{cli_p});
+      dbg("auto-welcomelist: error removing address");
      $status = 0;
    }
  
-    $whitelist->finish();
+    $welcomelist->finish();
    1;
  } or do {
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    warn("auto-whitelist: open of auto-whitelist file failed: $eval_stat\n");
-    eval { $whitelist->finish(); };
+    warn("auto-welcomelist: open of auto-welcomelist file failed: $eval_stat\n");
+    eval { $welcomelist->finish(); };
    return 0;
  };

--- a/upstream/lib/Mail/SpamAssassin/Plugin/AccessDB.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AccessDB.pm
@ -72,7 +72,7 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule("check_access_database");
+  $self->register_eval_rule("check_access_database", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);

  return $self;
 }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AntiVirus.pm
@ -64,8 +64,8 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule("check_microsoft_executable");
-  $self->register_eval_rule("check_suspect_name");
+  $self->register_eval_rule("check_microsoft_executable", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_suspect_name", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -107,7 +107,7 @@ sub _check_attachments {
      # file extension indicates an executable
      $pms->{antivirus_microsoft_exe} = 1;
    }
-    elsif ($cte =~ /base64/ && defined $p->raw()->[0] &&
+    elsif (index($cte, 'base64') >= 0 && defined $p->raw()->[0] &&
 	   $p->raw()->[0] =~ /^TV[opqr].A..[AB].[AQgw][A-H].A/)
    {
      # base64-encoded executable
--- a/upstream/lib/Mail/SpamAssassin/Plugin/AskDNS.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AskDNS.pm
@ -56,18 +56,30 @@ See the C<Mail::SpamAssassin::Conf> POD for details on C<rbl_timeout>.
 A query template is a string which will be expanded to produce a domain name
 to be used in a DNS query. The template may include SpamAssassin tag names,
 which will be replaced by their values to form a final query domain.
+
 The final query domain must adhere to rules governing DNS domains, i.e.
-must consist of fields each up to 63 characters long, delimited by dots.
-There may be a trailing dot at the end, but it is redundant / carries
-no semantics, because SpamAssassin uses a Net::DSN::Resolver::send method
-for querying DNS, which ignores any 'search' or 'domain' DNS resolver options.
+must consist of fields each up to 63 characters long, delimited by dots,
+not exceeding 255 characters. International domain names (in UTF-8) are
+allowed and will be encoded to ASCII-compatible encoding (ACE) according
+to IDN rules. Syntactically invalid resulting queries will be discarded
+by the DNS resolver code (with some info warnings).
+
+There may be a trailing dot at the end, but it is redundant / carries no
+semantics, because SpamAssassin uses a Net::DSN::Resolver::send method for
+querying DNS, which ignores any 'search' or 'domain' DNS resolver options.
 Domain names in DNS queries are case-insensitive.

 A tag name is a string of capital letters, preceded and followed by an
-underscore character. This syntax mirrors the add_header setting, except that
-tags cannot have parameters in parenthesis when used in askdns templates.
-Tag names may appear anywhere in the template - each queried DNS zone
-prescribes how a query should be formed.
+underscore character.  This syntax mirrors the add_header setting, except
+that tags cannot have parameters in parenthesis when used in askdns
+templates (exceptions found below).  Tag names may appear anywhere in the
+template - each queried DNS zone prescribes how a query should be formed.
+
+Special supported tag HEADER() can be used to query any header content,
+using same header names/modifiers that as header rules support.  For example
+_HEADER(Reply-To:addr:domain)_ can be used to query the trimmed domain part
+of Reply-To address.  See Mail::SpamAssassin::Conf documentation about
+header rules.

 A query template may contain any number of tag names including none,
 although in the most common anticipated scenario exactly one tag name would
@ -114,7 +126,8 @@ parameter will only act as a filter on a result.

 Currently recognized RR types in the rr_type parameter are: ANY, A, AAAA,
 MX, TXT, PTR, NAPTR, NS, SOA, CERT, CNAME, DNAME, DHCID, HINFO, MINFO,
-RP, HIP, IPSECKEY, KX, LOC, SRV, SSHFP, SPF.
+RP, HIP, IPSECKEY, KX, LOC, GPOS, SRV, OPENPGPKEY, SSHFP, SPF, TLSA, URI,
+CAA, CSYNC.

 https://www.iana.org/assignments/dns-parameters/dns-parameters.xml

@ -189,8 +202,10 @@ use warnings;
 use re 'taint';

 use Mail::SpamAssassin::Plugin;
-use Mail::SpamAssassin::Util qw(decode_dns_question_entry);
+use Mail::SpamAssassin::Util qw(decode_dns_question_entry idn_to_ascii
+                                compile_regexp is_fqdn_valid);
 use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Constants qw(:ip);
 use version 0.77;

 our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -202,8 +217,6 @@ our %rcode_value = (  # https://www.iana.org/assignments/dns-parameters, RFC 619
  BADMODE => 19, BADNAME => 20, BADALG => 21, BADTRUNC => 22,
 );

-our $txtdata_can_provide_a_list;
-
 sub new {
  my($class,$sa_main) = @_;

@ -213,10 +226,6 @@ sub new {

  $self->set_config($sa_main->{conf});

-  #$txtdata_can_provide_a_list = Net::DNS->VERSION >= 0.69;
-  #more robust version check from Damyan Ivanov - Bug 7095
-  $txtdata_can_provide_a_list = version->parse(Net::DNS->VERSION) >= version->parse('0.69');
-
  return $self;
 }

@ -253,20 +262,14 @@ sub parse_and_canonicalize_subtest {
  my $result;

  local($1,$2,$3);
-  # modifiers /a, /d, /l, /u in suffix form were added with perl 5.13.10 (5.14)
-  # currently known modifiers are [msixoadlu], but let's not be too picky here
-  if (     $subtest =~ m{^       /  (.+) /  ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
-  } elsif ($subtest =~ m{^ m \s* \( (.+) \) ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
-  } elsif ($subtest =~ m{^ m \s* \[ (.+) \] ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
-  } elsif ($subtest =~ m{^ m \s* \{ (.+) \} ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
-  } elsif ($subtest =~ m{^ m \s*  < (.+)  > ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
-  } elsif ($subtest =~ m{^ m \s* (\S) (.+) \1 ([a-z]*) \z}xs) {
-    $result = $2 ne '' ? qr{(?$2)$1} : qr{$1};
+  if ($subtest =~ m{^/ .+ / [a-z]* \z}xs ||
+      $subtest =~ m{^m (\W) .+ (\W) [a-z]* \z}xs) {
+    my ($rec, $err) = compile_regexp($subtest, 1);
+    if (!$rec) {
+      warn "askdns: subtest compile failed: '$subtest': $err\n";
+    } else {
+      $result = $rec;
+    }
  } elsif ($subtest =~ m{^ (["']) (.*) \1 \z}xs) {  # quoted string
    $result = $2;
  } elsif ($subtest =~ m{^ \[ ( (?:[A-Z]+|\d+)
@ -324,8 +327,9 @@ sub set_config {
        my @answer_types = split(/,/, $query_type);
        # https://www.iana.org/assignments/dns-parameters/dns-parameters.xml
        if (grep(!/^(?:ANY|A|AAAA|MX|TXT|PTR|NAPTR|NS|SOA|CERT|CNAME|DNAME|
-                       DHCID|HINFO|MINFO|RP|HIP|IPSECKEY|KX|LOC|SRV|
-                       SSHFP|SPF)\z/x, @answer_types)) {
+                       DHCID|HINFO|MINFO|RP|HIP|IPSECKEY|KX|LOC|GPOS|SRV|
+                       OPENPGPKEY|SSHFP|SPF|TLSA|URI|CAA|CSYNC)\z/x,
+                 @answer_types)) {
          return $Mail::SpamAssassin::Conf::INVALID_VALUE;
        }
        $query_type = 'ANY' if @answer_types > 1 || $answer_types[0] eq 'ANY';
@ -333,25 +337,19 @@ sub set_config {
          $subtest = parse_and_canonicalize_subtest($subtest);
          defined $subtest or return $Mail::SpamAssassin::Conf::INVALID_VALUE;
        }
+
+        # initialize rule structure
+        $self->{askdns}{$rulename}{query} = $query_template;
+        $self->{askdns}{$rulename}{q_type} = $query_type;
+        $self->{askdns}{$rulename}{a_types} = \@answer_types;
+        $self->{askdns}{$rulename}{subtest} = $subtest;
+        $self->{askdns}{$rulename}{tags} = ();
+
        # collect tag names as used in each query template
-        my @tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/g;
-        my %seen; @tags = grep(!$seen{$_}++, @tags);  # filter out duplicates
-
-        # group rules by tag names used in them (to be used as a hash key)
-        my $depends_on_tags = !@tags ? '' : join(',',@tags);
-
-        # subgroup rules by a DNS RR type and a nonexpanded query template
-        my $query_template_key = $query_type . ':' . $query_template;
-
-        $self->{askdns}{$depends_on_tags}{$query_template_key} ||=
-          { query => $query_template, rules => {}, q_type => $query_type,
-            a_types =>  # optimization: undef means "same as q_type"
-              @answer_types == 1 && $answer_types[0] eq $query_type ? undef
-                                                           : \@answer_types };
-        $self->{askdns}{$depends_on_tags}{$query_template_key}{rules}{$rulename}
-          = $subtest;
-      # dbg("askdns: rule: %s, config dep: %s, domkey: %s, subtest: %s",
-      #     $rulename, $depends_on_tags, $query_template_key, $subtest);
+        # also support common HEADER(arg) tag which does $pms->get(arg)
+        my @tags = $query_template =~ /_([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*(?:\(.*?\))?)_/g;
+        # save rule to tag dependencies
+        $self->{askdns}{$rulename}{tags}{$_} = 1 foreach (@tags);

        # just define the test so that scores and lint works
        $self->{parser}->add_test($rulename, undef,
@ -366,182 +364,127 @@ sub set_config {
 # run as early as possible, launching DNS queries as soon as their
 # dependencies are fulfilled
 #
-sub parsed_metadata {
+sub check_dnsbl {
  my($self, $opts) = @_;
+
  my $pms = $opts->{permsgstatus};
  my $conf = $pms->{conf};

-  return if !$pms->is_dns_available;
-  $pms->{askdns_map_dnskey_to_rules} = {};
+  return if !$pms->is_dns_available();

  # walk through all collected askdns rules, obtain tag values whenever
  # they may become available, and launch DNS queries right after
-  #
-  for my $depends_on_tags (keys %{$conf->{askdns}}) {
-    my @tags;
-    @tags = split(/,/, $depends_on_tags)  if $depends_on_tags ne '';
-
-    if (would_log("dbg","askdns")) {
-      while ( my($query_template_key, $struct) =
-                each %{$conf->{askdns}{$depends_on_tags}} ) {
-        my($query_template, $query_type, $answer_types_ref, $rules) =
-          @$struct{qw(query q_type a_types rules)};
-        dbg("askdns: depend on tags %s, rules: %s ",
-            $depends_on_tags, join(', ', keys %$rules));
+  foreach my $rulename (keys %{$conf->{askdns}}) {
+    if (!$conf->{scores}->{$rulename}) {
+      dbg("askdns: skipping disabled rule $rulename");
+      next;
    }
-    }
-
-    if (!@tags) {
-      # no dependencies on tags, just call directly
-      $self->launch_queries($pms,$depends_on_tags);
-    } else {
-      # enqueue callback for tags needed
+    my @tags = sort keys %{$conf->{askdns}{$rulename}{tags}};
+    if (@tags) {
+      dbg("askdns: rule %s depends on tags: %s", $rulename,
+          join(', ', @tags));
      $pms->action_depends_on_tags(@tags == 1 ? $tags[0] : \@tags,
            sub { my($pms,@args) = @_;
-                    $self->launch_queries($pms,$depends_on_tags) }
+                  $self->launch_queries($pms,$rulename,\@tags) }
      );
+    } else {
+      # no dependencies on tags, just call directly
+      $self->launch_queries($pms,$rulename,[]);
    }
  }
 }

-# generate DNS queries - called for each set of rules
-# when their tag dependencies are met
+# generate DNS queries - called for each rule when its tag dependencies
+# are met
 #
 sub launch_queries {
-  my($self, $pms, $depends_on_tags) = @_;
-  my $conf = $pms->{conf};
+  my($self, $pms, $rulename, $tags) = @_;

-  my %tags;
-  # obtain tag/value pairs of tags we depend upon in this set of rules
-  if ($depends_on_tags ne '') {
-    %tags = map( ($_,$pms->get_tag($_)), split(/,/,$depends_on_tags) );
+  my $arule = $pms->{conf}->{askdns}{$rulename};
+  my $query_tmpl = $arule->{query};
+  my $queries;
+  if (@$tags) {
+    if (!exists $pms->{askdns_qtmpl_cache}{$query_tmpl}) {
+      # replace tags in query template
+      # iterate through each tag, replacing list of strings as we go
+      my %q_iter = ( "$query_tmpl" => 1 );
+      foreach my $tag (@$tags) {
+        # cache tag values locally
+        if (!exists $pms->{askdns_tag_cache}{$tag}) {
+          my $valref = $pms->get_tag_raw($tag);
+          my @vals = grep { defined $_ && $_ ne '' } (ref $valref ? @$valref : $valref);
+          # Paranoid check for undefined tag
+          if (!@vals) {
+            dbg("askdns: skipping rule $rulename, no value found for tag: $tag");
+            return;
          }
-  dbg("askdns: preparing queries which depend on tags: %s",
-      join(', ', map($_.' => '.$tags{$_}, keys %tags)));
-
-  # replace tag names in a query template with actual tag values
-  # and launch DNS queries
-  while ( my($query_template_key, $struct) =
-            each %{$conf->{askdns}{$depends_on_tags}} ) {
-    my($query_template, $query_type, $answer_types_ref, $rules) =
-      @$struct{qw(query q_type a_types rules)};
-
-    my @rulenames = keys %$rules;
-    if (grep($conf->{scores}->{$_}, @rulenames)) {
-      dbg("askdns: query template %s, type %s, rules: %s",
-          $query_template,
-          !$answer_types_ref ? $query_type
-            : $query_type.'/'.join(',',@$answer_types_ref),
-          join(', ', @rulenames));
+          $pms->{askdns_tag_cache}{$tag} = \@vals;
+        }
+        my %q_iter_new;
+        foreach my $q (keys %q_iter) {
+          # handle space separated multi-valued tags
+          foreach my $val (@{$pms->{askdns_tag_cache}{$tag}}) {
+            my $qtmp = $q;
+            $qtmp =~ s/\Q_${tag}_\E/${val}/g;
+            $q_iter_new{$qtmp} = 1;
+          }
+        }
+        %q_iter = %q_iter_new;
+      }
+      # cache idn'd queries
+      my @q_arr;
+      push @q_arr, idn_to_ascii($_) foreach (keys %q_iter);
+      $pms->{askdns_qtmpl_cache}{$query_tmpl} = \@q_arr;
+    }
+    $queries = $pms->{askdns_qtmpl_cache}{$query_tmpl};
  } else {
-      dbg("askdns: query template %s, type %s, all rules disabled: %s",
-          $query_template, $query_type, join(', ', @rulenames));
+    push @$queries, idn_to_ascii($query_tmpl);
+  }
+
+  foreach my $query (@$queries) {
+    if (!is_fqdn_valid($query, 1)) {
+      dbg("askdns: skipping invalid query ($rulename): $query");
      next;
    }
-
-    # collect all tag names from a template, each may occur more than once
-    my @templ_tags = $query_template =~ /_([A-Z][A-Z0-9]*)_/gs;
-
-    # filter out duplicate tag names, and tags with undefined or empty value
-    my %seen;
-    @templ_tags = grep(!$seen{$_}++ && defined $tags{$_} && $tags{$_} ne '',
-                       @templ_tags);
-
-    my %templ_vals;  # values that each tag takes
-    for my $t (@templ_tags) {
-      my %seen;
-      # a tag value may be a space-separated list,
-      # store it as an arrayref, removing duplicate values
-      $templ_vals{$t} = [ grep(!$seen{$_}++, split(' ',$tags{$t})) ];
-    }
-
-    # count through all tag value tuples
-    my @digit = (0) x @templ_tags;  # counting accumulator
-OUTER:
-    for (;;) {
-      my %current_tag_val;  # maps a tag name to its current iteration value
-      for my $j (0 .. $#templ_tags) {
-        my $t = $templ_tags[$j];
-        $current_tag_val{$t} = $templ_vals{$t}[$digit[$j]];
-      }
-      local $1;
-      my $query_domain = $query_template;
-      $query_domain =~ s{_([A-Z][A-Z0-9]*)_}
-                        { defined $current_tag_val{$1} ? $current_tag_val{$1}
-                                                       : '' }ge;
-
-      # the $dnskey identifies this query in AsyncLoop's pending_lookups
-      my $dnskey = join(':', 'askdns', $query_type, $query_domain);
-      dbg("askdns: expanded query %s, dns key %s", $query_domain, $dnskey);
-
-      if ($query_domain eq '') {
-        # ignore, just in case
-      } else {
-        if (!exists $pms->{askdns_map_dnskey_to_rules}{$dnskey}) {
-          $pms->{askdns_map_dnskey_to_rules}{$dnskey} =
-             [ [$query_type, $answer_types_ref, $rules] ];
-        } else {
-          push(@{$pms->{askdns_map_dnskey_to_rules}{$dnskey}},
-               [$query_type, $answer_types_ref, $rules] );
-        }
-        # launch a new DNS query for $query_type and $query_domain
-        my $ent = $pms->{async}->bgsend_and_start_lookup(
-          $query_domain, $query_type, undef,
-          { key => $dnskey, zone => $query_domain },
-          sub { my ($ent2,$pkt) = @_;
-                $self->process_response_packet($pms, $ent2, $pkt, $dnskey) },
-          master_deadline => $pms->{master_deadline} );
-        # these rules are now underway;  unless the rule hits, these will
-        # not be considered "finished" until harvest_dnsbl_queries() completes
-        $pms->register_async_rule_start($dnskey) if $ent;
-      }
-
-      last  if !@templ_tags;
-      # increment accumulator, little-endian
-      for (my $j = 0;  ; $j++) {
-        last  if ++$digit[$j] <= $#{$templ_vals{$templ_tags[$j]}};
-        $digit[$j] = 0;  # and carry
-        last OUTER  if $j >= $#templ_tags;
-      }
-    }
+    dbg("askdns: launching query ($rulename): $query");
+    my $ret = $pms->{async}->bgsend_and_start_lookup(
+      $query, $arule->{q_type}, undef,
+        { rulename => $rulename, type => 'AskDNS' },
+        sub { my ($ent,$pkt) = @_;
+              $self->process_response_packet($pms, $ent, $pkt, $rulename) },
+        master_deadline => $pms->{master_deadline}
+    );
+    $pms->rule_ready($rulename) if !$ret; # mark ready if nothing launched
  }
 }

 sub process_response_packet {
-  my($self, $pms, $ent, $pkt, $dnskey) = @_;
+  my($self, $pms, $ent, $pkt, $rulename) = @_;

-  my $conf = $pms->{conf};
-  my %rulenames_hit;
-
-  # map a dnskey back to info on queries which caused this DNS lookup
-  my $queries_ref = $pms->{askdns_map_dnskey_to_rules}{$dnskey};
-
-  my($header, @question, @answer, $qtype, $rcode);
  # NOTE: $pkt will be undef if the DNS query was aborted (e.g. timed out)
-  if ($pkt) {
-    @answer = $pkt->answer;
-    $header = $pkt->header;
-    @question = $pkt->question;
-    $qtype = uc $question[0]->qtype  if @question;
-    $rcode = uc $header->rcode  if $header;  # 'NOERROR', 'NXDOMAIN', ...
+  return if !$pkt;
+
+  my @question = $pkt->question;
+  return if !@question;
+
+  $pms->rule_ready($rulename); # mark rule ready for metas
+
+  my @answer = $pkt->answer;
+  my $rcode = uc $pkt->header->rcode;  # 'NOERROR', 'NXDOMAIN', ...

  # NOTE: qname is encoded in RFC 1035 zone format, decode it
-    dbg("askdns: answer received, rcode %s, query %s, answer has %d records",
-        $rcode,
+  dbg("askdns: answer received (%s), rcode %s, query %s, answer has %d records",
+      $rulename, $rcode,
      join(', ', map(join('/', decode_dns_question_entry($_)), @question)),
      scalar @answer);

-    if (defined $rcode && exists $rcode_value{$rcode}) {
  # Net::DNS return a rcode name for codes it knows about,
  # and returns a number for the rest; we deal with numbers from here on
  $rcode = $rcode_value{$rcode}  if exists $rcode_value{$rcode};
-    }
-  }
-  if (!@answer) {
+
  # a trick to make the following loop run at least once, so that we can
  # evaluate also rules which only care for rcode status
-    @answer = ( undef );
-  }
+  @answer = (undef)  if !@answer;

  # NOTE:  $rr->rdstring returns the result encoded in a DNS zone file
  # format, i.e. enclosed in double quotes if a result contains whitespace
@ -561,9 +504,12 @@ sub process_response_packet {
  # the code handling such reply from DNS MUST assemble all of these
  # marshaled text blocks into a single one before any syntactical
  # verification takes place.
-  # The same goes for RFC 4408 (SPF), RFC 4871 (DKIM), RFC 5617 (ADSP),
+  # The same goes for RFC 7208 (SPF), RFC 4871 (DKIM), RFC 5617 (ADSP),
  # draft-kucherawy-dmarc-base (DMARC), ...

+  my $arule = $pms->{conf}->{askdns}{$rulename};
+  my $subtest = $arule->{subtest};
+
  for my $rr (@answer) {
    my($rr_rdatastr, $rdatanum, $rr_type);
    if (!$rr) {
@ -571,37 +517,32 @@ sub process_response_packet {
    } else {
      $rr_type = uc $rr->type;
      if ($rr_type eq 'A') {
-        # Net::DNS::RR::A::address() is available since Net::DNS 0.69
-        $rr_rdatastr = $rr->UNIVERSAL::can('address') ? $rr->address
-                                                      : $rr->rdatastr;
+        $rr_rdatastr = $rr->address;
        if ($rr_rdatastr =~ m/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/) {
          $rdatanum = Mail::SpamAssassin::Util::my_inet_aton($rr_rdatastr);
        }

      } elsif ($rr->UNIVERSAL::can('txtdata')) {
        # TXT, SPF: join with no intervening spaces, as per RFC 5518
-        if ($txtdata_can_provide_a_list || $rr_type ne 'TXT') {
        $rr_rdatastr = join('', $rr->txtdata);  # txtdata() in list context!
-        } else {  # char_str_list() is only available for TXT records
-          $rr_rdatastr = join('', $rr->char_str_list);  # historical
-        }
+        # Net::DNS attempts to decode text strings in a TXT record as UTF-8,
+        # which is undesired: octets failing the UTF-8 decoding are converted
+        # to a Unicode "replacement character" U+FFFD (encoded as octets
+        # \x{EF}\x{BF}\x{BD} in UTF-8), and ASCII text is unnecessarily
+        # flagged as perl native characters (utf8 flag on), which can be
+        # disruptive on later processing, e.g. implicitly upgrading strings
+        # on concatenation. Unfortunately there is no way of legally bypassing
+        # the UTF-8 decoding by Net::DNS::RR::TXT in Net::DNS::RR::Text.
+        # Try to minimize damage by encoding back to UTF-8 octets:
+        utf8::encode($rr_rdatastr)  if utf8::is_utf8($rr_rdatastr);
+
      } else {
-        # rdatastr() is historical, use rdstring() since Net::DNS 0.69
-        $rr_rdatastr = $rr->UNIVERSAL::can('rdstring') ? $rr->rdstring
-                                                       : $rr->rdatastr;
+        $rr_rdatastr = $rr->rdstring;
        utf8::encode($rr_rdatastr)  if utf8::is_utf8($rr_rdatastr);
      }
      # dbg("askdns: received rr type %s, data: %s", $rr_type, $rr_rdatastr);
    }

-    for my $q_tuple (!ref $queries_ref ? () : @$queries_ref) {
-      next  if !$q_tuple;
-      my($query_type, $answer_types_ref, $rules) = @$q_tuple;
-
-      next  if !defined $qtype;
-      $answer_types_ref = [$query_type]  if !defined $answer_types_ref;
-
-      while (my($rulename,$subtest) = each %$rules) {
    my $match;
    local($1,$2,$3);
    if (ref $subtest eq 'HASH') {  # a list of DNS rcodes (as hash keys)
@ -609,7 +550,7 @@ sub process_response_packet {
    } elsif ($rcode != 0) {
      # skip remaining tests on DNS error
    } elsif (!defined($rr_type) ||
-                 !grep($_ eq 'ANY' || $_ eq $rr_type, @$answer_types_ref) ) {
+             !grep($_ eq 'ANY' || $_ eq $rr_type, @{$arule->{a_types}}) ) {
      # skip remaining tests on wrong RR type
    } elsif (!defined $subtest) {
      $match = 1;  # any valid response of the requested RR type matches
@ -628,15 +569,10 @@ sub process_response_packet {
        : 0; # notice int($n1) to fix perl ~5.14 taint bug (Bug 7725)
    }
    if ($match) {
-          $self->askdns_hit($pms, $ent->{query_domain}, $qtype,
+      $self->askdns_hit($pms, $ent->{query_domain}, $question[0]->qtype,
                        $rr_rdatastr, $rulename);
-          $rulenames_hit{$rulename} = 1;
    }
  }
-    }
-  }
-  # these rules have completed (since they got at least 1 hit)
-  $pms->register_async_rule_finish($_)  for keys %rulenames_hit;
 }

 sub askdns_hit {
@ -648,9 +584,11 @@ sub askdns_hit {

  # only the first hit will show in the test log report, even if
  # an answer section matches more than once - got_hit() handles this
-  $pms->clear_test_state;
-  $pms->test_log(sprintf("%s %s:%s", $query_domain,$qtype,$rr_rdatastr));
+  $pms->test_log(sprintf("%s %s:%s", $query_domain,$qtype,$rr_rdatastr), $rulename);
  $pms->got_hit($rulename, 'ASKDNS: ', ruletype => 'askdns');  # score=>$score
 }

+# Version features
+sub has_tag_header { 1 } # HEADER() was implemented together with Conf::feature_get_host # Bug 7734
+
 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/AuthRes.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AuthRes.pm
@ -0,0 +1,553 @@
+# <@LICENSE>
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at:
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+
+=head1 NAME
+
+Mail::SpamAssassin::Plugin::AuthRes - use Authentication-Results header fields
+
+=head1 SYNOPSIS
+
+=head2 SpamAssassin configuration:
+
+loadplugin     Mail::SpamAssassin::Plugin::AuthRes
+
+authres_trusted_authserv  myserv.example.com
+authres_networks  all
+
+=head1 DESCRIPTION
+
+This plugin parses Authentication-Results header fields and can supply the
+results obtained to other plugins, so as to avoid repeating checks that have
+been performed already.
+
+=cut
+
+package Mail::SpamAssassin::Plugin::AuthRes;
+
+use Mail::SpamAssassin::Plugin;
+use Mail::SpamAssassin::Logger;
+use strict;
+use warnings;
+# use bytes;
+use re 'taint';
+
+our @ISA = qw(Mail::SpamAssassin::Plugin);
+
+# list of valid methods and values
+# https://www.iana.org/assignments/email-auth/email-auth.xhtml
+# some others not in that list:
+#   dkim-atps=neutral
+my %method_result = (
+  'auth' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
+  'dkim' => {'fail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'temperror'=>1},
+  'dkim-adsp' => {'discard'=>1,'fail'=>1,'none'=>1,'nxdomain'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'unknown'=>1},
+  'dkim-atps' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'neutral'=>1},
+  'dmarc' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
+  'domainkeys' => {'fail'=>1,'neutral'=>1,'none'=>1,'permerror'=>1,'policy'=>1,'pass'=>1,'temperror'=>1},
+  'iprev' => {'fail'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
+  'rrvs' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1,'unknown'=>1},
+  'sender-id' => {'fail'=>1,'hardfail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'softfail'=>1,'temperror'=>1},
+  'smime' => {'fail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'temperror'=>1},
+  'spf' => {'fail'=>1,'hardfail'=>1,'neutral'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'policy'=>1,'softfail'=>1,'temperror'=>1},
+  'vbr' => {'fail'=>1,'none'=>1,'pass'=>1,'permerror'=>1,'temperror'=>1},
+);
+my %method_ptype_prop = (
+  'auth' => {'smtp' => {'auth'=>1,'mailfrom'=>1}},
+  'dkim' => {'header' => {'d'=>1,'i'=>1,'b'=>1}},
+  'dkim-adsp' => {'header' => {'from'=>1}},
+  'dkim-atps' => {'header' => {'from'=>1}},
+  'dmarc' => {'header' => {'from'=>1}},
+  'domainkeys' => {'header' => {'d'=>1,'from'=>1,'sender'=>1}},
+  'iprev' => {'policy' => {'iprev'=>1}},
+  'rrvs' => {'smtp' => {'rcptto'=>1}},
+  'sender-id' => {'header' => {'*'=>1}},
+  'smime' => {'body' => {'smime-part'=>1,'smime-identifer'=>1,'smime-serial'=>1,'smime-issuer'=>1}},
+  'spf' => {'smtp' => {'mailfrom'=>1,'helo'=>1}},
+  'vbr' => {'header' => {'md'=>1,'mv'=>1}},
+);
+      
+# Some MIME helpers
+my $QUOTED_STRING = qr/"((?:[^"\\]++|\\.)*+)"?/;
+my $TOKEN = qr/[^\s\x00-\x1f\x80-\xff\(\)\<\>\@\,\;\:\/\[\]\?\=\"]+/;
+my $ATOM = qr/[a-zA-Z0-9\@\!\#\$\%\&\\\'\*\+\-\/\=\?\^\_\`\{\|\}\~]+/;
+
+sub new {
+  my ($class, $mailsa) = @_;
+
+  # the usual perlobj boilerplate to create a subclass object
+  $class = ref($class) || $class;
+  my $self = $class->SUPER::new($mailsa);
+  bless ($self, $class);
+
+  $self->set_config($mailsa->{conf});
+
+  # process first as other plugins might depend on us
+  $self->register_method_priority("parsed_metadata", -10);
+
+  $self->register_eval_rule("check_authres_result", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+
+  return $self;
+}
+
+sub set_config {
+  my ($self, $conf) = @_;
+  my @cmds;
+
+=head1 ADMINISTRATOR OPTIONS
+
+=over
+
+=item authres_networks internal/trusted/all    (default: internal)
+
+Process Authenticated-Results headers set by servers from these networks
+(refers to SpamAssassin *_networks zones).  Any header outside this is
+completely ignored (affects all module settings).
+
+ internal   = internal_networks
+ trusted    = internal_networks + trusted_networks
+ all        = all above + all external
+
+Setting "all" is safe only if your MX servers filter properly all incoming
+A-R headers, and you use authres_trusted_authserv to match your authserv-id. 
+This is suitable for default OpenDKIM for example.  These settings might
+also be required if your filters do not insert A-R header to correct
+position above the internal Received header (some known offenders: OpenDKIM,
+OpenDMARC, amavisd-milter).
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'authres_networks',
+    is_admin => 1,
+    default => 'internal',
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if (!defined $value || $value =~ /^$/) {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      $value = lc($value);
+      if ($value =~ /^(?:internal|trusted|all)$/) {
+        $self->{authres_networks} = $value;
+      } else {
+        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+      }
+    }
+  });
+
+=over
+
+=item authres_trusted_authserv authservid1 id2 ...   (default: none)
+
+Trusted authentication server IDs (the domain-name-like first word of
+Authentication-Results field, also known as C<authserv-id>).
+
+Note that if set, ALL A-R headers are ignored unless a match is found.
+
+Use strongly recommended, possibly along with authres_networks all.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'authres_trusted_authserv',
+    is_admin => 1,
+    default => {},
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if (!defined $value || $value =~ /^$/) {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      foreach my $id (split(/\s+/, lc $value)) {
+        $self->{authres_trusted_authserv}->{$id} = 1;
+      }
+    }
+  });
+
+=over
+
+=item authres_ignored_authserv authservid1 id2 ...   (default: none)
+
+Ignored authentication server IDs (the domain-name-like first word of
+Authentication-Results field, also known as C<authserv-id>).
+
+Any A-R header is ignored if match is found.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'authres_ignored_authserv',
+    is_admin => 1,
+    default => {},
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if (!defined $value || $value =~ /^$/) {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      foreach my $id (split(/\s+/, lc $value)) {
+        $self->{authres_ignored_authserv}->{$id} = 1;
+      }
+    }
+  });
+
+  $conf->{parser}->register_commands(\@cmds);
+}
+
+=head1 METADATA
+
+Parsed headers are stored in $pms-E<gt>{authres_parsed}, as a hash of array
+of hashes where results are collected by method.  For example, the header
+field:
+
+  Authentication-Results: server.example.com;
+    spf=pass smtp.mailfrom=bounce.example.org;
+    dkim=pass header.i=@example.org;
+    dkim=fail header.i=@another.signing.domain.example
+
+Produces the following structure:
+
+ $pms->{authres_parsed} = {
+   'dkim' => [
+     {
+       'properties' => {
+         'header' => {
+           'i' => '@example.org'
+         }
+       },
+       'authserv' => 'server.example.com',
+       'result' => 'pass',
+       'version' => 1,
+       'reason' => ''
+     },
+     {
+       'properties' => {
+         'header' => {
+           'i' => '@another.signing.domain.example'
+         }
+       },
+       'result' => 'fail',
+       'authserv' => 'server.example.com',
+       'version' => 1,
+       'reason' => ''
+     },
+   ],
+ }
+
+Within each array, the order of results is the original, which should be most
+recent results first.
+
+For checking result of methods, $pms-E<gt>{authres_result} is available:
+
+ $pms->{authres_result} = {
+   'dkim' => 'pass',
+   'spf' => 'fail',
+ }
+
+=head1 EVAL FUNCTIONS
+
+=over 4
+
+=item header RULENAME eval:check_authres_result(method, result)
+
+Can be used to check results.
+
+  ifplugin Mail::SpamAssassin::Plugin::AuthRes
+  ifplugin !(Mail::SpamAssassin::Plugin::SPF)
+    header  SPF_PASS      eval:check_authres_result('spf', 'pass')
+    header  SPF_FAIL      eval:check_authres_result('spf', 'fail')
+    header  SPF_SOFTFAIL  eval:check_authres_result('spf', 'softfail')
+    header  SPF_TEMPFAIL  eval:check_authres_result('spf', 'tempfail')
+  endif
+  ifplugin !(Mail::SpamAssassin::Plugin::DKIM)
+    header  DKIM_VERIFIED  eval:check_authres_result('dkim', 'pass')
+    header  DKIM_INVALID   eval:check_authres_result('dkim', 'fail')
+  endif
+  endif
+
+=back
+
+=cut
+
+sub check_authres_result {
+  my ($self, $pms, $method, $wanted_result) = @_;
+
+  my $result = $pms->{authres_result}->{$method};
+  $wanted_result = lc($wanted_result);
+
+  if ($wanted_result eq 'missing') {
+    return !defined($result) ? 1 : 0;
+  }
+
+  return ($wanted_result eq $result);
+}
+
+sub parsed_metadata {
+  my ($self, $opts) = @_;
+
+  my $pms = $opts->{permsgstatus};
+
+  my @authres;
+  my $nethdr;
+
+  if ($pms->{conf}->{authres_networks} eq 'internal') {
+    $nethdr = 'ALL-INTERNAL';
+  } elsif ($pms->{conf}->{authres_networks} eq 'trusted') {
+    $nethdr = 'ALL-TRUSTED';
+  } else {
+    $nethdr = 'ALL';
+  }
+
+  foreach my $hdr (split(/^/m, $pms->get($nethdr))) {
+    if ($hdr =~ /^(?:Arc\-)?Authentication-Results:\s*(.+)/i) {
+      push @authres, $1;
+    }
+  }
+
+  if (!@authres) {
+    dbg("authres: no Authentication-Results headers found from %s",
+      $pms->{conf}->{authres_networks});
+    return 0;
+  }
+
+  foreach (@authres) {
+    eval {
+      $self->parse_authres($pms, $_);
+    } or do {
+      dbg("authres: skipping header, $@");
+    }
+  }
+
+  $pms->{authres_result} = {};
+  # Set $pms->{authres_result} info for all found methods
+  # 'pass' will always win if multiple results
+  foreach my $method (keys %method_result) {
+    my $parsed = $pms->{authres_parsed}->{$method};
+    next if !$parsed;
+    foreach my $pref (@$parsed) {
+      if (!$pms->{authres_result}->{$method} ||
+            $pref->{result} eq 'pass')
+      {
+        $pms->{authres_result}->{$method} = $pref->{result};
+      }
+    }
+  }
+
+  if (%{$pms->{authres_result}}) {
+    dbg("authres: results: %s",
+      join(' ', map { $_.'='.$pms->{authres_result}->{$_} }
+        sort keys %{$pms->{authres_result}}));
+  } else {
+    dbg("authres: no results");
+  }
+}
+
+sub parse_authres {
+  my ($self, $pms, $hdr) = @_;
+
+  dbg("authres: parsing Authentication-Results: $hdr");
+
+  my $authserv;
+  my $version = 1;
+  my @methods;
+
+  local $_ = $hdr;
+
+  # authserv-id
+  if (!/\G($TOKEN)/gcs) {
+    die("invalid authserv\n");
+  }
+  $authserv = lc($1);
+
+  if (%{$pms->{conf}->{authres_trusted_authserv}}) {
+    if (!$pms->{conf}->{authres_trusted_authserv}->{$authserv}) {
+      die("authserv not trusted: $authserv\n");
+    }
+  }
+  if ($pms->{conf}->{authres_ignored_authserv}->{$authserv}) {
+    die("ignored authserv: $authserv\n");
+  }
+
+  skip_cfws();
+  if (/\G\d+/gcs) { # skip authserv version
+    skip_cfws();
+  }
+  if (!/\G;/gcs) {
+    die("missing delimiter\n");
+  }
+  skip_cfws();
+
+  while (pos() < length()) {
+    my ($method, $result);
+    my $reason = '';
+    my $props = {};
+
+    # skip none method
+    if (/\Gnone\b/igcs) {
+      die("method none\n");
+    }
+
+    # method / version = result
+    if (!/\G([\w-]+)/gcs) {
+      die("invalid method\n");
+    }
+    $method = lc($1);
+    if (!exists $method_result{$method}) {
+      die("unknown method: $method\n");
+    }
+    skip_cfws();
+    if (/\G\//gcs) {
+      skip_cfws();
+      if (!/\G\d+/gcs) {
+        die("invalid $method version\n");
+      }
+      $version = $1;
+      skip_cfws();
+    }
+    if (!/\G=/gcs) {
+      die("missing result for $method: ".substr($_, pos())."\n");
+    }
+    skip_cfws();
+    if (!/\G(\w+)/gcs) {
+      die("invalid result for $method\n");
+    }
+    $result = $1;
+    if (!exists $method_result{$method}{$result}) {
+      die("unknown result for $method: $result\n");
+    }
+    skip_cfws();
+
+    # reason = value
+    if (/\Greason\b/igcs) {
+      skip_cfws();
+      if (!/\G=/gcs) {
+        die("invalid reason\n");
+      }
+      skip_cfws();
+      if (!/\G$QUOTED_STRING|($TOKEN)/gcs) {
+        die("invalid reason\n");
+      }
+      $reason = defined $1 ? $1 : $2;
+      skip_cfws();
+    }
+
+    # ptype.property = value
+    while (pos() < length()) {
+      my ($ptype, $property, $value);
+
+      # ptype
+      if (!/\G(\w+)/gcs) {
+        die("invalid ptype: ".substr($_,pos())."\n");
+      }
+      $ptype = lc($1);
+      if (!exists $method_ptype_prop{$method}{$ptype}) {
+        die("unknown ptype: $ptype\n");
+      }
+      skip_cfws();
+
+      # dot
+      if (!/\G\./gcs) {
+        die("missing property\n");
+      }
+      skip_cfws();
+
+      # property
+      if (!/\G(\w+)/gcs) {
+        die("invalid property\n");
+      }
+      $property = lc($1);
+      if (!exists $method_ptype_prop{$method}{$ptype}{$property} &&
+          !exists $method_ptype_prop{$method}{$ptype}{'*'}) {
+        die("unknown property for $ptype: $property\n");
+      }
+      skip_cfws();
+
+      # =
+      if (!/\G=/gcs) {
+        die("missing property value\n");
+      }
+      skip_cfws();
+
+      # value:
+      # The grammar is ( value / [ [ local-part ] "@" ] domain-name )
+      # where value := token / quoted-string
+      # and local-part := dot-atom / quoted-string / obs-local-part
+      if (!/\G$QUOTED_STRING|($ATOM(?:\.$ATOM)*|$TOKEN)(?=(?:[\s;]|$))/gcs) {
+        die("invalid $ptype.$property value\n");
+      }
+      $value = defined $1 ? $1 : $2;
+      skip_cfws();
+
+      $props->{$ptype}->{$property} = $value;
+
+      if (/\G(?:;|$)/gcs) {
+        skip_cfws();
+        last;
+      }
+    }
+
+    push @methods, [$method, {
+        'authserv' => $authserv,
+        'version' => $version,
+        'result' => $result,
+        'reason' => $reason,
+        'properties' => $props,
+        }];
+  }
+
+  # paranoid check..
+  if (pos() < length()) {
+    die("parse ended prematurely?\n");
+  }
+
+  # Pushed to pms only if header parsed completely
+  foreach my $marr (@methods) {
+    push @{$pms->{authres_parsed}->{$marr->[0]}}, $marr->[1];
+  }
+
+  return 1;
+}
+
+# skip whitespace and comments
+sub skip_cfws {
+  /\G\s*/gcs;
+  if (/\G\(/gcs) {
+    my $i = 1;
+    while (/\G.*?([()]|\z)/gcs) {
+      $1 eq ')' ? $i-- : $i++;
+      last if !$i;
+    }
+    die("comment not ended\n") if $i;
+    /\G\s*/gcs;
+  }
+}
+
+#sub check_cleanup {
+#  my ($self, $opts) = @_;
+#  my $pms = $opts->{permsgstatus};
+#  use Data::Dumper;
+#  print STDERR Dumper($pms->{authres_parsed});
+#  print STDERR Dumper($pms->{authres_result});
+#}
+
+1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
@ -105,7 +105,11 @@ Note: SpamAssassin requires at least 3 points from the header, and 3
 points from the body to auto-learn as spam.  Therefore, the minimum
 working value for this option is 6.

-If the test option autolearn_force is set, the minimum value will 
+If test option C<autolearn_header> or C<autolearn_body> is set, points from
+that rule are forced to count as coming from header or body accordingly. 
+This can be useful for adjusting some meta rules.
+
+If the test option C<autolearn_force> is set, the minimum value will 
 remain at 6 points but there is no requirement that the points come
 from body and header rules.  This option is useful for autolearning
 with rules that are considered to be extremely safe indicators of 
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Bayes.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Bayes.pm
@ -37,7 +37,43 @@ And the chi-square probability combiner as described here:

 The results are incorporated into SpamAssassin as the BAYES_* rules.

-=head1 METHODS
+=head1 ADMINISTRATOR SETTINGS
+
+=over 4
+
+=item bayes_stopword_languages lang             (default: en)
+
+Languages enabled in bayes stopwords processing, every language have a
+default stopwords regexp, tokens matching this regular expression will not
+be considered in bayes processing.
+
+Custom regular expressions for additional languages can be defined in C<local.cf>.
+
+Custom regular expressions can be specified by using the C<bayes_stopword_lang>
+keyword like in the following example:
+
+ bayes_stopword_languages en se
+ bayes_stopword_en (?:you|me)
+ bayes_stopword_se (?:du|mig)
+
+Regexps are case-insensitive will be anchored automatically at beginning and
+end.
+
+To disable stopwords usage, specify C<bayes_stopword_languages disable>.
+
+Only one bayes_stopword_languages or bayes_stopword_xx configuration line
+can be used.  New configuration line will override the old one, for example
+the ones from SpamAssassin default ruleset (60_bayes_stopwords.cf).
+
+=back
+
+=over 4
+
+=item bayes_max_token_length (default: 15)
+
+Configure the maximum number of character a token could contain
+
+=back

 =cut

@ -48,16 +84,12 @@ use warnings;
 # use bytes;
 use re 'taint';

-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1 sha1_hex); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1 sha1_hex) }
-}
+use Digest::SHA qw(sha1 sha1_hex);

-use Mail::SpamAssassin;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::PerMsgStatus;
 use Mail::SpamAssassin::Logger;
-use Mail::SpamAssassin::Util qw(untaint_var);
+use Mail::SpamAssassin::Util qw(compile_regexp untaint_var);

 # pick ONLY ONE of these combining implementations.
 use Mail::SpamAssassin::Bayes::CombineChi;
@ -135,13 +167,16 @@ our $IGNORED_HDRS = qr{(?: (?:X-)?Sender    # misc noise
  | X-Gnus-Mail-Source
  | Xref

-)}x;
+)}ix;

 # Note only the presence of these headers, in order to reduce the
 # hapaxen they generate.
 our $MARK_PRESENCE_ONLY_HDRS = qr{(?: X-Face
  |X-(?:Gnu-?PG|PGP|GPG)(?:-Key)?-Fingerprint
  |D(?:KIM|omainKey)-Signature
+  |X-Google-DKIM-Signature
+  |ARC-(?:Message-Signature|Seal)
+  |Autocrypt
 )}ix;

 # tweaks tested as of Nov 18 2002 by jm posted to -devel at
@ -220,6 +255,8 @@ use constant REQUIRE_SIGNIFICANT_TOKENS_TO_SCORE => -1;

 # How long a token should we hold onto?  (note: German speakers typically
 # will require a longer token than English ones.)
+# This is just a default value, option can be changed using
+# bayes_max_token_length option
 use constant MAX_TOKEN_LENGTH => 15;

 ###########################################################################
@ -236,10 +273,100 @@ sub new {
  $self->{conf} = $main->{conf};
  $self->{use_ignores} = 1;

-  $self->register_eval_rule("check_bayes");
+  # Old default stopword list, need to have hardcoded one incase sa-update is not available
+  $self->{bayes_stopword}{en} = qr/(?:a(?:ble|l(?:ready|l)|n[dy]|re)|b(?:ecause|oth)|c(?:an|ome)|e(?:ach|mail|ven)|f(?:ew|irst|or|rom)|give|h(?:a(?:ve|s)|ttp)|i(?:n(?:formation|to)|t\'s)|just|know|l(?:ike|o(?:ng|ok))|m(?:a(?:de|il(?:(?:ing|to))?|ke|ny)|o(?:re|st)|uch)|n(?:eed|o[tw]|umber)|o(?:ff|n(?:ly|e)|ut|wn)|p(?:eople|lace)|right|s(?:ame|ee|uch)|t(?:h(?:at|is|rough|e)|ime)|using|w(?:eb|h(?:ere|y)|ith(?:out)?|or(?:ld|k))|y(?:ears?|ou(?:(?:\'re|r))?))/;
+
+  $self->set_config($self->{conf});
+  $self->register_eval_rule("check_bayes", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
  $self;
 }

+sub set_config {
+  my ($self, $conf) = @_;
+  my @cmds;
+
+  push(@cmds, {
+    setting => 'bayes_max_token_length',
+    default => MAX_TOKEN_LENGTH,
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+  });
+
+  push(@cmds, {
+    setting => 'bayes_stopword_languages',
+    default => ['en'],
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRINGLIST,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      my @langs;
+      if ($value eq 'disable') {
+        @{$self->{bayes_stopword_languages}} = ();
+      }
+      else {
+        foreach my $lang (split(/(?:\s*,\s*|\s+)/, lc($value))) {
+          if ($lang !~ /^([a-z]{2})$/) {
+            return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+          }
+          push @langs, $lang;
+        }
+        if (!@langs) {
+          return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+        }
+        @{$self->{bayes_stopword_languages}} = @langs;
+      }
+    }
+  });
+
+  $conf->{parser}->register_commands(\@cmds);
+}
+
+sub parse_config {
+  my ($self, $opts) = @_;
+
+  # Ignore users's configuration lines
+  return 0 if $opts->{user_config};
+
+  if ($opts->{key} =~ /^bayes_stopword_([a-z]{2})$/i) {
+      $self->inhibit_further_callbacks();
+      my $lang = lc($1);
+      foreach my $re (split(/\s+/, $opts->{value})) {
+        my ($rec, $err) = compile_regexp('^(?i)'.$re.'$', 0);
+        if (!$rec) {
+          warn "bayes: invalid regexp for $opts->{key}: $err\n";
+          return 0;
+        }
+        $self->{bayes_stopword}{$lang} = $rec;
+      }
+      return 1;
+  }
+
+  return 0;
+}
+
+sub finish_parsing_end {
+  my ($self, $opts) = @_;
+  my $conf = $opts->{conf};
+
+  my @langs;
+  foreach my $lang (@{$conf->{bayes_stopword_languages}}) {
+    if (defined $self->{bayes_stopword}{$lang}) {
+      push @langs, $lang;
+    } else {
+      warn "bayes: missing stopwords regexp for language '$lang'\n";
+    }
+  }
+  if (@langs) {
+    dbg("bayes: stopwords for languages enabled: ".join(' ', @langs));
+    @{$conf->{bayes_stopword_languages}} = @langs;
+  } else {
+    dbg("bayes: no stopword languages enabled");
+    $conf->{bayes_stopword_languages} = [];
+  }
+
+  return 0;
+}
+
 sub finish {
  my $self = shift;
  if ($self->{store}) {
@ -410,10 +537,11 @@ sub _learn_trapped {
  my @msgid = ( $msgid );

  if (!defined $msgid) {
-    @msgid = $self->get_msgid($msg);
+    @msgid = ( $msg->generate_msgid(), $msg->get_msgid() );
  }

  foreach my $msgid_t ( @msgid ) {
+    next if !defined $msgid_t;
    my $seen = $self->{store}->seen_get ($msgid_t);

    if (defined ($seen)) {
@ -545,7 +673,7 @@ sub _forget_trapped {
  my $isspam;

  if (!defined $msgid) {
-    @msgid = $self->get_msgid($msg);
+    @msgid = ( $msg->generate_msgid(), $msg->get_msgid() );
  }

  while( $msgid = shift @msgid ) {
@ -671,7 +799,6 @@ sub learner_is_scan_available {

 sub scan {
  my ($self, $permsgstatus, $msg) = @_;
-  my $score;

  return unless $self->{conf}->{use_learner};

@ -756,6 +883,7 @@ sub scan {
  if (@pw_keys > N_SIGNIFICANT_TOKENS) { $#pw_keys = N_SIGNIFICANT_TOKENS - 1 }

  my @sorted;
+  my $score;
  foreach my $tok (@pw_keys) {
    next if $tok_strength{$tok} <
                $Mail::SpamAssassin::Bayes::Combine::MIN_PROB_STRENGTH;
@ -964,49 +1092,6 @@ sub learner_dump_database {
 ###########################################################################
 # TODO: these are NOT public, but the test suite needs to call them.

-sub get_msgid {
-  my ($self, $msg) = @_;
-
-  my @msgid;
-
-  my $msgid = $msg->get_header("Message-Id");
-  if (defined $msgid && $msgid ne '' && $msgid !~ /^\s*<\s*(?:\@sa_generated)?>.*$/) {
-    # remove \r and < and > prefix/suffixes
-    chomp $msgid;
-    $msgid =~ s/^<//; $msgid =~ s/>.*$//g;
-    push(@msgid, $msgid);
-  }
-
-  # Modified 2012-01-17  per bug 5185 to remove last received from msg_id calculation
-
-  # Use sha1_hex(Date: and top N bytes of body)
-  # where N is MIN(1024 bytes, 1/2 of body length)
-  #
-  my $date = $msg->get_header("Date");
-  $date = "None" if (!defined $date || $date eq ''); # No Date?
-
-  #Removed per bug 5185
-  #my @rcvd = $msg->get_header("Received");
-  #my $rcvd = $rcvd[$#rcvd];
-  #$rcvd = "None" if (!defined $rcvd || $rcvd eq ''); # No Received?
-
-  # Make a copy since pristine_body is a reference ...
-  my $body = join('', $msg->get_pristine_body());
-
-  if (length($body) > 64) { # Small Body?
-    my $keep = ( length $body > 2048 ? 1024 : int(length($body) / 2) );
-    substr($body, $keep) = '';
-  }
-
-  #Stripping all CR and LF so that testing midstream from MTA and post delivery don't 
-  #generate different id's simply because of LF<->CR<->CRLF changes.
-  $body =~ s/[\r\n]//g;
-
-  unshift(@msgid, sha1_hex($date."\000".$body).'@sa_generated');
-
-  return wantarray ? @msgid : $msgid[0];
-}
-
 sub get_body_from_msg {
  my ($self, $msg) = @_;

@ -1024,7 +1109,7 @@ sub get_body_from_msg {

  if (!defined $msgdata) {
    # why?!
-    warn "bayes: failed to get body for ".scalar($self->get_msgid($self->{msg}))."\n";
+    warn "bayes: failed to get body for ".scalar($self->{msg}->generate_msgid())."\n";
    return { };
  }

@ -1052,8 +1137,10 @@ sub _get_msgdata_from_permsgstatus {
 # The calling functions expect a uniq'ed array of tokens ...
 sub tokenize {
  my ($self, $msg, $msgdata) = @_;
+  my $conf = $self->{conf};
+  my $t_src = $conf->{bayes_token_sources};

-  my $t_src = $self->{conf}->{bayes_token_sources};
+  $self->{stopword_cache} = ();

  # visible tokens from the body
  my @tokens_body;
@ -1117,6 +1204,8 @@ sub tokenize {
    dbg("bayes: tokenized header: %d tokens", scalar @tokens_header);
  }

+  delete $self->{stopword_cache};
+
  # Go ahead and uniq the array, skip null tokens (can happen sometimes)
  # generate an SHA1 hash and take the lower 40 bits as our token
  my %tokens;
@ -1137,6 +1226,7 @@ sub _tokenize_line {
  my $region = $_[3];
  local ($_) = $_[1];

+  my $conf = $self->{conf};
  my @rettokens;

  # include quotes, .'s and -'s for URIs, and [$,]'s for Nigerian-scam strings,
@ -1174,7 +1264,7 @@ sub _tokenize_line {
  # cleared, even if the source string has perl characters semantics !!!
  # Is this really still desirable?

-  foreach my $token (split) {
+TOKEN: foreach my $token (split) {
    $token =~ s/^[-'"\.,]+//;        # trim non-alphanum chars at start or end
    $token =~ s/[-'"\.,]+$//;        # so we don't get loads of '"foo' tokens

@ -1183,7 +1273,7 @@ sub _tokenize_line {
    # tokens, so the SQL BayesStore returns undef.  I really want a way
    # of optimizing that out, but I haven't come up with anything yet.
    #
-    next if ( defined $magic_re && $token =~ /$magic_re/ );
+    next if ( defined $magic_re && $token =~ /$magic_re/o );

    # *do* keep 3-byte tokens; there's some solid signs in there
    my $len = length($token);
@ -1192,8 +1282,24 @@ sub _tokenize_line {
    # area, and it just slows us down to record them.
    # See http://wiki.apache.org/spamassassin/BayesStopList for more info.
    #
-    next if $len < 3 ||
-	($token =~ /^(?:a(?:ble|l(?:ready|l)|n[dy]|re)|b(?:ecause|oth)|c(?:an|ome)|e(?:ach|mail|ven)|f(?:ew|irst|or|rom)|give|h(?:a(?:ve|s)|ttp)|i(?:n(?:formation|to)|t\'s)|just|know|l(?:ike|o(?:ng|ok))|m(?:a(?:de|il(?:(?:ing|to))?|ke|ny)|o(?:re|st)|uch)|n(?:eed|o[tw]|umber)|o(?:ff|n(?:ly|e)|ut|wn)|p(?:eople|lace)|right|s(?:ame|ee|uch)|t(?:h(?:at|is|rough|e)|ime)|using|w(?:eb|h(?:ere|y)|ith(?:out)?|or(?:ld|k))|y(?:ears?|ou(?:(?:\'re|r))?))$/i);
+    next if $len < 3;
+
+    # check stopwords regexp if not cached
+    if (@{$conf->{bayes_stopword_languages}}) {
+      if (!exists $self->{stopword_cache}{$token}) {
+        foreach my $lang (@{$conf->{bayes_stopword_languages}}) {
+          if ($token =~ $self->{bayes_stopword}{$lang}) {
+            dbg("bayes: skipped token '$token' because it's in stopword list for language '$lang'");
+            $self->{stopword_cache}{$token} = 1;
+            next TOKEN;
+          }
+        }
+        $self->{stopword_cache}{$token} = 0;
+      } else {
+        # bail out if cached known
+        next if $self->{stopword_cache}{$token};
+      }
+    }

    # are we in the body?  If so, apply some body-specific breakouts
    if ($region == 1 || $region == 2) {
@ -1212,7 +1318,7 @@ sub _tokenize_line {
    # used as part of split tokens such as "HTo:D*net" indicating that 
    # the domain ".net" appeared in the To header.
    #
-    if ($len > MAX_TOKEN_LENGTH && $token !~ /\*/) {
+    if ($len > $conf->{bayes_max_token_length} && index($token, '*') == -1) {

      if (TOKENIZE_LONG_8BIT_SEQS_AS_UTF8_CHARS && $token =~ /[\x80-\xBF]{2}/) {
 	# Bug 7135
@ -1287,9 +1393,6 @@ sub _tokenize_headers {

  my %parsed;

-  my %user_ignore;
-  $user_ignore{lc $_} = 1 for @{$self->{main}->{conf}->{bayes_ignore_headers}};
-
  # get headers in array context
  my @hdrs;
  my @rcvdlines;
@ -1317,7 +1420,7 @@ sub _tokenize_headers {

    # remove user-specified headers here, after Received, in case they
    # want to ignore that too
-    next if exists $user_ignore{lc $hdr};
+    next if exists $self->{conf}->{bayes_ignore_header}->{lc $hdr};

    # Prep the header value
    $val ||= '';
@ -1344,22 +1447,38 @@ sub _tokenize_headers {
    elsif ($hdr =~ /^${MARK_PRESENCE_ONLY_HDRS}$/i) {
      $val = "1"; # just mark the presence, they create lots of hapaxen
    }
+    elsif ($hdr =~ /^x-spam-relays-(?:external|internal|trusted|untrusted)$/) {
+      # remove redundant rdns helo ident envfrom intl auth msa words
+      $val =~ s/ [a-z]+=/ /g;
+    }

    if (MAP_HEADERS_MID) {
      if ($hdr =~ /^(?:In-Reply-To|References|Message-ID)$/i) {
+        if (exists $parsed{"*MI"}) {
+          $parsed{"*MI"} .= " ".$val;
+        } else {
          $parsed{"*MI"} = $val;
        }
      }
+    }
    if (MAP_HEADERS_FROMTOCC) {
      if ($hdr =~ /^(?:From|To|Cc)$/i) {
+        if (exists $parsed{"*Ad"}) {
+          $parsed{"*Ad"} .= " ".$val;
+        } else {
          $parsed{"*Ad"} = $val;
        }
      }
+    }
    if (MAP_HEADERS_USERAGENT) {
      if ($hdr =~ /^(?:X-Mailer|User-Agent)$/i) {
+        if (exists $parsed{"*UA"}) {
+          $parsed{"*UA"} .= " ".$val;
+        } else {
          $parsed{"*UA"} = $val;
        }
      }
+    }

    # replace hdr name with "compressed" version if possible
    if (defined $HEADER_NAME_COMPRESSION{$hdr}) {
@ -1371,11 +1490,13 @@ sub _tokenize_headers {
    } else {
      $parsed{$hdr} = $val;
    }
+  }
+
  if (would_log('dbg', 'bayes') > 1) {
+    foreach my $hdr (sort keys %parsed) {
      dbg("bayes: header tokens for $hdr = \"$parsed{$hdr}\"");
    }
  }
-
  return %parsed;
 }

@ -1393,7 +1514,7 @@ sub _pre_chew_content_type {
  }

  # stop-list words for Content-Type header: these wind up totally gray
-  $val =~ s/\b(?:text|charset)\b//;
+  $val =~ s/\b(?:text|charset)\b/ /g;

  $val;
 }
@ -1468,10 +1589,17 @@ sub _pre_chew_addr_header {
  my ($self, $val) = @_;
  local ($_);

-  my @addrs = $self->{main}->find_all_addrs_in_line ($val);
+  my @addrs = Mail::SpamAssassin::Util::parse_header_addresses($val);
  my @toks;
-  foreach (@addrs) {
-    push (@toks, $self->_tokenize_mail_addrs ($_));
+  foreach my $addr (@addrs) {
+    if (defined $addr->{phrase}) {
+      foreach (split(/\s+/, $addr->{phrase})) {
+        push @toks, "N*".$_; # Bug 6319
+      }
+    }
+    if (defined $addr->{address}) {
+      push @toks, $self->_tokenize_mail_addrs($addr->{address});
+    }
  }
  return join (' ', @toks);
 }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/BodyEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/BodyEval.pm
@ -39,16 +39,16 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("multipart_alternative_difference");
-  $self->register_eval_rule("multipart_alternative_difference_count");
-  $self->register_eval_rule("check_blank_line_ratio");
-  $self->register_eval_rule("tvd_vertical_words");
-  $self->register_eval_rule("check_stock_info");
-  $self->register_eval_rule("check_body_length");
+  $self->register_eval_rule("multipart_alternative_difference", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("multipart_alternative_difference_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_blank_line_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("tvd_vertical_words", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_stock_info", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_body_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

-  $self->register_eval_rule("plaintext_body_length");
-  $self->register_eval_rule("plaintext_sig_length");
-  $self->register_eval_rule("plaintext_body_sig_ratio");
+  $self->register_eval_rule("plaintext_body_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("plaintext_sig_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("plaintext_body_sig_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -69,7 +69,7 @@ sub multipart_alternative_difference_count {
  my ($self, $pms, $fulltext, $ratio, $minhtml) = @_;
  $self->_multipart_alternative_difference($pms) unless (exists $pms->{madiff});
  return 0 unless $pms->{madiff_html} > $minhtml;
-  return(($pms->{madiff_text} / $pms->{madiff_html}) > $ratio);
+  return (($pms->{madiff_text} / $pms->{madiff_html}) > $ratio);
 }

 sub _multipart_alternative_difference {
@ -81,7 +81,7 @@ sub _multipart_alternative_difference {
  my $msg = $pms->{msg};

  # Find all multipart/alternative parts in the message
-  my @ma = $msg->find_parts(qr@^multipart/alternative\b@i);
+  my @ma = $msg->find_parts(qr@^multipart/alternative\b@);

  # If there are no multipart/alternative sections, skip this test.
  return if (!@ma);
@ -104,7 +104,7 @@ sub _multipart_alternative_difference {
    my %text;

    # limit our search to text-based parts
-    my @txt = $part->find_parts(qr@^text\b@i);
+    my @txt = $part->find_parts(qr@^text\b@);
    foreach my $text (@txt) {
      # we only care about the rendered version of the part
      my ($type, $rnd) = $text->rendered();
@ -123,7 +123,7 @@ sub _multipart_alternative_difference {
        }

 	# If there are no words, mark if there's at least 1 image ...
-	if (!%html && exists $pms->{html}{inside}{img}) {
+	if (!%html && exists $text->{html_results}{inside}{img}) {
 	  # Use "\n" as the mark since it can't ever occur normally
 	  $html{"\n"}=1;
 	}
@ -222,7 +222,7 @@ sub tvd_vertical_words {
  }

  dbg("eval: tvd_vertical_words value: $pms->{tvd_vertical_words} / min: $min / max: $max - value must be >= min and < max");
-  return 1 if ($pms->{tvd_vertical_words} >= $min && $pms->{tvd_vertical_words} < $max);
+  return ($pms->{tvd_vertical_words} >= $min && $pms->{tvd_vertical_words} < $max);
 }

 sub check_stock_info {
@ -241,7 +241,7 @@ sub _check_stock_info {
  $pms->{stock_info} = 0;

  # Find all multipart/alternative parts in the message
-  my @parts = $pms->{msg}->find_parts(qr@^text/plain$@i);
+  my @parts = $pms->{msg}->find_parts(qr@^text/plain$@);
  return if (!@parts);

  # Go through each of the multipart parts
@ -360,11 +360,17 @@ sub _plaintext_body_sig_ratio {

  # Find the last occurence of a signature delimiter and get the body and
  # signature lengths.
-  my ($len_b, $len_s) = map { length } $text =~ /(^|.*\n)-- \n(.*?)$/s;

-  if (! defined $len_b) {     # no sig marker, all body
-      $len_b = length $text;
-      $len_s = 0;
+  my $len_b = length($text);
+  my $len_s = 0;
+
+  while ($text =~ /^-- ?\r?$/mg) {
+
+    # ignore decoy marker at the end
+    next if ( length($text) - $+[0] <= 4 );
+
+    $len_b = $-[0];
+    $len_s = length($text) - $+[0];
  }

  $pms->{plaintext_body_sig_ratio}->{body_length} = $len_b;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
@ -119,6 +119,12 @@ sub extract_set {
    my $nicepri = $pri; $nicepri =~ s/-/neg/g;
    $self->extract_set_pri($conf, $test_set->{$pri}, $ruletype.'_'.$nicepri);
  }
+
+  # Clear extract_hints tmpfile
+  if ($self->{tmpf}) {
+    unlink $self->{tmpf};
+    delete $self->{tmpf};
+  }
 }

 ###########################################################################
@ -190,11 +196,13 @@ NEXT_RULE:
      next NEXT_RULE;
    }

-    # ignore ReplaceTags rules
-    my $is_a_replacetags_rule = $conf->{replace_rules}->{$name};
+    # ignore ReplaceTags rules, and regex capture template rules
+    my $is_a_replace_rule = $conf->{replace_rules}->{$name} ||
+                            $conf->{capture_rules}->{$name} ||
+                            $conf->{capture_template_rules}->{$name};
    my ($minlen, $lossy, @bases);

-    if (!$is_a_replacetags_rule) {
+    if (!$is_a_replace_rule) {
      eval {  # catch die()s
        my ($qr, $mods) = $self->simplify_and_qr_regexp($rule);
        ($lossy, @bases) = $self->extract_hints($rule, $qr, $mods);
@ -202,6 +210,7 @@ NEXT_RULE:
        1;
      } or do {
        my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
+        $eval_stat =~ s/ at .*//s;
        dbg("zoom: giving up on regexp: $eval_stat");
      };

@ -220,9 +229,9 @@ NEXT_RULE:
      }
    }

-    if ($is_a_replacetags_rule || !$minlen || !@bases) {
+    if ($is_a_replace_rule || !$minlen || !@bases) {
      dbg("zoom: ignoring rule %s, %s", $name,
-          $is_a_replacetags_rule ? 'is a replace rule'
+          $is_a_replace_rule ? 'is a replace rule'
          : !@bases ? 'no bases' : 'no minlen');
      push @failed, { orig => $rule };
      $cached->{rule_bases}->{$cachekey} = { };
@ -469,7 +478,7 @@ sub simplify_and_qr_regexp {
  }
  else {
    die "case-i" if $rule =~ /\(\?i\)/;
-    die "case-i" if $mods =~ /i/;
+    die "case-i" if index($mods, 'i') >= 0;

    # always case-i: /A(?i:ct) N(?i:ow)/ => /Act Now/
    $rule =~ s/(?<!\\)\(\?i\:(.*?)\)/$1/gs and die "case-i";
@ -504,10 +513,7 @@ sub simplify_and_qr_regexp {
 }

 sub extract_hints {
-  my $self = shift;
-  my $rawrule = shift;
-  my $rule = shift;
-  my $mods = shift;
+  my ($self, $rawrule, $rule, $mods) = @_;

  my $main = $self->{main};
  my $orig = $rule;
@ -534,24 +540,30 @@ sub extract_hints {
  # r? => (r|)
  $rule =~ s/(?<!\\)(\w)\?/\($1\|\)/gs;

-  my ($tmpf, $tmpfh) = Mail::SpamAssassin::Util::secure_tmpfile();
+  # Create single tmpfile for extract_hints to use, instead of thousands
+  if (!$self->{tmpf}) {
+    ($self->{tmpf}, my $tmpfh) = Mail::SpamAssassin::Util::secure_tmpfile();
    $tmpfh  or die "failed to create a temporary file";
-  untaint_var(\$tmpf);
+    close $tmpfh;
+    $self->{tmpf} = untaint_var($self->{tmpf});
+  }

+  open(my $tmpfh, '>'.$self->{tmpf})
+    or die "error opening $self->{tmpf}: $!";
+  binmode $tmpfh;
  print $tmpfh "use bytes; m{" . $rule . "}" . $mods
-    or die "error writing to $tmpf: $!";
-  close $tmpfh  or die "error closing $tmpf: $!";
+    or die "error writing to $self->{tmpf}: $!";
+  close $tmpfh  or die "error closing $self->{tmpf}: $!";

-  my $perl = $self->get_perl();
+  $self->{perl} = $self->get_perl()  if !exists $self->{perl};
  local *IN;
-  open (IN, "$perl -c -Mre=debug $tmpf 2>&1 |")
-    or die "cannot run $perl: ".exit_status_str($?,$!);
+  open (IN, "$self->{perl} -c -Mre=debug $self->{tmpf} 2>&1 |")
+    or die "cannot run $self->{perl}: ".exit_status_str($?,$!);

  my($inbuf,$nread,$fullstr); $fullstr = '';
  while ( $nread=read(IN,$inbuf,16384) ) { $fullstr .= $inbuf }
  defined $nread  or die "error reading from pipe: $!";

-  unlink $tmpf  or die "cannot unlink $tmpf: $!";
  close IN      or die "error closing pipe: $!";
  defined $fullstr  or warn "empty result from a pipe";

@ -1110,7 +1122,7 @@ sub fixup_re {
        $output .= "\"$esc\"";
      }
    }
-    else {
+    elsif ($fixup_re_test) {
      print "PRE: $pre\nTOK: $tok\n"  or die "error writing: $!";
    }
  }
@ -1132,6 +1144,8 @@ sub fixup_re {
  $output =~ s/\*\*BACKSLASH\*\*/\\\\/gs;

  if ($fixup_re_test) { print "OUTPUT: $output\n"  or die "error writing: $!" }
+
+  utf8::encode($output)  if utf8::is_utf8($output); # force octets
  return $output;
 }

--- a/upstream/lib/Mail/SpamAssassin/Plugin/Check.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Check.pm
--- a/upstream/lib/Mail/SpamAssassin/Plugin/DCC.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/DCC.pm
--- a/upstream/lib/Mail/SpamAssassin/Plugin/DKIM.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/DKIM.pm
@ -30,6 +30,12 @@ Taking into account signatures from any signing domains:
 full   DKIM_VALID_AU         eval:check_dkim_valid_author_sig()
 full   DKIM_VALID_EF         eval:check_dkim_valid_envelopefrom()

+Taking into account ARC signatures (Authenticated Received Chain, RFC 8617)
+from any signing domains:
+
+ full   ARC_SIGNED            eval:check_arc_signed()
+ full   ARC_VALID             eval:check_arc_valid()
+
 Taking into account signatures from specified signing domains only:
 (quotes may be omitted on domain names consisting only of letters, digits,
 dots, and minus characters)
@ -127,6 +133,7 @@ package Mail::SpamAssassin::Plugin::DKIM;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Timeout;
+use Mail::SpamAssassin::Util qw(idn_to_ascii);

 use strict;
 use warnings;
@ -145,19 +152,23 @@ sub new {
  bless ($self, $class);

  # signatures
-  $self->register_eval_rule("check_dkim_signed");
-  $self->register_eval_rule("check_dkim_valid");
-  $self->register_eval_rule("check_dkim_valid_author_sig");
-  $self->register_eval_rule("check_dkim_testing");
-  $self->register_eval_rule("check_dkim_valid_envelopefrom");
+  $self->register_eval_rule("check_dkim_signed", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_arc_signed", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_dkim_valid", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_arc_valid", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_dkim_valid_author_sig", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_dkim_testing", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dkim_valid_envelopefrom", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);

  # author domain signing practices
-  $self->register_eval_rule("check_dkim_adsp");
-  $self->register_eval_rule("check_dkim_dependable");
+  $self->register_eval_rule("check_dkim_adsp", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dkim_dependable", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);

-  # whitelisting
-  $self->register_eval_rule("check_for_dkim_whitelist_from");
-  $self->register_eval_rule("check_for_def_dkim_whitelist_from");
+  # welcomelisting
+  $self->register_eval_rule("check_for_dkim_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_dkim_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);  #Stub - Remove in SA 4.1
+  $self->register_eval_rule("check_for_def_dkim_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_def_dkim_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);  #Stub - Remove in SA 4.1

  # old names (aliases) for compatibility
  $self->register_eval_rule("check_dkim_verified");  # = check_dkim_valid
@ -179,17 +190,19 @@ sub set_config {

 =over 4

-=item whitelist_from_dkim author@example.com [signing-domain]
+=item welcomelist_from_dkim author@example.com [signing-domain]

-Works similarly to whitelist_from, except that in addition to matching
+Previously whitelist_from_dkim which will work interchangeably until 4.1.
+
+Works similarly to welcomelist_from, except that in addition to matching
 an author address (From) to the pattern in the first parameter, the message
 must also carry a valid Domain Keys Identified Mail (DKIM) signature made by
 a signing domain (SDID, i.e. the d= tag) that is acceptable to us.

-Only one whitelist entry is allowed per line, as in C<whitelist_from_rcvd>.
-Multiple C<whitelist_from_dkim> lines are allowed. File-glob style characters
+Only one welcomelist entry is allowed per line, as in C<welcomelist_from_rcvd>.
+Multiple C<welcomelist_from_dkim> lines are allowed. File-glob style characters
 are allowed for the From address (the first parameter), just like with
-C<whitelist_from_rcvd>.
+C<welcomelist_from_rcvd>.

 The second parameter (the signing-domain) does not accept full file-glob style
 wildcards, although a simple '*.' (or just a '.') prefix to a domain name
@ -201,39 +214,43 @@ will be an Author Domain Signature (sometimes called first-party signature)
 which is a signature where the signing domain (SDID) of a signature matches
 the domain of the author's address (i.e. the address in a From header field).

-Since this whitelist requires a DKIM check to be made, network tests must
+Since this welcomelist requires a DKIM check to be made, network tests must
 be enabled.

-Examples of whitelisting based on an author domain signature (first-party):
+Examples of welcomelisting based on an author domain signature (first-party):

-  whitelist_from_dkim joe@example.com
-  whitelist_from_dkim *@corp.example.com
-  whitelist_from_dkim *@*.example.com
+  welcomelist_from_dkim joe@example.com
+  welcomelist_from_dkim *@corp.example.com
+  welcomelist_from_dkim *@*.example.com

-Examples of whitelisting based on third-party signatures:
+Examples of welcomelisting based on third-party signatures:

-  whitelist_from_dkim jane@example.net      example.org
-  whitelist_from_dkim rick@info.example.net example.net
-  whitelist_from_dkim *@info.example.net    example.net
-  whitelist_from_dkim *@*                   mail7.remailer.example.com
-  whitelist_from_dkim *@*                   *.remailer.example.com
+  welcomelist_from_dkim jane@example.net      example.org
+  welcomelist_from_dkim rick@info.example.net example.net
+  welcomelist_from_dkim *@info.example.net    example.net
+  welcomelist_from_dkim *@*                   mail7.remailer.example.com
+  welcomelist_from_dkim *@*                   *.remailer.example.com

-=item def_whitelist_from_dkim author@example.com [signing-domain]
+=item def_welcomelist_from_dkim author@example.com [signing-domain]

-Same as C<whitelist_from_dkim>, but used for the default whitelist entries
-in the SpamAssassin distribution.  The whitelist score is lower, because
+Previously def_whitelist_from_dkim which will work interchangeably until 4.1.
+
+Same as C<welcomelist_from_dkim>, but used for the default welcomelist entries
+in the SpamAssassin distribution.  The welcomelist score is lower, because
 these are often targets for abuse of public mailers which sign their mail.

-=item unwhitelist_from_dkim author@example.com [signing-domain]
+=item unwelcomelist_from_dkim author@example.com [signing-domain]
+
+Previously unwhitelist_from_dkim which will work interchangeably until 4.1.

 Removes an email address with its corresponding signing-domain field
-from def_whitelist_from_dkim and whitelist_from_dkim tables, if it exists.
-Parameters to unwhitelist_from_dkim must exactly match the parameters of
-a corresponding whitelist_from_dkim or def_whitelist_from_dkim config
+from def_welcomelist_from_dkim and welcomelist_from_dkim tables, if it exists.
+Parameters to unwelcomelist_from_dkim must exactly match the parameters of
+a corresponding welcomelist_from_dkim or def_welcomelist_from_dkim config
 option which created the entry, for it to be removed (a domain name is
 matched case-insensitively);  i.e. if a signing-domain parameter was
-specified in a whitelisting command, it must also be specified in the
-unwhitelisting command.
+specified in a welcomelisting command, it must also be specified in the
+unwelcomelisting command.

 Useful for removing undesired default entries from a distributed configuration
 by a local or site-specific configuration or by C<user_prefs>.
@ -375,7 +392,7 @@ Example:
 =item dkim_minimum_key_bits n             (default: 1024)

 The smallest size of a signing key (in bits) for a valid signature to be
-considered for whitelisting. Additionally, the eval function check_dkim_valid()
+considered for welcomelisting. Additionally, the eval function check_dkim_valid()
 will return false on short keys when called with explicitly listed domains,
 and the eval function check_dkim_valid_author_sig() will return false on short
 keys (regardless of its arguments). Setting the option to 0 disables a key
@ -389,11 +406,13 @@ prepend its own signature on a copy of some third party mail and re-send it,
 which makes it no more trustworthy than without such signature. This is also
 a reason for a rule DKIM_VALID to have a near-zero score, i.e. a rule hit
 is only informational.
+This option is evaluated on ARC signatures checks as well.

 =cut

  push (@cmds, {
-    setting => 'whitelist_from_dkim',
+    setting => 'welcomelist_from_dkim',
+    aliases => ['whitelist_from_dkim'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
      my ($self, $key, $value, $line) = @_;
@ -407,13 +426,14 @@ is only informational.
      my $address = $1;
      my $sdid = defined $2 ? $2 : '';  # empty implies author domain signature
      $address =~ s/(\@[^@]*)\z/lc($1)/e;  # lowercase the email address domain
-      $self->{parser}->add_to_addrlist_dkim('whitelist_from_dkim',
+      $self->{parser}->add_to_addrlist_dkim('welcomelist_from_dkim',
                                            $address, lc $sdid);
    }
  });

  push (@cmds, {
-    setting => 'def_whitelist_from_dkim',
+    setting => 'def_welcomelist_from_dkim',
+    aliases => ['def_whitelist_from_dkim'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
      my ($self, $key, $value, $line) = @_;
@ -427,13 +447,14 @@ is only informational.
      my $address = $1;
      my $sdid = defined $2 ? $2 : '';  # empty implies author domain signature
      $address =~ s/(\@[^@]*)\z/lc($1)/e;  # lowercase the email address domain
-      $self->{parser}->add_to_addrlist_dkim('def_whitelist_from_dkim',
+      $self->{parser}->add_to_addrlist_dkim('def_welcomelist_from_dkim',
                                            $address, lc $sdid);
    }
  });

  push (@cmds, {
-    setting => 'unwhitelist_from_dkim',
+    setting => 'unwelcomelist_from_dkim',
+    aliases => ['unwhitelist_from_dkim'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
      my ($self, $key, $value, $line) = @_;
@ -447,9 +468,9 @@ is only informational.
      my $address = $1;
      my $sdid = defined $2 ? $2 : '';  # empty implies author domain signature
      $address =~ s/(\@[^@]*)\z/lc($1)/e;  # lowercase the email address domain
-      $self->{parser}->remove_from_addrlist_dkim('whitelist_from_dkim',
+      $self->{parser}->remove_from_addrlist_dkim('welcomelist_from_dkim',
                                                 $address, lc $sdid);
-      $self->{parser}->remove_from_addrlist_dkim('def_whitelist_from_dkim',
+      $self->{parser}->remove_from_addrlist_dkim('def_welcomelist_from_dkim',
                                                 $address, lc $sdid);
    }
  });
@ -480,7 +501,7 @@ is only informational.
    }
  });

-  # minimal signing key size in bits that is acceptable for whitelisting
+  # minimal signing key size in bits that is acceptable for welcomelisting
  push (@cmds, {
    setting => 'dkim_minimum_key_bits',
    default => 1024,
@ -530,6 +551,18 @@ sub check_dkim_signed {
  return $result;
 }

+sub check_arc_signed {
+  my ($self, $pms, $full_ref, @acceptable_domains) = @_;
+  $self->_check_dkim_signature($pms)  if !$pms->{arc_checked_signature};
+  my $result = 0;
+  if (!$pms->{arc_signed}) {
+    # don't bother
+  } elsif (!@acceptable_domains) {
+    $result = 1;  # no additional constraints, any signing domain will do
+  }
+  return $result;
+}
+
 sub check_dkim_valid {
  my ($self, $pms, $full_ref, @acceptable_domains) = @_;
  $self->_check_dkim_signature($pms)  if !$pms->{dkim_checked_signature};
@ -545,6 +578,19 @@ sub check_dkim_valid {
  return $result;
 }

+sub check_arc_valid {
+  my ($self, $pms, $full_ref, @acceptable_domains) = @_;
+  $self->_check_dkim_signature($pms)  if !$pms->{arc_checked_signature};
+  my $result = 0;
+  if (!$pms->{arc_valid}) {
+    # don't bother
+  } elsif (!@acceptable_domains) {
+    $result = 1;  # no additional constraints, any signing domain will do,
+                  # also any signing key size will do
+  }
+  return $result;
+}
+
 sub check_dkim_valid_author_sig {
  my ($self, $pms, $full_ref, @acceptable_domains) = @_;
  $self->_check_dkim_signature($pms)  if !$pms->{dkim_checked_signature};
@ -560,9 +606,10 @@ sub check_dkim_valid_author_sig {
 sub check_dkim_valid_envelopefrom {
  my ($self, $pms, $full_ref) = @_;
  my $result = 0;
-  my $envfrom=$self->{'main'}->{'registryboundaries'}->uri_to_domain($pms->get("EnvelopeFrom"));
+  my ($envfrom) = ($pms->get('EnvelopeFrom:addr')||'') =~ /\@(\S+)/;
  # if no envelopeFrom, it cannot be valid
-  return $result if !$envfrom;
+  return $result if !defined $envfrom;
+  $envfrom = lc $envfrom;
  $self->_check_dkim_signature($pms)  if !$pms->{dkim_checked_signature};
  if (!$pms->{dkim_valid}) {
    # don't bother
@ -646,19 +693,21 @@ sub check_dkim_testing {
  return $result;
 }

-sub check_for_dkim_whitelist_from {
+sub check_for_dkim_welcomelist_from {
  my ($self, $pms) = @_;
-  $self->_check_dkim_whitelist($pms)  if !$pms->{whitelist_checked};
-  return $pms->{dkim_match_in_whitelist_from_dkim} || 
-         $pms->{dkim_match_in_whitelist_auth};
+  $self->_check_dkim_welcomelist($pms)  if !$pms->{welcomelist_checked};
+  return ($pms->{dkim_match_in_welcomelist_from_dkim} || 
+          $pms->{dkim_match_in_welcomelist_auth}) ? 1 : 0;
 }
+*check_for_dkim_whitelist_from = \&check_for_dkim_welcomelist_from; # removed in 4.1

-sub check_for_def_dkim_whitelist_from {
+sub check_for_def_dkim_welcomelist_from {
  my ($self, $pms) = @_;
-  $self->_check_dkim_whitelist($pms)  if !$pms->{whitelist_checked};
-  return $pms->{dkim_match_in_def_whitelist_from_dkim} || 
-         $pms->{dkim_match_in_def_whitelist_auth};
+  $self->_check_dkim_welcomelist($pms)  if !$pms->{welcomelist_checked};
+  return ($pms->{dkim_match_in_def_welcomelist_from_dkim} || 
+         $pms->{dkim_match_in_def_welcomelist_auth}) ? 1 : 0;
 }
+*check_for_def_dkim_whitelist_from = \&check_for_def_dkim_welcomelist_from; # removed in 4.1

 # ---------------------------------------------------------------------------

@ -667,8 +716,7 @@ sub _dkim_load_modules {

  if (!$self->{tried_loading}) {
    $self->{service_available} = 0;
-    my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
-                     $self->{main}->time_method("dkim_load_modules");
+    my $timemethod = $self->{main}->time_method("dkim_load_modules");
    my $eval_stat;
    eval {
      # Have to do this so that RPM doesn't find these as required perl modules.
@ -686,6 +734,9 @@ sub _dkim_load_modules {
      my $version = Mail::DKIM::Verifier->VERSION;
      if (version->parse($version) >= version->parse(0.31)) {
        dbg("dkim: using Mail::DKIM version $version");
+      } elsif (version->parse($version) < version->parse(0.50)) {
+        dbg("dkim: Mail::DKIM $version is older than 0.50 ".
+             "ARC support will not be available, suggested upgrade to 0.50 or later!");
      } else {
        info("dkim: Mail::DKIM $version is older than the required ".
             "minimal version 0.31, suggested upgrade to 0.37 or later!");
@ -698,6 +749,18 @@ sub _dkim_load_modules {
        eval { require Mail::DKIM::DkimPolicy }  # ignoring status
      }
    }
+    eval {
+      # Have to do this so that RPM doesn't find these as required perl modules.
+      { require Mail::DKIM::ARC::Verifier }
+      $self->{arc_available} = 1;
+    } or do {
+      $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
+      if (defined $eval_stat) {
+        dbg("dkim: cannot load Mail::DKIM::ARC module, DKIM::ARC checks disabled: %s",
+          $eval_stat);
+      }
+      $self->{arc_available} = 0;
+    };
  }
  return $self->{service_available};
 }
@ -720,8 +783,8 @@ sub _check_dkim_signed_by {
      next if $minimum_key_bits && $sig->{_spamassassin_key_size} &&
              $sig->{_spamassassin_key_size} < $minimum_key_bits;
    }
-    my $sdid = $sig->domain;
-    next if !defined $sdid;  # a signature with a missing required tag 'd' ?
+    my ($sdid) = (defined $sig->identity)? $sig->identity =~ /\@(\S+)/ : ($sig->domain);
+    next if !defined $sdid;  # a signature with a missing required tag 'd' or 'i' ?
    $sdid = lc $sdid;
    if ($must_be_author_domain_signature) {
      next if !$pms->{dkim_author_domains}->{$sdid};
@ -765,9 +828,10 @@ sub _check_dkim_signature {
  my ($self, $pms) = @_;

  my $conf = $pms->{conf};
-  my($verifier, @signatures, @valid_signatures);
+  my($verifier, $arc_verifier, @signatures, @arc_signatures, @valid_signatures, @arc_valid_signatures);

  $pms->{dkim_checked_signature} = 1; # has this sub already been invoked?
+  $pms->{arc_checked_signature} = 1;  # has this sub already been invoked?
  $pms->{dkim_signatures_ready} = 0;  # have we obtained & verified signatures?
  $pms->{dkim_signatures_dependable} = 0;
  # dkim_signatures_dependable =
@ -776,16 +840,18 @@ sub _check_dkim_signature {
  #     (no signatures, or message was not truncated) )
  $pms->{dkim_signatures} = \@signatures;
  $pms->{dkim_valid_signatures} = \@valid_signatures;
+  $pms->{arc_signatures} = \@arc_signatures;
+  $pms->{arc_valid_signatures} = \@arc_valid_signatures;
  $pms->{dkim_signed} = 0;
+  $pms->{arc_signed} = 0;
  $pms->{dkim_valid} = 0;
+  $pms->{arc_valid} = 0;
  $pms->{dkim_key_testing} = 0;
  # the following hashes are keyed by a signing domain (SDID):
  $pms->{dkim_author_sig_tempfailed} = {}; # DNS timeout verifying author sign.
  $pms->{dkim_has_valid_author_sig} = {};  # a valid author domain signature
  $pms->{dkim_has_any_author_sig} = {};  # valid or invalid author domain sign.

-  $self->_get_authors($pms)  if !$pms->{dkim_author_addresses};
-
  my $suppl_attrib = $pms->{msg}->{suppl_attrib};
  if (defined $suppl_attrib && exists $suppl_attrib->{dkim_signatures}) {
    # caller of SpamAssassin already supplied DKIM signature objects
@ -793,48 +859,88 @@ sub _check_dkim_signature {
    @signatures = @$provided_signatures  if ref $provided_signatures;
    $pms->{dkim_signatures_ready} = 1;
    $pms->{dkim_signatures_dependable} = 1;
-    dbg("dkim: signatures provided by the caller, %d signatures",
+    dbg("dkim: DKIM signatures provided by the caller, %d signatures",
        scalar(@signatures));
  }
+  if (defined $suppl_attrib && exists $suppl_attrib->{arc_signatures}) {
+    # caller of SpamAssassin already supplied ARC signature objects
+    my $provided_arc_signatures = $suppl_attrib->{arc_signatures};
+    @arc_signatures = @$provided_arc_signatures  if ref $provided_arc_signatures;
+    $pms->{arc_signatures_ready} = 1;
+    $pms->{arc_signatures_dependable} = 1;
+    dbg("dkim: ARC signatures provided by the caller, %d signatures",
+        scalar(@arc_signatures));
+  }

-  if ($pms->{dkim_signatures_ready}) {
+  if ($pms->{dkim_signatures_ready} or $pms->{arc_signatures_ready}) {
    # signatures already available and verified
+    _check_valid_signature($self, $pms, $verifier, 'DKIM', \@signatures) if $self->{service_available};
+    _check_valid_signature($self, $pms, $arc_verifier, 'ARC', \@arc_signatures) if $self->{arc_available};
  } elsif (!$pms->is_dns_available()) {
    dbg("dkim: signature verification disabled, DNS resolving not available");
  } elsif (!$self->_dkim_load_modules()) {
    # Mail::DKIM module not available
  } else {
    # signature objects not provided by the caller, must verify for ourselves
-    my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
-                     $self->{main}->time_method("check_dkim_signature");
-    use version 0.77;
-    if (version->parse(Mail::DKIM::Verifier->VERSION) >= version->parse(0.40)) {
+    my $timemethod = $self->{main}->time_method("check_dkim_signature");
+    if (Mail::DKIM::Verifier->VERSION >= 0.40) {
      my $edns = $conf->{dns_options}->{edns};
      if ($edns && $edns >= 1024) {
        # Let Mail::DKIM use our interface to Net::DNS::Resolver.
        # Only do so if EDNS0 provides a reasonably-sized UDP payload size,
        # as our interface does not provide a DNS fallback to TCP, unlike
        # the Net::DNS::Resolver::send which does provide it.
+        # See also Bug 7265 regarding a choice of a resolver.
+      # my $res = $self->{main}->{resolver}->get_resolver;
        my $res = $self->{main}->{resolver};
        dbg("dkim: providing our own resolver: %s", ref $res);
        Mail::DKIM::DNS::resolver($res);
      }
    }
-    $verifier = Mail::DKIM::Verifier->new;
-    if (!$verifier) {
-      dbg("dkim: cannot create Mail::DKIM::Verifier object");
-      return;
+    $verifier = Mail::DKIM::Verifier->new if $self->{service_available};
+    _check_signature($self, $pms, $verifier, 'DKIM', \@signatures) if $self->{service_available};
+    $arc_verifier = Mail::DKIM::ARC::Verifier->new if $self->{arc_available};
+    _check_signature($self, $pms, $arc_verifier, 'ARC', \@arc_signatures) if $self->{arc_available};
  }
+}
+
+sub _check_signature {
+  my($self, $pms, $verifier, $type, $signatures) = @_;
+
+  my $sig_type = lc $type;
+  $self->_get_authors($pms)  if !$pms->{"${sig_type}_author_addresses"};
+
+  my(@valid_signatures);
+  my $conf = $pms->{conf};
+  if (!$verifier) {
+    if ($type eq 'DKIM') {
+      dbg("dkim: cannot create Mail::DKIM::Verifier object");
+    } elsif ($type eq 'ARC') {
+      dbg("dkim: cannot create Mail::DKIM::ARC::Verifier object");
+    }
+    return;
+  } else {
+    if ($type eq 'DKIM') {
      $pms->{dkim_verifier} = $verifier;
-    #
+    } elsif ($type eq 'ARC') {
+      $pms->{arc_verifier} = $verifier;
+    }
+  }
  # feed content of a message into verifier, using \r\n endings,
  # required by Mail::DKIM API (see bug 5300)
  # note: bug 5179 comment 28: perl does silly things on non-Unix platforms
  # unless we use \015\012 instead of \r\n
  eval {
    my $str = $pms->{msg}->get_pristine();
-      $str =~ s/\r?\n/\015\012/sg;  # ensure \015\012 ending
+    if ($pms->{msg}->{line_ending} eq "\015\012") {
+      # message already CRLF, just feed it
      $verifier->PRINT($str);
+    } else {
+      # feeding large chunk to Mail::DKIM is _much_ faster than line-by-line
+      $str =~ s/\012/\015\012/gs; # LF -> CRLF
+      $verifier->PRINT($str);
+      undef $str;
+    }
    1;
  } or do {  # intercept die() exceptions and render safe
    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
@ -847,7 +953,7 @@ sub _check_dkim_signature {
                { secs => $timeout, deadline => $pms->{master_deadline} });

  my $err = $timer->run_and_catch(sub {
-      dbg("dkim: performing public key lookup and signature verification");
+    dbg("dkim: performing public $type key lookup and signature verification");
    $verifier->CLOSE();  # the action happens here

    # currently SpamAssassin's parsing is better than Mail::Address parsing,
@ -856,8 +962,17 @@ sub _check_dkim_signature {

    # versions before 0.29 only provided a public interface to fetch one
    # signature, newer versions allow access to all signatures of a message
-      @signatures = $verifier->UNIVERSAL::can("signatures") ?
+    @$signatures = $verifier->UNIVERSAL::can("signatures") ?
                               $verifier->signatures : $verifier->signature;
+    if (would_log("dbg","dkim")) {
+      foreach my $signature (@$signatures) {
+        dbg("dkim: $type signature i=%s d=%s",
+          map(!defined $_ ? '(undef)' : $_,
+            $signature->identity, $signature->domain
+          )
+        );
+      }
+    }
  });
  if ($timer->timed_out()) {
    dbg("dkim: public key lookup or verification timed out after %s s",
@ -867,23 +982,43 @@ sub _check_dkim_signature {

  } elsif ($err) {
    chomp $err;
-      dbg("dkim: public key lookup or verification failed: $err");
+    dbg("dkim: $type public key lookup or verification failed: $err");
  }
+  if ($type eq 'DKIM') {
    $pms->{dkim_signatures_ready} = 1;
-    if (!@signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
+    if (!@$signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
      $pms->{dkim_signatures_dependable} = 1;
    }
+    _check_valid_signature($self, $pms, $verifier, 'DKIM', \@$signatures) if $self->{service_available};
+  } elsif ($type eq 'ARC') {
+    $pms->{arc_signatures_ready} = 1;
+    if (!@$signatures || !$pms->{tests_already_hit}->{'__TRUNCATED'}) {
+      $pms->{arc_signatures_dependable} = 1;
    }
+    _check_valid_signature($self, $pms, $verifier, 'ARC', \@$signatures) if $self->{arc_available};
+  }
+}

-  if ($pms->{dkim_signatures_ready}) {
+sub _check_valid_signature {
+  my($self, $pms, $verifier, $type, $signatures) = @_;
+
+  my $sig_type = lc $type;
+  $self->_get_authors($pms)  if !$pms->{"${sig_type}_author_addresses"};
+
+  my(@valid_signatures);
+  my $conf = $pms->{conf};
+  # DKIM signatures check
+  if ($pms->{"${sig_type}_signatures_ready"}) {
    my $sig_result_supported;
+    # dkim_minimum_key_bits is evaluated for ARC signatures as well
    my $minimum_key_bits = $conf->{dkim_minimum_key_bits};
-    foreach my $signature (@signatures) {
+    foreach my $signature (@$signatures) {
      # old versions of Mail::DKIM would give undef for an invalid signature
      next if !defined $signature;
-      next if !$signature->selector; # empty selector
-
      $sig_result_supported = $signature->UNIVERSAL::can("result_detail");
+      # test for empty selector (must not treat a selector "0" as missing!)
+      next if !defined $signature->selector || $signature->selector eq "";
+
      my($info, $valid, $expired);
      $valid =
        ($sig_result_supported ? $signature : $verifier)->result eq 'pass';
@ -904,23 +1039,24 @@ sub _check_dkim_signature {
      push(@valid_signatures, $signature)  if $valid && !$expired;

      # check if we have a potential Author Domain Signature, valid or not
-      my $d = $signature->domain;
+      my ($d) = (defined $signature->identity)? $signature->identity =~ /\@(\S+)/ : ($signature->domain);
      if (!defined $d) {
        # can be undefined on a broken signature with missing required tags
      } else {
        $d = lc $d;
-        if ($pms->{dkim_author_domains}->{$d}) {  # SDID matches author domain
-          $pms->{dkim_has_any_author_sig}->{$d} = 1;
+        if ($pms->{"${sig_type}_author_domains"}->{$d}) {  # SDID matches author domain
+          $pms->{"${sig_type}_has_any_author_sig"}->{$d} = 1;
          if ($valid && !$expired &&
              $key_size && $key_size >= $minimum_key_bits) {
-            $pms->{dkim_has_valid_author_sig}->{$d} = 1;
+            $pms->{"${sig_type}_has_valid_author_sig"}->{$d} = 1;
          } elsif ( ($sig_result_supported ? $signature
                                           : $verifier)->result_detail
                   =~ /\b(?:timed out|SERVFAIL)\b/i) {
-            $pms->{dkim_author_sig_tempfailed}->{$d} = 1;
+            $pms->{"${sig_type}_author_sig_tempfailed"}->{$d} = 1;
          }
        }
      }
+      if ($type eq 'DKIM') {
        if (would_log("dbg","dkim")) {
          dbg("dkim: %s %s, i=%s, d=%s, s=%s, a=%s, c=%s, %s, %s, %s",
            $info,
@ -935,14 +1071,28 @@ sub _check_dkim_signature {
              : 'does not match author domain',
          );
        }
+      } elsif ($type eq 'ARC') {
+        if (would_log("dbg","dkim")) {
+          dbg("dkim: %s %s, i=%s, d=%s, s=%s, a=%s, c=%s, %s, %s, %s",
+            $info,
+            $type,
+            map(!defined $_ ? '(undef)' : $_,
+              $signature->identity, $d, $signature->selector,
+              $signature->algorithm, scalar($signature->canonicalization),
+              $key_size ? "key_bits=$key_size" : "unknown key size",
+              ($sig_result_supported ? $signature : $verifier)->result ),
+            defined $d && $pms->{arc_author_domains}->{$d}
+              ? 'matches author domain'
+              : 'does not match author domain',
+          );
        }
+      }
+    }
+
    if (@valid_signatures) {
+      if ($type eq 'DKIM') {
        $pms->{dkim_signed} = 1;
        $pms->{dkim_valid} = 1;
-      # let the result stand out more clearly in the log, use uppercase
-      my $sig = $valid_signatures[0];
-      my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
-      dbg("dkim: signature verification result: %s", uc($sig_res));

        # supply values for both tags
        my(%seen1, %seen2, %seen3, @identity_list, @domain_list, @selector_list);
@ -958,14 +1108,27 @@ sub _check_dkim_signature {
                    @domain_list == 1   ? $domain_list[0]   : \@domain_list);
        $pms->set_tag('DKIMSELECTOR',
                    @selector_list == 1 ? $selector_list[0] : \@selector_list);
-    } elsif (@signatures) {
+      } elsif ($type eq 'ARC') {
+        $pms->{arc_signed} = 1;
+        $pms->{arc_valid} = 1;
+      }
+      # let the result stand out more clearly in the log, use uppercase
+      my $sig = $valid_signatures[0];
+      my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
+      dbg("dkim: $type signature verification result: %s", uc($sig_res));
+
+    } elsif (@$signatures) {
+      if ($type eq 'DKIM') {
        $pms->{dkim_signed} = 1;
-      my $sig = $signatures[0];
-      my $sig_res =
-        ($sig_result_supported && $sig ? $sig : $verifier)->result_detail;
-      dbg("dkim: signature verification result: %s", uc($sig_res));
+      } elsif ($type eq 'ARC') {
+        $pms->{arc_signed} = 1;
+      }
+      my $sig = @$signatures[0];
+      my $sig_res = ($sig_result_supported ? $sig : $verifier)->result_detail;
+      dbg("dkim: $type signature verification result: %s", uc($sig_res));
+
    } else {
-      dbg("dkim: signature verification result: none");
+      dbg("dkim: $type signature verification result: none");
    }
  }
 }
@ -1064,8 +1227,7 @@ sub _check_dkim_adsp {
          dbg("dkim: adsp not retrieved, module Mail::DKIM not available");

        } else {  # do the ADSP DNS lookup
-          my $timemethod = $self->{main}->UNIVERSAL::can("time_method") &&
-                           $self->{main}->time_method("check_dkim_adsp");
+          my $timemethod = $self->{main}->time_method("check_dkim_adsp");

          my $practices;  # author domain signing practices object
          my $timeout = $pms->{conf}->{dkim_timeout};
@ -1074,12 +1236,13 @@ sub _check_dkim_adsp {
          my $err = $timer->run_and_catch(sub {
            eval {
              if (Mail::DKIM::AuthorDomainPolicy->UNIVERSAL::can("fetch")) {
+                my $author_domain_ace = idn_to_ascii($author_domain);
                dbg("dkim: adsp: performing lookup on _adsp._domainkey.%s",
-                    $author_domain);
+                    $author_domain_ace);
                # get our Net::DNS::Resolver object
                my $res = $self->{main}->{resolver}->get_resolver;
                $practices = Mail::DKIM::AuthorDomainPolicy->fetch(
-                               Protocol => "dns", Domain => $author_domain,
+                               Protocol => "dns", Domain => $author_domain_ace,
                               DnsResolver => $res);
              }
              1;
@ -1128,36 +1291,36 @@ sub _check_dkim_adsp {
  }
 }

-sub _check_dkim_whitelist {
+sub _check_dkim_welcomelist {
  my ($self, $pms) = @_;

-  $pms->{whitelist_checked} = 1;
+  $pms->{welcomelist_checked} = 1;

  $self->_get_authors($pms)  if !$pms->{dkim_author_addresses};

  my $authors_str = join(", ", @{$pms->{dkim_author_addresses}});
  if ($authors_str eq '') {
-    dbg("dkim: check_dkim_whitelist: could not find author address");
+    dbg("dkim: check_dkim_weclomelist: could not find author address");
    return;
  }

-  # collect whitelist entries matching the author from all lists
+  # collect welcomelist entries matching the author from all lists
  my @acceptable_sdid_tuples;
  $self->_wlcheck_acceptable_signature($pms, \@acceptable_sdid_tuples,
-                                       'def_whitelist_from_dkim');
+                                       'def_welcomelist_from_dkim');
  $self->_wlcheck_author_signature($pms, \@acceptable_sdid_tuples,
-                                       'def_whitelist_auth');
+                                       'def_welcomelist_auth');
  $self->_wlcheck_acceptable_signature($pms, \@acceptable_sdid_tuples,
-                                       'whitelist_from_dkim');
+                                       'welcomelist_from_dkim');
  $self->_wlcheck_author_signature($pms, \@acceptable_sdid_tuples,
-                                       'whitelist_auth');
+                                       'welcomelist_auth');
  if (!@acceptable_sdid_tuples) {
    dbg("dkim: no wl entries match author %s, no need to verify sigs",
        $authors_str);
    return;
  }

-  # if the message doesn't pass DKIM validation, it can't pass DKIM whitelist
+  # if the message doesn't pass DKIM validation, it can't pass DKIM welcomelist

  # trigger a DKIM check;
  # continue if one or more signatures are valid or we want the debug info
@ -1177,38 +1340,38 @@ sub _check_dkim_whitelist {
    }
  }
  if (@valid) {
-    dbg("dkim: author %s, WHITELISTED by %s",
+    dbg("dkim: author %s, WELCOMELISTED by %s",
        $authors_str, join(", ",@valid));
  } elsif (@fail) {
    dbg("dkim: author %s, found in %s BUT IGNORED",
        $authors_str, join(", ",@fail));
  } else {
-    dbg("dkim: author %s, not in any dkim whitelist", $authors_str);
+    dbg("dkim: author %s, not in any dkim welcomelist", $authors_str);
  }
 }

 # check for verifier-acceptable signatures; an empty (or undefined) signing
-# domain in a whitelist implies checking for an Author Domain Signature
+# domain in a welcomelist implies checking for an Author Domain Signature
 #
 sub _wlcheck_acceptable_signature {
  my ($self, $pms, $acceptable_sdid_tuples_ref, $wl) = @_;
  my $wl_ref = $pms->{conf}->{$wl};
  foreach my $author (@{$pms->{dkim_author_addresses}}) {
-    foreach my $white_addr (keys %$wl_ref) {
-      my $wl_addr_ref = $wl_ref->{$white_addr};
-      my $re = qr/$wl_addr_ref->{re}/i;
-    # dbg("dkim: WL %s %s, d: %s", $wl, $white_addr,
+    my $author_lc = lc($author);
+    foreach my $welcome_addr (keys %$wl_ref) {
+      my $wl_addr_ref = $wl_ref->{$welcome_addr};
+    # dbg("dkim: WL %s %s, d: %s", $wl, $welcome_addr,
    #     join(", ", map { $_ eq '' ? "''" : $_ } @{$wl_addr_ref->{domain}}));
-      if ($author =~ $re) {
+      if ($author_lc =~ /$wl_addr_ref->{re}/) {
        foreach my $sdid (@{$wl_addr_ref->{domain}}) {
-          push(@$acceptable_sdid_tuples_ref, [$author,$sdid,$wl,$re]);
+          push(@$acceptable_sdid_tuples_ref, [$author,$sdid,$wl,$welcome_addr]);
        }
      }
    }
  }
 }

-# use a traditional whitelist_from -style addrlist, the only acceptable DKIM
+# use a traditional welcomelist_from -style addrlist, the only acceptable DKIM
 # signature is an Author Domain Signature.  Note: don't pre-parse and store
 # domains; that's inefficient memory-wise and only saves one m//
 #
@ -1216,11 +1379,11 @@ sub _wlcheck_author_signature {
  my ($self, $pms, $acceptable_sdid_tuples_ref, $wl) = @_;
  my $wl_ref = $pms->{conf}->{$wl};
  foreach my $author (@{$pms->{dkim_author_addresses}}) {
-    foreach my $white_addr (keys %$wl_ref) {
-      my $re = qr/$wl_ref->{$white_addr}/i;
-    # dbg("dkim: WL %s %s", $wl, $white_addr);
-      if ($author =~ $re) {
-        push(@$acceptable_sdid_tuples_ref, [$author,undef,$wl,$re]);
+    my $author_lc = lc($author);
+    foreach my $welcome_addr (keys %$wl_ref) {
+    # dbg("dkim: WL %s %s", $wl, $welcome_addr);
+      if ($author_lc =~ /$wl_ref->{$welcome_addr}/) {
+        push(@$acceptable_sdid_tuples_ref, [$author,undef,$wl,$welcome_addr]);
      }
    }
  }
@ -1238,9 +1401,10 @@ sub _wlcheck_list {
  foreach my $signature (@{$pms->{dkim_signatures}}) {
    # old versions of Mail::DKIM would give undef for an invalid signature
    next if !defined $signature;
-    next if !$signature->selector; # empty selector
-
    my $sig_result_supported = $signature->UNIVERSAL::can("result_detail");
+    # test for empty selector (must not treat a selector "0" as missing!)
+    next if !defined $signature->selector || $signature->selector eq "";
+
    my($info, $valid, $expired, $key_size_weak);
    $valid =
      ($sig_result_supported ? $signature : $verifier)->result eq 'pass';
@ -1256,13 +1420,13 @@ sub _wlcheck_list {
      }
    }

-    my $sdid = $signature->domain;
+    my ($sdid) = (defined $signature->identity)? $signature->identity =~ /\@(\S+)/ : ($signature->domain);
    $sdid = lc $sdid  if defined $sdid;

    my %tried_authors;
    foreach my $entry (@$acceptable_sdid_tuples_ref) {
-      my($author, $acceptable_sdid, $wl, $re) = @$entry;
-      # $re and $wl are here for logging purposes only, $re already checked.
+      my($author, $acceptable_sdid, $wl, $welcome_addr) = @$entry;
+      # $welcome_addr and $wl are here for logging purposes only, already checked.
      # The $acceptable_sdid is a verifier-acceptable signing domain
      # identifier (to be matched against a 'd' tag in signatures).
      # When $acceptable_sdid is undef or an empty string it implies
@ -1274,7 +1438,7 @@ sub _wlcheck_list {

      my $matches = 0;
      if (!defined $sdid) {
-        # don't bother, invalid signature with a missing 'd' tag
+        # don't bother, invalid signature with a missing 'd' or 'i' tag

      } elsif (!defined $acceptable_sdid || $acceptable_sdid eq '') {
        # An "Author Domain Signature" (sometimes called a first-party
@ -1287,7 +1451,7 @@ sub _wlcheck_list {
        $matches = 1  if $sdid eq $author_domain;

      } else {  # checking for verifier-acceptable signature
-        # The second argument to a 'whitelist_from_dkim' option is now (since
+        # The second argument to a 'welcomelist_from_dkim' option is now (since
        # version 3.3.0) supposed to be a signing domain (SDID), no longer an
        # identity (AUID). Nevertheless, be prepared to accept the full e-mail
        # address there for compatibility, and just ignore its local-part.
@ -1303,17 +1467,17 @@ sub _wlcheck_list {
        if (would_log("dbg","dkim")) {
          if ($sdid eq $author_domain) {
            dbg("dkim: %s author domain signature by %s, MATCHES %s %s",
-                $info, $sdid, $wl, $re);
+                $info, $sdid, $wl, $welcome_addr);
          } else {
            dbg("dkim: %s third-party signature by %s, author domain %s, ".
-                "MATCHES %s %s", $info, $sdid, $author_domain, $wl, $re);
+                "MATCHES %s %s", $info, $sdid, $author_domain, $wl, $welcome_addr);
          }
        }
        # a defined value indicates at least a match, not necessarily valid
        # (this complication servers to preserve logging compatibility)
        $any_match_by_wl{$wl} = ''  if !exists $any_match_by_wl{$wl};
      }
-      # only valid signature can cause whitelisting
+      # only valid signature can cause welcomelisting
      $matches = 0  if !$valid || $expired || $key_size_weak;

      if ($matches) {
@ -1328,4 +1492,7 @@ sub _wlcheck_list {
  return ($any_match_at_all, \%any_match_by_wl);
 }

+# Version features
+sub has_arc { 1 }
+
 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/DMARC.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/DMARC.pm
@ -0,0 +1,360 @@
+# <@LICENSE>
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at:
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+#
+# Author: Giovanni Bechis <gbechis@apache.org>
+
+=head1 NAME
+
+Mail::SpamAssassin::Plugin::DMARC - check DMARC policy
+
+=head1 SYNOPSIS
+
+  loadplugin Mail::SpamAssassin::Plugin::DMARC
+
+  ifplugin Mail::SpamAssassin::Plugin::DMARC
+    header DMARC_PASS eval:check_dmarc_pass()
+    describe DMARC_PASS DMARC pass policy
+    tflags DMARC_PASS net nice
+    score DMARC_PASS -0.001
+
+    header DMARC_REJECT eval:check_dmarc_reject()
+    describe DMARC_REJECT DMARC reject policy
+    tflags DMARC_REJECT net
+    score DMARC_REJECT 0.001
+
+    header DMARC_QUAR eval:check_dmarc_quarantine()
+    describe DMARC_QUAR DMARC quarantine policy
+    tflags DMARC_QUAR net
+    score DMARC_QUAR 0.001
+
+    header DMARC_NONE eval:check_dmarc_none()
+    describe DMARC_NONE DMARC none policy
+    tflags DMARC_NONE net
+    score DMARC_NONE 0.001
+
+    header DMARC_MISSING eval:check_dmarc_missing()
+    describe DMARC_MISSING Missing DMARC policy
+    tflags DMARC_MISSING net
+    score DMARC_MISSING 0.001
+  endif
+
+=head1 DESCRIPTION
+
+This plugin checks if emails match DMARC policy, the plugin needs both DKIM
+and SPF plugins enabled.
+
+=cut
+
+package Mail::SpamAssassin::Plugin::DMARC;
+
+use strict;
+use warnings;
+use re 'taint';
+
+my $VERSION = 0.2;
+
+use Mail::SpamAssassin;
+use Mail::SpamAssassin::Plugin;
+
+our @ISA = qw(Mail::SpamAssassin::Plugin);
+
+sub dbg { my $msg = shift; Mail::SpamAssassin::Logger::dbg("DMARC: $msg", @_); }
+sub info { my $msg = shift; Mail::SpamAssassin::Logger::info("DMARC: $msg", @_); }
+
+sub new {
+  my ($class, $mailsa) = @_;
+
+  $class = ref($class) || $class;
+  my $self = $class->SUPER::new($mailsa);
+  bless ($self, $class);
+
+  $self->set_config($mailsa->{conf});
+  $self->register_eval_rule("check_dmarc_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dmarc_reject", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dmarc_quarantine", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dmarc_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_dmarc_missing", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+
+  return $self;
+}
+
+sub set_config {
+  my ($self, $conf) = @_;
+  my @cmds;
+
+=over 4
+
+=item dmarc_save_reports ( 0 | 1 ) (default: 0)
+
+Store DMARC reports using Mail::DMARC::Store, mail-dmarc.ini must be configured to save and send DMARC reports.
+
+=back
+
+=cut
+
+  push(@cmds, {
+    setting => 'dmarc_save_reports',
+    default => 0,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+
+  $conf->{parser}->register_commands(\@cmds);
+}
+
+sub parsed_metadata {
+  my ($self, $opts) = @_;
+  my $pms = $opts->{permsgstatus};
+
+  # Force waiting of SPF and DKIM results
+  $pms->{dmarc_async_queue} = [];
+}
+
+sub _check_eval {
+  my ($self, $pms, $result) = @_;
+
+  if (exists $pms->{dmarc_async_queue}) {
+    my $rulename = $pms->get_current_eval_rule_name();
+    push @{$pms->{dmarc_async_queue}}, sub {
+      if ($result->()) {
+        $pms->got_hit($rulename, '', ruletype => 'header');
+      } else {
+        $pms->rule_ready($rulename);
+      }
+    };
+    return; # return undef for async status
+  }
+
+  $self->_check_dmarc($pms);
+  # make sure not to return undef, as this is not async anymore
+  return $result->() || 0;
+}
+
+sub check_dmarc_pass {
+  my ($self, $pms, $name) = @_;
+
+  my $result = sub {
+    defined $pms->{dmarc_result} &&
+      $pms->{dmarc_result} eq 'pass' &&
+      $pms->{dmarc_policy} ne 'no policy available';
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_dmarc_reject {
+  my ($self, $pms, $name) = @_;
+
+  my $result = sub {
+    defined $pms->{dmarc_result} &&
+      $pms->{dmarc_result} eq 'fail' &&
+      $pms->{dmarc_policy} eq 'reject';
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_dmarc_quarantine {
+  my ($self, $pms, $name) = @_;
+
+  my $result = sub {
+    defined $pms->{dmarc_result} &&
+      $pms->{dmarc_result} eq 'fail' &&
+      $pms->{dmarc_policy} eq 'quarantine';
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_dmarc_none {
+  my ($self, $pms, $name) = @_;
+
+  my $result = sub {
+    defined $pms->{dmarc_result} &&
+      $pms->{dmarc_result} eq 'fail' &&
+      $pms->{dmarc_policy} eq 'none';
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_dmarc_missing {
+  my ($self, $pms, $name) = @_;
+
+  my $result = sub {
+    defined $pms->{dmarc_result} &&
+      $pms->{dmarc_policy} eq 'no policy available';
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_tick {
+  my ($self, $opts) = @_;
+
+  $self->_check_async_queue($opts->{permsgstatus});
+}
+
+sub check_cleanup {
+  my ($self, $opts) = @_;
+
+  # Finish it whether SPF and DKIM is ready or not
+  $self->_check_async_queue($opts->{permsgstatus}, 1);
+}
+
+sub _check_async_queue {
+  my ($self, $pms, $finish) = @_;
+
+  return unless exists $pms->{dmarc_async_queue};
+
+  # Check if SPF or DKIM is ready
+  if ($finish || ($pms->{spf_checked} && $pms->{dkim_checked_signature})) {
+    $self->_check_dmarc($pms);
+    $_->() foreach (@{$pms->{dmarc_async_queue}});
+    # No more async queueing needed.  If any evals are called later, they
+    # will act on the results directly.
+    delete $pms->{dmarc_async_queue};
+  }
+}
+
+sub _check_dmarc {
+  my ($self, $pms, $name) = @_;
+
+  return unless $pms->is_dns_available();
+
+  # Load DMARC module
+  if (!exists $self->{has_mail_dmarc}) {
+    my $eval_stat;
+    eval {
+      require Mail::DMARC::PurePerl;
+    } or do {
+      $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
+    };
+    if (!defined($eval_stat)) {
+      dbg("using Mail::DMARC::PurePerl for DMARC checks");
+      $self->{has_mail_dmarc} = 1;
+    } else {
+      dbg("cannot load Mail::DMARC::PurePerl: module: $eval_stat");
+      dbg("Mail::DMARC::PurePerl is required for DMARC checks, DMARC checks disabled");
+      $self->{has_mail_dmarc} = undef;
+    }
+  }
+
+  return if !$self->{has_mail_dmarc};
+  return if $pms->{dmarc_checked};
+  $pms->{dmarc_checked} = 1;
+
+  my $lasthop = $pms->{relays_external}->[0];
+  if (!defined $lasthop) {
+    dbg("no external relay found, skipping DMARC check");
+    return;
+  }
+
+  my $from_addr = ($pms->get('From:first:addr'))[0];
+  return if not defined $from_addr;
+  return if index($from_addr, '@') == -1;
+
+  my $mfrom_domain = ($pms->get('EnvelopeFrom:first:addr:host'))[0];
+  if (!defined $mfrom_domain) {
+    $mfrom_domain = ($pms->get('From:first:addr:domain'))[0];
+    return if !defined $mfrom_domain;
+    dbg("EnvelopeFrom header not found, using From");
+  }
+
+  my $spf_status = 'none';
+  if ($pms->{spf_pass})         { $spf_status = 'pass'; }
+  elsif ($pms->{spf_fail})      { $spf_status = 'fail'; }
+  elsif ($pms->{spf_permerror}) { $spf_status = 'fail'; }
+  elsif ($pms->{spf_none})      { $spf_status = 'fail'; }
+  elsif ($pms->{spf_neutral})   { $spf_status = 'neutral'; }
+  elsif ($pms->{spf_softfail})  { $spf_status = 'softfail'; }
+
+  my $spf_helo_status = 'none';
+  if ($pms->{spf_helo_pass})         { $spf_helo_status = 'pass'; }
+  elsif ($pms->{spf_helo_fail})      { $spf_helo_status = 'fail'; }
+  elsif ($pms->{spf_helo_permerror}) { $spf_helo_status = 'fail'; }
+  elsif ($pms->{spf_helo_none})      { $spf_helo_status = 'fail'; }
+  elsif ($pms->{spf_helo_neutral})   { $spf_helo_status = 'neutral'; }
+  elsif ($pms->{spf_helo_softfail})  { $spf_helo_status = 'softfail'; }
+
+  my $dmarc = Mail::DMARC::PurePerl->new();
+  $dmarc->source_ip($lasthop->{ip});
+  $dmarc->header_from_raw($from_addr);
+
+  my $suppl_attrib = $pms->{msg}->{suppl_attrib};
+  if (defined $suppl_attrib && exists $suppl_attrib->{dkim_signatures}) {
+    my $dkim_signatures = $suppl_attrib->{dkim_signatures};
+    foreach my $signature ( @$dkim_signatures ) {
+      $dmarc->dkim( domain => $signature->domain, result => $signature->result );
+      dbg("DKIM result for domain " . $signature->domain . ": " . $signature->result);
+    }
+  } else {
+    $dmarc->dkim($pms->{dkim_verifier}) if (ref($pms->{dkim_verifier}));
+  }
+
+  my $result;
+  eval {
+    $dmarc->spf([
+      {
+        scope  => 'mfrom',
+        domain => $mfrom_domain,
+        result => $spf_status,
+      },
+      {
+        scope  => 'helo',
+        domain => $lasthop->{lc_helo},
+        result => $spf_helo_status,
+      },
+    ]);
+    $result = $dmarc->validate();
+  };
+  if ($@) {
+    dbg("error while evaluating domain $mfrom_domain: $@");
+    return;
+  }
+
+  if (defined($pms->{dmarc_result} = $result->result)) {
+    if ($pms->{conf}->{dmarc_save_reports}) {
+      my $rua = eval { $result->published()->rua(); };
+      if (defined $rua && index($rua, 'mailto:') >= 0) {
+        eval { $dmarc->save_aggregate(); };
+        if ($@) {
+          info("report could not be saved: $@");
+        } else {
+          dbg("report will be sent to $rua");
+        }
+      }
+    }
+
+    if (defined $result->reason->[0]{comment} &&
+          $result->reason->[0]{comment} eq 'too many policies') {
+      dbg("result: no policy available (too many policies)");
+      $pms->{dmarc_policy} = 'no policy available';
+    } elsif ($result->result eq 'pass') {
+      dbg("result: pass");
+      $pms->{dmarc_policy} = $result->published->p;
+    } elsif ($result->result ne 'none') {
+      dbg("result: $result->{result}, disposition: $result->{disposition}, dkim: $result->{dkim}, spf: $result->{spf} (spf: $spf_status, spf_helo: $spf_helo_status)");
+      $pms->{dmarc_policy} = $result->disposition;
+    } else {
+      dbg("result: no policy available");
+      $pms->{dmarc_policy} = 'no policy available';
+    }
+  }
+}
+
+1;
+
--- a/upstream/lib/Mail/SpamAssassin/Plugin/DNSEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/DNSEval.pm
@ -45,7 +45,7 @@ package Mail::SpamAssassin::Plugin::DNSEval;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Constants qw(:ip);
-use Mail::SpamAssassin::Util qw(reverse_ip_address is_fqdn_valid);
+use Mail::SpamAssassin::Util qw(reverse_ip_address idn_to_ascii compile_regexp is_fqdn_valid);

 use strict;
 use warnings;
@ -55,7 +55,6 @@ use re 'taint';
 our @ISA = qw(Mail::SpamAssassin::Plugin);

 my $IP_ADDRESS = IP_ADDRESS;
-my $IP_PRIVATE = IP_PRIVATE;

 # constructor: register the eval rule
 sub new {
@ -75,7 +74,6 @@ sub new {
    'check_rbl_ns_from',
    'check_rbl_txt',
    'check_rbl_sub',
-    'check_rbl_results_for',
    'check_rbl_from_host',
    'check_rbl_from_domain',
    'check_rbl_envfrom',
@ -86,7 +84,7 @@ sub new {

  $self->set_config($mailsaobject->{conf});
  foreach(@{$self->{'evalrules'}}) {
-    $self->register_eval_rule($_);
+    $self->register_eval_rule($_, $Mail::SpamAssassin::Conf::TYPE_RBL_EVALS);
  }

  return $self;
@ -154,21 +152,104 @@ sub set_config {
 # directly as part of PMS
 sub check_start {
  my ($self, $opts) = @_;
+  my $pms = $opts->{permsgstatus};

  foreach(@{$self->{'evalrules'}}) {
-    $opts->{'permsgstatus'}->register_plugin_eval_glue($_);
+    $pms->register_plugin_eval_glue($_);
  }
+
+  # Initialize check_rbl_sub tests
+  $self->_init_rbl_subs($pms);
+}
+
+sub _init_rbl_subs {
+  my ($self, $pms) = @_;
+  my $conf = $pms->{conf};
+
+  # Very hacky stuff and direct rbl_evals usage for now, TODO rewrite everything
+  foreach my $rule (@{$conf->{eval_to_rule}->{check_rbl_sub}||[]}) {
+    next if !exists $conf->{rbl_evals}->{$rule};
+    next if !$conf->{scores}->{$rule};
+    # rbl_evals is [$function,[@args]]
+    my $args = $conf->{rbl_evals}->{$rule}->[1];
+    my ($set, $subtest) = @$args;
+    if (!defined $subtest) {
+      warn("dnseval: missing subtest for rule $rule\n");
+      next;
+    }
+    if ($subtest =~ /^sb:/) {
+      warn("dnseval: ignored $rule, SenderBase rules are deprecated\n");
+      next;
+    }
+    # Compile as regex if not pure ip/bitmask (same check in process_dnsbl_result)
+    if ($subtest !~ /^\d+(?:\.\d+\.\d+\.\d+)?$/) {
+      my ($rec, $err) = compile_regexp($subtest, 0);
+      if (!$rec) {
+        warn("dnseval: invalid rule $rule subtest regexp '$subtest': $err\n");
+        next;
+      }
+      $subtest = $rec;
+    }
+    dbg("dnseval: initialize check_rbl_sub for rule $rule, set $set, subtest $subtest");
+    push @{$pms->{rbl_subs}{$set}}, [$subtest, $rule];
+  }
+}
+
+sub parsed_metadata {
+  my ($self, $opts) = @_;
+
+  my $pms = $opts->{permsgstatus};
+
+  return 1 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 1 if !$pms->is_dns_available();
+
+  # Process relaylists only once, not everytime in check_rbl_backend
+  #
+  # ok, make a list of all the IPs in the untrusted set
+  my @fullips = map { $_->{ip} } @{$pms->{relays_untrusted}};
+  # now, make a list of all the IPs in the external set, for use in
+  # notfirsthop testing.  This will often be more IPs than found
+  # in @fullips.  It includes the IPs that are trusted, but
+  # not in internal_networks.
+  my @fullexternal = map {
+	(!$_->{internal}) ? ($_->{ip}) : ()
+      } @{$pms->{relays_trusted}};
+  push @fullexternal, @fullips; # add untrusted set too
+  # Make sure a header significantly improves results before adding here
+  # X-Sender-Ip: could be worth using (very low occurence for me)
+  # X-Sender: has a very low bang-for-buck for me
+  my @originating;
+  foreach my $header (@{$pms->{conf}->{originating_ip_headers}}) {
+    my $str = $pms->get($header, undef);
+    next unless defined $str && $str ne '';
+    push @originating, ($str =~ m/($IP_ADDRESS)/g);
+  }
+  # Let's go ahead and trim away all private ips (KLC)
+  # also uniq the list and strip dups. (jm)
+  my @ips = $self->ip_list_uniq_and_strip_private(@fullips);
+  # if there's no untrusted IPs, it means we trust all the open-internet
+  # relays, so we skip checks
+  if (scalar @ips + scalar @originating > 0) {
+    dbg("dnseval: IPs found: full-external: ".join(", ", @fullexternal).
+      " untrusted: ".join(", ", @ips).
+      " originating: ".join(", ", @originating));
+    @{$pms->{dnseval_fullexternal}} = @fullexternal;
+    @{$pms->{dnseval_ips}} = @ips;
+    @{$pms->{dnseval_originating}} = @originating;
+  }
+
+  return 1;
 }

 sub ip_list_uniq_and_strip_private {
  my ($self, @origips) = @_;
  my @ips;
  my %seen;
-  my $IP_PRIVATE = IP_PRIVATE;
  foreach my $ip (@origips) {
    next unless $ip;
-    next if (exists ($seen{$ip})); $seen{$ip} = 1;
-    next if ($ip =~ /$IP_PRIVATE/o);
+    next if exists $seen{$ip};
+    $seen{$ip} = 1;
+    next if $ip =~ IS_IP_PRIVATE;
    push(@ips, $ip);
  }
  return @ips;
@ -181,12 +262,17 @@ sub ip_list_uniq_and_strip_private {
 sub check_rbl_accreditor {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest, $accreditor) = @_;

+  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 0 if !$pms->is_dns_available();
+
  if (!defined $pms->{accreditor_tag}) {
    $self->message_accreditor_tag($pms);
  }
  if ($pms->{accreditor_tag}->{$accreditor}) {
-    $self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
+    # return undef for async status
+    return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
  }
+
  return 0;
 }

@ -226,58 +312,16 @@ sub message_accreditor_tag {
  $pms->{accreditor_tag} = \%acctags;
 }

-sub check_rbl_backend {
+sub _check_rbl_backend {
  my ($self, $pms, $rule, $set, $rbl_server, $type, $subtest) = @_;
-  local ($_);

-  # First check that DNS is available, if not do not perform this check
-  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
-  return 0 unless $pms->is_dns_available();
+  return if !exists $pms->{dnseval_ips}; # no untrusted ips

-  if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
-      (index($rbl_server, '.') >= 0) &&
-      ($rbl_server !~ /\.$/)) {
-    $rbl_server .= ".";
-  }
-
-  dbg("dns: checking RBL $rbl_server, set $set");
-
-  # ok, make a list of all the IPs in the untrusted set
-  my @fullips = map { $_->{ip} } @{$pms->{relays_untrusted}};
-
-  # now, make a list of all the IPs in the external set, for use in
-  # notfirsthop testing.  This will often be more IPs than found
-  # in @fullips.  It includes the IPs that are trusted, but
-  # not in internal_networks.
-  my @fullexternal = map {
-	(!$_->{internal}) ? ($_->{ip}) : ()
-      } @{$pms->{relays_trusted}};
-  push (@fullexternal, @fullips);	# add untrusted set too
-
-  # Make sure a header significantly improves results before adding here
-  # X-Sender-Ip: could be worth using (very low occurence for me)
-  # X-Sender: has a very low bang-for-buck for me
-  my $IP_ADDRESS = IP_ADDRESS;
-  my @originating;
-  for my $header (@{$pms->{conf}->{originating_ip_headers}}) {
-    my $str = $pms->get($header,undef);
-    next unless defined $str && $str ne '';
-    push (@originating, ($str =~ m/($IP_ADDRESS)/g));
-  }
-
-  # Let's go ahead and trim away all private ips (KLC)
-  # also uniq the list and strip dups. (jm)
-  my @ips = $self->ip_list_uniq_and_strip_private(@fullips);
-
-  # if there's no untrusted IPs, it means we trust all the open-internet
-  # relays, so we can return right now.
-  return 0 unless (scalar @ips + scalar @originating > 0);
-
-  dbg("dns: IPs found: full-external: ".join(", ", @fullexternal).
-	" untrusted: ".join(", ", @ips).
-	" originating: ".join(", ", @originating));
+  $rbl_server =~ s/\.+\z//; # strip unneeded trailing dot
+  dbg("dnseval: checking RBL $rbl_server, set $set, rule $rule");

  my $trusted = $self->{main}->{conf}->{trusted_networks};
+  my @ips = @{$pms->{dnseval_ips}};

  # If name is foo-notfirsthop, check all addresses except for
  # the originating one.  Suitable for use with dialup lists, like the PDL.
@ -293,9 +337,9 @@ sub check_rbl_backend {
    # specified some third-party relays as trusted.  Also, don't use
    # @originating; those headers are added by a phase of relaying through
    # a server like Hotmail, which is not going to be in dialup lists anyway.
-    @ips = $self->ip_list_uniq_and_strip_private(@fullexternal);
+    @ips = $self->ip_list_uniq_and_strip_private(@{$pms->{dnseval_fullexternal}});
    if ($1 eq "lastexternal") {
-      @ips = (defined $ips[0]) ? ($ips[0]) : ();
+      @ips = defined $ips[0] ? ($ips[0]) : ();
    } else {
 	pop @ips if (scalar @ips > 1);
    }
@ -307,14 +351,14 @@ sub check_rbl_backend {
  elsif ($set =~ /-(first|un)trusted$/)
  {
    my @tips;
-    foreach my $ip (@originating) {
+    foreach my $ip (@{$pms->{dnseval_originating}}) {
      if ($ip && !$trusted->contains_ip($ip)) {
        push(@tips, $ip);
      }
    }
-    @ips = $self->ip_list_uniq_and_strip_private (@ips, @tips);
+    @ips = $self->ip_list_uniq_and_strip_private(@ips, @tips);
    if ($1 eq "first") {
-      @ips = (defined $ips[0]) ? ($ips[0]) : ();
+      @ips = defined $ips[0] ? ($ips[0]) : ();
    } else {
      shift @ips;
    }
@ -322,7 +366,7 @@ sub check_rbl_backend {
  else
  {
    my @tips;
-    foreach my $ip (@originating) {
+    foreach my $ip (@{$pms->{dnseval_originating}}) {
      if ($ip && !$trusted->contains_ip($ip)) {
        push(@tips, $ip);
      }
@ -333,71 +377,77 @@ sub check_rbl_backend {
  }

  # How many IPs max you check in the received lines
-  my $checklast=$self->{main}->{conf}->{num_check_received};
+  my $checklast = $self->{main}->{conf}->{num_check_received};

  if (scalar @ips > $checklast) {
    splice (@ips, $checklast);	# remove all others
  }

-  my $tflags = $pms->{conf}->{tflags}->{$rule};
-
  # Trusted relays should only be checked against nice rules (dnswls)
-  if (defined $tflags && $tflags !~ /\bnice\b/) {
+  if (($pms->{conf}->{tflags}->{$rule}||'') !~ /\bnice\b/) {
    # remove trusted hosts from beginning
    while (@ips && $trusted->contains_ip($ips[0])) { shift @ips }
  }

  unless (scalar @ips > 0) {
-    dbg("dns: no untrusted IPs to check");
+    dbg("dnseval: no untrusted IPs to check");
    return 0;
  }

-  dbg("dns: only inspecting the following IPs: ".join(", ", @ips));
+  dbg("dnseval: only inspecting the following IPs: ".join(", ", @ips));

-  eval {
+  my $queries;
  foreach my $ip (@ips) {
-      my $revip = reverse_ip_address($ip);
-      $pms->do_rbl_lookup($rule, $set, $type,
-                          $revip.'.'.$rbl_server, $subtest) if defined $revip;
+    if (defined(my $revip = reverse_ip_address($ip))) {
+      my $ret = $pms->do_rbl_lookup($rule, $set, $type, $revip.'.'.$rbl_server, $subtest);
+      $queries++ if defined $ret;
+    }
  }
-  };

  # note that results are not handled here, hits are handled directly
  # as DNS responses are harvested
-  return 0;
+  return 0 if !$queries; # no query started
+  return; # return undef for async status
 }

 sub check_rbl {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
-  $self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
+
+  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 0 if !$pms->is_dns_available();
+
+  # return undef for async status
+  return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'A', $subtest);
 }

 sub check_rbl_txt {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
-  $self->check_rbl_backend($pms, $rule, $set, $rbl_server, 'TXT', $subtest);
-}
-
-# run for first message 
-sub check_rbl_sub {
-  my ($self, $pms, $rule, $set, $subtest) = @_;

  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
-  return 0 unless $pms->is_dns_available();
+  return 0 if !$pms->is_dns_available();

-  $pms->register_rbl_subtest($rule, $set, $subtest);
+  # return undef for async status
+  return $self->_check_rbl_backend($pms, $rule, $set, $rbl_server, 'TXT', $subtest);
 }

-# backward compatibility
-sub check_rbl_results_for {
-  #warn "dns: check_rbl_results_for() is deprecated, use check_rbl_sub()\n";
-  check_rbl_sub(@_);
+sub check_rbl_sub {
+  my ($self, $pms, $rule, $set, $subtest) = @_;
+  # just a dummy, _init_rbl_subs/do_rbl_lookup handles the subs
+
+  return; # return undef for async status
 }

 # this only checks the address host name and not the domain name because
 # using the domain name had much worse results for dsn.rfc-ignorant.org
 sub check_rbl_from_host {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_; 
-  _check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->all_from_addrs());
+
+  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 0 if !$pms->is_dns_available();
+
+  # return undef for async status
+  return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
+                                     $subtest, $pms->all_from_addrs());
 }

 sub check_rbl_headers {
@ -415,16 +465,18 @@ sub check_rbl_headers {
    @env_hdr = split(/,/, $conf->{rbl_headers});
  }

+  my $queries;
  foreach my $rbl_headers (@env_hdr) {
    my $addr = $pms->get($rbl_headers.':addr', undef);
    if ( defined $addr && $addr =~ /\@([^\@\s]+)/ ) {
-      $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
+      my $ret = $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
                                            $subtest, $addr);
+      $queries++ if defined $ret;
    } else {
      my $unsplitted_host = $pms->get($rbl_headers);
      chomp($unsplitted_host);
      foreach my $host (split(/\n/, $unsplitted_host)) {
-        if($host =~ /^$IP_ADDRESS$/ ) {
+        if ($host =~ IS_IP_ADDRESS) {
          next if ($conf->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
          $host = reverse_ip_address($host);
        } else {
@ -432,17 +484,23 @@ sub check_rbl_headers {
          next unless is_fqdn_valid($host);
          next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
        }
-        $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+        my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+        $queries++ if defined $ret;
      }
    }
  }
+
+  return 0 if !$queries; # no query started
+  return; # return undef for async status
 }

 =over 4

 =item check_rbl_from_domain

-This checks all the from addrs domain names as an alternate to check_rbl_from_host.  As of v3.4.1, it has been improved to include a subtest for a specific octet.
+This checks all the from addrs domain names as an alternate to
+check_rbl_from_host.  As of v3.4.1, it has been improved to include a
+subtest for a specific octet.

 =back

@ -450,9 +508,14 @@ This checks all the from addrs domain names as an alternate to check_rbl_from_ho

 sub check_rbl_from_domain {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_;
-  _check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->all_from_addrs_domains());
-}

+  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 0 if !$pms->is_dns_available();
+
+  # return undef for async status
+  return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
+                                     $subtest, $pms->all_from_addrs_domains());
+}
 =over 4

 =item check_rbl_ns_from
@ -472,6 +535,7 @@ sub check_rbl_ns_from {
  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
  return 0 unless $pms->is_dns_available();

+  dbg("dnseval: EnvelopeFrom header not found") unless defined (($pms->get("EnvelopeFrom:addr"))[0]);
  for my $from ($pms->get('EnvelopeFrom:addr')) {
    next unless defined $from;
    $from =~ tr/././s;          # bug 3366
@ -482,20 +546,20 @@ sub check_rbl_ns_from {
  }
  return 0 unless defined $domain;

-  dbg("dns: checking NS for host $domain");
+  dbg("dnseval: checking NS for host $domain");

-  my $key = "NS:" . $domain;
  my $obj = { dom => $domain, rule => $rule, set => $set, rbl_server => $rbl_server, subtest => $subtest };
  my $ent = {
-    rulename => $rule, key => $key, zone => $domain, obj => $obj, type => "URI-NS",
+    rulename => $rule, zone => $domain, obj => $obj, type => "URI-NS",
  };
  # dig $dom ns
-  $ent = $pms->{async}->bgsend_and_start_lookup(
+  my $ret = $pms->{async}->bgsend_and_start_lookup(
    $domain, 'NS', undef, $ent,
    sub { my ($ent2,$pkt) = @_;
          $self->complete_ns_lookup($pms, $ent2, $pkt, $domain) },
    master_deadline => $pms->{master_deadline} );
-  return $ent;
+  return 0 if !defined $ret; # no query started
+  return; # return undef for async status
 }

 sub complete_ns_lookup {
@ -508,11 +572,11 @@ sub complete_ns_lookup {

  if (!$pkt) {
    # $pkt will be undef if the DNS query was aborted (e.g. timed out)
-    dbg("DNSEval: complete_ns_lookup aborted %s", $ent->{key});
+    dbg("dnseval: complete_ns_lookup aborted %s", $ent->{key});
    return;
  }

-  dbg("DNSEval: complete_ns_lookup %s", $ent->{key});
+  dbg("dnseval: complete_ns_lookup %s", $ent->{key});
  my @ns = $pkt->authority;

  foreach my $rr (@ns) {
@ -521,9 +585,9 @@ sub complete_ns_lookup {
    chomp($nshost);
    if (is_fqdn_valid($nshost)) {
      if ( defined $subtest ) {
-        dbg("dns: checking [$nshost] / $rule / $set / $rbl_server / $subtest");
+        dbg("dnseval: checking [$nshost] / $rule / $set / $rbl_server / $subtest");
      } else {
-        dbg("dns: checking [$nshost] / $rule / $set / $rbl_server");
+        dbg("dnseval: checking [$nshost] / $rule / $set / $rbl_server");
      }
      $pms->do_rbl_lookup($rule, $set, 'A',
        "$nshost.$rbl_server", $subtest);
@ -559,10 +623,11 @@ sub check_rbl_rcvd {
    }
  }

+  my $queries;
  foreach my $host ( @udnsrcvd ) {
    if((defined $host) and ($host ne "")) {
      chomp($host);
-      if($host =~ /^$IP_ADDRESS$/ ) {
+      if ($host =~ IS_IP_ADDRESS) {
        next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
        $host = reverse_ip_address($host);
      } else {
@ -572,49 +637,55 @@ sub check_rbl_rcvd {
        next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
      }
      if ( defined $subtest ) {
-        dbg("dns: checking [$host] / $rule / $set / $rbl_server / $subtest");
+        dbg("dnseval: checking [$host] / $rule / $set / $rbl_server / $subtest");
      } else {
-        dbg("dns: checking [$host] / $rule / $set / $rbl_server");
+        dbg("dnseval: checking [$host] / $rule / $set / $rbl_server");
      }
-      $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+      my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+      $queries++ if defined $ret;
    }
  }
-  return 0;
+
+  return 0 if !$queries; # no query started
+  return; # return undef for async status
 }

 # this only checks the address host name and not the domain name because
 # using the domain name had much worse results for dsn.rfc-ignorant.org
 sub check_rbl_envfrom {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest) = @_; 
-  _check_rbl_addresses($self, $pms, $rule, $set, $rbl_server, $subtest, $pms->get('EnvelopeFrom:addr',undef));
+
+  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
+  return 0 if !$pms->is_dns_available();
+
+  # return undef for async status
+  return $self->_check_rbl_addresses($pms, $rule, $set, $rbl_server,
+                                 $subtest, $pms->get('EnvelopeFrom:addr',undef));
 }

 sub _check_rbl_addresses {
  my ($self, $pms, $rule, $set, $rbl_server, $subtest, @addresses) = @_;
  
-  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
-  return 0 unless $pms->is_dns_available();
+  $rbl_server =~ s/\.+\z//; # strip unneeded trailing dot

  my %hosts;
  for (@addresses) {
-    next if !defined($_) || !/ \@ ( [^\@\s]+ )/x;
+    next if !defined($_) || !/\@([^\@\s]+)/;
    my $address = $1;
    # strip leading & trailing dots (as seen in some e-mail addresses)
-    $address =~ s/^\.+//; $address =~ s/\.+\z//;
+    $address =~ s/^\.+//;
+    $address =~ s/\.+\z//;
    # squash duplicate dots to avoid an invalid DNS query with a null label
-    $address =~ tr/.//s;
-    $hosts{lc($address)} = 1  if $address =~ /\./;  # must by a FQDN
+    # Also checks it's FQDN
+    if ($address =~ tr/.//s) {
+      $hosts{lc($address)} = 1;
+    }
  }
  return unless scalar keys %hosts;

-  if (($rbl_server !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) &&
-      (index($rbl_server, '.') >= 0) &&
-      ($rbl_server !~ /\.$/)) {
-    $rbl_server .= ".";
-  }
-
+  my $queries;
  for my $host (keys %hosts) {
-    if ($host =~ /^$IP_ADDRESS$/) {
+    if ($host =~ IS_IP_ADDRESS) {
      next if ($pms->{conf}->{tflags}->{$rule}||'') =~ /\bdomains_only\b/;
      $host = reverse_ip_address($host);
    } else {
@ -622,23 +693,26 @@ sub _check_rbl_addresses {
      next unless is_fqdn_valid($host);
      next unless $pms->{main}->{registryboundaries}->is_domain_valid($host);
    }
-    dbg("dns: checking [$host] / $rule / $set / $rbl_server");
-    $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+    dbg("dnseval: checking [$host] / $rule / $set / $rbl_server");
+    my $ret = $pms->do_rbl_lookup($rule, $set, 'A', "$host.$rbl_server", $subtest);
+    $queries++ if defined $ret;
  }
+
+  return 0 if !$queries; # no async
+  return; # return undef for async status
 }

 sub check_dns_sender {
  my ($self, $pms, $rule) = @_;

  return 0 if $self->{main}->{conf}->{skip_rbl_checks};
-  return 0 unless $pms->is_dns_available();
+  return 0 if !$pms->is_dns_available();

  my $host;
-  for my $from ($pms->get('EnvelopeFrom:addr',undef)) {
+  foreach my $from ($pms->get('EnvelopeFrom:addr', undef)) {
    next unless defined $from;
-
-    $from =~ tr/././s;		# bug 3366
-    if ($from =~ m/ \@ ( [^\@\s]+ \. [^\@\s]+ )/x ) {
+    $from =~ tr/.//s; # bug 3366
+    if ($from =~ m/\@([^\@\s]+\.[^\@\s]+)/) {
      $host = lc($1);
      last;
    }
@ -650,15 +724,45 @@ sub check_dns_sender {
    return 0;
  }

-  dbg("dns: checking A and MX for host $host");
+  $host = idn_to_ascii($host);
+  dbg("dnseval: checking A and MX for host $host");

-  $pms->do_dns_lookup($rule, 'A', $host);
-  $pms->do_dns_lookup($rule, 'MX', $host);
+  my $queries;
+  my $ret = $self->do_sender_lookup($pms, $rule, 'A', $host);
+  $queries++ if defined $ret;
+  $ret = $self->do_sender_lookup($pms, $rule, 'MX', $host);
+  $queries++ if defined $ret;

-  # cache name of host for later checking
-  $pms->{sender_host} = $host;
+  return 0 if !$queries; # no query started
+  return; # return undef for async status
+}

-  return 0;
+sub do_sender_lookup {
+  my ($self, $pms, $rule, $type, $host) = @_;
+
+  my $ent = {
+    rulename => $rule,
+    type => "DNSBL-Sender",
+  };
+  return $pms->{async}->bgsend_and_start_lookup(
+    $host, $type, undef, $ent, sub {
+      my ($ent, $pkt) = @_;
+      return if !$pkt; # aborted / timed out
+      $pms->rule_ready($ent->{rulename}); # mark as run, could still hit
+      foreach my $answer ($pkt->answer) {
+        next if !$answer;
+        next if $answer->type ne 'A' && $answer->type ne 'MX';
+        if ($pkt->header->rcode eq 'NXDOMAIN' ||
+            $pkt->header->rcode eq 'SERVFAIL')
+        {
+          if (++$pms->{sender_host_fail} == 2) {
+            $pms->got_hit($ent->{rulename}, "DNS: ", ruletype => "dns");
+          }
+        }
+      }
+    },
+    master_deadline => $self->{master_deadline},
+  );
 }

 # capability checks for "if can(Mail::SpamAssassin::Plugin::DNSEval::XXX)":
--- a/upstream/lib/Mail/SpamAssassin/Plugin/DecodeShortURLs.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/DecodeShortURLs.pm
@ -0,0 +1,944 @@
+# <@LICENSE>
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at:
+# 
+#     http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+
+=head1 NAME
+
+DecodeShortURLs - Check for shortened URLs
+
+=head1 SYNOPSIS
+
+  loadplugin    Mail::SpamAssassin::Plugin::DecodeShortURLs
+
+  url_shortener tinyurl.com
+  url_shortener_get bit.ly
+
+  body HAS_SHORT_URL          eval:short_url()
+  describe HAS_SHORT_URL      Message has one or more shortened URLs
+
+  body SHORT_URL_REDIR        eval:short_url_redir()
+  describe SHORT_URL_REDIR    Message has shortened URL that resulted in a valid redirection
+
+  body SHORT_URL_CHAINED      eval:short_url_chained()
+  describe SHORT_URL_CHAINED  Message has shortened URL chained to other shorteners
+
+  body SHORT_URL_MAXCHAIN     eval:short_url_maxchain()
+  describe SHORT_URL_MAXCHAIN Message has shortened URL that causes too many redirections
+
+  body SHORT_URL_LOOP         eval:short_url_loop()
+  describe SHORT_URL_LOOP     Message has short URL that loops back to itself
+
+  body SHORT_URL_200          eval:short_url_code('200') # Can check any non-redirect HTTP code
+  describe SHORT_URL_200      Message has shortened URL returning HTTP 200
+
+  body SHORT_URL_404          eval:short_url_code('404') # Can check any non-redirect HTTP code
+  describe SHORT_URL_404      Message has shortened URL returning HTTP 404
+
+  uri URI_TINYURL_BLOCKED      m,https://tinyurl\.com/app/nospam,
+  describe URI_TINYURL_BLOCKED Message contains a tinyurl that has been disabled due to abuse
+
+  uri URI_BITLY_BLOCKED       m,^https://bitly\.com/a/blocked,
+  describe URI_BITLY_BLOCKED  Message contains a bit.ly URL that has been disabled due to abuse
+
+=head1 DESCRIPTION
+
+This plugin looks for URLs shortened by a list of URL shortening services. 
+Upon finding a matching URL, plugin will send a HTTP request to the
+shortening service and retrieve the Location-header which points to the
+actual shortened URL.  It then adds this URL to the list of URIs extracted
+by SpamAssassin which can then be accessed by uri rules and plugins such as
+URIDNSBL.
+
+This plugin will follow chained redirections, where a short URL redirects to
+another short URL.  Redirection depth limit can be set with
+C<max_short_url_redirections>.
+
+Maximum of C<max_short_urls> short URLs are checked in a message (10 by
+default).  Setting it to 0 disables HTTP requests, allowing only short_url()
+test to work and report found shorteners.
+
+All supported rule types for checking short URLs and redirection status are
+documented in L<SYNOPSIS> section.
+
+=head1 NOTES
+
+This plugin runs at the check_dnsbl hook (priority -100) so that it may
+modify the parsed URI list prior to normal uri rules or the URIDNSBL plugin.
+
+=cut
+
+package Mail::SpamAssassin::Plugin::DecodeShortURLs;
+
+use Mail::SpamAssassin::Plugin;
+use strict;
+use warnings;
+
+use vars qw(@ISA);
+@ISA = qw(Mail::SpamAssassin::Plugin);
+
+my $VERSION = 4.00;
+
+use constant HAS_LWP_USERAGENT => eval { require LWP::UserAgent; };
+
+sub dbg { my $msg = shift; return Mail::SpamAssassin::Logger::dbg("DecodeShortURLs: $msg", @_); }
+sub info { my $msg = shift; return Mail::SpamAssassin::Logger::info("DecodeShortURLs: $msg", @_); }
+
+sub new {
+  my $class = shift;
+  my $mailsaobject = shift;
+
+  $class = ref($class) || $class;
+  my $self = $class->SUPER::new($mailsaobject);
+  bless ($self, $class);
+
+  if ($mailsaobject->{local_tests_only}) {
+    dbg("local tests only, disabling HTTP requests");
+    $self->{net_disabled} = 1;
+  }
+  elsif (!HAS_LWP_USERAGENT) {
+    dbg("module LWP::UserAgent not installed, disabling HTTP requests");
+    $self->{net_disabled} = 1;
+  }
+
+  $self->set_config($mailsaobject->{conf});
+  $self->register_method_priority ('check_dnsbl', -10);
+  $self->register_eval_rule('short_url', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_redir', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_200', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_404', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_code', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_chained', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_maxchain', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_loop', $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule('short_url_tests'); # for legacy plugin compatibility warning
+
+  return $self;
+}
+
+=head1 PRIVILEGED SETTINGS
+
+=over 4
+
+=item url_shortener  domain [domain...]     (default: none)
+
+Domains that should be considered as an URL shortener.  If the domain begins
+with a '.', 3rd level tld of the main domain will be checked.
+
+Example:
+
+ url_shortener tinyurl.com
+ url_shortener .page.link
+
+=back
+
+=over 4
+
+=item url_shortener_get  domain [domain...]     (default: none)
+
+Alias to C<url_shortener>.  HTTP request will be done with GET method,
+instead of default HEAD.  Required for some services like bit.ly to return
+blocked URL correctly.
+
+Example:
+
+ url_shortener_get bit.ly
+
+=back
+
+=cut
+
+sub set_config {
+  my($self, $conf) = @_;
+  my @cmds = ();
+
+  push (@cmds, {
+    setting => 'url_shortener',
+    default => {},
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if ($value eq '') {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      foreach my $domain (split(/\s+/, $value)) {
+        $self->{url_shortener}->{lc $domain} = 1; # 1 == head
+      }
+    }
+  });
+
+  push (@cmds, {
+    setting => 'url_shortener_get',
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if ($value eq '') {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      foreach my $domain (split(/\s+/, $value)) {
+        $self->{url_shortener}->{lc $domain} = 2; # 2 == get
+      }
+    }
+  });
+
+=over 4
+
+=item clear_url_shortener  [domain] [domain...]
+
+Clear configured url_shortener and url_shortener_get domains, for example to
+override default settings from an update channel.  If domains are specified,
+then only those are removed from list.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'clear_url_shortener',
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if ($value eq '') {
+        $self->{url_shortener} = {};
+      } else {
+        foreach my $domain (split(/\s+/, $value)) {
+          delete $self->{url_shortener}->{lc $domain};
+        }
+      }
+    }
+  });
+
+=over 4
+
+=item url_shortener_cache_type     (default: none)
+
+The cache type that is being utilized.  Currently only supported value is
+C<dbi> that implies C<url_shortener_cache_dsn> is a DBI connect string.
+DBI module is required.
+
+Example:
+url_shortener_cache_type dbi
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_type',
+    default => '',
+    is_priv => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
+  });
+
+=over 4
+
+=item url_shortener_cache_dsn		(default: none)
+
+The DBI dsn of the database to use.
+
+For SQLite, the database will be created automatically if it does not
+already exist, the supplied path and file must be read/writable by the
+user running spamassassin or spamd.
+
+For MySQL/MariaDB or PostgreSQL, see sql-directory for database table
+creation clauses.
+
+You will need to have the proper DBI module for your database.  For example
+DBD::SQLite, DBD::mysql, DBD::MariaDB or DBD::Pg.
+
+Minimum required SQLite version is 3.24.0 (available from DBD::SQLite 1.59_01).
+
+Examples:
+
+ url_shortener_cache_dsn dbi:SQLite:dbname=/var/lib/spamassassin/DecodeShortURLs.db
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_dsn',
+    default => '',
+    is_priv => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
+  });
+
+=over 4
+
+=item url_shortener_cache_username  (default: none)
+
+The username that should be used to connect to the database.  Not used for
+SQLite.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_username',
+    default => '',
+    is_priv => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
+  });
+
+=over 4
+
+=item url_shortener_cache_password  (default: none)
+
+The password that should be used to connect to the database.  Not used for
+SQLite.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_password',
+    default => '',
+    is_priv => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
+  });
+
+=over 4
+
+=item url_shortener_cache_ttl		(default: 86400)
+
+The length of time a cache entry will be valid for in seconds.
+Default is 86400 (1 day).
+
+See C<url_shortener_cache_autoclean> for database cleaning.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_ttl',
+    is_admin => 1,
+    default => 86400,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=over 4
+
+=item url_shortener_cache_autoclean	(default: 1000)
+
+Automatically purge old entries from database.  Value describes a random run
+chance of 1/x.  The default value of 1000 means that cleaning is run
+approximately once for every 1000 messages processed.  Value of 1 would mean
+database is cleaned every time a message is processed.
+
+Set 0 to disable automatic cleaning and to do it manually.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_cache_autoclean',
+    is_admin => 1,
+    default => 1000,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=over 4
+
+=item url_shortener_loginfo           (default: 0 (off))
+
+If this option is enabled (set to 1), then short URLs and the decoded URLs will be logged with info priority.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_loginfo',
+    is_admin => 1,
+    default => 0,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
+  });
+
+=over 4
+
+=item url_shortener_timeout     (default: 5)
+
+Maximum time a short URL HTTP request can take, in seconds.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_timeout',
+    is_admin => 1,
+    default => 5,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=over 4
+
+=item max_short_urls                 (default: 10)
+
+Maximum amount of short URLs that will be looked up per message.  Chained
+redirections are not counted, only initial short URLs found.
+
+Setting it to 0 disables HTTP requests, allowing only short_url() test to
+work and report any found shortener URLs.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'max_short_urls',
+    is_admin => 1,
+    default => 10,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=over 4
+
+=item max_short_url_redirections     (default: 10)
+
+Maximum depth of chained redirections that a short URL can generate.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'max_short_url_redirections',
+    is_admin => 1,
+    default => 10,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=over 4
+
+=item url_shortener_user_agent       (default: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36)
+
+Set User-Agent header for HTTP requests.  Some services require it to look
+like a common browser.
+
+=back
+
+=cut
+
+  push (@cmds, {
+    setting => 'url_shortener_user_agent',
+    is_admin => 1,
+    default => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
+  });
+
+  $conf->{parser}->register_commands(\@cmds);
+}
+
+=head1 ACKNOWLEDGEMENTS
+
+Original DecodeShortURLs plugin was developed by Steve Freegard.
+
+=cut
+
+sub short_url_tests {
+  # Legacy compatibility warning done in finish_parsing_start
+  return 0;
+}
+
+sub finish_parsing_start {
+  my ($self, $opts) = @_;
+
+  if ($opts->{conf}->{eval_to_rule}->{short_url_tests}) {
+    warn "DecodeShortURLs: Legacy configuration format detected. ".
+         "Eval function short_url_tests() is no longer supported, ".
+         "please see documentation for the new rule format.\n";
+  }
+}
+
+sub initialise_url_shortener_cache {
+  my ($self, $conf) = @_;
+
+  return if $self->{dbh};
+  return if !$conf->{url_shortener_cache_type};
+
+  if (!$conf->{url_shortener_cache_dsn}) {
+    warn "DecodeShortURLs: invalid cache configuration\n";
+    return;
+  }
+
+  ##
+  ## SQLite
+  ## 
+  if ($conf->{url_shortener_cache_type} =~ /^(?:dbi|sqlite)$/i
+      && $conf->{url_shortener_cache_dsn} =~ /^dbi:SQLite/)
+  {
+    eval {
+      local $SIG{'__DIE__'};
+      require DBI;
+      require DBD::SQLite;
+      DBD::SQLite->VERSION(1.59_01); # Required for ON CONFLICT
+      $self->{dbh} = DBI->connect_cached(
+        $conf->{url_shortener_cache_dsn}, '', '',
+        {RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
+      );
+      $self->{dbh}->do("
+        CREATE TABLE IF NOT EXISTS short_url_cache (
+          short_url   TEXT PRIMARY KEY NOT NULL,
+          decoded_url TEXT NOT NULL,
+          hits        INTEGER NOT NULL DEFAULT 1,
+          created     INTEGER NOT NULL,
+          modified    INTEGER NOT NULL
+        )
+      ");
+      # Maintaining index for cleaning is likely more expensive than occasional full table scan
+      #$self->{dbh}->do("
+      #  CREATE INDEX IF NOT EXISTS short_url_modified
+      #    ON short_url_cache(created)
+      #");
+      $self->{sth_insert} = $self->{dbh}->prepare("
+        INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
+        VALUES (?,?,strftime('%s','now'),strftime('%s','now'))
+        ON CONFLICT(short_url) DO UPDATE
+          SET decoded_url = excluded.decoded_url,
+              modified = excluded.modified,
+              hits = hits + 1
+      ");
+      $self->{sth_select} = $self->{dbh}->prepare("
+        SELECT decoded_url FROM short_url_cache
+        WHERE short_url = ?
+      ");
+      $self->{sth_delete} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE short_url = ? AND created < strftime('%s','now') - $conf->{url_shortener_cache_ttl}
+      ");
+      $self->{sth_clean} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE created < strftime('%s','now') - $conf->{url_shortener_cache_ttl}
+      ");
+    };
+  }
+  ##
+  ## MySQL/MariaDB
+  ## 
+  elsif (lc $conf->{url_shortener_cache_type} eq 'dbi'
+      && $conf->{url_shortener_cache_dsn} =~ /^dbi:(?:mysql|MariaDB)/i)
+  {
+    eval {
+      local $SIG{'__DIE__'};
+      require DBI;
+      $self->{dbh} = DBI->connect_cached(
+        $conf->{url_shortener_cache_dsn},
+        $conf->{url_shortener_cache_username},
+        $conf->{url_shortener_cache_password},
+        {RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
+      );
+      $self->{sth_insert} = $self->{dbh}->prepare("
+        INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
+        VALUES (?,?,UNIX_TIMESTAMP(),UNIX_TIMESTAMP())
+        ON DUPLICATE KEY UPDATE
+          decoded_url = VALUES(decoded_url),
+          modified = VALUES(modified),
+          hits = hits + 1
+      ");
+      $self->{sth_select} = $self->{dbh}->prepare("
+        SELECT decoded_url FROM short_url_cache
+        WHERE short_url = ?
+      ");
+      $self->{sth_delete} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE short_url = ? AND created < UNIX_TIMESTAMP() - $conf->{url_shortener_cache_ttl}
+      ");
+      $self->{sth_clean} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE created < UNIX_TIMESTAMP() - $conf->{url_shortener_cache_ttl}
+      ");
+    };
+  }
+  ##
+  ## PostgreSQL
+  ## 
+  elsif (lc $conf->{url_shortener_cache_type} eq 'dbi'
+      && $conf->{url_shortener_cache_dsn} =~ /^dbi:Pg/i)
+  {
+    eval {
+      local $SIG{'__DIE__'};
+      require DBI;
+      $self->{dbh} = DBI->connect_cached(
+        $conf->{url_shortener_cache_dsn},
+        $conf->{url_shortener_cache_username},
+        $conf->{url_shortener_cache_password},
+        {RaiseError => 1, PrintError => 0, InactiveDestroy => 1, AutoCommit => 1}
+      );
+      $self->{sth_insert} = $self->{dbh}->prepare("
+        INSERT INTO short_url_cache (short_url, decoded_url, created, modified)
+        VALUES (?,?,CAST(EXTRACT(epoch FROM NOW()) AS INT),CAST(EXTRACT(epoch FROM NOW()) AS INT))
+        ON CONFLICT (short_url) DO UPDATE SET
+          decoded_url = EXCLUDED.decoded_url,
+          modified = EXCLUDED.modified,
+          hits = short_url_cache.hits + 1
+      ");
+      $self->{sth_select} = $self->{dbh}->prepare("
+        SELECT decoded_url FROM short_url_cache
+        WHERE short_url = ?
+      ");
+      $self->{sth_delete} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE short_url ? = AND created < CAST(EXTRACT(epoch FROM NOW()) AS INT) - $conf->{url_shortener_cache_ttl}
+      ");
+      $self->{sth_clean} = $self->{dbh}->prepare("
+        DELETE FROM short_url_cache
+        WHERE created < CAST(EXTRACT(epoch FROM NOW()) AS INT) - $conf->{url_shortener_cache_ttl}
+      ");
+    };
+  ##
+  ## ...
+  ##
+  } else {
+    warn "DecodeShortURLs: invalid cache configuration\n";
+    return;
+  }
+
+  if ($@ || !$self->{sth_clean}) {
+    warn "DecodeShortURLs: cache connect failed: $@\n";
+    undef $self->{dbh};
+    undef $self->{sth_insert};
+    undef $self->{sth_select};
+    undef $self->{sth_delete};
+    undef $self->{sth_clean};
+  }
+}
+
+sub short_url {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url} ? 1 : 0;
+}
+
+sub short_url_redir {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_redir} ? 1 : 0;
+}
+
+sub short_url_200 {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_200} ? 1 : 0;
+}
+
+sub short_url_404 {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_404} ? 1 : 0;
+}
+
+sub short_url_code {
+  my ($self, $pms, undef, $code) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return 0 unless defined $code && $code =~ /^\d{3}$/;
+  return $pms->{"short_url_$code"} ? 1 : 0;
+}
+
+sub short_url_chained {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_chained} ? 1 : 0;
+}
+
+sub short_url_maxchain {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_maxchain} ? 1 : 0;
+}
+
+sub short_url_loop {
+  my ($self, $pms) = @_;
+
+  # Make sure checks are run
+  $self->_check_short($pms);
+
+  return $pms->{short_url_loop} ? 1 : 0;
+}
+
+sub _check_shortener_uri {
+  my ($uri, $conf) = @_;
+
+  local($1,$2);
+  return 0 unless $uri =~ m{^
+    https?://		# Only http
+    (?:[^\@/?#]*\@)?	# Ignore user:pass@
+    ([^/?#:]+)		# (Capture hostname)
+    (?::\d+)?		# Possible port
+    (.*?\w)?		# Some path wanted
+    }ix;
+  my $host = lc $1;
+  my $has_path = defined $2;
+  my $levels = $host =~ tr/.//;
+  # No point looking at single level "xxx.yy" without a path
+  return if $levels == 1 && !$has_path;
+  if (exists $conf->{url_shortener}->{$host}) {
+    return {
+      'uri' => $uri,
+      'method' => $conf->{url_shortener}->{$host} == 1 ? 'head' : 'get',
+    };
+  }
+  # if domain is a 3rd level domain check if there is a url shortener
+  # on the 2nd level tld
+  elsif ($levels == 2 && $host =~ /^(?!www)[^.]+(\.[^.]+\.[^.]+)$/i &&
+           exists $conf->{url_shortener}->{$1}) {
+    return {
+      'uri' => $uri,
+      'method' => $conf->{url_shortener}->{$1} == 1 ? 'head' : 'get',
+    };
+  }
+  return;
+}
+
+sub check_dnsbl {
+  my ($self, $opts) = @_;
+
+  $self->_check_short($opts->{permsgstatus});
+}
+
+sub _check_short {
+  my ($self, $pms) = @_;
+
+  return if $pms->{short_url_checked}++;
+  my $conf = $pms->{conf};
+
+  # Sort short URLs into hash to de-dup them
+  my %short_urls;
+  my $uris = $pms->get_uri_detail_list();
+  while (my($uri, $info) = each %{$uris}) {
+    next unless $info->{domains} && $info->{cleaned};
+    if (my $short_url_info = _check_shortener_uri($uri, $conf)) {
+      $short_urls{$uri} = $short_url_info;
+      last if scalar keys %short_urls >= $conf->{max_short_urls};
+    }
+  }
+
+  # Bail out if no shortener was found
+  return unless %short_urls;
+
+  # Mark that a URL shortener was found
+  $pms->{short_url} = 1;
+
+  # Bail out if network lookups not enabled or max_short_urls 0
+  return if $self->{net_disabled};
+  return if !$conf->{max_short_urls};
+
+  # Initialize cache
+  $self->initialise_url_shortener_cache($conf);
+
+  # Initialize LWP
+  my $ua = LWP::UserAgent->new(
+    'agent' => $conf->{url_shortener_user_agent},
+    'max_redirect' => 0,
+    'timeout' => $conf->{url_shortener_timeout},
+  );
+  $ua->env_proxy;
+
+  # Launch HTTP requests
+  foreach my $uri (keys %short_urls) {
+    $self->recursive_lookup($short_urls{$uri}, $pms, $ua);
+  }
+
+  # Automatically purge old entries
+  if ($self->{dbh} && $conf->{url_shortener_cache_autoclean}
+      && rand() < 1/$conf->{url_shortener_cache_autoclean})
+  {
+    dbg("cleaning stale cache entries");
+    eval { $self->{sth_clean}->execute(); };
+    if ($@) { dbg("cache cleaning failed: $@"); }
+  }
+}
+
+sub recursive_lookup {
+  my ($self, $short_url_info, $pms, $ua, %been_here) = @_;
+  my $conf = $pms->{conf};
+
+  my $count = scalar keys %been_here;
+  dbg("redirection count $count") if $count;
+  if ($count >= $conf->{max_short_url_redirections}) {
+    dbg("found more than $conf->{max_short_url_redirections} shortener redirections");
+    # Fire test
+    $pms->{short_url_maxchain} = 1;
+    return;
+  }
+
+  my $short_url = $short_url_info->{uri};
+  my $location;
+  if (defined($location = $self->cache_get($short_url))) {
+    if ($conf->{url_shortener_loginfo}) {
+      info("found cached $short_url => $location");
+    } else {
+      dbg("found cached $short_url => $location");
+    }
+    # Cached http code?
+    if ($location =~ /^\d{3}$/) {
+      $pms->{"short_url_$location"} = 1;
+      # Update cache
+      $self->cache_add($short_url, $location);
+      return;
+    }
+  } else {
+    # Not cached; do lookup
+    my $method = $short_url_info->{method};
+    my $response = $ua->$method($short_url);
+    if (!$response->is_redirect) {
+      dbg("URL is not redirect: $short_url = ".$response->status_line);
+      my $rcode = $response->code;
+      if ($rcode =~ /^\d{3}$/) {
+        $pms->{"short_url_$rcode"} = 1;
+        # Update cache
+        $self->cache_add($short_url, $rcode);
+      }
+      return;
+    }
+    $location = $response->headers->{location};
+    if ($self->{url_shortener_loginfo}) {
+      info("found $short_url => $location");
+    } else {
+      dbg("found $short_url => $location");
+    }
+  }
+
+  # Update cache
+  $self->cache_add($short_url, $location);
+
+  # Bail out if $short_url redirects to itself
+  if ($short_url eq $location) {
+    dbg("URL is redirect to itself");
+    return;
+  }
+
+  # At this point we have a valid redirection and new URL in $response
+  $pms->{short_url_redir} = 1;
+
+  # Set chained here otherwise we might mark a disabled page or
+  # redirect back to the same host as chaining incorrectly.
+  $pms->{short_url_chained} = 1 if $count;
+
+  # Check if we are being redirected to a local page
+  # Don't recurse in this case...
+  if ($location !~ m{^[a-z]+://}i) {
+    my $orig_location = $location;
+    my $orig_short_url = $short_url;
+    # Strip to..
+    if (index($location, '/') == 0) {
+      $short_url =~ s{^([a-z]+://.*?)[/?#].*}{$1}; # ..absolute path
+    } else {
+      $short_url =~ s{^([a-z]+://.*)/}{$1}; # ..relative path
+    }
+    $location = "$short_url/$location";
+    dbg("looks like a local redirection: $orig_short_url => $location ($orig_location)");
+    $pms->add_uri_detail_list($location) if !$pms->{uri_detail_list}->{$location};
+    return;
+  }
+
+  if (exists $been_here{$location}) {
+    # Loop detected
+    dbg("error: loop detected: $location");
+    $pms->{short_url_loop} = 1;
+    return;
+  }
+  $been_here{$location} = 1;
+  $pms->add_uri_detail_list($location) if !$pms->{uri_detail_list}->{$location};
+
+  # Check for recursion
+  if (my $short_url_info = _check_shortener_uri($location, $conf)) {
+    # Recurse...
+    $self->recursive_lookup($short_url_info, $pms, $ua, %been_here);
+  }
+}
+
+sub cache_add {
+  my ($self, $short_url, $decoded_url) = @_;
+
+  return if !$self->{dbh};
+  return if length($short_url) > 256 || length($decoded_url) > 512;
+
+  # Upsert
+  eval { $self->{sth_insert}->execute($short_url, $decoded_url); };
+  if ($@) {
+    dbg("could not add to cache: $@");
+  }
+
+  return;
+}
+
+sub cache_get {
+  my ($self, $key) = @_;
+
+  return if !$self->{dbh};
+
+  # Make sure expired entries are gone.  Just a quick check for primary key,
+  # not that expensive.
+  eval { $self->{sth_delete}->execute($key); };
+  if ($@) {
+    dbg("cache delete failed: $@");
+    return;
+  }
+
+  # Now try to get it (don't bother parsing if something was deleted above,
+  # it would be rare event anyway)
+  eval { $self->{sth_select}->execute($key); };
+  if ($@) {
+    dbg("cache get failed: $@");
+    return;
+  }
+
+  my @row = $self->{sth_select}->fetchrow_array();
+  if (@row) {
+    return $row[0];
+  }
+
+  return;
+}
+
+# Version features
+sub has_short_url { 1 }
+sub has_autoclean { 1 }
+sub has_short_url_code { 1 }
+sub has_user_agent { 1 } # url_shortener_user_agent
+sub has_get { 1 } # url_shortener_get
+sub has_clear { 1 } # clear_url_shortener
+sub has_timeout { 1 } # url_shortener_timeout
+sub has_max_redirections { 1 } # max_short_url_redirections
+# short_url() will always hit if matching url_shortener was found, even
+# without HTTP requests.  To check if a valid HTTP redirection response was
+# seen, use short_url_redir().
+sub has_short_url_redir { 1 }
+
+1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/ExtractText.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/ExtractText.pm
@ -0,0 +1,713 @@
+# <@LICENSE>
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at:
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+
+# Authors: Jonas Eckerman, Dave Wreski, Giovanni Bechis
+
+=head1 NAME
+
+ExtractText - extracts text from documenmts.
+
+=head1 SYNOPSIS
+
+loadplugin Mail::SpamAssassin::Plugin::ExtractText
+
+ifplugin Mail::SpamAssassin::Plugin::ExtractText
+
+  extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
+  extracttext_use       pdftotext  .pdf application/pdf
+
+  # http://docx2txt.sourceforge.net
+  extracttext_external  docx2txt   /usr/bin/docx2txt {} -
+  extracttext_use       docx2txt   .docx application/docx
+
+  extracttext_external  antiword   /usr/bin/antiword -t -w 0 -m UTF-8.txt {}
+  extracttext_use       antiword   .doc application/(?:vnd\.?)?ms-?word.*
+
+  extracttext_external  unrtf      /usr/bin/unrtf --nopict {}
+  extracttext_use       unrtf      .doc .rtf application/rtf text/rtf
+
+  extracttext_external  odt2txt    /usr/bin/odt2txt --encoding=UTF-8 {}
+  extracttext_use       odt2txt    .odt .ott application/.*?opendocument.*text
+  extracttext_use       odt2txt    .sdw .stw application/(?:x-)?soffice application/(?:x-)?starwriter
+
+  extracttext_external  tesseract  {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
+  extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)
+
+  add_header   all          ExtractText-Flags _EXTRACTTEXTFLAGS_
+  header       PDF_NO_TEXT  X-ExtractText-Flags =~ /\bpdftotext_NoText\b/
+  describe     PDF_NO_TEXT  PDF without text
+  score        PDF_NO_TEXT  0.001
+
+  header       DOC_NO_TEXT  X-ExtractText-Flags =~ /\b(?:antiword|openxml|unrtf|odt2txt)_NoText\b/
+  describe     DOC_NO_TEXT  Document without text
+  score        DOC_NO_TEXT  0.001
+
+  header       EXTRACTTEXT  exists:X-ExtractText-Flags
+  describe     EXTRACTTEXT  Email processed by extracttext plugin
+  score        EXTRACTTEXT  0.001
+
+endif
+
+=head1 DESCRIPTION
+
+This module uses external tools to extract text from message parts,
+and then sets the text as the rendered part. External tool must output
+plain text, not HTML or other non-textual result.
+
+How to extract text is completely configurable, and based on
+MIME part type and file name.
+
+=head1 CONFIGURATION
+
+All configuration lines in user_prefs files will be ignored.
+
+=over 4
+
+=item extracttext_maxparts (default: 10)
+
+Configure the maximum mime parts number to analyze, a value of 0 means all mime parts
+will be analyzed
+
+=item extracttext_timeout (default: 5 10)
+
+Configure the timeout in seconds of external tool checks, per attachment.
+
+Second argument speficies maximum total time for all checks.
+
+=back
+
+=head2 Tools
+
+=over
+
+=item extracttext_use
+
+Specifies what tool to use for what message parts.
+
+The general syntax is
+
+extracttext_use  C<name>  C<specifiers>
+
+=back
+
+=over
+
+=item name
+
+the internal name of a tool.
+
+=item specifiers
+
+File extension and regular expressions for file names and MIME
+types. The regular expressions are anchored to beginning and end.
+
+=back
+
+=head3 Examples
+
+	extracttext_use  antiword  .doc application/(?:vnd\.?)?ms-?word.*
+	extracttext_use  openxml   .docx .dotx .dotm application/(?:vnd\.?)openxml.*?word.*
+	extracttext_use  openxml   .doc .dot application/(?:vnd\.?)?ms-?word.*
+	extracttext_use  unrtf     .doc .rtf application/rtf text/rtf
+
+=over
+
+=item extracttext_external
+
+Defines an external tool.  The tool must read a document on standard input
+or from a file and write text to standard output.
+
+The special keyword "{}" will be substituted at runtime with the temporary
+filename to be scanned by the external tool.
+
+Environment variables can be defined with "{KEY=VALUE}", these strings will
+be removed from commandline.
+
+It is required that commandline used outputs result directly to STDOUT.
+
+The general syntax is
+
+extracttext_external C<name> C<command> C<parameters>
+
+=back
+
+=over
+
+=item name
+
+The internal name of this tool.
+
+=item command
+
+The full path to the external command to run.
+
+=item parameters
+
+Parameters for the external command. The temporary file name containing
+the document will be automatically added as last parameter.
+
+=back
+
+=head3 Examples
+
+	extracttext_external  antiword  /usr/bin/antiword -t -w 0 -m UTF-8.txt {} -
+	extracttext_external  unrtf     /usr/bin/unrtf --nopict {}
+	extracttext_external  odt2txt   /usr/bin/odt2txt --encoding=UTF-8 {}
+
+=head2 Metadata
+
+The plugin adds some pseudo headers to the message. These headers are seen by
+the bayes system, and can be used in normal SpamAssassin rules.
+
+The headers are also available as template tags as noted below.
+
+=head3 Example
+
+The fictional example headers below are based on a message containing this:
+
+=over
+
+=item 1
+A perfectly normal PDF.
+
+=item 2
+An OpenXML document with a word document inside.
+Neither Office document contains text.
+
+=back
+
+=head3 Headers
+
+=over
+
+=item X-ExtractText-Chars
+
+Tag: _EXTRACTTEXTCHARS_
+
+Contains a count of characters that were extracted.
+
+X-ExtractText-Chars: 10970
+
+=item X-ExtractText-Words
+
+Tag: _EXTRACTTEXTWORDS_
+
+Contains a count of "words" that were extracted.
+
+X-ExtractText-Chars: 1599
+
+=item X-ExtractText-Tools
+
+Tag: _EXTRACTTEXTTOOLS_
+
+Contains chains of tools used for extraction.
+
+X-ExtractText-Tools: pdftotext openxml_antiword
+
+=item X-ExtractText-Types
+
+Tag: _EXTRACTTEXTTYPES_
+
+Contains chains of MIME types for parts found during extraction.
+
+X-ExtractText-Types: application/pdf; application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/ms-word
+
+=item X-ExtractText-Extensions
+
+Tag: _EXTRACTTEXTEXTENSIONS_
+
+Contains chains of canonicalized file extensions for parts
+found during extraction.
+
+X-ExtractText-Extensions: pdf docx
+
+=item X-ExtractText-Flags
+
+Tag: _EXTRACTTEXTFLAGS_
+
+Contains notes from the plugin.
+
+X-ExtractText-Flags: openxml_NoText
+
+=back
+
+=head3 Rules
+
+Example:
+
+	header    PDF_NO_TEXT  X-ExtractText-Flags =~ /\bpdftotext_Notext\b/
+	describe  PDF_NO_TEXT  PDF without text
+
+=cut
+
+package Mail::SpamAssassin::Plugin::ExtractText;
+
+use strict;
+use warnings;
+use re 'taint';
+
+my $VERSION = 0.001;
+
+use File::Basename;
+
+use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Util qw (compile_regexp untaint_var untaint_file_path
+  proc_status_ok exit_status_str);
+
+our @ISA = qw(Mail::SpamAssassin::Plugin);
+
+sub new {
+  my ($class, $mailsa) = @_;
+
+  $class = ref($class) || $class;
+  my $self = $class->SUPER::new($mailsa);
+  bless ($self, $class);
+
+  $self->{match} = [];
+  $self->{tools} = {};
+  $self->{magic} = 0;
+
+  $self->register_method_priority('post_message_parse', -1);
+  $self->set_config($mailsa->{conf});
+  return $self;
+}
+
+sub set_config {
+  my ($self, $conf) = @_;
+  my @cmds;
+
+  push(@cmds, {
+    setting => 'extracttext_maxparts',
+    default => 10,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+  });
+
+  push(@cmds, {
+    setting => 'extracttext_timeout',
+    default => 5,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      unless (defined $value && $value !~ /^$/) {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      local ($1,$2);
+      unless ($value =~ /^(\d+)(?:\s+(\d+))?$/) {
+        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+      }
+      $self->{extracttext_timeout} = $1;
+      $self->{extracttext_timeout_total} = $2;
+    }
+  });
+
+  $conf->{parser}->register_commands(\@cmds);
+}
+
+sub parse_config {
+  my ($self, $opts) = @_;
+
+  # Ignore users's configuration lines
+  return 0 if $opts->{user_config};
+
+  if ($opts->{key} eq 'extracttext_use') {
+    $self->inhibit_further_callbacks();
+    # Temporary kludge to notify users. Double backslashes have zero benefit for this plugin config.
+    if ($opts->{value} =~ s/\\\\/\\/g) {
+      warn "extracttext: DOUBLE BACKSLASHES DEPRECATED, change config to single backslashes, autoconverted for backward compatibility: $opts->{key} $opts->{value}\n";
+    }
+    if ($opts->{value} =~ /(?:to|2)html\b/) {
+      warn "extracttext: HTML tools are not supported, plain text output is required. Please remove: $opts->{key} $opts->{value}\n";
+      return 1;
+    }
+    my @vals = split(/\s+/, $opts->{value});
+    my $tool = lc(shift @vals);
+    return 0 unless @vals;
+    foreach my $what (@vals) {
+      my $where;
+      if (index($what, '/') >= 0) {
+        $where = 'type';
+      } else {
+        $where = 'name';
+        if ($what =~ /^\.[a-zA-Z0-9]+$/) {
+          $what = ".*\\$what";
+        }
+      }
+      my ($rec, $err) = compile_regexp('^(?i)'.$what.'$', 0);
+      if (!$rec) {
+        warn("invalid regexp '$what': $err\n");
+        return 0;
+      }
+      push @{$self->{match}}, {where=>$where, what=>$rec, tool=>$tool};
+      dbg('extracttext: use: %s %s %s', $tool, $where, $what);
+    }
+    return 1;
+  }
+  
+  if ($opts->{key} eq 'extracttext_external') {
+    $self->inhibit_further_callbacks();
+    # Temporary kludge to notify users. Double backslashes have zero benefit for this plugin config.
+    if ($opts->{value} =~ s/\\\\/\\/g) {
+      warn "extracttext: DOUBLE BACKSLASHES DEPRECATED, change config to single backslashes, autoconverted for backward compatibility: $opts->{key} $opts->{value}\n";
+    }
+    if ($opts->{value} =~ /(?:to|2)html\b/) {
+      warn "extracttext: HTML tools are not supported, plain text output is required. Please remove: $opts->{key} $opts->{value}\n";
+      return 1;
+    }
+    my %env;
+    while ($opts->{value} =~ s/\{(.+?)\}/ /g) {
+      my ($k,$v) = split(/=/, $1, 2);
+      $env{$k} = defined $v ? $v : '';
+    }
+    my @vals = split(/\s+/, $opts->{value});
+    my $name = lc(shift @vals);
+    return 0 unless @vals > 1;
+    if ($self->{tools}->{$name}) {
+      warn "extracttext: duplicate tool defined: $name\n";
+      return 0;
+    }
+    #unless (-x $vals[0]) {
+    #  warn "extracttext: missing tool: $name ($vals[0])\n";
+    #  return 0;
+    #}
+    $self->{tools}->{$name} = {
+      'name' => $name,
+      'type' => 'external',
+      'env' => \%env,
+      'cmd' => \@vals,
+    };
+    dbg('extracttext: external: %s "%s"', $name, join('","', @vals));
+    return 1;
+  }
+
+  return 0;
+}
+
+# Extract 'text' via running an external command.
+sub _extract_external {
+  my ($self, $object, $tool) = @_;
+
+  my ($errno, $pipe_errno, $tmp_file, $err_file, $pid);
+  my $resp = '';
+  my @cmd = @{$tool->{cmd}};
+
+  Mail::SpamAssassin::PerMsgStatus::enter_helper_run_mode($self);
+
+  # Set environment variables
+  foreach (keys %{$tool->{env}}) {
+    $ENV{$_} = $tool->{env}{$_};
+  }
+
+  my $timer = Mail::SpamAssassin::Timeout->new(
+    { secs => $self->{main}->{conf}->{extracttext_timeout},
+      deadline => $self->{'master_deadline'} });
+
+  my $err = $timer->run_and_catch(sub {
+    local $SIG{PIPE} = sub { die "__brokenpipe__ignore__\n" };
+
+    ($tmp_file, my $tmp_fh) = Mail::SpamAssassin::Util::secure_tmpfile();
+    $tmp_file  or die "failed to create a temporary file";
+    print $tmp_fh ${$object->{data}};
+    close($tmp_fh);
+
+    ($err_file, my $err_fh) = Mail::SpamAssassin::Util::secure_tmpfile();
+    $err_file  or die "failed to create a temporary file";
+    close($err_fh);
+    $err_file = untaint_file_path($err_file);
+
+    foreach (@cmd) {
+      # substitute "{}" with the temporary file name to pass to the external software
+      s/\{\}/$tmp_file/;
+      $_ = untaint_var($_);
+    }
+
+    $pid = Mail::SpamAssassin::Util::helper_app_pipe_open(*EXTRACT, undef, ">$err_file", @cmd);
+    $pid or die "$!\n";
+
+    # read+split avoids a Perl I/O bug (Bug 5985)
+    my($inbuf, $nread);
+
+    while ($nread = read(EXTRACT, $inbuf, 8192)) { $resp .= $inbuf }
+    defined $nread  or die "error reading from pipe: $!";
+
+    $errno = 0;
+    close EXTRACT or $errno = $!;
+
+    if (proc_status_ok($?, $errno)) {
+      dbg("extracttext: [%s] (%s) finished successfully", $pid, $cmd[0]);
+    } elsif (proc_status_ok($?, $errno, 0, 1)) {  # sometimes it exits with 1
+      dbg("extracttext: [%s] (%s) finished: %s", $pid, $cmd[0], exit_status_str($?, $errno));
+    } else {
+      info("extracttext: [%s] (%s) error: %s", $pid, $cmd[0], exit_status_str($?, $errno));
+    }
+    # Save return status for later
+    $pipe_errno = $?;
+  });
+
+  if (defined(fileno(*EXTRACT))) {  # still open
+    if ($pid) {
+      if (kill('TERM', $pid)) {
+        dbg("extracttext: killed stale helper [$pid] ($cmd[0])");
+      } else {
+        dbg("extracttext: killing helper application [$pid] ($cmd[0]) failed: $!");
+      }
+    }
+    $errno = 0;
+    close EXTRACT or $errno = $!;
+    proc_status_ok($?, $errno)
+      or info("extracttext: [%s] (%s) error: %s", $pid, $cmd[0], exit_status_str($?, $errno));
+  }
+
+  Mail::SpamAssassin::PerMsgStatus::leave_helper_run_mode($self);
+  unlink($tmp_file);
+  # Read first line from STDERR
+  my $err_resp = -s $err_file ?
+    do { open(ERRF, $err_file); $_ = <ERRF>; close(ERRF); chomp; $_; } : '';
+  unlink($err_file);
+
+  if ($err_resp ne '') {
+    dbg("extracttext: [$pid] ($cmd[0]) stderr output: $err_resp");
+  }
+
+  # If the output starts with the command that has been run it's
+  # probably an error message
+  if ($pipe_errno) {
+    if ($err_resp =~ /\b(?:Usage:|No such file or directory)/) {
+      warn "extracttext: error from $cmd[0], please verify configuration: $err_resp\n";
+    }
+    elsif ($err_resp =~ /^Syntax (?:Warning|Error): (?:May not be a PDF file|Couldn't find trailer dictionary)/) {
+      # Ignore pdftotext
+    }
+    elsif ($err_resp =~ /^Error in (?:findFileFormatStream|fopenReadStream): (?:truncated file|file not found)/) {
+      # Ignore tesseract
+    }
+    elsif ($err_resp =~ /^libpng error:/) {
+      # Ignore tesseract
+    }
+    elsif ($err_resp =~ /^Corrupt JPEG data:/) {
+      # Ignore tesseract
+    }
+    elsif ($err_resp =~ /^\S+ is not a Word Document/) {
+      # Ignore antiword
+    }
+    elsif (!$resp) {
+      warn "extracttext: error (".($pipe_errno/256).") from $cmd[0]: $err_resp\n";
+    }
+    return (0, $resp);
+  }
+  return (1, $resp);
+}
+
+sub _extract_object {
+  my ($self, $object, $tool) = @_;
+  my ($ok, $text);
+
+  if ($tool->{type} eq 'external') {
+    ($ok, $text) = $self->_extract_external($object, $tool);
+  } else {
+    warn "extracttext: bad tool type: $tool->{type}\n";
+    return 0;
+  }
+
+  return 0 unless $ok;
+
+  if ($text =~ /^[\s\r\n]*$/s) {
+    $text = '';
+  } else {
+    # Remove not important html elements
+    #$text =~ s/(?=<!DOCTYPE)([\s\S]*?)>//g;
+    #$text =~ s/(?=<!--)([\s\S]*?)-->//g;
+  }
+
+  if ($text eq '') {
+    dbg('extracttext: No text extracted');
+  }
+
+  $text = untaint_var($text);
+  utf8::encode($text) if utf8::is_utf8($text);
+
+  return (1, $text);
+}
+
+sub _get_extension {
+  my ($self, $object) = @_;
+  my $fext;
+  if ($object->{name} && $object->{name} =~ /\.([^.\\\/]+)$/) {
+    $fext = $1;
+  }
+  elsif ($object->{file} && $object->{file} =~ /\.([^.\\\/]+)$/) {
+    $fext = $1;
+  }
+  return $fext ? ($fext) : ();
+}
+
+sub _extract {
+  my ($self, $coll, $part, $type, $name, $data, $tool) = @_;
+  my $object = {
+    'data' => $data,
+    'type' => $type,
+    'name' => $name
+  };
+  my @fexts;
+  my @types;
+
+  my @tools = ($tool->{name});
+  my ($ok, $text) = $self->_extract_object($object,$tool);
+
+  # when url+text, script never returns to this point from _extract_object above
+  #
+  return 0 unless $ok;
+  if ($text ne '' && would_log('dbg','extracttext') > 1) {
+    dbg("extracttext: text extracted:\n$text");
+  }
+
+  push @{$coll->{text}}, $text;
+  push @types, $type;
+  push @fexts, $self->_get_extension($object);
+  if ($text eq '') {
+    push @{$coll->{flags}}, 'NoText';
+    push @{$coll->{text}}, 'NoText';
+  } else {
+    if ($text =~ /<a(?:\s+[^>]+)?\s+href="([^">]*)"/) {
+      push @{$coll->{flags}}, 'ActionURI';
+      dbg("extracttext: ActionURI: $1");
+      push @{$coll->{text}}, $text;
+    }
+    if ($text =~ /NoText/) {
+      push @{$coll->{flags}},'NoText';
+      dbg("extracttext: NoText");
+      push @{$coll->{text}}, $text;
+    }
+    $coll->{chars} += length($text);
+
+    # the following is safe (regarding clobbering the @_) since perl v5.11.0
+    $coll->{words} += split(/\W+/s,$text) - 1;
+    # $coll->{words} += scalar @{[split(/\W+/s,$text)]} - 1;  # old perl hack
+
+    dbg("extracttext: rendering text for type $type with $tool->{name}");
+    $part->set_rendered($text);
+  }
+
+  if (@types) {
+    push @{$coll->{types}}, join(', ', @types);
+  }
+  if (@fexts) {
+    push @{$coll->{extensions}}, join('_', @fexts);
+  }
+  push @{$coll->{tools}}, join('_', @tools);
+  return 1;
+}
+
+#
+# check attachment type and match with the right tool
+#
+sub _check_extract {
+  my ($self, $coll, $checked, $part, $decoded, $data, $type, $name) = @_;
+  return 0 unless (defined $type || defined $name);
+  foreach my $match (@{$self->{match}}) {
+    next unless $self->{tools}->{$match->{tool}};
+    next if $checked->{$match->{tool}};
+
+    if ($match->{where} eq 'name') {
+      next unless (defined $name && $name =~ $match->{what});
+    } elsif ($match->{where} eq 'type') {
+      next unless (defined $type && $type =~ $match->{what});
+    } else {
+      next;
+    }
+    $checked->{$match->{tool}} = 1;
+    # dbg("extracttext: coll: $coll, part: $part, type: $type, name: $name, data: $data, tool: $self->{tools}->{$match->{tool}}");
+    return 1 if $self->_extract($coll,$part,$type,$name,$data,$self->{tools}->{$match->{tool}});
+  }
+  return 0;
+}
+
+sub post_message_parse {
+  my ($self, $opts) = @_;
+
+  my $timer = $self->{main}->time_method("extracttext");
+
+  my $msg = $opts->{'message'};
+  $self->{'master_deadline'} = $msg->{'master_deadline'};
+  my $starttime = time;
+
+  my %collect = (
+    'tools'		=> [],
+    'types'		=> [],
+    'extensions'	=> [],
+    'flags'		=> [],
+    'chars'		=> 0,
+    'words'		=> 0,
+    'text'		=> [],
+  );
+
+  my $conf = $self->{main}->{conf};
+  my $maxparts = $conf->{extracttext_maxparts};
+  my $ttimeout = $conf->{extracttext_timeout_total} ||
+    $conf->{extracttext_timeout} > 10 ? $conf->{extracttext_timeout} : 10;
+  my $nparts = 0;
+  foreach my $part ($msg->find_parts(qr/./, 1)) {
+    next unless $part->is_leaf;
+    if ($maxparts > 0 && ++$nparts > $maxparts) {
+      dbg("extracttext: Skipping MIME parts exceeding the ${maxparts}th");
+      last;
+    }
+    if (time - $starttime >= $ttimeout) {
+      dbg("extracttext: Skipping MIME parts, total execution timeout exceeded");
+      last;
+    }
+    my (undef,$rtd) = $part->rendered;
+    next if defined $rtd;
+    my %checked = ();
+    my $dat = $part->decode();
+    my $typ = $part->{type};
+    my $nam = $part->{name};
+    my $dec = 1;
+    next if $self->_check_extract(\%collect,\%checked,$part,\$dec,\$dat,$typ,$nam);
+  }
+
+  return 1 unless @{$collect{tools}};
+
+  my @uniq_tools = do { my %seen; grep { !$seen{$_}++ } @{$collect{tools}} };
+  my @uniq_types = do { my %seen; grep { !$seen{$_}++ } @{$collect{types}} };
+  my @uniq_ext   = do { my %seen; grep { !$seen{$_}++ } @{$collect{extensions}} };
+  my @uniq_flags = do { my %seen; grep { !$seen{$_}++ } @{$collect{flags}} };
+
+  $msg->put_metadata('X-ExtractText-Words', $collect{words});
+  $msg->put_metadata('X-ExtractText-Chars', $collect{chars});
+  $msg->put_metadata('X-ExtractText-Tools', join(' ', @uniq_tools));
+  $msg->put_metadata('X-ExtractText-Types', join(' ', @uniq_types));
+  $msg->put_metadata('X-ExtractText-Extensions', join(' ', @uniq_ext));
+  $msg->put_metadata('X-ExtractText-Flags', join(' ', @uniq_flags));
+
+  return 1;
+}
+
+sub parsed_metadata {
+  my ($self, $opts) = @_;
+  my $pms = $opts->{permsgstatus};
+  my $msg = $pms->get_message();
+  foreach my $tag (('Words','Chars','Tools','Types','Extensions','Flags')) {
+    my $v = $msg->get_metadata("X-ExtractText-$tag");
+    if (defined $v) {
+      $pms->set_tag("ExtractText$tag", $v);
+      dbg("extracttext: tag: $tag $v");
+    }
+  }
+  return 1;
+}
+
+1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/FreeMail.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/FreeMail.pm
@ -20,7 +20,7 @@ use strict;
 use warnings;
 use re 'taint';

-my $VERSION = 2.003;
+my $VERSION = 4.000;

 =head1 NAME

@ -48,19 +48,21 @@ freemail_domains domain ...
   For example:
   freemail_domains hotmail.com hotmail.co.?? yahoo.* yahoo.*.*

-freemail_whitelist email/domain ...
+freemail_welcomelist email/domain ...
+
+   Previously freemail_whitelist which will work interchangeably until 4.1.

   Emails or domains listed here are ignored (pretend they aren't
   freemail). No wildcards!

-freemail_import_whitelist_auth 1/0
+freemail_import_welcomelist_auth 1/0

-   Entries in whitelist_auth will also be used to whitelist emails
+   Entries in welcomelist_auth will also be used to welcomelist emails
   or domains from being freemail.  Default is 0.

-freemail_import_def_whitelist_auth 1/0
+freemail_import_def_welcomelist_auth 1/0

-   Entries in def_whitelist_auth will also be used to whitelist emails
+   Entries in def_welcomelist_auth will also be used to welcomelist emails
   or domains from being freemail.  Default is 0.

 header FREEMAIL_REPLYTO eval:check_freemail_replyto(['option'])
@ -96,17 +98,6 @@ header FREEMAIL_BODY eval:check_freemail_body(['regex'])

   Searches body for freemail address. With optional regex to match.

-=head1 CHANGELOG
-
- 1.996 - fix freemail_skip_bulk_envfrom
- 1.997 - set freemail_skip_when_over_max to 1 by default
- 1.998 - don't warn about missing freemail_domains when linting
- 1.999 - default whitelist undisclosed-recipient@yahoo.com etc
- 2.000 - some cleaning up
- 2.001 - fix freemail_whitelist
- 2.002 - _add_desc -> _got_hit, fix description email append bug
- 2.003 - freemail_import_(def_)whitelist_auth
-
 =cut

 use Mail::SpamAssassin::Plugin;
@ -115,8 +106,8 @@ use Mail::SpamAssassin::Util qw(compile_regexp);

 our @ISA = qw(Mail::SpamAssassin::Plugin);

-# default email whitelist
-our $email_whitelist = qr/
+# default email welcomelist
+our $email_welcomelist = qr/
  ^(?:
      abuse|support|sales|info|helpdesk|contact|kontakt
    | (?:post|host|domain)master
@ -142,7 +133,7 @@ our $skip_replyto_envfrom = qr/
  )\@
 /xi;

-sub dbg { Mail::SpamAssassin::Plugin::dbg ("FreeMail: @_"); }
+sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("FreeMail: $msg", @_); }

 sub new {
    my ($class, $mailsa) = @_;
@ -153,10 +144,10 @@ sub new {

    $self->{freemail_available} = 1;
    $self->set_config($mailsa->{conf});
-    $self->register_eval_rule("check_freemail_replyto");
-    $self->register_eval_rule("check_freemail_from");
-    $self->register_eval_rule("check_freemail_header");
-    $self->register_eval_rule("check_freemail_body");
+    $self->register_eval_rule("check_freemail_replyto", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+    $self->register_eval_rule("check_freemail_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+    $self->register_eval_rule("check_freemail_header", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+    $self->register_eval_rule("check_freemail_body", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);

    return $self;
 }
@ -215,13 +206,15 @@ sub set_config {
        }
    );
    push(@cmds, {
-        setting => 'freemail_import_whitelist_auth',
+        setting => 'freemail_import_welcomelist_auth',
+        aliases => ['freemail_import_whitelist_auth'], # removed in 4.1
        default => 0,
        type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
        }
    );
    push(@cmds, {
-        setting => 'freemail_import_def_whitelist_auth',
+        setting => 'freemail_import_def_welcomelist_auth',
+        aliases => ['freemail_import_def_whitelist_auth'], # removed in 4.1
        default => 0,
        type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
        }
@ -234,9 +227,9 @@ sub parse_config {

    if ($opts->{key} eq "freemail_domains") {
        foreach my $temp (split(/\s+/, $opts->{value})) {
-            if ($temp =~ /^[a-z0-9.*?-]+$/i) {
+            if ($temp !~ tr/a-zA-Z0-9.*?-//c) {
                my $value = lc($temp);
-                if ($value =~ /[*?]/) { # separate wildcard list
+                if ($value =~ tr/*?//) { # separate wildcard list
                    $self->{freemail_temp_wc}{$value} = 1;
                }
                else {
@ -244,21 +237,21 @@ sub parse_config {
                }
            }
            else {
-                warn("invalid freemail_domains: $temp");
+                warn("freemail: invalid freemail_domains: $temp\n");
            }
        }
        $self->inhibit_further_callbacks();
        return 1;
    }

-    if ($opts->{key} eq "freemail_whitelist") {
+    if ($opts->{key} eq "freemail_welcomelist" || $opts->{key} eq "freemail_whitelist") {
        foreach my $temp (split(/\s+/, $opts->{value})) {
            my $value = lc($temp);
            if ($value =~ /\w[.@]\w/) {
-                $self->{freemail_whitelist}{$value} = 1;
+                $self->{freemail_welcomelist}{$value} = 1;
            }
            else {
-                warn("invalid freemail_whitelist: $temp");
+                warn("freemail: invalid freemail_welcomelist: $temp\n");
            }
        }
        $self->inhibit_further_callbacks();
@ -292,12 +285,7 @@ sub finish_parsing_end {
        dbg("loaded freemail_domains entries: $count normal, $wcount wildcard");
    }
    else {
-        if ($self->{main}->{lint_rules} ||1) {
        dbg("no freemail_domains entries defined, disabling plugin");
-        }
-        else {
-            warn("no freemail_domains entries defined, disabling plugin");
-        }
        $self->{freemail_available} = 0;
    }

@ -315,29 +303,29 @@ sub _is_freemail {

    return 0 if $email eq '';

-    if (defined $self->{freemail_whitelist}{$email}) {
-        dbg("whitelisted email: $email");
+    if (defined $self->{freemail_welcomelist}{$email}) {
+        dbg("welcomelisted email: $email");
        return 0;
    }

    my $domain = $email;
    $domain =~ s/.*\@//;

-    if (defined $self->{freemail_whitelist}{$domain}) {
-        dbg("whitelisted domain: $domain");
+    if (defined $self->{freemail_welcomelist}{$domain}) {
+        dbg("welcomelisted domain: $domain");
        return 0;
    }

-    if ($email =~ $email_whitelist) {
-        dbg("whitelisted email, default: $email");
+    if ($email =~ $email_welcomelist) {
+        dbg("welcomelisted email, default: $email");
        return 0;
    }

-    foreach my $list ('whitelist_auth','def_whitelist_auth') {
+    foreach my $list ('welcomelist_auth','def_welcomelist_auth') {
        if ($pms->{conf}->{"freemail_import_$list"}) {
            foreach my $regexp (values %{$pms->{conf}->{$list}}) {
                if ($email =~ /$regexp/o) {
-                    dbg("whitelisted email, $list: $email");
+                    dbg("welcomelisted email, $list: $email");
                    return 0;
                }
            }
@ -422,21 +410,13 @@ sub _parse_body {
    return 1;
 }

-sub _got_hit {
-    my ($self, $pms, $email, $desc) = @_;
+sub _test_log {
+    my ($self, $pms, $email, $rulename) = @_;

-    my $rulename = $pms->get_current_eval_rule_name();
-
-    if (defined $pms->{conf}->{descriptions}->{$rulename}) {
-        $desc = $pms->{conf}->{descriptions}->{$rulename};
+    if ($pms->{conf}->{freemail_add_describe_email}) {
+        $email =~ s/\@/(at)/g;
+        $pms->test_log($email, $rulename);
    }
-
-    if ($pms->{main}->{conf}->{freemail_add_describe_email}) {
-        $email =~ s/\@/[at]/g;
-        $pms->test_log($email);
-    }
-
-    $pms->got_hit($rulename, "", description => $desc, ruletype => 'eval');
 }

 sub check_freemail_header {
@ -448,7 +428,7 @@ sub check_freemail_header {
    dbg("RULE ($rulename) check_freemail_header".(defined $regex ? " regex:$regex" : ""));

    unless (defined $header) {
-        warn("check_freemail_header needs argument");
+        warn("freemail: check_freemail_header needs argument\n");
        return 0;
    }

@ -462,13 +442,13 @@ sub check_freemail_header {
        $re = $rec;
    }

-    my @emails = map (lc, $pms->{main}->find_all_addrs_in_line ($pms->get($header)));
+    my @emails = map (lc, $pms->get("$header:addr"));

    if (!scalar (@emails)) {
         dbg("header $header not found from mail");
         return 0;
    }
-    dbg("addresses from header $header: ".join(';',@emails));
+    dbg("addresses from header $header: ".join(', ', @emails));

    foreach my $email (@emails) {    
        if ($self->_is_freemail($email, $pms)) {
@ -479,7 +459,7 @@ sub check_freemail_header {
            else {
                dbg("HIT! $email is freemail");
            }
-            $self->_got_hit($pms, $email, "Header $header is freemail");
+            $self->_test_log($pms, $email, $rulename);
            return 1;
         }
    }
@ -511,16 +491,16 @@ sub check_freemail_body {
        foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
            if ($email =~ /$re/o) {
                dbg("HIT! email from body is freemail and matches regex: $email");
-                $self->_got_hit($pms, $email, "Email from body is freemail");
-                return 0;
+                $self->_test_log($pms, $email, $rulename);
+                return 1;
            }
        }
    }
    elsif (scalar keys %{$pms->{freemail_cache}{body}}) {
        my $emails = join(', ', keys %{$pms->{freemail_cache}{body}});
        dbg("HIT! body has freemails: $emails");
-        $self->_got_hit($pms, $emails, "Body contains freemails");
-        return 0;
+        $self->_test_log($pms, $emails, $rulename);
+        return 1;
    }

    return 0;
@ -563,8 +543,8 @@ sub check_freemail_from {
        else {
            dbg("HIT! $email is freemail");
        }
-        $self->_got_hit($pms, $email, "Sender address is freemail");
-        return 0;
+        $self->_test_log($pms, $email, $rulename);
+        return 1;
    }

    return 0;
@ -580,7 +560,7 @@ sub check_freemail_replyto {

    if (defined $what) {
        if ($what ne 'replyto' and $what ne 'reply') {
-            warn("invalid check_freemail_replyto option: $what");
+            warn("freemail: invalid check_freemail_replyto option: $what\n");
            return 0;
        }
    }
@ -590,25 +570,34 @@ sub check_freemail_replyto {

    # Skip mailing-list etc looking requests, mostly FPs from them
    if ($pms->{main}->{conf}->{freemail_skip_bulk_envfrom}) {
-        my $envfrom = lc($pms->get("EnvelopeFrom"));
-        if ($envfrom =~ $skip_replyto_envfrom) {
+        my $envfrom = ($pms->get("EnvelopeFrom"))[0];
+        if (defined $envfrom && $envfrom =~ $skip_replyto_envfrom) {
            dbg("envelope sender looks bulk, skipping check: $envfrom");
            return 0;
        }
    }

-    my $from = lc($pms->get("From:addr"));
-    my $replyto = lc($pms->get("Reply-To:addr"));
-    my $from_is_fm = $self->_is_freemail($from, $pms);
-    my $replyto_is_fm = $self->_is_freemail($replyto, $pms);
+    my @from_addrs = map (lc, $pms->get("From:addr"));
+    dbg("From address: ".join(", ", @from_addrs)) if @from_addrs;

-    dbg("From address: $from") if $from ne '';
-    dbg("Reply-To address: $replyto") if $replyto ne '';
+    my @replyto_addrs = map (lc, $pms->get("Reply-To:addr"));
+    dbg("Reply-To address: ".join(", ", @replyto_addrs)) if @replyto_addrs;

-    if ($from_is_fm and $replyto_is_fm and ($from ne $replyto)) {
+    my $from_is_fm = grep { $self->_is_freemail($_, $pms) } @from_addrs;
+    my $replyto_is_fm = grep { $self->_is_freemail($_, $pms) } @replyto_addrs;
+
+    my $from_not_in_replyto = 1;
+    foreach my $from (@from_addrs) {
+        next unless grep { $_ eq $from } @replyto_addrs;
+        $from_not_in_replyto = 0;
+    }
+
+    if ($from_is_fm and $replyto_is_fm and $from_not_in_replyto) {
        dbg("HIT! From and Reply-To are different freemails");
-        $self->_got_hit($pms, "$from, $replyto", "From and Reply-To are different freemails");
-        return 0;
+        my $from = join(",", @from_addrs);
+        my $replyto = join(",", @replyto_addrs);
+        $self->_test_log($pms, "$from -> $replyto", $rulename);
+        return 1;
    }

    if ($what eq 'replyto') {
@ -618,7 +607,7 @@ sub check_freemail_replyto {
        }
    }
    elsif ($what eq 'reply') {
-        if ($replyto ne '' and !$replyto_is_fm) {
+        if (@replyto_addrs and !$replyto_is_fm) {
            dbg("Reply-To defined and is not freemail, skipping check");
            return 0;
        }
@ -627,19 +616,21 @@ sub check_freemail_replyto {
            return 0;
        }
    }
-    my $reply = $replyto_is_fm ? $replyto : $from;

    return 0 unless $self->_parse_body($pms);

    # Compare body to headers
    if (scalar keys %{$pms->{freemail_cache}{body}}) {
-        my $check = $what eq 'replyto' ? $replyto : $reply;
-        dbg("comparing $check to body freemails");
-        foreach my $email (keys %{$pms->{freemail_cache}{body}}) {
-            if ($email ne $check) {
-                dbg("HIT! $check and $email are different freemails");
-                $self->_got_hit($pms, "$check, $email", "Different freemails in reply header and body");
-                return 0;
+        my $reply_addrs = $what eq 'replyto' ? \@replyto_addrs :
+                              $replyto_is_fm ? \@replyto_addrs : \@from_addrs;
+        dbg("comparing to body freemails: ".join(", ", @$reply_addrs));
+        foreach my $body_email (keys %{$pms->{freemail_cache}{body}}) {
+            foreach my $reply_email (@$reply_addrs) {
+                if ($body_email ne $reply_email) {
+                    dbg("HIT! $reply_email (Reply) and $body_email (Body) are different freemails");
+                    $self->_test_log($pms, "$reply_email, $body_email", $rulename);
+                    return 1;
+                }
            }
        }
    }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
@ -17,29 +17,33 @@

 =head1 NAME

-FromNameSpoof - perform various tests to detect spoof attempts using the From header name section
+FromNameSpoof - perform various tests to detect spoof attempts using the
+From header name section

 =head1 SYNOPSIS

 loadplugin    Mail::SpamAssassin::Plugin::FromNameSpoof

- # Does the From:name look like it contains an email address
+ # From:name and From:addr do not match, matching depends on C<fns_check> setting
+ header  __PLUGIN_FROMNAME_SPOOF  eval:check_fromname_spoof()
+  
+ # From:name and From:addr do not match (same as above rule and C<fns_check 0>)
+ header  __PLUGIN_FROMNAME_DIFFERENT  eval:check_fromname_different()
+
+ # From:name and From:addr domains differ
+ header  __PLUGIN_FROMNAME_DOMAIN_DIFFER  eval:check_fromname_domain_differ()
+
+ # From:name looks like it contains an email address (not same as From:addr)
 header  __PLUGIN_FROMNAME_EMAIL  eval:check_fromname_contains_email()

- # Is the From:name different to the From:addr header
- header   __PLUGIN_FROMNAME_DIFFERENT  eval:check_fromname_different()
+ # From:name matches any To:addr
+ header  __PLUGIN_FROMNAME_EQUALS_TO  eval:check_fromname_equals_to()

 # From:name and From:addr owners differ
 header  __PLUGIN_FROMNAME_OWNERS_DIFFER  eval:check_fromname_owners_differ()

- # From:name domain differs to from header
- header   __PLUGIN_FROMNAME_DOMAIN_DIFFER  eval:check_fromname_domain_differ()
-
- # From:name and From:address don't match and owners differ
- header   __PLUGIN_FROMNAME_SPOOF  eval:check_fromname_spoof()
-  
- # From:name address matches To:address
- header __PLUGIN_FROMNAME_EQUALS_TO  eval:check_fromname_equals_to()
+ # From:name matches Reply-To:addr
+ header  __PLUGIN_FROMNAME_EQUALS_REPLYTO  eval:check_fromname_equals_replyto()

 =head1 DESCRIPTION

@ -61,17 +65,21 @@ B<Note> that FromNameSpoof detects the "owner" of a domain by the following sear

  <owner>.<tld>

-By default FromNameSpoof will ignore the TLD when testing if From:addr is spoofed.
-Default 1
+By default FromNameSpoof will ignore the TLD when comparing addresses:

  fns_check 1

 Check levels:

  0 - Strict checking of From:name != From:addr
- 1 - Allow for different tlds
+  1 - Allow for different TLDs
  2 - Allow for different aliases but same domain

+"Owner" info can also be mapped as aliases with C<fns_add_addrlist>.  For
+example, to consider "googlemail.com" as "gmail":
+
+  fns_add_addrlist (gmail) *@googlemail.com
+
 =head1 TAGS

 The following tags are added to the set if a spoof is detected. They are available for 
@ -93,48 +101,35 @@ use in reports, header fields, other plugins, etc.:
    Actual From:addr domain

  _FNSFADDROWNER_
-    Actual From:addr detected owner
+    Actual From:addr owner

 =head1 EXAMPLE 

-header   __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
-header   __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
-
-meta     FROMNAME_SPOOF_EQUALS_TO  (__PLUGIN_FROMNAME_SPOOF && __PLUGIN_FROMNAME_EQUALS_TO)
-describe FROMNAME_SPOOF_EQUALS_TO From:name is spoof to look like To: address
-score    FROMNAME_SPOOF_EQUALS_TO 1.2
+  header  __PLUGIN_FROMNAME_SPOOF  eval:check_fromname_spoof()
+  header  __PLUGIN_FROMNAME_EQUALS_TO  eval:check_fromname_equals_to()
+  meta     FROMNAME_SPOOF_EQUALS_TO (__PLUGIN_FROMNAME_SPOOF && __PLUGIN_FROMNAME_EQUALS_TO)
+  describe FROMNAME_SPOOF_EQUALS_TO From:name is spoof to look like To: address
+  score    FROMNAME_SPOOF_EQUALS_TO 1.2

 =cut

-use strict;
-
 package Mail::SpamAssassin::Plugin::FromNameSpoof;
-my $VERSION = 0.9;
+
+use strict;
+use warnings;
+use re 'taint';

 use Mail::SpamAssassin::Plugin;
-use List::Util ();
-use Mail::SpamAssassin::Util;

 use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);

-sub dbg { Mail::SpamAssassin::Plugin::dbg ("FromNameSpoof: @_"); }
+my $VERSION = 1.0;

-sub uri_to_domain {
-  my ($self, $domain) = @_;
-
-  return unless defined $domain;
-
-  if ($Mail::SpamAssassin::VERSION <= 3.004000) {
-    Mail::SpamAssassin::Util::uri_to_domain($domain);
-  } else {
-    $self->{main}->{registryboundaries}->uri_to_domain($domain);
-  }
-}
+sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("FromNameSpoof: $msg", @_); }

 # constructor: register the eval rule
-sub new
-{
+sub new {
  my $class = shift;
  my $mailsaobject = shift;

@ -146,13 +141,13 @@ sub new
  $self->set_config($mailsaobject->{conf});

  # the important bit!
-  $self->register_eval_rule("check_fromname_spoof");
-  $self->register_eval_rule("check_fromname_different");
-  $self->register_eval_rule("check_fromname_domain_differ");
-  $self->register_eval_rule("check_fromname_contains_email");
-  $self->register_eval_rule("check_fromname_equals_to");
-  $self->register_eval_rule("check_fromname_owners_differ");
-  $self->register_eval_rule("check_fromname_equals_replyto");
+  $self->register_eval_rule("check_fromname_spoof", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_different", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_domain_differ", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_contains_email", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_equals_to", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_owners_differ", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_fromname_equals_replyto", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
  return $self;
 }

@ -164,14 +159,13 @@ sub set_config {
    setting => 'fns_add_addrlist',
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
-      my($self, $key, $value, $line) = @_;
+      my ($self, $key, $value, $line) = @_;
      local($1,$2);
-      if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
+      if ($value !~ /^ \( (.+?) \) \s+ (.+) \z/sx) {
        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
      }
-      my $listname = "FNS_$1";
-      $value = $2;
-      $self->{parser}->add_to_addrlist ($listname, split(/\s+/, lc($value)));
+      my $listname = "FNS_".lc($1);
+      $self->{parser}->add_to_addrlist($listname, split(/\s+/, lc $2));
      $self->{fns_addrlists}{$listname} = 1;
    }
  });
@ -180,14 +174,13 @@ sub set_config {
    setting => 'fns_remove_addrlist',
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
-      my($self, $key, $value, $line) = @_;
+      my ($self, $key, $value, $line) = @_;
      local($1,$2);
-      if ($value !~ /^ \( (.*?) \) \s+ (.*) \z/sx) {
+      if ($value !~ /^ \( (.+?) \) \s+ (.+) \z/sx) {
        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
      }
-      my $listname = "FNS_$1";
-      $value = $2;
-      $self->{parser}->remove_from_addrlist ($listname, split (/\s+/, $value));
+      my $listname = "FNS_".lc($1);
+      $self->{parser}->remove_from_addrlist($listname, split (/\s+/, lc $2));
    }
  });

@ -206,7 +199,7 @@ sub set_config {
      if ($value eq '') {
        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
      }
-      $self->{fns_ignore_dkim}->{$_} = 1 foreach (split(/\s+/, lc($value)));
+      $self->{fns_ignore_dkim}->{$_} = 1 foreach (split(/\s+/, lc $value));
    }
  });

@ -227,6 +220,16 @@ sub set_config {
    setting => 'fns_check',
    default => 1,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if ($value eq '') {
+        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      if ($value !~ /^[012]$/) {
+        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+      }
+      $self->{fns_check} = $value;
+    }
  });

  $conf->{parser}->register_commands(\@cmds);
@ -235,203 +238,287 @@ sub set_config {
 sub parsed_metadata {
  my ($self, $opts) = @_;
  my $pms = $opts->{permsgstatus};
-  $pms->action_depends_on_tags('DKIMDOMAIN',
-      sub { my($pms,@args) = @_;
-        $self->_check_fromnamespoof($pms);
+
+  # If fns_ignore_dkim used, force wait for DKIM results
+  if (%{$pms->{conf}->{fns_ignore_dkim}}) {
+    if ($self->{main}->{local_tests_only}) {
+      dbg("local tests only, ignoring fns_ignore_dkim setting");
+    }
+    # Check that DKIM module is loaded (a bit kludgy check)
+    elsif (exists $pms->{conf}->{dkim_timeout}) {
+      # Initialize async queue, any eval calls will queue their checks
+      $pms->{fromname_async_queue} = [];
+      # Process and finish queue as soon as DKIM is ready
+      $pms->action_depends_on_tags('DKIMDOMAIN', sub {
+        $self->_check_async_queue($pms);
+      });
+    } else {
+      dbg("DKIM plugin not loaded, ignoring fns_ignore_dkim setting");
+    }
  }
-  );
-  1;
 }

-sub check_fromname_different
-{
-  my ($self, $pms) = @_;
+sub _check_eval {
+  my ($self, $pms, $result) = @_;
+
+  if (exists $pms->{fromname_async_queue}) {
+    my $rulename = $pms->get_current_eval_rule_name();
+    push @{$pms->{fromname_async_queue}}, sub {
+      if ($result->()) {
+        $pms->got_hit($rulename, '', ruletype => 'header');
+      } else {
+        $pms->rule_ready($rulename);
+      }
+    };
+    return; # return undef for async status
+  }
+
  $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_address_different};
+  # make sure not to return undef, as this is not async anymore
+  return $result->() || 0;
 }

-sub check_fromname_domain_differ
-{
-  my ($self, $pms) = @_;
-  $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_domain_different};
-}
-
-sub check_fromname_spoof
-{
+sub check_fromname_spoof {
  my ($self, $pms, $check_lvl) = @_;
-  $self->_check_fromnamespoof($pms);

-  if ( not defined $check_lvl ) {
+  # Some deprecated eval parameter, was not documented?
+  if (!defined $check_lvl || $check_lvl !~ /^[012]$/) {
    $check_lvl = $pms->{conf}->{fns_check};
  }

+  my $result = sub {
    my @array = (
-    ($pms->{fromname_address_different}) ,
-    ($pms->{fromname_address_different} && $pms->{fromname_owner_different}) ,
+      ($pms->{fromname_address_different}),
+      ($pms->{fromname_address_different} && $pms->{fromname_owner_different}),
      ($pms->{fromname_address_different} && $pms->{fromname_domain_different})
    );
+    $array[$check_lvl];
+  };

-  return $array[$check_lvl];
-
+  return $self->_check_eval($pms, $result);
 }

-sub check_fromname_contains_email
-{
+sub check_fromname_different {
  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_address_different};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_fromname_domain_differ {
+  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_domain_different};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_fromname_contains_email {
+  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_contains_email};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_fromname_equals_to {
+  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_equals_to_addr};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_fromname_owners_differ {
+  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_owner_different};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_fromname_equals_replyto {
+  my ($self, $pms) = @_;
+
+  my $result = sub {
+    $pms->{fromname_equals_replyto};
+  };
+
+  return $self->_check_eval($pms, $result);
+}
+
+sub check_cleanup {
+  my ($self, $opts) = @_;
+
+  $self->_check_async_queue($opts->{permsgstatus});
+}
+
+# Shall only be called when DKIMDOMAIN is ready, or from check_cleanup() to
+# make sure _check_fromnamespoof is called if DKIMDOMAIN was never set
+sub _check_async_queue {
+  my ($self, $pms) = @_;
+
+  if (exists $pms->{fromname_async_queue}) {
    $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_contains_email};
+    $_->() foreach (@{$pms->{fromname_async_queue}});
+    # No more async queueing needed.  If any evals are called later, they
+    # will act on the results directly.
+    delete $pms->{fromname_async_queue};
+  }
 }

-sub check_fromname_equals_replyto
-{
-  my ($self, $pms) = @_;
-  $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_equals_replyto};
-}
-
-sub check_fromname_equals_to
-{
-  my ($self, $pms) = @_;
-  $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_equals_to_addr};
-}
-
-sub check_fromname_owners_differ
-{
-  my ($self, $pms) = @_;
-  $self->_check_fromnamespoof($pms);
-  return $pms->{fromname_owner_different};
-}
-
-sub _check_fromnamespoof
-{
+sub _check_fromnamespoof {
  my ($self, $pms) = @_;

-  return if (defined $pms->{fromname_contains_email});
+  return if $pms->{fromname_checked};
+  $pms->{fromname_checked} = 1;

  my $conf = $pms->{conf};

-  $pms->{fromname_contains_email} = 0;
-  $pms->{fromname_address_different} = 0;
-  $pms->{fromname_equals_to_addr} = 0;
-  $pms->{fromname_domain_different} = 0;
-  $pms->{fromname_owner_different} = 0;
-  $pms->{fromname_equals_replyto} = 0;
-
-  foreach my $addr (split / /, $pms->get_tag('DKIMDOMAIN') || '') {
-    if ($conf->{fns_ignore_dkim}->{lc($addr)}) {
+  foreach my $addr (split(/\s+/, $pms->get_tag('DKIMDOMAIN')||'')) {
+    if ($conf->{fns_ignore_dkim}->{lc $addr}) {
      dbg("ignoring, DKIM signed: $addr");
-      return 0;
+      return;
    }
  }

  foreach my $iheader (keys %{$conf->{fns_ignore_header}}) {
    if ($pms->get($iheader)) {
      dbg("ignoring, header $iheader found");
-      return 0 if ($pms->get($iheader));
+      return;
    }
  }

+  # Parse From addr
+  my $from_addr = lc $pms->get('From:addr');
+  my $from_domain = $self->{main}->{registryboundaries}->uri_to_domain("mailto:$from_addr");
+  return unless defined $from_domain;
+
+  # Parse From name
+  my $fromname = lc $pms->get('From:name');
+  # Very common to have From address cloned into name, ignore?
+  #if ($fromname eq $from_addr) {
+  #  dbg("ignoring, From-name is exactly same as From addr: $fromname");
+  #  return;
+  #}
+  my ($fromname_addr, $fromname_domain);
+  if ($fromname =~ /\b([\w\.\!\#\$\%\&\'\*\+\/\=\?\^\_\`\{\|\}\~-]+\@\w[\w-]*\.\w[\w.-]++)\b/i) {
+    $fromname_addr = $1;
+    $fromname_domain = $self->{main}->{registryboundaries}->uri_to_domain("mailto:$fromname_addr");
+    # No valid domain/TLD found? Any reason to keep testing a possibly obfuscated one?
+    if (!defined $fromname_domain) {
+      dbg("no From-name addr found");
+      return;
+    }
+    $pms->{fromname_contains_email} = 1; # check_fromname_contains_email hit
+    # Calculate "closeness" (this really needs documentation, as it's hard to understand)
+    my $nochar = ($fromname =~ y/a-z0-9//c);
+    $nochar -= ($fromname_addr =~ y/a-z0-9//c);
+    my $len = length($fromname) + $nochar - length($fromname_addr);
+    unless ($len <= $conf->{fns_extrachars}) {
+      dbg("not enough closeness for From-name/addr: $fromname <=> $fromname_addr ($len <= $conf->{fns_extrachars})");
+      return;
+    }
+  } else {
+    # No point continuing if email was not found inside name
+    dbg("no From-name addr found");
+    return;
+  }
+
+  # Parse owners
  my $list_refs = {};
-
  if ($conf->{fns_addrlists}) {
    my @lists = keys %{$conf->{fns_addrlists}};
    foreach my $list (@lists) {
      $list_refs->{$list} = $conf->{$list};
    }
-    s/^FNS_// foreach (@lists);
-    dbg("using addrlists: ".join(', ', @lists));
+    dbg("using addrlists for owner aliases: ".join(', ', map { s/^FNS_//r; } @lists));
  }
+  my $fromname_owner = $self->_find_address_owner($fromname_addr, $fromname_domain, $list_refs);
+  my $from_owner = $self->_find_address_owner($from_addr, $from_domain, $list_refs);

-  my %fnd = ();
-  my %fad = ();
-  my %tod = ();
+  dbg("Parsed From-name addr/domain/owner: $fromname_addr/$fromname_domain/$fromname_owner");
+  dbg("Parsed From-addr addr/domain/owner: $from_addr/$from_domain/$from_owner");

-  $fnd{'addr'} = $pms->get("From:name");
-
-  if ($fnd{'addr'} =~ /\b((?>[\w\.\!\#\$\%\&\'\*\+\/\=\?\^\_\`\{\|\}\~\-]+@[\w\-\.]+\.[\w\-\.]+))\b/i) {
-    my $nochar = ($fnd{'addr'} =~ y/A-Za-z0-9//c);
-    $nochar -= ($1 =~ y/A-Za-z0-9//c);
-
-    return 0 unless ((length($fnd{'addr'})+$nochar) - length($1) <= $conf->{'fns_extrachars'});
-
-    $fnd{'addr'} = lc $1;
-  } else {
-    return 0;
+  if ($fromname_addr ne $from_addr) {
+    dbg("From-name addr differs from From addr: $fromname_addr != $from_addr");
+    $pms->{fromname_address_different} = 1;
  }
-
-  my $replyto = lc $pms->get("Reply-To:addr");
-
-  $fad{'addr'} = lc $pms->get("From:addr");
-  my @toaddrs = $pms->all_to_addrs();
-  return 0 unless @toaddrs;
-
-  $tod{'addr'} = lc $toaddrs[0];
-
-  $fnd{'domain'} = $self->uri_to_domain($fnd{'addr'});
-  $fad{'domain'} = $self->uri_to_domain($fad{'addr'});
-  $tod{'domain'} = $self->uri_to_domain($tod{'addr'});
-
-  return 0 unless (defined $fnd{'domain'} && defined $fad{'domain'});
-
-  $pms->{fromname_contains_email} = 1;
-
-  $fnd{'owner'} = $self->_find_address_owner($fnd{'addr'}, $list_refs);
-
-  $fad{'owner'} = $self->_find_address_owner($fad{'addr'}, $list_refs);
-
-  $tod{'owner'} = $self->_find_address_owner($tod{'addr'}, $list_refs);
-
-  $pms->{fromname_address_different} = 1 if ($fnd{'addr'} ne $fad{'addr'});
-
-  $pms->{fromname_domain_different} = 1 if ($fnd{'domain'} ne $fad{'domain'});
-
-  $pms->{fromname_equals_to_addr} = 1 if ($fnd{'addr'} eq $tod{addr});
-
-  $pms->{fromname_equals_replyto} = 1 if ($fnd{'addr'} eq $replyto);
-
-  if ($fnd{'owner'} ne $fad{'owner'}) {
+  if ($fromname_domain ne $from_domain) {
+    dbg("From-name domain differs from From domain: $fromname_domain != $from_domain");
+    $pms->{fromname_domain_different} = 1;
+  }
+  if ($fromname_owner ne $from_owner) {
+    dbg("From-name owner differs from From owner: $fromname_owner != $from_owner");
    $pms->{fromname_owner_different} = 1;
  }

-  if ($pms->{fromname_address_different}) {
-    $pms->set_tag("FNSFNAMEADDR", $fnd{'addr'});
-    $pms->set_tag("FNSFADDRADDR", $fad{'addr'});
-    $pms->set_tag("FNSFNAMEOWNER", $fnd{'owner'});
-    $pms->set_tag("FNSFADDROWNER", $fad{'owner'});
-    $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
-    $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
+  # Check Reply-To related
+  my $replyto_addr = lc $pms->get('Reply-To:addr');
+  if ($fromname_addr eq $replyto_addr) {
+    dbg("From-name addr is same as Reply-To addr: $fromname_addr");
+    $pms->{fromname_equals_replyto} = 1;
+  }

-    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
-    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
-    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
+  # Check To related
+  foreach my $to_addr ($pms->all_to_addrs()) {
+    if ($fromname_addr eq $to_addr) {
+      dbg("From-name addr is same as To addr: $fromname_addr");
+      $pms->{fromname_equals_to_addr} = 1;
+      last;
+    }
+  }
+
+  # Set tags
+  if ($pms->{fromname_address_different} || $pms->{fromname_owner_different}) {
+    $pms->set_tag("FNSFNAMEADDR", $fromname_addr);
+    $pms->set_tag("FNSFNAMEDOMAIN", $fromname_domain);
+    $pms->set_tag("FNSFNAMEOWNER", $fromname_owner);
+    $pms->set_tag("FNSFADDRADDR", $from_addr);
+    $pms->set_tag("FNSFADDRDOMAIN", $from_domain);
+    $pms->set_tag("FNSFADDROWNER", $from_owner);
  }
 }

-sub _find_address_owner
-{
-  my ($self, $check, $list_refs) = @_;
+sub _find_address_owner {
+  my ($self, $addr, $addr_domain, $list_refs) = @_;
+
+  # Check fns addrlist first for user defined mapping
  foreach my $owner (keys %{$list_refs}) {
-    foreach my $white_addr (keys %{$list_refs->{$owner}}) {
-      my $regexp = qr/$list_refs->{$owner}{$white_addr}/i;
-      if ($check =~ /$regexp/)  {
-        $owner =~ s/^FNS_//i;
+    foreach my $listaddr (keys %{$list_refs->{$owner}}) {
+      if ($addr =~ $list_refs->{$owner}{$listaddr}) {
+        $owner =~ s/^FNS_//;
        return lc $owner;
      }
    }
  }

-  my $owner = $self->uri_to_domain($check);
-
-  $check =~ /^([^\@]+)\@(.*)$/;
-
-  if ($owner ne $2) {
-    return $self->_find_address_owner("$1\@$owner", $list_refs);
+  # If we have subdomain addr foo.bar@sub.domain.com,
+  # this will try to recheck foo.bar@domain.com from addrlist
+  local($1,$2);
+  if ($addr =~ /^([^\@]+)\@(.+)$/) {
+    if ($2 ne $addr_domain) {
+      return $self->_find_address_owner("$1\@$addr_domain", $addr_domain, $list_refs);
+    }
  }

-  $owner =~ /^([^\.]+)\./;
-  return lc $1;
+  # Grab the first component of TLD
+  if ($addr_domain =~ /^([^.]+)\./) {
+    return $1;
+  } else {
+    return $addr_domain;
+  }
 }

 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/HTMLEval.pm
@ -39,18 +39,18 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("html_tag_balance");
-  $self->register_eval_rule("html_image_only");
-  $self->register_eval_rule("html_image_ratio");
-  $self->register_eval_rule("html_charset_faraway");
-  $self->register_eval_rule("html_tag_exists");
-  $self->register_eval_rule("html_test");
-  $self->register_eval_rule("html_eval");
-  $self->register_eval_rule("html_text_match");
-  $self->register_eval_rule("html_title_subject_ratio");
-  $self->register_eval_rule("html_text_not_match");
-  $self->register_eval_rule("html_range");
-  $self->register_eval_rule("check_iframe_src");
+  $self->register_eval_rule("html_tag_balance", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_image_only", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_image_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_charset_faraway", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_tag_exists", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_test", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_eval", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_text_match", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_title_subject_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_text_not_match", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("html_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_iframe_src", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -61,65 +61,88 @@ sub html_tag_balance {
  return 0 if $rawtag !~ /^([a-zA-Z0-9]+)$/;
  my $tag = $1;

-  return 0 unless exists $pms->{html}{inside}{$tag};
-
  return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
  my $expr = untaint_var($1);

-  $pms->{html}{inside}{$tag} =~ /^([\<\>\=\!\-\+ 0-9]+)$/;
+  foreach my $html (@{$pms->{html_all}}) {
+    next unless exists $html->{inside}{$tag};
+    $html->{inside}{$tag} =~ /^([\<\>\=\!\-\+ 0-9]+)$/;
    my $val = untaint_var($1);
+    return 1 if eval "\$val $expr";
+  }

-  return eval "\$val $expr";
+  return 0;
 }

 sub html_image_only {
  my ($self, $pms, undef, $min, $max) = @_;

-  return (exists $pms->{html}{inside}{img} &&
-	  exists $pms->{html}{length} &&
-	  $pms->{html}{length} > $min &&
-	  $pms->{html}{length} <= $max);
+  foreach my $html (@{$pms->{html_all}}) {
+    if (exists $html->{inside}{img} && exists $html->{length} &&
+        $html->{length} > $min && $html->{length} <= $max)
+    {
+      return 1;
+    }
+  }
+
+  return 0;
 }

 sub html_image_ratio {
  my ($self, $pms, undef, $min, $max) = @_;

-  return 0 unless (exists $pms->{html}{non_space_len} &&
-		   exists $pms->{html}{image_area} &&
-		   $pms->{html}{image_area} > 0);
-  my $ratio = $pms->{html}{non_space_len} / $pms->{html}{image_area};
-  return ($ratio > $min && $ratio <= $max);
+  foreach my $html (@{$pms->{html_all}}) {
+    next unless (exists $html->{non_space_len} &&
+                 exists $html->{image_area} &&
+                 $html->{image_area} > 0);
+    my $ratio = $html->{non_space_len} / $html->{image_area};
+    return 1 if $ratio > $min && $ratio <= $max;
+  }
+
+  return 0;
 }

 sub html_charset_faraway {
  my ($self, $pms) = @_;

-  return 0 unless exists $pms->{html}{charsets};
-
  my @locales = Mail::SpamAssassin::Util::get_my_locales($pms->{conf}->{ok_locales});
  return 0 if grep { $_ eq "all" } @locales;

+  foreach my $html (@{$pms->{html_all}}) {
+    next unless exists $html->{charsets};
    my $okay = 0;
    my $bad = 0;
-  for my $c (split(' ', $pms->{html}{charsets})) {
+    foreach my $c (split(/\s+/, $html->{charsets})) {
      if (Mail::SpamAssassin::Locales::is_charset_ok_for_locales($c, @locales)) {
        $okay++;
-    }
-    else {
+      } else {
        $bad++;
      }
    }
-  return ($bad && ($bad >= $okay));
+    return 1 if $bad && $bad >= $okay;
+  }
+
+  return 0;
 }

 sub html_tag_exists {
  my ($self, $pms, undef, $tag) = @_;
-  return exists $pms->{html}{inside}{$tag};
+
+  foreach my $html (@{$pms->{html_all}}) {
+    return 1 if exists $html->{inside}{$tag};
+  }
+
+  return 0;
 }

 sub html_test {
  my ($self, $pms, undef, $test) = @_;
-  return $pms->{html}{$test};
+
+  foreach my $html (@{$pms->{html_all}}) {
+    return 1 if $html->{$test};
+  }
+
+  return 0;
 }

 sub html_eval {
@ -128,29 +151,38 @@ sub html_eval {
  return 0 if $rawexpr !~ /^([\<\>\=\!\-\+ 0-9]+)$/;
  my $expr = untaint_var($1);

+  foreach my $html (@{$pms->{html_all}}) {
    # workaround bug 3320: weird perl bug where additional, very explicit
    # untainting into a new var is required.
-  my $tainted = $pms->{html}{$test};
-  return 0 unless defined($tainted);
+    my $tainted = $html->{$test};
+    next unless defined($tainted);
    my $val = $tainted;
-
    # just use the value in $val, don't copy it needlessly
-  return eval "\$val $expr";
+    return 1 if eval "\$val $expr";
+  }
+
+  return 0;
 }

 sub html_text_match {
  my ($self, $pms, undef, $text, $regexp) = @_;
+
  my ($rec, $err) = compile_regexp($regexp, 0);
  if (!$rec) {
    warn "htmleval: html_text_match invalid regexp '$regexp': $err";
    return 0;
  }
-  foreach my $string (@{$pms->{html}{$text}}) {
+
+  foreach my $html (@{$pms->{html_all}}) {
+    next unless ref($html->{$text}) eq 'ARRAY';
+    foreach my $string (@{$html->{$text}}) {
      next unless defined $string;
      if ($string =~ $rec) {
        return 1;
      }
    }
+  }
+
  return 0;
 }

@ -161,54 +193,74 @@ sub html_title_subject_ratio {
  if ($subject eq '') {
    return 0;
  }
+
+  foreach my $html (@{$pms->{html_all}}) {
    my $max = 0;
-  for my $string (@{ $pms->{html}{title} }) {
+    foreach my $string (@{$html->{title}}) {
      if ($string) {
-      my $ratio = length($string) / length($subject);
-      $max = $ratio if $ratio > $max;
+        my $ratio_s = length($string) / length($subject);
+        $max = $ratio_s if $ratio_s > $max;
      }
    }
-  return $max > $ratio;
+    return 1 if $max > $ratio;
+  }
+
+  return 0;
 }

 sub html_text_not_match {
  my ($self, $pms, undef, $text, $regexp) = @_;
-  for my $string (@{ $pms->{html}{$text} }) {
-    if (defined $string && $string !~ /${regexp}/) {
+
+  my ($rec, $err) = compile_regexp($regexp, 0);
+  if (!$rec) {
+    warn "htmleval: html_text_not_match invalid regexp '$regexp': $err";
+    return 0;
+  }
+
+  foreach my $html (@{$pms->{html_all}}) {
+    next unless ref($html->{$text}) eq 'ARRAY';
+    foreach my $string (@{$html->{$text}}) {
+      if (defined $string && $string !~ $rec) {
        return 1;
      }
    }
+  }
+
  return 0;
 }

 sub html_range {
  my ($self, $pms, undef, $test, $min, $max) = @_;

-  return 0 unless exists $pms->{html}{$test};
-
-  $test = $pms->{html}{$test};
-
+  foreach my $html (@{$pms->{html_all}}) {
+    next unless defined $html->{$test};
+    my $value = $html->{$test};
    # not all perls understand what "inf" means, so we need to do
    # non-numeric tests!  urg!
    if (!defined $max || $max eq "inf") {
-    return ($test eq "inf") ? 1 : ($test > $min);
+      return 1 if $value > $min;
    }
-  elsif ($test eq "inf") {
-    # $max < inf, so $test == inf means $test > $max
-    return 0;
+    elsif ($value eq "inf") {
+      # $max < inf, so $value == inf means $value > $max
+      next;
    }
    else {
      # if we get here everything should be a number
-    return ($test > $min && $test <= $max);
+      return 1 if $value > $min && $value <= $max;
    }
+  }
+
+  return 0;
 }

 sub check_iframe_src {
  my ($self, $pms) = @_;

-  foreach my $v ( values %{$pms->{html}->{uri_detail}} ) {
+  foreach my $html (@{$pms->{html_all}}) {
+    foreach my $v (values %{$html->{uri_detail}}) {
      return 1 if $v->{types}->{iframe};
    }
+  }

  return 0;
 }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/HTTPSMismatch.pm
@ -38,7 +38,7 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule ("check_https_http_mismatch");
+  $self->register_eval_rule ("check_https_http_mismatch", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -47,64 +47,62 @@ sub new {
 # ("<" and ">" replaced with "[lt]" and "[gt]" to avoid Kaspersky Desktop AV
 # false positive ;)
 sub check_https_http_mismatch {
-  my ($self, $permsgstatus, undef, $minanchors, $maxanchors) = @_;
-
-  my $IP_ADDRESS = IP_ADDRESS;
+  my ($self, $pms, undef, $minanchors, $maxanchors) = @_;

  $minanchors ||= 1;

-  if (!exists $permsgstatus->{chhm_hit}) {
-    $permsgstatus->{chhm_hit} = 0;
-    $permsgstatus->{chhm_anchors} = 0;
+  foreach my $html (@{$pms->{html_all}}) {
+    my $hit = 0;
+    my $anchors = 0;
+    foreach my $k (keys %{$html->{uri_detail}}) {
+      my $v = $html->{uri_detail}->{$k};

-    foreach my $k ( keys %{$permsgstatus->{html}->{uri_detail}} ) {
-      my %uri_detail = %{$permsgstatus->{html}->{uri_detail}};
-      my $v = ${uri_detail}{$k};
      # if the URI wasn't used for an anchor tag, or the anchor text didn't
      # exist, skip this.
-      next unless (exists $v->{anchor_text} && @{$v->{anchor_text}});
+      next unless exists $v->{anchor_text} && @{$v->{anchor_text}};

      my $uri;
-      if ($k =~ m@^https?://([^/:]+)@i) {
+      if ($k =~ m@^https?://([^/:?#]+)@i) {
        $uri = $1;
        # Skip IPs since there's another rule to catch that already
-        if ($uri =~ /^$IP_ADDRESS+$/) {
-          undef $uri;
+        if ($uri =~ IS_IP_ADDRESS) {
+          $uri = undef;
          next;
        } 
        # want to compare whole hostnames instead of domains?
        # comment this next section to the blank line.
        $uri = $self->{main}->{registryboundaries}->trim_domain($uri);
-        undef $uri unless ($self->{main}->{registryboundaries}->is_domain_valid($uri));
+        my $domain = $self->{main}->{registryboundaries}->uri_to_domain($uri);
+        $uri = undef  unless $self->{main}->{registryboundaries}->is_domain_valid($domain);
      }
-
      next unless $uri;
-      $permsgstatus->{chhm_anchors}++ if exists $v->{anchor_text};

+      $anchors++ if exists $v->{anchor_text};
      foreach (@{$v->{anchor_text}}) {
-        if (m@https://([^/:]+)@i) {
+        if (m@https://([^\s/:?#]+)@i) {
          my $https = $1;

 	  # want to compare whole hostnames instead of domains?
 	  # comment this next section to the blank line.
-          if ($https !~ /^$IP_ADDRESS+$/) {
+          if ($https !~ IS_IP_ADDRESS) {
 	    $https = $self->{main}->{registryboundaries}->trim_domain($https);
-            undef $https unless ($self->{main}->{registryboundaries}->is_domain_valid($https));
+            $https = undef  unless $self->{main}->{registryboundaries}->is_domain_valid($https);
          }
 	  next unless $https;
-
 	  dbg("https_http_mismatch: domains $uri -> $https");
-
 	  next if $uri eq $https;
-	  $permsgstatus->{chhm_hit} = 1;
+	  $hit = 1;
 	  last;
        }
      }
    }
-    dbg("https_http_mismatch: anchors ".$permsgstatus->{chhm_anchors});
+
+    dbg("https_http_mismatch: anchors $anchors");
+    return 1 if $hit && $anchors >= $minanchors &&
+                (!defined $maxanchors || $anchors < $maxanchors);
  }

-  return ( $permsgstatus->{chhm_hit} && $permsgstatus->{chhm_anchors} >= $minanchors && (defined $maxanchors && $permsgstatus->{chhm_anchors} < $maxanchors) );
+  return 0;
 }

 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/HashBL.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/HashBL.pm
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Hashcash.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Hashcash.pm
@ -1,352 +0,0 @@
-# <@LICENSE>
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to you under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at:
-# 
-#     http://www.apache.org/licenses/LICENSE-2.0
-# 
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# </@LICENSE>
-
-=head1 NAME
-
-Mail::SpamAssassin::Plugin::Hashcash - perform hashcash verification tests
-
-=head1 SYNOPSIS
-
-  loadplugin     Mail::SpamAssassin::Plugin::Hashcash
-
-=head1 DESCRIPTION
-
-Hashcash is a payment system for email where CPU cycles used as the
-basis for an e-cash system.  This plugin makes it possible to use valid
-hashcash tokens added by mail programs as a bonus for messages.
-
-=cut
-
-=head1 USER SETTINGS
-
-=over 4
-
-=item use_hashcash { 1 | 0 }   (default: 1)
-
-Whether to use hashcash, if it is available.
-
-=cut
-
-=item hashcash_accept user@example.com ...
-
-Used to specify addresses that we accept HashCash tokens for.  You should set
-it to match all the addresses that you may receive mail at.
-
-Like whitelist and blacklist entries, the addresses are file-glob-style
-patterns, so C<friend@somewhere.com>, C<*@isp.com>, or C<*.domain.net> will all
-work.  Specifically, C<*> and C<?> are allowed, but all other metacharacters
-are not.  Regular expressions are not used for security reasons.
-
-The sequence C<%u> is replaced with the current user's username, which
-is useful for ISPs or multi-user domains.
-
-Multiple addresses per line, separated by spaces, is OK.  Multiple
-C<hashcash_accept> lines is also OK.
-
-=cut
-
-=item hashcash_doublespend_path /path/to/file   (default: ~/.spamassassin/hashcash_seen)
-
-Path for HashCash double-spend database.  HashCash tokens are only usable once,
-so their use is tracked in this database to avoid providing a loophole.
-
-By default, each user has their own, in their C<~/.spamassassin> directory with
-mode 0700/0600.  Note that once a token is 'spent' it is written to this file,
-and double-spending of a hashcash token makes it invalid, so this is not
-suitable for sharing between multiple users.
-
-=cut
-
-=item hashcash_doublespend_file_mode            (default: 0700)
-
-The file mode bits used for the HashCash double-spend database file.
-
-Make sure you specify this using the 'x' mode bits set, as it may also be used
-to create directories.  However, if a file is created, the resulting file will
-not have any execute bits set (the umask is set to 111).
-
-=cut
-
-package Mail::SpamAssassin::Plugin::Hashcash;
-
-use strict;
-use warnings;
-# use bytes;
-use re 'taint';
-
-use Mail::SpamAssassin::Plugin;
-use Mail::SpamAssassin::Logger;
-use Mail::SpamAssassin::Util qw(untaint_var);
-
-use Errno qw(ENOENT EACCES);
-use Fcntl;
-use File::Path;
-use File::Basename;
-
-BEGIN {
-  eval { require Digest::SHA; import Digest::SHA qw(sha1); 1 }
-  or do { require Digest::SHA1; import Digest::SHA1 qw(sha1) }
-}
-
-our @ISA = qw(Mail::SpamAssassin::Plugin);
-
-use constant HAS_DB_FILE => eval { require DB_File; };
-
-# constructor: register the eval rule
-sub new {
-  my $class = shift;
-  my $mailsaobject = shift;
-
-  # some boilerplate...
-  $class = ref($class) || $class;
-  my $self = $class->SUPER::new($mailsaobject);
-  bless ($self, $class);
-
-  $self->register_eval_rule ("check_hashcash_value");
-  $self->register_eval_rule ("check_hashcash_double_spend");
-
-  $self->set_config($mailsaobject->{conf});
-
-  return $self;
-}
-
-###########################################################################
-
-sub set_config {
-  my($self, $conf) = @_;
-  my @cmds;
-
-  push(@cmds, {
-    setting => 'use_hashcash',
-    default => 1,
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
-  });
-
-  push(@cmds, {
-    setting => 'hashcash_doublespend_path',
-    default => '__userstate__/hashcash_seen',
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
-  });
-
-  push(@cmds, {
-    setting => 'hashcash_doublespend_file_mode',
-    default => "0700",
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
-  });
-
-  push(@cmds, {
-    setting => 'hashcash_accept',
-    default => {},
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
-  });
-
-  $conf->{parser}->register_commands(\@cmds);
-}
-
-###########################################################################
-
-sub check_hashcash_value {
-  my ($self, $scanner, $valmin, $valmax) = @_;
-  my $val = $self->_run_hashcash($scanner);
-  return ($val >= $valmin && $val < $valmax);
-}
-
-sub check_hashcash_double_spend {
-  my ($self, $scanner) = @_;
-  $self->_run_hashcash($scanner);
-  return ($scanner->{hashcash_double_spent});
-}
-
-############################################################################
-
-sub _run_hashcash {
-  my ($self, $scanner) = @_;
-
-  if (defined $scanner->{hashcash_value}) { return $scanner->{hashcash_value}; }
-
-  $scanner->{hashcash_value} = 0;
-
-  # X-Hashcash: 0:031118:camram-spam@camram.org:c068b58ade6dcbaf
-  # or:
-  # X-hashcash: 1:20:040803:hashcash@freelists.org::6dcdb3a3ad4e1b86:1519d
-  # X-hashcash: 1:20:040803:jm@jmason.org::6b484d06469ccb28:8838a
-  # X-hashcash: 1:20:040803:adam@cypherspace.org::a1cbc54bf0182ea8:5d6a0
-
-  # call down to {msg} so that we can get it as an array of
-  # individual headers
-  my @hdrs = $scanner->{msg}->get_header ("X-Hashcash");
-  if (scalar @hdrs == 0) {
-    @hdrs = $scanner->{msg}->get_header ("Hashcash");
-  }
-
-  foreach my $hc (@hdrs) {
-    my $value = $self->_run_hashcash_for_one_string($scanner, $hc);
-    if ($value) {
-      # remove the "double-spend" bool if we did find a usable string;
-      # this happens when one string is already spent, but another
-      # string has not yet been.
-      delete $scanner->{hashcash_double_spent};
-      return $value;
-    }
-  }
-  return 0;
-}
-
-sub _run_hashcash_for_one_string {
-  my ($self, $scanner, $hc) = @_;
-
-  if (!$hc) { return 0; }
-  $hc =~ s/\s+//gs;       # remove whitespace from multiline, folded tokens
-
-  # untaint the string for paranoia, making sure not to allow \n \0 \' \"
-  if ($hc =~ /^[-A-Za-z0-9\xA0-\xFF:_\/\%\@\.\,\= \*\+\;]+$/) {
-    $hc = untaint_var($hc);
-  }
-  if (!$hc) { return 0; }
-
-  my ($ver, $bits, $date, $rsrc, $exts, $rand, $trial);
-  if ($hc =~ /^0:/) {
-    ($ver, $date, $rsrc, $trial) = split (/:/, $hc, 4);
-  }
-  elsif ($hc =~ /^1:/) {
-    ($ver, $bits, $date, $rsrc, $exts, $rand, $trial) =
-                                    split (/:/, $hc, 7);
-    # extensions are, as yet, unused by SpamAssassin
-  }
-  else {
-    dbg("hashcash: version $ver stamps not yet supported");
-    return 0;
-  }
-
-  if (!$trial) {
-    dbg("hashcash: no trial in stamp '$hc'");
-    return 0;
-  }
-
-  my $accept = $scanner->{conf}->{hashcash_accept};
-  if (!$self->_check_hashcash_resource ($scanner, $accept, $rsrc)) {
-    dbg("hashcash: resource $rsrc not accepted here");
-    return 0;
-  }
-
-  # get the hash collision from the token.  Computing the hash collision
-  # is very easy (great!) -- just get SHA1(token) and count the 0 bits at
-  # the start of the SHA1 hash, according to the draft at
-  # http://www.hashcash.org/draft-hashcash.txt .
-  my $value = 0;
-  my $bitstring = unpack ("B*", sha1($hc));
-  $bitstring =~ /^(0+)/ and $value = length $1;
-
-  # hashcash v1 tokens: if the "claimed value" of the token is less than
-  # what the token actually contains (ie. token was accidentally generated
-  # with 24 bits instead of the claimed 20), then cut it down to just the
-  # claimed value.  that way it's a bit tidier and more deterministic.
-  if ($bits && $value > $bits) {
-    $value = $bits;
-  }
-
-  dbg("hashcash: token value: $value");
-
-  if ($self->was_hashcash_token_double_spent ($scanner, $hc)) {
-    $scanner->{hashcash_double_spent} = 1;
-    return 0;
-  }
-
-  $scanner->{hashcash_value} = $value;
-  return $value;
-}
-
-sub was_hashcash_token_double_spent {
-  my ($self, $scanner, $token) = @_;
-
-  my $main = $self->{main};
-  if (!$main->{conf}->{hashcash_doublespend_path}) {
-    dbg("hashcash: hashcash_doublespend_path not defined or empty");
-    return 0;
-  }
-  if (!HAS_DB_FILE) {
-    dbg("hashcash: DB_File module not installed, cannot use double-spend db");
-    return 0;
-  }
-
-  my $path = $main->sed_path ($main->{conf}->{hashcash_doublespend_path});
-  my $parentdir = dirname ($path);
-  my $stat_errn = stat($parentdir) ? 0 : 0+$!;
-  if ($stat_errn == 0 && !-d _) {
-    dbg("hashcash: parent dir $parentdir exists but is not a directory");
-  } elsif ($stat_errn == ENOENT) {
-    # run in an eval(); if mkpath has no perms, it calls die()
-    eval {
-      mkpath ($parentdir, 0, (oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0777));
-    };
-  }
-
-  my %spenddb;
-  if (!tie %spenddb, "DB_File", $path, O_RDWR|O_CREAT,
-                (oct ($main->{conf}->{hashcash_doublespend_file_mode}) & 0666))
-  {
-    dbg("hashcash: failed to tie to $path: $@ $!");
-    # not a serious error. TODO?
-    return 0;
-  }
-
-  if (exists $spenddb{$token}) {
-    untie %spenddb;
-    dbg("hashcash: token '$token' spent already");
-    return 1;
-  }
-
-  $spenddb{$token} = time;
-  dbg("hashcash: marking token '$token' as spent");
-
-  # TODO: expiry?
-
-  untie %spenddb;
-
-  return 0;
-}
-
-sub _check_hashcash_resource {
-  my ($self, $scanner, $list, $addr) = @_;
-  $addr = lc $addr;
-  if (defined ($list->{$addr})) { return 1; }
-  study $addr;  # study is a no-op since perl 5.16.0, eliminating related bugs
-
-  foreach my $regexp (values %{$list})
-  {
-    # allow %u == current username
-    # \\ is added by $conf->add_to_addrlist()
-    $regexp =~ s/\\\%u/$scanner->{main}->{username}/gs;
-
-    if ($addr =~ /$regexp/i) {
-      return 1;
-    }
-  }
-
-  # TODO: use "To" and "Cc" addresses gleaned from the mails in the Bayes
-  # database trained as ham, as well.
-
-  return 0;
-}
-
-############################################################################
-
-1;
-
-=back
-
-=cut
--- a/upstream/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/HeaderEval.pm
@ -25,12 +25,15 @@ use Errno qw(EBADF);

 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Locales;
-use Mail::SpamAssassin::Util qw(get_my_locales parse_rfc822_date);
+use Mail::SpamAssassin::Util qw(get_my_locales parse_rfc822_date
+                                is_valid_utf_8);
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Constants qw(:sa :ip);

 our @ISA = qw(Mail::SpamAssassin::Plugin);

+my $IP_ADDRESS = IP_ADDRESS;
+
 # constructor: register the eval rule
 sub new {
  my $class = shift;
@ -42,51 +45,44 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("check_for_fake_aol_relay_in_rcvd");
-  $self->register_eval_rule("check_for_faraway_charset_in_headers");
-  $self->register_eval_rule("check_for_unique_subject_id");
-  $self->register_eval_rule("check_illegal_chars");
-  $self->register_eval_rule("check_for_forged_hotmail_received_headers");
-  $self->register_eval_rule("check_for_no_hotmail_received_headers");
-  $self->register_eval_rule("check_for_msn_groups_headers");
-  $self->register_eval_rule("check_for_forged_eudoramail_received_headers");
-  $self->register_eval_rule("check_for_forged_yahoo_received_headers");
-  $self->register_eval_rule("check_for_forged_juno_received_headers");
-  $self->register_eval_rule("check_for_forged_gmail_received_headers");
-  $self->register_eval_rule("check_for_matching_env_and_hdr_from");
-  $self->register_eval_rule("sorted_recipients");
-  $self->register_eval_rule("similar_recipients");
-  $self->register_eval_rule("check_for_missing_to_header");
-  $self->register_eval_rule("check_for_forged_gw05_received_headers");
-  $self->register_eval_rule("check_for_shifted_date");
-  $self->register_eval_rule("subject_is_all_caps");
-  $self->register_eval_rule("check_for_to_in_subject");
-  $self->register_eval_rule("check_outlook_message_id");
-  $self->register_eval_rule("check_messageid_not_usable");
-  $self->register_eval_rule("check_header_count_range");
-  $self->register_eval_rule("check_unresolved_template");
-  $self->register_eval_rule("check_ratware_name_id");
-  $self->register_eval_rule("check_ratware_envelope_from");
-  $self->register_eval_rule("gated_through_received_hdr_remover");
-  $self->register_eval_rule("received_within_months");
-  $self->register_eval_rule("check_equal_from_domains");
+  $self->register_eval_rule("check_for_fake_aol_relay_in_rcvd", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_faraway_charset_in_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_unique_subject_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_illegal_chars", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_hotmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_no_hotmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_msn_groups_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_eudoramail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_yahoo_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_juno_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_gmail_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_matching_env_and_hdr_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("sorted_recipients", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("similar_recipients", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_missing_to_header", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_gw05_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_shifted_date", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("subject_is_all_caps", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_to_in_subject", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_outlook_message_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_messageid_not_usable", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_header_count_range", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_unresolved_template", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_ratware_name_id", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_ratware_envelope_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("gated_through_received_hdr_remover", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("received_within_months", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_equal_from_domains", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);

  return $self;
 }

-# load triplets.txt into memory 
-sub compile_now_start {
-  my ($self) = @_;
-
-  $self->word_is_in_dictionary("aba");
-}
-
 sub check_for_fake_aol_relay_in_rcvd {
  my ($self, $pms) = @_;
  local ($_);

  $_ = $pms->get('Received');
-  s/\s/ /gs;
+  s/\s+/ /gs;

  # this is the hostname format used by AOL for their relays. Spammers love 
  # forging it.  Don't make it more specific to match aol.com only, though --
@ -122,142 +118,20 @@ sub check_for_faraway_charset_in_headers {
  return 0 if grep { $_ eq "all" } @locales;

  for my $h (qw(From Subject)) {
-    my @hdrs = $pms->get("$h:raw");  # ??? get() returns a scalar ???
-    if ($#hdrs >= 0) {
-      $hdr = join(" ", @hdrs);
-    } else {
-      $hdr = '';
-    }
+    my @hdrs = $pms->get("$h:raw");
+    foreach my $hdr (@hdrs) {
      while ($hdr =~ /=\?(.+?)\?.\?.*?\?=/g) {
        Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales)
          or return 1;
      }
    }     
+  }
  0;
 }

+# Deprecated (Bug 8051)
 sub check_for_unique_subject_id {
-  my ($self, $pms) = @_;
-  local ($_);
-  $_ = lc $pms->get('Subject');
-  study;  # study is a no-op since perl 5.16.0, eliminating related bugs
-
-  my $id = 0;
-  if (/[-_\.\s]{7,}([-a-z0-9]{4,})$/
-	|| /\s{10,}(?:\S\s)?(\S+)$/
-	|| /\s{3,}[-:\#\(\[]+([-a-z0-9]{4,})[\]\)]+$/
-	|| /\s{3,}[:\#\(\[]*([a-f0-9]{4,})[\]\)]*$/
-	|| /\s{3,}[-:\#]([a-z0-9]{5,})$/
-	|| /[\s._]{3,}([^0\s._]\d{3,})$/
-	|| /[\s._]{3,}\[(\S+)\]$/
-
-        # (7217vPhZ0-478TLdy5829qicU9-0@26) and similar
-        || /\(([-\w]{7,}\@\d+)\)$/
-
-        # Seven or more digits at the end of a subject is almost certainly a id
-        || /\b(\d{7,})\s*$/
-
-        # stuff at end of line after "!" or "?" is usually an id
-        || /[!\?]\s*(\d{4,}|\w+(-\w+)+)\s*$/
-
-        # 9095IPZK7-095wsvp8715rJgY8-286-28 and similar
-	# excluding 'Re:', etc and the first word
-        || /(?:\w{2,3}:\s)?\w+\s+(\w{7,}-\w{7,}(-\w+)*)\s*$/
-
-        # #30D7 and similar
-        || /\s#\s*([a-f0-9]{4,})\s*$/
-     )
-  {
-    $id = $1;
-    # exempt online purchases
-    if ($id =~ /\d{5,}/
-	&& /(?:item|invoice|order|number|confirmation).{1,6}\Q$id\E\s*$/)
-    {
-      $id = 0;
-    }
-
-    # for the "foo-bar-baz" case, otherwise it won't
-    # be found in the dict:
-    $id =~ s/-//;
-  }
-
-  return ($id && !$self->word_is_in_dictionary($id));
-}
-
-# word_is_in_dictionary()
-#
-# See if the word looks like an English word, by checking if each triplet
-# of letters it contains is one that can be found in the English language.
-# Does not include triplets only found in proper names, or in the Latin
-# and Greek terms that might be found in a larger dictionary
-
-my %triplets;
-my $triplets_loaded = 0;
-
-sub word_is_in_dictionary {
-  my ($self, $word) = @_;
-  local ($_);
-  local $/ = "\n";		# Ensure $/ is set appropriately
-
-  # $word =~ tr/A-Z/a-z/;	# already done by this stage
-  $word =~ s/^\s+//;
-  $word =~ s/\s+$//;
-
-  # If it contains a digit, dash, etc, it's not a valid word.
-  # Don't reject words like "can't" and "I'll"
-  return 0 if ($word =~ /[^a-z\']/);
-
-  # handle a few common "blah blah blah (comment)" styles
-  return 1 if ($word eq "ot");	# off-topic
-  return 1 if ($word =~ /(?:linux|nix|bsd)/); # not in most dicts
-  return 1 if ($word =~ /(?:whew|phew|attn|tha?nx)/);  # not in most dicts
-
-  my $word_len = length($word);
-
-  # Unique IDs probably aren't going to be only one or two letters long
-  return 1 if ($word_len < 3);
-
-  if (!$triplets_loaded) {
-    # take a copy to avoid modifying the real one
-    my @default_triplets_path = @Mail::SpamAssassin::default_rules_path;
-    s{$}{/triplets.txt}  for @default_triplets_path;
-    my $filename = $self->{main}->first_existing_path (@default_triplets_path);
-
-    if (!defined $filename) {
-      dbg("eval: failed to locate the triplets.txt file");
-      return 1;
-    }
-
-    local *TRIPLETS;
-    if (!open (TRIPLETS, "<$filename")) {
-      dbg("eval: failed to open '$filename', cannot check dictionary: $!");
-      return 1;
-    }
-    for($!=0; <TRIPLETS>; $!=0) {
-      chomp;
-      $triplets{$_} = 1;
-    }
-    defined $_ || $!==0  or
-      $!==EBADF ? dbg("eval: error reading from $filename: $!")
-                : die "error reading from $filename: $!";
-    close(TRIPLETS)  or die "error closing $filename: $!";
-
-    $triplets_loaded = 1;
-  } # if (!$triplets_loaded)
-
-
-  my $i;
-
-  for ($i = 0; $i < ($word_len - 2); $i++) {
-    my $triplet = substr($word, $i, 3);
-    if (!$triplets{$triplet}) {
-      dbg("eval: unique ID: letter triplet '$triplet' from word '$word' not valid");
  return 0;
-    }
-  } # for ($i = 0; $i < ($word_len - 2); $i++)
-
-  # All letter triplets in word were found to be valid
-  return 1;
 }

 # look for 8-bit and other illegal characters that should be MIME
@ -268,7 +142,18 @@ sub check_illegal_chars {

  $header .= ":raw" unless $header =~ /:raw$/;
  my $str = $pms->get($header);
-  return 0 if !defined $str || $str eq '';
+  return 0 if !defined $str || $str !~ /\S/;
+
+  if ($str =~ tr/\x00-\x7F//c && is_valid_utf_8($str)) {
+    # is non-ASCII and is valid UTF-8
+    if ($str =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//) {
+      dbg("eval: %s is valid UTF-8 but contains controls: %s", $header, $str);
+    } else {
+      # todo: only with a SMTPUTF8 mail
+      dbg("eval: %s is valid UTF-8: %s", $header, $str);
+      return 0;
+    }
+  }

  # count illegal substrings (RFC 2045)
  # (non-ASCII + C0 controls except TAB, NL, CR)
@ -291,12 +176,12 @@ sub gated_through_received_hdr_remover {
  my ($self, $pms) = @_;

  my $txt = $pms->get("Mailing-List",undef);
-  if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/) {
+  if (defined $txt && $txt =~ /^contact \S+\@\S+\; run by ezmlm$/m) {
    my $dlto = $pms->get("Delivered-To");
    my $rcvd = $pms->get("Received");

    # ensure we have other indicative headers too
-    if ($dlto =~ /^mailing list \S+\@\S+/ &&
+    if ($dlto =~ /^mailing list \S+\@\S+/m &&
        $rcvd =~ /qmail \d+ invoked (?:from network|by .{3,20})\); \d+ ... \d+/)
    {
      return 1;
@ -336,9 +221,8 @@ sub _check_for_forged_hotmail_received_headers {
  return if $self->check_for_msn_groups_headers($pms);

  my $ip = $pms->get('X-Originating-Ip',undef);
-  my $IP_ADDRESS = IP_ADDRESS;
  my $orig = $pms->get('X-OriginatorOrg',undef);
-  my $ORIGINATOR = 'hotmail.com';
+  my $ORIGINATOR = qr/hotmail\.com|msonline\-outlook/;

  if (defined $ip && $ip =~ /$IP_ADDRESS/) { $ip = 1; } else { $ip = 0; }
  if (defined $orig && $orig =~ /$ORIGINATOR/) { $orig = 1; } else { $orig = 0; }
@ -347,6 +231,8 @@ sub _check_for_forged_hotmail_received_headers {
  # Received: from hotmail.com (f135.law8.hotmail.com [216.33.241.135])
  # or like
  # Received: from EUR01-VE1-obe.outbound.protection.outlook.com (mail-oln040092066056.outbound.protection.outlook.com [40.92.66.56])
+  # or
+  # Received: from VI1PR04MB3039.eurprd04.prod.outlook.com (2603:10a6:802:b::13)
  # spammers do not ;)

  if ($self->gated_through_received_hdr_remover($pms)) { return; }
@ -355,6 +241,8 @@ sub _check_for_forged_hotmail_received_headers {
                { return; }
  if ($rcvd =~ /from \S*\.outbound\.protection\.outlook\.com \(\S+\.outbound\.protection\.outlook\.com[ \)]/ && $orig)
                { return; }
+  if ($rcvd =~ /from \S*\.eurprd\d+\.prod\.outlook\.com \($IP_ADDRESS\)/ && $orig)
+                { return; }
  if ($rcvd =~ /from \S*\.hotmail.com \(\[$IP_ADDRESS\][ \):]/ && $ip)
                { return; }
  if ($rcvd =~ /from \S+ by \S+\.hotmail(?:\.msn)?\.com with HTTP\;/ && $ip)
@ -465,7 +353,6 @@ sub check_for_forged_eudoramail_received_headers {
  $rcvd =~ s/\s+/ /gs;		# just spaces, simplify the regexp

  my $ip = $pms->get('X-Sender-Ip',undef);
-  my $IP_ADDRESS = IP_ADDRESS;
  if (defined $ip && $ip =~ /$IP_ADDRESS/) { $ip = 1; } else { $ip = 0; }

  # Eudoramail formats its received headers like this:
@ -518,7 +405,6 @@ sub check_for_forged_yahoo_received_headers {
  if ($rcvd =~ /by web\S+\.mail\S*\.yahoo\.com via HTTP/) { return 0; }
  if ($rcvd =~ /by sonic\S+\.consmr\.mail\S*\.yahoo\.com with HTTP/) { return 0; }
  if ($rcvd =~ /by smtp\S+\.yahoo\.com with SMTP/) { return 0; }
-  my $IP_ADDRESS = IP_ADDRESS;
  if ($rcvd =~
      /from \[$IP_ADDRESS\] by \S+\.(?:groups|scd|dcn)\.yahoo\.com with NNFMP/) {
    return 0;
@ -554,13 +440,12 @@ sub check_for_forged_juno_received_headers {
  my $xorig = $pms->get('X-Originating-IP');
  my $xmailer = $pms->get('X-Mailer');
  my $rcvd = $pms->get('Received');
-  my $IP_ADDRESS = IP_ADDRESS;

  if ($xorig ne '') {
    # New style Juno has no X-Originating-IP header, and other changes
    if($rcvd !~ /from.*\b(?:juno|untd)\.com.*[\[\(]$IP_ADDRESS[\]\)].*by/
        && $rcvd !~ / cookie\.(?:juno|untd)\.com /) { return 1; }
-    if($xmailer !~ /Juno /) { return 1; }
+    if(index($xmailer, 'Juno ') == -1) { return 1; }
  } else {
    if($rcvd =~ /from.*\bmail\.com.*\[$IP_ADDRESS\].*by/) {
      if($xmailer !~ /\bmail\.com/) { return 1; }
@ -593,6 +478,7 @@ sub check_for_forged_gmail_received_headers {
  if ($received =~ /by smtp\.googlemail\.com with ESMTPSA id \S+/) {
    return 0;
  }
+
  if ( (length($xgms) >= GOOGLE_MESSAGE_STATE_LENGTH_MIN) && 
    (length($xss) >= GOOGLE_SMTP_SOURCE_LENGTH_MIN)) {
      return 0;
@ -637,10 +523,9 @@ sub _check_recipients {
  my @inputs;

  # ToCc: pseudo-header works best, but sometimes Bcc: is better
-  for ('ToCc', 'Bcc') {
-    my $to = $pms->get($_);	# get recipients
-    $to =~ s/\(.*?\)//g;	# strip out the (comments)
-    push(@inputs, ($to =~ m/([\w.=-]+\@\w+(?:[\w.-]+\.)+\w+)/g));
+  for ('ToCc:addr', 'Bcc:addr') {
+    my @to = $pms->get($_);	# get recipients
+    push @inputs, @to;
    last if scalar(@inputs) >= TOCC_SIMILAR_COUNT;
  }

--- a/upstream/lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/ImageInfo.pm
@ -99,13 +99,13 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule ("image_count");
-  $self->register_eval_rule ("pixel_coverage");
-  $self->register_eval_rule ("image_size_exact");
-  $self->register_eval_rule ("image_size_range");
-  $self->register_eval_rule ("image_named");
-  $self->register_eval_rule ("image_name_regex");
-  $self->register_eval_rule ("image_to_text_ratio");
+  $self->register_eval_rule ("image_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pixel_coverage", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("image_size_exact", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("image_size_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("image_named", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("image_name_regex", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("image_to_text_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -256,7 +256,7 @@ sub _get_images {

 sub image_named {
  my ($self,$pms,$body,$name) = @_;
-  return unless (defined $name);
+  return 0 unless (defined $name);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -272,7 +272,7 @@ sub image_named {

 sub image_name_regex {
  my ($self,$pms,$body,$re) = @_;
-  return unless (defined $re);
+  return 0 unless (defined $re);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -300,7 +300,7 @@ sub image_name_regex {
 sub image_count {
  my ($self,$pms,$body,$type,$min,$max) = @_;

-  return unless defined $min;
+  return 0 unless defined $min;

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -316,7 +316,7 @@ sub image_count {
 sub pixel_coverage {
  my ($self,$pms,$body,$type,$min,$max) = @_;

-  return unless (defined $type && defined $min);
+  return 0 unless (defined $type && defined $min);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -331,7 +331,7 @@ sub pixel_coverage {

 sub image_to_text_ratio {
  my ($self,$pms,$body,$type,$min,$max) = @_;
-  return unless (defined $type && defined $min && defined $max);
+  return 0 unless (defined $type && defined $min && defined $max);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -353,7 +353,7 @@ sub image_to_text_ratio {

 sub image_size_exact {
  my ($self,$pms,$body,$type,$height,$width) = @_;
-  return unless (defined $type && defined $height && defined $width);
+  return 0 unless (defined $type && defined $height && defined $width);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -369,7 +369,7 @@ sub image_size_exact {

 sub image_size_range {
  my ($self,$pms,$body,$type,$minh,$minw,$maxh,$maxw) = @_;
-  return unless (defined $type && defined $minh && defined $minw);
+  return 0 unless (defined $type && defined $minh && defined $minw);

  # make sure we have image data read in.
  if (!exists $pms->{'imageinfo'}) {
@ -377,7 +377,7 @@ sub image_size_range {
  }

  my $name = 'dems_'.$type;
-  return unless (exists $pms->{'imageinfo'}->{$name});
+  return 0 unless (exists $pms->{'imageinfo'}->{$name});

  foreach my $dem ( keys %{$pms->{'imageinfo'}->{"dems_$type"}}) {
    my ($h,$w) = split(/x/,$dem);
--- a/upstream/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
@ -68,18 +68,18 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("check_for_mime");
-  $self->register_eval_rule("check_for_mime_html");
-  $self->register_eval_rule("check_for_mime_html_only");
-  $self->register_eval_rule("check_mime_multipart_ratio");
-  $self->register_eval_rule("check_msg_parse_flags");
-  $self->register_eval_rule("check_for_ascii_text_illegal");
-  $self->register_eval_rule("check_abundant_unicode_ratio");
-  $self->register_eval_rule("check_for_faraway_charset");
-  $self->register_eval_rule("check_for_uppercase");
-  $self->register_eval_rule("check_ma_non_text");
-  $self->register_eval_rule("check_base64_length");
-  $self->register_eval_rule("check_qp_ratio");
+  $self->register_eval_rule("check_for_mime", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_for_mime_html", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_for_mime_html_only", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_mime_multipart_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_msg_parse_flags", $Mail::SpamAssassin::Conf::TYPE_HEADER_EVALS);
+  $self->register_eval_rule("check_for_ascii_text_illegal", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_abundant_unicode_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_for_faraway_charset", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_for_uppercase", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_ma_non_text", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_base64_length", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule("check_qp_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  return $self;
 }
@ -89,8 +89,7 @@ sub new {
 sub are_more_high_bits_set {
  my ($self, $str) = @_;

-  # TODO: I suspect a tr// trick may be faster here
-  my $numhis = () = ($str =~ /[\200-\377]/g);
+  my $numhis = $str =~ tr/\x00-\x7F//c;  # number of non-ASCII chars
  my $numlos = length($str) - $numhis;

  ($numlos <= $numhis && $numhis > 3);
@ -182,7 +181,7 @@ sub check_for_mime {

  $self->_check_attachments($pms) unless exists $pms->{mime_checked_attachments};
  return 0 unless exists $pms->{$test};
-  return $pms->{$test};
+  return $pms->{$test} ? 1 : 0;
 }

 # any text/html MIME part
@ -232,19 +231,19 @@ sub _check_mime_header {
    $pms->{mime_body_text_count}++;
  }

-  if ($cte =~ /base64/) {
+  if (index($cte, 'base64') >= 0) {
    $pms->{mime_base64_count}++;
  }
-  elsif ($cte =~ /quoted-printable/) {
+  elsif (index($cte, 'quoted-printable') >= 0) {
    $pms->{mime_qp_count}++;
  }

-  if ($cd && $cd =~ /attachment/) {
+  if ($cd && index($cd, 'attachment') >= 0) {
    $pms->{mime_attachment}++;
  }

  if ($ctype =~ /^text/ &&
-      $cte =~ /base64/ &&
+      index($cte, 'base64') >= 0 &&
      (!$charset || $charset =~ /(?:us-ascii|ansi_x3\.4-1968|iso-ir-6|ansi_x3\.4-1986|iso_646\.irv:1991|ascii|iso646-us|us|ibm367|cp367|csascii)/) &&
      !($cd && $cd =~ /^(?:attachment|inline)/))
  {
@ -366,9 +365,9 @@ sub _check_attachments {

    $part++;
    $part_type[$part] = $ctype;
-    $part_bytes[$part] = 0 if $cd !~ /attachment/;
+    $part_bytes[$part] = 0 if index($cd, 'attachment') == -1;

-    my $cte_is_base64 = $cte =~ /base64/i;
+    my $cte_is_base64 = index($cte, 'base64') >= 0;
    my $previous = '';
    foreach (@{$p->raw()}) {

@ -385,12 +384,12 @@ sub _check_attachments {
      # if ($pms->{mime_html_no_charset} && $ctype eq 'text/html' && defined $charset) {
      # $pms->{mime_html_no_charset} = 0;
      # }
-      if ($pms->{mime_multipart_alternative} && $cd !~ /attachment/ &&
+      if ($pms->{mime_multipart_alternative} && index($cd, 'attachment') == -1 &&
          ($ctype eq 'text/plain' || $ctype eq 'text/html')) {
 	$part_bytes[$part] += length;
      }

-      if ($where != 1 && $cte eq "quoted-printable" && ! /^SPAM: /) {
+      if ($where != 1 && $cte eq "quoted-printable" && index($_, 'SPAM: ') != 0) {
        # RFC 5322: Each line SHOULD be no more than 78 characters,
        #           excluding the CRLF.
        # RFC 2045: The Quoted-Printable encoding REQUIRES that
@ -415,9 +414,11 @@ sub _check_attachments {
        # }

        # count excessive QP bytes
-        if (index($_, '=') != -1) {
+        if (index($_, '=') >= 0) {
+## no critic (Perlsecret)
 	  # whoever wrote this next line is an evil hacker -- jm
 	  my $qp = () = m/=(?:09|3[0-9ABCEF]|[2456][0-9A-F]|7[0-9A-E])/g;
+## use critic
 	  if ($qp) {
 	    $qp_count += $qp;
 	    # tabs and spaces at end of encoded line are okay.  Also, multiple
@ -630,7 +631,7 @@ sub get_charset_from_ct_line {
 sub check_ma_non_text {
  my($self, $pms) = @_;

-  foreach my $map ($pms->{msg}->find_parts(qr@^multipart/alternative$@i)) {
+  foreach my $map ($pms->{msg}->find_parts(qr@^multipart/alternative$@)) {
    foreach my $p ($map->find_parts(qr/./, 1, 0)) {
      next if (lc $p->{'type'} eq 'multipart/related');
      next if (lc $p->{'type'} eq 'application/rtf');
--- a/upstream/lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/MIMEHeader.pm
@ -51,6 +51,23 @@ around the newline character in "folded" headers will be replaced with a single
 space.  Append C<:raw> to the header name to retrieve the raw, undecoded value,
 including pristine whitespace, instead.

+=item tflags NAME_OF_RULE range=x-y
+
+Match only from specific MIME parts, indexed in the order they are parsed.
+Part 1 = main message headers. Part 2 = next part etc.
+
+ range=1    (match only main headers, not any subparts)
+ range=2-   (match any subparts, but not the main headers)
+ range=-3   (match only first three parts, including main headers)
+ range=2-3  (match only first two subparts)
+
+=item tflags NAME_OF_RULE concat
+
+Concatenate all headers from all mime parts (possible range applied) into a
+single string for matching.  This allows matching headers across multiple
+parts with single regex.  Normally pattern is tested individually for
+different mime parts.
+
 =back

 =cut
@ -151,6 +168,9 @@ sub set_config {
      $self->{parser}->add_test($rulename, $evalfn."()",
                $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

+      # Support named regex captures
+      $self->{parser}->parse_captures($rulename, $rec);
+
      # evalfn/rulename safe, sanitized by $RULENAME_RE
      my $evalcode = '
        sub Mail::SpamAssassin::Plugin::MIMEHeader::'.$evalfn.' {
@ -178,37 +198,104 @@ sub set_config {
 # ---------------------------------------------------------------------------

 sub eval_hook_called {
-  my ($pobj, $scanner, $rulename) = @_;
+  my ($pobj, $pms, $rulename) = @_;

-  my $rule = $scanner->{conf}->{mimeheader_tests}->{$rulename};
+  my $conf = $pms->{conf};
+  my $rule = $conf->{mimeheader_tests}->{$rulename};
  my $hdr = $rule->{hdr};
  my $negated = $rule->{negated};
-  my $if_unset = $rule->{if_unset};
  my $pattern = $rule->{pattern};
+  my $tflags = $conf->{tflags}->{$rulename}||'';
  
-
-  my $getraw;
+  my $getraw = 0;
  if ($hdr =~ s/:raw$//) {
    $getraw = 1;
-  } else {
-    $getraw = 0;
  }

-  foreach my $p ($scanner->{msg}->find_parts(qr/./)) {
+  my $range_min = 0;
+  my $range_max = 1000;
+  if ($tflags =~ /(?:^|\s)range=(\d+)?(-)?(\d+)?(?:\s|$)/) {
+    if (defined $1 && defined $2 && defined $3) {
+      $range_min = $1;
+      $range_max = $3;
+    }
+    elsif (defined $1 && defined $2) {
+      $range_min = $1;
+    }
+    elsif (defined $2 && defined $3) {
+      $range_max = $3;
+    }
+    elsif (defined $1) {
+      $range_min = $range_max = $1;
+    }
+  }
+
+  my $multiple = $tflags =~ /\bmultiple\b/;
+  my $concat = $tflags =~ /\bconcat\b/;
+  my $maxhits = $tflags =~ /\bmaxhits=(\d+)\b/ ? $1 :
+                           $multiple ? 1000 : 1;
+  my $cval = '';
+
+  my $idx = 0;
+  foreach my $p ($pms->{msg}->find_parts(qr/./)) {
+    $idx++;
+    last if $idx > $range_max;
+    next if $idx < $range_min;
+
    my $val;
-    if ($getraw) {
+    if ($hdr eq 'ALL') {
+      $val = $p->get_all_headers($getraw, 0);
+    } elsif ($getraw) {
      $val = $p->raw_header($hdr);
    } else {
      $val = $p->get_header($hdr);
    }
-    $val = $if_unset if !defined $val;
+    $val = $rule->{if_unset}  if !defined $val;

-    if ($val =~ $pattern) {
-      return ($negated ? 0 : 1);
+    if ($concat) {
+      $val .= "\n" unless $val =~ /\n$/;
+      $cval .= $val;
+      next;
+    }
+
+    if (_check($pms, $rulename, $val, $pattern, $negated, $maxhits, "part $idx")) {
+      return 0;
    }
  }

-  return ($negated ? 1 : 0);
+  if ($concat) {
+    if (_check($pms, $rulename, $cval, $pattern, $negated, $maxhits, 'concat')) {
+      return 0;
+    }
+  }
+
+  if ($negated) {
+    dbg("mimeheader: ran rule $rulename ======> got hit: \"<negative match>\"");
+    return 1;
+  }
+
+  return 0;
+}
+
+sub _check {
+  my ($pms, $rulename, $value, $pattern, $negated, $maxhits, $desc) = @_;
+
+  my $hits = 0;
+  my %captures;
+  while ($value =~ /$pattern/gp) {
+    last if $negated;
+    if (%-) {
+      foreach my $cname (keys %-) {
+        push @{$captures{$cname}}, grep { $_ ne "" } @{$-{$cname}};
+      }
+    }
+    my $match = defined ${^MATCH} ? ${^MATCH} : "<negative match>";
+    $pms->got_hit($rulename, '', ruletype => 'eval');
+    dbg("mimeheader: ran rule $rulename ======> got hit: \"$match\" ($desc)");
+    last if ++$hits >= $maxhits;
+  }
+  $pms->set_captures(\%captures) if %captures;
+  return $hits;
 }

 # ---------------------------------------------------------------------------
@ -224,4 +311,10 @@ sub finish_tests {

 # ---------------------------------------------------------------------------

+sub has_all_header { 1 } # Supports ALL header query (Bug 5582)
+sub has_tflags_range { 1 } # Supports tflags range=x-y
+sub has_tflags_concat { 1 } # Supports tflags concat
+sub has_tflags_multiple { 1 } # Supports tflags multiple
+sub has_capture_rules { 1 } # Supports named regex captures (Bug 7992)
+
 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/OLEVBMacro.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/OLEVBMacro.pm
--- a/upstream/lib/Mail/SpamAssassin/Plugin/OneLineBodyRuleType.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/OneLineBodyRuleType.pm
@ -66,6 +66,20 @@ sub check_start {
  }
 }

+sub check_cleanup {
+  my ($self, $params) = @_;
+  my $pms = $params->{permsgstatus};
+  my $scoresptr = $pms->{conf}->{scores};
+
+  # Force all body rules ready for meta rules.  Need to do it here in
+  # cleanup, because the body is scanned per line instead of per rule
+  if ($pms->{conf}->{skip_body_rules}) {
+    foreach (keys %{$pms->{conf}->{skip_body_rules}}) {
+      $pms->rule_ready($_, 1)  if $scoresptr->{$_};
+    }
+  }
+}
+
 ###########################################################################

 1;
@ -96,13 +110,16 @@ sub do_one_line_body_tests {

    if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
    {
+      $sub .= '
+        my $hitsptr = $self->{tests_already_hit};
+      ';
      # support multiple matches
      my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
      $max = untaint_var($max);
      if ($max) {
        $sub .= '
-          if (exists $self->{tests_already_hit}->{q{'.$rulename.'}}) {
-            return 0 if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';
+          if ($hitsptr->{q{'.$rulename.'}}) {
+            return 0 if $hitsptr->{q{'.$rulename.'}} >= '.$max.';
          }
        ';
      }
@ -111,17 +128,17 @@ sub do_one_line_body_tests {
      my $lref = \$line;
      pos $$lref = 0;
      '.$self->hash_line_for_rule($pms, $rulename).'
-      while ($$lref =~ /$qrptr->{q{'.$rulename.'}}/go) {
+      while ($$lref =~ /$qrptr->{q{'.$rulename.'}}/gop) {
        $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
        '. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "") . '
-        '. ($max? 'last if $self->{tests_already_hit}->{q{'.$rulename.'}} >= '.$max.';' : '') . '
+        '. ($max? 'last if $hitsptr->{q{'.$rulename.'}} >= '.$max.';' : '') . '
      }
      ';

    } else {
      $sub .= '
      '.$self->hash_line_for_rule($pms, $rulename).'
-      if ($line =~ /$qrptr->{q{'.$rulename.'}}/o) {
+      if ($line =~ /$qrptr->{q{'.$rulename.'}}/op) {
        $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "one_line_body");
        '. $self->hit_rule_plugin_code($pms, $rulename, "one_line_body", "return 1") . '
      }
@ -129,6 +146,11 @@ sub do_one_line_body_tests {

    }

+    # Make sure rule is marked ready for meta rules
+    $sub .= '
+      $self->rule_ready(q{'.$rulename.'}, 1);
+    ';
+
    return if ($opts{doing_user_rules} &&
                  !$self->is_user_rule_sub($rulename.'_one_line_body_test'));

--- a/upstream/lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
@ -130,6 +130,27 @@ This plugin helps detected spam using attached PDF files
     body RULENAME eval:pdf_is_empty_body(<bytes>)
        bytes: maximum byte count to allow and still consider it empty

+  pdf_image_to_text_ratio()
+
+     body RULENAME eval:pdf_image_to_text_ratio(<min>,<max>)
+        Ratio calculated as body_length / total_image_area
+        min: minimum ratio
+        max: maximum ratio
+
+  pdf_image_size_exact()
+
+     body RULENAME eval:pdf_image_size_exact(<h>,<w>)
+        h: image height is exactly h
+        w: image width is exactly w
+
+  pdf_image_size_range()
+
+     body RULENAME eval:pdf_image_size_range(<minh>,<minw>,[<maxh>],[<maxw>])
+        minh: image height is atleast minh
+        minw: image width is atleast minw
+        maxh: (optional) image height is no more than maxh
+        maxw: (optional) image width is no more than maxw
+
  NOTE: See the ruleset for more examples that are not documented here.

 =back
@ -145,9 +166,8 @@ use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Util qw(compile_regexp);
 use strict;
 use warnings;
-# use bytes;
+use re 'taint';
 use Digest::MD5 qw(md5_hex);
-use MIME::QuotedPrint;

 our @ISA = qw(Mail::SpamAssassin::Plugin);

@ -161,89 +181,115 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule ("pdf_count");
-  $self->register_eval_rule ("pdf_image_count");
-  $self->register_eval_rule ("pdf_pixel_coverage");
-  $self->register_eval_rule ("pdf_image_size_exact");
-  $self->register_eval_rule ("pdf_image_size_range");
-  $self->register_eval_rule ("pdf_named");
-  $self->register_eval_rule ("pdf_name_regex");
-  $self->register_eval_rule ("pdf_image_to_text_ratio");
-  $self->register_eval_rule ("pdf_match_md5");
-  $self->register_eval_rule ("pdf_match_fuzzy_md5");
-  $self->register_eval_rule ("pdf_match_details");
-  $self->register_eval_rule ("pdf_is_encrypted");
-  $self->register_eval_rule ("pdf_is_empty_body");
+  $self->register_eval_rule ("pdf_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_image_count", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_pixel_coverage", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_image_size_exact", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_image_size_range", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_named", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_name_regex", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_image_to_text_ratio", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_match_md5", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_match_fuzzy_md5", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_match_details", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_is_encrypted", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+  $self->register_eval_rule ("pdf_is_empty_body", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);
+
+  # lower priority for add_uri_detail_list to work
+  $self->register_method_priority ("parsed_metadata", -1);

  return $self;
 }

-# -----------------------------------------
+sub parsed_metadata {
+  my ($self, $opts) = @_;

-my %get_details = (
-  'pdf' => sub {
-    my ($self, $pms, $part) = @_;
+  my $pms = $opts->{permsgstatus};

-    my $type = $part->{'type'} || 'base64';
-    my $data = '';
+  # initialize
+  $pms->{pdfinfo}->{count_pdf} = 0;
+  $pms->{pdfinfo}->{count_pdf_images} = 0;

-    if ($type eq 'quoted-printable') {
-      $data = decode_qp($data); # use QuotedPrint->decode_qp
-    }
-    else {
-      $data = $part->decode();  # just use built in base64 decoder
+  my @parts = $pms->{msg}->find_parts(qr@^(image|application)/(pdf|octet\-stream)$@, 1);
+  my $part_count = scalar @parts;
+
+  dbg("pdfinfo: Identified $part_count possible mime parts that need checked for PDF content");
+
+  foreach my $p (@parts) {
+    my $type = $p->{type} || '';
+    my $name = $p->{name} || '';
+
+    dbg("pdfinfo: found part, type=$type file=$name");
+
+    # filename must end with .pdf, or application type can be pdf
+    # sometimes windows muas will wrap a pdf up inside a .dat file
+    # v0.8 - Added .fdf phoney PDF detection
+    next unless ($name =~ /\.[fp]df$/i || $type =~ m@/pdf$@);
+
+    _get_pdf_details($pms, $p);
+    $pms->{pdfinfo}->{count_pdf}++;
  }

-    my $index = substr($data, 0, 8);
+  _set_tag($pms, 'PDFCOUNT',  $pms->{pdfinfo}->{count_pdf});
+  _set_tag($pms, 'PDFIMGCOUNT', $pms->{pdfinfo}->{count_pdf_images});
+}

-    return unless ($index =~ /.PDF\-(\d\.\d)/);
+sub _get_pdf_details {
+  my ($pms, $part) = @_;
+
+  my $data = $part->decode();
+
+  # Remove UTF-8 BOM
+  $data =~ s/^\xef\xbb\xbf//;
+
+  # Search magic in first 1024 bytes
+  if ($data !~ /^.{0,1024}\%PDF\-(\d\.\d)/s) {
+    dbg("pdfinfo: PDF magic header not found, invalid file?");
+    return;
+  }
  my $version = $1;
-    $self->_set_tag($pms, 'PDFVERSION', $version);
+  _set_tag($pms, 'PDFVERSION', $version);
  # dbg("pdfinfo: pdf version = $version");

-    my ($height, $width, $fuzzy_data, $pdf_tags);
-    my ($producer, $created, $modified, $title, $creator, $author) = ('unknown','0','0','untitled','unknown','unknown');
-    my ($md5, $fuzzy_md5) = ('', '');
+  my ($fuzzy_data, $pdf_tags);
+  my ($md5, $fuzzy_md5) = ('','');
  my ($total_height, $total_width, $total_area, $line_count) = (0,0,0,0);

-    my $name = $part->{'name'} || '';
-    $self->_set_tag($pms, 'PDFNAME', $name);
+  my $name = $part->{name} || '';
+  _set_tag($pms, 'PDFNAME', $name);
+  # store the file name so we can check pdf_named() or pdf_name_match() later.
+  $pms->{pdfinfo}->{names_pdf}->{$name} = 1 if $name;

  my $no_more_fuzzy = 0;
  my $got_image = 0;
  my $encrypted = 0;
+  my %uris;

-    while($data =~ /([^\n]+)/g) {
+  while ($data =~ /([^\n]+)/g) {
    # dbg("pdfinfo: line=$1");
    my $line = $1;

-      $line_count++;
-
-      # lines containing high bytes will have no data we need, so save some cycles
-      next if ($line =~ /[\x80-\xff]/);
-
-      if (!$no_more_fuzzy && $line_count < 70) {
+    if (!$no_more_fuzzy && ++$line_count < 70) {
      if ($line !~ m/^\%/ && $line !~ m/^\/(?:Height|Width|(?:(?:Media|Crop)Box))/ && $line !~ m/^\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+cm$/) {
        $line =~ s/\s+$//;  # strip off whitespace at end.
        $fuzzy_data .= $line;
      }
-      }
-
-      if ($line =~ m/^\/([A-Za-z]+)/) {
-         $pdf_tags .= $1;
-      }
-
-      $got_image=1 if ($line =~ m/\/Image/);
-      $encrypted=1 if ($line =~ m/^\/Encrypt/);
-
      # once we hit the first stream, we stop collecting data for fuzzy md5
-      $no_more_fuzzy = 1 if ($line =~ m/stream/);
+      $no_more_fuzzy = 1  if index($line, 'stream') >= 0;
+    }
+
+    $got_image = 1  if index($line, '/Image') >= 0;
+    if (!$encrypted && index($line, '/Encrypt') == 0) {
+      # store encrypted flag.
+      $encrypted = $pms->{pdfinfo}->{encrypted} = 1;
+    }

    # From a v1.3 pdf
    # [12234] dbg: pdfinfo: line=630 0 0 149 0 0 cm
    # [12234] dbg: pdfinfo: line=/Width 630
    # [12234] dbg: pdfinfo: line=/Height 149
    if ($got_image) {
+      my ($width, $height);
      if ($line =~ /^(\d+)\s+\d+\s+\d+\s+(\d+)\s+\d+\s+\d+\s+cm$/) {
        $width = $1;
        $height = $2;
@ -258,21 +304,38 @@ my %get_details = (
        $width = $1;
        $height = $2;
      }
-      }
-
-      # did pdf contain image data?
-      if ($got_image && $width && $height) {
+      if ($width && $height) {
        $no_more_fuzzy = 1;
        my $area = $width * $height;
        $total_height += $height;
        $total_width += $width;
        $total_area += $area;
        $pms->{pdfinfo}->{dems_pdf}->{"${height}x${width}"} = 1;
-        $pms->{'pdfinfo'}->{"count_pdf_images"} ++;
-        dbg("pdfinfo: Found image in PDF ".($name ? $name : '')." - $height x $width pixels ($area pixels sq.)");
-        $self->_set_tag($pms, 'PDFIMGDIM', "${height}x${width}");
-        $height=0; $width=0;  # reset and check for next image
-        $got_image = 0;
+        $pms->{pdfinfo}->{count_pdf_images}++;
+        dbg("pdfinfo: Found image in PDF $name: $height x $width pixels ($area pixels sq.)");
+        _set_tag($pms, 'PDFIMGDIM', "${height}x${width}");
+        $got_image = $height = $width = 0;  # reset and check for next image
+      }
+    }
+
+    #
+    # Triage - expecting / to be found for rest of the checks
+    #
+    next unless index($line, '/') >= 0;
+
+    if ($line =~ m/^\/([A-Za-z]+)/) {
+      $pdf_tags .= $1;
+    }
+
+    # XXX some pdf have uris but are stored inside binary data
+    if (keys %uris < 20 && $line =~ /(?:\/S\s{0,2}\/URI\s{0,2}|^\s*)\/URI\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
+      my $location = _parse_string($1);
+      next unless index($location, '.') > 0; # ignore some binary mess
+      if (!exists $uris{$location}) {
+        $uris{$location} = 1;
+        dbg("pdfinfo: found URI: $location");
+        $pms->add_uri_detail_list($location);
+      }
    }

    # [5310] dbg: pdfinfo: line=<</Producer(GPL Ghostscript 8.15)
@ -283,380 +346,231 @@ my %get_details = (
    # [5310] dbg: pdfinfo: line=/Author(colet)>>endobj
    # or all on same line inside xml - v1.6+
    # <</CreationDate(D:20070226165054-06'00')/Creator( Adobe Photoshop CS2 Windows)/Producer(Adobe Photoshop for Windows -- Image Conversion Plug-in)/ModDate(D:20070226165100-06'00')>>
-
-      if ($line =~ /\/Producer\s?\(([^\)\\]+)/) {
-        $producer = $1;
+    # Or hex values
+    # /Creator<FEFF005700720069007400650072>
+    if ($line =~ /\/Author\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
+      my $author = _parse_string($1);
+      dbg("pdfinfo: found property Author=$author");
+      $pms->{pdfinfo}->{details}->{author}->{$author} = 1;
+      _set_tag($pms, 'PDFAUTHOR', $author);
    }
-      if ($line =~ /\/CreationDate\s?\(D\:(\d+)/) {
-        $created = $1;
+    if ($line =~ /\/Creator\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
+      my $creator = _parse_string($1);
+      dbg("pdfinfo: found property Creator=$creator");
+      $pms->{pdfinfo}->{details}->{creator}->{$creator} = 1;
+      _set_tag($pms, 'PDFCREATOR', $creator);
    }
-      if ($line =~ /\/ModDate\s?\(D\:(\d+)/) {
-        $modified = $1;
+    if ($line =~ /\/CreationDate\s{0,2}\(D\:(\d+)/) {
+      my $created = _parse_string($1);
+      dbg("pdfinfo: found property Created=$created");
+      $pms->{pdfinfo}->{details}->{created}->{$created} = 1;
    }
-      if ($line =~ /\/Title\s?\(([^\)\\]+)/) {
-        $title = $1;
-        # Title=\376\377\000w\000w\000n\000g
-        # Title=wwng
-        $title =~ s/\\\d{3}//g;
+    if ($line =~ /\/ModDate\s{0,2}\(D\:(\d+)/) {
+      my $modified = _parse_string($1);
+      dbg("pdfinfo: found property Modified=$modified");
+      $pms->{pdfinfo}->{details}->{modified}->{$modified} = 1;
    }
-      if ($line =~ /\/Creator\s?\(([^\)\\]+)/) {
-        $creator = $1;
+    if ($line =~ /\/Producer\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
+      my $producer = _parse_string($1);
+      dbg("pdfinfo: found property Producer=$producer");
+      $pms->{pdfinfo}->{details}->{producer}->{$producer} = 1;
+      _set_tag($pms, 'PDFPRODUCER', $producer);
    }
-      if ($line =~ /\/Author\s?\(([^\)]+)/) {
-        $author = $1;
-        # Author=\376\377\000H\000P\000_\000A\000d\000m\000i\000n\000i\000s\000t\000r\000a\000t\000o\000r
-        # Author=HP_Administrator
-        $author =~ s/\\\d{3}//g;
+    if ($line =~ /\/Title\s{0,2}( \( .*? (?<!\\) \) | < [^>]* > )/x) {
+      my $title = _parse_string($1);
+      dbg("pdfinfo: found property Title=$title");
+      $pms->{pdfinfo}->{details}->{title}->{$title} = 1;
+      _set_tag($pms, 'PDFTITLE', $title);
    }
  }

-    # store the file name so we can check pdf_named() or pdf_name_match() later.
-    $pms->{pdfinfo}->{names_pdf}->{$name} = 1 if $name;
-
-    # store encrypted flag.
-    $pms->{pdfinfo}->{encrypted} = $encrypted;
-
  # if we had multiple images in the pdf, we need to store the total HxW as well.
  # If it was a single Image PDF, then this value will already be in the hash.
-    $pms->{pdfinfo}->{dems_pdf}->{"${total_height}x${total_width}"} = 1 if ($total_height && $total_width);;
+  $pms->{pdfinfo}->{dems_pdf}->{"${total_height}x${total_width}"} = 1 if ($total_height && $total_width);

  if ($total_area) {
    $pms->{pdfinfo}->{pc_pdf} = $total_area;
-      $self->_set_tag($pms, 'PDFIMGAREA', $total_area);
-      dbg("pdfinfo: Filename=$name Total HxW: $total_height x $total_width ($total_area area)") if ($total_area);
+    _set_tag($pms, 'PDFIMGAREA', $total_area);
+    dbg("pdfinfo: Total HxW: $total_height x $total_width ($total_area area)");
  }

-    dbg("pdfinfo: Filename=$name Title=$title Author=$author Producer=$producer Created=$created Modified=$modified");
-
  $md5 = uc(md5_hex($data)) if $data;
  $fuzzy_md5 = uc(md5_hex($fuzzy_data)) if $fuzzy_data;
-    my $tags_md5;
+  my $tags_md5 = '';
  $tags_md5 = uc(md5_hex($pdf_tags)) if $pdf_tags;

-    dbg("pdfinfo: MD5 results for ".($name ? $name : '')." - md5=".($md5 ? $md5 : '')." fuzzy1=".($fuzzy_md5 ? $fuzzy_md5 : '')." fuzzy2=".($tags_md5 ? $tags_md5 : ''));
+  dbg("pdfinfo: MD5 results for $name: md5=$md5 fuzzy1=$fuzzy_md5 fuzzy2=$tags_md5");

-    # we dont need tags for these.
-    $pms->{pdfinfo}->{details}->{created} = $created if $created;
-    $pms->{pdfinfo}->{details}->{modified} = $modified if $modified;
-
-    if ($producer) {
-      $pms->{pdfinfo}->{details}->{producer} = $producer if $producer;
-      $self->_set_tag($pms, 'PDFPRODUCER', $producer);
-    }
-    if ($title) {
-      $pms->{pdfinfo}->{details}->{title} = $title;
-      $self->_set_tag($pms, 'PDFTITLE', $title);
-    }
-    if ($creator) {
-      $pms->{pdfinfo}->{details}->{creator} = $creator;
-      $self->_set_tag($pms, 'PDFCREATOR', $creator);
-    }
-    if ($author) {
-      $pms->{pdfinfo}->{details}->{author} = $author;
-      $self->_set_tag($pms, 'PDFAUTHOR', $author);
-    }
  if ($md5) {
    $pms->{pdfinfo}->{md5}->{$md5} = 1;
-      $self->_set_tag($pms, 'PDFMD5', $fuzzy_md5);
+    _set_tag($pms, 'PDFMD5', $fuzzy_md5);
  }
  if ($fuzzy_md5) {
    $pms->{pdfinfo}->{fuzzy_md5}->{$fuzzy_md5} = 1;
-      $self->_set_tag($pms, 'PDFMD5FUZZY1', $fuzzy_md5);
+    _set_tag($pms, 'PDFMD5FUZZY1', $fuzzy_md5);
  }
  if ($tags_md5) {
    $pms->{pdfinfo}->{fuzzy_md5}->{$tags_md5} = 1;
-      $self->_set_tag($pms, 'PDFMD5FUZZY2', $tags_md5);
+    _set_tag($pms, 'PDFMD5FUZZY2', $tags_md5);
  }
-  },
+}

-);
-
-# ----------------------------------------
+sub _parse_string {
+  local $_ = shift;
+  # Anything inside < > is hex encoded
+  if (/^</) {
+    # Might contain whitespace so search all hex values
+    my $str = '';
+    $str .= pack("H*", $1) while (/([0-9A-Fa-f]{2})/g);
+    $_ = $str;
+    # Handle/strip UTF-16 (in ultra-naive way for now)
+    s/\x00//g if (s/^(?:\xfe\xff|\xff\xfe)//);
+  } else {
+    s/^\(//; s/\)$//;
+    # Decode octals
+    # Author=\376\377\000H\000P\000_\000A\000d\000m\000i\000n\000i\000s\000t\000r\000a\000t\000o\000r
+    s/(?<!\\)\\([0-3][0-7][0-7])/pack("C",oct($1))/ge;
+    # Handle/strip UTF-16 (in ultra-naive way for now)
+    s/\x00//g if (s/^(?:\xfe\xff|\xff\xfe)//);
+    # Unescape some stuff like \\ \( \)
+    # Title(Foo \(bar\))
+    s/\\([()\\])/$1/g;
+  }
+  # Limit to some sane length
+  return substr($_, 0, 256);
+}

 sub _set_tag {
+  my ($pms, $tag, $value) = @_;

-  my ($self, $pms, $tag, $value) = @_;
-
-  dbg("pdfinfo: set_tag called for $tag $value");
-  return unless ($tag && $value);
+  return unless defined $value && $value ne '';
+  dbg("pdfinfo: set_tag called for $tag: $value");

  if (exists $pms->{tag_data}->{$tag}) {
-    $pms->{tag_data}->{$tag} .= " $value";  # append value
+    # Limit to some sane length
+    if (length($pms->{tag_data}->{$tag}) < 2048) {
+      $pms->{tag_data}->{$tag} .= ' '.$value;  # append value
+    }
  }
  else {
    $pms->{tag_data}->{$tag} = $value;
  }
 }

-# ----------------------------------------
-
-sub _find_pdf_mime_parts {
-  my ($self,$pms) = @_;
-
-  # bail early if message does not have pdf parts
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-
-  # initialize
-  $pms->{'pdfinfo'}->{"pc_pdf"} = 0;
-  $pms->{'pdfinfo'}->{"count_pdf"} = 0;
-  $pms->{'pdfinfo'}->{"count_pdf_images"} = 0;
-
-  my @parts = $pms->{msg}->find_parts(qr@^(image|application)/(pdf|octet\-stream)$@, 1);
-  my $part_count = scalar @parts;
-
-  dbg("pdfinfo: Identified $part_count possible mime parts that need checked for PDF content");
-
-  # cache this so we can easily bail
-  $pms->{'pdfinfo'}->{'no_parts'} = 1 unless $part_count;
-
-  foreach my $p (@parts) {
-    my $type = $p->{'type'} =~ m@/([\w\-]+)$@;
-    my $name = $p->{'name'} || '';
-
-    my $cte = lc( $p->get_header('content-transfer-encoding') || '' );
-
-    dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : '')."");
-
-    # make sure its a cte we support
-    next unless ($cte =~ /^(?:base64|quoted\-printable)$/);
-
-    # filename must end with .pdf, or application type can be pdf
-    # sometimes windows muas will wrap a pdf up inside a .dat file
-    # v0.8 - Added .fdf phoney PDF detection
-    next unless ($name =~ /\.[fp]df$/ || $type eq 'pdf');
-
-    # if we get this far, make sure type is pdf for sure (not octet-stream or anything else)
-    $type='pdf';
-
-    if ($type && exists $get_details{$type}) {
-       $get_details{$type}->($self, $pms, $p);
-       $pms->{'pdfinfo'}->{"count_$type"} ++;
-    }
-  }
-
-  $self->_set_tag($pms, 'PDFCOUNT',  $pms->{'pdfinfo'}->{"count_pdf"});
-  $self->_set_tag($pms, 'PDFIMGCOUNT', $pms->{'pdfinfo'}->{"count_pdf_images"});
-
-}
-
-# ----------------------------------------
-
 sub pdf_named {
-  my ($self,$pms,$body,$name) = @_;
-  return unless (defined $name);
+  my ($self, $pms, $body, $name) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
+  return 0 unless defined $name;

-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-
-  return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
-  return 1 if (exists $pms->{'pdfinfo'}->{"names_pdf"}->{$name});
+  return 1 if exists $pms->{pdfinfo}->{names_pdf}->{$name};
  return 0;
 }

-# -----------------------------------------
-
 sub pdf_name_regex {
-  my ($self,$pms,$body,$re) = @_;
-  return unless (defined $re);
+  my ($self, $pms, $body, $regex) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
+  return 0 unless defined $regex;
+  return 0 unless exists $pms->{pdfinfo}->{names_pdf};

-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"names_pdf"});
-
-  my ($rec, $err) = compile_regexp($re, 2);
+  my ($rec, $err) = compile_regexp($regex, 2);
  if (!$rec) {
-    info("pdfinfo: invalid regexp '$re': $err");
+    my $rulename = $pms->get_current_eval_rule_name();
+    warn "pdfinfo: invalid regexp for $rulename '$regex': $err";
    return 0;
  }

-  my $hit = 0;
-  foreach my $name (keys %{$pms->{'pdfinfo'}->{"names_pdf"}}) {
+  foreach my $name (keys %{$pms->{pdfinfo}->{names_pdf}}) {
    if ($name =~ $rec) {
      dbg("pdfinfo: pdf_name_regex hit on $name");
      return 1;
    }
  }
+
  return 0;
-
 }

-# -----------------------------------------
-
 sub pdf_is_encrypted {
-  my ($self,$pms,$body) = @_;
+  my ($self, $pms, $body) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return $pms->{'pdfinfo'}->{'encrypted'};
+  return $pms->{pdfinfo}->{encrypted} ? 1 : 0;
 }

-# -----------------------------------------
-
 sub pdf_count {
-  my ($self,$pms,$body,$min,$max) = @_;
-  return unless defined $min;
-
-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf"});
-  return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf"});
+  my ($self, $pms, $body, $min, $max) = @_;

+  return _result_check($min, $max, $pms->{pdfinfo}->{count_pdf});
 }

-# -----------------------------------------
-
 sub pdf_image_count {
-  my ($self,$pms,$body,$min,$max) = @_;
-  return unless defined $min;
-
-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"count_pdf_images"});
-  return result_check($min, $max, $pms->{'pdfinfo'}->{"count_pdf_images"});
+  my ($self, $pms, $body, $min, $max) = @_;

+  return _result_check($min, $max, $pms->{pdfinfo}->{count_pdf_images});
 }

-# -----------------------------------------
-
 sub pdf_pixel_coverage {
  my ($self,$pms,$body,$min,$max) = @_;
-  return unless (defined $min);

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
-
-  # dbg("pdfinfo: pc_$type: $min, ".($max ? $max:'').", $type, ".$pms->{'pdfinfo'}->{"pc_pdf"});
-  return result_check($min, $max, $pms->{'pdfinfo'}->{"pc_pdf"});
+  return _result_check($min, $max, $pms->{pdfinfo}->{pc_pdf});
 }

-# -----------------------------------------
-
 sub pdf_image_to_text_ratio {
-  my ($self,$pms,$body,$min,$max) = @_;
-  return unless (defined $min && defined $max);
+  my ($self, $pms, $body, $min, $max) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"pc_pdf"});
+  return 0 unless defined $max;
+  return 0 unless $pms->{pdfinfo}->{pc_pdf};

  # depending on how you call this eval (body vs rawbody),
  # the $textlen will differ.
-  my $textlen = length(join('',@$body));
+  my $textlen = length(join('', @$body));
+  return 0 unless $textlen;

-  return 0 unless ( $textlen > 0 && exists $pms->{'pdfinfo'}->{"pc_pdf"} && $pms->{'pdfinfo'}->{"pc_pdf"} > 0);
-
-  my $ratio = $textlen / $pms->{'pdfinfo'}->{"pc_pdf"};
+  my $ratio = $textlen / $pms->{pdfinfo}->{pc_pdf};
  dbg("pdfinfo: image ratio=$ratio, min=$min max=$max");
-  return result_check($min, $max, $ratio, 1);
-}

-# -----------------------------------------
+  return _result_check($min, $max, $ratio, 1);
+}

 sub pdf_is_empty_body {
-  my ($self,$pms,$body,$min) = @_;
+  my ($self, $pms, $body, $min) = @_;

+  return 0 unless $pms->{pdfinfo}->{count_pdf};
  $min ||= 0;  # default to 0 bytes

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless $pms->{'pdfinfo'}->{"count_pdf"};
-
-  # check for cached result
-  return 1 if $pms->{'pdfinfo'}->{"no_body_text"};
-
-  shift @$body;  # shift body array removes line #1 -> subject line.
-
  my $bytes = 0;
-  my $textlen = length(join('',@$body));
+  my $idx = 0;
  foreach my $line (@$body) {
-    next unless ($line =~ m/\S/);
-    next if ($line =~ m/^Subject/);
+    next if $idx++ == 0; # skip subject line
+    next unless $line =~ /\S/;
    $bytes += length($line);
+    # no hit if minimum already exceeded
+    return 0 if $bytes > $min;
  }

-  dbg("pdfinfo: is_empty_body = $bytes bytes");
-
-  if ($bytes == 0 || ($bytes <= $min)) {
-    $pms->{'pdfinfo'}->{"no_body_text"} = 1;
+  dbg("pdfinfo: pdf_is_empty_body matched ($bytes <= $min)");
  return 1;
-  }
-
-  # cache it and return 0
-  $pms->{'pdfinfo'}->{"no_body_text"} = 0;
-  return 0;
 }

-# -----------------------------------------
-
 sub pdf_image_size_exact {
-  my ($self,$pms,$body,$height,$width) = @_;
-  return unless (defined $height && defined $width);
+  my ($self, $pms, $body, $height, $width) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
+  return 0 unless defined $width;

-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
-  return 1 if (exists $pms->{'pdfinfo'}->{"dems_pdf"}->{"${height}x${width}"});
+  return 1 if exists $pms->{pdfinfo}->{dems_pdf}->{"${height}x${width}"};
  return 0;
 }

-# -----------------------------------------
-
 sub pdf_image_size_range {
-  my ($self,$pms,$body,$minh,$minw,$maxh,$maxw) = @_;
-  return unless (defined $minh && defined $minw);
+  my ($self, $pms, $body, $minh, $minw, $maxh, $maxw) = @_;

-  # make sure we have image data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
+  return 0 unless defined $minw;
+  return 0 unless exists $pms->{pdfinfo}->{dems_pdf};

-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"dems_pdf"});
-
-  foreach my $dem ( keys %{$pms->{'pdfinfo'}->{"dems_pdf"}}) {
-    my ($h,$w) = split(/x/,$dem);
+  foreach my $dem (keys %{$pms->{pdfinfo}->{dems_pdf}}) {
+    my ($h, $w) = split(/x/, $dem);
    next if ($h < $minh);  # height less than min height
    next if ($w < $minw);  # width less than min width
    next if (defined $maxh && $h > $maxh);  # height more than max height
    next if (defined $maxw && $w > $maxw);  # width more than max width
-
    # if we make it here, we have a match
    return 1;
  }
@ -664,88 +578,54 @@ sub pdf_image_size_range {
  return 0;
 }

-# -----------------------------------------
-
 sub pdf_match_md5 {
+  my ($self, $pms, $body, $md5) = @_;

-  my ($self,$pms,$body,$md5) = @_;
-  return unless defined $md5;
+  return 0 unless defined $md5;

-  my $uc_md5 = uc($md5);  # uppercase matches only
-
-  # make sure we have pdf data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"md5"});
-  return 1 if (exists $pms->{'pdfinfo'}->{"md5"}->{$uc_md5});
+  return 1 if exists $pms->{pdfinfo}->{md5}->{uc $md5};
  return 0;
 }

-# -----------------------------------------
-
 sub pdf_match_fuzzy_md5 {
+  my ($self, $pms, $body, $md5) = @_;

-  my ($self,$pms,$body,$md5) = @_;
-  return unless defined $md5;
+  return 0 unless defined $md5;

-  my $uc_md5 = uc($md5);  # uppercase matches only
-
-  # make sure we have pdf data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{"fuzzy_md5"});
-  return 1 if (exists $pms->{'pdfinfo'}->{"fuzzy_md5"}->{$uc_md5});
+  return 1 if exists $pms->{pdfinfo}->{fuzzy_md5}->{uc $md5};
  return 0;
 }

-# -----------------------------------------
-
 sub pdf_match_details {
  my ($self, $pms, $body, $detail, $regex) = @_;
-  return unless ($detail && $regex);

-  # make sure we have pdf data read in.
-  if (!exists $pms->{'pdfinfo'}) {
-    $self->_find_pdf_mime_parts($pms);
-  }
-
-  return 0 if (exists $pms->{'pdfinfo'}->{'no_parts'});
-  return 0 unless (exists $pms->{'pdfinfo'}->{'details'});
-
-  my $check_value = $pms->{pdfinfo}->{details}->{$detail};
-  return unless $check_value;
+  return 0 unless defined $regex;
+  return 0 unless exists $pms->{pdfinfo}->{details}->{$detail};

  my ($rec, $err) = compile_regexp($regex, 2);
  if (!$rec) {
-    info("pdfinfo: invalid regexp '$regex': $err");
+    my $rulename = $pms->get_current_eval_rule_name();
+    warn "pdfinfo: invalid regexp for $rulename '$regex': $err";
    return 0;
  }

-  if ($check_value =~ $rec) {
-    dbg("pdfinfo: pdf_match_details $detail $regex matches $check_value");
+  foreach (keys %{$pms->{pdfinfo}->{details}->{$detail}}) {
+    if ($_ =~ $rec) {
+      dbg("pdfinfo: pdf_match_details $detail ($regex) match: $_");
      return 1;
    }
+  }
+
  return 0;
 }

-# -----------------------------------------
-
-sub result_check {
+sub _result_check {
  my ($min, $max, $value, $nomaxequal) = @_;
-  return 0 unless defined $value;
-  return 0 if ($value < $min);
-  return 0 if (defined $max && $value > $max);
-  return 0 if (defined $nomaxequal && $nomaxequal && $value == $max);
+  return 0 unless defined $min && defined $value;
+  return 0 if $value < $min;
+  return 0 if defined $max && $value > $max;
+  return 0 if defined $nomaxequal && $nomaxequal && $value == $max;
  return 1;
 }

-# -----------------------------------------
-
 1;
-
--- a/upstream/lib/Mail/SpamAssassin/Plugin/PhishTag.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/PhishTag.pm
@ -21,8 +21,10 @@ package Mail::SpamAssassin::Plugin::PhishTag;

 use strict;
 use warnings;
+use re 'taint';
 use Errno qw(EBADF);
-use Mail::SpamAssassin;
+
+use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;

 our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -220,7 +222,7 @@ PhishTag - SpamAssassin plugin for redirecting links in incoming emails.
 =head1 DESCRIPTION

 PhishTag enables administrators to rewrite links in emails that trigger certain
-tests, preferably anti-phishing blacklist tests. The plugin will inhibit the
+tests, preferably anti-phishing blocklist tests. The plugin will inhibit the
 blocking of a portion of the emails that trigger the test by SpamAssassin, and
 let them pass to the users' inbox after the rewrite. It is useful in providing
 training to email users about company policies and general email usage.
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Phishing.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Phishing.pm
@ -1,6 +1,6 @@
 #
 # Author: Giovanni Bechis <gbechis@apache.org>
-# Copyright 2018,2019 Giovanni Bechis
+# Copyright 2018,2020 Giovanni Bechis
 #
 # <@LICENSE>
 # Licensed to the Apache Software Foundation (ASF) under one or more
@ -31,6 +31,7 @@ Mail::SpamAssassin::Plugin::Phishing - check uris against phishing feed
  ifplugin Mail::SpamAssassin::Plugin::Phishing
    phishing_openphish_feed /etc/mail/spamassassin/openphish-feed.txt
    phishing_phishtank_feed /etc/mail/spamassassin/phishtank-feed.csv
+    phishing_phishstats_feed /etc/mail/spamassassin/phishstats-feed.csv
    body     URI_PHISHING      eval:check_phishing()
    describe URI_PHISHING      Url match phishing in feed
  endif
@ -38,21 +39,25 @@ Mail::SpamAssassin::Plugin::Phishing - check uris against phishing feed
 =head1 DESCRIPTION

 This plugin finds uris used in phishing campaigns detected by
-OpenPhish or PhishTank feeds.
+OpenPhish, PhishTank or PhishStats feeds.

 The Openphish free feed is updated every 6 hours and can be downloaded from
 https://openphish.com/feed.txt.
-The Premium Openphish feed is not currently supported.

 The PhishTank free feed is updated every 1 hours and can be downloaded from
 http://data.phishtank.com/data/online-valid.csv.
 To avoid download limits a registration is required.

+The PhishStats feed is updated every 90 minutes and can be downloaded from
+https://phishstats.info/phish_score.csv.
+
 =cut

 package Mail::SpamAssassin::Plugin::Phishing;
 use strict;
 use warnings;
+use re 'taint';
+
 my $VERSION = 1.1;

 use Errno qw(EBADF);
@ -61,7 +66,7 @@ use Mail::SpamAssassin::PerMsgStatus;

 our @ISA = qw(Mail::SpamAssassin::Plugin);

-sub dbg { Mail::SpamAssassin::Plugin::dbg ("Phishing: @_"); }
+sub dbg { my $msg = shift; Mail::SpamAssassin::Plugin::dbg("Phishing: $msg", @_); }

 sub new {
    my ($class, $mailsa) = @_;
@ -71,7 +76,7 @@ sub new {
    bless ($self, $class);

    $self->set_config($mailsa->{conf});
-    $self->register_eval_rule("check_phishing");
+    $self->register_eval_rule("check_phishing", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

    return $self;
 }
@ -81,14 +86,98 @@ sub set_config {
    my @cmds;
    push(@cmds, {
        setting => 'phishing_openphish_feed',
+        is_admin => 1,
        type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
        }
    );
+
+=head1 ADMIN PREFERENCES
+
+The following options can be used in site-wide (C<local.cf>)
+configuration files to customize how the module handles phishing uris
+
+=cut
+
+=over 4
+
+=item phishing_openphish_feed
+
+Absolute path of the downloaded OpenPhish datafeed.
+
+=back
+
+=cut
    push(@cmds, {
        setting => 'phishing_phishtank_feed',
+        is_admin => 1,
        type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
        }
    );
+
+=over 4
+
+=item phishing_phishtank_feed
+
+Absolute path of the downloaded PhishTank datafeed.
+
+=back
+
+=cut
+    push(@cmds, {
+        setting => 'phishing_uri_noparam',
+        is_admin => 1,
+        default => 0,
+        type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+        }
+    );
+
+=over 4
+
+=item phishing_uri_noparam ( 0 | 1 ) (default: 0)
+
+If this option is set uri parameters will not be take into consideration
+when parsing the phishing uris datafeed.
+If this option is enabled and the url without parameters is "generic"
+(like https://www.kisa.link/url_redirector.php?url=...) the url will be
+skipped.
+
+=back
+
+=cut
+    push(@cmds, {
+        setting => 'phishing_phishstats_feed',
+        is_admin => 1,
+        type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
+        }
+    );
+
+=over 4
+
+=item phishing_phishstats_feed
+
+Absolute path of the downloaded PhishStats datafeed.
+
+=back
+
+=cut
+    push(@cmds, {
+        setting => 'phishing_phishstats_minscore',
+        is_admin => 1,
+        default => 6,
+        type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+        }
+    );
+
+=over 4
+
+=item phishing_phishstats_minscore ( 0 - 10 ) (default: 6)
+
+Minimum score to take into consideration for phishing uris downloaded
+from PhishStats datafeed.
+
+=back
+
+=cut
    $conf->{parser}->register_commands(\@cmds);
 }

@ -100,7 +189,8 @@ sub finish_parsing_end {
 sub _read_configfile {
  my ($self) = @_;
  my $conf = $self->{main}->{registryboundaries}->{conf};
-  my @phtank_ln;
+  my (@phtank_ln, @phstats_ln);
+  my $stripped_cluri;

  local *F;
  if ( defined($conf->{phishing_openphish_feed}) && ( -f $conf->{phishing_openphish_feed} ) ) {
@ -109,10 +199,14 @@ sub _read_configfile {
        chomp;
        #lines that start with pound are comments
        next if(/^\s*\#/);
+        $stripped_cluri = $_;
+	if ( $conf->{phishing_uri_noparam} eq 1 ) {
+          $stripped_cluri =~ s/\?.*//;
+	}
        my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($_);
        if ( defined $phishdomain ) {
-          push @{$self->{PHISHING}->{$_}->{phishdomain}}, $phishdomain;
-          push @{$self->{PHISHING}->{$_}->{phishinfo}->{$phishdomain}}, "OpenPhish";
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "OpenPhish";
        }
    }

@ -133,11 +227,14 @@ sub _read_configfile {

        @phtank_ln = split(/,/, $_);
        $phtank_ln[1] =~ s/\"//g;
-
+        $stripped_cluri = $phtank_ln[1];
+	if ( $conf->{phishing_uri_noparam} eq 1 ) {
+          $stripped_cluri =~ s/\?.*//;
+	}
        my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($phtank_ln[1]);
        if ( defined $phishdomain ) {
-          push @{$self->{PHISHING}->{$phtank_ln[1]}->{phishdomain}}, $phishdomain;
-          push @{$self->{PHISHING}->{$phtank_ln[1]}->{phishinfo}->{$phishdomain}}, "PhishTank";
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "PhishTank";
        }
    }

@ -146,6 +243,40 @@ sub _read_configfile {
                : die "error reading config file: $!";
    close(F) or die "error closing config file: $!";
  }
+
+  if ( defined($conf->{phishing_phishstats_feed}) && (-f $conf->{phishing_phishstats_feed} ) ) {
+    open(F, '<', $conf->{phishing_phishstats_feed});
+    for ($!=0; <F>; $!=0) {
+        #skip first line
+        next if ( $. eq 1);
+        chomp;
+        #lines that start with pound are comments
+        next if(/^\s*\#/);
+
+	# CSV: Date,Score,URL,IP
+        @phstats_ln = split(/,/, $_);
+        $phstats_ln[1] =~ s/\"//g;
+        $phstats_ln[2] =~ s/\"//g;
+	if ( $conf->{phishing_phishstats_minscore} >= $phstats_ln[1] ) {
+	  next;
+	}
+        $stripped_cluri = $phstats_ln[2];
+	if ( $conf->{phishing_uri_noparam} eq 1 ) {
+          $stripped_cluri =~ s/\?.*//;
+	}
+        my $phishdomain = $self->{main}->{registryboundaries}->uri_to_domain($phstats_ln[2]);
+        if ( defined $phishdomain ) {
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishdomain}}, $phishdomain;
+          push @{$self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$phishdomain}}, "PhishStats";
+        }
+    }
+
+    defined $_ || $!==0  or
+      $!==EBADF ? dbg("PHISHING: error reading config file: $!")
+                : die "error reading config file: $!";
+    close(F) or die "error closing config file: $!";
+  }
+
 }

 sub check_phishing {
@ -153,10 +284,11 @@ sub check_phishing {

  my $feedname;
  my $domain;
+  my $stripped_cluri;
+  my $dcnt;
+
  my $uris = $pms->get_uri_detail_list();
-
  my $rulename = $pms->get_current_eval_rule_name();
-
  while (my($uri, $info) = each %{$uris}) {
    # we want to skip mailto: uris
    next if ($uri =~ /^mailto:/i);
@ -166,12 +298,21 @@ sub check_phishing {
    if (($info->{types}->{a}) || ($info->{types}->{parsed})) {
      # check url
      foreach my $cluri (@{$info->{cleaned}}) {
-        if ( exists $self->{PHISHING}->{$cluri} ) {
+        $stripped_cluri = $cluri;
+	if( $self->{main}->{conf}->{phishing_uri_noparam} eq 1 ) {
+          $stripped_cluri =~ s/\?.*//;
+          $dcnt = $stripped_cluri =~ tr/\///;
+	}
+	# If uri without parameters are considered, skip too short uris
+	# like https://www.google.com/url?sa=t&url=http://badsite.com
+        if( ($self->{main}->{conf}->{phishing_uri_noparam} eq 1) && ($dcnt <= 3) ) {
+          next;
+        }
+        if ( exists $self->{PHISHING}->{$stripped_cluri} ) {
          $domain = $self->{main}->{registryboundaries}->uri_to_domain($cluri);
-          $feedname = $self->{PHISHING}->{$cluri}->{phishinfo}->{$domain}[0];
-          dbg("HIT! $domain [$cluri] found in $feedname feed");
-          $pms->test_log("$feedname ($domain)");
-          $pms->got_hit($rulename, "", ruletype => 'eval');
+          $feedname = $self->{PHISHING}->{$stripped_cluri}->{phishinfo}->{$domain}[0];
+          dbg("HIT! $domain [$stripped_cluri] found in $feedname feed");
+          $pms->test_log("$feedname ($domain)", $rulename);
          return 1;
        }
      }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Pyzor.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Pyzor.pm
@ -37,13 +37,17 @@ package Mail::SpamAssassin::Plugin::Pyzor;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Timeout;
-use Mail::SpamAssassin::Util qw(untaint_var untaint_file_path
-                                proc_status_ok exit_status_str);
+use Mail::SpamAssassin::SubProcBackChannel;
+use Mail::SpamAssassin::Util qw(untaint_var untaint_file_path am_running_on_windows
+                                proc_status_ok exit_status_str force_die);
 use strict;
 use warnings;
 # use bytes;
 use re 'taint';

+use Storable;
+use POSIX qw(PIPE_BUF WNOHANG);
+
 our @ISA = qw(Mail::SpamAssassin::Plugin);

 sub new {
@ -64,7 +68,7 @@ sub new {
    dbg("pyzor: network tests on, attempting Pyzor");
  }

-  $self->register_eval_rule("check_pyzor");
+  $self->register_eval_rule("check_pyzor", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);

  $self->set_config($mailsaobject->{conf});

@ -87,11 +91,27 @@ Whether to use Pyzor, if it is available.

  push (@cmds, {
    setting => 'use_pyzor',
+    is_admin => 1,
    default => 1,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });

-=item pyzor_max NUMBER		(default: 5)
+=item pyzor_fork (0|1)		(default: 1)
+
+Instead of running Pyzor synchronously, fork separate process for it and
+read the results in later (similar to async DNS lookups).  Increases
+throughput. Considered experimental on Windows, where default is 0.
+
+=cut
+
+  push(@cmds, {
+    setting => 'pyzor_fork',
+    is_admin => 1,
+    default => am_running_on_windows()?0:1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+  });
+
+=item pyzor_count_min NUMBER	(default: 5)

 This option sets how often a message's body checksum must have been
 reported to the Pyzor server before SpamAssassin will consider the Pyzor
@ -103,18 +123,69 @@ set this to a relatively low value, e.g. C<5>.
 =cut

  push (@cmds, {
-    setting => 'pyzor_max',
+    setting => 'pyzor_count_min',
+    is_admin => 1,
    default => 5,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
  });

+  # Deprecated setting, the name makes no sense!
+  push (@cmds, {
+    setting => 'pyzor_max',
+    is_admin => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      warn("deprecated setting used, change pyzor_max to pyzor_count_min\n");
+      if ($value !~ /^\d+$/) {
+        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+      }
+      $self->{pyzor_count_min} = $value;
+    }
+  });
+
+=item pyzor_welcomelist_min NUMBER	(default: 10)
+
+Previously pyzor_whitelist_min which will work interchangeably until 4.1.
+
+This option sets how often a message's body checksum must have been
+welcomelisted to the Pyzor server for SpamAssassin to consider ignoring the
+result.  Final decision is made by pyzor_welcomelist_factor.
+
+=cut
+
+  push (@cmds, {
+    setting => 'pyzor_welcomelist_min',
+    aliases => ['pyzor_whitelist_min'], # removed in 4.1
+    is_admin => 1,
+    default => 10,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
+=item pyzor_welcomelist_factor NUMBER	(default: 0.2)
+
+Previously pyzor_whitelist_factor which will work interchangeably until 4.1.
+
+Ignore Pyzor result if REPORTCOUNT x NUMBER >= pyzor_welcomelist_min.
+For default setting this means: 50 reports requires 10 welcomelistings.
+
+=cut
+
+  push (@cmds, {
+    setting => 'pyzor_welcomelist_factor',
+    aliases => ['pyzor_whitelist_factor'], # removed in 4.1
+    is_admin => 1,
+    default => 0.2,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
+  });
+
 =back

 =head1 ADMINISTRATOR OPTIONS

 =over 4

-=item pyzor_timeout n		(default: 3.5)
+=item pyzor_timeout n		(default: 5)

 How many seconds you wait for Pyzor to complete, before scanning continues
 without the Pyzor results. A numeric value is optionally suffixed by a
@ -142,7 +213,7 @@ removing one of them.
  push (@cmds, {
    setting => 'pyzor_timeout',
    is_admin => 1,
-    default => 3.5,
+    default => 5,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_DURATION
  });

@ -202,178 +273,339 @@ you should use this, as the current PATH will have been cleared.
 sub is_pyzor_available {
  my ($self) = @_;

-  my $pyzor = $self->{main}->{conf}->{pyzor_path} || '';
-  unless ($pyzor) {
-    $pyzor = Mail::SpamAssassin::Util::find_executable_in_env_path('pyzor');
-  }
+  my $pyzor = $self->{main}->{conf}->{pyzor_path} ||
+    Mail::SpamAssassin::Util::find_executable_in_env_path('pyzor');
+
  unless ($pyzor && -x $pyzor) {
-    dbg("pyzor: pyzor is not available: no pyzor executable found");
+    dbg("pyzor: no pyzor executable found");
+    $self->{pyzor_available} = 0;
    return 0;
  }

  # remember any found pyzor
  $self->{main}->{conf}->{pyzor_path} = $pyzor;

-  dbg("pyzor: pyzor is available: " . $self->{main}->{conf}->{pyzor_path});
+  dbg("pyzor: pyzor is available: $pyzor");
  return 1;
 }

-sub get_pyzor_interface {
-  my ($self) = @_;
+sub finish_parsing_start {
+  my ($self, $opts) = @_;

-  if (!$self->{main}->{conf}->{use_pyzor}) {
-    dbg("pyzor: use_pyzor option not enabled, disabling Pyzor");
-    $self->{pyzor_interface} = "disabled";
-    $self->{pyzor_available} = 0;
+  # If forking, hard adjust priority -100 to launch early
+  # Find rulenames from eval_to_rule mappings
+  if ($opts->{conf}->{pyzor_fork}) {
+    foreach (@{$opts->{conf}->{eval_to_rule}->{check_pyzor}}) {
+      dbg("pyzor: adjusting rule $_ priority to -100");
+      $opts->{conf}->{priority}->{$_} = -100;
    }
-  elsif ($self->is_pyzor_available()) {
-    $self->{pyzor_interface} = "pyzor";
-    $self->{pyzor_available} = 1;
-  }
-  else {
-    dbg("pyzor: no pyzor found, disabling Pyzor");
-    $self->{pyzor_available} = 0;
  }
 }

 sub check_pyzor {
-  my ($self, $permsgstatus, $full) = @_;
+  my ($self, $pms, $full) = @_;

-  # initialize valid tags
-  $permsgstatus->{tag_data}->{PYZOR} = "";
+  return 0 if !$self->{pyzor_available};
+  return 0 if !$self->{main}->{conf}->{use_pyzor};
+
+  return 0 if $pms->{pyzor_running};
+  $pms->{pyzor_running} = 1;
+
+  return 0 if !$self->is_pyzor_available();

  my $timer = $self->{main}->time_method("check_pyzor");

-  $self->get_pyzor_interface();
-  return 0 unless $self->{pyzor_available};
+  # initialize valid tags
+  $pms->{tag_data}->{PYZOR} = '';

-  return $self->pyzor_lookup($permsgstatus, $full);
+  # create fulltext tmpfile now (before possible forking)
+  $pms->{pyzor_tmpfile} = $pms->create_fulltext_tmpfile();
+
+  ## non-forking method
+
+  if (!$self->{main}->{conf}->{pyzor_fork}) {
+    my @results = $self->pyzor_lookup($pms);
+    return $self->_check_result($pms, \@results);
+  }
+
+  ## forking method
+
+  $pms->{pyzor_rulename} = $pms->get_current_eval_rule_name();
+
+  # create socketpair for communication
+  $pms->{pyzor_backchannel} = Mail::SpamAssassin::SubProcBackChannel->new();
+  my $back_selector = '';
+  $pms->{pyzor_backchannel}->set_selector(\$back_selector);
+  eval {
+    $pms->{pyzor_backchannel}->setup_backchannel_parent_pre_fork();
+  } or do {
+    dbg("pyzor: backchannel pre-setup failed: $@");
+    delete $pms->{pyzor_backchannel};
+    return 0;
+  };
+
+  my $pid = fork();
+  if (!defined $pid) {
+    info("pyzor: child fork failed: $!");
+    delete $pms->{pyzor_backchannel};
+    return 0;
+  }
+  if (!$pid) {
+    $0 = "$0 (pyzor)";
+    $SIG{CHLD} = 'DEFAULT';
+    $SIG{PIPE} = 'IGNORE';
+    $SIG{$_} = sub {
+      eval { dbg("pyzor: child process $$ caught signal $_[0]"); };
+      force_die(6);  # avoid END and destructor processing
+      } foreach am_running_on_windows()?qw(INT HUP TERM QUIT):qw(INT HUP TERM TSTP QUIT USR1 USR2);
+    dbg("pyzor: child process $$ forked");
+    $pms->{pyzor_backchannel}->setup_backchannel_child_post_fork();
+    my @results = $self->pyzor_lookup($pms);
+    my $backmsg;
+    eval {
+      $backmsg = Storable::freeze(\@results);
+    };
+    if ($@) {
+      dbg("pyzor: child return value freeze failed: $@");
+      force_die(0); # avoid END and destructor processing
+    }
+    if (!syswrite($pms->{pyzor_backchannel}->{parent}, $backmsg)) {
+      dbg("pyzor: child backchannel write failed: $!");
+    }
+    force_die(0); # avoid END and destructor processing
+  }
+
+  $pms->{pyzor_pid} = $pid;
+
+  eval {
+    $pms->{pyzor_backchannel}->setup_backchannel_parent_post_fork($pid);
+  } or do {
+    dbg("pyzor: backchannel post-setup failed: $@");
+    delete $pms->{pyzor_backchannel};
+    return 0;
+  };
+
+  return; # return undef for async status
 }

 sub pyzor_lookup {
-  my ($self, $permsgstatus, $fulltext) = @_;
-  my @response;
-  my $pyzor_count;
-  my $pyzor_whitelisted;
-  my $timeout = $self->{main}->{conf}->{pyzor_timeout};
+  my ($self, $pms) = @_;

-  $pyzor_count = 0;
-  $pyzor_whitelisted = 0;
-  my $pid;
-
-  # use a temp file here -- open2() is unreliable, buffering-wise, under spamd
-  my $tmpf = $permsgstatus->create_fulltext_tmpfile($fulltext);
+  my $conf = $self->{main}->{conf};
+  my $timeout = $conf->{pyzor_timeout};

  # note: not really tainted, this came from system configuration file
-  my $path = untaint_file_path($self->{main}->{conf}->{pyzor_path});
-  my $opts = untaint_var($self->{main}->{conf}->{pyzor_options}) || '';
+  my $path = untaint_file_path($conf->{pyzor_path});
+  my $opts = untaint_var($conf->{pyzor_options}) || '';

-  $permsgstatus->enter_helper_run_mode();
+  $pms->enter_helper_run_mode();

+  my $pid;
+  my @resp;
  my $timer = Mail::SpamAssassin::Timeout->new(
-           { secs => $timeout, deadline => $permsgstatus->{master_deadline} });
+           { secs => $timeout, deadline => $pms->{master_deadline} });
  my $err = $timer->run_and_catch(sub {
-
    local $SIG{PIPE} = sub { die "__brokenpipe__ignore__\n" };

-    dbg("pyzor: opening pipe: " . join(' ', $path, $opts, "check", "< $tmpf"));
+    dbg("pyzor: opening pipe: ".
+      join(' ', $path, $opts, "check", "<".$pms->{pyzor_tmpfile}));

    $pid = Mail::SpamAssassin::Util::helper_app_pipe_open(*PYZOR,
-	$tmpf, 1, $path, split(' ', $opts), "check");
+	$pms->{pyzor_tmpfile}, 1, $path, split(' ', $opts), "check");
    $pid or die "$!\n";

    # read+split avoids a Perl I/O bug (Bug 5985)
-    my($inbuf,$nread,$resp); $resp = '';
-    while ( $nread=read(PYZOR,$inbuf,8192) ) { $resp .= $inbuf }
+    my($inbuf, $nread);
+    my $resp = '';
+    while ($nread = read(PYZOR, $inbuf, 8192)) { $resp .= $inbuf }
    defined $nread  or die "error reading from pipe: $!";
-    @response = split(/^/m, $resp, -1);  undef $resp;
+    @resp = split(/^/m, $resp, -1);

-    my $errno = 0;  close PYZOR or $errno = $!;
-    if (proc_status_ok($?,$errno)) {
+    my $errno = 0;
+    close PYZOR or $errno = $!;
+    if (proc_status_ok($?, $errno)) {
      dbg("pyzor: [%s] finished successfully", $pid);
-    } elsif (proc_status_ok($?,$errno, 0,1)) {  # sometimes it exits with 1
-      dbg("pyzor: [%s] finished: %s", $pid, exit_status_str($?,$errno));
+    } elsif (proc_status_ok($?, $errno, 0, 1)) {  # sometimes it exits with 1
+      dbg("pyzor: [%s] finished: %s", $pid, exit_status_str($?, $errno));
    } else {
-      info("pyzor: [%s] error: %s", $pid, exit_status_str($?,$errno));
-    }
-
-    if (!@response) {
-      # this exact string is needed below
-      warn("no response\n");	# yes, this is possible
-      return;
-    }
-    chomp for @response;
-
-    if ($response[0] =~ /^Traceback/) {
-      warn("internal error, python traceback seen in response: ".
-        join("\\n", @response));
-    } else {
-      dbg("pyzor: got response: ".join("\\n", @response));
+      info("pyzor: [%s] error: %s", $pid, exit_status_str($?, $errno));
    }

  });

  if (defined(fileno(*PYZOR))) {  # still open
    if ($pid) {
-      if (kill('TERM',$pid)) { dbg("pyzor: killed stale helper [$pid]") }
-      else { dbg("pyzor: killing helper application [$pid] failed: $!") }
+      if (kill('TERM', $pid)) {
+        dbg("pyzor: killed stale helper [$pid]");
+      } else {
+        dbg("pyzor: killing helper application [$pid] failed: $!");
      }
-    my $errno = 0;  close PYZOR or $errno = $!;
-    proc_status_ok($?,$errno)
-      or info("pyzor: [%s] error: %s", $pid, exit_status_str($?,$errno));
    }
-  $permsgstatus->leave_helper_run_mode();
+    my $errno = 0;
+    close PYZOR or $errno = $!;
+    proc_status_ok($?, $errno)
+      or info("pyzor: [%s] error: %s", $pid, exit_status_str($?, $errno));
+  }
+
+  $pms->leave_helper_run_mode();

  if ($timer->timed_out()) {
    dbg("pyzor: check timed out after $timeout seconds");
-    return 0;
-  }
-
-  if ($err) {
+    return ();
+  } elsif ($err) {
    chomp $err;
-    if ($err eq "__brokenpipe__ignore__") {
-      dbg("pyzor: check failed: broken pipe");
-    } elsif ($err eq "no response") {
-      dbg("pyzor: check failed: no response");
-    } else {
-      warn("pyzor: check failed: $err\n");
+    info("pyzor: check failed: $err");
+    return ();
  }
+
+  return @resp;
+}
+
+sub check_tick {
+  my ($self, $opts) = @_;
+  $self->_check_forked_result($opts->{permsgstatus}, 0);
+}
+
+sub check_cleanup {
+  my ($self, $opts) = @_;
+  $self->_check_forked_result($opts->{permsgstatus}, 1);
+}
+
+sub _check_forked_result {
+  my ($self, $pms, $finish) = @_;
+
+  return 0 if !$pms->{pyzor_backchannel};
+  return 0 if !$pms->{pyzor_pid};
+
+  my $timer = $self->{main}->time_method("check_pyzor");
+
+  $pms->{pyzor_abort} = $pms->{deadline_exceeded} || $pms->{shortcircuited};
+
+  my $kid_pid = $pms->{pyzor_pid};
+  # if $finish, force waiting for the child
+  my $pid = waitpid($kid_pid, $finish && !$pms->{pyzor_abort} ? 0 : WNOHANG);
+  if ($pid == 0) {
+    #dbg("pyzor: child process $kid_pid not finished yet, trying later");
+    if ($pms->{pyzor_abort}) {
+      dbg("pyzor: bailing out due to deadline/shortcircuit");
+      kill('TERM', $kid_pid);
+      if (waitpid($kid_pid, WNOHANG) == 0) {
+        sleep(1);
+        if (waitpid($kid_pid, WNOHANG) == 0) {
+          dbg("pyzor: child process $kid_pid still alive, KILL");
+          kill('KILL', $kid_pid);
+          waitpid($kid_pid, 0);
+        }
+      }
+      delete $pms->{pyzor_pid};
+      delete $pms->{pyzor_backchannel};
+    }
+    return 0;
+  } elsif ($pid == -1) {
+    # child does not exist?
+    dbg("pyzor: child process $kid_pid already handled?");
+    delete $pms->{pyzor_backchannel};
    return 0;
  }

-  foreach my $one_response (@response) {
+  $pms->rule_ready($pms->{pyzor_rulename}); # mark rule ready for metas
+
+  dbg("pyzor: child process $kid_pid finished, reading results");
+
+  my $backmsg;
+  my $ret = sysread($pms->{pyzor_backchannel}->{latest_kid_fh}, $backmsg, am_running_on_windows()?512:PIPE_BUF);
+  if (!defined $ret || $ret == 0) {
+    dbg("pyzor: could not read result from child: ".($ret == 0 ? 0 : $!));
+    delete $pms->{pyzor_backchannel};
+    return 0;
+  }
+
+  delete $pms->{pyzor_backchannel};
+
+  my $results;
+  eval {
+    $results = Storable::thaw($backmsg);
+  };
+  if ($@) {
+    dbg("pyzor: child return value thaw failed: $@");
+    return;
+  }
+
+  $self->_check_result($pms, $results);
+}
+
+sub _check_result {
+  my ($self, $pms, $results) = @_;
+
+  if (!@$results) {
+    dbg("pyzor: no response from server");
+    return 0;
+  }
+
+  my $count = 0;
+  my $count_wl = 0;
+  foreach my $res (@$results) {
+    chomp($res);
+    if ($res =~ /^Traceback/) {
+      info("pyzor: internal error, python traceback seen in response: $res");
+      return 0;
+    }
+    dbg("pyzor: got response: $res");
    # this regexp is intended to be a little bit forgiving
-    if ($one_response =~ /^\S+\t.*?\t(\d+)\t(\d+)\s*$/) {
+    if ($res =~ /^\S+\t.*?\t(\d+)\t(\d+)\s*$/) {
      # until pyzor servers can sync their DBs,
      # sum counts obtained from all servers
-      $pyzor_whitelisted += $2+0;
-      $pyzor_count += $1+0;
-    }
-    else {
+      $count += untaint_var($1)+0; # crazy but needs untainting
+      $count_wl += untaint_var($2)+0;
+    } else {
      # warn on failures to parse
-      dbg("pyzor: failure to parse response \"$one_response\"");
+      info("pyzor: failure to parse response \"$res\"");
    }
  }

-  $permsgstatus->set_tag('PYZOR', $pyzor_whitelisted ? "Whitelisted."
-                                             : "Reported $pyzor_count times.");
+  my $conf = $self->{main}->{conf};

-  if ($pyzor_count >= $self->{main}->{conf}->{pyzor_max}) {
-    dbg("pyzor: listed: COUNT=$pyzor_count/$self->{main}->{conf}->{pyzor_max} WHITELIST=$pyzor_whitelisted");
+  my $count_min = $conf->{pyzor_count_min};
+  my $wl_min = $conf->{pyzor_welcomelist_min};
+
+  my $wl_limit = $count_wl >= $wl_min ?
+    $count * $conf->{pyzor_welcomelist_factor} : 0;
+
+  dbg("pyzor: result: COUNT=$count/$count_min WELCOMELIST=$count_wl/$wl_min/%.1f",
+    $wl_limit);
+  $pms->set_tag('PYZOR', "Reported $count times, welcomelisted $count_wl times.");
+
+  # Empty body etc results in same hash, we should skip very large numbers..
+  if ($count >= 1000000 || $count_wl >= 10000) {
+    dbg("pyzor: result exceeded hardcoded limits, ignoring: count/wl 1000000/10000");
+    return 0;
+  }
+
+  # Welcomelisted?
+  if ($wl_limit && $count_wl >= $wl_limit) {
+    dbg("pyzor: message welcomelisted");
+    return 0;
+  }
+
+  if ($count >= $count_min) {
+    if ($conf->{pyzor_fork}) {
+      # forked needs to run got_hit()
+      $pms->got_hit($pms->{pyzor_rulename}, "", ruletype => 'eval');
+      return 0;
+    }
    return 1;
  }
-
  return 0;
 }

 sub plugin_report {
  my ($self, $options) = @_;

-  return unless $self->{pyzor_available};
-  return unless $self->{main}->{conf}->{use_pyzor};
+  return if !$self->{pyzor_available};
+  return if !$self->{main}->{conf}->{use_pyzor};
+  return if $options->{report}->{options}->{dont_report_to_pyzor};
+  return if !$self->is_pyzor_available();

-  if (!$options->{report}->{options}->{dont_report_to_pyzor} && $self->is_pyzor_available())
-  {
  # use temporary file: open2() is unreliable due to buffering under spamd
  my $tmpf = $options->{report}->create_fulltext_tmpfile($options->{text});
  if ($self->pyzor_report($options, $tmpf)) {
@ -384,8 +616,9 @@ sub plugin_report {
  else {
    info("reporter: could not report spam to Pyzor");
  }
-    $options->{report}->delete_fulltext_tmpfile();
-  }
+  $options->{report}->delete_fulltext_tmpfile($tmpf);
+
+  return 1;
 }

 sub pyzor_report {
@ -449,6 +682,9 @@ sub pyzor_report {
  return 1;
 }

+# Version features
+sub has_fork { 1 }
+
 1;

 =back
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Razor2.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Razor2.pm
@ -43,11 +43,16 @@ package Mail::SpamAssassin::Plugin::Razor2;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Timeout;
+use Mail::SpamAssassin::SubProcBackChannel;
+use Mail::SpamAssassin::Util qw(force_die am_running_on_windows);
 use strict;
 use warnings;
 # use bytes;
 use re 'taint';

+use Storable;
+use POSIX qw(PIPE_BUF WNOHANG);
+
 our @ISA = qw(Mail::SpamAssassin::Plugin);

 sub new {
@ -73,8 +78,8 @@ sub new {
    }
  }

-  $self->register_eval_rule("check_razor2");
-  $self->register_eval_rule("check_razor2_range");
+  $self->register_eval_rule("check_razor2", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);
+  $self->register_eval_rule("check_razor2_range", $Mail::SpamAssassin::Conf::TYPE_FULL_EVALS);

  $self->set_config($mailsaobject->{conf});

@ -93,10 +98,26 @@ Whether to use Razor2, if it is available.

  push(@cmds, {
    setting => 'use_razor2',
+    is_admin => 1,
    default => 1,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
  });

+=item razor_fork (0|1)		(default: 1)
+
+Instead of running Razor2 synchronously, fork separate process for it and
+read the results in later (similar to async DNS lookups). Increases
+throughput. Considered experimental on Windows, where default is 0.
+
+=cut
+
+  push(@cmds, {
+    setting => 'razor_fork',
+    is_admin => 1,
+    default => am_running_on_windows()?0:1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
+  });
+
 =back

 =head1 ADMINISTRATOR SETTINGS
@ -195,7 +216,7 @@ sub razor2_access {
      my $sigs = $rc->compute_sigs($objects)
 	  or die "$debug: error in compute_sigs";

-      # if mail isn't whitelisted, check it out
+      # if mail isn't welcomelisted, check it out
      # see 'man razor-whitelist'
      if ($type ne 'check' || ! $rc->local_check($objects->[0])) {
 	# provide a better error message when servers are unavailable,
@ -347,6 +368,8 @@ sub plugin_report {
  return unless $self->{main}->{conf}->{use_razor2};
  return if $options->{report}->{options}->{dont_report_to_razor};

+  my $timer = $self->{main}->time_method("razor2_report");
+
  if ($self->razor2_access($options->{text}, 'report', undef)) {
    $options->{report}->{report_available} = 1;
    info('reporter: spam reported to Razor');
@ -360,6 +383,8 @@ sub plugin_report {
 sub plugin_revoke {
  my ($self, $options) = @_;

+  my $timer = $self->{main}->time_method("razor2_revoke");
+
  return unless $self->{razor2_available};
  return if $self->{main}->{local_tests_only};
  return unless $self->{main}->{conf}->{use_razor2};
@ -375,34 +400,191 @@ sub plugin_revoke {
  }
 }

+sub finish_parsing_start {
+  my ($self, $opts) = @_;
+
+  # If forking, hard adjust priority -100 to launch early
+  # Find rulenames from eval_to_rule mappings
+  if ($opts->{conf}->{razor_fork}) {
+    foreach (@{$opts->{conf}->{eval_to_rule}->{check_razor2}}) {
+      dbg("razor2: adjusting rule $_ priority to -100");
+      $opts->{conf}->{priority}->{$_} = -100;
+    }
+    foreach (@{$opts->{conf}->{eval_to_rule}->{check_razor2_range}}) {
+      dbg("razor2: adjusting rule $_ priority to -100");
+      $opts->{conf}->{priority}->{$_} = -100;
+    }
+  }
+}
+
 sub check_razor2 {
-  my ($self, $permsgstatus, $full) = @_;
+  my ($self, $pms, $full) = @_;

-  return $permsgstatus->{razor2_result} if (defined $permsgstatus->{razor2_result});
-  $permsgstatus->{razor2_result} = 0;
-  $permsgstatus->{razor2_cf_score} = { '4' => 0, '8' => 0 };
+  return 0 unless $self->{razor2_available};
+  return 0 unless $self->{main}->{conf}->{use_razor2};

-  return unless $self->{razor2_available};
-  return unless $self->{main}->{conf}->{use_razor2};
+  return $pms->{razor2_result} if (defined $pms->{razor2_result});
+
+  return 0 if $pms->{razor2_running};
+  $pms->{razor2_running} = 1;

  my $timer = $self->{main}->time_method("check_razor2");

-  my $return;
-  my @results;
+  ## non-forking method

+  if (!$self->{main}->{conf}->{razor_fork}) {
    # TODO: check for cache header, set results appropriately
-
    # do it this way to make it easier to get out the results later from the
-  # netcache plugin
-  ($return, @results) =
-    $self->razor2_access($full, 'check', $permsgstatus->{master_deadline});
+    # netcache plugin ... what netcache plugin?
+    (undef, my @results) =
+      $self->razor2_access($full, 'check', $pms->{master_deadline});
+    return $self->_check_result($pms, \@results);
+  }
+
+  ## forking method
+
+  $pms->{razor2_rulename} = $pms->get_current_eval_rule_name();
+
+  # create socketpair for communication
+  $pms->{razor2_backchannel} = Mail::SpamAssassin::SubProcBackChannel->new();
+  my $back_selector = '';
+  $pms->{razor2_backchannel}->set_selector(\$back_selector);
+  eval {
+    $pms->{razor2_backchannel}->setup_backchannel_parent_pre_fork();
+  } or do {
+    dbg("razor2: backchannel pre-setup failed: $@");
+    delete $pms->{razor2_backchannel};
+    return 0;
+  };
+
+  my $pid = fork();
+  if (!defined $pid) {
+    info("razor2: child fork failed: $!");
+    delete $pms->{razor2_backchannel};
+    return 0;
+  }
+  if (!$pid) {
+    $0 = "$0 (razor2)";
+    $SIG{CHLD} = 'DEFAULT';
+    $SIG{PIPE} = 'IGNORE';
+    $SIG{$_} = sub {
+      eval { dbg("razor2: child process $$ caught signal $_[0]"); };
+      force_die(6);  # avoid END and destructor processing
+      } foreach am_running_on_windows()?qw(INT HUP TERM QUIT):qw(INT HUP TERM TSTP QUIT USR1 USR2);
+    dbg("razor2: child process $$ forked");
+    $pms->{razor2_backchannel}->setup_backchannel_child_post_fork();
+    (undef, my @results) =
+      $self->razor2_access($full, 'check', $pms->{master_deadline});
+    my $backmsg;
+    eval {
+      $backmsg = Storable::freeze(\@results);
+    };
+    if ($@) {
+      dbg("razor2: child return value freeze failed: $@");
+      force_die(0); # avoid END and destructor processing
+    }
+    if (!syswrite($pms->{razor2_backchannel}->{parent}, $backmsg)) {
+      dbg("razor2: child backchannel write failed: $!");
+    }
+    force_die(0); # avoid END and destructor processing
+  }
+
+  $pms->{razor2_pid} = $pid;
+
+  eval {
+    $pms->{razor2_backchannel}->setup_backchannel_parent_post_fork($pid);
+  } or do {
+    dbg("razor2: backchannel post-setup failed: $@");
+    delete $pms->{razor2_backchannel};
+    return 0;
+  };
+
+  return; # return undef for async status
+}
+
+sub check_tick {
+  my ($self, $opts) = @_;
+  $self->_check_forked_result($opts->{permsgstatus}, 0);
+}
+
+sub check_cleanup {
+  my ($self, $opts) = @_;
+  $self->_check_forked_result($opts->{permsgstatus}, 1);
+}
+
+sub _check_forked_result {
+  my ($self, $pms, $finish) = @_;
+
+  return 0 if !$pms->{razor2_backchannel};
+  return 0 if !$pms->{razor2_pid};
+
+  my $timer = $self->{main}->time_method("check_razor2");
+
+  $pms->{razor2_abort} = $pms->{deadline_exceeded} || $pms->{shortcircuited};
+
+  my $kid_pid = $pms->{razor2_pid};
+  # if $finish, force waiting for the child
+  my $pid = waitpid($kid_pid, $finish && !$pms->{razor2_abort} ? 0 : WNOHANG);
+  if ($pid == 0) {
+    #dbg("razor2: child process $kid_pid not finished yet, trying later");
+    if ($pms->{razor2_abort}) {
+      dbg("razor2: bailing out due to deadline/shortcircuit");
+      kill('TERM', $kid_pid);
+      if (waitpid($kid_pid, WNOHANG) == 0) {
+        sleep(1);
+        if (waitpid($kid_pid, WNOHANG) == 0) {
+          dbg("razor2: child process $kid_pid still alive, KILL");
+          kill('KILL', $kid_pid);
+          waitpid($kid_pid, 0);
+        }
+      }
+      delete $pms->{razor2_pid};
+      delete $pms->{razor2_backchannel};
+    }
+    return 0;
+  } elsif ($pid == -1) {
+    # child does not exist?
+    dbg("razor2: child process $kid_pid already handled?");
+    delete $pms->{razor2_backchannel};
+    return 0;
+  }
+
+  $pms->rule_ready($pms->{razor2_rulename}); # mark rule ready for metas
+
+  dbg("razor2: child process $kid_pid finished, reading results");
+
+  my $backmsg;
+  my $ret = sysread($pms->{razor2_backchannel}->{latest_kid_fh}, $backmsg, am_running_on_windows()?512:PIPE_BUF);
+  if (!defined $ret || $ret == 0) {
+    dbg("razor2: could not read result from child: ".($ret == 0 ? 0 : $!));
+    delete $pms->{razor2_backchannel};
+    return 0;
+  }
+
+  delete $pms->{razor2_backchannel};
+
+  my $results;
+  eval {
+    $results = Storable::thaw($backmsg);
+  };
+  if ($@) {
+    dbg("razor2: child return value thaw failed: $@");
+    return;
+  }
+
+  $self->_check_result($pms, $results);
+}
+
+sub _check_result {
+  my ($self, $pms, $results) = @_;
+
  $self->{main}->call_plugins ('process_razor_result',
-  	{ results => \@results, permsgstatus => $permsgstatus }
+  	{ results => $results, permsgstatus => $pms }
  );

-  foreach my $result (@results) {
+  foreach my $result (@$results) {
    if (exists $result->{result}) {
-      $permsgstatus->{razor2_result} = $result->{result} if $result->{result};
+      $pms->{razor2_result} = $result->{result} if $result->{result};
    }
    elsif ($result->{noresponse}) {
      dbg('razor2: part=' . $result->{part} . ' noresponse');
@ -415,46 +597,86 @@ sub check_razor2 {

      next if $result->{contested};

-      my $cf = $permsgstatus->{razor2_cf_score}->{$result->{engine}} || 0;
+      my $cf = $pms->{razor2_cf_score}->{$result->{engine}} || 0;
      if ($result->{confidence} > $cf) {
-        $permsgstatus->{razor2_cf_score}->{$result->{engine}} = $result->{confidence};
+        $pms->{razor2_cf_score}->{$result->{engine}} = $result->{confidence};
      }
    }
  }

-  dbg("razor2: results: spam? " . $permsgstatus->{razor2_result});
-  while(my ($engine, $cf) = each %{$permsgstatus->{razor2_cf_score}}) {
+  $pms->{razor2_result} ||= 0;
+  $pms->{razor2_cf_score} ||= {};
+
+  dbg("razor2: results: spam? " . $pms->{razor2_result});
+  while(my ($engine, $cf) = each %{$pms->{razor2_cf_score}}) {
    dbg("razor2: results: engine $engine, highest cf score: $cf");
  }

-  return $permsgstatus->{razor2_result};
+  if ($self->{main}->{conf}->{razor_fork}) {
+    # forked needs to run got_hit()
+    if ($pms->{razor2_rulename} && $pms->{razor2_result}) {
+      $pms->got_hit($pms->{razor2_rulename}, "", ruletype => 'eval');
+    }
+    # forked needs to run range callbacks
+    if ($pms->{razor2_range_callbacks}) {
+      foreach (@{$pms->{razor2_range_callbacks}}) {
+        $self->check_razor2_range($pms, '', @$_);
+      }
+    }
+  }
+
+  return $pms->{razor2_result};
 }

 # Check the cf value of a given message and return if it's within the
 # given range
 sub check_razor2_range {
-  my ($self, $permsgstatus, $body, $engine, $min, $max) = @_;
+  my ($self, $pms, $body, $engine, $min, $max, $rulename) = @_;

  # If Razor2 isn't available, or the general test is disabled, don't
  # continue.
-  return unless $self->{razor2_available};
-  return unless $self->{main}->{conf}->{use_razor2};
-  return unless $self->{main}->{conf}->{scores}->{'RAZOR2_CHECK'};
+  return 0 unless $self->{razor2_available};
+  return 0 unless $self->{main}->{conf}->{use_razor2};

-  # If Razor2 hasn't been checked yet, go ahead and run it.
-  unless (defined $permsgstatus->{razor2_result}) {
-    $self->check_razor2($permsgstatus, $body);
+  # Check if callback overriding rulename
+  if (!defined $rulename) {
+    $rulename = $pms->get_current_eval_rule_name();
  }

+  if ($pms->{razor2_abort}) {
+    $pms->rule_ready($rulename); # mark rule ready for metas
+    return;
+  }
+
+  # If forked, call back later unless results are in
+  if ($self->{main}->{conf}->{razor_fork}) {
+    if (!defined $pms->{razor2_result}) {
+      dbg("razor2: delaying check_razor2_range call for $rulename");
+      # array matches check_razor2_range() argument order
+      push @{$pms->{razor2_range_callbacks}},
+        [$engine, $min, $max, $rulename];
+      return; # return undef for async status
+    }
+  } else {
+    # If Razor2 hasn't been checked yet, go ahead and run it.
+    # (only if we are non-forking.. forking will handle these in
+    # callbacks)
+    if (!$pms->{razor2_running}) {
+      $self->check_razor2($pms, $body);
+    }
+  }
+
+  $pms->rule_ready($rulename); # mark rule ready for metas
+
  my $cf = 0;
  if ($engine) {
-    $cf = $permsgstatus->{razor2_cf_score}->{$engine};
-    return unless defined $cf;
+    $cf = $pms->{razor2_cf_score}->{$engine};
+    return 0 unless defined $cf;
  }
  else {
    # If no specific engine was given to the rule, find the highest cf
    # determined and use that
-    while(my ($engine, $ecf) = each %{$permsgstatus->{razor2_cf_score}}) {
+    while(my ($engine, $ecf) = each %{$pms->{razor2_cf_score}}) {
      if ($ecf > $cf) {
        $cf = $ecf;
      }
@ -462,13 +684,20 @@ sub check_razor2_range {
  }

  if ($cf >= $min && $cf <= $max) {
-    $permsgstatus->test_log(sprintf("cf: %3d", $cf));
+    my $cf_str = sprintf("cf: %3d", $cf);
+    $pms->test_log($cf_str, $rulename);
+    if ($self->{main}->{conf}->{razor_fork}) {
+      $pms->got_hit($rulename, "", ruletype => 'eval');
+    }
    return 1;
  }

-  return;
+  return 0;
 }

+# Version features
+sub has_fork { 1 }
+
 1;

 =back
--- a/upstream/lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/RelayCountry.pm
@ -53,10 +53,8 @@ Following metadata headers and tags are added:

 =head1 REQUIREMENT

-This plugin requires the GeoIP2, Geo::IP, IP::Country::DB_File or 
-IP::Country::Fast module from CPAN.
-For backward compatibility IP::Country::Fast is used as fallback if no db_type
-is specified in the config file.
+This plugin uses Mail::SpamAssassin::GeoDB and requires a module supported
+by it, for example MaxMind::DB::Reader (GeoIP2).

 =cut

@ -64,7 +62,6 @@ package Mail::SpamAssassin::Plugin::RelayCountry;

 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
-use Mail::SpamAssassin::Constants qw(:ip);
 use strict;
 use warnings;
 # use bytes;
@ -72,6 +69,11 @@ use re 'taint';

 our @ISA = qw(Mail::SpamAssassin::Plugin);

+my $db;
+my $dbv6;
+my $db_info;  # will hold database info
+my $db_type;  # will hold database type
+
 # constructor: register the eval rule
 sub new {
  my $class = shift;
@ -82,282 +84,39 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->set_config($mailsaobject->{conf});
+  # we need GeoDB country
+  $self->{main}->{geodb_wanted}->{country} = 1;
+
  return $self;
 }

-sub set_config {
-  my ($self, $conf) = @_;
-  my @cmds;
-
-=head1 USER PREFERENCES
-
-The following options can be used in both site-wide (C<local.cf>) and
-user-specific (C<user_prefs>) configuration files to customize how
-SpamAssassin handles incoming email messages.
-
-=over 4
-
-=item country_db_type STRING
-
-This option tells SpamAssassin which type of Geo database to use.
-Valid database types are GeoIP, GeoIP2, DB_File and Fast.
-
-=back
-
-=cut
-
-  push (@cmds, {
-    setting => 'country_db_type',
-    default => "GeoIP",
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
-    code => sub {
-      my ($self, $key, $value, $line) = @_;
-      if ($value !~ /^(?:GeoIP|GeoIP2|DB_File|Fast)$/) {
-        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
-      }
-      $self->{country_db_type} = $value;
-    }
-  });
-
-=over 4
-
-=item country_db_path STRING
-
-This option tells SpamAssassin where to find MaxMind GeoIP2 or IP::Country::DB_File database.
-
-If not defined, GeoIP2 default search includes:
- /usr/local/share/GeoIP/GeoIP2-Country.mmdb
- /usr/share/GeoIP/GeoIP2-Country.mmdb
- /var/lib/GeoIP/GeoIP2-Country.mmdb
- /usr/local/share/GeoIP/GeoLite2-Country.mmdb
- /usr/share/GeoIP/GeoLite2-Country.mmdb
- /var/lib/GeoIP/GeoLite2-Country.mmdb
- (and same paths again for -City.mmdb, which also has country functionality)
-
-=back
-
-=cut
-
-  push (@cmds, {
-    setting => 'country_db_path',
-    default => "",
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
-    code => sub {
-      my ($self, $key, $value, $line) = @_;
-      if (!defined $value || !length $value) {
-        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
-      }
-      if (!-e $value) {
-        info("config: country_db_path \"$value\" is not accessible");
-        $self->{country_db_path} = $value;
-        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
-      }
-      $self->{country_db_path} = $value;
-    }
-  });
-
-  push (@cmds, {
-    setting => 'geoip2_default_db_path',
-    default => [
-      '/usr/local/share/GeoIP/GeoIP2-Country.mmdb',
-      '/usr/share/GeoIP/GeoIP2-Country.mmdb',
-      '/var/lib/GeoIP/GeoIP2-Country.mmdb',
-      '/usr/local/share/GeoIP/GeoLite2-Country.mmdb',
-      '/usr/share/GeoIP/GeoLite2-Country.mmdb',
-      '/var/lib/GeoIP/GeoLite2-Country.mmdb',
-      '/usr/local/share/GeoIP/GeoIP2-City.mmdb',
-      '/usr/share/GeoIP/GeoIP2-City.mmdb',
-      '/var/lib/GeoIP/GeoIP2-City.mmdb',
-      '/usr/local/share/GeoIP/GeoLite2-City.mmdb',
-      '/usr/share/GeoIP/GeoLite2-City.mmdb',
-      '/var/lib/GeoIP/GeoLite2-City.mmdb',
-      ],
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRINGLIST,
-    code => sub {
-      my ($self, $key, $value, $line) = @_;
-      if ($value eq '') {
-        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
-      }
-      push(@{$self->{geoip2_default_db_path}}, split(/\s+/, $value));
-    }
-  });
-  
-  $conf->{parser}->register_commands(\@cmds);
-}
-
-sub get_country {
-    my ($self, $ip, $db, $dbv6, $country_db_type) = @_;
-    my $cc;
-    my $IP_PRIVATE = IP_PRIVATE;
-    my $IPV4_ADDRESS = IPV4_ADDRESS;
-
-    # Private IPs will always be returned as '**'
-    if ($ip =~ /^$IP_PRIVATE$/o) {
-      $cc = "**";
-    }
-    elsif ($country_db_type eq "GeoIP") {
-      if ($ip =~ /^$IPV4_ADDRESS$/o) {
-        $cc = $db->country_code_by_addr($ip);
-      } elsif (defined $dbv6) {
-        $cc = $dbv6->country_code_by_addr_v6($ip);
-      }
-    }
-    elsif ($country_db_type eq "GeoIP2") {
-      my ($country, $country_rec);
-      eval {
-        if (index($db->metadata()->description()->{en}, 'City') != -1) {
-          $country = $db->city( ip => $ip );
-        } else {
-          $country = $db->country( ip => $ip );
-        }
-        $country_rec = $country->country();
-        $cc = $country_rec->iso_code();
-        1;
-      } or do {
-        $@ =~ s/\s+Trace begun.*//s;
-        dbg("metadata: RelayCountry: GeoIP2 failed: $@");
-      }
-    }
-    elsif ($country_db_type eq "DB_File") {
-      if ($ip =~ /^$IPV4_ADDRESS$/o ) {
-        $cc = $db->inet_atocc($ip);
-      } else {
-        $cc = $db->inet6_atocc($ip);
-      }
-    }
-    elsif ($country_db_type eq "Fast") {
-      $cc = $db->inet_atocc($ip);
-    }
-
-    $cc ||= 'XX';
-
-    return $cc;
-}
-
 sub extract_metadata {
  my ($self, $opts) = @_;
  my $pms = $opts->{permsgstatus};
  
-  my $db;
-  my $dbv6;
-  my $db_info;  # will hold database info
-  my $db_type;  # will hold database type
+  return if $self->{relaycountry_disabled};

-  my $country_db_type = $opts->{conf}->{country_db_type};
-  my $country_db_path = $opts->{conf}->{country_db_path};
-
-  if ($country_db_type eq "GeoIP") {
-    eval {
-      require Geo::IP;
-      $db = Geo::IP->open_type(Geo::IP->GEOIP_COUNTRY_EDITION, Geo::IP->GEOIP_STANDARD);
-      die "GeoIP.dat not found" unless $db;
-      # IPv6 requires version Geo::IP 1.39+ with GeoIP C API 1.4.7+
-      if (Geo::IP->VERSION >= 1.39 && Geo::IP->api eq 'CAPI') {
-        $dbv6 = Geo::IP->open_type(Geo::IP->GEOIP_COUNTRY_EDITION_V6, Geo::IP->GEOIP_STANDARD);
-        if (!$dbv6) {
-          dbg("metadata: RelayCountry: GeoIP: IPv6 support not enabled, GeoIPv6.dat not found");
-        }
-      } else {
-        dbg("metadata: RelayCountry: GeoIP: IPv6 support not enabled, versions Geo::IP 1.39, GeoIP C API 1.4.7 required");
-      }
-      $db_info = sub { return "Geo::IP IPv4: " . ($db->database_info || '?')." / IPv6: ".($dbv6 ? $dbv6->database_info || '?' : '?') };
-      1;
-    } or do {
-      # Fallback to IP::Country::Fast
-      dbg("metadata: RelayCountry: GeoIP: GeoIP.dat not found, trying IP::Country::Fast as fallback");
-      $country_db_type = "Fast";
-    }
-  }
-  elsif ($country_db_type eq "GeoIP2") {
-    if (!$country_db_path) {
-      # Try some default locations
-      foreach (@{$opts->{conf}->{geoip2_default_db_path}}) {
-        if (-f $_) {
-          $country_db_path = $_;
-          last;
-        }
-      }
-    }
-    if (-f $country_db_path) {
-      eval {
-        require GeoIP2::Database::Reader;
-        $db = GeoIP2::Database::Reader->new(
-          file => $country_db_path,
-          locales => [ 'en' ]
-        );
-        die "unknown error" unless $db;
-        $db_info = sub {
-          my $m = $db->metadata();
-          return "GeoIP2 ".$m->description()->{en}." / ".localtime($m->build_epoch());
-        };
-        1;
-      } or do {
-        # Fallback to IP::Country::Fast
-        $@ =~ s/\s+Trace begun.*//s;
-        dbg("metadata: RelayCountry: GeoIP2: ${country_db_path} load failed: $@, trying IP::Country::Fast as fallback");
-        $country_db_type = "Fast";
-      }
-    } else {
-      # Fallback to IP::Country::Fast
-      my $err = $country_db_path ?
-        "$country_db_path not found" : "database not found from default locations";
-      dbg("metadata: RelayCountry: GeoIP2: $err, trying IP::Country::Fast as fallback");
-      $country_db_type = "Fast";
-    }
-  }
-  elsif ($country_db_type eq "DB_File") {
-    if (-f $country_db_path) {
-      eval {
-        require IP::Country::DB_File;
-        $db = IP::Country::DB_File->new($country_db_path);
-        die "unknown error" unless $db;
-        $db_info = sub { return "IP::Country::DB_File ".localtime($db->db_time()); };
-        1;
-      } or do {
-        # Fallback to IP::Country::Fast
-        dbg("metadata: RelayCountry: DB_File: ${country_db_path} load failed: $@, trying IP::Country::Fast as fallback");
-        $country_db_type = "Fast";
-      }
-    } else {
-      # Fallback to IP::Country::Fast
-      dbg("metadata: RelayCountry: DB_File: ${country_db_path} not found, trying IP::Country::Fast as fallback");
-      $country_db_type = "Fast";
-    }
+  if (!$self->{main}->{geodb} ||
+        !$self->{main}->{geodb}->can('country')) {
+    dbg("metadata: RelayCountry: plugin disabled, GeoDB country not available");
+    $self->{relaycountry_disabled} = 1;
+    return;
  }

-  if ($country_db_type eq "Fast") {
-    my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-    eval {
-      require IP::Country::Fast;
-      $db = IP::Country::Fast->new();
-      $db_info = sub { return "IP::Country::Fast ".localtime($db->db_time()); };
-      1;
-    } or do {
-      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-      dbg("metadata: RelayCountry: failed to load 'IP::Country::Fast', skipping: $eval_stat");
-      return 1;
-    }
-  }
-
-  if (!$db) {
-    return 1;
-  }
-
-  dbg("metadata: RelayCountry: Using database: ".$db_info->());
  my $msg = $opts->{msg};
+  my $geodb = $self->{main}->{geodb};

  my @cc_untrusted;
  foreach my $relay (@{$msg->{metadata}->{relays_untrusted}}) {
    my $ip = $relay->{ip};
-    my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
+    my $cc = $geodb->get_country($ip);
    push @cc_untrusted, $cc;
  }

  my @cc_external;
  foreach my $relay (@{$msg->{metadata}->{relays_external}}) {
    my $ip = $relay->{ip};
-    my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
+    my $cc = $geodb->get_country($ip);
    push @cc_external, $cc;
  }

@ -369,7 +128,7 @@ sub extract_metadata {
    }
    if ($found_auth) {
      my $ip = $relay->{ip};
-      my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
+      my $cc = $geodb->get_country($ip);
      push @cc_auth, $cc;
    }
  }
@ -377,7 +136,7 @@ sub extract_metadata {
  my @cc_all;
  foreach my $relay (@{$msg->{metadata}->{relays_internal}}, @{$msg->{metadata}->{relays_external}}) {
    my $ip = $relay->{ip};
-    my $cc = $self->get_country($ip, $db, $dbv6, $country_db_type);
+    my $cc = $geodb->get_country($ip);
    push @cc_all, $cc;
  }

@ -400,8 +159,6 @@ sub extract_metadata {
  $msg->put_metadata("X-Relay-Countries-All", $ccstr);
  dbg("metadata: X-Relay-Countries-All: $ccstr");
  $pms->set_tag("RELAYCOUNTRYALL", @cc_all == 1 ? $cc_all[0] : \@cc_all);
-
-  return 1;
 }

 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/RelayEval.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/RelayEval.pm
@ -28,6 +28,8 @@ use re 'taint';

 our @ISA = qw(Mail::SpamAssassin::Plugin);

+my $IPV4_ADDRESS = IPV4_ADDRESS;
+
 # constructor: register the eval rule
 sub new {
  my $class = shift;
@ -39,17 +41,17 @@ sub new {
  bless ($self, $class);

  # the important bit!
-  $self->register_eval_rule("check_for_numeric_helo");
-  $self->register_eval_rule("check_for_illegal_ip");
-  $self->register_eval_rule("check_all_trusted");
-  $self->register_eval_rule("check_no_relays");
-  $self->register_eval_rule("check_relays_unparseable");
-  $self->register_eval_rule("check_for_sender_no_reverse");
-  $self->register_eval_rule("check_for_from_domain_in_received_headers");
-  $self->register_eval_rule("check_for_forged_received_trail");
-  $self->register_eval_rule("check_for_forged_received_ip_helo");
-  $self->register_eval_rule("helo_ip_mismatch");
-  $self->register_eval_rule("check_for_no_rdns_dotcom_helo");
+  $self->register_eval_rule("check_for_numeric_helo"); # type does not matter
+  $self->register_eval_rule("check_for_illegal_ip"); # type does not matter
+  $self->register_eval_rule("check_all_trusted"); # type does not matter
+  $self->register_eval_rule("check_no_relays"); # type does not matter
+  $self->register_eval_rule("check_relays_unparseable"); # type does not matter
+  $self->register_eval_rule("check_for_sender_no_reverse"); # type does not matter
+  $self->register_eval_rule("check_for_from_domain_in_received_headers", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule("check_for_forged_received_trail"); # type does not matter
+  $self->register_eval_rule("check_for_forged_received_ip_helo"); # type does not matter
+  $self->register_eval_rule("helo_ip_mismatch"); # type does not matter
+  $self->register_eval_rule("check_for_no_rdns_dotcom_helo"); # type does not matter

  return $self;
 }
@ -72,7 +74,7 @@ sub hostname_to_domain {
  }
 }

-sub _helo_forgery_whitelisted {
+sub _helo_forgery_welcomelisted {
  my ($helo, $rdns) = @_;
  if ($helo eq 'msn.com' && $rdns eq 'hotmail.com') { return 1; }
  0;
@ -84,12 +86,10 @@ sub check_for_numeric_helo {
  my $rcvd = $pms->{relays_untrusted_str};

  if ($rcvd) {
-    my $IP_ADDRESS = IPV4_ADDRESS;
-    my $IP_PRIVATE = IP_PRIVATE;
    local $1;
    # no re "strict";  # since perl 5.21.8: Ranges of ASCII printables...
-    if ($rcvd =~ /\bhelo=($IP_ADDRESS)(?=[\000-\040,;\[()<>]|\z)/i  # Bug 5878
-        && $1 !~ /$IP_PRIVATE/) {
+    if ($rcvd =~ /\bhelo=($IPV4_ADDRESS)(?=[\000-\040,;\[()<>]|\z)/i  # Bug 5878
+        && $1 !~ IS_IP_PRIVATE) {
      return 1;
    }
  }
@ -108,23 +108,21 @@ sub check_for_illegal_ip {
 # due to bug in pure IPv6 address regular expression
 sub helo_ip_mismatch {
  my ($self, $pms) = @_;
-  my $IP_ADDRESS = IPV4_ADDRESS;
-  my $IP_PRIVATE = IP_PRIVATE;

  for my $relay (@{$pms->{relays_untrusted}}) {
    # is HELO usable?
-    next unless ($relay->{helo} =~ m/^$IP_ADDRESS$/ &&
-		 $relay->{helo} !~ /$IP_PRIVATE/);
+    next unless ($relay->{helo} =~ IS_IPV4_ADDRESS &&
+		 $relay->{helo} !~ IS_IP_PRIVATE);
    # compare HELO with IP
-    return 1 if ($relay->{ip} =~ m/^$IP_ADDRESS$/ &&
-		 $relay->{ip} !~ m/$IP_PRIVATE/ &&
+    return 1 if ($relay->{ip} =~ IS_IPV4_ADDRESS &&
+		 $relay->{ip} !~ IS_IP_PRIVATE &&
 		 $relay->{helo} ne $relay->{ip} &&
 		 # different IP is okay if in same /24
 		 $relay->{helo} =~ /^(\d+\.\d+\.\d+\.)/ &&
 		 index($relay->{ip}, $1) != 0);
  }

-  0;
+  return 0;
 }

 ###########################################################################
@ -145,7 +143,7 @@ sub check_no_relays {

 sub check_relays_unparseable {
  my ($self, $pms) = @_;
-  return $pms->{num_relays_unparseable};
+  return $pms->{num_relays_unparseable} ? 1 : 0;
 }

 # Check if the apparent sender (in the last received header) had
@ -215,7 +213,7 @@ sub check_for_from_domain_in_received_headers {
 sub check_for_no_rdns_dotcom_helo {
  my ($self, $pms) = @_;
  if (!exists $pms->{no_rdns_dotcom_helo}) { $self->_check_received_helos($pms); }
-  return $pms->{no_rdns_dotcom_helo};
+  return $pms->{no_rdns_dotcom_helo} ? 1 : 0;
 }

 # Bug 1133
@ -346,7 +344,7 @@ sub _check_for_forged_received {
 	# allow private IP addrs here, could be a legit screwup
 	if ($hclassb && $fclassb && 
 		$hclassb ne $fclassb &&
-		!($hlo =~ /$IP_PRIVATE/o))
+		$hlo !~ IS_IP_PRIVATE)
 	{
 	  dbg2("eval: forged-HELO: massive mismatch on IP-addr HELO: '$hlo' != '$fip'");
 	  $pms->{mismatch_ip_helo}++;
@ -357,7 +355,7 @@ sub _check_for_forged_received {
    my $prev = $from[$i-1];
    if (defined($prev) && $i > 0
 		&& $prev =~ /^\w+(?:[\w.-]+\.)+\w+$/
-		&& $by ne $prev && !_helo_forgery_whitelisted($by, $prev))
+		&& $by ne $prev && !_helo_forgery_welcomelisted($by, $prev))
    {
      dbg2("eval: forged-HELO: mismatch on from: '$prev' != '$by'");
      $pms->{mismatch_from}++;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm
@ -49,7 +49,6 @@ SpamAssassin; it is not guaranteed to work with other versions of SpamAssassin.

 package Mail::SpamAssassin::Plugin::ReplaceTags;

-use Mail::SpamAssassin;
 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
 use Mail::SpamAssassin::Util qw(compile_regexp qr_to_string);
@ -173,12 +172,15 @@ sub finish_parsing_end {
      # do the actual replacement
      my ($rec, $err) = compile_regexp($re, 0);
      if (!$rec) {
-        info("replacetags: regexp compilation failed '$re': $err");
+        info("replacetags: regexp compilation failed for $rule: '$re': $err");
        next;
      }
      $conf->{test_qrs}->{$rule} = $rec;
-      #dbg("replacetags: replaced $rule: '$origre' => '$re'");
+      if (would_log('dbg','replacetags') > 1) {
+        dbg("replacetags: replaced $rule: '$origre' => '$re'");
+      } else {
        dbg("replacetags: replaced $rule");
+      }
    } else {
      dbg("replacetags: nothing was replaced in $rule");
    }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/ResourceLimits.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/ResourceLimits.pm
@ -68,15 +68,15 @@ resident set size

 package Mail::SpamAssassin::Plugin::ResourceLimits;

-use Mail::SpamAssassin::Plugin ();
-use Mail::SpamAssassin::Logger ();
-use Mail::SpamAssassin::Util   ();
-use Mail::SpamAssassin::Constants qw(:sa);
+use Mail::SpamAssassin::Plugin;
+use Mail::SpamAssassin::Logger;

 use strict;
 use warnings;
+use re 'taint';

-use BSD::Resource qw(RLIMIT_RSS RLIMIT_AS RLIMIT_CPU);
+use constant HAS_BSD_RESOURCE =>
+  eval 'use BSD::Resource qw(RLIMIT_CPU RLIMIT_RSS RLIMIT_AS); 1;';

 our @ISA = qw(Mail::SpamAssassin::Plugin);

@ -86,58 +86,62 @@ sub new {

  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
-    bless( $self, $class );
+  bless ($self, $class);

-    $self->set_config( $mailsaobject->{conf} );
+  if (!HAS_BSD_RESOURCE) {
+    warn "ResourceLimits not used, required module BSD::Resource missing\n";
+  }
+
+  $self->set_config($mailsaobject->{conf});
  return $self;
 }

 sub set_config {
-    my ( $self, $conf ) = @_;
-    my @cmds = ();
+  my ($self, $conf) = @_;
+  my @cmds;

-    push(
-        @cmds,
-        {
+  push(@cmds, {
    setting  => 'resource_limit_mem',
    is_admin => 1,
-            default  => '0',
+    default  => 0,
    type     => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
-        }
-    );
+  });

-    push(
-        @cmds,
-        {
+  push(@cmds, {
    setting  => 'resource_limit_cpu',
    is_admin => 1,
-            default  => '0',
+    default  => 0,
    type     => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
-        }
-    );
+  });

-    $conf->{parser}->register_commands( \@cmds );
+  $conf->{parser}->register_commands(\@cmds);
 }

+if (HAS_BSD_RESOURCE) { eval '
 sub spamd_child_init {
  my ($self) = @_;

+  my $conf = $self->{main}->{conf};
+
  # Set CPU Resource limits if they were specified.
-    Mail::SpamAssassin::Util::dbg("resourcelimitplugin: In spamd_child_init");
-    Mail::SpamAssassin::Util::dbg( "resourcelimitplugin: cpu limit: " . $self->{main}->{conf}->{resource_limit_cpu} );
-    if ( $self->{main}->{conf}->{resource_limit_cpu} ) {
-        BSD::Resource::setrlimit( RLIMIT_CPU, $self->{main}->{conf}->{resource_limit_cpu}, $self->{main}->{conf}->{resource_limit_cpu} )
-          || info("resourcelimitplugin: Unable to set RLIMIT_CPU");
+  dbg("resourcelimits: cpu limit: " . $conf->{resource_limit_cpu});
+  if ($conf->{resource_limit_cpu}) {
+    BSD::Resource::setrlimit( RLIMIT_CPU,
+        $conf->{resource_limit_cpu}, $conf->{resource_limit_cpu} )
+      or info("resourcelimits: Unable to set RLIMIT_CPU");
  }

  # Set Resource limits if they were specified.
-    Mail::SpamAssassin::Util::dbg( "resourcelimitplugin: mem limit: " . $self->{main}->{conf}->{resource_limit_mem} );
-    if ( $self->{main}->{conf}->{resource_limit_mem} ) {
-        BSD::Resource::setrlimit( RLIMIT_RSS, $self->{main}->{conf}->{resource_limit_mem}, $self->{main}->{conf}->{resource_limit_mem} )
-          || info("resourcelimitplugin: Unable to set RLIMIT_RSS");
-        BSD::Resource::setrlimit( RLIMIT_AS, $self->{main}->{conf}->{resource_limit_mem}, $self->{main}->{conf}->{resource_limit_mem} )
-          || info("resourcelimitplugin: Unable to set RLIMIT_AS");
+  dbg("resourcelimits: mem limit: " . $conf->{resource_limit_mem});
+  if ($conf->{resource_limit_mem}) {
+    BSD::Resource::setrlimit( RLIMIT_RSS,
+        $conf->{resource_limit_mem}, $conf->{resource_limit_mem} )
+      or info("resourcelimits: Unable to set RLIMIT_RSS");
+    BSD::Resource::setrlimit( RLIMIT_AS,
+        $conf->{resource_limit_mem}, $conf->{resource_limit_mem} )
+      or info("resourcelimits: Unable to set RLIMIT_AS");
  }
 }
+'; }

 1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Reuse.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Reuse.pm
@ -10,6 +10,8 @@ Mail::SpamAssassin::Plugin::Reuse - For reusing old rule hits during a mass-chec

  reuse NETWORK_RULE [ NETWORK_RULE_OLD_NAME ]

+  run_reuse_tests_only 0/1
+
  endif

 =head1 DESCRIPTION
@ -18,6 +20,15 @@ The purpose of this plugin is to work in conjunction with B<mass-check
 --reuse> to map rules hit in input messages to rule hits in the
 mass-check output.

+run_reuse_tests_only 1 is special option for spamassassin/spamd use.
+Only reuse flagged tests will be run. It will also _enable_ network/DNS
+lookups. This is mainly intended for fast mass processing of corpus
+messages, so they can be properly reused later. For example:
+  spamd --pre="loadmodule Mail::SpamAssassin::Plugin::Reuse" \
+    --pre="run_reuse_tests_only 1" ...
+Such dedicated spamd could be scripted to add X-Spam-Status header to
+messages efficiently.
+
 =cut

 package Mail::SpamAssassin::Plugin::Reuse;
@ -25,12 +36,16 @@ package Mail::SpamAssassin::Plugin::Reuse;
 # use bytes;
 use strict;
 use warnings;
+use re 'taint';

 use Mail::SpamAssassin::Conf;
 use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Constants qw(:sa);

 our @ISA = qw(Mail::SpamAssassin::Plugin);

+my $RULENAME_RE = RULENAME_RE;
+
 # constructor
 sub new {
  my $invocant = shift;
@ -42,7 +57,7 @@ sub new {
  bless ($self, $class);

  $self->set_config($samain->{conf});
-  # make sure we run last (or close) of the finish_parsing_start since
+  # make sure we run last (or close) of the finish_parsing_end since
  # we need all other rules to be defined
  $self->register_method_priority("finish_parsing_start", 100);
  return $self;
@ -56,28 +71,35 @@ sub set_config {
  # e.g.
  # reuse NET_TEST_V1 NET_TEST_V0

-  push (@cmds, { setting => 'reuse',
+  push (@cmds, {
+    setting => 'reuse',
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_HASH_KEY_VALUE,
    code => sub {
      my ($conf, $key, $value, $line) = @_;

-                   if ($value !~ /\s*(\w+)(?:\s+(\w+(?:\s+\w+)*))?\s*$/) {
+      if ($value !~ /^\s*(${RULENAME_RE})(?:\s+(${RULENAME_RE}(?:\s+${RULENAME_RE})*))?\s*$/) {
        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }

      my $new_name = $1;
      my @old_names = ($new_name);
-		   if ($2) {
-		     push @old_names, split (' ', $2);
+      if (defined $2) {
+        push @old_names, split (/\s+/, $2);
      }

-                   dbg("reuse: read rule, old: @old_names new: $new_name");
+      dbg("reuse: read rule, old: %s new: %s", join(' ', @old_names), $new_name);
  
      foreach my $old (@old_names) {
        push @{$conf->{reuse_tests}->{$new_name}}, $old;
      }
+    }
+  });

-               }});
-
+  push(@cmds, {
+    setting => 'run_reuse_tests_only',
+    default => 0,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });

  $conf->{parser}->register_commands(\@cmds);
 }
@ -86,11 +108,27 @@ sub finish_parsing_start {
  my ($self, $opts) = @_;

  my $conf = $opts->{conf};
+  my $tflags = $conf->{tflags};

-  dbg("reuse: finish_parsing_start called");
+  while (my($rulename,$tfl) = each %{$tflags}) {
+    if ($tfl =~ /\bnet\b/ && !exists $conf->{reuse_tests}->{$rulename}) {
+      dbg("reuse: forcing reuse of net rule $rulename");
+      push @{$conf->{reuse_tests}->{$rulename}}, $rulename;
+    }
+  }

  return 0 if (!exists $conf->{reuse_tests});

+  if ($conf->{run_reuse_tests_only}) {
+    # simply delete all rules not reuse
+    foreach (keys %{$conf->{tests}}) {
+      if (!defined $conf->{reuse_tests}->{$_}) {
+        delete $conf->{tests}->{$_};
+      }
+    }
+    return 0;
+  }
+
  foreach my $rule_name (keys %{$conf->{reuse_tests}}) {

    # If the rule does not exist, add a new EMPTY test, set default score
@ -100,7 +138,7 @@ sub finish_parsing_start {
    }
    if (!exists $conf->{scores}->{$rule_name}) {
      my $set_score = ($rule_name =~/^T_/) ? 0.01 : 1.0;
-      $set_score = -$set_score if ( ($conf->{tflags}->{$rule_name}||'') =~ /\bnice\b/ );
+      $set_score = -$set_score if ( ($tflags->{$rule_name}||'') =~ /\bnice\b/ );
      foreach my $ss (0..3) {
        $conf->{scoreset}->[$ss]->{$rule_name} = $set_score;
      }
@ -108,7 +146,7 @@ sub finish_parsing_start {

    # Figure out when to add any hits -- grab priority and "stage"
    my $priority = $conf->{priority}->{$rule_name} || 0;
-    my $stage = $self->_get_stage_from_rule($opts->{conf}, $rule_name);
+    my $stage = $self->_get_stage_from_rule($conf, $rule_name);
    $conf->{reuse_tests_order}->{$rule_name} = [ $priority, $stage ];

  }
@ -118,6 +156,10 @@ sub check_start {
  my ($self, $opts) = @_;

  my $pms = $opts->{permsgstatus};
+  my $conf = $pms->{conf};
+  my $scoreset = $conf->{scoreset};
+
+  return 0 if $conf->{run_reuse_tests_only};

  # Can we reuse?
  my $msg = $pms->get_message();
@ -130,30 +172,34 @@ sub check_start {

  # now go through the rules and priorities and figure out which ones
  # need to be disabled
-  foreach my $rule (keys %{$pms->{conf}->{reuse_tests}}) {
+  foreach my $rule (keys %{$conf->{reuse_tests}}) {

-    dbg("reuse: looking at rule $rule");
-    my ($priority, $stage) = @{$pms->{conf}->{reuse_tests_order}->{$rule}};
+    my ($priority, $stage) = @{$conf->{reuse_tests_order}->{$rule}};

    # score set could change after check_start but before we add hits,
    # so we need to disable the rule in all sets
+    my @dis;
    foreach my $ss (0..3) {
-      if (exists $pms->{conf}->{scoreset}->[$ss]->{$rule}) {
-	dbg("reuse: disabling rule $rule in score set $ss");
+      if (exists $scoreset->[$ss]->{$rule}) {
        $pms->{reuse_old_scores}->{$rule}->[$ss] =
-	  $pms->{conf}->{scoreset}->[$ss]->{$rule};
-	$pms->{conf}->{scoreset}->[$ss]->{$rule} = 0;
+          $scoreset->[$ss]->{$rule};
+        $scoreset->[$ss]->{$rule} = 0;
+        push @dis, $ss;
      }
    }
+    dbg("reuse: disabling rule $rule in score sets %s",
+      join(',', @dis)) if @dis;

    # now, check for hits
-  OLD: foreach my $old_test (@{$pms->{conf}->{reuse_tests}->{$rule}}) {
-      dbg("reuse: looking for rule $old_test");
+    foreach my $old_test (@{$conf->{reuse_tests}->{$rule}}) {
      if ($old_hash->{$old_test}) {
        push @{$pms->{reuse_hits_to_add}->{"$priority $stage"}}, $rule;
        dbg("reuse: rule $rule hit, will add at priority $priority, stage " .
           "$stage");
-        last OLD;
+        last;
+      } else {
+        # Make sure rule is marked ready for meta rules
+        $pms->rule_ready($rule);
      }
    }
  }
@ -163,11 +209,15 @@ sub check_end {
  my ($self, $opts) = @_;

  my $pms = $opts->{permsgstatus};
+  my $conf = $pms->{conf};
+  my $scoreset = $conf->{scoreset};
+
+  return 0 if $conf->{run_reuse_tests_only};

  foreach my $disabled_rule (keys %{$pms->{reuse_old_scores}}) {
    foreach my $ss (0..3) {
-      next unless exists $pms->{conf}->{scoreset}->[$ss]->{$disabled_rule};
-      $pms->{conf}->{scoreset}->[$ss]->{$disabled_rule} =
+      next unless exists $scoreset->[$ss]->{$disabled_rule};
+      $scoreset->[$ss]->{$disabled_rule} =
        $pms->{reuse_old_scores}->{$disabled_rule}->[$ss];
    }
  }
@ -178,8 +228,11 @@ sub check_end {
 sub start_rules {
  my ($self, $opts) = @_;

-  return $self->_add_hits($opts->{permsgstatus}, $opts->{priority},
-			  $opts->{ruletype});
+  my $pms = $opts->{permsgstatus};
+
+  return 0 if $pms->{conf}->{run_reuse_tests_only};
+
+  return $self->_add_hits($pms, $opts->{priority}, $opts->{ruletype});
 }

 sub _add_hits {
@ -215,7 +268,7 @@ my %type_to_stage = (
  $Mail::SpamAssassin::Conf::TYPE_URI_EVALS     => "eval",
  $Mail::SpamAssassin::Conf::TYPE_META_TESTS    => "meta",
  $Mail::SpamAssassin::Conf::TYPE_RBL_EVALS     => "eval",
-		    );
+);

 sub _get_stage_from_rule {
  my  ($self, $conf, $rule) = @_;
@ -239,4 +292,4 @@ sub _get_stage_from_rule {
  }
 }

-
+1;
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
@ -143,6 +143,16 @@ sub setup_test_set_pri {
      dbg "zoom: skipping rule $name, ReplaceTags";
      next;
    }
+    # ignore regex capture rules
+    if ($conf->{capture_rules}->{$name}) {
+      dbg "zoom: skipping rule $name, regex capture";
+      next;
+    }
+    # ignore regex capture template rules
+    if ($conf->{capture_template_rules}->{$name}) {
+      dbg "zoom: skipping rule $name, regex capture template";
+      next;
+    }

    # we have the rule, and its regexp matches.  zero out the body
    # rule, so that the module can do the work instead
@ -197,6 +207,11 @@ sub check_rules_at_priority {
  $self->{one_line_body}->check_rules_at_priority($params);
 }

+sub check_cleanup {
+  my ($self, $params) = @_;
+  $self->{one_line_body}->check_cleanup($params);
+}
+
 ###########################################################################

 sub run_body_fast_scan {
--- a/upstream/lib/Mail/SpamAssassin/Plugin/SPF.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/SPF.pm
@ -29,6 +29,12 @@ This plugin checks a message against Sender Policy Framework (SPF)
 records published by the domain owners in DNS to fight email address
 forgery and make it easier to identify spams.

+It's recommended to use MTA filter (pypolicyd-spf / spf-engine etc), so this
+plugin can reuse the Received-SPF and/or Authentication-Results header results as is.
+Otherwise throughput could suffer, DNS lookups done by this plugin are not
+asynchronous.
+Those headers will also help when SpamAssassin is not able to correctly detect EnvelopeFrom.
+
 =cut

 package Mail::SpamAssassin::Plugin::SPF;
@ -53,22 +59,25 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule ("check_for_spf_pass");
-  $self->register_eval_rule ("check_for_spf_neutral");
-  $self->register_eval_rule ("check_for_spf_none");
-  $self->register_eval_rule ("check_for_spf_fail");
-  $self->register_eval_rule ("check_for_spf_softfail");
-  $self->register_eval_rule ("check_for_spf_permerror");
-  $self->register_eval_rule ("check_for_spf_temperror");
-  $self->register_eval_rule ("check_for_spf_helo_pass");
-  $self->register_eval_rule ("check_for_spf_helo_neutral");
-  $self->register_eval_rule ("check_for_spf_helo_none");
-  $self->register_eval_rule ("check_for_spf_helo_fail");
-  $self->register_eval_rule ("check_for_spf_helo_softfail");
-  $self->register_eval_rule ("check_for_spf_helo_permerror");
-  $self->register_eval_rule ("check_for_spf_helo_temperror");
-  $self->register_eval_rule ("check_for_spf_whitelist_from");
-  $self->register_eval_rule ("check_for_def_spf_whitelist_from");
+  $self->register_eval_rule ("check_for_spf_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_neutral", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_fail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_softfail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_permerror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_temperror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_pass", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_neutral", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_none", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_fail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_softfail", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_permerror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_helo_temperror", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_spf_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1
+  $self->register_eval_rule ("check_for_def_spf_welcomelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
+  $self->register_eval_rule ("check_for_def_spf_whitelist_from", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS); # removed in 4.1
+  $self->register_eval_rule ("check_spf_skipped_noenvfrom", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);

  $self->set_config($mailsaobject->{conf});

@ -85,37 +94,43 @@ sub set_config {

 =over 4

-=item whitelist_from_spf user@example.com
+=item welcomelist_from_spf user@example.com

-Works similarly to whitelist_from, except that in addition to matching
-a sender address, a check against the domain's SPF record must pass.
-The first parameter is an address to whitelist, and the second is a string
-to match the relay's rDNS.
+Previously whitelist_from_spf which will work interchangeably until 4.1.

-Just like whitelist_from, multiple addresses per line, separated by spaces,
-are OK. Multiple C<whitelist_from_spf> lines are also OK.
+Works similarly to welcomelist_from, except that in addition to matching a
+sender address, a check against the domain's SPF record must pass.  The
+first parameter is an address to welcomelist, and the second is a string to
+match the relay's rDNS.

-The headers checked for whitelist_from_spf addresses are the same headers
+Just like welcomelist_from, multiple addresses per line, separated by
+spaces, are OK.  Multiple C<welcomelist_from_spf> lines are also OK.
+
+The headers checked for welcomelist_from_spf addresses are the same headers
 used for SPF checks (Envelope-From, Return-Path, X-Envelope-From, etc).

-Since this whitelist requires an SPF check to be made, network tests must be
+Since this welcomelist requires an SPF check to be made, network tests must be
 enabled. It is also required that your trust path be correctly configured.
 See the section on C<trusted_networks> for more info on trust paths.

 e.g.

-  whitelist_from_spf joe@example.com fred@example.com
-  whitelist_from_spf *@example.com
+  welcomelist_from_spf joe@example.com fred@example.com
+  welcomelist_from_spf *@example.com

-=item def_whitelist_from_spf user@example.com
+=item def_welcomelist_from_spf user@example.com

-Same as C<whitelist_from_spf>, but used for the default whitelist entries
-in the SpamAssassin distribution.  The whitelist score is lower, because
+Previously def_whitelist_from_spf which will work interchangeably until 4.1.
+
+Same as C<welcomelist_from_spf>, but used for the default welcomelist entries
+in the SpamAssassin distribution.  The welcomelist score is lower, because
 these are often targets for spammer spoofing.

-=item unwhitelist_from_spf user@example.com
+=item unwelcomelist_from_spf user@example.com

-Used to remove a C<whitelist_from_spf> or C<def_whitelist_from_spf> entry. 
+Previously unwhitelist_from_spf which will work interchangeably until 4.1.
+
+Used to remove a C<welcomelist_from_spf> or C<def_welcomelist_from_spf> entry. 
 The specified email address has to match exactly the address previously used.

 Useful for removing undesired default entries from a distributed configuration
@ -124,17 +139,20 @@ by a local or site-specific configuration or by C<user_prefs>.
 =cut

  push (@cmds, {
-    setting => 'whitelist_from_spf',
+    setting => 'welcomelist_from_spf',
+    aliases => ['whitelist_from_spf'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST
  });

  push (@cmds, {
-    setting => 'def_whitelist_from_spf',
+    setting => 'def_welcomelist_from_spf',
+    aliases => ['def_whitelist_from_spf'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST
  });

  push (@cmds, {
-    setting => 'unwhitelist_from_spf',
+    setting => 'unwelcomelist_from_spf',
+    aliases => ['unwhitelist_from_spf'], # removed in 4.1
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_ADDRLIST,
    code => sub {
      my ($self, $key, $value, $line) = @_;
@ -144,9 +162,9 @@ by a local or site-specific configuration or by C<user_prefs>.
      unless ($value =~ /^(?:\S+(?:\s+\S+)*)$/) {
        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }
-      $self->{parser}->remove_from_addrlist('whitelist_from_spf',
+      $self->{parser}->remove_from_addrlist('welcomelist_from_spf',
                                        split (/\s+/, $value));
-      $self->{parser}->remove_from_addrlist('def_whitelist_from_spf',
+      $self->{parser}->remove_from_addrlist('def_welcomelist_from_spf',
                                        split (/\s+/, $value));
    }
  });
@ -173,38 +191,6 @@ days, weeks).
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_DURATION
  });

-=item do_not_use_mail_spf (0|1)		(default: 0)
-
-By default the plugin will try to use the Mail::SPF module for SPF checks if
-it can be loaded.  If Mail::SPF cannot be used the plugin will fall back to
-using the legacy Mail::SPF::Query module if it can be loaded.
-
-Use this option to stop the plugin from using Mail::SPF and cause it to try to
-use Mail::SPF::Query instead.
-
-=cut
-
-  push(@cmds, {
-    setting => 'do_not_use_mail_spf',
-    is_admin => 1,
-    default => 0,
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
-  });
-
-=item do_not_use_mail_spf_query (0|1)	(default: 0)
-
-As above, but instead stop the plugin from trying to use Mail::SPF::Query and
-cause it to only try to use Mail::SPF.
-
-=cut
-
-  push(@cmds, {
-    setting => 'do_not_use_mail_spf_query',
-    is_admin => 1,
-    default => 0,
-    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
-  });
-
 =item ignore_received_spf_header (0|1)	(default: 0)

 By default, to avoid unnecessary DNS lookups, the plugin will try to use the
@ -250,10 +236,23 @@ working downwards until results are successfully parsed.
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
  });

+  # Deprecated since 4.0.0, leave for backwards compatibility
+  push(@cmds, {
+    setting => 'do_not_use_mail_spf',
+    is_admin => 1,
+    default => 0,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+  push(@cmds, {
+    setting => 'do_not_use_mail_spf_query',
+    is_admin => 1,
+    default => 1,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL,
+  });
+
  $conf->{parser}->register_commands(\@cmds);
 }

-
 =item has_check_for_spf_errors

 Adds capability check for "if can()" for check_for_spf_permerror, check_for_spf_temperror, check_for_spf_helo_permerror and check_for_spf_helo_permerror
@ -262,23 +261,39 @@ Adds capability check for "if can()" for check_for_spf_permerror, check_for_spf_

 sub has_check_for_spf_errors { 1 }

+=item has_check_spf_skipped_noenvfrom
+
+Adds capability check for "if can()" for check_spf_skipped_noenvfrom
+
+=cut
+
+sub has_check_spf_skipped_noenvfrom { 1 }
+
+sub parsed_metadata {
+  my ($self, $opts) = @_;
+
+  $self->_get_sender($opts->{permsgstatus});
+
+  return 1;
+}
+
 # SPF support
 sub check_for_spf_pass {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_pass};
+  return $scanner->{spf_pass} ? 1 : 0;
 }

 sub check_for_spf_neutral {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_neutral};
+  return $scanner->{spf_neutral} ? 1 : 0;
 }

 sub check_for_spf_none {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_none};
+  return $scanner->{spf_none} ? 1 : 0;
 }

 sub check_for_spf_fail {
@ -287,43 +302,43 @@ sub check_for_spf_fail {
  if ($scanner->{spf_failure_comment}) {
    $scanner->test_log ($scanner->{spf_failure_comment});
  }
-  $scanner->{spf_fail};
+  return $scanner->{spf_fail} ? 1 : 0;
 }

 sub check_for_spf_softfail {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_softfail};
+  return $scanner->{spf_softfail} ? 1 : 0;
 }

 sub check_for_spf_permerror {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_permerror};
+  return $scanner->{spf_permerror} ? 1 : 0;
 }

 sub check_for_spf_temperror {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
-  $scanner->{spf_temperror};
+  return $scanner->{spf_temperror} ? 1 : 0;
 }

 sub check_for_spf_helo_pass {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_pass};
+  return $scanner->{spf_helo_pass} ? 1 : 0;
 }

 sub check_for_spf_helo_neutral {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_neutral};
+  return $scanner->{spf_helo_neutral} ? 1 : 0;
 }

 sub check_for_spf_helo_none {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_none};
+  return $scanner->{spf_helo_none} ? 1 : 0;
 }

 sub check_for_spf_helo_fail {
@ -332,38 +347,60 @@ sub check_for_spf_helo_fail {
  if ($scanner->{spf_helo_failure_comment}) {
    $scanner->test_log ($scanner->{spf_helo_failure_comment});
  }
-  $scanner->{spf_helo_fail};
+  return $scanner->{spf_helo_fail} ? 1 : 0;
 }

 sub check_for_spf_helo_softfail {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_softfail};
+  return $scanner->{spf_helo_softfail} ? 1 : 0;
 }

 sub check_for_spf_helo_permerror {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_permerror};
+  return $scanner->{spf_helo_permerror} ? 1 : 0;
 }

 sub check_for_spf_helo_temperror {
  my ($self, $scanner) = @_;
  $self->_check_spf ($scanner, 1) unless $scanner->{spf_helo_checked};
-  $scanner->{spf_helo_temperror};
+  return $scanner->{spf_helo_temperror} ? 1 : 0;
 }

-sub check_for_spf_whitelist_from {
+=over 4
+
+=item check_spf_skipped_noenvfrom
+
+Checks if SPF checks have been skipped because EnvelopeFrom cannot be determined.
+
+=back
+
+=cut
+
+sub check_spf_skipped_noenvfrom {
  my ($self, $scanner) = @_;
-  $self->_check_spf_whitelist($scanner) unless $scanner->{spf_whitelist_from_checked};
-  $scanner->{spf_whitelist_from};
+  $self->_check_spf ($scanner, 0) unless $scanner->{spf_checked};
+  if (!exists $scanner->{spf_sender}) {
+    return 1;
+  } else {
+    return 0;
+  }
 }

-sub check_for_def_spf_whitelist_from {
+sub check_for_spf_welcomelist_from {
  my ($self, $scanner) = @_;
-  $self->_check_def_spf_whitelist($scanner) unless $scanner->{def_spf_whitelist_from_checked};
-  $scanner->{def_spf_whitelist_from};
+  $self->_check_spf_welcomelist($scanner) unless $scanner->{spf_welcomelist_from_checked};
+  return $scanner->{spf_welcomelist_from} ? 1 : 0;
 }
+*check_for_spf_whitelist_from = \&check_for_spf_welcomelist_from; # removed in 4.1
+
+sub check_for_def_spf_welcomelist_from {
+  my ($self, $scanner) = @_;
+  $self->_check_def_spf_welcomelist($scanner) unless $scanner->{def_spf_welcomelist_from_checked};
+  return $scanner->{def_spf_welcomelist_from} ? 1 : 0;
+}
+*check_for_def_spf_whitelist_from = \&check_for_def_spf_welcomelist_from; # removed in 4.1

 sub _check_spf {
  my ($self, $scanner, $ishelo) = @_;
@ -387,7 +424,7 @@ sub _check_spf {
    $scanner->{checked_for_received_spf_header} = 1;
    dbg("spf: checking to see if the message has a Received-SPF header that we can use");

-    my @internal_hdrs = split("\n", $scanner->get('ALL-INTERNAL'));
+    my @internal_hdrs = $scanner->get('ALL-INTERNAL');
    unless ($scanner->{conf}->{use_newest_received_spf_header}) {
      # look for the LAST (earliest in time) header, it'll be the most accurate
      @internal_hdrs = reverse(@internal_hdrs);
@ -459,7 +496,7 @@ sub _check_spf {
 	  dbg("spf: could not parse result from existing Received-SPF header");
 	}

-      } elsif ($hdr =~ /^Authentication-Results:.*;\s*SPF\s*=\s*([^;]*)/i) {
+      } elsif ($hdr =~ /^(?:Arc\-)?Authentication-Results:.*;\s*SPF\s*=\s*([^;]*)/i) {
        dbg("spf: found an Authentication-Results header added by an internal host: $hdr");

        # RFC 5451 header parser - added by D. Stussy 2010-09-09:
@ -524,8 +561,6 @@ sub _check_spf {
  unless (defined $self->{has_mail_spf}) {
    my $eval_stat;
    eval {
-      die("Mail::SPF disabled by admin setting\n") if $scanner->{conf}->{do_not_use_mail_spf};
-
      require Mail::SPF;
      if (!defined $Mail::SPF::VERSION || $Mail::SPF::VERSION < 2.001) {
 	die "Mail::SPF 2.001 or later required, this is ".
@ -547,43 +582,18 @@ sub _check_spf {
      dbg("spf: using Mail::SPF for SPF checks");
      $self->{has_mail_spf} = 1;
    } else {
-      # strip the @INC paths... users are going to see it and think there's a problem even though
-      # we're going to fall back to Mail::SPF::Query (which will display the same paths if it fails)
-      $eval_stat =~ s#^Can't locate Mail/SPFd.pm in \@INC .*#Can't locate Mail/SPFd.pm#;
-      dbg("spf: cannot load Mail::SPF module or create Mail::SPF::Server object: $eval_stat");
-      dbg("spf: attempting to use legacy Mail::SPF::Query module instead");
-
-      undef $eval_stat;
-      eval {
-	die("Mail::SPF::Query disabled by admin setting\n") if $scanner->{conf}->{do_not_use_mail_spf_query};
-
-	require Mail::SPF::Query;
-	if (!defined $Mail::SPF::Query::VERSION || $Mail::SPF::Query::VERSION < 1.996) {
-	  die "Mail::SPF::Query 1.996 or later required, this is ".
-	    (defined $Mail::SPF::Query::VERSION ? $Mail::SPF::Query::VERSION : 'unknown')."\n";
-	}
-        1;
-      } or do {
-        $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-      };
-
-      if (!defined($eval_stat)) {
-	dbg("spf: using Mail::SPF::Query for SPF checks");
-	$self->{has_mail_spf} = 0;
-      } else {
-	dbg("spf: cannot load Mail::SPF::Query module: $eval_stat");
-	dbg("spf: one of Mail::SPF or Mail::SPF::Query is required for SPF checks, SPF checks disabled");
+      dbg("spf: cannot load Mail::SPF: module: $eval_stat");
+      dbg("spf: Mail::SPF is required for SPF checks, SPF checks disabled");
      $self->{no_spf_module} = 1;
      return;
    }
  }
-  }
-

  # skip SPF checks if the A/MX records are nonexistent for the From
  # domain, anyway, to avoid crappy messages from slowing us down
  # (bug 3016)
-  return if $scanner->check_for_from_dns();
+  # TODO: this will only work if the queries are ready before SPF, so never?
+  return if $scanner->{sender_host_fail} && $scanner->{sender_host_fail} == 2;

  if ($ishelo) {
    # SPF HELO-checking variant
@ -609,7 +619,7 @@ sub _check_spf {
    $scanner->{spf_failure_comment} = undef;
  }

-  my $lasthop = $self->_get_relay($scanner);
+  my $lasthop = $scanner->{relays_external}->[0];
  if (!defined $lasthop) {
    dbg("spf: no suitable relay for spf use found, skipping SPF%s check",
        $ishelo ? '-helo' : '');
@ -618,7 +628,6 @@ sub _check_spf {

  my $ip = $lasthop->{ip};	# always present
  my $helo = $lasthop->{helo};	# could be missing
-  $scanner->{sender} = '' unless $scanner->{sender_got};

  if ($ishelo) {
    unless ($helo) {
@ -627,19 +636,17 @@ sub _check_spf {
    }
    dbg("spf: checking HELO (helo=$helo, ip=$ip)");
  } else {
-    $self->_get_sender($scanner) unless $scanner->{sender_got};
-
    # TODO: we're supposed to use the helo domain as the sender identity (for
    # mfrom checks) if the sender is the null sender, however determining that
    # it's the null sender, and not just a failure to get the envelope isn't
    # exactly trivial... so for now we'll just skip the check

-    if (!$scanner->{sender}) {
+    if (!$scanner->{spf_sender}) {
      # we already dbg'd that we couldn't get an Envelope-From and can't do SPF
      return;
    }
    dbg("spf: checking EnvelopeFrom (helo=%s, ip=%s, envfrom=%s)",
-        ($helo ? $helo : ''), $ip, $scanner->{sender});
+        ($helo ? $helo : ''), $ip, $scanner->{spf_sender});
  }

  # this test could probably stand to be more strict, but try to test
@ -657,11 +664,8 @@ sub _check_spf {

  my ($result, $comment, $text, $err);

-  # use Mail::SPF if it was available, otherwise use the legacy Mail::SPF::Query
-  if ($self->{has_mail_spf}) {
-
  # TODO: currently we won't get to here for a mfrom check with a null sender
-    my $identity = $ishelo ? $helo : ($scanner->{sender}); # || $helo);
+  my $identity = $ishelo ? $helo : ($scanner->{spf_sender}); # || $helo);

  unless ($identity) {
    dbg("spf: cannot determine %s identity, skipping %s SPF check",
@ -685,61 +689,21 @@ sub _check_spf {

  my $timeout = $scanner->{conf}->{spf_timeout};

-    my $timer = Mail::SpamAssassin::Timeout->new(
+  my $timer_spf = Mail::SpamAssassin::Timeout->new(
              { secs => $timeout, deadline => $scanner->{master_deadline} });
-    $err = $timer->run_and_catch(sub {
-
+  $err = $timer_spf->run_and_catch(sub {
    my $query = $self->{spf_server}->process($request);
-
    $result = $query->code;
    $comment = $query->authority_explanation if $query->can("authority_explanation");
    $text = $query->text;
-
  });

-
-  } else {
-
-    if (!$helo) {
-      dbg("spf: cannot get HELO, cannot use Mail::SPF::Query, consider installing Mail::SPF");
-      return;
-    }
-
-    # TODO: if we start doing checks on the null sender using the helo domain
-    # be sure to fix this so that it uses the correct sender identity
-    my $query;
-    eval {
-      $query = Mail::SPF::Query->new (ip => $ip,
-				    sender => $scanner->{sender},
-				    helo => $helo,
-				    debug => 0,
-				    trusted => 0);
-      1;
-    } or do {
-      my $eval_stat = $@ ne '' ? $@ : "errno=$!";  chomp $eval_stat;
-      dbg("spf: cannot create Mail::SPF::Query object: $eval_stat");
-      return;
-    };
-
-    my $timeout = $scanner->{conf}->{spf_timeout};
-
-    my $timer = Mail::SpamAssassin::Timeout->new(
-                { secs => $timeout, deadline => $scanner->{master_deadline} });
-    $err = $timer->run_and_catch(sub {
-
-      ($result, $comment) = $query->result();
-
-    });
-
-  } # end of differences between Mail::SPF and Mail::SPF::Query
-
  if ($err) {
    chomp $err;
    warn("spf: lookup failed: $err\n");
    return 0;
  }

-
  $result ||= 'timeout';	# bug 5077
  $comment ||= '';
  $comment =~ s/\s+/ /gs;	# no newlines please
@ -774,138 +738,131 @@ sub _check_spf {
    }
  }

-  dbg("spf: query for $scanner->{sender}/$ip/$helo: result: $result, comment: $comment, text: $text");
-}
-
-sub _get_relay {
-  my ($self, $scanner) = @_;
-
-  # dos: first external relay, not first untrusted
-  return $scanner->{relays_external}->[0];
+  if ($ishelo) {
+    dbg("spf: query for $ip/$helo: result: $result, comment: $comment, text: $text");
+  } else {
+    dbg("spf: query for $scanner->{spf_sender}/$ip/$helo: result: $result, comment: $comment, text: $text");
+  }
 }

 sub _get_sender {
  my ($self, $scanner) = @_;
-  my $sender;

-  $scanner->{sender_got} = 1;
-  $scanner->{sender} = '';
-
-  my $relay = $self->_get_relay($scanner);
+  my $relay = $scanner->{relays_external}->[0];
  if (defined $relay) {
-    $sender = $relay->{envfrom};
+    my $sender = $relay->{envfrom};
+    if (defined $sender) {
+      dbg("spf: found EnvelopeFrom '$sender' in first external Received header");	
+      $scanner->{spf_sender} = lc $sender;
+    } else {
+      dbg("spf: EnvelopeFrom not found in first external Received header");
+    }
  }

-  if ($sender) {
-    dbg("spf: found Envelope-From in first external Received header");
-  }
-  else {
+  if (!exists $scanner->{spf_sender}) {
    # We cannot use the env-from data, since it went through 1 or more relays 
    # since the untrusted sender and they may have rewritten it.
-    if ($scanner->{num_relays_trusted} > 0 && !$scanner->{conf}->{always_trust_envelope_sender}) {
-      dbg("spf: relayed through one or more trusted relays, cannot use header-based Envelope-From, skipping");
-      return;
-    }
-
-    # we can (apparently) use whatever the current Envelope-From was,
+    if ($scanner->{num_relays_trusted} > 0 &&
+          !$scanner->{conf}->{always_trust_envelope_sender}) {
+      dbg("spf: relayed through one or more trusted relays, ".
+            "cannot use header-based EnvelopeFrom");
+    } else {
+      # we can (apparently) use whatever the current EnvelopeFrom was,
      # from the Return-Path, X-Envelope-From, or whatever header.
      # it's better to get it from Received though, as that is updated
      # hop-by-hop.
-    $sender = $scanner->get("EnvelopeFrom:addr");
-  }
-
-  if (!$sender) {
-    dbg("spf: cannot get Envelope-From, cannot use SPF");
-    return;  # avoid setting $scanner->{sender} to undef
-  }
-
-  return $scanner->{sender} = lc $sender;
-}
-
-sub _check_spf_whitelist {
-  my ($self, $scanner) = @_;
-
-  $scanner->{spf_whitelist_from_checked} = 1;
-  $scanner->{spf_whitelist_from} = 0;
-
-  # if we've already checked for an SPF PASS and didn't get it don't waste time
-  # checking to see if the sender address is in the spf whitelist
-  if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
-    dbg("spf: whitelist_from_spf: already checked spf and didn't get pass, skipping whitelist check");
-    return;
-  }
-
-  $self->_get_sender($scanner) unless $scanner->{sender_got};
-
-  unless ($scanner->{sender}) {
-    dbg("spf: spf_whitelist_from: could not find usable envelope sender");
-    return;
-  }
-
-  $scanner->{spf_whitelist_from} = $self->_wlcheck($scanner,'whitelist_from_spf');
-  if (!$scanner->{spf_whitelist_from}) {
-    $scanner->{spf_whitelist_from} = $self->_wlcheck($scanner, 'whitelist_auth');
-  }
-
-  # if the message doesn't pass SPF validation, it can't pass an SPF whitelist
-  if ($scanner->{spf_whitelist_from}) {
-    if ($self->check_for_spf_pass($scanner)) {
-      dbg("spf: whitelist_from_spf: $scanner->{sender} is in user's WHITELIST_FROM_SPF and passed SPF check");
+      my $sender = ($scanner->get("EnvelopeFrom:addr"))[0];
+      if (defined $sender) {
+        dbg("spf: found EnvelopeFrom '$sender' from header");
+        $scanner->{spf_sender} = lc $sender;
      } else {
-      dbg("spf: whitelist_from_spf: $scanner->{sender} is in user's WHITELIST_FROM_SPF but failed SPF check");
-      $scanner->{spf_whitelist_from} = 0;
+        dbg("spf: EnvelopeFrom header not found");
      }
-  } else {
-    dbg("spf: whitelist_from_spf: $scanner->{sender} is not in user's WHITELIST_FROM_SPF");
+    }
+  }
+
+  if (!exists $scanner->{spf_sender}) {
+    dbg("spf: cannot get EnvelopeFrom, cannot use SPF by DNS");
  }
 }

-sub _check_def_spf_whitelist {
+sub _check_spf_welcomelist {
  my ($self, $scanner) = @_;

-  $scanner->{def_spf_whitelist_from_checked} = 1;
-  $scanner->{def_spf_whitelist_from} = 0;
+  $scanner->{spf_welcomelist_from_checked} = 1;
+  $scanner->{spf_welcomelist_from} = 0;

  # if we've already checked for an SPF PASS and didn't get it don't waste time
-  # checking to see if the sender address is in the spf whitelist
+  # checking to see if the sender address is in the spf welcomelist
  if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
-    dbg("spf: def_spf_whitelist_from: already checked spf and didn't get pass, skipping whitelist check");
+    dbg("spf: welcomelist_from_spf: already checked spf and didn't get pass, skipping welcomelist check");
    return;
  }

-  $self->_get_sender($scanner) unless $scanner->{sender_got};
-
-  unless ($scanner->{sender}) {
-    dbg("spf: def_spf_whitelist_from: could not find usable envelope sender");
+  if (!$scanner->{spf_sender}) {
+    dbg("spf: spf_welcomelist_from: no EnvelopeFrom available for welcomelist check");
    return;
  }

-  $scanner->{def_spf_whitelist_from} = $self->_wlcheck($scanner,'def_whitelist_from_spf');
-  if (!$scanner->{def_spf_whitelist_from}) {
-    $scanner->{def_spf_whitelist_from} = $self->_wlcheck($scanner, 'def_whitelist_auth');
-  }
+  $scanner->{spf_welcomelist_from} =
+    $self->_wlcheck($scanner, 'welcomelist_from_spf') ||
+    $self->_wlcheck($scanner, 'welcomelist_auth');

-  # if the message doesn't pass SPF validation, it can't pass an SPF whitelist
-  if ($scanner->{def_spf_whitelist_from}) {
+  # if the message doesn't pass SPF validation, it can't pass an SPF welcomelist
+  if ($scanner->{spf_welcomelist_from}) {
    if ($self->check_for_spf_pass($scanner)) {
-      dbg("spf: def_whitelist_from_spf: $scanner->{sender} is in DEF_WHITELIST_FROM_SPF and passed SPF check");
+      dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is in user's WELCOMELIST_FROM_SPF and passed SPF check");
    } else {
-      dbg("spf: def_whitelist_from_spf: $scanner->{sender} is in DEF_WHITELIST_FROM_SPF but failed SPF check");
-      $scanner->{def_spf_whitelist_from} = 0;
+      dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is in user's WELCOMELIST_FROM_SPF but failed SPF check");
+      $scanner->{spf_welcomelist_from} = 0;
    }
  } else {
-    dbg("spf: def_whitelist_from_spf: $scanner->{sender} is not in DEF_WHITELIST_FROM_SPF");
+    dbg("spf: welcomelist_from_spf: $scanner->{spf_sender} is not in user's WELCOMELIST_FROM_SPF");
+  }
+}
+
+sub _check_def_spf_welcomelist {
+  my ($self, $scanner) = @_;
+
+  $scanner->{def_spf_welcomelist_from_checked} = 1;
+  $scanner->{def_spf_welcomelist_from} = 0;
+
+  # if we've already checked for an SPF PASS and didn't get it don't waste time
+  # checking to see if the sender address is in the spf welcomelist
+  if ($scanner->{spf_checked} && !$scanner->{spf_pass}) {
+    dbg("spf: def_spf_welcomelist_from: already checked spf and didn't get pass, skipping welcomelist check");
+    return;
+  }
+
+  if (!$scanner->{spf_sender}) {
+    dbg("spf: def_spf_welcomelist_from: could not find usable envelope sender");
+    return;
+  }
+
+  $scanner->{def_spf_welcomelist_from} =
+    $self->_wlcheck($scanner, 'def_welcomelist_from_spf') ||
+    $self->_wlcheck($scanner, 'def_welcomelist_auth');
+
+  # if the message doesn't pass SPF validation, it can't pass an SPF welcomelist
+  if ($scanner->{def_spf_welcomelist_from}) {
+    if ($self->check_for_spf_pass($scanner)) {
+      dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is in DEF_WELCOMELIST_FROM_SPF and passed SPF check");
+    } else {
+      dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is in DEF_WELCOMELIST_FROM_SPF but failed SPF check");
+      $scanner->{def_spf_welcomelist_from} = 0;
+    }
+  } else {
+    dbg("spf: def_welcomelist_from_spf: $scanner->{spf_sender} is not in DEF_WELCOMELIST_FROM_SPF");
  }
 }

 sub _wlcheck {
  my ($self, $scanner, $param) = @_;
-  if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
+  if (defined ($scanner->{conf}->{$param}->{$scanner->{spf_sender}})) {
    return 1;
  } else {
-    study $scanner->{sender};  # study is a no-op since perl 5.16.0
    foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
-      if ($scanner->{sender} =~ qr/$regexp/i) {
+      if ($scanner->{spf_sender} =~ $regexp) {
        return 1;
      }
    }
--- a/upstream/lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/Shortcircuit.pm
@ -58,7 +58,7 @@ sub new {
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);

-  $self->register_eval_rule("check_shortcircuit");
+  $self->register_eval_rule("check_shortcircuit"); # type does not matter
  $self->set_config($mailsaobject->{conf});

  return $self;
@ -88,6 +88,14 @@ that.
 To override a test that uses shortcircuiting, you can set the classification
 type to C<off>.

+Note that DNS and other network lookups are launched when SA reaches
+priority -100.  If you want to shortcircuit scanning before any network
+queries are sent, you need to set lower than -100 priority to any such rule,
+like -200 as in the examples below.
+
+Shortcircuited test will be automatically set to priority -200, but only if
+the original priority is unchanged at default 0.
+
 =over 4

 =item on
@ -99,7 +107,7 @@ shortcircuited.  This would allow you, for example, to define a rule such as
  body TEST /test/
  describe TEST test rule that scores barely over spam threshold
  score TEST 5.5
-  priority TEST -100
+  priority TEST -200
  shortcircuit TEST on

 The result of a message hitting the above rule would be a final score of 5.5,
@ -113,11 +121,11 @@ Disables shortcircuiting on said rule.

 Shortcircuit the rule using a set of defaults; override the default score of
 this rule with the score from C<shortcircuit_spam_score>, set the
-C<noautolearn> tflag, and set priority to C<-100>.  In other words,
+C<noautolearn> tflag, and set priority to C<-200>.  In other words,
 equivalent to:

  shortcircuit TEST on
-  priority TEST -100
+  priority TEST -200
  score TEST 100
  tflags TEST noautolearn

@ -125,11 +133,11 @@ equivalent to:

 Shortcircuit the rule using a set of defaults; override the default score of
 this rule with the score from C<shortcircuit_ham_score>, set the C<noautolearn>
-and C<nice> tflags, and set priority to C<-100>.   In other words, equivalent
+and C<nice> tflags, and set priority to C<-200>.   In other words, equivalent
 to:

  shortcircuit TEST on
-  priority TEST -100
+  priority TEST -200
  score TEST -100
  tflags TEST noautolearn nice

@ -141,23 +149,22 @@ to:
    setting => 'shortcircuit',
    code => sub {
      my ($self, $key, $value, $line) = @_;
-      my ($rule,$type);
      unless (defined $value && $value !~ /^$/) {
        return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
      }
-      if ($value =~ /^(\S+)\s+(\S+)$/) {
-        $rule=$1;
-        $type=$2;
-      } else {
+      local($1,$2);
+      unless ($value =~ /^(\w+)\s+(\w+)$/) {
        return $Mail::SpamAssassin::Conf::INVALID_VALUE;
      }
+      my ($rule, $type) = ($1, $2);

-      if ($type =~ m/^(?:spam|ham)$/) {
+      if ($type eq "ham" || $type eq "spam") {
        dbg("shortcircuit: adding $rule using abbreviation $type");

        # set the defaults:
        $self->{shortcircuit}->{$rule} = $type;
-        $self->{priority}->{$rule} = -100;
+        # don't override existing priority unless it's default 0
+        $self->{priority}->{$rule} ||= -200;

        my $tf = $self->{tflags}->{$rule};
        $self->{tflags}->{$rule} = ($tf ? $tf." " : "") .
@ -227,7 +234,7 @@ sub hit_rule {
  my $rule = $params->{rulename};

  # don't s/c if we're linting
-  return if ($scan->{lint_rules});
+  return if ($self->{main}->{lint_rules});

  # don't s/c if we're in compile_now()
  return if ($self->{am_compiling});
@ -256,6 +263,8 @@ sub hit_rule {
    $scscore = $score;
  }

+  $scan->{shortcircuited} = 1;
+
  # bug 5256: if we short-circuit, don't do auto-learning
  $scan->{disable_auto_learning} = 1;
  $scan->got_hit('SHORTCIRCUIT', '', score => $scscore);
@ -306,6 +315,6 @@ sub compile_now_finish {

 =head1 SEE ALSO

-C<http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109>
+C<https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109>

 =cut
--- a/upstream/lib/Mail/SpamAssassin/Plugin/SpamCop.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/SpamCop.pm
@ -29,13 +29,13 @@ SpamCop is a service for reporting spam.  SpamCop determines the origin
 of unwanted email and reports it to the relevant Internet service
 providers.  By reporting spam, you have a positive impact on the
 problem.  Reporting unsolicited email also helps feed spam filtering
-systems, including, but not limited to, the SpamCop blacklist used in
+systems, including, but not limited to, the SpamCop blocklist used in
 SpamAssassin as a DNSBL.

 Note that spam reports sent by this plugin to SpamCop each include the
 entire spam message.

-See http://www.spamcop.net/ for more information about SpamCop.
+See https://www.spamcop.net/ for more information about SpamCop.

 =cut

@ -43,13 +43,13 @@ package Mail::SpamAssassin::Plugin::SpamCop;

 use Mail::SpamAssassin::Plugin;
 use Mail::SpamAssassin::Logger;
+use Mail::SpamAssassin::Util qw(untaint_var);
 use IO::Socket;
 use strict;
 use warnings;
 # use bytes;
 use re 'taint';

-use constant HAS_NET_DNS => eval { require Net::DNS; };
 use constant HAS_NET_SMTP => eval { require Net::SMTP; };

 our @ISA = qw(Mail::SpamAssassin::Plugin);
@ -63,7 +63,7 @@ sub new {
  bless ($self, $class);

  # are network tests enabled?
-  if (!$mailsaobject->{local_tests_only} && HAS_NET_DNS && HAS_NET_SMTP) {
+  if (!$mailsaobject->{local_tests_only} && HAS_NET_SMTP) {
    $self->{spamcop_available} = 1;
    dbg("reporter: network tests on, attempting SpamCop");
  }
@ -114,7 +114,7 @@ guess will be used as the From: address in SpamCop reports.
 =item spamcop_to_address user@example.com   (default: generic reporting address)

 Your customized SpamCop report submission address.  You need to obtain
-this address by registering at C<http://www.spamcop.net/>.  If this is
+this address by registering at C<https://www.spamcop.net/>.  If this is
 not set, SpamCop reports will go to a generic reporting address for
 SpamAssassin users and your reports will probably have less weight in
 the SpamCop system.
@ -153,6 +153,35 @@ size that SpamCop will accept at the time of release.
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
  });

+=item spamcop_relayhost server:port  (default: direct connection to SpamCop)
+
+Direct connection to SpamCop servers (port 587) is used for report
+submission by default.  If this is undesirable or blocked by local firewall
+policies, you can specify a local SMTP relayhost to forward reports. 
+Relayhost should be configured to not scan the report, for example by using
+a separate submission port.  SSL or authentication is not supported.
+
+=cut
+
+  push (@cmds, {
+    setting => 'spamcop_relayhost',
+    default => undef,
+    type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
+    code => sub {
+      my ($self, $key, $value, $line) = @_;
+      if ($value =~ /^(\S+):(\d{2,5})$/) {
+	$self->{spamcop_relayhost} = untaint_var($1);
+	$self->{spamcop_relayport} = untaint_var($2);
+      }
+      elsif ($value =~ /^$/) {
+	return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;
+      }
+      else {
+	return $Mail::SpamAssassin::Conf::INVALID_VALUE;
+      }
+    }
+  });
+
  $conf->{parser}->register_commands(\@cmds);
 }

@ -257,17 +286,31 @@ EOM

  # send message
  my $failure;
-  my $mx = $head{To};
  my $hello = Mail::SpamAssassin::Util::fq_hostname() || $from;
-  $mx =~ s/.*\@//;
  $hello =~ s/.*\@//;
-  for my $rr (Net::DNS::mx($mx)) {
-    my $exchange = Mail::SpamAssassin::Util::untaint_hostname($rr->exchange);
-    next unless $exchange;
-    my $smtp;
-    if ($smtp = Net::SMTP->new($exchange,
+
+  my @mxs;
+  if ($options->{report}->{conf}->{spamcop_relayhost}) {
+    push @mxs, $options->{report}->{conf}->{spamcop_relayhost};
+  } else {
+    my $mx = $head{To};
+    $mx =~ s/.*\@//;
+    foreach my $rr (Net::DNS::mx($mx)) {
+      if (defined $rr->exchange) {
+        push @mxs, Mail::SpamAssassin::Util::untaint_hostname($rr->exchange);
+      }
+    }
+    if (!@mxs) {
+      warn("reporter: failed to resolve SpamCop MX servers\n");
+      return 0;
+    }
+  }
+  my $port = $options->{report}->{conf}->{spamcop_relayport} || 587;
+
+  for my $exchange (@mxs) {
+    if (my $smtp = Net::SMTP->new($exchange,
 			       Hello => $hello,
-			       Port => 587,
+			       Port => $port,
 			       Timeout => 10))
    {
      if ($smtp->mail($from) && smtp_dbg("FROM $from", $smtp) &&
--- a/upstream/lib/Mail/SpamAssassin/Plugin/TextCat.pm
+++ b/upstream/lib/Mail/SpamAssassin/Plugin/TextCat.pm
@ -87,8 +87,8 @@ sub new {
    }
  }

-  $self->register_eval_rule("check_language");
-  $self->register_eval_rule("check_body_8bits");
+  $self->register_eval_rule("check_language"); # type does not matter
+  $self->register_eval_rule("check_body_8bits", $Mail::SpamAssassin::Conf::TYPE_BODY_EVALS);

  $self->set_config($mailsaobject->{conf});

@ -114,7 +114,7 @@ confidence. In that case, no action is taken.

 The rule C<UNWANTED_LANGUAGE_BODY> is triggered if none of the languages
 detected are in the "ok" list. Note that this is the only effect of the
-"ok" list. It does not act as a whitelist against any other form of spam
+"ok" list. It does not act as a welcomelist against any other form of spam
 scanning.

 In your configuration, you must use the two or three letter language
@ -406,7 +406,7 @@ sub load_models {
  # create language ngram maps once
  for (@lm) {
    # look for end delimiter
-    if (/^0 (.+)/) {
+    if (index($_, '0 ') == 0 && /^0 (.+)/) {
      $ngram->{"language"} = $1;
      push(@nm, $ngram);
      # reset for next language
@ -449,7 +449,14 @@ sub classify {
      $p += exists($ngram->{$_}) ? abs($ngram->{$_} - $i) : $maxp;
      $i++;
    }
-    $results{$language} = $p;
+    # Most latin1 languages have xx and xx.utf8 alternatives (those which
+    # don't have should be named xx.utf-8).  Always strip .utf8 from name,
+    # it will not be accurate as matching will depend on normalize_charset
+    # and mail encoding.  Keep track of the best score for alternatives.
+    $language = $short  if index($language, '.utf8') > 0;
+    if (!exists $results{$language} || $results{$language} > $p) {
+      $results{$language} = $p
+    }
  }
  my @results = sort { $results{$a} <=> $results{$b} } keys %results;

@ -459,7 +466,11 @@ sub classify {
  my @results_tag;
  foreach (@results[0..19]) {
    last unless defined $_;
+    if($best != 0) {
      push @results_tag, sprintf "%s:%s(%.02f)", $_, $results{$_}, $results{$_} / $best;
+    } else {
+      push @results_tag, sprintf "%s:%s(unknown)", $_, $results{$_};
+    }
  }
  $opts->{permsgstatus}->set_tag('TEXTCATRESULTS', join(' ', @results_tag));

@ -539,11 +550,14 @@ sub extract_metadata {

  my $body = $msg->get_rendered_body_text_array();
  $body = join("\n", @{$body});
-  $body =~ s/^Subject://i;
+
+  # Strip subject prefixes, enhances results
+  $body =~ s/^(?:[a-z]{2,12}:\s*){1,10}//i;

  # Strip anything that looks like url or email, enhances results
-  $body =~ s{https?://\S+}{ }gs;
-  $body =~ s{\S+?\@[a-zA-Z]\S+}{ }gs;
+  $body =~ s/https?(?:\:\/\/|:&#x2F;&#x2F;|%3A%2F%2F)\S{1,1024}/ /gs;
+  $body =~ s/\S{1,64}?\@[a-zA-Z]\S{1,128}/ /gs;
+  $body =~ s/\bwww\.\S{1,128}/ /gs;

  my $len = length($body);
  # truncate after 10k; that should be plenty to classify it
--- a/Show More
+++ b/Show More