mirror of
https://git.proxmox.com/git/proxmox-spamassassin
synced 2025-04-28 12:19:37 +00:00
# <@LICENSE>
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to you under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at:
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# </@LICENSE>

=head1 NAME

Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records

=head1 SYNOPSIS

The TxRep (Reputation) plugin is designed as an improved replacement for the AWL
(Auto-Welcomelist) plugin. It adjusts the final spam score of a message by looking
up the sender's reputation and taking it into consideration.

To try TxRep out, you B<have to> first disable the AWL plugin (if enabled) and
back up its database. AWL is loaded in v310.pre and can be disabled by
commenting out the loadplugin line:

  # loadplugin Mail::SpamAssassin::Plugin::AWL

If AWL is not disabled, TxRep will refuse to run.

TxRep should be enabled by uncommenting the following line in v341.pre:

  loadplugin Mail::SpamAssassin::Plugin::TxRep

Use the supplied 60_txreputation.cf file or add these lines to a .cf file:

  header         TXREP   eval:check_senders_reputation()
  describe       TXREP   Score normalizing based on sender's reputation
  tflags         TXREP   userconf noautolearn
  priority       TXREP   1000


=head1 DESCRIPTION

This plugin is intended to replace the former AWL (Auto-Welcomelist). Although the
concept and the scope differ, the purpose remains the same: normalizing spam
score results based on the sender's previous history. The name was intentionally
changed from "whitelist" to "reputation" to avoid any confusion, since the resulting
score can be adjusted in both directions.

The TxRep plugin keeps track of the average SpamAssassin score for senders.
Senders are tracked using multiple identifiers, or their combinations: the From:
email address, the originating IP and/or an originating block of IPs, the sender's
domain name, the DKIM signature, and the HELO name. TxRep then uses the average score
to reduce the variability in scoring from message to message, and modifies the final
score by pushing the result towards the historical average. This improves the
accuracy of filtering for most email.

In comparison with the original AWL plugin, several conceptual changes were implemented
in TxRep:

1. B<Scoring> - AWL tracks the number of messages received from each sender, but
does not take that count into account when calculating the corrective score for a new
message. For example, if a sender previously sent a single ham message with a score
of -5 and then sends a second one with a score of +10, AWL issues a corrective score
that pulls the result toward -5. With the default C<auto_welcomelist_factor> of 0.5,
the resulting score would be only 2.5, and it would be exactly the same if the sender
had previously sent 1,000 messages averaging -5. TxRep tries to take full advantage
of the collected data: it adjusts the final score using not only the mean reputation
score stored in the database, but also the number of messages already seen from the
sender. The exact formula is given in the section L</C<txrep_factor>>.

2. B<Learning> - AWL ignores any spam/ham learning. In fact it works against it, which
often leads to a frustrating situation: a user repeatedly tags all messages from a
given sender as spam (or ham), yet for every new message from that sender AWL adjusts
the score back toward the historical average, which does B<not> include the learned
scores. TxRep changes this: every spam/ham learning is recorded in the reputation
database and is therefore taken into consideration for future email from the
respective sender. See the section L</"LEARNING SPAM / HAM"> for more details.

3. B<Auto-Learning> - in certain situations SpamAssassin may declare a message
obvious spam or ham and launch the auto-learning process, so that the message can be
re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin
readjusts the stored reputation by the value defined by L</C<txrep_learn_penalty>>
or L</C<txrep_learn_bonus>>, respectively. The auto-learning score thresholds can be
tuned, or auto-learning disabled completely, through the setting L</C<txrep_autolearn>>.

4. B<Relearning> - messages that were wrongly learned or auto-learned can be relearned.
The old reputations are removed from the database and new ones are added in their
place. Relearning works better when message tracking is enabled through the
L</C<txrep_track_messages>> option. Without it, the relearned score is simply added to
the reputation, without removing the old ones.

5. B<Aging> - with AWL, every historical record of a given sender has the same weight.
This means that changes in a sender's behavior, or modified SA rules, may take a long
time to show an effect, or may be virtually negated by the AWL normalization,
especially for senders with a high count of past messages and a low recent frequency.
It is particularly counterproductive when the administrator detects new patterns in
certain messages and adds new rules to better tag such messages as spam or ham: AWL
practically eliminates the effect of the new rules by adjusting the score back toward
the (wrong) historical average. Only lowering the C<auto_welcomelist_factor> would
help, but that would also reduce the overall impact of AWL and call its purpose into
question. Besides L</C<txrep_factor>> (the replacement for C<auto_welcomelist_factor>),
TxRep also introduces L</C<txrep_dilution_factor>>, which helps with this issue by
progressively reducing the impact of past records. More details can be found in the
description of the factor below.

6. B<Blocklisting and Welcomelisting> - when welcomelisting or blocklisting is requested
through SpamAssassin's API, AWL adjusts the historical total score of the plain email
address without IP (and deletes the records bound to an IP). However, since new records
with an IP are added during reception, the blocklisted entry ceases to act during
scanning. TxRep always uses the record of the plain email address without IP together
with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight
factor for the EMAIL reputation is set to zero). AWL uses a score of 100 (or -100)
for blocklisting (or welcomelisting). TxRep increases the value proportionally to the
weight factor of the EMAIL reputation. This is explained in detail in the section
L</"BLOCKLISTING / WELCOMELISTING">. TxRep can also blocklist or welcomelist
IP addresses, domain names, and dotless HELO names.

7. B<Sender Identification> - AWL identifies a sender by the email address used and
the originating IP address (more precisely, the part of it defined by the mask setting).
The main purpose of this measure is to avoid assigning falsely good scores to spammers
who spoof known email addresses. The disadvantage shows with senders who send from
frequently changing locations, or who connect through dynamic IP addresses that are not
within the block defined by the mask setting. Their score is difficult or sometimes
impossible to track. Another disadvantage shows, for example, with a spammer
persistently sending spam from the same IP address under different email addresses:
AWL will not find the previous scores unless the same email address is reused. TxRep
uses several identifiers and creates a separate database entry for each of them. It
tracks not only the email/IP address combination like AWL, but also the standalone
email address (regardless of the originating IP), the standalone IP (regardless of the
email address used), the domain name of the email address, the DKIM signature, and the
HELO name of the connecting host. The influence of each individual identifier can be
tuned with the weight factors described in the section L</REPUTATION WEIGHTS>.

8. B<Message Tracking> - TxRep (optionally) keeps track of the IDs of messages already
scanned and/or learned. This prevents the reputation score from being strengthened
simply by rescanning or relearning the same message multiple times. At the same time
it allows the proper relearning of messages that were once wrongly learned, or
relearning them after the learn penalty or bonus has been changed. See the option
L</C<txrep_track_messages>>.

9. B<User and Global Storages> - the per-user setup of SpamAssassin is usually
recommended, because each user may have quite different requirements and may receive
quite different kinds of email. Especially with the Bayesian and AWL plugins, the
efficiency is much better when SpamAssassin is trained on spam and ham separately
for each user. The disadvantage, however, is that senders and emails already learned
many times by different users have to be relearned, without any recognized history,
whenever they arrive at another user. TxRep combines the advantages of both systems.
It can use dual storages: a global common storage, where all email processed by
SpamAssassin is recorded, and a local storage separate for each user, with reputation
data from that user's email only. See the setting L</C<txrep_user2global_ratio>> for
more details.

10. B<Outbound Welcomelisting> - when a local user sends a message to an email address,
we assume that the user also needs to see any reply, hence the recipient's address
should be welcomelisted. When SpamAssassin also scans outgoing email, when local users
send email through the SMTP server where SA is installed, and when internal networks
are defined, TxRep improves the reputation of all 'To:' and 'CC' addresses of messages
originating in the internal networks. Details can be found at the setting
L</C<txrep_welcomelist_out>>.

The two plugins (AWL and TxRep) cannot coexist. AWL has to be disabled to allow TxRep
to run. TxRep reuses the database handling of the original AWL module, and some of
its parameters bound to the database handler modules. By default, TxRep creates its
own database, but the original auto-welcomelist can be reused as a starting point: the
AWL database can be renamed to the name defined in the TxRep settings, and TxRep will
start using it. Back up the original auto-welcomelist database first, so that you can
switch back to the original state.

The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and
spamassassin/AutoWelcomelist.pm. Two other AWL files, spamassassin/DBBasedAddrList.pm
and spamassassin/SQLBasedAddrList.pm, are still needed.


=head1 TEMPLATE TAGS

This plugin module adds the following C<tags> that can be used as
placeholders in certain options. See L<Mail::SpamAssassin::Conf>
for more information on TEMPLATE TAGS.

 _TXREPXXXY_              TXREP modifier
 _TXREPXXXYMEAN_          Mean score on which TXREP modification is based
 _TXREPXXXYCOUNT_         Number of messages on which TXREP modification is based
 _TXREPXXXYPRESCORE_      Score before TXREP
 _TXREPXXXYUNKNOWN_       New sender (not found in the TXREP list)

The XXX part of the tag takes the form of one of the following IDs, depending
on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or HELO. The Y appendix
ID is used only in the case of dual storage, and takes the form of either U (for
user storage reputations), or G (for global storage reputations).
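
To make the naming scheme concrete, the following sketch expands the XXX and Y
placeholders into full tag names. It is only an illustration of the pattern described
above; the helper C<txrep_tag_names> is hypothetical and is not part of the plugin.

```perl
use strict;
use warnings;

# Reputation IDs and tag suffixes as listed in the TEMPLATE TAGS section.
my @ids      = qw(EMAIL EMAILIP IP DOMAIN HELO);
my @suffixes = ('', qw(MEAN COUNT PRESCORE UNKNOWN));

# Hypothetical helper: build all tag names for one storage.
# $storage is '' (single storage), 'U' (user) or 'G' (global).
sub txrep_tag_names {
    my ($storage) = @_;
    my @names;
    for my $id (@ids) {
        for my $suffix (@suffixes) {
            push @names, '_TXREP' . $id . $storage . $suffix . '_';
        }
    }
    return @names;
}

print "$_\n" for txrep_tag_names('U');
```

With dual storage, the user-storage mean for the email reputation would therefore be
referenced as the tag named TXREPEMAILUMEAN, wrapped in underscores.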

=cut

package Mail::SpamAssassin::Plugin::TxRep;

use strict;
use warnings;
# use bytes;
use re 'taint';

use NetAddr::IP 4.000;                          # qw(:upper);
use Mail::SpamAssassin::Plugin;
use Mail::SpamAssassin::Plugin::Bayes;
use Mail::SpamAssassin::Util qw(untaint_var);
use Mail::SpamAssassin::Logger;

our @ISA = qw(Mail::SpamAssassin::Plugin);


###########################################################################
sub new {                       # constructor: register the eval rule
###########################################################################
  my ($class, $main) = @_;

  $class   = ref($class) || $class;
  my $self = $class->SUPER::new($main);
  bless($self, $class);

  $self->{main}   = $main;
  $self->{conf}   = $main->{conf};
  $self->{factor} = $main->{conf}->{txrep_factor};
  $self->register_eval_rule("check_senders_reputation", $Mail::SpamAssassin::Conf::TYPE_HEAD_EVALS);
  $self->set_config($main->{conf});

  # only the default conf loaded here, do nothing here requiring
  # the runtime settings
  dbg("TxRep: new object created");
  return $self;
}


###########################################################################
sub set_config {
###########################################################################
  my($self, $conf) = @_;
  my @cmds;

# -------------------------------------------------------------------------
=head1 USER PREFERENCES

The following options can be used in both site-wide (C<local.cf>) and
user-specific (C<user_prefs>) configuration files to customize how
SpamAssassin handles incoming email messages.

=over 4

=item B<use_txrep>

  0 | 1                 (default: 0)

Whether to use the TxRep reputation system. TxRep tracks the long-term average
score for each sender and then shifts the score of new messages toward that
long-term average. This can increase or decrease the score for messages,
depending on the long-term behavior of the particular correspondent.

Note that certain tests are ignored when determining the final message score:

 - rules with tflags set to 'noautolearn'

=cut

  push (@cmds, {
    setting     => 'use_txrep',
    default     => 0,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });


=item B<txrep_factor>

  range [0..1]          (default: 0.5)

How much towards the long-term mean for the sender to regress a message.
Basically, the algorithm is to track the long-term total score and the count
of messages for the sender (C<total> and C<count>), and then once we have
otherwise fully calculated the score for this message (C<score>), we calculate
the final score for the message as:

  finalscore = score + factor * ((total + score)/(count + 1) - score)

So if C<factor> = 0.5, then we'll move half way from the calculated score
toward the new mean value. If C<factor> = 0.3, then we'll move about 1/3
of the way from the score toward the mean. C<factor> = 1 means use the
long-term mean, which also includes the new unadjusted score; C<factor> = 0
means just use the calculated score, which disables the score averaging while
still recording the reputation to the database.
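
The regression described above, read as moving the score a fraction C<factor> of the
way toward the new mean (the mean that already includes the new score), can be
sketched as follows. This is a minimal illustration under that reading, not the
plugin's own code; the helper name C<regress_score> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TxRep): regress a freshly calculated
# message score toward the sender's mean, where the mean includes the new score.
sub regress_score {
    my ($score, $total, $count, $factor) = @_;
    my $new_mean = ($total + $score) / ($count + 1);
    return $score + $factor * ($new_mean - $score);
}

# A sender with 9 earlier messages totalling -45 (mean -5) now scores +10.
printf "factor 0.5: %.2f\n", regress_score(10, -45, 9, 0.5);  # half way to the new mean
printf "factor 0:   %.2f\n", regress_score(10, -45, 9, 0);    # score left unchanged
printf "factor 1:   %.2f\n", regress_score(10, -45, 9, 1);    # the new mean itself
```

Because the mean is computed over C<count + 1> messages, a sender with a long history
pulls the result more strongly toward that history than a sender seen only once.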

=cut

  push (@cmds, {
    setting     => 'txrep_factor',
    default     => 0.5,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_factor} = $value;
    }
  });


=item B<txrep_dilution_factor>

  range [0.7..1.0]      (default: 0.98)

With every new email from a given sender, the historical reputation records are
"diluted", or "watered down", by a certain fraction given by this factor. This means
that the influence of old records progressively diminishes with every new message
from the given sender. This is important to allow more flexible handling of changes
in the sender's behavior, or of improvements or changes to local SA rules.

Without any dilution (dilution factor set to 1), the new message score is
simply added to the total score of the given sender in the reputation database. When
dilution is used (factor < 1), the impact of the historical reputation average is
reduced by the factor before calculating the new average, which in turn is then
used to adjust the new total score to be stored in the database.

  newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)

In other words, the older a message is, the less impact its original spam score has
on the new average. For example, if we set the factor to 0.9 (meaning dilution by
10%), the score of the new message is recorded at 100%, the last score of the same
sender at 90%, the second last at 81% (0.9 * 0.9 = 0.81), and the 10th last message
at just 35%.

On stable systems, we recommend keeping the factor close to 1 (but still lower
than 1). On systems where SA rule tuning and spam learning are still in progress,
lower factors help the reputation adapt more quickly to any modifications. At
the same time, however, they also reduce the impact of the historical reputation.
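
The dilution formula above can be tried out with a short sketch. It simply evaluates
the formula as written in this section; the helper name C<diluted_total> and the
sample numbers are illustrative assumptions, not taken from the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: new stored total after one more message, following
# newtotal = (oldcount + 1) * (newscore + dilution * oldtotal)
#            / (dilution * oldcount + 1)
sub diluted_total {
    my ($old_total, $old_count, $new_score, $dilution) = @_;
    return ($old_count + 1) * ($new_score + $dilution * $old_total)
         / ($dilution * $old_count + 1);
}

# Ten earlier messages totalling -50 (mean -5), then a new message scoring +10.
for my $dilution (1, 0.98, 0.9) {
    my $total = diluted_total(-50, 10, 10, $dilution);
    printf "dilution %.2f: total %.3f, mean %.3f\n",
        $dilution, $total, $total / 11;
}
```

With a dilution factor of 1 the result is the plain sum of the old total and the new
score; lower factors move the stored mean further toward the newest score.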

=cut

  push (@cmds, {
    setting     => 'txrep_dilution_factor',
    default     => 0.98,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0.7 || $value > 1.0) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_dilution_factor} = $value;
    }
  });


# TODO, not implemented yet, hence no advertising until then
# -------------------------------------------------------------------------
#=item B<txrep_expiry_days>
#
#  range [0..10000]      (default: 365)
#
#The scores of messages can be removed from the total reputation, and the
#message tracking entry removed from the database after given number of days.
#It helps keeping the database growth under control, and it also reduces the
#influence of old scores on the current reputation (both scoring methods, and
#sender's behavior might have changed over time).
#
#=cut  # ...................................................................
  push (@cmds, {
    setting     => 'txrep_expiry_days',
    default     => 365,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 10000) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_expiry_days} = $value;
    }
  });


=item B<txrep_learn_penalty>

  range [0..200]        (default: 20)

When SpamAssassin is trained on a SPAM message, the given penalty score is
added to the total reputation score of the sender, regardless of the real
spam score. The more messages the sender already has in the TxRep database,
the smaller the impact of the penalty.

=cut

  push (@cmds, {
    setting     => 'txrep_learn_penalty',
    default     => 20,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_learn_penalty} = $value;
    }
  });


=item B<txrep_learn_bonus>

  range [0..200]        (default: 20)

When SpamAssassin is trained on a HAM message, the given bonus score is
deducted from the total reputation score of the sender, regardless of the real
spam score. The more messages the sender already has in the TxRep database,
the smaller the impact of the bonus.

=cut

  push (@cmds, {
    setting     => 'txrep_learn_bonus',
    default     => 20,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_learn_bonus} = $value;
    }
  });


=item B<txrep_autolearn>

  range [0..5]          (default: 0)

When SpamAssassin declares a message clear spam or ham during the message
scan and launches the auto-learn process, the sender reputation scores of the
given message are adjusted by the value of the option L</C<txrep_learn_penalty>>
or L</C<txrep_learn_bonus>>, respectively, in the same way as during manual
learning. The value 0 disables the auto-learn reputation adjustment: only the
score calculated before the auto-learn is stored in the reputation database.

=cut

  push (@cmds, {
    setting     => 'txrep_autolearn',
    default     => 0,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 5) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_autolearn} = $value;
    }
  });


=item B<txrep_track_messages>

  0 | 1                 (default: 1)

Whether TxRep should keep track of already scanned and/or learned messages.
When enabled, an additional record is created in the reputation database
to avoid false score adjustments due to repeated scanning of the same message,
and to allow proper relearning of messages that were either previously wrongly
learned, or need to be relearned after modifying the learn penalty or bonus.

=cut

  push (@cmds, {
    setting     => 'txrep_track_messages',
    default     => 1,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });


=item B<txrep_welcomelist_out>

  range [0..200]        (default: 10)

Previously txrep_whitelist_out, which will work interchangeably until 4.1.

When the value of this setting is greater than zero, recipients of messages sent from
within the internal networks are welcomelisted by improving their total reputation
score by the number of points defined by this setting. Since the IP address and other
sender identifiers are not known when the email is sent, only the reputation of the
standalone email address is welcomelisted. The domain name is intentionally left
unaffected as well. Outbound welcomelisting can only work when SpamAssassin is set up
to scan outgoing email too, when local users use the SMTP server for sending email,
and when C<internal_networks> are defined in the SpamAssassin configuration. The
reputation is improved with every message sent from the internal networks, so the more
messages are sent to a recipient, the better the reputation of that email address.

=cut

  push (@cmds, {
    setting     => 'txrep_welcomelist_out',
    aliases     => ['txrep_whitelist_out'], # removed in 4.1
    default     => 10,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 200) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_welcomelist_out} = $value;
    }
  });


=item B<txrep_ipv4_mask_len>

  range [0..32]         (default: 16)

The TxRep database keeps only the specified number of most-significant bits
of an IPv4 address in its fields, so that different individual IP addresses
within a subnet belonging to the same owner are managed under a single
database record. As we have no information available on the allocated
address ranges of senders, this CIDR mask length is only an approximation.
The default is 16 bits, corresponding to a former class B. Increase the
number if a finer granularity is desired, e.g. to 24 (class C) or 32.
A value of 0 is allowed but is not particularly useful, as it would treat the
whole internet as a single organization. The number need not be a multiple
of 8; any split is allowed.
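
The effect of the mask length can be illustrated with a small standalone sketch. The
plugin itself loads NetAddr::IP; the version below uses only the core Socket module
and a hypothetical helper C<mask_ipv4>, so it shows the idea rather than the actual
implementation. The address is taken from a documentation range.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: keep only the $mask_len most-significant bits
# of a dotted-quad IPv4 address and zero the rest.
sub mask_ipv4 {
    my ($ip, $mask_len) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed;
    my $addr = unpack('N', $packed);
    my $mask = $mask_len == 0
             ? 0
             : (0xFFFFFFFF << (32 - $mask_len)) & 0xFFFFFFFF;
    return inet_ntoa(pack('N', $addr & $mask));
}

# The same sender address under three different mask lengths.
for my $len (16, 20, 24) {
    printf "/%d -> %s\n", $len, mask_ipv4('198.51.100.77', $len);
}
```

All addresses that share the same masked prefix would be tracked under one record,
which is why a shorter mask groups more senders together.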

=cut

  push (@cmds, {
    setting     => 'txrep_ipv4_mask_len',
    default     => 16,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if (!defined $value || $value eq '')
            {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
        elsif ($value !~ /^\d+$/ || $value < 0 || $value > 32)
            {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_ipv4_mask_len} = $value;
    }
  });


=item B<txrep_ipv6_mask_len>

  range [0..128]        (default: 48)

The TxRep database keeps only the specified number of most-significant bits
of an IPv6 address in its fields, so that different individual IP addresses
within a subnet belonging to the same owner are managed under a single
database record. As we have no information available on the allocated address
ranges of senders, this CIDR mask length is only an approximation. The default
is 48 bits, corresponding to an address range commonly allocated to individual
(smaller) organizations. Increase the number for a finer granularity, e.g.
to 64 or 96 or 128, or decrease it for wider ranges, e.g. 32. A value of 0 is
allowed but is not particularly useful, as it would treat the whole internet
as a single organization. The number need not be a multiple of 4; any split
is allowed.

=cut

  push (@cmds, {
    setting     => 'txrep_ipv6_mask_len',
    default     => 48,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if (!defined $value || $value eq '')
            {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
        elsif ($value !~ /^\d+$/ || $value < 0 || $value > 128)
            {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_ipv6_mask_len} = $value;
    }
  });


=item B<user_awl_sql_override_username>

  string                (default: undefined)

Used by the SQLBasedAddrList storage implementation.

If this option is set, the SQLBasedAddrList module will override the set
username with the value given. This can be useful for implementing global
or group-based TxRep databases.

=cut

  push (@cmds, {
    setting     => 'user_awl_sql_override_username',
    default     => '',
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
  });


=item B<txrep_user2global_ratio>

  range [0..10]         (default: 0)

When the option txrep_user2global_ratio is set to a value greater than zero, and
if the server configuration allows it, two data storages are used: a user storage
and a global (server-wide) storage.

The user storage keeps only senders who send messages to the respective recipient.
It also reflects the corrected/learned scores when messages are marked by the user
as spam or ham, or when the sender is welcomelisted or blocklisted through the API
of SpamAssassin.

The global storage keeps the reputation data of all messages processed by
SpamAssassin, with their spam scores and spam/ham learning data from all users on
the server. Hence, the module returns a reputation value even for senders not known
to the current recipient, as long as they have already sent email to anyone else on
the server.

The value of the txrep_user2global_ratio parameter controls the impact of each
of the two reputations. When equal to 1, the global and the user score have
the same impact on the result. When set to 2, the reputation taken from
the user storage has twice the impact of the global value. The final value
of the TXREP tag is calculated as follows:

  total = ( ratio * user + global ) / ( ratio + 1 )

When no reputation is found in the user storage, and a global reputation is
available, the global storage is used fully, without applying the ratio.
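
The combination rule above can be sketched as follows. The helper name
C<combine_reputation> is hypothetical, and the branch for a missing global record is
an assumption added only to keep the sketch total; the text above specifies only the
case of a missing user record.

```perl
use strict;
use warnings;

# Hypothetical helper: merge user and global reputation values using
# total = (ratio * user + global) / (ratio + 1)
sub combine_reputation {
    my ($user, $global, $ratio) = @_;
    # No user record: the global value is used fully, without the ratio.
    return $global unless defined $user;
    # Assumption (not stated in the text): no global record, use the user value.
    return $user unless defined $global;
    return ($ratio * $user + $global) / ($ratio + 1);
}

printf "ratio 1: %.2f\n", combine_reputation(-2, 4, 1);      # equal impact
printf "ratio 2: %.2f\n", combine_reputation(-2, 4, 2);      # user counts twice
printf "no user: %.2f\n", combine_reputation(undef, 4, 2);   # global only
```

A higher ratio therefore shifts the result toward what this particular recipient has
seen and learned, while a lower one favors the server-wide history.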

When the ratio is set to zero, only the default storage is used. Whether that
is the global or the local user storage is in turn controlled either by the
parameter C<user_awl_sql_override_username> (in case of SQL storage), or by the
C<auto_welcomelist_path> parameter (in case of Berkeley database).

When dual storage is enabled, and no global storage is defined by the
above-mentioned parameters for the Berkeley or SQL databases, TxRep will attempt
to use a generic storage: the user 'GLOBAL' in case of SQL, and in the case of
Berkeley database the path defined by '__local_state_dir__/tx-reputation',
which typically renders into /var/db/spamassassin/tx-reputation. When the default
storages are not available, or are not writable, you have to set the global
storage with the help of the C<user_awl_sql_override_username> or
C<auto_welcomelist_path> settings, respectively.

Please note that some SpamAssassin installations always run under the same user
ID. In such a case it is pointless to enable the dual storage, because at most
it would lead to two identical global storages in different locations.

This feature is disabled by default.

=cut

  push (@cmds, {
    setting     => 'txrep_user2global_ratio',
    default     => 0,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
    code        => sub {
        my ($self, $key, $value, $line) = @_;
        if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
        $self->{txrep_user2global_ratio} = $value;
    }
  });


=item B<auto_welcomelist_distinguish_signed>   (default: 1 - enabled)

Previously auto_whitelist_distinguish_signed, which will work interchangeably until 4.1.

Used by the SQLBasedAddrList storage implementation.

If this option is set, the SQLBasedAddrList module will keep separate
database entries for DKIM-validated e-mail addresses and for non-validated
ones. Without this option, or for domains that do not use a DKIM signature,
the reputation of legitimate email can get mixed with the reputation of
forgeries. A prerequisite when setting this option is that a field
txrep.signedby exists in the SQL table, otherwise SQL operations will fail.
A DKIM plugin must also be enabled in order for this option to take effect.
This option is highly recommended. Unless you are using a pre-3.3.0 database
schema and cannot upgrade, there is no reason to disable this option. If
you are upgrading from AWL and using a pre-3.3.0 schema, the txrep.signedby
column will not exist. It is recommended that you add this column, but if
that is not possible you must set this option to 0 to avoid SQL errors.

=cut

  push (@cmds, {
    setting     => 'auto_welcomelist_distinguish_signed',
    aliases     => ['auto_whitelist_distinguish_signed'], # removed in 4.1
    default     => 1,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });


=item B<txrep_spf>

  0 | 1                 (default: 1)

When enabled, TxRep will treat any IP address using a given email address as
the same authorized identity, and will not associate any IP address with it.
(The same happens with valid DKIM signatures. No option is available for DKIM.)

Note: for domains that define the useless SPF +all (pass all), no IP would
ever be associated with the email address, and all addresses (including the
forged ones) would be treated as coming from the authorized source. However,
such domains are hopefully rare, and ask for this kind of treatment anyway.

=back

=cut

  push (@cmds, {
    setting     => 'txrep_spf',
    default     => 1,
    type        => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
  });


=head2 REPUTATION WEIGHTS
|
|
|
|
The overall reputation of the sender comprises several elements:
|
|
|
|
=over 4
|
|
|
|
=item 1) The reputation of the 'From' email address bound to the originating IP
|
|
address fraction (see the mask parameters for details)
|
|
|
|
=item 2) The reputation of the 'From' email address alone (regardless of the IP
address currently being used)
|
|
|
|
=item 3) The reputation of the domain name of the 'From' email address
|
|
|
|
=item 4) The reputation of the originating IP address, regardless of sender's email address
|
|
|
|
=item 5) The reputation of the HELO name of the originating computer (if available)
|
|
|
|
=back
|
|
|
|
Each of these partial reputations is weighted with the help of these parameters,
and the overall reputation is calculated as the weighted sum of the individual
reputations divided by the sum of all their weights:

  sender_reputation = ( weight_email    * rep_email    +
                        weight_email_ip * rep_email_ip +
                        weight_domain   * rep_domain   +
                        weight_ip       * rep_ip       +
                        weight_helo     * rep_helo ) / total_weight

  total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
|
|
|
|
You can disable the individual partial reputations by setting their respective
|
|
weight to zero. This will also reduce the size of the database, since each
|
|
partial reputation requires a separate entry in the database table. Disabling
|
|
some of the partial reputations in this way may also help with the performance
|
|
on busy servers, because the respective database lookups and processing will
|
|
be skipped too.
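
The weighting can be sketched in a few lines of Perl. This is a simplified illustration only: the helper C<overall_reputation> and the partial reputation values are made up, the weights are the documented defaults, and partial reputations with a zero weight or without a known value are simply left out of both sums.

```perl
use strict;
use warnings;

# Default weights as documented for the txrep_weight_* options.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

# Made-up partial reputations for one sender (illustrative values only).
my %rep = (email => 2.0, email_ip => -1.0, domain => 0.5, ip => 4.0, helo => 0.0);

# Hypothetical helper: weighted mean of the partial reputations.
sub overall_reputation {
    my ($weights, $reps) = @_;
    my ($sum, $total_weight) = (0, 0);
    for my $key (keys %$weights) {
        next unless $weights->{$key};        # a weight of 0 disables this part
        next unless defined $reps->{$key};   # nothing known for this identifier
        $sum          += $weights->{$key} * $reps->{$key};
        $total_weight += $weights->{$key};
    }
    return $total_weight ? $sum / $total_weight : 0;
}

printf "%.3f\n", overall_reputation(\%weight, \%rep);   # prints 0.667
```

Here the strongly weighted email_ip value (-1.0) largely offsets the poor IP reputation (4.0), which is the intended effect of keeping the email_ip weight high.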
|
|
|
|
=over 4
|
|
|
|
=item B<txrep_weight_email>
|
|
|
|
range [0..10] (default: 3)
|
|
|
|
This weight factor controls the influence of the reputation of the standalone
email address, regardless of the originating IP address. When adjusting the
weight, keep in mind that an email address can be easily spoofed, so
spammers can use 'From' email addresses belonging to senders with a good
reputation. From this point of view, the email address bound to the
originating IP address is a more reliable indicator of the overall reputation.

On the other hand, some reputable senders send from a large number of IP
addresses, so looking up the reputation of the standalone email address,
without regard to the originating IP, also makes sense.
|
|
|
|
We recommend using a relatively low value for this partial reputation.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_weight_email',
|
|
default => 3,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
|
|
$self->{txrep_weight_email} = $value;
|
|
}
|
|
});
|
|
|
|
=item B<txrep_weight_email_ip>
|
|
|
|
range [0..10] (default: 10)
|
|
|
|
This is the standard reputation used in the same way as it was by the original
|
|
AWL plugin. Each sender's email address is bound to the originating IP, or
|
|
its part as defined by the txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.
|
|
|
|
For a user sending from multiple locations, through diverse mail servers, or from a
dynamic IP range that extends beyond the masked block, the email address will have a
separate reputation value for each of the different (partial) IP addresses.

When the option auto_welcomelist_distinguish_signed is enabled, TxRep, in contrast
to the original AWL module, does not record the IP address when a DKIM signature
is detected. The email address is then bound not to an IP address but to the
DKIM signature, which is considered to authenticate the sender more reliably
than the IP address (which can also vary).
|
|
|
|
This is by design the most relevant reputation, and its weight should be kept
|
|
high.
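
To illustrate what binding an address to a masked IP block means, here is a minimal sketch for IPv4. The helper C<mask_ipv4> is hypothetical and is not the plugin's own implementation, and the mask lengths 16 and 24 are example values only.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: keep only the first $mask_len bits of an IPv4 address.
sub mask_ipv4 {
    my ($ip, $mask_len) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed;            # not a usable IPv4 address
    my $addr = unpack('N', $packed);
    my $mask = $mask_len
        ? (0xFFFFFFFF << (32 - $mask_len)) & 0xFFFFFFFF
        : 0;
    return inet_ntoa(pack('N', $addr & $mask));
}

print mask_ipv4('192.0.2.77', 16), "\n";    # prints 192.0.0.0
print mask_ipv4('192.0.2.77', 24), "\n";    # prints 192.0.2.0
```

Two relays whose addresses fall into the same masked block yield the same masked value, so a sender using either of them would share one email_ip reputation record.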
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_weight_email_ip',
|
|
default => 10,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
|
|
$self->{txrep_weight_email_ip} = $value;
|
|
}
|
|
});
|
|
|
|
=item B<txrep_weight_domain>
|
|
|
|
range [0..10] (default: 2)
|
|
|
|
Some spammers always use their real domain name in the email address,
just with multiple or changing local parts. This reputation records the
spam scores of all messages sent from the respective domain, regardless of
the local part (user name) used.

As with the email_ip reputation, the domain reputation is also bound to the
originating address (or to a masked block, if the mask parameters are used).
This avoids assigning a false reputation based on spoofed email addresses.

When a DKIM signature is detected, the signing domain is used instead
of the domain name extracted from the email address. The signing authority
is considered responsible for the email it signs, whatever the domain name,
hence the same reputation applies here.

The domain reputation gives a relevant picture of the owner of the
domain in the case of small servers or corporations with strict policies, but
is less relevant for freemailers like Gmail, Hotmail, and similar,
because both ham and spam may be sent by their users.
|
|
|
|
The default value is set relatively low. Higher weight values may be useful,
|
|
but we recommend caution and observing the scores before increasing it.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_weight_domain',
|
|
default => 2,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
|
|
$self->{txrep_weight_domain} = $value;
|
|
}
|
|
});
|
|
|
|
=item B<txrep_weight_ip>
|
|
|
|
range [0..10] (default: 4)
|
|
|
|
Spammers can send through the same relay (including compromised hosts) under a
multitude of email addresses. This is exactly the case where the IP reputation
can help. This reputation acts as a kind of local RBL.

By default the weight is set lower than that of the email_ip reputation, because
the same IP address may host both spammers and acceptable
senders (for example, the marketing department of a company sends you spam, but
you still need to get messages from their billing address).
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_weight_ip',
|
|
default => 4,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
|
|
$self->{txrep_weight_ip} = $value;
|
|
}
|
|
});
|
|
|
|
=item B<txrep_weight_helo>
|
|
|
|
range [0..10] (default: 0.5)
|
|
|
|
A large share of spam comes from compromised hosts, often personal computers
or set-top boxes. Their NetBIOS names are usually used as the HELO name when connecting
to your mail server. Some of the names are fairly generic and hence may be shared by
a large number of hosts, but often the names are quite unique and can be a good
indicator for detecting a spammer, even when different email and IP
addresses are used (spam can also come from portable devices).

No IP address is bound to the HELO name when it is stored in the reputation database.
This is intentional, despite the possibility that numerous devices may share
some of the HELO names.

This option is still considered experimental, hence the low weight value; after
some testing, the weight could probably be increased at least slightly.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_weight_helo',
|
|
default => 0.5,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value < 0 || $value > 10) {return $Mail::SpamAssassin::Conf::INVALID_VALUE;}
|
|
$self->{txrep_weight_helo} = $value;
|
|
}
|
|
});
|
|
|
|
=item B<txrep_report_details>
|
|
|
|
0 | 1 | 2 (default: 0)
|
|
|
|
Add TxRep details to the rule's description in the message report or summary,
similar to how RBL rules commonly show the listed domains.

If enabled (value 1), the identifiers used in calculating the sender's overall
reputation are listed: the From address bound to the originating IP address
fraction, the From address alone, the domain name bound to the originating IP
address fraction, the originating IP address, and the HELO name if available.
The originating IP address fraction (according to the mask settings) is
included where applicable.

If this option is set to 2, the mean reputation and count of each listed
identifier are reported as well.

Identifiers and additional data are only added to the description on a
message's initial scan. Re-processing a previously scanned message
will not list the individual identifiers and the respective reputation
values used originally.
|
|
|
|
This option is disabled by default for now, due to potential formatting issues
|
|
caused by the number and length of additional description details.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_report_details',
|
|
default => 0,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
|
|
return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE
|
|
if ($value eq '');
|
|
return $Mail::SpamAssassin::Conf::INVALID_VALUE
|
|
unless ($value =~ /^[012]$/);
|
|
|
|
$self->{txrep_report_details} = $value;
|
|
}
|
|
});
|
|
|
|
=back
|
|
|
|
=head1 ADMINISTRATOR SETTINGS
|
|
|
|
These settings differ from the ones above, in that they are considered 'more
|
|
privileged' -- even more than the ones in the B<PRIVILEGED SETTINGS> section.
|
|
No matter what C<allow_user_rules> is set to, these can never be set from a
|
|
user's C<user_prefs> file.
|
|
|
|
=over 4
|
|
|
|
=item B<txrep_factory module>
|
|
|
|
(default: Mail::SpamAssassin::DBBasedAddrList)
|
|
|
|
Select an alternative database factory module for the TxRep database.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'txrep_factory',
|
|
is_admin => 1,
|
|
default => 'Mail::SpamAssassin::DBBasedAddrList',
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
|
|
=item B<auto_welcomelist_path /path/filename>
|
|
|
|
(default: ~/.spamassassin/tx-reputation)
|
|
|
|
Previously auto_whitelist_path which will work interchangeably until 4.1.
|
|
|
|
This is the TxRep directory and filename. By default, each user
|
|
has their own reputation database in their C<~/.spamassassin> directory with
|
|
mode 0700. For system-wide SpamAssassin use, you may want to share this
|
|
across all users.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'auto_welcomelist_path',
|
|
aliases => ['auto_whitelist_path'], # removed in 4.1
|
|
is_admin => 1,
|
|
default => '__userstate__/tx-reputation',
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
unless (defined $value && $value !~ /^$/) {return $Mail::SpamAssassin::Conf::MISSING_REQUIRED_VALUE;}
|
|
$self->{auto_welcomelist_path} = $value;
|
|
}
|
|
});
|
|
|
|
|
|
=item B<auto_welcomelist_db_modules Module ...>
|
|
|
|
(default: see below)
|
|
|
|
Previously auto_whitelist_db_modules which will work interchangeably until 4.1.
|
|
|
|
What database modules should be used for the TxRep storage database
|
|
file. The first named module that can be loaded from the Perl include path
|
|
will be used. The format is:
|
|
|
|
PreferredModuleName SecondBest ThirdBest ...
|
|
|
|
i.e., a space-separated list of Perl module names. The default is:
|
|
|
|
DB_File GDBM_File SDBM_File
|
|
|
|
NDBM_File is not supported (see SpamAssassin bug 4353).
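
The "first module that can be loaded" rule can be sketched as follows. The helper C<first_loadable_module> is hypothetical and only illustrates the selection order; it is not the code the storage backend actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module in the list that can be loaded.
sub first_loadable_module {
    my @candidates = @_;
    for my $module (@candidates) {
        # Accept plain module names only before handing them to string eval.
        next unless $module =~ /^\w+(?:::\w+)*$/;
        return $module if eval "require $module; 1";
    }
    return;
}

my $chosen = first_loadable_module(split ' ', 'DB_File GDBM_File SDBM_File');
print defined $chosen ? "using $chosen\n" : "no usable DB module found\n";
```

Which module is reported depends on what is installed on the host, so the output differs between systems.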
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'auto_welcomelist_db_modules',
|
|
aliases => ['auto_whitelist_db_modules'], # removed in 4.1
|
|
is_admin => 1,
|
|
default => 'DB_File GDBM_File SDBM_File',
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
|
|
=item B<auto_welcomelist_file_mode>
|
|
|
|
(default: 0700)
|
|
|
|
Previously auto_whitelist_file_mode which will work interchangeably until 4.1.
|
|
|
|
The file mode bits used for the TxRep directory or file.
|
|
|
|
Make sure you specify this using the 'x' mode bits set, as it may also be used
|
|
to create directories. However, if a file is created, the resulting file will
|
|
not have any execute bits set (the umask is set to 0111).
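
As a worked example of the statement above, the following sketch applies a umask of 0111 to the default mode of 0700. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $mode  = oct('0700');   # configured auto_welcomelist_file_mode
my $umask = oct('0111');   # umask in effect when a plain file is created

# A directory keeps the 'x' bits; a file loses them through the umask.
printf "directory mode: %04o\n", $mode;                            # prints 0700
printf "file mode:      %04o\n", $mode & ~$umask & oct('0777');    # prints 0600
```

This is why the option should be given with the execute bits set: the same value has to work for directories, and the umask removes the bits again for files.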
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'auto_welcomelist_file_mode',
|
|
aliases => ['auto_whitelist_file_mode'], # removed in 4.1
|
|
is_admin => 1,
|
|
default => '0700',
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC,
|
|
code => sub {
|
|
my ($self, $key, $value, $line) = @_;
|
|
if ($value !~ /^0?[0-7]{3}$/) {
|
|
return $Mail::SpamAssassin::Conf::INVALID_VALUE;
|
|
}
|
|
$value = '0'.$value if length($value) == 3; # Bug 5771
|
|
$self->{auto_welcomelist_file_mode} = untaint_var($value);
|
|
}
|
|
});
|
|
|
|
|
|
=item B<user_awl_dsn DBI:databasetype:databasename:hostname:port>
|
|
|
|
Used by the SQLBasedAddrList storage implementation.
|
|
|
|
This will set the DSN used to connect. Example:
|
|
C<DBI:mysql:spamassassin:localhost>
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'user_awl_dsn',
|
|
is_admin => 1,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
|
|
=item B<user_awl_sql_username username>
|
|
|
|
Used by the SQLBasedAddrList storage implementation.
|
|
|
|
The authorized username to connect to the above DSN.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'user_awl_sql_username',
|
|
is_admin => 1,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
|
|
=item B<user_awl_sql_password password>
|
|
|
|
Used by the SQLBasedAddrList storage implementation.
|
|
|
|
The password for the database username, for the above DSN.
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'user_awl_sql_password',
|
|
is_admin => 1,
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
|
|
=item B<user_awl_sql_table tablename>
|
|
|
|
(default: txrep)
|
|
|
|
Used by the SQLBasedAddrList storage implementation.
|
|
|
|
The name of the table in which reputations are stored, for the above DSN.
|
|
|
|
=back
|
|
|
|
=cut
|
|
|
|
push (@cmds, {
|
|
setting => 'user_awl_sql_table',
|
|
is_admin => 1,
|
|
default => 'txrep',
|
|
type => $Mail::SpamAssassin::Conf::CONF_TYPE_STRING
|
|
});
|
|
|
|
$conf->{parser}->register_commands(\@cmds);
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub _message {
|
|
###########################################################################
|
|
# callers pass the command-line flag first and the message text second
my ($self, $cli_p, $msg) = @_;
print "SpamAssassin TxRep: $msg\n" if ($cli_p);
dbg("TxRep: $msg");
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub _fail_exit {
|
|
###########################################################################
|
|
my ($self, $err) = @_;
|
|
my $eval_stat = ($err ne '') ? $err : "errno=$!";
|
|
chomp $eval_stat;
|
|
warn("TxRep: open of TxRep file failed: $eval_stat\n");
|
|
if (!defined $self->{txKeepStoreTied}) {$self->finish();}
|
|
return 0;
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub _fn_envelope {
|
|
###########################################################################
|
|
my ($self, $args, $value, $msg) = @_;
|
|
|
|
unless ($self->{main}->{conf}->{use_txrep}){ return 0;}
|
|
unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg); return 0;}
|
|
|
|
my $factor = $self->{conf}->{txrep_weight_email} +
|
|
$self->{conf}->{txrep_weight_email_ip} +
|
|
$self->{conf}->{txrep_weight_domain} +
|
|
$self->{conf}->{txrep_weight_ip} +
|
|
$self->{conf}->{txrep_weight_helo};
|
|
my $sign = $args->{signedby};
|
|
my $id = $args->{address};
|
|
if (index($args->{address}, ',') >= 0) {
|
|
$sign = $args->{address};
|
|
$sign =~ s/^.*,//g;
|
|
$id =~ s/,.*$//g;
|
|
}
|
|
|
|
# simplified regex used for IP detection (possible FP at a domain is not critical)
|
|
if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo})
|
|
{$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';}
|
|
elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip})
|
|
{$factor /= $self->{conf}->{txrep_weight_ip};}
|
|
elsif (index($id, '@') >= 0 && $self->{conf}->{txrep_weight_email})
|
|
{$factor /= $self->{conf}->{txrep_weight_email};}
|
|
elsif (index($id, '@') == -1 && $self->{conf}->{txrep_weight_domain})
|
|
{$factor /= $self->{conf}->{txrep_weight_domain};}
|
|
else {$factor = 1;}
|
|
|
|
$self->open_storages();
|
|
my $score = (!defined $value)? undef : $factor * $value;
|
|
my $status = $self->modify_reputation($id, $score, $sign);
|
|
dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || '');
|
|
eval {
|
|
$self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id);
|
|
if (!defined $self->{txKeepStoreTied}) {$self->finish();}
|
|
1;
|
|
} or return $self->_fail_exit( $@ );
|
|
return $status;
|
|
}
|
|
|
|
|
|
=head1 BLOCKLISTING / WELCOMELISTING
|
|
|
|
When asked by SpamAssassin to blocklist or welcomelist a user, the TxRep
plugin adds a score of 100 (for blocklisting) or -100 (for welcomelisting)
to the given sender's email address. For a plain address without any IP
address, the value is multiplied by the ratio of the total reputation
weight to the EMAIL reputation weight, to compensate for the reduced impact
of the standalone EMAIL reputation when the overall reputation is calculated.
|
|
|
|
total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
|
|
blocklisted_reputation = 100 * total_weight / weight_email
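
For instance, with the documented default weights the arithmetic works out as in this short sketch (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Default weights as documented for the txrep_weight_* options.
my %weight = (email => 3, email_ip => 10, domain => 2, ip => 4, helo => 0.5);

my $total_weight = 0;
$total_weight += $_ for values %weight;

# Score stored for a blocklisted standalone email address.
my $blocklisted = 100 * $total_weight / $weight{email};

printf "total_weight = %.1f\n", $total_weight;   # prints total_weight = 19.5
printf "blocklisted  = %.1f\n", $blocklisted;    # prints blocklisted  = 650.0
```

The stored value of 650 is larger than 100 because the standalone EMAIL reputation contributes only 3 of the 19.5 weight units; after weighting, its contribution to the overall reputation is back at 100.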
|
|
|
|
When a standalone email address is blocklisted/welcomelisted, all records
|
|
of the email address bound to an IP address, DKIM signature, or a SPF pass
|
|
will be removed from the database, and only the standalone record is kept.
|
|
|
|
Besides blocklisting/welcomelisting of standalone email addresses, the same
|
|
method may be used also for blocklisting/welcomelisting of IP addresses,
|
|
domain names, and HELO names (only dotless NetBIOS HELO names can be used).
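
The kind of identifier is guessed from its shape. The sketch below follows the order of the tests in the plugin's C<_fn_envelope> code but leaves out the checks for zero weights; the helper name C<classify_id> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: guess which reputation an identifier belongs to.
sub classify_id {
    my ($id) = @_;
    return 'helo'  if $id !~ /\./;              # dotless name
    return 'ip'    if $id =~ /^[a-f\d\.:]+$/;   # simplified IP address test
    return 'email' if index($id, '@') >= 0;
    return 'domain';
}

for my $id ('WORKSTATION7', '192.0.2.77', 'friend@good.org', 'spamming.biz') {
    printf "%-16s => %s\n", $id, classify_id($id);
}
```

Because the IP test is deliberately simple, a domain name made up only of the characters a-f, digits, and dots would be taken for an IP address; the plugin's source comments describe such a false positive as not critical.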
|
|
|
|
When welcomelisting/blocklisting an email address or domain name, you can
|
|
bind them to a specified DKIM signature or SPF record by appending the
|
|
DKIM signing domain or the tag 'spf' after the ID in the following way:
|
|
|
|
spamassassin --add-addr-to-blocklist=spamming.biz,spf
|
|
spamassassin --add-addr-to-welcomelist=friend@good.org,good.org
|
|
|
|
When a message contains both a DKIM signature and an SPF pass, the DKIM
|
|
signature takes the priority, so the record bound to the 'spf' tag won't
|
|
be checked. Only email addresses and domains can be bound to DKIM or SPF.
|
|
Records of IP addresses and HELO names are always without DKIM/SPF.
|
|
|
|
In case of dual storage, the block/welcomelisting is performed only in the
|
|
default storage.
|
|
|
|
=cut
|
|
|
|
######################################################## plugin hooks #####
|
|
sub blocklist_address {my $self=shift; return $self->_fn_envelope(@_, 100, "blocklisting address");}
|
|
*blacklist_address = \&blocklist_address; # removed in 4.1
|
|
sub welcomelist_address {my $self=shift; return $self->_fn_envelope(@_, -100, "welcomelisting address");}
|
|
*whitelist_address = \&welcomelist_address; # removed in 4.1
|
|
sub remove_address {my $self=shift; return $self->_fn_envelope(@_,undef, "removing address");}
|
|
###########################################################################
|
|
|
|
|
|
=head1 REPUTATION LOGICS
|
|
|
|
1. As with AWL, the most significant sender identifier is the
combination of the email address and the originating IP address, or
the part of it defined by the IPv4 or IPv6 mask setting.

2. No IP checking for the standalone EMAIL address reputation

3. No signature checking for the IP reputation or the HELO name reputation

4. The EMAIL_IP weight, and not the standalone EMAIL weight, is used when
no IP address is available (EMAIL_IP is the main indicator, and has
the highest weight)

5. No IP checking for signed emails (the signature authenticates the email
instead of the IP address)

6. No IP checking on an SPF pass (we assume the domain owner is responsible
for all IPs authorized to send for the domain, hence we use the same
identity for all of them)

7. No signature used for the standalone EMAIL reputation (it would be redundant,
since no IP is used for the signed EMAIL_IP reputation, and we would store
two identical hits)

8. When available, the DKIM signer is used instead of the domain name for
the DOMAIN reputation

9. No IP and no signature used for the HELO reputation (despite the possible
existence of multiple computers with the same HELO)

10. The full (unmasked) IP address is used (in the address field, instead of
the IP field) for the standalone IP reputation
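
The rules above can be summarized as a list of lookups per message. The following is a simplified sketch, not the plugin's code: the helper C<reputation_lookups> and its argument names are hypothetical, and the SPF case is reduced to the plain 'spf' tag.

```perl
use strict;
use warnings;

# Hypothetical helper: list the lookups implied by the rules above as
# [type, id, ip, signedby] tuples.
sub reputation_lookups {
    my (%msg) = @_;    # keys: from, domain, ip, helo, dkim_domain, spf_pass
    my ($ip, $domain, $signedby) = ($msg{ip}, $msg{domain}, undef);

    if ($msg{dkim_domain}) {         # rules 5 and 8: signature replaces IP and domain
        ($ip, $domain, $signedby) = (undef, $msg{dkim_domain}, $msg{dkim_domain});
    } elsif ($msg{spf_pass}) {       # rule 6: an SPF pass replaces the IP
        ($ip, $signedby) = (undef, 'spf');
    }

    my @lookups = (['EMAIL_IP', $msg{from}, $ip, $signedby]);
    push @lookups, ['DOMAIN', $domain, $ip, $signedby] if $domain;
    # rule 9: no IP; the literal 'HELO' only marks the record type
    push @lookups, ['HELO', $msg{helo}, undef, 'HELO'] if $msg{helo};
    if ($msg{ip}) {
        # rule 7: no standalone EMAIL record for signed mail
        push @lookups, ['EMAIL', $msg{from}, undef, undef] unless $signedby;
        push @lookups, ['IP', $msg{ip}, undef, undef];    # rule 10
    }
    return @lookups;
}

my @plan = reputation_lookups(
    from        => 'friend@good.org',
    domain      => 'good.org',
    ip          => '192.0.2.77',
    helo        => 'WORKSTATION7',
    dkim_domain => 'good.org',
);
for my $lookup (@plan) {
    print join(' | ', map { defined $_ ? $_ : '-' } @$lookup), "\n";
}
```

For the DKIM-signed sample message this lists the EMAIL_IP, DOMAIN, HELO, and IP lookups; the standalone EMAIL lookup is skipped and no IP address is attached to the EMAIL_IP record.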
|
|
|
|
=cut
|
|
|
|
###########################################################################
|
|
sub check_senders_reputation {
|
|
###########################################################################
|
|
my ($self, $pms) = @_;
|
|
|
|
# just for the development debugging
|
|
# use Data::Printer;
|
|
# dbg("TxRep: DEBUG DUMP of pms: %s, %s", $pms, p($pms));
|
|
|
|
my $autolearn = defined $self->{autolearn};
|
|
$self->{last_pms} = $self->{autolearn} = undef;
|
|
$self->{pms} = $pms;
|
|
|
|
# Cases where we would not be able to use TxRep
|
|
if(not $self->{conf}->{use_txrep}) {
|
|
dbg("TxRep is disabled, quitting");
|
|
return 0;
|
|
}
|
|
if ($self->{conf}->{use_auto_welcomelist}) {
|
|
warn("TxRep: cannot run when Auto-Welcomelist is enabled. Please disable it!\n");
|
|
return 0;
|
|
}
|
|
if ($autolearn && !$self->{conf}->{txrep_autolearn}) {
|
|
dbg("TxRep: autolearning disabled, no more reputation adjusting, quitting");
|
|
return 0;
|
|
}
|
|
my @from = $pms->all_from_addrs();
|
|
if (@from && $from[0] eq 'ignore@compiling.spamassassin.taint.org') {
|
|
dbg("TxRep: no scan in lint mode, quitting");
|
|
return 0;
|
|
}
|
|
|
|
my $delta = 0;
|
|
my $timer = $self->{main}->time_method("total_txrep");
|
|
my $msgscore = (defined $self->{learning})? $self->{learning} : $pms->get_autolearn_points();
|
|
my $date = $pms->{msg}->receive_date() || $pms->{date_header_time};
|
|
my $msg_id = $self->{msgid} || $pms->{msg}->generate_msgid();
|
|
|
|
# lowercase whichever address is found; fall back to '' so lc never sees undef
my $from = lc($pms->get('From:addr') || $pms->get('EnvelopeFrom:addr') || '');
|
|
return 0 unless $from =~ /\S/;
|
|
my $domain = $from;
|
|
$domain =~ s/^.+@//;
|
|
|
|
# Find the last untrusted relay and populate helo and original IP
|
|
my ($origip, $helo);
|
|
if (defined $pms->{relays_trusted} || defined $pms->{relays_untrusted}) {
|
|
my $trusteds = @{$pms->{relays_trusted} || []};
foreach my $rly ( @{$pms->{relays_trusted} || []}, @{$pms->{relays_untrusted} || []} ) {
|
|
# Get the last found HELO, regardless of private/public or trusted/untrusted
|
|
# Avoid a redundant duplicate entry if the HELO is equal or similar to another identifier
|
|
if (defined $rly->{helo} &&
|
|
$rly->{helo} !~ /^\[?\Q$rly->{ip}\E\]?$/ &&
|
|
$rly->{helo} !~ /^\Q$domain\E$/i &&
|
|
$rly->{helo} !~ /^\Q$from\E$/i ) {
|
|
$helo = $rly->{helo};
|
|
}
|
|
# use only trusted ID, but use the first untrusted IP (if available) (AWL bug 6908)
|
|
# at low spam scores (<2) ignore trusted/untrusted
|
|
# set IP to 127.0.0.1 for any internal IP, so that it can be distinguished from none (AWL bug 6357)
|
|
if ((--$trusteds >= 0 || $msgscore<2) && !$msg_id && $rly->{id}) {$msg_id = $rly->{id};}
|
|
if (($trusteds >= -1 || $msgscore<2) && !$rly->{ip_private} && $rly->{ip}) {$origip = $rly->{ip};}
|
|
if ( $trusteds >= 0 && !$origip && $rly->{ip_private} && $rly->{ip}) {$origip = '127.0.0.1';}
|
|
}
|
|
}
|
|
|
|
# Look for previous scores of the same message, for instance when doing re-learning
|
|
if ($self->{conf}->{txrep_track_messages}) {
|
|
if ($msg_id) {
|
|
my $msg_rep = $self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, undef);
|
|
if (defined $msg_rep && ($self->count() > 0)) {
|
|
if (defined $self->{learning} && !defined $self->{forgetting}) {
|
|
# already learned, forget only if already learned (count>1), and relearn
|
|
# when only scanned (count=1), go ahead with normal rep scan
|
|
if ($self->count() > 1) {
|
|
$self->{last_pms} = $pms; # cache the pmstatus
|
|
$self->forget_message($pms->{msg},$msg_id); # sub reentrance OK
|
|
}
|
|
} elsif ($self->{forgetting}) {
|
|
$msgscore = $msg_rep; # forget the old stored score instead of the one got now
|
|
dbg("TxRep: forgetting stored score %s of message %s", defined $msgscore ? sprintf("%0.3f", $msgscore) : 'undef', $msg_id);
|
|
} else {
|
|
# calculating the delta from the stored message reputation
|
|
$delta = ($msgscore + $self->{conf}->{txrep_factor}*$msg_rep) / (1+$self->{conf}->{txrep_factor}) - $msgscore;
|
|
if ($delta != 0) {
|
|
$pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
|
|
}
|
|
dbg("TxRep: message %s already scanned, using old data; post-TxRep score: %s", $msg_id, defined $pms->{score} ? sprintf("%0.3f", $pms->{score}) : 'undef');
|
|
if (!defined $self->{txKeepStoreTied}) {
|
|
$self->finish();
|
|
}
|
|
return 0;
|
|
}
|
|
} # no stored reputation found, go ahead with normal rep scan
|
|
} else {dbg("TxRep: no message-id available, parsing forced");}
|
|
} # else no message tracking, go ahead with normal rep scan
|
|
|
|
# welcomelist the recipients of mail sent from internal networks; done only after the MSG_ID check
|
|
if ( $self->{conf}->{txrep_welcomelist_out} &&
|
|
defined $pms->{relays_internal} && @{$pms->{relays_internal}} &&
|
|
(!defined $pms->{relays_external} || !@{$pms->{relays_external}})
|
|
) {
|
|
foreach my $rcpt ($pms->all_to_addrs()) {
|
|
if ($rcpt) {
|
|
dbg("TxRep: internal sender, welcomelisting recipient: $rcpt");
|
|
$self->modify_reputation($rcpt, -1*$self->{conf}->{txrep_welcomelist_out}, undef);
|
|
}
|
|
}
|
|
}
|
|
|
|
# Get the signing domain
|
|
my $signedby = ($self->{conf}->{auto_welcomelist_distinguish_signed})? $pms->get_tag('DKIMDOMAIN') : undef;
|
|
|
|
# Summary of all information we've gathered so far
|
|
dbg("TxRep: active, %s pre-score: %s, autolearn score: %s, IP: %s, address: %s %s",
|
|
$msg_id || '',
|
|
$pms->{score} || '?',
|
|
$msgscore || '?',
|
|
$origip || '?',
|
|
$from || '?',
|
|
$signedby ? "signed by $signedby" : '(unsigned)'
|
|
);
|
|
|
|
my $ip = $origip;
|
|
my $spf_domain;
|
|
if ($signedby) {
|
|
$ip = undef;
|
|
$domain = $signedby;
|
|
} elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf} && defined $pms->{spf_sender}) {
|
|
$ip = undef;
|
|
$spf_domain = $pms->{spf_sender};
|
|
$spf_domain =~ s/^.+@//;
|
|
$signedby = 'spf-'.$spf_domain;
|
|
dbg("TxRep: email signed by spf domain $spf_domain");
|
|
} elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) {
|
|
$ip = undef;
|
|
$signedby = 'spf';
|
|
}
|
|
|
|
my $totalweight = 0;
|
|
$self->{totalweight} = $totalweight;
|
|
|
|
# Get current reputation info
|
|
$delta += $self->check_reputations($pms, 'EMAIL_IP', $from, $ip, $signedby, $msgscore);
|
|
|
|
if ($domain) {
|
|
$delta += $self->check_reputations($pms, 'DOMAIN', $domain, $ip, $signedby, $msgscore);
|
|
}
|
|
if ($helo) {
|
|
$delta += $self->check_reputations($pms, 'HELO', $helo, undef, 'HELO', $msgscore);
|
|
}
|
|
if ($origip) {
|
|
if (!$signedby) {
|
|
$delta += $self->check_reputations($pms, 'EMAIL', $from, undef, undef, $msgscore);
|
|
}
|
|
$delta += $self->check_reputations($pms, 'IP', $origip, undef, undef, $msgscore);
|
|
}
|
|
|
|
# Learn against this message and store reputation
|
|
if (!defined $self->{learning}) {
|
|
$delta = ($self->{totalweight})? $self->{conf}->{txrep_factor} * $delta / $self->{totalweight} : 0;
|
|
if ($delta) {
|
|
$pms->got_hit("TXREP", "TXREP: ", ruletype => 'eval', score => sprintf("%0.3f", $delta));
|
|
}
|
|
$msgscore += $delta;
|
|
if (defined $pms->{score}) {
|
|
dbg("TxRep: post-TxRep score: %.3f", $pms->{score});
|
|
}
|
|
}
|
|
# Track message ID
|
|
if ($self->{conf}->{txrep_track_messages} && $msg_id) {
|
|
$self->check_reputations($pms, 'MSG_ID', $msg_id, undef, $date, $msgscore);
|
|
}
|
|
# Close any open resources
|
|
if (!defined $self->{txKeepStoreTied}) {
|
|
$self->finish();
|
|
}
|
|
|
|
return 0;
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub check_reputations {
|
|
###########################################################################
|
|
my $self = shift;
|
|
my $delta;
|
|
|
|
if ($self->open_storages()) {
|
|
if ($self->{conf}->{txrep_user2global_ratio} && $self->{user_storage} != $self->{global_storage}) {
|
|
my $user = $self->check_reputation('user_storage', @_);
|
|
my $global = $self->check_reputation('global_storage',@_);
|
|
|
|
# $user == $user is false only for NaN, so a not-a-number user reputation is ignored
if (defined $user and $user == $user) {
|
|
$delta = ( $self->{conf}->{txrep_user2global_ratio} * $user + $global ) / ( 1 + $self->{conf}->{txrep_user2global_ratio} );
|
|
} else {
|
|
$delta = $global;
|
|
}
|
|
} else {
|
|
$delta = $self->check_reputation(undef,@_);
|
|
}
|
|
}
|
|
return $delta;
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub check_reputation {
|
|
###########################################################################
|
|
my ($self, $storage, $pms, $key, $id, $ip, $signedby, $msgscore) = @_;
|
|
|
|
my $delta = 0;
|
|
my $weight = ($key eq 'MSG_ID') ? 1 : $pms->{main}->{conf}->{'txrep_weight_'.lc($key)};
|
|
|
|
# {
|
|
# #Bug 7164, trying to find out reason for these: _WARN: Use of uninitialized value $msgscore in addition (+) at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
|
|
# no warnings;
|
|
#
|
|
# unless (defined $msgscore) {
|
|
# #Output some params and the calling function so we can identify more about this bug
|
|
# dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: $ip, signedby: $signedby, msgscore: $msgscore");
|
|
# dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight");
|
|
#
|
|
# my ($package, $filename, $line) = caller();
|
|
#
|
|
# chomp($package);
|
|
# chomp($filename);
|
|
# chomp($line);
|
|
#
|
|
# dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: $package - Filename: $filename - Line: $line");
|
|
#
|
|
# #Define $msgscore as a triage to hide warnings while we find the root cause
|
|
# #$msgscore = 0;
|
|
# }
|
|
# }
|
|
|
|
|
|
if (defined $weight && $weight) {
|
|
my $meanrep;
|
|
my $timer = $self->{main}->time_method('check_txrep_'.lc($key));
|
|
|
|
if (defined $storage) {
|
|
$self->{checker} = $self->{$storage};
|
|
}
|
|
my $found = $self->get_sender($id, $ip, $signedby);
|
|
my $tag_id = (defined $storage)? uc($key.'_'.substr($storage,0,1)) : uc($key);
|
|
# TEMPLATE TAGS should match [A-Z] in their name
|
|
# and "_" must be avoided
|
|
$tag_id =~ s/_//g;
|
|
if (defined $found && ($self->count() > 0)) {
|
|
$meanrep = $self->total() / $self->count();
|
|
}
|
|
if ($self->{learning} && defined $msgscore) {
|
|
if (defined $meanrep) {
|
|
# $msgscore<=>0 gives the sign of $msgscore
|
|
$msgscore += ($msgscore<=>0) * abs($meanrep);
|
|
}
|
|
dbg("TxRep: reputation: %s, count: %d, learning: %s, $tag_id: %s",
|
|
defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
|
|
$self->count() || 0,
|
|
$self->{learning} || '',
|
|
$id || 'none'
|
|
);
|
|
} else {
|
|
$self->{totalweight} += $weight;
|
|
if ($key eq 'MSG_ID' && ($self->count() > 0)) {
|
|
$delta = $self->total() / $self->count();
|
|
$pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta));
|
|
} elsif (defined $self->total()) {
|
|
# Bug 7164 - $msgscore undefined
# in some cases the delta can come out negative even though both the total
# and $msgscore are positive (or positive although both are negative)
|
|
my $deltacheck;
|
|
my $skipmsgscore = 0;
|
|
if(defined $msgscore) {
|
|
$deltacheck = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
|
|
if(($self->total() > 0) && ($msgscore > 0) && ($deltacheck < 0)) {
|
|
$skipmsgscore = 1;
|
|
} elsif(($self->total() < 0) && ($msgscore < 0) && ($deltacheck > 0)) {
|
|
$skipmsgscore = 1;
|
|
}
|
|
}
|
|
if($skipmsgscore) {
|
|
dbg("TxRep: skipping msg score $msgscore when calculating delta");
|
|
}
|
|
if (defined $msgscore and not $skipmsgscore) {
|
|
$delta = $deltacheck;
|
|
} else {
|
|
$delta = ($self->total()) / (1 + $self->count());
|
|
}
|
|
|
|
$pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta));
|
|
if (defined $meanrep) {
|
|
$pms->set_tag('TXREP'.$tag_id.'MEAN', sprintf("%2.1f", $meanrep));
|
|
}
|
|
$pms->set_tag('TXREP'.$tag_id.'COUNT', sprintf("%2.1f", $self->count()));
|
|
$pms->set_tag('TXREP'.$tag_id.'PRESCORE', sprintf("%2.1f", $pms->{score}));
|
|
} else {
|
|
$pms->set_tag('TXREP'.$tag_id.'UNKNOWN', 1);
|
|
}
|
|
dbg("TxRep: reputation: %s, count: %d, weight: %.1f, delta: %.3f, $tag_id: %s",
|
|
defined $meanrep? sprintf("%.3f",$meanrep) : 'none',
|
|
$self->count() || 0,
|
|
$weight || 0,
|
|
$delta || 0,
|
|
$id || 'none'
|
|
);
|
|
|
|
if ($self->{conf}->{txrep_report_details}
|
|
&& defined $id && defined $meanrep && $tag_id ne "MSGID") {
|
|
|
|
my $log = sprintf("%s: %s",
|
|
$tag_id,
|
|
(defined $ip) ? $id."|".$self->ip_to_awl_key($ip) : $id
|
|
);
|
|
|
|
if ($self->{conf}->{txrep_report_details} == 2) {
|
|
$log .= sprintf(", rep: %.2f, count: %d",
|
|
$meanrep,
|
|
$self->count() || 0
|
|
);
|
|
}
|
|
|
|
$pms->test_log($log, "TXREP");
|
|
# dbg ("TxRep: test_log: $log");
|
|
}
|
|
|
|
}
|
|
$timer = $self->{main}->time_method('update_txrep_'.lc($key));
|
|
if (defined $msgscore) {
|
|
if ($self->{forgetting}) { # forgetting a message score
|
|
$self->remove_score($msgscore); # remove the given score and decrement the count
|
|
if ($key eq 'MSG_ID') { # remove the message ID score completely
|
|
$self->{checker}->remove_entry($self->{entry});
|
|
}
|
|
} else {
|
|
$self->add_score($msgscore); # add the score and increment the count
|
|
if ($self->{learning} && $key eq 'MSG_ID' && $self->count() == 1) {
$self->add_score($msgscore); # add once more, so that a learned score (count=2)
} # can be distinguished from a scanned score (count=1)
|
|
}
|
|
} elsif (defined $found && $self->{forgetting} && $key eq 'MSG_ID') {
|
|
$self->{checker}->remove_entry($self->{entry}); #forgetting the message ID
|
|
}
|
|
}
|
|
if (!defined $storage) {
|
|
$self->{checker} = $self->{default_storage};
|
|
}
|
|
|
|
return ($weight || 0) * ($delta || 0);
|
|
}
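
=begin comment

A minimal sketch of the delta computed by check_reputation() for the keys other
than MSG_ID when a message is merely scanned (neither learned nor forgotten).
The helper sketch_delta() and the numbers are hypothetical and serve only as an
illustration; the helper mirrors the arithmetic of the branch above and is not
part of TxRep.

    use strict;
    use warnings;

    # $total and $count stand for the stored totscore and msgcount,
    # $msgscore for the score of the message being scanned (may be undef)
    sub sketch_delta {
      my ($total, $count, $msgscore) = @_;
      my $fallback = $total / (1 + $count);   # used when $msgscore is skipped
      return $fallback unless defined $msgscore;
      my $delta = ($total + $msgscore) / (1 + $count) - $msgscore;
      # total and msgscore share a sign, but the delta has the opposite one
      return $fallback if $total > 0 && $msgscore > 0 && $delta < 0;
      return $fallback if $total < 0 && $msgscore < 0 && $delta > 0;
      return $delta;
    }

    printf "%.2f\n", sketch_delta(30, 5, 2);    # 32/6 - 2       ->  3.33
    printf "%.2f\n", sketch_delta(30, 5, 12);   # 42/6 - 12 < 0  ->  30/6 = 5.00
    printf "%.2f\n", sketch_delta(-12, 3, 1);   # -11/4 - 1      -> -3.75

The value returned by check_reputation() is this delta multiplied by the
weight configured for the given key.

=end comment

=cut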
|
|
|
|
|
|
|
|
#--------------------------------------------------------------------------
|
|
# Database handler subroutines
|
|
#--------------------------------------------------------------------------
|
|
|
|
###########################################################################
|
|
sub count {my $self=shift; return (defined $self->{checker})? $self->{entry}->{msgcount} : 0;}
|
|
sub total {my $self=shift; return (defined $self->{checker})? $self->{entry}->{totscore} : undef;}
|
|
###########################################################################
|
|
|
|
|
|
###########################################################################
|
|
sub get_sender {
|
|
###########################################################################
|
|
my ($self, $addr, $origip, $signedby) = @_;
|
|
|
|
return unless (defined $self->{checker});
|
|
|
|
my $fulladdr = $self->pack_addr($addr, $origip);
|
|
my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
|
|
$self->{entry} = $entry;
|
|
$origip = $origip || 'none';
|
|
|
|
if ($entry->{msgcount}<0 || $entry->{msgcount}=~/^(nan|)$/ || $entry->{totscore}=~/^(nan|)$/) {
|
|
warn "TxRep: resetting bad data for ($addr, $origip), count: $entry->{msgcount}, totscore: $entry->{totscore}\n";
|
|
$self->{entry}->{msgcount} = $self->{entry}->{totscore} = 0;
|
|
}
|
|
return $self->{entry}->{msgcount};
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub add_score {
|
|
###########################################################################
|
|
my ($self,$score) = @_;
|
|
|
|
return unless (defined $self->{checker}); # no factory defined; we can't check
|
|
|
|
if ($score != $score) {
|
|
warn "TxRep: attempt to add a $score to TxRep entry ignored\n";
|
|
return; # don't try to add a NaN
|
|
}
|
|
$self->{entry}->{msgcount} ||= 0;
|
|
|
|
# performing the dilution aging correction
|
|
if (defined $self->total() && defined $self->count() && $self->count() > 0 && defined $self->{txrep_dilution_factor}) {
|
|
my $diluted_total =
|
|
($self->count() + 1) *
|
|
($self->{txrep_dilution_factor} * $self->total() + $score) /
|
|
($self->{txrep_dilution_factor} * $self->count() + 1);
|
|
my $corrected_score = $diluted_total - $self->total();
|
|
$self->{checker}->add_score($self->{entry}, $corrected_score);
|
|
} else {
|
|
$self->{checker}->add_score($self->{entry}, $score);
|
|
}
|
|
}
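
=begin comment

A minimal sketch of the dilution correction applied by add_score() above. The
helper sketch_diluted_mean() and all numbers are hypothetical. The sketch
assumes that the storage backend adds the passed score to the stored total and
increments the message count, as the comment at the call site in
check_reputation() suggests.

    use strict;
    use warnings;

    # returns the mean reputation of the record after adding $score
    sub sketch_diluted_mean {
      my ($total, $count, $score, $factor) = @_;
      my $diluted_total =
        ($count + 1) * ($factor * $total + $score) / ($factor * $count + 1);
      return $diluted_total / ($count + 1);
    }

    # record with total 40 over 10 messages (mean 4), new score 15
    printf "%.3f\n", sketch_diluted_mean(40, 10, 15, 1);     # 55/11 -> 5.000
    printf "%.3f\n", sketch_diluted_mean(40, 10, 15, 0.5);   # 35/6  -> 5.833

With a factor of 1 the result is the plain running average; a factor below 1
gives the older scores less weight, so the new score moves the mean further.

=end comment

=cut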
|
|
|
|
|
|
|
|
###########################################################################
|
|
sub remove_score {
|
|
###########################################################################
|
|
my ($self,$score) = @_;
|
|
|
|
return unless (defined $self->{checker}); # no factory defined; we can't check
|
|
|
|
if ($score != $score) { # don't try to remove a NaN
warn "TxRep: attempt to remove a $score from TxRep entry ignored\n";
|
|
return;
|
|
}
|
|
# no reversal dilution aging correction (not easily possible),
|
|
# just removing the original message score
|
|
if ($self->{entry}->{msgcount} > 2)
|
|
{$self->{entry}->{msgcount} -= 2;}
|
|
else {$self->{entry}->{msgcount} = 0;}
|
|
# subtract 2, and add a score; hence decrementing by 1
|
|
$self->{checker}->add_score($self->{entry}, -1*$score);
|
|
}
|
|
|
|
|
|
|
|
###########################################################################
|
|
sub modify_reputation {
|
|
###########################################################################
|
|
my ($self, $addr, $score, $signedby) = @_;
|
|
|
|
return unless (defined $self->{checker}); # no factory defined; we can't check
|
|
my $fulladdr = $self->pack_addr($addr, undef);
|
|
my $entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
|
|
|
|
# remove any old entries (will remove per-ip entries as well)
|
|
# always call this regardless, as the current entry may have 0
|
|
# scores, but the per-ip one may have more
|
|
$self->{checker}->remove_entry($entry);
|
|
|
|
# remove address only, no new score to add if score NaN or undef
|
|
if (defined $score && $score==$score) {
|
|
# else add score. get a new entry first
|
|
$entry = $self->{checker}->get_addr_entry($fulladdr, $signedby);
|
|
$self->{checker}->add_score($entry, $score);
|
|
}
|
|
return 1;
|
|
}
|
|
|
|
|
|
# connecting the primary and the secondary storage; needed only on the first run
|
|
# (this can't be in the constructor, since the settings are not available there)
|
|
###########################################################################
|
|
sub open_storages {
|
|
###########################################################################
|
|
my $self = shift;
|
|
|
|
# Enabled per bug 7191 comment 18
|
|
return 1 if defined $self->{default_storage};
|
|
|
|
return 1 if defined ($self->{checker});
|
|
|
|
my $factory;
|
|
if ($self->{main}->{pers_addr_list_factory}) {
|
|
$factory = $self->{main}->{pers_addr_list_factory};
|
|
} else {
|
|
my $type = $self->{conf}->{txrep_factory};
|
|
if ($type =~ /^[_A-Za-z0-9:]+$/) {
|
|
$type = untaint_var($type);
|
|
eval '
|
|
require '.$type.';
|
|
$factory = '.$type.'->new();
|
|
1;
|
|
' or do {
|
|
my $eval_stat = $@ ne '' ? $@ : "errno=$!"; chomp $eval_stat;
|
|
warn "TxRep: $eval_stat\n";
|
|
undef $factory;
|
|
};
|
|
$self->{main}->set_persistent_address_list_factory($factory) if $factory;
|
|
} else {warn "TxRep: illegal factory setting\n";}
|
|
}
|
|
if (defined $factory) {
|
|
$self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main});
|
|
|
|
if ($self->{conf}->{txrep_user2global_ratio} && !defined $self->{global_storage}) {
|
|
# hack to handle the BDB and SQL factory types of the storage object
|
|
# TODO: add a method to the handler class instead
|
|
my ($storage_type, $is_global);
|
|
|
|
if (index(ref($factory), 'SQLBasedAddrList') >= 0) {
|
|
$is_global = defined $self->{conf}->{user_awl_sql_override_username};
|
|
$storage_type = 'SQL';
|
|
if ($is_global && $self->{conf}->{user_awl_sql_override_username} eq $self->{main}->{username}) {
|
|
# skip double storage if current user same as the global override
|
|
$self->{user_storage} = $self->{global_storage} = $self->{default_storage};
|
|
}
|
|
} elsif (index(ref($factory), 'DBBasedAddrList') >= 0) {
|
|
$is_global = index($self->{conf}->{auto_welcomelist_path}, '__userstate__') == -1;
|
|
$storage_type = 'DB';
|
|
}
|
|
if (!defined $self->{global_storage}) {
|
|
my $sql_override_orig = $self->{conf}->{user_awl_sql_override_username};
|
|
my $awl_path_orig = $self->{conf}->{auto_welcomelist_path};
|
|
if ($is_global) {
|
|
$self->{conf}->{user_awl_sql_override_username} = '';
|
|
$self->{conf}->{auto_welcomelist_path} = '__userstate__/tx-reputation';
|
|
$self->{global_storage} = $self->{default_storage};
|
|
$self->{user_storage} = $factory->new_checker($self->{main});
|
|
} else {
|
|
$self->{conf}->{user_awl_sql_override_username} = 'GLOBAL';
|
|
$self->{conf}->{auto_welcomelist_path} = '__local_state_dir__/tx-reputation';
|
|
$self->{global_storage} = $factory->new_checker($self->{main});
|
|
$self->{user_storage} = $self->{default_storage};
|
|
}
|
|
$self->{conf}->{user_awl_sql_override_username} = $sql_override_orig;
|
|
$self->{conf}->{auto_welcomelist_path} = $awl_path_orig;
|
|
|
|
# Another ugly hack to find out whether the user differs from
|
|
# the global one. We need to add a method to the factory handlers
|
|
if ($storage_type eq 'DB' &&
|
|
$self->{user_storage}->{locked_file} eq $self->{global_storage}->{locked_file}) {
|
|
if ($is_global)
|
|
{$self->{global_storage}->finish();}
|
|
else {$self->{user_storage}->finish();}
|
|
$self->{user_storage} = $self->{global_storage} = $self->{default_storage};
|
|
}
|
|
}
|
|
}
|
|
} else {
|
|
$self->{user_storage} = $self->{global_storage} = $self->{checker} = $self->{default_storage} = undef;
|
|
warn("TxRep: could not open storages, quitting!\n");
|
|
return 0;
|
|
}
|
|
return 1;
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub finish {
|
|
###########################################################################
|
|
my $self = shift;
|
|
|
|
return unless (defined $self->{checker}); # no factory defined; we can't check
|
|
|
|
if ($self->{conf}->{txrep_user2global_ratio} && defined $self->{user_storage} && ($self->{user_storage} != $self->{global_storage})) {
|
|
$self->{user_storage}->finish();
|
|
$self->{global_storage}->finish();
|
|
$self->{user_storage} = undef;
|
|
$self->{global_storage} = undef;
|
|
} elsif (defined $self->{default_storage}) {
|
|
$self->{default_storage}->finish();
|
|
}
|
|
$self->{default_storage} = $self->{checker} = undef;
|
|
$self->{factory} = undef;
|
|
}
|
|
|
|
|
|
###########################################################################
|
|
sub ip_to_awl_key {
|
|
###########################################################################
|
|
my ($self, $origip) = @_;
|
|
|
|
my $result;
|
|
local $1;
|
|
if (!defined $origip) {
|
|
# could not find an IP address to use
|
|
} elsif ($origip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs) {
|
|
my $mask_len = $self->{conf}->{txrep_ipv4_mask_len};
|
|
$mask_len = 16 if !defined $mask_len;
|
|
# handle the default and easy cases manually
|
|
if ($mask_len == 32) {$result = $origip;}
|
|
elsif ($mask_len == 16) {$result = $1;}
|
|
else {
|
|
my $origip_obj = NetAddr::IP->new($origip . '/' . $mask_len);
|
|
if (!defined $origip_obj) { # invalid IPv4 address
|
|
dbg("TxRep: bad IPv4 address $origip");
|
|
} else {
|
|
$result = $origip_obj->network->addr;
|
|
$result =~s/(\.0){1,3}\z//; # truncate zero tail
|
|
}
|
|
}
|
|
} elsif (index($origip, ':') >= 0 && # triage
|
|
$origip =~
|
|
/^ [0-9a-f]{0,4} (?: : [0-9a-f]{0,4} | \. [0-9]{1,3} ){2,9} $/xsi) {
|
|
# looks like an IPv6 address
|
|
my $mask_len = $self->{conf}->{txrep_ipv6_mask_len};
|
|
$mask_len = 48 if !defined $mask_len;
|
|
my $origip_obj = NetAddr::IP->new6($origip . '/' . $mask_len);
|
|
if (!defined $origip_obj) { # invalid IPv6 address
|
|
dbg("TxRep: bad IPv6 address $origip");
|
|
} else {
|
|
$result = $origip_obj->network->full6; # string in a canonical form
|
|
$result =~ s/(:0000){1,7}\z/::/; # compress zero tail
|
|
}
|
|
} else {
|
|
dbg("TxRep: bad IP address $origip");
|
|
}
|
|
if (defined $result && length($result) > 39) { # just in case, keep under
|
|
$result = substr($result,0,39); # the awl.ip field size
|
|
}
|
|
# if (defined $result) {dbg("TxRep: IP masking %s -> %s", $origip || '?', $result || '?');}
|
|
return $result;
|
|
}
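
=begin comment

A short sketch of the IPv4 masking performed by ip_to_awl_key() above, using
the same regular expressions and NetAddr::IP calls. The address and the mask
length of 24 are arbitrary example values.

    use strict;
    use warnings;
    use NetAddr::IP;

    my $ip = '192.0.2.77';

    # default mask length 16: the first two octets are taken from the regex
    my ($key16) = $ip =~ /^ (\d{1,3} \. \d{1,3}) \. \d{1,3} \. \d{1,3} $/xs;
    print "$key16\n";                     # 192.0

    # any other mask length: network address with the zero tail removed
    my $net   = NetAddr::IP->new($ip . '/24');
    my $key24 = $net->network->addr;      # 192.0.2.0
    $key24 =~ s/(\.0){1,3}\z//;
    print "$key24\n";                     # 192.0.2

=end comment

=cut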
|
|
|
|
|
|
###########################################################################
|
|
sub pack_addr {
|
|
###########################################################################
|
|
my ($self, $addr, $origip) = @_;
|
|
|
|
$addr = lc $addr;
|
|
$addr =~ s/[\000\;\'\"\!\|]/_/gs; # paranoia
|
|
|
|
if ( defined $origip) {$origip = $self->ip_to_awl_key($origip);}
|
|
if (!defined $origip) {$origip = 'none';}
|
|
if ( $self->{conf}->{txrep_welcomelist_out} &&
|
|
defined $self->{pms}->{relays_internal} && @{$self->{pms}->{relays_internal}} &&
|
|
(!defined $self->{pms}->{relays_external} || !@{$self->{pms}->{relays_external}})
|
|
and $addr =~ /(?:[^\s\@]+)\@(?:[^\s\@]+)/) {
|
|
$origip = 'WELCOMELIST_OUT';
|
|
}
|
|
return $addr . "|ip=" . $origip;
|
|
}
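
=begin comment

A minimal sketch of the storage key built by pack_addr() above. The helper
sketch_pack_addr() is hypothetical: it expects an already masked IP key and
leaves out the WELCOMELIST_OUT branch.

    use strict;
    use warnings;

    sub sketch_pack_addr {
      my ($addr, $ipkey) = @_;
      $addr = lc $addr;
      $addr =~ s/[\000\;\'\"\!\|]/_/gs;        # same character set as above
      $ipkey = 'none' unless defined $ipkey;
      return $addr . '|ip=' . $ipkey;
    }

    print sketch_pack_addr(q{Alice.O'Neil@Example.COM}, '192.0'), "\n";
    # alice.o_neil@example.com|ip=192.0
    print sketch_pack_addr('bob@example.org'), "\n";
    # bob@example.org|ip=none

=end comment

=cut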
|
|
|
|
|
|
=head1 LEARNING SPAM / HAM
|
|
|
|
When SpamAssassin is told to learn (or relearn) a given message as spam or
ham, all reputations relevant to the message (email, email_ip, domain, ip, helo)
in both the global and the user storage are updated using the value of
C<txrep_learn_penalty> (for spam) or C<txrep_learn_bonus> (for ham). The new
reputation of a given sender property (email, domain, ...) is the result of
the respective formula:
|
|
|
|
new_reputation = old_reputation + learn_penalty
|
|
new_reputation = old_reputation - learn_bonus
|
|
|
|
When L</C<txrep_track_messages>> is disabled, TxRep does not track each message
individually, hence it cannot detect that the same message is learned
repeatedly. It will then add/subtract the penalty/bonus score each time the
message is fed to the spam learner.
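
As a simplified sketch of the learning branch in check_reputation(), the learned
value is first enlarged by the magnitude of the current mean reputation and
then added to the record as one more score. The numbers below are made up; the
sketch assumes that the penalty is what reaches check_reputation() as the
message score, and it ignores the dilution correction and the message tracking:

    my ($total, $count) = (6, 3);        # stored record, mean reputation 2
    my $msgscore = 20;                   # e.g. a txrep_learn_penalty of 20
    my $meanrep  = $total / $count;
    $msgscore += ($msgscore <=> 0) * abs($meanrep);   # 20 + 2 = 22
    $total += $msgscore;                 # 28
    $count++;                            # 4
    printf "%.1f\n", $total / $count;    # prints 7.0
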
|
|
|
|
=cut
|
|
|
|
######################################################### plugin hook #####
|
|
sub learner_new {
|
|
###########################################################################
|
|
my ($self) = @_;
|
|
|
|
$self->{txKeepStoreTied} = undef;
|
|
return $self;
|
|
}
|
|
|
|
|
|
######################################################### plugin hook #####
|
|
sub autolearn {
|
|
###########################################################################
|
|
my ($self, $params) = @_;
|
|
|
|
$self->{last_pms} = $params->{permsgstatus};
|
|
return $self->{autolearn} = 1;
|
|
}
|
|
|
|
|
|
######################################################### plugin hook #####
|
|
sub learn_message {
|
|
###########################################################################
|
|
my ($self, $params) = @_;
|
|
return 0 unless (defined $params->{isspam});
|
|
|
|
dbg("TxRep: learning a message");
|
|
my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
|
|
if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) {
|
|
$pms->extract_message_metadata();
|
|
}
|
|
|
|
if ($params->{isspam})
|
|
{$self->{learning} = $self->{conf}->{txrep_learn_penalty};}
|
|
else {$self->{learning} = -1 * $self->{conf}->{txrep_learn_bonus};}
|
|
|
|
my $ret = !$self->{learning} || $self->check_senders_reputation($pms);
|
|
$self->{learning} = undef;
|
|
return $ret;
|
|
}
|
|
|
|
|
|
######################################################### plugin hook #####
|
|
sub forget_message {
|
|
###########################################################################
|
|
my ($self, $params) = @_;
|
|
return 0 unless ($self->{conf}->{use_txrep});
|
|
my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
|
|
|
|
dbg("TxRep: forgetting a message");
|
|
$self->{forgetting} = 1;
|
|
my $ret = $self->check_senders_reputation($pms);
|
|
$self->{forgetting} = undef;
|
|
return $ret;
|
|
}
|
|
|
|
|
|
######################################################### plugin hook #####
|
|
sub learner_expire_old_training {
|
|
###########################################################################
|
|
my ($self, $params) = @_;
|
|
return 0 unless ($self->{conf}->{use_txrep} && $self->{conf}->{txrep_expiry_days});
|
|
|
|
dbg("TxRep: expiry not implemented yet");
|
|
# dbg("TxRep: expiry starting");
|
|
# my $timer = $self->{main}->time_method("expire_bayes");
|
|
# $self->{store}->expire_old_tokens($params);
|
|
# dbg("TxRep: expiry completed");
|
|
}
|
|
|
|
|
|
######################################################### plugin hook #####
|
|
sub learner_close {
|
|
###########################################################################
|
|
my ($self, $params) = @_;
|
|
my $quiet = $params->{quiet};
|
|
return 0 unless ($self->{conf}->{use_txrep});
|
|
|
|
$self->{txKeepStoreTied} = undef;
|
|
$self->finish();
|
|
dbg("TxRep: learner_close");
|
|
}
|
|
|
|
|
|
=head1 OPTIMIZING TXREP
|
|
|
|
TxRep can be optimized for speed and simplicity, or for precision in
assigning the reputation scores.

First of all, TxRep can be quickly disabled and re-enabled through the option
L</C<use_txrep>>. This can be done globally, or individually in each respective
C<user_prefs>. Disabling TxRep does not destroy the database, so it can be
re-enabled at any later time.
|
|
|
|
On many systems, SQL-based storage may perform faster than the default
|
|
Berkeley DB storage, so you should consider setting it up.
|
|
|
|
Then there are multiple settings that can reduce the number of records stored
|
|
in the database, hence reducing the size of the storage, and also the processing
|
|
time:
|
|
|
|
1. Setting L</C<txrep_user2global_ratio>> to zero disables the dual storage,
thus halving the disk space requirements and the processing time of this plugin.

2. You can disable all but one of the L<REPUTATION WEIGHTS>. EMAIL_IP is
the most specific option, so it is the most likely choice in such a case, but
you could base the reputation system on any of the remaining scores. Each
enabled reputation adds a new entry to the database for each new identifier.
So while, for example, the number of recorded and scored domains may be large,
the number of stored IP addresses will probably be higher and require more
space in the storage.

3. Disabling L</C<txrep_track_messages>> avoids storing a separate entry
for every scanned message, which also reduces the disk space requirements and
the processing time.

4. Disabling the option L</C<txrep_autolearn>> saves processing time
for messages that trigger the auto-learning process.

5. Disabling L</C<txrep_welcomelist_out>> reduces the processing time for
outbound connections.

6. Keeping the option L</C<auto_welcomelist_distinguish_signed>> enabled may
help to slightly reduce the size of the database, because for signed messages
the originating IP address is ignored, so no additional database entries are
needed for each separate IP address (or masked block of IP addresses).
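
The settings from the list above could be combined in a C<.cf> file roughly as
follows. This is only an illustrative sketch, not a recommendation: the weight
option names are assumed to follow the C<txrep_weight_E<lt>keyE<gt>> pattern
used by the code, and EMAIL_IP is kept as the single enabled reputation:

    txrep_user2global_ratio              0
    txrep_weight_email                   0
    txrep_weight_domain                  0
    txrep_weight_ip                      0
    txrep_weight_helo                    0
    txrep_track_messages                 0
    txrep_autolearn                      0
    txrep_welcomelist_out                0
    auto_welcomelist_distinguish_signed  1
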
|
|
|
|
|
|
Since TxRep reuses the storage architecture of the former AWL plugin, the
same instructions for initializing the SQL storage also apply to TxRep.
Although the old AWL table can be reused for TxRep, by default TxRep expects
the SQL table to be named "txrep".
|
|
|
|
To install a new SQL table for TxRep, run the appropriate SQL file for your
|
|
system under the /sql directory.
|
|
|
|
If you get a syntax error with an older version of MySQL, use TYPE=MyISAM
instead of ENGINE=MyISAM at the end of the command. You can also use other
types of ENGINE (depending on what is available on your system). For example,
the MEMORY engine stores the entire table in the server memory, achieving
performance similar to Redis. You would need to take care of replicating
the RAM table to disk through a cron job, to avoid loss of data at reboot.
The InnoDB engine is used by default, offering high scalability (database
size and concurrent access). In conjunction with a high value of
innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+), it can
also offer performance comparable to Redis.
|
|
|
|
=cut
|
|
|
|
1;
|