mirror of
https://git.proxmox.com/git/proxmox-spamassassin
synced 2025-06-03 12:42:05 +00:00
214 lines
9.0 KiB
Plaintext
214 lines
9.0 KiB
Plaintext
|
|
Using A SQL Database for Bayesian Storage Module
|
|
-------------------------------------------------------
|
|
|
|
SpamAssassin can now store users' bayesian filter data in a SQL
|
|
database. The most common use for a system like this would be for
|
|
users to be able to have per user bayesian filter data on systems
|
|
where users may not have a home directory to store the data.
|
|
|
|
In order to activate the SQL based bayesian storage you have to
|
|
configure spamassassin and spamd to use a different bayes storage
|
|
module. This can be done via a setting in the global configuration
|
|
file.
|
|
|
|
The directives required to turn on the SQL based bayesian storage are:
|
|
|
|
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
|
|
|
|
This directive is used by the Bayes module to determine which storage
|
|
module should be used. If not set it will default to:
|
|
Mail::SpamAssassin::BayesStore::DBM
|
|
|
|
The storage module Mail::SpamAssassin::BayesStore::SQL is an older generic
|
|
SQL module which can be also be used with versions of MySQL which did not
|
|
have support for an InnoDB engine and transactions. If choosing this module
|
|
consider replacing the InnoDB engine with MyISAM (explicitly or defaulted)
|
|
in the schema (files bayes_mysql.sql and awl_mysql.sql). Note that old
|
|
versions of MySQL expect syntax TYPE=MyISAM instead of ENGINE=MyISAM,
|
|
while newer versions throw a syntax error on TYPE and only allow ENGINE.
|
|
In short: replace ENGINE=InnoDB with TYPE=MyISAM (or just leave it out)
|
|
in the bayes_mysql.sql and awl_mysql.sql schemas if ENGINE=InnoDB is not
|
|
accepted.
|
|
|
|
There is also a MySQL specific storage driver available to provides a
|
|
small boost in performance. It requires version 4.1 or above of the
|
|
MySQL database software to work properly. In addition, it provides
|
|
rollback on error functionality if you create your bayes database table
|
|
using the InnoDB storage engine. WARNING: Using this module with a version
|
|
of MySQL < 4.1 could have unexpected results. To use the MySQL 4.1+
|
|
specific module set your bayes_store_module directive accordingly:
|
|
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
|
|
|
|
PostgreSQL users will want to use the PostgreSQL specific storage
|
|
module:
|
|
bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
|
|
This module provides a slightly different interface to makes better
|
|
use of the resources that PostgreSQL offers. In addition, please make
|
|
sure that you follow the instructions below for loading the proper
|
|
procedural language and installing the tables and stored procedure.
|
|
|
|
Additional configuration directives provided by BayesSQL:
|
|
|
|
bayes_sql_dsn DBI:driver:database:hostname[:port]
|
|
bayes_sql_username dbusername
|
|
bayes_sql_password dbpassword
|
|
|
|
The bayes_sql_dsn directive describes the data source name that will
|
|
be used to create the connection to your SQL server. It MUST be in
|
|
the format as listed above. <driver> should be the DBD driver that
|
|
you have installed to access your database (initially tested with
|
|
MySQL, PostgreSQL, and SQLite). <database> must be the name of the
|
|
database that you created to store the bayes data tables. <hostname>
|
|
is the name of the host that contains the SQL database server.
|
|
<port> is the optional port number where your database server is
|
|
listening.
|
|
|
|
For an example of connection to PostgreSQL, see the main README file.
|
|
|
|
In addition to the global configuration directives there is a user
|
|
preference:
|
|
|
|
bayes_sql_override_username someusername
|
|
|
|
This directive, if used, will override the username used for storing
|
|
data in the database. This could be used to group users together to
|
|
share bayesian filter data. You can also use this config option to
|
|
trick sa-learn to learn data as a specific user.
|
|
|
|
|
|
Requirements
|
|
------------
|
|
|
|
In order for SpamAssassin to work with your SQL database, you must
|
|
have the perl DBI module installed, AS WELL AS the DBD driver/module
|
|
for your specific database. For example, if using MySQL as your
|
|
RDBMS, you must have the DBD::mysql module installed, a PostgreSQL
|
|
database requires DBD::Pg. Check CPAN for latest versions of DBI
|
|
and your database driver/module.
|
|
|
|
The BayesStore::SQL module was tested with:
|
|
|
|
DBI-1.38
|
|
DBD-mysql-2.9002
|
|
perl v5.8.0
|
|
|
|
But older (and newer) versions should work fine as the SQL code in
|
|
SpamAssassin is as simple as could be.
|
|
|
|
NOTE: Some users have reported problems using DBD::Pg v1.22. There appears
|
|
to be a bug in how parameters are quoted in that version, we recommend you
|
|
upgrade to at least v1.31.
|
|
|
|
In addition, there appears to be a quote bug in some versions of PostgreSQL.
|
|
It's unclear when the bug was fixed. 7.4.2 seems to work just fine, however
|
|
some have reported problems with 7.3.6. Your mileage may vary with versions
|
|
less than 7.4.2. If you happen to know when the specific bug was fixed please
|
|
feel free to notify the dev team so they can update this documentation.
|
|
|
|
Database Schema
|
|
---------------
|
|
|
|
The database schema for storage of the bayesian filter data contains
|
|
several different tables. Several sample SQL schemas have been
|
|
included in to help in setting up your database. The schemas contain
|
|
the minimum tables and columns necessary to work with the code as
|
|
written. You are free to add other columns as needed for your local
|
|
implementation. Presently there is no way to override the table and
|
|
column names used by the BayesStore::SQL code, this feature may be added
|
|
in the future.
|
|
|
|
Example setup of bayes tables for MySQL:
|
|
|
|
This assumes that you have already created a database for use with
|
|
spamassassin and setup a username/password that can access that database.
|
|
(See "Creating A Database", in "sql/README", if you don't have a suitable
|
|
database ready.)
|
|
|
|
To install the tables using the included example, use the following command:
|
|
|
|
mysql -h <hostname> -u <adminusername> -p databasename < bayes_mysql.sql
|
|
Enter password: <adminpassword>
|
|
|
|
To install the tables for PostgreSQL, use:
|
|
|
|
You need to install the plpgsql procedural language (assuming it
|
|
hasn't already been installed in template1). You can do this by
|
|
running the following command:
|
|
|
|
createlang plpgsql <databasename>
|
|
|
|
and then:
|
|
|
|
psql -U <username> -f bayes_pg.sql <databasename>
|
|
|
|
From a security viewpoint, it is wise to grant the minimum privileges
|
|
needed to the user on the bayes tables. These grants largely depend
|
|
on your system policies but here are some example grants to get you
|
|
started, please review carefully before putting into production:
|
|
|
|
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE bayes_token TO <username>;
|
|
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE bayes_vars TO <username>;
|
|
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE bayes_seen TO <username>;
|
|
GRANT SELECT, DELETE, INSERT ON TABLE bayes_expire TO <username>;
|
|
GRANT SELECT ON TABLE bayes_global_vars TO <username>;
|
|
GRANT USAGE, SELECT, UPDATE ON SEQUENCE bayes_vars_id_seq TO <username>;
|
|
|
|
Once you have created the database and added the tables, just add the
|
|
required lines to your global configuration file (local.cf).
|
|
|
|
Testing SpamAssassin/SQL
|
|
------------------------
|
|
|
|
To test your SQL setup, and debug any possible problems, you should
|
|
start spamd with the -D option, which will keep spamd in the
|
|
foreground, and will output debug message to the terminal. You should
|
|
then test spamd with a message by calling spamc. You can use the
|
|
sample-spam.txt file with the following command:
|
|
|
|
spamc < sample-spam.txt
|
|
|
|
Watch the debug output from spamd and look for the following debug
|
|
line:
|
|
|
|
debug: bayes: Database connection established
|
|
debug: bayes: Using username: <username>
|
|
|
|
If you do not see the above text, then the SQL query was not
|
|
successful, and you should see any error messages reported.
|
|
|
|
This code has been tested using MySQL as the RDMS, with basic tests
|
|
against PostgreSQL and SQLite. It does require a database that allows
|
|
you to refer to a column on the right hand side of an expression (ie
|
|
update foo set bar = bar + 1). Any database driver that allows for
|
|
that usage should work with the BayesStore::SQL code. NOTE: You may
|
|
find that some implementations do not provide a significant advantage
|
|
over using the default DBM implementation. If you find a driver that
|
|
should work and has issues, please report them to the SADev list.
|
|
|
|
Converting Bayes Data From a DBM Database
|
|
-----------------------------------------
|
|
|
|
Converting your bayes database data from Berkeley (DBM) based storage
|
|
to SQL based storage is as simple as a backup and then restore.
|
|
|
|
If you are upgrading from a previous version of SpamAssassin you
|
|
should first follow any recommended upgrade instructions for that
|
|
release, in most cases this will be as simple as running an
|
|
sa-learn --sync
|
|
|
|
Once you have performed this upgrade, for each bayes database follow
|
|
this procedure:
|
|
|
|
o Run 'sa-learn --backup > backup.txt' which will backup your bayes
|
|
data into a text file.
|
|
o Optionally you can run 'sa-learn --clear' to remove the DBM based
|
|
bayes files.
|
|
o Modify your local.cf file according to the directions above.
|
|
o Run 'sa-learn --restore backup.txt' to restore your bayes data to
|
|
the SQL database.
|
|
|
|
NOTE: sa-learn must be run as the user who's data you are loading, or
|
|
you must make use of the bayes_sql_override_username config
|
|
option.
|