56 Commits

Author SHA1 Message Date
Thomas Lamprecht
5d551f5e2a influxdb: rework comment
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-26 17:38:22 +02:00
Lorenz Stechauner
456be47171 fix #3440: influxdb: remove duplicate vmid tag
remove vmid from data part, it is already contained in object part.
this is accomplished by adding the parameter $excluded to
build_influxdb_payload().

Signed-off-by: Lorenz Stechauner <l.stechauner@proxmox.com>
2021-05-26 17:36:54 +02:00
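A minimal Python sketch of the idea behind 456be47171, assuming a flat status dict. The real build_influxdb_payload() is Perl and its signature is not shown in this log, so the helper name `build_fields` and its `excluded` argument are hypothetical stand-ins:

```python
def build_fields(data, excluded=None):
    """Render a flat dict as InfluxDB line-protocol fields, skipping excluded keys."""
    excluded = excluded or set()
    parts = []
    for key in sorted(data):
        if key in excluded:
            continue  # already present in the object (tag) part, so do not repeat it
        parts.append(f"{key}={data[key]}")
    return ",".join(parts)


# Hypothetical guest status; vmid is assumed to be sent as a tag already.
status = {"vmid": 100, "cpu": 0.25, "mem": 2048}
fields = build_fields(status, excluded={"vmid"})
```

With `vmid` excluded, only `cpu` and `mem` end up in the field part.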
Dominik Csapak
bc33c73963 metrics: influx: fix default api_prefix
we set the api prefix to '/' by default, so we always triggered
the replacement and added '///', which is wrong and does not
work for the 'health' api path
(influxdb returns 404 for 'https://ip:port///health')

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-03-15 15:28:11 +01:00
Dominik Csapak
c7777408ea metrics: influx: special case 'health' api path in _get_v2url
the forwards compatible api of 1.8 only contains this path
(not api/v2/health) and it is also contained in the v2 api

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-03-15 15:26:46 +01:00
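The two URL fixes above (bc33c73963 and c7777408ea) can be illustrated with a small Python sketch. The function name `v2_url` and its arguments are hypothetical; this is not the Perl _get_v2url, only one way to join an optional prefix without producing '///' and to keep 'health' outside api/v2/:

```python
def v2_url(base, api_prefix, path):
    """Join base URL, optional prefix and API path using single slashes only."""
    # A prefix of '/' (the default mentioned in the commit) collapses to nothing.
    prefix = (api_prefix or "").strip("/")
    # 'health' is served at the top level; other calls live under api/v2/.
    api_path = path if path == "health" else f"api/v2/{path}"
    segments = [s for s in (prefix, api_path) if s]
    return base.rstrip("/") + "/" + "/".join(segments)
```

For example, `v2_url("https://ip:8086", "/", "health")` yields a single slash before `health`, while a prefix of `influx` is placed in front of the API path.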
Thomas Lamprecht
23c9eaf63d metrics: influx: allow one to add an API URL-path prefix
I normally use a reverse proxy in front of my influxdb instances,
proxying all from the /influx/ path to the only locally listening
influxdb. So here I'd need to set "influx" as api-path-prefix.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-15 14:45:05 +01:00
Thomas Lamprecht
6e5405fb21 metrics: influx: do not error out when credentials could not be loaded
Not a hard error, some network box (proxy) down the line could add it
for us, or it could be just not required, so ...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-15 14:45:05 +01:00
Thomas Lamprecht
bb35a833d1 metrics: influx: send along auth token on connection test too
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-15 14:45:05 +01:00
Thomas Lamprecht
9f8d8f2b05 metrics: influx: include unrecognized protocol value in error
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-15 14:45:05 +01:00
Thomas Lamprecht
f8d1d5ad9a ext. metrics: fixup InfluxDB spelling in schema and code style
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-13 21:30:03 +01:00
Thomas Lamprecht
c2162150f1 metric status: fix misspelled method call
reported in:
https://forum.proxmox.com/threads/typo-in-influxdb-pm.85017/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-27 16:11:09 +01:00
Dominik Csapak
6b6eb15c7d status/influxdb: remove unnecessary comment
we already have that information in the reference docs, no need to
have it here as well

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-01-28 17:32:23 +01:00
Dominik Csapak
ccb614311d status/influxdb: implement influxdb 2.x http api
needs an organization/bucket (previously db) and an optional token
the http client does not fit exactly in the connect/send/disconnect
scheme, so it simply creates a request in 'connect',
does the actual http connection in 'send' and nothing in 'disconnect'

max-body-size is set to 25.000.000 bytes by default (the influxdb default)
and the timeout to 1 second (same as default graphite tcp timeout)

the token (if given) gets saved in /etc/pve/priv/metricserver/$ID.pw
it is optional, because the 1.8.x compatibility api does not need
authentication (in contrast to influxdb 2.x)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-01-28 17:32:23 +01:00
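A rough Python sketch of the connect/send split described in ccb614311d, assuming the InfluxDB 2.x write endpoint (/api/v2/write with org and bucket query parameters and an "Authorization: Token ..." header). The function names and the example host are hypothetical, and the plugin itself is Perl:

```python
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token=None, timeout=1):
    """'connect' step: only prepare the request, no network traffic yet."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket})
    req = urllib.request.Request(f"{base_url}/api/v2/write?{query}", method="POST")
    if token:
        # The token is optional, e.g. for the 1.8.x compatibility API.
        req.add_header("Authorization", f"Token {token}")
    return req, timeout


def send(req, timeout, payload):
    """'send' step: this is where the HTTP connection actually happens."""
    with urllib.request.urlopen(req, data=payload.encode(), timeout=timeout) as resp:
        return resp.status
```

Nothing is needed for a 'disconnect' step in this sketch, since the connection is closed when `send` returns.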
Dominik Csapak
27bc5e8e02 status/plugin: extend with add/update/delete hooks
like we do for the storage section configs

we will need this to store the token for influxdbs http api

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-01-28 17:32:23 +01:00
Dominik Csapak
fa97819773 status/plugin: extend send/_connect/_disconnect/test_connection
by providing the id or cfg to have better context in those methods
we will need that for influxdb http api

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-01-28 17:32:23 +01:00
Dominik Csapak
5a7252df23 status/plugin: do not test connection if disabled
so that if one disables the plugin (e.g. because it is offline),
it will work even when the server is not reachable

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-12-07 17:20:54 +01:00
Dominik Csapak
dadba141a8 api: metrics/server: test connection on add/update
just a basic check, but better than not checking at all

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-11-25 14:55:25 +01:00
Dominik Csapak
b22cbac9bf api: metrics/server: add minimum and maximum to port schema
we just added the api, so it would be good to only accept valid ports
(they were wrapped before)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-11-25 12:31:42 +01:00
Thomas Lamprecht
2c4bf90ff0 influxdb: avoid three line comment if one is enough
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-21 20:39:33 +01:00
Dominik Csapak
2f6cc103e0 Status/Plugin: add id to schema
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-11-20 14:00:16 +01:00
Dominik Csapak
fbe4599246 Status/Plugin: fix jsonschema for MTU
jsonschema wants 'minimum' not 'min'

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-11-20 14:00:16 +01:00
Thomas Lamprecht
acff3d6eec followup whitespace fix
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-29 09:12:22 +01:00
Fabian Grünbichler
0fc553eb1b status/metrics: make MTU configurable
since some users don't even have a full 1500 (and some systems might
have links with bigger MTU and not require as much fragmentation).

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-10-29 08:51:51 +01:00
Thomas Lamprecht
fab600b796 fix #2802: metric flush check should not care about current usage
We only need to check if the next data addition brings us over the
batch send size, not if we already have at least half of that data in
there, as otherwise we may again exceed the batch send size.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-06-17 10:16:27 +02:00
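The flush rule from fab600b796 can be sketched in Python as follows. The `Batch` class, its `max_size` argument and the `sent` list (standing in for the real network send) are assumptions for illustration, not the actual Perl code:

```python
class Batch:
    """Collect metric data and flush before the batch size would be exceeded."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = ""
        self.sent = []  # stands in for the real network send

    def flush(self):
        if self.buffer:
            self.sent.append(self.buffer)
            self.buffer = ""

    def add(self, data):
        # Only the size after the next addition matters,
        # not how full the buffer currently is.
        if len(self.buffer) + len(data) > self.max_size:
            self.flush()
        self.buffer += data
```

Note that a single addition larger than `max_size` is still buffered as-is in this sketch; splitting such data is out of scope here.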
Thomas Lamprecht
1d5c5ba19a acme: account: hide TOS checkbox during load and reuse field references
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-14 16:28:39 +02:00
Thomas Lamprecht
45dbb18177 ext. metric server: workaround stupid MTU problem..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-08 17:22:35 +02:00
Thomas Lamprecht
5c77a34f08 metric server: improve flush on big data updates
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-08 17:15:44 +02:00
Thomas Lamprecht
db2ce4886c status/graphite: fix memory leak, avoid cyclic closure reference
The data passed to this closure was never freed; depending on the
count of VM/CTs one could get >1 MB of RSS (!) memory leaked per
statd status cycle update run...

We could also use Scalar::Util's weaken to weaken a copy of this
variable, but as a simple undef works let's do that with a comment..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-19 09:11:47 +01:00
Thomas Lamprecht
1e4ae7d44c fixup: graphite: use correct variable in closure
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-19 09:10:54 +01:00
Thomas Lamprecht
87be2c19e3 ext. metric: move to a transaction model
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-18 19:04:29 +01:00
Thomas Lamprecht
f1f4bfefc7 move common metric server management part to own module
For now it only handles the plugin registration and the two recently
integrated helpers.
But, this is a preparation to move the external metrics server
update mechanic from a stateless always-newly-connect-send-disconnect
to a stateful transaction based mechanism; see later patches

keep the PVE::Status::Plugin use in pvestatd, as we read the cfs
hosted status.cfg there, and the parser is defined by the common
status plugin base module.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-16 16:19:42 +01:00
Thomas Lamprecht
e051836377 status: cleanup config parser registration
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-15 10:49:05 +01:00
Thomas Lamprecht
68f58b5d59 status plugins: add _connect to plugin method interface
in preparation of doing real transactions, with one batch connect +
send + disconnect, and not hundreds of those per update cycle..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-14 19:27:28 +01:00
Thomas Lamprecht
5e82aaac89 status plugins: add update_all and foreach_plug helper
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-14 19:24:24 +01:00
Thomas Lamprecht
108e0c8b9f status/graphite: refactor write_graphite to send all at once
Instead of doing multiple sends, one for each status metric line,
assemble it all in a string and send it out in a single go.
Per VM/CT/Node we had >10 lines to send, so this is quite the
reduction. But, also note that thanks to Nagle's algorithm
this may not have had a big effect for TCP, as it buffered those small
writes anyhow.
For UDP it can reduce the packet count on the line dramatically,
though.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-07 18:59:06 +01:00
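A small Python sketch of the "assemble everything, send once" approach from 108e0c8b9f, assuming Graphite's plaintext format of one "path value timestamp" line per metric. The function name and the metric path layout are hypothetical:

```python
def graphite_payload(prefix, metrics, timestamp):
    """Build one string holding all metric lines, to be sent in a single write."""
    lines = [
        f"{prefix}.{name} {value} {timestamp}\n"
        for name, value in metrics.items()
    ]
    return "".join(lines)


# Hypothetical path and values; a single sendall()/send() would then carry both lines.
payload = graphite_payload("proxmox.qemu.100", {"cpu": 0.5, "mem": 1024}, 1573149546)
```

For UDP this means one datagram instead of one per line, as long as the payload fits.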
Thomas Lamprecht
8dcf2cac46 status/graphite: just use setsockopt to set timeouts
after rethinking this it felt weird, sockets can already do this
themselves, so I checked out the IO::Socket::Timeout module, and yeah,
it's just an OOP wrapper for this, hiding the "scary" struct pack.

So instead of adding that as a dependency let's do it ourselves.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-07 16:27:50 +01:00
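For readers unfamiliar with the "struct pack" mentioned in 8dcf2cac46, here is a Python analogue of setting socket timeouts directly via setsockopt. It assumes a Linux-like platform where struct timeval is two native longs (seconds, microseconds); the helper names are hypothetical and this is not the Perl code from the commit:

```python
import socket
import struct


def pack_timeval(seconds):
    """Pack a timeout as a native struct timeval (assumed: two C longs)."""
    sec = int(seconds)
    usec = int(round((seconds - sec) * 1_000_000))
    return struct.pack("ll", sec, usec)


def set_timeouts(sock, seconds):
    """Apply the same receive and send timeout to an existing socket."""
    tv = pack_timeval(seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, tv)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, tv)
```

Usage would be along the lines of `set_timeouts(sock, 1)` right after creating the socket, before the first write.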
Thomas Lamprecht
228f017ee4 status/graphite: record missing module-use
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-07 16:26:54 +01:00
Thomas Lamprecht
c44a0c01c8 status/graphite: reduce default timeout to 1 second
This is for TCP only, and TCP needs roughly 1.5 times the Round
Trip Time for connection setup. So, with a 1 second timeout we're still
good for connections with 660 ms latency in-between.

The assumption is that most of the time the status server is
relatively near (same datacenter, or region), and connections to it
are datacenter grade, and not like a spotty GPRS modem.
So, reduce this timeout to ensure that we do not block too long.

If anybody needs higher timeouts they can just change the default
anyway.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-06 21:06:00 +01:00
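The arithmetic behind the 660 ms figure in c44a0c01c8 can be checked with a few lines of Python. The factor of 1.5 round trips is taken from the commit message as-is; the function name is hypothetical:

```python
HANDSHAKE_RTT_FACTOR = 1.5  # rough figure quoted in the commit message


def max_rtt_ms(timeout_s, factor=HANDSHAKE_RTT_FACTOR):
    """Largest round trip time (ms) for which connection setup fits in the timeout."""
    return timeout_s * 1000 / factor


# 1.5 * 660 ms = 990 ms, which still fits into a 1 second timeout.
setup_time_ms = HANDSHAKE_RTT_FACTOR * 660
```

With a 1 second timeout the upper bound is roughly 667 ms, so 660 ms is a slightly conservative rounding.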
Thomas Lamprecht
dd4268e50e status/graphite: refactor default assignments, no ternary
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-06 21:02:59 +01:00
Martin Verges
2c927f113b add graphite tcp support
This change allows sending statistics to graphite over TCP.

So far only UDP is possible, which is not available in some environments, like behind a loadbalancer.

Configuration example:
~ $ cat /etc/pve/status.cfg

graphite:
    server 10.20.30.40
    port 2003
    path proxmox
    proto tcp
    timeout 3

Signed-off-by: Martin Verges <martin.verges@croit.io>
2019-11-06 20:48:35 +01:00
Thomas Lamprecht
fa6f371649 cleanup Status plugins use statements
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-04-11 09:22:04 +02:00
Thomas Lamprecht
0bd2dc09e4 followup: code cleanup, remove unnecessary type check
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-04-11 09:11:07 +02:00
Dominik Csapak
035452b97f fix #1326: allow multiple status server definitions per type
we allow an id like storage.cfg but leave it optional (so we do not
break existing configs):

 influxdb: name

so that one can export the data to multiple servers of the same type

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-04-11 08:11:43 +02:00
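A hedged Python sketch of parsing a section header with an optional id, as described in 035452b97f. The regular expression and the fallback of using the type name as id are assumptions for illustration; the actual parsing is done by the Perl section config code:

```python
import re

# "<type>:" optionally followed by an id, e.g. "influxdb: name" or "graphite:"
HEADER_RE = re.compile(r"^(?P<type>[a-z][a-z0-9_-]*):\s*(?P<id>\S+)?\s*$")


def parse_header(line):
    """Return (type, id) for a section header line, or None for other lines."""
    m = HEADER_RE.match(line)
    if not m:
        return None
    # Assumption in this sketch: without an explicit id, reuse the type name.
    return m.group("type"), m.group("id") or m.group("type")
```

Both `influxdb: name` and a bare `graphite:` are accepted, so existing configs without an id keep working in this sketch.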
Dominik Csapak
8077d94a02 fix #2030: use looks_like_number for number check
since numbers can also be in '1.e-10' format, we have to change
how we check for a number

Scalar::Util is already core and we use it in PVE::Tools, so
no new dependency.

in case of "NaN" or "Infinity" we omit the key/value pair

else we quote like before

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-01-07 15:01:18 +01:00
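The rule from 8077d94a02 (numbers pass through, "NaN"/"Infinity" are dropped, everything else is quoted) can be sketched in Python. This uses float() as a rough analogue of Perl's looks_like_number, which is not identical in every edge case; the function name is hypothetical:

```python
import math


def format_field(key, value):
    """Return 'key=value' for finite numbers, a quoted string otherwise, or None."""
    try:
        number = float(value)  # also accepts forms such as '1.e-10'
    except (TypeError, ValueError):
        # Not numeric: quote it as a string field.
        escaped = str(value).replace('"', '\\"')
        return f'{key}="{escaped}"'
    if not math.isfinite(number):
        return None  # omit NaN and Infinity entirely
    return f"{key}={value}"
```

A caller would simply skip any key for which `format_field` returns `None`.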
Dominik Csapak
5c90e08ab0 Graphite.pm: fix whitespace
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-03-05 14:21:50 +01:00
Dominik Csapak
5a5aed73e2 fix #1683: do not send non-numeric values to graphite
the graphite daemon which accepts the data (carbon) only
accepts numeric values, and logs all invalid lines

since that was about 5 values per vm/ct this generated a lot of noise
in the carbon log

so we check with a regex if a value is numeric, and
additionally we have a blacklist of keys which seem to be numeric but
are either boolean (e.g. template) or a state (e.g. pid)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-03-05 14:21:50 +01:00
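A minimal Python sketch of the filter described in 5a5aed73e2: a regex check for numeric values plus a list of keys to skip. The exact regex and the full key list of the original patch are not shown in this log, so both are assumptions; only `template` and `pid` are named in the message:

```python
import re

NUMERIC_RE = re.compile(r"^-?\d+(\.\d+)?$")  # assumed pattern, not the original one
SKIP_KEYS = {"template", "pid"}  # the two examples named in the commit message


def graphite_values(data):
    """Keep only entries that look numeric and are not on the skip list."""
    return {
        key: value
        for key, value in data.items()
        if key not in SKIP_KEYS and NUMERIC_RE.match(str(value))
    }
```

Everything that is filtered out here would otherwise show up as an invalid line in the carbon log.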
Thomas Lamprecht
09f19204be InfluxDB plugins: send nodename when updating CT/VM status
This allows filtering by node in InfluxDB queries, so the statistics
of all virtual guests on a specific node can be queried.

While for InfluxDB this is only a tag which does changes where the
data is stored, Graphite - our other status plugin - has no such
mechanics available. If we would add it to the object hierarchy,
e.g.: "qemu.$vmid.$nodename" a migration of a VM would result in two
different datasets.
So avoid breaking setups and omit it for Graphite for now.

Suggested-by: Daniel1108 <danielgallegosanchez@gmail.com>
CC: Daniel1108 <danielgallegosanchez@gmail.com>

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-02-28 11:28:10 +01:00
Thomas Lamprecht
8f46103543 Makefile: fix distclean target
As some Makefiles in sub directories do not implement the distclean
target, namely:
PVE/Service/Makefile
PVE/CLI/Makefile

This target is broken.

As all other implementations just redirect to the 'clean' target I
do not implement the missing ones but rather remove all such
targets. Keep it just in the top level directory, for consistency's
sake with other pve repos, and redirect it there directly to the
clean target.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-09-28 08:21:48 +02:00
Dominik Csapak
7d7f77ba06 fix influxdb field assignment and allow non integer field
this patch fixes an issue where we assemble the influxdb
key value pairs to the wrong measurement

and also we did only allow integer fields,
excluding all cpu,load and wait measurements

this patch fixes both issues with a rewrite of the
recursive build_influxdb_payload sub

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-08-02 13:15:47 +02:00
Thomas Lamprecht
f014da61c5 Status: report errors on socket creation problems
If the socket couldn't be created (e.g. FQDN not resolvable) we
continued without any hint; when actually writing the data we then
died. The user then does not really know why, so report errors
if the socket creation failed.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-05-23 09:17:42 +02:00
Thomas Lamprecht
4caf47e9e2 Status: allow IPs and move properties to base class
We only allowed servers with the dns-name format. As such status
servers may often be in internal networks with no hostname
(testing, small network so no dns, ...), do not limit the
configuration possibilities for no reason.

Also move the base property part to the base Status class, all
current plugins use server and port so no need for double
declaration of format/descriptions.

If a future plugin doesn't need them it can omit them by not
returning the respective properties in the options method
inherited by SectionConfig.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-05-23 09:16:33 +02:00