The API server proxies HTTP requests in two cases:
- between cluster nodes (pveproxy->pveproxy)
- between daemons on one node for protected API endpoints
(pveproxy->pvedaemon)
The API server uses AnyEvent::HTTP for proxying, with unfortunate
settings for connection reuse (details below). With these settings,
long-running synchronous API requests on the proxy destination's side
can cause unrelated proxied requests to fail with a misleading HTTP
599 "Too many redirections" error response. In order to avoid these
errors, improve the connection reuse settings.
In more detail:
By default, AnyEvent::HTTP reuses previously-opened connections for
requests with idempotent HTTP verbs, e.g. GET/PUT/DELETE [1]. However,
when trying to reuse a previously-opened connection, it can happen
that the destination unexpectedly closes the connection. In case of
idempotent requests, AnyEvent::HTTP's http_request will retry by
recursively calling itself. Since the API server disallows recursion
by passing `recurse => 0` to http_request initially, the recursive
call fails with "HTTP 599 Too many redirections".
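
The failure mode can be modeled with a toy function. This is a Python
sketch, not AnyEvent::HTTP's actual code: the name `send` and its
parameters are made up for illustration, and it assumes that every retry
uses up one level of the `recurse` budget.

```python
def send(idempotent, cached_conn_alive, recurse):
    """Toy model of one proxied request (hypothetical helper).

    idempotent:        request may be sent over a reused connection
    cached_conn_alive: the previously-opened connection is still open
    recurse:           remaining recursion budget
    """
    if recurse < 0:
        # Budget exhausted: reported with the misleading redirect error.
        return 599, "Too many redirections"
    if idempotent and not cached_conn_alive:
        # The reused connection was closed by the destination, so retry
        # on a fresh connection, spending one level of the budget.
        return send(idempotent, True, recurse - 1)
    return 200, "OK"


print(send(True, False, recurse=0))  # retry not allowed
print(send(True, False, recurse=1))  # one retry allowed
```

In this model, `recurse=0` turns a closed reused connection into a 599
response, while `recurse=1` lets the single retry go through.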
This can happen both for pveproxy->pveproxy and pveproxy->pvedaemon,
as connection reuse is enabled in both cases. Connection reuse being
enabled in the pveproxy->pvedaemon case was likely not intended: A
comment mentions that "keep alive for localhost is not worth it", but
only sets `keepalive => 0` and not `persistent => 0`. This setting
switches from HTTP/1.1 persistent connections to HTTP/1.0-style
keep-alive connections, but still allows connection reuse.
The destination unexpectedly closing the connection can be due to
unfortunate timing, but it becomes much more likely in case of
long-running synchronous requests. An example sequence:
1) A pveproxy worker P1 handles a protected request R1 and proxies it
to a pvedaemon worker D1, opening a pveproxy worker->pvedaemon
worker connection C1. The pvedaemon worker D1 is relatively fast
(<1s) in handling R1. P1 saves connection C1 for later reuse.
2) A different pveproxy worker P2 handles a protected request R2 and
proxies it to the same pvedaemon worker D1, opening a new pveproxy
worker->pvedaemon connection C2. Handling this request takes a long
time (>5s), for example because it queries a slow storage. While
the request is being handled, the pvedaemon worker D1 cannot do
anything else.
3) Since pvedaemon worker D1 sets a timeout of 5s when accepting
connections and it did not see anything on connection C1 for >5s
(because it was busy handling R2), it closes the connection C1.
4) pveproxy worker P1 handles a protected idempotent request R3. Since
the request is idempotent, it tries to reuse connection C1. But C1
was just closed by D1, so P1 fails request R3 with HTTP 599 as
described above.
In addition, AnyEvent::HTTP's default of reusing connections for all
idempotent HTTP verbs is problematic in our case, as not all PUT
requests of the PVE API are actually idempotent, e.g. /sendkey [2].
To fix the issues above, improve the connection reuse settings:
a) Actually disable connection reuse for pveproxy->pvedaemon requests,
by passing `persistent => 0`.
b) For pveproxy->pveproxy requests, enable connection reuse for GET
requests only, as these should be actually idempotent.
c) If connection reuse is enabled, allow one retry by passing `recurse
=> 1`, to avoid the HTTP 599 errors.
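
As a rough illustration, the three rules can be written as one decision
function. This is a Python sketch only: the function name and the
destination labels are hypothetical, and it assumes `recurse` stays at 0
whenever connection reuse is disabled.

```python
def proxy_options(method, destination):
    """Pick connection-reuse options for one proxied request.

    destination is either "pvedaemon" (same node, protected endpoint)
    or "pveproxy" (another cluster node); both labels are illustrative.
    """
    if destination == "pvedaemon":
        persistent = False            # a) never reuse for localhost
    else:
        persistent = method == "GET"  # b) reuse only for GET
    recurse = 1 if persistent else 0  # c) one retry if reuse is enabled
    return {"persistent": persistent, "recurse": recurse}


print(proxy_options("GET", "pveproxy"))
print(proxy_options("PUT", "pveproxy"))
print(proxy_options("GET", "pvedaemon"))
```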
With a) and b), the API server will reuse connections less often,
which can theoretically result in a performance drop. To gain
confidence that the performance impact is tolerable, here are the
results of a simple benchmark.
The benchmark runs hey [3] against a virtual 3-node PVE cluster, with
or without the patch applied. It performs 10000 requests in 2 worker
threads to `PUT $HTTP_NODE:8006/api2/json/nodes/$PROXY_NODE/config`
with a JSON payload that sets a 32KiB ASCII `description`. The
shortened hey invocation:
hey -H "$TOKEN" -m PUT -T application/json -D payload.json \
--disable-keepalive -n 10000 -c 2 "$URL"
The endpoint was chosen because it performs little work (locks and
writes a config file), it is protected (to test behavior change a)),
and it is a PUT endpoint (to test behavior change b)).
The command is run twice:
- With $HTTP_NODE == $PROXY_NODE for pveproxy->pvedaemon proxying
- With $HTTP_NODE != $PROXY_NODE for pveproxy->pveproxy->pvedaemon
proxying
For each invocation, we record the response times.
Without this patch:
$HTTP_NODE == $PROXY_NODE
Slowest: 0.0215 secs
Fastest: 0.0061 secs
Average: 0.0090 secs
0.006 [1] |
0.008 [2409] |■■■■■■■■■■■■■■■■■■■■■■■■
0.009 [4065] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.011 [1781] |■■■■■■■■■■■■■■■■■■
0.012 [1024] |■■■■■■■■■■
0.014 [414] |■■■■
0.015 [196] |■■
0.017 [85] |■
0.018 [21] |
0.020 [2] |
0.022 [2] |
$HTTP_NODE != $PROXY_NODE
Slowest: 0.0584 secs
Fastest: 0.0075 secs
Average: 0.0105 secs
0.007 [1] |
0.013 [8445] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.018 [1482] |■■■■■■■
0.023 [56] |
0.028 [5] |
0.033 [1] |
0.038 [0] |
0.043 [0] |
0.048 [0] |
0.053 [5] |
0.058 [5] |
With this patch:
$HTTP_NODE == $PROXY_NODE
Slowest: 0.0194 secs
Fastest: 0.0062 secs
Average: 0.0088 secs
0.006 [1] |
0.007 [1980] |■■■■■■■■■■■■■■■■■■■
0.009 [4134] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.010 [1874] |■■■■■■■■■■■■■■■■■■
0.011 [1406] |■■■■■■■■■■■■■■
0.013 [482] |■■■■■
0.014 [93] |■
0.015 [16] |
0.017 [5] |
0.018 [4] |
0.019 [5] |
$HTTP_NODE != $PROXY_NODE
Slowest: 0.0369 secs
Fastest: 0.0091 secs
Average: 0.0121 secs
0.009 [1] |
0.012 [5711] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.015 [3392] |■■■■■■■■■■■■■■■■■■■■■■■■
0.017 [794] |■■■■■■
0.020 [79] |■
0.023 [16] |
0.026 [3] |
0.029 [2] |
0.031 [0] |
0.034 [1] |
0.037 [1] |
Comparing the averages, there is
- little difference when $HTTP_NODE == $PROXY_NODE (0.009s vs
0.0088s). So for pveproxy->pvedaemon proxying, the effect of
disabling connection reuse seems negligible.
- ~15% overhead when $HTTP_NODE != $PROXY_NODE (0.0105s vs 0.0121s).
Such an increase for pveproxy->pveproxy->pvedaemon proxying is not
nothing, but in real-world workloads I'd expect the response time
for non-idempotent requests to be dominated by other factors.
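
The percentages follow directly from the averages reported by hey. A
small Python check (the helper name is arbitrary):

```python
def relative_change(before, after):
    """Relative change of `after` compared to `before`."""
    return (after - before) / before


# Average response times in seconds, without vs. with the patch.
same_node = relative_change(0.0090, 0.0088)   # pveproxy->pvedaemon
cross_node = relative_change(0.0105, 0.0121)  # pveproxy->pveproxy->pvedaemon

print(f"same node:  {same_node:+.1%}")   # -2.2%
print(f"cross node: {cross_node:+.1%}")  # +15.2%
```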
[1] https://metacpan.org/pod/AnyEvent::HTTP#persistent-=%3E-$boolean
[2] https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}/sendkey
[3] https://github.com/rakyll/hey
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
There were some changes w.r.t. allowing downloads in responses, making
that a bit stricter; the package versions before the break are not
compatible with that stricter behavior.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this was only used by PMG's HttpServer and for non-API file responses. all of
those got dropped there in favour of always returning an object like
{
data => {
download => {
[download info here]
},
[..],
},
[..],
}
in case of PMG, or passing in a download hash in case of APIServer internal
calls.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
only a few API endpoints should allow downloads, mark them explicitly and
forbid downloading for the rest.
Fixes: 6d832db ("allow 'download' to be passed from API handler")
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Add support for compressing the body of responses with
`Content-Encoding: deflate` following [RFC9110]. Note that in this
context `deflate` is actually a "zlib" data format as defined in
[RFC1950].
To preserve the current behavior we prefer `Content-Encoding: gzip`
whenever `gzip` is listed as one of the encodings in the
`Accept-Encoding` header and the data should be compressed.
[RFC9110] https://www.rfc-editor.org/rfc/rfc9110#name-deflate-coding
[RFC1950] https://www.rfc-editor.org/rfc/rfc1950
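
The selection rule and the "deflate means zlib" detail can be sketched
in Python. This is not the server's Perl code: the function names are
made up, and the `Accept-Encoding` parsing is deliberately simplified
(q-values are ignored, which a real implementation should not do).

```python
import gzip
import zlib


def choose_encoding(accept_encoding):
    """Prefer gzip when offered, otherwise fall back to deflate."""
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        return "gzip"
    if "deflate" in offered:
        return "deflate"
    return None


def compress_body(body, encoding):
    """Compress a response body for the chosen Content-Encoding."""
    if encoding == "gzip":
        return gzip.compress(body)
    if encoding == "deflate":
        # zlib.compress() emits the zlib container (RFC 1950), which is
        # what `Content-Encoding: deflate` refers to, not a raw stream.
        return zlib.compress(body)
    return body


body = b"example payload " * 64
encoding = choose_encoding("deflate, gzip")
print(encoding, len(compress_body(body, encoding)))
```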
Suggested-by: Lukas Wagner <l.wagner@proxmox.com>
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Tested-by: Folke Gleumes <f.gleumes@proxmox.com>
ALLOW_FROM/DENY_FROM accept any syntax understood by Net::IP. However,
if an IP range like "10.1.1.1-10.1.1.3" is configured, a confusing
Perl warning is printed to the syslog on a match:
Use of uninitialized value in concatenation (.) or string at [...]
The reason is that we use Net::IP::prefix to prepare a debug message,
but this returns undef if a range was specified. To avoid the warning,
use Net::IP::print to obtain a string representation instead.
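
Why a range has no single prefix can be shown with Python's `ipaddress`
module. This only illustrates the concept; it says nothing about how
Net::IP represents ranges internally, and `describe_range` is a
hypothetical helper.

```python
import ipaddress

first = ipaddress.ip_address("10.1.1.1")
last = ipaddress.ip_address("10.1.1.3")

# The range needs more than one CIDR block to be covered exactly, so
# there is no single "prefix" that could be printed for it.
networks = list(ipaddress.summarize_address_range(first, last))
print([str(net) for net in networks])


def describe_range(first, last):
    """Build a debug string that works for ranges and single prefixes."""
    return f"{first}-{last}"


print(describe_range(first, last))
```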
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
set_min/max_proto_version is recommended upstream nowadays, and it seems to be
required for some reason if *only* TLS v1.3 is supposed to be enabled.
querying via get_options gives us the union of
- system-wide openssl defaults
- our internal SSL defaults
- flags configured by the user via /etc/default/pveproxy
note that by default only 1.2 and 1.3 are enabled in the first place, so
disabling either leaves a single version being set as min and max.
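
The derivation of the bounds can be sketched with Python's `ssl` module.
This is an analog of the idea, not the Perl implementation: the helper
names are hypothetical, and the set of enabled versions is hard-coded
here instead of being queried from the library.

```python
import ssl

# Assumption for this sketch: only TLS 1.2 and 1.3 are enabled by default.
DEFAULT_ENABLED = {ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3}


def version_bounds(disabled):
    """Return (min, max) of the versions left after applying `disabled`."""
    remaining = sorted(DEFAULT_ENABLED - set(disabled))
    if not remaining:
        raise ValueError("no TLS version left enabled")
    return remaining[0], remaining[-1]


def apply_bounds(ctx, disabled):
    """Set explicit protocol bounds on an SSLContext."""
    ctx.minimum_version, ctx.maximum_version = version_bounds(disabled)


low, high = version_bounds({ssl.TLSVersion.TLSv1_2})
print(low.name, high.name)
```

Disabling one of the two versions leaves the other as both minimum and
maximum, which is the "only TLS v1.3" case mentioned above.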
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
OpenSSL as packaged in Debian bookworm now ships a compat symlink for
the "combined" CA certificates file (CAfile) as managed by
update-ca-certificates. This symlink is in addition to the CApath
one that has been around for a while. The new symlink in turn gets
picked up by openssl-using code that uses the default values for the
trust store.
Every TLS context initialization now reads the full combined file,
even if no TLS is actually employed on a connection. We do such an
initialization for every proxied connection (where our HTTP server is
the client).
By specifying an explicit CA path (that is identical to the default
one), the old behaviour of looking up each CA certificate
individually iff needed is enabled again.
For an API endpoint where HTTP request handling is the bottleneck
(as opposed to the actual API handler), this improves performance of
proxied requests to be back in line with unproxied ones handled
directly by the unprivileged daemon. For all proxied requests, CPU
usage is decreased as well.
The default CAfile and CApath contain the same certificates, so there
should be no change in trusted certificates. Additionally,
certificate fingerprints are pinned in this context and verified
against the cache of pinned fingerprints.
Reported-by: Roland Kletzing <roland.kletzing@cybercon.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
when installing AnyEvent::AIO (by the package libanyevent-aio-perl),
the worker forks of our daemons using AnyEvent would consume 100% cpu
cycles while trying to do an epoll_wait on an fd which no one read from. It
was not really clear which part of the code set that fd up.
Reading the documentation of the related perl modules, it became
clear that the issue was with AnyEvent::IO. By default this uses
AnyEvent::AIO (if installed) which in turn uses IO::AIO which
explicitly says it uses pthreads and is not really fork compatible
(which we rely heavily upon).
It seems that IO::AIO sets up some fds with epoll in the END handler
of its library (or earlier, but sends data to it in the END
handler), so that using 'exit' instead of 'POSIX::_exit' (which
we do in PVE::Daemon) creates the observed behavior.
Interestingly we did not use any of AnyEvent::IO's functionality, so
we can safely remove it. Even if we would have used it in the past,
without AnyEvent::AIO the IO would not have been async anyway (the
pure perl impl doesn't do async IO). My best guess is that we wanted
to use it, but noticed that we can't, and forgot to remove the use
statement. (This is indicated by a comment that says aio_load is not
async unless IO::AIO is used)
This only occurs now, since bookworm is the first Debian release to
package the library.
if we ever wanted to use AnyEvent::AIO, there are probably two other
ways that could fix it:
* replace our 'exit()' calls with 'POSIX::_exit()', which seems to
fix it, but other side effects are currently unknown
* use 'IO::AIO::reinit()' after forking, which also seems to fix it,
but perldoc says it 'is not an operation supported by any
standards, but happens to work on GNU/LINUX and some newer BSD
systems'
With this fix, one can safely install 'libanyevent-aio-perl' and
'libperl-languageserver-perl' (the only user of it AFAICS) on a
Proxmox VE or Proxmox Mail Gateway system.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In case the actual request body is empty, it seems no Content-Type
header is set by browsers.
Tested on a vm with stopping and starting a container via GUI
(/api2/extjs/nodes/<nodename>/lxc/<vmid>/status/stop)
fixes f398a3d94b
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
since there is no other way to get an array parameter when using
x-www-form-urlencoded content type
the previous format with \0 separated strings (known as '-alist' format)
should not be used anymore (in favor of the now supported arrays)
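
For comparison, the two ways of carrying a list can be sketched in
Python. This assumes that an array is sent by repeating the same key in
the form body, which this message does not spell out; the parameter
names and helper functions are purely illustrative.

```python
from urllib.parse import parse_qs


def parse_form(body):
    """Decode an x-www-form-urlencoded body; repeated keys become lists."""
    return parse_qs(body, keep_blank_values=True)


def parse_alist(value):
    """Decode the older '-alist' style: one string, items separated by \\0."""
    return value.split("\0")


print(parse_form("vmid=100&vmid=101&node=pve"))
print(parse_alist("100\x00101"))
```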
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
instead of always trying to encode them as x-www-form-urlencoded
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
This prohibits the cookie from being sent along in cross-site
sub-requests or when the user navigates to a different site.
Signed-off-by: Max Carrara <m.carrara@proxmox.com>
Since v5.13, URI::Escape handles the 'unsafe characters' parameter
differently than before, i.e., enforcing what is documented [0]:
The set is specified as a string that can be used in a regular
expression character class (between [ ]).
So, the leading/trailing [] were never supposed to be there.
Note that since v5.15 we could also pass a qr// regex object.
[0]: 1a4ed66802
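
The documented semantics can be mimicked in Python to show why the
brackets do not belong in the argument. This is a sketch of the idea,
not URI::Escape's implementation; `escape` is a hypothetical helper and
only handles ASCII input.

```python
import re


def escape(text, unsafe_class):
    """Percent-encode every character matching the given character class.

    `unsafe_class` is the *inside* of a character class; the surrounding
    [ ] are added here, so the caller must not pass them.
    """
    pattern = re.compile("[" + unsafe_class + "]")
    return pattern.sub(lambda m: "%%%02X" % ord(m.group(0)), text)


print(escape("a b/c", "^A-Za-z0-9"))  # a%20b%2Fc
```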
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ T: Add details and mention regex objects ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
commas can be used in two ways, quoting Perl Best Practices (PBP):
> The comma actually has two distinct roles in Perl. In a scalar
> context, it is (as those former C programmers expect) a sequencing
> operator: “do this, then do that”. But in a list context, such as
> the argument list of a print, the comma is a list separator, not
> technically an operator at all.
-- PBP, page 69
And the separating variant is called a "junior semicolon" by PBP.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This is an internal parameter and we pass the actual internal one
around via the $reqstate variable, so avoid confusion and return a
clear error if a POST request sets this query parameter.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Until now, we calculated the MD5 hash of any uploaded file during the
upload, regardless of whether the user chose to provide a hash sum
and algorithm. The hash was only logged in the syslog.
As the user can provide a hash algorithm and a checksum when
uploading a file, which gets automatically checked (after the
upload), this is not needed anymore. Instead, the file name is
logged.
Depending on the speed of the network and the cpu, upload speed or
CPU usage might improve: All tests were made by uploading a 3.6GB iso
from the PVE host to a local VM. First line is with md5, second
without.
no networklimit
multipart upload complete (size: 3826831360B time: 20.310s rate: 179.69MiB/s md5sum: 8c651682056205967d530697c98d98c3)
multipart upload complete (size: 3826831360B time: 16.169s rate: 225.72MiB/s filename: ubuntu-22.04.1-desktop-amd64.iso)
125MB/s network
In this test, pveproxy worker used x % CPU during the upload. As you can see, the reduced CPU usage is noticeable in slower networks.
~75% CPU: multipart upload complete (size: 3826831360B time: 30.764s rate: 118.63MiB/s md5sum: 8c651682056205967d530697c98d98c3)
~60% CPU: multipart upload complete (size: 3826831360B time: 30.763s rate: 118.64MiB/s filename: ubuntu-22.04.1-desktop-amd64.iso)
qemu64 cpu, no network limit
multipart upload complete (size: 3826831360B time: 46.113s rate: 79.14MiB/s md5sum: 8c651682056205967d530697c98d98c3)
multipart upload complete (size: 3826831360B time: 41.492s rate: 87.96MiB/s filename: ubuntu-22.04.1-desktop-amd64.iso)
qemu64, -aes, 1 core, 0.7 cpu
multipart upload complete (size: 3826831360B time: 79.875s rate: 45.69MiB/s md5sum: 8c651682056205967d530697c98d98c3)
multipart upload complete (size: 3826831360B time: 66.364s rate: 54.99MiB/s filename: ubuntu-22.04.1-desktop-amd64.iso)
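
The logged rates can be reproduced from size and time, which also gives
the relative gain for the first test. A short Python check (helper name
is arbitrary; small deviations come from the rounded times in the log):

```python
def rate_mib_per_s(size_bytes, seconds):
    """Throughput in MiB/s, as printed in the upload log lines."""
    return size_bytes / seconds / (1024 * 1024)


# "no network limit" test: with md5 vs. without md5.
with_md5 = rate_mib_per_s(3826831360, 20.310)
without_md5 = rate_mib_per_s(3826831360, 16.169)
speedup = without_md5 / with_md5 - 1

print(f"{with_md5:.2f} MiB/s -> {without_md5:.2f} MiB/s ({speedup:+.0%})")
```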
Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>
[ T: reflow text-width and slightly add to subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
As reported in the forum, multipart requests are parsed incorrectly if
the file part header contains *only* Content-Disposition, but no other
fields (in particular, no Content-Type). As a result, uploaded files
are mangled: In most cases, an additional carriage return and line
feed (\r\n) is prepended to the file contents.
As an example, consider the following file part (with explicit \r\n
for clarity):
Content-Disposition: form-data; name=...; filename=...\r\n
Content-Type: application/x-iso9660-image\r\n
\r\n
file contents...
The current parsing code for file parts roughly works as follows:
1) Consume the Content-Disposition field including the trailing \r\n
2) Consume and ignore everything up to and including the next \r\n\r\n
3) Read the file contents
This works fine in the example above. However, it has a bug in case
Content-Disposition is the *only* header field:
Content-Disposition: form-data; name=...; filename=...\r\n
\r\n
file contents...
Now, step 1 already consumes the first half of the \r\n\r\n sequence
that marks the end of the part headers. As a result, step 3 starts
reading the file at a wrong offset:
- If the remaining contents of the read buffer (currently sized 16KiB)
contain \r\n\r\n, step 2 consumes everything up to and including
this marker and step 3 starts reading file contents there. As a
result, the uploaded file is truncated at its beginning.
- Otherwise, step 2 is a noop and step 3 considers the remaining
second half of the \r\n\r\n marker to be part of the file contents.
As a result, the uploaded file is prepended with an extra \r\n.
To fix this, modify step 1 to *not* consume the trailing \r\n. This
keeps the \r\n\r\n marker intact, no matter whether additional header
fields are present or not.
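
The intended behavior can be sketched in Python: treat the first
\r\n\r\n as the only header/body separator, regardless of how many
header fields precede it. This is not the server's Perl parsing code;
`split_part` and the example field values are hypothetical.

```python
def split_part(part):
    """Split one multipart part into (headers, body).

    The \\r\\n that ends the last header line is left in place, so the
    \\r\\n\\r\\n marker is found even if Content-Disposition is the only
    header field.
    """
    header_blob, separator, body = part.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("end of part headers not found")
    headers = {}
    for line in header_blob.split(b"\r\n"):
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode()] = value.strip().decode()
    return headers, body


disposition = b'Content-Disposition: form-data; name="f"; filename="a.iso"\r\n'
with_type = disposition + b"Content-Type: application/x-iso9660-image\r\n\r\ndata"
only_disposition = disposition + b"\r\ndata"

print(split_part(with_type)[1])         # b'data'
print(split_part(only_disposition)[1])  # b'data'
```

Both variants yield the same file contents, without a leading \r\n and
without truncation.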
Fixes: 3e3faddb4a
Link: https://forum.proxmox.com/threads/125411/
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Allow HTTP connections up until the request's header has been
parsed and processed. If no TLS handshake has been completed
beforehand, the server now responds with either a
'301 Moved Permanently' or a '308 Permanent Redirect' as noted in the
MDN web docs[1].
This is done after the header was parsed; for the redirect to work,
the `Host` header field of the request is used to create the
`Location` field of the response. This makes redirection possible
independent of how the server is accessed (e.g. via IP, localhost,
FQDN, ...).
Upon redirection the client is immediately disconnected; otherwise,
they would have to wait for the connection to time out until
they may reconnect via TLS again.
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/301
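
A minimal Python sketch of building such a redirect from the `Host`
header. The function name is made up, and the rule for choosing between
301 and 308 (301 for GET/HEAD, 308 otherwise) is an assumption based on
the usual recommendation, not something stated in this message.

```python
def redirect_response(method, host, path):
    """Build status line and headers for an HTTP -> HTTPS redirect."""
    if method in ("GET", "HEAD"):
        status = "301 Moved Permanently"
    else:
        # 308 asks the client to keep method and body when retrying.
        status = "308 Permanent Redirect"
    headers = {
        # Reuse whatever name or address the client used to reach us.
        "Location": f"https://{host}{path}",
        # The client is disconnected right after the redirect.
        "Connection": "close",
    }
    return status, headers


print(redirect_response("GET", "192.0.2.10:8006", "/"))
print(redirect_response("POST", "pve.example.com:8006", "/api2/json/access/ticket"))
```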
Signed-off-by: Max Carrara <m.carrara@proxmox.com>
The part responsible for authentication and subsequent request
handling is moved into the new `authenticate_and_handle_request`
subroutine.
If `authenticate_and_handle_request` doesn't return early, it returns
`1` for further control flow purposes.
Some minor things are formatted or renamed for readability's sake.
Signed-off-by: Max Carrara <m.carrara@proxmox.com>