else it's not actually possible to define ACLs on them, which means they are
effectively root only instead of allowing their intended permission scheme.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Before 0f3d14d6 ("auth: tfa: read tfa.cfg also if the user.cfg entry
has no "x" marker"), `user_get_tfa` failed if the realm required TFA,
but the user's `keys` attribute was empty. Since 0f3d14d6,
`user_get_tfa` fails if the realm requires TFA, but neither user.cfg
nor tfa.cfg define any second factors for that user.
However, both before and after 0f3d14d6, a realm that requires TOTP
allows a user to login without a second factor if they have at least
one configured factor in tfa.cfg and all factors are disabled -- for
example if they have only a disabled TOTP factor. This behavior is
unwanted, as users can then circumvent the realm-mandated TFA
requirement by disabling their own TOTP factor.
This happens because a user with a disabled TOTP factor in tfa.cfg
passes the check in `user_get_tfa`. Hence, `authenticate_2nd_new_do`
proceeds to call `authentication_challenge`, which does not generate a
challenge (and returns undef) because the user has no enabled factors.
Consequently, `authenticate_2nd_new_do` returns undef and allows login
without a second factor.
Note that this does not happen for realms that require Yubico TFA,
because for these realms, `authenticate_2nd_new_do` does not call
`authentication_challenge` and instead generates a challenge in any
case, regardless of whether the user has enabled Yubico factors or
not.
This patch fixes the issue by moving the check out of `user_get_tfa`,
and instead letting `authenticate_2nd_new_do` fail if the realm
requires TFA but `authentication_challenge` generates no challenge
(returns undef). This also saves a call to `api_list_user_tfa` that
was introduced in 0f3d14d6.
This patch still allows users to login with a recovery key to a realm
that requires TFA , which is the intended behavior.
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Alternatively we could potentially move the realm-tfa check to after
`authentication_challenge`.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Previously, `user_get_tfa` read the `keys` user attribute from
user.cfg to determine whether a user has second factors configured.
`keys` could contain TOTP secrets or Yubico key IDs (for realms that
require TFA), or the marker "x" to signify that second factors are
defined in tfa.cfg, in which case `user_get_tfa` would additionally
read tfa.cfg.
However, syncing an LDAP realm with `remove-vanished=properties`
erases the `keys` attribute, and thus the "x" marker (unless custom
`sync_attributes` with a mapping for `keys` are defined). This would
allow TFA-enabled users to log in without a second factor after a
realm sync. This issue was first reported in the forum [1].
To fix this issue, `user_get_tfa` now reads tfa.cfg unconditionally,
and thus independently of the value of `keys`. In other words, the "x"
marker is now irrelevant for authentication. The reasoning for this
change is that most current setups define second factors in tfa.cfg
anyway.
Special care is needed to avoid breaking realms that require TFA: In
that case, `user_get_tfa` must fail authentication if neither user.cfg
nor tfa.cfg define any second factors.
This patch changes the behavior of a hypothetical (and not officially
supported) LDAP realm setup in which `sync_attributes: keys=attr` and
`remove-vanished=properties` is used to maintain `keys` in the LDAP
directory. In such a setup, an admin could enable/disable TFA for a
user who has an enabled second factor in tfa.cfg by editing their LDAP
entry and switching between "x" and "". With this patch, TFA is always
enabled for that user.
This patch makes the "x" marker irrelevant for authentication, but PVE
still *writes* it if the user has second factors configured in
tfa.cfg. This behavior is kept for now to avoid issues in cluster
upgrade scenarios, where some nodes that still rely on the "x" marker
could allow logins without a second factor.
[1] https://forum.proxmox.com/threads/130440/
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
All nodes should be new enough, especially as this is understood
since pve-manager 7.0-15 and users must upgrade to 7.4 before
upgrading to Proxmox VE 8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
uses the privileges:
Mapping.Use
Mapping.Modify
Mapping.Audit
on /mapping/{TYPE}/{id}
so that we can assign privileges on resource level
this will generate new roles (PVEMappingUser, PVEMappingAdmin,
PVEMappingAuditor)
note that every user with Permissions.Modify on '/' and propagate can add these
new roles to themselves
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
to reduce the chances of accidentally handing out privilege modification
privileges. the old default setup of having Permissions.Modify in PVESysAdmin
and PVEAdmin weakened the distinction between those roles and Administrator.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Previously `authentication_verify` just `die`d on error and
would only return a boolean whether `priv/tfa.cfg` needs
updating as a positive result.
Since we want to support locking TOTP as well as a general
TFA lock-out via the config, we also want to be able to tell
when this occurs. Most of it is handled by the TFA rust
crate already, but notifying users needs to be done on this
end instead.
In pve-rs we now have a different API for this:
`authentication_verify2`, which, instead of die()ing on
errors, always returns a hash containing the result as well
as the flags 'tfa-limit-reached' and 'totp-limit-reached'
which, if set, tell us to notify the user.
However, doing so will introduce new fields in the
`priv/tfa.cfg` in a struct marked as `deny_unknown_fields`,
so in a cluster, the limits & notification handling should
only be done once we can be sure that all nodes are up to
date.
These fields are only introduced on login errors, so for
now, handle a failed result early without saving
`priv/tfa.cfg`.
The only case where saving the file was previously required
was when *successfully* logging in with a recovery key, by
which we cannot be reaching a limit, so this should still be
safe.
Once we can validate that all cluster nodes are up to date,
we can implement the notification system.
A commented-out code structure for this is included in this
patch.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
by switching to a tree-based in-memory structure, like we do in PBS.
instead of parsing ACL entries into a hash using the full ACL path as key for
each entry, parse them into a tree-like nested hash. when evaluating ACLs,
iterating over all path prefixes starting at '/' is needed anyway, so this is a
more natural way to store and access the parsed configuration.
some performance data, timing `pveum user permissions $user > /dev/null` for
various amounts of ACL entries in user.cfg
entries | stock | patched | speedup
-------------------------------------
1k | 1.234s | 0.241s | 5.12
2k | 4.480s | 0.262s | 17.09
20k | 7m25s | 0.987s | 450.86
similarly, an /access/ticket request such as the one happening on login goes
down from 4.27s to 109ms with 2k entries (testing with 20k entries fails
because the request times out after 30s, but with the patch it takes 336ms).
the underlying issue is that these two code paths not only iterate over *all
defined ACL paths* to get a complete picture of a user's/token's privileges,
but the fact that that ACL computation for each checked path itself did another
such loop in PVE::AccessControl::roles().
it is enough to iterate over the to-be-checked ACL path in a component-wise
fashion in order to handle role propagation, e.g., when looking at /a/b/c/d,
iterate over
/
/a
/a/b
/a/b/c
/a/b/c/d
in turn instead of all defined ACL paths.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
just above, we check & return if $tfa_challenge is set, so there is no
way that it would be set here. To make it clearer that it must be undef
here pass it as such.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we currently only need to lock the tfa config when we got a recovery key
as a tfa challenge response, since that's the only thing that can
actually change the tfa config (every other method only reads from
there).
so to do that, factor out the code that was inside the lock, and call it
with/without lock depending on the tfa challenge response
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
there is a (hard to trigger) race that can cause a double rotation of
the auth key, with potentially confusing fallout (various processes on
different nodes having an inconsistent view of the current and previous
auth keys, resulting in "random" invalid ticket errors until the next
proper key rotation 24h later).
the underlying cause is that `stat()` calls are excempt from our
otherwise non-cached/direct_io handling of pmxcfs I/O, which allows the
following sequence of events to take place:
LAST: mtime of current auth key
- current epoch advances to LAST + 24h
the following can be arbitrarly interleaved between node A and B:
- LAST+24h node A: pvedaemon/pvestatd on node A calls check_authkey(1)
- LAST+24h node A: it returns 0 (rotation required)
- LAST+24h node A: rotate_key() is called
- LAST+24h node A: cfs_lock_authkey is called
- LAST+24h node B: pvedaemon/pvestatd calls check_authkey(1)
- LAST+24h node B: key is not yet cached in-memory by current process
- LAST+24h node B: key file is opened, stat-ed, read, parsed, and content+mtime
is cached (the kernel will now cache this stat result for 1s unless
the path is opened)
- LAST+24h node B: it returns 0 (rotation required)
- LAST+24h node B: rotate_key() is called
- LAST+24h node B: cfs_lock_authkey is called
the following is mutex-ed via a cfs_lock:
- LAST+24h node A: lock is obtained
- LAST+24h node A: check_authkey() is called
- LAST+24h node A: key is stat-ed, mtime is still (correctly) LAST,
cached mtime and content are returned
- LAST+24h node A: it returns 0 (rotation still required)
- LAST+24h node A: get_pubkey() is called and returns current auth key
- LAST+24h node A: new keypair is generated and persisted
- LAST+24h node A: cfs_lock is released
- LAST+24h node B: changes by node A are processed by pmxcfs
- LAST+24h node B: lock is obtained
- LAST+24h node B: check_authkey() is called
- LAST+24h node B: key is stat-ed, mtime is (incorrectly!) still LAST
since the stat call is handled by the kernel/page cache, not by
pmxcfs, cached mtime and content are returned
- LAST+24h node B: it returns 0 (rotation still required)
- LAST+24h node B: get_pubkey() is called and returns either previous or
key written by node A (depending on whether page cache or pmxcfs
answers stat call)
- LAST+24h node B: new keypair is generated, key returned by last
get_pubkey call is written as old key
the end result is that some nodes and process will treat the key
generated by node A as "current", while others will treat the one
generated by nodoe B as "current". both have the same mtime, so the
in-memory cache hash won't be updated unless the service is restarted or
another rotation happens. depending on who generated the ticket and who
attempts validating it, a ticket might be rejected as invalid even
though the generating party would treat it as valid, and time on all
nodes is properly synced.
there seems to be now way for pmxcfs to pro-actively invalidate the page
cache entry safely (since we'd need to do so while writes to the same
path can happen concurrently), so work around by forcing an open/close
at the (stat) call site which does the work for us. regular reads are
not affected since those already bypass the page cache entirely anyway.
thankfully in almost all cases, the following sequence has enough
synchronization overhead baked in to avoid triggering the issue almost
entirely:
- cfs_lock
- generate key
- create tmp file for old key
- write tmp file
- rename tmp file into proper place
- create tmp file for new pub key
- write tmp file
- rename tmp file into proper place
- create tmp file for new priv key
- write tmp file
- rename tmp file into proper place
- release lock
that being said, there has been at least one report where this was
triggered in the wild from time to time.
it is easy to reproduce by increasing the attr_timeout and entry_timeout
fuse settings inside pmxcfs to increase the time stat results are
treated as valid/retained in the page cache:
-----8<-----
diff --git a/data/src/pmxcfs.c b/data/src/pmxcfs.c
index d78a248..e3e807b 100644
--- a/data/src/pmxcfs.c
+++ b/data/src/pmxcfs.c
@@ -935,7 +935,7 @@ int main(int argc, char *argv[])
mkdir(CFSDIR, 0755);
- char *fa[] = { "-f", "-odefault_permissions", "-oallow_other", NULL};
+ char *fa[] = { "-f", "-odefault_permissions", "-oallow_other", "-oentry_timeout=5", "-oattr_timeout=5", NULL};
struct fuse_args fuse_args = FUSE_ARGS_INIT(sizeof (fa)/sizeof(gpointer) - 1, fa);
----->8-----
in which case it's even easy to trigger more than double rotation in a
bigger test cluster (stopping all PVE services except for pve-cluster
helps avoiding interference):
on a single node:
$ touch --date yesterday /etc/pve/authkey.pub
in parallel (i.e., via tmux synchronized panes):
-----8<-----
use strict;
use warnings;
use PVE::Cluster;
use PVE::AccessControl;
PVE::Cluster::cfs_update();
# ensure page cache entry is there
PVE::AccessControl::check_authkey(1);
PVE::AccessControl::check_authkey(1);
# now attempt rotation
PVE::AccessControl::rotate_authkey();
----->8-----
Thanks to Wolfgang Bumiller for assistance in triaging and exploring
various avenues of fixing.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to detect similar issues to that fixed in the previous commit early on
and without the potential for dangerous side-effects.
root@pam is intentionally still allowed before the check in case such
situations can be triggered by misconfiguration - root@pam can then
still clean up the affected configs via the GUI/API, and not just via
manual editing.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
not that relevant for the user as the daemon auth log already
contains that info, but for token it can be nice.
The API response is always just a plain "401 auth failure" in any
case (expired or wrong creds)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
but don't bail out of the entire auth process, otherwise
not even totp or recovery keys will work anymore in this
case
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
in order to allow subdomains to work, the wa config should
only specify 'id' and 'rp', the 'origin' gets filled in by
the node
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
this happens when the first new tfa entry is added and the
'keys' entry is replaced by "x"
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In PBS we don't support this, so the current TFA API in rust
does not support this either (although the config does know
about its *existence*).
For now, yubico authentication will be done in perl. Adding
it to rust the rust TFA crate would not make much sense
anyway as we'd likely not want to use the same http client
crate in pve and pbs anyway (since pve is all blocking code
and pbs is async...)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
implements the same api paths as in pbs by forwarding the
api methods to the rust implementation after performing the
product-specific checks
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
The main difference here is that we have no separate api
path for verification. Instead, the ticket api call gets an
optional 'tfa-challenge' parameter.
This is opt-in: old pve-manager UI with new
pve-access-control will still work as expected, but users
won't be able to *update* their TFA settings.
Since the new tfa config parser will build a compatible
in-perl tfa config object as well, the old authentication
code is left unchanged for compatibility and will be removed
with pve-8, where the `new-format` parameter in the ticket
call will change its default to `1`.
The `change_tfa` call will simply die in this commit. It is
removed later when adding the pbs-style TFA API calls.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>