nfsd and lockd call vfs_lock_file() to lock/unlock the inode
returned by locks_inode(file).
Many places in nfsd/lockd code use the inode returned by
file_inode(file) for lock manipulation. With Overlayfs, file_inode()
(the underlying inode) is not the same object as locks_inode() (the
overlay inode). This can result in "Leaked POSIX lock" messages
and eventually to a kernel crash as reported by Eddie Horng:
https://marc.info/?l=linux-unionfs&m=153086643202072&w=2
Fix all the call sites in nfsd/lockd that should use locks_inode().
This is a correctness bug that manifested when overlayfs gained
NFS export support in v4.16.
Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Fixes: 8383f17488 ("ovl: wire up NFS export operations")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's inode->i_lock that's now taken in setlease and break_lease, instead
of the big kernel lock.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The name of this variable doesn't fit the type. And we only ever use
one field of it.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Make the function prototype match the name a little better.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the client is only renewing state a little sooner than once a lease
period, then it might not discover the server has restarted till close
to the end of the grace period, and might run out of time to do the
actual reclaim.
Extend the grace period by a second each time we notice there are
clients still trying to reclaim, up to a limit of another whole lease
period.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
AtfvEeaPHaOyD8/h2Q==
=zUUp
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
I noticed a memory corruption crash in nfsd in
4.17-rc1. This patch corrects the issue.
Fix to return error if the delegation couldn't be hashed or there was
a recall in progress. Use the existing error path instead of
destroy_delegation() for readability.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Fixes: 353601e7d3 ("nfsd: create a separate lease for each delegation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool
fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Fixes: 68b18f5294 ("nfsd: make nfs4_get_existing_delegation less confusing")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[bfields: also fix -EAGAIN]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently we only take one vfs-level delegation (lease) for each file,
no matter how many clients hold delegations on that file.
Let's instead keep a one-to-one mapping between NFSv4 delegations and
VFS delegations. This turns out to be simpler.
There is still a many-to-one mapping of NFS opens to NFS files, and the
delegations on one file are all associated with one struct file. The
VFS can still distinguish between these delegations since we're setting
fl_owner to the struct nfs4_delegation now, not to the shared file.
I'm replacing at least one complicated function wholesale, which I don't
like to do, but I haven't figured out how to do this more incrementally.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Pull some duplicated code into a common helper.
This changes the order in destroy_delegation a little, but it looks to
me like that shouldn't matter.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
For now this makes no difference, as for files having delegations,
there's a one-to-one relationship between an nfs4_file and its
nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Every single caller gets the file out of the delegation, so let's do
that once in nfs4_put_deleg_lease.
Plus we'll need it there for other reasons.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
fi_delegees is basically just a reference count on users of
fi_deleg_file, which is cleared when fi_delegees goes to zero. The
fi_deleg_file check here is redundant. Also add an assertion to make
sure we don't have unbalanced puts.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
On x86_64, it's 1152 bytes, so we can avoid wasting 896 bytes each.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We already send it for v4.1, but RFC7530 also notes that the stateid in
the close reply is bogus.
Always send the special close stateid, even in v4.0 responses. No client
should put any meaning on it whatsoever. For now, we continue to
increment the stateid value, though that might not be necessary either.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.
Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.
With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.
Fixes: 76d348fadf (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There's no point I can see to
stp->st_stid.sc_type = NFS4_CLOSED_STID;
given release_lock_stateid immediately sets sc_type to 0.
That set of sc_type to 0 should be enough to prevent it being used where
we don't want it to be; NFS4_CLOSED_STID should only be needed for
actual open stateid's that are actually closed.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The state of the stid is guaranteed by 2 locks:
- The nfs4_client 'cl_lock' spinlock
- The nfs4_ol_stateid 'st_mutex' mutex
so it is quite possible for the stid to be unhashed after lookup,
but before calling nfsd4_lock_ol_stateid(). So we do need to check
for a zero value for 'sc_type' in nfsd4_verify_open_stid().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Checuk Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 659aefb68e "nfsd: Ensure we don't recognise lock stateids..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
restart_grace() uses hardcoded init_net.
It can cause to "list_add double add" in following scenario:
1) nfsd and lockd was started in several net namespaces
2) nfsd in init_net was stopped (lockd was not stopped because
it have users from another net namespaces)
3) lockd got signal, called restart_grace() -> set_grace_period()
and enabled lock_manager in hardcoded init_net.
4) nfsd in init_net is started again,
its lockd_up() calls set_grace_period() and tries to add
lock_manager into init_net 2nd time.
Jeff Layton suggest:
"Make it safe to call locks_start_grace multiple times on the same
lock_manager. If it's already on the global grace_list, then don't try
to add it again. (But we don't intentionally add twice, so for now we
WARN about that case.)
With this change, we also need to ensure that the nfsd4 lock manager
initializes the list before we call locks_start_grace. While we're at
it, move the rest of the nfsd_net initialization into
nfs4_state_create_net. I see no reason to have it spread over two
functions like it is today."
Suggested patch was updated to generate warning in described situation.
Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Prevent the use of the closed (invalid) special stateid by clients.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
From kernel 4.9, my two nfsv4 servers sometimes suffer from
"panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().
These panics diseappear if we revert the commit "nfsd: add a LRU list
for blocked locks".
The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().
Cc: stable@vger.kernel.org
Fixes: 7919d0a27f "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reveiwed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The use of the st_mutex has been confusing the validator. Use the
proper nested notation so as to not produce warnings.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The various functions that call check_stateid_generation() in order
to compare a client-supplied stateid with the nfs4_stid state, usually
need to atomically check for closed state. Those that perform the
check after locking the st_mutex using nfsd4_lock_ol_stateid()
should now be OK, but we do want to fix up the others.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After taking the stateid st_mutex, we want to know that the stateid
still represents valid state before performing any non-idempotent
actions.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If we're looking up a new lock state, and the creation fails, then
we want to unhash it, just like we do for OPEN. However in order
to do so, we need to that no other LOCK requests can grab the
mutex until we have unhashed it (and marked it as closed).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trivial cleanup to simplify following patch.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In order to deal with lookup races, nfsd4_free_lock_stateid() needs
to be able to signal to other stateful functions that the lock stateid
is no longer valid. Right now, nfsd_lock() will check whether or not an
existing stateid is still hashed, but only in the "new lock" path.
To ensure the stateid invalidation is also recognised by the "existing lock"
path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
change the type to NFS4_CLOSED_STID under the stp->st_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.
Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1
case, and NFS4ERR_BAD_STATEID otherwise.
The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply. That can cause the Linux 4.1 client to loop in its
stage manager.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Publishing of net pointer is not safe,
let's use nfs->ns.inum instead
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_file.fi_ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.
Catching every such case is difficult, but we may as well catch a few
easy ones. This also handles the case described in the previous patch,
in a different way.
The spec does however require us to catch the case where the difference
is in the rpc credentials. This prevents somebody from snooping another
user's replies by fabricating retries.
(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.
Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.
The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.
The protocol permits us to cache replies even if the client didn't ask
us to. And it's easy to do so in the case of solo SEQUENCE compounds.
So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.
Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops. This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id. If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much. But we can at
least attempt to return valid xdr!
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Instead of granting client's full requests until we hit our DRC size
limit and then failing CREATE_SESSIONs (and hence mounts) completely,
start granting clients smaller slot tables as we approach the limit.
The factor chosen here is pretty much arbitrary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.
It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pass union nfsd4_op_u to the op_set_currentstateid callbacks instead of
using unsafe function pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Given the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
kstrdup() already checks for NULL.
(Brought to our attention by Jason Yann noticing (from sparse output)
that it should have been declared static.)
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
dprintk already provides a KERN_* prefix; this KERN_INFO just shows up
as some odd characters in the output.
Simplify the message a bit while we're there.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The rpccred gotten from rpc_lookup_machine_cred() should be put when
state is shutdown.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().
If nfsd doesn't go through init_lock_stateid() and put stateid at end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).
This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid().
Cc: stable@vger.kernel.org
Fixes: 356a95ece7 "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a
rather complex situation involving four different spinlocks.
The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
benefit from user testing:
Anna Schumacker contributed a simple NFSv4.2 COPY implementation. COPY
is already supported on the client side, so a call to copy_file_range()
on a recent client should now result in a server-side copy that doesn't
require all the data to make a round trip to the client and back.
Jeff Layton implemented callbacks to notify clients when contended locks
become available, which should reduce latency on workloads with
contended locks.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX/mcsAAoJECebzXlCjuG+MU0P/3SzTLGYXU5yOTAorx255/uf
fUVKQQhTzzaA2xj3gWWWztYx3y0ZJUVgwU56a+Ap5Z8/goqDQ78H+ePEc+MG7BT/
/UXS/bITvt0MP/dvPrDzhSltvqx/wpelLPBo29hGLlAQ2dsnD4Y75IbOOQccWqcC
iD2v6x7lnpWZ7j9Zhwzg/JNQHwISIb7tiLoYBjfcdNDEMU76KIyhxD0Cx9MSeBzH
9Rq/oEdwGDFS5WqVfNe2jxbngoauq1IupziQ2eQGv2D/POyXCx8fphoYjDz1XaW8
PxaJtJtM2owPGG+z2CxklJqNaS1Z4F+oppjg+nf4i/ibxmIBaTy8NluASX3vMh69
CDO1+ly+TiF0l1VqMOQJWRnqn1qGk6fLpF6P1Ac62B0oWpeLGU7nmik7XN1ORgsi
8ksxRKNAWeprZo3wl5xNrADu/wlZ7XCJTc4QoHEgYT04aHF+j8EMCHv+mtZ8+Bwn
WWiA8iItZOgXV4vitCRJlvsixjYvmF3djPIoI2Lt5KDWIg+eL89sKwzTALSfeC4m
Vjb0svzPX1MmZCNP1rCStFbl3gZYXZyqPk+uA6M7H8mjAjVeKxRPowWpMBgvYZHr
FjCPb878bAuqCeBVbIyOLLcKWBLTw8PsUWZAor3gNg454JGkMjLUyJ/S22Cz5Nbo
HdjoiTJtbPrHnCwTMXwa
=nozl
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:
- Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.
- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"
* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.
If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes). But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.
2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).
All of the above is unnecessary. Switch to the usual
trailing-zero-len-array scheme. Memory is allocated with
kmalloc/vmalloc() and only as much as needed. Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).
Maximum number of gids is 65536 which translates to 256KB+8 bytes. I
think kernel can handle such allocation.
On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!
Nice side effects:
- "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
- fix little mess in net/ipv4/ping.c
should have been using GROUP_AT macro but this point becomes moot,
- aux group allocation is persistent and should be accounted as such.
Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A setclientid_confirm with (clientid, verifier) both matching an
existing confirmed record is assumed to be a replay, but if the verifier
doesn't match, it shouldn't be.
This would be a very rare case, except that clients following
https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
failure.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If we are using v4.1+, then we can send notification when contended
locks become free. Inform the client of that fact.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.
Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.
This also gives us a mechanism to clean up these objects at server
shutdown time as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.
When the lock comes free (a'la an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.
Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:
Frame 324 R OPEN stateid [2,O]
Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID
In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.
To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.
During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.
Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.
Found by inspection, we don't currently have a reproducer.
Fixes: 2c41beb0e5 ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Silent a few smatch warnings about indentation
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.
The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.
Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.
Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the state selection logic inside from the caller,
always making it return correct stp to use.
Signed-off-by: J . Bruce Fields <bfields@fieldses.org>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
To avoid racing entry into nfs4_get_vfs_file().
Make init_open_stateid() return with locked stateid to be unlocked
by the caller.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It used to be the case that state had an rwlock that was locked for write
by downgrades, but for read for upgrades (opens). Well, the problem is
if there are two competing opens for the same state, they step on
each other toes potentially leading to leaking file descriptors
from the state structure, since access mode is a bitmap only set once.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the existing static function to an inline helper, and call it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The maximum size of a backchannel message on RPC-over-RDMA depends
on the connection's inline threshold. Today that threshold is
typically 1024 bytes, making the maximum message size 996 bytes.
The Linux server's CREATE_SESSION operation checks that the size
of callback Calls can be as large as 1044 bytes, to accommodate
RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the
true message size maximum of 996 bytes.
But the server's backchannel currently does not support RPCSEC_GSS.
The actual maximum size it needs is much smaller. It is safe to
reduce the limit to enable NFSv4.1 on RDMA backchannel operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The server does indeed now support NFSv4.1 on RDMA transports. It
does not support shifting an RDMA-capable TCP transport (such as
iWARP) to RDMA mode.
Reported-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Remember free allocated client when meeting unsupported state protect how.
Fixes: 50c7b948ad ("nfsd: minor consolidation of mach_cred handling code")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms
that can affect some high-availability NFS setups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWmVZHAAoJECebzXlCjuG+Cn4P/3zwSuwIeLuv9b89vzFXU8Xv
AbBWHk7WkFXJQGTKdclYjwxqU+l15D5lYHCae1cuD5eXdviraxXf7EcnqrMhJUc0
oRiQx0rAwlEkKUAVrxGCFP7WKjlX3TsEBV6wPpTCP3BEMzTPDEeaDek7+hICFkLF
9a/miEXAopm3jxP7WNmXEkdKpFEHklDDwtv6Av7iIKCW6+7XCGp7Prqo4NQKAKp6
hjE+nvt2HiD06MZhUeyb14cn6547smzt1rbSfK4IB4yHMwLyaoqPrT7ekDh9LDrE
uGgo+Y2PBbEcTAE6tJ88EjZx7cMCFPn0te+eKPgnpPy9RqrNqSxj5N/b7JAecKgW
a/09BtvFOoYs8fO5ovqeRY5THrE3IRyMIwn4gt7fCYaaAbG3dwGKG1uklTAVXtb1
95DkhOb8He2VhOCCoJ6ybbTnRfjB6b/cv7ZuEGlQfvTE+BtU3Jj9I76ruWFhb3zd
HM1dRI20UfwL/0Y8yYhZ+/rje9SSk2jOmVgSCqY9hnCmEqOqOdUU0X/uumIWaBym
zfGx9GIM0jQuYVdLQRXtJJbUgJUUN3MilGyU5wx7YoXip5guqTalXqAdQpShzXeW
s1ATYh/mY5X9ig51KogkkVlm9bXDQAzJBAnDRpLtJZqy5Cgkrj9RSu0ExN1Rmlhw
LKQCddBQxUSWJ+XWycgK
=G7V3
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Smaller bugfixes and cleanup, including a fix for a failures of
kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
storms that can affect some high-availability NFS setups"
* tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
nfsd: add new io class tracepoint
nfsd: give up on CB_LAYOUTRECALLs after two lease periods
nfsd: Fix nfsd leaks sunrpc module references
lockd: constify nlmsvc_binding structure
lockd: use to_delayed_work
nfsd: use to_delayed_work
Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
sunrpc: Add a function to close temporary transports immediately
nfsd: don't base cl_cb_status on stale information
nfsd4: fix gss-proxy 4.1 mounts for some AD principals
nfsd: fix unlikely NULL deref in mach_creds_match
nfsd: minor consolidation of mach_cred handling code
nfsd: helper for dup of possibly NULL string
svcrpc: move some initialization to common code
nfsd: fix a warning message
nfsd: constify nfsd4_callback_ops structure
nfsd: recover: constify nfsd4_client_tracking_ops structures
svcrdma: Do not send XDR roundup bytes for a write chunk
This will be needed so COPY can look up the saved_fh in addition to the
current_fh.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The principal name on a gss cred is used to setup the NFSv4.0 callback,
which has to have a client principal name to authenticate to.
That code wants the name to be in the form servicetype@hostname.
rpc.svcgssd passes down such names (and passes down no principal name at
all in the case the principal isn't a service principal).
gss-proxy always passes down the principal name, and passes it down in
the form servicetype/hostname@REALM. So we've been munging the name
gss-proxy passes down into the format the NFSv4.0 callback code expects,
or throwing away the name if we can't.
Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've
also been using the principal name to verify that certain operations are
done as the same principal as was used on the original EXCHANGE_ID call.
For that application, the original name passed down by gss-proxy is also
useful.
Lack of that name in some cases was causing some kerberized NFSv4.1
mount failures in an Active Directory environment.
This fix only works in the gss-proxy case. The fix for legacy
rpc.svcgssd would be more involved, and rpc.svcgssd already has other
problems in the AD case.
Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We really shouldn't allow a client to be created with cl_mach_cred set
unless it also has a principal name.
This also allows us to fail such cases immediately on EXCHANGE_ID as
opposed to waiting and incorrectly returning WRONG_CRED on the following
CREATE_SESSION.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Technically the initialization in the NULL case isn't even needed as the
only caller already has target zeroed out, but it seems safer to keep
copy_cred generic.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The WARN() macro takes a condition and a format string. The condition
was accidentally left out here so it just prints the function name
instead of the message.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfsd4_callback_ops structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We observed multiple open stateids on the server for files that
seemingly should have been closed.
nfsd4_process_open2() tests for the existence of a preexisting
stateid. If one is not found, the locks are dropped and a new
one is created. The problem is that init_open_stateid(), which
is also responsible for hashing the newly initialized stateid,
doesn't check to see if another open has raced in and created
a matching stateid. This fix is to enable init_open_stateid() to
return the matching stateid and have nfsd4_process_open2()
swap to that stateid and switch to the open upgrade path.
In testing this patch, coverage to the newly created
path indicates that the race was indeed happening.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.
List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.
So, let's:
1.) Not hand out duplicate delegations.
2.) Only send them to the client once.
RFC 5561:
9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."
10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client. A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bruce points out that the increment of the seqid in stateids is not
serialized in any way, so it's possible for racing calls to bump it
twice and end up sending the same stateid. While we don't have any
reports of this problem it _is_ theoretically possible, and could lead
to spurious state recovery by the client.
In the current code, update_stateid is always followed by a memcpy of
that stateid, so we can combine the two operations. For better
atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
the seqid and copying the stateid.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
At least in the v4.0 case openowners can hang around for a while after
last close, but they shouldn't really block (for example), a new mount
with a different principal.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In bakeathon testing Solaris client was getting CLID_INUSE error when
doing a krb5 mount soon after an auth_sys mount, or vice versa.
That's not really necessary since in this case the old client doesn't
have any state any more:
http://tools.ietf.org/html/rfc7530#page-103
"when the server gets a SETCLIENTID for a client ID that
currently has no state, or it has state but the lease has
expired, rather than returning NFS4ERR_CLID_INUSE, the server
MUST allow the SETCLIENTID and confirm the new client ID if
followed by the appropriate SETCLIENTID_CONFIRM."
This doesn't fix the problem completely since our client_has_state()
check counts openowners left around to handle close replays, which we
should probably just remove in this case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
running in parallel. The server would receive the OPEN_DOWNGRADE first
and check its seqid, but then an OPEN would race in and bump it. The
OPEN_DOWNGRADE would then complete and bump the seqid again. The result
was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
it should have been rejected since the seqid changed.
The only recourse we have here I think is to serialize operations that
bump the seqid in a stateid, particularly when we're given a seqid in
the call. To address this, we add a new rw_semaphore to the
nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
after looking up the stateid to ensure that nothing else is going to
bump it while we're operating on it.
In the case of OPEN, we do a down_read, as the call doesn't contain a
seqid. Those can run in parallel -- we just need to serialize them when
there is a concurrent OPEN_DOWNGRADE or CLOSE.
LOCK and LOCKU however always take the write lock as there is no
opportunity for parallelizing those.
Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We have observed the server sending recalls for delegation stateids
that have already been successfully returned. Change
nfsd4_cb_recall_done() to return success if the client has returned
the delegation. While this does not completely eliminate the sending
of recalls for delegations that have already been returned, this
does prevent unnecessarily declaring the callback path to be down.
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Somebody with a Solaris client was hitting this case. We haven't
figured out why yet, and don't have a reproducer. Meanwhile Frank
noticed that RFC 7530 actually recommends CLID_INUSE for this case.
Unlikely to help the original reporter, but may as well fix it.
Reported-by: Frank Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.
Have unhash_delegation_locked return a bool that indicates whether it
was already unhashed. In the case of destroy_delegation we only
conditionally put the hash reference if that returns true.
The other callers of unhash_delegation_locked call it while walking
list_heads that shouldn't yet be detached. If we find that it doesn't
return true in those cases, then throw a WARN_ON as that indicates that
we have a partially hashed delegation, and that something is likely very
wrong.
Tested-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.
Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.
Reported-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Andrew W Elble <aweits@rit.edu>
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We can potentially have several nfs4_laundromat jobs running if there
are multiple namespaces running nfsd on the box. Those are effectively
separated from one another though, so I don't see any reason to
serialize them.
Also, create_singlethread_workqueue automatically adds the
WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really
involved in any reclaim paths. I see no need for a rescuer thread.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NLM locks don't conflict with NFSv4 share reservations, so we're not
going to learn anything new by watiting for them.
They do conflict with NFSv4 locks and with delegations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit 294ac32e99 "nfsd: protect clid and verifier generation with
client_lock" moved gen_confirm() to gen_clid().
After that commit, setclientid will return a bad reply with all-zero
verifier after copy_clid().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If using clientid_counter, it seems possible that gen_confirm could
generate the same verifier for the same client in some situations.
Add a new counter for client confirm verifier to make sure gen_confirm
generates a different verifier on each call for the same clientid.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
v2, new helper nfs4_free_stateowner for freeing so_owner.data and sop
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Layout is a state resource, nfsd should check it too.
v2, drop unneeded updating in nfsd4_renew()
v3, fix compile error without CONFIG_NFSD_PNFS
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, preprocess_stateid_op calls nfs4_check_olstateid which
verifies that the open stateid corresponds to the current filehandle in the
call by calling nfs4_check_fh.
If the stateid is a NFS4_DELEG_STID however, then no such check is done.
This could cause incorrect enforcement of permissions, because the
nfsd_permission() call in nfs4_check_file uses current the current
filehandle, but any subsequent IO operation will use the file descriptor
in the stateid.
Move the call to nfs4_check_fh into nfs4_check_file instead so that it
can be done for all stateid types.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
[bfields: moved fh check to avoid NULL deref in special stateid case]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.
There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Split out two self contained helpers to make the function more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
gcc-5.0 warns about a potential uninitialized variable use in nfsd:
fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
reset_union_bmap_deny(old_deny_bmap, stp);
^
fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
unsigned char old_deny_bmap;
^
This is a false positive, the code path that is warned about cannot
actually be reached.
This adds an initialization for the variable to make the warning go
away.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Checking the rpc_client pointer is not a reliable way to detect
backchannel changes: cl_cb_client is changed only after shutting down
the rpc client, so the condition cl_cb_client = tk_client will always be
true.
Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
the buggy cl_callbacks list and fix the lifetime rules due to double
calls of the ->prepare callback operations method for this retry case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For the sake of forgetful clients, the server should return the layouts
to the file system on 'last close' of a file (assuming that there are no
delegations outstanding to that particular client) or on delegreturn
(assuming that there are no opens on a file from that particular
client).
In theory the information is all there in current data structures, but
it's not efficiently available; nfs4_file->fi_ref includes references on
the file across all clients, but we need a per-(client, file) count.
Walking through lots of stateid's to calculate this on each close or
delegreturn would be painful.
This patch introduces infrastructure to maintain per-client opens and
delegation counters on a per-file basis.
[hch: ported to the mainline pNFS support, merged various fixes from Jeff]
Signed-off-by: Sachin Bhamare <sachin.bhamare@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value. Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Pull nfsd updates from Bruce Fields:
"A quiet cycle this time; this is basically entirely bugfixes.
The few that aren't cc'd to stable are cleanup or seemed unlikely to
affect anyone much"
* 'for-4.1' of git://linux-nfs.org/~bfields/linux:
uapi: Remove kernel internal declaration
nfsd: fix nsfd startup race triggering BUG_ON
nfsd: eliminate NFSD_DEBUG
nfsd4: fix READ permission checking
nfsd4: disallow SEEK with special stateids
nfsd4: disallow ALLOCATE with special stateids
nfsd: add NFSEXP_PNFS to the exflags array
nfsd: Remove duplicate macro define for max sec label length
nfsd: allow setting acls with unenforceable DENYs
nfsd: NFSD_FAULT_INJECTION depends on DEBUG_FS
nfsd: remove unused status arg to nfsd4_cleanup_open_state
nfsd: remove bogus setting of status in nfsd4_process_open2
NFSD: Use correct reply size calculating function
NFSD: Using path_equal() for checking two paths
Commit f895b252d4 ("sunrpc: eliminate RPC_DEBUG") introduced
use of IS_ENABLED() in a uapi header which leads to a build
failure for userspace apps trying to use <linux/nfsd/debug.h>:
linux/nfsd/debug.h:18:15: error: missing binary operator before token "("
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
^
Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG
is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG.
Cc: stable@vger.kernel.org
Fixes: f895b252d4 "sunrpc: eliminate RPC_DEBUG"
Signed-off-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The current prototypes for these operations are somewhat awkward as they
deal with fl_owners but take struct file_lock arguments. In the future,
we'll want to be able to take references without necessarily dealing
with a struct file_lock.
Change them to take fl_owner_t arguments instead and have the callers
deal with assigning the values to the file_lock structs.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
status is always reset after this (and it doesn't make much sense there
anyway).
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
Noticed by inspection after Jeff Layton fixed the same bug for open
owners. Depending on client behavior, this one may be trickier to
trigger in practice.
Fixes: c58c6610ec "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
In commit 7ffb588086, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).
This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner. That
has changed now with v4.0 kernels, and this race can now easily occur.
Fixes: 7ffb588086
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
commit 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is
protected by clp->cl_lock") removed the use of the reaplist to
clean out clp->cl_revoked. It failed to change list_entry() to
walk clp->cl_revoked.next instead of reaplist.next
Fixes: 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
Cc: stable@vger.kernel.org
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
to embedd a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intent to share a single file between multiple
clients this implementation is fully sufficient.
To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.
The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.
Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.
The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.
Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
The pnfs code will need it too. Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Just like for other lock types we should allow different owners to have
a read lease on a file. Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Someone with a weird time_t happened to notice this, it shouldn't really
manifest till 2038. It may not be our ownly year-2038 problem.
Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Now that we use standard list_heads for tracking leases, we can have
lm_change take a pointer to the lease to be modified instead of a
double pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Remove the function renew_client() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fi_delegees is always handled under the fi_lock, so there's no need to
use an atomic_t for this field.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.
Change the code to take the reference after the check and only if there
is no conflict.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
This patch effectively reverts commit 500f808726 ("net: ovs: use CRC32
accelerated flow hash if available"), and other remaining arch_fast_hash()
users such as from nfsd via commit 6282cd5655 ("NFSD: Don't hand out
delegations for 30 seconds after recalling them.") where it has been used
as a hash function for bloom filtering.
While we think that these users are actually not much of concern, it has
been requested to remove the arch_fast_hash() library bits that arose
from [1] entirely as per recent discussion [2]. The main argument is that
using it as a hash may introduce bias due to its linearity (see avalanche
criterion) and thus makes it less clear (though we tried to document that)
when this security/performance trade-off is actually acceptable for a
general purpose library function.
Lets therefore avoid any further confusion on this matter and remove it to
prevent any future accidental misuse of it. For the time being, this is
going to make hashing of flow keys a bit more expensive in the ovs case,
but future work could reevaluate a different hashing discipline.
[1] https://patchwork.ozlabs.org/patch/299369/
[2] https://patchwork.ozlabs.org/patch/418756/
Cc: Neil Brown <neilb@suse.de>
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bugs similar to the one in acbbe6fbb2 (kcmp: fix standard comparison
bug) are in rich supply.
In this variant, the problem is that struct xdr_netobj::len has type
unsigned int, so the expression o1->len - o2->len _also_ has type
unsigned int; it has completely well-defined semantics, and the result
is some non-negative integer, which is always representable in a long
long. But this means that if the conditional triggers, we are
guaranteed to return a positive value from compare_blob.
In this case it could be fixed by
- res = o1->len - o2->len;
+ res = (long long)o1->len - (long long)o2->len;
but I'd rather eliminate the usually broken 'return a - b;' idiom.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The global state_lock protects the file_hashtbl, and that has the
potential to be a scalability bottleneck.
Address this by making the file_hashtbl use RCU. Add a rcu_head to the
nfs4_file and use that when freeing ones that have been hashed. In order
to conserve space, we union the fi_rcu field with the fi_delegations
list_head which must be clear by the time the last reference to the file
is dropped.
Convert find_file_locked to use RCU lookup primitives and not to require
that the state_lock be held, and convert find_file to do a lockless
lookup. Convert find_or_add_file to attempt a lockless lookup first, and
then fall back to doing a locked search and insert if that fails to find
anything.
Also, minimize the number of times we need to calculate the hash value
by passing it in as an argument to the search and insert functions, and
optimize the order of arguments in nfsd4_init_file.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A client may not want to use the back channel on a transport it sent
CREATE_SESSION on, in which case it clears SESSION4_BACK_CHAN.
However, cl_cb_addr should be populated anyway, to be used if the
client binds other connections to this session. If cl_cb_addr is
not initialized, rpc_create() fails when the server attempts to
set up a back channel on such secondary transports.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUNZK4AAoJEAAOaEEZVoIVI08P/iM7eaIVRnqaqtWw/JBzxiba
EMDlJYUBSlv6lYk9s8RJT4bMmcmGAKSYzVAHSoPahzNcqTDdFLeDTLGxJ8uKBbjf
d1qRRdH1yZHGUzCvJq3mEendjfXn435Y3YburUxjLfmzrzW7EbMvndiQsS5dhAm9
PEZ+wrKF/zFL7LuXa1YznYrbqOD/GRsJAXGEWc3kNwfS9avephVG/RI3GtpI2PJj
RY1mf8P7+WOlrShYoEuUo5aqs01MnU70LbqGHzY8/QKH+Cb0SOkCHZPZyClpiA+G
MMJ+o2XWcif3BZYz+dobwz/FpNZ0Bar102xvm2E8fqByr/T20JFjzooTKsQ+PtCk
DetQptrU2gtyZDKtInJUQSDPrs4cvA13TW+OEB1tT8rKBnmyEbY3/TxBpBTB9E6j
eb/V3iuWnywR3iE+yyvx24Qe7Pov6deM31s46+Vj+GQDuWmAUJXemhfzPtZiYpMT
exMXTyDS3j+W+kKqHblfU5f+Bh1eYGpG2m43wJVMLXKV7NwDf8nVV+Wea962ga+w
BAM3ia4JRVgRWJBPsnre3lvGT5kKPyfTZsoG+kOfRxiorus2OABoK+SIZBZ+c65V
Xh8VH5p3qyCUBOynXlHJWFqYWe2wH0LfbPrwe9dQwTwON51WF082EMG5zxTG0Ymf
J2z9Shz68zu0ok8cuSlo
=Hhee
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:
- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.
There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"
* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...
Christoph suggests:
"Add a return value to lm_break so that the lock manager can tell the
core code "you can delete this lease right now". That gets rid of
the games with the timeout which require all kinds of race avoidance
code in the users."
Do that here and have the nfsd lease break routine use it when it detects
that there was a race between setting up the lease and it being broken.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There was only one place where we still could free a file_lock while
holding the i_lock -- lease_modify. Add a new list_head argument to the
lm_change operation, pass in a private list when calling it, and fix
those callers to dispose of the list once the lock has been dropped.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
...and move the fasync setup into it for fcntl lease calls. At the same
time, change the semantics of how the file_lock double-pointer is
handled. Up until now, on a successful lease return you got a pointer to
the lock on the list. This is bad, since that pointer can no longer be
relied on as valid once the inode->i_lock has been released.
Change the code to instead just zero out the pointer if the lease we
passed in ended up being used. Then the callers can just check to see
if it's NULL after the call and free it if it isn't.
The priv argument has the same semantics. The lm_setup function can
zero the pointer out to signal to the caller that it should not be
freed after the function returns.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
In later patches, we're going to add a new lock_manager_operation to
finish setting up the lease while still holding the i_lock. To do
this, we'll need to pass a little bit of info in the fcntl setlease
case (primarily an fasync structure). Plumb the extra pointer into
there in advance of that.
We declare this pointer as a void ** to make it clear that this is
private info, and that the caller isn't required to set this unless
the lm_setup specifically requires it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Now that we don't need to pass in an actual lease pointer to
vfs_setlease on unlock, we can stop tracking a pointer to the lease in
the nfs4_file.
Switch all of the places that check the fi_lease to check fi_deleg_file
instead. We always set that at the same time so it will have the same
semantics.
Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Ensure that it's OK to pass in a NULL file_lock double pointer on
a F_UNLCK request and convert the vfs_setlease F_UNLCK callers to
do just that.
Finally, turn the BUG_ON in generic_setlease into a WARN_ON_ONCE
with an error return. That's a problem we can handle without
crashing the box if it occurs.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
It's unlikely to ever occur, but if there were already a lease set on
the file then we could end up getting back a different pointer on a
successful setlease attempt than the one we allocated. If that happens,
the one we allocated could leak.
In practice, I don't think this will happen due to the fact that we only
try to set up the lease once per nfs4_file, but this error handling is a
bit more correct given the current lease API.
Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We now have cb_to_delegation and to_delegation, which do the same thing
and are defined separately in different .c files. Move the
cb_to_delegation definition into a header file and eliminate the
redundant to_delegation definition.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Add a higher level abstraction than the rpc_ops for callback operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Split out initializing the nfs4_callback structure from using it. For
the NULL callback this gets rid of tons of pointless re-initializations.
Note that I don't quite understand what protects us from running multiple
NULL callbacks at the same time, but at least this chance doesn't make
it worse..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a helper to queue up a callback. CB_NULL has a bit of special casing
because it is special in the specification, but all other new callback
operations will be able to share code with this and a few more changes
to refactor the callback code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The grace period is ended in two steps--first userland is notified that
the grace period is now long enough that any clients who have not yet
reclaimed can be safely forgotten, then we flip the switch that forbids
reclaims and allows new opens. I had to think a bit to convince myself
that the ordering was right here. Document it.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The attempt to automatically set a new grace period time at the end of
the grace period isn't really helpful. We'll probably shut down and
reboot before we actually make use of the new grace period time anyway.
So may as well leave it up to the init system to get this right.
This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime
change from what they set it to.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In order to support lifting the grace period early, we must tell
nfsdcltrack what sort of client the "create" upcall is for. We can't
reliably tell if a v4.0 client has completed reclaiming, so we can only
lift the grace period once all the v4.1+ clients have issued a
RECLAIM_COMPLETE and if there are no v4.0 clients.
Also, in order to lift the grace period, we have to tell userland when
the grace period started so that it can tell whether a RECLAIM_COMPLETE
has been issued for each client since then.
Since this is all optional info, we pass it along in environment
variables to the "init" and "create" upcalls. By doing this, we don't
need to revise the upcall format. The UMH upcall can simply make use of
this info if it happens to be present. If it's not then it can just
avoid lifting the grace period early.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Allow a privileged userland process to end the v4 grace period early.
Writing "Y", "y", or "1" to the file will cause the v4 grace period to
be lifted. The basic idea with this will be to allow the userland
client tracking program to lift the grace period once it knows that no
more clients will be reclaiming state.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
As stated in RFC 5661, section 18.51.3:
Once a RECLAIM_COMPLETE is done, there can be no further reclaim
operations for locks whose scope is defined as having completed
recovery. Once the client sends RECLAIM_COMPLETE, the server will
not allow the client to do subsequent reclaims of locking state for
that scope and, if these are attempted, will return
NFS4ERR_NO_GRACE.
Ensure that we enforce that requirement.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
v5: using nfs4_get_stateowner() instead of an inline function
v3: Update based on Jeff's comments
v2: Fix bad using of struct file_lock_operations for handle the owner
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
v5: same as the first version
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
locks_alloc_lock() has initialized struct file_lock, no need to
re-initialize it here.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, we hold the state_lock when releasing the lease. That's
potentially problematic in the future if we allow for setlease methods
that can sleep. Move the nfs4_put_deleg_lease call out of the delegation
unhashing routine (which was always a bit goofy anyway), and into the
unlocked sections of the callers of unhash_delegation_locked.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently these fields are protected with the state_lock, but that
doesn't really make a lot of sense. These fields are "private" to the
nfs4_file, and can be protected with the more granular fi_lock.
The fi_lock is already held when setting these fields. Make the code
hold the fp->fi_lock when clearing the lease-related fields in the
nfs4_file, and no longer require that the state_lock be held when
calling into this function.
To prevent lock inversion with the i_lock, we also move the vfs_setlease
and fput calls outside of the fi_lock. This also sets us up for allowing
vfs_setlease calls to block in the future.
Finally, remove a redundant NULL pointer check. unhash_delegation_locked
locks the fp->fi_lock prior to that check, so fp in that function must
never be NULL.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Also destroy_clientid and bind_conn_to_session.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Remove the old nfsd_for_n_state function and move nfsd_find_client
higher up into the file to get rid of forward declaration. Remove
the struct nfsd_fault_inject_op arguments from the operations as
they are no longer needed by any of them.
Finally, remove the old "standard" get and set routines, which
also eliminates the client_mutex from this code.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...instead of relying on the client_mutex.
Also, fix up the printk output that is generated when the file is read.
It currently says that it's reporting the number of open files, but
it's actually reporting the number of openowners.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In a later patch, we'll want to collect the locks onto a list for later
destruction. If "func" is defined and "collect" is defined, then we'll
add the lock stateid to the list.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...which uses the client_lock for protection instead of client_mutex.
Also remove nfsd_forget_client as there are no more callers.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...that relies on the client_lock instead of client_mutex.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a new "get" routine for forget_clients that relies on the
client_lock instead of the client_mutex.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The clid counter is a global counter currently. Move it to be a per-net
property so that it can be properly protected by the nn->client_lock
instead of relying on the client_mutex.
The verifier generator is also potentially racy if there are two
simultaneous callers. Generate the verifier when we generate the clid
value, so it's also created under the client_lock. With this, there's
no need to keep two counters as they'd always be in sync anyway, so
just use the clientid_counter for both.
As Trond points out, what would be best is to eventually move this
code to use IDR instead of the hash tables. That would also help ensure
uniqueness, but that's probably best done as a separate project.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's possible that we'll have an in-progress call on some of the clients
while a rogue EXCHANGE_ID or DESTROY_CLIENTID call comes in. Be sure to
try and mark the client expired first, so that the refcount is
respected.
This will only be a problem once the client_mutex is removed.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If it fails, it means that the client is in use and so destroying it
would be bad. Currently, the client_mutex prevents this from happening
but once we remove it, we won't be able to do this.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
All the callers except for the fault injection code call it directly
afterward, and in the fault injection case it won't hurt to do so
anyway.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, it's protected by the client_mutex. Move it so that the list
and the fields in the openowner are protected by the client_lock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that the client lookup is done safely under the client_lock, so
we're not relying on the client_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...instead of relying on the client_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In particular, we want to ensure that the move_to_confirmed() is
protected by the nn->client_lock spin lock, so that we can use that when
looking up the clientid etc. instead of relying on the client_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...instead of relying on the client_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For efficiency reasons, and because we want to use spin locks instead
of relying on the client_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The struct nfs_client is supposed to be invisible and unreferenced
before it gets here.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If we leave the client on the confirmed/unconfirmed tables, and leave
the sessions visible on the sessionid_hashtbl, then someone might
find them before we've had a chance to destroy them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When we remove the client_mutex protection, we will need to ensure
that it can't be found by other threads while we're destroying it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...to better match other functions that deal with open/lock stateids.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When we remove the client_mutex, we'll have a potential race between
FREE_STATEID and CLOSE.
The root of the problem is that we are walking the st_locks list,
dropping the spinlock and then trying to release the persistent
reference to the lockstateid. In between, a FREE_STATEID call can come
along and take the lock, find the stateid and then try to put the
reference. That leads to a double put.
Fix this by not releasing the cl_lock in order to release each lock
stateid. Use put_generic_stateid_locked to unhash them and gather them
onto a list, and free_ol_stateid_reaplist to free any that end up on the
list.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Releasing an openowner is a bit inefficient as it can potentially thrash
the cl_lock if you have a lot of stateids attached to it. Once we remove
the client_mutex, it'll also potentially be dangerous to do this.
Add some functions to make it easier to defer the part of putting a
generic stateid reference that needs to be done outside the cl_lock while
doing the parts that must be done while holding it under a single lock.
First we unhash each open stateid. Then we call
put_generic_stateid_locked which will put the reference to an
nfs4_ol_stateid. If it turns out to be the last reference, it'll go
ahead and remove the stid from the IDR tree and put it onto the reaplist
using the st_locks list_head.
Then, after dropping the lock we'll call free_ol_stateid_reaplist to
walk the list of stateids that are fully unhashed and ready to be freed,
and free each of them. This function can sleep, so it must be done
outside any spinlocks.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove the client_mutex, it'll be possible for the sc_type of a
lock stateid to change after it's found and checked, but before we can
go to destroy it. If that happens, we can end up putting the persistent
reference to the stateid more than once, and unhash it more than once.
Fix this by unhashing the lock stateid prior to dropping the cl_lock but
after finding it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reduce the cl_lock trashing in destroy_lockowner. Unhash all of the
lockstateids on the lockowner's list. Put the reference under the lock
and see if it was the last one. If so, then add it to a private list
to be destroyed after we drop the lock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove the client_mutex, we'll need to properly protect
the stateowner reference counts using the cl_lock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Do more within the main loop, and simplify the function a bit. Also,
there's no need to take a stateowner reference unless we're going to call
release_lockowner.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Preparation for removing the client_mutex.
Convert the open owner hash table into a per-client table and protect it
using the nfs4_client->cl_lock spin lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove client mutex protection, we'll need to ensure that
stateowner lookup and creation are atomic between concurrent compounds.
Ensure that alloc_init_lock_stateowner checks the hashtable under the
client_lock before adding a new element.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove client mutex protection, we'll need to ensure that
stateowner lookup and creation are atomic between concurrent compounds.
Ensure that alloc_init_open_stateowner checks the hashtable under the
client_lock before adding a new element.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove client_mutex protection, it'll be possible to have an
in-flight operation using an openstateid when a CLOSE call comes in.
If that happens, we can't just put the sc_file reference and clear its
pointer without risking an oops.
Fix this by ensuring that v4.0 CLOSE operations wait for the refcount
to drop before proceeding to do so.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Change it so that only openstateids hold persistent references to
openowners. References can still be held by compounds in progress.
With this, we can get rid of NFS4_OO_NEW. It's possible that we
will create a new openowner in the process of doing the open, but
something later fails. In the meantime, another task could find
that openowner and start using it on a successful open. If that
occurs we don't necessarily want to tear it down, just put the
reference that the failing compound holds.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that lockowner references are only held by lockstateids and
operations that are in-progress. With this, we can get rid of
release_lockowner_if_empty, which will be racy once we remove
client_mutex protection.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A necessary step toward client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow stateowners to be unhashed and destroyed when the last reference
is put. The unhashing must be idempotent. In a future patch, we'll add
some locking around it, but for now it's only protected by the
client_mutex.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that when finding or creating a lockowner, that we get a
reference to it. For now, we also take an extra reference when a
lockowner is created that can be put when release_lockowner is called,
but we'll remove that in a later patch once we change how references are
held.
Since we no longer destroy lockowners in the event of an error in
nfsd4_lock, we must change how the seqid gets bumped in the lk_is_new
case. Instead of doing so on creation, do it manually in nfsd4_lock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't want to rely on the client_mutex for protection in the case of
NFSv4 open owners. Instead, we add a mutex that will only be taken for
NFSv4.0 state mutating operations, and that will be released once the
entire compound is done.
Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay
take a reference to the stateowner when they are using it for NFSv4.0
open and lock replay caching.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The way stateowners are managed today is somewhat awkward. They need to
be explicitly destroyed, even though the stateids reference them. This
will be particularly problematic when we remove the client_mutex.
We may create a new stateowner and attempt to open a file or set a lock,
and have that fail. In the meantime, another RPC may come in that uses
that same stateowner and succeed. We can't have the first task tearing
down the stateowner in that situation.
To fix this, we need to change how stateowners are tracked altogether.
Refcount them and only destroy them once all stateids that reference
them have been destroyed. This patch starts by adding the refcounting
necessary to do that.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow nfs4_find_stateid_by_type to take the stateid reference, while
still holding the &cl->cl_lock. Necessary step toward client_mutex
removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow nfs4_lookup_stateid to take the stateid reference, instead
of having all the callers do so.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow nfs4_preprocess_seqid_op to take the stateid reference, instead
of having all the callers do so.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that all the callers put the open stateid after use.
Necessary step toward client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfsd4_open_confirm() keeps a reference to the open
stateid until it is done working with it.
Necessary step toward client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Prepare nfsd4_close for a future where nfs4_preprocess_seqid_op()
hands it a fully referenced open stateid. Necessary step toward
client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfsd4_process_open2() keeps a reference to the open
stateid until it is done working with it. Necessary step toward
client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfsd4_process_open2() keeps a reference to the delegation
stateid until it is done working with it. Necessary step toward
client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfs4_open_delegation() keeps a reference to the delegation
stateid until it is done working with it. Necessary step toward
client_mutex removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfsd4_locku() keeps a reference to the lock stateid
until it is done working with it. Necessary step toward client_mutex
removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that nfsd4_lock() references the lock stateid while it is
manipulating it. Not currently necessary, but will be once the
client_mutex is removed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Hold the cl_lock over the bulk of these functions. In addition to
ensuring that they aren't freed prematurely, this will also help prevent
a potential race that could be introduced later. Once we remove the
client_mutex, it'll be possible for FREE_STATEID and CLOSE to race and
for both to try to put the "persistent" reference to the stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Preparation for removal of the client_mutex.
Currently, no lock aside from the client_mutex is held when calling
find_lock_state. Ensure that the cl_lock is held by adding a lockdep
assertion.
Once we remove the client_mutex, it'll be possible for another thread to
race in and insert a lock state for the same file after we search but
before we insert a new one. Ensure that doesn't happen by redoing the
search after allocating a new stid that we plan to insert. If one is
found just put the one that was allocated.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Change to using the clp->cl_lock for this. For now, there's a lot of
cl_lock thrashing, but in later patches we'll eliminate that and close
the potential races that can occur when releasing the cl_lock while
walking the lists. For now, the client_mutex prevents those races.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Releasing locks when we unhash the stateid instead of doing so only when
the stateid is actually released will be problematic in later patches
when we need to protect the unhashing with spinlocks. Move it into the
sc_free operation instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, this is serialized by the client_mutex, which is slated for
removal. Add finer-grained locking here. Also, do some cleanup around
find_stateid to prepare for taking references.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
All stateids are associated with a nfs4_file. Let's consolidate.
Replace delegation->dl_file with the dl_stid.sc_file, and
nfs4_ol_stateid->st_file with st_stid.sc_file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When we remove the client_mutex, we'll need to be able to ensure that
these objects aren't destroyed while we're not holding locks.
Add a ->free() callback to the struct nfs4_stid, so that we can
release a reference to the stid without caring about the contents.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Now that the nfs4_file has a filehandle in it, we no longer need to
keep a per-delegation copy of it. Switch to using the one in the
nfs4_file instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The state lock can be fairly heavily contended, and there's no reason
that nfs4_file lookups and delegation_blocked should be mutually
exclusive. Let's give the new block_delegation code its own spinlock.
It does mean that we'll need to take a different lock in the delegation
break code, but that's not generally as critical to performance.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the alloc_init_deleg call into nfs4_set_delegation and change the
function to return a pointer to the delegation or an IS_ERR return. This
allows us to skip allocating a delegation if the file has already
experienced a lease conflict.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
No need to pass in a net pointer since we can derive that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We want to convert to an atomic type so that we don't need to lock
across the call to alloc_init_deleg(). Then convert to a long type so
that we match the size of 'max_delegations'.
None of this is a problem today, but it will be once we remove
client_mutex protection.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, both destroy_revoked_delegation and revoke_delegation
manipulate the cl_revoked list without any locking aside from the
client_mutex. Ensure that the clp->cl_lock is held when manipulating it,
except for the list walking in destroy_client. At that point, the client
should no longer be in use, and so it should be safe to walk the list
without any locking. That also means that we don't need to do the
list_splice_init there either.
Also, the fact that revoke_delegation deletes dl_recall_lru list_head
without any locking makes it difficult to know whether it's doing so
safely in all cases. Move the list_del_init calls into the callers, and
add a WARN_ON in the event that t's passed a delegation that has a
non-empty list_head.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Ensure that the delegations cannot be found by the laundromat etc once
we add them to the various 'revoke' lists.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Don't allow stateids to clear the open file pointer until they are
being destroyed. In a later patches we'll want to rely on the fact that
we have a valid file pointer when dealing with the stateid and this
will save us from having to do a lot of NULL pointer checks before
doing so.
Also, move to allocating stateids with kzalloc and get rid of the
explicit zeroing of fields.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Remove the fi_inode field in struct nfs4_file in order to remove the
possibility of struct nfs4_file pinning the inode when it does not have
any open state.
The only place we still need to get to an inode is in check_for_locks,
so change it to use find_any_file and use the inode from any that it
finds. If it doesn't find one, then just assume there aren't any.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...instead of just checking the inode that corresponds to it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This makes more sense anyway since an inode pointer value can change
even when the filehandle doesn't.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For use when we may not have a struct inode.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Open stateids must be initialized with the st_access_bmap and
st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record
their state in old_access_bmap and old_deny_bmap.
This bug was introduced in commit baeb4ff0e5 (nfsd: make deny mode
enforcement more efficient and close races in it) and was causing the
refcounts to end up incorrect when nfs4_get_vfs_file returned an error
after bumping the refcounts. This made it impossible to unmount the
underlying filesystem after running pynfs tests that involve deny modes.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There's a potential race between a lease break and DELEGRETURN call.
Suppose a lease break comes in and queues the workqueue job for a
delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in
finds the delegation and calls destroy_delegation on it to unhash it and
put its primary reference.
Next, the workqueue job runs and queues the delegation back onto the
del_recall_lru list, issues the CB_RECALL and puts the final reference.
With that, the final reference to the delegation is put, but it's still
on the LRU list.
When we go to unhash a delegation, it's because we intend to get rid of
it soon afterward, so we don't want lease breaks to mess with it once
that occurs. Fix this by bumping the dl_time whenever we unhash a
delegation, to ensure that lease breaks don't monkey with it.
I believe this is a regression due to commit 02e1215f9f (nfsd: Avoid
taking state_lock while holding inode lock in nfsd_break_one_deleg).
Prior to that, the state_lock was held in the lm_break callback itself,
and that would have prevented this race.
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We will want to add reference counting to the lock stateid and open
stateids too in later patches.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If nfs4_setlease succesfully acquires a new delegation, then another
task breaks the delegation before we reach hash_delegation_locked, then
the breaking task will see an empty fi_delegations list and do nothing.
The client will receive an open reply incorrectly granting a delegation
and will never receive a recall.
Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.
Attempt to acquire a delegation. If that succeeds, take the spinlocks
and then check to see if the file has had a conflict show up since then.
If it has, then we assume that the lease is no longer valid and that
we shouldn't hand out a delegation.
There's also one more potential (but very unlikely) problem. If the
lease is broken before the delegation is hashed, then it could leak.
In the event that the fi_delegations list is empty, reset the
fl_break_time to jiffies so that it's cleaned up ASAP by
the normal lease handling code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4_probe_callback kicks off some work that will eventually run
nfsd4_process_cb_update and update the session flags. In theory we
could process a following SEQUENCE call before that update happens
resulting in flags that don't accurately represent, for example, the
lack of a backchannel.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add an extra delegation state to allow the stateid to remain in the idr
tree until the last reference has been released. This will be necessary
to ensure uniqueness once the client_mutex is removed.
[jlayton: reset the sc_type under the state_lock in unhash_delegation]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
No need to pass the delegation pointer in here as it's only used to get
the nfs4_file pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
state_lock is a heavily contended global lock. We don't want to grab
that while simultaneously holding the inode->i_lock.
Add a new per-nfs4_file lock that we can use to protect the
per-nfs4_file delegation list. Hold that while walking the list in the
break_deleg callback and queue the workqueue job for each one.
The workqueue job can then take the state_lock and do the list
manipulations without the i_lock being held prior to starting the
rpc call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's just an obfuscated INIT_WORK call. Just make the work_func_t a
non-static symbol and use a normal INIT_WORK call.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rename it to better describe what it does, and have it just return the
stateid instead of a __be32 (which is now always nfs_ok). Also, do the
search for an existing stateid after the delegation check, to reduce
cleanup if the delegation check returns error.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current enforcement of deny modes is both inefficient and scattered
across several places, which makes it hard to guarantee atomicity. The
inefficiency is a problem now, and the lack of atomicity will mean races
once the client_mutex is removed.
First, we address the inefficiency. We have to track deny modes on a
per-stateid basis to ensure that open downgrades are sane, but when the
server goes to enforce them it has to walk the entire list of stateids
and check against each one.
Instead of doing that, maintain a per-nfs4_file deny mode. When a file
is opened, we simply set any deny bits in that mode that were specified
in the OPEN call. We can then use that unified deny mode to do a simple
check to see whether there are any conflicts without needing to walk the
entire stateid list.
The only time we'll need to walk the entire list of stateids is when a
stateid that has a deny mode on it is being released, or one is having
its deny mode downgraded. In that case, we must walk the entire list and
recalculate the fi_share_deny field. Since deny modes are pretty rare
today, this should be very rare under normal workloads.
To address the potential for races once the client_mutex is removed,
protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to
make sure that any deny mode we want to apply won't conflict with
existing access. If that's ok, then have nfs4_file_get_access check that
new access to the file won't conflict with existing deny modes.
If that also passes, then get file access references, set the correct
access and deny bits in the stateid, and update the fi_share_deny field.
If opening the file or truncating it fails, then unwind the whole mess
and return the appropriate error.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we remove the client_mutex, there's an unlikely but possible race
that could occur. It will be possible for nfs4_file_put_access to race
with nfs4_file_get_access. The refcount will go to zero (briefly) and
then bumped back to one. If that happens we set ourselves up for a
use-after-free and the potential for a lock to race onto the i_flock
list as a filp is being torn down.
Ensure that we can safely bump the refcount on the file by holding the
fi_lock whenever that's done. The only place it currently isn't is in
get_lock_access.
In order to ensure atomicity with finding the file, use the
find_*_file_locked variants and then call get_lock_access to get new
access references on the nfs4_file under the same lock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fix the "deny" argument type, and start the loop at 1. The 0 iteration
is always a noop.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cleanup -- ensure that the stateid bits are set at the same time that
the file access refcounts are incremented. Keeping them coherent like
this makes it easier to ensure that we account for all of the
references.
Since the initialization of the st_*_bmap fields is done when it's
hashed, we go ahead and hash the stateid before getting access to the
file and unhash it if that function returns error. This will be
necessary anyway in a follow-on patch that will overhaul deny mode
handling.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We never use anything above bit #3, so an unsigned long for each is
wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if
we try to set or clear bits that would go outside those sizes.
Note too that because atomic bitops work on unsigned longs, we have to
abandon their use here. That shouldn't be a problem though since we
don't really care about the atomicity in this code anyway. Using them
was just a convenient way to flip bits.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...and replace it with a simple swap call.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Have them take NFS4_SHARE_ACCESS_* flags instead of an open mode. This
spares the callers from having to convert it themselves.
This also allows us to simplify these functions as we no longer need
to do the access_to_omode conversion in either one.
Note too that this patch eliminates the WARN_ON in
__nfs4_file_get_access. It's valid for now, but in a later patch we'll
be bumping the refcounts prior to opening the file in order to close
some races, at which point we'll need to remove it anyway.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Use filp_close instead of open coding. filp_close does a bit more than
just release the locks and put the filp. It also calls ->flush and
dnotify_flush, both of which should be done here anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Preparation for removal of the client_mutex, which currently protects
this array. While we don't actually need the find_*_file_locked variants
just yet, a later patch will. So go ahead and add them now to reduce
future churn in this code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Access to this list is currently serialized by the client_mutex. Add
finer grained locking around this list in preparation for its removal.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
No need to take the lock unless the count goes to 0.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bruce says:
There's also a preexisting expire_client/laundromat vs break race:
- expire_client/laundromat adds a delegation to its local
reaplist using the same dl_recall_lru field that a delegation
uses to track its position on the recall lru and drops the
state lock.
- a concurrent break_lease adds the delegation to the lru.
- expire/client/laundromat then walks it reaplist and sees the
lru head as just another delegation on the list....
Fix this race by checking the dl_time under the state_lock. If we find
that it's not 0, then we know that it has already been queued to the LRU
list and that we shouldn't queue it again.
In the case of destroy_client, we must also ensure that we don't hit
similar races by ensuring that we don't move any delegations to the
reaplist with a dl_time of 0. Just bump the dl_time by one before we
drop the state_lock. We're destroying the delegations anyway, so a 1s
difference there won't matter.
The fault injection code also requires a bit of surgery here:
First, in the case of nfsd_forget_client_delegations, we must prevent
the same sort of race vs. the delegation break callback. For that, we
just increment the dl_time to ensure that a delegation callback can't
race in while we're working on it.
We can't do that for nfsd_recall_client_delegations, as we need to have
it actually queue the delegation, and that won't happen if we increment
the dl_time. The state lock is held over that function, so we don't need
to worry about these sorts of races there.
There is one other potential bug nfsd_recall_client_delegations though.
Entries on the victims list are not dequeued before calling
nfsd_break_one_deleg. That's a potential list corruptor, so ensure that
we do that there.
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In later patches, we'll be moving the stateowner table into the
nfs4_client, and by doing this we ensure that we have a cached
nfs4_client pointer.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...and have alloc_init_open_stateowner just use the cstate->clp pointer
instead of passing in a clp separately. This allows us to use the
cached nfs4_client pointer in the cstate instead of having to look it
up again.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We want to use the nfsd4_compound_state to cache the nfs4_client in
order to optimise away extra lookups of the clid.
In the v4.0 case, we use this to ensure that we only have to look up the
client at most once per compound for each call into lookup_clientid. For
v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
should never need to do a search for it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The cstate already holds information about the session, and hence
the client id, so it makes more sense to pass that information
rather than the current practice of passing a 'minor version' number.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the client were to disappear from underneath us while we're holding
a session reference, things would be bad. This cleanup helps ensure
that it cannot, which will be a possibility when the client_mutex is
removed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Now that we know that we won't have several lockowners with the same,
owner->data, we can simplify nfsd4_release_lockowner and get rid of
the lo_list in the process.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Just like open-owners, lock-owners are associated with a name, a clientid
and, in the case of minor version 0, a sequence id. There is no association
to a file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A lockowner can have more than one lock stateid. For instance, if a
process has more than one file open and has locks on both, then the same
lockowner has more than one stateid associated with it. Change it so
that this reality is better reflected by the objects that nfsd uses.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not.
This patch replaces the current list of lock owners in the open stateids
with a list of lock stateids.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Minor cleanup that should introduce no behavioral changes.
Currently this function just unhashes the stateid and leaves the caller
to do the work of the CLOSE processing.
Change nfsd4_close_open_stateid so that it handles doing all of the work
of closing a stateid. Move the handling of the unhashed stateid into it
instead of doing that work in nfsd4_close. This will help isolate some
coming changes to stateid handling from nfsd4_close.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There's no need to confirm an openowner in v4.1 and above, so we can
go ahead and set NFS4_OO_CONFIRMED when we create openowners in
those versions. This will also be necessary when we remove the
client_mutex, as it'll be possible for two concurrent opens to race
in versions >4.0.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c
instead of open coding in nfs4svc_encode_compoundres.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Not technically a bugfix, since nothing tries to use the return pointer
if this function doesn't return success, but it could be a problem
with some coming changes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Current code depends on the client_mutex to guarantee a single struct
nfs4_file per inode in the file_hashtbl and make addition atomic with
respect to lookup. Rely instead on the state_Lock, to make it easier to
stop taking the client_mutex here later.
To prevent an i_lock/state_lock inversion, change nfsd4_init_file to
use ihold instead if igrab. That's also more efficient anyway as we
definitely hold a reference to the inode at that point.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4_process_open2 will currently will get access to the file, and then
call nfsd4_truncate to (possibly) truncate it. If that operation fails
though, then the access references will never be released as the
nfs4_ol_stateid is never initialized.
Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring
that the refcounts are properly put if the truncate fails.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
sparse complains that we're stuffing non-byte-swapped values into
__be32's here. Since they're supposed to be opaque, it doesn't matter
much. Just add __force to make sparse happy.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If nfsd needs to recall a delegation for some reason it implies that there is
contention on the file, so further delegations should not be handed out.
The current code fails to do so, and the result is effectively a
live-lock under some workloads: a client attempting a conflicting
operation on a read-delegated file receives NFS4ERR_DELAY and retries
the operation, but by the time it retries the server may already have
given out another delegation.
We could simply avoid delegations for (say) 30 seconds after any recall, but
this is probably too heavy handed.
We could keep a list of inodes (or inode numbers or filehandles) for recalled
delegations, but that requires memory allocation and searching.
The approach taken here is to use a bloom filter to record the filehandles
which are currently blocked from delegation, and to accept the cost of a few
false positives.
We have 2 bloom filters, each of which is valid for 30 seconds. When a
delegation is recalled the filehandle is added to one filter and will remain
disabled for between 30 and 60 seconds.
We keep a count of the number of filehandles that have been added, so when
that count is zero we can bypass all other tests.
The bloom filters have 256 bits and 3 hash functions. This should allow a
couple of dozen blocked filehandles with minimal false positives. If many
more filehandles are all blocked at once, behaviour will degrade towards
rejecting all delegations for between 30 and 60 seconds, then resetting and
allowing new delegations.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
27b11428b7 ("nfsd4: remove lockowner when removing lock stateid")
introduced a memory leak.
Cc: stable@vger.kernel.org
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't want the stateid to be found in the hash table before the delegation
is granted.
Currently this is protected by the client_mutex, but we want to break that
up and this is a necessary step toward that goal.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
...as the name is a bit more descriptive and we've started using it for
other purposes.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This fixes a bug in the handling of the fi_delegations list.
nfs4_setlease does not hold the recall_lock when adding to it. The
client_mutex is held, which prevents against concurrent list changes,
but nfsd_break_deleg_cb does not hold while walking it. New delegations
could theoretically creep onto the list while we're walking it there.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The laundromat uses two variables to calculate when it should next run,
but one is completely ignored at the end of the run. Merge the two and
rename the variable to be more descriptive of what it does.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We can simplify session limit enforcement by restricting the xdr buflen
to the session size.
Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.
The one exception is very small zero-copy reads. That's an odd corner
case as clients wouldn't normally ask those to be cached.
in any case, this seems a little more robust.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is a mechanical transformation with no change in behavior.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
No need for a kmem_cache_destroy wrapper in nfsd, just do proper
goto based unwinding.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We're not cleaning up everything we need to on error. In particular,
we're not removing our lease. Among other problems this can cause the
struct nfs4_file used as fl_owner to be referenced after it has been
destroyed.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.
We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.
Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the state locking and file descriptor reference out from the
callers and into nfs4_preprocess_stateid_op() itself.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
They do not need to be used outside fs/nfsd/nfs4state.c
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It is large, it is used in more than one place, and it is not performance
critical. Let gcc figure out whether it should be inlined...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Mainly to ensure that we don't leave any hanging timers.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RFC5661 obsoletes NFS4ERR_STALE_STATEID in favour of NFS4ERR_BAD_STATEID.
Note that because nfsd encodes the clientid boot time in the stateid, we
can hit this error case in certain scenarios where the Linux client
state management thread exits early, before it has finished recovering
all state.
Reported-by: Idan Kedar <idank@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Connection from alloc_conn must be freed through free_conn,
otherwise, the reference of svc_xprt will never be put.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
CLAIM_FH was added by NFSv4.1. It is the same as CLAIM_NULL except that it
uses only current FH to identify the file to be opened.
The NFS client is using CLAIM_FH if the FH is available when opening a file.
Currently, we cannot get any delegation if we stat a file before open it
because the server delegation code does not recognize CLAIM_FH.
We tested this patch and found delegation can be handed out now when claim is
CLAIM_FH.
See http://marc.info/?l=linux-nfs&m=136369847801388&w=2 and
http://www.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#New_open_claim_types
Signed-off-by: Ming Chen <mchen@cs.stonybrook.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
As far as I can tell, this list is used only under the state lock, so we
may as well do this in the simpler order.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If failed after calling alloc_session but before init_session, nfsd will call __free_session to
free se_slots in session. But, session->se_fchannel.maxreqs is not initialized (value is zero).
So that, the memory malloced for slots will be lost in free_session_slots for maxreqs is zero.
This path sets the information for channel in alloc_session after mallocing slots succeed,
instead in init_session.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
the length for backchannel checking should be multiplied by sizeof(__be32).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
check_forechannel_attrs gets drc memory, so nfsd must put it when
check_backchannel_attrs fails.
After many requests with bad back channel attrs, nfsd will deny any
client's CREATE_SESSION forever.
A new test case named CSESS29 for pynfs will send in another mail.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
commit 5b6feee960 forgot
recording the back channel attrs in nfsd4_session.
nfsd just check the back channel attars by check_backchannel_attrs,
but do not record it in nfsd4_session in the latest kernel.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd changes from Bruce Fields:
"This includes miscellaneous bugfixes and cleanup and a performance fix
for write-heavy NFSv4 workloads.
(The most significant nfsd-relevant change this time is actually in
the delegation patches that went through Viro, fixing a long-standing
bug that can cause NFSv4 clients to miss updates made by non-nfs users
of the filesystem. Those enable some followup nfsd patches which I
have queued locally, but those can wait till 3.14)"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: export proper maximum file size to the client
nfsd4: improve write performance with better sendspace reservations
svcrpc: remove an unnecessary assignment
sunrpc: comment typo fix
Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"
nfsd4: fix discarded security labels on setattr
NFSD: Add support for NFS v4.2 operation checking
nfsd4: nfsd_shutdown_net needs state lock
NFSD: Combine decode operations for v4 and v4.1
nfsd: -EINVAL on invalid anonuid/gid instead of silent failure
nfsd: return better errors to exportfs
nfsd: fh_update should error out in unexpected cases
nfsd4: need to destroy revoked delegations in destroy_client
nfsd: no need to unhash_stid before free
nfsd: remove_stid can be incorporated into nfs4_put_delegation
nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid
nfsd: nfs4_free_stid
nfsd: fix Kconfig syntax
sunrpc: trim off EC bytes in GSSAPI v2 unwrap
gss_krb5: document that we ignore sequence number
...
For now FL_DELEG is just a synonym for FL_LEASE. So this patch doesn't
change behavior.
Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit 7ebe40f203. We forgot
the nfs4_put_delegation call in fs/nfsd/nfs4callback.c which should not
be unhashing the stateid. This lead to warnings from the idr code when
we tried to removed id's twice.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A comment claims the caller should take it, but that's not being done.
Note we don't want it around the cancel_delayed_work_sync since that may
wait on work which holds the client lock.
Reported-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[use list_splice_init]
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
[bfields: no need for recall_lock here]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
idr_remove is about to be called before kmem_cache_free so unhashing it
is redundant
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
All calls to nfs4_put_delegation are preceded with remove_stid.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the out_free: path, the newly allocated stid must be removed rather
than unhashed so it can never be found.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This fixes a regression from 68a3396178
"nfsd4: shut down more of delegation earlier".
After that commit, nfs4_set_delegation() failures result in
nfs4_put_delegation being called, but nfs4_put_delegation doesn't free
the nfs4_file that has already been set by alloc_init_deleg().
This can result in an oops on later unmounting the exported filesystem.
Note also delaying the fi_had_conflict check we're able to return a
better error (hence give 4.1 clients a better idea why the delegation
failed; though note CONFLICT isn't an exact match here, as that's
supposed to indicate a current conflict, but all we know here is that
there was one recently).
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit df66e75395.
nfsd4_lock can get a read-only or write-only reference when only a
read-write open is available. This is normal.
Cc: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This actually makes a difference in the 4.1 case, since we use the
status to decide what reason to give the client for the delegation
refusal (see nfsd4_open_deleg_none_ext), and in theory a client might
choose suboptimal behavior if we give the wrong answer.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY
the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR]))
doesn't trigger when it should.
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd changes from Bruce Fields:
"Changes this time include:
- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...
RFC 5661 allows a client to destroy a session using a compound
associated with the destroyed session, as long as the DESTROY_SESSION op
is the last op of the compound.
We attempt to allow this, but testing against a Solaris client (which
does destroy sessions in this way) showed that we were failing the
DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
count on the session (held by us) represented another rpc in progress
over this session.
Fix this by noting that in this case the expected reference count is 1,
not 0.
Also, note as long as the session holds a reference to the compound
we're destroying, we can't free it here--instead, delay the free till
the final put in nfs4svc_encode_compoundres.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This case shouldn't happen--the administrator shouldn't really allow
other applications access to the export until clients have had the
chance to reclaim their state--but if it does then we should set the
"return this lease immediately" bit on the reply. That still leaves
some small races, but it's the best the protocol allows us to do in the
case a lease is ripped out from under us....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit eb2099f31b "nfsd4:
release lockowners on last unlock in 4.1 case". Trond identified
language in rfc 5661 section 8.2.4 which forbids this behavior:
Stateids associated with byte-range locks are an exception.
They remain valid even if a LOCKU frees all remaining locks, so
long as the open file with which they are associated remains
open, unless the client frees the stateids via the FREE_STATEID
operation.
And bakeathon 2013 testing found a 4.1 freebsd client was getting an
incorrect BAD_STATEID return from a FREE_STATEID in the above situation
and then failing.
The spec language honestly was probably a mistake but at this point with
implementations already following it we're probably stuck with that.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfs4_open_delegation logic is unecessarily baroque.
Also stop pretending we support write delegations in several places.
Some day we will support write delegations, but when that happens adding
back in these flag parameters will be the easy part. For now they're
just confusing.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When an exclusive create is done with the mode bits
set (aka open(testfile, O_CREAT | O_EXCL, 0777)) this
causes a OPEN op followed by a SETATTR op. When a
read delegation is given in the OPEN, it causes
the SETATTR to delay with EAGAIN until the
delegation is recalled.
This patch caused exclusive creates to give out
a write delegation (which turn into no delegation)
which allows the SETATTR seamlessly succeed.
Signed-off-by: Steve Dickson <steved@redhat.com>
[bfields: do this for any CREATE, not just exclusive; comment]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't support gss on the backchannel. We should state that fact up
front rather than just letting things continue and later making the
client try to figure out why the backchannel isn't working.
Trond suggested instead returning NFS4ERR_NOENT. I think it would be
tricky for the client to distinguish between the case "I don't support
gss on the backchannel" and "I can't find that in my cache, please
create another context and try that instead", and I'd prefer something
that currently doesn't have any other meaning for this operation, hence
the (somewhat arbitrary) NFS4ERR_ENCR_ALG_UNSUPP.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
the client-provided spo_must_* arrays and just enforcing credential
checks for the minimum required operations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
This will make it easier to enforce SP4_MACH_CRED, which needs to
compare the mechanism used on the exchange_id with that used on
protected operations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.
->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.
Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.
For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.
On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.
With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.
Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In C, signed integer overflow results in undefined behavior, but unsigned
overflow wraps around. So do the subtraction first, then cast to signed.
Reported-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This code assumes that any client using exchange_id is using NFSv4.1,
but with the introduction of 4.2 that will no longer true.
This main effect of this is that client callbacks will use the same
minorversion as that used on the exchange_id.
Note that clients are forbidden from mixing 4.1 and 4.2 compounds. (See
rfc 5661, section 2.7, #13: "A client MUST NOT attempt to use a stateid,
filehandle, or similar returned object from the COMPOUND procedure with
minor version X for another COMPOUND procedure with minor version Y,
where X != Y.") However, we do not currently attempt to enforce this
except in the case of mixing zero minor version with non-zero minor
versions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd changes from J Bruce Fields:
"Highlights include:
- Some more DRC cleanup and performance work from Jeff Layton
- A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
server using credentials from Active Directory often fail due to
limitations of the svcgssd upcall interface. This replacement
lifts those limitations. The existing upcall is still supported
for backwards compatibility.
- More NFSv4.1 support: at this point, if a user with a current
client who upgrades from 4.0 to 4.1 should see no regressions. In
theory we do everything a 4.1 server is required to do. Patches
for a couple minor exceptions are ready for 3.11, and with those
and some more testing I'd like to turn 4.1 on by default in 3.11."
Fix up semantic conflict as per Stephen Rothwell and linux-next:
Commit 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication") adds two new users of "PDE(inode)->data", but we're
supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad
("procfs: new helper - PDE_DATA(inode)").
The old PDE() macro is no longer available since commit c30480b92c
("proc: Make the PROC_I() and PDE() macros internal to procfs")
* 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
nfsd: make symbol nfsd_reply_cache_shrinker static
svcauth_gss: fix error return code in rsc_parse()
nfsd4: don't remap EISDIR errors in rename
svcrpc: fix gss-proxy to respect user namespaces
SUNRPC: gssp_procedures[] can be static
SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
nfsd4: better error return to indicate SSV non-support
nfsd: fix EXDEV checking in rename
SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
SUNRPC: conditionally return endtime from import_sec_context
SUNRPC: allow disabling idle timeout
SUNRPC: attempt AF_LOCAL connect on setup
nfsd: Decode and send 64bit time values
nfsd4: put_client_renew_locked can be static
nfsd4: remove unused macro
nfsd4: remove some useless code
nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
...
As 4.1 becomes less experimental and SSV still isn't implemented, we
have to admit it's not going to be, and return some sensible error
rather than just saying "our server's broken". Discussion in the ietf
group hasn't turned up any objections to using NFS4ERR_ENC_ALG_UNSUPP
for that purpose.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The "list_empty(&oo->oo_owner.so_stateids)" is aways true, so remove it.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag. The client can figure
out exactly which state is the problem using CHECK_STATEID and then free
it using FREE_STATEID. The status flag will be unset once all such
revoked stateids are freed.
Our server's only recallable state is delegations. So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The logic here is better expressed with a switch statement.
While we're here, CLOSED stateids (or stateids of an unkown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Negotiation of the 4.1 session forechannel attributes is a mess. Fix:
- Move it all into check_forechannel_attrs instead of spreading
it between that, alloc_session, and init_forechannel_attrs.
- set a minimum "slotsize" so that our drc memory limits apply
even for small maxresponsesize_cached. This also fixes some
bugs when slotsize becomes <= 0.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.
It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.
For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Closed stateid's are kept around a little while to handle close replays
in the 4.0 case. So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.
This is unnecessarily baroque.
Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.
The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.
Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we've unhashed the delegation, it's only hanging around for the
benefit of an oustanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter. No point keeping the file
around any longer, or keeping it hashed.
This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.
(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY. This simplifies things somewhat and
helps meet a coming 4.2 requirement.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.
The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.
This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.
The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.
I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
destroy_session uses the session and client without continuously holding
any reference or locks.
Put the whole thing under the state lock for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I'm not sure what the check for clientid expiry was meant to do here.
The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This should never happen.
(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
THIS close procedure may be freed immediately in the coming encoding function.
Sorry that Signed-off-by was forgotten in last version.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.
The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
Only compile-tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_new_stid() is no longer used since commit 3abdb60712 ("nfsd4:
simplify idr allocation"). Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:
- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.
- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."
Some notes about conflicts, since there were *two* non-data semantic
conflicts here:
- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.
- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.
There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, NFSd is ready to operate in network namespace based containers.
So let's drop check for "init_net" and make it able to fly.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Use uid_eq(uid, GLOBAL_ROOT_UID) instead of !uid.
Use gid_eq(gid, GLOBAL_ROOT_GID) instead of !gid.
Use uid_eq(uid, INVALID_UID) instead of uid == -1
Use gid_eq(uid, INVALID_GID) instead of gid == -1
Use uid = GLOBAL_ROOT_UID instead of uid = 0;
Use gid = GLOBAL_ROOT_GID instead of gid = 0;
Use !uid_eq(uid1, uid2) instead of uid1 != uid2.
Use !gid_eq(gid1, gid2) instead of gid1 != gid2.
Use uid_eq(uid1, uid2) instead of uid1 == uid2.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
These routines are used by server and client code, so having them in a
separate header would be best.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't really need to preallocate at all; just allocate and initialize
everything at once, but leave the sc_type field initially 0 to prevent
finding the stateid till it's fully initialized.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When free nfs-client, it must free the ->cl_stateids.
Cc: stable@kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If CONFIG_LOCKDEP is disabled, then there would be a warning like this:
CC [M] fs/nfsd/nfs4state.o
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]
So, let's add "maybe_unused" tag to this variable.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the procedure of CREATE_SESSION, the state is locked after
alloc_conn_from_crses(). If the allocation fails, the function
goes to "out_free_session", and then "out" where there is an
unlock function.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In alloc_session(), numslots is the correct slot number used by the session.
But the slot number passed to nfsd4_put_drc_mem() is the one from nfs client.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Note the stateid is hashed early on in init_stid(), but isn't currently
being unhashed on error paths.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>