Commit Graph

482 Commits

Author SHA1 Message Date
Andreas Gruenbacher
446279168e Merge part of branch 'for-next.instantiate' into for-next 2022-08-05 18:37:03 +02:00
Roman Gushchin
e33c267ab7 mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects.  For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g.  for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers.  register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
    <subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
  $ cd /sys/kernel/debug/shrinker/
  $ ls
    dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
    mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
    mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
    rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
    sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
    sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
    sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
    sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
    sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
    sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
    sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
    sb-dax-11           sb-proc-45       sb-tmpfs-35
    sb-debugfs-7        sb-proc-46       sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
  Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
  Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:40 -07:00
Andreas Gruenbacher
6feaec8147 gfs2: List traversal in do_promote is safe
In do_promote(), we're never removing the current entry from the list
and so the list traversal is actually safe.  Switch back to
list_for_each_entry().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 17:01:59 +02:00
Bob Peterson
0befb8511e gfs2: do_promote glock holder stealing fix
In do_promote(), when the glock had no strong holders, we were
accidentally calling demote_incompat_holders() with new_gh == NULL, so
no weak holders were considered incompatible.  Instead, the new holder
should have been passed in.

For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent
may_grant() from complaining.  This means that the new holder will now
be recognized as a current holder, so skip over it explicitly in
demote_incompat_holders() to prevent it from being dequeued.

To further clarify things, we can now rename new_gh to current_gh in
demote_incompat_holders(); after all, the HIF_HOLDER flag is already set,
which means the new holder is already a current holder.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 17:00:55 +02:00
Andreas Gruenbacher
8f0028fc60 gfs2: Use better variable name
In do_promote() and add_to_queue(), use current_gh as the variable name
for the first strong holder we could find: this matches the variable
name is may_grant(), and more clearly indicates that we're interested in
one (any) of the current strong holders.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 17:00:21 +02:00
Andreas Gruenbacher
5f38a4d3c4 gfs2: Make go_instantiate take a glock
Make go_instantiate take a glock instead of a glock holder as its argument:
this handler is supposed to instantiate the object associated with the glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 16:59:07 +02:00
Andreas Gruenbacher
86c30a01f5 gfs2: Add new go_held glock operation
Right now, inode_go_instantiate() contains functionality that relates to
how a glock is held rather than the glock itself, like waiting for
pending direct I/O to complete and completing interrupted truncates.
This code is meant to be run each time a holder is acquired, but
go_instantiate is actually only called once, when the glock is
instantiated.

To fix that, introduce a new go_held glock operation that is called each
time a glock holder is acquired.  Move the holder specific code in
inode_go_instantiate() over to inode_go_held().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 16:56:41 +02:00
Andreas Gruenbacher
de3f906f0a gfs2: Revert 'Fix "truncate in progress" hang'
Now that interrupted truncates are completed in the context of the
process taking the glock, there is no need for the glock state engine to
delegate that task to gfs2_quotad or for quotad to perform those
truncates anymore.  Get rid of the obsolete associated infrastructure.

Reverts commit 813e0c46c9 ("GFS2: Fix "truncate in progress" hang").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2022-06-29 16:54:59 +02:00
Andreas Gruenbacher
53d6913295 gfs2: Instantiate glocks ouside of glock state engine
Instantiate glocks outside of the glock state engine: there is no real
reason for instantiating them inside the glock state engine; it only
complicates the code.

Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait()
using the new gfs2_glock_holder_ready() helper.  On top of that, the only
other place that acquires a glock without using gfs2_glock_wait() or
gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call
gfs2_glock_holder_ready() there as well.

If a dinode has a pending truncate, the glock-specific instantiate function
for inodes wakes up the truncate function in the quota daemon.  Waiting for
the completion of the truncate was previously done by the glock state
engine, but we now need to wait in inode_go_instantiate().

This also means that gfs2_instantiate() will now no longer return any
"special" error codes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 16:53:22 +02:00
Andreas Gruenbacher
bdff777cbb gfs2: Fix up gfs2_glock_async_wait
Since commit 1fc05c8d84 ("gfs2: cancel timed-out glock requests"), a
pending locking request can be canceled by calling gfs2_glock_dq() on
the pending holder.  In gfs2_glock_async_wait(), when we time out, use
that to cancel the remaining locking requests and dequeue the locking
requests already granted.  That's simpler as well as more efficient than
waiting for all locking requests to eventually be granted and dequeuing
them then.

In addition, gfs2_glock_async_wait() promises that by the time the
function completes, all glocks are either granted or dequeued, but the
implementation doesn't keep that promise if individual locking requests
fail.  Fix that as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 16:47:44 +02:00
Andreas Gruenbacher
cbe6d2576e gfs2: Add GL_NOPID flag for process-independent glock holders
Add a GL_NOPID flag to indicate that once a glock holder has been acquired, it
won't be associated with the current process anymore.  This is useful for iopen
and flock glocks which are associated with open files, as well as journal glock
holders and similar which are associated with the filesystem.

Once GL_NOPID is used for all applicable glocks (see the next patches),
processes will no longer be falsely reported as holding glocks which they are
not actually holding in the glocks dump file.  Unlike before, when a process is
reported as having "(ended)", this will indicate an actual bug.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 13:07:54 +02:00
Andreas Gruenbacher
56535dc695 gfs2: Add flocks to glockfd debugfs file
Include flock glocks in the "glockfd" debugfs file.  Those are similar to the
iopen glocks; while an open file is holding an flock, it is holding the file's
flock glock.

We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the
"glockfd" file would block on flock operations.  Instead, use the file->f_lock
spin lock to protect the f_fl_gh.gh_gl glock pointer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 13:07:53 +02:00
Andreas Gruenbacher
4480c27ca3 gfs2: Add glockfd debugfs file
When a process has a gfs2 file open, the file is keeping a reference on the
underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in
shared mode.  In other words, the process depends on the iopen glock of each
open gfs2 file.  Expose those dependencies in a new "glockfd" debugfs file.

The new debugfs file contains one line for each gfs2 file descriptor,
specifying the tgid, file descriptor number, and glock name, e.g.,

  1601 6 5/816d

This list is compiled by iterating all tasks on the system using find_ge_pid(),
and all file descriptors of each task using task_lookup_next_fd_rcu().  To make
that work from gfs2, export those two functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 13:07:16 +02:00
Andreas Gruenbacher
44dab005fd gfs2: Minor gfs2_glock_nq_m cleanup
Add state and flags arguments to gfs2_rlist_alloc() to make it somewhat more
obvious which state and flags an rlist uses.  With that, stop knocking off
flags in gfs2_glock_nq_m() and its nq_m_sync() helper that are never set in the
first place.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-28 20:38:15 +02:00
Bob Peterson
565f82b57a gfs2: Rewrap overlong comment in do_promote
Rewrap the comment to keep the line length below 80 characters.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-09 12:26:43 +02:00
Kees Cook
11d8b79e84 gfs2: Use container_of() for gfs2_glock(aspace)
Clang's structure layout randomization feature gets upset when it sees
struct address_space (which is randomized) cast to struct gfs2_glock.
This is due to seeing the mapping pointer as being treated as an array
of gfs2_glock, rather than "something else, before struct address_space":

In file included from fs/gfs2/acl.c:23:
fs/gfs2/meta_io.h:44:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct gfs2_glock *'
	return (((struct gfs2_glock *)mapping) - 1)->gl_name.ln_sbd;
		^

Replace the instances of open-coded pointer math with container_of()
usage, and update the allocator to match.

Some cleanups and conversion of gfs2_glock_get() and
gfs2_glock_dealloc() by Andreas.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205041550.naKxwCBj-lkp@intel.com
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bill Wendling <morbo@google.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-24 21:29:14 +02:00
Andreas Gruenbacher
a4e8145edc gfs2: Initialize gh_error in gfs2_glock_nq
The gh_error field if a glock holder is initialized to zero in
gfs2_holder_init().  When a locking operation fails, gh_error is set to
an error code; when it succeeds, the gh_error value is left unchanged.
The field isn't initialized in gfs2_holder_reinit(), which is a problem.
Instead of fixing that directly, initialize gh_error in gfs2_glock_nq().
That also obsoletes the assignment in do_flock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15 15:01:40 +01:00
Andreas Gruenbacher
5a27a43efd gfs2: Make use of list_is_first
Use list_is_first() instead of open-coding it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15 15:01:40 +01:00
Andreas Gruenbacher
1fc05c8d84 gfs2: cancel timed-out glock requests
The gfs2 evict code tries to upgrade the iopen glock from SH to EX. If
the attempt to upgrade times out, gfs2 needs to tell dlm to cancel the
lock request or it can deadlock. We also need to wake up the process
waiting for the lock when dlm sends its AST back to gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2022-02-15 15:01:40 +01:00
Andreas Gruenbacher
356b8103d4 Revert "gfs2: check context in gfs2_glock_put"
It turns out that the might_sleep() call that commit 660a6126f8 adds
is triggering occasional data corruption in testing.  We're not sure
about the root cause yet, but since this commit was added as a debugging
aid only, revert it for now.

This reverts commit 660a6126f8.

Fixes: 660a6126f8 ("gfs2: check context in gfs2_glock_put")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-11 17:44:11 +01:00
Andreas Gruenbacher
3c5c67ec29 gfs2: Fix gfs2_instantiate description
The description of gfs2_instantiate accidentally lists a glock argument,
but the function takes a glock holder.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04 20:02:26 +01:00
Andreas Gruenbacher
ffd0cd3c2f gfs2: Fix __gfs2_holder_init function name in kernel-doc comment
The function name in the kernel-doc comment wasn't updated when the
function was renamed.

Fixes: b016d9a84a ("gfs2: Save ip from gfs2_glock_nq_init")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04 20:02:26 +01:00
Andreas Gruenbacher
e11b02df60 gfs2: Fix remote demote of weak glock holders
When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in
response to a remote locking conflict, we don't set the HIF_HOLDER flag.  This
causes function may_grant to BUG.  Fix by setting the missing HIF_HOLDER flag
in the mock glock holder.

In addition, define the mock glock holder where it is used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02 12:41:09 +01:00
Andreas Gruenbacher
a7ac203d8f gfs2: Fix "Introduce flag for glock holder auto-demotion"
Function demote_incompat_holders iterates over the list of glock holders
with list_for_each_entry, and it then sometimes removes the current
holder from the list.  This will get the loop stuck; we must use
list_for_each_entry_safe instead.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-08 16:29:28 +01:00
Andreas Gruenbacher
7a92deaae6 gfs2: Fix atomic bug in gfs2_instantiate
Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic
operation.  Use clear_and_wake_up_bit() instead of open coding it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-05 17:03:31 +01:00
Alexander Aring
660a6126f8 gfs2: check context in gfs2_glock_put
Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
the last reference is released.  This will show problems earlier, and
not only when the last reference is put.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:20 +02:00
Andreas Gruenbacher
7427f3bb49 gfs2: Fix glock_hash_walk bugs
So far, glock_hash_walk took a reference on each glock it iterated over, and it
was the examiner's responsibility to drop those references.  Dropping the final
reference to a glock can sleep and the examiners are called in a RCU critical
section with spin locks held, so examiners that didn't need the extra reference
had to drop it asynchronously via gfs2_glock_queue_put or similar.  This wasn't
done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
dump_glock_func.

Change glock_hash_walk to not take glock references at all.  That way, the
examiners that don't need them won't have to bother with slow asynchronous
puts, and the examiners that do need references can take them themselves.

Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:20 +02:00
Andreas Gruenbacher
486408d690 gfs2: Cancel remote delete work asynchronously
In gfs2_inode_lookup and gfs2_create_inode, we're calling
gfs2_cancel_delete_work which currently cancels any remote delete work
(delete_work_func) synchronously.  This means that if the work is
currently running, it will wait for it to finish.  We're doing this to
pevent a previous instance of an inode from having any influence on the
next instance.

However, delete_work_func uses gfs2_inode_lookup internally, and we can
end up in a deadlock when delete_work_func gets interrupted at the wrong
time.  For example,

  (1) An inode's iopen glock has delete work queued, but the inode
      itself has been evicted from the inode cache.

  (2) The delete work is preempted before reaching gfs2_inode_lookup.

  (3) Another process recreates the inode (gfs2_create_inode).  It tries
      to cancel any outstanding delete work, which blocks waiting for
      the ongoing delete work to finish.

  (4) The delete work calls gfs2_inode_lookup, which blocks waiting for
      gfs2_create_inode to instantiate and unlock the new inode =>
      deadlock.

It turns out that when the delete work notices that its inode has been
re-instantiated, it will do nothing.  This means that it's safe to
cancel the delete work asynchronously.  This prevents the kind of
deadlock described above.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-10-25 08:42:20 +02:00
Bob Peterson
f2e70d8f2f gfs2: fix GL_SKIP node_scope problems
Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_instantiate glops function
(if one existed), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock.

But node-scope introduced a new problem: if the first holder has GL_SKIP
and the next one does NOT, since it is not the first holder on the queue,
the go_instantiate op was not called. Eventually the GL_SKIP holder may
call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
still a window of time in which another non-GL_SKIP holder assumes the
instantiate function had been called by the first holder. In the case of
rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing two new glock flags:

GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
needs to be called to "fill in" or "read in" the object before it is
referenced.

GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
in the process of reading in the object. Whenever a function needs to
reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
function.

As before, the gl_lockref spin_lock is unlocked during the IO operation,
which may take a relatively long amount of time to complete. While
unlocked, if another process determines go_instantiate is still needed,
it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
go_instantiate operation may not have been successful.

Functions that previously called the instantiate sub-functions now call
directly into gfs2_instantiate so the new bits are managed properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
e6f856008d gfs2: split glock instantiation off from do_promote
Before this patch, function do_promote had a section of code that did
the actual instantiation.  This patch splits that off into its own
function, gfs2_instantiate, which prepares us for the next patch that
will use that function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
60d8bae9d1 gfs2: further simplify do_promote
This patch further simplifies function do_promote by eliminating some
redundant code in favor of using a lock_released flag. This is just
prep work for a future patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
17a6eceeb1 gfs2: re-factor function do_promote
This patch simply re-factors function do_promote to reduce the indents.
The logic should be unchanged. This makes future patches more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Andreas Gruenbacher
d74d0ce5bc gfs2: Remove 'first' trace_gfs2_promote argument
Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the
'first' holder isn't the one that instantiates the glock
(gl_instantiate), which is what the 'first' flag was apparently supposed
to indicate.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
3278b977c9 gfs2: change go_lock to go_instantiate
Before this patch, the go_lock glock operations (glops) did not do
any actual locking. They were used to instantiate objects, like reading
in dinodes and rgrps from the media.

This patch renames the functions to go_instantiate for clarity.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:18 +02:00
Andreas Gruenbacher
b016d9a84a gfs2: Save ip from gfs2_glock_nq_init
Before this patch, when a glock was locked by function gfs2_glock_nq_init,
it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
That made it extremely difficult to track down problems because many
functions call gfs2_glock_nq_init. This patch changes the function so
that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
it easy to backtrack which holder took the lock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-10-25 08:42:18 +02:00
Bob Peterson
c1442f6b53 gfs2: move GL_SKIP check from glops to do_promote
Before this patch, each individual "go_lock" glock operation (glop)
checked the GL_SKIP flag, and if set, would skip further processing.

This patch changes the logic so the go_lock caller, function go_promote,
checks the GL_SKIP flag before calling the go_lock op in the first place.
This avoids having to unnecessarily unlock gl_lockref.lock only to
re-lock it again.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:17 +02:00
Bob Peterson
4c69038d90 gfs2: Add GL_SKIP holder flag to dump_holder
Somehow, the GL_SKIP flag was missed when dumping glock holders.
This patch adds it to function hflags2str. I added it at the end because
I wanted Holder and Skip flags together to read "Hs" rather than "sH"
to avoid confusion with "Shared" ("SH") holder state.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:17 +02:00
Bob Peterson
dc732906c2 gfs2: Introduce flag for glock holder auto-demotion
This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of an active holder and that holder has the HIF_MAY_DEMOTE flag
set, the holder will be demoted before the incoming locking request is
granted.

Note that this mechanism demotes active holders (with the HIF_HOLDER
flag set), while before we were only demoting glocks without any active
holders.  This allows processes to keep hold of locks that may form a
cyclic locking dependency; the core glock logic will then break those
dependencies in case a conflicting locking request occurs.  We'll use
this to avoid giving up the inode glock proactively before faulting in
pages.

Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
Later, they call gfs2_holder_disallow_demote() to clear the flag again,
and then they check if their holder is still queued: if it is, they are
still holding the glock; if it isn't, they can re-acquire the glock (or
abort).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20 19:33:08 +02:00
Andreas Gruenbacher
6144464937 gfs2: Clean up function may_grant
Pass the first current glock holder into function may_grant and
deobfuscate the logic there.

While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant.  To make
that build cleanly, de-constify the may_grant arguments.

We're now using function find_first_holder in do_promote, so move the
function's definition above do_promote.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20 19:33:08 +02:00
Bob Peterson
08d7366671 gfs2: Remove redundant check from gfs2_glock_dq
In function gfs2_glock_dq, it checks to see if this is the fast path.
Before this patch, it checked both "find_first_holder(gl) == NULL" and
list_empty(&gl->gl_holders), which is redundant. If gl_holders is empty
then find_first_holder must return NULL. This patch removes the
redundancy.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20 09:03:46 -05:00
Bob Peterson
a8f1d32d0f gfs2: Eliminate vestigial HIF_FIRST
Holder flag HIF_FIRST is no longer used or needed, so remove it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20 09:03:46 -05:00
Baokun Li
38a618dbf4 gfs2: Use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28 14:13:38 +02:00
Hillf Danton
1ab19c5de4 gfs2: Fix use-after-free in gfs2_glock_shrink_scan
The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to
remove the glock from the lru list in __gfs2_glock_put().

On the shrink scan path, the same flag is cleared under lru_lock but because
of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the
put side can be made without deleting the glock from the lru list.

Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to
ensure correct behavior on both sides - clear GLF_LRU after list_del under
lru_lock.

Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-31 12:03:28 +02:00
Bob Peterson
865cc3e9cc gfs2: fix a deadlock on withdraw-during-mount
Before this patch, gfs2 would deadlock because of the following
sequence during mount:

mount
   gfs2_fill_super
      gfs2_make_fs_rw <--- Detects IO error with glock
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

logd
   Detects IO error and the need to withdraw
   calls gfs2_withdraw
      gfs2_make_fs_ro
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

gfs2_quotad
   gfs2_statfs_sync
      gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted

glock_work_func
   do_xmote <---Detects IO error, can't release glock: blocked on withdraw
      glops->go_inval
      glock_blocked_by_withdraw
         requeue glock work & exit <--- work requeued, blocked by withdraw

This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-20 13:31:37 +02:00
Bob Peterson
20265d9a67 gfs2: fix scheduling while atomic bug in glocks
Before this patch, in the unlikely event that gfs2_glock_dq encountered
a withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.

This patch unlocks the lockref spin_lock before waiting for recovery.

Fixes: 601ef0d52e ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-05-20 13:31:37 +02:00
Matthew Wilcox (Oracle)
7716506ada mm: introduce and use mapping_empty()
Patch series "Remove nrexceptional tracking", v2.

We actually use nrexceptional for very little these days.  It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger with
the THP patches because we don't know how many indices a shadow entry
occupies.  It's easier to just remove it than keep it accurate.

Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects.  Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.

This patch (of 4):

Instead of checking the two counters (nrpages and nrexceptional), we can
just check whether i_pages is empty.

Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:19 -07:00
Linus Torvalds
f2c80837e2 Changes in gfs2:
- Fix some compiler and kernel-doc warnings.
 
 - Various minor cleanups and optimizations.
 
 - Add a new sysfs gfs2 status file with some filesystem wide
   information.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmCKiWIUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqXOA//cgEMi+WZ0pQ1m4Z7Yk58ArAGXOW4
 L+efdMjk2zoqgixF502tQzaa2ctz6XpukF4oZbM+Jc+yZxrbZ5CLjUIOWc9RH+Id
 WQwj5+5GLbMAPnn5ksHUCTK9V+1oAlpgoY4fMtLdKq234Y6xqWj5qBjvtGUTLFAl
 ACvy8FUZplFOkaOSBqgh221LT4Oh0Wthe/Elq5qvqwBfdAiE/p1sHSi2FWxktIlU
 wV3PKL96rFsnWN8E6jqyJR1RNJ5d5MYA+PDkTHKcoqcXZrzw4mfu2tCh88Bh9wFb
 MEyjtLxE09G1+3Li/T/Tb7qbRKWvxEmkLZXaFAjRUp7zYPvM6twKSg8nihcBDtLi
 UgvTrc208CYvYj7QpRQ1dU9lEg47A46rB8dgLz+ymlpNNk/G0gqgvWLevMKnBfaX
 AkZviI6qm1iNCBd6wWWPUKqR0qrCWqoe9N8F7cWyZBki7dKkoj29Gt1X1SeIQMjd
 n8Mkv6Btd39kBt3DydXlCEaREMQYeDrxBJHxur234hEfFLFraFj5tYFjoeSODZdg
 Uxsn5X5dgLy/hHjps8YcuBoEgRMR/aKovK5G1FXDcQR5O6UByqtJuJTJBT8jxAld
 vLYHqO6vxdgGATaYAkuGLSLrJjwES+7tEXjtrdarswGo55dPitwtOB1NRWuOor/Z
 uTnzJbykMdIzEHs=
 =VblJ
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix some compiler and kernel-doc warnings

 - Various minor cleanups and optimizations

 - Add a new sysfs gfs2 status file with some filesystem wide
   information

* tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix fall-through warnings for Clang
  gfs2: Fix a number of kernel-doc warnings
  gfs2: Make gfs2_setattr_simple static
  gfs2: Add new sysfs file for gfs2 status
  gfs2: Silence possible null pointer dereference warning
  gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer
  gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent
  gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent
  gfs2: Add new gfs2_iomap_get helper
  gfs2: Remove unused variable sb_format
  gfs2: Fix dir.c function parameter descriptions
  gfs2: Eliminate gh parameter from go_xmote_bh func
  gfs2: don't create empty buffers for NO_CREATE
2021-04-29 10:33:35 -07:00
Lee Jones
c551f66c5d gfs2: Fix a number of kernel-doc warnings
Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions.  Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-09 22:14:13 +02:00
Sami Tolvanen
4f0f586bf0 treewide: Change list_sort to use const pointers
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08 16:04:22 -07:00
Bob Peterson
f68effb308 gfs2: Eliminate gh parameter from go_xmote_bh func
The only glock that uses go_xmote_bh glops function is the freeze glock
which uses freeze_go_xmote_bh. It does not use its gh parameter, so
this patch eliminates the unneeded parameter.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-03 21:32:09 +02:00
Bob Peterson
06e908cd9e gfs2: Allow node-wide exclusive glock sharing
Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes who are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.

A future patch will start using this flag to improve performance with
rgrp sharing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-17 19:30:28 +01:00
Andreas Gruenbacher
a55a47a3bc Revert "GFS2: Prevent delete work from occurring on glocks used for create"
Since commit a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work"), we're
cancelling any pending delete work of an iopen glock before attaching a new
inode to that glock in gfs2_create_inode.  This means that delete_work_func can
no longer be queued or running when attaching the iopen glock to the new inode,
and we can revert commit a4923865ea ("GFS2: Prevent delete work from
occurring on glocks used for create"), which tried to achieve the same but in a
racy way.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01 00:25:21 +01:00
Alexander Aring
515b269d5b gfs2: set lockdep subclass for iopen glocks
This patch introduce a new globs attribute to define the subclass of the
glock lockref spinlock. This avoid the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:

============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&gl->gl_lockref.lock);
  lock(&gl->gl_lockref.lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/12:
 #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x19e/0x2e3
 lock_acquire+0x150/0x410
 ? lockref_get+0x9/0x20
 _raw_spin_lock+0x27/0x40
 ? lockref_get+0x9/0x20
 lockref_get+0x9/0x20
 delete_work_func+0x188/0x260
 process_one_work+0x237/0x540
 worker_thread+0x4d/0x3b0
 ? process_one_work+0x540/0x540
 kthread+0x127/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-24 23:45:58 +01:00
Alexander Aring
da7d554f7c gfs2: Wake up when sd_glock_disposal becomes zero
Commit fc0e38dae6 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero.  As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up.  Add the missing wakeup.

Fixes: fc0e38dae6 ("GFS2: Fix glock deallocation race")
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-03 14:39:11 +01:00
Bob Peterson
2ffed5290b gfs2: Only access gl_delete for iopen glocks
Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete.  Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.

Fixes: a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20 23:16:22 +02:00
Bob Peterson
dbffb29dac gfs2: Fix comments to glock_hash_walk
The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20 23:16:16 +02:00
Bob Peterson
e2c6c8a797 gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.

The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When they dequeue the glock, it only cleared
the flag in cases when the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.

This happens to iopen glocks often: the get held in SH, then the file is
closed, but the glock remains in SH mode.

We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.

This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-15 17:04:53 +02:00
Bob Peterson
ee1e2c773e gfs2: call truncate_inode_pages_final for address space glocks
Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:42 +02:00
Liu Shixin
e8a8023ee0 gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:41 +02:00
Andreas Gruenbacher
c07bfb4d8f gfs2: Fix refcount leak in gfs2_glock_poke
In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local
glock holder.  Without that, we're leaking a glock and a pid reference.

Fixes: 9e8990dea9 ("gfs2: Smarter iopen glock waiting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03 13:45:37 +02:00
Bob Peterson
5deaf1f63b gfs2: Add some flags missing from glock output
Before this patch, three flags were not represented in the glock output.
This patch adds them in:

c - GLF_INODE_CREATING
P - GLF_PENDING_DELETE
x - GLF_FREEING (both f and F are already used)

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03 13:20:13 +02:00
Andreas Gruenbacher
34244d711d gfs2: Don't sleep during glock hash walk
In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work.  This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30 13:04:45 +02:00
Andreas Gruenbacher
300e549b6e Merge branch 'gfs2-iopen' into for-next 2020-06-05 21:25:36 +02:00
Andreas Gruenbacher
9e8990dea9 gfs2: Smarter iopen glock waiting
When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.

To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock.  The other node(s) trying to acquire the lock can then
abort instead of timing out.

Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
35b6f8fbcf gfs2: Wake up when setting GLF_DEMOTE
Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
flag.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
b0dcffd8da gfs2: Check inode generation number in delete_work_func
In delete_work_func, if the iopen glock still has an inode attached,
limit the inode lookup to that specific generation number: in the likely
case that the inode was deleted on the node on which the inode's link
count dropped to zero, we can skip verifying the on-disk block type and
reading in the inode.  The same applies if another node that had the
inode open managed to delete the inode before us.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
6bdcadea75 gfs2: Minor gfs2_lookup_by_inum cleanup
Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
8c7b9262a8 gfs2: Give up the iopen glock on contention
When there's contention on the iopen glock, it means that the link count
of the corresponding inode has dropped to zero on a remote node which is
now trying to delete the inode.  In that case, try to evict the inode so
that the iopen glock will be released, which will allow the remote node
to do its job.

When the inode is still open locally, the inode's reference count won't
drop to zero and so we'll keep holding the inode and its iopen glock.
The remote node will time out its request to grab the iopen glock, and
when the inode is finally closed locally, we'll try to delete it
ourself.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
a0e3cc65fa gfs2: Turn gl_delete into a delayed work
This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
f286d627ef gfs2: Keep track of deleted inode generations in LVBs
When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB).  When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes.  This avoids taking the resource
group glock in order to verify the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:20 +02:00
Bob Peterson
15f2547b41 gfs2: Allow ASPACE glocks to also have an lvb
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:18:59 +02:00
Bob Peterson
ea4e61c7f4 gfs2: introduce new gfs2_glock_assert_withdraw
Before this patch, asserts based on glocks did not print the glock with
the error. This patch introduces a new macro, gfs2_glock_assert_withdraw
which first prints the glock, then takes the assert.

This also changes a few glock asserts to the new macro.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 16:44:29 +02:00
Bob Peterson
7e901d6e95 gfs2: print mapping->nrpages in glock dump for address space glocks
This patch makes the glock dumps in debugfs print the number of pages
(nrpages) for address space glocks. This will aid in debugging.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 14:58:23 +02:00
Bob Peterson
b14c94908b Revert "gfs2: Don't demote a glock until its revokes are written"
This reverts commit df5db5f9ee.

This patch fixes a regression: patch df5db5f9ee allowed function
run_queue() to bypass its call to do_xmote() if revokes were queued for
the glock. That's wrong because its call to do_xmote() is what is
responsible for calling the go_sync() glops functions to sync both
the ail list and any revokes queued for it. By bypassing the call,
gfs2 could get into a stand-off where the glock could not be demoted
until its revokes are written back, but the revokes would not be
written back because do_xmote() was never called.

It "sort of" works, however, because there are other mechanisms like
the log flush daemon (logd) that can sync the ail items and revokes,
if it deems it necessary. The problem is: without file system pressure,
it might never deem it necessary.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-05-08 15:01:25 -05:00
Bob Peterson
b11e1a84f3 gfs2: If go_sync returns error, withdraw but skip invalidate
Before this patch, if the go_sync operation returned an error during
the do_xmote process (such as unable to sync metadata to the journal)
the code did goto out. That kept the glock locked, so it could not be
given away, which correctly avoids file system corruption. However,
it never set the withdraw bit or requeueing the glock work. So it would
hang forever, unable to ever demote the glock.

This patch changes to goto to a new label, skip_inval, so that errors
from go_sync are treated the same way as errors from go_inval:
The delayed withdraw bit is set and the work is requeued. That way,
the logd should eventually figure out there's a problem and withdraw
properly there.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08 15:00:07 -05:00
Bob Peterson
a8b7528b69 gfs2: Fix error exit in do_xmote
Before this patch, if an error was detected from glock function go_sync
by function do_xmote, it would return.  But the function had temporarily
unlocked the gl_lockref spin_lock, and it never re-locked it.  When the
caller of do_xmote tried to unlock it again, it was already unlocked,
which resulted in a corrupted spin_lock value.

This patch makes sure the gl_lockref spin_lock is re-locked after it is
unlocked.

Thanks to Wu Bo <wubo40@huawei.com> for reporting this problem.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08 14:45:38 +02:00
Andreas Gruenbacher
969183bc68 gfs2: Switch to list_{first,last}_entry
Replace open-coded versions of list_first_entry and list_last_entry with those
functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Bob Peterson
1c634f94c3 gfs2: Do proper error checking for go_sync family of glops functions
Before this patch, function do_xmote would try to sync out the glock
dirty data by calling the appropriate glops function XXX_go_sync()
but it did not check for a good return code. If the sync was not
possible due to an io error or whatever, do_xmote would continue on
and call go_inval and release the glock to other cluster nodes.
When those nodes go to replay the journal, they may already be holding
glocks for the journal records that should have been synced, but were
not due to the ignored error.

This patch introduces proper error code checking to the go_sync
family of glops functions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27 07:53:18 -06:00
Bob Peterson
df5db5f9ee gfs2: Don't demote a glock until its revokes are written
Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.

This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27 07:53:18 -06:00
Bob Peterson
d93ae386ef gfs2: Check for log write errors before telling dlm to unlock
Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27 07:53:18 -06:00
Bob Peterson
33dbd1e41a gfs2: fix infinite loop when checking ail item count before go_inval
Before this patch, the rgrp_go_inval and inode_go_inval functions each
checked if there were any items left on the ail count (by way of a
count), and if so, did a withdraw. But the withdraw code now uses
glocks when changing the file system to read-only status. So we can
not have glock functions withdrawing or a hang will likely result:
The glocks can't be serviced by the work_func if the work_func is
busy doing its own withdraw.

This patch removes the checks from the go_inval functions and adds
a centralized check in do_xmote to warn about the problem and not
withdraw, but flag the error so it's eventually caught when the logd
daemon eventually runs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-27 07:53:17 -06:00
Bob Peterson
601ef0d52e gfs2: Force withdraw to replay journals and wait for it to finish
When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held.

The "live" glock which is now used to signal when a withdraw
occurs. When a withdraw occurs, the node signals its withdraw by
dequeueing the "live" glock and trying to enqueue it in EX mode,
thus forcing the other nodes to all see a demote request, by way
of a "1CB" (one callback) try lock. The "live" glock is not
granted in EX; the callback is only just used to indicate a
withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

We also need to take care when glocks decide to withdraw, since
the withdraw code now uses glocks.

Also, before this patch, if a process encountered an error and
decided to withdraw, if another process was already withdrawing,
the second withdraw would be silently ignored, which set it free
to unlock its glocks. That's correct behavior if the original
withdrawer encounters further errors down the road. But if
secondary waiters don't wait for the journal replay, unlocking
glocks will allow other nodes to use them, despite the fact that
the journal containing those blocks is being replayed. The
replay needs to finish before our glocks are released to other
nodes. IOW, secondary withdraws need to wait for the first
withdraw to finish.

For example, if an rgrp glock is unlocked by a process that didn't
wait for the first withdraw, a journal replay could introduce file
system corruption by replaying a rgrp block that has already been
granted to a different cluster node.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-27 07:53:12 -06:00
Bob Peterson
a72d2401f5 gfs2: Allow some glocks to be used during withdraw
We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
when we're withdrawn. For example, to maintain metadata integrity, we should
disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
iopen or the transaction glocks may be safely used because none of their
metadata goes through the journal. So in general, we should disallow all
glocks with an address space, and allow all the others. One exception is:
we need to allow our active journal to be demoted so others may recover it.

Allowing glocks after withdraw gives us the ability to take appropriate
action (in a following patch) to have our journal properly replayed by
another node rather than just abandoning the current transactions and
pretending nothing bad happened, leaving the other nodes free to modify
the blocks we had in our journal, which may result in file system
corruption.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-20 11:01:36 -06:00
Bob Peterson
b3422cacdd gfs2: Rework how rgrp buffer_heads are managed
Before this patch, the rgrp code had a serious problem related to
how it managed buffer_heads for resource groups. The problem caused
file system corruption, especially in cases of journal replay.

When an rgrp glock was demoted to transfer ownership to a
different cluster node, do_xmote() first calls rgrp_go_sync and then
rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called
gfs2_rgrp_brelse() that dropped the buffer_head reference count.
In most cases, the reference count went to zero, which is right.
However, there were other places where the buffers are handled
differently.

After rgrp_go_sync, do_xmote called rgrp_go_inval which called
gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to
truncate_inode_pages_range would get rid of the pages in memory,
but only if the reference count drops to 0.

Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
to the buffer_heads in cases where the reference count was still 1.
Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
it failed the check for "if (bi->bi_bh)" and thus failed to call
brelse a second time. Because of that, the reference count on those
buffers sometimes failed to drop from 1 to 0. And that caused
function truncate_inode_pages_range to keep the pages in page cache
rather than freeing them.

The next time the rgrp glock was acquired, the metadata read of
the rgrp buffers re-used the pages in memory, which were now
wrong because they were likely modified by the other node who
acquired the glock in EX (which is why we demoted the glock).
This re-use of the page cache caused corruption because changes
made by the other nodes were never seen, so the bitmaps were
inaccurate.

For some reason, the problem became most apparent when journal
replay forced the replay of rgrps in memory, which caused newer
rgrp data to be overwritten by the older in-core pages.

A big part of the problem was that the rgrp buffer were released
in multiple places: The go_unlock function would release them when
the glock was released rather than when the glock is demoted,
which is clearly wrong because our intent was to cache them until
the glock is demoted from SH or EX.

This patch attempts to clean up the mess and make one consistent
and centralized mechanism for managing the rgrp buffer_heads by
implementing several changes:

1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
   We don't want to release the buffers or zero the pointers when
   syncing for the reasons stated above. It only makes sense to
   release them when the glock is actually invalidated (go_inval).
   And when we do, then we set the bh pointers to NULL.
2. The go_unlock function (which was only used for rgrps) is
   eliminated, as we've talked about doing many times before.
   The go_unlock function was called too early in the glock dq
   process, and should not happen until the glock is invalidated.
3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
   That will now happen automatically when the rgrp glocks are
   demoted, and shouldn't happen any sooner or later than that.
   Instead, function gfs2_clear_rgrpd has been modified to demote
   the rgrp glocks, and therefore, free those pages, before the
   remaining glocks are culled by gfs2_gl_hash_clear. This
   prevents the gl_object from hanging around when the glocks are
   culled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10 07:39:48 -06:00
Andreas Gruenbacher
f7be987b82 gfs2: Remove GFS2_MIN_LVB_SIZE define
The dlm lockspace is set up to have lock value blocks of GDLM_LVB_SIZE bytes,
and dlm is the only lock manager we support, so there is no point in claiming
that the lock value block could have any other size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-20 08:46:53 +01:00
Bob Peterson
d99724c3c3 gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS
This patch closes a timing window in which two processes compete
and overlap in the execution of do_xmote for the same glock:

             Process A                              Process B
   ------------------------------------   -----------------------------
1. Grabs gl_lockref and calls do_xmote
2.                                        Grabs gl_lockref but is blocked
3. Sets GLF_INVALIDATE_IN_PROGRESS
4. Unlocks gl_lockref
5.                                        Calls do_xmote
6. Call glops->go_sync
7. test_and_clear_bit GLF_DIRTY
8. Call gfs2_log_flush                    Call glops->go_sync
9. (slow IO, so it blocks a long time)    test_and_clear_bit GLF_DIRTY
                                          It's not dirty (step 7) returns
10.                                       Tests GLF_INVALIDATE_IN_PROGRESS
11.                                       Calls go_inval (rgrp_go_inval)
12.                                       gfs2_rgrp_relse does brelse
13.                                       truncate_inode_pages_range
14.                                       Calls lm_lock UN

In step 14 we've just told dlm to give the glock to another node
when, in fact, process A has not finished the IO and synced all
buffer_heads to disk and make sure their revokes are done.

This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
to use test_and_set_bit, and if the bit is already set, process B just
ignores it and trusts that process A will do the do_xmote in the proper
order.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-15 18:21:59 +01:00
Bob Peterson
eb43e660c0 gfs2: Introduce function gfs2_withdrawn
Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN
bit to call it. This does not change the logic or function of gfs2, and
it facilitates later improvements to the withdraw sequence.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-14 19:46:18 +01:00
Bob Peterson
ad26967b9a gfs2: Use async glocks for rename
Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
reverse the roles of which directories are "old" and which are "new" for
the purposes of rename. This can cause deadlocks where two nodes end up
waiting for each other.

There can be several layers of directory dependencies across many nodes.

This patch fixes the problem by acquiring all gfs2_rename's inode glocks
asychronously and waiting for all glocks to be acquired.  That way all
inodes are locked regardless of the order.

The timeout value for multiple asynchronous glocks is calculated to be
the total of the individual wait times for each glock times two.

Since gfs2_exchange is very similar to gfs2_rename, both functions are
patched in the same way.

A new async glock wait queue, sd_async_glock_wait, keeps a list of
waiters for these events. If gfs2's holder_wake function detects an
async holder, it wakes up any waiters for the event. The waiter only
tests whether any of its requests are still pending.

Since the glocks are sent to dlm asychronously, the wait function needs
to check to see which glocks, if any, were granted.

If a glock is granted by dlm (and therefore held), its minimum hold time
is checked and adjusted as necessary, as other glock grants do.

If the event times out, all glocks held thus far must be dequeued to
resolve any existing deadlocks.  Then, if there are any outstanding
locking requests, we need to loop around and wait for dlm to respond to
those requests too.  After we release all requests, we return -ESTALE to
the caller (vfs rename) which loops around and retries the request.

    Node1           Node2
    ---------       ---------
1.  Enqueue A       Enqueue B
2.  Enqueue B       Enqueue A
3.  A granted
6.                  B granted
7.  Wait for B
8.                  Wait for A
9.                  A times out (since Node 1 holds A)
10.                 Dequeue B (since it was granted)
11.                 Wait for all requests from DLM
12. B Granted (since Node2 released it in step 10)
13. Rename
14. Dequeue A
15.                 DLM Grants A
16.                 Dequeue A (due to the timeout and since we
                    no longer have B held for our task).
17. Dequeue B
18.                 Return -ESTALE to vfs
19.                 VFS retries the operation, goto step 1.

This release-all-locks / acquire-all-locks may slow rename / exchange
down as both nodes struggle in the same way and do the same thing.
However, this will only happen when there is contention for the same
inodes, which ought to be rare.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-04 20:22:17 +02:00
Andreas Gruenbacher
01123cf17c gfs2: create function gfs2_glock_update_hold_time
This patch moves the code that updates glock minimum hold
time to a separate function. This will be called by a future
patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-09-04 20:22:17 +02:00
Bob Peterson
98fb057487 gfs2: Fix possible fs name overflows
This patch fixes three places in which temporary character buffers
could overflow due to the addition of the file system id from patch
3792ce973f. Thanks to Dan Carpenter for pointing it out.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-03 09:42:41 +02:00
Bob Peterson
3792ce973f gfs2: dump fsid when dumping glock problems
Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:27:43 +02:00
Bob Peterson
04aea0ca14 gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:26:35 +02:00
Kefeng Wang
15a798f7de gfs2: Use IS_ERR_OR_NULL
Use IS_ERR_OR_NULL where appropriate.

(Several more places converted by Andreas.)

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 20:53:46 +02:00
Linus Torvalds
9331b6740f SPDX update for 5.2-rc4
Another round of SPDX header file fixes for 5.2-rc4
 
 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
 added, based on the text in the files.  We are slowly chipping away at
 the 700+ different ways people tried to write the license text.  All of
 these were reviewed on the spdx mailing list by a number of different
 people.
 
 We now have over 60% of the kernel files covered with SPDX tags:
 	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
 	Files checked:            64533
 	Files with SPDX:          40392
 	Files with errors:            0
 
 I think the majority of the "easy" fixups are now done, it's now the
 start of the longer-tail of crazy variants to wade through.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
 sxVEiFZo8ZU9v1IoRb1I
 =qU++
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
 "Another round of SPDX header file fixes for 5.2-rc4

  These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
  added, based on the text in the files. We are slowly chipping away at
  the 700+ different ways people tried to write the license text. All of
  these were reviewed on the spdx mailing list by a number of different
  people.

  We now have over 60% of the kernel files covered with SPDX tags:
	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
	Files checked:            64533
	Files with SPDX:          40392
	Files with errors:            0

  I think the majority of the "easy" fixups are now done, it's now the
  start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
  ...
2019-06-08 12:52:42 -07:00
Bob Peterson
638803d456 Revert "gfs2: Replace gl_revokes with a GLF flag"
Commit 73118ca8ba introduced a glock reference counting bug in
gfs2_trans_remove_revoke.  Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-06 16:29:26 +02:00
Thomas Gleixner
7336d0e654 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:12 +02:00
Bob Peterson
73118ca8ba gfs2: Replace gl_revokes with a GLF flag
The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes.  However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero.  Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Andreas Gruenbacher
9287c6452d gfs2: Fix occasional glock use-after-free
This patch has to do with the life cycle of glocks and buffers.  When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list.  Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke.  At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node.  This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list.  The
evict decremented the glock reference count to zero, which allowed the
glock to be freed.  After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07 23:39:14 +02:00
Ross Lagerwall
7881ef3f33 gfs2: Fix lru_count going negative
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 22:33:53 +02:00
Andreas Gruenbacher
605b0487f0 gfs2: Fix missed wakeups in find_insert_glock
Mark Syms has reported seeing tasks that are stuck waiting in
find_insert_glock.  It turns out that struct lm_lockname contains four padding
bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
hashing the glock name.  As a result, we can end up waking up the wrong
waitqueue, and the waiting tasks may be stuck forever.

Fix that by using ht_parms.key_len instead of sizeof(struct lm_lockname) for
the key length.

Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-03-08 15:49:01 +01:00
Greg Kroah-Hartman
2abbf9a4d2 gfs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

There is no need to save the dentries for the debugfs files, so drop
those variables to save a bit of space and make the code simpler.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-01-23 12:30:34 +01:00
Bob Peterson
27a2660f1e gfs2: Dump nrpages for inodes and their glocks
This patch is based on an idea from Steve Whitehouse. The idea is
to dump the number of pages for inodes in the glock dumps.
The additional locking required me to drop const from quite a few
places.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-12 12:33:23 +01:00
Bob Peterson
e54c78a27f gfs2: Use fs_* functions instead of pr_* function where we can
Before this patch, various errors and messages were reported using
the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does
not tell you which gfs2 mount had the problem, which is often vital
to debugging. This patch changes the calls from pr_* to fs_* in
most of the messages so that the file system id is printed along
with the message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-05 11:16:54 -05:00
Kees Cook
6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Andreas Gruenbacher
3fd5d3ad35 gfs2: Stop using rhashtable_walk_peek
Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump.  Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12 09:41:19 -07:00
Andreas Gruenbacher
7ac07fdaf8 gfs2: Glock dump performance regression fix
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
dump": keep the glock hash table iterator active while the glock dump
file is held open.  This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.

In addition, use rhastable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.

Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01 11:27:11 -07:00
Tom Herbert
97a6ec4ac0 rhashtable: Change rhashtable_walk_start to return void
Most callers of rhashtable_walk_start don't care about a resize event
which is indicated by a return value of -EAGAIN. So calls to
rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something
like this is common:

       ret = rhashtable_walk_start(rhiter);
       if (ret && ret != -EAGAIN)
               goto out;

Since zero and -EAGAIN are the only possible return values from the
function this check is pointless. The condition never evaluates to true.

This patch changes rhashtable_walk_start to return void. This simplifies
code for the callers that ignore -EAGAIN. For the few cases where the
caller cares about the resize event, particularly where the table can be
walked in mulitple parts for netlink or seq file dump, the function
rhashtable_walk_start_check has been added that returns -EAGAIN on a
resize event.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11 09:58:38 -05:00
Andreas Gruenbacher
10201655b0 gfs2: Fix debugfs glocks dump
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.3+
2017-09-25 12:32:33 -05:00
Arvind Yadav
d296b15ed5 gfs2: constify rhashtable_params
rhashtable_params are not supposed to change at runtime. All
Functions rhashtable_* working with const rhashtable_params
provided by <linux/rhashtable.h>. So mark the non-const structs
as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 08:14:39 -05:00
Bob Peterson
27c3b415f6 GFS2: Fix up some sparse warnings
This patch cleans up various pieces of GFS2 to avoid sparse errors.
This doesn't fix them all, but it fixes several. The first error,
in function glock_hash_walk was a genuine bug where the rhashtable
could be started and not stopped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 18:47:18 -05:00
Andreas Gruenbacher
a91323e255 gfs2: Clean up waiting on glocks
The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in
commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are
better removed, resulting in cleaner code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:51:02 -05:00
Andreas Gruenbacher
71c1b21368 gfs2: gfs2_evict_inode: Put glocks asynchronously
gfs2_evict_inode is called to free inodes under memory pressure.  The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode.  However, if DLM is blocked on
memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have likely already
been destroyed.

With this change, inodes can end up being unlinked, remote-unlink can be
triggered, and then the inode can be reallocated before all
remote-unlink callbacks are processed.  To detect that, revalidate the
link count in gfs2_evict_inode to make sure we're not deleting an
allocated, referenced inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:45:21 -05:00
Andreas Gruenbacher
0515480ad4 gfs2: gfs2_glock_get: Wait on freeing glocks
Keep glocks in their hash table until they are freed instead of removing
them when their last reference is dropped.  This allows to wait for any
previous instances of a glock to go away in gfs2_glock_get before
creating a new glocks.

Special thanks to Andy Price for finding and fixing a problem which also
required us to delete the rcu_read_unlock from the error case in function
gfs2_glock_get.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:39:31 -05:00
Bob Peterson
645ebd49f0 GFS2: Don't waste time locking lru_lock for non-lru glocks
Before this patch, glock_dq would call gfs2_glock_remove_from_lru.
For glocks that are never put on the LRU, such as the transaction
glock, this just takes the spin_lock, determines there's nothing to
be done because the list is empty, then unlocks again. This was
causing unnecessary lock contention on the lru_lock spin_lock.
This patch adds a check for GLOF_LRU in the glops before taking
the spin_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Andreas Gruenbacher
961ae1d83d gfs2: Fix glock rhashtable rcu bug
Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+
2017-07-07 13:22:05 -05:00
Andreas Gruenbacher
6b0c7440bc gfs2: Clean up glock work enqueuing
This patch adds a standardized queueing mechanism for glock work
with spin_lock protection to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:00 -05:00
Bob Peterson
ed17545d01 GFS2: Allow glocks to be unlocked after withdraw
This bug fixes a regression introduced by patch 0d1c7ae9d8.

The intent of the patch was to stop promoting glocks after a
file system is withdrawn due to a variety of errors, because doing
so results in a BUG(). (You should be able to unmount after a
withdraw rather than having the kernel panic.)

Unfortunately, it also stopped demotions, so glocks could not be
unlocked after withdraw, which means the unmount would hang.

This patch allows function do_xmote to demote locks to an
unlocked state after a withdraw, but not promote them.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-05 14:19:28 -05:00
Andreas Gruenbacher
0a52aba7c2 gfs2: Switch to rhashtable_lookup_get_insert_fast
Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to
rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra
rhashtable lookup.

At the same time, turn the retry loop in gfs2_glock_get into an infinite
loop.  The lookup or insert will eventually succeed, usually very fast,
but there is no reason to give up trying at a fixed number of
iterations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:41 -04:00
Andreas Gruenbacher
972b044eec gfs2: Don't pack struct lm_lockname
As per a suggestion by Linus, don't pack struct lm_lockname: we did that
because the struct is used as a rhashtable key, but packing tells the
compiler that the 64-bit fields in the struct may be unaligned, causing
it to generate worse code on some architectures.  Instead, rearrange the
fields in the struct so that there is no padding between fields, and
exclude any tail padding from the hash key size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 09:58:49 -04:00
Andreas Gruenbacher
92ecd73a88 gfs2: Deduplicate gfs2_{glocks,glstats}_open
Both functions are identical except for the seq_operations used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Andreas Gruenbacher
cc37a62785 gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter
Function rhashtable_walk_init is deprecated.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Bob Peterson
0d1c7ae9d8 GFS2: Prevent BUG from occurring when normal Withdraws occur
When the GFS2 file system withdraws due to metadata corruption, it
often has outstanding transactions in the journal and delayed work
queued for its glocks. This patch adds some new checks for a
withdrawn file system before proceeding with operations that would
obviously cause a BUG() to be triggered. That allows GFS2 to be
safely unmounted rather than cause the system to go down.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Linus Torvalds
15192b0295 This is an addendum for the 4.11 merge window.
Andy Price wrote this patch to close a nasty race condition
 that allows access to glocks that are being destroyed. Without
 this patch, GFS2 is vulnerable to random corruption and kernel
 panic.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYrv8+AAoJENeLYdPf93o7T58H/i3K+awecX1yrCl9qvAvxte+
 UJioZd9wnrjHsprFkMMzeVC2rFH5EIm5JKEyl8zGGwIq/oaGtgWlxQsBOvyOnSyx
 WRvu99XjZTzu3vov7u1kiWmOOvVturdcALPHH6mFdgkCw8d15AHqQdfDvljfWbRp
 aHFc+x1evptskRTj4D7I6EeWig8v3Sr9qosJ2N8uKtsrcc/xIlh4ItsonlQh3Cz0
 Dg83HVN2opHI5CWjRAjTK6zjF6XoEMgsjIOR4HLRVC9XEXiWLd3w+JBnTbFYJt0f
 k8NMk8oGbmzTC/HteJvnzGuNfSlkk4RAwaCkYo7F9f6hcKsWPECzUdyHn3ubm7M=
 =uIIs
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.11.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 fix from Bob Peterson:
 "This is an addendum for the 4.11 merge window.

  Andy Price wrote this patch to close a nasty race condition that
  allows access to glocks that are being destroyed. Without this patch,
  GFS2 is vulnerable to random corruption and kernel panic"

* tag 'gfs2-4.11.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Add missing rcu locking for glock	lookup
2017-02-23 09:36:04 -08:00
Andrew Price
f38e5fb95a gfs2: Add missing rcu locking for glock lookup
We must hold the rcu read lock across looking up glocks and trying to
bump their refcount to prevent the glocks from being freed in between.

Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-02-23 10:06:00 -05:00
Linus Torvalds
3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds
9763dd6f81 We've got eight GFS2 patches for this merge window:
1. Andy Price submitted a patch to make gfs2_write_full_page a
    static function.
 2. Dan Carpenter submitted a patch to fix a ERR_PTR thinko.
 
 I've also got a few patches, three of which fix bugs related to
 deleting very large files, which cause GFS2 to run out of
 journal space:
 
 3. The first one prevents GFS2 delete operation from requesting too
    much journal space.
 4. The second one fixes a problem whereby GFS2 can hang because it
    wasn't taking journal space demand into its calculations.
 5. The third one wakes up IO waiters when a flush is done to restart
    processes stuck waiting for journal space to become available.
 
 The other three patches are a performance improvement related to
 spin_lock contention between multiple writers:
 
 6. The "tr_touched" variable was switched to a flag to be more atomic
    and eliminate the possibility of some races.
 7. Function meta_lo_add was moved inline with its only caller to make
    the code more readable and efficient.
 8. Contention on the gfs2_log_lock spinlock was greatly reduced by
    avoiding the lock altogether in cases where we don't really need
    it: buffers that already appear in the appropriate metadata list
    for the journal. Many thanks to Steve Whitehouse for the ideas and
    principles behind these patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYrEEEAAoJENeLYdPf93o7bjoIAIqPG/EAzi+idgMWDPQa9Eit
 53dPy16snkrbWwtaK6spSWlH6bGYuHeanXORYon9bvtVjKYaa4NQclGihN2IE6uB
 O8zT+MGwP45LDhNplVJpumaOALZ9ZDqQSe+3tHeNK3FhNirLyiIjSqrHt/7Yi1qi
 fPLlT4Jx0TBo5rhvEGa7Yg01WhWVtnmVSMqJXj/7ZtC50s1aPyDUikdNIDfDCN2X
 LxfKGDXuk6p63VQ6JKqYSBVATCR0/bbKfkuk/kBUTYLoHoapImxB8d0HgIdsh1Mv
 9PlbZnnNW8k5oapuhVxjl0T5G0JsQgCkPb/wlte+ryOCjBoc2L2fCUV5qc0QxWc=
 =xQyl
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Robert Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andy Price submitted a patch to make gfs2_write_full_page a static
     function.

   - Dan Carpenter submitted a patch to fix a ERR_PTR thinko.

  Three patches fix bugs related to deleting very large files, which
  cause GFS2 to run out of journal space:

   - The first one prevents GFS2 delete operation from requesting too
     much journal space.

   - The second one fixes a problem whereby GFS2 can hang because it
     wasn't taking journal space demand into its calculations.

   - The third one wakes up IO waiters when a flush is done to restart
     processes stuck waiting for journal space to become available.

  The final three patches are a performance improvement related to
  spin_lock contention between multiple writers:

   - The "tr_touched" variable was switched to a flag to be more atomic
     and eliminate the possibility of some races.

   - Function meta_lo_add was moved inline with its only caller to make
     the code more readable and efficient.

   - Contention on the gfs2_log_lock spinlock was greatly reduced by
     avoiding the lock altogether in cases where we don't really need
     it: buffers that already appear in the appropriate metadata list
     for the journal. Many thanks to Steve Whitehouse for the ideas and
     principles behind these patches"

* tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make gfs2_write_full_page static
  GFS2: Reduce contention on gfs2_log_lock
  GFS2: Inline function meta_lo_add
  GFS2: Switch tr_touched to flag in transaction
  GFS2: Wake up io waiters whenever a flush is done
  GFS2: Made logd daemon take into account log demand
  GFS2: Limit number of transaction blocks requested for truncates
  GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-02-21 07:46:34 -08:00
Herbert Xu
98687f426b gfs2: Use rhashtable walk interface in glock_hash_walk
The function glock_hash_walk walks the rhashtable by hand.  This
is broken because if it catches the hash table in the middle of
a rehash, then it will miss entries.

This patch replaces the manual walk by using the rhashtable walk
interface.

Fixes: 88ffbf3e03 ("GFS2: Use resizable hash table for glocks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 12:28:22 -05:00
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Dan Carpenter
14d37564fa GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
This patch fixes a place where function gfs2_glock_iter_next can
reference an invalid error pointer.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-12-15 19:30:07 -06:00
Chao Yu
e0d735c1cc gfs2: fix to detect failure of register_shrinker
register_shrinker can fail after commit 1d3d4437ea ("vmscan: per-node
deferred work"), we should detect the failure of it, otherwise we may
fail to register shrinker after gfs2 module was been inited successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-21 12:09:40 -05:00
Fabian Frederick
47a9a52794 GFS2: use BIT() macro
Replace 1 << value shift by more explicit BIT() macro

Also fixes two bare unsigned definitions:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+		unsigned hsize = BIT(ip->i_depth);

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-08-02 12:05:27 -05:00
Andreas Gruenbacher
6df9f9a253 gfs2: Lock holder cleanup
Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag.  Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
which are immeditiately initialized via gfs2_holder_init or
gfs2_glock_nq_init.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:09 -05:00
Andreas Gruenbacher
ec5ec66ba4 gfs2: Get rid of gfs2_ilookup
Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
(and not for cached inodes anymore), there no longer is a need to
optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
and gfs2_ilookup can be removed.

In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
inconsistency goes away as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:08 -05:00
Andreas Gruenbacher
3ce37b2cb4 gfs2: Fix gfs2_lookup_by_inum lock inversion
The current gfs2_lookup_by_inum takes the glock of a presumed inode
identified by block number, verifies that the block is indeed an inode,
and then instantiates and reads the new inode via gfs2_inode_lookup.

However, instantiating a new inode may block on freeing a previous
instance of that inode (__wait_on_freeing_inode), and freeing an inode
requires to take the glock already held, leading to lock inversion and
deadlock.

Fix this by first instantiating the new inode, then verifying that the
block is an inode (if required), and then reading in the new inode, all
in gfs2_inode_lookup.

If the block we are looking for is not an inode, we discard the new
inode via iget_failed, which marks inodes as bad and unhashes them.
Other tasks waiting on that inode will get back a bad inode back from
ilookup or iget_locked; in that case, retry the lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:07 -05:00
Linus Torvalds
be1332c099 We've got nine patches this time:
- Abhi Das has two patches that fix a GFS2 splice issue (and an adjustment).
 - Ben Marzinski has a patch which allows the proper unmount of a GFS2
   file system after hitting a withdraw error.
 - I have a patch to fix a problem where GFS2 would dereference an error
   value, plus three cosmetic / refactoring patches.
 - Daniel DeFreez has a patch to fix two glock reference count problems,
   where GFS2 was not properly "uninitializing" its glock holder on error
   paths.
 - Denys Vlasenko has a patch to change a function to not be inlined,
   thus reducing the memory footprint of the GFS2 module.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXP2KLAAoJENeLYdPf93o7nScH/jlDCv+JzwBwAiFEVelgB657
 lFY+Q3bPvmcTaHiOqXzco7U1EeyvAuZ+x9duaFBBEGC2/NydIbkAdCP1XjDOTAGZ
 aVVAP8qov4adFiXTfBdzMYjJ7DArOy/JRnlC8WK5OiD/3yxsXJZNlAHTgjjL/ZMq
 PkhjxX2Lc9ZHGF9YIFTyeoBlqf1qO6WbhkxvSNefgq3c702QPM+T9XND5T1duGDF
 ThEcXS2DRRoE+6wr3h+ruvS5kcfxXRLLausAgqNpj2FYQOFCQ6XXgGK8pyXA5V/t
 3Vh8mHW0VmSjAfrRWslCRjd+APXoOf+7QHqOHUAA1dvQ6CGfC9qTETH9Wp32JCI=
 =8g+a
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got nine patches this time:

   - Abhi Das has two patches that fix a GFS2 splice issue (and an
     adjustment).

   - Ben Marzinski has a patch which allows the proper unmount of a GFS2
     file system after hitting a withdraw error.

   - I have a patch to fix a problem where GFS2 would dereference an
     error value, plus three cosmetic / refactoring patches.

   - Daniel DeFreez has a patch to fix two glock reference count
     problems, where GFS2 was not properly "uninitializing" its glock
     holder on error paths.

   - Denys Vlasenko has a patch to change a function to not be inlined,
     thus reducing the memory footprint of the GFS2 module"

* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Refactor gfs2_remove_from_journal
  GFS2: Remove allocation parms from gfs2_rbm_find
  gfs2: use inode_lock/unlock instead of accessing i_mutex directly
  GFS2: Add calls to gfs2_holder_uninit in two error handlers
  GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
  GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
  gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
  GFS2: Get rid of dead code in inode_go_demote_ok
  GFS2: ignore unlock failures after withdraw
2016-05-20 15:11:26 -07:00
Denys Vlasenko
a527b38e14 GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
This function compiles to 522 bytes of machine code.

Error paths are not very time critical.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-12 12:39:12 -04:00
Bob Copeland
8f6fd83c6c rhashtable: accept GFP flags in rhashtable_walk_init
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05 10:56:32 +02:00
Benjamin Marzinski
3e11e53041 GFS2: ignore unlock failures after withdraw
After gfs2 has withdrawn the filesystem, it may still have many locks not
in the unlocked state.  If it is using lock_dlm, it will failed trying
the unlocks since it has already unmounted the lock manager. Instead, it
should set the SDF_SKIP_DLM_UNLOCK flag on withdraw, to signal that
it can skip the lock_manager on unlocks, and failback to lock_nolock
style unlocking.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-03-24 08:28:48 -04:00
Bob Peterson
ff34245d52 GFS2: Don't filter out I_FREEING inodes anymore
This patch basically reverts a very old patch from 2008,
7a9f53b3c1, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since function try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:45 -04:00
Bob Peterson
a4923865ea GFS2: Prevent delete work from occurring on glocks used for create
This patch tries to prevent delete work (queued via iopen callback)
from executing if the glock is currently being used to create
a new inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:37 -04:00
Bob Peterson
7508abc4bd GFS2: Check if iopen is held when deleting inode
This patch fixes an error condition in which an inode is partially
created in gfs2_create_inode() but then some error is discovered,
which causes it to fail and call iput() before the iopen glock is
created or held. In that case, gfs2_delete_inode would try to
unlock an iopen glock that doesn't yet exist. Therefore, we test
its holder (which must exist) for the HIF_HOLDER bit before trying
to dq it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-01-14 08:47:42 -05:00
Bob Peterson
2aba1b5b4f GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
At some point in the past, we used to have a timeout when GFS2 was
unmounting, trying to clear out its glocks. If the timeout expires,
it would dump the remaining glocks to the kernel messages so that
developers can debug the problem. That timeout was eliminated,
probably by accident. This patch reintroduces it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-14 12:19:31 -06:00
Bob Peterson
b58bf407ca GFS2: Reduce size of incore inode
This patch makes no functional changes. Its goal is to reduce the
size of the gfs2 inode in memory by rearranging structures and
changing the size of some variables within the structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-14 12:19:24 -06:00
Andrew Price
3dd1dd8c69 GFS2: Use rht_for_each_entry_rcu in glock_hash_walk
This lockdep splat was being triggered on umount:

[55715.973122] ===============================
[55715.980169] [ INFO: suspicious RCU usage. ]
[55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G        W
[55715.982353] -------------------------------
[55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!

The code it refers to is the rht_for_each_entry_safe usage in
glock_hash_walk. The condition that triggers the warning is
lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
__rcu_dereference_protected macro.

The rhashtable buckets are not changed in glock_hash_walk so it's safe
to rely on the rcu protection. Replace the rht_for_each_entry_safe()
usage with rht_for_each_entry_rcu(), which doesn't care whether the
bucket lock is held if the rcu read lock is held.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-11-16 11:57:59 -06:00
Andreas Gruenbacher
f3dd164912 gfs2: Remove gl_spin define
Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
gl_lockref).  Remove that define to make the references to gl_lockref.lock more
obvious.

Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-10-29 12:57:48 -05:00
Andreas Gruenbacher
8f7e0a806d gfs2: A minor "sbstats" cleanup
It seems cleaner to avoid the temporary value here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-09-03 13:34:14 -05:00
Ben Hutchings
4d207133e9 gfs2: Make statistics unsigned, suitable for use with do_div()
None of these statistics can meaningfully be negative, and the
numerator for do_div() must have the type u64.  The generic
implementation of do_div() used on some 32-bit architectures asserts
that, resulting in a compiler error in gfs2_rgrp_congested().

Fixes: 0166b197c2 ("GFS2: Average in only non-zero round-trip times ...")

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
2015-09-03 13:33:32 -05:00
Bob Peterson
88ffbf3e03 GFS2: Use resizable hash table for glocks
This patch changes the glock hash table from a normal hash table to
a resizable hash table, which scales better. This also simplifies
a lot of code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-09-03 13:33:24 -05:00
Bob Peterson
15562c439d GFS2: Move glock superblock pointer to field gl_name
What uniquely identifies a glock in the glock hash table is not
gl_name, but gl_name and its superblock pointer. This patch makes
the gl_name field correspond to a unique glock identifier. That will
allow us to simplify hashing with a future patch, since the hash
algorithm can then take the gl_name and hash its components in one
operation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-09-03 13:33:09 -05:00