Commit Graph

184 Commits

Author SHA1 Message Date
Christian Brauner
549c729771
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:20 +01:00
David Howells
366911cd76 afs: Fix directory entry size calculation
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.

The calculation we need to use is:

	1 + (((strlen(name) + 1) + 15) >> 5)

where we are adding one to the strlen() result to account for the NUL
termination.

Fix this by the following means:

 (1) Create an inline function to do the calculation for a given name
     length.

 (2) Use the function to calculate the number of records used for a dirent
     in afs_dir_iterate_block().

     Use this to move the over-end check out of the loop since it only
     needs to be done once.

     Further use this to only go through the loop for the 2nd+ records
     composing an entry.  The only test there now is for if the record is
     allocated - and we already checked the first block at the top of the
     outer loop.

 (3) Add a max name length check in afs_dir_iterate_block().

 (4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
     from (1) to calculate the number of blocks rather than doing it
     incorrectly themselves.

Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
2021-01-04 12:25:19 +00:00
David Howells
a9e5c87ca7 afs: Fix speculative status fetch going out of order wrt to modifications
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes.  What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

	kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification.  This was causing the cache to be
invalidated for that vnode when it shouldn't have been.  If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway.  It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 11:27:03 -08:00
David Howells
fa04a40b16 afs: Fix to take ref on page when PG_private is set
Fix afs to take a ref on a page when it sets PG_private on it and to drop
the ref when removing the flag.

Note that in afs_write_begin(), a lot of the time, PG_private is already
set on a page to which we're going to add some data.  In such a case, we
leave the bit set and mustn't increment the page count.

As suggested by Matthew Wilcox, use attach/detach_page_private() where
possible.

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-29 13:53:04 +00:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
David Howells
f8ea5c7bce afs: Fix afs_do_lookup() to call correct fetch-status op variant
Fix afs_do_lookup()'s fallback case for when FS.InlineBulkStatus isn't
supported by the server.

In the fallback, it calls FS.FetchStatus for the specific vnode it's
meant to be looking up.  Commit b6489a49f7 broke this by renaming one
of the two identically-named afs_fetch_status_operation descriptors to
something else so that one of them could be made non-static.  The site
that used the renamed one, however, wasn't renamed and didn't produce
any warning because the other was declared in a header.

Fix this by making afs_do_lookup() use the renamed variant.

Note that there are two variants of the success method because one is
called from ->lookup() where we may or may not have an inode, but can't
call iget until after we've talked to the server - whereas the other is
called from within iget where we have an inode, but it may or may not be
initialised.

The latter variant expects there to be an inode, but because it's being
called from there former case, there might not be - resulting in an oops
like the following:

  BUG: kernel NULL pointer dereference, address: 00000000000000b0
  ...
  RIP: 0010:afs_fetch_status_success+0x27/0x7e
  ...
  Call Trace:
    afs_wait_for_operation+0xda/0x234
    afs_do_lookup+0x2fe/0x3c1
    afs_lookup+0x3c5/0x4bd
    __lookup_slow+0xcd/0x10f
    walk_component+0xa2/0x10c
    path_lookupat.isra.0+0x80/0x110
    filename_lookup+0x81/0x104
    vfs_statx+0x76/0x109
    __do_sys_newlstat+0x39/0x6b
    do_syscall_64+0x4c/0x78
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: b6489a49f7 ("afs: Fix silly rename")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-20 12:01:58 -07:00
David Howells
b6489a49f7 afs: Fix silly rename
Fix AFS's silly rename by the following means:

 (1) Set the destination directory in afs_do_silly_rename() so as to avoid
     misbehaviour and indicate that the directory data version will
     increment by 1 so as to avoid warnings about unexpected changes in the
     DV.  Also indicate that the ctime should be updated to avoid xfstest
     grumbling.

 (2) Note when the server indicates that a directory changed more than we
     expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a
     third party change, checking on successful completion of unlink and
     rename.

     The problem is that the FS.RemoveFile RPC op doesn't report the status
     of the unlinked file, though YFS.RemoveFile2 does.  This can be
     mitigated by the assumption that if the directory DV cranked by
     exactly 1, we can be sure we removed one link from the file; further,
     ordinarily in AFS, files cannot be hardlinked across directories, so
     if we reduce nlink to 0, the file is deleted.

     However, if the directory DV jumps by more than 1, we cannot know if a
     third party intervened by adding or removing a link on the file we
     just removed a link from.

     The same also goes for any vnode that is at the destination of the
     FS.Rename RPC op.

 (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock
     section along with the other attribute updates if ->op_unlinked is set
     on the descriptor for the appropriate vnode.

 (4) Issue a follow up status fetch to the unlinked file in the event of a
     third party conflict that makes it impossible for us to know if we
     actually deleted the file or not.

 (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to
     the user about the nlink of a silly deleted file so that it appears as
     0, not 1.

Found with the generic/035 and generic/084 xfstests.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 22:00:28 +01:00
David Howells
728279a5a1 afs: Fix use of afs_check_for_remote_deletion()
afs_check_for_remote_deletion() checks to see if error ENOENT is returned
by the server in response to an operation and, if so, marks the primary
vnode as having been deleted as the FID is no longer valid.

However, it's being called from the operation success functions, where no
abort has happened - and if an inline abort is recorded, it's handled by
afs_vnode_commit_status().

Fix this by actually calling the operation aborted method if provided and
having that point to afs_check_for_remote_deletion().

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
44767c3531 afs: Remove afs_operation::abort_code
Remove afs_operation::abort_code as it's read but never set.  Use
ac.abort_code instead.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
da8d075512 afs: Concoct ctimes
The in-kernel afs filesystem ignores ctime because the AFS fileserver
protocol doesn't support ctimes.  This, however, causes various xfstests to
fail.

Work around this by:

 (1) Setting ctime to attr->ia_ctime in afs_setattr().

 (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings.

 (3) Setting the ctime from the server mtime when on the target file when
     creating a hard link to it.

 (4) Setting the ctime on directories from their revised mtimes when
     renaming/moving a file.

Found by the generic/221 and generic/309 xfstests.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-15 15:41:02 +01:00
David Howells
fed79fd783 afs: Fix debugging statements with %px to be %p
Fix a couple of %px to be %p in debugging statements.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Fixes: 8a070a9648 ("afs: Detect cell aliases 1 - Cells with root volumes")
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
2020-06-09 18:17:14 +01:00
David Howells
20325960f8 afs: Reorganise volume and server trees to be rooted on the cell
Reorganise afs_volume objects such that they're in a tree keyed on volume
ID, rooted at on an afs_cell object rather than being in multiple trees,
each of which is rooted on an afs_server object.

afs_server structs become per-cell and acquire a pointer to the cell.

The process of breaking a callback then starts with finding the server by
its network address, following that to the cell and then looking up each
volume ID in the volume tree.

This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
and allows those structs and the code for maintaining them to be simplified
or removed.

It does make a couple of things a bit more tricky, though:

 (1) Operations now start with a volume, not a server, so there can be more
     than one answer as to whether or not the server we'll end up using
     supports the FS.InlineBulkStatus RPC.

 (2) CB RPC operations that specify the server UUID.  There's still a tree
     of servers by UUID on the afs_net struct, but the UUIDs in it aren't
     guaranteed unique.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:57 +01:00
David Howells
e49c7b2f6d afs: Build an abstraction around an "operation" concept
Turn the afs_operation struct into the main way that most fileserver
operations are managed.  Various things are added to the struct, including
the following:

 (1) All the parameters and results of the relevant operations are moved
     into it, removing corresponding fields from the afs_call struct.
     afs_call gets a pointer to the op.

 (2) The target volume is made the main focus of the operation, rather than
     the target vnode(s), and a bunch of op->vnode->volume are made
     op->volume instead.

 (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
     in most operations.  The vnode record (struct afs_vnode_param)
     contains:

	- The vnode pointer.

	- The fid of the vnode to be included in the parameters or that was
          returned in the reply (eg. FS.MakeDir).

	- The status and callback information that may be returned in the
     	  reply about the vnode.

	- Callback break and data version tracking for detecting
          simultaneous third-parth changes.

 (4) Pointers to dentries to be updated with new inodes.

 (5) An operations table pointer.  The table includes pointers to functions
     for issuing AFS and YFS-variant RPCs, handling the success and abort
     of an operation and handling post-I/O-lock local editing of a
     directory.

To make this work, the following function restructuring is made:

 (A) The rotation loop that issues calls to fileservers that can be found
     in each function that wants to issue an RPC (such as afs_mkdir()) is
     extracted out into common code, in a new file called fs_operation.c.

 (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
     a much smaller piece of code that allocates an operation, sets the
     parameters and then calls out to the common code to do the actual
     work.

 (C) The code for handling the success and failure of an operation are
     moved into operation functions (as (5) above) and these are called
     from the core code at appropriate times.

 (D) The pseudo inode getting stuff used by the dynamic root code is moved
     over into dynroot.c.

 (E) struct afs_iget_data is absorbed into the operation struct and
     afs_iget() expects to be given an op pointer and a vnode record.

 (F) Point (E) doesn't work for the root dir of a volume, but we know the
     FID in advance (it's always vnode 1, unique 1), so a separate inode
     getter, afs_root_iget(), is provided to special-case that.

 (G) The inode status init/update functions now also take an op and a vnode
     record.

 (H) The RPC marshalling functions now, for the most part, just take an
     afs_operation struct as their only argument.  All the data they need
     is held there.  The result delivery functions write their answers
     there as well.

 (I) The call is attached to the operation and then the operation core does
     the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

 (*) Overhauling the rotation (again).

 (*) Support for asynchronous I/O, where the fileserver rotation must be
     done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:17 +01:00
David Howells
a310082f6d afs: Rename struct afs_fs_cursor to afs_operation
As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.

This struct is going to form the core of the operation management and is
going to acquire more members in later.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:52 +01:00
David Howells
13fcc6356a afs: Always include dir in bulk status fetch from afs_do_lookup()
When a lookup is done in an AFS directory, the filesystem will speculate
and fetch up to 49 other statuses for files in the same directory and fetch
those as well, turning them into inodes or updating inodes that already
exist.

However, occasionally, a callback break might go missing due to NAT timing
out, but the afs filesystem doesn't then realise that the directory is not
up to date.

Alleviate this by using one of the status slots to check the directory in
which the lookup is being done.

Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Suggested-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
40fc81027f afs: Fix afs_d_validate() to set the right directory version
If a dentry's version is somewhere between invalid_before and the current
directory version, we should be setting it forward to the current version,
not backwards to the invalid_before version.  Note that we're only doing
this at all because dentry::d_fsdata isn't large enough on a 32-bit system.

Fix this by using a separate variable for invalid_before so that we don't
accidentally clobber the current dir version.

Fixes: a4ff7401fb ("afs: Keep track of invalid-before version for dentry coherency")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13 15:09:01 +01:00
David Howells
2105c2820d afs: Fix race between post-modification dir edit and readdir/d_revalidate
AFS directories are retained locally as a structured file, with lookup
being effected by a local search of the file contents.  When a modification
(such as mkdir) happens, the dir file content is modified locally rather
than redownloading the directory.

The directory contents are accessed in a number of ways, with a number of
different locks schemes:

 (1) Download of contents - dvnode->validate_lock/write in afs_read_dir().

 (2) Lookup and readdir - dvnode->validate_lock/read in afs_dir_iterate(),
     downgrading from (1) if necessary.

 (3) d_revalidate of child dentry - dvnode->validate_lock/read in
     afs_do_lookup_one() downgrading from (1) if necessary.

 (4) Edit of dir after modification - page locks on individual dir pages.

Unfortunately, because (4) uses different locking scheme to (1) - (3),
nothing protects against the page being scanned whilst the edit is
underway.  Even download is not safe as it doesn't lock the pages - relying
instead on the validate_lock to serialise as a whole (the theory being that
directory contents are treated as a block and always downloaded as a
block).

Fix this by write-locking dvnode->validate_lock around the edits.  Care
must be taken in the rename case as there may be two different dirs - but
they need not be locked at the same time.  In any case, once the lock is
taken, the directory version must be rechecked, and the edit skipped if a
later version has been downloaded by revalidation (there can't have been
any local changes because the VFS holds the inode lock, but there can have
been remote changes).

Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13 15:09:01 +01:00
David Howells
b98f0ec91c afs: Fix rename operation status delivery
The afs_deliver_fs_rename() and yfs_deliver_fs_rename() functions both only
decode the second file status returned unless the parent directories are
different - unfortunately, this means that the xdr pointer isn't advanced
and the volsync record will be read incorrectly in such an instance.

Fix this by always decoding the second status into the second
status/callback block which wasn't being used if the dirs were the same.

The afs_update_dentry_version() calls that update the directory data
version numbers on the dentries can then unconditionally use the second
status record as this will always reflect the state of the destination dir
(the two records will be identical if the destination dir is the same as
the source dir)

Fixes: 260a980317 ("[AFS]: Add "directory write" support.")
Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-13 15:09:01 +01:00
David Howells
f52b83b0b1 afs: Fix afs_lookup() to not clobber the version on a new dentry
Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82ef5 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells
40a708bd62 afs: Fix use-after-loss-of-ref
afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells
a28f239e29 afs: Fix race in commit bulk status fetch
When a lookup is done, the afs filesystem will perform a bulk status-fetch
operation on the requested vnode (file) plus the next 49 other vnodes from
the directory list (in AFS, directory contents are downloaded as blobs and
parsed locally).  When the results are received, it will speculatively
populate the inode cache from the extra data.

However, if the lookup races with another lookup on the same directory, but
for a different file - one that's in the 49 extra fetches, then if the bulk
status-fetch operation finishes first, it will try and update the inode
from the other lookup.

If this other inode is still in the throes of being created, however, this
will cause an assertion failure in afs_apply_status():

	BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));

on or about fs/afs/inode.c:175 because it expects data to be there already
that it can compare to.

Fix this by skipping the update if the inode is being created as the
creator will presumably set up the inode with the same information.

Fixes: 39db9815da ("afs: Fix application of the results of a inline bulk status fetch")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15 10:28:02 -08:00
David Howells
a0753c2900 afs: Support RCU pathwalk
Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode
pathwalk to reduce latency in pathwalk elements that get done multiple
times.  We don't need to query the server unless we've received a
notification from it that something has changed or the callback has
expired.

This requires that we can request a key and check permits under RCU
conditions if we need to.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02 11:43:54 +01:00
Marc Dionne
c4c613ff08 afs: Fix possible oops in afs_lookup trace event
The afs_lookup trace event can cause the following:

[  216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
[  216.576803] #PF: supervisor read access in kernel mode
[  216.576813] #PF: error_code(0x0000) - not-present page
...
[  216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]

If the inode from afs_do_lookup() is an error other than ENOENT, or if it
is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
try to dereference the error pointer as a valid pointer.

Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.

Ideally the trace would include the error value, but for now just avoid
the oops.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-22 13:33:26 +01:00
David Howells
9dd0b82ef5 afs: Fix missing dentry data version updating
In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold
the data version of the parent directory when it was created or when
d_revalidate() last caused it to be updated.  This is compared to the
->invalid_before field in the directory inode, rather than the actual data
version number, thereby allowing changes due to local edits to be ignored.
Only if the server data version gets bumped unexpectedly (eg. by a
competing client), do we need to revalidate stuff.

However, the d_fsdata field should also be updated if an rpc op is
performed that modifies that particular dentry.  Such ops return the
revised data version of the directory(ies) involved, so we should use that.

This is particularly problematic for rename, since a dentry from one
directory may be moved directly into another directory (ie. mv a/x b/x).
It would then be sporting the wrong data version - and if this is in the
future, for the destination directory, revalidations would be missed,
leading to foreign renames and hard-link deletion being missed.

Fix this by the following means:

 (1) Return the data version number from operations that read the directory
     contents - if they issue the read.  This starts in afs_dir_iterate()
     and is used, ignored or passed back by its callers.

 (2) In afs_lookup*(), set the dentry version to the version returned by
     (1) before d_splice_alias() is called and the dentry published.

 (3) In afs_d_revalidate(), set the dentry version to that returned from
     (1) if an rpc call was issued.  This means that if a parallel
     procedure, such as mkdir(), modifies the directory, we won't
     accidentally use the data version from that.

 (4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to
     the directory data version before d_instantiate() is called.

 (5) In afs_{rmdir,unlink}, update the target dentry's version to the
     directory data version as soon as we've updated the directory inode.

 (6) In afs_rename(), we need to unhash the old dentry before we start so
     that we don't get afs_d_revalidate() reverting the version change in
     cross-directory renames.

     We then need to set both the old and the new dentry versions the data
     version of the new directory before we call d_move() as d_move() will
     rehash them.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:52 +01:00
David Howells
5dc84855b0 afs: Only update d_fsdata if different in afs_d_revalidate()
In the in-kernel afs filesystem, d_fsdata is set with the data version of
the parent directory.  afs_d_revalidate() will update this to the current
directory version, but it shouldn't do this if it the value it read from
d_fsdata is the same as no lock is held and cmpxchg() is not used.

Fix the code to only change the value if it is different from the current
directory version.

Fixes: 260a980317 ("[AFS]: Add "directory write" support.")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:51 +01:00
David Howells
37c0bbb332 afs: Fix off-by-one in afs_rename() expected data version calculation
When afs_rename() calculates the expected data version of the target
directory in a cross-directory rename, it doesn't increment it as it
should, so it always thinks that the target inode is unexpectedly modified
on the server.

Fixes: a58823ac45 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:51 +01:00
Linus Torvalds
8dda9957e3 AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRyW8vu3V2unywtrAQIhsw//cVtxLx4ZCox5Z/93cdqych8RoCrwcUEG
 Cli0NAjlp/0HETvCsIqdkPKf+4OYCW1tHB2KTdbFdQLZptLgoEhykx89k70z9ggb
 ViieEa1GvAKhdamVqkPUC+3Q33uzyRaK7Gi5N3phJoaO+o328SlrPG0LerQgY0Np
 Rf3je56A1gIjEgWTmpStxiY262jlgaR3IuvpOqbu2G0TQVWV8CsBKw61fTdmEEQp
 dIkNO/xFXS+PvPdmQe5zCAjD/W2D+ggeBMbBwHF411qA60plGinubBYKZ98ikliZ
 OnQQPExI7mroIMzpYT+rzEQyxui2nz5t+Hj+d6t7iIvitNcX/Q53sVTq3RfQ0FjG
 QCd+j/l2p7fkXK4Sxgb/UBkj/pRr6W+FYSbQ/tmpD8UypEf5B3ln6GuA6yTMuNRF
 wVb744slKWq0c7KUuXmz806B2qJoyFG206jyFnoByvs6cPmB1+JqhBBYOKHcwjbo
 HIK+oUKkEfE6ofjQ3B9xOQ1anfbRnjjfJCmXvns9v57y/nRP2P78HUJNnEsOolk2
 nc3Ep41OgeZdwkts9KnSjmwy6VF3UZ2NQEiWXsUIOxGMtcodw9ci1bpquJ71oyut
 4sFMJvMU4eJD+XuCOlAgpbTaQ0Wuf11kFpl1Cof4fj0Z09C25Ahj6iKEKnumtO+4
 edfNLlwO6oo=
 =wgib
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "A set of minor changes for AFS:

   - Remove an unnecessary check in afs_unlink()

   - Add a tracepoint for tracking callback management

   - Add a tracepoint for afs_server object usage

   - Use struct_size()

   - Add mappings for AFS UAE abort codes to Linux error codes, using
     symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Add support for the UAE error table
  fs/afs: use struct_size() in kzalloc()
  afs: Trace afs_server usage
  afs: Add some callback management tracepoints
  afs: afs_unlink() doesn't need to check dentry->d_inode
2019-07-10 20:55:33 -07:00
Zhengyuan Liu
ee102584ef fs/afs: use struct_size() in kzalloc()
As Gustavo said in other patches doing the same replace, we can now
use the new struct_size() helper to avoid leaving these open-coded and
prone to type mistake.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
David Howells
051d25250b afs: Add some callback management tracepoints
Add a couple of tracepoints to track callback management:

 (1) afs_cb_miss - Logs when we were unable to apply a callback, either due
     to the inode being discarded or due to a competing thread applying a
     callback first.

 (2) afs_cb_break - Logs when we attempted to clear the noted callback
     promise, either due to the server explicitly breaking the callback,
     the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
David Howells
fa59f52f5b afs: afs_unlink() doesn't need to check dentry->d_inode
Don't check that dentry->d_inode is valid in afs_unlink().  We should be
able to take that as given.

This caused Smatch to issue the following warning:

	fs/afs/dir.c:1392 afs_unlink() error: we previously assumed 'vnode' could be null (see line 1375)

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
David Howells
39db9815da afs: Fix application of the results of a inline bulk status fetch
Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
it will update inodes that are already extant (something that afs_iget()
doesn't do) and to cache permits for each inode created (thereby avoiding a
follow up FS.FetchStatus call to determine this).

Extant inodes need looking up in advance so that their cb_break counters
before and after the operation can be compared.  To this end, the inode
pointers are cached so that they don't need looking up again after the op.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
b835915325 afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.

This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
a38a75581e afs: Fix unlink to handle YFS.RemoveFile2 better
Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.

Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
f642404a04 afs: Make vnode->cb_interest RCU safe
Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest.  Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().

This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
a58823ac45 afs: Fix application of status and callback to be under same lock
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

 (1) Allocating a joint {status,callback} record (afs_status_cb) before
     calling the RPC function for each vnode for which the RPC reply
     contains a status or a status plus a callback.  A flag is set in the
     record to indicate if a callback was actually received.

 (2) These records are passed into the RPC functions to be filled in.  The
     afs_decode_status() and yfs_decode_status() functions are removed and
     the cb_lock is no longer taken.

 (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
     update the vnode.

 (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
     the vnode.

 (5) vnodes, expected data-version numbers and callback break counters
     (cb_break) no longer need to be passed to the reply delivery
     functions.

     Note that, for the moment, the file locking functions still need
     access to both the call and the vnode at the same time.

 (6) afs_vnode_commit_status() is now given the cb_break value and the
     expected data_version and the task of applying the status and the
     callback to the vnode are now done here.

     This is done under a single taking of vnode->cb_lock.

 (7) afs_pages_written_back() is now called by afs_store_data() rather than
     by the reply delivery function.

     afs_pages_written_back() has been moved to before the call point and
     is now given the first and last page numbers rather than a pointer to
     the call.

 (8) The indicator from YFS.RemoveFile2 as to whether the target file
     actually got removed (status.abort_code == VNOVNODE) rather than
     merely dropping a link is now checked in afs_unlink rather than in
     xdr_decode_YFSFetchStatus().

Supplementary fixes:

 (*) afs_cache_permit() now gets the caller_access mask from the
     afs_status_cb object rather than picking it out of the vnode's status
     record.  afs_fetch_status() returns caller_access through its argument
     list for this purpose also.

 (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
     than a read lock and now sets the callback inside the same critical
     section.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
87182759cd afs: Fix order-1 allocation in afs_do_lookup()
afs_do_lookup() will do an order-1 allocation to allocate status records if
there are more than 39 vnodes to stat.

Fix this by allocating an array of {status,callback} records for each vnode
we want to examine using vmalloc() if larger than a page.

This not only gets rid of the order-1 allocation, but makes it easier to
grow beyond 50 records for YFS servers.  It also allows us to move to
{status,callback} tuples for other calls too and makes it easier to lock
across the application of the status and the callback to the vnode.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
20b8391fff afs: Make some RPC operations non-interruptible
Make certain RPC operations non-interruptible, including:

 (*) Set attributes
 (*) Store data

     We don't want to get interrupted during a flush on close, flush on
     unlock, writeback or an inode update, leaving us in a state where we
     still need to do the writeback or update.

 (*) Extend lock
 (*) Release lock

     We don't want to get lock extension interrupted as the file locks on
     the server are time-limited.  Interruption during lock release is less
     of an issue since the lock is time-limited, but it's better to
     complete the release to avoid a several-minute wait to recover it.

     *Setting* the lock isn't a problem if it's interrupted since we can
      just return to the user and tell them they were interrupted - at
      which point they can elect to retry.

 (*) Silly unlink

     We want to remove silly unlink files if we can, rather than leaving
     them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

	kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list.  The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
b134d687dd afs: Log more information for "kAFS: AFS vnode with undefined type\n"
Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0).  This prints more information
to try and help pin down the problem.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells
80548b0399 afs: Add more tracepoints
Add four more tracepoints:

 (1) afs_make_fs_call1 - Split from afs_make_fs_call but takes a filename
     to log also.

 (2) afs_make_fs_call2 - Like the above but takes two filenames to log.

 (3) afs_lookup - Log the result of doing a successful lookup, including a
     negative result (fid 0:0).

 (4) afs_get_tree - Log the set up of a volume for mounting.

It also extends the name buffer on the afs_edit_dir tracepoint to 24 chars
and puts quotes around the filename in the text representation.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
79ddbfa500 afs: Implement sillyrename for unlink and rename
Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.

Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.

A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup.  The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
99987c5600 afs: Add directory reload tracepoint
Add a tracepoint (afs_reload_dir) to indicate when a directory is being
reloaded.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
445b10289f afs: Improve dir check failure reports
Improve the content of directory check failure reports from:

	kAFS: afs_dir_check_page(6d57): bad magic 1/2 is 0000

to dump more information about the individual blocks in a directory page.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
73116df7bb afs: Use d_instantiate() rather than d_add() and don't d_drop()
Use d_instantiate() rather than d_add() and don't d_drop() in
afs_vnode_new_inode().  The dentry shouldn't be removed as it's not
changing its name.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells
30062bd13e afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f58db83fd3 afs: Get the target vnode in afs_rmdir() and get a callback on it
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
0067191201 afs: Commit the status on a new file/dir/symlink
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.

Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
3b6492df41 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f51375cd9e afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Al Viro
1401a0fc2d afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT)
simpler logics in callers that way

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05 15:50:59 -04:00
Al Viro
34b2a88fb4 afs_lookup(): switch to d_splice_alias()
->lookup() methods can (and should) use d_splice_alias() instead of
d_add().  Even if they are not going to be hit by open_by_handle(),
code does get copied around; besides, d_splice_alias() has better
calling conventions for use in ->lookup(), so the code gets simpler.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-05 15:44:14 -04:00
David Howells
68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
b61f7dcf4e afs: Fix directory page locking
The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data.  As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data.  Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
		...
	}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

  page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
  flags: 0xfffe000001004(referenced|private)
  raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
  raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  page->mem_cgroup:ffff98156b27c000
  ------------[ cut here ]------------
  kernel BUG at mm/filemap.c:1205!
  ...
  RIP: 0010:unlock_page+0x43/0x50
  ...
  Call Trace:
   afs_dir_iterate+0x789/0x8f0 [kafs]
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x166/0x1d0
   ? afs_do_lookup+0x69/0x490 [kafs]
   ? afs_do_lookup+0x101/0x490 [kafs]
   ? key_default_cmp+0x20/0x20
   ? request_key+0x3c/0x80
   ? afs_lookup+0xf1/0x340 [kafs]
   ? __lookup_slow+0x97/0x150
   ? lookup_slow+0x35/0x50
   ? walk_component+0x1bf/0x490
   ? path_lookupat.isra.52+0x75/0x200
   ? filename_lookup.part.66+0xa0/0x170
   ? afs_end_vnode_operation+0x41/0x60 [kafs]
   ? __check_object_size+0x9c/0x171
   ? strncpy_from_user+0x4a/0x170
   ? vfs_statx+0x73/0xe0
   ? __do_sys_newlstat+0x39/0x70
   ? __x64_sys_getdents+0xc9/0x140
   ? __x64_sys_getdents+0x140/0x140
   ? do_syscall_64+0x5b/0x160
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
5a81327616 afs: Do better accretion of small writes on newly created content
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page.

 (2) Don't flush the file on ->flush() but rather on ->release() if the
     file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
63a4681ff3 afs: Locally edit directory data for mkdir/create/unlink/...
Locally edit the contents of an AFS directory upon a successful inode
operation that modifies that directory (such as mkdir, create and unlink)
so that we can avoid the current practice of re-downloading the directory
after each change.

This is viable provided that the directory version number we get back from
the modifying RPC op is exactly incremented by 1 from what we had
previously.  The data in the directory contents is in a defined format that
we have to parse locally to perform lookups and readdir, so modifying isn't
a problem.

If the edit fails, we just clear the VALID flag on the directory and it
will be reloaded next time it is needed.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
0031763698 afs: Adjust the directory XDR structures
Adjust the AFS directory XDR structures in a number of superficial ways:

 (1) Rename them to all begin afs_xdr_.

 (2) Use u8 instead of uint8_t.

 (3) Mark the structures as __packed so they don't get rearranged by the
     compiler.

 (4) Rename the hdr member of afs_xdr_dir_block to meta.

 (5) Rename the pagehdr member of afs_xdr_dir_block to hdr.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
4ea219a839 afs: Split the directory content defs into a header
Split the directory content definitions into a header file so that they can
be used by multiple .c files.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
f3ddee8dc4 afs: Fix directory handling
AFS directories are structured blobs that are downloaded just like files
and then parsed by the lookup and readdir code and, as such, are currently
handled in the pagecache like any other file, with the entire directory
content being thrown away each time the directory changes.

However, since the blob is a known structure and since the data version
counter on a directory increases by exactly one for each change committed
to that directory, we can actually edit the directory locally rather than
fetching it from the server after each locally-induced change.

What we can't do, though, is mix data from the server and data from the
client since the server is technically at liberty to rearrange or compress
a directory if it sees fit, provided it updates the data version number
when it does so and breaks the callback (ie. sends a notification).

Further, lookup with lookup-ahead, readdir and, when it arrives, local
editing are likely want to scan the whole of a directory.

So directory handling needs to be improved to maintain the coherency of the
directory blob prior to permitting local directory editing.

To this end:

 (1) If any directory page gets discarded, invalidate and reread the entire
     directory.

 (2) If readpage notes that if when it fetches a single page that the
     version number has changed, the entire directory is flagged for
     invalidation.

 (3) Read as much of the directory in one go as we can.

Note that this removes local caching of directories in fscache for the
moment as we can't pass the pages to fscache_read_or_alloc_pages() since
page->lru is in use by the LRU.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
66c7e1d319 afs: Split the dynroot stuff out and give it its own ops tables
Split the AFS dynamic root stuff out of the main directory handling file
and into its own file as they share little in common.

The dynamic root code also gets its own dentry and inode ops tables.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:00 +01:00
David Howells
a4ff7401fb afs: Keep track of invalid-before version for dentry coherency
Each afs dentry is tagged with the version that the parent directory was at
last time it was validated and, currently, if this differs, the directory
is scanned and the dentry is refreshed.

However, this leads to an excessive amount of revalidation on directories
that get modified on the client without conflict with another client.  We
know there's no conflict because the parent directory's data version number
got incremented by exactly 1 on any create, mkdir, unlink, etc., therefore
we can trust the current state of the unaffected dentries when we perform a
local directory modification.

Optimise by keeping track of the last version of the parent directory that
was changed outside of the client in the parent directory's vnode and using
that to validate the dentries rather than the current version.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells
d55b4da433 afs: Introduce a statistics proc file
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:54 +01:00
David Howells
37ab636880 afs: Implement @cell substitution handling
Implement @cell substitution handling such that if @cell is seen as a name
in a dynamic root mount, then the name of the root cell for that network
namespace will be substituted for @cell during lookup.

The substitution of @cell for the current net namespace is set by writing
the cell name to /proc/fs/afs/rootcell.  The value can be obtained by
reading the file.

For example:

	# mount -t afs none /kafs -o dyn
	# echo grand.central.org >/proc/fs/afs/rootcell
	# ls /kafs/@cell
	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/
	# cat /proc/fs/afs/rootcell
	grand.central.org

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:18:58 +01:00
David Howells
6f8880d8e6 afs: Implement @sys substitution handling
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system.  Up to 16 alternatives may be specified and these are
tried in turn until one works.  Each network namespace has[*] a separate
independent list.

Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".

[*] Or will, once network namespace support is finalised in kAFS.

The list may be set by:

	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

for which separate writes to the same fd are amalgamated and applied on
close.  The LF character may be used as a separator to specify multiple
items in the same write() call.

The list may be cleared by:

	# echo >/proc/fs/afs/sysname

and read by:

	# cat /proc/fs/afs/sysname
	foo
	bar
	linux-x86_64

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
5cf9dd55a0 afs: Prospectively look up extra files when doing a single lookup
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
4d673da145 afs: Support the AFS dynamic root
Support the AFS dynamic root which is a pseudo-volume that doesn't connect
to any server resource, but rather is just a root directory that
dynamically creates mountpoint directories where the name of such a
directory is the name of the cell.

Such a mount can be created thus:

	mount -t afs none /afs -o dyn

Dynamic root superblocks aren't shared except by bind mounts and
propagation.  Cell root volumes can then be mounted by referring to them by
name, e.g.:

	ls /afs/grand.central.org/
	ls /afs/.grand.central.org/

The kernel will upcall to consult the DNS if the address wasn't supplied
directly.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06 14:43:37 +00:00
David Howells
440fbc3a8a afs: Fix unlink
Repeating creation and deletion of a file on an afs mount will run the box
out of memory, e.g.:

	dd if=/dev/zero of=/afs/scratch/m0 bs=$((1024*1024)) count=512
	rm /afs/scratch/m0

The problem seems to be that it's not properly decrementing the nlink count
so that the inode can be scrapped.

Note that this doesn't fix local creation followed by remote deletion.
That's harder to handle and will require a separate patch as we're not told
that the file has been deleted - only that the directory has changed.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-01-02 10:02:19 +00:00
Colin Ian King
43dd388b21 afs: remove redundant assignment of dvnode to itself
The assignment of dvnode to itself is redundant and can be removed.
Cleans up warning detected by cppcheck:

fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 13:55:46 +00:00
David Howells
4433b69141 afs: Fix signal handling in some file ops
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop
the target dentry if a signal causes the operation to be killed immediately
before we try to contact the server.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 13:55:35 +00:00
David Howells
bc1527dcb4 afs: Fix some dentry handling in dir ops and missing key_puts
Fix some of dentry handling in AFS directory ops:

 (1) Do d_drop() on the new_dentry before assigning a new inode to it in
     afs_vnode_new_inode().  It's fine to do this before calling afs_iget()
     because the operation has taken place on the server.

 (2) Replace d_instantiate()/d_rehash() with d_add().

 (3) Don't d_drop() the new_dentry in afs_rename() on error.

Also fix afs_link() and afs_rename() to call key_put() on all error paths
where the key is taken.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24 10:56:51 +00:00
David Howells
215804a992 afs: Introduce a file-private data record
Introduce a file-private data record for kAFS and put the key into it
rather than storing the key in file->private_data.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
dab17c1add afs: Fix directory read/modify race
Because parsing of the directory wasn't being done under any sort of lock,
the pages holding the directory content can get invalidated whilst the
parsing is ongoing.

Further, the directory page check function gets called outside of the page
lock, so if the page gets cleared or updated, this may return reports of
bad magic numbers in the directory page.

Also, the directory may change size whilst checking and parsing are
ongoing, so more care needs to be taken here.

Fix this by:

 (1) Perform the page check from the page filling function before we set
     PageUptodate and drop the page lock.

 (2) Check for the file having shrunk and the page having been abandoned
     before checking the page contents.

 (3) Lock the page whilst parsing it for the directory iterator.

Whilst we're at it, add a tracepoint to report check failure.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
d2ddc776a4 afs: Overhaul volume and server record caching and fileserver rotation
The current code assumes that volumes and servers are per-cell and are
never shared, but this is not enforced, and, indeed, public cells do exist
that are aliases of each other.  Further, an organisation can, say, set up
a public cell and a private cell with overlapping, but not identical, sets
of servers.  The difference is purely in the database attached to the VL
servers.

The current code will malfunction if it sees a server in two cells as it
assumes global address -> server record mappings and that each server is in
just one cell.

Further, each server may have multiple addresses - and may have addresses
of different families (IPv4 and IPv6, say).

To this end, the following structural changes are made:

 (1) Server record management is overhauled:

     (a) Server records are made independent of cell.  The namespace keeps
     	 track of them, volume records have lists of them and each vnode
     	 has a server on which its callback interest currently resides.

     (b) The cell record no longer keeps a list of servers known to be in
     	 that cell.

     (c) The server records are now kept in a flat list because there's no
     	 single address to sort on.

     (d) Server records are now keyed by their UUID within the namespace.

     (e) The addresses for a server are obtained with the VL.GetAddrsU
     	 rather than with VL.GetEntryByName, using the server's UUID as a
     	 parameter.

     (f) Cached server records are garbage collected after a period of
     	 non-use and are counted out of existence before purging is allowed
     	 to complete.  This protects the work functions against rmmod.

     (g) The servers list is now in /proc/fs/afs/servers.

 (2) Volume record management is overhauled:

     (a) An RCU-replaceable server list is introduced.  This tracks both
     	 servers and their coresponding callback interests.

     (b) The superblock is now keyed on cell record and numeric volume ID.

     (c) The volume record is now tied to the superblock which mounts it,
     	 and is activated when mounted and deactivated when unmounted.
     	 This makes it easier to handle the cache cookie without causing a
     	 double-use in fscache.

     (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
     	 to get the server UUID list.

     (e) The volume name is updated if it is seen to have changed when the
     	 volume is updated (the update is keyed on the volume ID).

 (3) The vlocation record is got rid of and VLDB records are no longer
     cached.  Sufficient information is stored in the volume record, though
     an update to a volume record is now no longer shared between related
     volumes (volumes come in bundles of three: R/W, R/O and backup).

and the following procedural changes are made:

 (1) The fileserver cursor introduced previously is now fleshed out and
     used to iterate over fileservers and their addresses.

 (2) Volume status is checked during iteration, and the server list is
     replaced if a change is detected.

 (3) Server status is checked during iteration, and the address list is
     replaced if a change is detected.

 (4) The abort code is saved into the address list cursor and -ECONNABORTED
     returned in afs_make_call() if a remote abort happened rather than
     translating the abort into an error message.  This allows actions to
     be taken depending on the abort code more easily.

     (a) If a VMOVED abort is seen then this is handled by rechecking the
     	 volume and restarting the iteration.

     (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
         handled by sleeping for a short period and retrying and/or trying
         other servers that might serve that volume.  A message is also
         displayed once until the condition has cleared.

     (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
     	 moment.

     (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
     	 see if it has been deleted; if not, the fileserver is probably
     	 indicating that the volume couldn't be attached and needs
     	 salvaging.

     (e) If statfs() sees one of these aborts, it does not sleep, but
     	 rather returns an error, so as not to block the umount program.

 (5) The fileserver iteration functions in vnode.c are now merged into
     their callers and more heavily macroised around the cursor.  vnode.c
     is removed.

 (6) Operations on a particular vnode are serialised on that vnode because
     the server will lock that vnode whilst it operates on it, so a second
     op sent will just have to wait.

 (7) Fileservers are probed with FS.GetCapabilities before being used.
     This is where service upgrade will be done.

 (8) A callback interest on a fileserver is set up before an FS operation
     is performed and passed through to afs_make_call() so that it can be
     set on the vnode if the operation returns a callback.  The callback
     interest is passed through to afs_iget() also so that it can be set
     there too.

In general, record updating is done on an as-needed basis when we try to
access servers, volumes or vnodes rather than offloading it to work items
and special threads.

Notes:

 (1) Pre AFS-3.4 servers are no longer supported, though this can be added
     back if necessary (AFS-3.4 was released in 1998).

 (2) VBUSY is retried forever for the moment at intervals of 1s.

 (3) /proc/fs/afs/<cell>/servers no longer exists.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
c435ee3455 afs: Overhaul the callback handling
Overhaul the AFS callback handling by the following means:

 (1) Don't give up callback promises on vnodes that we are no longer using,
     rather let them just expire on the server or let the server break
     them.  This is actually more efficient for the server as the callback
     lookup is expensive if there are lots of extant callbacks.

 (2) Only give up the callback promises we have from a server when the
     server record is destroyed.  Then we can just give up *all* the
     callback promises on it in one go.

 (3) Servers can end up being shared between cells if cells are aliased, so
     don't add all the vnodes being backed by a particular server into a
     big FID-indexed tree on that server as there may be duplicates.

     Instead have each volume instance (~= superblock) register an interest
     in a server as it starts to make use of it and use this to allow the
     processor for callbacks from the server to find the superblock and
     thence the inode corresponding to the FID being broken by means of
     ilookup_nowait().

 (4) Rather than iterating over the entire callback list when a mass-break
     comes in from the server, maintain a counter of mass-breaks in
     afs_server (cb_seq) and make afs_validate() check it against the copy
     in afs_vnode.

     It would be nice not to have to take a read_lock whilst doing this,
     but that's tricky without using RCU.

 (5) Save a ref on the fileserver we're using for a call in the afs_call
     struct so that we can access its cb_s_break during call decoding.

 (6) Write-lock around callback and status storage in a vnode and read-lock
     around getattr so that we don't see the status mid-update.

This has the following consequences:

 (1) Data invalidation isn't seen until someone calls afs_validate() on a
     vnode.  Unfortunately, we need to use a key to query the server, but
     getting one from a background thread is tricky without caching loads
     of keys all over the place.

 (2) Mass invalidation isn't seen until someone calls afs_validate().

 (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
     Could this be replaced with rcu_read_lock() since inodes are destroyed
     under RCU conditions.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
9ed900b116 afs: Push the net ns pointer to more places
Push the network namespace pointer to more places in AFS, including the
afs_server structure (which doesn't hold a ref on the netns).

In particular, afs_put_cell() now takes requires a net ns parameter so that
it can safely alter the netns after decrementing the cell usage count - the
cell will be deallocated by a background thread after being cached for a
period, which means that it's not safe to access it after reducing its
usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
d3e3b7eac8 afs: Add metadata xattrs
Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
available.  The following xattrs are now available:

 - "afs.cell"

   The name of the cell in which the vnode's volume resides.

 - "afs.fid"

   The volume ID, vnode ID and vnode uniquifier of the file as three hex
   numbers separated by colons.

 - "afs.volume"

   The name of the volume in which the vnode resides.

For example:

	# getfattr -d -m ".*" /mnt/scratch
	getfattr: Removing leading '/' from absolute path names
	# file: mnt/scratch
	afs.cell="mycell.myorg.org"
	afs.fid="10000b:1:1"
	afs.volume="scratch"

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-09 14:40:12 -07:00
Alexey Dobriyan
5b5e0928f7 lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Miklos Szeredi
2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Miklos Szeredi
1cd66c93ba fs: make remaining filesystems use .rename2
This is trivial to do:

 - add flags argument to foo_rename()
 - check if flags is zero
 - assign foo_rename() to .rename2 instead of .rename

This doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).

Filesystems converted:

9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

After this, we can get rid of the duplicate interfaces for rename.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Howells <dhowells@redhat.com> [AFS]
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Mark Fasheh <mfasheh@suse.com>
2016-09-27 11:03:58 +02:00
Al Viro
29884eff1f afs: switch to ->iterate_shared()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-10 14:27:44 -04:00
Al Viro
be5b82dbfe make ext2_get_page() and friends work without external serialization
Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:25 -04:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
David Howells
2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Al Viro
a455589f18 assorted conversions to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:20 -05:00
Miklos Szeredi
ac7576f4b1 vfs: make first argument of dir_context.actor typed
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-31 17:48:54 -04:00
Eric W. Biederman
9b053f3207 vfs: Remove unnecessary calls of check_submounts_and_drop
Now that check_submounts_and_drop can not fail and is called from
d_invalidate there is no longer a need to call check_submounts_and_drom
from filesystem d_revalidate methods so remove it.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:56 -04:00
Al Viro
13f3583892 afs: dget_parent() can't return a negative dentry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-29 22:02:24 -04:00
Al Viro
5d8943b04b afs: get rid of redundant ->d_name.len checks
No dentry can get to directory modification methods without
having passed either ->lookup() or ->atomic_open(); if name is
rejected by those two (or by ->d_hash()) with an error, it won't
be seen by anything else.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Miklos Szeredi
ba81238076 afs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:51 -04:00
Al Viro
1bbae9f818 [readdir] convert afs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:58 +04:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro
0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro
4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
David Howells
d6e43f751f AFS: Use i_generation not i_version for the vnode uniquifier
Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:48 -04:00
Sage Weil
705da2de95 afs: remove unnecessary dentry_unhash on rmdir, dir rename
afs has no problems with references to unlinked directories.

CC: David Howells <dhowells@redhat.com>
CC: linux-afs@lists.infradead.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-28 01:02:53 -04:00
Sage Weil
e4eaac06bc vfs: push dentry_unhash on rename_dir into file systems
Only a few file systems need this.  Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26 07:26:48 -04:00
Sage Weil
79bf7c732b vfs: push dentry_unhash on rmdir into file systems
Only a few file systems need this.  Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.

This does not change behavior for any in-tree file systems.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26 07:26:47 -04:00
David Howells
d18610b0ce AFS: Use d_automount() rather than abusing follow_link()
Make AFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:33 -05:00