As described in commit 02117b8ae9 ("f2fs: Set GF_NOFS in
read_cache_page_gfp while doing f2fs_quota_read"), we must not enter
filesystem reclaim while holding the dq_lock. Prevent this more generally
by using memalloc_nofs_save() while holding the lock.
Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Move most of the block related definition out of fs.h into more suitable
headers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.
As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
"Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
Basically, the problem is that negative pinned dentries require
careful treatment - unless ->d_lock is locked or parent is held at
least shared, another thread can make them positive right under us.
Most of the uses turned out to be safe - the main surprises as far as
filesystems are concerned were
- race in dget_parent() fastpath, that might end up with the caller
observing the returned dentry _negative_, due to insufficient
barriers. It is positive in memory, but we could end up seeing the
wrong value of ->d_inode in CPU cache. Fixed.
- manual checks that result of lookup_one_len_unlocked() is positive
(and rejection of negatives). Again, insufficient barriers (we
might end up with inconsistent observed values of ->d_inode and
->d_flags). Fixed by switching to a new primitive that does the
checks itself and returns ERR_PTR(-ENOENT) instead of a negative
dentry. That way we get rid of boilerplate converting negatives
into ERR_PTR(-ENOENT) in the callers and have a single place to
deal with the barrier-related mess - inside fs/namei.c rather than
in every caller out there.
The guts of pathname resolution *do* need to be careful - the race
found by Ritesh is real, as well as several similar races.
Fortunately, it turns out that we can take care of that with fairly
local changes in there.
The tree-wide audit had not been fun, and I hate the idea of repeating
it. I think the right approach would be to annotate the places where
we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
catch regressions. But I'm still not sure what would be the least
invasive way of doing that and it's clearly the next cycle fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/namei.c: fix missing barriers when checking positivity
fix dget_parent() fastpath race
new helper: lookup_positive_unlocked()
fs/namei.c: pull positivity check into follow_managed()
Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Quota statistics counted as 64-bit per-cpu counter. Reading sums per-cpu
fractions as signed 64-bit int, filters negative values and then reports
lower half as signed 32-bit int.
Result may looks like:
fs.quota.allocated_dquots = 22327
fs.quota.cache_hits = -489852115
fs.quota.drops = -487288718
fs.quota.free_dquots = 22083
fs.quota.lookups = -486883485
fs.quota.reads = 22327
fs.quota.syncs = 335064
fs.quota.writes = 3088689
Values bigger than 2^31-1 reported as negative.
All counters except "allocated_dquots" and "free_dquots" are monotonic,
thus they should be reported as is without filtering negative values.
Kernel doesn't have generic helper for 64-bit sysctl yet,
let's use at least unsigned long.
Link: https://lore.kernel.org/r/157337934693.2078.9842146413181153727.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Make dquot_get_state() gracefully handle a situation when there are no
quota files present even though quotas are enabled.
Signed-off-by: Jan Kara <jack@suse.cz>
Quota on and quota off are protected by s_umount semaphore held in
exclusive mode since commit 7d6cd73d33 "quota: Hold s_umount in
exclusive mode when enabling / disabling quotas". This makes it
impossible for dquot_disable() to race with other enabling or disabling
of quotas. Simplify the cleanup done by dquot_disable() based on this
fact and also remove some stale comments. As a bonus this cleanup makes
dquot_disable() properly handle a case when there are no quota inodes.
Signed-off-by: Jan Kara <jack@suse.cz>
Now dquot_enable() has only two internal callers and both of them just
need to update quota flags and don't need most of checks. Just drop
dquot_enable() and fold necessary functionality into the two calling
places.
Signed-off-by: Jan Kara <jack@suse.cz>
Rename vfs_load_quota_inode() to dquot_load_quota_inode() to be
consistent with naming of other functions used for enabling quota
accounting from filesystems. Also export the function and add some
sanity checks to assure filesystems are calling the function properly.
Signed-off-by: Jan Kara <jack@suse.cz>
We already have quota inode loaded when resuming quotas. Use
vfs_load_quota() to avoid some pointless churn with the quota inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Factor out setting up of quota inode and eventual error cleanup from
vfs_load_quota_inode(). This will simplify situation for filesystems
that don't have any quota inodes.
Signed-off-by: Jan Kara <jack@suse.cz>
There is a race window where quota was redirted once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release()
TASK1 TASK2 (chowner)
->dqput()
we_slept:
spin_lock(&dq_list_lock)
if (dquot_dirty(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->write_dquot(dquot);
goto we_slept
if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
dqget()
mark_dquot_dirty()
dqput()
goto we_slept;
}
So dquot dirty quota will be released by TASK1, but on next we_sleept loop
we detect this and call ->write_dquot() for it.
XFSTEST: 440a80d4cb
Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
It is meaningless to increase DQST_LOOKUPS number while iterating
over dirty/inuse list, so just avoid it.
Link: https://lore.kernel.org/r/20190926083408.4269-1-cgxu519@zoho.com.cn
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
We reset time limit when current usage is smaller
or equal to soft limit in other place, so follow
this rule in do_set_dqblk().
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Link: https://lore.kernel.org/r/20190724053216.19392-1-cgxu519@zoho.com.cn
Signed-off-by: Jan Kara <jack@suse.cz>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl0mED8ACgkQnJ2qBz9k
QNnZ9wgAmC+eP8m6jB38HM7gZ+fWGEX3+FvnjbMbnXmNJTsnYWYC1VIRZhwKZb4b
42OGinfLq5tZMY/whrFBdB/c4UbVhAMhd1aFTpM2n5A6FR12YZxaLZEC+MLy3T7z
VU8m4uWDn80OvlUByo4Bylh+Icj78m8tLgj8SHSWxoh/DlGVKSLj9OKufV9Laens
YxubcUxE5sEEu8IVQen84283oJoizmeQf+f9yogAKIaskDLBzxqBIZwEACEUUchz
kEWRiHwS+Ou8EUHuwXqdKKksQgoLHEdxz2szYK1xSQ1wPmxMKPG5DqbQZv2QUBD0
Ek5T5YP4Tmph4s14n+jKDhakAJcqIQ==
=HWaa
-----END PGP SIGNATURE-----
Merge tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf and quota updates from Jan Kara:
- some ext2 fixes and cleanups
- a fix of udf bug when extending files
- a fix of quota Q_XGETQSTAT[V] handling
* tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
ext2: Use kmemdup rather than duplicating its implementation
quota: honor quota type in Q_XGETQSTAT[V] calls
ext2: Always brelse bh on failure in ext2_iget()
ext2: add missing brelse() in ext2_iget()
ext2: Fix a typo in ext2_getattr argument
ext2: fix a typo in comment
ext2: add missing brelse() in ext2_new_inode()
ext2: optimize ext2_xattr_get()
ext2: introduce new helper for xattr entry comparison
ext2: merge xattr next entry check to ext2_xattr_entry_valid()
ext2: code cleanup for ext2_preread_inode()
ext2: code cleanup by using test_opt() and clear_opt()
doc: ext2: update description of quota options for ext2
ext2: Strengthen xattr block checks
ext2: Merge loops in ext2_xattr_set()
ext2: introduce helper for xattr entry validation
ext2: introduce helper for xattr header validation
quota: add dqi_dirty_list description to comment of Dquot List Management
Run below script as root, dquot_add_space will return -EDQUOT since
__dquot_transfer call dquot_add_space with flags=0, and dquot_add_space
think it's a preallocation. Fix it by set flags as DQUOT_SPACE_WARN.
mkfs.ext4 -O quota,project /dev/vdb
mount -o prjquota /dev/vdb /mnt
setquota -P 23 1 1 0 0 /dev/vdb
dd if=/dev/zero of=/mnt/test-file bs=4K count=1
chattr -p 23 test-file
Fixes: 7b9ca4c61b ("quota: Reduce contention on dq_data_lock")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Actually there are four lists for dquot management, so add
the description of dqui_dirty_list to comment.
Signed-off-by: Chengguang Xu <cgxu519@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When we fail from allocating inode/space, we back out
the change we already did. In a special case which has
exceeded soft limit by the change, we should also check
time limit and reset it properly.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Local variable *reserved* of remove_dquot_ref() is only used if
define CONFIG_QUOTA_DEBUG, but not ebraced in CONFIG_QUOTA_DEBUG
macro, which leads to unused-but-set-variable warning when compiling.
This patch ebrace it into CONFIG_QUOTA_DEBUG macro like what is done
in add_dquot_ref().
Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We need to check return code only when calling ->read_dqblk(),
so fix it properly.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use list_first_entry() and list_empty() instead of opencoded variants.
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The dquots in the free_dquots list are not reclaimed in LRU way.
put_dquot_last() puts entries to the tail and dqcache_shrink_scan()
frees from the tail. Free unreferenced dquots in LRU order because it
seems more reasonable than freeing most recently used.
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot_init() is never called in atomic context.
This function is only set as a parameter of fs_initcall().
Despite never getting called from atomic context,
dquot_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.
This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
register_shrinker() might return -ENOMEM error since Linux 3.12.
Call panic() as with other failure checks in this function if
register_shrinker() failed.
Fixes: 1d3d4437ea ("vmscan: per-node deferred work")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
In commit 6184fc0b8d ("quota: Propagate error from ->acquire_dquot()"),
we have propagated error from __dquot_initialize to caller, but we forgot
to handle such error in add_dquot_ref(), so, currently, during quota
accounting information initialization flow, if we failed for some of
inodes, we just ignore such error, and do account for others, which is
not a good implementation.
In this patch, we choose to let user be aware of such error, so after
turning on quota successfully, we can make sure all inodes disk usage
can be accounted, which will be more reasonable.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull quota, ext2, isofs and udf fixes from Jan Kara:
- two small quota error handling fixes
- two isofs fixes for architectures with signed char
- several udf block number overflow and signedness fixes
- ext2 rework of mount option handling to avoid GFP_KERNEL allocation
with spinlock held
- ... it also contains a patch to implement auditing of responses to
fanotify permission events. That should have been in the fanotify
pull request but I mistakenly merged that patch into a wrong branch
and noticed only now at which point I don't think it's worth rebasing
and redoing.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: be aware of error from dquot_initialize
quota: fix potential infinite loop
isofs: use unsigned char types consistently
isofs: fix timestamps beyond 2027
udf: Fix some sign-conversion warnings
udf: Fix signed/unsigned format specifiers
udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF
udf: Remove some outdate references from documentation
udf: Avoid overflow when session starts at large offset
ext2: Fix possible sleep in atomic during mount option parsing
ext2: Parse mount options into a dedicated structure
audit: Record fanotify access control decisions
Commit 6184fc0b8d ("quota: Propagate error from ->acquire_dquot()")
missed to handle error from dquot_initialize in dquot_file_open, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In dquot_writeback_dquots(), we write back dquot from dirty dquots
list. There is a potential infinite loop if ->write_dquot() failure
and forget remove dquot from the list. This patch clear dirty bit
anyway to avoid it.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Eryu has reported that since commit 7b9ca4c61b "quota: Reduce
contention on dq_data_lock" test generic/233 occasionally fails. This is
caused by the fact that since that commit we don't generate warning and
set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set
(these are for example some metadata allocations in ext4). We need these
allocations to behave regularly wrt warning generation and grace time
setting so fix the code to return to the original behavior.
Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
CC: stable@vger.kernel.org
Fixes: 7b9ca4c61b
Signed-off-by: Jan Kara <jack@suse.cz>
Lock dq_dqb_lock around dquot_decr_inodes()
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 7b9ca4c61b ("quota: Reduce contention on dq_data_lock")
Signed-off-by: Jan Kara <jack@suse.cz>
Pull quota scaling updates from Jan Kara:
"This contains changes to make the quota subsystem more scalable.
Reportedly it improves number of files created per second on ext4
filesystem on fast storage by about a factor of 2x"
* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
quota: Add lock annotations to struct members
quota: Reduce contention on dq_data_lock
fs: Provide __inode_get_bytes()
quota: Inline dquot_[re]claim_reserved_space() into callsite
quota: Inline inode_{incr,decr}_space() into callsites
quota: Inline functions into their callsites
ext4: Disable dirty list tracking of dquots when journalling quotas
quota: Allow disabling tracking of dirty dquots in a list
quota: Remove dq_wait_unused from dquot
quota: Move locking into clear_dquot_dirty()
quota: Do not dirty bad dquots
quota: Fix possible corruption of dqi_flags
quota: Propagate ->quota_read errors from v2_read_file_info()
quota: Fix error codes in v2_read_file_info()
quota: Push dqio_sem down to ->read_file_info()
quota: Push dqio_sem down to ->write_file_info()
quota: Push dqio_sem down to ->get_next_id()
quota: Push dqio_sem down to ->release_dqblk()
quota: Remove locking for writing to the old quota format
quota: Do not acquire dqio_sem for dquot overwrites in v2 format
...
dq_data_lock is currently used to protect all modifications of quota
accounting information, consistency of quota accounting on the inode,
and dquot pointers from inode. As a result contention on the lock can be
pretty heavy.
Reduce the contention on the lock by protecting quota accounting
information by a new dquot->dq_dqb_lock and consistency of quota
accounting with inode usage by inode->i_lock.
This change reduces time to create 500000 files on ext4 on ramdisk by 50
different processes in separate directories by 6% when user quota is
turned on. When those 50 processes belong to 50 different users, the
improvement is about 9%.
Signed-off-by: Jan Kara <jack@suse.cz>
inode_incr_space() and inode_decr_space() have only two callsites.
Inline them there as that will make locking changes simpler.
Signed-off-by: Jan Kara <jack@suse.cz>
inode_add_rsv_space() and inode_sub_rsv_space() had only one callsite.
Inline them there directly. inode_claim_rsv_space() and
inode_reclaim_rsv_space() had two callsites so inline them there as
well. This will simplify further locking changes.
Signed-off-by: Jan Kara <jack@suse.cz>
Filesystems that are journalling quotas generally don't need tracking of
dirty dquots in a list since forcing a transaction commit flushes all
quotas anyway. Allow filesystem to say it doesn't want dquots to be
tracked as it reduces contention on the dq_list_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Currently every dquot carries a wait_queue_head_t used only when we are
turning quotas off to wait for last users to drop dquot references.
Since such rare case is not performance sensitive in any means, just use
a global waitqueue for this and save space in struct dquot. Also convert
the logic to use wait_event() instead of open-coding it.
Signed-off-by: Jan Kara <jack@suse.cz>
Move locking of dq_list_lock into clear_dquot_dirty(). It makes the
function more self-contained and will simplify our life later.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently we mark dirty even dquots that are not active (i.e.,
initialization or reading failed for them). Thus later we have to check
whether dirty dquot is really active and just clear the dirty bit if
not. Avoid this complication by just never marking non-active dquot as
dirty.
Signed-off-by: Jan Kara <jack@suse.cz>
dqi_flags modifications are protected by dq_data_lock. However the
modifications in vfs_load_quota_inode() and in mark_info_dirty() were
not which could lead to corruption of dqi_flags. Since modifications to
dqi_flags are rare, this is hard to observe in practice but in theory it
could happen. Fix the problem by always using dq_data_lock for
protection.
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->read_file_info() callback. This
is for consistency with other operations and it also allows us to get
rid of an ugliness in OCFS2.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->write_file_info() callback.
Mostly for consistency with other operations.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->get_next_id() callback. Mostly
for consistency with other operations.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->release_dqblk() callback. It
will allow quota formats to decide whether they need it or not.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->write_dqblk() callback. It will
allow quota formats to decide whether they need it or not.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Push down acquisition of dqio_sem into ->read_dqblk() callback. It will
allow quota formats to decide whether they need it or not.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently dquot writeout is only protected by dqio_sem held for writing.
As we transition to a finer grained locking we will use dquot->dq_lock
instead. So acquire it in dquot_commit() and move dqio_sem just around
->commit_dqblk() call as it is still needed to serialize quota file
changes.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
vfs_load_quota_inode() needs dqio_sem only for reading. In fact dqio_sem
is not needed there at all since the function can be called only during
quota on when quota file cannot be modified but let's leave the
protection there since it is logical and the path is in no way
performance critical.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot_get_next_id() needs dqio_sem only for reading to protect against
racing with modification of quota file structure.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
We need dqio_sem held just for reading when calling ->read_dqblk() in
dquot_acquire(). Also dqio_sem is not needed when setting DQ_READ_B and
DQ_ACTIVE_B as concurrent reads and dquot activations are serialized by
dq_lock. So acquire and release dqio_sem closer to the place where it is
needed. This reduces lock hold time and will make locking changes
easier.
Signed-off-by: Jan Kara <jack@suse.cz>
Currently we compare total space (curspace + rsvspace)
with space limit in quota-tools when setting grace time
and also in check_bdq(), but we missing rsvspace in
somewhere else, correct them. This patch also fix incorrect
zero dqb_btime and grace time updating failure when we use
rsvspace(e.g. ext4 dalloc feature).
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Ext4 ea_inode feature allows storing xattr values in external inodes to
be able to store values that are bigger than a block in size. Ext4 also
has deduplication support for these type of inodes. With deduplication,
the actual storage waste is eliminated but the users of such inodes are
still charged full quota for the inodes as if there was no sharing
happening in the background.
This design requires ext4 to manually charge the users because the
inodes are shared.
An implication of this is that, if someone calls chown on a file that
has such references we need to transfer the quota for the file and xattr
inodes. Current dquot_transfer() function implicitly transfers one inode
charge. With ea_inode feature, we would like to transfer multiple inode
charges.
Add get_inode_usage callback which can interrogate the total number of
inodes that were charged for a given inode.
[ Applied fix from Colin King to make sure the 'ret' variable is
initialized on the successful return path. Detected by
CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso]
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jan Kara <jack@suse.cz>
ext4_xattr_block_set() calls dquot_alloc_block() to charge for an xattr
block when new references are made. However if dquot_initialize() hasn't
been called on an inode, request for charging is effectively ignored
because ext4_inode_info->i_dquot is not initialized yet.
Add dquot_initialize() to call paths that lead to ext4_xattr_block_set().
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Currently we set IMMUTABLE and NOATIME flags on quota files to stop
userspace from messing with them. Now that all filesystems set these
flags in their quota_on handlers, we can stop setting the flags in
generic quota code. This will allow filesystems to stop copying i_flags
to their on-disk flags on various occasions.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull quota, fsnotify and ext2 updates from Jan Kara:
"Changes to locking of some quota operations from dedicated quota mutex
to s_umount semaphore, a fsnotify fix and a simple ext2 fix"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix bogus warning in dquot_disable()
fsnotify: Fix possible use-after-free in inode iteration on umount
ext2: reject inodes with negative size
quota: Remove dqonoff_mutex
ocfs2: Use s_umount for quota recovery protection
quota: Remove dqonoff_mutex from dquot_scan_active()
ocfs2: Protect periodic quota syncing with s_umount semaphore
quota: Use s_umount protection for quota operations
quota: Hold s_umount in exclusive mode when enabling / disabling quotas
fs: Provide function to get superblock with exclusive s_umount
dquot_disable() was warning when sb_has_quota_loaded() was true when
invalidating page cache for quota files. The thinking behind this
warning was that we must have raced with somebody else turning quotas on
and this should not happen because all places modifying quota state must
hold s_umount exclusively now. However sb_has_quota_loaded() can be also
true at this point when we are just suspending quotas on remount
read-only. Just restore the behavior to situation before commit
c3b004460d ("quota: Remove dqonoff_mutex") which introduced the
warning.
The code in dquot_disable() can be further simplified with the new
locking of quota state changes however let's leave that to a separate
commit that can get more testing exposure.
Fixes: c3b004460d
Signed-off-by: Jan Kara <jack@suse.cz>
The only places that were grabbing dqonoff_mutex are functions turning
quotas on and off and these are properly serialized using s_umount
semaphore. Remove dqonoff_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
All callers of dquot_scan_active() now hold s_umount so we can rely on
that lock to protect us against quota state changes.
Signed-off-by: Jan Kara <jack@suse.cz>
Writeback quota is protected by s_umount semaphore held for reading
because every writeback must be protected by that lock (grabbed either
by the generic writeback code or by quotactl handler). Getting next
available ID in quota file, querying quota state, setting quota
information, getting quota format are all quotactl operations protected
by s_umount semaphore held for reading grabbed in quotactl handler.
This also fixes lockdep splat about possible deadlock during filesystem
freezing where sync_filesystem() is called with page-faults already
blocked but sync_filesystem() calls into dquot_writeback_dquots() which
grabs dqonoff_mutex which ranks above i_mutex (vfs_load_quota_inode()
grabs i_mutex under dqonoff_mutex) which clearly ranks below page fault
freeze protection (e.g. via mmap_sem dependencies). The reported problem
is not a real deadlock possibility since during quota on we check
whether filesystem freezing is not in progress but still it is good to
have this fixed.
Reported-by: Ted Tso <tytso@mit.edu>
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently we hold s_umount semaphore only in shared mode when enabling
or disabling quotas and use dqonoff_mutex for serializing quota state
changes on a filesystem and also quota state changes with other places
depending on current quota state. Using dedicated mutex for this causes
possible deadlocks during filesystem freezing (see following commit for
details) so we transition to using s_umount semaphore for the necessary
synchronization whose lock ordering is properly handled by the
filesystem freezing code. As a start grab s_umount in exclusive mode
when enabling / disabling quotas.
Signed-off-by: Jan Kara <jack@suse.cz>
Pull userns vfs updates from Eric Biederman:
"This tree contains some very long awaited work on generalizing the
user namespace support for mounting filesystems to include filesystems
with a backing store. The real world target is fuse but the goal is
to update the vfs to allow any filesystem to be supported. This
patchset is based on a lot of code review and testing to approach that
goal.
While looking at what is needed to support the fuse filesystem it
became clear that there were things like xattrs for security modules
that needed special treatment. That the resolution of those concerns
would not be fuse specific. That sorting out these general issues
made most sense at the generic level, where the right people could be
drawn into the conversation, and the issues could be solved for
everyone.
At a high level what this patchset does a couple of simple things:
- Add a user namespace owner (s_user_ns) to struct super_block.
- Teach the vfs to handle filesystem uids and gids not mapping into
to kuids and kgids and being reported as INVALID_UID and
INVALID_GID in vfs data structures.
By assigning a user namespace owner filesystems that are mounted with
only user namespace privilege can be detected. This allows security
modules and the like to know which mounts may not be trusted. This
also allows the set of uids and gids that are communicated to the
filesystem to be capped at the set of kuids and kgids that are in the
owning user namespace of the filesystem.
One of the crazier corner casees this handles is the case of inodes
whose i_uid or i_gid are not mapped into the vfs. Most of the code
simply doesn't care but it is easy to confuse the inode writeback path
so no operation that could cause an inode write-back is permitted for
such inodes (aka only reads are allowed).
This set of changes starts out by cleaning up the code paths involved
in user namespace permirted mounts. Then when things are clean enough
adds code that cleanly sets s_user_ns. Then additional restrictions
are added that are possible now that the filesystem superblock
contains owner information.
These changes should not affect anyone in practice, but there are some
parts of these restrictions that are changes in behavior.
- Andy's restriction on suid executables that does not honor the
suid bit when the path is from another mount namespace (think
/proc/[pid]/fd/) or when the filesystem was mounted by a less
privileged user.
- The replacement of the user namespace implicit setting of MNT_NODEV
with implicitly setting SB_I_NODEV on the filesystem superblock
instead.
Using SB_I_NODEV is a stronger form that happens to make this state
user invisible. The user visibility can be managed but it caused
problems when it was introduced from applications reasonably
expecting mount flags to be what they were set to.
There is a little bit of work remaining before it is safe to support
mounting filesystems with backing store in user namespaces, beyond
what is in this set of changes.
- Verifying the mounter has permission to read/write the block device
during mount.
- Teaching the integrity modules IMA and EVM to handle filesystems
mounted with only user namespace root and to reduce trust in their
security xattrs accordingly.
- Capturing the mounters credentials and using that for permission
checks in d_automount and the like. (Given that overlayfs already
does this, and we need the work in d_automount it make sense to
generalize this case).
Furthermore there are a few changes that are on the wishlist:
- Get all filesystems supporting posix acls using the generic posix
acls so that posix_acl_fix_xattr_from_user and
posix_acl_fix_xattr_to_user may be removed. [Maintainability]
- Reducing the permission checks in places such as remount to allow
the superblock owner to perform them.
- Allowing the superblock owner to chown files with unmapped uids and
gids to something that is mapped so the files may be treated
normally.
I am not considering even obvious relaxations of permission checks
until it is clear there are no more corner cases that need to be
locked down and handled generically.
Many thanks to Seth Forshee who kept this code alive, and putting up
with me rewriting substantial portions of what he did to handle more
corner cases, and for his diligent testing and reviewing of my
changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
fs: Call d_automount with the filesystems creds
fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
evm: Translate user/group ids relative to s_user_ns when computing HMAC
dquot: For now explicitly don't support filesystems outside of init_user_ns
quota: Handle quota data stored in s_user_ns in quota_setxquota
quota: Ensure qids map to the filesystem
vfs: Don't create inodes with a uid or gid unknown to the vfs
vfs: Don't modify inodes with a uid or gid unknown to the vfs
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
vfs: Verify acls are valid within superblock's s_user_ns.
userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
fs: Refuse uid/gid changes which don't map into s_user_ns
selinux: Add support for unprivileged mounts from user namespaces
Smack: Handle labels consistently in untrusted mounts
Smack: Add support for unprivileged mounts from user namespaces
fs: Treat foreign mounts as nosuid
fs: Limit file caps to the user namespace of the super block
userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
userns: Remove implicit MNT_NODEV fragility.
...
Mostly supporting filesystems outside of init_user_ns is
s/&init_usre_ns/dquot->dq_sb->s_user_ns/. An actual need for
supporting quotas on filesystems outside of s_user_ns is quite a ways
away and to be done responsibily needs an audit on what can happen
with hostile quota files. Until that audit is complete don't attempt
to support quota files on filesystems outside of s_user_ns.
Cc: Jan Kara <jack@suse.cz>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Introduce the helper qid_has_mapping and use it to ensure that the
quota system only considers qids that map to the filesystems
s_user_ns.
In practice for quota supporting filesystems today this is the exact
same check as qid_valid. As only 0xffffffff aka (qid_t)-1 does not
map into init_user_ns.
Replace the qid_valid calls with qid_has_mapping as values come in
from userspace. This is harmless today and it prepares the quota
system to work on filesystems with quotas but mounted by unprivileged
users.
Call qid_has_mapping from dqget. This ensures the passed in qid has a
prepresentation on the underlying filesystem. Previously this was
unnecessary as filesystesm never had qids that could not map. With
the introduction of filesystems outside of s_user_ns this will not
remain true.
All of this ensures the quota code never has to deal with qids that
don't map to the underlying filesystem.
Cc: Jan Kara <jack@suse.cz>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The quota subsystem has two formats, the old v1 format using architecture
specific time_t values on the on-disk format, while the v2 format
(introduced in Linux 2.5.16 and 2.4.22) uses fixed 64-bit little-endian.
While there is no future for the v1 format beyond y2038, the v2 format
is almost there on 32-bit architectures, as both the user interface
and the on-disk format use 64-bit timestamps, just not the time_t
inbetween.
This changes the internal representation to use time64_t, which will
end up doing the right thing everywhere for v2 format.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently we oopsed when Q_GETNEXTQUOTA got called when quota was
disabled. Properly check whether quota is enabled for the filesystem
before calling into quota format handler.
Reported-by: Ted Tso <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull UDF and quota updates from Jan Kara:
"This contains a rewrite of UDF handling of filename encoding to fix
remaining overflow issues from Andrew Gabbasov and quota changes to
support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix possible GPF due to uninitialised pointers
ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes
quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem
quota: Fix possible races during quota loading
ocfs2: Implement get_next_id()
quota_v2: Implement get_next_id() for V2 quota format
quota: Add support for ->get_nextdqblk() for VFS quota
udf: Merge linux specific translation into CS0 conversion function
udf: Remove struct ustr as non-needed intermediate storage
udf: Use separate buffer for copying split names
udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
udf: Join functions for UTF8 and NLS conversions
udf: Parameterize output length in udf_put_filename
quota: Allow Q_GETQUOTA for frozen filesystem
quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA
When dqget() in __dquot_initialize() fails e.g. due to IO error,
__dquot_initialize() will pass an array of uninitialized pointers to
dqput_all() and thus can lead to deference of random data. Fix the
problem by properly initializing the array.
CC: stable@vger.kernel.org
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When loading new quota structure from disk, there is a possibility caller
of dqget() will see uninitialized data due to CPU reordering loads or
stores - loads from dquot can be reordered before test of DQ_ACTIVE_B
bit or setting of this bit could be reordered before filling of the
structure. Fix the issue by adding proper memory barriers.
Signed-off-by: Jan Kara <jack@suse.cz>
Add infrastructure for supporting get_nextdqblk() callback for VFS
quotas. Translate the operation into a callback to appropriate
filesystem and consequently to quota format callback.
Signed-off-by: Jan Kara <jack@suse.cz>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The Kconfig currently controlling compilation of this code is:
config QUOTA
bool "Quota support"
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modularity so that when reading the
driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering gets bumped to one level earlier when we
use the more appropriate fs_initcall here. However we've made similar
changes before without any fallout and none is expected here either.
We don't delete module.h because the code in turn tries to load other
modules as appropriate and so it still needs that header.
Cc: Jan Kara <jack@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull vfs updates from Al Viro:
"In this one:
- d_move fixes (Eric Biederman)
- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)
- switch of sb_writers to percpu rwsem (Oleg Nesterov)
- superblock scalability (Josef Bacik and Dave Chinner)
- swapon(2) race fix (Hugh Dickins)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
Currently when some error happened in ->acquire_dquot(), dqget() just
returned NULL. That was indistinguishable from a case when e.g. someone
run quotaoff and so was generally silently ignored. However
->acquire_dquot() can fail because of ENOSPC or EIO in which case user
should better know. So propagate error up from ->acquire_dquot properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
library helpers called by filesystem drivers on their own inodes
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds support for a new quota type PRJQUOTA for project quota
enforcement. Also a new method get_projid() is added into dquot_operations
structure.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Change ->set_info to take new qc_info structure which contains all the
necessary information both for XFS and VFS. Convert Q_SETINFO handler
to use this structure.
Signed-off-by: Jan Kara <jack@suse.cz>
Create new internal interface for getting information about quota which
contains everything needed for both VFS quotas and XFS quotas. Make VFS
use this and hook it up to Q_GETINFO.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently maximum space limit quota format supports is in blocks however
since we store space limits in bytes, this is somewhat confusing. So
store the maximum limit in bytes as well. Also rename the field to match
the new unit and related inode field to match the new naming scheme.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Add functions which translate ->quota_enable / ->quota_disable calls
into appropriate changes in VFS quota. This will enable filesystems
supporting VFS quota files in system inodes to be controlled via
Q_XQUOTA[ON|OFF] quotactls for better userspace compatibility.
Also provide a vector for quotactl using these functions which can be
used by filesystems with quota files stored in hidden system files.
Signed-off-by: Jan Kara <jack@suse.cz>