mirror of
https://git.proxmox.com/git/mirror_ubuntu-kernels.git
synced 2025-11-25 12:31:07 +00:00
In blk-cgroup, operations on blkg objects are protected with the request_queue lock. This is no longer the lock that protects I/O-scheduler operations in blk-mq. In fact, the latter are now protected with a finer-grained per-scheduler-instance lock. As a consequence, although blkg lookups are also rcu-protected, blk-mq I/O schedulers may see inconsistent data when they access blkg and blkg-related objects. BFQ does access these objects, and does incur this problem, in the following case.

The blkg_lookup performed in bfq_get_queue, being protected (only) through rcu, may happen to return the address of a copy of the original blkg. If this is the case, then the blkg_get performed in bfq_get_queue, to pin down the blkg, is useless: it does not prevent blk-cgroup code from destroying both the original blkg and all objects directly or indirectly referred to by the copy of the blkg. BFQ accesses these objects, which typically causes a crash due to a NULL-pointer dereference or a memory-protection violation.

Some additional protection mechanism should be added to blk-cgroup to address this issue. In the meantime, this commit provides a quick temporary fix for BFQ: cache (when safe) blkg data that might disappear right after a blkg_lookup.

In particular, this commit exploits the following facts to achieve its goal without introducing further locks. Destroy operations on a blkg invoke, as a first step, hooks of the scheduler associated with the blkg, and for BFQ these hooks are executed with bfqd->lock held. As a consequence, for any blkg associated with the request queue that an instance of BFQ is attached to, we are guaranteed that such a blkg is not destroyed, and that all the pointers it contains are consistent, while that instance is holding its bfqd->lock. A blkg_lookup performed with bfqd->lock held then returns a fully consistent blkg, which remains consistent for as long as this lock is held. In more detail, this holds even if the returned blkg is a copy of the original one.

Finally, the object describing a group inside BFQ also needs to be protected from destruction on the blkg_free of the original blkg (which invokes bfq_pd_free). This commit adds private refcounting for this object, to let it disappear only after no bfq_queue refers to it any longer.

This commit also removes or updates some stale comments on locking issues related to blk-cgroup operations.

Reported-by: Tomas Konir <tomas.konir@gmail.com>
Reported-by: Lee Tibbert <lee.tibbert@gmail.com>
Reported-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Tomas Konir <tomas.konir@gmail.com>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

| Name | | |
|---|---|---|
| .. | ||
| partitions | ||
| badblocks.c | ||
| bfq-cgroup.c | ||
| bfq-iosched.c | ||
| bfq-iosched.h | ||
| bfq-wf2q.c | ||
| bio-integrity.c | ||
| bio.c | ||
| blk-cgroup.c | ||
| blk-core.c | ||
| blk-exec.c | ||
| blk-flush.c | ||
| blk-integrity.c | ||
| blk-ioc.c | ||
| blk-lib.c | ||
| blk-map.c | ||
| blk-merge.c | ||
| blk-mq-cpumap.c | ||
| blk-mq-debugfs.c | ||
| blk-mq-debugfs.h | ||
| blk-mq-pci.c | ||
| blk-mq-sched.c | ||
| blk-mq-sched.h | ||
| blk-mq-sysfs.c | ||
| blk-mq-tag.c | ||
| blk-mq-tag.h | ||
| blk-mq-virtio.c | ||
| blk-mq.c | ||
| blk-mq.h | ||
| blk-settings.c | ||
| blk-softirq.c | ||
| blk-stat.c | ||
| blk-stat.h | ||
| blk-sysfs.c | ||
| blk-tag.c | ||
| blk-throttle.c | ||
| blk-timeout.c | ||
| blk-wbt.c | ||
| blk-wbt.h | ||
| blk-zoned.c | ||
| blk.h | ||
| bounce.c | ||
| bsg-lib.c | ||
| bsg.c | ||
| cfq-iosched.c | ||
| cmdline-parser.c | ||
| compat_ioctl.c | ||
| deadline-iosched.c | ||
| elevator.c | ||
| genhd.c | ||
| ioctl.c | ||
| ioprio.c | ||
| Kconfig | ||
| Kconfig.iosched | ||
| kyber-iosched.c | ||
| Makefile | ||
| mq-deadline.c | ||
| noop-iosched.c | ||
| opal_proto.h | ||
| partition-generic.c | ||
| scsi_ioctl.c | ||
| sed-opal.c | ||
| t10-pi.c | ||
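
The private refcounting described in the commit message above can be illustrated with a small userspace sketch. This is not the kernel code: `struct group`, `struct queue`, `group_create`, `group_get`, `group_put`, `queue_attach`, `queue_detach`, `group_pd_free` and the `groups_freed` counter are all hypothetical names chosen for illustration. The sketch also assumes that every call happens under a single lock (the role `bfqd->lock` plays in the commit message), so a plain integer counter is enough here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object describing a group inside the
 * scheduler; it is NOT the kernel's structure. */
struct group {
    int ref;     /* private reference count */
    int weight;  /* example of data cached while the lookup was safe */
};

/* Hypothetical stand-in for a queue that refers to its group. */
struct queue {
    struct group *grp;
};

/* Counts frees so that the lifetime rule can be observed from outside. */
static int groups_freed;

/* Create a group holding one initial reference, owned by the
 * cgroup-side policy data. Returns NULL on allocation failure. */
static struct group *group_create(int weight)
{
    struct group *g = malloc(sizeof(*g));

    if (!g)
        return NULL;
    g->ref = 1;
    g->weight = weight;
    return g;
}

/* Take one more reference. Assumed to run under the scheduler lock. */
static void group_get(struct group *g)
{
    g->ref++;
}

/* Drop one reference; free the group only when the last one goes away. */
static void group_put(struct group *g)
{
    assert(g->ref > 0);
    if (--g->ref == 0) {
        free(g);
        groups_freed++;
    }
}

/* A queue pins its group for as long as it points at it. */
static void queue_attach(struct queue *q, struct group *g)
{
    group_get(g);
    q->grp = g;
}

/* Detaching a queue releases the reference it held on its group. */
static void queue_detach(struct queue *q)
{
    group_put(q->grp);
    q->grp = NULL;
}

/* Analogue of the free hook mentioned in the commit message: instead of
 * freeing unconditionally, it only drops the initial reference. */
static void group_pd_free(struct group *g)
{
    group_put(g);
}
```

With this scheme, calling `group_pd_free` while a queue is still attached leaves the group alive, and the memory is released only when the last queue calls `queue_detach`. The design choice is that the free hook becomes just another reference drop, so the order in which the cgroup side and the queues let go of the group no longer matters.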