The calculation of the upper limit for queues does not depend solely on
the number of possible CPUs; for example, the isolcpus kernel
command-line option must also be considered.
To account for this, the block layer provides a helper function to
retrieve the maximum number of queues. Use it to set an appropriate
upper queue number limit.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-5-13923686b54b@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove the q argument from blk_rq_map_kern and the internal helpers
called by it as the queue can trivially be derived from the request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250507120451.4000627-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfTYxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjVKD/9qZjztmVq6Rk9RBjZwMYxcO9Nzj7qQQ6m9
S15eXslAA1eLec3p1Mx4oVaWoFranY03BClqCgywBUAgpYstnT9cEqkz0P+n6xIE
bNGjfxx4NInvrQYRETskc4wQqOnAdiRMd9i96EpHqW9Pi/pl8dSQxmxlaeo0BBIM
XDvodLhr38aLJwcNBQ9NKCfhJ7RruACuSiXRAsPH3D641ZpccW4ADuhYxhJDehKa
fMzuEFaa9/mBvmIrhE3QCbvWr7VYzSkadMLJWnxLsN1PU4FXZbh4Oy5Kp7DrA4Zq
YkwezSivjNWNqNsiyvVa63mKbxfe9MSh5odqWuLrkWr4cOzEOcpHRbNV2El5RK/x
BtGt/eCT2cRQAG4MzveoiE1yG9AAmUvUZL/RvxbERedqWO69IsgrsIsdnoiaLgw/
267eCeGQlpHGhVUKga7ouShlTowTaCLCi+XgJwUTsVP/VPuzEFwgkzX0J45bSPGd
h0laUzuHcThe8cRY2t5JWu+JJTqHj6ubsPeqiMAQzCns1C+IWYsjPXEohfqt7av+
2yoIwG9DCBfJfh0ml0t3yHHMSJzjcwQcQAw1P7loLI+TIDvrpVP7AYOVt4SYeXl4
RTEvNKQRmQGNZ8B3lGrqVKnbJ5ExBzvE6muQTOhockCTQsNK7WNaT2dMWRLyW6rW
HcdUkADDVg==
=MG/C
-----END PGP SIGNATURE-----
Merge tag 'block-6.14-20250313' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Concurrent pci error and hotplug handling fix (Keith)
- Endpoint function fixes (Damien)
- Fix for a regression introduced in this cycle with error checking for
batched request completions (Shin'ichiro)
* tag 'block-6.14-20250313' of git://git.kernel.dk/linux:
block: change blk_mq_add_to_batch() third argument type to bool
nvme: move error logging from nvme_end_req() to __nvme_end_req()
nvmet: pci-epf: Do not add an IRQ vector if not needed
nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created
nvme-pci: fix stuck reset on concurrent DPC and HP
request_queue param is no longer used by blk_rq_map_sg and
__blk_rq_map_sg. Remove it.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250313035322.243239-1-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 1f47ed294a ("block: cleanup and fix batch completion adding
conditions") modified the evaluation criteria for the third argument,
'ioerror', in the blk_mq_add_to_batch() function. Initially, the
function had checked if 'ioerror' equals zero. Following the commit, it
started checking for negative error values, with the presumption that
such values, for instance -EIO, would be passed in.
However, blk_mq_add_to_batch() callers do not pass negative error
values. Instead, they pass status codes defined in various ways:
- NVMe PCI and Apple drivers pass NVMe status code
- virtio_blk driver passes the virtblk request header status byte
- null_blk driver passes blk_status_t
These codes are either zero or positive, therefore the revised check
fails to function as intended. Specifically, with the NVMe PCI driver,
this modification led to the failure of the blktests test case nvme/039.
In this test scenario, errors are artificially injected to the NVMe
driver, resulting in positive NVMe status codes passed to
blk_mq_add_to_batch(), which unexpectedly processes the failed I/O in a
batch. Hence the failure.
To correct the ioerror check within blk_mq_add_to_batch(), make all
callers to uniformly pass the argument as boolean. Modify the callers to
check their specific status codes and pass the boolean value 'is_error'.
Also describe the arguments of blK_mq_add_to_batch as kerneldoc.
Fixes: 1f47ed294a ("block: cleanup and fix batch completion adding conditions")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20250311104359.1767728-3-shinichiro.kawasaki@wdc.com
[axboe: fold in documentation update]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmec72kQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm0pD/sENbY3LKqN8JipH8fWjLmdcimB7STB8MpB
LcuWiiuhDQuU1Yjr7c1h0Nkb6qwrb+gP+NQJrhlyUO7JVlvzFivJqsRscibNj+MF
hX/vUM2zKo0La2YxRv7aygTTCs8JfRmLKkTtfo0H4geljbX8/yMKSEJA+7x4j60X
TeJjmAHEKBGm2cNA7QMGqhzoUfjW1WQyn9x6/YOlLsYLYB8Cd7LFJqxM1tiDzUay
ys/QJ833qctdqBK5iLIh5voqZA58N3koidw536NLNgyvJn77CSAmC5jF1Vr0kFS6
Zb+dE9qRXDsND83dq3qvpSG5gtE42DygEG0g7s49zztd1rPOWE45RCxQtcZqHSvu
N9hN+/4SNX+02AfrgZFEk8+yGbtIlsb0mHRxqIKh8261tb1RTYVmtI4tqC3GVMjj
Qo41rDLDhaPOoKayo/KZO55n9U9M8BIkoF73hIIr5l9wHf5iOwi+EjzULlYhGigc
xypMvwWFLu3Of9YCJo38fTxGO6fL4eOJihoqqxNfeRACyQqbVeLWAT4wgUr2G8la
MIp2l5Vb7MmFXsES39fARLoN/8jjbgaXdxnSWhZlo4SPQaAlVA5ScXC4tKI3Gte3
O7fPVAX76PrZ+y6x4enfooI3mrpkeK676BxsokOX0m8WrVyVmIc4cfSaayc1gC5m
Uy298X93IA==
=molg
-----END PGP SIGNATURE-----
Merge tag 'block-6.14-20250131' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- MD pull request via Song:
- Fix a md-cluster regression introduced
- More sysfs race fixes
- Mark anything inside queue freezing as not being able to do IO for
memory allocations
- Fix for a regression introduced in loop in this merge window
- Fix for a regression in queue mapping setups introduced in this merge
window
- Fix for the block dio fops attempting an iov_iter revert upton
getting -EIOCBQUEUED on the read side. This one is going to stable as
well
* tag 'block-6.14-20250131' of git://git.kernel.dk/linux:
block: force noio scope in blk_mq_freeze_queue
block: fix nr_hw_queue update racing with disk addition/removal
block: get rid of request queue ->sysfs_dir_lock
loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
blk-mq: create correct map for fallback case
block: don't revert iter for -EIOCBQUEUED
When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock. Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.
Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A small number of improvements all over the place:
vdpa/octeon gained support for multiple interrupts
virtio-pci gained support for error recovery
vp_vdpa gained support for notification with data
vhost/net has been fixed to set num_buffers for spec compliance
virtio-mem now works with kdump on s390
Small cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmeXnAsPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpbsYH/0gfvGFBrILN3O06cWtm/ZEny6U86o3imvxm
5tBYOu/gh7yFqPHb3ywwz0Xy8Sty8zdIGVcod6+ioiS5JxV4m75/8eODZZHK/O+g
W+2ozgRFm07RIQX8qQxfN6MURTEw9GHWLPqHfLopbQtoKJbD0NpWnm272xlJkox2
SzuHJ2D1Sg3ItcRr0x1TVsjefQKUHFduS/nt2WfQWjCnEXEbCx3S+Jp6oFCoub6L
zgI6RLim9HdScgo5lXzbWEyJ4fEjWOypO3Z5IEXls8ZP/OEueCHZX3eZmfgbbfhP
/uCPhoIxHe4PJBFDRKogdNyV40Iq8LvF7RzhOtJjS7GFlf1bipM=
=PM05
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place:
- vdpa/octeon support for multiple interrupts
- virtio-pci support for error recovery
- vp_vdpa support for notification with data
- vhost/net fix to set num_buffers for spec compliance
- virtio-mem now works with kdump on s390
And small cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits)
virtio_blk: Add support for transport error recovery
virtio_pci: Add support for PCIe Function Level Reset
vhost/net: Set num_buffers for virtio 1.0
vdpa/octeon_ep: read vendor-specific PCI capability
virtio-pci: define type and header for PCI vendor data
vdpa/octeon_ep: handle device config change events
vdpa/octeon_ep: enable support for multiple interrupts per device
vdpa: solidrun: Replace deprecated PCI functions
s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
virtio-mem: remember usable region size
virtio-mem: mark device ready before registering callbacks in kdump mode
fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
fs/proc/vmcore: factor out freeing a list of vmcore ranges
fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
fs/proc/vmcore: move vmcore definitions out of kcore.h
fs/proc/vmcore: prefix all pr_* with "vmcore:"
fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
...
Add support for proper cleanup and re-initialization of virtio-blk devices
during transport reset error recovery flow.
This enhancement includes:
- Pre-reset handler (reset_prepare) to perform device-specific cleanup
- Post-reset handler (reset_done) to re-initialize the device
These changes allow the device to recover from various reset scenarios,
ensuring proper functionality after a reset event occurs.
Without this implementation, the device cannot properly recover from
resets, potentially leading to undefined behavior or device malfunction.
This feature has been tested using PCI transport with Function Level
Reset (FLR) as an example reset mechanism. The reset can be triggered
manually via sysfs (echo 1 > /sys/bus/pci/devices/$PCI_ADDR/reset).
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <1732690652-3065-3-git-send-email-israelr@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a helper that freezes the queue, updates the queue limits and
unfreezes the queue and convert all open coded versions of that to the
new helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250110054726.1499538-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
process passthrough commands (bsg-lib, ufs tmf, various nvme admin
queues) and thus don't even check the flag. Remove it to simplify the
driver interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace all users of blk_mq_virtio_map_queues with the more generic
blk_mq_map_hw_queues. This in preparation to retire
blk_mq_virtio_map_queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-7-27211e9c2cd5@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 4ce6e2db00 ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().
[1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/
Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.
Cc: Yi Sun <yi.sun@unisoc.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux.dev
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add requests to the tail of the list instead of the front so that they
are queued up in submission order.
Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs
and nvme_queue_rqs now that the list is ordered as expected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace the semi-open coded request list helpers with a proper rq_list
type that mirrors the bio_list and has head and tail pointers. Besides
better type safety this actually allows to insert at the tail of the
list, which will be useful soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_flush_plug_list submits requests in the reverse order that they
were submitted, which leads to a rather suboptimal I/O pattern
especially in rotational devices. Fix this by rewriting virtio_queue_rqs
so that it always pops the requests from the passed in request list, and
then adds them to the head of a local submit list. This actually
simplifies the code a bit as it removes the complicated list splicing,
at the cost of extra updates of the rq_next pointer. As that should be
cache hot anyway it should be an easy price to pay.
Fixes: 0e9911fa76 ("virtio-blk: support mq_ops->queue_rqs()")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241113152050.157179-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several new features here:
- Virtio find vqs API has been reworked
(required to fix the scalability issue we have with
adminq, which I hope to merge later in the cycle)
- vDPA driver for Marvell OCTEON
- virtio fs performance improvement
- mlx5 migration speedups
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ
tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW
aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf
tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH
atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n
WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ=
=LYua
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- Virtio find vqs API has been reworked (required to fix the
scalability issue we have with adminq, which I hope to merge later
in the cycle)
- vDPA driver for Marvell OCTEON
- virtio fs performance improvement
- mlx5 migration speedups
Fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits)
virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
virtio_balloon: convert to use virtio_find_vqs_info()
virtiofs: convert to use virtio_find_vqs_info()
scsi: virtio_scsi: convert to use virtio_find_vqs_info()
virtio_net: convert to use virtio_find_vqs_info()
virtio_crypto: convert to use virtio_find_vqs_info()
virtio_console: convert to use virtio_find_vqs_info()
virtio_blk: convert to use virtio_find_vqs_info()
virtio: rename find_vqs_info() op to find_vqs()
virtio: remove the original find_vqs() op
virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
virtio: convert find_vqs() op implementations to find_vqs_info()
virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
virtio: introduce virtio_queue_info struct and find_vqs_info() config op
virtio: make virtio_find_single_vq() call virtio_find_vqs()
virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
caif_virtio: use virtio_find_single_vq() for single virtqueue finding
vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
...
Since the original virtio_find_vqs() is no longer present, rename
virtio_find_vqs_info() back to virtio_find_vqs().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20240708074814.1739223-20-jiri@resnulli.us>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of passing separate names and callbacks arrays
to virtio_find_vqs(), allocate one of virtual_queue_info structs and
pass it to virtio_find_vqs_info().
Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20240708074814.1739223-11-jiri@resnulli.us>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The block queue limits validation does this for us now.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240708091651.177447-5-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we fail to read a logical block size in virtblk_read_limits() ->
virtio_cread_feature(), then we default to what is in
lim->logical_block_size, but that would be 0.
We can deal with lim->logical_block_size = 0 later in the
blk_mq_alloc_disk(), but the code in virtblk_read_limits() needs a proper
default, so give a default of SECTOR_SIZE.
Fixes: 27e32cd23f ("block: pass a queue_limits argument to blk_mq_alloc_disk")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240708091651.177447-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that device mapper can handle resetting all zones of a mapped zoned
device using REQ_OP_ZONE_RESET_ALL, all zoned block device drivers
support this operation. With this, the request queue feature
BLK_FEAT_ZONE_RESETALL is not necessary and the emulation code in
blk-zone.c can be removed.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240704052816.623865-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zone_resetall flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zoned flags into the features field to reclaim a little
bit of space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the nonrot flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.
Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.
For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
rotational flag is not set as they clearly are not rotational despite
this being a behavior change. There are some other drivers that
unconditionally set the rotational flag to keep the existing behavior
as they arguably can be used on rotational devices even if that is
probably not their main use today (e.g. virtio_blk and drbd).
The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.
Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer. Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.
The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.
The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0. The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
virtblk_update_cache_mode boils down to a single call to
blk_queue_write_cache. Remove it in preparation for moving the cache
control flags into the queue_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240617060532.127975-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several new features here:
- virtio-net is finally supported in vduse.
- Virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmZN570PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp2JUH/1K3fZOHymop6Y5Z3USFS7YdlF+dniedY/vg
TKyWERkXOlxq1d9DVxC0mN7tk72DweuWI0YJjLXofrEW1VuW29ecSbyFXxpeWJls
b7ErffxDAFRas5jkMCngD8TuFnbEegU0mGP5kbiHpEndBydQ2hH99Gg0x7swW+cE
xsvU5zonCCLwLGIP2DrVrn9qGOHtV6o8eZfVKDVXfvicn3lFBkUSxlwEYsO9RMup
aKxV4FT2Pb1yBicwBK4TH1oeEXqEGy1YLEn+kAHRbgoC/5L0/LaiqrkzwzwwOIPj
uPGkacf8CIbX0qZo5EzD8kvfcYL1xhU3eT9WBmpp2ZwD+4bINd4=
=nax1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-6-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The only user of blk_revalidate_disk_zones() second argument was the
SCSI disk driver (sd). Now that this driver does not require this
update_driver_data argument, remove it to simplify the interface of
blk_revalidate_disk_zones(). Also update the function kdoc comment to
be more accurate (i.e. there is no gendisk ->revalidate method).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-21-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In virtblk_read_zoned_limits(), setting a zoned block device maximum
number of open and active zones using the functions
disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect
as setting the limits for the request queue is now done atomically when
the gendisk is created (with blk_mq_alloc_disk()). The value set by the
disk_set_max_open/active_zones() functions will be overwritten.
Fix this by setting the maximum number of open and active zones directly
in the queue_limits structure passed to virtblk_read_zoned_limits().
Fixes: 8b83725656 ("virtio_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240301192639.410183-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Call virtblk_read_limits and most of virtblk_probe_zoned_device before
allocating the gendisk and thus request_queue and make them read into
a queue_limits structure instead. Pass this initialized queue_limits
to blk_mq_alloc_disk to set the queue up with the right parameters
from the start and only leave a few final touches for zoned devices
to be done just before adding the disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Split out a virtblk_read_limits helper that just reads the various
queue limits to separate it from the higher level probing logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure no remaining requests in virtqueues before resetting vdev and
deleting virtqueues. Otherwise these requests will never be completed.
It may cause the system to become unresponsive.
Function blk_mq_quiesce_queue() can ensure that requests have become
in_flight status, but it cannot guarantee that requests have been
processed by the device. Virtqueues should never be deleted before
all requests become complete status.
Function blk_mq_freeze_queue() ensure that all requests in virtqueues
become complete status. And no requests can enter in virtqueues.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmWpoCgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqIUEADFvJdC2izkPzYzsOrMK5Rt1H7vaHGKhbA+
zWCuQaa1xQd8bazq+NVnQpbzgclkE/WodTCNfNXcTTjzeQEmcZC888llP3Y9vwyP
XfEKH7fSaeKvGigJLro1oPe3YV7/t89F5ol3BoZayfzJF8GEU9BXRWzgOkZzijnk
xdm5wUyn/GknksMuQQraZ+U6bQRFLBOulzoaQeMD6Dosx+uRlM4WvAJawC+uOV6R
qPT2BVSfYGzmgEKvoaphw0FMkUhFBMDHfXTpQBi5tIzTKOaof8tynYEGz0FHZWeh
V0JEEp+3jLWFxFXeEcXgBVPJPE8J0DzGm9g17/uwC2Yhmlbw4FKZVRvGG+PpeUso
D5aqhqm3w0x7HgZ7JKwy/aUctADYvjVcSVzPHTaFK0aCSYCIAXxqv4p7fOoxPqyx
T32IUHTzGtkCdqzv/xFdtTYhTNM2vyzzbbWj5lXgCBqHsXOVbCh8UM2p+9ec2Umq
Fo1XF9eoCDe6Sn4s15hJ5G4DEhKGOKkHluvRUdM+0selA5b0sNOeUqlAf2v+0ve3
Pv3e3X4NPssNIEcsDHf5pc3zGC+LXRS0oFvfIvDESBjwXc3iHIMl+SkjyS57P4Fd
RKrHEUUiACuCKO/IWqFYLiNBNHnP3RmV5gSxIZr9QJhFSwOzP+/+4++TCdF5vdAV
amhv+0PdCw==
=DLW9
-----END PGP SIGNATURE-----
Merge tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- tcp, fc, and rdma target fixes (Maurizio, Daniel, Hannes,
Christoph)
- discard fixes and improvements (Christoph)
- timeout debug improvements (Keith, Max)
- various cleanups (Daniel, Max, Giuxen)
- trace event string fixes (Arnd)
- shadow doorbell setup on reset fix (William)
- a write zeroes quirk for SK Hynix (Jim)
- MD pull request via Song:
- Sparse warning since v6.0 (Bart)
- /proc/mdstat regression since v6.7 (Yu Kuai)
- Use symbolic error value (Christian)
- IO Priority documentation update (Christian)
- Fix for accessing queue limits without having entered the queue
(Christoph, me)
- Fix for loop dio support (Christoph)
- Move null_blk off deprecated ida interface (Christophe)
- Ensure nbd initializes full msghdr (Eric)
- Fix for a regression with the folio conversion, which is now easier
to hit because of an unrelated change (Matthew)
- Remove redundant check in virtio-blk (Li)
- Fix for a potential hang in sbitmap (Ming)
- Fix for partial zone appending (Damien)
- Misc changes and fixes (Bart, me, Kemeng, Dmitry)
* tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linux: (45 commits)
Documentation: block: ioprio: Update schedulers
loop: fix the the direct I/O support check when used on top of block devices
blk-mq: Remove the hctx 'run' debugfs attribute
nbd: always initialize struct msghdr completely
block: Fix iterating over an empty bio with bio_for_each_folio_all
block: bio-integrity: fix kcalloc() arguments order
virtio_blk: remove duplicate check if queue is broken in virtblk_done
sbitmap: remove stale comment in sbq_calc_wake_batch
block: Correct a documentation comment in blk-cgroup.c
null_blk: Remove usage of the deprecated ida_simple_xx() API
block: ensure we hold a queue reference when using queue limits
blk-mq: rename blk_mq_can_use_cached_rq
block: print symbolic error name instead of error code
blk-mq: fix IO hang from sbitmap wakeup race
nvmet-rdma: avoid circular locking dependency on install_queue()
nvmet-tcp: avoid circular locking dependency on install_queue()
nvme-pci: set doorbell config before unquiescing
block: fix partial zone append completion handling in req_bio_endio()
block/iocost: silence warning on 'last_period' potentially being unused
md/raid1: Use blk_opf_t for read and write operations
...
virtqueue_enable_cb() will call virtqueue_poll() which will check if
queue is broken at beginning, so remove the virtqueue_is_broken() call
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmWcIOIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpn6hD/9oO7U75PuxUwYYHZ9Uzxpw6gQ0LEmeyJmE
NQYCkfYHVq3IsgOdF7elI9v3qtr6v8V8CdB7cByrnn3DgwsMuiTKZZ0dK7vH37PO
DX+/xn349e8oH7RdRo7f3m95g1YbHfpfnj0Rc4mjTDV72Jr/HlLTVgGTQg8DEnCR
wBIFmeuBHHgeeLh87gsWLAP7ReReiy9V1uqpDFsko2/4BxRAM/8eedkwcAxD8aEy
rd+dT/SBQj2cOdQMUeExT3gWjwzHh6ZHx3f1WCLK5fdck6BogH2hBUeri6F/H98L
HoaXjBZYBTH68hB/mnO5I4g1ZlrVM74Vp7JPa3e1SFFtyEi6lsyrk2J3GoNh0E7r
pXqH5kAcaJwBsBrbRGuvEyGbn9RLTaN5Gvseud0VE4oMruyodTniQaHXuIGackgz
sMavMho4486EUWPaF7gIBdLNK1hO13w+IDZ4+3oBxhudMqdgZbk4iYpOCqQ7QY5G
2vkzAE/sZ+aVNXeaIQOI8dE5clBy8gJ+6+t8dm3DY1r1xdbcnU40iZ8/fri3h69r
vHs9bpQnVWZF0gEyEflY1pkcAPpIkvMmWCR7Ehy5YCkIfa+qfSL05o3dicpWovLP
N+gCtpkhTK2AvmUWsUMypMLRvoSOImyCIiobrr3qNBaUdgRP8xKfUa72RuRp8cGl
Vrj5oAiE3w==
=YAfp
-----END PGP SIGNATURE-----
Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
"Pretty quiet round this time around. This contains:
- NVMe updates via Keith:
- nvme fabrics spec updates (Guixin, Max)
- nvme target udpates (Guixin, Evan)
- nvme attribute refactoring (Daniel)
- nvme-fc numa fix (Keith)
- MD updates via Song:
- Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai)
- Fix raid5 hang issue (Junxiao Bi)
- Add Yu Kuai as Reviewer of the md subsystem
- Remove deprecated flavors (Song Liu)
- raid1 read error check support (Li Nan)
- Better handle events off-by-1 case (Alex Lyakas)
- Efficiency improvements for passthrough (Kundan)
- Support for mapping integrity data directly (Keith)
- Zoned write fix (Damien)
- rnbd fixes (Kees, Santosh, Supriti)
- Default to a sane discard size granularity (Christoph)
- Make the default max transfer size naming less confusing
(Christoph)
- Remove support for deprecated host aware zoned model (Christoph)
- Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel,
Bart, Christoph)"
* tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits)
block: Treat sequential write preferred zone type as invalid
block: remove disk_clear_zoned
sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics
drivers/block/xen-blkback/common.h: Fix spelling typo in comment
blk-cgroup: fix rcu lockdep warning in blkg_lookup()
blk-cgroup: don't use removal safe list iterators
block: floor the discard granularity to the physical block size
mtd_blkdevs: use the default discard granularity
bcache: use the default discard granularity
zram: use the default discard granularity
null_blk: use the default discard granularity
nbd: use the default discard granularity
ubd: use the default discard granularity
block: default the discard granularity to sector size
bcache: discard_granularity should not be smaller than a sector
block: remove two comments in bio_split_discard
block: rename and document BLK_DEF_MAX_SECTORS
loop: don't abuse BLK_DEF_MAX_SECTORS
aoe: don't abuse BLK_DEF_MAX_SECTORS
null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS
...
Only use disk_set_zoned to actually enable zoned device support.
For clearing it, call disk_clear_zoned, which is renamed from
disk_clear_zone_settings and now directly clears the zoned flag as
well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):
- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)
Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.
Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
virtblk_revalidate_zones is called unconditionally from
virtblk_config_changed_work from the virtio config_changed callback.
virtblk_revalidate_zones is a bit odd in that it re-clears the zoned
state for host aware or non-zoned devices, which isn't needed unless the
zoned mode changed - but a zone mode change to a host managed model isn't
handled at all, and virtio_blk also doesn't handle any other config
change except for a capacity change is handled (and even if it was
the upper layers above virtio_blk wouldn't handle it very well).
But even the useful case of a size change that would add or remove
zones isn't handled properly as blk_revalidate_disk_zones expects the
device capacity to cover all zones, but the capacity is only updated
after virtblk_revalidate_zones.
As this code appears to be entirely untested and is getting in the way
remove it for now, but it can be readded in a fixed version with
proper test coverage if needed.
Fixes: 95bfec41bd ("virtio-blk: add support for zoned block devices")
Fixes: f1ba4e674f ("virtio-blk: fix to match virtio spec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move reading and checking the zoned model from virtblk_probe_zoned_device
into the caller, leaving only the code to perform the actual setup for
host managed zoned devices in virtblk_probe_zoned_device.
This allows to share the model reading and sharing between builds with
and without CONFIG_BLK_DEV_ZONED, and improve it for the
!CONFIG_BLK_DEV_ZONED case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 4e04005256 ("virtio-blk: support polling I/O") triggers the
following gcc 13 W=1 warnings:
drivers/block/virtio_blk.c: In function ‘init_vq’:
drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~
drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~
drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a false positive because the lower bound -2147483648 is
incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs <
65536.
The code mixes int, unsigned short, and unsigned int types in addition
to using "%d" for an unsigned value. Use unsigned short and "%u"
consistently to solve the compiler warning.
Cc: Suwan Kim <suwan.kim027@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231204140743.1487843-1-stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vdpa/mlx5:
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
new maintainer
vdpa:
support for vq descriptor mappings
decouple reset of iotlb mapping from device reset
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVCUMYPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp4L0H/RKcnNXPRqzhhBI1XVQ11Z8CO8WjovcmJalu
ADHNEGmvuWnY79fp9eLiZ4iVaTx1qbzqIB5Q500DJ65jh71W7UQ8ww6CGjNUoRGs
Zoe4G09WoOf4bvDZZzVV7ml/AzMdsHWSZK8pxY3QI9CsC9Zfp9hg20QYxPylCqYx
SIJx7w2MkoojfmtOHRx1WUxaQz99yfU4Z0C5PxtRE1HGN6/a1aY0P0CAl5jq8uCK
U5sCRsfCmP7VKlspeEddMiPA35ADbCiysSobCbwGVQEs5cHpMUX7KWa+oV0tF/PY
9uyJb2rJy6zG3tXmL4XNib665ZR86HX6qiWRfm2nBQQStuHaJyg=
=mXgo
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"vhost,virtio,vdpa: features, fixes, cleanups.
vdpa/mlx5:
- VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
- new maintainer
vdpa:
- support for vq descriptor mappings
- decouple reset of iotlb mapping from device reset
and fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
vdpa_sim: implement .reset_map support
vdpa/mlx5: implement .reset_map driver op
vhost-vdpa: clean iotlb map during reset for older userspace
vdpa: introduce .compat_reset operation callback
vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
vhost-vdpa: reset vendor specific mapping to initial state in .release
vdpa: introduce .reset_map operation callback
virtio_pci: add check for common cfg size
virtio-blk: fix implicit overflow on virtio_max_dma_size
virtio_pci: add build offset check for the new common cfg items
virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
vduse: make vduse_class constant
vhost-scsi: Spelling s/preceeding/preceding/g
virtio: kdoc for struct virtio_pci_modern_device
vdpa: Update sysfs ABI documentation
MAINTAINERS: Add myself as mlx5_vdpa driver
virtio-balloon: correct the comment of virtballoon_migratepage()
mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
vdpa/mlx5: Update cvq iotlb mapping on ASID change
vdpa/mlx5: Make iotlb helper functions more generic
...
The following codes have an implicit conversion from size_t to u32:
(u32)max_size = (size_t)virtio_max_dma_size(vdev);
This may lead overflow, Ex (size_t)4G -> (u32)0. Once
virtio_max_dma_size() has a larger size than U32_MAX, use U32_MAX
instead.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230904061045.510460-1-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>