linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 23:46:45 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	c6dc26df6b	netfilter pull request 25-07-25 Merge tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following series contains Netfilter/IPVS updates for net-next: 1) Display netns inode in conntrack table full log, from lvxiafei. 2) Autoload nf_log_syslog in case no logging backend is available, from Lance Yang. 3) Three patches to remove unused functions in x_tables, nf_tables and conntrack. From Yue Haibing. 4) Exclude LEGACY TABLES on PREEMPT_RT: Add NETFILTER_XTABLES_LEGACY to exclude xtables legacy infrastructure. 5) Restore selftests by toggling NETFILTER_XTABLES_LEGACY where needed. From Florian Westphal. 6) Use CONFIG_INET_SCTP_DIAG in tools/testing/selftests/net/netfilter/config, from Sebastian Andrzej Siewior. 7) Use timer_delete in comment in IPVS codebase, from WangYuli. 8) Dump flowtable information in nfnetlink_hook, this includes an initial patch to consolidate common code in helper function, from Phil Sutter. 9) Remove unused arguments in nft_pipapo set backend, from Florian Westphal. 
10) Return nft_set_ext instead of boolean in set lookup function, from Florian Westphal. 11) Remove indirection in dynamic set infrastructure, also from Florian. 12) Consolidate pipapo_get/lookup, from Florian. 13) Use kvmalloc in nft_pipapo, from Florian Westphal. 14) syzbot reports slab-out-of-bounds in xt_nfacct log message, fix from Florian Westphal. 15) Ignore tainted kernels in selftest nft_interface_stress.sh, from Phil Sutter. 16) Fix IPVS selftest by disabling rp_filter with ipip tunnel device, from Yi Chen. * tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0 selftests: netfilter: Ignore tainted kernels in interface stress test netfilter: xt_nfacct: don't assume acct name is null-terminated netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps netfilter: nft_set_pipapo: merge pipapo_get/lookup netfilter: nft_set: remove indirection from update API call netfilter: nft_set: remove one argument from lookup and update functions netfilter: nft_set_pipapo: remove unused arguments netfilter: nfnetlink_hook: Dump flowtable info netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper ipvs: Rename del_timer in comment in ip_vs_conn_expire_now() selftests: netfilter: Enable CONFIG_INET_SCTP_DIAG selftests: net: Enable legacy netfilter legacy options. netfilter: Exclude LEGACY TABLES on PREEMPT_RT. netfilter: conntrack: Remove unused net in nf_conntrack_double_lock() netfilter: nf_tables: Remove unused nft_reduce_is_readonly() netfilter: x_tables: Remove unused functions xt_{in|out}name() netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid netfilter: conntrack: table full detailed log ==================== Link: https://patch.msgid.link/20250725170340.21327-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-25 16:37:55 -07:00
Gabriel Goller	f24987ef69	ipv6: add `force_forwarding` sysctl to enable per-interface forwarding It is currently impossible to enable ipv6 forwarding on a per-interface basis like in ipv4. To enable forwarding on an ipv6 interface we need to enable it on all interfaces and disable it on the other interfaces using a netfilter rule. This is especially cumbersome if you have lots of interfaces and only want to enable forwarding on a few. According to the sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding for all interfaces, while the interface-specific `net.ipv6.conf.<interface>.forwarding` configures the interface Host/Router configuration. Introduce a new sysctl flag `force_forwarding`, which can be set on every interface. The ip6_forwarding function will then check if the global forwarding flag OR the force_forwarding flag is active and forward the packet. To preserve backwards-compatibility reset the flag (on all interfaces) to 0 if the net.ipv6.conf.all.forwarding flag is set to 0. Add a short selftest that checks if a packet gets forwarded with and without `force_forwarding`. [0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Gabriel Goller <g.goller@proxmox.com> Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-25 13:06:19 -07:00
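The forwarding decision described in the `force_forwarding` commit above can be sketched as follows. This is a minimal Python model, not kernel code: the names `Ipv6Conf`, `should_forward`, and `set_all_forwarding` are hypothetical and only mirror the behaviour stated in the message (forward when the global flag OR the per-interface `force_forwarding` flag is set, and clear `force_forwarding` on all interfaces when `net.ipv6.conf.all.forwarding` is set to 0).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Ipv6Conf:
    """Hypothetical model of the IPv6 forwarding sysctls (not a kernel structure)."""
    all_forwarding: bool = False
    # interface name -> per-interface force_forwarding flag
    force_forwarding: Dict[str, bool] = field(default_factory=dict)


def should_forward(conf: Ipv6Conf, ifname: str) -> bool:
    """Forward if the global flag OR the interface's force_forwarding flag is set."""
    return conf.all_forwarding or conf.force_forwarding.get(ifname, False)


def set_all_forwarding(conf: Ipv6Conf, value: bool) -> None:
    """Model a write to net.ipv6.conf.all.forwarding.

    Writing 0 also resets force_forwarding on every interface, which is the
    backwards-compatibility rule described in the commit message.
    """
    conf.all_forwarding = value
    if not value:
        for name in conf.force_forwarding:
            conf.force_forwarding[name] = False
```

In this model, an interface with `force_forwarding` set forwards packets even while the global flag is 0, and writing 0 to the global flag afterwards returns every interface to the non-forwarding state.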
Phil Sutter	bc8c43adfd	netfilter: nfnetlink_hook: Dump flowtable info Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks from base chain ones. Nested attributes are shared with the old NFTABLES hook info type since they fit apart from their misleading name. Old nftables in user space will ignore this new hook type and thus continue to print flowtable hooks just like before, e.g.: | family netdev { | hook ingress device test0 { | 0000000000 nf_flow_offload_ip_hook [nf_flow_table] | } | } With this patch in place and support for the new hook info type, output becomes more useful: | family netdev { | hook ingress device test0 { | 0000000000 flowtable ip mytable myft [nf_flow_table] | } | } Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-25 18:40:01 +02:00
Samiullah Khawaja	8e7583a4f6	net: define an enum for the napi threaded state Instead of using '0' and '1' for napi threaded state use an enum with 'disabled' and 'enabled' states. Tested: ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-24 18:34:55 -07:00
Jakub Kicinski	126d85fb04	Another wireless update: - rtw89: - STA+P2P concurrency - support for USB devices RTL8851BU/RTL8852BU - ath9k: OF support - ath12k: - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - iwlwifi: some FIPS interoperability - brcm80211: support SDIO 43751 device - rt2x00: better DT/OF support - cfg80211/mac80211: - improved S1G support - beacon monitor for MLO Merge tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another wireless update: - rtw89: - STA+P2P concurrency - support for USB devices RTL8851BU/RTL8852BU - ath9k: OF support - ath12k: - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - iwlwifi: some FIPS interoperability - brcm80211: support SDIO 43751 device - rt2x00: better DT/OF support - cfg80211/mac80211: - improved S1G support - beacon monitor for MLO * tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (199 commits) ssb: use new GPIO line value setter callbacks for the second GPIO chip wifi: Fix typos wifi: brcmsmac: Use 
str_true_false() helper wifi: brcmfmac: fix EXTSAE WPA3 connection failure due to AUTH TX failure wifi: brcm80211: Remove yet more unused functions wifi: brcm80211: Remove more unused functions wifi: brcm80211: Remove unused functions wifi: iwlwifi: Revert "wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions" wifi: iwlwifi: check validity of the FW API range wifi: iwlwifi: don't export symbols that we shouldn't wifi: iwlwifi: mld: use spec link id and not FW link id wifi: iwlwifi: mld: decode EOF bit for AMPDUs wifi: iwlwifi: Remove support for rx OMI bandwidth reduction wifi: iwlwifi: stop supporting iwl_omi_send_status_notif ver 1 wifi: iwlwifi: remove SC2F firmware support wifi: iwlwifi: mvm: Remove NAN support wifi: iwlwifi: mld: avoid outdated reorder buffer head_sn wifi: iwlwifi: mvm: avoid outdated reorder buffer head_sn wifi: iwlwifi: disable certain features for fips_enabled wifi: iwlwifi: mld: support channel survey collection for ACS scans ... ==================== Link: https://patch.msgid.link/20250724100349.21564-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-24 17:25:42 -07:00
Frank Li	eefb83790a	misc: pci_endpoint_test: Add doorbell test case Add doorbell support with the help of three new registers: PCIE_ENDPOINT_TEST_DB_BAR, PCIE_ENDPOINT_TEST_DB_ADDR, and PCIE_ENDPOINT_TEST_DB_DATA. The testcase works by triggering the doorbell in Endpoint by writing the value from PCI_ENDPOINT_TEST_DB_DATA register to the address provided by PCI_ENDPOINT_TEST_DB_OFFSET register of the BAR indicated by the PCIE_ENDPOINT_TEST_DB_BAR register and waiting for the completion status from the Endpoint. Signed-off-by: Frank Li <Frank.Li@nxp.com> [mani: removed one spurious change and reworded the commit message] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250710-ep-msi-v21-7-57683fc7fb25@nxp.com	2025-07-24 16:51:46 -05:00
Chia-Yu Chang	d4de8bffbe	sched: Dump configuration and statistics of dualpi2 qdisc The configuration and statistics dump of the DualPI2 Qdisc provides information related to both queues, such as packet numbers and queuing delays in the L-queue and C-queue, as well as general information such as probability value, WRR credits, memory usage, packet marking counters, max queue size, etc. The following patch includes enqueue/dequeue for DualPI2. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Link: https://patch.msgid.link/20250722095915.24485-3-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-23 17:52:07 -07:00
Chia-Yu Chang	320d031ad6	sched: Struct definition and parsing of dualpi2 qdisc DualPI2 is the reference implementation of IETF RFC9332 DualQ Coupled AQM (https://datatracker.ietf.org/doc/html/rfc9332) providing two queues called low latency (L-queue) and classic (C-queue). By default, it enqueues non-ECN and ECT(0) packets into the C-queue and ECT(1) and CE packets into the low latency queue (L-queue), as per IETF RFC9332 spec. This patch defines the dualpi2 Qdisc structure and parsing, and the following two patches include dumping and enqueue/dequeue for the DualPI2. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Link: https://patch.msgid.link/20250722095915.24485-2-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-23 17:52:07 -07:00
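The default queue selection described in the DualPI2 commit above can be sketched as follows. This is a minimal Python illustration, not the qdisc code: the function name `classify` is hypothetical, and only the default behaviour stated in the message is modelled (non-ECN and ECT(0) go to the C-queue, ECT(1) and CE go to the L-queue). The ECN codepoint values are the standard ones from RFC 3168, taken from the two low bits of the IP TOS / traffic class byte.

```python
# ECN codepoints as defined in RFC 3168 (two low-order bits of the TOS byte).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify(tos: int) -> str:
    """Return "L" for the low latency queue or "C" for the classic queue.

    Models only the default rule from the commit message:
    ECT(1) and CE -> L-queue, Not-ECT and ECT(0) -> C-queue.
    """
    ecn = tos & 0b11
    return "L" if ecn in (ECT_1, CE) else "C"
```

For instance, `classify(0x02)` (ECT(0)) selects the C-queue, while `classify(0x01)` (ECT(1)) selects the L-queue.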
Carolina Jubran	1bbdb81a98	devlink: Fix excessive stack usage in rate TC bandwidth parsing The devlink_nl_rate_tc_bw_parse function uses a large stack array for devlink attributes, which triggers a warning about excessive stack usage: net/devlink/rate.c: In function 'devlink_nl_rate_tc_bw_parse': net/devlink/rate.c:382:1: error: the frame size of 1648 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] Introduce a separate attribute set specifically for rate TC bandwidth parsing that only contains the two attributes actually used: index and bandwidth. This reduces the stack array from DEVLINK_ATTR_MAX entries to just 2 entries, solving the stack usage issue. Update devlink selftest to use the new 'index' and 'bw' attribute names consistent with the YAML spec. Example usage with ynl with the new spec: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"index": 0, "bw": 50}, {"index": 1, "bw": 50}, {"index": 2, "bw": 0}, {"index": 3, "bw": 0}, {"index": 4, "bw": 0}, {"index": 5, "bw": 0}, {"index": 6, "bw": 0}, {"index": 7, "bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'bw': 50, 'index': 0}, {'bw': 50, 'index': 1}, {'bw': 0, 'index': 2}, {'bw': 0, 'index': 3}, {'bw': 0, 'index': 4}, {'bw': 0, 'index': 5}, {'bw': 0, 'index': 6}, {'bw': 0, 'index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Fixes: `566e8f108f` ("devlink: Extend devlink rate API with traffic classes bandwidth management") Reported-by: Arnd Bergmann <arnd@arndb.de> Closes: https://lore.kernel.org/netdev/20250708160652.1810573-1-arnd@kernel.org/ Reported-by: kernel test robot 
<lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507171943.W7DJcs6Y-lkp@intel.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Tested-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/1753175609-330621-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-23 17:07:35 -07:00
Yishai Hadas	a272019a46	IB: Extend UVERBS_METHOD_REG_MR to get DMAH Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers. It will be used in mlx5 driver as part of the next patch from the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-23 01:42:11 -04:00
Yishai Hadas	d83edab562	RDMA/core: Introduce a DMAH object and its alloc/free APIs Introduce a new DMA handle (DMAH) object along with its corresponding allocation and deallocation APIs. This DMAH object encapsulates attributes intended for use in DMA transactions. While its initial purpose is to support TPH functionality, it is designed to be extensible for future features such as DMA PCI multipath, PCI UIO configurations, PCI traffic class selection, and more. Further details: ---------------- We ensure that a caller requesting a DMA handle for a specific CPU ID is permitted to be scheduled on it. This prevents a potential security issue where a non-privileged user may trigger DMA operations toward a CPU that it's not allowed to run on. We manage reference counting for the DMAH object and its consumers (e.g., memory regions) as will be detailed in subsequent patches in the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/2cad097e849597e49d6b61e6865dba878257f371.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-23 01:42:10 -04:00
Yishai Hadas	5b2e45049d	IB/core: Add UVERBS_METHOD_REG_MR on the MR object This new method enables us to use a single ioctl from user space which supports the below variants of reg_mr [1]. The method will be extended in the next patches from the series with an extra attribute to let us pass DMA handle to be used as part of the registration. [1] ibv_reg_mr(), ibv_reg_mr_iova(), ibv_reg_mr_iova2(), ibv_reg_dmabuf_mr(). Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/5a3822ceef084efe967c9752e89c58d8250337c7.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-23 01:42:10 -04:00
Jakub Kicinski	fbe09277fa	ethtool: rss: support removing contexts via Netlink Implement removing additional RSS contexts via Netlink. Technically it'd be possible to shoehorn the delete operation into ethnl_request_ops-compatible handler. The code ends up longer than open coded version, and I think we'll need a custom way of sending notifications at some stage (if we allow tying the context lifetime to the netlink socket, in the future). Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-21 18:21:19 -07:00
Jakub Kicinski	a166ab7816	ethtool: rss: support creating contexts via Netlink Support creating contexts via Netlink. Setting flow hashing fields on the new context is not supported at this stage, it can be added later. An empty indirection table is not supported. This is a carry over from the IOCTL interface where empty indirection table meant delete. We can repurpose empty indirection table in Netlink but for now to avoid confusion reject it using the policy. Support letting user choose the ID for the new context. This was not possible in IOCTL since the context ID field for the create action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value. Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-21 18:20:43 -07:00
David Sterba	009b2056cb	btrfs: defrag: add flag to force no-compression Currently the defrag ioctl cannot rewrite the extents without compression. Add a new flag for that, as setting compression to 0 (or "no compression") means to do no changes to compression so take what is the current default, like mount options or properties. The defrag setting overrides mount or properties. The compression BTRFS_DEFRAG_DONT_COMPRESS is only used for in-memory operations and does not need to have a fixed value. Mount with zstd:9, copy test file from /usr/bin/ (about 260KB): $ mount -o compress=zstd:9 /dev/vda /mnt $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13364.. 13491: 128: 13440: encoded 2: 256.. 291: 13424.. 13459: 36: 13492: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 42% 124K 292K 292K zstd 42% 124K 292K 292K Defrag to uncompressed: $ btrfs fi defrag --nocomp testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 291: 291840.. 292131: 292: last,eof testfile: 1 extent found $ compsize testfile Processed 1 file, 1 regular extents (1 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 100% 292K 292K 292K none 100% 292K 292K 292K Compress again with LZO: $ btrfs fi defrag -clzo testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. 
Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13392.. 13519: 128: 13440: encoded 2: 256.. 291: 13480.. 13515: 36: 13520: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 64% 188K 292K 292K lzo 64% 188K 292K 292K Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-22 01:13:03 +02:00
Greg Kroah-Hartman	bcbef1e4a6	Linux 6.16-rc7 Merge tag 'v6.16-rc7' into tty-next We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-21 16:53:33 +02:00
Xu Yilun	850f14f5b9	iommufd: Destroy vdevice on idevice destroy Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so that vdev can't outlive idev. idev represents the physical device bound to iommufd, while the vdev represents the virtual instance of the physical device in the VM. The lifecycle of the vdev should not be longer than idev. This doesn't cause a real problem in existing use cases because vdev doesn't impact the physical device and only provides virtualization information. But to extend vdev for Confidential Computing (CC), secure configuration of the vdev is needed, e.g. TSM Bind/Unbind. These configurations should be rolled back on idev destroy, or the external driver (VFIO) functionality may be impacted. The idev is created by the external driver so its destruction can't fail. The idev implements the pre_destroy() op to actively remove its associated vdev before destroying itself. There are 3 cases on idev pre_destroy(): 1. vdev is already destroyed by userspace. No extra handling needed. 2. vdev is still alive. Use iommufd_object_tombstone_user() to destroy vdev and tombstone the vdev ID. 3. vdev is being destroyed by userspace. The vdev ID is already freed, but the vdev destroy handler is not completed. This requires multi-thread syncing - vdev holds idev's short term users reference until vdev destruction completes, idev leverages the existing wait_shortterm mechanism for syncing. idev should also block any new reference to it after pre_destroy(), or the following wait shortterm would time out. Introduce a 'destroying' flag, set it to true on idev pre_destroy(). Any attempt to reference idev should honor this flag under the protection of idev->igroup->lock. 
Link: https://patch.msgid.link/r/20250716070349.1807226-5-yilun.xu@linux.intel.com Originally-by: Nicolin Chen <nicolinc@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> Signed-off-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-18 17:33:08 -03:00
Alexei Starovoitov	beb1097ec8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-18 12:15:59 -07:00
Lachlan Hodges	6624a0af82	wifi: cfg80211: support configuring an S1G short beaconing BSS S1G short beacons are an optional frame type used in an S1G BSS that contain a limited set of elements. While they are optional, they are a fundamental part of S1G that enables significant power saving. Expose 2 additional netlink attributes, NL80211_ATTR_S1G_LONG_BEACON_PERIOD which denotes the number of beacon intervals between each long beacon and NL80211_ATTR_S1G_SHORT_BEACON which is a nested attribute containing the short beacon tail and head. We split them as the long beacon period cannot be updated, and is only used when initialising the interface, whereas the short beacon data can be used to both initialise and update the templates. This follows how things such as the beacon interval and DTIM period currently operate. During the initialisation path, we ensure we have the long beacon period if the short beacon data is being passed down, whereas the update path will simply update the template if it's sent down. The short beacon data is validated using the same routines for regular beacons as they support correctly parsing the short beacon format while ensuring the frame is well-formed. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250717074205.312577-2-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-18 14:14:43 +02:00
Jakub Kicinski	c0ae03588b	ethtool: rss: initial RSS_SET (indirection table handling) Add initial support for RSS_SET, for now only operations on the indirection table are supported. Unlike the ioctl don't check if at least one parameter is being changed. This is how other ethtool-nl ops behave, so pick the ethtool-nl consistency vs copying ioctl behavior. There are two special cases here: 1) resetting the table to defaults; 2) support for tables of different size. For (1) I use an empty Netlink attribute (array of size 0). (2) may require some background. AFAICT a lot of modern devices allow allocating RSS tables of different sizes. mlx5 can upsize its tables, bnxt has some "table size calculation", and Intel folks asked about RSS table sizing in context of resource allocation in the past. The ethtool IOCTL API has a concept of table size, but right now the user is expected to provide a table exactly the size the device requests. Some drivers may change the table size at runtime (in response to queue count changes) but the user is not in control of this. What's not great is that all RSS contexts share the same table size. For example a device with 128 queues enabled, 16 RSS contexts 8 queues in each will likely have 256 entry tables for each of the 16 contexts, while 32 would be more than enough given each context only has 8 queues. To address this the Netlink API should avoid enforcing table size at the uAPI level, and should allow the user to express the min table size they expect. To fully solve (2) we will need more driver plumbing but at the uAPI level this patch allows the user to specify a table size smaller than what the device advertises. The device table size must be a multiple of the user requested table size. We then replicate the user-provided table to fill the full device size table. This addresses the "allow the user to express the min table size" objective, while not enforcing any fixed size. 
From Netlink perspective .get_rxfh_indir_size() is now de facto the "max" table size supported by the device. We may choose to support table replication in ethtool, too, when we actually plumb this thru the device APIs. Initially I was considering moving full pattern generation to the kernel (which queues to use, at which frequency and what min sequence length). I don't think this complexity would buy us much and most if not all devices have pow-2 table sizes, which simplifies the replication a lot. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 16:13:58 -07:00
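The table replication described in the RSS_SET commit above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ethtool code: the function name `replicate_indir_table` is hypothetical, and it only models the rule from the message that the device table size must be a multiple of the user-requested table size, with the user-provided table repeated to fill the device-size table. The empty table is rejected here because the message reserves it for resetting the table to defaults, which this sketch does not model.

```python
from typing import List


def replicate_indir_table(user_table: List[int], device_size: int) -> List[int]:
    """Repeat a user-provided RSS indirection table to fill the device table.

    user_table holds queue indices; device_size is the table size the device
    advertises. Both names are illustrative only.
    """
    if not user_table:
        # An empty table means "reset to defaults" in the commit message;
        # that path is outside the scope of this sketch.
        raise ValueError("empty table is not replicated")
    if device_size % len(user_table) != 0:
        raise ValueError("device table size must be a multiple of the user table size")
    return user_table * (device_size // len(user_table))
```

With power-of-two table sizes, which the message says most devices use, any smaller power-of-two user table divides the device table evenly, so the divisibility check rarely rejects a request.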
Jakub Kicinski	af2d6148d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml `880d43ca9a` ("netlink: specs: clean up spaces in brackets") `af52020fc5` ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c `a44312d58e` ("net: phy: Don't register LEDs for genphy") `f0f2b992d8` ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c `5fde0fcbd7` ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") `ea045a0de3` ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c `ae3264a25a` ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") `a8594c956c` ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 11:00:33 -07:00
Tao Chen	19d18fdfc7	bpf: Add struct bpf_token_info Commit `35f96de041` ("bpf: Introduce BPF token object") added BPF token as a new kind of BPF kernel object. BPF_OBJ_GET_INFO_BY_FD is already used to get BPF object info, so token info can be retrieved with this command as well. One usage scenario: when a program fails to run with a token because of a permission failure, this API can report what the BPF token allows, for debugging. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:38:05 -07:00
Jesse Zhang	9ffab039bc	drm/amdgpu: Replace HQD terminology with slots naming The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-16 16:17:36 -04:00
Jesse Zhang	78d0a27ae0	drm/amdgpu: Add user queue instance count in HW IP info This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-16 16:17:35 -04:00
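The slot counting described in the amdgpu commit above can be sketched as follows. This is a rough Python illustration, not the driver code: the function name `count_userq_slots` is hypothetical, and it assumes that each set bit in a per-engine mask marks one available queue slot and that the masks form an array (as the `gfx_hqd_mask[]` notation in the v2 note suggests). Following the message, nothing is counted when user queues are disabled.

```python
from typing import Iterable


def count_userq_slots(hqd_masks: Iterable[int], disable_uq: bool) -> int:
    """Count available queue slots across an array of bitmasks.

    Assumption: one set bit equals one available slot. Returns 0 when
    user queues are disabled, mirroring the (!disable_uq) condition.
    """
    if disable_uq:
        return 0
    return sum(bin(mask).count("1") for mask in hqd_masks)
```

The same helper would apply to the GFX, COMPUTE, and SDMA masks in turn, giving one count per hardware IP type.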
Eric Dumazet	6c758062c6	tcp: add LINUX_MIB_BEYOND_WINDOW Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW Incremented when an incoming packet is received beyond the receiver window. nstat -az | grep TcpExtBeyondWindow Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-14 18:41:42 -07:00
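The commit message reads the new counter with nstat. As a rough sketch of the same lookup, the text of /proc/net/netstat can be parsed directly. This assumes the counter is exported under the TcpExt group as BeyondWindow (inferred from the nstat key TcpExtBeyondWindow); the function names and the sample text are hypothetical.

```python
from typing import Dict, Optional


def parse_tcpext(netstat_text: str) -> Dict[str, int]:
    """Pair the 'TcpExt:' header line with the 'TcpExt:' value line."""
    rows = [line.split() for line in netstat_text.splitlines()
            if line.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def beyond_window(netstat_text: str) -> Optional[int]:
    """Return the BeyondWindow counter, or None if the kernel lacks it."""
    # Assumption: the field is named "BeyondWindow" inside the TcpExt group.
    return parse_tcpext(netstat_text).get("BeyondWindow")


# Made-up sample in the /proc/net/netstat layout (names line, then values line).
SAMPLE = (
    "TcpExt: SyncookiesSent BeyondWindow\n"
    "TcpExt: 0 7\n"
    "IpExt: InOctets\n"
    "IpExt: 10\n"
)
```

On a live system the text would come from reading /proc/net/netstat; a return value of None suggests a kernel that predates this commit.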
Samiullah Khawaja	2677010e77	Add support to set NAPI threaded for individual NAPI A net device has a threaded sysctl that can be used to enable threaded NAPI polling on all of the NAPI contexts under that device. Allow enabling threaded NAPI polling at individual NAPI level using netlink. Extend the netlink operation `napi-set` and allow setting the threaded attribute of a NAPI. This will enable the threaded polling on a NAPI context. Add a test in `nl_netdev.py` that verifies various cases of threaded NAPI being set at NAPI and at device level. Tested ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-14 18:02:37 -07:00
Michał Winiarski	5a8f77e24a	PCI/IOV: Restore VF resizable BAR state after reset Similar to regular resizable BARs, VF BARs can also be resized, e.g. by the system firmware or the PCI subsystem itself. The capability layout is the same as PCI_EXT_CAP_ID_REBAR. Add the capability ID and restore it as a part of IOV state. See PCIe r6.2, sec 7.8.7. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20250702093522.518099-2-michal.winiarski@intel.com	2025-07-14 14:58:13 -05:00
Phil Sutter	36a686c078	Revert "netfilter: nf_tables: Add notifications for hook changes" This reverts commit `465b9ee0ee`. Such notifications fit better into core or nfnetlink_hook code, following the NFNL_MSG_HOOK_GET message format. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-14 15:22:47 +02:00
Mark Brown	bfd291279f	ASoC: codec: Convert to GPIO descriptors for Merge series from Peng Fan <peng.fan@nxp.com>: This patchset is a pick up of patch 1,2 from [1]. And I also collect Linus's R-b for patch 2. After this patchset, there is only one user of of_gpio.h left in sound driver(pxa2xx-ac97). of_gpio.h is deprecated, update the driver to use GPIO descriptors. Patch 1 is to drop legacy platform data which in-tree no users are using it Patch 2 is to convert to GPIO descriptors Checking the DTS that use the device, all are using GPIOD_ACTIVE_LOW polarity for reset-gpios, so all should work as expected with this patch. [1] https://lore.kernel.org/all/20250408-asoc-gpio-v1-0-c0db9d3fd6e9@nxp.com/	2025-07-14 11:34:16 +01:00
I Viswanath	c3ff7f06c7	i2c: Clarify behavior of I2C_M_RD flag Update the description of I2C_M_RD to clarify that not setting it signals a write transaction Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>	2025-07-14 09:15:58 +02:00
Michael Margolin	9fb3dd8519	RDMA/efa: Add CQ with external memory support Add an option to create CQ using external memory instead of allocating in the driver. The memory can be passed from userspace by dmabuf fd and an offset or a VA. One of the possible usages is creating CQs that reside in accelerator memory, allowing low latency asynchronous direct polling from the accelerator device. Add a capability bit to reflect on the feature support. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20250708202308.24783-4-mrgolin@amazon.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-13 04:00:34 -04:00
Michael Margolin	1a40c362ae	RDMA/uverbs: Add a common way to create CQ with umem Add ioctl command attributes and a common handling for the option to create CQs with memory buffers passed from userspace. When required attributes are supplied, create umem and provide it for driver's use. The extension enables creation of CQs on top of preallocated CPU virtual or device memory buffers, by supplying VA or dmabuf fd, in a common way. Drivers can support this flow by initializing a new create_cq_umem fp field in their ops struct, with a function that can handle the new parameter. Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20250708202308.24783-2-mrgolin@amazon.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-13 04:00:34 -04:00
Nicolin Chen	32b2d3a57e	iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support Add a new vEVENTQ type for VINTFs that are assigned to the user space. Simply report the two 64-bit LVCMDQ_ERR_MAPs register values. Link: https://patch.msgid.link/r/68161a980da41fa5022841209638aeff258557b5.1752126748.git.nicolinc@nvidia.com Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	4dc0d12474	iommu/tegra241-cmdqv: Add user-space use support The CMDQV HW supports a user-space use for virtualization cases. It allows the VM to issue guest-level TLBI or ATC_INV commands directly to the queue and executes them without a VMEXIT, as HW will replace the VMID field in a TLBI command and the SID field in an ATC_INV command with the preset VMID and SID. This is built upon the vIOMMU infrastructure by allowing VMM to allocate a VINTF (as a vIOMMU object) and assign VCMDQs (HW QUEUE objs) to the VINTF. So firstly, replace the standard vSMMU model with the VINTF implementation but reuse the standard cache_invalidate op (for unsupported commands) and the standard alloc_domain_nested op (for standard nested STE). Each VINTF has two 64KB MMIO pages (128B per logical VCMDQ): - Page0 (directly accessed by guest) has all the control and status bits. - Page1 (trapped by VMM) has guest-owned queue memory location/size info. VMM should trap the emulated VINTF0's page1 of the guest VM for the guest- level VCMDQ location/size info and forward that to the kernel to translate to a physical memory location to program the VCMDQ HW during an allocation call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0 of the guest VM. This allows the guest OS to read and write the guest-own VINTF's page0 for direct control of the VCMDQ HW. For ATC invalidation commands that hold an SID, it requires all devices to register their virtual SIDs to the SID_MATCH registers and their physical SIDs to the pairing SID_REPLACE registers, so that HW can use those as a lookup table to replace those virtual SIDs with the correct physical SIDs. Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid structure to allocate SID_REPLACE and to program the SIDs accordingly. This enables the HW accelerated feature for NVIDIA Grace CPU. 
Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. Link: https://patch.msgid.link/r/fb0eab83f529440b6aa181798912a6f0afa21eb0.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:36 -03:00
Nicolin Chen	a9f10bab2e	iommufd: Allow an input data_type via iommu_hw_info The iommu_hw_info can output via the out_data_type field the vendor data type from a driver, but this only allows driver to report one data type. Now, with SMMUv3 having a Tegra241 CMDQV implementation, it has two sets of types and data structs to report. One way to support that is to use the same type field bidirectionally. Reuse the same field by adding an "in_data_type", allowing user space to request for a specific type and to get the corresponding data. For backward compatibility, since the ioctl handler has never checked an input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the old output-only field and the new bidirectional field. Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Nicolin Chen	62622a8753	iommu: Allow an input type in hw_info op The hw_info uAPI will support a bidirectional data_type field that can be used as an input field for user space to request for a specific info data. To prepare for the uAPI update, change the iommu layer first: - Add a new IOMMU_HW_INFO_TYPE_DEFAULT as an input, for which driver can output its only (or firstly) supported type - Update the kdoc accordingly - Roll out the type validation in the existing drivers Link: https://patch.msgid.link/r/00f4a2d3d930721f61367014717b3ba2d1e82a81.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 14:34:35 -03:00
Ricardo Ribalda	2ab4019aa3	media: uvcvideo: Introduce V4L2_META_FMT_UVC_MSXU_1_5 The UVC driver provides two metadata types V4L2_META_FMT_UVC, and V4L2_META_FMT_D4XX. The only difference between the two of them is that V4L2_META_FMT_UVC only copies PTS, SCR, size and flags, and V4L2_META_FMT_D4XX copies the whole metadata section. Now we only enable V4L2_META_FMT_D4XX for the Intel D4xx family of devices, but it is useful to have the whole metadata payload for any device where vendors include other metadata, such as the one described by Microsoft: https://learn.microsoft.com/en-us/windows-hardware/drivers/stream/mf-capture-metadata This patch introduces a new format V4L2_META_FMT_UVC_MSXU_1_5, that is identical to V4L2_META_FMT_D4XX. Let the user enable this format with a quirk for now. This way they can test if their devices provide useful metadata without rebuilding the kernel. They can later contribute patches to auto-quirk their devices. We will also work in methods to auto-detect devices compatible with this new metadata format. Suggested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250707-uvc-meta-v8-4-ed17f8b1218b@chromium.org Signed-off-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>	2025-07-11 19:27:30 +02:00
Nicolin Chen	2238ddc2b0	iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl Introduce a new IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl for user space to allocate a HW QUEUE object for a vIOMMU specific HW-accelerated queue, e.g.: - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers Since this is introduced with NVIDIA's VCMDQs that access the guest memory in the physical address space, add an iommufd_hw_queue_alloc_phys() helper that will create an access object to the queue memory in the IOAS, to avoid the mappings of the guest memory from being unmapped, during the life cycle of the HW queue object. AMD's HW will need an hw_queue_init op that is mutually exclusive with the hw_queue_init_phys op, and their case will bypass the access part, i.e. no iommufd_hw_queue_alloc_phys() call. Link: https://patch.msgid.link/r/dab4ace747deb46c1fe70a5c663307f46990ae56.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Nicolin Chen	e2e9360022	iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct Add IOMMUFD_OBJ_HW_QUEUE with an iommufd_hw_queue structure, representing a HW-accelerated queue type of IOMMU's physical queue that can be passed through to a user space VM for direct hardware control, such as: - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers Add new viommu ops for iommufd to communicate with IOMMU drivers to fetch supported HW queue structure size and to forward user space ioctls to the IOMMU drivers for initialization/destroy. As the existing HWs, NVIDIA's VCMDQs access the guest memory via physical addresses, while AMD's Buffers access the guest memory via guest physical addresses (i.e. iova of the nesting parent HWPT). Separate two mutually exclusive hw_queue_init and hw_queue_init_phys ops to indicate whether a vIOMMU HW accesses the guest queue in the guest physical space (via iova) or the host physical space (via pa). In a latter case, the iommufd core will validate the physical pages of a given guest queue, to ensure the underlying physical pages are contiguous and pinned. Since this is introduced with NVIDIA's VCMDQs, add hw_queue_init_phys for now, and leave some notes for hw_queue_init in the near future (for AMD). Either NVIDIA's or AMD's HW is a multi-queue model: NVIDIA's will be only one type in enum iommu_hw_queue_type, while AMD's will be three different types (two of which will have multi queues). Compared to letting the core manage multiple queues with three types per vIOMMU object, it'd be easier for the driver to manage that by having three different driver-structure arrays per vIOMMU object. Thus, pass in the index to the init op. 
Link: https://patch.msgid.link/r/6939b73699e278e60ce167e911b3d9be68882bad.1752126748.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-11 11:09:26 -03:00
Sebastian Andrzej Siewior	760e6f7bef	futex: Remove support for IMMUTABLE The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on the private hash and so to avoid the performance regression on big machines. With the switch to per-CPU counter this is no longer needed. That flag was never usable on any released kernel. Remove any support for IMMUTABLE while preserving the flags argument and enforcing it to be zero. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de	2025-07-11 16:02:01 +02:00
Simona Vetter	9800bf6fae	Merge tag 'drm-xe-next-2025-07-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Documentation fixes (Shuicheng) Cross-subsystem Changes: - MTD intel-dg driver for dgfx non-volatile memory device (Sasha) - i2c: designware changes to allow i2c integration with BMG (Heikki) Core Changes: - Restructure migration in preparation for multi-device (Brost, Thomas) - Expose fan control and voltage regulator version on sysfs (Raag) Driver Changes: - Add WildCat Lake support (Roper) - Add aux bus child device driver for NVM on DGFX (Sasha) - Some refactor and fixes to allow cleaner BMG w/a (Lucas, Maarten, Auld) - BMG w/a (Vinay) - Improve handling of aborted probe (Michal) - Do not wedge device on killed exec queues (Brost) - Init changes for flicker-free boot (Maarten) - Fix out-of-bounds field write in MI_STORE_DATA_IMM (Jia) - Enable the GuC Dynamic Inhibit Context Switch optimization (Daniele) - Drop bo->size (Brost) - Builds and KConfig fixes (Harry, Maarten) - Consolidate LRC offset calculations (Tvrtko) - Fix potential leak in hw_engine_group (Michal) - Future-proof for multi-tile + multi-GT cases (Roper) - Validate gt in pmu event (Riana) - SRIOV PF: Clear all LMTT pages on alloc (Michal) - Allocate PF queue size on pow2 boundary (Brost) - SRIOV VF: Make multi-GT migration less error prone (Tomasz) - Revert indirect ring state patch to fix random LRC context switches failures (Brost) - Fix compressed VRAM handling (Auld) - Add one additional BMG PCI ID (Ravi) - Recommend GuC v70.46.2 for BMG, LNL, DG2 (Julia) - Add GuC and HuC to PTL (Daniele) - Drop PTL force_probe requirement (Atwood) - Fix error flow in display suspend (Shuicheng) - Disable GuC communication on hardware initialization error (Zhanjun) - Devcoredump fixes and clean up (Shuicheng) - SRIOV PF: Downgrade some info to debug (Michal) - Don't allocate temporary GuC policies object (Michal) - Support for I2C attached MCUs (Heikki, Raag, Riana) - Add GPU memory bo trace points (Juston) - SRIOV VF: Skip some W/a (Michal) - Correct comment of xe_pm_set_vram_threshold (Shuicheng) - Cancel ongoing H2G requests when stopping CT (Michal) Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/aHA7184UnWlONORU@intel.com	2025-07-11 11:08:53 +02:00
Jakub Kicinski	178331743c	ethtool: rss: report which fields are configured for hashing Implement ETHTOOL_GRXFH over Netlink. The number of flow types is reasonable (around 20) so report all of them at once for simplicity. Do not maintain the flow ID mapping with ioctl at the uAPI level. This gives us a chance to clean up the confusion that come from RxNFC vs RxFH (flow direction vs hashing) in the ioctl. Try to align with the names used in ethtool CLI, they seem to have stood the test of time just fine. One annoyance is that we still call L4 ports the weird names, but I guess they also apply to IPSec (where they cover the SPI) so it is what it is. $ ynl --family ethtool --dump rss-get { "header": { "dev-index": 1, "dev-name": "enp1s0" }, "hfunc": 1, "hkey": b"...", "indir": [0, 1, ...], "flow-hash": { "ether": {"l2da"}, "ah-esp4": {"ip-src", "ip-dst"}, "ah-esp6": {"ip-src", "ip-dst"}, "ah4": {"ip-src", "ip-dst"}, "ah6": {"ip-src", "ip-dst"}, "esp4": {"ip-src", "ip-dst"}, "esp6": {"ip-src", "ip-dst"}, "ip4": {"ip-src", "ip-dst"}, "ip6": {"ip-src", "ip-dst"}, "sctp4": {"ip-src", "ip-dst"}, "sctp6": {"ip-src", "ip-dst"}, "udp4": {"ip-src", "ip-dst"}, "udp6": {"ip-src", "ip-dst"} "tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"}, "tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"}, }, } Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:57:49 -07:00
Jakub Kicinski	d7974697de	ethtool: mark ETHER_FLOW as usable for Rx hash Looks like some drivers (ena, enetc, fbnic.. there's probably more) consider ETHER_FLOW to be legitimate target for flow hashing. I'm not sure how intentional that is from the uAPI perspective vs just an effect of ethtool IOCTL doing minimal input validation. But Netlink will do strict validation, so we need to decide whether we allow this use case or not. I don't see a strong reason against it, and rejecting it would potentially regress a number of drivers. So update the comments and flow_type_hashable(). Link: https://patch.msgid.link/20250708220640.2738464-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:57:49 -07:00
Jakub Kicinski	b430f6c38d	Merge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel Paolo Abeni says: ==================== virtio: introduce GSO over UDP tunnel Some virtualized deployments use UDP tunnel pervasively and are impacted negatively by the lack of GSO support for such kind of traffic in the virtual NIC driver. The virtio_net specification recently introduced support for GSO over UDP tunnel, this series updates the virtio implementation to support such a feature. Currently the kernel virtio support limits the feature space to 64, while the virtio specification allows for a larger number of features. Specifically the GSO-over-UDP-tunnel-related virtio features use bits 65-69. The first four patches in this series rework the virtio and vhost feature support to cope with up to 128 bits. The limit is set by a define and could be easily raised in future, as needed. This implementation choice is aimed at keeping the code churn as limited as possible. For the same reason, only the virtio_net driver is reworked to leverage the extended feature space; all other virtio/vhost drivers are unaffected, but could be upgraded to support the extended features space in a later time. The last four patches bring in the actual GSO over UDP tunnel support. As per specification, some additional fields are introduced into the virtio net header to support the new offload. The presence of such fields depends on the negotiated features. New helpers are introduced to convert the UDP-tunneled skb metadata to an extended virtio net header and vice versa. Such helpers are used by the tun and virtio_net driver to cope with the newly supported offloads. Tested with basic stream transfer with all the possible permutations of host kernel/qemu/guest kernel with/without GSO over UDP tunnel support. ==================== Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 13:32:35 -07:00
Jakub Kicinski	3321e97eab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml `0a12c435a1` ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") `b3603c0466` ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 10:10:49 -07:00
Linus Torvalds	73d7cf0710	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Many patches, pretty much all of them small, that accumulated while I was on vacation. ARM: - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1 - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3 - Also gracefully fail initialising pKVM if the carveout allocation fails - Fix the computing of the minimum MMIO range required for the host on stage-2 fault - Fix the generation of the GICv3 Maintenance Interrupt in nested mode x86: - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID - Advertise supported TDX TDVMCALLs to userspace - Pass SetupEventNotifyInterrupt arguments to userspace - Fix TSC frequency underflow" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: avoid underflow when scaling TSC frequency KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes KVM: arm64: Don't free hyp pages with pKVM on GICv2 KVM: arm64: Fix error path in init_hyp_mode() KVM: arm64: Adjust range correctly during host stage-2 faults KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi() KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush KVM: SVM: Add missing member in SNP_LAUNCH_START command structure Documentation: KVM: Fix unexpected unindent warnings KVM: selftests: Add back the missing check of MONITOR/MWAIT availability KVM: Allow CPU to reschedule while setting per-page memory attributes KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table. KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt	2025-07-10 09:06:53 -07:00
Nicolin Chen	1976cdf61c	iommufd/viommu: Allow driver-specific user data for a vIOMMU object The new type of vIOMMU for tegra241-cmdqv driver needs a driver-specific user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass it in via the viommu_init iommu op. Link: https://patch.msgid.link/r/2315b0e164b355746387e960745ac9154caec124.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Pranjal Shrivastava <praan@google.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:51 -03:00
Nicolin Chen	fca02263f2	iommufd: Correct virt_id kdoc at struct iommu_vdevice_alloc The userspace-api iommufd.rst describes it correctly, but the uAPI kdoc remained uncorrected. Fix it. Link: https://patch.msgid.link/r/2cdcecaf2babee16fda7545ccad4e5bed7a5032d.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-07-10 12:38:50 -03:00
Jason Xing	45e359be1c	net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt This patch provides a setsockopt method to let applications leverage to adjust how many descs to be handled at most in one send syscall. It mitigates the situation where the default value (32) that is too small leads to higher frequency of triggering send syscall. Considering the prosperity/complexity the applications have, there is no absolutely ideal suggestion fitting all cases. So keep 32 as its default value like before. The patch does the following things: - Add XDP_MAX_TX_SKB_BUDGET socket option. - Set max_tx_budget to 32 by default in the initialization phase as a per-socket granular control. - Set the range of max_tx_budget as [32, xs->tx->nentries]. The idea behind this comes out of real workloads in production. We use a user-level stack with xsk support to accelerate sending packets and minimize triggering syscalls. When the packets are aggregated, it's not hard to hit the upper bound (namely, 32). The moment user-space stack fetches the -EAGAIN error number passed from sendto(), it will loop to try again until all the expected descs from tx ring are sent out to the driver. Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of sendto() and higher throughput/PPS. Here is what I did in production, along with some numbers as follows: For one application I saw lately, I suggested using 128 as max_tx_budget because I saw two limitations without changing any default configuration: 1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind this was I counted how many descs are transmitted to the driver at one time of sendto() based on [1] patch and then I calculated the possibility of hitting the upper bound. Finally I chose 128 as a suitable value because 1) it covers most of the cases, 2) a higher number would not bring evident results. 
After tweaking the parameters, a stable improvement of around 4% for both PPS and throughput and lower resource consumption were observed via strace -c -p xxx: 1) %time was decreased by 7.8% 2) error counter was decreased from 18367 to 572 [1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/ Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-10 14:48:29 +02:00
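The effect of the budget described in the commit above can be sketched with some simple arithmetic. This is an illustrative Python model, not kernel code: it assumes each sendto() call hands at most `budget` descriptors to the driver and that user space retries until the ring is drained. The function names are hypothetical, and reporting an out-of-range value as a `ValueError` is an assumption made only for this sketch.

```python
import math

DEFAULT_TX_BUDGET = 32  # default value stated in the commit message


def validate_tx_budget(requested: int, tx_ring_entries: int) -> int:
    """Accept a budget only inside [32, tx_ring_entries], the range the commit describes."""
    if not DEFAULT_TX_BUDGET <= requested <= tx_ring_entries:
        # How the kernel reports this case is not modelled; ValueError is a stand-in.
        raise ValueError("budget outside [32, tx ring entries]")
    return requested


def sendto_calls_needed(pending_descs: int, budget: int) -> int:
    """Number of sendto() calls needed if each call sends at most `budget` descs."""
    return math.ceil(pending_descs / budget)
```

Under these assumptions, 128 queued descriptors need four calls with the default budget of 32 and a single call with a budget of 128.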
Mehdi Djait	1fff2ee377	media: uapi: videodev2: Fix comment for 12-bit packed Bayer formats For 12-bit packed Bayer formats: every two consecutive samples are packed into three bytes. Fix the corresponding comment. Signed-off-by: Mehdi Djait <mehdi.djait@linux.intel.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>	2025-07-10 11:32:24 +02:00
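As a rough illustration of "two consecutive samples packed into three bytes", the following Python sketch packs and unpacks one pair of 12-bit samples. The nibble order is an assumption based on my reading of the V4L2 documentation for the packed 12-bit Bayer formats (first two bytes hold the high 8 bits of each sample, third byte holds both low nibbles with the first sample in bits 3..0); the function names are hypothetical.

```python
from typing import Tuple


def pack_12bit_pair(first: int, second: int) -> bytes:
    """Pack two 12-bit samples into three bytes (assumed layout, see lead-in)."""
    if not (0 <= first < 4096 and 0 <= second < 4096):
        raise ValueError("samples must fit in 12 bits")
    low_nibbles = ((second & 0x0F) << 4) | (first & 0x0F)
    return bytes([first >> 4, second >> 4, low_nibbles])


def unpack_12bit_pair(b0: int, b1: int, b2: int) -> Tuple[int, int]:
    """Recover the two 12-bit samples from three packed bytes."""
    first = (b0 << 4) | (b2 & 0x0F)
    second = (b1 << 4) | (b2 >> 4)
    return first, second
```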
Aleksa Sarai	76fdb7eb4e	uapi: export PROCFS_ROOT_INO The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2(RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH}) to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has led to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions. There was also some trailing whitespace in the "struct proc_dir_entry" initialiser, so fix that up as well. [1]: https://github.com/openSUSE/libpathrs Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/20250708-uapi-procfs-root-ino-v1-1-6ae61e97c79b@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-10 09:39:18 +02:00
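The check the commit describes boils down to comparing two values. The Python sketch below shows only that comparison. It assumes the caller has already obtained the filesystem type (for example via fstatfs(2)) and the inode number (via fstat(2)) for the suspect handle; the helper name is hypothetical, and the numeric value of `PROCFS_ROOT_INO` is my understanding of the exported constant rather than something stated in the commit message.

```python
PROC_SUPER_MAGIC = 0x9FA0  # procfs magic number from linux/magic.h
PROCFS_ROOT_INO = 1        # assumed value of the exported constant


def is_procfs_root(f_type: int, st_ino: int) -> bool:
    """True only if the handle is on procfs AND is the procfs root inode."""
    return f_type == PROC_SUPER_MAGIC and st_ino == PROCFS_ROOT_INO
```

Both conditions matter: a matching filesystem type alone does not prove the handle is the root of /proc rather than some subdirectory of it.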
Jim Mattson	a7cec20845	KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs without interception. The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not handled at all. The MSR values are not zeroed on vCPU creation, saved on suspend, or restored on resume. No accommodation is made for processor migration or for sharing a logical processor with other tasks. No adjustments are made for non-unit TSC multipliers. The MSRs do not account for time the same way as the comparable PMU events, whether the PMU is virtualized by the traditional emulation method or the new mediated pass-through approach. Nonetheless, in a properly constrained environment, this capability can be combined with a guest CPUID table that advertises support for CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the effective physical CPU frequency in /proc/cpuinfo. Moreover, there is no performance cost for this capability. Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-07-09 09:33:37 -07:00
Thomas Gleixner	068f7b64bf	Merge v6.16-rc2 into timers/ptp to pick up the __GENMASK() fix, otherwise the AUX clock VDSO patches fail to compile for compat. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2025-07-09 11:51:34 +02:00
Uwe Kleine-König	2b2aeaa12c	Merge tag 'pm-runtime-6.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Runtime PM updates related to autosuspend for 6.17 Make several autosuspend functions mark last busy stamp and update the documentation accordingly (Sakari Ailus).	2025-07-09 10:48:21 +02:00
Kuniyuki Iwashima	df30285b36	af_unix: Introduce SO_INQ. We have an application that uses almost the same code for TCP and AF_UNIX (SOCK_STREAM). TCP can use TCP_INQ, but AF_UNIX doesn't have it and requires an extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an alternative. Let's introduce the generic version of TCP_INQ. If SO_INQ is enabled, recvmsg() will put a cmsg of SCM_INQ that contains the exact value of ioctl(SIOCINQ). The cmsg is also included when msg->msg_get_inq is non-zero to make sockets io_uring-friendly. Note that SOCK_CUSTOM_SOCKOPT is flagged only for SOCK_STREAM to override setsockopt() for SOL_SOCKET. By having the flag in struct unix_sock, instead of struct sock, we can later add SO_INQ support for TCP and reuse tcp_sk(sk)->recvmsg_inq. Note also that supporting custom getsockopt() for SOL_SOCKET will need preparation for other SOCK_CUSTOM_SOCKOPT users (UDP, vsock, MPTCP). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Paolo Abeni	288f304351	tun: enable gso over UDP tunnel support. Add new tun features to represent the newly introduced virtio GSO over UDP tunnel offload. Allow detection and selection of such features via the existing TUNSETOFFLOAD ioctl and compute the expected virtio header size and tunnel header offset using the current netdev features, so that we can plug in the newly introduced virtio helpers almost seamlessly to serialize the extended virtio header. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v6 -> v7: - rebased v4 -> v5: - encapsulate the guest feature guessing in a tun helper - dropped irrelevant check on xdp buff headroom - do not remove unrelated blank line - avoid line len > 80 char v3 -> v4: - virtio tnl-related fields are at fixed offset, cleanup the code accordingly. - use netdev features instead of flags bit to check for the configured offload - drop packet in case of enabled features/configured hdr size mismatch v2 -> v3: - cleaned-up uAPI comments - use explicit struct layout instead of raw buf.	2025-07-08 18:07:26 +02:00
Paolo Abeni	a2fb4bc4e2	net: implement virtio helpers to handle UDP GSO tunneling. The virtio specification is introducing support for GSO over UDP tunnel. This patch brings in the needed defines and the additional virtio hdr parsing/building helpers. The UDP tunnel support uses additional fields in the virtio hdr, and the location of such fields can change depending on other negotiated features - specifically VIRTIO_NET_F_HASH_REPORT. Try to be as conservative as possible with the new field validation. Existing implementations for plain GSO offloads allow for invalid/ self-contradictory values of such fields. With GSO over UDP tunnel we can be more strict, with no need to deal with legacy implementations. Since the checksum-related field validation is asymmetric in the driver and in the device, introduce a separate helper to implement the new checks (to be used only on the driver side). Note that while the feature space exceeds the 64-bit boundaries, the guest offload space is fixed by the specification of the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command to a 64-bit size. Prior to the UDP tunnel GSO support, each guest offload bit corresponded to the feature bit with the same value and vice versa. Due to the limited 'guest offload' space, relevant features in the high 64 bits are 'mapped' to free bits in the lower range. That is simpler than defining a new command (and associated features) to exchange an extended guest offloads set. As a consequence, the uAPIs also specify the mapped guest offload value corresponding to the UDP tunnel GSO features. 
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> -- v4 -> v5: - avoid lines above 80 chars v3 -> v4: - fixed offset for UDP GSO tunnel, update accordingly the helpers - tried to clarified vlan_hlen semantic - virtio_net_chk_data_valid() -> virtio_net_handle_csum_offload() v2 -> v3: - add definitions for possible vnet hdr layouts with tunnel support v1 -> v2: - 'relay' -> 'rely' typo - less unclear comment WRT enforced inner GSO checks - inner header fields are allowed only with 'modern' virtio, thus are always le - clarified in the commit message the need for 'mapped features' defines - assume little_endian is true when UDP GSO is enabled. - fix inner proto type value	2025-07-08 18:05:47 +02:00
Paolo Abeni	333c515d18	vhost-net: allow configuring extended features Use the extended feature type for 'acked_features' and implement two new ioctl operations allowing user-space to set/query an unbounded number of features. The actual number of processed features is limited by VIRTIO_FEATURES_MAX and attempts to set features above such limit fail with EOPNOTSUPP. Note that: the legacy ioctls implicitly truncate the negotiated features to the lower 64 bits range and the 'acked_backend_features' field doesn't need conversion, as the only negotiated feature there is in the low 64 bit range. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 18:05:23 +02:00
Thomas Weißschuh	70b9c0c11e	uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again (2) BITS_PER_LONG does not exist in UAPI headers, so can't be used by the UAPI __GENMASK(). Instead __BITS_PER_LONG needs to be used. When __GENMASK() was introduced in commit `3c7a8e190b` ("uapi: introduce uapi-friendly macros for GENMASK"), the code was fine. A broken revert in `1e7933a575` ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"") introduced the incorrect usage of BITS_PER_LONG. That was fixed in commit `11fcf36850` ("uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again"). But a broken sync of the kernel headers with the tools/ headers in commit `fc92099902` ("tools headers: Synchronize linux/bits.h with the kernel sources") undid the fix. Reapply the fix and while at it also fix the tools header. Fixes: `fc92099902` ("tools headers: Synchronize linux/bits.h with the kernel sources") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>	2025-07-08 10:23:13 -04:00
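To see why the width constant matters in the commit above, here is a small Python model of a GENMASK-style computation. It is a sketch of the formula as I understand it (all-ones shifted left by `low`, ANDed with all-ones shifted right by `bits_per_long - 1 - high`), not a copy of the kernel macro. The function name and the `bits_per_long` parameter are illustrative stand-ins for the macro and for `__BITS_PER_LONG`.

```python
def genmask(high: int, low: int, bits_per_long: int = 64) -> int:
    """Return a mask with bits low..high (inclusive) set, for a given long width."""
    if not 0 <= low <= high < bits_per_long:
        raise ValueError("need 0 <= low <= high < bits_per_long")
    all_ones = (1 << bits_per_long) - 1
    left = (all_ones << low) & all_ones            # clear bits below `low`
    right = all_ones >> (bits_per_long - 1 - high)  # clear bits above `high`
    return left & right
```

The result depends on `bits_per_long` being defined and matching the target, which is why a UAPI header has to use a width constant that actually exists in userspace builds.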
Hannes Reinecke	e22da46850	net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING' Add a new netlink parameter 'HANDSHAKE_A_ACCEPT_KEYRING' to provide the serial number of the keyring to use. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250701144657.104401-1-hare@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 15:31:44 +02:00
Simona Vetter	203dcde881	Merge tag 'drm-msm-next-2025-07-05' of https://gitlab.freedesktop.org/drm/msm into drm-next Updates for v6.17 CI: - uprev mesa and ci-templates - use shallow clone to speed up build jobs - remove sdm845/cheza jobs. These runners are no more (RIP dear chezas) - fix runner tag for i915 cml runners - uprev igt to pull in msm test fixes Core: - VM_BIND support! - single source of truth for UBWC configuration. Adds a global soc driver for UBWC config which is used from display and GPU. (And later vidc/camera/etc) - Decouple ties between GPU and KMS, adding a `separate_gpu_kms` modparam to allow the GPU and KMS to bind to separate DRM devices. This should better deal with more exotic SoC configurations where the number of GPUs is different from number of DPUs. The default behavior is to still come up as a single unified DRM device to avoid surprising userspace. DP: - major rework of the I/O accessors DPU: - use version checks instead of feature bits - SM8750 support - set min_prefill_lines for SC8180X DSI: - SM8750 support GPU: - speedbin support for X1-85 - X1-45 support MDSS: - SM8750 support Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Robin Clark <robin.clark@oss.qualcomm.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACSVV0217R+kpoWQJeuYGHf6q_4aFyEJuKa=dZZKOnLQzFwppg@mail.gmail.com	2025-07-08 14:31:19 +02:00
Jeremy Kerr	ad39c12fce	net: mctp: add gateway routing support This change allows for gateway routing, where a route table entry may reference a routable endpoint (by network and EID), instead of routing directly to a netdevice. We add support for a RTM_GATEWAY attribute for netlink route updates, with an attribute format of: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } - we need the net here to uniquely identify the target EID, as we no longer have the device reference directly (which would provide the net id in the case of direct routes). This makes route lookups recursive, as a route lookup that returns a gateway route must be resolved into a direct route (ie, to a device) eventually. We provide a limit to the route lookups, to prevent infinite loop routing. The route lookup populates a new 'nexthop' field in the dst structure, which now specifies the key for the neighbour table lookup on device output, rather than using the packet destination address directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-13-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
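The recursive lookup with a depth limit described above can be sketched as follows. This is a simplified Python model under stated assumptions: routes are keyed by an exact (net, eid) pair rather than by EID ranges, and the `Route` class, `resolve` function and `MAX_LOOKUP_DEPTH` value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Addr = Tuple[int, int]  # (net, eid), mirroring struct mctp_fq_addr

MAX_LOOKUP_DEPTH = 4  # hypothetical limit; the commit only says a limit exists


@dataclass
class Route:
    dev: Optional[str] = None       # set for direct routes
    gateway: Optional[Addr] = None  # set for gateway routes


def resolve(routes: Dict[Addr, Route], dest: Addr,
            depth: int = MAX_LOOKUP_DEPTH) -> Optional[Tuple[str, Addr]]:
    """Follow gateway routes until a direct route is found.

    Returns (device, nexthop), where nexthop is the address to use for the
    neighbour lookup, or None if no route exists or the limit is hit.
    """
    nexthop = dest
    for _ in range(depth):
        route = routes.get(nexthop)
        if route is None:
            return None
        if route.dev is not None:
            return route.dev, nexthop
        nexthop = route.gateway
    return None  # too many hops: treat as unroutable instead of looping forever
```

For a direct route the nexthop equals the destination; for a gateway route it is the gateway's address, which matches the commit's point that the neighbour lookup no longer uses the packet destination directly.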
Tonghao Zhang	3d98ee5265	net: bonding: add broadcast_neighbor netlink option User can config or display the bonding broadcast_neighbor option via iproute2/netlink. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/76b90700ba5b98027dfb51a2f3c5cfea0440a21b.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 10:59:42 +02:00
Ankit Agrawal	f55ce5a6cd	KVM: arm64: Expose new KVM cap for cacheable PFNMAP Introduce a new KVM capability to expose to the userspace whether cacheable mapping of PFNMAP is supported. The ability to safely do the cacheable mapping of PFNMAP is contingent on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing the D cache, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache and turns icache_inval_pou() into a NOP. The cap would be false if those requirements are missing and is checked by making use of kvm_arch_supports_cacheable_pfnmap. This capability would allow userspace to discover the support. It could for instance be used by userspace to prevent live-migration across FWB and non-FWB hosts. CC: Catalin Marinas <catalin.marinas@arm.com> CC: Jason Gunthorpe <jgg@nvidia.com> CC: Oliver Upton <oliver.upton@linux.dev> CC: David Hildenbrand <david@redhat.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-07-07 16:54:52 -07:00
Ilya Maximets	59f44c9ccc	net: openvswitch: allow providing upcall pid for the 'execute' command When a packet enters OVS datapath and there is no flow to handle it, packet goes to userspace through a MISS upcall. With per-CPU upcall dispatch mechanism, we're using the current CPU id to select the Netlink PID on which to send this packet. This allows us to send packets from the same traffic flow through the same handler. The handler will process the packet, install required flow into the kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE. While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a recirculation action that will pass the (likely modified) packet through the flow lookup again. And if the flow is not found, the packet will be sent to userspace again through another MISS upcall. However, the handler thread in userspace is likely running on a different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled in the syscall context of that thread. So, when the time comes to send the packet through another upcall, the per-CPU dispatch will choose a different Netlink PID, and this packet will end up processed by a different handler thread on a different CPU. The process continues as long as there are new recirculations, each time the packet goes to a different handler thread before it is sent out of the OVS datapath to the destination port. In real setups the number of recirculations can go up to 4 or 5, sometimes more. There is always a chance to re-order packets while processing upcalls, because userspace will first install the flow and then re-inject the original packet. So, there is a race window when the flow is already installed and the second packet can match it and be forwarded to the destination before the first packet is re-injected. 
But the fact that packets are going through multiple upcalls handled by different userspace threads makes the reordering noticeably more likely, because we not only have a race between the kernel and a userspace handler (which is hard to avoid), but also between multiple userspace handlers. For example, let's assume that 10 packets got enqueued through a MISS upcall for handler-1, it will start processing them, will install the flow into the kernel and start re-injecting packets back, from where they will go through another MISS to handler-2. Handler-2 will install the flow into the kernel and start re-injecting the packets, while handler-1 continues to re-inject the last of the 10 packets, they will hit the flow installed by handler-2 and be forwarded without going to the handler-2, while handler-2 still re-injects the first of these 10 packets. Given multiple recirculations and misses, these 10 packets may end up completely mixed up on the output from the datapath. Let's allow userspace to specify on which Netlink PID the packets should be upcalled while processing OVS_PACKET_CMD_EXECUTE. This makes it possible to ensure that all the packets are processed by the same handler thread in the userspace even with them being upcalled multiple times in the process. Packets will remain in order since they will be enqueued to the same socket and re-injected in the same order. This doesn't eliminate re-ordering as stated above, since we still have a race between kernel and the userspace thread, but it allows to eliminate races between multiple userspace threads. Userspace knows the PID of the socket on which the original upcall is received, so there is no need to send it up from the kernel. Solution requires storing the value somewhere for the duration of the packet processing. There are two potential places for this: our skb extension or the per-CPU storage. 
It's not clear which is better, so just follow the currently used scheme of storing this kind of thing along with the skb. We still have a decent amount of space in the cb. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250702155043.2331772-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 14:30:39 -07:00
Mark Brown	bb96a315b4	ASoC: soc-dapm: cleanups Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>: This is preparation for hiding snd_soc_dapm_context inside soc-dapm.c	2025-07-07 21:02:59 +01:00
Uwe Kleine-König	9c06f26ba5	pwm: Add support for pwmchip devices for faster and easier userspace access With this change each pwmchip defining the new-style waveform callbacks can be accessed from userspace via a character device. Compared to the sysfs-API this is faster and allows passing the whole configuration in a single ioctl, allowing atomic application and thus reducing glitches. On an STM32MP13 I see: root@DistroKit:~ time pwmtestperf real 0m 1.27s user 0m 0.02s sys 0m 1.21s root@DistroKit:~ rm /dev/pwmchip0 root@DistroKit:~ time pwmtestperf real 0m 3.61s user 0m 0.27s sys 0m 3.26s pwmtestperf does essentially: for i in 0 .. 50000: pwm_set_waveform(duty_length_ns=i, period_length_ns=50000, duty_offset_ns=0) and in the presence of /dev/pwmchip0 it uses the ioctls introduced here, without that device it uses /sys/class/pwm/pwmchip0. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/ad4a4e49ae3f8ea81e23cac1ac12b338c3bf5c5b.1746010245.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>	2025-07-07 08:39:33 +02:00
Rob Clark	2e6a8a1fe2	drm/msm: Add VM_BIND ioctl Add a VM_BIND ioctl for binding/unbinding buffers into a VM. This is only supported if userspace has opted in to MSM_PARAM_EN_VM_BIND. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661524/	2025-07-04 17:48:38 -07:00
Rob Clark	92395af63a	drm/msm: Add VM_BIND submitqueue This submitqueue type isn't tied to a hw ringbuffer, but instead executes on the CPU for performing async VM_BIND ops. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661517/	2025-07-04 17:48:37 -07:00
Rob Clark	e1341f9145	drm/msm: Extract out syncobj helpers We'll be re-using these for the VM_BIND ioctl. Also, rename a few things in the uapi header to reflect that syncobj use is not specific to the submit ioctl. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661512/	2025-07-04 17:48:37 -07:00
Rob Clark	b58e12a66e	drm/msm: Add _NO_SHARE flag Buffers that are not shared between contexts can share a single resv object. This way drm_gpuvm will not track them as external objects, and submit-time validating overhead will be O(1) for all N non-shared BOs, instead of O(n). Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661497/	2025-07-04 17:48:36 -07:00
Rob Clark	feb8ef4636	drm/msm: Add opt-in for VM_BIND Add a SET_PARAM for userspace to request to manage the VM itself, instead of getting a kernel managed VM. In order to transition to a userspace managed VM, this param must be set before any mappings are created. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661494/	2025-07-04 17:48:36 -07:00
Rob Clark	dbbde63c9e	drm/msm: Add PRR support PRR (Partial Resident Region) is a bypass address which makes GPU writes go to /dev/null and reads return zero. This is used to implement vulkan sparse residency. To support PRR/NULL mappings, we allocate a page to reserve a physical address which we know will not be used as part of a GEM object, and configure the SMMU to use this address for PRR/NULL mappings. Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Tested-by: Antonino Maniscalco <antomani103@gmail.com> Reviewed-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/661486/	2025-07-04 17:48:35 -07:00
Ariel Otilibili	cdd73b1666	uapi: fix broken link in linux/capability.h The link to the libcap library is outdated. Instead, use a link to the libcap2 library. As well, give the complete reference of the POSIX compliance. Signed-off-by: Ariel Otilibili <ariel.otilibili-anieli@eurecom.fr> Acked-by: Andrew G. Morgan <morgan@kernel.org> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Serge Hallyn <sergeh@kernel.org>	2025-07-04 19:21:53 -05:00
Christian Brauner	ca115d7e75	tree-wide: s/struct fileattr/struct file_kattr/g Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 16:14:39 +02:00
Paolo Abeni	6b9fd8857b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-04 08:03:18 +02:00
Kumar Kartikeya Dwivedi	5ab154f146	bpf: Introduce BPF standard streams Add support for a stream API to the kernel and expose related kfuncs to BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These can be used for printing messages that can be consumed from user space, thus it's similar in spirit to existing trace_pipe interface. The kernel will use the BPF_STDERR stream to notify the program of any errors encountered at runtime. BPF programs themselves may use both streams for writing debug messages. BPF library-like code may use BPF_STDERR to print warnings or errors on misuse at runtime. The implementation of a stream is as follows. Every time a message is emitted from the kernel (directly, or through a BPF program), a record is allocated by bump allocating from a per-cpu region backed by a page obtained using alloc_pages_nolock(). This ensures that we can allocate memory from any context. The eventual plan is to discard this scheme in favor of Alexei's kmalloc_nolock() [0]. This record is then locklessly inserted into a list (llist_add()) so that the printing side doesn't require holding any locks, and works in any context. Each stream has a maximum capacity of 4MB of text, and each printed message is accounted against this limit. Messages from a program are emitted using the bpf_stream_vprintk kfunc, which takes a stream_id argument in addition to working otherwise similar to bpf_trace_vprintk. The bprintf buffer helpers are extracted out to be reused for printing the string into them before copying it into the stream, so that we can (with the defined max limit) format a string and know its true length before performing allocations of the stream element. For consuming elements from a stream, we expose a bpf(2) syscall command named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the stream of a given prog_fd into a user space buffer. The main logic is implemented in bpf_stream_read(). 
The log messages are queued in bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and ordered correctly in the stream backlog. For this purpose, we hold a lock around bpf_stream_backlog_peek(), as llist_del_first() (if we maintained a second lockless list for the backlog) wouldn't be safe from multiple threads anyway. Then, if we fail to find something in the backlog log, we splice out everything from the lockless log, and place it in the backlog log, and then return the head of the backlog. Once the full length of the element is consumed, we will pop it and free it. The lockless list bpf_stream::log is a LIFO stack. Elements obtained using a llist_del_all() operation are in LIFO order, thus would break the chronological ordering if printed directly. Hence, this batch of messages is first reversed. Then, it is stashed into a separate list in the stream, i.e. the backlog_log. The head of this list is the actual message that should always be returned to the caller. All of this is done in bpf_stream_backlog_fill(). From the kernel side, the writing into the stream will be a bit more involved than the typical printk. First, the kernel typically may print a collection of messages into the stream, and parallel writers into the stream may suffer from interleaving of messages. To ensure each group of messages is visible atomically, we can lift the advantage of using a lockless list for pushing in messages. To enable this, we add a bpf_stream_stage() macro, and require kernel users to use bpf_stream_printk statements for the passed expression to write into the stream. Underneath the macro, we have a message staging API, where a bpf_stream_stage object on the stack accumulates the messages being printed into a local llist_head, and then a commit operation splices the whole batch into the stream's lockless log list. This is especially pertinent for rqspinlock deadlock messages printed to program streams. 
After this change, we see each deadlock invocation as a non-interleaving contiguous message without any confusion on the reader's part, improving their user experience in debugging the fault. While programs cannot benefit from this staged stream writing API, they could just as well hold an rqspinlock around their print statements to serialize messages, hence this is kept kernel-internal for now. Overall, this infrastructure provides NMI-safe any context printing of messages to two dedicated streams. Later patches will add support for printing splats in case of BPF arena page faults, rqspinlock deadlocks, and cond_break timeouts, and integration of this facility into bpftool for dumping messages to user space. [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-03 19:30:06 -07:00
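The ordering scheme in the commit above (a LIFO log that is drained, reversed and moved into a backlog) can be modelled in a few lines. This Python sketch is only an illustration of the ordering logic: it uses plain lists instead of lockless lists, does no locking, and ignores partial consumption of an element. The class and method names are hypothetical.

```python
from collections import deque
from typing import List, Optional


class StreamModel:
    """Toy model of a stream with a LIFO log and a chronological backlog."""

    def __init__(self) -> None:
        self.log: List[str] = []      # stands in for the lockless LIFO log
        self.backlog: deque = deque()  # oldest message at the left

    def write(self, msg: str) -> None:
        """Producer side: push a message onto the log."""
        self.log.append(msg)

    def _drain_log(self) -> List[str]:
        """Take everything from the log, newest first (LIFO order)."""
        batch = self.log[::-1]
        self.log = []
        return batch

    def read_one(self) -> Optional[str]:
        """Consumer side: return the oldest unread message, or None."""
        if not self.backlog:
            batch = self._drain_log()
            batch.reverse()            # restore chronological order
            self.backlog.extend(batch)
        return self.backlog.popleft() if self.backlog else None
```

Messages written while the backlog is non-empty stay in the log until the backlog runs dry, so the reader still sees them in the order they were written.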
Dave Airlie	17d081ef84	Merge tag 'drm-misc-next-2025-07-03' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - bridge: More reference counting - dp: Implement backlight control helpers - fourcc: Add half-float and 32b float formats, RGB161616, BGR161616 - mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flag - ttm: Improve eviction Driver Changes: - i915: Use backlight control helpers for eDP - tidss: Add AM65x OLDI bridge support - panels: - panel-edp: Add CMN N116BCJ-EAK support - raydium-rm67200: misc cleanups, optional reset - new panel: DJN HX83112B Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250703-chirpy-lilac-dalmatian-2c5838@houat	2025-07-04 11:54:31 +10:00
Daniel Scally	78584431e2	media: v4l2: Add Renesas Camera Receiver Unit pixel formats The Renesas Camera Receiver Unit in the RZ/V2H SoC can output RAW data captured from an image sensor without conversion to an RGB/YUV format. In that case the data are packed into 64-bit blocks, with a variable amount of padding in the most significant bits depending on the bitdepth of the data. Add new V4L2 pixel format codes for the new formats, along with documentation to describe them. Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally+renesas@ideasonboard.com> Link: https://lore.kernel.org/r/20250630222734.2712390-1-dan.scally@ideasonboard.com Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>	2025-07-03 09:04:14 +02:00
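The packing described in the commit above can be modelled in a few lines. This is only a sketch under stated assumptions: the helper names are hypothetical, samples are assumed to be stored from the least significant bit upward, and each 64-bit block is assumed to hold as many samples as fit, with the remainder left as padding in the most significant bits. The authoritative layout is the V4L2 documentation added by the patch itself.

```python
BLOCK_BITS = 64  # the commit message states that data are packed into 64-bit blocks


def samples_per_block(bitdepth):
    """Number of whole samples that fit in one block (assumption: as many as fit)."""
    if not 0 < bitdepth <= BLOCK_BITS:
        raise ValueError("bitdepth must be between 1 and 64")
    return BLOCK_BITS // bitdepth


def padding_bits(bitdepth):
    """Unused bits left over at the most significant end of the block."""
    return BLOCK_BITS - samples_per_block(bitdepth) * bitdepth


def pack_block(samples, bitdepth):
    """Pack samples into one integer block, first sample in the lowest bits (assumed order)."""
    if len(samples) != samples_per_block(bitdepth):
        raise ValueError("wrong number of samples for this bitdepth")
    mask = (1 << bitdepth) - 1
    block = 0
    for index, sample in enumerate(samples):
        if sample & ~mask:
            raise ValueError("sample does not fit in the given bitdepth")
        block |= sample << (index * bitdepth)
    return block


def unpack_block(block, bitdepth):
    """Inverse of pack_block: extract every sample and ignore the padding bits."""
    mask = (1 << bitdepth) - 1
    return [(block >> (index * bitdepth)) & mask
            for index in range(samples_per_block(bitdepth))]
```

Under these assumptions a 10-bit depth gives six samples and four padding bits per block, which illustrates why the amount of padding varies with the bit depth.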
Jacopo Mondi	aa89281bbc	media: pisp_be: Use clamp() and define max sizes Use the clamp() function from minmax.h and provide a define for the max sizes as they will be used in subsequent patches. Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Stefan Klug <stefan.klug@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>	2025-07-03 09:25:01 +02:00
Pavel Begunkov	cf73d9970e	io_uring: don't use int for ABI __kernel_rwf_t is defined as int, the actual size of which is implementation defined. It won't go well if some compiler or arch ever defines it as i64, so replace it with __u32, hoping that there is no one using i16 for it. Cc: stable@vger.kernel.org Fixes: `2b188cc1bb` ("Add io_uring IO interface") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/47c666c4ee1df2018863af3a2028af18feef11ed.1751412511.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 17:11:58 -06:00
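The concern in the commit above is that a plain C `int` has no guaranteed width, while a fixed-width type does. A minimal illustration using Python's `ctypes` follows; the two structure names are hypothetical and exist only for this sketch, and they are not the io_uring definitions.

```python
import ctypes


class FlagsAsInt(ctypes.Structure):
    # Hypothetical layout: the field width follows the platform's C int.
    _fields_ = [("rw_flags", ctypes.c_int)]


class FlagsAsU32(ctypes.Structure):
    # Hypothetical layout: the field is 32 bits wide on every platform.
    _fields_ = [("rw_flags", ctypes.c_uint32)]


def field_sizes():
    """Return the size in bytes of each hypothetical layout on this platform."""
    return {
        "int": ctypes.sizeof(FlagsAsInt),
        "u32": ctypes.sizeof(FlagsAsU32),
    }
```

On common platforms both sizes are 4, which is why the change is not expected to alter the layout in practice; only the fixed-width variant guarantees it.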
Carolina Jubran	566e8f108f	devlink: Extend devlink rate API with traffic classes bandwidth management Introduce support for specifying relative bandwidth shares between traffic classes (TC) in the devlink-rate API. This new option allows users to allocate bandwidth across multiple traffic classes in a single command. This feature provides a more granular control over traffic management, especially for scenarios requiring Enhanced Transmission Selection. Users can now define a relative bandwidth share for each traffic class. For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5 (RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the total bandwidth. The actual percentage each class receives depends on the ratio of its share value to the sum of all shares. Example: DEV=pci/0000:08:00.0 $ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \ tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 $ devlink port function rate set $DEV/vfs_group \ tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0 Example usage with ynl: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"rate-tc-index": 0, "rate-tc-bw": 50}, {"rate-tc-index": 1, "rate-tc-bw": 50}, {"rate-tc-index": 2, "rate-tc-bw": 0}, {"rate-tc-index": 3, "rate-tc-bw": 0}, {"rate-tc-index": 4, "rate-tc-bw": 0}, {"rate-tc-index": 5, "rate-tc-bw": 0}, {"rate-tc-index": 6, "rate-tc-bw": 0}, {"rate-tc-index": 7, "rate-tc-bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0}, {'rate-tc-bw': 50, 'rate-tc-index': 1}, {'rate-tc-bw': 0, 'rate-tc-index': 2}, {'rate-tc-bw': 0, 'rate-tc-index': 3}, {'rate-tc-bw': 0, 
'rate-tc-index': 4}, {'rate-tc-bw': 0, 'rate-tc-index': 5}, {'rate-tc-bw': 0, 'rate-tc-index': 6}, {'rate-tc-bw': 0, 'rate-tc-index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 15:39:05 -07:00
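The share-to-percentage rule stated in the commit above (each class receives its share divided by the sum of all shares) can be checked with a short sketch. The function names are hypothetical, and the string format simply mirrors the `tc-bw` arguments shown in the example commands.

```python
def parse_tc_bw(spec):
    """Parse a 'tc:share' list such as '0:20 1:0 5:80' into a dict (hypothetical helper)."""
    shares = {}
    for item in spec.split():
        tc, share = item.split(":")
        shares[int(tc)] = int(share)
    return shares


def tc_bandwidth_percent(shares):
    """Percentage of total bandwidth per traffic class: share / sum of all shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one traffic class needs a non-zero share")
    return {tc: 100 * share / total for tc, share in shares.items()}


first = tc_bandwidth_percent(parse_tc_bw("0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0"))
second = tc_bandwidth_percent(parse_tc_bw("0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0"))
```

With the values from the first command, TC0 gets 20% and TC5 gets 80%; after the `rate set` command, TC0 and TC5 each get 20% and TC6 gets 60%.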
Yeoreum Yun	b1fabef37b	prctl: Introduce PR_MTE_STORE_ONLY PR_MTE_STORE_ONLY is used to restrict the MTE tag check to store operations only. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-3-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-07-02 18:49:03 +01:00
Andrey Albershteyn	be7efb2d20	fs: introduce file_getattr and file_setattr syscalls Introduce file_getattr() and file_setattr() syscalls to manipulate inode extended attributes. The syscalls take a pair of file descriptor and pathname, then operate on the inode opened according to openat() semantics. The struct file_attr is passed to obtain/change extended attributes. This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the difference that the file doesn't need to be open, as we can reference it with a path instead of an fd. With this we can manipulate inode extended attributes not only on regular files but also on special ones. This is not possible with the FS_IOC_FSSETXATTR ioctl, as with special files we cannot call ioctl() directly on the filesystem inode using an fd. This patch adds two new syscalls which allow userspace to get/set extended inode attributes on special files by using a parent directory and a path, in the style of the *at() syscalls. CC: linux-api@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: linux-xfs@vger.kernel.org Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-6-c4e3bc35227b@kernel.org Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-02 17:05:17 +02:00
Pavel Begunkov	e448d57826	io_uring/mock: add trivial poll handler Add a flag that enables polling on the mock file. For now it trivially says that there is always data available; it'll be extended in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f16de043ec4876d65fae294fc99ade57415fba0c.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Pavel Begunkov	0c98a44329	io_uring/mock: support for async read/write Let the user specify a delay for a read/write request. io_uring will start a timer, return -EIOCBQUEUED and complete the request asynchronously after the delay passes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/38f9d2e143fda8522c90a724b74630e68f9bbd16.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Pavel Begunkov	2f71d2386f	io_uring/mock: allow to choose FMODE_NOWAIT Add an option to choose whether the file supports FMODE_NOWAIT, which changes the execution path an io_uring request takes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1e532565b05a05b23589d237c24ee1a3d90c2fd9.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Pavel Begunkov	d1aa034657	io_uring/mock: add sync read/write Add support for synchronous zero read/write for mock files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/571f3c9fe688e918256a06a722d3db6ced9ca3d5.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Pavel Begunkov	4aac001f78	io_uring/mock: add cmd using vectored regbufs There is a command API that allows importing vectored registered buffers. Add a new mock command that uses the feature and simply copies the specified registered buffer into user space or vice versa. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/229a113fd7de6b27dbef9567f7c0bf4475c9017d.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Pavel Begunkov	3a0ae385f6	io_uring/mock: add basic infra for test mock files io_uring commands provide an ioctl style interface for files to implement file specific operations. io_uring provides many features and an advanced API to commands, and it's getting hard to test as it requires specific files/devices. Add basic infrastructure for creating special mock files that will implement the cmd API and use various io_uring features we want to test. It'll also be useful to test some more obscure read/write/polling edge cases in the future. Suggested-by: chase xd <sl1589472800@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-02 08:10:26 -06:00
Anuj Gupta	9eb22f7fed	fs: add ioctl to query metadata and protection info capabilities Add a new ioctl, FS_IOC_GETLBMD_CAP, to query metadata and protection info (PI) capabilities. This ioctl returns information about the file's integrity profile. This is useful for userspace applications to understand a file's end-to-end data protection support and configure the I/O accordingly. For now this interface is only supported by block devices. However, the design and placement of this ioctl in generic FS ioctl space allows us to extend it to work over files as well. This may be useful when filesystems start supporting PI-aware layouts. A new structure struct logical_block_metadata_cap is introduced, which contains the following fields: 1. lbmd_flags: bitmask of logical block metadata capability flags 2. lbmd_interval: the amount of data described by each unit of logical block metadata 3. lbmd_size: size in bytes of the logical block metadata associated with each interval 4. lbmd_opaque_size: size in bytes of the opaque block tag associated with each interval 5. lbmd_opaque_offset: offset in bytes of the opaque block tag within the logical block metadata 6. lbmd_pi_size: size in bytes of the T10 PI tuple associated with each interval 7. lbmd_pi_offset: offset in bytes of T10 PI tuple within the logical block metadata 8. lbmd_pi_guard_tag_type: T10 PI guard tag type 9. lbmd_pi_app_tag_size: size in bytes of the T10 PI application tag 10. lbmd_pi_ref_tag_size: size in bytes of the T10 PI reference tag 11. lbmd_pi_storage_tag_size: size in bytes of the T10 PI storage tag The internal logic to fetch the capability is encapsulated in a helper function blk_get_meta_cap(), which uses the blk_integrity profile associated with the device. The ioctl returns -EOPNOTSUPP if CONFIG_BLK_DEV_INTEGRITY is not enabled. Suggested-by: Martin K. 
Petersen <martin.petersen@oracle.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/20250630090548.3317-5-anuj20.g@samsung.com Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 14:00:15 +02:00
Caleb Sander Mateos	763ff02ce2	ublk: allow UBLK_IO_(UN)REGISTER_IO_BUF on any task Currently, UBLK_IO_REGISTER_IO_BUF and UBLK_IO_UNREGISTER_IO_BUF are only permitted on the ublk_io's daemon task. But this restriction is unnecessary. ublk_register_io_buf() calls __ublk_check_and_get_req() to look up the request from the tagset and atomically take a reference on the request without accessing the ublk_io. ublk_unregister_io_buf() doesn't use the q_id or tag at all. So allow these opcodes even on tasks other than io->task. Handle UBLK_IO_UNREGISTER_IO_BUF before obtaining the ubq and io since the buffer index being unregistered is not necessarily related to the specified q_id and tag. Add a feature flag UBLK_F_BUF_REG_OFF_DAEMON that userspace can use to determine whether the kernel supports off-daemon buffer registration. Suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250620151008.3976463-10-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-06-30 20:13:42 -06:00
Ido Schimmel	03dc03fa04	neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries tl;dr ===== Add a new neighbor flag ("extern_valid") that can be used to indicate to the kernel that a neighbor entry was learned and determined to be valid externally. The kernel will not try to remove or invalidate such an entry, leaving these decisions to the user space control plane. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches (VTEPs). VTEPs that are connected to the same ES are called ES peers. When a neighbor entry is learned on a VTEP, it is distributed to both ES peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers use the neighbor entry when routing traffic towards the multi-homed host and remote VTEPs use it for ARP/NS suppression. Motivation ========== If the ES link between a host and the VTEP on which the neighbor entry was locally learned goes down, the EVPN MAC/IP advertisement route will be withdrawn and the neighbor entries will be removed from both ES peers and remote VTEPs. Routing towards the multi-homed host and ARP/NS suppression can fail until another ES peer locally learns the neighbor entry and distributes it via an EVPN MAC/IP advertisement route. "draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these intermittent failures by having the ES peers install the neighbor entries as before, but also injecting EVPN MAC/IP advertisement routes with a proxy indication. When the previously mentioned ES link goes down and the original EVPN MAC/IP advertisement route is withdrawn, the ES peers will not withdraw their neighbor entries, but instead start aging timers for the proxy indication. 
If an ES peer locally learns the neighbor entry (i.e., it becomes "reachable"), it will restart its aging timer for the entry and emit an EVPN MAC/IP advertisement route without a proxy indication. An ES peer will stop its aging timer for the proxy indication if it observes the removal of the proxy indication from at least one of the ES peers advertising the entry. In the event that the aging timer for the proxy indication expired, an ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer expired on all ES peers and they all withdrew their proxy advertisements, the neighbor entry will be completely removed from the EVPN fabric. Implementation ============== In the above scheme, when the control plane (e.g., FRR) advertises a neighbor entry with a proxy indication, it expects the corresponding entry in the data plane (i.e., the kernel) to remain valid and not be removed due to garbage collection or loss of carrier. The control plane also expects the kernel to notify it if the entry was learned locally (i.e., became "reachable") so that it will remove the proxy indication from the EVPN MAC/IP advertisement route. That is why these entries cannot be programmed with dummy states such as "permanent" or "noarp". Instead, add a new neighbor flag ("extern_valid") which indicates that the entry was learned and determined to be valid externally and should not be removed or invalidated by the kernel. The kernel can probe the entry and notify user space when it becomes "reachable" (it is initially installed as "stale"). However, if the kernel does not receive a confirmation, have it return the entry to the "stale" state instead of the "failed" state. In other words, an entry marked with the "extern_valid" flag behaves like any other dynamically learned entry other than the fact that the kernel cannot remove or invalidate it. 
One can argue that the "extern_valid" flag should not prevent garbage collection and that instead a neighbor entry should be programmed with both the "extern_valid" and "extern_learn" flags. There are two reasons for not doing that: 1. Unclear why a control plane would like to program an entry that the kernel cannot invalidate but can completely remove. 2. The "extern_learn" flag is used by FRR for neighbor entries learned on remote VTEPs (for ARP/NS suppression) whereas here we are concerned with local entries. This distinction is currently irrelevant for the kernel, but might be relevant in the future. Given that the flag only makes sense when the neighbor has a valid state, reject attempts to add a neighbor with an invalid state and with this flag set. For example: # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid Error: Cannot create externally validated neighbor with an invalid state. # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid Error: Cannot mark neighbor as externally validated with an invalid state. The above means that a neighbor cannot be created with the "extern_valid" flag and flags such as "use" or "managed" as they result in a neighbor being created with an invalid state ("none") and immediately getting probed: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use Error: Cannot create externally validated neighbor with an invalid state. However, these flags can be used together with "extern_valid" after the neighbor was created with a valid state: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use One consequence of preventing the kernel from invalidating a neighbor entry is that by default it will only try to determine reachability using unicast probes. 
This can be changed using the "mcast_resolicit" sysctl: # sysctl net.ipv4.neigh.br0/10.mcast_resolicit 0 # tcpdump -nn -e -i br0.10 -Q out arp & # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 # sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 iproute2 patches can be found here [2]. [1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03 [2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:14:23 -07:00
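The acceptance rule and the probe-failure behaviour described in the commit above can be modelled in user space. This is a sketch, not kernel code: the function names are hypothetical, and the set of valid states is assumed to mirror the kernel's NUD_VALID mask (the commit message itself only names "none" and "failed" as invalid). The error strings are the ones quoted in the examples.

```python
# Assumption: states treated as valid, mirroring the kernel's NUD_VALID mask.
VALID_STATES = frozenset(
    {"permanent", "noarp", "reachable", "probe", "stale", "delay"}
)


def check_extern_valid(state, creating):
    """Return an error string if extern_valid cannot be combined with this state, else None."""
    if state in VALID_STATES:
        return None
    if creating:
        return "Cannot create externally validated neighbor with an invalid state."
    return "Cannot mark neighbor as externally validated with an invalid state."


def state_after_unanswered_probe(extern_valid):
    """An extern_valid entry falls back to 'stale' rather than 'failed'."""
    return "stale" if extern_valid else "failed"
```

For example, `check_extern_valid("none", creating=True)` returns the first error message from the commit, while `check_extern_valid("stale", creating=True)` returns `None`.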
Jens Axboe	94b2030968	io_uring: remove errant ';' from IORING_CQE_F_TSTAMP_HW definition An errant ';' slipped into that definition, which will cause some compilers to complain when it's used in an application: timestamp.c:257:45: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt] 257 | hwts = cqe->flags & IORING_CQE_F_TSTAMP_HW; | ^ Fixes: `9e4ed359b8` ("io_uring/netcmd: add tx timestamping cmd support") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-06-30 11:36:54 -06:00
Greg Kroah-Hartman	815ac67919	Merge 6.16-rc4 into tty-next We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-30 07:50:04 +02:00
Jacopo Mondi	1aa93cfb12	drm/fourcc: Add RGB161616 and BGR161616 formats Add FourCC definitions for the 48-bit RGB/BGR formats to the DRM/KMS uapi. The format will be used by the Raspberry Pi PiSP Back End, supported by a V4L2 driver in kernel space and by libcamera in userspace, which uses the DRM FourCC identifiers. Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Simon Ser <contact@emersion.fr> Reviewed-by: Naushir Patuck <naush@raspberrypi.com> Link: https://lore.kernel.org/r/20240226132544.82817-1-jacopo.mondi@ideasonboard.com Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>	2025-06-28 09:36:16 +02:00
Arkadiusz Kubalewski	7f15ee3597	dpll: add reference-sync netlink attribute Add new netlink attribute to allow user space configuration of reference sync pin pairs, where both pins are used to provide one clock signal consisting of both: base frequency and sync signal. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20250626135219.1769350-2-arkadiusz.kubalewski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-27 16:38:02 -07:00
Linus Torvalds	e540341508	block-6.16-20250626 Merge tag 'block-6.16-20250626' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: - Fixes for ublk: - fix C++ narrowing warnings in the uapi header - update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header - fix for the ublk ->queue_rqs() implementation, limiting a batch to just the specific task AND ring - ublk_get_data() error handling fix - sanity check more arguments in ublk_ctrl_add_dev() - selftest addition - NVMe pull request via Christoph: - reset delayed remove_work after reconnect - fix atomic write size validation - Fix for a warning introduced in bdev_count_inflight_rw() in this merge window * tag 'block-6.16-20250626' of git://git.kernel.dk/linux: block: fix false warning in bdev_count_inflight_rw() ublk: sanity check add_dev input for underflow nvme: fix atomic write size validation nvme: refactor the atomic write unit detection nvme: reset delayed remove_work after reconnect ublk: setup ublk_io correctly in case of ublk_get_data() failure ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header ublk: fix narrowing warnings in UAPI header 
selftests: ublk: don't take same backing file for more than one ublk devices ublk: build batch from IOs in same io_ring_ctx and io task	2025-06-27 09:02:33 -07:00
Rob Clark	3529cb5ab1	drm/fourcc: Add 32b float formats Add 1, 2, 3, and 4 component 32b float formats, so that buffers with these formats can be imported/exported with fourcc+modifier, and/or created by gbm. These correspond to PIPE_FORMAT_{R32,R32G32,R32G32B32,R32G32B32A32}_FLOAT in mesa. v2: Fix comment describing float32 layout [Sima] Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Acked-by: Simona Vetter <simona@ffwll.ch> Acked-by: Daniel Stone <daniels@collabora.com> Link: https://lore.kernel.org/r/20250625173712.116446-3-robin.clark@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>	2025-06-27 01:46:06 +03:00
Rob Clark	e04c3521df	drm/fourcc: Add missing half-float formats Not something that is likely to be scanned out, but GPUs usually support half-float formats with 1, 2, or possibly 3 components, and it is useful to be able to import/export them with a valid fourcc, and/or use gbm to create them. These correspond to PIPE_FORMAT_{R16,R16G16,R16G16B16}_FLOAT in mesa. Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Acked-by: Simona Vetter <simona@ffwll.ch> Acked-by: Daniel Stone <daniels@collabora.com> Link: https://lore.kernel.org/r/20250625173712.116446-2-robin.clark@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>	2025-06-27 01:46:06 +03:00
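The fourcc identifiers that the DRM format commits above add are 32-bit codes built from four ASCII characters, with the first character in the lowest byte, as done by the `fourcc_code()` macro in `drm_fourcc.h`. The sketch below mirrors that encoding in Python; the Python function names are this example's own, and the codes assigned to the newly added formats are not reproduced here because the commit messages do not list them.

```python
def fourcc_code(a, b, c, d):
    """Combine four ASCII characters into a 32-bit code, first character in the lowest byte."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


def fourcc_to_str(code):
    """Inverse helper: recover the four characters from a 32-bit code."""
    return "".join(chr((code >> shift) & 0xFF) for shift in (0, 8, 16, 24))


# Long-standing example: DRM_FORMAT_XRGB8888 uses the characters 'X', 'R', '2', '4'.
xrgb8888 = fourcc_code("X", "R", "2", "4")
```

Here `xrgb8888` equals `0x34325258`, and `fourcc_to_str(xrgb8888)` gives back `"XR24"`.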

15262 Commits