Commit Graph

1036 Commits

Author SHA1 Message Date
Linus Torvalds
8be4d31cb8 Networking changes for 6.17.
Core & protocols
 ----------------
 
  - Wrap datapath globals into net_aligned_data, to avoid false sharing.
 
  - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
 
  - Add SO_INQ and SCM_INQ support to AF_UNIX.
 
  - Add SIOCINQ support to AF_VSOCK.
 
  - Add TCP_MAXSEG sockopt to MPTCP.
 
  - Add IPv6 force_forwarding sysctl to enable forwarding per interface.
 
  - Make TCP validation of whether packet fully fits in the receive
    window and the rcv_buf more strict. With increased use of HW
    aggregation a single "packet" can be multiple 100s of kB.
 
  - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
    improves latency up to 33% for sockmap users.
 
  - Convert TCP send queue handling from tasklet to BH workque.
 
  - Improve BPF iteration over TCP sockets to see each socket exactly once.
 
  - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
 
  - Support enabling kernel threads for NAPI processing on per-NAPI
    instance basis rather than a whole device. Fully stop the kernel NAPI
    thread when threaded NAPI gets disabled. Previously thread would stick
    around until ifdown due to tricky synchronization.
 
  - Allow multicast routing to take effect on locally-generated packets.
 
  - Add output interface argument for End.X in segment routing.
 
  - MCTP: add support for gateway routing, improve bind() handling.
 
  - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
 
  - Add a new neighbor flag ("extern_valid"), which cedes refresh
    responsibilities to userspace. This is needed for EVPN multi-homing
    where a neighbor entry for a multi-homed host needs to be synced
    across all the VTEPs among which the host is multi-homed.
 
  - Support NUD_PERMANENT for proxy neighbor entries.
 
  - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
 
  - Add sequence numbers to netconsole messages. Unregister netconsole's
    console when all net targets are removed. Code refactoring.
    Add a number of selftests.
 
  - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
    should be used for an inbound SA lookup.
 
  - Support inspecting ref_tracker state via DebugFS.
 
  - Don't force bonding advertisement frames tx to ~333 ms boundaries.
    Add broadcast_neighbor option to send ARP/ND on all bonded links.
 
  - Allow providing upcall pid for the 'execute' command in openvswitch.
 
  - Remove DCCP support from Netfilter's conntrack.
 
  - Disallow multiple packet duplications in the queuing layer.
 
  - Prevent use of deprecated iptables code on PREEMPT_RT.
 
 Driver API
 ----------
 
  - Support RSS and hashing configuration over ethtool Netlink.
 
  - Add dedicated ethtool callbacks for getting and setting hashing fields.
 
  - Add support for power budget evaluation strategy in PSE /
    Power-over-Ethernet. Generate Netlink events for overcurrent etc.
 
  - Support DPLL phase offset monitoring across all device inputs.
    Support providing clock reference and SYNC over separate DPLL
    inputs.
 
  - Support traffic classes in devlink rate API for bandwidth management.
 
  - Remove rtnl_lock dependency from UDP tunnel port configuration.
 
 Device drivers
 --------------
 
  - Add a new Broadcom driver for 800G Ethernet (bnge).
 
  - Add a standalone driver for Microchip ZL3073x DPLL.
 
  - Remove IBM's NETIUCV device driver.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
     - support zero-copy Tx of DMABUF memory
     - take page size into account for page pool recycling rings
    - Intel (100G, ice, idpf):
      - idpf: XDP and AF_XDP support preparations
      - idpf: add flow steering
      - add link_down_events statistic
      - clean up the TSPLL code
      - preparations for live VM migration
    - nVidia/Mellanox:
     - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
     - optimize context memory usage for matchers
     - expose serial numbers in devlink info
     - support PCIe congestion metrics
    - Meta (fbnic):
      - add 25G, 50G, and 100G link modes to phylink
      - support dumping FW logs
    - Marvell/Cavium:
      - support for CN20K generation of the Octeon chips
    - Amazon:
      - add HW clock (without timestamping, just hypervisor time access)
 
  - Ethernet virtual:
    - VirtIO net:
      - support segmentation of UDP-tunnel-encapsulated packets
    - Google (gve):
      - support packet timestamping and clock synchronization
    - Microsoft vNIC:
      - add handler for device-originated servicing events
      - allow dynamic MSI-X vector allocation
      - support Tx bandwidth clamping
 
  - Ethernet NICs consumer, and embedded:
    - AMD:
      - amd-xgbe: hardware timestamping and PTP clock support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - use napi_complete_done() return value to support NAPI polling
      - add support for re-starting auto-negotiation
    - Broadcom switches (b53):
      - support BCM5325 switches
      - add bcm63xx EPHY power control
    - Synopsys (stmmac):
      - lots of code refactoring and cleanups
    - TI:
      - icssg-prueth: read firmware-names from device tree
      - icssg: PRP offload support
    - Microchip:
      - lan78xx: convert to PHYLINK for improved PHY and MAC management
      - ksz: add KSZ8463 switch support
    - Intel:
      - support similar queue priority scheme in multi-queue and
        time-sensitive networking (taprio)
      - support packet pre-emption in both
    - RealTek (r8169):
      - enable EEE at 5Gbps on RTL8126
    - Airoha:
      - add PPPoE offload support
      - MDIO bus controller for Airoha AN7583
 
  - Ethernet PHYs:
    - support for the IPQ5018 internal GE PHY
    - micrel KSZ9477 switch-integrated PHYs:
      - add MDI/MDI-X control support
      - add RX error counters
      - add cable test support
      - add Signal Quality Indicator (SQI) reporting
    - dp83tg720: improve reset handling and reduce link recovery time
    - support bcm54811 (and its MII-Lite interface type)
    - air_en8811h: support resume/suspend
    - support PHY counters for QCA807x and QCA808x
    - support WoL for QCA807x
 
  - CAN drivers:
    - rcar_canfd: support for Transceiver Delay Compensation
    - kvaser: report FW versions via devlink dev info
 
  - WiFi:
    - extended regulatory info support (6 GHz)
    - add statistics and beacon monitor for Multi-Link Operation (MLO)
    - support S1G aggregation, improve S1G support
    - add Radio Measurement action fields
    - support per-radio RTS threshold
    - some work around how FIPS affects wifi, which was wrong (RC4 is used
      by TKIP, not only WEP)
    - improvements for unsolicited probe response handling
 
  - WiFi drivers:
    - RealTek (rtw88):
      - IBSS mode for SDIO devices
    - RealTek (rtw89):
      - BT coexistence for MLO/WiFi7
      - concurrent station + P2P support
      - support for USB devices RTL8851BU/RTL8852BU
    - Intel (iwlwifi):
      - use embedded PNVM in (to be released) FW images to fix
        compatibility issues
      - many cleanups (unused FW APIs, PCIe code, WoWLAN)
      - some FIPS interoperability
    - MediaTek (mt76):
      - firmware recovery improvements
      - more MLO work
    - Qualcomm/Atheros (ath12k):
      - fix scan on multi-radio devices
      - more EHT/Wi-Fi 7 features
      - encapsulation/decapsulation offload
    - Broadcom (brcm80211):
      - support SDIO 43751 device
 
  - Bluetooth:
    - hci_event: add support for handling LE BIG Sync Lost event
    - ISO: add socket option to report packet seqnum via CMSG
    - ISO: support SCM_TIMESTAMPING for ISO TS
 
  - Bluetooth drivers:
    - intel_pcie: support Function Level Reset
    - nxpuart: add support for 4M baudrate
    - nxpuart: implement powerup sequence, reset, FW dump, and FW loading
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
 IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
 5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
 o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
 uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
 X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
 mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
 z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
 z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
 1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
 jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
 y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
 =lqbe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Wrap datapath globals into net_aligned_data, to avoid false sharing

   - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)

   - Add SO_INQ and SCM_INQ support to AF_UNIX

   - Add SIOCINQ support to AF_VSOCK

   - Add TCP_MAXSEG sockopt to MPTCP

   - Add IPv6 force_forwarding sysctl to enable forwarding per interface

   - Make TCP validation of whether packet fully fits in the receive
     window and the rcv_buf more strict. With increased use of HW
     aggregation a single "packet" can be multiple 100s of kB

   - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
     improves latency up to 33% for sockmap users

   - Convert TCP send queue handling from tasklet to BH workque

   - Improve BPF iteration over TCP sockets to see each socket exactly
     once

   - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code

   - Support enabling kernel threads for NAPI processing on per-NAPI
     instance basis rather than a whole device. Fully stop the kernel
     NAPI thread when threaded NAPI gets disabled. Previously thread
     would stick around until ifdown due to tricky synchronization

   - Allow multicast routing to take effect on locally-generated packets

   - Add output interface argument for End.X in segment routing

   - MCTP: add support for gateway routing, improve bind() handling

   - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink

   - Add a new neighbor flag ("extern_valid"), which cedes refresh
     responsibilities to userspace. This is needed for EVPN multi-homing
     where a neighbor entry for a multi-homed host needs to be synced
     across all the VTEPs among which the host is multi-homed

   - Support NUD_PERMANENT for proxy neighbor entries

   - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM

   - Add sequence numbers to netconsole messages. Unregister
     netconsole's console when all net targets are removed. Code
     refactoring. Add a number of selftests

   - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
     should be used for an inbound SA lookup

   - Support inspecting ref_tracker state via DebugFS

   - Don't force bonding advertisement frames tx to ~333 ms boundaries.
     Add broadcast_neighbor option to send ARP/ND on all bonded links

   - Allow providing upcall pid for the 'execute' command in openvswitch

   - Remove DCCP support from Netfilter's conntrack

   - Disallow multiple packet duplications in the queuing layer

   - Prevent use of deprecated iptables code on PREEMPT_RT

  Driver API:

   - Support RSS and hashing configuration over ethtool Netlink

   - Add dedicated ethtool callbacks for getting and setting hashing
     fields

   - Add support for power budget evaluation strategy in PSE /
     Power-over-Ethernet. Generate Netlink events for overcurrent etc

   - Support DPLL phase offset monitoring across all device inputs.
     Support providing clock reference and SYNC over separate DPLL
     inputs

   - Support traffic classes in devlink rate API for bandwidth
     management

   - Remove rtnl_lock dependency from UDP tunnel port configuration

  Device drivers:

   - Add a new Broadcom driver for 800G Ethernet (bnge)

   - Add a standalone driver for Microchip ZL3073x DPLL

   - Remove IBM's NETIUCV device driver

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support zero-copy Tx of DMABUF memory
         - take page size into account for page pool recycling rings
      - Intel (100G, ice, idpf):
         - idpf: XDP and AF_XDP support preparations
         - idpf: add flow steering
         - add link_down_events statistic
         - clean up the TSPLL code
         - preparations for live VM migration
      - nVidia/Mellanox:
         - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
         - optimize context memory usage for matchers
         - expose serial numbers in devlink info
         - support PCIe congestion metrics
      - Meta (fbnic):
         - add 25G, 50G, and 100G link modes to phylink
         - support dumping FW logs
      - Marvell/Cavium:
         - support for CN20K generation of the Octeon chips
      - Amazon:
         - add HW clock (without timestamping, just hypervisor time access)

   - Ethernet virtual:
      - VirtIO net:
         - support segmentation of UDP-tunnel-encapsulated packets
      - Google (gve):
         - support packet timestamping and clock synchronization
      - Microsoft vNIC:
         - add handler for device-originated servicing events
         - allow dynamic MSI-X vector allocation
         - support Tx bandwidth clamping

   - Ethernet NICs consumer, and embedded:
      - AMD:
         - amd-xgbe: hardware timestamping and PTP clock support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - use napi_complete_done() return value to support NAPI polling
         - add support for re-starting auto-negotiation
      - Broadcom switches (b53):
         - support BCM5325 switches
         - add bcm63xx EPHY power control
      - Synopsys (stmmac):
         - lots of code refactoring and cleanups
      - TI:
         - icssg-prueth: read firmware-names from device tree
         - icssg: PRP offload support
      - Microchip:
         - lan78xx: convert to PHYLINK for improved PHY and MAC management
         - ksz: add KSZ8463 switch support
      - Intel:
         - support similar queue priority scheme in multi-queue and
           time-sensitive networking (taprio)
         - support packet pre-emption in both
      - RealTek (r8169):
         - enable EEE at 5Gbps on RTL8126
      - Airoha:
         - add PPPoE offload support
         - MDIO bus controller for Airoha AN7583

   - Ethernet PHYs:
      - support for the IPQ5018 internal GE PHY
      - micrel KSZ9477 switch-integrated PHYs:
         - add MDI/MDI-X control support
         - add RX error counters
         - add cable test support
         - add Signal Quality Indicator (SQI) reporting
      - dp83tg720: improve reset handling and reduce link recovery time
      - support bcm54811 (and its MII-Lite interface type)
      - air_en8811h: support resume/suspend
      - support PHY counters for QCA807x and QCA808x
      - support WoL for QCA807x

   - CAN drivers:
      - rcar_canfd: support for Transceiver Delay Compensation
      - kvaser: report FW versions via devlink dev info

   - WiFi:
      - extended regulatory info support (6 GHz)
      - add statistics and beacon monitor for Multi-Link Operation (MLO)
      - support S1G aggregation, improve S1G support
      - add Radio Measurement action fields
      - support per-radio RTS threshold
      - some work around how FIPS affects wifi, which was wrong (RC4 is
        used by TKIP, not only WEP)
      - improvements for unsolicited probe response handling

   - WiFi drivers:
      - RealTek (rtw88):
         - IBSS mode for SDIO devices
      - RealTek (rtw89):
         - BT coexistence for MLO/WiFi7
         - concurrent station + P2P support
         - support for USB devices RTL8851BU/RTL8852BU
      - Intel (iwlwifi):
         - use embedded PNVM in (to be released) FW images to fix
           compatibility issues
         - many cleanups (unused FW APIs, PCIe code, WoWLAN)
         - some FIPS interoperability
      - MediaTek (mt76):
         - firmware recovery improvements
         - more MLO work
      - Qualcomm/Atheros (ath12k):
         - fix scan on multi-radio devices
         - more EHT/Wi-Fi 7 features
         - encapsulation/decapsulation offload
      - Broadcom (brcm80211):
         - support SDIO 43751 device

   - Bluetooth:
      - hci_event: add support for handling LE BIG Sync Lost event
      - ISO: add socket option to report packet seqnum via CMSG
      - ISO: support SCM_TIMESTAMPING for ISO TS

   - Bluetooth drivers:
      - intel_pcie: support Function Level Reset
      - nxpuart: add support for 4M baudrate
      - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"

* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
  dpll: zl3073x: Fix build failure
  selftests: bpf: fix legacy netfilter options
  ipv6: annotate data-races around rt->fib6_nsiblings
  ipv6: fix possible infinite loop in fib6_info_uses_dev()
  ipv6: prevent infinite loop in rt6_nlmsg_size()
  ipv6: add a retry logic in net6_rt_notify()
  vrf: Drop existing dst reference in vrf_ip6_input_dst
  net/sched: taprio: align entry index attr validation with mqprio
  net: fsl_pq_mdio: use dev_err_probe
  selftests: rtnetlink.sh: remove esp4_offload after test
  vsock: remove unnecessary null check in vsock_getname()
  igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
  stmmac: xsk: fix negative overflow of budget in zerocopy mode
  dt-bindings: ieee802154: Convert at86rf230.txt yaml format
  net: dsa: microchip: Disable PTP function of KSZ8463
  net: dsa: microchip: Setup fiber ports for KSZ8463
  net: dsa: microchip: Write switch MAC address differently for KSZ8463
  net: dsa: microchip: Use different registers for KSZ8463
  net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
  dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
  ...
2025-07-30 08:58:55 -07:00
Linus Torvalds
13150742b0 Crypto library updates for 6.17
This is the main crypto library pull request for 6.17. The main focus
 this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
 high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
 and establishing conventions for lib/crypto/ going forward:
 
  - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
    most of the SHA-512 code) into lib/crypto/. This includes both the
    generic and architecture-optimized code. Greatly simplify how the
    architecture-optimized code is integrated. Add an easy-to-use
    library API for each SHA variant, including HMAC support. Finally,
    reimplement the crypto_shash support on top of the library API.
 
  - Apply the same reorganization to the SHA-256 code (and also SHA-224
    which shares most of the SHA-256 code). This is a somewhat smaller
    change, due to my earlier work on SHA-256. But this brings in all
    the same additional improvements that I made for SHA-1 and SHA-512.
 
 There are also some smaller changes:
 
  - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
    from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
    these algorithms it's just a move, not a full reorganization yet.
 
  - Fix the MIPS chacha-core.S to build with the clang assembler.
 
  - Fix the Poly1305 functions to work in all contexts.
 
  - Fix a performance regression in the x86_64 Poly1305 code.
 
  - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
 
 Note that since the new organization of the SHA code is much simpler,
 the diffstat of this pull request is negative, despite the addition of
 new fully-documented library APIs for multiple SHA and HMAC-SHA
 variants. These APIs will allow further simplifications across the
 kernel as users start using them instead of the old-school crypto API.
 (I've already written a lot of such conversion patches, removing over
 1000 more lines of code. But most of those will target 6.18 or later.)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ93BQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOK8HCAQD3O9P0qd6wscne5XuRwaybzKHQ2AqU
 OlhlDZWQQEvYAgD/aa6KP/DS+8RKGj0TBn6bACAJyXyDygFXq5a5s9pGzAs=
 =UmMM
 -----END PGP SIGNATURE-----

Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library updates from Eric Biggers:
 "This is the main crypto library pull request for 6.17. The main focus
  this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
  high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
  and establishing conventions for lib/crypto/ going forward:

   - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
     most of the SHA-512 code) into lib/crypto/. This includes both the
     generic and architecture-optimized code. Greatly simplify how the
     architecture-optimized code is integrated. Add an easy-to-use
     library API for each SHA variant, including HMAC support. Finally,
     reimplement the crypto_shash support on top of the library API.

   - Apply the same reorganization to the SHA-256 code (and also SHA-224
     which shares most of the SHA-256 code). This is a somewhat smaller
     change, due to my earlier work on SHA-256. But this brings in all
     the same additional improvements that I made for SHA-1 and SHA-512.

  There are also some smaller changes:

   - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
     from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
     these algorithms it's just a move, not a full reorganization yet.

   - Fix the MIPS chacha-core.S to build with the clang assembler.

   - Fix the Poly1305 functions to work in all contexts.

   - Fix a performance regression in the x86_64 Poly1305 code.

   - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.

  Note that since the new organization of the SHA code is much simpler,
  the diffstat of this pull request is negative, despite the addition of
  new fully-documented library APIs for multiple SHA and HMAC-SHA
  variants.

  These APIs will allow further simplifications across the kernel as
  users start using them instead of the old-school crypto API. (I've
  already written a lot of such conversion patches, removing over 1000
  more lines of code. But most of those will target 6.18 or later)"

* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
  lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
  lib/crypto: x86/sha1-ni: Convert to use rounds macros
  lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
  crypto: sha1 - Remove sha1_base.h
  lib/crypto: x86/sha1: Migrate optimized code into library
  lib/crypto: sparc/sha1: Migrate optimized code into library
  lib/crypto: s390/sha1: Migrate optimized code into library
  lib/crypto: powerpc/sha1: Migrate optimized code into library
  lib/crypto: mips/sha1: Migrate optimized code into library
  lib/crypto: arm64/sha1: Migrate optimized code into library
  lib/crypto: arm/sha1: Migrate optimized code into library
  crypto: sha1 - Use same state format as legacy drivers
  crypto: sha1 - Wrap library and add HMAC support
  lib/crypto: sha1: Add HMAC support
  lib/crypto: sha1: Add SHA-1 library functions
  lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
  crypto: x86/sha1 - Rename conflicting symbol
  lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
  lib/crypto: arm/poly1305: Remove unneeded empty weak function
  lib/crypto: x86/poly1305: Fix performance regression on short messages
  ...
2025-07-28 17:58:52 -07:00
Gabriel Goller
f24987ef69 ipv6: add force_forwarding sysctl to enable per-interface forwarding
It is currently impossible to enable ipv6 forwarding on a per-interface
basis like in ipv4. To enable forwarding on an ipv6 interface we need to
enable it on all interfaces and disable it on the other interfaces using
a netfilter rule. This is especially cumbersome if you have lots of
interfaces and only want to enable forwarding on a few. According to the
sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding
for all interfaces, while the interface-specific
`net.ipv6.conf.<interface>.forwarding` configures the interface
Host/Router configuration.

Introduce a new sysctl flag `force_forwarding`, which can be set on every
interface. The ip6_forwarding function will then check if the global
forwarding flag OR the force_forwarding flag is active and forward the
packet.

To preserve backwards-compatibility reset the flag (on all interfaces)
to 0 if the net.ipv6.conf.all.forwarding flag is set to 0.

Add a short selftest that checks if a packet gets forwarded with and
without `force_forwarding`.

[0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25 13:06:19 -07:00
Stanislav Fomichev
93893a57ef net: s/dev_get_flags/netif_get_flags/
Commit cc34acd577 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18 17:27:47 -07:00
Eric Biggers
9503ca2cca lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
Rename the existing sha1_init() to sha1_init_raw(), since it conflicts
with the upcoming library function.  This will later be removed, but
this keeps the kernel building for the introduction of the library.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250712232329.818226-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14 08:22:31 -07:00
Jakub Kicinski
0cad34fb7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).

No conflicts.

Adjacent changes:

drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
  c701574c54 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
  b3a431fe2e ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")

drivers/net/wireless/mediatek/mt76/mt7996/mac.c
  62da647a2b ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
  dc66a129ad ("wifi: mt76: add a wrapper for wcid access with validation")

drivers/net/wireless/mediatek/mt76/mt7996/main.c
  3dd6f67c66 ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
  8989d8e90f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")

net/mac80211/cfg.c
  58fcb1b428 ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
  037dc18ac3 ("wifi: mac80211: add support for storing station S1G capabilities")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11 11:42:38 -07:00
Guillaume Nault
4e914ef063 gre: Fix IPv6 multicast route creation.
Use addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config() so that we don't just get the inet6_dev, but also
install the default ff00::/8 multicast route.

Before commit 3e6a0243ff ("gre: Fix again IPv6 link-local address
generation."), the multicast route was created at the end of the
function by addrconf_add_mroute(). But this code path is now only taken
in one particular case (gre devices not bound to a local IP address and
in EUI64 mode). For all other cases, the function exits early and
addrconf_add_mroute() is not called anymore.

Using addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config(), fixes the problem as it will create the default
multicast route for all gre devices. This also brings
addrconf_gre_config() a bit closer to the normal netdevice IPv6
configuration code (addrconf_dev_config()).

Cc: stable@vger.kernel.org
Fixes: 3e6a0243ff ("gre: Fix again IPv6 link-local address generation.")
Reported-by: Aiden Yang <ling@moedove.com>
Closes: https://lore.kernel.org/netdev/CANR=AhRM7YHHXVxJ4DmrTNMeuEOY87K2mLmo9KMed1JMr20p6g@mail.gmail.com/
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/027a923dcb550ad115e6d93ee8bb7d310378bd01.1752070620.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10 18:10:47 -07:00
Kuniyuki Iwashima
7b6b53a76f ipv6: anycast: Don't use rtnl_dereference().
inet6_dev->ac_list is protected by inet6_dev->lock, so rtnl_dereference()
is a bit rough annotation.

As done in mcast.c, we can use ac_dereference() that checks if
inet6_dev->lock is held.

Let's replace rtnl_dereference() with a new helper ac_dereference().

Note that now addrconf_join_solict() / addrconf_leave_solict() in
__ipv6_dev_ac_inc() / __ipv6_dev_ac_dec() does not need RTNL, so we
can remove ASSERT_RTNL() there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-12-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima
e01b193e0b ipv6: mcast: Use in6_dev_get() in ipv6_dev_mc_dec().
As well as __ipv6_dev_mc_inc(), all code in __ipv6_dev_mc_dec() are
protected by inet6_dev->mc_lock, and RTNL is not needed.

Let's use in6_dev_get() in ipv6_dev_mc_dec() and remove ASSERT_RTNL()
in __ipv6_dev_mc_dec().

Now, we can remove the RTNL comment above addrconf_leave_solict() too.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-6-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima
dbd40f318c ipv6: mcast: Check inet6_dev->dead under idev->mc_lock in __ipv6_dev_mc_inc().
Since commit 63ed8de4be ("mld: add mc_lock for protecting
per-interface mld data"), every multicast resource is protected
by inet6_dev->mc_lock.

RTNL is unnecessary in terms of protection but still needed for
synchronisation between addrconf_ifdown() and __ipv6_dev_mc_inc().

Once we removed RTNL, there would be a race below, where we could
add a multicast address to a dead inet6_dev.

  CPU1                            CPU2
  ====                            ====
  addrconf_ifdown()               __ipv6_dev_mc_inc()
                                    if (idev->dead) <-- false
    dead = true                       return -ENODEV;
    ipv6_mc_destroy_dev() / ipv6_mc_down()
      mutex_lock(&idev->mc_lock)
      ...
      mutex_unlock(&idev->mc_lock)
                                    mutex_lock(&idev->mc_lock)
                                    ...
                                    mutex_unlock(&idev->mc_lock)

The race window can be easily closed by checking inet6_dev->dead
under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown()
will acquire it after marking inet6_dev dead.

Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc().

Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and
we can remove ASSERT_RTNL() there and the RTNL comment above
addrconf_join_solict().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-4-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08 18:32:37 -07:00
Guillaume Nault
d3623dd5bd ipv6: Simplify link-local address generation for IPv6 GRE.
Since commit 3e6a0243ff ("gre: Fix again IPv6 link-local address
generation."), addrconf_gre_config() has stopped handling IP6GRE
devices specially and just calls the regular addrconf_addr_gen()
function to create their link-local IPv6 addresses.

We can thus avoid using addrconf_gre_config() for IP6GRE devices and
use the normal IPv6 initialisation path instead (that is, jump directly
to addrconf_dev_config() in addrconf_init_auto_addrs()).

See commit 3e6a0243ff ("gre: Fix again IPv6 link-local address
generation.") for a deeper explanation on how and why GRE devices
started handling their IPv6 link-local address generation specially,
why it was a problem, and why this is not even necessary in most cases
(especially for GRE over IPv6).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a9144be9c7ec3cf09f25becae5e8fdf141fde9f6.1750075076.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19 12:11:40 +02:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Jakub Kicinski
6b02fd7799 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc6).

No conflicts.

Adjacent changes:

net/core/dev.c:
  08e9f2d584 ("net: Lock netdevices during dev_shutdown")
  a82dc19db1 ("net: avoid potential race between netdev_get_by_index_lock() and netns switch")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08 08:59:02 -07:00
Guillaume Nault
3e6a0243ff gre: Fix again IPv6 link-local address generation.
Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE
devices in most cases and fall back to using add_v4_addrs() only in
case the GRE configuration is incompatible with addrconf_addr_gen().

GRE used to use addrconf_addr_gen() until commit e5dd729460 ("ip/ip6_gre:
use the same logic as SIT interfaces when computing v6LL address")
restricted this use to gretap and ip6gretap devices, and created
add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones.

The original problem came when commit 9af28511be ("addrconf: refuse
isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its
addr parameter was 0. The commit says that this would create an invalid
address, however, I couldn't find any RFC saying that the generated
interface identifier would be wrong. Anyway, since gre over IPv4
devices pass their local tunnel address to __ipv6_isatap_ifid(), that
commit broke their IPv6 link-local address generation when the local
address was unspecified.

Then commit e5dd729460 ("ip/ip6_gre: use the same logic as SIT
interfaces when computing v6LL address") tried to fix that case by
defining add_v4_addrs() and calling it to generate the IPv6 link-local
address instead of using addrconf_addr_gen() (apart for gretap and
ip6gretap devices, which would still use the regular
addrconf_addr_gen(), since they have a MAC address).

That broke several use cases because add_v4_addrs() isn't properly
integrated into the rest of IPv6 Neighbor Discovery code. Several of
these shortcomings have been fixed over time, but add_v4_addrs()
remains broken on several aspects. In particular, it doesn't send any
Router Sollicitations, so the SLAAC process doesn't start until the
interface receives a Router Advertisement. Also, add_v4_addrs() mostly
ignores the address generation mode of the interface
(/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the
IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases.

Fix the situation by using add_v4_addrs() only in the specific scenario
where the normal method would fail. That is, for interfaces that have
all of the following characteristics:

  * run over IPv4,
  * transport IP packets directly, not Ethernet (that is, not gretap
    interfaces),
  * tunnel endpoint is INADDR_ANY (that is, 0),
  * device address generation mode is EUI64.

In all other cases, revert back to the regular addrconf_addr_gen().

Also, remove the special case for ip6gre interfaces in add_v4_addrs(),
since ip6gre devices now always use addrconf_addr_gen() instead.

Note:
  This patch was originally applied as commit 183185a18f ("gre: Fix
  IPv6 link-local address generation."). However, it was then reverted
  by commit fc486c2d06 ("Revert "gre: Fix IPv6 link-local address
  generation."") because it uncovered another bug that ended up
  breaking net/forwarding/ip6gre_custom_multipath_hash.sh. That other
  bug has now been fixed by commit 4d0ab3a688 ("ipv6: Start path
  selection from the first nexthop"). Therefore we can now revive this
  GRE patch (no changes since original commit 183185a18f ("gre: Fix
  IPv6 link-local address generation.").

Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a88cc5c4811af36007645d610c95102dccb360a6.1746225214.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05 18:08:14 -07:00
Breno Leitao
6c454270a8 ipv6: Use nlmsg_payload in addrconf file
Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250415-nlmsg_v2-v1-2-a1c75d493fd7@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16 18:33:02 -07:00
Breno Leitao
8cf1e30907 ipv6: Use nlmsg_payload in inet6_rtm_valid_getaddr_req
Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-7-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 08:28:54 -07:00
Breno Leitao
e87187dfbb ipv6: Use nlmsg_payload in inet6_valid_dump_ifaddr_req
Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-6-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 08:28:54 -07:00
Linus Torvalds
ab59a86056 Including fixes from netfilter.
Current release - regressions:
 
   - core: hold instance lock during NETDEV_CHANGE
 
   - rtnetlink: fix bad unlock balance in do_setlink().
 
   - ipv6:
     - fix null-ptr-deref in addrconf_add_ifaddr().
     - align behavior across nexthops during path selection
 
 Previous releases - regressions:
 
   - sctp: prevent transport UaF in sendmsg
 
   - mptcp: only inc MPJoinAckHMacFailure for HMAC failures
 
 Previous releases - always broken:
 
   - sched:
     - make ->qlen_notify() idempotent
     - ensure sufficient space when sending filter netlink notifications
     - sch_sfq: really don't allow 1 packet limit
 
   - netfilter: fix incorrect avx2 match of 5th field octet
 
   - tls: explicitly disallow disconnect
 
   - eth: octeontx2-pf: fix VF root node parent queue priority
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmf3xusSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkud0P/iWOQB0oj0nvxl2ionPzgJEPduxuF0V6
 YPyDBUzLC7Gq6NmTcdDlNJt8fE6UmKUIneghUm9Ss7LRpKv0/TPvorKMSK44Zt53
 a5q49JeoI0TvnnhJesdHjiF31hrInqZmcX8OjSH8Q/SCKuy7rsgzao0vjvhd7lxm
 wA6LlWnJO1Pf991nNpbjUSoAZ7CMNlEIewGkdq0+6UADC7D9VagKTgIkFKw1BvRw
 2Eb2pzvdO9Pj02+l/mjdRhUzMZlr+FG+WBqXk5oKR0YZ2t3CS4O9/UUBoAn775tM
 gCfzepNuAUXGX0I6h+DANCNuswWuG/IvYTdhy+hRWblYeCkILU60E8eVMlh7tpII
 fUd5GSRhX1NpGNHUlDG/4b6IcjMO3ebtce2cm2Y9t2CUe7EqB0HZyvTczNroTxip
 KXrXcCBuEkzxXCZhaN/CrBu8Piu8vJk/rMH5ha1khce9CkmYY+m9ruvsYjZmPI+/
 P/SFkRdb/yV/SIOmay8FCJsy60t4FOtLnlDDrnygq4Q/9a7VwafebVpKS1fbTELG
 ZTiELN3/PN2GUnfREf0DVLPfn9sdqrMZaclLOJpp/Zi1/RZpo52WHceXJShiu9pe
 A8B+3SuPgOaLfhwyqiHlWm5moc9kNF26vlrWfFjK1GrJdxMisYwQoWD5eHfQFhDX
 UaxlAmndwZa9
 =wkGz
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

    - core: hold instance lock during NETDEV_CHANGE

    - rtnetlink: fix bad unlock balance in do_setlink()

    - ipv6:
       - fix null-ptr-deref in addrconf_add_ifaddr()
       - align behavior across nexthops during path selection

  Previous releases - regressions:

    - sctp: prevent transport UaF in sendmsg

    - mptcp: only inc MPJoinAckHMacFailure for HMAC failures

  Previous releases - always broken:

    - sched:
       - make ->qlen_notify() idempotent
       - ensure sufficient space when sending filter netlink notifications
       - sch_sfq: really don't allow 1 packet limit

    - netfilter: fix incorrect avx2 match of 5th field octet

    - tls: explicitly disallow disconnect

    - eth: octeontx2-pf: fix VF root node parent queue priority"

* tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  ethtool: cmis_cdb: Fix incorrect read / write length extension
  selftests: netfilter: add test case for recent mismatch bug
  nft_set_pipapo: fix incorrect avx2 match of 5th field octet
  net: ppp: Add bound checking for skb data on ppp_sync_txmung
  net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.
  ipv6: Align behavior across nexthops during path selection
  net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY
  net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()
  selftests/tc-testing: sfq: check that a derived limit of 1 is rejected
  net_sched: sch_sfq: move the limit validation
  net_sched: sch_sfq: use a temporary work area for validating configuration
  net: libwx: handle page_pool_dev_alloc_pages error
  selftests: mptcp: validate MPJoin HMacFailure counters
  mptcp: only inc MPJoinAckHMacFailure for HMAC failures
  rtnetlink: Fix bad unlock balance in do_setlink().
  net: ethtool: Don't call .cleanup_data when prepare_data fails
  tc: Ensure we have enough buffer space when sending filter netlink notifications
  net: libwx: Fix the wrong Rx descriptor field
  octeontx2-pf: qos: fix VF root node parent queue index
  selftests: tls: check that disconnect does nothing
  ...
2025-04-10 08:52:18 -07:00
Kuniyuki Iwashima
54f5fafcce ipv6: Fix null-ptr-deref in addrconf_add_ifaddr().
The cited commit placed netdev_lock_ops() just after __dev_get_by_index()
in addrconf_add_ifaddr(), where dev could be NULL as reported. [0]

Let's call netdev_lock_ops() only when dev is not NULL.

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000198: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000cc0-0x0000000000000cc7]
CPU: 3 UID: 0 PID: 12032 Comm: syz.0.15 Not tainted 6.14.0-13408-g9f867ba24d36 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:addrconf_add_ifaddr (./include/net/netdev_lock.h:30 ./include/net/netdev_lock.h:41 net/ipv6/addrconf.c:3157)
Code: 8b b4 24 94 00 00 00 4c 89 ef e8 7e 4c 2f ff 4c 8d b0 c5 0c 00 00 48 89 c3 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 80
RSP: 0018:ffffc90015b0faa0 EFLAGS: 00010213
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000198 RSI: ffffffff893162f2 RDI: ffff888078cb0338
RBP: ffffc90015b0fbb0 R08: 0000000000000000 R09: fffffbfff20cbbe2
R10: ffffc90015b0faa0 R11: 0000000000000000 R12: 1ffff92002b61f54
R13: ffff888078cb0000 R14: 0000000000000cc5 R15: ffff888078cb0000
FS: 00007f92559ed640(0000) GS:ffff8882a8659000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f92559ecfc8 CR3: 000000001c39e000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 inet6_ioctl (net/ipv6/af_inet6.c:580)
 sock_do_ioctl (net/socket.c:1196)
 sock_ioctl (net/socket.c:1314)
 __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:906 fs/ioctl.c:892 fs/ioctl.c:892)
 do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130
RIP: 0033:0x7f9254b9c62d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff f8
RSP: 002b:00007f92559ecf98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f9254d65f80 RCX: 00007f9254b9c62d
RDX: 0000000020000040 RSI: 0000000000008916 RDI: 0000000000000003
RBP: 00007f9254c264d3 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f9254d65f80 R15: 00007f92559cd000
 </TASK>
Modules linked in:

Fixes: 8965c160b8 ("net: use netif_disable_lro in ipv6_add_dev")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Hui Guo <guohui.study@gmail.com>
Closes: https://lore.kernel.org/netdev/CAHOo4gK+tdU1B14Kh6tg-tNPqnQ1qGLfinONFVC43vmgEPnXXw@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250406035755.69238-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-07 11:02:19 -07:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Stanislav Fomichev
8965c160b8 net: use netif_disable_lro in ipv6_add_dev
ipv6_add_dev might call dev_disable_lro which unconditionally grabs
instance lock, so it will deadlock during NETDEV_REGISTER. Switch
to netif_disable_lro.

Make sure all callers hold the instance lock as well.

Cc: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:08 -07:00
Fernando Fernandez Mancera
7ac6ea4a3e ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATS
Using RTEXT_FILTER_SKIP_STATS is incorrectly skipping non-stats IPv6
netlink attributes on link dump. This causes issues on userspace tools,
e.g iproute2 is not rendering address generation mode as it should due
to missing netlink attribute.

Move the filling of IFLA_INET6_STATS and IFLA_INET6_ICMP6STATS to a
helper function guarded by a flag check to avoid hitting the same
situation in the future.

Fixes: d5566fd72e ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402121751.3108-1-ffmancera@riseup.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:11:29 -07:00
Guillaume Nault
fc486c2d06 Revert "gre: Fix IPv6 link-local address generation."
This reverts commit 183185a18f.

This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.

Fixes: 183185a18f ("gre: Fix IPv6 link-local address generation.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:46:16 +01:00
Guillaume Nault
183185a18f gre: Fix IPv6 link-local address generation.
Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE
devices in most cases and fall back to using add_v4_addrs() only in
case the GRE configuration is incompatible with addrconf_addr_gen().

GRE used to use addrconf_addr_gen() until commit e5dd729460
("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL
address") restricted this use to gretap and ip6gretap devices, and
created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones.

The original problem came when commit 9af28511be ("addrconf: refuse
isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its
addr parameter was 0. The commit says that this would create an invalid
address, however, I couldn't find any RFC saying that the generated
interface identifier would be wrong. Anyway, since gre over IPv4
devices pass their local tunnel address to __ipv6_isatap_ifid(), that
commit broke their IPv6 link-local address generation when the local
address was unspecified.

Then commit e5dd729460 ("ip/ip6_gre: use the same logic as SIT
interfaces when computing v6LL address") tried to fix that case by
defining add_v4_addrs() and calling it to generate the IPv6 link-local
address instead of using addrconf_addr_gen() (apart for gretap and
ip6gretap devices, which would still use the regular
addrconf_addr_gen(), since they have a MAC address).

That broke several use cases because add_v4_addrs() isn't properly
integrated into the rest of IPv6 Neighbor Discovery code. Several of
these shortcomings have been fixed over time, but add_v4_addrs()
remains broken on several aspects. In particular, it doesn't send any
Router Sollicitations, so the SLAAC process doesn't start until the
interface receives a Router Advertisement. Also, add_v4_addrs() mostly
ignores the address generation mode of the interface
(/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the
IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases.

Fix the situation by using add_v4_addrs() only in the specific scenario
where the normal method would fail. That is, for interfaces that have
all of the following characteristics:

  * run over IPv4,
  * transport IP packets directly, not Ethernet (that is, not gretap
    interfaces),
  * tunnel endpoint is INADDR_ANY (that is, 0),
  * device address generation mode is EUI64.

In all other cases, revert back to the regular addrconf_addr_gen().

Also, remove the special case for ip6gre interfaces in add_v4_addrs(),
since ip6gre devices now always use addrconf_addr_gen() instead.

Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/559c32ce5c9976b269e6337ac9abb6a96abe5096.1741375285.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13 10:17:42 +01:00
Kuniyuki Iwashima
7bcf45ddb8 ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
Let's register inet6_rtm_deladdr() with RTNL_FLAG_DOIT_PERNET and
hold rtnl_net_lock() before inet6_addr_del().

Now that inet6_addr_del() is always called under per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:06 -08:00
Kuniyuki Iwashima
82a1e6aa8f ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
Let's register inet6_rtm_newaddr() with RTNL_FLAG_DOIT_PERNET
and hold rtnl_net_lock() before __dev_get_by_index().

Now that inet6_addr_add() and inet6_addr_modify() are always
called under per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-11-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:06 -08:00
Kuniyuki Iwashima
867b385251 ipv6: Move lifetime validation to inet6_rtm_newaddr().
inet6_addr_add() and inet6_addr_modify() have the same code to validate
IPv6 lifetime that is done under RTNL.

Let's factorise it out to inet6_rtm_newaddr() so that we can validate
the lifetime without RTNL later.

Note that inet6_addr_add() is called from addrconf_add_ifaddr(), but the
lifetime is INFINITY_LIFE_TIME in the path, so expires and flags are 0.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-10-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
2f1ace4127 ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
We will convert inet6_rtm_newaddr() to per-netns RTNL.

Except for IFA_F_OPTIMISTIC, cfg.ifa_flags can be set before
__dev_get_by_index().

Let's move ifa_flags setup before __dev_get_by_index() so that
we can set ifa_flags without RTNL.

Also, now it's moved before tb[IFA_CACHEINFO] in preparing for
the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
f7fce98a73 ipv6: Pass dev to inet6_addr_add().
inet6_addr_add() is called from inet6_rtm_newaddr() and
addrconf_add_ifaddr().

inet6_addr_add() looks up dev by __dev_get_by_index(), but
it's already done in inet6_rtm_newaddr().

Let's move the 2nd lookup to addrconf_add_ifaddr() and pass
dev to inet6_addr_add().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
832128cc44 ipv6: Convert inet6_ioctl() to per-netns RTNL.
These functions are called from inet6_ioctl() with a socket's netns
and hold RTNL.

  * SIOCSIFADDR    : addrconf_add_ifaddr()
  * SIOCDIFADDR    : addrconf_del_ifaddr()
  * SIOCSIFDSTADDR : addrconf_set_dstaddr()

Let's use rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
cdc5c1196e ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
addrconf_init() holds RTNL for blackhole_netdev, which is the global
device in init_net.

addrconf_cleanup() holds RTNL to clean up devices in init_net too.

Let's use rtnl_net_lock(&init_net) there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
02cdd78b4e ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
addrconf_dad_work() is per-address work and holds RTNL internally.

We can fetch netns as dev_net(ifp->idev->dev).

Let's use rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
6550ba0863 ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
addrconf_verify_work() is per-netns work to call addrconf_verify_rtnl()
under RTNL.

Let's use rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:05 -08:00
Kuniyuki Iwashima
93c839e3ed ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
net.ipv6.conf.${DEV}.XXX sysctl are changed under RTNL:

  * forwarding
  * ignore_routes_with_linkdown
  * disable_ipv6
  * proxy_ndp
  * addr_gen_mode
  * stable_secret
  * disable_policy

Let's use rtnl_net_lock() there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250115080608.28127-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 12:16:04 -08:00
Yuyang Huang
33d97a07b3 netlink: add IPv6 anycast join/leave notifications
This change introduces a mechanism for notifying userspace
applications about changes to IPv6 anycast addresses via netlink. It
includes:

* Addition and deletion of IPv6 anycast addresses are reported using
  RTM_NEWANYCAST and RTM_DELANYCAST.
* A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these
  notifications.

This enables user space applications(e.g. ip monitor) to efficiently
track anycast addresses through netlink messages, improving metrics
collection and system monitoring. It also unlocks the potential for
advanced anycast management in user space, such as hardware offload
control and fine grained network control.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 12:54:45 +01:00
Yuyang Huang
aa4ad7c3f2 netlink: correct nlmsg size for multicast notifications
Corrected the netlink message size calculation for multicast group
join/leave notifications. The previous calculation did not account for
the inclusion of both IPv4/IPv6 addresses and ifa_cacheinfo in the
payload. This fix ensures that the allocated message size is
sufficient to hold all necessary information.

This patch also includes the following improvements:
* Uses GFP_KERNEL instead of GFP_ATOMIC when holding the RTNL mutex.
* Uses nla_total_size(sizeof(struct in6_addr)) instead of
  nla_total_size(16).
* Removes unnecessary EXPORT_SYMBOL().

Fixes: 2c2b61d213 ("netlink: add IGMP/MLD join/leave notifications")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://patch.msgid.link/20241221100007.1910089-1-yuyanghuang@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23 10:26:43 -08:00
Yuyang Huang
2c2b61d213 netlink: add IGMP/MLD join/leave notifications
This change introduces netlink notifications for multicast address
changes. The following features are included:
* Addition and deletion of multicast addresses are reported using
  RTM_NEWMULTICAST and RTM_DELMULTICAST messages with AF_INET and
  AF_INET6.
* Two new notification groups: RTNLGRP_IPV4_MCADDR and
  RTNLGRP_IPV6_MCADDR are introduced for receiving these events.

This change allows user space applications (e.g., ip monitor) to
efficiently track multicast group memberships by listening for netlink
events. Previously, applications relied on inefficient polling of
procfs, introducing delays. With netlink notifications, applications
receive realtime updates on multicast group membership changes,
enabling more precise metrics collection and system monitoring. 

This change also unlocks the potential for implementing a wide range
of sophisticated multicast related features in user space by allowing
applications to combine kernel provided multicast address information
with user space data and communicate decisions back to the kernel for
more fine grained control. This mechanism can be used for various
purposes, including multicast filtering, IGMP/MLD offload, and
IGMP/MLD snooping.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Link: https://lore.kernel.org/r/20180906091056.21109-1-pruddy@vyatta.att-mail.com
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15 12:31:35 +00:00
Eric Dumazet
a747e02430 ipv6: avoid possible NULL deref in modify_prefix_route()
syzbot found a NULL deref [1] in modify_prefix_route(), caused by one
fib6_info without a fib6_table pointer set.

This can happen for net->ipv6.fib6_null_entry

[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted 6.12.0-syzkaller-09567-g7eef7e306d3c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089
Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84
RSP: 0018:ffffc900035d7268 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 1ffff920006bae5f RDI: 0000000000000030
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff90608e17 R11: 0000000000000001 R12: 0000000000000030
R13: ffff888036334880 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555579e90380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc59cc4278 CR3: 0000000072b54000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
  _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
  spin_lock_bh include/linux/spinlock.h:356 [inline]
  modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831
  inet6_addr_modify net/ipv6/addrconf.c:4923 [inline]
  inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055
  rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920
  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541
  netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
  netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347
  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891
  sock_sendmsg_nosec net/socket.c:711 [inline]
  __sock_sendmsg net/socket.c:726 [inline]
  ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583
  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2637
  __sys_sendmsg+0x16e/0x220 net/socket.c:2669
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd1dcef8b79
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc59cc4378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd1dcef8b79
RDX: 0000000000040040 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 00000000000113fd R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffc59cc438c
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Fixes: 5eb902b8e7 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Kui-Feng Lee <thinker.li@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-01 20:45:23 +00:00
Hangbin Liu
00b5b7aab9 net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged
RFC8981 section 3.4 says that existing temporary addresses must have their
lifetimes adjusted so that no temporary addresses should ever remain "valid"
or "preferred" longer than the incoming SLAAC Prefix Information. This would
strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
un-flagged as such, its corresponding temporary addresses must be cleared out
right away.

But now the temporary address is renewed even after ‘mngtmpaddr’ is removed
or becomes unmanaged as manage_tempaddrs() set temporary addresses
prefered/valid time to 0, and later in addrconf_verify_rtnl() all checkings
failed to remove the addresses. Fix this by deleting the temporary address
directly for these situations.

Fixes: 778964f2fd ("ipv6/addrconf: fix timing bug in tempaddr regen")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 10:29:12 +01:00
Johannes Berg
a885a6b2d3 net: convert to nla_get_*_default()
Most of the original conversion is from the spatch below,
but I edited some and left out other instances that were
either buggy after conversion (where default values don't
fit into the type) or just looked strange.

    @@
    expression attr, def;
    expression val;
    identifier fn =~ "^nla_get_.*";
    fresh identifier dfn = fn ## "_default";
    @@
    (
    -if (attr)
    -  val = fn(attr);
    -else
    -  val = def;
    +val = dfn(attr, def);
    |
    -if (!attr)
    -  val = def;
    -else
    -  val = fn(attr);
    +val = dfn(attr, def);
    |
    -if (!attr)
    -  return def;
    -return fn(attr);
    +return dfn(attr, def);
    |
    -attr ? fn(attr) : def
    +dfn(attr, def)
    |
    -!attr ? def : fn(attr)
    +dfn(attr, def)
    )

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 10:32:06 -08:00
Kuniyuki Iwashima
26eebdc4b0 rtnetlink: Return int from rtnl_af_register().
The next patch will add init_srcu_struct() in rtnl_af_register(),
then we need to handle its error.

Let's add the error handling in advance to make the following
patch cleaner.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
a37b0e4eca ipv6: Use rtnl_register_many().
We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Eric Dumazet
4a0ec2aa07 ipv6: switch inet6_addr_hash() to less predictable hash
In commit 3f27fb2321 ("ipv6: addrconf: add per netns perturbation
in inet6_addr_hash()"), I added net_hash_mix() in inet6_addr_hash()
to get better hash dispersion, at a time all netns were sharing the
hash table.

Since then, commit 21a216a8fc ("ipv6/addrconf: allocate a per
netns hash table") made the hash table per netns.

We could remove the net_hash_mix() from inet6_addr_hash(), but
there is still an issue with ipv6_addr_hash().

It is highly predictable and a malicious user can easily create
thousands of IPv6 addresses all stored in the same hash bucket.

Switch to __ipv6_addr_jhash(). We could use a dedicated
secret, or reuse net_hash_mix() as I did in this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241008120101.734521-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09 19:33:46 -07:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Xi Huang
d35a3a8f1b ipv6: remove redundant check
err varibale will be set everytime,like -ENOBUFS and in if (err < 0),
 when code gets into this path. This check will just slowdown
the execution and that's all.

Signed-off-by: Xi Huang <xuiagnh@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20240820115442.49366-1-xuiagnh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21 17:21:09 -07:00
Kuniyuki Iwashima
de67763cbd ip: Move INFINITY_LIFE_TIME to addrconf.h.
INFINITY_LIFE_TIME is the common value used in IPv4 and IPv6 but defined
in both .c files.

Also, 0xffffffff used in addrconf_timeout_fixup() is INFINITY_LIFE_TIME.

Let's move INFINITY_LIFE_TIME's definition to addrconf.h

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240809235406.50187-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-15 18:56:14 -07:00
Patrick Rohr
990c304930 Add support for PIO p flag
draft-ietf-6man-pio-pflag is adding a new flag to the Prefix Information
Option to signal that the network can allocate a unique IPv6 prefix per
client via DHCPv6-PD (see draft-ietf-v6ops-dhcp-pd-per-device).

When ra_honor_pio_pflag is enabled, the presence of a P-flag causes
SLAAC autoconfiguration to be disabled for that particular PIO.

An automated test has been added in Android (r.android.com/3195335) to
go along with this change.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Lamparter <equinox@opensourcerouting.org>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31 13:49:48 +01:00
Joel Granados
78eb4ea25c sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
  virtual patch

  @r1@
  identifier ctl, write, buffer, lenp, ppos;
  identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

  @r2@
  identifier func, ctl, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos)
  { ... }

  @r3@
  identifier func;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int , void *, size_t *, loff_t *);

  @r4@
  identifier func, ctl;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int , void *, size_t *, loff_t *);

  @r5@
  identifier func, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
  conventions. The xfs_stats_clear_proc_handler,
  xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
  adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
  This is called from a proc_handler itself and is calling back into
  another proc_handler, making it necessary to change it as part of the
  proc_handler migration.

Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-24 20:59:29 +02:00
Jakub Kicinski
51b35d4f9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.11 net-next PR.

Conflicts:
  93c3a96c30 ("net: pse-pd: Do not return EOPNOSUPP if config is null")
  4cddb0f15e ("net: ethtool: pse-pd: Fix possible null-deref")
  30d7b67277 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15 13:19:17 -07:00
Nicolas Dichtel
abb9a68d2c ipv6: take care of scope when choosing the src addr
When the source address is selected, the scope must be checked. For
example, if a loopback address is assigned to the vrf device, it must not
be chosen for packets sent outside.

CC: stable@vger.kernel.org
Fixes: afbac6010a ("net: ipv6: Address selection needs to consider L3 domains")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 07:34:16 -07:00