mirror_ubuntu-kernels/include/linux/gfp.h
Linus Torvalds 9187210eee Networking changes for 6.9.
Core & protocols
 ----------------
 
  - Large effort by Eric to lower rtnl_lock pressure and remove locks:
 
    - Make commonly used parts of rtnetlink (address, route dumps etc.)
      lockless, protected by RCU instead of rtnl_lock.
 
    - Add a netns exit callback which already holds rtnl_lock,
      allowing netns exit to take rtnl_lock once in the core
      instead of once for each driver / callback.
 
    - Remove locks / serialization in the socket diag interface.
 
    - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
 
    - Remove the dev_base_lock, depend on RCU where necessary.
 
  - Support busy polling on a per-epoll context basis. Poll length
    and budget parameters can be set independently of system defaults.
 
  - Introduce struct net_hotdata, to make sure read-mostly global config
    variables fit in as few cache lines as possible.
 
  - Add optional per-nexthop statistics to ease monitoring / debug
    of ECMP imbalance problems.
 
  - Support TCP_NOTSENT_LOWAT in MPTCP.
 
  - Ensure that IPv6 temporary addresses' preferred lifetimes are long
    enough, compared to other configured lifetimes, and at least 2 sec.
 
  - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
 
  - Add support for the independent control state machine for bonding
    per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
    control state machine.
 
  - Add "network ID" to MCTP socket APIs to support hosts with multiple
    disjoint MCTP networks.
 
  - Re-use the mono_delivery_time skbuff bit for packets which user
    space wants to be sent at a specified time. Maintain the timing
    information while traversing veth links, bridge etc.
 
  - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
 
  - Simplify many places iterating over netdevs by using an xarray
    instead of a hash table walk (hash table remains in place, for
    use on fastpaths).
 
  - Speed up scanning for expired routes by keeping a dedicated list.
 
  - Speed up "generic" XDP by trying harder to avoid large allocations.
 
  - Support attaching arbitrary metadata to netconsole messages.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
    VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena).
 
  - Rework selftest harness to enable the use of the full range of
    ksft exit code (pass, fail, skip, xfail, xpass).
 
 Netfilter
 ---------
 
  - Allow userspace to define a table that is exclusively owned by a daemon
    (via netlink socket aliveness) without auto-removing this table when
    the userspace program exits. Such table gets marked as orphaned and
    a restarting management daemon can re-attach/regain ownership.
 
  - Speed up element insertions to nftables' concatenated-ranges set type.
    Compact a few related data structures.
 
 BPF
 ---
 
  - Add BPF token support for delegating a subset of BPF subsystem
    functionality from privileged system-wide daemons such as systemd
    through special mount options for userns-bound BPF fs to a trusted
    & unprivileged application.
 
  - Introduce bpf_arena which is sparse shared memory region between BPF
    program and user space where structures inside the arena can have
    pointers to other areas of the arena, and pointers work seamlessly
    for both user-space programs and BPF programs.
 
  - Introduce may_goto instruction that is a contract between the verifier
    and the program. The verifier allows the program to loop assuming it's
    behaving well, but reserves the right to terminate it.
 
  - Extend the BPF verifier to enable static subprog calls in spin lock
    critical sections.
 
  - Support registration of struct_ops types from modules which helps
    projects like fuse-bpf that seeks to implement a new struct_ops type.
 
  - Add support for retrieval of cookies for perf/kprobe multi links.
 
  - Support arbitrary TCP SYN cookie generation / validation in the TC
    layer with BPF to allow creating SYN flood handling in BPF firewalls.
 
  - Add code generation to inline the bpf_kptr_xchg() helper which
    improves performance when stashing/popping the allocated BPF objects.
 
 Wireless
 --------
 
  - Add SPP (signaling and payload protected) AMSDU support.
 
  - Support wider bandwidth OFDMA, as required for EHT operation.
 
 Driver API
 ----------
 
  - Major overhaul of the Energy Efficient Ethernet internals to support
    new link modes (2.5GE, 5GE), share more code between drivers
    (especially those using phylib), and encourage more uniform behavior.
    Convert and clean up drivers.
 
  - Define an API for querying per netdev queue statistics from drivers.
 
  - IPSec: account in global stats for fully offloaded sessions.
 
  - Create a concept of Ethernet PHY Packages at the Device Tree level,
    to allow parameterizing the existing PHY package code.
 
  - Enable Rx hashing (RSS) on GTP protocol fields.
 
 Misc
 ----
 
  - Improvements and refactoring all over networking selftests.
 
  - Create uniform module aliases for TC classifiers, actions,
    and packet schedulers to simplify creating modprobe policies.
 
  - Address all missing MODULE_DESCRIPTION() warnings in networking.
 
  - Extend the Netlink descriptions in YAML to cover message encapsulation
    or "Netlink polymorphism", where interpretation of nested attributes
    depends on link type, classifier type or some other "class type".
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
    - Intel (100G, ice, idpf):
      - support E825-C devices
    - nVidia/Mellanox:
      - support devices with one port and multiple PCIe links
    - Broadcom (bnxt):
      - support n-tuple filters
      - support configuring the RSS key
    - Wangxun (ngbe/txgbe):
      - implement irq_domain for TXGBE's sub-interrupts
    - Pensando/AMD:
      - support XDP
      - optimize queue submission and wakeup handling (+17% bps)
      - optimize struct layout, saving 28% of memory on queues
 
  - Ethernet NICs embedded and virtual:
    - Google cloud vNIC:
      - refactor driver to perform memory allocations for new queue
        config before stopping and freeing the old queue memory
    - Synopsys (stmmac):
      - obey queueMaxSDU and implement counters required by 802.1Qbv
    - Renesas (ravb):
      - support packet checksum offload
      - suspend to RAM and runtime PM support
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - support for nexthop group statistics
    - Microchip:
      - ksz8: implement PHY loopback
      - add support for KSZ8567, a 7-port 10/100Mbps switch
 
  - PTP:
    - New driver for RENESAS FemtoClock3 Wireless clock generator.
    - Support OCP PTP cards designed and built by Adva.
 
  - CAN:
    - Support recvmsg() flags for own, local and remote traffic
      on CAN BCM sockets.
    - Support for esd GmbH PCIe/402 CAN device family.
    - m_can:
      - Rx/Tx submission coalescing
      - wake on frame Rx
 
  - WiFi:
    - Intel (iwlwifi):
      - enable signaling and payload protected A-MSDUs
      - support wider-bandwidth OFDMA
      - support for new devices
      - bump FW API to 89 for AX devices; 90 for BZ/SC devices
    - MediaTek (mt76):
      - mt7915: newer ADIE version support
      - mt7925: radio temperature sensor support
    - Qualcomm (ath11k):
      - support 6 GHz station power modes: Low Power Indoor (LPI),
        Standard Power) SP and Very Low Power (VLP)
      - QCA6390 & WCN6855: support 2 concurrent station interfaces
      - QCA2066 support
    - Qualcomm (ath12k):
      - refactoring in preparation for Multi-Link Operation (MLO) support
      - 1024 Block Ack window size support
      - firmware-2.bin support
      - support having multiple identical PCI devices (firmware needs to
        have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
      - QCN9274: support split-PHY devices
      - WCN7850: enable Power Save Mode in station mode
      - WCN7850: P2P support
    - RealTek:
      - rtw88: support for more rtw8811cu and rtw8821cu devices
      - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
      - rtlwifi: speed up USB firmware initialization
      - rtwl8xxxu:
        - RTL8188F: concurrent interface support
        - Channel Switch Announcement (CSA) support in AP mode
    - Broadcom (brcmfmac):
      - per-vendor feature support
      - per-vendor SAE password setup
      - DMI nvram filename quirk for ACEPC W5 Pro
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXv0mgACgkQMUZtbf5S
 IrtgMxAAuRd+WJW++SENr4KxIWhYO1q6Xcxnai43wrNkan9swD24icG8TYALt4f3
 yoT6idQvWReAb5JNlh9rUQz8R7E0nJXlvEFn5MtJwcthx2C6wFo/XkJlddlRrT+j
 c2xGILwLjRhW65LaC0MZ2ECbEERkFz8xcGfK2SWzUgh6KYvPjcRfKFxugpM7xOQK
 P/Wnqhs4fVRS/Mj/bCcXcO+yhwC121Q3qVeQVjGS0AzEC65hAW87a/kc2BfgcegD
 EyI9R7mf6criQwX+0awubjfoIdr4oW/8oDVNvUDczkJkbaEVaLMQk9P5x/0XnnVS
 UHUchWXyI80Q8Rj12uN1/I0h3WtwNQnCRBuLSmtm6GLfCAwbLvp2nGWDnaXiqryW
 DVKUIHGvqPKjkOOMOVfSvfB3LvkS3xsFVVYiQBQCn0YSs/gtu4CoF2Nty9CiLPbK
 tTuxUnLdPDZDxU//l0VArZmP8p2JM7XQGJ+JH8GFH4SBTyBR23e0iyPSoyaxjnYn
 RReDnHMVsrS1i7GPhbqDJWn+uqMSs7N149i0XmmyeqwQHUVSJN3J2BApP2nCaDfy
 H2lTuYly5FfEezt61NvCE4qr/VsWeEjm1fYlFQ9dFn4pGn+HghyCpw+xD1ZN56DN
 lujemau5B3kk1UTtAT4ypPqvuqjkRFqpNV2LzsJSk/Js+hApw8Y=
 =oY52
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Large effort by Eric to lower rtnl_lock pressure and remove locks:

      - Make commonly used parts of rtnetlink (address, route dumps
        etc) lockless, protected by RCU instead of rtnl_lock.

      - Add a netns exit callback which already holds rtnl_lock,
        allowing netns exit to take rtnl_lock once in the core instead
        of once for each driver / callback.

      - Remove locks / serialization in the socket diag interface.

      - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.

      - Remove the dev_base_lock, depend on RCU where necessary.

   - Support busy polling on a per-epoll context basis. Poll length and
     budget parameters can be set independently of system defaults.

   - Introduce struct net_hotdata, to make sure read-mostly global
     config variables fit in as few cache lines as possible.

   - Add optional per-nexthop statistics to ease monitoring / debug of
     ECMP imbalance problems.

   - Support TCP_NOTSENT_LOWAT in MPTCP.

   - Ensure that IPv6 temporary addresses' preferred lifetimes are long
     enough, compared to other configured lifetimes, and at least 2 sec.

   - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.

   - Add support for the independent control state machine for bonding
     per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
     control state machine.

   - Add "network ID" to MCTP socket APIs to support hosts with multiple
     disjoint MCTP networks.

   - Re-use the mono_delivery_time skbuff bit for packets which user
     space wants to be sent at a specified time. Maintain the timing
     information while traversing veth links, bridge etc.

   - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.

   - Simplify many places iterating over netdevs by using an xarray
     instead of a hash table walk (hash table remains in place, for use
     on fastpaths).

   - Speed up scanning for expired routes by keeping a dedicated list.

   - Speed up "generic" XDP by trying harder to avoid large allocations.

   - Support attaching arbitrary metadata to netconsole messages.

  Things we sprinkled into general kernel code:

   - Enforce VM_IOREMAP flag and range in ioremap_page_range and
     introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
     bpf_arena).

   - Rework selftest harness to enable the use of the full range of ksft
     exit code (pass, fail, skip, xfail, xpass).

  Netfilter:

   - Allow userspace to define a table that is exclusively owned by a
     daemon (via netlink socket aliveness) without auto-removing this
     table when the userspace program exits. Such table gets marked as
     orphaned and a restarting management daemon can re-attach/regain
     ownership.

   - Speed up element insertions to nftables' concatenated-ranges set
     type. Compact a few related data structures.

  BPF:

   - Add BPF token support for delegating a subset of BPF subsystem
     functionality from privileged system-wide daemons such as systemd
     through special mount options for userns-bound BPF fs to a trusted
     & unprivileged application.

   - Introduce bpf_arena which is sparse shared memory region between
     BPF program and user space where structures inside the arena can
     have pointers to other areas of the arena, and pointers work
     seamlessly for both user-space programs and BPF programs.

   - Introduce may_goto instruction that is a contract between the
     verifier and the program. The verifier allows the program to loop
     assuming it's behaving well, but reserves the right to terminate
     it.

   - Extend the BPF verifier to enable static subprog calls in spin lock
     critical sections.

   - Support registration of struct_ops types from modules which helps
     projects like fuse-bpf that seeks to implement a new struct_ops
     type.

   - Add support for retrieval of cookies for perf/kprobe multi links.

   - Support arbitrary TCP SYN cookie generation / validation in the TC
     layer with BPF to allow creating SYN flood handling in BPF
     firewalls.

   - Add code generation to inline the bpf_kptr_xchg() helper which
     improves performance when stashing/popping the allocated BPF
     objects.

  Wireless:

   - Add SPP (signaling and payload protected) AMSDU support.

   - Support wider bandwidth OFDMA, as required for EHT operation.

  Driver API:

   - Major overhaul of the Energy Efficient Ethernet internals to
     support new link modes (2.5GE, 5GE), share more code between
     drivers (especially those using phylib), and encourage more
     uniform behavior. Convert and clean up drivers.

   - Define an API for querying per netdev queue statistics from
     drivers.

   - IPSec: account in global stats for fully offloaded sessions.

   - Create a concept of Ethernet PHY Packages at the Device Tree level,
     to allow parameterizing the existing PHY package code.

   - Enable Rx hashing (RSS) on GTP protocol fields.

  Misc:

   - Improvements and refactoring all over networking selftests.

   - Create uniform module aliases for TC classifiers, actions, and
     packet schedulers to simplify creating modprobe policies.

   - Address all missing MODULE_DESCRIPTION() warnings in networking.

   - Extend the Netlink descriptions in YAML to cover message
     encapsulation or "Netlink polymorphism", where interpretation of
     nested attributes depends on link type, classifier type or some
     other "class type".

  Drivers:

   - Ethernet high-speed NICs:
      - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
      - Intel (100G, ice, idpf):
         - support E825-C devices
      - nVidia/Mellanox:
         - support devices with one port and multiple PCIe links
      - Broadcom (bnxt):
         - support n-tuple filters
         - support configuring the RSS key
      - Wangxun (ngbe/txgbe):
         - implement irq_domain for TXGBE's sub-interrupts
      - Pensando/AMD:
         - support XDP
         - optimize queue submission and wakeup handling (+17% bps)
         - optimize struct layout, saving 28% of memory on queues

   - Ethernet NICs embedded and virtual:
      - Google cloud vNIC:
         - refactor driver to perform memory allocations for new queue
           config before stopping and freeing the old queue memory
      - Synopsys (stmmac):
         - obey queueMaxSDU and implement counters required by 802.1Qbv
      - Renesas (ravb):
         - support packet checksum offload
         - suspend to RAM and runtime PM support

   - Ethernet switches:
      - nVidia/Mellanox:
         - support for nexthop group statistics
      - Microchip:
         - ksz8: implement PHY loopback
         - add support for KSZ8567, a 7-port 10/100Mbps switch

   - PTP:
      - New driver for RENESAS FemtoClock3 Wireless clock generator.
      - Support OCP PTP cards designed and built by Adva.

   - CAN:
      - Support recvmsg() flags for own, local and remote traffic on CAN
        BCM sockets.
      - Support for esd GmbH PCIe/402 CAN device family.
      - m_can:
         - Rx/Tx submission coalescing
         - wake on frame Rx

   - WiFi:
      - Intel (iwlwifi):
         - enable signaling and payload protected A-MSDUs
         - support wider-bandwidth OFDMA
         - support for new devices
         - bump FW API to 89 for AX devices; 90 for BZ/SC devices
      - MediaTek (mt76):
         - mt7915: newer ADIE version support
         - mt7925: radio temperature sensor support
      - Qualcomm (ath11k):
         - support 6 GHz station power modes: Low Power Indoor (LPI),
           Standard Power) SP and Very Low Power (VLP)
         - QCA6390 & WCN6855: support 2 concurrent station interfaces
         - QCA2066 support
      - Qualcomm (ath12k):
         - refactoring in preparation for Multi-Link Operation (MLO)
           support
         - 1024 Block Ack window size support
         - firmware-2.bin support
         - support having multiple identical PCI devices (firmware needs
           to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
         - QCN9274: support split-PHY devices
         - WCN7850: enable Power Save Mode in station mode
         - WCN7850: P2P support
      - RealTek:
         - rtw88: support for more rtw8811cu and rtw8821cu devices
         - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
         - rtlwifi: speed up USB firmware initialization
         - rtwl8xxxu:
             - RTL8188F: concurrent interface support
             - Channel Switch Announcement (CSA) support in AP mode
      - Broadcom (brcmfmac):
         - per-vendor feature support
         - per-vendor SAE password setup
         - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...
2024-03-12 17:44:08 -07:00

385 lines
12 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __LINUX_GFP_H
#define __LINUX_GFP_H
#include <linux/gfp_types.h>
#include <linux/mmzone.h>
#include <linux/topology.h>
struct vm_area_struct;
struct mempolicy;
/* Convert GFP flags to their corresponding migrate type */
#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
#define GFP_MOVABLE_SHIFT 3
static inline int gfp_migratetype(const gfp_t gfp_flags)
{
VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE);
BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE);
BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE);
BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >>
GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC);
if (unlikely(page_group_by_mobility_disabled))
return MIGRATE_UNMOVABLE;
/* Group based on mobility */
return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
}
#undef GFP_MOVABLE_MASK
#undef GFP_MOVABLE_SHIFT
static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}
#ifdef CONFIG_HIGHMEM
#define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
#else
#define OPT_ZONE_HIGHMEM ZONE_NORMAL
#endif
#ifdef CONFIG_ZONE_DMA
#define OPT_ZONE_DMA ZONE_DMA
#else
#define OPT_ZONE_DMA ZONE_NORMAL
#endif
#ifdef CONFIG_ZONE_DMA32
#define OPT_ZONE_DMA32 ZONE_DMA32
#else
#define OPT_ZONE_DMA32 ZONE_NORMAL
#endif
/*
* GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
* zone to use given the lowest 4 bits of gfp_t. Entries are GFP_ZONES_SHIFT
* bits long and there are 16 of them to cover all possible combinations of
* __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM.
*
* The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
* But GFP_MOVABLE is not only a zone specifier but also an allocation
* policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
* Only 1 bit of the lowest 3 bits (DMA,DMA32,HIGHMEM) can be set to "1".
*
* bit result
* =================
* 0x0 => NORMAL
* 0x1 => DMA or NORMAL
* 0x2 => HIGHMEM or NORMAL
* 0x3 => BAD (DMA+HIGHMEM)
* 0x4 => DMA32 or NORMAL
* 0x5 => BAD (DMA+DMA32)
* 0x6 => BAD (HIGHMEM+DMA32)
* 0x7 => BAD (HIGHMEM+DMA32+DMA)
* 0x8 => NORMAL (MOVABLE+0)
* 0x9 => DMA or NORMAL (MOVABLE+DMA)
* 0xa => MOVABLE (Movable is valid only if HIGHMEM is set too)
* 0xb => BAD (MOVABLE+HIGHMEM+DMA)
* 0xc => DMA32 or NORMAL (MOVABLE+DMA32)
* 0xd => BAD (MOVABLE+DMA32+DMA)
* 0xe => BAD (MOVABLE+DMA32+HIGHMEM)
* 0xf => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
*
* GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
*/
#if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
/* ZONE_DEVICE is not a valid GFP zone specifier */
#define GFP_ZONES_SHIFT 2
#else
#define GFP_ZONES_SHIFT ZONES_SHIFT
#endif
#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
#endif
#define GFP_ZONE_TABLE ( \
(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT) \
| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT) \
| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT) \
| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT) \
| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT) \
| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT) \
| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\
| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
)
/*
* GFP_ZONE_BAD is a bitmap for all combinations of __GFP_DMA, __GFP_DMA32
* __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per
* entry starting with bit 0. Bit is set if the combination is not
* allowed.
*/
#define GFP_ZONE_BAD ( \
1 << (___GFP_DMA | ___GFP_HIGHMEM) \
| 1 << (___GFP_DMA | ___GFP_DMA32) \
| 1 << (___GFP_DMA32 | ___GFP_HIGHMEM) \
| 1 << (___GFP_DMA | ___GFP_DMA32 | ___GFP_HIGHMEM) \
| 1 << (___GFP_MOVABLE | ___GFP_HIGHMEM | ___GFP_DMA) \
| 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA) \
| 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_HIGHMEM) \
| 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA | ___GFP_HIGHMEM) \
)
static inline enum zone_type gfp_zone(gfp_t flags)
{
enum zone_type z;
int bit = (__force int) (flags & GFP_ZONEMASK);
z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
((1 << GFP_ZONES_SHIFT) - 1);
VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
return z;
}
/*
* There is only one page-allocator function, and two main namespaces to
* it. The alloc_page*() variants return 'struct page *' and as such
* can allocate highmem pages, the *get*page*() variants return
* virtual kernel addresses to the allocated page(s).
*/
static inline int gfp_zonelist(gfp_t flags)
{
#ifdef CONFIG_NUMA
if (unlikely(flags & __GFP_THISNODE))
return ZONELIST_NOFALLBACK;
#endif
return ZONELIST_FALLBACK;
}
/*
* We get the zone list from the current node and the gfp_mask.
* This zone list contains a maximum of MAX_NUMNODES*MAX_NR_ZONES zones.
* There are two zonelists per node, one for all zones with memory and
* one containing just zones from the node the zonelist belongs to.
*
* For the case of non-NUMA systems the NODE_DATA() gets optimized to
* &contig_page_data at compile-time.
*/
static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
{
return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
}
#ifndef HAVE_ARCH_FREE_PAGE
static inline void arch_free_page(struct page *page, int order) { }
#endif
#ifndef HAVE_ARCH_ALLOC_PAGE
static inline void arch_alloc_page(struct page *page, int order) { }
#endif
struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
struct list_head *page_list,
struct page **page_array);
unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
unsigned long nr_pages,
struct page **page_array);
/* Bulk allocate order-0 pages */
static inline unsigned long
alloc_pages_bulk_list(gfp_t gfp, unsigned long nr_pages, struct list_head *list)
{
return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, list, NULL);
}
static inline unsigned long
alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_array)
{
return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
}
static inline unsigned long
alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
}
static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
{
gfp_t warn_gfp = gfp_mask & (__GFP_THISNODE|__GFP_NOWARN);
if (warn_gfp != (__GFP_THISNODE|__GFP_NOWARN))
return;
if (node_online(this_node))
return;
pr_warn("%pGg allocation from offline node %d\n", &gfp_mask, this_node);
dump_stack();
}
/*
* Allocate pages, preferring the node given as nid. The node must be valid and
* online. For more general interface, see alloc_pages_node().
*/
static inline struct page *
__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp_mask);
return __alloc_pages(gfp_mask, order, nid, NULL);
}
static inline
struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp);
return __folio_alloc(gfp, order, nid, NULL);
}
/*
* Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
* prefer the current CPU's closest node. Otherwise node must be valid and
* online.
*/
static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
unsigned int order)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
return __alloc_pages_node(nid, gfp_mask, order);
}
#ifdef CONFIG_NUMA
struct page *alloc_pages(gfp_t gfp, unsigned int order);
struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
struct mempolicy *mpol, pgoff_t ilx, int nid);
struct folio *folio_alloc(gfp_t gfp, unsigned int order);
struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr, bool hugepage);
#else
static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
{
return alloc_pages_node(numa_node_id(), gfp_mask, order);
}
static inline struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
struct mempolicy *mpol, pgoff_t ilx, int nid)
{
return alloc_pages(gfp, order);
}
static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
{
return __folio_alloc_node(gfp, order, numa_node_id());
}
#define vma_alloc_folio(gfp, order, vma, addr, hugepage) \
folio_alloc(gfp, order)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
static inline struct page *alloc_page_vma(gfp_t gfp,
struct vm_area_struct *vma, unsigned long addr)
{
struct folio *folio = vma_alloc_folio(gfp, 0, vma, addr, false);
return &folio->page;
}
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
void *alloc_pages_exact(size_t size, gfp_t gfp_mask) __alloc_size(1);
void free_pages_exact(void *virt, size_t size);
__meminit void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
#define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
#define __get_dma_pages(gfp_mask, order) \
__get_free_pages((gfp_mask) | GFP_DMA, (order))
extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
struct page_frag_cache;
void page_frag_cache_drain(struct page_frag_cache *nc);
extern void __page_frag_cache_drain(struct page *page, unsigned int count);
void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
gfp_t gfp_mask, unsigned int align_mask);
static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask,
unsigned int align)
{
WARN_ON_ONCE(!is_power_of_2(align));
return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
}
static inline void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
{
return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u);
}
extern void page_frag_free(void *addr);
#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
void page_alloc_init_cpuhp(void);
int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
void drain_all_pages(struct zone *zone);
void drain_local_pages(struct zone *zone);
void page_alloc_init_late(void);
void setup_pcp_cacheinfo(void);
/*
* gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
* GFP flags are used before interrupts are enabled. Once interrupts are
* enabled, it is set to __GFP_BITS_MASK while the system is running. During
* hibernation, it is used by PM to avoid I/O during memory allocation while
* devices are suspended.
*/
extern gfp_t gfp_allowed_mask;
/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
static inline bool gfp_has_io_fs(gfp_t gfp)
{
return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
}
/*
* Check if the gfp flags allow compaction - GFP_NOIO is a really
* tricky context because the migration might require IO.
*/
static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
{
return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
}
extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
#ifdef CONFIG_CONTIG_ALLOC
/* The below functions must be run on a range from a single zone. */
extern int alloc_contig_range(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask);
extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
int nid, nodemask_t *nodemask);
#endif
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
#endif /* __LINUX_GFP_H */