author     Linus Torvalds <torvalds@linux-foundation.org>   2025-07-30 08:58:55 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>   2025-07-30 08:58:55 -0700
commit     8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf (patch)
tree       fec3039a08284cd87f4ec9c3bea5b5a439f1859f /drivers/net/ethernet/ibm/ibmveth.c
parent     4b290aae788e06561754b28c6842e4080957d3f7 (diff)
parent     fa582ca7e187a15e772e6a72fe035f649b387a60 (diff)
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
   - Convert TCP send queue handling from tasklet to BH workqueue
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
   - Align IPsec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
Diffstat (limited to 'drivers/net/ethernet/ibm/ibmveth.c')
-rw-r--r--   drivers/net/ethernet/ibm/ibmveth.c   220
1 file changed, 152 insertions, 68 deletions
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 24046fe16634..6f0821f1e798 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -211,98 +211,169 @@ static inline void ibmveth_flush_buffer(void *addr, unsigned long length)
 static void ibmveth_replenish_buffer_pool(struct ibmveth_adapter *adapter,
                                           struct ibmveth_buff_pool *pool)
 {
-        u32 i;
-        u32 count = pool->size - atomic_read(&pool->available);
-        u32 buffers_added = 0;
-        struct sk_buff *skb;
-        unsigned int free_index, index;
-        u64 correlator;
+        union ibmveth_buf_desc descs[IBMVETH_MAX_RX_PER_HCALL] = {0};
+        u32 remaining = pool->size - atomic_read(&pool->available);
+        u64 correlators[IBMVETH_MAX_RX_PER_HCALL] = {0};
         unsigned long lpar_rc;
+        u32 buffers_added = 0;
+        u32 i, filled, batch;
+        struct vio_dev *vdev;
         dma_addr_t dma_addr;
+        struct device *dev;
+        u32 index;
+
+        vdev = adapter->vdev;
+        dev = &vdev->dev;
 
         mb();
 
-        for (i = 0; i < count; ++i) {
-                union ibmveth_buf_desc desc;
+        batch = adapter->rx_buffers_per_hcall;
 
-                free_index = pool->consumer_index;
-                index = pool->free_map[free_index];
-                skb = NULL;
+        while (remaining > 0) {
+                unsigned int free_index = pool->consumer_index;
 
-                if (WARN_ON(index == IBM_VETH_INVALID_MAP)) {
-                        schedule_work(&adapter->work);
-                        goto bad_index_failure;
-                }
+                /* Fill a batch of descriptors */
+                for (filled = 0; filled < min(remaining, batch); filled++) {
+                        index = pool->free_map[free_index];
+                        if (WARN_ON(index == IBM_VETH_INVALID_MAP)) {
+                                adapter->replenish_add_buff_failure++;
+                                netdev_info(adapter->netdev,
+                                            "Invalid map index %u, reset\n",
+                                            index);
+                                schedule_work(&adapter->work);
+                                break;
+                        }
+
+                        if (!pool->skbuff[index]) {
+                                struct sk_buff *skb = NULL;
 
-                /* are we allocating a new buffer or recycling an old one */
-                if (pool->skbuff[index])
-                        goto reuse;
+                                skb = netdev_alloc_skb(adapter->netdev,
+                                                       pool->buff_size);
+                                if (!skb) {
+                                        adapter->replenish_no_mem++;
+                                        adapter->replenish_add_buff_failure++;
+                                        break;
+                                }
 
-                skb = netdev_alloc_skb(adapter->netdev, pool->buff_size);
+                                dma_addr = dma_map_single(dev, skb->data,
+                                                          pool->buff_size,
+                                                          DMA_FROM_DEVICE);
+                                if (dma_mapping_error(dev, dma_addr)) {
+                                        dev_kfree_skb_any(skb);
+                                        adapter->replenish_add_buff_failure++;
+                                        break;
+                                }
 
-                if (!skb) {
-                        netdev_dbg(adapter->netdev,
-                                   "replenish: unable to allocate skb\n");
-                        adapter->replenish_no_mem++;
-                        break;
-                }
+                                pool->dma_addr[index] = dma_addr;
+                                pool->skbuff[index] = skb;
+                        } else {
+                                /* re-use case */
+                                dma_addr = pool->dma_addr[index];
+                        }
 
-                dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
-                                          pool->buff_size, DMA_FROM_DEVICE);
+                        if (rx_flush) {
+                                unsigned int len;
 
-                if (dma_mapping_error(&adapter->vdev->dev, dma_addr))
-                        goto failure;
+                                len = adapter->netdev->mtu + IBMVETH_BUFF_OH;
+                                len = min(pool->buff_size, len);
+                                ibmveth_flush_buffer(pool->skbuff[index]->data,
+                                                     len);
+                        }
 
-                pool->dma_addr[index] = dma_addr;
-                pool->skbuff[index] = skb;
+                        descs[filled].fields.flags_len = IBMVETH_BUF_VALID |
+                                                         pool->buff_size;
+                        descs[filled].fields.address = dma_addr;
 
-                if (rx_flush) {
-                        unsigned int len = min(pool->buff_size,
-                                               adapter->netdev->mtu +
-                                               IBMVETH_BUFF_OH);
-                        ibmveth_flush_buffer(skb->data, len);
-                }
-reuse:
-                dma_addr = pool->dma_addr[index];
-                desc.fields.flags_len = IBMVETH_BUF_VALID | pool->buff_size;
-                desc.fields.address = dma_addr;
+                        correlators[filled] = ((u64)pool->index << 32) | index;
+                        *(u64 *)pool->skbuff[index]->data = correlators[filled];
 
-                correlator = ((u64)pool->index << 32) | index;
-                *(u64 *)pool->skbuff[index]->data = correlator;
+                        free_index++;
+                        if (free_index >= pool->size)
+                                free_index = 0;
+                }
 
-                lpar_rc = h_add_logical_lan_buffer(adapter->vdev->unit_address,
-                                                   desc.desc);
+                if (!filled)
+                        break;
+
+                /* single buffer case*/
+                if (filled == 1)
+                        lpar_rc = h_add_logical_lan_buffer(vdev->unit_address,
+                                                           descs[0].desc);
+                else
+                        /* Multi-buffer hcall */
+                        lpar_rc = h_add_logical_lan_buffers(vdev->unit_address,
+                                                            descs[0].desc,
+                                                            descs[1].desc,
+                                                            descs[2].desc,
+                                                            descs[3].desc,
+                                                            descs[4].desc,
+                                                            descs[5].desc,
+                                                            descs[6].desc,
+                                                            descs[7].desc);
 
                 if (lpar_rc != H_SUCCESS) {
-                        netdev_warn(adapter->netdev,
-                                    "%sadd_logical_lan failed %lu\n",
-                                    skb ? "" : "When recycling: ", lpar_rc);
-                        goto failure;
+                        dev_warn_ratelimited(dev,
+                                             "RX h_add_logical_lan failed: filled=%u, rc=%lu, batch=%u\n",
+                                             filled, lpar_rc, batch);
+                        goto hcall_failure;
                 }
 
-                pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
-                pool->consumer_index++;
-                if (pool->consumer_index >= pool->size)
-                        pool->consumer_index = 0;
+                /* Only update pool state after hcall succeeds */
+                for (i = 0; i < filled; i++) {
+                        free_index = pool->consumer_index;
+                        pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
 
-                buffers_added++;
-                adapter->replenish_add_buff_success++;
-        }
+                        pool->consumer_index++;
+                        if (pool->consumer_index >= pool->size)
+                                pool->consumer_index = 0;
+                }
 
-        mb();
-        atomic_add(buffers_added, &(pool->available));
-        return;
+                buffers_added += filled;
+                adapter->replenish_add_buff_success += filled;
+                remaining -= filled;
 
-failure:
+                memset(&descs, 0, sizeof(descs));
+                memset(&correlators, 0, sizeof(correlators));
+                continue;
 
-        if (dma_addr && !dma_mapping_error(&adapter->vdev->dev, dma_addr))
-                dma_unmap_single(&adapter->vdev->dev,
-                                 pool->dma_addr[index], pool->buff_size,
-                                 DMA_FROM_DEVICE);
-        dev_kfree_skb_any(pool->skbuff[index]);
-        pool->skbuff[index] = NULL;
-bad_index_failure:
-        adapter->replenish_add_buff_failure++;
+hcall_failure:
+                for (i = 0; i < filled; i++) {
+                        index = correlators[i] & 0xffffffffUL;
+                        dma_addr = pool->dma_addr[index];
+
+                        if (pool->skbuff[index]) {
+                                if (dma_addr &&
+                                    !dma_mapping_error(dev, dma_addr))
+                                        dma_unmap_single(dev, dma_addr,
+                                                         pool->buff_size,
+                                                         DMA_FROM_DEVICE);
+
+                                dev_kfree_skb_any(pool->skbuff[index]);
+                                pool->skbuff[index] = NULL;
+                        }
+                }
+                adapter->replenish_add_buff_failure += filled;
+
+                /*
+                 * If multi rx buffers hcall is no longer supported by FW
+                 * e.g. in the case of Live Parttion Migration
+                 */
+                if (batch > 1 && lpar_rc == H_FUNCTION) {
+                        /*
+                         * Instead of retry submit single buffer individually
+                         * here just set the max rx buffer per hcall to 1
+                         * buffers will be respleshed next time
+                         * when ibmveth_replenish_buffer_pool() is called again
+                         * with single-buffer case
+                         */
+                        netdev_info(adapter->netdev,
+                                    "RX Multi buffers not supported by FW, rc=%lu\n",
+                                    lpar_rc);
+                        adapter->rx_buffers_per_hcall = 1;
+                        netdev_info(adapter->netdev,
+                                    "Next rx replesh will fall back to single-buffer hcall\n");
+                }
+                break;
+        }
 
         mb();
         atomic_add(buffers_added, &(pool->available));
@@ -1783,6 +1854,19 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
                 netdev->features |= NETIF_F_FRAGLIST;
         }
 
+        if (ret == H_SUCCESS &&
+            (ret_attr & IBMVETH_ILLAN_RX_MULTI_BUFF_SUPPORT)) {
+                adapter->rx_buffers_per_hcall = IBMVETH_MAX_RX_PER_HCALL;
+                netdev_dbg(netdev,
+                           "RX Multi-buffer hcall supported by FW, batch set to %u\n",
+                           adapter->rx_buffers_per_hcall);
+        } else {
+                adapter->rx_buffers_per_hcall = 1;
+                netdev_dbg(netdev,
+                           "RX Single-buffer hcall mode, batch set to %u\n",
+                           adapter->rx_buffers_per_hcall);
+        }
+
         netdev->min_mtu = IBMVETH_MIN_MTU;
         netdev->max_mtu = ETH_MAX_MTU - IBMVETH_BUFF_OH;