From b3e7b3a6ee92ab927f750a6b19615ce88ece808f Mon Sep 17 00:00:00 2001 From: Michal Swiatkowski Date: Thu, 6 Jul 2023 08:25:51 +0200 Subject: ice: prevent NULL pointer deref during reload Calling ethtool during reload can lead to call trace, because VSI isn't configured for some time, but netdev is alive. To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors to 0 after freeing and add a check for ::tx/rx_rings in ring related ethtool ops. Add proper unroll of filters in ice_start_eth(). Reproduction: $watch -n 0.1 -d 'ethtool -g enp24s0f0np0' $devlink dev reload pci/0000:18:00.0 action driver_reinit Call trace before fix: [66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000 [66303.926259] #PF: supervisor read access in kernel mode [66303.926286] #PF: error_code(0x0000) - not-present page [66303.926311] PGD 0 P4D 0 [66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI [66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1 [66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018 [66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice] [66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48 [66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246 [66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48 [66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000 [66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000 [66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000 [66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50 [66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000 [66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0 [66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [66303.927060] PKRU: 55555554 [66303.927075] Call Trace: [66303.927094] [66303.927111] ? __die+0x23/0x70 [66303.927140] ? page_fault_oops+0x171/0x4e0 [66303.927176] ? exc_page_fault+0x7f/0x180 [66303.927209] ? asm_exc_page_fault+0x26/0x30 [66303.927244] ? ice_get_ringparam+0x22/0x50 [ice] [66303.927433] rings_prepare_data+0x62/0x80 [66303.927469] ethnl_default_doit+0xe2/0x350 [66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140 [66303.927538] genl_rcv_msg+0x1b1/0x2c0 [66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10 [66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10 [66303.927615] netlink_rcv_skb+0x58/0x110 [66303.927644] genl_rcv+0x28/0x40 [66303.927665] netlink_unicast+0x19e/0x290 [66303.927691] netlink_sendmsg+0x254/0x4d0 [66303.927717] sock_sendmsg+0x93/0xa0 [66303.927743] __sys_sendto+0x126/0x170 [66303.927780] __x64_sys_sendto+0x24/0x30 [66303.928593] do_syscall_64+0x5d/0x90 [66303.929370] ? __count_memcg_events+0x60/0xa0 [66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30 [66303.930920] ? handle_mm_fault+0x9e/0x350 [66303.931688] ? do_user_addr_fault+0x258/0x740 [66303.932452] ? exc_page_fault+0x7f/0x180 [66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Reviewed-by: Przemek Kitszel Signed-off-by: Michal Swiatkowski Reviewed-by: Simon Horman Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_base.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/net/ethernet/intel/ice/ice_base.c') diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 4a12316f7b46..b678bdf96f3a 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -800,6 +800,8 @@ void ice_vsi_free_q_vectors(struct ice_vsi *vsi) ice_for_each_q_vector(vsi, v_idx) ice_free_q_vector(vsi, v_idx); + + vsi->num_q_vectors = 0; } /** -- cgit v1.2.3 From 1bbc04de607babfca7374d374960e20ebb20813f Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 19 Jul 2023 15:24:09 +0200 Subject: ice: xsk: add RX multi-buffer support This support is strongly inspired by work that introduced multi-buffer support to regular Rx data path in ice. There are some differences, though. When adding a frag, besides adding it to skb_shared_info, use also fresh xsk_buff_add_frag() helper. Reason for doing both things is that we can not rule out the fact that AF_XDP pipeline could use XDP program that needs to access frame fragments. Without them being in skb_shared_info it will not be possible. Another difference is that XDP_PASS has to allocate a new pages for each frags and copy contents from memory backed by xsk_buff_pool. chain_len that is used for programming HW Rx descriptors no longer has to be limited to 1 when xsk_pool is present - remove this restriction. Signed-off-by: Maciej Fijalkowski Link: https://lore.kernel.org/r/20230719132421.584801-13-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- drivers/net/ethernet/intel/ice/ice_base.c | 9 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 136 ++++++++++++++++++++++-------- 2 files changed, 102 insertions(+), 43 deletions(-) (limited to 'drivers/net/ethernet/intel/ice/ice_base.c') diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 4a12316f7b46..3367b8ba9851 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -408,7 +408,6 @@ static unsigned int ice_rx_offset(struct ice_rx_ring *rx_ring) */ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) { - int chain_len = ICE_MAX_CHAINED_RX_BUFS; struct ice_vsi *vsi = ring->vsi; u32 rxdid = ICE_RXDID_FLEX_NIC; struct ice_rlan_ctx rlan_ctx; @@ -472,17 +471,11 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) */ rlan_ctx.showiv = 0; - /* For AF_XDP ZC, we disallow packets to span on - * multiple buffers, thus letting us skip that - * handling in the fast-path. - */ - if (ring->xsk_pool) - chain_len = 1; /* Max packet size for this queue - must not be set to a larger value * than 5 x DBUF */ rlan_ctx.rxmax = min_t(u32, vsi->max_frame, - chain_len * ring->rx_buf_len); + ICE_MAX_CHAINED_RX_BUFS * ring->rx_buf_len); /* Rx queue threshold in units of 64 */ rlan_ctx.lrxqthresh = 1; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index a7fe2b4ce655..91cdd5e4790d 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -545,19 +545,6 @@ bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) return __ice_alloc_rx_bufs_zc(rx_ring, leftover); } -/** - * ice_bump_ntc - Bump the next_to_clean counter of an Rx ring - * @rx_ring: Rx ring - */ -static void ice_bump_ntc(struct ice_rx_ring *rx_ring) -{ - int ntc = rx_ring->next_to_clean + 1; - - ntc = (ntc < rx_ring->count) ? ntc : 0; - rx_ring->next_to_clean = ntc; - prefetch(ICE_RX_DESC(rx_ring, ntc)); -} - /** * ice_construct_skb_zc - Create an sk_buff from zero-copy buffer * @rx_ring: Rx ring @@ -572,8 +559,14 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) { unsigned int totalsize = xdp->data_end - xdp->data_meta; unsigned int metasize = xdp->data - xdp->data_meta; + struct skb_shared_info *sinfo = NULL; struct sk_buff *skb; + u32 nr_frags = 0; + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } net_prefetch(xdp->data_meta); skb = __napi_alloc_skb(&rx_ring->q_vector->napi, totalsize, @@ -589,6 +582,29 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) __skb_pull(skb, metasize); } + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + for (int i = 0; i < nr_frags; i++) { + struct skb_shared_info *skinfo = skb_shinfo(skb); + skb_frag_t *frag = &sinfo->frags[i]; + struct page *page; + void *addr; + + page = dev_alloc_page(); + if (!page) { + dev_kfree_skb(skb); + return NULL; + } + addr = page_to_virt(page); + + memcpy(addr, skb_frag_page(frag), skb_frag_size(frag)); + + __skb_fill_page_desc_noacc(skinfo, skinfo->nr_frags++, + addr, 0, skb_frag_size(frag)); + } + +out: xsk_buff_free(xdp); return skb; } @@ -752,6 +768,34 @@ out_failure: return result; } +static int +ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first, + struct xdp_buff *xdp, const unsigned int size) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first); + + if (!size) + return 0; + + if (!xdp_buff_has_frags(first)) { + sinfo->nr_frags = 0; + sinfo->xdp_frags_size = 0; + xdp_buff_set_frags_flag(first); + } + + if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) { + xsk_buff_free(first); + return -ENOMEM; + } + + __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, + virt_to_page(xdp->data_hard_start), 0, size); + sinfo->xdp_frags_size += size; + xsk_buff_add_frag(xdp); + + return 0; +} + /** * ice_clean_rx_irq_zc - consumes packets from the hardware ring * @rx_ring: AF_XDP Rx ring @@ -762,9 +806,14 @@ out_failure: int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) { unsigned int total_rx_bytes = 0, total_rx_packets = 0; + struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool; + u32 ntc = rx_ring->next_to_clean; + u32 ntu = rx_ring->next_to_use; + struct xdp_buff *first = NULL; struct ice_tx_ring *xdp_ring; unsigned int xdp_xmit = 0; struct bpf_prog *xdp_prog; + u32 cnt = rx_ring->count; bool failure = false; int entries_to_alloc; @@ -774,6 +823,9 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) xdp_prog = READ_ONCE(rx_ring->xdp_prog); xdp_ring = rx_ring->xdp_ring; + if (ntc != rx_ring->first_desc) + first = *ice_xdp_buf(rx_ring, rx_ring->first_desc); + while (likely(total_rx_packets < (unsigned int)budget)) { union ice_32b_rx_flex_desc *rx_desc; unsigned int size, xdp_res = 0; @@ -783,7 +835,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) u16 vlan_tag = 0; u16 rx_ptype; - rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); + rx_desc = ICE_RX_DESC(rx_ring, ntc); stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S); if (!ice_test_staterr(rx_desc->wb.status_error0, stat_err_bits)) @@ -795,51 +847,61 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) */ dma_rmb(); - if (unlikely(rx_ring->next_to_clean == rx_ring->next_to_use)) + if (unlikely(ntc == ntu)) break; - xdp = *ice_xdp_buf(rx_ring, rx_ring->next_to_clean); + xdp = *ice_xdp_buf(rx_ring, ntc); size = le16_to_cpu(rx_desc->wb.pkt_len) & ICE_RX_FLX_DESC_PKT_LEN_M; - if (!size) { - xdp->data = NULL; - xdp->data_end = NULL; - xdp->data_hard_start = NULL; - xdp->data_meta = NULL; - goto construct_skb; - } xsk_buff_set_size(xdp, size); - xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool); + xsk_buff_dma_sync_for_cpu(xdp, xsk_pool); + + if (!first) { + first = xdp; + xdp_buff_clear_frags_flag(first); + } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) { + break; + } + + if (++ntc == cnt) + ntc = 0; + + if (ice_is_non_eop(rx_ring, rx_desc)) + continue; - xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring); + xdp_res = ice_run_xdp_zc(rx_ring, first, xdp_prog, xdp_ring); if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) { xdp_xmit |= xdp_res; } else if (xdp_res == ICE_XDP_EXIT) { failure = true; + first = NULL; + rx_ring->first_desc = ntc; break; } else if (xdp_res == ICE_XDP_CONSUMED) { - xsk_buff_free(xdp); + xsk_buff_free(first); } else if (xdp_res == ICE_XDP_PASS) { goto construct_skb; } - total_rx_bytes += size; + total_rx_bytes += xdp_get_buff_len(first); total_rx_packets++; - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; continue; construct_skb: /* XDP_PASS path */ - skb = ice_construct_skb_zc(rx_ring, xdp); + skb = ice_construct_skb_zc(rx_ring, first); if (!skb) { rx_ring->ring_stats->rx_stats.alloc_buf_failed++; break; } - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; if (eth_skb_pad(skb)) { skb = NULL; @@ -858,18 +920,22 @@ construct_skb: ice_receive_skb(rx_ring, skb, vlan_tag); } - entries_to_alloc = ICE_DESC_UNUSED(rx_ring); + rx_ring->next_to_clean = ntc; + entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring); if (entries_to_alloc > ICE_RING_QUARTER(rx_ring)) failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc); ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { - if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool); + if (xsk_uses_need_wakeup(xsk_pool)) { + /* ntu could have changed when allocating entries above, so + * use rx_ring value instead of stack based one + */ + if (failure || ntc == rx_ring->next_to_use) + xsk_set_rx_need_wakeup(xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); + xsk_clear_rx_need_wakeup(xsk_pool); return (int)total_rx_packets; } -- cgit v1.2.3 From 10083aef784031fa9f06c19a1b182e6fad5338d9 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Thu, 10 Aug 2023 16:51:10 -0700 Subject: ice: fix receive buffer size miscalculation The driver is misconfiguring the hardware for some values of MTU such that it could use multiple descriptors to receive a packet when it could have simply used one. Change the driver to use a round-up instead of the result of a shift, as the shift can truncate the lower bits of the size, and result in the problem noted above. It also aligns this driver with similar code in i40e. The insidiousness of this problem is that everything works with the wrong size, it's just not working as well as it could, as some MTU sizes end up using two or more descriptors, and there is no way to tell that is happening without looking at ice_trace or a bus analyzer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel Signed-off-by: Jesse Brandeburg Reviewed-by: Leon Romanovsky Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_base.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/net/ethernet/intel/ice/ice_base.c') diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index b678bdf96f3a..074bf9403cd1 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -435,7 +435,8 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) /* Receive Packet Data Buffer Size. * The Packet Data Buffer Size is defined in 128 byte units. */ - rlan_ctx.dbuf = ring->rx_buf_len >> ICE_RLAN_CTX_DBUF_S; + rlan_ctx.dbuf = DIV_ROUND_UP(ring->rx_buf_len, + BIT_ULL(ICE_RLAN_CTX_DBUF_S)); /* use 32 byte descriptors */ rlan_ctx.dsize = 1; -- cgit v1.2.3