Merge tag 'mlx5-updates-2019-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says: ==================== mlx5-updates-2019-04-22 This series includes updates to mlx5e driver RX data path and some significant XDP RX/TX improvements to overcome/mitigate HW and PCIE bottlenecks. From Tariq: 1) Some Enhancements in rq->flags 2) Stabilize RX packet rate (on Striding RQ) with multiple outstanding UMR posts In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps From Shay: 3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: | | before | after | | | 24 rings | 51Mpps | 116Mpps | +126% | | 1 ring | 12Mpps | 12Mpps | same | XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. | | before | after | | | 32 rings | 64Mpps | 92Mpps | +43% | | 1 ring | 6.4Mpps | 6.4Mpps | same | As we can see, feature significantly improves scaling, without hurting single ring performance. From Maxim: 4) Some trivial refactoring and code improvements prior to a larger series to support AF_XDP. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
author: David S. Miller <davem@davemloft.net> 2019-04-23 17:03:40 -0700
committer: David S. Miller <davem@davemloft.net> 2019-04-23 17:03:40 -0700
commit: 20eb08b2b06bf1cf5e5dda2e5212db893c3e8ffe (patch)
tree: d3457ad2030a504924fc946038a462a49cd0cab4 /include
parent: 539b593d3940fb91468b13b285db099087bfe5a9 (diff)
parent: f8ebecf2e32a62137dc5a98b2c94b1db37a0f9f8 (diff)
2 files changed, 7 insertions, 0 deletions
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 9df51da04621..fd91df3a4e09 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -75,6 +75,12 @@ enum mlx5_flow_namespace_type {
 	MLX5_FLOW_NAMESPACE_EGRESS,
 };
 
+enum {
+	FDB_BYPASS_PATH,
+	FDB_FAST_PATH,
+	FDB_SLOW_PATH,
+};
+
 struct mlx5_flow_table;
 struct mlx5_flow_group;
 struct mlx5_flow_namespace;
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 0343c81d4c5f..3ba4edbd17a6 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -395,6 +395,7 @@ struct mlx5_wqe_signature_seg {
 
 struct mlx5_wqe_inline_seg {
 	__be32	byte_count;
+	__be32	data[0];
 };
 
 enum mlx5_sig_type {
author	David S. Miller <davem@davemloft.net>	2019-04-23 17:03:40 -0700
committer	David S. Miller <davem@davemloft.net>	2019-04-23 17:03:40 -0700
commit	20eb08b2b06bf1cf5e5dda2e5212db893c3e8ffe (patch)
tree	d3457ad2030a504924fc946038a462a49cd0cab4 /include
parent	539b593d3940fb91468b13b285db099087bfe5a9 (diff)
parent	f8ebecf2e32a62137dc5a98b2c94b1db37a0f9f8 (diff)