summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core
AgeCommit message (Collapse)Author
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-03Merge tag 'for-linus-ioctl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is a big pull request. Of note is that I'm sending you the new ioctl API for the rdma subsystem. We put it up on linux-api@, but didn't get much response. The API is complex, but it solves two different problems in one go: 1) The bi-directional nature of the RDMA file write calls, which created the security hole we had to handle (and for which the fix is now causing problems for systems in production, we were a bit over zealous in the fix and the ability to open a device, then fork, then create new queue pairs on the device and use them is broken). 2) The bloat caused by different vendors implementing extensions to the base verbs API. Each vendor's hardware is slightly different, and the hardware might be suitable for one extension but not another. By the time we add generic extensions for all the different ways that the different hardware can offload things, the API becomes bloated. Things like our completion structs have started to exceed a cache line in size because of all the elements needed to support this. That in turn shows up heavily in the performance graphs with a noticable drop in performance on 100Gigabit links as our completion structs go from occupying one cache line to 1+. This API makes things like the completion structs modular in a very similar way to netlink so that your structs can only include the items needed for the offloads/features you are actually using on a given queue pair. In that way we support everything, but only use what we need, and our structs stay smaller. The ioctl API is better explained by the posting on linux-api@ than I can explain it here, so I'll just leave it at that. The rest of the pull request is typical stuff. Updates for 4.14 kernel merge window - Lots of hfi1 driver updates (mixed with a few qib and core updates as well) - rxe updates - various mlx updates - Set default roce type to RoCEv2 - Several larger fixes for bnxt_re that were too big for -rc - Several larger fixes for qedr that, likewise, were too big for -rc - Misc core changes - Make the hns_roce driver compilable on arches other than aarch64 so we can more easily debug build issues related to it - Add rdma-netlink infrastructure updates - Add automatic IRQ affinity infrastructure - Add 32bit lid support - Lots of misc fixes across the subsystem from random people - Autoloading of RDMA netlink modules - PCI pool cleanups from Romain Perier - mlx5 driver feature additions and fixes - Hardware tag matchine feature - Fix sleeping in atomic when resolving roce ah - Add experimental ioctl interface as posted to linux-api@" * tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits) IB/core: Expose ioctl interface through experimental Kconfig IB/core: Assign root to all drivers IB/core: Add completion queue (cq) object actions IB/core: Add legacy driver's user-data IB/core: Export ioctl enum types to user-space IB/core: Explicitly destroy an object while keeping uobject IB/core: Add macros for declaring methods and attributes IB/core: Add uverbs merge trees functionality IB/core: Add DEVICE object and root tree structure IB/core: Declare an object instead of declaring only type attributes IB/core: Add new ioctl interface RDMA/vmw_pvrdma: Fix a signedness RDMA/vmw_pvrdma: Report network header type in WC IB/core: Add might_sleep() annotation to ib_init_ah_from_wc() IB/cm: Fix sleeping in atomic when RoCE is used IB/core: Add support to finalize objects in one transaction IB/core: Add a generic way to execute an operation on a uobject Documentation: Hardware tag matching IB/mlx5: Support IB_SRQT_TM net/mlx5: Add XRQ support ...
2017-09-03net/mlx5e: Distribute RSS table among all RX ringsTariq Toukan
In default, uniformly distribute the RSS indirection table entries among all RX rings, rather than restricting this only to the rings on the close NUMA node. irqbalancer would anyway dynamically override the default affinities set to the RX rings. This gives better multi-stream performance and CPU util. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Stop NAPI when irq balancer changes affinityTariq Toukan
NAPI context keeps rescheduling on same CPU as long as it's busy. This doesn't give the oppurtunity for changes in irq affinities to take effect. Fix that by calling napi_complete_done() upon a change in affinity. This would stop the NAPI and reschedule it on the new CPU. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Use kernel's mechanism to avoid missing NAPIsTariq Toukan
We used a channel state bit MLX5E_CHANNEL_NAPI_SCHED to make sure no NAPI is missed when a channel's napi_schedule() is called for completion events of the different channel's resources/rings while NAPI is currently running. Now, as similar mechanism is implemented in kernel, ("39e6c8208d7b net: solve a NAPI race"), we obsolete our own implementation and rely on the return value of napi_complete_done(). This patch removes a redundant overhead of atomic bit operations. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Slightly increase RX page-cache sizeTariq Toukan
In XDP_TX flow, we now get back quicker to each page in page-cache, and on some occasions refcount does not get back to 1 on time, causing some costly page allocations. Slightly increase the size of RX page-cache to significantly decrease the chances for this to happen. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Don't recycle page if moved to far NUMATariq Toukan
Avoid recycling an RX page if it moved to another NUMA node. Add an ethtool counter to count such events. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Remove unnecessary fields in ICO SQTariq Toukan
As of current design, in each NAPI, only a single UMR WQE completion could be available in the completion queue of the the internal control operations (ICO) send queue, in addition to nop operations that require no actions upon completion. This renders the consume index obsolete, as the wqe_counter field in CQE is sufficient. This helps removing a memory barrier, and obsoletes the need for tracking the num_wqebbs to update the consumer counter. In addition, remove other unused fields in icosq struct: pdev, dma_fifo_pc, and prev_cc. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Type-specific optimizations for RX post WQEs functionTariq Toukan
Separate the RX post WQEs function of the different RQ types. This enables RQ type-specific optimizations in data-path. Poll the ICOSQ completion queue only for Striding RQ, and only when a UMR post completion could be possibly available. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Non-atomic RQ state indicator for UMR WQE in progressTariq Toukan
The indication for a UMR WQE in progress is needed only within the NAPI context, and hence no races possible and no need for the use of atomic operations. The only place the flag is read outside of NAPI context is in closure flow, after RQ is disabled flag is no more accessed in NAPI. Use a boolean instead of a bit in ring state, so that its non-atomic set operations do not race with the atomic sets of the other bits. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Non-atomic indicator for ring enabled stateTariq Toukan
Rings enabled state change occurs in control path only, and is always followed by a napi_sychronize(), so that following NAPIs read the new value. This read does not need to be atomic. The RQ auto-moderation bit is not set/cleared in data-path. No need for atomic read, a regular read operation is sufficient. In RQ creation time as well, there's no multiple threads trying to access it yet, hence a regular read can be used. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Refactor data-path lro header functionTariq Toukan
Refactor function mlx5e_lro_update_hdr() to reduce number of branches. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Early-return on empty completion queuesTariq Toukan
NAPI context handles different kinds of completion queues (RX, TX, and others). Hence, upon a poll trial, some of them might be empty. Here we early-return upon empty completion queues, as well as full rx buffer, and save unnecessary logic and memory barriers. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: NAPI busy-poll when UMR post is in progressTariq Toukan
If a UMR post is in progress, it means that there's a missing WQE in RQ, and that a completion will be shortly available in ICO SQ completion queue. Prefer busy-poll to handle it as soon as possible. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Small enhancements for RX MPWQE allocation and freeTariq Toukan
The dma offset of a MPWQE (Multi-Packet WQE) in memory region is fixed for all rounds. Calculate it once on creation time, instead of in runtime. This also obsoletes the wqe argument in the function. In addition, optimize dma_info iterator calculation. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Use memset to init skbs_frags array to zerosTariq Toukan
In RX data-path, use memset() instead of loop assignment to init the whole skbs_frags array. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Remove unnecessary wqe_sz field from RQ bufferTariq Toukan
Field is used only locally within the RQ create function. The use of a local variable is sufficient. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Replace multiplication by stride size with a shiftTariq Toukan
In RX data-path, use shift operations instead of a regular multiplication by stride size, as it is a power of two. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03net/mlx5e: Reorganize struct mlx5e_rqTariq Toukan
Bring fast-path fields together, and combine RX WQE mutual exclusive fields into a union. Page-reuse and XDP are mutually exclusive and cannot be used at the same time. Use a union to combine their footprints. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31net/mlx5e: Support RSS for GRE tunneled packetsGal Pressman
Introduce a new flow table and indirect TIRs which are used to hash the inner packet headers of GRE tunneled packets. When a GRE tunneled packet is received, the TTC flow table will match the new IPv4/6->GRE rules which will forward it to the inner TTC table. The inner TTC is similar to its counterpart outer TTC table, but matching the inner packet headers instead of the outer ones (and does not include the new IPv4/6->GRE rules). The new rules will not add steering hops since they are added to an already existing flow group which will be matched regardless of this patch. Non GRE traffic will not be affected. The inner flow table will forward the packet to inner indirect TIRs which hash the inner packet and thus result in RSS for the tunneled packets. Testing 8 TCP streams bandwidth over GRE: System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] Before: 21.3 Gbps (Single RQ) Now : 90.5 Gbps (RSS spread on 8 RQs) Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-31net/mlx5e: Support TSO and TX checksum offloads for GRE tunnelsGal Pressman
Add TX offloads support for GRE tunneled packets by reporting the needed netdev features. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-31net/mlx5e: Use IP version matching to classify IP trafficGal Pressman
This change adds the ability for flow steering to classify IPv4/6 packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP packets and hit IPv4/6 classification steering rules. Since IP packets with MPLS tag header have MPLS ethertype, they missed the IPv4/6 ethertype rule and ended up hitting the default filter forwarding all the packets to the same single RQ (No RSS). Since our device is able to look past the MPLS tag and identify the next protocol we introduce this solution which replaces ethertype matching by the device's capability to perform IP version identification and matching in order to distinguish between IPv4 and IPv6. Therefore, when driver is performing flow steering configuration on the device it will use IP version matching in IP classified rules instead of ethertype matching which will cause relevant MPLS tagged packets to hit this rule as well. If the device doesn't support IP version matching the driver will fall back to use legacy ethertype matching in the steering as before. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Fix CQ moderation mode not set properlyTal Gilboa
cq_period_mode assignment was mistakenly removed so it was always set to "0", which is EQE based moderation, regardless of the device CAPs and requested value in ethtool. Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Fix inline header size for small packetsMoshe Shemesh
Fix inline header size, make sure it is not greater than skb len. This bug effects small packets, for example L2 packets with size < 18. Fixes: ae76715d153e ("net/mlx5e: Check the minimum inline header mode before xmit") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5: E-Switch, Unload the representors in the correct orderShahar Klein
When changing from switchdev to legacy mode, all the representor port devices (uplink nic and reps) are cleaned up. Part of this cleaning process is removing the neigh entries and the hash table containing them. However, a representor neigh entry might be linked to the uplink port hash table and if the uplink nic is cleaned first the cleaning of the representor will end up in null deref. Fix that by unloading the representors in the opposite order of load. Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source addressPaul Blakey
Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to resolve it as part of the route lookup. The resulting encap header is left with a zeroed out ipv6 src address so the packets are sent with this src ip. Use an appropriate route lookup API that also resolves the source ipv6 address if it's not supplied. Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Don't override user RSS upon set channelsInbar Karmy
Currently, increasing the number of combined channels is changing the RSS spread to use the new created channels. Prevent the RSS spread change in case the user explicitly declare it, to avoid overriding user configuration. Tested: when RSS default: # ethtool -L ens8 combined 4 RSS spread will change and point to 4 channels. # ethtool -X ens8 equal 4 # ethtool -L ens8 combined 6 RSS will not change after increasing the number of the channels. Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds during SCHANNELS') Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Fix dangling page pointer on DMA mapping errorEran Ben Elisha
Function mlx5e_dealloc_rx_wqe is using page pointer value as an indication to valid DMA mapping. In case that the mapping failed, we released the page but kept the dangling pointer. Store the page pointer only after the DMA mapping passed to avoid invalid page DMA unmap. Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWNHuy Nguyen
MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5: Skip mlx5_unload_one if mlx5_load_one failsHuy Nguyen
There is an issue where the firmware fails during mlx5_load_one, the health_care timer detects the issue and schedules a health_care call. Then the mlx5_load_one detects the issue, cleans up and quits. Then the health_care starts and calls mlx5_unload_one to clean up the resources that no longer exist and causes kernel panic. The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set after mlx5_load_one fails. The solution is removing the bit MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN is redundant and we can use MLX5_INTERFACE_STATE_UP instead. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5: Fix arm SRQ command for ISSI version 0Noa Osherovich
Support for ISSI version 0 was recently broken as the arm_srq_cmd command, which is used only for ISSI version 0, was given the opcode for ISSI version 1 instead of ISSI version 0. Change arm_srq_cmd to use the correct command opcode for ISSI version 0. Fixes: af1ba291c5e4 ('{net, IB}/mlx5: Refactor internal SRQ API') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap.Huy Nguyen
Current code doesn't report DCB_CAP_DCBX_HOST capability when query through getcap. User space lldptool expects capability to have HOST mode set when it wants to configure DCBX CEE mode. In absence of HOST mode capability, lldptool fails to switch to CEE mode. This fix returns DCB_CAP_DCBX_HOST capability when port's DCBX controlled mode is under software control. Fixes: 3a6a931dfb8e ("net/mlx5e: Support DCBX CEE API") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-30net/mlx5e: Check for qos capability in dcbnl_initializeHuy Nguyen
qos capability is the master capability bit that determines if the DCBX is supported for the PCI function. If this bit is off, driver cannot run any dcbx code. Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-29net/mlx5: Add XRQ supportArtemy Kovalyov
Add support to new XRQ(eXtended shared Receive Queue) hardware object. It supports SRQ semantics with addition of extended receive buffers topologies and offloads. Currently supports tag matching topology and rendezvouz offload. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24Merge tag 'mlx5-updates-2017-08-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-08-24 This series includes updates to mlx5 core driver. From Gal and Saeed, three cleanup patches. From Matan, Low level flow steering improvements and optimizations, - Use more efficient data structures for flow steering objects handling. - Add tracepoints to flow steering operations. - Overall these patches improve flow steering rule insertion rate by a factor of seven in large scales (~50K rules or more). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net/mlx5e: make mlx5e_profile constBhumika Goyal
Make this const as it is only passed as an argument to the function mlx5e_create_netdev and the corresponding argument is of type const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24net/mlx5: Add tracepointsMatan Barak
Add a tracepoint infrastructure for mlx5_core driver. Implemented flow steering tracepoints: 1. Add flow group 2. Remove flow group 3. Add flow table entry 4. Remove flow table entry 5. Add flow table rule 6. Remove flow table rule Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5: Add hash table for flow groups in flow tableMatan Barak
When adding a flow table entry (fte) to a flow table (ft), we first need to find its flow group (fg). Currently, this is done by traversing a linear list of all flow groups in the flow table. Furthermore, since multiple flow groups which correspond to the same fte mask may exist in the same ft, we can't just stop at the first match. Converting the linear list to rhltable in order to speed things up. The last four patches increases the steering rules update rate by a factor of more than 7 (for insertion of 50K steering rules). Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5: Add hash table to search FTEs in a flow-groupMatan Barak
When adding a flow table entry (fte) to a flow group (fg), we first need to check whether this fte exist. In such a case we just merge the destinations (if possible). Currently, this is done by traversing the fte list available in a fg. This could take a lot of time when using large flow groups. Speeding this up by using rhashtable, which is much faster. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5: Don't store reserved part in FTEs and FGsMatan Barak
The current code stores fte_match_param in the software representation of FTEs and FGs. fte_match_param contains a large reserved area at the bottom of the struct. Since downstream patches are going to hash this part, we would like to avoid doing so on a reserved part. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5: Convert linear search for free index to idaMatan Barak
When allocating a flow table entry, we need to allocate a free index in the flow group. Currently, this is done by traversing the existing flow table entries in the flow group, until a free index is found. Replacing this by using a ida, which allows us to find a free index much faster. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5e: Fix wrong code indentation in conditional statementGal Pressman
Fix the following checkpatch warning in en_ethtool.c: WARNING: suspect code indent for conditional statements (8, 9) + for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++) + strcpy(data + (idx++) * ETH_GSTRING_LEN, Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24net/mlx5: Add a blank line after declarations V2Saeed Mahameed
The blank line should be after u32 val = ... and not after __be32 __iomem *addr = ... Fixes: ad5b39a95c83 ("net/mlx5: Add a blank line after declarations") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Joe Perches <joe@perches.com>
2017-08-22mlx5: Replace PCI pool old APIRomain Perier
The PCI pool API is deprecated. This commit replaces the PCI pool old API by the appropriate function with the DMA pool API. Signed-off-by: Romain Perier <romain.perier@collabora.com> Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com> Acked-by: Doug Ledford <dledford@redhat.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-20net/mlx5e: Use size_t to store byte offset in statistics descriptorsGal Pressman
The byte offset of counter descriptors should be stored in size_t variable instead of an integer. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20net/mlx5e: Use kernel types instead of uint*_t in ethtool callbacksGal Pressman
Fix checkpatch errors: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t' Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20net/mlx5e: Place constants on the right side of comparisonsOr Gerlitz
To fix these checkpatch complaints: WARNING: Comparisons should place the constant on the right side of the test Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20net/mlx5e: Avoid using multiple blank linesOr Gerlitz
To fix these checkpatch complaints: CHECK: Please don't use multiple blank lines Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20net/mlx5e: Properly indent within conditional statementsOr Gerlitz
To fix these checkpatch complaints: WARNING: suspect code indent for conditional statements (8, 24) + if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR) [...] + return PORT_FIBRE; Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>