summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice/ice_main.c
AgeCommit message (Collapse)Author
2021-09-24ice: Delete always true check of PF pointerLeon Romanovsky
PF pointer is always valid when PCI core calls its .shutdown() and .remove() callbacks. There is no need to check it again. Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22devlink: Make devlink_register to be voidLeon Romanovsky
devlink_register() can't fail and always returns success, but all drivers are obligated to check returned status anyway. This adds a lot of boilerplate code to handle impossible flow. Make devlink_register() void and simplify the drivers that use that API call. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/netdevice.h net/socket.c d0efb16294d1 ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") 876f0bf9d0d5 ("net: socket: simplify dev_ifconf handling") 29c4964822aa ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-27ice: Only lock to update netdev dev_addrBrett Creeley
commit 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") introduced calls to netif_addr_lock_bh() and netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is fine since the driver is updated the netdev's dev_addr, but since this is a spinlock, the driver cannot sleep when the lock is held. Unfortunately the functions to add/delete MAC filters depend on a mutex. This was causing a trace with the lock debug kernel config options enabled when changing the mac address via iproute. [ 203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip [ 203.273068] Preemption disabled at: [ 203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S W I 5.14.0-rc4 #2 [ 203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 203.273102] Call Trace: [ 203.273107] dump_stack_lvl+0x33/0x42 [ 203.273113] ? ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273124] ___might_sleep.cold.150+0xda/0xea [ 203.273131] mutex_lock+0x1c/0x40 [ 203.273136] ice_remove_mac+0xe3/0x180 [ice] [ 203.273155] ? ice_fltr_add_mac_list+0x20/0x20 [ice] [ 203.273175] ice_fltr_prepare_mac+0x43/0xa0 [ice] [ 203.273194] ice_set_mac_address+0xab/0x1c0 [ice] [ 203.273206] dev_set_mac_address+0xb8/0x120 [ 203.273210] dev_set_mac_address_user+0x2c/0x50 [ 203.273212] do_setlink+0x1dd/0x10e0 [ 203.273217] ? __nla_validate_parse+0x12d/0x1a0 [ 203.273221] __rtnl_newlink+0x530/0x910 [ 203.273224] ? __kmalloc_node_track_caller+0x17f/0x380 [ 203.273230] ? preempt_count_add+0x68/0xa0 [ 203.273236] ? _raw_spin_lock_irqsave+0x1f/0x30 [ 203.273241] ? kmem_cache_alloc_trace+0x4d/0x440 [ 203.273244] rtnl_newlink+0x43/0x60 [ 203.273245] rtnetlink_rcv_msg+0x13a/0x380 [ 203.273248] ? rtnl_calcit.isra.40+0x130/0x130 [ 203.273250] netlink_rcv_skb+0x4e/0x100 [ 203.273256] netlink_unicast+0x1a2/0x280 [ 203.273258] netlink_sendmsg+0x242/0x490 [ 203.273260] sock_sendmsg+0x58/0x60 [ 203.273263] ____sys_sendmsg+0x1ef/0x260 [ 203.273265] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273268] ? ____sys_recvmsg+0xe6/0x170 [ 203.273270] ___sys_sendmsg+0x7c/0xc0 [ 203.273272] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273274] ? ___sys_recvmsg+0x89/0xc0 [ 203.273276] ? __netlink_sendskb+0x50/0x50 [ 203.273278] ? mod_objcg_state+0xee/0x310 [ 203.273282] ? __dentry_kill+0x114/0x170 [ 203.273286] ? get_max_files+0x10/0x10 [ 203.273288] __sys_sendmsg+0x57/0xa0 [ 203.273290] do_syscall_64+0x37/0x80 [ 203.273295] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 203.273296] RIP: 0033:0x7f8edf96e278 [ 203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 [ 203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278 [ 203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003 [ 203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0 [ 203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005 Fix this by only locking when changing the netdev->dev_addr. Also, make sure to restore the old netdev->dev_addr on any failures. Fixes: 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09ice: don't remove netdev->dev_addr from uc sync listBrett Creeley
In some circumstances, such as with bridging, it's possible that the stack will add the device's own MAC address to its unicast address list. If, later, the stack deletes this address, the driver will receive a request to remove this address. The driver stores its current MAC address as part of the VSI MAC filter list instead of separately. So, this causes a problem when the device's MAC address is deleted unexpectedly, which results in traffic failure in some cases. The following configuration steps will reproduce the previously mentioned problem: > ip link set eth0 up > ip link add dev br0 type bridge > ip link set br0 up > ip addr flush dev eth0 > ip link set eth0 master br0 > echo 1 > /sys/class/net/br0/bridge/vlan_filtering > modprobe -r veth > modprobe -r bridge > ip addr add 192.168.1.100/24 dev eth0 The following ping command fails due to the netdev->dev_addr being deleted when removing the bridge module. > ping <link partner> Fix this by making sure to not delete the netdev->dev_addr during MAC address sync. After fixing this issue it was noticed that the netdev_warn() in .set_mac was overly verbose, so make it at netdev_dbg(). Also, there is a possibility of a race condition between .set_mac and .set_rx_mode. Fix this by calling netif_addr_lock_bh() and netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr is going to be updated in .set_mac. Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Liang Li <liali@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-09ice: Prevent probing virtual functionsAnirudh Venkataramanan
The userspace utility "driverctl" can be used to change/override the system's default driver choices. This is useful in some situations (buggy driver, old driver missing a device ID, trying a workaround, etc.) where the user needs to load a different driver. However, this is also prone to user error, where a driver is mapped to a device it's not designed to drive. For example, if the ice driver is mapped to driver iavf devices, the ice driver crashes. Add a check to return an error if the ice driver is being used to probe a virtual function. Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25ice: add support for auxiliary input/output pinsMaciej Machnikowski
The E810 device supports programmable pins for enabling both input and output events related to the PTP hardware clock. This includes both output signals with programmable period, as well as timestamping of events on input pins. Add support for enabling these using the CONFIG_PTP_1588_CLOCK interface. This allows programming the software defined pins to take advantage of the hardware clock features. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25ice: add tracepointsJesse Brandeburg
This patch is modeled after one by Scott Peterson for i40e. Add tracepoints to the driver, via a new file ice_trace.h and some new trace calls added in interesting places in the driver. Add some tracing for DIMLIB to help debug interrupt moderation problems. Performance should not be affected, and this can be very useful for debugging and adding new trace events to paths in the future. Note eBPF programs can attach to these events, as well as perf can count them since we're attaching to the events subsystem in the kernel. Co-developed-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-17ice: remove local variablePaul M Stillwell Jr
Remove the local variable since it's only used once. Instead, use it directly. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-17ice: reduce scope of variablesPaul M Stillwell Jr
There are some places where the scope of a variable can be reduced so do that. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11ice: enable transmit timestamps for E810 devicesJacob Keller
Add support for enabling Tx timestamp requests for outgoing packets on E810 devices. The ice hardware can support multiple outstanding Tx timestamp requests. When sending a descriptor to hardware, a Tx timestamp request is made by setting a request bit, and assigning an index that represents which Tx timestamp index to store the timestamp in. Hardware makes no effort to synchronize the index use, so it is up to software to ensure that Tx timestamp indexes are not re-used before the timestamp is reported back. To do this, introduce a Tx timestamp tracker which will keep track of currently in-use indexes. In the hot path, if a packet has a timestamp request, an index will be requested from the tracker. Unfortunately, this does require a lock as the indexes are shared across all queues on a PHY. There are not enough indexes to reliably assign only 1 to each queue. For the E810 devices, the timestamp indexes are not shared across PHYs, so each port can have its own tracking. Once hardware captures a timestamp, an interrupt is fired. In this interrupt, trigger a new work item that will figure out which timestamp was completed, and report the timestamp back to the stack. This function loops through the Tx timestamp indexes and checks whether there is now a valid timestamp. If so, it clears the PHY timestamp indication in the PHY memory, locks and removes the SKB and bit in the tracker, then reports the timestamp to the stack. It is possible in some cases that a timestamp request will be initiated but never completed. This might occur if the packet is dropped by software or hardware before it reaches the PHY. Add a task to the periodic work function that will check whether a timestamp request is more than a few seconds old. If so, the timestamp index is cleared in the PHY, and the SKB is released. Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and use the same overall logic for extending to 64 bits of nanoseconds. With this change, E810 devices should be able to perform basic PTP functionality. Future changes will extend the support to cover the E822-based devices. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11ice: enable receive hardware timestampingJacob Keller
Add SIOCGHWTSTAMP and SIOCSHWTSTAMP ioctl handlers to respond to requests to enable timestamping support. If the request is for enabling Rx timestamps, set a bit in the Rx descriptors to indicate that receive timestamps should be reported. Hardware captures receive timestamps in the PHY which only captures part of the timer, and reports only 40 bits into the Rx descriptor. The upper 32 bits represent the contents of GLTSYN_TIME_L at the point of packet reception, while the lower 8 bits represent the upper 8 bits of GLTSYN_TIME_0. The networking and PTP stack expect 64 bit timestamps in nanoseconds. To support this, implement some logic to extend the timestamps by using the full PHC time. If the Rx timestamp was captured prior to the PHC time, then the real timestamp is PHC - (lower_32_bits(PHC) - timestamp) If the Rx timestamp was captured after the PHC time, then the real timestamp is PHC + (timestamp - lower_32_bits(PHC)) These calculations are correct as long as neither the PHC timestamp nor the Rx timestamps are more than 2^32-1 nanseconds old. Further, we can detect when the Rx timestamp is before or after the PHC as long as the PHC timestamp is no more than 2^31-1 nanoseconds old. In that case, we calculate the delta between the lower 32 bits of the PHC and the Rx timestamp. If it's larger than 2^31-1 then the Rx timestamp must have been captured in the past. If it's smaller, then the Rx timestamp must have been captured after PHC time. Add an ice_ptp_extend_32b_ts function that relies on a cached copy of the PHC time and implements this algorithm to calculate the proper upper 32bits of the Rx timestamps. Cache the PHC time periodically in all of the Rx rings. This enables each Rx ring to simply call the extension function with a recent copy of the PHC time. By ensuring that the PHC time is kept up to date periodically, we ensure this algorithm doesn't use stale data and produce incorrect results. To cache the time, introduce a kworker and a kwork item to periodically store the Rx time. It might seem like we should use the .do_aux_work interface of the PTP clock. This doesn't work because all PFs must cache this time, but only one PF owns the PTP clock device. Thus, the ice driver will manage its own kthread instead of relying on the PTP do_aux_work handler. With this change, the driver can now report Rx timestamps on all incoming packets. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11ice: register 1588 PTP clock device object for E810 devicesJacob Keller
Add a new ice_ptp.c file for holding the basic PTP clock interface functions. If the device supports PTP, call the new ice_ptp_init and ice_ptp_release functions where appropriate. If the function owns the hardware resource associated with the PTP hardware clock, register with the PTP_1588_CLOCK infrastructure to allocate a new clock object that represents the device hardware clock. Implement basic functionality for reading and setting the clock time, performing clock adjustments, and adjusting the clock frequency. Future changes will introduce functionality for handling related features including Tx and Rx timestamps. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11ice: add support for sideband messagesJacob Keller
In order to support certain device features, including enabling the PTP hardware clock, the ice driver needs to control some registers on the device PHY. These registers are accessed by sending sideband messages. For some hardware, these messages must be sent over the device admin queue, while other hardware has a dedicated control queue for the sideband messages. Add the neighbor device message structure for sending a message to the neighboring device. Where supported, initialize the sideband control queue and handle cleanup. Add a wrapper function for sending sideband control queue messages that read or write a neighboring device register. Because some devices send sideband messages over the AdminQ, also increase the length of the admin queue to allow more messages to be queued up. This is important because the sideband messages add additional pressure on the AQ usage. This support will be used in following patches to enable support for CONFIG_1588_PTP_CLOCK. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-09ice: add ndo_bpf callback for safe mode netdev opsMaciej Fijalkowski
ice driver requires a programmable pipeline firmware package in order to have a support for advanced features. Otherwise, driver falls back to so called 'safe mode'. For that mode, ndo_bpf callback is not exposed and when user tries to load XDP program, the following happens: $ sudo ./xdp1 enp179s0f1 libbpf: Kernel error message: Underlying driver does not support XDP in native mode link set xdp fd failed which is sort of confusing, as there is a native XDP support, but not in the current mode. Improve the user experience by providing the specific ndo_bpf callback dedicated for safe mode which will make use of extack to explicitly let the user know that the DDP package is missing and that's the reason that the XDP can't be loaded onto interface currently. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Fixes: efc2214b6047 ("ice: Add support for XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07ice: Detect and report unsupported module power levelsAnirudh Venkataramanan
Determine whether an unsupported power configuration is preventing link establishment by storing and checking the link_cfg_err_byte. Print error messages when module power levels are unsupported. Also add a new flag bit to prevent spamming said error messages. Co-developed-by: Jeb Cramer <jeb.j.cramer@intel.com> Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07ice: (re)initialize NVM fields when rebuildingJacob Keller
After performing a flash update, a device EMP reset may occur. This reset will cause the newly downloaded firmware to be initialized. When this happens, the driver still reports the previous NVM version information. This is because the NVM versions are cached within the hw structure. This can be confusing, as the new firmware is in fact running in this case. Handle this by calling ice_init_nvm when rebuilding the driver state. This will update the flash version information and ensures that the current values are displayed when reporting the NVM versions to the stack. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07ice: wait for reset before reporting devlink infoJacob Keller
Requesting device firmware information while the device is busy cleaning up after a reset can result in an unexpected failure: This occurs because the command is attempting to access the device AdminQ while it is down. Resolve this by having the command wait for a while until the reset is complete. To do this, introduce a reset_wait_queue and associated helper function "ice_wait_for_reset". This helper will use the wait queue to sleep until the driver is done rebuilding. Use of a wait queue is preferred because the potential sleep duration can be several seconds. To ensure that the thread wakes up properly, a new wake_up call is added during all code paths which clear the reset state bits associated with the driver rebuild flow. Using this ensures that tools can request device information without worrying about whether the driver is cleaning up from a reset. Specifically, it is expected that a flash update could result in a device reset, and it is better to delay the response for information until the reset is complete rather than exit with an immediate failure. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28ice: Register auxiliary device to provide RDMADave Ertman
Register ice client auxiliary RDMA device on the auxiliary bus per PCIe device function for the auxiliary driver (irdma) to attach to. It allows to realize a single RDMA driver (irdma) capable of working with multiple netdev drivers over multi-generation Intel HW supporting RDMA. There is no load ordering dependencies between ice and irdma. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28ice: Implement iidc operationsDave Ertman
Add implementations for supporting iidc operations for device operation such as allocation of resources and event notifications. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-05-28ice: Initialize RDMA supportDave Ertman
Probe the device's capabilities to see if it supports RDMA. If so, allocate and reserve resources to support its operation; populate structures with initial values. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22ice: warn about potentially malicious VFsVignesh Sridhar
Attempt to detect malicious VFs and, if suspected, log the information but keep going to allow the user to take any desired actions. Potentially malicious VFs are identified by checking if the VFs are transmitting too many messages via the PF-VF mailbox which could cause an overflow of this channel resulting in denial of service. This is done by creating a snapshot or static capture of the mailbox buffer which can be traversed and in which the messages sent by VFs are tracked. Co-developed-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com> Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com> Co-developed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Co-developed-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14ice: remove return variablePaul M Stillwell Jr
We were saving the return value from ice_vsi_manage_rss_lut(), but the errors from that function are not critical so change it to return void and remove the code that saved the value. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14ice: print name in /proc/iomemJesse Brandeburg
The driver previously printed it's PCI address in the name field for the pci resource, which when displayed via /proc/iomem, would print the same thing twice. It's more useful for debugging to see the driver name, as most other modules do. Here's a diff of before and after this change: 99100000-991fffff : 0000:3b:00.1 9a000000-a04fffff : PCI Bus 0000:3b 9a000000-9bffffff : 0000:3b:00.1 - 9a000000-9bffffff : 0000:3b:00.1 + 9a000000-9bffffff : ice 9c000000-9dffffff : 0000:3b:00.0 - 9c000000-9dffffff : 0000:3b:00.0 + 9c000000-9dffffff : ice 9e000000-9effffff : 0000:3b:00.1 9f000000-9fffffff : 0000:3b:00.0 a0000000-a000ffff : 0000:3b:00.1 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14ice: replace custom AIM algorithm with kernel's DIM libraryJacob Keller
The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The currently implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB, one is UDP single stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and has much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14ice: Add new VSI states to track netdev alloc/registrationAnirudh Venkataramanan
Add two new VSI states, one to track if a netdev for the VSI has been allocated and the other to track if the netdev has been registered. Call unregister_netdev/free_netdev only when the corresponding state bits are set. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14ice: Drop leading underscores in enum ice_pf_stateAnirudh Venkataramanan
Remove the leading underscores in enum ice_pf_state. This is not really communicating anything and is unnecessary. No functional change. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-08ice: fix memory leak of aRFS after resuming from suspendYongxin Liu
In ice_suspend(), ice_clear_interrupt_scheme() is called, and then irq_free_descs() will be eventually called to free irq and its descriptor. In ice_resume(), ice_init_interrupt_scheme() is called to allocate new irqs. However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap maybe cannot be freed, if the irqs that released in ice_suspend() were reassigned to other devices, which makes irq descriptor's affinity_notify lost. So call ice_free_cpu_rx_rmap() before ice_clear_interrupt_scheme(), which can make sure all irq_glue and cpu_rmap can be correctly released before corresponding irq and descriptor are released. Fix the following memory leak. unreferenced object 0xffff95bd951afc00 (size 512): comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s) hex dump (first 32 bytes): 18 00 00 00 18 00 18 00 70 fc 1a 95 bd 95 ff ff ........p....... 00 00 ff ff 01 00 ff ff 02 00 ff ff 03 00 ff ff ................ backtrace: [<0000000072e4b914>] __kmalloc+0x336/0x540 [<0000000054642a87>] alloc_cpu_rmap+0x3b/0xb0 [<00000000f220deec>] ice_set_cpu_rx_rmap+0x6a/0x110 [ice] [<000000002370a632>] ice_probe+0x941/0x1180 [ice] [<00000000d692edba>] local_pci_probe+0x47/0xa0 [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30 [<00000000555a9e4a>] process_one_work+0x1dd/0x410 [<000000002c4b414a>] worker_thread+0x221/0x3f0 [<00000000bb2b556b>] kthread+0x14c/0x170 [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30 unreferenced object 0xffff95bd81b0a2a0 (size 96): comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s) hex dump (first 32 bytes): 38 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00 8............... b0 a2 b0 81 bd 95 ff ff b0 a2 b0 81 bd 95 ff ff ................ backtrace: [<00000000582dd5c5>] kmem_cache_alloc_trace+0x31f/0x4c0 [<000000002659850d>] irq_cpu_rmap_add+0x25/0xe0 [<00000000495a3055>] ice_set_cpu_rx_rmap+0xb4/0x110 [ice] [<000000002370a632>] ice_probe+0x941/0x1180 [ice] [<00000000d692edba>] local_pci_probe+0x47/0xa0 [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30 [<00000000555a9e4a>] process_one_work+0x1dd/0x410 [<000000002c4b414a>] worker_thread+0x221/0x3f0 [<00000000bb2b556b>] kthread+0x14c/0x170 [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30 Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL") Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Remove unnecessary blank lineTony Nguyen
Checkpatch reports the following, fix it. ----------------------------------------- drivers/net/ethernet/intel/ice/ice_main.c ----------------------------------------- CHECK:BRACES: Blank lines aren't necessary before a close brace '}' FILE: drivers/net/ethernet/intel/ice/ice_main.c:455: + +} Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
2021-04-07ice: Remove unnecessary checks in add/kill_vid ndo opsBrett Creeley
Currently the driver is doing two unnecessary checks. First both ops are checking if the VLAN ID passed in is less than VLAN_N_VID and second both ops are checking to see if a port VLAN is configured on the VSI. The first check is already handled by the 8021q driver so this is an unnecessary check. The second check is unnecessary because the PF VSI is never put into a port VLAN. Remove these checks. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Remove rx_gro_dropped statAnirudh Venkataramanan
Tracking of the rx_gro_dropped statistic was removed in commit f73fc40327c0 ("ice: drop dead code in ice_receive_skb()"). Remove the associated variables and its reporting to ethtool stats. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Use local variable instead of pointer derefsAnirudh Venkataramanan
Replace multiple instances of vsi->back and pi->phy with equivalent local variables Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Remove unnecessary variableAnirudh Venkataramanan
In ice_init_phy_user_cfg, vsi is used only to get to hw. Remove this and just use pi->hw Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Use default configuration mode for PHY configurationAnirudh Venkataramanan
Recent firmware supports a new "get PHY capabilities" mode ICE_AQC_REPORT_DFLT_CFG which makes it unnecessary for the driver to track and apply NVM based default link overrides. If FW AQ API version supports it, use Report Default Configuration. Add check function for Report Default Configuration support and update accordingly. Also change adv_phy_type_[lo|hi] to advert_phy_type[lo|hi] for clarity. Co-developed-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com> Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Ignore EMODE return for opcode 0x0605Anirudh Venkataramanan
When link is owned by manageability, the driver is not allowed to fiddle with link. FW returns ICE_AQ_RC_EMODE if the driver attempts to do so. This patch adds a new function ice_set_link which abstracts the call to ice_aq_set_link_restart_an and provides a clean way to turn on/off link. While making this change, I also spotted that an int variable was being used to hold both an ice_status return code and the Linux errno return code. This pattern more often than not results in the driver inadvertently returning ice_status back to kernel which is a major boo-boo. Clean it up. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07ice: Align macro names to the specificationAnirudh Venkataramanan
For get PHY abilities AQ, the specification defines "report modes" as "with media", "without media" and "active configuration". For clarity, rename macros to align with the specification. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31ice: Consolidate VSI state and flagsAnirudh Venkataramanan
struct ice_vsi has two fields, state and flags which seem to be serving the same purpose. Consolidate them into one field 'state'. enum ice_state is used to represent state information of the PF. While some of these enum values can be use to represent VSI state, it makes more sense to represent VSI state with its own enum. So derive a new enum ice_vsi_state from ice_vsi_flags and ice_state and use it. Also rename enum ice_state to ice_pf_state for clarity. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31ice: Refactor ice_set/get_rss into LUT and key specific functionsBrett Creeley
Currently ice_set/get_rss are used to set/get the RSS LUT and/or RSS key. However nearly everywhere these functions are called only the LUT or key are set/get. Also, making this change reduces how many things ice_set/get_rss are doing. Fix this by adding ice_set/get_rss_lut and ice_set/get_rss_key functions. Also, consolidate all calls for setting/getting the RSS LUT and RSS Key to use ice_set/get_rss_lut() and ice_set/get_rss_key(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31ice: Refactor get/set RSS LUT to use struct parameterBrett Creeley
Update ice_aq_get_rss_lut() and ice_aq_set_rss_lut() to take a new structure ice_aq_get_set_rss_params instead of passing individual parameters. This is done for 2 reasons: 1. Reduce the number of parameters passed to the functions. 2. Reduce the amount of change required if the arguments ever need to be updated in the future. Also, reduce duplicate code that was checking for an invalid vsi_handle and lut parameter by moving the checks to the lower level __ice_aq_get_set_rss_lut(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31ice: change link misconfiguration messagePaul Greenwalt
Change link misconfiguration message since the configuration could be intended by the user. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31ice: Delay netdev registrationAnirudh Venkataramanan
Once a netdev is registered, the corresponding network interface can be immediately used by userspace utilities (like say NetworkManager). This can be problematic if the driver technically isn't fully up yet. Move netdev registration to the end of probe, as by this time the driver data structures and device will be initialized as expected. However, delaying netdev registration causes a failure in the aRFS flow where netdev->reg_state == NETREG_REGISTERED condition is checked. It's not clear why this check was added to begin with, so remove it. Local testing didn't indicate any issues with this change. The state bit check in ice_open was put in as a stop-gap measure to prevent a premature interface up operation. This is no longer needed, so remove it. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-29ice: Use port number instead of PF ID for WoLAnirudh Venkataramanan
As per the spec, the WoL control word read from the NVM should be interpreted as port numbers, and not PF numbers. So when checking if WoL supported, use the port number instead of the PF ID. Also, ice_is_wol_supported doesn't really need a pointer to the pf struct, but just needs a pointer to the hw instance. Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-29ice: prevent ice_open and ice_stop during resetKrzysztof Goreczny
There is a possibility of race between ice_open or ice_stop calls performed by OS and reset handling routine both trying to modify VSI resources. Observed scenarios: - reset handler deallocates memory in ice_vsi_free_arrays and ice_open tries to access it in ice_vsi_cfg_txq leading to driver crash - reset handler deallocates memory in ice_vsi_free_arrays and ice_close tries to access it in ice_down leading to driver crash - reset handler clears port scheduler topology and sets port state to ICE_SCHED_PORT_STATE_INIT leading to ice_ena_vsi_txq fail in ice_open To prevent this additional checks in ice_open and ice_stop are introduced to make sure that OS is not allowed to alter VSI config while reset is in progress. Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx") Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-29ice: Continue probe on link/PHY errorsAnirudh Venkataramanan
An incorrect NVM update procedure can result in the driver failing probe. In this case, the recommended resolution method is to update the NVM using the right procedure. However, if the driver fails probe, the user will not be able to update the NVM. So do not fail probe on link/PHY errors. Fixes: 1a3571b5938c ("ice: restore PHY settings on media insertion") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-23ice: Fix prototype warningsTony Nguyen
Correct reported warnings for "warning: expecting prototype for ... Prototype was for ... instead" Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
2021-03-22ice: Check FDIR program status for AVFQi Zhang
Enable returning FDIR completion status by checking the ctrl_vsi Rx queue descriptor value. To enable returning FDIR completion status from ctrl_vsi Rx queue, COMP_Queue and COMP_Report of FDIR filter programming descriptor needs to be properly configured. After program request sent to ctrl_vsi Tx queue, ctrl_vsi Rx queue interrupt will be triggered and completion status will be returned. Driver will first issue request in ice_vc_fdir_add_fltr(), then pass FDIR context to the background task in interrupt service routine ice_vc_fdir_irq_handler() and finally deal with them in ice_flush_fdir_ctx(). ice_flush_fdir_ctx() will check the descriptor's value, fdir context, and then send back virtual channel message to VF by calling ice_vc_add_fdir_fltr_post(). An additional timer will be setup in case of hardware interrupt timeout. Signed-off-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Chen Bo <BoX.C.Chen@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>