summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2020-10-11bpf, sockmap: Remove skb_orphan and let normal skb_kfree do cleanupJohn Fastabend
Calling skb_orphan() is unnecessary in the strp rcv handler because the skb is from a skb_clone() in __strp_recv. So it never has a destructor or a sk assigned. Plus its confusing to read because it might hint to the reader that the skb could have an sk assigned which is not true. Even if we did have an sk assigned it would be cleaner to simply wait for the upcoming kfree_skb(). Additionally, move the comment about strparser clone up so its closer to the logic it is describing and add to it so that it is more complete. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226865548.5692.9098315689984599579.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: Remove dropped data on errors in redirect caseJohn Fastabend
In the sk_skb redirect case we didn't handle the case where we overrun the sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress. Because we didn't have anything implemented we simply dropped the skb. This meant data could be dropped if socket memory accounting was in place. This fixes the above dropped data case by moving the memory checks later in the code where we actually do the send or recv. This pushes those checks into the workqueue and allows us to return an EAGAIN error which in turn allows us to try again later from the workqueue. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226863689.5692.13861422742592309285.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: Remove skb_set_owner_w wmem will be taken later from sendpageJohn Fastabend
The skb_set_owner_w is unnecessary here. The sendpage call will create a fresh skb and set the owner correctly from workqueue. Its also not entirely harmless because it consumes cycles, but also impacts resource accounting by increasing sk_wmem_alloc. This is charging the socket we are going to send to for the skb, but we will put it on the workqueue for some time before this happens so we are artifically inflating sk_wmem_alloc for this period. Further, we don't know how many skbs will be used to send the packet or how it will be broken up when sent over the new socket so charging it with one big sum is also not correct when the workqueue may break it up if facing memory pressure. Seeing we don't know how/when this is going to be sent drop the early accounting. A later patch will do proper accounting charged on receive socket for the case where skbs get enqueued on the workqueue. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226861708.5692.17964237936462425136.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: On receive programs try to fast track SK_PASS ingressJohn Fastabend
When we receive an skb and the ingress skb verdict program returns SK_PASS we currently set the ingress flag and put it on the workqueue so it can be turned into a sk_msg and put on the sk_msg ingress queue. Then finally telling userspace with data_ready hook. Here we observe that if the workqueue is empty then we can try to convert into a sk_msg type and call data_ready directly without bouncing through a workqueue. Its a common pattern to have a recv verdict program for visibility that always returns SK_PASS. In this case unless there is an ENOMEM error or we overrun the socket we can avoid the workqueue completely only using it when we fall back to error cases caused by memory pressure. By doing this we eliminate another case where data may be dropped if errors occur on memory limits in workqueue. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower
2020-10-11bpf, sockmap: Skb verdict SK_PASS to self already checked rmem limitsJohn Fastabend
For sk_skb case where skb_verdict program returns SK_PASS to continue to pass packet up the stack, the memory limits were already checked before enqueuing in skb_queue_tail from TCP side. So, lets remove the extra checks here. The theory is if the TCP stack believes we have memory to receive the packet then lets trust the stack and not double check the limits. In fact the accounting here can cause a drop if sk_rmem_alloc has increased after the stack accepted this packet, but before the duplicate check here. And worse if this happens because TCP stack already believes the data has been received there is no retransmit. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160226857664.5692.668205469388498375.stgit@john-Precision-5820-Tower
2020-10-11bpf: Always return target ifindex in bpf_fib_lookupToke Høiland-Jørgensen
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work. For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com
2020-10-11bpf: Add redirect_peer helperDaniel Borkmann
Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
2020-10-09devlink: Add enable_remote_dev_reset generic parameterMoshe Shemesh
The enable_remote_dev_reset devlink param flags that the host admin allows device resets that can be initiated by other hosts. This parameter is useful for setups where a device is shared by different hosts, such as multi-host setup. Once the user set this parameter to false, the driver should NACK any attempt to reset the device while the driver is loaded. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09devlink: Add remote reload statsMoshe Shemesh
Add remote reload stats to hold the history of actions performed due devlink reload commands initiated by remote host. For example, in case firmware activation with reset finished successfully but was initiated by remote host. The function devlink_remote_reload_actions_performed() is exported to enable drivers update on remote reload actions performed as it was not initiated by their own devlink instance. Expose devlink remote reload stats to the user through devlink dev get command. Examples: $ devlink dev show pci/0000:82:00.0: stats: reload: driver_reinit 2 fw_activate 1 fw_activate_no_reset 0 remote_reload: driver_reinit 0 fw_activate 0 fw_activate_no_reset 0 pci/0000:82:00.1: stats: reload: driver_reinit 1 fw_activate 0 fw_activate_no_reset 0 remote_reload: driver_reinit 1 fw_activate 1 fw_activate_no_reset 0 $ devlink dev show -jp { "dev": { "pci/0000:82:00.0": { "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 }, "remote_reload": { "driver_reinit": 0, "fw_activate": 0, "fw_activate_no_reset": 0 } } }, "pci/0000:82:00.1": { "stats": { "reload": { "driver_reinit": 1, "fw_activate": 0, "fw_activate_no_reset": 0 }, "remote_reload": { "driver_reinit": 1, "fw_activate": 1, "fw_activate_no_reset": 0 } } } } } Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09devlink: Add reload statsMoshe Shemesh
Add reload stats to hold the history per reload action type and limit. For example, the number of times fw_activate has been performed on this device since the driver module was added or if the firmware activation was performed with or without reset. Add devlink notification on stats update. Expose devlink reload stats to the user through devlink dev get command. Examples: $ devlink dev show pci/0000:82:00.0: stats: reload: driver_reinit 2 fw_activate 1 fw_activate_no_reset 0 pci/0000:82:00.1: stats: reload: driver_reinit 1 fw_activate 0 fw_activate_no_reset 0 $ devlink dev show -jp { "dev": { "pci/0000:82:00.0": { "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } }, "pci/0000:82:00.1": { "stats": { "reload": { "driver_reinit": 1, "fw_activate": 0, "fw_activate_no_reset": 0 } } } } } Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09devlink: Add devlink reload limit optionMoshe Shemesh
Add reload limit to demand restrictions on reload actions. Reload limits supported: no_reset: No reset allowed, no down time allowed, no link flap and no configuration is lost. By default reload limit is unspecified and so no constraints on reload actions are required. Some combinations of action and limit are invalid. For example, driver can not reinitialize its entities without any downtime. The no_reset reload limit will have usecase in this patchset to implement restricted fw_activate on mlx5. Have the uapi parameter of reload limit ready for future support of multiselection. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09devlink: Add reload action option to devlink reload commandMoshe Shemesh
Add devlink reload action to allow the user to request a specific reload action. The action parameter is optional, if not specified then devlink driver re-init action is used (backward compatible). Note that when required to do firmware activation some drivers may need to reload the driver. On the other hand some drivers may need to reset the firmware to reinitialize the driver entities. Therefore, the devlink reload command returns the actions which were actually performed. Reload actions supported are: driver_reinit: driver entities re-initialization, applying devlink-param and devlink-resource values. fw_activate: firmware activate. command examples: $devlink dev reload pci/0000:82:00.0 action driver_reinit reload_actions_performed: driver_reinit $devlink dev reload pci/0000:82:00.0 action fw_activate reload_actions_performed: driver_reinit fw_activate Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09devlink: Change devlink_reload_supported() param typeMoshe Shemesh
Change devlink_reload_supported() function to get devlink_ops pointer param instead of devlink pointer param. This change will be used in the next patch to check if devlink reload is supported before devlink instance is allocated. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09bpf: Add tcp_notsent_lowat bpf setsockoptNikita V. Shirokov
Adding support for TCP_NOTSENT_LOWAT sockoption (https://lwn.net/Articles/560082/) in tcp bpf programs. Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201009070325.226855-1-tehnerd@tehnerd.com
2020-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2020-10-08 The main changes are: 1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song. 2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-06net: always dump full packets with skb_dumpVladimir Oltean
Currently skb_dump has a restriction to only dump full packet for the first 5 socket buffers, then only headers will be printed. Remove this arbitrary and confusing restriction, which is only documented vaguely ("up to") in the comments above the prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net/core: check length before updating Ethertype in skb_mpls_{push,pop}Guillaume Nault
Openvswitch allows to drop a packet's Ethernet header, therefore skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true and mac_len=0. In that case the pointer passed to skb_mod_eth_type() doesn't point to an Ethernet header and the new Ethertype is written at unexpected locations. Fix this by verifying that mac_len is big enough to contain an Ethernet header. Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net: devlink: Add support for port regionsAndrew Lunn
Allow regions to be registered to a devlink port. The same netlink API is used, but the port index is provided to indicate when a region is a port region as opposed to a device region. Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net: devlink: Add unused port flavourAndrew Lunn
Not all ports of a switch need to be used, particularly in embedded systems. Add a port flavour for ports which physically exist in the switch, but are not connected to the front panel etc, and so are unused. By having unused ports present in devlink, it gives a more accurate representation of the hardware. It also allows regions to be associated to such ports, so allowing, for example, to determine unused ports are correctly powered off, or to compare probable reset defaults of unused ports to used ports experiences issues. Actually registering unused ports and setting the flavour to unused is optional. The DSA core will register all such switch ports, but such ports are expected to be limited in number. Bigger ASICs may decide not to list unused ports. v2: Expand the description about why it is useful Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03net/sched: act_vlan: Add {POP,PUSH}_ETH actionsGuillaume Nault
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to respectively pop and push a base Ethernet header at the beginning of a frame. POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any, must be stripped before calling POP_ETH. PUSH_ETH is restricted to skbs with no mac_header, and only the MAC addresses can be configured. The Ethertype is automatically set from skb->protocol. These restrictions ensure that all skb's fields remain consistent, so that this action can't confuse other part of the networking stack (like GSO). Since openvswitch already had these actions, consolidate the code in skbuff.c (like for vlan and mpls push/pop). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02genetlink: move to smaller ops wherever possibleJakub Kicinski
Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02devlink: add .trap_group_action_set() callbackIoana Ciornei
Add a new devlink callback, .trap_group_action_set(), which can be used by device drivers which do not support controlling the action (drop, trap) on each trap but rather on the entire group trap. If this new callback is populated, it will take precedence over the .trap_action_set() callback when the user requests a change of all the traps in a group. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02devlink: add parser error drop packet trapsIoana Ciornei
Add parser error drop packet traps, so that capable device driver could register them with devlink. The new packet trap group holds any drops of packets which were marked by the device as erroneous during header parsing. Add documentation for every added packet trap and packet trap group. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02bpf, sockmap: Add skb_adjust_room to pop bytes off ingress payloadJohn Fastabend
This implements a new helper skb_adjust_room() so users can push/pop extra bytes from a BPF_SK_SKB_STREAM_VERDICT program. Some protocols may include headers and other information that we may not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT program. One use case is to redirect TLS packets into a receive socket that doesn't expect TLS data. In TLS case the first 13B or so contain the protocol header. With KTLS the payload is decrypted so we should be able to redirect this to a receiving socket, but the receiving socket may not be expecting to receive a TLS header and discard the data. Using the above helper we can pop the header off and put an appropriate header on the payload. This allows for creating a proxy between protocols without extra hops through the stack or userspace. So in order to fix this case add skb_adjust_room() so users can strip the header. After this the user can strip the header and an unmodified receiver thread will work correctly when data is redirected into the ingress path of a sock. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/160160099197.7052.8443193973242831692.stgit@john-Precision-5820-Tower
2020-10-01net-sysfs: Fix inconsistent of format with argument type in net-sysfs.cYe Bin
Fix follow warnings: [net/core/net-sysfs.c:1161]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. [net/core/net-sysfs.c:1162]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01pktgen: Fix inconsistent of format with argument type in pktgen.cYe Bin
Fix follow warnings: [net/core/pktgen.c:925]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:942]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:962]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:984]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:1149]: (warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 90 non-merge commits during the last 8 day(s) which contain a total of 103 files changed, 7662 insertions(+), 1894 deletions(-). Note that once bpf(/net) tree gets merged into net-next, there will be a small merge conflict in tools/lib/bpf/btf.c between commit 1245008122d7 ("libbpf: Fix native endian assumption when parsing BTF") from the bpf tree and the commit 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") from the bpf-next tree. Correct resolution would be to stick with bpf-next, it should look like: [...] /* check BTF magic */ if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) { err = -EIO; goto err_out; } if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { /* definitely not a raw BTF */ err = -EPROTO; goto err_out; } /* get file size */ [...] The main changes are: 1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying BTF-based kernel data structures out of BPF programs, from Alan Maguire. 2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race conditions exposed by it. It was discussed to take these via BPF and networking tree to get better testing exposure, from Paul E. McKenney. 3) Support multi-attach for freplace programs, needed for incremental attachment of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen. 4) libbpf support for appending new BTF types at the end of BTF object, allowing intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko. 5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add a redirect helper into neighboring subsys, from Daniel Borkmann. 6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps from one to another, from Lorenz Bauer. 7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused a verifier issue due to type downgrade to scalar, from John Fastabend. 8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue and epilogue sections, from Maciej Fijalkowski. 9) Add an option to perf RB map to improve sharing of event entries by avoiding remove- on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu. 10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-01bpf: Fix "unresolved symbol" build error with resolve_btfidsYonghong Song
Michal reported a build failure likes below: BTFIDS vmlinux FAILED unresolved symbol tcp_timewait_sock make[1]: *** [/.../linux-5.9-rc7/Makefile:1176: vmlinux] Error 255 This error can be triggered when config has CONFIG_NET enabled but CONFIG_INET disabled. In this case, there is no user of istructs inet_timewait_sock and tcp_timewait_sock and hence vmlinux BTF types are not generated for these two structures. To fix the problem, let us force BTF generation for these two structures with BTF_TYPE_EMIT. Fixes: fce557bcef11 ("bpf: Make btf_sock_ids global") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201001051339.2549085-1-yhs@fb.com
2020-09-30drop_monitor: Filter control packets in drop monitorIdo Schimmel
Previously, devlink called into drop monitor in order to report hardware originated drops / exceptions. devlink intentionally filtered control packets and did not pass them to drop monitor as they were not dropped by the underlying hardware. Now drop monitor registers its probe on a generic 'devlink_trap_report' tracepoint and should therefore perform this filtering itself instead of having devlink do that. Add the trap type as metadata and have drop monitor ignore control packets. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30drop_monitor: Remove duplicate structIdo Schimmel
'struct net_dm_hw_metadata' is a duplicate of 'struct devlink_trap_metadata'. Remove the former and simplify the code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30drop_monitor: Remove no longer used functionsIdo Schimmel
The old probe functions that were invoked by drop monitor code are no longer called and can thus be removed. They were replaced by actual probe functions that are registered on the recently introduced 'devlink_trap_report' tracepoint. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30drop_monitor: Convert to using devlink tracepointIdo Schimmel
Convert drop monitor to use the recently introduced 'devlink_trap_report' tracepoint instead of having devlink call into drop monitor. This is both consistent with software originated drops ('kfree_skb' tracepoint) and also allows drop monitor to be built as a module and still report hardware originated drops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30drop_monitor: Prepare probe functions for devlink tracepointIdo Schimmel
Drop monitor supports two alerting modes: Summary and packet. Prepare a probe function for each, so that they could be later registered on the devlink tracepoint by calling register_trace_devlink_trap_report(), based on the configured alerting mode. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30devlink: Add a tracepoint for trap reportsIdo Schimmel
Add a tracepoint for trap reports so that drop monitor could register its probe on it. Use trace_devlink_trap_report_enabled() to avoid wasting cycles setting the trap metadata if the tracepoint is not enabled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30bpf: Add redirect_neigh helper as redirect drop-inDaniel Borkmann
Add a redirect_neigh() helper as redirect() drop-in replacement for the xmit side. Main idea for the helper is to be very similar in semantics to the latter just that the skb gets injected into the neighboring subsystem in order to let the stack do the work it knows best anyway to populate the L2 addresses of the packet and then hand over to dev_queue_xmit() as redirect() does. This solves two bigger items: i) skbs don't need to go up to the stack on the host facing veth ingress side for traffic egressing the container to achieve the same for populating L2 which also has the huge advantage that ii) the skb->sk won't get orphaned in ip_rcv_core() when entering the IP routing layer on the host stack. Given that skb->sk neither gets orphaned when crossing the netns as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.") the helper can then push the skbs directly to the phys device where FQ scheduler can do its work and TCP stack gets proper backpressure given we hold on to skb->sk as long as skb is still residing in queues. With the helper used in BPF data path to then push the skb to the phys device, I observed a stable/consistent TCP_STREAM improvement on veth devices for traffic going container -> host -> host -> container from ~10Gbps to ~15Gbps for a single stream in my test environment. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2020-09-30bpf, net: Rework cookie generator as per-cpu oneDaniel Borkmann
With its use in BPF, the cookie generator can be called very frequently in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg) and attached to the root cgroup, for example, when used in v1/v2 mixed environments. In particular, when there's a high churn on sockets in the system there can be many parallel requests to the bpf_get_socket_cookie() and bpf_get_netns_cookie() helpers which then cause contention on the atomic counter. As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino allocator"), add a small helper library that both can use for the 64 bit counters. Given this can be called from different contexts, we also need to deal with potential nested calls even though in practice they are considered extremely rare. One idea as suggested by Eric Dumazet was to use a reverse counter for this situation since we don't expect 64 bit overflows anyways; that way, we can avoid bigger gaps in the 64 bit counter space compared to just batch-wise increase. Even on machines with small number of cores (e.g. 4) the cookie generation shrinks from min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from multiple CPUs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
2020-09-30bpf: Add classid helper only based on skb->skDaniel Borkmann
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid from v2 hooks"), add a helper to retrieve cgroup v1 classid solely based on the skb->sk, so it can be used as key as part of BPF map lookups out of tc from host ns, in particular given the skb->sk is retained these days when crossing net ns thanks to 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). This is similar to bpf_skb_cgroup_id() which implements the same for v2. Kubernetes ecosystem is still operating on v1 however, hence net_cls needs to be used there until this can be dropped in with the v2 helper of bpf_skb_cgroup_id(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2020-09-29net: Add netif_rx_any_context()Sebastian Andrzej Siewior
Quite some drivers make conditional decisions based on in_interrupt() to invoke either netif_rx() or netif_rx_ni(). Conditionals based on in_interrupt() or other variants of preempt count checks in drivers should not exist for various reasons and Linus clearly requested to either split the code pathes or pass an argument to the common functions which provides the context. This is obviously the correct solution, but for some of the affected drivers this needs a major rewrite due to their convoluted structure. As in_interrupt() usage in drivers needs to be phased out, provide netif_rx_any_context() as a stop gap for these drivers. This confines the in_interrupt() conditional to core code which in turn allows to remove the access to this check for driver code and provides one central place to do further modifications once the driver maze is cleaned up. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28bpf: sockmap: Enable map_update_elem from bpf_iterLorenz Bauer
Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Since we may be passed a request or timewait socket we also need to check sk_fullsock. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com
2020-09-28net: core: add nested_level variable in net_deviceTaehee Yoo
This patch is to add a new variable 'nested_level' into the net_device structure. This variable will be used as a parameter of spin_lock_nested() of dev->addr_list_lock. netif_addr_lock() can be called recursively so spin_lock_nested() is used instead of spin_lock() and dev->lower_level is used as a parameter of spin_lock_nested(). But, dev->lower_level value can be updated while it is being used. So, lockdep would warn a possible deadlock scenario. When a stacked interface is deleted, netif_{uc | mc}_sync() is called recursively. So, spin_lock_nested() is called recursively too. At this moment, the dev->lower_level variable is used as a parameter of it. dev->lower_level value is updated when interfaces are being unlinked/linked immediately. Thus, After unlinking, dev->lower_level shouldn't be a parameter of spin_lock_nested(). A (macvlan) | B (vlan) | C (bridge) | D (macvlan) | E (vlan) | F (bridge) A->lower_level : 6 B->lower_level : 5 C->lower_level : 4 D->lower_level : 3 E->lower_level : 2 F->lower_level : 1 When an interface 'A' is removed, it releases resources. At this moment, netif_addr_lock() would be called. Then, netdev_upper_dev_unlink() is called recursively. Then dev->lower_level is updated. There is no problem. But, when the bridge module is removed, 'C' and 'F' interfaces are removed at once. If 'F' is removed first, a lower_level value is like below. A->lower_level : 5 B->lower_level : 4 C->lower_level : 3 D->lower_level : 2 E->lower_level : 1 F->lower_level : 1 Then, 'C' is removed. at this moment, netif_addr_lock() is called recursively. The ordering is like this. C(3)->D(2)->E(1)->F(1) At this moment, the lower_level value of 'E' and 'F' are the same. So, lockdep warns a possible deadlock scenario. In order to avoid this problem, a new variable 'nested_level' is added. This value is the same as dev->lower_level - 1. But this value is updated in rtnl_unlock(). So, this variable can be used as a parameter of spin_lock_nested() safely in the rtnl context. Test commands: ip link add br0 type bridge vlan_filtering 1 ip link add vlan1 link br0 type vlan id 10 ip link add macvlan2 link vlan1 type macvlan ip link add br3 type bridge vlan_filtering 1 ip link set macvlan2 master br3 ip link add vlan4 link br3 type vlan id 10 ip link add macvlan5 link vlan4 type macvlan ip link add br6 type bridge vlan_filtering 1 ip link set macvlan5 master br6 ip link add vlan7 link br6 type vlan id 10 ip link add macvlan8 link vlan7 type macvlan ip link set br0 up ip link set vlan1 up ip link set macvlan2 up ip link set br3 up ip link set vlan4 up ip link set macvlan5 up ip link set br6 up ip link set vlan7 up ip link set macvlan8 up modprobe -rv bridge Splat looks like: [ 36.057436][ T744] WARNING: possible recursive locking detected [ 36.058848][ T744] 5.9.0-rc6+ #728 Not tainted [ 36.059959][ T744] -------------------------------------------- [ 36.061391][ T744] ip/744 is trying to acquire lock: [ 36.062590][ T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30 [ 36.064922][ T744] [ 36.064922][ T744] but task is already holding lock: [ 36.066626][ T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.068851][ T744] [ 36.068851][ T744] other info that might help us debug this: [ 36.070731][ T744] Possible unsafe locking scenario: [ 36.070731][ T744] [ 36.072497][ T744] CPU0 [ 36.073238][ T744] ---- [ 36.074007][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.075290][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.076590][ T744] [ 36.076590][ T744] *** DEADLOCK *** [ 36.076590][ T744] [ 36.078515][ T744] May be due to missing lock nesting notation [ 36.078515][ T744] [ 36.080491][ T744] 3 locks held by ip/744: [ 36.081471][ T744] #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490 [ 36.083614][ T744] #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.085942][ T744] #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80 [ 36.088400][ T744] [ 36.088400][ T744] stack backtrace: [ 36.089772][ T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728 [ 36.091364][ T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 36.093630][ T744] Call Trace: [ 36.094416][ T744] dump_stack+0x77/0x9b [ 36.095385][ T744] __lock_acquire+0xbc3/0x1f40 [ 36.096522][ T744] lock_acquire+0xb4/0x3b0 [ 36.097540][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.098657][ T744] ? rtmsg_ifinfo+0x1f/0x30 [ 36.099711][ T744] ? __dev_notify_flags+0xa5/0xf0 [ 36.100874][ T744] ? rtnl_is_locked+0x11/0x20 [ 36.101967][ T744] ? __dev_set_promiscuity+0x7b/0x1a0 [ 36.103230][ T744] _raw_spin_lock_bh+0x38/0x70 [ 36.104348][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.105461][ T744] dev_set_rx_mode+0x19/0x30 [ 36.106532][ T744] dev_set_promiscuity+0x36/0x50 [ 36.107692][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.108929][ T744] dev_set_promiscuity+0x1e/0x50 [ 36.110093][ T744] br_port_set_promisc+0x1f/0x40 [bridge] [ 36.111415][ T744] br_manage_promisc+0x8b/0xe0 [bridge] [ 36.112728][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.113967][ T744] ? __hw_addr_sync_one+0x23/0x50 [ 36.115135][ T744] __dev_set_rx_mode+0x68/0x90 [ 36.116249][ T744] dev_uc_sync+0x70/0x80 [ 36.117244][ T744] dev_uc_add+0x50/0x60 [ 36.118223][ T744] macvlan_open+0x18e/0x1f0 [macvlan] [ 36.119470][ T744] __dev_open+0xd6/0x170 [ 36.120470][ T744] __dev_change_flags+0x181/0x1d0 [ 36.121644][ T744] dev_change_flags+0x23/0x60 [ 36.122741][ T744] do_setlink+0x30a/0x11e0 [ 36.123778][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.124929][ T744] ? __nla_validate_parse.part.6+0x45/0x8e0 [ 36.126309][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.127457][ T744] __rtnl_newlink+0x546/0x8e0 [ 36.128560][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.129623][ T744] ? deactivate_slab.isra.85+0x6a1/0x850 [ 36.130946][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.132102][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.133176][ T744] ? is_bpf_text_address+0x5/0xe0 [ 36.134364][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.135445][ T744] ? rcu_read_lock_sched_held+0x32/0x60 [ 36.136771][ T744] ? kmem_cache_alloc_trace+0x2d8/0x380 [ 36.138070][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.139164][ T744] rtnl_newlink+0x47/0x70 [ ... ] Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: core: introduce struct netdev_nested_priv for nested interface ↵Taehee Yoo
infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: core: add __netdev_upper_dev_unlink()Taehee Yoo
The netdev_upper_dev_unlink() has to work differently according to flags. This idea is the same with __netdev_upper_dev_link(). In the following patches, new flags will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26net: flow_dissector: avoid indirect call to DSA .flow_dissect for generic caseVladimir Oltean
With the recent mitigations against speculative execution exploits, indirect function calls are more expensive and it would be good to avoid them where possible. In the case of DSA, most switch taggers will shift the EtherType and next headers by a fixed amount equal to that tag's length in bytes. So we can use a generic procedure to determine that, without calling into custom tagger code. However we still leave the flow_dissect method inside struct dsa_device_ops as an override for the generic function. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26net: dsa: make the .flow_dissect tagger callback return voidVladimir Oltean
There is no tagger that returns anything other than zero, so just change the return type appropriately. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25devlink: introduce flash update overwrite maskJacob Keller
Sections of device flash may contain settings or device identifying information. When performing a flash update, it is generally expected that these settings and identifiers are not overwritten. However, it may sometimes be useful to allow overwriting these fields when performing a flash update. Some examples include, 1) customizing the initial device config on first programming, such as overwriting default device identifying information, or 2) reverting a device configuration to known good state provided in the new firmware image, or 3) in case it is suspected that current firmware logic for managing the preservation of fields during an update is broken. Although some devices are able to completely separate these types of settings and fields into separate components, this is not true for all hardware. To support controlling this behavior, a new DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK is defined. This is an nla_bitfield32 which will define what subset of fields in a component should be overwritten during an update. If no bits are specified, or of the overwrite mask is not provided, then an update should not overwrite anything, and should maintain the settings and identifiers as they are in the previous image. If the overwrite mask has the DEVLINK_FLASH_OVERWRITE_SETTINGS bit set, then the device should be configured to overwrite any of the settings in the requested component with settings found in the provided image. Similarly, if the DEVLINK_FLASH_OVERWRITE_IDENTIFIERS bit is set, the device should be configured to overwrite any device identifiers in the requested component with the identifiers from the image. Multiple overwrite modes may be combined to indicate that a combination of the set of fields that should be overwritten. Drivers which support the new overwrite mask must set the DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK in the supported_flash_update_params field of their devlink_ops. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25devlink: convert flash_update to use params structureJacob Keller
The devlink core recently gained support for checking whether the driver supports a flash_update parameter, via `supported_flash_update_params`. However, parameters are specified as function arguments. Adding a new parameter still requires modifying the signature of the .flash_update callback in all drivers. Convert the .flash_update function to take a new `struct devlink_flash_update_params` instead. By using this structure, and the `supported_flash_update_params` bit field, a new parameter to flash_update can be added without requiring modification to existing drivers. As before, all parameters except file_name will require driver opt-in. Because file_name is a necessary field to for the flash_update to make sense, no "SUPPORTED" bitflag is provided and it is always considered valid. All future additional parameters will require a new bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25devlink: check flash_update parameter support in net coreJacob Keller
When implementing .flash_update, drivers which do not support per-component update are manually checking the component parameter to verify that it is NULL. Without this check, the driver might accept an update request with a component specified even though it will not honor such a request. Instead of having each driver check this, move the logic into net/core/devlink.c, and use a new `supported_flash_update_params` field in the devlink_ops. Drivers which will support per-component update must now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in the supported_flash_update_params in their devlink_ops. This helps ensure that drivers do not forget to check for a NULL component if they do not support per-component update. This also enables a slightly better error message by enabling the core stack to set the netlink bad attribute message to indicate precisely the unsupported attribute in the message. Going forward, any new additional parameter to flash update will require a bit in the supported_flash_update_params bitfield. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Bin Luo <luobin9@huawei.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Danielle Ratson <danieller@mellanox.com> Cc: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMONMartin KaFai Lau
This patch changes the bpf_sk_assign() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL". Meaning it specifically takes a literal NULL. ARG_PTR_TO_BTF_ID_SOCK_COMMON does not allow a literal NULL, so another ARG type is required for this purpose and another follow-up patch can be used if there is such need. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com