diff options
author | David S. Miller <davem@davemloft.net> | 2019-08-17 12:40:09 -0700 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2019-08-17 12:40:09 -0700 |
commit | 83beee5a3aff0fb159b2fb4d0cac8f18a193417e (patch) | |
tree | ce77ccefee1384488408d9b9e49e2148359f30d9 /include/net/devlink.h | |
parent | f77508308fa76d0efc60ebf3c906f467feb062cb (diff) | |
parent | 95766451bfb82f972bf3fea93fc6e91a904cf624 (diff) |
Merge branch 'drop_monitor-for-offloaded-paths'
Ido Schimmel says:
====================
Add drop monitor for offloaded data paths
Users have several ways to debug the kernel and understand why a packet
was dropped. For example, using drop monitor and perf. Both utilities
trace kfree_skb(), which is the function called when a packet is freed
as part of a failure. The information provided by these tools is
invaluable when trying to understand the cause of a packet loss.
In recent years, large portions of the kernel data path were offloaded
to capable devices. Today, it is possible to perform L2 and L3
forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
Different TC classifiers and actions are also offloaded to capable
devices, at both ingress and egress.
However, when the data path is offloaded it is not possible to achieve
the same level of introspection since packets are dropped by the
underlying device and never reach the kernel.
This patchset aims to solve this by allowing users to monitor packets
that the underlying device decided to drop along with relevant metadata
such as the drop reason and ingress port.
The above is achieved by exposing a fundamental capability of devices
capable of data path offloading - packet trapping. In much the same way
as drop monitor registers its probe function with the kfree_skb()
tracepoint, the device is instructed to pass to the CPU (trap) packets
that it decided to drop in various places in the pipeline.
The configuration of the device to pass such packets to the CPU is
performed using devlink, as it is not specific to a port, but rather to
a device. In the future, we plan to control the policing of such packets
using devlink, in order not to overwhelm the CPU.
While devlink is used as the control path, the dropped packets are
passed along with metadata to drop monitor, which reports them to
userspace as netlink events. This allows users to use the same interface
for the monitoring of both software and hardware drops.
Logically, the solution looks as follows:
Netlink event: Packet w/ metadata
Or a summary of recent drops
^
|
Userspace |
+---------------------------------------------------+
Kernel |
|
+-------+--------+
| |
| drop_monitor |
| |
+-------^--------+
|
|
|
+----+----+
| | Kernel's Rx path
| devlink | (non-drop traps)
| |
+----^----+ ^
| |
+-----------+
|
+-------+-------+
| |
| Device driver |
| |
+-------^-------+
Kernel |
+---------------------------------------------------+
Hardware |
| Trapped packet
|
+--+---+
| |
| ASIC |
| |
+------+
In order to reduce the patch count, this patchset only includes
integration with netdevsim. A follow-up patchset will add devlink-trap
support in mlxsw.
Patches #1-#7 extend drop monitor to also monitor hardware originated
drops.
Patches #8-#10 add the devlink-trap infrastructure.
Patches #11-#12 add devlink-trap support in netdevsim.
Patches #13-#16 add tests for the generic infrastructure over netdevsim.
Example
=======
Instantiate netdevsim
---------------------
List supported traps
--------------------
netdevsim/netdevsim10:
name source_mac_is_multicast type drop generic true action drop group l2_drops
name vlan_tag_mismatch type drop generic true action drop group l2_drops
name ingress_vlan_filter type drop generic true action drop group l2_drops
name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
name port_list_is_empty type drop generic true action drop group l2_drops
name port_loopback_filter type drop generic true action drop group l2_drops
name fid_miss type exception generic false action trap group l2_drops
name blackhole_route type drop generic true action drop group l3_drops
name ttl_value_is_too_small type exception generic true action trap group l3_drops
name tail_drop type drop generic true action drop group buffer_drops
Enable a trap
-------------
Query statistics
----------------
netdevsim/netdevsim10:
name blackhole_route type drop generic true action trap group l3_drops
stats:
rx:
bytes 7384 packets 52
Monitor dropped packets
-----------------------
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> set sw true
setting software drops monitoring to 1
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ttl_value_is_too_small (l3_drops)
origin: hardware
input port ifindex: 55
input port name: eth0
timestamp: Mon Aug 12 10:52:20 2019 445911505 nsec
protocol: 0x800
length: 142
original length: 142
drop at: ip6_mc_input+0x8b8/0xef8 (0xffffffff9e2bb0e8)
origin: software
input port ifindex: 4
timestamp: Mon Aug 12 10:53:37 2019 024444587 nsec
protocol: 0x86dd
length: 110
original length: 110
Future plans
============
* Provide more drop reasons as well as more metadata
* Add dropmon support to libpcap, so that tcpdump/tshark could
specifically listen on dropmon traffic, instead of capturing all
netlink packets via nlmon interface
Changes in v3:
* Place test with the rest of the netdevsim tests
* Fix test to load netdevsim module
* Move devlink helpers from the test to devlink_lib.sh. Will be used
by mlxsw tests
* Re-order netdevsim includes in alphabetical order
* Fix reverse xmas tree in netdevsim
* Remove double include in netdevsim
Changes in v2:
* Use drop monitor to report dropped packets instead of devlink
* Add drop monitor patches
* Add test cases
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/net/devlink.h')
-rw-r--r-- | include/net/devlink.h | 175 |
1 files changed, 175 insertions, 0 deletions
diff --git a/include/net/devlink.h b/include/net/devlink.h index 451268f64880..7f43c48f54cd 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -14,6 +14,7 @@ #include <linux/netdevice.h> #include <linux/spinlock.h> #include <linux/workqueue.h> +#include <linux/refcount.h> #include <net/net_namespace.h> #include <uapi/linux/devlink.h> @@ -31,6 +32,8 @@ struct devlink { struct list_head reporter_list; struct mutex reporters_lock; /* protects reporter_list */ struct devlink_dpipe_headers *dpipe_headers; + struct list_head trap_list; + struct list_head trap_group_list; const struct devlink_ops *ops; struct device *dev; possible_net_t _net; @@ -497,6 +500,135 @@ struct devlink_health_reporter_ops { struct devlink_fmsg *fmsg); }; +/** + * struct devlink_trap_group - Immutable packet trap group attributes. + * @name: Trap group name. + * @id: Trap group identifier. + * @generic: Whether the trap group is generic or not. + * + * Describes immutable attributes of packet trap groups that drivers register + * with devlink. + */ +struct devlink_trap_group { + const char *name; + u16 id; + bool generic; +}; + +#define DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT BIT(0) + +/** + * struct devlink_trap - Immutable packet trap attributes. + * @type: Trap type. + * @init_action: Initial trap action. + * @generic: Whether the trap is generic or not. + * @id: Trap identifier. + * @name: Trap name. + * @group: Immutable packet trap group attributes. + * @metadata_cap: Metadata types that can be provided by the trap. + * + * Describes immutable attributes of packet traps that drivers register with + * devlink. + */ +struct devlink_trap { + enum devlink_trap_type type; + enum devlink_trap_action init_action; + bool generic; + u16 id; + const char *name; + struct devlink_trap_group group; + u32 metadata_cap; +}; + +/* All traps must be documented in + * Documentation/networking/devlink-trap.rst + */ +enum devlink_trap_generic_id { + DEVLINK_TRAP_GENERIC_ID_SMAC_MC, + DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH, + DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER, + DEVLINK_TRAP_GENERIC_ID_INGRESS_STP_FILTER, + DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST, + DEVLINK_TRAP_GENERIC_ID_PORT_LOOPBACK_FILTER, + DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_ROUTE, + DEVLINK_TRAP_GENERIC_ID_TTL_ERROR, + DEVLINK_TRAP_GENERIC_ID_TAIL_DROP, + + /* Add new generic trap IDs above */ + __DEVLINK_TRAP_GENERIC_ID_MAX, + DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1, +}; + +/* All trap groups must be documented in + * Documentation/networking/devlink-trap.rst + */ +enum devlink_trap_group_generic_id { + DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS, + DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS, + DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS, + + /* Add new generic trap group IDs above */ + __DEVLINK_TRAP_GROUP_GENERIC_ID_MAX, + DEVLINK_TRAP_GROUP_GENERIC_ID_MAX = + __DEVLINK_TRAP_GROUP_GENERIC_ID_MAX - 1, +}; + +#define DEVLINK_TRAP_GENERIC_NAME_SMAC_MC \ + "source_mac_is_multicast" +#define DEVLINK_TRAP_GENERIC_NAME_VLAN_TAG_MISMATCH \ + "vlan_tag_mismatch" +#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_VLAN_FILTER \ + "ingress_vlan_filter" +#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_STP_FILTER \ + "ingress_spanning_tree_filter" +#define DEVLINK_TRAP_GENERIC_NAME_EMPTY_TX_LIST \ + "port_list_is_empty" +#define DEVLINK_TRAP_GENERIC_NAME_PORT_LOOPBACK_FILTER \ + "port_loopback_filter" +#define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_ROUTE \ + "blackhole_route" +#define DEVLINK_TRAP_GENERIC_NAME_TTL_ERROR \ + "ttl_value_is_too_small" +#define DEVLINK_TRAP_GENERIC_NAME_TAIL_DROP \ + "tail_drop" + +#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \ + "l2_drops" +#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_DROPS \ + "l3_drops" +#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \ + "buffer_drops" + +#define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \ + { \ + .type = DEVLINK_TRAP_TYPE_##_type, \ + .init_action = DEVLINK_TRAP_ACTION_##_init_action, \ + .generic = true, \ + .id = DEVLINK_TRAP_GENERIC_ID_##_id, \ + .name = DEVLINK_TRAP_GENERIC_NAME_##_id, \ + .group = _group, \ + .metadata_cap = _metadata_cap, \ + } + +#define DEVLINK_TRAP_DRIVER(_type, _init_action, _id, _name, _group, \ + _metadata_cap) \ + { \ + .type = DEVLINK_TRAP_TYPE_##_type, \ + .init_action = DEVLINK_TRAP_ACTION_##_init_action, \ + .generic = false, \ + .id = _id, \ + .name = _name, \ + .group = _group, \ + .metadata_cap = _metadata_cap, \ + } + +#define DEVLINK_TRAP_GROUP_GENERIC(_id) \ + { \ + .name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id, \ + .id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id, \ + .generic = true, \ + } + struct devlink_ops { int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack); int (*port_type_set)(struct devlink_port *devlink_port, @@ -558,6 +690,38 @@ struct devlink_ops { int (*flash_update)(struct devlink *devlink, const char *file_name, const char *component, struct netlink_ext_ack *extack); + /** + * @trap_init: Trap initialization function. + * + * Should be used by device drivers to initialize the trap in the + * underlying device. Drivers should also store the provided trap + * context, so that they could efficiently pass it to + * devlink_trap_report() when the trap is triggered. + */ + int (*trap_init)(struct devlink *devlink, + const struct devlink_trap *trap, void *trap_ctx); + /** + * @trap_fini: Trap de-initialization function. + * + * Should be used by device drivers to de-initialize the trap in the + * underlying device. + */ + void (*trap_fini)(struct devlink *devlink, + const struct devlink_trap *trap, void *trap_ctx); + /** + * @trap_action_set: Trap action set function. + */ + int (*trap_action_set)(struct devlink *devlink, + const struct devlink_trap *trap, + enum devlink_trap_action action); + /** + * @trap_group_init: Trap group initialization function. + * + * Should be used by device drivers to initialize the trap group in the + * underlying device. + */ + int (*trap_group_init)(struct devlink *devlink, + const struct devlink_trap_group *group); }; static inline void *devlink_priv(struct devlink *devlink) @@ -774,6 +938,17 @@ void devlink_flash_update_status_notify(struct devlink *devlink, unsigned long done, unsigned long total); +int devlink_traps_register(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count, void *priv); +void devlink_traps_unregister(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count); +void devlink_trap_report(struct devlink *devlink, + struct sk_buff *skb, void *trap_ctx, + struct devlink_port *in_devlink_port); +void *devlink_trap_ctx_priv(void *trap_ctx); + #if IS_ENABLED(CONFIG_NET_DEVLINK) void devlink_compat_running_version(struct net_device *dev, |