summaryrefslogtreecommitdiff
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2017-07-11net: Fix minor code bug in timestamping.txtAhmad Fatoum
Passing (void*)val instead of &val would make a pointer out of an integer and cause sock_setsockopt to -EFAULT. See tools/testing/selftests/networking/timestamping/timestamping.c for a working example. Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Ahmad Fatoum <ahmad@a3f.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-08doc: SKB_GSO_[IPIP|SIT] have been replacedNicolas Dichtel
Those enum values don't exist anymore. Fixes: 7e13318daa4a ("net: define gso types for IPx over IPv4 and IPv6") CC: Tom Herbert <tom@herbertland.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...
2017-07-03Merge tag 'docs-4.13' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "There has been a fair amount of activity in the docs tree this time around. Highlights include: - Conversion of a bunch of security documentation into RST - The conversion of the remaining DocBook templates by The Amazing Mauro Machine. We can now drop the entire DocBook build chain. - The usual collection of fixes and minor updates" * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits) scripts/kernel-doc: handle DECLARE_HASHTABLE Documentation: atomic_ops.txt is core-api/atomic_ops.rst Docs: clean up some DocBook loose ends Make the main documentation title less Geocities Docs: Use kernel-figure in vidioc-g-selection.rst Docs: fix table problems in ras.rst Docs: Fix breakage with Sphinx 1.5 and upper Docs: Include the Latex "ifthen" package doc/kokr/howto: Only send regression fixes after -rc1 docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters doc: Document suitability of IBM Verse for kernel development Doc: fix a markup error in coding-style.rst docs: driver-api: i2c: remove some outdated information Documentation: DMA API: fix a typo in a function name Docs: Insert missing space to separate link from text doc/ko_KR/memory-barriers: Update control-dependencies example Documentation, kbuild: fix typo "minimun" -> "minimum" docs: Fix some formatting issues in request-key.rst doc: ReSTify keys-trusted-encrypted.txt doc: ReSTify keys-request-key.txt ...
2017-07-03Documentation: fix wrong example commandMatteo Croce
In the IPVLAN documentation there is an example command line where the master and slave interface names are inverted. Fix the command line and also add the optional `name' keyword to better describe what the command is doing. v2: added commit message Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-27net: remove policy-routing.txt documentationVincent Bernat
It dates back from 2.1.16 and is obsolete since 2.1.68 when the current rule system has been introduced. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15tls: DocumentationDave Watson
Add documentation for the tcp ULP tls interface. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: fix up hash documentationMichael S. Tsirkin
commit 61b905da33 ("net: Rename skb->rxhash to skb->hash") didn't update the documentation, fix this up. Cc: Tom Herbert <therbert@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07rxrpc: Provide a cmsg to specify the amount of Tx data for a callDavid Howells
Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call. Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold. By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet). An error will be returned if too little or too much data is presented in the Tx phase. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-07rxrpc: Provide a getsockopt call to query what cmsgs types are supportedDavid Howells
Provide a getsockopt() call that can query what cmsg types are supported by AF_RXRPC.
2017-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06net: phy: Delete unused function phy_ethtool_gsetyuval.shaia@oracle.com
It's unused, so remove it. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05net: Update TCP congestion control documentationAnmol Sarma
Update tcp.txt to fix mandatory congestion control ops and default CCA selection. Also, fix comment in tcp.h for undo_cwnd. Signed-off-by: Anmol Sarma <me@anmolsarma.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05rxrpc: Add service upgrade support for client connectionsDavid Howells
Make it possible for a client to use AuriStor's service upgrade facility. The client does this by adding an RXRPC_UPGRADE_SERVICE control message to the first sendmsg() of a call. This takes no parameters. When recvmsg() starts returning data from the call, the service ID field in the returned msg_name will reflect the result of the upgrade attempt. If the upgrade was ignored, srx_service will match what was set in the sendmsg(); if the upgrade happened the srx_service will be altered to indicate the service the server upgraded to. Note that: (1) The choice of upgrade service is up to the server (2) Further client calls to the same server that would share a connection are blocked if an upgrade probe is in progress. (3) This should only be used to probe the service. Clients should then use the returned service ID in all subsequent communications with that server (and not set the upgrade). Note that the kernel will not retain this information should the connection expire from its cache. (4) If a server that supports upgrading is replaced by one that doesn't, whilst a connection is live, and if the replacement is running, say, OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement server will not respond to packets sent to the upgraded connection. At this point, calls will time out and the server must be reprobed. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05rxrpc: Implement service upgradeDavid Howells
Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with: (1) Various of the standard AFS RPC calls have IPv4 addresses in their requests and/or replies - but there's no room for including IPv6 addresses. (2) Definition of IPv6-specific RPC operations in the standard operation sets has not yet been achieved. (3) One could envision the creation a new service on the same port that as the original service. The new service could implement improved operations - and the client could try this first, falling back to the original service if it's not there. Unfortunately, certain servers ignore packets addressed to a service they don't implement and don't respond in any way - not even with an ABORT. This means that the client must then wait for the call timeout to occur. What service upgrade does is to see if the connection is marked as being 'upgradeable' and if so, change the service ID in the server and thus the request and reply formats. Note that the upgrade isn't mandatory - a server that supports only the original call set will ignore the upgrade request. In the protocol, the procedure is then as follows: (1) To request an upgrade, the first DATA packet in a new connection must have the userStatus set to 1 (this is normally 0). The userStatus value is normally ignored by the server. (2) If the server doesn't support upgrading, the reply packets will contain the same service ID as for the first request packet. (3) If the server does support upgrading, all future reply packets on that connection will contain the new service ID and the new service ID will be applied to *all* further calls on that connection as well. (4) The RPC op used to probe the upgrade must take the same request data as the shadow call in the upgrade set (but may return a different reply). GetCapability RPC ops were added to all standard sets for just this purpose. Ops where the request formats differ cannot be used for probing. (5) The client must wait for completion of the probe before sending any further RPC ops to the same destination. It should then use the service ID that recvmsg() reported back in all future calls. (6) The shadow service must have call definitions for all the operation IDs defined by the original service. To support service upgrading, a server should: (1) Call bind() twice on its AF_RXRPC socket before calling listen(). Each bind() should supply a different service ID, but the transport addresses must be the same. This allows the server to receive requests with either service ID. (2) Enable automatic upgrading by calling setsockopt(), specifying RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of unsigned shorts as the argument: unsigned short optval[2]; This specifies a pair of service IDs. They must be different and must match the service IDs bound to the socket. Member 0 is the service ID to upgrade from and member 1 is the service ID to upgrade to. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05rxrpc: Permit multiple service bindingDavid Howells
Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictions: (1) All bind() calls involved must have a non-zero service ID. (2) The service IDs must all be different. (3) The rest of the address (notably the transport part) must be the same in all (a single UDP socket is shared). (4) This must be done before listen() or sendmsg() is called. This allows someone to connect to the service socket with different service IDs and lays the foundation for service upgrading. The service ID used by an incoming call can be extracted from the msg_name returned by recvmsg(). Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-01i40evf: update i40evf.txt with new contentJesse Brandeburg
The addition of the AVF and virtchnl code to the i40evf driver means we should update the i40evf.txt file with the most up to date information. It seems this file hasn't been updated in a while, so the changes cover a little more than just AVF, but it's all only in the i40evf.txt. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-05-30Documentation: networking: add DPAA Ethernet documentMadalin Bucur
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: allow simultaneous SW and HW transmit timestampingMiroslav Lichvar
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to be looped to the socket's error queue with a software timestamp even when a hardware transmit timestamp is expected to be provided by the driver. Applications using this option will receive two separate messages from the error queue, one with a software timestamp and the other with a hardware timestamp. As the hardware timestamp is saved to the shared skb info, which may happen before the first message with software timestamp is received by the application, the hardware timestamp is copied to the SCM_TIMESTAMPING control message only when the skb has no software timestamp or it is an incoming packet. While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as there are no other users. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: fix documentation of struct scm_timestampingMiroslav Lichvar
The scm_timestamping struct may return multiple non-zero fields, e.g. when both software and hardware RX timestamping is enabled, or when the SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false software timestamp is generated in the recvmsg() call in order to always return a SCM_TIMESTAMP(NS) message. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: add new control message for incoming HW-timestamped packetsMiroslav Lichvar
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message for incoming packets with hardware timestamps. It contains the index of the real interface which received the packet and the length of the packet at layer 2. The index is useful with bonding, bridges and other interfaces, where IP_PKTINFO doesn't allow applications to determine which PHC made the timestamp. With the L2 length (and link speed) it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol. While this information could be provided by two new socket options independently from timestamping, it doesn't look like they would be very useful. With this option any performance impact is limited to hardware timestamping. Use dev_get_by_napi_id() to get the device and its index. On kernels with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero index will be returned in the control message. CC: Richard Cochran <richardcochran@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-19net: more accurate checksumming in validate_xmit_skb()Davide Caratti
skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to determine if skb needs software computation of Internet Checksum or crc32c (or nothing, if this computation can be done by the hardware). Use it in place of skb_checksum_help() in validate_xmit_skb() to avoid corruption of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL. While at it, remove references to skb_csum_off_chk* functions, since they are not present anymore in Linux _ see commit cf53b1da73bd ("Revert "net: Add driver helper functions to determine checksum offloadability""). Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18Merge remote-tracking branch 'mauro-exp/docbook3' into death-to-docbookJonathan Corbet
Mauro says: This patch series convert the remaining DocBooks to ReST. The first version was originally send as 3 patch series: [PATCH 00/36] Convert DocBook documents to ReST [PATCH 0/5] Convert more books to ReST [PATCH 00/13] Get rid of DocBook The lsm book was added as if it were a text file under Documentation. The plan is to merge it with another file under Documentation/security, after both this series and a security Documentation patch series gets merged. It also adjusts some Sphinx-pedantic errors/warnings on some kernel-doc markups. I also added some patches here to add PDF output for all existing ReST books.
2017-05-18doc: ReSTify keys-request-key.txtKees Cook
Adjusts for ReST markup and moves under keys security devel index. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-05-16docs-rst: convert scsi DocBook to ReSTMauro Carvalho Chehab
Use pandoc to convert documentation to ReST by calling Documentation/sphinx/tmplcvt script. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16docs-rst: convert z8530book DocBook to ReSTMauro Carvalho Chehab
Use pandoc to convert documentation to ReST by calling Documentation/sphinx/tmplcvt script. Some manual adjustments were required due to some conversion error on some literals. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-16docs-rst: convert networking book to ReSTMauro Carvalho Chehab
Use pandoc to convert documentation to ReST by calling Documentation/sphinx/tmplcvt script. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-05-02Merge tag 'docs-4.12' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation update from Jonathan Corbet: "A reasonably busy cycle for documentation this time around. There is a new guide for user-space API documents, rather sparsely populated at the moment, but it's a start. Markus improved the infrastructure for converting diagrams. Mauro has converted much of the USB documentation over to RST. Plus the usual set of fixes, improvements, and tweaks. There's a bit more than the usual amount of reaching out of Documentation/ to fix comments elsewhere in the tree; I have acks for those where I could get them" * tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits) docs: Fix a couple typos docs: Fix a spelling error in vfio-mediated-device.txt docs: Fix a spelling error in ioctl-number.txt MAINTAINERS: update file entry for HSI subsystem Documentation: allow installing man pages to a user defined directory Doc/PM: Sync with intel_powerclamp code behavior zr364xx.rst: usb/devices is now at /sys/kernel/debug/ usb.rst: move documentation from proc_usb_info.txt to USB ReST book convert philips.txt to ReST and add to media docs docs-rst: usb: update old usbfs-related documentation arm: Documentation: update a path name docs: process/4.Coding.rst: Fix a couple of document refs docs-rst: fix usb cross-references usb: gadget.h: be consistent at kernel doc macros usb: composite.h: fix two warnings when building docs usb: get rid of some ReST doc build errors usb.rst: get rid of some Sphinx errors usb/URB.txt: convert to ReST and update it usb/persist.txt: convert to ReST and add to driver-api book usb/hotplug.txt: convert to ReST and add to driver-api book ...
2017-05-01switchdev: documentation: fix whitespace issuesLiam Beguin
Figure 1 is full of whitespaces; fix it Signed-off-by: Liam Beguin <lbeguin@tycoint.com> Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Disable active side TFO in certain scenariosWei Wang
Middlebox firewall issues can potentially cause server's data being blackholed after a successful 3WHS using TFO. Following are the related reports from Apple: https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf Slide 31 identifies an issue where the client ACK to the server's data sent during a TFO'd handshake is dropped. C ---> syn-data ---> S C <--- syn/ack ----- S C (accept & write) C <---- data ------- S C ----- ACK -> X S [retry and timeout] https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf Slide 5 shows a similar situation that the server's data gets dropped after 3WHS. C ---- syn-data ---> S C <--- syn/ack ----- S C ---- ack --------> S S (accept & write) C? X <- data ------ S [retry and timeout] This is the worst failure b/c the client can not detect such behavior to mitigate the situation (such as disabling TFO). Failing to proceed, the application (e.g., SSL library) may simply timeout and retry with TFO again, and the process repeats indefinitely. The proposed solution is to disable active TFO globally under the following circumstances: 1. client side TFO socket detects out of order FIN 2. client side TFO socket receives out of order RST We disable active side TFO globally for 1hr at first. Then if it happens again, we disable it for 2h, then 4h, 8h, ... And we reset the timeout to 1hr if a client side TFO sockets not opened on loopback has successfully received data segs from server. And we examine this condition during close(). The rational behind it is that when such firewall issue happens, application running on the client should eventually close the socket as it is not able to get the data it is expecting. Or application running on the server should close the socket as it is not able to receive any response from client. In both cases, out of order FIN or RST will get received on the client given that the firewall will not block them as no data are in those frames. And we want to disable active TFO globally as it helps if the middle box is very close to the client and most of the connections are likely to fail. Also, add a debug sysctl: tcp_fastopen_blackhole_detect_timeout_sec: the initial timeout to use when firewall blackhole issue happens. This can be set and read. When setting it to 0, it means to disable the active disable logic. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23bpf, doc: update list of architectures that do eBPF JITAlexei Starovoitov
update the list and remove 'in the future' statement, since all still alive 64-bit architectures now do eBPF JIT. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20docs-rst: usb: update old usbfs-related documentationMauro Carvalho Chehab
There's no usbfs anymore. The old features are now either exported to /dev/bus/usb or via debugfs. Update documentation accordingly, pointing to the new places where the character devices and usb/devices are now placed. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-03-29Documentation: Fix dead URLs to ftp.kernel.orgSeongJae Park
As ftp.kernel.org is closed [0], this commit fixes dead URLs in documents to use www.kernel.org instead. [0] https://www.kernel.org/shutting-down-ftp-services.html Signed-off-by: SeongJae Park <sj38.park@gmail.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-03-24Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-03-23 This series contains updates to i40e and i40e.txt documentation. Jake provides all the changes in the series which are centered around ntuple filter fixes and additional support. Fixed the current implementation of .set_rxnfc, where we were not reading the mask field for filter entries which was resulting in filters not behaving as expected and not working correctly. When cleaning up after disabling flow director support, ensure that the default input set is correctly reprogrammed. Since the hardware only supports a single input set for all flows of that type, the driver shall only allow the input set to change if there are no other configured filters for that flow type, so add support to detect when we can update the input set for each flow type. Align the driver to other drivers to partition the ring_cookie value into 8bits of VF index, along with 32bits of queue number instead of using the user-def field. Added support to parse the user-def field into a data structure format to allow future extensions of the user-def filed by keeping all the code that read/writes the field into a single location. Added support for flexible payloads passed via ethtool user-def field. We support a single flexible word (2byte) value per protocol type, and we handle the FLX_PIT register using a list of flexible entries so that each flow type may be configured separately. Enabled flow director filters for SCTPv4 packets using the ethtool ntuple interface to enable filters. Updated the documentation on the i40e driver to include the newly added support to ntuple filters. Reduced complexity of a if-continue-else-break section of code by taking advantage of using hlist_for_each_entry_continue() instead. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Add sysctl to toggle early demux for tcp and udpsubashab@codeaurora.org
Certain system process significant unconnected UDP workload. It would be preferrable to disable UDP early demux for those systems and enable it for TCP only. By disabling UDP demux, we see these slight gains on an ARM64 system- 782 -> 788Mbps unconnected single stream UDPv4 633 -> 654Mbps unconnected UDPv4 different sources The performance impact can change based on CPU architecure and cache sizes. There will not much difference seen if entire UDP hash table is in cache. Both sysctls are enabled by default to preserve existing behavior. v1->v2: Change function pointer instead of adding conditional as suggested by Stephen. v2->v3: Read once in callers to avoid issues due to compiler optimizations. Also update commit message with the tests. v3->v4: Store and use read once result instead of querying pointer again incorrectly. v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n} Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Tom Herbert <tom@herbertland.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23i40e: document drivers use of ntuple filtersJacob Keller
Add documentation describing the drivers use of ethtool ntuple filters, including the limitations that it has due to hardware, as well as how it reads and parses the user-def data block. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-22net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.Joel Scherpelz
This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21net: ipv4: add support for ECMP hash policy choiceNikolay Aleksandrov
This patch adds support for ECMP hash policy choice via a new sysctl called fib_multipath_hash_policy and also adds support for L4 hashes. The current values for fib_multipath_hash_policy are: 0 - layer 3 (default) 1 - layer 4 If there's an skb hash already set and it matches the chosen policy then it will be used instead of being calculated (currently only for L4). In L3 mode we always calculate the hash due to the ICMP error special case, the flow dissector's field consistentification should handle the address order thus we can remove the address reversals. If the skb is provided we always use it for the hash calculation, otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. A couple of new features for nf_tables, and unsorted cleanups and incremental updates for the Netfilter tree. More specifically, they are: 1) Allow to check for TCP option presence via nft_exthdr, patch from Phil Sutter. 2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana. 3) Use pr_cont() in ebt_log, from Joe Perches. 4) Remove some dead code in arp_tables reported via static analysis tool, from Colin Ian King. 5) Consolidate nf_tables expression validation, from Liping Zhang. 6) Consolidate set lookup via nft_set_lookup(). 7) Remove unnecessary rcu read lock side in bridge netfilter, from Florian Westphal. 8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo. 9) Pass nft_ctx struct to object initialization indirections, from Florian Westphal. 10) Add code to integrate conntrack helper into nf_tables, also from Florian. 11) Allow to check if interface index or name exists via NFTA_FIB_F_PRESENT, from Phil Sutter. 12) Simplify resolve_normal_ct(), from Florian. 13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang. 14) Use rwlock in nft_set_rbtree set, also from Liping Zhang. 15) One patch to remove a useless printk at netns init path in ipvs, and several patches to document IPVS knobs. 16) Use refcount_t for reference counter in the Netfilter/IPVS code, from Elena Reshetova. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16tcp: remove tcp_tw_recycleSoheil Hassas Yeganeh
The tcp_tw_recycle was already broken for connections behind NAT, since the per-destination timestamp is not monotonically increasing for multiple machines behind a single destination address. After the randomization of TCP timestamp offsets in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection), the tcp_tw_recycle is broken for all types of connections for the same reason: the timestamps received from a single machine is not monotonically increasing, anymore. Remove tcp_tw_recycle, since it is not functional. Also, remove the PAWSPassive SNMP counter since it is only used for tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req since the strict argument is only set when tcp_tw_recycle is enabled. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Lutz Vieweg <lvml@5t9.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16ipvs: Document sysctl pmtu_discHangbin Liu
Document sysctl pmtu_disc based on commit 3654e61137db ("ipvs: add pmtu_disc option to disable IP DF for TUN packets"). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16ipvs: Document sysctl sync_portsHangbin Liu
Document sysctl sync_ports based on commit f73181c8288f ("ipvs: add support for sync threads"). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16ipvs: Document sysctl sync_qlen_max and sync_sock_sizeHangbin Liu
Document sysctl sync_qlen_max and sync_sock_size based on commit 1c003b1580e2 ("ipvs: wakeup master thread"). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16ipvs: fix sync_threshold description and add sync_refresh_period, sync_retriesHangbin Liu
Fix sync_threshold description which should have two values. Also add sync_refresh_period and sync_retries based on commit 749c42b620a9 ("ipvs: reduce sync rate with time thresholds"). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/genet/bcmgenet.c net/core/sock.c Conflicts were overlapping changes in bcmgenet and the lockdep handling of sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13mpls: allow TTL propagation from IP packets to be configuredRobert Shearman
Allow TTL propagation from IP packets to MPLS packets to be configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which allows the TTL to be set in the resulting MPLS packet, with the value of 0 having the semantics of enabling propagation of the TTL from the IP header (i.e. non-zero values disable propagation). Also allow the configuration to be overridden globally by reusing the same sysctl to control whether the TTL is propagated from IP packets into the MPLS header. If the per-LWT attribute is set then it overrides the global configuration. If the TTL isn't propagated then a default TTL value is used which can be configured via a new sysctl, "net.mpls.default_ttl". This is kept separate from the configuration of whether IP TTL propagation is enabled as it can be used in the future when non-IP payloads are supported (i.e. where there is no payload TTL that can be propagated). Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13mpls: allow TTL propagation to IP packets to be configuredRobert Shearman
Provide the ability to control on a per-route basis whether the TTL value from an MPLS packet is propagated to an IPv4/IPv6 packet when the last label is popped as per the theoretical model in RFC 3443 through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to mean disable propagation and 1 to mean enable propagation. In order to provide the ability to change the behaviour for packets arriving with IPv4/IPv6 Explicit Null labels and to provide an easy way for a user to change the behaviour for all existing routes without having to reprogram them, a global knob is provided. This is done through the addition of a new per-namespace sysctl, "net.mpls.ip_ttl_propagate", which defaults to enabled. If the per-route attribute is set (either enabled or disabled) then it overrides the global configuration. Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12Make IP 'forwarding' doc more preciseNeil Jerram
It wasn't clear if the 'forwarding' setting needs to be enabled on the interface that packets are received from, or on the interface that packets are forwarded to, or both. In fact (according to my code reading) the setting is relevant on the interface that packets are received from, so this change updates the doc to say that. Signed-off-by: Neil Jerram <neil@tigera.io> Signed-off-by: David S. Miller <davem@davemloft.net>