path: root/net/netfilter
Age | Commit message | Author
2017-07-05 | netfilter: synproxy: fix conntrackd interaction | Eric Leblond
commit 87e94dbc210a720a34be5c1174faee5c84be963e upstream. This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 | netfilter: xt_TCPMSS: add more sanity tests on tcph->doff | Eric Dumazet
commit 2638fd0f92d4397884fd991d8f4925cb3f081901 upstream. Denys provided an awesome KASAN report pointing to a use after free in xt_TCPMSS. I have provided three patches to fix this issue, either in xt_TCPMSS or in xt_tcpudp.c. It seems the xt_TCPMSS patch has the smallest possible impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-08 | netfilter; Add some missing default cases to switch statements in nft_reject. | David S. Miller
commit 129d23a56623eea0947a05288158d76dc7f2f0ac upstream.
This fixes:
====================
net/netfilter/nft_reject.c: In function ‘nft_reject_dump’:
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
  switch (priv->type) {
  ^
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswitch]
net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’:
net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
  switch (priv->type) {
  ^
====================
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-08 | netfilter: Fix switch statement warnings with recent gcc. | David Miller
commit c1f866767777d1c6abae0ec57effffcb72017c00 upstream. More recent GCC warns about two kinds of switch statement uses: 1) Switching on an enumeration, but not having an explicit case statement for all members of the enumeration. To show the compiler this is intentional, we simply add a default case with nothing more than a break statement. 2) Switching on a boolean value. I think this warning is dumb but nevertheless you get it wholesale with -Wswitch. This patch cures all such warnings in netfilter. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
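The first kind of warning described above can be illustrated with a small standalone sketch. The enum and function names here are hypothetical and only loosely modelled on the reject types named in the warnings; this is not the kernel code. An empty default case tells the compiler that leaving the other members unhandled is intentional.

```c
/* Hypothetical enum; the names are illustrative, not the kernel's. */
enum reject_type {
	REJECT_ICMP_UNREACH,
	REJECT_TCP_RST,
	REJECT_ICMPX_UNREACH,
};

/* Returns 1 if this (made-up) handler treats the type specially, else 0. */
int reject_is_icmp_unreach(enum reject_type type)
{
	switch (type) {
	case REJECT_ICMP_UNREACH:
		return 1;
	default:
		/* Deliberately empty: the other members need no handling here. */
		break;
	}
	return 0;
}
```

Without the default case, a compiler run with -Wswitch would report the two unhandled enumeration values, much like the warnings quoted in the nft_reject commit.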
2016-08-03 | netfilter: x_tables: speed up jump target validation | Florian Westphal
[ Upstream commit f4dc77713f8016d2e8a3295e1c9c53a21f296def ]
The dummy ruleset I used to test the original validation change was broken: most rules were unreachable and were not tested by mark_source_chains(). In some cases rulesets that used to load in a few seconds now require several minutes.
Sample ruleset that shows the behaviour:
    echo "*filter"
    for i in $(seq 0 100000);do
        printf ":chain_%06x - [0:0]\n" $i
    done
    for i in $(seq 0 100000);do
        printf -- "-A INPUT -j chain_%06x\n" $i
        printf -- "-A INPUT -j chain_%06x\n" $i
        printf -- "-A INPUT -j chain_%06x\n" $i
    done
    echo COMMIT
[ pipe result into iptables-restore ]
This ruleset will be about 74 MB in size, with ~500k searches through all 500k[1] rule entries. iptables-restore will take forever (gave up after 10 minutes).
Instead of always searching the entire blob for a match, fill an array with the start offsets of every single ipt_entry struct, then do a binary search to check if the jump target is present or not.
After this change ruleset restore times get again close to what one gets when reverting 36472341017529e (~3 seconds on my workstation).
[1] every user-defined rule gets an implicit RETURN, so we get 300k jumps + 100k userchains + 100k returns -> 500k rule entries
Fixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps") Reported-by: Jeff Wu <wujiafu@gmail.com> Tested-by: Jeff Wu <wujiafu@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
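The lookup described in this commit (an array of entry start offsets plus a binary search) can be sketched in userspace C as follows. The function name and signature are assumptions made for illustration, not the kernel's helper. The sketch relies on the offsets being sorted, which should hold when they are recorded while walking the blob from front to back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if `target` is one of the recorded entry start offsets.
 * `offsets` must be sorted in ascending order.
 */
bool jump_target_is_valid(const unsigned int *offsets, size_t count,
			  unsigned int target)
{
	size_t lo = 0, hi = count;

	assert(offsets != NULL || count == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (offsets[mid] == target)
			return true;
		if (offsets[mid] < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}
```

Each jump is then checked in O(log n) steps instead of a linear scan over the whole blob, which is where the restore time described above is recovered.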
2016-07-12 | netfilter: x_tables: introduce and use xt_copy_counters_from_user | Florian Westphal
[ Upstream commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce ] The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: do compat validation via translate_table | Florian Westphal
[ Upstream commit 09d9686047dbbe1cf4faa558d3ecc4aae2046054 ]
This looks like refactoring, but it's also a bug fix. The problem is that the compat path (32bit iptables, 64bit kernel) lacks a few sanity tests that are done in the normal path. For example, we do not check for underflows and the base chain policies. While it's possible to also add such checks to the compat path, it's more copy&pastry; for instance, we cannot reuse the check_underflow() helper as e->target_offset differs in the compat case. The other problem is that it makes auditing for validation errors harder; two places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
   validate match/entry offsets, bounds checking
   lookup all matches and targets
   do bookkeeping wrt. size delta of 32/64bit structures
   assign match/target.u.kernel pointer (points at kernel implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to contain the translated ruleset
3- second pass over original blob: for each entry, copy the 32bit representation to the newly allocated memory. This also does any special match translations (e.g. adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5- first pass over translated blob: call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the compat process, the translation is changed into an intermediate step rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit iptables userspace -- we can then use translate_table() to get the 'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
The upside is that we get all sanity tests and ruleset validations provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
    -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
    -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
    old:   0m30.796s
    new:   0m31.521s
    64bit: 0m25.674s
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: xt_compat_match_from_user doesn't need a retval | Florian Westphal
[ Upstream commit 0188346f21e6546498c2a0f84888797ad4063fc5 ] Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: don't reject valid target size on some architectures | Florian Westphal
[ Upstream commit 7b7eba0f3515fca3296b8881d583f7c1042f5226 ]
Quoting John Stultz:
  In updating a 32bit arm device from 4.6 to Linus' current HEAD, I noticed I was having some trouble with networking, and realized that /proc/net/ip_tables_names was suddenly empty. Digging through the registration process, it seems we're catching on the:
      if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
          target_offset + sizeof(struct xt_standard_target) != next_offset)
              return -EINVAL;
  Where next_offset seems to be 4 bytes larger than the offset + standard_target struct size.
next_offset needs to be aligned via XT_ALIGN (so we can access all members of ip(6)t_entry struct).
This problem didn't show up on i686 as it only needs 4-byte alignment for u64, but iptables userspace on other 32bit arches does insert extra padding.
Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
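The effect of aligning before comparing can be shown with a small standalone sketch. The helper names and the explicit `align` parameter are assumptions for illustration; in the kernel the rounding is done by XT_ALIGN, whose alignment depends on the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round `len` up to a multiple of `align`; `align` must be a power of two. */
static size_t align_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/*
 * Accept a standard target only if the rule ends at the aligned end of the
 * target, rather than at its unaligned end.
 */
bool standard_target_size_ok(size_t target_offset, size_t target_size,
			     size_t next_offset, size_t align)
{
	return align_up(target_offset + target_size, align) == next_offset;
}
```

With made-up numbers, a target at offset 112 with size 36 ends at 148. Under 4-byte alignment the rule must end at 148, while under 8-byte alignment it must end at 152, which matches the 4 extra bytes reported in the quote.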
2016-07-12 | netfilter: x_tables: validate all offsets and sizes in a rule | Florian Westphal
[ Upstream commit 13631bfc604161a9d69cd68991dff8603edd66f9 ] Validate that all matches (if any) add up to the beginning of the target and that each match covers at least the base structure size. The compat path should be able to safely re-use the function as the structures only differ in alignment; added a BUILD_BUG_ON just in case we have an arch that adds padding as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: check for bogus target offset | Florian Westphal
[ Upstream commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c ] We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
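The offset checks discussed in this series (the target starts after the fixed entry header, the target header fits, and the claimed target size stays within the rule) can be sketched as below. The struct and function are hypothetical simplifications written for this illustration; they do not reproduce the kernel's xt_check_entry_offsets().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one rule; not the kernel's layout. */
struct rule_view {
	size_t base_size;     /* size of the fixed entry header */
	size_t target_offset; /* where the target starts, from the rule start */
	size_t target_size;   /* size the target claims for itself */
	size_t next_offset;   /* where the next rule starts */
};

bool rule_offsets_ok(const struct rule_view *r, size_t target_hdr_size)
{
	assert(r != NULL);

	/* The target may not begin inside the fixed entry header. */
	if (r->target_offset < r->base_size)
		return false;
	/* The target header itself must fit before the next rule. */
	if (r->target_offset > r->next_offset ||
	    r->next_offset - r->target_offset < target_hdr_size)
		return false;
	/* The claimed size must cover at least the target header ... */
	if (r->target_size < target_hdr_size)
		return false;
	/* ... and must not run past the end of the rule. */
	if (r->target_size > r->next_offset - r->target_offset)
		return false;
	return true;
}
```

The subtractions are written so that they cannot wrap around, which matters when the offsets come from an untrusted blob.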
2016-07-12 | netfilter: x_tables: check standard target size too | Florian Westphal
[ Upstream commit 7ed2abddd20cf8f6bd27f65bd218f26fa5bf7f44 ] We have targets and standard targets -- the latter carries a verdict. The ip/ip6tables validation functions will access t->verdict for the standard targets to fetch the jump offset or verdict for chainloop detection, but this happens before the targets get checked/validated. Thus we also need to check for verdict presence here, else t->verdict can point right after a blob. Spotted with UBSAN while testing malformed blobs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: add compat version of xt_check_entry_offsets | Florian Westphal
[ Upstream commit fc1221b3a163d1386d1052184202d5dc50d302d1 ] 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: assert minimum target size | Florian Westphal
[ Upstream commit a08e4e190b866579896c09af59b3bdca821da2cd ] The target size includes the size of the xt_entry_target struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | netfilter: x_tables: add and use xt_check_entry_offsets | Florian Westphal
[ Upstream commit 7d35812c3214afa5b37a675113555259cfd67b98 ] Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-07-12 | ipvs: correct initial offset of Call-ID header search in SIP persistence engine | Marco Angaroni
[ Upstream commit 7617a24f83b5d67f4dab1844956be1cebc44aec8 ] The IPVS SIP persistence engine is not able to parse the SIP header "Call-ID" when such header is inserted in the first positions of the SIP message. When IPVS is configured with "--pe sip" option, like for example: ipvsadm -A -u 1.2.3.4:5060 -s rr --pe sip -p 120 -o some particular messages (see below for details) do not create entries in the connection template table, which can be listed with: ipvsadm -Lcn --persistent-conn Problematic SIP messages are SIP responses having "Call-ID" header positioned just after message first line: SIP/2.0 200 OK [Call-ID header here] [rest of the headers] When "Call-ID" header is positioned down (after a few other headers) it is correctly recognized. This is due to the data offset used in get_callid function call inside ip_vs_pe_sip.c file: since dptr already points to the start of the SIP message, the value of dataoff should be initially 0. Otherwise the header is searched starting from some bytes after the first character of the SIP message. Fixes: 758ff0338722 ("IPVS: sip persistence engine") Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
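Why a non-zero starting offset hides a header near the top of the message can be shown with a simplified line-based search. This is a sketch under stated assumptions: find_header() is a made-up helper, it matches header names case-sensitively, and it is not the parser used by ip_vs_pe_sip.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first line, starting the scan at `dataoff`, that
 * begins with "<name>:", or -1 if there is none. `dataoff` is treated as the
 * start of a line, so a scan that starts mid-line cannot match that line.
 */
long find_header(const char *msg, size_t len, size_t dataoff, const char *name)
{
	size_t nlen = strlen(name);
	size_t pos = dataoff;

	assert(msg != NULL);

	while (pos < len) {
		const char *nl;

		/* Does this line start with "<name>:"? */
		if (len - pos > nlen &&
		    memcmp(msg + pos, name, nlen) == 0 && msg[pos + nlen] == ':')
			return (long)pos;

		/* Skip to the start of the next line. */
		nl = memchr(msg + pos, '\n', len - pos);
		if (nl == NULL)
			break;
		pos = (size_t)(nl - msg) + 1;
	}
	return -1;
}
```

For a response whose second line is the Call-ID header, a scan from offset 0 finds the header, while a scan that starts a few bytes into the message can land inside that line and skip it, which is the failure mode the commit describes.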
2016-05-17 | nf_conntrack: avoid kernel pointer value leak in slab name | Linus Torvalds
[ Upstream commit 31b0b385f69d8d5491a4bca288e25e63f1d945d0 ] The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
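The idea of deriving a unique name from a counter rather than from an address can be sketched in userspace as follows. The function, the name prefix and the plain static counter are assumptions for illustration; kernel code would need an atomic counter and its own string helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sequence number; not thread-safe in this sketch. */
static unsigned long long cache_seq;

/*
 * Write a unique cache name into `buf`. Formatting a pointer into the name
 * would expose an address to anyone who can list the name; a counter does not.
 */
int make_cache_name(char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "nf_conntrack_%llu", cache_seq++);
}
```

Successive calls yield distinct names, which is all the name needs to provide.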
2015-10-27 | ipvs: fix crash with sync protocol v0 and FTP | Julian Anastasov
[ Upstream commit 56184858d1fc95c46723436b455cb7261cd8be6f ] Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: 749c42b620a9 ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | ipvs: skb_orphan in case of forwarding | Alex Gartrell
[ Upstream commit 71563f3414e917c62acd8e0fb0edf8ed6af63e4b ] It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell <agartrell@fb.com> Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | ipvs: fix crash if scheduler is changed | Julian Anastasov
[ Upstream commit 05f00505a89acd21f5d0d20f5797dfbc4cf85243 ] I overlooked the svc->sched_data usage from schedulers when the services were converted to RCU in 3.10. Now the rare ipvsadm -E command can change the scheduler but due to the reverse order of ip_vs_bind_scheduler and ip_vs_unbind_scheduler we provide new sched_data to the old scheduler resulting in a crash. To fix it without changing the scheduler methods we have to use synchronize_rcu() only for the editing case. It means all svc->scheduler readers should expect a NULL value. To avoid breakage for the service listing and ipvsadm -R we can use the "none" name to indicate that scheduler is not assigned, a state when we drop new connections. Reported-by: Alexander Vasiliev <a.vasylev@404-group.com> Fixes: ceec4c381681 ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | ipvs: do not use random local source address for tunnels | Julian Anastasov
[ Upstream commit 4754957f04f5f368792a0eb7dab0ae89fb93dcfd ] Michael Vallaly reports a wrong source address being used in rare cases for tunneled traffic. It looks like __ip_vs_get_out_rt in 3.10+ is providing an uninitialized dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses kmalloc. While we retry after seeing EINVAL from routing for data that does not look like a valid local address, it still succeeded when this memory was previously used by other dests with different local addresses. As a result, we can use a valid local address that is not suitable for our real server. Fix it by providing 0.0.0.0 every time our cache is refreshed. This way we will get the preferred source address from routing. Reported-by: Michael Vallaly <lvs@nolatency.com> Fixes: 026ace060dfe ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nf_log: don't zap all loggers on unregister | Florian Westphal
[ Upstream commit 205ee117d4dc4a11ac3bd9638bb9b2e839f4de9a ]
Like nf_log_unset, nf_log_unregister must not reset the list of loggers. Otherwise, a call to nf_log_unregister() will render loggers of other nf protocols unusable:
    iptables -A INPUT -j LOG
    modprobe nf_log_arp ; rmmod nf_log_arp
    iptables -A INPUT -j LOG
    iptables: No chain/target/match by that name
Fixes: 30e0c6a6be ("netfilter: nf_log: prepare net namespace support for loggers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nf_log: Introduce nft_log_dereference() macro | Marcelo Leitner
[ Upstream commit 0c26ed1c07f13ca27e2638ffdd1951013ed96c48 ] Wrap up a common call pattern in an easier to handle call. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nft_compat: skip family comparison in case of NFPROTO_UNSPEC | Pablo Neira Ayuso
[ Upstream commit ba378ca9c04a5fc1b2cf0f0274a9d02eb3d1bad9 ] Fix lookup of existing match/target structures in the corresponding list by skipping the family check if NFPROTO_UNSPEC is used. This is resulting in the allocation and insertion of one match/target structure for each use of them. So this not only bloats memory consumption but also severely affects the time to reload the ruleset from the iptables-compat utility. After this patch, iptables-compat-restore and iptables-compat take almost the same time to reload large rulesets. Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nf_log: wait for rcu grace after logger unregistration | Pablo Neira Ayuso
[ Upstream commit ad5001cc7cdf9aaee5eb213fdee657e4a3c94776 ] The nf_log_unregister() function needs to call synchronize_rcu() to make sure that the objects are not dereferenced anymore on module removal. Fixes: 5962815a6a56 ("netfilter: nf_log: use an array of loggers instead of list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: ctnetlink: put back references to master ct and expect objects | Pablo Neira Ayuso
[ Upstream commit 95dd8653de658143770cb0e55a58d2aab97c79d2 ] We have to put back the references to the master conntrack and the expectation that we just created, otherwise we'll leak them. Fixes: 0ef71ee1a5b9 ("netfilter: ctnetlink: refactor ctnetlink_create_expect") Reported-by: Tim Wiess <Tim.Wiess@watchguard.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nf_conntrack: Support expectations in different zones | Joe Stringer
[ Upstream commit 4b31814d20cbe5cd4ccf18089751e77a04afe4f2 ] When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked. Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27 | netfilter: nfnetlink: work around wrong endianess in res_id field | Pablo Neira Ayuso
[ Upstream commit a9de9777d613500b089a7416f936bf3ae5f070d2 ] The convention in nfnetlink is to use network byte order in every header field as well as in the attribute payload. The initial version of the batching infrastructure assumes that res_id comes in host byte order though. The only client of the batching infrastructure is nf_tables, so let's add a workaround to address this inconsistency. We currently have 11 nfnetlink subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem 2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias of nf_tables from the nfnetlink batching path when interpreting the res_id field. Based on original patch from Florian Westphal. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
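The aliasing trick rests on simple arithmetic: swapping the two bytes of 10 (0x000A) gives 0x0A00, which is 2560. A minimal sketch of accepting both encodings follows; the macro and function names are hypothetical, and the value 10 is taken from the htons(10) mentioned in the commit text rather than from a kernel header.

```c
#include <stdint.h>

/* Subsystem number as implied by "htons(10)" in the text (assumption). */
#define SUBSYS_NFTABLES 10u

/* Swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Accept the subsystem id in either byte order. */
int res_id_is_nftables(uint16_t res_id)
{
	return res_id == SUBSYS_NFTABLES ||
	       res_id == swap16(SUBSYS_NFTABLES);
}
```

The alias is only safe while no real subsystem is assigned the swapped value, which is the argument the commit makes about subsystem 2560.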
2015-07-13 | netfilter: nf_tables: allow to change chain policy without hook if it exists | Pablo Neira Ayuso
[ Upstream commit d6b6cb1d3e6f78d55c2d4043d77d0d8def3f3b99 ] If there's an existing base chain, we have to allow changing the default policy without indicating the hook information. However, if the chain doesn't exist, we have to enforce the presence of the hook attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-13 | netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set | Pablo Neira Ayuso
[ Upstream commit 749177ccc74f9c6d0f51bd78a15c652a2134aa11 ] ip6tables extensions check for this flag to restrict match/target to a given protocol. Without this flag set, SYNPROXY6 returns an error. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-13 | netfilter: Zero the tuple in nfnl_cthelper_parse_tuple() | Ian Wilson
[ Upstream commit 78146572b9cd20452da47951812f35b1ad4906be ]
nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(), nfnl_cthelper_get() and nfnl_cthelper_del(). In each case they pass a pointer to an nf_conntrack_tuple data structure local variable:
    struct nf_conntrack_tuple tuple;
    ...
    ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
The problem is that this local variable is not initialized, and nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and dst.protonum. This leaves all other fields with undefined values based on whatever is on the stack:
    tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM]));
    tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);
The symptom observed was that when the rpc and tns helpers were added then traffic to port 1536 was being sent to user-space.
Signed-off-by: Ian Wilson <iwilson@brocade.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
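The remedy named in the subject line, zeroing the tuple before filling in the two known fields, can be sketched with a cut-down structure. The struct layout and parse_tuple() are hypothetical stand-ins; the real nf_conntrack_tuple has many more fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down tuple; the real structure is much larger. */
struct tuple {
	struct { uint16_t l3num; uint16_t port; } src;
	struct { uint8_t protonum; uint16_t port; } dst;
};

void parse_tuple(struct tuple *t, uint16_t l3num, uint8_t protonum)
{
	assert(t != NULL);

	/* Clear everything first so unset fields are zero, not stack garbage. */
	memset(t, 0, sizeof(*t));
	t->src.l3num = l3num;
	t->dst.protonum = protonum;
}
```

Because the callee clears the structure, every caller that passes an uninitialized local variable is covered by the one change.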
2015-07-12 | netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings | Chen Gang
[ Upstream commit b18c5d15e8714336365d9d51782d5b53afa0443c ]
The related code can be simplified, and also can avoid related warnings (with allmodconfig under parisc):
  CC [M] net/netfilter/nfnetlink_cthelper.o
  net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
  net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 of ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
    memcpy(&help->data, nla_data(attr), help->helper->data_len);
    ^
  In file included from include/linux/string.h:17:0,
                   from include/uapi/linux/uuid.h:25,
                   from include/linux/uuid.h:23,
                   from include/linux/mod_devicetable.h:12,
                   from ./arch/parisc/include/asm/hardware.h:4,
                   from ./arch/parisc/include/asm/processor.h:15,
                   from ./arch/parisc/include/asm/spinlock.h:6,
                   from ./arch/parisc/include/asm/atomic.h:21,
                   from include/linux/atomic.h:4,
                   from ./arch/parisc/include/asm/bitops.h:12,
                   from include/linux/bitops.h:36,
                   from include/linux/kernel.h:10,
                   from include/linux/list.h:8,
                   from include/linux/module.h:9,
                   from net/netfilter/nfnetlink_cthelper.c:11:
  ./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
   void * memcpy(void * dest,const void *src,size_t count);
          ^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05 | netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook | Eric W. Biederman
[ Upstream commit 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 ] Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? 
__memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28 | netfilter: x_tables: fix cgroup matching on non-full sks | Daniel Borkmann
[ Upstream commit afb7718016fcb0370ac29a83b2839c78b76c2960 ] While originally only being intended for outgoing traffic, commit a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook as well, in order to allow for nfacct accounting. Besides being currently limited to early demuxes only, commit a00e76349f35 forgot to add a check if we deal with full sockets, i.e. in this case not with time wait sockets. TCP time wait sockets do not have the same memory layout as full sockets, a lower memory footprint and consequently also don't have a sk_classid member; probing for sk_classid member there could potentially lead to a crash. Fixes: a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks") Cc: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-01-29 | ipvs: uninitialized data with IP_VS_IPV6 | Dan Carpenter
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream. The app_tcp_pkt_out() function expects "*diff" to be set and ends up using uninitialized data if CONFIG_IP_VS_IPV6 is turned on. The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov for noticing that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 | netfilter: conntrack: fix race between confirmation and flush | Pablo Neira Ayuso
commit 8ca3f5e974f2b4b7f711589f4abff920db36637a upstream. Commit 5195c14c8b27c ("netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse") aimed to resolve the race condition between the confirmation (packet path) and the flush command (from control plane). However, it introduced a crash when several packets race to add a new conntrack, which seems easier to reproduce when nf_queue is in place. Fix this race, in __nf_conntrack_confirm(), by removing the CT from the unconfirmed list before checking the DYING bit. In case a race occurred, re-add the CT to the dying list. This patch also changes the verdict from NF_ACCEPT to NF_DROP when we lose the race. Basically, the confirmation happens for the first packet that we see in a flow. If you just invoked conntrack -F once (which should be the common case), then this is likely to be the first packet of the flow (unless you already called flush anytime soon in the past). This should be hard to trigger, but better drop this packet, otherwise we leave things in an inconsistent state since the destination will likely reply to this packet, but it will find no conntrack, unless the origin retransmits. The change of the verdict has been discussed in: https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29netfilter: nfnetlink: relax strict multicast group check from netlink_bindPablo Neira Ayuso
commit 62924af247e95de7041a6d6f2d06cdd05152e2dc upstream. Relax the checking that was introduced in 97840cb ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") when the subscription bitmask is used. Existing userspace code may request to listen to all of the existing netlink groups by setting an all-ones subscription group bitmask. Netlink already validates subscription via setsockopt() for us. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29netfilter: nf_tables: fix flush ruleset chain dependenciesPablo Neira Ayuso
commit a2f18db0c68fec96631c10cad9384c196e9008ac upstream. Jumping between chains doesn't mix well with flush ruleset. Rules from a different chain and set elements may still refer to us. [ 353.373791] ------------[ cut here ]------------ [ 353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159! [ 353.373896] invalid opcode: 0000 [#1] SMP [ 353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi [ 353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98 [ 353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010 [...] [ 353.375018] Call Trace: [ 353.375046] [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540 [ 353.375101] [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0 [ 353.375150] [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0 [ 353.375200] [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790 [ 353.375253] [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0 [ 353.375300] [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70 [ 353.375357] [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30 [ 353.375410] [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0 [ 353.375459] [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400 [ 353.375510] [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90 [ 353.375563] [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20 [ 353.375616] [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0 [ 353.375667] [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80 [ 353.375719] [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d [ 353.375776] [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20 [ 353.375823] [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b Release objects in this order: rules -> sets -> chains -> tables, to make sure no references to chains are held anymore. Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
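The release order at the end of this message can be illustrated with a reference-counted toy chain. This is a sketch under stated assumptions: the `toy_*` names and the single `use` counter are hypothetical, and the failing free stands in for the BUG seen in the trace above.

```c
/* Hypothetical chain with a use counter: each jump from a rule and each
 * set element that names the chain holds one reference. */
struct toy_chain {
    int use;
    int freed;
};

static void toy_put_ref(struct toy_chain *c)
{
    c->use--;
}

/* Refuse to free a chain that is still referenced. */
static int toy_chain_free(struct toy_chain *c)
{
    if (c->use > 0)
        return -1;
    c->freed = 1;
    return 0;
}

/* Flush in dependency order: rules, then sets, then chains.
 * Tables would follow once their chains are gone. */
static int toy_flush(struct toy_chain *c, int rule_jumps, int set_elem_refs)
{
    int i;

    for (i = 0; i < rule_jumps; i++)        /* 1. rules */
        toy_put_ref(c);
    for (i = 0; i < set_elem_refs; i++)     /* 2. sets */
        toy_put_ref(c);
    return toy_chain_free(c);               /* 3. chains */
}
```

Freeing the chain before its referrers fails in this model, while the ordered flush succeeds because every reference is dropped first.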
2015-01-29netfilter: nfnetlink: validate nfnetlink header from batchPablo Neira Ayuso
commit 9ea2aa8b7dba9e99544c4187cc298face254569f upstream. Make sure there is enough room for the nfnetlink header in the netlink messages that are part of the batch. There is a similar check in netlink_rcv_skb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
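A length check of this kind might look like the following user-space sketch. The structs are hypothetical stand-ins with the same sizes as a netlink header (16 bytes) and an nfnetlink sub-header (4 bytes); `toy_msg_ok` is not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outer message header, 16 bytes. */
struct toy_nlhdr {
    uint32_t len;       /* total message length, header included */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

/* Hypothetical subsystem header that must follow the outer header. */
struct toy_subhdr {
    uint8_t  family;
    uint8_t  version;
    uint16_t res_id;
};

/* Return 1 if one message in a batch is large enough to be parsed. */
static int toy_msg_ok(const struct toy_nlhdr *h, size_t remaining)
{
    if (remaining < sizeof(*h))
        return 0;       /* not even an outer header left in the buffer */
    if (h->len < sizeof(*h))
        return 0;       /* claimed length is shorter than the header */
    if (h->len > remaining)
        return 0;       /* claimed length runs past the buffer */
    if (h->len - sizeof(*h) < sizeof(struct toy_subhdr))
        return 0;       /* no room for the subsystem header */
    return 1;
}
```

The checks run from the outermost bound inward, so no field is read before the bytes that hold it are known to exist.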
2014-11-25Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"Pablo Neira
This reverts commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187. If the conntrack clashes with an existing one, it is left out of the unconfirmed list, thus, crashing when dropping the packet and releasing the conntrack since golden rule is that conntracks are always placed in any of the existing lists for traceability reasons. Reported-by: Daniel Borkmann <dborkman@redhat.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-17netfilter: nfnetlink: fix insufficient validation in nfnetlink_bindPablo Neira Ayuso
Make sure the netlink group exists, otherwise you can trigger an out-of-bounds array memory access from the netlink_bind() path. This splat can be triggered only by the superuser. [ 180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28 [ 180.204249] index 9 is out of range for type 'int [9]' [ 180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122 [ 180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org +04/01/2014 [ 180.206498] 0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8 [ 180.207220] ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8 [ 180.207887] ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000 [ 180.208639] Call Trace: [ 180.208857] dump_stack (lib/dump_stack.c:52) [ 180.209370] ubsan_epilogue (lib/ubsan.c:174) [ 180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400) [ 180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467) [ 180.210986] netlink_bind (net/netlink/af_netlink.c:1483) [ 180.211495] SYSC_bind (net/socket.c:1541) Moreover, define the missing nf_tables and nf_acct multicast groups too. Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
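The splat reports index 9 used on an array of 9 ints. A bounds check of the kind described can be sketched as follows; the table contents and all `TOY_*` / `toy_*` names are hypothetical, chosen only so the array has the same size as in the report.

```c
#include <errno.h>

enum {
    TOY_GRP_NONE = 0,   /* group 0 means "no group" and is never valid */
    TOY_GRP_MAX  = 8    /* highest valid group, so the table has 9 slots */
};

/* Hypothetical group-to-subsystem table; the values are arbitrary. */
static const int toy_group2type[TOY_GRP_MAX + 1] = {
    0, 10, 20, 30, 40, 50, 60, 70, 80
};

/* Validate the group before using it as an array index. */
static int toy_bind(int group)
{
    if (group <= TOY_GRP_NONE || group > TOY_GRP_MAX)
        return -EINVAL;
    return toy_group2type[group];
}
```

With the check in place, a request for group 9 is rejected before the table is touched, and negative values are caught by the same comparison.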
2014-11-14netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpsebill bonaparte
After removal of the central spinlock nf_conntrack_lock, in commit 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock"), it is possible to race against get_next_corpse(). The race is against the get_next_corpse() cleanup on the "unconfirmed" list (a per-cpu list with separate locking), which sets the DYING bit. Fix this race, in __nf_conntrack_confirm(), by removing the CT from the unconfirmed list before checking the DYING bit. If the race occurred, re-add the CT to the dying list. While at this, fix coding style of the comment that has been updated. Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: bill bonaparte <programme110@gmail.com> Signed-off-by: bill bonaparte <programme110@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12netfilter: nf_tables: restore synchronous object release from commit/abortPablo Neira Ayuso
The existing xtables matches and targets, when used from nft_compat, may sleep from the destroy path, i.e. when removing rules. Since the objects are released via call_rcu from softirq context, this results in lockdep splats and possible lockups that may be hard to reproduce. Patrick also indicated that delayed object release via call_rcu can cause us problems in the ordering of event notifications when anonymous sets are in place. So, this patch restores the synchronous object release from the commit and abort paths. This includes a call to synchronize_rcu() to make sure that no packets are walking on the objects that are going to be released. This is slower, but it's simple and it resolves the aforementioned problems. This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all object release via rcu") that was introduced in 3.16 to speed up interaction with userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12netfilter: nft_compat: use the match->table to validate dependenciesPablo Neira Ayuso
Instead of the match->name, which is of course not relevant. Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12netfilter: nft_compat: relax chain type validationPablo Neira Ayuso
Check for nat chain dependency only, which is the one that can actually crash the kernel. Don't care if mangle, filter and security specific match and targets are used out of their scope, they are harmless. This restores iptables-compat with mangle specific match/target when used out of the OUTPUT chain, that are actually emulated through filter chains, which broke when performing strict validation. Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12netfilter: nft_compat: use current net namespacePablo Neira Ayuso
Instead of init_net when using xtables over nftables compat. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12ipvs: Keep skb->sk when allocating headroom on tunnel xmitCalvin Owens
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new skb, either unconditionally setting ->sk to NULL or allowing the uninitialized ->sk from a newly allocated skb to leak through to the caller. This patch properly copies ->sk and increments its reference count. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
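The ownership rule in this message (carry `->sk` over to the new buffer and take a reference on it) can be modelled in user space. Everything here is a hypothetical toy: `toy_sock`, `toy_skb` and the plain integer reference count are illustrative only and do not reflect the kernel helpers the patch actually uses.

```c
#include <stdlib.h>

/* Hypothetical socket with a plain reference count. */
struct toy_sock {
    int refcnt;
};

/* Hypothetical packet buffer that may be owned by a socket. */
struct toy_skb {
    struct toy_sock *sk;
    int headroom;
};

/* Allocate a replacement buffer with more headroom, keeping the owner. */
static struct toy_skb *toy_realloc_headroom(const struct toy_skb *old,
                                            int headroom)
{
    struct toy_skb *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->headroom = headroom;

    /* Copy the owning socket and take a reference for the new buffer. */
    n->sk = old->sk;
    if (n->sk)
        n->sk->refcnt++;
    return n;
}

/* Release a heap-allocated buffer and drop its socket reference. */
static void toy_free_skb(struct toy_skb *skb)
{
    if (skb->sk)
        skb->sk->refcnt--;
    free(skb);
}
```

Pairing the copy with the increment keeps the count balanced: each buffer that points at the socket holds exactly one reference and drops it when freed.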
2014-11-11netfilter: ipset: small potential read beyond the end of bufferDan Carpenter
We could be reading 8 bytes into a 4 byte buffer here. It seems harmless but adding a check is the right thing to do and it silences a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
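The kind of check this message describes can be sketched as a length test before a fixed-size read. The request layout and the names `toy_req` and `toy_parse` are assumptions made for illustration; the sketch shows the general pattern, not the ipset code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte request: an opcode followed by a version field. */
struct toy_req {
    uint32_t op;
    uint32_t version;
};

/* Copy a request out of a caller-supplied buffer of len bytes. */
static int toy_parse(const void *buf, size_t len, struct toy_req *out)
{
    /* Refuse short buffers so the 8-byte copy never overruns them. */
    if (len < sizeof(*out))
        return -EINVAL;

    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

A caller that supplies only the 4-byte opcode gets an error back instead of having four bytes past its buffer read.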
2014-10-28ipvs: Avoid null-pointer deref in debug codeAlex Gartrell
Use daddr instead of reaching into dest. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-27netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()Arturo Borrero
The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>