summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-10-26Linux 3.2.52v3.2.52Ben Hutchings
2013-10-26can: flexcan: flexcan_chip_start: fix regression, mark one MB for TX and ↵Marc Kleine-Budde
abort pending TX commit d5a7b406c529e4595ce03dc8f6dcf7fa36f106fa upstream. In patch 0d1862e can: flexcan: fix flexcan_chip_start() on imx6 the loop in flexcan_chip_start() that iterates over all mailboxes after the soft reset of the CAN core was removed. This loop put all mailboxes (even the ones marked as reserved 1...7) into EMPTY/INACTIVE mode. On mailboxes 8...63, this aborts any pending TX messages. After a cold boot there is random garbage in the mailboxes, which leads to spontaneous transmit of CAN frames during first activation. Further if the interface was disabled with a pending message (usually due to an error condition on the CAN bus), this message is retransmitted after enabling the interface again. This patch fixes the regression by: 1) Limiting the maximum number of used mailboxes to 8, 0...7 are used by the RX FIFO, 8 is used by TX. 2) Marking the TX mailbox as EMPTY/INACTIVE, so that any pending TX of that mailbox is aborted. Cc: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> [bwh: Backported to 3.2: - Adjust context - Hardware local echo is still enabled] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26gianfar: Change default HW Tx queue scheduling modeClaudiu Manoil
commit b98b8babd6e3370fadb7c6eaacb00eb2f6344a6c upstream. This is primarily to address transmission timeout occurrences, when multiple H/W Tx queues are being used concurrently. Because in the priority scheduling mode the controller does not service the Tx queues equally (but in ascending index order), Tx timeouts are being triggered rightaway for a basic test with multiple simultaneous connections like: iperf -c <server_ip> -n 100M -P 8 resulting in kernel trace: NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue <X> timed out ------------[ cut here ]------------ WARNING: at net/sched/sch_generic.c:255 ... and controller reset during intense traffic, and possibly further complications. This patch changes the default H/W Tx scheduling setting (TXSCHED) for multi-queue devices, from priority scheduling mode to a weighted round robin mode with equal weights for all H/W Tx queues, and addresses the issue above. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26mm, show_mem: suppress page counts in non-blockable contextsDavid Rientjes
commit 4b59e6c4730978679b414a8da61514a2518da512 upstream. On large systems with a lot of memory, walking all RAM to determine page types may take a half second or even more. In non-blockable contexts, the page allocator will emit a page allocation failure warning unless __GFP_NOWARN is specified. In such contexts, irqs are typically disabled and such a lengthy delay may even result in NMI watchdog timeouts. To fix this, suppress the page walk in such contexts when printing the page allocation failure warning. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler()Lv Zheng
commit 06a8566bcf5cf7db9843a82cde7a33c7bf3947d9 upstream. This patch fixes the issues indicated by the test results that ipmi_msg_handler() is invoked in atomic context. BUG: scheduling while atomic: kipmi0/18933/0x10000100 Modules linked in: ipmi_si acpi_ipmi ... CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8 Call Trace: <IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b [<ffffffff814bfbab>] __schedule_bug+0x46/0x54 [<ffffffff814c73e0>] __schedule+0x83/0x59c [<ffffffff81058853>] __cond_resched+0x22/0x2d [<ffffffff814c794b>] _cond_resched+0x14/0x1d [<ffffffff814c6d82>] mutex_lock+0x11/0x32 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si] [<ffffffff812bf6e4>] deliver_response+0x55/0x5a [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si] ... Also Tony Camuso says: We were getting occasional "Scheduling while atomic" call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. ================================= [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (&ipmi_device->tx_msg_lock){+.?...}, at: [<ffffffff81337a27>] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ba11c>] __lock_acquire+0x63c/0x1570 [<ffffffff810bb0f4>] lock_acquire+0xa4/0x120 [<ffffffff815581cc>] __mutex_lock_common+0x4c/0x400 [<ffffffff815586ea>] mutex_lock_nested+0x4a/0x60 [<ffffffff8133789d>] acpi_ipmi_space_handler+0x11b/0x234 [<ffffffff81321c62>] acpi_ev_address_space_dispatch+0x170/0x1be The fix implemented by this change has been tested by Tony: Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Reported-and-tested-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdeviceIan Abbott
commit 677a31565692d596ef42ea589b53ba289abf4713 upstream. The `insn_bits` handler `ni_65xx_dio_insn_bits()` has a `for` loop that currently writes (optionally) and reads back up to 5 "ports" consisting of 8 channels each. It reads up to 32 1-bit channels but can only read and write a whole port at once - it needs to handle up to 5 ports as the first channel it reads might not be aligned on a port boundary. It breaks out of the loop early if the next port it handles is beyond the final port on the card. It also breaks out early on the 5th port in the loop if the first channel was aligned. Unfortunately, it doesn't check that the current port it is dealing with belongs to the comedi subdevice the `insn_bits` handler is acting on. That's a bug. Redo the `for` loop to terminate after the final port belonging to the subdevice, changing the loop variable in the process to simplify things a bit. The `for` loop could now try and handle more than 5 ports if the subdevice has more than 40 channels, but the test `if (bitshift >= 32)` ensures it will break out early after 4 or 5 ports (depending on whether the first channel is aligned on a port boundary). (`bitshift` will be between -7 and 7 inclusive on the first iteration, increasing by 8 for each subsequent operation.) Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Ian Abbott: This patch applies to kernels 2.6.34.y through to 3.5.y inclusive.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o
commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream. When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) -js: This is a fix for CVE-2013-2015. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26hwmon: (applesmc) Silence uninitialized warningsHenrik Rydberg
commit 0fc86eca1b338d06ec500b34ef7def79c32b602b upstream. Some error paths do not set a result, leading to the (false) assumption that the value may be used uninitialized. Set results for those paths as well. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26xhci: Fix race between ep halt and URB cancellationFlorian Wolter
commit 526867c3ca0caa2e3e846cb993b0f961c33c2abb upstream. The halted state of a endpoint cannot be cleared over CLEAR_HALT from a user process, because the stopped_td variable was overwritten in the handle_stopped_endpoint() function. So the xhci_endpoint_reset() function will refuse the reset and communication with device can not run over this endpoint. https://bugzilla.kernel.org/show_bug.cgi?id=60699 Signed-off-by: Florian Wolter <wolly84@web.de> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26cciss: fix info leak in cciss_ioctl32_passthru()Dan Carpenter
commit 58f09e00ae095e46ef9edfcf3a5fd9ccdfad065e upstream. The arg64 struct has a hole after ->buf_size which isn't cleared. Or if any of the calls to copy_from_user() fail then that would cause an information leak as well. This was assigned CVE-2013-2147. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26cpqarray: fix info leak in ida_locked_ioctl()Dan Carpenter
commit 627aad1c01da6f881e7f98d71fd928ca0c316b1a upstream. The pciinfo struct has a two byte hole after ->dev_fn so stack information could be leaked to the user. This was assigned CVE-2013-2147. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26iscsi: don't hang in endless loop if no targets presentSasha Levin
commit 46a7c17d26967922092f3a8291815ffb20f6cabe upstream. iscsi_if_send_reply() may return -ESRCH if there were no targets to send data to. Currently we're ignoring this value and looping in attempt to do it over and over, which will usually lead in a hung task like this one: [ 4920.817298] INFO: task trinity:9074 blocked for more than 120 seconds. [ 4920.818527] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4920.819982] trinity D 0000000000000000 5504 9074 2756 0x00000004 [ 4920.825374] ffff880003961a98 0000000000000086 ffff8800001aa000 ffff8800001aa000 [ 4920.826791] 00000000001d4340 ffff880003961fd8 ffff880003960000 00000000001d4340 [ 4920.828241] 00000000001d4340 00000000001d4340 ffff880003961fd8 00000000001d4340 [ 4920.833231] [ 4920.833519] Call Trace: [ 4920.834010] [<ffffffff826363fa>] schedule+0x3a/0x50 [ 4920.834953] [<ffffffff82634ac9>] __mutex_lock_common+0x209/0x5b0 [ 4920.836226] [<ffffffff81af805d>] ? iscsi_if_rx+0x2d/0x990 [ 4920.837281] [<ffffffff81053943>] ? sched_clock+0x13/0x20 [ 4920.838305] [<ffffffff81af805d>] ? iscsi_if_rx+0x2d/0x990 [ 4920.839336] [<ffffffff82634eb0>] mutex_lock_nested+0x40/0x50 [ 4920.840423] [<ffffffff81af805d>] iscsi_if_rx+0x2d/0x990 [ 4920.841434] [<ffffffff810dffed>] ? sub_preempt_count+0x9d/0xd0 [ 4920.842548] [<ffffffff82637bb0>] ? _raw_read_unlock+0x30/0x60 [ 4920.843666] [<ffffffff821f71de>] netlink_unicast+0x1ae/0x1f0 [ 4920.844751] [<ffffffff821f7997>] netlink_sendmsg+0x227/0x350 [ 4920.845850] [<ffffffff821857bd>] ? sock_update_netprioidx+0xdd/0x1b0 [ 4920.847060] [<ffffffff82185732>] ? sock_update_netprioidx+0x52/0x1b0 [ 4920.848276] [<ffffffff8217f226>] sock_aio_write+0x166/0x180 [ 4920.849348] [<ffffffff810dfe41>] ? get_parent_ip+0x11/0x50 [ 4920.850428] [<ffffffff811d0d9a>] do_sync_write+0xda/0x120 [ 4920.851465] [<ffffffff810dffed>] ? sub_preempt_count+0x9d/0xd0 [ 4920.852579] [<ffffffff810dfe41>] ? get_parent_ip+0x11/0x50 [ 4920.853608] [<ffffffff81791887>] ? security_file_permission+0x27/0xb0 [ 4920.854821] [<ffffffff811d0f4c>] vfs_write+0x16c/0x180 [ 4920.855781] [<ffffffff811d104f>] sys_write+0x4f/0xa0 [ 4920.856798] [<ffffffff82638e79>] system_call_fastpath+0x16/0x1b [ 4920.877487] 1 lock held by trinity/9074: [ 4920.878239] #0: (rx_queue_mutex){+.+...}, at: [<ffffffff81af805d>] iscsi_if_rx+0x2d/0x990 [ 4920.880005] Kernel panic - not syncing: hung_task: blocked tasks Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26Revert "sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()"Ben Hutchings
This reverts commit de77b7955c3985ca95f64af3cb10557eb17eacee, which was commit f6e80abeab928b7c47cc1fbf53df13b4398a2bec upstream. This fix was only appropriate for Linux 3.7 onward, and introduced a regression when applied to earlier versions. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26isofs: Refuse RW mount of the filesystem instead of making it ROJan Kara
commit 17b7f7cf58926844e1dd40f5eb5348d481deca6a upstream. Refuse RW mount of isofs filesystem. So far we just silently changed it to RO mount but when the media is writeable, block layer won't notice this change and thus will think device is used RW and will block eject button of the drive. That is unexpected by users because for non-writeable media eject button works just fine. Userspace mount(8) command handles this just fine and retries mounting with MS_RDONLY set so userspace shouldn't see any regression. Plus any tool mounting isofs is likely confronted with the case of read-only media where block layer already refuses to mount the filesystem without MS_RDONLY set so our behavior shouldn't be anything new for it. Reported-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26HID: usbhid: quirk for N-Trig DuoSense Touch ScreenVasily Titskiy
commit 9e0bf92c223dabe0789714f8f85f6e26f8f9cda4 upstream. The DuoSense touchscreen device causes a 10 second timeout. This fix removes the delay. Signed-off-by: Vasily Titskiy <qehgt0@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26HID: Fix Speedlink VAD Cezanne support for some devicesStefan Kriwanek
commit 06bb5219118fb098f4b0c7dcb484b28a52bf1c14 upstream. Some devices of the "Speedlink VAD Cezanne" model need more aggressive fixing than already done. I made sure through testing that this patch would not interfere with the proper working of a device that is bug-free. (The driver drops EV_REL events with abs(val) >= 256, which are not achievable even on the highest laser resolution hardware setting.) Signed-off-by: Stefan Kriwanek <mail@stefankriwanek.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26fanotify: dont merge permission eventsLino Sanfilippo
commit 03a1cec1f17ac1a6041996b3e40f96b5a2f90e1b upstream. Boyd Yang reported a problem for the case that multiple threads of the same thread group are waiting for a reponse for a permission event. In this case it is possible that some of the threads are never woken up, even if the response for the event has been received (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2). The reason is that we are currently merging permission events if they belong to the same thread group. But we are not prepared to wake up more than one waiter for each event. We do wait_event(group->fanotify_data.access_waitq, event->response || atomic_read(&group->fanotify_data.bypass_perm)); and after that event->response = 0; which is the reason that even if we woke up all waiters for the same event some of them may see event->response being already set 0 again, then go back to sleep and block forever. With this patch we avoid that more than one thread is waiting for a response by not merging permission events for the same thread group any more. Reported-by: Boyd Yang <boyd.yang@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilipp@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26perf tools: Handle JITed code in shared memoryAndi Kleen
commit 89365e6c9ad4c0e090e4c6a4b67a3ce319381d89 upstream. Need to check for /dev/zero. Most likely more strings are missing too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1366848182-30449-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26perf: Fix perf_cgroup_switch for sw-eventsPeter Zijlstra
commit 95cf59ea72331d0093010543b8951bb43f262cac upstream. Jiri reported that he could trigger the WARN_ON_ONCE() in perf_cgroup_switch() using sw-events. This is because sw-events share a cpuctx with multiple PMUs. Use the ->unique_pmu pointer to limit the pmu iteration to unique cpuctx instances. Reported-and-Tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmuPeter Zijlstra
commit 3f1f33206c16c7b3839d71372bc2ac3f305aa802 upstream. Stephane thought the perf_cpu_context::active_pmu name confusing and suggested using 'unique_pmu' instead. This pointer is a pointer to a 'random' pmu sharing the cpuctx instance, therefore limiting a for_each_pmu loop to those where cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx instances. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26cgroup: fail if monitored file and event_control are in different cgroupLi Zefan
commit f169007b2773f285e098cb84c74aac0154d65ff7 upstream. If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control of cgroup B, then we won't get memory usage notification from A but B! What's worse, if A and B are in different mount hierarchy, we'll end up accessing NULL pointer! Disallow this kind of invalid usage. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sfc: Fix efx_rx_buf_offset() for recycled pagesBen Hutchings
This bug fix is only for stable branches older than 3.10. The bug was fixed upstream by commit 2768935a4660 ('sfc: reuse pages to avoid DMA mapping/unmapping costs'), but that change is totally unsuitable for stable. Commit b590ace09d51 ('sfc: Fix efx_rx_buf_offset() in the presence of swiotlb') added an explicit page_offset member to struct efx_rx_buffer, which must be set consistently with the u.page and dma_addr fields. However, it failed to add the necessary assignment in efx_resurrect_rx_buffer(). It also did not correct the calculation of efx_rx_buffer::dma_addr in efx_resurrect_rx_buffer(), which assumes that DMA-mapping a page will result in a page-aligned DMA address (exactly what swiotlb violates). Add the assignment of efx_rx_buffer::page_offset and change the calculation of dma_addr to make use of it. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2013-10-26macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang
commit ece793fcfc417b3925844be88a6a6dc82ae8f7c6 upstream. We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: callback only takes one argument] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26Revert "zram: use zram->lock to protect zram_free_page() in swap free notify ↵Ben Hutchings
path" This reverts commit 9e443904906ca2b5b3ae71f34ac4a4fa6905623e, which was commit 57ab048532c0d975538cebd4456491b5c34248f4 upstream. Taking the semaphore here leads to sleeping in atomic context. This was fixed in mainline commit a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity") but that is too difficult to backport. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26powerpc/pseries/lparcfg: Fix possible overflow are more than 1026Chen Gang
commit 5676005acf26ab7e924a8438ea4746e47d405762 upstream. need set '\0' for 'local_buffer'. SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of rtas_data_buf may truncated in memcpy. if contents are really truncated. the splpar_strlen is more than 1026. the next while loop checking will not find the end of buffer. that will cause memory access violation. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26m68knommu: clean up linker scriptGreg Ungerer
commit f84f52a5c15db7d14a534815f27253b001735183 upstream. There is a lot of years of collected cruft in the m68knommu linker script. Clean it all up and use the well defined linker script support macros. Support is maintained for building both ROM/FLASH based and RAM based setups. No major changes to section layouts, though the rodata section is now lumped in with the read/write data section. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26m68k: use non-MMU linker script for ColdFire MMU buildsGreg Ungerer
commit ed865e31a8273be200db9ddcdb6b844e48777abd upstream. Use the non-MMU linker script for ColdFire builds when we are building for MMU enabled. The image layout is correct for loading on existing ColdFire dev boards. The only addition required to the current non-MMU linker script is to add support for the fixup section. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Matt Waddel <mwaddel@yahoo.com> Acked-by: Kurt Mahan <kmahan@xmission.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26m68k: consolidate the vmlinux.lds linker scriptsGreg Ungerer
commit 40c1b9cfeedf79b909c961e0e00a13497e80bc82 upstream. The merge of m68knommu left the linker scripts a little disorganized. Some consistent naming and squashing two of scripts that just include others can simplify things a lot. So merge the two simple including scripts, and rename the nommu script to be consistent with the existing m68k linker scripts. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26usb: core: don't try to reset_device() a port that got just disconnectedJulius Werner
commit 481f2d4f89f87a0baa26147f323380e31cfa7c44 upstream. The USB hub driver's event handler contains a check to catch SuperSpeed devices that transitioned into the SS.Inactive state and tries to fix them with a reset. It decides whether to do a plain hub port reset or call the usb_reset_device() function based on whether there was a device attached to the port. However, there are device/hub combinations (found with a JetFlash Transcend mass storage stick (8564:1000) on the root hub of an Intel LynxPoint PCH) which can transition to the SS.Inactive state on disconnect (and stay there long enough for the host to notice). In this case, above-mentioned reset check will call usb_reset_device() on the stale device data structure. The kernel will send pointless LPM control messages to the no longer connected device address and can even cause several 5 second khubd stalls on some (buggy?) host controllers, before finally accepting the device's fate amongst a flurry of error messages. This patch makes the choice of reset dependent on the port status that has just been read from the hub in addition to the existence of an in-kernel data structure for the device, and only proceeds with the more extensive reset if both are valid. Signed-off-by: Julius Werner <jwerner@chromium.org> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)Oleg Nesterov
commit 776164c1faac4966ab14418bb0922e1820da1d19 upstream. debugfs_remove_recursive() is wrong, 1. it wrongly assumes that !list_empty(d_subdirs) means that this dir should be removed. This is not that bad by itself, but: 2. if d_subdirs does not becomes empty after __debugfs_remove() it gives up and silently fails, it doesn't even try to remove other entries. However ->d_subdirs can be non-empty because it still has the already deleted !debugfs_positive() entries. 3. simple_release_fs() is called even if __debugfs_remove() fails. Suppose we have dir1/ dir2/ file2 file1 and someone opens dir1/dir2/file2. Now, debugfs_remove_recursive(dir1/dir2) succeeds, and dir1/dir2 goes away. But debugfs_remove_recursive(dir1) silently fails and doesn't remove this directory. Because it tries to delete (the already deleted) dir1/dir2/file2 again and then fails due to "Avoid infinite loop" logic. Test-case: #!/bin/sh cd /sys/kernel/debug/tracing echo 'p:probe/sigprocmask sigprocmask' >> kprobe_events sleep 1000 < events/probe/sigprocmask/id & echo -n >| kprobe_events [ -d events/probe ] && echo "ERR!! failed to rm probe" And after that it is not possible to create another probe entry. With this patch debugfs_remove_recursive() skips !debugfs_positive() files although this is not strictly needed. The most important change is that it does not try to make ->d_subdirs empty, it simply scans the whole list(s) recursively and removes as much as possible. Link: http://lkml.kernel.org/r/20130726151256.GC19472@redhat.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26perf: Use css_tryget() to avoid propping up css refcountSalman Qazi
commit 9c5da09d266ca9b32eb16cf940f8161d949c2fe5 upstream. An rmdir pushes css's ref count to zero. However, if the associated directory is open at the time, the dentry ref count is non-zero. If the fd for this directory is then passed into perf_event_open, it does a css_get(). This bounces the ref count back up from zero. This is a problem by itself. But what makes it turn into a crash is the fact that we end up doing an extra dput, since we perform a dput when css_put sees the ref count go down to zero. css_tryget() does not fall into that trap. So, we use that instead. Reproduction test-case for the bug: #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <linux/unistd.h> #include <linux/perf_event.h> #include <string.h> #include <errno.h> #include <stdio.h> #define PERF_FLAG_PID_CGROUP (1U << 2) int perf_event_open(struct perf_event_attr *hw_event_uptr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu, group_fd, flags); } /* * Directly poke at the perf_event bug, since it's proving hard to repro * depending on where in the kernel tree. what moved? */ int main(int argc, char **argv) { int fd; struct perf_event_attr attr; memset(&attr, 0, sizeof(attr)); attr.exclude_kernel = 1; attr.size = sizeof(attr); mkdir("/dev/cgroup/perf_event/blah", 0777); fd = open("/dev/cgroup/perf_event/blah", O_RDONLY); perror("open"); rmdir("/dev/cgroup/perf_event/blah"); sleep(2); perf_event_open(&attr, fd, 0, -1, PERF_FLAG_PID_CGROUP); perror("perf_event_open"); close(fd); return 0; } Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26kernel-doc: bugfix - multi-line macrosDaniel Santos
commit 654784284430bf2739985914b65e09c7c35a7273 upstream. Prior to this patch the following code breaks: /** * multiline_example - this breaks kernel-doc */ #define multiline_example( \ myparam) Producing this error: Error(somefile.h:983): cannot understand prototype: 'multiline_example( \ ' This patch fixes the issue by appending all lines ending in a blackslash (optionally followed by whitespace), removing the backslash and any whitespace after it prior to appending (just like the C pre-processor would). This fixes a break in kerel-doc introduced by the additions to rbtree.h. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sparc32: Fix exit flag passed from traced sys_sigreturnKirill Tkhai
[ Upstream commit 7a3b0f89e3fea680f93932691ca41a68eee7ab5e ] Pass 1 in %o1 to indicate that syscall_trace accounts exit. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sparc64: Fix not SRA'ed %o5 in 32-bit traced syscallKirill Tkhai
[ Upstream commit ab2abda6377723e0d5fbbfe5f5aa16a5523344d1 ] (From v1 to v2: changed comment) On the way linux_sparc_syscall32->linux_syscall_trace32->goto 2f, register %o5 doesn't clear its second 32-bit. Fix that. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sparc64: Fix off by one in trampoline TLB mapping installation loop.David S. Miller
[ Upstream commit 63d499662aeec1864ec36d042aca8184ea6a938e ] Reported-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sparc64: Remove RWSEM export leftoversKirill Tkhai
[ Upstream commit 61d9b9355b0d427bd1e732bd54628ff9103e496f ] The functions __down_read __down_read_trylock __down_write __down_write_trylock __up_read __up_write __downgrade_write are implemented inline, so remove corresponding EXPORT_SYMBOLs (They lead to compile errors on RT kernel). Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26sparc64: Fix ITLB handler of null pageKirill Tkhai
[ Upstream commit 1c2696cdaad84580545a2e9c0879ff597880b1a9 ] 1)Use kvmap_itlb_longpath instead of kvmap_dtlb_longpath. 2)Handle page #0 only, don't handle page #1: bleu -> blu (KERNBASE is 0x400000, so #1 does not exist too. But everything is possible in the future. Fix to not to have problems later.) 3)Remove unused kvmap_itlb_nonlinear. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26esp_scsi: Fix tag state corruption when autosensing.David S. Miller
[ Upstream commit 21af8107f27878813d0364733c0b08813c2c192a ] Meelis Roos reports a crash in esp_free_lun_tag() in the presense of a disk which has died. The issue is that when we issue an autosense command, we do so by hijacking the original command that caused the check-condition. When we do so we clear out the ent->tag[] array when we issue it via find_and_prep_issuable_command(). This is so that the autosense command is forced to be issued non-tagged. That is problematic, because it is the value of ent->tag[] which determines whether we issued the original scsi command as tagged vs. non-tagged (see esp_alloc_lun_tag()). And that, in turn, is what trips up the sanity checks in esp_free_lun_tag(). That function needs the original ->tag[] values in order to free up the tag slot properly. Fix this by remembering the original command's tag values, and having esp_alloc_lun_tag() and esp_free_lun_tag() use them. Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ll_temac: Reset dma descriptors indexes on ndo_openRicardo Ribalda
[ Upstream commit 7167cf0e8cd10287b7912b9ffcccd9616f382922 ] The dma descriptors indexes are only initialized on the probe function. If a packet is on the buffer when temac_stop is called, the dma descriptors indexes can be left on a incorrect state where no other package can be sent. So an interface could be left in an usable state after ifdow/ifup. This patch makes sure that the descriptors indexes are in a proper status when the device is open. Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_putSalam Noureddine
[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ] It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_putSalam Noureddine
[ Upstream commit e2401654dd0f5f3fb7a8d80dad9554d73d7ca394 ] It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26bonding: Fix broken promiscuity reference counting issueNeil Horman
[ Upstream commit 5a0068deb611109c5ba77358be533f763f395ee4 ] Recently grabbed this report: https://bugzilla.redhat.com/show_bug.cgi?id=1005567 Of an issue in which the bonding driver, with an attached vlan encountered the following errors when bond0 was taken down and back up: dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken. The error occurs because, during __bond_release_one, if we release our last slave, we take on a random mac address and issue a NETDEV_CHANGEADDR notification. With an attached vlan, the vlan may see that the vlan and bond mac address were in sync, but no longer are. This triggers a call to dev_uc_add and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when we complete __bond_release_one, we use the current state of the bond flags to determine if we should decrement the promiscuity of the releasing slave. But since the bond changed promiscuity state during the release operation, we incorrectly decrement the slave promisc count when it wasn't in promiscuous mode to begin with, causing the above error Fix is pretty simple, just cache the bonding flags at the start of the function and use those when determining the need to set promiscuity. This is also needed for the ALLMULTI flag CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Mark Wu <wudxw@linux.vnet.ibm.com> CC: "David S. Miller" <davem@davemloft.net> Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26dm9601: fix IFF_ALLMULTI handlingPeter Korsgaard
[ Upstream commit bf0ea6380724beb64f27a722dfc4b0edabff816e ] Pass-all-multicast is controlled by bit 3 in RX control, not bit 2 (pass undersized frames). Reported-by: Joseph Chang <joseph_chang@davicom.com.tw> Signed-off-by: Peter Korsgaard <peter@korsgaard.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26via-rhine: fix VLAN priority field (PCP, IEEE 802.1p)Roger Luethi
[ Upstream commit 207070f5221e2a901d56a49df9cde47d9b716cd7 ] Outgoing packets sent by via-rhine have their VLAN PCP field off by one (when hardware acceleration is enabled). The TX descriptor expects only VID and PCP (without a CFI/DEI bit). Peter Boström noticed and reported the bug. Signed-off-by: Roger Luethi <rl@hellgate.ch> Cc: Peter Boström <peter.bostrom@netrounds.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa
[ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ] In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26resubmit bridge: fix message_age_timer calculationChris Healy
[ Upstream commit 9a0620133ccce9dd35c00a96405c8d80938c2cc0 ] This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmitDaniel Borkmann
[ Upstream commit 95ee62083cb6453e056562d91f597552021e6ae7 ] Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from 625034113 ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester <alan.chester@tekelec.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26netpoll: fix NULL pointer dereference in netpoll_cleanupNikolay Aleksandrov
[ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ] I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26net: sctp: fix smatch warning in sctp_send_asconf_del_ipDaniel Borkmann
[ Upstream commit 88362ad8f9a6cea787420b57cc27ccacef000dbe ] This was originally reported in [1] and posted by Neil Horman [2], he said: Fix up a missed null pointer check in the asconf code. If we don't find a local address, but we pass in an address length of more than 1, we may dereference a NULL laddr pointer. Currently this can't happen, as the only users of the function pass in the value 1 as the addrcnt parameter, but its not hot path, and it doesn't hurt to check for NULL should that ever be the case. The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1 while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR so that we do *not* find a single address in the association's bind address list that is not in the packed array of addresses. If this happens when we have an established association with ASCONF-capable peers, then we could get a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1 and call later sctp_make_asconf_update_ip() with NULL laddr. BUT: this actually won't happen as sctp_bindx_rem() will catch such a case and return with an error earlier. As this is incredably unintuitive and error prone, add a check to catch at least future bugs here. As Neil says, its not hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the single-homed host"). [1] http://www.spinics.net/lists/linux-sctp/msg02132.html [2] http://www.spinics.net/lists/linux-sctp/msg02133.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>