summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-01-27Linux 5.10.11v5.10.11Greg Kroah-Hartman
Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20210126094313.589480033@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27Revert "mm: fix initialization of struct page for holes in memory layout"Linus Torvalds
commit 377bf660d07a47269510435d11f3b65d53edca20 upstream. This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399. Chris Wilson reports that it causes boot problems: "We have half a dozen or so different machines in CI that are silently failing to boot, that we believe is bisected to this patch" and the CI team confirmed that a revert fixed the issues. The cause is unknown for now, so let's revert it. Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/ Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27mm: fix initialization of struct page for holes in memory layoutMike Rapoport
commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399 upstream. There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields if this page are set to default values and the page is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in struct page) in the same pageblock. Update init_unavailable_mem() to use zone constraints defined by an architecture to properly setup the zone link and use node ID of the adjacent range in memblock.memory to set the node link. Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27Commit 9bb48c82aced ("tty: implement write_iter") converted the tty layer to ↵Sami Tolvanen
use write_iter. Fix the redirected_tty_write declaration also in n_tty and change the comparisons to use write_iter instead of write. also in n_tty and change the comparisons to use write_iter instead of write. commit 9f12e37cae44a96132fc3031535a0b165486941a upstream. [ Also moved the declaration of redirected_tty_write() to the proper location in a header file. The reason for the bug was the bogus extern declaration in n_tty.c silently not matching the changed definition in tty_io.c, and because it wasn't in a shared header file, there was no cross-checking of the declaration. Sami noticed because Clang's Control Flow Integrity checking ended up incidentally noticing the inconsistent declaration. - Linus ] Fixes: 9bb48c82aced ("tty: implement write_iter") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27fs/pipe: allow sendfile() to pipe againJohannes Berg
commit f8ad8187c3b536ee2b10502a8340c014204a1af0 upstream. After commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") sendfile() could no longer send data from a real file to a pipe, breaking for example certain cgit setups (e.g. when running behind fcgiwrap), because in this case cgit will try to do exactly this: sendfile() to a pipe. Fix this by using iter_file_splice_write for the splice_write method of pipes, as suggested by Christoph. Cc: stable@vger.kernel.org Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27interconnect: imx8mq: Use icc_sync_stateMartin Kepplinger
commit 67288f74d4837b82ef937170da3389b0779c17be upstream. Add the icc_sync_state callback to notify the framework when consumers are probed and the bandwidth doesn't have to be kept at maximum anymore. Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm> Suggested-by: Georgi Djakov <georgi.djakov@linaro.org> Fixes: 7d3b0b0d8184 ("interconnect: qcom: Use icc_sync_state") Link: https://lore.kernel.org/r/20201210100906.18205-6-martin.kepplinger@puri.sm Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27kernfs: wire up ->splice_read and ->splice_writeChristoph Hellwig
commit f2d6c2708bd84ca953fa6b6ca5717e79eb0140c7 upstream. Wire up the splice_read and splice_write methods to the default helpers using ->read_iter and ->write_iter now that those are implemented for kernfs. This restores support to use splice and sendfile on kernfs files. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Siddharth Gupta <sidgup@codeaurora.org> Tested-by: Siddharth Gupta <sidgup@codeaurora.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27kernfs: implement ->write_iterChristoph Hellwig
commit cc099e0b399889c6485c88368b19824b087c9f8c upstream. Switch kernfs to implement the write_iter method instead of plain old write to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27kernfs: implement ->read_iterChristoph Hellwig
commit 4eaad21a6ac9865df7f31983232ed5928450458d upstream. Switch kernfs to implement the read_iter method instead of plain old read to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27bpf: Local storage helpers should check nullness of owner ptr passedKP Singh
commit 1a9c72ad4c26821e215a396167c14959cf24a7f1 upstream. The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so helper implementations need to check this before dereferencing them. This was already fixed for the socket storage helpers but not for task and inode. The issue can be reproduced by attaching an LSM program to inode_rename hook (called when moving files) which tries to get the inode of the new file without checking for its nullness and then trying to move an existing file to a new path: mv existing_file new_file_does_not_exist The report including the sample program and the steps for reproducing the bug: https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com Fixes: 4cf1bc1f1045 ("bpf: Implement task local storage") Fixes: 8ea636848aca ("bpf: Implement bpf_local_storage for inodes") Reported-by: Gilad Reti <gilad.reti@gmail.com> Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075525.256820-3-kpsingh@kernel.org [ just take 1/2 of this patch for 5.10.y - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27drm/i915/hdcp: Get conn while content_type changedAnshuman Gupta
commit 8662e1119a7d1baa1b2001689b2923e9050754bd upstream. Get DRM connector reference count while scheduling a prop work to avoid any possible destroy of DRM connector when it is in DRM_CONNECTOR_REGISTERED state. Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors") Cc: Sean Paul <seanpaul@chromium.org> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Tested-by: Karthik B S <karthik.b.s@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210111081120.28417-3-anshuman.gupta@intel.com (cherry picked from commit b3c6661aad979ec3d4f5675cf3e6a35828607d6a) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27ASoC: SOF: Intel: hda: Avoid checking jack on system suspendKai-Heng Feng
commit ef4d764c99f792b725d4754a3628830f094f5c58 upstream. System takes a very long time to suspend after commit 215a22ed31a1 ("ALSA: hda: Refactor codec PM to use direct-complete optimization"): [ 90.065964] PM: suspend entry (s2idle) [ 90.067337] Filesystems sync: 0.001 seconds [ 90.185758] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 90.188713] OOM killer disabled. [ 90.188714] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 90.190024] printk: Suspending console(s) (use no_console_suspend to debug) [ 90.904912] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C], continue to suspend [ 321.262505] snd_hda_codec_realtek ehdaudio0D0: Unable to sync register 0x2b8000. -5 [ 328.426919] snd_hda_codec_realtek ehdaudio0D0: Unable to sync register 0x2b8000. -5 [ 329.490933] ACPI: EC: interrupt blocked That commit keeps the codec suspended during the system suspend. However, mute/micmute LED will clear codec's direct-complete flag by dpm_clear_superiors_direct_complete(). This doesn't play well with SOF driver. When its runtime resume is called for system suspend, hda_codec_jack_check() schedules jackpoll_work which uses snd_hdac_is_power_on() to check whether codec is suspended. Because the direct-complete path isn't taken, pm_runtime_disable() isn't called so snd_hdac_is_power_on() returns false and jackpoll continues to run, and snd_hda_power_up_pm() cannot power up an already suspended codec in multiple attempts, causes the long delay on system suspend: if (dev->power.direct_complete) { if (pm_runtime_status_suspended(dev)) { pm_runtime_disable(dev); if (pm_runtime_status_suspended(dev)) { pm_dev_dbg(dev, state, "direct-complete "); goto Complete; } pm_runtime_enable(dev); } dev->power.direct_complete = false; } When direct-complete path is taken, snd_hdac_is_power_on() returns true and hda_jackpoll_work() is skipped by accident. So this is still not correct. If we were to use snd_hdac_is_power_on() in system PM path, pm_runtime_status_suspended() should be used instead of pm_runtime_suspended(), otherwise pm_runtime_{enable,disable}() may change the outcome of snd_hdac_is_power_on(). Because devices suspend in reverse order (i.e. child first), it doesn't make much sense to resume an already suspended codec from audio controller. So avoid the issue by making sure jackpoll isn't used in system PM process. Fixes: 215a22ed31a1 ("ALSA: hda: Refactor codec PM to use direct-complete optimization") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20210112181128.1229827-3-kai.heng.feng@canonical.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tcp: Fix potential use-after-free due to double kfree()Kuniyuki Iwashima
commit c89dffc70b340780e5b933832d8c3e045ef3791e upstream. Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct request_sock and then can allocate inet_rsk(req)->ireq_opt. After that, tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full socket into ehash and sets NULL to ireq_opt. Otherwise, tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full socket. The commit 01770a1661657 ("tcp: fix race condition when creating child sockets from syncookies") added a new path, in which more than one cores create full sockets for the same SYN cookie. Currently, the core which loses the race frees the full socket without resetting inet_opt, resulting in that both sock_put() and reqsk_put() call kfree() for the same memory: sock_put sk_free __sk_free sk_destruct __sk_destruct sk->sk_destruct/inet_sock_destruct kfree(rcu_dereference_protected(inet->inet_opt, 1)); reqsk_put reqsk_free __reqsk_free req->rsk_ops->destructor/tcp_v4_reqsk_destructor kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1)); Calling kmalloc() between the double kfree() can lead to use-after-free, so this patch fixes it by setting NULL to inet_opt before sock_put(). As a side note, this kind of issue does not happen for IPv6. This is because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which correspond to ireq_opt in IPv4. Fixes: 01770a166165 ("tcp: fix race condition when creating child sockets from syncookies") CC: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27x86/sev-es: Handle string port IO to kernel memory properlyHyunwook (Wooky) Baek
commit 7024f60d655272bd2ca1d3a4c9e0a63319b1eea1 upstream. Don't assume dest/source buffers are userspace addresses when manually copying data for string I/O or MOVS MMIO, as {get,put}_user() will fail if handed a kernel address and ultimately lead to a kernel panic. When invoking INSB/OUTSB instructions in kernel space in a SEV-ES-enabled VM, the kernel crashes with the following message: "SEV-ES: Unsupported exception in #VC instruction emulation - can't continue" Handle that case properly. [ bp: Massage commit message. ] Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image") Signed-off-by: Hyunwook (Wooky) Baek <baekhw@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: David Rientjes <rientjes@google.com> Link: https://lkml.kernel.org/r/20210110071102.2576186-1-baekhw@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: systemport: free dev before on error pathPan Bian
commit 0c630a66bf10991b0ef13d27c93d7545e692ef5b upstream. On the error path, it should goto the error handling label to free allocated memory rather than directly return. Fixes: 31bc72d97656 ("net: systemport: fetch and use clock resources") Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210120044423.1704-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tty: fix up hung_up_tty_write() conversionLinus Torvalds
commit 17749851eb9ca2298e7c3b81aae4228961b36f28 upstream. In commit "tty: implement write_iter", I left the write_iter conversion of the hung up tty case alone, because I incorrectly thought it didn't matter. Jiri showed me the errors of my ways, and pointed out the problems with that incomplete conversion. Fix it all up. Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/CAHk-=wh+-rGsa=xruEWdg_fJViFG8rN9bpLrfLz=_yBYh2tBhA@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tty: implement write_iterLinus Torvalds
commit 9bb48c82aced07698a2d08ee0f1475a6c4f6b266 upstream. This makes the tty layer use the .write_iter() function instead of the traditional .write() functionality. That allows writev(), but more importantly also makes it possible to enable .splice_write() for ttys, reinstating the "splice to tty" functionality that was lost in commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Oliver Giles <ohw.giles@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27x86/sev: Fix nonistr violationPeter Zijlstra
commit a1d5c98aac33a5a0004ecf88905dcc261c52f988 upstream. When the compiler fails to inline, it violates nonisntr: vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0xc7: call to sev_es_wr_ghcb_msr() leaves .noinstr.text section Fixes: 4ca68e023b11 ("x86/sev-es: Handle NMI State") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210106144017.532902065@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27pinctrl: qcom: Don't clear pending interrupts when enablingDouglas Anderson
commit cf9d052aa6005f1e8dfaf491d83bf37f368af69e upstream. In Linux, if a driver does disable_irq() and later does enable_irq() on its interrupt, I believe it's expecting these properties: * If an interrupt was pending when the driver disabled then it will still be pending after the driver re-enables. * If an edge-triggered interrupt comes in while an interrupt is disabled it should assert when the interrupt is re-enabled. If you think that the above sounds a lot like the disable_irq() and enable_irq() are supposed to be masking/unmasking the interrupt instead of disabling/enabling it then you've made an astute observation. Specifically when talking about interrupts, "mask" usually means to stop posting interrupts but keep tracking them and "disable" means to fully shut off interrupt detection. It's unfortunate that this is so confusing, but presumably this is all the way it is for historical reasons. Perhaps more confusing than the above is that, even though clients of IRQs themselves don't have a way to request mask/unmask vs. disable/enable calls, IRQ chips themselves can implement both. ...and yet more confusing is that if an IRQ chip implements disable/enable then they will be called when a client driver calls disable_irq() / enable_irq(). It does feel like some of the above could be cleared up. However, without any other core interrupt changes it should be clear that when an IRQ chip gets a request to "disable" an IRQ that it has to treat it like a mask of that IRQ. In any case, after that long interlude you can see that the "unmask and clear" can break things. Maulik tried to fix it so that we no longer did "unmask and clear" in commit 71266d9d3936 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback"), but it only handled the PDC case and it had problems (it caused sc7180-trogdor devices to fail to suspend). Let's fix. >From my understanding the source of the phantom interrupt in the were these two things: 1. One that could have been introduced in msm_gpio_irq_set_type() (only for the non-PDC case). 2. Edges could have been detected when a GPIO was muxed away. Fixing case #1 is easy. We can just add a clear in msm_gpio_irq_set_type(). Fixing case #2 is harder. Let's use a concrete example. In sc7180-trogdor.dtsi we configure the uart3 to have two pinctrl states, sleep and default, and mux between the two during runtime PM and system suspend (see geni_se_resources_{on,off}() for more details). The difference between the sleep and default state is that the RX pin is muxed to a GPIO during sleep and muxed to the UART otherwise. As per Qualcomm, when we mux the pin over to the UART function the PDC (or the non-PDC interrupt detection logic) is still watching it / latching edges. These edges don't cause interrupts because the current code masks the interrupt unless we're entering suspend. However, as soon as we enter suspend we unmask the interrupt and it's counted as a wakeup. Let's deal with the problem like this: * When we mux away, we'll mask our interrupt. This isn't necessary in the above case since the client already masked us, but it's a good idea in general. * When we mux back will clear any interrupts and unmask. Fixes: 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio") Fixes: 71266d9d3936 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Tested-by: Maulik Shah <mkshah@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20210114191601.v7.4.I7cf3019783720feb57b958c95c2b684940264cd1@changeid Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmaskingDouglas Anderson
commit a95881d6aa2c000e3649f27a1a7329cf356e6bb3 upstream. In commit 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio") we tried to Ack interrupts during unmask. However, that patch forgot to check "intr_ack_high" so, presumably, it only worked for a certain subset of SoCs. Let's add a small accessor so we don't need to open-code the logic in both places. This was found by code inspection. I don't have any access to the hardware in question nor software that needs the Ack during unmask. Fixes: 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Tested-by: Maulik Shah <mkshah@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210114191601.v7.3.I32d0f4e174d45363b49ab611a13c3da8f1e87d0f@changeid Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27pinctrl: qcom: No need to read-modify-write the interrupt statusDouglas Anderson
commit 4079d35fa4fca4ee0ffd66968312fc86a5e8c290 upstream. When the Qualcomm pinctrl driver wants to Ack an interrupt, it does a read-modify-write on the interrupt status register. On some SoCs it makes sure that the status bit is 1 to "Ack" and on others it makes sure that the bit is 0 to "Ack". Presumably the first type of interrupt controller is a "write 1 to clear" type register and the second just let you directly set the interrupt status register. As far as I can tell from scanning structure definitions, the interrupt status bit is always in a register by itself. Thus with both types of interrupt controllers it is safe to "Ack" interrupts without doing a read-modify-write. We can do a simple write. It should be noted that if the interrupt status bit _was_ ever in a register with other things (like maybe status bits for other GPIOs): a) For "write 1 clear" type controllers then read-modify-write would be totally wrong because we'd accidentally end up clearing interrupts we weren't looking at. b) For "direct set" type controllers then read-modify-write would also be wrong because someone setting one of the other bits in the register might accidentally clear (or set) our interrupt. I say this simply to show that the current read-modify-write doesn't provide any sort of "future proofing" of the code. In fact (for "write 1 clear" controllers) the new code is slightly more "future proof" since it would allow more than one interrupt status bits to share a register. NOTE: this code fixes no bugs--it simply avoids an extra register read. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Tested-by: Maulik Shah <mkshah@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210114191601.v7.2.I3635de080604e1feda770591c5563bd6e63dd39d@changeid Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0Douglas Anderson
commit a82e537807d5c85706cd4c16fd2de77a8495dc8d upstream. There's currently a comment in the code saying function 0 is GPIO. Instead of hardcoding it, let's add a member where an SoC can specify it. No known SoCs use a number other than 0, but this just makes the code clearer. NOTE: no SoC code needs to be updated since we can rely on zero-initialization. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Tested-by: Maulik Shah <mkshah@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210114191601.v7.1.I3ad184e3423d8e479bc3e86f5b393abb1704a1d1@changeid Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: core: devlink: use right genl user_ptr when handling port param get/setOleksandr Mazur
commit 7e238de8283acd32c26c2bc2a50672d0ea862ff7 upstream. Fix incorrect user_ptr dereferencing when handling port param get/set: idx [0] stores the 'struct devlink' pointer; idx [1] stores the 'struct devlink_port' pointer; Fixes: 637989b5d77e ("devlink: Always use user_ptr[0] for devlink and simplify post_doit") CC: Parav Pandit <parav@mellanox.com> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Link: https://lore.kernel.org/r/20210119085333.16833-1-vadym.kochan@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: mscc: ocelot: Fix multicast to the CPU portAlban Bedel
commit 584b7cfcdc7d6d416a9d6fece9516764bd977d2e upstream. Multicast entries in the MAC table use the high bits of the MAC address to encode the ports that should get the packets. But this port mask does not work for the CPU port, to receive these packets on the CPU port the MAC_CPU_COPY flag must be set. Because of this IPv6 was effectively not working because neighbor solicitations were never received. This was not apparent before commit 9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries) as the IPv6 entries were broken so all incoming IPv6 multicast was then treated as unknown and flooded on all ports. To fix this problem rework the ocelot_mact_learn() to set the MAC_CPU_COPY flag when a multicast entry that target the CPU port is added. For this we have to read back the ports endcoded in the pseudo MAC address by the caller. It is not a very nice design but that avoid changing the callers and should make backporting easier. Signed-off-by: Alban Bedel <alban.bedel@aerq.com> Fixes: 9403c158b872 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries") Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tcp: fix TCP_USER_TIMEOUT with zero windowEnke Chen
commit 9d9b1ee0b2d1c9e02b2338c4a4b0a062d2d3edac upstream. The TCP session does not terminate with TCP_USER_TIMEOUT when data remain untransmitted due to zero window. The number of unanswered zero-window probes (tcp_probes_out) is reset to zero with incoming acks irrespective of the window size, as described in tcp_probe_timer(): RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as long as the receiver continues to respond probes. We support this by default and reset icsk_probes_out with incoming ACKs. This counter, however, is the wrong one to be used in calculating the duration that the window remains closed and data remain untransmitted. Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the actual issue. In this patch a new timestamp is introduced for the socket in order to track the elapsed time for the zero-window probes that have not been answered with any non-zero window ack. Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT") Reported-by: William McCall <william.mccall@gmail.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tcp: do not mess with cloned skbs in tcp_add_backlog()Eric Dumazet
commit b160c28548bc0a87cbd16d5af6d3edcfd70b8c9a upstream. Heiner Kallweit reported that some skbs were sent with the following invalid GSO properties : - gso_size > 0 - gso_type == 0 This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2. Juerg Haefliger was able to reproduce a similar issue using a lan78xx NIC and a workload mixing TCP incoming traffic and forwarded packets. The problem is that tcp_add_backlog() is writing over gso_segs and gso_size even if the incoming packet will not be coalesced to the backlog tail packet. While skb_try_coalesce() would bail out if tail packet is cloned, this overwriting would lead to corruptions of other packets cooked by lan78xx, sharing a common super-packet. The strategy used by lan78xx is to use a big skb, and split it into all received packets using skb_clone() to avoid copies. The drawback of this strategy is that all the small skb share a common struct skb_shared_info. This patch rewrites TCP gso_size/gso_segs handling to only happen on the tail skb, since skb_try_coalesce() made sure it was not cloned. Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Juerg Haefliger <juergh@canonical.com> Tested-by: Juerg Haefliger <juergh@canonical.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423 Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: dsa: b53: fix an off by one in checking "vlan->vid"Dan Carpenter
commit 8e4052c32d6b4b39c1e13c652c7e33748d447409 upstream. The > comparison should be >= to prevent accessing one element beyond the end of the dev->vlans[] array in the caller function, b53_vlan_add(). The "dev->vlans" array is allocated in the b53_switch_init() function and it has "dev->num_vlans" elements. Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/YAbxI97Dl/pmBy5V@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabledTariq Toukan
commit a3eb4e9d4c9218476d05c52dfd2be3d6fdce6b91 upstream. With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be logically done when RXCSUM offload is off. Fixes: 14136564c8ee ("net: Add TLS RX offload feature") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: mscc: ocelot: allow offloading of bridge on top of LAGVladimir Oltean
commit 79267ae22615496655feee2db0848f6786bcf67a upstream. The blamed commit was too aggressive, and it made ocelot_netdevice_event react only to network interface events emitted for the ocelot switch ports. In fact, only the PRECHANGEUPPER should have had that check. When we ignore all events that are not for us, we miss the fact that the upper of the LAG changes, and the bonding interface gets enslaved to a bridge. This is an operation we could offload under certain conditions. Fixes: 7afb3e575e5a ("net: mscc: ocelot: don't handle netdev events for other netdevs") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20210118135210.2666246-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27ipv6: set multicast flag on the multicast routeMatteo Croce
commit ceed9038b2783d14e0422bdc6fd04f70580efb4c upstream. The multicast route ff00::/8 is created with type RTN_UNICAST: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium Set the type to RTN_MULTICAST which is more appropriate. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net_sched: reject silly cell_log in qdisc_get_rtab()Eric Dumazet
commit e4bedf48aaa5552bc1f49703abd17606e7e6e82a upstream. iproute2 probably never goes beyond 8 for the cell exponent, but stick to the max shift exponent for signed 32bit. UBSAN reported: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22 shift exponent 130 is too large for 32-bit type 'int' CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345 ___sys_sendmsg net/socket.c:2399 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net_sched: avoid shift-out-of-bounds in tcindex_set_parms()Eric Dumazet
commit bcd0cf19ef8258ac31b9a20248b05c15a1f4b4b0 upstream. tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT attribute is not silly. UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29 shift exponent 255 is too large for 32-bit type 'int' CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 valid_perfect_hash net/sched/cls_tcindex.c:260 [inline] tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425 tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546 tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127 rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336 ___sys_sendmsg+0xf3/0x170 net/socket.c:2390 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27ipv6: create multicast route with RTPROT_KERNELMatteo Croce
commit a826b04303a40d52439aa141035fca5654ccaccd upstream. The ff00::/8 multicast route is created without specifying the fc_protocol field, so the default RTPROT_BOOT value is used: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium As the documentation says, this value identifies routes installed during boot, but the route is created when interface is set up. Change the value to RTPROT_KERNEL which is a better value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27udp: mask TOS bits in udp_v4_early_demux()Guillaume Nault
commit 8d2b51b008c25240914984208b2ced57d1dd25a5 upstream. udp_v4_early_demux() is the only function that calls ip_mc_validate_source() with a TOS that hasn't been masked with IPTOS_RT_MASK. This results in different behaviours for incoming multicast UDPv4 packets, depending on if ip_mc_validate_source() is called from the early-demux path (udp_v4_early_demux) or from the regular input path (ip_route_input_noref). ECN would normally not be used with UDP multicast packets, so the practical consequences should be limited on that side. However, IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align with the non-early-demux path behaviour. Reproducer: Setup two netns, connected with veth: $ ip netns add ns0 $ ip netns add ns1 $ ip -netns ns0 link set dev lo up $ ip -netns ns1 link set dev lo up $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01 $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10 In ns0, add route to multicast address 224.0.2.0/24 using source address 198.51.100.10: $ ip -netns ns0 address add 198.51.100.10/32 dev lo $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10 In ns1, define route to 198.51.100.10, only for packets with TOS 4: $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10 Also activate rp_filter in ns1, so that incoming packets not matching the above route get dropped: $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1 Now try to receive packets on 224.0.2.11: $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof - In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is, tos 6 for socat): $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test0" message is properly received by socat in ns1, because early-demux has no cached dst to use, so source address validation is done by ip_route_input_mc(), which receives a TOS that has the ECN bits masked. Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0): $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test1" message isn't received by socat in ns1, because, now, early-demux has a cached dst to use and calls ip_mc_validate_source() immediately, without masking the ECN bits. Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net_sched: gen_estimator: support large ewma logEric Dumazet
commit dd5e073381f2ada3630f36be42833c6e9c78b75e upstream. syzbot report reminded us that very big ewma_log were supported in the past, even if they made litle sense. tc qdisc replace dev xxx root est 1sec 131072sec ... While fixing the bug, also add boundary checks for ewma_log, in line with range supported by iproute2. UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27tcp: fix TCP socket rehash stats mis-accountingYuchung Cheng
commit 9c30ae8398b0813e237bde387d67a7f74ab2db2d upstream. The previous commit 32efcc06d2a1 ("tcp: export count for rehash attempts") would mis-account rehashing SNMP and socket stats: a. During handshake of an active open, only counts the first SYN timeout b. After handshake of passive and active open, stop updating after (roughly) TCP_RETRIES1 recurring RTOs c. After the socket aborts, over count timeout_rehash by 1 This patch fixes this by checking the rehash result from sk_rethink_txhash. Fixes: 32efcc06d2a1 ("tcp: export count for rehash attempts") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27kasan: fix incorrect arguments passing in kasan_add_zero_shadowLecopzer Chen
commit 5dabd1712cd056814f9ab15f1d68157ceb04e741 upstream. kasan_remove_zero_shadow() shall use original virtual address, start and size, instead of shadow address. Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN") Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27kasan: fix unaligned address is unhandled in kasan_remove_zero_shadowLecopzer Chen
commit a11a496ee6e2ab6ed850233c96b94caf042af0b9 upstream. During testing kasan_populate_early_shadow and kasan_remove_zero_shadow, if the shadow start and end address in kasan_remove_zero_shadow() is not aligned to PMD_SIZE, the remain unaligned PTE won't be removed. In the test case for kasan_remove_zero_shadow(): shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000 3-level page table: PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K 0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE. In the correct condition, this should fallback to the next level kasan_remove_pmd_table() but the condition flow always continue to skip the unaligned part. Fix by correcting the condition when next and addr are neither aligned. Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com Fixes: 0207df4fa1a86 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN") Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: YJ Chiang <yj.chiang@mediatek.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() tooAlexander Lobakin
commit 66c556025d687dbdd0f748c5e1df89c977b6c02a upstream. Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") ensured that skbs with data size lower than 1025 bytes will be kmalloc'ed to avoid excessive page cache fragmentation and memory consumption. However, the fix adressed only __napi_alloc_skb() (primarily for virtio_net and napi_get_frags()), but the issue can still be achieved through __netdev_alloc_skb(), which is still used by several drivers. Drivers often allocate a tiny skb for headers and place the rest of the frame to frags (so-called copybreak). Mirror the condition to __netdev_alloc_skb() to handle this case too. Since v1 [0]: - fix "Fixes:" tag; - refine commit message (mention copybreak usecase). [0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me Fixes: a1c7fff7e18f ("net: netdev_alloc_skb() use build_skb()") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27lightnvm: fix memory leak when submit failsPan Bian
commit 97784481757fba7570121a70dd37ca74a29f50a8 upstream. The allocated page is not released if error occurs in nvm_submit_io_sync_raw(). __free_page() is moved ealier to avoid possible memory leak issue. Fixes: aff3fb18f957 ("lightnvm: move bad block and chunk state logic to core") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27cachefiles: Drop superfluous readpages aops NULL checkTakashi Iwai
commit db58465f1121086b524be80be39d1fedbe5387f3 upstream. After the recent actions to convert readpages aops to readahead, the NULL checks of readpages aops in cachefiles_read_or_alloc_page() may hit falsely. More badly, it's an ASSERT() call, and this panics. Drop the superfluous NULL checks for fixing this regression. [DH: Note that cachefiles never actually used readpages, so this check was never actually necessary] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883 BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245 Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27nvme-pci: fix error unwind in nvme_map_dataChristoph Hellwig
commit fa0732168fa1369dd089e5b06d6158a68229f7b7 upstream. Properly unwind step by step using refactored helpers from nvme_unmap_data to avoid a potential double dma_unmap on a mapping failure. Fixes: 7fe07d14f71f ("nvme-pci: merge nvme_free_iod into nvme_unmap_data") Reported-by: Marc Orr <marcorr@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27nvme-pci: refactor nvme_unmap_dataChristoph Hellwig
commit 9275c206f88e5c49cb3e71932c81c8561083db9e upstream. Split out three helpers from nvme_unmap_data that will allow finer grained unwinding from nvme_map_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27sh_eth: Fix power down vs. is_opened flag orderingGeert Uytterhoeven
commit f6a2e94b3f9d89cb40771ff746b16b5687650cbb upstream. sh_eth_close() does a synchronous power down of the device before marking it closed. Revert the order, to make sure the device is never marked opened while suspended. While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as there is no reason to do a synchronous power down. Fixes: 7fa2955ff70ce453 ("sh_eth: Fix sleeping function called from invalid context") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20210118150812.796791-1-geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27selftests/powerpc: Fix exit status of pkey testsSandipan Das
commit 92a5e1fdb286851d5bd0eb966b8d075be27cf5ee upstream. Since main() does not return a value explicitly, the return values from FAIL_IF() conditions are ignored and the tests can still pass irrespective of failures. This makes sure that we always explicitly return the correct test exit status. Fixes: 1addb6444791 ("selftests/powerpc: Add test for execute-disabled pkeys") Fixes: c27f2fd1705a ("selftests/powerpc: Add test for pkey siginfo verification") Reported-by: Eirik Fuller <efuller@redhat.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210118093145.10134-1-sandipan@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27net: dsa: mv88e6xxx: also read STU state in mv88e6250_g1_vtu_getnextRasmus Villemoes
commit 87fe04367d842c4d97a77303242d4dd4ac351e46 upstream. mv88e6xxx_port_vlan_join checks whether the VTU already contains an entry for the given vid (via mv88e6xxx_vtu_getnext), and if so, merely changes the relevant .member[] element and loads the updated entry into the VTU. However, at least for the mv88e6250, the on-stack struct mv88e6xxx_vtu_entry vlan never has its .state[] array explicitly initialized, neither in mv88e6xxx_port_vlan_join() nor inside the getnext implementation. So the new entry has random garbage for the STU bits, breaking VLAN filtering. When the VTU entry is initially created, those bits are all zero, and we should make sure to keep them that way when the entry is updated. Fixes: 92307069a96c (net: dsa: mv88e6xxx: Avoid VTU corruption on 6097) Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27octeontx2-af: Fix missing check bugs in rvu_cgx.cYingjie Wang
commit b7ba6cfabc42fc846eb96e33f1edcd3ea6290a27 upstream. In rvu_mbox_handler_cgx_mac_addr_get() and rvu_mbox_handler_cgx_mac_addr_set(), the msg is expected only from PFs that are mapped to CGX LMACs. It should be checked before mapping, so we add the is_cgx_config_permitted() in the functions. Fixes: 96be2e0da85e ("octeontx2-af: Support for MAC address filters in CGX") Signed-off-by: Yingjie Wang <wangyingjie55@126.com> Reviewed-by: Geetha sowjanya<gakula@marvell.com> Link: https://lore.kernel.org/r/1610719804-35230-1-git-send-email-wangyingjie55@126.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27ASoC: SOF: Intel: fix page fault at probe if i915 init failsKai Vehmanen
commit 9c25af250214e45f6d1c21ff6239a1ffeeedf20e upstream. The earlier commit to fix runtime PM in case i915 init fails, introduces a possibility to hit a page fault. snd_hdac_ext_bus_device_exit() is designed to be called from dev.release(). Calling it outside device reference counting, is not safe and may lead to calling the device_exit() function twice. Additionally, as part of ext_bus_device_init(), the device is also registered with snd_hdac_device_register(). Thus before calling device_exit(), the device must be removed from device hierarchy first. Fix the issue by rolling back init actions by calling hdac_device_unregister() and then releasing device with put_device(). This matches with existing code in hdac-ext module. To complete the fix, add handling for the case where hda_codec_load_module() returns -ENODEV, and clean up the hdac_ext resources also in this case. In future work, hdac-ext interface should be extended to allow clients more flexibility to handle the life-cycle of individual devices, beyond just the current snd_hdac_ext_bus_device_remove(), which removes all devices. BugLink: https://github.com/thesofproject/linux/issues/2646 Reported-by: Jaroslav Kysela <perex@perex.cz> Fixes: 6c63c954e1c5 ("ASoC: SOF: fix a runtime pm issue in SOF when HDMI codec doesn't work") Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Reviewed-by: Libin Yang <libin.yang@intel.com> Reviewed-by: Bard Liao <bard.liao@intel.com> Link: https://lore.kernel.org/r/20210113150715.3992635-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27locking/lockdep: Cure noinstr failPeter Zijlstra
commit 0afda3a888dccf12557b41ef42eee942327d122b upstream. When the compiler doesn't feel like inlining, it causes a noinstr fail: vmlinux.o: warning: objtool: lock_is_held_type()+0xb: call to lockdep_enabled() leaves .noinstr.text section Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210106144017.592595176@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27sh: Remove unused HAVE_COPY_THREAD_TLS macroJinyang He
commit 19170492735be935747b0545b7eed8bb40cc1209 upstream. Fixes: e1cc9d8d596e ("sh: switch to copy_thread_tls()") Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Rich Felker <dalias@libc.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>