summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-03Merge tag 'sched_urgent_for_v5.15_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Tell the compiler to always inline is_percpu_thread() - Make sure tunable_scaling buffer is null-terminated after an update in sysfs - Fix LTP named regression due to cgroup list ordering * tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Always inline is_percpu_thread() sched/fair: Null terminate buffer when updating tunable_scaling sched/fair: Add ancestors of unthrottled undecayed cfs_rq
2021-10-03Merge tag 'perf_urgent_for_v5.15_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure the destroy callback is reset when a event initialization fails - Update the event constraints for Icelake - Make sure the active time of an event is updated even for inactive events * tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: fix userpage->time_enabled of inactive events perf/x86/intel: Update event constraints for ICX perf/x86: Reset destroy callback on event init failure
2021-10-03Merge tag 'objtool_urgent_for_v5.15_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Handle symbol relocations properly due to changes in the toolchains which remove section symbols now * tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Teach get_alt_entry() about more relocation types
2021-10-02Merge tag 'hwmon-for-v5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fixed various potential NULL pointer accesses in w8379* drivers - Improved error handling, fault reporting, and fixed rounding in thmp421 driver - Fixed error handling in ltc2947 driver - Added missing attribute to pmbus/mp2975 driver - Fixed attribute values in pbus/ibm-cffps, occ, and mlxreg-fan drivers - Removed unused residual code from k10temp driver * tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller hwmon: (pmbus/ibm-cffps) max_power_out swap changes hwmon: (occ) Fix P10 VRM temp sensors hwmon: (ltc2947) Properly handle errors when looking for the external clock hwmon: (tmp421) fix rounding for negative values hwmon: (tmp421) report /PVLD condition as fault hwmon: (tmp421) handle I2C errors hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs hwmon: (k10temp) Remove residues of current and voltage
2021-10-02Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: "Eleven fixes for the ksmbd kernel server, mostly security related: - an important fix for disabling weak NTLMv1 authentication - seven security (improved buffer overflow checks) fixes - fix for wrong infolevel struct used in some getattr/setattr paths - two small documentation fixes" * tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: missing check for NULL in convert_to_nt_pathname() ksmbd: fix transform header validation ksmbd: add buffer validation for SMB2_CREATE_CONTEXT ksmbd: add validation in smb2 negotiate ksmbd: add request buffer validation in smb2_set_info ksmbd: use correct basic info level in set_file_basic_info() ksmbd: remove NTLMv1 authentication ksmbd: fix documentation for 2 functions MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry ksmbd: fix invalid request buffer access in compound ksmbd: remove RFC1002 check in smb2 request
2021-10-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Five fairly minor fixes and spelling updates, all in drivers. Even though the ufs fix is in tracing, it's a potentially exploitable use beyond end of array bug" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: csiostor: Add module softdep on cxgb4 scsi: qla2xxx: Fix excessive messages during device logout scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported" scsi: ses: Fix unsigned comparison with less than zero scsi: ufs: Fix illegal offset in UPIU event trace
2021-10-02Merge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few block fixes for this release: - Revert a BFQ commit that causes breakage for people. Unfortunately it was auto-selected for stable as well, so now 5.14.7 suffers from it too. Hopefully stable will pick up this revert quickly too, so we can remove the issue on that end as well. - Add a quirk for Apple NVMe controllers, which due to their non-compliance broke due to the introduction of command sequences (Keith) - Use shifts in nbd, fixing a __divdi3 issue (Nick)" * tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block: nbd: use shifts rather than multiplies Revert "block, bfq: honor already-setup queue merges" nvme: add command id quirk for apple controllers
2021-10-02Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Two fixes in here: - The signal issue that was discussed start of this week (me). - Kill dead fasync support in io_uring. Looks like it was broken since io_uring was initially merged, and given that nobody has ever complained about it, let's just kill it (Pavel)" * tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block: io_uring: kill fasync io-wq: exclusively gate signal based exit on get_signal() return
2021-10-02Merge tag 'libnvdimm-fixes-5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A fix for a regression added this cycle in the pmem driver, and for a long standing bug for failed NUMA node lookups on ARM64. This has appeared in -next for several days with no reported issues. Summary: - Fix a regression that caused the sysfs ABI for pmem block devices to not be registered. This fails the nvdimm unit tests and dax xfstests. - Fix numa node lookups for dax-kmem memory (device-dax memory assigned to the page allocator) on ARM64" * tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: fix creating the dax group ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect
2021-10-02cachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL objectDave Wysochanski
In cachefiles_mark_object_buried, the dentry in question may not have an owner, and thus our cachefiles_object pointer may be NULL when calling the tracepoint, in which case we will also not have a valid debug_id to print in the tracepoint. Check for NULL object in the tracepoint and if so, just set debug_id to MAX_UINT as was done in 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces"). This fixes the following oops: FS-Cache: Cache "mycache" added (type cachefiles) CacheFiles: File cache on vdc registered ... Workqueue: fscache_object fscache_object_work_func [fscache] RIP: 0010:trace_event_raw_event_cachefiles_mark_buried+0x4e/0xa0 [cachefiles] .... Call Trace: cachefiles_mark_object_buried+0xa5/0xb0 [cachefiles] cachefiles_bury_object+0x270/0x430 [cachefiles] cachefiles_walk_to_object+0x195/0x9c0 [cachefiles] cachefiles_lookup_object+0x5a/0xc0 [cachefiles] fscache_look_up_object+0xd7/0x160 [fscache] fscache_object_work_func+0xb2/0x340 [fscache] process_one_work+0x1f1/0x390 worker_thread+0x53/0x3e0 kthread+0x127/0x150 Fixes: 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-02drm/i915: fix blank screen booting crashesHugh Dickins
5.15-rc1 crashes with blank screen when booting up on two ThinkPads using i915. Bisections converge convincingly, but arrive at different and suprising "culprits", none of them the actual culprit. netconsole (with init_netconsole() hacked to call i915_init() when logging has started, instead of by module_init()) tells the story: kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify(). I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that function needs to be 4-byte aligned. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-02hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary ↵Nadezda Lutovinova
structure field If driver read tmp value sufficient for (tmp & 0x08) && (!(tmp & 0x80)) && ((tmp & 0x7) == ((tmp >> 4) & 0x7)) from device then Null pointer dereference occurs. (It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers) Also lm75[] does not serve a purpose anymore after switching to devm_i2c_new_dummy_device() in w83791d_detect_subclients(). The patch fixes possible NULL pointer dereference by removing lm75[]. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru> Link: https://lore.kernel.org/r/20210921155153.28098-3-lutovinova@ispras.ru [groeck: Dropped unnecessary continuation lines, fixed multi-line alignments] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-02hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary ↵Nadezda Lutovinova
structure field If driver read val value sufficient for (val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7)) from device then Null pointer dereference occurs. (It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers) Also lm75[] does not serve a purpose anymore after switching to devm_i2c_new_dummy_device() in w83791d_detect_subclients(). The patch fixes possible NULL pointer dereference by removing lm75[]. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru> Link: https://lore.kernel.org/r/20210921155153.28098-2-lutovinova@ispras.ru [groeck: Dropped unnecessary continuation lines, fixed multipline alignment] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-02hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary ↵Nadezda Lutovinova
structure field If driver read val value sufficient for (val & 0x08) && (!(val & 0x80)) && ((val & 0x7) == ((val >> 4) & 0x7)) from device then Null pointer dereference occurs. (It is possible if tmp = 0b0xyz1xyz, where same literals mean same numbers) Also lm75[] does not serve a purpose anymore after switching to devm_i2c_new_dummy_device() in w83791d_detect_subclients(). The patch fixes possible NULL pointer dereference by removing lm75[]. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org Signed-off-by: Nadezda Lutovinova <lutovinova@ispras.ru> Link: https://lore.kernel.org/r/20210921155153.28098-1-lutovinova@ispras.ru [groeck: Dropped unnecessary continuation lines, fixed multi-line alignment] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-02hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controllerVadim Pasternak
Add missed attribute for reading POUT from page 1. It is supported by device, but has been missed in initial commit. Fixes: 2c6fcbb21149 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20210927070740.2149290-1-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-02hwmon: (pmbus/ibm-cffps) max_power_out swap changesBrandon Wyman
The bytes for max_power_out from the ibm-cffps devices differ in byte order for some power supplies. The Witherspoon power supply returns the bytes in MSB/LSB order. The Rainier power supply returns the bytes in LSB/MSB order. The Witherspoon power supply uses version cffps1. The Rainier power supply should use version cffps2. If version is cffps1, swap the bytes before output to max_power_out. Tested: Witherspoon before: 3148. Witherspoon after: 3148. Rainier before: 53255. Rainier after: 2000. Signed-off-by: Brandon Wyman <bjwyman@gmail.com> Reviewed-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/r/20210928205051.1222815-1-bjwyman@gmail.com [groeck: Replaced yoda programming] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-02hwmon: (occ) Fix P10 VRM temp sensorsEddie James
The P10 (temp sensor version 0x10) doesn't do the same VRM status reporting that was used on P9. It just reports the temperature, so drop the check for VRM fru type in the sysfs show function, and don't set the name to "alarm". Fixes: db4919ec86 ("hwmon: (occ) Add new temperature sensor type") Signed-off-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/r/20210929153604.14968-1-eajames@linux.ibm.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-01Merge tag 's390-5.15-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Vasily Gorbik: "One fix for 5.15-rc4: Avoid CIO excessive path-verification requests, which might cause unwanted delays" * tag 's390-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cio: avoid excessive path-verification requests
2021-10-01thermal: Update information in MAINTAINERSRafael J. Wysocki
Because Rui is now going to focus on work that is not related to the maintenance of the thermal subsystem in the kernel, Rafael will start to help Daniel with handling the development process as a new member of the thermal maintainers team. Rui will continue to review patches in that area. The thermal development process flow will change so that the material from the thermal git tree will be merged into the thermal branch of the linux-pm.git tree before going into the mainline. Update the information in MAINTAINERS accordingly. Signed-off-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm fixes from Paolo Bonzini: "Small x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Ensure all migrations are performed when test is affined KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm x86/kvmclock: Move this_cpu_pvti into kvmclock.h selftests: KVM: Don't clobber XMM register when read KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
2021-10-01Merge tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Daniel Vetter: "Dave is out on a long w/e, should be back next week. Nothing nefarious, just a bunch of driver fixes: amdgpu, i915, tegra, and one exynos driver fix" * tag 'drm-fixes-2021-10-01' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix drm/amdgpu: check tiling flags when creating FB on GFX8- drm/amd/display: Pass PCI deviceid into DC drm/amd/display: initialize backlight_ramping_override to false drm/amdgpu: correct initial cp_hqd_quantum for gfx9 drm/amd/display: Fix Display Flicker on embedded panels drm/amdgpu: fix gart.bo pin_count leak drm/i915: Remove warning from the rps worker drm/i915/request: fix early tracepoints drm/i915/guc, docs: Fix pdfdocs build error by removing nested grid gpu: host1x: Plug potential memory leak gpu/host1x: fence: Make spinlock static drm/tegra: uapi: Fix wrong mapping end address in case of disabled IOMMU drm/tegra: dc: Remove unused variables drm/exynos: Make use of the helper function devm_platform_ioremap_resource() drm/i915/gvt: fix the usage of ww lock in gvt scheduler.
2021-10-01io_uring: kill fasyncio_uring-5.15-2021-10-01Pavel Begunkov
We have never supported fasync properly, it would only fire when there is something polling io_uring making it useless. The original support came in through the initial io_uring merge for 5.1. Since it's broken and nobody has reported it, get rid of the fasync bits. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-01Merge tag 'iommu-fixes-v5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Two fixes for the new Apple DART driver to fix a kernel panic and a stale data usage issue - Intel VT-d fix for how PCI device ids are printed * tag 'iommu-fixes-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/dart: Clear sid2group entry when a group is freed iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses iommu/dart: Remove iommu_flush_ops
2021-10-01Merge tag 'exynos-drm-fixes-for-v5.15-rc4' of ↵Daniel Vetter
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes One cleanup - Use devm_platform_ioremap_resource() helper function instead of old one. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210928074158.2942-1-inki.dae@samsung.com
2021-10-01Merge tag 'amd-drm-fixes-5.15-2021-09-29' of ↵Daniel Vetter
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.15-2021-09-29: amdgpu: - gart pin count fix - eDP flicker fix - GFX9 MQD fix - Display fixes - Tiling flags fix for pre-GFX9 - SDMA resume fix for S0ix Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210930023013.5207-1-alexander.deucher@amd.com
2021-10-01Merge tag 'drm-intel-fixes-2021-09-30' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.15-rc4: - Fix GVT scheduler ww lock usage - Fix pdfdocs documentation build - Fix request early tracepoints - Fix an invalid warning from rps worker Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87lf3ev44z.fsf@intel.com
2021-10-01sched: Always inline is_percpu_thread()Peter Zijlstra
vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
2021-10-01sched/fair: Null terminate buffer when updating tunable_scalingMel Gorman
This patch null-terminates the temporary buffer in sched_scaling_write() so kstrtouint() does not return failure and checks the value is valid. Before: $ cat /sys/kernel/debug/sched/tunable_scaling 1 $ echo 0 > /sys/kernel/debug/sched/tunable_scaling -bash: echo: write error: Invalid argument $ cat /sys/kernel/debug/sched/tunable_scaling 1 After: $ cat /sys/kernel/debug/sched/tunable_scaling 1 $ echo 0 > /sys/kernel/debug/sched/tunable_scaling $ cat /sys/kernel/debug/sched/tunable_scaling 0 $ echo 3 > /sys/kernel/debug/sched/tunable_scaling -bash: echo: write error: Invalid argument Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
2021-10-01sched/fair: Add ancestors of unthrottled undecayed cfs_rqMichal Koutný
Since commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") we add cfs_rqs with no runnable tasks but not fully decayed into the load (leaf) list. We may ignore adding some ancestors and therefore breaking tmp_alone_branch invariant. This broke LTP test cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling"). I noticed the named test still fails even with the fix (but with low probability, 1 in ~1000 executions of the test). The reason is when bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly. Fix this by adding ancestors if we notice the unthrottled cfs_rq was added to the load list. Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Odin Ugedal <odin@uged.al> Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
2021-10-01perf/core: fix userpage->time_enabled of inactive eventsSong Liu
Users of rdpmc rely on the mmapped user page to calculate accurate time_enabled. Currently, userpage->time_enabled is only updated when the event is added to the pmu. As a result, inactive event (due to counter multiplexing) does not have accurate userpage->time_enabled. This can be reproduced with something like: /* open 20 task perf_event "cycles", to create multiplexing */ fd = perf_event_open(); /* open task perf_event "cycles" */ userpage = mmap(fd); /* use mmap and rdmpc */ while (true) { time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */ time_enabled_read = read(fd).time_enabled; if (time_enabled_mmap > time_enabled_read) BUG(); } Fix this by updating userpage for inactive events in merge_sched_in. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-and-tested-by: Lucian Grijincu <lucian@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
2021-10-01perf/x86/intel: Update event constraints for ICXKan Liang
According to the latest event list, the event encoding 0xEF is only available on the first 4 counters. Add it into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
2021-10-01perf/x86: Reset destroy callback on event init failureAnand K Mistry
perf_init_event tries multiple init callbacks and does not reset the event state between tries. When x86_pmu_event_init runs, it unconditionally sets the destroy callback to hw_perf_event_destroy. On the next init attempt after x86_pmu_event_init, in perf_try_init_event, if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy callback will be run. However, if the next init didn't set the destroy callback, hw_perf_event_destroy will be run (since the callback wasn't reset). Looking at other pmu init functions, the common pattern is to only set the destroy callback on a successful init. Resetting the callback on failure tries to replicate that pattern. This was discovered after commit f11dd0d80555 ("perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second) run of the perf tool after a reboot results in 0 samples being generated. The extra run of hw_perf_event_destroy results in active_events having an extra decrement on each perf run. The second run has active_events == 0 and every subsequent run has active_events < 0. When active_events == 0, the NMI handler will early-out and not record any samples. Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid
2021-10-01objtool: Teach get_alt_entry() about more relocation typesPeter Zijlstra
Occasionally objtool encounters symbol (as opposed to section) relocations in .altinstructions. Typically they are the alternatives written by elf_add_alternative() as encountered on a noinstr validation run on vmlinux after having already ran objtool on the individual .o files. Basically this is the counterpart of commit 44f6a7c0755d ("objtool: Fix seg fault with Clang non-section symbols"), because when these new assemblers (binutils now also does this) strip the section symbols, elf_add_reloc_to_insn() is forced to emit symbol based relocations. As such, teach get_alt_entry() about different relocation types. Fixes: 9bc0bb50727c ("objtool/x86: Rewrite retpoline thunk calls") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/YVWUvknIEVNkPvnP@hirez.programming.kicks-ass.net
2021-09-30ksmbd: missing check for NULL in convert_to_nt_pathname()Dan Carpenter
The kmalloc() does not have a NULL check. This code can be re-written slightly cleaner to just use the kstrdup(). Fixes: 265fd1991c1d ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30Merge tag 'net-5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
2021-09-30Merge tag 'gpio-fixes-for-v5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "A single fix for the gpio-pca953x driver and two commits updating the MAINTAINERS entries for Mun Yew Tham (GPIO specific) and myself (treewide after a change in professional situation). Summary: - don't ignore I2C errors in gpio-pca953x - update MAINTAINERS entries for Mun Yew Tham and myself" * tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer MAINTAINERS: update my email address gpio: pca953x: do not ignore i2c errors
2021-09-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Not much too exciting here, although two syzkaller bugs that seem to have 9 lives may have finally been squashed. Several core bugs and a batch of driver bug fixes: - Fix compilation problems in qib and hfi1 - Do not corrupt the joined multicast group state when using SEND_ONLY - Several CMA bugs, a reference leak for listening and two syzkaller crashers - Various bug fixes for irdma - Fix a Sleeping while atomic bug in usnic - Properly sanitize kernel pointers in dmesg - Two bugs in the 64b CQE support for hns" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Add the check of the CQE size of the user space RDMA/hns: Fix the size setting error when copying CQE in clean_cq() RDMA/hfi1: Fix kernel pointer leak RDMA/usnic: Lock VF with mutex instead of spinlock RDMA/hns: Work around broken constant propagation in gcc 8 RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests RDMA/cma: Do not change route.addr.src_addr.ss_family RDMA/irdma: Report correct WC error when there are MW bind errors RDMA/irdma: Report correct WC error when transport retry counter is exceeded RDMA/irdma: Validate number of CQ entries on create CQ RDMA/irdma: Skip CQP ring during a reset MAINTAINERS: Update Broadcom RDMA maintainers RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure IB/cma: Do not send IGMP leaves for sendonly Multicast groups IB/qib: Fix clang confusion of NULL pointer comparison
2021-09-30ksmbd: fix transform header validationNamjae Jeon
Validate that the transform and smb request headers are present before checking OriginalMessageSize and SessionId fields. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Ralph Böhme <slow@samba.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Tom Talpey <tom@talpey.com> Acked-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30ksmbd: add buffer validation for SMB2_CREATE_CONTEXTHyunchul Lee
Add buffer validation for SMB2_CREATE_CONTEXT. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Reviewed-by: Ralph Boehme <slow@samba.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30ksmbd: add validation in smb2 negotiateNamjae Jeon
This patch add validation to check request buffer check in smb2 negotiate and fix null pointer deferencing oops in smb3_preauth_hash_rsp() that found from manual test. Cc: Tom Talpey <tom@talpey.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Ralph Böhme <slow@samba.org> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Ralph Boehme <slow@samba.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30ksmbd: add request buffer validation in smb2_set_infoNamjae Jeon
Add buffer validation in smb2_set_info, and remove unused variable in set_file_basic_info. and smb2_set_info infolevel functions take structure pointer argument. Cc: Tom Talpey <tom@talpey.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Ralph Böhme <slow@samba.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Hyunchul Lee <hyc.lee@gmail.com> Reviewed-by: Ralph Boehme <slow@samba.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30ksmbd: use correct basic info level in set_file_basic_info()Namjae Jeon
Use correct basic info level in set/get_file_basic_info(). Reviewed-by: Ralph Boehme <slow@samba.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30af_unix: fix races in sk_peer_pid and sk_peer_cred accessesEric Dumazet
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: stmmac: fix EEE init issue when paired with EEE capable PHYsWong Vee Khee
When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY, and the PHY is advertising EEE by default, we need to enable EEE on the xPCS side too, instead of having user to manually trigger the enabling config via ethtool. Fixed this by adding xpcs_config_eee() call in stmmac_eee_init(). Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet") Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: dev_addr_list: handle first address in __hw_addr_add_exJakub Kicinski
struct dev_addr_list is used for device addresses, unicast addresses and multicast addresses. The first of those needs special handling of the main address - netdev->dev_addr points directly the data of the entry and drivers write to it freely, so we can't maintain it in the rbtree (for now, at least, to be fixed in net-next). Current work around sprinkles special handling of the first address on the list throughout the code but it missed the case where address is being added. First address will not be visible during subsequent adds. Syzbot found a warning where unicast addresses are modified without holding the rtnl lock, tl;dr is that team generates the same modification multiple times, not necessarily when right locks are held. In the repro we have: macvlan -> team -> veth macvlan adds a unicast address to the team. Team then pushes that address down to its memebers (veths). Next something unrelated makes team sync member addrs again, and because of the bug the addr entries get duplicated in the veths. macvlan gets removed, removes its addr from team which removes only one of the duplicated addresses from veths. This removal is done under rtnl. Next syzbot uses iptables to add a multicast addr to team (which does not hold rtnl lock). Team syncs veth addrs, but because veths' unicast list still has the duplicate it will also get sync, even though this update is intended for mc addresses. Again, uc address updates need rtnl lock, boom. Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com Fixes: 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: sched: flower: protect fl_walk() with rcuVlad Buslov
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: introduce and use lock_sock_fast_nested()Paolo Abeni
Syzkaller reported a false positive deadlock involving the nl socket lock and the subflow socket lock: MPTCP: kernel_bind error, err=-98 ============================================ WARNING: possible recursive locking detected 5.15.0-rc1-syzkaller #0 Not tainted -------------------------------------------- syz-executor998/6520 is trying to acquire lock: ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 but task is already holding lock: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-sk_lock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor998/6520: #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 stack backtrace: CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline] check_deadlock kernel/locking/lockdep.c:2987 [inline] validate_chain kernel/locking/lockdep.c:3776 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_fast+0x36/0x100 net/core/sock.c:3229 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 __sock_release net/socket.c:649 [inline] sock_release+0x87/0x1b0 net/socket.c:677 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504 kernel_sendpage net/socket.c:3501 [inline] sock_sendpage+0xe5/0x140 net/socket.c:1003 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xd4/0x140 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x110/0x180 fs/splice.c:936 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891 do_splice_direct+0x1b3/0x280 fs/splice.c:979 do_sendfile+0xae9/0x1240 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1314 [inline] __se_sys_sendfile64 fs/read_write.c:1300 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f215cb69969 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08 R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 the problem originates from uncorrect lock annotation in the mptcp code and is only visible since commit 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations"), but is present since the port-based endpoint support initial implementation. This patch addresses the issue introducing a nested variant of lock_sock_fast() and using it in the relevant code path. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30KVM: selftests: Ensure all migrations are performed when test is affinedSean Christopherson
Rework the CPU selection in the migration worker to ensure the specified number of migrations are performed when the test iteslf is affined to a subset of CPUs. The existing logic skips iterations if the target CPU is not in the original set of possible CPUs, which causes the test to fail if too many iterations are skipped. ==== Test Assertion Failure ==== rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2) pid=10127 tid=10127 errno=4 - Interrupted system call 1 0x00000000004018e5: main at rseq_test.c:227 2 0x00007fcc8fc66bf6: ?? ??:0 3 0x0000000000401959: _start at ??:? Only performed 4 KVM_RUNs, task stalled too much? Calculate the min/max possible CPUs as a cheap "best effort" to avoid high runtimes when the test is affined to a small percentage of CPUs. Alternatively, a list or xarray of the possible CPUs could be used, but even in a horrendously inefficient setup, such optimizations are not needed because the runtime is completely dominated by the cost of migrating the task, and the absolute runtime is well under a minute in even truly absurd setups, e.g. running on a subset of vCPUs in a VM that is heavily overcommited (16 vCPUs per pCPU). Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Reported-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210929234112.1862848-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checksSean Christopherson
Check whether a CPUID entry's index is significant before checking for a matching index to hack-a-fix an undefined behavior bug due to consuming uninitialized data. RESET/INIT emulation uses kvm_cpuid() to retrieve CPUID.0x1, which does _not_ have a significant index, and fails to initialize the dummy variable that doubles as EBX/ECX/EDX output _and_ ECX, a.k.a. index, input. Practically speaking, it's _extremely_ unlikely any compiler will yield code that causes problems, as the compiler would need to inline the kvm_cpuid() call to detect the uninitialized data, and intentionally hose the kernel, e.g. insert ud2, instead of simply ignoring the result of the index comparison. Although the sketchy "dummy" pattern was introduced in SVM by commit 66f7b72e1171 ("KVM: x86: Make register state after reset conform to specification"), it wasn't actually broken until commit 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the order of operations such that "index" was checked before the significant flag. Avoid consuming uninitialized data by reverting to checking the flag before the index purely so that the fix can be easily backported; the offending RESET/INIT code has been refactored, moved, and consolidated from vendor code to common x86 since the bug was introduced. A future patch will directly address the bad RESET/INIT behavior. The undefined behavior was detected by syzbot + KernelMemorySanitizer. BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68 BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline] kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline] kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534 vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788 kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020 Local variable ----dummy@kvm_vcpu_reset created at: kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com Reported-by: Alexander Potapenko <glider@google.com> Fixes: 2a24be79b6b7 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping") Fixes: 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210929222426.1855730-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvmZelin Deng
hv_clock is preallocated to have only HVC_BOOT_ARRAY_SIZE (64) elements; if the PTP_SYS_OFFSET_PRECISE ioctl is executed on vCPUs whose index is 64 of higher, retrieving the struct pvclock_vcpu_time_info pointer with "src = &hv_clock[cpu].pvti" will result in an out-of-bounds access and a wild pointer. Change it to "this_cpu_pvti()" which is guaranteed to be valid. Fixes: 95a3d4454bb1 ("Switch kvmclock data to a PER_CPU variable") Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: <stable@vger.kernel.org> Message-Id: <1632892429-101194-3-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>