summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-17xfs: check inode core when scrubbing metadata filesscrub-fix-checking-gaps_2022-03-17Darrick J. Wong
Metadata files (e.g. realtime bitmaps and quota files) do not show up in the bulkstat output, which means that scrub-by-handle does not work; they can only be checked through a specific scrub type. Therefore, each scrub type calls xchk_metadata_inode_forks to check the metadata for whatever's in the file. Unfortunately, that function doesn't actually check the inode record itself. Refactor the function a bit to make that happen. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: fix inode core scrubber racing with ifree/iallocDarrick J. Wong
If the xfs_iget call during setup for the inode scrubber fails with EINVAL or EFSCORRUPTED, that means that we were unable to create an incore inode either because the inode btree says the ondisk inode is free, or because there's corruption in the inode forks. Either way, we failed to get an incore inode. We'd like to distinguish between real corruption and the ondisk inode being free, because in the second case our work is done. To settle this, we try xfs_imap to see if the ondisk inode is free. For performance reasons, the setup function doesn't grab its own reference to the AGI header for the iget lookup, which means that the setup function can race with another thread that frees the inode, which is how we end up with iget returning EINVAL. Unfortunately, the setup function also doesn't take a reference to the AGI header when it tries the imap, which means that the inode can be reallocated in the mean time. In this case, the scrub function sees that there is no incore inode attached to the scrub context and proclaims that the inode core is corrupt, which is not correct. Fix this by open-coding xchk_get_inode in the setup function and modifying it (1) to grab the AGI buffer if the cached inode cannot be loaded and (2) to retry the iget with our own reference to the AGI. This avoids all the coherence problems outlined above. If we grab the AGI buffer, we keep it until the scrub transaction is torn down. This will be very important for online repair of the ondisk inode, since it will need exclusive access to the AGI to prevent inode allocation activity, and the inode cluster buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: manage inode DONTCACHE status at irele timeDarrick J. Wong
Right now, there are statements scattered all over the online fsck codebase about how we can't use XFS_IGET_DONTCACHE because of concerns about scrub's unusual practice of releasing inodes with transactions held. However, iget is the wrong place to handle this -- the DONTCACHE state doesn't matter at all until we try to *release* the inode, and here we get things wrong in multiple ways: First, if we /do/ have a transaction, we must NOT drop the inode, because the inode could have dirty pages, dropping the inode will trigger writeback, and writeback can trigger a nested transaction. Second, if the inode already had an active reference and the DONTCACHE flag set, the icache hit when scrub grabs another ref will not clear DONTCACHE. This is sort of by design, since DONTCACHE is now used to initiate cache drops so that sysadmins can change a file's access mode between pagecache and DAX. Third, if we do actually have the last active reference to the inode, we can set DONTCACHE to avoid polluting the cache. This is the /one/ case where we actually want that flag. Create an xchk_irele helper to encode all that logic and switch the online fsck code to use it. Since this now means that nearly all scrubbers use the same xfs_iget flags, we can wrap them too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: check the reference counts of gaps in the refcount btreeDarrick J. Wong
Gaps in the reference count btree are also significant -- for these regions, there must not be any overlapping reverse mappings. We don't currently check this, so make the refcount scrubber more complete. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: check quota files for unwritten extentsDarrick J. Wong
Teach scrub to flag quota files containing unwritten extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: block map scrub should handle incore delalloc reservationsDarrick J. Wong
Enhance the block map scrubber to check delayed allocation reservations. Though there are no physical space allocations to check, we do need to make sure that the range of file offsets being mapped are correct, and to bump the lastoff cursor so that key order checking works correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: teach scrub to check for adjacent bmaps when rmap larger than bmapDarrick J. Wong
When scrub is checking file fork mappings against rmap records and the rmap record starts before or ends after the bmap record, check the adjacent bmap records to make sure that they're adjacent to the one we're checking. This helps us to detect cases where the rmaps cover territory that the bmaps do not. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: fix return code when fatal signal encountered during dquot scrubDarrick J. Wong
If the scrub process is sent a fatal signal while we're checking dquots, the predicate for this will set the error code to -EINTR. Don't then squash that into -ECANCELED, because the wrong errno turns up in the trace output. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: online checking of the free rt extent countDarrick J. Wong
Teach the summary count checker to count the number of free realtime extents and compare that to the superblock copy. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: skip fscounters comparisons when the scan is incompleteDarrick J. Wong
If any part of the per-AG summary counter scan loop aborts without collecting all of the data we need, the scrubber's observation data will be invalid. Set the incomplete flag so that we abort the scrub without reporting false corruptions. Document the data dependency here too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: check btree keys reflect the child blockDarrick J. Wong
When scrub is checking a non-root btree block, it should make sure that the keys in the parent btree block accurately capture the keyspace that the child block stores. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: make checking directory dotdot entries more reliableDarrick J. Wong
The current directory parent scrubbing code could be tighter in its execution -- instead of bailing out to userspace after a couple of seconds of waiting for the (alleged) parent directory's IOLOCK while refusing to release the child directory's IOLOCK, we could just cycle both locks until we get both or the child process absorbs a fatal signal. Note that because the usual sequence is to take IOLOCKs before grabbing a transaction, we have to use the _nowait variants on both inodes to avoid an ABBA deadlock. Since parent pointer checking is the only place in scrub that needs this kind of functionality, move it to parent.c as a private function. Furthermore, if the child directory's parent changes during the lock cycling, we know that the new parent has stamped the correct parent into the dotdot entry, so we can conclude that the parent entry is correct. This eliminates an entire source of -EDEADLOCK-based "retry harder" scrub executions. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: teach xfs_btree_has_record to return false if there are gapsDarrick J. Wong
The current implementation of xfs_btree_has_record returns true if it finds /any/ record within the given range. Unfortunately, that's not what the predicate is supposed to do -- it's supposed to test if the /entire/ range is covered by records. Therefore, enhance the routine to check that the first record it encounters starts earlier or at the same point as the low key, the last record ends at or after the same point as the high key, and that there aren't any gaps in the records. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: fix rmap key comparison functionsDarrick J. Wong
Keys for extent interval records in the reverse mapping btree are supposed to be computed as follows: (physical block, owner, fork, is_btree, offset) This provides users the ability to look up a reverse mapping from a file block mapping record -- start with the physical block; then if there are multiple records for the same block, move on to the owner; then the inode fork type; and so on to the file offset. However, the key comparison functions incorrectly remove the fork/bmbt information that's encoded in the on-disk offset. This means that lookup comparisons are only done with: (physical block, owner, offset) This means that queries can return incorrect results. On consistent filesystems this isn't an issue because bmbt blocks and blocks mapped to an attr fork cannot be shared, but this prevents us from detecting incorrect fork and bmbt flag bits in the rmap btree. A previous version of this patch forgot to keep the (un)written state flag masked during the comparison and caused a major regression in 5.9.x since unwritten extent conversion can update an rmap record without requiring key updates. Note that blocks cannot go directly from data fork to attr fork without being deallocated and reallocated, nor can they be added to or removed from a bmbt without a free/alloc cycle, so this should not cause any regressions. Found by fuzzing keys[1].attrfork = ones on xfs/371. Fixes: 4b8ed67794fe ("xfs: add rmap btree operations") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: use per-cpu counters to implement intent drainingDarrick J. Wong
Currently, the intent draining code uses a per-AG atomic counter to keep track of how many writer threads are currently or going to start processing log intent items for that AG. This isn't particularly efficient, since every counter update will dirty the cacheline, and the only code that cares about precise counter values is online scrub, which shouldn't be running all that often. Therefore, substitute the atomic_t for a per-cpu counter with a high batch limit to avoid pingponging cache lines as long as possible. While updates to per-cpu counters are slower in the single-thread case (on the author's system, 12ns vs. 8ns), this quickly reverses itself if there are a lot of CPUs queuing intent items. Because percpu counter summation is slow, this change shifts most of the performance impact to code that calls xfs_drain_wait, which means that online fsck runs a little bit slower to minimize the overhead of regular runtime code. To reduce the runtime overhead even further, use a static branch key to decide if we call wake_up on the drain. For compilers that support jump labels, the call to wake_up is replaced by a nop sled when nobody is waiting for intents to drain. For the few compilers that don't, we pay the cost of reading from a read-mostly atomic counter. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: allow queued AG intents to drain before scrubbingDarrick J. Wong
When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups. When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero. This is a little stupid since transactions can queue intents without taking buffer locks, but it's not the end of the world for scrub to wait (in KILLABLE state) for those transactions. In the next patch we'll improve on this facility, but this patch provides the basic functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: return EINTR when a fatal signal terminates scrubDarrick J. Wong
If the program calling online fsck is terminated with a fatal signal, bail out to userspace by returning EINTR, not EAGAIN. EAGAIN is used by scrubbers to indicate that we should try again with more resources locked, and not to indicate that the operation was cancelled. The miswiring is mostly harmless, but it shows up in the trace data. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: pivot online scrub away from kmem.[ch]Darrick J. Wong
Convert all the online scrub code to use the Linux slab allocator functions directly instead of going through the kmem wrappers. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: standardize GFP flags usage in online scrubDarrick J. Wong
Memory allocation usage is the same throughout online fsck -- we want kernel memory, we have to be able to back out if we can't allocate memory, and we don't want to spray dmesg with memory allocation failure reports. Standardize the GFP flag usage and document these requirements. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: set the buffer type after holding the AG[IF] across trans_rollDarrick J. Wong
Currently, the only way to lock an allocation group is to hold the AGI and AGF buffers. If repair needs to roll the transaction while repairing some AG metadata, it maintains that lock by holding the two buffers across the transaction roll and joins them afterwards. However, repair is not the same as the other parts of XFS that employ this bhold/bjoin sequence, because it's possible that the AGI or AGF buffers are not actually dirty before the roll. In this case, the buffer log item can detach from the buffer, which means that we have to re-set the buffer type in the bli after joining the buffer to the new transaction so that log recovery will know what to do if the fs fails. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: don't track the AGFL buffer in the scrub AG contextDarrick J. Wong
While scrubbing an allocation group, we don't need to hold the AGFL buffer as part of the scrub context. All that is necessary to lock an AG is to hold the AGI and AGF buffers, so fix all the existing users of the AGFL buffer to grab them only when necessary. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17btrfs: fix fallocate to use file_modified to update permissions consistentlyfalloc-permissions-fixes-5.18_2022-03-17Darrick J. Wong
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17ext4: fix fallocate to use file_modified to update permissions consistentlyDarrick J. Wong
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: constify xfs_name_dotdotxfs-fixes-5.18_2022-03-17Darrick J. Wong
The symbol xfs_name_dotdot is a global variable that the xfs codebase uses here and there to look up directory dotdot entries. Currently it's a non-const variable, which means that it's a mutable global variable. So far nobody's abused this to cause problems, but let's use the compiler to enforce that. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: constify the name argument to various directory functionsDarrick J. Wong
Various directory functions do not modify their @name parameter, so mark it const to make that clear. This will enable us to mark the global xfs_name_dotdot variable as const to prevent mischief. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: reserve quota for target dir expansion when renaming filesDarrick J. Wong
XFS does not reserve quota for directory expansion when renaming children into a directory. This means that we don't reject the expansion with EDQUOT when we're at or near a hard limit, which means that unprivileged userspace can use rename() to exceed quota. Rename operations don't always expand the target directory, and we allow a rename to proceed with no space reservation if we don't need to add a block to the target directory to handle the addition. Moreover, the unlink operation on the source directory generally does not expand the directory (you'd have to free a block and then cause a btree split) and it's probably of little consequence to leave the corner case that renaming a file out of a directory can increase its size. As with link and unlink, there is a further bug in that we do not trigger the blockgc workers to try to clear space when we're out of quota. Because rename is its own special tricky animal, we'll patch xfs_rename directly to reserve quota to the rename transaction. We'll leave cleaning up the rest of xfs_rename for the metadata directory tree patchset. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: reserve quota for dir expansion when linking/unlinking filesDarrick J. Wong
XFS does not reserve quota for directory expansion when linking or unlinking children from a directory. This means that we don't reject the expansion with EDQUOT when we're at or near a hard limit, which means that unprivileged userspace can use link()/unlink() to exceed quota. The fix for this is nuanced -- link operations don't always expand the directory, and we allow a link to proceed with no space reservation if we don't need to add a block to the directory to handle the addition. Unlink operations generally do not expand the directory (you'd have to free a block and then cause a btree split) and we can defer the directory block freeing if there is no space reservation. Moreover, there is a further bug in that we do not trigger the blockgc workers to try to clear space when we're out of quota. To fix both cases, create a new xfs_trans_alloc_dir function that allocates the transaction, locks and joins the inodes, and reserves quota for the directory. If there isn't sufficient space or quota, we'll switch the caller to reservationless mode. This should prevent quota usage overruns with the least restriction in functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-17xfs: refactor user/group quota chown in xfs_setattr_nonsizeDarrick J. Wong
Combine if tests to reduce the indentation levels of the quota chown calls in xfs_setattr_nonsize. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org>
2022-03-17xfs: use setattr_copy to set vfs inode attributesDarrick J. Wong
Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid revocation isn't consistent with btrfs[1] or ext4. Those two filesystems use the VFS function setattr_copy to convey certain attributes from struct iattr into the VFS inode structure. Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to decide if it should clear setgid and setuid on a file attribute update. This is a second symptom of the problem that Filipe noticed. XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode, xfs_setattr_nonsize, and xfs_setattr_time. Regrettably, setattr_copy is /not/ a simple copy function; it contains additional logic to clear the setgid bit when setting the mode, and XFS' version no longer matches. The VFS implements its own setuid/setgid stripping logic, which establishes consistent behavior. It's a tad unfortunate that it's scattered across notify_change, should_remove_suid, and setattr_copy but XFS should really follow the Linux VFS. Adapt XFS to use the VFS functions and get rid of the old functions. [1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/ [2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@virtuozzo.com/ Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org>
2022-03-17xfs: don't generate selinux audit messages for capability testingDarrick J. Wong
There are a few places where we test the current process' capability set to decide if we're going to be more or less generous with resource acquisition for a system call. If the process doesn't have the capability, we can continue the call, albeit in a degraded mode. These are /not/ the actual security decisions, so it's not proper to use capable(), which (in certain selinux setups) causes audit messages to get logged. Switch them to has_capability_noaudit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Dave Chinner <david@fromorbit.com>
2022-03-17xfs: add missing cmap->br_state = XFS_EXT_NORM updateGao Xiang
COW extents are already converted into written real extents after xfs_reflink_convert_cow_locked(), therefore cmap->br_state should reflect it. Otherwise, there is another necessary unwritten convertion triggered in xfs_dio_write_end_io() for direct I/O cases. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-13Linux 5.17-rc8v5.17-rc8Linus Torvalds
2022-03-13Merge tag 'x86_urgent_for_v5.17_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Free shmem backing storage for SGX enclave pages when those are swapped back into EPC memory - Prevent do_int3() from being kprobed, to avoid recursion - Remap setup_data and setup_indirect structures properly when accessing their members - Correct the alternatives patching order for modules too * tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Free backing memory after faulting the enclave page x86/traps: Mark do_int3() NOKPROBE_SYMBOL x86/boot: Add setup_indirect support in early_memremap_is_setup_data() x86/boot: Fix memremap of setup_indirect structures x86/module: Fix the paravirt vs alternative order
2022-03-12Merge tag 'perf-tools-fixes-for-v5.17-2022-03-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix event parser error for hybrid systems - Fix NULL check against wrong variable in 'perf bench' and in the parsing code - Update arm64 KVM headers from the kernel sources - Sync cpufeatures header with the kernel sources * tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf parse: Fix event parser error for hybrid systems perf bench: Fix NULL check against wrong variable perf parse-events: Fix NULL check against wrong variable tools headers cpufeatures: Sync with the kernel sources tools kvm headers arm64: Update KVM headers from the kernel sources
2022-03-12Merge tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm kconfig fix from Dave Airlie: "Thorsten pointed out this had fallen down the cracks and was in -next only, I've picked it out, fixed up it's Fixes: line. - fix regression in Kconfig" * tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm: drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP
2022-03-12perf parse: Fix event parser error for hybrid systemsZhengjun Xing
This bug happened on hybrid systems when both cpu_core and cpu_atom have the same event name such as "UOPS_RETIRED.MS" while their event terms are different, then during perf stat, the event for cpu_atom will parse fail and then no output for cpu_atom. UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/ It is because event terms in the "head" of parse_events_multi_pmu_add will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS for cpu_core, then when parsing the same event for cpu_atom, it still uses the event terms for cpu_core, but event terms for cpu_atom are different with cpu_core, the event parses for cpu_atom will fail. This patch fixes it, the event terms should be parsed from the original event. This patch can work for the hybrid systems that have the same event in more than 2 PMUs. It also can work in non-hybrid systems. Before: # perf stat -v -e UOPS_RETIRED.MS -a sleep 1 Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ Control descriptor is not initialized UOPS_RETIRED.MS: 2737845 16068518485 16068518485 Performance counter stats for 'system wide': 2,737,845 cpu_core/UOPS_RETIRED.MS/ 1.002553850 seconds time elapsed After: # perf stat -v -e UOPS_RETIRED.MS -a sleep 1 Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/ Control descriptor is not initialized UOPS_RETIRED.MS: 1977555 16076950711 16076950711 UOPS_RETIRED.MS: 568684 8038694234 8038694234 Performance counter stats for 'system wide': 1,977,555 cpu_core/UOPS_RETIRED.MS/ 568,684 cpu_atom/UOPS_RETIRED.MS/ 1.004758259 seconds time elapsed Fixes: fb0811535e92c6c1 ("perf parse-events: Allow config on kernel PMU events") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12perf bench: Fix NULL check against wrong variableWeiguo Li
We did a NULL check after "epollfdp = calloc(...)", but we checked "epollfd" instead of "epollfdp". Signed-off-by: Weiguo Li <liwg06@foxmail.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/tencent_B5D64530EB9C7DBB8D2C88A0C790F1489D0A@qq.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12perf parse-events: Fix NULL check against wrong variableWeiguo Li
We did a null check after "tmp->symbol = strdup(...)", but we checked "list->symbol" other than "tmp->symbol". Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Weiguo Li <liwg06@foxmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/tencent_DF39269807EC9425E24787E6DB632441A405@qq.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: d45476d983240937 ("x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE") Its just a comment fixup. This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/lkml/YiyiHatGaJQM7l/Y@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12tools kvm headers arm64: Update KVM headers from the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: a5905d6af492ee6a ("KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated") That don't causes any changes in tooling (when built on x86), only addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/lkml/YiyhAK6sVPc83FaI@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDPThomas Zimmermann
As reported in [1], DRM_PANEL_EDP depends on DRM_DP_HELPER. Select the option to fix the build failure. The error message is shown below. arm-linux-gnueabihf-ld: drivers/gpu/drm/panel/panel-edp.o: in function `panel_edp_probe': panel-edp.c:(.text+0xb74): undefined reference to `drm_panel_dp_aux_backlight' make[1]: *** [/builds/linux/Makefile:1222: vmlinux] Error 1 The issue has been reported before, when DisplayPort helpers were hidden behind the option CONFIG_DRM_KMS_HELPER. [2] v2: * fix and expand commit description (Arnd) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 9d6366e743f3 ("drm: fb_helper: improve CONFIG_FB dependency") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://lore.kernel.org/dri-devel/CA+G9fYvN0NyaVkRQmA1O6rX7H8PPaZrUAD7=RDy33QY9rUU-9g@mail.gmail.com/ # [1] Link: https://lore.kernel.org/all/20211117062704.14671-1-rdunlap@infradead.org/ # [2] Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: dri-devel@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20220203093922.20754-1-tzimmermann@suse.de Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-03-11ARM: Spectre-BHB: provide empty stub for non-configRandy Dunlap
When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references to spectre_v2_update_state() cause a build error, so provide an empty stub for that function when the Kconfig option is not set. Fixes this build error: arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init': proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state' arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state' Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11Merge tag 'riscv-for-linus-5.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - prevent users from enabling the alternatives framework (and thus errata handling) on XIP kernels, where runtime code patching does not function correctly. - properly detect offset overflow for AUIPC-based relocations in modules. This may manifest as modules calling arbitrary invalid addresses, depending on the address allocated when a module is loaded. * tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix auipc+jalr relocation range checks riscv: alternative only works on !XIP_KERNEL
2022-03-11Merge tag 'powerpc-5.17-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix STACKTRACE=n build, in particular for skiroot_defconfig" * tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix STACKTRACE=n build
2022-03-11ARM: fix Thumb2 regression with Spectre BHBRussell King (Oracle)
When building for Thumb2, the vectors make use of a local label. Sadly, the Spectre BHB code also uses a local label with the same number which results in the Thumb2 reference pointing at the wrong place. Fix this by changing the number used for the Spectre BHB local label. Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11Merge tag 'mmc-v5.17-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Restore (mostly) the busy polling for MMC_SEND_OP_COND MMC host: - meson-gx: Fix DMA usage of meson_mmc_post_req()" * tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND mmc: meson: Fix usage of meson_mmc_post_req()
2022-03-11x86/sgx: Free backing memory after faulting the enclave pageJarkko Sakkinen
There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this: EPC <-> shmem <-> disk After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed. Sort this out by freeing memory with shmem_truncate_range(), as soon as a page is faulted back to the EPC. In addition, free the memory for PCMD pages as soon as all PCMD's in a page have been marked as unused by zeroing its contents. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
2022-03-11Merge branch 'davidh' (fixes from David Howells)Linus Torvalds
Merge misc fixes from David Howells: "A set of patches for watch_queue filter issues noted by Jann. I've added in a cleanup patch from Christophe Jaillet to convert to using formal bitmap specifiers for the note allocation bitmap. Also two filesystem fixes (afs and cachefiles)" * emailed patches from David Howells <dhowells@redhat.com>: cachefiles: Fix volume coherency attribute afs: Fix potential thrashing in afs writeback watch_queue: Make comment about setting ->defunct more accurate watch_queue: Fix lack of barrier/sync/lock between post and read watch_queue: Free the alloc bitmap when the watch_queue is torn down watch_queue: Fix the alloc bitmap size to reflect notes allocated watch_queue: Use the bitmap API when applicable watch_queue: Fix to always request a pow-of-2 pipe ring size watch_queue: Fix to release page in ->release() watch_queue, pipe: Free watchqueue state after clearing pipe ring watch_queue: Fix filter limit check
2022-03-11cachefiles: Fix volume coherency attributeDavid Howells
A network filesystem may set coherency data on a volume cookie, and if given, cachefiles will store this in an xattr on the directory in the cache corresponding to the volume. The function that sets the xattr just stores the contents of the volume coherency buffer directly into the xattr, with nothing added; the checking function, on the other hand, has a cut'n'paste error whereby it tries to interpret the xattr contents as would be the xattr on an ordinary file (using the cachefiles_xattr struct). This results in a failure to match the coherency data because the buffer ends up being shifted by 18 bytes. Fix this by defining a structure specifically for the volume xattr and making both the setting and checking functions use it. Since the volume coherency doesn't work if used, take the opportunity to insert a reserved field for future use, set it to 0 and check that it is 0. Log mismatch through the appropriate tracepoint. Note that this only affects cifs; 9p, afs, ceph and nfs don't use the volume coherency data at the moment. Fixes: 32e150037dce ("fscache, cachefiles: Store the volume coherency data") Reported-by: Rohith Surabattula <rohiths.msft@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Steve French <smfrench@gmail.com> cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11afs: Fix potential thrashing in afs writebackDavid Howells
In afs_writepages_region(), if the dirty page we find is undergoing writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go round the loop trying the same page again and again with no pausing or waiting unless and until another thread manages to clear the writeback and fscache flags. Fix this with three measures: (1) Advance start to after the page we found. (2) Break out of the loop and return if rescheduling is requested. (3) Arbitrarily give up after a maximum of 5 skips. Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>