summaryrefslogtreecommitdiff
path: root/kernel/rcu/tree_plugin.h
AgeCommit message (Collapse)Author
2015-07-17rcu: Stop disabling CPU hotplug in synchronize_rcu_expedited()Paul E. McKenney
The fact that tasks could be migrated from leaf to root rcu_node structures meant that synchronize_rcu_expedited() had to disable CPU hotplug. However, tasks now stay put, so this commit removes the CPU-hotplug disabling from synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Simplify arithmetic to calculate number of RCU nodesAlexander Gordeev
This update makes arithmetic to calculate number of RCU nodes more straight and easy to read. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-06-22Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather largish update for everything time and timer related: - Cache footprint optimizations for both hrtimers and timer wheel - Lower the NOHZ impact on systems which have NOHZ or timer migration disabled at runtime. - Optimize run time overhead of hrtimer interrupt by making the clock offset updates smarter - hrtimer cleanups and removal of restrictions to tackle some problems in sched/perf - Some more leap second tweaks - Another round of changes addressing the 2038 problem - First step to change the internals of clock event devices by introducing the necessary infrastructure - Allow constant folding for usecs/msecs_to_jiffies() - The usual pile of clockevent/clocksource driver updates The hrtimer changes contain updates to sched, perf and x86 as they depend on them plus changes all over the tree to cleanup API changes and redundant code, which got copied all over the place. The y2038 changes touch s390 to remove the last non 2038 safe code related to boot/persistant clock" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) clocksource: Increase dependencies of timer-stm32 to limit build wreckage timer: Minimize nohz off overhead timer: Reduce timer migration overhead if disabled timer: Stats: Simplify the flags handling timer: Replace timer base by a cpu index timer: Use hlist for the timer wheel hash buckets timer: Remove FIFO "guarantee" timers: Sanitize catchup_timer_jiffies() usage hrtimer: Allow hrtimer::function() to free the timer seqcount: Introduce raw_write_seqcount_barrier() seqcount: Rename write_seqcount_barrier() hrtimer: Fix hrtimer_is_queued() hole hrtimer: Remove HRTIMER_STATE_MIGRATE selftest: Timers: Avoid signal deadlock in leap-a-day timekeeping: Copy the shadow-timekeeper over the real timekeeper last clockevents: Check state instead of mode in suspend/resume path selftests: timers: Add leap-second timer edge testing to leap-a-day.c ntp: Do leapsecond adjustment in adjtimex read path time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge ntp: Introduce and use SECS_PER_DAY macro instead of 86400 ...
2015-06-19timer: Reduce timer migration overhead if disabledThomas Gleixner
Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-27Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', ↵Paul E. McKenney
'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD array.2015.05.27a: Remove all uses of RCU-protected array indexes. doc.2015.05.27a: Docuemntation updates. fixes.2015.05.27a: Miscellaneous fixes. hotplug.2015.05.27a: CPU-hotplug updates. init.2015.05.27a: Initialization/Kconfig updates. tiny.2015.05.27a: Updates to Tiny RCU. torture.2015.05.27a: Torture-testing updates.
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAFPaul E. McKenney
This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUTPaul E. McKenney
This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameterPaul E. McKenney
The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and perhaps only) by rcutorture to verify that RCU works correctly in specific rcu_node combining-tree configurations. It therefore does not make much sense have this as a question to people attempting to configure their kernels. So this commit creates an rcutree.rcu_fanout_exact= boot parameter that rcutorture can use, and eliminates the original CONFIG_RCU_FANOUT_EXACT Kconfig parameter. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Adjust ->lock acquisition for tasks no longer migratingPaul E. McKenney
Tasks are no longer migrated away from a given rcu_node structure when all CPUs corresponding to that rcu_node structure have gone offline. This means that rcu_read_unlock_special() no longer needs to loop retrying rcu_node ->lock acquisition because the current task is guaranteed to stay put. This commit takes a small and paranoid step towards relying on this guarantee by placing a WARN_ON_ONCE() just after the early exit from the lock-acquisition loop. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Fix missing task information during rcu-preempt stallPatrick Daly
The first item list_for_each_entry_continue(alist) iterates over is alist->next, rather than alist itself. Consequently, rcu_print_detail_task_stall_rnp() skips the task referenced by gp_tasks. Use gp_tasks->prev as the argument to list_for_each_entry_continue() instead. Signed-off-by: Patrick Daly <pdaly@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: tree_plugin: Use bool function return values of true/false not 1/0Joe Perches
Use the normal return values for bool functions Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Eliminate a few CONFIG_RCU_NOCB_CPU_ALL #ifdefsPaul E. McKenney
This commit converts several CONFIG_RCU_NOCB_CPU_ALL #ifdefs to instead use IS_ENABLED(). This change should help avoid hiding code from compiler diagnostics. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Create an immutable rcu_data_p pointer to default rcu_data structurePaul E. McKenney
This commit creates an immutable rcu_data_p pointer that references rcu_preempt_data for TREE_PREEMPT_RCU builds and that references rcu_sched_data for TREE_RCU builds. This rcu_data_p pointer will enable more code to move from #ifdef to IS_ENABLED(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Tell the compiler that rcu_state_p is immutablePaul E. McKenney
This commit adds a "const" tag to the declarations of rcu_state_p, which should allow the compiler to generate better code and also to catch erroneous assignments to this variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Eliminate a few RCU_BOOST #ifdefs in favor of IS_ENABLED()Paul E. McKenney
This commit removes a few RCU_BOOST #ifdefs, replacing them with IS_ENABLED()-protected return statements. This relies on the optimizer to remove any resulting dead code. There are several other RCU_BOOST #ifdefs, however these rely on some per-CPU variables that are available only under RCU_BOOST. These might be converted later, if the simplification proves to outweigh the increase in memory footprint. One hoped-for advantage is more easily locating compiler errors in obscure combinations of Kconfig parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-rt-users@vger.kernel.org>
2015-05-27rcu: Convert from rcu_preempt_state to *rcu_state_pPaul E. McKenney
It would be good to move more code from #ifdef to IS_ENABLED(), but that does not work if the body of the IS_ENABLED() "if" statement references a variable (such as rcu_preempt_state) that does not exist if the IS_ENABLED() Kconfig variable is not set. This commit therefore substitutes *rcu_state_p for all uses of rcu_preempt_state in kernel/rcu/tree_preempt.h, which should enable elimination of a few #ifdefs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()Paul E. McKenney
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE() and WRITE_ONCE() APIs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]
2015-04-22tick: Nohz: Rework next timer evaluationThomas Gleixner
The evaluation of the next timer in the nohz code is based on jiffies while all the tick internals are nano seconds based. We have also to convert hrtimer nanoseconds to jiffies in the !highres case. That's just wrong and introduces interesting corner cases. Turn it around and convert the next timer wheel timer expiry and the rcu event to clock monotonic and base all calculations on nanoseconds. That identifies the case where no timer is pending clearly with an absolute expiry value of KTIME_MAX. Makes the code more readable and gets rid of the jiffies magic in the nohz code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-03-20Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', ↵Paul E. McKenney
'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD doc.2015.02.26a: Documentation changes earlycb.2015.03.03a: Permit early-boot RCU callbacks fixes.2015.03.03a: Miscellaneous fixes gpexp.2015.02.26a: In-kernel expediting of normal grace periods hotplug.2015.03.20a: CPU hotplug fixes sysidle.2015.02.26b: NO_HZ_FULL_SYSIDLE fixes tiny.2015.02.26a: TINY_RCU fixes
2015-03-12rcu: Process offlining and onlining only at grace-period startPaul E. McKenney
Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Move rcu_report_unblock_qs_rnp() to common codePaul E. McKenney
The rcu_report_unblock_qs_rnp() function is invoked when the last task blocking the current grace period exits its outermost RCU read-side critical section. Previously, this was called only from rcu_read_unlock_special(), and was therefore defined only when CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at the beginnings of RCU grace periods. The reason for this change is that the last task on a given leaf rcu_node structure's ->blkd_tasks list might well exit its RCU read-side critical section between the time that recent CPU-hotplug operations were applied and when the new grace period was initialized. This situation could result in RCU waiting forever on that leaf rcu_node structure, because if all that structure's CPUs were already offline, there would be no quiescent-state events to drive that structure's part of the grace period. This commit therefore moves rcu_report_unblock_qs_rnp() to common code that is built unconditionally so that the quiescent-state-forcing code can clean up after this situation, avoiding the grace-period stall. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Rework preemptible expedited bitmask handlingPaul E. McKenney
Currently, the rcu_node tree ->expmask bitmasks are initially set to reflect the online CPUs. This is pointless, because only the CPUs preempted within RCU read-side critical sections by the preceding synchronize_sched_expedited() need to be tracked. This commit therefore instead sets up these bitmasks based on the state of the ->blkd_tasks lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Eliminate empty HOTPLUG_CPU ifdefPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Simplify sync_rcu_preempt_exp_init()Paul E. McKenney
This commit eliminates a boolean and associated "if" statement by rearranging the code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Add boot-up check for non-default CONFIG_RCU_FANOUT_LEAF valuesPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Use IS_ENABLED() to simplify rcu_bootup_announce_oddness()Paul E. McKenney
This commit gets rid of some inline #ifdefs by replacing them with IS_ENABLED. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Improve diagnostics for blocked critical sections in irqPaul E. McKenney
If an RCU read-side critical section occurs within an interrupt handler or a softirq handler, it cannot have been preempted. Therefore, there is a check in rcu_read_unlock_special() checking for this error. However, when this check triggers, it lacks diagnostic information. This commit therefore moves rcu_read_unlock()'s lockdep annotation to follow the call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s WARN_ON_ONCE() to an lockdep_rcu_suspicious() in order to locate where the offending RCU read-side critical section began. In addition, the value of the ->rcu_read_unlock_special field is printed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUsPaul E. McKenney
When a CPU is first determined to be a no-CBs CPUs, this commit causes any early boot callbacks to be moved to the no-CBs callback list, allowing them to be invoked. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Tighten up affinity and check for sysidlePaul E. McKenney
If the RCU grace-period kthread invoking rcu_sysidle_check_cpu() happens to be running on the tick_do_timer_cpu initially, then rcu_bind_gp_kthread() won't bind it. This kthread might then migrate before invoking rcu_gp_fqs(), which will trigger the WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes rcu_bind_gp_kthread() do the binding even if the kthread is currently on the same CPU. Because this incurs added overhead, this commit also causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread() once at boot rather than at the beginning of each grace period. And as long as rcu_bind_gp_kthread() is being modified, this commit eliminates its #ifdef. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Update from rcu_expedited variable to rcu_gp_is_expedited()Paul E. McKenney
This commit updates open-coded tests of the rcu_expedited variable to instead use rcu_gp_is_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Refine diagnostics for lacking kthread for no-CBs callbacksPaul E. McKenney
Some diagnostics under CONFIG_PROVE_RCU in rcu_nocb_cpu_needs_barrier() assume that there can be no early-boot callbacks. This commit therefore qualifies the diagnostic with rcu_scheduler_fully_active to permit early boot callbacks to avoid this splat. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-21Merge branches 'core-urgent-for-linus' and 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rcu fix and x86 irq fix from Ingo Molnar: - Fix a bug that caused an RCU warning splat. - Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ registry fix. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Clear need_qs flag to prevent splat * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() x86/irq: Fix regression caused by commit b568b8601f05
2015-02-13rcu: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11rcu: Clear need_qs flag to prevent splatPaul E. McKenney
If the scheduling-clock interrupt sets the current tasks need_qs flag, but if the current CPU passes through a quiescent state in the meantime, then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool RCU into thinking that additional rcu_read_unlock_special() processing is needed. This commit therefore clears the need_qs flag before checking for additional processing. For this problem to occur, we need rcu_preempt_data.passed_quiesce equal to true and current->rcu_read_unlock_special.b.need_qs also equal to true. This condition can occur as follows: 1. CPU 0 is aware of the current preemptible RCU grace period, but has not yet passed through a quiescent state. Among other things, this means that rcu_preempt_data.passed_quiesce is false. 2. Task A running on CPU 0 enters a preemptible RCU read-side critical section. 3. CPU 0 takes a scheduling-clock interrupt, which notices the RCU read-side critical section and the need for a quiescent state, and thus sets current->rcu_read_unlock_special.b.need_qs to true. 4. Task A is preempted, enters the scheduler, eventually invoking rcu_preempt_note_context_switch() which in turn invokes rcu_preempt_qs(). Because rcu_preempt_data.passed_quiesce is false, control enters the body of the "if" statement, which sets rcu_preempt_data.passed_quiesce to true. 5. At this point, CPU 0 takes an interrupt. The interrupt handler contains an RCU read-side critical section, and the rcu_read_unlock() notes that current->rcu_read_unlock_special is nonzero, and thus invokes rcu_read_unlock_special(). 6. Once in rcu_read_unlock_special(), the fact that current->rcu_read_unlock_special.b.need_qs is true becomes apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs(). Recursively, given that we interrupted out of that same function in the preceding step. 7. Because rcu_preempt_data.passed_quiesce is now true, rcu_preempt_qs() does nothing, and simply returns. 8. Upon return to rcu_read_unlock_special(), it is noted that current->rcu_read_unlock_special is still nonzero (because the interrupted rcu_preempt_qs() had not yet gotten around to clearing current->rcu_read_unlock_special.b.need_qs). 9. Execution proceeds to the WARN_ON_ONCE(), which notes that we are in an interrupt handler and thus duly splats. The solution, as noted above, is to make rcu_read_unlock_special() clear out current->rcu_read_unlock_special.b.need_qs after calling rcu_preempt_qs(). The interrupted rcu_preempt_qs() will clear it again, but this is harmless. The worst that happens is that we clobber another attempt to set this field, but this is not a problem because we just got done reporting a quiescent state. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix embarrassing build bug noted by Sasha Levin. ] Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-01-15Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', ↵Paul E. McKenney
'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD doc.2015.01.07a: Documentation updates. fixes.2015.01.15a: Miscellaneous fixes. preempt.2015.01.06a: Changes to handling of lists of preempted tasks. srcu.2015.01.06a: SRCU updates. stall.2015.01.16a: RCU CPU stall-warning updates and fixes. torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15rcu: Optionally run grace-period kthreads at real-time priorityPaul E. McKenney
Recent testing has shown that under heavy load, running RCU's grace-period kthreads at real-time priority can improve performance (according to 0day test robot) and reduce the incidence of RCU CPU stall warnings. However, most systems do just fine with the default non-realtime priorities for these kthreads, and it does not make sense to expose the entire user base to any risk stemming from this change, given that this change is of use only to a few users running extremely heavy workloads. Therefore, this commit allows users to specify realtime priorities for the grace-period kthreads, but leaves them running SCHED_OTHER by default. The realtime priority may be specified at build time via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the rcutree.kthread_prio parameter. Either way, 0 says to continue the default SCHED_OTHER behavior and values from 1-99 specify that priority of SCHED_FIFO behavior. Note that a value of 0 is not permitted when the RCU_BOOST Kconfig parameter is specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10rcutorture: Check from beginning to end of grace periodPaul E. McKenney
Currently, rcutorture's Reader Batch checks measure from the end of the previous grace period to the end of the current one. This commit tightens up these checks by measuring from the start and end of the same grace period. This involves adding rcu_batches_started() and friends corresponding to the existing rcu_batches_completed() and friends. We leave SRCU alone for the moment, as it does not yet have a way of tracking both ends of its grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10rcu: Make _batches_completed() functions return unsigned longPaul E. McKenney
Long ago, the various ->completed fields were of type long, but now are unsigned long due to signed-integer-overflow concerns. However, the various _batches_completed() functions remained of type long, even though their only purpose in life is to return the corresponding ->completed field. This patch cleans this up by changing these functions' return types to unsigned long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Handle gpnum/completed wrap while dyntick idlePaul E. McKenney
Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Make RCU_CPU_STALL_INFO include number of fqs attemptsPaul E. McKenney
One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid ↵Lai Jiangshan
priority-inversion The patch dfeb9765ce3c ("Allow post-unlock reference for rt_mutex") ensured rcu-boost safe even the rt_mutex has post-unlock reference. But rt_mutex allowing post-unlock reference is definitely a bug and it was fixed by the commit 27e35715df54 ("rtmutex: Plug slow unlock race"). This fix made the previous patch (dfeb9765ce3c) useless. And even worse, the priority-inversion introduced by the the previous patch still exists. rcu_read_unlock_special() { rt_mutex_unlock(&rnp->boost_mtx); /* Priority-Inversion: * the current task had been deboosted and preempted as a low * priority task immediately, it could wait long before reschedule in, * and the rcu-booster also waits on this low priority task and sleeps. * This priority-inversion makes rcu-booster can't work * as expected. */ complete(&rnp->boost_completion); } Just revert the patch to avoid it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't bother affinitying rcub kthreads away from offline CPUsPaul E. McKenney
When rcu_boost_kthread_setaffinity() sees that all CPUs for a given rcu_node structure are now offline, it affinities the corresponding RCU-boost ("rcub") kthread away from those CPUs. This is pointless because the kthread cannot run on those offline CPUs in any case. This commit therefore removes this unneeded code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't spawn rcub kthreads on root rcu_node structurePaul E. McKenney
Now that offlining CPUs no longer moves leaf rcu_node structures' ->blkd_tasks lists to the root, there is no way for the root rcu_node structure's ->blkd_task list to be nonempty, unless the root node is also the sole leaf node. This commit therefore refrains from creating an rcub kthread for the root rcu_node structure unless it is also the sole leaf. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Make use of rcu_preempt_has_tasks()Paul E. McKenney
Given that there is now arcu_preempt_has_tasks() function that checks to see if the ->blkd_tasks list is non-empty, this commit makes use of it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't migrate blocked tasks even if all corresponding CPUs offlinePaul E. McKenney
When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it have not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each tasks allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearingPaul E. McKenney
This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit bit clearing up the rcu_node tree once a given rcu_node structure's blkd_tasks list becomes empty. This is the final commit in preparation for the rework of RCU priority boosting: It enables preempted tasks to remain queued on their rcu_node structure even after all of that rcu_node structure's CPUs have gone offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()Paul E. McKenney
This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Rename "empty" to "empty_norm" in preparation for boost reworkPaul E. McKenney
This commit undertakes a simple variable renaming to make way for some rework of RCU priority boosting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE()Paul E. McKenney
This commit prevents random compiler optimizations by applying ACCESS_ONCE() to lockless accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Fix rcu_barrier() race that could result in too-short waitPaul E. McKenney
The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions. It checks a given CPU's lists of callbacks, and if all three no-CBs lists are empty, ignores that CPU. However, these three lists could potentially be empty even when callbacks are present if the check executed just as the callbacks were being moved from one list to another. It turns out that recent versions of rcutorture can spot this race. This commit plugs this hole by consolidating the per-list counts of no-CBs callbacks into a single count, which is incremented before the corresponding callback is posted and after it is invoked. Then rcu_barrier() checks this single count to reliably determine whether the corresponding CPU has no-CBs callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>