summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-03-03Merge branch 'quilt/rr'Stephen Rothwell
2011-03-03Merge remote-tracking branch 'net/master'Stephen Rothwell
Conflicts: drivers/net/bnx2x/bnx2x.h lib/Makefile
2011-03-03Merge remote-tracking branch 'kvm/linux-next'Stephen Rothwell
2011-03-03Merge remote-tracking branch 'sh/sh-latest'Stephen Rothwell
2011-02-28export pid symbols needed for kvm_vcpu_on_spinRik van Riel
Export the symbols required for a race-free kvm_vcpu_on_spin. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-02-28Merge remote branch 'tip/sched/core' into HEADAvi Kivity
* tip/sched/core: sched: Add yield_to(task, preempt) functionality sched: Use a buddy to implement yield_task_fair() sched: Limit the scope of clear_buddies sched: Check the right ->nr_running in yield_task_fair() sched: Avoid expensive initial update_cfs_load(), on UP too sched: Fix switch_from_fair() sched: Simplify the idle scheduling class softirqs: Account ksoftirqd time as cpustat softirq sched: Export ns irqtimes through /proc/stat sched: Refactor account_system_time separating id-update time: Add nsecs_to_cputime64 interface for asm-generic softirqs: Free up pf flag PF_KSOFTIRQD sched: Avoid expensive initial update_cfs_load() sched: Simplify update_cfs_shares parameters Signed-off-by: Avi Kivity <avi@redhat.com>
2011-02-26clockevents: Prevent oneshot mode when broadcast device is periodicThomas Gleixner
When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only can switch into oneshot mode, when the backup broadcast device supports oneshot mode as well. Otherwise we would try to switch the broadcast device into an unsupported mode unconditionally. This went unnoticed so far as the current available broadcast devices support oneshot mode. Seth unearthed this problem while debugging and working around an hpet related BIOS wreckage. Add the necessary check to tick_is_oneshot_available(). Reported-and-tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1102252231200.2701@localhost6.localdomain6> Cc: stable@kernel.org # .21 ->
2011-02-25module: deal with alignment issues in built-in module versionsDmitry Torokhov
On m68k natural alignment is 2-byte boundary but we are trying to align structures in __modver section on sizeof(void *) boundary. This causes trouble when we try to access elements in this section in array-like fashion when create "version" attributes for built-in modules. Moreover, as DaveM said, we can't reliably put structures into independent objects, put them into a special section, and then expect array access over them (via the section boundaries) after linking the objects together to just "work" due to variable alignment choices in different situations. The only solution that seems to work reliably is to make an array of plain pointers to the objects in question and put those pointers in the special section. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-02-22Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now genirq: Prevent access beyond allocated_irqs bitmap
2011-02-22Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix throttle logic perf, x86: P4 PMU: Fix spurious NMI messages
2011-02-19Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c
2011-02-19genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for nowThomas Gleixner
With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler in request_threaded_irq(). The original implementation (commit a304e1b8) called the handler _BEFORE_ it was installed, but that caused problems with handlers calling disable_irq_nosync(). See commit 377bf1e4. It's braindead in the first place to call disable_irq_nosync in shared handlers, but .... Moving this call after we installed the handler looks innocent, but it is very subtle broken on SMP. Interrupt handlers rely on the fact, that the irq core prevents reentrancy. Now this debug call violates that promise because we run the handler w/o the IRQ_INPROGRESS protection - which we cannot apply here because that would result in a possibly forever masked interrupt line. A concurrent real hardware interrupt on a different CPU results in handler reentrancy and can lead to complete wreckage, which was unfortunately observed in reality and took a fricking long time to debug. Leave the code here for now. We want this debug feature, but that's not easy to fix. We really should get rid of those disable_irq_nosync() abusers and remove that function completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: stable@kernel.org # .28 -> .37
2011-02-19genirq: Prevent access beyond allocated_irqs bitmapThomas Gleixner
Lars-Peter Clausen pointed out: I stumbled upon this while looking through the existing archs using SPARSE_IRQ. Even with SPARSE_IRQ the NR_IRQS is still the upper limit for the number of IRQs. Both PXA and MMP set NR_IRQS to IRQ_BOARD_START, with IRQ_BOARD_START being the number of IRQs used by the core. In various machine files the nr_irqs field of the ARM machine defintion struct is then set to "IRQ_BOARD_START + NR_BOARD_IRQS". As a result "nr_irqs" will greater then NR_IRQS which then again causes the "allocated_irqs" bitmap in the core irq code to be accessed beyond its size overwriting unrelated data. The core code really misses a sanity check there. This went unnoticed so far as by chance the compiler/linker places data behind that bitmap which gets initialized later on those affected platforms. So the obvious fix would be to add a sanity check in early_irq_init() and break all affected platforms. Though that check wants to be backported to stable as well, which will require to fix all known problematic platforms and probably some more yet not known ones as well. Lots of churn. A way simpler solution is to allocate a slightly larger bitmap and avoid the whole churn w/o breaking anything. Add a few warnings when an arch returns utter crap. Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # .37 Cc: Haojian Zhuang <haojian.zhuang@marvell.com> Cc: Eric Miao <eric.y.miao@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org>
2011-02-18Merge branch 'fixes-2.6.38' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' workqueue: wake up a worker when a rescuer is leaving a gcwq
2011-02-16PM / Hibernate: Return error code when alloc_image_page() failsStanislaw Gruszka
Currently we return 0 in swsusp_alloc() when alloc_image_page() fails. Fix that. Also remove unneeded "error" variable since the only useful value of error is -ENOMEM. [rjw: Fixed up the changelog and changed subject.] Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Cc: stable@kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-02-16workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies longTejun Heo
MAYDAY_INITIAL_TIMEOUT is defined as HZ / 100 and depending on configuration may end up 0 or 1. Even when it's 1, depending on when the mayday timer is added in the current jiffy interval, it may expire way before a jiffy has passed. Make sure MAYDAY_INITIAL_TIMEOUT is at least two to guarantee that at least a full jiffy has passed before calling rescuers. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ray Jui <rjui@broadcom.com> Cc: stable@kernel.org
2011-02-16workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'Tejun Heo
There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>
2011-02-16perf: Fix throttle logicPeter Zijlstra
It was possible to call pmu::start() on an already running event. In particular this lead so some wreckage as the hrtimer events would re-initialize active timers. This was due to throttled events being activated again by scheduling. Scheduling in a context would add and force start events, resulting in running events with a possible throttle status. The next tick to hit that task will then try to unthrottle the event and call ->start() on an already running event. Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15Merge branches 'core-fixes-for-linus' and 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "lockdep, timer: Fix del_timer_sync() annotation" * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timer debug: Hide kernel addresses via %pK in /proc/timer_list
2011-02-15Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix text_poke_smp_batch() deadlock perf tools: Fix thread_map event synthesizing in top and record watchdog, nmi: Lower the severity of error messages ARM: oprofile: Fix backtraces in timer mode oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends
2011-02-15sh: Enable CONFIG_GCOV_PROFILE_ALL for shChris Smith
This patch enables gcov kernel profiling over the whole kernel for sh. Profiling of specific files individually already worked. A handful of files have to be explicitly excluded from the profiling to avoid breaking things, notably pmb.c. Signed-off-by: Chris Smith <chris.smith@st.com> Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-02-14workqueue: wake up a worker when a rescuer is leaving a gcwqTejun Heo
After executing the matching works, a rescuer leaves the gcwq whether there are more pending works or not. This may decrease the concurrency level to zero and stall execution until a new work item is queued on the gcwq. Make rescuer wake up a regular worker when it leaves a gcwq if there are more works to execute, so that execution isn't stalled. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ray Jui <rjui@broadcom.com> Cc: stable@kernel.org
2011-02-12timer debug: Hide kernel addresses via %pK in /proc/timer_listKees Cook
In the continuing effort to avoid kernel addresses leaking to unprivileged users, this patch switches to %pK for /proc/timer_list reporting. Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110212032125.GA23571@outflux.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: pci: use security_capable() when checking capablities during config space read security: add cred argument to security_capable() tpm_tis: Use timeouts returned from TPM
2011-02-11ptrace: use safer wake up on ptrace_detach()Tejun Heo
The wake_up_process() call in ptrace_detach() is spurious and not interlocked with the tracee state. IOW, the tracee could be running or sleeping in any place in the kernel by the time wake_up_process() is called. This can lead to the tracee waking up unexpectedly which can be dangerous. The wake_up is spurious and should be removed but for now reduce its toxicity by only waking up if the tracee is in TRACED or STOPPED state. This bug can possibly be used as an attack vector. I don't think it will take too much effort to come up with an attack which triggers oops somewhere. Most sleeps are wrapped in condition test loops and should be safe but we have quite a number of places where sleep and wakeup conditions are expected to be interlocked. Although the window of opportunity is tiny, ptrace can be used by non-privileged users and with some loading the window can definitely be extended and exploited. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-11security: add cred argument to security_capable()Chris Wright
Expand security_capable() to include cred, so that it can be usable in a wider range of call sites. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-02-10cap_syslog: accept CAP_SYS_ADMIN for nowLinus Torvalds
In commit ce6ada35bdf7 ("security: Define CAP_SYSLOG") Serge Hallyn introduced CAP_SYSLOG, but broke backwards compatibility by no longer accepting CAP_SYS_ADMIN as an override (it would cause a warning and then reject the operation). Re-instate CAP_SYS_ADMIN - but keeping the warning - as an acceptable capability until any legacy applications have been updated. There are apparently applications out there that drop all capabilities except for CAP_SYS_ADMIN in order to access the syslog. (This is a re-implementation of a patch by Serge, cleaning the logic up and making the code more readable) Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-10watchdog, nmi: Lower the severity of error messagesDon Zickus
During boot if the hardlockup detector fails to initialize, it complains very loudly. Some failures should be expected under certain situations, ie no lapics, or resource in-use. Tone those error messages down a bit. Keep the rest at a high level. Reported-by: Paul Bolle <pebolle@tiscali.nl> Tested-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1297278153-21111-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-09Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cdrom: support devices that have check_events but not media_changed cfq-iosched: Don't wait if queue already has requests. blkio-throttle: Avoid calling blkiocg_lookup_group() for root group cfq: rename a function to give it more appropriate name cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily. drivers/block/aoe/Makefile: replace the use of <module>-objs with <module>-y loop: queue_lock NULL pointer derefence in blk_throtl_exit drivers/block/Makefile: replace the use of <module>-objs with <module>-y blktrace: Don't output messages if NOTIFY isn't set.
2011-02-08Revert "lockdep, timer: Fix del_timer_sync() annotation"Peter Zijlstra
Both attempts at trying to allow softirq usage for del_timer_sync() failed (produced bogus warnings), so revert the commit for this release: f266a5110d45: lockdep, timer: Fix del_timer_sync() annotation and try again later. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yong Zhang <yong.zhang0@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1297174680.13327.107.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-07CRED: Fix memory and refcount leaks upon security_prepare_creds() failureTetsuo Handa
In prepare_kernel_cred() since 2.6.29, put_cred(new) is called without assigning new->usage when security_prepare_creds() returned an error. As a result, memory for new and refcount for new->{user,group_info,tgcred} are leaked because put_cred(new) won't call __put_cred() unless old->usage == 1. Fix these leaks by assigning new->usage (and new->subscribers which was added in 2.6.32) before calling security_prepare_creds(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-07CRED: Fix BUG() upon security_cred_alloc_blank() failureTetsuo Handa
In cred_alloc_blank() since 2.6.32, abort_creds(new) is called with new->security == NULL and new->magic == 0 when security_cred_alloc_blank() returns an error. As a result, BUG() will be triggered if SELinux is enabled or CONFIG_DEBUG_CREDENTIALS=y. If CONFIG_DEBUG_CREDENTIALS=y, BUG() is called from __invalid_creds() because cred->magic == 0. Failing that, BUG() is called from selinux_cred_free() because selinux_cred_free() is not expecting cred->security == NULL. This does not affect smack_cred_free(), tomoyo_cred_free() or apparmor_cred_free(). Fix these bugs by (1) Set new->magic before calling security_cred_alloc_blank(). (2) Handle null cred->security in creds_are_invalid() and selinux_cred_free(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-06Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep, timer: Fix del_timer_sync() annotation RTC: Prevents a division by zero in kernel code.
2011-02-04Merge branch 'tip/perf/urgent-2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2011-02-04lockdep, timer: Fix del_timer_sync() annotationPeter Zijlstra
Calling local_bh_enable() will want to actually start processing softirqs, which isn't a good idea since this can get called with IRQs disabled. Cure this by using _local_bh_enable() which doesn't start processing softirqs, and use raw_local_irq_save() to avoid any softirqs from happening without letting lockdep think IRQs are in fact disabled. Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> LKML-Reference: <20110203141548.039540914@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-03Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Prevent irq storm on migration
2011-02-03Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix update_curr_rt() sched, docs: Update schedstats documentation to version 15
2011-02-03Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix reading in perf_event_read() watchdog: Don't change watchdog state on read of sysctl watchdog: Fix sysctl consistency watchdog: Fix broken nowatchdog logic perf: Fix Pentium4 raw event validation perf: Fix alloc_callchain_buffers()
2011-02-03tracing: Replace syscall_meta_data struct array with pointer arraySteven Rostedt
Currently the syscall_meta structures for the syscall tracepoints are placed in the __syscall_metadata section, and at link time, the linker makes one large array of all these syscall metadata structures. On boot up, this array is read (much like the initcall sections) and the syscall data is processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the __syscall_metadata section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The __syscall_metadata section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03tracepoints: Fix section alignment using pointer arrayMathieu Desnoyers
Make the tracepoints more robust, making them solid enough to handle compiler changes by not relying on anything based on compiler-specific behavior with respect to structure alignment. Implement an approach proposed by David Miller: use an array of const pointers to refer to the individual structures, and export this pointer array through the linker script rather than the structures per se. It will consume 32 extra bytes per tracepoint (24 for structure padding and 8 for the pointers), but are less likely to break due to compiler changes. History: commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() added the aligned(32) type and variable attribute to the tracepoint structures to deal with gcc happily aligning statically defined structures on 32-byte multiples. One attempt was to use a 8-byte alignment for tracepoint structures by applying both the variable and type attribute to tracepoint structures definitions and declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5. The reason is that the "aligned" attribute only specify the _minimum_ alignment for a structure, leaving both the compiler and the linker free to align on larger multiples. Because tracepoint.c expects the structures to be placed as an array within each section, up-alignment cause NULL-pointer exceptions due to the extra unexpected padding. (this patch applies on top of -tip) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> LKML-Reference: <20110126222622.GA10794@Krystal> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03sched: Add yield_to(task, preempt) functionalityMike Galbraith
Currently only implemented for fair class tasks. Add a yield_to_task method() to the fair scheduling class. allowing the caller of yield_to() to accelerate another thread in it's thread group, task group. Implemented via a scheduler hint, using cfs_rq->next to encourage the target being selected. We can rely on pick_next_entity to keep things fair, so noone can accelerate a thread that has already used its fair share of CPU time. This also means callers should only call yield_to when they really mean it. Calling it too often can result in the scheduler just ignoring the hint. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03sched: Use a buddy to implement yield_task_fair()Rik van Riel
Use the buddy mechanism to implement yield_task_fair. This allows us to skip onto the next highest priority se at every level in the CFS tree, unless doing so would introduce gross unfairness in CPU time distribution. We order the buddy selection in pick_next_entity to check yield first, then last, then next. We need next to be able to override yield, because it is possible for the "next" and "yield" task to be different processen in the same sub-tree of the CFS tree. When they are, we need to go into that sub-tree regardless of the "yield" hint, and pick the correct entity once we get to the right level. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03sched: Limit the scope of clear_buddiesRik van Riel
The clear_buddies function does not seem to play well with the concept of hierarchical runqueues. In the following tree, task groups are represented by 'G', tasks by 'T', next by 'n' and last by 'l'. (nl) / \ G(nl) G / \ \ T(l) T(n) T This situation can arise when a task is woken up T(n), and the previously running task T(l) is marked last. When clear_buddies is called from either T(l) or T(n), the next and last buddies of the group G(nl) will be cleared. This is not the desired result, since we would like to be able to find the other type of buddy in many cases. This especially a worry when implementing yield_task_fair through the buddy system. The fix is simple: only clear the buddy type that the task itself is indicated to be. As an added bonus, we stop walking up the tree when the buddy has already been cleared or pointed elsewhere. Signed-off-by: Rik van Riel <riel@redhat.coM> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110201094837.6b0962a9@annuminas.surriel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03sched: Check the right ->nr_running in yield_task_fair()Rik van Riel
With CONFIG_FAIR_GROUP_SCHED, each task_group has its own cfs_rq. Yielding to a task from another cfs_rq may be worthwhile, since a process calling yield typically cannot use the CPU right now. Therefor, we want to check the per-cpu nr_running, not the cgroup local one. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110201094715.798c4f86@annuminas.surriel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03sched: Fix update_curr_rt()Peter Zijlstra
cpu_stopper_thread() migration_cpu_stop() __migrate_task() deactivate_task() dequeue_task() dequeue_task_rq() update_curr_rt() Will call update_curr_rt() on rq->curr, which at that time is rq->stop. The problem is that rq->stop.prio matches an RT prio and thus falsely assumes its a rt_sched_class task. Reported-Debuged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: stable@kernel.org # .37 Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03perf: Fix reading in perf_event_read()Peter Zijlstra
It is quite possible for the event to have been disabled between perf_event_read() sending the IPI and the CPU servicing the IPI and calling __perf_event_read(), hence revalidate the state. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-02tracing: Replace trace_event struct array with pointer arraySteven Rostedt
Currently the trace_event structures are placed in the _ftrace_events section, and at link time, the linker makes one large array of all the trace_event structures. On boot up, this array is read (much like the initcall sections) and the events are processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the _ftrace_event section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The _ftrace_event section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-02genirq: Prevent irq storm on migrationThomas Gleixner
move_native_irq() masks and unmasks the interrupt line unconditionally, but the interrupt line might be masked due to a threaded oneshot handler in progress. Unmasking the line in that case can lead to interrupt storms. Observed on PREEMPT_RT. Originally-from: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2011-01-31watchdog: Don't change watchdog state on read of sysctlMarcin Slusarz
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> [ add {}'s to fix a warning ] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> LKML-Reference: <1296230433-6261-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-31watchdog: Fix sysctl consistencyMarcin Slusarz
If it was not possible to enable watchdog for any cpu, switch watchdog_enabled back to 0, because it's visible via kernel.watchdog sysctl. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> LKML-Reference: <1296230433-6261-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>