summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-17block: Bio cancellationaio-idaKent Overstreet
If a bio is associated with a kiocb, allow it to be cancelled. This is accomplished by adding a pointer to a kiocb in struct bio, and when we go to dequeue a request we check if its bio has been cancelled - if so, we end the request with -ECANCELED. We don't currently try to cancel bios if IO has already been started - that'd require a per bio callback function, and a way to find all the outstanding bios for a given kiocb. Such a mechanism may or may not be added in the future but this patch tries to start simple. Currently this can only be triggered with aio and io_cancel(), but the mechanism can be used for sync io too. It can also be used for bios created by stacking drivers, and bio clones in general - when cloning a bio, if the bi_iocb pointer is copied as well the clone will then be cancellable. bio_clone() could be modified to do this, but hasn't in this patch because all the bio_clone() users would need to be auditied to make sure that it's safe. We can't blindly make e.g. raid5 writes cancellable without the knowledge of the md code. Initial patch by Anatol Pomazau (anatol@google.com). Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17direct-io: Set dio->io_error directlyKent Overstreet
The way io errors are returned in the dio code was rather convoluted, and also meant that the specific error code was lost. We need to return the actual error so that for cancellation we can pass up -ECANCELED. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17aio/usb: Update cancellation for new synchonizationKent Overstreet
Previous patch got rid of kiocb->ki_users; this was done by having kiocb_cancel()/aio_complete() explicitly synchronize with each other. The new rule is that when a driver calls aio_complete(), after aio_complete() returns ki_cancel cannot be running and it's safe to dispose of kiocb->priv. But, this means ki_cancel() won't be able to call aio_complete() itself, or aio_complete() will deadlock. So, update the driver. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17aio: Allow cancellation without a cancel callback, new kiocb lookupKent Overstreet
This patch does a couple things: * Allows cancellation of any kiocb, even if the driver doesn't implement a ki_cancel callback function. This will be used for block layer cancellation - there, implementing a callback is problematic, but we can implement useful cancellation by just checking if the kicob has been marked as cancelled when it goes to dequeue the request. * Implements a new lookup mechanism for cancellation. Previously, to cancel a kiocb we had to look it up in a linked list, and kiocbs were added to the linked list lazily. But if any kiocb is cancellable, the lazy list adding no longer works, so we need a new mechanism. This is done by allocating kiocbs out of a (lazily allocated) array of pages, which means we can refer to the kiocbs (and iterate over them) with small integers - we use the percpu tag allocation code for allocating individual kiocbs. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17mtip32xx: convert to batch completionKent Overstreet
[asamymuthupa@micron.com: * changes for conversion to bio batch completion from Kent * fix to apply the above changes cleanly on latest mtip32xx code * batch bio completion changes in * mtip_command_cleanup() * mtip_timeout_function() * mtip_handle_tfe()] Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17virtio-blk: convert to batch completionKent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17block, aio: batch completion for bios/kiocbsKent Overstreet
When completing a kiocb, there's some fixed overhead from touching the kioctx's ring buffer the kiocb belongs to. Some newer high end block devices can complete multiple IOs per interrupt, much like many network interfaces have been for some time. This plumbs through infrastructure so we can take advantage of multiple completions at the interrupt level, and complete multiple kiocbs at the same time. Drivers have to be converted to take advantage of this, but it's a simple change and the next patches will convert a few drivers. To use it, an interrupt handler (or any code that completes bios or requests) declares and initializes a struct batch_complete: struct batch_complete batch; batch_complete_init(&batch); Then, instead of calling bio_endio(), it calls bio_endio_batch(bio, err, &batch). This just adds the bio to a list in the batch_complete. At the end, it calls batch_complete(&batch); This completes all the bios all at once, building up a list of kiocbs; then the list of kiocbs are completed all at once. [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: fs/aio.c needs bio.h, move bio_endio_batch() declaration somewhere rational] [akpm@linux-foundation.org: fix warnings] [minchan@kernel.org: fix build error due to bio_endio_batch] [akpm@linux-foundation.org: fix tracepoint in batch_complete()] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17block: prep work for batch completionKent Overstreet
Add a struct batch_complete * argument to bi_end_io; infrastructure to make use of it comes in the next patch. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17aio: convert the ioctx list to radix treeKent Overstreet
On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote: > On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote: > > When using a large number of threads performing AIO operations the > > IOCTX list may get a significant number of entries which will cause > > significant overhead. For example, when running this fio script: > > > > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1 > > blocksize=1024; numjobs=512; thread; loops=100 > > > > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to > > 30% CPU time spent by lookup_ioctx: > > > > 32.51% [guest.kernel] [g] lookup_ioctx > > 9.19% [guest.kernel] [g] __lock_acquire.isra.28 > > 4.40% [guest.kernel] [g] lock_release > > 4.19% [guest.kernel] [g] sched_clock_local > > 3.86% [guest.kernel] [g] local_clock > > 3.68% [guest.kernel] [g] native_sched_clock > > 3.08% [guest.kernel] [g] sched_clock_cpu > > 2.64% [guest.kernel] [g] lock_release_holdtime.part.11 > > 2.60% [guest.kernel] [g] memcpy > > 2.33% [guest.kernel] [g] lock_acquired > > 2.25% [guest.kernel] [g] lock_acquire > > 1.84% [guest.kernel] [g] do_io_submit > > > > This patchs converts the ioctx list to a radix tree. For a performance > > comparison the above FIO script was run on a 2 sockets 8 core > > machine. This are the results (average and %rsd of 10 runs) for the > > original list based implementation and for the radix tree based > > implementation: > > > > cores 1 2 4 8 16 32 > > list 109376 ms 69119 ms 35682 ms 22671 ms 19724 ms 16408 ms > > %rsd 0.69% 1.15% 1.17% 1.21% 1.71% 1.43% > > radix 73651 ms 41748 ms 23028 ms 16766 ms 15232 ms 13787 ms > > %rsd 1.19% 0.98% 0.69% 1.13% 0.72% 0.75% > > % of radix > > relative 66.12% 65.59% 66.63% 72.31% 77.26% 83.66% > > to list > > > > To consider the impact of the patch on the typical case of having > > only one ctx per process the following FIO script was run: > > > > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1 > > blocksize=1024; numjobs=1; thread; loops=100 > > > > on the same system and the results are the following: > > > > list 58892 ms > > %rsd 0.91% > > radix 59404 ms > > %rsd 0.81% > > % of radix > > relative 100.87% > > to list > > So, I was just doing some benchmarking/profiling to get ready to send > out the aio patches I've got for 3.11 - and it looks like your patch is > causing a ~1.5% throughput regression in my testing :/ ... <snip> I've got an alternate approach for fixing this wart in lookup_ioctx()... Instead of using an rbtree, just use the reserved id in the ring buffer header to index an array pointing the ioctx. It's not finished yet, and it needs to be tidied up, but is most of the way there. -ben -- "Thought is the essence of where you are now." -- And, a rework of Ben's code, but this was entirely his idea -Kent fs/aio.c | 80 ++++++++++++++++++++++++++++++++++++++++++----- include/linux/mm_types.h | 5 ++ kernel/fork.c | 4 ++ 3 files changed, 81 insertions(+), 8 deletions(-)
2013-06-17aio: Kill ki_dtorKent Overstreet
sock_aio_dtor() is dead code - and stuff that does need to do cleanup can simply do it before calling aio_complete(). Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu>
2013-06-17aio: Kill ki_usersKent Overstreet
The kiocb refcount is only needed for cancellation - to ensure a kiocb isn't freed while a ki_cancel callback is running. But if we restrict ki_cancel callbacks to not block (which they currently don't), we can simply drop the refcount. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu>
2013-06-17aio: Kill unneeded kiocb membersKent Overstreet
The old aio retry infrastucture needed to save the various arguments to to aio operations. But with the retry infrastructure gone, we can trim struct kiocb quite a bit. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu>
2013-06-17aio: Kill aio_rw_vect_retry()Kent Overstreet
This code doesn't serve any purpose anymore, since the aio retry infrastructure has been removed. This change should be safe because aio_read/write are also used for synchronous IO, and called from do_sync_read()/do_sync_write() - and there's no looping done in the sync case (the read and write syscalls). Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17aio: Don't use ctx->tail unnecessarilyKent Overstreet
aio_complete() (arguably) needs to keep its own trusted copy of the tail pointer, but io_getevents() doesn't have to use it - it's already using the head pointer from the ring buffer. So convert it to use the tail from the ring buffer so it touches fewer cachelines and doesn't contend with the cacheline aio_complete() needs. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17aio: io_cancel() no longer returns the io_eventKent Overstreet
Originally, io_event() was documented to return the io_event if cancellation succeeded - the io_event wouldn't be delivered via the ring buffer like it normally would. But this isn't what the implementation was actually doing; the only driver implementing cancellation, the usb gadget code, never returned an io_event in its cancel function. And aio_complete() was recently changed to no longer suppress event delivery if the kiocb had been cancelled. This gets rid of the unused io_event argument to kiocb_cancel() and kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if kiocb->ki_cancel() returned success. Also tweak the refcounting in kiocb_cancel() to make more sense. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17aio: percpu ioctx refcountKent Overstreet
This just converts the ioctx refcount to the new generic dynamic percpu refcount code. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17aio: percpu reqs_availableKent Overstreet
See the previous patch ("aio: reqs_active -> reqs_available") for why we want to do this - this basically implements a per cpu allocator for reqs_available that doesn't actually allocate anything. Note that we need to increase the size of the ringbuffer we allocate, since a single thread won't necessarily be able to use all the reqs_available slots - some (up to about half) might be on other per cpu lists, unavailable for the current thread. We size the ringbuffer based on the nr_events userspace passed to io_setup(), so this is a slight behaviour change - but nr_events wasn't being used as a hard limit before, it was being rounded up to the next page before so this doesn't change the actual semantics. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17aio: reqs_active -> reqs_availableKent Overstreet
The number of outstanding kiocbs is one of the few shared things left that has to be touched for every kiocb - it'd be nice to make it percpu. We can make it per cpu by treating it like an allocation problem: we have a maximum number of kiocbs that can be outstanding (i.e. slots) - then we just allocate and free slots, and we know how to write per cpu allocators. So as prep work for that, we convert reqs_active to reqs_available. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-17aio: Fix io_destroy() regression by using call_rcu()Kent Overstreet
There was a regression introduced by 36f5588, reported by Jens Axboe - the refcounting cleanup switched to using RCU in the shutdown path, but the synchronize_rcu() was done in the context of the io_destroy() syscall greatly increasing the time it could block. This patch switches it to call_rcu() and makes shutdown asynchronous (more asynchronous than it was originally; before the refcount changes io_destroy() would still wait on pending kiocbs). Note that there's a global quota on the max outstanding kiocbs, and that quota must be manipulated synchronously; otherwise io_setup() could return -EAGAIN when there isn't quota available, and userspace won't have any way of waiting until shutdown of the old kioctxs has finished (besides busy looping). So we release our quota before kioctx shutdown has finished, which should be fine since the quota never corresponded to anything real anyways. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org>
2013-06-17idr: Percpu idaKent Overstreet
Percpu frontend for allocating ids. With percpu allocation (that works), it's impossible to guarantee it will always be possible to allocate all nr_tags - typically, some will be stuck on a remote percpu freelist where the current job can't get to them. We do guarantee that it will always be possible to allocate at least (nr_tags / 2) tags - this is done by keeping track of which and how many cpus have tags on their percpu freelists. On allocation failure if enough cpus have tags that there could potentially be (nr_tags / 2) tags stuck on remote percpu freelists, we then pick a remote cpu at random to steal from. Note that the synchronization is _definitely_ tricky - we're using xchg()/cmpxchg() on the percpu lists, to synchronize between steal_tags(). The alternative would've been adding a spinlock to protect the percpu freelists, but that would've required some different tricky code to avoid deadlock because of the lock ordering. Note that there's no cpu hotplug notifier - we don't care, because steal_tags() will eventually get the down cpu's tags. We _could_ satisfy more allocations if we had a notifier - but we'll still meet our guarantees and it's absolutely not a correctness issue, so I don't think it's worth the extra code. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
2013-06-17idr: Rewrite idaKent Overstreet
This is a new, from scratch implementation of ida that should be simpler, faster and more space efficient. Two primary reasons for the rewrite: * A future patch will reimplement idr on top of this ida implementation + radix trees. Once that's done, the end result will be ~1k fewer lines of code, much simpler and easier to understand and it should be quite a bit faster. * The performance improvements and addition of ganged allocation should make ida more suitable for use by a percpu id/tag allocator, which would then act as a frontend to this allocator. The old ida implementation was done with the idr data structures - this was IMO backwards. I'll soon be reimplementing idr on top of this new ida implementation and radix trees - using a separate dedicated data structure for the free ID bitmap should actually make idr faster, and the end result is _significantly_ less code. This implementation conceptually isn't that different from the old one - it's a tree of bitmaps, where one bit in a given node indicates whether or not there are free bits in a child node. The main difference (and advantage) over the old version is that the tree isn't implemented with pointers - it's implemented in an array, like how heaps are implemented, which both better space efficiency and it'll be faster since there's no pointer chasing. This does mean that the entire bitmap is stored in one contiguous memory allocation - and as currently implemented we won't be able to allocate _quite_ as many ids as with the previous implementation. I don't expect this to be an issue in practice since anywhere this is used, an id corresponds to a struct allocation somewher else - we can't allocate an unbounded number of ids, we'll run out of memory somewhere else eventually, and I expect that to be the limiting factor in practice. If a user/use case does come up where this matters I can add some sharding (or perhaps add a separate big_ida implementation) - but the extra complexity would adversely affect performance for the users that don't need > millions of ids, so I intend to leave the implementation as is until if and when this becomes an issue. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Fengguang Wu <fengguang.wu@intel.com>
2013-06-17idr: Rename ida_simple_get() -> ida_alloc_range()Kent Overstreet
The old ida interfaces that didn't do locking have been removed, the "simple" distinction doesn't make sense anymore. Also, add an ida_alloc() wrapper that doesn't take the start and end parameters. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Cameron <jic23@cam.ac.uk> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org>
2013-06-17idr: Convert code to ida_simple_get()Kent Overstreet
A lot of drivers were open coding ida_simple_get() - by converting them we can get rid of (crappy) ida_pre_get(), ida_get_new(), ida_remove() interfaces. The ida_simple_*() interfaces do their own locking, which means we can remove a fair amount of code. Additionally, some code was open coding ida_get_cyclic() (c.f. idr_alloc_cyclic()) - after digging into the git logs this seems to have entirely been a performance optimization. This patch removes the cyclic allocation - cyclic allocation as performance optimization is rather sketchy, and in a couple patches we're reimplementing ida from scratch and the new implementation should be a good deal faster. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Li Zefan <lizefan@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2013-06-03percpu-refcount: Don't use silly cmpxchg()Kent Overstreet
The cmpxchg() was just to ensure the debug check didn't race, which was a bit excessive. The caller is supposed to do the appropriate synchronization, which means percpu_ref_kill() can just do a simple store. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03percpu: implement generic percpu refcountingKent Overstreet
This implements a refcount with similar semantics to atomic_get()/atomic_dec_and_test() - but percpu. It also implements two stage shutdown, as we need it to tear down the percpu counts. Before dropping the initial refcount, you must call percpu_ref_kill(); this puts the refcount in "shutting down mode" and switches back to a single atomic refcount with the appropriate barriers (synchronize_rcu()). It's also legal to call percpu_ref_kill() multiple times - it only returns true once, so callers don't have to reimplement shutdown synchronization. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style tweak] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-04Merge tag 'regulator-v3.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few small fixes for v3.10, documentation things in the core and a few driver bugs." * tag 'regulator-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: palmas: Fix "enable_reg" to point to the correct reg for SMPS10 regulator: palmas: Fix incorrect condition regulator: core: Correct spelling mistake in comment regulator: dbx500: Make local symbol static regulator: Fix kernel-doc generation warnings.
2013-06-04Merge tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs bugfixes from David Kleikamp: "A couple jfs bug fixes for 3.10-rc5" * tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggy: fs/jfs: Add check if journaling to disk has been disabled in lbmRead() jfs: Several bugs in jfs_freeze() and jfs_unfreeze()
2013-06-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k fix from Geert Uytterhoeven: "A boot lock-up on Mac, also destined for stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/mac: Fix unexpected interrupt with CONFIG_EARLY_PRINTK
2013-06-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Recent bug fixes, one of them touches a common code file. It adds two #ifndef/#endif pairs to asm-generic/io.h to be able to override xlate_dev_kmem_ptr and xlate_dev_mem_ptr." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pgtable: Fix gmap notifier address s390/dasd: fix handling of gone paths s390/pgtable: Fix check for pgste/storage key handling arch: s390: appldata: using strncpy() and strnlen() instead of sprintf() s390/smp: lost IPIs on cpu hotplug kernel: Fix s390 absolute memory access for /dev/mem s390/dma: do not call debug_dma after free
2013-06-03Merge branch 'for-3.10-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix for yet another xattr bug which may lead to NULL deref. - A subtle bug in for_each_descendant_pre(). This bug requires quite specific conditions to trigger and isn't too likely to actually happen in the wild, but maybe that just makes it that much more nastier. - A warning message added for silly cgroup re-mount (not -o remount, but unmount followed by mount) behavior. * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: warn about mismatching options of a new mount of an existing hierarchy cgroup: fix a subtle bug in descendant pre-order walk cgroup: initialize xattr before calling d_instantiate()
2013-06-03Merge branch 'for-3.10-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata changes from Tejun Heo: "Nothing too interesting. PCI ID additions, some sata_rcar fixes and a fringe bug fix for DMADIR handling which shouldn't affect any device remotely modern." * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: sata_rcar: fix interrupt handling ahci: add an observed PCI ID for Marvell 88se9172 SATA controller sata_rcar: clear STOP bit in bmdma_start() method libata: make ata_exec_internal_sg honor DMADIR ata_piix: add PCI IDs for Intel BayTail libata: update "Maintained by:" tags
2013-06-02Linux 3.10-rc4Linus Torvalds
2013-06-02sata_rcar: fix interrupt handlingSergei Shtylyov
The driver's interrupt handling code is too picky in deciding whether it should handle an interrupt or not which causes completely unneeded spurious interrupts. Thus make sata_rcar_{ata|serr}_interrupt() *void*; add ATA status register read to sata_rcar_ata_interrupt() to clear an unexpected ATA interrupt -- it doesn't get cleared by writing to the SATAINTSTAT register in the interrupt mode we use. Also, in sata_rcar_ata_interrupt() we should check SATAINTSTAT register only for enabled interrupts and we should clear only those interrupts that we have read as active first time around, because else we have a race and risk clearing an interrupt that can occur between read and write of the SATAINTSTAT register and never registering it... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2013-06-02Merge branch 'for-3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "This patcheset includes fixes for: - the PCI/LBA which brings back the stifb graphics framebuffer console - possible memory overflows in parisc kernel init code - parport support on older GSC machines - avoids that users by mistake enable PARPORT_PC_SUPERIO on parisc - MAINTAINERS file list updates for parisc." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: parport0: fix this legacy no-device port driver! parport_pc: disable PARPORT_PC_SUPERIO on parisc architecture parisc/PCI: lba: fix: convert to pci_create_root_bus() for correct root bus resources (v2) parisc/PCI: Set type for LBA bus_num resource MAINTAINERS: update parisc architecture file list parisc: kernel: using strlcpy() instead of strcpy() parisc: rename "CONFIG_PA7100" to "CONFIG_PA7000" parisc: fix kernel BUG at arch/parisc/include/asm/mmzone.h:50 parisc: memory overflow, 'name' length is too short for using
2013-06-01parisc: parport0: fix this legacy no-device port driver!Helge Deller
Fix the above kernel error from parport_announce_port() on 32bit GSC machines (e.g. B160L). The parport driver requires now a pointer to the device struct. Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parport_pc: disable PARPORT_PC_SUPERIO on parisc architectureHelge Deller
If enabled, CONFIG_PARPORT_PC_SUPERIO scans on PC-like hardware for various super-io chips by accessing i/o ports in a range which will crash any parisc hardware at once. In addition, parisc has it's own incompatible superio chip (CONFIG_SUPERIO), so if we disable PARPORT_PC_SUPERIO completely for parisc we can avoid that people by accident enable the parport_pc superio option too. Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parisc/PCI: lba: fix: convert to pci_create_root_bus() for correct root bus ↵Helge Deller
resources (v2) commit dc7dce280a Author: Bjorn Helgaas <bhelgaas@google.com> Date: Fri Oct 28 16:27:27 2011 -0600 parisc/PCI: lba: convert to pci_create_root_bus() for correct root bus resources Supply root bus resources to pci_create_root_bus() so they're correct immediately. This fixes the problem of "early" and "header" quirks seeing incorrect root bus resources. added tests for elmmio_space.start while it should use elmmio_space.flags. This for example led to incorrect resource assignments and a non-working stifb framebuffer on most parisc machines. LBA 10:1: PCI host bridge to bus 0000:01 pci_bus 0000:01: root bus resource [io 0x12000-0x13fff] (bus address [0x2000-0x3fff]) pci_bus 0000:01: root bus resource [mem 0xfffffffffa000000-0xfffffffffbffffff] (bus address [0xfa000000-0xfbffffff]) pci_bus 0000:01: root bus resource [mem 0xfffffffff4800000-0xfffffffff4ffffff] (bus address [0xf4800000-0xf4ffffff]) pci_bus 0000:01: root bus resource [??? 0x00000001 flags 0x0] Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2013-06-01parisc/PCI: Set type for LBA bus_num resourceBjorn Helgaas
The non-PAT resource probing code failed to set the type of the LBA bus_num resource (30aa80da43 "parisc/PCI: register busn_res for root buses" did the corresponding thing for the PAT case). This causes incorrect resource assignments and a non-working stifb framebuffer on most parisc machines. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01MAINTAINERS: update parisc architecture file listHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parisc: kernel: using strlcpy() instead of strcpy()Chen Gang
'boot_args' is an input args, and 'boot_command_line' has a fix length. So use strlcpy() instead of strcpy() to avoid memory overflow. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parisc: rename "CONFIG_PA7100" to "CONFIG_PA7000"Paul Bolle
There's a Makefile line setting cflags for CONFIG_PA7100. But that Kconfig macro doesn't exist. There is a Kconfig symbol PA7000, which covers both PA7000 and PA7100 processors. So let's use the corresponding Kconfig macro. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parisc: fix kernel BUG at arch/parisc/include/asm/mmzone.h:50Helge Deller
With CONFIG_DISCONTIGMEM=y and multiple physical memory areas, cat /proc/kpageflags triggers this kernel bug: kernel BUG at arch/parisc/include/asm/mmzone.h:50! CPU: 2 PID: 7848 Comm: cat Tainted: G D W 3.10.0-rc3-64bit #44 IAOQ[0]: kpageflags_read0x128/0x238 IAOQ[1]: kpageflags_read0x12c/0x238 RP(r2): proc_reg_read0xbc/0x130 Backtrace: [<00000000402ca2d4>] proc_reg_read0xbc/0x130 [<0000000040235bcc>] vfs_read0xc4/0x1d0 [<0000000040235f0c>] SyS_read0x94/0xf0 [<0000000040105fc0>] syscall_exit0x0/0x14 kpageflags_read() walks through the whole memory, even if some memory areas are physically not available. So, we should better not BUG on an unavailable pfn in pfn_to_nid() but just return the expected value -1 or 0. Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01parisc: memory overflow, 'name' length is too short for usingChen Gang
'path.bc[i]' can be asigned by PCI_SLOT() which can '> 10', so sizeof(6 * "%u:" + "%u" + '\0') may be 21. Since 'name' length is 20, it may be memory overflow. And 'path.bc[i]' is 'unsigned char' for printing, we can be sure the max length of 'name' must be less than 28. So simplify thinking, we can use 28 instead of 20 directly, and do not think of whether 'patchc.bc[i]' can '> 100'. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-01Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "Here are a few more fixes for powerpc 3.10. It's a bit more than I would have liked this late in the game but I suppose that's what happens with a brand new chip generation coming out. A few regression fixes, some last minute fixes for new P8 features such as transactional memory,... There's also one powerpc KVM patch that I requested that adds two missing functions to our in-kernel interrupt controller support which is itself a new 3.10 feature. These are defined by the base hypervisor specification. We didn't implement them originally because Linux doesn't use them but they are simple and I'm not comfortable having a half-implemented interface in 3.10 and having to deal with versionning etc... later when something starts needing those calls. They cannot be emulated in qemu when using in-kernel interrupt controller (not enough shared state). Just added a last minute patch to fix a typo introducing a breakage in our cputable for Power7+ processors, sorry about that, but the regression it fixes just hurt me :-)" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/cputable: Fix typo on P7+ cputable entry powerpc/perf: Add missing SIER support powerpc/perf: Revert to original NO_SIPR logic powerpc/pci: Remove the unused variables in pci_process_bridge_OF_ranges powerpc/pci: Remove the stale comments of pci_process_bridge_OF_ranges powerpc/pseries: Always enable CONFIG_HOTPLUG_CPU on PSERIES SMP powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation powerpc/32bit:Store temporary result in r0 instead of r8 powerpc/mm: Always invalidate tlb on hpte invalidate and update powerpc/pseries: Improve stream generation comments in copypage/user powerpc/pseries: Kill all prefetch streams on context switch powerpc/cputable: Fix oprofile_cpu_type on power8 powerpc/mpic: Fix irq distribution problem when MPIC_SINGLE_DEST_CPU powerpc/tm: Fix userspace stack corruption on signal delivery for active transactions powerpc/tm: Move TM abort cause codes to uapi powerpc/tm: Abort on emulation and alignment faults powerpc/tm: Update cause codes documentation powerpc/tm: Make room for hypervisor in abort cause codes
2013-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull scsi target fixes from Nicholas Bellinger: "The highlights include: - Re-instate sess->wait_list in target_wait_for_sess_cmds() for active I/O shutdown handling in fabrics using se_cmd->cmd_kref - Make ib_srpt call target_sess_cmd_list_set_waiting() during session shutdown - Fix FILEIO off-by-one READ_CAPACITY bug for !S_ISBLK export - Fix iscsi-target login error heap buffer overflow (Kees) - Fix iscsi-target active I/O shutdown handling regression in v3.10-rc1 A big thanks to Kees Cook for fixing a long standing login error buffer overflow bug. All patches are CC'ed to stable with the exception of the v3.10-rc1 specific regression + other minor target cleanup." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iscsi-target: Fix iscsit_free_cmd() se_cmd->cmd_kref shutdown handling target: Propigate up ->cmd_kref put return via transport_generic_free_cmd iscsi-target: fix heap buffer overflow on error target/file: Fix off-by-one READ_CAPACITY bug for !S_ISBLK export ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session target: Re-instate sess_wait_list for target_wait_for_sess_cmds target: Remove unused wait_for_tasks bit in target_wait_for_sess_cmds
2013-06-01Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds
Pull clock subsystem fixes from Mike Turquette: "A mix of small fixes affecting mostly ARM platforms as well as a discrete clock expander chip. Most fixes are corrections to lousy clock data of one form or another." * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux: clk: mxs: Include clk mxs header file clk: vt8500: Fix unbalanced spinlock in vt8500_dclk_set_rate() clk: si5351: Set initial clkout rate when defined in platform data. clk: si5351: Fix clkout rate computation. clk: samsung: Add CLK_IGNORE_UNUSED flag for the sysreg clocks clk: ux500: clk-sysctrl: handle clocks with no parents clk: ux500: Provide device enumeration number suffix for SMSC911x
2013-06-01Merge tag 'fbdev-for-3.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/plagnioj/linux-fbdev Pull fbdev fixes from Jean-Christophe PLAGNIOL-VILLARD: "This contains some small fixes - Atmel LCDC: fix blank the backlight on remove - ps3fb: fix compile warning - OMAPDSS: Fix crash with DT boot" * tag 'fbdev-for-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/plagnioj/linux-fbdev: atmel_lcdfb: blank the backlight on remove trivial: atmel_lcdfb: add missing error message OMAPDSS: Fix crash with DT boot fbdev/ps3fb: fix compile warning
2013-06-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull assorted fixes from Al Viro: "There'll be more - I'm trying to dig out from under the pile of mail (a couple of weeks of something flu-like ;-/) and there's several more things waiting for review; this is just the obvious stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: zoran: racy refcount handling in vm_ops ->open()/->close() befs_readdir(): do not increment ->f_pos if filldir tells us to stop hpfs: deadlock and race in directory lseek() qnx6: qnx6_readdir() has a braino in pos calculation fix buffer leak after "scsi: saner replacements for ->proc_info()" vfs: Fix invalid ida_remove() call
2013-06-01Merge tag 'nfs-for-3.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull two NFS client fixes from Trond Myklebust: - Fix a regression that broke NFS mounting using klibc and busybox - Stable fix to check access modes correctly on NFSv4 delegated open() * tag 'nfs-for-3.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix security flavor negotiation with legacy binary mounts NFSv4: Fix a thinko in nfs4_try_open_cached
2013-06-01powerpc/cputable: Fix typo on P7+ cputable entryWill Schmidt
Fix a typo in setting COMMON_USER2_POWER7 bits to .cpu_user_features2 cpu specs table. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>