author | Stephen Rothwell <sfr@canb.auug.org.au> | 2008-05-19 15:02:58 +1000 |
---|---|---|
committer | Stephen Rothwell <sfr@canb.auug.org.au> | 2008-05-19 15:02:58 +1000 |
commit | 514c5cef81627daad0af9abe63b738b903a80821 | |
tree | 49c235587f8731b996746101d532d9be81912f46 | |
parent | df1464241ca6658e5ba8a920edc16ff497c949b0 | |
parent | fd4d8cd72819a8b7065c9fb871a8758d71d20a0a |
Merge branch 'quilt/rr'

Conflicts:
	include/linux/stop_machine.h
	kernel/stop_machine.c
54 files changed, 776 insertions, 353 deletions
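
Many of the driver hunks in this merge swap `down_trylock()` for `down_nowait()` and invert the test around the call, because the two report success in opposite ways. The kernel-locking documentation hunk states that `down_trylock()` returns 0 when it gets the semaphore, while the converted call sites treat a non-zero result from `down_nowait()` as success. The following is a minimal userspace sketch of that convention only. `toy_sem` and the `toy_*` helpers are hypothetical stand-ins written for illustration; they are single-threaded and are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for a counting semaphore. */
struct toy_sem {
	unsigned int count;	/* number of holders still allowed in */
};

/* Old convention (as documented for down_trylock): 0 on success,
 * non-zero if the semaphore could not be taken. Never blocks. */
static int toy_down_trylock(struct toy_sem *sem)
{
	if (sem->count == 0)
		return 1;
	sem->count--;
	return 0;
}

/* New convention (as the converted callers use down_nowait):
 * true on success, false on failure. Never blocks. */
static bool toy_down_nowait(struct toy_sem *sem)
{
	return toy_down_trylock(sem) == 0;
}

/* Release one holder slot. */
static void toy_up(struct toy_sem *sem)
{
	sem->count++;
}

/* Mirrors the shape of a converted call site:
 *   if (down_trylock(&sem)) busy;   becomes   if (!down_nowait(&sem)) busy;
 * Returns -1 when busy, 0 after taking and releasing the semaphore. */
static int toy_try_work(struct toy_sem *sem)
{
	if (!toy_down_nowait(sem))
		return -1;	/* somebody else holds it */
	/* ... critical section would go here ... */
	toy_up(sem);
	return 0;
}
```

Under this convention a loop such as `while (!down_trylock(...))` becomes `while (down_nowait(...))`, which is the shape of the dm-raid1.c hunk in this diff.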
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 77c42f40be5d..b88315671f5d 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -219,10 +219,10 @@ </para> <sect1 id="lock-intro"> - <title>Three Main Types of Kernel Locks: Spinlocks, Mutexes and Semaphores</title> + <title>Two Main Types of Kernel Locks: Spinlocks and Mutexes</title> <para> - There are three main types of kernel locks. The fundamental type + There are two main types of kernel locks. The fundamental type is the spinlock (<filename class="headerfile">include/asm/spinlock.h</filename>), which is a very simple single-holder lock: if you can't get the @@ -240,14 +240,6 @@ use a spinlock instead. </para> <para> - The third type is a semaphore - (<filename class="headerfile">include/linux/semaphore.h</filename>): it - can have more than one holder at any time (the number decided at - initialization time), although it is most commonly used as a - single-holder lock (a mutex). If you can't get a semaphore, your - task will be suspended and later on woken up - just like for mutexes. - </para> - <para> Neither type of lock is recursive: see <xref linkend="deadlock"/>. </para> @@ -278,7 +270,7 @@ </para> <para> - Semaphores still exist, because they are required for + Mutexes still exist, because they are required for synchronization between <firstterm linkend="gloss-usercontext">user contexts</firstterm>, as we will see below. </para> @@ -289,18 +281,17 @@ <para> If you have a data structure which is only ever accessed from - user context, then you can use a simple semaphore - (<filename>linux/linux/semaphore.h</filename>) to protect it. This - is the most trivial case: you initialize the semaphore to the number - of resources available (usually 1), and call - <function>down_interruptible()</function> to grab the semaphore, and - <function>up()</function> to release it. 
There is also a - <function>down()</function>, which should be avoided, because it + user context, then you can use a simple mutex + (<filename>include/linux/mutex.h</filename>) to protect it. This + is the most trivial case: you initialize the mutex. Then you can + call <function>mutex_lock_interruptible()</function> to grab the mutex, + and <function>mutex_unlock()</function> to release it. There is also a + <function>mutex_lock()</function>, which should be avoided, because it will not return if a signal is received. </para> <para> - Example: <filename>linux/net/core/netfilter.c</filename> allows + Example: <filename>net/netfilter/nf_sockopt.c</filename> allows registration of new <function>setsockopt()</function> and <function>getsockopt()</function> calls, with <function>nf_register_sockopt()</function>. Registration and @@ -515,7 +506,7 @@ <listitem> <para> If you are in a process context (any syscall) and want to - lock other process out, use a semaphore. You can take a semaphore + lock other process out, use a mutex. You can take a mutex and sleep (<function>copy_from_user*(</function> or <function>kmalloc(x,GFP_KERNEL)</function>). </para> @@ -662,7 +653,7 @@ <entry>SLBH</entry> <entry>SLBH</entry> <entry>SLBH</entry> -<entry>DI</entry> +<entry>MLI</entry> <entry>None</entry> </row> @@ -692,8 +683,8 @@ <entry>spin_lock_bh</entry> </row> <row> -<entry>DI</entry> -<entry>down_interruptible</entry> +<entry>MLI</entry> +<entry>mutex_lock_interruptible</entry> </row> </tbody> @@ -703,6 +694,39 @@ </sect1> </chapter> +<chapter id="trylock-functions"> + <title>The trylock Functions</title> + <para> + There are functions that try to acquire a lock only once and immediately + return a value telling about success or failure to acquire the lock. + They can be used if you need no access to the data protected with the lock + when some other thread is holding the lock. You should acquire the lock + later if you then need access to the data protected with the lock. 
+ </para> + + <para> + <function>spin_trylock()</function> does not spin but returns non-zero if + it acquires the spinlock on the first try or 0 if not. This function can + be used in all contexts like <function>spin_lock</function>: you must have + disabled the contexts that might interrupt you and acquire the spin lock. + </para> + + <para> + <function>mutex_trylock()</function> does not suspend your task + but returns non-zero if it could lock the mutex on the first try + or 0 if not. This function cannot be safely used in hardware or software + interrupt contexts despite not sleeping. + </para> + + <para> + <function>down_trylock()</function> does not suspend your task + but returns 0 if it could get the semaphore on the first try or + non-zero if not. The return value is the inverse of that of + <function>spin_trylock()</function> and <function>mutex_trylock() + </function>. <function>down_trylock</function> can be used in all contexts. + </para> +</chapter> + <chapter id="Examples"> <title>Common Examples</title> <para> @@ -1285,7 +1309,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>. <para> There is a coding bug where a piece of code tries to grab a spinlock twice: it will spin forever, waiting for the lock to - be released (spinlocks, rwlocks and semaphores are not + be released (spinlocks, rwlocks and mutexes are not recursive in Linux). This is trivial to diagnose: not a stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem. @@ -1310,7 +1334,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>. <para> This complete lockup is easy to diagnose: on SMP boxes the - watchdog timer or compiling with <symbol>DEBUG_SPINLOCKS</symbol> set + watchdog timer or compiling with <symbol>DEBUG_SPINLOCK</symbol> set (<filename>include/linux/spinlock.h</filename>) will show this up immediately when it happens. </para> @@ -1533,7 +1557,7 @@ the amount of locking which needs to be done. 
<title>Read/Write Lock Variants</title> <para> - Both spinlocks and semaphores have read/write variants: + Both spinlocks and mutexes have read/write variants: <type>rwlock_t</type> and <structname>struct rw_semaphore</structname>. These divide users into two classes: the readers and the writers. If you are only reading the data, you can get a read lock, but to write to @@ -1656,7 +1680,7 @@ the amount of locking which needs to be done. #include <linux/slab.h> #include <linux/string.h> +#include <linux/rcupdate.h> - #include <linux/semaphore.h> + #include <linux/mutex.h> #include <asm/errno.h> struct object @@ -1888,7 +1912,7 @@ machines due to caching. </listitem> <listitem> <para> - <function> put_user()</function> + <function>put_user()</function> </para> </listitem> </itemizedlist> @@ -1902,13 +1926,13 @@ machines due to caching. <listitem> <para> - <function>down_interruptible()</function> and - <function>down()</function> + <function>mutex_lock_interruptible()</function> and + <function>mutex_lock()</function> </para> <para> - There is a <function>down_trylock()</function> which can be + There is a <function>mutex_trylock()</function> which can be used inside interrupt context, as it will not sleep. - <function>up()</function> will also never sleep. + <function>mutex_unlock()</function> will also never sleep. </para> </listitem> </itemizedlist> @@ -1998,7 +2022,7 @@ machines due to caching. <para> Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is unset, processes in user context inside the kernel would not - preempt each other (ie. you had that CPU until you have it up, + preempt each other (ie. you had that CPU until you gave it up, except for interrupts). 
With the addition of <symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when in user context, higher priority tasks can "cut in": spinlocks diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 3be8ab2a886a..90f9ab0305e8 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -41,6 +41,7 @@ #include "linux/virtio_net.h" #include "linux/virtio_blk.h" #include "linux/virtio_console.h" +#include "linux/virtio_rng.h" #include "linux/virtio_ring.h" #include "asm-x86/bootparam.h" /*L:110 We can ignore the 39 include files we need for this program, but I do @@ -152,9 +153,6 @@ struct virtqueue /* The actual ring of buffers. */ struct vring vring; - /* Last available index we saw. */ - u16 last_avail_idx; - /* The routine to call when the Guest pings us. */ void (*handle_output)(int fd, struct virtqueue *me); }; @@ -196,6 +194,33 @@ static void *_convert(struct iovec *iov, size_t size, size_t align, #define le32_to_cpu(v32) (v32) #define le64_to_cpu(v64) (v64) +/* Is this iovec empty? */ +static bool iov_empty(const struct iovec iov[], unsigned int num_iov) +{ + unsigned int i; + + for (i = 0; i < num_iov; i++) + if (iov[i].iov_len) + return false; + return true; +} + +/* Take len bytes from the front of this iovec. */ +static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len) +{ + unsigned int i; + + for (i = 0; i < num_iov; i++) { + unsigned int used; + + used = iov[i].iov_len < len ? iov[i].iov_len : len; + iov[i].iov_base += used; + iov[i].iov_len -= used; + len -= used; + } + assert(len == 0); +} + /* The device virtqueue descriptors are followed by feature bitmasks. */ static u8 *get_feature_bits(struct device *dev) { @@ -658,19 +683,22 @@ static unsigned get_vq_desc(struct virtqueue *vq, unsigned int *out_num, unsigned int *in_num) { unsigned int i, head; + u16 last_avail; /* Check it isn't doing very strange things with descriptor numbers. 
*/ - if ((u16)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num) + last_avail = vring_last_avail(&vq->vring); + if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num) errx(1, "Guest moved used index from %u to %u", - vq->last_avail_idx, vq->vring.avail->idx); + last_avail, vq->vring.avail->idx); /* If there's nothing new since last we looked, return invalid. */ - if (vq->vring.avail->idx == vq->last_avail_idx) + if (vq->vring.avail->idx == last_avail) return vq->vring.num; /* Grab the next descriptor number they're advertising, and increment * the index we've seen. */ - head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num]; + head = vq->vring.avail->ring[last_avail % vq->vring.num]; + vring_last_avail(&vq->vring)++; /* If their number is silly, that's a fatal mistake. */ if (head >= vq->vring.num) @@ -945,7 +973,7 @@ static void update_device_status(struct device *dev) for (vq = dev->vq; vq; vq = vq->next) { memset(vq->vring.desc, 0, vring_size(vq->config.num, getpagesize())); - vq->last_avail_idx = 0; + vring_last_avail(&vq->vring) = 0; } } else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) { warnx("Device %s configuration FAILED", dev->name); @@ -1105,7 +1133,6 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs, /* Initialize the virtqueue */ vq->next = NULL; - vq->last_avail_idx = 0; vq->dev = dev; /* Initialize the configuration. */ @@ -1161,6 +1188,10 @@ static void add_feature(struct device *dev, unsigned bit) * how we use it. */ static void set_config(struct device *dev, unsigned len, const void *conf) { + /* We always set the VIRTIO_RING_F_PUBLISH_INDICES feature + * bit, so now is a good time to do that. */ + add_feature(dev, VIRTIO_RING_F_PUBLISH_INDICES); + /* Check we haven't overflowed our single page. 
*/ if (device_config(dev) + len > devices.descpage + getpagesize()) errx(1, "Too many devices"); @@ -1235,6 +1266,8 @@ static void setup_console(void) add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd); add_virtqueue(dev, VIRTQUEUE_NUM, handle_console_output); + /* Every device should set this bit. */ + add_feature(dev, VIRTIO_RING_F_PUBLISH_INDICES); verbose("device %u: console\n", devices.device_num++); } /*:*/ @@ -1613,6 +1646,64 @@ static void setup_block_file(const char *filename) verbose("device %u: virtblock %llu sectors\n", devices.device_num, le64_to_cpu(conf.capacity)); } + +/* Our random number generator device reads from /dev/urandom into the Guest's + * input buffers. The usual case is that the Guest doesn't want random numbers + * and so has no buffers although /dev/urandom is still readable, whereas + * console is the reverse. + * + * The same logic applies, however. */ +static bool handle_rng_input(int fd, struct device *dev) +{ + int len; + unsigned int head, in_num, out_num, totlen = 0; + struct iovec iov[dev->vq->vring.num]; + + /* First we need a buffer from the Guests's virtqueue. */ + head = get_vq_desc(dev->vq, iov, &out_num, &in_num); + + /* If they're not ready for input, stop listening to this file + * descriptor. We'll start again once they add an input buffer. */ + if (head == dev->vq->vring.num) + return false; + + if (out_num) + errx(1, "Output buffers in rng?"); + + /* This is why we convert to iovecs: the readv() call uses them, and so + * it reads straight into the Guest's buffer. We loop to make sure we + * fill it. */ + while (!iov_empty(iov, in_num)) { + len = readv(dev->fd, iov, in_num); + if (len <= 0) + err(1, "Read from /dev/urandom gave %i", len); + iov_consume(iov, in_num, len); + totlen += len; + } + + /* Tell the Guest about the new input. */ + add_used_and_trigger(fd, dev->vq, head, totlen); + + /* Everything went OK! */ + return true; +} + +/* And this creates a "hardware" random number device for the Guest. 
*/ +static void setup_rng(void) +{ + struct device *dev; + int fd; + + fd = open_or_die("/dev/urandom", O_RDONLY); + + /* The device responds to return from I/O thread. */ + dev = new_device("rng", VIRTIO_ID_RNG, fd, handle_rng_input); + + /* The device has one virtqueue, where the Guest places inbufs. */ + add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd); + + verbose("device %u: rng\n", devices.device_num++); +} /* That's the end of device setup. */ /*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */ @@ -1683,6 +1774,7 @@ static struct option opts[] = { { "verbose", 0, NULL, 'v' }, { "tunnet", 1, NULL, 't' }, { "block", 1, NULL, 'b' }, + { "rng", 0, NULL, 'r' }, { "initrd", 1, NULL, 'i' }, { NULL }, }; @@ -1757,6 +1849,9 @@ int main(int argc, char *argv[]) case 'b': setup_block_file(optarg); break; + case 'r': + setup_rng(); + break; case 'i': initrd_name = optarg; break; diff --git a/arch/ia64/kernel/salinfo.c b/arch/ia64/kernel/salinfo.c index ecb9eb78d687..b66c51d739f2 100644 --- a/arch/ia64/kernel/salinfo.c +++ b/arch/ia64/kernel/salinfo.c @@ -192,7 +192,7 @@ struct salinfo_platform_oemdata_parms { static void salinfo_work_to_do(struct salinfo_data *data) { - down_trylock(&data->mutex); + down_nowait(&data->mutex); up(&data->mutex); } @@ -309,7 +309,7 @@ salinfo_event_read(struct file *file, char __user *buffer, size_t count, loff_t int i, n, cpu = -1; retry: - if (cpus_empty(data->cpu_event) && down_trylock(&data->mutex)) { + if (cpus_empty(data->cpu_event) && !down_nowait(&data->mutex)) { if (file->f_flags & O_NONBLOCK) return -EAGAIN; if (down_interruptible(&data->mutex)) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 84e064ffee52..f83efec457fe 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -260,6 +260,10 @@ static int virtblk_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER)) blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL); + /* If 
disk is read-only in the host, the guest should obey */ + if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO)) + set_disk_ro(vblk->disk, 1); + /* Host must always specify the capacity. */ vdev->config->get(vdev, offsetof(struct virtio_blk_config, capacity), &cap, sizeof(cap)); @@ -325,7 +329,7 @@ static struct virtio_device_id id_table[] = { static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, - VIRTIO_BLK_F_GEOMETRY, + VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, }; static struct virtio_driver virtio_blk = { diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig index 8d6c2089d2a8..407fe01015a4 100644 --- a/drivers/char/hw_random/Kconfig +++ b/drivers/char/hw_random/Kconfig @@ -112,3 +112,13 @@ config HW_RANDOM_PASEMI If unsure, say Y. +config HW_RANDOM_VIRTIO + tristate "VirtIO Random Number Generator support" + depends on HW_RANDOM && VIRTIO + ---help--- + This driver provides kernel-side support for the virtual Random Number + Generator hardware. + + To compile this driver as a module, choose M here: the + module will be called virtio-rng. If unsure, say N. 
+ diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile index c8b7300e2fb1..b4940ddbb35f 100644 --- a/drivers/char/hw_random/Makefile +++ b/drivers/char/hw_random/Makefile @@ -11,3 +11,4 @@ obj-$(CONFIG_HW_RANDOM_VIA) += via-rng.o obj-$(CONFIG_HW_RANDOM_IXP4XX) += ixp4xx-rng.o obj-$(CONFIG_HW_RANDOM_OMAP) += omap-rng.o obj-$(CONFIG_HW_RANDOM_PASEMI) += pasemi-rng.o +obj-$(CONFIG_HW_RANDOM_VIRTIO) += virtio-rng.o diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c new file mode 100644 index 000000000000..16be243b7ae0 --- /dev/null +++ b/drivers/char/hw_random/virtio-rng.c @@ -0,0 +1,143 @@ +/* + * Randomness driver for virtio + * Copyright (C) 2007, 2008 Rusty Russell IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +#include <linux/err.h> +#include <linux/hw_random.h> +#include <linux/scatterlist.h> +#include <linux/spinlock.h> +#include <linux/virtio.h> +#include <linux/virtio_rng.h> + +/* The host will fill any buffer we give it with sweet, sweet randomness. We + * give it 64 bytes at a time, and the hwrng framework takes it 4 bytes at a + * time. 
*/ +static struct virtqueue *vq; +static u32 random_data[16]; +static unsigned int data_left; +static DECLARE_COMPLETION(have_data); + +static void random_recv_done(struct virtqueue *vq) +{ + int len; + + /* We never get spurious callbacks. */ + if (!vq->vq_ops->get_buf(vq, &len)) + BUG(); + + data_left = len / sizeof(random_data[0]); + complete(&have_data); +} + +static void register_buffer(void) +{ + struct scatterlist sg; + + sg_init_one(&sg, &random_data, sizeof(random_data)); + /* There should always be room for one buffer. */ + if (vq->vq_ops->add_buf(vq, &sg, 0, 1, &random_data) != 0) + BUG(); + vq->vq_ops->kick(vq); +} + +/* At least we don't udelay() in a loop like some other drivers. */ +static int virtio_data_present(struct hwrng *rng, int wait) +{ + if (data_left) + return 1; + + if (!wait) + return 0; + + wait_for_completion(&have_data); + return 1; +} + +/* virtio_data_present() must have succeeded before this is called. */ +static int virtio_data_read(struct hwrng *rng, u32 *data) +{ + BUG_ON(!data_left); + + *data = random_data[--data_left]; + + if (!data_left) { + init_completion(&have_data); + register_buffer(); + } + return sizeof(*data); +} + +static struct hwrng virtio_hwrng = { + .name = "virtio", + .data_present = virtio_data_present, + .data_read = virtio_data_read, +}; + +static int virtrng_probe(struct virtio_device *vdev) +{ + int err; + + /* We expect a single virtqueue. 
*/ + vq = vdev->config->find_vq(vdev, 0, random_recv_done); + if (IS_ERR(vq)) + return PTR_ERR(vq); + + err = hwrng_register(&virtio_hwrng); + if (err) { + vdev->config->del_vq(vq); + return err; + } + + register_buffer(); + return 0; +} + +static void virtrng_remove(struct virtio_device *vdev) +{ + vdev->config->reset(vdev); + hwrng_unregister(&virtio_hwrng); + vdev->config->del_vq(vq); +} + +static struct virtio_device_id id_table[] = { + { VIRTIO_ID_RNG, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct virtio_driver virtio_rng = { + .driver.name = KBUILD_MODNAME, + .driver.owner = THIS_MODULE, + .id_table = id_table, + .probe = virtrng_probe, + .remove = __devexit_p(virtrng_remove), +}; + +static int __init init(void) +{ + return register_virtio_driver(&virtio_rng); +} + +static void __exit fini(void) +{ + unregister_virtio_driver(&virtio_rng); +} +module_init(init); +module_exit(fini); + +MODULE_DEVICE_TABLE(virtio, id_table); +MODULE_DESCRIPTION("Virtio random number driver"); +MODULE_LICENSE("GPL"); diff --git a/drivers/char/snsc.c b/drivers/char/snsc.c index 8fe099a41065..8eba9d177423 100644 --- a/drivers/char/snsc.c +++ b/drivers/char/snsc.c @@ -161,7 +161,7 @@ scdrv_read(struct file *file, char __user *buf, size_t count, loff_t *f_pos) struct subch_data_s *sd = (struct subch_data_s *) file->private_data; /* try to get control of the read buffer */ - if (down_trylock(&sd->sd_rbs)) { + if (!down_nowait(&sd->sd_rbs)) { /* somebody else has it now; * if we're non-blocking, then exit... */ @@ -253,7 +253,7 @@ scdrv_write(struct file *file, const char __user *buf, struct subch_data_s *sd = (struct subch_data_s *) file->private_data; /* try to get control of the write buffer */ - if (down_trylock(&sd->sd_wbs)) { + if (!down_nowait(&sd->sd_wbs)) { /* somebody else has it now; * if we're non-blocking, then exit... 
*/ diff --git a/drivers/char/viotape.c b/drivers/char/viotape.c index 58aad63831f4..bf9dec600b13 100644 --- a/drivers/char/viotape.c +++ b/drivers/char/viotape.c @@ -361,7 +361,7 @@ static ssize_t viotap_write(struct file *file, const char *buf, * semaphore */ if (noblock) { - if (down_trylock(&reqSem)) { + if (!down_nowait(&reqSem)) { ret = -EWOULDBLOCK; goto free_op; } @@ -451,7 +451,7 @@ static ssize_t viotap_read(struct file *file, char *buf, size_t count, * semaphore */ if (noblock) { - if (down_trylock(&reqSem)) { + if (!down_nowait(&reqSem)) { ret = -EWOULDBLOCK; goto free_op; } diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 840ede9ae965..440b91497695 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -890,7 +890,7 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp) return -ENXIO; if (filp->f_flags & O_NONBLOCK) { - if (down_trylock(&port->sm_sem)) { + if (!down_nowait(&port->sm_sem)) { ret = -EAGAIN; goto fail; } diff --git a/drivers/input/serio/hil_mlc.c b/drivers/input/serio/hil_mlc.c index 93a1a6ba216a..4fad29e7768f 100644 --- a/drivers/input/serio/hil_mlc.c +++ b/drivers/input/serio/hil_mlc.c @@ -607,7 +607,7 @@ static inline void hilse_setup_input(hil_mlc *mlc, const struct hilse_node *node do_gettimeofday(&(mlc->instart)); mlc->icount = 15; memset(mlc->ipacket, 0, 16 * sizeof(hil_packet)); - BUG_ON(down_trylock(&mlc->isem)); + BUG_ON(!down_nowait(&mlc->isem)); } #ifdef HIL_MLC_DEBUG @@ -694,7 +694,7 @@ static int hilse_donode(hil_mlc *mlc) out2: write_unlock_irqrestore(&mlc->lock, flags); - if (down_trylock(&mlc->osem)) { + if (!down_nowait(&mlc->osem)) { nextidx = HILSEN_DOZE; break; } diff --git a/drivers/input/serio/hp_sdc_mlc.c b/drivers/input/serio/hp_sdc_mlc.c index 587398f5c9df..b0702bf3c60c 100644 --- a/drivers/input/serio/hp_sdc_mlc.c +++ b/drivers/input/serio/hp_sdc_mlc.c @@ -148,7 +148,7 @@ static int hp_sdc_mlc_in(hil_mlc *mlc, 
suseconds_t timeout) priv = mlc->priv; /* Try to down the semaphore */ - if (down_trylock(&mlc->isem)) { + if (!down_nowait(&mlc->isem)) { struct timeval tv; if (priv->emtestmode) { mlc->ipacket[0] = @@ -186,13 +186,13 @@ static int hp_sdc_mlc_cts(hil_mlc *mlc) priv = mlc->priv; /* Try to down the semaphores -- they should be up. */ - BUG_ON(down_trylock(&mlc->isem)); - BUG_ON(down_trylock(&mlc->osem)); + BUG_ON(!down_nowait(&mlc->isem)); + BUG_ON(!down_nowait(&mlc->osem)); up(&mlc->isem); up(&mlc->osem); - if (down_trylock(&mlc->csem)) { + if (!down_nowait(&mlc->csem)) { if (priv->trans.act.semaphore != &mlc->csem) goto poll; else @@ -229,7 +229,7 @@ static void hp_sdc_mlc_out(hil_mlc *mlc) priv = mlc->priv; /* Try to down the semaphore -- it should be up. */ - BUG_ON(down_trylock(&mlc->osem)); + BUG_ON(!down_nowait(&mlc->osem)); if (mlc->opacket & HIL_DO_ALTER_CTRL) goto do_control; @@ -240,7 +240,7 @@ static void hp_sdc_mlc_out(hil_mlc *mlc) return; } /* Shouldn't be sending commands when loop may be busy */ - BUG_ON(down_trylock(&mlc->csem)); + BUG_ON(!down_nowait(&mlc->csem)); up(&mlc->csem); priv->trans.actidx = 0; @@ -296,7 +296,7 @@ static void hp_sdc_mlc_out(hil_mlc *mlc) priv->tseq[3] = 0; if (mlc->opacket & HIL_CTRL_APE) { priv->tseq[3] |= HP_SDC_LPC_APE_IPF; - down_trylock(&mlc->csem); + down_nowait(&mlc->csem); } enqueue: hp_sdc_enqueue_transaction(&priv->trans); diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c index 8080249957af..4a226b86d181 100644 --- a/drivers/lguest/lguest_device.c +++ b/drivers/lguest/lguest_device.c @@ -27,7 +27,7 @@ static unsigned int dev_index; * __iomem to quieten sparse. 
*/ static inline void *lguest_map(unsigned long phys_addr, unsigned long pages) { - return (__force void *)ioremap(phys_addr, PAGE_SIZE*pages); + return (__force void *)ioremap_cache(phys_addr, PAGE_SIZE*pages); } static inline void lguest_unmap(void *addr) @@ -98,7 +98,8 @@ static u32 lg_get_features(struct virtio_device *vdev) if (in_features[i / 8] & (1 << (i % 8))) features |= (1 << i); - return features; + /* Vring may want to play with the bits it's offered. */ + return vring_transport_features(features); } static void lg_set_features(struct virtio_device *vdev, u32 features) diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index ff05fe893083..d6d248654a5b 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c @@ -587,7 +587,7 @@ static void rh_recovery_prepare(struct region_hash *rh) /* Extra reference to avoid race with rh_stop_recovery */ atomic_inc(&rh->recovery_in_flight); - while (!down_trylock(&rh->recovery_count)) { + while (down_nowait(&rh->recovery_count)) { atomic_inc(&rh->recovery_in_flight); if (__rh_recovery_prepare(rh) <= 0) { atomic_dec(&rh->recovery_in_flight); diff --git a/drivers/net/3c527.c b/drivers/net/3c527.c index 6aca0c640f13..a16c7fa69029 100644 --- a/drivers/net/3c527.c +++ b/drivers/net/3c527.c @@ -576,7 +576,7 @@ static int mc32_command_nowait(struct net_device *dev, u16 cmd, void *data, int int ioaddr = dev->base_addr; int ret = -1; - if (down_trylock(&lp->cmd_mutex) == 0) + if (down_nowait(&lp->cmd_mutex)) { lp->cmd_nonblocking=1; lp->exec_box->mbox=0; diff --git a/drivers/net/irda/sir_dev.c b/drivers/net/irda/sir_dev.c index 6078e03de9a8..e14383c674f9 100644 --- a/drivers/net/irda/sir_dev.c +++ b/drivers/net/irda/sir_dev.c @@ -286,7 +286,7 @@ int sirdev_schedule_request(struct sir_dev *dev, int initial_state, unsigned par IRDA_DEBUG(2, "%s - state=0x%04x / param=%u\n", __FUNCTION__, initial_state, param); - if (down_trylock(&fsm->sem)) { + if (!down_nowait(&fsm->sem)) { if (in_interrupt() || in_atomic() || 
irqs_disabled()) { IRDA_DEBUG(1, "%s(), state machine busy!\n", __FUNCTION__); return -EWOULDBLOCK; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index f926b5ab3d09..a6fde5caa037 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -19,6 +19,7 @@ //#define DEBUG #include <linux/netdevice.h> #include <linux/etherdevice.h> +#include <linux/ethtool.h> #include <linux/module.h> #include <linux/virtio.h> #include <linux/virtio_net.h> @@ -50,6 +51,9 @@ struct virtnet_info /* Receive & send queues. */ struct sk_buff_head recv; struct sk_buff_head send; + + /* Chain pages by the private ptr. */ + struct page *pages; }; static inline struct virtio_net_hdr *skb_vnet_hdr(struct sk_buff *skb) @@ -62,6 +66,23 @@ static inline void vnet_hdr_to_sg(struct scatterlist *sg, struct sk_buff *skb) sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr)); } +static void give_a_page(struct virtnet_info *vi, struct page *page) +{ + page->private = (unsigned long)vi->pages; + vi->pages = page; +} + +static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask) +{ + struct page *p = vi->pages; + + if (p) + vi->pages = (struct page *)p->private; + else + p = alloc_page(gfp_mask); + return p; +} + static void skb_xmit_done(struct virtqueue *svq) { struct virtnet_info *vi = svq->vdev->priv; @@ -76,6 +97,7 @@ static void receive_skb(struct net_device *dev, struct sk_buff *skb, unsigned len) { struct virtio_net_hdr *hdr = skb_vnet_hdr(skb); + int err; if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) { pr_debug("%s: short packet %i\n", dev->name, len); @@ -83,9 +105,24 @@ static void receive_skb(struct net_device *dev, struct sk_buff *skb, goto drop; } len -= sizeof(struct virtio_net_hdr); - BUG_ON(len > MAX_PACKET_LEN); - skb_trim(skb, len); + if (len <= MAX_PACKET_LEN) { + unsigned int i; + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) + give_a_page(dev->priv, skb_shinfo(skb)->frags[i].page); + skb->data_len = 0; + 
skb_shinfo(skb)->nr_frags = 0; + } + + err = pskb_trim(skb, len); + if (err) { + pr_debug("%s: pskb_trim failed %i %d\n", dev->name, len, err); + dev->stats.rx_dropped++; + goto drop; + } + skb->truesize += skb->data_len; + skb->protocol = eth_type_trans(skb, dev); pr_debug("Receiving skb proto 0x%04x len %i type %i\n", ntohs(skb->protocol), skb->len, skb->pkt_type); @@ -146,7 +183,7 @@ static void try_fill_recv(struct virtnet_info *vi) { struct sk_buff *skb; struct scatterlist sg[2+MAX_SKB_FRAGS]; - int num, err; + int num, err, i; sg_init_table(sg, 2+MAX_SKB_FRAGS); for (;;) { @@ -156,6 +193,24 @@ static void try_fill_recv(struct virtnet_info *vi) skb_put(skb, MAX_PACKET_LEN); vnet_hdr_to_sg(sg, skb); + + if (vi->dev->features & NETIF_F_LRO) { + for (i = 0; i < MAX_SKB_FRAGS; i++) { + skb_frag_t *f = &skb_shinfo(skb)->frags[i]; + f->page = get_a_page(vi, GFP_ATOMIC); + if (!f->page) + break; + + f->page_offset = 0; + f->size = PAGE_SIZE; + + skb->data_len += PAGE_SIZE; + skb->len += PAGE_SIZE; + + skb_shinfo(skb)->nr_frags++; + } + } + num = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1; skb_queue_head(&vi->recv, skb); @@ -356,6 +411,22 @@ static int virtnet_close(struct net_device *dev) return 0; } +static int virtnet_set_tx_csum(struct net_device *dev, u32 data) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct virtio_device *vdev = vi->vdev; + + if (data && !virtio_has_feature(vdev, VIRTIO_NET_F_CSUM)) + return -ENOSYS; + + return ethtool_op_set_tx_hw_csum(dev, data); +} + +static struct ethtool_ops virtnet_ethtool_ops = { + .set_tx_csum = virtnet_set_tx_csum, + .set_sg = ethtool_op_set_sg, +}; + static int virtnet_probe(struct virtio_device *vdev) { int err; @@ -375,6 +446,7 @@ static int virtnet_probe(struct virtio_device *vdev) #ifdef CONFIG_NET_POLL_CONTROLLER dev->poll_controller = virtnet_netpoll; #endif + SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops); SET_NETDEV_DEV(dev, &vdev->dev); /* Do we support "hardware" checksums? 
*/ @@ -396,6 +468,12 @@ static int virtnet_probe(struct virtio_device *vdev) dev->features |= NETIF_F_UFO; } + /* If we can receive ANY GSO packets, we must allocate large ones. */ + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) + || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) + || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN)) + dev->features |= NETIF_F_LRO; + /* Configuration may specify what MAC to use. Otherwise random. */ if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { vdev->config->get(vdev, @@ -410,6 +488,7 @@ static int virtnet_probe(struct virtio_device *vdev) vi->dev = dev; vi->vdev = vdev; vdev->priv = vi; + vi->pages = NULL; /* We expect two virtqueues, receive then send. */ vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done); @@ -478,6 +557,10 @@ static void virtnet_remove(struct virtio_device *vdev) vdev->config->del_vq(vi->svq); vdev->config->del_vq(vi->rvq); unregister_netdev(vi->dev); + + while (vi->pages) + __free_pages(get_a_page(vi, GFP_KERNEL), 0); + free_netdev(vi->dev); } @@ -489,7 +572,8 @@ static struct virtio_device_id id_table[] = { static unsigned int features[] = { VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC, VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6, - VIRTIO_NET_F_HOST_ECN, + VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, + VIRTIO_NET_F_GUEST_ECN, /* We don't yet handle UFO input. 
*/ }; static struct virtio_driver virtio_net = { diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c index 17ced37e55ed..e94ec7c00aa7 100644 --- a/drivers/net/wireless/airo.c +++ b/drivers/net/wireless/airo.c @@ -2137,7 +2137,7 @@ static int airo_start_xmit(struct sk_buff *skb, struct net_device *dev) { fids[i] |= (len << 16); priv->xmit.skb = skb; priv->xmit.fid = i; - if (down_trylock(&priv->sem) != 0) { + if (!down_nowait(&priv->sem)) { set_bit(FLAG_PENDING_XMIT, &priv->flags); netif_stop_queue(dev); set_bit(JOB_XMIT, &priv->jobs); @@ -2208,7 +2208,7 @@ static int airo_start_xmit11(struct sk_buff *skb, struct net_device *dev) { fids[i] |= (len << 16); priv->xmit11.skb = skb; priv->xmit11.fid = i; - if (down_trylock(&priv->sem) != 0) { + if (!down_nowait(&priv->sem)) { set_bit(FLAG_PENDING_XMIT11, &priv->flags); netif_stop_queue(dev); set_bit(JOB_XMIT11, &priv->jobs); @@ -2258,7 +2258,7 @@ static struct net_device_stats *airo_get_stats(struct net_device *dev) if (!test_bit(JOB_STATS, &local->jobs)) { /* Get stats out of the card if available */ - if (down_trylock(&local->sem) != 0) { + if (!down_nowait(&local->sem)) { set_bit(JOB_STATS, &local->jobs); wake_up_interruptible(&local->thr_wait); } else @@ -2285,7 +2285,7 @@ static void airo_set_multicast_list(struct net_device *dev) { if ((dev->flags ^ ai->flags) & IFF_PROMISC) { change_bit(FLAG_PROMISC, &ai->flags); - if (down_trylock(&ai->sem) != 0) { + if (!down_nowait(&ai->sem)) { set_bit(JOB_PROMISC, &ai->jobs); wake_up_interruptible(&ai->thr_wait); } else @@ -3213,7 +3213,7 @@ static irqreturn_t airo_interrupt(int irq, void *dev_id) set_bit(FLAG_UPDATE_UNI, &apriv->flags); set_bit(FLAG_UPDATE_MULTI, &apriv->flags); - if (down_trylock(&apriv->sem) != 0) { + if (!down_nowait(&apriv->sem)) { set_bit(JOB_EVENT, &apriv->jobs); wake_up_interruptible(&apriv->thr_wait); } else @@ -7660,7 +7660,7 @@ static struct iw_statistics *airo_get_wireless_stats(struct net_device *dev) if (!test_bit(JOB_WSTATS, 
&local->jobs)) { /* Get stats out of the card if available */ - if (down_trylock(&local->sem) != 0) { + if (!down_nowait(&local->sem)) { set_bit(JOB_WSTATS, &local->jobs); wake_up_interruptible(&local->thr_wait); } else diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c index 289304aab690..f0a09dc91b0c 100644 --- a/drivers/scsi/aacraid/commsup.c +++ b/drivers/scsi/aacraid/commsup.c @@ -490,7 +490,7 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size, * hardware failure has occurred. */ unsigned long count = 36000000L; /* 3 minutes */ - while (down_trylock(&fibptr->event_wait)) { + while (!down_nowait(&fibptr->event_wait)) { int blink; if (--count == 0) { struct aac_queue * q = &dev->queues->queue[AdapNormCmdQueue]; diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c index 325774375837..2f76846e1c60 100644 --- a/drivers/usb/core/usb.c +++ b/drivers/usb/core/usb.c @@ -478,7 +478,7 @@ int usb_lock_device_for_reset(struct usb_device *udev, } } - while (usb_trylock_device(udev) != 0) { + while (!usb_trylock_device(udev)) { /* If we can't acquire the lock after waiting one second, * we're probably deadlocked */ diff --git a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c index 69b0a2754f2a..0caceb4bd29c 100644 --- a/drivers/usb/gadget/inode.c +++ b/drivers/usb/gadget/inode.c @@ -298,7 +298,7 @@ get_ready_ep (unsigned f_flags, struct ep_data *epdata) int val; if (f_flags & O_NONBLOCK) { - if (down_trylock (&epdata->lock) != 0) + if (!down_nowait (&epdata->lock)) goto nonblock; if (epdata->state != STATE_EP_ENABLED) { up (&epdata->lock); diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 13866789b356..f42767243260 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -117,6 +117,11 @@ static int virtio_dev_probe(struct device *_d) set_bit(f, dev->features); } + /* Transport features are always preserved to pass to set_features. 
*/ + for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) + if (device_features & (1 << i)) + set_bit(i, dev->features); + err = drv->probe(dev); if (err) add_status(dev, VIRTIO_CONFIG_S_FAILED); diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 27e9fc9117cd..ac18458eadcf 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -91,10 +91,14 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev) static u32 vp_get_features(struct virtio_device *vdev) { struct virtio_pci_device *vp_dev = to_vp_device(vdev); + u32 features; /* When someone needs more than 32 feature bits, we'll need to * steal a bit to indicate that the rest are somewhere else. */ - return ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES); + features = ioread32(vp_dev->ioaddr + VIRTIO_PCI_HOST_FEATURES); + + /* Vring may want to play with the bits it's offered. */ + return vring_transport_features(features); } /* virtio config->set_features() implementation */ diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 937a49d6772c..eb209cea774d 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -18,6 +18,7 @@ */ #include <linux/virtio.h> #include <linux/virtio_ring.h> +#include <linux/virtio_config.h> #include <linux/device.h> #ifdef DEBUG @@ -52,9 +53,6 @@ struct vring_virtqueue /* Number we've added since last sync. */ unsigned int num_added; - /* Last used index we've seen. */ - u16 last_used_idx; - /* How to notify other side. FIXME: commonalize hcalls! 
*/ void (*notify)(struct virtqueue *vq); @@ -173,12 +171,13 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head) static inline bool more_used(const struct vring_virtqueue *vq) { - return vq->last_used_idx != vq->vring.used->idx; + return vring_last_used(&vq->vring) != vq->vring.used->idx; } static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len) { struct vring_virtqueue *vq = to_vvq(_vq); + struct vring_used_elem *u; void *ret; unsigned int i; @@ -195,8 +194,11 @@ static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len) return NULL; } - i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id; - *len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len; + u = &vq->vring.used->ring[vring_last_used(&vq->vring) % vq->vring.num]; + i = u->id; + *len = u->len; + /* Make sure we don't reload i after doing checks. */ + rmb(); if (unlikely(i >= vq->vring.num)) { BAD_RING(vq, "id %u out of range\n", i); @@ -210,7 +212,7 @@ static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len) /* detach_buf clears data, so grab it now. */ ret = vq->data[i]; detach_buf(vq, i); - vq->last_used_idx++; + vring_last_used(&vq->vring)++; END_USE(vq); return ret; } @@ -302,7 +304,6 @@ struct virtqueue *vring_new_virtqueue(unsigned int num, vq->vq.vq_ops = &vring_vq_ops; vq->notify = notify; vq->broken = false; - vq->last_used_idx = 0; vq->num_added = 0; #ifdef DEBUG vq->in_use = false; @@ -328,4 +329,15 @@ void vring_del_virtqueue(struct virtqueue *vq) } EXPORT_SYMBOL_GPL(vring_del_virtqueue); +/* Manipulates transport-specific feature bits. */ +u32 vring_transport_features(u32 features) +{ + u32 mask = ~VIRTIO_TRANSPORT_F_MASK; + + /* We let through any non-transport bits, and the only one we know. 
*/ + mask &= ~(1 << VIRTIO_RING_F_PUBLISH_INDICES); + return features & mask; +} +EXPORT_SYMBOL_GPL(vring_transport_features); + MODULE_LICENSE("GPL"); diff --git a/drivers/watchdog/ar7_wdt.c b/drivers/watchdog/ar7_wdt.c index 2eb48c0df32c..a24e9a57271a 100644 --- a/drivers/watchdog/ar7_wdt.c +++ b/drivers/watchdog/ar7_wdt.c @@ -179,7 +179,7 @@ static void ar7_wdt_disable_wdt(void) static int ar7_wdt_open(struct inode *inode, struct file *file) { /* only allow one at a time */ - if (down_trylock(&open_semaphore)) + if (!down_nowait(&open_semaphore)) return -EBUSY; ar7_wdt_enable_wdt(); expect_close = 0; diff --git a/drivers/watchdog/it8712f_wdt.c b/drivers/watchdog/it8712f_wdt.c index 445b7e812112..924a421dc5ee 100644 --- a/drivers/watchdog/it8712f_wdt.c +++ b/drivers/watchdog/it8712f_wdt.c @@ -306,7 +306,7 @@ static int it8712f_wdt_open(struct inode *inode, struct file *file) { /* only allow one at a time */ - if (down_trylock(&it8712f_wdt_sem)) + if (!down_nowait(&it8712f_wdt_sem)) return -EBUSY; it8712f_wdt_enable(); diff --git a/drivers/watchdog/s3c2410_wdt.c b/drivers/watchdog/s3c2410_wdt.c index 98532c0e0689..b47bb6fd8d17 100644 --- a/drivers/watchdog/s3c2410_wdt.c +++ b/drivers/watchdog/s3c2410_wdt.c @@ -211,7 +211,7 @@ static int s3c2410wdt_set_heartbeat(int timeout) static int s3c2410wdt_open(struct inode *inode, struct file *file) { - if(down_trylock(&open_lock)) + if(!down_nowait(&open_lock)) return -EBUSY; if (nowayout) diff --git a/drivers/watchdog/sc1200wdt.c b/drivers/watchdog/sc1200wdt.c index 35cddff7020f..4c46c1a0706a 100644 --- a/drivers/watchdog/sc1200wdt.c +++ b/drivers/watchdog/sc1200wdt.c @@ -151,7 +151,7 @@ static inline int sc1200wdt_status(void) static int sc1200wdt_open(struct inode *inode, struct file *file) { /* allow one at a time */ - if (down_trylock(&open_sem)) + if (!down_nowait(&open_sem)) return -EBUSY; if (timeout > MAX_TIMEOUT) diff --git a/drivers/watchdog/scx200_wdt.c b/drivers/watchdog/scx200_wdt.c index 
d55882bca319..fea2efd1910f 100644 --- a/drivers/watchdog/scx200_wdt.c +++ b/drivers/watchdog/scx200_wdt.c @@ -92,7 +92,7 @@ static void scx200_wdt_disable(void) static int scx200_wdt_open(struct inode *inode, struct file *file) { /* only allow one at a time */ - if (down_trylock(&open_semaphore)) + if (!down_nowait(&open_semaphore)) return -EBUSY; scx200_wdt_enable(); diff --git a/drivers/watchdog/wdt_pci.c b/drivers/watchdog/wdt_pci.c index 1355608683e4..3bd911b71a3f 100644 --- a/drivers/watchdog/wdt_pci.c +++ b/drivers/watchdog/wdt_pci.c @@ -426,7 +426,7 @@ static int wdtpci_ioctl(struct inode *inode, struct file *file, unsigned int cmd static int wdtpci_open(struct inode *inode, struct file *file) { - if (down_trylock(&open_sem)) + if (!down_nowait(&open_sem)) return -EBUSY; if (nowayout) { diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c index 7e9e4c79aec7..64211fc74266 100644 --- a/fs/ocfs2/inode.c +++ b/fs/ocfs2/inode.c @@ -1062,10 +1062,6 @@ void ocfs2_clear_inode(struct inode *inode) (unsigned long long)oi->ip_blkno); mutex_unlock(&oi->ip_io_mutex); - /* - * down_trylock() returns 0, down_write_trylock() returns 1 - * kernel 1, world 0 - */ mlog_bug_on_msg(!down_write_trylock(&oi->ip_alloc_sem), "Clear inode of %llu, alloc_sem is locked\n", (unsigned long long)oi->ip_blkno); diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c index e396b2fa4743..0d199cbf14b3 100644 --- a/fs/reiserfs/journal.c +++ b/fs/reiserfs/journal.c @@ -1412,7 +1412,7 @@ static int flush_journal_list(struct super_block *s, /* if flushall == 0, the lock is already held */ if (flushall) { down(&journal->j_flush_sem); - } else if (!down_trylock(&journal->j_flush_sem)) { + } else if (down_nowait(&journal->j_flush_sem)) { BUG(); } diff --git a/fs/xfs/linux-2.6/sema.h b/fs/xfs/linux-2.6/sema.h index 3abe7e9ceb33..87629d0494b6 100644 --- a/fs/xfs/linux-2.6/sema.h +++ b/fs/xfs/linux-2.6/sema.h @@ -36,17 +36,15 @@ typedef struct semaphore sema_t; static inline int issemalocked(sema_t *sp) 
{ - return down_trylock(sp) || (up(sp), 0); + return !down_nowait(sp) || (up(sp), 0); } /* - * Map cpsema (try to get the sema) to down_trylock. We need to switch - * the return values since cpsema returns 1 (acquired) 0 (failed) and - * down_trylock returns the reverse 0 (acquired) 1 (failed). + * Map cpsema (try to get the sema) to down_try. */ static inline int cpsema(sema_t *sp) { - return down_trylock(sp) ? 0 : 1; + return down_nowait(sp); } #endif /* __XFS_SUPPORT_SEMA_H__ */ diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c index 5105015a75ad..faa2f78818d5 100644 --- a/fs/xfs/linux-2.6/xfs_buf.c +++ b/fs/xfs/linux-2.6/xfs_buf.c @@ -530,7 +530,7 @@ found: * if this does not work then we need to drop the * spinlock and do a hard attempt on the semaphore. */ - if (down_trylock(&bp->b_sema)) { + if (!down_nowait(&bp->b_sema)) { if (!(flags & XBF_TRYLOCK)) { /* wait for buffer ownership */ XB_TRACE(bp, "get_lock", 0); @@ -873,7 +873,7 @@ xfs_buf_cond_lock( { int locked; - locked = down_trylock(&bp->b_sema) == 0; + locked = down_nowait(&bp->b_sema); if (locked) { XB_SET_OWNER(bp); } diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 5c8351b859f0..9bf1cf1563ea 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -61,3 +61,22 @@ #define noinline __attribute__((noinline)) #define __attribute_const__ __attribute__((__const__)) #define __maybe_unused __attribute__((unused)) + +/** + * cast_if_type - allow an alternate type + * @expr: the expression to optionally cast + * @oktype: the type to allow. + * @desttype: the type to cast to. + * + * This is used to accept a particular alternate type for an expression: + * because any other types will not be cast, they will cause a warning as + * normal. + * + * Note that the unnecessary trinary forces functions to devolve into + * function pointers as users expect, but means @expr must be a pointer or + * integer. 
+ */ +#define cast_if_type(expr, oktype, desttype) \ + __builtin_choose_expr(__builtin_types_compatible_p(typeof(1?(expr):0),\ + oktype), \ + (desttype)(expr), (expr)) diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h index d8e636e5607d..7e704e6c8a5d 100644 --- a/include/linux/compiler-intel.h +++ b/include/linux/compiler-intel.h @@ -29,3 +29,5 @@ #endif #define uninitialized_var(x) x + +#define cast_if_type(expr, oktype, desttype) ((desttype)(expr)) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 4cb8d3df414e..91d375ca15bc 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -500,4 +500,39 @@ struct sysinfo { #define NUMA_BUILD 0 #endif +/* If fn is of type ok1 or ok2, cast to desttype */ +#define __typesafe_cb(desttype, fn, ok1, ok2) \ + cast_if_type(cast_if_type((fn), ok1, desttype), ok2, desttype) + +/** + * typesafe_cb - cast a callback function if it matches the arg + * @rettype: the return type of the callback function + * @fn: the callback function to cast + * @arg: the (pointer) argument to hand to the callback function. + * + * If a callback function takes a single argument, this macro does + * appropriate casts to a function which takes a single void * argument if the + * callback provided matches the @arg (or a const or volatile version). + * + * It is assumed that @arg is of pointer type: usually @arg is passed + * or assigned to a void * elsewhere anyway. + */ +#define typesafe_cb(rettype, fn, arg) \ + __typesafe_cb(rettype (*)(void *), (fn), \ + rettype (*)(const typeof(arg)), \ + rettype (*)(typeof(arg))) + +/** + * typesafe_cb_preargs - cast a callback function if it matches the arg + * @rettype: the return type of the callback function + * @fn: the callback function to cast + * @arg: the (pointer) argument to hand to the callback function. + * + * This is a version of typesafe_cb() for callbacks that take other arguments + * before the @arg. 
+ */ +#define typesafe_cb_preargs(rettype, fn, arg, ...) \ + __typesafe_cb(rettype (*)(__VA_ARGS__, void *), (fn), \ + rettype (*)(__VA_ARGS__, const typeof(arg)), \ + rettype (*)(__VA_ARGS__, typeof(arg))) #endif diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 00dd957e245b..3152c1ef1d08 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -4,9 +4,32 @@ #include <linux/err.h> #include <linux/sched.h> -struct task_struct *kthread_create(int (*threadfn)(void *data), - void *data, - const char namefmt[], ...); +/** + * kthread_create - create a kthread. + * @threadfn: the function to run until signal_pending(current). + * @data: data ptr for @threadfn. + * @namefmt: printf-style name for the thread. + * + * Description: This helper function creates and names a kernel + * thread. The thread will be stopped: use wake_up_process() to start + * it. See also kthread_run(), kthread_create_on_cpu(). + * + * When woken, the thread will run @threadfn() with @data as its + * argument. @threadfn() can either call do_exit() directly if it is a + * standalone thread for which noone will call kthread_stop(), or + * return when 'kthread_should_stop()' is true (which means + * kthread_stop() has been called). The return value should be zero + * or a negative error number; it will be passed to kthread_stop(). + * + * Returns a task_struct or ERR_PTR(-ENOMEM). + */ +#define kthread_create(threadfn, data, namefmt...) \ + __kthread_create(typesafe_cb(int,(threadfn),(data)), (data), namefmt) + +struct task_struct *__kthread_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...) + __attribute__((format(printf, 3, 4))); /** * kthread_run - create and wake a thread. 
diff --git a/include/linux/mutex.h b/include/linux/mutex.h index bc6da10ceee0..c1f5b3f9fe2d 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -141,10 +141,6 @@ extern int __must_check mutex_lock_killable(struct mutex *lock); # define mutex_lock_killable_nested(lock, subclass) mutex_lock_killable(lock) #endif -/* - * NOTE: mutex_trylock() follows the spin_trylock() convention, - * not the down_trylock() convention! - */ extern int mutex_trylock(struct mutex *lock); extern void mutex_unlock(struct mutex *lock); diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h index 9cae64b00d6b..e02bfcc40f11 100644 --- a/include/linux/semaphore.h +++ b/include/linux/semaphore.h @@ -44,7 +44,12 @@ static inline void sema_init(struct semaphore *sem, int val) extern void down(struct semaphore *sem); extern int __must_check down_interruptible(struct semaphore *sem); extern int __must_check down_killable(struct semaphore *sem); -extern int __must_check down_trylock(struct semaphore *sem); +extern int __must_check down_nowait(struct semaphore *sem); +/* Old down_trylock() returned the opposite of what was expected. */ +static inline int __deprecated down_trylock(struct semaphore *sem) +{ + return !down_nowait(sem); +} extern int __must_check down_timeout(struct semaphore *sem, long jiffies); extern void up(struct semaphore *sem); diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h index 18af011c13af..23c1b0d0f020 100644 --- a/include/linux/stop_machine.h +++ b/include/linux/stop_machine.h @@ -5,10 +5,9 @@ (and more). So the "read" side to such a lock is anything which diables preeempt. 
*/ #include <linux/cpu.h> +#include <linux/compiler.h> #include <asm/system.h> -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP) - #define ALL_CPUS ~0U /** @@ -17,8 +16,7 @@ * @data: the data ptr for the @fn() * @cpu: if @cpu == n, run @fn() on cpu n * if @cpu == NR_CPUS, run @fn() on any cpu - * if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then - * concurrently on all the other cpus + * if @cpu == ALL_CPUS, run @fn() on every online CPU. * * Description: This causes a thread to be scheduled on every other cpu, * each of which disables interrupts, and finally interrupts are disabled @@ -27,7 +25,11 @@ * * This can be thought of as a very heavy write lock, equivalent to * grabbing every spinlock in the kernel. */ -int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu); +#define stop_machine_run(fn, data, cpu) \ + stop_machine_run_notype(typesafe_cb(int, (fn), (data)), (data), (cpu)) + +#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP) +int stop_machine_run_notype(int (*fn)(void *), void *data, unsigned int cpu); /** * __stop_machine_run: freeze the machine on all CPUs and run this function @@ -35,17 +37,14 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu); * @data: the data ptr for the @fn * @cpu: the cpu to run @fn on (or any, if @cpu == NR_CPUS. * - * Description: This is a special version of the above, which returns the - * thread which has run @fn(): kthread_stop will return the return value - * of @fn(). Used by hotplug cpu. + * Description: This is a special version of the above, which assumes cpus + * won't come or go while it's being called. Used by hotplug cpu. 
*/ -struct task_struct *__stop_machine_run(int (*fn)(void *), void *data, - unsigned int cpu); - +int __stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu); #else -static inline int stop_machine_run(int (*fn)(void *), void *data, - unsigned int cpu) +static inline int stop_machine_run_notype(int (*fn)(void *), void *data, + unsigned int cpu) { int ret; local_irq_disable(); diff --git a/include/linux/timer.h b/include/linux/timer.h index d4ba79248a27..1baf40162b8c 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -25,12 +25,22 @@ struct timer_list { extern struct tvec_base boot_tvec_bases; -#define TIMER_INITIALIZER(_function, _expires, _data) { \ - .entry = { .prev = TIMER_ENTRY_STATIC }, \ - .function = (_function), \ - .expires = (_expires), \ - .data = (_data), \ - .base = &boot_tvec_bases, \ +/* + * For historic reasons the timer function takes an unsigned long, so + * we use this variant of typesafe_cb. data is converted to an unsigned long + * if it is another integer type, by adding 0UL. 
+ */ +#define typesafe_timerfn(fn, data) \ + __typesafe_cb(void (*)(unsigned long), (fn), \ + void (*)(const typeof((data)+0UL)), \ + void (*)(typeof((data)+0UL))) + +#define TIMER_INITIALIZER(_function, _expires, _data) { \ + .entry = { .prev = TIMER_ENTRY_STATIC }, \ + .function = typesafe_timerfn((_function), (_data)), \ + .expires = (_expires), \ + .data = (unsigned long)(_data), \ + .base = &boot_tvec_bases, \ } #define DEFINE_TIMER(_name, _function, _expires, _data) \ @@ -51,9 +61,13 @@ static inline void init_timer_on_stack(struct timer_list *timer) } #endif -static inline void setup_timer(struct timer_list * timer, - void (*function)(unsigned long), - unsigned long data) +#define setup_timer(timer, function, data) \ + __setup_timer((timer), typesafe_timerfn((function), (data)), \ + (unsigned long)(data)) + +static inline void __setup_timer(struct timer_list *timer, + void (*function)(unsigned long), + unsigned long data) { timer->function = function; timer->data = data; diff --git a/include/linux/usb.h b/include/linux/usb.h index 6994f187e924..30432027fce2 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -493,7 +493,7 @@ extern void usb_put_dev(struct usb_device *dev); /* USB device locking */ #define usb_lock_device(udev) down(&(udev)->dev.sem) #define usb_unlock_device(udev) up(&(udev)->dev.sem) -#define usb_trylock_device(udev) down_trylock(&(udev)->dev.sem) +#define usb_trylock_device(udev) down_nowait(&(udev)->dev.sem) extern int usb_lock_device_for_reset(struct usb_device *udev, const struct usb_interface *iface); diff --git a/include/linux/virtio_blk.h b/include/linux/virtio_blk.h index d4695a3356d0..b80919fad0ef 100644 --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -10,6 +10,7 @@ #define VIRTIO_BLK_F_SIZE_MAX 1 /* Indicates maximum segment size */ #define VIRTIO_BLK_F_SEG_MAX 2 /* Indicates maximum # of segments */ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ +#define VIRTIO_BLK_F_RO 5 /* Disk is 
read-only */ struct virtio_blk_config { diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 50db245c81ad..ec82d722da19 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -15,6 +15,13 @@ /* We've given up on this device. */ #define VIRTIO_CONFIG_S_FAILED 0x80 +/* Some virtio feature bits (currently bits 24 through 31) are reserved for the + * transport being used (eg. virtio_ring), the rest are per-device feature + * bits. */ +#define VIRTIO_TRANSPORT_F_START 24 +#define VIRTIO_TRANSPORT_F_END 32 +#define VIRTIO_TRANSPORT_F_MASK 0xFF000000 + #ifdef __KERNEL__ #include <linux/virtio.h> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index abe481ed990e..f4008a809ecc 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -24,6 +24,9 @@ * optimization. */ #define VRING_AVAIL_F_NO_INTERRUPT 1 +/* We publish our last-seen used index at the end of the avail ring. */ +#define VIRTIO_RING_F_PUBLISH_INDICES 24 + /* Virtio ring descriptors: 16 bytes. These can chain together via "next". */ struct vring_desc { @@ -82,6 +85,7 @@ struct vring { * __u16 avail_flags; * __u16 avail_idx; * __u16 available[num]; + * __u16 last_used_idx; * * // Padding to the next page boundary. * char pad[]; @@ -90,6 +94,7 @@ struct vring { * __u16 used_flags; * __u16 used_idx; * struct vring_used_elem used[num]; + * __u16 last_avail_idx; * }; */ static inline void vring_init(struct vring *vr, unsigned int num, void *p, @@ -106,9 +111,14 @@ static inline unsigned vring_size(unsigned int num, unsigned long pagesize) { return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (2 + num) + pagesize - 1) & ~(pagesize - 1)) - + sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num; + + sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num + 2; } +/* We publish the last-seen used index at the end of the available ring, and + * vice-versa. These are at the end for backwards compatibility. 
*/ +#define vring_last_used(vr) ((vr)->avail->ring[(vr)->num]) +#define vring_last_avail(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num]) + #ifdef __KERNEL__ #include <linux/irqreturn.h> struct virtio_device; @@ -121,6 +131,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num, void (*callback)(struct virtqueue *vq)); void vring_del_virtqueue(struct virtqueue *vq); +/* Filter out unsupported transport-specific feature bits. */ +u32 vring_transport_features(u32 features); + irqreturn_t vring_interrupt(int irq, void *_vq); #endif /* __KERNEL__ */ #endif /* _LINUX_VIRTIO_RING_H */ diff --git a/include/linux/virtio_rng.h b/include/linux/virtio_rng.h new file mode 100644 index 000000000000..331afb6c9f62 --- /dev/null +++ b/include/linux/virtio_rng.h @@ -0,0 +1,8 @@ +#ifndef _LINUX_VIRTIO_RNG_H +#define _LINUX_VIRTIO_RNG_H +#include <linux/virtio_config.h> + +/* The ID for virtio_rng */ +#define VIRTIO_ID_RNG 4 + +#endif /* _LINUX_VIRTIO_RNG_H */ diff --git a/kernel/cpu.c b/kernel/cpu.c index c284b64bec82..e44a40ae1909 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -192,7 +192,6 @@ static int __ref take_cpu_down(void *_param) static int __ref _cpu_down(unsigned int cpu, int tasks_frozen) { int err, nr_calls = 0; - struct task_struct *p; cpumask_t old_allowed, tmp; void *hcpu = (void *)(long)cpu; unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0; @@ -226,19 +225,15 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen) cpu_clear(cpu, tmp); set_cpus_allowed_ptr(current, &tmp); - p = __stop_machine_run(take_cpu_down, &tcd_param, cpu); + err = __stop_machine_run(take_cpu_down, &tcd_param, cpu); - if (IS_ERR(p) || cpu_online(cpu)) { + if (err || cpu_online(cpu)) { /* CPU didn't die: tell everyone. Can't complain. */ if (raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED | mod, hcpu) == NOTIFY_BAD) BUG(); - if (IS_ERR(p)) { - err = PTR_ERR(p); - goto out_allowed; - } - goto out_thread; + goto out_allowed; } /* Wait for it to sleep (leaving idle task). 
*/ @@ -255,8 +250,6 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen) check_for_tasks(cpu); -out_thread: - err = kthread_stop(p); out_allowed: set_cpus_allowed_ptr(current, &old_allowed); out_release: diff --git a/kernel/kthread.c b/kernel/kthread.c index bfbb2fe7dc4c..cbd58953f863 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -112,29 +112,10 @@ static void create_kthread(struct kthread_create_info *create) complete(&create->done); } -/** - * kthread_create - create a kthread. - * @threadfn: the function to run until signal_pending(current). - * @data: data ptr for @threadfn. - * @namefmt: printf-style name for the thread. - * - * Description: This helper function creates and names a kernel - * thread. The thread will be stopped: use wake_up_process() to start - * it. See also kthread_run(), kthread_create_on_cpu(). - * - * When woken, the thread will run @threadfn() with @data as its - * argument. @threadfn() can either call do_exit() directly if it is a - * standalone thread for which noone will call kthread_stop(), or - * return when 'kthread_should_stop()' is true (which means - * kthread_stop() has been called). The return value should be zero - * or a negative error number; it will be passed to kthread_stop(). - * - * Returns a task_struct or ERR_PTR(-ENOMEM). - */ -struct task_struct *kthread_create(int (*threadfn)(void *data), - void *data, - const char namefmt[], - ...) +struct task_struct *__kthread_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], + ...) { struct kthread_create_info create; @@ -159,7 +140,7 @@ struct task_struct *kthread_create(int (*threadfn)(void *data), } return create.result; } -EXPORT_SYMBOL(kthread_create); +EXPORT_SYMBOL(__kthread_create); /** * kthread_bind - bind a just-created kthread to a cpu. 
diff --git a/kernel/module.c b/kernel/module.c index 65c8ebf61bd1..cd37341772fe 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1786,7 +1786,7 @@ static struct module *load_module(void __user *umod, /* Sanity checks against insmoding binaries or wrong arch, weird elf version */ - if (memcmp(hdr->e_ident, ELFMAG, 4) != 0 + if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0 || hdr->e_type != ET_REL || !elf_check_arch(hdr) || hdr->e_shentsize != sizeof(*sechdrs)) { diff --git a/kernel/mutex.c b/kernel/mutex.c index d046a345d365..ed31233746ec 100644 --- a/kernel/mutex.c +++ b/kernel/mutex.c @@ -373,8 +373,8 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count) * Try to acquire the mutex atomically. Returns 1 if the mutex * has been acquired successfully, and 0 on contention. * - * NOTE: this function follows the spin_trylock() convention, so - * it is negated to the down_trylock() return values! Be careful + * NOTE: this function follows the spin_trylock()/down_nowait() convention, + * so it is negated to the old down_trylock() return values! Be careful * about this when converting semaphore users to mutexes. * * This function must not be used in interrupt context. The diff --git a/kernel/printk.c b/kernel/printk.c index 57aeb84fda57..e585d125932f 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -991,7 +991,7 @@ EXPORT_SYMBOL(acquire_console_sem); int try_acquire_console_sem(void) { - if (down_trylock(&console_sem)) + if (!down_nowait(&console_sem)) return -1; console_locked = 1; console_may_schedule = 0; @@ -1090,7 +1090,7 @@ void console_unblank(void) * oops_in_progress is set to 1.. 
*/ if (oops_in_progress) { - if (down_trylock(&console_sem) != 0) + if (!down_nowait(&console_sem)) return; } else acquire_console_sem(); diff --git a/kernel/semaphore.c b/kernel/semaphore.c index 1a064adab658..f600a1065241 100644 --- a/kernel/semaphore.c +++ b/kernel/semaphore.c @@ -14,7 +14,7 @@ * Some notes on the implementation: * * The spinlock controls access to the other members of the semaphore. - * down_trylock() and up() can be called from interrupt context, so we + * down_nowait() and up() can be called from interrupt context, so we * have to disable interrupts when taking the lock. It turns out various * parts of the kernel expect to be able to use down() on a semaphore in * interrupt context when they know it will succeed, so we have to use @@ -116,19 +116,18 @@ int down_killable(struct semaphore *sem) EXPORT_SYMBOL(down_killable); /** - * down_trylock - try to acquire the semaphore, without waiting + * down_nowait - try to acquire the semaphore, without waiting * @sem: the semaphore to be acquired * - * Try to acquire the semaphore atomically. Returns 0 if the mutex has - * been acquired successfully or 1 if it it cannot be acquired. + * Try to acquire the semaphore atomically. Returns true if the mutex has + * been acquired successfully or 0 if it it cannot be acquired. * - * NOTE: This return value is inverted from both spin_trylock and - * mutex_trylock! Be careful about this when converting code. + * NOTE: This replaces down_trylock() which returned the reverse. * * Unlike mutex_trylock, this function can be used from interrupt context, * and the semaphore can be released by any task or interrupt. 
*/ -int down_trylock(struct semaphore *sem) +int down_nowait(struct semaphore *sem) { unsigned long flags; int count; @@ -139,9 +138,9 @@ int down_trylock(struct semaphore *sem) sem->count = count; spin_unlock_irqrestore(&sem->lock, flags); - return (count < 0); + return (count >= 0); } -EXPORT_SYMBOL(down_trylock); +EXPORT_SYMBOL(down_nowait); /** * down_timeout - acquire the semaphore within a specified time diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index bab5d2a601d3..36e166def7c7 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -13,220 +13,171 @@ #include <asm/atomic.h> #include <asm/uaccess.h> -/* Since we effect priority and affinity (both of which are visible - * to, and settable by outside processes) we do indirection via a - * kthread. */ - -/* Thread to stop each CPU in user context. */ +/* This controls the threads on each CPU. */ enum stopmachine_state { - STOPMACHINE_WAIT, - STOPMACHINE_PREPARE, + /* Dummy starting state for thread. */ + STOPMACHINE_NONE, + /* Disable interrupts. */ STOPMACHINE_DISABLE_IRQ, + /* Run the function */ STOPMACHINE_RUN, + /* Exit */ STOPMACHINE_EXIT, + /* Everyone exited. */ + STOPMACHINE_COMPLETE, }; +static enum stopmachine_state state; struct stop_machine_data { int (*fn)(void *); void *data; - struct completion done; - int run_all; -} smdata; - -static enum stopmachine_state stopmachine_state; -static unsigned int stopmachine_num_threads; -static atomic_t stopmachine_thread_ack; - -static int stopmachine(void *cpu) -{ - int irqs_disabled = 0; - int prepared = 0; - int ran = 0; - - set_cpus_allowed_ptr(current, &cpumask_of_cpu((int)(long)cpu)); - - /* Ack: we are alive */ - smp_mb(); /* Theoretically the ack = 0 might not be on this CPU yet. 
*/ - atomic_inc(&stopmachine_thread_ack); - - /* Simple state machine */ - while (stopmachine_state != STOPMACHINE_EXIT) { - if (stopmachine_state == STOPMACHINE_DISABLE_IRQ - && !irqs_disabled) { - local_irq_disable(); - hard_irq_disable(); - irqs_disabled = 1; - /* Ack: irqs disabled. */ - smp_mb(); /* Must read state first. */ - atomic_inc(&stopmachine_thread_ack); - } else if (stopmachine_state == STOPMACHINE_PREPARE - && !prepared) { - /* Everyone is in place, hold CPU. */ - preempt_disable(); - prepared = 1; - smp_mb(); /* Must read state first. */ - atomic_inc(&stopmachine_thread_ack); - } else if (stopmachine_state == STOPMACHINE_RUN && !ran) { - smdata.fn(smdata.data); - ran = 1; - smp_mb(); /* Must read state first. */ - atomic_inc(&stopmachine_thread_ack); - } - /* Yield in first stage: migration threads need to - * help our sisters onto their CPUs. */ - if (!prepared && !irqs_disabled) - yield(); - else - cpu_relax(); - } - - /* Ack: we are exiting. */ - smp_mb(); /* Must read state first. */ - atomic_inc(&stopmachine_thread_ack); - - if (irqs_disabled) - local_irq_enable(); - if (prepared) - preempt_enable(); + int fnret; +}; - return 0; -} +/* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */ +static unsigned int num_threads; +static atomic_t thread_ack; +static struct completion finished; -/* Change the thread state */ -static void stopmachine_set_state(enum stopmachine_state state) +static void set_state(enum stopmachine_state newstate) { - atomic_set(&stopmachine_thread_ack, 0); + /* Reset ack counter. */ + atomic_set(&thread_ack, num_threads); smp_wmb(); - stopmachine_state = state; - while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) - cpu_relax(); + state = newstate; } -static int stop_machine(void) +/* Last one to ack a state moves to the next state. 
*/ +static void ack_state(void) { - int i, ret = 0; - - atomic_set(&stopmachine_thread_ack, 0); - stopmachine_num_threads = 0; - stopmachine_state = STOPMACHINE_WAIT; - - for_each_online_cpu(i) { - if (i == raw_smp_processor_id()) - continue; - ret = kernel_thread(stopmachine, (void *)(long)i,CLONE_KERNEL); - if (ret < 0) - break; - stopmachine_num_threads++; - } - - /* Wait for them all to come to life. */ - while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) - yield(); - - /* If some failed, kill them all. */ - if (ret < 0) { - stopmachine_set_state(STOPMACHINE_EXIT); - return ret; + if (atomic_dec_and_test(&thread_ack)) { + set_state(state + 1); + if (state == STOPMACHINE_COMPLETE) + complete(&finished); } - - /* Now they are all started, make them hold the CPUs, ready. */ - preempt_disable(); - stopmachine_set_state(STOPMACHINE_PREPARE); - - /* Make them disable irqs. */ - local_irq_disable(); - hard_irq_disable(); - stopmachine_set_state(STOPMACHINE_DISABLE_IRQ); - - return 0; } -static void restart_machine(void) +/* This is the actual thread which stops the CPU. It exits by itself rather + * than waiting for kthread_stop(), because it's easier for hotplug CPU. */ +static int stop_cpu(struct stop_machine_data *smdata) { - stopmachine_set_state(STOPMACHINE_EXIT); + enum stopmachine_state curstate = STOPMACHINE_NONE; + int uninitialized_var(ret); + + /* Simple state machine */ + do { + /* Chill out and ensure we re-read stopmachine_state. */ + cpu_relax(); + if (state != curstate) { + curstate = state; + switch (curstate) { + case STOPMACHINE_DISABLE_IRQ: + local_irq_disable(); + hard_irq_disable(); + break; + case STOPMACHINE_RUN: + /* |= allows error detection if functions on + * multiple CPUs. 
*/ + smdata->fnret |= smdata->fn(smdata->data); + break; + default: + break; + } + ack_state(); + } + } while (curstate < STOPMACHINE_EXIT); + local_irq_enable(); - preempt_enable_no_resched(); + do_exit(0); } -static void run_other_cpus(void) +/* Callback for CPUs which aren't supposed to do anything. */ +static int chill(void *unused) { - stopmachine_set_state(STOPMACHINE_RUN); + return 0; } -static int do_stop(void *_smdata) +int __stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) { - struct stop_machine_data *smdata = _smdata; - int ret; + int i, err; + struct stop_machine_data active, idle; + struct task_struct **threads; + + active.fn = fn; + active.data = data; + active.fnret = 0; + idle.fn = chill; + idle.data = NULL; + + /* If they don't care which cpu fn runs on, just pick one. */ + if (cpu == NR_CPUS) + cpu = any_online_cpu(cpu_online_map); + + /* This could be too big for stack on large machines. */ + threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL); + if (!threads) + return -ENOMEM; + + /* Set up initial state. 
*/ + init_completion(&finished); + num_threads = num_online_cpus(); + set_state(STOPMACHINE_DISABLE_IRQ); - ret = stop_machine(); - if (ret == 0) { - ret = smdata->fn(smdata->data); - if (smdata->run_all) - run_other_cpus(); - restart_machine(); - } + for_each_online_cpu(i) { + struct stop_machine_data *smdata; + struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 }; - /* We're done: you can kthread_stop us now */ - complete(&smdata->done); + if (cpu == ALL_CPUS || i == cpu) + smdata = &active; + else + smdata = &idle; - /* Wait for kthread_stop */ - set_current_state(TASK_INTERRUPTIBLE); - while (!kthread_should_stop()) { - schedule(); - set_current_state(TASK_INTERRUPTIBLE); - } - __set_current_state(TASK_RUNNING); - return ret; -} + threads[i] = kthread_create(stop_cpu, smdata, "kstop%u", i); + if (IS_ERR(threads[i])) { + err = PTR_ERR(threads[i]); + threads[i] = NULL; + goto kill_threads; + } -struct task_struct *__stop_machine_run(int (*fn)(void *), void *data, - unsigned int cpu) -{ - static DEFINE_MUTEX(stopmachine_mutex); - struct stop_machine_data smdata; - struct task_struct *p; + /* Place it onto correct cpu. */ + kthread_bind(threads[i], i); - mutex_lock(&stopmachine_mutex); + /* Make it highest prio. */ + if (sched_setscheduler(threads[i], SCHED_FIFO, &param) != 0) + BUG(); + } - smdata.fn = fn; - smdata.data = data; - smdata.run_all = (cpu == ALL_CPUS) ? 1 : 0; - init_completion(&smdata.done); + /* We've created all the threads. Wake them all: hold this CPU so one + * doesn't hit this CPU until we're ready. */ + cpu = get_cpu(); + for_each_online_cpu(i) + wake_up_process(threads[i]); - smp_wmb(); /* make sure other cpus see smdata updates */ + /* This will release the thread on our CPU. */ + put_cpu(); + wait_for_completion(&finished); - /* If they don't care which CPU fn runs on, bind to any online one. 
*/ - if (cpu == NR_CPUS || cpu == ALL_CPUS) - cpu = raw_smp_processor_id(); + kfree(threads); - p = kthread_create(do_stop, &smdata, "kstopmachine"); - if (!IS_ERR(p)) { - struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 }; + return active.fnret; - /* One high-prio thread per cpu. We'll do this one. */ - sched_setscheduler(p, SCHED_FIFO, &param); - kthread_bind(p, cpu); - wake_up_process(p); - wait_for_completion(&smdata.done); - } - mutex_unlock(&stopmachine_mutex); - return p; +kill_threads: + for_each_online_cpu(i) + if (threads[i]) + kthread_stop(threads[i]); + kfree(threads); + return err; } -int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) +int stop_machine_run_notype(int (*fn)(void *), void *data, unsigned int cpu) { - struct task_struct *p; int ret; /* No CPUs can come up or down during this. */ get_online_cpus(); - p = __stop_machine_run(fn, data, cpu); - if (!IS_ERR(p)) - ret = kthread_stop(p); - else - ret = PTR_ERR(p); + ret = __stop_machine_run(fn, data, cpu); put_online_cpus(); return ret; } -EXPORT_SYMBOL_GPL(stop_machine_run_notype); |
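The kernel/semaphore.c and kernel/printk.c hunks above invert the try-acquire return value: `down_trylock()` returned 0 on success, while `down_nowait()` returns nonzero on success, which is why each converted caller gains a `!`. The following is a minimal userspace sketch of that convention change, not the kernel implementation. The `toy_*` names and `struct toy_sem` are hypothetical, and the model assumes a bare counter with no spinlock, no interrupt handling and no wait list.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct semaphore: a bare counter only. */
struct toy_sem {
	int count;	/* how many more holders may still get in */
};

/* New convention (modelled on down_nowait): true means "acquired". */
static bool toy_down_nowait(struct toy_sem *sem)
{
	int count = sem->count - 1;

	/* Only commit the decrement when the semaphore was available. */
	if (count >= 0)
		sem->count = count;
	return count >= 0;
}

/* Old convention (modelled on down_trylock): 0 means "acquired", 1 "busy". */
static int toy_down_trylock(struct toy_sem *sem)
{
	return !toy_down_nowait(sem);
}

/* Release one holder. */
static void toy_up(struct toy_sem *sem)
{
	sem->count++;
}

/* Shaped like the converted try_acquire_console_sem(): -1 on contention. */
static int toy_try_acquire(struct toy_sem *sem)
{
	if (!toy_down_nowait(sem))
		return -1;
	return 0;
}
```

With a `toy_sem` initialised to a count of 1, the first `toy_down_nowait()` succeeds and the second fails until `toy_up()` is called. A caller written for the old convention (`if (toy_down_trylock(&s))` as the failure test) has to become `if (!toy_down_nowait(&s))`, matching the pattern in the printk.c hunks.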