From 275d7d44d802ef271a42dc87ac091a495ba72fc5 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 20 Aug 2015 10:34:59 +0930 Subject: module: Fix locking in symbol_put_addr() Poma (on the way to another bug) reported an assertion triggering: [] module_assert_mutex_or_preempt+0x49/0x90 [] __module_address+0x32/0x150 [] __module_text_address+0x16/0x70 [] symbol_put_addr+0x29/0x40 [] dvb_frontend_detach+0x7d/0x90 [dvb_core] Laura Abbott produced a patch which led us to inspect symbol_put_addr(). This function has a comment claiming it doesn't need to disable preemption around the module lookup because it holds a reference to the module it wants to find, which therefore cannot go away. This is wrong (and a false optimization too, preempt_disable() is really rather cheap, and I doubt any of this is on uber critical paths, otherwise it would've retained a pointer to the actual module anyway and avoided the second lookup). While it's true that the module cannot go away while we hold a reference on it, the data structure we do the lookup in very much _CAN_ change while we do the lookup. Therefore fix the comment and add the required preempt_disable(). Reported-by: poma Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Rusty Russell Fixes: a6e6abd575fc ("module: remove module_text_address()") Cc: stable@kernel.org --- kernel/module.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index b86b7bf1be38..8f051a106676 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr) if (core_kernel_text(a)) return; - /* module_text_address is safe here: we're supposed to have reference - * to module from symbol_get, so it can't go away. */ + /* + * Even though we hold a reference on the module; we still need to + * disable preemption in order to safely traverse the data structure. 
+ */ + preempt_disable(); modaddr = __module_text_address(a); BUG_ON(!modaddr); module_put(modaddr); + preempt_enable(); } EXPORT_SYMBOL_GPL(symbol_put_addr); -- cgit v1.2.3 From 868e87ccda2461cafd4a0d39f1486eb8f4a9a6f9 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 28 Sep 2015 10:31:50 +0100 Subject: ARM: make RiscPC depend on MMU RiscPC fails to build if MMU is disabled: arch/arm/mach-rpc/ecard.c: In function 'ecard_init_pgtables': arch/arm/mach-rpc/ecard.c:229:2: error: implicit declaration of function 'pgd_offset' [-Werror=implicit-function-declaration] arrange for RiscPC to depend on MMU. Signed-off-by: Russell King --- arch/arm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 72ad724c67ae..639411f73ca9 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -645,6 +645,7 @@ config ARCH_SHMOBILE_LEGACY config ARCH_RPC bool "RiscPC" + depends on MMU select ARCH_ACORN select ARCH_MAY_HAVE_PC_FDC select ARCH_SPARSEMEM_ENABLE -- cgit v1.2.3 From 178b2d09afc05a46f68b190c6594f3a429bc2385 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 24 Sep 2015 16:18:12 -0300 Subject: ARM: dts: imx7d: Fix UART2 base address The UART2 memory space starts at address 0x30890000 (UART2_URXD). Fix it so that UART2 can be used. 
Signed-off-by: Fabio Estevam Fixes: 949673450291 ("ARM: dts: add imx7d soc dtsi file") Cc: Signed-off-by: Shawn Guo --- arch/arm/boot/dts/imx7d.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi index b738ce0f9d9b..6e444bb873f9 100644 --- a/arch/arm/boot/dts/imx7d.dtsi +++ b/arch/arm/boot/dts/imx7d.dtsi @@ -588,10 +588,10 @@ status = "disabled"; }; - uart2: serial@30870000 { + uart2: serial@30890000 { compatible = "fsl,imx7d-uart", "fsl,imx6q-uart"; - reg = <0x30870000 0x10000>; + reg = <0x30890000 0x10000>; interrupts = ; clocks = <&clks IMX7D_UART2_ROOT_CLK>, <&clks IMX7D_UART2_ROOT_CLK>; -- cgit v1.2.3 From 1f744fd317dc55cadd7132c57c499e3117aea01d Mon Sep 17 00:00:00 2001 From: Thomas Hebb Date: Thu, 1 Oct 2015 21:00:00 +0200 Subject: ARM: dts: berlin: change BG2Q's USB PHY compatible Currently, BG2Q shares a compatible with BG2. This is incorrect, since BG2 and BG2Q use different USB PLL dividers. In reality, BG2Q shares a divider with BG2CD. Change BG2Q's USB PHY compatible string to reflect that. 
Cc: # v4.2.0- Signed-off-by: Thomas Hebb Signed-off-by: Sebastian Hesselbarth --- arch/arm/boot/dts/berlin2q.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/dts/berlin2q.dtsi b/arch/arm/boot/dts/berlin2q.dtsi index 63a48490e2f9..d4dbd28d348c 100644 --- a/arch/arm/boot/dts/berlin2q.dtsi +++ b/arch/arm/boot/dts/berlin2q.dtsi @@ -152,7 +152,7 @@ }; usb_phy2: phy@a2f400 { - compatible = "marvell,berlin2-usb-phy"; + compatible = "marvell,berlin2cd-usb-phy"; reg = <0xa2f400 0x128>; #phy-cells = <0>; resets = <&chip_rst 0x104 14>; @@ -170,7 +170,7 @@ }; usb_phy0: phy@b74000 { - compatible = "marvell,berlin2-usb-phy"; + compatible = "marvell,berlin2cd-usb-phy"; reg = <0xb74000 0x128>; #phy-cells = <0>; resets = <&chip_rst 0x104 12>; @@ -178,7 +178,7 @@ }; usb_phy1: phy@b78000 { - compatible = "marvell,berlin2-usb-phy"; + compatible = "marvell,berlin2cd-usb-phy"; reg = <0xb78000 0x128>; #phy-cells = <0>; resets = <&chip_rst 0x104 13>; -- cgit v1.2.3 From 7cc97d77ee8a90a6389b96a62472cddc02475ffc Mon Sep 17 00:00:00 2001 From: Adam YH Lee Date: Tue, 4 Aug 2015 11:15:48 -0700 Subject: iio: adc: twl4030: Fix ADC[3:6] readings MADC[3:6] reads incorrect values without these two following changes: - enable the 3v1 bias regulator for ADC[3:6] - configure ADC[3:6] lines as input, not as USB Signed-off-by: Adam YH Lee Signed-off-by: Jonathan Cameron --- drivers/iio/adc/twl4030-madc.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/drivers/iio/adc/twl4030-madc.c b/drivers/iio/adc/twl4030-madc.c index ebe415f10640..0c74869a540a 100644 --- a/drivers/iio/adc/twl4030-madc.c +++ b/drivers/iio/adc/twl4030-madc.c @@ -45,13 +45,18 @@ #include #include #include +#include #include +#define TWL4030_USB_SEL_MADC_MCPC (1<<3) +#define TWL4030_USB_CARKIT_ANA_CTRL 0xBB + /** * struct twl4030_madc_data - a container for madc info * @dev: Pointer to device structure for madc * @lock: Mutex protecting this data structure + 
* @regulator: Pointer to bias regulator for madc * @requests: Array of request struct corresponding to SW1, SW2 and RT * @use_second_irq: IRQ selection (main or co-processor) * @imr: Interrupt mask register of MADC @@ -60,6 +65,7 @@ struct twl4030_madc_data { struct device *dev; struct mutex lock; /* mutex protecting this data structure */ + struct regulator *usb3v1; struct twl4030_madc_request requests[TWL4030_MADC_NUM_METHODS]; bool use_second_irq; u8 imr; @@ -841,6 +847,32 @@ static int twl4030_madc_probe(struct platform_device *pdev) } twl4030_madc = madc; + /* Configure MADC[3:6] */ + ret = twl_i2c_read_u8(TWL_MODULE_USB, &regval, + TWL4030_USB_CARKIT_ANA_CTRL); + if (ret) { + dev_err(&pdev->dev, "unable to read reg CARKIT_ANA_CTRL 0x%X\n", + TWL4030_USB_CARKIT_ANA_CTRL); + goto err_i2c; + } + regval |= TWL4030_USB_SEL_MADC_MCPC; + ret = twl_i2c_write_u8(TWL_MODULE_USB, regval, + TWL4030_USB_CARKIT_ANA_CTRL); + if (ret) { + dev_err(&pdev->dev, "unable to write reg CARKIT_ANA_CTRL 0x%X\n", + TWL4030_USB_CARKIT_ANA_CTRL); + goto err_i2c; + } + + /* Enable 3v1 bias regulator for MADC[3:6] */ + madc->usb3v1 = devm_regulator_get(madc->dev, "vusb3v1"); + if (IS_ERR(madc->usb3v1)) + return -ENODEV; + + ret = regulator_enable(madc->usb3v1); + if (ret) + dev_err(madc->dev, "could not enable 3v1 bias regulator\n"); + ret = iio_device_register(iio_dev); if (ret) { dev_err(&pdev->dev, "could not register iio device\n"); @@ -866,6 +898,8 @@ static int twl4030_madc_remove(struct platform_device *pdev) twl4030_madc_set_current_generator(madc, 0, 0); twl4030_madc_set_power(madc, 0); + regulator_disable(madc->usb3v1); + return 0; } -- cgit v1.2.3 From 61fd56309165d4790f99462d893b099f0b07312a Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 2 Sep 2015 21:02:58 +0200 Subject: iio: st_accel: fix interrupt handling on LIS3LV02 This accelerometer accidentally either emits a DRDY signal or an IRQ signal. 
Accidentally I activated the IRQ signal as I thought it was analogous to the interrupt generator on other ST accelerometers. This was wrong. After this patch generic_buffer gives a nice stream of accelerometer readings. Fixes: 3acddf74f807778f "iio: st-sensors: add support for lis3lv02d accelerometer" Cc: Denis CIOCCA Signed-off-by: Linus Walleij Cc: Signed-off-by: Jonathan Cameron --- drivers/iio/accel/st_accel_core.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c index ff30f8806880..fb9311110424 100644 --- a/drivers/iio/accel/st_accel_core.c +++ b/drivers/iio/accel/st_accel_core.c @@ -149,8 +149,6 @@ #define ST_ACCEL_4_BDU_MASK 0x40 #define ST_ACCEL_4_DRDY_IRQ_ADDR 0x21 #define ST_ACCEL_4_DRDY_IRQ_INT1_MASK 0x04 -#define ST_ACCEL_4_IG1_EN_ADDR 0x21 -#define ST_ACCEL_4_IG1_EN_MASK 0x08 #define ST_ACCEL_4_MULTIREAD_BIT true /* CUSTOM VALUES FOR SENSOR 5 */ @@ -489,10 +487,6 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .drdy_irq = { .addr = ST_ACCEL_4_DRDY_IRQ_ADDR, .mask_int1 = ST_ACCEL_4_DRDY_IRQ_INT1_MASK, - .ig1 = { - .en_addr = ST_ACCEL_4_IG1_EN_ADDR, - .en_mask = ST_ACCEL_4_IG1_EN_MASK, - }, }, .multi_read_bit = ST_ACCEL_4_MULTIREAD_BIT, .bootime = 2, /* guess */ -- cgit v1.2.3 From eda7d0f38aaf50dbb2a2de15e8db386c4f6f65fc Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sat, 8 Aug 2015 22:16:42 +0300 Subject: iio: accel: sca3000: memory corruption in sca3000_read_first_n_hw_rb() "num_read" is in byte units but we are writing u16s so we end up writing twice as much as intended. 
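The unit mismatch can be reproduced outside the driver. What follows is only a user-space sketch under stated assumptions: swap_be16_buffer() is a hypothetical helper, not a function in the sca3000 driver, and ntohs() stands in for the kernel's be16_to_cpup(). The loop bound is the byte count divided by sizeof(uint16_t), so no more than num_bytes bytes are touched.

```c
#include <arpa/inet.h>	/* ntohs(): user-space stand-in for be16_to_cpup() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the sca3000 driver: convert a buffer of
 * big-endian 16-bit samples to host byte order in place. num_bytes is a
 * byte count, so the number of elements is num_bytes / sizeof(uint16_t).
 * Returns the number of 16-bit elements converted.
 */
static size_t swap_be16_buffer(uint8_t *rx, size_t num_bytes)
{
	size_t n = num_bytes / sizeof(uint16_t);
	size_t i;

	for (i = 0; i < n; i++) {
		uint16_t v;

		/* memcpy avoids unaligned 16-bit accesses on rx */
		memcpy(&v, rx + i * sizeof(v), sizeof(v));
		v = ntohs(v);
		memcpy(rx + i * sizeof(v), &v, sizeof(v));
	}
	return n;
}
```

Bounding the loop by num_bytes itself instead would walk twice as far into the buffer, which is the overrun described above.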
Signed-off-by: Dan Carpenter Cc: Signed-off-by: Jonathan Cameron --- drivers/staging/iio/accel/sca3000_ring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/iio/accel/sca3000_ring.c b/drivers/staging/iio/accel/sca3000_ring.c index 23685e74917e..bd2c69f85949 100644 --- a/drivers/staging/iio/accel/sca3000_ring.c +++ b/drivers/staging/iio/accel/sca3000_ring.c @@ -116,7 +116,7 @@ static int sca3000_read_first_n_hw_rb(struct iio_buffer *r, if (ret) goto error_ret; - for (i = 0; i < num_read; i++) + for (i = 0; i < num_read / sizeof(u16); i++) *(((u16 *)rx) + i) = be16_to_cpup((__be16 *)rx + i); if (copy_to_user(buf, rx, num_read)) -- cgit v1.2.3 From d836ace65ee98d7079bc3c5afdbcc0e27dca20a3 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Sat, 3 Oct 2015 13:03:47 -0700 Subject: ARM: orion: Fix DSA platform device after mvmdio conversion DSA expects the host_dev pointer to be the device structure associated with the MDIO bus controller driver. First commit breaking that was c3a07134e6aa ("mv643xx_eth: convert to use the Marvell Orion MDIO driver"), and then, it got completely under the radar for a while. 
Reported-by: Frans van de Wiel Fixes: c3a07134e6aa ("mv643xx_eth: convert to use the Marvell Orion MDIO driver") CC: stable@vger.kernel.org Signed-off-by: Florian Fainelli Signed-off-by: Gregory CLEMENT --- arch/arm/plat-orion/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c index 2235081a04ee..8861c367d061 100644 --- a/arch/arm/plat-orion/common.c +++ b/arch/arm/plat-orion/common.c @@ -495,7 +495,7 @@ void __init orion_ge00_switch_init(struct dsa_platform_data *d, int irq) d->netdev = &orion_ge00.dev; for (i = 0; i < d->nr_chips; i++) - d->chip[i].host_dev = &orion_ge00_shared.dev; + d->chip[i].host_dev = &orion_ge_mvmdio.dev; orion_switch_device.dev.platform_data = d; platform_device_register(&orion_switch_device); -- cgit v1.2.3 From 1266963170f576d4d08e6310b6963e26d3ff9d1e Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Wed, 7 Oct 2015 11:03:28 -0500 Subject: PCI: Prevent out of bounds access in numa_node override 63692df103e9 ("PCI: Allow numa_node override via sysfs") didn't check that the numa node provided by userspace is valid. Passing a node number too high would attempt to access invalid memory and trigger a kernel panic. 
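The order of the two tests matters: the range check has to run before the value is used as a bitmap index. A minimal user-space sketch of that ordering, assuming hypothetical names (DEMO_MAX_NODES, demo_node_online(), demo_numa_node_valid()) in place of the kernel's MAX_NUMNODES and node_online(). The node < 0 test is an addition of this sketch and is not part of the patch.

```c
#include <stdbool.h>

#define DEMO_MAX_NODES 64	/* hypothetical stand-in for MAX_NUMNODES */

/* One bit per node; bit n set means node n is online (nodes 0 and 1 here). */
static unsigned long long demo_online_map = 0x3ULL;

/* Raw bitmap test: only meaningful for 0 <= node < DEMO_MAX_NODES. */
static bool demo_node_online(int node)
{
	return (demo_online_map >> node) & 1ULL;
}

/*
 * Range-check first. Short-circuit evaluation of || guarantees the bitmap
 * test is skipped when the index is out of range.
 */
static bool demo_numa_node_valid(int node)
{
	if (node < 0 || node >= DEMO_MAX_NODES || !demo_node_online(node))
		return false;
	return true;
}
```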
Fixes: 63692df103e9 ("PCI: Allow numa_node override via sysfs") Signed-off-by: Sasha Levin Signed-off-by: Bjorn Helgaas CC: stable@vger.kernel.org # v3.19+ --- drivers/pci/pci-sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 312f23a8429c..92618686604c 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -216,7 +216,7 @@ static ssize_t numa_node_store(struct device *dev, if (ret) return ret; - if (!node_online(node)) + if (node >= MAX_NUMNODES || !node_online(node)) return -EINVAL; add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); -- cgit v1.2.3 From a54c8f0f2d7df525ff997e2afe71866a1a013064 Mon Sep 17 00:00:00 2001 From: Cathy Avery Date: Fri, 2 Oct 2015 09:35:01 -0400 Subject: xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing) xen-blkfront will crash if the check to talk_to_blkback() in blkback_changed()(XenbusStateInitWait) returns an error. The driver data is freed and info is set to NULL. Later during the close process via talk_to_blkback's call to xenbus_dev_fatal() the null pointer is passed to and dereferenced in blkfront_closing. 
CC: stable@vger.kernel.org Signed-off-by: Cathy Avery Signed-off-by: Konrad Rzeszutek Wilk --- drivers/block/xen-blkfront.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 6d89ed35d80c..c8fdbc77f9b1 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -1968,7 +1968,8 @@ static void blkback_changed(struct xenbus_device *dev, break; /* Missed the backend's Closing state -- fallthrough */ case XenbusStateClosing: - blkfront_closing(info); + if (info) + blkfront_closing(info); break; } } -- cgit v1.2.3 From dcc909d90ccdbb73226397ff6d298f7af35b0e11 Mon Sep 17 00:00:00 2001 From: Markus Pargmann Date: Tue, 6 Oct 2015 20:03:54 +0200 Subject: nbd: Add locking for tasks The timeout handling introduced in 7e2893a16d3e (nbd: Fix timeout detection) introduces a race condition which may lead to killing of tasks that are not in nbd context anymore. This was not observed or reproducible yet. This patch adds locking to critical use of task_recv and task_send to avoid killing tasks that already left the NBD thread functions. This lock is only acquired if a timeout occurs or the nbd device starts/stops. 
Reported-by: Ben Hutchings Signed-off-by: Markus Pargmann Reviewed-by: Ben Hutchings Fixes: 7e2893a16d3e ("nbd: Fix timeout detection") Signed-off-by: Jens Axboe --- drivers/block/nbd.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 293495a75d3d..1b87623381e2 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -60,6 +60,7 @@ struct nbd_device { bool disconnect; /* a disconnect has been requested by user */ struct timer_list timeout_timer; + spinlock_t tasks_lock; struct task_struct *task_recv; struct task_struct *task_send; @@ -140,21 +141,23 @@ static void sock_shutdown(struct nbd_device *nbd) static void nbd_xmit_timeout(unsigned long arg) { struct nbd_device *nbd = (struct nbd_device *)arg; - struct task_struct *task; + unsigned long flags; if (list_empty(&nbd->queue_head)) return; nbd->disconnect = true; - task = READ_ONCE(nbd->task_recv); - if (task) - force_sig(SIGKILL, task); + spin_lock_irqsave(&nbd->tasks_lock, flags); + + if (nbd->task_recv) + force_sig(SIGKILL, nbd->task_recv); - task = READ_ONCE(nbd->task_send); - if (task) + if (nbd->task_send) force_sig(SIGKILL, nbd->task_send); + spin_unlock_irqrestore(&nbd->tasks_lock, flags); + dev_err(nbd_to_dev(nbd), "Connection timed out, killed receiver and sender, shutting down connection\n"); } @@ -403,17 +406,24 @@ static int nbd_thread_recv(struct nbd_device *nbd) { struct request *req; int ret; + unsigned long flags; BUG_ON(nbd->magic != NBD_MAGIC); sk_set_memalloc(nbd->sock->sk); + spin_lock_irqsave(&nbd->tasks_lock, flags); nbd->task_recv = current; + spin_unlock_irqrestore(&nbd->tasks_lock, flags); ret = device_create_file(disk_to_dev(nbd->disk), &pid_attr); if (ret) { dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n"); + + spin_lock_irqsave(&nbd->tasks_lock, flags); nbd->task_recv = NULL; + spin_unlock_irqrestore(&nbd->tasks_lock, flags); + return ret; } @@ -429,7 +439,9 @@ 
static int nbd_thread_recv(struct nbd_device *nbd) device_remove_file(disk_to_dev(nbd->disk), &pid_attr); + spin_lock_irqsave(&nbd->tasks_lock, flags); nbd->task_recv = NULL; + spin_unlock_irqrestore(&nbd->tasks_lock, flags); if (signal_pending(current)) { siginfo_t info; @@ -534,8 +546,11 @@ static int nbd_thread_send(void *data) { struct nbd_device *nbd = data; struct request *req; + unsigned long flags; + spin_lock_irqsave(&nbd->tasks_lock, flags); nbd->task_send = current; + spin_unlock_irqrestore(&nbd->tasks_lock, flags); set_user_nice(current, MIN_NICE); while (!kthread_should_stop() || !list_empty(&nbd->waiting_queue)) { @@ -572,7 +587,15 @@ static int nbd_thread_send(void *data) nbd_handle_req(nbd, req); } + spin_lock_irqsave(&nbd->tasks_lock, flags); nbd->task_send = NULL; + spin_unlock_irqrestore(&nbd->tasks_lock, flags); + + /* Clear maybe pending signals */ + if (signal_pending(current)) { + siginfo_t info; + dequeue_signal_lock(current, &current->blocked, &info); + } return 0; } @@ -1052,6 +1075,7 @@ static int __init nbd_init(void) nbd_dev[i].magic = NBD_MAGIC; INIT_LIST_HEAD(&nbd_dev[i].waiting_queue); spin_lock_init(&nbd_dev[i].queue_lock); + spin_lock_init(&nbd_dev[i].tasks_lock); INIT_LIST_HEAD(&nbd_dev[i].queue_head); mutex_init(&nbd_dev[i].tx_lock); init_timer(&nbd_dev[i].timeout_timer); -- cgit v1.2.3 From b94e22805a2224061bb263a82b72e09544a5fbb3 Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Wed, 7 Oct 2015 13:10:54 +0200 Subject: iio: mxs-lradc: Fix temperature offset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 0° Kelvin is actually −273.15°C, not -272.15°C. Fix the temperature offset. Also improve the comment explaining the calculation. 
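The arithmetic behind the new constants can be checked with a few lines of user-space C. This is a sketch only: demo_temp_offset() and demo_split_micro() are hypothetical names, and the factor 4 / 1.012 is taken from the comment this patch adds to the driver. The fractional part is truncated rather than rounded, which is what reproduces the micro value written by the patch.

```c
/*
 * Offset in raw ADC units: the Kelvin-to-Celsius shift (-273.15) divided
 * by the scale, using the 4 / 1.012 factor quoted in the driver comment.
 */
static double demo_temp_offset(void)
{
	return -273.15 * 4.0 / 1.012;
}

/*
 * Split a value into an integer part and a non-negative micro part, in the
 * style of an integer-plus-micro pair. Both conversions truncate.
 */
static void demo_split_micro(double v, int *val, int *val2)
{
	int ip = (int)v;		/* truncates toward zero */
	double frac = v - ip;

	if (frac < 0)
		frac = -frac;

	*val = ip;
	*val2 = (int)(frac * 1e6);	/* truncate, do not round */
}
```

Feeding demo_temp_offset() to demo_split_micro() yields the pair -1079 and 644268.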
Reported-by: Janusz Użycki Signed-off-by: Alexandre Belloni Acked-by: Stefan Wahren Acked-by: Marek Vasut Cc: Signed-off-by: Jonathan Cameron --- drivers/staging/iio/adc/mxs-lradc.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c index 3f7715c9968b..47fc00a3f63b 100644 --- a/drivers/staging/iio/adc/mxs-lradc.c +++ b/drivers/staging/iio/adc/mxs-lradc.c @@ -915,11 +915,12 @@ static int mxs_lradc_read_raw(struct iio_dev *iio_dev, case IIO_CHAN_INFO_OFFSET: if (chan->type == IIO_TEMP) { /* The calculated value from the ADC is in Kelvin, we - * want Celsius for hwmon so the offset is - * -272.15 * scale + * want Celsius for hwmon so the offset is -273.15 + * The offset is applied before scaling so it is + * actually -273.15 * 4 / 1.012 = -1079.644268 */ - *val = -1075; - *val2 = 691699; + *val = -1079; + *val2 = 644268; return IIO_VAL_INT_PLUS_MICRO; } -- cgit v1.2.3 From 9babcd7929bc8967ae3bb6093f603b93c2f9958f Mon Sep 17 00:00:00 2001 From: Daniel Bristot de Oliveira Date: Thu, 8 Oct 2015 15:36:06 -0300 Subject: sched, tracing: Stop/start critical timings around the idle=poll idle loop When using idle=poll, the preemptoff tracer is always showing the idle task as the culprit for long latencies. That happens because critical timings are not stopped before idle loop. This patch stops critical timings before entering the idle loop, starting it again after the idle loop. This problem does not affect the irqsoff tracer because interruptions are enabled before entering the idle loop. Signed-off-by: Daniel Bristot de Oliveira Reviewed-by: Luis Claudio R. 
Goncalves Acked-by: Steven Rostedt Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com Signed-off-by: Ingo Molnar --- kernel/sched/idle.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 8f177c73ae19..4a2ef5a02fd3 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -57,9 +57,11 @@ static inline int cpu_idle_poll(void) rcu_idle_enter(); trace_cpu_idle_rcuidle(0, smp_processor_id()); local_irq_enable(); + stop_critical_timings(); while (!tif_need_resched() && (cpu_idle_force_poll || tick_check_broadcast_expired())) cpu_relax(); + start_critical_timings(); trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); rcu_idle_exit(); return 1; -- cgit v1.2.3 From 0480334fa60488d12ae101a02d7d9e1a3d03d7dd Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 18 Sep 2015 11:45:12 +0100 Subject: ovl: use O_LARGEFILE in ovl_copy_up() Open the lower file with O_LARGEFILE in ovl_copy_up(). Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for catching 32-bit userspace dealing with a file large enough that it'll be mishandled if the application isn't aware that there might be an integer overflow. Inside the kernel, there shouldn't be any problems. 
Reported-by: Ulrich Obergfell Signed-off-by: David Howells Signed-off-by: Miklos Szeredi Cc: # v3.18+ --- fs/overlayfs/copy_up.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 84d693d37428..b1990ac8fa09 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -81,11 +81,11 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) if (len == 0) return 0; - old_file = ovl_path_open(old, O_RDONLY); + old_file = ovl_path_open(old, O_LARGEFILE | O_RDONLY); if (IS_ERR(old_file)) return PTR_ERR(old_file); - new_file = ovl_path_open(new, O_WRONLY); + new_file = ovl_path_open(new, O_LARGEFILE | O_WRONLY); if (IS_ERR(new_file)) { error = PTR_ERR(new_file); goto out_fput; -- cgit v1.2.3 From ab79efab0a0ba01a74df782eb7fa44b044dae8b5 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 18 Sep 2015 11:45:22 +0100 Subject: ovl: fix dentry reference leak In ovl_copy_up_locked(), newdentry is leaked if the function exits through out_cleanup as this just jumps to out after calling ovl_cleanup() - which doesn't actually release the ref on newdentry. The out_cleanup segment should instead exit through out2 as certainly newdentry leaks - and possibly upper does also, though this isn't caught given the catch of newdentry. Without this fix, something like the following is seen: BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs] BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs] when unmounting the upper layer after an error occurred in copyup. An error can be induced by creating a big file in a lower layer with something like: dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000)) to create a large file (4.1G). Overlay an upper layer that is too small (on tmpfs might do) and then induce a copy up by opening it writably. 
Reported-by: Ulrich Obergfell Signed-off-by: David Howells Signed-off-by: Miklos Szeredi Cc: # v3.18+ --- fs/overlayfs/copy_up.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index b1990ac8fa09..871fcb67be97 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -267,7 +267,7 @@ out: out_cleanup: ovl_cleanup(wdir, newdentry); - goto out; + goto out2; } /* -- cgit v1.2.3 From 1c8a47df36d72ace8cf78eb6c228aa0f8027d3c2 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 12 Oct 2015 15:56:20 +0200 Subject: ovl: fix open in stacked overlay If two overlayfs filesystems are stacked on top of each other, then we need recursion in ovl_d_select_inode(). I guess d_backing_inode() is supposed to do that. But currently it doesn't and that functionality is open coded in vfs_open(). This is now copied into ovl_d_select_inode() to fix this regression. Reported-by: Alban Crequy Signed-off-by: Miklos Szeredi Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells Cc: # v4.2+ --- fs/overlayfs/inode.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index d9da5a4e9382..ec0c2a050043 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -363,6 +363,9 @@ struct inode *ovl_d_select_inode(struct dentry *dentry, unsigned file_flags) ovl_path_upper(dentry, &realpath); } + if (realpath.dentry->d_flags & DCACHE_OP_SELECT_INODE) + return realpath.dentry->d_op->d_select_inode(realpath.dentry, file_flags); + return d_backing_inode(realpath.dentry); } -- cgit v1.2.3 From 0f95502ad84874b3c05fc7cdd9d4d9d5cddf7859 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Mon, 24 Aug 2015 15:57:18 +0300 Subject: ovl: free stack of paths in ovl_fill_super This fixes small memory leak after mount. 
Kmemleak report: unreferenced object 0xffff88003683fe00 (size 16): comm "mount", pid 2029, jiffies 4294909563 (age 33.380s) hex dump (first 16 bytes): 20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6.... backtrace: [] create_object+0x124/0x2c0 [] kmemleak_alloc+0x7b/0xc0 [] __kmalloc+0x106/0x340 [] ovl_fill_super+0x389/0x9a0 [overlay] [] mount_nodev+0x54/0xa0 [] ovl_mount+0x18/0x20 [overlay] [] mount_fs+0x43/0x170 [] vfs_kern_mount+0x74/0x170 [] do_mount+0x22d/0xdf0 [] SyS_mount+0x7b/0xc0 [] entry_SYSCALL_64_fastpath+0x12/0x76 [] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov Signed-off-by: Miklos Szeredi Fixes: a78d9f0d5d5c ("ovl: support multiple lower layers") Cc: # v4.0+ --- fs/overlayfs/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 7466ff339c66..3f90c43c3c4a 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -1048,6 +1048,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) oe->lowerstack[i].dentry = stack[i].dentry; oe->lowerstack[i].mnt = ufs->lower_mnt[i]; } + kfree(stack); root_dentry->d_fsdata = oe; -- cgit v1.2.3 From 5ffdbe8bf1e485026e1c7e4714d2841553cf0b40 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Mon, 24 Aug 2015 15:57:19 +0300 Subject: ovl: free lower_mnt array in ovl_put_super This fixes memory leak after umount. Kmemleak report: unreferenced object 0xffff8800ba791010 (size 8): comm "mount", pid 2394, jiffies 4294996294 (age 53.920s) hex dump (first 8 bytes): 20 1c 13 02 00 88 ff ff ....... 
backtrace: [] create_object+0x124/0x2c0 [] kmemleak_alloc+0x7b/0xc0 [] __kmalloc+0x106/0x340 [] ovl_fill_super+0x55c/0x9b0 [overlay] [] mount_nodev+0x54/0xa0 [] ovl_mount+0x18/0x20 [overlay] [] mount_fs+0x43/0x170 [] vfs_kern_mount+0x74/0x170 [] do_mount+0x22d/0xdf0 [] SyS_mount+0x7b/0xc0 [] entry_SYSCALL_64_fastpath+0x12/0x76 [] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov Signed-off-by: Miklos Szeredi Fixes: dd662667e6d3 ("ovl: add mutli-layer infrastructure") Cc: # v4.0+ --- fs/overlayfs/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 3f90c43c3c4a..8d04b86e0680 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -544,6 +544,7 @@ static void ovl_put_super(struct super_block *sb) mntput(ufs->upper_mnt); for (i = 0; i < ufs->numlower; i++) mntput(ufs->lower_mnt[i]); + kfree(ufs->lower_mnt); kfree(ufs->config.lowerdir); kfree(ufs->config.upperdir); -- cgit v1.2.3 From 9ad18ab938375502c03cf467abecbb77264c9475 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 29 Sep 2015 12:47:50 -0400 Subject: writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration laptop_mode_timer_fn() was using bdi_for_each_wb() without the required RCU locking leading to the following warning. WARNING: CPU: 0 PID: 0 at include/linux/backing-dev.h:415 laptop_mode_timer_fn+0x106/0x170() ... Call Trace: [] dump_stack+0x4e/0x82 [] warn_slowpath_common+0x82/0xc0 [] warn_slowpath_null+0x1a/0x20 [] laptop_mode_timer_fn+0x106/0x170 [] call_timer_fn+0xb3/0x2f0 [] run_timer_softirq+0x205/0x370 [] __do_softirq+0xd4/0x460 [] irq_exit+0x89/0xa0 [] smp_apic_timer_interrupt+0x42/0x50 [] apic_timer_interrupt+0x84/0x90 ... Fix it by adding rcu_read_lock() around the iteration. 
Signed-off-by: Tejun Heo Fixes: a06fd6b10228 ("writeback: make laptop_mode_timer_fn() handle multiple bdi_writeback's") Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- mm/page-writeback.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0a931cdd4f6b..902e5f215e57 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1965,10 +1965,12 @@ void laptop_mode_timer_fn(unsigned long data) if (!bdi_has_dirty_io(&q->backing_dev_info)) return; + rcu_read_lock(); bdi_for_each_wb(wb, &q->backing_dev_info, &iter, 0) if (wb_has_dirty_io(wb)) wb_start_writeback(wb, nr_pages, true, WB_REASON_LAPTOP_TIMER); + rcu_read_unlock(); } /* -- cgit v1.2.3 From 6fdf860f15d4a6be8f0947bad608d687fe0c7af7 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 29 Sep 2015 12:47:51 -0400 Subject: writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback() wakeup_dirtytime_writeback() walks and wakes up all wb's of all bdi's; unfortunately, it was always waking up bdi->wb instead of the wb being walked. Fix it. 
Signed-off-by: Tejun Heo Fixes: 001fe6f617b1 ("writeback: make wakeup_dirtytime_writeback() handle multiple bdi_writeback's") Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 091a36444972..d0da30668e98 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1897,8 +1897,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w) struct wb_iter iter; bdi_for_each_wb(wb, bdi, &iter, 0) - if (!list_empty(&bdi->wb.b_dirty_time)) - wb_wakeup(&bdi->wb); + if (!list_empty(&wb->b_dirty_time)) + wb_wakeup(wb); } rcu_read_unlock(); schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); -- cgit v1.2.3 From b817525a4a80c04e4ca44192d97a1ffa9f2be572 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 2 Oct 2015 14:47:05 -0400 Subject: writeback: bdi_writeback iteration must not skip dying ones bdi_for_each_wb() is used in several places to wake up or issue writeback work items to all wb's (bdi_writeback's) on a given bdi. The iteration is performed by walking bdi->cgwb_tree; however, the tree only indexes wb's which are currently active. For example, when a memcg gets associated with a different blkcg, the old wb is removed from the tree so that the new one can be indexed. The old wb starts dying from then on but will linger till all its inodes are drained. As these dying wb's may still host dirty inodes, writeback operations which affect all wb's must include them. bdi_for_each_wb() skipping dying wb's led to sync(2) missing and failing to sync the inodes belonging to those wb's. This patch adds an RCU protected @bdi->wb_list which lists all wb's belonging to that bdi. wb's are added on creation and removed on release rather than on the start of destruction. bdi_for_each_wb() usages are replaced with list_for_each[_continue]_rcu() iterations over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed. 
v2: Updated as per Jan. last_wb ref leak in bdi_split_work_to_wbs() fixed and unnecessary list head severing in cgwb_bdi_destroy() removed. Signed-off-by: Tejun Heo Reported-and-tested-by: Artem Bityutskiy Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()") Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com Cc: Jan Kara Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 31 ++++++++++++++------ include/linux/backing-dev-defs.h | 3 ++ include/linux/backing-dev.h | 63 ---------------------------------------- mm/backing-dev.c | 14 ++++++++- mm/page-writeback.c | 3 +- 5 files changed, 39 insertions(+), 75 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index d0da30668e98..29e4599f6fc1 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -778,19 +778,24 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, struct wb_writeback_work *base_work, bool skip_if_busy) { - int next_memcg_id = 0; - struct bdi_writeback *wb; - struct wb_iter iter; + struct bdi_writeback *last_wb = NULL; + struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list, + struct bdi_writeback, bdi_node); might_sleep(); restart: rcu_read_lock(); - bdi_for_each_wb(wb, bdi, &iter, next_memcg_id) { + list_for_each_entry_continue_rcu(wb, &bdi->wb_list, bdi_node) { DEFINE_WB_COMPLETION_ONSTACK(fallback_work_done); struct wb_writeback_work fallback_work; struct wb_writeback_work *work; long nr_pages; + if (last_wb) { + wb_put(last_wb); + last_wb = NULL; + } + /* SYNC_ALL writes out I_DIRTY_TIME too */ if (!wb_has_dirty_io(wb) && (base_work->sync_mode == WB_SYNC_NONE || @@ -819,12 +824,22 @@ restart: wb_queue_work(wb, work); - next_memcg_id = wb->memcg_css->id + 1; + /* + * Pin @wb so that it stays on @bdi->wb_list. This allows + * continuing iteration from @wb after dropping and + * regrabbing rcu read lock. 
+ */ + wb_get(wb); + last_wb = wb; + rcu_read_unlock(); wb_wait_for_completion(bdi, &fallback_work_done); goto restart; } rcu_read_unlock(); + + if (last_wb) + wb_put(last_wb); } #else /* CONFIG_CGROUP_WRITEBACK */ @@ -1857,12 +1872,11 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) rcu_read_lock(); list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { struct bdi_writeback *wb; - struct wb_iter iter; if (!bdi_has_dirty_io(bdi)) continue; - bdi_for_each_wb(wb, bdi, &iter, 0) + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages), false, reason); } @@ -1894,9 +1908,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w) rcu_read_lock(); list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { struct bdi_writeback *wb; - struct wb_iter iter; - bdi_for_each_wb(wb, bdi, &iter, 0) + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) if (!list_empty(&wb->b_dirty_time)) wb_wakeup(wb); } diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index a23209b43842..1b4d69f68c33 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -116,6 +116,8 @@ struct bdi_writeback { struct list_head work_list; struct delayed_work dwork; /* work item used for writeback */ + struct list_head bdi_node; /* anchored at bdi->wb_list */ + #ifdef CONFIG_CGROUP_WRITEBACK struct percpu_ref refcnt; /* used only for !root wb's */ struct fprop_local_percpu memcg_completions; @@ -150,6 +152,7 @@ struct backing_dev_info { atomic_long_t tot_write_bandwidth; struct bdi_writeback wb; /* the root writeback info for this bdi */ + struct list_head wb_list; /* list of all wbs */ #ifdef CONFIG_CGROUP_WRITEBACK struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */ struct rb_root cgwb_congested_tree; /* their congested states */ diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index d5eb4ad1c534..78677e5a65bf 100644 --- 
a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -408,61 +408,6 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked) rcu_read_unlock(); } -struct wb_iter { - int start_memcg_id; - struct radix_tree_iter tree_iter; - void **slot; -}; - -static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter, - struct backing_dev_info *bdi) -{ - struct radix_tree_iter *titer = &iter->tree_iter; - - WARN_ON_ONCE(!rcu_read_lock_held()); - - if (iter->start_memcg_id >= 0) { - iter->slot = radix_tree_iter_init(titer, iter->start_memcg_id); - iter->start_memcg_id = -1; - } else { - iter->slot = radix_tree_next_slot(iter->slot, titer, 0); - } - - if (!iter->slot) - iter->slot = radix_tree_next_chunk(&bdi->cgwb_tree, titer, 0); - if (iter->slot) - return *iter->slot; - return NULL; -} - -static inline struct bdi_writeback *__wb_iter_init(struct wb_iter *iter, - struct backing_dev_info *bdi, - int start_memcg_id) -{ - iter->start_memcg_id = start_memcg_id; - - if (start_memcg_id) - return __wb_iter_next(iter, bdi); - else - return &bdi->wb; -} - -/** - * bdi_for_each_wb - walk all wb's of a bdi in ascending memcg ID order - * @wb_cur: cursor struct bdi_writeback pointer - * @bdi: bdi to walk wb's of - * @iter: pointer to struct wb_iter to be used as iteration buffer - * @start_memcg_id: memcg ID to start iteration from - * - * Iterate @wb_cur through the wb's (bdi_writeback's) of @bdi in ascending - * memcg ID order starting from @start_memcg_id. @iter is struct wb_iter - * to be used as temp storage during iteration. rcu_read_lock() must be - * held throughout iteration. 
- */ -#define bdi_for_each_wb(wb_cur, bdi, iter, start_memcg_id) \ - for ((wb_cur) = __wb_iter_init(iter, bdi, start_memcg_id); \ - (wb_cur); (wb_cur) = __wb_iter_next(iter, bdi)) - #else /* CONFIG_CGROUP_WRITEBACK */ static inline bool inode_cgwb_enabled(struct inode *inode) @@ -522,14 +467,6 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg) { } -struct wb_iter { - int next_id; -}; - -#define bdi_for_each_wb(wb_cur, bdi, iter, start_blkcg_id) \ - for ((iter)->next_id = (start_blkcg_id); \ - ({ (wb_cur) = !(iter)->next_id++ ? &(bdi)->wb : NULL; }); ) - static inline int inode_congested(struct inode *inode, int cong_bits) { return wb_congested(&inode_to_bdi(inode)->wb, cong_bits); diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 2df8ddcb0ca0..e92d77937fd3 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -480,6 +480,10 @@ static void cgwb_release_workfn(struct work_struct *work) release_work); struct backing_dev_info *bdi = wb->bdi; + spin_lock_irq(&cgwb_lock); + list_del_rcu(&wb->bdi_node); + spin_unlock_irq(&cgwb_lock); + wb_shutdown(wb); css_put(wb->memcg_css); @@ -575,6 +579,7 @@ static int cgwb_create(struct backing_dev_info *bdi, ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb); if (!ret) { atomic_inc(&bdi->usage_cnt); + list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list); list_add(&wb->memcg_node, memcg_cgwb_list); list_add(&wb->blkcg_node, blkcg_cgwb_list); css_get(memcg_css); @@ -764,15 +769,22 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { } int bdi_init(struct backing_dev_info *bdi) { + int ret; + bdi->dev = NULL; bdi->min_ratio = 0; bdi->max_ratio = 100; bdi->max_prop_frac = FPROP_FRAC_BASE; INIT_LIST_HEAD(&bdi->bdi_list); + INIT_LIST_HEAD(&bdi->wb_list); init_waitqueue_head(&bdi->wb_waitq); - return cgwb_bdi_init(bdi); + ret = cgwb_bdi_init(bdi); + + list_add_tail_rcu(&bdi->wb.bdi_node, &bdi->wb_list); + + return ret; } EXPORT_SYMBOL(bdi_init); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 
902e5f215e57..0f1a94e9f351 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1956,7 +1956,6 @@ void laptop_mode_timer_fn(unsigned long data) int nr_pages = global_page_state(NR_FILE_DIRTY) + global_page_state(NR_UNSTABLE_NFS); struct bdi_writeback *wb; - struct wb_iter iter; /* * We want to write everything out, not just down to the dirty @@ -1966,7 +1965,7 @@ void laptop_mode_timer_fn(unsigned long data) return; rcu_read_lock(); - bdi_for_each_wb(wb, &q->backing_dev_info, &iter, 0) + list_for_each_entry_rcu(wb, &q->backing_dev_info.wb_list, bdi_node) if (wb_has_dirty_io(wb)) wb_start_writeback(wb, nr_pages, true, WB_REASON_LAPTOP_TIMER); -- cgit v1.2.3 From d60d1bddd5b642711a237511845853755b25bf1f Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 29 Sep 2015 12:47:53 -0400 Subject: writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions MDTC_INIT() is used to initialize dirty_throttle_control for memcg domains. It used DTC_INIT_COMMON() to initialize mdtc->wb and ->wb_completions, which is incorrect as DTC_INIT_COMMON() sets the latter to wb->completions instead of wb->memcg_completions. This can lead to wildly incorrect results when calculating the proportion of dirty memory the memcg domain should get. Remove DTC_INIT_COMMON() and update MDTC_INIT() to initialize mdtc->wb_completions to wb->memcg_completions. Signed-off-by: Tejun Heo Fixes: c2aa723a6093 ("writeback: implement memcg writeback domain based throttling") Signed-off-by: Jens Axboe --- mm/page-writeback.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0f1a94e9f351..56c0bffa9f49 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -145,9 +145,6 @@ struct dirty_throttle_control { unsigned long pos_ratio; }; -#define DTC_INIT_COMMON(__wb) .wb = (__wb), \ - .wb_completions = &(__wb)->completions - /* * Length of period for aging writeout fractions of bdis.
This is an * arbitrarily chosen number. The longer the period, the slower fractions will @@ -157,12 +154,16 @@ struct dirty_throttle_control { #ifdef CONFIG_CGROUP_WRITEBACK -#define GDTC_INIT(__wb) .dom = &global_wb_domain, \ - DTC_INIT_COMMON(__wb) +#define GDTC_INIT(__wb) .wb = (__wb), \ + .dom = &global_wb_domain, \ + .wb_completions = &(__wb)->completions + #define GDTC_INIT_NO_WB .dom = &global_wb_domain -#define MDTC_INIT(__wb, __gdtc) .dom = mem_cgroup_wb_domain(__wb), \ - .gdtc = __gdtc, \ - DTC_INIT_COMMON(__wb) + +#define MDTC_INIT(__wb, __gdtc) .wb = (__wb), \ + .dom = mem_cgroup_wb_domain(__wb), \ + .wb_completions = &(__wb)->memcg_completions, \ + .gdtc = __gdtc static bool mdtc_valid(struct dirty_throttle_control *dtc) { @@ -213,7 +214,8 @@ static void wb_min_max_ratio(struct bdi_writeback *wb, #else /* CONFIG_CGROUP_WRITEBACK */ -#define GDTC_INIT(__wb) DTC_INIT_COMMON(__wb) +#define GDTC_INIT(__wb) .wb = (__wb), \ + .wb_completions = &(__wb)->completions #define GDTC_INIT_NO_WB #define MDTC_INIT(__wb, __gdtc) -- cgit v1.2.3 From c5edf9cdc4c483b9a94c03fc0b9f769bd090bf3e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 29 Sep 2015 13:04:26 -0400 Subject: writeback: fix incorrect calculation of available memory for memcg domains For memcg domains, the amount of available memory was calculated as min(the amount currently in use + headroom according to memcg, total clean memory) This isn't quite correct as what should be capped by the amount of clean memory is the headroom, not the sum of memory in use and headroom. For example, if a memcg domain has a significant amount of dirty memory, the above can lead to a value which is lower than the current amount in use which doesn't make much sense. In most circumstances, the above leads to a number which is somewhat but not drastically lower. 
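To make the problem concrete, here is a small user-space sketch of the two calculations. `avail_old()` and `avail_new()` are illustrative names, not kernel functions, and the page counts in the note below are made up. `avail_old()` models the capping described above; `avail_new()` follows the arithmetic of the mdtc_calc_avail() hunk in this patch.

```c
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Old: cap (in use + headroom) by system-wide clean memory. */
unsigned long avail_old(unsigned long filepages, unsigned long headroom,
			unsigned long gdtc_avail, unsigned long gdtc_dirty)
{
	unsigned long global_clean = gdtc_avail - min_ul(gdtc_avail, gdtc_dirty);

	return min_ul(filepages + headroom, global_clean);
}

/* New: cap only the headroom, by clean memory held outside the memcg. */
unsigned long avail_new(unsigned long filepages, unsigned long headroom,
			unsigned long mdtc_dirty,
			unsigned long gdtc_avail, unsigned long gdtc_dirty)
{
	unsigned long clean = filepages - min_ul(filepages, mdtc_dirty);
	unsigned long global_clean = gdtc_avail - min_ul(gdtc_avail, gdtc_dirty);
	unsigned long other_clean = global_clean - min_ul(global_clean, clean);

	return filepages + min_ul(headroom, other_clean);
}
```

For a memcg with 1000 file pages (900 of them dirty) and 500 pages of headroom, on a system with 2000 available pages of which 1500 are dirty, avail_old() yields 500, which is below the 1000 pages already in use, while avail_new() yields 1400.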
As the amount of memory which can be readily allocated to the memcg domain is capped by the amount of system-wide clean memory which is not already assigned to the memcg itself, the number we want is the amount currently in use + min(headroom according to memcg, clean memory elsewhere in the system) This patch updates mem_cgroup_wb_stats() to return the number of filepages and headroom instead of the calculated available pages. mdtc_cap_avail() is renamed to mdtc_calc_avail() and performs the above calculation from file, headroom, dirty and globally clean pages. v2: Dummy mem_cgroup_wb_stats() implementation wasn't updated leading to build failure when !CGROUP_WRITEBACK. Fixed. Signed-off-by: Tejun Heo Fixes: c2aa723a6093 ("writeback: implement memcg writeback domain based throttling") Signed-off-by: Jens Axboe --- include/linux/memcontrol.h | 8 +++++--- mm/memcontrol.c | 35 +++++++++++++++++------------------ mm/page-writeback.c | 29 ++++++++++++++++++----------- 3 files changed, 40 insertions(+), 32 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6452ff4c463f..3e3318ddfc0e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -676,8 +676,9 @@ enum { struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg); struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); -void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pavail, - unsigned long *pdirty, unsigned long *pwriteback); +void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, + unsigned long *pheadroom, unsigned long *pdirty, + unsigned long *pwriteback); #else /* CONFIG_CGROUP_WRITEBACK */ @@ -687,7 +688,8 @@ static inline struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb) } static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb, - unsigned long *pavail, + unsigned long *pfilepages, + unsigned long *pheadroom, unsigned long *pdirty, unsigned long *pwriteback) { diff --git 
a/mm/memcontrol.c b/mm/memcontrol.c index 1fedbde68f59..882c10cfd0ba 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3740,44 +3740,43 @@ struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb) /** * mem_cgroup_wb_stats - retrieve writeback related stats from its memcg * @wb: bdi_writeback in question - * @pavail: out parameter for number of available pages + * @pfilepages: out parameter for number of file pages + * @pheadroom: out parameter for number of allocatable pages according to memcg * @pdirty: out parameter for number of dirty pages * @pwriteback: out parameter for number of pages under writeback * - * Determine the numbers of available, dirty, and writeback pages in @wb's - * memcg. Dirty and writeback are self-explanatory. Available is a bit - * more involved. + * Determine the numbers of file, headroom, dirty, and writeback pages in + * @wb's memcg. File, dirty and writeback are self-explanatory. Headroom + * is a bit more involved. * - * A memcg's headroom is "min(max, high) - used". The available memory is - * calculated as the lowest headroom of itself and the ancestors plus the - * number of pages already being used for file pages. Note that this - * doesn't consider the actual amount of available memory in the system. - * The caller should further cap *@pavail accordingly. + * A memcg's headroom is "min(max, high) - used". In the hierarchy, the + * headroom is calculated as the lowest headroom of itself and the + * ancestors. Note that this doesn't consider the actual amount of + * available memory in the system. The caller should further cap + * *@pheadroom accordingly. 
*/ -void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pavail, - unsigned long *pdirty, unsigned long *pwriteback) +void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, + unsigned long *pheadroom, unsigned long *pdirty, + unsigned long *pwriteback) { struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent; - unsigned long head_room = PAGE_COUNTER_MAX; - unsigned long file_pages; *pdirty = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_DIRTY); /* this should eventually include NR_UNSTABLE_NFS */ *pwriteback = mem_cgroup_read_stat(memcg, MEM_CGROUP_STAT_WRITEBACK); + *pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) | + (1 << LRU_ACTIVE_FILE)); + *pheadroom = PAGE_COUNTER_MAX; - file_pages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) | - (1 << LRU_ACTIVE_FILE)); while ((parent = parent_mem_cgroup(memcg))) { unsigned long ceiling = min(memcg->memory.limit, memcg->high); unsigned long used = page_counter_read(&memcg->memory); - head_room = min(head_room, ceiling - min(ceiling, used)); + *pheadroom = min(*pheadroom, ceiling - min(ceiling, used)); memcg = parent; } - - *pavail = file_pages + head_room; } #else /* CONFIG_CGROUP_WRITEBACK */ diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 56c0bffa9f49..2c90357c34ea 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -684,13 +684,19 @@ static unsigned long hard_dirty_limit(struct wb_domain *dom, return max(thresh, dom->dirty_limit); } -/* memory available to a memcg domain is capped by system-wide clean memory */ -static void mdtc_cap_avail(struct dirty_throttle_control *mdtc) +/* + * Memory which can be further allocated to a memcg domain is capped by + * system-wide clean memory excluding the amount being used in the domain. 
+ */ +static void mdtc_calc_avail(struct dirty_throttle_control *mdtc, + unsigned long filepages, unsigned long headroom) { struct dirty_throttle_control *gdtc = mdtc_gdtc(mdtc); - unsigned long clean = gdtc->avail - min(gdtc->avail, gdtc->dirty); + unsigned long clean = filepages - min(filepages, mdtc->dirty); + unsigned long global_clean = gdtc->avail - min(gdtc->avail, gdtc->dirty); + unsigned long other_clean = global_clean - min(global_clean, clean); - mdtc->avail = min(mdtc->avail, clean); + mdtc->avail = filepages + min(headroom, other_clean); } /** @@ -1564,16 +1570,16 @@ static void balance_dirty_pages(struct address_space *mapping, } if (mdtc) { - unsigned long writeback; + unsigned long filepages, headroom, writeback; /* * If @wb belongs to !root memcg, repeat the same * basic calculations for the memcg domain. */ - mem_cgroup_wb_stats(wb, &mdtc->avail, &mdtc->dirty, - &writeback); - mdtc_cap_avail(mdtc); + mem_cgroup_wb_stats(wb, &filepages, &headroom, + &mdtc->dirty, &writeback); mdtc->dirty += writeback; + mdtc_calc_avail(mdtc, filepages, headroom); domain_dirty_limits(mdtc); @@ -1895,10 +1901,11 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb) return true; if (mdtc) { - unsigned long writeback; + unsigned long filepages, headroom, writeback; - mem_cgroup_wb_stats(wb, &mdtc->avail, &mdtc->dirty, &writeback); - mdtc_cap_avail(mdtc); + mem_cgroup_wb_stats(wb, &filepages, &headroom, &mdtc->dirty, + &writeback); + mdtc_calc_avail(mdtc, filepages, headroom); domain_dirty_limits(mdtc); /* ditto, ignore writeback */ if (mdtc->dirty > mdtc->bg_thresh) -- cgit v1.2.3 From 835da3f99d329b1160a1f7fc82c7ac81163d63d0 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 6 Oct 2015 22:29:48 +0200 Subject: nvme: fix 32-bit build warning Compiling the nvme driver on 32-bit warns about a cast from a __u64 variable to a pointer: drivers/block/nvme-core.c: In function 'nvme_submit_io': drivers/block/nvme-core.c:1847:4: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast] (void __user *)io.addr, length, NULL, 0); The cast here is intentional and safe, so we can shut up the gcc warning by adding an intermediate cast to 'uintptr_t'. I had previously submitted a patch to fix this problem in the nvme driver, but it was accepted on the same day that two new warnings got added. For clarification, I also change the third instance of this cast to use uintptr_t instead of unsigned long now. Signed-off-by: Arnd Bergmann Fixes: d29ec8241c10e ("nvme: submit internal commands through the block layer") Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- drivers/block/nvme-core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index 6f04771f1019..fce353ba5f66 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core.c @@ -1804,7 +1804,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) length = (io.nblocks + 1) << ns->lba_shift; meta_len = (io.nblocks + 1) * ns->ms; - metadata = (void __user *)(unsigned long)io.metadata; + metadata = (void __user *)(uintptr_t)io.metadata; write = io.opcode & 1; if (ns->ext) { @@ -1844,7 +1844,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) c.rw.metadata = cpu_to_le64(meta_dma); status = __nvme_submit_sync_cmd(ns->queue, &c, NULL, - (void __user *)io.addr, length, NULL, 0); + (void __user *)(uintptr_t)io.addr, length, NULL, 0); unmap: if (meta) { if (status == NVME_SC_SUCCESS && !write) { @@ -1886,7 +1886,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns, timeout = msecs_to_jiffies(cmd.timeout_ms); status = __nvme_submit_sync_cmd(ns ? 
ns->queue : dev->admin_q, &c, - NULL, (void __user *)cmd.addr, cmd.data_len, + NULL, (void __user *)(uintptr_t)cmd.addr, cmd.data_len, &cmd.result, timeout); if (status >= 0) { if (put_user(cmd.result, &ucmd->result)) -- cgit v1.2.3 From 51a6256b00008a3c520f6f31bcd62cd15cb05960 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 13 Oct 2015 04:32:49 +0900 Subject: ARM: EXYNOS: Fix double of_node_put() when parsing child power domains On each iteration of for_each_compatible_node(), the reference count of the current device node is already dropped by the loop iterator. A manual call to of_node_put() is required only when breaking out of the loop, which does not happen here. The double of_node_put() (with CONFIG_OF_DYNAMIC enabled) led to the counter dropping below its expected initial value. Fixes: fe4034a3fad7 ("ARM: EXYNOS: Add missing of_node_put() when parsing power domains") Signed-off-by: Krzysztof Kozlowski Cc: Signed-off-by: Kukjin Kim --- arch/arm/mach-exynos/pm_domains.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/arm/mach-exynos/pm_domains.c b/arch/arm/mach-exynos/pm_domains.c index 4a87e86dec45..7c21760f590f 100644 --- a/arch/arm/mach-exynos/pm_domains.c +++ b/arch/arm/mach-exynos/pm_domains.c @@ -200,15 +200,15 @@ no_clk: args.args_count = 0; child_domain = of_genpd_get_from_provider(&args); if (IS_ERR(child_domain)) - goto next_pd; + continue; if (of_parse_phandle_with_args(np, "power-domains", "#power-domain-cells", 0, &args) != 0) - goto next_pd; + continue; parent_domain = of_genpd_get_from_provider(&args); if (IS_ERR(parent_domain)) - goto next_pd; + continue; if (pm_genpd_add_subdomain(parent_domain, child_domain)) pr_warn("%s failed to add subdomain: %s\n", @@ -216,8 +216,6 @@ no_clk: else pr_info("%s has as child subdomain: %s.\n", parent_domain->name, child_domain->name); -next_pd: - of_node_put(np); } return 0; -- cgit v1.2.3 From b8bb9baad27e455c467e8fac47eebadbe765c18f Mon Sep 17 00:00:00 2001 From: Alim
Akhtar Date: Tue, 13 Oct 2015 04:32:53 +0900 Subject: ARM: dts: Fix audio card detection on Peach boards Since commit 2fad972d45c4 ("ARM: dts: Add mclk entry for Peach boards"), sound card detection is broken on peach boards and gives below errors: [ 3.630457] max98090 7-0010: MAX98091 REVID=0x51 [ 3.634233] max98090 7-0010: use default 2.8v micbias [ 3.640985] snow-audio sound: HiFi <-> 3830000.i2s mapping ok [ 3.645307] max98090 7-0010: Invalid master clock frequency [ 3.650824] snow-audio sound: ASoC: Peach-Pi-I2S-MAX98091 late_probe() failed: -22 [ 3.658914] snow-audio sound: snd_soc_register_card failed (-22) [ 3.664366] snow-audio: probe of sound failed with error -22 This patch adds missing assigned-clocks and assigned-clock-parents for pmu_system_controller node which is used as "mclk" for audio codec. Fixes: 2fad972d45c4 ("ARM: dts: Add mclk entry for Peach boards") Signed-off-by: Alim Akhtar Reviewed-by: Krzysztof Kozlowski Cc: Signed-off-by: Kukjin Kim --- arch/arm/boot/dts/exynos5420-peach-pit.dts | 5 +++++ arch/arm/boot/dts/exynos5800-peach-pi.dts | 5 +++++ 2 files changed, 10 insertions(+) diff --git a/arch/arm/boot/dts/exynos5420-peach-pit.dts b/arch/arm/boot/dts/exynos5420-peach-pit.dts index 8f4d76c5e11c..1b95da79293c 100644 --- a/arch/arm/boot/dts/exynos5420-peach-pit.dts +++ b/arch/arm/boot/dts/exynos5420-peach-pit.dts @@ -915,6 +915,11 @@ }; }; +&pmu_system_controller { + assigned-clocks = <&pmu_system_controller 0>; + assigned-clock-parents = <&clock CLK_FIN_PLL>; +}; + &rtc { status = "okay"; clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>; diff --git a/arch/arm/boot/dts/exynos5800-peach-pi.dts b/arch/arm/boot/dts/exynos5800-peach-pi.dts index 7d5b386b5ae6..8f40c7e549bd 100644 --- a/arch/arm/boot/dts/exynos5800-peach-pi.dts +++ b/arch/arm/boot/dts/exynos5800-peach-pi.dts @@ -878,6 +878,11 @@ }; }; +&pmu_system_controller { + assigned-clocks = <&pmu_system_controller 0>; + assigned-clock-parents = <&clock CLK_FIN_PLL>; +}; + &rtc { 
status = "okay"; clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>; -- cgit v1.2.3 From 7e381ec6a36aa44f15fc1a76e6efb9e2cd942e61 Mon Sep 17 00:00:00 2001 From: Tomi Valkeinen Date: Fri, 25 Sep 2015 16:02:03 +0300 Subject: ARM: dts: am57xx-beagle-x15: set VDD_SD to always-on LDO1 regulator (VDD_SD) is connected to SoC's vddshv8. vddshv8 needs to be kept always powered (see commit 5a0f93c6576a ("ARM: dts: Add am57xx-beagle-x15"), but at the moment VDD_SD is enabled/disabled depending on whether an SD card is inserted or not. This patch sets LDO1 regulator to always-on. This patch has a side effect of fixing another issue, HDMI DDC not working when SD card is not inserted: Why this happens is that the tpd12s015 (HDMI level shifter/ESD protection chip) has LS_OE GPIO input, which needs to be enabled for the HDMI DDC to work. LS_OE comes from gpio6_28. The pin that provides gpio6_28 is powered by vddshv8, and vddshv8 comes from VDD_SD. So when SD card is not inserted, VDD_SD is disabled, and LS_OE stays off. The proper fix for the HDMI DDC issue would be to maybe have the pinctrl framework manage the pin specific power. Apparently this fixes also a third issue (copy paste from Kishon's patch): ldo1_reg in addition to being connected to the io lines is also connected to the card detect line. On card removal, omap_hsmmc driver does a regulator_disable causing card detect line to be pulled down. This raises a card insertion interrupt and once the MMC core detects there is no card inserted, it does a regulator disable which again raises a card insertion interrupt. This happens in a loop causing infinite MMC interrupts. 
Fixes: 5a0f93c6576a ("ARM: dts: Add am57xx-beagle-x15") Cc: Kishon Vijay Abraham I Signed-off-by: Tomi Valkeinen Reported-by: Louis McCarthy Acked-by: Nishanth Menon Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/am57xx-beagle-x15.dts | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/am57xx-beagle-x15.dts b/arch/arm/boot/dts/am57xx-beagle-x15.dts index 568adf5efde0..d55e3ea89fda 100644 --- a/arch/arm/boot/dts/am57xx-beagle-x15.dts +++ b/arch/arm/boot/dts/am57xx-beagle-x15.dts @@ -402,11 +402,12 @@ /* SMPS9 unused */ ldo1_reg: ldo1 { - /* VDD_SD */ + /* VDD_SD / VDDSHV8 */ regulator-name = "ldo1"; regulator-min-microvolt = <1800000>; regulator-max-microvolt = <3300000>; regulator-boot-on; + regulator-always-on; }; ldo2_reg: ldo2 { -- cgit v1.2.3 From be59b6192f43a9792f5f636b51358196ba11bbf6 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Mon, 12 Oct 2015 16:19:54 -0700 Subject: memory: omap-gpmc: Fix unselectable debug option for GPMC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 63aa945b1013 ("memory: omap-gpmc: Add Kconfig option for debug") added a debug option for GPMC, but somehow managed to keep it unselectable. This probably happened because I had some uncommitted changes and the GPMC option is selected in the platform specific Kconfig. Let's also update the description a bit, it does not mention that enabling the debug option also disables the reset of GPMC controller during the init as pointed out by Uwe Kleine-König and Roger Quadros . 
Fixes: 63aa945b1013 ("memory: omap-gpmc: Add Kconfig option for debug") Reported-by: Uwe Kleine-König Acked-by: Roger Quadros Signed-off-by: Tony Lindgren --- drivers/memory/Kconfig | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig index c6a644b22af4..6f3154613dc7 100644 --- a/drivers/memory/Kconfig +++ b/drivers/memory/Kconfig @@ -58,12 +58,18 @@ config OMAP_GPMC memory drives like NOR, NAND, OneNAND, SRAM. config OMAP_GPMC_DEBUG - bool + bool "Enable GPMC debug output and skip reset of GPMC during init" depends on OMAP_GPMC help Enables verbose debugging mostly to decode the bootloader provided - timings. Enable this during development to configure devices - connected to the GPMC bus. + timings. To preserve the bootloader provided timings, the reset + of GPMC is skipped during init. Enable this during development to + configure devices connected to the GPMC bus. + + NOTE: In addition to matching the register setup with the bootloader + you also need to match the GPMC FCLK frequency used by the + bootloader or else the GPMC timings won't be identical with the + bootloader timings. config MVEBU_DEVBUS bool "Marvell EBU Device Bus Controller" -- cgit v1.2.3 From fd820a1ec758bc25b0eb10ab5e88e6c61fbcc8aa Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Tue, 6 Oct 2015 22:07:49 +0200 Subject: memory: omap-gpmc: dump "before" state before first modification MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When gpmc_cs_show_timings is called in gpmc_cs_set_timings() gpmc_cs_program_settings() was already run which modifies the CONFIG1 register. So to be more useful do the "before" dump earlier. 
Signed-off-by: Uwe Kleine-König Acked-by: Roger Quadros Signed-off-by: Tony Lindgren --- drivers/memory/omap-gpmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/memory/omap-gpmc.c b/drivers/memory/omap-gpmc.c index 32ac049f2bc4..6515dfc2b805 100644 --- a/drivers/memory/omap-gpmc.c +++ b/drivers/memory/omap-gpmc.c @@ -696,7 +696,6 @@ int gpmc_cs_set_timings(int cs, const struct gpmc_timings *t, int div; u32 l; - gpmc_cs_show_timings(cs, "before gpmc_cs_set_timings"); div = gpmc_calc_divider(t->sync_clk); if (div < 0) return div; @@ -1988,6 +1987,7 @@ static int gpmc_probe_generic_child(struct platform_device *pdev, if (ret < 0) goto err; + gpmc_cs_show_timings(cs, "before gpmc_cs_program_settings"); ret = gpmc_cs_program_settings(cs, &gpmc_s); if (ret < 0) goto err; -- cgit v1.2.3 From d8e1f5ed11a39a68da00f05000466c4f6db4456e Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Mon, 12 Oct 2015 16:19:54 -0700 Subject: Documentation: ARM: List new omap MMC requirements Earlier the PBIAS regulator was optional, not so with recent omap_hsmmc changes. To make things easier for people with custom .config files, let's add minimal documentation for it as suggested by Russell King . Signed-off-by: Tony Lindgren --- Documentation/arm/OMAP/README | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 Documentation/arm/OMAP/README diff --git a/Documentation/arm/OMAP/README b/Documentation/arm/OMAP/README new file mode 100644 index 000000000000..75645c45d14a --- /dev/null +++ b/Documentation/arm/OMAP/README @@ -0,0 +1,7 @@ +This file contains documentation for running mainline +kernel on omaps. + +KERNEL NEW DEPENDENCIES +v4.3+ Update is needed for custom .config files to make sure + CONFIG_REGULATOR_PBIAS is enabled for MMC1 to work + properly. 
-- cgit v1.2.3 From 42f2bb1c494543084b764e1ca253c73db910daf2 Mon Sep 17 00:00:00 2001 From: Vinod Koul Date: Tue, 13 Oct 2015 14:57:49 +0530 Subject: ALSA: hdac: Explicitly add io.h Compiling the hdac extended core on arm fails with below error: sound/hda/ext/hdac_ext_bus.c: In function 'hdac_ext_writel': >> sound/hda/ext/hdac_ext_bus.c:29:2: error: implicit declaration of >> function +'writel' [-Werror=implicit-function-declaration] writel(value, addr); ^ sound/hda/ext/hdac_ext_bus.c: In function 'hdac_ext_readl': >> sound/hda/ext/hdac_ext_bus.c:34:2: error: implicit declaration of >> function +'readl' [-Werror=implicit-function-declaration] return readl(addr); This is fixed by explicitly including io.h Fixes: 99463b3a3994 - ('ALSA: hda: provide default bus io ops extended hdac') Reported-by: kbuild test robot Suggested-by: Mark Brown Signed-off-by: Vinod Koul Signed-off-by: Takashi Iwai --- sound/hda/ext/hdac_ext_bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/hda/ext/hdac_ext_bus.c b/sound/hda/ext/hdac_ext_bus.c index 4449d1a99089..2433f7c81472 100644 --- a/sound/hda/ext/hdac_ext_bus.c +++ b/sound/hda/ext/hdac_ext_bus.c @@ -19,6 +19,7 @@ #include #include +#include #include MODULE_DESCRIPTION("HDA extended core"); -- cgit v1.2.3 From e8d65a8d985271a102f07c7456da5b86c19ffe16 Mon Sep 17 00:00:00 2001 From: David Henningsson Date: Tue, 13 Oct 2015 10:10:18 +0200 Subject: ALSA: hda - Fix inverted internal mic on Lenovo G50-80 Add the appropriate quirk to indicate the Lenovo G50-80 has a stereo mic input where one channel has reverse polarity. 
Alsa-info available at: https://launchpadlibrarian.net/220846272/AlsaInfo.txt Cc: stable@vger.kernel.org BugLink: https://bugs.launchpad.net/bugs/1504778 Signed-off-by: David Henningsson Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_conexant.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c index ca03c40609fc..2f0ec7c45fc7 100644 --- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -819,6 +819,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = { SND_PCI_QUIRK(0x17aa, 0x21da, "Lenovo X220", CXT_PINCFG_LENOVO_TP410), SND_PCI_QUIRK(0x17aa, 0x21db, "Lenovo X220-tablet", CXT_PINCFG_LENOVO_TP410), SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo IdeaPad Z560", CXT_FIXUP_MUTE_LED_EAPD), + SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC), SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC), SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC), SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC), -- cgit v1.2.3 From e797e4b71777877b19b50e3d736331c947ccffe7 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Tue, 6 Oct 2015 14:53:01 +0200 Subject: drm/i915: Fix kerneldoc for i915_gem_shrink_all I've botched this in commit eb0b44adc08c0be01a027eb009e9cdadc31e65a2 Author: Daniel Vetter Date: Wed Mar 18 14:47:59 2015 +0100 drm/i915: kerneldoc for i915_gem_shrinker.c so let's fix it. 
Signed-off-by: Daniel Vetter Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c index f6ecbda2c604..674341708033 100644 --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c @@ -143,7 +143,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv, } /** - * i915_gem_shrink - Shrink buffer object caches completely + * i915_gem_shrink_all - Shrink buffer object caches completely * @dev_priv: i915 device * * This is a simple wraper around i915_gem_shrink() to aggressively shrink all -- cgit v1.2.3 From 40a24488f5250d63341e74b9994159afc4589606 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Fri, 21 Aug 2015 16:08:41 +0100 Subject: drm/i915: Flush pipecontrol post-sync writes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In order to flush the results from in-batch pipecontrol writes (used for example in glQuery) before declaring the batch complete (and so declaring the query results coherent), we need to set the FlushEnable bit in our flushing pipecontrol. The FlushEnable bit "waits until all previous writes of immediate data from post-sync circles are complete before executing the next command". I get GPU hangs on byt without flushing these writes (running ue4). piglit has examples where the flush is required for correct rendering. 
Signed-off-by: Chris Wilson Reviewed-by: Ville Syrjälä Acked-by: Daniel Vetter Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_lrc.c | 1 + drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 7412caedcf7f..29dd4488dc49 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1659,6 +1659,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request, if (flush_domains) { flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; + flags |= PIPE_CONTROL_FLUSH_ENABLE; } if (invalidate_domains) { diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 6e6b8db996ef..61b451fbd09e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -347,6 +347,7 @@ gen7_render_ring_flush(struct drm_i915_gem_request *req, if (flush_domains) { flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; + flags |= PIPE_CONTROL_FLUSH_ENABLE; } if (invalidate_domains) { flags |= PIPE_CONTROL_TLB_INVALIDATE; @@ -418,6 +419,7 @@ gen8_render_ring_flush(struct drm_i915_gem_request *req, if (flush_domains) { flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; + flags |= PIPE_CONTROL_FLUSH_ENABLE; } if (invalidate_domains) { flags |= PIPE_CONTROL_TLB_INVALIDATE; -- cgit v1.2.3 From 8e7a65aa70bcc1235a44e40ae0da5056525fe081 Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Wed, 7 Oct 2015 22:08:24 +0300 Subject: drm/i915: Restore lost DPLL register write on gen2-4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We accidentally lost the initial DPLL register write in 1c4e02746147 drm/i915: Fix DVO 2x clock enable on 830M The "three times for luck" hack probably saved us from a 
total disaster. But anyway, bring the initial write back so that the code actually makes some sense. Reported-and-tested-by: Nick Bowler References: http://mid.gmane.org/CAN_QmVyMaArxYgEcVVsGvsMo7-6ohZr8HmF5VhkkL4i9KOmrhw@mail.gmail.com Cc: stable@vger.kernel.org Cc: Nick Bowler Signed-off-by: Ville Syrjälä Reviewed-by: Daniel Vetter Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cf418be7d30a..bdfac53dd945 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -1724,6 +1724,8 @@ static void i9xx_enable_pll(struct intel_crtc *crtc) I915_READ(DPLL(!crtc->pipe)) | DPLL_DVO_2X_MODE); } + I915_WRITE(reg, dpll); + /* Wait for the clocks to stabilize. */ POSTING_READ(reg); udelay(150); -- cgit v1.2.3 From c2b63374461c0986147902f719c26412d1f26fbc Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Wed, 7 Oct 2015 22:08:25 +0300 Subject: drm/i915: Enable DPLL VGA mode before P1/P2 divider write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Apparently writing the DPLL register P1/P2 divider fields won't trigger an actual change in the DPLL output unless VGA mode is enabled for prior to the register write that changes the P1/P2 dividers. The write with the new P1/P2 divider can itself disable VGA mode again without problems. I tested the behaviour on my 946GZ, and when manually frobbing the register with the display on, the behaviour is very clear. However I can't explain why this machine actually works. The P1/P2 divider changes caused by normal modesets do seem to make it through to the hardware somehow since I get a stable picture on the monitor with any resolution. Maybe it's the "three times for luck" stuff that somehow masks the problem, or something. But apparently there are machines (eg. 
Nick Bowler's G45) where that isn't the case and we fail to get the correct clock from the DPLL. Things used to work because we enabled VGA mode for disabled DPLLs, so when re-enabling the DPLL VGA mode was enabled just prior to the first register write, and hence the P1/P2 change went through without a hitch. That got changed in b8afb9113c51 drm/i915: Keep GMCH DPLL VGA mode always disabled in the name of consistency. In order to keep the consistency part, leave VGA mode disabled for disabled DPLLs, but turn it on just prior to updating the P1/P2 dividers to make sure the hardware picks up on the new values. Cc: Nick Bowler Reported-by: Nick Bowler Tested-by: Nick Bowler Signed-off-by: Ville Syrjälä Reviewed-by: Daniel Vetter Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index bdfac53dd945..96e6c41783df 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -1724,6 +1724,13 @@ static void i9xx_enable_pll(struct intel_crtc *crtc) I915_READ(DPLL(!crtc->pipe)) | DPLL_DVO_2X_MODE); } + /* + * Apparently we need to have VGA mode enabled prior to changing + * the P1/P2 dividers. Otherwise the DPLL will keep using the old + * dividers, even though the register value does change. + */ + I915_WRITE(reg, 0); + I915_WRITE(reg, dpll); /* Wait for the clocks to stabilize. */ -- cgit v1.2.3 From cc917ab43541db3ff66d0136042686d40a1b4c9a Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Tue, 13 Oct 2015 14:22:26 +0100 Subject: drm/i915: Deny wrapping an userptr into a framebuffer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pinning a userptr onto the hardware raises interesting questions about the lifetime of such a surface as the framebuffer extends that life beyond the client's address space. 
That is the hardware will need to keep scanning out from the backing storage even after the client wants to remap its address space. As the hardware pins the backing storage, the userptr becomes invalid and this raises a WARN when the clients tries to unmap its address space. The situation can be even more complicated when the buffer is passed between processes, between a client and display server, where the lifetime and hardware access is even more confusing. Deny it. Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: Tvrtko Ursulin Cc: Michał Winiarski Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/i915_gem_userptr.c | 5 ++++- drivers/gpu/drm/i915/intel_display.c | 5 +++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index 8fd431bcdfd3..a96b9006a51e 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -804,7 +804,10 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = { * Also note, that the object created here is not currently a "first class" * object, in that several ioctls are banned. These are the CPU access * ioctls: mmap(), pwrite and pread. In practice, you are expected to use - * direct access via your pointer rather than use those ioctls. + * direct access via your pointer rather than use those ioctls. Another + * restriction is that we do not allow userptr surfaces to be pinned to the + * hardware and so we reject any attempt to create a framebuffer out of a + * userptr. * * If you think this is a good interface to use to pass GPU memory between * drivers, please use dma-buf instead. 
In fact, wherever possible use diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 96e6c41783df..2bf248b04542 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -14116,6 +14116,11 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); struct drm_i915_gem_object *obj = intel_fb->obj; + if (obj->userptr.mm) { + DRM_DEBUG("attempting to use a userptr for a framebuffer, denied\n"); + return -EINVAL; + } + return drm_gem_handle_create(file, &obj->base, handle); } -- cgit v1.2.3 From ba2374fd2bf379f933773811fdb06cb6a5445f41 Mon Sep 17 00:00:00 2001 From: Christian Zander Date: Wed, 10 Jun 2015 09:41:45 -0700 Subject: iommu/vt-d: fix range computation when making room for large pages In preparation for the installation of a large page, any small page tables that may still exist in the target IOV address range are removed. However, if a scatter/gather list entry is large enough to fit more than one large page, the address space for any subsequent large pages is not cleared of conflicting small page tables. This can cause legitimate mapping requests to fail with errors of the form below, potentially followed by a series of IOMMU faults: ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083) In this example, a 4MiB scatter/gather list entry resulted in the successful installation of a large page @ vPFN 0xfdc00, followed by a failed attempt to install another large page @ vPFN 0xfde00, due to the presence of a pointer to a small page table @ 0x7f83a4000. To address this problem, compute the number of large pages that fit into a given scatter/gather list entry, and use it to derive the last vPFN covered by the large page(s). 
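The range computation described above can be sketched as a small standalone C fragment. This is only an illustration, not the driver code: the helper names `superpage_end_pfn()` and `old_end_pfn()` are hypothetical, and the numbers assume 4KiB base pages, so a 2MiB large page spans 512 vPFNs and the 4MiB scatter/gather entry from the example spans 1024.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of intel-iommu.c): last vPFN that has to be
 * cleared of small page tables when one scatter/gather entry of sg_res pages
 * is mapped with large pages of lvl_pages pages each.
 */
static unsigned long superpage_end_pfn(unsigned long iov_pfn,
				       unsigned long sg_res,
				       unsigned long lvl_pages)
{
	unsigned long nr_superpages;

	assert(lvl_pages != 0);
	nr_superpages = sg_res / lvl_pages;	/* whole large pages that fit */
	return iov_pfn + nr_superpages * lvl_pages - 1;
}

/* The pre-patch computation: only the first large page is covered. */
static unsigned long old_end_pfn(unsigned long iov_pfn,
				 unsigned long lvl_pages)
{
	return iov_pfn + lvl_pages - 1;
}
```

With iov_pfn = 0xfdc00, sg_res = 1024 and lvl_pages = 512, the old computation stops at vPFN 0xfddff, so a small page table at 0xfde00 survives; the new one extends to 0xfdfff and covers the second large page as well.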
Cc: stable@vger.kernel.org Signed-off-by: Christian Zander Signed-off-by: David Woodhouse --- drivers/iommu/intel-iommu.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 041bc1810a86..df53855a5f3e 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2115,15 +2115,19 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, return -ENOMEM; /* It is large page*/ if (largepage_lvl > 1) { + unsigned long nr_superpages, end_pfn; + pteval |= DMA_PTE_LARGE_PAGE; lvl_pages = lvl_to_nr_pages(largepage_lvl); + + nr_superpages = sg_res / lvl_pages; + end_pfn = iov_pfn + nr_superpages * lvl_pages - 1; + /* * Ensure that old small page tables are - * removed to make room for superpage, - * if they exist. + * removed to make room for superpage(s). */ - dma_pte_free_pagetable(domain, iov_pfn, - iov_pfn + lvl_pages - 1); + dma_pte_free_pagetable(domain, iov_pfn, end_pfn); } else { pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE; } -- cgit v1.2.3 From 2e2edebefceef201624dcc323a1f7761e0040cf5 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Wed, 14 Oct 2015 11:29:01 +0300 Subject: Revert "drm/i915: Add primary plane to mask if it's visible" This reverts commit 721a09f7393de6c28a07516dccd654c6e995944a. There is nothing wrong with the commit per se. We had two versions of the commit, one in -next headed for v4.4 and this one for v4.3. Turns out we'll need to backport more fixes from -next, and they conflict with the v4.3 version. It gets messy. It will be easiest to revert this one, and backport all the relevant commits from -next without modifications; they apply cleanly after this revert. 
Requested-by: Joseph Yasi References: https://bugs.freedesktop.org/show_bug.cgi?id=91910#c4 Cc: Maarten Lankhorst Acked-by: Maarten Lankhorst Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 2bf248b04542..7704315e067f 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -15101,12 +15101,9 @@ static void readout_plane_state(struct intel_crtc *crtc, plane_state = to_intel_plane_state(p->base.state); - if (p->base.type == DRM_PLANE_TYPE_PRIMARY) { + if (p->base.type == DRM_PLANE_TYPE_PRIMARY) plane_state->visible = primary_get_hw_state(crtc); - if (plane_state->visible) - crtc->base.state->plane_mask |= - 1 << drm_plane_index(&p->base); - } else { + else { if (active) p->disable_plane(&p->base, &crtc->base); -- cgit v1.2.3 From c4816c7389d8dbcad036be7e5a34584289d9f590 Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Thu, 10 Sep 2015 18:59:07 +0300 Subject: drm/i915: Assign hwmode after encoder state readout MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The dotclock is often calculated in encoder .get_config(), so we shouldn't copy the adjusted_mode to hwmode until we have read out the dotclock. Gets rid of some warnings like these: [drm:drm_calc_timestamping_constants [drm]] *ERROR* crtc 21: Can't calculate constants, dotclock = 0! [drm:i915_get_vblank_timestamp] crtc 0 is disabled v2: Steal Maarten's idea to move crtc->mode etc. 
assignment too Cc: Maarten Lankhorst Cc: Patrik Jakobsson Signed-off-by: Ville Syrjälä Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91428 Reviewed-by: Patrik Jakobsson Signed-off-by: Daniel Vetter [Jani: cherry-picked from -next to v4.3] Acked-by: Maarten Lankhorst Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 57 +++++++++++++++++++----------------- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 7704315e067f..ed87a7e4c32a 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -15132,33 +15132,6 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev) crtc->base.state->active = crtc->active; crtc->base.enabled = crtc->active; - memset(&crtc->base.mode, 0, sizeof(crtc->base.mode)); - if (crtc->base.state->active) { - intel_mode_from_pipe_config(&crtc->base.mode, crtc->config); - intel_mode_from_pipe_config(&crtc->base.state->adjusted_mode, crtc->config); - WARN_ON(drm_atomic_set_mode_for_crtc(crtc->base.state, &crtc->base.mode)); - - /* - * The initial mode needs to be set in order to keep - * the atomic core happy. It wants a valid mode if the - * crtc's enabled, so we do the above call. - * - * At this point some state updated by the connectors - * in their ->detect() callback has not run yet, so - * no recalculation can be done yet. - * - * Even if we could do a recalculation and modeset - * right now it would cause a double modeset if - * fbdev or userspace chooses a different initial mode. - * - * If that happens, someone indicated they wanted a - * mode change, which means it's safe to do a full - * recalculation. 
- */ - crtc->base.state->mode.private_flags = I915_MODE_FLAG_INHERITED; - } - - crtc->base.hwmode = crtc->config->base.adjusted_mode; readout_plane_state(crtc, to_intel_crtc_state(crtc->base.state)); DRM_DEBUG_KMS("[CRTC:%d] hw state readout: %s\n", @@ -15218,6 +15191,36 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev) connector->base.name, connector->base.encoder ? "enabled" : "disabled"); } + + for_each_intel_crtc(dev, crtc) { + crtc->base.hwmode = crtc->config->base.adjusted_mode; + + memset(&crtc->base.mode, 0, sizeof(crtc->base.mode)); + if (crtc->base.state->active) { + intel_mode_from_pipe_config(&crtc->base.mode, crtc->config); + intel_mode_from_pipe_config(&crtc->base.state->adjusted_mode, crtc->config); + WARN_ON(drm_atomic_set_mode_for_crtc(crtc->base.state, &crtc->base.mode)); + + /* + * The initial mode needs to be set in order to keep + * the atomic core happy. It wants a valid mode if the + * crtc's enabled, so we do the above call. + * + * At this point some state updated by the connectors + * in their ->detect() callback has not run yet, so + * no recalculation can be done yet. + * + * Even if we could do a recalculation and modeset + * right now it would cause a double modeset if + * fbdev or userspace chooses a different initial mode. + * + * If that happens, someone indicated they wanted a + * mode change, which means it's safe to do a full + * recalculation. 
+ */ + crtc->base.state->mode.private_flags = I915_MODE_FLAG_INHERITED; + } + } } /* Scan out the current hw modeset state, -- cgit v1.2.3 From 0836e6d8c47416d6eb60bb68a5d7213c0c2d0d29 Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Thu, 10 Sep 2015 18:59:08 +0300 Subject: drm/i915: Move sprite/cursor plane disable to intel_sanitize_crtc() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move the sprite/cursor plane disabling to occur in intel_sanitize_crtc() where it belongs instead of doing it in intel_modeset_readout_hw_state(). The plane disabling was first added in 4cf0ebbd4fafbdf8e6431dbb315e5511c3efdc3b drm/i915: Rework plane readout. I got the idea from some patches from Partik and/or Maarten but those moved also the plane state readout to intel_sanitize_crtc() which isn't quite right in my opinion. Cc: Maarten Lankhorst Cc: Patrik Jakobsson Signed-off-by: Ville Syrjälä References: https://bugs.freedesktop.org/show_bug.cgi?id=91910 Reviewed-by: Patrik Jakobsson Signed-off-by: Daniel Vetter [Jani: cherry-picked from -next to v4.3] Acked-by: Maarten Lankhorst Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 44 ++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index ed87a7e4c32a..ac407e3e1edd 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -14911,9 +14911,19 @@ static void intel_sanitize_crtc(struct intel_crtc *crtc) /* restore vblank interrupts to correct state */ drm_crtc_vblank_reset(&crtc->base); if (crtc->active) { + struct intel_plane *plane; + drm_calc_timestamping_constants(&crtc->base, &crtc->base.hwmode); update_scanline_offset(crtc); drm_crtc_vblank_on(&crtc->base); + + /* Disable everything but the primary plane */ + for_each_intel_plane_on_crtc(dev, crtc, plane) { + if (plane->base.type == DRM_PLANE_TYPE_PRIMARY) 
+ continue; + + plane->disable_plane(&plane->base, &crtc->base); + } } /* We need to sanitize the plane -> pipe mapping first because this will @@ -15081,35 +15091,21 @@ void i915_redisable_vga(struct drm_device *dev) i915_redisable_vga_power_on(dev); } -static bool primary_get_hw_state(struct intel_crtc *crtc) +static bool primary_get_hw_state(struct intel_plane *plane) { - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; + struct drm_i915_private *dev_priv = to_i915(plane->base.dev); - return !!(I915_READ(DSPCNTR(crtc->plane)) & DISPLAY_PLANE_ENABLE); + return I915_READ(DSPCNTR(plane->plane)) & DISPLAY_PLANE_ENABLE; } -static void readout_plane_state(struct intel_crtc *crtc, - struct intel_crtc_state *crtc_state) +/* FIXME read out full plane state for all planes */ +static void readout_plane_state(struct intel_crtc *crtc) { - struct intel_plane *p; - struct intel_plane_state *plane_state; - bool active = crtc_state->base.active; + struct intel_plane_state *plane_state = + to_intel_plane_state(crtc->base.primary->state); - for_each_intel_plane(crtc->base.dev, p) { - if (crtc->pipe != p->pipe) - continue; - - plane_state = to_intel_plane_state(p->base.state); - - if (p->base.type == DRM_PLANE_TYPE_PRIMARY) - plane_state->visible = primary_get_hw_state(crtc); - else { - if (active) - p->disable_plane(&p->base, &crtc->base); - - plane_state->visible = false; - } - } + plane_state->visible = + primary_get_hw_state(to_intel_plane(crtc->base.primary)); } static void intel_modeset_readout_hw_state(struct drm_device *dev) @@ -15132,7 +15128,7 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev) crtc->base.state->active = crtc->active; crtc->base.enabled = crtc->active; - readout_plane_state(crtc, to_intel_crtc_state(crtc->base.state)); + readout_plane_state(crtc); DRM_DEBUG_KMS("[CRTC:%d] hw state readout: %s\n", crtc->base.base.id, -- cgit v1.2.3 From 18e9345b0db9fe7bd18c3c43967789fe0a2fdb52 Mon Sep 17 00:00:00 2001 From: Maarten 
Lankhorst Date: Wed, 23 Sep 2015 16:11:41 +0200 Subject: drm/i915: Add primary plane to mask if it's visible This fixes the warnings like "plane A assertion failure, should be disabled but not" that on the initial modeset during boot. This can happen if the primary plane is enabled by the firmware, but inheriting it fails because the DMAR is active or for other reasons. Most likely caused by commit 36750f284b3a4f19b304fda1bb7d6e9e1275ea8d Author: Maarten Lankhorst Date: Mon Jun 1 12:49:54 2015 +0200 drm/i915: update plane state during init This is a new version of commit 721a09f7393de6c28a07516dccd654c6e995944a Author: Maarten Lankhorst Date: Tue Sep 15 14:28:54 2015 +0200 drm/i915: Add primary plane to mask if it's visible That was reverted in order to facilitate easier backporting of some commits from -next to v4.3. Reported-by: Andreas Reis Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91429 Reported-and-tested-by: Emil Renner Berthing Tested-by: Andreas Reis Signed-off-by: Daniel Vetter [Jani: cherry-picked from -next to v4.3] Acked-by: Maarten Lankhorst Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_display.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index ac407e3e1edd..b2270d576979 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -15101,11 +15101,15 @@ static bool primary_get_hw_state(struct intel_plane *plane) /* FIXME read out full plane state for all planes */ static void readout_plane_state(struct intel_crtc *crtc) { + struct drm_plane *primary = crtc->base.primary; struct intel_plane_state *plane_state = - to_intel_plane_state(crtc->base.primary->state); + to_intel_plane_state(primary->state); plane_state->visible = - primary_get_hw_state(to_intel_plane(crtc->base.primary)); + primary_get_hw_state(to_intel_plane(primary)); + + if (plane_state->visible) + crtc->base.state->plane_mask 
|= 1 << drm_plane_index(primary); } static void intel_modeset_readout_hw_state(struct drm_device *dev) -- cgit v1.2.3 From 8a53554e12e98d1759205afd7b8e9e2ea0936f48 Mon Sep 17 00:00:00 2001 From: "Kővágó, Zoltán" Date: Mon, 12 Oct 2015 15:13:56 +0100 Subject: x86/efi: Fix multiple GOP device support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When multiple GOP devices exists, but none of them implements ConOut, the code should just choose the first GOP (according to the comments). But currently 'fb_base' will refer to the last GOP, while other parameters to the first GOP, which will likely result in a garbled display. I can reliably reproduce this bug using my ASRock Z87M Extreme4 motherboard with CSM and integrated GPU disabled, and two PCIe video cards (NVidia GT640 and GTX980), booting from efi-stub (booting from grub works fine). On the primary display the ASRock logo remains and on the secondary screen it is garbled up completely. Signed-off-by: Kővágó, Zoltán Signed-off-by: Matt Fleming Cc: Cc: Linus Torvalds Cc: Matthew Garrett Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1444659236-24837-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar --- arch/x86/boot/compressed/eboot.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index ee1b6d346b98..db51c1f27446 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -667,6 +667,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto, bool conout_found = false; void *dummy = NULL; u32 h = handles[i]; + u32 current_fb_base; status = efi_call_early(handle_protocol, h, proto, (void **)&gop32); @@ -678,7 +679,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto, if (status == EFI_SUCCESS) conout_found = true; - status = __gop_query32(gop32, &info, &size, &fb_base); + status = __gop_query32(gop32, 
&info, &size, &current_fb_base); if (status == EFI_SUCCESS && (!first_gop || conout_found)) { /* * Systems that use the UEFI Console Splitter may @@ -692,6 +693,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto, pixel_format = info->pixel_format; pixel_info = info->pixel_information; pixels_per_scan_line = info->pixels_per_scan_line; + fb_base = current_fb_base; /* * Once we've found a GOP supporting ConOut, @@ -770,6 +772,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto, bool conout_found = false; void *dummy = NULL; u64 h = handles[i]; + u32 current_fb_base; status = efi_call_early(handle_protocol, h, proto, (void **)&gop64); @@ -781,7 +784,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto, if (status == EFI_SUCCESS) conout_found = true; - status = __gop_query64(gop64, &info, &size, &fb_base); + status = __gop_query64(gop64, &info, &size, &current_fb_base); if (status == EFI_SUCCESS && (!first_gop || conout_found)) { /* * Systems that use the UEFI Console Splitter may @@ -795,6 +798,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto, pixel_format = info->pixel_format; pixel_info = info->pixel_information; pixels_per_scan_line = info->pixels_per_scan_line; + fb_base = current_fb_base; /* * Once we've found a GOP supporting ConOut, -- cgit v1.2.3 From 6391074598442b8a8d33e2cfdf277d5568b57f2d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 12 Oct 2015 15:44:49 +0200 Subject: ARM: pxa: fix pxa3xx DFI lockup hack Some recently added code to avoid a bug introduced a build error when CONFIG_PM is disabled and a macro is hidden: arch/arm/mach-pxa/pxa3xx.c: In function 'pxa3xx_init': arch/arm/mach-pxa/pxa3xx.c:439:3: error: 'NDCR' undeclared (first use in this function) NDCR = (NDCR & ~NDCR_ND_ARB_EN) | NDCR_ND_ARB_CNTL; ^ This moves the macro outside of the #ifdef so it can be referenced correctly.
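A minimal sketch of the layout this describes, assuming a plain integer in place of the real memory-mapped NDCR register: the arbitration bit definitions sit outside the CONFIG_PM guard, so code that is built unconditionally can still use them. The function `ndcr_apply_dfi_hack()` is a hypothetical stand-in for the register update in pxa3xx_init(); only the two bit positions are taken from the patch.

```c
#include <assert.h>

/* Visible regardless of CONFIG_PM, as after the fix. */
#define NDCR_ND_ARB_EN		(1u << 12)
#define NDCR_ND_ARB_CNTL	(1u << 19)

#ifdef CONFIG_PM
/* PM-only definitions stay behind the guard. */
#define ISRAM_START	0x5c000000
#endif

/*
 * Hypothetical stand-in for the register update: clear the arbiter enable
 * bit and set the arbiter control bit, leaving all other bits untouched.
 */
static unsigned int ndcr_apply_dfi_hack(unsigned int ndcr)
{
	return (ndcr & ~NDCR_ND_ARB_EN) | NDCR_ND_ARB_CNTL;
}
```

Because the helper only depends on the two macros above the guard, it compiles the same way whether or not CONFIG_PM is defined.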
Signed-off-by: Arnd Bergmann Fixes: adf3442cc890 ("ARM: pxa: fix DFI bus lockups on startup") Acked-by: Robert Jarzmik --- arch/arm/mach-pxa/pxa3xx.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm/mach-pxa/pxa3xx.c b/arch/arm/mach-pxa/pxa3xx.c index 06005d3f2ba3..20ce2d386f17 100644 --- a/arch/arm/mach-pxa/pxa3xx.c +++ b/arch/arm/mach-pxa/pxa3xx.c @@ -42,10 +42,6 @@ #define PECR_IS(n) ((1 << ((n) * 2)) << 29) extern void __init pxa_dt_irq_init(int (*fn)(struct irq_data *, unsigned int)); -#ifdef CONFIG_PM - -#define ISRAM_START 0x5c000000 -#define ISRAM_SIZE SZ_256K /* * NAND NFC: DFI bus arbitration subset @@ -54,6 +50,11 @@ extern void __init pxa_dt_irq_init(int (*fn)(struct irq_data *, unsigned int)); #define NDCR_ND_ARB_EN (1 << 12) #define NDCR_ND_ARB_CNTL (1 << 19) +#ifdef CONFIG_PM + +#define ISRAM_START 0x5c000000 +#define ISRAM_SIZE SZ_256K + static void __iomem *sram; static unsigned long wakeup_src; -- cgit v1.2.3 From 83bf6b13834d9c926905e45cdfda23fe218fc598 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 13 Oct 2015 19:46:54 +0200 Subject: ARM: ux500: modify initial levelshifter status commit 1d8aca9df612f5751892fb2642d72536f2f48fd0 "ARM: ux500: fix MMC/SD card regression" fixed broken the level shifter: it should be default ON but became default OFF. 
Fixes: 1d8aca9df612 "ARM: ux500: fix MMC/SD card regression" Reported-and-tested-by: Ulf Hansson Signed-off-by: Linus Walleij Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/ste-hrefv60plus.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/ste-hrefv60plus.dtsi b/arch/arm/boot/dts/ste-hrefv60plus.dtsi index 810cda743b6d..9c2387b34d0c 100644 --- a/arch/arm/boot/dts/ste-hrefv60plus.dtsi +++ b/arch/arm/boot/dts/ste-hrefv60plus.dtsi @@ -56,7 +56,7 @@ /* VMMCI level-shifter enable */ default_hrefv60_cfg2 { pins = "GPIO169_D22"; - ste,config = <&gpio_out_lo>; + ste,config = <&gpio_out_hi>; }; /* VMMCI level-shifter voltage select */ default_hrefv60_cfg3 { -- cgit v1.2.3 From 5c6dcd7f3b26736a88593586fbeec28b6a1ea78d Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Wed, 7 Oct 2015 18:39:40 +0100 Subject: MAINTAINERS: Update Allwinner entry and add new maintainer Add Chen-Yu Tsai as a co-maintainer to the ARM sunxi support. While we are doing so, also update the entry for new SoCs. Signed-off-by: Maxime Ripard Signed-off-by: Arnd Bergmann --- MAINTAINERS | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5f467845ef72..23b72245d829 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -894,11 +894,12 @@ M: Lennert Buytenhek L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained -ARM/Allwinner A1X SoC support +ARM/Allwinner sunXi SoC support M: Maxime Ripard +M: Chen-Yu Tsai L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained -N: sun[x4567]i +N: sun[x456789]i ARM/Allwinner SoC Clock Support M: Emilio López -- cgit v1.2.3 From f9e5ca86eeaae430e70b05ec312372dca1055d8d Mon Sep 17 00:00:00 2001 From: Carlo Caione Date: Thu, 1 Oct 2015 12:52:40 +0200 Subject: ARM: meson6: DTS: Fix wrong reg mapping and IRQ numbers The DTS erronously uses the wrong reg mapping and IRQ numbers for some UART, WDT and timer nodes. Fix this. 
Reported-by: John Wehle Signed-off-by: Carlo Caione Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/meson.dtsi | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/arm/boot/dts/meson.dtsi b/arch/arm/boot/dts/meson.dtsi index 548441384d2a..8c77c87660cd 100644 --- a/arch/arm/boot/dts/meson.dtsi +++ b/arch/arm/boot/dts/meson.dtsi @@ -67,7 +67,7 @@ timer@c1109940 { compatible = "amlogic,meson6-timer"; - reg = <0xc1109940 0x14>; + reg = <0xc1109940 0x18>; interrupts = <0 10 1>; }; @@ -80,36 +80,37 @@ wdt: watchdog@c1109900 { compatible = "amlogic,meson6-wdt"; reg = <0xc1109900 0x8>; + interrupts = <0 0 1>; }; uart_AO: serial@c81004c0 { compatible = "amlogic,meson-uart"; - reg = <0xc81004c0 0x14>; + reg = <0xc81004c0 0x18>; interrupts = <0 90 1>; clocks = <&clk81>; status = "disabled"; }; - uart_A: serial@c81084c0 { + uart_A: serial@c11084c0 { compatible = "amlogic,meson-uart"; - reg = <0xc81084c0 0x14>; - interrupts = <0 90 1>; + reg = <0xc11084c0 0x18>; + interrupts = <0 26 1>; clocks = <&clk81>; status = "disabled"; }; - uart_B: serial@c81084dc { + uart_B: serial@c11084dc { compatible = "amlogic,meson-uart"; - reg = <0xc81084dc 0x14>; - interrupts = <0 90 1>; + reg = <0xc11084dc 0x18>; + interrupts = <0 75 1>; clocks = <&clk81>; status = "disabled"; }; - uart_C: serial@c8108700 { + uart_C: serial@c1108700 { compatible = "amlogic,meson-uart"; - reg = <0xc8108700 0x14>; - interrupts = <0 90 1>; + reg = <0xc1108700 0x18>; + interrupts = <0 93 1>; clocks = <&clk81>; status = "disabled"; }; -- cgit v1.2.3 From db347f1a5304d68c68c52f19971924b1e5842f3c Mon Sep 17 00:00:00 2001 From: Marcin Wojtas Date: Thu, 15 Oct 2015 03:17:08 +0200 Subject: ARM: mvebu: correct a385-db-ap compatible string This commit enables standby support on Armada 385 DB-AP board, because the PM initalization routine requires "marvell,armada380" compatible string for all Armada 38x-based platforms. 
Besides, the compatible "marvell,armada38x" was wrong and should be fixed in the stable kernels too. [gregory.clement@free-electrons.com: add information, about the fixes] Fixes: e5ee12817e9ea ("ARM: mvebu: Add Armada 385 Access Point Development Board support") Signed-off-by: Marcin Wojtas Signed-off-by: Gregory CLEMENT Cc: --- arch/arm/boot/dts/armada-385-db-ap.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/armada-385-db-ap.dts b/arch/arm/boot/dts/armada-385-db-ap.dts index 89f5a95954ed..4047621b137e 100644 --- a/arch/arm/boot/dts/armada-385-db-ap.dts +++ b/arch/arm/boot/dts/armada-385-db-ap.dts @@ -46,7 +46,7 @@ / { model = "Marvell Armada 385 Access Point Development Board"; - compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada38x"; + compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada380"; chosen { stdout-path = "serial1:115200n8"; -- cgit v1.2.3 From d14f6fced5f9360edca5a1325ddb7077aab1203b Mon Sep 17 00:00:00 2001 From: Jay Cornwall Date: Wed, 16 Sep 2015 14:10:03 -0500 Subject: iommu/amd: Fix BUG when faulting a PROT_NONE VMA handle_mm_fault indirectly triggers a BUG in do_numa_page when given a VMA without read/write/execute access. Check this condition in do_fault. do_fault -> handle_mm_fault -> handle_pte_fault -> do_numa_page mm/memory.c 3147 static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, .... 
3159 /* A PROT_NONE fault should not end up here */ 3160 BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))); Signed-off-by: Jay Cornwall Cc: # v4.1+ Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu_v2.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c index 1131664b918b..d21d4edf7236 100644 --- a/drivers/iommu/amd_iommu_v2.c +++ b/drivers/iommu/amd_iommu_v2.c @@ -516,6 +516,13 @@ static void do_fault(struct work_struct *work) goto out; } + if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) { + /* handle_mm_fault would BUG_ON() */ + up_read(&mm->mmap_sem); + handle_fault_error(fault); + goto out; + } + ret = handle_mm_fault(mm, vma, address, write); if (ret & VM_FAULT_ERROR) { /* failed to service fault */ -- cgit v1.2.3 From f42d79ab67322e51b92dd7aa965e310c71352a64 Mon Sep 17 00:00:00 2001 From: Junichi Nomura Date: Wed, 14 Oct 2015 05:02:15 +0000 Subject: blk-mq: fix use-after-free in blk_mq_free_tag_set() tags is freed in blk_mq_free_rq_map() and should not be used after that. The problem doesn't manifest if CONFIG_CPUMASK_OFFSTACK is false because free_cpumask_var() is nop. tags->cpumask is allocated in blk_mq_init_tags() so it's natural to free cpumask in its counter part, blk_mq_free_tags(). 
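As a rough userspace illustration of the ownership rule above (not the blk-mq code itself), the sketch below lets the routine that frees the tags structure also free the mask that structure owns, so no caller needs to touch the structure after it is gone. struct tags_model, tags_model_init(), tags_model_free() and the live_allocs counter are hypothetical names that exist only in this model.

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in blk-mq. */
struct tags_model {
	unsigned long *cpumask;	/* separately allocated, like tags->cpumask */
};

int live_allocs;	/* outstanding allocations, for illustration only */

struct tags_model *tags_model_init(void)
{
	struct tags_model *tags = malloc(sizeof(*tags));

	if (!tags)
		return NULL;
	tags->cpumask = calloc(1, sizeof(*tags->cpumask));
	if (!tags->cpumask) {
		free(tags);
		return NULL;
	}
	live_allocs += 2;
	return tags;
}

/* Counterpart of tags_model_init(): release the mask, then its owner. */
void tags_model_free(struct tags_model *tags)
{
	free(tags->cpumask);
	free(tags);
	live_allocs -= 2;
}
```

With the mask released inside the counterpart, a caller only ever pairs one init with one free and never dereferences the structure afterwards.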
Fixes: f26cdc8536ad ("blk-mq: Shared tag enhancements") Signed-off-by: Jun'ichi Nomura Cc: Keith Busch Reviewed-by: Jeff Moyer Signed-off-by: Jens Axboe --- block/blk-mq-tag.c | 1 + block/blk-mq.c | 4 +--- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index ed96474d75cb..ec2d11915142 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -641,6 +641,7 @@ void blk_mq_free_tags(struct blk_mq_tags *tags) { bt_free(&tags->bitmap_tags); bt_free(&tags->breserved_tags); + free_cpumask_var(tags->cpumask); kfree(tags); } diff --git a/block/blk-mq.c b/block/blk-mq.c index 7785ae96267a..85f014327342 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2296,10 +2296,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set) int i; for (i = 0; i < set->nr_hw_queues; i++) { - if (set->tags[i]) { + if (set->tags[i]) blk_mq_free_rq_map(set, set->tags[i], i); - free_cpumask_var(set->tags[i]->cpumask); - } } kfree(set->tags); -- cgit v1.2.3 From b20519fd5007908c6a816cab1b9db911915e9fbd Mon Sep 17 00:00:00 2001 From: Pawel Moll Date: Thu, 15 Oct 2015 14:32:45 +0100 Subject: bus: arm-ccn: Handle correctly no-more-cpus case When migrating events the driver picks another cpu using cpumask_any_but() function, which returns value >= nr_cpu_ids when there is none available, not a negative value as the code assumed. Fixed now. 
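A small userspace model of the return convention described above may make the bug easier to see. It is only a sketch: MODEL_NR_CPU_IDS, model_any_but(), none_left_old() and none_left_new() are hypothetical stand-ins, not kernel symbols.

```c
#define MODEL_NR_CPU_IDS 4	/* hypothetical stand-in for nr_cpu_ids */

/*
 * Model of the convention described above: return the index of some
 * set bit in @mask other than @cpu, or MODEL_NR_CPU_IDS when there is
 * none.
 */
unsigned int model_any_but(unsigned int mask, unsigned int cpu)
{
	unsigned int i;

	for (i = 0; i < MODEL_NR_CPU_IDS; i++)
		if (i != cpu && (mask & (1u << i)))
			return i;
	return MODEL_NR_CPU_IDS;
}

/* The old test: never true for a value returned by model_any_but(). */
int none_left_old(int target)
{
	return target < 0;
}

/* The corrected test: "none" is reported as a value past the last id. */
int none_left_new(int target)
{
	return target >= MODEL_NR_CPU_IDS;
}
```

In this model, a mask holding only the outgoing CPU yields MODEL_NR_CPU_IDS, which the old test lets through and the new test catches.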
Reported-by: Dan Carpenter Signed-off-by: Pawel Moll Signed-off-by: Arnd Bergmann --- drivers/bus/arm-ccn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index 7d9879e166cf..cc322fbde9af 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1184,7 +1184,7 @@ static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb, if (!cpumask_test_and_clear_cpu(cpu, &dt->cpu)) break; target = cpumask_any_but(cpu_online_mask, cpu); - if (target < 0) + if (target >= nr_cpu_ids) break; perf_pmu_migrate_context(&dt->pmu, cpu, target); cpumask_set_cpu(target, &dt->cpu); -- cgit v1.2.3 From a0bcbe969f564d1ec08658170dda72a1b7e9053a Mon Sep 17 00:00:00 2001 From: Pawel Moll Date: Thu, 15 Oct 2015 14:32:46 +0100 Subject: bus: arm-ccn: Fix irq affinity setting on CPU migration When PMU context is migrating between CPUs, interrupt affinity is set as well. Only this should not happen when the CCN interrupt is not being used at all (the driver is using a hrtimer tick instead). Fixed now. 
Cc: # 4.2+ Signed-off-by: Pawel Moll Signed-off-by: Arnd Bergmann --- drivers/bus/arm-ccn.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index cc322fbde9af..7082c7268845 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1188,7 +1188,8 @@ static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb, break; perf_pmu_migrate_context(&dt->pmu, cpu, target); cpumask_set_cpu(target, &dt->cpu); - WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0); + if (ccn->irq) + WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0); default: break; } -- cgit v1.2.3 From fb659882cc6482bd2e32ec0ab8ab7afeda649413 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 12 Oct 2015 14:48:39 +0100 Subject: drivers/perf: arm_pmu: avoid CPU device_node reference leak of_cpu_device_node_get increments the reference count on the CPU device_node, so we must take care to of_node_put once we've finished with it. This patch fixes the perf IRQ probing code to avoid the leak. 
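The leak pattern can be sketched in userspace with a toy reference count. This is an assumption-laden model, not the OF API: struct node_model, node_model_get(), node_model_put() and the two lookup helpers are hypothetical names.

```c
/* Hypothetical refcounted node; a stand-in for struct device_node. */
struct node_model {
	int refcount;
};

/* Take a reference, like a "get" helper that bumps the count. */
struct node_model *node_model_get(struct node_model *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference taken by node_model_get(). */
void node_model_put(struct node_model *n)
{
	if (n)
		n->refcount--;
}

/* Leaky lookup: every iteration takes a reference and never drops it. */
int find_cpu_leaky(struct node_model *cpus, int ncpus,
		   const struct node_model *dn)
{
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (dn == node_model_get(&cpus[cpu]))
			break;
	return cpu;
}

/* Balanced lookup: only the pointer value is needed, so put right away. */
int find_cpu_balanced(struct node_model *cpus, int ncpus,
		      const struct node_model *dn)
{
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		struct node_model *cpu_dn = node_model_get(&cpus[cpu]);

		node_model_put(cpu_dn);
		if (dn == cpu_dn)
			break;
	}
	return cpu;
}
```

Both helpers find the same index; the difference is that the leaky one leaves every visited node with an extra reference.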
Cc: Sudeep Holla Cc: Mark Rutland Signed-off-by: Will Deacon Signed-off-by: Arnd Bergmann --- drivers/perf/arm_pmu.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 2365a32a595e..be3755c973e9 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -823,9 +823,15 @@ static int of_pmu_irq_cfg(struct arm_pmu *pmu) } /* Now look up the logical CPU number */ - for_each_possible_cpu(cpu) - if (dn == of_cpu_device_node_get(cpu)) + for_each_possible_cpu(cpu) { + struct device_node *cpu_dn; + + cpu_dn = of_cpu_device_node_get(cpu); + of_node_put(cpu_dn); + + if (dn == cpu_dn) break; + } if (cpu >= nr_cpu_ids) { pr_warn("Failed to find logical CPU for %s\n", -- cgit v1.2.3 From 2e4e5da55afaf9315f2398e85424fd3824459220 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 15 Oct 2015 20:32:05 +0900 Subject: ARM: dts: uniphier: fix IRQ number for devices on PH1-LD6b ref board The IRQ signal from external devices on this board is connected to the XIRQ4 pin of the SoC. The IRQ number should be 52, not 50. 
Fixes: a5e921b4771f ("ARM: dts: uniphier: add ProXstream2 and PH1-LD6b SoC/board support") Signed-off-by: Masahiro Yamada Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/uniphier-ph1-ld6b-ref.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/uniphier-ph1-ld6b-ref.dts b/arch/arm/boot/dts/uniphier-ph1-ld6b-ref.dts index 33963acd7e8f..f80f772d99fb 100644 --- a/arch/arm/boot/dts/uniphier-ph1-ld6b-ref.dts +++ b/arch/arm/boot/dts/uniphier-ph1-ld6b-ref.dts @@ -85,7 +85,7 @@ }; &ethsc { - interrupts = <0 50 4>; + interrupts = <0 52 4>; }; &serial0 { -- cgit v1.2.3 From 81c04b943872e0332872df18cec1dec89b178b4d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 12 Oct 2015 21:23:39 +0200 Subject: nvme: use an integer value to Linux errno values Use a separate integer variable to hold the signed Linux errno values we pass back to the block layer. Note that for pass through commands those might still be NVMe values, but those fit into the int as well. Fixes: f4829a9b7a61: ("blk-mq: fix racy updates of rq->errors") Reported-by: Dan Carpenter Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- drivers/block/nvme-core.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index fce353ba5f66..84e4a8088386 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core.c @@ -603,8 +603,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, struct nvme_iod *iod = ctx; struct request *req = iod_get_private(iod); struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req); - u16 status = le16_to_cpup(&cqe->status) >> 1; + int error = 0; if (unlikely(status)) { if (!(status & NVME_SC_DNR || blk_noretry_request(req)) @@ -621,9 +621,11 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, if (req->cmd_type == REQ_TYPE_DRV_PRIV) { if (cmd_rq->ctx == CMD_CTX_CANCELLED) - status = -EINTR; + error = -EINTR; + else + error = status; 
} else { - status = nvme_error_status(status); + error = nvme_error_status(status); } } @@ -635,7 +637,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, if (cmd_rq->aborted) dev_warn(nvmeq->dev->dev, "completing aborted command with status:%04x\n", - status); + error); if (iod->nents) { dma_unmap_sg(nvmeq->dev->dev, iod->sg, iod->nents, @@ -649,7 +651,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, } nvme_free_iod(nvmeq->dev, iod); - blk_mq_complete_request(req, status); + blk_mq_complete_request(req, error); } /* length is in bytes. gfp flags indicates whether we may sleep. */ -- cgit v1.2.3 From b02176f30cd30acccd3b633ab7d9aed8b5da52ff Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 8 Sep 2015 12:20:22 -0400 Subject: block: don't release bdi while request_queue has live references bdi's are initialized in two steps, bdi_init() and bdi_register(), but destroyed in a single step by bdi_destroy() which, for a bdi embedded in a request_queue, is called during blk_cleanup_queue() which makes the queue invisible and starts the draining of remaining usages. A request_queue's user can access the congestion state of the embedded bdi as long as it holds a reference to the queue. As such, it may access the congested state of a queue which finished blk_cleanup_queue() but hasn't reached blk_release_queue() yet. Because the congested state was embedded in backing_dev_info which in turn is embedded in request_queue, accessing the congested state after bdi_destroy() was called was fine. The bdi was destroyed but the memory region for the congested state remained accessible till the queue got released. a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") changed the situation. Now, the root congested state which is expected to be pinned while request_queue remains accessible is separately reference counted and the base ref is put during bdi_destroy(). 
This means that the root congested state may go away prematurely while the queue is between bdi_destroy() and blk_cleanup_queue(), which was detected by Andrey's KASAN tests. The root cause of this problem is that bdi doesn't distinguish the two steps of destruction, unregistration and release, and now the root congested state actually requires a separate release step. To fix the issue, this patch separates out bdi_unregister() and bdi_exit() from bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue() and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a simple wrapper calling the two steps back-to-back. While at it, the prototype of bdi_destroy() is moved right below bdi_setup_and_register() so that the counterpart operations are located together. Signed-off-by: Tejun Heo Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") Cc: stable@vger.kernel.org # v4.2+ Reported-and-tested-by: Andrey Konovalov Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com Reviewed-by: Jan Kara Reviewed-by: Jeff Moyer Signed-off-by: Jens Axboe --- block/blk-core.c | 2 +- block/blk-sysfs.c | 1 + include/linux/backing-dev.h | 6 +++++- mm/backing-dev.c | 12 +++++++++++- 4 files changed, 18 insertions(+), 3 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 2eb722d48773..18e92a6645e2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -576,7 +576,7 @@ void blk_cleanup_queue(struct request_queue *q) q->queue_lock = &q->__queue_lock; spin_unlock_irq(lock); - bdi_destroy(&q->backing_dev_info); + bdi_unregister(&q->backing_dev_info); /* @q is and will stay empty, shutdown and put */ blk_put_queue(q); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 3e44a9da2a13..07b42f5ad797 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -540,6 +540,7 @@ static void blk_release_queue(struct kobject *kobj) struct request_queue *q = container_of(kobj, struct 
request_queue, kobj); + bdi_exit(&q->backing_dev_info); blkcg_exit_queue(q); if (q->elevator) { diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 78677e5a65bf..c85f74946a8b 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -19,13 +19,17 @@ #include int __must_check bdi_init(struct backing_dev_info *bdi); -void bdi_destroy(struct backing_dev_info *bdi); +void bdi_exit(struct backing_dev_info *bdi); __printf(3, 4) int bdi_register(struct backing_dev_info *bdi, struct device *parent, const char *fmt, ...); int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev); +void bdi_unregister(struct backing_dev_info *bdi); + int __must_check bdi_setup_and_register(struct backing_dev_info *, char *); +void bdi_destroy(struct backing_dev_info *bdi); + void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, bool range_cyclic, enum wb_reason reason); void wb_start_background_writeback(struct bdi_writeback *wb); diff --git a/mm/backing-dev.c b/mm/backing-dev.c index e92d77937fd3..9e841399041a 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -835,7 +835,7 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi) synchronize_rcu_expedited(); } -void bdi_destroy(struct backing_dev_info *bdi) +void bdi_unregister(struct backing_dev_info *bdi) { /* make sure nobody finds us on the bdi_list anymore */ bdi_remove_from_list(bdi); @@ -847,9 +847,19 @@ void bdi_destroy(struct backing_dev_info *bdi) device_unregister(bdi->dev); bdi->dev = NULL; } +} +void bdi_exit(struct backing_dev_info *bdi) +{ + WARN_ON_ONCE(bdi->dev); wb_exit(&bdi->wb); } + +void bdi_destroy(struct backing_dev_info *bdi) +{ + bdi_unregister(bdi); + bdi_exit(bdi); +} EXPORT_SYMBOL(bdi_destroy); /* -- cgit v1.2.3 From 4f1d841475e1f6e9e32496dda11215db56f4ea73 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Fri, 9 Oct 2015 17:51:47 +0200 Subject: ARM: tegra: Comment out gpio-ranges properties While the addition of these properties is 
technically correct, it unveils a bug with deferred probe. The problem is that the presence of the gpio-range property causes the gpio-tegra driver to defer probe (it needs the pinctrl driver to be ready). That's technically correct, but it causes a couple of issues: - The keyboard on Chromebooks stops working. The reason for that is that the gpio-tegra device has not registered an IRQ domain by the time the EC SPI device is registered, hence the interrupt number resolves to 0. This is technically a bug in the SPI core, since it should really resolve the interrupt at probe time and defer if the IRQ domain isn't available yet. This is similar to what's done for I2C and platform device already. - The gpio-tegra device deferring probe means that it is moved to the end of the dpm_list. This list defines the suspend/resume order for devices. However the core lacks a way to move all users of the gpio-tegra device to the end of the dpm_list at the same time. This in turn results in a subtle bug on Jetson TK1, where the gpio-keys device is used to expose the power key as input. The power key is a convenient way to wake the system from suspend. Interestingly, the gpio-keys device ends up getting probed at a point after gpio-tegra has been probed successfully from having been deferred earlier. As such the driver doesn't need to defer the probe itself, and hence the device isn't moved to the end of the dpm_list. This causes the gpio-tegra device to be suspended before gpio-keys, which in turn leaves gpio-keys unable to wake the system from suspend. There are patches in the works to fix both of the above issues, but they are too involved to make it into v4.3, so in the meantime let's fix the regressions by commenting out the gpio-ranges properties until the fixes have landed. 
Signed-off-by: Thierry Reding Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/tegra114.dtsi | 2 ++ arch/arm/boot/dts/tegra124.dtsi | 2 ++ arch/arm/boot/dts/tegra20.dtsi | 2 ++ arch/arm/boot/dts/tegra30.dtsi | 2 ++ 4 files changed, 8 insertions(+) diff --git a/arch/arm/boot/dts/tegra114.dtsi b/arch/arm/boot/dts/tegra114.dtsi index 9d4f86e9c50a..d845bd1448b5 100644 --- a/arch/arm/boot/dts/tegra114.dtsi +++ b/arch/arm/boot/dts/tegra114.dtsi @@ -234,7 +234,9 @@ gpio-controller; #interrupt-cells = <2>; interrupt-controller; + /* gpio-ranges = <&pinmux 0 0 246>; + */ }; apbmisc@70000800 { diff --git a/arch/arm/boot/dts/tegra124.dtsi b/arch/arm/boot/dts/tegra124.dtsi index 1e204a6de12c..819e2ae2cabe 100644 --- a/arch/arm/boot/dts/tegra124.dtsi +++ b/arch/arm/boot/dts/tegra124.dtsi @@ -258,7 +258,9 @@ gpio-controller; #interrupt-cells = <2>; interrupt-controller; + /* gpio-ranges = <&pinmux 0 0 251>; + */ }; apbdma: dma@0,60020000 { diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi index e058709e6d98..969b828505ae 100644 --- a/arch/arm/boot/dts/tegra20.dtsi +++ b/arch/arm/boot/dts/tegra20.dtsi @@ -244,7 +244,9 @@ gpio-controller; #interrupt-cells = <2>; interrupt-controller; + /* gpio-ranges = <&pinmux 0 0 224>; + */ }; apbmisc@70000800 { diff --git a/arch/arm/boot/dts/tegra30.dtsi b/arch/arm/boot/dts/tegra30.dtsi index fe04fb5e155f..c6938ad1b543 100644 --- a/arch/arm/boot/dts/tegra30.dtsi +++ b/arch/arm/boot/dts/tegra30.dtsi @@ -349,7 +349,9 @@ gpio-controller; #interrupt-cells = <2>; interrupt-controller; + /* gpio-ranges = <&pinmux 0 0 248>; + */ }; apbmisc@70000800 { -- cgit v1.2.3 From f05819df10d7b09f6d1eb6f8534a8f68e5a4fe61 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 15 Oct 2015 17:21:37 +0100 Subject: KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring The following sequence of commands: i=`keyctl add user a a @s` keyctl request2 keyring foo bar @t keyctl unlink $i @s tries to invoke an upcall to 
instantiate a keyring if one doesn't already exist by that name within the user's keyring set. However, if the upcall fails, the code sets keyring->type_data.reject_error to -ENOKEY or some other error code. When the key is garbage collected, the key destroy function is called unconditionally and keyring_destroy() uses list_empty() on keyring->type_data.link - which is in a union with reject_error. Subsequently, the kernel tries to unlink the keyring from the keyring names list - which oopses like this: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [] keyring_destroy+0x3d/0x88 ... Workqueue: events key_garbage_collector ... RIP: 0010:[] keyring_destroy+0x3d/0x88 RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203 RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40 RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000 R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900 R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000 ... CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0 ... Call Trace: [] key_gc_unused_keys.constprop.1+0x5d/0x10f [] key_garbage_collector+0x1fa/0x351 [] process_one_work+0x28e/0x547 [] worker_thread+0x26e/0x361 [] ? rescuer_thread+0x2a8/0x2a8 [] kthread+0xf3/0xfb [] ? kthread_create_on_node+0x1c2/0x1c2 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x1c2/0x1c2 Note the value in RAX. This is a 32-bit representation of -ENOKEY. The solution is to only call ->destroy() if the key was successfully instantiated. 
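The aliasing described above can be reproduced in a userspace sketch. Everything here is a model under stated assumptions: union type_data_model and struct list_head_model only mimic the layout the message describes, MODEL_ENOKEY hard-codes 126 (the value of ENOKEY on Linux) so the sketch does not depend on the host's errno.h, and the link is assumed to start out as an empty, self-pointing list.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_ENOKEY 126	/* value of ENOKEY on Linux, hard-coded here */

/* Minimal stand-in for a doubly linked list head. */
struct list_head_model {
	struct list_head_model *next, *prev;
};

/* Model of a union in which an error code shares storage with a link. */
union type_data_model {
	struct list_head_model link;
	int reject_error;
};

/* A negative errno viewed as a 32-bit register value. */
uint32_t reject_error_as_u32(int err)
{
	return (uint32_t)err;
}

/*
 * Returns 1 if an (assumed) empty, self-pointing link still holds the
 * same bytes after an error code is stored in the overlapping member.
 */
int link_survives_reject_error(void)
{
	union type_data_model td;
	struct list_head_model saved;

	td.link.next = &td.link;
	td.link.prev = &td.link;
	saved = td.link;

	td.reject_error = -MODEL_ENOKEY;	/* clobbers part of the link */

	return memcmp(&saved, &td.link, sizeof(saved)) == 0;
}
```

In this model, -MODEL_ENOKEY converts to 0xffffff82, the value seen in RAX in the oops, and the link no longer looks like an empty list once the error code has been stored.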
Reported-by: Dmitry Vyukov Signed-off-by: David Howells Tested-by: Dmitry Vyukov --- security/keys/gc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/security/keys/gc.c b/security/keys/gc.c index 39eac1fd5706..addf060399e0 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -134,8 +134,10 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); - /* Throw away the key data */ - if (key->type->destroy) + /* Throw away the key data if the key is instantiated */ + if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags) && + !test_bit(KEY_FLAG_NEGATIVE, &key->flags) && + key->type->destroy) key->type->destroy(key); security_key_free(key); -- cgit v1.2.3 From 17b38fb89055bf5df402980c9546a8b046552f2b Mon Sep 17 00:00:00 2001 From: Doron Tsur Date: Thu, 15 Oct 2015 15:01:02 +0300 Subject: IB/core: Fix memory corruption in ib_cache_gid_set_default_gid When ib_cache_gid_set_default_gid is called from several threads, updating the table could make find_gid fail, therefore a negative index will be returned and an invalid table entry will be used. Locking find_gid as well fixes this problem. 
Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Doron Tsur Signed-off-by: Matan Barak Signed-off-by: Doug Ledford --- drivers/infiniband/core/cache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 8f66c67ff0df..87471ef37198 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -508,12 +508,12 @@ void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port, memset(&gid_attr, 0, sizeof(gid_attr)); gid_attr.ndev = ndev; + mutex_lock(&table->lock); ix = find_gid(table, NULL, NULL, true, GID_ATTR_FIND_MASK_DEFAULT); /* Coudn't find default GID location */ WARN_ON(ix < 0); - mutex_lock(&table->lock); if (!__ib_cache_gid_get(ib_dev, port, ix, &current_gid, &current_gid_attr) && mode == IB_CACHE_GID_DEFAULT_MODE_SET && -- cgit v1.2.3 From 0dfc70c33409afc232ef0b9ec210535dfbf9bc61 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 15 Oct 2015 13:38:48 -0600 Subject: NVMe: Fix memory leak on retried commands Resources are reallocated for requeued commands, so unmap and release the iod for the failed command. It's a pretty bad memory leak and causes a kernel hang if you remove a drive because of a busy dma pool. You'll get messages spewing like this: nvme 0000:xx:xx.x: dma_pool_destroy prp list 256, ffff880420dec000 busy and lock up pci and the driver since removal never completes while holding a lock. 
Cc: stable@vger.kernel.org Cc: # 4.0.x- Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- drivers/block/nvme-core.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index 84e4a8088386..ccc0c1f93daa 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core.c @@ -604,6 +604,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, struct request *req = iod_get_private(iod); struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req); u16 status = le16_to_cpup(&cqe->status) >> 1; + bool requeue = false; int error = 0; if (unlikely(status)) { @@ -611,12 +612,13 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, && (jiffies - req->start_time) < req->timeout) { unsigned long flags; + requeue = true; blk_mq_requeue_request(req); spin_lock_irqsave(req->q->queue_lock, flags); if (!blk_queue_stopped(req->q)) blk_mq_kick_requeue_list(req->q); spin_unlock_irqrestore(req->q->queue_lock, flags); - return; + goto release_iod; } if (req->cmd_type == REQ_TYPE_DRV_PRIV) { @@ -639,6 +641,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, "completing aborted command with status:%04x\n", error); +release_iod: if (iod->nents) { dma_unmap_sg(nvmeq->dev->dev, iod->sg, iod->nents, rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE); @@ -651,7 +654,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx, } nvme_free_iod(nvmeq->dev, iod); - blk_mq_complete_request(req, error); + if (likely(!requeue)) + blk_mq_complete_request(req, error); } /* length is in bytes. gfp flags indicates whether we may sleep. 
*/ -- cgit v1.2.3 From f5f3497cad8c8416a74b9aaceb127908755d020a Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 14 Oct 2015 13:30:45 +0200 Subject: x86/setup: Extend low identity map to cover whole kernel range On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... 
The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x00000000 [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini Reported-by: Laszlo Ersek Cc: Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Andy Lutomirski Signed-off-by: Matt Fleming [ Updated changelog. ] --- arch/x86/kernel/setup.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index fdb7f2a2d328..a3cccbfc5f77 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1173,6 +1173,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* + * sync back low identity map too. It is used for example + * in the 32-bit EFI stub. 
+ */ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); #endif tboot_probe(); -- cgit v1.2.3 From 7ba6e4ef76c7e43101bd5e0f8987c11a8ed0d325 Mon Sep 17 00:00:00 2001 From: Bard Liao Date: Fri, 16 Oct 2015 15:21:32 +0800 Subject: ASoC: rt298: correct index default value Some of the default value on rt298_index_def are incorrect. Change them to the correct value. Signed-off-by: Bard Liao Signed-off-by: Mark Brown --- sound/soc/codecs/rt298.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/sound/soc/codecs/rt298.c b/sound/soc/codecs/rt298.c index 3c2f0f8d6266..d3e30a645ae3 100644 --- a/sound/soc/codecs/rt298.c +++ b/sound/soc/codecs/rt298.c @@ -50,24 +50,24 @@ struct rt298_priv { }; static struct reg_default rt298_index_def[] = { - { 0x01, 0xaaaa }, - { 0x02, 0x8aaa }, + { 0x01, 0xa5a8 }, + { 0x02, 0x8e95 }, { 0x03, 0x0002 }, - { 0x04, 0xaf01 }, - { 0x08, 0x000d }, - { 0x09, 0xd810 }, - { 0x0a, 0x0120 }, + { 0x04, 0xaf67 }, + { 0x08, 0x200f }, + { 0x09, 0xd010 }, + { 0x0a, 0x0100 }, { 0x0b, 0x0000 }, { 0x0d, 0x2800 }, - { 0x0f, 0x0000 }, - { 0x19, 0x0a17 }, + { 0x0f, 0x0022 }, + { 0x19, 0x0217 }, { 0x20, 0x0020 }, { 0x33, 0x0208 }, { 0x46, 0x0300 }, - { 0x49, 0x0004 }, - { 0x4f, 0x50e9 }, - { 0x50, 0x2000 }, - { 0x63, 0x2902 }, + { 0x49, 0x4004 }, + { 0x4f, 0x50c9 }, + { 0x50, 0x3000 }, + { 0x63, 0x1b02 }, { 0x67, 0x1111 }, { 0x68, 0x1016 }, { 0x69, 0x273f }, -- cgit v1.2.3 From c0ff971ef9acacd4d2caa508e444edad958dead9 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 15 Oct 2015 19:42:23 +0200 Subject: x86/ioapic: Disable interrupts when re-routing legacy IRQs A sporadic hang with consequent crash is observed when booting Hyper-V Gen1 guests: Call Trace: [] ? trace_hardirqs_off+0xd/0x10 [] queue_work_on+0x46/0x90 [] ? add_interrupt_randomness+0x176/0x1d0 ... [] ? 
_raw_spin_unlock_irqrestore+0x3b/0x60 [] __irq_put_desc_unlock+0x1e/0x40 [] irq_modify_status+0xb5/0xd0 [] mp_register_handler+0x4b/0x70 [] mp_irqdomain_alloc+0x1ea/0x2a0 [] irq_domain_alloc_irqs_recursive+0x40/0xa0 [] __irq_domain_alloc_irqs+0x13c/0x2b0 [] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0 [] mp_map_pin_to_irq+0x165/0x2d0 [] pin_2_irq+0x47/0x80 [] setup_IO_APIC+0xfe/0x802 ... [] ? rest_init+0x140/0x140 The issue is easily reproducible with a simple instrumentation: if mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing IRQ0. The issue seems to be caused by the fact that we don't disable interrupts while doing IOAPIC programming for legacy IRQs and IRQ0 actually happens. Protect the setup sequence against concurrent interrupts. [ tglx: Make the protection unconditional and not only for legacy interrupts ] Signed-off-by: Vitaly Kuznetsov Cc: Jiang Liu Cc: Yinghai Lu Cc: K. Y. Srinivasan Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/apic/io_apic.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 5c60bb162622..bb6bfc01cb82 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2907,6 +2907,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, struct irq_data *irq_data; struct mp_chip_data *data; struct irq_alloc_info *info = arg; + unsigned long flags; if (!info || nr_irqs > 1) return -EINVAL; @@ -2939,11 +2940,14 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, cfg = irqd_cfg(irq_data); add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin); + + local_irq_save(flags); if (info->ioapic_entry) mp_setup_entry(cfg, data, info->ioapic_entry); mp_register_handler(virq, data->trigger); if (virq < nr_legacy_irqs()) 
legacy_pic->mask(virq); + local_irq_restore(flags); apic_printk(APIC_VERBOSE, KERN_DEBUG "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n", -- cgit v1.2.3 From 34198710f55b5f359f43e67d9a08fe5aadfbca1b Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Wed, 14 Oct 2015 13:31:24 +0100 Subject: ASoC: Add info callback for SX_TLV controls SX_TLV controls are intended for situations where the register behind the control has some non-zero value indicating the minimum gain and then gains increasing from there and eventually overflowing through zero. Currently every CODEC implementing these controls specifies the minimum as the non-zero value for the minimum and the maximum as the number of gain settings available. This means when the info callback subtracts the minimum value from the maximum value to calculate the number of gain levels available it is actually under reporting the available levels. This patch fixes this issue by adding a new snd_soc_info_volsw_sx callback that does not subtract the minimum value. 
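To make the under-reporting concrete, here is a minimal standalone sketch in plain C (not the kernel code). The struct sx_control and both helper names are hypothetical; the sketch assumes, as described above, that min holds the non-zero register value for minimum gain and max holds the number of gain settings.

```c
/* Hypothetical stand-in for the two mixer-control fields that matter here. */
struct sx_control {
	int min;	/* non-zero register value meaning "minimum gain" */
	int max;	/* number of gain settings, not a register value */
};

/* What a generic info callback reports when it subtracts min from max. */
static int reported_max_generic(const struct sx_control *mc)
{
	return mc->max - mc->min;
}

/* What an SX-aware info callback reports: min is added back on. */
static int reported_max_sx(const struct sx_control *mc)
{
	return reported_max_generic(mc) + mc->min;
}
```

With illustrative values min = 0x34 and max = 0x50, the generic helper reports 0x1c levels while the SX-aware helper reports the full 0x50.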
Fixes: 1d99f2436d0d ("ASoC: core: Rework SOC_DOUBLE_R_SX_TLV add SOC_SINGLE_SX_TLV") Signed-off-by: Charles Keepax Acked-by: Brian Austin Tested-by: Brian Austin Signed-off-by: Mark Brown Cc: stable@vger.kernel.org --- include/sound/soc.h | 6 ++++-- sound/soc/soc-ops.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/sound/soc.h b/include/sound/soc.h index 884e728b09d9..26ede14597da 100644 --- a/include/sound/soc.h +++ b/include/sound/soc.h @@ -86,7 +86,7 @@ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \ SNDRV_CTL_ELEM_ACCESS_READWRITE, \ .tlv.p = (tlv_array),\ - .info = snd_soc_info_volsw, \ + .info = snd_soc_info_volsw_sx, \ .get = snd_soc_get_volsw_sx,\ .put = snd_soc_put_volsw_sx, \ .private_value = (unsigned long)&(struct soc_mixer_control) \ @@ -156,7 +156,7 @@ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \ SNDRV_CTL_ELEM_ACCESS_READWRITE, \ .tlv.p = (tlv_array), \ - .info = snd_soc_info_volsw, \ + .info = snd_soc_info_volsw_sx, \ .get = snd_soc_get_volsw_sx, \ .put = snd_soc_put_volsw_sx, \ .private_value = (unsigned long)&(struct soc_mixer_control) \ @@ -574,6 +574,8 @@ int snd_soc_put_enum_double(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol); int snd_soc_info_volsw(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_info *uinfo); +int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_info *uinfo); #define snd_soc_info_bool_ext snd_ctl_boolean_mono_info int snd_soc_get_volsw(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *ucontrol); diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c index 100d92b5b77e..05977ae1ff2a 100644 --- a/sound/soc/soc-ops.c +++ b/sound/soc/soc-ops.c @@ -206,6 +206,34 @@ int snd_soc_info_volsw(struct snd_kcontrol *kcontrol, } EXPORT_SYMBOL_GPL(snd_soc_info_volsw); +/** + * snd_soc_info_volsw_sx - Mixer info callback for SX TLV controls + * @kcontrol: mixer control + * @uinfo: control element information + * + * Callback to 
provide information about a single mixer control, or a double + * mixer control that spans 2 registers of the SX TLV type. SX TLV controls + * have a range that represents both positive and negative values either side + * of zero but without a sign bit. + * + * Returns 0 for success. + */ +int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_info *uinfo) +{ + struct soc_mixer_control *mc = + (struct soc_mixer_control *)kcontrol->private_value; + + snd_soc_info_volsw(kcontrol, uinfo); + /* Max represents the number of levels in an SX control not the + * maximum value, so add the minimum value back on + */ + uinfo->value.integer.max += mc->min; + + return 0; +} +EXPORT_SYMBOL_GPL(snd_soc_info_volsw_sx); + /** * snd_soc_get_volsw - single mixer get callback * @kcontrol: mixer control -- cgit v1.2.3 From 6a3b764b8dc781c36f0f94287df5b2ec23b8fdd7 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 16 Oct 2015 12:16:21 -0700 Subject: ARM: OMAP2+: Fix oops with LPAE and more than 2GB of memory On boards with more than 2GB of RAM booting goes wrong with things not working and we're getting lots of l3 warnings: WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x260/0x384() 44000000.ocp:L3 Custom Error: MASTER MMC6 TARGET DMM1 (Idle): Data Access in User mode during Functional access ... [] (scsi_add_host_with_dma) from [] (ata_scsi_add_hosts+0x5c/0x18c) [] (ata_scsi_add_hosts) from [] (ata_host_register+0x150/0x2cc) [] (ata_host_register) from [] (ata_host_activate+0xd4/0x124) [] (ata_host_activate) from [] (ahci_host_activate+0x5c/0x194) [] (ahci_host_activate) from [] (ahci_platform_init_host+0x1f0/0x3f0) [] (ahci_platform_init_host) from [] (ahci_probe+0x70/0x98) [] (ahci_probe) from [] (platform_drv_probe+0x54/0xb4) Let's fix the issue by enabling ZONE_DMA for LPAE. Note that we need to limit dma_zone_size to 2GB as the rest of the RAM is beyond the 4GB limit. 
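The 2GB figure can be checked with a small standalone sketch. It is an illustration only: the helper name dma32_reachable is hypothetical, and the RAM base of 0x80000000 used in the note below is an assumption about these SoCs that is not stated in this message.

```c
#include <stdint.h>

#define SZ_1G 0x40000000ULL

/*
 * Bytes of RAM reachable by a device limited to 32-bit addresses,
 * given where RAM starts and how much is installed.
 */
static uint64_t dma32_reachable(uint64_t ram_base, uint64_t ram_size)
{
	const uint64_t limit = 4 * SZ_1G;	/* first address past 32 bits */

	if (ram_base >= limit)
		return 0;
	if (ram_base + ram_size <= limit)
		return ram_size;
	return limit - ram_base;
}
```

Under the assumed base of 0x80000000, a board with 4GB installed has only 2GB below the 32-bit boundary, which matches the SZ_2G value used for dma_zone_size.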
Let's also fix things for dra7 as done in similar patches in the TI tree by Lokesh Vutla . Reviewed-by: Lokesh Vutla Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/Kconfig | 2 ++ arch/arm/mach-omap2/board-generic.c | 9 +++++++++ 2 files changed, 11 insertions(+) diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index b3a0dff67e3f..33d1460a5639 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -49,6 +49,7 @@ config SOC_OMAP5 select OMAP_INTERCONNECT select OMAP_INTERCONNECT_BARRIER select PM_OPP if PM + select ZONE_DMA if ARM_LPAE config SOC_AM33XX bool "TI AM33XX" @@ -78,6 +79,7 @@ config SOC_DRA7XX select OMAP_INTERCONNECT select OMAP_INTERCONNECT_BARRIER select PM_OPP if PM + select ZONE_DMA if ARM_LPAE config ARCH_OMAP2PLUS bool diff --git a/arch/arm/mach-omap2/board-generic.c b/arch/arm/mach-omap2/board-generic.c index 6133eaac685d..6b59230cd6be 100644 --- a/arch/arm/mach-omap2/board-generic.c +++ b/arch/arm/mach-omap2/board-generic.c @@ -243,6 +243,9 @@ static const char *const omap5_boards_compat[] __initconst = { }; DT_MACHINE_START(OMAP5_DT, "Generic OMAP5 (Flattened Device Tree)") +#if defined(CONFIG_ZONE_DMA) && defined(CONFIG_ARM_LPAE) + .dma_zone_size = SZ_2G, +#endif .reserve = omap_reserve, .smp = smp_ops(omap4_smp_ops), .map_io = omap5_map_io, @@ -288,6 +291,9 @@ static const char *const dra74x_boards_compat[] __initconst = { }; DT_MACHINE_START(DRA74X_DT, "Generic DRA74X (Flattened Device Tree)") +#if defined(CONFIG_ZONE_DMA) && defined(CONFIG_ARM_LPAE) + .dma_zone_size = SZ_2G, +#endif .reserve = omap_reserve, .smp = smp_ops(omap4_smp_ops), .map_io = dra7xx_map_io, @@ -308,6 +314,9 @@ static const char *const dra72x_boards_compat[] __initconst = { }; DT_MACHINE_START(DRA72X_DT, "Generic DRA72X (Flattened Device Tree)") +#if defined(CONFIG_ZONE_DMA) && defined(CONFIG_ARM_LPAE) + .dma_zone_size = SZ_2G, +#endif .reserve = omap_reserve, .map_io = dra7xx_map_io, .init_early = dra7xx_init_early, -- 
cgit v1.2.3 From b28fec1324bf8f5010d2c3c5d57db4115bda66d4 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee Date: Sat, 17 Oct 2015 08:08:56 +0900 Subject: thermal: exynos: Fix register read in TMU The value of emul_con was getting overwritten if the selected soc is SOC_ARCH_EXYNOS5260. And so as a result we were reading from the wrong register in the case of SOC_ARCH_EXYNOS5260. Fixes: 488c7455d74c ("thermal: exynos: Add the support for Exynos5433 TMU") Signed-off-by: Sudip Mukherjee Reviewed-by: Krzysztof Kozlowski Reviewed-by: Chanwoo Choi Acked-by: Lukasz Majewski Signed-off-by: Krzysztof Kozlowski Signed-off-by: Kukjin Kim --- drivers/thermal/samsung/exynos_tmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c index 0bae8cc6c23a..ca920b0ecf8f 100644 --- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -932,7 +932,7 @@ static void exynos4412_tmu_set_emulation(struct exynos_tmu_data *data, if (data->soc == SOC_ARCH_EXYNOS5260) emul_con = EXYNOS5260_EMUL_CON; - if (data->soc == SOC_ARCH_EXYNOS5433) + else if (data->soc == SOC_ARCH_EXYNOS5433) emul_con = EXYNOS5433_TMU_EMUL_CON; else if (data->soc == SOC_ARCH_EXYNOS7) emul_con = EXYNOS7_TMU_REG_EMUL_CON; -- cgit v1.2.3 From e210c422b6fdd2dc123bedc588f399aefd8bf9de Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Mon, 12 Oct 2015 11:30:11 +0300 Subject: xhci: don't finish a TD if we get a short transfer event mid TD If the difference is big enough between the bytes asked and received in a bulk transfer we can get a short transfer event pointing to a TRB in the middle of the TD. We don't want to handle the TD yet as we will anyway receive a new event for the last TRB in the TD. 
Hold off from finishing the TD and removing it from the list until we receive an event for the last TRB in the TD. Cc: stable Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 43291f93afeb..79e89e6aef73 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2191,6 +2191,10 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td, } /* Fast path - was this the last TRB in the TD for this URB? */ } else if (event_trb == td->last_trb) { + if (td->urb_length_set && trb_comp_code == COMP_SHORT_TX) + return finish_td(xhci, td, event_trb, event, ep, + status, false); + if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) { td->urb->actual_length = td->urb->transfer_buffer_length - @@ -2242,6 +2246,12 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td, td->urb->actual_length += TRB_LEN(le32_to_cpu(cur_trb->generic.field[2])) - EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)); + + if (trb_comp_code == COMP_SHORT_TX) { + xhci_dbg(xhci, "mid bulk/intr SP, wait for last TRB event\n"); + td->urb_length_set = true; + return 0; + } } return finish_td(xhci, td, event_trb, event, ep, status, false); -- cgit v1.2.3 From 3b4739b8951d650becbcd855d7d6f18ac98a9a85 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Mon, 12 Oct 2015 11:30:12 +0300 Subject: xhci: handle no ping response error properly If a host fails to wake up an isochronous SuperSpeed device from U1/U2 in time for an isoch transfer, it will generate a "No ping response error". The host will then move to the next transfer descriptor. Handle this case in the same way as missed service errors: tag the current TD as skipped and handle it on the next transfer event.
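The skip handling can be sketched in isolation as a single predicate. This is a standalone illustration, not the driver code: the enum and its values are hypothetical placeholders for the driver's completion codes, and the function only mirrors the condition used by the patch below.

```c
#include <stdbool.h>

/* Hypothetical completion codes; the real values live in the xHCI driver. */
enum comp_code {
	COMP_SUCCESS,
	COMP_MISSED_INT,	/* missed service interval */
	COMP_PING_ERR,		/* no ping response */
};

/*
 * True when the endpoint is flagged as skipping and the current event is
 * a later one, not the missed-service or no-ping event that set the flag.
 */
static bool handling_skipped_tds(bool ep_skip, enum comp_code code)
{
	return ep_skip && code != COMP_MISSED_INT && code != COMP_PING_ERR;
}
```

The event that sets the skip flag returns false here, so the event ring dequeue pointer still advances for it; only the following event, which points at the TD actually transferred, enters the loop over skipped TDs.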
Cc: stable Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 79e89e6aef73..97ffe3997273 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2284,6 +2284,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, u32 trb_comp_code; int ret = 0; int td_num = 0; + bool handling_skipped_tds = false; slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags)); xdev = xhci->devs[slot_id]; @@ -2420,6 +2421,10 @@ static int handle_tx_event(struct xhci_hcd *xhci, ep->skip = true; xhci_dbg(xhci, "Miss service interval error, set skip flag\n"); goto cleanup; + case COMP_PING_ERR: + ep->skip = true; + xhci_dbg(xhci, "No Ping response error, Skip one Isoc TD\n"); + goto cleanup; default: if (xhci_is_vendor_info_code(xhci, trb_comp_code)) { status = 0; @@ -2556,13 +2561,18 @@ static int handle_tx_event(struct xhci_hcd *xhci, ep, &status); cleanup: + + + handling_skipped_tds = ep->skip && + trb_comp_code != COMP_MISSED_INT && + trb_comp_code != COMP_PING_ERR; + /* - * Do not update event ring dequeue pointer if ep->skip is set. - * Will roll back to continue process missed tds. + * Do not update event ring dequeue pointer if we're in a loop + * processing missed tds. */ - if (trb_comp_code == COMP_MISSED_INT || !ep->skip) { + if (!handling_skipped_tds) inc_deq(xhci, xhci->event_ring); - } if (ret) { urb = td->urb; @@ -2597,7 +2607,7 @@ cleanup: * Process them as short transfer until reach the td pointed by * the event. 
*/ - } while (ep->skip && trb_comp_code != COMP_MISSED_INT); + } while (handling_skipped_tds); return 0; } -- cgit v1.2.3 From fd7cd061adcf5f7503515ba52b6a724642a839c8 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 12 Oct 2015 11:30:13 +0300 Subject: xhci: Add spurious wakeup quirk for LynxPoint-LP controllers We received several reports of systems rebooting and powering on after an attempted shutdown. Testing showed that setting XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT quirk allowed the system to shutdown as expected for LynxPoint-LP xHCI controllers. Set the quirk back. Note that the quirk was originally introduced for LynxPoint and LynxPoint-LP just for this same reason. See: commit 638298dc66ea ("xhci: Fix spurious wakeups after S5 on Haswell") It was later limited to only concern HP machines as it caused regression on some machines, see both bug and commit: Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171 commit 6962d914f317 ("xhci: Limit the spurious wakeup fix only to HP machines") Later it was discovered that the powering on after shutdown was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP machine suffered from spontaneous resume from S3 (which should not be related to the SPURIOUS_WAKEUP quirk at all). An attempt to fix this then removed the SPURIOUS_WAKEUP flag usage completely. commit b45abacde3d5 ("xhci: no switching back on non-ULT Haswell") Current understanding is that LynxPoint-LP (Haswell ULT) machines need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and plain Lynxpoint (Haswell) machines may _not_ have the quirk set otherwise they again will restart. 
Signed-off-by: Laura Abbott Cc: Takashi Iwai Cc: Oliver Neukum [Added more history to commit message -Mathias] Cc: stable Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index c79d33676672..c47d3e480586 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -147,6 +147,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) if (pdev->vendor == PCI_VENDOR_ID_INTEL && pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI) { xhci->quirks |= XHCI_SPURIOUS_REBOOT; + xhci->quirks |= XHCI_SPURIOUS_WAKEUP; } if (pdev->vendor == PCI_VENDOR_ID_INTEL && (pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_LP_XHCI || -- cgit v1.2.3 From 324700604b04954510ddd4c6841a88a06938a28c Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Sat, 17 Oct 2015 11:30:15 -0700 Subject: Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock If common clock framework is configured, the driver generates a warning, which is fixed by this change: root@devkit3250:~# cat /dev/input/touchscreen0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 720 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4() Modules linked in: sc16is7xx snd_soc_uda1380 CPU: 0 PID: 720 Comm: cat Not tainted 4.3.0-rc2+ #199 Hardware name: LPC32XX SoC (Flattened Device Tree) Backtrace: [<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c) [<>] (show_stack) from [<>] (dump_stack+0x20/0x28) [<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8) [<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c) [<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4) [<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38) [<>] (clk_enable) from [<>] (lpc32xx_setup_tsc+0x18/0xa0) [<>] (lpc32xx_setup_tsc) from [<>] (lpc32xx_ts_open+0x14/0x1c) [<>] (lpc32xx_ts_open) from [<>] 
(input_open_device+0x74/0xb0) [<>] (input_open_device) from [<>] (evdev_open+0x110/0x16c) [<>] (evdev_open) from [<>] (chrdev_open+0x1b4/0x1dc) [<>] (chrdev_open) from [<>] (do_dentry_open+0x1dc/0x2f4) [<>] (do_dentry_open) from [<>] (vfs_open+0x6c/0x70) [<>] (vfs_open) from [<>] (path_openat+0xb4c/0xddc) [<>] (path_openat) from [<>] (do_filp_open+0x40/0x8c) [<>] (do_filp_open) from [<>] (do_sys_open+0x124/0x1c4) [<>] (do_sys_open) from [<>] (SyS_open+0x2c/0x30) [<>] (SyS_open) from [<>] (ret_fast_syscall+0x0/0x38) Signed-off-by: Vladimir Zapolskiy Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/lpc32xx_ts.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/input/touchscreen/lpc32xx_ts.c b/drivers/input/touchscreen/lpc32xx_ts.c index 24d704cd9f88..7fbb3b0c8571 100644 --- a/drivers/input/touchscreen/lpc32xx_ts.c +++ b/drivers/input/touchscreen/lpc32xx_ts.c @@ -139,14 +139,14 @@ static void lpc32xx_stop_tsc(struct lpc32xx_tsc *tsc) tsc_readl(tsc, LPC32XX_TSC_CON) & ~LPC32XX_TSC_ADCCON_AUTO_EN); - clk_disable(tsc->clk); + clk_disable_unprepare(tsc->clk); } static void lpc32xx_setup_tsc(struct lpc32xx_tsc *tsc) { u32 tmp; - clk_enable(tsc->clk); + clk_prepare_enable(tsc->clk); tmp = tsc_readl(tsc, LPC32XX_TSC_CON) & ~LPC32XX_TSC_ADCCON_POWER_UP; -- cgit v1.2.3 From f967fc8f165fadb72166f2bd4785094b3ca21307 Mon Sep 17 00:00:00 2001 From: Frederic Danis Date: Fri, 9 Oct 2015 17:14:56 +0200 Subject: Revert "serial: 8250_dma: don't bother DMA with small transfers" This reverts commit 9119fba0cfeda6d415c9f068df66838a104b87cb. This commit prevents from sending "big" file using Bluetooth. When sending a lot of data quickly through the Bluetooth interface, and after a variable amount of data sent, transfer fails with error: kernel: [ 415.247453] Bluetooth: hci0 hardware error 0x00 Found on T100TA. After reverting this commit, send works fine for any file size. 
Signed-off-by: Frederic Danis Fixes: 9119fba0cfed (serial: 8250_dma: don't bother DMA with small transfers) Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus Acked-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_dma.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c index 21d01a491405..e508939daea3 100644 --- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -80,10 +80,6 @@ int serial8250_tx_dma(struct uart_8250_port *p) return 0; dma->tx_size = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE); - if (dma->tx_size < p->port.fifosize) { - ret = -EINVAL; - goto err; - } desc = dmaengine_prep_slave_single(dma->txchan, dma->tx_addr + xmit->tail, -- cgit v1.2.3 From f235f664a8afabccf863a5dee4777d2d7b676fda Mon Sep 17 00:00:00 2001 From: Scot Doyle Date: Fri, 9 Oct 2015 15:08:10 +0000 Subject: fbcon: initialize blink interval before calling fb_set_par Since commit 27a4c827c34ac4256a190cc9d24607f953c1c459 fbcon: use the cursor blink interval provided by vt a PPC64LE kernel fails to boot when fbcon_add_cursor_timer uses an uninitialized ops->cur_blink_jiffies. Prevent by initializing in fbcon_init before the call to info->fbops->fb_set_par. 
Reported-and-tested-by: Alistair Popple Signed-off-by: Scot Doyle Cc: [v4.2] Signed-off-by: Greg Kroah-Hartman --- drivers/video/console/fbcon.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c index 1aaf89300621..92f394927f24 100644 --- a/drivers/video/console/fbcon.c +++ b/drivers/video/console/fbcon.c @@ -1093,6 +1093,7 @@ static void fbcon_init(struct vc_data *vc, int init) con_copy_unimap(vc, svc); ops = info->fbcon_par; + ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms); p->con_rotate = initial_rotation; set_blitting_type(vc, info); -- cgit v1.2.3 From c8a1978e07c412f8d79b8612f45aaafe7238ca62 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 18 Oct 2015 16:25:53 -0700 Subject: Input: sur40 - add dependency on VIDEO_V4L2 Fix build errors due to missing Kconfig dependency. drivers/built-in.o: In function `sur40_disconnect': sur40.c:(.text+0x22be6e): undefined reference to `video_unregister_device' sur40.c:(.text+0x22be77): undefined reference to `v4l2_device_unregister' drivers/built-in.o: In function `sur40_process_video': sur40.c:(.text+0x22c1d4): undefined reference to `v4l2_get_timestamp' drivers/built-in.o: In function `sur40_probe': sur40.c:(.text+0x22ca82): undefined reference to `v4l2_device_register' sur40.c:(.text+0x22cb1a): undefined reference to `v4l2_device_unregister' sur40.c:(.text+0x22cbf7): undefined reference to `video_device_release_empty' sur40.c:(.text+0x22cc53): undefined reference to `__video_register_device' sur40.c:(.text+0x22cc90): undefined reference to `video_unregister_device' drivers/built-in.o: In function `sur40_vidioc_querycap': sur40.c:(.text+0x22ccb0): undefined reference to `video_devdata' Signed-off-by: Randy Dunlap Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig index 600dcceff542..deb14c12ae8b 100644 --- 
a/drivers/input/touchscreen/Kconfig +++ b/drivers/input/touchscreen/Kconfig @@ -1006,6 +1006,7 @@ config TOUCHSCREEN_SUN4I config TOUCHSCREEN_SUR40 tristate "Samsung SUR40 (Surface 2.0/PixelSense) touchscreen" depends on USB && MEDIA_USB_SUPPORT && HAS_DMA + depends on VIDEO_V4L2 select INPUT_POLLDEV select VIDEOBUF2_DMA_SG help -- cgit v1.2.3 From f1ccd249319efca4ee4faf1d904f5a362cac7c81 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 16 Oct 2015 00:14:28 -0400 Subject: x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior For legacy machines cpu_init_udelay defaults to 10,000. For modern machines it is set to 0. The user should be able to set cpu_init_udelay to any value on the cmdline, including 10,000. Before this patch, that was seen as "unchanged from default" and thus on a modern machine, the user request was ignored and the delay was set to 0. Signed-off-by: Len Brown Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: dparsons@brightdsl.net Cc: shrybman@teksavvy.com Link: http://lkml.kernel.org/r/de363cdbbcfcca1d22569683f7eb9873e0177251.1444968087.git.len.brown@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/smpboot.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index e0c198e5f920..32267ccac3d7 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -509,7 +509,7 @@ void __inquire_remote_apic(int apicid) */ #define UDELAY_10MS_DEFAULT 10000 -static unsigned int init_udelay = UDELAY_10MS_DEFAULT; +static unsigned int init_udelay = INT_MAX; static int __init cpu_init_udelay(char *str) { @@ -522,13 +522,16 @@ early_param("cpu_init_udelay", cpu_init_udelay); static void __init smp_quirk_init_udelay(void) { /* if cmdline changed it from default, leave it alone */ - if (init_udelay != UDELAY_10MS_DEFAULT) + if (init_udelay != INT_MAX) return; /* if modern processor, use no delay */ if (((boot_cpu_data.x86_vendor == 
X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) || ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF))) init_udelay = 0; + else + /* else, use legacy delay */ + init_udelay = UDELAY_10MS_DEFAULT; } /* -- cgit v1.2.3 From fcafddec4e78a7776db4b6685db6b2902d4300fc Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 16 Oct 2015 00:14:29 -0400 Subject: x86/smpboot: Fix CPU #1 boot timeout The following commit: a9bcaa02a5104ac ("x86/smpboot: Remove SIPI delays from cpu_up()") Caused some Intel Core2 processors to time-out when bringing up CPU #1, resulting in that CPU being missing after bootup. That patch reduced the SIPI delays from udelay() 300, 200 to udelay() 0, 0 on modern processors. Several Intel(R) Core(TM)2 systems failed to bring up CPU #1 10/10 times after that change. Increasing either of the SIPI delays to udelay(1) results in success. So here we increase both to udelay(10). While this may be 20x slower than the absolute minimum, it is still 20x to 30x faster than the original code. Tested-by: Donald Parsons Tested-by: Shane Signed-off-by: Len Brown Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: dparsons@brightdsl.net Cc: shrybman@teksavvy.com Link: http://lkml.kernel.org/r/6dd554ee8945984d85aafb2ad35793174d068af0.1444968087.git.len.brown@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/smpboot.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 32267ccac3d7..892ee2e5ecbc 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -660,7 +660,9 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip) /* * Give the other CPU some time to accept the IPI.
*/ - if (init_udelay) + if (init_udelay == 0) + udelay(10); + else udelay(300); pr_debug("Startup point 1\n"); @@ -671,7 +673,9 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip) /* * Give the other CPU some time to accept the IPI. */ - if (init_udelay) + if (init_udelay == 0) + udelay(10); + else udelay(200); if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */ -- cgit v1.2.3 From a75ca545e8d57473da47ece828ad98a10727ec6f Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Fri, 16 Oct 2015 14:28:53 +0300 Subject: x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Declaration of memcpy() is hidden under #ifndef CONFIG_KMEMCHECK. In asm/efi.h under #ifdef CONFIG_KASAN we #undef memcpy(), due to which the following happens: In file included from arch/x86/kernel/setup.c:96:0: ./arch/x86/include/asm/desc.h: In function ‘native_write_idt_entry’: ./arch/x86/include/asm/desc.h:122:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration] memcpy(&idt[entry], gate, sizeof(*gate)); ^ cc1: some warnings being treated as errors make[2]: *** [arch/x86/kernel/setup.o] Error 1 We will get rid of that #undef in asm/efi.h eventually. But in the meanwhile move memcpy() declaration out of #ifdefs to fix the build. 
Reported-by: Borislav Petkov Signed-off-by: Andrey Ryabinin Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1444994933-28328-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/string_64.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h index e4661196994e..ff8b9a17dc4b 100644 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -27,12 +27,11 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t function. */ #define __HAVE_ARCH_MEMCPY 1 +extern void *memcpy(void *to, const void *from, size_t len); extern void *__memcpy(void *to, const void *from, size_t len); #ifndef CONFIG_KMEMCHECK -#if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4 -extern void *memcpy(void *to, const void *from, size_t len); -#else +#if (__GNUC__ == 4 && __GNUC_MINOR__ < 3) || __GNUC__ < 4 #define memcpy(dst, src, len) \ ({ \ size_t __len = (len); \ -- cgit v1.2.3 From 911b79cde95c7da0ec02f48105358a36636b7a71 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 19 Oct 2015 11:20:28 +0100 Subject: KEYS: Don't permit request_key() to construct a new keyring If request_key() is used to find a keyring, only do the search part - don't do the construction part if the keyring was not found by the search. We don't really want keyrings in the negative instantiated state since the rejected/negative instantiation error value in the payload is unioned with keyring metadata. 
Now the kernel gives an error: request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted) Signed-off-by: David Howells --- security/keys/request_key.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 486ef6fa393b..0d6253124278 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -440,6 +440,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx, kenter(""); + if (ctx->index_key.type == &key_type_keyring) + return ERR_PTR(-EPERM); + user = key_user_lookup(current_fsuid()); if (!user) return ERR_PTR(-ENOMEM); -- cgit v1.2.3 From 57df5380853460bc66b59a46273ce113c923d39c Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 16 Oct 2015 12:23:33 -0700 Subject: ARM: OMAP2+: Fix imprecise external abort caused by bogus SRAM init Some omaps are producing imprecise external aborts because we are wrongly trying to init SRAM for device tree based booting. Only omap3 is still using the legacy SRAM code, so we need to make it omap3 specific. Otherwise we can get errors like this on at least dm814x: Unhandled fault: imprecise external abort (0xc06) at 0xc08b156c ... (omap_rev) from [] (omap_sram_init+0xf8/0x3e0) (omap_sram_init) from [] (omap_sdrc_init+0x10/0xb0) (omap_sdrc_init) from [] (pdata_quirks_init+0x18/0x44) (pdata_quirks_init) from [] (omap_generic_init+0x10/0x1c) (omap_generic_init) from [] (customize_machine+0x1c/0x40) (customize_machine) from [] (do_one_initcall+0x80/0x1dc) (do_one_initcall) from [] (kernel_init_freeable+0x218/0x2e8) (kernel_init_freeable) from [] (kernel_init+0x8/0xec) (kernel_init) from [] (ret_from_fork+0x14/0x24) Let's fix the issue by making sure omap_sdrc_init only gets called for omap3. To do that, we need to have compatible "ti,omap3" in the dts files. And let's also use "ti,omap3630" instead of "ti,omap36xx" like we're supposed to. 
Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts | 2 +- arch/arm/boot/dts/omap3-evm-37xx.dts | 2 +- arch/arm/mach-omap2/board-generic.c | 1 + arch/arm/mach-omap2/pdata-quirks.c | 9 ++++++++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts b/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts index 91146c318798..5b0430041ec6 100644 --- a/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts +++ b/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts @@ -12,7 +12,7 @@ / { model = "LogicPD Zoom DM3730 Torpedo Development Kit"; - compatible = "logicpd,dm3730-torpedo-devkit", "ti,omap36xx"; + compatible = "logicpd,dm3730-torpedo-devkit", "ti,omap3630", "ti,omap3"; gpio_keys { compatible = "gpio-keys"; diff --git a/arch/arm/boot/dts/omap3-evm-37xx.dts b/arch/arm/boot/dts/omap3-evm-37xx.dts index 16e8ce350dda..bb339d1648e0 100644 --- a/arch/arm/boot/dts/omap3-evm-37xx.dts +++ b/arch/arm/boot/dts/omap3-evm-37xx.dts @@ -13,7 +13,7 @@ / { model = "TI OMAP37XX EVM (TMDSEVM3730)"; - compatible = "ti,omap3-evm-37xx", "ti,omap36xx"; + compatible = "ti,omap3-evm-37xx", "ti,omap3630", "ti,omap3"; memory { device_type = "memory"; diff --git a/arch/arm/mach-omap2/board-generic.c b/arch/arm/mach-omap2/board-generic.c index 6b59230cd6be..fb219a30c10c 100644 --- a/arch/arm/mach-omap2/board-generic.c +++ b/arch/arm/mach-omap2/board-generic.c @@ -106,6 +106,7 @@ DT_MACHINE_START(OMAP3_DT, "Generic OMAP3 (Flattened Device Tree)") MACHINE_END static const char *const omap36xx_boards_compat[] __initconst = { + "ti,omap3630", "ti,omap36xx", NULL, }; diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c index ea56397599c2..1dfe34654c43 100644 --- a/arch/arm/mach-omap2/pdata-quirks.c +++ b/arch/arm/mach-omap2/pdata-quirks.c @@ -559,7 +559,14 @@ static void pdata_quirks_check(struct pdata_init *quirks) void __init pdata_quirks_init(const struct of_device_id 
*omap_dt_match_table) { - omap_sdrc_init(NULL, NULL); + /* + * We still need this for omap2420 and omap3 PM to work, others are + * using drivers/misc/sram.c already. + */ + if (of_machine_is_compatible("ti,omap2420") || + of_machine_is_compatible("ti,omap3")) + omap_sdrc_init(NULL, NULL); + pdata_quirks_check(auxdata_quirks); of_platform_populate(NULL, omap_dt_match_table, omap_auxdata_lookup, NULL); -- cgit v1.2.3 From 8a603f91cc4848ab1a0458bc065aa9f64322e123 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Fri, 16 Oct 2015 22:19:06 +0100 Subject: ARM: 8445/1: fix vdsomunge not to depend on glibc specific byteswap.h If the host toolchain is not glibc based then the arm kernel build fails with HOSTCC arch/arm/vdso/vdsomunge arch/arm/vdso/vdsomunge.c:48:22: fatal error: byteswap.h: No such file or directory Observed: with omap2plus_defconfig and compile on Mac OS X with arm ELF cross-compiler. Reason: byteswap.h is a glibc only header. Solution: replace by private byte-swapping macros (taken from arch/mips/boot/elf2ecoff.c and kindly improved by Russell King) Tested to compile on Mac OS X 10.9.5 host. Signed-off-by: H. Nikolaus Schaller Signed-off-by: Nathan Lynch Signed-off-by: Russell King --- arch/arm/vdso/vdsomunge.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arm/vdso/vdsomunge.c b/arch/arm/vdso/vdsomunge.c index aedec81d1198..0cebd98cd88c 100644 --- a/arch/arm/vdso/vdsomunge.c +++ b/arch/arm/vdso/vdsomunge.c @@ -45,7 +45,6 @@ * it does. 
*/ -#include <byteswap.h> #include #include #include @@ -59,6 +58,16 @@ #include #include +#define swab16(x) \ + ((((x) & 0x00ff) << 8) | \ + (((x) & 0xff00) >> 8)) + +#define swab32(x) \ + ((((x) & 0x000000ff) << 24) | \ + (((x) & 0x0000ff00) << 8) | \ + (((x) & 0x00ff0000) >> 8) | \ + (((x) & 0xff000000) >> 24)) + #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ #define HOST_ORDER ELFDATA2LSB #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ @@ -104,17 +113,17 @@ static void cleanup(void) static Elf32_Word read_elf_word(Elf32_Word word, bool swap) { - return swap ? bswap_32(word) : word; + return swap ? swab32(word) : word; } static Elf32_Half read_elf_half(Elf32_Half half, bool swap) { - return swap ? bswap_16(half) : half; + return swap ? swab16(half) : half; } static void write_elf_word(Elf32_Word val, Elf32_Word *dst, bool swap) { - *dst = swap ? bswap_32(val) : val; + *dst = swap ? swab32(val) : val; } int main(int argc, char **argv) -- cgit v1.2.3 From 2a7d44f47f53fa1be677f44c73d78b1bcf9c05d9 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 19 Oct 2015 09:30:42 -0400 Subject: drm/radeon/dpm: don't add pwm attributes if DPM is disabled PWM fan control is only available with DPM. If DPM is disabled, don't expose the PWM fan controls to avoid a crash.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=92524 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_pm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 44489cce7458..6a0a176e26ec 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -717,10 +717,14 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj, struct radeon_device *rdev = dev_get_drvdata(dev); umode_t effective_mode = attr->mode; - /* Skip limit attributes if DPM is not enabled */ + /* Skip attributes if DPM is not enabled */ if (rdev->pm.pm_method != PM_METHOD_DPM && (attr == &sensor_dev_attr_temp1_crit.dev_attr.attr || - attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr)) + attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr || + attr == &sensor_dev_attr_pwm1.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_max.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_min.dev_attr.attr)) return 0; /* Skip fan attributes if fan is not present */ -- cgit v1.2.3 From 27100735adbcb872854674bed1d000825f9954ac Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 19 Oct 2015 15:49:11 -0400 Subject: drm/amdgpu/dpm: don't add pwm attributes if DPM is disabled PWM fan control is only available with DPM. There is no non-DPM support on amdgpu, so we should never get a crash here because the sysfs nodes would never be created in the first place. Add the check just in case to be on the safe side. 
Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c index efed11509f4a..ed2bbe5b10af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c @@ -294,10 +294,14 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj, struct amdgpu_device *adev = dev_get_drvdata(dev); umode_t effective_mode = attr->mode; - /* Skip limit attributes if DPM is not enabled */ + /* Skip attributes if DPM is not enabled */ if (!adev->pm.dpm_enabled && (attr == &sensor_dev_attr_temp1_crit.dev_attr.attr || - attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr)) + attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr || + attr == &sensor_dev_attr_pwm1.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_max.dev_attr.attr || + attr == &sensor_dev_attr_pwm1_min.dev_attr.attr)) return 0; /* Skip fan attributes if fan is not present */ -- cgit v1.2.3 From 677c884ff6370add1360e2b9558285355ebe2b36 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 19 Oct 2015 15:54:21 -0400 Subject: drm/amdgpu: add missing dpm check for KV dpm late init Skip dpm late init if dpm is disabled. 
Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/kv_dpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c index 9745ed3a9aef..7e9154c7f1db 100644 --- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c +++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c @@ -2997,6 +2997,9 @@ static int kv_dpm_late_init(void *handle) struct amdgpu_device *adev = (struct amdgpu_device *)handle; int ret; + if (!amdgpu_dpm) + return 0; + /* init the sysfs and debugfs files late */ ret = amdgpu_pm_sysfs_init(adev); if (ret) -- cgit v1.2.3 From 0b5aedfe0e6654ec54f35109e1929a1cf7fc4cdd Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 28 Jun 2015 22:55:26 +0200 Subject: um: Fix out-of-tree build Commit 30b11ee9a (um: Remove copy&paste code from init.h) uncovered an issue wrt. out-of-tree builds. For out-of-tree builds, we must not rely on relative paths. Before 30b11ee9a it worked by chance as no host code included generated header files. Acked-by: Randy Dunlap Signed-off-by: Richard Weinberger --- arch/um/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/um/Makefile b/arch/um/Makefile index 098ab3333e7c..e3abe6f3156d 100644 --- a/arch/um/Makefile +++ b/arch/um/Makefile @@ -70,8 +70,8 @@ KBUILD_AFLAGS += $(ARCH_INCLUDE) USER_CFLAGS = $(patsubst $(KERNEL_DEFINES),,$(patsubst -I%,,$(KBUILD_CFLAGS))) \ $(ARCH_INCLUDE) $(MODE_INCLUDE) $(filter -I%,$(CFLAGS)) \ - -D_FILE_OFFSET_BITS=64 -idirafter include \ - -D__KERNEL__ -D__UM_HOST__ + -D_FILE_OFFSET_BITS=64 -idirafter $(srctree)/include \ + -idirafter $(obj)/include -D__KERNEL__ -D__UM_HOST__ #This will adjust *FLAGS accordingly to the platform. 
include $(ARCH_DIR)/Makefile-os-$(OS) -- cgit v1.2.3 From 37e81a016cc847c03ea71570fea29f12ca390bee Mon Sep 17 00:00:00 2001 From: Hans-Werner Hilse Date: Mon, 29 Jun 2015 11:50:32 +0200 Subject: um: Do not rely on libc to provide modify_ldt() modify_ldt() was declared as an external symbol. Despite the man page for this syscall stating that there is no wrapper in glibc, since version 2.1 there actually is one, so linking against glibc works. Since modify_ldt() is not a POSIX interface, other libc implementations do not always provide a wrapper function. Even the glibc headers do not provide a corresponding declaration. So take the recommended route and call it via syscall(). Signed-off-by: Hans-Werner Hilse Signed-off-by: Richard Weinberger --- arch/x86/um/ldt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/um/ldt.c b/arch/x86/um/ldt.c index 9701a4fd7bf2..836a1eb5df43 100644 --- a/arch/x86/um/ldt.c +++ b/arch/x86/um/ldt.c @@ -12,7 +12,10 @@ #include #include -extern int modify_ldt(int func, void *ptr, unsigned long bytecount); +static inline int modify_ldt (int func, void *ptr, unsigned long bytecount) +{ + return syscall(__NR_modify_ldt, func, ptr, bytecount); +} static long write_ldt_entry(struct mm_id *mm_idp, int func, struct user_desc *desc, void **addr, int done) -- cgit v1.2.3 From 6b1873371cea13036171d03a7c1e3e59158b4505 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 9 Aug 2015 21:49:07 +0200 Subject: um: Fix waitpid() usage in helper code When UML executes a helper program, it uses waitpid() with the __WCLONE flag to wait for the program, as the helper is executed from a clone()'ed thread. While using __WCLONE is perfectly fine for clone()'ed children, it won't detect terminated children if the helper has issued an execve(). We have to use __WALL to wait for both clone()'ed and regular children to detect the termination before and after an execve().
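The waiting pattern described above can be sketched in user space as follows. This is a minimal illustration, not the UML code: wait_for_helper() and spawn_and_wait() are hypothetical names, and a plain fork() stands in for the helper thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): wait for one child with
 * __WALL, which matches both clone()'ed and regular children, and
 * retry when the call is interrupted by a signal.
 */
static int wait_for_helper(pid_t pid)
{
	int status;
	pid_t ret;

	do {
		ret = waitpid(pid, &status, __WALL);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;

	/* Report the exit code, or -1 if the child did not exit normally. */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical driver: start a child that exits with 'code', then reap it. */
static int spawn_and_wait(int code)
{
	pid_t pid = fork();

	if (pid < 0)
		return -errno;
	if (pid == 0)
		_exit(code);

	return wait_for_helper(pid);
}
```

The retry loop plays the role that CATCH_EINTR() plays in the patched code; everything else here is an assumption made for the sake of a self-contained example.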
Reported-and-tested-by: Thomas Meyer Signed-off-by: Richard Weinberger --- arch/um/os-Linux/helper.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/um/os-Linux/helper.c b/arch/um/os-Linux/helper.c index e3ee4a51ef63..3f02d4232812 100644 --- a/arch/um/os-Linux/helper.c +++ b/arch/um/os-Linux/helper.c @@ -96,7 +96,7 @@ int run_helper(void (*pre_exec)(void *), void *pre_data, char **argv) "ret = %d\n", -n); ret = n; } - CATCH_EINTR(waitpid(pid, NULL, __WCLONE)); + CATCH_EINTR(waitpid(pid, NULL, __WALL)); } out_free2: @@ -129,7 +129,7 @@ int run_helper_thread(int (*proc)(void *), void *arg, unsigned int flags, return err; } if (stack_out == NULL) { - CATCH_EINTR(pid = waitpid(pid, &status, __WCLONE)); + CATCH_EINTR(pid = waitpid(pid, &status, __WALL)); if (pid < 0) { err = -errno; printk(UM_KERN_ERR "run_helper_thread - wait failed, " @@ -148,7 +148,7 @@ int run_helper_thread(int (*proc)(void *), void *arg, unsigned int flags, int helper_wait(int pid) { int ret, status; - int wflags = __WCLONE; + int wflags = __WALL; CATCH_EINTR(ret = waitpid(pid, &status, wflags)); if (ret < 0) { -- cgit v1.2.3 From 56b88a3bf97a39d3f4f010509917b76a865a6dc8 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 9 Aug 2015 22:26:33 +0200 Subject: um: Fix kernel mode fault condition We have to exclude memory locations <= PAGE_SIZE from the condition and let the kernel mode fault path catch it. Otherwise a kernel NULL pointer exception will be reported as a kernel user space access. 
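The corrected condition can be sketched as a standalone classifier. This is only an illustration: classify_fault(), the enum, and the EX_* constants are hypothetical, and the real PAGE_SIZE and TASK_SIZE are architecture-defined.

```c
#include <stdbool.h>

/* Illustrative values only; not the kernel's definitions. */
#define EX_PAGE_SIZE 4096UL
#define EX_TASK_SIZE 0xc0000000UL

enum fault_kind {
	FAULT_OTHER,               /* left to the remaining fault paths */
	FAULT_KERNEL_TOUCHED_USER, /* kernel mode access to user memory */
};

/*
 * Hypothetical classifier mirroring the patched condition: addresses
 * up to and including EX_PAGE_SIZE are excluded, so a kernel NULL
 * pointer dereference is not reported as a user memory access.
 */
static enum fault_kind classify_fault(bool is_user, unsigned long address)
{
	if (!is_user && address > EX_PAGE_SIZE && address < EX_TASK_SIZE)
		return FAULT_KERNEL_TOUCHED_USER;
	return FAULT_OTHER;
}
```

With these assumed constants, a kernel mode fault at address 0 falls through to the other paths, while a kernel mode fault at 0x10000 is still flagged.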
Fixes: d2313084e2c (um: Catch unprotected user memory access) Signed-off-by: Richard Weinberger --- arch/um/kernel/trap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index d8a9fce6ee2e..98783dd0fa2e 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -220,7 +220,7 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user, show_regs(container_of(regs, struct pt_regs, regs)); panic("Segfault with no mm"); } - else if (!is_user && address < TASK_SIZE) { + else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) { show_regs(container_of(regs, struct pt_regs, regs)); panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx", address, ip); -- cgit v1.2.3 From fde7d22e01aa0d252fc5c95fa11f0dac35a4dd59 Mon Sep 17 00:00:00 2001 From: Yuyang Du Date: Tue, 13 Oct 2015 09:18:22 +0800 Subject: sched/fair: Fix overly small weight for interactive group entities Commit: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") led to an overly small weight for interactive group entities. The bad case can be easily reproduced when a number of CPU hogs compete for the CPUs at the same time (thanks to Mike). This is largely because the task group's load average tracking across CPUs lags behind the real changes. To fix this we accelerate the group share distribution process by using the load.weight of the cfs_rq. This may increase the entire group's share, but we have to do so to protect the (fragile) interactive tasks, especially from CPU hogs.
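The effect of switching from the lagging average to load.weight can be illustrated with a simplified model of the share computation. This is a sketch under stated assumptions: the struct and function names are hypothetical, the final division by tg_weight is assumed, and any clamping the kernel applies is omitted.

```c
/* Simplified, hypothetical model; not the kernel's data structures. */
struct group_model {
	long shares;      /* stands in for tg->shares */
	long tg_load_avg; /* stands in for tg->load_avg (may lag) */
};

struct rq_model {
	long load_weight; /* stands in for cfs_rq->load.weight */
	long load_avg;    /* lagging per-cfs_rq load average */
	long contrib;     /* stands in for cfs_rq->tg_load_avg_contrib */
};

/* use_weight != 0 models the patched code, 0 models the old code. */
static long model_shares(const struct group_model *tg,
			 const struct rq_model *rq, int use_weight)
{
	long load = use_weight ? rq->load_weight : rq->load_avg;
	long tg_weight = tg->tg_load_avg - rq->contrib + load;
	long shares = tg->shares * load;

	if (tg_weight)
		shares /= tg_weight; /* assumed final step */
	return shares;
}

/* Made-up numbers: a task just woke up, so the average still lags. */
static long model_example(int use_weight)
{
	struct group_model tg = { .shares = 1024, .tg_load_avg = 2048 };
	struct rq_model rq = { .load_weight = 1024, .load_avg = 10,
			       .contrib = 10 };

	return model_shares(&tg, &rq, use_weight);
}
```

With these made-up numbers the lagging average yields a share of 5, while load.weight yields 342, which is the kind of gap the commit message describes for interactive tasks competing with CPU hogs.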
Reported-by: Mike Galbraith Tested-by: Dietmar Eggemann Tested-by: Mike Galbraith Signed-off-by: Yuyang Du Signed-off-by: Peter Zijlstra (Intel) Acked-by: Dietmar Eggemann Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1444699103-20272-1-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6e2e3483b1ec..bc62c5096e54 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2363,7 +2363,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) */ tg_weight = atomic_long_read(&tg->load_avg); tg_weight -= cfs_rq->tg_load_avg_contrib; - tg_weight += cfs_rq_load_avg(cfs_rq); + tg_weight += cfs_rq->load.weight; return tg_weight; } @@ -2373,7 +2373,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg) long tg_weight, load, shares; tg_weight = calc_tg_weight(tg, cfs_rq); - load = cfs_rq_load_avg(cfs_rq); + load = cfs_rq->load.weight; shares = (tg->shares * load); if (tg_weight) -- cgit v1.2.3 From 3e386d56bafbb6d2540b49367444997fc671ea69 Mon Sep 17 00:00:00 2001 From: Yuyang Du Date: Tue, 13 Oct 2015 09:18:23 +0800 Subject: sched/fair: Update task group's load_avg after task migration When cfs_rq has cfs_rq->removed_load_avg set (when a task migrates from this cfs_rq), we need to update its contribution to the group's load_avg. This should not increase tg's update too much, because in most cases, the cfs_rq has already decayed its load_avg. 
Tested-by: Dietmar Eggemann Signed-off-by: Yuyang Du Signed-off-by: Peter Zijlstra (Intel) Acked-by: Dietmar Eggemann Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1444699103-20272-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar --- kernel/sched/fair.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bc62c5096e54..9a5e60fe721a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2664,13 +2664,14 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq); /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) { - int decayed; struct sched_avg *sa = &cfs_rq->avg; + int decayed, removed = 0; if (atomic_long_read(&cfs_rq->removed_load_avg)) { long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); sa->load_avg = max_t(long, sa->load_avg - r, 0); sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); + removed = 1; } if (atomic_long_read(&cfs_rq->removed_util_avg)) { @@ -2688,7 +2689,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) cfs_rq->load_last_update_time_copy = sa->last_update_time; #endif - return decayed; + return decayed || removed; } /* Update task and its cfs_rq load average */ -- cgit v1.2.3 From 0baabb385eb4bce699ddab0db015112be6cf1e6a Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 12 Oct 2015 17:21:23 +0200 Subject: nohz: Revert "nohz: Set isolcpus when nohz_full is set" This reverts: 8cb9764fc88b ("nohz: Set isolcpus when nohz_full is set") We assumed that full-nohz users always want scheduler isolation on full dynticks CPUs, therefore we included full-nohz CPUs in cpu_isolated_map. This means that tasks run by default on CPUs outside the nohz_full range unless their affinity is explicitly overwritten.
This suits pure isolation workloads but when the machine is needed to run common workloads, the set of CPUs available to run common tasks is reduced. We reach an extreme case when CONFIG_NO_HZ_FULL_ALL is enabled as it leaves only CPU 0 for non-isolation tasks, which makes people think that their supercomputer regressed to 90's UP - which is true in a sense. Some full-nohz users appear to be interested in running normal workloads either before or after an isolation workload. Full-nohz isn't optimized toward normal workloads but it's still better than UP performance. We are reaching a limitation in kernel presets here. Let's revert this cpu_isolated_map inclusion and let userspace do its own scheduler isolation using cpusets or explicit affinity settings. Reported-by: Ingo Molnar Reported-by: Mike Galbraith Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Acked-by: Thomas Gleixner Cc: Alexey Dobriyan Cc: Andrew Morton Cc: Chris Metcalf Cc: Christoph Lameter Cc: Dave Jones Cc: Linus Torvalds Cc: Mike Galbraith Cc: Oleg Nesterov Cc: Paul E . McKenney Cc: Peter Zijlstra Cc: Rik van Riel Link: http://lkml.kernel.org/r/1444663283-30068-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar --- kernel/sched/core.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 10a8faa1b0d4..5bd7d60658d3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7238,9 +7238,6 @@ void __init sched_init_smp(void) alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL); alloc_cpumask_var(&fallback_doms, GFP_KERNEL); - /* nohz_full won't take effect without isolating the cpus.
*/ - tick_nohz_full_add_cpus_to(cpu_isolated_map); - sched_init_numa(); /* -- cgit v1.2.3 From 5aa5050787f449e7eaef2c5ec93c7b357aa7dcdc Mon Sep 17 00:00:00 2001 From: Luca Abeni Date: Fri, 16 Oct 2015 10:06:21 +0200 Subject: sched/deadline: Fix migration of SCHED_DEADLINE tasks Commit: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") broke select_task_rq_dl() and find_lock_later_rq(), because it introduced a comparison between the local task's deadline and dl.earliest_dl.curr of the remote queue. However, if the remote runqueue does not contain any SCHED_DEADLINE task its earliest_dl.curr is 0 (always smaller than the deadline of the local task) and the remote runqueue is not selected for pushing. As a result, if an application creates multiple SCHED_DEADLINE threads, they will never be pushed to runqueues that do not already contain SCHED_DEADLINE tasks. This patch fixes the issue by checking if dl.dl_nr_running == 0. Signed-off-by: Luca Abeni Signed-off-by: Peter Zijlstra (Intel) Cc: Juri Lelli Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Wanpeng Li Fixes: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") Link: http://lkml.kernel.org/r/1444982781-15608-1-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar --- kernel/sched/deadline.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index fc8f01083527..142df2668e5d 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1066,8 +1066,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags) int target = find_later_rq(p); if (target != -1 && - dl_time_before(p->dl.deadline, - cpu_rq(target)->dl.earliest_dl.curr)) + (dl_time_before(p->dl.deadline, + cpu_rq(target)->dl.earliest_dl.curr) || + (cpu_rq(target)->dl.dl_nr_running == 0))) cpu = target; } 
rcu_read_unlock(); @@ -1417,7 +1418,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq) later_rq = cpu_rq(cpu); - if (!dl_time_before(task->dl.deadline, + if (later_rq->dl.dl_nr_running && + !dl_time_before(task->dl.deadline, later_rq->dl.earliest_dl.curr)) { /* * Target rq has tasks of equal or earlier deadline, -- cgit v1.2.3 From d976441f44bc5d48635d081d277aa76556ffbf8b Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Mon, 19 Oct 2015 11:37:17 +0300 Subject: compiler, atomics, kasan: Provide READ_ONCE_NOCHECK() Some code may perform memory reads that are racy by design. This could be harmless, yet such code may produce KASAN warnings. To hide such accesses from KASAN this patch introduces the READ_ONCE_NOCHECK() macro. KASAN will not check the memory accessed by READ_ONCE_NOCHECK(). The KernelThreadSanitizer (KTSAN) is going to ignore it as well. This patch creates __read_once_size_nocheck(), a clone of __read_once_size(). The only difference between them is the 'no_sanitized_address' attribute appended to the '*_nocheck' function. This attribute tells the compiler that instrumentation of memory accesses should not be applied to that function. We declare it as static '__maybe_unused' because GCC is not capable of inlining such a function: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 With KASAN=n READ_ONCE_NOCHECK() is just a clone of READ_ONCE(). Signed-off-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Andrew Morton Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Denys Vlasenko Cc: Dmitry Vyukov Cc: Kostya Serebryany Cc: Linus Torvalds Cc: Paul E.
McKenney Cc: Peter Zijlstra Cc: Sasha Levin Cc: Thomas Gleixner Cc: Wolfram Gloger Cc: kasan-dev Link: http://lkml.kernel.org/r/1445243838-17763-2-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar --- include/linux/compiler-gcc.h | 13 +++++++++ include/linux/compiler.h | 66 +++++++++++++++++++++++++++++++++++--------- 2 files changed, 66 insertions(+), 13 deletions(-) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index dfaa7b3e9ae9..8efb40e61d6e 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -237,12 +237,25 @@ #define KASAN_ABI_VERSION 3 #endif +#if GCC_VERSION >= 40902 +/* + * Tell the compiler that address safety instrumentation (KASAN) + * should not be applied to that function. + * Conflicts with inlining: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 + */ +#define __no_sanitize_address __attribute__((no_sanitize_address)) +#endif + #endif /* gcc version >= 40000 specific checks */ #if !defined(__noclone) #define __noclone /* not needed */ #endif +#if !defined(__no_sanitize_address) +#define __no_sanitize_address +#endif + /* * A trick to suppress uninitialized variable warning without generating any * code diff --git a/include/linux/compiler.h b/include/linux/compiler.h index c836eb2dc44d..3d7810341b57 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -198,19 +198,45 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); #include -static __always_inline void __read_once_size(const volatile void *p, void *res, int size) +#define __READ_ONCE_SIZE \ +({ \ + switch (size) { \ + case 1: *(__u8 *)res = *(volatile __u8 *)p; break; \ + case 2: *(__u16 *)res = *(volatile __u16 *)p; break; \ + case 4: *(__u32 *)res = *(volatile __u32 *)p; break; \ + case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \ + default: \ + barrier(); \ + __builtin_memcpy((void *)res, (const void *)p, size); \ + barrier(); \ + } \ +}) + +static __always_inline +void 
__read_once_size(const volatile void *p, void *res, int size) { - switch (size) { - case 1: *(__u8 *)res = *(volatile __u8 *)p; break; - case 2: *(__u16 *)res = *(volatile __u16 *)p; break; - case 4: *(__u32 *)res = *(volatile __u32 *)p; break; - case 8: *(__u64 *)res = *(volatile __u64 *)p; break; - default: - barrier(); - __builtin_memcpy((void *)res, (const void *)p, size); - barrier(); - } + __READ_ONCE_SIZE; +} + +#ifdef CONFIG_KASAN +/* + * This function is not 'inline' because __no_sanitize_address conflicts + * with inlining. Attempting to inline it may cause a build failure. + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 + * '__maybe_unused' allows us to avoid defined-but-not-used warnings. + */ +static __no_sanitize_address __maybe_unused +void __read_once_size_nocheck(const volatile void *p, void *res, int size) +{ + __READ_ONCE_SIZE; +} +#else +static __always_inline +void __read_once_size_nocheck(const volatile void *p, void *res, int size) +{ + __READ_ONCE_SIZE; } +#endif static __always_inline void __write_once_size(volatile void *p, void *res, int size) { @@ -248,8 +274,22 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s * required ordering. */ -#define READ_ONCE(x) \ - ({ union { typeof(x) __val; char __c[1]; } __u; __read_once_size(&(x), __u.__c, sizeof(x)); __u.__val; }) +#define __READ_ONCE(x, check) \ +({ \ + union { typeof(x) __val; char __c[1]; } __u; \ + if (check) \ + __read_once_size(&(x), __u.__c, sizeof(x)); \ + else \ + __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ + __u.__val; \ +}) +#define READ_ONCE(x) __READ_ONCE(x, 1) + +/* + * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need + * to hide memory access from KASAN.
+ */ +#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0) #define WRITE_ONCE(x, val) \ ({ \ -- cgit v1.2.3 From f7d27c35ddff7c100d7a98db499ac0040149ac05 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Mon, 19 Oct 2015 11:37:18 +0300 Subject: x86/mm, kasan: Silence KASAN warnings in get_wchan() get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Sasha Levin Signed-off-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Andrew Morton Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Denys Vlasenko Cc: Dmitry Vyukov Cc: Kostya Serebryany Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Wolfram Gloger Cc: kasan-dev Link: http://lkml.kernel.org/r/1445243838-17763-3-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 39e585a554b7..e28db181e4fc 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -550,14 +550,14 @@ unsigned long get_wchan(struct task_struct *p) if (sp < bottom || sp > top) return 0; - fp = READ_ONCE(*(unsigned long *)sp); + fp = READ_ONCE_NOCHECK(*(unsigned long *)sp); do { if (fp < bottom || fp > top) return 0; - ip = READ_ONCE(*(unsigned long *)(fp + sizeof(unsigned long))); + ip = READ_ONCE_NOCHECK(*(unsigned long *)(fp + sizeof(unsigned long))); if (!in_sched_functions(ip)) return ip; - fp = READ_ONCE(*(unsigned long *)fp); + fp = READ_ONCE_NOCHECK(*(unsigned long *)fp); } while (count++ < 16 && p->state != TASK_RUNNING); return 0; } -- cgit v1.2.3 From 3fc89adb9fa4beff31374a4bf50b3d099d88ae83 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Mon, 19 Oct 2015 18:23:57 +0800 Subject: crypto: api - Only abort operations on fatal signal Currently a 
number of Crypto API operations may fail when a signal occurs. This causes nasty problems as the callers of those operations are often not in a good position to restart the operation. In fact there is currently no need for those operations to be interrupted by user signals at all. All we need is for them to be killable. This patch replaces the relevant calls of signal_pending with fatal_signal_pending, and wait_for_completion_interruptible with wait_for_completion_killable, respectively. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu --- crypto/ablkcipher.c | 2 +- crypto/algapi.c | 2 +- crypto/api.c | 6 +++--- crypto/crypto_user.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c index b788f169cc98..b4ffc5be1a93 100644 --- a/crypto/ablkcipher.c +++ b/crypto/ablkcipher.c @@ -706,7 +706,7 @@ struct crypto_ablkcipher *crypto_alloc_ablkcipher(const char *alg_name, err: if (err != -EAGAIN) break; - if (signal_pending(current)) { + if (fatal_signal_pending(current)) { err = -EINTR; break; } diff --git a/crypto/algapi.c b/crypto/algapi.c index d130b41dbaea..59bf491fe3d8 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -345,7 +345,7 @@ static void crypto_wait_for_test(struct crypto_larval *larval) crypto_alg_tested(larval->alg.cra_driver_name, 0); } - err = wait_for_completion_interruptible(&larval->completion); + err = wait_for_completion_killable(&larval->completion); WARN_ON(err); out: diff --git a/crypto/api.c b/crypto/api.c index afe4610afc4b..bbc147cb5dec 100644 --- a/crypto/api.c +++ b/crypto/api.c @@ -172,7 +172,7 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg) struct crypto_larval *larval = (void *)alg; long timeout; - timeout = wait_for_completion_interruptible_timeout( + timeout = wait_for_completion_killable_timeout( &larval->completion, 60 * HZ); alg = larval->adult; @@ -445,7 +445,7 @@ struct crypto_tfm *crypto_alloc_base(const char *alg_name, u32 type, u32 mask)
err: if (err != -EAGAIN) break; - if (signal_pending(current)) { + if (fatal_signal_pending(current)) { err = -EINTR; break; } @@ -562,7 +562,7 @@ void *crypto_alloc_tfm(const char *alg_name, err: if (err != -EAGAIN) break; - if (signal_pending(current)) { + if (fatal_signal_pending(current)) { err = -EINTR; break; } diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c index d94d99ffe8b9..237f3795cfaa 100644 --- a/crypto/crypto_user.c +++ b/crypto/crypto_user.c @@ -375,7 +375,7 @@ static struct crypto_alg *crypto_user_skcipher_alg(const char *name, u32 type, err = PTR_ERR(alg); if (err != -EAGAIN) break; - if (signal_pending(current)) { + if (fatal_signal_pending(current)) { err = -EINTR; break; } -- cgit v1.2.3 From d289619a219dd01e255d7b5e30f9171b25efea48 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 20 Oct 2015 16:23:55 +0200 Subject: ALSA: hda - Fix deadlock at error in building PCM The HDA codec driver issues snd_hda_codec_reset() at the error path of PCM build. This was needed in the earlier code base, but the recent rewrite to use the standard bus binding made this a deadlock: modprobe D 0000000000000005 0 720 716 0x00000080 Call Trace: [] schedule+0x3e/0x90 [] schedule_preempt_disabled+0x15/0x20 [] __mutex_lock_slowpath+0xb5/0x120 [] mutex_lock+0x1b/0x30 [] device_release_driver+0x1b/0x30 [] bus_remove_device+0x105/0x180 [] device_del+0x139/0x260 [] snd_hdac_device_unregister+0x25/0x30 [snd_hda_core] [] snd_hda_codec_reset+0x2a/0x70 [snd_hda_codec] [] snd_hda_codec_build_pcms+0x18b/0x1b0 [snd_hda_codec] [] hda_codec_driver_probe+0xbe/0x140 [snd_hda_codec] [] driver_probe_device+0x1f4/0x460 [] __driver_attach+0x90/0xa0 [] bus_for_each_dev+0x64/0xa0 [] driver_attach+0x1e/0x20 [] bus_add_driver+0x1eb/0x280 [] driver_register+0x60/0xe0 [] __hda_codec_driver_register+0x5a/0x60 [snd_hda_codec] [] realtek_driver_init+0x1e/0x1000 [snd_hda_codec_realtek] [] do_one_initcall+0xb3/0x200 [] do_init_module+0x60/0x1f8 [] load_module+0x1653/0x1bd0 [] 
SYSC_finit_module+0x98/0xc0 [] SyS_finit_module+0xe/0x10 [] entry_SYSCALL_64_fastpath+0x16/0x75 The simple fix is just to remove this call, since we no longer need to think about unbinding there. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=948758 Cc: # v4.1+ Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_codec.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c index 37f43a1b34ef..a249d5486889 100644 --- a/sound/pci/hda/hda_codec.c +++ b/sound/pci/hda/hda_codec.c @@ -3367,10 +3367,8 @@ int snd_hda_codec_build_pcms(struct hda_codec *codec) int dev, err; err = snd_hda_codec_parse_pcms(codec); - if (err < 0) { - snd_hda_codec_reset(codec); + if (err < 0) return err; - } /* attach a new PCM streams */ list_for_each_entry(cpcm, &codec->pcm_list_head, list) { -- cgit v1.2.3 From 97aff2c03a1e4d343266adadb52313613efb027f Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Tue, 20 Oct 2015 10:25:58 +0100 Subject: ASoC: wm8904: Correct number of EQ registers There are 24 EQ registers, not 25; I suspect this bug came about because the registers start at EQ1, not zero. The bug is relatively harmless as the extra register written is an unused one.
Signed-off-by: Charles Keepax Signed-off-by: Mark Brown Cc: stable@vger.kernel.org --- include/sound/wm8904.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/sound/wm8904.h b/include/sound/wm8904.h index 898be3a8db9a..6d8f8fba3341 100644 --- a/include/sound/wm8904.h +++ b/include/sound/wm8904.h @@ -119,7 +119,7 @@ #define WM8904_MIC_REGS 2 #define WM8904_GPIO_REGS 4 #define WM8904_DRC_REGS 4 -#define WM8904_EQ_REGS 25 +#define WM8904_EQ_REGS 24 /** * DRC configurations are specified with a label and a set of register -- cgit v1.2.3 From a2d7629048322ae62bff57f34f5f995e25ed234c Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Tue, 20 Oct 2015 11:38:08 -0400 Subject: tracing: Have stack tracer force RCU to be watching The stack tracer was triggering the WARN_ON() in module.c: static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP if (unlikely(!debug_locks)) return; WARN_ON(!rcu_read_lock_sched_held() && !lockdep_is_held(&module_mutex)); #endif } The reason is that the stack tracer traces all function calls, and some of those calls happen while exiting or entering user space and idle. Some of these functions are called after RCU had already stopped watching, as RCU does not watch userspace or idle CPUs. If a max stack is hit, then the save_stack_trace() is called, which will check module addresses and call module_assert_mutex_or_preempt(), and then trigger the warning. Sad part is, the warning itself will also do a stack trace and trigger the same warning. That probably should be fixed. The warning was added by 0be964be0d45 "module: Sanitize RCU usage and locking" but this bug has probably been around longer. The bug itself is unlikely to cause much harm, but the new warning causes the system to lock up. Cc: stable@vger.kernel.org # 4.2+ Cc: Peter Zijlstra Cc:"Paul E.
McKenney" Signed-off-by: Steven Rostedt --- kernel/trace/trace_stack.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index b746399ab59c..5f29402bff0f 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -88,6 +88,12 @@ check_stack(unsigned long ip, unsigned long *stack) local_irq_save(flags); arch_spin_lock(&max_stack_lock); + /* + * RCU may not be watching, make it see us. + * The stack trace code uses rcu_sched. + */ + rcu_irq_enter(); + /* In case another CPU set the tracer_frame on us */ if (unlikely(!frame_size)) this_size -= tracer_frame; @@ -169,6 +175,7 @@ check_stack(unsigned long ip, unsigned long *stack) } out: + rcu_irq_exit(); arch_spin_unlock(&max_stack_lock); local_irq_restore(flags); } -- cgit v1.2.3 From 437f9963bc4fd75889c1fe9289a92dea9124a439 Mon Sep 17 00:00:00 2001 From: Pavel Fedin Date: Fri, 25 Sep 2015 17:00:29 +0300 Subject: KVM: arm/arm64: Do not inject spurious interrupts When lowering a level-triggered line from userspace, we forgot to lower the pending bit on the emulated CPU interface and we also did not re-compute the pending_on_cpu bitmap for the CPU affected by the change. Update vgic_update_irq_pending() to fix the two issues above and also raise a warning in vgic_queue_irq_to_lr() if we encounter an interrupt pending on a CPU which is neither marked active nor pending.
[ Commit text reworked completely - Christoffer ] Signed-off-by: Pavel Fedin Signed-off-by: Christoffer Dall --- virt/kvm/arm/vgic.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 6bd1c9bf7ae7..596455a394af 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1132,7 +1132,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state); vgic_irq_clear_active(vcpu, irq); vgic_update_state(vcpu->kvm); - } else if (vgic_dist_irq_is_pending(vcpu, irq)) { + } else { + WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq)); vlr.state |= LR_STATE_PENDING; kvm_debug("Set pending: 0x%x\n", vlr.state); } @@ -1607,8 +1608,12 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid, } else { if (level_triggered) { vgic_dist_irq_clear_level(vcpu, irq_num); - if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) + if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) { vgic_dist_irq_clear_pending(vcpu, irq_num); + vgic_cpu_irq_clear(vcpu, irq_num); + if (!compute_pending_for_cpu(vcpu)) + clear_bit(cpuid, dist->irq_pending_on_cpu); + } } ret = false; -- cgit v1.2.3 From 399ea0f6bcd318af94ec8e4ffe96703ed674f22e Mon Sep 17 00:00:00 2001 From: Pavel Fedin Date: Tue, 6 Oct 2015 11:14:35 +0300 Subject: KVM: arm/arm64: Fix memory leak if timer initialization fails Jump to correct label and free kvm_host_cpu_state Reviewed-by: Wei Huang Signed-off-by: Pavel Fedin Signed-off-by: Christoffer Dall --- arch/arm/kvm/arm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index dc017adfddc8..78b286994577 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -1080,7 +1080,7 @@ static int init_hyp_mode(void) */ err = kvm_timer_hyp_init(); if (err) - goto out_free_mappings; + goto out_free_context; #ifndef CONFIG_HOTPLUG_CPU free_boot_hyp_pgd(); -- cgit v1.2.3 From 
4a5d69b73948d0e03cd38d77dc11edb2e707165f Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 12 Oct 2015 15:22:31 +0200 Subject: KVM: arm: use GIC support unconditionally The vgic code on ARM is built for all configurations that enable KVM, but the parent_data field that it references is only present when CONFIG_IRQ_DOMAIN_HIERARCHY is set: virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq': virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 'parent_data' This flag is implied by the GIC driver, and indeed the VGIC code only makes sense if a GIC is present. This changes the CONFIG_KVM symbol to always select GIC, which avoids the issue. Fixes: 662d9715840 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") Signed-off-by: Arnd Bergmann Acked-by: Marc Zyngier Signed-off-by: Christoffer Dall --- arch/arm/kvm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 210eccadb69a..356970f3b25e 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM depends on MMU && OF select PREEMPT_NOTIFIERS select ANON_INODES + select ARM_GIC select HAVE_KVM_CPU_RELAX_INTERCEPT select HAVE_KVM_ARCH_TLB_FLUSH_ALL select KVM_MMIO -- cgit v1.2.3 From cff9211eb1a1f58ce7f5a2d596b617928fd4be0e Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Fri, 16 Oct 2015 12:41:21 +0200 Subject: arm/arm64: KVM: Fix arch timer behavior for disabled interrupts We have an interesting issue when the guest disables the timer interrupt on the VGIC, which happens when turning VCPUs off using PSCI, for example. The problem is that because the guest disables the virtual interrupt at the VGIC level, we never inject interrupts to the guest and therefore never mark the interrupt as active on the physical distributor. 
The host also never takes the timer interrupt (we only use the timer device to trigger a guest exit and everything else is done in software), so the interrupt does not become active through normal means. The result is that we keep entering the guest with a programmed timer that will always fire as soon as we context switch the hardware timer state and run the guest, preventing forward progress for the VCPU. Since the active state on the physical distributor is really part of the timer logic, it is the job of our virtual arch timer driver to manage this state. The timer->map->active boolean field indicates whether we have signalled this interrupt to the vgic and if that interrupt is still pending or active. As long as that is the case, the hardware doesn't have to generate physical interrupts and therefore we mark the interrupt as active on the physical distributor. We also have to restore the pending state of an interrupt that was queued to an LR but was retired from the LR for some reason, while remaining pending in the LR. 
Cc: Marc Zyngier Reported-by: Lorenzo Pieralisi Signed-off-by: Christoffer Dall --- virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++ virt/kvm/arm/vgic.c | 43 +++++++++++-------------------------------- 2 files changed, 30 insertions(+), 32 deletions(-) diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 48c6e1ac6827..b9d3a32cbc04 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu) void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu) { struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; + bool phys_active; + int ret; /* * We're about to run this vcpu again, so there is no need to @@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu) */ if (kvm_timer_should_fire(vcpu)) kvm_timer_inject_irq(vcpu); + + /* + * We keep track of whether the edge-triggered interrupt has been + * signalled to the vgic/guest, and if so, we mask the interrupt and + * the physical distributor to prevent the timer from raising a + * physical interrupt whenever we run a guest, preventing forward + * VCPU progress. + */ + if (kvm_vgic_get_phys_irq_active(timer->map)) + phys_active = true; + else + phys_active = false; + + ret = irq_set_irqchip_state(timer->map->irq, + IRQCHIP_STATE_ACTIVE, + phys_active); + WARN_ON(ret); } /** diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 596455a394af..ea21bc273542 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1092,6 +1092,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu) struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr); + /* + * We must transfer the pending state back to the distributor before + * retiring the LR, otherwise we may loose edge-triggered interrupts. 
+ */ + if (vlr.state & LR_STATE_PENDING) { + vgic_dist_irq_set_pending(vcpu, irq); + vlr.hwirq = 0; + } + vlr.state = 0; vgic_set_lr(vcpu, lr_nr, vlr); clear_bit(lr_nr, vgic_cpu->lr_used); @@ -1241,7 +1250,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu) struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; struct vgic_dist *dist = &vcpu->kvm->arch.vgic; unsigned long *pa_percpu, *pa_shared; - int i, vcpu_id, lr, ret; + int i, vcpu_id; int overflow = 0; int nr_shared = vgic_nr_shared_irqs(dist); @@ -1296,31 +1305,6 @@ epilog: */ clear_bit(vcpu_id, dist->irq_pending_on_cpu); } - - for (lr = 0; lr < vgic->nr_lr; lr++) { - struct vgic_lr vlr; - - if (!test_bit(lr, vgic_cpu->lr_used)) - continue; - - vlr = vgic_get_lr(vcpu, lr); - - /* - * If we have a mapping, and the virtual interrupt is - * presented to the guest (as pending or active), then we must - * set the state to active in the physical world. See - * Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt. - */ - if (vlr.state & LR_HW) { - struct irq_phys_map *map; - map = vgic_irq_map_search(vcpu, vlr.irq); - - ret = irq_set_irqchip_state(map->irq, - IRQCHIP_STATE_ACTIVE, - true); - WARN_ON(ret); - } - } } static bool vgic_process_maintenance(struct kvm_vcpu *vcpu) @@ -1430,13 +1414,8 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr) WARN_ON(ret); - if (map->active) { - ret = irq_set_irqchip_state(map->irq, - IRQCHIP_STATE_ACTIVE, - false); - WARN_ON(ret); + if (map->active) return 0; - } return 1; } -- cgit v1.2.3 From 544c572e03174438b6656ed24a4516b9a9d5f14a Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Sat, 17 Oct 2015 17:55:12 +0200 Subject: arm/arm64: KVM: Clear map->active on pend/active clear When a guest reboots or offlines/onlines CPUs, it is not uncommon for it to clear the pending and active states of an interrupt through the emulated VGIC distributor. 
However, since the architected timers are defined by the architecture to be level triggered and the guest rightfully expects them to be that, but we emulate them as edge-triggered, we have to mimic level-triggered behavior for an edge-triggered virtual implementation. We currently do not signal the VGIC when the map->active field is true, because it indicates that the guest has already been signalled of the interrupt as required. Normally this field is set to false when the guest deactivates the virtual interrupt through the sync path. We also need to catch the case where the guest deactivates the interrupt through the emulated distributor, again allowing guests to boot even if the original virtual timer signal hit before the guest's GIC initialization sequence is run. Reviewed-by: Eric Auger Signed-off-by: Christoffer Dall --- virt/kvm/arm/vgic.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index ea21bc273542..58b125676785 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -531,6 +531,34 @@ bool vgic_handle_set_pending_reg(struct kvm *kvm, return false; } +/* + * If a mapped interrupt's state has been modified by the guest such that it + * is no longer active or pending, without it have gone through the sync path, + * then the map->active field must be cleared so the interrupt can be taken + * again. 
+ */ +static void vgic_handle_clear_mapped_irq(struct kvm_vcpu *vcpu) +{ + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + struct list_head *root; + struct irq_phys_map_entry *entry; + struct irq_phys_map *map; + + rcu_read_lock(); + + /* Check for PPIs */ + root = &vgic_cpu->irq_phys_map_list; + list_for_each_entry_rcu(entry, root, entry) { + map = &entry->map; + + if (!vgic_dist_irq_is_pending(vcpu, map->virt_irq) && + !vgic_irq_is_active(vcpu, map->virt_irq)) + map->active = false; + } + + rcu_read_unlock(); +} + bool vgic_handle_clear_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio, phys_addr_t offset, int vcpu_id) @@ -561,6 +589,7 @@ bool vgic_handle_clear_pending_reg(struct kvm *kvm, vcpu_id, offset); vgic_reg_access(mmio, reg, offset, mode); + vgic_handle_clear_mapped_irq(kvm_get_vcpu(kvm, vcpu_id)); vgic_update_state(kvm); return true; } @@ -598,6 +627,7 @@ bool vgic_handle_clear_active_reg(struct kvm *kvm, ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT); if (mmio->is_write) { + vgic_handle_clear_mapped_irq(kvm_get_vcpu(kvm, vcpu_id)); vgic_update_state(kvm); return true; } @@ -1406,7 +1436,7 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr) return 0; map = vgic_irq_map_search(vcpu, vlr.irq); - BUG_ON(!map || !map->active); + BUG_ON(!map); ret = irq_get_irqchip_state(map->irq, IRQCHIP_STATE_ACTIVE, -- cgit v1.2.3 From 0d997491f814c87310a6ad7be30a9049c7150489 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Sat, 17 Oct 2015 19:05:27 +0200 Subject: arm/arm64: KVM: Fix disabled distributor operation We currently do a single update of the vgic state when the distributor enable/disable control register is accessed and then bypass updating the state for as long as the distributor remains disabled. 
This is incorrect, because updating the state does not consider the distributor enable bit, and thus you can end up in a situation where an interrupt is marked as pending on the CPU interface, but not pending on the distributor, which is an impossible state to be in, and triggers a warning. Consider for example the following sequence of events: 1. An interrupt is marked as pending on the distributor - the interrupt is also forwarded to the CPU interface 2. The guest turns off the distributor (it's about to do a reboot) - we stop updating the CPU interface state from now on 3. The guest disables the pending interrupt - we remove the pending state from the distributor, but don't touch the CPU interface, see point 2. Since the distributor disable bit really means that no interrupts should be forwarded to the CPU interface, we modify the code to keep updating the internal VGIC state, but always set the CPU interface pending bits to zero when the distributor is disabled. Signed-off-by: Christoffer Dall --- virt/kvm/arm/vgic.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 58b125676785..66c66165e712 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1012,6 +1012,12 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu) pend_percpu = vcpu->arch.vgic_cpu.pending_percpu; pend_shared = vcpu->arch.vgic_cpu.pending_shared; + if (!dist->enabled) { + bitmap_zero(pend_percpu, VGIC_NR_PRIVATE_IRQS); + bitmap_zero(pend_shared, nr_shared); + return 0; + } + pending = vgic_bitmap_get_cpu_map(&dist->irq_pending, vcpu_id); enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id); bitmap_and(pend_percpu, pending, enabled, VGIC_NR_PRIVATE_IRQS); @@ -1039,11 +1045,6 @@ void vgic_update_state(struct kvm *kvm) struct kvm_vcpu *vcpu; int c; - if (!dist->enabled) { - set_bit(0, dist->irq_pending_on_cpu); - return; - } - kvm_for_each_vcpu(c, vcpu, kvm) { if (compute_pending_for_cpu(vcpu))
set_bit(c, dist->irq_pending_on_cpu); -- cgit v1.2.3 From 625faa6a720d26fc0db9e20b48dc0dfe4c8d8ddf Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 20 Oct 2015 11:49:44 +0100 Subject: clkdev: fix clk_add_alias() with a NULL alias device name clk_add_alias() was not correctly handling the case where alias_dev_name was NULL: rather than producing an entry with a NULL dev_id pointer, it would produce a device name of (null). Fix this. Cc: Fixes: 2568999835d7 ("clkdev: add clkdev_create() helper") Reported-by: Aaro Koskinen Tested-by: Aaro Koskinen Signed-off-by: Russell King --- drivers/clk/clkdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c index c0eaf0973bd2..779b6ff0c7ad 100644 --- a/drivers/clk/clkdev.c +++ b/drivers/clk/clkdev.c @@ -333,7 +333,8 @@ int clk_add_alias(const char *alias, const char *alias_dev_name, if (IS_ERR(r)) return PTR_ERR(r); - l = clkdev_create(r, alias, "%s", alias_dev_name); + l = clkdev_create(r, alias, alias_dev_name ? "%s" : NULL, + alias_dev_name); clk_put(r); return l ? 0 : -ENODEV; -- cgit v1.2.3 From 3909642034ffd7a8906ff3f2b2a71455bf39e7f6 Mon Sep 17 00:00:00 2001 From: Matan Barak Date: Thu, 15 Oct 2015 15:01:03 +0300 Subject: IB/core: Fix use after free of ifa When using ifup/ifdown while executing enum_netdev_ipv4_ips, ifa could become invalid and cause use after free error. Fixing it by protecting with RCU lock. 
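For illustration only, the copy-under-lock pattern this fix relies on can be sketched in user space roughly as follows. This is a minimal sketch, not the kernel code: read_lock()/read_unlock() are hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(), and struct addr_node, snapshot_addrs() and sum_snapshot() are made-up names. The idea is to copy what is needed while the list is protected, then do the slow work on the private copy after the lock is dropped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(). They only
 * count nesting here; a real program would use a real lock or an RCU
 * library.
 */
static int read_lock_depth;
static void read_lock(void)   { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

/* Made-up shared list of addresses that another context may modify. */
struct addr_node {
	struct addr_node *next;
	uint32_t addr;
};

/*
 * Phase 1: walk the shared list with the read lock held and copy every
 * address into a private array. Returns the number of entries copied, or
 * -1 if the allocation failed. On success the caller owns *out.
 * (Kernel code would need a non-sleeping allocation such as GFP_ATOMIC
 * inside an RCU read-side section; malloc() is used here for brevity.)
 */
static int snapshot_addrs(const struct addr_node *head, uint32_t **out)
{
	const struct addr_node *p;
	uint32_t *copy;
	size_t n = 0, i = 0;

	read_lock();
	for (p = head; p; p = p->next)
		n++;
	copy = malloc((n ? n : 1) * sizeof(*copy));
	if (!copy) {
		read_unlock();
		return -1;
	}
	for (p = head; p; p = p->next)
		copy[i++] = p->addr;
	read_unlock();

	*out = copy;
	return (int)n;
}

/*
 * Phase 2: work on the private copy with no lock held, so this step may
 * block without touching list nodes that could already have been freed.
 */
static uint64_t sum_snapshot(const uint32_t *addrs, int n)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += addrs[i];
	return sum;
}
```

The trade-off is an extra allocation per entry in exchange for never dereferencing a list node outside the protected section.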
Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Matan Barak Signed-off-by: Doug Ledford --- drivers/infiniband/core/roce_gid_mgmt.c | 35 +++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c index 6b24cba1e474..178f98482e13 100644 --- a/drivers/infiniband/core/roce_gid_mgmt.c +++ b/drivers/infiniband/core/roce_gid_mgmt.c @@ -250,25 +250,44 @@ static void enum_netdev_ipv4_ips(struct ib_device *ib_dev, u8 port, struct net_device *ndev) { struct in_device *in_dev; + struct sin_list { + struct list_head list; + struct sockaddr_in ip; + }; + struct sin_list *sin_iter; + struct sin_list *sin_temp; + LIST_HEAD(sin_list); if (ndev->reg_state >= NETREG_UNREGISTERING) return; - in_dev = in_dev_get(ndev); - if (!in_dev) + rcu_read_lock(); + in_dev = __in_dev_get_rcu(ndev); + if (!in_dev) { + rcu_read_unlock(); return; + } for_ifa(in_dev) { - struct sockaddr_in ip; + struct sin_list *entry = kzalloc(sizeof(*entry), GFP_ATOMIC); - ip.sin_family = AF_INET; - ip.sin_addr.s_addr = ifa->ifa_address; - update_gid_ip(GID_ADD, ib_dev, port, ndev, - (struct sockaddr *)&ip); + if (!entry) { + pr_warn("roce_gid_mgmt: couldn't allocate entry for IPv4 update\n"); + continue; + } + entry->ip.sin_family = AF_INET; + entry->ip.sin_addr.s_addr = ifa->ifa_address; + list_add_tail(&entry->list, &sin_list); } endfor_ifa(in_dev); + rcu_read_unlock(); - in_dev_put(in_dev); + list_for_each_entry_safe(sin_iter, sin_temp, &sin_list, list) { + update_gid_ip(GID_ADD, ib_dev, port, ndev, + (struct sockaddr *)&sin_iter->ip); + list_del(&sin_iter->list); + kfree(sin_iter); + } } static void enum_netdev_ipv6_ips(struct ib_device *ib_dev, -- cgit v1.2.3 From b3b51f9f6f5d91cd16afaed0c22df2c56ed5f92e Mon Sep 17 00:00:00 2001 From: Haggai Eran Date: Mon, 21 Sep 2015 16:02:02 +0300 Subject: IB/cma: Potential NULL dereference in cma_id_from_event If the lookup 
of a listening ID failed for an AF_IB request, the code would try to call dev_put() on a NULL net_dev. Fixes: be688195bd08 ("IB/cma: Fix net_dev reference leak with failed requests") Reported-by: Dan Carpenter Signed-off-by: Haggai Eran Signed-off-by: Doug Ledford --- drivers/infiniband/core/cma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 59a2dafc8c57..f163ac680841 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1324,7 +1324,7 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id, bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id), cma_port_from_service_id(req.service_id)); id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev); - if (IS_ERR(id_priv)) { + if (IS_ERR(id_priv) && *net_dev) { dev_put(*net_dev); *net_dev = NULL; } -- cgit v1.2.3 From 0174b381caf89443d92c6fe75f725f2bfeba96b6 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Thu, 17 Sep 2015 16:04:19 -0400 Subject: IB/ucma: check workqueue allocation before usage Allocating a workqueue might fail, which wasn't checked so far and would lead to NULL ptr derefs when an attempt to use it was made. 
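As a rough user-space illustration of the check-before-use ordering (a sketch under assumptions, not the ucma code: struct session, struct worker, worker_create() and the fail_worker_alloc switch are hypothetical names), the fallible resource is allocated first and the partially built object is freed on failure:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue. */
struct worker {
	int id;
};

/* Hypothetical stand-in for the per-open file context. */
struct session {
	struct worker *close_worker;
	int ready;
};

/* Test hook so the failure path can be exercised deterministically. */
static int fail_worker_alloc;

static struct worker *worker_create(void)
{
	if (fail_worker_alloc)
		return NULL;
	return calloc(1, sizeof(struct worker));
}

/*
 * Allocate the session, then the worker. If the worker cannot be
 * created, free the session and report -ENOMEM instead of leaving a
 * NULL pointer behind for later code to dereference.
 */
static int session_open(struct session **out)
{
	struct session *s = calloc(1, sizeof(*s));

	if (!s)
		return -ENOMEM;

	s->close_worker = worker_create();
	if (!s->close_worker) {
		free(s);
		return -ENOMEM;
	}

	s->ready = 1;
	*out = s;
	return 0;
}
```

Doing the fallible allocation before the rest of the initialization keeps the error path down to a single free().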
Signed-off-by: Sasha Levin Signed-off-by: Doug Ledford --- drivers/infiniband/core/ucma.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index a53fc9b01c69..30467d10df91 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -1624,11 +1624,16 @@ static int ucma_open(struct inode *inode, struct file *filp) if (!file) return -ENOMEM; + file->close_wq = create_singlethread_workqueue("ucma_close_id"); + if (!file->close_wq) { + kfree(file); + return -ENOMEM; + } + INIT_LIST_HEAD(&file->event_list); INIT_LIST_HEAD(&file->ctx_list); init_waitqueue_head(&file->poll_wait); mutex_init(&file->mut); - file->close_wq = create_singlethread_workqueue("ucma_close_id"); filp->private_data = file; file->filp = filp; -- cgit v1.2.3 From ab3964ad2acfbb0dc5414d4c86fa6d8d690f27a1 Mon Sep 17 00:00:00 2001 From: Haggai Eran Date: Tue, 20 Oct 2015 09:53:01 +0300 Subject: IB/cma: Use inner P_Key to determine netdev When discussing the patches to demux ids in rdma_cm instead of ib_cm, it was decided that it is best to use the P_Key value in the packet headers. However, the mlx5 and ipath drivers are currently unable to send correct P_Key values in GMP headers. They always send using a single P_Key that is set during the GSI QP initialization. Change the rdma_cm code to look at the P_Key value that is part of the packet payload as a workaround. Once the drivers are fixed this patch can be reverted. 
Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Haggai Eran Signed-off-by: Doug Ledford --- drivers/infiniband/core/cma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f163ac680841..36b12d560e17 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event, sizeof(req->local_gid)); req->has_gid = true; req->service_id = req_param->primary_path->service_id; - req->pkey = req_param->bth_pkey; + req->pkey = be16_to_cpu(req_param->primary_path->pkey); break; case IB_CM_SIDR_REQ_RECEIVED: req->device = sidr_param->listen_id->device; req->port = sidr_param->port; req->has_gid = false; req->service_id = sidr_param->service_id; - req->pkey = sidr_param->bth_pkey; + req->pkey = sidr_param->pkey; break; default: return -EINVAL; -- cgit v1.2.3 From 203d27b0226a05202438ddb39ef0ef1acb14a759 Mon Sep 17 00:00:00 2001 From: Jes Sorensen Date: Tue, 20 Oct 2015 12:09:12 -0400 Subject: md/raid1: submit_bio_wait() returns 0 on success This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b which changed the return value of submit_bio_wait() to return != 0 on error, but didn't update the caller accordingly. 
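The return convention at issue can be sketched in plain C as below. This is a minimal sketch, assuming a helper that returns 0 on success and a negative errno on failure; do_write() and write_or_mark_bad() are hypothetical names, not the md or block layer API. The caller must treat a negative value, not zero, as the failure case.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a synchronous write helper: 0 on success,
 * negative errno on failure.
 */
static int do_write(bool device_ok)
{
	return device_ok ? 0 : -EIO;
}

/*
 * Returns true if the write failed and the block was recorded as bad.
 * Testing "== 0" here would invert the logic and mark every successful
 * write as a bad block.
 */
static bool write_or_mark_bad(bool device_ok, int *bad_blocks)
{
	if (do_write(device_ok) < 0) {
		/* failure! */
		(*bad_blocks)++;
		return true;
	}
	return false;
}
```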
Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md") Cc: stable@vger.kernel.org (v3.10) Reported-by: Bill Kuzeja Signed-off-by: Jes Sorensen Signed-off-by: NeilBrown --- drivers/md/raid1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index ddd8a5f572aa..cfca6edf7813 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2195,7 +2195,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i) bio_trim(wbio, sector - r1_bio->sector, sectors); wbio->bi_iter.bi_sector += rdev->data_offset; wbio->bi_bdev = rdev->bdev; - if (submit_bio_wait(WRITE, wbio) == 0) + if (submit_bio_wait(WRITE, wbio) < 0) /* failure! */ ok = rdev_set_badblocks(rdev, sector, sectors, 0) -- cgit v1.2.3 From 681ab4696062f5aa939c9e04d058732306a97176 Mon Sep 17 00:00:00 2001 From: Jes Sorensen Date: Tue, 20 Oct 2015 12:09:13 -0400 Subject: md/raid10: submit_bio_wait() returns 0 on success This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b which changed the return value of submit_bio_wait() to return != 0 on error, but didn't update the caller accordingly. Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md") Cc: stable@vger.kernel.org (v3.10) Reported-by: Bill Kuzeja Signed-off-by: Jes Sorensen Signed-off-by: NeilBrown --- drivers/md/raid10.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 9f69dc526f8c..a9ecec4e9a13 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2467,7 +2467,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i) choose_data_offset(r10_bio, rdev) + (sector - r10_bio->sector)); wbio->bi_bdev = rdev->bdev; - if (submit_bio_wait(WRITE, wbio) == 0) + if (submit_bio_wait(WRITE, wbio) < 0) /* Failure! 
*/ ok = rdev_set_badblocks(rdev, sector, sectors, 0) -- cgit v1.2.3 From 1904be1b6bb92058c8e00063dd59df2df294e258 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Tue, 20 Oct 2015 21:48:02 -0400 Subject: tracing: Do not allow stack_tracer to record stack in NMI The code in stack tracer should not be executed within an NMI as it grabs spinlocks and stack tracing an NMI gives the possibility of causing a deadlock. Although this is safe on x86_64, because it does not perform stack traces when the task struct stack is not in use (interrupts and NMIs), it may be an issue for NMIs on i386 and other archs that use the same stack as the NMI. Signed-off-by: Steven Rostedt --- kernel/trace/trace_stack.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c index 5f29402bff0f..8abf1ba18085 100644 --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -85,6 +85,10 @@ check_stack(unsigned long ip, unsigned long *stack) if (!object_is_on_stack(stack)) return; + /* Can't do this from NMI context (can cause deadlocks) */ + if (in_nmi()) + return; + local_irq_save(flags); arch_spin_lock(&max_stack_lock); -- cgit v1.2.3 From 0f6925fa2907df58496cabc33fa4677c635e2223 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 14 Oct 2015 15:26:13 +0800 Subject: btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Current code will always truncate tailing page if its alloc_start is smaller than inode size. For example, the file extent layout is like: 0 4K 8K 16K 32K |<-----Extent A---------------->| |<--Inode size: 18K---------->| But if calling fallocate even for range [0,4K), it will cause btrfs to re-truncate the range [16,32K), causing COW and a new extent. 0 4K 8K 16K 32K |///////| <- Fallocate call range |<-----Extent A-------->|<--B-->| The cause is quite easy, just a careless btrfs_truncate_inode() in a else branch without extra judgment. 
Fix it by adding a check on whether the fallocate range extends beyond isize. Signed-off-by: Qu Wenruo Signed-off-by: Chris Mason --- fs/btrfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index b823fac91c92..8c6f247ba81d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2584,7 +2584,7 @@ static long btrfs_fallocate(struct file *file, int mode, alloc_start); if (ret) goto out; - } else { + } else if (offset + len > inode->i_size) { /* * If we are fallocating from the end of the file onward we * need to zero out the end of the page if i_size lands in the -- cgit v1.2.3 From 08b137d90eec51b0e90c42e123ca8ceb118d233f Mon Sep 17 00:00:00 2001 From: Chaotian Jing Date: Mon, 12 Oct 2015 17:22:23 +0800 Subject: mmc: core: Fix init_card in 52Mhz Suppose that we get a data CRC error and it triggers mmc_reset(). mmc_reset() will call mmc_send_status() to see if the HW reset was supported. Before issuing CMD13, it will retune, and if the eMMC was in HS400 mode, it will first reduce the frequency to 52MHz, which results in card init being done at 52MHz. The mmc_send_status() call was originally only done for mmc_test, so drop it. Also rename the "eMMC hardware reset" test to "Reset test", as we would also be able to use the test for SD cards. Signed-off-by: Chaotian Jing Suggested-by: Adrian Hunter Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors") Signed-off-by: Ulf Hansson --- drivers/mmc/card/mmc_test.c | 9 +++------ drivers/mmc/core/mmc.c | 7 ------- 2 files changed, 3 insertions(+), 13 deletions(-) diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c index b78cf5d403a3..7fc9174d4619 100644 --- a/drivers/mmc/card/mmc_test.c +++ b/drivers/mmc/card/mmc_test.c @@ -2263,15 +2263,12 @@ static int mmc_test_profile_sglen_r_nonblock_perf(struct mmc_test_card *test) /* * eMMC hardware reset.
*/ -static int mmc_test_hw_reset(struct mmc_test_card *test) +static int mmc_test_reset(struct mmc_test_card *test) { struct mmc_card *card = test->card; struct mmc_host *host = card->host; int err; - if (!mmc_card_mmc(card) || !mmc_can_reset(card)) - return RESULT_UNSUP_CARD; - err = mmc_hw_reset(host); if (!err) return RESULT_OK; @@ -2605,8 +2602,8 @@ static const struct mmc_test_case mmc_test_cases[] = { }, { - .name = "eMMC hardware reset", - .run = mmc_test_hw_reset, + .name = "Reset test", + .run = mmc_test_reset, }, }; diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index e726903170a8..f6cd995dbe92 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1924,7 +1924,6 @@ EXPORT_SYMBOL(mmc_can_reset); static int mmc_reset(struct mmc_host *host) { struct mmc_card *card = host->card; - u32 status; if (!(host->caps & MMC_CAP_HW_RESET) || !host->ops->hw_reset) return -EOPNOTSUPP; @@ -1937,12 +1936,6 @@ static int mmc_reset(struct mmc_host *host) host->ops->hw_reset(host); - /* If the reset has happened, then a status command will fail */ - if (!mmc_send_status(card, &status)) { - mmc_host_clk_release(host); - return -ENOSYS; - } - /* Set initial state and call mmc_set_ios */ mmc_set_initial_state(host); mmc_host_clk_release(host); -- cgit v1.2.3 From cbf3ccd09d683abf1cacd36e3640872ee912d99b Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 20 Oct 2015 14:59:36 +0200 Subject: iommu/amd: Don't clear DTE flags when modifying it During device assignment/deassignment the flags in the DTE get lost, which might cause spurious faults, for example when the device tries to access the system management range. Fix this by not clearing the flags with the rest of the DTE. Reported-by: G. Richard Bellamy Tested-by: G. 
Richard Bellamy Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 4 ++-- drivers/iommu/amd_iommu_types.h | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 08d2775887f7..532e2a211fe1 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1974,8 +1974,8 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats) static void clear_dte_entry(u16 devid) { /* remove entry from the device table seen by the hardware */ - amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV; - amd_iommu_dev_table[devid].data[1] = 0; + amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV; + amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK; amd_iommu_apply_erratum_63(devid); } diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index f65908841be0..c9b64722f623 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -295,6 +295,7 @@ #define IOMMU_PTE_IR (1ULL << 61) #define IOMMU_PTE_IW (1ULL << 62) +#define DTE_FLAG_MASK (0x3ffULL << 32) #define DTE_FLAG_IOTLB (0x01UL << 32) #define DTE_FLAG_GV (0x01ULL << 55) #define DTE_GLX_SHIFT (56) -- cgit v1.2.3 From 23316316c1af0677a041c81f3ad6efb9dc470b33 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Wed, 21 Oct 2015 16:03:14 +1100 Subject: powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" This reverts commit 9678cdaae939 ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8") because the original commit had multiple, partly self-cancelling bugs, that could cause occasional memory corruption. In fact the logmpp instruction was incorrectly using register r0 as the source of the buffer address and operation code, and depending on what was in r0, it would either do nothing or corrupt the 64k page pointed to by r0. 
The logmpp instruction encoding and the operation code definitions could be corrected, but then there is the problem that there is no clearly defined way to know when the hardware has finished writing to the buffer. The original commit attempted to work around this by aborting the write-out before starting the prefetch, but this is ineffective in the case where the virtual core is now executing on a different physical core from the one where the write-out was initiated. These problems plus advice from the hardware designers not to use the function (since the measured performance improvement from using the feature was actually mostly negative), mean that reverting the code is the best option. Fixes: 9678cdaae939 ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8") Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/cache.h | 7 ----- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/include/asm/ppc-opcode.h | 17 ----------- arch/powerpc/include/asm/reg.h | 1 - arch/powerpc/kvm/book3s_hv.c | 55 +---------------------------------- 5 files changed, 1 insertion(+), 81 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index 0dc42c5082b7..5f8229e24fe6 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -3,7 +3,6 @@ #ifdef __KERNEL__ -#include /* bytes per L1 cache line */ #if defined(CONFIG_8xx) || defined(CONFIG_403GCX) @@ -40,12 +39,6 @@ struct ppc64_caches { }; extern struct ppc64_caches ppc64_caches; - -static inline void logmpp(u64 x) -{ - asm volatile(PPC_LOGMPP(R1) : : "r" (x)); -} - #endif /* __powerpc64__ && ! 
__ASSEMBLY__ */ #if defined(__ASSEMBLY__) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 827a38d7a9db..887c259556df 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -297,8 +297,6 @@ struct kvmppc_vcore { u32 arch_compat; ulong pcr; ulong dpdes; /* doorbell state (POWER8) */ - void *mpp_buffer; /* Micro Partition Prefetch buffer */ - bool mpp_buffer_is_valid; ulong conferring_threads; }; diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 790f5d1d9a46..7ab04fc59e24 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -141,7 +141,6 @@ #define PPC_INST_ISEL 0x7c00001e #define PPC_INST_ISEL_MASK 0xfc00003e #define PPC_INST_LDARX 0x7c0000a8 -#define PPC_INST_LOGMPP 0x7c0007e4 #define PPC_INST_LSWI 0x7c0004aa #define PPC_INST_LSWX 0x7c00042a #define PPC_INST_LWARX 0x7c000028 @@ -285,20 +284,6 @@ #define __PPC_EH(eh) 0 #endif -/* POWER8 Micro Partition Prefetch (MPP) parameters */ -/* Address mask is common for LOGMPP instruction and MPPR SPR */ -#define PPC_MPPE_ADDRESS_MASK 0xffffffffc000ULL - -/* Bits 60 and 61 of MPP SPR should be set to one of the following */ -/* Aborting the fetch is indeed setting 00 in the table size bits */ -#define PPC_MPPR_FETCH_ABORT (0x0ULL << 60) -#define PPC_MPPR_FETCH_WHOLE_TABLE (0x2ULL << 60) - -/* Bits 54 and 55 of register for LOGMPP instruction should be set to: */ -#define PPC_LOGMPP_LOG_L2 (0x02ULL << 54) -#define PPC_LOGMPP_LOG_L2L3 (0x01ULL << 54) -#define PPC_LOGMPP_LOG_ABORT (0x03ULL << 54) - /* Deal with instructions that older assemblers aren't aware of */ #define PPC_DCBAL(a, b) stringify_in_c(.long PPC_INST_DCBAL | \ __PPC_RA(a) | __PPC_RB(b)) @@ -307,8 +292,6 @@ #define PPC_LDARX(t, a, b, eh) stringify_in_c(.long PPC_INST_LDARX | \ ___PPC_RT(t) | ___PPC_RA(a) | \ ___PPC_RB(b) | __PPC_EH(eh)) -#define PPC_LOGMPP(b) stringify_in_c(.long 
PPC_INST_LOGMPP | \ - __PPC_RB(b)) #define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_INST_LWARX | \ ___PPC_RT(t) | ___PPC_RA(a) | \ ___PPC_RB(b) | __PPC_EH(eh)) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index aa1cc5f015ee..a908ada8e0a5 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -226,7 +226,6 @@ #define CTRL_TE 0x00c00000 /* thread enable */ #define CTRL_RUNLATCH 0x1 #define SPRN_DAWR 0xB4 -#define SPRN_MPPR 0xB8 /* Micro Partition Prefetch Register */ #define SPRN_RPR 0xBA /* Relative Priority Register */ #define SPRN_CIABR 0xBB #define CIABR_PRIV 0x3 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 228049786888..9c26c5a96ea2 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -36,7 +36,6 @@ #include #include -#include #include #include #include @@ -75,12 +74,6 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1); -#if defined(CONFIG_PPC_64K_PAGES) -#define MPP_BUFFER_ORDER 0 -#elif defined(CONFIG_PPC_4K_PAGES) -#define MPP_BUFFER_ORDER 3 -#endif - static int dynamic_mt_modes = 6; module_param(dynamic_mt_modes, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(dynamic_mt_modes, "Set of allowed dynamic micro-threading modes: 0 (= none), 2, 4, or 6 (= 2 or 4)"); @@ -1455,13 +1448,6 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core) vcore->kvm = kvm; INIT_LIST_HEAD(&vcore->preempt_list); - vcore->mpp_buffer_is_valid = false; - - if (cpu_has_feature(CPU_FTR_ARCH_207S)) - vcore->mpp_buffer = (void *)__get_free_pages( - GFP_KERNEL|__GFP_ZERO, - MPP_BUFFER_ORDER); - return vcore; } @@ -1894,33 +1880,6 @@ static int on_primary_thread(void) return 1; } -static void kvmppc_start_saving_l2_cache(struct kvmppc_vcore *vc) -{ - phys_addr_t phy_addr, mpp_addr; - - phy_addr = (phys_addr_t)virt_to_phys(vc->mpp_buffer); - mpp_addr = phy_addr & PPC_MPPE_ADDRESS_MASK; - - mtspr(SPRN_MPPR, mpp_addr | 
PPC_MPPR_FETCH_ABORT); - logmpp(mpp_addr | PPC_LOGMPP_LOG_L2); - - vc->mpp_buffer_is_valid = true; -} - -static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc) -{ - phys_addr_t phy_addr, mpp_addr; - - phy_addr = virt_to_phys(vc->mpp_buffer); - mpp_addr = phy_addr & PPC_MPPE_ADDRESS_MASK; - - /* We must abort any in-progress save operations to ensure - * the table is valid so that prefetch engine knows when to - * stop prefetching. */ - logmpp(mpp_addr | PPC_LOGMPP_LOG_ABORT); - mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE); -} - /* * A list of virtual cores for each physical CPU. * These are vcores that could run but their runner VCPU tasks are @@ -2471,14 +2430,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) srcu_idx = srcu_read_lock(&vc->kvm->srcu); - if (vc->mpp_buffer_is_valid) - kvmppc_start_restoring_l2_cache(vc); - __kvmppc_vcore_entry(); - if (vc->mpp_buffer) - kvmppc_start_saving_l2_cache(vc); - srcu_read_unlock(&vc->kvm->srcu, srcu_idx); spin_lock(&vc->lock); @@ -3073,14 +3026,8 @@ static void kvmppc_free_vcores(struct kvm *kvm) { long int i; - for (i = 0; i < KVM_MAX_VCORES; ++i) { - if (kvm->arch.vcores[i] && kvm->arch.vcores[i]->mpp_buffer) { - struct kvmppc_vcore *vc = kvm->arch.vcores[i]; - free_pages((unsigned long)vc->mpp_buffer, - MPP_BUFFER_ORDER); - } + for (i = 0; i < KVM_MAX_VCORES; ++i) kfree(kvm->arch.vcores[i]); - } kvm->arch.online_vcores = 0; } -- cgit v1.2.3 From 53c656c4138511c2ba54df413dc29976cfa9f084 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Wed, 21 Oct 2015 16:06:24 +1100 Subject: powerpc/powernv: Handle irq_happened flag correctly in off-line loop This fixes a bug where it is possible for an off-line CPU to fail to go into a low-power state (nap/sleep/winkle), and to become unresponsive to requests from the KVM subsystem to wake up and run a VCPU. 
What can happen is that a maskable interrupt of some kind (external, decrementer, hypervisor doorbell, or HMI) occurs after we have called local_irq_disable() at the beginning of pnv_smp_cpu_kill_self() and before interrupts are hard-disabled inside power7_nap/sleep/winkle(). In this situation, the pending event is marked in the irq_happened flag in the PACA. This pending event prevents power7_nap/sleep/winkle from going to the requested low-power state; instead they return immediately. We don't deal with any of these pending event flags in the off-line loop in pnv_smp_cpu_kill_self() because power7_nap et al. return 0 in this case, so we will have srr1 == 0, and none of the processing to clear interrupts or doorbells will be done. Usually, the most obvious symptom of this is that a KVM guest will fail with a console message saying "KVM: couldn't grab cpu N". This fixes the problem by making sure we handle the irq_happened flags properly. First, we hard-disable before the off-line loop. Once we have hard-disabled, the irq_happened flags can't change underneath us. We unconditionally clear the DEC and HMI flags: there is no processing of timer interrupts while off-line, and the necessary HMI processing is all done in lower-level code. We leave the EE and DBELL flags alone for the first iteration of the loop, so that we won't fail to respond to a split-core request that came in just before hard-disabling. Within the loop, we handle external interrupts if the EE bit is set in irq_happened as well as if the low-power state was interrupted by an external interrupt. (We don't need to do the msgclr for a pending doorbell in irq_happened, because doorbells are edge-triggered and don't remain pending in hardware.) Then we clear both the EE and DBELL flags, and once clear, they cannot be set again (until this CPU comes online again, that is).
This also fixes the debug check to not be done when we just ran a KVM guest or when the sleep didn't happen because of a pending event in irq_happened. Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/powernv/smp.c | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 8f70ba681a78..ca264833ee64 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -171,7 +171,26 @@ static void pnv_smp_cpu_kill_self(void) * so clear LPCR:PECE1. We keep PECE2 enabled. */ mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1); + + /* + * Hard-disable interrupts, and then clear irq_happened flags + * that we can safely ignore while off-line, since they + * are for things for which we do no processing when off-line + * (or in the case of HMI, all the processing we need to do + * is done in lower-level real-mode code). + */ + hard_irq_disable(); + local_paca->irq_happened &= ~(PACA_IRQ_DEC | PACA_IRQ_HMI); + while (!generic_check_cpu_restart(cpu)) { + /* + * Clear IPI flag, since we don't handle IPIs while + * offline, except for those when changing micro-threading + * mode, which are handled explicitly below, and those + * for coming online, which are handled via + * generic_check_cpu_restart() calls. + */ + kvmppc_set_host_ipi(cpu, 0); ppc64_runlatch_off(); @@ -196,20 +215,20 @@ static void pnv_smp_cpu_kill_self(void) * having finished executing in a KVM guest, then srr1 * contains 0. 
*/ - if ((srr1 & wmask) == SRR1_WAKEEE) { + if (((srr1 & wmask) == SRR1_WAKEEE) || + (local_paca->irq_happened & PACA_IRQ_EE)) { icp_native_flush_interrupt(); - local_paca->irq_happened &= PACA_IRQ_HARD_DIS; - smp_mb(); } else if ((srr1 & wmask) == SRR1_WAKEHDBELL) { unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); asm volatile(PPC_MSGCLR(%0) : : "r" (msg)); - kvmppc_set_host_ipi(cpu, 0); } + local_paca->irq_happened &= ~(PACA_IRQ_EE | PACA_IRQ_DBELL); + smp_mb(); if (cpu_core_split_required()) continue; - if (!generic_check_cpu_restart(cpu)) + if (srr1 && !generic_check_cpu_restart(cpu)) DBG("CPU%d Unexpected exit while offline !\n", cpu); } mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1); -- cgit v1.2.3 From f8f2dc4a7127725383c93b501fcc4e47871b0a9d Mon Sep 17 00:00:00 2001 From: Bard Liao Date: Wed, 21 Oct 2015 16:18:18 +0800 Subject: ASoC: rt298: fix wrong setting of gpio2_en The register value to enable gpio2 was incorrect. So fix it. Signed-off-by: Bard Liao Signed-off-by: Mark Brown --- sound/soc/codecs/rt298.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/rt298.c b/sound/soc/codecs/rt298.c index d3e30a645ae3..f823eb502367 100644 --- a/sound/soc/codecs/rt298.c +++ b/sound/soc/codecs/rt298.c @@ -1214,7 +1214,7 @@ static int rt298_i2c_probe(struct i2c_client *i2c, mdelay(10); if (!rt298->pdata.gpio2_en) - regmap_write(rt298->regmap, RT298_SET_DMIC2_DEFAULT, 0x4000); + regmap_write(rt298->regmap, RT298_SET_DMIC2_DEFAULT, 0x40); else regmap_write(rt298->regmap, RT298_SET_DMIC2_DEFAULT, 0); -- cgit v1.2.3 From e27c5b9d23168cc2cb8fec147ae7ed1f7a2005c3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 13 Oct 2015 18:14:19 -0400 Subject: writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy() a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") added rbtree_postorder_for_each_entry_safe() which is used to remove all entries; however, according to Cody, the 
iterator isn't safe against operations which may rebalance the tree. Fix it by switching to repeatedly removing rb_first() until empty. Signed-off-by: Tejun Heo Reported-by: Cody P Schafer Fixes: a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") Link: http://lkml.kernel.org/g/1443997973-1700-1-git-send-email-dev@codyps.com Signed-off-by: Jens Axboe --- mm/backing-dev.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 9e841399041a..619984fc07ec 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -681,7 +681,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi) static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { struct radix_tree_iter iter; - struct bdi_writeback_congested *congested, *congested_n; + struct rb_node *rbn; void **slot; WARN_ON(test_bit(WB_registered, &bdi->wb.state)); @@ -691,9 +691,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi) radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) cgwb_kill(*slot); - rbtree_postorder_for_each_entry_safe(congested, congested_n, - &bdi->cgwb_congested_tree, rb_node) { - rb_erase(&congested->rb_node, &bdi->cgwb_congested_tree); + while ((rbn = rb_first(&bdi->cgwb_congested_tree))) { + struct bdi_writeback_congested *congested = + rb_entry(rbn, struct bdi_writeback_congested, rb_node); + + rb_erase(rbn, &bdi->cgwb_congested_tree); congested->bdi = NULL; /* mark @congested unlinked */ } -- cgit v1.2.3 From 09dc1387c9c06cdaf55bc99b35238bd2ec0aed4b Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Wed, 21 Oct 2015 21:31:49 +0200 Subject: drm/vmwgfx: Stabilize the command buffer submission code This commit addresses some stability problems with the command buffer submission code recently introduced: 1) Make the vmw_cmdbuf_man_process() function handle reruns internally to avoid losing interrupts if the caller forgets to rerun on -EAGAIN. 
2) Handle default command buffer allocations using inline command buffers. This avoids rare allocation deadlocks. 3) In case of command buffer errors we might lose fence submissions. Therefore send a new fence after each command buffer error. This will help avoid lengthy fence waits. Signed-off-by: Thomas Hellstrom Reviewed-by: Sinclair Yeh --- drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c index 8a76821177a6..6377e8151000 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c @@ -415,16 +415,16 @@ static void vmw_cmdbuf_ctx_process(struct vmw_cmdbuf_man *man, * * Calls vmw_cmdbuf_ctx_process() on all contexts. If any context has * command buffers left that are not submitted to hardware, Make sure - * IRQ handling is turned on. Otherwise, make sure it's turned off. This - * function may return -EAGAIN to indicate it should be rerun due to - * possibly missed IRQs if IRQs has just been turned on. + * IRQ handling is turned on. Otherwise, make sure it's turned off. */ -static int vmw_cmdbuf_man_process(struct vmw_cmdbuf_man *man) +static void vmw_cmdbuf_man_process(struct vmw_cmdbuf_man *man) { - int notempty = 0; + int notempty; struct vmw_cmdbuf_context *ctx; int i; +retry: + notempty = 0; for_each_cmdbuf_ctx(man, i, ctx) vmw_cmdbuf_ctx_process(man, ctx, &notempty); @@ -440,10 +440,8 @@ static int vmw_cmdbuf_man_process(struct vmw_cmdbuf_man *man) man->irq_on = true; /* Rerun in case we just missed an irq.
*/ - return -EAGAIN; + goto retry; } - - return 0; } /** @@ -468,8 +466,7 @@ static void vmw_cmdbuf_ctx_add(struct vmw_cmdbuf_man *man, header->cb_context = cb_context; list_add_tail(&header->list, &man->ctx[cb_context].submitted); - if (vmw_cmdbuf_man_process(man) == -EAGAIN) - vmw_cmdbuf_man_process(man); + vmw_cmdbuf_man_process(man); } /** @@ -488,8 +485,7 @@ static void vmw_cmdbuf_man_tasklet(unsigned long data) struct vmw_cmdbuf_man *man = (struct vmw_cmdbuf_man *) data; spin_lock(&man->lock); - if (vmw_cmdbuf_man_process(man) == -EAGAIN) - (void) vmw_cmdbuf_man_process(man); + vmw_cmdbuf_man_process(man); spin_unlock(&man->lock); } @@ -507,6 +503,7 @@ static void vmw_cmdbuf_work_func(struct work_struct *work) struct vmw_cmdbuf_man *man = container_of(work, struct vmw_cmdbuf_man, work); struct vmw_cmdbuf_header *entry, *next; + uint32_t dummy; bool restart = false; spin_lock_bh(&man->lock); @@ -523,6 +520,8 @@ static void vmw_cmdbuf_work_func(struct work_struct *work) if (restart && vmw_cmdbuf_startstop(man, true)) DRM_ERROR("Failed restarting command buffer context 0.\n"); + /* Send a new fence in case one was removed */ + vmw_fifo_send_fence(man->dev_priv, &dummy); } /** @@ -682,7 +681,7 @@ static bool vmw_cmdbuf_try_alloc(struct vmw_cmdbuf_man *man, DRM_MM_SEARCH_DEFAULT, DRM_MM_CREATE_DEFAULT); if (ret) { - (void) vmw_cmdbuf_man_process(man); + vmw_cmdbuf_man_process(man); ret = drm_mm_insert_node_generic(&man->mm, info->node, info->page_size, 0, 0, DRM_MM_SEARCH_DEFAULT, @@ -1168,7 +1167,14 @@ int vmw_cmdbuf_set_pool_size(struct vmw_cmdbuf_man *man, drm_mm_init(&man->mm, 0, size >> PAGE_SHIFT); man->has_pool = true; - man->default_size = default_size; + + /* + * For now, set the default size to VMW_CMDBUF_INLINE_SIZE to + * prevent deadlocks from happening when vmw_cmdbuf_space_pool() + * needs to wait for space and we block on further command + * submissions to be able to free up space. 
+ */ + man->default_size = VMW_CMDBUF_INLINE_SIZE; DRM_INFO("Using command buffers with %s pool.\n", (man->using_mob) ? "MOB" : "DMA"); -- cgit v1.2.3 From 0ca81a2840f77855bbad1b9f172c545c4dc9e6a4 Mon Sep 17 00:00:00 2001 From: Doron Tsur Date: Sun, 11 Oct 2015 15:58:17 +0300 Subject: IB/cm: Fix rb-tree duplicate free and use-after-free ib_send_cm_sidr_rep could sometimes erase the node from the sidr (depending on errors in the process). Since ib_send_cm_sidr_rep is called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv could be either erased from the rb_tree twice or not erased at all. Fixing that by making sure it's erased only once before freeing cm_id_priv. Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation') Signed-off-by: Doron Tsur Signed-off-by: Matan Barak Signed-off-by: Doug Ledford --- drivers/infiniband/core/cm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index ea4db9c1d44f..4f918b929eca 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -835,6 +835,11 @@ retest: case IB_CM_SIDR_REQ_RCVD: spin_unlock_irq(&cm_id_priv->lock); cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT); + spin_lock_irq(&cm.lock); + if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) + rb_erase(&cm_id_priv->sidr_id_node, + &cm.remote_sidr_table); + spin_unlock_irq(&cm.lock); break; case IB_CM_REQ_SENT: case IB_CM_MRA_REQ_RCVD: @@ -3172,7 +3177,10 @@ int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id, spin_unlock_irqrestore(&cm_id_priv->lock, flags); spin_lock_irqsave(&cm.lock, flags); - rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table); + if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) { + rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table); + RB_CLEAR_NODE(&cm_id_priv->sidr_id_node); + } spin_unlock_irqrestore(&cm.lock, flags); return 0; -- cgit v1.2.3 From 30730c7f5943b3beace1e29f7f1476e05de3da14 Mon Sep 17 00:00:00 2001 From: Adam 
Richter Date: Fri, 16 Oct 2015 03:33:02 -0700 Subject: drm: fix mutex leak in drm_dp_get_mst_branch_device In Linux 4.3-rc5, there is an error case in drm_dp_get_mst_branch_device that returns without releasing mgr->lock, resulting in a spew of kernel messages about a kernel work function possibly having leaked a mutex and presumably more serious adverse consequences later. This patch changes the error return to "goto out" to unlock the mutex before returning. [airlied: grabbed from drm-next as it fixes something we've seen] Signed-off-by: Adam J. Richter Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 5bca390d9ae2..809959d56d78 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1194,17 +1194,18 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_ list_for_each_entry(port, &mstb->ports, next) { if (port->port_num == port_num) { - if (!port->mstb) { + mstb = port->mstb; + if (!mstb) { DRM_ERROR("failed to lookup MSTB with lct %d, rad %02x\n", lct, rad[0]); - return NULL; + goto out; } - mstb = port->mstb; break; } } } kref_get(&mstb->kref); +out: mutex_unlock(&mgr->lock); return mstb; } -- cgit v1.2.3 From 2a6c521bb41ce862e43db46f52e7681d33e8d771 Mon Sep 17 00:00:00 2001 From: Ilia Mirkin Date: Tue, 20 Oct 2015 01:15:39 -0400 Subject: drm/nouveau/gem: return only valid domain when there's only one On nv50+, we restrict the valid domains to just the one where the buffer was originally created. However after the buffer is evicted to system memory, we might move it back to a different domain that was not originally valid.
When sharing the buffer and retrieving its GEM_INFO data, we still want the domain that will be valid for this buffer in a pushbuf, not the one where it currently happens to be. This resolves fdo#92504 and several others. These are due to suspend evicting all buffers, making it more likely that they temporarily end up in the wrong place. Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504 Signed-off-by: Ilia Mirkin Signed-off-by: Ben Skeggs --- drivers/gpu/drm/nouveau/nouveau_gem.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 2c9981512d27..41be584147b9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -227,11 +227,12 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem, struct nouveau_bo *nvbo = nouveau_gem_object(gem); struct nvkm_vma *vma; - if (nvbo->bo.mem.mem_type == TTM_PL_TT) + if (is_power_of_2(nvbo->valid_domains)) + rep->domain = nvbo->valid_domains; + else if (nvbo->bo.mem.mem_type == TTM_PL_TT) rep->domain = NOUVEAU_GEM_DOMAIN_GART; else rep->domain = NOUVEAU_GEM_DOMAIN_VRAM; - rep->offset = nvbo->bo.offset; if (cli->vm) { vma = nouveau_bo_vma_find(nvbo, cli->vm); -- cgit v1.2.3 From 8832317f662c06f5c06e638f57bfe89a71c9b266 Mon Sep 17 00:00:00 2001 From: Vasant Hegde Date: Fri, 16 Oct 2015 15:53:29 +0530 Subject: powerpc/rtas: Validate rtas.entry before calling enter_rtas() Currently we do not validate rtas.entry before calling enter_rtas(). This leads to a kernel oops when user space calls rtas system call on a powernv platform (see below). This patch adds code to validate rtas.entry before making enter_rtas() call. 
Oops: Exception in kernel mode, sig: 4 [#1] SMP NR_CPUS=1024 NUMA PowerNV task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000 NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140 REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le) MSR: 1000000000081000 CR: 00000000 XER: 00000000 CFAR: c000000000009c0c SOFTE: 0 NIP [0000000000000000] (null) LR [0000000000009c14] 0x9c14 Call Trace: [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable) [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0 [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98 Cc: stable@vger.kernel.org # v3.2+ Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform") Reported-by: NAGESWARA R. SASTRY Signed-off-by: Vasant Hegde [mpe: Reword change log, trim oops, and add stable + fixes] Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/rtas.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 84bf934cf748..5a753fae8265 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1043,6 +1043,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs) if (!capable(CAP_SYS_ADMIN)) return -EPERM; + if (!rtas.entry) + return -EINVAL; + if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0) return -EFAULT; -- cgit v1.2.3 From 0f89abf56abbd0e1c6e3cef9813e6d9f05383c1e Mon Sep 17 00:00:00 2001 From: Christian Engelmayer Date: Wed, 21 Oct 2015 00:50:06 +0200 Subject: btrfs: fix possible leak in btrfs_ioctl_balance() Commit 8eb934591f8b ("btrfs: check unsupported filters in balance arguments") adds a jump to exit label out_bargs in case the argument check fails. At this point in addition to the bargs memory, the memory for struct btrfs_balance_control has already been allocated. Ownership of bctl is passed to btrfs_balance() in the good case, thus the memory is not freed due to the introduced jump. 
Make sure that the memory gets freed in any case as necessary. Detected by Coverity CID 1328378. Signed-off-by: Christian Engelmayer Reviewed-by: David Sterba Signed-off-by: Chris Mason --- fs/btrfs/ioctl.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 3e3e6130637f..8d20f3b1cab0 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4641,7 +4641,7 @@ locked: if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) { ret = -EINVAL; - goto out_bargs; + goto out_bctl; } do_balance: @@ -4655,12 +4655,15 @@ do_balance: need_unlock = false; ret = btrfs_balance(bctl, bargs); + bctl = NULL; if (arg) { if (copy_to_user(arg, bargs, sizeof(*bargs))) ret = -EFAULT; } +out_bctl: + kfree(bctl); out_bargs: kfree(bargs); out_unlock: -- cgit v1.2.3 From 4eb0f7abcefad2d4c127aa7502d3122635eddab0 Mon Sep 17 00:00:00 2001 From: Jiada Wang Date: Tue, 20 Oct 2015 11:47:11 +0900 Subject: ASoC: wm8962: mark cache_dirty flag after software reset in pm_resume The software reset of the wm8962 in pm_resume returns every register that has already been set to its default value without involving the regmap interface, so the driver needs to mark the cache dirty to let regcache_sync() bring the registers back in line with the cache. Signed-off-by: Jiada Wang Acked-by: Charles Keepax Signed-off-by: Mark Brown --- sound/soc/codecs/wm8962.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/codecs/wm8962.c b/sound/soc/codecs/wm8962.c index b4eb975da981..45f06d5f865f 100644 --- a/sound/soc/codecs/wm8962.c +++ b/sound/soc/codecs/wm8962.c @@ -3804,6 +3804,8 @@ static int wm8962_runtime_resume(struct device *dev) wm8962_reset(wm8962); + regcache_mark_dirty(wm8962->regmap); + /* SYSCLK defaults to on; make sure it is off so we can safely * write to registers if the device is declocked.
*/ -- cgit v1.2.3 From 0729a04977d497cf66234fd7f900ddcec3ef1c52 Mon Sep 17 00:00:00 2001 From: Hezi Shahmoon Date: Tue, 20 Oct 2015 16:32:24 +0200 Subject: i2c: mv64xxx: really allow I2C offloading Commit 00d8689b85a7 ("i2c: mv64xxx: rework offload support to fix several problems") completely reworked the offload support, but left a debugging-related "return false" at the beginning of the mv64xxx_i2c_can_offload() function. This has the unfortunate consequence that offloading is in fact never used, which wasn't really the intention. This commit fixes that problem by removing the bogus "return false". Fixes: 00d8689b85a7 ("i2c: mv64xxx: rework offload support to fix several problems") Signed-off-by: Hezi Shahmoon [Thomas: reworked commit log and title.] Signed-off-by: Thomas Petazzoni Signed-off-by: Wolfram Sang Cc: --- drivers/i2c/busses/i2c-mv64xxx.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-mv64xxx.c b/drivers/i2c/busses/i2c-mv64xxx.c index 30059c1df2a3..5801227b97ab 100644 --- a/drivers/i2c/busses/i2c-mv64xxx.c +++ b/drivers/i2c/busses/i2c-mv64xxx.c @@ -669,8 +669,6 @@ mv64xxx_i2c_can_offload(struct mv64xxx_i2c_data *drv_data) struct i2c_msg *msgs = drv_data->msgs; int num = drv_data->num_msgs; - return false; - if (!drv_data->offload_enabled) return false; -- cgit v1.2.3 From ebdd4b7e6a0dd86736eeb6b9e60b361ef64ccc30 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Sun, 13 Sep 2015 19:39:22 -0300 Subject: [media] horus3a: Fix horus3a_attach() function parameters If CONFIG_DVB_HORUS3A is disabled a stub static inline function is defined that just prints a warning about the driver being disabled but the function parameters were wrong which caused a build error. 
Fixes: a5d32b358254f ("[media] horus3a: Sony Horus3A DVB-S/S2 tuner driver") Reported-by: Fengguang Wu Signed-off-by: Javier Martinez Canillas --- drivers/media/dvb-frontends/horus3a.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/dvb-frontends/horus3a.h b/drivers/media/dvb-frontends/horus3a.h index b055319d532e..c1e2d1834b78 100644 --- a/drivers/media/dvb-frontends/horus3a.h +++ b/drivers/media/dvb-frontends/horus3a.h @@ -46,8 +46,8 @@ extern struct dvb_frontend *horus3a_attach(struct dvb_frontend *fe, const struct horus3a_config *config, struct i2c_adapter *i2c); #else -static inline struct dvb_frontend *horus3a_attach( - const struct cxd2820r_config *config, +static inline struct dvb_frontend *horus3a_attach(struct dvb_frontend *fe, + const struct horus3a_config *config, struct i2c_adapter *i2c) { printk(KERN_WARNING "%s: driver disabled by Kconfig\n", __func__); -- cgit v1.2.3 From a9c4e5cfebc44e6caa6b9299af5603f5c2da0c33 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Sun, 13 Sep 2015 19:45:21 -0300 Subject: [media] lnbh25: Fix lnbh25_attach() function return type If CONFIG_DVB_LNBH25 is disabled, a stub static inline function is defined that just prints a warning about the driver being disabled but the function return type was wrong which caused a build error. 
Fixes: e025273b86fb ("[media] lnbh25: LNBH25 SEC controller driver") Reported-by: Fengguang Wu Signed-off-by: Javier Martinez Canillas --- drivers/media/dvb-frontends/lnbh25.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/lnbh25.h b/drivers/media/dvb-frontends/lnbh25.h index 69f30e21f6b3..1f329ef05acc 100644 --- a/drivers/media/dvb-frontends/lnbh25.h +++ b/drivers/media/dvb-frontends/lnbh25.h @@ -43,7 +43,7 @@ struct dvb_frontend *lnbh25_attach( struct lnbh25_config *cfg, struct i2c_adapter *i2c); #else -static inline dvb_frontend *lnbh25_attach( +static inline struct dvb_frontend *lnbh25_attach( struct dvb_frontend *fe, struct lnbh25_config *cfg, struct i2c_adapter *i2c) -- cgit v1.2.3 From bf447221a8791d0f5dd28b19336e31e48f05f04a Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 15 Sep 2015 08:42:27 -0300 Subject: [media] c8sectpfe: fix uninitialized error return on firmware load failure Static analysis with cppcheck detected the following error: [drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c:1210]: (error) Uninitialized variable: ret ret is never initialised, so garbage is being returned.
Instead, return the error from the call to request_firmware_nowait(). Signed-off-by: Colin Ian King --- drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c b/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c index 486aef50d99b..16aa494f22be 100644 --- a/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c +++ b/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c @@ -1192,7 +1192,6 @@ err: static int load_c8sectpfe_fw_step1(struct c8sectpfei *fei) { - int ret; int err; dev_info(fei->dev, "Loading firmware: %s\n", FIRMWARE_MEMDMA); @@ -1207,7 +1206,7 @@ static int load_c8sectpfe_fw_step1(struct c8sectpfei *fei) if (err) { dev_err(fei->dev, "request_firmware_nowait err: %d.\n", err); complete_all(&fei->fw_ack); - return ret; + return err; } return 0; -- cgit v1.2.3 From 51a3ac5f4dc45120c78fad51096d989914801457 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee Date: Thu, 17 Sep 2015 07:12:54 -0300 Subject: [media] c8sectpfe: fix return of garbage The variable err was never initialized, which means we were checking a garbage value in the for loop. Moreover, if no segment lies outside the firmware file, we were also returning garbage. Initialize err to 0 so that we return 0 on success. The !err check in the for loop condition is then unnecessary: err starts at 0, and whenever its value changes we break out of the loop.
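The reasoning above can be sketched in plain userspace C. This is only an illustration under assumptions: load_segments_sketch(), the seg_ok[] array and the choice of -EINVAL are hypothetical stand-ins, not the driver's real ELF handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the segment loop: seg_ok[i] is nonzero when
 * segment i passes validation. Not the driver's real data structures.
 */
static int load_segments_sketch(const int *seg_ok, size_t n)
{
	int err = 0;	/* initialized, so the success path returns 0 */
	size_t i;

	assert(seg_ok != NULL || n == 0);

	/* No "&& !err" in the condition: err only changes right before break. */
	for (i = 0; i < n; i++) {
		if (!seg_ok[i]) {
			err = -EINVAL;	/* assumed error code for a bad segment */
			break;
		}
	}

	return err;
}
```

With err starting at 0, a run in which every segment validates returns 0, and the first bad segment both sets the error and leaves the loop.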
Signed-off-by: Sudip Mukherjee Signed-off-by: Mauro Carvalho Chehab --- drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c b/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c index 16aa494f22be..f922f2e827bc 100644 --- a/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c +++ b/drivers/media/platform/sti/c8sectpfe/c8sectpfe-core.c @@ -1097,7 +1097,7 @@ static int load_slim_core_fw(const struct firmware *fw, void *context) Elf32_Ehdr *ehdr; Elf32_Phdr *phdr; u8 __iomem *dst; - int err, i; + int err = 0, i; if (!fw || !context) return -EINVAL; @@ -1106,7 +1106,7 @@ static int load_slim_core_fw(const struct firmware *fw, void *context) phdr = (Elf32_Phdr *)(fw->data + ehdr->e_phoff); /* go through the available ELF segments */ - for (i = 0; i < ehdr->e_phnum && !err; i++, phdr++) { + for (i = 0; i < ehdr->e_phnum; i++, phdr++) { /* Only consider LOAD segments */ if (phdr->p_type != PT_LOAD) -- cgit v1.2.3 From 54bec3970cb5351d08866af1ea8b0787edd7ede3 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 21 Sep 2015 12:47:11 -0300 Subject: [media] ir-hix5hd2: drop the use of IRQF_NO_SUSPEND This driver doesn't claim the IR transmitter to be wakeup source. It even disables the clock and the IR during suspend-resume cycle. This patch removes yet another misuse of IRQF_NO_SUSPEND. 
Cc: Patrice Chotard Cc: Fabio Estevam Cc: Guoxiong Yan Signed-off-by: Sudeep Holla Acked-by: Zhangfei Gao Signed-off-by: Mauro Carvalho Chehab --- drivers/media/rc/ir-hix5hd2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/rc/ir-hix5hd2.c b/drivers/media/rc/ir-hix5hd2.c index 1c087cb76815..d0549fba711c 100644 --- a/drivers/media/rc/ir-hix5hd2.c +++ b/drivers/media/rc/ir-hix5hd2.c @@ -257,7 +257,7 @@ static int hix5hd2_ir_probe(struct platform_device *pdev) goto clkerr; if (devm_request_irq(dev, priv->irq, hix5hd2_ir_rx_interrupt, - IRQF_NO_SUSPEND, pdev->name, priv) < 0) { + 0, pdev->name, priv) < 0) { dev_err(dev, "IRQ %d register failed\n", priv->irq); ret = -EINVAL; goto regerr; -- cgit v1.2.3 From a828d72df216c36e9c40b6c24dc4b17b6f7b5a76 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Tue, 29 Sep 2015 21:10:10 -0300 Subject: [media] si2157: Bounds check firmware When reading the firmware and sending commands, the length must be bounds checked to avoid overrunning the size of the command buffer and smashing the stack if the firmware is not in the expected format. Add the proper check. 
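A minimal userspace sketch of the check, under stated assumptions: SKETCH_ARGLEN is a made-up size standing in for SI2157_ARGLEN, struct sketch_cmd and copy_records_sketch() are hypothetical, and unlike the driver loop this version simply ignores a trailing partial record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_RECORD_SIZE 17	/* stride taken from the loop in the patch */
#define SKETCH_ARGLEN      16	/* hypothetical; stands in for SI2157_ARGLEN */

struct sketch_cmd {
	unsigned char args[SKETCH_ARGLEN];
	size_t wlen;
};

/* Returns the total payload bytes copied, or -1 if a record is malformed. */
static long copy_records_sketch(const unsigned char *data, size_t size)
{
	struct sketch_cmd cmd;
	long total = 0;
	size_t off;

	assert(data != NULL || size == 0);

	for (off = 0; off + SKETCH_RECORD_SIZE <= size; off += SKETCH_RECORD_SIZE) {
		size_t len = data[off];	/* first byte of each record is the length */

		if (len > SKETCH_ARGLEN)
			return -1;	/* reject before memcpy() can overrun args[] */

		memcpy(cmd.args, &data[off + 1], len);
		cmd.wlen = len;
		total += (long)cmd.wlen;
	}

	return total;
}
```

The length byte comes from the firmware file, so it is untrusted input; the comparison has to happen before the copy, not after.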
Cc: stable@kernel.org Signed-off-by: Laura Abbott Signed-off-by: Mauro Carvalho Chehab --- drivers/media/tuners/si2157.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/media/tuners/si2157.c b/drivers/media/tuners/si2157.c index 507382160e5e..ce157edd45fa 100644 --- a/drivers/media/tuners/si2157.c +++ b/drivers/media/tuners/si2157.c @@ -166,6 +166,10 @@ static int si2157_init(struct dvb_frontend *fe) for (remaining = fw->size; remaining > 0; remaining -= 17) { len = fw->data[fw->size - remaining]; + if (len > SI2157_ARGLEN) { + dev_err(&client->dev, "Bad firmware length\n"); + goto err_release_firmware; + } memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len); cmd.wlen = len; cmd.rlen = 1; -- cgit v1.2.3 From 47810b4341ac9d2f558894bc5995e6fa2a1298f9 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Tue, 29 Sep 2015 21:10:09 -0300 Subject: [media] si2168: Bounds check firmware When reading the firmware and sending commands, the length must be bounds checked to avoid overrunning the size of the command buffer and smashing the stack if the firmware is not in the expected format: si2168 11-0064: found a 'Silicon Labs Si2168-B40' si2168 11-0064: downloading firmware from file 'dvb-demod-si2168-b40-01.fw' si2168 11-0064: firmware download failed -95 Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa085708f Add the proper check. 
Cc: stable@kernel.org Reported-by: Stuart Auchterlonie Reviewed-by: Antti Palosaari Signed-off-by: Laura Abbott Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/si2168.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c index 81788c5a44d8..821a8f481507 100644 --- a/drivers/media/dvb-frontends/si2168.c +++ b/drivers/media/dvb-frontends/si2168.c @@ -502,6 +502,10 @@ static int si2168_init(struct dvb_frontend *fe) /* firmware is in the new format */ for (remaining = fw->size; remaining > 0; remaining -= 17) { len = fw->data[fw->size - remaining]; + if (len > SI2168_ARGLEN) { + ret = -EINVAL; + break; + } memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len); cmd.wlen = len; cmd.rlen = 1; -- cgit v1.2.3 From 9d2b064c0ae42ad93b2a0c7da05daef312c96bcc Mon Sep 17 00:00:00 2001 From: Abylay Ospan Date: Fri, 25 Sep 2015 04:56:21 -0300 Subject: [media] netup_unidvb: fix potential crash when spi is NULL Signed-off-by: Abylay Ospan Reported-by: Dan Carpenter Signed-off-by: Mauro Carvalho Chehab --- drivers/media/pci/netup_unidvb/netup_unidvb_spi.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/media/pci/netup_unidvb/netup_unidvb_spi.c b/drivers/media/pci/netup_unidvb/netup_unidvb_spi.c index f55b3276f28d..56773f3893d4 100644 --- a/drivers/media/pci/netup_unidvb/netup_unidvb_spi.c +++ b/drivers/media/pci/netup_unidvb/netup_unidvb_spi.c @@ -80,11 +80,9 @@ irqreturn_t netup_spi_interrupt(struct netup_spi *spi) u16 reg; unsigned long flags; - if (!spi) { - dev_dbg(&spi->master->dev, - "%s(): SPI not initialized\n", __func__); + if (!spi) return IRQ_NONE; - } + spin_lock_irqsave(&spi->lock, flags); reg = readw(&spi->regs->control_stat); if (!(reg & NETUP_SPI_CTRL_IRQ)) { @@ -234,11 +232,9 @@ void netup_spi_release(struct netup_unidvb_dev *ndev) unsigned long flags; struct netup_spi *spi = ndev->spi; - if (!spi) { - 
dev_dbg(&spi->master->dev, - "%s(): SPI not initialized\n", __func__); + if (!spi) return; - } + spin_lock_irqsave(&spi->lock, flags); reg = readw(&spi->regs->control_stat); writew(reg | NETUP_SPI_CTRL_IRQ, &spi->regs->control_stat); -- cgit v1.2.3 From 17f38822038ba5d4dba79b72fd111bbf64173063 Mon Sep 17 00:00:00 2001 From: Jacek Anaszewski Date: Fri, 2 Oct 2015 06:19:15 -0300 Subject: [media] v4l2-flash-led-class: Add missing VIDEO_V4L2 Kconfig dependency Fixes the following randconfig problem: drivers/built-in.o: In function `v4l2_flash_release': (.text+0x12204f): undefined reference to `v4l2_async_unregister_subdev' drivers/built-in.o: In function `v4l2_flash_release': (.text+0x122057): undefined reference to `v4l2_ctrl_handler_free' drivers/built-in.o: In function `v4l2_flash_close': v4l2-flash-led-class.c:(.text+0x12208f): undefined reference to `v4l2_fh_is_singular' v4l2-flash-led-class.c:(.text+0x1220c8): undefined reference to `__v4l2_ctrl_s_ctrl' drivers/built-in.o: In function `v4l2_flash_open': v4l2-flash-led-class.c:(.text+0x12227f): undefined reference to `v4l2_fh_is_singular' drivers/built-in.o: In function `v4l2_flash_init_controls': v4l2-flash-led-class.c:(.text+0x12274e): undefined reference to `v4l2_ctrl_handler_init_class' v4l2-flash-led-class.c:(.text+0x122797): undefined reference to `v4l2_ctrl_new_std_menu' v4l2-flash-led-class.c:(.text+0x1227e0): undefined reference to `v4l2_ctrl_new_std' v4l2-flash-led-class.c:(.text+0x122826): undefined reference to `v4l2_ctrl_handler_setup' v4l2-flash-led-class.c:(.text+0x122839): undefined reference to `v4l2_ctrl_handler_free' drivers/built-in.o: In function `v4l2_flash_init': (.text+0x1228e2): undefined reference to `v4l2_subdev_init' drivers/built-in.o: In function `v4l2_flash_init': (.text+0x12293b): undefined reference to `v4l2_async_register_subdev' drivers/built-in.o: In function `v4l2_flash_init': (.text+0x122949): undefined reference to `v4l2_ctrl_handler_free' 
drivers/built-in.o:(.rodata+0x20ef8): undefined reference to `v4l2_subdev_queryctrl' drivers/built-in.o:(.rodata+0x20f10): undefined reference to `v4l2_subdev_querymenu' Signed-off-by: Jacek Anaszewski Reported-by: kbuild test robot Cc: Sakari Ailus Cc: Hans Verkuil --- drivers/media/v4l2-core/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/v4l2-core/Kconfig b/drivers/media/v4l2-core/Kconfig index 82876a67f144..9beece00869b 100644 --- a/drivers/media/v4l2-core/Kconfig +++ b/drivers/media/v4l2-core/Kconfig @@ -47,7 +47,7 @@ config V4L2_MEM2MEM_DEV # Used by LED subsystem flash drivers config V4L2_FLASH_LED_CLASS tristate "V4L2 flash API for LED flash class devices" - depends on VIDEO_V4L2_SUBDEV_API + depends on VIDEO_V4L2 && VIDEO_V4L2_SUBDEV_API depends on LEDS_CLASS_FLASH ---help--- Say Y here to enable V4L2 flash API support for LED flash -- cgit v1.2.3 From d18ca5b7ceca0e9674cb4bb2ed476b0fcbb23ba2 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Tue, 6 Oct 2015 00:22:23 -0300 Subject: [media] rtl28xxu: fix control message flaws Add lock to prevent concurrent access for control message as control message function uses shared buffer. Without the lock there may be remote control polling which messes the buffer causing IO errors. Increase buffer size and add check for maximum supported message length. 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=103391 Fixes: c56222a6b25c ("[media] rtl28xxu: move usb buffers to state") Cc: # 4.0+ Signed-off-by: Antti Palosaari --- drivers/media/usb/dvb-usb-v2/rtl28xxu.c | 15 +++++++++++++-- drivers/media/usb/dvb-usb-v2/rtl28xxu.h | 2 +- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c index c3cac4c12fb3..197a4f2e54d2 100644 --- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c +++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c @@ -34,6 +34,14 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req) unsigned int pipe; u8 requesttype; + mutex_lock(&d->usb_mutex); + + if (req->size > sizeof(dev->buf)) { + dev_err(&d->intf->dev, "too large message %u\n", req->size); + ret = -EINVAL; + goto err_mutex_unlock; + } + if (req->index & CMD_WR_FLAG) { /* write */ memcpy(dev->buf, req->data, req->size); @@ -50,14 +58,17 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req) dvb_usb_dbg_usb_control_msg(d->udev, 0, requesttype, req->value, req->index, dev->buf, req->size); if (ret < 0) - goto err; + goto err_mutex_unlock; /* read request, copy returned data to return buf */ if (requesttype == (USB_TYPE_VENDOR | USB_DIR_IN)) memcpy(req->data, dev->buf, req->size); + mutex_unlock(&d->usb_mutex); + return 0; -err: +err_mutex_unlock: + mutex_unlock(&d->usb_mutex); dev_dbg(&d->intf->dev, "failed=%d\n", ret); return ret; } diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h index 9f6115a2ee01..138062960a73 100644 --- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h +++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h @@ -71,7 +71,7 @@ struct rtl28xxu_dev { - u8 buf[28]; + u8 buf[128]; u8 chip_id; u8 tuner; char *tuner_name; -- cgit v1.2.3 From 56ea37da3b93dfe46cb5c3ee0ee4cc44229ece47 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Sat, 3 Oct 2015 18:35:14 -0300 Subject: 
[media] m88ds3103: use own reg update_bits() implementation Device stopped to tuning some channels after regmap conversion. Reason is that regmap_update_bits() works a bit differently for partially volatile registers than old homemade routine. Return back to old routine in order to fix issue. Fixes: 478932b16052f5ded74685d096ae920cd17d6424 Cc: # 4.2+ Reported-by: Mark Clarkstone Tested-by: Mark Clarkstone Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/m88ds3103.c | 73 +++++++++++++++++++++------------ 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/drivers/media/dvb-frontends/m88ds3103.c b/drivers/media/dvb-frontends/m88ds3103.c index ff31e7a01ca9..feeeb70d841e 100644 --- a/drivers/media/dvb-frontends/m88ds3103.c +++ b/drivers/media/dvb-frontends/m88ds3103.c @@ -18,6 +18,27 @@ static struct dvb_frontend_ops m88ds3103_ops; +/* write single register with mask */ +static int m88ds3103_update_bits(struct m88ds3103_dev *dev, + u8 reg, u8 mask, u8 val) +{ + int ret; + u8 tmp; + + /* no need for read if whole reg is written */ + if (mask != 0xff) { + ret = regmap_bulk_read(dev->regmap, reg, &tmp, 1); + if (ret) + return ret; + + val &= mask; + tmp &= ~mask; + val |= tmp; + } + + return regmap_bulk_write(dev->regmap, reg, &val, 1); +} + /* write reg val table using reg addr auto increment */ static int m88ds3103_wr_reg_val_tab(struct m88ds3103_dev *dev, const struct m88ds3103_reg_val *tab, int tab_len) @@ -394,10 +415,10 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe) u8tmp2 = 0x00; /* 0b00 */ break; } - ret = regmap_update_bits(dev->regmap, 0x22, 0xc0, u8tmp1 << 6); + ret = m88ds3103_update_bits(dev, 0x22, 0xc0, u8tmp1 << 6); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x24, 0xc0, u8tmp2 << 6); + ret = m88ds3103_update_bits(dev, 0x24, 0xc0, u8tmp2 << 6); if (ret) goto err; } @@ -455,13 +476,13 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe) if (ret) goto err; } - 
ret = regmap_update_bits(dev->regmap, 0x9d, 0x08, 0x08); + ret = m88ds3103_update_bits(dev, 0x9d, 0x08, 0x08); if (ret) goto err; ret = regmap_write(dev->regmap, 0xf1, 0x01); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x30, 0x80, 0x80); + ret = m88ds3103_update_bits(dev, 0x30, 0x80, 0x80); if (ret) goto err; } @@ -498,7 +519,7 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe) switch (dev->cfg->ts_mode) { case M88DS3103_TS_SERIAL: case M88DS3103_TS_SERIAL_D7: - ret = regmap_update_bits(dev->regmap, 0x29, 0x20, u8tmp1); + ret = m88ds3103_update_bits(dev, 0x29, 0x20, u8tmp1); if (ret) goto err; u8tmp1 = 0; @@ -567,11 +588,11 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe) if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x4d, 0x02, dev->cfg->spec_inv << 1); + ret = m88ds3103_update_bits(dev, 0x4d, 0x02, dev->cfg->spec_inv << 1); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x30, 0x10, dev->cfg->agc_inv << 4); + ret = m88ds3103_update_bits(dev, 0x30, 0x10, dev->cfg->agc_inv << 4); if (ret) goto err; @@ -625,13 +646,13 @@ static int m88ds3103_init(struct dvb_frontend *fe) dev->warm = false; /* wake up device from sleep */ - ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x01); + ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x01); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x00); + ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x00); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x00); + ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x00); if (ret) goto err; @@ -749,18 +770,18 @@ static int m88ds3103_sleep(struct dvb_frontend *fe) utmp = 0x29; else utmp = 0x27; - ret = regmap_update_bits(dev->regmap, utmp, 0x01, 0x00); + ret = m88ds3103_update_bits(dev, utmp, 0x01, 0x00); if (ret) goto err; /* sleep */ - ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00); + ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00); if (ret) goto err; - ret = 
regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01); + ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01); if (ret) goto err; - ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10); + ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10); if (ret) goto err; @@ -992,12 +1013,12 @@ static int m88ds3103_set_tone(struct dvb_frontend *fe, } utmp = tone << 7 | dev->cfg->envelope_mode << 5; - ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp); + ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp); if (ret) goto err; utmp = 1 << 2; - ret = regmap_update_bits(dev->regmap, 0xa1, reg_a1_mask, utmp); + ret = m88ds3103_update_bits(dev, 0xa1, reg_a1_mask, utmp); if (ret) goto err; @@ -1047,7 +1068,7 @@ static int m88ds3103_set_voltage(struct dvb_frontend *fe, voltage_dis ^= dev->cfg->lnb_en_pol; utmp = voltage_dis << 1 | voltage_sel << 0; - ret = regmap_update_bits(dev->regmap, 0xa2, 0x03, utmp); + ret = m88ds3103_update_bits(dev, 0xa2, 0x03, utmp); if (ret) goto err; @@ -1080,7 +1101,7 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe, } utmp = dev->cfg->envelope_mode << 5; - ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp); + ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp); if (ret) goto err; @@ -1115,12 +1136,12 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe, } else { dev_dbg(&client->dev, "diseqc tx timeout\n"); - ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40); + ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40); if (ret) goto err; } - ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80); + ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80); if (ret) goto err; @@ -1152,7 +1173,7 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe, } utmp = dev->cfg->envelope_mode << 5; - ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp); + ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp); if (ret) goto err; @@ -1194,12 +1215,12 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe, } else 
{ dev_dbg(&client->dev, "diseqc tx timeout\n"); - ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40); + ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40); if (ret) goto err; } - ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80); + ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80); if (ret) goto err; @@ -1435,13 +1456,13 @@ static int m88ds3103_probe(struct i2c_client *client, goto err_kfree; /* sleep */ - ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00); + ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00); if (ret) goto err_kfree; - ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01); + ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01); if (ret) goto err_kfree; - ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10); + ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10); if (ret) goto err_kfree; -- cgit v1.2.3 From 5211613978cb7353a3237e4372958c0e7514683f Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 22 Oct 2015 13:32:08 -0700 Subject: kmod: don't run async usermode helper as a child of kworker thread call_usermodehelper_exec_sync() does fork() + wait() with "unignored" SIGCHLD. What we have missed is that this worker thread can have other children previously forked by call_usermodehelper_exec_work() without UMH_WAIT_PROC. If such a child exits in between it becomes a zombie because auto-reaping only works if SIGCHLD is ignored, and nobody can reap it (unless/until this worker thread exits too). Change the !UMH_WAIT_PROC case to use CLONE_PARENT. Note: this is only first step. All PF_KTHREAD tasks, even created by kernel_thread() should have ->parent == kthreadd by default. 
Fixes: bb304a5c6fc63d8506c ("kmod: handle UMH_WAIT_PROC from system unbound workqueue") Signed-off-by: Oleg Nesterov Acked-by: Frederic Weisbecker Cc: Rik van Riel Cc: Christoph Lameter Cc: Tejun Heo Cc: Rusty Russell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kmod.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/kmod.c b/kernel/kmod.c index da98d0593de2..0277d1216f80 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -327,9 +327,13 @@ static void call_usermodehelper_exec_work(struct work_struct *work) call_usermodehelper_exec_sync(sub_info); } else { pid_t pid; - + /* + * Use CLONE_PARENT to reparent it to kthreadd; we do not + * want to pollute current->children, and we need a parent + * that always ignores SIGCHLD to ensure auto-reaping. + */ pid = kernel_thread(call_usermodehelper_exec_async, sub_info, - SIGCHLD); + CLONE_PARENT | SIGCHLD); if (pid < 0) { sub_info->retval = pid; umh_complete(sub_info); -- cgit v1.2.3 From 67a2e213e7e937c41c52ab5bc46bf3f4de469f6e Mon Sep 17 00:00:00 2001 From: Rohit Vaswani Date: Thu, 22 Oct 2015 13:32:11 -0700 Subject: mm: cma: fix incorrect type conversion for size during dma allocation This was found during userspace fuzzing test when a large size dma cma allocation is made by driver(like ion) through userspace. 
show_stack+0x10/0x1c dump_stack+0x74/0xc8 kasan_report_error+0x2b0/0x408 kasan_report+0x34/0x40 __asan_storeN+0x15c/0x168 memset+0x20/0x44 __dma_alloc_coherent+0x114/0x18c Signed-off-by: Rohit Vaswani Acked-by: Greg Kroah-Hartman Cc: Marek Szyprowski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/base/dma-contiguous.c | 2 +- include/linux/cma.h | 2 +- include/linux/dma-contiguous.h | 4 ++-- mm/cma.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 950fff9ce453..a12ff9863d7e 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -187,7 +187,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, * global one. Requires architecture specific dev_get_cma_area() helper * function. */ -struct page *dma_alloc_from_contiguous(struct device *dev, int count, +struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, unsigned int align) { if (align > CONFIG_CMA_ALIGNMENT) diff --git a/include/linux/cma.h b/include/linux/cma.h index f7ef093ec49a..29f9e774ab76 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -26,6 +26,6 @@ extern int __init cma_declare_contiguous(phys_addr_t base, extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, unsigned int order_per_bit, struct cma **res_cma); -extern struct page *cma_alloc(struct cma *cma, unsigned int count, unsigned int align); +extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align); extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); #endif diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h index 569bbd039896..fec734df1524 100644 --- a/include/linux/dma-contiguous.h +++ b/include/linux/dma-contiguous.h @@ -111,7 +111,7 @@ static inline int dma_declare_contiguous(struct device *dev, phys_addr_t size, return ret; } -struct page 
*dma_alloc_from_contiguous(struct device *dev, int count, +struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, unsigned int order); bool dma_release_from_contiguous(struct device *dev, struct page *pages, int count); @@ -144,7 +144,7 @@ int dma_declare_contiguous(struct device *dev, phys_addr_t size, } static inline -struct page *dma_alloc_from_contiguous(struct device *dev, int count, +struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, unsigned int order) { return NULL; diff --git a/mm/cma.c b/mm/cma.c index e7d1db533025..4eb56badf37e 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -361,7 +361,7 @@ err: * This function allocates part of contiguous memory on specific * contiguous memory area. */ -struct page *cma_alloc(struct cma *cma, unsigned int count, unsigned int align) +struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align) { unsigned long mask, offset, pfn, start = 0; unsigned long bitmap_maxno, bitmap_no, bitmap_count; @@ -371,7 +371,7 @@ struct page *cma_alloc(struct cma *cma, unsigned int count, unsigned int align) if (!cma || !cma->count) return NULL; - pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, + pr_debug("%s(cma %p, count %zu, align %d)\n", __func__, (void *)cma, count, align); if (!count) -- cgit v1.2.3 From 41192a2d6a7f4cd6af9fc2f8edbbf24b2694f2f6 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Thu, 22 Oct 2015 13:32:13 -0700 Subject: MAINTAINERS: add Sergey as zsmalloc reviewer Nominate myself as a zsmalloc reviewer. 
Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index fb7d2e4af200..d1116d9ea6ae 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11674,6 +11674,7 @@ F: drivers/tty/serial/zs.* ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR M: Minchan Kim M: Nitin Gupta +R: Sergey Senozhatsky L: linux-mm@kvack.org S: Maintained F: mm/zsmalloc.c -- cgit v1.2.3 From b8fa0efa01109e294e9be610465c324f771cb5ba Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Thu, 22 Oct 2015 13:32:16 -0700 Subject: mailmap: update Javier Martinez Canillas' email The get_maintainer script still reports my old Collabora email based on old commits, but that address no longer exists, so update mailmap to report my current email and avoid people sending to the old address. Signed-off-by: Javier Martinez Canillas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index 4b31af54ccd5..b1e9a97653dc 100644 --- a/.mailmap +++ b/.mailmap @@ -59,6 +59,7 @@ James Bottomley James Bottomley James E Wilson James Ketrenos + Jean Tourrilhes Jeff Garzik Jens Axboe -- cgit v1.2.3 From 47aee4d8e314384807e98b67ade07f6da476aa75 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 22 Oct 2015 13:32:19 -0700 Subject: thp: use is_zero_pfn() only after pte_present() check Use is_zero_pfn() on pteval only after a pte_present() check on pteval (it might be a better idea to introduce is_zero_pte(), which checks pte_present() first). Otherwise, when working on a swap or migration entry, if pte_pfn()'s result happens to equal zero_pfn, we lose the user's data in __collapse_huge_page_copy().
So if you're unlucky, the application segfaults and finally you could see the message below on exit: BUG: Bad rss-counter state mm:ffff88007f099300 idx:2 val:3 Fixes: ca0984caa823 ("mm: incorporate zero pages into transparent huge pages") Signed-off-by: Minchan Kim Reviewed-by: Andrea Arcangeli Acked-by: Kirill A. Shutemov Cc: Mel Gorman Acked-by: Vlastimil Babka Cc: Hugh Dickins Cc: Rik van Riel Cc: [4.1+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/huge_memory.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4b06b8db9df2..bbac913f96bc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2206,7 +2206,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, for (_pte = pte; _pte < pte+HPAGE_PMD_NR; _pte++, address += PAGE_SIZE) { pte_t pteval = *_pte; - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { + if (pte_none(pteval) || (pte_present(pteval) && + is_zero_pfn(pte_pfn(pteval)))) { if (!userfaultfd_armed(vma) && ++none_or_zero <= khugepaged_max_ptes_none) continue; -- cgit v1.2.3 From 296291cdd1629c308114504b850dc343eabc2782 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 22 Oct 2015 13:32:21 -0700 Subject: mm: make sendfile(2) killable Currently the simple program below issues a sendfile(2) system call that takes about 62 days to complete in my test KVM instance. int fd; off_t off = 0; fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644); ftruncate(fd, 2); lseek(fd, 0, SEEK_END); sendfile(fd, fd, &off, 0xfffffff); Now you should not ask the kernel to do stupid stuff like copying 256MB in 2-byte chunks and calling fsync(2) after each chunk, but if you do, the sysadmin should have a way to stop you.
We actually do have a check for fatal_signal_pending() in generic_perform_write() which triggers in this path however because we always succeed in writing something before the check is done, we return value > 0 from generic_perform_write() and thus the information about signal gets lost. Fix the problem by doing the signal check before writing anything. That way generic_perform_write() returns -EINTR, the error gets propagated up and the sendfile loop terminates early. Signed-off-by: Jan Kara Reported-by: Dmitry Vyukov Cc: Al Viro Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 1cc5467cf36c..327910c2400c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2488,6 +2488,11 @@ again: break; } + if (fatal_signal_pending(current)) { + status = -EINTR; + break; + } + status = a_ops->write_begin(file, mapping, pos, bytes, flags, &page, &fsdata); if (unlikely(status < 0)) @@ -2525,10 +2530,6 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - if (fatal_signal_pending(current)) { - status = -EINTR; - break; - } } while (iov_iter_count(i)); return written ? written : status; -- cgit v1.2.3 From 3f181b4d8652f7bcd7e9932c7307b8ecd4d87cf6 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 22 Oct 2015 13:32:24 -0700 Subject: lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y When the kernel compiled with KASAN=y, GCC adds redzones for each variable on stack. This enlarges function's stack frame and causes: 'warning: the frame size of X bytes is larger than Y bytes' The worst case I've seen for now is following: ../net/wireless/nl80211.c: In function `nl80211_send_wiphy': ../net/wireless/nl80211.c:1731:1: warning: the frame size of 5448 bytes is larger than 2048 bytes [-Wframe-larger-than=] That kind of warning becomes useless with KASAN=y. 
It doesn't necessarily indicate that there is some problem in the code, thus we should turn it off. (The KASAN=y stack size is increased from 16k to 32k for this reason.) Signed-off-by: Andrey Ryabinin Reported-by: Fengguang Wu Acked-by: Abylay Ospan Cc: Andi Kleen Cc: Ingo Molnar Cc: Mauro Carvalho Chehab Cc: Kozlov Sergey Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/Kconfig.debug | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index ab76b99adc85..1d1521c26302 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -197,6 +197,7 @@ config ENABLE_MUST_CHECK config FRAME_WARN int "Warn for stack frames larger than (needs gcc 4.4)" range 0 8192 + default 0 if KASAN default 1024 if !64BIT default 2048 if 64BIT help -- cgit v1.2.3 From bb387002693ed28b2bb0408c5dec65521b71e5f1 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 22 Oct 2015 13:32:27 -0700 Subject: fault-inject: fix inverted interval/probability values in printk interval displays the probability and vice versa.
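As a small illustration of the argument-order problem, here is a userspace sketch using snprintf() in place of printk(). struct fault_attr_sketch and format_attr_sketch() are hypothetical and carry only the two fields in question; the point is that the arguments must follow the order of the labels in the format string, since both are unsigned long and the compiler cannot catch a swap.

```c
#include <assert.h>
#include <stdio.h>

struct fault_attr_sketch {	/* hypothetical subset of struct fault_attr */
	unsigned long probability;
	unsigned long interval;
};

/* Formats the two fields so that each value follows its own label. */
static int format_attr_sketch(char *buf, size_t len,
			      const struct fault_attr_sketch *attr)
{
	assert(buf != NULL && attr != NULL);

	return snprintf(buf, len, "interval %lu, probability %lu",
			attr->interval, attr->probability);
}
```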
Fixes: 6adc4a22f20bb ("fault-inject: add ratelimit option") Acked-by: Akinobu Mita Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/fault-inject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/fault-inject.c b/lib/fault-inject.c index f1cdeb024d17..6a823a53e357 100644 --- a/lib/fault-inject.c +++ b/lib/fault-inject.c @@ -44,7 +44,7 @@ static void fail_dump(struct fault_attr *attr) printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n" "name %pd, interval %lu, probability %lu, " "space %d, times %d\n", attr->dname, - attr->probability, attr->interval, + attr->interval, attr->probability, atomic_read(&attr->space), atomic_read(&attr->times)); if (attr->verbose > 1) -- cgit v1.2.3 From b67de018b37a97548645a879c627d4188518e907 Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Thu, 22 Oct 2015 13:32:29 -0700 Subject: ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put dlm_lockres_put will call dlm_lockres_release if it is the last reference, and then it may call dlm_print_one_lock_resource and take lockres spinlock. So unlock lockres spinlock before dlm_lockres_put to avoid deadlock. 
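The ordering problem can be sketched without any real locking primitives. Everything below is hypothetical: the "lock" is a plain flag that records an attempt to take it while it is already held, which is where a real spinlock would spin forever, and res_put() stands in for a put whose final-reference release path takes the same lock.

```c
#include <assert.h>

struct res_sketch {
	int locked;		/* toy lock: a flag, not a real spinlock */
	int refs;
	int self_deadlock;	/* set when the lock is taken while already held */
};

static void res_lock(struct res_sketch *r)
{
	if (r->locked)
		r->self_deadlock = 1;	/* a real spinlock would spin forever here */
	r->locked = 1;
}

static void res_unlock(struct res_sketch *r)
{
	r->locked = 0;
}

/* Dropping the last reference runs a release path that takes the lock. */
static void res_put(struct res_sketch *r)
{
	assert(r->refs > 0);
	if (--r->refs == 0) {
		res_lock(r);
		res_unlock(r);
	}
}

/* Order before the fix: put while the lock is still held. */
static void error_path_before(struct res_sketch *r)
{
	res_lock(r);
	res_put(r);
	res_unlock(r);
}

/* Order after the fix: unlock first, then put. */
static void error_path_after(struct res_sketch *r)
{
	res_lock(r);
	res_unlock(r);
	res_put(r);
}
```

Starting from a single reference, error_path_before() trips the self-deadlock flag, while error_path_after() does not.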
Signed-off-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/dlm/dlmmaster.c | 3 ++- fs/ocfs2/dlm/dlmrecovery.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index ee5aa4daaea0..ce38b4ccc9ab 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -1658,12 +1658,13 @@ send_response: if (ret < 0) { mlog(ML_ERROR, "failed to dispatch assert master work\n"); response = DLM_MASTER_RESP_ERROR; + spin_unlock(&res->spinlock); dlm_lockres_put(res); } else { dispatched = 1; __dlm_lockres_grab_inflight_worker(dlm, res); + spin_unlock(&res->spinlock); } - spin_unlock(&res->spinlock); } else { if (res) dlm_lockres_put(res); diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index 3d90ad7ff91f..58eaa5c0d387 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -1723,8 +1723,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data, } else { dispatched = 1; __dlm_lockres_grab_inflight_worker(dlm, res); + spin_unlock(&res->spinlock); } - spin_unlock(&res->spinlock); } else { /* put.. incase we are not the master */ spin_unlock(&res->spinlock); -- cgit v1.2.3 From 0aaafaabfcba8aa991913cd3280a5dbf7f111a2a Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 23 Oct 2015 11:50:08 +0200 Subject: sched/core: Add missing lockdep_unpin() annotations Luca and Wanpeng reported two missing annotations that led to false lockdep complaints. Add the missing annotations. 
Reported-by: Luca Abeni Reported-by: Wanpeng Li Signed-off-by: Peter Zijlstra (Intel) Cc: Juri Lelli Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: cbce1a686700 ("sched,lockdep: Employ lock pinning") Link: http://lkml.kernel.org/r/20151023095008.GY17308@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- kernel/sched/core.c | 9 ++++++++- kernel/sched/deadline.c | 9 ++++++++- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5bd7d60658d3..bcd214e4b4d6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2366,8 +2366,15 @@ void wake_up_new_task(struct task_struct *p) trace_sched_wakeup_new(p); check_preempt_curr(rq, p, WF_FORK); #ifdef CONFIG_SMP - if (p->sched_class->task_woken) + if (p->sched_class->task_woken) { + /* + * Nothing relies on rq->lock after this, so its fine to + * drop it. + */ + lockdep_unpin_lock(&rq->lock); p->sched_class->task_woken(rq, p); + lockdep_pin_lock(&rq->lock); + } #endif task_rq_unlock(rq, p, &flags); } diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 142df2668e5d..8b0a15e285f9 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -668,8 +668,15 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer) * Queueing this task back might have overloaded rq, check if we need * to kick someone away. */ - if (has_pushable_dl_tasks(rq)) + if (has_pushable_dl_tasks(rq)) { + /* + * Nothing relies on rq->lock after this, so its safe to drop + * rq->lock. 
+ */ + lockdep_unpin_lock(&rq->lock); push_dl_task(rq); + lockdep_pin_lock(&rq->lock); + } #endif unlock: -- cgit v1.2.3 From 5c92d87d30b23844e6998d8318e4c19ee3a907ac Mon Sep 17 00:00:00 2001 From: Christian König Date: Wed, 21 Oct 2015 21:58:28 +0200 Subject: drm/amdgpu: stop leaking page flip fence MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit reservation_object_get_fences_rcu already takes the references. Signed-off-by: Christian König Reviewed-by: Alex Deucher Reviewed-by: Chunming Zhou Reviewed-by: Jammy Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index dc29ed8145c2..6c9e0902a414 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -184,10 +184,6 @@ int amdgpu_crtc_page_flip(struct drm_crtc *crtc, goto cleanup; } - fence_get(work->excl); - for (i = 0; i < work->shared_count; ++i) - fence_get(work->shared[i]); - amdgpu_bo_get_tiling_flags(new_rbo, &tiling_flags); amdgpu_bo_unreserve(new_rbo); -- cgit v1.2.3 From 49abb26651167c892393cd9f2ad23df429645ed9 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Fri, 23 Oct 2015 10:38:52 -0400 Subject: drm/radeon: don't try to recreate sysfs entries on resume Fixes a harmless error message caused by: 51a4726b04e880fdd9b4e0e58b13f70b0a68a7f5 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/radeon/radeon.h | 1 + drivers/gpu/drm/radeon/radeon_pm.c | 35 +++++++++++++++++++++-------------- 2 files changed, 22 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index f03b7eb15233..b6cbd816537e 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -1658,6 +1658,7 @@ struct radeon_pm { u8 fan_max_rpm; /* dpm */ bool dpm_enabled; + bool sysfs_initialized; struct 
radeon_dpm dpm; }; diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 6a0a176e26ec..5feee3b4c557 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -1528,19 +1528,23 @@ int radeon_pm_late_init(struct radeon_device *rdev) if (rdev->pm.pm_method == PM_METHOD_DPM) { if (rdev->pm.dpm_enabled) { - ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state); - if (ret) - DRM_ERROR("failed to create device file for dpm state\n"); - ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level); - if (ret) - DRM_ERROR("failed to create device file for dpm state\n"); - /* XXX: these are noops for dpm but are here for backwards compat */ - ret = device_create_file(rdev->dev, &dev_attr_power_profile); - if (ret) - DRM_ERROR("failed to create device file for power profile\n"); - ret = device_create_file(rdev->dev, &dev_attr_power_method); - if (ret) - DRM_ERROR("failed to create device file for power method\n"); + if (!rdev->pm.sysfs_initialized) { + ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state); + if (ret) + DRM_ERROR("failed to create device file for dpm state\n"); + ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level); + if (ret) + DRM_ERROR("failed to create device file for dpm state\n"); + /* XXX: these are noops for dpm but are here for backwards compat */ + ret = device_create_file(rdev->dev, &dev_attr_power_profile); + if (ret) + DRM_ERROR("failed to create device file for power profile\n"); + ret = device_create_file(rdev->dev, &dev_attr_power_method); + if (ret) + DRM_ERROR("failed to create device file for power method\n"); + if (!ret) + rdev->pm.sysfs_initialized = true; + } mutex_lock(&rdev->pm.mutex); ret = radeon_dpm_late_enable(rdev); @@ -1556,7 +1560,8 @@ int radeon_pm_late_init(struct radeon_device *rdev) } } } else { - if (rdev->pm.num_power_states > 1) { + if ((rdev->pm.num_power_states > 1) && + 
(!rdev->pm.sysfs_initialized)) { /* where's the best place to put these? */ ret = device_create_file(rdev->dev, &dev_attr_power_profile); if (ret) @@ -1564,6 +1569,8 @@ int radeon_pm_late_init(struct radeon_device *rdev) ret = device_create_file(rdev->dev, &dev_attr_power_method); if (ret) DRM_ERROR("failed to create device file for power method\n"); + if (!ret) + rdev->pm.sysfs_initialized = true; } } return ret; -- cgit v1.2.3 From c86f5ebfbd147d1a228ab89ee1658e18939bd7ad Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Fri, 23 Oct 2015 10:45:14 -0400 Subject: drm/amdgpu: don't try to recreate sysfs entries on resume Fixes an error on resume caused by: fa022a9b65d2886486a022fd66b20c823cd76ad9 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6647fb26ef25..0d13e6368b96 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1654,6 +1654,7 @@ struct amdgpu_pm { u8 fan_max_rpm; /* dpm */ bool dpm_enabled; + bool sysfs_initialized; struct amdgpu_dpm dpm; const struct firmware *fw; /* SMC firmware */ uint32_t fw_version; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c index ed2bbe5b10af..22a8c7d3a3ab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c @@ -695,6 +695,9 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev) { int ret; + if (adev->pm.sysfs_initialized) + return 0; + if (adev->pm.funcs->get_temperature == NULL) return 0; adev->pm.int_hwmon_dev = hwmon_device_register_with_groups(adev->dev, @@ -723,6 +726,8 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev) return ret; } + adev->pm.sysfs_initialized = true; + return 0; } -- cgit v1.2.3 From 1f2c6651f69c14d0d3a9cfbda44ea101b02160ba Mon Sep 17 00:00:00 
2001 From: Ilya Dryomov Date: Sun, 11 Oct 2015 19:38:00 +0200 Subject: rbd: don't leak parent_spec in rbd_dev_probe_parent() Currently we leak parent_spec and trigger a "parent reference underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. The problem is we take the !parent out_err branch and that only drops refcounts; parent_spec that would've been freed had we called rbd_dev_unparent() remains and triggers rbd_warn() in rbd_dev_parent_put() - at that point we have parent_spec != NULL and parent_ref == 0, so counter ends up being -1 after the decrement. Redo rbd_dev_probe_parent() to fix this. Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2 Signed-off-by: Ilya Dryomov Reviewed-by: Alex Elder --- drivers/block/rbd.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index f5e49b639818..028db28cb8a0 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -5134,41 +5134,37 @@ out_err: static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) { struct rbd_device *parent = NULL; - struct rbd_spec *parent_spec; - struct rbd_client *rbdc; int ret; if (!rbd_dev->parent_spec) return 0; - /* - * We need to pass a reference to the client and the parent - * spec when creating the parent rbd_dev. Images related by - * parent/child relationships always share both. - */ - parent_spec = rbd_spec_get(rbd_dev->parent_spec); - rbdc = __rbd_get_client(rbd_dev->rbd_client); - ret = -ENOMEM; - parent = rbd_dev_create(rbdc, parent_spec, NULL); - if (!parent) + parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec, + NULL); + if (!parent) { + ret = -ENOMEM; goto out_err; + } + + /* + * Images related by parent/child relationships always share + * rbd_client and spec/parent_spec, so bump their refcounts. 
+ */ + __rbd_get_client(rbd_dev->rbd_client); + rbd_spec_get(rbd_dev->parent_spec); ret = rbd_dev_image_probe(parent, false); if (ret < 0) goto out_err; + rbd_dev->parent = parent; atomic_set(&rbd_dev->parent_ref, 1); - return 0; + out_err: - if (parent) { - rbd_dev_unparent(rbd_dev); + rbd_dev_unparent(rbd_dev); + if (parent) rbd_dev_destroy(parent); - } else { - rbd_put_client(rbdc); - rbd_spec_put(parent_spec); - } - return ret; } -- cgit v1.2.3 From 6d69bb536bac0d403d83db1ca841444981b280cd Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Sun, 11 Oct 2015 19:38:00 +0200 Subject: rbd: prevent kernel stack blow up on rbd map Mapping an image with a long parent chain (e.g. image foo, whose parent is bar, whose parent is baz, etc) currently leads to a kernel stack overflow, due to the following recursion in the reply path: rbd_osd_req_callback() rbd_obj_request_complete() rbd_img_obj_callback() rbd_img_parent_read_callback() rbd_obj_request_complete() ... Limit the parent chain to 16 images, which is ~5K worth of stack. When the above recursion is eliminated, this limit can be lifted. 
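The depth limit can be sketched in isolation. The code below is a userspace illustration under stated assumptions, not the rbd driver: `struct image`, `probe_image()`, `probe_parent()` and `probe_chain()` are hypothetical names, and only the constant 16 and the `++depth > limit` check are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PARENT_CHAIN_LEN 16	/* mirrors RBD_MAX_PARENT_CHAIN_LEN */
#define MAX_IMAGES 64		/* arbitrary bound for this demo only */

/* Hypothetical stand-in for an image with an optional parent. */
struct image {
	struct image *parent;
};

static int probe_image(const struct image *img, int depth);

/* depth is the length of the parent chain discovered so far. */
static int probe_parent(const struct image *img, int depth)
{
	if (!img->parent)
		return 0;

	if (++depth > MAX_PARENT_CHAIN_LEN)
		return -EINVAL;	/* refuse before recursing any deeper */

	return probe_image(img->parent, depth);
}

static int probe_image(const struct image *img, int depth)
{
	/* A real probe would read the image header here. */
	return probe_parent(img, depth);
}

/* Build an image followed by nr_parents ancestors and probe it. */
static int probe_chain(int nr_parents)
{
	struct image imgs[MAX_IMAGES];
	int i;

	assert(nr_parents >= 0 && nr_parents < MAX_IMAGES);
	for (i = 0; i < nr_parents; i++)
		imgs[i].parent = &imgs[i + 1];
	imgs[nr_parents].parent = NULL;

	return probe_image(&imgs[0], 0);	/* the mapped image is depth 0 */
}
```

In this model `probe_chain(16)` succeeds and `probe_chain(17)` returns `-EINVAL`, so the recursion never goes more than 16 levels below the mapped image.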
Fixes: http://tracker.ceph.com/issues/12538 Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2 Signed-off-by: Ilya Dryomov Reviewed-by: Josh Durgin --- drivers/block/rbd.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 028db28cb8a0..6f26cf38c6f9 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -96,6 +96,8 @@ static int atomic_dec_return_safe(atomic_t *v) #define RBD_MINORS_PER_MAJOR 256 #define RBD_SINGLE_MAJOR_PART_SHIFT 4 +#define RBD_MAX_PARENT_CHAIN_LEN 16 + #define RBD_SNAP_DEV_NAME_PREFIX "snap_" #define RBD_MAX_SNAP_NAME_LEN \ (NAME_MAX - (sizeof (RBD_SNAP_DEV_NAME_PREFIX) - 1)) @@ -426,7 +428,7 @@ static ssize_t rbd_add_single_major(struct bus_type *bus, const char *buf, size_t count); static ssize_t rbd_remove_single_major(struct bus_type *bus, const char *buf, size_t count); -static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping); +static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth); static void rbd_spec_put(struct rbd_spec *spec); static int rbd_dev_id_to_minor(int dev_id) @@ -5131,7 +5133,12 @@ out_err: return ret; } -static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) +/* + * @depth is rbd_dev_image_probe() -> rbd_dev_probe_parent() -> + * rbd_dev_image_probe() recursion depth, which means it's also the + * length of the already discovered part of the parent chain. 
+ */ +static int rbd_dev_probe_parent(struct rbd_device *rbd_dev, int depth) { struct rbd_device *parent = NULL; int ret; @@ -5139,6 +5146,12 @@ static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) if (!rbd_dev->parent_spec) return 0; + if (++depth > RBD_MAX_PARENT_CHAIN_LEN) { + pr_info("parent chain is too long (%d)\n", depth); + ret = -EINVAL; + goto out_err; + } + parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec, NULL); if (!parent) { @@ -5153,7 +5166,7 @@ static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) __rbd_get_client(rbd_dev->rbd_client); rbd_spec_get(rbd_dev->parent_spec); - ret = rbd_dev_image_probe(parent, false); + ret = rbd_dev_image_probe(parent, depth); if (ret < 0) goto out_err; @@ -5282,7 +5295,7 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev) * parent), initiate a watch on its header object before using that * object to get detailed information about the rbd image. */ -static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping) +static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) { int ret; @@ -5300,7 +5313,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping) if (ret) goto err_out_format; - if (mapping) { + if (!depth) { ret = rbd_dev_header_watch_sync(rbd_dev); if (ret) { if (ret == -ENOENT) @@ -5321,7 +5334,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping) * Otherwise this is a parent image, identified by pool, image * and snap ids - need to fill in names for those ids. */ - if (mapping) + if (!depth) ret = rbd_spec_fill_snap_id(rbd_dev); else ret = rbd_spec_fill_names(rbd_dev); @@ -5343,12 +5356,12 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping) * Need to warn users if this image is the one being * mapped and has a parent. 
*/ - if (mapping && rbd_dev->parent_spec) + if (!depth && rbd_dev->parent_spec) rbd_warn(rbd_dev, "WARNING: kernel layering is EXPERIMENTAL!"); } - ret = rbd_dev_probe_parent(rbd_dev); + ret = rbd_dev_probe_parent(rbd_dev, depth); if (ret) goto err_out_probe; @@ -5359,7 +5372,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping) err_out_probe: rbd_dev_unprobe(rbd_dev); err_out_watch: - if (mapping) + if (!depth) rbd_dev_header_unwatch_sync(rbd_dev); out_header_name: kfree(rbd_dev->header_name); @@ -5422,7 +5435,7 @@ static ssize_t do_rbd_add(struct bus_type *bus, spec = NULL; /* rbd_dev now owns this */ rbd_opts = NULL; /* rbd_dev now owns this */ - rc = rbd_dev_image_probe(rbd_dev, true); + rc = rbd_dev_image_probe(rbd_dev, 0); if (rc < 0) goto err_out_rbd_dev; -- cgit v1.2.3 From 2871c69e025e8bc507651d5a9cf81a8a7da9d24b Mon Sep 17 00:00:00 2001 From: Joe Thornber Date: Wed, 21 Oct 2015 18:36:49 +0100 Subject: dm btree remove: fix a bug when rebalancing nodes after removal Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't a complete fix for redistribute3(). The redistribute3 function takes 3 btree nodes and shares out the entries evenly between them. If the three nodes in total contained (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in the center. Fix this issue by being more careful about calculating the target number of entries for the left and right nodes. 
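The arithmetic can be checked in userspace with a small sketch. `struct targets` and `calc_targets()` are hypothetical helpers written for this illustration; only the target_left/target_right computation follows the patch, and `assert()` stands in for `BUG_ON()`.

```c
#include <assert.h>

/* Hypothetical helper for checking the arithmetic; not kernel code. */
struct targets {
	unsigned left;
	unsigned center;
	unsigned right;
};

static struct targets calc_targets(unsigned nr_left, unsigned nr_center,
				   unsigned nr_right, unsigned max_entries)
{
	unsigned total = nr_left + nr_center + nr_right;
	struct targets t;

	t.right = total / 3;
	/* remainder is a 0/1 flag: set when total is not a multiple of 3 */
	t.left = t.right + ((t.right * 3) != total);
	/* the center node ends up with whatever is left over */
	t.center = total - t.left - t.right;

	/* userspace stand-ins for the BUG_ON() checks */
	assert(t.left <= max_entries);
	assert(t.right <= max_entries);

	return t;
}
```

Taking max_entries = 126 purely as an example, a total of (126 * 3) - 1 = 377 entries gave the old code a single target of 125, leaving 127 in the center. With the split targets the result is 126 on the left, 126 in the center and 125 on the right, so no node exceeds max_entries.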
Unit tested in userspace using this program: https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org --- drivers/md/persistent-data/dm-btree-remove.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c index 421a36c593e3..2e4c4cb79e4d 100644 --- a/drivers/md/persistent-data/dm-btree-remove.c +++ b/drivers/md/persistent-data/dm-btree-remove.c @@ -301,11 +301,16 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent, { int s; uint32_t max_entries = le32_to_cpu(left->header.max_entries); - unsigned target = (nr_left + nr_center + nr_right) / 3; - BUG_ON(target > max_entries); + unsigned total = nr_left + nr_center + nr_right; + unsigned target_right = total / 3; + unsigned remainder = (target_right * 3) != total; + unsigned target_left = target_right + remainder; + + BUG_ON(target_left > max_entries); + BUG_ON(target_right > max_entries); if (nr_left < nr_right) { - s = nr_left - target; + s = nr_left - target_left; if (s < 0 && nr_center < -s) { /* not enough in central node */ @@ -316,10 +321,10 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent, } else shift(left, center, s); - shift(center, right, target - nr_right); + shift(center, right, target_right - nr_right); } else { - s = target - nr_right; + s = target_right - nr_right; if (s > 0 && nr_center < s) { /* not enough in central node */ shift(center, right, nr_center); @@ -329,7 +334,7 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent, } else shift(center, right, s); - shift(left, center, nr_left - target); + shift(left, center, nr_left - target_left); } *key_ptr(parent, c->index) = center->keys[0]; -- cgit v1.2.3 From 4dcb8b57df3593dcb20481d9d6cf79d1dc1534be Mon Sep 17 00:00:00 2001 From: 
Mike Snitzer Date: Thu, 22 Oct 2015 10:56:40 -0400 Subject: dm btree: fix leak of bufio-backed block in btree_split_beneath error path btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Cc: stable@vger.kernel.org --- drivers/md/persistent-data/dm-btree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c index b6cec258cc21..0e09aef43998 100644 --- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -523,7 +523,7 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key) r = new_block(s->info, &right); if (r < 0) { - /* FIXME: put left */ + unlock_block(s->info, left); return r; } -- cgit v1.2.3 From 3201ac452e84a8a368197d648c9b7011e061804a Mon Sep 17 00:00:00 2001 From: Joe Thornber Date: Thu, 22 Oct 2015 18:10:55 +0100 Subject: dm cache: the CLEAN_SHUTDOWN flag was not being set If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache blocks are marked as dirty and a full writeback occurs. __commit_transaction() is responsible for setting/clearing CLEAN_SHUTDOWN (based on the flags_mutator that is passed in). Fix this issue, of the cache's on-disk flags being wrong, by making sure __commit_transaction() does not reset the flags after the mutator has altered the flags in preparation for them being serialized to disk.
before: sb_flags = mutator(le32_to_cpu(disk_super->flags)); disk_super->flags = cpu_to_le32(sb_flags); disk_super->flags = cpu_to_le32(cmd->flags); after: disk_super->flags = cpu_to_le32(cmd->flags); sb_flags = mutator(le32_to_cpu(disk_super->flags)); disk_super->flags = cpu_to_le32(sb_flags); Reported-by: Bogdan Vasiliev Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org --- drivers/md/dm-cache-metadata.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 20cc36b01b77..0a17d1b91a81 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -634,10 +634,10 @@ static int __commit_transaction(struct dm_cache_metadata *cmd, disk_super = dm_block_data(sblock); + disk_super->flags = cpu_to_le32(cmd->flags); if (mutator) update_flags(disk_super, mutator); - disk_super->flags = cpu_to_le32(cmd->flags); disk_super->mapping_root = cpu_to_le64(cmd->root); disk_super->hint_root = cpu_to_le64(cmd->hint_root); disk_super->discard_root = cpu_to_le64(cmd->discard_root); -- cgit v1.2.3 From 5dd32eae604ee503e5a84a4f18d1381e4cc356cb Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Sat, 17 Oct 2015 21:52:27 +0300 Subject: i2c: pnx: fix runtime warnings caused by enabling unprepared clock The driver can not be used on a platform with common clock framework until clk_prepare/clk_unprepare calls are added, otherwise clk_enable calls will fail and a WARN is generated. 
Signed-off-by: Vladimir Zapolskiy Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-pnx.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/i2c/busses/i2c-pnx.c b/drivers/i2c/busses/i2c-pnx.c index e814a36d9b78..6f8b446be5b0 100644 --- a/drivers/i2c/busses/i2c-pnx.c +++ b/drivers/i2c/busses/i2c-pnx.c @@ -600,7 +600,7 @@ static int i2c_pnx_controller_suspend(struct device *dev) { struct i2c_pnx_algo_data *alg_data = dev_get_drvdata(dev); - clk_disable(alg_data->clk); + clk_disable_unprepare(alg_data->clk); return 0; } @@ -609,7 +609,7 @@ static int i2c_pnx_controller_resume(struct device *dev) { struct i2c_pnx_algo_data *alg_data = dev_get_drvdata(dev); - return clk_enable(alg_data->clk); + return clk_prepare_enable(alg_data->clk); } static SIMPLE_DEV_PM_OPS(i2c_pnx_pm, @@ -672,7 +672,7 @@ static int i2c_pnx_probe(struct platform_device *pdev) if (IS_ERR(alg_data->ioaddr)) return PTR_ERR(alg_data->ioaddr); - ret = clk_enable(alg_data->clk); + ret = clk_prepare_enable(alg_data->clk); if (ret) return ret; @@ -726,7 +726,7 @@ static int i2c_pnx_probe(struct platform_device *pdev) return 0; out_clock: - clk_disable(alg_data->clk); + clk_disable_unprepare(alg_data->clk); return ret; } @@ -735,7 +735,7 @@ static int i2c_pnx_remove(struct platform_device *pdev) struct i2c_pnx_algo_data *alg_data = platform_get_drvdata(pdev); i2c_del_adapter(&alg_data->adapter); - clk_disable(alg_data->clk); + clk_disable_unprepare(alg_data->clk); return 0; } -- cgit v1.2.3 From bd8688a199b864944bf62eebed0ca13b46249453 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 24 Oct 2015 16:02:16 +1100 Subject: md/raid1: don't clear bitmap bit when bad-block-list write fails. When a write fails and a bad-block-list is present, we can update the bad-block-list instead of writing the data. If this succeeds then it is OK to clear the relevant bitmap-bit as no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to treat the write as failed and particularly must not clear the bitmap bit. Otherwise the device can be re-added (after any hardware connection issues are resolved) and because the relevant bit in the bitmap is clear, that block will not be resynced. This leads to data corruption. We already delay the final bio_endio() on the write until the bad-block-list is written so that when the write returns: either that data is safe, the bad-block record is safe, or the fact that the device is faulty is safe. However we *don't* delay the clearing of the bitmap, so the bitmap bit can be recorded as cleared before we know if the bad-block-list was written safely. So: delay that until the write really is safe. i.e. move the call to close_write() until just before calling bio_endio(), and recheck the 'is array degraded' status before making that call. This bug goes back to v3.1 when bad-block-lists were introduced, though it only affects arrays created with mdadm-3.3 or later as only those have bad-block lists. Backports will require at least Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") as well. I'll send that to 'stable' separately. Note that of the two tests of R1BIO_WriteError that this patch adds, the first is certain to fail and the second is certain to succeed. However doing it this way makes the patch more obviously correct. I will tidy the code up in a future merge window. 
Reported-and-tested-by: Nate Dailey Cc: Jes Sorensen Fixes: cd5ff9a16f08 ("md/raid1: Handle write errors by updating badblock log.") Signed-off-by: NeilBrown --- drivers/md/raid1.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index cfca6edf7813..d9d031ede4bf 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2258,15 +2258,16 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio) rdev_dec_pending(conf->mirrors[m].rdev, conf->mddev); } - if (test_bit(R1BIO_WriteError, &r1_bio->state)) - close_write(r1_bio); if (fail) { spin_lock_irq(&conf->device_lock); list_add(&r1_bio->retry_list, &conf->bio_end_io_list); spin_unlock_irq(&conf->device_lock); md_wakeup_thread(conf->mddev->thread); - } else + } else { + if (test_bit(R1BIO_WriteError, &r1_bio->state)) + close_write(r1_bio); raid_end_bio_io(r1_bio); + } } static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) @@ -2385,6 +2386,10 @@ static void raid1d(struct md_thread *thread) r1_bio = list_first_entry(&tmp, struct r1bio, retry_list); list_del(&r1_bio->retry_list); + if (mddev->degraded) + set_bit(R1BIO_Degraded, &r1_bio->state); + if (test_bit(R1BIO_WriteError, &r1_bio->state)) + close_write(r1_bio); raid_end_bio_io(r1_bio); } } -- cgit v1.2.3 From c340702ca26a628832fade4f133d8160a55c29cc Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 24 Oct 2015 16:23:48 +1100 Subject: md/raid10: don't clear bitmap bit when bad-block-list write fails. When a write fails and a bad-block-list is present, we can update the bad-block-list instead of writing the data. If this succeeds then it is OK to clear the relevant bitmap-bit as no further 'sync' of the block is needed. However if writing the bad-block-list fails then we need to treat the write as failed and particularly must not clear the bitmap bit.
Otherwise the device can be re-added (after any hardware connection issues are resolved) and because the relevant bit in the bitmap is clear, that block will not be resynced. This leads to data corruption. We already delay the final bio_endio() on the write until the bad-block-list is written so that when the write returns: either that data is safe, the bad-block record is safe, or the fact that the device is faulty is safe. However we *don't* delay the clearing of the bitmap, so the bitmap bit can be recorded as cleared before we know if the bad-block-list was written safely. So: delay that until the write really is safe. i.e. move the call to close_write() until just before calling bio_endio(), and recheck the 'is array degraded' status before making that call. This bug goes back to v3.1 when bad-block-lists were introduced, though it only affects arrays created with mdadm-3.3 or later as only those have bad-block lists. Backports will require at least Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.") as well. I'll send that to 'stable' separately. Note that of the two tests of R10BIO_WriteError that this patch adds, the first is certain to fail and the second is certain to succeed. However doing it this way makes the patch more obviously correct. I will tidy the code up in a future merge window. 
Reported-by: Nate Dailey Fixes: bd870a16c594 ("md/raid10: Handle write errors by updating badblock log.") Signed-off-by: NeilBrown --- drivers/md/raid10.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index a9ecec4e9a13..23de2144ee13 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2654,16 +2654,17 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio) rdev_dec_pending(rdev, conf->mddev); } } - if (test_bit(R10BIO_WriteError, - &r10_bio->state)) - close_write(r10_bio); if (fail) { spin_lock_irq(&conf->device_lock); list_add(&r10_bio->retry_list, &conf->bio_end_io_list); spin_unlock_irq(&conf->device_lock); md_wakeup_thread(conf->mddev->thread); - } else + } else { + if (test_bit(R10BIO_WriteError, + &r10_bio->state)) + close_write(r10_bio); raid_end_bio_io(r10_bio); + } } } @@ -2691,6 +2692,12 @@ static void raid10d(struct md_thread *thread) r10_bio = list_first_entry(&tmp, struct r10bio, retry_list); list_del(&r10_bio->retry_list); + if (mddev->degraded) + set_bit(R10BIO_Degraded, &r10_bio->state); + + if (test_bit(R10BIO_WriteError, + &r10_bio->state)) + close_write(r10_bio); raid_end_bio_io(r10_bio); } } -- cgit v1.2.3 From 8bce6d35b308d73cdb2ee273c95d711a55be688c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 22 Oct 2015 13:20:15 +1100 Subject: md/raid10: fix the 'new' raid10 layout to work correctly. In Linux 3.9 we introduced a new 'far' layout for RAID10 which was supposed to rotate the replicas differently and so provide better resilience. In particular it could survive more combinations of 2 drive failures. Unfortunately, due to a coding error, this sometimes did what was wanted, sometimes improved less than we hoped, and sometimes - in very unlikely circumstances - put multiple replicas on the same device so the redundancy was harmed.
No public user-space tool has created arrays using this layout so it is very unlikely that zero-redundancy arrays actually exist. Probably no arrays using any form of the new layout exist. But we cannot be certain. So use another bit in the 'layout' number and introduce a bug-fixed version of the layout. Also when assembling an array, if it has a zero-redundancy layout, give a warning. Reported-by: Heinz Mauelshagen Signed-off-by: NeilBrown --- drivers/md/raid10.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 23de2144ee13..96f365968306 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -39,6 +39,7 @@ * far_copies (stored in second byte of layout) * far_offset (stored in bit 16 of layout ) * use_far_sets (stored in bit 17 of layout ) + * use_far_sets_bugfixed (stored in bit 18 of layout ) * * The data to be stored is divided into chunks using chunksize. Each device * is divided into far_copies sections. In each section, chunks are laid out @@ -1497,6 +1498,8 @@ static void status(struct seq_file *seq, struct mddev *mddev) seq_printf(seq, " %d offset-copies", conf->geo.far_copies); else seq_printf(seq, " %d far-copies", conf->geo.far_copies); + if (conf->geo.far_set_size != conf->geo.raid_disks) + seq_printf(seq, " %d devices per set", conf->geo.far_set_size); } seq_printf(seq, " [%d/%d] [", conf->geo.raid_disks, conf->geo.raid_disks - mddev->degraded); @@ -3394,7 +3397,7 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new) disks = mddev->raid_disks + mddev->delta_disks; break; } - if (layout >> 18) + if (layout >> 19) return -1; if (chunk < (PAGE_SIZE >> 9) || !is_power_of_2(chunk)) @@ -3406,7 +3409,22 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new) geo->near_copies = nc; geo->far_copies = fc; geo->far_offset = fo; - geo->far_set_size = (layout & (1<<17)) ? 
disks / fc : disks; + switch (layout >> 17) { + case 0: /* original layout. simple but not always optimal */ + geo->far_set_size = disks; + break; + case 1: /* "improved" layout which was buggy. Hopefully no-one is + * actually using this, but leave code here just in case.*/ + geo->far_set_size = disks/fc; + WARN(geo->far_set_size < fc, + "This RAID10 layout does not provide data safety - please backup and create new array\n"); + break; + case 2: /* "improved" layout fixed to match documentation */ + geo->far_set_size = fc * nc; + break; + default: /* Not a valid layout */ + return -1; + } geo->chunk_mask = chunk - 1; geo->chunk_shift = ffz(~chunk); return nc*fc; -- cgit v1.2.3 From 32b88194f71d6ae7768a29f87fbba454728273ee Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 25 Oct 2015 10:39:47 +0900 Subject: Linux 4.3-rc7 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index d33ab74bffce..431067a41fcf 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 4 PATCHLEVEL = 3 SUBLEVEL = 0 -EXTRAVERSION = -rc6 +EXTRAVERSION = -rc7 NAME = Blurry Fish Butt # *DOCUMENTATION* -- cgit v1.2.3 From f9bf45e08ef36b6726a5744f0029325e81b3248a Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Fri, 23 Oct 2015 17:14:07 -0700 Subject: net: thunderx: Remove PF soft reset. In some silicon revisions, the soft reset clobbers PCI config space, so quit doing the reset. Signed-off-by: Sunil Goutham Signed-off-by: David Daney Signed-off-by: David S. 
Miller --- drivers/net/ethernet/cavium/thunder/nic_main.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c index b3a5947a2cc0..d6e3219ddeb3 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_main.c +++ b/drivers/net/ethernet/cavium/thunder/nic_main.c @@ -305,9 +305,6 @@ static void nic_init_hw(struct nicpf *nic) { int i; - /* Reset NIC, in case the driver is repeatedly inserted and removed */ - nic_reg_write(nic, NIC_PF_SOFT_RESET, 1); - /* Enable NIC HW block */ nic_reg_write(nic, NIC_PF_CFG, 0x3); -- cgit v1.2.3 From 4e85777ff071b51f500b130b6d036922af32be25 Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Fri, 23 Oct 2015 17:14:08 -0700 Subject: net: thunderx: Fix incorrect subsystem devid of VF on pass2 silicon Signed-off-by: Sunil Goutham Signed-off-by: David Daney Signed-off-by: David S. Miller --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index b63e579aeb12..a9377727c11c 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -29,7 +29,7 @@ static const struct pci_device_id nicvf_id_table[] = { { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_NIC_VF, - PCI_VENDOR_ID_CAVIUM, 0xA11E) }, + PCI_VENDOR_ID_CAVIUM, 0xA134) }, { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_PASS1_NIC_VF, PCI_VENDOR_ID_CAVIUM, 0xA11E) }, -- cgit v1.2.3 From 88ed237720bd618240439714a57fb69ea96428e7 Mon Sep 17 00:00:00 2001 From: David Daney Date: Fri, 23 Oct 2015 17:14:09 -0700 Subject: net: thunderx: Rewrite silicon revision tests. The test for pass-1 silicon was incorrect, it should be for all revisions less than 8. Also the revision is already present in the pci_dev, so there is no need to read and keep a private copy. 
Remove rev_id and code to read it from struct nicpf. Create new static inline function pass1_silicon() to be used for testing the silicon version. Use pass1_silicon() for revision checks; this will be more widely used in follow-on patches. Signed-off-by: David Daney Signed-off-by: David S. Miller --- drivers/net/ethernet/cavium/thunder/nic_main.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c index d6e3219ddeb3..52e1acb69562 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_main.c +++ b/drivers/net/ethernet/cavium/thunder/nic_main.c @@ -22,7 +22,6 @@ struct nicpf { struct pci_dev *pdev; - u8 rev_id; u8 node; unsigned int flags; u8 num_vf_en; /* No of VF enabled */ @@ -54,6 +53,11 @@ struct nicpf { bool irq_allocated[NIC_PF_MSIX_VECTORS]; }; +static inline bool pass1_silicon(struct nicpf *nic) +{ + return nic->pdev->revision < 8; +} + /* Supported devices */ static const struct pci_device_id nic_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_NIC_PF) }, @@ -117,7 +121,7 @@ static void nic_send_msg_to_vf(struct nicpf *nic, int vf, union nic_mbx *mbx) * when PF writes to MBOX(1), in next revisions when * PF writes to MBOX(0) */ - if (nic->rev_id == 0) { + if (pass1_silicon(nic)) { /* see the comment for nic_reg_write()/nic_reg_read() * functions above */ @@ -998,8 +1002,6 @@ static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) goto err_release_regions; } - pci_read_config_byte(pdev, PCI_REVISION_ID, &nic->rev_id); - nic->node = nic_get_node_id(pdev); nic_set_lmac_vf_mapping(nic); -- cgit v1.2.3 From 34411b68b132e403ddf395419e986475a9993d9b Mon Sep 17 00:00:00 2001 From: Thanneeru Srinivasulu Date: Fri, 23 Oct 2015 17:14:10 -0700 Subject: net: thunderx: Incorporate pass2 silicon CPI index configuration changes Add support for ThunderX pass2 CPI and MPI configuration changes.
MPI_ALG is not enabled i.e MCAM parsing is disabled. Signed-off-by: Thanneeru Srinivasulu Signed-off-by: Sunil Goutham Signed-off-by: David Daney Signed-off-by: David S. Miller --- drivers/net/ethernet/cavium/thunder/nic_main.c | 29 ++++++++++++++++++++------ drivers/net/ethernet/cavium/thunder/nic_reg.h | 4 ++++ 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c index 52e1acb69562..c561fdcb79a7 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_main.c +++ b/drivers/net/ethernet/cavium/thunder/nic_main.c @@ -43,6 +43,7 @@ struct nicpf { u8 duplex[MAX_LMAC]; u32 speed[MAX_LMAC]; u16 cpi_base[MAX_NUM_VFS_SUPPORTED]; + u16 rssi_base[MAX_NUM_VFS_SUPPORTED]; u16 rss_ind_tbl_size; bool mbx_lock[MAX_NUM_VFS_SUPPORTED]; @@ -396,8 +397,18 @@ static void nic_config_cpi(struct nicpf *nic, struct cpi_cfg_msg *cfg) padd = cpi % 8; /* 3 bits CS out of 6bits DSCP */ /* Leave RSS_SIZE as '0' to disable RSS */ - nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi << 3), - (vnic << 24) | (padd << 16) | (rssi_base + rssi)); + if (pass1_silicon(nic)) { + nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi << 3), + (vnic << 24) | (padd << 16) | + (rssi_base + rssi)); + } else { + /* Set MPI_ALG to '0' to disable MCAM parsing */ + nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi << 3), + (padd << 16)); + /* MPI index is same as CPI if MPI_ALG is not enabled */ + nic_reg_write(nic, NIC_PF_MPI_0_2047_CFG | (cpi << 3), + (vnic << 24) | (rssi_base + rssi)); + } if ((rssi + 1) >= cfg->rq_cnt) continue; @@ -410,6 +421,7 @@ static void nic_config_cpi(struct nicpf *nic, struct cpi_cfg_msg *cfg) rssi = ((cpi - cpi_base) & 0x38) >> 3; } nic->cpi_base[cfg->vf_id] = cpi_base; + nic->rssi_base[cfg->vf_id] = rssi_base; } /* Responsds to VF with its RSS indirection table size */ @@ -435,10 +447,9 @@ static void nic_config_rss(struct nicpf *nic, struct rss_cfg_msg *cfg) { u8 qset, idx = 0; u64 cpi_cfg, 
cpi_base, rssi_base, rssi; + u64 idx_addr; - cpi_base = nic->cpi_base[cfg->vf_id]; - cpi_cfg = nic_reg_read(nic, NIC_PF_CPI_0_2047_CFG | (cpi_base << 3)); - rssi_base = (cpi_cfg & 0x0FFF) + cfg->tbl_offset; + rssi_base = nic->rssi_base[cfg->vf_id] + cfg->tbl_offset; rssi = rssi_base; qset = cfg->vf_id; @@ -455,9 +466,15 @@ static void nic_config_rss(struct nicpf *nic, struct rss_cfg_msg *cfg) idx++; } + cpi_base = nic->cpi_base[cfg->vf_id]; + if (pass1_silicon(nic)) + idx_addr = NIC_PF_CPI_0_2047_CFG; + else + idx_addr = NIC_PF_MPI_0_2047_CFG; + cpi_cfg = nic_reg_read(nic, idx_addr | (cpi_base << 3)); cpi_cfg &= ~(0xFULL << 20); cpi_cfg |= (cfg->hash_bits << 20); - nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi_base << 3), cpi_cfg); + nic_reg_write(nic, idx_addr | (cpi_base << 3), cpi_cfg); } /* 4 level transmit side scheduler configutation diff --git a/drivers/net/ethernet/cavium/thunder/nic_reg.h b/drivers/net/ethernet/cavium/thunder/nic_reg.h index 58197bb2f805..dd536be20193 100644 --- a/drivers/net/ethernet/cavium/thunder/nic_reg.h +++ b/drivers/net/ethernet/cavium/thunder/nic_reg.h @@ -85,7 +85,11 @@ #define NIC_PF_ECC3_DBE_INT_W1S (0x2708) #define NIC_PF_ECC3_DBE_ENA_W1C (0x2710) #define NIC_PF_ECC3_DBE_ENA_W1S (0x2718) +#define NIC_PF_MCAM_0_191_ENA (0x100000) +#define NIC_PF_MCAM_0_191_M_0_5_DATA (0x110000) +#define NIC_PF_MCAM_CTRL (0x120000) #define NIC_PF_CPI_0_2047_CFG (0x200000) +#define NIC_PF_MPI_0_2047_CFG (0x210000) #define NIC_PF_RSSI_0_4097_RQ (0x220000) #define NIC_PF_LMAC_0_7_CFG (0x240000) #define NIC_PF_LMAC_0_7_SW_XOFF (0x242000) -- cgit v1.2.3 From 5188f7e5a7175975f8b943a4b25e499c98a7b9d6 Mon Sep 17 00:00:00 2001 From: Claudiu Manoil Date: Fri, 23 Oct 2015 11:41:58 +0300 Subject: gianfar: Remove duplicated argument to bitwise OR RQFCR_AND is duplicated. Add missing space as well. Signed-off-by: Claudiu Manoil Signed-off-by: David S. 
Miller --- drivers/net/ethernet/freescale/gianfar_ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c index 6bdc89179b72..a33e4a829601 100644 --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c @@ -676,14 +676,14 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow) u32 fcr = 0x0, fpr = FPR_FILER_MASK; if (ethflow & RXH_L2DA) { - fcr = RQFCR_PID_DAH |RQFCR_CMP_NOMATCH | + fcr = RQFCR_PID_DAH | RQFCR_CMP_NOMATCH | RQFCR_HASH | RQFCR_AND | RQFCR_HASHTBL_0; priv->ftp_rqfpr[priv->cur_filer_idx] = fpr; priv->ftp_rqfcr[priv->cur_filer_idx] = fcr; gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr); priv->cur_filer_idx = priv->cur_filer_idx - 1; - fcr = RQFCR_PID_DAL | RQFCR_AND | RQFCR_CMP_NOMATCH | + fcr = RQFCR_PID_DAL | RQFCR_CMP_NOMATCH | RQFCR_HASH | RQFCR_AND | RQFCR_HASHTBL_0; priv->ftp_rqfpr[priv->cur_filer_idx] = fpr; priv->ftp_rqfcr[priv->cur_filer_idx] = fcr; -- cgit v1.2.3 From 15bf176db1fb00333af7050c0c699fc7b4e4a960 Mon Sep 17 00:00:00 2001 From: Claudiu Manoil Date: Fri, 23 Oct 2015 11:41:59 +0300 Subject: gianfar: Don't enable the Filer w/o the Parser Under one unusual circumstance it's possible to wrongly set FILREN without enabling PRSDEP as well in the RCTRL register, against the hardware specifications. With the default config this does not happen because the default Rx offloads (Rx csum and Rx VLAN) properly enable PRSDEP. But if anyone disables all these offloads (via ethtool), we get a wrong configuration where the Rx flow classification and hashing, and other Filer based features (e.g. wake-on-filer interrupt) won't work. This patch fixes the issue. Also, account for Rx FCB insertion which happens every time PRSDEP is set. Signed-off-by: Claudiu Manoil Signed-off-by: David S.
Miller --- drivers/net/ethernet/freescale/gianfar.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 710715fcb23d..939ed8fe6318 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -341,7 +341,7 @@ static void gfar_rx_offload_en(struct gfar_private *priv) if (priv->ndev->features & (NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_RX)) priv->uses_rxfcb = 1; - if (priv->hwts_rx_en) + if (priv->hwts_rx_en || priv->rx_filer_enable) priv->uses_rxfcb = 1; } @@ -351,7 +351,7 @@ static void gfar_mac_rx_config(struct gfar_private *priv) u32 rctrl = 0; if (priv->rx_filer_enable) { - rctrl |= RCTRL_FILREN; + rctrl |= RCTRL_FILREN | RCTRL_PRSDEP_INIT; /* Program the RIR0 reg with the required distribution */ if (priv->poll_mode == GFAR_SQ_POLLING) gfar_write(&regs->rir0, DEFAULT_2RXQ_RIR0); -- cgit v1.2.3 From 1de65a5ea32de7b335ab505366d45cefadbbdf71 Mon Sep 17 00:00:00 2001 From: Claudiu Manoil Date: Fri, 23 Oct 2015 11:42:00 +0300 Subject: gianfar: Fix Rx BSY error handling The Rx BSY error interrupt indicates that a frame was received and discarded due to lack of buffers, so it's a rx ring overflow condition and has nothing to do with bad rx packets. Use the right counter. BSY conditions happen when the SoC is under performance stress. Doing *more* work in stress situations by trying to schedule NAPI is not a good idea as the stressed system becomes still more stressed. The Rx interrupt is already at work making sure the NAPI is scheduled. So calling gfar_receive() here does not help. This issue was present since day 1. Signed-off-by: Claudiu Manoil Signed-off-by: David S.
Miller --- drivers/net/ethernet/freescale/gianfar.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 939ed8fe6318..ce38d266f931 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -3462,11 +3462,9 @@ static irqreturn_t gfar_error(int irq, void *grp_id) netif_dbg(priv, tx_err, dev, "Transmit Error\n"); } if (events & IEVENT_BSY) { - dev->stats.rx_errors++; + dev->stats.rx_over_errors++; atomic64_inc(&priv->extra_stats.rx_bsy); - gfar_receive(irq, grp_id); - netif_dbg(priv, rx_err, dev, "busy error (rstat: %x)\n", gfar_read(&regs->rstat)); } -- cgit v1.2.3 From abb1ed7b793fcb10cadb378fe0eeee589b61a9e1 Mon Sep 17 00:00:00 2001 From: Claudiu Manoil Date: Fri, 23 Oct 2015 11:42:01 +0300 Subject: MAINTAINERS: Add entry for gianfar ethernet driver Signed-off-by: Claudiu Manoil Signed-off-by: David S. Miller --- MAINTAINERS | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index fb7d2e4af200..9b43ef23b985 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4427,6 +4427,14 @@ L: linuxppc-dev@lists.ozlabs.org S: Maintained F: drivers/net/ethernet/freescale/ucc_geth* +FREESCALE eTSEC ETHERNET DRIVER (GIANFAR) +M: Claudiu Manoil +L: netdev@vger.kernel.org +S: Maintained +F: drivers/net/ethernet/freescale/gianfar* +X: drivers/net/ethernet/freescale/gianfar_ptp.c +F: Documentation/devicetree/bindings/net/fsl-tsec-phy.txt + FREESCALE QUICC ENGINE UCC UART DRIVER M: Timur Tabi L: linuxppc-dev@lists.ozlabs.org -- cgit v1.2.3 From 195562194aad3a0a3915941077f283bcc6347b9b Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 26 Oct 2015 01:50:28 -0700 Subject: Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 92bac83dd79e ("Input: alps - non interleaved V2 dualpoint
has separate stick button bits") assumes that all alps v2 non-interleaved dual point setups have the separate stick button bits. Later we limited this to Dell laptops only because of reports that this broke things on non Dell laptops. Now it turns out that this breaks things on the Dell Latitude D600 too. So it seems that only the Dell Latitude D420/430/620/630, which all share the same touchpad / stick combo, have these separate bits. This patch limits the checking of the separate bits to only these models fixing regressions with other models. Reported-and-tested-by: Larry Finger Cc: stable@vger.kernel.org Tested-by: Hans de Goede Signed-off-by: Hans de Goede Acked-By: Pali Rohár Signed-off-by: Dmitry Torokhov --- drivers/input/mouse/alps.c | 48 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c index 4d246861d692..41e6cb501e6a 100644 --- a/drivers/input/mouse/alps.c +++ b/drivers/input/mouse/alps.c @@ -100,7 +100,7 @@ static const struct alps_nibble_commands alps_v6_nibble_commands[] = { #define ALPS_FOUR_BUTTONS 0x40 /* 4 direction button present */ #define ALPS_PS2_INTERLEAVED 0x80 /* 3-byte PS/2 packet interleaved with 6-byte ALPS packet */ -#define ALPS_DELL 0x100 /* device is a Dell laptop */ +#define ALPS_STICK_BITS 0x100 /* separate stick button bits */ #define ALPS_BUTTONPAD 0x200 /* device is a clickpad */ static const struct alps_model_info alps_model_data[] = { @@ -159,6 +159,43 @@ static const struct alps_protocol_info alps_v8_protocol_data = { ALPS_PROTO_V8, 0x18, 0x18, 0 }; +/* + * Some v2 models report the stick buttons in separate bits + */ +static const struct dmi_system_id alps_dmi_has_separate_stick_buttons[] = { +#if defined(CONFIG_DMI) && defined(CONFIG_X86) + { + /* Extrapolated from other entries */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D420"), + }, + }, + { + /* 
Reported-by: Hans de Bruin */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D430"), + }, + }, + { + /* Reported-by: Hans de Goede */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D620"), + }, + }, + { + /* Extrapolated from other entries */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D630"), + }, + }, +#endif + { } +}; + static void alps_set_abs_params_st(struct alps_data *priv, struct input_dev *dev1); static void alps_set_abs_params_semi_mt(struct alps_data *priv, @@ -253,9 +290,8 @@ static void alps_process_packet_v1_v2(struct psmouse *psmouse) return; } - /* Dell non interleaved V2 dualpoint has separate stick button bits */ - if (priv->proto_version == ALPS_PROTO_V2 && - priv->flags == (ALPS_DELL | ALPS_PASS | ALPS_DUALPOINT)) { + /* Some models have separate stick button bits */ + if (priv->flags & ALPS_STICK_BITS) { left |= packet[0] & 1; right |= packet[0] & 2; middle |= packet[0] & 4; @@ -2552,8 +2588,6 @@ static int alps_set_protocol(struct psmouse *psmouse, priv->byte0 = protocol->byte0; priv->mask0 = protocol->mask0; priv->flags = protocol->flags; - if (dmi_name_in_vendors("Dell")) - priv->flags |= ALPS_DELL; priv->x_max = 2000; priv->y_max = 1400; @@ -2568,6 +2602,8 @@ static int alps_set_protocol(struct psmouse *psmouse, priv->set_abs_params = alps_set_abs_params_st; priv->x_max = 1023; priv->y_max = 767; + if (dmi_check_system(alps_dmi_has_separate_stick_buttons)) + priv->flags |= ALPS_STICK_BITS; break; case ALPS_PROTO_V3: -- cgit v1.2.3 From ab8579169b79c062935dade949287113c7c1ba73 Mon Sep 17 00:00:00 2001 From: Sergei Shtylyov Date: Sat, 24 Oct 2015 00:46:03 +0300 Subject: sh_eth: fix RX buffer size alignment Both Renesas R-Car and RZ/A1 manuals state that RX buffer length must be a multiple of 32 bytes, while the driver only uses 16 byte granularity... 
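To make the granularity problem concrete, here is a small userspace sketch (not driver code). ALIGN_UP is a local stand-in for the kernel's ALIGN() macro, and the 1538-byte buffer size used in the note below is only an assumed example value. The point is that rounding to 16 bytes can yield a length that is not a multiple of 32.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Old behaviour: descriptor length rounded up to a 16-byte multiple. */
static size_t rx_len_16(size_t rx_buf_sz)
{
	return ALIGN_UP(rx_buf_sz, 16);
}

/* New behaviour: descriptor length rounded up to a 32-byte multiple. */
static size_t rx_len_32(size_t rx_buf_sz)
{
	return ALIGN_UP(rx_buf_sz, 32);
}
```

For the assumed size of 1538 bytes, rx_len_16() gives 1552, which leaves a remainder of 16 when divided by 32, while rx_len_32() gives 1568.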
Signed-off-by: Sergei Shtylyov Signed-off-by: David S. Miller --- drivers/net/ethernet/renesas/sh_eth.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 257ea713b4c1..d8334d8a53b3 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -1148,8 +1148,8 @@ static void sh_eth_ring_format(struct net_device *ndev) /* RX descriptor */ rxdesc = &mdp->rx_ring[i]; - /* The size of the buffer is a multiple of 16 bytes. */ - rxdesc->buffer_length = ALIGN(mdp->rx_buf_sz, 16); + /* The size of the buffer is a multiple of 32 bytes. */ + rxdesc->buffer_length = ALIGN(mdp->rx_buf_sz, 32); dma_addr = dma_map_single(&ndev->dev, skb->data, rxdesc->buffer_length, DMA_FROM_DEVICE); @@ -1506,7 +1506,7 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota) if (mdp->cd->rpadir) skb_reserve(skb, NET_IP_ALIGN); dma_unmap_single(&ndev->dev, rxdesc->addr, - ALIGN(mdp->rx_buf_sz, 16), + ALIGN(mdp->rx_buf_sz, 32), DMA_FROM_DEVICE); skb_put(skb, pkt_len); skb->protocol = eth_type_trans(skb, ndev); @@ -1524,8 +1524,8 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota) for (; mdp->cur_rx - mdp->dirty_rx > 0; mdp->dirty_rx++) { entry = mdp->dirty_rx % mdp->num_rx_ring; rxdesc = &mdp->rx_ring[entry]; - /* The size of the buffer is 16 byte boundary. */ - rxdesc->buffer_length = ALIGN(mdp->rx_buf_sz, 16); + /* The size of the buffer is 32 byte boundary. */ + rxdesc->buffer_length = ALIGN(mdp->rx_buf_sz, 32); if (mdp->rx_skbuff[entry] == NULL) { skb = netdev_alloc_skb(ndev, skbuff_size); -- cgit v1.2.3 From cb3685958dd4c46d7646d244063ea3ec8adf3618 Mon Sep 17 00:00:00 2001 From: Sergei Shtylyov Date: Sat, 24 Oct 2015 00:46:40 +0300 Subject: sh_eth: fix RX buffer size calculation The RX buffer size calculation failed to account for the length granularity (which is now 32 bytes)...
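A rough userspace sketch of why the extra headroom matters, under stated assumptions: rx_align stands in for SH_ETH_RX_ALIGN (its real value depends on the platform and is passed in here as a parameter), the helper names are hypothetical, and the model assumes the buffer start is moved up to the next rx_align boundary while the length is rounded up to 32 bytes.

```c
#include <stddef.h>

#define RX_LEN_GRANULARITY 32u	/* length granularity named in the message */

/* Allocation size as computed after the fix: rx_buf_sz + rx_align + 32 - 1. */
static size_t skbuff_size(size_t rx_buf_sz, size_t rx_align)
{
	return rx_buf_sz + rx_align + RX_LEN_GRANULARITY - 1;
}

/*
 * Returns 1 if a buffer that begins at byte offset 'start', is moved up to
 * the next rx_align boundary (rx_align must be a power of two) and is given
 * a length rounded up to 32 bytes still fits into 'alloc' bytes.
 */
static int fits(size_t start, size_t rx_buf_sz, size_t rx_align, size_t alloc)
{
	size_t aligned_start = (start + rx_align - 1) & ~(rx_align - 1);
	size_t len = (rx_buf_sz + RX_LEN_GRANULARITY - 1) &
		     ~(size_t)(RX_LEN_GRANULARITY - 1);

	return (aligned_start - start) + len <= alloc;
}
```

With an assumed rx_buf_sz of 1538 and rx_align of 32, the old size (rx_buf_sz + rx_align - 1) is too small as soon as the start needs any padding, whereas skbuff_size() covers every start offset.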
Signed-off-by: Sergei Shtylyov Signed-off-by: David S. Miller --- drivers/net/ethernet/renesas/sh_eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index d8334d8a53b3..a484d8beb855 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -1127,7 +1127,7 @@ static void sh_eth_ring_format(struct net_device *ndev) struct sh_eth_txdesc *txdesc = NULL; int rx_ringsize = sizeof(*rxdesc) * mdp->num_rx_ring; int tx_ringsize = sizeof(*txdesc) * mdp->num_tx_ring; - int skbuff_size = mdp->rx_buf_sz + SH_ETH_RX_ALIGN - 1; + int skbuff_size = mdp->rx_buf_sz + SH_ETH_RX_ALIGN + 32 - 1; dma_addr_t dma_addr; mdp->cur_rx = 0; @@ -1450,7 +1450,7 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota) struct sk_buff *skb; u16 pkt_len = 0; u32 desc_status; - int skbuff_size = mdp->rx_buf_sz + SH_ETH_RX_ALIGN - 1; + int skbuff_size = mdp->rx_buf_sz + SH_ETH_RX_ALIGN + 32 - 1; dma_addr_t dma_addr; boguscnt = min(boguscnt, *quota); -- cgit v1.2.3 From 7e3b6e7423d5f994257c1de88e06b509673fdbcf Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 24 Oct 2015 05:47:44 -0700 Subject: ipv6: gre: support SIT encapsulation gre_gso_segment() chokes if SIT frames were aggregated by GRO engine. Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support") Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller --- net/ipv4/gre_offload.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index 5aa46d4b44ef..5a8ee3282550 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -36,7 +36,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb, SKB_GSO_TCP_ECN | SKB_GSO_GRE | SKB_GSO_GRE_CSUM | - SKB_GSO_IPIP))) + SKB_GSO_IPIP | + SKB_GSO_SIT))) goto out; if (!skb->encapsulation) -- cgit v1.2.3 From 8c387ebbaff8652943a1cbcab496aecadc6a8875 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:00 +0100 Subject: net: thunderx: add missing of_node_put for_each_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ local idexpression r.n; expression r,e; @@ for_each_child_of_node(r,n) { ... ( of_node_put(n); | e = n | + of_node_put(n); ? break; ) ... } ... when != n // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 574c49278900..180aa9fabf48 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -977,8 +977,10 @@ static int bgx_init_of_phy(struct bgx *bgx) SET_NETDEV_DEV(&bgx->lmac[lmac].netdev, &bgx->pdev->dev); bgx->lmac[lmac].lmacid = lmac; lmac++; - if (lmac == MAX_LMAC_PER_BGX) + if (lmac == MAX_LMAC_PER_BGX) { + of_node_put(np_child); break; + } } return 0; } -- cgit v1.2.3 From bd252796852193277a07da505601a2f407c70e0b Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:01 +0100 Subject: net: netcp: add missing of_node_put for_each_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ local idexpression r.n; expression r,e; @@ for_each_child_of_node(r,n) { ... ( of_node_put(n); | e = n | + of_node_put(n); ? break; ) ... } ... when != n // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/ethernet/ti/netcp_ethss.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index 6bff8d82ceab..4e70e7586a09 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -2637,8 +2637,10 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev, mac_phy_link = true; slave->open = true; - if (gbe_dev->num_slaves >= gbe_dev->max_num_slaves) + if (gbe_dev->num_slaves >= gbe_dev->max_num_slaves) { + of_node_put(port); break; + } } /* of_phy_connect() is needed only for MAC-PHY interface */ @@ -3137,8 +3139,10 @@ static int gbe_probe(struct netcp_device *netcp_device, struct device *dev, continue; } gbe_dev->num_slaves++; - if (gbe_dev->num_slaves >= gbe_dev->max_num_slaves) + if (gbe_dev->num_slaves >= gbe_dev->max_num_slaves) { + of_node_put(interface); break; + } } of_node_put(interfaces); -- cgit v1.2.3 From 447ed7360037b6e38c0206ddcbd04a256ec94099 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:02 +0100 Subject: netdev/phy: add missing of_node_put for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ local idexpression r.n; expression r,e; @@ for_each_available_child_of_node(r,n) { ... ( of_node_put(n); | e = n | + of_node_put(n); ? break; ) ... } ... when != n // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/phy/mdio-mux.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c index 280c7c311f72..908e8d486342 100644 --- a/drivers/net/phy/mdio-mux.c +++ b/drivers/net/phy/mdio-mux.c @@ -144,6 +144,7 @@ int mdio_mux_init(struct device *dev, dev_err(dev, "Error: Failed to allocate memory for child\n"); ret_val = -ENOMEM; + of_node_put(child_bus_node); break; } cb->bus_number = v; -- cgit v1.2.3 From 028623418766ea64f4256035b06ac6cbc0a67892 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:03 +0100 Subject: net: phy: mdio: add missing of_node_put for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ expression root,e; local idexpression child; @@ for_each_available_child_of_node(root, child) { ... when != of_node_put(child) when != e = child ( return child; | + of_node_put(child); ? return ...; ) ... } // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/phy/mdio-mux-mmioreg.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/phy/mdio-mux-mmioreg.c b/drivers/net/phy/mdio-mux-mmioreg.c index 2377c1341172..7fde454fbc4f 100644 --- a/drivers/net/phy/mdio-mux-mmioreg.c +++ b/drivers/net/phy/mdio-mux-mmioreg.c @@ -113,12 +113,14 @@ static int mdio_mux_mmioreg_probe(struct platform_device *pdev) if (!iprop || len != sizeof(uint32_t)) { dev_err(&pdev->dev, "mdio-mux child node %s is " "missing a 'reg' property\n", np2->full_name); + of_node_put(np2); return -ENODEV; } if (be32_to_cpup(iprop) & ~s->mask) { dev_err(&pdev->dev, "mdio-mux child node %s has " "a 'reg' value with unmasked bits\n", np2->full_name); + of_node_put(np2); return -ENODEV; } } -- cgit v1.2.3 From 81a577034b000964ca791281a975f0ba9a9d7eed Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:06 +0100 Subject: ath6kl: add missing of_node_put for_each_compatible_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ expression e; local idexpression n; @@ for_each_compatible_node(n,...) { ... when != of_node_put(n) when != e = n ( return n; | + of_node_put(n); ? return ...; ) ... } // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/wireless/ath/ath6kl/init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/ath/ath6kl/init.c b/drivers/net/wireless/ath/ath6kl/init.c index 6e473fa4b13c..12241b1c57cd 100644 --- a/drivers/net/wireless/ath/ath6kl/init.c +++ b/drivers/net/wireless/ath/ath6kl/init.c @@ -715,6 +715,7 @@ static bool check_device_tree(struct ath6kl *ar) board_filename, ret); continue; } + of_node_put(node); return true; } return false; -- cgit v1.2.3 From 26b7974d9ad7f93891ee8c39ee63bd2515da7744 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 25 Oct 2015 14:57:07 +0100 Subject: net: mv643xx_eth: add missing of_node_put for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ expression root,e; local idexpression child; @@ for_each_available_child_of_node(root, child) { ... when != of_node_put(child) when != e = child ( return child; | + of_node_put(child); ? return ...; ) ... } // Signed-off-by: Julia Lawall Signed-off-by: David S. 
Miller --- drivers/net/ethernet/marvell/mv643xx_eth.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index e893a35143c5..dfb6d5f79a10 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -2817,8 +2817,10 @@ static int mv643xx_eth_shared_of_probe(struct platform_device *pdev) for_each_available_child_of_node(np, pnp) { ret = mv643xx_eth_shared_of_add_port(pdev, pnp); - if (ret) + if (ret) { + of_node_put(pnp); return ret; + } } return 0; } -- cgit v1.2.3 From 174fd8d369613c4e06660f3704caaba48dac8554 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 22 Oct 2015 09:27:12 +0900 Subject: blkcg: fix incorrect read/write sync/async stat accounting While unifying how blkcg stats are collected, 77ea733884eb ("blkcg: move io_service_bytes and io_serviced stats into blkcg_gq") incorrectly used bio->bi_flags instead of bio->bi_rw to tell the IO type. This caused IOs to be accounted as the wrong type. Fix it.
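As a minimal illustration of the misaccounting (a userspace sketch, not the blkcg code): the flag bits, the enum and the rwstat_add() helper below are hypothetical stand-ins, and the kernel's REQ_* values differ. The helper buckets a value by direction and by sync/async purely from the flag word it is handed, so handing it an unrelated word puts the IO into the wrong buckets.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the kernel's REQ_* values differ. */
#define SK_REQ_WRITE (1u << 0)
#define SK_REQ_SYNC  (1u << 4)

enum { ST_READ, ST_WRITE, ST_SYNC, ST_ASYNC, ST_NR };

struct rwstat {
	uint64_t cnt[ST_NR];
};

/* Bucket 'val' by direction and by sync/async, based on the request flags. */
static void rwstat_add(struct rwstat *st, unsigned int rw, uint64_t val)
{
	st->cnt[(rw & SK_REQ_WRITE) ? ST_WRITE : ST_READ] += val;
	st->cnt[(rw & SK_REQ_SYNC) ? ST_SYNC : ST_ASYNC] += val;
}
```

A synchronous write passed with its real flags lands in the write and sync buckets; the same IO passed with a word that happens to have neither bit set is counted as an asynchronous read.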
Signed-off-by: Tejun Heo Fixes: 77ea733884eb ("blkcg: move io_service_bytes and io_serviced stats into blkcg_gq") Reviewed-by: Jeff Moyer Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 0a5cc7a1109b..c02e669945e9 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -713,9 +713,9 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q, if (!throtl) { blkg = blkg ?: q->root_blkg; - blkg_rwstat_add(&blkg->stat_bytes, bio->bi_flags, + blkg_rwstat_add(&blkg->stat_bytes, bio->bi_rw, bio->bi_iter.bi_size); - blkg_rwstat_add(&blkg->stat_ios, bio->bi_flags, 1); + blkg_rwstat_add(&blkg->stat_ios, bio->bi_rw, 1); } rcu_read_unlock(); -- cgit v1.2.3 From a22c4d7e34402ccdf3414f64c50365436eba7b93 Mon Sep 17 00:00:00 2001 From: Ming Lin Date: Thu, 22 Oct 2015 09:59:42 -0700 Subject: block: re-add discard_granularity and alignment checks In commit b49a087("block: remove split code in blkdev_issue_{discard,write_same}"), discard_granularity and alignment checks were removed. Ideally, with bio late splitting, the upper layers shouldn't need to depend on device's limits. Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe device when running mkfs.xfs. We have not found the root cause yet. This patch re-adds discard_granularity and alignment checks by reverting the related changes in commit b49a087. The good thing is now we can remove the 2G discard size cap and just use UINT_MAX to avoid bi_size overflow.
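The re-added alignment check can be pictured with the following userspace sketch. It is only an approximation of the rounding, written with plain 64-bit arithmetic instead of sector_div(); the function name is hypothetical, and it assumes granularity >= 1, alignment < granularity and end >= alignment.

```c
#include <stdint.h>

/*
 * Pull 'end' back to the last sector at or before it whose remainder modulo
 * 'granularity' equals 'alignment'. An already aligned 'end' is returned
 * unchanged.
 */
static uint64_t round_down_discard_end(uint64_t end, uint64_t granularity,
				       uint64_t alignment)
{
	if (end % granularity == alignment)
		return end;
	return ((end - alignment) / granularity) * granularity + alignment;
}
```

With a granularity of 8 sectors and an alignment of 2, an end sector of 100 is pulled back to 98, so the next split starts on an aligned sector. With a granularity of 1 every end sector is already aligned and nothing changes.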
Reviewed-by: Christoph Hellwig Tested-by: Christoph Hellwig Signed-off-by: Ming Lin Reviewed-by: Mike Snitzer Signed-off-by: Jens Axboe --- block/blk-lib.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index bd40292e5009..9ebf65379556 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio) bio_put(bio); } -/* - * Ensure that max discard sectors doesn't overflow bi_size and hopefully - * it is of the proper granularity as long as the granularity is a power - * of two. - */ -#define MAX_BIO_SECTORS ((1U << 31) >> 9) - /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; + unsigned int granularity; + int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, if (!blk_queue_discard(q)) return -EOPNOTSUPP; + /* Zero-sector (unknown) and one-sector granularities are the same. 
*/ + granularity = max(q->limits.discard_granularity >> 9, 1U); + alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; + if (flags & BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) return -EOPNOTSUPP; @@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(&plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect; + sector_t end_sect, tmp; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, break; } - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS); + /* Make sure bi_size doesn't overflow */ + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9); + + /* + * If splitting a request, and the next starting sector would be + * misaligned, stop the discard at the previous aligned sector. + */ end_sect = sector + req_sects; + tmp = end_sect; + if (req_sects < nr_sects && + sector_div(tmp, granularity) != alignment) { + end_sect = end_sect - alignment; + sector_div(end_sect, granularity); + end_sect = end_sect * granularity + alignment; + req_sects = end_sect - sector; + } bio->bi_iter.bi_sector = sector; bio->bi_end_io = bio_batch_end_io; -- cgit v1.2.3 From c2229fe1430d4e1c70e36520229dd64a87802b20 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Tue, 27 Oct 2015 15:06:45 -0700 Subject: fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key We were computing the child index in cases where the key value we were looking for was actually less than the base key of the tnode. As a result we were getting incorrect index values that would cause us to skip over some children. To fix this I have added a test that will force us to use child index 0 if the key we are looking for is less than the key of the current tnode. Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Brian Rak Signed-off-by: Alexander Duyck Signed-off-by: David S. 
Miller --- net/ipv4/fib_trie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 6c2af797f2f9..744e5936c10d 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1569,7 +1569,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **tn, t_key key) do { /* record parent and next child index */ pn = n; - cindex = key ? get_index(key, pn) : 0; + cindex = (key > pn->key) ? get_index(key, pn) : 0; if (cindex >> pn->bits) break; -- cgit v1.2.3 From 74c16618137f1505b0a32dea3ec73a2ef6f8f842 Mon Sep 17 00:00:00 2001 From: Joe Stringer Date: Sun, 25 Oct 2015 20:21:48 -0700 Subject: openvswitch: Fix double-free on ip_defrag() errors If ip_defrag() returns an error other than -EINPROGRESS, then the skb is freed. When handle_fragments() passes this back up to do_execute_actions(), it will be freed again. Prevent this double free by never freeing the skb in do_execute_actions() for errors returned by ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal Signed-off-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller --- net/openvswitch/actions.c | 4 ++-- net/openvswitch/conntrack.c | 17 +++++++++++++---- net/openvswitch/conntrack.h | 1 + 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 0bf0f406de52..dba635d086b2 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1109,8 +1109,8 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, nla_data(a)); /* Hide stolen IP fragments from user space. */ - if (err == -EINPROGRESS) - return 0; + if (err) + return err == -EINPROGRESS ? 
0 : err; break; } diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index a5ec34f8502f..b5dcc0abde66 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -293,6 +293,9 @@ static int ovs_ct_helper(struct sk_buff *skb, u16 proto) return helper->help(skb, protoff, ct, ctinfo); } +/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero + * value if 'skb' is freed. + */ static int handle_fragments(struct net *net, struct sw_flow_key *key, u16 zone, struct sk_buff *skb) { @@ -308,8 +311,8 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, return err; ovs_cb.mru = IPCB(skb)->frag_max_size; - } else if (key->eth.type == htons(ETH_P_IPV6)) { #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) + } else if (key->eth.type == htons(ETH_P_IPV6)) { enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; struct sk_buff *reasm; @@ -318,17 +321,18 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, if (!reasm) return -EINPROGRESS; - if (skb == reasm) + if (skb == reasm) { + kfree_skb(skb); return -EINVAL; + } key->ip.proto = ipv6_hdr(reasm)->nexthdr; skb_morph(skb, reasm); consume_skb(reasm); ovs_cb.mru = IP6CB(skb)->frag_max_size; -#else - return -EPFNOSUPPORT; #endif } else { + kfree_skb(skb); return -EPFNOSUPPORT; } @@ -473,6 +477,9 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels) return false; } +/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero + * value if 'skb' is freed. 
+ */ int ovs_ct_execute(struct net *net, struct sk_buff *skb, struct sw_flow_key *key, const struct ovs_conntrack_info *info) @@ -508,6 +515,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, &info->labels.mask); err: skb_push(skb, nh_ofs); + if (err) + kfree_skb(skb); return err; } diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h index 82e0dfc66028..a7544f405c16 100644 --- a/net/openvswitch/conntrack.h +++ b/net/openvswitch/conntrack.h @@ -67,6 +67,7 @@ static inline int ovs_ct_execute(struct net *net, struct sk_buff *skb, struct sw_flow_key *key, const struct ovs_conntrack_info *info) { + kfree_skb(skb); return -ENOTSUPP; } -- cgit v1.2.3 From 190b8ffbb700a9aa47acc559779bc79c0cb14766 Mon Sep 17 00:00:00 2001 From: Joe Stringer Date: Sun, 25 Oct 2015 20:21:49 -0700 Subject: ipv6: Export nf_ct_frag6_consume_orig() This is needed in openvswitch to fix an skb leak in the next patch. Signed-off-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller --- net/ipv6/netfilter/nf_conntrack_reasm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 701cd2bae0a9..c7196ad1d69f 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -646,6 +646,7 @@ void nf_ct_frag6_consume_orig(struct sk_buff *skb) s = s2; } } +EXPORT_SYMBOL_GPL(nf_ct_frag6_consume_orig); static int nf_ct_net_init(struct net *net) { -- cgit v1.2.3 From 6f5cadee44d83395dcd78d557b577e1021e192e4 Mon Sep 17 00:00:00 2001 From: Joe Stringer Date: Sun, 25 Oct 2015 20:21:50 -0700 Subject: openvswitch: Fix skb leak using IPv6 defrag nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the reassembly is successful, expects the caller to free all of the original skbs using nf_ct_frag6_consume_orig(). 
This call was previously missing, meaning that the original fragments were never freed (with the exception of the last fragment to arrive). Fix this by ensuring that all original fragments except for the last fragment are freed via nf_ct_frag6_consume_orig(). The last fragment will be morphed into the head, so it must not be freed yet. Furthermore, retain the ->next pointer for the head after skb_morph(). Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal Signed-off-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller --- net/openvswitch/conntrack.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index b5dcc0abde66..50095820edb7 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -326,8 +326,15 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, return -EINVAL; } + /* Don't free 'skb' even though it is one of the original + * fragments, as we're going to morph it into the head. + */ + skb_get(skb); + nf_ct_frag6_consume_orig(reasm); + key->ip.proto = ipv6_hdr(reasm)->nexthdr; skb_morph(skb, reasm); + skb->next = reasm->next; consume_skb(reasm); ovs_cb.mru = IP6CB(skb)->frag_max_size; #endif -- cgit v1.2.3 From 0b7c874348ea14ec3c358fe95e56d6f830540248 Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Mon, 26 Oct 2015 12:24:22 -0400 Subject: forcedeth: fix unilateral interrupt disabling in netpoll path Forcedeth currently uses disable_irq_lockdep and enable_irq_lockdep, which in some configurations simply calls local_irq_disable. 
This causes errant warnings in the netpoll path as in netpoll_send_skb_on_dev, where we disable irqs using local_irq_save, leading to the following warning: WARNING: at net/core/netpoll.c:352 netpoll_send_skb_on_dev+0x243/0x250() (Not tainted) Hardware name: netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (nv_start_xmit_optimized+0x0/0x860 [forcedeth]) Modules linked in: netconsole(+) configfs ipv6 iptable_filter ip_tables ppdev parport_pc parport sg microcode serio_raw edac_core edac_mce_amd k8temp snd_hda_codec_realtek snd_hda_codec_generic forcedeth snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1940, comm: modprobe Not tainted 2.6.32-573.7.1.el6.x86_64.debug #1 Call Trace: [] ? warn_slowpath_common+0x91/0xe0 [] ? warn_slowpath_fmt+0x46/0x60 [] ? nv_start_xmit_optimized+0x0/0x860 [forcedeth] [] ? netpoll_send_skb_on_dev+0x243/0x250 [] ? netpoll_send_udp+0x229/0x270 [] ? write_msg+0x39/0x110 [netconsole] [] ? write_msg+0xbb/0x110 [netconsole] [] ? __call_console_drivers+0x75/0x90 [] ? _call_console_drivers+0x4a/0x80 [] ? release_console_sem+0xe5/0x250 [] ? register_console+0x190/0x3e0 [] ? init_netconsole+0x1a6/0x216 [netconsole] [] ? init_netconsole+0x0/0x216 [netconsole] [] ? do_one_initcall+0xc0/0x280 [] ? sys_init_module+0xe3/0x260 [] ? system_call_fastpath+0x16/0x1b ---[ end trace f349c7af88e6a6d5 ]--- console [netcon0] enabled netconsole: network logging started Fix it by modifying the forcedeth code to use disable_irq_nosync_lockdep_irqsave instead, which saves and restores irq state properly.
This also saves us a little code in the process. Tested by the reporter, with successful results. Patch applies to the head of the net tree. Signed-off-by: Neil Horman CC: "David S. Miller" Reported-by: Vasily Averin Signed-off-by: David S. Miller --- drivers/net/ethernet/nvidia/forcedeth.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c index a41bb5e6b954..75e88f4c1531 100644 --- a/drivers/net/ethernet/nvidia/forcedeth.c +++ b/drivers/net/ethernet/nvidia/forcedeth.c @@ -4076,6 +4076,8 @@ static void nv_do_nic_poll(unsigned long data) struct fe_priv *np = netdev_priv(dev); u8 __iomem *base = get_hwbase(dev); u32 mask = 0; + unsigned long flags; + unsigned int irq = 0; /* * First disable irq(s) and then @@ -4085,25 +4087,27 @@ static void nv_do_nic_poll(unsigned long data) if (!using_multi_irqs(dev)) { if (np->msi_flags & NV_MSI_X_ENABLED) - disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector); + irq = np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector; else - disable_irq_lockdep(np->pci_dev->irq); + irq = np->pci_dev->irq; mask = np->irqmask; } else { if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) { - disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector); + irq = np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector; mask |= NVREG_IRQ_RX_ALL; } if (np->nic_poll_irq & NVREG_IRQ_TX_ALL) { - disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector); + irq = np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector; mask |= NVREG_IRQ_TX_ALL; } if (np->nic_poll_irq & NVREG_IRQ_OTHER) { - disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector); + irq = np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector; mask |= NVREG_IRQ_OTHER; } } - /* disable_irq() contains synchronize_irq, thus no irq handler can run now */ + + disable_irq_nosync_lockdep_irqsave(irq, &flags); + synchronize_irq(irq); if (np->recover_error) { np->recover_error = 0; @@ -4156,28 +4160,22
@@ static void nv_do_nic_poll(unsigned long data) nv_nic_irq_optimized(0, dev); else nv_nic_irq(0, dev); - if (np->msi_flags & NV_MSI_X_ENABLED) - enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector); - else - enable_irq_lockdep(np->pci_dev->irq); } else { if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) { np->nic_poll_irq &= ~NVREG_IRQ_RX_ALL; nv_nic_irq_rx(0, dev); - enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector); } if (np->nic_poll_irq & NVREG_IRQ_TX_ALL) { np->nic_poll_irq &= ~NVREG_IRQ_TX_ALL; nv_nic_irq_tx(0, dev); - enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector); } if (np->nic_poll_irq & NVREG_IRQ_OTHER) { np->nic_poll_irq &= ~NVREG_IRQ_OTHER; nv_nic_irq_other(0, dev); - enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector); } } + enable_irq_lockdep_irqrestore(irq, &flags); } #ifdef CONFIG_NET_POLL_CONTROLLER -- cgit v1.2.3 From 8ce675ff39b9958d1c10f86cf58e357efaafc856 Mon Sep 17 00:00:00 2001 From: Sowmini Varadhan Date: Mon, 26 Oct 2015 12:46:37 -0400 Subject: RDS-TCP: Recover correctly from pskb_pull()/pskb_trim() failure in rds_tcp_data_recv Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock(). Signed-off-by: Sowmini Varadhan Acked-by: Santosh Shilimkar Signed-off-by: David S.
Miller --- net/rds/tcp_recv.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c index fbc5ef88bc0e..27a992154804 100644 --- a/net/rds/tcp_recv.c +++ b/net/rds/tcp_recv.c @@ -214,8 +214,15 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb, } to_copy = min(tc->t_tinc_data_rem, left); - pskb_pull(clone, offset); - pskb_trim(clone, to_copy); + if (!pskb_pull(clone, offset) || + pskb_trim(clone, to_copy)) { + pr_warn("rds_tcp_data_recv: pull/trim failed " + "left %zu data_rem %zu skb_len %d\n", + left, tc->t_tinc_data_rem, skb->len); + kfree_skb(clone); + desc->error = -ENOMEM; + goto out; + } skb_queue_tail(&tinc->ti_skb_list, clone); rdsdebug("skb %p data %p len %d off %u to_copy %zu -> " -- cgit v1.2.3 From 20986ed826cbb36bb8f2d77f872e3c52d8d30647 Mon Sep 17 00:00:00 2001 From: "Lendacky, Thomas" Date: Mon, 26 Oct 2015 17:13:54 -0500 Subject: amd-xgbe: Fix race between access of desc and desc index During Tx cleanup it's still possible for the descriptor data to be read ahead of the descriptor index. A memory barrier is required between the read of the descriptor index and the start of the Tx cleanup loop. This allows a change to a lighter-weight barrier in the Tx transmit routine just before updating the current descriptor index. Since the memory barrier does result in extra overhead on arm64, keep the previous change to not chase the current descriptor value. This prevents the execution of the barrier for each loop performed. Suggested-by: Alexander Duyck Signed-off-by: Tom Lendacky Signed-off-by: David S. 
Miller --- drivers/net/ethernet/amd/xgbe/xgbe-dev.c | 2 +- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c index e9ab8b9f3b9c..f672dba345f7 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c @@ -1595,7 +1595,7 @@ static void xgbe_dev_xmit(struct xgbe_channel *channel) packet->rdesc_count, 1); /* Make sure ownership is written to the descriptor */ - wmb(); + smp_wmb(); ring->cur = cur_index + 1; if (!packet->skb->xmit_more || diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index d2b77d985441..dde0486667e0 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1816,6 +1816,10 @@ static int xgbe_tx_poll(struct xgbe_channel *channel) return 0; cur = ring->cur; + + /* Be sure we get ring->cur before accessing descriptor data */ + smp_rmb(); + txq = netdev_get_tx_queue(netdev, channel->queue_index); while ((processed < XGBE_TX_DESC_MAX_PROC) && -- cgit v1.2.3 From 85ff8a43f39fa6d2f970b5e1e5c03df87abde242 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Mon, 26 Oct 2015 17:02:19 -0700 Subject: bpf: sample: define aarch64 specific registers Define aarch64 specific registers for building bpf samples correctly. Signed-off-by: Yang Shi Acked-by: Alexei Starovoitov Signed-off-by: David S. 
Miller --- samples/bpf/bpf_helpers.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index 3a44d3a272af..af44e564d6dd 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -86,5 +86,17 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag #define PT_REGS_RC(x) ((x)->gprs[2]) #define PT_REGS_SP(x) ((x)->gprs[15]) +#elif defined(__aarch64__) + +#define PT_REGS_PARM1(x) ((x)->regs[0]) +#define PT_REGS_PARM2(x) ((x)->regs[1]) +#define PT_REGS_PARM3(x) ((x)->regs[2]) +#define PT_REGS_PARM4(x) ((x)->regs[3]) +#define PT_REGS_PARM5(x) ((x)->regs[4]) +#define PT_REGS_RET(x) ((x)->regs[30]) +#define PT_REGS_FP(x) ((x)->regs[29]) /* Works only with CONFIG_FRAME_POINTER */ +#define PT_REGS_RC(x) ((x)->regs[0]) +#define PT_REGS_SP(x) ((x)->sp) + #endif #endif -- cgit v1.2.3 From e407f39afdc0741dcf20aed100b8e738ccab7cb1 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Tue, 27 Oct 2015 11:37:39 +0200 Subject: vhost: fix performance on LE hosts commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian support for legacy devices") introduced a minor regression: even with cross-endian disabled, and even on LE host, vhost_is_little_endian is checking is_le flag so there's always a branch. To fix, simply check virtio_legacy_is_little_endian first. Cc: Greg Kurz Signed-off-by: Michael S. Tsirkin Reviewed-by: Greg Kurz Signed-off-by: David S. 
Miller --- drivers/vhost/vhost.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 4772862b71a7..d3f767448a72 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -183,10 +183,17 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit) return vq->acked_features & (1ULL << bit); } +#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq) { return vq->is_le; } +#else +static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq) +{ + return virtio_legacy_is_little_endian() || vq->is_le; +} +#endif /* Memory accessors */ static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val) -- cgit v1.2.3 From 092bf0fc80f5fb7928244ad63d8a2a8df8a72a3e Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Tue, 27 Oct 2015 17:36:19 +0200 Subject: net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present We do not set the ins_vlan field to zero when no vlan id is present in the packet. Since WQEs in the TX ring are not zeroed out between uses, this oversight could result in having vlan flags present in the WQE ctrl segment when no vlan is present. Fixes: e38af4faf01d ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan') Reported-by: Gideon Naim Signed-off-by: Jack Morgenstein Signed-off-by: Or Gerlitz Signed-off-by: David S.
Miller --- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 494e7762fdb1..4421bf5463f6 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -964,6 +964,8 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) tx_desc->ctrl.ins_vlan = MLX4_WQE_CTRL_INS_SVLAN; else if (vlan_proto == ETH_P_8021Q) tx_desc->ctrl.ins_vlan = MLX4_WQE_CTRL_INS_CVLAN; + else + tx_desc->ctrl.ins_vlan = 0; tx_desc->ctrl.fence_size = real_size; -- cgit v1.2.3 From c02b05011fadf8e409e41910217ca689f2fc9d91 Mon Sep 17 00:00:00 2001 From: Carol L Soto Date: Tue, 27 Oct 2015 17:36:20 +0200 Subject: net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes When doing memcpy/memset of EQEs, we should use sizeof struct mlx4_eqe as the base size and not caps.eqe_size which could be bigger. If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt data in the master context. When using a 64 byte stride, the memcpy copied over 63 bytes to the slave_eq structure. This resulted in copying over the entire eqe of interest, including its ownership bit -- and also 31 bytes of garbage into the next WQE in the slave EQ -- which did NOT include the ownership bit (and therefore had no impact). However, once the stride is increased to 128, we are overwriting the ownership bits of *three* eqes in the slave_eq struct. This results in an incorrect ownership bit for those eqes, which causes the eq to seem to be full. The issue therefore surfaced only once 128-byte EQEs started being used in SRIOV and (overarchitectures that have 128/256 byte cache-lines such as PPC) - e.g after commit 77507aa249ae "net/mlx4_core: Enable CQE/EQE stride support". Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto Signed-off-by: Jack Morgenstein Signed-off-by: Or Gerlitz Signed-off-by: David S. 
Miller --- drivers/net/ethernet/mellanox/mlx4/cmd.c | 2 +- drivers/net/ethernet/mellanox/mlx4/eq.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c index 0a3202047569..2177e56ed0be 100644 --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c @@ -2398,7 +2398,7 @@ int mlx4_multi_func_init(struct mlx4_dev *dev) } } - memset(&priv->mfunc.master.cmd_eqe, 0, dev->caps.eqe_size); + memset(&priv->mfunc.master.cmd_eqe, 0, sizeof(struct mlx4_eqe)); priv->mfunc.master.cmd_eqe.type = MLX4_EVENT_TYPE_CMD; INIT_WORK(&priv->mfunc.master.comm_work, mlx4_master_comm_channel); diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c index c34488479365..603d1c3d3b2e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/eq.c +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c @@ -196,7 +196,7 @@ static void slave_event(struct mlx4_dev *dev, u8 slave, struct mlx4_eqe *eqe) return; } - memcpy(s_eqe, eqe, dev->caps.eqe_size - 1); + memcpy(s_eqe, eqe, sizeof(struct mlx4_eqe) - 1); s_eqe->slave_id = slave; /* ensure all information is written before setting the ownersip bit */ dma_wmb(); -- cgit v1.2.3 From 977bf062bba3eb8d03f66d5b4e227e5d7ebc1e08 Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Tue, 27 Oct 2015 17:20:05 +0900 Subject: powerpc/dma: dma_set_coherent_mask() should not be GPL only When turning this from inline to an exported function I was a bit over-eager and made it GPL only. This prevents the use of pretty much all non-GPL PCI drivers, which is a bit over the top. Let's bring it back in line with other architectures.
Fixes: 817820b0226a ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask") Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 59503ed98e5f..3f1472a78f39 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -303,7 +303,7 @@ int dma_set_coherent_mask(struct device *dev, u64 mask) dev->coherent_dma_mask = mask; return 0; } -EXPORT_SYMBOL_GPL(dma_set_coherent_mask); +EXPORT_SYMBOL(dma_set_coherent_mask); #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16) -- cgit v1.2.3 From 589cb22bbedacf325951014c07a35a2b01ca57f6 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 15 Oct 2015 13:55:53 +0100 Subject: arm64: compat: fix stxr failure case in SWP emulation If the STXR instruction fails in the SWP emulation code, we leave *data overwritten with the loaded value, therefore corrupting the data written by a subsequent, successful attempt. This patch re-jigs the code so that we only write back to *data once we know that the update has happened. 
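The ordering issue can be modelled in plain C as below. This is only a sketch of the idea, not the emulation code: the exclusive load/store pair is replaced by a hypothetical struct with a flag that forces the store to fail, and the -EAGAIN return value is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory word whose exclusive store may fail. */
struct sketch_word {
	uint32_t value;
	bool store_fails;	/* stands in for a lost exclusive monitor */
};

/*
 * Swap *data with the word in memory.
 * Returns 0 on success, -EAGAIN if the exclusive store failed.
 * On failure *data is left untouched, so a retry writes the value
 * the caller originally asked for.
 */
static int sketch_swp(struct sketch_word *mem, uint32_t *data)
{
	uint32_t loaded = mem->value;	/* "ldxr": old value into a temporary */

	if (mem->store_fails)		/* "stxr" failed: memory unchanged */
		return -EAGAIN;

	mem->value = *data;		/* "stxr" succeeded */
	*data = loaded;			/* only now hand back the old value */
	return 0;
}
```

Loading straight into *data before the store is known to have succeeded would make the retry write back the loaded value instead of the caller's value, which is the corruption described above.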
Cc: Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm") Reported-by: Shengjiu Wang Reported-by: Vladimir Murzin Signed-off-by: Will Deacon --- arch/arm64/kernel/armv8_deprecated.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c index bcee7abac68e..937f5e58a4d3 100644 --- a/arch/arm64/kernel/armv8_deprecated.c +++ b/arch/arm64/kernel/armv8_deprecated.c @@ -284,21 +284,23 @@ static void register_insn_emulation_sysctl(struct ctl_table *table) __asm__ __volatile__( \ ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN, \ CONFIG_ARM64_PAN) \ - " mov %w2, %w1\n" \ - "0: ldxr"B" %w1, [%3]\n" \ - "1: stxr"B" %w0, %w2, [%3]\n" \ + "0: ldxr"B" %w2, [%3]\n" \ + "1: stxr"B" %w0, %w1, [%3]\n" \ " cbz %w0, 2f\n" \ " mov %w0, %w4\n" \ + " b 3f\n" \ "2:\n" \ + " mov %w1, %w2\n" \ + "3:\n" \ " .pushsection .fixup,\"ax\"\n" \ " .align 2\n" \ - "3: mov %w0, %w5\n" \ - " b 2b\n" \ + "4: mov %w0, %w5\n" \ + " b 3b\n" \ " .popsection" \ " .pushsection __ex_table,\"a\"\n" \ " .align 3\n" \ - " .quad 0b, 3b\n" \ - " .quad 1b, 3b\n" \ + " .quad 0b, 4b\n" \ + " .quad 1b, 4b\n" \ " .popsection\n" \ ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN, \ CONFIG_ARM64_PAN) \ -- cgit v1.2.3 From e13d918a19a7b6cba62b32884f5e336e764c2cc6 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Tue, 27 Oct 2015 17:29:10 +0000 Subject: arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap Commit dd006da21646 ("arm64: mm: increase VA range of identity map") introduced a mechanism to extend the virtual memory map range to support arm64 systems with system RAM located at very high offset, where the identity mapping used to enable/disable the MMU requires additional translation levels to map the physical memory at an equal virtual offset. 
The kernel detects at boot time the tcr_el1.t0sz value required by the identity mapping and sets up the tcr_el1.t0sz register field accordingly, any time the identity map is required in the kernel (ie when enabling the MMU). After enabling the MMU, in the cold boot path the kernel resets the tcr_el1.t0sz to its default value (ie the actual configuration value for the system virtual address space) so that after enabling the MMU the memory space translated by ttbr0_el1 is restored as expected. Commit dd006da21646 ("arm64: mm: increase VA range of identity map") also added code to set up the tcr_el1.t0sz value when the kernel resumes from low-power states with the MMU off through cpu_resume() in order to effectively use the identity mapping to enable the MMU but failed to add the code required to restore the tcr_el1.t0sz to its default value, when the core returns to the kernel with the MMU enabled, so that the kernel might end up running with tcr_el1.t0sz value set-up for the identity mapping which can be lower than the value required by the actual virtual address space, resulting in an erroneous set-up. This patch adds code in the resume path that restores the tcr_el1.t0sz default value upon core resume, mirroring this way the cold boot path behaviour therefore fixing the issue.
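For reference, the relation between t0sz and the size of the address range translated through TTBR0 can be written down as a small sketch. The helper names are made up for the example, and the 39-bit and 48-bit figures are only sample configurations, not values taken from this patch.

```c
/* T0SZ encodes the TTBR0 region size as 2^(64 - T0SZ) bytes. */
static unsigned int sketch_t0sz_for_va_bits(unsigned int va_bits)
{
	return 64 - va_bits;
}

/* Number of virtual address bits translated via TTBR0 for a given T0SZ. */
static unsigned int sketch_va_bits_for_t0sz(unsigned int t0sz)
{
	return 64 - t0sz;
}
```

For example, a kernel configured for 39-bit virtual addresses uses t0sz = 25 by default, while an identity map that has to reach 48-bit physical addresses needs t0sz = 16. Leaving the lower value in place after resume means TTBR0 walks are done for a larger range than the page tables were built for.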
Cc: Cc: Catalin Marinas Fixes: dd006da21646 ("arm64: mm: increase VA range of identity map") Acked-by: Ard Biesheuvel Signed-off-by: Lorenzo Pieralisi Signed-off-by: James Morse Signed-off-by: Will Deacon --- arch/arm64/kernel/suspend.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c index 8297d502217e..44ca4143b013 100644 --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c @@ -80,17 +80,21 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) if (ret == 0) { /* * We are resuming from reset with TTBR0_EL1 set to the - * idmap to enable the MMU; restore the active_mm mappings in - * TTBR0_EL1 unless the active_mm == &init_mm, in which case - * the thread entered cpu_suspend with TTBR0_EL1 set to - * reserved TTBR0 page tables and should be restored as such. + * idmap to enable the MMU; set the TTBR0 to the reserved + * page tables to prevent speculative TLB allocations, flush + * the local tlb and set the default tcr_el1.t0sz so that + * the TTBR0 address space set-up is properly restored. + * If the current active_mm != &init_mm we entered cpu_suspend + * with mappings in TTBR0 that must be restored, so we switch + * them back to complete the address space configuration + * restoration before returning. */ - if (mm == &init_mm) - cpu_set_reserved_ttbr0(); - else - cpu_switch_mm(mm->pgd, mm); - + cpu_set_reserved_ttbr0(); flush_tlb_all(); + cpu_set_default_tcr_t0sz(); + + if (mm != &init_mm) + cpu_switch_mm(mm->pgd, mm); /* * Restore per-cpu offset before any kernel -- cgit v1.2.3 From 9702970c7bd3e2d6fecb642a190269131d4ac16c Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 28 Oct 2015 16:56:13 +0000 Subject: Revert "ARM64: unwind: Fix PC calculation" This reverts commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63. With this patch applied, we were the only architecture making this sort of adjustment to the PC calculation in the unwinder. 
This causes problems for ftrace, where the PC values are matched against the contents of the stack frames in the callchain and fail to match any records after the address adjustment. Whilst there has been some effort to change ftrace to work around this, those patches are not yet ready for mainline and, since we're the odd architecture in this regard, let's just step in line with other architectures (like arch/arm/) for now. Cc: Signed-off-by: Will Deacon --- arch/arm64/kernel/stacktrace.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index 407991bf79f5..ccb6078ed9f2 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -48,11 +48,7 @@ int notrace unwind_frame(struct stackframe *frame) frame->sp = fp + 0x10; frame->fp = *(unsigned long *)(fp); - /* - * -4 here because we care about the PC at time of bl, - * not where the return will go. - */ - frame->pc = *(unsigned long *)(fp + 8) - 4; + frame->pc = *(unsigned long *)(fp + 8); return 0; } -- cgit v1.2.3 From d305c4773458fdd6ff9c52bfdea8c67fbd3b2072 Mon Sep 17 00:00:00 2001 From: Émeric MASCHINO Date: Tue, 22 Sep 2015 23:58:48 +0200 Subject: [IA64] Wire up kcmp syscall MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit systemd > 218 fails to compile on ia64 with: error: ‘__NR_kcmp’ undeclared [1]. I've been told that this is because the kcmp syscall hasn't been wired up for the ia64 arch [2]. The proposed patch thus wires up the kcmp syscall for the ia64 arch.
[1] https://bugs.gentoo.org/show_bug.cgi?id=560492 [2] https://bugs.gentoo.org/show_bug.cgi?id=560492#c17 Signed-off-by: Émeric MASCHINO Signed-off-by: Tony Luck --- arch/ia64/include/asm/unistd.h | 2 +- arch/ia64/include/uapi/asm/unistd.h | 1 + arch/ia64/kernel/entry.S | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/ia64/include/asm/unistd.h b/arch/ia64/include/asm/unistd.h index 99c96a5e6016..db73390568c8 100644 --- a/arch/ia64/include/asm/unistd.h +++ b/arch/ia64/include/asm/unistd.h @@ -11,7 +11,7 @@ -#define NR_syscalls 321 /* length of syscall table */ +#define NR_syscalls 322 /* length of syscall table */ /* * The following defines stop scripts/checksyscalls.sh from complaining about diff --git a/arch/ia64/include/uapi/asm/unistd.h b/arch/ia64/include/uapi/asm/unistd.h index 98e94e19a5a0..9038726e7d26 100644 --- a/arch/ia64/include/uapi/asm/unistd.h +++ b/arch/ia64/include/uapi/asm/unistd.h @@ -334,5 +334,6 @@ #define __NR_execveat 1342 #define __NR_userfaultfd 1343 #define __NR_membarrier 1344 +#define __NR_kcmp 1345 #endif /* _UAPI_ASM_IA64_UNISTD_H */ diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S index 37cc7a65cd3e..dcd97f84d065 100644 --- a/arch/ia64/kernel/entry.S +++ b/arch/ia64/kernel/entry.S @@ -1770,5 +1770,6 @@ sys_call_table: data8 sys_execveat data8 sys_userfaultfd data8 sys_membarrier + data8 sys_kcmp // 1345 .org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls -- cgit v1.2.3 From 1e0d69a9cc9172d7896c2113f983a74f6e8ff303 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Wed, 28 Oct 2015 13:21:03 +0100 Subject: Revert "Merge branch 'ipv6-overflow-arith'" Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. 
Cc: Linus Torvalds Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- include/linux/compiler-gcc.h | 4 ---- include/linux/overflow-arith.h | 18 ------------------ net/ipv6/ip6_output.c | 6 +----- 3 files changed, 1 insertion(+), 27 deletions(-) delete mode 100644 include/linux/overflow-arith.h diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 82c159e0532a..dfaa7b3e9ae9 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -237,10 +237,6 @@ #define KASAN_ABI_VERSION 3 #endif -#if GCC_VERSION >= 50000 -#define CC_HAVE_BUILTIN_OVERFLOW -#endif - #endif /* gcc version >= 40000 specific checks */ #if !defined(__noclone) diff --git a/include/linux/overflow-arith.h b/include/linux/overflow-arith.h deleted file mode 100644 index e12ccf854a70..000000000000 --- a/include/linux/overflow-arith.h +++ /dev/null @@ -1,18 +0,0 @@ -#pragma once - -#include - -#ifdef CC_HAVE_BUILTIN_OVERFLOW - -#define overflow_usub __builtin_usub_overflow - -#else - -static inline bool overflow_usub(unsigned int a, unsigned int b, - unsigned int *res) -{ - *res = a - b; - return *res > a ? 
true : false; -} - -#endif diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 8dddb45c433e..d03d6da772f3 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -28,7 +28,6 @@ #include #include -#include #include #include #include @@ -585,10 +584,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb, if (np->frag_size) mtu = np->frag_size; } - - if (overflow_usub(mtu, hlen + sizeof(struct frag_hdr), &mtu) || - mtu <= 7) - goto fail_toobig; + mtu -= hlen + sizeof(struct frag_hdr); frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr, &ipv6_hdr(skb)->saddr); -- cgit v1.2.3 From 89bc7848a91bc99532f5c21b2885472ba710f249 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Wed, 28 Oct 2015 13:21:04 +0100 Subject: ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds Cc: Linus Torvalds Reported-by: Dmitry Vyukov Cc: Dmitry Vyukov Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. 
Miller --- net/ipv6/ip6_output.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d03d6da772f3..f84ec4e9b2de 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -584,6 +584,8 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb, if (np->frag_size) mtu = np->frag_size; } + if (mtu < hlen + sizeof(struct frag_hdr) + 8) + goto fail_toobig; mtu -= hlen + sizeof(struct frag_hdr); frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr, -- cgit v1.2.3 From 73effccb9196ccc0241c3fb51dfd8de1d14ae8ed Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 29 Oct 2015 15:07:25 +0100 Subject: arm64/efi: do not assume DRAM base is aligned to 2 MB The current arm64 Image relocation code in the UEFI stub assumes that the dram_base argument it receives is always a multiple of 2 MB. In reality, it is simply the lowest start address of all RAM entries in the UEFI memory map, which means it could be any multiple of 4 KB. Since the arm64 kernel Image needs to reside TEXT_OFFSET bytes beyond a 2 MB aligned base, or it will fail to boot, make sure we round dram_base to 2 MB before using it to calculate the relocation address. 
Fixes: e38457c361b30c5a ("arm64: efi: prefer AllocatePages() over efi_low_alloc() for vmlinux") Reported-by: Timur Tabi Tested-by: Timur Tabi Acked-by: Mark Rutland Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon --- arch/arm64/kernel/efi-stub.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index 816120ece6bc..78dfbd34b6bf 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -25,10 +25,20 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg, unsigned long kernel_size, kernel_memsize = 0; unsigned long nr_pages; void *old_image_addr = (void *)*image_addr; + unsigned long preferred_offset; + + /* + * The preferred offset of the kernel Image is TEXT_OFFSET bytes beyond + * a 2 MB aligned base, which itself may be lower than dram_base, as + * long as the resulting offset equals or exceeds it. + */ + preferred_offset = round_down(dram_base, SZ_2M) + TEXT_OFFSET; + if (preferred_offset < dram_base) + preferred_offset += SZ_2M; /* Relocate the image, if required. */ kernel_size = _edata - _text; - if (*image_addr != (dram_base + TEXT_OFFSET)) { + if (*image_addr != preferred_offset) { kernel_memsize = kernel_size + (_end - _edata); /* @@ -42,7 +52,7 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg, * Mustang), we can still place the kernel at the address * 'dram_base + TEXT_OFFSET'. 
*/ - *image_addr = *reserve_addr = dram_base + TEXT_OFFSET; + *image_addr = *reserve_addr = preferred_offset; nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE; status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS, -- cgit v1.2.3 From bae818ee1577c27356093901a0ea48f672eda514 Mon Sep 17 00:00:00 2001 From: Ronny Hegewald Date: Thu, 15 Oct 2015 18:50:46 +0000 Subject: rbd: require stable pages if message data CRCs are enabled rbd requires stable pages, as it performs a crc of the page data before they are sent to the OSDs. But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0 "mm: only enforce stable page writes if the backing device requires it") it is not assumed anymore that block devices require stable pages. This patch sets the necessary flag to get stable pages back for rbd. In a ceph installation that provides multiple ext4 formatted rbd devices "bad crc" messages appeared regularly (ca 1 message every 1-2 minutes on every OSD that provided the data for the rbd) in the OSD-logs before this patch. After this patch these messages are pretty much gone (only ca 1-2 / month / OSD). Cc: stable@vger.kernel.org # 3.9+, needs backporting Signed-off-by: Ronny Hegewald [idryomov@gmail.com: require stable pages only in crc case, changelog] Signed-off-by: Ilya Dryomov --- drivers/block/rbd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 6f26cf38c6f9..128e7df5b807 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3780,6 +3780,9 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); q->limits.discard_zeroes_data = 1; + if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) + q->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES; + disk->queue = q; q->queuedata = rbd_dev; -- cgit v1.2.3