From b03e7495a862b028294f59fc87286d6d78ee7fa1 Mon Sep 17 00:00:00 2001 From: Jon Mason Date: Wed, 20 Jul 2011 15:20:54 -0500 Subject: PCI: Set PCI-E Max Payload Size on fabric On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason Signed-off-by: Jesse Barnes --- drivers/pci/pci.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 08a95b369d85..466fad6e6ee2 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -77,6 +77,8 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE; unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE; unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE; +enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE; + /* * The default CLS is used if arch didn't set CLS explicitly and not * all pci devices agree on the same value. Arch can override either @@ -3222,6 +3224,67 @@ out: } EXPORT_SYMBOL(pcie_set_readrq); +/** + * pcie_get_mps - get PCI Express maximum payload size + * @dev: PCI device to query + * + * Returns maximum payload size in bytes + * or appropriate error value. + */ +int pcie_get_mps(struct pci_dev *dev) +{ + int ret, cap; + u16 ctl; + + cap = pci_pcie_cap(dev); + if (!cap) + return -EINVAL; + + ret = pci_read_config_word(dev, cap + PCI_EXP_DEVCTL, &ctl); + if (!ret) + ret = 128 << ((ctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5); + + return ret; +} + +/** + * pcie_set_mps - set PCI Express maximum payload size + * @dev: PCI device to query + * @rq: maximum payload size in bytes + * valid values are 128, 256, 512, 1024, 2048, 4096 + * + * If possible sets maximum payload size + */ +int pcie_set_mps(struct pci_dev *dev, int mps) +{ + int cap, err = -EINVAL; + u16 ctl, v; + + if (mps < 128 || mps > 4096 || !is_power_of_2(mps)) + goto out; + + v = ffs(mps) - 8; + if (v > dev->pcie_mpss) + goto out; + v <<= 5; + + cap = pci_pcie_cap(dev); + if (!cap) + goto out; + + err = pci_read_config_word(dev, cap + PCI_EXP_DEVCTL, &ctl); + if (err) + goto out; + + if ((ctl & PCI_EXP_DEVCTL_PAYLOAD) != v) { + ctl &= ~PCI_EXP_DEVCTL_PAYLOAD; + ctl |= v; + err = pci_write_config_word(dev, cap + PCI_EXP_DEVCTL, ctl); + } +out: + return err; +} + /** * pci_select_bars - Make BAR mask from the type of resource * @dev: the PCI device for which BAR mask is made @@ -3505,6 +3568,10 @@ static int __init pci_setup(char *str) pci_hotplug_io_size = memparse(str + 9, &str); } else if (!strncmp(str, "hpmemsize=", 10)) { pci_hotplug_mem_size = memparse(str + 10, &str); + } else if (!strncmp(str, "pcie_bus_safe", 13)) { + pcie_bus_config = PCIE_BUS_SAFE; + } else if (!strncmp(str, "pcie_bus_perf", 13)) { + pcie_bus_config = PCIE_BUS_PERFORMANCE; } else { printk(KERN_ERR "PCI: Unknown option `%s'\n", str); -- cgit v1.2.3 From 47c08f3107270e5a439bc0106a308f7c48c9621d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 20 Aug 2011 11:49:43 -0700 Subject: pci: fix new kernel-doc warning in pci.c Fix new kernel-doc warning in pci.c: Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps' Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps' Signed-off-by: Randy Dunlap Cc: Jesse Barnes Signed-off-by: Linus Torvalds --- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 466fad6e6ee2..0ce67423a0a3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3250,7 +3250,7 @@ int pcie_get_mps(struct pci_dev *dev) /** * pcie_set_mps - set PCI Express maximum payload size * @dev: PCI device to query - * @rq: maximum payload size in bytes + * @mps: maximum payload size in bytes * valid values are 128, 256, 512, 1024, 2048, 4096 * * If possible sets maximum payload size -- cgit v1.2.3 From ed2888e906b56769b4ffabb9c577190438aa68b8 Mon Sep 17 00:00:00 2001 From: Jon Mason Date: Thu, 8 Sep 2011 16:41:18 -0500 Subject: PCI: Remove MRRS modification from MPS setting code Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has massive negative ramifications on some devices. Without knowing which devices have this issue, do not modify from the default value when walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe the default procedure. Tested-by: Sven Schnelle Tested-by: Simon Kirby Tested-by: Stephen M. Cameron Reported-and-tested-by: Eric Dumazet Reported-and-tested-by: Niels Ole Salscheider References: https://bugzilla.kernel.org/show_bug.cgi?id=42162 Signed-off-by: Jon Mason Acked-by: Jesse Barnes Signed-off-by: Linus Torvalds --- drivers/pci/pci.c | 2 +- drivers/pci/probe.c | 41 ++++++++++++++++++++++------------------- 2 files changed, 23 insertions(+), 20 deletions(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 0ce67423a0a3..4e84fd4a4312 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE; unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE; unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE; -enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE; +enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE; /* * The default CLS is used if arch didn't set CLS explicitly and not diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 0820fc1544e8..b1187ff31d89 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1396,34 +1396,37 @@ static void pcie_write_mps(struct pci_dev *dev, int mps) static void pcie_write_mrrs(struct pci_dev *dev, int mps) { - int rc, mrrs; + int rc, mrrs, dev_mpss; - if (pcie_bus_config == PCIE_BUS_PERFORMANCE) { - int dev_mpss = 128 << dev->pcie_mpss; + /* In the "safe" case, do not configure the MRRS. There appear to be + * issues with setting MRRS to 0 on a number of devices. + */ - /* For Max performance, the MRRS must be set to the largest - * supported value. However, it cannot be configured larger - * than the MPS the device or the bus can support. This assumes - * that the largest MRRS available on the device cannot be - * smaller than the device MPSS. - */ - mrrs = mps < dev_mpss ? mps : dev_mpss; - } else - /* In the "safe" case, configure the MRRS for fairness on the - * bus by making all devices have the same size - */ - mrrs = mps; + if (pcie_bus_config != PCIE_BUS_PERFORMANCE) + return; + + dev_mpss = 128 << dev->pcie_mpss; + /* For Max performance, the MRRS must be set to the largest supported + * value. However, it cannot be configured larger than the MPS the + * device or the bus can support. This assumes that the largest MRRS + * available on the device cannot be smaller than the device MPSS. + */ + mrrs = min(mps, dev_mpss); /* MRRS is a R/W register. Invalid values can be written, but a - * subsiquent read will verify if the value is acceptable or not. + * subsequent read will verify if the value is acceptable or not. * If the MRRS value provided is not acceptable (e.g., too large), * shrink the value until it is acceptable to the HW. */ while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) { + dev_warn(&dev->dev, "Attempting to modify the PCI-E MRRS value" + " to %d. If any issues are encountered, please try " + "running with pci=pcie_bus_safe\n", mrrs); rc = pcie_set_readrq(dev, mrrs); if (rc) - dev_err(&dev->dev, "Failed attempting to set the MRRS\n"); + dev_err(&dev->dev, + "Failed attempting to set the MRRS\n"); mrrs /= 2; } @@ -1436,13 +1439,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) if (!pci_is_pcie(dev)) return 0; - dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", + dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", pcie_get_mps(dev), 128<pcie_mpss, pcie_get_readrq(dev)); pcie_write_mps(dev, mps); pcie_write_mrrs(dev, mps); - dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", + dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n", pcie_get_mps(dev), 128<pcie_mpss, pcie_get_readrq(dev)); return 0; -- cgit v1.2.3 From 5f39e6705faade2e89d119958a8c51b9b6e2c53c Mon Sep 17 00:00:00 2001 From: Jon Mason Date: Mon, 3 Oct 2011 09:50:20 -0500 Subject: PCI: Disable MPS configuration by default Add the ability to disable PCI-E MPS turning and using the BIOS configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer to peer DMA MPS configuration. Peer to peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different than the MPS on another. To work around this, simply make the system wide MPS the smallest possible value (128B). Signed-off-by: Jon Mason Acked-by: Benjamin Herrenschmidt Signed-off-by: Linus Torvalds --- drivers/pci/pci.c | 6 +++++- drivers/pci/probe.c | 14 +++++++++++++- include/linux/pci.h | 3 ++- 3 files changed, 20 insertions(+), 3 deletions(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 4e84fd4a4312..e9651f0a8817 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE; unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE; unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE; -enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE; +enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_TUNE_OFF; /* * The default CLS is used if arch didn't set CLS explicitly and not @@ -3568,10 +3568,14 @@ static int __init pci_setup(char *str) pci_hotplug_io_size = memparse(str + 9, &str); } else if (!strncmp(str, "hpmemsize=", 10)) { pci_hotplug_mem_size = memparse(str + 10, &str); + } else if (!strncmp(str, "pcie_bus_tune_off", 17)) { + pcie_bus_config = PCIE_BUS_TUNE_OFF; } else if (!strncmp(str, "pcie_bus_safe", 13)) { pcie_bus_config = PCIE_BUS_SAFE; } else if (!strncmp(str, "pcie_bus_perf", 13)) { pcie_bus_config = PCIE_BUS_PERFORMANCE; + } else if (!strncmp(str, "pcie_bus_peer2peer", 18)) { + pcie_bus_config = PCIE_BUS_PEER2PEER; } else { printk(KERN_ERR "PCI: Unknown option `%s'\n", str); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index f3f94a5c068f..6ab6bd3df4b2 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1458,12 +1458,24 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) */ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) { - u8 smpss = mpss; + u8 smpss; if (!pci_is_pcie(bus->self)) return; + if (pcie_bus_config == PCIE_BUS_TUNE_OFF) + return; + + /* FIXME - Peer to peer DMA is possible, though the endpoint would need + * to be aware to the MPS of the destination. To work around this, + * simply force the MPS of the entire system to the smallest possible. + */ + if (pcie_bus_config == PCIE_BUS_PEER2PEER) + smpss = 0; + if (pcie_bus_config == PCIE_BUS_SAFE) { + smpss = mpss; + pcie_find_smpss(bus->self, &smpss); pci_walk_bus(bus, pcie_find_smpss, &smpss); } diff --git a/include/linux/pci.h b/include/linux/pci.h index 8c230cbcbb48..9fc01226055b 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -621,8 +621,9 @@ struct pci_driver { extern void pcie_bus_configure_settings(struct pci_bus *bus, u8 smpss); enum pcie_bus_config_types { - PCIE_BUS_PERFORMANCE, + PCIE_BUS_TUNE_OFF, PCIE_BUS_SAFE, + PCIE_BUS_PERFORMANCE, PCIE_BUS_PEER2PEER, }; -- cgit v1.2.3 From 379021d5c0899fcf9410cae4ca7a59a5a94ca769 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 3 Oct 2011 23:16:33 +0200 Subject: PCI / PM: Extend PME polling to all PCI devices The land of PCI power management is a land of sorrow and ugliness, especially in the area of signaling events by devices. There are devices that set their PME Status bits, but don't really bother to send a PME message or assert PME#. There are hardware vendors who don't connect PME# lines to the system core logic (they know who they are). There are PCI Express Root Ports that don't bother to trigger interrupts when they receive PME messages from the devices below. There are ACPI BIOSes that forget to provide _PRW methods for devices capable of signaling wakeup. Finally, there are BIOSes that do provide _PRW methods for such devices, but then don't bother to call Notify() for those devices from the corresponding _Lxx/_Exx GPE-handling methods. In all of these cases the kernel doesn't have a chance to receive a proper notification that it should wake up a device, so devices stay in low-power states forever. Worse yet, in some cases they continuously send PME Messages that are silently ignored, because the kernel simply doesn't know that it should clear the device's PME Status bit. This problem was first observed for "parallel" (non-Express) PCI devices on add-on cards and Matthew Garrett addressed it by adding code that polls PME Status bits of such devices, if they are enabled to signal PME, to the kernel. Recently, however, it has turned out that PCI Express devices are also affected by this issue and that it is not limited to add-on devices, so it seems necessary to extend the PME polling to all PCI devices, including PCI Express and planar ones. Still, it would be wasteful to poll the PME Status bits of devices that are known to receive proper PME notifications, so make the kernel (1) poll the PME Status bits of all PCI and PCIe devices enabled to signal PME and (2) disable the PME Status polling for devices for which correct PME notifications are received. Tested-by: Sarah Sharp Signed-off-by: Rafael J. Wysocki Signed-off-by: Jesse Barnes --- drivers/pci/pci-acpi.c | 3 +++ drivers/pci/pci.c | 41 ++++++++++++++++++++--------------------- drivers/pci/pcie/pme.c | 9 +++++++++ include/linux/pci.h | 1 + 4 files changed, 33 insertions(+), 21 deletions(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index d36f41ea8cbf..cd3c4f1cdf1b 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -46,6 +46,9 @@ static void pci_acpi_wake_dev(acpi_handle handle, u32 event, void *context) struct pci_dev *pci_dev = context; if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) { + if (pci_dev->pme_poll) + pci_dev->pme_poll = false; + pci_wakeup_event(pci_dev); pci_check_pme_status(pci_dev); pm_runtime_resume(&pci_dev->dev); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e9651f0a8817..7cd417e94058 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1407,13 +1407,16 @@ bool pci_check_pme_status(struct pci_dev *dev) /** * pci_pme_wakeup - Wake up a PCI device if its PME Status bit is set. * @dev: Device to handle. - * @ign: Ignored. + * @pme_poll_reset: Whether or not to reset the device's pme_poll flag. * * Check if @dev has generated PME and queue a resume request for it in that * case. */ -static int pci_pme_wakeup(struct pci_dev *dev, void *ign) +static int pci_pme_wakeup(struct pci_dev *dev, void *pme_poll_reset) { + if (pme_poll_reset && dev->pme_poll) + dev->pme_poll = false; + if (pci_check_pme_status(dev)) { pci_wakeup_event(dev); pm_request_resume(&dev->dev); @@ -1428,7 +1431,7 @@ static int pci_pme_wakeup(struct pci_dev *dev, void *ign) void pci_pme_wakeup_bus(struct pci_bus *bus) { if (bus) - pci_walk_bus(bus, pci_pme_wakeup, NULL); + pci_walk_bus(bus, pci_pme_wakeup, (void *)true); } /** @@ -1446,30 +1449,25 @@ bool pci_pme_capable(struct pci_dev *dev, pci_power_t state) static void pci_pme_list_scan(struct work_struct *work) { - struct pci_pme_device *pme_dev; + struct pci_pme_device *pme_dev, *n; mutex_lock(&pci_pme_list_mutex); if (!list_empty(&pci_pme_list)) { - list_for_each_entry(pme_dev, &pci_pme_list, list) - pci_pme_wakeup(pme_dev->dev, NULL); - schedule_delayed_work(&pci_pme_work, msecs_to_jiffies(PME_TIMEOUT)); + list_for_each_entry_safe(pme_dev, n, &pci_pme_list, list) { + if (pme_dev->dev->pme_poll) { + pci_pme_wakeup(pme_dev->dev, NULL); + } else { + list_del(&pme_dev->list); + kfree(pme_dev); + } + } + if (!list_empty(&pci_pme_list)) + schedule_delayed_work(&pci_pme_work, + msecs_to_jiffies(PME_TIMEOUT)); } mutex_unlock(&pci_pme_list_mutex); } -/** - * pci_external_pme - is a device an external PCI PME source? - * @dev: PCI device to check - * - */ - -static bool pci_external_pme(struct pci_dev *dev) -{ - if (pci_is_pcie(dev) || dev->bus->number == 0) - return false; - return true; -} - /** * pci_pme_active - enable or disable PCI device's PME# function * @dev: PCI device to handle. @@ -1503,7 +1501,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable) hit, and the power savings from the devices will still be a win. */ - if (pci_external_pme(dev)) { + if (dev->pme_poll) { struct pci_pme_device *pme_dev; if (enable) { pme_dev = kmalloc(sizeof(struct pci_pme_device), @@ -1821,6 +1819,7 @@ void pci_pm_init(struct pci_dev *dev) (pmc & PCI_PM_CAP_PME_D3) ? " D3hot" : "", (pmc & PCI_PM_CAP_PME_D3cold) ? " D3cold" : ""); dev->pme_support = pmc >> PCI_PM_CAP_PME_SHIFT; + dev->pme_poll = true; /* * Make device's PM flags reflect the wake-up capability, but * let the user space enable it to wake up the system as needed. diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index 0057344a3fcb..001f1b78f39c 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -84,6 +84,9 @@ static bool pcie_pme_walk_bus(struct pci_bus *bus) list_for_each_entry(dev, &bus->devices, bus_list) { /* Skip PCIe devices in case we started from a root port. */ if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) { + if (dev->pme_poll) + dev->pme_poll = false; + pci_wakeup_event(dev); pm_request_resume(&dev->dev); ret = true; @@ -142,6 +145,9 @@ static void pcie_pme_handle_request(struct pci_dev *port, u16 req_id) /* First, check if the PME is from the root port itself. */ if (port->devfn == devfn && port->bus->number == busnr) { + if (port->pme_poll) + port->pme_poll = false; + if (pci_check_pme_status(port)) { pm_request_resume(&port->dev); found = true; @@ -187,6 +193,9 @@ static void pcie_pme_handle_request(struct pci_dev *port, u16 req_id) /* The device is there, but we have to check its PME status. */ found = pci_check_pme_status(dev); if (found) { + if (dev->pme_poll) + dev->pme_poll = false; + pci_wakeup_event(dev); pm_request_resume(&dev->dev); } diff --git a/include/linux/pci.h b/include/linux/pci.h index 4ff6d4e9455c..176c981a90d4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -273,6 +273,7 @@ struct pci_dev { unsigned int pme_support:5; /* Bitmask of states from which PME# can be generated */ unsigned int pme_interrupt:1; + unsigned int pme_poll:1; /* Poll device's PME status bit */ unsigned int d1_support:1; /* Low power state D1 is supported */ unsigned int d2_support:1; /* Low power state D2 is supported */ unsigned int no_d1d2:1; /* Only allow D0 and D3 */ -- cgit v1.2.3 From a1c473aa11e61bc871be16279c9bf976acf22504 Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Fri, 14 Oct 2011 14:56:15 -0500 Subject: pci: Clamp pcie_set_readrq() when using "performance" settings When configuring the PCIe settings for "performance", we allow parents to have a larger Max Payload Size than children and rely on children Max Read Request Size to not be larger than their own MPS to avoid having the host bridge generate responses they can't cope with. However, various drivers in Linux call pci_set_readrq() with arbitrary values, assuming this to be a simple performance tweak. This breaks under our "performance" configuration. Fix that by making sure the value programmed by pcie_set_readrq() is never larger than the configured MPS for that device. Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Jon Mason Signed-off-by: Jesse Barnes --- drivers/pci/pci.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) (limited to 'drivers/pci/pci.c') diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 7cd417e94058..6f45a73c6e9f 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3202,8 +3202,6 @@ int pcie_set_readrq(struct pci_dev *dev, int rq) if (rq < 128 || rq > 4096 || !is_power_of_2(rq)) goto out; - v = (ffs(rq) - 8) << 12; - cap = pci_pcie_cap(dev); if (!cap) goto out; @@ -3211,6 +3209,22 @@ int pcie_set_readrq(struct pci_dev *dev, int rq) err = pci_read_config_word(dev, cap + PCI_EXP_DEVCTL, &ctl); if (err) goto out; + /* + * If using the "performance" PCIe config, we clamp the + * read rq size to the max packet size to prevent the + * host bridge generating requests larger than we can + * cope with + */ + if (pcie_bus_config == PCIE_BUS_PERFORMANCE) { + int mps = pcie_get_mps(dev); + + if (mps < 0) + return mps; + if (mps < rq) + rq = mps; + } + + v = (ffs(rq) - 8) << 12; if ((ctl & PCI_EXP_DEVCTL_READRQ) != v) { ctl &= ~PCI_EXP_DEVCTL_READRQ; -- cgit v1.2.3