Merge tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki: "These include two new drivers (cpufreq driver for Apple SoC CPU P-states and the SCMI Powercap based power capping driver), other new hardware support and driver extensions (Qualcomm cpufreq driver and its DT bindings, TI cpufreq driver, intel_pstate, intel-uncore-freq), a bunch of fixes and cleanups all over and a cpupower utility update including new features related to RAPL support. Specifics: - Fix nasty and hard to debug race condition introduced by mistake in the runtime PM core code and clean up that code somewhat on top of the fix (Rafael Wysocki) - Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector Martin) - Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin) - Update Qualcomm cpufreq driver (Manivannan Sadhasivam, Chen Hui): - CPU clock provider support - Generic cleanups or reorganization - Potential memleak fix - Fix of the return value of cpufreq_driver->get() - Update Qualcomm cpufreq driver's DT bindings (Manivannan Sadhasivam, Rob Herring, Melody Olvera): - Support for CPU clock provider - Missing cache-related properties fixes - Support for QDU1000/QRU1000 - Add support for ti,am625 SoC and enable build of ti-cpufreq for ARCH_K3 (Dave Gerlach, and Vibhore Vardhan) - Use flexible array to simplify memory allocation in the tegra186 cpufreq driver (Christophe JAILLET) - Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen) - Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni Gherdovich) - Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver (Xiongfeng Wang) - Initialize the kobj_unregister completion before calling kobject_init_and_add() in the cpufreq core code (Yongqiang Liu) - Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes, Nathan Chancellor) - Make intel_pstate accept initial EPP value of 0x80 (Srinivas Pandruvada) - Make read-only array sys_clk_src in the SPEAr cpufreq driver static (Colin Ian King) - Make array speeds in the longhaul cpufreq driver static (Colin Ian King) - Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy Shevchenko) - Drop a reference to CVS from cpufreq documentation (Conghui Wang) - Improve kernel messages printed by the PSCI cpuidle driver (Ulf Hansson) - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson) - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki) - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo) - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin) - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa) - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa) - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo) - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo) - Drop generic power domain status manipulation during hibernate restore (Shawn Guo) - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada) - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET) - Add SCMI Powercap based power capping driver (Cristian Marussi) - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy) - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn) - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar) - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros) - Fix clock-latency-ns property in DT example (Serge Semin) - Add a private governor_data for devfreq governors (Kant Fan) - Reorganize devfreq code to use device_match_of_node() and devm_platform_get_and_ioremap_resource() instead of open coding them (ye xingchen, Minghao Chi) - Make cpupower choose base_cpu to display default cpupower details instead of picking CPU 0 (Saket Kumar Bhaskar) - Add Georgian translation to cpupower documentation (Zurab Kargareteli) - Introduce powercap intel-rapl library, powercap-info command, and RAPL monitor into cpupower (Thomas Renninger)" * tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) PM: runtime: Adjust white space in the core code cpufreq: Remove CVS version control contents from documentation cpufreq: stats: Convert to use sysfs_emit_at() API cpufreq: ACPI: Only set boost MSRs on supported CPUs PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: runtime: Relocate rpm_callback() right after __rpm_callback() PM: runtime: Do not call __rpm_callback() from rpm_idle() PM / devfreq: event: use devm_platform_get_and_ioremap_resource() PM / devfreq: event: Use device_match_of_node() PM / devfreq: Use device_match_of_node() powercap: idle_inject: Fix warnings with make W=1 PM: hibernate: Complain about memory map mismatches during resume dt-bindings: cpufreq: cpufreq-qcom-hw: Add QDU1000/QRU1000 cpufreq cpufreq: tegra186: Use flexible array to simplify memory allocation cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain cpupower: Introduce powercap intel-rapl library and powercap-info command cpupower: Add Georgian translation cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode cpufreq: amd_freq_sensitivity: Add missing pci_dev_put() ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2022-12-12 13:19:07 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2022-12-12 13:19:07 -0800
commit: 045e222d0a9dcec152abe0633f538cafd965b12b (patch)
tree: 09db255e1ef5ff3a3d17bb6744662650cfb99244 /drivers
parent: 631aa744423173bf921191ba695bbc7c1aabd9e0 (diff)
parent: ed6a00471dd444af085164f987e3d5e8145da35a (diff)
30 files changed, 1349 insertions, 458 deletions
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index a3df9b949aeb..967bcf9d415e 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -964,8 +964,8 @@ static int genpd_runtime_suspend(struct device *dev)
 		return 0;
 
 	genpd_lock(genpd);
-	gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 	genpd_power_off(genpd, true, 0);
+	gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 	genpd_unlock(genpd);
 
 	return 0;
@@ -1003,9 +1003,8 @@ static int genpd_runtime_resume(struct device *dev)
 		goto out;
 
 	genpd_lock(genpd);
+	genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
 	ret = genpd_power_on(genpd, 0);
-	if (!ret)
-		genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
 	genpd_unlock(genpd);
 
 	if (ret)
@@ -1043,8 +1042,8 @@ err_stop:
 err_poweroff:
 	if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
 		genpd_lock(genpd);
-		gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 		genpd_power_off(genpd, true, 0);
+		gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
 		genpd_unlock(genpd);
 	}
 
@@ -1214,12 +1213,15 @@ static int genpd_prepare(struct device *dev)
  * genpd_finish_suspend - Completion of suspend or hibernation of device in an
  *   I/O pm domain.
  * @dev: Device to suspend.
- * @poweroff: Specifies if this is a poweroff_noirq or suspend_noirq callback.
+ * @suspend_noirq: Generic suspend_noirq callback.
+ * @resume_noirq: Generic resume_noirq callback.
  *
  * Stop the device and remove power from the domain if all devices in it have
  * been stopped.
  */
-static int genpd_finish_suspend(struct device *dev, bool poweroff)
+static int genpd_finish_suspend(struct device *dev,
+				int (*suspend_noirq)(struct device *dev),
+				int (*resume_noirq)(struct device *dev))
 {
 	struct generic_pm_domain *genpd;
 	int ret = 0;
@@ -1228,10 +1230,7 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff)
 	if (IS_ERR(genpd))
 		return -EINVAL;
 
-	if (poweroff)
-		ret = pm_generic_poweroff_noirq(dev);
-	else
-		ret = pm_generic_suspend_noirq(dev);
+	ret = suspend_noirq(dev);
 	if (ret)
 		return ret;
 
@@ -1242,10 +1241,7 @@ static int genpd_finish_suspend(struct device *dev, bool poweroff)
 	    !pm_runtime_status_suspended(dev)) {
 		ret = genpd_stop_dev(genpd, dev);
 		if (ret) {
-			if (poweroff)
-				pm_generic_restore_noirq(dev);
-			else
-				pm_generic_resume_noirq(dev);
+			resume_noirq(dev);
 			return ret;
 		}
 	}
@@ -1269,16 +1265,20 @@ static int genpd_suspend_noirq(struct device *dev)
 {
 	dev_dbg(dev, "%s()\n", __func__);
 
-	return genpd_finish_suspend(dev, false);
+	return genpd_finish_suspend(dev,
+				    pm_generic_suspend_noirq,
+				    pm_generic_resume_noirq);
 }
 
 /**
- * genpd_resume_noirq - Start of resume of device in an I/O PM domain.
+ * genpd_finish_resume - Completion of resume of device in an I/O PM domain.
  * @dev: Device to resume.
+ * @resume_noirq: Generic resume_noirq callback.
  *
  * Restore power to the device's PM domain, if necessary, and start the device.
  */
-static int genpd_resume_noirq(struct device *dev)
+static int genpd_finish_resume(struct device *dev,
+			       int (*resume_noirq)(struct device *dev))
 {
 	struct generic_pm_domain *genpd;
 	int ret;
@@ -1290,7 +1290,7 @@ static int genpd_resume_noirq(struct device *dev)
 		return -EINVAL;
 
 	if (device_wakeup_path(dev) && genpd_is_active_wakeup(genpd))
-		return pm_generic_resume_noirq(dev);
+		return resume_noirq(dev);
 
 	genpd_lock(genpd);
 	genpd_sync_power_on(genpd, true, 0);
@@ -1308,6 +1308,19 @@ static int genpd_resume_noirq(struct device *dev)
 }
 
 /**
+ * genpd_resume_noirq - Start of resume of device in an I/O PM domain.
+ * @dev: Device to resume.
+ *
+ * Restore power to the device's PM domain, if necessary, and start the device.
+ */
+static int genpd_resume_noirq(struct device *dev)
+{
+	dev_dbg(dev, "%s()\n", __func__);
+
+	return genpd_finish_resume(dev, pm_generic_resume_noirq);
+}
+
+/**
  * genpd_freeze_noirq - Completion of freezing a device in an I/O PM domain.
  * @dev: Device to freeze.
  *
@@ -1318,24 +1331,11 @@ static int genpd_resume_noirq(struct device *dev)
  */
 static int genpd_freeze_noirq(struct device *dev)
 {
-	const struct generic_pm_domain *genpd;
-	int ret = 0;
-
 	dev_dbg(dev, "%s()\n", __func__);
 
-	genpd = dev_to_genpd(dev);
-	if (IS_ERR(genpd))
-		return -EINVAL;
-
-	ret = pm_generic_freeze_noirq(dev);
-	if (ret)
-		return ret;
-
-	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
-	    !pm_runtime_status_suspended(dev))
-		ret = genpd_stop_dev(genpd, dev);
-
-	return ret;
+	return genpd_finish_suspend(dev,
+				    pm_generic_freeze_noirq,
+				    pm_generic_thaw_noirq);
 }
 
 /**
@@ -1347,23 +1347,9 @@ static int genpd_freeze_noirq(struct device *dev)
  */
 static int genpd_thaw_noirq(struct device *dev)
 {
-	const struct generic_pm_domain *genpd;
-	int ret = 0;
-
 	dev_dbg(dev, "%s()\n", __func__);
 
-	genpd = dev_to_genpd(dev);
-	if (IS_ERR(genpd))
-		return -EINVAL;
-
-	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
-	    !pm_runtime_status_suspended(dev)) {
-		ret = genpd_start_dev(genpd, dev);
-		if (ret)
-			return ret;
-	}
-
-	return pm_generic_thaw_noirq(dev);
+	return genpd_finish_resume(dev, pm_generic_thaw_noirq);
 }
 
 /**
@@ -1378,7 +1364,9 @@ static int genpd_poweroff_noirq(struct device *dev)
 {
 	dev_dbg(dev, "%s()\n", __func__);
 
-	return genpd_finish_suspend(dev, true);
+	return genpd_finish_suspend(dev,
+				    pm_generic_poweroff_noirq,
+				    pm_generic_restore_noirq);
 }
 
 /**
@@ -1390,40 +1378,9 @@ static int genpd_poweroff_noirq(struct device *dev)
  */
 static int genpd_restore_noirq(struct device *dev)
 {
-	struct generic_pm_domain *genpd;
-	int ret = 0;
-
 	dev_dbg(dev, "%s()\n", __func__);
 
-	genpd = dev_to_genpd(dev);
-	if (IS_ERR(genpd))
-		return -EINVAL;
-
-	/*
-	 * At this point suspended_count == 0 means we are being run for the
-	 * first time for the given domain in the present cycle.
-	 */
-	genpd_lock(genpd);
-	if (genpd->suspended_count++ == 0) {
-		/*
-		 * The boot kernel might put the domain into arbitrary state,
-		 * so make it appear as powered off to genpd_sync_power_on(),
-		 * so that it tries to power it on in case it was really off.
-		 */
-		genpd->status = GENPD_STATE_OFF;
-	}
-
-	genpd_sync_power_on(genpd, true, 0);
-	genpd_unlock(genpd);
-
-	if (genpd->dev_ops.stop && genpd->dev_ops.start &&
-	    !pm_runtime_status_suspended(dev)) {
-		ret = genpd_start_dev(genpd, dev);
-		if (ret)
-			return ret;
-	}
-
-	return pm_generic_restore_noirq(dev);
+	return genpd_finish_resume(dev, pm_generic_restore_noirq);
 }
 
 /**
@@ -2775,17 +2732,6 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 	dev->pm_domain->detach = genpd_dev_pm_detach;
 	dev->pm_domain->sync = genpd_dev_pm_sync;
 
-	if (power_on) {
-		genpd_lock(pd);
-		ret = genpd_power_on(pd, 0);
-		genpd_unlock(pd);
-	}
-
-	if (ret) {
-		genpd_remove_device(pd, dev);
-		return -EPROBE_DEFER;
-	}
-
 	/* Set the default performance state */
 	pstate = of_get_required_opp_performance_state(dev->of_node, index);
 	if (pstate < 0 && pstate != -ENODEV && pstate != -EOPNOTSUPP) {
@@ -2797,6 +2743,24 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 			goto err;
 		dev_gpd_data(dev)->default_pstate = pstate;
 	}
+
+	if (power_on) {
+		genpd_lock(pd);
+		ret = genpd_power_on(pd, 0);
+		genpd_unlock(pd);
+	}
+
+	if (ret) {
+		/* Drop the default performance state */
+		if (dev_gpd_data(dev)->default_pstate) {
+			dev_pm_genpd_set_performance_state(dev, 0);
+			dev_gpd_data(dev)->default_pstate = 0;
+		}
+
+		genpd_remove_device(pd, dev);
+		return -EPROBE_DEFER;
+	}
+
 	return 1;
 
 err:
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index b52049098d4e..50e726b6c2cf 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -243,8 +243,7 @@ void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
 		 * flag was set by any one of the descendants.
 		 */
 		if (!dev || (!enable &&
-			     device_for_each_child(dev, NULL,
-						   dev_memalloc_noio)))
+		    device_for_each_child(dev, NULL, dev_memalloc_noio)))
 			break;
 	}
 	mutex_unlock(&dev_hotplug_mutex);
@@ -265,15 +264,13 @@ static int rpm_check_suspend_allowed(struct device *dev)
 		retval = -EACCES;
 	else if (atomic_read(&dev->power.usage_count))
 		retval = -EAGAIN;
-	else if (!dev->power.ignore_children &&
-			atomic_read(&dev->power.child_count))
+	else if (!dev->power.ignore_children && atomic_read(&dev->power.child_count))
 		retval = -EBUSY;
 
 	/* Pending resume requests take precedence over suspends. */
-	else if ((dev->power.deferred_resume
-			&& dev->power.runtime_status == RPM_SUSPENDING)
-	    || (dev->power.request_pending
-			&& dev->power.request == RPM_REQ_RESUME))
+	else if ((dev->power.deferred_resume &&
+	    dev->power.runtime_status == RPM_SUSPENDING) ||
+	    (dev->power.request_pending && dev->power.request == RPM_REQ_RESUME))
 		retval = -EAGAIN;
 	else if (__dev_pm_qos_resume_latency(dev) == 0)
 		retval = -EPERM;
@@ -404,9 +401,9 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 		 *
 		 * Do that if resume fails too.
 		 */
-		if (use_links
-		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+		if (use_links &&
+		    ((dev->power.runtime_status == RPM_SUSPENDING && !retval) ||
+		    (dev->power.runtime_status == RPM_RESUMING && retval))) {
 			idx = device_links_read_lock();
 
 			__rpm_put_suppliers(dev, false);
@@ -422,6 +419,38 @@ fail:
 }
 
 /**
+ * rpm_callback - Run a given runtime PM callback for a given device.
+ * @cb: Runtime PM callback to run.
+ * @dev: Device to run the callback for.
+ */
+static int rpm_callback(int (*cb)(struct device *), struct device *dev)
+{
+	int retval;
+
+	if (dev->power.memalloc_noio) {
+		unsigned int noio_flag;
+
+		/*
+		 * Deadlock might be caused if memory allocation with
+		 * GFP_KERNEL happens inside runtime_suspend and
+		 * runtime_resume callbacks of one block device's
+		 * ancestor or the block device itself. Network
+		 * device might be thought as part of iSCSI block
+		 * device, so network device and its ancestor should
+		 * be marked as memalloc_noio too.
+		 */
+		noio_flag = memalloc_noio_save();
+		retval = __rpm_callback(cb, dev);
+		memalloc_noio_restore(noio_flag);
+	} else {
+		retval = __rpm_callback(cb, dev);
+	}
+
+	dev->power.runtime_error = retval;
+	return retval != -EACCES ? retval : -EIO;
+}
+
+/**
  * rpm_idle - Notify device bus type if the device can be suspended.
  * @dev: Device to notify the bus type about.
  * @rpmflags: Flag bits.
@@ -459,6 +488,7 @@ static int rpm_idle(struct device *dev, int rpmflags)
 	/* Act as though RPM_NOWAIT is always set. */
 	else if (dev->power.idle_notification)
 		retval = -EINPROGRESS;
+
 	if (retval)
 		goto out;
 
@@ -484,7 +514,17 @@ static int rpm_idle(struct device *dev, int rpmflags)
 
 	dev->power.idle_notification = true;
 
-	retval = __rpm_callback(callback, dev);
+	if (dev->power.irq_safe)
+		spin_unlock(&dev->power.lock);
+	else
+		spin_unlock_irq(&dev->power.lock);
+
+	retval = callback(dev);
+
+	if (dev->power.irq_safe)
+		spin_lock(&dev->power.lock);
+	else
+		spin_lock_irq(&dev->power.lock);
 
 	dev->power.idle_notification = false;
 	wake_up_all(&dev->power.wait_queue);
@@ -495,38 +535,6 @@ static int rpm_idle(struct device *dev, int rpmflags)
 }
 
 /**
- * rpm_callback - Run a given runtime PM callback for a given device.
- * @cb: Runtime PM callback to run.
- * @dev: Device to run the callback for.
- */
-static int rpm_callback(int (*cb)(struct device *), struct device *dev)
-{
-	int retval;
-
-	if (dev->power.memalloc_noio) {
-		unsigned int noio_flag;
-
-		/*
-		 * Deadlock might be caused if memory allocation with
-		 * GFP_KERNEL happens inside runtime_suspend and
-		 * runtime_resume callbacks of one block device's
-		 * ancestor or the block device itself. Network
-		 * device might be thought as part of iSCSI block
-		 * device, so network device and its ancestor should
-		 * be marked as memalloc_noio too.
-		 */
-		noio_flag = memalloc_noio_save();
-		retval = __rpm_callback(cb, dev);
-		memalloc_noio_restore(noio_flag);
-	} else {
-		retval = __rpm_callback(cb, dev);
-	}
-
-	dev->power.runtime_error = retval;
-	return retval != -EACCES ? retval : -EIO;
-}
-
-/**
  * rpm_suspend - Carry out runtime suspend of given device.
  * @dev: Device to suspend.
  * @rpmflags: Flag bits.
@@ -564,12 +572,12 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 	/* Synchronous suspends are not allowed in the RPM_RESUMING state. */
 	if (dev->power.runtime_status == RPM_RESUMING && !(rpmflags & RPM_ASYNC))
 		retval = -EAGAIN;
+
 	if (retval)
 		goto out;
 
 	/* If the autosuspend_delay time hasn't expired yet, reschedule. */
-	if ((rpmflags & RPM_AUTO)
-	    && dev->power.runtime_status != RPM_SUSPENDING) {
+	if ((rpmflags & RPM_AUTO) && dev->power.runtime_status != RPM_SUSPENDING) {
 		u64 expires = pm_runtime_autosuspend_expiration(dev);
 
 		if (expires != 0) {
@@ -584,7 +592,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 			 * rest.
 			 */
 			if (!(dev->power.timer_expires &&
-					dev->power.timer_expires <= expires)) {
+			    dev->power.timer_expires <= expires)) {
 				/*
 				 * We add a slack of 25% to gather wakeups
 				 * without sacrificing the granularity.
@@ -594,9 +602,9 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 
 				dev->power.timer_expires = expires;
 				hrtimer_start_range_ns(&dev->power.suspend_timer,
-						ns_to_ktime(expires),
-						slack,
-						HRTIMER_MODE_ABS);
+						       ns_to_ktime(expires),
+						       slack,
+						       HRTIMER_MODE_ABS);
 			}
 			dev->power.timer_autosuspends = 1;
 			goto out;
@@ -787,8 +795,8 @@ static int rpm_resume(struct device *dev, int rpmflags)
 		goto out;
 	}
 
-	if (dev->power.runtime_status == RPM_RESUMING
-	    || dev->power.runtime_status == RPM_SUSPENDING) {
+	if (dev->power.runtime_status == RPM_RESUMING ||
+	    dev->power.runtime_status == RPM_SUSPENDING) {
 		DEFINE_WAIT(wait);
 
 		if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
@@ -815,8 +823,8 @@ static int rpm_resume(struct device *dev, int rpmflags)
 		for (;;) {
 			prepare_to_wait(&dev->power.wait_queue, &wait,
 					TASK_UNINTERRUPTIBLE);
-			if (dev->power.runtime_status != RPM_RESUMING
-			    && dev->power.runtime_status != RPM_SUSPENDING)
+			if (dev->power.runtime_status != RPM_RESUMING &&
+			    dev->power.runtime_status != RPM_SUSPENDING)
 				break;
 
 			spin_unlock_irq(&dev->power.lock);
@@ -836,9 +844,9 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	 */
 	if (dev->power.no_callbacks && !parent && dev->parent) {
 		spin_lock_nested(&dev->parent->power.lock, SINGLE_DEPTH_NESTING);
-		if (dev->parent->power.disable_depth > 0
-		    || dev->parent->power.ignore_children
-		    || dev->parent->power.runtime_status == RPM_ACTIVE) {
+		if (dev->parent->power.disable_depth > 0 ||
+		    dev->parent->power.ignore_children ||
+		    dev->parent->power.runtime_status == RPM_ACTIVE) {
 			atomic_inc(&dev->parent->power.child_count);
 			spin_unlock(&dev->parent->power.lock);
 			retval = 1;
@@ -867,6 +875,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
 		parent = dev->parent;
 		if (dev->power.irq_safe)
 			goto skip_parent;
+
 		spin_unlock(&dev->power.lock);
 
 		pm_runtime_get_noresume(parent);
@@ -876,8 +885,8 @@ static int rpm_resume(struct device *dev, int rpmflags)
 		 * Resume the parent if it has runtime PM enabled and not been
 		 * set to ignore its children.
 		 */
-		if (!parent->power.disable_depth
-		    && !parent->power.ignore_children) {
+		if (!parent->power.disable_depth &&
+		    !parent->power.ignore_children) {
 			rpm_resume(parent, 0);
 			if (parent->power.runtime_status != RPM_ACTIVE)
 				retval = -EBUSY;
@@ -887,6 +896,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
 		spin_lock(&dev->power.lock);
 		if (retval)
 			goto out;
+
 		goto repeat;
 	}
  skip_parent:
@@ -1291,9 +1301,9 @@ int __pm_runtime_set_status(struct device *dev, unsigned int status)
 		 * not active, has runtime PM enabled and the
 		 * 'power.ignore_children' flag unset.
 		 */
-		if (!parent->power.disable_depth
-		    && !parent->power.ignore_children
-		    && parent->power.runtime_status != RPM_ACTIVE) {
+		if (!parent->power.disable_depth &&
+		    !parent->power.ignore_children &&
+		    parent->power.runtime_status != RPM_ACTIVE) {
 			dev_err(dev, "runtime PM trying to activate child device %s but parent (%s) is not active\n",
 				dev_name(dev),
 				dev_name(parent));
@@ -1358,9 +1368,9 @@ static void __pm_runtime_barrier(struct device *dev)
 		dev->power.request_pending = false;
 	}
 
-	if (dev->power.runtime_status == RPM_SUSPENDING
-	    || dev->power.runtime_status == RPM_RESUMING
-	    || dev->power.idle_notification) {
+	if (dev->power.runtime_status == RPM_SUSPENDING ||
+	    dev->power.runtime_status == RPM_RESUMING ||
+	    dev->power.idle_notification) {
 		DEFINE_WAIT(wait);
 
 		/* Suspend, wake-up or idle notification in progress. */
@@ -1445,8 +1455,8 @@ void __pm_runtime_disable(struct device *dev, bool check_resume)
 	 * means there probably is some I/O to process and disabling runtime PM
 	 * shouldn't prevent the device from processing the I/O.
 	 */
-	if (check_resume && dev->power.request_pending
-	    && dev->power.request == RPM_REQ_RESUME) {
+	if (check_resume && dev->power.request_pending &&
+	    dev->power.request == RPM_REQ_RESUME) {
 		/*
 		 * Prevent suspends and idle notifications from being carried
 		 * out after we have woken up the device.
@@ -1606,6 +1616,7 @@ void pm_runtime_irq_safe(struct device *dev)
 {
 	if (dev->parent)
 		pm_runtime_get_sync(dev->parent);
+
 	spin_lock_irq(&dev->power.lock);
 	dev->power.irq_safe = 1;
 	spin_unlock_irq(&dev->power.lock);
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 82e5de1f6f8c..0a0352d8fa45 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -41,6 +41,15 @@ config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM
 	  To compile this driver as a module, choose M here: the
 	  module will be called sun50i-cpufreq-nvmem.
 
+config ARM_APPLE_SOC_CPUFREQ
+	tristate "Apple Silicon SoC CPUFreq support"
+	depends on ARCH_APPLE || (COMPILE_TEST && 64BIT)
+	select PM_OPP
+	default ARCH_APPLE
+	help
+	  This adds the CPUFreq driver for Apple Silicon machines
+	  (e.g. Apple M1).
+
 config ARM_ARMADA_37XX_CPUFREQ
 	tristate "Armada 37xx CPUFreq support"
 	depends on ARCH_MVEBU && CPUFREQ_DT
@@ -340,8 +349,8 @@ config ARM_TEGRA194_CPUFREQ
 
 config ARM_TI_CPUFREQ
 	bool "Texas Instruments CPUFreq support"
-	depends on ARCH_OMAP2PLUS
-	default ARCH_OMAP2PLUS
+	depends on ARCH_OMAP2PLUS || ARCH_K3
+	default y
 	help
 	  This driver enables valid OPPs on the running platform based on
 	  values contained within the SoC in use. Enable this in order to
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 49b98c62c5af..32a7029e25ed 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY)	+= amd_freq_sensitivity.o
 
 ##################################################################################
 # ARM SoC drivers
+obj-$(CONFIG_ARM_APPLE_SOC_CPUFREQ)	+= apple-soc-cpufreq.o
 obj-$(CONFIG_ARM_ARMADA_37XX_CPUFREQ)	+= armada-37xx-cpufreq.o
 obj-$(CONFIG_ARM_ARMADA_8K_CPUFREQ)	+= armada-8k-cpufreq.o
 obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ)	+= brcmstb-avs-cpufreq.o
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 1bb2b90ebb21..78adfb2ffff6 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -19,6 +19,7 @@
 #include <linux/compiler.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/string_helpers.h>
 
 #include <linux/acpi.h>
 #include <linux/io.h>
@@ -135,8 +136,8 @@ static int set_boost(struct cpufreq_policy *policy, int val)
 {
 	on_each_cpu_mask(policy->cpus, boost_set_msr_each,
 			 (void *)(long)val, 1);
-	pr_debug("CPU %*pbl: Core Boosting %sabled.\n",
-		 cpumask_pr_args(policy->cpus), val ? "en" : "dis");
+	pr_debug("CPU %*pbl: Core Boosting %s.\n",
+		 cpumask_pr_args(policy->cpus), str_enabled_disabled(val));
 
 	return 0;
 }
@@ -535,15 +536,6 @@ static void free_acpi_perf_data(void)
 	free_percpu(acpi_perf_data);
 }
 
-static int cpufreq_boost_online(unsigned int cpu)
-{
-	/*
-	 * On the CPU_UP path we simply keep the boost-disable flag
-	 * in sync with the current global state.
-	 */
-	return boost_set_msr(acpi_cpufreq_driver.boost_enabled);
-}
-
 static int cpufreq_boost_down_prep(unsigned int cpu)
 {
 	/*
@@ -897,6 +889,9 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	if (perf->states[0].core_frequency * 1000 != freq_table[0].frequency)
 		pr_warn(FW_WARN "P-state 0 is not max freq\n");
 
+	if (acpi_cpufreq_driver.set_boost)
+		set_boost(policy, acpi_cpufreq_driver.boost_enabled);
+
 	return result;
 
 err_unreg:
@@ -916,6 +911,7 @@ static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 
 	pr_debug("%s\n", __func__);
 
+	cpufreq_boost_down_prep(policy->cpu);
 	policy->fast_switch_possible = false;
 	policy->driver_data = NULL;
 	acpi_processor_unregister_performance(data->acpi_perf_cpu);
@@ -958,12 +954,8 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
 	.attr		= acpi_cpufreq_attr,
 };
 
-static enum cpuhp_state acpi_cpufreq_online;
-
 static void __init acpi_cpufreq_boost_init(void)
 {
-	int ret;
-
 	if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA))) {
 		pr_debug("Boost capabilities not present in the processor\n");
 		return;
@@ -971,24 +963,6 @@ static void __init acpi_cpufreq_boost_init(void)
 
 	acpi_cpufreq_driver.set_boost = set_boost;
 	acpi_cpufreq_driver.boost_enabled = boost_state(0);
-
-	/*
-	 * This calls the online callback on all online cpu and forces all
-	 * MSRs to the same value.
-	 */
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cpufreq/acpi:online",
-				cpufreq_boost_online, cpufreq_boost_down_prep);
-	if (ret < 0) {
-		pr_err("acpi_cpufreq: failed to register hotplug callbacks\n");
-		return;
-	}
-	acpi_cpufreq_online = ret;
-}
-
-static void acpi_cpufreq_boost_exit(void)
-{
-	if (acpi_cpufreq_online > 0)
-		cpuhp_remove_state_nocalls(acpi_cpufreq_online);
 }
 
 static int __init acpi_cpufreq_init(void)
@@ -1032,7 +1006,6 @@ static int __init acpi_cpufreq_init(void)
 	ret = cpufreq_register_driver(&acpi_cpufreq_driver);
 	if (ret) {
 		free_acpi_perf_data();
-		acpi_cpufreq_boost_exit();
 	}
 	return ret;
 }
@@ -1041,8 +1014,6 @@ static void __exit acpi_cpufreq_exit(void)
 {
 	pr_debug("%s\n", __func__);
 
-	acpi_cpufreq_boost_exit();
-
 	cpufreq_unregister_driver(&acpi_cpufreq_driver);
 
 	free_acpi_perf_data();
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
index 6448e03bcf48..59b19b9975e8 100644
--- a/drivers/cpufreq/amd_freq_sensitivity.c
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -125,6 +125,8 @@ static int __init amd_freq_sensitivity_init(void)
 	if (!pcidev) {
 		if (!boot_cpu_has(X86_FEATURE_PROC_FEEDBACK))
 			return -ENODEV;
+	} else {
+		pci_dev_put(pcidev);
 	}
 
 	if (rdmsrl_safe(MSR_AMD64_FREQ_SENSITIVITY_ACTUAL, &val))
diff --git a/drivers/cpufreq/apple-soc-cpufreq.c b/drivers/cpufreq/apple-soc-cpufreq.c
new file mode 100644
index 000000000000..d1801281cdd9
--- /dev/null
+++ b/drivers/cpufreq/apple-soc-cpufreq.c
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Apple SoC CPU cluster performance state driver
+ *
+ * Copyright The Asahi Linux Contributors
+ *
+ * Based on scpi-cpufreq.c
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
+
+#define APPLE_DVFS_CMD			0x20
+#define APPLE_DVFS_CMD_BUSY		BIT(31)
+#define APPLE_DVFS_CMD_SET		BIT(25)
+#define APPLE_DVFS_CMD_PS2		GENMASK(16, 12)
+#define APPLE_DVFS_CMD_PS1		GENMASK(4, 0)
+
+/* Same timebase as CPU counter (24MHz) */
+#define APPLE_DVFS_LAST_CHG_TIME	0x38
+
+/*
+ * Apple ran out of bits and had to shift this in T8112...
+ */
+#define APPLE_DVFS_STATUS			0x50
+#define APPLE_DVFS_STATUS_CUR_PS_T8103		GENMASK(7, 4)
+#define APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8103	4
+#define APPLE_DVFS_STATUS_TGT_PS_T8103		GENMASK(3, 0)
+#define APPLE_DVFS_STATUS_CUR_PS_T8112		GENMASK(9, 5)
+#define APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8112	5
+#define APPLE_DVFS_STATUS_TGT_PS_T8112		GENMASK(4, 0)
+
+/*
+ * Div is +1, base clock is 12MHz on existing SoCs.
+ * For documentation purposes. We use the OPP table to
+ * get the frequency.
+ */
+#define APPLE_DVFS_PLL_STATUS		0xc0
+#define APPLE_DVFS_PLL_FACTOR		0xc8
+#define APPLE_DVFS_PLL_FACTOR_MULT	GENMASK(31, 16)
+#define APPLE_DVFS_PLL_FACTOR_DIV	GENMASK(15, 0)
+
+#define APPLE_DVFS_TRANSITION_TIMEOUT 100
+
+struct apple_soc_cpufreq_info {
+	u64 max_pstate;
+	u64 cur_pstate_mask;
+	u64 cur_pstate_shift;
+};
+
+struct apple_cpu_priv {
+	struct device *cpu_dev;
+	void __iomem *reg_base;
+	const struct apple_soc_cpufreq_info *info;
+};
+
+static struct cpufreq_driver apple_soc_cpufreq_driver;
+
+static const struct apple_soc_cpufreq_info soc_t8103_info = {
+	.max_pstate = 15,
+	.cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8103,
+	.cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8103,
+};
+
+static const struct apple_soc_cpufreq_info soc_t8112_info = {
+	.max_pstate = 31,
+	.cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8112,
+	.cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8112,
+};
+
+static const struct apple_soc_cpufreq_info soc_default_info = {
+	.max_pstate = 15,
+	.cur_pstate_mask = 0, /* fallback */
+};
+
+static const struct of_device_id apple_soc_cpufreq_of_match[] = {
+	{
+		.compatible = "apple,t8103-cluster-cpufreq",
+		.data = &soc_t8103_info,
+	},
+	{
+		.compatible = "apple,t8112-cluster-cpufreq",
+		.data = &soc_t8112_info,
+	},
+	{
+		.compatible = "apple,cluster-cpufreq",
+		.data = &soc_default_info,
+	},
+	{}
+};
+
+static unsigned int apple_soc_cpufreq_get_rate(unsigned int cpu)
+{
+	struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
+	struct apple_cpu_priv *priv = policy->driver_data;
+	struct cpufreq_frequency_table *p;
+	unsigned int pstate;
+
+	if (priv->info->cur_pstate_mask) {
+		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_STATUS);
+
+		pstate = (reg & priv->info->cur_pstate_mask) >>  priv->info->cur_pstate_shift;
+	} else {
+		/*
+		 * For the fallback case we might not know the layout of DVFS_STATUS,
+		 * so just use the command register value (which ignores boost limitations).
+		 */
+		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_CMD);
+
+		pstate = FIELD_GET(APPLE_DVFS_CMD_PS1, reg);
+	}
+
+	cpufreq_for_each_valid_entry(p, policy->freq_table)
+		if (p->driver_data == pstate)
+			return p->frequency;
+
+	dev_err(priv->cpu_dev, "could not find frequency for pstate %d\n",
+		pstate);
+	return 0;
+}
+
+static int apple_soc_cpufreq_set_target(struct cpufreq_policy *policy,
+					unsigned int index)
+{
+	struct apple_cpu_priv *priv = policy->driver_data;
+	unsigned int pstate = policy->freq_table[index].driver_data;
+	u64 reg;
+
+	/* Fallback for newer SoCs */
+	if (index > priv->info->max_pstate)
+		index = priv->info->max_pstate;
+
+	if (readq_poll_timeout_atomic(priv->reg_base + APPLE_DVFS_CMD, reg,
+				      !(reg & APPLE_DVFS_CMD_BUSY), 2,
+				      APPLE_DVFS_TRANSITION_TIMEOUT)) {
+		return -EIO;
+	}
+
+	reg &= ~(APPLE_DVFS_CMD_PS1 | APPLE_DVFS_CMD_PS2);
+	reg |= FIELD_PREP(APPLE_DVFS_CMD_PS1, pstate);
+	reg |= FIELD_PREP(APPLE_DVFS_CMD_PS2, pstate);
+	reg |= APPLE_DVFS_CMD_SET;
+
+	writeq_relaxed(reg, priv->reg_base + APPLE_DVFS_CMD);
+
+	return 0;
+}
+
+static unsigned int apple_soc_cpufreq_fast_switch(struct cpufreq_policy *policy,
+						  unsigned int target_freq)
+{
+	if (apple_soc_cpufreq_set_target(policy, policy->cached_resolved_idx) < 0)
+		return 0;
+
+	return policy->freq_table[policy->cached_resolved_idx].frequency;
+}
+
+static int apple_soc_cpufreq_find_cluster(struct cpufreq_policy *policy,
+					  void __iomem **reg_base,
+					  const struct apple_soc_cpufreq_info **info)
+{
+	struct of_phandle_args args;
+	const struct of_device_id *match;
+	int ret = 0;
+
+	ret = of_perf_domain_get_sharing_cpumask(policy->cpu, "performance-domains",
+						 "#performance-domain-cells",
+						 policy->cpus, &args);
+	if (ret < 0)
+		return ret;
+
+	match = of_match_node(apple_soc_cpufreq_of_match, args.np);
+	of_node_put(args.np);
+	if (!match)
+		return -ENODEV;
+
+	*info = match->data;
+
+	*reg_base = of_iomap(args.np, 0);
+	if (IS_ERR(*reg_base))
+		return PTR_ERR(*reg_base);
+
+	return 0;
+}
+
+static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
+	&cpufreq_freq_attr_scaling_available_freqs,
+	NULL, /* Filled in below if boost is enabled */
+	NULL,
+};
+
+static int apple_soc_cpufreq_init(struct cpufreq_policy *policy)
+{
+	int ret, i;
+	unsigned int transition_latency;
+	void __iomem *reg_base;
+	struct device *cpu_dev;
+	struct apple_cpu_priv *priv;
+	const struct apple_soc_cpufreq_info *info;
+	struct cpufreq_frequency_table *freq_table;
+
+	cpu_dev = get_cpu_device(policy->cpu);
+	if (!cpu_dev) {
+		pr_err("failed to get cpu%d device\n", policy->cpu);
+		return -ENODEV;
+	}
+
+	ret = dev_pm_opp_of_add_table(cpu_dev);
+	if (ret < 0) {
+		dev_err(cpu_dev, "%s: failed to add OPP table: %d\n", __func__, ret);
+		return ret;
+	}
+
+	ret = apple_soc_cpufreq_find_cluster(policy, &reg_base, &info);
+	if (ret) {
+		dev_err(cpu_dev, "%s: failed to get cluster info: %d\n", __func__, ret);
+		return ret;
+	}
+
+	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
+	if (ret) {
+		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n", __func__, ret);
+		goto out_iounmap;
+	}
+
+	ret = dev_pm_opp_get_opp_count(cpu_dev);
+	if (ret <= 0) {
+		dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");
+		ret = -EPROBE_DEFER;
+		goto out_free_opp;
+	}
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		ret = -ENOMEM;
+		goto out_free_opp;
+	}
+
+	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
+	if (ret) {
+		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
+		goto out_free_priv;
+	}
+
+	/* Get OPP levels (p-state indexes) and stash them in driver_data */
+	for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
+		unsigned long rate = freq_table[i].frequency * 1000 + 999;
+		struct dev_pm_opp *opp = dev_pm_opp_find_freq_floor(cpu_dev, &rate);
+
+		if (IS_ERR(opp)) {
+			ret = PTR_ERR(opp);
+			goto out_free_cpufreq_table;
+		}
+		freq_table[i].driver_data = dev_pm_opp_get_level(opp);
+		dev_pm_opp_put(opp);
+	}
+
+	priv->cpu_dev = cpu_dev;
+	priv->reg_base = reg_base;
+	priv->info = info;
+	policy->driver_data = priv;
+	policy->freq_table = freq_table;
+
+	transition_latency = dev_pm_opp_get_max_transition_latency(cpu_dev);
+	if (!transition_latency)
+		transition_latency = CPUFREQ_ETERNAL;
+
+	policy->cpuinfo.transition_latency = transition_latency;
+	policy->dvfs_possible_from_any_cpu = true;
+	policy->fast_switch_possible = true;
+
+	if (policy_has_boost_freq(policy)) {
+		ret = cpufreq_enable_boost_support();
+		if (ret) {
+			dev_warn(cpu_dev, "failed to enable boost: %d\n", ret);
+		} else {
+			apple_soc_cpufreq_hw_attr[1] = &cpufreq_freq_attr_scaling_boost_freqs;
+			apple_soc_cpufreq_driver.boost_enabled = true;
+		}
+	}
+
+	return 0;
+
+out_free_cpufreq_table:
+	dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
+out_free_priv:
+	kfree(priv);
+out_free_opp:
+	dev_pm_opp_remove_all_dynamic(cpu_dev);
+out_iounmap:
+	iounmap(reg_base);
+	return ret;
+}
+
+static int apple_soc_cpufreq_exit(struct cpufreq_policy *policy)
+{
+	struct apple_cpu_priv *priv = policy->driver_data;
+
+	dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
+	dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
+	iounmap(priv->reg_base);
+	kfree(priv);
+
+	return 0;
+}
+
+static struct cpufreq_driver apple_soc_cpufreq_driver = {
+	.name		= "apple-cpufreq",
+	.flags		= CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+			  CPUFREQ_NEED_INITIAL_FREQ_CHECK | CPUFREQ_IS_COOLING_DEV,
+	.verify		= cpufreq_generic_frequency_table_verify,
+	.attr		= cpufreq_generic_attr,
+	.get		= apple_soc_cpufreq_get_rate,
+	.init		= apple_soc_cpufreq_init,
+	.exit		= apple_soc_cpufreq_exit,
+	.target_index	= apple_soc_cpufreq_set_target,
+	.fast_switch	= apple_soc_cpufreq_fast_switch,
+	.register_em	= cpufreq_register_em_with_opp,
+	.attr		= apple_soc_cpufreq_hw_attr,
+};
+
+static int __init apple_soc_cpufreq_module_init(void)
+{
+	if (!of_machine_is_compatible("apple,arm-platform"))
+		return -ENODEV;
+
+	return cpufreq_register_driver(&apple_soc_cpufreq_driver);
+}
+module_init(apple_soc_cpufreq_module_init);
+
+static void __exit apple_soc_cpufreq_module_exit(void)
+{
+	cpufreq_unregister_driver(&apple_soc_cpufreq_driver);
+}
+module_exit(apple_soc_cpufreq_module_exit);
+
+MODULE_DEVICE_TABLE(of, apple_soc_cpufreq_of_match);
+MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
+MODULE_DESCRIPTION("Apple SoC CPU cluster DVFS driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index 6ac3800db450..8ab672883043 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -103,6 +103,8 @@ static const struct of_device_id allowlist[] __initconst = {
 static const struct of_device_id blocklist[] __initconst = {
 	{ .compatible = "allwinner,sun50i-h6", },
 
+	{ .compatible = "apple,arm-platform", },
+
 	{ .compatible = "arm,vexpress", },
 
 	{ .compatible = "calxeda,highbank", },
@@ -160,6 +162,7 @@ static const struct of_device_id blocklist[] __initconst = {
 	{ .compatible = "ti,am43", },
 	{ .compatible = "ti,dra7", },
 	{ .compatible = "ti,omap3", },
+	{ .compatible = "ti,am625", },
 
 	{ .compatible = "qcom,ipq8064", },
 	{ .compatible = "qcom,apq8064", },
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 69b3d61852ac..7e56a42750ea 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1207,6 +1207,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	if (!zalloc_cpumask_var(&policy->real_cpus, GFP_KERNEL))
 		goto err_free_rcpumask;
 
+	init_completion(&policy->kobj_unregister);
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
 				   cpufreq_global_kobject, "policy%u", cpu);
 	if (ret) {
@@ -1245,7 +1246,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	init_rwsem(&policy->rwsem);
 	spin_lock_init(&policy->transition_lock);
 	init_waitqueue_head(&policy->transition_wait);
-	init_completion(&policy->kobj_unregister);
 	INIT_WORK(&policy->update, handle_update);
 
 	policy->cpu = cpu;
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 1570d6f3e75d..55c7ffd37d1c 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -128,25 +128,23 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
 	ssize_t len = 0;
 	int i, j, count;
 
-	len += scnprintf(buf + len, PAGE_SIZE - len, "   From  :    To\n");
-	len += scnprintf(buf + len, PAGE_SIZE - len, "         : ");
+	len += sysfs_emit_at(buf, len, "   From  :    To\n");
+	len += sysfs_emit_at(buf, len, "         : ");
 	for (i = 0; i < stats->state_num; i++) {
 		if (len >= PAGE_SIZE)
 			break;
-		len += scnprintf(buf + len, PAGE_SIZE - len, "%9u ",
-				stats->freq_table[i]);
+		len += sysfs_emit_at(buf, len, "%9u ", stats->freq_table[i]);
 	}
 	if (len >= PAGE_SIZE)
 		return PAGE_SIZE;
 
-	len += scnprintf(buf + len, PAGE_SIZE - len, "\n");
+	len += sysfs_emit_at(buf, len, "\n");
 
 	for (i = 0; i < stats->state_num; i++) {
 		if (len >= PAGE_SIZE)
 			break;
 
-		len += scnprintf(buf + len, PAGE_SIZE - len, "%9u: ",
-				stats->freq_table[i]);
+		len += sysfs_emit_at(buf, len, "%9u: ", stats->freq_table[i]);
 
 		for (j = 0; j < stats->state_num; j++) {
 			if (len >= PAGE_SIZE)
@@ -157,11 +155,11 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
 			else
 				count = stats->trans_table[i * stats->max_state + j];
 
-			len += scnprintf(buf + len, PAGE_SIZE - len, "%9u ", count);
+			len += sysfs_emit_at(buf, len, "%9u ", count);
 		}
 		if (len >= PAGE_SIZE)
 			break;
-		len += scnprintf(buf + len, PAGE_SIZE - len, "\n");
+		len += sysfs_emit_at(buf, len, "\n");
 	}
 
 	if (len >= PAGE_SIZE) {
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6ff73c30769f..fd73d6d2b808 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -298,6 +298,7 @@ static int hwp_active __read_mostly;
 static int hwp_mode_bdw __read_mostly;
 static bool per_cpu_limits __read_mostly;
 static bool hwp_boost __read_mostly;
+static bool hwp_forced __read_mostly;
 
 static struct cpufreq_driver *intel_pstate_driver __read_mostly;
 
@@ -1679,12 +1680,12 @@ static void intel_pstate_update_epp_defaults(struct cpudata *cpudata)
 		return;
 
 	/*
-	 * If powerup EPP is something other than chipset default 0x80 and
-	 * - is more performance oriented than 0x80 (default balance_perf EPP)
+	 * If the EPP is set by firmware, which means that firmware enabled HWP
+	 * - Is equal or less than 0x80 (default balance_perf EPP)
 	 * - But less performance oriented than performance EPP
 	 *   then use this as new balance_perf EPP.
 	 */
-	if (cpudata->epp_default < HWP_EPP_BALANCE_PERFORMANCE &&
+	if (hwp_forced && cpudata->epp_default <= HWP_EPP_BALANCE_PERFORMANCE &&
 	    cpudata->epp_default > HWP_EPP_PERFORMANCE) {
 		epp_values[EPP_INDEX_BALANCE_PERFORMANCE] = cpudata->epp_default;
 		return;
@@ -2378,6 +2379,7 @@ static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
 	X86_MATCH(COMETLAKE,		core_funcs),
 	X86_MATCH(ICELAKE_X,		core_funcs),
 	X86_MATCH(TIGERLAKE,		core_funcs),
+	X86_MATCH(SAPPHIRERAPIDS_X,	core_funcs),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
@@ -3384,7 +3386,7 @@ static int __init intel_pstate_init(void)
 
 	id = x86_match_cpu(hwp_support_ids);
 	if (id) {
-		bool hwp_forced = intel_pstate_hwp_is_enabled();
+		hwp_forced = intel_pstate_hwp_is_enabled();
 
 		if (hwp_forced)
 			pr_info("HWP enabled by BIOS\n");
diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index 3e000e1a75c6..4c57c6725c13 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -407,10 +407,10 @@ static int guess_fsb(int mult)
 {
 	int speed = cpu_khz / 1000;
 	int i;
-	int speeds[] = { 666, 1000, 1333, 2000 };
+	static const int speeds[] = { 666, 1000, 1333, 2000 };
 	int f_max, f_min;
 
-	for (i = 0; i < 4; i++) {
+	for (i = 0; i < ARRAY_SIZE(speeds); i++) {
 		f_max = ((speeds[i] * mult) + 50) / 100;
 		f_max += (ROUNDING / 2);
 		f_min = f_max - ROUNDING;
diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c
index f0e0a35c7f21..f80339779084 100644
--- a/drivers/cpufreq/mediatek-cpufreq-hw.c
+++ b/drivers/cpufreq/mediatek-cpufreq-hw.c
@@ -160,6 +160,7 @@ static int mtk_cpu_resources_init(struct platform_device *pdev,
 	struct mtk_cpufreq_data *data;
 	struct device *dev = &pdev->dev;
 	struct resource *res;
+	struct of_phandle_args args;
 	void __iomem *base;
 	int ret, i;
 	int index;
@@ -168,11 +169,14 @@ static int mtk_cpu_resources_init(struct platform_device *pdev,
 	if (!data)
 		return -ENOMEM;
 
-	index = of_perf_domain_get_sharing_cpumask(policy->cpu, "performance-domains",
-						   "#performance-domain-cells",
-						   policy->cpus);
-	if (index < 0)
-		return index;
+	ret = of_perf_domain_get_sharing_cpumask(policy->cpu, "performance-domains",
+						 "#performance-domain-cells",
+						 policy->cpus, &args);
+	if (ret < 0)
+		return ret;
+
+	index = args.args[0];
+	of_node_put(args.np);
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, index);
 	if (!res) {
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 833589bc95e4..340fed35e45d 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/bitfield.h>
+#include <linux/clk-provider.h>
 #include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/interconnect.h>
@@ -43,7 +44,6 @@ struct qcom_cpufreq_soc_data {
 struct qcom_cpufreq_data {
 	void __iomem *base;
 	struct resource *res;
-	const struct qcom_cpufreq_soc_data *soc_data;
 
 	/*
 	 * Mutex to synchronize between de-init sequence and re-starting LMh
@@ -55,12 +55,18 @@ struct qcom_cpufreq_data {
 	bool cancel_throttle;
 	struct delayed_work throttle_work;
 	struct cpufreq_policy *policy;
+	struct clk_hw cpu_clk;
 
 	bool per_core_dcvs;
 
 	struct freq_qos_request throttle_freq_req;
 };
 
+static struct {
+	struct qcom_cpufreq_data *data;
+	const struct qcom_cpufreq_soc_data *soc_data;
+} qcom_cpufreq;
+
 static unsigned long cpu_hw_rate, xo_rate;
 static bool icc_scaling_enabled;
 
@@ -109,7 +115,7 @@ static int qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
 					unsigned int index)
 {
 	struct qcom_cpufreq_data *data = policy->driver_data;
-	const struct qcom_cpufreq_soc_data *soc_data = data->soc_data;
+	const struct qcom_cpufreq_soc_data *soc_data = qcom_cpufreq.soc_data;
 	unsigned long freq = policy->freq_table[index].frequency;
 	unsigned int i;
 
@@ -125,9 +131,37 @@ static int qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
 	return 0;
 }
 
+static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
+{
+	unsigned int lval;
+
+	if (qcom_cpufreq.soc_data->reg_current_vote)
+		lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_current_vote) & 0x3ff;
+	else
+		lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_domain_state) & 0xff;
+
+	return lval * xo_rate;
+}
+
+/* Get the current frequency of the CPU (after throttling) */
 static unsigned int qcom_cpufreq_hw_get(unsigned int cpu)
 {
 	struct qcom_cpufreq_data *data;
+	struct cpufreq_policy *policy;
+
+	policy = cpufreq_cpu_get_raw(cpu);
+	if (!policy)
+		return 0;
+
+	data = policy->driver_data;
+
+	return qcom_lmh_get_throttle_freq(data) / HZ_PER_KHZ;
+}
+
+/* Get the frequency requested by the cpufreq core for the CPU */
+static unsigned int qcom_cpufreq_get_freq(unsigned int cpu)
+{
+	struct qcom_cpufreq_data *data;
 	const struct qcom_cpufreq_soc_data *soc_data;
 	struct cpufreq_policy *policy;
 	unsigned int index;
@@ -137,7 +171,7 @@ static unsigned int qcom_cpufreq_hw_get(unsigned int cpu)
 		return 0;
 
 	data = policy->driver_data;
-	soc_data = data->soc_data;
+	soc_data = qcom_cpufreq.soc_data;
 
 	index = readl_relaxed(data->base + soc_data->reg_perf_state);
 	index = min(index, LUT_MAX_ENTRIES - 1);
@@ -149,7 +183,7 @@ static unsigned int qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
 						unsigned int target_freq)
 {
 	struct qcom_cpufreq_data *data = policy->driver_data;
-	const struct qcom_cpufreq_soc_data *soc_data = data->soc_data;
+	const struct qcom_cpufreq_soc_data *soc_data = qcom_cpufreq.soc_data;
 	unsigned int index;
 	unsigned int i;
 
@@ -173,7 +207,7 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 	unsigned long rate;
 	int ret;
 	struct qcom_cpufreq_data *drv_data = policy->driver_data;
-	const struct qcom_cpufreq_soc_data *soc_data = drv_data->soc_data;
+	const struct qcom_cpufreq_soc_data *soc_data = qcom_cpufreq.soc_data;
 
 	table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
 	if (!table)
@@ -193,6 +227,7 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 		}
 	} else if (ret != -ENODEV) {
 		dev_err(cpu_dev, "Invalid opp table in device tree\n");
+		kfree(table);
 		return ret;
 	} else {
 		policy->fast_switch_possible = true;
@@ -286,18 +321,6 @@ static void qcom_get_related_cpus(int index, struct cpumask *m)
 	}
 }
 
-static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
-{
-	unsigned int lval;
-
-	if (data->soc_data->reg_current_vote)
-		lval = readl_relaxed(data->base + data->soc_data->reg_current_vote) & 0x3ff;
-	else
-		lval = readl_relaxed(data->base + data->soc_data->reg_domain_state) & 0xff;
-
-	return lval * xo_rate;
-}
-
 static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 {
 	struct cpufreq_policy *policy = data->policy;
@@ -341,7 +364,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 	 * If h/w throttled frequency is higher than what cpufreq has requested
 	 * for, then stop polling and switch back to interrupt mechanism.
 	 */
-	if (throttled_freq >= qcom_cpufreq_hw_get(cpu))
+	if (throttled_freq >= qcom_cpufreq_get_freq(cpu))
 		enable_irq(data->throttle_irq);
 	else
 		mod_delayed_work(system_highpri_wq, &data->throttle_work,
@@ -367,9 +390,9 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
 	disable_irq_nosync(c_data->throttle_irq);
 	schedule_delayed_work(&c_data->throttle_work, 0);
 
-	if (c_data->soc_data->reg_intr_clr)
+	if (qcom_cpufreq.soc_data->reg_intr_clr)
 		writel_relaxed(GT_IRQ_STATUS,
-			       c_data->base + c_data->soc_data->reg_intr_clr);
+			       c_data->base + qcom_cpufreq.soc_data->reg_intr_clr);
 
 	return IRQ_HANDLED;
 }
@@ -503,8 +526,6 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
 	struct of_phandle_args args;
 	struct device_node *cpu_np;
 	struct device *cpu_dev;
-	struct resource *res;
-	void __iomem *base;
 	struct qcom_cpufreq_data *data;
 	int ret, index;
 
@@ -526,51 +547,18 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
 		return ret;
 
 	index = args.args[0];
-
-	res = platform_get_resource(pdev, IORESOURCE_MEM, index);
-	if (!res) {
-		dev_err(dev, "failed to get mem resource %d\n", index);
-		return -ENODEV;
-	}
-
-	if (!request_mem_region(res->start, resource_size(res), res->name)) {
-		dev_err(dev, "failed to request resource %pR\n", res);
-		return -EBUSY;
-	}
-
-	base = ioremap(res->start, resource_size(res));
-	if (!base) {
-		dev_err(dev, "failed to map resource %pR\n", res);
-		ret = -ENOMEM;
-		goto release_region;
-	}
-
-	data = kzalloc(sizeof(*data), GFP_KERNEL);
-	if (!data) {
-		ret = -ENOMEM;
-		goto unmap_base;
-	}
-
-	data->soc_data = of_device_get_match_data(&pdev->dev);
-	data->base = base;
-	data->res = res;
+	data = &qcom_cpufreq.data[index];
 
 	/* HW should be in enabled state to proceed */
-	if (!(readl_relaxed(base + data->soc_data->reg_enable) & 0x1)) {
+	if (!(readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_enable) & 0x1)) {
 		dev_err(dev, "Domain-%d cpufreq hardware not enabled\n", index);
-		ret = -ENODEV;
-		goto error;
+		return -ENODEV;
 	}
 
-	if (readl_relaxed(base + data->soc_data->reg_dcvs_ctrl) & 0x1)
+	if (readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_dcvs_ctrl) & 0x1)
 		data->per_core_dcvs = true;
 
 	qcom_get_related_cpus(index, policy->cpus);
-	if (cpumask_empty(policy->cpus)) {
-		dev_err(dev, "Domain-%d failed to get related CPUs\n", index);
-		ret = -ENOENT;
-		goto error;
-	}
 
 	policy->driver_data = data;
 	policy->dvfs_possible_from_any_cpu = true;
@@ -578,14 +566,13 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
 	ret = qcom_cpufreq_hw_read_lut(cpu_dev, policy);
 	if (ret) {
 		dev_err(dev, "Domain-%d failed to read LUT\n", index);
-		goto error;
+		return ret;
 	}
 
 	ret = dev_pm_opp_get_opp_count(cpu_dev);
 	if (ret <= 0) {
 		dev_err(cpu_dev, "Failed to add OPPs\n");
-		ret = -ENODEV;
-		goto error;
+		return -ENODEV;
 	}
 
 	if (policy_has_boost_freq(policy)) {
@@ -594,18 +581,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
 			dev_warn(cpu_dev, "failed to enable boost: %d\n", ret);
 	}
 
-	ret = qcom_cpufreq_hw_lmh_init(policy, index);
-	if (ret)
-		goto error;
-
-	return 0;
-error:
-	kfree(data);
-unmap_base:
-	iounmap(base);
-release_region:
-	release_mem_region(res->start, resource_size(res));
-	return ret;
+	return qcom_cpufreq_hw_lmh_init(policy, index);
 }
 
 static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
@@ -658,20 +634,33 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.ready		= qcom_cpufreq_ready,
 };
 
+static unsigned long qcom_cpufreq_hw_recalc_rate(struct clk_hw *hw, unsigned long parent_rate)
+{
+	struct qcom_cpufreq_data *data = container_of(hw, struct qcom_cpufreq_data, cpu_clk);
+
+	return qcom_lmh_get_throttle_freq(data);
+}
+
+static const struct clk_ops qcom_cpufreq_hw_clk_ops = {
+	.recalc_rate = qcom_cpufreq_hw_recalc_rate,
+};
+
 static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev)
 {
+	struct clk_hw_onecell_data *clk_data;
+	struct device *dev = &pdev->dev;
 	struct device *cpu_dev;
 	struct clk *clk;
-	int ret;
+	int ret, i, num_domains;
 
-	clk = clk_get(&pdev->dev, "xo");
+	clk = clk_get(dev, "xo");
 	if (IS_ERR(clk))
 		return PTR_ERR(clk);
 
 	xo_rate = clk_get_rate(clk);
 	clk_put(clk);
 
-	clk = clk_get(&pdev->dev, "alternate");
+	clk = clk_get(dev, "alternate");
 	if (IS_ERR(clk))
 		return PTR_ERR(clk);
 
@@ -689,11 +678,70 @@ static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	/* Allocate qcom_cpufreq_data based on the available frequency domains in DT */
+	num_domains = of_property_count_elems_of_size(dev->of_node, "reg", sizeof(u32) * 4);
+	if (num_domains <= 0)
+		return num_domains;
+
+	qcom_cpufreq.data = devm_kzalloc(dev, sizeof(struct qcom_cpufreq_data) * num_domains,
+					 GFP_KERNEL);
+	if (!qcom_cpufreq.data)
+		return -ENOMEM;
+
+	qcom_cpufreq.soc_data = of_device_get_match_data(dev);
+
+	clk_data = devm_kzalloc(dev, struct_size(clk_data, hws, num_domains), GFP_KERNEL);
+	if (!clk_data)
+		return -ENOMEM;
+
+	clk_data->num = num_domains;
+
+	for (i = 0; i < num_domains; i++) {
+		struct qcom_cpufreq_data *data = &qcom_cpufreq.data[i];
+		struct clk_init_data clk_init = {};
+		struct resource *res;
+		void __iomem *base;
+
+		base = devm_platform_get_and_ioremap_resource(pdev, i, &res);
+		if (IS_ERR(base)) {
+			dev_err(dev, "Failed to map resource %pR\n", res);
+			return PTR_ERR(base);
+		}
+
+		data->base = base;
+		data->res = res;
+
+		/* Register CPU clock for each frequency domain */
+		clk_init.name = kasprintf(GFP_KERNEL, "qcom_cpufreq%d", i);
+		if (!clk_init.name)
+			return -ENOMEM;
+
+		clk_init.flags = CLK_GET_RATE_NOCACHE;
+		clk_init.ops = &qcom_cpufreq_hw_clk_ops;
+		data->cpu_clk.init = &clk_init;
+
+		ret = devm_clk_hw_register(dev, &data->cpu_clk);
+		if (ret < 0) {
+			dev_err(dev, "Failed to register clock %d: %d\n", i, ret);
+			kfree(clk_init.name);
+			return ret;
+		}
+
+		clk_data->hws[i] = &data->cpu_clk;
+		kfree(clk_init.name);
+	}
+
+	ret = devm_of_clk_add_hw_provider(dev, of_clk_hw_onecell_get, clk_data);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add clock provider\n");
+		return ret;
+	}
+
 	ret = cpufreq_register_driver(&cpufreq_qcom_hw_driver);
 	if (ret)
-		dev_err(&pdev->dev, "CPUFreq HW driver failed to register\n");
+		dev_err(dev, "CPUFreq HW driver failed to register\n");
 	else
-		dev_dbg(&pdev->dev, "QCOM CPUFreq HW driver initialized\n");
+		dev_dbg(dev, "QCOM CPUFreq HW driver initialized\n");
 
 	return ret;
 }
diff --git a/drivers/cpufreq/spear-cpufreq.c b/drivers/cpufreq/spear-cpufreq.c
index 7d0d62a06bf3..c6fdf019dbde 100644
--- a/drivers/cpufreq/spear-cpufreq.c
+++ b/drivers/cpufreq/spear-cpufreq.c
@@ -39,7 +39,7 @@ static struct clk *spear1340_cpu_get_possible_parent(unsigned long newfreq)
 	 * In SPEAr1340, cpu clk's parent sys clk can take input from
 	 * following sources
 	 */
-	const char *sys_clk_src[] = {
+	static const char * const sys_clk_src[] = {
 		"sys_syn_clk",
 		"pll1_clk",
 		"pll2_clk",
diff --git a/drivers/cpufreq/tegra186-cpufreq.c b/drivers/cpufreq/tegra186-cpufreq.c
index 6c88827f4e62..f98f53bf1011 100644
--- a/drivers/cpufreq/tegra186-cpufreq.c
+++ b/drivers/cpufreq/tegra186-cpufreq.c
@@ -65,8 +65,8 @@ struct tegra186_cpufreq_cluster {
 
 struct tegra186_cpufreq_data {
 	void __iomem *regs;
-	struct tegra186_cpufreq_cluster *clusters;
 	const struct tegra186_cpufreq_cpu *cpus;
+	struct tegra186_cpufreq_cluster clusters[];
 };
 
 static int tegra186_cpufreq_init(struct cpufreq_policy *policy)
@@ -221,15 +221,12 @@ static int tegra186_cpufreq_probe(struct platform_device *pdev)
 	struct tegra_bpmp *bpmp;
 	unsigned int i = 0, err;
 
-	data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
+	data = devm_kzalloc(&pdev->dev,
+			    struct_size(data, clusters, TEGRA186_NUM_CLUSTERS),
+			    GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
 
-	data->clusters = devm_kcalloc(&pdev->dev, TEGRA186_NUM_CLUSTERS,
-				      sizeof(*data->clusters), GFP_KERNEL);
-	if (!data->clusters)
-		return -ENOMEM;
-
 	data->cpus = tegra186_cpus;
 
 	bpmp = tegra_bpmp_get(&pdev->dev);
diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c
index f64180dd2005..be4209d97cb3 100644
--- a/drivers/cpufreq/ti-cpufreq.c
+++ b/drivers/cpufreq/ti-cpufreq.c
@@ -39,6 +39,14 @@
 #define OMAP34xx_ProdID_SKUID			0x4830A20C
 #define OMAP3_SYSCON_BASE	(0x48000000 + 0x2000 + 0x270)
 
+#define AM625_EFUSE_K_MPU_OPP			11
+#define AM625_EFUSE_S_MPU_OPP			19
+#define AM625_EFUSE_T_MPU_OPP			20
+
+#define AM625_SUPPORT_K_MPU_OPP			BIT(0)
+#define AM625_SUPPORT_S_MPU_OPP			BIT(1)
+#define AM625_SUPPORT_T_MPU_OPP			BIT(2)
+
 #define VERSION_COUNT				2
 
 struct ti_cpufreq_data;
@@ -104,6 +112,25 @@ static unsigned long omap3_efuse_xlate(struct ti_cpufreq_data *opp_data,
 	return BIT(efuse);
 }
 
+static unsigned long am625_efuse_xlate(struct ti_cpufreq_data *opp_data,
+				       unsigned long efuse)
+{
+	unsigned long calculated_efuse = AM625_SUPPORT_K_MPU_OPP;
+
+	switch (efuse) {
+	case AM625_EFUSE_T_MPU_OPP:
+		calculated_efuse |= AM625_SUPPORT_T_MPU_OPP;
+		fallthrough;
+	case AM625_EFUSE_S_MPU_OPP:
+		calculated_efuse |= AM625_SUPPORT_S_MPU_OPP;
+		fallthrough;
+	case AM625_EFUSE_K_MPU_OPP:
+		calculated_efuse |= AM625_SUPPORT_K_MPU_OPP;
+	}
+
+	return calculated_efuse;
+}
+
 static struct ti_cpufreq_soc_data am3x_soc_data = {
 	.efuse_xlate = amx3_efuse_xlate,
 	.efuse_fallback = AM33XX_800M_ARM_MPU_MAX_FREQ,
@@ -198,6 +225,14 @@ static struct ti_cpufreq_soc_data am3517_soc_data = {
 	.multi_regulator = false,
 };
 
+static struct ti_cpufreq_soc_data am625_soc_data = {
+	.efuse_xlate = am625_efuse_xlate,
+	.efuse_offset = 0x0018,
+	.efuse_mask = 0x07c0,
+	.efuse_shift = 0x6,
+	.rev_offset = 0x0014,
+	.multi_regulator = false,
+};
 
 /**
  * ti_cpufreq_get_efuse() - Parse and return efuse value present on SoC
@@ -301,6 +336,7 @@ static const struct of_device_id ti_cpufreq_of_match[] = {
 	{ .compatible = "ti,dra7", .data = &dra7_soc_data },
 	{ .compatible = "ti,omap34xx", .data = &omap34xx_soc_data, },
 	{ .compatible = "ti,omap36xx", .data = &omap36xx_soc_data, },
+	{ .compatible = "ti,am625", .data = &am625_soc_data, },
 	/* legacy */
 	{ .compatible = "ti,omap3430", .data = &omap34xx_soc_data, },
 	{ .compatible = "ti,omap3630", .data = &omap36xx_soc_data, },
diff --git a/drivers/cpuidle/cpuidle-psci-domain.c b/drivers/cpuidle/cpuidle-psci-domain.c
index 821984947ed9..c80cf9ddabd8 100644
--- a/drivers/cpuidle/cpuidle-psci-domain.c
+++ b/drivers/cpuidle/cpuidle-psci-domain.c
@@ -181,7 +181,8 @@ static int psci_cpuidle_domain_probe(struct platform_device *pdev)
 	if (ret)
 		goto remove_pd;
 
-	pr_info("Initialized CPU PM domain topology\n");
+	pr_info("Initialized CPU PM domain topology using %s mode\n",
+		use_osi ? "OSI" : "PC");
 	return 0;
 
 put_node:
diff --git a/drivers/cpuidle/dt_idle_states.c b/drivers/cpuidle/dt_idle_states.c
index 252f2a9686a6..7ca3d7d9b5ea 100644
--- a/drivers/cpuidle/dt_idle_states.c
+++ b/drivers/cpuidle/dt_idle_states.c
@@ -211,18 +211,15 @@ int dt_init_idle_driver(struct cpuidle_driver *drv,
 	of_node_put(cpu_node);
 	if (err)
 		return err;
-	/*
-	 * Update the driver state count only if some valid DT idle states
-	 * were detected
-	 */
-	if (i)
-		drv->state_count = state_idx;
+
+	/* Set the number of total supported idle states. */
+	drv->state_count = state_idx;
 
 	/*
 	 * Return the number of present and valid DT idle states, which can
 	 * also be 0 on platforms with missing DT idle states or legacy DT
 	 * configuration predating the DT idle states bindings.
 	 */
-	return i;
+	return state_idx - start_idx;
 }
 EXPORT_SYMBOL_GPL(dt_init_idle_driver);
diff --git a/drivers/devfreq/devfreq-event.c b/drivers/devfreq/devfreq-event.c
index 6765c03334bc..f041edccd107 100644
--- a/drivers/devfreq/devfreq-event.c
+++ b/drivers/devfreq/devfreq-event.c
@@ -233,7 +233,7 @@ struct devfreq_event_dev *devfreq_event_get_edev_by_phandle(struct device *dev,
 
 	mutex_lock(&devfreq_event_list_lock);
 	list_for_each_entry(edev, &devfreq_event_list, node) {
-		if (edev->dev.parent && edev->dev.parent->of_node == node)
+		if (edev->dev.parent && device_match_of_node(edev->dev.parent, node))
 			goto out;
 	}
 
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 63347a5ae599..817c71da391a 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -776,8 +776,7 @@ static void remove_sysfs_files(struct devfreq *devfreq,
  * @dev:	the device to add devfreq feature.
  * @profile:	device-specific profile to run devfreq.
  * @governor_name:	name of the policy to choose frequency.
- * @data:	private data for the governor. The devfreq framework does not
- *		touch this value.
+ * @data:	devfreq driver pass to governors, governor should not change it.
  */
 struct devfreq *devfreq_add_device(struct device *dev,
 				   struct devfreq_dev_profile *profile,
@@ -1011,8 +1010,7 @@ static void devm_devfreq_dev_release(struct device *dev, void *res)
  * @dev:	the device to add devfreq feature.
  * @profile:	device-specific profile to run devfreq.
  * @governor_name:	name of the policy to choose frequency.
- * @data:	private data for the governor. The devfreq framework does not
- *		touch this value.
+ * @data:	 devfreq driver pass to governors, governor should not change it.
  *
  * This function manages automatically the memory of devfreq device using device
  * resource management and simplify the free operation for memory of devfreq
@@ -1059,7 +1057,7 @@ struct devfreq *devfreq_get_devfreq_by_node(struct device_node *node)
 	mutex_lock(&devfreq_list_lock);
 	list_for_each_entry(devfreq, &devfreq_list, node) {
 		if (devfreq->dev.parent
-			&& devfreq->dev.parent->of_node == node) {
+			&& device_match_of_node(devfreq->dev.parent, node)) {
 			mutex_unlock(&devfreq_list_lock);
 			return devfreq;
 		}
diff --git a/drivers/devfreq/event/exynos-nocp.c b/drivers/devfreq/event/exynos-nocp.c
index ccc531ee6938..c1cc23bcb995 100644
--- a/drivers/devfreq/event/exynos-nocp.c
+++ b/drivers/devfreq/event/exynos-nocp.c
@@ -214,8 +214,7 @@ static int exynos_nocp_parse_dt(struct platform_device *pdev,
 		nocp->clk = NULL;
 
 	/* Maps the memory mapped IO to control nocp register */
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	base = devm_ioremap_resource(dev, res);
+	base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
 	if (IS_ERR(base))
 		return PTR_ERR(base);
 
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index ab9db7adb3ad..d69672ccacc4 100644
--- a/drivers/devfreq/governor_userspace.c
+++ b/drivers/devfreq/governor_userspace.c
@@ -21,7 +21,7 @@ struct userspace_data {
 
 static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
 {
-	struct userspace_data *data = df->data;
+	struct userspace_data *data = df->governor_data;
 
 	if (data->valid)
 		*freq = data->user_frequency;
@@ -40,7 +40,7 @@ static ssize_t set_freq_store(struct device *dev, struct device_attribute *attr,
 	int err = 0;
 
 	mutex_lock(&devfreq->lock);
-	data = devfreq->data;
+	data = devfreq->governor_data;
 
 	sscanf(buf, "%lu", &wanted);
 	data->user_frequency = wanted;
@@ -60,7 +60,7 @@ static ssize_t set_freq_show(struct device *dev,
 	int err = 0;
 
 	mutex_lock(&devfreq->lock);
-	data = devfreq->data;
+	data = devfreq->governor_data;
 
 	if (data->valid)
 		err = sprintf(buf, "%lu\n", data->user_frequency);
@@ -91,7 +91,7 @@ static int userspace_init(struct devfreq *devfreq)
 		goto out;
 	}
 	data->valid = false;
-	devfreq->data = data;
+	devfreq->governor_data = data;
 
 	err = sysfs_create_group(&devfreq->dev.kobj, &dev_attr_group);
 out:
@@ -107,8 +107,8 @@ static void userspace_exit(struct devfreq *devfreq)
 	if (devfreq->dev.kobj.sd)
 		sysfs_remove_group(&devfreq->dev.kobj, &dev_attr_group);
 
-	kfree(devfreq->data);
-	devfreq->data = NULL;
+	kfree(devfreq->governor_data);
+	devfreq->governor_data = NULL;
 }
 
 static int devfreq_userspace_handler(struct devfreq *devfreq,
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 605d68673f92..e55c6095adf0 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -578,169 +578,140 @@ static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
 	return false;
 }
 
-static int opp_parse_supplies(struct dev_pm_opp *opp, struct device *dev,
-			      struct opp_table *opp_table)
+static u32 *_parse_named_prop(struct dev_pm_opp *opp, struct device *dev,
+			      struct opp_table *opp_table,
+			      const char *prop_type, bool *triplet)
 {
-	u32 *microvolt, *microamp = NULL, *microwatt = NULL;
-	int supplies = opp_table->regulator_count;
-	int vcount, icount, pcount, ret, i, j;
 	struct property *prop = NULL;
 	char name[NAME_MAX];
+	int count, ret;
+	u32 *out;
 
-	/* Search for "opp-microvolt-<name>" */
+	/* Search for "opp-<prop_type>-<name>" */
 	if (opp_table->prop_name) {
-		snprintf(name, sizeof(name), "opp-microvolt-%s",
+		snprintf(name, sizeof(name), "opp-%s-%s", prop_type,
 			 opp_table->prop_name);
 		prop = of_find_property(opp->np, name, NULL);
 	}
 
 	if (!prop) {
-		/* Search for "opp-microvolt" */
-		sprintf(name, "opp-microvolt");
+		/* Search for "opp-<prop_type>" */
+		snprintf(name, sizeof(name), "opp-%s", prop_type);
 		prop = of_find_property(opp->np, name, NULL);
-
-		/* Missing property isn't a problem, but an invalid entry is */
-		if (!prop) {
-			if (unlikely(supplies == -1)) {
-				/* Initialize regulator_count */
-				opp_table->regulator_count = 0;
-				return 0;
-			}
-
-			if (!supplies)
-				return 0;
-
-			dev_err(dev, "%s: opp-microvolt missing although OPP managing regulators\n",
-				__func__);
-			return -EINVAL;
-		}
+		if (!prop)
+			return NULL;
 	}
 
-	if (unlikely(supplies == -1)) {
-		/* Initialize regulator_count */
-		supplies = opp_table->regulator_count = 1;
-	} else if (unlikely(!supplies)) {
-		dev_err(dev, "%s: opp-microvolt wasn't expected\n", __func__);
-		return -EINVAL;
+	count = of_property_count_u32_elems(opp->np, name);
+	if (count < 0) {
+		dev_err(dev, "%s: Invalid %s property (%d)\n", __func__, name,
+			count);
+		return ERR_PTR(count);
 	}
 
-	vcount = of_property_count_u32_elems(opp->np, name);
-	if (vcount < 0) {
-		dev_err(dev, "%s: Invalid %s property (%d)\n",
-			__func__, name, vcount);
-		return vcount;
-	}
-
-	/* There can be one or three elements per supply */
-	if (vcount != supplies && vcount != supplies * 3) {
-		dev_err(dev, "%s: Invalid number of elements in %s property (%d) with supplies (%d)\n",
-			__func__, name, vcount, supplies);
-		return -EINVAL;
+	/*
+	 * Initialize regulator_count, if regulator information isn't provided
+	 * by the platform. Now that one of the properties is available, fix the
+	 * regulator_count to 1.
+	 */
+	if (unlikely(opp_table->regulator_count == -1))
+		opp_table->regulator_count = 1;
+
+	if (count != opp_table->regulator_count &&
+	    (!triplet || count != opp_table->regulator_count * 3)) {
+		dev_err(dev, "%s: Invalid number of elements in %s property (%u) with supplies (%d)\n",
+			__func__, prop_type, count, opp_table->regulator_count);
+		return ERR_PTR(-EINVAL);
 	}
 
-	microvolt = kmalloc_array(vcount, sizeof(*microvolt), GFP_KERNEL);
-	if (!microvolt)
-		return -ENOMEM;
+	out = kmalloc_array(count, sizeof(*out), GFP_KERNEL);
+	if (!out)
+		return ERR_PTR(-EINVAL);
 
-	ret = of_property_read_u32_array(opp->np, name, microvolt, vcount);
+	ret = of_property_read_u32_array(opp->np, name, out, count);
 	if (ret) {
 		dev_err(dev, "%s: error parsing %s: %d\n", __func__, name, ret);
-		ret = -EINVAL;
-		goto free_microvolt;
-	}
-
-	/* Search for "opp-microamp-<name>" */
-	prop = NULL;
-	if (opp_table->prop_name) {
-		snprintf(name, sizeof(name), "opp-microamp-%s",
-			 opp_table->prop_name);
-		prop = of_find_property(opp->np, name, NULL);
+		kfree(out);
+		return ERR_PTR(-EINVAL);
 	}
 
-	if (!prop) {
-		/* Search for "opp-microamp" */
-		sprintf(name, "opp-microamp");
-		prop = of_find_property(opp->np, name, NULL);
-	}
+	if (triplet)
+		*triplet = count != opp_table->regulator_count;
 
-	if (prop) {
-		icount = of_property_count_u32_elems(opp->np, name);
-		if (icount < 0) {
-			dev_err(dev, "%s: Invalid %s property (%d)\n", __func__,
-				name, icount);
-			ret = icount;
-			goto free_microvolt;
-		}
+	return out;
+}
 
-		if (icount != supplies) {
-			dev_err(dev, "%s: Invalid number of elements in %s property (%d) with supplies (%d)\n",
-				__func__, name, icount, supplies);
-			ret = -EINVAL;
-			goto free_microvolt;
-		}
+static u32 *opp_parse_microvolt(struct dev_pm_opp *opp, struct device *dev,
+				struct opp_table *opp_table, bool *triplet)
+{
+	u32 *microvolt;
 
-		microamp = kmalloc_array(icount, sizeof(*microamp), GFP_KERNEL);
-		if (!microamp) {
-			ret = -EINVAL;
-			goto free_microvolt;
-		}
+	microvolt = _parse_named_prop(opp, dev, opp_table, "microvolt", triplet);
+	if (IS_ERR(microvolt))
+		return microvolt;
 
-		ret = of_property_read_u32_array(opp->np, name, microamp,
-						 icount);
-		if (ret) {
-			dev_err(dev, "%s: error parsing %s: %d\n", __func__,
-				name, ret);
-			ret = -EINVAL;
-			goto free_microamp;
+	if (!microvolt) {
+		/*
+		 * Missing property isn't a problem, but an invalid
+		 * entry is. This property isn't optional if regulator
+		 * information is provided. Check only for the first OPP, as
+		 * regulator_count may get initialized after that to a valid
+		 * value.
+		 */
+		if (list_empty(&opp_table->opp_list) &&
+		    opp_table->regulator_count > 0) {
+			dev_err(dev, "%s: opp-microvolt missing although OPP managing regulators\n",
+				__func__);
+			return ERR_PTR(-EINVAL);
 		}
 	}
 
-	/* Search for "opp-microwatt" */
-	sprintf(name, "opp-microwatt");
-	prop = of_find_property(opp->np, name, NULL);
-
-	if (prop) {
-		pcount = of_property_count_u32_elems(opp->np, name);
-		if (pcount < 0) {
-			dev_err(dev, "%s: Invalid %s property (%d)\n", __func__,
-				name, pcount);
-			ret = pcount;
-			goto free_microamp;
-		}
+	return microvolt;
+}
 
-		if (pcount != supplies) {
-			dev_err(dev, "%s: Invalid number of elements in %s property (%d) with supplies (%d)\n",
-				__func__, name, pcount, supplies);
-			ret = -EINVAL;
-			goto free_microamp;
-		}
+static int opp_parse_supplies(struct dev_pm_opp *opp, struct device *dev,
+			      struct opp_table *opp_table)
+{
+	u32 *microvolt, *microamp, *microwatt;
+	int ret = 0, i, j;
+	bool triplet;
 
-		microwatt = kmalloc_array(pcount, sizeof(*microwatt),
-					  GFP_KERNEL);
-		if (!microwatt) {
-			ret = -EINVAL;
-			goto free_microamp;
-		}
+	microvolt = opp_parse_microvolt(opp, dev, opp_table, &triplet);
+	if (IS_ERR(microvolt))
+		return PTR_ERR(microvolt);
 
-		ret = of_property_read_u32_array(opp->np, name, microwatt,
-						 pcount);
-		if (ret) {
-			dev_err(dev, "%s: error parsing %s: %d\n", __func__,
-				name, ret);
-			ret = -EINVAL;
-			goto free_microwatt;
-		}
+	microamp = _parse_named_prop(opp, dev, opp_table, "microamp", NULL);
+	if (IS_ERR(microamp)) {
+		ret = PTR_ERR(microamp);
+		goto free_microvolt;
 	}
 
-	for (i = 0, j = 0; i < supplies; i++) {
-		opp->supplies[i].u_volt = microvolt[j++];
+	microwatt = _parse_named_prop(opp, dev, opp_table, "microwatt", NULL);
+	if (IS_ERR(microwatt)) {
+		ret = PTR_ERR(microwatt);
+		goto free_microamp;
+	}
 
-		if (vcount == supplies) {
-			opp->supplies[i].u_volt_min = opp->supplies[i].u_volt;
-			opp->supplies[i].u_volt_max = opp->supplies[i].u_volt;
-		} else {
-			opp->supplies[i].u_volt_min = microvolt[j++];
-			opp->supplies[i].u_volt_max = microvolt[j++];
+	/*
+	 * Initialize regulator_count if it is uninitialized and no properties
+	 * are found.
+	 */
+	if (unlikely(opp_table->regulator_count == -1)) {
+		opp_table->regulator_count = 0;
+		return 0;
+	}
+
+	for (i = 0, j = 0; i < opp_table->regulator_count; i++) {
+		if (microvolt) {
+			opp->supplies[i].u_volt = microvolt[j++];
+
+			if (triplet) {
+				opp->supplies[i].u_volt_min = microvolt[j++];
+				opp->supplies[i].u_volt_max = microvolt[j++];
+			} else {
+				opp->supplies[i].u_volt_min = opp->supplies[i].u_volt;
+				opp->supplies[i].u_volt_max = opp->supplies[i].u_volt;
+			}
 		}
 
 		if (microamp)
@@ -750,7 +721,6 @@ static int opp_parse_supplies(struct dev_pm_opp *opp, struct device *dev,
 			opp->supplies[i].u_watt = microwatt[i];
 	}
 
-free_microwatt:
 	kfree(microwatt);
 free_microamp:
 	kfree(microamp);
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
index 8f9c571d7257..00ac7e381441 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
@@ -203,6 +203,7 @@ static const struct x86_cpu_id intel_uncore_cpu_ids[] = {
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,	NULL),
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D,	NULL),
 	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, NULL),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, intel_uncore_cpu_ids);
diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 515e3ceb3393..90d33cd1b670 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -44,6 +44,19 @@ config IDLE_INJECT
 	  synchronously on a set of specified CPUs or alternatively
 	  on a per CPU basis.
 
+config ARM_SCMI_POWERCAP
+	tristate "ARM SCMI Powercap driver"
+	depends on ARM_SCMI_PROTOCOL
+	help
+	  This enables support for the ARM Powercap based on ARM SCMI
+	  Powercap protocol.
+
+	  ARM SCMI Powercap protocol allows power limits to be enforced
+	  and monitored against the SCMI Powercap domains advertised as
+	  available by the SCMI platform firmware.
+
+	  When compiled as module it will be called arm_scmi_powercap.ko.
+
 config DTPM
 	bool "Power capping for Dynamic Thermal Power Management (EXPERIMENTAL)"
 	depends on OF
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 494617cdad88..4474201b4aa7 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_POWERCAP)	+= powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
 obj-$(CONFIG_IDLE_INJECT) += idle_inject.o
+obj-$(CONFIG_ARM_SCMI_POWERCAP) += arm_scmi_powercap.o
diff --git a/drivers/powercap/arm_scmi_powercap.c b/drivers/powercap/arm_scmi_powercap.c
new file mode 100644
index 000000000000..05d0e516176a
--- /dev/null
+++ b/drivers/powercap/arm_scmi_powercap.c
@@ -0,0 +1,509 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SCMI Powercap support.
+ *
+ * Copyright (C) 2022 ARM Ltd.
+ */
+
+#include <linux/device.h>
+#include <linux/math.h>
+#include <linux/limits.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/powercap.h>
+#include <linux/scmi_protocol.h>
+
+#define to_scmi_powercap_zone(z)		\
+	container_of(z, struct scmi_powercap_zone, zone)
+
+static const struct scmi_powercap_proto_ops *powercap_ops;
+
+struct scmi_powercap_zone {
+	unsigned int height;
+	struct device *dev;
+	struct scmi_protocol_handle *ph;
+	const struct scmi_powercap_info *info;
+	struct scmi_powercap_zone *spzones;
+	struct powercap_zone zone;
+	struct list_head node;
+};
+
+struct scmi_powercap_root {
+	unsigned int num_zones;
+	struct scmi_powercap_zone *spzones;
+	struct list_head *registered_zones;
+};
+
+static struct powercap_control_type *scmi_top_pcntrl;
+
+static int scmi_powercap_zone_release(struct powercap_zone *pz)
+{
+	return 0;
+}
+
+static int scmi_powercap_get_max_power_range_uw(struct powercap_zone *pz,
+						u64 *max_power_range_uw)
+{
+	*max_power_range_uw = U32_MAX;
+	return 0;
+}
+
+static int scmi_powercap_get_power_uw(struct powercap_zone *pz,
+				      u64 *power_uw)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+	u32 avg_power, pai;
+	int ret;
+
+	if (!spz->info->powercap_monitoring)
+		return -EINVAL;
+
+	ret = powercap_ops->measurements_get(spz->ph, spz->info->id, &avg_power,
+					     &pai);
+	if (ret)
+		return ret;
+
+	*power_uw = avg_power;
+	if (spz->info->powercap_scale_mw)
+		*power_uw *= 1000;
+
+	return 0;
+}
+
+static const struct powercap_zone_ops zone_ops = {
+	.get_max_power_range_uw = scmi_powercap_get_max_power_range_uw,
+	.get_power_uw = scmi_powercap_get_power_uw,
+	.release = scmi_powercap_zone_release,
+};
+
+static void scmi_powercap_normalize_cap(const struct scmi_powercap_zone *spz,
+					u64 power_limit_uw, u32 *norm)
+{
+	bool scale_mw = spz->info->powercap_scale_mw;
+	u64 val;
+
+	val = scale_mw ? DIV_ROUND_UP_ULL(power_limit_uw, 1000) : power_limit_uw;
+	/*
+	 * This cast is lossless since here @req_power is certain to be within
+	 * the range [min_power_cap, max_power_cap] whose bounds are assured to
+	 * be two unsigned 32bits quantities.
+	 */
+	*norm = clamp_t(u32, val, spz->info->min_power_cap,
+			spz->info->max_power_cap);
+	*norm = rounddown(*norm, spz->info->power_cap_step);
+
+	val = (scale_mw) ? *norm * 1000 : *norm;
+	if (power_limit_uw != val)
+		dev_dbg(spz->dev,
+			"Normalized %s:CAP - requested:%llu - normalized:%llu\n",
+			spz->info->name, power_limit_uw, val);
+}
+
+static int scmi_powercap_set_power_limit_uw(struct powercap_zone *pz, int cid,
+					    u64 power_uw)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+	u32 norm_power;
+
+	if (!spz->info->powercap_cap_config)
+		return -EINVAL;
+
+	scmi_powercap_normalize_cap(spz, power_uw, &norm_power);
+
+	return powercap_ops->cap_set(spz->ph, spz->info->id, norm_power, false);
+}
+
+static int scmi_powercap_get_power_limit_uw(struct powercap_zone *pz, int cid,
+					    u64 *power_limit_uw)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+	u32 power;
+	int ret;
+
+	ret = powercap_ops->cap_get(spz->ph, spz->info->id, &power);
+	if (ret)
+		return ret;
+
+	*power_limit_uw = power;
+	if (spz->info->powercap_scale_mw)
+		*power_limit_uw *= 1000;
+
+	return 0;
+}
+
+static void scmi_powercap_normalize_time(const struct scmi_powercap_zone *spz,
+					 u64 time_us, u32 *norm)
+{
+	/*
+	 * This cast is lossless since here @time_us is certain to be within the
+	 * range [min_pai, max_pai] whose bounds are assured to be two unsigned
+	 * 32bits quantities.
+	 */
+	*norm = clamp_t(u32, time_us, spz->info->min_pai, spz->info->max_pai);
+	*norm = rounddown(*norm, spz->info->pai_step);
+
+	if (time_us != *norm)
+		dev_dbg(spz->dev,
+			"Normalized %s:PAI - requested:%llu - normalized:%u\n",
+			spz->info->name, time_us, *norm);
+}
+
+static int scmi_powercap_set_time_window_us(struct powercap_zone *pz, int cid,
+					    u64 time_window_us)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+	u32 norm_pai;
+
+	if (!spz->info->powercap_pai_config)
+		return -EINVAL;
+
+	scmi_powercap_normalize_time(spz, time_window_us, &norm_pai);
+
+	return powercap_ops->pai_set(spz->ph, spz->info->id, norm_pai);
+}
+
+static int scmi_powercap_get_time_window_us(struct powercap_zone *pz, int cid,
+					    u64 *time_window_us)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+	int ret;
+	u32 pai;
+
+	ret = powercap_ops->pai_get(spz->ph, spz->info->id, &pai);
+	if (ret)
+		return ret;
+
+	*time_window_us = pai;
+
+	return 0;
+}
+
+static int scmi_powercap_get_max_power_uw(struct powercap_zone *pz, int cid,
+					  u64 *max_power_uw)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+
+	*max_power_uw = spz->info->max_power_cap;
+	if (spz->info->powercap_scale_mw)
+		*max_power_uw *= 1000;
+
+	return 0;
+}
+
+static int scmi_powercap_get_min_power_uw(struct powercap_zone *pz, int cid,
+					  u64 *min_power_uw)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+
+	*min_power_uw = spz->info->min_power_cap;
+	if (spz->info->powercap_scale_mw)
+		*min_power_uw *= 1000;
+
+	return 0;
+}
+
+static int scmi_powercap_get_max_time_window_us(struct powercap_zone *pz,
+						int cid, u64 *time_window_us)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+
+	*time_window_us = spz->info->max_pai;
+
+	return 0;
+}
+
+static int scmi_powercap_get_min_time_window_us(struct powercap_zone *pz,
+						int cid, u64 *time_window_us)
+{
+	struct scmi_powercap_zone *spz = to_scmi_powercap_zone(pz);
+
+	*time_window_us = (u64)spz->info->min_pai;
+
+	return 0;
+}
+
+static const char *scmi_powercap_get_name(struct powercap_zone *pz, int cid)
+{
+	return "SCMI power-cap";
+}
+
+static const struct powercap_zone_constraint_ops constraint_ops  = {
+	.set_power_limit_uw = scmi_powercap_set_power_limit_uw,
+	.get_power_limit_uw = scmi_powercap_get_power_limit_uw,
+	.set_time_window_us = scmi_powercap_set_time_window_us,
+	.get_time_window_us = scmi_powercap_get_time_window_us,
+	.get_max_power_uw = scmi_powercap_get_max_power_uw,
+	.get_min_power_uw = scmi_powercap_get_min_power_uw,
+	.get_max_time_window_us = scmi_powercap_get_max_time_window_us,
+	.get_min_time_window_us = scmi_powercap_get_min_time_window_us,
+	.get_name = scmi_powercap_get_name,
+};
+
+static void scmi_powercap_unregister_all_zones(struct scmi_powercap_root *pr)
+{
+	int i;
+
+	/* Un-register children zones first starting from the leaves */
+	for (i = pr->num_zones - 1; i >= 0; i--) {
+		if (!list_empty(&pr->registered_zones[i])) {
+			struct scmi_powercap_zone *spz;
+
+			list_for_each_entry(spz, &pr->registered_zones[i], node)
+				powercap_unregister_zone(scmi_top_pcntrl,
+							 &spz->zone);
+		}
+	}
+}
+
+static inline bool
+scmi_powercap_is_zone_registered(struct scmi_powercap_zone *spz)
+{
+	return !list_empty(&spz->node);
+}
+
+static inline unsigned int
+scmi_powercap_get_zone_height(struct scmi_powercap_zone *spz)
+{
+	if (spz->info->parent_id == SCMI_POWERCAP_ROOT_ZONE_ID)
+		return 0;
+
+	return spz->spzones[spz->info->parent_id].height + 1;
+}
+
+static inline struct scmi_powercap_zone *
+scmi_powercap_get_parent_zone(struct scmi_powercap_zone *spz)
+{
+	if (spz->info->parent_id == SCMI_POWERCAP_ROOT_ZONE_ID)
+		return NULL;
+
+	return &spz->spzones[spz->info->parent_id];
+}
+
+/**
+ * scmi_powercap_register_zone  - Register an SCMI powercap zone recursively
+ *
+ * @pr: A reference to the root powercap zones descriptors
+ * @spz: A reference to the SCMI powercap zone to register
+ *
+ * When registering SCMI powercap zones with the powercap framework we should
+ * take care to always register zones starting from the root ones and to
+ * deregister starting from the leaves.
+ *
+ * Unfortunately we cannot assume that the array of available SCMI powercap
+ * zones provided by the SCMI platform firmware is built to comply with such
+ * requirement.
+ *
+ * This function, given an SCMI powercap zone to register, takes care to walk
+ * the SCMI powercap zones tree up to the root looking recursively for
+ * unregistered parent zones before registering the provided zone; at the same
+ * time each registered zone height in such a tree is accounted for and each
+ * zone, once registered, is stored in the @registered_zones array that is
+ * indexed by zone height: this way will be trivial, at unregister time, to walk
+ * the @registered_zones array backward and unregister all the zones starting
+ * from the leaves, removing children zones before parents.
+ *
+ * While doing this, we prune away any zone marked as invalid (like the ones
+ * sporting an SCMI abstract power scale) as long as they are positioned as
+ * leaves in the SCMI powercap zones hierarchy: any non-leaf invalid zone causes
+ * the entire process to fail since we cannot assume the correctness of an SCMI
+ * powercap zones hierarchy if some of the internal nodes are missing.
+ *
+ * Note that the array of SCMI powercap zones as returned by the SCMI platform
+ * is known to be sane, i.e. zones relationships have been validated at the
+ * protocol layer.
+ *
+ * Return: 0 on Success
+ */
+static int scmi_powercap_register_zone(struct scmi_powercap_root *pr,
+				       struct scmi_powercap_zone *spz)
+{
+	int ret = 0;
+	struct scmi_powercap_zone *parent;
+
+	if (!spz->info)
+		return ret;
+
+	parent = scmi_powercap_get_parent_zone(spz);
+	if (parent && !scmi_powercap_is_zone_registered(parent)) {
+		/*
+		 * Bail out if a parent domain was marked as unsupported:
+		 * only domains participating as leaves can be skipped.
+		 */
+		if (!parent->info)
+			return -ENODEV;
+
+		ret = scmi_powercap_register_zone(pr, parent);
+		if (ret)
+			return ret;
+	}
+
+	if (!scmi_powercap_is_zone_registered(spz)) {
+		struct powercap_zone *z;
+
+		z = powercap_register_zone(&spz->zone,
+					   scmi_top_pcntrl,
+					   spz->info->name,
+					   parent ? &parent->zone : NULL,
+					   &zone_ops, 1, &constraint_ops);
+		if (!IS_ERR(z)) {
+			spz->height = scmi_powercap_get_zone_height(spz);
+			list_add(&spz->node,
+				 &pr->registered_zones[spz->height]);
+			dev_dbg(spz->dev,
+				"Registered node %s - parent %s - height:%d\n",
+				spz->info->name,
+				parent ? parent->info->name : "ROOT",
+				spz->height);
+			ret = 0;
+		} else {
+			ret = PTR_ERR(z);
+			dev_err(spz->dev,
+				"Error registering node:%s - parent:%s - h:%d - ret:%d\n",
+				 spz->info->name,
+				 parent ? parent->info->name : "ROOT",
+				 spz->height, ret);
+		}
+	}
+
+	return ret;
+}
+
+static int scmi_powercap_probe(struct scmi_device *sdev)
+{
+	int ret, i;
+	struct scmi_powercap_root *pr;
+	struct scmi_powercap_zone *spz;
+	struct scmi_protocol_handle *ph;
+	struct device *dev = &sdev->dev;
+
+	if (!sdev->handle)
+		return -ENODEV;
+
+	powercap_ops = sdev->handle->devm_protocol_get(sdev,
+						       SCMI_PROTOCOL_POWERCAP,
+						       &ph);
+	if (IS_ERR(powercap_ops))
+		return PTR_ERR(powercap_ops);
+
+	pr = devm_kzalloc(dev, sizeof(*pr), GFP_KERNEL);
+	if (!pr)
+		return -ENOMEM;
+
+	ret = powercap_ops->num_domains_get(ph);
+	if (ret < 0) {
+		dev_err(dev, "number of powercap domains not found\n");
+		return ret;
+	}
+	pr->num_zones = ret;
+
+	pr->spzones = devm_kcalloc(dev, pr->num_zones,
+				   sizeof(*pr->spzones), GFP_KERNEL);
+	if (!pr->spzones)
+		return -ENOMEM;
+
+	/* Allocate for worst possible scenario of maximum tree height. */
+	pr->registered_zones = devm_kcalloc(dev, pr->num_zones,
+					    sizeof(*pr->registered_zones),
+					    GFP_KERNEL);
+	if (!pr->registered_zones)
+		return -ENOMEM;
+
+	for (i = 0, spz = pr->spzones; i < pr->num_zones; i++, spz++) {
+		/*
+		 * Powercap domains are validate by the protocol layer, i.e.
+		 * when only non-NULL domains are returned here, whose
+		 * parent_id is assured to point to another valid domain.
+		 */
+		spz->info = powercap_ops->info_get(ph, i);
+
+		spz->dev = dev;
+		spz->ph = ph;
+		spz->spzones = pr->spzones;
+		INIT_LIST_HEAD(&spz->node);
+		INIT_LIST_HEAD(&pr->registered_zones[i]);
+
+		/*
+		 * Forcibly skip powercap domains using an abstract scale.
+		 * Note that only leaves domains can be skipped, so this could
+		 * lead later to a global failure.
+		 */
+		if (!spz->info->powercap_scale_uw &&
+		    !spz->info->powercap_scale_mw) {
+			dev_warn(dev,
+				 "Abstract power scale not supported. Skip %s.\n",
+				 spz->info->name);
+			spz->info = NULL;
+			continue;
+		}
+	}
+
+	/*
+	 * Scan array of retrieved SCMI powercap domains and register them
+	 * recursively starting from the root domains.
+	 */
+	for (i = 0, spz = pr->spzones; i < pr->num_zones; i++, spz++) {
+		ret = scmi_powercap_register_zone(pr, spz);
+		if (ret) {
+			dev_err(dev,
+				"Failed to register powercap zone %s - ret:%d\n",
+				spz->info->name, ret);
+			scmi_powercap_unregister_all_zones(pr);
+			return ret;
+		}
+	}
+
+	dev_set_drvdata(dev, pr);
+
+	dev_info(dev, "Registered %d SCMI Powercap domains !\n", pr->num_zones);
+
+	return ret;
+}
+
+static void scmi_powercap_remove(struct scmi_device *sdev)
+{
+	struct device *dev = &sdev->dev;
+	struct scmi_powercap_root *pr = dev_get_drvdata(dev);
+
+	scmi_powercap_unregister_all_zones(pr);
+}
+
+static const struct scmi_device_id scmi_id_table[] = {
+	{ SCMI_PROTOCOL_POWERCAP, "powercap" },
+	{ },
+};
+MODULE_DEVICE_TABLE(scmi, scmi_id_table);
+
+static struct scmi_driver scmi_powercap_driver = {
+	.name = "scmi-powercap",
+	.probe = scmi_powercap_probe,
+	.remove = scmi_powercap_remove,
+	.id_table = scmi_id_table,
+};
+
+static int __init scmi_powercap_init(void)
+{
+	int ret;
+
+	scmi_top_pcntrl = powercap_register_control_type(NULL, "arm-scmi", NULL);
+	if (IS_ERR(scmi_top_pcntrl))
+		return PTR_ERR(scmi_top_pcntrl);
+
+	ret = scmi_register(&scmi_powercap_driver);
+	if (ret)
+		powercap_unregister_control_type(scmi_top_pcntrl);
+
+	return ret;
+}
+module_init(scmi_powercap_init);
+
+static void __exit scmi_powercap_exit(void)
+{
+	scmi_unregister(&scmi_powercap_driver);
+
+	powercap_unregister_control_type(scmi_top_pcntrl);
+}
+module_exit(scmi_powercap_exit);
+
+MODULE_AUTHOR("Cristian Marussi <cristian.marussi@arm.com>");
+MODULE_DESCRIPTION("ARM SCMI Powercap driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c
index 999e218d7793..fe86a09e3b67 100644
--- a/drivers/powercap/idle_inject.c
+++ b/drivers/powercap/idle_inject.c
@@ -147,6 +147,7 @@ static void idle_inject_fn(unsigned int cpu)
 
 /**
  * idle_inject_set_duration - idle and run duration update helper
+ * @ii_dev: idle injection control device structure
  * @run_duration_us: CPU run time to allow in microseconds
  * @idle_duration_us: CPU idle time to inject in microseconds
  */
@@ -162,6 +163,7 @@ void idle_inject_set_duration(struct idle_inject_device *ii_dev,
 
 /**
  * idle_inject_get_duration - idle and run duration retrieval helper
+ * @ii_dev: idle injection control device structure
  * @run_duration_us: memory location to store the current CPU run time
  * @idle_duration_us: memory location to store the current CPU idle time
  */
@@ -175,6 +177,7 @@ void idle_inject_get_duration(struct idle_inject_device *ii_dev,
 
 /**
  * idle_inject_set_latency - set the maximum latency allowed
+ * @ii_dev: idle injection control device structure
  * @latency_us: set the latency requirement for the idle state
  */
 void idle_inject_set_latency(struct idle_inject_device *ii_dev,
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
index f0654a932b37..1f968353d479 100644
--- a/drivers/powercap/powercap_sys.c
+++ b/drivers/powercap/powercap_sys.c
@@ -7,6 +7,7 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/err.h>
+#include <linux/kstrtox.h>
 #include <linux/slab.h>
 #include <linux/powercap.h>
 
@@ -446,7 +447,7 @@ static ssize_t enabled_store(struct device *dev,
 {
 	bool mode;
 
-	if (strtobool(buf, &mode))
+	if (kstrtobool(buf, &mode))
 		return -EINVAL;
 	if (dev->parent) {
 		struct powercap_zone *power_zone = to_powercap_zone(dev);
author	Linus Torvalds <torvalds@linux-foundation.org>	2022-12-12 13:19:07 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2022-12-12 13:19:07 -0800
commit	045e222d0a9dcec152abe0633f538cafd965b12b (patch)
tree	09db255e1ef5ff3a3d17bb6744662650cfb99244 /drivers
parent	631aa744423173bf921191ba695bbc7c1aabd9e0 (diff)
parent	ed6a00471dd444af085164f987e3d5e8145da35a (diff)