cpufreq: governor: Fix negative 'idle_time' handling in dbs_update()

[ Upstream commit 3698dd6b139dc37b35a9ad83d9330c1f99666c02 ]

We observed an issue that the CPU frequency can't raise up with a 100% CPU
load when NOHZ is off and the 'conservative' governor is selected.

'idle_time' can be negative if it's obtained from get_cpu_idle_time_jiffy()
when NOHZ is off.  This was found and explained in commit 9485e4ca0b
("cpufreq: governor: Fix handling of special cases in dbs_update()").

However, commit 7592019634 ("cpufreq: governors: Fix long idle detection
logic in load calculation") introduced a comparison between 'idle_time' and
'samling_rate' to detect a long idle interval.  While 'idle_time' is
converted to int before comparison, it's actually promoted to unsigned
again when compared with an unsigned 'sampling_rate'.  Hence, this leads to
wrong idle interval detection when it's in fact 100% busy and sets
policy_dbs->idle_periods to a very large value.  'conservative' adjusts the
frequency to minimum because of the large 'idle_periods', such that the
frequency can't raise up.  'Ondemand' doesn't use policy_dbs->idle_periods
so it fortunately avoids the issue.

Correct negative 'idle_time' to 0 before any use of it in dbs_update().

Fixes: 7592019634 ("cpufreq: governors: Fix long idle detection logic in load calculation")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Link: https://patch.msgid.link/20250213035510.2402076-1-zhanjie9@hisilicon.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit is contained in:
Jie Zhan 2025-02-13 11:55:10 +08:00 committed by Greg Kroah-Hartman
parent d6937b1b4a
commit a33ab15c5b

View File

@ -145,7 +145,23 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
time_elapsed = update_time - j_cdbs->prev_update_time;
j_cdbs->prev_update_time = update_time;
/*
* cur_idle_time could be smaller than j_cdbs->prev_cpu_idle if
* it's obtained from get_cpu_idle_time_jiffy() when NOHZ is
* off, where idle_time is calculated by the difference between
* time elapsed in jiffies and "busy time" obtained from CPU
* statistics. If a CPU is 100% busy, the time elapsed and busy
* time should grow with the same amount in two consecutive
* samples, but in practice there could be a tiny difference,
* making the accumulated idle time decrease sometimes. Hence,
* in this case, idle_time should be regarded as 0 in order to
* make the further process correct.
*/
if (cur_idle_time > j_cdbs->prev_cpu_idle)
idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;
else
idle_time = 0;
j_cdbs->prev_cpu_idle = cur_idle_time;
if (ignore_nice) {
@ -162,7 +178,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
* calls, so the previous load value can be used then.
*/
load = j_cdbs->prev_load;
} else if (unlikely((int)idle_time > 2 * sampling_rate &&
} else if (unlikely(idle_time > 2 * sampling_rate &&
j_cdbs->prev_load)) {
/*
* If the CPU had gone completely idle and a task has
@ -189,30 +205,15 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
load = j_cdbs->prev_load;
j_cdbs->prev_load = 0;
} else {
if (time_elapsed >= idle_time) {
if (time_elapsed > idle_time)
load = 100 * (time_elapsed - idle_time) / time_elapsed;
} else {
/*
* That can happen if idle_time is returned by
* get_cpu_idle_time_jiffy(). In that case
* idle_time is roughly equal to the difference
* between time_elapsed and "busy time" obtained
* from CPU statistics. Then, the "busy time"
* can end up being greater than time_elapsed
* (for example, if jiffies_64 and the CPU
* statistics are updated by different CPUs),
* so idle_time may in fact be negative. That
* means, though, that the CPU was busy all
* the time (on the rough average) during the
* last sampling interval and 100 can be
* returned as the load.
*/
load = (int)idle_time < 0 ? 100 : 0;
}
else
load = 0;
j_cdbs->prev_load = load;
}
if (unlikely((int)idle_time > 2 * sampling_rate)) {
if (unlikely(idle_time > 2 * sampling_rate)) {
unsigned int periods = idle_time / sampling_rate;
if (periods < idle_periods)