Patchwork [v1,6/7] arm64: dts: sdm845: Increase alert trip point to 95 degrees

login
register
mail settings
Submitter Amit Kucheria
Date Jan. 10, 2019, midnight
Message ID <041258d65883df964890249a24d2a4788c419304.1547078153.git.amit.kucheria@linaro.org>
Download mbox | patch
Permalink /patch/696257/
State New
Headers show

Comments

Amit Kucheria - Jan. 10, 2019, midnight
75 degrees is too aggressive for throttling the CPU. After speaking to
Qualcomm engineers, increase it to 95 degrees.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
Stephen Boyd - Jan. 10, 2019, 12:29 a.m.
Quoting Amit Kucheria (2019-01-09 16:00:55)
> 75 degrees is too aggressive for throttling the CPU. After speaking to
> Qualcomm engineers, increase it to 95 degrees.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------

Is the plan that these are some defaults that would be adjusted by board
variants? Just curious why we have anything in here and don't punt it
all to each board dts file.
Matthias Kaehlcke - Jan. 10, 2019, 1:15 a.m.
Hi Amit,

On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> 75 degrees is too aggressive for throttling the CPU. After speaking to
> Qualcomm engineers, increase it to 95 degrees.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index c27cbd3bcb0a..29e823b0caf4 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -1692,7 +1692,7 @@
>  
>  			trips {
>  				cpu_alert0: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1713,7 +1713,7 @@
>  
>  			trips {
>  				cpu_alert1: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1734,7 +1734,7 @@
>  
>  			trips {
>  				cpu_alert2: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1755,7 +1755,7 @@
>  
>  			trips {
>  				cpu_alert3: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1776,7 +1776,7 @@
>  
>  			trips {
>  				cpu_alert4: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1797,7 +1797,7 @@
>  
>  			trips {
>  				cpu_alert5: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1818,7 +1818,7 @@
>  
>  			trips {
>  				cpu_alert6: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
> @@ -1839,7 +1839,7 @@
>  
>  			trips {
>  				cpu_alert7: trip0 {
> -					temperature = <75000>;
> +					temperature = <95000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};

The change itself looks good to me, however I wonder if it would be
worth to eliminate redundancy and merge the current 8 thermal zones
into 2, one for the Silver and one for the Gold cluster (as done by
http://crrev.com/c/1381752). There is a single cooling device for
each cluster, so it's not clear to me if there is any gain from having
a separate thermal zone for each CPU. If it is important to monitor
the temperatures of the individual cores this can still be done by
configuring the thermal zone of the cluster with multiple thermal
sensors.

Cheers

Matthias
Matthias Kaehlcke - Jan. 10, 2019, 2:15 a.m.
On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> Hi Amit,
> 
> On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> > 
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index c27cbd3bcb0a..29e823b0caf4 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -1692,7 +1692,7 @@
> >  
> >  			trips {
> >  				cpu_alert0: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1713,7 +1713,7 @@
> >  
> >  			trips {
> >  				cpu_alert1: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1734,7 +1734,7 @@
> >  
> >  			trips {
> >  				cpu_alert2: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1755,7 +1755,7 @@
> >  
> >  			trips {
> >  				cpu_alert3: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1776,7 +1776,7 @@
> >  
> >  			trips {
> >  				cpu_alert4: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1797,7 +1797,7 @@
> >  
> >  			trips {
> >  				cpu_alert5: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1818,7 +1818,7 @@
> >  
> >  			trips {
> >  				cpu_alert6: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> > @@ -1839,7 +1839,7 @@
> >  
> >  			trips {
> >  				cpu_alert7: trip0 {
> > -					temperature = <75000>;
> > +					temperature = <95000>;
> >  					hysteresis = <2000>;
> >  					type = "passive";
> >  				};
> 
> The change itself looks good to me, however I wonder if it would be
> worth to eliminate redundancy and merge the current 8 thermal zones
> into 2, one for the Silver and one for the Gold cluster (as done by
> http://crrev.com/c/1381752). There is a single cooling device for
> each cluster, so it's not clear to me if there is any gain from having
> a separate thermal zone for each CPU. If it is important to monitor
> the temperatures of the individual cores this can still be done by
> configuring the thermal zone of the cluster with multiple thermal
> sensors.

I see your idea is to have a cooling device per CPU ("arm64: dts:
sdm845: wireup the thermal trip points to cpufreq" /
https://lore.kernel.org/patchwork/patch/1030742/), however that
doesn't work as intended. Only two cpufreq 'devices' are created,
one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
runs for these cores and only two cooling devices are
registered. Since the cores of a cluster all run at the same
frequency I also doubt if having multiple cooling devices would
bring any benefits.

Cheers

Matthias
Doug Anderson - Jan. 10, 2019, 5:14 p.m.
Hi,

On Wed, Jan 9, 2019 at 4:29 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Amit Kucheria (2019-01-09 16:00:55)
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
>
> Is the plan that these are some defaults that would be adjusted by board
> variants? Just curious why we have anything in here and don't punt it
> all to each board dts file.

My preference would be that the SoC device tree file should contain
thermal numbers that are important to pay attention to for the safety
/ proper operation of the SoC.  ...then individual boards could (if
they needed to) override with lower values to control, for instance,
skin temperature.

From experience with previous boards, if you've got enough an off-SoC
thermistors then those are the ones you'd want to monitor to control
skin temperature.  It's OK if the SoC spikes up quite high as long as
that heat has somewhere to go (like a heat pipe).  The sensors that
are part of Amit's patch are on-chip.

...if you've got a board without external thermistors and are using
the SoC's on-chip sensors as a proxy for the heat in the overall
system then you might want to lower your values in the board device
tree file.  You won't be able to have as many short term spikes, but
that's what you gotta do without the extra sensors.


Sound sane?

-Doug
Amit Kucheria - Jan. 10, 2019, 7:45 p.m.
On Thu, Jan 10, 2019 at 7:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> > Hi Amit,
> >
> > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > Qualcomm engineers, increase it to 95 degrees.
> > >
> > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > > ---
> > >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1692,7 +1692,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert0: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1713,7 +1713,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert1: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1734,7 +1734,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert2: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1755,7 +1755,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert3: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1776,7 +1776,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert4: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1797,7 +1797,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert5: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1818,7 +1818,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert6: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> > > @@ -1839,7 +1839,7 @@
> > >
> > >                     trips {
> > >                             cpu_alert7: trip0 {
> > > -                                   temperature = <75000>;
> > > +                                   temperature = <95000>;
> > >                                     hysteresis = <2000>;
> > >                                     type = "passive";
> > >                             };
> >
> > The change itself looks good to me, however I wonder if it would be
> > worth to eliminate redundancy and merge the current 8 thermal zones
> > into 2, one for the Silver and one for the Gold cluster (as done by
> > http://crrev.com/c/1381752). There is a single cooling device for
> > each cluster, so it's not clear to me if there is any gain from having
> > a separate thermal zone for each CPU. If it is important to monitor
> > the temperatures of the individual cores this can still be done by
> > configuring the thermal zone of the cluster with multiple thermal
> > sensors.
>
> I see your idea is to have a cooling device per CPU ("arm64: dts:
> sdm845: wireup the thermal trip points to cpufreq" /
> https://lore.kernel.org/patchwork/patch/1030742/), however that
> doesn't work as intended. Only two cpufreq 'devices' are created,
> one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
> runs for these cores and only two cooling devices are
> registered. Since the cores of a cluster all run at the same
> frequency I also doubt if having multiple cooling devices would
> bring any benefits.

I actually only intended for two cooling devices - one for each
frequency domain. I'll clarify it better in the patch description.
Matthias Kaehlcke - Jan. 10, 2019, 8 p.m.
On Fri, Jan 11, 2019 at 01:15:09AM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 7:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:
> >
> > On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> > > Hi Amit,
> > >
> > > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > > Qualcomm engineers, increase it to 95 degrees.
> > > >
> > > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > > > ---
> > > >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > @@ -1692,7 +1692,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert0: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1713,7 +1713,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert1: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1734,7 +1734,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert2: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1755,7 +1755,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert3: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1776,7 +1776,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert4: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1797,7 +1797,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert5: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1818,7 +1818,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert6: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > > > @@ -1839,7 +1839,7 @@
> > > >
> > > >                     trips {
> > > >                             cpu_alert7: trip0 {
> > > > -                                   temperature = <75000>;
> > > > +                                   temperature = <95000>;
> > > >                                     hysteresis = <2000>;
> > > >                                     type = "passive";
> > > >                             };
> > >
> > > The change itself looks good to me, however I wonder if it would be
> > > worth to eliminate redundancy and merge the current 8 thermal zones
> > > into 2, one for the Silver and one for the Gold cluster (as done by
> > > http://crrev.com/c/1381752). There is a single cooling device for
> > > each cluster, so it's not clear to me if there is any gain from having
> > > a separate thermal zone for each CPU. If it is important to monitor
> > > the temperatures of the individual cores this can still be done by
> > > configuring the thermal zone of the cluster with multiple thermal
> > > sensors.
> >
> > I see your idea is to have a cooling device per CPU ("arm64: dts:
> > sdm845: wireup the thermal trip points to cpufreq" /
> > https://lore.kernel.org/patchwork/patch/1030742/), however that
> > doesn't work as intended. Only two cpufreq 'devices' are created,
> > one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
> > runs for these cores and only two cooling devices are
> > registered. Since the cores of a cluster all run at the same
> > frequency I also doubt if having multiple cooling devices would
> > bring any benefits.
> 
> I actually only intended for two cooling devices - one for each
> frequency domain. I'll clarify it better in the patch description.

Viresh helped me understand that we currently need to add cooling
device entries for all CPUs to the DT, even though at most one will be
active per freq domain at any time (I wonder if this could be changed
though).

Independently of the cooling devices, is there any value in having a
thermal zone for each CPU instead of having only one per freq domain?

Thanks

Matthias
Amit Kucheria - Jan. 10, 2019, 8:06 p.m.
On Thu, Jan 10, 2019 at 5:59 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Amit Kucheria (2019-01-09 16:00:55)
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
>
> Is the plan that these are some defaults that would be adjusted by board
> variants? Just curious why we have anything in here and don't punt it
> all to each board dts file.

These values are meant to be safe values for silicon operation as
characterised[1] by the HW team. So I'd consider them a more than just
default values. It also gives future boards a safe starting point.

IMHO it would be better to have the characterized values merged
upstream and if really required, those values can be tweaked in the
board-specific DTS.

[1] Caveat: The characterisation probably focused on mobile devices
and might change depending on form-factor, active cooling and heat
dissipation design but those differences should only impact the skin
temperature, not the operation of the SoC itself.
Viresh Kumar - Jan. 11, 2019, 3:32 a.m.
On 10-01-19, 12:00, Matthias Kaehlcke wrote:
> Viresh helped me understand that we currently need to add cooling
> device entries for all CPUs to the DT, even though at most one will be
> active per freq domain at any time (I wonder if this could be changed
> though).

Actually we were only adding cooling-cells in CPU0 until now and I
fixed that, so that is going to stay :)

The idea is that the hardware should be described properly and not
partially. Even if all the CPUs are part of the same freq-domain, they
are all capable of being a cooling device here and the DT should
describe that. Kernel will ofcourse create a single cooling device.
Amit Kucheria - Jan. 11, 2019, 10:24 a.m.
On Thu, Jan 10, 2019 at 6:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> Hi Amit,
>
> On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index c27cbd3bcb0a..29e823b0caf4 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -1692,7 +1692,7 @@
> >
> >                       trips {
> >                               cpu_alert0: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1713,7 +1713,7 @@
> >
> >                       trips {
> >                               cpu_alert1: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1734,7 +1734,7 @@
> >
> >                       trips {
> >                               cpu_alert2: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1755,7 +1755,7 @@
> >
> >                       trips {
> >                               cpu_alert3: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1776,7 +1776,7 @@
> >
> >                       trips {
> >                               cpu_alert4: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1797,7 +1797,7 @@
> >
> >                       trips {
> >                               cpu_alert5: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1818,7 +1818,7 @@
> >
> >                       trips {
> >                               cpu_alert6: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
> > @@ -1839,7 +1839,7 @@
> >
> >                       trips {
> >                               cpu_alert7: trip0 {
> > -                                     temperature = <75000>;
> > +                                     temperature = <95000>;
> >                                       hysteresis = <2000>;
> >                                       type = "passive";
> >                               };
>
> The change itself looks good to me, however I wonder if it would be
> worth to eliminate redundancy and merge the current 8 thermal zones
> into 2, one for the Silver and one for the Gold cluster (as done by
> http://crrev.com/c/1381752). There is a single cooling device for
> each cluster, so it's not clear to me if there is any gain from having
> a separate thermal zone for each CPU. If it is important to monitor
> the temperatures of the individual cores this can still be done by
> configuring the thermal zone of the cluster with multiple thermal
> sensors.

Reducing the number of thermal zones to 2 (by grouping 4 sensors per
zone) is not possible due a limitation of the thermal framework[1]. It
is something that we want to address. Previous attempts to fix this
were rejected for various reasons. Eduardo was going to share a way to
have more flexible mapping between sensors and zones after discussions
at LPC.

<nag> Eduardo, do you have anything we can review? </nag> :-)

Having said that, we'll need some aggregation functions when we add
multiple sensors to a zone (e.g. max, mean) to reflect the zone. This
will lose information about hotspots and prevent things like idle
injection on a particular CPU that is causing most of the heat in the
aggregated zone. So IMHO, it might be useful to have information about
the hotspots (i.e TZ per sensor) and aggregated values (ambient
temperature) that can be fed to the thermal policy.


[1] https://elixir.bootlin.com/linux/v5.0-rc1/source/drivers/thermal/of-thermal.c#L502
Matthias Kaehlcke - Jan. 11, 2019, 6:30 p.m.
On Fri, Jan 11, 2019 at 03:54:23PM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 6:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:
> >
> > Hi Amit,
> >
> > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > Qualcomm engineers, increase it to 95 degrees.
> > >
> > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > > ---
> > >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1692,7 +1692,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert0: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1713,7 +1713,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert1: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1734,7 +1734,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert2: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1755,7 +1755,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert3: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1776,7 +1776,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert4: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1797,7 +1797,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert5: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1818,7 +1818,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert6: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> > > @@ -1839,7 +1839,7 @@
> > >
> > >                       trips {
> > >                               cpu_alert7: trip0 {
> > > -                                     temperature = <75000>;
> > > +                                     temperature = <95000>;
> > >                                       hysteresis = <2000>;
> > >                                       type = "passive";
> > >                               };
> >
> > The change itself looks good to me, however I wonder if it would be
> > worth to eliminate redundancy and merge the current 8 thermal zones
> > into 2, one for the Silver and one for the Gold cluster (as done by
> > http://crrev.com/c/1381752). There is a single cooling device for
> > each cluster, so it's not clear to me if there is any gain from having
> > a separate thermal zone for each CPU. If it is important to monitor
> > the temperatures of the individual cores this can still be done by
> > configuring the thermal zone of the cluster with multiple thermal
> > sensors.
> 
> Reducing the number of thermal zones to 2 (by grouping 4 sensors per
> zone) is not possible due a limitation of the thermal framework[1]. It
> is something that we want to address. Previous attempts to fix this
> were rejected for various reasons. Eduardo was going to share a way to
> have more flexible mapping between sensors and zones after discussions
> at LPC.

I wasn't aware of this limitation, thanks for the clarification! With
this I understand that for now we indeed need the 8 thermal zones with
all the redundant information :(

> <nag> Eduardo, do you have anything we can review? </nag> :-)
> 
> Having said that, we'll need some aggregation functions when we add
> multiple sensors to a zone (e.g. max, mean) to reflect the zone. This
> will lose information about hotspots and prevent things like idle
> injection on a particular CPU that is causing most of the heat in the
> aggregated zone. So IMHO, it might be useful to have information about
> the hotspots (i.e TZ per sensor) and aggregated values (ambient
> temperature) that can be fed to the thermal policy.

Ok, it seems for now we need the 8 thermal zones in any case, when
support for multiple sensors becomes available we can evaluate whether
it's worth to change that or not.

Cheers

Matthias

Patch

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index c27cbd3bcb0a..29e823b0caf4 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1692,7 +1692,7 @@ 
 
 			trips {
 				cpu_alert0: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1713,7 +1713,7 @@ 
 
 			trips {
 				cpu_alert1: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1734,7 +1734,7 @@ 
 
 			trips {
 				cpu_alert2: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1755,7 +1755,7 @@ 
 
 			trips {
 				cpu_alert3: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1776,7 +1776,7 @@ 
 
 			trips {
 				cpu_alert4: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1797,7 +1797,7 @@ 
 
 			trips {
 				cpu_alert5: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1818,7 +1818,7 @@ 
 
 			trips {
 				cpu_alert6: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1839,7 +1839,7 @@ 
 
 			trips {
 				cpu_alert7: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};