Patchwork Regression in v5.0-rc1 with autosuspend hrtimers

Submitter Vincent Guittot
Date Jan. 9, 2019, 1:42 a.m.
Message ID <20190109014218.GA8363@linaro.org>
Permalink /patch/695407/
State New

Comments

Vincent Guittot - Jan. 9, 2019, 1:42 a.m.
On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> * Vincent Guittot <vincent.guittot@linaro.org> [190108 16:42]:
> > On Tue, 8 Jan 2019 at 16:53, Tony Lindgren <tony@atomide.com> wrote:
> > > Hmm so could it be that we now rely on timers that may
> > > not be capable of waking up the system from idle states with
> > > hrtimer?
> > 
> > With nohz and hrtimer enabled, the timer relies on hrtimer to generate
> > the tick so you should use the same interrupt.
> 
> OK yeah looks like that part is working just fine.
> 
> Adding some printks and debugging over ssh, looks like
> omap8250_runtime_resume() gets called just fine based on a wakeirq,
> but then omap8250_runtime_suspend() runs immediately instead of
> waiting for the three second timeout.
> 
> Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> Anything higher than 2200 ms seems to somehow time out immediately
> now :)

This is quite close to the max number of ns that fits in an int on 32-bit arm

Could you try the patch below ?

---
 drivers/base/power/runtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Tony Lindgren - Jan. 9, 2019, 1:51 a.m.
* Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > Anything higher than 2200 ms seems to somehow time out immediately
> > now :)
> 
> This is quite close to the max ns of an int on arm 32bits
> 
> Could you try the patch below ?

Yup great thanks, that's it:

Tested-by: Tony Lindgren <tony@atomide.com>

> ---
>  drivers/base/power/runtime.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 7062469..44c5c76 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -141,7 +141,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
>  
>  	last_busy = READ_ONCE(dev->power.last_busy);
>  
> -	expires = last_busy + autosuspend_delay * NSEC_PER_MSEC;
> +	expires = last_busy + (u64)(autosuspend_delay) * NSEC_PER_MSEC;
>  	if (expires <= now)
>  		expires = 0;	/* Already expired. */
>  
> -- 
> 2.7.4
> 
> 
> > 
> > Regards,
> > 
> > Tony
Rafael J. Wysocki - Jan. 9, 2019, 9:43 a.m.
On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > Anything higher than 2200 ms seems to somehow time out immediately
> > > now :)
> >
> > This is quite close to the max ns of an int on arm 32bits
> >
> > Could you try the patch below ?
>
> Yup great thanks, that's it:
>
> Tested-by: Tony Lindgren <tony@atomide.com>

Cool.  Thanks for getting to the bottom of this!
Tony Lindgren - Jan. 9, 2019, 4:28 p.m.
* Rafael J. Wysocki <rafael@kernel.org> [190109 09:44]:
> On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > > Anything higher than 2200 ms seems to somehow time out immediately
> > > > now :)
> > >
> > > This is quite close to the max ns of an int on arm 32bits
> > >
> > > Could you try the patch below ?
> >
> > Yup great thanks, that's it:
> >
> > Tested-by: Tony Lindgren <tony@atomide.com>
> 
> Cool.  Thanks for getting to the bottom of this!

No problem.

One more thing I noticed: the 25% slack can get noticeable
for larger values. For things like a 3 second uart console
timeout, a slack of 750 ms is quite a large variation.

Should we have a limit of max 100 ms for the slack?

Regards,

Tony
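
For scale, the slack arithmetic can be sketched as below. This is a user-space illustration only: `slack_ns` mirrors the 25% rule described above, while `capped_slack_ns` and `MAX_SLACK_MS` model the 100 ms cap being asked about. None of these names come from the kernel.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define MAX_SLACK_MS  100ULL	/* hypothetical cap from the question above */

/* 25% of the autosuspend delay, in nanoseconds. */
static uint64_t slack_ns(uint32_t delay_ms)
{
	return (uint64_t)delay_ms * (NSEC_PER_MSEC / 4);
}

/* Same slack, limited to MAX_SLACK_MS (an assumption, not existing code). */
static uint64_t capped_slack_ns(uint32_t delay_ms)
{
	uint64_t slack = slack_ns(delay_ms);
	uint64_t cap = MAX_SLACK_MS * NSEC_PER_MSEC;

	return slack < cap ? slack : cap;
}
```

For a 3000 ms delay, `slack_ns` gives 750 ms and the capped variant gives 100 ms; delays of 400 ms or less are unaffected by the cap.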
Vincent Guittot - Jan. 9, 2019, 4:48 p.m.
On Wed, 9 Jan 2019 at 17:28, Tony Lindgren <tony@atomide.com> wrote:
>
> * Rafael J. Wysocki <rafael@kernel.org> [190109 09:44]:
> > On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > > > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > > > Anything higher than 2200 ms seems to somehow time out immediately
> > > > > now :)
> > > >
> > > > This is quite close to the max ns of an int on arm 32bits
> > > >
> > > > Could you try the patch below ?
> > >
> > > Yup great thanks, that's it:
> > >
> > > Tested-by: Tony Lindgren <tony@atomide.com>
> >
> > Cool.  Thanks for getting to the bottom of this!
>
> No problem.
>
> One more thing I noticed: the 25% slack can get noticeable
> for larger values. For things like a 3 second uart console
> timeout, a slack of 750 ms is quite a large variation.
>
> Should we have a limit of max 100 ms for the slack?

Keep in mind that when jiffies were used, expires was rounded to a
full second when the delay was greater than a second. So you could already
have a difference of up to 990 ms on arm before this patch.
And I am not taking into account the rework of the timer infra, which adds
another level of variation, something like up to 640 ms more when the
timer is greater than 2880 ms for arm IIRC.
>
> Regards,
>
> Tony
Tony Lindgren - Jan. 9, 2019, 4:50 p.m.
* Vincent Guittot <vincent.guittot@linaro.org> [190109 16:48]:
> On Wed, 9 Jan 2019 at 17:28, Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Rafael J. Wysocki <rafael@kernel.org> [190109 09:44]:
> > > On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > > > > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > > > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > > > > Anything higher than 2200 ms seems to somehow time out immediately
> > > > > > now :)
> > > > >
> > > > > This is quite close to the max ns of an int on arm 32bits
> > > > >
> > > > > Could you try the patch below ?
> > > >
> > > > Yup great thanks, that's it:
> > > >
> > > > Tested-by: Tony Lindgren <tony@atomide.com>
> > >
> > > Cool.  Thanks for getting to the bottom of this!
> >
> > No problem.
> >
> > One more thing I noticed: the 25% slack can get noticeable
> > for larger values. For things like a 3 second uart console
> > timeout, a slack of 750 ms is quite a large variation.
> >
> > Should we have a limit of max 100 ms for the slack?
> 
> Keep in mind that when jiffies were used, expires was rounded to a
> full second when the delay was greater than a second. So you could already
> have a difference of up to 990 ms on arm before this patch.
> And I am not taking into account the rework of the timer infra, which adds
> another level of variation, something like up to 640 ms more when the
> timer is greater than 2880 ms for arm IIRC.

I think it was rounded up earlier.

Don't we get rounded down now also?

Regards,

Tony
Vincent Guittot - Jan. 9, 2019, 4:55 p.m.
On Wed, 9 Jan 2019 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [190109 16:48]:
> > On Wed, 9 Jan 2019 at 17:28, Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Rafael J. Wysocki <rafael@kernel.org> [190109 09:44]:
> > > > On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > > > > > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > > > > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > > > > > Anything higher than 2200 ms seems to somehow time out immediately
> > > > > > > now :)
> > > > > >
> > > > > > This is quite close to the max ns of an int on arm 32bits
> > > > > >
> > > > > > Could you try the patch below ?
> > > > >
> > > > > Yup great thanks, that's it:
> > > > >
> > > > > Tested-by: Tony Lindgren <tony@atomide.com>
> > > >
> > > > Cool.  Thanks for getting to the bottom of this!
> > >
> > > No problem.
> > >
> > > One more thing I noticed: the 25% slack can get noticeable
> > > for larger values. For things like a 3 second uart console
> > > timeout, a slack of 750 ms is quite a large variation.
> > >
> > > Should we have a limit of max 100 ms for the slack?
> >
> > Keep in mind that when jiffies were used, expires was rounded to a
> > full second when the delay was greater than a second. So you could already
> > have a difference of up to 990 ms on arm before this patch.
> > And I am not taking into account the rework of the timer infra, which adds
> > another level of variation, something like up to 640 ms more when the
> > timer is greater than 2880 ms for arm IIRC.
>
> I think it was rounded up earlier.
>
> Don't we get rounded down now also?

We still round up. In hrtimer we have :
timer->_softexpires = time;
timer->node.expires = ktime_add_safe(time, delta);
so the hrtimer will expire between "time" and "time+delta"
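
A rough user-space model of that range, for illustration only: `struct timer_range`, `set_expires_range` and `add_safe` are hypothetical stand-ins, not the kernel's types, and the saturating add only approximates the intent of `ktime_add_safe()` for non-negative inputs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two expiry fields quoted above. */
struct timer_range {
	int64_t softexpires;	/* earliest point at which the timer may fire */
	int64_t expires;	/* latest point: softexpires + delta */
};

/* Saturating add, assuming time >= 0 and delta >= 0, so the sum cannot wrap. */
static int64_t add_safe(int64_t time, int64_t delta)
{
	if (delta > INT64_MAX - time)
		return INT64_MAX;
	return time + delta;
}

/* Build the [time, time + delta] window in which the timer may expire. */
static struct timer_range set_expires_range(int64_t time, int64_t delta)
{
	struct timer_range r = {
		.softexpires = time,
		.expires = add_safe(time, delta),
	};
	return r;
}
```

With time = 3 s and delta = 750 ms (both in ns), the window is 3.0 s to 3.75 s: the lower bound is never earlier than `time`, which is the sense in which the expiry is only ever rounded up.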

>
> Regards,
>
> Tony
Tony Lindgren - Jan. 9, 2019, 5:02 p.m.
* Vincent Guittot <vincent.guittot@linaro.org> [190109 16:56]:
> On Wed, 9 Jan 2019 at 17:50, Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Vincent Guittot <vincent.guittot@linaro.org> [190109 16:48]:
> > > On Wed, 9 Jan 2019 at 17:28, Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > * Rafael J. Wysocki <rafael@kernel.org> [190109 09:44]:
> > > > > On Wed, Jan 9, 2019 at 2:51 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > >
> > > > > > * Vincent Guittot <vincent.guittot@linaro.org> [190109 01:42]:
> > > > > > > On Tuesday 08 Jan 2019 at 13:37:43 (-0800), Tony Lindgren wrote:
> > > > > > > > Lowering the autosuspend_delay_ms to 2100 ms makes things work again.
> > > > > > > > Anything higher than 2200 ms seems to somehow time out immediately
> > > > > > > > now :)
> > > > > > >
> > > > > > > This is quite close to the max ns of an int on arm 32bits
> > > > > > >
> > > > > > > Could you try the patch below ?
> > > > > >
> > > > > > Yup great thanks, that's it:
> > > > > >
> > > > > > Tested-by: Tony Lindgren <tony@atomide.com>
> > > > >
> > > > > Cool.  Thanks for getting to the bottom of this!
> > > >
> > > > No problem.
> > > >
> > > > One more thing I noticed: the 25% slack can get noticeable
> > > > for larger values. For things like a 3 second uart console
> > > > timeout, a slack of 750 ms is quite a large variation.
> > > >
> > > > Should we have a limit of max 100 ms for the slack?
> > >
> > > Keep in mind that when jiffies were used, expires was rounded to a
> > > full second when the delay was greater than a second. So you could already
> > > have a difference of up to 990 ms on arm before this patch.
> > > And I am not taking into account the rework of the timer infra, which adds
> > > another level of variation, something like up to 640 ms more when the
> > > timer is greater than 2880 ms for arm IIRC.
> >
> > I think it was rounded up earlier.
> >
> > Don't we get rounded down now also?
> 
> We still round up. In hrtimer we have :
> timer->_softexpires = time;
> timer->node.expires = ktime_add_safe(time, delta);
> so the hrtimer will expire between "time" and "time+delta"

OK thanks for checking it. In that case we should be good to go :)

Tony

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 7062469..44c5c76 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -141,7 +141,7 @@ u64 pm_runtime_autosuspend_expiration(struct device *dev)
 
 	last_busy = READ_ONCE(dev->power.last_busy);
 
-	expires = last_busy + autosuspend_delay * NSEC_PER_MSEC;
+	expires = last_busy + (u64)(autosuspend_delay) * NSEC_PER_MSEC;
 	if (expires <= now)
 		expires = 0;	/* Already expired. */
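
To see why the cast matters, here is a small user-space sketch of the arithmetic, assuming a 32-bit `int` and `long` as on 32-bit ARM. The helper names `product_32bit` and `product_64bit` are made up for illustration and are not kernel functions. The 32-bit wraparound is modelled with unsigned arithmetic, because real signed overflow is undefined behaviour in C.

```c
#include <stdint.h>

/* Same value as the kernel's NSEC_PER_MSEC; operand widths are modelled below. */
#define NSEC_PER_MSEC_VAL 1000000

/*
 * Hypothetical model of "autosuspend_delay * NSEC_PER_MSEC" when both
 * operands are 32 bits wide: the product wraps modulo 2^32.
 * Returned as int64_t so the (possibly negative) wrapped value is visible.
 */
static int64_t product_32bit(int32_t delay_ms)
{
	uint32_t wrapped = (uint32_t)delay_ms * (uint32_t)NSEC_PER_MSEC_VAL;

	/* Reinterpret the 32-bit pattern as a two's-complement signed value. */
	if (wrapped > (uint32_t)INT32_MAX)
		return (int64_t)wrapped - 4294967296LL;
	return (int64_t)wrapped;
}

/* Model of the patched expression: widen to 64 bits before multiplying. */
static uint64_t product_64bit(int32_t delay_ms)
{
	return (uint64_t)delay_ms * NSEC_PER_MSEC_VAL;
}
```

Under these assumptions, 2100 ms still fits (2,100,000,000 ns), while 2200 ms and 3000 ms wrap to negative values, so `expires` lands at or before `now` and is treated as already expired. The limit is INT_MAX / NSEC_PER_MSEC, roughly 2147 ms, which matches the 2100 ms versus 2200 ms observation in the thread.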