Patchwork [1/2] PM-runtime: Take suppliers into account in __pm_runtime_set_status()

Submitter Rafael J. Wysocki
Date Feb. 7, 2019, 6:38 p.m.
Message ID <1634058.cXDNg15SOd@aspire.rjw.lan>
Permalink /patch/721069/
State New

Comments

Rafael J. Wysocki - Feb. 7, 2019, 6:38 p.m.
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

If the target device has any suppliers, as reflected by device links
to them, __pm_runtime_set_status() does not take them into account,
which is not consistent with the other parts of the PM-runtime
framework and may lead to programming mistakes.

Modify __pm_runtime_set_status() to take suppliers into account by
activating them upfront if the new status is RPM_ACTIVE and
deactivating them on exit if the new status is RPM_SUSPENDED.

If the activation of one of the suppliers fails, the new status
will be RPM_SUSPENDED and the (remaining) suppliers will be
deactivated on exit (the child count of the device's parent
will be dropped too then).

Of course, adding device links locking to __pm_runtime_set_status()
means that it cannot be run from interrupt context, so make it use
spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
and spin_unlock_irqrestore(), respectively.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/runtime.c |   45 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)
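
As a rough illustration of the flow described in the changelog above, here is a small userspace C model. It is only a sketch: the `model_*` names and structures are hypothetical stand-ins rather than the kernel's, locking is left out entirely, and each link's `rpm_active` is assumed to start at one. It shows suppliers being activated upfront for RPM_ACTIVE, the fallback to RPM_SUSPENDED when one activation fails, and the references being dropped on exit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

/* Hypothetical stand-in for a supplier reached through a device link. */
struct model_supplier {
	int usage_count;   /* models the supplier's runtime PM usage count */
	int resume_error;  /* negative errno (e.g. -EIO) if activation fails */
};

/* Hypothetical stand-in for a consumer device and its links. */
struct model_device {
	enum rpm_status status;
	struct model_supplier **suppliers;
	int *rpm_active;   /* per-link count; 1 means no reference is held */
	size_t nr_suppliers;
};

/* Take a reference on every supplier; stop at the first failure. */
static int model_get_suppliers(struct model_device *dev)
{
	for (size_t i = 0; i < dev->nr_suppliers; i++) {
		struct model_supplier *s = dev->suppliers[i];

		if (s->resume_error)
			return s->resume_error;
		s->usage_count++;
		dev->rpm_active[i]++;
	}
	return 0;
}

/* Drop every reference held through the links, down to rpm_active == 1. */
static void model_put_suppliers(struct model_device *dev)
{
	for (size_t i = 0; i < dev->nr_suppliers; i++) {
		while (dev->rpm_active[i] > 1) {
			dev->rpm_active[i]--;
			dev->suppliers[i]->usage_count--;
		}
		assert(dev->rpm_active[i] == 1);
	}
}

/* Model of the new flow: activate upfront, deactivate on exit. */
static int model_set_status(struct model_device *dev, enum rpm_status status)
{
	int error = 0;

	if (status == RPM_ACTIVE) {
		error = model_get_suppliers(dev);
		if (error)
			status = RPM_SUSPENDED; /* fall back, as in the changelog */
	}

	dev->status = status;

	if (status == RPM_SUSPENDED)
		model_put_suppliers(dev);

	return error;
}
```

In this model, a failed activation of the second of two suppliers leaves the consumer in RPM_SUSPENDED and releases the reference already taken on the first one.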
Ulf Hansson - Feb. 11, 2019, 1:27 p.m.
On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the target device has any suppliers, as reflected by device links
> to them, __pm_runtime_set_status() does not take them into account,
> which is not consistent with the other parts of the PM-runtime
> framework and may lead to programming mistakes.
>
> Modify __pm_runtime_set_status() to take suppliers into account by
> activating them upfront if the new status is RPM_ACTIVE and
> deactivating them on exit if the new status is RPM_SUSPENDED.
>
> If the activation of one of the suppliers fails, the new status
> will be RPM_SUSPENDED and the (remaining) suppliers will be
> deactivated on exit (the child count of the device's parent
> will be dropped too then).
>
> Of course, adding device links locking to __pm_runtime_set_status()
> means that it cannot be run from interrupt context, so make it use
> spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> and spin_unlock_irqrestore(), respectively.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Rafael, thanks for working on this!

I am running some tests on my side, but I am still not getting the
behavior I expect. I will let you know when I have more details, but
first, some comments below.

> ---
>  drivers/base/power/runtime.c |   45 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 5 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1102,20 +1102,43 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_in_u
>   * and the device parent's counter of unsuspended children is modified to
>   * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
>   * notification request for the parent is submitted.
> + *
> + * If @dev has any suppliers (as reflected by device links to them), and @status
> + * is RPM_ACTIVE, they will be activated upfront and if the activation of one
> + * of them fails, the status of @dev will be changed to RPM_SUSPENDED (instead
> + * of the @status value) and the suppliers will be deactivated on exit.  The
> + * error returned by the failing supplier activation will be returned in that
> + * case.
>   */
>  int __pm_runtime_set_status(struct device *dev, unsigned int status)
>  {
>         struct device *parent = dev->parent;
> -       unsigned long flags;
>         bool notify_parent = false;
>         int error = 0;
>
>         if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
>                 return -EINVAL;
>
> -       spin_lock_irqsave(&dev->power.lock, flags);
> +       /*
> +        * If the new status is RPM_ACTIVE, the suppliers can be activated
> +        * upfront regardless of the current status, because next time
> +        * rpm_put_suppliers() runs, the rpm_active refcounts of the links
> +        * involved will be dropped down to one anyway.
> +        */
> +       if (status == RPM_ACTIVE) {
> +               int idx = device_links_read_lock();
> +
> +               error = rpm_get_suppliers(dev);
> +               if (error)
> +                       status = RPM_SUSPENDED;
> +
> +               device_links_read_unlock(idx);
> +       }

This doesn't look right to me, and more importantly, this isn't
consistent with how we treat a parent/child.

More precisely, I think you need to check "if
(!dev->power.runtime_error && !dev->power.disable_depth)" and also
whether "dev->power.runtime_status == status", before deciding to call
rpm_get_suppliers() above. Otherwise you may end up resuming suppliers
and/or increasing the link->rpm_active count, when you shouldn't.

In other words, expecting __pm_runtime_set_status() to be called in a
"balanced" manner isn't correct.

> +
> +       spin_lock_irq(&dev->power.lock);
>
>         if (!dev->power.runtime_error && !dev->power.disable_depth) {
> +               status = dev->power.runtime_status;
>                 error = -EAGAIN;
>                 goto out;
>         }
> @@ -1147,19 +1170,31 @@ int __pm_runtime_set_status(struct devic
>
>                 spin_unlock(&parent->power.lock);
>
> -               if (error)
> +               if (error) {
> +                       status = RPM_SUSPENDED;
>                         goto out;
> +               }
>         }
>
>   out_set:
>         __update_runtime_status(dev, status);
> -       dev->power.runtime_error = 0;
> +       if (!error)
> +               dev->power.runtime_error = 0;
> +
>   out:
> -       spin_unlock_irqrestore(&dev->power.lock, flags);
> +       spin_unlock_irq(&dev->power.lock);
>
>         if (notify_parent)
>                 pm_request_idle(parent);
>
> +       if (status == RPM_SUSPENDED) {
> +               int idx = device_links_read_lock();
> +
> +               rpm_put_suppliers(dev);
> +
> +               device_links_read_unlock(idx);
> +       }
> +
>         return error;
>  }
>  EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
>

Kind regards
Uffe
Ulf Hansson - Feb. 11, 2019, 3:50 p.m.
On Mon, 11 Feb 2019 at 14:27, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > If the target device has any suppliers, as reflected by device links
> > to them, __pm_runtime_set_status() does not take them into account,
> > which is not consistent with the other parts of the PM-runtime
> > framework and may lead to programming mistakes.
> >
> > Modify __pm_runtime_set_status() to take suppliers into account by
> > activating them upfront if the new status is RPM_ACTIVE and
> > deactivating them on exit if the new status is RPM_SUSPENDED.
> >
> > If the activation of one of the suppliers fails, the new status
> > will be RPM_SUSPENDED and the (remaining) suppliers will be
> > deactivated on exit (the child count of the device's parent
> > will be dropped too then).
> >
> > Of course, adding device links locking to __pm_runtime_set_status()
> > means that it cannot be run from interrupt context, so make it use
> > spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> > and spin_unlock_irqrestore(), respectively.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Rafael, thanks for working on this!
>
> I am running some tests at my side, but still not achieving the
> behavior I expect to. Will let you know when I have more details, but
> first some comments below.

Alright, this is what I found.

When I call pm_runtime_set_suspended() in the ->probe() error path of
my RPM test driver (I remove the device link afterwards), I expect
that this allows the supplier to become runtime suspended (sooner or
later). This isn't the case: the runtime PM usage count of the
supplier still remains 1 after the probe failure.

My observation is that with the $subject patch, the link->rpm_active
count now reaches 1, whereas before it stayed at 2 - so one step
forward. :-)

However, the reason why the runtime PM usage count never reaches 0 is
the call to pm_runtime_get_noresume(supplier) in
device_link_rpm_prepare(), which is called from device_link_add().

To solve the problem, it seems like we need to call
pm_runtime_put(supplier), in case the device link is deleted while the
consumer is still probing.
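
To make the imbalance easier to see, here is a tiny userspace C model of it. This is a sketch under assumptions, not kernel code: `model_link_add()` stands in for the pm_runtime_get_noresume(supplier) done on link creation, `model_link_del()` stands in for link removal, and the `drop_probe_ref` switch is a hypothetical knob that selects the extra put suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the supplier device. */
struct model_supplier {
	int usage_count; /* models the runtime PM usage count */
};

/* Hypothetical stand-in for a device link. */
struct model_link {
	struct model_supplier *supplier;
	bool consumer_probing; /* link was added while the consumer probes */
};

/* Models the usage count bump done when the link is added. */
static void model_link_add(struct model_link *link, struct model_supplier *sup)
{
	link->supplier = sup;
	link->consumer_probing = true;
	sup->usage_count++;
}

/*
 * Models link removal. With drop_probe_ref set, the reference taken at
 * link creation is dropped if the consumer is still probing; without
 * it, that reference is leaked.
 */
static void model_link_del(struct model_link *link, bool drop_probe_ref)
{
	if (drop_probe_ref && link->consumer_probing) {
		assert(link->supplier->usage_count > 0);
		link->supplier->usage_count--;
	}
	link->supplier = NULL;
}
```

Deleting the link with `drop_probe_ref` cleared leaves `usage_count` at 1, which matches the observation above; with it set, the count returns to 0.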

>
> > ---
> >  drivers/base/power/runtime.c |   45 ++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 40 insertions(+), 5 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -1102,20 +1102,43 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_in_u
> >   * and the device parent's counter of unsuspended children is modified to
> >   * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
> >   * notification request for the parent is submitted.
> > + *
> > + * If @dev has any suppliers (as reflected by device links to them), and @status
> > + * is RPM_ACTIVE, they will be activated upfront and if the activation of one
> > + * of them fails, the status of @dev will be changed to RPM_SUSPENDED (instead
> > + * of the @status value) and the suppliers will be deactivated on exit.  The
> > + * error returned by the failing supplier activation will be returned in that
> > + * case.
> >   */
> >  int __pm_runtime_set_status(struct device *dev, unsigned int status)
> >  {
> >         struct device *parent = dev->parent;
> > -       unsigned long flags;
> >         bool notify_parent = false;
> >         int error = 0;
> >
> >         if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
> >                 return -EINVAL;
> >
> > -       spin_lock_irqsave(&dev->power.lock, flags);
> > +       /*
> > +        * If the new status is RPM_ACTIVE, the suppliers can be activated
> > +        * upfront regardless of the current status, because next time
> > +        * rpm_put_suppliers() runs, the rpm_active refcounts of the links
> > +        * involved will be dropped down to one anyway.
> > +        */
> > +       if (status == RPM_ACTIVE) {
> > +               int idx = device_links_read_lock();
> > +
> > +               error = rpm_get_suppliers(dev);
> > +               if (error)
> > +                       status = RPM_SUSPENDED;
> > +
> > +               device_links_read_unlock(idx);
> > +       }
>
> This doesn't look right to me, and more importantly, this isn't
> consistent with how we treat a parent/child.
>
> More precisely, I think you need to check "if
> (!dev->power.runtime_error && !dev->power.disable_depth)" and also
> whether "dev->power.runtime_status == status", before deciding to call
> rpm_get_suppliers() above. Otherwise you may end up resuming suppliers
> and/or increasing the link->rpm_active count, when you shouldn't.
>
> In other words, expecting __pm_runtime_set_status() to be called in
> "balanced" manner isn't correct.
>
> > +
> > +       spin_lock_irq(&dev->power.lock);
> >
> >         if (!dev->power.runtime_error && !dev->power.disable_depth) {
> > +               status = dev->power.runtime_status;
> >                 error = -EAGAIN;
> >                 goto out;
> >         }
> > @@ -1147,19 +1170,31 @@ int __pm_runtime_set_status(struct devic
> >
> >                 spin_unlock(&parent->power.lock);
> >
> > -               if (error)
> > +               if (error) {
> > +                       status = RPM_SUSPENDED;
> >                         goto out;
> > +               }
> >         }
> >
> >   out_set:
> >         __update_runtime_status(dev, status);
> > -       dev->power.runtime_error = 0;
> > +       if (!error)
> > +               dev->power.runtime_error = 0;
> > +
> >   out:
> > -       spin_unlock_irqrestore(&dev->power.lock, flags);
> > +       spin_unlock_irq(&dev->power.lock);
> >
> >         if (notify_parent)
> >                 pm_request_idle(parent);
> >
> > +       if (status == RPM_SUSPENDED) {
> > +               int idx = device_links_read_lock();
> > +
> > +               rpm_put_suppliers(dev);
> > +
> > +               device_links_read_unlock(idx);
> > +       }
> > +
> >         return error;
> >  }
> >  EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
> >
>
> Kind regards
> Uffe
Rafael J. Wysocki - Feb. 11, 2019, 10:41 p.m.
On Mon, Feb 11, 2019 at 2:28 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > If the target device has any suppliers, as reflected by device links
> > to them, __pm_runtime_set_status() does not take them into account,
> > which is not consistent with the other parts of the PM-runtime
> > framework and may lead to programming mistakes.
> >
> > Modify __pm_runtime_set_status() to take suppliers into account by
> > activating them upfront if the new status is RPM_ACTIVE and
> > deactivating them on exit if the new status is RPM_SUSPENDED.
> >
> > If the activation of one of the suppliers fails, the new status
> > will be RPM_SUSPENDED and the (remaining) suppliers will be
> > deactivated on exit (the child count of the device's parent
> > will be dropped too then).
> >
> > Of course, adding device links locking to __pm_runtime_set_status()
> > means that it cannot be run from interrupt context, so make it use
> > spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> > and spin_unlock_irqrestore(), respectively.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Rafael, thanks for working on this!
>
> I am running some tests at my side, but still not achieving the
> behavior I expect to. Will let you know when I have more details, but
> first some comments below.
>
> > ---
> >  drivers/base/power/runtime.c |   45 ++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 40 insertions(+), 5 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -1102,20 +1102,43 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_in_u
> >   * and the device parent's counter of unsuspended children is modified to
> >   * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
> >   * notification request for the parent is submitted.
> > + *
> > + * If @dev has any suppliers (as reflected by device links to them), and @status
> > + * is RPM_ACTIVE, they will be activated upfront and if the activation of one
> > + * of them fails, the status of @dev will be changed to RPM_SUSPENDED (instead
> > + * of the @status value) and the suppliers will be deactivated on exit.  The
> > + * error returned by the failing supplier activation will be returned in that
> > + * case.
> >   */
> >  int __pm_runtime_set_status(struct device *dev, unsigned int status)
> >  {
> >         struct device *parent = dev->parent;
> > -       unsigned long flags;
> >         bool notify_parent = false;
> >         int error = 0;
> >
> >         if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
> >                 return -EINVAL;
> >
> > -       spin_lock_irqsave(&dev->power.lock, flags);
> > +       /*
> > +        * If the new status is RPM_ACTIVE, the suppliers can be activated
> > +        * upfront regardless of the current status, because next time
> > +        * rpm_put_suppliers() runs, the rpm_active refcounts of the links
> > +        * involved will be dropped down to one anyway.
> > +        */
> > +       if (status == RPM_ACTIVE) {
> > +               int idx = device_links_read_lock();
> > +
> > +               error = rpm_get_suppliers(dev);
> > +               if (error)
> > +                       status = RPM_SUSPENDED;
> > +
> > +               device_links_read_unlock(idx);
> > +       }
>
> This doesn't look right to me, and more importantly, this isn't
> consistent with how we treat a parent/child.

It cannot be entirely consistent with that, because you cannot walk
the suppliers under the device's power.lock.

The idea here is that activating suppliers upfront if the new status
is RPM_ACTIVE shouldn't hurt regardless.

> More precisely, I think you need to check "if
> (!dev->power.runtime_error && !dev->power.disable_depth)" and also
> whether "dev->power.runtime_status == status", before deciding to call
> rpm_get_suppliers() above. Otherwise you may end up resuming suppliers
> and/or increasing the link->rpm_active count, when you shouldn't.

Resuming suppliers unnecessarily is not particularly efficient, but it
is not incorrect.  Incrementing their rpm_active temporarily also
isn't incorrect as long as the rpm_active values are correct on exit
(and note that incrementing them if the consumer's status is RPM_ACTIVE
doesn't even matter).

> In other words, expecting __pm_runtime_set_status() to be called in
> "balanced" manner isn't correct.

There is no such expectation here.

There is a possible race between __pm_runtime_set_status() and runtime
suspend or resume of the device in case PM-runtime is enabled for it
when __pm_runtime_set_status() is called, but it shouldn't occur if
__pm_runtime_set_status() is used correctly (that is, when PM-runtime
is disabled for the device).

I think I know how to avoid that race, though, so I'm going to post an
incremental fix if that works out.
Rafael J. Wysocki - Feb. 11, 2019, 11:05 p.m.
On Mon, Feb 11, 2019 at 4:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 11 Feb 2019 at 14:27, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > If the target device has any suppliers, as reflected by device links
> > > to them, __pm_runtime_set_status() does not take them into account,
> > > which is not consistent with the other parts of the PM-runtime
> > > framework and may lead to programming mistakes.
> > >
> > > Modify __pm_runtime_set_status() to take suppliers into account by
> > > activating them upfront if the new status is RPM_ACTIVE and
> > > deactivating them on exit if the new status is RPM_SUSPENDED.
> > >
> > > If the activation of one of the suppliers fails, the new status
> > > will be RPM_SUSPENDED and the (remaining) suppliers will be
> > > deactivated on exit (the child count of the device's parent
> > > will be dropped too then).
> > >
> > > Of course, adding device links locking to __pm_runtime_set_status()
> > > means that it cannot be run from interrupt context, so make it use
> > > spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> > > and spin_unlock_irqrestore(), respectively.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Rafael, thanks for working on this!
> >
> > I am running some tests at my side, but still not achieving the
> > behavior I expect to. Will let you know when I have more details, but
> > first some comments below.
>
> Alright, this is what I found.
>
> When I call pm_runtime_set_suspended(), in the ->probe() error path of
> my RPM test driver (I am removing the device link afterwards), then my
> expectation was that this should allow the supplier to become runtime
> suspended (sooner or later). This isn't the case, as it turns out the
> runtime PM usage count of the supplier, still remains 1 after the
> probe failure.
>
> My observation is that with $subject patch, the link->rpm_active count
> is now reaching 1, before it stayed at 2 - so one step forward. :-)
>
> However, the reason to why the runtime PM usage count never reaches 0,
> is because of the call to pm_runtime_get_noresume(supplier) in
> device_link_rpm_prepare(), which is called from device_link_add().

That was there previously, I've just moved it to device_link_rpm_prepare().

But good catch!

> To solve the problem, it seems like we need to call
> pm_runtime_put(supplier), in case the device link is deleted while the
> consumer is still probing.

I'd rather change the way pm_runtime_get/put_suppliers() work, so that
they use the rpm_active refcount, but pm_runtime_put_suppliers() only
drops it by one - unless it is one already.

Then, when adding a new link with DL_FLAG_RPM_ACTIVE,
device_link_add() only needs to increment its rpm_active *twice*
(instead of doing that once as it does now), so it will stay above one
after the subsequent pm_runtime_put_suppliers() - and if it goes away
in the meantime, then it will be cleaned up by the removal.

In turn, if a link is created without DL_FLAG_RPM_ACTIVE, its
rpm_active is one and then pm_runtime_put_suppliers() will just skip
it.
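
The counting scheme outlined above can be sketched in userspace C as follows. This is only a model of the idea as described in this message, not the patch itself: `model_link_add()` and `model_put_supplier()` are hypothetical names, and a plain int stands in for the rpm_active refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device link. */
struct model_link {
	int rpm_active; /* 1 means no supplier reference is held */
};

/*
 * Models link creation: rpm_active starts at one and is incremented
 * twice when the link is added with the "RPM active" flag.
 */
static void model_link_add(struct model_link *link, bool flag_rpm_active)
{
	link->rpm_active = 1;
	if (flag_rpm_active)
		link->rpm_active += 2;
}

/*
 * Models the per-link step of the put path: drop rpm_active by one
 * unless it is one already. Returns true if a reference was dropped.
 */
static bool model_put_supplier(struct model_link *link)
{
	assert(link->rpm_active >= 1);
	if (link->rpm_active == 1)
		return false; /* nothing held through this link, skip it */
	link->rpm_active--;
	return true;
}
```

With the flag set, the count goes from 3 to 2 on the first put and so stays above one; without it, the put is a no-op.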

A patch will follow. :-)
Ulf Hansson - Feb. 12, 2019, 8:03 a.m.
On Tue, 12 Feb 2019 at 00:05, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Feb 11, 2019 at 4:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Mon, 11 Feb 2019 at 14:27, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > If the target device has any suppliers, as reflected by device links
> > > > to them, __pm_runtime_set_status() does not take them into account,
> > > > which is not consistent with the other parts of the PM-runtime
> > > > framework and may lead to programming mistakes.
> > > >
> > > > Modify __pm_runtime_set_status() to take suppliers into account by
> > > > activating them upfront if the new status is RPM_ACTIVE and
> > > > deactivating them on exit if the new status is RPM_SUSPENDED.
> > > >
> > > > If the activation of one of the suppliers fails, the new status
> > > > will be RPM_SUSPENDED and the (remaining) suppliers will be
> > > > deactivated on exit (the child count of the device's parent
> > > > will be dropped too then).
> > > >
> > > > Of course, adding device links locking to __pm_runtime_set_status()
> > > > means that it cannot be run from interrupt context, so make it use
> > > > spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> > > > and spin_unlock_irqrestore(), respectively.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Rafael, thanks for working on this!
> > >
> > > I am running some tests at my side, but still not achieving the
> > > behavior I expect to. Will let you know when I have more details, but
> > > first some comments below.
> >
> > Alright, this is what I found.
> >
> > When I call pm_runtime_set_suspended(), in the ->probe() error path of
> > my RPM test driver (I am removing the device link afterwards), then my
> > expectation was that this should allow the supplier to become runtime
> > suspended (sooner or later). This isn't the case, as it turns out the
> > runtime PM usage count of the supplier, still remains 1 after the
> > probe failure.
> >
> > My observation is that with $subject patch, the link->rpm_active count
> > is now reaching 1, before it stayed at 2 - so one step forward. :-)
> >
> > However, the reason to why the runtime PM usage count never reaches 0,
> > is because of the call to pm_runtime_get_noresume(supplier) in
> > device_link_rpm_prepare(), which is called from device_link_add().
>
> That was there previously, I've just moved it to device_link_rpm_prepare().

Correct. The problem has been there before, even without using
DL_FLAG_RPM_ACTIVE.

>
> But good catch!
>
> > To solve the problem, it seems like we need to call
> > pm_runtime_put(supplier), in case the device link is deleted while the
> > consumer is still probing.
>
> I'd rather change the way pm_runtime_get/put_suppliers() work, so that
> they use the rpm_active refcount, but pm_runtime_put_suppliers() only
> drops it by one - unless it is one already.

That seems like a very reasonable approach!

The mix of calling pm_runtime_get/put*() on the supplier device
directly and using the path with the rpm_active count is rather
confusing to me. Using only the latter would be a nice cleanup anyway,
I think.

>
> Then, when adding a new link with DL_FLAG_RPM_ACTIVE,
> device_link_add() only needs to increment its rpm_active *twice*
> (instead of doing that once as it does now), so it will stay above one
> after the subsequent pm_runtime_put_suppliers() - and if it goes away
> in the meantime, then it will be cleaned up by the removal.

Assuming you will add a check for "consumer->links.status ==
DL_DEV_PROBING" to understand if rpm_active should be decreased.

Yes, it seems reasonable.

>
> In turn, if a link is created without DL_FLAG_RPM_ACTIVE, its
> rpm_active is one and then pm_runtime_put_suppliers() will just skip
> it.
>
> A patch will follow. :-)

Great, I am here to review it. :-)

Kind regards
Uffe
Ulf Hansson - Feb. 12, 2019, 8:25 a.m.
On Mon, 11 Feb 2019 at 23:41, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
>  On Mon, Feb 11, 2019 at 2:28 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > If the target device has any suppliers, as reflected by device links
> > > to them, __pm_runtime_set_status() does not take them into account,
> > > which is not consistent with the other parts of the PM-runtime
> > > framework and may lead to programming mistakes.
> > >
> > > Modify __pm_runtime_set_status() to take suppliers into account by
> > > activating them upfront if the new status is RPM_ACTIVE and
> > > deactivating them on exit if the new status is RPM_SUSPENDED.
> > >
> > > If the activation of one of the suppliers fails, the new status
> > > will be RPM_SUSPENDED and the (remaining) suppliers will be
> > > deactivated on exit (the child count of the device's parent
> > > will be dropped too then).
> > >
> > > Of course, adding device links locking to __pm_runtime_set_status()
> > > means that it cannot be run from interrupt context, so make it use
> > > spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> > > and spin_unlock_irqrestore(), respectively.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Rafael, thanks for working on this!
> >
> > I am running some tests at my side, but still not achieving the
> > behavior I expect to. Will let you know when I have more details, but
> > first some comments below.
> >
> > > ---
> > >  drivers/base/power/runtime.c |   45 ++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 40 insertions(+), 5 deletions(-)
> > >
> > > Index: linux-pm/drivers/base/power/runtime.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/base/power/runtime.c
> > > +++ linux-pm/drivers/base/power/runtime.c
> > > @@ -1102,20 +1102,43 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_in_u
> > >   * and the device parent's counter of unsuspended children is modified to
> > >   * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
> > >   * notification request for the parent is submitted.
> > > + *
> > > + * If @dev has any suppliers (as reflected by device links to them), and @status
> > > + * is RPM_ACTIVE, they will be activated upfront and if the activation of one
> > > + * of them fails, the status of @dev will be changed to RPM_SUSPENDED (instead
> > > + * of the @status value) and the suppliers will be deactivated on exit.  The
> > > + * error returned by the failing supplier activation will be returned in that
> > > + * case.
> > >   */
> > >  int __pm_runtime_set_status(struct device *dev, unsigned int status)
> > >  {
> > >         struct device *parent = dev->parent;
> > > -       unsigned long flags;
> > >         bool notify_parent = false;
> > >         int error = 0;
> > >
> > >         if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
> > >                 return -EINVAL;
> > >
> > > -       spin_lock_irqsave(&dev->power.lock, flags);
> > > +       /*
> > > +        * If the new status is RPM_ACTIVE, the suppliers can be activated
> > > +        * upfront regardless of the current status, because next time
> > > +        * rpm_put_suppliers() runs, the rpm_active refcounts of the links
> > > +        * involved will be dropped down to one anyway.
> > > +        */
> > > +       if (status == RPM_ACTIVE) {
> > > +               int idx = device_links_read_lock();
> > > +
> > > +               error = rpm_get_suppliers(dev);
> > > +               if (error)
> > > +                       status = RPM_SUSPENDED;
> > > +
> > > +               device_links_read_unlock(idx);
> > > +       }
> >
> > This doesn't look right to me, and more importantly, this isn't
> > consistent with how we treat a parent/child.
>
> It cannot be entirely consistent with that, because you cannot walk
> the suppliers under the device's power.lock.
>
> The idea here is that activating suppliers upfront if the new status
> is RPM_ACTIVE shouldn't hurt regardless.

I see. However, perhaps we can just read out the needed flags/states
(within device's power.lock) before walking the suppliers.

In principle, those flags/states shouldn't really change, in case
runtime PM has been properly disabled by the caller.
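
Roughly what I have in mind, as a userspace C sketch rather than a kernel patch: a pthread mutex stands in for the device's power.lock, and `struct model_power` and `model_needs_supplier_get()` are made-up names for illustration. The decision is taken under the lock, and the suppliers would only be walked after it has been released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

/* Hypothetical stand-in for the relevant part of dev->power. */
struct model_power {
	pthread_mutex_t lock; /* stands in for power.lock */
	enum rpm_status runtime_status;
	int runtime_error;
	int disable_depth;
};

/*
 * Snapshot the state under the lock and decide whether the suppliers
 * need to be activated. The caller walks the suppliers afterwards,
 * with the lock no longer held.
 */
static bool model_needs_supplier_get(struct model_power *p,
				     enum rpm_status status)
{
	bool needed;

	pthread_mutex_lock(&p->lock);
	/*
	 * Setting the status is only allowed with an error pending or
	 * with runtime PM disabled, and nothing needs to be done if the
	 * status does not change.
	 */
	needed = (p->runtime_error || p->disable_depth) &&
		 p->runtime_status != status &&
		 status == RPM_ACTIVE;
	pthread_mutex_unlock(&p->lock);

	return needed;
}
```

This relies on the flags not changing between the snapshot and the supplier walk, which is the assumption about properly disabled runtime PM stated above.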

>
> > More precisely, I think you need to check "if
> > (!dev->power.runtime_error && !dev->power.disable_depth)" and also
> > whether "dev->power.runtime_status == status", before deciding to call
> > rpm_get_suppliers() above. Otherwise you may end up resuming suppliers
> > and/or increasing the link->rpm_active count, when you shouldn't.
>
> Resuming suppliers unnecessarily is not particularly efficient, but it
> is not incorrect.  Incrementing their rpm_active temporarily also
> isn't incorrect as long as the rpm_active values are correct on exit
> (and note that incrementing them if the consumer's status is RPM_ACTIVE
> doesn't even matter).
>
> > In other words, expecting __pm_runtime_set_status() to be called in
> > "balanced" manner isn't correct.
>
> There is no such expectation here.

You are right!

I didn't realize that rpm_put_suppliers() actually doesn't drop the
usage count only by one, but instead as many times as needed to let
rpm_active reach one.

>
> There is a possible race between __pm_runtime_set_status() and runtime
> suspend or resume of the device in case PM-runtime is enabled for it
> when __pm_runtime_set_status() is called, but it shouldn't occur if
> __pm_runtime_set_status() is used correctly (that is, when PM-runtime
> is disabled for the device).
>
> I think I know how to avoid that race, though, so I'm going to post an
> incremental fix if that works out.

Okay, let's see what comes out of this.

Kind regards
Uffe
Ulf Hansson - Feb. 12, 2019, 4:02 p.m.
On Thu, 7 Feb 2019 at 19:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the target device has any suppliers, as reflected by device links
> to them, __pm_runtime_set_status() does not take them into account,
> which is not consistent with the other parts of the PM-runtime
> framework and may lead to programming mistakes.
>
> Modify __pm_runtime_set_status() to take suppliers into account by
> activating them upfront if the new status is RPM_ACTIVE and
> deactivating them on exit if the new status is RPM_SUSPENDED.
>
> If the activation of one of the suppliers fails, the new status
> will be RPM_SUSPENDED and the (remaining) suppliers will be
> deactivated on exit (the child count of the device's parent
> will be dropped too then).
>
> Of course, adding device links locking to __pm_runtime_set_status()
> means that it cannot be run from interrupt context, so make it use
> spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
> and spin_unlock_irqrestore(), respectively.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe



Patch

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1102,20 +1102,43 @@  EXPORT_SYMBOL_GPL(pm_runtime_get_if_in_u
  * and the device parent's counter of unsuspended children is modified to
  * reflect the new status.  If the new status is RPM_SUSPENDED, an idle
  * notification request for the parent is submitted.
+ *
+ * If @dev has any suppliers (as reflected by device links to them), and @status
+ * is RPM_ACTIVE, they will be activated upfront and if the activation of one
+ * of them fails, the status of @dev will be changed to RPM_SUSPENDED (instead
+ * of the @status value) and the suppliers will be deactivated on exit.  The
+ * error returned by the failing supplier activation will be returned in that
+ * case.
  */
 int __pm_runtime_set_status(struct device *dev, unsigned int status)
 {
 	struct device *parent = dev->parent;
-	unsigned long flags;
 	bool notify_parent = false;
 	int error = 0;
 
 	if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
 		return -EINVAL;
 
-	spin_lock_irqsave(&dev->power.lock, flags);
+	/*
+	 * If the new status is RPM_ACTIVE, the suppliers can be activated
+	 * upfront regardless of the current status, because next time
+	 * rpm_put_suppliers() runs, the rpm_active refcounts of the links
+	 * involved will be dropped down to one anyway.
+	 */
+	if (status == RPM_ACTIVE) {
+		int idx = device_links_read_lock();
+
+		error = rpm_get_suppliers(dev);
+		if (error)
+			status = RPM_SUSPENDED;
+
+		device_links_read_unlock(idx);
+	}
+
+	spin_lock_irq(&dev->power.lock);
 
 	if (!dev->power.runtime_error && !dev->power.disable_depth) {
+		status = dev->power.runtime_status;
 		error = -EAGAIN;
 		goto out;
 	}
@@ -1147,19 +1170,31 @@  int __pm_runtime_set_status(struct devic
 
 		spin_unlock(&parent->power.lock);
 
-		if (error)
+		if (error) {
+			status = RPM_SUSPENDED;
 			goto out;
+		}
 	}
 
  out_set:
 	__update_runtime_status(dev, status);
-	dev->power.runtime_error = 0;
+	if (!error)
+		dev->power.runtime_error = 0;
+
  out:
-	spin_unlock_irqrestore(&dev->power.lock, flags);
+	spin_unlock_irq(&dev->power.lock);
 
 	if (notify_parent)
 		pm_request_idle(parent);
 
+	if (status == RPM_SUSPENDED) {
+		int idx = device_links_read_lock();
+
+		rpm_put_suppliers(dev);
+
+		device_links_read_unlock(idx);
+	}
+
 	return error;
 }
 EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
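
The control flow introduced by the patch can be followed more easily in a
reduced user-space sketch. This is only a model under stated assumptions,
not the kernel implementation: struct model_dev, the model_* functions and
the supplier_get_error failure knob are hypothetical, and locking, the
device links list and the parent child-count handling are left out.

```c
#include <errno.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical reduced device state; field names loosely follow the patch. */
struct model_dev {
	enum model_status runtime_status;
	int disable_depth;      /* > 0 means PM-runtime is disabled */
	int runtime_error;
	int supplier_refs;      /* references currently held on suppliers */
	int supplier_get_error; /* injected failure, for illustration only */
};

static int model_get_suppliers(struct model_dev *dev)
{
	if (dev->supplier_get_error)
		return dev->supplier_get_error;
	dev->supplier_refs++;
	return 0;
}

static void model_put_suppliers(struct model_dev *dev)
{
	dev->supplier_refs = 0; /* drop everything taken so far */
}

static int model_set_status(struct model_dev *dev, enum model_status status)
{
	int error = 0;

	/* Activate suppliers upfront; fall back to "suspended" on failure. */
	if (status == MODEL_ACTIVE) {
		error = model_get_suppliers(dev);
		if (error)
			status = MODEL_SUSPENDED;
	}

	/* PM-runtime still enabled and no error: refuse to change the status. */
	if (!dev->runtime_error && !dev->disable_depth) {
		status = dev->runtime_status;
		error = -EAGAIN;
		goto out;
	}

	dev->runtime_status = status;
	if (!error)
		dev->runtime_error = 0;
out:
	/* Deactivate suppliers on exit if the device ends up suspended. */
	if (status == MODEL_SUSPENDED)
		model_put_suppliers(dev);

	return error;
}
```

With disable_depth set to 1, requesting MODEL_ACTIVE takes a supplier
reference and a later MODEL_SUSPENDED request releases it; with a nonzero
supplier_get_error, the device ends up suspended and the injected error is
returned.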