Patchwork [1/1] PM / Domains: Avoid a potential deadlock

login
register
mail settings
Submitter jiada_wang@mentor.com
Date March 12, 2019, 6:51 a.m.
Message ID <20190312065128.24994-1-jiada_wang@mentor.com>
Download mbox | patch
Permalink /patch/746737/
State New
Headers show

Comments

jiada_wang@mentor.com - March 12, 2019, 6:51 a.m.
Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock
the deadlock scenario is like following:
First thread is probing cs2000
cs2000_probe()
  clk_register()
    __clk_core_init()
      clk_prepare_lock()                            ----> acquires prepare_lock
        cs2000_recalc_rate()
          i2c_smbus_read_byte_data()
            rcar_i2c_master_xfer()
              dma_request_chan()
                rcar_dmac_of_xlate()
                  rcar_dmac_alloc_chan_resources()
                    pm_runtime_get_sync()
                      __pm_runtime_resume()
                        rpm_resume()
                          rpm_callback()
                            genpd_runtime_resume()   ----> acquires genpd->mlock

Second thread is attaching any device to the same PM domain
genpd_add_device()
  genpd_lock()                                       ----> acquires genpd->mlock
    cpg_mssr_attach_dev()
      of_clk_get_from_provider()
        __of_clk_get_from_provider()
          __clk_create_clk()
            clk_prepare_lock()                       ----> acquires prepare_lock

Since currently no PM provider access genpd's critical section
in .attach_dev, and .detach_dev callbacks, so there is no need to protect
these two callbacks with genpd->mlock.
This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev
from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired
in .attach_dev and .detach_dev

Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
---
 drivers/base/power/domain.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
Ulf Hansson - March 13, 2019, 7:35 a.m.
On Tue, 12 Mar 2019 at 07:51, Jiada Wang <jiada_wang@mentor.com> wrote:
>
> Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock
> the deadlock scenario is like following:
> First thread is probing cs2000
> cs2000_probe()
>   clk_register()
>     __clk_core_init()
>       clk_prepare_lock()                            ----> acquires prepare_lock
>         cs2000_recalc_rate()
>           i2c_smbus_read_byte_data()
>             rcar_i2c_master_xfer()
>               dma_request_chan()
>                 rcar_dmac_of_xlate()
>                   rcar_dmac_alloc_chan_resources()
>                     pm_runtime_get_sync()
>                       __pm_runtime_resume()
>                         rpm_resume()
>                           rpm_callback()
>                             genpd_runtime_resume()   ----> acquires genpd->mlock
>
> Second thread is attaching any device to the same PM domain
> genpd_add_device()
>   genpd_lock()                                       ----> acquires genpd->mlock
>     cpg_mssr_attach_dev()
>       of_clk_get_from_provider()
>         __of_clk_get_from_provider()
>           __clk_create_clk()
>             clk_prepare_lock()                       ----> acquires prepare_lock
>
> Since currently no PM provider access genpd's critical section
> in .attach_dev, and .detach_dev callbacks, so there is no need to protect
> these two callbacks with genpd->mlock.
> This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev
> from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired
> in .attach_dev and .detach_dev

Thanks for the detailed description, this seems like a reasonable change to me!

>
> Signed-off-by: Jiada Wang <jiada_wang@mentor.com>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe


> ---
>  drivers/base/power/domain.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 500de1dee967..a00ca6b8117b 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1467,12 +1467,12 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
>         if (IS_ERR(gpd_data))
>                 return PTR_ERR(gpd_data);
>
> -       genpd_lock(genpd);
> -
>         ret = genpd->attach_dev ? genpd->attach_dev(genpd, dev) : 0;
>         if (ret)
>                 goto out;
>
> +       genpd_lock(genpd);
> +
>         dev_pm_domain_set(dev, &genpd->domain);
>
>         genpd->device_count++;
> @@ -1480,9 +1480,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
>
>         list_add_tail(&gpd_data->base.list_node, &genpd->dev_list);
>
> - out:
>         genpd_unlock(genpd);
> -
> + out:
>         if (ret)
>                 genpd_free_dev_data(dev, gpd_data);
>         else
> @@ -1531,15 +1530,15 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
>         genpd->device_count--;
>         genpd->max_off_time_changed = true;
>
> -       if (genpd->detach_dev)
> -               genpd->detach_dev(genpd, dev);
> -
>         dev_pm_domain_set(dev, NULL);
>
>         list_del_init(&pdd->list_node);
>
>         genpd_unlock(genpd);
>
> +       if (genpd->detach_dev)
> +               genpd->detach_dev(genpd, dev);
> +
>         genpd_free_dev_data(dev, gpd_data);
>
>         return 0;
> --
> 2.19.2
>
Geert Uytterhoeven - March 13, 2019, 2:45 p.m.
Hi Jiada,

CC Marek, Jon, Mike, Stephen, linux-clk, linux-renesas-soc

On Tue, Mar 12, 2019 at 7:51 AM Jiada Wang <jiada_wang@mentor.com> wrote:
> Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock
> the deadlock scenario is like following:
> First thread is probing cs2000
> cs2000_probe()
>   clk_register()
>     __clk_core_init()
>       clk_prepare_lock()                            ----> acquires prepare_lock
>         cs2000_recalc_rate()
>           i2c_smbus_read_byte_data()
>             rcar_i2c_master_xfer()
>               dma_request_chan()
>                 rcar_dmac_of_xlate()
>                   rcar_dmac_alloc_chan_resources()
>                     pm_runtime_get_sync()
>                       __pm_runtime_resume()
>                         rpm_resume()
>                           rpm_callback()
>                             genpd_runtime_resume()   ----> acquires genpd->mlock
>
> Second thread is attaching any device to the same PM domain
> genpd_add_device()
>   genpd_lock()                                       ----> acquires genpd->mlock
>     cpg_mssr_attach_dev()
>       of_clk_get_from_provider()
>         __of_clk_get_from_provider()
>           __clk_create_clk()
>             clk_prepare_lock()                       ----> acquires prepare_lock
>
> Since currently no PM provider access genpd's critical section
> in .attach_dev, and .detach_dev callbacks, so there is no need to protect
> these two callbacks with genpd->mlock.
> This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev
> from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired
> in .attach_dev and .detach_dev
>
> Signed-off-by: Jiada Wang <jiada_wang@mentor.com>

Thanks a lot for your perseverance!
I had tried putting each PM Domain in its own lockdep class, but that
didn't help.

Your patch fixes the lockdep warnings on my Salvator-X(S) boards.

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Your solution makes sense, as the .{at,de}tach_dev() callbacks are
owned by the individual PM Domain drivers, and not by the genpd core.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

One thing I'm still wondering about: genpd_add_device() is always called
with gpd_list_lock held.
However, the same cannot be said about genpd_remove_device():
  - pm_genpd_remove_device() does not take gpd_list_lock, unlike
    pm_genpd_add_device(),
  - genpd_dev_pm_detach() does not take gpd_list_lock, while
    __genpd_dev_pm_attach() does take the lock for adding,
    but not for removing (in the error patch)?

> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1467,12 +1467,12 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
>         if (IS_ERR(gpd_data))
>                 return PTR_ERR(gpd_data);
>
> -       genpd_lock(genpd);
> -
>         ret = genpd->attach_dev ? genpd->attach_dev(genpd, dev) : 0;
>         if (ret)
>                 goto out;
>
> +       genpd_lock(genpd);
> +
>         dev_pm_domain_set(dev, &genpd->domain);
>
>         genpd->device_count++;
> @@ -1480,9 +1480,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
>
>         list_add_tail(&gpd_data->base.list_node, &genpd->dev_list);
>
> - out:
>         genpd_unlock(genpd);
> -
> + out:
>         if (ret)
>                 genpd_free_dev_data(dev, gpd_data);
>         else
> @@ -1531,15 +1530,15 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
>         genpd->device_count--;
>         genpd->max_off_time_changed = true;
>
> -       if (genpd->detach_dev)
> -               genpd->detach_dev(genpd, dev);
> -
>         dev_pm_domain_set(dev, NULL);
>
>         list_del_init(&pdd->list_node);
>
>         genpd_unlock(genpd);
>
> +       if (genpd->detach_dev)
> +               genpd->detach_dev(genpd, dev);
> +
>         genpd_free_dev_data(dev, gpd_data);
>
>         return 0;

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Ulf Hansson - March 13, 2019, 7:52 p.m.
On Wed, 13 Mar 2019 at 15:45, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Jiada,
>
> CC Marek, Jon, Mike, Stephen, linux-clk, linux-renesas-soc
>
> On Tue, Mar 12, 2019 at 7:51 AM Jiada Wang <jiada_wang@mentor.com> wrote:
> > Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock
> > the deadlock scenario is like following:
> > First thread is probing cs2000
> > cs2000_probe()
> >   clk_register()
> >     __clk_core_init()
> >       clk_prepare_lock()                            ----> acquires prepare_lock
> >         cs2000_recalc_rate()
> >           i2c_smbus_read_byte_data()
> >             rcar_i2c_master_xfer()
> >               dma_request_chan()
> >                 rcar_dmac_of_xlate()
> >                   rcar_dmac_alloc_chan_resources()
> >                     pm_runtime_get_sync()
> >                       __pm_runtime_resume()
> >                         rpm_resume()
> >                           rpm_callback()
> >                             genpd_runtime_resume()   ----> acquires genpd->mlock
> >
> > Second thread is attaching any device to the same PM domain
> > genpd_add_device()
> >   genpd_lock()                                       ----> acquires genpd->mlock
> >     cpg_mssr_attach_dev()
> >       of_clk_get_from_provider()
> >         __of_clk_get_from_provider()
> >           __clk_create_clk()
> >             clk_prepare_lock()                       ----> acquires prepare_lock
> >
> > Since currently no PM provider access genpd's critical section
> > in .attach_dev, and .detach_dev callbacks, so there is no need to protect
> > these two callbacks with genpd->mlock.
> > This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev
> > from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired
> > in .attach_dev and .detach_dev
> >
> > Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
>
> Thanks a lot for your perseverance!
> I had tried putting each PM Domain in its own lockdep class, but that
> didn't help.
>
> Your patch fixes the lockdep warnings on my Salvator-X(S) boards.
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Your solution makes sense, as the .{at,de}tach_dev() callbacks are
> owned by the individual PM Domain drivers, and not by the genpd core.
>
> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> One thing I'm still wondering about: genpd_add_device() is always called
> with gpd_list_lock held.
> However, the same cannot be said about genpd_remove_device():
>   - pm_genpd_remove_device() does not take gpd_list_lock, unlike
>     pm_genpd_add_device(),
>   - genpd_dev_pm_detach() does not take gpd_list_lock, while
>     __genpd_dev_pm_attach() does take the lock for adding,
>     but not for removing (in the error patch)?

That's correctly observed and it's something that needs to be fixed in
the long run. Although I am not sure that we should use the
gpd_list_lock...

There are race conditions while removing a device at the same time as
the genpd is removed. As a matter of fact, even the debugfs isn't
correctly dropped, when a genpd is removed. However, because almost
none it ever removing a genpd, this isn't a problem in practice.

Anyway, I have a TODO list for genpd (starting to be quite longe) and
this one is already on it, whatever that means. :-)

[...]

Kind regards
Uffe

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 500de1dee967..a00ca6b8117b 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1467,12 +1467,12 @@  static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 	if (IS_ERR(gpd_data))
 		return PTR_ERR(gpd_data);
 
-	genpd_lock(genpd);
-
 	ret = genpd->attach_dev ? genpd->attach_dev(genpd, dev) : 0;
 	if (ret)
 		goto out;
 
+	genpd_lock(genpd);
+
 	dev_pm_domain_set(dev, &genpd->domain);
 
 	genpd->device_count++;
@@ -1480,9 +1480,8 @@  static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 
 	list_add_tail(&gpd_data->base.list_node, &genpd->dev_list);
 
- out:
 	genpd_unlock(genpd);
-
+ out:
 	if (ret)
 		genpd_free_dev_data(dev, gpd_data);
 	else
@@ -1531,15 +1530,15 @@  static int genpd_remove_device(struct generic_pm_domain *genpd,
 	genpd->device_count--;
 	genpd->max_off_time_changed = true;
 
-	if (genpd->detach_dev)
-		genpd->detach_dev(genpd, dev);
-
 	dev_pm_domain_set(dev, NULL);
 
 	list_del_init(&pdd->list_node);
 
 	genpd_unlock(genpd);
 
+	if (genpd->detach_dev)
+		genpd->detach_dev(genpd, dev);
+
 	genpd_free_dev_data(dev, gpd_data);
 
 	return 0;