Patchwork [1/5] dmaengine: bcm2835: Fix interrupt race on RT

Submitter Lukas Wunner
Date Dec. 22, 2018, 7:28 a.m.
Message ID <72be86ba05f9dde6cebeb4763da831f0d2f642d0.1545462045.git.lukas@wunner.de>
Permalink /patch/688985/
State New

Comments

Lukas Wunner - Dec. 22, 2018, 7:28 a.m.
If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is
enabled or "threadirqs" was passed on the command line) and if system
load is sufficiently high that wakeup latency of IRQ threads degrades,
SPI DMA transactions on the BCM2835 occasionally break like this:

ks8851 spi0.0: SPI transfer timed out
bcm2835-dma 3f007000.dma: DMA transfer could not be terminated
ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed

The root cause is an assumption made by the DMA driver which is
documented in a code comment in bcm2835_dma_terminate_all():

/*
 * Stop DMA activity: we assume the callback will not be called
 * after bcm_dma_abort() returns (even if it does, it will see
 * c->desc is NULL and exit.)
 */

That assumption falls apart if the IRQ handler bcm2835_dma_callback() is
threaded:  A client may terminate a descriptor and issue a new one
before the IRQ handler had a chance to run.  In fact the IRQ handler may
miss an *arbitrary* number of descriptors.  The result is the following
race condition:

1. A descriptor finishes, its interrupt is deferred to the IRQ thread.
2. A client calls dma_terminate_async() which sets channel->desc = NULL.
3. The client issues a new descriptor.  Because channel->desc is NULL,
   bcm2835_dma_issue_pending() immediately starts the descriptor.
4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS
   register to acknowledge the interrupt.  This clears the ACTIVE flag,
   so the newly issued descriptor is paused in the middle of the
   transaction.  Because channel->desc is not NULL, the IRQ thread
   finalizes the descriptor and tries to start the next one.

I see two possible solutions:  The first is to call synchronize_irq()
in bcm2835_dma_issue_pending() to wait until the IRQ thread has
finished before issuing a new descriptor.  The downside of this approach
is unnecessary latency if clients desire rapidly terminating and
re-issuing descriptors and don't have any use for an IRQ callback.
(The SPI TX DMA channel is a case in point.)

A better alternative is to make the IRQ thread recognize that it has
missed descriptors and avoid finalizing the newly issued descriptor.
Therefore, only finalize a descriptor if the END flag is set in the CS
register.  Clear the flag when starting a new descriptor.  Perform both
operations under the vchan lock.  That way, there is practically no
impact on latency and throughput if the client doesn't care for the
interrupt:  Only minimal additional overhead is introduced as one
further MMIO read is necessary per interrupt.

That MMIO read is needed to write the same value to the CS register that
it currently has, thereby preserving the ACTIVE flag while clearing the
INT and END flags.  Note that the transaction may finish between reading
and writing the CS register, and in that case the write changes the
ACTIVE flag from 0 to 1.  But that's harmless, the chip will notice that
NEXTCONBK is 0 and self-clear the ACTIVE flag again.

Fixes: 96286b576690 ("dmaengine: Add support for BCM2835")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.14+
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Meier <florian.meier@koalo.de>
---
 drivers/dma/bcm2835-dma.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)
Vinod Koul - Jan. 7, 2019, 7 a.m.
Hi Lukas,

On 22-12-18, 08:28, Lukas Wunner wrote:
> If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is
> enabled or "threadirqs" was passed on the command line) and if system
> load is sufficiently high that wakeup latency of IRQ threads degrades,
> SPI DMA transactions on the BCM2835 occasionally break like this:
> 
> ks8851 spi0.0: SPI transfer timed out
> bcm2835-dma 3f007000.dma: DMA transfer could not be terminated
> ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed
> 
> The root cause is an assumption made by the DMA driver which is
> documented in a code comment in bcm2835_dma_terminate_all():
> 
> /*
>  * Stop DMA activity: we assume the callback will not be called
>  * after bcm_dma_abort() returns (even if it does, it will see
>  * c->desc is NULL and exit.)
>  */
> 
> That assumption falls apart if the IRQ handler bcm2835_dma_callback() is
> threaded:  A client may terminate a descriptor and issue a new one
> before the IRQ handler had a chance to run.  In fact the IRQ handler may

                                             ^^^
> miss an *arbitrary* number of descriptors.  The result is the following
                                          ^^^^^
this has double spaces in few places pls fix

> race condition:
> 
> 1. A descriptor finishes, its interrupt is deferred to the IRQ thread.
> 2. A client calls dma_terminate_async() which sets channel->desc = NULL.
> 3. The client issues a new descriptor.  Because channel->desc is NULL,
>    bcm2835_dma_issue_pending() immediately starts the descriptor.
> 4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS
>    register to acknowledge the interrupt.  This clears the ACTIVE flag,
>    so the newly issued descriptor is paused in the middle of the
>    transaction.  Because channel->desc is not NULL, the IRQ thread
>    finalizes the descriptor and tries to start the next one.
> 
> I see two possible solutions:  The first is to call synchronize_irq()
> in bcm2835_dma_issue_pending() to wait until the IRQ thread has
> finished before issuing a new descriptor.  The downside of this approach
> is unnecessary latency if clients desire rapidly terminating and
> re-issuing descriptors and don't have any use for an IRQ callback.
> (The SPI TX DMA channel is a case in point.)
> 
> A better alternative is to make the IRQ thread recognize that it has
> missed descriptors and avoid finalizing the newly issued descriptor.
> Therefore, only finalize a descriptor if the END flag is set in the CS
> register.  Clear the flag when starting a new descriptor.  Perform both
> operations under the vchan lock.  That way, there is practically no
> impact on latency and throughput if the client doesn't care for the
> interrupt:  Only minimal additional overhead is introduced as one
> further MMIO read is necessary per interrupt.
> 
> That MMIO read is needed to write the same value to the CS register that
> it currently has, thereby preserving the ACTIVE flag while clearing the
> INT and END flags.  Note that the transaction may finish between reading
> and writing the CS register, and in that case the write changes the
> ACTIVE flag from 0 to 1.  But that's harmless, the chip will notice that
> NEXTCONBK is 0 and self-clear the ACTIVE flag again.
> 
> Fixes: 96286b576690 ("dmaengine: Add support for BCM2835")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v3.14+
> Cc: Frank Pavlic <f.pavlic@kunbus.de>
> Cc: Martin Sperl <kernel@martin.sperl.org>
> Cc: Florian Meier <florian.meier@koalo.de>
> ---
>  drivers/dma/bcm2835-dma.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index 1a44c8086d77..e94f41c56975 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -456,7 +456,8 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
>  	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
>  
>  	writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
> -	writel(BCM2835_DMA_ACTIVE, c->chan_base + BCM2835_DMA_CS);
> +	writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_END,
> +	       c->chan_base + BCM2835_DMA_CS);

can you explain this change please? so every descriptor is written with
BCM2835_DMA_END?


>  }
>  
>  static irqreturn_t bcm2835_dma_callback(int irq, void *data)
> @@ -464,6 +465,7 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
>  	struct bcm2835_chan *c = data;
>  	struct bcm2835_desc *d;
>  	unsigned long flags;
> +	u32 cs;
>  
>  	/* check the shared interrupt */
>  	if (c->irq_flags & IRQF_SHARED) {
> @@ -477,11 +479,17 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
>  	spin_lock_irqsave(&c->vc.lock, flags);
>  
>  	/* Acknowledge interrupt */
> -	writel(BCM2835_DMA_INT, c->chan_base + BCM2835_DMA_CS);
> +	cs = readl(c->chan_base + BCM2835_DMA_CS);
> +	writel(cs, c->chan_base + BCM2835_DMA_CS);

So idea is to ack the interrupts asserted while this is running, right?


>  
>  	d = c->desc;
>  
> -	if (d) {
> +	/*
> +	 * If this IRQ handler is threaded, clients may terminate descriptors
> +	 * and issue new ones before the IRQ handler runs.  Avoid finalizing
                                                        ^^^^
this as well..


> +	 * such a newly issued descriptor by checking the END flag.
> +	 */
> +	if (d && cs & BCM2835_DMA_END) {
>  		if (d->cyclic) {
>  			/* call the cyclic callback */
>  			vchan_cyclic_callback(&d->vd);
> @@ -789,11 +797,7 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
>  	list_del_init(&c->node);
>  	spin_unlock(&d->lock);
>  
> -	/*
> -	 * Stop DMA activity: we assume the callback will not be called
> -	 * after bcm_dma_abort() returns (even if it does, it will see
> -	 * c->desc is NULL and exit.)
> -	 */
> +	/* stop DMA activity */
>  	if (c->desc) {
>  		vchan_terminate_vdesc(&c->desc->vd);
>  		c->desc = NULL;
> -- 
> 2.19.2
Lukas Wunner - Jan. 7, 2019, 8:32 a.m.
On Mon, Jan 07, 2019 at 12:30:03PM +0530, Vinod Koul wrote:
> On 22-12-18, 08:28, Lukas Wunner wrote:
> > before the IRQ handler had a chance to run.  In fact the IRQ handler may
>                                              ^^^
> > miss an *arbitrary* number of descriptors.  The result is the following
>                                           ^^^^^
> this has double spaces in few places pls fix

That was intentional.  Bjorn Helgaas has established double spaces to
separate sentences in the PCI subsystem, both in commit messages and
code comments.  The rationale he has given:

   "Only habit because my eighth-grade typing teacher in 1979 did it that
    way, and (I think) vim does it that way by default."
    https://patchwork.kernel.org/patch/8941601/

Rafael Wysocki, myself and others have adopted this style.  However
I'll gladly omit the double spaces if that's the preferred style in
the DMA subsystem.


> > @@ -456,7 +456,8 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
> >  	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
> >  
> >  	writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
> > -	writel(BCM2835_DMA_ACTIVE, c->chan_base + BCM2835_DMA_CS);
> > +	writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_END,
> > +	       c->chan_base + BCM2835_DMA_CS);
> 
> can you explain this change please? so every descriptor is written with
> BCM2835_DMA_END?

The BCM2835_DMA_END flag is of type W1C (write 1 to clear).  The above
change ensures that the END flag is cleared when a new descriptor is
started.  Once the DMA engine has finished that descriptor, it will
automatically set the flag again.

This allows the IRQ handler to recognize that it has missed descriptors
and that a new descriptor is currently processed by the DMA engine.
The IRQ handler will then refrain from finalizing that in-flight
descriptor.

I did explain this change in the commit message:

   "Therefore, only finalize a descriptor if the END flag is set in the CS
    register.  Clear the flag when starting a new descriptor.  Perform both
    operations under the vchan lock."


> > @@ -477,11 +479,17 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
> >  	spin_lock_irqsave(&c->vc.lock, flags);
> >  
> >  	/* Acknowledge interrupt */
> > -	writel(BCM2835_DMA_INT, c->chan_base + BCM2835_DMA_CS);
> > +	cs = readl(c->chan_base + BCM2835_DMA_CS);
> > +	writel(cs, c->chan_base + BCM2835_DMA_CS);
> 
> So idea is to ack the interrupts asserted while this is running, right?

The BCM2835_DMA_INT flag is likewise of type W1C.  By writing the
same value to the CS register that it currently has, the interrupt
is acknowledged (INT flag is cleared) but the ACTIVE flag is preserved.

That way, if a new descriptor is currently processed by the DMA engine,
that descriptor is left running.

(The BCM2835_DMA_END flag is also cleared, but that's not important.)

The above change is likewise explained in the commit message:

   "Only minimal additional overhead is introduced as one
    further MMIO read is necessary per interrupt.

    That MMIO read is needed to write the same value to the CS register that
    it currently has, thereby preserving the ACTIVE flag while clearing the
    INT and END flags."

Thanks,

Lukas
Lukas Wunner - Jan. 17, 2019, 8:26 a.m.
On Sat, Dec 22, 2018 at 08:28:45AM +0100, Lukas Wunner wrote:
> That MMIO read is needed to write the same value to the CS register that
> it currently has, thereby preserving the ACTIVE flag while clearing the
> INT and END flags.  Note that the transaction may finish between reading
> and writing the CS register, and in that case the write changes the
> ACTIVE flag from 0 to 1.  But that's harmless, the chip will notice that
> NEXTCONBK is 0 and self-clear the ACTIVE flag again.

Further experimentation has shown that the chip does not self-clear the
ACTIVE flag if it is set on an idle channel, contrary to what I have
written above.  Consequently that flag cannot be used to determine
idleness of a channel reliably.  A workaround is to check whether the
register containing the current control block's address is zero.  I have
amended the patch accordingly and it is currently undergoing stress
testing until Monday.  So I should be able to post an updated version of
the series next week.

I have also updated the patch to fix the following remaining race:

> @@ -477,11 +479,17 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
>  	spin_lock_irqsave(&c->vc.lock, flags);
>  
>  	/* Acknowledge interrupt */
> -	writel(BCM2835_DMA_INT, c->chan_base + BCM2835_DMA_CS);
> +	cs = readl(c->chan_base + BCM2835_DMA_CS);
> +	writel(cs, c->chan_base + BCM2835_DMA_CS);
>  
>  	d = c->desc;
>  
> -	if (d) {
> +	/*
> +	 * If this IRQ handler is threaded, clients may terminate descriptors
> +	 * and issue new ones before the IRQ handler runs.  Avoid finalizing
> +	 * such a newly issued descriptor by checking the END flag.
> +	 */
> +	if (d && cs & BCM2835_DMA_END) {
>  		if (d->cyclic) {
>  			/* call the cyclic callback */
>  			vchan_cyclic_callback(&d->vd);

The descriptor may be finished between the read and the write of the
CS register, and in that case the interrupt for the finished descriptor
will be acknowledged but the descriptor will not be finalized because
END was not yet set when the register was read.  I haven't seen this
race in the real world but it's definitely not correct and I've fixed
it as well for v2.

Thanks,

Lukas

Patch

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 1a44c8086d77..e94f41c56975 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -456,7 +456,8 @@  static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
 
 	writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
-	writel(BCM2835_DMA_ACTIVE, c->chan_base + BCM2835_DMA_CS);
+	writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_END,
+	       c->chan_base + BCM2835_DMA_CS);
 }
 
 static irqreturn_t bcm2835_dma_callback(int irq, void *data)
@@ -464,6 +465,7 @@  static irqreturn_t bcm2835_dma_callback(int irq, void *data)
 	struct bcm2835_chan *c = data;
 	struct bcm2835_desc *d;
 	unsigned long flags;
+	u32 cs;
 
 	/* check the shared interrupt */
 	if (c->irq_flags & IRQF_SHARED) {
@@ -477,11 +479,17 @@  static irqreturn_t bcm2835_dma_callback(int irq, void *data)
 	spin_lock_irqsave(&c->vc.lock, flags);
 
 	/* Acknowledge interrupt */
-	writel(BCM2835_DMA_INT, c->chan_base + BCM2835_DMA_CS);
+	cs = readl(c->chan_base + BCM2835_DMA_CS);
+	writel(cs, c->chan_base + BCM2835_DMA_CS);
 
 	d = c->desc;
 
-	if (d) {
+	/*
+	 * If this IRQ handler is threaded, clients may terminate descriptors
+	 * and issue new ones before the IRQ handler runs.  Avoid finalizing
+	 * such a newly issued descriptor by checking the END flag.
+	 */
+	if (d && cs & BCM2835_DMA_END) {
 		if (d->cyclic) {
 			/* call the cyclic callback */
 			vchan_cyclic_callback(&d->vd);
@@ -789,11 +797,7 @@  static int bcm2835_dma_terminate_all(struct dma_chan *chan)
 	list_del_init(&c->node);
 	spin_unlock(&d->lock);
 
-	/*
-	 * Stop DMA activity: we assume the callback will not be called
-	 * after bcm_dma_abort() returns (even if it does, it will see
-	 * c->desc is NULL and exit.)
-	 */
+	/* stop DMA activity */
 	if (c->desc) {
 		vchan_terminate_vdesc(&c->desc->vd);
 		c->desc = NULL;