Patchwork ARM64 boot failure on espressobin with 5.0.0-rc6 (1f947a7a011fcceb14cb912f5481a53b18f1879a)

login
register
mail settings
Submitter Robin Murphy
Date Feb. 14, 2019, 5:58 p.m.
Message ID <fd389e6a-b10c-77fb-ac79-b15ffd805a7d@arm.com>
Download mbox | patch
Permalink /patch/726803/
State New
Headers show

Comments

Robin Murphy - Feb. 14, 2019, 5:58 p.m.
On 14/02/2019 17:36, Christoph Hellwig wrote:
> On Thu, Feb 14, 2019 at 05:27:41PM +0000, Robin Murphy wrote:
>> Oh wow, that driver has possibly the most inventive way of passing a NULL
>> device to the DMA API that I've ever seen, and on arm64 it will certainly
>> have been failing since 4.2, but of course there's also no error checking
>> for anyone to notice...
> 
> I did take a brief look and didn't see how we got the NULL device
> pointer, so it is well hidden for sure.
> 
>> This crash will be a fallout from 356da6d0cd (plus the subsequent fix in
>> 9ab91e7c5c51) that's otherwise missed Christoph's big cleanup. Obviously
>> the right thing to do is for someone to try to figure out the steaming pile
>> of mess in that driver, but if necessary I think the quick fix below should
>> probably suffice to mitigate the change in the short term.
> 
> The fix looks ok.  And for 5.2 I plan to explicitly reject all uses of
> NULL device arguments in the DMA API.  I've sent patches out for all
> the obviously problemetic drivers, and most of them got accepted by the
> maintainers for the 5.1 merge window.
> 
> It seems like the mv_xor code is mostly unmaintained as far as I can
> tell unfortunately.

Hmm, having felt brave enough to take a closer look, it might actually 
be as simple as this - Dave, are you able to give the diff below a spin?

Robin.

----->8-----
John David Anglin - Feb. 14, 2019, 6:11 p.m.
On 2019-02-14 12:58 p.m., Robin Murphy wrote:
> Hmm, having felt brave enough to take a closer look, it might actually be as simple as this - Dave, are you able to give the diff below a spin?
Yes.
John David Anglin - Feb. 15, 2019, 3:22 p.m.
On 2019-02-14 12:58 p.m., Robin Murphy wrote:
> Hmm, having felt brave enough to take a closer look, it might actually be as simple as this - Dave, are you able to give the diff below a spin?

Still crashes but in slightly different spot:

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.0.0-rc6+ (root@espressobin) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PREEMPT Wed Feb 13
16:17:46 EST 2019
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Board
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')
[    0.000000] printk: bootconsole [ar3700_uart0] enabled
[    3.210276] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[    3.215932] Modules linked in:
[    3.219072] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc6+ #1
[    3.225519] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[    3.232151] pstate: a0000005 (NzCv daif -PAN -UAO)
[    3.237090] pc : mv_xor_channel_add+0x4c/0xb28
[    3.241650] lr : mv_xor_probe+0x20c/0x4b8
[    3.245768] sp : ffffff8010033ac0
[    3.249173] x29: ffffff8010033ac0 x28: 0000000000000000
[    3.254639] x27: ffffff8010e76068 x26: 0000000000000029
[    3.260104] x25: 0000000000000000 x24: 0000000000000000
[    3.265570] x23: ffffffc03fb47400 x22: ffffff8010ead000
[    3.271035] x21: ffffffc03fb47410 x20: ffffffc03bea8d80
[    3.276501] x19: ffffffc03fb47400 x18: ffffffffffffffff
[    3.281966] x17: 000000000000000c x16: 000000000000000a
[    3.287432] x15: ffffff8010ead6c8 x14: ffffffc03beaa003
[    3.292898] x13: ffffffc03beaa002 x12: 0000000000000038
[    3.298363] x11: 0000000000001fff x10: 0000000000000001
[    3.303829] x9 : 0000000000000040 x8 : ffffff8010ec7928
[    3.309294] x7 : ffffffc03cc003b8 x6 : 0000000000000000
[    3.314760] x5 : 0000000000000000 x4 : 0000000000000029
[    3.320226] x3 : 0000000000000083 x2 : 00000000000080c0
[    3.325691] x1 : 0000000000000000 x0 : ffffffc03fb47410
[    3.331158] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[    3.338056] Call trace:
[    3.340569]  mv_xor_channel_add+0x4c/0xb28
[    3.344779]  mv_xor_probe+0x20c/0x4b8
[    3.348544]  platform_drv_probe+0x50/0xb0
[    3.352663]  really_probe+0x1fc/0x2c0
[    3.356427]  driver_probe_device+0x58/0x100
[    3.360727]  __driver_attach+0xd8/0xe0
[    3.364580]  bus_for_each_dev+0x68/0xc8
[    3.368522]  driver_attach+0x20/0x28
[    3.372196]  bus_add_driver+0x108/0x228
[    3.376139]  driver_register+0x60/0x110
[    3.380081]  __platform_driver_register+0x44/0x50
[    3.384923]  mv_xor_driver_init+0x18/0x20
[    3.389043]  do_one_initcall+0x58/0x170
[    3.392985]  kernel_init_freeable+0x190/0x234
[    3.397465]  kernel_init+0x10/0x108
[    3.401047]  ret_from_fork+0x10/0x1c
[    3.404723] Code: f90067a5 d2800005 52901802 aa1503e0 (f9003035)
[    3.411004] ---[ end trace 65be82a62724e328 ]---
[    3.415804] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.423626] SMP: stopping secondary CPUs
[    3.427661] Kernel Offset: disabled
[    3.431243] CPU features: 0x002,2000200c
[    3.435272] Memory Limit: none
[    3.438412] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

ffffff8010630440 <mv_xor_channel_add>:
ffffff8010630440:       a9b37bfd        stp     x29, x30, [sp, #-208]!
ffffff8010630444:       910003fd        mov     x29, sp
ffffff8010630448:       a9025bf5        stp     x21, x22, [sp, #32]
ffffff801063044c:       b00043f6        adrp    x22, ffffff8010ead000 <crypto_il_tab+0x940>
ffffff8010630450:       a90363f7        stp     x23, x24, [sp, #48]
ffffff8010630454:       aa0103f7        mov     x23, x1
ffffff8010630458:       a9046bf9        stp     x25, x26, [sp, #64]
ffffff801063045c:       d2800001        mov     x1, #0x0                        // #0
ffffff8010630460:       a90153f3        stp     x19, x20, [sp, #16]
ffffff8010630464:       910042f5        add     x21, x23, #0x10
ffffff8010630468:       a90573fb        stp     x27, x28, [sp, #80]
ffffff801063046c:       aa0003f4        mov     x20, x0
ffffff8010630470:       911b22c0        add     x0, x22, #0x6c8
ffffff8010630474:       2a0203f9        mov     w25, w2
ffffff8010630478:       f9400005        ldr     x5, [x0]
ffffff801063047c:       f90067a5        str     x5, [x29, #200]
ffffff8010630480:       d2800005        mov     x5, #0x0                        // #0
ffffff8010630484:       52901802        mov     w2, #0x80c0                     // #32960
ffffff8010630488:       aa1503e0        mov     x0, x21
ffffff801063048c:       f9003035        str     x21, [x1, #96]
ffffff8010630490:       72a00c02        movk    w2, #0x60, lsl #16
ffffff8010630494:       d2806a01        mov     x1, #0x350                     
...

Dave

-- 
John David Anglin  dave.anglin@bell.net
John David Anglin - Feb. 15, 2019, 4:05 p.m.
On 2019-02-14 12:58 p.m., Robin Murphy wrote:
> Hmm, having felt brave enough to take a closer look, it might actually be as simple as this - Dave, are you able to give the diff below a spin?
> 
> Robin.
> 
> ----->8-----
> diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
> index 7f595355fb79..fe4a7c71fede 100644
> --- a/drivers/dma/mv_xor.c
> +++ b/drivers/dma/mv_xor.c
> @@ -1059,6 +1059,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
>          mv_chan->op_in_desc = XOR_MODE_IN_DESC;
> 
>      dma_dev = &mv_chan->dmadev;
> +    dma_dev->dev = &pdev->dev;
>      mv_chan->xordev = xordev;
> 
>      /*
> @@ -1091,7 +1092,6 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
>      dma_dev->device_free_chan_resources = mv_xor_free_chan_resources;
>      dma_dev->device_tx_status = mv_xor_status;
>      dma_dev->device_issue_pending = mv_xor_issue_pending;
> -    dma_dev->dev = &pdev->dev;
> 
>      /* set prep routines based on capability */
>      if (dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask))

The patch is fine and it fixes the boot failure.

I misapplied it in previous test.

Thanks,
Dave

Patch

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 7f595355fb79..fe4a7c71fede 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -1059,6 +1059,7 @@  mv_xor_channel_add(struct mv_xor_device *xordev,
  		mv_chan->op_in_desc = XOR_MODE_IN_DESC;

  	dma_dev = &mv_chan->dmadev;
+	dma_dev->dev = &pdev->dev;
  	mv_chan->xordev = xordev;

  	/*
@@ -1091,7 +1092,6 @@  mv_xor_channel_add(struct mv_xor_device *xordev,
  	dma_dev->device_free_chan_resources = mv_xor_free_chan_resources;
  	dma_dev->device_tx_status = mv_xor_status;
  	dma_dev->device_issue_pending = mv_xor_issue_pending;
-	dma_dev->dev = &pdev->dev;

  	/* set prep routines based on capability */
  	if (dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask))