Patchwork usb: uas: fix usb subsystem hang after power off hub port

Submitter Alan Stern
Date April 9, 2019, 4:45 p.m.
Message ID <Pine.LNX.4.44L0.1904091232370.1599-100000@iolanthe.rowland.org>
Permalink /patch/768817/
State New

Comments

Alan Stern - April 9, 2019, 4:45 p.m.
On Tue, 9 Apr 2019, Bart Van Assche wrote:

> On Tue, 2019-04-09 at 10:44 -0400, Alan Stern wrote:
> > On Mon, 8 Apr 2019, Martin K. Petersen wrote:
> > 
> > > 
> > > Alan,
> > > 
> > > > So it looks as though the SCSI subsystem doesn't like to have a reset 
> > > > handler call scsi_remove_host.
> > > 
> > > Are you talking about a PCI device removal handler or a SCSI error
> > > handler?
> > 
> > The context of this discussion is a USB mass-storage device where the
> > device's port on its upstream hub has been powered off.  The
> > powered-off port causes an executing command to time out.  As a result
> > the SCSI error handler runs and calls the USB reset routine, but the
> > reset fails because the kernel is unable to communicate with the device
> > through the powered-off port.  This causes the USB reset routine to
> > unbind the device from its USB driver, which in turn calls
> > scsi_remove_host -- while the error handler is still running.
> 
> From which context does that unbind happen? From inside a SCSI EH callback
> or from the context of a workqueue? I think the former is not allowed but
> that the latter is allowed. The SRP initiator driver (ib_srp.c) follows the
> latter approach. See also srp_queue_remove_work().

The unbind happens from inside the SCSI EH callback.  If that really is
not allowed, we'll need to change it.  Or we can just change it
regardless, since the effort required is pretty small.

Kento, please try the patch below.  Does it help with your problem?

Alan Stern
Kento.A.Kobayashi@sony.com - April 15, 2019, 12:27 a.m.
Hi

>The unbind happens from inside the SCSI EH callback.  If that really is not allowed, we'll need to change it.  Or we can just change it regardless, since the effort required is pretty small.
>
>Kento, please try the patch below.  Does it help with your problem?

Thank you for the suggestion about this problem.
I confirmed that your patch fixes this problem.
I think you changed the policy so that the error handler no longer calls unbind, right?
In addition, I have a question about this patch.
Could you please tell me why the unbind should not be allowed to occur from the EH callback?

This patch will ignore all errors returned from usb_reset_and_verify_device().
But my patch ignores the error only in the -ENODEV case.
I think the side effects of your patch are larger than those of my patch.
So I want to know why the unbind should not be allowed to occur from the EH callback.

Regards,
Kento Kobayashi
Alan Stern - April 15, 2019, 3:18 p.m.
On Mon, 15 Apr 2019 Kento.A.Kobayashi@sony.com wrote:

> Hi
> 
> >The unbind happens from inside the SCSI EH callback.  If that really is not allowed, we'll need to change it.  Or we can just change it regardless, since the effort required is pretty small.
> >
> >Kento, please try the patch below.  Does it help with your problem?
> 
> Thank you for the suggestion about this problem.
> I confirmed that your patch fixes this problem.

Good; I will submit it.

> I think you changed the policy so that the error handler no longer calls unbind, right?

That's right.

> In addition, I have a question about this patch.
> Could you please tell me why the unbind should not be allowed to occur from the EH callback?

The SCSI core does not handle unbind properly when it occurs inside an 
error handler callback.  If you want to know more about why the SCSI 
core behaves this way, you should ask the SCSI developers -- I don't 
know the answer.

> This patch will ignore all errors returned from usb_reset_and_verify_device().
> But my patch ignores the error only in the -ENODEV case.
> I think the side effects of your patch are larger than those of my patch.

The only other errors that usb_reset_and_verify_device() can return are 
EINVAL and EISDIR.  These error codes occur in three situations:

	The device has already been disconnected;

	The device has no parent hub (it is a root hub);

	The device is currently suspended.

The first situation is just as bad as -ENODEV.  The second cannot
happen for a USB mass storage device.  The third can happen only if an
error occurs when usb_reset_device() tries to carry out a resume, and
that's also just as bad as -ENODEV.

So although the side effects are larger than with your patch, they are 
not any worse.  Furthermore, they handle correctly some situations that 
your patch does not handle.
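
As a rough illustration of the comparison above, here is a user-space sketch of the two policies.  It is not kernel code: both helper names are hypothetical, and the only thing taken from the discussion is the set of error codes (-ENODEV, -EINVAL, -EISDIR) that the reset can return.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helpers, not kernel functions.  Each one answers:
 * "given this return value from the reset, do we skip the unbind
 * in the caller's context?"
 */

/* Policy of the narrower patch: skip only when the device is gone. */
static bool skip_unbind_enodev_only(int ret)
{
	return ret == -ENODEV;
}

/* Policy of the patch under discussion: skip on any failure. */
static bool skip_unbind_any_error(int ret)
{
	return ret != 0;
}
```

The two policies agree for a successful reset and for -ENODEV; they differ only for -EINVAL and -EISDIR, which are the cases argued above to be either just as bad as -ENODEV or impossible for a mass-storage device.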

> So I want to know why the unbind should not be allowed to occur from the EH callback.

Ask the SCSI developers.

Alan Stern
Alan Stern - April 15, 2019, 3:32 p.m.
On Mon, 15 Apr 2019, Alan Stern wrote:

> On Mon, 15 Apr 2019 Kento.A.Kobayashi@sony.com wrote:
> 
> > Hi
> > 
> > >The unbind happens from inside the SCSI EH callback.  If that really is not allowed, we'll need to change it.  Or we can just change it regardless, since the effort required is pretty small.
> > >
> > >Kento, please try the patch below.  Does it help with your problem?
> > 
> > Thank you for the suggestion about this problem.
> > I confirmed that your patch fixes this problem.
> 
> Good; I will submit it.

I forgot to ask: Is it okay to add

	Tested-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com>

along with the patch?

Alan Stern
Kento.A.Kobayashi@sony.com - April 16, 2019, 2:31 a.m.
Hi,

>On Mon, 15 Apr 2019, Alan Stern wrote:
>
>> On Mon, 15 Apr 2019 Kento.A.Kobayashi@sony.com wrote:
>> 
>> > Hi
>> > 
>> > >The unbind happens from inside the SCSI EH callback.  If that really is not allowed, we'll need to change it.  Or we can just change it regardless, since the effort required is pretty small.
>> > >
>> > >Kento, please try the patch below.  Does it help with your problem?
>> > 
>> > Thank you for the suggestion about this problem.
>> > I confirmed that your patch fixes this problem.
>> 
>> Good; I will submit it.

Thank you for submitting the fix for this problem.

>
>I forgot to ask: Is it okay to add
>
>	Tested-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com>
>
>along with the patch?

Sure.
In addition, if possible, could you please also credit a colleague who contributed to fixing this problem?
Cao, Jacky <Jacky.Cao@sony.com>

Regards,
Kento Kobayashi

Patch

Index: usb-4.x/drivers/usb/core/hub.c
===================================================================
--- usb-4.x.orig/drivers/usb/core/hub.c
+++ usb-4.x/drivers/usb/core/hub.c
@@ -5902,7 +5902,9 @@  int usb_reset_device(struct usb_device *
 					cintf->needs_binding = 1;
 			}
 		}
-		usb_unbind_and_rebind_marked_interfaces(udev);
+		/* If the reset failed, hub_wq will unbind drivers later */
+		if (ret == 0)
+			usb_unbind_and_rebind_marked_interfaces(udev);
 	}
 
 	usb_autosuspend_device(udev);
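
The control flow that the patch introduces can be modelled in user space as follows.  This is only a sketch under stated assumptions: struct fake_udev, fake_reset_device() and fake_hub_wq_pass() are hypothetical stand-ins invented for illustration, and the "later unbind by hub_wq" is reduced to a single flag rather than the real hub worker logic.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct usb_device; not the kernel's type. */
struct fake_udev {
	int  reset_result;      /* value the simulated reset will return */
	bool rebound_by_caller; /* unbind/rebind ran in the resetting thread */
	bool unbound_by_hub_wq; /* unbind ran later, in the simulated hub worker */
};

/* Models the patched tail of usb_reset_device(). */
static int fake_reset_device(struct fake_udev *udev)
{
	int ret = udev->reset_result;

	/* Only touch driver bindings when the reset worked. */
	if (ret == 0)
		udev->rebound_by_caller = true;
	return ret;
}

/* Models the later pass in which the hub worker notices the dead device. */
static void fake_hub_wq_pass(struct fake_udev *udev)
{
	if (udev->reset_result != 0)
		udev->unbound_by_hub_wq = true;
}
```

With a failing reset, the caller (in the thread's scenario, the SCSI error handler) returns without any driver being unbound in its own context, and the unbind happens only in the separate worker pass.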