Patchwork iommu/vt-d: Handle hotplug devices' default identity mapping setting

login
register
mail settings
Submitter Lu Baolu
Date Feb. 22, 2019, 5:30 a.m.
Message ID <fe6dcb4e-03c4-0bbf-c51f-5964b20593dd@linux.intel.com>
Download mbox | patch
Permalink /patch/732885/
State New
Headers show

Comments

Lu Baolu - Feb. 22, 2019, 5:30 a.m.
Hi

On 2/22/19 1:06 PM, Lu Baolu wrote:

> Do you mind checking this?
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6ecdcf8fc8c0..f62f30bc1339 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2632,6 +2632,9 @@ static struct dmar_domain 
> *find_or_alloc_domain(struct device *dev, int gaw)
>                          goto out;
>          }
> 
> +       if (!iommu_should_identity_map(dev, 0))
> +               return si_domain;
> +
>          /* Allocate and initialize new domain for the device */
>          domain = alloc_domain(0);
>          if (!domain)

Oops! This can't be compiled. Please try below instead if you'd like to.
Sorry about it.

         if (!domain)

Best regards,
Lu Baolu
Benjamin Serebrin via iommu - Feb. 22, 2019, 6:38 a.m.
Baolu:

Thanks for your comments and your patch. Please find below our responses to
each of your comments:

> What does "I/O operation won't work" exactly mean here? Do you see any
> IOMMU fault message? Or, something doesn't work as expected?

Yes, DMAR fault messages as following came out:
[  354.939896] DMAR: DMAR:[DMA Read] Request device [03:00.1]fault addr
1fdfe80000
[  354.939896] DMAR:[fault reason 02] Present bit in context entry is clear


> Do you mind checking this?
>
> index 6ecdcf8fc8c0..f62f30bc1339 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2632,6 +2632,9 @@ static struct dmar_domain
> *find_or_alloc_domain(struct device *dev, int gaw)
>                          goto out;
>          }
>
> +       if (!iommu_should_identity_map(dev, 0))
> +               return si_domain;
> +
>          /* Allocate and initialize new domain for the device */
>          domain = alloc_domain(0);
>          if (!domain)

Tried this patch, and the same DMAR fault message came out.

Guess it is because of the iommu code path for hotplug devices. If a hotplug
device is rescanned after removal, iommu_bus_notifier will be called as part
of the notifier chains to handle BUS_NOTIFY_ADD_DEVICE event. Along the code
path, intel_iommu_ops->add_device() created an iommu group for this hotplug
device, but failed to create an iommu domain because of the default domain
type IOMMU_DOMAIN_IDENTITY imposed by current IOMMU command line option got
declined by intel_iommu_ops->domain_alloc().

In your patch, function find_or_alloc_domain() is not even in the code path
of BUS_NOTIFY_ADD_DEVICE event notifier chain.

Please let us know if your have more concerns and suggestions.

Best Regards,
James

On Thu, Feb 21, 2019 at 9:35 PM Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi
>
> On 2/22/19 1:06 PM, Lu Baolu wrote:
>
> > Do you mind checking this?
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 6ecdcf8fc8c0..f62f30bc1339 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -2632,6 +2632,9 @@ static struct dmar_domain
> > *find_or_alloc_domain(struct device *dev, int gaw)
> >                          goto out;
> >          }
> >
> > +       if (!iommu_should_identity_map(dev, 0))
> > +               return si_domain;
> > +
> >          /* Allocate and initialize new domain for the device */
> >          domain = alloc_domain(0);
> >          if (!domain)
>
> Oops! This can't be compiled. Please try below instead if you'd like to.
> Sorry about it.
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 6ecdcf8fc8c0..b89f5ba6a3c8 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -174,6 +174,8 @@ static inline unsigned long virt_to_dma_pfn(void *p)
>          return page_to_dma_pfn(virt_to_page(p));
>   }
>
> +static int iommu_should_identity_map(struct device *dev, int startup);
> +
>   /* global iommu list, set NULL for ignored DMAR units */
>   static struct intel_iommu **g_iommus;
>
> @@ -2632,6 +2634,9 @@ static struct dmar_domain
> *find_or_alloc_domain(struct device *dev, int gaw)
>                          goto out;
>          }
>
> +       if (iommu_should_identity_map(dev, 0))
> +               return si_domain;
> +
>          /* Allocate and initialize new domain for the device */
>          domain = alloc_domain(0);
>          if (!domain)
>
> Best regards,
> Lu Baolu
>
>
>
Benjamin Serebrin via iommu - Feb. 22, 2019, 7:36 a.m.
Baolu:

Sorry that my last reply email seems not text format. Resend it now.

Thanks for your comments and your patch. Please find below our responses to
each of your comments:

> What does "I/O operation won't work" exactly mean here? Do you see any
> IOMMU fault message? Or, something doesn't work as expected?

Yes, DMAR fault messages as following came out:
[  354.939896] DMAR: DMAR:[DMA Read] Request device [03:00.1]fault addr 1fdfe80000 
[  354.939896] DMAR:[fault reason 02] Present bit in context entry is clear


> Do you mind checking this?
>
> index 6ecdcf8fc8c0..f62f30bc1339 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2632,6 +2632,9 @@ static struct dmar_domain 
> *find_or_alloc_domain(struct device *dev, int gaw)
>                          goto out;
>          }
> 
> +       if (!iommu_should_identity_map(dev, 0))
> +               return si_domain;
> +
>          /* Allocate and initialize new domain for the device */
>          domain = alloc_domain(0);
>          if (!domain)

Tried this patch, and the same DMAR fault message came out.

Guess it is because of the iommu code path for hotplug devices. If a hotplug
device is rescanned after removal, iommu_bus_notifier will be called as part
of the notifier chains to handle BUS_NOTIFY_ADD_DEVICE event. Along the code
path, intel_iommu_ops->add_device() created an iommu group for this hotplug
device, but failed to create an iommu domain because of the default domain
type IOMMU_DOMAIN_IDENTITY imposed by current IOMMU command line option got
declined by intel_iommu_ops->domain_alloc().

Since si_domain is type of "struct dmar_domain", which is platform dependent,
it is hard to make this change in intel_iommu_ops->domain_alloc().

In your patch, function find_or_alloc_domain() is not in the code path of
BUS_NOTIFY_ADD_DEVICE event notifier chain.

Please let us know if your have more concerns and suggestions.

Best Regards,
James
Lu Baolu - Feb. 22, 2019, 7:42 a.m.
Hi James,

On 2/22/19 2:38 PM, James Dong wrote:
> 
> Tried this patch, and the same DMAR fault message came out.
> 
> Guess it is because of the iommu code path for hotplug devices. If a hotplug
> device is rescanned after removal, iommu_bus_notifier will be called as part
> of the notifier chains to handle BUS_NOTIFY_ADD_DEVICE event. Along the code
> path, intel_iommu_ops->add_device() created an iommu group for this hotplug
> device, but failed to create an iommu domain because of the default domain
> type IOMMU_DOMAIN_IDENTITY imposed by current IOMMU command line option got
> declined by intel_iommu_ops->domain_alloc().
> 
> In your patch, function find_or_alloc_domain() is not even in the code path
> of BUS_NOTIFY_ADD_DEVICE event notifier chain.
> 
> Please let us know if your have more concerns and suggestions.

Can I reproduce this with a local machine? If so, how should I do?

> 
> Best Regards,
> James

Best regards,
Lu Baolu
Benjamin Serebrin via iommu - Feb. 22, 2019, 8:28 a.m.
Baolu:

The reproduction depends on devices. HW passthrough PCIe devices with default
identity map could have the issue. Make sure that messages like following came
out in dmesg and their mapping does not change after booting:
  [   10.167809] DMAR: Hardware identity mapping for device 0000:30:00.0
  [   10.167823] DMAR: Hardware identity mapping for device 0000:30:00.1

Devices which make following true could also be used for the experiment:
  > static int iommu_should_identity_map(struct device *dev, int startup)
  > {
  >		if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev))
  >			return 1;
  >		if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
  >			return 1;

Once they are up, remove them first by following command:
  echo 1 > /sys/bus/pci/devices/0000\:03\:00.1/remove

Then trigger the hotplug device rescanning:
  echo 1 > /sys/bus/pci/rescan

To provide an example of specific devices on the market, I need to try out.
Or, if it is fine with you, forcing a PCIe NIC card to be default hardware
passthrough by changing the intel-iommu.c is another easy way to reproduce
this issue.

Best Regards,
James
Benjamin Serebrin via iommu - Feb. 22, 2019, 6:07 p.m.
An NVMe device configured in static identity mapping should also cause
this error when removed and rescanned. Essentially, a device that does
a DMA when its driver inits, or one that you can force a DMA from.



On Fri, Feb 22, 2019 at 12:28 AM James Dong <xmdong@google.com> wrote:

> Baolu:
>
> The reproduction depends on devices. HW passthrough PCIe devices with
> default
> identity map could have the issue. Make sure that messages like following
> came
> out in dmesg and their mapping does not change after booting:
>   [   10.167809] DMAR: Hardware identity mapping for device 0000:30:00.0
>   [   10.167823] DMAR: Hardware identity mapping for device 0000:30:00.1
>
> Devices which make following true could also be used for the experiment:
>   > static int iommu_should_identity_map(struct device *dev, int startup)
>   > {
>   >             if ((iommu_identity_mapping & IDENTMAP_AZALIA) &&
> IS_AZALIA(pdev))
>   >                     return 1;
>   >             if ((iommu_identity_mapping & IDENTMAP_GFX) &&
> IS_GFX_DEVICE(pdev))
>   >                     return 1;
>
> Once they are up, remove them first by following command:
>   echo 1 > /sys/bus/pci/devices/0000\:03\:00.1/remove
>
> Then trigger the hotplug device rescanning:
>   echo 1 > /sys/bus/pci/rescan
>
> To provide an example of specific devices on the market, I need to try out.
> Or, if it is fine with you, forcing a PCIe NIC card to be default hardware
> passthrough by changing the intel-iommu.c is another easy way to reproduce
> this issue.
>
> Best Regards,
> James
>
Benjamin Serebrin via iommu - Feb. 22, 2019, 6:10 p.m.
An NVMe device configured in static identity mapping should also cause
this error when removed and rescanned. Essentially, a device that does
a DMA when its driver inits, or one that you can force a DMA from.

-Jis


On Fri, Feb 22, 2019 at 12:28 AM James Dong <xmdong@google.com> wrote:
>
> Baolu:
>
> The reproduction depends on devices. HW passthrough PCIe devices with default
> identity map could have the issue. Make sure that messages like following came
> out in dmesg and their mapping does not change after booting:
>   [   10.167809] DMAR: Hardware identity mapping for device 0000:30:00.0
>   [   10.167823] DMAR: Hardware identity mapping for device 0000:30:00.1
>
> Devices which make following true could also be used for the experiment:
>   > static int iommu_should_identity_map(struct device *dev, int startup)
>   > {
>   >             if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev))
>   >                     return 1;
>   >             if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
>   >                     return 1;
>
> Once they are up, remove them first by following command:
>   echo 1 > /sys/bus/pci/devices/0000\:03\:00.1/remove
>
> Then trigger the hotplug device rescanning:
>   echo 1 > /sys/bus/pci/rescan
>
> To provide an example of specific devices on the market, I need to try out.
> Or, if it is fine with you, forcing a PCIe NIC card to be default hardware
> passthrough by changing the intel-iommu.c is another easy way to reproduce
> this issue.
>
> Best Regards,
> James

Patch

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6ecdcf8fc8c0..b89f5ba6a3c8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -174,6 +174,8 @@  static inline unsigned long virt_to_dma_pfn(void *p)
         return page_to_dma_pfn(virt_to_page(p));
  }

+static int iommu_should_identity_map(struct device *dev, int startup);
+
  /* global iommu list, set NULL for ignored DMAR units */
  static struct intel_iommu **g_iommus;

@@ -2632,6 +2634,9 @@  static struct dmar_domain 
*find_or_alloc_domain(struct device *dev, int gaw)
                         goto out;
         }

+       if (iommu_should_identity_map(dev, 0))
+               return si_domain;
+
         /* Allocate and initialize new domain for the device */
         domain = alloc_domain(0);