Patchwork [PATCHv6,07/10] acpi/hmat: Register processor domain to its memory

login
register
mail settings
Submitter Keith Busch
Date Feb. 14, 2019, 5:10 p.m.
Message ID <20190214171017.9362-8-keith.busch@intel.com>
Download mbox | patch
Permalink /patch/726737/
State New
Headers show

Comments

Keith Busch - Feb. 14, 2019, 5:10 p.m.
If the HMAT Subsystem Address Range provides a valid processor proximity
domain for a memory domain, or a processor domain matches the performance
access of the valid processor proximity domain, register the memory
target with that initiator so this relationship will be visible under
the node's sysfs directory.

By registering only the best performing relationships, this provides the
most useful information applications may want to know when considering
which CPU they should run on for a given memory node, or which memory
node they should allocate memory from for a given CPU.

Since HMAT requires valid address ranges have an equivalent SRAT entry,
verify each memory target satisfies this requirement.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/acpi/hmat/Kconfig |   1 +
 drivers/acpi/hmat/hmat.c  | 396 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 396 insertions(+), 1 deletion(-)
Rafael J. Wysocki - Feb. 20, 2019, 10:02 p.m.
On Thu, Feb 14, 2019 at 6:10 PM Keith Busch <keith.busch@intel.com> wrote:
>
> If the HMAT Subsystem Address Range provides a valid processor proximity
> domain for a memory domain, or a processor domain matches the performance
> access of the valid processor proximity domain, register the memory
> target with that initiator so this relationship will be visible under
> the node's sysfs directory.
>
> By registering only the best performing relationships, this provides the
> most useful information applications may want to know when considering
> which CPU they should run on for a given memory node, or which memory
> node they should allocate memory from for a given CPU.
>
> Since HMAT requires valid address ranges have an equivalent SRAT entry,
> verify each memory target satisfies this requirement.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/acpi/hmat/Kconfig |   1 +
>  drivers/acpi/hmat/hmat.c  | 396 +++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 396 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> index c9637e2e7514..08e972ead159 100644
> --- a/drivers/acpi/hmat/Kconfig
> +++ b/drivers/acpi/hmat/Kconfig
> @@ -2,6 +2,7 @@
>  config ACPI_HMAT
>         bool "ACPI Heterogeneous Memory Attribute Table Support"
>         depends on ACPI_NUMA
> +       select HMEM_REPORTING

If you want to do this here, I'm not sure that defining HMEM_REPORTING
as a user-selectable option is a good idea.  In particular, I don't
really think that setting ACPI_HMAT without it makes a lot of sense.
Apart from this, the patch looks reasonable to me.
Dave Hansen - Feb. 20, 2019, 10:11 p.m.
On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
>> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
>> index c9637e2e7514..08e972ead159 100644
>> --- a/drivers/acpi/hmat/Kconfig
>> +++ b/drivers/acpi/hmat/Kconfig
>> @@ -2,6 +2,7 @@
>>  config ACPI_HMAT
>>         bool "ACPI Heterogeneous Memory Attribute Table Support"
>>         depends on ACPI_NUMA
>> +       select HMEM_REPORTING
> If you want to do this here, I'm not sure that defining HMEM_REPORTING
> as a user-selectable option is a good idea.  In particular, I don't
> really think that setting ACPI_HMAT without it makes a lot of sense.
> Apart from this, the patch looks reasonable to me.

I guess the question is whether we would want to allow folks to consume
the HMAT inside the kernel while not reporting it out via
HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
mitigations for memory-side caches.

It's certainly possible that folks would want to consume those
mitigations without anything in sysfs.  They might not even want or need
NUMA support itself, for instance.

So, what should we do?

config HMEM_REPORTING
	bool # no user-visible prompt
	default y if ACPI_HMAT

So folks can override in their .config, but they don't see a prompt?
Dan Williams - Feb. 20, 2019, 10:13 p.m.
On Wed, Feb 20, 2019 at 2:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> >> index c9637e2e7514..08e972ead159 100644
> >> --- a/drivers/acpi/hmat/Kconfig
> >> +++ b/drivers/acpi/hmat/Kconfig
> >> @@ -2,6 +2,7 @@
> >>  config ACPI_HMAT
> >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> >>         depends on ACPI_NUMA
> >> +       select HMEM_REPORTING
> > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > as a user-selectable option is a good idea.  In particular, I don't
> > really think that setting ACPI_HMAT without it makes a lot of sense.
> > Apart from this, the patch looks reasonable to me.
>
> I guess the question is whether we would want to allow folks to consume
> the HMAT inside the kernel while not reporting it out via
> HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> mitigations for memory-side caches.
>
> It's certainly possible that folks would want to consume those
> mitigations without anything in sysfs.  They might not even want or need
> NUMA support itself, for instance.
>
> So, what should we do?
>
> config HMEM_REPORTING
>         bool # no user-visible prompt
>         default y if ACPI_HMAT
>
> So folks can override in their .config, but they don't see a prompt?

I would add an "&& ACPI_NUMA" to that default as well.
Rafael J. Wysocki - Feb. 20, 2019, 10:16 p.m.
On Wed, Feb 20, 2019 at 11:14 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Feb 20, 2019 at 2:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> > >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> > >> index c9637e2e7514..08e972ead159 100644
> > >> --- a/drivers/acpi/hmat/Kconfig
> > >> +++ b/drivers/acpi/hmat/Kconfig
> > >> @@ -2,6 +2,7 @@
> > >>  config ACPI_HMAT
> > >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > >>         depends on ACPI_NUMA
> > >> +       select HMEM_REPORTING
> > > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > > as a user-selectable option is a good idea.  In particular, I don't
> > > really think that setting ACPI_HMAT without it makes a lot of sense.
> > > Apart from this, the patch looks reasonable to me.
> >
> > I guess the question is whether we would want to allow folks to consume
> > the HMAT inside the kernel while not reporting it out via
> > HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> > mitigations for memory-side caches.
> >
> > It's certainly possible that folks would want to consume those
> > mitigations without anything in sysfs.  They might not even want or need
> > NUMA support itself, for instance.
> >
> > So, what should we do?
> >
> > config HMEM_REPORTING
> >         bool # no user-visible prompt
> >         default y if ACPI_HMAT
> >
> > So folks can override in their .config, but they don't see a prompt?
>
> I would add an "&& ACPI_NUMA" to that default as well.

But ACPI_HMAT depends on ACPI_NUMA already, or am I missing anything?
Dan Williams - Feb. 20, 2019, 10:20 p.m.
On Wed, Feb 20, 2019 at 2:17 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Feb 20, 2019 at 11:14 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Wed, Feb 20, 2019 at 2:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > >
> > > On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> > > >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> > > >> index c9637e2e7514..08e972ead159 100644
> > > >> --- a/drivers/acpi/hmat/Kconfig
> > > >> +++ b/drivers/acpi/hmat/Kconfig
> > > >> @@ -2,6 +2,7 @@
> > > >>  config ACPI_HMAT
> > > >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > > >>         depends on ACPI_NUMA
> > > >> +       select HMEM_REPORTING
> > > > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > > > as a user-selectable option is a good idea.  In particular, I don't
> > > > really think that setting ACPI_HMAT without it makes a lot of sense.
> > > > Apart from this, the patch looks reasonable to me.
> > >
> > > I guess the question is whether we would want to allow folks to consume
> > > the HMAT inside the kernel while not reporting it out via
> > > HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> > > mitigations for memory-side caches.
> > >
> > > It's certainly possible that folks would want to consume those
> > > mitigations without anything in sysfs.  They might not even want or need
> > > NUMA support itself, for instance.
> > >
> > > So, what should we do?
> > >
> > > config HMEM_REPORTING
> > >         bool # no user-visible prompt
> > >         default y if ACPI_HMAT
> > >
> > > So folks can override in their .config, but they don't see a prompt?
> >
> > I would add an "&& ACPI_NUMA" to that default as well.
>
> But ACPI_HMAT depends on ACPI_NUMA already, or am I missing anything?

Oh, my mistake, sorry.
Rafael J. Wysocki - Feb. 20, 2019, 10:21 p.m.
On Wed, Feb 20, 2019 at 11:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> >> index c9637e2e7514..08e972ead159 100644
> >> --- a/drivers/acpi/hmat/Kconfig
> >> +++ b/drivers/acpi/hmat/Kconfig
> >> @@ -2,6 +2,7 @@
> >>  config ACPI_HMAT
> >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> >>         depends on ACPI_NUMA
> >> +       select HMEM_REPORTING
> > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > as a user-selectable option is a good idea.  In particular, I don't
> > really think that setting ACPI_HMAT without it makes a lot of sense.
> > Apart from this, the patch looks reasonable to me.
>
> I guess the question is whether we would want to allow folks to consume
> the HMAT inside the kernel while not reporting it out via
> HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> mitigations for memory-side caches.
>
> It's certainly possible that folks would want to consume those
> mitigations without anything in sysfs.  They might not even want or need
> NUMA support itself, for instance.
>
> So, what should we do?
>
> config HMEM_REPORTING
>         bool # no user-visible prompt
>         default y if ACPI_HMAT
>
> So folks can override in their .config, but they don't see a prompt?

Maybe it would be better to make HMEM_REPORTING do "select ACPI_HMAT if ACPI".

The mitigations could then do that too if they depend on HMAT and
ACPI_HMAT need not be user-visible at all.
Keith Busch - Feb. 20, 2019, 10:44 p.m.
On Wed, Feb 20, 2019 at 11:21:45PM +0100, Rafael J. Wysocki wrote:
> On Wed, Feb 20, 2019 at 11:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> > >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> > >> index c9637e2e7514..08e972ead159 100644
> > >> --- a/drivers/acpi/hmat/Kconfig
> > >> +++ b/drivers/acpi/hmat/Kconfig
> > >> @@ -2,6 +2,7 @@
> > >>  config ACPI_HMAT
> > >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > >>         depends on ACPI_NUMA
> > >> +       select HMEM_REPORTING
> > > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > > as a user-selectable option is a good idea.  In particular, I don't
> > > really think that setting ACPI_HMAT without it makes a lot of sense.
> > > Apart from this, the patch looks reasonable to me.
> >
> > I guess the question is whether we would want to allow folks to consume
> > the HMAT inside the kernel while not reporting it out via
> > HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> > mitigations for memory-side caches.
> >
> > It's certainly possible that folks would want to consume those
> > mitigations without anything in sysfs.  They might not even want or need
> > NUMA support itself, for instance.
> >
> > So, what should we do?
> >
> > config HMEM_REPORTING
> >         bool # no user-visible prompt
> >         default y if ACPI_HMAT
> >
> > So folks can override in their .config, but they don't see a prompt?
> 
> Maybe it would be better to make HMEM_REPORTING do "select ACPI_HMAT if ACPI".
> 
> The mitigations could then do that too if they depend on HMAT and
> ACPI_HMAT need not be user-visible at all.

That sounds okay, though it would create unreachable code if !ACPI since
that's the only user for the new reporting interfaces.
Rafael J. Wysocki - Feb. 20, 2019, 10:50 p.m.
On Wed, Feb 20, 2019 at 11:44 PM Keith Busch <keith.busch@intel.com> wrote:
>
> On Wed, Feb 20, 2019 at 11:21:45PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Feb 20, 2019 at 11:11 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > > On 2/20/19 2:02 PM, Rafael J. Wysocki wrote:
> > > >> diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
> > > >> index c9637e2e7514..08e972ead159 100644
> > > >> --- a/drivers/acpi/hmat/Kconfig
> > > >> +++ b/drivers/acpi/hmat/Kconfig
> > > >> @@ -2,6 +2,7 @@
> > > >>  config ACPI_HMAT
> > > >>         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > > >>         depends on ACPI_NUMA
> > > >> +       select HMEM_REPORTING
> > > > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > > > as a user-selectable option is a good idea.  In particular, I don't
> > > > really think that setting ACPI_HMAT without it makes a lot of sense.
> > > > Apart from this, the patch looks reasonable to me.
> > >
> > > I guess the question is whether we would want to allow folks to consume
> > > the HMAT inside the kernel while not reporting it out via
> > > HMEM_REPORTING.  We have some in-kernel users of the HMAT lined up like
> > > mitigations for memory-side caches.
> > >
> > > It's certainly possible that folks would want to consume those
> > > mitigations without anything in sysfs.  They might not even want or need
> > > NUMA support itself, for instance.
> > >
> > > So, what should we do?
> > >
> > > config HMEM_REPORTING
> > >         bool # no user-visible prompt
> > >         default y if ACPI_HMAT
> > >
> > > So folks can override in their .config, but they don't see a prompt?
> >
> > Maybe it would be better to make HMEM_REPORTING do "select ACPI_HMAT if ACPI".
> >
> > The mitigations could then do that too if they depend on HMAT and
> > ACPI_HMAT need not be user-visible at all.
>
> That sounds okay, though it would create unreachable code if !ACPI since
> that's the only user for the new reporting interfaces.

Until there are other users of it, you can make HMEM_REPORTING depend
on ACPI_NUMA and select ACPI_HMAT.
Keith Busch - Feb. 22, 2019, 6:48 p.m.
On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote:
> On Thu, Feb 14, 2019 at 6:10 PM Keith Busch <keith.busch@intel.com> wrote:
> >  config ACPI_HMAT
> >         bool "ACPI Heterogeneous Memory Attribute Table Support"
> >         depends on ACPI_NUMA
> > +       select HMEM_REPORTING
> 
> If you want to do this here, I'm not sure that defining HMEM_REPORTING
> as a user-selectable option is a good idea.  In particular, I don't
> really think that setting ACPI_HMAT without it makes a lot of sense.
> Apart from this, the patch looks reasonable to me.

I'm trying to implement based on the feedback, but I'm a little confused.

As I have it at the moment, HMEM_REPORTING is not user-prompted, so
another option needs to turn it on. I have ACPI_HMAT do that here.

So when you say it's a bad idea to make HMEM_REPORTING user selectable,
isn't it already not user selectable?

If I do it the other way around, that's going to make HMEM_REPORTING
complicated if a non-ACPI implementation wants to report HMEM
properties.
Dan Williams - Feb. 22, 2019, 7:21 p.m.
On Fri, Feb 22, 2019 at 10:48 AM Keith Busch <keith.busch@intel.com> wrote:
>
> On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Feb 14, 2019 at 6:10 PM Keith Busch <keith.busch@intel.com> wrote:
> > >  config ACPI_HMAT
> > >         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > >         depends on ACPI_NUMA
> > > +       select HMEM_REPORTING
> >
> > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > as a user-selectable option is a good idea.  In particular, I don't
> > really think that setting ACPI_HMAT without it makes a lot of sense.
> > Apart from this, the patch looks reasonable to me.
>
> I'm trying to implement based on the feedback, but I'm a little confused.
>
> As I have it at the moment, HMEM_REPORTING is not user-prompted, so
> another option needs to turn it on. I have ACPI_HMAT do that here.
>
> So when you say it's a bad idea to make HMEM_REPORTING user selectable,
> isn't it already not user selectable?
>
> If I do it the other way around, that's going to make HMEM_REPORTING
> complicated if a non-ACPI implementation wants to report HMEM
> properties.

Agree. If a platform supports these HMEM properties then they should
be reported. ACPI_HMAT is that opt-in for ACPI based platforms, and
other archs can do something similar. It's not clear that one would
ever want to opt-in to HMAT support and opt-out of reporting any of it
to userspace.
Rafael J. Wysocki - Feb. 24, 2019, 7:59 p.m.
On Fri, Feb 22, 2019 at 7:48 PM Keith Busch <keith.busch@intel.com> wrote:
>
> On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Feb 14, 2019 at 6:10 PM Keith Busch <keith.busch@intel.com> wrote:
> > >  config ACPI_HMAT
> > >         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > >         depends on ACPI_NUMA
> > > +       select HMEM_REPORTING
> >
> > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > as a user-selectable option is a good idea.  In particular, I don't
> > really think that setting ACPI_HMAT without it makes a lot of sense.
> > Apart from this, the patch looks reasonable to me.
>
> I'm trying to implement based on the feedback, but I'm a little confused.
>
> As I have it at the moment, HMEM_REPORTING is not user-prompted, so
> another option needs to turn it on. I have ACPI_HMAT do that here.
>
> So when you say it's a bad idea to make HMEM_REPORTING user selectable,
> isn't it already not user selectable?

I thought that HMEM_REPORTING was user-prompted initially, by bad if it wasn't.

> If I do it the other way around, that's going to make HMEM_REPORTING
> complicated if a non-ACPI implementation wants to report HMEM
> properties.

But the mitigations that Dave was talking about get in the way, don't they?

Say there is another Kconfig option,CACHE_MITIGATIONS, to enable them.
Then you want ACPI_HMAT to be set when that it set and you also want
ACPI_HMAT to be set when HMEM_REPORTING and ACPI_NUMA are both set.

OTOH, you may not want HMEM_REPORTING to be set when CACHE_MITIGATIONS
is set, but that causes ACPI_HMAT to be set and which means that
ACPI_HMAT alone will not be sufficient to determine the
HMEM_REPORTING value.

Now, if you prompt for HMEM_REPORTING and make it depend on ACPI_NUMA,
then ACPI_HMAT can be selected by that (regardless of the
CACHE_MITIGATIONS value).

And if someone wants to use HMEM_REPORTING without ACPI_NUMA, it can
be made depend on whatever new option is there for that non-ACPI
mechanism.

There might be a problem if someone wanted to enable the alternative
way of HMEM_REPORTING if ACPI_NUMA was set (in which case HMAT would
have to be ignored even if it was present), but in that case there
would need to be an explicit way to choose between HMAT and non-HMAT
anyway.

In any case, I prefer providers to be selected by consumers and not
the other way around, in case there are multiple consumers for one
provider.
Rafael J. Wysocki - Feb. 24, 2019, 8:07 p.m.
On Fri, Feb 22, 2019 at 8:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Feb 22, 2019 at 10:48 AM Keith Busch <keith.busch@intel.com> wrote:
> >
> > On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Feb 14, 2019 at 6:10 PM Keith Busch <keith.busch@intel.com> wrote:
> > > >  config ACPI_HMAT
> > > >         bool "ACPI Heterogeneous Memory Attribute Table Support"
> > > >         depends on ACPI_NUMA
> > > > +       select HMEM_REPORTING
> > >
> > > If you want to do this here, I'm not sure that defining HMEM_REPORTING
> > > as a user-selectable option is a good idea.  In particular, I don't
> > > really think that setting ACPI_HMAT without it makes a lot of sense.
> > > Apart from this, the patch looks reasonable to me.
> >
> > I'm trying to implement based on the feedback, but I'm a little confused.
> >
> > As I have it at the moment, HMEM_REPORTING is not user-prompted, so
> > another option needs to turn it on. I have ACPI_HMAT do that here.
> >
> > So when you say it's a bad idea to make HMEM_REPORTING user selectable,
> > isn't it already not user selectable?
> >
> > If I do it the other way around, that's going to make HMEM_REPORTING
> > complicated if a non-ACPI implementation wants to report HMEM
> > properties.
>
> Agree. If a platform supports these HMEM properties then they should
> be reported.

Well, I'm not sure if everybody is in agreement on that.

> ACPI_HMAT is that opt-in for ACPI based platforms, and
> other archs can do something similar. It's not clear that one would
> ever want to opt-in to HMAT support and opt-out of reporting any of it
> to userspace.

In my view, ACPI_HMAT need not be an opt-in in the first place.  The
only reason to avoid compiling HMAT parsing it would be if there were
no users of it in the kernel IMO.
Keith Busch - Feb. 25, 2019, 4:51 p.m.
On Sun, Feb 24, 2019 at 08:59:45PM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 22, 2019 at 7:48 PM Keith Busch <keith.busch@intel.com> wrote:
> > If I do it the other way around, that's going to make HMEM_REPORTING
> > complicated if a non-ACPI implementation wants to report HMEM
> > properties.
> 
> But the mitigations that Dave was talking about get in the way, don't they?
> 
> Say there is another Kconfig option,CACHE_MITIGATIONS, to enable them.
> Then you want ACPI_HMAT to be set when that it set and you also want
> ACPI_HMAT to be set when HMEM_REPORTING and ACPI_NUMA are both set.
> 
> OTOH, you may not want HMEM_REPORTING to be set when CACHE_MITIGATIONS
> is set, but that causes ACPI_HMAT to be set and which means that
> ACPI_HMAT alone will not be sufficient to determine the
> HMEM_REPORTING value.

I can't think of when we'd want to suppress reporting these attributes
to user space, but I can split HMAT enabling so it doesn't depend on
HMEM_REPORTING just in case there really is an in-kernel user that
definitely does not want the same attributes exported.

> Now, if you prompt for HMEM_REPORTING and make it depend on ACPI_NUMA,
> then ACPI_HMAT can be selected by that (regardless of the
> CACHE_MITIGATIONS value).
> 
> And if someone wants to use HMEM_REPORTING without ACPI_NUMA, it can
> be made depend on whatever new option is there for that non-ACPI
> mechanism.
> 
> There might be a problem if someone wanted to enable the alternative
> way of HMEM_REPORTING if ACPI_NUMA was set (in which case HMAT would
> have to be ignored even if it was present), but in that case there
> would need to be an explicit way to choose between HMAT and non-HMAT
> anyway.
> 
> In any case, I prefer providers to be selected by consumers and not
> the other way around, in case there are multiple consumers for one
> provider.

Well, the HMEM_REPORTING fundamentally has no dependency on any of these
things and I've put some effort into making this part provider agnostic.
I will change it if this concern is gating acceptance, but I don't
think it's as intuitive for generic interfaces to be the selector for
implementation specific providers.
Rafael J. Wysocki - Feb. 25, 2019, 10:30 p.m.
On Mon, Feb 25, 2019 at 5:51 PM Keith Busch <keith.busch@intel.com> wrote:
>
> On Sun, Feb 24, 2019 at 08:59:45PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Feb 22, 2019 at 7:48 PM Keith Busch <keith.busch@intel.com> wrote:
> > > If I do it the other way around, that's going to make HMEM_REPORTING
> > > complicated if a non-ACPI implementation wants to report HMEM
> > > properties.
> >
> > But the mitigations that Dave was talking about get in the way, don't they?
> >
> > Say there is another Kconfig option,CACHE_MITIGATIONS, to enable them.
> > Then you want ACPI_HMAT to be set when that it set and you also want
> > ACPI_HMAT to be set when HMEM_REPORTING and ACPI_NUMA are both set.
> >
> > OTOH, you may not want HMEM_REPORTING to be set when CACHE_MITIGATIONS
> > is set, but that causes ACPI_HMAT to be set and which means that
> > ACPI_HMAT alone will not be sufficient to determine the
> > HMEM_REPORTING value.
>
> I can't think of when we'd want to suppress reporting these attributes
> to user space, but I can split HMAT enabling so it doesn't depend on
> HMEM_REPORTING just in case there really is an in-kernel user that
> definitely does not want the same attributes exported.

I'd rather simplify HMAT enabling than make it more complicated, so
splitting it would be worse than what you have already IMO.

> > Now, if you prompt for HMEM_REPORTING and make it depend on ACPI_NUMA,
> > then ACPI_HMAT can be selected by that (regardless of the
> > CACHE_MITIGATIONS value).
> >
> > And if someone wants to use HMEM_REPORTING without ACPI_NUMA, it can
> > be made depend on whatever new option is there for that non-ACPI
> > mechanism.
> >
> > There might be a problem if someone wanted to enable the alternative
> > way of HMEM_REPORTING if ACPI_NUMA was set (in which case HMAT would
> > have to be ignored even if it was present), but in that case there
> > would need to be an explicit way to choose between HMAT and non-HMAT
> > anyway.
> >
> > In any case, I prefer providers to be selected by consumers and not
> > the other way around, in case there are multiple consumers for one
> > provider.
>
> Well, the HMEM_REPORTING fundamentally has no dependency on any of these
> things and I've put some effort into making this part provider agnostic.

Which I agree with.

> I will change it if this concern is gating acceptance, but I don't
> think it's as intuitive for generic interfaces to be the selector for
> implementation specific providers.

That is sort of a chicken-and-egg issue about what is more fundamental
that could be discussed forever. :-)

My original point was that if you regard ACPI_HMAT as the more
fundamental thing, then you should prompt for it and select
HMEM_REPORTING automatically from there.  Or the other way around if
you regard HMEM_REPORTING as more fundamental.  Prompting for both of
them may lead to issues.

As long as that is taken into account, I'm basically fine with any of
the two choices.
Brice Goglin - March 7, 2019, 11:49 a.m.
Le 14/02/2019 à 18:10, Keith Busch a écrit :
> If the HMAT Subsystem Address Range provides a valid processor proximity
> domain for a memory domain, or a processor domain matches the performance
> access of the valid processor proximity domain, register the memory
> target with that initiator so this relationship will be visible under
> the node's sysfs directory.
>
> By registering only the best performing relationships, this provides the
> most useful information applications may want to know when considering
> which CPU they should run on for a given memory node, or which memory
> node they should allocate memory from for a given CPU.
>
> Since HMAT requires valid address ranges have an equivalent SRAT entry,
> verify each memory target satisfies this requirement.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>

[...]

> +static __init struct memory_initiator *find_mem_initiator(unsigned int cpu_pxm)
> +{
> +	struct memory_initiator *intitator;
> +
> +	list_for_each_entry(intitator, &initiators, node)
> +		if (intitator->processor_pxm == cpu_pxm)
> +			return intitator;
> +	return NULL;
> +}

Typo intitator -> initiator

> +static __init void alloc_memory_initiator(unsigned int cpu_pxm)
> +{
> +	struct memory_initiator *intitator;
> +
> +	if (pxm_to_node(cpu_pxm) == NUMA_NO_NODE)
> +		return;
> +
> +	intitator = find_mem_initiator(cpu_pxm);
> +	if (intitator)
> +		return;
> +
> +	intitator = kzalloc(sizeof(*intitator), GFP_KERNEL);
> +	if (!intitator)
> +		return;
> +
> +	intitator->processor_pxm = cpu_pxm;
> +	list_add_tail(&intitator->node, &initiators);
> +}

Same typo


> +static __init void hmat_register_target_initiators(struct memory_target *target)
> +{
> +	static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
> +	struct memory_initiator *initiator;
> +	unsigned int mem_nid, cpu_nid;
> +	struct memory_locality *loc = NULL;
> +	u32 best = 0;
> +	int i;
> +
> +	if (target->processor_pxm == PXM_INVAL)
> +		return;


This test above looks wrong to me. First, it means that either you
return from here, or from the next branch below, hence the loop that
looks for best performance is dead code. Secondly, it means that memory
targets without a PXM never get an initiator.

I verified that removing this test makes things look better on my HMAT
tests.


> +	mem_nid = pxm_to_node(target->memory_pxm);
> +
> +	/*
> +	 * If the Address Range Structure provides a local processor pxm, link
> +	 * only that one. Otherwise, find the best performance attribtes and
> +	 * register all initiators that match.
> +	 */
> +	if (target->processor_pxm != PXM_INVAL) {
> +		cpu_nid = pxm_to_node(target->processor_pxm);
> +		register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
> +		return;
> +	}


This seems to contradict your first paragraph in the header where you say

"or a processor domain matches the performance access of the valid processor proximity domain".

By returning here, you're only keeping the the local PXM and ignoring
other initiators that may have the same performance.

I am testing a HMAT where one memory target has same performance to two
processor proxdomains. If no processor proxdomain is set in the HMAT for
this target, I get two initiators as expected. If proxdomain is
explicitly set in the HMAT, I get only that one as initiator.

Brice
Keith Busch - March 7, 2019, 3:19 p.m.
Hi Brice,

Please see v7 of this series from last week instead for reviews:

 https://patchwork.kernel.org/cover/10832365/

Patch

diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig
index c9637e2e7514..08e972ead159 100644
--- a/drivers/acpi/hmat/Kconfig
+++ b/drivers/acpi/hmat/Kconfig
@@ -2,6 +2,7 @@ 
 config ACPI_HMAT
 	bool "ACPI Heterogeneous Memory Attribute Table Support"
 	depends on ACPI_NUMA
+	select HMEM_REPORTING
 	help
 	 If set, this option causes the kernel to set the memory NUMA node
 	 relationships and access attributes in accordance with ACPI HMAT
diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 7a809f6a5119..b29f7160c7bb 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -13,11 +13,105 @@ 
 #include <linux/device.h>
 #include <linux/init.h>
 #include <linux/list.h>
+#include <linux/list_sort.h>
 #include <linux/node.h>
 #include <linux/sysfs.h>
 
 static __initdata u8 hmat_revision;
 
+static __initdata LIST_HEAD(targets);
+static __initdata LIST_HEAD(initiators);
+static __initdata LIST_HEAD(localities);
+
+/*
+ * The defined enum order is used to prioritize attributes selecting the best
+ * performing node.
+ */
+enum locality_types {
+	WRITE_LATENCY,
+	READ_LATENCY,
+	WRITE_BANDWIDTH,
+	READ_BANDWIDTH,
+};
+
+static struct memory_locality *localities_types[4];
+
+struct memory_target {
+	struct list_head node;
+	unsigned int memory_pxm;
+	unsigned int processor_pxm;
+	struct node_hmem_attrs hmem_attrs;
+};
+
+struct memory_initiator {
+	struct list_head node;
+	unsigned int processor_pxm;
+};
+
+struct memory_locality {
+	struct list_head node;
+	struct acpi_hmat_locality *hmat_loc;
+};
+
+static __init struct memory_initiator *find_mem_initiator(unsigned int cpu_pxm)
+{
+	struct memory_initiator *intitator;
+
+	list_for_each_entry(intitator, &initiators, node)
+		if (intitator->processor_pxm == cpu_pxm)
+			return intitator;
+	return NULL;
+}
+
+static __init struct memory_target *find_mem_target(unsigned int mem_pxm)
+{
+	struct memory_target *target;
+
+	list_for_each_entry(target, &targets, node)
+		if (target->memory_pxm == mem_pxm)
+			return target;
+	return NULL;
+}
+
+static __init void alloc_memory_initiator(unsigned int cpu_pxm)
+{
+	struct memory_initiator *intitator;
+
+	if (pxm_to_node(cpu_pxm) == NUMA_NO_NODE)
+		return;
+
+	intitator = find_mem_initiator(cpu_pxm);
+	if (intitator)
+		return;
+
+	intitator = kzalloc(sizeof(*intitator), GFP_KERNEL);
+	if (!intitator)
+		return;
+
+	intitator->processor_pxm = cpu_pxm;
+	list_add_tail(&intitator->node, &initiators);
+}
+
+static __init void alloc_memory_target(unsigned int mem_pxm)
+{
+	struct memory_target *target;
+
+	if (pxm_to_node(mem_pxm) == NUMA_NO_NODE)
+		return;
+
+	target = find_mem_target(mem_pxm);
+	if (target)
+		return;
+
+	target = kzalloc(sizeof(*target), GFP_KERNEL);
+	if (!target)
+		return;
+
+	target->memory_pxm = mem_pxm;
+	target->processor_pxm = PXM_INVAL;
+	list_add_tail(&target->node, &targets);
+}
+
 static __init const char *hmat_data_type(u8 type)
 {
 	switch (type) {
@@ -89,14 +183,83 @@  static __init u32 hmat_normalize(u16 entry, u64 base, u8 type)
 	return value;
 }
 
+static __init void hmat_update_target_access(struct memory_target *target,
+					     u8 type, u32 value)
+{
+	switch (type) {
+	case ACPI_HMAT_ACCESS_LATENCY:
+		target->hmem_attrs.read_latency = value;
+		target->hmem_attrs.write_latency = value;
+		break;
+	case ACPI_HMAT_READ_LATENCY:
+		target->hmem_attrs.read_latency = value;
+		break;
+	case ACPI_HMAT_WRITE_LATENCY:
+		target->hmem_attrs.write_latency = value;
+		break;
+	case ACPI_HMAT_ACCESS_BANDWIDTH:
+		target->hmem_attrs.read_bandwidth = value;
+		target->hmem_attrs.write_bandwidth = value;
+		break;
+	case ACPI_HMAT_READ_BANDWIDTH:
+		target->hmem_attrs.read_bandwidth = value;
+		break;
+	case ACPI_HMAT_WRITE_BANDWIDTH:
+		target->hmem_attrs.write_bandwidth = value;
+		break;
+	default:
+		break;
+	}
+}
+
+static __init void hmat_add_locality(struct acpi_hmat_locality *hmat_loc)
+{
+	struct memory_locality *loc;
+
+	loc = kzalloc(sizeof(*loc), GFP_KERNEL);
+	if (!loc) {
+		pr_notice_once("Failed to allocate HMAT locality\n");
+		return;
+	}
+
+	loc->hmat_loc = hmat_loc;
+	list_add_tail(&loc->node, &localities);
+
+	switch (hmat_loc->data_type) {
+	case ACPI_HMAT_ACCESS_LATENCY:
+		localities_types[READ_LATENCY] = loc;
+		localities_types[WRITE_LATENCY] = loc;
+		break;
+	case ACPI_HMAT_READ_LATENCY:
+		localities_types[READ_LATENCY] = loc;
+		break;
+	case ACPI_HMAT_WRITE_LATENCY:
+		localities_types[WRITE_LATENCY] = loc;
+		break;
+	case ACPI_HMAT_ACCESS_BANDWIDTH:
+		localities_types[READ_BANDWIDTH] = loc;
+		localities_types[WRITE_BANDWIDTH] = loc;
+		break;
+	case ACPI_HMAT_READ_BANDWIDTH:
+		localities_types[READ_BANDWIDTH] = loc;
+		break;
+	case ACPI_HMAT_WRITE_BANDWIDTH:
+		localities_types[WRITE_BANDWIDTH] = loc;
+		break;
+	default:
+		break;
+	}
+}
+
 static __init int hmat_parse_locality(union acpi_subtable_headers *header,
 				      const unsigned long end)
 {
 	struct acpi_hmat_locality *hmat_loc = (void *)header;
+	struct memory_target *target;
 	unsigned int init, targ, total_size, ipds, tpds;
 	u32 *inits, *targs, value;
 	u16 *entries;
-	u8 type;
+	u8 type, mem_hier;
 
 	if (hmat_loc->header.length < sizeof(*hmat_loc)) {
 		pr_notice("HMAT: Unexpected locality header length: %d\n",
@@ -105,6 +268,7 @@  static __init int hmat_parse_locality(union acpi_subtable_headers *header,
 	}
 
 	type = hmat_loc->data_type;
+	mem_hier = hmat_loc->flags & ACPI_HMAT_MEMORY_HIERARCHY;
 	ipds = hmat_loc->number_of_initiator_Pds;
 	tpds = hmat_loc->number_of_target_Pds;
 	total_size = sizeof(*hmat_loc) + sizeof(*entries) * ipds * tpds +
@@ -123,6 +287,7 @@  static __init int hmat_parse_locality(union acpi_subtable_headers *header,
 	targs = inits + ipds;
 	entries = (u16 *)(targs + tpds);
 	for (init = 0; init < ipds; init++) {
+		alloc_memory_initiator(inits[init]);
 		for (targ = 0; targ < tpds; targ++) {
 			value = hmat_normalize(entries[init * tpds + targ],
 					       hmat_loc->entry_base_unit,
@@ -130,9 +295,18 @@  static __init int hmat_parse_locality(union acpi_subtable_headers *header,
 			pr_info("  Initiator-Target[%d-%d]:%d%s\n",
 				inits[init], targs[targ], value,
 				hmat_data_type_suffix(type));
+
+			if (mem_hier == ACPI_HMAT_MEMORY) {
+				target = find_mem_target(targs[targ]);
+				if (target && target->processor_pxm == inits[init])
+					hmat_update_target_access(target, type, value);
+			}
 		}
 	}
 
+	if (mem_hier == ACPI_HMAT_MEMORY)
+		hmat_add_locality(hmat_loc);
+
 	return 0;
 }
 
@@ -160,6 +334,7 @@  static int __init hmat_parse_address_range(union acpi_subtable_headers *header,
 					   const unsigned long end)
 {
 	struct acpi_hmat_address_range *spa = (void *)header;
+	struct memory_target *target = NULL;
 
 	if (spa->header.length != sizeof(*spa)) {
 		pr_notice("HMAT: Unexpected address range header length: %d\n",
@@ -175,6 +350,23 @@  static int __init hmat_parse_address_range(union acpi_subtable_headers *header,
 		pr_info("HMAT: Memory Flags:%04x Processor Domain:%d Memory Domain:%d\n",
 			spa->flags, spa->processor_PD, spa->memory_PD);
 
+	if (spa->flags & ACPI_HMAT_MEMORY_PD_VALID) {
+		target = find_mem_target(spa->memory_PD);
+		if (!target) {
+			pr_debug("HMAT: Memory Domain missing from SRAT\n");
+			return -EINVAL;
+		}
+	}
+	if (target && spa->flags & ACPI_HMAT_PROCESSOR_PD_VALID) {
+		int p_node = pxm_to_node(spa->processor_PD);
+
+		if (p_node == NUMA_NO_NODE) {
+			pr_debug("HMAT: Invalid Processor Domain\n");
+			return -EINVAL;
+		}
+		target->processor_pxm = p_node;
+	}
+
 	return 0;
 }
 
@@ -198,6 +390,195 @@  static int __init hmat_parse_subtable(union acpi_subtable_headers *header,
 	}
 }
 
+static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header,
+					  const unsigned long end)
+{
+	struct acpi_srat_mem_affinity *ma = (void *)header;
+
+	if (!ma)
+		return -EINVAL;
+	if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
+		return 0;
+	alloc_memory_target(ma->proximity_domain);
+	return 0;
+}
+
+static __init u32 hmat_initiator_perf(struct memory_target *target,
+			       struct memory_initiator *initiator,
+			       struct acpi_hmat_locality *hmat_loc)
+{
+	unsigned int ipds, tpds, i, idx = 0, tdx = 0;
+	u32 *inits, *targs;
+	u16 *entries;
+
+	ipds = hmat_loc->number_of_initiator_Pds;
+	tpds = hmat_loc->number_of_target_Pds;
+	inits = (u32 *)(hmat_loc + 1);
+	targs = inits + ipds;
+	entries = (u16 *)(targs + tpds);
+
+	for (i = 0; i < ipds; i++) {
+		if (inits[i] == initiator->processor_pxm) {
+			idx = i;
+			break;
+		}
+	}
+
+	if (i == ipds)
+		return 0;
+
+	for (i = 0; i < tpds; i++) {
+		if (targs[i] == target->memory_pxm) {
+			tdx = i;
+			break;
+		}
+	}
+	if (i == tpds)
+		return 0;
+
+	return hmat_normalize(entries[idx * tpds + tdx],
+			      hmat_loc->entry_base_unit,
+			      hmat_loc->data_type);
+}
+
+static __init bool hmat_update_best(u8 type, u32 value, u32 *best)
+{
+	bool updated = false;
+
+	if (!value)
+		return false;
+
+	switch (type) {
+	case ACPI_HMAT_ACCESS_LATENCY:
+	case ACPI_HMAT_READ_LATENCY:
+	case ACPI_HMAT_WRITE_LATENCY:
+		if (!*best || *best > value) {
+			*best = value;
+			updated = true;
+		}
+		break;
+	case ACPI_HMAT_ACCESS_BANDWIDTH:
+	case ACPI_HMAT_READ_BANDWIDTH:
+	case ACPI_HMAT_WRITE_BANDWIDTH:
+		if (!*best || *best < value) {
+			*best = value;
+			updated = true;
+		}
+		break;
+	}
+
+	return updated;
+}
+
+static int initiator_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct memory_initiator *ia;
+	struct memory_initiator *ib;
+	unsigned long *p_nodes = priv;
+
+	ia = list_entry(a, struct memory_initiator, node);
+	ib = list_entry(b, struct memory_initiator, node);
+
+	set_bit(ia->processor_pxm, p_nodes);
+	set_bit(ib->processor_pxm, p_nodes);
+
+	return ia->processor_pxm - ib->processor_pxm;
+}
+
+static __init void hmat_register_target_initiators(struct memory_target *target)
+{
+	static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
+	struct memory_initiator *initiator;
+	unsigned int mem_nid, cpu_nid;
+	struct memory_locality *loc = NULL;
+	u32 best = 0;
+	int i;
+
+	if (target->processor_pxm == PXM_INVAL)
+		return;
+
+	mem_nid = pxm_to_node(target->memory_pxm);
+
+	/*
+	 * If the Address Range Structure provides a local processor pxm, link
+	 * only that one. Otherwise, find the best performance attribtes and
+	 * register all initiators that match.
+	 */
+	if (target->processor_pxm != PXM_INVAL) {
+		cpu_nid = pxm_to_node(target->processor_pxm);
+		register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
+		return;
+	}
+
+	if (list_empty(&localities))
+		return;
+
+	/*
+	 * We need the initiator list iteration sorted so we can use
+	 * bitmap_clear for previously set initiators when we find a better
+	 * memory accessor. We'll also use the sorting to prime the candidate
+	 * nodes with known initiators.
+	 */
+	bitmap_zero(p_nodes, MAX_NUMNODES);
+	list_sort(p_nodes, &initiators, initiator_cmp);
+	for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
+		loc = localities_types[i];
+		if (!loc)
+			continue;
+
+		best = 0;
+		list_for_each_entry(initiator, &initiators, node) {
+			u32 value;
+
+			if (!test_bit(initiator->processor_pxm, p_nodes))
+				continue;
+
+			value = hmat_initiator_perf(target, initiator, loc->hmat_loc);
+			if (hmat_update_best(loc->hmat_loc->data_type, value, &best))
+				bitmap_clear(p_nodes, 0, initiator->processor_pxm);
+			if (value != best)
+				clear_bit(initiator->processor_pxm, p_nodes);
+		}
+		if (best)
+			hmat_update_target_access(target, loc->hmat_loc->data_type, best);
+	}
+
+	for_each_set_bit(i, p_nodes, MAX_NUMNODES) {
+		cpu_nid = pxm_to_node(i);
+		register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
+	}
+}
+
+static __init void hmat_register_targets(void)
+{
+	struct memory_target *target;
+
+	list_for_each_entry(target, &targets, node)
+		hmat_register_target_initiators(target);
+}
+
+static __init void hmat_free_structures(void)
+{
+	struct memory_target *target, *tnext;
+	struct memory_locality *loc, *lnext;
+	struct memory_initiator *intitator, *inext;
+
+	list_for_each_entry_safe(target, tnext, &targets, node) {
+		list_del(&target->node);
+		kfree(target);
+	}
+
+	list_for_each_entry_safe(intitator, inext, &initiators, node) {
+		list_del(&intitator->node);
+		kfree(intitator);
+	}
+
+	list_for_each_entry_safe(loc, lnext, &localities, node) {
+		list_del(&loc->node);
+		kfree(loc);
+	}
+}
+
 static __init int hmat_init(void)
 {
 	struct acpi_table_header *tbl;
@@ -207,6 +588,17 @@  static __init int hmat_init(void)
 	if (srat_disabled())
 		return 0;
 
+	status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
+	if (ACPI_FAILURE(status))
+		return 0;
+
+	if (acpi_table_parse_entries(ACPI_SIG_SRAT,
+				sizeof(struct acpi_table_srat),
+				ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+				srat_parse_mem_affinity, 0) < 0)
+		goto out_put;
+	acpi_put_table(tbl);
+
 	status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
 	if (ACPI_FAILURE(status))
 		return 0;
@@ -229,7 +621,9 @@  static __init int hmat_init(void)
 			goto out_put;
 		}
 	}
+	hmat_register_targets();
 out_put:
+	hmat_free_structures();
 	acpi_put_table(tbl);
 	return 0;
 }