Patchwork [v2] nfit: add Hyper-V NVDIMM DSM command set to white list

login
register
mail settings
Submitter Dexuan Cui
Date Jan. 29, 2019, 12:56 a.m.
Message ID <PU1P153MB01690EDA138D3ABDE5B4A482BF970@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
Download mbox | patch
Permalink /patch/711859/
State New
Headers show

Comments

Dexuan Cui - Jan. 29, 2019, 12:56 a.m.
Add the Hyper-V _DSM command set to the white list of NVDIMM command
sets.

This command set is documented at http://www.uefi.org/RFIC_LIST
(see "Virtual NVDIMM 0x1901").

Thanks Dan Williams <dan.j.williams@intel.com> for writing the
comment change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---

Changes in v2:
    Updated the comment and changelog (Thanks, Dan!)
    Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.

 drivers/acpi/nfit/core.c   | 17 ++++++++++++++---
 drivers/acpi/nfit/nfit.h   |  6 +++++-
 include/uapi/linux/ndctl.h |  1 +
 3 files changed, 20 insertions(+), 4 deletions(-)
Dan Williams - Jan. 30, 2019, 6:23 a.m.
On Mon, Jan 28, 2019 at 4:56 PM Dexuan Cui <decui@microsoft.com> wrote:
>
>
> Add the Hyper-V _DSM command set to the white list of NVDIMM command
> sets.
>
> This command set is documented at http://www.uefi.org/RFIC_LIST
> (see "Virtual NVDIMM 0x1901").
>
> Thanks Dan Williams <dan.j.williams@intel.com> for writing the
> comment change.
>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> ---
>
> Changes in v2:
>     Updated the comment and changelog (Thanks, Dan!)
>     Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.

Thanks for the re-spin, applied.
Dan Williams - Feb. 1, 2019, 5:29 p.m.
On Fri, Feb 1, 2019 at 9:14 AM Dexuan Cui <decui@microsoft.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> > Sent: Tuesday, January 29, 2019 10:24 PM
> > On Mon, Jan 28, 2019 at 4:56 PM Dexuan Cui <decui@microsoft.com> wrote:
> > >
> > >
> > > Add the Hyper-V _DSM command set to the white list of NVDIMM command
> > > sets.
> > >
> > > Thanks Dan Williams <dan.j.williams@intel.com> for writing the
> > > comment change.
> > > ---
> > > Changes in v2:
> > >     Updated the comment and changelog (Thanks, Dan!)
> > >     Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.
> >
> > Thanks for the re-spin, applied.
>
> Hi Dan,
> Unluckily it looks this commit causes a regression on
> https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending
>
> With the patch, "ndctl list" shows nothing, and /dev/pmem0 can't appear.
> If I revert the patch, it will be back to normal.
>
> I attached the config/logs. In the bad case, "dmesg" shows a line
> [    5.259017] nd_pmem namespace0.0: 0x0000000000000000, too small must be at least 0x1000
>
> Any idea why this happens? I'm digging into the details and I appreciate your insights.

Looks like it is working as expected. The regression you are seeing is
the fact that the patch enables the kernel to enable
nvdimm-namespace-label reads. Those reads find a namespace index block
and a label. Unfortunately the label has the LOCAL flag set and Linux
explicitly ignores pmem namespace labels with that bit set. The reason
for that is due to the fact that the original definition of the LOCAL
bit from v1.1 of the namespace label implementation [1] explicitly
limited the LOCAL flag to "block aperture" regions. If you clear that
LOCAL flag I expect it will work. To my knowledge Windows pretends
that the v1.1 definition never existed.

The UEFI 2.7 specification for v1.2 labels states that setting the
LOCAL flag is optional when "nlabel", number of labels in the set, is
1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.

That said, the Robustness Principle makes a case that Linux should
tolerate the bit being set. However, it's just a non-trivial amount of
work to unwind the ingrained block-aperture assumptions of that bit.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
Dexuan Cui - Feb. 1, 2019, 11:17 p.m.
> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Friday, February 1, 2019 9:29 AM

> > Hi Dan,

> > Unluckily it looks this commit causes a regression ...

> > With the patch, "ndctl list" shows nothing, and /dev/pmem0 can't appear.

> > If I revert the patch, it will be back to normal.

> >

> > I attached the config/logs. In the bad case, "dmesg" shows a line

> > [    5.259017] nd_pmem namespace0.0: 0x0000000000000000, too small

> must be at least 0x1000

> > Any idea why this happens? I'm digging into the details and I appreciate your

> insights.

> 

> Looks like it is working as expected. 


I was working on linux-next tree's next-20190107 and this patch did "work fine"
there. The "regression" happens on djbw/nvdimm.git tree's libnvdimm-pending
branch because we have this recent commit (Jan 19):

11189c1089da ("acpi/nfit: Fix command-supported detection"), which makes such
a change in acpi_nfit_ctl():

-       if (!test_bit(cmd, &cmd_mask) || !test_bit(func, &dsm_mask))
+       if (cmd == ND_CMD_CALL && !test_bit(func, &dsm_mask))
+               return -ENOTTY;
+       else if (!test_bit(cmd, &cmd_mask))
                return -ENOTTY;

So previously ND_CMD_GET_CONFIG_DATA fails with -ENOTTY and we're good.

Now the command succeeds, but it looks the returned data is inavlid, and I see
the "regression".

> The regression you are seeing is the fact that the patch enables the kernel to

> enable nvdimm-namespace-label reads. 

Yes.

> Those reads find a namespace index block

> and a label. Unfortunately the label has the LOCAL flag set and Linux

> explicitly ignores pmem namespace labels with that bit set. The reason

Can you please point out the function that ignores the flag?

I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a
related function.

> for that is due to the fact that the original definition of the LOCAL

> bit from v1.1 of the namespace label implementation [1] explicitly

> limited the LOCAL flag to "block aperture" regions. If you clear that

> LOCAL flag I expect it will work. To my knowledge Windows pretends

> that the v1.1 definition never existed.

I'm trying to find out where the flag is used and how to clear it.

> The UEFI 2.7 specification for v1.2 labels states that setting the

> LOCAL flag is optional when "nlabel", number of labels in the set, is

> 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.

> 

> That said, the Robustness Principle makes a case that Linux should

> tolerate the bit being set. However, it's just a non-trivial amount of

> work to unwind the ingrained block-aperture assumptions of that bit.

Can you please explain this a bit more? Sorry, I'm new to this area...

Thanks,
-- Dexuan
Dan Williams - Feb. 1, 2019, 11:47 p.m.
On Fri, Feb 1, 2019 at 3:17 PM Dexuan Cui <decui@microsoft.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> > Sent: Friday, February 1, 2019 9:29 AM
> > > Hi Dan,
> > > Unluckily it looks this commit causes a regression ...
> > > With the patch, "ndctl list" shows nothing, and /dev/pmem0 can't appear.
> > > If I revert the patch, it will be back to normal.
> > >
> > > I attached the config/logs. In the bad case, "dmesg" shows a line
> > > [    5.259017] nd_pmem namespace0.0: 0x0000000000000000, too small
> > must be at least 0x1000
> > > Any idea why this happens? I'm digging into the details and I appreciate your
> > insights.
> >
> > Looks like it is working as expected.
>
> I was working on linux-next tree's next-20190107 and this patch did "work fine"
> there. The "regression" happens on djbw/nvdimm.git tree's libnvdimm-pending
> branch because we have this recent commit (Jan 19):
>
> 11189c1089da ("acpi/nfit: Fix command-supported detection"), which makes such
> a change in acpi_nfit_ctl():
>
> -       if (!test_bit(cmd, &cmd_mask) || !test_bit(func, &dsm_mask))
> +       if (cmd == ND_CMD_CALL && !test_bit(func, &dsm_mask))
> +               return -ENOTTY;
> +       else if (!test_bit(cmd, &cmd_mask))
>                 return -ENOTTY;
>
> So previously ND_CMD_GET_CONFIG_DATA fails with -ENOTTY and we're good.
>
> Now the command succeeds, but it looks the returned data is inavlid, and I see
> the "regression".

I believe it's the same reason. Without 11189c1089da the _LSR method
will fail, and otherwise it works and finds the label that it doesn't
like.

I'm not seeing "invalid" data in your failure log. Could you double
check that it's just not the success of _LSR that causes the issue?

> > The regression you are seeing is the fact that the patch enables the kernel to
> > enable nvdimm-namespace-label reads.
> Yes.
>
> > Those reads find a namespace index block
> > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > explicitly ignores pmem namespace labels with that bit set. The reason
> Can you please point out the function that ignores the flag?
>
> I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a
> related function.

scan_labels() is where the namespace label is validated relative to
the region type:

                if (is_nd_blk(&nd_region->dev)
                                == !!(flags & NSLABEL_FLAG_LOCAL))
                        /* pass, region matches label type */;
                else
                        continue;

It also has meaning for the namespace capacity allocation
implementation that needed that flag to distinguish aliased capacity
between Block Aperture Mode and PMEM Mode access.

> > for that is due to the fact that the original definition of the LOCAL
> > bit from v1.1 of the namespace label implementation [1] explicitly
> > limited the LOCAL flag to "block aperture" regions. If you clear that
> > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > that the v1.1 definition never existed.
> I'm trying to find out where the flag is used and how to clear it.

Assuming Hyper-V implements _LSW, you can recreate / reinitialize the
label area:

ndctl disable-region all
ndctl init-labels -f all
ndctl enable-region all
ndctl create-namespace

> > The UEFI 2.7 specification for v1.2 labels states that setting the
> > LOCAL flag is optional when "nlabel", number of labels in the set, is
> > 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.
> >
> > That said, the Robustness Principle makes a case that Linux should
> > tolerate the bit being set. However, it's just a non-trivial amount of
> > work to unwind the ingrained block-aperture assumptions of that bit.
> Can you please explain this a bit more? Sorry, I'm new to this area...

The short story is that Linux enforces that LOCAL == Block Mode
Namespaces. See section 2.2 Namespace Label Layout in the original
spec [1]. The EFI 2.7 definition tried to allow for LOCAL to be set
when an interleave-set was comprised of a single NVDIMM, but then also
states its optional when Nlabel is 1. It has zero functional use for
interleave-set based namespaces even when the interleave-set-width is
1. So Linux takes the option to never set it, and goes further to
reject it if it's set and the region-type does not match, because that
follows the v1.1 meaning of the flag.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
Dexuan Cui - Feb. 2, 2019, 12:34 a.m.
> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Friday, February 1, 2019 3:47 PM

> To: Dexuan Cui <decui@microsoft.com>

> 

> I believe it's the same reason. Without 11189c1089da the _LSR method

> will fail, and otherwise it works and finds the label that it doesn't

> like.

Exactly.
 
> I'm not seeing "invalid" data in your failure log. Could you double

> check that it's just not the success of _LSR that causes the issue?


acpi_label_read() never fails for me.

By "invalid", I only mean the messages in the dmesg.bad.txt I previously
attached (I'm just reading the specs to learn the details about NVDIMM
namespace's labels, so my description might be inaccurate) :

[    4.832367] nvdimm nmem1: nsindex0 labelsize 1 invalid
[    4.832369] nvdimm nmem1: nsindex1 labelsize 1 invalid
...
[    5.259017] nd_pmem namespace0.0: 0x0000000000000000, too small must be at least 0x1000

> > > The regression you are seeing is the fact that the patch enables the kernel

> to

> > > enable nvdimm-namespace-label reads.

> > Yes.

> >

> > > Those reads find a namespace index block

> > > and a label. Unfortunately the label has the LOCAL flag set and Linux

> > > explicitly ignores pmem namespace labels with that bit set. The reason

> > Can you please point out the function that ignores the flag?

> >

> > I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a

> > related function.

> 

> scan_labels() is where the namespace label is validated relative to

> the region type:

> 

>                 if (is_nd_blk(&nd_region->dev)

>                                 == !!(flags & NSLABEL_FLAG_LOCAL))

>                         /* pass, region matches label type */;

>                 else

>                         continue;

> 

> It also has meaning for the namespace capacity allocation

> implementation that needed that flag to distinguish aliased capacity

> between Block Aperture Mode and PMEM Mode access.

Thanks for the pointer! I'm looking at this function.

> > > for that is due to the fact that the original definition of the LOCAL

> > > bit from v1.1 of the namespace label implementation [1] explicitly

> > > limited the LOCAL flag to "block aperture" regions. If you clear that

> > > LOCAL flag I expect it will work. To my knowledge Windows pretends

> > > that the v1.1 definition never existed.

> > I'm trying to find out where the flag is used and how to clear it.

> 

> Assuming Hyper-V implements _LSW, you can recreate / reinitialize the

> label area:


I think Hyper-V only implements _LSR:
[    4.720623] nfit ACPI0012:00: device:00: has _LSR
[    4.723683] nfit ACPI0012:00: device:01: has _LSR
 
> > > The UEFI 2.7 specification for v1.2 labels states that setting the

> > > LOCAL flag is optional when "nlabel", number of labels in the set, is

> > > 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.

> > >

> > > That said, the Robustness Principle makes a case that Linux should

> > > tolerate the bit being set. However, it's just a non-trivial amount of

> > > work to unwind the ingrained block-aperture assumptions of that bit.

> > Can you please explain this a bit more? Sorry, I'm new to this area...

> 

> The short story is that Linux enforces that LOCAL == Block Mode

> Namespaces. See section 2.2 Namespace Label Layout in the original

> spec [1]. The EFI 2.7 definition tried to allow for LOCAL to be set

> when an interleave-set was comprised of a single NVDIMM, but then also

> states its optional when Nlabel is 1. It has zero functional use for

> interleave-set based namespaces even when the interleave-set-width is

> 1. So Linux takes the option to never set it, and goes further to

> reject it if it's set and the region-type does not match, because that

> follows the v1.1 meaning of the flag.

> 

> [1]:

Thanks for the link! I'll read it.
BTW, it looks Hyper-V only supports PMEM namespace, at least so far.

Thanks,
-- Dexuan
Dan Williams - Feb. 2, 2019, 12:47 a.m.
On Fri, Feb 1, 2019 at 4:34 PM Dexuan Cui <decui@microsoft.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> > Sent: Friday, February 1, 2019 3:47 PM
> > To: Dexuan Cui <decui@microsoft.com>
> >
> > I believe it's the same reason. Without 11189c1089da the _LSR method
> > will fail, and otherwise it works and finds the label that it doesn't
> > like.
> Exactly.
>
> > I'm not seeing "invalid" data in your failure log. Could you double
> > check that it's just not the success of _LSR that causes the issue?
>
> acpi_label_read() never fails for me.
>
> By "invalid", I only mean the messages in the dmesg.bad.txt I previously
> attached (I'm just reading the specs to learn the details about NVDIMM
> namespace's labels, so my description might be inaccurate) :
>
> [    4.832367] nvdimm nmem1: nsindex0 labelsize 1 invalid
> [    4.832369] nvdimm nmem1: nsindex1 labelsize 1 invalid

Oh, those are benign. They are a side effect of Linux probing for v1.2
namespace labels vs v1.1. It will always find that one of those is
"invalid".

> ...
> [    5.259017] nd_pmem namespace0.0: 0x0000000000000000, too small must be at least 0x1000
>
> > > > The regression you are seeing is the fact that the patch enables the kernel
> > to
> > > > enable nvdimm-namespace-label reads.
> > > Yes.
> > >
> > > > Those reads find a namespace index block
> > > > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > > > explicitly ignores pmem namespace labels with that bit set. The reason
> > > Can you please point out the function that ignores the flag?
> > >
> > > I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a
> > > related function.
> >
> > scan_labels() is where the namespace label is validated relative to
> > the region type:
> >
> >                 if (is_nd_blk(&nd_region->dev)
> >                                 == !!(flags & NSLABEL_FLAG_LOCAL))
> >                         /* pass, region matches label type */;
> >                 else
> >                         continue;
> >
> > It also has meaning for the namespace capacity allocation
> > implementation that needed that flag to distinguish aliased capacity
> > between Block Aperture Mode and PMEM Mode access.
> Thanks for the pointer! I'm looking at this function.
>
> > > > for that is due to the fact that the original definition of the LOCAL
> > > > bit from v1.1 of the namespace label implementation [1] explicitly
> > > > limited the LOCAL flag to "block aperture" regions. If you clear that
> > > > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > > > that the v1.1 definition never existed.
> > > I'm trying to find out where the flag is used and how to clear it.
> >
> > Assuming Hyper-V implements _LSW, you can recreate / reinitialize the
> > label area:
>
> I think Hyper-V only implements _LSR:
> [    4.720623] nfit ACPI0012:00: device:00: has _LSR
> [    4.723683] nfit ACPI0012:00: device:01: has _LSR

That's unfortunate...

>
> > > > The UEFI 2.7 specification for v1.2 labels states that setting the
> > > > LOCAL flag is optional when "nlabel", number of labels in the set, is
> > > > 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.
> > > >
> > > > That said, the Robustness Principle makes a case that Linux should
> > > > tolerate the bit being set. However, it's just a non-trivial amount of
> > > > work to unwind the ingrained block-aperture assumptions of that bit.
> > > Can you please explain this a bit more? Sorry, I'm new to this area...
> >
> > The short story is that Linux enforces that LOCAL == Block Mode
> > Namespaces. See section 2.2 Namespace Label Layout in the original
> > spec [1]. The EFI 2.7 definition tried to allow for LOCAL to be set
> > when an interleave-set was comprised of a single NVDIMM, but then also
> > states its optional when Nlabel is 1. It has zero functional use for
> > interleave-set based namespaces even when the interleave-set-width is
> > 1. So Linux takes the option to never set it, and goes further to
> > reject it if it's set and the region-type does not match, because that
> > follows the v1.1 meaning of the flag.
> >
> > [1]:
> Thanks for the link! I'll read it.
> BTW, it looks Hyper-V only supports PMEM namespace, at least so far.

I don't think it should bother. It only makes sense for bare metal and
even then I know of no NVDIMMs that are shipping it.
Dexuan Cui - Feb. 2, 2019, 1:06 a.m.
> From: Linux-nvdimm <linux-nvdimm-bounces@lists.01.org> On Behalf Of
> Dexuan Cui
> Sent: Friday, February 1, 2019 4:34 PM
> > > > ...
> > > > Those reads find a namespace index block
> > > > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > > > explicitly ignores pmem namespace labels with that bit set. The reason
> > > > for that is due to the fact that the original definition of the LOCAL
> > > > bit from v1.1 of the namespace label implementation [1] explicitly
> > > > limited the LOCAL flag to "block aperture" regions. If you clear that
> > > > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > > > that the v1.1 definition never existed.

On the libnvdimm-pending branch, I get this:

root@decui-gen2-u1904:~/nvdimm# ndctl list
root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":0,
    "uuid":"00000000-0000-0000-0000-000000000000",
    "state":"disabled"
  },
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":0,
    "uuid":"00000000-0000-0000-0000-000000000000",
    "state":"disabled"
  }
]

With the patch that clears the LOCAL label (BTW, the initial value of flags is 0x3,
meaning a read-only local label) :
@@ -2496,6 +2500,7 @@ static int init_active_labels(struct nd_region *nd_region)
                        if (!label_ent)
                                break;
                        label = nd_label_active(ndd, j);
+                       label->flags &= ~NSLABEL_FLAG_LOCAL;
                        label_ent->label = label;

I get this:

root@decui-gen2-u1904:~/nvdimm# ndctl list
root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":0,
    "uuid":"c258aaab-f72b-e546-bfa5-be5e07761dbc",
    "state":"disabled",
    "name":"Microsoft Hyper-V NVDIMM 1 Label"
  },
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":0,
    "uuid":"9f0497a7-4453-7c40-ad35-21a791e00345",
    "state":"disabled",
    "name":"Microsoft Hyper-V NVDIMM 0 Label"
  }
]

The "size" and "mode" still don't look right, but the improvement is that
now I can see a good descriptive "name", which I suppose is retrieved
from Hyper-V.

With Ubuntu 19.04 (4.18.0-11-generic), I get this:
(Note: the "mode" and "size" are correct. The "uuid" is different from
the above "9f0497a7-4453-7c40-ad35-21a791e00345" -- this is weird.) 

root@decui-gen2-u1904:~# ndctl list
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":137438953472,
    "blockdev":"pmem1"
  },
  {
    "dev":"namespace0.0",
    "mode":"fsdax",
    "map":"dev",
    "size":33820770304,
    "uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
    "blockdev":"pmem0"
  }
]
 
I'm trying to find out the correct solution. I apprecite your insights!

Thanks,
-- Dexuan
Dan Williams - Feb. 2, 2019, 1:28 a.m.
On Fri, Feb 1, 2019 at 5:06 PM Dexuan Cui <decui@microsoft.com> wrote:
>
> > From: Linux-nvdimm <linux-nvdimm-bounces@lists.01.org> On Behalf Of
> > Dexuan Cui
> > Sent: Friday, February 1, 2019 4:34 PM
> > > > > ...
> > > > > Those reads find a namespace index block
> > > > > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > > > > explicitly ignores pmem namespace labels with that bit set. The reason
> > > > > for that is due to the fact that the original definition of the LOCAL
> > > > > bit from v1.1 of the namespace label implementation [1] explicitly
> > > > > limited the LOCAL flag to "block aperture" regions. If you clear that
> > > > > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > > > > that the v1.1 definition never existed.
>
> On the libnvdimm-pending branch, I get this:
>
> root@decui-gen2-u1904:~/nvdimm# ndctl list
> root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
> [
>   {
>     "dev":"namespace1.0",
>     "mode":"raw",
>     "size":0,
>     "uuid":"00000000-0000-0000-0000-000000000000",
>     "state":"disabled"
>   },
>   {
>     "dev":"namespace0.0",
>     "mode":"raw",
>     "size":0,
>     "uuid":"00000000-0000-0000-0000-000000000000",
>     "state":"disabled"
>   }
> ]
>
> With the patch that clears the LOCAL label (BTW, the initial value of flags is 0x3,
> meaning a read-only local label) :
> @@ -2496,6 +2500,7 @@ static int init_active_labels(struct nd_region *nd_region)
>                         if (!label_ent)
>                                 break;
>                         label = nd_label_active(ndd, j);
> +                       label->flags &= ~NSLABEL_FLAG_LOCAL;
>                         label_ent->label = label;
>
> I get this:
>
> root@decui-gen2-u1904:~/nvdimm# ndctl list
> root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
> [
>   {
>     "dev":"namespace1.0",
>     "mode":"raw",
>     "size":0,
>     "uuid":"c258aaab-f72b-e546-bfa5-be5e07761dbc",
>     "state":"disabled",
>     "name":"Microsoft Hyper-V NVDIMM 1 Label"
>   },
>   {
>     "dev":"namespace0.0",
>     "mode":"raw",
>     "size":0,
>     "uuid":"9f0497a7-4453-7c40-ad35-21a791e00345",
>     "state":"disabled",
>     "name":"Microsoft Hyper-V NVDIMM 0 Label"
>   }
> ]
>
> The "size" and "mode" still don't look right, but the improvement is that
> now I can see a good descriptive "name", which I suppose is retrieved
> from Hyper-V.

Mode is right, there is no way for Hyper-V to create Linux fsdax mode
namespaces it requires some setup using variables only Linux knows.
Can you send the output of:

ndctl read-labels -j all

>
> With Ubuntu 19.04 (4.18.0-11-generic), I get this:
> (Note: the "mode" and "size" are correct. The "uuid" is different from
> the above "9f0497a7-4453-7c40-ad35-21a791e00345" -- this is weird.)
>
> root@decui-gen2-u1904:~# ndctl list
> [
>   {
>     "dev":"namespace1.0",
>     "mode":"raw",
>     "size":137438953472,
>     "blockdev":"pmem1"
>   },
>   {
>     "dev":"namespace0.0",
>     "mode":"fsdax",
>     "map":"dev",
>     "size":33820770304,
>     "uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
>     "blockdev":"pmem0"
>   }
> ]

This is because the Ubuntu kernel has the bug that causes _LSR to fail
so Linux falls back to a namespace defined by the region boundary. On
that namespace there is an "fsdax" info block located at the region
base +4K. That info block is tagged with the uuid of
"35018886-397e-4fe7-a348-0a4d16eec44d".

> I'm trying to find out the correct solution. I apprecite your insights!

It's a mess. First we need to figure out whether the label is actually
specifying a size of zero, or there is some other bug.

However, the next problem is going to be adding "fsdax" mode support
on top of the read-only defined namespaces. The ndctl reconfiguration
flow:

   ndctl create-namespace -e namespace0.0 -m fsdax -f

...will likely fail because deleting the previous namespace in the
labels is part of that flow. It's always that labels are writable.

Honestly, the quickest path to something functional for Linux is to
simply delete the _LSR support and use raw mode defined namespaces.
Why have labels if they are read-only and the region is sufficient for
defining boundaries?
Dexuan Cui - Feb. 2, 2019, 2:17 a.m.
> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Friday, February 1, 2019 5:29 PM

> > ...

> > The "size" and "mode" still don't look right, but the improvement is that

> > now I can see a good descriptive "name", which I suppose is retrieved

> > from Hyper-V.

> 

> Mode is right, there is no way for Hyper-V to create Linux fsdax mode

> namespaces it requires some setup using variables only Linux knows.

> Can you send the output of:

> 

> ndctl read-labels -j all


The output is from a kernel built with the libnvdimm-pending branch plus
the one-line patch (label->flags &= ~NSLABEL_FLAG_LOCAL) in 
init_active_labels():

root@decui-gen2-u1904:~# ndctl read-labels -j all
[
  {
    "dev":"nmem1",
    "index":[
      {
        "signature":"NAMESPACE_INDEX",
        "major":1,
        "minor":2,
        "labelsize":256,
        "seq":1,
        "nslot":2
      },
      {
        "signature":"NAMESPACE_INDEX",
        "major":1,
        "minor":2,
        "labelsize":256,
        "seq":2,
        "nslot":2
      }
    ],
    "label":[
      {
        "uuid":"c258aaab-f72b-e546-bfa5-be5e07761dbc",
        "name":"Microsoft Hyper-V NVDIMM 1 Label",
        "slot":0,
        "position":0,
        "nlabel":1,
        "isetcookie":708891662257476870,
        "lbasize":0,
        "dpa":0,
        "rawsize":137438953472,
        "type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
        "abstraction_guid":"00000000-0000-0000-0000-000000000000"
      }
    ]
  },
  {
    "dev":"nmem0",
    "index":[
      {
        "signature":"NAMESPACE_INDEX",
        "major":1,
        "minor":2,
        "labelsize":256,
        "seq":1,
        "nslot":2
      },
      {
        "signature":"NAMESPACE_INDEX",
        "major":1,
        "minor":2,
        "labelsize":256,
        "seq":2,
        "nslot":2
      }
    ],
    "label":[
      {
        "uuid":"9f0497a7-4453-7c40-ad35-21a791e00345",
        "name":"Microsoft Hyper-V NVDIMM 0 Label",
        "slot":0,
        "position":0,
        "nlabel":1,
        "isetcookie":708891619307803909,
        "lbasize":0,
        "dpa":0,
        "rawsize":34359738368,
        "type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
        "abstraction_guid":"00000000-0000-0000-0000-000000000000"
      }
    ]
  }
]
read 2 nmems

> > With Ubuntu 19.04 (4.18.0-11-generic), I get this:

> > (Note: the "mode" and "size" are correct. The "uuid" is different from

> > the above "9f0497a7-4453-7c40-ad35-21a791e00345" -- this is weird.)

> >

> > root@decui-gen2-u1904:~# ndctl list

> > [

> >   {

> >     "dev":"namespace1.0",

> >     "mode":"raw",

> >     "size":137438953472,

> >     "blockdev":"pmem1"

> >   },

> >   {

> >     "dev":"namespace0.0",

> >     "mode":"fsdax",

> >     "map":"dev",

> >     "size":33820770304,

> >     "uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",

> >     "blockdev":"pmem0"

> >   }

> > ]

> 

> This is because the Ubuntu kernel has the bug that causes _LSR to fail

> so Linux falls back to a namespace defined by the region boundary. On

> that namespace there is an "fsdax" info block located at the region

> base +4K. That info block is tagged with the uuid of

> "35018886-397e-4fe7-a348-0a4d16eec44d".

Thanks for the explanation!
 
> > I'm trying to find out the correct solution. I apprecite your insights!

> 

> It's a mess. First we need to figure out whether the label is actually

> specifying a size of zero, or there is some other bug.

I agree.
 
> However, the next problem is going to be adding "fsdax" mode support

> on top of the read-only defined namespaces. The ndctl reconfiguration

> flow:

> 

>    ndctl create-namespace -e namespace0.0 -m fsdax -f


> 

> ...will likely fail because deleting the previous namespace in the

> labels is part of that flow. It's always that labels are writable.

> 

> Honestly, the quickest path to something functional for Linux is to

> simply delete the _LSR support and use raw mode defined namespaces.

> Why have labels if they are read-only and the region is sufficient for

> defining boundaries?


Just now Hyper-V team confirmed _LSW is not supported.

But with Ubuntu 19.04 kernel (4.18.0-11-generic), I'm able to run the commands
without any issue:

root@decui-gen2-u1904:~# ndctl list
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":137438953472,
    "blockdev":"pmem1"
  },
  {
    "dev":"namespace0.0",
    "mode":"fsdax",
    "map":"dev",
    "size":33820770304,
    "uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
    "blockdev":"pmem0"
  }
]

root@decui-gen2-u1904:~# ndctl destroy-namespace "namespace0.0"
  Error: namespace0.0 is active, specify --force for re-configuration

error destroying namespaces: Device or resource busy
destroyed 0 namespaces

root@decui-gen2-u1904:~# ndctl destroy-namespace "namespace0.0" --force
destroyed 1 namespace

root@decui-gen2-u1904:~# ndctl list
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":137438953472,
    "blockdev":"pmem1"
  }
]

root@decui-gen2-u1904:~# ndctl create-namespace -e namespace0.0 -m fsdax -f
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"31.50 GiB (33.82 GB)",
  "uuid":"9e4e819b-e2eb-4796-8f9e-15f96f63b5c2",
  "sector_size":512,
  "blockdev":"pmem0",
  "numa_node":1
}

root@decui-gen2-u1904:~# ndctl list
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":137438953472,
    "blockdev":"pmem1"
  },
  {
    "dev":"namespace0.0",
    "mode":"fsdax",
    "map":"dev",
    "size":33820770304,
    "uuid":"9e4e819b-e2eb-4796-8f9e-15f96f63b5c2",
    "blockdev":"pmem0"
  }
]


The above commands can also run fine with an upstream kernel that
doesn't have 
11189c1089da ("acpi/nfit: Fix command-supported detection")
or
1194c4133195 ("nfit: Add Hyper-V NVDIMM DSM command set to white list")

Thanks
-- Dexuan
Dexuan Cui - Feb. 2, 2019, 3:32 a.m.
> From: Dan Williams <dan.j.williams@intel.com>

> Sent: Friday, February 1, 2019 5:29 PM

> ... 

> Honestly, the quickest path to something functional for Linux is to

> simply delete the _LSR support and use raw mode defined namespaces.

> Why have labels if they are read-only and the region is sufficient for

> defining boundaries?

Hyper-V Virtual NVDIMM can already work with Ubuntu 19.04 virtual machine
running on Hyper-V, i.e. I can create a raw or fsdax namesapce, and create a
xfs or ext4 file sysetm in /dev/pmem0p1, and mount the file system with and
without "-o dax". The basic functionality is good.

My recent work is mainly for ndctl support, i.e. get the health info by ndctl,
and use ndctl to monitor the error events (if applicable).

Thanks,
-- Dexuan
Dan Williams - Feb. 2, 2019, 5:26 a.m.
On Fri, Feb 1, 2019 at 7:32 PM Dexuan Cui <decui@microsoft.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> > Sent: Friday, February 1, 2019 5:29 PM
> > ...
> > Honestly, the quickest path to something functional for Linux is to
> > simply delete the _LSR support and use raw mode defined namespaces.
> > Why have labels if they are read-only and the region is sufficient for
> > defining boundaries?
> Hyper-V Virtual NVDIMM can already work with Ubuntu 19.04 virtual machine
> running on Hyper-V, i.e. I can create a raw or fsdax namesapce, and create a
> xfs or ext4 file sysetm in /dev/pmem0p1, and mount the file system with and
> without "-o dax". The basic functionality is good.

Only in label-less mode apparently.

> My recent work is mainly for ndctl support, i.e. get the health info by ndctl,
> and use ndctl to monitor the error events (if applicable).

Right, but the recent fixes have exposed Linux to a labelled namespace
implementation that violates some of the driver's basic assumptions.

To preserve the level of operation you're currently seeing Linux might
need to add a hyper-v-specific quirk to force label-less operation.
Dan Williams - Feb. 2, 2019, 5:27 a.m.
On Fri, Feb 1, 2019 at 6:17 PM Dexuan Cui <decui@microsoft.com> wrote:
> > From: Dan Williams <dan.j.williams@intel.com>
[..]
> > Honestly, the quickest path to something functional for Linux is to
> > simply delete the _LSR support and use raw mode defined namespaces.
> > Why have labels if they are read-only and the region is sufficient for
> > defining boundaries?
>
> Just now Hyper-V team confirmed _LSW is not supported.
>
> But with Ubuntu 19.04 kernel (4.18.0-11-generic), I'm able to run the commands
> without any issue:

This is because Ubuntu is running in "label-less" mode. So all the
writes it is performing for namespace reconfiguration are just going
to the data-space of the region, not the labels.

Patch

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index e18ade5d74e9..a9270c99be72 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1861,9 +1861,17 @@  static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 	dev_set_drvdata(&adev_dimm->dev, nfit_mem);
 
 	/*
-	 * Until standardization materializes we need to consider 4
-	 * different command sets.  Note, that checking for function0 (bit0)
-	 * tells us if any commands are reachable through this GUID.
+	 * There are 4 "legacy" NVDIMM command sets
+	 * (NVDIMM_FAMILY_{INTEL,MSFT,HPE1,HPE2}) that were created before
+	 * an EFI working group was established to constrain this
+	 * proliferation. The nfit driver probes for the supported command
+	 * set by GUID. Note, if you're a platform developer looking to add
+	 * a new command set to this probe, consider using an existing set,
+	 * or otherwise seek approval to publish the command set at
+	 * http://www.uefi.org/RFIC_LIST.
+	 *
+	 * Note, that checking for function0 (bit0) tells us if any commands
+	 * are reachable through this GUID.
 	 */
 	for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)
 		if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1))
@@ -1886,6 +1894,8 @@  static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 			dsm_mask &= ~(1 << 8);
 	} else if (nfit_mem->family == NVDIMM_FAMILY_MSFT) {
 		dsm_mask = 0xffffffff;
+	} else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV) {
+		dsm_mask = 0x1f;
 	} else {
 		dev_dbg(dev, "unknown dimm command family\n");
 		nfit_mem->family = -1;
@@ -3729,6 +3739,7 @@  static __init int nfit_init(void)
 	guid_parse(UUID_NFIT_DIMM_N_HPE1, &nfit_uuid[NFIT_DEV_DIMM_N_HPE1]);
 	guid_parse(UUID_NFIT_DIMM_N_HPE2, &nfit_uuid[NFIT_DEV_DIMM_N_HPE2]);
 	guid_parse(UUID_NFIT_DIMM_N_MSFT, &nfit_uuid[NFIT_DEV_DIMM_N_MSFT]);
+	guid_parse(UUID_NFIT_DIMM_N_HYPERV, &nfit_uuid[NFIT_DEV_DIMM_N_HYPERV]);
 
 	nfit_wq = create_singlethread_workqueue("nfit");
 	if (!nfit_wq)
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 33691aecfcee..4de167b4f76f 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -34,11 +34,14 @@ 
 /* https://msdn.microsoft.com/library/windows/hardware/mt604741 */
 #define UUID_NFIT_DIMM_N_MSFT "1ee68b36-d4bd-4a1a-9a16-4f8e53d46e05"
 
+/* http://www.uefi.org/RFIC_LIST (see "Virtual NVDIMM 0x1901") */
+#define UUID_NFIT_DIMM_N_HYPERV "5746c5f2-a9a2-4264-ad0e-e4ddc9e09e80"
+
 #define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
 		| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
 		| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)
 
-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_MSFT
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
 
 #define NVDIMM_STANDARD_CMDMASK \
 (1 << ND_CMD_SMART | 1 << ND_CMD_SMART_THRESHOLD | 1 << ND_CMD_DIMM_FLAGS \
@@ -94,6 +97,7 @@  enum nfit_uuids {
 	NFIT_DEV_DIMM_N_HPE1 = NVDIMM_FAMILY_HPE1,
 	NFIT_DEV_DIMM_N_HPE2 = NVDIMM_FAMILY_HPE2,
 	NFIT_DEV_DIMM_N_MSFT = NVDIMM_FAMILY_MSFT,
+	NFIT_DEV_DIMM_N_HYPERV = NVDIMM_FAMILY_HYPERV,
 	NFIT_SPA_VOLATILE,
 	NFIT_SPA_PM,
 	NFIT_SPA_DCR,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index f57c9e434d2d..de5d90212409 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -243,6 +243,7 @@  struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE1 1
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
+#define NVDIMM_FAMILY_HYPERV 4
 
 #define ND_IOCTL_CALL			_IOWR(ND_IOCTL, ND_CMD_CALL,\
 					struct nd_cmd_pkg)