Patchwork [for-4.10] libxc: load acpi RSDP table at correct address

login
register
mail settings
Submitter Juergen Gross
Date Nov. 20, 2017, 8:34 a.m.
Message ID <20171120083427.3434-1-jgross@suse.com>
Download mbox | patch
Permalink /patch/386629/
State New
Headers show

Comments

Juergen Gross - Nov. 20, 2017, 8:34 a.m.
For PVH domains loading of the ACPI RSDP table is done via allocating
a domain loader segment after having loaded the kernel. This leads to
the RSDP table being loaded at an arbitrary guest address instead of
the architectural correct address just below 1MB.

When using the Linux kernel this is currently no problem as the
bzImage loader is being used, which is loading ACPI tables via an
alternative method.

Using grub2 however exposes this problem leading to the selected
kernel no longer being able to find the RSDP table.

To solve this issue allow to load the RSDP table below the already
loaded kernel if this space hasn't been used before.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
Not sure if this is acceptable for 4.10, but I think PVH guest support
should include the possibility to use grub2 as a boot loader.

So please consider this patch for 4.10.
---
 tools/libxc/xc_dom_hvmloader.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)
Roger Pau Monné - Nov. 20, 2017, 9:51 a.m.
Adding xen-devel, dropped it on my reply.

Replying from my phone, sorry for the formatting.


El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com> escribió:

For PVH domains loading of the ACPI RSDP table is done via allocating
a domain loader segment after having loaded the kernel. This leads to
the RSDP table being loaded at an arbitrary guest address instead of
the architectural correct address just below 1MB.


AFAIK this is only true for legacy BIOS boot, when using UEFI the RSDP can
be anywhere in memory, hence grub2 must already have an alternative way of
finding the RSDP apart from scanning the low 1MB.

Thanks, Roger.
Juergen Gross - Nov. 20, 2017, 9:55 a.m.
On 20/11/17 10:51, Roger Pau Monné wrote:
> Adding xen-devel, dropped it on my reply. 
> 
>     Replying from my phone, sorry for the formatting. 
> 
> 
>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>     <mailto:jgross@suse.com>> escribió:
> 
>         For PVH domains loading of the ACPI RSDP table is done via
>         allocating
>         a domain loader segment after having loaded the kernel. This
>         leads to
>         the RSDP table being loaded at an arbitrary guest address instead of
>         the architectural correct address just below 1MB. 
> 
> 
>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>     RSDP can be anywhere in memory, hence grub2 must already have an
>     alternative way of finding the RSDP apart from scanning the low 1MB.

The problem isn't grub2, but the loaded linux kernel. Without this
patch Linux won't find the RSDP when booted in a PVH domain via grub2.

I could modify grub2 even further to move the RSDP to the correct
address, but I think doing it correctly on Xen side is the better
option.


Juergen
Andrew Cooper - Nov. 20, 2017, 9:58 a.m.
On 20/11/2017 09:55, Juergen Gross wrote:
> On 20/11/17 10:51, Roger Pau Monné wrote:
>> Adding xen-devel, dropped it on my reply. 
>>
>>     Replying from my phone, sorry for the formatting. 
>>
>>
>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>     <mailto:jgross@suse.com>> escribió:
>>
>>         For PVH domains loading of the ACPI RSDP table is done via
>>         allocating
>>         a domain loader segment after having loaded the kernel. This
>>         leads to
>>         the RSDP table being loaded at an arbitrary guest address instead of
>>         the architectural correct address just below 1MB. 
>>
>>
>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>     alternative way of finding the RSDP apart from scanning the low 1MB.
> The problem isn't grub2, but the loaded linux kernel. Without this
> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>
> I could modify grub2 even further to move the RSDP to the correct
> address, but I think doing it correctly on Xen side is the better
> option.

Why?  The PVH info block contains a pointer directly to the RSDP, and
Linux should be following this rather than scanning for it using the
legacy method.

~Andrew
Juergen Gross - Nov. 20, 2017, 10:04 a.m.
On 20/11/17 10:58, Andrew Cooper wrote:
> On 20/11/2017 09:55, Juergen Gross wrote:
>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>> Adding xen-devel, dropped it on my reply. 
>>>
>>>     Replying from my phone, sorry for the formatting. 
>>>
>>>
>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>     <mailto:jgross@suse.com>> escribió:
>>>
>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>         allocating
>>>         a domain loader segment after having loaded the kernel. This
>>>         leads to
>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>         the architectural correct address just below 1MB. 
>>>
>>>
>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>> The problem isn't grub2, but the loaded linux kernel. Without this
>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>
>> I could modify grub2 even further to move the RSDP to the correct
>> address, but I think doing it correctly on Xen side is the better
>> option.
> 
> Why?  The PVH info block contains a pointer directly to the RSDP, and
> Linux should be following this rather than scanning for it using the
> legacy method.

Oh no, please not this discussion again.

We already had a very long discussion how to do PVH support in grub2,
and the outcome was to try to use the standard boot entry of the kernel
instead the PVH sepcific one.

The Linux kernel right now doesn't make use of the RSDP pointer in the
PVH info block, so I think we shouldn't change this when using grub2.


Juergen
Andrew Cooper - Nov. 20, 2017, 10:21 a.m.
On 20/11/17 10:04, Juergen Gross wrote:
> On 20/11/17 10:58, Andrew Cooper wrote:
>> On 20/11/2017 09:55, Juergen Gross wrote:
>>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>>> Adding xen-devel, dropped it on my reply. 
>>>>
>>>>     Replying from my phone, sorry for the formatting. 
>>>>
>>>>
>>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>>     <mailto:jgross@suse.com>> escribió:
>>>>
>>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>>         allocating
>>>>         a domain loader segment after having loaded the kernel. This
>>>>         leads to
>>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>>         the architectural correct address just below 1MB. 
>>>>
>>>>
>>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>>> The problem isn't grub2, but the loaded linux kernel. Without this
>>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>>
>>> I could modify grub2 even further to move the RSDP to the correct
>>> address, but I think doing it correctly on Xen side is the better
>>> option.
>> Why?  The PVH info block contains a pointer directly to the RSDP, and
>> Linux should be following this rather than scanning for it using the
>> legacy method.
> Oh no, please not this discussion again.
>
> We already had a very long discussion how to do PVH support in grub2,
> and the outcome was to try to use the standard boot entry of the kernel
> instead the PVH sepcific one.
>
> The Linux kernel right now doesn't make use of the RSDP pointer in the
> PVH info block, so I think we shouldn't change this when using grub2.

I clearly missed the previous discussion, and I don't advocate using yet
another PVH-specific entry point, but how does Linux cope in other
non-BIOS environments?  Does it genuinely rely exclusively on the legacy
mechanism?

~Andrew
Juergen Gross - Nov. 20, 2017, 10:43 a.m.
On 20/11/17 11:21, Andrew Cooper wrote:
> On 20/11/17 10:04, Juergen Gross wrote:
>> On 20/11/17 10:58, Andrew Cooper wrote:
>>> On 20/11/2017 09:55, Juergen Gross wrote:
>>>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>>>> Adding xen-devel, dropped it on my reply. 
>>>>>
>>>>>     Replying from my phone, sorry for the formatting. 
>>>>>
>>>>>
>>>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>>>     <mailto:jgross@suse.com>> escribió:
>>>>>
>>>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>>>         allocating
>>>>>         a domain loader segment after having loaded the kernel. This
>>>>>         leads to
>>>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>>>         the architectural correct address just below 1MB. 
>>>>>
>>>>>
>>>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>>>> The problem isn't grub2, but the loaded linux kernel. Without this
>>>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>>>
>>>> I could modify grub2 even further to move the RSDP to the correct
>>>> address, but I think doing it correctly on Xen side is the better
>>>> option.
>>> Why?  The PVH info block contains a pointer directly to the RSDP, and
>>> Linux should be following this rather than scanning for it using the
>>> legacy method.
>> Oh no, please not this discussion again.
>>
>> We already had a very long discussion how to do PVH support in grub2,
>> and the outcome was to try to use the standard boot entry of the kernel
>> instead the PVH sepcific one.
>>
>> The Linux kernel right now doesn't make use of the RSDP pointer in the
>> PVH info block, so I think we shouldn't change this when using grub2.
> 
> I clearly missed the previous discussion, and I don't advocate using yet
> another PVH-specific entry point, but how does Linux cope in other
> non-BIOS environments?  Does it genuinely rely exclusively on the legacy
> mechanism?

Looking at the code I think so, yes. Maybe there are cases where no RSDP
is needed, but in the grub2/PVH case we need it to distinguish PVH from
HVM.

Juergen
Andrew Cooper - Nov. 20, 2017, 10:57 a.m.
On 20/11/17 10:43, Juergen Gross wrote:
> On 20/11/17 11:21, Andrew Cooper wrote:
>> On 20/11/17 10:04, Juergen Gross wrote:
>>> On 20/11/17 10:58, Andrew Cooper wrote:
>>>> On 20/11/2017 09:55, Juergen Gross wrote:
>>>>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>>>>> Adding xen-devel, dropped it on my reply. 
>>>>>>
>>>>>>     Replying from my phone, sorry for the formatting. 
>>>>>>
>>>>>>
>>>>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>>>>     <mailto:jgross@suse.com>> escribió:
>>>>>>
>>>>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>>>>         allocating
>>>>>>         a domain loader segment after having loaded the kernel. This
>>>>>>         leads to
>>>>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>>>>         the architectural correct address just below 1MB. 
>>>>>>
>>>>>>
>>>>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>>>>> The problem isn't grub2, but the loaded linux kernel. Without this
>>>>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>>>>
>>>>> I could modify grub2 even further to move the RSDP to the correct
>>>>> address, but I think doing it correctly on Xen side is the better
>>>>> option.
>>>> Why?  The PVH info block contains a pointer directly to the RSDP, and
>>>> Linux should be following this rather than scanning for it using the
>>>> legacy method.
>>> Oh no, please not this discussion again.
>>>
>>> We already had a very long discussion how to do PVH support in grub2,
>>> and the outcome was to try to use the standard boot entry of the kernel
>>> instead the PVH sepcific one.
>>>
>>> The Linux kernel right now doesn't make use of the RSDP pointer in the
>>> PVH info block, so I think we shouldn't change this when using grub2.
>> I clearly missed the previous discussion, and I don't advocate using yet
>> another PVH-specific entry point, but how does Linux cope in other
>> non-BIOS environments?  Does it genuinely rely exclusively on the legacy
>> mechanism?
> Looking at the code I think so, yes. Maybe there are cases where no RSDP
> is needed, but in the grub2/PVH case we need it to distinguish PVH from
> HVM.

In which case, being a Linux limitation, I think it is wrong to
unilaterally apply this restriction to all other PVH guests.

Doing this in grub seems like the more appropriate place IMO.

~Andrew
Juergen Gross - Nov. 20, 2017, 11:20 a.m.
On 20/11/17 11:57, Andrew Cooper wrote:
> On 20/11/17 10:43, Juergen Gross wrote:
>> On 20/11/17 11:21, Andrew Cooper wrote:
>>> On 20/11/17 10:04, Juergen Gross wrote:
>>>> On 20/11/17 10:58, Andrew Cooper wrote:
>>>>> On 20/11/2017 09:55, Juergen Gross wrote:
>>>>>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>>>>>> Adding xen-devel, dropped it on my reply. 
>>>>>>>
>>>>>>>     Replying from my phone, sorry for the formatting. 
>>>>>>>
>>>>>>>
>>>>>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>>>>>     <mailto:jgross@suse.com>> escribió:
>>>>>>>
>>>>>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>>>>>         allocating
>>>>>>>         a domain loader segment after having loaded the kernel. This
>>>>>>>         leads to
>>>>>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>>>>>         the architectural correct address just below 1MB. 
>>>>>>>
>>>>>>>
>>>>>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>>>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>>>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>>>>>> The problem isn't grub2, but the loaded linux kernel. Without this
>>>>>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>>>>>
>>>>>> I could modify grub2 even further to move the RSDP to the correct
>>>>>> address, but I think doing it correctly on Xen side is the better
>>>>>> option.
>>>>> Why?  The PVH info block contains a pointer directly to the RSDP, and
>>>>> Linux should be following this rather than scanning for it using the
>>>>> legacy method.
>>>> Oh no, please not this discussion again.
>>>>
>>>> We already had a very long discussion how to do PVH support in grub2,
>>>> and the outcome was to try to use the standard boot entry of the kernel
>>>> instead the PVH sepcific one.
>>>>
>>>> The Linux kernel right now doesn't make use of the RSDP pointer in the
>>>> PVH info block, so I think we shouldn't change this when using grub2.
>>> I clearly missed the previous discussion, and I don't advocate using yet
>>> another PVH-specific entry point, but how does Linux cope in other
>>> non-BIOS environments?  Does it genuinely rely exclusively on the legacy
>>> mechanism?
>> Looking at the code I think so, yes. Maybe there are cases where no RSDP
>> is needed, but in the grub2/PVH case we need it to distinguish PVH from
>> HVM.
> 
> In which case, being a Linux limitation, I think it is wrong to
> unilaterally apply this restriction to all other PVH guests.

Which restriction? I'm loading the RSDP table to its architectural
correct addres if possible, otherwise it will be loaded to the same
address as without my patch. So I'm not adding a restriction, but
removing one.

> Doing this in grub seems like the more appropriate place IMO.

I don't think so.


Juergen
Jan Beulich - Nov. 20, 2017, 11:50 a.m.
>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
> Which restriction? I'm loading the RSDP table to its architectural
> correct addres if possible, otherwise it will be loaded to the same
> address as without my patch. So I'm not adding a restriction, but
> removing one.

What is "architecturally correct" in PVH can't be read out of
specs other than what we write down. When there's no BIOS,
placing anything right below the 1Mb boundary is at least
bogus.

As to the prior grub2 discussion you refer to - just like Andrew
I don't think I was really involved here. If it resulted in any
decisions affecting the PVH ABI, I think it would be a good idea
to summarize the outcome, and perhaps even submit a patch
adding respective documentation (e.g. by way of comment in
public headers). That'll then allow non-grub Xen folks (like
Andrew and me) to see what you're intending to do (and of
course there would be the risk of someone disagreeing with
what you had come up with while discussing this on the grub
side).

Jan
Boris Ostrovsky - Nov. 20, 2017, 1:51 p.m.
On 11/20/2017 06:20 AM, Juergen Gross wrote:
> On 20/11/17 11:57, Andrew Cooper wrote:
>> On 20/11/17 10:43, Juergen Gross wrote:
>>> On 20/11/17 11:21, Andrew Cooper wrote:
>>>> On 20/11/17 10:04, Juergen Gross wrote:
>>>>> On 20/11/17 10:58, Andrew Cooper wrote:
>>>>>> On 20/11/2017 09:55, Juergen Gross wrote:
>>>>>>> On 20/11/17 10:51, Roger Pau Monné wrote:
>>>>>>>> Adding xen-devel, dropped it on my reply. 
>>>>>>>>
>>>>>>>>     Replying from my phone, sorry for the formatting. 
>>>>>>>>
>>>>>>>>
>>>>>>>>     El 20 nov. 2017 9:35, "Juergen Gross" <jgross@suse.com
>>>>>>>>     <mailto:jgross@suse.com>> escribió:
>>>>>>>>
>>>>>>>>         For PVH domains loading of the ACPI RSDP table is done via
>>>>>>>>         allocating
>>>>>>>>         a domain loader segment after having loaded the kernel. This
>>>>>>>>         leads to
>>>>>>>>         the RSDP table being loaded at an arbitrary guest address instead of
>>>>>>>>         the architectural correct address just below 1MB. 
>>>>>>>>
>>>>>>>>
>>>>>>>>     AFAIK this is only true for legacy BIOS boot, when using UEFI the
>>>>>>>>     RSDP can be anywhere in memory, hence grub2 must already have an
>>>>>>>>     alternative way of finding the RSDP apart from scanning the low 1MB.
>>>>>>> The problem isn't grub2, but the loaded linux kernel. Without this
>>>>>>> patch Linux won't find the RSDP when booted in a PVH domain via grub2.
>>>>>>>
>>>>>>> I could modify grub2 even further to move the RSDP to the correct
>>>>>>> address, but I think doing it correctly on Xen side is the better
>>>>>>> option.
>>>>>> Why?  The PVH info block contains a pointer directly to the RSDP, and
>>>>>> Linux should be following this rather than scanning for it using the
>>>>>> legacy method.
>>>>> Oh no, please not this discussion again.
>>>>>
>>>>> We already had a very long discussion how to do PVH support in grub2,
>>>>> and the outcome was to try to use the standard boot entry of the kernel
>>>>> instead the PVH sepcific one.
>>>>>
>>>>> The Linux kernel right now doesn't make use of the RSDP pointer in the
>>>>> PVH info block, so I think we shouldn't change this when using grub2.


As I mentioned in the other thread --- it will when we get to dom0 support.

>>>> I clearly missed the previous discussion, and I don't advocate using yet
>>>> another PVH-specific entry point, but how does Linux cope in other
>>>> non-BIOS environments?  Does it genuinely rely exclusively on the legacy
>>>> mechanism?

FYI (and this is not directly related to this thread) there was a
discussion with KVM engineers and they may be interested in doing a
PVH-like boot, using Xen PVH entry point.

Adding Maran who is looking at this.

-boris

>>> Looking at the code I think so, yes. Maybe there are cases where no RSDP
>>> is needed, but in the grub2/PVH case we need it to distinguish PVH from
>>> HVM.
>> In which case, being a Linux limitation, I think it is wrong to
>> unilaterally apply this restriction to all other PVH guests.
> Which restriction? I'm loading the RSDP table to its architectural
> correct addres if possible, otherwise it will be loaded to the same
> address as without my patch. So I'm not adding a restriction, but
> removing one.
>
>> Doing this in grub seems like the more appropriate place IMO.
> I don't think so.
>
>
> Juergen
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
Boris Ostrovsky - Nov. 20, 2017, 1:56 p.m.
On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>> Which restriction? I'm loading the RSDP table to its architectural
>> correct addres if possible, otherwise it will be loaded to the same
>> address as without my patch. So I'm not adding a restriction, but
>> removing one.
> What is "architecturally correct" in PVH can't be read out of
> specs other than what we write down. When there's no BIOS,
> placing anything right below the 1Mb boundary is at least
> bogus.

Unless it's a UEFI boot -- where else would you put it? Aren't these two
(UEFI and non-UEFI) the only two options that the ACPI spec provides?

-boris

>
> As to the prior grub2 discussion you refer to - just like Andrew
> I don't think I was really involved here. If it resulted in any
> decisions affecting the PVH ABI, I think it would be a good idea
> to summarize the outcome, and perhaps even submit a patch
> adding respective documentation (e.g. by way of comment in
> public headers). That'll then allow non-grub Xen folks (like
> Andrew and me) to see what you're intending to do (and of
> course there would be the risk of someone disagreeing with
> what you had come up with while discussing this on the grub
> side).
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
Jan Beulich - Nov. 20, 2017, 2:11 p.m.
>>> On 20.11.17 at 14:56, <boris.ostrovsky@oracle.com> wrote:
> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>> Which restriction? I'm loading the RSDP table to its architectural
>>> correct addres if possible, otherwise it will be loaded to the same
>>> address as without my patch. So I'm not adding a restriction, but
>>> removing one.
>> What is "architecturally correct" in PVH can't be read out of
>> specs other than what we write down. When there's no BIOS,
>> placing anything right below the 1Mb boundary is at least
>> bogus.
> 
> Unless it's a UEFI boot -- where else would you put it? Aren't these two
> (UEFI and non-UEFI) the only two options that the ACPI spec provides?

Of course - we can't really expect them to cater for something like
PVH. But this also means we can't always use the spec as reference
point here.

Jan
Juergen Gross - Nov. 20, 2017, 2:14 p.m.
On 20/11/17 14:56, Boris Ostrovsky wrote:
> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>> Which restriction? I'm loading the RSDP table to its architectural
>>> correct addres if possible, otherwise it will be loaded to the same
>>> address as without my patch. So I'm not adding a restriction, but
>>> removing one.
>> What is "architecturally correct" in PVH can't be read out of
>> specs other than what we write down. When there's no BIOS,
>> placing anything right below the 1Mb boundary is at least
>> bogus.
> 
> Unless it's a UEFI boot -- where else would you put it? Aren't these two
> (UEFI and non-UEFI) the only two options that the ACPI spec provides?

I think Jan is right: for PVH its _our_ job to define the correct
placement. Which still can be the same as in the BIOS case, making
it easier to adapt any guest systems.

So I'd say: The RSDP address in PVH case is passed in the PVH start
info block to the guest. In case there is no conflict with the
physical load address of the guest kernel the preferred address of
the RSDP is right below the 1MB boundary.

Would this wording be okay?


Juergen
Jan Beulich - Nov. 20, 2017, 2:20 p.m.
>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
> On 20/11/17 14:56, Boris Ostrovsky wrote:
>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>> correct addres if possible, otherwise it will be loaded to the same
>>>> address as without my patch. So I'm not adding a restriction, but
>>>> removing one.
>>> What is "architecturally correct" in PVH can't be read out of
>>> specs other than what we write down. When there's no BIOS,
>>> placing anything right below the 1Mb boundary is at least
>>> bogus.
>> 
>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
> 
> I think Jan is right: for PVH its _our_ job to define the correct
> placement. Which still can be the same as in the BIOS case, making
> it easier to adapt any guest systems.
> 
> So I'd say: The RSDP address in PVH case is passed in the PVH start
> info block to the guest. In case there is no conflict with the
> physical load address of the guest kernel the preferred address of
> the RSDP is right below the 1MB boundary.
> 
> Would this wording be okay?

To be honest (and in case it wasn't sufficiently clear form my
earlier replies) - I'm pretty much opposed to this below-1Mb thing.
There ought to be just plain RAM there for PVH.

Jan
Boris Ostrovsky - Nov. 20, 2017, 2:25 p.m.
On 11/20/2017 09:14 AM, Juergen Gross wrote:
> On 20/11/17 14:56, Boris Ostrovsky wrote:
>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>> correct addres if possible, otherwise it will be loaded to the same
>>>> address as without my patch. So I'm not adding a restriction, but
>>>> removing one.
>>> What is "architecturally correct" in PVH can't be read out of
>>> specs other than what we write down. When there's no BIOS,
>>> placing anything right below the 1Mb boundary is at least
>>> bogus.
>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
> I think Jan is right: for PVH its _our_ job to define the correct
> placement. 

Yes, and if it is placed in a non-standard location then the guest will
have to deal with it in a non-standard way. Which we can in Linux by
setting acpi_rsdp pointer in the special PVH entry point, before jumping
to Linux "standard" entry --- startup_{32|64}().

But if your goal is to avoid that special entry point (and thus not set
acpi_rsdp) then how do you expect kernel to find RSDP?

> Which still can be the same as in the BIOS case, making
> it easier to adapt any guest systems.
>
> So I'd say: The RSDP address in PVH case is passed in the PVH start
> info block to the guest. In case there is no conflict with the
> physical load address of the guest kernel the preferred address of
> the RSDP is right below the 1MB boundary.

And what do we do if there *is* a conflict?


-boris
Andrew Cooper - Nov. 20, 2017, 2:36 p.m.
On 20/11/17 14:25, Boris Ostrovsky wrote:
> On 11/20/2017 09:14 AM, Juergen Gross wrote:
>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>> removing one.
>>>> What is "architecturally correct" in PVH can't be read out of
>>>> specs other than what we write down. When there's no BIOS,
>>>> placing anything right below the 1Mb boundary is at least
>>>> bogus.
>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>> I think Jan is right: for PVH its _our_ job to define the correct
>> placement. 
> Yes, and if it is placed in a non-standard location then the guest will
> have to deal with it in a non-standard way. Which we can in Linux by
> setting acpi_rsdp pointer in the special PVH entry point, before jumping
> to Linux "standard" entry --- startup_{32|64}().
>
> But if your goal is to avoid that special entry point (and thus not set
> acpi_rsdp) then how do you expect kernel to find RSDP?
>
>> Which still can be the same as in the BIOS case, making
>> it easier to adapt any guest systems.
>>
>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>> info block to the guest. In case there is no conflict with the
>> physical load address of the guest kernel the preferred address of
>> the RSDP is right below the 1MB boundary.
> And what do we do if there *is* a conflict?

As a random alternative, what about writing up an RSDP reference into
the zeropage?

I'd be surprised if Xen PVH is the only software in this position of
trying to use the native paths wherever possible, and not retaining
legacy ideas of a PC system.

~Andrew
Boris Ostrovsky - Nov. 20, 2017, 2:52 p.m.
On 11/20/2017 09:36 AM, Andrew Cooper wrote:
> On 20/11/17 14:25, Boris Ostrovsky wrote:
>> On 11/20/2017 09:14 AM, Juergen Gross wrote:
>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>> removing one.
>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>> specs other than what we write down. When there's no BIOS,
>>>>> placing anything right below the 1Mb boundary is at least
>>>>> bogus.
>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>> I think Jan is right: for PVH its _our_ job to define the correct
>>> placement. 
>> Yes, and if it is placed in a non-standard location then the guest will
>> have to deal with it in a non-standard way. Which we can in Linux by
>> setting acpi_rsdp pointer in the special PVH entry point, before jumping
>> to Linux "standard" entry --- startup_{32|64}().
>>
>> But if your goal is to avoid that special entry point (and thus not set
>> acpi_rsdp) then how do you expect kernel to find RSDP?
>>
>>> Which still can be the same as in the BIOS case, making
>>> it easier to adapt any guest systems.
>>>
>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>> info block to the guest. In case there is no conflict with the
>>> physical load address of the guest kernel the preferred address of
>>> the RSDP is right below the 1MB boundary.
>> And what do we do if there *is* a conflict?
> As a random alternative, what about writing up an RSDP reference into
> the zeropage?

zeropage is an ABI with no provision for ACPI.

>
> I'd be surprised if Xen PVH is the only software in this position of
> trying to use the native paths wherever possible, and not retaining
> legacy ideas of a PC system.


I am not aware of any other guests that completely avoid legacy stuff.
But as I mentioned in another reply, KVM people may be looking at this
as well now.

-boris
Juergen Gross - Nov. 20, 2017, 3:24 p.m.
On 20/11/17 15:20, Jan Beulich wrote:
>>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>> removing one.
>>>> What is "architecturally correct" in PVH can't be read out of
>>>> specs other than what we write down. When there's no BIOS,
>>>> placing anything right below the 1Mb boundary is at least
>>>> bogus.
>>>
>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>
>> I think Jan is right: for PVH its _our_ job to define the correct
>> placement. Which still can be the same as in the BIOS case, making
>> it easier to adapt any guest systems.
>>
>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>> info block to the guest. In case there is no conflict with the
>> physical load address of the guest kernel the preferred address of
>> the RSDP is right below the 1MB boundary.
>>
>> Would this wording be okay?
> 
> To be honest (and in case it wasn't sufficiently clear form my
> earlier replies) - I'm pretty much opposed to this below-1Mb thing.
> There ought to be just plain RAM there for PVH.

So without my patch the RSDP table is loaded e.g. at about 6.5MB when
I'm using grub2 (the loaded grub image is about 5.5MB in size and it
is being loaded at 1MB).

When I'm using the PVH Linux kernel directly RSDP is just below 1MB
due to pure luck (the bzImage loader is still using the PV specific
ELF notes and this results in the loader believing RSDP is loadable
at this address, which is true, but the tests used to come to this
conclusion are just not applicable for PVH).

So in your opinion we should revoke the PVH support from Xen 4.10,
Linux and maybe BSD because RSDP is loaded in middle of RAM of the
guest? Doing it in a proper way you are outlining above would render
current PVH guests unusable.


Juergen
Juergen Gross - Nov. 20, 2017, 3:27 p.m.
On 20/11/17 15:25, Boris Ostrovsky wrote:
> On 11/20/2017 09:14 AM, Juergen Gross wrote:
>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>> removing one.
>>>> What is "architecturally correct" in PVH can't be read out of
>>>> specs other than what we write down. When there's no BIOS,
>>>> placing anything right below the 1Mb boundary is at least
>>>> bogus.
>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>> I think Jan is right: for PVH its _our_ job to define the correct
>> placement. 
> 
> Yes, and if it is placed in a non-standard location then the guest will
> have to deal with it in a non-standard way. Which we can in Linux by
> setting acpi_rsdp pointer in the special PVH entry point, before jumping
> to Linux "standard" entry --- startup_{32|64}().
> 
> But if your goal is to avoid that special entry point (and thus not set
> acpi_rsdp) then how do you expect kernel to find RSDP?
> 
>> Which still can be the same as in the BIOS case, making
>> it easier to adapt any guest systems.
>>
>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>> info block to the guest. In case there is no conflict with the
>> physical load address of the guest kernel the preferred address of
>> the RSDP is right below the 1MB boundary.
> 
> And what do we do if there *is* a conflict?

Either as without my patch: use the first available free memory page.

Or: add a domain config parameter for specifying the RSDP address
(e.g. default: as today, top: end of RAM, legacy: just below 1MB, or
a specific value) and fail to load in case of a conflict.


Juergen
Jan Beulich - Nov. 20, 2017, 4:14 p.m.
>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
> On 20/11/17 15:20, Jan Beulich wrote:
>>>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>> removing one.
>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>> specs other than what we write down. When there's no BIOS,
>>>>> placing anything right below the 1Mb boundary is at least
>>>>> bogus.
>>>>
>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>>
>>> I think Jan is right: for PVH its _our_ job to define the correct
>>> placement. Which still can be the same as in the BIOS case, making
>>> it easier to adapt any guest systems.
>>>
>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>> info block to the guest. In case there is no conflict with the
>>> physical load address of the guest kernel the preferred address of
>>> the RSDP is right below the 1MB boundary.
>>>
>>> Would this wording be okay?
>> 
>> To be honest (and in case it wasn't sufficiently clear form my
>> earlier replies) - I'm pretty much opposed to this below-1Mb thing.
>> There ought to be just plain RAM there for PVH.
> 
> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
> is being loaded at 1MB).
> 
> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
> due to pure luck (the bzImage loader is still using the PV specific
> ELF notes and this results in the loader believing RSDP is loadable
> at this address, which is true, but the tests used to come to this
> conclusion are just not applicable for PVH).
> 
> So in your opinion we should revoke the PVH support from Xen 4.10,
> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
> guest?

So what's wrong with it being put wherever the next free memory
location is being determined to be by the loader, just like is being
done for other information, including modules (if any)?

> Doing it in a proper way you are outlining above would render
> current PVH guests unusable.

I'm afraid I don't understand which outline of mine you refer to.
Iirc all I said was that placing it below 1Mb is bogus. Top of RAM
(as I think I saw being mentioned elsewhere) probably isn't much
better. Special locations should be used only if there's really no
other way to convey information.

Jan
Boris Ostrovsky - Nov. 20, 2017, 4:14 p.m.
On 11/20/2017 10:27 AM, Juergen Gross wrote:
> On 20/11/17 15:25, Boris Ostrovsky wrote:
>> On 11/20/2017 09:14 AM, Juergen Gross wrote:
>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>> removing one.
>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>> specs other than what we write down. When there's no BIOS,
>>>>> placing anything right below the 1Mb boundary is at least
>>>>> bogus.
>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>> I think Jan is right: for PVH its _our_ job to define the correct
>>> placement. 
>> Yes, and if it is placed in a non-standard location then the guest will
>> have to deal with it in a non-standard way. Which we can in Linux by
>> setting acpi_rsdp pointer in the special PVH entry point, before jumping
>> to Linux "standard" entry --- startup_{32|64}().
>>
>> But if your goal is to avoid that special entry point (and thus not set
>> acpi_rsdp) then how do you expect kernel to find RSDP?
>>
>>> Which still can be the same as in the BIOS case, making
>>> it easier to adapt any guest systems.
>>>
>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>> info block to the guest. In case there is no conflict with the
>>> physical load address of the guest kernel the preferred address of
>>> the RSDP is right below the 1MB boundary.
>> And what do we do if there *is* a conflict?
> Either as without my patch: use the first available free memory page.
>
> Or: add a domain config parameter for specifying the RSDP address
> (e.g. default: as today, top: end of RAM, legacy: just below 1MB, or
> a specific value) and fail to load in case of a conflict.

This feels like a band-aid to work around a problem that we want to fix
in the long term anyway.

What could cause grub2 to fail to find space for the pointer in the
first page? Will we ever have anything in EBDA (which is one of the
possible RSDP locations)?

-boris
Jan Beulich - Nov. 20, 2017, 4:26 p.m.
>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
> What could cause grub2 to fail to find space for the pointer in the
> first page? Will we ever have anything in EBDA (which is one of the
> possible RSDP locations)?

Well, the EBDA (see the B in its name) is again something that's
meaningless without there being a BIOS.

Jan
Boris Ostrovsky - Nov. 20, 2017, 4:28 p.m.
On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>> What could cause grub2 to fail to find space for the pointer in the
>> first page? Will we ever have anything in EBDA (which is one of the
>> possible RSDP locations)?
> Well, the EBDA (see the B in its name) is again something that's
> meaningless without there being a BIOS.

Exactly. So it should always be available for grub to copy the pointer
there.

-boris
Jan Beulich - Nov. 20, 2017, 4:43 p.m.
>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>> What could cause grub2 to fail to find space for the pointer in the
>>> first page? Will we ever have anything in EBDA (which is one of the
>>> possible RSDP locations)?
>> Well, the EBDA (see the B in its name) is again something that's
>> meaningless without there being a BIOS.
> 
> Exactly. So it should always be available for grub to copy the pointer
> there.

But what use would it be if grub copied it there? It just shouldn't
be there, neither before nor after grub (just like grub doesn't
introduce firmware into the system).

Jan
Boris Ostrovsky - Nov. 20, 2017, 4:59 p.m.
On 11/20/2017 11:43 AM, Jan Beulich wrote:
>>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>>> What could cause grub2 to fail to find space for the pointer in the
>>>> first page? Will we ever have anything in EBDA (which is one of the
>>>> possible RSDP locations)?
>>> Well, the EBDA (see the B in its name) is again something that's
>>> meaningless without there being a BIOS.
>> Exactly. So it should always be available for grub to copy the pointer
>> there.
> But what use would it be if grub copied it there? It just shouldn't
> be there, neither before nor after grub (just like grub doesn't
> introduce firmware into the system).

So that the guest can find it using standard methods. If Xen can't
guarantee ACPI-compliant placement of the pointer then someone has to
help the guest find it in the expected place. We can do it with a
dedicated entry point by setting the pointer explicitly (although
admittedly this is not done correctly now) or we need to have firmware
(grub2) place it in the "right" location.

(It does look a bit hacky though)

-boris
Juergen Gross - Nov. 20, 2017, 6:28 p.m.
On 20/11/17 17:14, Jan Beulich wrote:
>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>> On 20/11/17 15:20, Jan Beulich wrote:
>>>>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
>>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>>> removing one.
>>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>>> specs other than what we write down. When there's no BIOS,
>>>>>> placing anything right below the 1Mb boundary is at least
>>>>>> bogus.
>>>>>
>>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>>>
>>>> I think Jan is right: for PVH its _our_ job to define the correct
>>>> placement. Which still can be the same as in the BIOS case, making
>>>> it easier to adapt any guest systems.
>>>>
>>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>>> info block to the guest. In case there is no conflict with the
>>>> physical load address of the guest kernel the preferred address of
>>>> the RSDP is right below the 1MB boundary.
>>>>
>>>> Would this wording be okay?
>>>
>>> To be honest (and in case it wasn't sufficiently clear form my
>>> earlier replies) - I'm pretty much opposed to this below-1Mb thing.
>>> There ought to be just plain RAM there for PVH.
>>
>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>> is being loaded at 1MB).
>>
>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>> due to pure luck (the bzImage loader is still using the PV specific
>> ELF notes and this results in the loader believing RSDP is loadable
>> at this address, which is true, but the tests used to come to this
>> conclusion are just not applicable for PVH).
>>
>> So in your opinion we should revoke the PVH support from Xen 4.10,
>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>> guest?
> 
> So what's wrong with it being put wherever the next free memory
> location is being determined to be by the loader, just like is being
> done for other information, including modules (if any)?

The RSDP table is marked as "Reserved" in the memory map. So putting it
somewhere in the middle of the guest's memory will force the guest to
use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
this problem, as we've been hit by the very same in HVM guests before
causing quite measurable performance drops.

So I'd rather put it in the first MB as most kernels have to deal with
small pages at beginning of RAM today. An alternative would be to put
it just below 4GB where e.g. the console and Xenstore page are located.

> 
>> Doing it in a proper way you are outlining above would render
>> current PVH guests unusable.
> 
> I'm afraid I don't understand which outline of mine you refer to.
> Iirc all I said was that placing it below 1Mb is bogus. Top of RAM
> (as I think I saw being mentioned elsewhere) probably isn't much
> better. Special locations should be used only if there's really no
> other way to convey information.

I believe it is better to define where reserved memory areas are located
in order to make it easier for a kernel to deal with the consequences. A
location like "somewhere in memory, either just after the loaded kernel,
or after the boot loader which was loaded previously" seems not to be a
proper solution. Especially as a change in size of the boot loader could
make it impossible to load the kernel at the desired location, as this
would contradict the information returned by the hypervisor in the
memory map.


Juergen
Juergen Gross - Nov. 20, 2017, 6:29 p.m.
On 20/11/17 17:14, Boris Ostrovsky wrote:
> On 11/20/2017 10:27 AM, Juergen Gross wrote:
>> On 20/11/17 15:25, Boris Ostrovsky wrote:
>>> On 11/20/2017 09:14 AM, Juergen Gross wrote:
>>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>>> removing one.
>>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>>> specs other than what we write down. When there's no BIOS,
>>>>>> placing anything right below the 1Mb boundary is at least
>>>>>> bogus.
>>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>>> I think Jan is right: for PVH its _our_ job to define the correct
>>>> placement. 
>>> Yes, and if it is placed in a non-standard location then the guest will
>>> have to deal with it in a non-standard way. Which we can in Linux by
>>> setting acpi_rsdp pointer in the special PVH entry point, before jumping
>>> to Linux "standard" entry --- startup_{32|64}().
>>>
>>> But if your goal is to avoid that special entry point (and thus not set
>>> acpi_rsdp) then how do you expect kernel to find RSDP?
>>>
>>>> Which still can be the same as in the BIOS case, making
>>>> it easier to adapt any guest systems.
>>>>
>>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>>> info block to the guest. In case there is no conflict with the
>>>> physical load address of the guest kernel the preferred address of
>>>> the RSDP is right below the 1MB boundary.
>>> And what do we do if there *is* a conflict?
>> Either as without my patch: use the first available free memory page.
>>
>> Or: add a domain config parameter for specifying the RSDP address
>> (e.g. default: as today, top: end of RAM, legacy: just below 1MB, or
>> a specific value) and fail to load in case of a conflict.
> 
> This feels like a band-aid to work around a problem that we want to fix
> in the long term anyway.
> 
> What could cause grub2 to fail to find space for the pointer in the
> first page? Will we ever have anything in EBDA (which is one of the
> possible RSDP locations)?

This isn't something grub2 has to deal with. The RSDP is in a reserved
area of the memory, so it can't be relocated by grub2.


Juergen
Jan Beulich - Nov. 21, 2017, 7:44 a.m.
>>> On 20.11.17 at 17:59, <boris.ostrovsky@oracle.com> wrote:
> On 11/20/2017 11:43 AM, Jan Beulich wrote:
>>>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
>>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>>>> What could cause grub2 to fail to find space for the pointer in the
>>>>> first page? Will we ever have anything in EBDA (which is one of the
>>>>> possible RSDP locations)?
>>>> Well, the EBDA (see the B in its name) is again something that's
>>>> meaningless without there being a BIOS.
>>> Exactly. So it should always be available for grub to copy the pointer
>>> there.
>> But what use would it be if grub copied it there? It just shouldn't
>> be there, neither before nor after grub (just like grub doesn't
>> introduce firmware into the system).
> 
> So that the guest can find it using standard methods. If Xen can't
> guarantee ACPI-compliant placement of the pointer then someone has to
> help the guest find it in the expected place. We can do it with a
> dedicated entry point by setting the pointer explicitly (although
> admittedly this is not done correctly now) or we need to have firmware
> (grub2) place it in the "right" location.
> 
> (It does look a bit hacky though)

Indeed. Of course ACPI without any actual firmware is sort of odd,
too. As to dedicated entry point and its alternatives: Xen itself
tells grub (aiui we're talking about a flavor of it running PVH itself)
where the RSDP is. Why can't grub forward that information in a
suitable way (e.g. via a new tag, or - for Linux - as a new entry
in the Linux boot header)?

Jan
Jan Beulich - Nov. 21, 2017, 7:50 a.m.
>>> On 20.11.17 at 19:28, <jgross@suse.com> wrote:
> On 20/11/17 17:14, Jan Beulich wrote:
>>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>>> On 20/11/17 15:20, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
>>>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>>>> removing one.
>>>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>>>> specs other than what we write down. When there's no BIOS,
>>>>>>> placing anything right below the 1Mb boundary is at least
>>>>>>> bogus.
>>>>>>
>>>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>>>>
>>>>> I think Jan is right: for PVH its _our_ job to define the correct
>>>>> placement. Which still can be the same as in the BIOS case, making
>>>>> it easier to adapt any guest systems.
>>>>>
>>>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>>>> info block to the guest. In case there is no conflict with the
>>>>> physical load address of the guest kernel the preferred address of
>>>>> the RSDP is right below the 1MB boundary.
>>>>>
>>>>> Would this wording be okay?
>>>>
>>>> To be honest (and in case it wasn't sufficiently clear form my
>>>> earlier replies) - I'm pretty much opposed to this below-1Mb thing.
>>>> There ought to be just plain RAM there for PVH.
>>>
>>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>>> is being loaded at 1MB).
>>>
>>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>>> due to pure luck (the bzImage loader is still using the PV specific
>>> ELF notes and this results in the loader believing RSDP is loadable
>>> at this address, which is true, but the tests used to come to this
>>> conclusion are just not applicable for PVH).
>>>
>>> So in your opinion we should revoke the PVH support from Xen 4.10,
>>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>>> guest?
>> 
>> So what's wrong with it being put wherever the next free memory
>> location is being determined to be by the loader, just like is being
>> done for other information, including modules (if any)?
> 
> The RSDP table is marked as "Reserved" in the memory map. So putting it
> somewhere in the middle of the guest's memory will force the guest to
> use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
> this problem, as we've been hit by the very same in HVM guests before
> causing quite measurable performance drops.

This is a valid point.

> So I'd rather put it in the first MB as most kernels have to deal with
> small pages at beginning of RAM today. An alternative would be to put
> it just below 4GB where e.g. the console and Xenstore page are located.

Putting it in the first Mb implies that mappings there will continue to
be 4k ones. I can't, however, see why for PVH that should be
necessary: There's no BIOS and nothing legacy that needs to live
there, so other than HVM it could benefit from using a 1Gb mapping
even at address zero (even if this might be something that can't
be achieved right away). So yes, if anything, the allocation should
be made top down starting from 4Gb. Otoh, I don't see a strict
need for this area to live below 4Gb in the first place.

>>> Doing it in a proper way you are outlining above would render
>>> current PVH guests unusable.
>> 
>> I'm afraid I don't understand which outline of mine you refer to.
>> Iirc all I said was that placing it below 1Mb is bogus. Top of RAM
>> (as I think I saw being mentioned elsewhere) probably isn't much
>> better. Special locations should be used only if there's really no
>> other way to convey information.
> 
> I believe it is better to define where reserved memory areas are located
> in order to make it easier for a kernel to deal with the consequences. A
> location like "somewhere in memory, either just after the loaded kernel,
> or after the boot loader which was loaded previously" seems not to be a
> proper solution. Especially as a change in size of the boot loader could
> make it impossible to load the kernel at the desired location, as this
> would contradict the information returned by the hypervisor in the
> memory map.

Well, you imply here that the kernel has to search for the structure.
In that case having a well defined, reasonably narrow range is of
course helpful. But I think we should, just like EFI does, prefer to
avoid any need for searching here. See my reply to Boris from a few
minutes ago.

Jan
Juergen Gross - Nov. 21, 2017, 8:13 a.m.
On 21/11/17 08:50, Jan Beulich wrote:
>>>> On 20.11.17 at 19:28, <jgross@suse.com> wrote:
>> On 20/11/17 17:14, Jan Beulich wrote:
>>>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>>>> On 20/11/17 15:20, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 15:14, <jgross@suse.com> wrote:
>>>>>> On 20/11/17 14:56, Boris Ostrovsky wrote:
>>>>>>> On 11/20/2017 06:50 AM, Jan Beulich wrote:
>>>>>>>>>>> On 20.11.17 at 12:20, <jgross@suse.com> wrote:
>>>>>>>>> Which restriction? I'm loading the RSDP table to its architectural
>>>>>>>>> correct addres if possible, otherwise it will be loaded to the same
>>>>>>>>> address as without my patch. So I'm not adding a restriction, but
>>>>>>>>> removing one.
>>>>>>>> What is "architecturally correct" in PVH can't be read out of
>>>>>>>> specs other than what we write down. When there's no BIOS,
>>>>>>>> placing anything right below the 1Mb boundary is at least
>>>>>>>> bogus.
>>>>>>>
>>>>>>> Unless it's a UEFI boot -- where else would you put it? Aren't these two
>>>>>>> (UEFI and non-UEFI) the only two options that the ACPI spec provides?
>>>>>>
>>>>>> I think Jan is right: for PVH its _our_ job to define the correct
>>>>>> placement. Which still can be the same as in the BIOS case, making
>>>>>> it easier to adapt any guest systems.
>>>>>>
>>>>>> So I'd say: The RSDP address in PVH case is passed in the PVH start
>>>>>> info block to the guest. In case there is no conflict with the
>>>>>> physical load address of the guest kernel the preferred address of
>>>>>> the RSDP is right below the 1MB boundary.
>>>>>>
>>>>>> Would this wording be okay?
>>>>>
>>>>> To be honest (and in case it wasn't sufficiently clear form my
>>>>> earlier replies) - I'm pretty much opposed to this below-1Mb thing.
>>>>> There ought to be just plain RAM there for PVH.
>>>>
>>>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>>>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>>>> is being loaded at 1MB).
>>>>
>>>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>>>> due to pure luck (the bzImage loader is still using the PV specific
>>>> ELF notes and this results in the loader believing RSDP is loadable
>>>> at this address, which is true, but the tests used to come to this
>>>> conclusion are just not applicable for PVH).
>>>>
>>>> So in your opinion we should revoke the PVH support from Xen 4.10,
>>>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>>>> guest?
>>>
>>> So what's wrong with it being put wherever the next free memory
>>> location is being determined to be by the loader, just like is being
>>> done for other information, including modules (if any)?
>>
>> The RSDP table is marked as "Reserved" in the memory map. So putting it
>> somewhere in the middle of the guest's memory will force the guest to
>> use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
>> this problem, as we've been hit by the very same in HVM guests before
>> causing quite measurable performance drops.
> 
> This is a valid point.
> 
>> So I'd rather put it in the first MB as most kernels have to deal with
>> small pages at beginning of RAM today. An alternative would be to put
>> it just below 4GB where e.g. the console and Xenstore page are located.
> 
> Putting it in the first Mb implies that mappings there will continue to
> be 4k ones. I can't, however, see why for PVH that should be
> necessary: There's no BIOS and nothing legacy that needs to live
> there, so other than HVM it could benefit from using a 1Gb mapping
> even at address zero (even if this might be something that can't
> be achieved right away). So yes, if anything, the allocation should
> be made top down starting from 4Gb. Otoh, I don't see a strict
> need for this area to live below 4Gb in the first place.

The physical RSDP address in the PVH start info block is 32 bits
only. So it can't be above 4GB.

>>>> Doing it in a proper way you are outlining above would render
>>>> current PVH guests unusable.
>>>
>>> I'm afraid I don't understand which outline of mine you refer to.
>>> Iirc all I said was that placing it below 1Mb is bogus. Top of RAM
>>> (as I think I saw being mentioned elsewhere) probably isn't much
>>> better. Special locations should be used only if there's really no
>>> other way to convey information.
>>
>> I believe it is better to define where reserved memory areas are located
>> in order to make it easier for a kernel to deal with the consequences. A
>> location like "somewhere in memory, either just after the loaded kernel,
>> or after the boot loader which was loaded previously" seems not to be a
>> proper solution. Especially as a change in size of the boot loader could
>> make it impossible to load the kernel at the desired location, as this
>> would contradict the information returned by the hypervisor in the
>> memory map.
> 
> Well, you imply here that the kernel has to search for the structure.
> In that case having a well defined, reasonably narrow range is of
> course helpful. But I think we should, just like EFI does, prefer to
> avoid any need for searching here. See my reply to Boris from a few
> minutes ago.
Placing it right below 1MB has the advantage we don't have to change
any interface between boot loader and kernel, while the disadvantage
is the rather hacky "search for the table" approach. OTOH this
interface is already existing, so we are not inventing it.

Roger, what would be your preference how the RSDP is being found by
the OS when started via grub2 using the common boot entry between
native and PVH mode?


Juergen
Jan Beulich - Nov. 21, 2017, 8:46 a.m.
>>> On 21.11.17 at 09:13, <jgross@suse.com> wrote:
> On 21/11/17 08:50, Jan Beulich wrote:
>>>>> On 20.11.17 at 19:28, <jgross@suse.com> wrote:
>>> On 20/11/17 17:14, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>>>>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>>>>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>>>>> is being loaded at 1MB).
>>>>>
>>>>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>>>>> due to pure luck (the bzImage loader is still using the PV specific
>>>>> ELF notes and this results in the loader believing RSDP is loadable
>>>>> at this address, which is true, but the tests used to come to this
>>>>> conclusion are just not applicable for PVH).
>>>>>
>>>>> So in your opinion we should revoke the PVH support from Xen 4.10,
>>>>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>>>>> guest?
>>>>
>>>> So what's wrong with it being put wherever the next free memory
>>>> location is being determined to be by the loader, just like is being
>>>> done for other information, including modules (if any)?
>>>
>>> The RSDP table is marked as "Reserved" in the memory map. So putting it
>>> somewhere in the middle of the guest's memory will force the guest to
>>> use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
>>> this problem, as we've been hit by the very same in HVM guests before
>>> causing quite measurable performance drops.
>> 
>> This is a valid point.
>> 
>>> So I'd rather put it in the first MB as most kernels have to deal with
>>> small pages at beginning of RAM today. An alternative would be to put
>>> it just below 4GB where e.g. the console and Xenstore page are located.
>> 
>> Putting it in the first Mb implies that mappings there will continue to
>> be 4k ones. I can't, however, see why for PVH that should be
>> necessary: There's no BIOS and nothing legacy that needs to live
>> there, so other than HVM it could benefit from using a 1Gb mapping
>> even at address zero (even if this might be something that can't
>> be achieved right away). So yes, if anything, the allocation should
>> be made top down starting from 4Gb. Otoh, I don't see a strict
>> need for this area to live below 4Gb in the first place.
> 
> The physical RSDP address in the PVH start info block is 32 bits
> only. So it can't be above 4GB.

struct hvm_start_info {
    uint32_t magic;             /* Contains the magic value 0x336ec578       */
                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
    uint32_t version;           /* Version of this structure.                */
    uint32_t flags;             /* SIF_xxx flags.                            */
    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
    uint64_t modlist_paddr;     /* Physical address of an array of           */
                                /* hvm_modlist_entry.                        */
    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
    uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
                                /* structure.                                */
};

Granted a comment a few lines up in the public header says "NB: Xen
on x86 will always try to place all the data below the 4GiB boundary."

Jan
Juergen Gross - Nov. 21, 2017, 9:37 a.m.
On 21/11/17 09:46, Jan Beulich wrote:
>>>> On 21.11.17 at 09:13, <jgross@suse.com> wrote:
>> On 21/11/17 08:50, Jan Beulich wrote:
>>>>>> On 20.11.17 at 19:28, <jgross@suse.com> wrote:
>>>> On 20/11/17 17:14, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>>>>>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>>>>>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>>>>>> is being loaded at 1MB).
>>>>>>
>>>>>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>>>>>> due to pure luck (the bzImage loader is still using the PV specific
>>>>>> ELF notes and this results in the loader believing RSDP is loadable
>>>>>> at this address, which is true, but the tests used to come to this
>>>>>> conclusion are just not applicable for PVH).
>>>>>>
>>>>>> So in your opinion we should revoke the PVH support from Xen 4.10,
>>>>>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>>>>>> guest?
>>>>>
>>>>> So what's wrong with it being put wherever the next free memory
>>>>> location is being determined to be by the loader, just like is being
>>>>> done for other information, including modules (if any)?
>>>>
>>>> The RSDP table is marked as "Reserved" in the memory map. So putting it
>>>> somewhere in the middle of the guest's memory will force the guest to
>>>> use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
>>>> this problem, as we've been hit by the very same in HVM guests before
>>>> causing quite measurable performance drops.
>>>
>>> This is a valid point.
>>>
>>>> So I'd rather put it in the first MB as most kernels have to deal with
>>>> small pages at beginning of RAM today. An alternative would be to put
>>>> it just below 4GB where e.g. the console and Xenstore page are located.
>>>
>>> Putting it in the first Mb implies that mappings there will continue to
>>> be 4k ones. I can't, however, see why for PVH that should be
>>> necessary: There's no BIOS and nothing legacy that needs to live
>>> there, so other than HVM it could benefit from using a 1Gb mapping
>>> even at address zero (even if this might be something that can't
>>> be achieved right away). So yes, if anything, the allocation should
>>> be made top down starting from 4Gb. Otoh, I don't see a strict
>>> need for this area to live below 4Gb in the first place.
>>
>> The physical RSDP address in the PVH start info block is 32 bits
>> only. So it can't be above 4GB.
> 
> struct hvm_start_info {
>     uint32_t magic;             /* Contains the magic value 0x336ec578       */
>                                 /* ("xEn3" with the 0x80 bit of the "E" set).*/
>     uint32_t version;           /* Version of this structure.                */
>     uint32_t flags;             /* SIF_xxx flags.                            */
>     uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
>     uint64_t modlist_paddr;     /* Physical address of an array of           */
>                                 /* hvm_modlist_entry.                        */
>     uint64_t cmdline_paddr;     /* Physical address of the command line.     */
>     uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
>                                 /* structure.                                */
> };

Oh, it seems I have been looking into an outdated document. Thanks for
the correction.

> Granted a comment a few lines up in the public header says "NB: Xen
> on x86 will always try to place all the data below the 4GiB boundary."

Okay.


Juergen
Andrew Cooper - Nov. 21, 2017, 10:42 a.m.
On 21/11/17 07:44, Jan Beulich wrote:
>>>> On 20.11.17 at 17:59, <boris.ostrovsky@oracle.com> wrote:
>> On 11/20/2017 11:43 AM, Jan Beulich wrote:
>>>>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
>>>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>>>>> What could cause grub2 to fail to find space for the pointer in the
>>>>>> first page? Will we ever have anything in EBDA (which is one of the
>>>>>> possible RSDP locations)?
>>>>> Well, the EBDA (see the B in its name) is again something that's
>>>>> meaningless without there being a BIOS.
>>>> Exactly. So it should always be available for grub to copy the pointer
>>>> there.
>>> But what use would it be if grub copied it there? It just shouldn't
>>> be there, neither before nor after grub (just like grub doesn't
>>> introduce firmware into the system).
>> So that the guest can find it using standard methods. If Xen can't
>> guarantee ACPI-compliant placement of the pointer then someone has to
>> help the guest find it in the expected place. We can do it with a
>> dedicated entry point by setting the pointer explicitly (although
>> admittedly this is not done correctly now) or we need to have firmware
>> (grub2) place it in the "right" location.
>>
>> (It does look a bit hacky though)
> Indeed. Of course ACPI without any actual firmware is sort of odd,
> too. As to dedicated entry point and its alternatives: Xen itself
> tells grub (aiui we're talking about a flavor of it running PVH itself)
> where the RSDP is. Why can't grub forward that information in a
> suitable way (e.g. via a new tag, or - for Linux - as a new entry
> in the Linux boot header)?

Or if the worst comes to the worst, fabricate an acpi_rsdp= command line
parameter?

~Andrew
Andrew Cooper - Nov. 21, 2017, 10:44 a.m.
On 21/11/17 09:37, Juergen Gross wrote:
> On 21/11/17 09:46, Jan Beulich wrote:
>>>>> On 21.11.17 at 09:13, <jgross@suse.com> wrote:
>>> On 21/11/17 08:50, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 19:28, <jgross@suse.com> wrote:
>>>>> On 20/11/17 17:14, Jan Beulich wrote:
>>>>>>>>> On 20.11.17 at 16:24, <jgross@suse.com> wrote:
>>>>>>> So without my patch the RSDP table is loaded e.g. at about 6.5MB when
>>>>>>> I'm using grub2 (the loaded grub image is about 5.5MB in size and it
>>>>>>> is being loaded at 1MB).
>>>>>>>
>>>>>>> When I'm using the PVH Linux kernel directly RSDP is just below 1MB
>>>>>>> due to pure luck (the bzImage loader is still using the PV specific
>>>>>>> ELF notes and this results in the loader believing RSDP is loadable
>>>>>>> at this address, which is true, but the tests used to come to this
>>>>>>> conclusion are just not applicable for PVH).
>>>>>>>
>>>>>>> So in your opinion we should revoke the PVH support from Xen 4.10,
>>>>>>> Linux and maybe BSD because RSDP is loaded in middle of RAM of the
>>>>>>> guest?
>>>>>> So what's wrong with it being put wherever the next free memory
>>>>>> location is being determined to be by the loader, just like is being
>>>>>> done for other information, including modules (if any)?
>>>>> The RSDP table is marked as "Reserved" in the memory map. So putting it
>>>>> somewhere in the middle of the guest's memory will force the guest to
>>>>> use 4kB pages instead of 2MB or even 1GB pages. I'd really like to avoid
>>>>> this problem, as we've been hit by the very same in HVM guests before
>>>>> causing quite measurable performance drops.
>>>> This is a valid point.
>>>>
>>>>> So I'd rather put it in the first MB as most kernels have to deal with
>>>>> small pages at beginning of RAM today. An alternative would be to put
>>>>> it just below 4GB where e.g. the console and Xenstore page are located.
>>>> Putting it in the first Mb implies that mappings there will continue to
>>>> be 4k ones. I can't, however, see why for PVH that should be
>>>> necessary: There's no BIOS and nothing legacy that needs to live
>>>> there, so other than HVM it could benefit from using a 1Gb mapping
>>>> even at address zero (even if this might be something that can't
>>>> be achieved right away). So yes, if anything, the allocation should
>>>> be made top down starting from 4Gb. Otoh, I don't see a strict
>>>> need for this area to live below 4Gb in the first place.
>>> The physical RSDP address in the PVH start info block is 32 bits
>>> only. So it can't be above 4GB.
>> struct hvm_start_info {
>>     uint32_t magic;             /* Contains the magic value 0x336ec578       */
>>                                 /* ("xEn3" with the 0x80 bit of the "E" set).*/
>>     uint32_t version;           /* Version of this structure.                */
>>     uint32_t flags;             /* SIF_xxx flags.                            */
>>     uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
>>     uint64_t modlist_paddr;     /* Physical address of an array of           */
>>                                 /* hvm_modlist_entry.                        */
>>     uint64_t cmdline_paddr;     /* Physical address of the command line.     */
>>     uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
>>                                 /* structure.                                */
>> };
> Oh, it seems I have been looking into an outdated document. Thanks for
> the correction.
>
>> Granted a comment a few lines up in the public header says "NB: Xen
>> on x86 will always try to place all the data below the 4GiB boundary."
> Okay.

The 4GB limit is specifically because we don't know a priori whether it
is a 32 or 64bit guest, and we agreed not to put the tables anywhere you
couldn't reach with paging disabled.

~Andrew
Juergen Gross - Nov. 21, 2017, 11:13 a.m.
On 21/11/17 11:42, Andrew Cooper wrote:
> On 21/11/17 07:44, Jan Beulich wrote:
>>>>> On 20.11.17 at 17:59, <boris.ostrovsky@oracle.com> wrote:
>>> On 11/20/2017 11:43 AM, Jan Beulich wrote:
>>>>>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> What could cause grub2 to fail to find space for the pointer in the
>>>>>>> first page? Will we ever have anything in EBDA (which is one of the
>>>>>>> possible RSDP locations)?
>>>>>> Well, the EBDA (see the B in its name) is again something that's
>>>>>> meaningless without there being a BIOS.
>>>>> Exactly. So it should always be available for grub to copy the pointer
>>>>> there.
>>>> But what use would it be if grub copied it there? It just shouldn't
>>>> be there, neither before nor after grub (just like grub doesn't
>>>> introduce firmware into the system).
>>> So that the guest can find it using standard methods. If Xen can't
>>> guarantee ACPI-compliant placement of the pointer then someone has to
>>> help the guest find it in the expected place. We can do it with a
>>> dedicated entry point by setting the pointer explicitly (although
>>> admittedly this is not done correctly now) or we need to have firmware
>>> (grub2) place it in the "right" location.
>>>
>>> (It does look a bit hacky though)
>> Indeed. Of course ACPI without any actual firmware is sort of odd,
>> too. As to dedicated entry point and its alternatives: Xen itself
>> tells grub (aiui we're talking about a flavor of it running PVH itself)
>> where the RSDP is. Why can't grub forward that information in a
>> suitable way (e.g. via a new tag, or - for Linux - as a new entry
>> in the Linux boot header)?
> 
> Or if the worst comes to the worst, fabricate an acpi_rsdp= command line
> parameter?

This would be easy: just replace the #ifdef CONFIG_KEXEC in
drivers/acpi/osl.c of the Linux kernel with:

#if defined(CONFIG_KEXEC) || defined(CONFIG_XEN_PVH)

and the parameter is usable and active.

Another possibility would be to let grub copy the RSDP to below 1MB and
add the E820 entry for it.

In any case it seems save to let Xen place the RSDP just below 4GB,
together with console and Xenstore pages (so this area just expands).
grub can handle this either on its own or together with the kernel.

Lets see how Roger's plans with BSD look like.


Juergen
Roger Pau Monné - Nov. 22, 2017, 10:26 a.m.
On Tue, Nov 21, 2017 at 12:13 PM, Juergen Gross <jgross@suse.com> wrote:
> On 21/11/17 11:42, Andrew Cooper wrote:
>> On 21/11/17 07:44, Jan Beulich wrote:
>>>>>> On 20.11.17 at 17:59, <boris.ostrovsky@oracle.com> wrote:
>>>> On 11/20/2017 11:43 AM, Jan Beulich wrote:
>>>>>>>> On 20.11.17 at 17:28, <boris.ostrovsky@oracle.com> wrote:
>>>>>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>>>>>>>>>> On 20.11.17 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>> What could cause grub2 to fail to find space for the pointer in the
>>>>>>>> first page? Will we ever have anything in EBDA (which is one of the
>>>>>>>> possible RSDP locations)?
>>>>>>> Well, the EBDA (see the B in its name) is again something that's
>>>>>>> meaningless without there being a BIOS.
>>>>>> Exactly. So it should always be available for grub to copy the pointer
>>>>>> there.
>>>>> But what use would it be if grub copied it there? It just shouldn't
>>>>> be there, neither before nor after grub (just like grub doesn't
>>>>> introduce firmware into the system).
>>>> So that the guest can find it using standard methods. If Xen can't
>>>> guarantee ACPI-compliant placement of the pointer then someone has to
>>>> help the guest find it in the expected place. We can do it with a
>>>> dedicated entry point by setting the pointer explicitly (although
>>>> admittedly this is not done correctly now) or we need to have firmware
>>>> (grub2) place it in the "right" location.
>>>>
>>>> (It does look a bit hacky though)
>>> Indeed. Of course ACPI without any actual firmware is sort of odd,
>>> too. As to dedicated entry point and its alternatives: Xen itself
>>> tells grub (aiui we're talking about a flavor of it running PVH itself)
>>> where the RSDP is. Why can't grub forward that information in a
>>> suitable way (e.g. via a new tag, or - for Linux - as a new entry
>>> in the Linux boot header)?
>>
>> Or if the worst comes to the worst, fabricate an acpi_rsdp= command line
>> parameter?
>
> This would be easy: just replace the #ifdef CONFIG_KEXEC in
> drivers/acpi/osl.c of the Linux kernel with:
>
> #if defined(CONFIG_KEXEC) || defined(CONFIG_XEN_PVH)
>
> and the parameter is usable and active.
>
> Another possibility would be to let grub copy the RSDP to below 1MB and
> add the E820 entry for it.
>
> In any case it seems save to let Xen place the RSDP just below 4GB,
> together with console and Xenstore pages (so this area just expands).
> grub can handle this either on its own or together with the kernel.
>
> Lets see how Roger's plans with BSD look like.

On FreeBSD the position of the RSDP is passed from the loader to the
kernel, memory scanning is only used if the loader hasn't provided a
RSDP address. I don't know much about the Linux boot protocol, but it
would seem sensible to pass this information from the loader to the
kernel in the boot information (IIRC it's called the zero page on
Linux), so that Linux tries to get the RSDP from the information
passed by the loader and then resorts to memory scanning if that's not
provided.

Roger.

Patch

diff --git a/tools/libxc/xc_dom_hvmloader.c b/tools/libxc/xc_dom_hvmloader.c
index 59f94e51e5..2284c7f9df 100644
--- a/tools/libxc/xc_dom_hvmloader.c
+++ b/tools/libxc/xc_dom_hvmloader.c
@@ -135,20 +135,43 @@  static int module_init_one(struct xc_dom_image *dom,
 {
     struct xc_dom_seg seg;
     void *dest;
+    xen_pfn_t start, end;
+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
 
     if ( module->length )
     {
-        if ( xc_dom_alloc_segment(dom, &seg, name, 0, module->length) )
-            goto err;
-        dest = xc_dom_seg_to_ptr(dom, &seg);
-        if ( dest == NULL )
+        /*
+         * Check for module located below kernel.
+         * Make sure not to be fooled by a kernel based on virtual address.
+         */
+        if ( module->guest_addr_out && !(dom->kernel_seg.vstart >> 32) &&
+             module->guest_addr_out + module->length <= dom->kernel_seg.vstart )
         {
-            DOMPRINTF("%s: xc_dom_seg_to_ptr(dom, &seg) => NULL",
-                      __FUNCTION__);
-            goto err;
+            start = module->guest_addr_out / page_size;
+            end = (module->guest_addr_out + module->length + page_size - 1) /
+                  page_size;
+            dest = xc_dom_pfn_to_ptr(dom, start, end - start);
+            if ( dest == NULL )
+            {
+                DOMPRINTF("%s: xc_dom_pfn_to_ptr() => NULL", __FUNCTION__);
+                goto err;
+            }
+            dest += module->guest_addr_out - start * page_size;
+        }
+        else
+        {
+            if ( xc_dom_alloc_segment(dom, &seg, name, 0, module->length) )
+                goto err;
+            dest = xc_dom_seg_to_ptr(dom, &seg);
+            if ( dest == NULL )
+            {
+                DOMPRINTF("%s: xc_dom_seg_to_ptr(dom, &seg) => NULL",
+                          __FUNCTION__);
+                goto err;
+            }
+            module->guest_addr_out = seg.vstart;
         }
         memcpy(dest, module->data, module->length);
-        module->guest_addr_out = seg.vstart;
 
         assert(dom->mmio_start > 0 && dom->mmio_start < UINT32_MAX);
         if ( module->guest_addr_out > dom->mmio_start ||