Patchwork [V2,3/10] KVM/MMU: Add last_level in the struct kvm_mmu_page

Submitter Lan Tianyu
Date Feb. 2, 2019, 1:38 a.m.
Message ID <20190202013825.51261-4-Tianyu.Lan@microsoft.com>
Permalink /patch/716429/
State New

Comments

Lan Tianyu - Feb. 2, 2019, 1:38 a.m.
From: Lan Tianyu <Tianyu.Lan@microsoft.com>

This patch adds a last_level flag to struct kvm_mmu_page. When building
the TLB flush range list, last_level is used to identify whether the
page should be added to the list.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu.c              | 3 +++
 2 files changed, 4 insertions(+)
Paolo Bonzini - Feb. 14, 2019, 4:12 p.m.
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

sp->last_level is always true here.

Paolo

>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
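Why it is always true: set_spte() only ever installs leaf SPTEs, and two
lines above the new assignment it ORs PT_PAGE_SIZE_MASK into the SPTE for
every level above PT_PAGE_TABLE_LEVEL, so both branches of is_last_spte()
yield true at that call site. A minimal user-space sketch (the constants
and the shape of is_last_spte() mirror arch/x86/kvm/mmu.c; this is an
illustration, not the kernel code):

#include <assert.h>
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL	1
#define PT_PAGE_SIZE_MASK	(1ULL << 7)	/* the x86 PS bit */

/* Same shape as is_last_spte() in arch/x86/kvm/mmu.c. */
static bool is_last_spte(unsigned long long pte, int level)
{
	return level == PT_PAGE_TABLE_LEVEL || (pte & PT_PAGE_SIZE_MASK);
}

int main(void)
{
	for (int level = 1; level <= 4; level++) {
		unsigned long long spte = 0;

		if (level > PT_PAGE_TABLE_LEVEL)
			spte |= PT_PAGE_SIZE_MASK;	/* as set_spte() just did */
		assert(is_last_spte(spte, level));	/* true at every level */
	}
	return 0;
}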
Paolo Bonzini - Feb. 14, 2019, 4:32 p.m.
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
is never the last level.  Page table entries for the last level do not
have a struct kvm_mmu_page.

Therefore you don't need the flag after all.  I suspect your
calculations in patch 2 are off by one, and you actually need

	hlist_for_each_entry(sp, range->flush_list, flush_link) {
		int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
		...
	}

For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
a page containing PTEs and covers an area of 2 MiB.

Thanks,

Paolo

>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
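The span arithmetic behind the suggested role.level + 1 can be checked
with a small user-space sketch (PAGES_PER_HPAGE() is a stand-in for
KVM_PAGES_PER_HPAGE() from arch/x86/include/asm/kvm_host.h; the loop and
names are illustrative only): each x86 page-table level multiplies the
mapped span by 512, so a shadow page at role.level N is covered by a
single level-(N + 1) entry.

#include <stdio.h>

/* Stand-in for KVM_PAGES_PER_HPAGE(): 4 KiB pages mapped at a given level. */
#define PAGES_PER_HPAGE(level)	(1UL << (((level) - 1) * 9))

int main(void)
{
	for (int role_level = 1; role_level <= 3; role_level++)
		printf("role.level %d: table spans %lu pages (%lu MiB)\n",
		       role_level, PAGES_PER_HPAGE(role_level + 1),
		       (PAGES_PER_HPAGE(role_level + 1) * 4) >> 10);
	return 0;
}

For role.level 1 this prints 512 pages (2 MiB), matching Paolo's example
of a page of PTEs covering one PMD entry's reach.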
Lan Tianyu - Feb. 15, 2019, 3:05 p.m.
On Fri, Feb 15, 2019 at 12:32 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index ce770b446238..70cafd3f95ab 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >
> >       if (level > PT_PAGE_TABLE_LEVEL)
> >               spte |= PT_PAGE_SIZE_MASK;
> > +
> > +     sp->last_level = is_last_spte(spte, level);
>
> Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
> is never the last level.  Page table entries for the last level do not
> have a struct kvm_mmu_page.
>
> Therefore you don't need the flag after all.  I suspect your
> calculations in patch 2 are off by one, and you actually need
>
>         hlist_for_each_entry(sp, range->flush_list, flush_link) {
>                 int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
>                 ...
>         }
>
> For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
> a page containing PTEs and covers an area of 2 MiB.

Yes, you are right. Thanks for pointing that out; I will fix it. The
last_level flag is meant to avoid adding intermediate page nodes (e.g.
PGD, PMD) to the flush list. The address ranges would be duplicated if
both leaf and intermediate nodes were added to the flush list.

>
> Thanks,
>
> Paolo
>
> >       if (tdp_enabled)
> >               spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> >                       kvm_is_mmio_pfn(pfn));
>
Paolo Bonzini - Feb. 15, 2019, 3:22 p.m.
On 15/02/19 16:05, Tianyu Lan wrote:
> Yes, you are right. Thanks for pointing that out; I will fix it. The
> last_level flag is meant to avoid adding intermediate page nodes (e.g.
> PGD, PMD) to the flush list. The address ranges would be duplicated if
> both leaf and intermediate nodes were added to the flush list.

Hmm, that's not easy to track.  One kvm_mmu_page could contain both leaf
and non-leaf entries (for example a huge page mapping 0 to 2 MB and a
page table for 2 MB to 4 MB).

Is this really needed?  First, your benchmarks so far have been done
with sp->last_level always set to true.  Second, you would only
encounter this optimization in kvm_mmu_commit_zap_page when zapping a 1
GB region (which then would be invalidated twice, at both the PMD and
PGD level) or bigger.

Paolo
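The double invalidation Paolo describes can be quantified with one more
sketch (SPAN_PAGES() is a hypothetical helper equivalent to the
KVM_PAGES_PER_HPAGE(role.level + 1) expression above): zapping a 1 GiB
region puts the covering directory page and all 512 of its child page
tables on the flush list, so the listed ranges add up to exactly twice
the region.

#include <stdio.h>

/* Hypothetical helper: 4 KiB pages spanned by a shadow page at role.level. */
#define SPAN_PAGES(role_level)	(1UL << ((role_level) * 9))

int main(void)
{
	unsigned long region = SPAN_PAGES(2);	/* 1 GiB zapped */
	unsigned long listed = SPAN_PAGES(2);	/* the level-2 directory page */

	for (int i = 0; i < 512; i++)
		listed += SPAN_PAGES(1);	/* its 512 level-1 page tables */

	printf("region %lu pages, flush list covers %lu pages (%lux)\n",
	       region, listed, listed / region);
	return 0;
}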

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a3d3e58fe0a..9d858d68c17a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -325,6 +325,7 @@ struct kvm_mmu_page {
 	struct hlist_node flush_link;
 	struct hlist_node hash_link;
 	bool unsync;
+	bool last_level;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ce770b446238..70cafd3f95ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
+
+	sp->last_level = is_last_spte(spte, level);
+
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));