Patchwork [v3,for-4.10,2/2] x86/mm: fix a potential race condition in modify_xen_mappings().

Submitter Yu Zhang
Date Nov. 14, 2017, 6:53 a.m.
Message ID <1510642427-3629-2-git-send-email-yu.c.zhang@linux.intel.com>
Permalink /patch/383839/
State New

Comments

Yu Zhang - Nov. 14, 2017, 6:53 a.m.
In modify_xen_mappings(), an L1/L2 page table is freed if all
of its entries are empty, and the corresponding L2/L3 PTE then
needs to be cleared.

However, concurrent paging structure modifications on other
CPUs may already have cleared that L2/L3 PTE, or set it to
reference a superpage.

Therefore the logic that enumerates the L1/L2 page table and
resets the corresponding L2/L3 PTE needs to be protected by a
spinlock, and the _PAGE_PRESENT and _PAGE_PSE flags need to be
checked after the lock is obtained.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes in v3: 
According to comments from Jan Beulich:
  - indent the label by one space;
  - also check the _PAGE_PSE for L2E/L3E.
Others:
  - commit message changes.
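
Illustration only, not part of the patch: a minimal user-space sketch
of the "take the lock, then re-check the parent entry" pattern that
the commit message describes. It assumes a pthread mutex in place of
map_pgdir_lock and a plain struct in place of a real L2E/L3E. The
names parent_entry, reap_if_empty, pgdir_lock and the REAP_* values
are hypothetical. The two flag values mirror the x86 Present and
Page Size bits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES      512
#define FLAG_PRESENT 0x001u  /* stand-in for _PAGE_PRESENT */
#define FLAG_PSE     0x080u  /* stand-in for _PAGE_PSE */

/* Hypothetical parent entry: flags plus a pointer to the child table. */
struct parent_entry {
    unsigned int flags;
    uint64_t *child;         /* meaningful only if PRESENT and not PSE */
};

enum reap_result { REAP_GONE, REAP_SUPERPAGE, REAP_IN_USE, REAP_FREED };

/* Stand-in for map_pgdir_lock. */
pthread_mutex_t pgdir_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free the child table if it is empty, re-validating the parent first. */
enum reap_result reap_if_empty(struct parent_entry *pe)
{
    uint64_t *child;
    unsigned int i;

    assert(pe != NULL);
    pthread_mutex_lock(&pgdir_lock);

    /* Another thread may have cleared the entry before we got the lock. */
    if ( !(pe->flags & FLAG_PRESENT) )
    {
        pthread_mutex_unlock(&pgdir_lock);
        return REAP_GONE;
    }

    /* Or replaced it with a superpage: there is no child table to scan. */
    if ( pe->flags & FLAG_PSE )
    {
        pthread_mutex_unlock(&pgdir_lock);
        return REAP_SUPERPAGE;
    }

    /* The entry is still a table pointer, so it is safe to follow it. */
    child = pe->child;
    for ( i = 0; i < ENTRIES; i++ )
        if ( child[i] != 0 )
            break;

    if ( i < ENTRIES )
    {
        pthread_mutex_unlock(&pgdir_lock);
        return REAP_IN_USE;
    }

    /* Empty: clear the parent under the lock, free after dropping it. */
    pe->flags = 0;
    pe->child = NULL;
    pthread_mutex_unlock(&pgdir_lock);

    /* The hypervisor flushes the TLB at this point, before the free. */
    free(child);
    return REAP_FREED;
}
```

As in the patch, the parent entry is cleared while the lock is held,
and the free happens only after the lock has been dropped, so a
second caller that takes the lock afterwards sees a non-present entry
and backs off instead of following a stale pointer.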
---
 xen/arch/x86/mm.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
Jan Beulich - Nov. 14, 2017, 8:21 a.m.
>>> On 14.11.17 at 07:53, <yu.c.zhang@linux.intel.com> wrote:
> In modify_xen_mappings(), an L1/L2 page table is freed if all
> of its entries are empty, and the corresponding L2/L3 PTE then
> needs to be cleared.
> 
> However, concurrent paging structure modifications on other
> CPUs may already have cleared that L2/L3 PTE, or set it to
> reference a superpage.
> 
> Therefore the logic that enumerates the L1/L2 page table and
> resets the corresponding L2/L3 PTE needs to be protected by a
> spinlock, and the _PAGE_PRESENT and _PAGE_PSE flags need to be
> checked after the lock is obtained.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1697be9..64ccd70 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5111,6 +5111,27 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
+            if ( locking )
+                spin_lock(&map_pgdir_lock);
+
+            /*
+             * L2E may be already cleared, or set to a superpage, by
+             * concurrent paging structure modifications on other CPUs.
+             */
+            if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+            {
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
+                goto check_l3;
+            }
+
+            if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
+            {
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
+                continue;
+            }
+
             pl1e = l2e_to_l1e(*pl2e);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                 if ( l1e_get_intpte(pl1e[i]) != 0 )
@@ -5119,11 +5140,16 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             {
                 /* Empty: zap the L2E and free the L1 page. */
                 l2e_write_atomic(pl2e, l2e_empty());
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
                 free_xen_pagetable(pl1e);
             }
+            else if ( locking )
+                spin_unlock(&map_pgdir_lock);
         }
 
+ check_l3:
         /*
          * If we are not destroying mappings, or not done with the L3E,
          * skip the empty&free check.
@@ -5131,6 +5157,21 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
+        if ( locking )
+            spin_lock(&map_pgdir_lock);
+
+        /*
+         * L3E may be already cleared, or set to a superpage, by
+         * concurrent paging structure modifications on other CPUs.
+         */
+        if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
+              (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+        {
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
+            continue;
+        }
+
         pl2e = l3e_to_l2e(*pl3e);
         for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
             if ( l2e_get_intpte(pl2e[i]) != 0 )
@@ -5139,9 +5180,13 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         {
             /* Empty: zap the L3E and free the L2 page. */
             l3e_write_atomic(pl3e, l3e_empty());
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
             flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
             free_xen_pagetable(pl2e);
         }
+        else if ( locking )
+            spin_unlock(&map_pgdir_lock);
     }
 
     flush_area(NULL, FLUSH_TLB_GLOBAL);