Patchwork [V2,2/3] x86/efi: Fix EFI memory map leaks

login
register
mail settings
Submitter Sai Praneeth Prakhya
Date Dec. 5, 2018, 12:33 a.m.
Message ID <20181205003357.31077-3-sai.praneeth.prakhya@intel.com>
Download mbox | patch
Permalink /patch/672489/
State New
Headers show

Comments

Sai Praneeth Prakhya - Dec. 5, 2018, 12:33 a.m.
Presently, in efi subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by the
existing EFI memory map. This could be fixed by unmapping and freeing the
existing EFI memory map every time before installing a new EFI memory map.
Hence, modify efi_memmap_install() accordingly since it's the only place
which installs a new EFI memory map.

Presently, efi_memmap_alloc() allocates only physical memory and every
caller of efi_memmap_alloc() should remap the newly allocated memory in
order to use it. This extra step could sometimes lead to buggy error
handling conditions where in the allocated memory isn't freed should remap
fail. So, push the remap logic into efi_memmap_alloc() so that the error
handling could be improved and it also makes the caller look simpler.

With the modified efi_memmap_alloc() and efi_memmap_install() API's, a
typical flow to install a new EFI memory map would look something like
below.

1. Get the number of entries the new EFI memory map should have (typically
   through efi_memmap_split_count()).
2. Allocate memory for the new EFI memory map (efi_memmap_alloc()).
3. Populate memory descriptor entries in the new EFI memory map.
4. Install the new EFI memory map (efi_memmap_install() which also unmaps
   and frees existing memory map).

Existing functions like efi_clean_memmap(), efi_arch_mem_reserve(),
efi_free_boot_services() and efi_fake_memmap() are modified to fix the
above mentioned bugs and also to follow the above recommended usage of
API's.

Note that efi_clean_memmap() could be implemented without allocating any
new memory, but since this is not fast path and hence is not a concern for
performance, readability and maintainability wins. So, change it to use
efi_memmap_alloc() and efi_memmap_install().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/include/asm/efi.h      |   1 +
 arch/x86/kernel/setup.c         |   6 ++
 arch/x86/platform/efi/efi.c     |  44 ++++++------
 arch/x86/platform/efi/quirks.c  |  43 +++---------
 drivers/firmware/efi/fake_mem.c |  21 ++----
 drivers/firmware/efi/memmap.c   | 118 +++++++++++++++++++++++---------
 include/linux/efi.h             |   7 +-
 7 files changed, 132 insertions(+), 108 deletions(-)
Ingo Molnar - Dec. 5, 2018, 7:31 a.m.
* Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> wrote:

>  	if (efi_enabled(EFI_MEMMAP)) {
> +		/*
> +		 * efi_clean_memmap() uses memblock_phys_alloc() to allocate
> +		 * memory for new EFI memmap and hence will work only after
> +		 * e820__memblock_setup()
> +		 */
> +		efi_clean_memmap();
>  		efi_fake_memmap();
>  		efi_find_mirror();
>  		efi_esrt_init();

I'd also suggest a namespace cleanup before we do any material 
modifications:

'efi_esrt_init()' is the proper pattern to follow, it's prefixed by efi_, 
then followed by the more generic subsystem (_esrt) and then by the 
functionality (_init). This is a good, hierarchical, top-down 
nomenclature that makes it easy to grep for ESRT functionality by typing 
'git grep esrt'.

The same is not true of the memmap functionality: 'git grep efi_memmap_' 
doesn't do the right thing.

So I think this should be renamed to:

	efi_memmap_clean()
	efi_memmap_insert()
	efi_memmap_free()
	efi_memmap_print()
	efi_memmap_fake()
	etc.

While at it, there's another area that could be improved:

 - efi_memmap_fake() is a bit weird naming: it's not really 'fake', 
   presumably the specified table is still very much real, otherwise it 
   won't result in a working kernel.

   What that functionality *really* is about is user-defined entries. Why 
   not name it in such a fashion? efi_memmap_init_user_defined() or so?

Thanks,

	Ingo
Bhupesh Sharma - Dec. 5, 2018, 9:43 a.m.
May be the x86 maintainers would have more comments on this, but I have 
a couple of nitpicks. Please see in-line:

On 12/05/2018 06:03 AM, Sai Praneeth Prakhya wrote:
> Presently, in efi subsystem of kernel, every time kernel allocates memory
> for a new EFI memory map, it forgets to free the memory occupied by the
> existing EFI memory map. This could be fixed by unmapping and freeing the
> existing EFI memory map every time before installing a new EFI memory map.
> Hence, modify efi_memmap_install() accordingly since it's the only place
> which installs a new EFI memory map.
> 
> Presently, efi_memmap_alloc() allocates only physical memory and every
> caller of efi_memmap_alloc() should remap the newly allocated memory in
> order to use it. This extra step could sometimes lead to buggy error
> handling conditions where in the allocated memory isn't freed should remap
> fail. So, push the remap logic into efi_memmap_alloc() so that the error
> handling could be improved and it also makes the caller look simpler.
> 
> With the modified efi_memmap_alloc() and efi_memmap_install() API's, a
> typical flow to install a new EFI memory map would look something like
> below.
> 
> 1. Get the number of entries the new EFI memory map should have (typically
>     through efi_memmap_split_count()).
> 2. Allocate memory for the new EFI memory map (efi_memmap_alloc()).
> 3. Populate memory descriptor entries in the new EFI memory map.
> 4. Install the new EFI memory map (efi_memmap_install() which also unmaps
>     and frees existing memory map).
> 
> Existing functions like efi_clean_memmap(), efi_arch_mem_reserve(),
> efi_free_boot_services() and efi_fake_memmap() are modified to fix the
> above mentioned bugs and also to follow the above recommended usage of
> API's.
> 
> Note that efi_clean_memmap() could be implemented without allocating any
> new memory, but since this is not fast path and hence is not a concern for
> performance, readability and maintainability wins. So, change it to use
> efi_memmap_alloc() and efi_memmap_install().
> 
> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
> Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Bhupesh Sharma <bhsharma@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>   arch/x86/include/asm/efi.h      |   1 +
>   arch/x86/kernel/setup.c         |   6 ++
>   arch/x86/platform/efi/efi.c     |  44 ++++++------
>   arch/x86/platform/efi/quirks.c  |  43 +++---------
>   drivers/firmware/efi/fake_mem.c |  21 ++----
>   drivers/firmware/efi/memmap.c   | 118 +++++++++++++++++++++++---------
>   include/linux/efi.h             |   7 +-
>   7 files changed, 132 insertions(+), 108 deletions(-)
> 
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index d1e64ac80b9c..744f945a00e7 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -143,6 +143,7 @@ extern void efi_switch_mm(struct mm_struct *mm);
>   extern void efi_recover_from_page_fault(unsigned long phys_addr);
>   extern void efi_free_boot_services(void);
>   extern void efi_reserve_boot_services(void);
> +extern void __init efi_clean_memmap(void);
>   
>   struct efi_setup_data {
>   	u64 fw_vendor;
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index b74e7bfed6ab..bed79b238b0d 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1102,6 +1102,12 @@ void __init setup_arch(char **cmdline_p)
>   	reserve_bios_regions();
>   
>   	if (efi_enabled(EFI_MEMMAP)) {
> +		/*
> +		 * efi_clean_memmap() uses memblock_phys_alloc() to allocate
> +		 * memory for new EFI memmap and hence will work only after
> +		 * e820__memblock_setup()
> +		 */
> +		efi_clean_memmap();
>   		efi_fake_memmap();
>   		efi_find_mirror();
>   		efi_esrt_init();
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 715601d1c581..63885cc8e34e 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -249,30 +249,36 @@ static bool __init efi_memmap_entry_valid(const efi_memory_desc_t *md, int i)
>   	return false;
>   }
>   
> -static void __init efi_clean_memmap(void)
> +void __init efi_clean_memmap(void)
>   {
> -	efi_memory_desc_t *out = efi.memmap.map;
> -	const efi_memory_desc_t *in = out;
> -	const efi_memory_desc_t *end = efi.memmap.map_end;
> -	int i, n_removal;
> -
> -	for (i = n_removal = 0; in < end; i++) {
> -		if (efi_memmap_entry_valid(in, i)) {
> -			if (out != in)
> -				memcpy(out, in, efi.memmap.desc_size);
> -			out = (void *)out + efi.memmap.desc_size;
> -		} else {
> +	void *out;
> +	efi_memory_desc_t *md;
> +	unsigned int i = 0, n_removal = 0;

Perhaps we can use 'unsigned int i = n_removal = 0;', which appears more 
readable.

> +	struct efi_memory_map new_memmap;
> +
> +	for_each_efi_memory_desc(md) {
> +		if (!efi_memmap_entry_valid(md, i))
>   			n_removal++;
> -		}
> -		in = (void *)in + efi.memmap.desc_size;
>   	}
>   
> -	if (n_removal > 0) {
> -		u64 size = efi.memmap.nr_map - n_removal;
> +	if (n_removal == 0)

and 'if (!n_removal)' above, to stay in sync with the rest of the code 
in this file.

Thanks,
Bhupesh

> +		return;
>   
> -		pr_warn("Removing %d invalid memory map entries.\n", n_removal);
> -		efi_memmap_install(efi.memmap.phys_map, size);
> +	pr_warn("Removing %d invalid memory map entries\n", n_removal);
> +	if (efi_memmap_alloc(&new_memmap, efi.memmap.nr_map - n_removal))
> +		return;
> +
> +	i = 0;
> +	out = new_memmap.map;
> +	for_each_efi_memory_desc(md) {
> +		if (efi_memmap_entry_valid(md, i)) {
> +			memcpy(out, md, efi.memmap.desc_size);
> +			out += efi.memmap.desc_size;
> +		}
> +		i++;
>   	}
> +
> +	efi_memmap_install(new_memmap);
>   }
>   
>   void __init efi_print_memmap(void)
> @@ -537,8 +543,6 @@ void __init efi_init(void)
>   		}
>   	}
>   
> -	efi_clean_memmap();
> -
>   	if (efi_enabled(EFI_DBG))
>   		efi_print_memmap();
>   }
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index ce6dcd40dd6c..e4e67b391e0a 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -242,11 +242,10 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
>    */
>   void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>   {
> -	phys_addr_t new_phys, new_size;
> +	struct efi_memory_map new_memmap;
>   	struct efi_mem_range mr;
>   	efi_memory_desc_t md;
>   	int num_entries;
> -	void *new;
>   
>   	if (efi_mem_desc_lookup(addr, &md) ||
>   	    md.type != EFI_BOOT_SERVICES_DATA) {
> @@ -274,24 +273,13 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>   	num_entries = efi_memmap_split_count(&md, &mr.range);
>   	num_entries += efi.memmap.nr_map;
>   
> -	new_size = efi.memmap.desc_size * num_entries;
> -
> -	new_phys = efi_memmap_alloc(num_entries);
> -	if (!new_phys) {
> +	if (efi_memmap_alloc(&new_memmap, num_entries)) {
>   		pr_err("Could not allocate boot services memmap\n");
>   		return;
>   	}
>   
> -	new = early_memremap(new_phys, new_size);
> -	if (!new) {
> -		pr_err("Failed to map new boot services memmap\n");
> -		return;
> -	}
> -
> -	efi_memmap_insert(&efi.memmap, new, &mr);
> -	early_memunmap(new, new_size);
> -
> -	efi_memmap_install(new_phys, num_entries);
> +	efi_memmap_insert(&efi.memmap, new_memmap.map, &mr);
> +	efi_memmap_install(new_memmap);
>   }
>   
>   /*
> @@ -389,10 +377,10 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
>   
>   void __init efi_free_boot_services(void)
>   {
> -	phys_addr_t new_phys, new_size;
> +	struct efi_memory_map new_memmap;
>   	efi_memory_desc_t *md;
>   	int num_entries = 0;
> -	void *new, *new_md;
> +	void *new_md;
>   
>   	for_each_efi_memory_desc(md) {
>   		unsigned long long start = md->phys_addr;
> @@ -444,25 +432,17 @@ void __init efi_free_boot_services(void)
>   	if (!num_entries)
>   		return;
>   
> -	new_size = efi.memmap.desc_size * num_entries;
> -	new_phys = efi_memmap_alloc(num_entries);
> -	if (!new_phys) {
> +	if (efi_memmap_alloc(&new_memmap, num_entries)) {
>   		pr_err("Failed to allocate new EFI memmap\n");
>   		return;
>   	}
>   
> -	new = memremap(new_phys, new_size, MEMREMAP_WB);
> -	if (!new) {
> -		pr_err("Failed to map new EFI memmap\n");
> -		return;
> -	}
> -
>   	/*
>   	 * Build a new EFI memmap that excludes any boot services
>   	 * regions that are not tagged EFI_MEMORY_RUNTIME, since those
>   	 * regions have now been freed.
>   	 */
> -	new_md = new;
> +	new_md = new_memmap.map;
>   	for_each_efi_memory_desc(md) {
>   		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
>   		    (md->type == EFI_BOOT_SERVICES_CODE ||
> @@ -473,12 +453,7 @@ void __init efi_free_boot_services(void)
>   		new_md += efi.memmap.desc_size;
>   	}
>   
> -	memunmap(new);
> -
> -	if (efi_memmap_install(new_phys, num_entries)) {
> -		pr_err("Could not install new EFI memmap\n");
> -		return;
> -	}
> +	efi_memmap_install(new_memmap);
>   }
>   
>   /*
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index 6c7d60c239b5..3490231fc391 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -53,9 +53,8 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
>   void __init efi_fake_memmap(void)
>   {
>   	int new_nr_map = efi.memmap.nr_map;
> +	struct efi_memory_map new_memmap;
>   	efi_memory_desc_t *md;
> -	phys_addr_t new_memmap_phy;
> -	void *new_memmap;
>   	int i;
>   
>   	if (!nr_fake_mem)
> @@ -71,25 +70,13 @@ void __init efi_fake_memmap(void)
>   	}
>   
>   	/* allocate memory for new EFI memmap */
> -	new_memmap_phy = efi_memmap_alloc(new_nr_map);
> -	if (!new_memmap_phy)
> +	if (efi_memmap_alloc(&new_memmap, new_nr_map))
>   		return;
>   
> -	/* create new EFI memmap */
> -	new_memmap = early_memremap(new_memmap_phy,
> -				    efi.memmap.desc_size * new_nr_map);
> -	if (!new_memmap) {
> -		memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
> -		return;
> -	}
> -
>   	for (i = 0; i < nr_fake_mem; i++)
> -		efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]);
> -
> -	/* swap into new EFI memmap */
> -	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
> +		efi_memmap_insert(&efi.memmap, new_memmap.map, &fake_mems[i]);
>   
> -	efi_memmap_install(new_memmap_phy, new_nr_map);
> +	efi_memmap_install(new_memmap);
>   
>   	/* print new EFI memmap */
>   	efi_print_memmap();
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index 4318a69bdbbf..56405a11b4c9 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -13,40 +13,94 @@
>   #include <linux/memblock.h>
>   #include <linux/slab.h>
>   
> -static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
> +static int __init __efi_memmap_alloc_early(struct efi_memory_map *new_memmap,
> +					   unsigned long size)
>   {
> -	return memblock_phys_alloc(size, SMP_CACHE_BYTES);
> +	void *map;
> +	phys_addr_t phys_map;
> +
> +	phys_map = memblock_phys_alloc(size, SMP_CACHE_BYTES);
> +
> +	/*
> +	 * No need to check for a valid physical address because
> +	 * memblock_alloc_base() panics if it's unable to allocate a physical
> +	 * address.
> +	 */
> +	map = early_memremap(phys_map, size);
> +	if (!map) {
> +		pr_err("Failed to map new EFI memmap\n");
> +		memblock_free(phys_map, size);
> +		return -ENOMEM;
> +	}
> +
> +	new_memmap->phys_map = phys_map;
> +	new_memmap->map = map;
> +	new_memmap->map_end = map + size;
> +	new_memmap->late = 0;
> +	return 0;
>   }
>   
> -static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
> +static int __init __efi_memmap_alloc_late(struct efi_memory_map *new_memmap,
> +					  unsigned long size)
>   {
> +	void *map;
> +	phys_addr_t phys_map;
>   	unsigned int order = get_order(size);
>   	struct page *p = alloc_pages(GFP_KERNEL, order);
>   
> -	if (!p)
> -		return 0;
> +	if (!p) {
> +		pr_err("Failed to allocate pages for new EFI memmap\n");
> +		return -ENOMEM;
> +	}
> +
> +	phys_map = PFN_PHYS(page_to_pfn(p));
> +	new_memmap->phys_map = phys_map;
> +	map = memremap(phys_map, size, MEMREMAP_WB);
> +	if (!map) {
> +		pr_err("Failed to map new EFI memmap\n");
> +		__free_pages(p, order);
> +		return -ENOMEM;
> +	}
>   
> -	return PFN_PHYS(page_to_pfn(p));
> +	new_memmap->phys_map = phys_map;
> +	new_memmap->map = map;
> +	new_memmap->map_end = map + size;
> +	new_memmap->late = 1;
> +	return 0;
>   }
>   
>   /**
> - * efi_memmap_alloc - Allocate memory for the EFI memory map
> - * @num_entries: Number of entries in the allocated map.
> + * efi_memmap_alloc - Allocate and remap memory for a new EFI memory map
> + * @new_memmap: Structure that holds attributes of the new EFI memory map.
> + * @nr_map: Number of entries in the new EFI memory map.
>    *
> - * Depending on whether mm_init() has already been invoked or not,
> - * either memblock or "normal" page allocation is used.
> + * Depending on whether mm_init() has already been invoked or not, either
> + * memblock or "normal" page allocation is used. Accordingly, the allocated
> + * memory is remapped as well and hence is ready for use. Populates fields of
> + * new_memmap like physical address of map, virtual address of map, the type of
> + * allocation performed etc.
>    *
> - * Returns the physical address of the allocated memory map on
> - * success, zero on failure.
> + * Returns zero on success and error code on failure.
>    */
> -phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
> +int __init efi_memmap_alloc(struct efi_memory_map *new_memmap, int nr_map)
>   {
> -	unsigned long size = num_entries * efi.memmap.desc_size;
> +	unsigned long size;
> +
> +	if (!new_memmap)
> +		return -EINVAL;
> +
> +	if (!nr_map)
> +		return -EINVAL;
> +
> +	new_memmap->nr_map = nr_map;
> +	new_memmap->desc_version = efi.memmap.desc_version;
> +	new_memmap->desc_size = efi.memmap.desc_size;
> +	size = nr_map * efi.memmap.desc_size;
>   
>   	if (slab_is_available())
> -		return __efi_memmap_alloc_late(size);
> +		return __efi_memmap_alloc_late(new_memmap, size);
>   
> -	return __efi_memmap_alloc_early(size);
> +	return __efi_memmap_alloc_early(new_memmap, size);
>   }
>   
>   /**
> @@ -54,6 +108,10 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
>    * @new_memmap: Structure that describes EFI memory map.
>    *
>    * Memory is freed depending on the type of allocation performed.
> + * Unlike efi_memmap_alloc() (which allocates and remaps the newly allocated
> + * memory), this routine doesn't unmap new_memmap because there are use cases
> + * where the kernel just needs to unmap the memory map (see efi_init() in arm
> + * and kexec_enter_virtual_mode()) but not free it.
>    */
>   static void __init efi_memmap_free(struct efi_memory_map new_memmap)
>   {
> @@ -241,27 +299,19 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
>   
>   /**
>    * efi_memmap_install - Install a new EFI memory map in efi.memmap
> - * @addr: Physical address of the memory map
> - * @nr_map: Number of entries in the memory map
> - *
> - * Unlike efi_memmap_init_*(), this function does not allow the caller
> - * to switch from early to late mappings. It simply uses the existing
> - * mapping function and installs the new memmap.
> + * @new_memmap: New memory map that should be installed
>    *
> - * Returns zero on success, a negative error code on failure.
> + * Should be called _only_ after calling efi_memmap_alloc() because it's assumed
> + * that the new_memmap has valid values populated which is usually done by
> + * efi_memmap_alloc(). Also, note that this routine unmaps and frees memory
> + * occupied by the existing EFI memory map.
>    */
> -int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
> +void __init efi_memmap_install(struct efi_memory_map new_memmap)
>   {
> -	struct efi_memory_map_data data;
> -
> -	efi_memmap_unmap();
> -
> -	data.phys_map = addr;
> -	data.size = efi.memmap.desc_size * nr_map;
> -	data.desc_version = efi.memmap.desc_version;
> -	data.desc_size = efi.memmap.desc_size;
> -
> -	return __efi_memmap_init(&data, efi.memmap.late);
> +	/* Unmap and free previous memmap */
> +	efi_memmap_unmap_and_free();
> +	efi.memmap = new_memmap;
> +	set_bit(EFI_MEMMAP, &efi.flags);
>   }
>   
>   /**
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 752260e02ae7..e31f9b993e1d 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -566,7 +566,7 @@ typedef efi_status_t efi_get_variable_t (efi_char16_t *name, efi_guid_t *vendor,
>   					 unsigned long *data_size, void *data);
>   typedef efi_status_t efi_get_next_variable_t (unsigned long *name_size, efi_char16_t *name,
>   					      efi_guid_t *vendor);
> -typedef efi_status_t efi_set_variable_t (efi_char16_t *name, efi_guid_t *vendor,
> +typedef efi_status_t efi_set_variable_t (efi_char16_t *name, efi_guid_t *vendor,
>   					 u32 attr, unsigned long data_size,
>   					 void *data);
>   typedef efi_status_t efi_get_next_high_mono_count_t (u32 *count);
> @@ -1015,12 +1015,13 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
>   #endif
>   extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
>   
> -extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
> +extern int __init efi_memmap_alloc(struct efi_memory_map *new_memmap,
> +				   int nr_map);
>   extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
>   extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
>   extern void __init efi_memmap_unmap(void);
>   extern void __init efi_memmap_unmap_and_free(void);
> -extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
> +extern void __init efi_memmap_install(struct efi_memory_map new_memmap);
>   extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
>   					 struct range *range);
>   extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
>
Sai Praneeth Prakhya - Dec. 5, 2018, 6:42 p.m.
> >  	if (efi_enabled(EFI_MEMMAP)) {
> > +		/*
> > +		 * efi_clean_memmap() uses memblock_phys_alloc() to allocate
> > +		 * memory for new EFI memmap and hence will work only after
> > +		 * e820__memblock_setup()
> > +		 */
> > +		efi_clean_memmap();
> >  		efi_fake_memmap();
> >  		efi_find_mirror();
> >  		efi_esrt_init();
> 
> I'd also suggest a namespace cleanup before we do any material
> modifications:
> 
> 'efi_esrt_init()' is the proper pattern to follow, it's prefixed by efi_, then
> followed by the more generic subsystem (_esrt) and then by the functionality
> (_init). This is a good, hierarchical, top-down nomenclature that makes it easy to
> grep for ESRT functionality by typing 'git grep esrt'.
> 
> The same is not true of the memmap functionality: 'git grep efi_memmap_'
> doesn't do the right thing.
> 
> So I think this should be renamed to:
> 
> 	efi_memmap_clean()
> 	efi_memmap_insert()
> 	efi_memmap_free()
> 	efi_memmap_print()
> 	efi_memmap_fake()
> 	etc.
> 
> While at it, there's another area that could be improved:
> 
>  - efi_memmap_fake() is a bit weird naming: it's not really 'fake',
>    presumably the specified table is still very much real, otherwise it
>    won't result in a working kernel.
> 
>    What that functionality *really* is about is user-defined entries. Why
>    not name it in such a fashion? efi_memmap_init_user_defined() or so?

From the example you gave about efi_esrt_init(), the naming of efi memmap 
related functions does look messy to me now.. and yes, a namespace clean up 
might be good.

Regards,
Sai
Sai Praneeth Prakhya - Dec. 5, 2018, 7:32 p.m.
> > -static void __init efi_clean_memmap(void)

> > +void __init efi_clean_memmap(void)

> >   {

> > -	efi_memory_desc_t *out = efi.memmap.map;

> > -	const efi_memory_desc_t *in = out;

> > -	const efi_memory_desc_t *end = efi.memmap.map_end;

> > -	int i, n_removal;

> > -

> > -	for (i = n_removal = 0; in < end; i++) {

> > -		if (efi_memmap_entry_valid(in, i)) {

> > -			if (out != in)

> > -				memcpy(out, in, efi.memmap.desc_size);

> > -			out = (void *)out + efi.memmap.desc_size;

> > -		} else {

> > +	void *out;

> > +	efi_memory_desc_t *md;

> > +	unsigned int i = 0, n_removal = 0;

> 

> Perhaps we can use 'unsigned int i = n_removal = 0;', which appears more

> readable.


Hmm.. that actually throws an compilation error saying that "n_removal is undeclared".
So, let's stick with unsigned int i = 0, n_removal = 0;

> 

> > +	struct efi_memory_map new_memmap;

> > +

> > +	for_each_efi_memory_desc(md) {

> > +		if (!efi_memmap_entry_valid(md, i))

> >   			n_removal++;

> > -		}

> > -		in = (void *)in + efi.memmap.desc_size;

> >   	}

> >

> > -	if (n_removal > 0) {

> > -		u64 size = efi.memmap.nr_map - n_removal;

> > +	if (n_removal == 0)

> 

> and 'if (!n_removal)' above, to stay in sync with the rest of the code in this file.


Sure! Thanks for the catch. I will fix it.

Regards,
Sai

Patch

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index d1e64ac80b9c..744f945a00e7 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,6 +143,7 @@  extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
 extern void efi_free_boot_services(void);
 extern void efi_reserve_boot_services(void);
+extern void __init efi_clean_memmap(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b74e7bfed6ab..bed79b238b0d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1102,6 +1102,12 @@  void __init setup_arch(char **cmdline_p)
 	reserve_bios_regions();
 
 	if (efi_enabled(EFI_MEMMAP)) {
+		/*
+		 * efi_clean_memmap() uses memblock_phys_alloc() to allocate
+		 * memory for new EFI memmap and hence will work only after
+		 * e820__memblock_setup()
+		 */
+		efi_clean_memmap();
 		efi_fake_memmap();
 		efi_find_mirror();
 		efi_esrt_init();
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 715601d1c581..63885cc8e34e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -249,30 +249,36 @@  static bool __init efi_memmap_entry_valid(const efi_memory_desc_t *md, int i)
 	return false;
 }
 
-static void __init efi_clean_memmap(void)
+void __init efi_clean_memmap(void)
 {
-	efi_memory_desc_t *out = efi.memmap.map;
-	const efi_memory_desc_t *in = out;
-	const efi_memory_desc_t *end = efi.memmap.map_end;
-	int i, n_removal;
-
-	for (i = n_removal = 0; in < end; i++) {
-		if (efi_memmap_entry_valid(in, i)) {
-			if (out != in)
-				memcpy(out, in, efi.memmap.desc_size);
-			out = (void *)out + efi.memmap.desc_size;
-		} else {
+	void *out;
+	efi_memory_desc_t *md;
+	unsigned int i = 0, n_removal = 0;
+	struct efi_memory_map new_memmap;
+
+	for_each_efi_memory_desc(md) {
+		if (!efi_memmap_entry_valid(md, i))
 			n_removal++;
-		}
-		in = (void *)in + efi.memmap.desc_size;
 	}
 
-	if (n_removal > 0) {
-		u64 size = efi.memmap.nr_map - n_removal;
+	if (n_removal == 0)
+		return;
 
-		pr_warn("Removing %d invalid memory map entries.\n", n_removal);
-		efi_memmap_install(efi.memmap.phys_map, size);
+	pr_warn("Removing %d invalid memory map entries\n", n_removal);
+	if (efi_memmap_alloc(&new_memmap, efi.memmap.nr_map - n_removal))
+		return;
+
+	i = 0;
+	out = new_memmap.map;
+	for_each_efi_memory_desc(md) {
+		if (efi_memmap_entry_valid(md, i)) {
+			memcpy(out, md, efi.memmap.desc_size);
+			out += efi.memmap.desc_size;
+		}
+		i++;
 	}
+
+	efi_memmap_install(new_memmap);
 }
 
 void __init efi_print_memmap(void)
@@ -537,8 +543,6 @@  void __init efi_init(void)
 		}
 	}
 
-	efi_clean_memmap();
-
 	if (efi_enabled(EFI_DBG))
 		efi_print_memmap();
 }
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index ce6dcd40dd6c..e4e67b391e0a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -242,11 +242,10 @@  EXPORT_SYMBOL_GPL(efi_query_variable_store);
  */
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
-	phys_addr_t new_phys, new_size;
+	struct efi_memory_map new_memmap;
 	struct efi_mem_range mr;
 	efi_memory_desc_t md;
 	int num_entries;
-	void *new;
 
 	if (efi_mem_desc_lookup(addr, &md) ||
 	    md.type != EFI_BOOT_SERVICES_DATA) {
@@ -274,24 +273,13 @@  void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	num_entries = efi_memmap_split_count(&md, &mr.range);
 	num_entries += efi.memmap.nr_map;
 
-	new_size = efi.memmap.desc_size * num_entries;
-
-	new_phys = efi_memmap_alloc(num_entries);
-	if (!new_phys) {
+	if (efi_memmap_alloc(&new_memmap, num_entries)) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
 	}
 
-	new = early_memremap(new_phys, new_size);
-	if (!new) {
-		pr_err("Failed to map new boot services memmap\n");
-		return;
-	}
-
-	efi_memmap_insert(&efi.memmap, new, &mr);
-	early_memunmap(new, new_size);
-
-	efi_memmap_install(new_phys, num_entries);
+	efi_memmap_insert(&efi.memmap, new_memmap.map, &mr);
+	efi_memmap_install(new_memmap);
 }
 
 /*
@@ -389,10 +377,10 @@  static void __init efi_unmap_pages(efi_memory_desc_t *md)
 
 void __init efi_free_boot_services(void)
 {
-	phys_addr_t new_phys, new_size;
+	struct efi_memory_map new_memmap;
 	efi_memory_desc_t *md;
 	int num_entries = 0;
-	void *new, *new_md;
+	void *new_md;
 
 	for_each_efi_memory_desc(md) {
 		unsigned long long start = md->phys_addr;
@@ -444,25 +432,17 @@  void __init efi_free_boot_services(void)
 	if (!num_entries)
 		return;
 
-	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = efi_memmap_alloc(num_entries);
-	if (!new_phys) {
+	if (efi_memmap_alloc(&new_memmap, num_entries)) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
 	}
 
-	new = memremap(new_phys, new_size, MEMREMAP_WB);
-	if (!new) {
-		pr_err("Failed to map new EFI memmap\n");
-		return;
-	}
-
 	/*
 	 * Build a new EFI memmap that excludes any boot services
 	 * regions that are not tagged EFI_MEMORY_RUNTIME, since those
 	 * regions have now been freed.
 	 */
-	new_md = new;
+	new_md = new_memmap.map;
 	for_each_efi_memory_desc(md) {
 		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
 		    (md->type == EFI_BOOT_SERVICES_CODE ||
@@ -473,12 +453,7 @@  void __init efi_free_boot_services(void)
 		new_md += efi.memmap.desc_size;
 	}
 
-	memunmap(new);
-
-	if (efi_memmap_install(new_phys, num_entries)) {
-		pr_err("Could not install new EFI memmap\n");
-		return;
-	}
+	efi_memmap_install(new_memmap);
 }
 
 /*
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 6c7d60c239b5..3490231fc391 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -53,9 +53,8 @@  static int __init cmp_fake_mem(const void *x1, const void *x2)
 void __init efi_fake_memmap(void)
 {
 	int new_nr_map = efi.memmap.nr_map;
+	struct efi_memory_map new_memmap;
 	efi_memory_desc_t *md;
-	phys_addr_t new_memmap_phy;
-	void *new_memmap;
 	int i;
 
 	if (!nr_fake_mem)
@@ -71,25 +70,13 @@  void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = efi_memmap_alloc(new_nr_map);
-	if (!new_memmap_phy)
+	if (efi_memmap_alloc(&new_memmap, new_nr_map))
 		return;
 
-	/* create new EFI memmap */
-	new_memmap = early_memremap(new_memmap_phy,
-				    efi.memmap.desc_size * new_nr_map);
-	if (!new_memmap) {
-		memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
-		return;
-	}
-
 	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]);
-
-	/* swap into new EFI memmap */
-	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
+		efi_memmap_insert(&efi.memmap, new_memmap.map, &fake_mems[i]);
 
-	efi_memmap_install(new_memmap_phy, new_nr_map);
+	efi_memmap_install(new_memmap);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 4318a69bdbbf..56405a11b4c9 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -13,40 +13,94 @@ 
 #include <linux/memblock.h>
 #include <linux/slab.h>
 
-static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
+static int __init __efi_memmap_alloc_early(struct efi_memory_map *new_memmap,
+					   unsigned long size)
 {
-	return memblock_phys_alloc(size, SMP_CACHE_BYTES);
+	void *map;
+	phys_addr_t phys_map;
+
+	phys_map = memblock_phys_alloc(size, SMP_CACHE_BYTES);
+
+	/*
+	 * No need to check for a valid physical address because
+	 * memblock_alloc_base() panics if it's unable to allocate a physical
+	 * address.
+	 */
+	map = early_memremap(phys_map, size);
+	if (!map) {
+		pr_err("Failed to map new EFI memmap\n");
+		memblock_free(phys_map, size);
+		return -ENOMEM;
+	}
+
+	new_memmap->phys_map = phys_map;
+	new_memmap->map = map;
+	new_memmap->map_end = map + size;
+	new_memmap->late = 0;
+	return 0;
 }
 
-static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
+static int __init __efi_memmap_alloc_late(struct efi_memory_map *new_memmap,
+					  unsigned long size)
 {
+	void *map;
+	phys_addr_t phys_map;
 	unsigned int order = get_order(size);
 	struct page *p = alloc_pages(GFP_KERNEL, order);
 
-	if (!p)
-		return 0;
+	if (!p) {
+		pr_err("Failed to allocate pages for new EFI memmap\n");
+		return -ENOMEM;
+	}
+
+	phys_map = PFN_PHYS(page_to_pfn(p));
+	new_memmap->phys_map = phys_map;
+	map = memremap(phys_map, size, MEMREMAP_WB);
+	if (!map) {
+		pr_err("Failed to map new EFI memmap\n");
+		__free_pages(p, order);
+		return -ENOMEM;
+	}
 
-	return PFN_PHYS(page_to_pfn(p));
+	new_memmap->phys_map = phys_map;
+	new_memmap->map = map;
+	new_memmap->map_end = map + size;
+	new_memmap->late = 1;
+	return 0;
 }
 
 /**
- * efi_memmap_alloc - Allocate memory for the EFI memory map
- * @num_entries: Number of entries in the allocated map.
+ * efi_memmap_alloc - Allocate and remap memory for a new EFI memory map
+ * @new_memmap: Structure that holds attributes of the new EFI memory map.
+ * @nr_map: Number of entries in the new EFI memory map.
  *
- * Depending on whether mm_init() has already been invoked or not,
- * either memblock or "normal" page allocation is used.
+ * Depending on whether mm_init() has already been invoked or not, either
+ * memblock or "normal" page allocation is used. Accordingly, the allocated
+ * memory is remapped as well and hence is ready for use. Populates fields of
+ * new_memmap like physical address of map, virtual address of map, the type of
+ * allocation performed etc.
  *
- * Returns the physical address of the allocated memory map on
- * success, zero on failure.
+ * Returns zero on success and error code on failure.
  */
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+int __init efi_memmap_alloc(struct efi_memory_map *new_memmap, int nr_map)
 {
-	unsigned long size = num_entries * efi.memmap.desc_size;
+	unsigned long size;
+
+	if (!new_memmap)
+		return -EINVAL;
+
+	if (!nr_map)
+		return -EINVAL;
+
+	new_memmap->nr_map = nr_map;
+	new_memmap->desc_version = efi.memmap.desc_version;
+	new_memmap->desc_size = efi.memmap.desc_size;
+	size = nr_map * efi.memmap.desc_size;
 
 	if (slab_is_available())
-		return __efi_memmap_alloc_late(size);
+		return __efi_memmap_alloc_late(new_memmap, size);
 
-	return __efi_memmap_alloc_early(size);
+	return __efi_memmap_alloc_early(new_memmap, size);
 }
 
 /**
@@ -54,6 +108,10 @@  phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
  * @new_memmap: Structure that describes EFI memory map.
  *
  * Memory is freed depending on the type of allocation performed.
+ * Unlike efi_memmap_alloc() (which allocates and remaps the newly allocated
+ * memory), this routine doesn't unmap new_memmap because there are use cases
+ * where the kernel just needs to unmap the memory map (see efi_init() in arm
+ * and kexec_enter_virtual_mode()) but not free it.
  */
 static void __init efi_memmap_free(struct efi_memory_map new_memmap)
 {
@@ -241,27 +299,19 @@  int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 
 /**
  * efi_memmap_install - Install a new EFI memory map in efi.memmap
- * @addr: Physical address of the memory map
- * @nr_map: Number of entries in the memory map
- *
- * Unlike efi_memmap_init_*(), this function does not allow the caller
- * to switch from early to late mappings. It simply uses the existing
- * mapping function and installs the new memmap.
+ * @new_memmap: New memory map that should be installed
  *
- * Returns zero on success, a negative error code on failure.
+ * Should be called _only_ after calling efi_memmap_alloc() because it's assumed
+ * that the new_memmap has valid values populated which is usually done by
+ * efi_memmap_alloc(). Also, note that this routine unmaps and frees memory
+ * occupied by the existing EFI memory map.
  */
-int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
+void __init efi_memmap_install(struct efi_memory_map new_memmap)
 {
-	struct efi_memory_map_data data;
-
-	efi_memmap_unmap();
-
-	data.phys_map = addr;
-	data.size = efi.memmap.desc_size * nr_map;
-	data.desc_version = efi.memmap.desc_version;
-	data.desc_size = efi.memmap.desc_size;
-
-	return __efi_memmap_init(&data, efi.memmap.late);
+	/* Unmap and free previous memmap */
+	efi_memmap_unmap_and_free();
+	efi.memmap = new_memmap;
+	set_bit(EFI_MEMMAP, &efi.flags);
 }
 
 /**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 752260e02ae7..e31f9b993e1d 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -566,7 +566,7 @@  typedef efi_status_t efi_get_variable_t (efi_char16_t *name, efi_guid_t *vendor,
 					 unsigned long *data_size, void *data);
 typedef efi_status_t efi_get_next_variable_t (unsigned long *name_size, efi_char16_t *name,
 					      efi_guid_t *vendor);
-typedef efi_status_t efi_set_variable_t (efi_char16_t *name, efi_guid_t *vendor, 
+typedef efi_status_t efi_set_variable_t (efi_char16_t *name, efi_guid_t *vendor,
 					 u32 attr, unsigned long data_size,
 					 void *data);
 typedef efi_status_t efi_get_next_high_mono_count_t (u32 *count);
@@ -1015,12 +1015,13 @@  static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
+extern int __init efi_memmap_alloc(struct efi_memory_map *new_memmap,
+				   int nr_map);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
 extern void __init efi_memmap_unmap_and_free(void);
-extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
+extern void __init efi_memmap_install(struct efi_memory_map new_memmap);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 					 struct range *range);
 extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,