Patchwork [RFC,2/4] x86/setup: parse acpi to get hotplug info before init_mem_mapping()

login
register
mail settings
Submitter Pingfan Liu
Date Jan. 7, 2019, 8:24 a.m.
Message ID <1546849485-27933-3-git-send-email-kernelfans@gmail.com>
Download mbox | patch
Permalink /patch/693647/
State New
Headers show

Comments

Pingfan Liu - Jan. 7, 2019, 8:24 a.m.
At present, memblock bottom-up allocation can help us against stamping over
movable node in very high probability. But if the hotplug info has already
been parsed, the memblock allocator can step around the movable node by
itself. This patch pushes the parsing step forward, just ahead of where,
the memblock allocator can work. Later in this series, the bottom-up
allocation style can be removed on x86_64.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/setup.c | 15 +++++++++++++++
 include/linux/acpi.h    |  1 +
 2 files changed, 16 insertions(+)
Pingfan Liu - Jan. 7, 2019, 12:52 p.m.
On Mon, Jan 7, 2019 at 4:25 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> At present, memblock bottom-up allocation can help us against stamping over
> movable node in very high probability. But if the hotplug info has already
> been parsed, the memblock allocator can step around the movable node by
> itself. This patch pushes the parsing step forward, just ahead of where,
> the memblock allocator can work. Later in this series, the bottom-up
> allocation style can be removed on x86_64.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/kernel/setup.c | 15 +++++++++++++++
>  include/linux/acpi.h    |  1 +
>  2 files changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index acbcd62..df4132c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -805,6 +805,20 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>         return 0;
>  }
>
> +/* only need the effect of acpi_numa_memory_affinity_init()
> + * ->memblock_mark_hotplug()
> + */
> +static int early_detect_acpi_memhotplug(void)
> +{
> +#ifdef CONFIG_ACPI_NUMA
> +       acpi_table_upgrade(__va(get_ramdisk_image()), get_ramdisk_size());
> +       acpi_table_init();
> +       acpi_numa_init();

As this is RFC version, I do not suppress this extra printk info yet.
Should do it next version.
> +       acpi_tb_terminate();
> +#endif
> +       return 0;
> +}
> +
>  /*
>   * Determine if we were loaded by an EFI loader.  If so, then we have also been
>   * passed the efi memmap, systab, etc., so we should use these data structures
> @@ -1131,6 +1145,7 @@ void __init setup_arch(char **cmdline_p)
>         trim_platform_memory_ranges();
>         trim_low_memory_range();
>
> +       early_detect_acpi_memhotplug();
>         init_mem_mapping();
>
>         idt_setup_early_pf();
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 44dcbba..1b69044 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -235,6 +235,7 @@ int acpi_mps_check (void);
>  int acpi_numa_init (void);
>
>  int acpi_table_init (void);
> +void acpi_tb_terminate(void);
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init acpi_table_parse_entries(char *id, unsigned long table_size,
>                               int entry_id,
> --
> 2.7.4
>
Dave Hansen - Jan. 7, 2019, 5:11 p.m.
On 1/7/19 12:24 AM, Pingfan Liu wrote:
> At present, memblock bottom-up allocation can help us against stamping over
> movable node in very high probability.

Is this what you are fixing?  Making a "high probability", a certainty?
 Is this the problem?

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index acbcd62..df4132c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -805,6 +805,20 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>  	return 0;
>  }
>  
> +/* only need the effect of acpi_numa_memory_affinity_init()
> + * ->memblock_mark_hotplug()
> + */

CodingStyle, please.

> +static int early_detect_acpi_memhotplug(void)
> +{
> +#ifdef CONFIG_ACPI_NUMA
> +	acpi_table_upgrade(__va(get_ramdisk_image()), get_ramdisk_size());

This adds a new, early, call to acpi_table_upgrade(), and presumably all
the following functions.  However, it does not remove any of the later
calls.  How do they interact with each other now that they are
presumably called twice?

> +	acpi_table_init();
> +	acpi_numa_init();
> +	acpi_tb_terminate();
> +#endif
> +	return 0;
> +}

Why does this return an 'int' that is unconsumed by its lone caller?

There seems to be a lack of comments on this newly-added code.

>  /*
>   * Determine if we were loaded by an EFI loader.  If so, then we have also been
>   * passed the efi memmap, systab, etc., so we should use these data structures
> @@ -1131,6 +1145,7 @@ void __init setup_arch(char **cmdline_p)
>  	trim_platform_memory_ranges();
>  	trim_low_memory_range();
>  
> +	early_detect_acpi_memhotplug();

Comments, please.  Why is this call here, specifically?  What is it doing?
Pingfan Liu - Jan. 8, 2019, 6:30 a.m.
On Tue, Jan 8, 2019 at 1:11 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
>
> On 1/7/19 12:24 AM, Pingfan Liu wrote:
> > At present, memblock bottom-up allocation can help us against stamping over
> > movable node in very high probability.
>
> Is this what you are fixing?  Making a "high probability", a certainty?
>  Is this the problem?
>

Yes, as my reply on another mail in detail.
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index acbcd62..df4132c 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -805,6 +805,20 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
> >       return 0;
> >  }
> >
> > +/* only need the effect of acpi_numa_memory_affinity_init()
> > + * ->memblock_mark_hotplug()
> > + */
>
> CodingStyle, please.
>

Will fix.
> > +static int early_detect_acpi_memhotplug(void)
> > +{
> > +#ifdef CONFIG_ACPI_NUMA
> > +     acpi_table_upgrade(__va(get_ramdisk_image()), get_ramdisk_size());
>
> This adds a new, early, call to acpi_table_upgrade(), and presumably all
> the following functions.  However, it does not remove any of the later
> calls.  How do they interact with each other now that they are
> presumably called twice?
>

ACPI is a big subsystem, I have a hurry through these functions. This
group seems not to allocate extra memory, and using static data. So if
called twice, just overwriting the effect of previous one. The only
issue is printk some info twice. I will pay more time on this for the
next version.
> > +     acpi_table_init();
> > +     acpi_numa_init();
> > +     acpi_tb_terminate();
> > +#endif
> > +     return 0;
> > +}
>
> Why does this return an 'int' that is unconsumed by its lone caller?
>

No special purpose about the return. Just a habit.
> There seems to be a lack of comments on this newly-added code.
>
> >  /*
> >   * Determine if we were loaded by an EFI loader.  If so, then we have also been
> >   * passed the efi memmap, systab, etc., so we should use these data structures
> > @@ -1131,6 +1145,7 @@ void __init setup_arch(char **cmdline_p)
> >       trim_platform_memory_ranges();
> >       trim_low_memory_range();
> >
> > +     early_detect_acpi_memhotplug();
>
> Comments, please.  Why is this call here, specifically?  What is it doing?
>
It parses the acpi srat to extract memory hotmovable info, and feed
those info to memory allocator. The exactly effect is:
acpi_numa_memory_affinity_init() ->memblock_mark_hotplug(). So later
when memblock allocator allocates range, in __next_mem_range(), there
is cond check to skip movable node:  if (movable_node_is_enabled() &&
memblock_is_hotpluggable(m)) continue;

Thanks,
Pingfan

Patch

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index acbcd62..df4132c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -805,6 +805,20 @@  dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 	return 0;
 }
 
+/* only need the effect of acpi_numa_memory_affinity_init()
+ * ->memblock_mark_hotplug()
+ */
+static int early_detect_acpi_memhotplug(void)
+{
+#ifdef CONFIG_ACPI_NUMA
+	acpi_table_upgrade(__va(get_ramdisk_image()), get_ramdisk_size());
+	acpi_table_init();
+	acpi_numa_init();
+	acpi_tb_terminate();
+#endif
+	return 0;
+}
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -1131,6 +1145,7 @@  void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+	early_detect_acpi_memhotplug();
 	init_mem_mapping();
 
 	idt_setup_early_pf();
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 44dcbba..1b69044 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -235,6 +235,7 @@  int acpi_mps_check (void);
 int acpi_numa_init (void);
 
 int acpi_table_init (void);
+void acpi_tb_terminate(void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init acpi_table_parse_entries(char *id, unsigned long table_size,
 			      int entry_id,