Patchwork nfit, mce: only handle uncorrectable machine checks

Submitter Vishal Verma
Date Oct. 24, 2018, 8:01 p.m.
Message ID <20181024200148.12597-1-vishal.l.verma@intel.com>
Permalink /patch/643213/
State New

Comments

Vishal Verma - Oct. 24, 2018, 8:01 p.m.
We only want a machine check error to be added to libnvdimm's 'badblock'
if it was an uncorrectable error. Currently we insert both corrected and
uncorrectable errors. Add a check in the nfit mce handler to filter out
corrected mce events.

Reported-by: Omar Avelar <omar.avelar@intel.com>
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Cc: stable@vger.kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 arch/x86/include/asm/mce.h       | 1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
 drivers/acpi/nfit/mce.c          | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)
Dan Williams - Oct. 25, 2018, 12:25 a.m.
On Wed, Oct 24, 2018 at 1:03 PM Vishal Verma <vishal.l.verma@intel.com> wrote:
>
> We only want a machine check error to be added to libnvdimm's 'badblock'
> if it was an uncorrectable error. Currently we insert both corrected and
> uncorrectable errors. Add a check in the nfit mce handler to filter out
> corrected mce events.
>
> Reported-by: Omar Avelar <omar.avelar@intel.com>
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Cc: stable@vger.kernel.org
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

Looks good, will let this sit in -next until the back half of the merge window.
Borislav Petkov - Oct. 25, 2018, 10:04 a.m.
Drop stable@ from CC.

On Wed, Oct 24, 2018 at 02:01:48PM -0600, Vishal Verma wrote:
> We only want a machine check error to be added to libnvdimm's 'badblock'
> if it was an uncorrectable error.

What is libnvdimm's 'badblock' ?

Also, pls write in the commit message *why* you want only UE errors.

Also, write your commit message in impartial tone, without the "we".

Thx.
Vishal Verma - Oct. 25, 2018, 5:55 p.m.
On Thu, 2018-10-25 at 12:04 +0200, Borislav Petkov wrote:
> Drop stable@ from CC.
>
> On Wed, Oct 24, 2018 at 02:01:48PM -0600, Vishal Verma wrote:
> > We only want a machine check error to be added to libnvdimm's 'badblock'
> > if it was an uncorrectable error.
>
> What is libnvdimm's 'badblock' ?
>
> Also, pls write in the commit message *why* you want only UE errors.
>
> Also, write your commit message in impartial tone, without the "we".

Hi Borislav,

Thanks for the feedback, I'll send a new revision with better
explanations in the changelog.

Thanks,
	-Vishal
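
For readers following the thread, the filtering this patch introduces can be sketched in userspace C. This is only an illustrative model, not the kernel code: every `sketch_`/`SKETCH_` name is hypothetical, the `is_memory_error` field stands in for the result of mce_is_memory_error(), and the bit positions (UC at bit 61, DEFERRED at bit 44) are assumed to mirror MCI_STATUS_UC and MCI_STATUS_DEFERRED. The UC test in particular is an assumption, because the hunk in the diff elides the middle of mce_is_correctable().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, modelled on MCI_STATUS_UC / MCI_STATUS_DEFERRED. */
#define SKETCH_STATUS_UC       (1ULL << 61)
#define SKETCH_STATUS_DEFERRED (1ULL << 44)

enum sketch_vendor { SKETCH_VENDOR_INTEL, SKETCH_VENDOR_AMD };

/* Hypothetical, heavily reduced stand-in for struct mce. */
struct sketch_mce {
	enum sketch_vendor cpuvendor;
	uint64_t status;
	bool is_memory_error; /* stand-in for mce_is_memory_error() */
};

/* Models mce_is_correctable(): true only for corrected errors. */
bool sketch_is_correctable(const struct sketch_mce *m)
{
	/* AMD deferred errors are treated as not correctable (shown in the diff). */
	if (m->cpuvendor == SKETCH_VENDOR_AMD &&
	    (m->status & SKETCH_STATUS_DEFERRED))
		return false;

	/* Assumption: the elided hunk lines reject events with the UC bit set. */
	if (m->status & SKETCH_STATUS_UC)
		return false;

	return true;
}

/*
 * Models the new early-exit test in nfit_handle_mce(): returns true when
 * the handler would continue past the filter, false when it would return
 * NOTIFY_DONE.
 */
bool sketch_should_record(const struct sketch_mce *m)
{
	if (!m->is_memory_error || sketch_is_correctable(m))
		return false;

	return true;
}
```

Under these assumptions, a corrected memory error (no UC bit) is filtered out, while an uncorrectable memory error or an AMD deferred memory error passes the filter. Events that are not memory errors are dropped as before.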

Patch

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3a17107594c8..3111b3cee2ee 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -216,6 +216,7 @@  static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
 
 int mce_available(struct cpuinfo_x86 *c);
 bool mce_is_memory_error(struct mce *m);
+bool mce_is_correctable(struct mce *m);
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 953b3ce92dcc..27015948bc41 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -534,7 +534,7 @@  bool mce_is_memory_error(struct mce *m)
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
-static bool mce_is_correctable(struct mce *m)
+bool mce_is_correctable(struct mce *m)
 {
 	if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
 		return false;
@@ -544,6 +544,7 @@  static bool mce_is_correctable(struct mce *m)
 
 	return true;
 }
+EXPORT_SYMBOL_GPL(mce_is_correctable);
 
 static bool cec_add_mce(struct mce *m)
 {
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index e9626bf6ca29..7a51707f87e9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,8 +25,8 @@  static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct acpi_nfit_desc *acpi_desc;
 	struct nfit_spa *nfit_spa;
 
-	/* We only care about memory errors */
-	if (!mce_is_memory_error(mce))
+	/* We only care about uncorrectable memory errors */
+	if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
 		return NOTIFY_DONE;
 
 	/*