Patchwork arm64/io: Don't use WZR in writel

login
register
mail settings
Submitter AngeloGioacchino Del Regno
Date Feb. 9, 2019, 6:34 p.m.
Message ID <68b71c15f32341468a868f6418e4fcb375bc49ba.camel@gmail.com>
Download mbox | patch
Permalink /patch/722243/
State New
Headers show

Comments

AngeloGioacchino Del Regno - Feb. 9, 2019, 6:34 p.m.
From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
From: "Angelo G. Del Regno" <kholk11@gmail.com>
Date: Sat, 9 Feb 2019 18:56:46 +0100
Subject: [PATCH] arm64/io: Don't use WZR in writel

This is a partial revert of commit ee5e41b5f21a
("arm64/io: Allow I/O writes to use {W,X}ZR")

When we try to use the zero register directly on some SoCs,
their security will make them freeze due to a firmware bug.
This behavior is seen with the arm-smmu driver freezing on
TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Allocating a temporary register to store the zero for the
write actually solves the issue on these SoCs.

Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
---
 arch/arm64/include/asm/io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Will Deacon - Feb. 11, 2019, 10:57 a.m.
On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
> From: "Angelo G. Del Regno" <kholk11@gmail.com>
> Date: Sat, 9 Feb 2019 18:56:46 +0100
> Subject: [PATCH] arm64/io: Don't use WZR in writel
> 
> This is a partial revert of commit ee5e41b5f21a
> ("arm64/io: Allow I/O writes to use {W,X}ZR")
> 
> When we try to use the zero register directly on some SoCs,
> their security will make them freeze due to a firmware bug.
> This behavior is seen with the arm-smmu driver freezing on
> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Hmm, this sounds very fragile. I hope they're not trapping and emulating
MMIO accesses and treating the zero register as the stack pointer...

Wouldn't this also be triggerable from userspace by mmap()ing either
/dev/mem or e.g. a PCI bar via sysfs?

> Allocating a temporary register to store the zero for the
> write actually solves the issue on these SoCs.

I don't think this catches all MMIO accesses, so I think we need to
understand more about the actual issue here. For example, is it only the
SMMU that causes this problem? Also, any workaround should be specific to
the broken SoCs.

Will
Marc Zyngier - Feb. 11, 2019, 11:52 a.m.
On 11/02/2019 10:57, Will Deacon wrote:
> On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
>> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
>> From: "Angelo G. Del Regno" <kholk11@gmail.com>
>> Date: Sat, 9 Feb 2019 18:56:46 +0100
>> Subject: [PATCH] arm64/io: Don't use WZR in writel
>>
>> This is a partial revert of commit ee5e41b5f21a
>> ("arm64/io: Allow I/O writes to use {W,X}ZR")
>>
>> When we try to use the zero register directly on some SoCs,
>> their security will make them freeze due to a firmware bug.>> This behavior is seen with the arm-smmu driver freezing on
>> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

This looks similar to the issue these SoCs have with GICv3, worked
around in 9c8114c20d18.

> Hmm, this sounds very fragile. I hope they're not trapping and emulating
> MMIO accesses and treating the zero register as the stack pointer...

I bet this is the case. The same bug was there in both KVM and Xen. The
only difference is that we fixed it back in December 2015 (at least for
KVM), while some of these SoCs were announced in 2017, and are still
shipping. Great stuff.

> 
> Wouldn't this also be triggerable from userspace by mmap()ing either
> /dev/mem or e.g. a PCI bar via sysfs?
> 
>> Allocating a temporary register to store the zero for the
>> write actually solves the issue on these SoCs.
> 
> I don't think this catches all MMIO accesses, so I think we need to
> understand more about the actual issue here. For example, is it only the
> SMMU that causes this problem? Also, any workaround should be specific to
> the broken SoCs.

Also, nothing would prevent a compiler from generating these accesses.

	M.

Jazz is not dead. It just smells funny...
AngeloGioacchino Del Regno - Feb. 11, 2019, 2:29 p.m.
Il giorno lun, 11/02/2019 alle 11.52 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 10:57, Will Deacon wrote:
> > On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del
> > Regno wrote:
> > > From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00
> > > 2001
> > > From: "Angelo G. Del Regno" <kholk11@gmail.com>
> > > Date: Sat, 9 Feb 2019 18:56:46 +0100
> > > Subject: [PATCH] arm64/io: Don't use WZR in writel
> > > 
> > > This is a partial revert of commit ee5e41b5f21a
> > > ("arm64/io: Allow I/O writes to use {W,X}ZR")
> > > 
> > > When we try to use the zero register directly on some SoCs,
> > > their security will make them freeze due to a firmware bug.>>
> > > This behavior is seen with the arm-smmu driver freezing on
> > > TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.
> 
> This looks similar to the issue these SoCs have with GICv3, worked
> around in 9c8114c20d18.
> 

Well, yes that's a firmware quirk, of course, due to the "security"
stuff that they have inside...

> > Hmm, this sounds very fragile. I hope they're not trapping and
> > emulating
> > MMIO accesses and treating the zero register as the stack
> > pointer...
> 
> I bet this is the case. The same bug was there in both KVM and Xen.
> The
> only difference is that we fixed it back in December 2015 (at least
> for
> KVM), while some of these SoCs were announced in 2017, and are still
> shipping. Great stuff.

Totally agree, they must be using it as stack pointer.
Poor decision.

> 
> > Wouldn't this also be triggerable from userspace by mmap()ing
> > either
> > /dev/mem or e.g. a PCI bar via sysfs?
> > 
> > > Allocating a temporary register to store the zero for the
> > > write actually solves the issue on these SoCs.
> > 
> > I don't think this catches all MMIO accesses, so I think we need to
> > understand more about the actual issue here. For example, is it
> > only the
> > SMMU that causes this problem? Also, any workaround should be
> > specific to
> > the broken SoCs.
> 
> Also, nothing would prevent a compiler from generating these
> accesses.
> 
> 	M.
> 
> Jazz is not dead. It just smells funny...

While I agree that nothing would prevent a compiler from generating
these accesses, please take in mind that everything worked on
downstream kernels before this change was introduced (which is first
seen downstream on msm-4.9).
So I've discovered it on msm-4.9 while porting the 8996-98, 630-660
to that and I've had a whole lot of head scratching: the arm-smmu
code was apparently right, then I've seen that surprise......
By the way, I can tell you for sure that this bug is not present on
at least SDM845, since that one worked fine even before this fix,
and I imagine that also SDM670 and newest may not be affected.
Also Family-B SoCs are not affected by this bug (MSM8916-36-37-56-76).

Unfortunately, I couldn't think of any other solution on these
Family-A SoCs, also because I'm not totally sure that the only
driver that produces this issue is arm-smmu. When I've fixed it
on the downstream kernel, I've also had some other random freezes
that weren't related to the SMMU: usually qseecom stuff was also
acting funny sometimes.

Also, just one more thing: yes this thing is going ARM64-wide and
- from my findings - it's targeting certain Qualcomm SoCs, but...
I'm not sure that only QC is affected by that, others may as well
have the same stupid bug.
Marc Zyngier - Feb. 11, 2019, 2:59 p.m.
On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:

[...]

> Also, just one more thing: yes this thing is going ARM64-wide and
> - from my findings - it's targeting certain Qualcomm SoCs, but...
> I'm not sure that only QC is affected by that, others may as well
> have the same stupid bug.
> 

At the moment, only QC SoCs seem to be affected, probably because
everyone else has debugged their hypervisor (or most likely doesn't
bother with shipping one).

In all honesty, we need some information from QC here: which SoCs are
affected, what is the exact nature of the bug, can it be triggered from
EL0. Randomly papering over symptoms is not something I really like
doing, and is likely to generate problems on unaffected systems.

Thanks,

	M.
AngeloGioacchino Del Regno - Feb. 11, 2019, 4:15 p.m.
Il giorno lun, 11/02/2019 alle 14.59 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 

Between all the ARM SoCs, as far as I know, the only (?) ones using
actual "smartphones", so actually provisioned SoCs, for upstream
development are using Qualcomm SoCs.. of which, some development
boards are not entirely security enabled, or have got newer firmwares
which can't be used on production phones etc, so.. that's why I said
that I'm not sure that only QC is affected.
It's just relative to what we currently know, but looking at, for
example, MediaTek, I'm not sure that the only bugged hypervisor is on
QC (because MTK is very similar on certain aspects).
I mean, it's highly possible that some other is affected and we don't
know (and we possibly don't care...).

> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered
> from

It'd be wonderful if Qualcomm gives us some information about that.
Would really be helpful and nice from them.

> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 
> Thanks,
> 
> 	M.

I also don't like "randomly papering over symptoms", I totally agree
with you on that... but this change potentially generating problems on
unaffected systems is something I don't really agree on: this is a
partial revert of a commit that was done purely to introduce some
vmlinux (relatively small) size saving on ARM64 and no other reason
(as far as I can read on the original commit), so I really don't think
that my partial revert could ever harm anything.
Though, this is a personal opinion, I can be right, but I can obviously
be totally wrong on that.

Though I have to make this clear: if there's another (better/cleaner)
solution to this issue, I'd be totally happy (and, of course, curious
too) to see it!
Robin Murphy - Feb. 11, 2019, 4:37 p.m.
On 11/02/2019 14:59, Marc Zyngier wrote:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
>> Also, just one more thing: yes this thing is going ARM64-wide and
>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>> I'm not sure that only QC is affected by that, others may as well
>> have the same stupid bug.
>>
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.

And even if we *were* to just try papering over the observed extent of 
the issue, I'd still be inclined to confine it to arm-smmu.c where the 
impact is finite and minimal - of the 4 instances of writel(0) there, 3 
of them don't care what the data is (so could just reuse the base 
register or similar) and the other one already has a zero in a GPR to 
hand by construction.

Robin.
Bjorn Andersson - Feb. 23, 2019, 6:12 p.m.
On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:

> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 

The bug at hand is that the XZR is not deemed a valid source in the
virtualization of the SMMU registers. It was identified and fixed for
all platforms that are shipping kernels based on v4.9 or later.

As such Angelo's list of affected platforms covers the high-profile
ones. In particular MSM8996 and MSM8998 is getting pretty good support
upstream, if we can figure out a way around this issue.

Regards,
Bjorn
Marc Zyngier - Feb. 23, 2019, 6:37 p.m.
On Sat, 23 Feb 2019 18:12:54 +0000,
Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> 
> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> 
> > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > 
> > [...]
> > 
> > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > I'm not sure that only QC is affected by that, others may as well
> > > have the same stupid bug.
> > > 
> > 
> > At the moment, only QC SoCs seem to be affected, probably because
> > everyone else has debugged their hypervisor (or most likely doesn't
> > bother with shipping one).
> > 
> > In all honesty, we need some information from QC here: which SoCs are
> > affected, what is the exact nature of the bug, can it be triggered from
> > EL0. Randomly papering over symptoms is not something I really like
> > doing, and is likely to generate problems on unaffected systems.
> > 
> 
> The bug at hand is that the XZR is not deemed a valid source in the
> virtualization of the SMMU registers. It was identified and fixed for
> all platforms that are shipping kernels based on v4.9 or later.

When you say "fixed": Do you mean fixed in the firmware? Or by adding
a workaround in the shipped kernel? If the former, is this part of an
official QC statement, with an associated erratum number? Is this
really limited to the SMMU accesses?

> As such Angelo's list of affected platforms covers the high-profile
> ones. In particular MSM8996 and MSM8998 is getting pretty good support
> upstream, if we can figure out a way around this issue.

We'd need an exhaustive list of the affected SoCs, and work out if we
can limit the hack to the SMMU driver (cc'ing Robin, who's the one
who'd know about it).

Thanks,

	M.
Bjorn Andersson - Feb. 24, 2019, 3:53 a.m.
On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:

> On Sat, 23 Feb 2019 18:12:54 +0000,
> Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> > 
> > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > 
> > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > 
> > > [...]
> > > 
> > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > I'm not sure that only QC is affected by that, others may as well
> > > > have the same stupid bug.
> > > > 
> > > 
> > > At the moment, only QC SoCs seem to be affected, probably because
> > > everyone else has debugged their hypervisor (or most likely doesn't
> > > bother with shipping one).
> > > 
> > > In all honesty, we need some information from QC here: which SoCs are
> > > affected, what is the exact nature of the bug, can it be triggered from
> > > EL0. Randomly papering over symptoms is not something I really like
> > > doing, and is likely to generate problems on unaffected systems.
> > > 
> > 
> > The bug at hand is that the XZR is not deemed a valid source in the
> > virtualization of the SMMU registers. It was identified and fixed for
> > all platforms that are shipping kernels based on v4.9 or later.
> 
> When you say "fixed": Do you mean fixed in the firmware? Or by adding
> a workaround in the shipped kernel?

I mean that it's fixed in the firmware.

> If the former, is this part of an official QC statement, with an
> associated erratum number?

I don't know, will get back to you on this one.

> Is this really limited to the SMMU accesses?
> 

Yes.

> > As such Angelo's list of affected platforms covers the high-profile
> > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > upstream, if we can figure out a way around this issue.
> 
> We'd need an exhaustive list of the affected SoCs, and work out if we
> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> who'd know about it).
> 

I will try to compose a list.

Regards,
Bjorn

Patch

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index ee723835c1f4..a0a6d1aeb670 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -49,7 +49,7 @@  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
 #define __raw_writel __raw_writel
 static inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-	asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
 }
 
 #define __raw_writeq __raw_writeq