Patchwork [05/22] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()

login
register
mail settings
Submitter Sebastian Andrzej Siewior
Date Jan. 9, 2019, 11:47 a.m.
Message ID <20190109114744.10936-6-bigeasy@linutronix.de>
Download mbox | patch
Permalink /patch/695755/
State New
Headers show

Comments

Sebastian Andrzej Siewior - Jan. 9, 2019, 11:47 a.m.
Since ->initialized is always true for user tasks and kernel threads
don't get this far, we always save the registers directly to userspace.

Remove check for ->initialized because it is always true and remove the
false condition.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)
Borislav Petkov - Jan. 16, 2019, 7:36 p.m.
On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote:
> Since ->initialized is always true for user tasks and kernel threads
> don't get this far,

Yeah, this is commit message is too laconic. Don'g get this far "where"?

> we always save the registers directly to userspace.

We don't save registers to userspace - please write stuff out.

So from looking at what you're removing I can barely rhyme up what
you're doing but this needs a lot more explanation why it is OK to
remove the else case. Hell, why was the thing added in the first place
if ->initialized is always true?

And why is it ok to save registers directly to the user task's buffers?

So please be more verbose even at the risk of explaning the obvious.
This is the FPU code, remember? :)

Thx.
Sebastian Andrzej Siewior - Jan. 16, 2019, 10:40 p.m.
On 2019-01-16 20:36:03 [+0100], Borislav Petkov wrote:
> On Wed, Jan 09, 2019 at 12:47:27PM +0100, Sebastian Andrzej Siewior wrote:
> > Since ->initialized is always true for user tasks and kernel threads
> > don't get this far,
> 
> Yeah, this is commit message is too laconic. Don'g get this far "where"?

To reach copy_fpregs_to_sigframe(). A kernel thread never invokes
copy_fpregs_to_sigframe(). Which means only user threads invoke
copy_fpregs_to_sigframe() and they have ->initialized set to one.

> > we always save the registers directly to userspace.
> 
> We don't save registers to userspace - please write stuff out.

Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to
task's stack frame which is userspace memory.

> So from looking at what you're removing I can barely rhyme up what
> you're doing but this needs a lot more explanation why it is OK to
> remove the else case. Hell, why was the thing added in the first place
> if ->initialized is always true?

I think *parts* of the ->initialized field was wrongly converted while
lazy-FPU was removed *or* it was forgotten to be removed afterwards. Or
I don't know but it looks like a leftover.

At the beginning (while it was added) it was part of the lazy-FPU code.
So if tasks's FPU register are not active then they are saved in task's
FPU struct. So in this case (the else path) it does
	__copy_to_user(buf_fx, xsave, fpu_user_xstate_size)

In the other case (task's FPU struct is not up-to date, the current
FPU register content is in CPU's registers) it does
	copy_fpregs_to_sigframe(buf_fx)
	
which copies CPU's registers. In both cases it copies them (the FPU
registers) to the task's stack frame (the same location). Easy so far?

How does using_compacted_format() fit in here?
The point is that the "compacted" format is never exposed to
userland so it requires normal xsave. So far so good, right? But how
does it work in in the '->initialized = 0' case right?  It was
introduced in commit
  99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use")

and it probably does not explain why this works, right?
So *either* fpregs_active() was always true if the task used FPU *once*
or if it used FPU *recently* and task's FPU register are active (I don't
remember anymore). Anyway:
a) we don't get here because caller checks for fpregs_active() before
   invoking copy_fpstate_to_sigframe()
b) a preemption check resets fpregs_active() after the first check
   then we do "xsave", xsaves traps because FPU is off/disabled, trap
   loads task's FPU registers, gets back to "xsave", "xsave" saves
   CPU's register to the stack frame.

The b part does not work like that since commit
  bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")

but then at that point it was "okay" because fpregs_active() would
return true if the task used FPU registers at least once. If it did not
use them then it would not invoke that function (the caller checks for
fpregs_active()).

> And why is it ok to save registers directly to the user task's buffers?
So I can't tell you why it is okay but I can explain why it is done
(well, that part I puzzled together).
The task is running and using FPU registers. Then an evil mind sends a
signal. The task goes into kernel, prepares itself and is about to
handle the signal in userland. It saves its FPU registers on the stack
frame. It zeros its current FPU registers (ready for a fresh start),
loads the address of the signal handler and returns to user land
handling the signal.

Now. The signal handler may use FPU registers and the signal handler
maybe be preempted so you need to save the FPU registers of the signal
handler and you can't mix them up with the FPU register's of the task
(before it started handling the signal).

So in order to avoid a second FPU struct it saves them on user's stack
frame. I *think* this (avoiding a second FPU struct) is the primary
motivation. A bonus point might be that the signal handler has a third
argument the `context'. That means you can use can access the task's FPU
registers from the signal handler. Not sure *why* you want to do so but
yo can.
I can't imagine a use case and I was looking for a user and expecting it
to be glibc but I didn't find anything in the glibc that would explain
it. Intel even defines a few bytes as "user reserved" which are used by
"struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
restore.
The only user that seems to make use of that is `criu' (or it looked
like it does use it). I would prefer to add a second struct-FPU and use
that for the signal handler. This would avoid the whole dance here. And
`criu' could maybe become a proper interface. I don't think as of now
that it will break something in userland if the signal handler suddenly
does not have a pointer to the FPU struct.

> So please be more verbose even at the risk of explaning the obvious.
> This is the FPU code, remember? :)

Okay. So I was verbose *now*. Depending on what you say (or don't) I
will try to recycle this into commit message in a few days.

> Thx.

Sebastian
Borislav Petkov - Jan. 17, 2019, 12:22 p.m.
On Wed, Jan 16, 2019 at 11:40:37PM +0100, Sebastian Andrzej Siewior wrote:
> Actually we do. copy_fpregs_to_sigframe() saves current FPU registers to
> task's stack frame which is userspace memory.

I know we do - I was only pointing at the not optimal choice of words -
"save registers to userspace" and to rather say "save hardware registers
to user buffers" or so.

> I think *parts* of the ->initialized field was wrongly converted while
> lazy-FPU was removed *or* it was forgotten to be removed afterwards. Or
> I don't know but it looks like a leftover.
> 
> At the beginning (while it was added) it was part of the lazy-FPU code.
> So if tasks's FPU register are not active then they are saved in task's
> FPU struct. So in this case (the else path) it does
> 	__copy_to_user(buf_fx, xsave, fpu_user_xstate_size)

So far, so good. Comment above says so too:

 * If the fpu, extended register state is live, save the state directly
 * to the user frame pointed by the aligned pointer 'buf_fx'. Otherwise,
 * copy the thread's fpu state to the user frame starting at 'buf_fx'.

> In the other case (task's FPU struct is not up-to date, the current
> FPU register content is in CPU's registers) it does
> 	copy_fpregs_to_sigframe(buf_fx)

ACK.

> How does using_compacted_format() fit in here?
> The point is that the "compacted" format is never exposed to
> userland so it requires normal xsave. So far so good, right? But how
> does it work in in the '->initialized = 0' case right?  It was
> introduced in commit
>   99aa22d0d8f7 ("x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use")
> 
> and it probably does not explain why this works, right?

I think this was imposed by our inability to handle XSAVES compacted
format. And that should be fixed now, AFAICR.

> So *either* fpregs_active() was always true if the task used FPU *once*
> or if it used FPU *recently* and task's FPU register are active (I don't
> remember anymore). Anyway:
> a) we don't get here because caller checks for fpregs_active() before
>    invoking copy_fpstate_to_sigframe()

Ok.

> b) a preemption check resets fpregs_active() after the first check
>    then we do "xsave", xsaves traps because FPU is off/disabled, trap
>    loads task's FPU registers, gets back to "xsave", "xsave" saves
>    CPU's register to the stack frame.
> 
> The b part does not work like that since commit
>   bef8b6da9522 ("x86/fpu: Handle #NM without FPU emulation as an error")
> 
> but then at that point it was "okay" because fpregs_active() would
> return true if the task used FPU registers at least once. If it did not
> use them then it would not invoke that function (the caller checks for
> fpregs_active()).

Right, AFAICT, we were moving to eager FPU at that time and this commit
is part of the lazy FPU removal stuff.

> So I can't tell you why it is okay but I can explain why it is done
> (well, that part I puzzled together).

I hate the fact that we have to puzzle stuff together for the FPU code.
;-\

> The task is running and using FPU registers. Then an evil mind sends a
> signal. The task goes into kernel, prepares itself and is about to
> handle the signal in userland. It saves its FPU registers on the stack
> frame. It zeros its current FPU registers (ready for a fresh start),
> loads the address of the signal handler and returns to user land
> handling the signal.
> 
> Now. The signal handler may use FPU registers and the signal handler
> maybe be preempted so you need to save the FPU registers of the signal
> handler and you can't mix them up with the FPU register's of the task
> (before it started handling the signal).
> 
> So in order to avoid a second FPU struct it saves them on user's stack
> frame. I *think* this (avoiding a second FPU struct) is the primary
> motivation.

Yah, makes sense. Sounds like something we'd do :-)

> A bonus point might be that the signal handler has a third
> argument the `context'. That means you can use can access the task's FPU
> registers from the signal handler. Not sure *why* you want to do so but
> yo can.

For <raisins>.

> I can't imagine a use case and I was looking for a user and expecting it
> to be glibc but I didn't find anything in the glibc that would explain
> it. Intel even defines a few bytes as "user reserved" which are used by
> "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
> restore.
> The only user that seems to make use of that is `criu' (or it looked
> like it does use it). I would prefer to add a second struct-FPU and use
> that for the signal handler. This would avoid the whole dance here.

That would be interesting from the perspective of making the code
straight-forward and not having to document all that dance somewhere.

> And `criu' could maybe become a proper interface. I don't think as of
> now that it will break something in userland if the signal handler
> suddenly does not have a pointer to the FPU struct.

Well, but allocating a special FPU pointer for the signal handler
context sounds simple and clean, no? Or are we afraid that that would
slowdown signal handling, the whole allocation and assignment and
stuff...?

> Okay. So I was verbose *now*. Depending on what you say (or don't) I
> will try to recycle this into commit message in a few days.

Yeah, much much better. Thanks a lot for the effort!
Sebastian Andrzej Siewior - Jan. 18, 2019, 9:14 p.m.
tl;dr
The kernel saves task's FPU registers on user's signal stack before
entering the signal handler. Can we avoid that and have in-kernel memory
for that? Does someone rely on the FPU registers from the task in the
signal handler?

On 2019-01-17 13:22:53 [+0100], Borislav Petkov wrote:
> > The task is running and using FPU registers. Then an evil mind sends a
> > signal. The task goes into kernel, prepares itself and is about to
> > handle the signal in userland. It saves its FPU registers on the stack
> > frame. It zeros its current FPU registers (ready for a fresh start),
> > loads the address of the signal handler and returns to user land
> > handling the signal.
> > 
> > Now. The signal handler may use FPU registers and the signal handler
> > maybe be preempted so you need to save the FPU registers of the signal
> > handler and you can't mix them up with the FPU register's of the task
> > (before it started handling the signal).
> > 
> > So in order to avoid a second FPU struct it saves them on user's stack
> > frame. I *think* this (avoiding a second FPU struct) is the primary
> > motivation.
> 
> Yah, makes sense. Sounds like something we'd do :-)
> 
> > A bonus point might be that the signal handler has a third
> > argument the `context'. That means you can use can access the task's FPU
> > registers from the signal handler. Not sure *why* you want to do so but
> > yo can.
> 
> For <raisins>.
> 
> > I can't imagine a use case and I was looking for a user and expecting it
> > to be glibc but I didn't find anything in the glibc that would explain
> > it. Intel even defines a few bytes as "user reserved" which are used by
> > "struct _fpx_sw_bytes" to add a marker in the signal and recognise it on
> > restore.
> > The only user that seems to make use of that is `criu' (or it looked
> > like it does use it). I would prefer to add a second struct-FPU and use
> > that for the signal handler. This would avoid the whole dance here.
> 
> That would be interesting from the perspective of making the code
> straight-forward and not having to document all that dance somewhere.
> 
> > And `criu' could maybe become a proper interface. I don't think as of
> > now that it will break something in userland if the signal handler
> > suddenly does not have a pointer to the FPU struct.
> 
> Well, but allocating a special FPU pointer for the signal handler
> context sounds simple and clean, no? Or are we afraid that that would
> slowdown signal handling, the whole allocation and assignment and
> stuff...?

So I *think* we could allocate a second struct fpu for the signal
handler at task creation time and use it.
It should not slow-down signal handling. So instead saving it to user's
stack we would save it to "our" memory. On the restore path we could
trust our buffer and simply load it again.

Sebastian
Dave Hansen - Jan. 18, 2019, 9:17 p.m.
On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> The kernel saves task's FPU registers on user's signal stack before
> entering the signal handler. Can we avoid that and have in-kernel memory
> for that? Does someone rely on the FPU registers from the task in the
> signal handler?

This is part of our ABI for *sure*.  Inspecting that state is how
userspace makes sense of MPX or protection keys faults.  We even use
this in selftests/.
Sebastian Andrzej Siewior - Jan. 18, 2019, 9:37 p.m.
On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote:
> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> > The kernel saves task's FPU registers on user's signal stack before
> > entering the signal handler. Can we avoid that and have in-kernel memory
> > for that? Does someone rely on the FPU registers from the task in the
> > signal handler?
> 
> This is part of our ABI for *sure*.  

I missed that part. I will try to look it up and look see if says
something about optional part.
But ABI means we must keep doing it even if there are no users?

> Inspecting that state is how
> userspace makes sense of MPX or protection keys faults.  We even use
> this in selftests/.

Okay. MPX does not check for FP_XSTATE_MAGIC[12] and simply assumes it
is there. That is why I didn't find it.
So we would break MPX. But then MPX is on its way out, so…

Sebastian
Dave Hansen - Jan. 18, 2019, 9:43 p.m.
On 1/18/19 1:37 PM, Sebastian Andrzej Siewior wrote:
> On 2019-01-18 13:17:28 [-0800], Dave Hansen wrote:
>> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
>>> The kernel saves task's FPU registers on user's signal stack before
>>> entering the signal handler. Can we avoid that and have in-kernel memory
>>> for that? Does someone rely on the FPU registers from the task in the
>>> signal handler?
>>
>> This is part of our ABI for *sure*.  
> 
> I missed that part. I will try to look it up and look see if says
> something about optional part.
> But ABI means we must keep doing it even if there are no users?

I'd bet a large sum of money there are users.

Google for 'uc_mcontext fpregs'.
Oleg Nesterov - Jan. 21, 2019, 11:21 a.m.
On 01/18, Dave Hansen wrote:
>
> On 1/18/19 1:14 PM, Sebastian Andrzej Siewior wrote:
> > The kernel saves task's FPU registers on user's signal stack before
> > entering the signal handler. Can we avoid that and have in-kernel memory
> > for that? Does someone rely on the FPU registers from the task in the
> > signal handler?
>
> This is part of our ABI for *sure*.  Inspecting that state is how
> userspace makes sense of MPX or protection keys faults.  We even use
> this in selftests/.

Yes.

And in any case I do not understand the idea to use the second in-kernel struct fpu.
A signal handler can be interrupted by another signal, this will need to save/restore
the FPU state again.

Oleg.
Borislav Petkov - Jan. 22, 2019, 1:40 p.m.
On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote:
> And in any case I do not understand the idea to use the second
> in-kernel struct fpu. A signal handler can be interrupted by another
> signal, this will need to save/restore the FPU state again.

Well, we were just speculating whether doing that would simplify the
code around get_sigframe() et al. But if that is an ABI, then we can't
really touch it.

Btw, where is that whole ABI deal about saving FPU regs on the user
signal stack documented?

Thx.
Oleg Nesterov - Jan. 22, 2019, 4:15 p.m.
On 01/22, Borislav Petkov wrote:
>
> On Mon, Jan 21, 2019 at 12:21:17PM +0100, Oleg Nesterov wrote:
> > And in any case I do not understand the idea to use the second
> > in-kernel struct fpu. A signal handler can be interrupted by another
> > signal, this will need to save/restore the FPU state again.
>
> Well, we were just speculating whether doing that would simplify the
> code around get_sigframe() et al. But if that is an ABI, then we can't
> really touch it.
>
> Btw, where is that whole ABI deal about saving FPU regs on the user
> signal stack documented?

I don't know... tried to google, found nothing.

the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
reasons, this didn't help.

Oleg.
Borislav Petkov - Jan. 22, 2019, 5 p.m.
On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote:
> I don't know... tried to google, found nothing.
> 
> the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
> reasons, this didn't help.

So I'm being told by one of the psABI folks that this is not really
written down somewhere explicitly but it is the result from the POSIX
and psABI treatise of signal handlers, what they're supposed to do,
caller- and callee-saved registers, etc.

And FPU registers are volatile, i.e., caller-saved. Which means, the
handler itself doesn't save them but the caller, which, doesn't really
expect any signals - they are async. So the kernel must do that and
slap the FPU regs onto the user stack...

Hohumm. Makes sense.
Sebastian Andrzej Siewior - Feb. 5, 2019, 11:17 a.m.
On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote:
> > This is part of our ABI for *sure*.  Inspecting that state is how
> > userspace makes sense of MPX or protection keys faults.  We even use
> > this in selftests/.
> 
> Yes.
> 
> And in any case I do not understand the idea to use the second in-kernel struct fpu.
> A signal handler can be interrupted by another signal, this will need to save/restore
> the FPU state again.

So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the
current signal is handled. So no interruption. But then SIGSEGV is
probably the exception which will interrupt SIGUSR1. So we would need a
third one…

The idea was to save the FPU state in-kernel so we don't have to
revalidate everything because userspace had access to it and could do
things.
Some things are complicated and not documented why they are the way they
are. For instance on 64bit (based on the code) the signal handler can
remove SSE from the state-mask and the kernel loads the "default-empty"
SSE registers and the enabled states from user. This isn't done on
32bit. Also: we save with XSAVE and allocate the buffer on stack. But if
we can't find the FP_XSTATE_MAGIC* or the buffer is not properly aligned
then we fallback to FXSR and assume that we have a FXSR buffer in front
of us.

> Oleg.

Sebastian
Sebastian Andrzej Siewior - Feb. 5, 2019, 11:34 a.m.
On 2019-01-22 18:00:23 [+0100], Borislav Petkov wrote:
> On Tue, Jan 22, 2019 at 05:15:51PM +0100, Oleg Nesterov wrote:
> > I don't know... tried to google, found nothing.
> > 
> > the comment in /usr/include/sys/ucontext.h mentions SysV/i386 ABI + historical
> > reasons, this didn't help.
> 
> So I'm being told by one of the psABI folks that this is not really
> written down somewhere explicitly but it is the result from the POSIX
> and psABI treatise of signal handlers, what they're supposed to do,
> caller- and callee-saved registers, etc.
> 
> And FPU registers are volatile, i.e., caller-saved. Which means, the
> handler itself doesn't save them but the caller, which, doesn't really
> expect any signals - they are async. So the kernel must do that and
> slap the FPU regs onto the user stack...

My point was save them somewhere else if it is possible. So we could
save a few cycles during signal delivery and it would make the code a
little simpler.

Let me finish the series and then we can think how we could improve it.

> Hohumm. Makes sense.
> 

Sebastian
Oleg Nesterov - Feb. 26, 2019, 4:38 p.m.
Hi Sebastian,

Sorry, I just noticed your email...

On 02/05, Sebastian Andrzej Siewior wrote:
>
> On 2019-01-21 12:21:17 [+0100], Oleg Nesterov wrote:
> > > This is part of our ABI for *sure*.  Inspecting that state is how
> > > userspace makes sense of MPX or protection keys faults.  We even use
> > > this in selftests/.
> >
> > Yes.
> >
> > And in any case I do not understand the idea to use the second in-kernel struct fpu.
> > A signal handler can be interrupted by another signal, this will need to save/restore
> > the FPU state again.
>
> So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the
> current signal is handled. So no interruption. But then SIGSEGV is
> probably the exception which will interrupt SIGUSR1. So we would need a
> third one…

I guess you do not need my answer, but just in case.

SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other
signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt
the handler if SA_NODEFER was used.


> The idea was to save the FPU state in-kernel so we don't have to
> revalidate everything because userspace had access to it and could do
> things.

I understand, but this simply can't work, see above.

Oleg.
Sebastian Andrzej Siewior - March 8, 2019, 6:12 p.m.
On 2019-02-26 17:38:22 [+0100], Oleg Nesterov wrote:
> Hi Sebastian,
Hi Oleg,

> Sorry, I just noticed your email...

no worries.

> > So I assumed that while SIGUSR1 is handled SIGUSR2 will wait until the
> > current signal is handled. So no interruption. But then SIGSEGV is
> > probably the exception which will interrupt SIGUSR1. So we would need a
> > third one…
> 
> I guess you do not need my answer, but just in case.
> 
> SIGSEGV is not an exception. A SIGUSR1 handler can be interrupted by any other
> signal which is not included in sigaction->sa_mask. Even SIGUSR1 can interrupt
> the handler if SA_NODEFER was used.

okay, understood. My understanding was that since signal sending is not
very deterministic and as such you can't reliably control if one signal
arrived before the other finished it should not matter.
But well, this all is gone now…

Thank you for the explanation.

> Oleg.

Sebastian

Patch

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c0cdcb9b7de5a..c136a4327659d 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -157,7 +157,6 @@  static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
 int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct fpu *fpu = &current->thread.fpu;
-	struct xregs_state *xsave = &fpu->state.xsave;
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
 
@@ -172,29 +171,12 @@  int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
-	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
-
-		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
-			return -1;
-	}
+	/* Save the live register state to the user directly. */
+	if (copy_fpregs_to_sigframe(buf_fx))
+		return -1;
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate)
+		copy_fxregs_to_kernel(fpu);
 
 	/* Save the fsave header for the 32-bit frames. */
 	if ((ia32_fxstate || !use_fxsr()) && save_fsave_header(tsk, buf))