Patchwork [v8,4/5] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

login
register
mail settings
Submitter Andrew Murray
Date Jan. 8, 2019, 11:25 a.m.
Message ID <20190108112512.GA56789@e119886-lin.cambridge.arm.com>
Download mbox | patch
Permalink /patch/694683/
State New
Headers show

Comments

Andrew Murray - Jan. 8, 2019, 11:25 a.m.
On Tue, Jan 08, 2019 at 11:18:43AM +0100, Christoffer Dall wrote:
> On Fri, Jan 04, 2019 at 03:32:06PM +0000, Will Deacon wrote:
> > On Tue, Dec 18, 2018 at 01:02:26PM +0100, Christoffer Dall wrote:
> > > On Wed, Dec 12, 2018 at 10:29:32AM +0000, Andrew Murray wrote:
> > > > Add support for the :G and :H attributes in perf by handling the
> > > > exclude_host/exclude_guest event attributes.
> > > > 
> > > > We notify KVM of counters that we wish to be enabled or disabled on
> > > > guest entry/exit and thus defer from starting or stopping :G events
> > > > as per the events exclude_host attribute.
> > > > 
> > > > With both VHE and non-VHE we switch the counters between host/guest
> > > > at EL2. We are able to eliminate counters counting host events on
> > > > the boundaries of guest entry/exit when using :G by filtering out
> > > > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > > > on non-VHE then there is a small blackout window at the guest
> > > > entry/exit where host events are not captured.
> > > > 
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > ---
> > > >  arch/arm64/kernel/perf_event.c | 51 ++++++++++++++++++++++++++++++++++++------
> > > >  1 file changed, 44 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > > > index de564ae..4a3c73d 100644
> > > > --- a/arch/arm64/kernel/perf_event.c
> > > > +++ b/arch/arm64/kernel/perf_event.c
> > > > @@ -26,6 +26,7 @@
> > > >  
> > > >  #include <linux/acpi.h>
> > > >  #include <linux/clocksource.h>
> > > > +#include <linux/kvm_host.h>
> > > >  #include <linux/of.h>
> > > >  #include <linux/perf/arm_pmu.h>
> > > >  #include <linux/platform_device.h>
> > > > @@ -647,11 +648,26 @@ static inline int armv8pmu_enable_counter(int idx)
> > > >  
> > > >  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> > > >  {
> > > > +	struct perf_event_attr *attr = &event->attr;
> > > >  	int idx = event->hw.idx;
> > > > +	int flags = 0;
> > > > +	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > >  
> > > > -	armv8pmu_enable_counter(idx);
> > > >  	if (armv8pmu_event_is_chained(event))
> > > > -		armv8pmu_enable_counter(idx - 1);
> > > > +		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > +
> > > > +	if (!attr->exclude_host)
> > > > +		flags |= KVM_PMU_EVENTS_HOST;
> > > > +	if (!attr->exclude_guest)
> > > > +		flags |= KVM_PMU_EVENTS_GUEST;
> > > > +
> > > > +	kvm_set_pmu_events(counter_bits, flags);
> > > > +
> > > > +	if (!attr->exclude_host) {
> > > > +		armv8pmu_enable_counter(idx);
> > > > +		if (armv8pmu_event_is_chained(event))
> > > > +			armv8pmu_enable_counter(idx - 1);
> > > > +	}
> > > >  }
> > > >  
> > > >  static inline int armv8pmu_disable_counter(int idx)
> > > > @@ -664,11 +680,20 @@ static inline int armv8pmu_disable_counter(int idx)
> > > >  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> > > >  {
> > > >  	struct hw_perf_event *hwc = &event->hw;
> > > > +	struct perf_event_attr *attr = &event->attr;
> > > >  	int idx = hwc->idx;
> > > > +	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > >  
> > > >  	if (armv8pmu_event_is_chained(event))
> > > > -		armv8pmu_disable_counter(idx - 1);
> > > > -	armv8pmu_disable_counter(idx);
> > > > +		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > +
> > > > +	kvm_clr_pmu_events(counter_bits);
> > > > +
> > > > +	if (!attr->exclude_host) {
> > > > +		if (armv8pmu_event_is_chained(event))
> > > > +			armv8pmu_disable_counter(idx - 1);
> > > > +		armv8pmu_disable_counter(idx);
> > > > +	}
> > > >  }
> > > >  
> > > >  static inline int armv8pmu_enable_intens(int idx)
> > > > @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> > > >  	 * Therefore we ignore exclude_hv in this configuration, since
> > > >  	 * there's no hypervisor to sample anyway. This is consistent
> > > >  	 * with other architectures (x86 and Power).
> > > > +	 *
> > > > +	 * To eliminate counting host events on the boundaries of
> > > > +	 * guest entry/exit we ensure EL2 is not included in hyp mode
> > > > +	 * with !exclude_host.
> > > >  	 */
> > > >  	if (is_kernel_in_hyp_mode()) {
> > > > -		if (!attr->exclude_kernel)
> > > > +		if (!attr->exclude_kernel && !attr->exclude_host)
> > > >  			config_base |= ARMV8_PMU_INCLUDE_EL2;
> > > >  	} else {
> > > > -		if (attr->exclude_kernel)
> > > > -			config_base |= ARMV8_PMU_EXCLUDE_EL1;
> > > >  		if (!attr->exclude_hv)
> > > >  			config_base |= ARMV8_PMU_INCLUDE_EL2;
> > > 
> > > I'm not sure about the current use of exclude_hv here.  The comment says
> > > it's consistent with other architectures, but I can't find an example to
> > > confirm this, and I don't think we have a comparable thing to the split
> > > of the hypervisor between EL1 and EL2 we have on non-VHE.
> > 
> > FWIW, that comment came from this thread:
> > 
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-April/503908.html
> > 
> > That was painful enough at the time, so I'd /really/ prefer not to change
> > the semantics of this again if we can avoid it.
> 
> The comment makes sense for the is_kernel_in_hyp_mode() case.
> 
> However, for the !is_kernel_in_hyp_mode() case I can't see the current
> behavior of exclude_hv being similar in other architectures.
> 
> I don't think the current semantics of excluding EL2 on a non-VHE host
> system makes much sense, and I doubt anyone is using that for something
> meaningful.  I think changing behavior for excldue_hv to depend on
> is_hyp_mode_available rather than is_kernel_in_hyp_mode is the right
> thing to do which would also align the semantics with other
> architectures and between VHE and non-VHE.

Just for clarity, see below for the proposed patch - this disallows EL2
counting for !VHE when we have the capability to be a KVM host.

Subject: [PATCH] arm64: arm_pmu: Disallow EL2 counting on !VHE unless guest

When the kernel runs without VHE support it runs at EL1. However it switches
to EL2 when switching to and from KVM guests. The exclude_hv flag (for a !VHE
kernel) will include EL2 counting. The exclude_hv flag is intended to count
events associated with the hypervisor for the current instance, not the
overhead of the current instance's guests.

Let's disallow EL2 counting for !VHE when we know that we have the capability
to be a KVM host (by virtue that we booted in EL2) and thus probably aren't
a guest ourselves.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/kernel/perf_event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Marc Zyngier - Jan. 8, 2019, 11:50 a.m.
On Tue, 08 Jan 2019 11:25:13 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:

Hi Andrew,

> My only doubt about this is as follows. If, on a KVM host you run this:
> 
> perf stat -e cycles:H lkvm run ...
> 
> then on the VHE host the cycles reported represents the entire non-guest cycles
> associated with running the guest.
> 
> On a !VHE, the cycles reported exclude EL2 (with or without this patch) and
> thus you don't get a representation of all the non-guest cycles associated with
> the guest. However without this patch you could at least still run:
> 
> perf stat -e cycles:H -e cycles:h lkvm run ...
> 
> and then add the two cycle counts together to get something comparative with
> the VHE host.
> 
> If the above patch represents the desired semantics, then perhaps we must count
> both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should do
> this anyway and remove a little complexity from armv8pmu_set_event_filter.
> Thoughts?

I'm not sure we should hide the architectural differences between VHE
and !VHE. If you're trying to measure what is happening at in the
hypervisor, you can't reason about it while ignoring the dual nature
of !VHE.

Thanks,

	M.
Christoffer Dall - Jan. 8, 2019, 12:03 p.m.
On Tue, Jan 08, 2019 at 11:50:59AM +0000, Marc Zyngier wrote:
> On Tue, 08 Jan 2019 11:25:13 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> Hi Andrew,
> 
> > My only doubt about this is as follows. If, on a KVM host you run this:
> > 
> > perf stat -e cycles:H lkvm run ...
> > 
> > then on the VHE host the cycles reported represents the entire non-guest cycles
> > associated with running the guest.
> > 
> > On a !VHE, the cycles reported exclude EL2 (with or without this patch) and
> > thus you don't get a representation of all the non-guest cycles associated with
> > the guest. However without this patch you could at least still run:
> > 
> > perf stat -e cycles:H -e cycles:h lkvm run ...
> > 
> > and then add the two cycle counts together to get something comparative with
> > the VHE host.
> > 
> > If the above patch represents the desired semantics, then perhaps we must count
> > both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should do
> > this anyway and remove a little complexity from armv8pmu_set_event_filter.
> > Thoughts?
> 
> I'm not sure we should hide the architectural differences between VHE
> and !VHE. If you're trying to measure what is happening at in the
> hypervisor, you can't reason about it while ignoring the dual nature
> of !VHE.
> 

How do you define hypervisor here?  Is that just the code that runs at
EL2 or also parts of KVM that runs at EL1?

It remains unclear to me why you'd want to measure a subset of KVM,
which happens to run in EL2, in your host (and hypervisor-enabled)
kernel, and you are even precluded from measuring a comparable portion
of your implementation on other Arm systems (VHE).

Admittedly, I'm not at export in using perf, but I find this EL1/EL2
distinction out of place as it relates to exlude_kernel, exlude_user,
and exlude_hv.  Will we have a fourth Arm-specific flag which takes the
place of exclude_hv on PowerPC, which excludes an underlying hypervisor
when running a guest, should we ever support counting that in the
future?


Thanks,

    Christoffer
Marc Zyngier - Jan. 8, 2019, 12:12 p.m.
On 08/01/2019 12:03, Christoffer Dall wrote:
> On Tue, Jan 08, 2019 at 11:50:59AM +0000, Marc Zyngier wrote:
>> On Tue, 08 Jan 2019 11:25:13 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>>
>> Hi Andrew,
>>
>>> My only doubt about this is as follows. If, on a KVM host you run this:
>>>
>>> perf stat -e cycles:H lkvm run ...
>>>
>>> then on the VHE host the cycles reported represents the entire non-guest cycles
>>> associated with running the guest.
>>>
>>> On a !VHE, the cycles reported exclude EL2 (with or without this patch) and
>>> thus you don't get a representation of all the non-guest cycles associated with
>>> the guest. However without this patch you could at least still run:
>>>
>>> perf stat -e cycles:H -e cycles:h lkvm run ...
>>>
>>> and then add the two cycle counts together to get something comparative with
>>> the VHE host.
>>>
>>> If the above patch represents the desired semantics, then perhaps we must count
>>> both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should do
>>> this anyway and remove a little complexity from armv8pmu_set_event_filter.
>>> Thoughts?
>>
>> I'm not sure we should hide the architectural differences between VHE
>> and !VHE. If you're trying to measure what is happening at in the
>> hypervisor, you can't reason about it while ignoring the dual nature
>> of !VHE.
>>
> 
> How do you define hypervisor here?  Is that just the code that runs at
> EL2 or also parts of KVM that runs at EL1?

I define it as "not a guest". Whatever is used to support a guest is the
hypervisor.

> It remains unclear to me why you'd want to measure a subset of KVM,
> which happens to run in EL2, in your host (and hypervisor-enabled)
> kernel, and you are even precluded from measuring a comparable portion
> of your implementation on other Arm systems (VHE).

Because I'm not trying to compare apples (VHE) and oranges (!VHE). My
use-case for perf is to measure the impact of a change on a given
implementation, and the more I can narrow the impact of that change, the
better (specially when !VHE precludes the use of other techniques such
as sampling).

> Admittedly, I'm not at export in using perf, but I find this EL1/EL2
> distinction out of place as it relates to exlude_kernel, exlude_user,
> and exlude_hv.  Will we have a fourth Arm-specific flag which takes the
> place of exclude_hv on PowerPC, which excludes an underlying hypervisor
> when running a guest, should we ever support counting that in the
> future?
In all honestly, exclude_hv doesn't make much sense to me on a VHE
system, unless you define an arbitrary cutting point where things are on
one side or the other. As for a fourth flag, I have no idea.

Thanks,

	M.
Christoffer Dall - Jan. 8, 2019, 12:20 p.m.
On Tue, Jan 08, 2019 at 12:12:13PM +0000, Marc Zyngier wrote:
> On 08/01/2019 12:03, Christoffer Dall wrote:
> > On Tue, Jan 08, 2019 at 11:50:59AM +0000, Marc Zyngier wrote:
> >> On Tue, 08 Jan 2019 11:25:13 +0000,
> >> Andrew Murray <andrew.murray@arm.com> wrote:
> >>
> >> Hi Andrew,
> >>
> >>> My only doubt about this is as follows. If, on a KVM host you run this:
> >>>
> >>> perf stat -e cycles:H lkvm run ...
> >>>
> >>> then on the VHE host the cycles reported represents the entire non-guest cycles
> >>> associated with running the guest.
> >>>
> >>> On a !VHE, the cycles reported exclude EL2 (with or without this patch) and
> >>> thus you don't get a representation of all the non-guest cycles associated with
> >>> the guest. However without this patch you could at least still run:
> >>>
> >>> perf stat -e cycles:H -e cycles:h lkvm run ...
> >>>
> >>> and then add the two cycle counts together to get something comparative with
> >>> the VHE host.
> >>>
> >>> If the above patch represents the desired semantics, then perhaps we must count
> >>> both EL1 and *EL2* for !exclude_kernel on !VHE. In fact I think we should do
> >>> this anyway and remove a little complexity from armv8pmu_set_event_filter.
> >>> Thoughts?
> >>
> >> I'm not sure we should hide the architectural differences between VHE
> >> and !VHE. If you're trying to measure what is happening at in the
> >> hypervisor, you can't reason about it while ignoring the dual nature
> >> of !VHE.
> >>
> > 
> > How do you define hypervisor here?  Is that just the code that runs at
> > EL2 or also parts of KVM that runs at EL1?
> 
> I define it as "not a guest". Whatever is used to support a guest is the
> hypervisor.
> 
> > It remains unclear to me why you'd want to measure a subset of KVM,
> > which happens to run in EL2, in your host (and hypervisor-enabled)
> > kernel, and you are even precluded from measuring a comparable portion
> > of your implementation on other Arm systems (VHE).
> 
> Because I'm not trying to compare apples (VHE) and oranges (!VHE). My
> use-case for perf is to measure the impact of a change on a given
> implementation, and the more I can narrow the impact of that change, the
> better (specially when !VHE precludes the use of other techniques such
> as sampling).
> 

Fair enough.  I don't know if that's the only use case for perf we
should consider though.

> > Admittedly, I'm not at export in using perf, but I find this EL1/EL2
> > distinction out of place as it relates to exlude_kernel, exlude_user,
> > and exlude_hv.  Will we have a fourth Arm-specific flag which takes the
> > place of exclude_hv on PowerPC, which excludes an underlying hypervisor
> > when running a guest, should we ever support counting that in the
> > future?
> In all honestly, exclude_hv doesn't make much sense to me on a VHE
> system, unless you define an arbitrary cutting point where things are on
> one side or the other. As for a fourth flag, I have no idea.
> 

I think this all boils down to how these flags are interpreted and
represented to a user via tooling.  If these flags must be considered
in complete isolation on a particular system and architecture, then
fine, we can define it as whatever we want, giving us a little bit more
insight on where things happen on a !VHE system.

If we care about these flags representing similar semantics to other
architectures, then I contend that we are abusing the exclude_hv flag
today, and exclude_hv should only ever have an effect when set within
a guest, not in a host.


Thanks,

    Christoffer

Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 1620a37..bd3f6ca 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -832,7 +832,7 @@  static int armv8pmu_set_event_filter(struct hw_perf_event *event,
        } else {
                if (attr->exclude_kernel)
                        config_base |= ARMV8_PMU_EXCLUDE_EL1;
-               if (!attr->exclude_hv)
+               if (!attr->exclude_hv && !is_hyp_mode_is_available())
                        config_base |= ARMV8_PMU_INCLUDE_EL2;
        }
        if (attr->exclude_user)