Patchwork [3/4] dma-debug: Dynamically expand the dma_debug_entry pool

Submitter Robin Murphy
Date Dec. 3, 2018, 5:28 p.m.
Message ID <f99ea022be92a99339404867bc925f4fbd2ee6c4.1543856576.git.robin.murphy@arm.com>
Permalink /patch/670953/
State New

Comments

Robin Murphy - Dec. 3, 2018, 5:28 p.m.
Certain drivers such as large multi-queue network adapters can use pools
of mapped DMA buffers larger than the default dma_debug_entry pool of
65536 entries, with the result that merely probing such a device can
cause DMA debug to disable itself during boot unless explicitly given an
appropriate "dma_debug_entries=..." option.

Developers trying to debug some other driver on such a system may not be
immediately aware of this, and at worst it can hide bugs if they fail to
realise that dma-debug has already disabled itself unexpectedly by the
time the code of interest gets to run. Even once they do realise, it can
be a bit of a pain to empirically determine a suitable number of
preallocated entries to configure without massively over-allocating.

There's really no need for such a static limit, though, since we can
quite easily expand the pool at runtime in those rare cases that the
preallocated entries are insufficient, which is arguably the least
surprising and most useful behaviour.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 kernel/dma/debug.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
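A minimal user-space sketch of the on-demand expansion described in the commit message, assuming a singly linked free list and a fixed growth chunk. The names `struct entry`, `pool_grow()`, `pool_alloc()` and `GROW_CHUNK` are hypothetical stand-ins (loosely for `struct dma_debug_entry`, `prealloc_memory()`, `dma_entry_alloc()` and `DMA_DEBUG_DYNAMIC_ENTRIES`); locking is omitted and `malloc()` replaces the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define GROW_CHUNK 4	/* hypothetical stand-in for DMA_DEBUG_DYNAMIC_ENTRIES */

/* Hypothetical stand-in for struct dma_debug_entry; only the link matters. */
struct entry {
	struct entry *next;
};

static struct entry *free_list;	/* models free_entries */
static unsigned int num_free;	/* models num_free_entries */
static unsigned int num_total;
static bool disabled;		/* models global_disable */

/* Add up to n new entries to the free list; returns how many were added. */
static unsigned int pool_grow(unsigned int n)
{
	unsigned int added = 0;

	while (added < n) {
		struct entry *e = malloc(sizeof(*e));

		if (!e)
			break;	/* keep whatever was added so far */
		e->next = free_list;
		free_list = e;
		added++;
	}
	num_free += added;
	num_total += added;
	return added;
}

/* Take one entry, growing the pool instead of giving up when it is empty. */
static struct entry *pool_alloc(void)
{
	struct entry *e;

	if (disabled)
		return NULL;
	if (!free_list)
		pool_grow(GROW_CHUNK);
	if (!free_list) {
		disabled = true;	/* only when even the expansion failed */
		return NULL;
	}
	e = free_list;
	free_list = e->next;
	num_free--;
	assert(num_free < num_total);	/* at least this entry is in use */
	return e;
}
```

Starting from two preallocated entries, ten allocations all succeed: the pool grows twice by four entries rather than disabling itself.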
John Garry - Dec. 3, 2018, 6:23 p.m.
On 03/12/2018 17:28, Robin Murphy wrote:
> Certain drivers such as large multi-queue network adapters can use pools
> of mapped DMA buffers larger than the default dma_debug_entry pool of
> 65536 entries, with the result that merely probing such a device can
> cause DMA debug to disable itself during boot unless explicitly given an
> appropriate "dma_debug_entries=..." option.
>
> Developers trying to debug some other driver on such a system may not be
> immediately aware of this, and at worst it can hide bugs if they fail to
> realise that dma-debug has already disabled itself unexpectedly by the
> time the code of interest gets to run. Even once they do realise, it can
> be a bit of a pain to empirically determine a suitable number of
> preallocated entries to configure without massively over-allocating.
>
> There's really no need for such a static limit, though, since we can
> quite easily expand the pool at runtime in those rare cases that the
> preallocated entries are insufficient, which is arguably the least
> surprising and most useful behaviour.

Hi Robin,

Do you have an idea on shrinking the pool again when the culprit driver 
is removed, i.e. we have so many unused debug entries now available?

Thanks,
John

Robin Murphy - Dec. 4, 2018, 1:11 p.m.
Hi John,

On 03/12/2018 18:23, John Garry wrote:
> On 03/12/2018 17:28, Robin Murphy wrote:
> Hi Robin,
> 
> Do you have an idea on shrinking the pool again when the culprit driver 
> is removed, i.e. we have so many unused debug entries now available?

I honestly don't believe it's worth the complication. This is a 
development feature with significant overheads already, so there's not 
an awful lot to gain by trying to optimise memory usage. If a system can 
ever load a driver that makes hundreds of thousands of simultaneous 
mappings, it can almost certainly spare 20-odd megabytes of RAM for the 
corresponding debug entries in perpetuity. Sure, it does mean you'd need 
to reboot to recover memory from a major leak, but that's mostly true of 
the current behaviour too, and rebooting during driver development is 
hardly an unacceptable inconvenience.

In fact, having got this far in, what I'd quite like to do is to get rid 
of dma_debug_resize_entries() such that we never need to free things at 
all, since then we could allocate whole pages as blocks of entries to 
save on masses of individual slab allocations.
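A rough sketch of the page-block idea, assuming a 4096-byte block and a hypothetical `struct entry` with some filler data: a single allocation is carved into many entries by one extra loop. Because the entries share a block, none of them can be handed back to the allocator individually, which is why this only works once nothing needs to free entries. All names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096		/* assumed page size, for illustration */

struct entry {
	struct entry *next;
	unsigned long payload[7];	/* hypothetical per-mapping data */
};

#define ENTRIES_PER_BLOCK (BLOCK_SIZE / sizeof(struct entry))

static struct entry *free_list;
static unsigned int num_free, num_total;

/* One allocation per block, carved into entries that are never freed. */
static int grow_by_block(void)
{
	struct entry *block = malloc(BLOCK_SIZE);

	if (!block)
		return -1;
	assert(ENTRIES_PER_BLOCK > 0);
	for (size_t i = 0; i < ENTRIES_PER_BLOCK; i++) {	/* the extra loop */
		block[i].next = free_list;
		free_list = &block[i];
	}
	num_free += ENTRIES_PER_BLOCK;
	num_total += ENTRIES_PER_BLOCK;
	return 0;
}

static struct entry *entry_alloc(void)
{
	struct entry *e;

	if (!free_list && grow_by_block() != 0)
		return NULL;
	e = free_list;
	free_list = e->next;
	num_free--;
	return e;
}

/* Releasing an entry only relinks it; calling free() on it would be invalid. */
static void entry_free(struct entry *e)
{
	e->next = free_list;
	free_list = e;
	num_free++;
}
```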

Robin.

Christoph Hellwig - Dec. 4, 2018, 2:17 p.m.
On Tue, Dec 04, 2018 at 01:11:37PM +0000, Robin Murphy wrote:
> In fact, having got this far in, what I'd quite like to do is to get rid of 
> dma_debug_resize_entries() such that we never need to free things at all, 
> since then we could allocate whole pages as blocks of entries to save on 
> masses of individual slab allocations.

Yes, we should definitely kill dma_debug_resize_entries.  Allocating
page batches might sound nice, but is that going to introduce additional
complexity?
Christoph Hellwig - Dec. 4, 2018, 2:29 p.m.
> +	for (retry_count = 0; ; retry_count++) {
> +		spin_lock_irqsave(&free_entries_lock, flags);
> +
> +		if (num_free_entries > 0)
> +			break;
>  
>  		spin_unlock_irqrestore(&free_entries_lock, flags);

Taking a spinlock just to read a single integer value doesn't really
help anything.

> +
> +		if (retry_count < DMA_DEBUG_DYNAMIC_RETRIES &&
> +		    !prealloc_memory(DMA_DEBUG_DYNAMIC_ENTRIES))

Don't we need GFP_ATOMIC here?  Also why do we need the retries?
Robin Murphy - Dec. 4, 2018, 4:06 p.m.
On 04/12/2018 14:17, Christoph Hellwig wrote:
> On Tue, Dec 04, 2018 at 01:11:37PM +0000, Robin Murphy wrote:
>> In fact, having got this far in, what I'd quite like to do is to get rid of
>> dma_debug_resize_entries() such that we never need to free things at all,
>> since then we could allocate whole pages as blocks of entries to save on
>> masses of individual slab allocations.
> 
> Yes, we should definitely kill dma_debug_resize_entries.  Allocating
> page batches might sound nice, but is that going to introduce additional
> complexity?

OK, looking at what the weird AMD GART code does I reckon it should be 
happy enough with on-demand expansion, and that no tears will be shed if 
it can no longer actually trim the pool to the size it thinks is 
necessary. I'll add a patch to clean that up.

Page-based allocation, at least the way I'm thinking of it, shouldn't do 
much more than add an extra loop in one place, which should be more than 
made up for by removing all the freeing code :)

Robin.
John Garry - Dec. 4, 2018, 4:30 p.m.
On 04/12/2018 13:11, Robin Murphy wrote:
> Hi John,
>
> On 03/12/2018 18:23, John Garry wrote:
>> On 03/12/2018 17:28, Robin Murphy wrote:
>> Hi Robin,
>>
>> Do you have an idea on shrinking the pool again when the culprit
>> driver is removed, i.e. we have so many unused debug entries now
>> available?
>
> I honestly don't believe it's worth the complication. This is a
> development feature with significant overheads already, so there's not
> an awful lot to gain by trying to optimise memory usage. If a system can
> ever load a driver that makes hundreds of thousands of simultaneous
> mappings, it can almost certainly spare 20-odd megabytes of RAM for the
> corresponding debug entries in perpetuity. Sure, it does mean you'd need
> to reboot to recover memory from a major leak, but that's mostly true of
> the current behaviour too, and rebooting during driver development is
> hardly an unacceptable inconvenience.
>

ok, I just thought that it would not be too difficult to implement this 
on the dma entry free path.
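A sketch of what shrinking on the free path might look like, assuming per-entry allocation (as in the current code) and a hypothetical `SHRINK_THRESHOLD` capping the number of idle entries; all names are illustrative only. The `free()` call is valid only because each entry was allocated separately.

```c
#include <assert.h>
#include <stdlib.h>

#define SHRINK_THRESHOLD 8	/* hypothetical cap on idle entries */

struct entry {
	struct entry *next;
};

static struct entry *free_list;
static unsigned int num_free, num_total;

/* Reuse an idle entry if there is one, otherwise allocate a new one. */
static struct entry *entry_alloc(void)
{
	struct entry *e = free_list;

	if (e) {
		free_list = e->next;
		num_free--;
		return e;
	}
	e = malloc(sizeof(*e));
	if (e)
		num_total++;
	return e;
}

/* Free path that trims the pool: surplus entries go back to the allocator. */
static void entry_free(struct entry *e)
{
	assert(e != NULL);
	if (num_free >= SHRINK_THRESHOLD) {
		free(e);
		num_total--;
		return;
	}
	e->next = free_list;
	free_list = e;
	num_free++;
}
```

After a burst of 20 simultaneous allocations is released, only 8 entries stay in the pool.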

> In fact, having got this far in, what I'd quite like to do is to get rid
> of dma_debug_resize_entries() such that we never need to free things at
> all, since then we could allocate whole pages as blocks of entries to
> save on masses of individual slab allocations.
>

On a related topic, is it possible for the user to learn the total 
entries created at a given point in time? If not, could we add a file in 
the debugfs folder for this?

Thanks,
John

Robin Murphy - Dec. 4, 2018, 4:32 p.m.
On 04/12/2018 14:29, Christoph Hellwig wrote:
>> +	for (retry_count = 0; ; retry_count++) {
>> +		spin_lock_irqsave(&free_entries_lock, flags);
>> +
>> +		if (num_free_entries > 0)
>> +			break;
>>   
>>   		spin_unlock_irqrestore(&free_entries_lock, flags);
> 
> Taking a spinlock just to read a single integer value doesn't really
> help anything.

If the freelist is non-empty we break out with the lock still held in 
order to actually allocate our entry - only if there are no free entries 
left do we drop the lock in order to handle the failure. This much is 
just the original logic shuffled around a bit (with the tweak that 
testing num_free_entries seemed justifiably simpler than the original 
list_empty() check).

>> +
>> +		if (retry_count < DMA_DEBUG_DYNAMIC_RETRIES &&
>> +		    !prealloc_memory(DMA_DEBUG_DYNAMIC_ENTRIES))
> 
> Don't we need GFP_ATOMIC here?  Also why do we need the retries?

Ah, right, we may be outside our own spinlock, but of course the whole 
DMA API call which got us here might be under someone else's and/or in a 
non-sleeping context - I'll fix that.

The number of retries is just to bound the loop due to its inherent 
raciness - since we drop the lock to create more entries, under 
pathological conditions by the time we get back in to grab one they 
could have all gone. 2 retries (well, strictly it's 1 try and 1 retry) 
was an entirely arbitrary choice just to accommodate that happening very 
occasionally by chance.
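To make the race concrete, here is a hedged user-space model of the loop in the patch: take the lock, break out with it still held if an entry is free, otherwise drop it, grow the pool and retry at most `DYNAMIC_RETRIES` times. The C11 `atomic_flag` spinlock, `malloc()` and all names are assumptions standing in for `free_entries_lock`, `prealloc_memory()` and friends; a single-threaded run shows only the control flow, not the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define DYNAMIC_ENTRIES 4	/* hypothetical stand-in for DMA_DEBUG_DYNAMIC_ENTRIES */
#define DYNAMIC_RETRIES 2	/* hypothetical stand-in for DMA_DEBUG_DYNAMIC_RETRIES */

struct entry {
	struct entry *next;
};

static atomic_flag free_lock = ATOMIC_FLAG_INIT;	/* models free_entries_lock */
static struct entry *free_list;
static unsigned int num_free;
static bool disabled;					/* models global_disable */

static void pool_lock(void)
{
	while (atomic_flag_test_and_set(&free_lock)) {
		/* spin */
	}
}

static void pool_unlock(void)
{
	atomic_flag_clear(&free_lock);
}

/* Called WITHOUT the lock held, so (in kernel terms) it would be allowed to sleep. */
static int grow_unlocked(unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		struct entry *e = malloc(sizeof(*e));

		if (!e)
			return -1;
		pool_lock();
		e->next = free_list;
		free_list = e;
		num_free++;
		pool_unlock();
	}
	return 0;
}

static struct entry *entry_alloc(void)
{
	struct entry *e;
	int retry_count;

	for (retry_count = 0; ; retry_count++) {
		pool_lock();
		if (num_free > 0)
			break;		/* exit the loop with the lock still held */
		pool_unlock();

		/*
		 * The lock is dropped while growing, so other callers may drain
		 * the new entries before we retake it: bound the retries.
		 */
		if (retry_count < DYNAMIC_RETRIES &&
		    grow_unlocked(DYNAMIC_ENTRIES) == 0)
			continue;

		disabled = true;
		return NULL;
	}

	assert(free_list != NULL);
	e = free_list;
	free_list = e->next;
	num_free--;
	pool_unlock();
	return e;
}
```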

However, if the dynamic allocations need GFP_ATOMIC for external reasons 
anyway, then I don't need the lock-juggling that invites that race in 
the first place, and the whole loop disappears again. Neat!
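For comparison, a sketch of the loop-free shape this suggests, under the assumption that the pool may be grown with the lock held because the allocation cannot sleep (`malloc()` merely stands in for a `GFP_ATOMIC` allocation here). The `atomic_flag` spinlock and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define DYNAMIC_ENTRIES 4	/* hypothetical growth chunk */

struct entry {
	struct entry *next;
};

static atomic_flag free_lock = ATOMIC_FLAG_INIT;
static struct entry *free_list;
static unsigned int num_free;
static bool disabled;

static void pool_lock(void)
{
	while (atomic_flag_test_and_set(&free_lock)) {
		/* spin */
	}
}

static void pool_unlock(void)
{
	atomic_flag_clear(&free_lock);
}

/* Caller holds the lock; malloc() stands in for a non-sleeping allocation. */
static void grow_locked(unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		struct entry *e = malloc(sizeof(*e));

		if (!e)
			return;		/* keep whatever was added */
		e->next = free_list;
		free_list = e;
		num_free++;
	}
}

static struct entry *entry_alloc(void)
{
	struct entry *e;

	pool_lock();
	/*
	 * Growing with the lock held means no one else can take the new
	 * entries before we do, so no retry loop is needed.
	 */
	if (num_free == 0)
		grow_locked(DYNAMIC_ENTRIES);
	if (num_free == 0) {
		disabled = true;
		pool_unlock();
		return NULL;
	}
	assert(free_list != NULL);
	e = free_list;
	free_list = e->next;
	num_free--;
	pool_unlock();
	return e;
}
```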

Robin.
Robin Murphy - Dec. 4, 2018, 5:19 p.m.
On 04/12/2018 16:30, John Garry wrote:
> On 04/12/2018 13:11, Robin Murphy wrote:
>> Hi John,
>>
>> On 03/12/2018 18:23, John Garry wrote:
>>> On 03/12/2018 17:28, Robin Murphy wrote:
>>> Hi Robin,
>>>
>>> Do you have an idea on shrinking the pool again when the culprit
>>> driver is removed, i.e. we have so many unused debug entries now
>>> available?
>>
>> I honestly don't believe it's worth the complication. This is a
>> development feature with significant overheads already, so there's not
>> an awful lot to gain by trying to optimise memory usage. If a system can
>> ever load a driver that makes hundreds of thousands of simultaneous
>> mappings, it can almost certainly spare 20-odd megabytes of RAM for the
>> corresponding debug entries in perpetuity. Sure, it does mean you'd need
>> to reboot to recover memory from a major leak, but that's mostly true of
>> the current behaviour too, and rebooting during driver development is
>> hardly an unacceptable inconvenience.
>>
> 
> ok, I just thought that it would not be too difficult to implement this 
> on the dma entry free path.

True, in the current code it wouldn't be all that hard, but it feels 
more worthwhile to optimise for allocation rather than freeing, and as 
soon as we start allocating memory for multiple entries at once, trying 
to free anything becomes extremely challenging.

>> In fact, having got this far in, what I'd quite like to do is to get rid
>> of dma_debug_resize_entries() such that we never need to free things at
>> all, since then we could allocate whole pages as blocks of entries to
>> save on masses of individual slab allocations.
>>
> 
> On a related topic, is it possible for the user to learn the total 
> entries created at a given point in time? If not, could we add a file in 
> the debugfs folder for this?

I did get as far as pondering that you effectively lose track of 
utilisation once the low-water-mark of min_free_entries hits 0 and stays 
there - AFAICS it should be sufficient to just expose nr_total_entries 
as-is, since users can then calculate current and maximum occupancy 
based on *_free_entries. Does that sound reasonable to you?
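Under that proposal the arithmetic is simple. The sketch below assumes the three counters can be read together; the wrapper struct is hypothetical, while the field names follow the variables mentioned above. Current use is `nr_total_entries - num_free_entries`; `nr_total_entries - min_free_entries` is the exact peak while the pool has never grown, and only an upper bound once `min_free_entries` has hit 0 and the pool has expanded.

```c
#include <assert.h>

/* Hypothetical snapshot of the dma-debug counters. */
struct dma_debug_counters {
	unsigned int nr_total_entries;
	unsigned int num_free_entries;
	unsigned int min_free_entries;
};

/* Entries currently tracking a live mapping. */
static unsigned int current_in_use(const struct dma_debug_counters *c)
{
	assert(c->num_free_entries <= c->nr_total_entries);
	return c->nr_total_entries - c->num_free_entries;
}

/* Peak usage: exact if the pool never grew, otherwise an upper bound. */
static unsigned int peak_in_use(const struct dma_debug_counters *c)
{
	assert(c->min_free_entries <= c->nr_total_entries);
	return c->nr_total_entries - c->min_free_entries;
}
```

For example, with 65536 total entries, 60000 free and a low-water mark of 1000, current use is 5536 and peak use 64536.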

That also indirectly reminds me that this lot is documented in 
DMA_API.txt, so I should be good and update that too...

Cheers,
Robin.
John Garry - Dec. 4, 2018, 5:38 p.m.
>
>>> In fact, having got this far in, what I'd quite like to do is to get rid
>>> of dma_debug_resize_entries() such that we never need to free things at
>>> all, since then we could allocate whole pages as blocks of entries to
>>> save on masses of individual slab allocations.
>>>
>>
>> On a related topic, is it possible for the user to learn the total
>> entries created at a given point in time? If not, could we add a file
>> in the debugfs folder for this?
>

Hi Robin,

> I did get as far as pondering that you effectively lose track of
> utilisation once the low-water-mark of min_free_entries hits 0 and stays

I did try your patches and I noticed this, i.e. I was hitting the point 
at which we start to alloc more entries.

> there - AFAICS it should be sufficient to just expose nr_total_entries
> as-is, since users can then calculate current and maximum occupancy
> based on *_free_entries. Does that sound reasonable to you?
>

Sounds ok. I am just interested to know roughly how many DMA buffers 
we're using in our system.

> That also indirectly reminds me that this lot is documented in
> DMA_API.txt, so I should be good and update that too...

Thanks,
John


Patch

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index de5db800dbfc..46cc075aec99 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -47,6 +47,9 @@ 
 #ifndef PREALLOC_DMA_DEBUG_ENTRIES
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 #endif
+/* If the pool runs out, try this many times to allocate this many new entries */
+#define DMA_DEBUG_DYNAMIC_ENTRIES 256
+#define DMA_DEBUG_DYNAMIC_RETRIES 2
 
 enum {
 	dma_debug_single,
@@ -702,12 +705,21 @@ static struct dma_debug_entry *dma_entry_alloc(void)
 {
 	struct dma_debug_entry *entry;
 	unsigned long flags;
+	int retry_count;
 
-	spin_lock_irqsave(&free_entries_lock, flags);
+	for (retry_count = 0; ; retry_count++) {
+		spin_lock_irqsave(&free_entries_lock, flags);
+
+		if (num_free_entries > 0)
+			break;
 
-	if (list_empty(&free_entries)) {
-		global_disable = true;
 		spin_unlock_irqrestore(&free_entries_lock, flags);
+
+		if (retry_count < DMA_DEBUG_DYNAMIC_RETRIES &&
+		    !prealloc_memory(DMA_DEBUG_DYNAMIC_ENTRIES))
+			continue;
+
+		global_disable = true;
 		pr_err("debugging out of memory - disabling\n");
 		return NULL;
 	}