Patchwork [RFC] hw/arm/virt: use variable size of flash device to save memory

Submitter Zheng Xiang
Date April 15, 2019, 2:39 a.m.
Message ID <a6affbbb-ab69-7ebd-13ba-a42dbf1ed78e@huawei.com>
State New

Comments

Zheng Xiang - April 15, 2019, 2:39 a.m.
On 2019/4/12 18:57, Kevin Wolf wrote:
> Am 12.04.2019 um 11:50 hat Xiang Zheng geschrieben:
>>
>> On 2019/4/12 9:52, Xiang Zheng wrote:
>>> On 2019/4/11 20:22, Kevin Wolf wrote:
>>>> Okay, so your problem is that blk_pread() writes to the whole buffer,
>>>> writing explicit zeroes for unallocated parts of the image, while you
>>>> would like to leave those parts of the buffer untouched so that we don't
>>>> actually allocate the memory, but can just use the shared zero page.
>>>>
>>>> If you just want to read the non-zero parts of the image, that can be
>>>> done by using a loop that calls bdrv_block_status() and only reads from
>>>> the image if the BDRV_BLOCK_ZERO bit is clear.
>>>>
>>>> Would this solve your problem?
>>>
>>> Sounds good! What if guest tried to read/write the zero parts?
>>>
>>
>> I wrote the below patch (refer to bdrv_make_zero()) for test, it seems
>> that everything is OK and the memory is also exactly allocated on demand.
>>
>> This requires pflash devices to use sparse files backend. Thus I have to
>> create images like:
>>
>>    dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M seek=64 count=0
>>    dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
>>
>>    dd of="empty_VARS.fd" if="/dev/zero" bs=1M seek=64 count=0
>>
>>
>> ---8>---
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index f78e82a..ed8ca87 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -1379,6 +1379,12 @@ BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
>>                          flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
>>  }
>>
>> +int blk_pread_nonzeroes(BlockBackend *blk, void *buf)
>> +{
>> +    int ret = bdrv_pread_nonzeroes(blk->root, buf);
>> +    return ret;
>> +}
> 
> I don't think this deserves a place in the public block layer interface,
> as it's only a single device that makes use of it.
> 
> Maybe you wrote things this way because there is no blk_block_status(),
> but you can get the BlockDriverState with blk_bs(blk) and then implement
> everything inside hw/block/block.c.

Yes, you are right.

> 
>>  int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
>>  {
>>      int ret = blk_prw(blk, offset, buf, count, blk_read_entry, 0);
>> diff --git a/block/io.c b/block/io.c
>> index dfc153b..83e5ea7 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -882,6 +882,38 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
>>                          BDRV_REQ_ZERO_WRITE | flags);
>>  }
>>
>> +int bdrv_pread_nonzeroes(BdrvChild *child, void *buf)
>> +{
>> +    int ret;
>> +    int64_t target_size, bytes, offset = 0;
>> +    BlockDriverState *bs = child->bs;
>> +
>> +    target_size = bdrv_getlength(bs);
>> +    if (target_size < 0) {
>> +        return target_size;
>> +    }
>> +
>> +    for (;;) {
>> +        bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
>> +        if (bytes <= 0) {
>> +            return 0;
>> +        }
>> +        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL, NULL);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        if (ret & BDRV_BLOCK_ZERO) {
>> +            offset += bytes;
>> +            continue;
>> +        }
>> +        ret = bdrv_pread(child, offset, buf, bytes);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +        offset += bytes;
> 
> I think the code becomes simpler the other way round:
> 
>     if (!(ret & BDRV_BLOCK_ZERO)) {
>         ret = bdrv_pread(child, offset, buf, bytes);
>         if (ret < 0) {
>             return ret;
>         }
>     }
>     offset += bytes;
> 
> You don't increment buf, so if you have a hole in the file, this will
> corrupt the buffer. You need to either increment buf, too, or use
> (uint8_t*) buf + offset for the bdrv_pread() call.
> 

Yes, I missed that. I think the latter is better. Does *BDRV_BLOCK_ZERO*
mean that the range contains all-zero data, or that it is a hole? When I use
an image explicitly filled with zeroes, the BDRV_BLOCK_ZERO bit is not set on return.

Should I resend a patch?
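
To make the buffer-offset point concrete, here is a minimal standalone sketch
(not QEMU code). The Extent type and read_nonzeroes() are hypothetical
stand-ins for what bdrv_block_status() reports and for the loop in the patch.
The sketch assumes the destination buffer starts out zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one extent reported by a block-status query. */
typedef struct {
    int64_t length;   /* extent length in bytes */
    int is_zero;      /* nonzero if the extent reads as zeroes (e.g. a hole) */
} Extent;

/*
 * Copy only the non-zero extents of @image into @buf.
 * @buf is assumed to be zero-initialised already, so zero extents
 * are skipped and the destination bytes are never touched.
 * Returns the number of bytes actually copied.
 */
int64_t read_nonzeroes(const uint8_t *image, const Extent *extents,
                       size_t n_extents, uint8_t *buf)
{
    int64_t offset = 0, copied = 0;

    for (size_t i = 0; i < n_extents; i++) {
        assert(extents[i].length >= 0);
        if (!extents[i].is_zero) {
            /* The destination must advance together with the image offset. */
            memcpy(buf + offset, image + offset, (size_t)extents[i].length);
            copied += extents[i].length;
        }
        offset += extents[i].length;
    }
    return copied;
}
```

If the memcpy() destination were plain buf instead of buf + offset, every data
extent after a hole would be written to the start of the buffer and overwrite
the earlier data, which is the corruption described above.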

---8>---

From 4dbfe4955aa9fe23404cbe1890fbe148be2ff10e Mon Sep 17 00:00:00 2001
From: Xiang Zheng <zhengxiang9@huawei.com>
Date: Sat, 13 Apr 2019 02:27:03 +0800
Subject: [PATCH] pflash: Only read non-zero parts of backend image

Currently we fill the VIRT_FLASH memory space with two 64MB NOR images
when using persistent UEFI variables on the virt board. In practice only
a very small (non-zero) part of that memory is used, while the much
larger (zero) remainder is wasted.

So this patch checks the block status and only writes the non-zero parts
into memory. This requires pflash devices to use sparse files as
backends.

Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 hw/block/block.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

Patch

diff --git a/hw/block/block.c b/hw/block/block.c
index bf56c76..3cb9d4c 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -15,6 +15,44 @@ 
 #include "qapi/qapi-types-block.h"

 /*
+ * Read the non-zero parts of @blk into @buf.
+ * Reading all of @blk is expensive if the zero parts of @blk are
+ * large. Therefore check the block status and only write the
+ * non-zero blocks into @buf.
+ *
+ * Return 0 on success, negative errno on error.
+ */
+static int blk_pread_nonzeroes(BlockBackend *blk, void *buf)
+{
+    int ret;
+    int64_t target_size, bytes, offset = 0;
+    BlockDriverState *bs = blk_bs(blk);
+
+    target_size = bdrv_getlength(bs);
+    if (target_size < 0) {
+        return target_size;
+    }
+
+    for (;;) {
+        bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
+        if (bytes <= 0) {
+            return 0;
+        }
+        ret = bdrv_block_status(bs, offset, bytes, &bytes, NULL, NULL);
+        if (ret < 0) {
+            return ret;
+        }
+        if (!(ret & BDRV_BLOCK_ZERO)) {
+            ret = blk_pread(blk, offset, (uint8_t *) buf + offset, bytes);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        offset += bytes;
+    }
+}
+
+/*
  * Read the entire contents of @blk into @buf.
  * @blk's contents must be @size bytes, and @size must be at most
  * BDRV_REQUEST_MAX_BYTES.
@@ -53,7 +91,7 @@  bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
      * block device and read only on demand.
      */
     assert(size <= BDRV_REQUEST_MAX_BYTES);
-    ret = blk_pread(blk, 0, buf, size);
+    ret = blk_pread_nonzeroes(blk, buf);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "can't read block backend");
         return false;