Patchwork [RFC,v3,1/7] dmaengine: Add Synopsys eDMA IP core driver

Submitter Gustavo Pimentel
Date Jan. 11, 2019, 6:33 p.m.
Message ID <d2b6e8bed0c5077425b913e551794ed126d74fb2.1547230339.git.gustavo.pimentel@synopsys.com>
Permalink /patch/698027/
State New

Comments

Gustavo Pimentel - Jan. 11, 2019, 6:33 p.m.
Add the Synopsys eDMA IP core driver to the kernel.

This core driver initializes and configures the eDMA IP using vma-helper
functions and the dma-engine subsystem.

It also creates an abstraction layer through callbacks, allowing different
register mappings in the future, organized into versions.

This driver can be compiled as a built-in or an external kernel module.

To enable this driver, just select the DW_EDMA option in the kernel
configuration; it automatically requires and selects the DMA_ENGINE and
DMA_VIRTUAL_CHANNELS options too.

Changes:
RFC v1->RFC v2:
 - Replace comments // (C99 style) by /**/
 - Fix the headers of the .c and .h files according to the most recent
   convention
 - Fix errors and checks pointed out by checkpatch with --strict option
 - Replace patch small description tag from dma by dmaengine
 - Change some dev_info() into dev_dbg()
 - Remove unnecessary zero initialization after kzalloc
 - Remove direction validation on config() API, since the direction
   parameter is deprecated
 - Refactor code to replace atomic_t by u32 variable type
 - Replace start_transfer() name by dw_edma_start_transfer()
 - Add spinlock to dw_edma_device_prep_slave_sg()
 - Add spinlock to dw_edma_free_chunk()
 - Simplify switch case into if on dw_edma_device_pause(),
   dw_edma_device_resume() and dw_edma_device_terminate_all()
RFC v2->RFC v3:
 - Add driver parameter to disable msix feature
 - Fix printk variable of phys_addr_t type
 - Fix printk variable of __iomem type
 - Fix printk variable of size_t type
 - Add comments or improve existing ones
 - Add possibility to work with multiple IRQs feature
 - Fix source and destination addresses
 - Add define to magic numbers
 - Add DMA cyclic transfer feature

Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Eugeniy Paltsev <paltsev@synopsys.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Jose Abreu <jose.abreu@synopsys.com>
Cc: Luis Oliveira <lolivei@synopsys.com>
Cc: Vitor Soares <vitor.soares@synopsys.com>
Cc: Nelson Costa <nelson.costa@synopsys.com>
Cc: Pedro Sousa <pedrom.sousa@synopsys.com>
---
 drivers/dma/Kconfig                |    2 +
 drivers/dma/Makefile               |    1 +
 drivers/dma/dw-edma/Kconfig        |    9 +
 drivers/dma/dw-edma/Makefile       |    4 +
 drivers/dma/dw-edma/dw-edma-core.c | 1059 ++++++++++++++++++++++++++++++++++++
 drivers/dma/dw-edma/dw-edma-core.h |  151 +++++
 include/linux/dma/edma.h           |   43 ++
 7 files changed, 1269 insertions(+)
 create mode 100644 drivers/dma/dw-edma/Kconfig
 create mode 100644 drivers/dma/dw-edma/Makefile
 create mode 100644 drivers/dma/dw-edma/dw-edma-core.c
 create mode 100644 drivers/dma/dw-edma/dw-edma-core.h
 create mode 100644 include/linux/dma/edma.h
Jose Abreu - Jan. 16, 2019, 10:21 a.m.
Hi Gustavo,

On 1/11/2019 6:33 PM, Gustavo Pimentel wrote:
> [...]


> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
> +{
> +	struct dw_edma_chan *chan = desc->chan;
> +	struct dw_edma *dw = chan->chip->dw;
> +	struct dw_edma_chunk *chunk;
> +
> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
> +	if (unlikely(!chunk))
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&chunk->list);
> +	chunk->chan = chan;
> +	chunk->cb = !(desc->chunks_alloc % 2);
> +	chunk->ll_region.paddr = dw->ll_region.paddr + chan->ll_off;
> +	chunk->ll_region.vaddr = dw->ll_region.vaddr + chan->ll_off;
> +
> +	if (desc->chunk) {
> +		/* Create and add new element into the linked list */
> +		desc->chunks_alloc++;
> +		dev_dbg(chan2dev(chan), "alloc new chunk element (%d)\n",
> +			desc->chunks_alloc);
> +		list_add_tail(&chunk->list, &desc->chunk->list);
> +		dw_edma_alloc_burst(chunk);

No return check ?

> +	} else {
> +		/* List head */
> +		chunk->burst = NULL;
> +		desc->chunks_alloc = 0;
> +		desc->chunk = chunk;
> +		dev_dbg(chan2dev(chan), "alloc new chunk head\n");
> +	}
> +
> +	return chunk;
> +}
> +
> +static struct dw_edma_desc *dw_edma_alloc_desc(struct dw_edma_chan *chan)
> +{
> +	struct dw_edma_desc *desc;
> +
> +	dev_dbg(chan2dev(chan), "alloc new descriptor\n");
> +
> +	desc = kvzalloc(sizeof(*desc), GFP_NOWAIT);
> +	if (unlikely(!desc))
> +		return NULL;
> +
> +	desc->chan = chan;
> +	dw_edma_alloc_chunk(desc);

No return check ?

> +
> +	return desc;
> +}
> +

> +static void dw_edma_start_transfer(struct dw_edma_chan *chan)
> +{
> +	struct virt_dma_desc *vd;
> +	struct dw_edma_desc *desc;
> +	struct dw_edma_chunk *child;
> +	const struct dw_edma_core_ops *ops = chan2ops(chan);

Reverse tree order here and in remaining functions ...

> +
> +	vd = vchan_next_desc(&chan->vc);
> +	if (!vd)
> +		return;
> +
> +	desc = vd2dw_edma_desc(vd);
> +	if (!desc)
> +		return;
> +
> +	child = list_first_entry_or_null(&desc->chunk->list,
> +					 struct dw_edma_chunk, list);
> +	if (!child)
> +		return;
> +
> +	ops->start(child, !desc->xfer_sz);
> +	desc->xfer_sz += child->ll_region.sz;
> +	dev_dbg(chan2dev(chan), "transfer of %u bytes started\n",
> +		child->ll_region.sz);
> +
> +	dw_edma_free_burst(child);
> +	if (child->bursts_alloc)
> +		dev_dbg(chan2dev(chan),	"%u bursts still allocated\n",
> +			child->bursts_alloc);
> +	list_del(&child->list);
> +	kvfree(child);
> +	desc->chunks_alloc--;
> +}
> +

> +int dw_edma_probe(struct dw_edma_chip *chip)
> +{
> +	struct dw_edma *dw = chip->dw;
> +	struct device *dev = chip->dev;
> +	const struct dw_edma_core_ops *ops;
> +	size_t ll_chunk = dw->ll_region.sz;
> +	size_t dt_chunk = dw->dt_region.sz;
> +	u32 ch_tot;
> +	int i, j, err;
> +
> +	raw_spin_lock_init(&dw->lock);

Why raw ?

> +
> +	/* Callback operation selection accordingly to eDMA version */
> +	switch (dw->version) {
> +	default:
> +		dev_err(dev, "unsupported version\n");
> +		return -EPERM;
> +	}
> +
> +	pm_runtime_get_sync(dev);

Why do you need to increase usage count at probe ? And shouldn't
this be after pm_runtime_enable() ?

> +
> +	/* Find out how many write channels are supported by hardware */
> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
> +	if (!dw->wr_ch_cnt) {
> +		dev_err(dev, "invalid number of write channels(0)\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Find out how many read channels are supported by hardware */
> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
> +	if (!dw->rd_ch_cnt) {
> +		dev_err(dev, "invalid number of read channels(0)\n");
> +		return -EINVAL;
> +	}
> +
> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
> +
> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
> +
> +	/* Allocate channels */
> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
> +	if (!dw->chan)
> +		return -ENOMEM;
> +
> +	/* Calculate the linked list chunk for each channel */
> +	ll_chunk /= roundup_pow_of_two(ch_tot);
> +
> +	/* Calculate the linked list chunk for each channel */
> +	dt_chunk /= roundup_pow_of_two(ch_tot);
> +
> +	/* Disable eDMA, only to establish the ideal initial conditions */
> +	ops->off(dw);
> +
> +	snprintf(dw->name, sizeof(dw->name), "dw-edma-core:%d", chip->id);
> +
> +	/* Request IRQs */
> +	if (dw->nr_irqs != 1) {
> +		dev_err(dev, "invalid number of irqs (%u)\n", dw->nr_irqs);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < dw->nr_irqs; i++) {
> +		err = devm_request_irq(dev, pci_irq_vector(to_pci_dev(dev), i),
> +				       dw_edma_interrupt_all,
> +				       IRQF_SHARED, dw->name, chip);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* Create write channels */
> +	INIT_LIST_HEAD(&dw->wr_edma.channels);
> +	for (i = 0; i < dw->wr_ch_cnt; i++) {
> +		struct dw_edma_chan *chan = &dw->chan[i];
> +		struct dw_edma_region *dt_region;
> +
> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
> +		if (!dt_region)
> +			return -ENOMEM;
> +
> +		chan->vc.chan.private = dt_region;
> +
> +		chan->chip = chip;
> +		chan->id = i;
> +		chan->dir = EDMA_DIR_WRITE;
> +		chan->configured = false;
> +		chan->request = EDMA_REQ_NONE;
> +		chan->status = EDMA_ST_IDLE;
> +
> +		chan->ll_off = (ll_chunk * i);
> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
> +
> +		chan->dt_off = (dt_chunk * i);
> +
> +		dev_dbg(dev, "L. List:\tChannel write[%u] off=0x%.8lx, max_cnt=%u\n",
> +			i, chan->ll_off, chan->ll_max);
> +
> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
> +
> +		dev_dbg(dev, "MSI:\t\tChannel write[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
> +			i, chan->msi.address_hi, chan->msi.address_lo,
> +			chan->msi.data);
> +
> +		chan->vc.desc_free = vchan_free_desc;
> +		vchan_init(&chan->vc, &dw->wr_edma);
> +
> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
> +		dt_region->sz = dt_chunk;
> +
> +		dev_dbg(dev, "Data:\tChannel write[%u] off=0x%.8lx\n",
> +			i, chan->dt_off);
> +	}
> +	dma_cap_zero(dw->wr_edma.cap_mask);
> +	dma_cap_set(DMA_SLAVE, dw->wr_edma.cap_mask);
> +	dma_cap_set(DMA_CYCLIC, dw->wr_edma.cap_mask);
> +	dw->wr_edma.directions = BIT(DMA_DEV_TO_MEM);
> +	dw->wr_edma.chancnt = dw->wr_ch_cnt;
> +
> +	/* Create read channels */
> +	INIT_LIST_HEAD(&dw->rd_edma.channels);
> +	for (j = 0; j < dw->rd_ch_cnt; j++, i++) {
> +		struct dw_edma_chan *chan = &dw->chan[i];
> +		struct dw_edma_region *dt_region;
> +
> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
> +		if (!dt_region)
> +			return -ENOMEM;
> +
> +		chan->vc.chan.private = dt_region;
> +
> +		chan->chip = chip;
> +		chan->id = j;
> +		chan->dir = EDMA_DIR_READ;
> +		chan->configured = false;
> +		chan->request = EDMA_REQ_NONE;
> +		chan->status = EDMA_ST_IDLE;
> +
> +		chan->ll_off = (ll_chunk * i);
> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
> +
> +		chan->dt_off = (dt_chunk * i);
> +
> +		dev_dbg(dev, "L. List:\tChannel read[%u] off=0x%.8lx, max_cnt=%u\n",
> +			j, chan->ll_off, chan->ll_max);
> +
> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
> +
> +		dev_dbg(dev, "MSI:\t\tChannel read[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
> +			j, chan->msi.address_hi, chan->msi.address_lo,
> +			chan->msi.data);
> +
> +		chan->vc.desc_free = vchan_free_desc;
> +		vchan_init(&chan->vc, &dw->rd_edma);
> +
> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
> +		dt_region->sz = dt_chunk;
> +
> +		dev_dbg(dev, "Data:\tChannel read[%u] off=0x%.8lx\n",
> +			j, chan->dt_off);
> +	}
> +	dma_cap_zero(dw->rd_edma.cap_mask);
> +	dma_cap_set(DMA_SLAVE, dw->rd_edma.cap_mask);
> +	dma_cap_set(DMA_CYCLIC, dw->rd_edma.cap_mask);
> +	dw->rd_edma.directions = BIT(DMA_MEM_TO_DEV);
> +	dw->rd_edma.chancnt = dw->rd_ch_cnt;
> +
> +	/* Set DMA channels  capabilities */
> +	SET_BOTH_CH(src_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
> +	SET_BOTH_CH(dst_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
> +	SET_BOTH_CH(residue_granularity, DMA_RESIDUE_GRANULARITY_DESCRIPTOR);
> +
> +	SET_BOTH_CH(dev, dev);
> +
> +	SET_BOTH_CH(device_alloc_chan_resources, dw_edma_alloc_chan_resources);
> +	SET_BOTH_CH(device_free_chan_resources, dw_edma_free_chan_resources);
> +
> +	SET_BOTH_CH(device_config, dw_edma_device_config);
> +	SET_BOTH_CH(device_pause, dw_edma_device_pause);
> +	SET_BOTH_CH(device_resume, dw_edma_device_resume);
> +	SET_BOTH_CH(device_terminate_all, dw_edma_device_terminate_all);
> +	SET_BOTH_CH(device_issue_pending, dw_edma_device_issue_pending);
> +	SET_BOTH_CH(device_tx_status, dw_edma_device_tx_status);
> +	SET_BOTH_CH(device_prep_slave_sg, dw_edma_device_prep_slave_sg);
> +	SET_BOTH_CH(device_prep_dma_cyclic, dw_edma_device_prep_dma_cyclic);
> +
> +	/* Power management */
> +	pm_runtime_enable(dev);
> +
> +	/* Register DMA device */
> +	err = dma_async_device_register(&dw->wr_edma);
> +	if (err)
> +		goto err_pm_disable;
> +
> +	err = dma_async_device_register(&dw->rd_edma);
> +	if (err)
> +		goto err_pm_disable;
> +
> +	/* Turn debugfs on */
> +	err = ops->debugfs_on(chip);
> +	if (err) {
> +		dev_err(dev, "unable to create debugfs structure\n");
> +		goto err_pm_disable;
> +	}
> +
> +	dev_info(dev, "DesignWare eDMA controller driver loaded completely\n");
> +
> +	return 0;
> +
> +err_pm_disable:
> +	pm_runtime_disable(dev);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(dw_edma_probe);
> +
> +int dw_edma_remove(struct dw_edma_chip *chip)
> +{
> +	struct dw_edma *dw = chip->dw;
> +	struct device *dev = chip->dev;
> +	const struct dw_edma_core_ops *ops = dw->ops;
> +	struct dw_edma_chan *chan, *_chan;
> +	int i;
> +
> +	/* Disable eDMA */
> +	if (ops)
> +		ops->off(dw);
> +
> +	/* Free irqs */

But devm will automatically free it, no ?

> +	for (i = 0; i < dw->nr_irqs; i++) {
> +		if (pci_irq_vector(to_pci_dev(dev), i) < 0)
> +			continue;
> +
> +		devm_free_irq(dev, pci_irq_vector(to_pci_dev(dev), i), chip);
> +	}
> +
> +	/* Power management */
> +	pm_runtime_disable(dev);
> +
> +	list_for_each_entry_safe(chan, _chan, &dw->wr_edma.channels,
> +				 vc.chan.device_node) {
> +		list_del(&chan->vc.chan.device_node);
> +		tasklet_kill(&chan->vc.task);
> +	}
> +
> +	list_for_each_entry_safe(chan, _chan, &dw->rd_edma.channels,
> +				 vc.chan.device_node) {
> +		list_del(&chan->vc.chan.device_node);
> +		tasklet_kill(&chan->vc.task);
> +	}
> +
> +	/* Deregister eDMA device */
> +	dma_async_device_unregister(&dw->wr_edma);
> +	dma_async_device_unregister(&dw->rd_edma);
> +
> +	/* Turn debugfs off */
> +	if (ops)
> +		ops->debugfs_off();
> +
> +	dev_info(dev, "DesignWare eDMA controller driver unloaded completely\n");
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dw_edma_remove);
Gustavo Pimentel - Jan. 16, 2019, 11:53 a.m.
Hi Jose,

On 16/01/2019 10:21, Jose Abreu wrote:
> Hi Gustavo,
> 
> On 1/11/2019 6:33 PM, Gustavo Pimentel wrote:
>> [...]
> 
> 
>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
>> +{
>> +	struct dw_edma_chan *chan = desc->chan;
>> +	struct dw_edma *dw = chan->chip->dw;
>> +	struct dw_edma_chunk *chunk;
>> +
>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
>> +	if (unlikely(!chunk))
>> +		return NULL;
>> +
>> +	INIT_LIST_HEAD(&chunk->list);
>> +	chunk->chan = chan;
>> +	chunk->cb = !(desc->chunks_alloc % 2);
>> +	chunk->ll_region.paddr = dw->ll_region.paddr + chan->ll_off;
>> +	chunk->ll_region.vaddr = dw->ll_region.vaddr + chan->ll_off;
>> +
>> +	if (desc->chunk) {
>> +		/* Create and add new element into the linked list */
>> +		desc->chunks_alloc++;
>> +		dev_dbg(chan2dev(chan), "alloc new chunk element (%d)\n",
>> +			desc->chunks_alloc);
>> +		list_add_tail(&chunk->list, &desc->chunk->list);
>> +		dw_edma_alloc_burst(chunk);
> 
> No return check ?

Nice catch! I'll do a return check.

> 
>> +	} else {
>> +		/* List head */
>> +		chunk->burst = NULL;
>> +		desc->chunks_alloc = 0;
>> +		desc->chunk = chunk;
>> +		dev_dbg(chan2dev(chan), "alloc new chunk head\n");
>> +	}
>> +
>> +	return chunk;
>> +}
>> +
>> +static struct dw_edma_desc *dw_edma_alloc_desc(struct dw_edma_chan *chan)
>> +{
>> +	struct dw_edma_desc *desc;
>> +
>> +	dev_dbg(chan2dev(chan), "alloc new descriptor\n");
>> +
>> +	desc = kvzalloc(sizeof(*desc), GFP_NOWAIT);
>> +	if (unlikely(!desc))
>> +		return NULL;
>> +
>> +	desc->chan = chan;
>> +	dw_edma_alloc_chunk(desc);
> 
> No return check ?

Ditto.

> 
>> +
>> +	return desc;
>> +}
>> +
> 
>> +static void dw_edma_start_transfer(struct dw_edma_chan *chan)
>> +{
>> +	struct virt_dma_desc *vd;
>> +	struct dw_edma_desc *desc;
>> +	struct dw_edma_chunk *child;
>> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
> 
> Reverse tree order here and in remaining functions ...

Ok, I'll do that.

> 
>> +
>> +	vd = vchan_next_desc(&chan->vc);
>> +	if (!vd)
>> +		return;
>> +
>> +	desc = vd2dw_edma_desc(vd);
>> +	if (!desc)
>> +		return;
>> +
>> +	child = list_first_entry_or_null(&desc->chunk->list,
>> +					 struct dw_edma_chunk, list);
>> +	if (!child)
>> +		return;
>> +
>> +	ops->start(child, !desc->xfer_sz);
>> +	desc->xfer_sz += child->ll_region.sz;
>> +	dev_dbg(chan2dev(chan), "transfer of %u bytes started\n",
>> +		child->ll_region.sz);
>> +
>> +	dw_edma_free_burst(child);
>> +	if (child->bursts_alloc)
>> +		dev_dbg(chan2dev(chan),	"%u bursts still allocated\n",
>> +			child->bursts_alloc);
>> +	list_del(&child->list);
>> +	kvfree(child);
>> +	desc->chunks_alloc--;
>> +}
>> +
> 
>> +int dw_edma_probe(struct dw_edma_chip *chip)
>> +{
>> +	struct dw_edma *dw = chip->dw;
>> +	struct device *dev = chip->dev;
>> +	const struct dw_edma_core_ops *ops;
>> +	size_t ll_chunk = dw->ll_region.sz;
>> +	size_t dt_chunk = dw->dt_region.sz;
>> +	u32 ch_tot;
>> +	int i, j, err;
>> +
>> +	raw_spin_lock_init(&dw->lock);
> 
> Why raw ?

Marc Zyngier told me some time ago that raw spin locks would be preferable,
since they don't have any additional effect on the mainline kernel, but will
make it work correctly if you use the RT patches.

> 
>> +
>> +	/* Callback operation selection accordingly to eDMA version */
>> +	switch (dw->version) {
>> +	default:
>> +		dev_err(dev, "unsupported version\n");
>> +		return -EPERM;
>> +	}
>> +
>> +	pm_runtime_get_sync(dev);
> 
> Why do you need to increase usage count at probe ? And shouldn't
> this be after pm_runtime_enable() ?

It shouldn't be there. I'll remove it.

> 
>> +
>> +	/* Find out how many write channels are supported by hardware */
>> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
>> +	if (!dw->wr_ch_cnt) {
>> +		dev_err(dev, "invalid number of write channels(0)\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Find out how many read channels are supported by hardware */
>> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
>> +	if (!dw->rd_ch_cnt) {
>> +		dev_err(dev, "invalid number of read channels(0)\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
>> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
>> +
>> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
>> +
>> +	/* Allocate channels */
>> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
>> +	if (!dw->chan)
>> +		return -ENOMEM;
>> +
>> +	/* Calculate the linked list chunk for each channel */
>> +	ll_chunk /= roundup_pow_of_two(ch_tot);
>> +
>> +	/* Calculate the linked list chunk for each channel */
>> +	dt_chunk /= roundup_pow_of_two(ch_tot);
>> +
>> +	/* Disable eDMA, only to establish the ideal initial conditions */
>> +	ops->off(dw);
>> +
>> +	snprintf(dw->name, sizeof(dw->name), "dw-edma-core:%d", chip->id);
>> +
>> +	/* Request IRQs */
>> +	if (dw->nr_irqs != 1) {
>> +		dev_err(dev, "invalid number of irqs (%u)\n", dw->nr_irqs);
>> +		return -EINVAL;
>> +	}
>> +
>> +	for (i = 0; i < dw->nr_irqs; i++) {
>> +		err = devm_request_irq(dev, pci_irq_vector(to_pci_dev(dev), i),
>> +				       dw_edma_interrupt_all,
>> +				       IRQF_SHARED, dw->name, chip);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	/* Create write channels */
>> +	INIT_LIST_HEAD(&dw->wr_edma.channels);
>> +	for (i = 0; i < dw->wr_ch_cnt; i++) {
>> +		struct dw_edma_chan *chan = &dw->chan[i];
>> +		struct dw_edma_region *dt_region;
>> +
>> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
>> +		if (!dt_region)
>> +			return -ENOMEM;
>> +
>> +		chan->vc.chan.private = dt_region;
>> +
>> +		chan->chip = chip;
>> +		chan->id = i;
>> +		chan->dir = EDMA_DIR_WRITE;
>> +		chan->configured = false;
>> +		chan->request = EDMA_REQ_NONE;
>> +		chan->status = EDMA_ST_IDLE;
>> +
>> +		chan->ll_off = (ll_chunk * i);
>> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
>> +
>> +		chan->dt_off = (dt_chunk * i);
>> +
>> +		dev_dbg(dev, "L. List:\tChannel write[%u] off=0x%.8lx, max_cnt=%u\n",
>> +			i, chan->ll_off, chan->ll_max);
>> +
>> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
>> +
>> +		dev_dbg(dev, "MSI:\t\tChannel write[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
>> +			i, chan->msi.address_hi, chan->msi.address_lo,
>> +			chan->msi.data);
>> +
>> +		chan->vc.desc_free = vchan_free_desc;
>> +		vchan_init(&chan->vc, &dw->wr_edma);
>> +
>> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
>> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
>> +		dt_region->sz = dt_chunk;
>> +
>> +		dev_dbg(dev, "Data:\tChannel write[%u] off=0x%.8lx\n",
>> +			i, chan->dt_off);
>> +	}
>> +	dma_cap_zero(dw->wr_edma.cap_mask);
>> +	dma_cap_set(DMA_SLAVE, dw->wr_edma.cap_mask);
>> +	dma_cap_set(DMA_CYCLIC, dw->wr_edma.cap_mask);
>> +	dw->wr_edma.directions = BIT(DMA_DEV_TO_MEM);
>> +	dw->wr_edma.chancnt = dw->wr_ch_cnt;
>> +
>> +	/* Create read channels */
>> +	INIT_LIST_HEAD(&dw->rd_edma.channels);
>> +	for (j = 0; j < dw->rd_ch_cnt; j++, i++) {
>> +		struct dw_edma_chan *chan = &dw->chan[i];
>> +		struct dw_edma_region *dt_region;
>> +
>> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
>> +		if (!dt_region)
>> +			return -ENOMEM;
>> +
>> +		chan->vc.chan.private = dt_region;
>> +
>> +		chan->chip = chip;
>> +		chan->id = j;
>> +		chan->dir = EDMA_DIR_READ;
>> +		chan->configured = false;
>> +		chan->request = EDMA_REQ_NONE;
>> +		chan->status = EDMA_ST_IDLE;
>> +
>> +		chan->ll_off = (ll_chunk * i);
>> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
>> +
>> +		chan->dt_off = (dt_chunk * i);
>> +
>> +		dev_dbg(dev, "L. List:\tChannel read[%u] off=0x%.8lx, max_cnt=%u\n",
>> +			j, chan->ll_off, chan->ll_max);
>> +
>> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
>> +
>> +		dev_dbg(dev, "MSI:\t\tChannel read[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
>> +			j, chan->msi.address_hi, chan->msi.address_lo,
>> +			chan->msi.data);
>> +
>> +		chan->vc.desc_free = vchan_free_desc;
>> +		vchan_init(&chan->vc, &dw->rd_edma);
>> +
>> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
>> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
>> +		dt_region->sz = dt_chunk;
>> +
>> +		dev_dbg(dev, "Data:\tChannel read[%u] off=0x%.8lx\n",
>> +			j, chan->dt_off);
>> +	}
>> +	dma_cap_zero(dw->rd_edma.cap_mask);
>> +	dma_cap_set(DMA_SLAVE, dw->rd_edma.cap_mask);
>> +	dma_cap_set(DMA_CYCLIC, dw->rd_edma.cap_mask);
>> +	dw->rd_edma.directions = BIT(DMA_MEM_TO_DEV);
>> +	dw->rd_edma.chancnt = dw->rd_ch_cnt;
>> +
>> +	/* Set DMA channels  capabilities */
>> +	SET_BOTH_CH(src_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
>> +	SET_BOTH_CH(dst_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
>> +	SET_BOTH_CH(residue_granularity, DMA_RESIDUE_GRANULARITY_DESCRIPTOR);
>> +
>> +	SET_BOTH_CH(dev, dev);
>> +
>> +	SET_BOTH_CH(device_alloc_chan_resources, dw_edma_alloc_chan_resources);
>> +	SET_BOTH_CH(device_free_chan_resources, dw_edma_free_chan_resources);
>> +
>> +	SET_BOTH_CH(device_config, dw_edma_device_config);
>> +	SET_BOTH_CH(device_pause, dw_edma_device_pause);
>> +	SET_BOTH_CH(device_resume, dw_edma_device_resume);
>> +	SET_BOTH_CH(device_terminate_all, dw_edma_device_terminate_all);
>> +	SET_BOTH_CH(device_issue_pending, dw_edma_device_issue_pending);
>> +	SET_BOTH_CH(device_tx_status, dw_edma_device_tx_status);
>> +	SET_BOTH_CH(device_prep_slave_sg, dw_edma_device_prep_slave_sg);
>> +	SET_BOTH_CH(device_prep_dma_cyclic, dw_edma_device_prep_dma_cyclic);
>> +
>> +	/* Power management */
>> +	pm_runtime_enable(dev);
>> +
>> +	/* Register DMA device */
>> +	err = dma_async_device_register(&dw->wr_edma);
>> +	if (err)
>> +		goto err_pm_disable;
>> +
>> +	err = dma_async_device_register(&dw->rd_edma);
>> +	if (err)
>> +		goto err_pm_disable;
>> +
>> +	/* Turn debugfs on */
>> +	err = ops->debugfs_on(chip);
>> +	if (err) {
>> +		dev_err(dev, "unable to create debugfs structure\n");
>> +		goto err_pm_disable;
>> +	}
>> +
>> +	dev_info(dev, "DesignWare eDMA controller driver loaded completely\n");
>> +
>> +	return 0;
>> +
>> +err_pm_disable:
>> +	pm_runtime_disable(dev);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL_GPL(dw_edma_probe);
>> +
>> +int dw_edma_remove(struct dw_edma_chip *chip)
>> +{
>> +	struct dw_edma *dw = chip->dw;
>> +	struct device *dev = chip->dev;
>> +	const struct dw_edma_core_ops *ops = dw->ops;
>> +	struct dw_edma_chan *chan, *_chan;
>> +	int i;
>> +
>> +	/* Disable eDMA */
>> +	if (ops)
>> +		ops->off(dw);
>> +
>> +	/* Free irqs */
> 
> But devm will automatically free it, no ?

Yes, it shouldn't be there. I'll remove it.

> 
>> +	for (i = 0; i < dw->nr_irqs; i++) {
>> +		if (pci_irq_vector(to_pci_dev(dev), i) < 0)
>> +			continue;
>> +
>> +		devm_free_irq(dev, pci_irq_vector(to_pci_dev(dev), i), chip);
>> +	}
>> +
>> +	/* Power management */
>> +	pm_runtime_disable(dev);
>> +
>> +	list_for_each_entry_safe(chan, _chan, &dw->wr_edma.channels,
>> +				 vc.chan.device_node) {
>> +		list_del(&chan->vc.chan.device_node);
>> +		tasklet_kill(&chan->vc.task);
>> +	}
>> +
>> +	list_for_each_entry_safe(chan, _chan, &dw->rd_edma.channels,
>> +				 vc.chan.device_node) {
>> +		list_del(&chan->vc.chan.device_node);
>> +		tasklet_kill(&chan->vc.task);
>> +	}
>> +
>> +	/* Deregister eDMA device */
>> +	dma_async_device_unregister(&dw->wr_edma);
>> +	dma_async_device_unregister(&dw->rd_edma);
>> +
>> +	/* Turn debugfs off */
>> +	if (ops)
>> +		ops->debugfs_off();
>> +
>> +	dev_info(dev, "DesignWare eDMA controller driver unloaded completely\n");
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(dw_edma_remove);

Thanks!
Andy Shevchenko - Jan. 19, 2019, 4:21 p.m.
On Wed, Jan 16, 2019 at 11:53:00AM +0000, Gustavo Pimentel wrote:
> On 16/01/2019 10:21, Jose Abreu wrote:
> > On 1/11/2019 6:33 PM, Gustavo Pimentel wrote:

> >> +	/* Free irqs */
> > 
> > But devm will automatically free it, no ?
> 
> Yes, it shouldn't be there. I'll remove it.

It depends. If the driver is using tasklets, it must free the IRQ explicitly like this.

P.S. devm_*_irq() is quite doubtful to have in the first place.
Vinod Koul - Jan. 20, 2019, 11:44 a.m.
On 11-01-19, 19:33, Gustavo Pimentel wrote:
> Add Synopsys eDMA IP core driver to kernel.
> 
> This core driver initializes and configures the eDMA IP using vma-helper
> functions and the dma-engine subsystem.

A description of the eDMA IP would help review the driver

> It also creates an abstraction layer through callbacks, allowing different
> register mappings in the future, organized into versions.
> 
> This driver can be compiled as a built-in or an external kernel module.
> 
> To enable this driver just select DW_EDMA option in kernel configuration,
> however it requires and selects automatically DMA_ENGINE and
> DMA_VIRTUAL_CHANNELS option too.
> 
> Changes:
> RFC v1->RFC v2:

These do not belong to change log, either move them to cover-letter or
keep them after s-o-b lines..

> @@ -0,0 +1,1059 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.

2019 now

> +static struct dw_edma_burst *dw_edma_alloc_burst(struct dw_edma_chunk *chunk)
> +{
> +	struct dw_edma_chan *chan = chunk->chan;
> +	struct dw_edma_burst *burst;
> +
> +	burst = kvzalloc(sizeof(*burst), GFP_NOWAIT);

Is there a specific reason for kvzalloc()? Do you submit this to HW?

> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
> +{
> +	struct dw_edma_chan *chan = desc->chan;
> +	struct dw_edma *dw = chan->chip->dw;
> +	struct dw_edma_chunk *chunk;
> +
> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
> +	if (unlikely(!chunk))
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&chunk->list);
> +	chunk->chan = chan;
> +	chunk->cb = !(desc->chunks_alloc % 2);

cb ..?

> +static int dw_edma_device_config(struct dma_chan *dchan,
> +				 struct dma_slave_config *config)
> +{
> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
> +	unsigned long flags;
> +	int err = 0;
> +
> +	spin_lock_irqsave(&chan->vc.lock, flags);
> +
> +	if (!config) {
> +		err = -EINVAL;
> +		goto err_config;
> +	}
> +
> +	if (chan->status != EDMA_ST_IDLE) {
> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
> +		err = -EPERM;

this is not correct behaviour; device_config can be called at any time and
the values take effect on the next transaction submitted.

> +		goto err_config;
> +	}
> +
> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
> +		&config->src_addr, &config->dst_addr);
> +
> +	chan->src_addr = config->src_addr;
> +	chan->dst_addr = config->dst_addr;
> +
> +	err = ops->device_config(dchan);

what does this do?

> +static enum dma_status
> +dw_edma_device_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
> +			 struct dma_tx_state *txstate)
> +{
> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
> +	unsigned long flags;
> +	enum dma_status ret;
> +
> +	spin_lock_irqsave(&chan->vc.lock, flags);
> +
> +	ret = ops->ch_status(chan);
> +	if (ret == DMA_ERROR) {
> +		goto ret_status;
> +	} else if (ret == DMA_IN_PROGRESS) {
> +		chan->status = EDMA_ST_BUSY;
> +		goto ret_status;

so in this case you set residue as zero, which is not correct

> +	} else {
> +		/* DMA_COMPLETE */
> +		if (chan->status == EDMA_ST_PAUSE)
> +			ret = DMA_PAUSED;
> +		else if (chan->status == EDMA_ST_BUSY)
> +			ret = DMA_IN_PROGRESS;

?? if txn is complete how are you returning DMA_IN_PROGRESS?

> +		else
> +			ret = DMA_COMPLETE;
> +	}

this looks like an incorrect interpretation to me. The status is to be
retrieved for the given cookie passed, and the fact that you do not even use
this argument tells me that you have understood this as 'channel' status
reporting, which is not correct

> +static struct dma_async_tx_descriptor *
> +dw_edma_device_prep_slave_sg(struct dma_chan *dchan, struct scatterlist *sgl,
> +			     unsigned int sg_len,
> +			     enum dma_transfer_direction direction,
> +			     unsigned long flags, void *context)
> +{
> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> +	struct dw_edma_desc *desc;
> +	struct dw_edma_chunk *chunk;
> +	struct dw_edma_burst *burst;
> +	struct scatterlist *sg;
> +	unsigned long sflags;
> +	phys_addr_t src_addr;
> +	phys_addr_t dst_addr;
> +	int i;
> +
> +	if (sg_len < 1) {
> +		dev_err(chan2dev(chan), "invalid sg length %u\n", sg_len);
> +		return NULL;
> +	}
> +
> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {

what is the second part of the check, can you explain that, who sets
chan->dir?

> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
> +	} else {
> +		dev_err(chan2dev(chan), "invalid direction\n");
> +		return NULL;
> +	}
> +
> +	if (!chan->configured) {
> +		dev_err(chan2dev(chan), "(prep_slave_sg) channel not configured\n");
> +		return NULL;
> +	}
> +
> +	if (chan->status != EDMA_ST_IDLE) {
> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
> +		return NULL;
> +	}

No, wrong again. The txn must be prepared and then, on submit, added to a
queue. You are writing a driver for dmaengine; surely you don't expect
the channel to be free before doing a txn; that would be very
inefficient!

> +static struct dma_async_tx_descriptor *
> +dw_edma_device_prep_dma_cyclic(struct dma_chan *dchan, dma_addr_t buf_addr,
> +			       size_t len, size_t cyclic_cnt,
> +			       enum dma_transfer_direction direction,
> +			       unsigned long flags)
> +{
> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> +	struct dw_edma_desc *desc;
> +	struct dw_edma_chunk *chunk;
> +	struct dw_edma_burst *burst;
> +	unsigned long sflags;
> +	phys_addr_t src_addr;
> +	phys_addr_t dst_addr;
> +	u32 i;
> +
> +	if (!len || !cyclic_cnt) {
> +		dev_err(chan2dev(chan), "invalid len or cyclic count\n");
> +		return NULL;
> +	}
> +
> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
> +	} else {
> +		dev_err(chan2dev(chan), "invalid direction\n");
> +		return NULL;
> +	}
> +
> +	if (!chan->configured) {
> +		dev_err(chan2dev(chan), "(prep_dma_cyclic) channel not configured\n");
> +		return NULL;
> +	}
> +
> +	if (chan->status != EDMA_ST_IDLE) {
> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
> +		return NULL;
> +	}
> +
> +	spin_lock_irqsave(&chan->vc.lock, sflags);
> +
> +	desc = dw_edma_alloc_desc(chan);
> +	if (unlikely(!desc))
> +		goto err_alloc;
> +
> +	chunk = dw_edma_alloc_chunk(desc);
> +	if (unlikely(!chunk))
> +		goto err_alloc;
> +
> +	src_addr = chan->src_addr;
> +	dst_addr = chan->dst_addr;
> +
> +	for (i = 0; i < cyclic_cnt; i++) {
> +		if (chunk->bursts_alloc == chan->ll_max) {
> +			chunk = dw_edma_alloc_chunk(desc);
> +			if (unlikely(!chunk))
> +				goto err_alloc;
> +		}
> +
> +		burst = dw_edma_alloc_burst(chunk);
> +
> +		if (unlikely(!burst))
> +			goto err_alloc;
> +
> +		burst->sz = len;
> +		chunk->ll_region.sz += burst->sz;
> +		desc->alloc_sz += burst->sz;
> +
> +		burst->sar = src_addr;
> +		burst->dar = dst_addr;
> +
> +		dev_dbg(chan2dev(chan), "lli %u/%u, sar=0x%.8llx, dar=0x%.8llx, size=%u bytes\n",
> +			i + 1, cyclic_cnt, burst->sar, burst->dar, burst->sz);
> +	}
> +
> +	spin_unlock_irqrestore(&chan->vc.lock, sflags);
> +	return vchan_tx_prep(&chan->vc, &desc->vd, flags);

How is this different from the previous one? I am sure we can reuse bits!

> +static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
> +{
> +	struct dw_edma *dw = chan->chip->dw;
> +	const struct dw_edma_core_ops *ops = dw->ops;
> +	struct virt_dma_desc *vd;
> +	struct dw_edma_desc *desc;
> +	unsigned long flags;
> +
> +	ops->clear_done_int(chan);
> +	dev_dbg(chan2dev(chan), "clear done interrupt\n");
> +
> +	spin_lock_irqsave(&chan->vc.lock, flags);
> +	vd = vchan_next_desc(&chan->vc);
> +	switch (chan->request) {
> +	case EDMA_REQ_NONE:
> +		if (!vd)
> +			break;
> +
> +		desc = vd2dw_edma_desc(vd);
> +		if (desc->chunks_alloc) {
> +			dev_dbg(chan2dev(chan), "sub-transfer complete\n");
> +			chan->status = EDMA_ST_BUSY;
> +			dev_dbg(chan2dev(chan), "transferred %u bytes\n",
> +				desc->xfer_sz);
> +			dw_edma_start_transfer(chan);
> +		} else {
> +			list_del(&vd->node);
> +			vchan_cookie_complete(vd);
> +			chan->status = EDMA_ST_IDLE;
> +			dev_dbg(chan2dev(chan), "transfer complete\n");
> +		}
> +		break;
> +	case EDMA_REQ_STOP:
> +		if (!vd)
> +			break;
> +
> +		list_del(&vd->node);
> +		vchan_cookie_complete(vd);
> +		chan->request = EDMA_REQ_NONE;
> +		chan->status = EDMA_ST_IDLE;
> +		chan->configured = false;

why is the configuration deemed invalid? It can still be used again!

> +int dw_edma_probe(struct dw_edma_chip *chip)
> +{
> +	struct dw_edma *dw = chip->dw;
> +	struct device *dev = chip->dev;
> +	const struct dw_edma_core_ops *ops;
> +	size_t ll_chunk = dw->ll_region.sz;
> +	size_t dt_chunk = dw->dt_region.sz;
> +	u32 ch_tot;
> +	int i, j, err;
> +
> +	raw_spin_lock_init(&dw->lock);
> +
> +	/* Callback operation selection accordingly to eDMA version */
> +	switch (dw->version) {
> +	default:
> +		dev_err(dev, "unsupported version\n");
> +		return -EPERM;
> +	}

So we have only one case which returns error, was this code tested?

> +
> +	pm_runtime_get_sync(dev);
> +
> +	/* Find out how many write channels are supported by hardware */
> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
> +	if (!dw->wr_ch_cnt) {
> +		dev_err(dev, "invalid number of write channels(0)\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Find out how many read channels are supported by hardware */
> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
> +	if (!dw->rd_ch_cnt) {
> +		dev_err(dev, "invalid number of read channels(0)\n");
> +		return -EINVAL;
> +	}
> +
> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
> +
> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
> +
> +	/* Allocate channels */
> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);

you may use struct_size() here

> +	if (!dw->chan)
> +		return -ENOMEM;
> +
> +	/* Calculate the linked list chunk for each channel */
> +	ll_chunk /= roundup_pow_of_two(ch_tot);
> +
> +	/* Calculate the linked list chunk for each channel */
> +	dt_chunk /= roundup_pow_of_two(ch_tot);
> +
> +	/* Disable eDMA, only to establish the ideal initial conditions */
> +	ops->off(dw);
> +
> +	snprintf(dw->name, sizeof(dw->name), "dw-edma-core:%d", chip->id);
> +
> +	/* Request IRQs */
> +	if (dw->nr_irqs != 1) {
> +		dev_err(dev, "invalid number of irqs (%u)\n", dw->nr_irqs);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < dw->nr_irqs; i++) {
> +		err = devm_request_irq(dev, pci_irq_vector(to_pci_dev(dev), i),
> +				       dw_edma_interrupt_all,
> +				       IRQF_SHARED, dw->name, chip);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* Create write channels */
> +	INIT_LIST_HEAD(&dw->wr_edma.channels);
> +	for (i = 0; i < dw->wr_ch_cnt; i++) {
> +		struct dw_edma_chan *chan = &dw->chan[i];
> +		struct dw_edma_region *dt_region;
> +
> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
> +		if (!dt_region)
> +			return -ENOMEM;
> +
> +		chan->vc.chan.private = dt_region;
> +
> +		chan->chip = chip;
> +		chan->id = i;
> +		chan->dir = EDMA_DIR_WRITE;
> +		chan->configured = false;
> +		chan->request = EDMA_REQ_NONE;
> +		chan->status = EDMA_ST_IDLE;
> +
> +		chan->ll_off = (ll_chunk * i);
> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
> +
> +		chan->dt_off = (dt_chunk * i);
> +
> +		dev_dbg(dev, "L. List:\tChannel write[%u] off=0x%.8lx, max_cnt=%u\n",
> +			i, chan->ll_off, chan->ll_max);
> +
> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
> +
> +		dev_dbg(dev, "MSI:\t\tChannel write[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
> +			i, chan->msi.address_hi, chan->msi.address_lo,
> +			chan->msi.data);
> +
> +		chan->vc.desc_free = vchan_free_desc;
> +		vchan_init(&chan->vc, &dw->wr_edma);
> +
> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
> +		dt_region->sz = dt_chunk;
> +
> +		dev_dbg(dev, "Data:\tChannel write[%u] off=0x%.8lx\n",
> +			i, chan->dt_off);
> +	}
> +	dma_cap_zero(dw->wr_edma.cap_mask);
> +	dma_cap_set(DMA_SLAVE, dw->wr_edma.cap_mask);
> +	dma_cap_set(DMA_CYCLIC, dw->wr_edma.cap_mask);
> +	dw->wr_edma.directions = BIT(DMA_DEV_TO_MEM);
> +	dw->wr_edma.chancnt = dw->wr_ch_cnt;
> +
> +	/* Create read channels */
> +	INIT_LIST_HEAD(&dw->rd_edma.channels);
> +	for (j = 0; j < dw->rd_ch_cnt; j++, i++) {
> +		struct dw_edma_chan *chan = &dw->chan[i];
> +		struct dw_edma_region *dt_region;
> +
> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
> +		if (!dt_region)
> +			return -ENOMEM;
> +
> +		chan->vc.chan.private = dt_region;
> +
> +		chan->chip = chip;
> +		chan->id = j;
> +		chan->dir = EDMA_DIR_READ;
> +		chan->configured = false;
> +		chan->request = EDMA_REQ_NONE;
> +		chan->status = EDMA_ST_IDLE;
> +
> +		chan->ll_off = (ll_chunk * i);
> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
> +
> +		chan->dt_off = (dt_chunk * i);
> +
> +		dev_dbg(dev, "L. List:\tChannel read[%u] off=0x%.8lx, max_cnt=%u\n",
> +			j, chan->ll_off, chan->ll_max);
> +
> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
> +
> +		dev_dbg(dev, "MSI:\t\tChannel read[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
> +			j, chan->msi.address_hi, chan->msi.address_lo,
> +			chan->msi.data);
> +
> +		chan->vc.desc_free = vchan_free_desc;
> +		vchan_init(&chan->vc, &dw->rd_edma);
> +
> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
> +		dt_region->sz = dt_chunk;
> +
> +		dev_dbg(dev, "Data:\tChannel read[%u] off=0x%.8lx\n",
> +			i, chan->dt_off);

this is similar to the previous one with obvious changes; I think this can
be made a common routine invoked for R and W ...

So HW provides read channels and write channels and not generic channels
which can be used for either R or W?
Vinod Koul - Jan. 20, 2019, 11:47 a.m.
On 16-01-19, 11:53, Gustavo Pimentel wrote:

> >> +	/* Free irqs */
> > 
> > But devm will automatically free it, no ?
> 
> Yes, it shouldn't be there. I'll remove it.

Nope, this is correct. We need to ensure IRQs are freed and tasklets are
quiesced, otherwise you can end up with a spurious interrupt arriving
and tasklets spinning while the core unrolls. devm for IRQs is not a
good idea unless you really know what you are doing!
Gustavo Pimentel - Jan. 21, 2019, 9:14 a.m.
On 19/01/2019 16:21, Andy Shevchenko wrote:
> On Wed, Jan 16, 2019 at 11:53:00AM +0000, Gustavo Pimentel wrote:
>> On 16/01/2019 10:21, Jose Abreu wrote:
>>> On 1/11/2019 6:33 PM, Gustavo Pimentel wrote:
> 
>>>> +	/* Free irqs */
>>>
>>> But devm will automatically free it, no ?
>>
>> Yes, it shouldn't be there. I'll remove it.
> 
> It depends. If the driver is using tasklets it must free it explicitly like this.

In this driver I'm not using tasklets.

> 
> P.S. devm_*_irq() is quite doubtful to have in the first place.

Out of curiosity, why?

Gustavo Pimentel - Jan. 21, 2019, 3:48 p.m.
On 20/01/2019 11:44, Vinod Koul wrote:
> On 11-01-19, 19:33, Gustavo Pimentel wrote:
>> Add Synopsys eDMA IP core driver to kernel.
>>
>> This core driver, initializes and configures the eDMA IP using vma-helpers
>> functions and dma-engine subsystem.
> 
> A description of eDMA IP will help review the driver

I have the IP description in the cover letter, but I'll bring it into this
patch if that helps.

> 
>> Also creates an abstraction layer through callbacks allowing different
>> register mappings in the future, organized into versions.
>>
>> This driver can be compiled as built-in or as an external module in the kernel.
>>
>> To enable this driver just select DW_EDMA option in kernel configuration,
>> however it requires and selects automatically DMA_ENGINE and
>> DMA_VIRTUAL_CHANNELS option too.
>>
>> Changes:
>> RFC v1->RFC v2:
> 
> These do not belong to change log, either move them to cover-letter or
> keep them after s-o-b lines..

Ok, Andy has also referred that. I'll keep them after s-o-b lines.

> 
>> @@ -0,0 +1,1059 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
> 
> 2019 now

I've changed it to "Copyright (c) 2018-present Synopsys, Inc. and/or its
affiliates."; this way it's always up to date. I also kept 2018 because that
was the year I started developing this driver, if you don't mind.

> 
>> +static struct dw_edma_burst *dw_edma_alloc_burst(struct dw_edma_chunk *chunk)
>> +{
>> +	struct dw_edma_chan *chan = chunk->chan;
>> +	struct dw_edma_burst *burst;
>> +
>> +	burst = kvzalloc(sizeof(*burst), GFP_NOWAIT);
> 
> Is there a specific reason for kvzalloc()? Do you submit this to HW?

There is no reason for that; it is a leftover from an initial implementation.
I'll replace them with kzalloc().
Nicely spotted!

> 
>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
>> +{
>> +	struct dw_edma_chan *chan = desc->chan;
>> +	struct dw_edma *dw = chan->chip->dw;
>> +	struct dw_edma_chunk *chunk;
>> +
>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
>> +	if (unlikely(!chunk))
>> +		return NULL;
>> +
>> +	INIT_LIST_HEAD(&chunk->list);
>> +	chunk->chan = chan;
>> +	chunk->cb = !(desc->chunks_alloc % 2);
> 
> cb ..?

CB = change bit, a property of this eDMA IP. Basically it is a kind of
handshake which serves to validate whether the linked list has been updated or
not, especially useful when linked-list elements are recycled (every linked-list
recycle is a new chunk, which allows each chunk to be differentiated).

> 
>> +static int dw_edma_device_config(struct dma_chan *dchan,
>> +				 struct dma_slave_config *config)
>> +{
>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
>> +	unsigned long flags;
>> +	int err = 0;
>> +
>> +	spin_lock_irqsave(&chan->vc.lock, flags);
>> +
>> +	if (!config) {
>> +		err = -EINVAL;
>> +		goto err_config;
>> +	}
>> +
>> +	if (chan->status != EDMA_ST_IDLE) {
>> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
>> +		err = -EPERM;
> 
> this is not correct behaviour; device_config can be called at any time and
> the values take effect on the next transaction submitted.

Hmm, I thought we could only reconfigure after the transfer had finished.

> 
>> +		goto err_config;
>> +	}
>> +
>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
>> +		&config->src_addr, &config->dst_addr);
>> +
>> +	chan->src_addr = config->src_addr;
>> +	chan->dst_addr = config->dst_addr;
>> +
>> +	err = ops->device_config(dchan);
> 
> what does this do?

This is an initialization procedure to set up the interrupts (data and
addresses) for each channel of the eDMA IP, so that they are triggered after a
transfer completes or is aborted. Given that config() can be called at any
time, it doesn't make sense to have this procedure here; I'll move it to probe().

> 
>> +static enum dma_status
>> +dw_edma_device_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
>> +			 struct dma_tx_state *txstate)
>> +{
>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
>> +	unsigned long flags;
>> +	enum dma_status ret;
>> +
>> +	spin_lock_irqsave(&chan->vc.lock, flags);
>> +
>> +	ret = ops->ch_status(chan);
>> +	if (ret == DMA_ERROR) {
>> +		goto ret_status;
>> +	} else if (ret == DMA_IN_PROGRESS) {
>> +		chan->status = EDMA_ST_BUSY;
>> +		goto ret_status;
> 
> so in this case you set residue as zero, which is not correct

Ok, the residue should be the remaining bytes to transfer, right?

> 
>> +	} else {
>> +		/* DMA_COMPLETE */
>> +		if (chan->status == EDMA_ST_PAUSE)
>> +			ret = DMA_PAUSED;
>> +		else if (chan->status == EDMA_ST_BUSY)
>> +			ret = DMA_IN_PROGRESS;
> 
> ?? if txn is complete how are you returning DMA_IN_PROGRESS?

This IP reports DMA_COMPLETE when it successfully completes the transfer of all
linked-list elements; given that there may be more linked-list elements still
to transfer, the overall state is set to DMA_IN_PROGRESS.
> 
>> +		else
>> +			ret = DMA_COMPLETE;
>> +	}
> 
> this looks like an incorrect interpretation to me. The status is to be
> retrieved for the given cookie passed, and the fact that you do not even use
> this argument tells me that you have understood this as 'channel' status
> reporting, which is not correct

Yes, you're right; my interpretation assumed this function reports the
channel/transaction status. What would be the correct implementation?
Something like this?

{
	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
	const struct dw_edma_core_ops *ops = chan2ops(chan);
	struct dw_edma_desc *desc;
	struct virt_dma_desc *vd;
	unsigned long flags;
	enum dma_status ret;
	u32 residue = 0;

	spin_lock_irqsave(&chan->vc.lock, flags);

	ret = dma_cookie_status(dchan, cookie, txstate);
	if (ret == DMA_COMPLETE)
		goto ret_status;

	vd = vchan_next_desc(&chan->vc);
	if (!vd)
		goto ret_status;

	desc = vd2dw_edma_desc(vd);
	if (desc)
		residue = desc->alloc_sz - desc->xfer_sz;

	if (ret == DMA_IN_PROGRESS && chan->status == EDMA_ST_PAUSE)
		ret = DMA_PAUSED;

ret_status:
	spin_unlock_irqrestore(&chan->vc.lock, flags);
	dma_set_residue(txstate, residue);
	
	return ret;
}

> 
>> +static struct dma_async_tx_descriptor *
>> +dw_edma_device_prep_slave_sg(struct dma_chan *dchan, struct scatterlist *sgl,
>> +			     unsigned int sg_len,
>> +			     enum dma_transfer_direction direction,
>> +			     unsigned long flags, void *context)
>> +{
>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>> +	struct dw_edma_desc *desc;
>> +	struct dw_edma_chunk *chunk;
>> +	struct dw_edma_burst *burst;
>> +	struct scatterlist *sg;
>> +	unsigned long sflags;
>> +	phys_addr_t src_addr;
>> +	phys_addr_t dst_addr;
>> +	int i;
>> +
>> +	if (sg_len < 1) {
>> +		dev_err(chan2dev(chan), "invalid sg length %u\n", sg_len);
>> +		return NULL;
>> +	}
>> +
>> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
> 
> what is the second part of the check, can you explain that, who sets
> chan->dir?

chan->dir is set in probe(), during the process of configuring each channel.

> 
>> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
>> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
>> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
>> +	} else {
>> +		dev_err(chan2dev(chan), "invalid direction\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (!chan->configured) {
>> +		dev_err(chan2dev(chan), "(prep_slave_sg) channel not configured\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (chan->status != EDMA_ST_IDLE) {
>> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
>> +		return NULL;
>> +	}
> 
> No, wrong again. The txn must be prepared and then, on submit, added to a
> queue. You are writing a driver for dmaengine; surely you don't expect
> the channel to be free before doing a txn; that would be very
> inefficient!

I did not realize that the flow could be as you mention. The documentation I
read about the subsystem did not give me that idea. Thank you for clarifying this for me.

> 
>> +static struct dma_async_tx_descriptor *
>> +dw_edma_device_prep_dma_cyclic(struct dma_chan *dchan, dma_addr_t buf_addr,
>> +			       size_t len, size_t cyclic_cnt,
>> +			       enum dma_transfer_direction direction,
>> +			       unsigned long flags)
>> +{
>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>> +	struct dw_edma_desc *desc;
>> +	struct dw_edma_chunk *chunk;
>> +	struct dw_edma_burst *burst;
>> +	unsigned long sflags;
>> +	phys_addr_t src_addr;
>> +	phys_addr_t dst_addr;
>> +	u32 i;
>> +
>> +	if (!len || !cyclic_cnt) {
>> +		dev_err(chan2dev(chan), "invalid len or cyclic count\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
>> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
>> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
>> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
>> +	} else {
>> +		dev_err(chan2dev(chan), "invalid direction\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (!chan->configured) {
>> +		dev_err(chan2dev(chan), "(prep_dma_cyclic) channel not configured\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (chan->status != EDMA_ST_IDLE) {
>> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
>> +		return NULL;
>> +	}
>> +
>> +	spin_lock_irqsave(&chan->vc.lock, sflags);
>> +
>> +	desc = dw_edma_alloc_desc(chan);
>> +	if (unlikely(!desc))
>> +		goto err_alloc;
>> +
>> +	chunk = dw_edma_alloc_chunk(desc);
>> +	if (unlikely(!chunk))
>> +		goto err_alloc;
>> +
>> +	src_addr = chan->src_addr;
>> +	dst_addr = chan->dst_addr;
>> +
>> +	for (i = 0; i < cyclic_cnt; i++) {
>> +		if (chunk->bursts_alloc == chan->ll_max) {
>> +			chunk = dw_edma_alloc_chunk(desc);
>> +			if (unlikely(!chunk))
>> +				goto err_alloc;
>> +		}
>> +
>> +		burst = dw_edma_alloc_burst(chunk);
>> +
>> +		if (unlikely(!burst))
>> +			goto err_alloc;
>> +
>> +		burst->sz = len;
>> +		chunk->ll_region.sz += burst->sz;
>> +		desc->alloc_sz += burst->sz;
>> +
>> +		burst->sar = src_addr;
>> +		burst->dar = dst_addr;
>> +
>> +		dev_dbg(chan2dev(chan), "lli %u/%u, sar=0x%.8llx, dar=0x%.8llx, size=%u bytes\n",
>> +			i + 1, cyclic_cnt, burst->sar, burst->dar, burst->sz);
>> +	}
>> +
>> +	spin_unlock_irqrestore(&chan->vc.lock, sflags);
>> +	return vchan_tx_prep(&chan->vc, &desc->vd, flags);
> 
> How is this different from the previous one? I am sure we can reuse bits!

There are some differences, but I agree that I can rewrite this and the
previous function to share code between them. This function was a last-minute
addition :).

> 
>> +static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
>> +{
>> +	struct dw_edma *dw = chan->chip->dw;
>> +	const struct dw_edma_core_ops *ops = dw->ops;
>> +	struct virt_dma_desc *vd;
>> +	struct dw_edma_desc *desc;
>> +	unsigned long flags;
>> +
>> +	ops->clear_done_int(chan);
>> +	dev_dbg(chan2dev(chan), "clear done interrupt\n");
>> +
>> +	spin_lock_irqsave(&chan->vc.lock, flags);
>> +	vd = vchan_next_desc(&chan->vc);
>> +	switch (chan->request) {
>> +	case EDMA_REQ_NONE:
>> +		if (!vd)
>> +			break;
>> +
>> +		desc = vd2dw_edma_desc(vd);
>> +		if (desc->chunks_alloc) {
>> +			dev_dbg(chan2dev(chan), "sub-transfer complete\n");
>> +			chan->status = EDMA_ST_BUSY;
>> +			dev_dbg(chan2dev(chan), "transferred %u bytes\n",
>> +				desc->xfer_sz);
>> +			dw_edma_start_transfer(chan);
>> +		} else {
>> +			list_del(&vd->node);
>> +			vchan_cookie_complete(vd);
>> +			chan->status = EDMA_ST_IDLE;
>> +			dev_dbg(chan2dev(chan), "transfer complete\n");
>> +		}
>> +		break;
>> +	case EDMA_REQ_STOP:
>> +		if (!vd)
>> +			break;
>> +
>> +		list_del(&vd->node);
>> +		vchan_cookie_complete(vd);
>> +		chan->request = EDMA_REQ_NONE;
>> +		chan->status = EDMA_ST_IDLE;
>> +		chan->configured = false;
> 
> why is configuration deemed invalid, it can still be used again!

At the time I thought it was necessary to reconfigure every time we needed to
do a new transfer. You have already explained to me that this is not the case.
My bad!

> 
>> +int dw_edma_probe(struct dw_edma_chip *chip)
>> +{
>> +	struct dw_edma *dw = chip->dw;
>> +	struct device *dev = chip->dev;
>> +	const struct dw_edma_core_ops *ops;
>> +	size_t ll_chunk = dw->ll_region.sz;
>> +	size_t dt_chunk = dw->dt_region.sz;
>> +	u32 ch_tot;
>> +	int i, j, err;
>> +
>> +	raw_spin_lock_init(&dw->lock);
>> +
>> +	/* Callback operation selection accordingly to eDMA version */
>> +	switch (dw->version) {
>> +	default:
>> +		dev_err(dev, "unsupported version\n");
>> +		return -EPERM;
>> +	}
> 
> So we have only one case which returns error, was this code tested?

Yes it was, but I understand your point of view.
I did it like this because I wanted to segment the patch series as follows:
 1) Adding the eDMA core driver, which contains the driver skeleton and the
whole logic associated with it.
 2) and 3) Adding the callbacks for the eDMA register mapping version 0 (a new
version will appear in the future; I thought it might come while I was
gathering feedback on this patch series, which would then add another 2 patches
for version 1, isolated and independent from version 0).
 4) and 5) Adding the PCIe glue logic and the associated device ID.
 6) Adding a maintainer for this driver.
 7) Adding a test driver.

Since this switch only gains its associated case in patch 2, that is why no
case appears in patch 1.

If you think it makes sense to squash patch 2 into patch 1, just say so and
I'll do it :)

> 
>> +
>> +	pm_runtime_get_sync(dev);
>> +
>> +	/* Find out how many write channels are supported by hardware */
>> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
>> +	if (!dw->wr_ch_cnt) {
>> +		dev_err(dev, "invalid number of write channels(0)\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Find out how many read channels are supported by hardware */
>> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
>> +	if (!dw->rd_ch_cnt) {
>> +		dev_err(dev, "invalid number of read channels(0)\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
>> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
>> +
>> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
>> +
>> +	/* Allocate channels */
>> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
> 
> you may use struct_size() here

Hmm, this would be useful if I wanted to allocate the dw struct as well, right?
Since the dw struct is already allocated, it looks like struct_size() can't be
used here, or am I missing something?

> 
>> +	if (!dw->chan)
>> +		return -ENOMEM;
>> +
>> +	/* Calculate the linked list chunk for each channel */
>> +	ll_chunk /= roundup_pow_of_two(ch_tot);
>> +
>> +	/* Calculate the linked list chunk for each channel */
>> +	dt_chunk /= roundup_pow_of_two(ch_tot);
>> +
>> +	/* Disable eDMA, only to establish the ideal initial conditions */
>> +	ops->off(dw);
>> +
>> +	snprintf(dw->name, sizeof(dw->name), "dw-edma-core:%d", chip->id);
>> +
>> +	/* Request IRQs */
>> +	if (dw->nr_irqs != 1) {
>> +		dev_err(dev, "invalid number of irqs (%u)\n", dw->nr_irqs);
>> +		return -EINVAL;
>> +	}
>> +
>> +	for (i = 0; i < dw->nr_irqs; i++) {
>> +		err = devm_request_irq(dev, pci_irq_vector(to_pci_dev(dev), i),
>> +				       dw_edma_interrupt_all,
>> +				       IRQF_SHARED, dw->name, chip);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	/* Create write channels */
>> +	INIT_LIST_HEAD(&dw->wr_edma.channels);
>> +	for (i = 0; i < dw->wr_ch_cnt; i++) {
>> +		struct dw_edma_chan *chan = &dw->chan[i];
>> +		struct dw_edma_region *dt_region;
>> +
>> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
>> +		if (!dt_region)
>> +			return -ENOMEM;
>> +
>> +		chan->vc.chan.private = dt_region;
>> +
>> +		chan->chip = chip;
>> +		chan->id = i;
>> +		chan->dir = EDMA_DIR_WRITE;
>> +		chan->configured = false;
>> +		chan->request = EDMA_REQ_NONE;
>> +		chan->status = EDMA_ST_IDLE;
>> +
>> +		chan->ll_off = (ll_chunk * i);
>> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
>> +
>> +		chan->dt_off = (dt_chunk * i);
>> +
>> +		dev_dbg(dev, "L. List:\tChannel write[%u] off=0x%.8lx, max_cnt=%u\n",
>> +			i, chan->ll_off, chan->ll_max);
>> +
>> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
>> +
>> +		dev_dbg(dev, "MSI:\t\tChannel write[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
>> +			i, chan->msi.address_hi, chan->msi.address_lo,
>> +			chan->msi.data);
>> +
>> +		chan->vc.desc_free = vchan_free_desc;
>> +		vchan_init(&chan->vc, &dw->wr_edma);
>> +
>> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
>> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
>> +		dt_region->sz = dt_chunk;
>> +
>> +		dev_dbg(dev, "Data:\tChannel write[%u] off=0x%.8lx\n",
>> +			i, chan->dt_off);
>> +	}
>> +	dma_cap_zero(dw->wr_edma.cap_mask);
>> +	dma_cap_set(DMA_SLAVE, dw->wr_edma.cap_mask);
>> +	dma_cap_set(DMA_CYCLIC, dw->wr_edma.cap_mask);
>> +	dw->wr_edma.directions = BIT(DMA_DEV_TO_MEM);
>> +	dw->wr_edma.chancnt = dw->wr_ch_cnt;
>> +
>> +	/* Create read channels */
>> +	INIT_LIST_HEAD(&dw->rd_edma.channels);
>> +	for (j = 0; j < dw->rd_ch_cnt; j++, i++) {
>> +		struct dw_edma_chan *chan = &dw->chan[i];
>> +		struct dw_edma_region *dt_region;
>> +
>> +		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
>> +		if (!dt_region)
>> +			return -ENOMEM;
>> +
>> +		chan->vc.chan.private = dt_region;
>> +
>> +		chan->chip = chip;
>> +		chan->id = j;
>> +		chan->dir = EDMA_DIR_READ;
>> +		chan->configured = false;
>> +		chan->request = EDMA_REQ_NONE;
>> +		chan->status = EDMA_ST_IDLE;
>> +
>> +		chan->ll_off = (ll_chunk * i);
>> +		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
>> +
>> +		chan->dt_off = (dt_chunk * i);
>> +
>> +		dev_dbg(dev, "L. List:\tChannel read[%u] off=0x%.8lx, max_cnt=%u\n",
>> +			j, chan->ll_off, chan->ll_max);
>> +
>> +		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
>> +
>> +		dev_dbg(dev, "MSI:\t\tChannel read[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
>> +			j, chan->msi.address_hi, chan->msi.address_lo,
>> +			chan->msi.data);
>> +
>> +		chan->vc.desc_free = vchan_free_desc;
>> +		vchan_init(&chan->vc, &dw->rd_edma);
>> +
>> +		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
>> +		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
>> +		dt_region->sz = dt_chunk;
>> +
>> +		dev_dbg(dev, "Data:\tChannel read[%u] off=0x%.8lx\n",
>> +			j, chan->dt_off);
> 
> this is similar to previous with obvious changes, I think this can be
> made common routine invoke for R and W ...

OK, I can rewrite the code to simplify this.

> 
> So HW provides read channels and write channels and not generic channels
> which can be used for either R or W?
> 

The HW provides well-defined channels, each with its own register set.
Initially I thought about masking that, but since the number of channels is
client configurable [0..8] for each channel type and can also be asymmetric
(write vs read), I preferred to keep them separated.


Once again, thanks Vinod!
Gustavo Pimentel - Jan. 21, 2019, 3:49 p.m.
On 20/01/2019 11:47, Vinod Koul wrote:
> On 16-01-19, 11:53, Gustavo Pimentel wrote:
> 
>>>> +	/* Free irqs */
>>>
>>> But devm will automatically free it, no ?
>>
>> Yes, it shouldn't be there. I'll remove it.
> 
> Nope this is correct. we need to ensure irqs are freed and tasklets are
> quiesced otherwise you can end up with a case when a spurious interrupt
> and then tasklets spinning while core unrolls. devm for irq is not a
> good idea unless you really know what you are doing!
> 

Ok. Nice explanation.
Vinod Koul - Jan. 23, 2019, 1:08 p.m.
On 21-01-19, 15:48, Gustavo Pimentel wrote:
> On 20/01/2019 11:44, Vinod Koul wrote:
> > On 11-01-19, 19:33, Gustavo Pimentel wrote:
> >> Add Synopsys eDMA IP core driver to kernel.
> >>
> >> This core driver, initializes and configures the eDMA IP using vma-helpers
> >> functions and dma-engine subsystem.
> > 
> > A description of eDMA IP will help review the driver
> 
> I've the IP description on the cover-letter, but I'll bring it to this patch, if
> it helps.

yeah cover doesnt get applied, changelog are very important
documentation for kernel

> >> @@ -0,0 +1,1059 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
> > 
> > 2019 now
> 
> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
> affiliates." this way it's always up to date and I also kept 2018, because it
> was the date that I started to develop this driver, if you don't mind.

yeah 18 is fine :) it need to end with current year always

> >> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
> >> +{
> >> +	struct dw_edma_chan *chan = desc->chan;
> >> +	struct dw_edma *dw = chan->chip->dw;
> >> +	struct dw_edma_chunk *chunk;
> >> +
> >> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
> >> +	if (unlikely(!chunk))
> >> +		return NULL;
> >> +
> >> +	INIT_LIST_HEAD(&chunk->list);
> >> +	chunk->chan = chan;
> >> +	chunk->cb = !(desc->chunks_alloc % 2);
> > 
> > cb ..?
> 
> CB = change bit, is a property of this eDMA IP. Basically it is a kind of
> handshake which serves to validate whether the linked list has been updated or
> not, especially useful in cases of recycled linked list elements (every linked
> list recycle is a new chunk, this will allow to differentiate each chunk).

okay please add that somewhere. Also it would help me if you explain
what is chunk and other terminologies used in this driver

> >> +static int dw_edma_device_config(struct dma_chan *dchan,
> >> +				 struct dma_slave_config *config)
> >> +{
> >> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> >> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
> >> +	unsigned long flags;
> >> +	int err = 0;
> >> +
> >> +	spin_lock_irqsave(&chan->vc.lock, flags);
> >> +
> >> +	if (!config) {
> >> +		err = -EINVAL;
> >> +		goto err_config;
> >> +	}
> >> +
> >> +	if (chan->status != EDMA_ST_IDLE) {
> >> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
> >> +		err = -EPERM;
> > 
> > this is not correct behaviour, device_config can be called anytime and
> > values can take affect on next transaction submitted..
> 
> Hum, I thought we could only reconfigure after transfer being finished.

Nope, anytime. They take effect for next prepare when you use it

> >> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
> >> +		&config->src_addr, &config->dst_addr);
> >> +
> >> +	chan->src_addr = config->src_addr;
> >> +	chan->dst_addr = config->dst_addr;
> >> +
> >> +	err = ops->device_config(dchan);
> > 
> > what does this do?
> 
> This is an initialization procedure to setup interrupts (data and addresses) to
> each channel on the eDMA IP,  in order to be triggered after transfer being
> completed or aborted. Due the fact the config() can be called at anytime,
> doesn't make sense to have this procedure here, I'll moved it to probe().

Yeah I am not still convinced about having another layer! Have you
though about using common lib for common parts ..?

> > this looks incorrect interpretation to me. The status is to be retrieved
> > for the given cookie passed and given that you do not even use this
> > argument tells me that you have understood this as 'channel' status
> > reporting, which is not correct
> 
> Yes, you're right, my interpretation assumes this function reports
> channel/transaction status. What would be the correct implementation?
> Something like this?
> 
> {
> 	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> 	const struct dw_edma_core_ops *ops = chan2ops(chan);
> 	struct dw_edma_desc *desc;
> 	struct virt_dma_desc *vd;
> 	unsigned long flags;
> 	enum dma_status ret;
> 	u32 residue = 0;
> 
> 	spin_lock_irqsave(&chan->vc.lock, flags);
> 
> 	ret = dma_cookie_status(chan, cookie, txstate);
> 	if (ret == DMA_COMPLETE)
> 		goto ret_status;
> 
> 	vd = vchan_next_desc(&chan->vc);
> 	if (!vd)
> 		goto ret_status;
> 
> 	desc = vd2dw_edma_desc(vd);
> 	if (desc)
> 		residue = desc->alloc_sz - desc->xfer_sz;
> 		
> 	if (ret == DMA_IN_PROGRESS && chan->status == EDMA_ST_PAUSE)
> 		ret = DMA_PAUSED;

this looks better, please do keep in mind txstate can be null, so
residue calc can be skipped
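
For reference, the residue bookkeeping being discussed here can be modeled in
plain userspace C. This is only an illustrative sketch, not driver code: the
struct and function names are invented, the fields follow the driver's
`alloc_sz`/`xfer_sz`, and a plain `uint32_t *` stands in for the optional
`txstate` argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of the descriptor fields used for residue reporting. */
struct edma_desc_model {
	uint32_t alloc_sz; /* total bytes described by the descriptor */
	uint32_t xfer_sz;  /* bytes transferred so far */
};

/*
 * Residue = bytes still pending. Mirrors the corrected snippet above:
 * only computed when a descriptor exists AND the caller passed a state
 * pointer (txstate may be NULL, per the note above, so the calculation
 * can be skipped entirely in that case).
 */
static uint32_t edma_residue(const struct edma_desc_model *desc,
			     uint32_t *txstate_residue)
{
	uint32_t residue = 0;

	if (desc && txstate_residue)
		residue = desc->alloc_sz - desc->xfer_sz;
	if (txstate_residue)
		*txstate_residue = residue;
	return residue;
}
```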

> >> +static struct dma_async_tx_descriptor *
> >> +dw_edma_device_prep_slave_sg(struct dma_chan *dchan, struct scatterlist *sgl,
> >> +			     unsigned int sg_len,
> >> +			     enum dma_transfer_direction direction,
> >> +			     unsigned long flags, void *context)
> >> +{
> >> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
> >> +	struct dw_edma_desc *desc;
> >> +	struct dw_edma_chunk *chunk;
> >> +	struct dw_edma_burst *burst;
> >> +	struct scatterlist *sg;
> >> +	unsigned long sflags;
> >> +	phys_addr_t src_addr;
> >> +	phys_addr_t dst_addr;
> >> +	int i;
> >> +
> >> +	if (sg_len < 1) {
> >> +		dev_err(chan2dev(chan), "invalid sg length %u\n", sg_len);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
> > 
> > what is the second part of the check, can you explain that, who sets
> > chan->dir?
> 
> The chan->dir is set on probe() during the process of configuring each channel.

So you have channels that are unidirectional?

> >> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
> >> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
> >> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
> >> +	} else {
> >> +		dev_err(chan2dev(chan), "invalid direction\n");
> >> +		return NULL;
> >> +	}
> >> +
> >> +	if (!chan->configured) {
> >> +		dev_err(chan2dev(chan), "(prep_slave_sg) channel not configured\n");
> >> +		return NULL;
> >> +	}
> >> +
> >> +	if (chan->status != EDMA_ST_IDLE) {
> >> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
> >> +		return NULL;
> >> +	}
> > 
> > No, wrong again. The txn must be prepared and then on submit added to a
> > queue. You are writing a driver for dmaengine, surely you dont expect
> > the channel to be free and then do a txn.. that would be very
> > inefficient!
> 
> I did not realize that the flow could be as you mentioned. The documentation I
> read about the subsystem did not give me this idea. Thank you for clarifying me.

I think we have improved  that part a lot, please do feel free to point
out inconsistency
See DMA usage in Documentation/driver-api/dmaengine/client.rst

> >> +int dw_edma_probe(struct dw_edma_chip *chip)
> >> +{
> >> +	struct dw_edma *dw = chip->dw;
> >> +	struct device *dev = chip->dev;
> >> +	const struct dw_edma_core_ops *ops;
> >> +	size_t ll_chunk = dw->ll_region.sz;
> >> +	size_t dt_chunk = dw->dt_region.sz;
> >> +	u32 ch_tot;
> >> +	int i, j, err;
> >> +
> >> +	raw_spin_lock_init(&dw->lock);
> >> +
> >> +	/* Callback operation selection accordingly to eDMA version */
> >> +	switch (dw->version) {
> >> +	default:
> >> +		dev_err(dev, "unsupported version\n");
> >> +		return -EPERM;
> >> +	}
> > 
> > So we have only one case which returns error, was this code tested?
> 
> Yes it was, but I understand what your point of view.
> This was done like this, because I wanna to segment the patch series like this:
>  1) Adding eDMA driver core, which contains the driver skeleton and the whole
> logic associated.
>  2) and 3) Adding the callbacks for the eDMA register mapping version 0 (it will
> appear in the future a new version, I thought that this new version would came
> while I was trying to get the feedback about this patch series, therefore would
> have another 2 patches for the version 1 isolated and independent from the
> version 0).
>  4) and 5) Adding the PCIe glue-logic and device ID associated.
>  6) Adding maintainer for this driver.
>  7) Adding a test driver.
> 
> Since this switch will only have the associated case on patch 2, that why on
> patch 1 doesn't appear any possibility.
> 
> If you feel logic to squash patch 2 with patch 1, just say something and I'll do
> it for you :)

well each patch should build and work on its own, otherwise we get
problems :) But since this is a new driver it is okay. Anyway this patch
is quite _huge_ so lets not add more to it..

I would have moved the callback check to subsequent one..

> >> +	pm_runtime_get_sync(dev);
> >> +
> >> +	/* Find out how many write channels are supported by hardware */
> >> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
> >> +	if (!dw->wr_ch_cnt) {
> >> +		dev_err(dev, "invalid number of write channels(0)\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	/* Find out how many read channels are supported by hardware */
> >> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
> >> +	if (!dw->rd_ch_cnt) {
> >> +		dev_err(dev, "invalid number of read channels(0)\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
> >> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
> >> +
> >> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
> >> +
> >> +	/* Allocate channels */
> >> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
> > 
> > you may use struct_size() here
> 
> Hum, this would be useful if I wanted to allocate the dw struct as well, right?
> Since dw struct is already allocated, it looks like this can't be used, or am I
> missing something?

yeah you can allocate dw + chan one shot...
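
The one-shot allocation suggested here typically uses a flexible array member
sized with struct_size(). A userspace sketch of that layout (with malloc
standing in for devm_kzalloc, the structs reduced to a minimum, and all names
illustrative rather than taken from the driver):

```c
#include <assert.h>
#include <stdlib.h>

struct chan_model {
	unsigned int id;
};

/* dw struct with the channel array folded in as a flexible array member. */
struct dw_model {
	unsigned int ch_cnt;
	struct chan_model chan[]; /* allocated together with the parent */
};

/* Userspace analogue of devm_kzalloc(dev, struct_size(dw, chan, ch_tot), ...). */
static struct dw_model *dw_alloc(unsigned int ch_tot)
{
	struct dw_model *dw;
	size_t sz = sizeof(*dw) + ch_tot * sizeof(dw->chan[0]);

	dw = calloc(1, sz); /* zeroed, like kzalloc */
	if (!dw)
		return NULL;
	dw->ch_cnt = ch_tot;
	return dw;
}
```

In the kernel, struct_size() additionally guards the multiplication against
overflow, which the open-coded `sizeof` arithmetic above does not.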
Gustavo Pimentel - Jan. 31, 2019, 11:33 a.m.
On 23/01/2019 13:08, Vinod Koul wrote:
> On 21-01-19, 15:48, Gustavo Pimentel wrote:
>> On 20/01/2019 11:44, Vinod Koul wrote:
>>> On 11-01-19, 19:33, Gustavo Pimentel wrote:
>>>> Add Synopsys eDMA IP core driver to kernel.
>>>>
>>>> This core driver, initializes and configures the eDMA IP using vma-helpers
>>>> functions and dma-engine subsystem.
>>>
>>> A description of eDMA IP will help review the driver
>>
>> I've the IP description on the cover-letter, but I'll bring it to this patch, if
>> it helps.
> 
> yeah cover doesnt get applied, changelog are very important
> documentation for kernel

Ok. See below.

> 
>>>> @@ -0,0 +1,1059 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
>>>
>>> 2019 now
>>
>> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
>> affiliates." this way it's always up to date and I also kept 2018, because it
>> was the date that I started to develop this driver, if you don't mind.
> 
> yeah 18 is fine :) it need to end with current year always

Just to be sure, are you saying that it must be: "Copyright (c) 2018-2019
Synopsys, Inc. and/or its affiliates."?

> 
>>>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
>>>> +{
>>>> +	struct dw_edma_chan *chan = desc->chan;
>>>> +	struct dw_edma *dw = chan->chip->dw;
>>>> +	struct dw_edma_chunk *chunk;
>>>> +
>>>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
>>>> +	if (unlikely(!chunk))
>>>> +		return NULL;
>>>> +
>>>> +	INIT_LIST_HEAD(&chunk->list);
>>>> +	chunk->chan = chan;
>>>> +	chunk->cb = !(desc->chunks_alloc % 2);
>>>
>>> cb ..?
>>
>> CB = change bit, is a property of this eDMA IP. Basically it is a kind of
>> handshake which serves to validate whether the linked list has been updated or
>> not, especially useful in cases of recycled linked list elements (every linked
>> list recycle is a new chunk, this will allow to differentiate each chunk).
> 
> okay please add that somewhere. Also it would help me if you explain
> what is chunk and other terminologies used in this driver

I'm thinking of putting the description below in the patch; please check
whether it is sufficiently explicit and clear about what this IP needs and does.

In order to transfer data from point A to B as fast as possible, this IP
requires a dedicated memory space where a linked list of elements will reside.
All elements of this linked list are contiguous and each one describes a data
transfer (source and destination addresses, length and a control variable).

For the sake of simplicity, let's assume a memory space for write channel 0
which holds about 42 elements.

+---------+
| Desc #0 |--+
+---------+  |
             V
        +----------+
        | Chunk #0 |---+
        |  CB = 1  |   |   +----------+   +-----+   +-----------+   +-----+
        +----------+   +-->| Burst #0 |-->| ... |-->| Burst #41 |-->| llp |
              |            +----------+   +-----+   +-----------+   +-----+
              V
        +----------+
        | Chunk #1 |---+
        |  CB = 0  |   |   +-----------+   +-----+   +-----------+   +-----+
        +----------+   +-->| Burst #42 |-->| ... |-->| Burst #83 |-->| llp |
              |            +-----------+   +-----+   +-----------+   +-----+
              V
        +----------+
        | Chunk #2 |---+
        |  CB = 1  |   |   +-----------+   +-----+   +------------+   +-----+
        +----------+   +-->| Burst #84 |-->| ... |-->| Burst #125 |-->| llp |
              |            +-----------+   +-----+   +------------+   +-----+
              V
        +----------+
        | Chunk #3 |---+
        |  CB = 0  |   |   +------------+   +-----+   +------------+   +-----+
        +----------+   +-->| Burst #126 |-->| ... |-->| Burst #129 |-->| llp |
                           +------------+   +-----+   +------------+   +-----+

Legend:

*Linked list*, also known as Chunk
*Linked list element*, also known as Burst
*CB*, also known as Change Bit, is a control bit (typically toggled) that
allows to easily identify and differentiate between the current linked list
and the previous or the next one.
*LLP* is a special element that indicates the end of the linked list element
stream and also informs that the next CB should be toggled.

In scatter-gather transfer mode, the client submits a scatter-gather list of
n (in this case 130) elements, which is divided into multiple Chunks; each
Chunk holds a limited number of Bursts (in this case 42). After all Bursts of
a Chunk have been transferred, an interrupt is triggered, which allows the
dedicated linked list memory to be recycled with the new information for the
next Chunk and its associated Bursts, and the whole cycle repeats.

In cyclic transfer mode, the client submits a buffer pointer, its length and
the number of repetitions; in this case each Burst corresponds directly to one
repetition.

Each Burst describes a data transfer from point A (source) to point B
(destination) with a length that can range from 1 byte up to 4 GB. Since the
dedicated memory space where the linked list resides is limited, the n burst
elements are organized into several Chunks, which are used later to recycle
the dedicated memory space and initiate a new sequence of data transfers.

The whole transfer is considered complete when all Bursts have been transferred.
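
The chunking and change-bit behaviour described above can be modeled directly.
A userspace sketch (the chunk capacity of 42 matches the example above, and
the toggle rule follows the patch's `chunk->cb = !(desc->chunks_alloc % 2)`;
the function names are illustrative, not from the driver):

```c
#include <assert.h>

#define CHUNK_CAPACITY 42 /* bursts per chunk, as in the example above */

/*
 * CB of the Nth chunk (0-based), per the patch's toggle rule: at the time
 * a chunk is allocated, chunks_alloc still holds the previous count, so
 * chunk #0 gets CB=1, chunk #1 gets CB=0, chunk #2 gets CB=1, ...
 * exactly as in the diagram above.
 */
static int chunk_cb(unsigned int chunk_idx)
{
	return !(chunk_idx % 2);
}

/* How many chunks does a scatter-gather list of n bursts need? */
static unsigned int chunks_for(unsigned int n_bursts)
{
	return (n_bursts + CHUNK_CAPACITY - 1) / CHUNK_CAPACITY;
}
```

With the 130-element example above, this yields 4 chunks with CB values
1, 0, 1, 0, matching the diagram.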

Currently this IP has a well-known register map, which includes support for
legacy and unroll modes. Legacy mode is the version of this register map that
has a multiplexer register allowing to switch between all write and read
channel registers, while unroll mode repeats all write and read channel
registers with an offset between them. This register map is called v0.

The IP team is creating a new register map more suitable to the latest PCIe
features, which will very likely change the register map; that version will be
called v1. As soon as this new version is released by the IP team, support for
it will be included in this driver.

What do you think? Is there any gray area that I should clarify?

> 
>>>> +static int dw_edma_device_config(struct dma_chan *dchan,
>>>> +				 struct dma_slave_config *config)
>>>> +{
>>>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>>>> +	const struct dw_edma_core_ops *ops = chan2ops(chan);
>>>> +	unsigned long flags;
>>>> +	int err = 0;
>>>> +
>>>> +	spin_lock_irqsave(&chan->vc.lock, flags);
>>>> +
>>>> +	if (!config) {
>>>> +		err = -EINVAL;
>>>> +		goto err_config;
>>>> +	}
>>>> +
>>>> +	if (chan->status != EDMA_ST_IDLE) {
>>>> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
>>>> +		err = -EPERM;
>>>
>>> this is not correct behaviour, device_config can be called anytime and
>>> values can take affect on next transaction submitted..
>>
>> Hum, I thought we could only reconfigure after transfer being finished.
> 
> Nope, anytime. They take effect for next prepare when you use it
> 
>>>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
>>>> +		&config->src_addr, &config->dst_addr);
>>>> +
>>>> +	chan->src_addr = config->src_addr;
>>>> +	chan->dst_addr = config->dst_addr;
>>>> +
>>>> +	err = ops->device_config(dchan);
>>>
>>> what does this do?
>>
>> This is an initialization procedure to setup interrupts (data and addresses) to
>> each channel on the eDMA IP,  in order to be triggered after transfer being
>> completed or aborted. Due the fact the config() can be called at anytime,
>> doesn't make sense to have this procedure here, I'll moved it to probe().
> 
> Yeah I am not still convinced about having another layer! Have you
> though about using common lib for common parts ..?

Maybe I'm explaining myself badly. I don't have any clue about the register
map for future versions. I honestly tried to implement a common lib for the
whole process that interacts with the dma engine controller, to ease any
future addition of a register mapping, by having these common callbacks that
are only responsible for interfacing the HW according to the register map
version. Maybe I can simplify something in the future, but I will only be able
to conclude that after having some idea about the new register map.

IMHO this is the easiest and cleanest way to do it, in terms of code
maintenance and architecture, but if you have another idea that you can show
me, or can point me to a driver that implements something similar, I have no
problem checking it out.

> 
>>> this looks incorrect interpretation to me. The status is to be retrieved
>>> for the given cookie passed and given that you do not even use this
>>> argument tells me that you have understood this as 'channel' status
>>> reporting, which is not correct
>>
>> Yes, you're right, my interpretation assumes this function reports
>> channel/transaction status. What would be the correct implementation?
>> Something like this?
>>
>> {
>> 	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>> 	const struct dw_edma_core_ops *ops = chan2ops(chan);
>> 	struct dw_edma_desc *desc;
>> 	struct virt_dma_desc *vd;
>> 	unsigned long flags;
>> 	enum dma_status ret;
>> 	u32 residue = 0;
>>
>> 	spin_lock_irqsave(&chan->vc.lock, flags);
>>
>> 	ret = dma_cookie_status(chan, cookie, txstate);
>> 	if (ret == DMA_COMPLETE)
>> 		goto ret_status;
>>
>> 	vd = vchan_next_desc(&chan->vc);
>> 	if (!vd)
>> 		goto ret_status;
>>
>> 	desc = vd2dw_edma_desc(vd);
>> 	if (desc)
>> 		residue = desc->alloc_sz - desc->xfer_sz;
>> 		
>> 	if (ret == DMA_IN_PROGRESS && chan->status == EDMA_ST_PAUSE)
>> 		ret = DMA_PAUSED;
> 
> this looks better, please do keep in mind txstate can be null, so
> residue calc can be skipped

Ok.

> 
>>>> +static struct dma_async_tx_descriptor *
>>>> +dw_edma_device_prep_slave_sg(struct dma_chan *dchan, struct scatterlist *sgl,
>>>> +			     unsigned int sg_len,
>>>> +			     enum dma_transfer_direction direction,
>>>> +			     unsigned long flags, void *context)
>>>> +{
>>>> +	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
>>>> +	struct dw_edma_desc *desc;
>>>> +	struct dw_edma_chunk *chunk;
>>>> +	struct dw_edma_burst *burst;
>>>> +	struct scatterlist *sg;
>>>> +	unsigned long sflags;
>>>> +	phys_addr_t src_addr;
>>>> +	phys_addr_t dst_addr;
>>>> +	int i;
>>>> +
>>>> +	if (sg_len < 1) {
>>>> +		dev_err(chan2dev(chan), "invalid sg length %u\n", sg_len);
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
>>>
>>> what is the second part of the check, can you explain that, who sets
>>> chan->dir?
>>
>> The chan->dir is set on probe() during the process of configuring each channel.
> 
> So you have channels that are unidirectional?

Yes. That's another reason, IMHO, to keep dw-edma-test separate from dma-test,
since this IP is more picky and has these particularities.

This doesn't preclude adding similar features to dma-test in the future, ones
that are generic enough to work with other IPs; at least that is my opinion.
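
Since each channel is fixed to one direction at probe time, the prepare-time
validation reduces to a pure predicate. A userspace model of the pairing
checked in dw_edma_device_prep_slave_sg() (the enum names mirror the driver's,
but their values and the helper name here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

enum edma_dir { EDMA_DIR_WRITE, EDMA_DIR_READ };
enum dma_xfer_dir { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM };

/*
 * A write channel only accepts DMA_DEV_TO_MEM and a read channel only
 * accepts DMA_MEM_TO_DEV -- the same pairing as the if/else chain in
 * the patch; any other combination is rejected as invalid.
 */
static bool edma_dir_ok(enum edma_dir chan_dir, enum dma_xfer_dir xfer)
{
	if (xfer == DMA_DEV_TO_MEM)
		return chan_dir == EDMA_DIR_WRITE;
	return chan_dir == EDMA_DIR_READ;
}
```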

> 
>>>> +		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
>>>> +	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
>>>> +		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
>>>> +	} else {
>>>> +		dev_err(chan2dev(chan), "invalid direction\n");
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	if (!chan->configured) {
>>>> +		dev_err(chan2dev(chan), "(prep_slave_sg) channel not configured\n");
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	if (chan->status != EDMA_ST_IDLE) {
>>>> +		dev_err(chan2dev(chan), "channel is busy or paused\n");
>>>> +		return NULL;
>>>> +	}
>>>
>>> No, wrong again. The txn must be prepared and then on submit added to a
>>> queue. You are writing a driver for dmaengine, surely you dont expect
>>> the channel to be free and then do a txn.. that would be very
>>> inefficient!
>>
>> I did not realize that the flow could be as you mentioned. The documentation I
>> read about the subsystem did not give me this idea. Thank you for clarifying me.
> 
> I think we have improved  that part a lot, please do feel free to point
> out inconsistency
> See DMA usage in Documentation/driver-api/dmaengine/client.rst
> 
>>>> +int dw_edma_probe(struct dw_edma_chip *chip)
>>>> +{
>>>> +	struct dw_edma *dw = chip->dw;
>>>> +	struct device *dev = chip->dev;
>>>> +	const struct dw_edma_core_ops *ops;
>>>> +	size_t ll_chunk = dw->ll_region.sz;
>>>> +	size_t dt_chunk = dw->dt_region.sz;
>>>> +	u32 ch_tot;
>>>> +	int i, j, err;
>>>> +
>>>> +	raw_spin_lock_init(&dw->lock);
>>>> +
>>>> +	/* Callback operation selection accordingly to eDMA version */
>>>> +	switch (dw->version) {
>>>> +	default:
>>>> +		dev_err(dev, "unsupported version\n");
>>>> +		return -EPERM;
>>>> +	}
>>>
>>> So we have only one case which returns error, was this code tested?
>>
>> Yes it was, but I understand what your point of view.
>> This was done like this, because I wanna to segment the patch series like this:
>>  1) Adding eDMA driver core, which contains the driver skeleton and the whole
>> logic associated.
>>  2) and 3) Adding the callbacks for the eDMA register mapping version 0 (it will
>> appear in the future a new version, I thought that this new version would came
>> while I was trying to get the feedback about this patch series, therefore would
>> have another 2 patches for the version 1 isolated and independent from the
>> version 0).
>>  4) and 5) Adding the PCIe glue-logic and device ID associated.
>>  6) Adding maintainer for this driver.
>>  7) Adding a test driver.
>>
>> Since this switch will only have the associated case on patch 2, that why on
>> patch 1 doesn't appear any possibility.
>>
>> If you feel logic to squash patch 2 with patch 1, just say something and I'll do
>> it for you :)
> 
> well each patch should build and work on its own, otherwise we get
> problems :) But since this is a new driver it is okay. Anyway this patch
> is quite _huge_ so lets not add more to it..
> 
> I would have moved the callback check to subsequent one..

Sorry. I didn't catch this. Can you explain it?

> 
>>>> +	pm_runtime_get_sync(dev);
>>>> +
>>>> +	/* Find out how many write channels are supported by hardware */
>>>> +	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
>>>> +	if (!dw->wr_ch_cnt) {
>>>> +		dev_err(dev, "invalid number of write channels(0)\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	/* Find out how many read channels are supported by hardware */
>>>> +	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
>>>> +	if (!dw->rd_ch_cnt) {
>>>> +		dev_err(dev, "invalid number of read channels(0)\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
>>>> +		dw->wr_ch_cnt, dw->rd_ch_cnt);
>>>> +
>>>> +	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
>>>> +
>>>> +	/* Allocate channels */
>>>> +	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
>>>
>>> you may use struct_size() here
>>
>> Hum, this would be useful if I wanted to allocate the dw struct as well, right?
>> Since dw struct is already allocated, it looks like this can't be used, or am I
>> missing something?
> 
> yeah you can allocate dw + chan one shot...

I'm allocating the dw struct in the PCIe glue-logic driver, in order to later
set some data on it that will be useful here; in this driver I'm only adding
the channels. That's why I can't allocate it all in one shot.

>
Vinod Koul - Feb. 1, 2019, 4:14 a.m.
On 31-01-19, 11:33, Gustavo Pimentel wrote:
> On 23/01/2019 13:08, Vinod Koul wrote:
> > On 21-01-19, 15:48, Gustavo Pimentel wrote:
> >> On 20/01/2019 11:44, Vinod Koul wrote:
> >>> On 11-01-19, 19:33, Gustavo Pimentel wrote:

> >>>> @@ -0,0 +1,1059 @@
> >>>> +// SPDX-License-Identifier: GPL-2.0
> >>>> +/*
> >>>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
> >>>
> >>> 2019 now
> >>
> >> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
> >> affiliates." this way it's always up to date and I also kept 2018, because it
> >> was the date that I started to develop this driver, if you don't mind.
> > 
> > yeah 18 is fine :) it need to end with current year always
> 
> Just to be sure, are you saying that must be: "Copyright (c) 2018-2019 Synopsys,
> Inc. and/or its affiliates."?

Yup :)

> >>>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
> >>>> +{
> >>>> +	struct dw_edma_chan *chan = desc->chan;
> >>>> +	struct dw_edma *dw = chan->chip->dw;
> >>>> +	struct dw_edma_chunk *chunk;
> >>>> +
> >>>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
> >>>> +	if (unlikely(!chunk))
> >>>> +		return NULL;
> >>>> +
> >>>> +	INIT_LIST_HEAD(&chunk->list);
> >>>> +	chunk->chan = chan;
> >>>> +	chunk->cb = !(desc->chunks_alloc % 2);
> >>>
> >>> cb ..?
> >>
> >> CB = change bit, is a property of this eDMA IP. Basically it is a kind of
> >> handshake which serves to validate whether the linked list has been updated or
> >> not, especially useful in cases of recycled linked list elements (every linked
> >> list recycle is a new chunk, this will allow to differentiate each chunk).
> > 
> > okay please add that somewhere. Also it would help me if you explain
> > what is chunk and other terminologies used in this driver
> 
> I'm thinking to put the below description on the patch, please check if this is
> sufficient explicit and clear to understand what this IP needs and does.
> 
> In order to transfer data from point A to B as fast as possible this IP requires
> a dedicated memory space where will reside a linked list of elements.

rephrasing: a dedicated memory space containing linked list of elements

> All elements of this linked list are continuous and each one describes a data
> transfer (source and destination addresses, length and a control variable).
> 
> For the sake of simplicity, lets assume a memory space for channel write 0 which
> allows about 42 elements.
> 
> +---------+
> | Desc #0 |--+
> +---------+  |
>              V
>         +----------+
>         | Chunk #0 |---+
>         |  CB = 1  |   |   +----------+   +-----+   +-----------+   +-----+
>         +----------+   +-->| Burst #0 |-->| ... |-->| Burst #41 |-->| llp |
>               |            +----------+   +-----+   +-----------+   +-----+
>               V
>         +----------+
>         | Chunk #1 |---+
>         |  CB = 0  |   |   +-----------+   +-----+   +-----------+   +-----+
>         +----------+   +-->| Burst #42 |-->| ... |-->| Burst #83 |-->| llp |
>               |            +-----------+   +-----+   +-----------+   +-----+
>               V
>         +----------+
>         | Chunk #2 |---+
>         |  CB = 1  |   |   +-----------+   +-----+   +------------+   +-----+
>         +----------+   +-->| Burst #84 |-->| ... |-->| Burst #125 |-->| llp |
>               |            +-----------+   +-----+   +------------+   +-----+
>               V
>         +----------+
>         | Chunk #3 |---+
>         |  CB = 0  |   |   +------------+   +-----+   +------------+   +-----+
>         +----------+   +-->| Burst #126 |-->| ... |-->| Burst #129 |-->| llp |
>                            +------------+   +-----+   +------------+   +-----+

This is great and required to understand the driver.

How does the controller move from Burst #41 of Chunk 0 to Chunk 1?
Is Burst 0 to 129 a linked list with Burst 0, 42, 84 and 126 having a
callback (done bit set)?

This sounds **very** similar to dw dma concepts!

> 
> Legend:
> 
> *Linked list*, also know as Chunk
> *Linked list element*, also know as Burst
> *CB*, also know as Change Bit, it's a control bit (and typically is toggled)
> that allows to easily identify and differentiate between the current linked list
> and the previous or the next one.
> *LLP*, is a special element that indicates the end of the linked list element
> stream also informs that the next CB should be toggle.
> 
> On scatter-gather transfer mode, the client will submit a scatter-gather list of
> n (in this case 130) elements, which will be divided into multiple Chunks; each
> Chunk will have a limited number of Bursts (in this case 42). After
> transferring all Bursts of a Chunk, an interrupt will be triggered, which allows
> recycling the whole linked list dedicated memory with the new information
> relative to the next Chunk and its associated Bursts, repeating the whole
> cycle again.
> 
> On cyclic transfer mode, the client will submit a buffer pointer, its length
> and a number of repetitions; in this case each Burst will correspond directly
> to each repetition.
> 
> Each Burst describes a data transfer from point A (source) to point
> B (destination) with a length that can be from 1 byte up to 4 GB. Since the
> dedicated memory space where the linked list resides is limited, the whole n
> Burst elements will be organized in several Chunks, which will be used later to
> recycle the dedicated memory space and initiate a new sequence of data transfers.
> 
> The whole transfer is considered complete when all Bursts have been transferred.
> 
> Currently this IP has a well-known register map, which includes support for
> legacy and unroll modes. Legacy mode is the version of this register map that has

what's unroll..?

> a multiplexer register that allows switching registers between all write and read
> channels, while unroll mode repeats all write and read channel registers with
> an offset between them. This register map is called v0.
> 
> The IP team is creating a new register map more suitable to the latest PCIe
> features, which will very likely change the register map; this new version will
> be called v1. As soon as this new version is released by the IP team, support
> for it will be included in this driver.
> 
> What do you think? Is there any gray area that I could clarify?

This sounds good. But we are also catering to a WIP IP which can change,
right? Doesn't sound like a very good idea to me :)

> >>>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
> >>>> +		&config->src_addr, &config->dst_addr);
> >>>> +
> >>>> +	chan->src_addr = config->src_addr;
> >>>> +	chan->dst_addr = config->dst_addr;
> >>>> +
> >>>> +	err = ops->device_config(dchan);
> >>>
> >>> what does this do?
> >>
> >> This is an initialization procedure to set up interrupts (data and addresses)
> >> for each channel on the eDMA IP, to be triggered after a transfer is
> >> completed or aborted. Since config() can be called at any time, it doesn't
> >> make sense to have this procedure here; I'll move it to probe().
> > 
> > Yeah I am not still convinced about having another layer! Have you
> > though about using common lib for common parts ..?
> 
> Maybe I'm explaining myself wrongly. I don't have any clue about the new
> register map for future versions. I honestly tried to implement the common lib
> for the whole process that interacts with the DMA engine controller, to ease
> any future addition of register mappings, by having these common callbacks that
> are only responsible for interfacing with the HW according to the register map
> version. Maybe I can simplify something in the future, but I will only be able
> to conclude that after having some idea about the new register map.
> 
> IMHO this is the easiest and cleanest way to do it, in terms of code
> maintenance and architecture, but if you have another idea that you can show me,
> or can point out a driver that implements something similar, I have no problem
> checking it out.

That is what my fear was :)

Let's step back and solve one problem at a time. Right now that is v0 of the
IP. Please write a simple driver which solves v0 without any layers
involved.

Once v1 is in good shape, you will know what it requires and then we
can split the v0 driver into a common lib and a v0 driver, and then add the v1
driver.


> >>> what is the second part of the check, can you explain that, who sets
> >>> chan->dir?
> >>
> >> The chan->dir is set on probe() during the process of configuring each channel.
> > 
> > So you have channels that are unidirectional?
> 
> Yes. That's another reason IMHO to keep dw-edma-test separate from
> dmatest, since this IP is more picky and has these particularities.

That is okay, that should be handled by the prep calls: if you get a prep
call for a direction you don't support, return an error and move to the next
available one.

That can be done generically in the dmatest driver and, to answer your
question, yes I would test that :)
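The direction rejection described above can be sketched in plain C. This is a userspace model only; the names `edma_prep_slave_sg`, `edma_chan` and the enum values are illustrative stand-ins, not the driver's real API:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of a unidirectional channel: dir is fixed at probe()
 * time, and a prep call for the other direction is rejected so the
 * caller can move on to the next available channel. */
enum edma_dir { EDMA_DEV_TO_MEM, EDMA_MEM_TO_DEV };

struct edma_chan {
	enum edma_dir dir;		/* set once, at probe() */
};

/* Returns 0 on success, -1 when the requested direction is unsupported. */
static int edma_prep_slave_sg(struct edma_chan *chan, enum edma_dir dir)
{
	if (chan->dir != dir)
		return -1;		/* try the next channel instead */
	return 0;
}
```

A test harness built this way can iterate over channels and simply skip the ones whose prep call fails for the requested direction.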

> >>>> +int dw_edma_probe(struct dw_edma_chip *chip)
> >>>> +{
> >>>> +	struct dw_edma *dw = chip->dw;
> >>>> +	struct device *dev = chip->dev;
> >>>> +	const struct dw_edma_core_ops *ops;
> >>>> +	size_t ll_chunk = dw->ll_region.sz;
> >>>> +	size_t dt_chunk = dw->dt_region.sz;
> >>>> +	u32 ch_tot;
> >>>> +	int i, j, err;
> >>>> +
> >>>> +	raw_spin_lock_init(&dw->lock);
> >>>> +
> >>>> +	/* Callback operation selection accordingly to eDMA version */
> >>>> +	switch (dw->version) {
> >>>> +	default:
> >>>> +		dev_err(dev, "unsupported version\n");
> >>>> +		return -EPERM;
> >>>> +	}
> >>>
> >>> So we have only one case which returns error, was this code tested?
> >>
> >> Yes it was, but I understand your point of view.
> >> This was done like this because I wanted to segment the patch series like this:
> >>  1) Add the eDMA driver core, which contains the driver skeleton and the whole
> >> associated logic.
> >>  2) and 3) Add the callbacks for eDMA register mapping version 0 (a new
> >> version will appear in the future; I thought that this new version would come
> >> while I was trying to get feedback about this patch series, so there would be
> >> another 2 patches for version 1, isolated and independent from version 0).
> >>  4) and 5) Add the PCIe glue logic and the associated device ID.
> >>  6) Add a maintainer for this driver.
> >>  7) Add a test driver.
> >>
> >> Since this switch will only have the associated case in patch 2, that's why no
> >> case appears in patch 1.
> >>
> >> If you think it makes sense to squash patch 2 with patch 1, just say so and
> >> I'll do it for you :)
> > 
> > well each patch should build and work on its own, otherwise we get
> > problems :) But since this is a new driver it is okay. Anyway this patch
> > is quite _huge_ so let's not add more to it..
> > 
> > I would have moved the callback check to subsequent one..
> 
> Sorry. I didn't catch this. Can you explain it?

I would have moved these lines to the next patch to give them the right
meaning; here it feels wrong:

        switch (dw->version) {
        case foo:
                ...
        default:
                ...
        }
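Fleshed out, the dispatch sketched just above might look like the following userspace model. The names (`edma_pick_ops`, `EDMA_V0`, `v0_ops`) are hypothetical, not the driver's actual symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the version dispatch: the follow-up patch adds a v0 case
 * that selects an ops table, so the switch no longer always fails. */
enum edma_version { EDMA_V0 };

struct edma_ops {
	const char *name;		/* stand-in for the real callbacks */
};

static const struct edma_ops v0_ops = { "v0" };

static const struct edma_ops *edma_pick_ops(int version)
{
	switch (version) {
	case EDMA_V0:
		return &v0_ops;		/* the case the next patch introduces */
	default:
		return NULL;		/* unsupported version */
	}
}
```

With only the `default:` label present, as in patch 1, every probe fails, which is what prompted the review comment.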
Gustavo Pimentel - Feb. 1, 2019, 11:23 a.m.
On 01/02/2019 04:14, Vinod Koul wrote:
> On 31-01-19, 11:33, Gustavo Pimentel wrote:
>> On 23/01/2019 13:08, Vinod Koul wrote:
>>> On 21-01-19, 15:48, Gustavo Pimentel wrote:
>>>> On 20/01/2019 11:44, Vinod Koul wrote:
>>>>> On 11-01-19, 19:33, Gustavo Pimentel wrote:
> 
>>>>>> @@ -0,0 +1,1059 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>> +/*
>>>>>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
>>>>>
>>>>> 2019 now
>>>>
>>>> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
>>>> affiliates." this way it's always up to date and I also kept 2018, because it
>>>> was the date that I started to develop this driver, if you don't mind.
>>>
>>> yeah 18 is fine :) it needs to end with the current year always
>>
>> Just to be sure, are you saying that must be: "Copyright (c) 2018-2019 Synopsys,
>> Inc. and/or its affiliates."?
> 
> Yup :)
> 
>>>>>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
>>>>>> +{
>>>>>> +	struct dw_edma_chan *chan = desc->chan;
>>>>>> +	struct dw_edma *dw = chan->chip->dw;
>>>>>> +	struct dw_edma_chunk *chunk;
>>>>>> +
>>>>>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
>>>>>> +	if (unlikely(!chunk))
>>>>>> +		return NULL;
>>>>>> +
>>>>>> +	INIT_LIST_HEAD(&chunk->list);
>>>>>> +	chunk->chan = chan;
>>>>>> +	chunk->cb = !(desc->chunks_alloc % 2);
>>>>>
>>>>> cb ..?
>>>>
>>>> CB = change bit, is a property of this eDMA IP. Basically it is a kind of
>>>> handshake which serves to validate whether the linked list has been updated or
>>>> not, especially useful in cases of recycled linked list elements (every linked
>>>> list recycle is a new chunk, this will allow to differentiate each chunk).
>>>
>>> okay please add that somewhere. Also it would help me if you explain
>>> what is chunk and other terminologies used in this driver
>>
>> I'm thinking of putting the description below in the patch; please check if it
>> is sufficiently explicit and clear to understand what this IP needs and does.
>>
>> In order to transfer data from point A to B as fast as possible this IP requires
>> a dedicated memory space where will reside a linked list of elements.
> 
> rephrasing: a dedicated memory space containing linked list of elements
> 
>> All elements of this linked list are contiguous and each one describes a data
>> transfer (source and destination addresses, length and a control variable).
>>
>> For the sake of simplicity, let's assume a memory space for write channel 0
>> which can hold about 42 elements.
>>
>> +---------+
>> | Desc #0 |--+
>> +---------+  |
>>              V
>>         +----------+
>>         | Chunk #0 |---+
>>         |  CB = 1  |   |   +----------+   +-----+   +-----------+   +-----+
>>         +----------+   +-->| Burst #0 |-->| ... |-->| Burst #41 |-->| llp |
>>               |            +----------+   +-----+   +-----------+   +-----+
>>               V
>>         +----------+
>>         | Chunk #1 |---+
>>         |  CB = 0  |   |   +-----------+   +-----+   +-----------+   +-----+
>>         +----------+   +-->| Burst #42 |-->| ... |-->| Burst #83 |-->| llp |
>>               |            +-----------+   +-----+   +-----------+   +-----+
>>               V
>>         +----------+
>>         | Chunk #2 |---+
>>         |  CB = 1  |   |   +-----------+   +-----+   +------------+   +-----+
>>         +----------+   +-->| Burst #84 |-->| ... |-->| Burst #125 |-->| llp |
>>               |            +-----------+   +-----+   +------------+   +-----+
>>               V
>>         +----------+
>>         | Chunk #3 |---+
>>         |  CB = 0  |   |   +------------+   +-----+   +------------+   +-----+
>>         +----------+   +-->| Burst #126 |-->| ... |-->| Burst #129 |-->| llp |
>>                            +------------+   +-----+   +------------+   +-----+
> 
> This is great and required to understand the driver.
> 
> How does the controller move from Burst #41 of Chunk 0 to Chunk 1?

I forgot to explain that...

On every last Burst of a Chunk (Burst #41, Burst #83, Burst #125 or even Burst
#129) some flags are set in its control variable (the RIE and LIE bits) that
will trigger the "done" interrupt.

In the interrupt callback, it is decided whether to recycle the linked list
memory space by writing a new set of Burst elements (if there are still Chunks
to transfer) or to consider the transfer complete (if there are no Chunks left
to transfer).
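In plain C, the decision made in that callback can be modeled as below. This is a sketch with hypothetical names (`edma_desc`, `edma_done_irq`), not the driver's code:

```c
#include <assert.h>

/* Model of the "done" interrupt decision: while Chunks remain, the
 * linked-list memory is recycled with the next Chunk's Bursts; once
 * none remain, the whole transfer is complete. */
struct edma_desc {
	int chunks_left;		/* Chunks not yet written to LL memory */
};

/* Returns 1 if the LL space was recycled for another Chunk, 0 if done. */
static int edma_done_irq(struct edma_desc *desc)
{
	if (desc->chunks_left > 0) {
		desc->chunks_left--;	/* write the next Chunk's Bursts */
		return 1;
	}
	return 0;			/* no Chunks left: transfer complete */
}
```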

> Is Burst 0 to 129 a link list with Burst 0, 42, 84 and 126 having a
> callback (done bit set)..

I didn't quite understand your question.

It comes from the prep_slave_sg n elements (130 in the example), which will be
divided into several Chunks (#0, #1, #2 and #3 in the example): a linked list
that contains another linked list for the Bursts, plus the CB and the Chunk
size. The linked list inside each Chunk will contain a number of Bursts
(limited by the memory space size), each one holding the source address,
destination address and size of that sub-transfer.
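The split of the example's 130 elements into Chunks of at most 42 Bursts is simple ceiling arithmetic; a sketch with illustrative helper names:

```c
#include <assert.h>

#define LL_MAX 42	/* Bursts that fit in the dedicated LL memory */

/* Number of Chunks needed for nents scatter-gather elements. */
static int edma_num_chunks(int nents)
{
	return (nents + LL_MAX - 1) / LL_MAX;	/* ceiling division */
}

/* Number of Bursts carried by a given Chunk index. */
static int edma_chunk_bursts(int nents, int chunk)
{
	int left = nents - chunk * LL_MAX;

	return left > LL_MAX ? LL_MAX : left;
}
```

So 130 elements yield 4 Chunks of 42, 42, 42 and 4 Bursts, matching the diagram above.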

> 
> This sounds **very** similar to dw dma concepts!

I believe some parts of dw dma and dw edma behavior are similar, and that makes
perfect sense since both DMAs are done by Synopsys and there may be some shared
knowledge; however, they are different IPs applied to different products.

> 
>>
>> Legend:
>>
>> *Linked list*, also known as Chunk
>> *Linked list element*, also known as Burst
>> *CB*, also known as Change Bit, is a control bit (typically toggled) that
>> allows to easily identify and differentiate between the current linked list
>> and the previous or the next one.
>> *LLP* is a special element that indicates the end of the linked list element
>> stream and also informs that the next CB should be toggled.
>>
>> On scatter-gather transfer mode, the client will submit a scatter-gather list of
>> n (in this case 130) elements, which will be divided into multiple Chunks; each
>> Chunk will have a limited number of Bursts (in this case 42). After
>> transferring all Bursts of a Chunk, an interrupt will be triggered, which allows
>> recycling the whole linked list dedicated memory with the new information
>> relative to the next Chunk and its associated Bursts, repeating the whole
>> cycle again.
>>
>> On cyclic transfer mode, the client will submit a buffer pointer, its length
>> and a number of repetitions; in this case each Burst will correspond directly
>> to each repetition.
>>
>> Each Burst describes a data transfer from point A (source) to point
>> B (destination) with a length that can be from 1 byte up to 4 GB. Since the
>> dedicated memory space where the linked list resides is limited, the whole n
>> Burst elements will be organized in several Chunks, which will be used later to
>> recycle the dedicated memory space and initiate a new sequence of data transfers.
>>
>> The whole transfer is considered complete when all Bursts have been transferred.
>>
>> Currently this IP has a well-known register map, which includes support for
>> legacy and unroll modes. Legacy mode is the version of this register map that has
> 
> what's unroll..?
> 
>> a multiplexer register that allows switching registers between all write and read

The unroll is explained here, see below

>> channels, while unroll mode repeats all write and read channel registers with
>> an offset between them. This register map is called v0.
>>
>> The IP team is creating a new register map more suitable to the latest PCIe
>> features, which will very likely change the register map; this new version will
>> be called v1. As soon as this new version is released by the IP team, support
>> for it will be included in this driver.
>>
>> What do you think? Is there any gray area that I could clarify?
> 
> This sounds good. But we are also catering to a WIP IP which can change,
> right? Doesn't sound like a very good idea to me :)

This IP has existed like this for several years and works quite well; however,
because of new features and requests that are coming (SR-IOV, more DMA
channels, function segregation and isolation, performance improvements), it's
natural for improvements to appear. The driver should follow that evolution and
be robust enough to adapt to these new circumstances.

> 
>>>>>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
>>>>>> +		&config->src_addr, &config->dst_addr);
>>>>>> +
>>>>>> +	chan->src_addr = config->src_addr;
>>>>>> +	chan->dst_addr = config->dst_addr;
>>>>>> +
>>>>>> +	err = ops->device_config(dchan);
>>>>>
>>>>> what does this do?
>>>>
>>>> This is an initialization procedure to set up interrupts (data and addresses)
>>>> for each channel on the eDMA IP, to be triggered after a transfer is
>>>> completed or aborted. Since config() can be called at any time, it doesn't
>>>> make sense to have this procedure here; I'll move it to probe().
>>>
>>> Yeah I am not still convinced about having another layer! Have you
>>> though about using common lib for common parts ..?
>>
>> Maybe I'm explaining myself wrongly. I don't have any clue about the new
>> register map for future versions. I honestly tried to implement the common lib
>> for the whole process that interacts with the DMA engine controller, to ease
>> any future addition of register mappings, by having these common callbacks that
>> are only responsible for interfacing with the HW according to the register map
>> version. Maybe I can simplify something in the future, but I will only be able
>> to conclude that after having some idea about the new register map.
>>
>> IMHO this is the easiest and cleanest way to do it, in terms of code
>> maintenance and architecture, but if you have another idea that you can show me,
>> or can point out a driver that implements something similar, I have no problem
>> checking it out.
> 
> That is what my fear was :)
> 
> Let's step back and solve one problem at a time. Right now that is v0 of the
> IP. Please write a simple driver which solves v0 without any layers
> involved.
> 
> Once v1 is in good shape, you will know what it requires and then we
> can split the v0 driver into a common lib and a v0 driver, and then add the v1
> driver.

Can I keep the code segregation as it is now, with dw-edma-v0-core.c/h,
dw-edma-v0-debugfs.c/h and dw-edma-v0-regs.h?

That way I would only replace the callback calls with direct function calls and
remove the switch-case callback selection based on the version.
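The proposed change amounts to the following shape (a sketch; the file names come from the thread, but `dw_edma_v0_device_config` and its body are illustrative placeholders):

```c
#include <assert.h>

/* Sketch: with only v0 supported, the core calls the v0 register-map
 * helper directly instead of going through an ops table picked by a
 * version switch. */

/* would live in dw-edma-v0-core.c */
static int dw_edma_v0_device_config(int ch)
{
	return ch >= 0 ? 0 : -1;	/* placeholder body */
}

/* core code: direct call, no ops->device_config() indirection */
static int dw_edma_device_config(int ch)
{
	return dw_edma_v0_device_config(ch);
}
```

When v1 arrives, the common parts can be factored into a lib and the direct calls turned back into a dispatch, as suggested earlier in the thread.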

> 
> 
>>>>> what is the second part of the check, can you explain that, who sets
>>>>> chan->dir?
>>>>
>>>> The chan->dir is set on probe() during the process of configuring each channel.
>>>
>>> So you have channels that are unidirectional?
>>
>> Yes. That's another reason IMHO to keep dw-edma-test separate from
>> dmatest, since this IP is more picky and has these particularities.
> 
> That is okay, that should be handled by the prep calls: if you get a prep
> call for a direction you don't support, return an error and move to the next
> available one.
> 
> That can be done generically in the dmatest driver and, to answer your
> question, yes I would test that :)

Like you said, let's do this in small steps. For now I would like to suggest
leaving out the dw-edma-test driver and just focusing on the current driver, if
you don't mind. I never thought that this test driver would raise this kind of
discussion.

> 
>>>>>> +int dw_edma_probe(struct dw_edma_chip *chip)
>>>>>> +{
>>>>>> +	struct dw_edma *dw = chip->dw;
>>>>>> +	struct device *dev = chip->dev;
>>>>>> +	const struct dw_edma_core_ops *ops;
>>>>>> +	size_t ll_chunk = dw->ll_region.sz;
>>>>>> +	size_t dt_chunk = dw->dt_region.sz;
>>>>>> +	u32 ch_tot;
>>>>>> +	int i, j, err;
>>>>>> +
>>>>>> +	raw_spin_lock_init(&dw->lock);
>>>>>> +
>>>>>> +	/* Callback operation selection accordingly to eDMA version */
>>>>>> +	switch (dw->version) {
>>>>>> +	default:
>>>>>> +		dev_err(dev, "unsupported version\n");
>>>>>> +		return -EPERM;
>>>>>> +	}
>>>>>
>>>>> So we have only one case which returns error, was this code tested?
>>>>
>>>> Yes it was, but I understand your point of view.
>>>> This was done like this because I wanted to segment the patch series like this:
>>>>  1) Add the eDMA driver core, which contains the driver skeleton and the whole
>>>> associated logic.
>>>>  2) and 3) Add the callbacks for eDMA register mapping version 0 (a new
>>>> version will appear in the future; I thought that this new version would come
>>>> while I was trying to get feedback about this patch series, so there would be
>>>> another 2 patches for version 1, isolated and independent from version 0).
>>>>  4) and 5) Add the PCIe glue logic and the associated device ID.
>>>>  6) Add a maintainer for this driver.
>>>>  7) Add a test driver.
>>>>
>>>> Since this switch will only have the associated case in patch 2, that's why no
>>>> case appears in patch 1.
>>>>
>>>> If you think it makes sense to squash patch 2 with patch 1, just say so and
>>>> I'll do it for you :)
>>>
>>> well each patch should build and work on its own, otherwise we get
>>> problems :) But since this is a new driver it is okay. Anyway this patch
>>> is quite _huge_ so let's not add more to it..
>>>
>>> I would have moved the callback check to subsequent one..
>>
>> Sorry. I didn't catch this. Can you explain it?
> 
> I would have moved these lines to the next patch to give them the right
> meaning; here it feels wrong:
> 
>         switch (dw->version) {
>         case foo:
>                 ...
>         default:
>                 ...
>         }
>
Vinod Koul - Feb. 2, 2019, 10:07 a.m.
On 01-02-19, 11:23, Gustavo Pimentel wrote:
> On 01/02/2019 04:14, Vinod Koul wrote:
> > On 31-01-19, 11:33, Gustavo Pimentel wrote:
> >> On 23/01/2019 13:08, Vinod Koul wrote:
> >>> On 21-01-19, 15:48, Gustavo Pimentel wrote:
> >>>> On 20/01/2019 11:44, Vinod Koul wrote:
> >>>>> On 11-01-19, 19:33, Gustavo Pimentel wrote:
> > 
> >>>>>> @@ -0,0 +1,1059 @@
> >>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>> +/*
> >>>>>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
> >>>>>
> >>>>> 2019 now
> >>>>
> >>>> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
> >>>> affiliates." this way it's always up to date and I also kept 2018, because it
> >>>> was the date that I started to develop this driver, if you don't mind.
> >>>
> >>> yeah 18 is fine :) it needs to end with the current year always
> >>
> >> Just to be sure, are you saying that must be: "Copyright (c) 2018-2019 Synopsys,
> >> Inc. and/or its affiliates."?
> > 
> > Yup :)
> > 
> >>>>>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
> >>>>>> +{
> >>>>>> +	struct dw_edma_chan *chan = desc->chan;
> >>>>>> +	struct dw_edma *dw = chan->chip->dw;
> >>>>>> +	struct dw_edma_chunk *chunk;
> >>>>>> +
> >>>>>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
> >>>>>> +	if (unlikely(!chunk))
> >>>>>> +		return NULL;
> >>>>>> +
> >>>>>> +	INIT_LIST_HEAD(&chunk->list);
> >>>>>> +	chunk->chan = chan;
> >>>>>> +	chunk->cb = !(desc->chunks_alloc % 2);
> >>>>>
> >>>>> cb ..?
> >>>>
> >>>> CB = change bit, is a property of this eDMA IP. Basically it is a kind of
> >>>> handshake which serves to validate whether the linked list has been updated or
> >>>> not, especially useful in cases of recycled linked list elements (every linked
> >>>> list recycle is a new chunk, this will allow to differentiate each chunk).
> >>>
> >>> okay please add that somewhere. Also it would help me if you explain
> >>> what is chunk and other terminologies used in this driver
> >>
> >> I'm thinking of putting the description below in the patch; please check if it
> >> is sufficiently explicit and clear to understand what this IP needs and does.
> >>
> >> In order to transfer data from point A to B as fast as possible this IP requires
> >> a dedicated memory space where will reside a linked list of elements.
> > 
> > rephrasing: a dedicated memory space containing linked list of elements
> > 
> >> All elements of this linked list are contiguous and each one describes a data
> >> transfer (source and destination addresses, length and a control variable).
> >>
> >> For the sake of simplicity, let's assume a memory space for write channel 0
> >> which can hold about 42 elements.
> >>
> >> +---------+
> >> | Desc #0 |--+
> >> +---------+  |
> >>              V
> >>         +----------+
> >>         | Chunk #0 |---+
> >>         |  CB = 1  |   |   +----------+   +-----+   +-----------+   +-----+
> >>         +----------+   +-->| Burst #0 |-->| ... |-->| Burst #41 |-->| llp |
> >>               |            +----------+   +-----+   +-----------+   +-----+
> >>               V
> >>         +----------+
> >>         | Chunk #1 |---+
> >>         |  CB = 0  |   |   +-----------+   +-----+   +-----------+   +-----+
> >>         +----------+   +-->| Burst #42 |-->| ... |-->| Burst #83 |-->| llp |
> >>               |            +-----------+   +-----+   +-----------+   +-----+
> >>               V
> >>         +----------+
> >>         | Chunk #2 |---+
> >>         |  CB = 1  |   |   +-----------+   +-----+   +------------+   +-----+
> >>         +----------+   +-->| Burst #84 |-->| ... |-->| Burst #125 |-->| llp |
> >>               |            +-----------+   +-----+   +------------+   +-----+
> >>               V
> >>         +----------+
> >>         | Chunk #3 |---+
> >>         |  CB = 0  |   |   +------------+   +-----+   +------------+   +-----+
> >>         +----------+   +-->| Burst #126 |-->| ... |-->| Burst #129 |-->| llp |
> >>                            +------------+   +-----+   +------------+   +-----+
> > 
> > This is great and required to understand the driver.
> > 
> > How does the controller move from Burst #41 of Chunk 0 to Chunk 1?
> 
> I forgot to explain that...
> 
> On every last Burst of a Chunk (Burst #41, Burst #83, Burst #125 or even Burst
> #129) some flags are set in its control variable (the RIE and LIE bits) that
> will trigger the "done" interrupt.
> 
> In the interrupt callback, it is decided whether to recycle the linked list
> memory space by writing a new set of Burst elements (if there are still Chunks
> to transfer) or to consider the transfer complete (if there are no Chunks left
> to transfer).
> 
> > Is Burst 0 to 129 a linked list with Bursts 0, 42, 84 and 126 having a
> > callback (done bit set)?
> 
> I didn't quite understand your question.
> 
> It comes from the prep_slave_sg n elements (130 in the example), which will be
> divided into several Chunks (#0, #1, #2 and #3 in the example): a linked list
> that contains another linked list for the Bursts, plus the CB and the Chunk
> size. The linked list inside each Chunk will contain a number of Bursts
> (limited by the memory space size), each one holding the source address,
> destination address and size of that sub-transfer.

Thanks this helps :)

so in this case what do the prep_slave_sg n elements correspond to (130
bursts)..? If so, how do you split a transaction into chunks and bursts?


> 
> > 
> > This sounds **very** similar to dw dma concepts!
> 
> I believe some parts of dw dma and dw edma behavior are similar, and that makes
> perfect sense since both DMAs are done by Synopsys and there may be some shared
> knowledge; however, they are different IPs applied to different products.
> 
> > 
> >>
> >> Legend:
> >>
> >> *Linked list*, also known as Chunk
> >> *Linked list element*, also known as Burst
> >> *CB*, also known as Change Bit, is a control bit (typically toggled) that
> >> allows to easily identify and differentiate between the current linked list
> >> and the previous or the next one.
> >> *LLP* is a special element that indicates the end of the linked list element
> >> stream and also informs that the next CB should be toggled.
> >>
> >> On scatter-gather transfer mode, the client will submit a scatter-gather list of
> >> n (in this case 130) elements, which will be divided into multiple Chunks; each
> >> Chunk will have a limited number of Bursts (in this case 42). After
> >> transferring all Bursts of a Chunk, an interrupt will be triggered, which allows
> >> recycling the whole linked list dedicated memory with the new information
> >> relative to the next Chunk and its associated Bursts, repeating the whole
> >> cycle again.
> >>
> >> On cyclic transfer mode, the client will submit a buffer pointer, its length
> >> and a number of repetitions; in this case each Burst will correspond directly
> >> to each repetition.
> >>
> >> Each Burst describes a data transfer from point A (source) to point
> >> B (destination) with a length that can be from 1 byte up to 4 GB. Since the
> >> dedicated memory space where the linked list resides is limited, the whole n
> >> Burst elements will be organized in several Chunks, which will be used later to
> >> recycle the dedicated memory space and initiate a new sequence of data transfers.
> >>
> >> The whole transfer is considered complete when all Bursts have been transferred.
> >>
> >> Currently this IP has a well-known register map, which includes support for
> >> legacy and unroll modes. Legacy mode is the version of this register map that has
> > 
> > what's unroll..?
> > 
> >> a multiplexer register that allows switching registers between all write and read
> 
> The unroll is explained here, see below
> 
> >> channels, while unroll mode repeats all write and read channel registers with
> >> an offset between them. This register map is called v0.
> >>
> >> The IP team is creating a new register map more suitable to the latest PCIe
> >> features, which will very likely change the register map; this new version will
> >> be called v1. As soon as this new version is released by the IP team, support
> >> for it will be included in this driver.
> >>
> >> What do you think? Is there any gray area that I could clarify?
> > 
> > This sounds good. But we are also catering to a WIP IP which can change,
> > right? Doesn't sound like a very good idea to me :)
> 
> This IP has existed like this for several years and works quite well; however,
> because of new features and requests that are coming (SR-IOV, more DMA
> channels, function segregation and isolation, performance improvements), it's
> natural for improvements to appear. The driver should follow that evolution and
> be robust enough to adapt to these new circumstances.

Kernel drivers should be modular by design; if we keep things simple then
adding/splitting to support future revisions will be easy!

> >>>>>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
> >>>>>> +		&config->src_addr, &config->dst_addr);
> >>>>>> +
> >>>>>> +	chan->src_addr = config->src_addr;
> >>>>>> +	chan->dst_addr = config->dst_addr;
> >>>>>> +
> >>>>>> +	err = ops->device_config(dchan);
> >>>>>
> >>>>> what does this do?
> >>>>
> >>>> This is an initialization procedure to set up interrupts (data and addresses)
> >>>> for each channel on the eDMA IP, to be triggered after a transfer is
> >>>> completed or aborted. Since config() can be called at any time, it doesn't
> >>>> make sense to have this procedure here; I'll move it to probe().
> >>>
> >>> Yeah I am not still convinced about having another layer! Have you
> >>> though about using common lib for common parts ..?
> >>
> >> Maybe I'm explaining myself wrongly. I don't have any clue about the new
> >> register map for future versions. I honestly tried to implement the common lib
> >> for the whole process that interacts with the DMA engine controller, to ease
> >> any future addition of register mappings, by having these common callbacks that
> >> are only responsible for interfacing with the HW according to the register map
> >> version. Maybe I can simplify something in the future, but I will only be able
> >> to conclude that after having some idea about the new register map.
> >>
> >> IMHO this is the easiest and cleanest way to do it, in terms of code
> >> maintenance and architecture, but if you have another idea that you can show me,
> >> or can point out a driver that implements something similar, I have no problem
> >> checking it out.
> > 
> > That is what my fear was :)
> > 
> > Let's step back and solve one problem at a time. Right now that is v0 of the
> > IP. Please write a simple driver which solves v0 without any layers
> > involved.
> > 
> > Once v1 is in good shape, you will know what it requires and then we
> > can split the v0 driver into a common lib and a v0 driver, and then add the v1
> > driver.
> 
> Can I keep the code segregation as it is now, with dw-edma-v0-core.c/h,
> dw-edma-v0-debugfs.c/h and dw-edma-v0-regs.h?
> 
> That way I would only replace the callback calls with direct function calls and
> remove the switch-case callback selection based on the version.

That sounds better; please keep the patches smallish (anything going over 600
lines becomes harder to review) and logically split.

> >>>>> what is the second part of the check, can you explain that, who sets
> >>>>> chan->dir?
> >>>>
> >>>> The chan->dir is set on probe() during the process of configuring each channel.
> >>>
> >>> So you have channels that are unidirectional?
> >>
> >> Yes. That's another reason IMHO to keep dw-edma-test separate from
> >> dmatest, since this IP is more picky and has these particularities.
> > 
> > That is okay, that should be handled by the prep calls: if you get a prep
> > call for a direction you don't support, return an error and move to the next
> > available one.
> > 
> > That can be done generically in the dmatest driver and, to answer your
> > question, yes I would test that :)
> 
> Like you said, let's do this in small steps. For now I would like to suggest
> leaving out the dw-edma-test driver and just focusing on the current driver, if
> you don't mind. I never thought that this test driver would raise this kind of
> discussion.

Well dmatest is just a testing aid and doesn't care about the internal
design of your driver. I would say keep it, as it is a useful aid and will
help other folks as well :)
Gustavo Pimentel - Feb. 6, 2019, 6:06 p.m.
On 02/02/2019 10:07, Vinod Koul wrote:
> On 01-02-19, 11:23, Gustavo Pimentel wrote:
>> On 01/02/2019 04:14, Vinod Koul wrote:
>>> On 31-01-19, 11:33, Gustavo Pimentel wrote:
>>>> On 23/01/2019 13:08, Vinod Koul wrote:
>>>>> On 21-01-19, 15:48, Gustavo Pimentel wrote:
>>>>>> On 20/01/2019 11:44, Vinod Koul wrote:
>>>>>>> On 11-01-19, 19:33, Gustavo Pimentel wrote:
>>>
>>>>>>>> @@ -0,0 +1,1059 @@
>>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>>> +/*
>>>>>>>> + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
>>>>>>>
>>>>>>> 2019 now
>>>>>>
>>>>>> I've changed to "Copyright (c) 2018-present Synopsys, Inc. and/or its
>>>>>> affiliates." this way it's always up to date and I also kept 2018, because it
>>>>>> was the date that I started to develop this driver, if you don't mind.
>>>>>
>>>>> yeah 18 is fine :) it needs to end with the current year always
>>>>
>>>> Just to be sure, are you saying that must be: "Copyright (c) 2018-2019 Synopsys,
>>>> Inc. and/or its affiliates."?
>>>
>>> Yup :)
>>>
>>>>>>>> +static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
>>>>>>>> +{
>>>>>>>> +	struct dw_edma_chan *chan = desc->chan;
>>>>>>>> +	struct dw_edma *dw = chan->chip->dw;
>>>>>>>> +	struct dw_edma_chunk *chunk;
>>>>>>>> +
>>>>>>>> +	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
>>>>>>>> +	if (unlikely(!chunk))
>>>>>>>> +		return NULL;
>>>>>>>> +
>>>>>>>> +	INIT_LIST_HEAD(&chunk->list);
>>>>>>>> +	chunk->chan = chan;
>>>>>>>> +	chunk->cb = !(desc->chunks_alloc % 2);
>>>>>>>
>>>>>>> cb ..?
>>>>>>
>>>>>> CB = change bit, a property of this eDMA IP. Basically it is a kind of
>>>>>> handshake which serves to validate whether the linked list has been
>>>>>> updated or not, especially useful in cases of recycled linked-list
>>>>>> elements (every linked-list recycle is a new chunk, which allows
>>>>>> differentiating each chunk).
>>>>>
>>>>> okay please add that somewhere. Also it would help me if you explain
>>>>> what a chunk is and the other terminology used in this driver
>>>>
>>>> I'm thinking of putting the below description in the patch; please check
>>>> if it is sufficiently explicit and clear to understand what this IP needs
>>>> and does.
>>>>
>>>> In order to transfer data from point A to B as fast as possible this IP requires
>>>> a dedicated memory space where will reside a linked list of elements.
>>>
>>> rephrasing: a dedicated memory space containing linked list of elements
>>>
>>>> All elements of this linked list are continuous and each one describes a data
>>>> transfer (source and destination addresses, length and a control variable).
>>>>
>>>> For the sake of simplicity, lets assume a memory space for channel write 0 which
>>>> allows about 42 elements.
>>>>
>>>> +---------+
>>>> | Desc #0 |--+
>>>> +---------+  |
>>>>              V
>>>>         +----------+
>>>>         | Chunk #0 |---+
>>>>         |  CB = 1  |   |   +----------+   +-----+   +-----------+   +-----+
>>>>         +----------+   +-->| Burst #0 |-->| ... |-->| Burst #41 |-->| llp |
>>>>               |            +----------+   +-----+   +-----------+   +-----+
>>>>               V
>>>>         +----------+
>>>>         | Chunk #1 |---+
>>>>         |  CB = 0  |   |   +-----------+   +-----+   +-----------+   +-----+
>>>>         +----------+   +-->| Burst #42 |-->| ... |-->| Burst #83 |-->| llp |
>>>>               |            +-----------+   +-----+   +-----------+   +-----+
>>>>               V
>>>>         +----------+
>>>>         | Chunk #2 |---+
>>>>         |  CB = 1  |   |   +-----------+   +-----+   +------------+   +-----+
>>>>         +----------+   +-->| Burst #84 |-->| ... |-->| Burst #125 |-->| llp |
>>>>               |            +-----------+   +-----+   +------------+   +-----+
>>>>               V
>>>>         +----------+
>>>>         | Chunk #3 |---+
>>>>         |  CB = 0  |   |   +------------+   +-----+   +------------+   +-----+
>>>>         +----------+   +-->| Burst #126 |-->| ... |-->| Burst #129 |-->| llp |
>>>>                            +------------+   +-----+   +------------+   +-----+
>>>
>>> This is great and required to understand the driver.
>>>
>>> How does the controller move from Burst #41 of Chunk 0 to Chunk 1?
>>
>> I forgot to explain that...
>>
>> On the last Burst of each Chunk (Burst #41, Burst #83, Burst #125 or even
>> Burst #129) some flags are set in its control variable (RIE and LIE bits)
>> that trigger the "done" interrupt.
>>
>> In the interrupt callback, it is decided whether to recycle the linked-list
>> memory space by writing a new set of Burst elements (if there are still
>> Chunks to transfer) or the transfer is considered complete (if there are no
>> Chunks available to transfer).
>>
>>> Is Burst 0 to 129 a link list with Burst 0, 42, 84 and 126 having a
>>> callback (done bit set)..
>>
>> I didn't quite understand your question.
>>
>> It comes from the prep_slave_sg n elements (130 in the example), which will
>> be divided into several Chunks (#0, #1, #2 and #3 in the example) (a linked
>> list that will contain another linked list for the Bursts, plus the CB and
>> the Chunk size). The linked list inside each Chunk will contain a number of
>> Bursts (limited by the memory space size), each one holding a Source
>> Address, a Destination Address and the size of that sub-transfer.
> 
> Thanks this helps :)
> 
> so in this case what do the prep_slave_sg n elements correspond to (130
> bursts)? If so, how do you split a transaction into chunks and bursts?

In the example case, the prep_slave_sg API receives a total of 130 elements
and the dw-edma-core implementation will slice it into 4 chunks (the first 3
chunks containing 42 bursts each and the last chunk containing only 4 bursts).

The maximum burst capacity of each chunk depends on the linked-list memory
space available (or destined for it); in the example case I didn't specify the
actual amount of space, but the maximum number of bursts is calculated by:

maximum number of bursts = (linked list memory size / 24) - 1

where the size is in bytes and 24 is the size of each burst information
element. One element is also subtracted because at the end of each burst
stream the llp element is needed, and for simplicity its size is considered
equal to that of a burst element.

> 
> 
>>
>>>
>>> This sound **very** similar to dw dma concepts!
>>
>> I believe some parts of the dw dma and dw edma behavior are similar, and
>> that makes perfect sense since both DMAs are done by Synopsys and there may
>> be shared knowledge; however, they are different IPs applied to different
>> products.
>>
>>>
>>>>
>>>> Legend:
>>>>
>>>> *Linked list*, also known as Chunk
>>>> *Linked list element*, also known as Burst
>>>> *CB*, also known as Change Bit, a control bit (typically toggled) that
>>>> allows easily identifying and differentiating the current linked list from
>>>> the previous or the next one.
>>>> *LLP*, a special element that indicates the end of the linked-list element
>>>> stream and also informs that the next CB should be toggled.
>>>>
>>>> On scatter-gather transfer mode, the client will submit a scatter-gather
>>>> list of n (in this case 130) elements, which will be divided into multiple
>>>> Chunks; each Chunk will have a limited number of Bursts (in this case 42),
>>>> and after transferring all Bursts an interrupt will be triggered, which
>>>> allows recycling the whole linked-list dedicated memory again with the new
>>>> information relative to the next Chunk and its associated Bursts, repeating
>>>> the whole cycle again.
>>>>
>>>> On cyclic transfer mode, the client will submit a buffer pointer, its
>>>> length and a number of repetitions; in this case each burst will
>>>> correspond directly to each repetition.
>>>>
>>>> Each Burst can describe a data transfer from point A (source) to point
>>>> B (destination) with a length that can be from 1 byte up to 4 GB. Since
>>>> the dedicated memory space where the linked list resides is limited, the
>>>> whole set of n burst elements will be organized in several Chunks, which
>>>> will be used later to recycle the dedicated memory space to initiate a new
>>>> sequence of data transfers.
>>>>
>>>> The whole transfer is considered complete when all bursts have been
>>>> transferred.
>>>>
>>>> Currently this IP has a well-known register map, which includes support
>>>> for legacy and unroll modes. Legacy mode is the version of this register
>>>> map that has
>>>
>>> whats  unroll..
>>>
>>>> a multiplexer register that allows switching registers between all write and read
>>
>> The unroll is explained here, see below
>>
>>>> channels, and the unroll mode repeats all write and read channel registers
>>>> with an offset between them. This register map is called v0.
>>>>
>>>> The IP team is creating a new register map more suitable for the latest
>>>> PCIe features, which will very likely change the register map; this
>>>> version will be called v1. As soon as this new version is released by the
>>>> IP team, support for it will be included in this driver.
>>>>
>>>> What do you think? There is any gray area that I could clarify?
>>>
>>> This sounds good. But we are also catering to a WIP IP which can change,
>>> right? Doesn't sound like a very good idea to me :)
>>
>> This IP has existed like this for several years and works quite fine;
>> however, because of new features and requests (SR-IOV, more DMA channels,
>> function segregation and isolation, performance improvement) that are
>> coming, it's natural for improvements to exist. The driver should follow
>> the evolution and be robust enough to adapt to these new circumstances.
> 
> Kernel drivers should be modular by design; if we keep things simple then
> adding/splitting to support future revisions should be easy!
> 
>>>>>>>> +	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
>>>>>>>> +		&config->src_addr, &config->dst_addr);
>>>>>>>> +
>>>>>>>> +	chan->src_addr = config->src_addr;
>>>>>>>> +	chan->dst_addr = config->dst_addr;
>>>>>>>> +
>>>>>>>> +	err = ops->device_config(dchan);
>>>>>>>
>>>>>>> what does this do?
>>>>>>
>>>>>> This is an initialization procedure to set up interrupts (data and
>>>>>> addresses) for each channel on the eDMA IP, in order to be triggered
>>>>>> after a transfer is completed or aborted. Since config() can be called
>>>>>> at any time, it doesn't make sense to have this procedure here; I'll
>>>>>> move it to probe().
>>>>>
>>>>> Yeah I am still not convinced about having another layer! Have you
>>>>> thought about using a common lib for the common parts?
>>>>
>>>> Maybe I'm explaining myself wrongly. I don't have any clue about the new
>>>> register map for the future versions. I honestly tried to implement a
>>>> common lib for the whole process that interacts with the dmaengine
>>>> controller to ease any future addition of register mappings, by having
>>>> these common callbacks that will only be responsible for interfacing with
>>>> the HW according to the register map version. Maybe I can simplify
>>>> something in the future, but I will only be able to conclude that after
>>>> having some idea about the new register map.
>>>>
>>>> IMHO this is the easiest and cleanest way to do it, in terms of code
>>>> maintenance and architecture, but if you have another idea that you can
>>>> show me, or can point out a driver that implements something similar, I
>>>> have no problem checking it out.
>>>
>>> That is what my fear was :)
>>>
>>> Let's step back and solve one problem at a time. Right now that is v0 of
>>> the IP. Please write a simple driver which solves v0 without any layers
>>> involved.
>>>
>>> Once v1 is in good shape, you will know what is required, and then we
>>> can split the v0 driver into a common lib and a v0 driver, and then add
>>> the v1 driver.
>>
>> Can I keep the code segregation as it is now? With the dw-edma-v0-core.c/h,
>> dw-edma-v0-debugfs.c/h and dw-edma-v0-regs.h
>>
>> That way I would only replace the callback calls with direct function calls
>> and remove the switch-case callback selection based on the version.
> 
> That sounds better, please keep patches smallish (anything going more
> than 600 lines becomes harder to review) and logically split.
> 
>>>>>>> what is the second part of the check, can you explain that, who sets
>>>>>>> chan->dir?
>>>>>>
>>>>>> The chan->dir is set on probe() during the process of configuring each channel.
>>>>>
>>>>> So you have channels that are unidirectional?
>>>>
>>>> Yes. That's another reason IMHO to keep the dw-edma-test separate from
>>>> dma-test, since this IP is more picky and has these particularities.
>>>
>>> That is okay, that should be handled by prep calls; if you get a prep
>>> call for a direction you don't support, return an error and move to the
>>> next available one.
>>>
>>> That can be done generically in dmatest driver and to answer your
>>> question, yes I would test that :)
>>
>> Like you said, let's do this in small steps. For now I would like to
>> suggest leaving out the dw-dma-test driver and just focusing on the current
>> driver, if you don't mind. I never thought that this test driver would
>> raise this kind of discussion.
> 
> Well dmatest is just a testing aid and doesn't care about the internal
> design of your driver. I would say keep it, as it is a useful aid and will
> help other folks as well :)
> 

Let's see, I honestly hope so...

Gustavo

Patch

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index d2286c7..5877ee5 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -651,6 +651,8 @@  source "drivers/dma/qcom/Kconfig"
 
 source "drivers/dma/dw/Kconfig"
 
+source "drivers/dma/dw-edma/Kconfig"
+
 source "drivers/dma/hsu/Kconfig"
 
 source "drivers/dma/sh/Kconfig"
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 09571a8..e42e7d6 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -29,6 +29,7 @@  obj-$(CONFIG_DMA_SUN4I) += sun4i-dma.o
 obj-$(CONFIG_DMA_SUN6I) += sun6i-dma.o
 obj-$(CONFIG_DW_AXI_DMAC) += dw-axi-dmac/
 obj-$(CONFIG_DW_DMAC_CORE) += dw/
+obj-$(CONFIG_DW_EDMA) += dw-edma/
 obj-$(CONFIG_EP93XX_DMA) += ep93xx_dma.o
 obj-$(CONFIG_FSL_DMA) += fsldma.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o fsl-edma-common.o
diff --git a/drivers/dma/dw-edma/Kconfig b/drivers/dma/dw-edma/Kconfig
new file mode 100644
index 0000000..3016bed
--- /dev/null
+++ b/drivers/dma/dw-edma/Kconfig
@@ -0,0 +1,9 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+config DW_EDMA
+	tristate "Synopsys DesignWare eDMA controller driver"
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Support the Synopsys DesignWare eDMA controller, normally
+	  implemented on endpoints SoCs.
diff --git a/drivers/dma/dw-edma/Makefile b/drivers/dma/dw-edma/Makefile
new file mode 100644
index 0000000..3224010
--- /dev/null
+++ b/drivers/dma/dw-edma/Makefile
@@ -0,0 +1,4 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DW_EDMA)		+= dw-edma.o
+dw-edma-objs			:= dw-edma-core.o
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
new file mode 100644
index 0000000..2b6b70f
--- /dev/null
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -0,0 +1,1059 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * Synopsys DesignWare eDMA core driver
+ */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/pm_runtime.h>
+#include <linux/dmaengine.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/dma/edma.h>
+#include <linux/pci.h>
+
+#include "dw-edma-core.h"
+#include "../dmaengine.h"
+#include "../virt-dma.h"
+
+#define SET(reg, name, val)			\
+	reg.name = val
+
+#define SET_BOTH_CH(name, value)		\
+	do {					\
+		SET(dw->wr_edma, name, value);	\
+		SET(dw->rd_edma, name, value);	\
+	} while (0)
+
+static inline
+struct device *dchan2dev(struct dma_chan *dchan)
+{
+	return &dchan->dev->device;
+}
+
+static inline
+struct device *chan2dev(struct dw_edma_chan *chan)
+{
+	return &chan->vc.chan.dev->device;
+}
+
+static inline
+const struct dw_edma_core_ops *chan2ops(struct dw_edma_chan *chan)
+{
+	return chan->chip->dw->ops;
+}
+
+static inline
+struct dw_edma_desc *vd2dw_edma_desc(struct virt_dma_desc *vd)
+{
+	return container_of(vd, struct dw_edma_desc, vd);
+}
+
+static struct dw_edma_burst *dw_edma_alloc_burst(struct dw_edma_chunk *chunk)
+{
+	struct dw_edma_chan *chan = chunk->chan;
+	struct dw_edma_burst *burst;
+
+	burst = kvzalloc(sizeof(*burst), GFP_NOWAIT);
+	if (unlikely(!burst))
+		return NULL;
+
+	INIT_LIST_HEAD(&burst->list);
+
+	if (chunk->burst) {
+		/* Create and add new element into the linked list */
+		chunk->bursts_alloc++;
+		dev_dbg(chan2dev(chan), "alloc new burst element (%d)\n",
+			chunk->bursts_alloc);
+		list_add_tail(&burst->list, &chunk->burst->list);
+	} else {
+		/* List head */
+		chunk->bursts_alloc = 0;
+		chunk->burst = burst;
+		dev_dbg(chan2dev(chan), "alloc new burst head\n");
+	}
+
+	return burst;
+}
+
+static struct dw_edma_chunk *dw_edma_alloc_chunk(struct dw_edma_desc *desc)
+{
+	struct dw_edma_chan *chan = desc->chan;
+	struct dw_edma *dw = chan->chip->dw;
+	struct dw_edma_chunk *chunk;
+
+	chunk = kvzalloc(sizeof(*chunk), GFP_NOWAIT);
+	if (unlikely(!chunk))
+		return NULL;
+
+	INIT_LIST_HEAD(&chunk->list);
+	chunk->chan = chan;
+	chunk->cb = !(desc->chunks_alloc % 2);
+	chunk->ll_region.paddr = dw->ll_region.paddr + chan->ll_off;
+	chunk->ll_region.vaddr = dw->ll_region.vaddr + chan->ll_off;
+
+	if (desc->chunk) {
+		/* Create and add new element into the linked list */
+		desc->chunks_alloc++;
+		dev_dbg(chan2dev(chan), "alloc new chunk element (%d)\n",
+			desc->chunks_alloc);
+		list_add_tail(&chunk->list, &desc->chunk->list);
+		dw_edma_alloc_burst(chunk);
+	} else {
+		/* List head */
+		chunk->burst = NULL;
+		desc->chunks_alloc = 0;
+		desc->chunk = chunk;
+		dev_dbg(chan2dev(chan), "alloc new chunk head\n");
+	}
+
+	return chunk;
+}
+
+static struct dw_edma_desc *dw_edma_alloc_desc(struct dw_edma_chan *chan)
+{
+	struct dw_edma_desc *desc;
+
+	dev_dbg(chan2dev(chan), "alloc new descriptor\n");
+
+	desc = kvzalloc(sizeof(*desc), GFP_NOWAIT);
+	if (unlikely(!desc))
+		return NULL;
+
+	desc->chan = chan;
+	dw_edma_alloc_chunk(desc);
+
+	return desc;
+}
+
+static void dw_edma_free_burst(struct dw_edma_chunk *chunk)
+{
+	struct dw_edma_burst *child, *_next;
+
+	if (!chunk->burst)
+		return;
+
+	/* Remove all the list elements */
+	list_for_each_entry_safe(child, _next, &chunk->burst->list, list) {
+		list_del(&child->list);
+		kvfree(child);
+		chunk->bursts_alloc--;
+	}
+
+	/* Remove the list head */
+	kvfree(child);
+	chunk->burst = NULL;
+}
+
+static void dw_edma_free_chunk(struct dw_edma_desc *desc)
+{
+	struct dw_edma_chan *chan = desc->chan;
+	struct dw_edma_chunk *child, *_next;
+
+	if (!desc->chunk)
+		return;
+
+	/* Remove all the list elements */
+	list_for_each_entry_safe(child, _next, &desc->chunk->list, list) {
+		dw_edma_free_burst(child);
+		if (child->bursts_alloc)
+			dev_dbg(chan2dev(chan),	"%u bursts still allocated\n",
+				child->bursts_alloc);
+		list_del(&child->list);
+		kvfree(child);
+		desc->chunks_alloc--;
+	}
+
+	/* Remove the list head */
+	kvfree(child);
+	desc->chunk = NULL;
+}
+
+static void dw_edma_free_desc(struct dw_edma_desc *desc)
+{
+	struct dw_edma_chan *chan = desc->chan;
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	dw_edma_free_chunk(desc);
+	if (desc->chunks_alloc)
+		dev_dbg(chan2dev(chan), "%u chunks still allocated\n",
+			desc->chunks_alloc);
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static void vchan_free_desc(struct virt_dma_desc *vdesc)
+{
+	dw_edma_free_desc(vd2dw_edma_desc(vdesc));
+}
+
+static void dw_edma_start_transfer(struct dw_edma_chan *chan)
+{
+	struct virt_dma_desc *vd;
+	struct dw_edma_desc *desc;
+	struct dw_edma_chunk *child;
+	const struct dw_edma_core_ops *ops = chan2ops(chan);
+
+	vd = vchan_next_desc(&chan->vc);
+	if (!vd)
+		return;
+
+	desc = vd2dw_edma_desc(vd);
+	if (!desc)
+		return;
+
+	child = list_first_entry_or_null(&desc->chunk->list,
+					 struct dw_edma_chunk, list);
+	if (!child)
+		return;
+
+	ops->start(child, !desc->xfer_sz);
+	desc->xfer_sz += child->ll_region.sz;
+	dev_dbg(chan2dev(chan), "transfer of %u bytes started\n",
+		child->ll_region.sz);
+
+	dw_edma_free_burst(child);
+	if (child->bursts_alloc)
+		dev_dbg(chan2dev(chan),	"%u bursts still allocated\n",
+			child->bursts_alloc);
+	list_del(&child->list);
+	kvfree(child);
+	desc->chunks_alloc--;
+}
+
+static int dw_edma_device_config(struct dma_chan *dchan,
+				 struct dma_slave_config *config)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	const struct dw_edma_core_ops *ops = chan2ops(chan);
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (!config) {
+		err = -EINVAL;
+		goto err_config;
+	}
+
+	if (chan->status != EDMA_ST_IDLE) {
+		dev_err(chan2dev(chan), "channel is busy or paused\n");
+		err = -EPERM;
+		goto err_config;
+	}
+
+	dev_dbg(chan2dev(chan), "addr(physical) src=%pa, dst=%pa\n",
+		&config->src_addr, &config->dst_addr);
+
+	chan->src_addr = config->src_addr;
+	chan->dst_addr = config->dst_addr;
+
+	err = ops->device_config(dchan);
+	if (!err) {
+		chan->configured = true;
+		dev_dbg(chan2dev(chan),	"channel configured\n");
+	}
+
+err_config:
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+	return err;
+}
+
+static int dw_edma_device_pause(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (!chan->configured) {
+		dev_err(chan2dev(chan), "(pause) channel not configured\n");
+		err = -EPERM;
+		goto err_pause;
+	}
+
+	if (chan->status != EDMA_ST_BUSY) {
+		err = -EPERM;
+		goto err_pause;
+	}
+
+	if (chan->request != EDMA_REQ_NONE) {
+		err = -EPERM;
+		goto err_pause;
+	}
+
+	chan->request = EDMA_REQ_PAUSE;
+	dev_dbg(chan2dev(chan), "pause requested\n");
+
+err_pause:
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+	return err;
+}
+
+static int dw_edma_device_resume(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (!chan->configured) {
+		dev_err(chan2dev(chan), "(resume) channel not configured\n");
+		err = -EPERM;
+		goto err_resume;
+	}
+
+	if (chan->status != EDMA_ST_PAUSE) {
+		err = -EPERM;
+		goto err_resume;
+	}
+
+	if (chan->request != EDMA_REQ_NONE) {
+		err = -EPERM;
+		goto err_resume;
+	}
+
+	chan->status = EDMA_ST_BUSY;
+	dev_dbg(dchan2dev(dchan), "transfer resumed\n");
+	dw_edma_start_transfer(chan);
+
+err_resume:
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+	return err;
+}
+
+static int dw_edma_device_terminate_all(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	const struct dw_edma_core_ops *ops = chan2ops(chan);
+	unsigned long flags;
+	int err = 0;
+	LIST_HEAD(head);
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (!chan->configured)
+		goto err_terminate;
+
+	if (chan->status == EDMA_ST_PAUSE) {
+		dev_dbg(dchan2dev(dchan), "channel is paused, stopping immediately\n");
+		chan->status = EDMA_ST_IDLE;
+		chan->configured = false;
+		goto err_terminate;
+	} else if (chan->status == EDMA_ST_IDLE) {
+		chan->configured = false;
+		goto err_terminate;
+	} else if (ops->ch_status(chan) == DMA_COMPLETE) {
+		/*
+		 * The channel is in a false BUSY state, probably didn't
+		 * receive or lost an interrupt
+		 */
+		chan->status = EDMA_ST_IDLE;
+		chan->configured = false;
+		goto err_terminate;
+	}
+
+	if (chan->request > EDMA_REQ_PAUSE) {
+		err = -EPERM;
+		goto err_terminate;
+	}
+
+	chan->request = EDMA_REQ_STOP;
+	dev_dbg(dchan2dev(dchan), "termination requested\n");
+
+err_terminate:
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+	return err;
+}
+
+static void dw_edma_device_issue_pending(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	if (chan->configured && chan->request == EDMA_REQ_NONE &&
+	    chan->status == EDMA_ST_IDLE && vchan_issue_pending(&chan->vc)) {
+		dev_dbg(dchan2dev(dchan), "transfer issued\n");
+		chan->status = EDMA_ST_BUSY;
+		dw_edma_start_transfer(chan);
+	}
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static enum dma_status
+dw_edma_device_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
+			 struct dma_tx_state *txstate)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	const struct dw_edma_core_ops *ops = chan2ops(chan);
+	unsigned long flags;
+	enum dma_status ret;
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+
+	ret = ops->ch_status(chan);
+	if (ret == DMA_ERROR) {
+		goto ret_status;
+	} else if (ret == DMA_IN_PROGRESS) {
+		chan->status = EDMA_ST_BUSY;
+		goto ret_status;
+	} else {
+		/* DMA_COMPLETE */
+		if (chan->status == EDMA_ST_PAUSE)
+			ret = DMA_PAUSED;
+		else if (chan->status == EDMA_ST_BUSY)
+			ret = DMA_IN_PROGRESS;
+		else
+			ret = DMA_COMPLETE;
+	}
+
+ret_status:
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+	dma_set_residue(txstate, 0);
+
+	return ret;
+}
+
+static struct dma_async_tx_descriptor *
+dw_edma_device_prep_slave_sg(struct dma_chan *dchan, struct scatterlist *sgl,
+			     unsigned int sg_len,
+			     enum dma_transfer_direction direction,
+			     unsigned long flags, void *context)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	struct dw_edma_desc *desc;
+	struct dw_edma_chunk *chunk;
+	struct dw_edma_burst *burst;
+	struct scatterlist *sg;
+	unsigned long sflags;
+	phys_addr_t src_addr;
+	phys_addr_t dst_addr;
+	int i;
+
+	if (sg_len < 1) {
+		dev_err(chan2dev(chan), "invalid sg length %u\n", sg_len);
+		return NULL;
+	}
+
+	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
+		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
+	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
+		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
+	} else {
+		dev_err(chan2dev(chan), "invalid direction\n");
+		return NULL;
+	}
+
+	if (!chan->configured) {
+		dev_err(chan2dev(chan), "(prep_slave_sg) channel not configured\n");
+		return NULL;
+	}
+
+	if (chan->status != EDMA_ST_IDLE) {
+		dev_err(chan2dev(chan), "channel is busy or paused\n");
+		return NULL;
+	}
+
+	spin_lock_irqsave(&chan->vc.lock, sflags);
+
+	desc = dw_edma_alloc_desc(chan);
+	if (unlikely(!desc))
+		goto err_alloc;
+
+	chunk = dw_edma_alloc_chunk(desc);
+	if (unlikely(!chunk))
+		goto err_alloc;
+
+	src_addr = chan->src_addr;
+	dst_addr = chan->dst_addr;
+
+	for_each_sg(sgl, sg, sg_len, i) {
+		if (chunk->bursts_alloc == chan->ll_max) {
+			chunk = dw_edma_alloc_chunk(desc);
+			if (unlikely(!chunk))
+				goto err_alloc;
+		}
+
+		burst = dw_edma_alloc_burst(chunk);
+
+		if (unlikely(!burst))
+			goto err_alloc;
+
+		burst->sz = sg_dma_len(sg);
+		chunk->ll_region.sz += burst->sz;
+		desc->alloc_sz += burst->sz;
+
+		if (direction == DMA_MEM_TO_DEV) {
+			burst->sar = sg_dma_address(sg);
+			burst->dar = dst_addr;
+			dst_addr += sg_dma_len(sg);
+		} else {
+			burst->sar = src_addr;
+			burst->dar = sg_dma_address(sg);
+			src_addr += sg_dma_len(sg);
+		}
+
+		dev_dbg(chan2dev(chan), "lli %u/%u, sar=0x%.8llx, dar=0x%.8llx, size=%u bytes\n",
+			i + 1, sg_len, burst->sar, burst->dar, burst->sz);
+	}
+
+	spin_unlock_irqrestore(&chan->vc.lock, sflags);
+	return vchan_tx_prep(&chan->vc, &desc->vd, flags);
+
+err_alloc:
+	if (desc)
+		dw_edma_free_desc(desc);
+	spin_unlock_irqrestore(&chan->vc.lock, sflags);
+	return NULL;
+}
+
+static struct dma_async_tx_descriptor *
+dw_edma_device_prep_dma_cyclic(struct dma_chan *dchan, dma_addr_t buf_addr,
+			       size_t len, size_t cyclic_cnt,
+			       enum dma_transfer_direction direction,
+			       unsigned long flags)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	struct dw_edma_desc *desc;
+	struct dw_edma_chunk *chunk;
+	struct dw_edma_burst *burst;
+	unsigned long sflags;
+	phys_addr_t src_addr;
+	phys_addr_t dst_addr;
+	u32 i;
+
+	if (!len || !cyclic_cnt) {
+		dev_err(chan2dev(chan), "invalid len or cyclic count\n");
+		return NULL;
+	}
+
+	if (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE) {
+		dev_dbg(chan2dev(chan),	"prepare operation (WRITE)\n");
+	} else if (direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_READ) {
+		dev_dbg(chan2dev(chan),	"prepare operation (READ)\n");
+	} else {
+		dev_err(chan2dev(chan), "invalid direction\n");
+		return NULL;
+	}
+
+	if (!chan->configured) {
+		dev_err(chan2dev(chan), "(prep_dma_cyclic) channel not configured\n");
+		return NULL;
+	}
+
+	if (chan->status != EDMA_ST_IDLE) {
+		dev_err(chan2dev(chan), "channel is busy or paused\n");
+		return NULL;
+	}
+
+	spin_lock_irqsave(&chan->vc.lock, sflags);
+
+	desc = dw_edma_alloc_desc(chan);
+	if (unlikely(!desc))
+		goto err_alloc;
+
+	chunk = dw_edma_alloc_chunk(desc);
+	if (unlikely(!chunk))
+		goto err_alloc;
+
+	src_addr = chan->src_addr;
+	dst_addr = chan->dst_addr;
+
+	for (i = 0; i < cyclic_cnt; i++) {
+		if (chunk->bursts_alloc == chan->ll_max) {
+			chunk = dw_edma_alloc_chunk(desc);
+			if (unlikely(!chunk))
+				goto err_alloc;
+		}
+
+		burst = dw_edma_alloc_burst(chunk);
+
+		if (unlikely(!burst))
+			goto err_alloc;
+
+		burst->sz = len;
+		chunk->ll_region.sz += burst->sz;
+		desc->alloc_sz += burst->sz;
+
+		burst->sar = src_addr;
+		burst->dar = dst_addr;
+
+		dev_dbg(chan2dev(chan), "lli %u/%u, sar=0x%.8llx, dar=0x%.8llx, size=%u bytes\n",
+			i + 1, cyclic_cnt, burst->sar, burst->dar, burst->sz);
+	}
+
+	spin_unlock_irqrestore(&chan->vc.lock, sflags);
+	return vchan_tx_prep(&chan->vc, &desc->vd, flags);
+
+err_alloc:
+	if (desc)
+		dw_edma_free_desc(desc);
+	spin_unlock_irqrestore(&chan->vc.lock, sflags);
+	return NULL;
+}
+
+static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
+{
+	struct dw_edma *dw = chan->chip->dw;
+	const struct dw_edma_core_ops *ops = dw->ops;
+	struct virt_dma_desc *vd;
+	struct dw_edma_desc *desc;
+	unsigned long flags;
+
+	ops->clear_done_int(chan);
+	dev_dbg(chan2dev(chan), "clear done interrupt\n");
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+	vd = vchan_next_desc(&chan->vc);
+	switch (chan->request) {
+	case EDMA_REQ_NONE:
+		if (!vd)
+			break;
+
+		desc = vd2dw_edma_desc(vd);
+		if (desc->chunks_alloc) {
+			dev_dbg(chan2dev(chan), "sub-transfer complete\n");
+			chan->status = EDMA_ST_BUSY;
+			dev_dbg(chan2dev(chan), "transferred %u bytes\n",
+				desc->xfer_sz);
+			dw_edma_start_transfer(chan);
+		} else {
+			list_del(&vd->node);
+			vchan_cookie_complete(vd);
+			chan->status = EDMA_ST_IDLE;
+			dev_dbg(chan2dev(chan), "transfer complete\n");
+		}
+		break;
+	case EDMA_REQ_STOP:
+		if (!vd)
+			break;
+
+		list_del(&vd->node);
+		vchan_cookie_complete(vd);
+		chan->request = EDMA_REQ_NONE;
+		chan->status = EDMA_ST_IDLE;
+		chan->configured = false;
+		dev_dbg(chan2dev(chan), "transfer stop\n");
+		break;
+	case EDMA_REQ_PAUSE:
+		chan->request = EDMA_REQ_NONE;
+		chan->status = EDMA_ST_PAUSE;
+		break;
+	default:
+		dev_err(chan2dev(chan), "invalid status state\n");
+		break;
+	}
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static void dw_edma_abort_interrupt(struct dw_edma_chan *chan)
+{
+	struct dw_edma *dw = chan->chip->dw;
+	const struct dw_edma_core_ops *ops = dw->ops;
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+
+	ops->clear_abort_int(chan);
+	dev_dbg(chan2dev(chan), "clear abort interrupt\n");
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+	vd = vchan_next_desc(&chan->vc);
+	if (vd) {
+		list_del(&vd->node);
+		vchan_cookie_complete(vd);
+	}
+	chan->request = EDMA_REQ_NONE;
+	chan->status = EDMA_ST_IDLE;
+
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
+static irqreturn_t dw_edma_interrupt_write(int irq, void *data)
+{
+	struct dw_edma_chip *chip = data;
+	struct dw_edma *dw = chip->dw;
+	const struct dw_edma_core_ops *ops = dw->ops;
+	unsigned long tot = dw->wr_ch_cnt;
+	unsigned long pos, val;
+
+	pos = 0;
+	val = ops->status_done_int(dw, EDMA_DIR_WRITE);
+	while ((pos = find_next_bit(&val, tot, pos)) != tot) {
+		struct dw_edma_chan *chan = &dw->chan[pos];
+
+		dw_edma_done_interrupt(chan);
+		pos++;
+	}
+
+	pos = 0;
+	val = ops->status_abort_int(dw, EDMA_DIR_WRITE);
+	while ((pos = find_next_bit(&val, tot, pos)) != tot) {
+		struct dw_edma_chan *chan = &dw->chan[pos];
+
+		dw_edma_abort_interrupt(chan);
+		pos++;
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t dw_edma_interrupt_read(int irq, void *data)
+{
+	struct dw_edma_chip *chip = data;
+	struct dw_edma *dw = chip->dw;
+	const struct dw_edma_core_ops *ops = dw->ops;
+	unsigned long tot = dw->rd_ch_cnt;
+	unsigned long off = dw->wr_ch_cnt;
+	unsigned long pos, val;
+
+	pos = 0;
+	val = ops->status_done_int(dw, EDMA_DIR_READ);
+	while ((pos = find_next_bit(&val, tot, pos)) != tot) {
+		struct dw_edma_chan *chan = &dw->chan[pos + off];
+
+		dw_edma_done_interrupt(chan);
+		pos++;
+	}
+
+	pos = 0;
+	val = ops->status_abort_int(dw, EDMA_DIR_READ);
+	while ((pos = find_next_bit(&val, tot, pos)) != tot) {
+		struct dw_edma_chan *chan = &dw->chan[pos + off];
+
+		dw_edma_abort_interrupt(chan);
+		pos++;
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t dw_edma_interrupt_all(int irq, void *data)
+{
+	dw_edma_interrupt_write(irq, data);
+	dw_edma_interrupt_read(irq, data);
+
+	return IRQ_HANDLED;
+}
+
+static int dw_edma_alloc_chan_resources(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+
+	if (chan->status != EDMA_ST_IDLE) {
+		dev_err(chan2dev(chan), "channel is busy\n");
+		return -EBUSY;
+	}
+
+	dev_dbg(dchan2dev(dchan), "allocated\n");
+
+	pm_runtime_get(chan->chip->dev);
+
+	return 0;
+}
+
+static void dw_edma_free_chan_resources(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	unsigned long timeout = jiffies + msecs_to_jiffies(5000);
+	int ret;
+
+	if (chan->status != EDMA_ST_IDLE)
+		dev_err(chan2dev(chan), "channel is busy\n");
+
+	do {
+		ret = dw_edma_device_terminate_all(dchan);
+		if (!ret)
+			break;
+
+		if (time_after_eq(jiffies, timeout)) {
+			dev_err(chan2dev(chan), "free timeout\n");
+			return;
+		}
+
+		cpu_relax();
+	} while (1);
+
+	dev_dbg(dchan2dev(dchan), "channel freed\n");
+
+	pm_runtime_put(chan->chip->dev);
+}
+
+int dw_edma_probe(struct dw_edma_chip *chip)
+{
+	struct dw_edma *dw = chip->dw;
+	struct device *dev = chip->dev;
+	const struct dw_edma_core_ops *ops;
+	size_t ll_chunk = dw->ll_region.sz;
+	size_t dt_chunk = dw->dt_region.sz;
+	u32 ch_tot;
+	int i, j, err;
+
+	raw_spin_lock_init(&dw->lock);
+
+	/* Select the callback operations according to the eDMA version */
+	switch (dw->version) {
+	default:
+		dev_err(dev, "unsupported version\n");
+		return -EPERM;
+	}
+
+	pm_runtime_get_sync(dev);
+
+	/* Find out how many write channels are supported by hardware */
+	dw->wr_ch_cnt = ops->ch_count(dw, EDMA_DIR_WRITE);
+	if (!dw->wr_ch_cnt) {
+		dev_err(dev, "invalid number of write channels (0)\n");
+		return -EINVAL;
+	}
+
+	/* Find out how many read channels are supported by hardware */
+	dw->rd_ch_cnt = ops->ch_count(dw, EDMA_DIR_READ);
+	if (!dw->rd_ch_cnt) {
+		dev_err(dev, "invalid number of read channels (0)\n");
+		return -EINVAL;
+	}
+
+	dev_dbg(dev, "Channels:\twrite=%d, read=%d\n",
+		dw->wr_ch_cnt, dw->rd_ch_cnt);
+
+	ch_tot = dw->wr_ch_cnt + dw->rd_ch_cnt;
+
+	/* Allocate channels */
+	dw->chan = devm_kcalloc(dev, ch_tot, sizeof(*dw->chan), GFP_KERNEL);
+	if (!dw->chan)
+		return -ENOMEM;
+
+	/* Calculate the linked list chunk for each channel */
+	ll_chunk /= roundup_pow_of_two(ch_tot);
+
+	/* Calculate the data region chunk for each channel */
+	dt_chunk /= roundup_pow_of_two(ch_tot);
+
+	/* Disable eDMA, only to establish the ideal initial conditions */
+	ops->off(dw);
+
+	snprintf(dw->name, sizeof(dw->name), "dw-edma-core:%d", chip->id);
+
+	/* Request IRQs */
+	if (dw->nr_irqs != 1) {
+		dev_err(dev, "invalid number of irqs (%d)\n", dw->nr_irqs);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < dw->nr_irqs; i++) {
+		err = devm_request_irq(dev, pci_irq_vector(to_pci_dev(dev), i),
+				       dw_edma_interrupt_all,
+				       IRQF_SHARED, dw->name, chip);
+		if (err)
+			return err;
+	}
+
+	/* Create write channels */
+	INIT_LIST_HEAD(&dw->wr_edma.channels);
+	for (i = 0; i < dw->wr_ch_cnt; i++) {
+		struct dw_edma_chan *chan = &dw->chan[i];
+		struct dw_edma_region *dt_region;
+
+		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
+		if (!dt_region)
+			return -ENOMEM;
+
+		chan->vc.chan.private = dt_region;
+
+		chan->chip = chip;
+		chan->id = i;
+		chan->dir = EDMA_DIR_WRITE;
+		chan->configured = false;
+		chan->request = EDMA_REQ_NONE;
+		chan->status = EDMA_ST_IDLE;
+
+		chan->ll_off = (ll_chunk * i);
+		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
+
+		chan->dt_off = (dt_chunk * i);
+
+		dev_dbg(dev, "L. List:\tChannel write[%u] off=0x%.8lx, max_cnt=%u\n",
+			i, chan->ll_off, chan->ll_max);
+
+		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
+
+		dev_dbg(dev, "MSI:\t\tChannel write[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
+			i, chan->msi.address_hi, chan->msi.address_lo,
+			chan->msi.data);
+
+		chan->vc.desc_free = vchan_free_desc;
+		vchan_init(&chan->vc, &dw->wr_edma);
+
+		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
+		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
+		dt_region->sz = dt_chunk;
+
+		dev_dbg(dev, "Data:\tChannel write[%u] off=0x%.8lx\n",
+			i, chan->dt_off);
+	}
+	dma_cap_zero(dw->wr_edma.cap_mask);
+	dma_cap_set(DMA_SLAVE, dw->wr_edma.cap_mask);
+	dma_cap_set(DMA_CYCLIC, dw->wr_edma.cap_mask);
+	dw->wr_edma.directions = BIT(DMA_DEV_TO_MEM);
+	dw->wr_edma.chancnt = dw->wr_ch_cnt;
+
+	/* Create read channels */
+	INIT_LIST_HEAD(&dw->rd_edma.channels);
+	for (j = 0; j < dw->rd_ch_cnt; j++, i++) {
+		struct dw_edma_chan *chan = &dw->chan[i];
+		struct dw_edma_region *dt_region;
+
+		dt_region = devm_kzalloc(dev, sizeof(*dt_region), GFP_KERNEL);
+		if (!dt_region)
+			return -ENOMEM;
+
+		chan->vc.chan.private = dt_region;
+
+		chan->chip = chip;
+		chan->id = j;
+		chan->dir = EDMA_DIR_READ;
+		chan->configured = false;
+		chan->request = EDMA_REQ_NONE;
+		chan->status = EDMA_ST_IDLE;
+
+		chan->ll_off = (ll_chunk * i);
+		chan->ll_max = (ll_chunk / EDMA_LL_SZ) - 1;
+
+		chan->dt_off = (dt_chunk * i);
+
+		dev_dbg(dev, "L. List:\tChannel read[%u] off=0x%.8lx, max_cnt=%u\n",
+			j, chan->ll_off, chan->ll_max);
+
+		memcpy(&chan->msi, &dw->msi[0], sizeof(chan->msi));
+
+		dev_dbg(dev, "MSI:\t\tChannel read[%u] addr=0x%.8x%.8x, data=0x%.8x\n",
+			j, chan->msi.address_hi, chan->msi.address_lo,
+			chan->msi.data);
+
+		chan->vc.desc_free = vchan_free_desc;
+		vchan_init(&chan->vc, &dw->rd_edma);
+
+		dt_region->paddr = dw->dt_region.paddr + chan->dt_off;
+		dt_region->vaddr = dw->dt_region.vaddr + chan->dt_off;
+		dt_region->sz = dt_chunk;
+
+		dev_dbg(dev, "Data:\tChannel read[%u] off=0x%.8lx\n",
+			j, chan->dt_off);
+	}
+	dma_cap_zero(dw->rd_edma.cap_mask);
+	dma_cap_set(DMA_SLAVE, dw->rd_edma.cap_mask);
+	dma_cap_set(DMA_CYCLIC, dw->rd_edma.cap_mask);
+	dw->rd_edma.directions = BIT(DMA_MEM_TO_DEV);
+	dw->rd_edma.chancnt = dw->rd_ch_cnt;
+
+	/* Set DMA channel capabilities */
+	SET_BOTH_CH(src_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
+	SET_BOTH_CH(dst_addr_widths, BIT(DMA_SLAVE_BUSWIDTH_4_BYTES));
+	SET_BOTH_CH(residue_granularity, DMA_RESIDUE_GRANULARITY_DESCRIPTOR);
+
+	SET_BOTH_CH(dev, dev);
+
+	SET_BOTH_CH(device_alloc_chan_resources, dw_edma_alloc_chan_resources);
+	SET_BOTH_CH(device_free_chan_resources, dw_edma_free_chan_resources);
+
+	SET_BOTH_CH(device_config, dw_edma_device_config);
+	SET_BOTH_CH(device_pause, dw_edma_device_pause);
+	SET_BOTH_CH(device_resume, dw_edma_device_resume);
+	SET_BOTH_CH(device_terminate_all, dw_edma_device_terminate_all);
+	SET_BOTH_CH(device_issue_pending, dw_edma_device_issue_pending);
+	SET_BOTH_CH(device_tx_status, dw_edma_device_tx_status);
+	SET_BOTH_CH(device_prep_slave_sg, dw_edma_device_prep_slave_sg);
+	SET_BOTH_CH(device_prep_dma_cyclic, dw_edma_device_prep_dma_cyclic);
+
+	/* Power management */
+	pm_runtime_enable(dev);
+
+	/* Register DMA device */
+	err = dma_async_device_register(&dw->wr_edma);
+	if (err)
+		goto err_pm_disable;
+
+	err = dma_async_device_register(&dw->rd_edma);
+	if (err)
+		goto err_pm_disable;
+
+	/* Turn debugfs on */
+	err = ops->debugfs_on(chip);
+	if (err) {
+		dev_err(dev, "unable to create debugfs structure\n");
+		goto err_pm_disable;
+	}
+
+	dev_info(dev, "DesignWare eDMA controller driver loaded completely\n");
+
+	return 0;
+
+err_pm_disable:
+	pm_runtime_disable(dev);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(dw_edma_probe);
+
+int dw_edma_remove(struct dw_edma_chip *chip)
+{
+	struct dw_edma *dw = chip->dw;
+	struct device *dev = chip->dev;
+	const struct dw_edma_core_ops *ops = dw->ops;
+	struct dw_edma_chan *chan, *_chan;
+	int i;
+
+	/* Disable eDMA */
+	if (ops)
+		ops->off(dw);
+
+	/* Free irqs */
+	for (i = 0; i < dw->nr_irqs; i++) {
+		if (pci_irq_vector(to_pci_dev(dev), i) < 0)
+			continue;
+
+		devm_free_irq(dev, pci_irq_vector(to_pci_dev(dev), i), chip);
+	}
+
+	/* Power management */
+	pm_runtime_disable(dev);
+
+	list_for_each_entry_safe(chan, _chan, &dw->wr_edma.channels,
+				 vc.chan.device_node) {
+		list_del(&chan->vc.chan.device_node);
+		tasklet_kill(&chan->vc.task);
+	}
+
+	list_for_each_entry_safe(chan, _chan, &dw->rd_edma.channels,
+				 vc.chan.device_node) {
+		list_del(&chan->vc.chan.device_node);
+		tasklet_kill(&chan->vc.task);
+	}
+
+	/* Deregister eDMA device */
+	dma_async_device_unregister(&dw->wr_edma);
+	dma_async_device_unregister(&dw->rd_edma);
+
+	/* Turn debugfs off */
+	if (ops)
+		ops->debugfs_off();
+
+	dev_info(dev, "DesignWare eDMA controller driver unloaded completely\n");
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_remove);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
+MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
new file mode 100644
index 0000000..2d98a10
--- /dev/null
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -0,0 +1,151 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * Synopsys DesignWare eDMA core driver
+ */
+
+#ifndef _DW_EDMA_CORE_H
+#define _DW_EDMA_CORE_H
+
+#include <linux/msi.h>
+#include <linux/dma/edma.h>
+
+#include "../virt-dma.h"
+
+#define EDMA_LL_SZ					24
+
+enum dw_edma_dir {
+	EDMA_DIR_WRITE = 0,
+	EDMA_DIR_READ
+};
+
+enum dw_edma_mode {
+	EDMA_MODE_LEGACY = 0,
+	EDMA_MODE_UNROLL
+};
+
+enum dw_edma_request {
+	EDMA_REQ_NONE = 0,
+	EDMA_REQ_STOP,
+	EDMA_REQ_PAUSE
+};
+
+enum dw_edma_status {
+	EDMA_ST_IDLE = 0,
+	EDMA_ST_PAUSE,
+	EDMA_ST_BUSY
+};
+
+struct dw_edma_chan;
+struct dw_edma_chunk;
+
+struct dw_edma_core_ops {
+	/* eDMA management callbacks */
+	void (*off)(struct dw_edma *dw);
+	u16 (*ch_count)(struct dw_edma *dw, enum dw_edma_dir dir);
+	enum dma_status (*ch_status)(struct dw_edma_chan *chan);
+	void (*clear_done_int)(struct dw_edma_chan *chan);
+	void (*clear_abort_int)(struct dw_edma_chan *chan);
+	u32 (*status_done_int)(struct dw_edma *dw, enum dw_edma_dir dir);
+	u32 (*status_abort_int)(struct dw_edma *dw, enum dw_edma_dir dir);
+	void (*start)(struct dw_edma_chunk *chunk, bool first);
+	int (*device_config)(struct dma_chan *dchan);
+	/* eDMA debug fs callbacks */
+	int (*debugfs_on)(struct dw_edma_chip *chip);
+	void (*debugfs_off)(void);
+};
+
+struct dw_edma_burst {
+	struct list_head		list;
+	u64				sar;
+	u64				dar;
+	u32				sz;
+};
+
+struct dw_edma_region {
+	phys_addr_t			paddr;
+	dma_addr_t			vaddr;
+	size_t				sz;
+};
+
+struct dw_edma_chunk {
+	struct list_head		list;
+	struct dw_edma_chan		*chan;
+	struct dw_edma_burst		*burst;
+
+	u32				bursts_alloc;
+
+	u8				cb;
+	struct dw_edma_region		ll_region;	/* Linked list */
+};
+
+struct dw_edma_desc {
+	struct virt_dma_desc		vd;
+	struct dw_edma_chan		*chan;
+	struct dw_edma_chunk		*chunk;
+
+	u32				chunks_alloc;
+
+	u32				alloc_sz;
+	u32				xfer_sz;
+};
+
+struct dw_edma_chan {
+	struct virt_dma_chan		vc;
+	struct dw_edma_chip		*chip;
+	int				id;
+	enum dw_edma_dir		dir;
+
+	off_t				ll_off;
+	u32				ll_max;
+
+	off_t				dt_off;
+
+	struct msi_msg			msi;
+
+	enum dw_edma_request		request;
+	enum dw_edma_status		status;
+	u8				configured;
+
+	phys_addr_t			src_addr;
+	phys_addr_t			dst_addr;
+};
+
+struct dw_edma {
+	char				name[20];
+
+	struct dma_device		wr_edma;
+	u16				wr_ch_cnt;
+
+	struct dma_device		rd_edma;
+	u16				rd_ch_cnt;
+
+	struct dw_edma_region		rg_region;	/* Registers */
+	struct dw_edma_region		ll_region;	/* Linked list */
+	struct dw_edma_region		dt_region;	/* Data */
+
+	struct msi_msg			*msi;
+	int				nr_irqs;
+
+	u32				version;
+	enum dw_edma_mode		mode;
+
+	struct dw_edma_chan		*chan;
+	const struct dw_edma_core_ops	*ops;
+
+	raw_spinlock_t			lock;		/* Only for legacy */
+};
+
+static inline
+struct dw_edma_chan *vc2dw_edma_chan(struct virt_dma_chan *vc)
+{
+	return container_of(vc, struct dw_edma_chan, vc);
+}
+
+static inline
+struct dw_edma_chan *dchan2dw_edma_chan(struct dma_chan *dchan)
+{
+	return vc2dw_edma_chan(to_virt_chan(dchan));
+}
+
+#endif /* _DW_EDMA_CORE_H */
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
new file mode 100644
index 0000000..349e542
--- /dev/null
+++ b/include/linux/dma/edma.h
@@ -0,0 +1,44 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * Synopsys DesignWare eDMA core driver
+ */
+
+#ifndef _DW_EDMA_H
+#define _DW_EDMA_H
+
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+
+struct dw_edma;
+
+/**
+ * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
+ * @dev:		 struct device of the eDMA controller
+ * @id:			 instance ID
+ * @irq:		 irq line
+ * @dw:			 struct dw_edma that is filled by dw_edma_probe()
+ */
+struct dw_edma_chip {
+	struct device		*dev;
+	int			id;
+	int			irq;
+	struct dw_edma		*dw;
+};
+
+/* Export to the platform drivers */
+#if IS_ENABLED(CONFIG_DW_EDMA)
+int dw_edma_probe(struct dw_edma_chip *chip);
+int dw_edma_remove(struct dw_edma_chip *chip);
+#else
+static inline int dw_edma_probe(struct dw_edma_chip *chip)
+{
+	return -ENODEV;
+}
+
+static inline int dw_edma_remove(struct dw_edma_chip *chip)
+{
+	return 0;
+}
+#endif /* CONFIG_DW_EDMA */
+
+#endif /* _DW_EDMA_H */
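
For context, the exported `dw_edma_probe()`/`dw_edma_remove()` pair above is meant to be driven by a glue driver (e.g. the PCIe glue added later in this series). A minimal sketch of such a consumer is shown below; it is illustrative only and not part of this patch — `my_glue_probe()` and the way the regions/IRQ count are filled in are assumptions, not an API defined here:

```c
/* Hypothetical glue-driver sketch (not part of this patch). */
#include <linux/dma/edma.h>
#include <linux/pci.h>

static int my_glue_probe(struct pci_dev *pdev)
{
	struct dw_edma_chip *chip;
	struct dw_edma *dw;
	int err;

	chip = devm_kzalloc(&pdev->dev, sizeof(*chip), GFP_KERNEL);
	dw = devm_kzalloc(&pdev->dev, sizeof(*dw), GFP_KERNEL);
	if (!chip || !dw)
		return -ENOMEM;

	chip->dev = &pdev->dev;
	chip->id = pdev->devfn;
	chip->dw = dw;

	/*
	 * Before handing the chip to the core, the glue driver is
	 * expected to fill in dw->rg_region/ll_region/dt_region,
	 * dw->version, dw->mode, dw->nr_irqs and dw->msi from the
	 * device's BARs and MSI(-X) configuration.
	 */
	err = dw_edma_probe(chip);
	if (err)
		return err;

	return 0;
}
```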