From: Thierry Reding The Video Protection Region (VPR) found on NVIDIA Tegra chips is a region of memory that is protected from CPU accesses. It is used to decode and play back DRM protected content. It is a standard reserved memory region that can exist in two forms: static VPR where the base address and size are fixed (uses the "reg" property to describe the memory) and a resizable VPR where only the size is known upfront and the OS can allocate it wherever it can be accommodated. Reviewed-by: Rob Herring (Arm) Signed-off-by: Thierry Reding --- .../nvidia,tegra-video-protection-region.yaml | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.yaml diff --git a/Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.yaml b/Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.yaml new file mode 100644 index 000000000000..c13292a791bb --- /dev/null +++ b/Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.yaml @@ -0,0 +1,55 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/reserved-memory/nvidia,tegra-video-protection-region.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NVIDIA Tegra Video Protection Region (VPR) + +maintainers: + - Thierry Reding + - Jon Hunter + +description: | + NVIDIA Tegra chips have long supported a mechanism to protect a single, + contiguous memory region from non-secure memory accesses. Typically this + region is used for decoding and playback of DRM protected content. Various + devices, such as the display controller and multimedia engines (video + decoder) can access this region in a secure way. Access from the CPU is + generally forbidden. + + Two variants exist for VPR: one is fixed in both the base address and size, + while the other is resizable. Fixed VPR can be described by just a "reg" + property specifying the base address and size, whereas the resizable VPR + is defined by a size/alignment pair of properties. For resizable VPR the + memory is reusable by the rest of the system when it's unused for VPR and + therefore the "reusable" property must be specified along with it. For a + fixed VPR, the memory is permanently protected, and therefore it's not + reusable and must also be marked as "no-map" to prevent any (including + speculative) accesses to it. + +allOf: + - $ref: reserved-memory.yaml + +properties: + compatible: + const: nvidia,tegra-video-protection-region + +dependencies: + size: [alignment, reusable] + alignment: [size, reusable] + reusable: [alignment, size] + + reg: [no-map] + no-map: [reg] + +unevaluatedProperties: false + +oneOf: + - required: + - compatible + - reg + + - required: + - compatible + - size -- 2.52.0 From: Thierry Reding Add the memory-region and memory-region-names properties to the bindings for the display controllers and the host1x engine found on various Tegra generations. These memory regions are used to access firmware-provided framebuffer memory as well as the video protection region.
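For reference, a consumer driver is expected to hook itself up to the named regions through the standard reserved-memory helpers. A minimal sketch follows; the "foo" driver and its probe routine are hypothetical and not part of this series:

#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	int err;

	/* bind this device to the region listed as "protected" in memory-region-names */
	err = of_reserved_mem_device_init_by_name(&pdev->dev, pdev->dev.of_node,
						  "protected");
	if (err < 0)
		return err;

	/* ... regular probe work ... */

	return 0;
}

static void foo_teardown(struct platform_device *pdev)
{
	/* drop the association again when the device goes away */
	of_reserved_mem_device_release(&pdev->dev);
}

For the VPR region, this lookup is what eventually triggers the reserved-memory device_init()/device_release() callbacks added later in this series.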
Signed-off-by: Thierry Reding --- .../bindings/display/tegra/nvidia,tegra186-dc.yaml | 10 ++++++++++ .../bindings/display/tegra/nvidia,tegra20-dc.yaml | 10 +++++++++- .../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 7 +++++++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml index ce4589466a18..881bfbf4764d 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml @@ -57,6 +57,16 @@ properties: - const: dma-mem # read-0 - const: read-1 + memory-region: + minItems: 1 + maxItems: 2 + + memory-region-names: + items: + enum: [ framebuffer, protected ] + minItems: 1 + maxItems: 2 + nvidia,outputs: description: A list of phandles of outputs that this display controller can drive. diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml index 69be95afd562..a012644eeb7d 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml @@ -65,7 +65,15 @@ properties: items: - description: phandle to the core power domain - memory-region: true + memory-region: + minItems: 1 + maxItems: 2 + + memory-region-names: + items: + enum: [ framebuffer, protected ] + minItems: 1 + maxItems: 2 nvidia,head: $ref: /schemas/types.yaml#/definitions/uint32 diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml index 3563378a01af..f45be30835a8 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml @@ -96,6 +96,13 @@ properties: items: - description: phandle to the HEG or core power domain + memory-region: + maxItems: 1 + + memory-region-names: + items: + - const: protected + required: - compatible - interrupts -- 2.52.0 From: Thierry Reding This is similar to bitmap_allocate_region() but allows allocation of non-power of two pages/bits. While at it, reimplement bitmap_allocate_region() in terms of this new helper to remove a sliver of code duplication. Signed-off-by: Thierry Reding --- include/linux/bitmap.h | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index b0395e4ccf90..0fc259908262 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -673,10 +673,10 @@ void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order) } /** - * bitmap_allocate_region - allocate bitmap region + * bitmap_allocate - allocate bitmap region * @bitmap: array of unsigned longs corresponding to the bitmap * @pos: beginning of bit region to allocate - * @order: region size (log base 2 of number of bits) to allocate + * @len: number of bits to allocate * * Allocate (set bits in) a specified region of a bitmap. * * Returns: 0 on success, or %-EBUSY if specified region wasn't * free (not all bits were zero).
*/ static __always_inline -int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order) +int bitmap_allocate(unsigned long *bitmap, unsigned int pos, unsigned int len) { - unsigned int len = BIT(order); - if (find_next_bit(bitmap, pos + len, pos) < pos + len) return -EBUSY; bitmap_set(bitmap, pos, len); return 0; } +/** + * bitmap_allocate_region - allocate bitmap region + * @bitmap: array of unsigned longs corresponding to the bitmap + * @pos: beginning of bit region to allocate + * @order: region size (log base 2 of number of bits) to allocate + * + * Allocate (set bits in) a specified region of a bitmap. + * + * Returns: 0 on success, or %-EBUSY if specified region wasn't + * free (not all bits were zero). + */ +static __always_inline +int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order) +{ + return bitmap_allocate(bitmap, pos, BIT(order)); +} + /** * bitmap_find_free_region - find a contiguous aligned mem region * @bitmap: array of unsigned longs corresponding to the bitmap -- 2.52.0 From: Thierry Reding There is no technical reason why there should be a limited number of CMA regions, so extract some code into helpers and use them to create extra functions (cma_create() and cma_free()) that allow creating and freeing, respectively, CMA regions dynamically at runtime. The static array of CMA areas cannot be replaced by dynamically created areas because for many of them, allocation must not fail and some cases may need to initialize them before the slab allocator is even available. To account for this, keep these "early" areas in a separate list and track the dynamic areas in a separate list. Signed-off-by: Thierry Reding --- Changes in v2: - rename fixed number of CMA areas to reflect their main use - account for pages in dynamically allocated regions --- arch/arm/mm/dma-mapping.c | 2 +- arch/s390/mm/init.c | 2 +- drivers/dma-buf/heaps/cma_heap.c | 2 +- include/linux/cma.h | 7 +- mm/cma.c | 187 +++++++++++++++++++++++++------ mm/cma.h | 5 +- 6 files changed, 164 insertions(+), 41 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index a4c765d24692..88768dbd9cd6 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -254,7 +254,7 @@ struct dma_contig_early_reserve { unsigned long size; }; -static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata; +static struct dma_contig_early_reserve dma_mmu_remap[MAX_EARLY_CMA_AREAS] __initdata; static int dma_mmu_remap_num __initdata; diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 3c20475cbee2..de420ebdfd23 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -241,7 +241,7 @@ static int s390_cma_mem_notifier(struct notifier_block *nb, mem_data.start = arg->start_pfn << PAGE_SHIFT; mem_data.end = mem_data.start + (arg->nr_pages << PAGE_SHIFT); if (action == MEM_GOING_OFFLINE) - rc = cma_for_each_area(s390_cma_check_range, &mem_data); + rc = cma_for_each_early_area(s390_cma_check_range, &mem_data); return notifier_from_errno(rc); } diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c index 49cc45fb42dd..4c20e11dd286 100644 --- a/drivers/dma-buf/heaps/cma_heap.c +++ b/drivers/dma-buf/heaps/cma_heap.c @@ -30,7 +30,7 @@ #define DEFAULT_CMA_NAME "default_cma_region" -static struct cma *dma_areas[MAX_CMA_AREAS] __initdata; +static struct cma *dma_areas[MAX_EARLY_CMA_AREAS] __initdata; static unsigned int dma_areas_num __initdata; int __init dma_heap_cma_register_heap(struct cma *cma) diff --git 
a/include/linux/cma.h b/include/linux/cma.h index e2a690f7e77e..763c9ad0c556 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -7,7 +7,7 @@ #include #ifdef CONFIG_CMA_AREAS -#define MAX_CMA_AREAS CONFIG_CMA_AREAS +#define MAX_EARLY_CMA_AREAS CONFIG_CMA_AREAS #endif #define CMA_MAX_NAME 64 @@ -57,9 +57,14 @@ struct page *cma_alloc_frozen_compound(struct cma *cma, unsigned int order); bool cma_release_frozen(struct cma *cma, const struct page *pages, unsigned long count); +extern int cma_for_each_early_area(int (*it)(struct cma *cma, void *data), void *data); extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); extern bool cma_intersects(struct cma *cma, unsigned long start, unsigned long end); extern void cma_reserve_pages_on_error(struct cma *cma); +struct cma *cma_create(phys_addr_t base, phys_addr_t size, + unsigned int order_per_bit, const char *name); +void cma_free(struct cma *cma); + #endif diff --git a/mm/cma.c b/mm/cma.c index b80b60ed4927..da32eb565f24 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -33,7 +33,12 @@ #include "internal.h" #include "cma.h" -struct cma cma_areas[MAX_CMA_AREAS]; +static DEFINE_MUTEX(cma_lock); + +struct cma cma_early_areas[MAX_EARLY_CMA_AREAS]; +unsigned int cma_early_area_count; + +static LIST_HEAD(cma_areas); unsigned int cma_area_count; phys_addr_t cma_get_base(const struct cma *cma) @@ -193,7 +198,6 @@ static void __init cma_activate_area(struct cma *cma) free_reserved_page(pfn_to_page(pfn)); } } - totalcma_pages -= cma->count; cma->available_count = cma->count = 0; pr_err("CMA area %s could not be activated\n", cma->name); } @@ -202,8 +206,8 @@ static int __init cma_init_reserved_areas(void) { int i; - for (i = 0; i < cma_area_count; i++) - cma_activate_area(&cma_areas[i]); + for (i = 0; i < cma_early_area_count; i++) + cma_activate_area(&cma_early_areas[i]); return 0; } @@ -214,41 +218,77 @@ void __init cma_reserve_pages_on_error(struct cma *cma) set_bit(CMA_RESERVE_PAGES_ON_ERROR, &cma->flags); } +static void __init cma_init_area(struct cma *cma, const char *name, + phys_addr_t size, unsigned int order_per_bit) +{ + if (name) + snprintf(cma->name, CMA_MAX_NAME, "%s", name); + else + snprintf(cma->name, CMA_MAX_NAME, "cma%d\n", cma_area_count); + + cma->available_count = cma->count = size >> PAGE_SHIFT; + cma->order_per_bit = order_per_bit; + + INIT_LIST_HEAD(&cma->node); +} + static int __init cma_new_area(const char *name, phys_addr_t size, unsigned int order_per_bit, struct cma **res_cma) { struct cma *cma; - if (cma_area_count == ARRAY_SIZE(cma_areas)) { + if (cma_early_area_count == ARRAY_SIZE(cma_early_areas)) { pr_err("Not enough slots for CMA reserved regions!\n"); return -ENOSPC; } + mutex_lock(&cma_lock); + /* * Each reserved area must be initialised later, when more kernel * subsystems (like slab allocator) are available. 
*/ - cma = &cma_areas[cma_area_count]; - cma_area_count++; + cma = &cma_early_areas[cma_early_area_count]; + cma_early_area_count++; - if (name) - snprintf(cma->name, CMA_MAX_NAME, "%s", name); - else - snprintf(cma->name, CMA_MAX_NAME, "cma%d\n", cma_area_count); + cma_init_area(cma, name, size, order_per_bit); - cma->available_count = cma->count = size >> PAGE_SHIFT; - cma->order_per_bit = order_per_bit; - *res_cma = cma; totalcma_pages += cma->count; + *res_cma = cma; + + mutex_unlock(&cma_lock); return 0; } static void __init cma_drop_area(struct cma *cma) { + mutex_lock(&cma_lock); totalcma_pages -= cma->count; - cma_area_count--; + cma_early_area_count--; + mutex_unlock(&cma_lock); +} + +static int __init cma_check_memory(phys_addr_t base, phys_addr_t size) +{ + if (!size || !memblock_is_region_reserved(base, size)) + return -EINVAL; + + /* + * CMA uses CMA_MIN_ALIGNMENT_BYTES as alignment requirement which + * needs pageblock_order to be initialized. Let's enforce it. + */ + if (!pageblock_order) { + pr_err("pageblock_order not yet initialized. Called during early boot?\n"); + return -EINVAL; + } + + /* ensure minimal alignment required by mm core */ + if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES)) + return -EINVAL; + + return 0; } /** @@ -271,22 +311,9 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, struct cma *cma; int ret; - /* Sanity checks */ - if (!size || !memblock_is_region_reserved(base, size)) - return -EINVAL; - - /* - * CMA uses CMA_MIN_ALIGNMENT_BYTES as alignment requirement which - * needs pageblock_order to be initialized. Let's enforce it. - */ - if (!pageblock_order) { - pr_err("pageblock_order not yet initialized. Called during early boot?\n"); - return -EINVAL; - } - - /* ensure minimal alignment required by mm core */ - if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES)) - return -EINVAL; + ret = cma_check_memory(base, size); + if (ret < 0) + return ret; ret = cma_new_area(name, size, order_per_bit, &cma); if (ret != 0) @@ -439,7 +466,7 @@ static int __init __cma_declare_contiguous_nid(phys_addr_t *basep, pr_debug("%s(size %pa, base %pa, limit %pa alignment %pa)\n", __func__, &size, &base, &limit, &alignment); - if (cma_area_count == ARRAY_SIZE(cma_areas)) { + if (cma_early_area_count == ARRAY_SIZE(cma_early_areas)) { pr_err("Not enough slots for CMA reserved regions!\n"); return -ENOSPC; } @@ -1041,12 +1068,12 @@ bool cma_release_frozen(struct cma *cma, const struct page *pages, return true; } -int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) +int cma_for_each_early_area(int (*it)(struct cma *cma, void *data), void *data) { int i; - for (i = 0; i < cma_area_count; i++) { - int ret = it(&cma_areas[i], data); + for (i = 0; i < cma_early_area_count; i++) { + int ret = it(&cma_early_areas[i], data); if (ret) return ret; @@ -1055,6 +1082,25 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) return 0; } +int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) +{ + struct cma *cma; + + mutex_lock(&cma_lock); + + list_for_each_entry(cma, &cma_areas, node) { + int ret = it(cma, data); + + if (ret) { + mutex_unlock(&cma_lock); + return ret; + } + } + + mutex_unlock(&cma_lock); + return 0; +} + bool cma_intersects(struct cma *cma, unsigned long start, unsigned long end) { int r; @@ -1137,3 +1183,74 @@ void __init *cma_reserve_early(struct cma *cma, unsigned long size) return ret; } + +struct cma *__init cma_create(phys_addr_t base, phys_addr_t size, + unsigned int 
order_per_bit, const char *name) +{ + struct cma *cma; + int ret; + + ret = cma_check_memory(base, size); + if (ret < 0) + return ERR_PTR(ret); + + cma = kzalloc(sizeof(*cma), GFP_KERNEL); + if (!cma) + return ERR_PTR(-ENOMEM); + + cma_init_area(cma, name, size, order_per_bit); + cma->ranges[0].base_pfn = PFN_DOWN(base); + cma->ranges[0].early_pfn = PFN_DOWN(base); + cma->ranges[0].count = cma->count; + cma->nranges = 1; + + cma_activate_area(cma); + + mutex_lock(&cma_lock); + list_add_tail(&cma->node, &cma_areas); + totalcma_pages += cma->count; + cma_area_count++; + mutex_unlock(&cma_lock); + + return cma; +} + +void cma_free(struct cma *cma) +{ + unsigned int i; + + /* + * Safety check to prevent a CMA with active allocations from being + * released. + */ + for (i = 0; i < cma->nranges; i++) { + unsigned long nbits = cma_bitmap_maxno(cma, &cma->ranges[i]); + + if (!bitmap_empty(cma->ranges[i].bitmap, nbits)) { + WARN(1, "%s: range %u not empty\n", cma->name, i); + return; + } + } + + /* free reserved pages and the bitmap */ + for (i = 0; i < cma->nranges; i++) { + struct cma_memrange *cmr = &cma->ranges[i]; + unsigned long end_pfn, pfn; + + end_pfn = cmr->base_pfn + cmr->count; + for (pfn = cmr->base_pfn; pfn < end_pfn; pfn++) + free_reserved_page(pfn_to_page(pfn)); + + bitmap_free(cmr->bitmap); + } + + mutex_destroy(&cma->alloc_mutex); + + mutex_lock(&cma_lock); + totalcma_pages -= cma->count; + list_del(&cma->node); + cma_area_count--; + mutex_unlock(&cma_lock); + + kfree(cma); +} diff --git a/mm/cma.h b/mm/cma.h index c70180c36559..ae4db9819e38 100644 --- a/mm/cma.h +++ b/mm/cma.h @@ -41,6 +41,7 @@ struct cma { unsigned long available_count; unsigned int order_per_bit; /* Order of pages represented by one bit */ spinlock_t lock; + struct list_head node; struct mutex alloc_mutex; #ifdef CONFIG_CMA_DEBUGFS struct hlist_head mem_head; @@ -71,8 +72,8 @@ enum cma_flags { CMA_ACTIVATED, }; -extern struct cma cma_areas[MAX_CMA_AREAS]; -extern unsigned int cma_area_count; +extern struct cma cma_early_areas[MAX_EARLY_CMA_AREAS]; +extern unsigned int cma_early_area_count; static inline unsigned long cma_bitmap_maxno(struct cma *cma, struct cma_memrange *cmr) -- 2.52.0 From: Thierry Reding Add a callback to struct dma_heap_ops that heap providers can implement to show information about the state of the heap in debugfs. A top-level directory named "dma_heap" is created in debugfs and individual files will be named after the heaps. 
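To illustrate the intended use of the new callback, here is a minimal provider-side sketch; the "example" heap, its private data and the stubbed-out allocate() are made up for illustration and are not part of this series:

#include <linux/dma-buf.h>
#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/seq_file.h>

struct example_heap {
	size_t allocated;
	unsigned int num_buffers;
};

static struct dma_buf *example_heap_allocate(struct dma_heap *heap,
					     unsigned long len, u32 fd_flags,
					     u64 heap_flags)
{
	/* real allocation elided from this sketch */
	return ERR_PTR(-EOPNOTSUPP);
}

static int example_heap_show(struct seq_file *s, struct dma_heap *heap)
{
	struct example_heap *priv = dma_heap_get_drvdata(heap);

	/* dump whatever per-heap statistics the provider keeps */
	seq_printf(s, "allocated: %zu bytes in %u buffers\n",
		   priv->allocated, priv->num_buffers);

	return 0;
}

static const struct dma_heap_ops example_heap_ops = {
	.allocate = example_heap_allocate,
	.show = example_heap_show,
};

Once such a heap is registered via dma_heap_add(), reading the file named after the heap in the "dma_heap" debugfs directory invokes the provider's show() callback.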
Signed-off-by: Thierry Reding --- drivers/dma-buf/dma-heap.c | 56 ++++++++++++++++++++++++++++++++++++++ include/linux/dma-heap.h | 2 ++ 2 files changed, 58 insertions(+) diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index d230ddeb24e0..9784fa74ce53 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -7,6 +7,7 @@ */ #include +#include #include #include #include @@ -223,6 +224,46 @@ const char *dma_heap_get_name(struct dma_heap *heap) } EXPORT_SYMBOL_NS_GPL(dma_heap_get_name, "DMA_BUF_HEAP"); +#ifdef CONFIG_DEBUG_FS +static int dma_heap_debug_show(struct seq_file *s, void *unused) +{ + struct dma_heap *heap = s->private; + int err = 0; + + if (heap->ops && heap->ops->show) + err = heap->ops->show(s, heap); + + return err; +} +DEFINE_SHOW_ATTRIBUTE(dma_heap_debug); + +static struct dentry *dma_heap_debugfs_dir; + +static void dma_heap_init_debugfs(void) +{ + struct dentry *dir; + + dir = debugfs_create_dir("dma_heap", NULL); + if (IS_ERR(dir)) + return; + + dma_heap_debugfs_dir = dir; +} + +static void dma_heap_exit_debugfs(void) +{ + debugfs_remove_recursive(dma_heap_debugfs_dir); +} +#else +static void dma_heap_init_debugfs(void) +{ +} + +static void dma_heap_exit_debugfs(void) +{ +} +#endif + /** * dma_heap_add - adds a heap to dmabuf heaps * @exp_info: information needed to register this heap @@ -297,6 +338,13 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) /* Add heap to the list */ list_add(&heap->list, &heap_list); + +#ifdef CONFIG_DEBUG_FS + if (heap->ops && heap->ops->show) + debugfs_create_file(heap->name, 0444, dma_heap_debugfs_dir, + heap, &dma_heap_debug_fops); +#endif + mutex_unlock(&heap_list_lock); return heap; @@ -333,6 +381,14 @@ static int dma_heap_init(void) } dma_heap_class->devnode = dma_heap_devnode; + dma_heap_init_debugfs(); + return 0; } subsys_initcall(dma_heap_init); + +static void __exit dma_heap_exit(void) +{ + dma_heap_exit_debugfs(); +} +__exitcall(dma_heap_exit); diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h index 648328a64b27..1c9bed1f4dde 100644 --- a/include/linux/dma-heap.h +++ b/include/linux/dma-heap.h @@ -12,6 +12,7 @@ #include struct dma_heap; +struct seq_file; /** * struct dma_heap_ops - ops to operate on a given heap @@ -24,6 +25,7 @@ struct dma_heap_ops { unsigned long len, u32 fd_flags, u64 heap_flags); + int (*show)(struct seq_file *s, struct dma_heap *heap); }; /** -- 2.52.0 From: Thierry Reding NVIDIA Tegra SoCs commonly define a Video-Protection-Region, which is a region of memory dedicated to content-protected video decode and playback. This memory cannot be accessed by the CPU and only certain hardware devices have access to it. Expose the VPR as a DMA heap so that applications and drivers can allocate buffers from this region for use-cases that require this kind of protected memory. VPR has a few critical peculiarities. First, it must be a single contiguous region of memory (there is a single pair of registers that set the base address and size of the region), which is configured by calling back into the secure monitor. The memory region also needs to be quite large for some use-cases because it needs to fit multiple video frames (8K video should be supported), so VPR sizes of ~2 GiB are expected. However, some devices cannot afford to reserve this amount of memory for a particular use-case, and therefore the VPR must be resizable.
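As a rough, illustrative calculation (the exact numbers depend on the pixel format and the depth of the decode pipeline): a single 8K frame at 4 bytes per pixel is 7680 x 4320 x 4 bytes, roughly 127 MiB, so a decoder that keeps a dozen or more reference and display frames resident quickly approaches the ~2 GiB figure quoted above.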
Unfortunately, resizing the VPR is slightly tricky because the GPU found on Tegra SoCs must be in reset during the VPR resize operation. This is currently implemented by freezing all userspace processes, invoking the GPU's freeze() implementation, resizing the VPR and then thawing the GPU and userspace processes. This is quite heavy-handed, so eventually it might be better to implement freezing/thawing in the GPU driver in such a way that it blocks accesses to the GPU so that the VPR resize operation can happen without suspending all userspace. In order to balance the memory usage versus the amount of resizing that needs to happen, the VPR is divided into multiple chunks. Each chunk is implemented as a CMA area that is completely allocated on first use to guarantee the contiguity of the VPR. Once all buffers from a chunk have been freed, the CMA area is deallocated and the memory returned to the system. Signed-off-by: Thierry Reding --- Changes in v2: - cluster allocations to reduce the number of resize operations - support cross-chunk allocation --- drivers/dma-buf/heaps/Kconfig | 7 + drivers/dma-buf/heaps/Makefile | 1 + drivers/dma-buf/heaps/tegra-vpr.c | 1265 +++++++++++++++++++++++++++++ include/trace/events/tegra_vpr.h | 57 ++ 4 files changed, 1330 insertions(+) create mode 100644 drivers/dma-buf/heaps/tegra-vpr.c create mode 100644 include/trace/events/tegra_vpr.h diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig index a5eef06c4226..4268e886a4a2 100644 --- a/drivers/dma-buf/heaps/Kconfig +++ b/drivers/dma-buf/heaps/Kconfig @@ -12,3 +12,10 @@ config DMABUF_HEAPS_CMA Choose this option to enable dma-buf CMA heap. This heap is backed by the Contiguous Memory Allocator (CMA). If your system has these regions, you should say Y here. + +config DMABUF_HEAPS_TEGRA_VPR + bool "NVIDIA Tegra Video-Protected-Region DMA-BUF Heap" + depends on DMABUF_HEAPS && DMA_CMA + help + Choose this option to enable Video-Protected-Region (VPR) support on + a range of NVIDIA Tegra devices.
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile index 974467791032..265b77a7b889 100644 --- a/drivers/dma-buf/heaps/Makefile +++ b/drivers/dma-buf/heaps/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o +obj-$(CONFIG_DMABUF_HEAPS_TEGRA_VPR) += tegra-vpr.o diff --git a/drivers/dma-buf/heaps/tegra-vpr.c b/drivers/dma-buf/heaps/tegra-vpr.c new file mode 100644 index 000000000000..7815ac83e6f9 --- /dev/null +++ b/drivers/dma-buf/heaps/tegra-vpr.c @@ -0,0 +1,1265 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DMA-BUF restricted heap exporter for NVIDIA Video-Protection-Region (VPR) + * + * Copyright (C) 2024-2026 NVIDIA Corporation + */ + +#define pr_fmt(fmt) "tegra-vpr: " fmt + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include + +#define CREATE_TRACE_POINTS +#include + +#define TEGRA_VPR_MAX_CHUNKS 64 + +struct tegra_vpr; + +struct tegra_vpr_device { + struct list_head node; + struct device *dev; +}; + +struct tegra_vpr_chunk { + phys_addr_t start; + phys_addr_t limit; + size_t size; + + struct tegra_vpr *vpr; + struct cma *cma; + bool active; + + struct page *start_page; + unsigned int offset; + unsigned long virt; + pgoff_t num_pages; + + unsigned int num_buffers; +}; + +struct tegra_vpr { + struct device_node *dev_node; + unsigned long align; + phys_addr_t base; + phys_addr_t size; + bool use_freezer; + bool resizable; + + struct list_head buffers; + struct page *start_page; + unsigned long *bitmap; + pgoff_t num_pages; + + /* resizable VPR */ + DECLARE_BITMAP(active, TEGRA_VPR_MAX_CHUNKS); + struct tegra_vpr_chunk *chunks; + unsigned int num_chunks; + + unsigned int first; + unsigned int last; + + struct list_head devices; + struct mutex lock; +}; + +struct tegra_vpr_buffer { + struct list_head attachments; + struct tegra_vpr *vpr; + struct list_head list; + struct mutex lock; + + struct page *start_page; + struct page **pages; + pgoff_t num_pages; + phys_addr_t start; + phys_addr_t limit; + size_t size; + int pageno; + int order; + + DECLARE_BITMAP(chunks, TEGRA_VPR_MAX_CHUNKS); +}; + +struct tegra_vpr_attachment { + struct device *dev; + struct sg_table sgt; + struct list_head list; +}; + +#define ARM_SMCCC_TE_FUNC_PROGRAM_VPR 0x3 + +#define ARM_SMCCC_VENDOR_SIP_TE_PROGRAM_VPR_FUNC_ID \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_SIP, \ + ARM_SMCCC_TE_FUNC_PROGRAM_VPR) + +static int tegra_vpr_set(phys_addr_t base, phys_addr_t size) +{ + struct arm_smccc_res res; + + arm_smccc_smc(ARM_SMCCC_VENDOR_SIP_TE_PROGRAM_VPR_FUNC_ID, base, size, + 0, 0, 0, 0, 0, &res); + + return res.a0; +} + +static int tegra_vpr_get_extents(struct tegra_vpr *vpr, phys_addr_t *base, + phys_addr_t *size) +{ + phys_addr_t start = ~0, limit = 0; + unsigned int i; + + for (i = 0; i < vpr->num_chunks; i++) { + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + if (chunk->active) { + if (chunk->start < start) + start = chunk->start; + + if (chunk->limit > limit) + limit = chunk->limit; + } + } + + if (limit > start) { + *size = limit - start; + *base = start; + } else { + *base = *size = 0; + } + + return 0; +} + +static int tegra_vpr_resize(struct tegra_vpr *vpr) +{ + struct tegra_vpr_device *node; + phys_addr_t base, size; + int err, status = 0; + + err = tegra_vpr_get_extents(vpr, &base, &size); + if (err < 0) { + pr_err("%s(): failed to get VPR extents: %d\n", __func__, err); + 
return err; + } + + if (vpr->use_freezer) { + err = freeze_processes(); + if (err < 0) { + pr_err("%s(): failed to freeze processes: %d\n", + __func__, err); + return err; + } + } + + list_for_each_entry(node, &vpr->devices, node) { + err = pm_generic_freeze(node->dev); + if (err < 0) { + pr_err("failed to runtime suspend %s: %d\n", + dev_name(node->dev), err); + goto thaw; + } + } + + trace_tegra_vpr_set(base, size); + + err = tegra_vpr_set(base, size); + if (err < 0) { + pr_err("failed to secure VPR: %d\n", err); + status = err; + } + +thaw: + list_for_each_entry_continue_reverse(node, &vpr->devices, node) { + err = pm_generic_thaw(node->dev); + if (err < 0) { + pr_err("failed to runtime resume %s\n", + dev_name(node->dev)); + continue; + } + } + + if (vpr->use_freezer) + thaw_processes(); + + return status; +} + +static int tegra_vpr_protect_pages(pte_t *ptep, unsigned long addr, + void *unused) +{ + pte_t pte = __ptep_get(ptep); + + pte = clear_pte_bit(pte, __pgprot(PROT_NORMAL)); + pte = set_pte_bit(pte, __pgprot(PROT_DEVICE_nGnRnE)); + + __set_pte(ptep, pte); + + return 0; +} + +static int tegra_vpr_unprotect_pages(pte_t *ptep, unsigned long addr, + void *unused) +{ + pte_t pte = __ptep_get(ptep); + + pte = clear_pte_bit(pte, __pgprot(PROT_DEVICE_nGnRnE)); + pte = set_pte_bit(pte, __pgprot(PROT_NORMAL)); + + __set_pte(ptep, pte); + + return 0; +} + +static int __init tegra_vpr_chunk_init(struct tegra_vpr *vpr, + struct tegra_vpr_chunk *chunk, + phys_addr_t start, size_t size, + unsigned int order, const char *name) +{ + chunk->start = start; + chunk->limit = start + size; + chunk->size = size; + chunk->vpr = vpr; + + chunk->cma = cma_create(start, size, order, name); + if (IS_ERR(chunk->cma)) { + pr_err("cma_create() failed: %ld\n", PTR_ERR(chunk->cma)); + return PTR_ERR(chunk->cma); + } + + chunk->offset = (start - vpr->base) >> PAGE_SHIFT; + chunk->num_pages = size >> PAGE_SHIFT; + chunk->num_buffers = 0; + + /* CMA area is not reserved yet */ + chunk->start_page = NULL; + chunk->virt = 0; + + return 0; +} + +static void tegra_vpr_chunk_free(struct tegra_vpr_chunk *chunk) +{ + cma_free(chunk->cma); +} + +static inline bool tegra_vpr_chunk_is_last(const struct tegra_vpr_chunk *chunk) +{ + phys_addr_t limit = chunk->vpr->base + chunk->vpr->size; + + return chunk->limit == limit; +} + +static inline bool tegra_vpr_chunk_is_leaf(const struct tegra_vpr_chunk *chunk) +{ + const struct tegra_vpr_chunk *next = chunk + 1; + + if (tegra_vpr_chunk_is_last(chunk)) + return true; + + return !next->active; +} + +static int tegra_vpr_chunk_activate(struct tegra_vpr_chunk *chunk) +{ + unsigned long align = get_order(chunk->vpr->align); + int err; + + trace_tegra_vpr_chunk_activate(chunk->start, chunk->limit); + + chunk->start_page = cma_alloc(chunk->cma, chunk->num_pages, align, + false); + if (!chunk->start_page) { + err = -ENOMEM; + goto free; + } + + chunk->virt = (unsigned long)page_to_virt(chunk->start_page); + + apply_to_existing_page_range(&init_mm, chunk->virt, chunk->size, + tegra_vpr_protect_pages, NULL); + flush_tlb_kernel_range(chunk->virt, chunk->virt + chunk->size); + + chunk->active = true; + + return 0; + +free: + cma_release(chunk->cma, chunk->start_page, chunk->num_pages); + chunk->start_page = NULL; + chunk->virt = 0; + return err; +} + +static int tegra_vpr_chunk_deactivate(struct tegra_vpr_chunk *chunk) +{ + if (!chunk->active) + return 0; + + /* do not deactivate if there are buffers left in this chunk */ + if (WARN_ON(chunk->num_buffers > 0)) + return -EBUSY; + + 
trace_tegra_vpr_chunk_deactivate(chunk->start, chunk->limit); + + chunk->active = false; + + apply_to_existing_page_range(&init_mm, chunk->virt, chunk->size, + tegra_vpr_unprotect_pages, NULL); + flush_tlb_kernel_range(chunk->virt, chunk->virt + chunk->size); + + cma_release(chunk->cma, chunk->start_page, chunk->num_pages); + chunk->start_page = NULL; + chunk->virt = 0; + + return 0; +} + +static bool tegra_vpr_chunk_overlaps(struct tegra_vpr_chunk *chunk, + unsigned int start, unsigned int limit) +{ + unsigned int first = chunk->offset; + unsigned int last = chunk->offset + chunk->num_pages - 1; + + if (last < start || first >= limit) + return false; + + return true; +} + +static int tegra_vpr_activate_chunks(struct tegra_vpr *vpr, + struct tegra_vpr_buffer *buffer) +{ + DECLARE_BITMAP(dirty, vpr->num_chunks); + unsigned int i, bottom, top; + int err = 0, ret; + + bitmap_zero(dirty, vpr->num_chunks); + + /* activate any inactive chunks that overlap this buffer */ + for_each_set_bit(i, buffer->chunks, vpr->num_chunks) { + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + if (chunk->active) + continue; + + err = tegra_vpr_chunk_activate(chunk); + if (err < 0) + goto deactivate; + + set_bit(i, vpr->active); + set_bit(i, dirty); + } + + /* + * Activating chunks above may have created holes, but since the VPR + * can only ever be a single contiguous region, make sure to activate + * any missing chunks. + */ + for_each_clear_bitrange(bottom, top, vpr->active, vpr->num_chunks) { + /* inactive chunks at the bottom or the top are harmless */ + if (bottom == 0 || top == vpr->num_chunks) + continue; + + for (i = bottom; i < top; i++) { + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + err = tegra_vpr_chunk_activate(chunk); + if (err < 0) + goto deactivate; + + set_bit(i, vpr->active); + set_bit(i, dirty); + } + } + + /* if any chunks have been activated, VPR needs to be resized */ + if (!bitmap_empty(dirty, vpr->num_chunks)) { + err = tegra_vpr_resize(vpr); + if (err < 0) { + pr_err("failed to grow VPR: %d\n", err); + goto deactivate; + } + } + + /* increment buffer count for each chunk */ + for_each_set_bit(i, buffer->chunks, vpr->num_chunks) + vpr->chunks[i].num_buffers++; + + return 0; + +deactivate: + /* deactivate any of the previously inactive chunks on failure */ + for_each_set_bit(i, dirty, vpr->num_chunks) { + ret = tegra_vpr_chunk_deactivate(&vpr->chunks[i]); + if (ret < 0) + WARN(1, "failed to deactivate chunk #%u: %d\n", i, ret); + + clear_bit(i, vpr->active); + } + + return err; +} + +/* + * Retrieve the range of pages within the activate region of the VPR. + */ +static bool tegra_vpr_get_active_range(struct tegra_vpr *vpr, + unsigned int *first, + unsigned int *last) +{ + unsigned long i, j; + + i = find_first_bit(vpr->active, vpr->num_chunks); + if (i >= vpr->num_chunks) + return false; + + j = find_last_bit(vpr->active, vpr->num_chunks); + if (j >= vpr->num_chunks) + return false; + + *first = vpr->chunks[i].offset; + *last = vpr->chunks[j].offset + vpr->chunks[j].num_pages; + + return true; +} + +/* + * Try to find and allocate a free region within a specific page range. + * Returns the page number if successful, -ENOSPC otherwise. + * + * This function mimics bitmap_find_free_region() but restricts the search + * to a specific range to enable allocation within individual chunks. 
+ */ +static int tegra_vpr_find_free_region_in_range(struct tegra_vpr *vpr, + unsigned int start_page, + unsigned int end_page, + unsigned int num_pages, + unsigned int align) +{ + unsigned int pos, next = ALIGN(start_page, align); + + /* Scan through aligned positions, trying to allocate at each one */ + for (pos = next; pos + num_pages <= end_page; pos = next) { + next = find_next_bit(vpr->bitmap, pos + num_pages, pos); + + if (next >= pos + num_pages) { + bitmap_set(vpr->bitmap, pos, num_pages); + return pos; + } + + next = find_next_zero_bit(vpr->bitmap, vpr->num_pages, next); + next = ALIGN(next, align); + } + + return -ENOSPC; +} + +static int tegra_vpr_find_free_region(struct tegra_vpr *vpr, + unsigned int num_pages, + unsigned long align) +{ + return tegra_vpr_find_free_region_in_range(vpr, 0, vpr->num_pages - 1, + num_pages, align); +} + +static int tegra_vpr_find_free_region_clustered(struct tegra_vpr *vpr, + unsigned int num_pages, + unsigned int align) +{ + unsigned int target, first, last; + int pageno; + + /* + * If there are no allocations, abort the clustered allocation scheme + * and use the generic allocation scheme instead. + */ + if (vpr->first > vpr->last) + return -ENOSPC; + + /* + * First, try to allocate within the currently allocated region. This + * keeps allocations tightly packed and minimizes the VPR size needed. + */ + pageno = tegra_vpr_find_free_region_in_range(vpr, vpr->first, + vpr->last + 1, num_pages, + align); + if (pageno >= 0) + return pageno; + + /* + * If not enough free space exists within the currently allocated + * region, check to see if the allocation fits anywhere within the + * active region, avoiding the need to resize the VPR. + */ + if (tegra_vpr_get_active_range(vpr, &first, &last)) { + pageno = tegra_vpr_find_free_region_in_range(vpr, first, last, + num_pages, align); + if (pageno >= 0) + return pageno; + } + + /* + * If not enough free space exists within the currently active region, + * try to allocate adjacent to it to grow it contiguously and ensure + * optimal packing. + */ + + /* + * Calculate where the allocation should start to end right at the + * first allocated page, with proper alignment. + */ + if (vpr->first >= num_pages) { + target = ALIGN_DOWN(vpr->first - num_pages, align); + + if (!bitmap_allocate(vpr->bitmap, target, num_pages)) + return target; + } + + /* Try after the last allocation */ + target = ALIGN(vpr->last + 1, align); + + if (target + num_pages <= vpr->num_pages && + !bitmap_allocate(vpr->bitmap, target, num_pages)) + return target; + + /* + * Couldn't allocate at the ideal adjacent position, search for any + * available space before the first allocated page. + */ + pageno = tegra_vpr_find_free_region_in_range(vpr, 0, vpr->first, + num_pages, align); + if (pageno >= 0) + return pageno; + + /* + * Couldn't allocate at the ideal adjacent position, search + * for any available space after the last allocated page. + */ + pageno = tegra_vpr_find_free_region_in_range(vpr, vpr->last + 1, + vpr->num_pages, num_pages, + align); + if (pageno >= 0) + return pageno; + + return -ENOSPC; +} + +/* + * Find a free region, preferring locations near existing allocations to + * minimize VPR fragmentation. The allocation strategy is to first allocate + * within or adjacent to the existing region to keep allocations clustered. + * Otherwise fall back to a generic allocation using the first available + * space. 
+ * + * This approach focuses on page-level allocation first, then the chunk + * system determines which chunks need to be activated based on where the + * pages ended up. + */ +static int tegra_vpr_allocate_region(struct tegra_vpr *vpr, + unsigned int num_pages, + unsigned int align) +{ + int pageno; + + /* + * For non-resizable VPR (no chunks), use simple first-fit allocation. + * Clustering optimization is only beneficial for resizable VPR where + * keeping allocations together minimizes the active VPR size. + */ + if (vpr->num_chunks == 0) + return tegra_vpr_find_free_region(vpr, num_pages, align); + + /* + * Check if there are any existing allocations in the bitmap. If so, + * try to allocate near them to minimize fragmentation. + */ + pageno = tegra_vpr_find_free_region_clustered(vpr, num_pages, align); + if (pageno >= 0) + return pageno; + + /* + * If there are no existing allocations, or no space adjacent to them, + * fall back to the first available space anywhere in the VPR. + */ + pageno = tegra_vpr_find_free_region(vpr, num_pages, align); + if (pageno >= 0) + return pageno; + + return -ENOSPC; +} + +static struct tegra_vpr_buffer * +tegra_vpr_buffer_allocate(struct tegra_vpr *vpr, size_t size) +{ + unsigned int num_pages = size >> PAGE_SHIFT; + unsigned int order = get_order(size); + struct tegra_vpr_buffer *buffer; + unsigned long first, last; + int pageno, err; + pgoff_t i; + + /* + * "order" defines the alignment and size, so this may result in + * fragmented memory depending on the allocation patterns. However, + * since this is used primarily for video frames, it is expected that + * a number of buffers of the same size will be allocated, so + * fragmentation should be negligible. + */ + pageno = tegra_vpr_allocate_region(vpr, num_pages, 1); + if (pageno < 0) + return ERR_PTR(pageno); + + first = find_first_bit(vpr->bitmap, vpr->num_pages); + last = find_last_bit(vpr->bitmap, vpr->num_pages); + + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); + if (!buffer) { + err = -ENOMEM; + goto release; + } + + INIT_LIST_HEAD(&buffer->attachments); + INIT_LIST_HEAD(&buffer->list); + mutex_init(&buffer->lock); + buffer->start = vpr->base + (pageno << PAGE_SHIFT); + buffer->limit = buffer->start + size; + buffer->size = size; + buffer->num_pages = num_pages; + buffer->pageno = pageno; + buffer->order = order; + + buffer->pages = kmalloc_array(buffer->num_pages, + sizeof(*buffer->pages), + GFP_KERNEL); + if (!buffer->pages) { + err = -ENOMEM; + goto free; + } + + /* track which chunks this buffer overlaps */ + if (vpr->num_chunks > 0) { + unsigned int limit = buffer->pageno + buffer->num_pages, i; + + for (i = 0; i < vpr->num_chunks; i++) { + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + if (tegra_vpr_chunk_overlaps(chunk, pageno, limit)) + set_bit(i, buffer->chunks); + } + + /* activate chunks if necessary */ + err = tegra_vpr_activate_chunks(vpr, buffer); + if (err < 0) + goto free; + + /* track first and last allocated pages */ + if (buffer->pageno < vpr->first) + vpr->first = buffer->pageno; + + if (limit - 1 > vpr->last) + vpr->last = limit - 1; + } + + for (i = 0; i < buffer->num_pages; i++) + buffer->pages[i] = &vpr->start_page[pageno + i]; + + return buffer; + +free: + kfree(buffer->pages); + kfree(buffer); +release: + bitmap_release_region(vpr->bitmap, pageno, order); + return ERR_PTR(err); +} + +static void tegra_vpr_buffer_release(struct tegra_vpr_buffer *buffer) +{ + struct tegra_vpr *vpr = buffer->vpr; + struct tegra_vpr_buffer *entry; + unsigned long first, last; + 
unsigned int i; + + /* + * Decrement buffer count for each overlapping chunk. Note that chunks + * are not deactivated here yet, that's done in tegra_vpr_recycle() + * instead. + */ + for_each_set_bit(i, buffer->chunks, vpr->num_chunks) { + if (!WARN_ON(vpr->chunks[i].num_buffers == 0)) + vpr->chunks[i].num_buffers--; + } + + /* track first and last allocated pages */ + if (list_is_first(&buffer->list, &vpr->buffers) && + list_is_last(&buffer->list, &vpr->buffers)) { + /* if there are no remaining buffers after this, reset */ + vpr->first = ~0U; + vpr->last = 0U; + } else if (list_is_first(&buffer->list, &vpr->buffers)) { + entry = list_next_entry(buffer, list); + vpr->first = entry->pageno; + } else if (list_is_last(&buffer->list, &vpr->buffers)) { + entry = list_prev_entry(buffer, list); + vpr->last = entry->pageno + entry->num_pages - 1; + } + + bitmap_release_region(vpr->bitmap, buffer->pageno, buffer->order); + list_del(&buffer->list); + kfree(buffer->pages); + kfree(buffer); + + first = find_first_bit(vpr->bitmap, vpr->num_pages); + last = find_last_bit(vpr->bitmap, vpr->num_pages); +} + +static int tegra_vpr_attach(struct dma_buf *buf, + struct dma_buf_attachment *attachment) +{ + struct tegra_vpr_buffer *buffer = buf->priv; + struct tegra_vpr_attachment *attach; + int err; + + attach = kzalloc(sizeof(*attach), GFP_KERNEL); + if (!attach) + return -ENOMEM; + + err = sg_alloc_table_from_pages(&attach->sgt, buffer->pages, + buffer->num_pages, 0, buffer->size, + GFP_KERNEL); + if (err < 0) + goto free; + + attach->dev = attachment->dev; + INIT_LIST_HEAD(&attach->list); + attachment->priv = attach; + + mutex_lock(&buffer->lock); + list_add(&attach->list, &buffer->attachments); + mutex_unlock(&buffer->lock); + + return 0; + +free: + kfree(attach); + return err; +} + +static void tegra_vpr_detach(struct dma_buf *buf, + struct dma_buf_attachment *attachment) +{ + struct tegra_vpr_buffer *buffer = buf->priv; + struct tegra_vpr_attachment *attach = attachment->priv; + + mutex_lock(&buffer->lock); + list_del(&attach->list); + mutex_unlock(&buffer->lock); + + sg_free_table(&attach->sgt); + kfree(attach); +} + +static struct sg_table * +tegra_vpr_map_dma_buf(struct dma_buf_attachment *attachment, + enum dma_data_direction direction) +{ + struct tegra_vpr_attachment *attach = attachment->priv; + struct sg_table *sgt = &attach->sgt; + int err; + + err = dma_map_sgtable(attachment->dev, sgt, direction, + DMA_ATTR_SKIP_CPU_SYNC); + if (err < 0) + return ERR_PTR(err); + + return sgt; +} + +static void tegra_vpr_unmap_dma_buf(struct dma_buf_attachment *attachment, + struct sg_table *sgt, + enum dma_data_direction direction) +{ + dma_unmap_sgtable(attachment->dev, sgt, direction, + DMA_ATTR_SKIP_CPU_SYNC); +} + +static void tegra_vpr_recycle(struct tegra_vpr *vpr) +{ + DECLARE_BITMAP(dirty, vpr->num_chunks); + unsigned int i; + int err; + + bitmap_zero(dirty, vpr->num_chunks); + + /* + * Deactivate any unused chunks from the bottom... + */ + for (i = 0; i < vpr->num_chunks; i++) { + struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + if (!chunk->active) + continue; + + if (chunk->num_buffers > 0) + break; + + err = tegra_vpr_chunk_deactivate(chunk); + if (err < 0) + pr_err("failed to deactivate chunk #%u\n", i); + else { + clear_bit(i, vpr->active); + set_bit(i, dirty); + } + } + + /* + * ... and the top.
+ */ + for (i = 0; i < vpr->num_chunks; i++) { + unsigned int index = vpr->num_chunks - i - 1; + struct tegra_vpr_chunk *chunk = &vpr->chunks[index]; + + if (!chunk->active) + continue; + + if (chunk->num_buffers > 0) + break; + + err = tegra_vpr_chunk_deactivate(chunk); + if (err < 0) + pr_err("failed to deactivate chunk #%u\n", index); + else { + clear_bit(i, vpr->active); + set_bit(i, dirty); + } + } + + if (!bitmap_empty(dirty, vpr->num_chunks)) { + err = tegra_vpr_resize(vpr); + if (err < 0) { + pr_err("failed to shrink VPR: %d\n", err); + goto activate; + } + } + + return; + +activate: + for_each_set_bit(i, dirty, vpr->num_chunks) { + err = tegra_vpr_chunk_activate(&vpr->chunks[i]); + if (WARN_ON(err < 0)) + pr_err("failed to activate chunk #%u: %d\n", i, err); + } +} + +static void tegra_vpr_release(struct dma_buf *buf) +{ + struct tegra_vpr_buffer *buffer = buf->priv; + struct tegra_vpr *vpr = buffer->vpr; + + mutex_lock(&vpr->lock); + + tegra_vpr_buffer_release(buffer); + + if (vpr->num_chunks > 0) + tegra_vpr_recycle(vpr); + + mutex_unlock(&vpr->lock); +} + +/* + * Prohibit userspace mapping because the CPU cannot access this memory + * anyway. + */ +static int tegra_vpr_begin_cpu_access(struct dma_buf *buf, + enum dma_data_direction direction) +{ + return -EPERM; +} + +static int tegra_vpr_end_cpu_access(struct dma_buf *buf, + enum dma_data_direction direction) +{ + return -EPERM; +} + +static int tegra_vpr_mmap(struct dma_buf *buf, struct vm_area_struct *vma) +{ + return -EPERM; +} + +static const struct dma_buf_ops tegra_vpr_buf_ops = { + .attach = tegra_vpr_attach, + .detach = tegra_vpr_detach, + .map_dma_buf = tegra_vpr_map_dma_buf, + .unmap_dma_buf = tegra_vpr_unmap_dma_buf, + .release = tegra_vpr_release, + .begin_cpu_access = tegra_vpr_begin_cpu_access, + .end_cpu_access = tegra_vpr_end_cpu_access, + .mmap = tegra_vpr_mmap, +}; + +static struct dma_buf *tegra_vpr_allocate(struct dma_heap *heap, + unsigned long len, u32 fd_flags, + u64 heap_flags) +{ + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); + struct tegra_vpr_buffer *buffer, *entry; + size_t size = ALIGN(len, vpr->align); + DEFINE_DMA_BUF_EXPORT_INFO(export); + struct dma_buf *buf; + + mutex_lock(&vpr->lock); + + buffer = tegra_vpr_buffer_allocate(vpr, size); + if (IS_ERR(buffer)) { + mutex_unlock(&vpr->lock); + return ERR_CAST(buffer); + } + + /* insert in the correct order */ + if (!list_empty(&vpr->buffers)) { + list_for_each_entry(entry, &vpr->buffers, list) { + if (buffer->pageno < entry->pageno) { + list_add_tail(&buffer->list, &entry->list); + break; + } + } + } + + if (list_empty(&buffer->list)) + list_add_tail(&buffer->list, &vpr->buffers); + + buffer->vpr = vpr; + + /* + * If a valid buffer was allocated, wrap it in a dma_buf + * and return it. 
+ */ + export.exp_name = dma_heap_get_name(heap); + export.ops = &tegra_vpr_buf_ops; + export.size = buffer->size; + export.flags = fd_flags; + export.priv = buffer; + + buf = dma_buf_export(&export); + if (IS_ERR(buf)) + tegra_vpr_buffer_release(buffer); + + mutex_unlock(&vpr->lock); + return buf; +} + +static void tegra_vpr_debugfs_show_buffers(struct tegra_vpr *vpr, + struct seq_file *s) +{ + struct tegra_vpr_buffer *buffer; + char buf[16]; + + list_for_each_entry(buffer, &vpr->buffers, list) { + string_get_size(buffer->size, 1, STRING_UNITS_2, buf, + sizeof(buf)); + seq_printf(s, " %pap-%pap (%s)\n", &buffer->start, + &buffer->limit, buf); + + } +} + +static void tegra_vpr_debugfs_show_chunks(struct tegra_vpr *vpr, + struct seq_file *s) +{ + struct tegra_vpr_buffer *buffer; + unsigned int i; + char buf[16]; + + for (i = 0; i < vpr->num_chunks; i++) { + const struct tegra_vpr_chunk *chunk = &vpr->chunks[i]; + + string_get_size(chunk->size, 1, STRING_UNITS_2, buf, + sizeof(buf)); + seq_printf(s, " %pap-%pap (%s) (%s, %u buffers)\n", + &chunk->start, &chunk->limit, buf, + chunk->active ? "active" : "inactive", + chunk->num_buffers); + } + + list_for_each_entry(buffer, &vpr->buffers, list) { + string_get_size(buffer->size, 1, STRING_UNITS_2, buf, + sizeof(buf)); + seq_printf(s, "%pap-%pap (%s, chunks: %*pbl)\n", + &buffer->start, &buffer->limit, buf, + vpr->num_chunks, buffer->chunks); + } +} + +static int tegra_vpr_debugfs_show(struct seq_file *s, struct dma_heap *heap) +{ + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); + phys_addr_t limit = vpr->base + vpr->size; + char buf[16]; + + string_get_size(vpr->size, 1, STRING_UNITS_2, buf, sizeof(buf)); + seq_printf(s, "%pap-%pap (%s)\n", &vpr->base, &limit, buf); + + if (vpr->num_chunks == 0) + tegra_vpr_debugfs_show_buffers(vpr, s); + else + tegra_vpr_debugfs_show_chunks(vpr, s); + + return 0; +} + +static const struct dma_heap_ops tegra_vpr_heap_ops = { + .allocate = tegra_vpr_allocate, + .show = tegra_vpr_debugfs_show, +}; + +static int tegra_vpr_setup_chunks(struct tegra_vpr *vpr, const char *name) +{ + phys_addr_t start, limit; + unsigned int order, i; + size_t max_size; + int err; + + /* This seems a reasonable value, so hard-code this for now. */ + vpr->num_chunks = 4; + + vpr->chunks = kcalloc(vpr->num_chunks, sizeof(*vpr->chunks), + GFP_KERNEL); + if (!vpr->chunks) + return -ENOMEM; + + max_size = PAGE_SIZE << (get_order(vpr->size) - ilog2(vpr->num_chunks)); + order = get_order(vpr->align); + + /* + * Allocate CMA areas for VPR. All areas will be roughtly the same + * size, with the last area taking up the rest. 
+ */ + start = vpr->base; + limit = vpr->base + vpr->size; + + pr_debug("VPR: %pap-%pap (%lu pages, %u chunks, %lu MiB)\n", &start, + &limit, vpr->num_pages, vpr->num_chunks, + (unsigned long)vpr->size / 1024 / 1024); + + for (i = 0; i < vpr->num_chunks; i++) { + size_t size = limit - start; + phys_addr_t end; + + size = min_t(size_t, size, max_size); + end = start + size - 1; + + err = tegra_vpr_chunk_init(vpr, &vpr->chunks[i], start, size, + order, name); + if (err < 0) { + pr_err("failed to create VPR chunk: %d\n", err); + goto free; + } + + pr_debug(" %2u: %pap-%pap (%lu MiB)\n", i, &start, &end, + size / 1024 / 1024); + start += size; + } + + vpr->first = ~0U; + vpr->last = 0U; + + return 0; + +free: + while (i--) + tegra_vpr_chunk_free(&vpr->chunks[i]); + + kfree(vpr->chunks); + return err; +} + +static void tegra_vpr_free_chunks(struct tegra_vpr *vpr) +{ + unsigned int i; + + for (i = 0; i < vpr->num_chunks; i++) + tegra_vpr_chunk_free(&vpr->chunks[i]); + + kfree(vpr->chunks); +} + +static int tegra_vpr_setup_static(struct tegra_vpr *vpr) +{ + phys_addr_t start, limit; + + start = vpr->base; + limit = vpr->base + vpr->size; + + pr_debug("VPR: %pap-%pap (%lu pages, %lu MiB)\n", &start, &limit, + vpr->num_pages, (unsigned long)vpr->size / 1024 / 1024); + + return 0; +} + +static int __init tegra_vpr_add_heap(struct reserved_mem *rmem, + struct device_node *np) +{ + struct dma_heap_export_info info = {}; + unsigned long first, last; + struct dma_heap *heap; + struct tegra_vpr *vpr; + int err; + + vpr = kzalloc(sizeof(*vpr), GFP_KERNEL); + if (!vpr) + return -ENOMEM; + + INIT_LIST_HEAD(&vpr->buffers); + INIT_LIST_HEAD(&vpr->devices); + vpr->resizable = !of_property_read_bool(np, "no-map"); + vpr->use_freezer = true; + vpr->dev_node = np; + vpr->align = PAGE_SIZE; + vpr->base = rmem->base; + vpr->size = rmem->size; + + /* common setup */ + vpr->start_page = phys_to_page(vpr->base); + vpr->num_pages = vpr->size >> PAGE_SHIFT; + + vpr->bitmap = bitmap_zalloc(vpr->num_pages, GFP_KERNEL); + if (!vpr->bitmap) { + err = -ENOMEM; + goto free; + } + + first = find_first_bit(vpr->bitmap, vpr->num_pages); + last = find_last_bit(vpr->bitmap, vpr->num_pages); + + if (vpr->resizable) + err = tegra_vpr_setup_chunks(vpr, rmem->name); + else + err = tegra_vpr_setup_static(vpr); + + if (err < 0) + goto free; + + info.name = vpr->dev_node->name; + info.ops = &tegra_vpr_heap_ops; + info.priv = vpr; + + heap = dma_heap_add(&info); + if (IS_ERR(heap)) { + err = PTR_ERR(heap); + goto cleanup; + } + + rmem->priv = heap; + + return 0; + +cleanup: + if (vpr->resizable) + tegra_vpr_free_chunks(vpr); +free: + bitmap_free(vpr->bitmap); + kfree(vpr); + return err; +} + +static int __init tegra_vpr_init(void) +{ + const char *compatible = "nvidia,tegra-video-protection-region"; + struct device_node *parent; + struct reserved_mem *rmem; + int err; + + parent = of_find_node_by_path("/reserved-memory"); + if (!parent) + return 0; + + for_each_child_of_node_scoped(parent, child) { + if (!of_device_is_compatible(child, compatible)) + continue; + + rmem = of_reserved_mem_lookup(child); + if (!rmem) + continue; + + err = tegra_vpr_add_heap(rmem, child); + if (err < 0) + pr_err("failed to add VPR heap for %pOF: %d\n", child, + err); + + /* only a single VPR heap is supported */ + break; + } + + return 0; +} +module_init(tegra_vpr_init); + +static int tegra_vpr_device_init(struct reserved_mem *rmem, struct device *dev) +{ + struct dma_heap *heap = rmem->priv; + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); + struct 
tegra_vpr_device *node; + + if (!dev->driver->pm->freeze || !dev->driver->pm->thaw) + return -EINVAL; + + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (!node) + return -ENOMEM; + + INIT_LIST_HEAD(&node->node); + node->dev = dev; + + list_add_tail(&node->node, &vpr->devices); + + return 0; +} + +static void tegra_vpr_device_release(struct reserved_mem *rmem, + struct device *dev) +{ + struct dma_heap *heap = rmem->priv; + struct tegra_vpr *vpr = dma_heap_get_drvdata(heap); + struct tegra_vpr_device *node, *tmp; + + list_for_each_entry_safe(node, tmp, &vpr->devices, node) { + if (node->dev == dev) { + list_del(&node->node); + kfree(node); + } + } +} + +static const struct reserved_mem_ops tegra_vpr_ops = { + .device_init = tegra_vpr_device_init, + .device_release = tegra_vpr_device_release, +}; + +static int tegra_vpr_rmem_init(struct reserved_mem *rmem) +{ + rmem->ops = &tegra_vpr_ops; + + return 0; +} +RESERVEDMEM_OF_DECLARE(tegra_vpr, "nvidia,tegra-video-protection-region", + tegra_vpr_rmem_init); + +MODULE_DESCRIPTION("NVIDIA Tegra Video-Protection-Region DMA-BUF heap driver"); +MODULE_LICENSE("GPL"); diff --git a/include/trace/events/tegra_vpr.h b/include/trace/events/tegra_vpr.h new file mode 100644 index 000000000000..f8ceb17679fe --- /dev/null +++ b/include/trace/events/tegra_vpr.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#if !defined(_TRACE_TEGRA_VPR_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TEGRA_VPR_H + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM tegra_vpr + +#include + +TRACE_EVENT(tegra_vpr_chunk_activate, + TP_PROTO(phys_addr_t start, phys_addr_t limit), + TP_ARGS(start, limit), + TP_STRUCT__entry( + __field(phys_addr_t, start) + __field(phys_addr_t, limit) + ), + TP_fast_assign( + __entry->start = start; + __entry->limit = limit; + ), + TP_printk("%pap-%pap", &__entry->start, + &__entry->limit) +); + +TRACE_EVENT(tegra_vpr_chunk_deactivate, + TP_PROTO(phys_addr_t start, phys_addr_t limit), + TP_ARGS(start, limit), + TP_STRUCT__entry( + __field(phys_addr_t, start) + __field(phys_addr_t, limit) + ), + TP_fast_assign( + __entry->start = start; + __entry->limit = limit; + ), + TP_printk("%pap-%pap", &__entry->start, + &__entry->limit) +); + +TRACE_EVENT(tegra_vpr_set, + TP_PROTO(phys_addr_t base, phys_addr_t size), + TP_ARGS(base, size), + TP_STRUCT__entry( + __field(phys_addr_t, start) + __field(phys_addr_t, limit) + ), + TP_fast_assign( + __entry->start = base; + __entry->limit = base + size; + ), + TP_printk("%pap-%pap", &__entry->start, &__entry->limit) +); + +#endif /* _TRACE_TEGRA_VPR_H */ + +#include -- 2.52.0 From: Thierry Reding This node contains two sets of properties, one for the case where the VPR is resizable (in which case the VPR region will be dynamically allocated at boot time) and another case where the VPR is fixed in size and initialized by early firmware. The firmware running on the device is responsible for updating the node with the real physical address for the fixed VPR case and remove the properties needed only for resizable VPR. Similarly, if the VPR is resizable, the firmware should remove the "reg" property since it is no longer needed. 
Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 34 ++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi index 850c473235e3..62a5dfde9e38 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -29,6 +29,40 @@ aliases { i2c8 = &dp_aux_ch3_i2c; }; + reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + + vpr: video-protection-region@0 { + compatible = "nvidia,tegra-video-protection-region"; + status = "disabled"; + no-map; + + /* + * Two variants exist for this. For fixed VPR, the + * firmware is supposed to update the "reg" property + * with the fixed memory region configured as VPR. + * + * For resizable VPR we don't care about the exact + * address and instead want a reserved region to be + * allocated with a certain size and alignment at + * boot time. + * + * The firmware is responsible for removing the + * unused set of properties. + */ + + /* fixed VPR */ + reg = <0x0 0x0 0x0 0x0>; + + /* resizable VPR */ + size = <0x0 0x70000000>; + alignment = <0x0 0x100000>; + reusable; + }; + }; + bus@0 { compatible = "simple-bus"; -- 2.52.0 From: Thierry Reding Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi index 62a5dfde9e38..5f67d9b57226 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -5317,6 +5317,23 @@ pcie-ep@141e0000 { }; }; + gpu@17000000 { + compatible = "nvidia,ga10b"; + reg = <0x0 0x17000000 0x0 0x1000000>, + <0x0 0x18000000 0x0 0x1000000>; + interrupts = , + , + , + ; + interrupt-names = "nonstall", "stall0", "stall1", "stall2"; + power-domains = <&bpmp TEGRA234_POWER_DOMAIN_GPU>; + clocks = <&bpmp TEGRA234_CLK_GPUSYS>, + <&bpmp TEGRA234_CLK_GPC0CLK>, + <&bpmp TEGRA234_CLK_GPC1CLK>; + clock-names = "sys", "gpc0", "gpc1"; + resets = <&bpmp TEGRA234_RESET_GPU>; + }; + sram@40000000 { compatible = "nvidia,tegra234-sysram", "mmio-sram"; reg = <0x0 0x40000000 0x0 0x80000>; -- 2.52.0 From: Thierry Reding The host1x needs access to the VPR region, so make sure to reference it via the memory-region property. Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi index 5f67d9b57226..cfffa4f2f196 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -4474,6 +4474,9 @@ vic@15340000 { interconnect-names = "dma-mem", "write"; iommus = <&smmu_niso1 TEGRA234_SID_VIC>; dma-coherent; + + memory-region = <&vpr>; + memory-region-names = "protected"; }; nvdec@15480000 { @@ -4492,6 +4495,9 @@ nvdec@15480000 { iommus = <&smmu_niso1 TEGRA234_SID_NVDEC>; dma-coherent; + memory-region = <&vpr>; + memory-region-names = "protected"; + nvidia,memory-controller = <&mc>; /* -- 2.52.0 From: Thierry Reding The GPU needs to be idled before the VPR can be resized and unidled afterwards. Associate it with the VPR using the standard memory-region device tree property. 
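For context, the VPR heap added earlier in this series requires any device registered against the region to implement the freeze() and thaw() dev_pm_ops callbacks, which it invokes via pm_generic_freeze()/pm_generic_thaw() around a resize. A hypothetical sketch of what that looks like on the driver side (the names are placeholders, not the actual GPU driver):

#include <linux/platform_device.h>
#include <linux/pm.h>

static int example_gpu_freeze(struct device *dev)
{
	/* put the GPU into reset and quiesce all of its memory traffic */
	return 0;
}

static int example_gpu_thaw(struct device *dev)
{
	/* bring the GPU back out of reset and restore its state */
	return 0;
}

static const struct dev_pm_ops example_gpu_pm_ops = {
	.freeze = example_gpu_freeze,
	.thaw = example_gpu_thaw,
};

static struct platform_driver example_gpu_driver = {
	.driver = {
		.name = "example-gpu",
		.pm = &example_gpu_pm_ops,
	},
};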
Signed-off-by: Thierry Reding --- arch/arm64/boot/dts/nvidia/tegra234.dtsi | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi index cfffa4f2f196..21db5d107bc4 100644 --- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi @@ -5338,6 +5338,9 @@ gpu@17000000 { <&bpmp TEGRA234_CLK_GPC1CLK>; clock-names = "sys", "gpc0", "gpc1"; resets = <&bpmp TEGRA234_RESET_GPU>; + + memory-region-names = "protected"; + memory-region = <&vpr>; }; sram@40000000 { -- 2.52.0
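Finally, a short userspace view of the result, for illustration: buffers are allocated from the VPR heap through the standard DMA-BUF heaps ioctl. The heap node name below is an assumption derived from the device tree node name used in this series.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <linux/dma-heap.h>

int main(void)
{
	struct dma_heap_allocation_data alloc = {
		.len = 8 * 1024 * 1024, /* 8 MiB protected buffer */
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap, err;

	heap = open("/dev/dma_heap/video-protection-region", O_RDONLY | O_CLOEXEC);
	if (heap < 0) {
		perror("open");
		return 1;
	}

	err = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &alloc);
	if (err < 0) {
		perror("DMA_HEAP_IOCTL_ALLOC");
		close(heap);
		return 1;
	}

	/* alloc.fd is a dma-buf that can be attached by NVDEC, VIC, display, ... */
	printf("allocated protected dma-buf, fd %u\n", alloc.fd);

	close(alloc.fd);
	close(heap);
	return 0;
}

Note that the CPU cannot map or access such a buffer; mmap() and CPU-access requests on the returned dma-buf are rejected with -EPERM by the heap.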