contpte_ptep_set_access_flags() compared the gathered ptep_get() value
against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
from all sub-PTEs in the CONT block, so a dirty sibling can make the
target appear already-dirty. When the gathered value matches entry, the
function returns 0 even though the target sub-PTE still has PTE_RDONLY
set in hardware.

For CPU page-table walks this is benign: with FEAT_HAFDBS the hardware
may set AF/dirty on any sub-PTE and the CPU TLB treats the gathered
result as authoritative for the entire range. But an SMMU without HTTU
(or with HA/HD disabled in CD.TCR) evaluates each descriptor
individually and will keep raising F_PERMISSION on the unchanged target
sub-PTE, causing an infinite fault loop.

Gathering can therefore cause false no-ops when only a sibling has been
updated:
 - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
 - read faults: target still lacks PTE_AF

Fix by checking all sub-PTEs' access flags individually (not via the
gathered view) before returning no-op, and use the raw target PTE for
the write-bit unfold decision. The access-flag mask matches the one
used by __ptep_set_access_flags().

Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a
CONT range may become the effective cached translation and software
must maintain consistent attributes across the range.

Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
Reviewed-by: Alistair Popple
Cc: Ryan Roberts
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Jason Gunthorpe
Cc: John Hubbard
Cc: Zi Yan
Cc: Breno Leitao
Cc: stable@vger.kernel.org
Signed-off-by: Piotr Jaroszynski
---
 arch/arm64/mm/contpte.c | 47 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index bcac4f55f9c1..9868bfe4607c 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -390,6 +390,23 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
 
+static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
+{
+	pte_t *cont_ptep = contpte_align_down(ptep);
+	const pteval_t access_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
+	pteval_t entry_access = pte_val(entry) & access_mask;
+	int i;
+
+	for (i = 0; i < CONT_PTES; i++) {
+		pteval_t pte_access = pte_val(__ptep_get(cont_ptep + i)) & access_mask;
+
+		if (pte_access != entry_access)
+			return false;
+	}
+
+	return true;
+}
+
 int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 					unsigned long addr, pte_t *ptep,
 					pte_t entry, int dirty)
@@ -399,13 +416,35 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 	int i;
 
 	/*
-	 * Gather the access/dirty bits for the contiguous range. If nothing has
-	 * changed, its a noop.
+	 * Check whether all sub-PTEs in the CONT block already have the
+	 * requested access flags, using raw per-PTE values rather than the
+	 * gathered ptep_get() view.
+	 *
+	 * ptep_get() gathers AF/dirty state across the whole CONT block,
+	 * which is correct for CPU TLB semantics: with FEAT_HAFDBS the
+	 * hardware may set AF/dirty on any sub-PTE and the CPU TLB treats
+	 * the gathered result as authoritative for the entire range. But an
+	 * SMMU without HTTU (or with HA/HD disabled in CD.TCR) evaluates
+	 * each descriptor individually and will keep faulting on the target
+	 * sub-PTE if its flags haven't actually been updated. Gathering can
+	 * therefore cause false no-ops when only a sibling has been updated:
+	 *  - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
+	 *  - read faults: target still lacks PTE_AF
+	 *
+	 * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
+	 * become the effective cached translation, so all entries must have
+	 * consistent attributes. Check the full CONT block before returning
+	 * no-op, and when any sub-PTE mismatches, proceed to update the whole
+	 * range.
 	 */
-	orig_pte = pte_mknoncont(ptep_get(ptep));
-	if (pte_val(orig_pte) == pte_val(entry))
+	if (contpte_all_subptes_match_access_flags(ptep, entry))
 		return 0;
 
+	/*
+	 * Use raw target pte (not gathered) for write-bit unfold decision.
+	 */
+	orig_pte = pte_mknoncont(__ptep_get(ptep));
+
 	/*
 	 * We can fix up access/dirty bits without having to unfold the contig
 	 * range. But if the write bit is changing, we must unfold.
-- 
2.22.1.7.gac84d6e93c.dirty