sst-linux/mm
Lorenzo Stoakes 52c81fd0f5 mm: resolve faulty mmap_region() error path behaviour
[ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]

The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous means by which issues can arise and incomplete
state, memory leaks and other unpleasantness can occur.

A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.

Taking advantage of previous patches in this series we move a number of
checks earlier in the code, simplifying things by moving the core of the
logic into a static internal function __mmap_region().

Doing this allows us to perform a number of checks up front before we do
any real work, and allows us to unwind the writable unmap check
unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
validation unconditionally also.

We move a number of things here:

1. We preallocate memory for the iterator before we call the file-backed
   memory hook, allowing us to exit early and avoid having to perform
   complicated and error-prone close/free logic. We carefully free
   iterator state on both success and error paths.

2. The enclosing mmap_region() function handles the mapping_map_writable()
   logic early. Previously the logic had the mapping_map_writable() at the
   point of mapping a newly allocated file-backed VMA, and a matching
   mapping_unmap_writable() on success and error paths.

   We now do this unconditionally if this is a file-backed, shared writable
   mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however
   doing so does not invalidate the seal check we just performed, and we in
   any case always decrement the counter in the wrapper.

   We perform a debug assert to ensure a driver does not attempt to do the
   opposite.

3. We also move arch_validate_flags() up into the mmap_region()
   function. This is only relevant on arm64 and sparc64, and the check is
   only meaningful for SPARC with ADI enabled. We explicitly add a warning
   for this arch if a driver invalidates this check, though the code ought
   eventually to be fixed to eliminate the need for this.

With all of these measures in place, we no longer need to explicitly close
the VMA on error paths, as we place all checks which might fail prior to a
call to any driver mmap hook.

This eliminates an entire class of errors, makes the code easier to reason
about and more robust.

Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f65628 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-22 15:37:34 +01:00
..
damon mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock 2024-10-17 15:21:27 +02:00
kasan kasan: remove vmalloc_percpu test 2024-11-08 16:26:46 +01:00
kfence mm,kfence: decouple kfence from page granularity mapping judgement 2023-12-03 07:32:08 +01:00
kmsan kmsan: do not wipe out origin when doing partial unpoisoning 2024-06-16 13:41:38 +02:00
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-04-26 14:28:39 +02:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-06-16 13:41:39 +02:00
cma.h
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:19:42 +02:00
debug_page_ref.c
debug_vm_pgtable.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c filemap: Fix bounds checking in filemap_read() 2024-11-14 13:15:17 +01:00
folio-compat.c
frontswap.c
gup_test.c
gup_test.h
gup.c mm: always expand the stack with the mmap write lock held 2023-07-01 13:16:25 +02:00
highmem.c
hmm.c
huge_memory.c mm: migrate: try again if THP split is failed due to page refcnt 2024-11-08 16:26:47 +01:00
hugetlb_cgroup.c mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios 2024-05-17 11:55:52 +02:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-09-19 12:27:56 +02:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio() 2024-08-14 13:53:02 +02:00
hwpoison-inject.c
init-mm.c
internal.h mm: unconditionally close VMAs on error 2024-11-22 15:37:34 +01:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm: z3fold: deprecate CONFIG_Z3FOLD 2024-10-17 15:22:05 +02:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-06-14 11:15:29 +02:00
khugepaged.c mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file() 2024-08-29 17:30:17 +02:00
kmemleak.c
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-30 12:49:29 +02:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
madvise.c madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check 2023-08-30 16:11:11 +02:00
Makefile
mapping_dirty_helpers.c
memblock.c x86/numa: Fix the address overlap check in numa_fill_memblks() 2024-03-01 13:26:36 +01:00
memcontrol.c memcg: protect concurrent access to mem_cgroup_idr 2024-09-12 11:10:29 +02:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 11:12:27 +02:00
memory_hotplug.c x86/kaslr: Expose and use the end of the physical memory address space 2024-09-12 11:10:17 +02:00
memory-failure.c mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu 2024-08-29 17:30:15 +02:00
memory-tiers.c
memory.c mm: avoid leaving partial pfn mappings around in error case 2024-09-18 19:23:04 +02:00
mempolicy.c mm/numa_balancing: teach mpol_to_str about the balancing mode 2024-08-03 08:49:40 +02:00
mempool.c
memremap.c
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-03 15:19:36 +02:00
migrate_device.c
migrate.c migrate_pages_batch: fix statistics for longterm pin retry 2024-11-08 16:26:48 +01:00
mincore.c
mlock.c
mm_init.c
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-08-03 08:49:30 +02:00
mmap.c mm: resolve faulty mmap_region() error path behaviour 2024-11-22 15:37:34 +01:00
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-22 15:37:34 +01:00
oom_kill.c
page_alloc.c mm: fix NULL pointer dereference in alloc_pages_bulk_noprof 2024-11-22 15:37:30 +01:00
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-27 13:46:22 +02:00
page_vma_mapped.c
page-writeback.c Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" 2024-07-11 12:47:14 +02:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:41:38 +02:00
process_vm_access.c
ptdump.c
readahead.c mm: use memalloc_nofs_save() in page_cache_ra_order() 2024-05-17 11:56:21 +02:00
rmap.c
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-17 15:22:28 +02:00
shmem.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-22 15:37:34 +01:00
shrinker_debug.c
shuffle.c
shuffle.h
slab_common.c mm: krealloc: Fix MTE false alarm in __do_krealloc 2024-11-17 15:07:22 +01:00
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:49:23 +02:00
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c x86/kaslr: Expose and use the end of the physical memory address space 2024-09-12 11:10:17 +02:00
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swap.h mm/swap: fix race when skipping swapcache 2024-03-01 13:26:32 +01:00
swapfile.c mm/swapfile: skip HugeTLB pages for unuse_vma 2024-10-22 15:56:43 +02:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-09-04 13:25:00 +02:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-02-23 09:12:51 +01:00
util.c mm: unconditionally close VMAs on error 2024-11-22 15:37:34 +01:00
vmalloc.c mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0 2024-08-29 17:30:52 +02:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-09-13 09:42:33 +02:00
vmscan.c mm/mglru: fix div-by-zero in vmpressure_calc_level() 2024-08-03 08:49:30 +02:00
vmstat.c
workingset.c mm/mglru: fix underprotected page cache 2023-12-20 17:00:26 +01:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: allow only one active pool compaction context 2023-08-23 17:52:40 +02:00
zswap.c mm: zswap: fix missing folio cleanup in writeback race path 2024-03-01 13:26:39 +01:00