linux/mm
Lorenzo Stoakes 0b5be138ce mm/mremap: avoid expensive folio lookup on mremap folio pte batch
It was discovered in the attached report that commit f822a9a81a ("mm:
optimize mremap() by PTE batching") introduced a significant performance
regression on a number of metrics on x86-64, most notably
stress-ng.bigheap.realloc_calls_per_sec - indicating a 37.3% regression in
number of mremap() calls per second.

I was able to reproduce this locally on an Intel x86-64 Raptor Lake
system, noting an average of 143,857 realloc calls/sec (with a stddev of
4,531 or 3.1%) prior to that commit being applied, and 81,503 afterwards
(stddev of 2,131 or 2.6%) - a 43.3% regression.

During testing I was able to determine that neither efforts to optimise
the folio_pte_batch() operation nor checking folio_test_large() made a
meaningful difference.

This is within expectation, as a regression this large is likely to
indicate we are accessing memory that is not yet in cache (and may even
require a main memory fetch).

The expectation by those discussing this from the start was that
vm_normal_folio() (invoked by mremap_folio_pte_batch()) would likely be
the culprit due to having to retrieve memory from the vmemmap (which
mremap() page table moves do not otherwise do, meaning this is
inevitably cold memory).

I was able to definitively determine that this theory is indeed correct
and the cause of the issue.

The solution is to restore part of an approach previously discarded on
review, that is to invoke pte_batch_hint() which explicitly determines,
through reference to the PTE alone (thus no vmemmap lookup), what the PTE
batch size may be.

On platforms other than arm64 this is currently hardcoded to return 1, so
this naturally resolves the issue for x86-64, and for arm64 introduces
little to no overhead as the pte cache line will be hot.
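
The early exit described above can be illustrated with a small userspace
model. This is only a sketch and not the kernel code: every model_* name
and both structures are hypothetical stand-ins, the lookup counter merely
represents the cold vmemmap access, and the final batch size calculation
is a simplification of what folio_pte_batch() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's real types. */
struct model_folio {
	int nr_pages;			/* 1 for a small folio */
};

struct model_pte {
	int batch_hint;			/* what a PTE-only hint would report */
	struct model_folio *folio;	/* what the expensive lookup would find */
};

/* Counts how often the modelled "cold memory" lookup is performed. */
static unsigned long model_folio_lookups;

/* Cheap: consults only the PTE itself. Modelled as 1 on non-arm64. */
static int model_pte_batch_hint(const struct model_pte *pte)
{
	return pte->batch_hint;
}

/* Expensive: stands in for the lookup that touches the vmemmap. */
static struct model_folio *model_vm_normal_folio(const struct model_pte *pte)
{
	model_folio_lookups++;
	return pte->folio;
}

/* Returns how many PTEs may be moved as one batch (simplified). */
static int model_mremap_pte_batch(const struct model_pte *pte, int max_nr)
{
	struct model_folio *folio;

	assert(max_nr >= 1);
	if (max_nr == 1)
		return 1;

	/* Skip the expensive lookup if the hint rules out any batching. */
	if (model_pte_batch_hint(pte) == 1)
		return 1;

	folio = model_vm_normal_folio(pte);
	if (!folio || folio->nr_pages == 1)
		return 1;

	/* Simplification: batch is the folio size capped at max_nr. */
	return folio->nr_pages < max_nr ? folio->nr_pages : max_nr;
}
```

In this model a PTE whose hint is 1 never reaches the lookup, even if it
maps a large folio, which is the trade-off the change accepts: the batch
is given up in exchange for never touching the cold memory.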

With this patch applied, we move from 81,503 realloc calls/sec to 138,701
(stddev of 496.1 or 0.4%), which is a 3.6% regression relative to the
original 143,857; however, accounting for the variance in the original
result, this broadly restores performance to its prior state.
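
The percentages quoted in this message can be re-derived from the raw
calls/sec figures. The helper below is a hypothetical convenience for
doing so (pct_drop() is not part of any kernel or stress-ng interface);
it expresses the drop from a baseline as a percentage of that baseline.

```c
#include <assert.h>

/*
 * Percentage by which 'after' falls short of 'before'.
 * Hypothetical helper, used only to check the arithmetic above.
 */
static double pct_drop(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Calling pct_drop(143857, 81503) gives roughly 43.3, and
pct_drop(143857, 138701) gives roughly 3.6, matching the figures reported
for the regressing commit and for this fix respectively.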

Link: https://lkml.kernel.org/r/20250807185819.199865-1-lorenzo.stoakes@oracle.com
Fixes: f822a9a81a ("mm: optimize mremap() by PTE batching")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071609.4e743d7c-lkp@intel.com
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-11 23:00:59 -07:00
..
damon Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
kasan kasan/test: fix protection against compiler elision 2025-08-05 13:28:46 -07:00
kfence kfence: Remove mention of PG_slab 2025-07-23 11:55:22 +02:00
kmsan kmsan: test: add module description 2025-06-05 22:02:25 -07:00
backing-dev.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
balloon_compaction.c mm: stop storing migration_ops in page->mapping 2025-07-13 16:38:29 -07:00
bootmem_info.c mm/sparse: allow for alternate vmemmap section init at boot 2025-03-16 22:06:27 -07:00
cma.c mm: cma: simplify cma_debug_show_areas() 2025-07-24 19:12:36 -07:00
cma.h mm: cma: set early_pfn and bitmap as a union in cma_memrange 2025-05-22 14:55:36 -07:00
cma_debug.c mm: cma: simplify cma_maxchunk_get() 2025-07-24 19:12:36 -07:00
cma_sysfs.c
compaction.c mm: rename PG_isolated to PG_movable_ops_isolated 2025-07-13 16:38:30 -07:00
debug.c mm/util: introduce snapshot_page() 2025-07-24 19:12:35 -07:00
debug_page_alloc.c mm/debug_page_alloc: improve error message for invalid guardpage minorder 2025-05-12 23:50:38 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: use a swp_entry_t input value for swap tests 2025-07-13 16:38:21 -07:00
dmapool.c docs: dma-api: replace consistent with coherent 2025-07-01 13:25:36 -06:00
dmapool_test.c
early_ioremap.c mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference 2025-01-13 22:40:59 -08:00
execmem.c mm: correct type for vmalloc vm_flags fields 2025-08-02 12:06:13 -07:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c
failslab.c
filemap.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
folio-compat.c mm: Remove grab_cache_page_write_begin() 2025-03-04 17:02:25 +00:00
gup.c mm: rename PAGE_MAPPING_* to FOLIO_MAPPING_* 2025-07-13 16:38:31 -07:00
gup_test.c
gup_test.h
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery 2025-07-19 18:59:53 -07:00
huge_memory.c mm/huge_memory: refactor after-split (page) cache code 2025-07-24 19:12:39 -07:00
hugetlb.c mm/page_owner: convert set_page_owner_migrate_reason() to folios 2025-07-19 18:59:57 -07:00
hugetlb_cgroup.c
hugetlb_cma.c mm/hugetlb: use separate nodemask for bootmem allocations 2025-05-12 23:50:35 -07:00
hugetlb_cma.h mm/hugetlb: move hugetlb CMA code in to its own file 2025-03-16 22:06:31 -07:00
hugetlb_vmemmap.c mm/pagewalk: split walk_page_range_novma() into kernel/user parts 2025-07-09 22:42:05 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c
internal.h Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
interval_tree.c
ioremap.c
Kconfig mm: remove mm/io-mapping.c 2025-08-02 12:06:10 -07:00
Kconfig.debug
khugepaged.c mm: fix the race between collapse and PT_RECLAIM under per-vma lock 2025-08-05 13:28:47 -07:00
kmemleak.c mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup() 2025-08-05 13:28:47 -07:00
ksm.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
list_lru.c mm, list_lru: refactor the locking code 2025-07-09 22:41:56 -07:00
maccess.c mm: unexport globally copy_to_kernel_nofault 2025-07-09 22:42:22 -07:00
madvise.c mm/mseal: small cleanups 2025-08-02 12:06:09 -07:00
Makefile mm: remove mm/io-mapping.c 2025-08-02 12:06:10 -07:00
mapping_dirty_helpers.c mm: remove redundant pXd_devmap calls 2025-07-09 22:42:17 -07:00
memblock.c memblock: add KHO support for reserve_mem 2025-05-12 23:50:42 -07:00
memcontrol-v1.c memcg: make count_memcg_events re-entrant safe against irqs 2025-05-22 14:55:38 -07:00
memcontrol-v1.h
memcontrol.c cgroup: Changes for v6.17 2025-07-31 16:04:19 -07:00
memfd.c mm/memfd: replace deprecated strcpy() with memcpy() in alloc_name() 2025-07-19 18:59:57 -07:00
memory-failure.c Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
memory-tiers.c mm,memory-tiers: use node-notifier instead of memory-notifier 2025-07-13 16:38:15 -07:00
memory.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
memory_hotplug.c mm: rename __PageMovable() to page_has_movable_ops() 2025-07-13 16:38:29 -07:00
mempolicy.c mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_flags() 2025-07-19 18:59:45 -07:00
mempool.c mm: mempool: fix crash in mempool_free() for zero-minimum pools 2025-08-02 12:06:13 -07:00
memremap.c mm/page_alloc: add support for initializing pageblock as isolated 2025-07-13 16:38:17 -07:00
memtest.c
migrate.c mm/page_owner: convert set_page_owner_migrate_reason() to folios 2025-07-19 18:59:57 -07:00
migrate_device.c mm: remove redundant pXd_devmap calls 2025-07-09 22:42:17 -07:00
mincore.c mm/mincore: hold PTL in mincore_hugetlb 2025-08-02 12:06:10 -07:00
mlock.c mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_flags() 2025-07-19 18:59:45 -07:00
mm_init.c mm/page_alloc: add support for initializing pageblock as isolated 2025-07-13 16:38:17 -07:00
mm_slot.h
mmap.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
mmap_lock.c mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped 2025-08-02 12:06:11 -07:00
mmu_gather.c mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables() 2025-05-31 22:46:12 -07:00
mmu_notifier.c Update Christoph's Email address and make it consistent 2025-05-12 23:50:31 -07:00
mmzone.c
mprotect.c mm: pass page directly instead of using folio_page 2025-08-11 23:00:59 -07:00
mremap.c mm/mremap: avoid expensive folio lookup on mremap folio pte batch 2025-08-11 23:00:59 -07:00
mseal.c mm/mseal: rework mseal apply logic 2025-08-02 12:06:09 -07:00
msync.c
nommu.c Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
numa.c mm/numa: remove unnecessary local variable in alloc_node_data() 2025-05-12 23:50:38 -07:00
numa_emulation.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
numa_memblks.c mm: numa_memblks: introduce numa_add_reserved_memblk 2025-05-22 14:55:36 -07:00
oom_kill.c mm/oom_kill: fix trivial typo in comment 2025-03-16 22:05:55 -07:00
page-writeback.c mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counter 2025-07-19 18:59:47 -07:00
page_alloc.c mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() 2025-07-26 15:08:22 -07:00
page_counter.c
page_ext.c mm,page_ext: derive the node from the pfn 2025-07-13 16:38:16 -07:00
page_frag_cache.c
page_idle.c sysfs: treewide: switch back to attribute_group::bin_attrs 2025-06-17 10:44:15 +02:00
page_io.c mm: stop passing a writeback_control structure to swap_writeout 2025-07-09 22:41:58 -07:00
page_isolation.c mm/page_isolation: drop __folio_test_movable() check for large folios 2025-07-13 16:38:29 -07:00
page_owner.c mm/page_owner: convert set_page_owner_migrate_reason() to folios 2025-07-19 18:59:57 -07:00
page_poison.c
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: Batch-check pmds/puds just like ptes 2025-05-09 13:43:07 +01:00
page_vma_mapped.c mm: remove redundant pXd_devmap calls 2025-07-09 22:42:17 -07:00
pagewalk.c mm: remove redundant pXd_devmap calls 2025-07-09 22:42:17 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: remove outdated filename comment in percpu-stats.c 2025-07-13 16:38:23 -07:00
percpu-vm.c
percpu.c mm/percpu: prevent concurrency problem for pcpu_nr_populated read with spin lock 2025-07-13 16:38:21 -07:00
pgalloc-track.h
pgtable-generic.c mm: remove redundant pXd_devmap calls 2025-07-09 22:42:17 -07:00
process_vm_access.c
pt_reclaim.c
ptdump.c mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd() 2025-07-09 22:42:20 -07:00
readahead.c readahead: use folio_nr_pages() instead of shift operation 2025-07-19 18:59:53 -07:00
rmap.c mm: add get_and_clear_ptes() and clear_ptes() 2025-08-02 12:06:10 -07:00
rodata_test.c
secretmem.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
shmem.c Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counter 2025-07-19 18:59:47 -07:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c
shuffle.c
shuffle.h
slab.h slab: Add SL_pfmemalloc flag 2025-06-18 13:06:26 +02:00
slab_common.c Update Christoph's Email address and make it consistent 2025-05-12 23:50:31 -07:00
slub.c printk changes for 6.17 2025-08-04 10:54:36 -07:00
sparse-vmemmap.c mm/hugetlb: do pre-HVO for bootmem allocated pages 2025-03-16 22:06:29 -07:00
sparse.c drivers/base/memory: improve add_boot_memory_block() 2025-03-17 22:07:01 -07:00
swap.c mm: optimize lru_note_cost() by adding lru_note_cost_unlock_irq() 2025-07-24 19:12:28 -07:00
swap.h mm: stop passing a writeback_control structure to swap_writeout 2025-07-09 22:41:58 -07:00
swap_cgroup.c mm: swap_cgroup: remove double initialization of locals 2025-03-17 22:06:58 -07:00
swap_state.c - The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox 2025-05-31 15:44:16 -07:00
swapfile.c mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-07-24 19:12:34 -07:00
truncate.c - The 2 patch series "zram: support algorithm-specific parameters" from 2025-06-02 16:00:26 -07:00
usercopy.c
userfaultfd.c userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry 2025-08-11 23:00:59 -07:00
util.c mm/util: introduce snapshot_page() 2025-07-24 19:12:35 -07:00
vma.c Significant patch series in this pull request: 2025-08-05 16:02:07 +03:00
vma.h mm/mseal: small cleanups 2025-08-02 12:06:09 -07:00
vma_exec.c mm/vma: use vmg->target to specify target VMA for new VMA merge 2025-07-09 22:42:11 -07:00
vma_init.c mm: convert VM_PFNMAP tracking to pfnmap_track() + pfnmap_untrack() 2025-05-22 14:55:37 -07:00
vma_internal.h
vmalloc.c mm/vmalloc: leave lazy MMU mode on PTE mapping error 2025-07-09 21:07:52 -07:00
vmpressure.c memcg: convert memcg->socket_pressure to u64 2025-07-24 19:12:32 -07:00
vmscan.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
vmstat.c mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counter 2025-07-19 18:59:47 -07:00
workingset.c mm: workingset: simplify lockdep check in update_node 2025-05-12 23:50:44 -07:00
zpdesc.h mm: convert "movable" flag in page->mapping to a page flag 2025-07-13 16:38:30 -07:00
zpool.c zsmalloc: prefer the the original page's node for compressed data 2025-05-11 17:48:06 -07:00
zsmalloc.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
zswap.c mm: stop passing a writeback_control structure to __swap_writepage 2025-07-09 22:41:57 -07:00