This set of fixes includes a solution for the issue described in the tag
annotation for v6.17 updates.
The issue involved a potential call to schedule() within an RCU read-side
critical section. The solution applies reference counting to ensure that
handlers which may call schedule() are invoked safely outside of the
critical section.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQE66IEYNDXNBPeGKSsLtaWM8LwEwUCaJ6SzAAKCRCsLtaWM8Lw
E2YMAP0SNa3lrX9AWVviipIztqPb5Eo5jbMFk1KztEYWf/g0RQD+JxdELawys+3V
oVNceJ5UUpQy0lSge4bll0Y1kZNlAQE=
=gk22
-----END PGP SIGNATURE-----
Merge tag 'firewire-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
"This fixes a potential call to schedule() within an RCU read-side
critical section. The solution applies reference counting to ensure
that handlers which may call schedule() are invoked safely outside of
the critical section"
* tag 'firewire-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: reallocate buffer for FCP address handlers when more than 4 are registered
firewire: core: call FCP address handlers outside RCU read-side critical section
firewire: core: call handler for exclusive regions outside RCU read-side critical section
firewire: core: use reference counting to invoke address handlers safely
- Prevent the ACPI EC driver from ignoring ECDT information in the
cases when the ID string in the ECDT is invalid, but not empty, to
fix thouchpad detection on ThinkBook 14 G7 IML (Armin Wolf).
- Rearrange checks in acpi_processor_ppc_init() to restore the handling
of frequency QoS requests related to _PPC limits inadvertently broken
by a recent update (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmidxxQSHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1l8UH/R3dcBoacfdhaD6+KLUfaerWiKyW76dm
+310aADCX8ZTHpcMiy17g+rzeS07O81wfCvDUptbfF/1xmdNiZ46wq8FkOUlgVhv
Zhap7Sm+zTHTU/WkcX/2xRgpiJ9p1c9QnDI19hfRbUHcr/0CcsR04fxS2tiJUKuT
cd2RN7v2SKiULMQNtD7RFKYYmtJjAj+h67kbvgA1cxhH+q1g3EFdjy16cPlkH/0X
LAtRHWJtZxx8wO9NF2R/boKXqtq2Z+/StDekZ0BXqBPe8ivW3V9Z78eiFpm01bhB
YdCAGxgNvoObfueFQK+SOSlGod0a3B1rdeyK7PbkPe1tK/m/7KNgP5s=
=5cFF
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These restore corner case behavior of the EC driver related to the
handling of defective ACPI tables and fix a recent regression in the
ACPI processor driver:
- Prevent the ACPI EC driver from ignoring ECDT information in the
cases when the ID string in the ECDT is invalid, but not empty, to
fix thouchpad detection on ThinkBook 14 G7 IML (Armin Wolf)
- Rearrange checks in acpi_processor_ppc_init() to restore the
handling of frequency QoS requests related to _PPC limits
inadvertently broken by a recent update (Rafael Wysocki)"
* tag 'acpi-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Relax sanity check of the ECDT ID string
ACPI: processor: perflib: Move problematic pr->performance check
- Allow intel_idle to use _CST information from ACPI tables for
idle states enumeration on any family of processors (Len Brown).
- Restore corner case behavior of the menu cpuidle governor, related
to the handling of systems where idle states selected by the governor
are rejected by the cpuidle driver, inadvertently changed during the
6.15 development cycle (Rafael Wysocki).
- Add support for Clearwater Forest in the out-of-band (OOB) mode to
the intel_pstate driver (Srinivas Pandruvada).
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmidw1wSHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1NHAIAJDN1yMvQo0Pqlgw/FJYJzUGDF6kDfUF
xuAwBCr4/CYzlsi9yrUtfqx0hKHKdhRB9XsX/wlc0TM6nH6QH4X7XLy/SbZpZ28S
YsmJAPu/GJl/H6QvsDjYiovBhtPqmLFQMdTdW1WaIJUaBqffMUDb7xtZHy3Ib847
CpkmsKyPqsn0A0TMCWQ3fyprkqNMHMWL1qiPLllmlE/N9b9LMugN+KnxCL7dFnx1
b8mVDIvgJBMbayNs59oBFeg+92DLDvE41iCbW29QIblO1EjpS7ZnWHsbJvOswJCK
/ekNVkpgPywKvI0ycOjWTAkfNbfU3BhH3ZMupVzfzJ9WdqDxCU7tO7A=
=OJnT
-----END PGP SIGNATURE-----
Merge tag 'pm-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These remove an artificial limitation from the intel_idle driver,
update the menu cpuidle governor to restore its previous behavior in a
corner case and add one more supported platform configuration to the
intel_pstate driver:
- Allow intel_idle to use _CST information from ACPI tables for idle
states enumeration on any family of processors (Len Brown)
- Restore corner case behavior of the menu cpuidle governor, related
to the handling of systems where idle states selected by the
governor are rejected by the cpuidle driver, inadvertently changed
during the 6.15 development cycle (Rafael Wysocki)
- Add support for Clearwater Forest in the out-of-band (OOB) mode to
the intel_pstate driver (Srinivas Pandruvada)"
* tag 'pm-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Support Clearwater Forest OOB mode
cpuidle: governors: menu: Avoid using invalid recent intervals data
intel_idle: Allow loading ACPI tables for any family
Current release - regressions:
- netfilter: nft_set_pipapo:
- don't return bogus extension pointer
- fix null deref for empty set
Current release - new code bugs:
- core: prevent deadlocks when enabling NAPIs with mixed kthread config
- eth: netdevsim: Fix wild pointer access in nsim_queue_free().
Previous releases - regressions:
- page_pool: allow enabling recycling late, fix false positive warning
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth: bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- eth: hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth: hibmcge: fix the division by zero issue
- eth: microchip: fix KSZ8863 reset problem
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmidxhgSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkPckP/AhJ0kARPgo72OhElbW6KvkHhXVNUTJb
6m15j9Z8ybQPBupSorxUEBx7pxczv/GBloN1xoJqUY/p7B6kLO2HDOpaReNFjZvF
J9hPl6ZF6CWzwgfcUfwI3UB1zhALKyHDClclfd8FBoKjAYXCrZXuPv/AV4oqYsA1
y7g24zpI76Cu+M+Nf5YhrlIVSmQ1/DXX8gifdcHFYnAKmCn7KxNv2lwvm2/yE2lL
9/Xl/D1cG/BiAaCUUXR4BP8RU5gdW72+lM3qmC/QFO/7jcBaoE89Y9Anona8p0PQ
dqDerHd0GDUH9QA6bht3asCS+IW+Zo2gf25o53OzlYIMAxDmEZLUBCwetJhvNJBq
DUQ6agzfNRxsCnlOc4JhMOqNr7rdU7d+9c9KuZWA/m8KRWdlvTYGJd/qzSlTWOhq
s9228dl+4oTb9Mnq8Bqafi42+TImeOyFRW9ZgF8ptjlF0l/lyv6moIrRCmVXppRZ
awABNDdG+i004XmAOAeOhjbUT7clLkLr+KEnsfH16qCa2o3dM6rlhvWYp2sucVJf
SyRvMdz5VqMLgruefpQS/DuK52UklpRawgvgngzU6UDYQUaxQKToeusMjRU7xUnW
hVI1y7/oNH6+r7Zr/iLTLKRR007B+RVC7VSbeMpxmAW+n6puMb+z7RnrJlnFapGM
qqqtk2/jItuK
=Ydk1
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Netfilter and IPsec.
Current release - regressions:
- netfilter: nft_set_pipapo:
- don't return bogus extension pointer
- fix null deref for empty set
Current release - new code bugs:
- core: prevent deadlocks when enabling NAPIs with mixed kthread
config
- eth: netdevsim: Fix wild pointer access in nsim_queue_free().
Previous releases - regressions:
- page_pool: allow enabling recycling late, fix false positive
warning
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth:
- bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth:
- hibmcge: fix the division by zero issue
- microchip: fix KSZ8863 reset problem"
* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
net: kcm: Fix race condition in kcm_unattach()
selftests: net/forwarding: test purge of active DWRR classes
net/sched: ets: use old 'nbands' while purging unused classes
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
netdevsim: Fix wild pointer access in nsim_queue_free().
net: mctp: Fix bad kfree_skb in bind lookup test
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
selftests: tls: test TCP stealing data from under the TLS socket
tls: handle data disappearing from under the TLS ULP
ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
net: prevent deadlocks when enabling NAPIs with mixed kthread config
net: update NAPI threaded config even for disabled NAPIs
selftests: drv-net: don't assume device has only 2 queues
docs: Fix name for net.ipv4.udp_child_hash_entries
riscv: dts: thead: Add APB clocks for TH1520 GMACs
...
* pm-cpuidle:
cpuidle: governors: menu: Avoid using invalid recent intervals data
intel_idle: Allow loading ACPI tables for any family
* pm-cpufreq:
cpufreq: intel_pstate: Support Clearwater Forest OOB mode
Without setting phy_mask for ax88772 mdio bus, current driver may create
at most 32 mdio phy devices with phy address range from 0x00 ~ 0x1f.
DLink DUB-E100 H/W Ver B1 is such a device. However, only one main phy
device will bind to net phy driver. This is creating issue during system
suspend/resume since phy_polling_mode() in phy_state_machine() will
directly deference member of phydev->drv for non-main phy devices. Then
NULL pointer dereference issue will occur. Due to only external phy or
internal phy is necessary, add phy_mask for ax88772 mdio bus to workarnoud
the issue.
Closes: https://lore.kernel.org/netdev/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250811092931.860333-1-xu.yang_2@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Davide Caratti says:
====================
ets: use old 'nbands' while purging unused classes
- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
====================
Link: https://patch.msgid.link/cover.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ixgbe: bypass devlink phys_port_name generation
Jedrzej adds option to skip phys_port_name generation and opts
ixgbe into it as some configurations rely on pre-devlink naming
which could end up broken as a result.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
====================
Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The data page pool always fills the HW rx ring with pages. On arm64 with
64K pages, this will waste _at least_ 32K of memory per entry in the rx
ring.
Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This
makes the data page pool the same as the header pool.
Tested with iperf3 with a small (64 entries) rx ring to encourage buffer
circulation.
Fixes: cd1fafe7da ("eth: bnxt: add support rx side device memory TCP")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmicWlINHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AFcpD/957FdnCNXL9HLbxNaeh+bGPS/t4SSVx9miAKJa
lBBr8Xka2bT7SSWDzigjgdxPOI8dWP0b50pp1hiAmFN/ZiAgIZvYJkkLN0wrwpKB
lQftsxXHCu4U5rkr+dZobgJyt3mLI/UP4L9Aigb2jZihsdMq1cnuXwA2HLVr2tjl
VH8Xk266uHHluB3JarNA4EIIWpXi0VXVLeuWCzcSgjsCKfdyh8POHn3XFzcdJUAz
3g3/25e0U+S+/QU7fjfOryUr1smLw4oXk/gKwvbJZO9Spqo3Sr5V9/z2jxzp3+aB
e/VfJhcXr/TmhnQH4y91Fgg3WWbh0p2yQNIAKzVGOdpNMXi2XSvL3IlANIXS5CbJ
pnAjwlq31ANi5SML2UsUFaBTAwew7ptJdTXoSD0ydgUwvA69lQ6Nv9SCMibMIu9u
drawMjlNwQ6vzpl02JqUMG5n7hsd4QHytHai/Ih3GUQaXUjDvXzNYWraRdoSrYNW
7vcA/VWm+zAAhIa57AiAm4UsfzJu7iZ4p/8d0bR801y75xnrDZK9yRHee725oZV3
JCIRg0II7K+c2hgA0UzBLysp5MPcpc4Ofw3lgVa1+uRxmrQciSZ6EEIAd4BwgwYf
kzeZ1gwUIwBpyg8IVLoUolEgA7T6jFpfcVhJtID21ryOhepD5vZ3d8QR/33OffcF
rmcATQ==
=jWzU
-----END PGP SIGNATURE-----
Merge tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for *net*:
1) I managed to add a null dereference crash in nft_set_pipapo
in the current development cycle, was not caught by CI
because the avx2 implementation is fine, but selftest
splats when run on non-avx2 host.
2) Fix the ipvs estimater kthread affinity, was incorrect
since 6.14. From Frederic Weisbecker.
3) nf_tables should not allow to add a device to a flowtable
or netdev chain more than once -- reject this.
From Pablo Neira Ayuso. This has been broken for long time,
blamed commit dates from v5.8.
* tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
====================
Link: https://patch.msgid.link/20250813113800.20775-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Align FSDAX enablement among multiple devices;
- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
CRYPTO{,_DEFLATE}=y even if EROFS=m.
- Fix atomic context detection to properly launch kworkers on demand;
- Fix block count statistics for 48-bit addressing support.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmic0bsRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qpdiw//b6iC9A3v1DlGQrfRXGLdY6L8NzebGbE/
iqJT3I5kw5EKHfQMZ3CIv3vLJvtKpyUBtrbS+kH4DfRtQWy6C9XZ7uLykMU/ljx0
pofrzRErB0wWvt5Ragj8vqa49T2tt/4xjiscnirelirpjquBJSuYl1wJ+Z5SrJ8k
9ex8G5wl+MKpQama4+VeYldCwNsbu0jZsJvQ3C82uGBnOUvJWtysnNUp1Hy9jbgq
dS4qphUhe8IjWYCaXWbcaDJZRmPryQr6BvcVWlTXLQObxpSlqn343a3I00Sp4zUE
6cISoR2mxRy7li8Ij5CL568w3+yy29Wd/pbJsbGrW3VYVg7RbsV+GkiYn3nUVzZr
kwqMnzxNUQj/6ky8tbcXxU/gJXNPoc08ADoANh6vMZ0DGvMTbAYL976Qfi4/M189
BT+2EdNWQoRUuvOvke4yb5K3vnpAYmyCpJJZOUqlFmgrSogdQf7PnIF2THlnZlwn
GG3oAfTldOludwb9INX20sddGkJJDqrDKwKbVSOqnLDVIMZTZknsGDm3M/2Tz73Y
eoSSvRyPaOVNbAayIhEv0KKIpHvM5d9F3mMcr3fZ7vlYrXbWsl9Z8oi0qhRiVW8K
d1keUMwTaKw052vyCpNOX+ZCO7YYBESe7rjecBc5FyM3wyqoDXoOXOEtCL0J8BpW
YHzxlzfAnRE=
=VQ1B
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Align FSDAX enablement among multiple devices
- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
CRYPTO{,_DEFLATE}=y even if EROFS=m
- Fix atomic context detection to properly launch kworkers on demand
- Fix block count statistics for 48-bit addressing support
* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix block count report when 48-bit layout is on
erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
erofs: Do not select tristate symbols from bool symbols
erofs: Fallback to normal access if DAX is not supported on extra device
Fix a regression introduced by commit b41642c877 ("rcu: Fix
rcu_read_unlock() deadloop due to IRQ work") which results in boot hang
as reported by kernel test bot at [1]. This issue happens because RCU
re-initializes the deferred QS IRQ work everytime it is queued. With
commit b41642c877, the IRQ work re-initialization can happen while it
is already queued. This results in IRQ work being requeued to itself.
When IRQ work finally fires, as it is requeued to itself, it is
repeatedly executed and results in hang. Fix this with initializing the
IRQ work only once before the CPU boots.
[1] https://lore.kernel.org/rcu/202508071303.c1134cce-lkp@intel.com/
-----BEGIN PGP SIGNATURE-----
iHQEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCaJlhIQAKCRAAHS7/6Z0w
pbjqAQCfjAYzQiFDEFoiOKvbMsaikg1fmj+Td1g+UsC3utAiBAD1HcQJsCAV21Ft
0xw6B+xBLbUC+dyIhfIffM+NQmrcCg==
=Y0kO
-----END PGP SIGNATURE-----
Merge tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU fix from Neeraj Upadhyay:
"Fix a regression introduced by commit b41642c877 ("rcu: Fix
rcu_read_unlock() deadloop due to IRQ work") which results in boot
hang as reported by kernel test bot at [1].
This issue happens because RCU re-initializes the deferred QS IRQ work
everytime it is queued. With commit b41642c877, the IRQ work
re-initialization can happen while it is already queued. This results
in IRQ work being requeued to itself. When IRQ work finally fires, as
it is requeued to itself, it is repeatedly executed and results in
hang.
Fix this with initializing the IRQ work only once before the CPU
boots"
Link: https://lore.kernel.org/rcu/202508071303.c1134cce-lkp@intel.com/ [1]
* tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
rcu: Fix racy re-initialization of irq_work causing hangs
or aren't considered necessary for -stable kernels. 10 of these fixes are
for MM.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaJwLpgAKCRDdBJ7gKXxA
js1sAP0c/XlVJhICq9aNJluu4Nj7cKTlzN7nvD/YRivZrG8XJQD/Q5nvFY6yeOdi
/HxCAdTDY5HsWv28nEbvnxKKbuFligU=
=mtKs
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes. 5 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels.
10 of these fixes are for MM"
* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: proc_maps_open allow proc_mem_open to return NULL
mm/mremap: avoid expensive folio lookup on mremap folio pte batch
userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
mm: pass page directly instead of using folio_page
selftests/proc: fix string literal warning in proc-maps-race.c
fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
mm/smaps: fix race between smaps_hugetlb_range and migration
mm: fix the race between collapse and PT_RECLAIM under per-vma lock
mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
MAINTAINERS: add Masami as a reviewer of hung task detector
mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
kasan/test: fix protection against compiler elision
A chain/flowtable update with duplicated devices in the same batch is
possible. Unfortunately, netdev event path only removes the first
device that is found, leaving unregistered the hook of the duplicated
device.
Check if a duplicated device exists in the transaction batch, bail out
with EEXIST in such case.
WARNING is hit when unregistering the hook:
[49042.221275] WARNING: CPU: 4 PID: 8425 at net/netfilter/core.c:340 nf_hook_entry_head+0xaa/0x150
[49042.221375] CPU: 4 UID: 0 PID: 8425 Comm: nft Tainted: G S 6.16.0+ #170 PREEMPT(full)
[...]
[49042.221382] RIP: 0010:nf_hook_entry_head+0xaa/0x150
Fixes: 78d9f48f7f ("netfilter: nf_tables: add devices to existing flowtable")
Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
The estimator kthreads' affinity are defined by sysctl overwritten
preferences and applied through a plain call to the scheduler's affinity
API.
However since the introduction of managed kthreads preferred affinity,
such a practice shortcuts the kthreads core code which eventually
overwrites the target to the default unbound affinity.
Fix this with using the appropriate kthread's API.
Fixes: d1a8919758 ("kthread: Default affine kthread to its preferred NUMA node")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Blamed commit broke the check for a null scratch map:
- if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
+ if (unlikely(!raw_cpu_ptr(m->scratch)))
This should have been "if (!*raw_ ...)".
Use the pattern of the avx2 version which is more readable.
This can only be reproduced if avx2 support isn't available.
Fixes: d8d871a35c ("netfilter: nft_set_pipapo: merge pipapo_get/lookup")
Signed-off-by: Florian Westphal <fw@strlen.de>
Check a race where data disappears from the TCP socket after
TLS signaled that its ready to receive.
ok 6 global.data_steal
# RUN tls_basic.base_base ...
# OK tls_basic.base_base
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TLS expects that it owns the receive queue of the TCP socket.
This cannot be guaranteed in case the reader of the TCP socket
entered before the TLS ULP was installed, or uses some non-standard
read API (eg. zerocopy ones). Replace the WARN_ON() and a buggy
early exit (which leaves anchor pointing to a freed skb) with real
error handling. Wipe the parsing state and tell the reader to retry.
We already reload the anchor every time we (re)acquire the socket lock,
so the only condition we need to avoid is an out of bounds read
(not having enough bytes in the socket for previously parsed record len).
If some data was read from under TLS but there's enough in the queue
we'll reload and decrypt what is most likely not a valid TLS record.
Leading to some undefined behavior from TLS perspective (corrupting
a stream? missing an alert? missing an attack?) but no kernel crash
should take place.
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://lore.kernel.org/tFjq_kf7sWIG3A7CrCg_egb8CVsT_gsmHAK0_wxDPJXfIzxFAMxqmLwp3MlU5EHiet0AwwJldaaFdgyHpeIUCS-3m3llsmRzp9xIOBR4lAI=@syst3mfailure.io
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported the following ABBA deadlock:
CPU0 CPU1
---- ----
n_vclocks_store()
lock(&ptp->n_vclocks_mux) [1]
(physical clock)
pc_clock_adjtime()
lock(&clk->rwsem) [2]
(physical clock)
...
ptp_clock_freerun()
ptp_vclock_in_use()
lock(&ptp->n_vclocks_mux) [3]
(physical clock)
ptp_clock_unregister()
posix_clock_unregister()
lock(&clk->rwsem) [4]
(virtual clock)
Since ptp virtual clock is registered only under ptp physical clock, both
ptp_clock and posix_clock must be physical clocks for ptp_vclock_in_use()
to lock &ptp->n_vclocks_mux and check ptp->n_vclocks.
However, when unregistering vclocks in n_vclocks_store(), the locking
ptp->n_vclocks_mux is a physical clock lock, but clk->rwsem of
ptp_clock_unregister() called through device_for_each_child_reverse()
is a virtual clock lock.
Therefore, clk->rwsem used in CPU0 and clk->rwsem used in CPU1 are
different locks, but in lockdep, a false positive occurs because the
possibility of deadlock is determined through lock-class.
To solve this, lock subclass annotation must be added to the posix_clock
rwsem of the vclock.
Reported-by: syzbot+7cfb66a237c4a5fb22ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7cfb66a237c4a5fb22ad
Fixes: 73f37068d5 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250728062649.469882-1-aha310510@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Users of the ixgbe driver report that after adding devlink support by
the commit a0285236ab ("ixgbe: add initial devlink support") their
configs got broken due to unwanted changes of interface names. It's
caused by automatic phys_port_name generation during devlink port
initialization flow.
To prevent from that set no_phys_port_name flag for ixgbe devlink ports.
Reported-by: David Howells <dhowells@redhat.com>
Closes: https://lore.kernel.org/netdev/3452224.1745518016@warthog.procyon.org.uk/
Reported-by: David Kaplan <David.Kaplan@amd.com>
Closes: https://lore.kernel.org/netdev/LV3PR12MB92658474624CCF60220157199470A@LV3PR12MB9265.namprd12.prod.outlook.com/
Fixes: a0285236ab ("ixgbe: add initial devlink support")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently when adding devlink port, phys_port_name is automatically
generated within devlink port initialization flow. As a result adding
devlink port support to driver may result in forced changes of interface
names, which breaks already existing network configs.
This is an expected behavior but in some scenarios it would not be
preferable to provide such limitation for legacy driver not being able to
keep 'pre-devlink' interface name.
Add flag no_phys_port_name to devlink_port_attrs struct which indicates
if devlink should not alter name of interface.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/all/nbwrfnjhvrcduqzjl4a2jafnvvud6qsbxlvxaxilnryglf4j7r@btuqrimnfuly/
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaJtqogAKCRBZ7Krx/gZQ
6+FEAQDw4ljGzHf0Yi8Fv7kDYKdRhetqaY8jR0FOOn0gsOkoagEAmEUsczpRnRar
5rGnHKImSBPYShduUbStsap8OxtlzwA=
=YmfX
-----END PGP SIGNATURE-----
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull habanalabs fix from Al Viro:
"Yet another use-after-free fix due to dma_buf_fd() misuse"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
habanalabs: fix UAF in export_dmabuf()
It turns out that the ECDT table inside the ThinkBook 14 G7 IML
contains a valid EC description but an invalid ID string
("_SB.PC00.LPCB.EC0"). Ignoring this ECDT based on the invalid
ID string prevents the kernel from detecting the built-in touchpad,
so relax the sanity check of the ID string and only reject ECDTs
with empty ID strings.
Reported-by: Ilya K <me@0upti.me>
Fixes: 7a0d59f6a9 ("ACPI: EC: Ignore ECDT tables with an invalid ID string")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Ilya K <me@0upti.me>
Link: https://patch.msgid.link/20250729062038.303734-1-W_Armin@gmx.de
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmibLJAACgkQxWXV+ddt
WDvQnQ/7Bo9ruVKwbLrGAoLE0KAUIRL5gdLJrPdSUiKHXDXTqBqls+ST8Lo4u9VW
jifNLva2lEH3Hexp8n2qDwm5jgmEz/cZT/91+xAIolwlleQuvN0aR4JcEOqYGG3U
zp/py1cqtWfw04Kf8aGRB+kaGGR1snciOFoe/1i0sorHNXdhp23VGXJ2Vn1J8smG
fCS5dkebI0z58AOj61D0MVo1MfM2NfjP6Xs89waHU9kdM89UY/iapFQ+OYBumJ3H
OeHuuuHmFOkv0yKMToJ9kU0MUx+28SgvXRgmoLnsx74SLno8shJkO3uRChZqtSZ3
1xAJh29tLWw7zsgXfr/5qeaCmUAoHJ4SIZnCkFhooglcpWsjlhaBb/PhI79VJFQ7
1+lTRoFdtA/I3389xyvveZGn0ELCuhkvkb40NWGMBM3NT112k1ulC9jKMbXzytK0
zJiSfkWChQJwWgPaEi8d4s4tvcyJlSQzzNgfEWSXVVeUFq6Ff2+7JWEqx1mBcNsc
/gV4nrBANcT57Wb2MNLGbnW6A5SW5VTUx1rUNOOLU5RM1o5tFtGpX+YTQkkNUZs2
ZfwxFW+VkvRuXPl6W+F2QsQSpmovk1giC0ezWOtwEzhQbLdxZq9LDXC8SJflEp/A
w8fKaHlaSJOYRR3XL1Wxo/KFdMBOz2UJY2eT35dfPbHKc5FkJU8=
=fB9v
-----END PGP SIGNATURE-----
Merge tag 'for-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix bug in qgroups reporting incorrect usage for higher level qgroups
- in zoned mode, do not select metadata group as finish target
- convert xarray lock to RCU when trying to release extent buffer to
avoid a deadlock
- do not allow relocation on partially dropped subvolumes, which is
normally not possible but has been reported on old filesystems
- in tree-log, report errors on missing block group when unaccounting
log tree extent buffers
- with large folios, fix range length when processing ordered extents
* tag 'for-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix iteration bug in __qgroup_excl_accounting()
btrfs: zoned: do not select metadata BG as finish target
btrfs: do not allow relocation of partially dropped subvolumes
btrfs: error on missing block group when unaccounting log tree extent buffers
btrfs: fix wrong length parameter for btrfs_cleanup_ordered_extents()
btrfs: make btrfs_cleanup_ordered_extents() support large folios
btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()
SNP guest which makes sure all cache lines belonging to a 4K page are
evicted after latter has been converted to a guest-private page.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiTjlwACgkQEsHwGGHe
VUrNQBAAtVZqCIfX3MFyhrBu6k4laMDNjKMneX2JregOv42ha/pjuJ/S8mNQhG3J
0vLGcPmW1y8hNtm3+hhnt2xo7fNpwgReXkYrjmQPfIWrYHVgdyTz+Rwsacf7D8Yj
p15XrFz+8w7SHI095ZcvJTc/rAxUbGkAXqYE6p7gj04iJtSe4FpdZZw9fW8IPooE
hZypWG4aAJdsFn4qmIsIh/2QYiCuVS9yhJuUX43K7Ft2xl5jXUYulu5yyWyBZC6r
OOe6Uc+nc5dDLa+U5ileqgXGf23LgE7LxbsVpKs5OetJmZhAjGTY+go6U/Zm03FU
bsRQG8OumitgeUd+tjwtl/KNCu03DggQ8a1cqTt09leOkejQ9mB/ArZHuBWjIR7x
hPX8mlOIhCWkOP6S8DnnI53baGz/OgGTvWP1Ehuv2WaM1ip7fDduhENK0TWY0Hx+
mTGG1HA5zILnYSGh+3hr5xC+YDNgvMR1uSWhG2zARrs98rgP3MC8P2kcrmSpLOFF
BFJ8i2TSppr6mDdnRspgq3jn3MYKaPQxjG4OJQEawqkfzfsnw4+kxGYFK/9YaeB7
KqqtU/4GPh65y/LsELXPp/NlPaZ6LXjLoLj2IwDEHNBl2iBiRmH+FDSFL0Hsxb/c
mMW0V2RA+OwzktARXMazXAqM7hR8SStnkW38SrT/YSrRnRFFGSg=
=8kfV
-----END PGP SIGNATURE-----
Merge tag 'snp_cache_coherency' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
- Add a mitigation for a cache coherency vulnerability when running an
SNP guest which makes sure all cache lines belonging to a 4K page are
evicted after latter has been converted to a guest-private page
[ SNP: Secure Nested Paging - not to be confused with Single Nucleotide
Polymorphism, which is the more common use of that TLA. I am on a
mission to write out the more obscure TLAs in order to keep track of
them.
Because while math tells us that there are only about 17k different
combinations of three-letter acronyms using English letters (26^3), I
am convinced that somehow Intel, AMD and ARM have together figured out
new mathematics, and have at least a million different TLAs that they
use. - Linus ]
* tag 'snp_cache_coherency' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Evict cache lines during SNP memory validation
Commit d33bd88ac0 ("ACPI: processor: perflib: Fix initial _PPC limit
application") added a pr->performance check that prevents the frequency
QoS request from being added when the given processor has no performance
object. Unfortunately, this causes a WARN() in freq_qos_remove_request()
to trigger on an attempt to take the given CPU offline later because the
frequency QoS object has not been added for it due to the missing
performance object.
Address this by moving the pr->performance check before calling
acpi_processor_get_platform_limit() so it only prevents a limit from
being set for the CPU if the performance object is not present. This
way, the frequency QoS request is added as it was before the above
commit and it is present all the time along with the CPU's cpufreq
policy regardless of whether or not the CPU is online.
Fixes: d33bd88ac0 ("ACPI: processor: perflib: Fix initial _PPC limit application")
Tested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2801421.mvXUDI8C0e@rafael.j.wysocki
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmiZtCAACgkQrB3Eaf9P
W7c/5w/+P3F7DWsjlNUipyIRtwaaMImrELvwrRbg4ajv4RALd1HRSWV8idHg2Kaj
7XSJzDoGjegCwfXlRAuMRlmPz8HJUEkZ4rAeygOqwxRCrtV7R1JoGbiHot0Bk+Jn
aWNvbD4/cbULmkvdo1CBPmONb5XDkLJA1Rh6dJJoZtlCCF20zC1HLesoT1EZcIkE
1Rpmb+O2RB2zj0m+ciKuU5NgqGC4jpwcB+Wlcpa7hTUBIIsuEfUmj8IHXir/7b0v
MKZMqJfD4xM3MxRQjkR8xEXDcrVGLsS18BzCDW6x3DcW+aq+0gZlUhLg6LDB8c+e
lwHMzeTlorWIsFf4PrqL4QJMlFx8S5UInZOe/sex1xTJ3afhQgVJv+48V9U+XI4E
eOL6DVBG+L00dieBdSRcdF2g+ceOx0PaLtfSHhc1zLuWSOlgwnlcLfhhTfxw4eFl
ShE/TaGgat4l6ng1HHrq6ZnjjCSRXdfAquGUgIWIPlYyQw+8dxXtYaqMJKP28Gl/
HDqZRxoRy0Wy1woABj7vpotg9I/hi3wV7mRcWxo+tWfMNJm+4BhjvUyp8yYtaV3j
+1nO1HCg9nEQATVPNQKIUlo7UxwRcgFcm4QzfcWtsGh7/5O0VcUxgIRJtVdY2TZS
nSBcEQUuS0l47ngGOTzd1pHce5ocKV8YDEprqsXuqYAUh59KnD8=
=kxTv
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2025-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2025-08-11
1) Fix flushing of all states in xfrm_state_fini.
From Sabrina Dubroca.
2) Fix some IPsec software offload features. These
got lost with some recent HW offload changes.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
* tag 'ipsec-2025-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
udp: also consider secpath when evaluating ipsec use for checksumming
xfrm: bring back device check in validate_xmit_xfrm
xfrm: restore GSO for SW crypto
xfrm: flush all states in xfrm_state_fini
====================
Link: https://patch.msgid.link/20250811092008.731573-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski says:
====================
net: prevent deadlocks and mis-configuration with per-NAPI threaded config
Running the test added with a recent fix on a driver with persistent
NAPI config leads to a deadlock. The deadlock is fixed by patch 3,
patch 2 is I think a more fundamental problem with the way we
implemented the config.
I hope the fix makes sense, my own thinking is definitely colored
by my preference (IOW how the per-queue config RFC was implemented).
v1: https://lore.kernel.org/20250808014952.724762-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250809001205.1147153-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The following order of calls currently deadlocks if:
- device has threaded=1; and
- NAPI has persistent config with threaded=0.
netif_napi_add_weight_config()
dev->threaded == 1
napi_kthread_create()
napi_enable()
napi_restore_config()
napi_set_threaded(0)
napi_stop_kthread()
while (NAPIF_STATE_SCHED)
msleep(20)
We deadlock because disabled NAPI has STATE_SCHED set.
Creating a thread in netif_napi_add() just to destroy it in
napi_disable() is fairly ugly in the first place. Let's read
both the device config and the NAPI config in netif_napi_add().
Fixes: e6d7626881 ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We have to make sure that all future NAPIs will have the right threaded
state when the state is configured on the device level.
We chose not to have an "unset" state for threaded, and not to wipe
the NAPI config clean when channels are explicitly disabled.
This means the persistent config structs "exist" even when their NAPIs
are not instantiated.
Differently put - the NAPI persistent state lives in the net_device
(ncfg == struct napi_config):
,--- [napi 0] - [napi 1]
[dev] | |
`--- [ncfg 0] - [ncfg 1]
so say we a device with 2 queues but only 1 enabled:
,--- [napi 0]
[dev] |
`--- [ncfg 0] - [ncfg 1]
now we set the device to threaded=1:
,---------- [napi 0 (thr:1)]
[dev(thr:1)] |
`---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)]
Since [ncfg 1] was not attached to a NAPI during configuration we
skipped it. If we create a NAPI for it later it will have the old
setting (presumably disabled). One could argue if this is right
or not "in principle", but it's definitely not how things worked
before per-NAPI config..
Fixes: 2677010e77 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test is implicitly assuming the device only has 2 queues.
A real device will likely have more. The exact problem is that
because NAPIs get added to the list from the head, the netlink
dump reports them in reverse order. So the naive napis[0] will
actually likely give us the _last_ NAPI, not the first one.
Re-enable all the NAPIs instead of hard-coding 2 in the test.
This way the NAPIs we operated on will always reappear,
doesn't matter where they were in the registration order.
Fixes: e6d7626881 ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yao Zi says:
====================
Fix broken link with TH1520 GMAC when linkspeed changes
It's noted that on TH1520 SoC, the GMAC's link becomes broken after
the link speed is changed (for example, running ethtool -s eth0 speed
100 on the peer when negotiated to 1Gbps), but the GMAC could function
normally if the speed is brought back to the initial.
Just like many other SoCs utilizing STMMAC IP, we need to adjust the TX
clock supplying TH1520's GMAC through some SoC-specific glue registers
when linkspeed changes. But it's found that after the full kernel
startup, reading from them results in garbage and writing to them makes
no effect, which is the cause of broken link.
Further testing shows perisys-apb4-hclk must be ungated for normal
access to Th1520 GMAC APB glue registers, which is neither described in
dt-binding nor acquired by the driver.
This series expands the dt-binding of TH1520's GMAC to allow an extra
"APB glue registers interface clock", instructs the driver to acquire
and enable the clock, and finally supplies CLK_PERISYS_APB4_HCLK for
TH1520's GMACs in SoC devicetree.
v2: https://lore.kernel.org/netdev/20250801091240.46114-1-ziyao@disroot.org/
v1: https://lore.kernel.org/all/20250729093734.40132-1-ziyao@disroot.org/
====================
Link: https://patch.msgid.link/20250808093655.48074-2-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Describe perisys-apb4-hclk as the APB clock for TH1520 SoC, which is
essential for accessing GMAC glue registers.
Fixes: 7e756671a6 ("riscv: dts: thead: Add TH1520 ethernet nodes")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-5-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
It's necessary to adjust the MAC TX clock when the linkspeed changes,
but it's noted such adjustment always fails on TH1520 SoC, and reading
back from APB glue registers that control clock generation results in
garbage, causing broken link.
With some testing, it's found a clock must be ungated for access to APB
glue registers. Without any consumer, the clock is automatically
disabled during late kernel startup. Let's get and enable it if it's
described in devicetree.
For backward compatibility with older devicetrees, probing won't fail if
the APB clock isn't found. In this case, we emit a warning since the
link will break if the speed changes.
Fixes: 33a1a01e3a ("net: stmmac: Add glue layer for T-HEAD TH1520 SoC")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-4-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Besides ones for GMAC core and peripheral registers, the TH1520 GMAC
requires one more clock for configuring APB glue registers. Describe
it in the binding.
Fixes: f920ce04c3 ("dt-bindings: net: Add T-HEAD dwmac support")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-3-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
reset_gpio is claimed in mdiobus_register_device(), but it is not
released in mdiobus_unregister_device(). It is instead only
released when the whole MDIO bus is unregistered.
When a device uses the reset_gpio property, it becomes impossible
to unregister it and register it again, because the GPIO remains
claimed.
This patch resolves that issue.
Fixes: bafbdd527d ("phylib: Add device reset GPIO support") # see notes
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Csókás Bence <csokas.bence@prolan.hu>
[ csokas.bence: Resolve rebase conflict and clarify msg ]
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/20250807135449.254254-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
TJA1103/04/20/21 support both C22 and C45 accessing methods.
The TJA11xx driver has implemented the match_phy_device() API.
However, it does not handle the C45 ID. If C45 was used to access
TJA11xx, match_phy_device() would always return false due to
phydev->phy_id only used by C22 being empty, resulting in the
generic phy driver being used for TJA11xx PHYs.
Therefore, check phydev->c45_ids.device_ids[MDIO_MMD_PMAPMD] when
using C45.
Fixes: 1b76b2497a ("net: phy: nxp-c45-tja11xx: simplify .match_phy_device OP")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Link: https://patch.msgid.link/20250807040832.2455306-1-xiaoning.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commit 65c6604725 ("proc: fix the issue of proc_mem_open returning
NULL") caused proc_maps_open() to return -ESRCH when proc_mem_open()
returns NULL. This breaks legitimate /proc/<pid>/maps access for kernel
threads since kernel threads have NULL mm_struct.
The regression causes perf to fail and exit when profiling a kernel
thread:
# perf record -v -g -p $(pgrep kswapd0)
...
couldn't open /proc/65/task/65/maps
This patch partially reverts the commit to fix it.
Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes: 65c6604725 ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux@gmail.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It was discovered in the attached report that commit f822a9a81a ("mm:
optimize mremap() by PTE batching") introduced a significant performance
regression on a number of metrics on x86-64, most notably
stress-ng.bigheap.realloc_calls_per_sec - indicating a 37.3% regression in
number of mremap() calls per second.
I was able to reproduce this locally on an intel x86-64 raptor lake
system, noting an average of 143,857 realloc calls/sec (with a stddev of
4,531 or 3.1%) prior to this patch being applied, and 81,503 afterwards
(stddev of 2,131 or 2.6%) - a 43.3% regression.
During testing I was able to determine that there was no meaningful
difference in efforts to optimise the folio_pte_batch() operation, nor
checking folio_test_large().
This is within expectation, as a regression this large is likely to
indicate we are accessing memory that is not yet in a cache line (and
perhaps may even cause a main memory fetch).
The expectation by those discussing this from the start was that
vm_normal_folio() (invoked by mremap_folio_pte_batch()) would likely be
the culprit due to having to retrieve memory from the vmemmap (which
mremap() page table moves does not otherwise do, meaning this is
inevitably cold memory).
I was able to definitively determine that this theory is indeed correct
and the cause of the issue.
The solution is to restore part of an approach previously discarded on
review, that is to invoke pte_batch_hint() which explicitly determines,
through reference to the PTE alone (thus no vmemmap lookup), what the PTE
batch size may be.
On platforms other than arm64 this is currently hardcoded to return 1, so
this naturally resolves the issue for x86-64, and for arm64 introduces
little to no overhead as the pte cache line will be hot.
With this patch applied, we move from 81,503 realloc calls/sec to 138,701
(stddev of 496.1 or 0.4%), which is a -3.6% regression, however accounting
for the variance in the original result, this is broadly restoring
performance to its prior state.
Link: https://lkml.kernel.org/r/20250807185819.199865-1-lorenzo.stoakes@oracle.com
Fixes: f822a9a81a ("mm: optimize mremap() by PTE batching")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071609.4e743d7c-lkp@intel.com
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In commit_anon_folio_batch(), we iterate over all pages pointed to by the
PTE batch. Therefore we need to know the first page of the batch;
currently we derive that via folio_page(folio, 0), but, that takes us to
the first (head) page of the folio instead - our PTE batch may lie in the
middle of the folio, leading to incorrectness.
Bite the bullet and throw away the micro-optimization of reusing the folio
in favour of code simplicity. Derive the page and the folio in
change_pte_range, and pass the page too to commit_anon_folio_batch to fix
the aforementioned issue.
Link: https://lkml.kernel.org/r/20250806145611.3962-1-dev.jain@arm.com
Fixes: cac1db8c3a ("mm: optimize mprotect() by PTE batching")
Reported-by: syzbot+57bcc752f0df8bb1365c@syzkaller.appspotmail.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Debugged-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>